Signals: Plugging in the data from Quandl

surajp · March 18, 2021, 2:51pm

Quandl example model: example_model_quandl.py

Google Colab notebook: Signals_Quandl_EOD_baseline.ipynb

Quandl is a marketplace for financial, economic, and alternative data that offers both premium and free datasets.

One such data source is End of Day US Stock Prices by QuoteMedia. It is a premium feed, so you need to set API_KEY in example_model_quandl.py.

Updated daily, this data feed offers end of day prices, dividends, adjustments and splits for US publicly traded stocks with history to 1996. Prices are provided both adjusted and unadjusted.

This example downloads the entire time-series dataset as a .zip file and loads the Adj_Open and Adj_Close columns from it. Specific tickers for a specific time span can also be loaded iteratively through the API, but that is much slower. See Quandl's "Getting started with the API" guide.

While the feature extraction and modeling parts are very similar to the main example_model.py, the focus here is on making the data loading flexible so that different data sources can be easily plugged in.

Steps to re-arrange the data as in Signals’ main example_script.py:

Find the tickers common to the EOD data source's ticker list and the Numerai Signals universe's yahoo tickers.
Specify the columns to load in download_full_and_load, keep only the common tickers, and rename the columns as required by the feature extraction setup.

    import numpy as np
    import pandas as pd

    # Column names of the headerless csv file, in file order
    cols = [
        "ticker", "date", "Open", "High", "Low", "Close", "Volume", "Dividend",
        "Split", "Adj_Open", "Adj_High", "Adj_Low", "Adj_Close", "Adj_Volume",
    ]

    # usecols refers to column positions in the csv.
    # Load only [ticker, date, Adj_Open, Adj_Close], with prices as float32.
    # f_name is the path to the downloaded .zip file.
    print("loading from csv...")
    full_data = pd.read_csv(
        f_name,
        usecols=[0, 1, 9, 12],
        compression="zip",
        dtype={0: str, 1: str, 9: np.float32, 12: np.float32},
        header=None,
    )

    # Rename the loaded columns and index the frame by date
    filter_columns = ["ticker", "date", "Adj_Open", "Adj_Close"]
    full_data.columns = filter_columns
    full_data.set_index("date", inplace=True)
    full_data.index = pd.to_datetime(full_data.index)
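To try the loading step without the premium download, here is a minimal sketch that wraps the same read_csv call in a function and runs it on a tiny zip file. The helper name load_eod_zip, the EOD_USECOLS mapping, and the two sample rows are all made up for illustration; only the column positions and dtypes come from the snippet above.

```python
import os
import tempfile
import zipfile

import numpy as np
import pandas as pd

# Positions of the columns we keep in the headerless EOD csv
# (0 = ticker, 1 = date, 9 = Adj_Open, 12 = Adj_Close).
EOD_USECOLS = {0: "ticker", 1: "date", 9: "Adj_Open", 12: "Adj_Close"}


def load_eod_zip(f_name):
    """Hypothetical helper: read a zipped, headerless EOD csv into a date-indexed frame."""
    data = pd.read_csv(
        f_name,
        usecols=list(EOD_USECOLS),
        compression="zip",
        dtype={0: str, 1: str, 9: np.float32, 12: np.float32},
        header=None,
    )
    # read_csv keeps the columns in file order, so sorted positions line up
    data.columns = [EOD_USECOLS[pos] for pos in sorted(EOD_USECOLS)]
    data["date"] = pd.to_datetime(data["date"])
    return data.set_index("date")


# Two made-up rows in the 14-column layout listed above
rows = [
    "AAA,2021-03-17,10,11,9,10.5,1000,0,1,10.0,11.0,9.0,10.5,1000",
    "BBB,2021-03-17,20,21,19,20.5,2000,0,1,20.0,21.0,19.0,20.5,2000",
]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "eod_sample.zip")
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("eod_sample.csv", "\n".join(rows))
    sample = load_eod_zip(path)

print(sample)
```

Returning the same four-column, date-indexed frame from every loader is one way to keep the rest of the pipeline unchanged when the data source is swapped.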

Map ticker names to Bloomberg tickers using Numerai’s Bloomberg ticker map.

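    # Keep only tickers that are in the Signals universe; .copy() avoids
    # pandas' SettingWithCopyWarning on the assignment below
    full_data = full_data[full_data.ticker.isin(common_tickers)].copy()
    full_data["bloomberg_ticker"] = full_data.ticker.map(
        dict(zip(ticker_map["yahoo"], ticker_map["bloomberg_ticker"]))
    )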
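The snippet above assumes common_tickers and ticker_map already exist. A rough sketch of how they could be built and used is below. The ticker_map and full_data frames here are made-up stand-ins (only the yahoo and bloomberg_ticker column names come from the post), and it assumes EOD tickers match yahoo tickers by exact string.

```python
import pandas as pd

# Made-up stand-in for Numerai's ticker map; only the column names are from the post
ticker_map = pd.DataFrame(
    {
        "yahoo": ["AAA", "BBB", "CCC"],
        "bloomberg_ticker": ["AAA US", "BBB US", "CCC US"],
    }
)

# Made-up stand-in for the loaded EOD data
full_data = pd.DataFrame(
    {"ticker": ["AAA", "BBB", "ZZZ"], "Adj_Close": [10.5, 20.5, 30.5]},
    index=pd.to_datetime(["2021-03-17"] * 3),
)

# Step 1: tickers present in both the EOD data and the Signals universe
eod_tickers = set(full_data["ticker"].unique())
yahoo_tickers = set(ticker_map["yahoo"].dropna())
common_tickers = sorted(eod_tickers & yahoo_tickers)

# Step 3: keep the common tickers and attach the Bloomberg ticker
yahoo_to_bbg = dict(zip(ticker_map["yahoo"], ticker_map["bloomberg_ticker"]))
mapped = full_data[full_data["ticker"].isin(common_tickers)].copy()
mapped["bloomberg_ticker"] = mapped["ticker"].map(yahoo_to_bbg)

print(common_tickers)
print(mapped)
```

Tickers that exist in only one of the two sources (ZZZ and CCC here) simply drop out.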

After creating a day_chg column and computing the RSI and SMA indicators, the features are quintiled and lagged as in the main example_model.py.
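A rough sketch of that feature step follows. It is not the code from example_model_quandl.py, and several things in it are assumptions: day_chg is taken as Adj_Close / Adj_Open - 1, RSI uses simple rolling means of gains and losses, both indicators are applied to day_chg, and the binning is rank-based equal-count binning per date. The names rsi and add_features and the random demo data are made up. The input frame needs date as a regular column, so call reset_index() first if date is the index.

```python
import numpy as np
import pandas as pd


def rsi(values, interval=14):
    """RSI from simple rolling means of gains and losses (one common variant)."""
    delta = values.diff()
    gains = delta.clip(lower=0)
    losses = -delta.clip(upper=0)
    rs = gains.rolling(interval).mean() / losses.rolling(interval).mean()
    return 100.0 - 100.0 / (1.0 + rs)


def add_features(data, interval=14, n_bins=5, n_lags=2):
    """Hypothetical helper: day_chg -> RSI/SMA -> per-date bins -> per-ticker lags."""
    data = data.sort_values(["bloomberg_ticker", "date"]).copy()
    data["day_chg"] = data["Adj_Close"] / data["Adj_Open"] - 1.0  # assumed definition

    by_ticker = data.groupby("bloomberg_ticker")["day_chg"]
    data["day_chg_RSI"] = by_ticker.transform(lambda s: rsi(s, interval))
    data["day_chg_SMA"] = by_ticker.transform(lambda s: s.rolling(interval).mean())

    for col in ["day_chg_RSI", "day_chg_SMA"]:
        binned = f"{col}_quintile"
        # Equal-count bins 0..n_bins-1 across tickers on each date; NaN stays NaN
        data[binned] = data.groupby("date")[col].transform(
            lambda s: np.floor((s.rank(method="first") - 1) * n_bins / s.count())
        )
        for lag in range(1, n_lags + 1):
            data[f"{binned}_lag_{lag}"] = data.groupby("bloomberg_ticker")[binned].shift(lag)
    return data


# Made-up demo data: 10 tickers x 40 business days
rng = np.random.default_rng(0)
dates = pd.bdate_range("2021-01-04", periods=40)
tickers = [f"T{i} US" for i in range(10)]
demo = pd.DataFrame(
    [(d, t) for t in tickers for d in dates], columns=["date", "bloomberg_ticker"]
)
demo["Adj_Open"] = rng.uniform(90, 110, len(demo))
demo["Adj_Close"] = demo["Adj_Open"] * (1 + rng.normal(0, 0.01, len(demo)))

features = add_features(demo)
print(features.filter(like="quintile").dropna().head())
```

Raising n_bins from 5 to 100 only makes sense when each date has many more tickers than this demo does.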

Validation results:

Thanks @_liamhz for the feedback

richai · March 19, 2021, 6:36pm

nice! not a bad result for such simple data.

you say this is premium, how much does this data cost you per month?

(also did you train on validation for this great cumsum graph?)

surajp · March 20, 2021, 4:52am

This one costs me USD 49/month, but I guess organizational licensing is also available.

This wasn’t trained on validation data. However, it has some extra features compared to the default yfinance example_model.py:

a day_change column
RSI: (14, 21)
SMA: (14, 21)
number of quantile bins increased from 5 to 100

We get 72 features after computing lags.
