Signals: Plugging in the data from Quandl

Quandl example model: example_model_quandl.py

Google Colab notebook: Signals_Quandl_EOD_baseline.ipynb

Quandl is a marketplace for financial, economic, and alternative data that offers both premium and free datasets.

One such data source is End of Day US Stock Prices by QuoteMedia. It is a premium feed, so you need to set API_KEY in example_model_quandl.py.

Updated daily, this data feed offers end of day prices, dividends, adjustments and splits for US publicly traded stocks with history to 1996. Prices are provided both adjusted and unadjusted.

This example downloads the entire ‘Time-series’ dataset as a .zip file and loads the Adj_Open and Adj_Close columns from it. Specific tickers over a specific time span can also be loaded iteratively through the API, though this is much slower; see Getting started with the API.
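As a rough illustration of the slower per-ticker route, the sketch below loops over tickers and stacks the results into one long frame. The function name `load_tickers_via_api`, the injected `fetch` callable, and the `EOD/<ticker>` dataset code are assumptions made for this sketch, not taken from example_model_quandl.py.

```python
import pandas as pd


def load_tickers_via_api(tickers, start_date, end_date, fetch):
    """Fetch adjusted open/close per ticker and stack them into one long frame.

    `fetch` is any callable of the form fetch(code, start_date=..., end_date=...)
    that returns a date-indexed DataFrame (hypothetical interface for this sketch).
    """
    frames = []
    for ticker in tickers:
        # Assumption: per-ticker datasets are addressed as "EOD/<ticker>".
        df = fetch(f"EOD/{ticker}", start_date=start_date, end_date=end_date)
        df = df[["Adj_Open", "Adj_Close"]].copy()
        df.insert(0, "ticker", ticker)
        frames.append(df)
    full_data = pd.concat(frames)
    full_data.index.name = "date"
    return full_data
```

With the `quandl` Python package, one would presumably set `quandl.ApiConfig.api_key = API_KEY` and pass `fetch=quandl.get`. Each ticker costs one request, which is why the bulk .zip download is much faster for the whole universe.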

While the feature extraction and modeling parts are very similar to the main example_model.py, the focus here is on making data loading flexible so that different data sources can be easily ‘plugged in’.

Steps to re-arrange the data as in Signals’ main example_script.py

  1. Find the tickers common to the EOD data source’s ticker list and the Yahoo tickers in the Numerai Signals universe.
  2. In download_full_and_load, load only the needed columns and rename them as required by the feature extraction setup.
    # column names in the csv file without headers
    cols = [
        "ticker", "date", "Open", "High", "Low", "Close", "Volume", "Dividend",
        "Split", "Adj_Open", "Adj_High", "Adj_Low", "Adj_Close", "Adj_Volume",
    ]

    # usecols refers to the column in the csv.
    # using only [ticker, date, adj_open, adj_close]
    # Loading only needed columns as FP32
    print("loading from csv...")
    full_data = pd.read_csv(
        f_name,
        usecols=[0, 1, 9, 12],
        compression="zip",
        dtype={0: str, 1: str, 9: np.float32, 12: np.float32},
        header=None,
    )

    # renaming the columns
    filter_columns = ["ticker", "date", "Adj_Open", "Adj_Close"]
    full_data.columns = filter_columns
    full_data.set_index("date", inplace=True)
    full_data.index = pd.to_datetime(full_data.index)
  3. Map ticker names to Bloomberg tickers using Numerai’s Bloomberg ticker map.
    full_data = full_data[full_data.ticker.isin(common_tickers)]
    full_data["bloomberg_ticker"] = full_data.ticker.map(
        dict(zip(ticker_map["yahoo"], ticker_map["bloomberg_ticker"]))
    )
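The snippets above use `common_tickers` without showing how it is built. A minimal sketch of the first step, on toy data, could look like this. The `eod_tickers` series and the toy ticker values are made up for illustration; only the `yahoo` and `bloomberg_ticker` column names come from the code above.

```python
import pandas as pd

# Toy stand-ins: the real inputs would be the EOD feed's ticker list
# and Numerai's ticker map (values here are illustrative only).
eod_tickers = pd.Series(["AAPL", "MSFT", "ZZZZ"])
ticker_map = pd.DataFrame(
    {
        "yahoo": ["AAPL", "MSFT", "7203.T"],
        "bloomberg_ticker": ["AAPL US", "MSFT US", "7203 JP"],
    }
)

# Tickers present in both the EOD feed and the Signals universe.
common_tickers = sorted(set(eod_tickers) & set(ticker_map["yahoo"].dropna()))

# Lookup table used when mapping to Bloomberg tickers.
yahoo_to_bloomberg = dict(zip(ticker_map["yahoo"], ticker_map["bloomberg_ticker"]))
```

A set intersection keeps the filter cheap even for the full EOD file, since `isin` then only has to check against the shared tickers.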

After a day_chg column is created and RSI and SMA are applied, the features are quintiled and lagged as in the main example_model.py.
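The indicator step might be sketched as below. Several things here are assumptions: `day_chg` is taken to be the open-to-close return, the indicators are applied to `day_chg` only, and the RSI uses simple rolling averages of gains and losses rather than whatever smoothing example_model.py uses. The helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def sma(series: pd.Series, window: int) -> pd.Series:
    """Simple moving average over `window` rows."""
    return series.rolling(window).mean()


def rsi(series: pd.Series, window: int) -> pd.Series:
    """RSI from simple rolling means of gains and losses (assumed variant)."""
    delta = series.diff()
    gains = delta.clip(lower=0).rolling(window).mean()
    losses = (-delta.clip(upper=0)).rolling(window).mean()
    rs = gains / losses
    return 100 - 100 / (1 + rs)


def add_indicators(full_data: pd.DataFrame, windows=(14, 21)) -> pd.DataFrame:
    out = full_data.copy()
    # Assumption: day_chg is the relative move from adjusted open to adjusted close.
    out["day_chg"] = out["Adj_Close"] / out["Adj_Open"] - 1
    grouped = out.groupby("ticker")["day_chg"]
    for w in windows:
        # transform keeps the original row order, so positional assignment is safe
        # even though dates repeat across tickers in the index.
        out[f"day_chg_RSI_{w}"] = grouped.transform(lambda s: rsi(s, w)).to_numpy()
        out[f"day_chg_SMA_{w}"] = grouped.transform(lambda s: sma(s, w)).to_numpy()
    return out
```

Grouping by ticker before rolling keeps one stock's history from leaking into the next stock's window.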

Validation results:

Thanks @_liamhz for the feedback :)


nice! not a bad result for such simple data.

you say this is premium, how much does this data cost you per month?

(also did you train on validation for this great cumsum graph?)

This one costs me USD 49/month, though I guess some organizational licensing is also available.

This wasn’t trained on validation data. However, it has some extra features compared to the default yfinance-based example_model.py:

  • a day_change column
  • RSI: (14, 21)
  • SMA: (14, 21)
  • number of quantile bins raised from 5 to 100

We get 72 features after computing lags.
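For reference, a sketch of per-date quantile binning with a configurable bin count, followed by per-ticker lags. The function names, the use of `pd.qcut`, and the default `num_lags=5` are assumptions for illustration, not the exact code behind the 72 features. It expects the frame to be indexed by `date` with a `ticker` column.

```python
import pandas as pd


def quantile_bin_by_date(df: pd.DataFrame, column: str, bins: int = 100) -> pd.Series:
    """Bin `column` into up to `bins` quantile buckets within each date."""
    return df.groupby(level="date")[column].transform(
        lambda x: pd.qcut(x, bins, labels=False, duplicates="drop")
    )


def add_lags(df: pd.DataFrame, column: str, num_lags: int = 5) -> pd.DataFrame:
    """Add lagged copies of `column`, shifted within each ticker."""
    out = df.copy()
    for lag in range(1, num_lags + 1):
        shifted = out.groupby("ticker")[column].shift(lag)
        out[f"{column}_lag_{lag}"] = shifted.to_numpy()
    return out
```

With `bins=5` this gives quintiles; `bins=100` gives the finer binning listed above. `duplicates="drop"` merges buckets when a date has too few distinct values to fill every bin.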
