Quandl example model: example_model_quandl.py
Google Colab notebook: Signals_Quandl_EOD_baseline.ipynb
Quandl is a marketplace for financial, economic, and alternative data, offering both premium and free datasets.
One such data source is End of Day US Stock Prices by QuoteMedia (premium, so an API_KEY must be set in example_model_quandl.py).
Updated daily, this data feed offers end of day prices, dividends, adjustments and splits for US publicly traded stocks with history to 1996. Prices are provided both adjusted and unadjusted.
This example downloads the whole ‘Time-series’ dataset as a .zip file and loads the Adj_Open and Adj_Close columns from it. Alternatively, specific tickers over a specific time span can be loaded iteratively through the API (much slower); see Getting started with the API.
While the feature extraction and modeling parts are very similar to the main example_model.py, the focus here is on making the data loading flexible, so that different data sources can easily be plugged in.
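One way to keep the loading step pluggable is to agree on the DataFrame layout every loader must produce and validate it in one place. The sketch below is illustrative, not from the example script; `REQUIRED_COLUMNS`, `validate_loaded_data`, and `toy_loader` are hypothetical names standing in for whatever convention the script actually uses.

```python
import pandas as pd

# Columns the downstream feature-extraction code expects (assumed layout):
# a date index plus ticker and adjusted open/close prices.
REQUIRED_COLUMNS = ["ticker", "Adj_Open", "Adj_Close"]

def validate_loaded_data(df: pd.DataFrame) -> pd.DataFrame:
    """Check that a loader's output matches the expected layout."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"loader output is missing columns: {missing}")
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError("loader output must be indexed by date")
    return df

def toy_loader() -> pd.DataFrame:
    """A stand-in for a real loader such as download_full_and_load."""
    df = pd.DataFrame({
        "ticker": ["AAPL", "AAPL"],
        "date": ["2021-01-04", "2021-01-05"],
        "Adj_Open": [133.52, 128.89],
        "Adj_Close": [129.41, 131.01],
    }).set_index("date")
    df.index = pd.to_datetime(df.index)
    return df

validate_loaded_data(toy_loader())  # passes silently
```

Any new data source then only needs a loader that satisfies this contract; nothing downstream changes.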
Steps to re-arrange the data as in Signals’ main example_script.py
- Find common tickers between EOD data source ticker list and Numerai Signals Universe’s yahoo tickers.
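The intersection itself is a one-liner; the sketch below uses illustrative stand-in lists in place of the real EOD ticker list and the ticker map's yahoo column.

```python
# Hypothetical sketch: intersect the EOD source's ticker list with the
# "yahoo" tickers from Numerai's ticker map. Inputs are toy stand-ins.
eod_tickers = ["AAPL", "MSFT", "XYZ"]             # tickers in the EOD dump
signals_yahoo_tickers = ["AAPL", "MSFT", "GOOG"]  # from the Numerai ticker map

common_tickers = set(eod_tickers) & set(signals_yahoo_tickers)
print(sorted(common_tickers))  # ['AAPL', 'MSFT']
```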
- Specify the columns in download_full_and_load with common tickers and rename the columns as required by the feature-extraction setup.
```python
import numpy as np
import pandas as pd

# column names in the csv file (the file itself has no header row)
cols = [
    "ticker", "date", "Open", "High", "Low", "Close", "Volume", "Dividend",
    "Split", "Adj_Open", "Adj_High", "Adj_Low", "Adj_Close", "Adj_Volume",
]

# usecols refers to column positions in the csv:
# loading only [ticker, date, Adj_Open, Adj_Close], floats as FP32
print("loading from csv...")
full_data = pd.read_csv(
    f_name,  # path to the downloaded .zip archive
    usecols=[0, 1, 9, 12],
    compression="zip",
    dtype={0: str, 1: str, 9: np.float32, 12: np.float32},
    header=None,
)

# renaming the columns
filter_columns = ["ticker", "date", "Adj_Open", "Adj_Close"]
full_data.columns = filter_columns
full_data.set_index("date", inplace=True)
full_data.index = pd.to_datetime(full_data.index)
```
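The same loading pattern can be exercised without downloading the archive, on a small in-memory, header-less CSV with the same 14-column layout (toy values, illustrative only):

```python
import io

import numpy as np
import pandas as pd

# Two toy rows in the QuoteMedia column order (no header row).
csv_text = (
    "AAPL,2021-01-04,133.52,133.61,126.76,129.41,143301900,0.0,1.0,"
    "133.52,133.61,126.76,129.41,143301900\n"
    "AAPL,2021-01-05,128.89,131.74,128.43,131.01,97664900,0.0,1.0,"
    "128.89,131.74,128.43,131.01,97664900\n"
)

# Same positional usecols/dtype trick as above, minus the zip compression.
demo_data = pd.read_csv(
    io.StringIO(csv_text),
    usecols=[0, 1, 9, 12],
    dtype={0: str, 1: str, 9: np.float32, 12: np.float32},
    header=None,
)
demo_data.columns = ["ticker", "date", "Adj_Open", "Adj_Close"]
demo_data.set_index("date", inplace=True)
demo_data.index = pd.to_datetime(demo_data.index)
print(demo_data.dtypes["Adj_Close"])  # float32
```

Loading only four of the fourteen columns, as FP32, keeps memory usage manageable for the full multi-gigabyte file.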
- Map ticker names to Bloomberg tickers using Numerai’s Bloomberg ticker map.
```python
# keep only tickers present in both universes, then map to Bloomberg tickers
full_data = full_data[full_data.ticker.isin(common_tickers)]
full_data["bloomberg_ticker"] = full_data.ticker.map(
    dict(zip(ticker_map["yahoo"], ticker_map["bloomberg_ticker"]))
)
```
After creating a day_chg column and applying RSI and SMA to it, features are quintiled and lags are calculated as in the main example_model.py.
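The quintiling and lagging steps can be sketched as follows. The exact day_chg formula is assumed here (close over open minus one); the real script's definition, RSI/SMA windows, and column names may differ.

```python
import numpy as np
import pandas as pd

# Toy per-ticker price data (illustrative values).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "bloomberg_ticker": np.repeat(["AAPL US", "MSFT US"], 30),
    "Adj_Open": rng.uniform(90, 110, 60),
    "Adj_Close": rng.uniform(90, 110, 60),
})

# Daily change (definition assumed, not taken from the example script).
df["day_chg"] = df["Adj_Close"] / df["Adj_Open"] - 1

# Quintile the feature: 5 equal-frequency bins labelled 0..4.
df["day_chg_quintile"] = pd.qcut(df["day_chg"], 5, labels=False)

# Lag the quintiled feature per ticker, e.g. its value 5 rows back.
df["day_chg_quintile_lag5"] = (
    df.groupby("bloomberg_ticker")["day_chg_quintile"].shift(5)
)
print(df["day_chg_quintile"].nunique())  # 5
```

Lagging within each ticker group (rather than over the whole frame) is what prevents one ticker's history from leaking into another's features.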
Validation results:
Thanks @_liamhz for the feedback