Signals Starter at Kaggle

Hi all :D

I made a starter for Numerai Signals on Kaggle some time ago. The purpose was to encourage more people to join Signals. This is another Signals starter, using free stock price data from YFinance, which I also update daily as a Kaggle dataset.


[NumeraiSignals] Starter for Beginners


YFinance Stock Price Data for Numerai Signals

Although this end-to-end notebook uses only a minimal set of features and modeling techniques, it has shown acceptable performance so far.

By adding more features and/or using more sophisticated modeling, you can do even better.


I needed to change my Kaggle dataset a bit to avoid potential out-of-memory (OOM) issues. If you use my Kaggle dataset for Numerai Signals, please see the following discussion for what has changed:

Presumably all you need to change in your pipeline is to use pd.read_parquet instead of pd.read_csv.
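If your pipeline has to handle both the old CSV file and the new parquet file, a small loader that picks the reader from the file extension is one option. This is only a sketch: `load_prices` is a hypothetical helper name, not something from the notebook, and `pd.read_parquet` needs a parquet engine such as pyarrow or fastparquet installed.

```python
from pathlib import Path

import pandas as pd


def load_prices(path):
    """Load a price table, picking the pandas reader from the file extension.

    Hypothetical helper for illustration; it is not part of the notebook.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".parquet":
        # Requires a parquet engine (pyarrow or fastparquet) to be installed.
        return pd.read_parquet(path)
    if suffix == ".csv":
        return pd.read_csv(path)
    raise ValueError(f"Unsupported file type: {path.name}")
```

With the notebook's config this would be called as `df = load_prices(f"{CFG.INPUT_DIR}/full_data.parquet")`.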

Hi Katsu1110, thanks for sharing the code.

I am running your code cell by cell in Colab. Cell 13 tries to fetch the data from Kaggle:

df = pd.read_parquet(pathlib.Path(f'{CFG.INPUT_DIR}/full_data.parquet'))

And I get this error:

FileNotFoundError: …/input/yfinance-stock-price-data-for-numerai-signals/full_data.parquet

Am I missing anything in the configuration section?



Hi autratec,

If you use the notebook outside Kaggle, you need to place the stock price data from yfinance in your environment.

If you haven’t downloaded the data to your Colab environment yet, download it and then edit the CFG class in the notebook so that it points to your data path.
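As a rough sketch of what that change could look like, the config below uses the Kaggle dataset folder when it exists and falls back to a local folder otherwise. The names `resolve_input_dir`, `SIGNALS_DATA_DIR`, and `./data` are made up for this example, and the real CFG class in the notebook may have more fields; only `CFG.INPUT_DIR` and the dataset folder name come from this thread.

```python
import os
from pathlib import Path

# Directory under which Kaggle mounts attached datasets.
KAGGLE_INPUT = Path("/kaggle/input")
# Dataset folder name, taken from the error message earlier in this thread.
DATASET_SLUG = "yfinance-stock-price-data-for-numerai-signals"


def resolve_input_dir(local_dir="./data"):
    """Return the Kaggle dataset folder if it exists, else a local folder."""
    kaggle_dir = KAGGLE_INPUT / DATASET_SLUG
    if kaggle_dir.is_dir():
        return kaggle_dir
    # Outside Kaggle (e.g. Colab): SIGNALS_DATA_DIR is a hypothetical
    # environment variable for this sketch; local_dir is the default.
    return Path(os.environ.get("SIGNALS_DATA_DIR", local_dir))


class CFG:
    # Only INPUT_DIR is shown; keep any other fields the notebook defines.
    INPUT_DIR = resolve_input_dir()
```

On Colab you would put `full_data.parquet` in `./data` (or wherever `SIGNALS_DATA_DIR` points) before running the loading cell.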

Thanks for the quick reply. I tried using the API to download the Yahoo data. It took about an hour, but it worked. Unfortunately, my Colab session crashed in the later steps because it ran out of resources.

I need to figure out a way to reduce the memory usage. By the way, could you put together a light version of the dataset, e.g. only the S&P 500, rather than the whole set of about 5K stocks, which is pretty heavy for free environments? Just a thought, and I hope it can work.
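For reference, two common ways to cut memory in pandas are keeping only a subset of tickers and downcasting 64-bit floats to 32-bit. The sketch below assumes the ticker column is called `ticker`, which may not match the dataset, and `shrink` is a hypothetical helper name. Note that float32 halves the memory of those columns at the cost of some precision.

```python
import pandas as pd


def shrink(df, tickers=None, ticker_col="ticker"):
    """Return a smaller copy: optional ticker filter, then float64 -> float32.

    ticker_col="ticker" is an assumed column name; adjust it to your data.
    """
    if tickers is not None:
        # Keep only the rows for the requested tickers.
        df = df[df[ticker_col].isin(tickers)]
    # Downcast only float64 columns; other dtypes are left untouched.
    float_cols = df.select_dtypes(include="float64").columns
    return df.astype({col: "float32" for col in float_cols})
```

You can compare `df.memory_usage(deep=True).sum()` before and after to see how much this saves on your data.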

Kaggle notebooks are also free, you know. You can simply run my notebook there without problems.

As for the S&P 500, I have another dataset covering world indices, which you might be interested in.

Katsu1110, I have just tried the Kaggle notebook. The whole script runs fine, except for the final submission section. I made some minor changes and was able to submit. Thanks again for sharing the code.
