Numerapi command line interface with new data format

Hi - I’m using uuazed’s numerapi command line interface to download the Numerai data (scripted). I cannot really find the right option to change to the new data format. In fact, ideally I’d like to download only the tournament parquet file (as the other files don’t change that often). Can somebody help me out here?

import numerapi

napi = numerapi.NumerAPI()
napi.download_dataset("numerai_tournament_data.parquet")

This is what works for me!

Indeed, the cli interface doesn’t work with the new data yet. I’ve ticketed it and will take a look soonish. Of course, PRs are welcome :wink:

Nice. Does uploading work with both old and new datasets?

(Small comment BTW: the warning says “use download_dataset_old” but should say “use download-dataset-old” (for command line users))

Thanks for pointing out the typo, I fixed that. Uploading now also works for both datasets when using the command line interface. You need to add the --new-data flag.

Hi - Is it planned to implement the cli interface for the v4 data also?

Have you tried? Anything not working?

Yes.

numerapi download-dataset --filename numerai_tournament_data.parquet

works, but

numerapi download-dataset --filename "v3/numerai_tournament_data.parquet"
numerapi download-dataset --filename "v4/live.parquet"
numerapi download-dataset --filename live.parquet

do not (with or without quotes). Also, I wouldn’t know which switch to use for submitting, as “--new_data” obviously assumes version v3, not v4. I haven’t tried that yet though.

I double-checked and made some tweaks to numerapi. Both downloading and uploading should now work via the cli.

  • re downloading: Check that the filename is valid by calling numerapi --list-datasets. If the filename contains a directory, like v3, it tries downloading to a directory as well. The latest numerapi version will ensure directories exist. The other workaround is to specify the dest_path explicitly or to create the directory manually.
  • re uploading: that --new_data flag is no longer needed and uploading works the same across all data versions. In fact, the live ‘ids’ are the same for v2, v3 and v4.
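
For scripted use from Python, the download workaround above could look roughly like the sketch below. The helper name download_with_dirs and its dest_dir parameter are made up for illustration. It assumes the client object offers list_datasets() and download_dataset(filename, dest_path), which is how I understand the NumerAPI class, so please check against your installed version.

```python
import os


def download_with_dirs(napi, filename, dest_dir="."):
    """Download `filename` via `napi`, creating missing directories first.

    `napi` is assumed to expose list_datasets() and
    download_dataset(filename, dest_path); this helper is hypothetical
    and not part of numerapi itself.
    """
    # Check the name against what the API reports as available.
    available = napi.list_datasets()
    if filename not in available:
        raise ValueError(f"{filename!r} is not in the list of available datasets")

    # A name like "v4/live.parquet" implies a "v4" sub-directory.
    dest_path = os.path.join(dest_dir, filename)
    parent = os.path.dirname(dest_path)
    if parent:
        os.makedirs(parent, exist_ok=True)

    # Pass dest_path explicitly so the target location is unambiguous.
    napi.download_dataset(filename, dest_path)
    return dest_path
```

With the real client this would be called as download_with_dirs(numerapi.NumerAPI(), "v4/live.parquet"). On recent numerapi versions the directory step should be redundant, since the library creates the directories itself.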

Thanks. Creating the directory beforehand did the trick. I also hadn’t upgraded to the latest version (which then, as you say, doesn’t need the directory anymore). Shame on me. Will try the submission on Saturday.