V4 data release - questions

Hi,

  • I downloaded the new v4 data and the last era (era 1004) has its target filled with NaNs.
    Does era 1004 correspond to round 310? In other words, is era 1004 the current era?

  • Also, are there any missing weeks? Is it always true that era(X) and era(X+1) correspond to consecutive weeks?

  • Another thing I found is that, compared to V3, some eras have a different number of instances. Is that OK? (I checked only the validation data.)

era  V3   V4
0871 4910 4911
0872 4918 4919
0873 4932 4933
0875 5051 5052
0902 4997 5002
0912 5174 5182
0913 5001 5009
0914 5203 5212
0915 5185 5193
0916 5183 5191
0936 5192 5191
  • How does the mapping of features in features.json work? I wanted to reuse my previous research on features so that I wouldn't have to start over; however, I found only an array of features inside the JSON file, under json.load(open('features.json'))['feature_sets']['v3_equivalent_features']. I assumed it would match the ordering of features in the V3 dataset, but when I run a correlation test on each pair, the match is not exact:
0.9930286498066472
0.9973728390894694
0.9618079870303051
0.9936341791676602
0.9831262504964846
0.9934326084172069
0.9990904802940008
0.9916135182068833
1.0
1.0
1.0
0.9996968267646669
0.998989490641361
0.9987875113360812
0.9989895927800676
0.9989895927800676
1.0
0.9937375180105527
0.9704001245800853
0.9956573427977774
0.9165522644319917
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
1.0
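
For anyone who wants to repeat this kind of check, here is a minimal sketch of a pairwise correlation test between two feature lists. It assumes both frames are indexed by the same row ids; the function name paired_correlations, the column names, and the toy data are all made up for illustration and are not the real V3/V4 feature names.

```python
import numpy as np
import pandas as pd

def paired_correlations(v3, v4, v3_cols, v4_cols):
    """Pearson correlation of v3_cols[i] with v4_cols[i] on rows present in both frames."""
    shared = v3.index.intersection(v4.index)
    pairs = list(zip(v3_cols, v4_cols))
    values = [v3.loc[shared, a].corr(v4.loc[shared, b]) for a, b in pairs]
    return pd.Series(values, index=[f"{a} vs {b}" for a, b in pairs])

# Toy data standing in for the real parquet files (hypothetical column names).
rng = np.random.default_rng(0)
idx = [f"id{i}" for i in range(200)]
v3 = pd.DataFrame(
    {"old_a": rng.integers(0, 5, 200) / 4, "old_b": rng.integers(0, 5, 200) / 4},
    index=idx,
)
v4 = pd.DataFrame({"new_a": v3["old_a"].copy(), "new_b": v3["old_b"].copy()}, index=idx)
v4.iloc[:10, 1] = 0.5  # pretend a few values of the second feature were revised

corrs = paired_correlations(v3, v4, ["old_a", "old_b"], ["new_a", "new_b"])
print(corrs)
```

Restricting the comparison to shared row ids matters here, since the per-era counts above show that the two versions do not contain exactly the same rows.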

Thank you!

Sneaky


Another question: Will there be int8 versions for V4?

@sneaky GitHub - miciasto/numerai


It is there; I managed to download it via the API. I am AFK, but I think you just need to add the suffix _int8.
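
A small sketch of what "add the suffix _int8" could look like in practice. The helper int8_variant is hypothetical, and the file name v4/train.parquet is an assumption about how the datasets are named; the "v4/" prefix follows the download_dataset call used elsewhere in this thread.

```python
from pathlib import PurePosixPath

def int8_variant(dataset: str) -> str:
    """Insert the `_int8` suffix before the file extension of a dataset name."""
    p = PurePosixPath(dataset)
    return str(p.with_name(f"{p.stem}_int8{p.suffix}"))

print(int8_variant("v4/train.parquet"))  # v4/train_int8.parquet

# With NumerAPI the resulting name would be passed to download_dataset, e.g.
#   napi.download_dataset(int8_variant("v4/train.parquet"), "train_int8.parquet")
# This assumes such a file exists on the server; napi.list_datasets() shows
# which names are actually available.
```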

How can you specify in the API to download v4 instead of v3?

@mundan see https://numer.ai/data/v4


This link should definitely be added to the Numerai Tournament Overview.

thanks! I ended up there

Here are the shapes of the v3 and v4 data, for v3/train, v3/val, v3/tour, v4/train, and v4/val respectively:

 (2412105, 1073)  # v3/train
  (539658, 1073)  # v3/val
 (1412927, 1073)  # v3/tour
 (2420521, 1214)  # v4/train
 (2203644, 1214)  # v4/val

v4/val has some test entries, but most of its rows are validation entries (2176973 of the 2203644 rows have targets)
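
A minimal sketch of how such a count can be obtained with pandas. The column names "era" and "target" are assumptions about the file layout, and the tiny frame below only stands in for the real validation data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the validation file: the last era has no targets yet.
val = pd.DataFrame(
    {
        "era": ["1002", "1003", "1004", "1004"],
        "target": [0.5, 0.25, np.nan, np.nan],
    }
)

# Rows that carry a target, overall and per era.
labeled_rows = int(val["target"].notna().sum())
labeled_per_era = val["target"].notna().groupby(val["era"]).sum()

# Eras in which every target is NaN.
unlabeled_eras = labeled_per_era[labeled_per_era == 0].index.tolist()

print(labeled_rows)
print(unlabeled_eras)
```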

No more unlabeled data then!


Just to settle this question, because I was wondering the same thing:

  • I downloaded the new v4 data and the last era (era 1004) has its target filled with NaNs.
    Does era 1004 correspond to round 310? In other words, is era 1004 the current era?

which means:

  • the last era that is shipped in validation data corresponds to what was shipped as live data the week before
  • in other words: era = round + 695
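
The relation above written out as a small sketch. The offset 695 is taken from this post and is not an official constant; the function names are made up for illustration.

```python
ERA_OFFSET = 695  # from the post above; an observed offset, not an official constant

def round_to_era(round_number: int) -> int:
    """Era number for the live data of a given round, under the relation above."""
    return round_number + ERA_OFFSET

def era_to_round(era: int) -> int:
    """Inverse mapping: the round whose live data became this era."""
    return era - ERA_OFFSET

print(era_to_round(1004))  # 309
print(round_to_era(309))   # 1004
```

Under this reading, era 1004 maps to round 309, the round before 310, which is consistent with the last validation era being last week's live data.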

Is it possible that the JSON file “v4/features.json” contains invalid JSON?

I am downloading the file via NumerAPI:
napi.download_dataset("v4/features.json", "features.json")

When trying to parse the file using:

import json

with open(".../<path to feature file>/v4/features.json", "r") as f:
    feature_metadata = json.load(f)
features = feature_metadata["feature_sets"]["v3_equivalent_features"]

I get the following error:

JSONDecodeError("Extra data", s, end)
JSONDecodeError: Extra data

Also, the Chrome extension “{JSON} Editor” tells me that the JSON is invalid.

Am I missing something here?
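
For context, "Extra data" is what Python's json module raises when it parses one complete JSON value and then finds more non-whitespace content after it. One way to see what follows the first value is json.JSONDecoder.raw_decode, which returns the parsed value together with the position where it ended. The sketch below uses a made-up sample string, not the real features.json, and the helper name split_first_json is hypothetical; it does not explain why a given file has trailing content.

```python
import json

def split_first_json(text: str):
    """Return (first JSON value, whatever text follows it)."""
    decoder = json.JSONDecoder()
    stripped = text.lstrip()  # raw_decode does not skip leading whitespace
    obj, end = decoder.raw_decode(stripped)
    return obj, stripped[end:].strip()

# Made-up sample: a valid object followed by a second, unexpected object.
sample = '{"feature_sets": {"small": ["f1", "f2"]}}\n{"unexpected": true}'

obj, rest = split_first_json(sample)
print(obj["feature_sets"]["small"])  # ['f1', 'f2']
print(repr(rest))                    # '{"unexpected": true}'
```

Running the same helper on the contents of the downloaded file would show whether the first value is already the full feature metadata and what the trailing part looks like.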