Trouble understanding datasets


This may be a trivial question, but I do not understand what “t_id” is, or why it is in the tournament set and not in the training set…


Do you mean the era column?


If you’re seeing a t_id column, then you either have a very old dataset or you are looking at an old example/walkthrough. Both the training and tournament sets now have an “id” column, whereas it used to be the case that only the tournament dataset had a “t_id” column. This is just an identifier for each data point; it is only useful to Numerai, who use it to map our predictions back to the actual data points.
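If you are not sure which version of the data you have, a quick look at the header row settles it. The snippet below is only a rough sketch: the CSV text is made-up toy data standing in for a downloaded file, `id_column` is a hypothetical helper, and the feature column name is an assumption.

```python
import csv
import io


def id_column(header):
    """Return the identifier column name found in a header row, or None."""
    for name in ("id", "t_id"):
        if name in header:
            return name
    return None


# Toy CSV text standing in for real files; all values here are invented.
old_style = io.StringIO("t_id,feature1\n1,0.5\n")
new_style = io.StringIO("id,era,feature1,target\nn2b2e3dd163cb422,era1,0.5,1\n")

# Only the first row (the header) is needed to tell the versions apart.
old_header = next(csv.reader(old_style))
new_header = next(csv.reader(new_style))

print(id_column(old_header))
print(id_column(new_header))
```

If this reports “t_id”, you are most likely working from an old file or an old walkthrough.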

  1. Do all those ids (e.g. n2b2e3dd163cb422) refer to different assets (stocks, bonds, etc.)?
  2. So what is target?


Yes, I also do not know what era is…


Ignore id. It is not relevant to participants in any way, other than that it needs to be included in your predictions file.

Target is what you’re trying to predict.

Era is some minimal time information: all samples from the same era come from roughly the same time period, and samples from different eras come from different time periods. No official information has really been provided beyond that.
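To make the three columns concrete, here is a minimal sketch on toy rows. Everything except the column names id, era and target is an assumption: the ids other than the one quoted above, the era labels, the feature name, the values, and the “prediction” header of the output file are all invented for illustration, and the “model” is just a placeholder that copies a feature.

```python
import csv
import io
from collections import defaultdict

# Toy rows; real data has many more columns and rows.
rows = [
    {"id": "n2b2e3dd163cb422", "era": "era1", "feature1": 0.25, "target": 0},
    {"id": "a", "era": "era1", "feature1": 0.75, "target": 1},
    {"id": "b", "era": "era2", "feature1": 0.50, "target": 1},
]

# Era: rows sharing an era label belong to roughly the same time period,
# so it is natural to group by it (e.g. to evaluate a model era by era).
by_era = defaultdict(list)
for row in rows:
    by_era[row["era"]].append(row)
era_sizes = {era: len(group) for era, group in by_era.items()}

# Id: never used as a model input, only carried through to the output
# so each prediction can be matched back to its row.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["id", "prediction"])  # "prediction" header is an assumption
for row in rows:
    # Placeholder "model": reuse the feature value as the prediction.
    writer.writerow([row["id"], row["feature1"]])
predictions_csv = out.getvalue()

print(era_sizes)
print(predictions_csv)
```

The point is only the roles: target is what the model is fit against, era is used for grouping, and id is passed straight through to the predictions file.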