A very common question. And yes, that’s exactly what it means – you cannot track a specific stock from era to era – the ids are essentially just random one-time-use identifiers. (If you are dying to do time-series analysis, take a look at the Signals side of things, but you have to bring your own data in that case.)
This seems to be done deliberately to prevent historical or time-series analysis. I think most noobs (like me!) come into this assuming that time-series analysis is the whole point. But I have come to understand that it is not! You have some features and an outcome (the target). Use features at era X to guess the target at era X. Or perhaps more accurately, use features at era X to guess the target at era X+δX. Your model will then be used on features at time Y to determine how the hedge fund invests at Y+δY. I assume this is the hedge fund’s strategy: we have some data right now, so can it tell us what will happen to a stock price in the near future? They are not trying to build a ‘case’ over time, with more historical data improving the prediction.
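To make the "features at era X predict the target for era X" idea concrete, here is a minimal sketch on toy data. Everything in it is invented for illustration (the three features, the weights, the noise level, the plain least-squares model); it is not the real dataset or anyone's actual method. The point is only that rows from all training eras are pooled and the ids are never used.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 3
true_w = np.array([0.5, -0.2, 0.1])  # invented weights for the toy data

def make_era(n_rows):
    """Generate one toy era: a feature matrix and a noisy target."""
    X = rng.normal(size=(n_rows, n_features))
    y = X @ true_w + rng.normal(scale=0.1, size=n_rows)
    return X, y

# Pool the rows of every training era; no ids, no tracking across eras.
train_eras = [make_era(200) for _ in range(5)]
X_train = np.vstack([X for X, _ in train_eras])
y_train = np.concatenate([y for _, y in train_eras])

# Fit one cross-sectional model (plain least squares, as a stand-in).
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# "Live" era: predict from that era's features alone.
X_live, y_live = make_era(200)
live_preds = X_live @ w
```

Any model that maps a row of features to a prediction would slot in where the least-squares fit is; the structure (pool rows, fit, predict a later era) is the part that matters.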
Yes, it is best just to forget all about the stock market when you are first making models. It is a black box problem with unknown features. Trying to apply financial domain knowledge will just frustrate you because you can’t.
Eras are in chronological order, which lets you use a walk-forward approach to validation. You may not be able to break them down into individual tickers, but you can still apply time-series thinking to your development.
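A walk-forward split over eras might look something like the sketch below. The function name and its parameters are hypothetical, and the `embargo` gap is my own addition (an assumption, useful only if targets overlap neighbouring eras); the one thing taken from the post above is that training eras always come before test eras.

```python
def walk_forward_splits(eras, n_splits=3, min_train=2, embargo=0):
    """Yield (train_eras, test_eras) pairs where training always precedes testing.

    eras:      era labels, already in chronological order
    min_train: number of eras reserved for the first training window
    embargo:   eras dropped between train and test (hypothetical, default none)
    """
    eras = list(eras)
    n_test = len(eras) - min_train
    if n_splits < 1 or n_test < n_splits:
        raise ValueError("not enough eras for the requested number of splits")
    fold = n_test // n_splits
    for k in range(n_splits):
        test_start = min_train + k * fold
        # The last fold absorbs any leftover eras.
        test_end = len(eras) if k == n_splits - 1 else test_start + fold
        train_end = max(test_start - embargo, 0)
        yield eras[:train_end], eras[test_start:test_end]


eras = [f"era{i}" for i in range(1, 9)]
for train, test in walk_forward_splits(eras, n_splits=3, min_train=2):
    print(f"train on {train[0]}..{train[-1]} -> test on {test[0]}..{test[-1]}")
```

The training window here is expanding (it always starts at the first era). A fixed-length rolling window is an equally reasonable choice.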
Additionally, these groupings give you insight into when your model does well and when it doesn’t. Building separate models on difficult eras and adding them to your ensemble may help you generalize better in the future.
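One rough way to find the "difficult" eras is to score predictions era by era and flag the ones that fall below some cutoff. The sketch below assumes the score is the correlation between rank-transformed predictions and the target; that metric, the function names, and the zero threshold are all assumptions for illustration, not something stated in this thread.

```python
import numpy as np

def per_era_scores(eras, preds, targets):
    """Return {era: correlation of ranked predictions with targets}."""
    eras = np.asarray(eras)
    preds = np.asarray(preds, dtype=float)
    targets = np.asarray(targets, dtype=float)
    scores = {}
    for era in dict.fromkeys(eras.tolist()):  # unique eras, original order kept
        mask = eras == era
        # Ordinal ranks of the predictions within this era only.
        ranked = np.argsort(np.argsort(preds[mask])).astype(float)
        scores[era] = float(np.corrcoef(ranked, targets[mask])[0, 1])
    return scores

def hard_eras(scores, threshold=0.0):
    """Eras whose score is below the (hypothetical) threshold."""
    return [era for era, score in scores.items() if score < threshold]


# Toy data: predictions line up with the target in era1 and are reversed in era2.
eras = ["era1"] * 4 + ["era2"] * 4
targets = [0.0, 0.25, 0.75, 1.0, 0.0, 0.25, 0.75, 1.0]
preds = [0.1, 0.2, 0.3, 0.4, 0.9, 0.7, 0.4, 0.2]
scores = per_era_scores(eras, preds, targets)
print(scores, hard_eras(scores))
```

The eras returned by `hard_eras` would be the ones to train the extra model(s) on. Whether that actually helps out of sample is something to check with validation rather than assume.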
Finally, if you want to dig deeper, modeling the eras individually can reveal differences in feature influence (FI). You may find that eras grouped by FI perform better or worse with different levels of feature neutralization (FN). Weaving models with varying FN into the ensemble above could affect generalization as well as MMC (meta-model contribution).
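For anyone unsure what "levels of neutralization" means in practice, here is a minimal sketch of one common formulation: subtract a proportion of the part of the predictions that a linear regression on the features can explain. The function name, the intercept column, and the `proportion` parameter are my assumptions for illustration; treat it as a sketch of the idea rather than the official implementation.

```python
import numpy as np

def neutralize(preds, features, proportion=1.0):
    """Remove `proportion` of the linear feature exposure from `preds`.

    proportion=0.0 leaves predictions untouched; proportion=1.0 leaves
    only the residual that the features cannot explain linearly.
    """
    preds = np.asarray(preds, dtype=float)
    F = np.asarray(features, dtype=float)
    # Add an intercept column so the mean is handled by the fit as well.
    F = np.column_stack([F, np.ones(len(F))])
    coef, *_ = np.linalg.lstsq(F, preds, rcond=None)
    return preds - proportion * (F @ coef)


# Toy data: predictions that lean heavily on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
p = 2.0 * X[:, 0] + rng.normal(size=500)

for prop in (0.0, 0.5, 1.0):
    exposure = np.corrcoef(neutralize(p, X, prop), X[:, 0])[0, 1]
    print(f"proportion={prop}: correlation with feature 0 = {exposure:.3f}")
```

In an era-based setup this would normally be applied within each era separately, so a model "with varying FN" is just the same predictions passed through different `proportion` values.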
Admittedly I am new and learning, but the way I see it is this… each era can be considered an independent sample of data with many features that result in a target.
Say I want to predict what kind of vehicle is coming down my road next. It is far away, so I can only determine some coarse-grained properties. I can see its color, I can see how fast it is going, I can see if its exhaust is clear or sooty. Yellow, slow, sooty features suggest the next vehicle will be a school bus (in North America anyway). If those features were red, fast, and clear exhaust, that would suggest a Ferrari. Now there is no way this data will tell me what the vehicle after the Ferrari is, so there is no point in trying to model yellow->red and slow->fast and soot->clean.
So, there is probably no pattern in the sequence of vehicles as they come down my road (maybe there is; you can always look for one). But since the Tournament is arranged the way it is, my guess is that the underlying assumption is that stock prices are best predicted by current features, not long-term feature patterns. The hedge fund people probably know what they are doing. Then again, Signals is now a thing, and I know nothing about it, so that is a whole different box I’m not ready to open yet.
You are assuming, though, that such time-based things are not already baked into the features, and they may well be, at least some of them. Also, sometime soon the number of features is going to explode (10x), with various versions of the same features and maybe some truly novel ones.