Missing values in targets

Hi all,

Just wanted to let you know that, at the moment, there appear to be missing values in nearly all targets, in both the training and the validation data.

Furthermore, symmetry is no longer guaranteed for many of the targets.

So please be careful if you have a model that’s re-trained (semi-)automatically.

What do you mean exactly? There are always some missing targets (in the non-nomi targets). Nothing new there, unless targets that used to be there for some rows disappeared or a massive number of targets went missing. I update weekly to get the new eras from the validation set that have had their targets added (I am only interested in eras with both 20d & 60d targets). As part of this process, I double-check that no previous era that I’ve already saved/processed has any changes. The data has always passed this check: no row has ever changed, no target (once added) has ever disappeared, and no missing target (once all the targets around it have been filled in) has ever gotten filled in later. But some targets are just missing, yeah. (It was known and announced that some would be when they started generating multiple targets.)

And I just re-checked the training set – nothing has changed.
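
For anyone who wants a similar safeguard, the "nothing I already saved has changed" check described above could be sketched roughly as follows. This is only an illustration under assumptions, not the poster's actual code: the function name `changed_rows`, the column names, and the idea of comparing two pandas DataFrames indexed by a unique row id are all made up for the example.

```python
import numpy as np
import pandas as pd


def changed_rows(old: pd.DataFrame, new: pd.DataFrame) -> pd.Index:
    """Return index labels of previously saved rows that differ in `new`.

    Assumes both frames are indexed by a unique row id (hypothetical setup).
    Rows of `old` that are absent from `new` are reported as changed too.
    """
    # Align the new download to the old rows/columns; absent rows become NaN.
    aligned = new.reindex(index=old.index, columns=old.columns)
    # Two cells match if they are equal, or if both are missing.
    same = (old == aligned) | (old.isna() & aligned.isna())
    return old.index[(~same.all(axis=1)).to_numpy()]


# Toy data: one saved snapshot and a newer download (made-up values).
old = pd.DataFrame(
    {"era": ["0001", "0001", "0002"], "target_x_20": [0.5, np.nan, 0.25]},
    index=["id_a", "id_b", "id_c"],
)
extra_era = pd.DataFrame({"era": ["0003"], "target_x_20": [1.0]}, index=["id_d"])
new = pd.concat([old, extra_era])       # a new era was appended
new.loc["id_c", "target_x_20"] = 0.75   # simulate an unexpected change

print(list(changed_rows(old, new)))     # ['id_c']
```

Note that this sketch flags a missing target that later gets filled in as a change as well, so it only makes sense for eras whose targets are already complete when you save them.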

Hi wigglemuse,

For some targets I see 29K missing in the training set, which represents around 1%. And for the validation set it’s just shy of 79K, representing around 3.2%. This is not a massive amount, but also not negligible.

It’s the first time I’ve noticed it, but then you seem to have more sophisticated checks in place than I do. I might have raised the alarm too quickly, in which case I apologize.
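
Counts and percentages like the ones quoted above can be produced with a few lines of pandas. A minimal sketch, assuming the data is loaded into a DataFrame whose target columns share a `target` prefix (the helper name `missing_target_report` and the toy column names are hypothetical):

```python
import numpy as np
import pandas as pd


def missing_target_report(df: pd.DataFrame, prefix: str = "target") -> pd.DataFrame:
    """Count missing values per target column, with the share of all rows."""
    target_cols = [c for c in df.columns if c.startswith(prefix)]
    n_missing = df[target_cols].isna().sum()
    report = pd.DataFrame(
        {
            "n_missing": n_missing,
            "pct_missing": 100.0 * n_missing / len(df),
        }
    )
    # Worst-affected targets first.
    return report.sort_values("n_missing", ascending=False)


# Toy frame with made-up values, just to show the shape of the output.
toy = pd.DataFrame(
    {
        "era": ["0001", "0001", "0002", "0002"],
        "target_a_20": [0.5, np.nan, 0.25, 0.75],
        "target_b_60": [0.5, 0.5, 0.0, 1.0],
    }
)
report = missing_target_report(toy)
print(report)
```

Running the same report on each new download and comparing it with the previous one makes it easier to tell "always been missing" apart from "newly missing".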

Yes, so that’s normal. For whatever reason, they can’t generate some targets for some rows (probably companies going out of business, mergers, splits, whatever). It is possible (I’m just guessing) that some of the nomi targets should actually be missing too, but they fill them in with 0.5s anyway, maybe just because we are scored on nomi and that’s supposed to be the “main” target (for now).

And I’m sure you’ve also noticed that with the newest datasets and going forward, some features are missing as well: not individual rows within an era (which, I’m told, should not happen), but whole feature columns for an entire era. That could potentially happen even in new live data. If it ever does, I expect many people not to submit that day, as it will screw up many models…
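
A pipeline that re-trains or predicts automatically could look for this case before doing anything else. The following is a rough sketch, assuming an `era` column and feature columns that start with `feature` (the function name `fully_missing_features` and the toy data are made up for illustration):

```python
import numpy as np
import pandas as pd


def fully_missing_features(
    df: pd.DataFrame, era_col: str = "era", prefix: str = "feature"
) -> dict:
    """Map each era to the feature columns that are NaN for every row in it."""
    feature_cols = [c for c in df.columns if c.startswith(prefix)]
    # Per era: True where every row of that era is NaN for the feature.
    all_missing = df[feature_cols].isna().groupby(df[era_col].to_numpy()).all()
    return {
        era: [c for c in feature_cols if row[c]]
        for era, row in all_missing.iterrows()
        if row.any()
    }


# Toy data: feature_a is absent for all of era 0002, while feature_b has
# only a single missing row in era 0001 and is therefore not reported.
toy = pd.DataFrame(
    {
        "era": ["0001", "0001", "0002", "0002"],
        "feature_a": [0.0, 0.5, np.nan, np.nan],
        "feature_b": [1.0, np.nan, 0.25, 0.75],
    }
)
print(fully_missing_features(toy))  # {'0002': ['feature_a']}
```

What to do when the result is non-empty (skip the round, impute, or fall back to a model that does not use those features) is a separate decision that this check does not make for you.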

Thanks a lot for confirming. That’s good to hear: no need for major concerns, and I can go ahead as planned 🙂