Take a closer look at the first 100 models and most of them will demonstrate the same pattern. I believe the nature of this pattern is periodic shocks: so-called “good” or “bad” eras that are easy or difficult to predict on the current dataset.
I’d go with your #1 but not #2, as I don’t use any ML techniques; it’s all old-school analysis and stats for me, and I have a similar overall pattern.
I think what would really help would be if Numerai simply started providing the target values for the test data once it was a year or so old.
I actually first started working with various transform methods plus simple inversion, and moved on from there. I guess my workhorse routines now are principal components analysis, kernel density estimation, and Gaussian mixtures. I think it’s more a question of habit than anything else; I’ve been using tools like those for decades, and I’m comfortable with them (I’m old enough to be one of the early adopters of Numerical Recipes, back when 386s and math coprocessors were the hottest things on the block).
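For anyone curious what those three workhorses look like in practice, here is a minimal sketch. To be clear, this is not the poster’s actual pipeline; it just strings together the three named tools (PCA, kernel density estimation, Gaussian mixtures) on synthetic data, using scikit-learn’s off-the-shelf implementations as a stand-in.

```python
# Hypothetical sketch only: PCA -> KDE -> Gaussian mixture on fake data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KernelDensity
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))  # stand-in for a real feature matrix

# 1. PCA: reduce the 20 raw features to a handful of components.
pca = PCA(n_components=3)
Z = pca.fit_transform(X)

# 2. KDE: nonparametric estimate of the density in the reduced space.
kde = KernelDensity(bandwidth=0.5).fit(Z)
log_density = kde.score_samples(Z)  # per-sample log-density

# 3. Gaussian mixture: parametric clustering of the same reduced space.
gmm = GaussianMixture(n_components=2, random_state=0).fit(Z)
labels = gmm.predict(Z)

print(Z.shape, log_density.shape, sorted(set(labels)))
```

Each step is independent of the others; in real use you would pick bandwidths and component counts by cross-validation rather than the arbitrary values above.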
Thanks for the trip down memory lane! My first PC was a 486DX4-100, but my school had a couple of 386s before that, and I’d even managed to get a copy of the Intel 386 Programmer’s Reference Manual, with some difficulty. I’d spent a lot of time with those 386s after school, learning to write DOS TSRs with TC and TASM.
That’s an old classic! Model enough of the patterns that survived, and you can infer the patterns that did not survive…
In this instance, during World War II they mapped, as red dots, all the bullet holes on the airplanes that came back. They then used this bullet-hole map to decide to reinforce the airplanes in the areas that had no red dots: presumably, airplanes hit in those areas went down and did not make it back!