In traditional quant projects, generating synthetic data to support backtesting is becoming common practice. There is a good summary of it in Appendix A of “Machine Learning for Asset Managers” by Marcos Lopez de Prado.
There are various methods for generating synthetic data; one of the most promising is GANs (Generative Adversarial Networks).
I searched the forum for discussions on this topic but couldn’t find any, so I’m posting this to bring it up. What do you think? Could it be useful in Numerai?
No, I haven’t tried such an approach so far, and I probably won’t, since they are releasing a larger dataset soon. Also, I have been pretty happy with my models so far.
I would wait until the new dataset arrives anyway, because there will be additional validation data to play with.
I’m curious about how effective this would be. Is there a sanity test you can use on the synthetic samples to verify that they reflect patterns in the dataset? You can’t eyeball them like image GANs.
Yes, there are, but they’re considerably more complex for time series than for non-time-series data (like images).
Here’s an example:
And you can find the source code here:
Marcos Lopez de Prado’s books (“Advances in Financial Machine Learning” and “Machine Learning for Asset Managers”) also discuss this problem, but unfortunately they aren’t freely available online, so I can’t link them.
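To make the sanity-test question a bit more concrete, here is a minimal sketch of the simplest kind of check: compare each feature's marginal distribution (two-sample Kolmogorov-Smirnov statistic) and the feature correlation matrix between real and synthetic samples. This is not the method from the books or from any linked code; the function names (`ks_statistic`, `sanity_report`) and the toy Gaussian data are made up for illustration, and the check ignores temporal structure entirely, which is exactly the part that makes time series harder.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def sanity_report(real, synth):
    """Compare per-feature marginals and the cross-feature correlation structure."""
    ks = np.array([ks_statistic(real[:, j], synth[:, j])
                   for j in range(real.shape[1])])
    corr_gap = np.abs(np.corrcoef(real, rowvar=False)
                      - np.corrcoef(synth, rowvar=False))
    return {"max_ks": float(ks.max()), "max_corr_gap": float(corr_gap.max())}

# Toy data (an assumption for the demo, not Numerai data).
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.6, 0.2],
                [0.6, 1.0, 0.4],
                [0.2, 0.4, 1.0]])
real = rng.multivariate_normal(np.zeros(3), cov, size=5000)
good = rng.multivariate_normal(np.zeros(3), cov, size=5000)  # stand-in for a faithful generator
bad = rng.normal(size=(5000, 3))  # stand-in for a generator that lost the correlations

good_report = sanity_report(real, good)
bad_report = sanity_report(real, bad)
print("faithful generator:", good_report)
print("broken generator:  ", bad_report)
```

Note that the broken generator here still matches every marginal, so the KS check alone would pass it; only the correlation gap exposes the problem. For real time series you would additionally want to compare things like autocorrelations or per-era statistics.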
There has been plenty of chatter on Rocket.Chat; the release date is supposed to be 8th Sep, so in 2 days’ time. They also say there will be a post on the forum.