Fellow Numeratis:
Just to be clear, this isn’t some kind of “campaign message”. The following is something I really want to do: bring some of my favourite ML/DS communities closer together so that we can do greater things together. It IS my main idea going into the CoE membership voting, but I would also like to share it so that we can perhaps do something together. So, here we go:
My main proposal would be to facilitate “targeted activities” to extend awareness of Numerai to the wider ML/DS community, and by doing this also facilitate high-quality content sharing between us.
Rationale
Having been to NumeraiConf, and having talked to the team and other participants, I believe this is the most critical topic shared by the team and participants. Wider engagement and participation are critical for more model diversity in the metamodel, for more talent participating (which leads to better idea sharing and mutual learning), and for better recognition of the project and the NMR token.
Meanwhile, I believe that the Numerai dataset, with its clean data, steady & improving training and validation sets, and readily productionised process & toolkits, is probably the most ideal open dataset around for data scientists of all levels to learn all kinds of tricks (feature engineering, feature selection, dimensionality reduction, representation learning, stacking, ensembling, etc.), so this is clearly something we should work harder to make the wider ML/DS community aware of.
At the same time, platforms like Kaggle are offering fewer and fewer interesting tabular competitions, and to me this seems to have been an unsolvable problem ever since they got acquired by Google. So even just targeting the 10 million registered Kaggle users, there is a lot to gain, both in recognition for the Numerai project and in getting more people to start building models around the Numerai dataset.
Concrete Actions
Here are some of the things I am confident we can achieve with sponsorship from the CoE. Some of them need to be thought through further, but chiefly I would like us to bring Numerai to the wider DS/ML community, get them to try it out, and then make it easy & straightforward for them to join the tournaments properly.
- Assemble a technical task force to create high-quality, “ready to run” technical notebooks on subjects related to Numerai, and share them widely. One of the best examples is this by @perfect_fit - look at the number of upvotes there! We can also create tutorial series like this to cover a wide range of end-to-end stuff.
- Launch CoE-hosted Numerai-related competitions, run by the Numerai community (i.e. not by the team), which can come in different iterations. My first idea would be to use Kaggle’s well-established community competition platform (free storage, compute and competition hosting), where we can host the Numerai datasets (V2/V3), incrementally update them (V4), provide baseline examples, set our own metrics, and encourage both Numeratis and non-Numeratis to compete side by side, as individuals or in teams. The CoE can give out non-cash kudos to the winners and to the best content sharers. Cash is not allowed as a prize for community challenges, and giving NMR as a prize might be controversial given a certain event in the distant past (but why not?). With well-oiled APIs on both sides (Kaggle & Numerai) we could even set up our own leaderboard (LB) and do a TC optimisation challenge! We just need to think through how to set such a competition up.
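To make the “set our metrics” and “own LB” part a bit more concrete, here is a minimal sketch of how a community leaderboard could score submissions. It assumes the metric is the mean per-era Spearman rank correlation between predictions and targets, which is similar in spirit to Numerai’s correlation score but is only my assumption for illustration. The function names and the `(era, prediction, target)` row layout are made up for this example and are not part of the Kaggle or Numerai APIs.

```python
from collections import defaultdict
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def score_submission(rows):
    """rows: iterable of (era, prediction, target) tuples (hypothetical layout).

    Returns (mean per-era Spearman, {era: Spearman}).
    """
    by_era = defaultdict(lambda: ([], []))
    for era, pred, target in rows:
        by_era[era][0].append(pred)
        by_era[era][1].append(target)
    per_era = {era: spearman(p, t) for era, (p, t) in by_era.items()}
    return mean(list(per_era.values())), per_era


def leaderboard(submissions):
    """submissions: {name: rows}. Returns [(name, score)], best first."""
    scores = {name: score_submission(rows)[0] for name, rows in submissions.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Scoring per era and then averaging (rather than pooling all rows) keeps one unusually large era from dominating the result. Whatever metric a real community competition ends up using, it would need to be agreed up front and checked against what the hosting platform actually supports.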
Anyway, these are my main thoughts for now. It’s nice to have this opportunity to share them here, and I hope they don’t sound too outlandish - but this is a DAO, right? So it shouldn’t be lacking ambition. Happy to discuss further to hammer out the details, regardless of the voting results!