Python is a good place to start if you are new to programming. There are many Python tutorials online, and it’s a fairly easy language to pick up.
I will also echo the comment from Daenris: this is not a good project for a first-time programmer. This is a very difficult machine learning problem, and many of the techniques you will learn from online tutorials and machine learning material do not apply here. The principles are the same, but the execution is totally different.
I think what Daenris was alluding to with the “signal to noise” comment was that in the Numerai dataset, it’s really difficult to find the “correct” answer. Think of it this way: imagine you were to play three pieces of music at the same time. It’s fairly easy to “find” the song you wanted in the resulting noise and “extract” it. This is a high signal to noise ratio. Now imagine playing thousands of pieces of music at the same time. Now it’s much harder to “extract” the song you want. This is a low signal to noise ratio (this is a very simplified explanation).
In the normal machine learning tutorials you see online, the algorithms can predict results with a very high degree of accuracy (i.e., the “signal” is much larger than the noise). In the Numerai dataset, the algorithms can barely predict better than 50% accuracy (i.e., the “signal” is barely distinguishable from the noise).
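If it helps to see the difference in numbers, here is a small toy sketch in plain Python. It has nothing to do with the real Numerai data: it makes up a yes/no label, hides it in a single feature as a "signal" of a chosen strength, adds random noise, and then guesses the label from the sign of the feature. The function name `simulated_accuracy` and the strengths 3.0 and 0.03 are my own assumptions, picked only to illustrate the idea.

```python
import random


def simulated_accuracy(signal_strength, n_samples=100_000, seed=0):
    """Toy experiment: how often can we recover a hidden label from one noisy feature?

    All numbers here are illustrative assumptions, not Numerai data.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    correct = 0
    for _ in range(n_samples):
        label = rng.choice([0, 1])
        # The "signal": push the feature up for label 1, down for label 0.
        direction = 1 if label == 1 else -1
        # The "noise": a random value with standard deviation 1.
        feature = signal_strength * direction + rng.gauss(0.0, 1.0)
        # The simplest possible "model": guess 1 when the feature is positive.
        prediction = 1 if feature > 0 else 0
        if prediction == label:
            correct += 1
    return correct / n_samples


# Signal three times larger than the noise (tutorial-style data).
high_snr_accuracy = simulated_accuracy(3.0)
# Signal far smaller than the noise (closer in spirit to a very noisy dataset).
low_snr_accuracy = simulated_accuracy(0.03)

print(f"high signal to noise: {high_snr_accuracy:.3f}")
print(f"low signal to noise:  {low_snr_accuracy:.3f}")
```

With the strong signal the guesses should be right almost every time, while with the weak signal they should land only slightly above 50%, even though the same "model" is used in both cases. The model did not get worse; the data just contains much less to find.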
I hope that helps. Good luck with the competition.