The training data file is meant for building your model. Internally you probably want to hold out part of this set for validation while optimizing your models, or use k-fold cross validation (as long as the folds respect the era structure, so that rows from one era never end up on both sides of a split).
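As a rough illustration of what "era-aware" folds could look like, here is a minimal sketch in plain Python. The function name `era_kfold`, the round-robin assignment of eras to folds, and the toy era labels are all assumptions made for this example; it is not Numer.ai's own procedure, just one way to keep every era entirely in either the training side or the validation side of a split.

```python
from collections import defaultdict


def era_kfold(eras, n_splits=3):
    """Yield (train_idx, val_idx) pairs in which every era stays on one side."""
    # Collect the row indices that belong to each era.
    rows_by_era = defaultdict(list)
    for row, era in enumerate(eras):
        rows_by_era[era].append(row)

    unique_eras = sorted(rows_by_era)
    if n_splits < 2 or n_splits > len(unique_eras):
        raise ValueError("n_splits must be between 2 and the number of eras")

    for fold in range(n_splits):
        # Whole eras are assigned round-robin: era i goes to fold i % n_splits.
        val_eras = set(unique_eras[fold::n_splits])
        train_idx, val_idx = [], []
        for era in unique_eras:
            target = val_idx if era in val_eras else train_idx
            target.extend(rows_by_era[era])
        yield train_idx, val_idx


# Toy example: six rows spread over three hypothetical eras.
eras = ["era1", "era1", "era2", "era2", "era3", "era3"]
folds = list(era_kfold(eras, n_splits=3))
```

With the toy labels above, the first fold validates on the two `era1` rows and trains on the other four. Round-robin is only one simple choice; assigning contiguous blocks of eras to each fold is another option if you want the folds to follow time order.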
The validation portion of the tournament data file is what is used to measure the performance of your model for the checks on the website. The ideal situation would be that you don’t use this data at all while building your model, and only measure your performance on it after you’ve chosen which model to use.
The test portion of the tournament data file is presumably what Numer.ai uses as their validation data to create their meta model. We don’t have target information on that.
The live portion of the tournament data file is the data for which neither we nor Numer.ai know the targets yet. It is what they are actually trading on, and what our model’s performance is rated on 3 weeks after the end of the round.
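To make the three portions of the tournament file concrete, here is a small sketch that groups rows by portion. The column names `data_type` and `target`, the portion labels, and the toy rows are assumptions for illustration only; check them against the actual file before relying on them.

```python
import csv
import io
from collections import defaultdict

# Toy stand-in for the tournament file. The column names and values are
# assumptions for illustration, not copied from the real file.
toy_csv = """id,data_type,target
a1,validation,0.5
a2,validation,0.25
b1,test,
c1,live,
"""


def split_by_portion(handle, column="data_type"):
    """Group the rows of a CSV by the value found in `column`."""
    portions = defaultdict(list)
    for row in csv.DictReader(handle):
        portions[row[column]].append(row)
    return dict(portions)


portions = split_by_portion(io.StringIO(toy_csv))

# Only the validation rows carry targets, so only they can be scored locally.
scorable = [row for row in portions["validation"] if row["target"] != ""]
```

In this toy data the test and live rows have empty targets, which mirrors the situation described above: you can predict on them, but you cannot measure your own performance there.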