Is it a bad idea to use validation set for learning ensemble weight?

First of all, I trained a bunch of tree models. Then I tried to fit a linear model on top of those tree models, without any constraints, to maximize the Sharpe ratio. The optimization does not converge, but during training the Sharpe climbs to a crazy high value (I stopped the optimization partway through, around 100 iterations, and the Sharpe was already around 8).
After thinking a bit more, I think we at least need to constrain the weights to be positive, since negative weights don't make sense here. Below are the results (I also include an ensemble of the top 25 models selected by Sharpe):
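For concreteness, the positive-weight version of this optimization could be sketched like the following. This is a minimal illustration, not the poster's actual code: the per-era return matrix `model_returns` is synthetic, and the choice of `SLSQP` with a sum-to-one constraint is just one reasonable setup.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
# Hypothetical per-era returns of 5 base tree models (rows: eras, cols: models).
model_returns = rng.normal(0.01, 0.05, size=(200, 5))

def neg_sharpe(w, R):
    """Negative Sharpe ratio of the weighted ensemble's return series."""
    r = R @ w
    return -r.mean() / (r.std() + 1e-12)

n = model_returns.shape[1]
res = minimize(
    neg_sharpe,
    x0=np.full(n, 1.0 / n),                  # start from equal weights
    args=(model_returns,),
    bounds=[(0.0, 1.0)] * n,                 # positive weights only
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    method="SLSQP",
)
weights = res.x
```

Without the `bounds` argument the optimizer is free to go long/short across models, which is one way the in-sample Sharpe can be pushed to implausible values.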

@maxchu the moment you use the validation set for anything (e.g. learning ensemble weights, early stopping of your base tree learners, picking tree architecture, etc.) it ceases to be out-of-sample and the test results are no longer meaningful. If you want to utilize the validation data for live models, I would suggest:

  • first cut the training set into train-train, and train-validation
  • automate your training, optimization, architecture selection there
  • feed the validation set “once” to this pipeline, and if you like the results
  • rebuild the models with the same pipeline using the train and validation sets and deploy
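The four steps above could be sketched roughly as follows. Everything here is illustrative: the data is synthetic, and a small ridge-penalty search stands in for whatever automated training/optimization/architecture selection the real pipeline performs.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical data, ordered in time: a training block plus a
# held-out validation block (all names are illustrative).
X_train, y_train = rng.normal(size=(800, 10)), rng.normal(size=800)
X_val, y_val = rng.normal(size=(200, 10)), rng.normal(size=200)

# Step 1: cut the training set into train-train and train-validation.
cut = int(len(X_train) * 0.8)
X_tt, y_tt = X_train[:cut], y_train[:cut]
X_tv, y_tv = X_train[cut:], y_train[cut:]

# Step 2: automate tuning/selection entirely inside the training set.
def tune(X_fit, y_fit, X_eval, y_eval):
    """Toy stand-in for the automated pipeline: pick the ridge
    penalty that scores best on the inner eval split."""
    return max((0.1, 1.0, 10.0),
               key=lambda a: Ridge(alpha=a).fit(X_fit, y_fit)
                                           .score(X_eval, y_eval))

alpha = tune(X_tt, y_tt, X_tv, y_tv)

# Step 3: feed the validation set "once" to the frozen pipeline.
val_score = Ridge(alpha=alpha).fit(X_train, y_train).score(X_val, y_val)

# Step 4: if the result looks acceptable, rebuild with the same
# pipeline on train + validation and deploy that model.
X_all = np.vstack([X_train, X_val])
y_all = np.concatenate([y_train, y_val])
final_model = Ridge(alpha=alpha).fit(X_all, y_all)
```

The key property is that every tuning decision is made before the validation block is ever scored, so that single validation score still estimates out-of-sample performance.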

I think the MLDP book had a section on quantifying how much each “look” at your out-of-sample test reduces the reliability of the test results.