How to Reduce Overfitting in Random Forest Regression


I am trying to predict stock prices using Random Forest Regression (RFR), and I'm using scikit-learn like this:

rfr = RandomForestRegressor(random_state=200, oob_score=True, max_features='sqrt')

Now, I am using r2_score on the test set and the predicted values to get an idea of the model's accuracy. However, I always get an R² above 0.9, which seems odd to me.

My data set has 30k data points. The only parameter I have changed so far is random_state. The goal of this post is to find which hyperparameters of the model effectively control overfitting.
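In case it helps frame the question: random_state only fixes the random seed, so it cannot control overfitting; the usual levers are tree depth and leaf size. Here is a minimal sketch of that comparison on synthetic data (the data set, feature count, and hyperparameter values are illustrative assumptions, not the actual stock data):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data set (illustrative only)
X, y = make_regression(n_samples=5_000, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# max_depth and min_samples_leaf limit how far each tree can
# specialize to the training data; random_state does not.
rfr = RandomForestRegressor(
    n_estimators=100,
    max_depth=10,          # cap tree depth
    min_samples_leaf=20,   # require larger leaves
    max_features='sqrt',
    oob_score=True,
    random_state=200,
)
rfr.fit(X_train, y_train)

# Compare in-sample vs. out-of-sample fit; a large gap suggests overfitting
print('train R2:', r2_score(y_train, rfr.predict(X_train)))
print('test  R2:', r2_score(y_test, rfr.predict(X_test)))
print('OOB   R2:', rfr.oob_score_)
```

Comparing the train, test, and out-of-bag scores side by side makes the size of the overfit visible, rather than judging from the test R² alone.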

Hi, the bias-variance tradeoff is a complex issue. I think it is well described for random forests here: .

How about setting max_features=1?
This is a good reference for the Numerai Tournament.
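For what it's worth, max_features=1 means each split considers a single randomly chosen feature, which decorrelates the trees and tends to reduce variance. A quick sketch comparing it against the other settings on synthetic data (all values here are illustrative assumptions):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Illustrative synthetic data, not the actual stock data
X, y = make_regression(n_samples=2_000, n_features=20, noise=10.0, random_state=0)

# 1 random feature per split vs. sqrt(n_features) vs. all features
for mf in (1, 'sqrt', None):
    rfr = RandomForestRegressor(n_estimators=100, max_features=mf, random_state=0)
    scores = cross_val_score(rfr, X, y, cv=3, scoring='r2')
    print(f'max_features={mf!r}: mean CV R2 = {scores.mean():.3f}')
```

Cross-validated scores give a fairer read on whether the restriction actually helps than a single train/test split.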
