How to Reduce Overfitting in Random Forest Regression

kaiomarques93 · June 16, 2021, 12:26pm

Hi,

I am trying to predict stock prices using Random Forest Regression (rfr), and I’m using scikit-learn’s RandomForestRegressor like this:

rfr = RandomForestRegressor(random_state=200, oob_score=True, max_features='sqrt')

Now, I am using r2_score on the test set and the predicted values to get an idea of the model’s accuracy. However, I always get an R² above 0.9, which seems odd to me.

My data set has 30k data points. The only parameter I’m changing is random_state. The goal of this post is to find a model parameter that effectively controls overfitting.
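One way to look for such a parameter is to compare training, out-of-bag, and test R² while varying a single setting. The sketch below does this for min_samples_leaf. The data is synthetic and the helper name fit_and_score is made up for illustration; neither comes from the original post, so treat it as a starting point rather than a recipe.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; swap in the real features and target).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.0, size=2000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

def fit_and_score(**params):
    """Hypothetical helper: fit a forest, return train, out-of-bag and test R^2."""
    rfr = RandomForestRegressor(
        n_estimators=100,
        oob_score=True,
        max_features="sqrt",
        random_state=200,
        **params,
    )
    rfr.fit(X_train, y_train)
    return {
        "train_r2": r2_score(y_train, rfr.predict(X_train)),
        "oob_r2": rfr.oob_score_,  # R^2 on out-of-bag samples for a regressor
        "test_r2": r2_score(y_test, rfr.predict(X_test)),
    }

# Larger leaves give smoother trees; watch the train/test gap as the value grows.
results = {}
for leaf in (1, 5, 20, 50):
    results[leaf] = fit_and_score(min_samples_leaf=leaf)
    print(leaf, results[leaf])
```

A large gap between train_r2 and the other two scores suggests overfitting. Other settings such as max_depth or max_samples can be passed through the same helper. Note that train_test_split shuffles rows by default, so for time-ordered data like stock prices a chronological split may be worth checking as well.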

sneaky · June 18, 2021, 11:06am

Hi, the bias-variance tradeoff is a complex issue. I think it is well described for random forests here: https://towardsdatascience.com/random-forests-and-the-bias-variance-tradeoff-3b77fee339b4

ageonsen · June 18, 2021, 9:28pm

How about setting max_features=int(1)?
This is a good reference for the Numerai Tournament:
https://poseidon01.ssrn.com/delivery.php?ID=029123122124116016116079103000088086046017064042000030023121064104123102009109030098059042125003033000013011018067017083069092045061037061029013099121103079098068046061084090104007105026069123066115104022026082067025122108023065086113079113019101024&EXT=pdf&INDEX=TRUE
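
To see what the max_features suggestion does in practice, the two settings can be compared with cross-validation. This is a minimal sketch on synthetic data (an assumption, not the poster's data set). In scikit-learn an integer max_features is a count of features tried per split, so 1 means one random feature per split, whereas the float 1.0 would mean all features.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption; swap in the real features and target).
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 16))
y = X[:, 0] - X[:, 1] + rng.normal(scale=1.0, size=1000)

scores = {}
for mf in ("sqrt", 1):
    rfr = RandomForestRegressor(
        n_estimators=100, max_features=mf, random_state=200
    )
    # Mean R^2 over 5 folds; the default folds are contiguous and unshuffled.
    scores[mf] = cross_val_score(rfr, X, y, cv=5, scoring="r2").mean()
    print(f"max_features={mf!r}: mean CV R^2 = {scores[mf]:.3f}")
```

Which setting scores better depends on the data, so the comparison is worth rerunning on the real features.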
