16GB Intermediate solution: XGB Era Boosting

mesomachukwu12 · January 3, 2022, 11:17pm

I modified example_advanced_32GB.py as follows:

    # do ensembles

    training_data["ensemble_neutral_riskiest_50"] = sum(
        [training_data[pred_col] for pred_col in pred_cols if pred_col.endswith("neutral_riskiest_50")]).rank(
        pct=True)
    training_data["ensemble_not_neutral"] = sum(
        [training_data[pred_col] for pred_col in pred_cols if "neutral" not in pred_col]).rank(pct=True)
    training_data["ensemble_all"] = sum([training_data[pred_col] for pred_col in pred_cols]).rank(pct=True)
    training_data["preds_model_target_neutral_riskiest_50"] = sum([training_data[pred_col] for pred_col in pred_cols]).rank(pct=True)

    ensemble_cols.add("ensemble_neutral_riskiest_50")
    ensemble_cols.add("ensemble_not_neutral")
    ensemble_cols.add("ensemble_all")
    ensemble_cols.add("preds_model_target_neutral_riskiest_50")
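
For anyone who wants to try the ensembling pattern above without loading the full dataset, here is a minimal sketch on toy data. The column names, the values, and the helper add_rank_ensemble are made up for illustration and are not part of example_advanced_32GB.py; the only idea taken from the snippet above is "sum the prediction columns, then rank with pct=True".

```python
import pandas as pd

# Toy stand-in for training_data; columns and values are hypothetical.
training_data = pd.DataFrame({
    "preds_a": [0.10, 0.40, 0.30, 0.90],
    "preds_b": [0.20, 0.50, 0.10, 0.80],
    "preds_a_neutral_riskiest_50": [0.90, 0.20, 0.60, 0.10],
})
pred_cols = list(training_data.columns)
ensemble_cols = set()


def add_rank_ensemble(df, name, cols):
    """Sum the given prediction columns row-wise and store percentile ranks."""
    # rank(pct=True) rescales the summed scores to (0, 1], so every ensemble
    # ends up on the same scale regardless of how many members it has.
    df[name] = df[list(cols)].sum(axis=1).rank(pct=True)
    ensemble_cols.add(name)


add_rank_ensemble(
    training_data,
    "ensemble_neutral_riskiest_50",
    [c for c in pred_cols if c.endswith("neutral_riskiest_50")],
)
add_rank_ensemble(
    training_data,
    "ensemble_not_neutral",
    [c for c in pred_cols if "neutral" not in c],
)
add_rank_ensemble(training_data, "ensemble_all", pred_cols)

print(training_data[sorted(ensemble_cols)])
```

Because pred_cols is captured before any ensemble column is added, the ensembles never feed into each other.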

br1 · January 7, 2022, 10:33pm

Much appreciated, resolved it. I was off chasing the error in utils, thinking utils was spitting it out when it was calling validation_metrics.

mesomachukwu12 · January 9, 2022, 2:53pm

@bigbertha, which model do you use: example_advanced_32GB.py or the intermediate one?

objectscience · January 10, 2022, 5:42pm

I’m just getting caught up on all of this. I’ll try to get everything updated in the next week or so; the holidays have me way behind. @mesomachukwu12, I appreciate your help here. Thank you!

bigbertha · January 10, 2022, 8:35pm

I construct my models differently from the way the example model is built, but I am using all available information on feature groupings (I really miss feature groups!).

mesomachukwu12 · January 15, 2022, 4:36pm

@bigbertha, I would appreciate it if you could share it with me: [email protected]

objectscience · January 15, 2022, 5:40pm

I’ve been giving this a lot of thought since yesterday’s TC announcement: I’m not sure I’m going to be able to reproduce the boruta output when the new classic data drops. The initial run took a little over a week and when the data 3x’s, that could obviously push out much further. This is being further complicated by my new signals pipeline which is turning into a compute black hole.

I just wanted to give you all a heads up and time to prepare for the next data drop. GL!

mesomachukwu12 · January 15, 2022, 6:07pm

Thanks for the heads up. The challenge is good.

bigbertha · January 15, 2022, 10:08pm

No worries, your work is highly appreciated!

gcotti · January 22, 2022, 11:16pm

Hey, can you explain why exactly you need to fill int8 with 2 instead of 0.5 in this case? I’ve been trying to find the answer myself and can’t seem to sort it out.

wigglemuse · January 22, 2022, 11:21pm

int8 = integers only, so instead of the values 0, 0.25, 0.5, 0.75, 1.0 you have 0, 1, 2, 3, 4. So 2 is the middle/neutral value instead of 0.5.
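
A quick way to see the mapping for yourself is the sketch below. It assumes the int8 levels are simply the float levels multiplied by 4, as described above; the series is toy data, not a real feature column.

```python
import numpy as np
import pandas as pd

# Hypothetical feature column in the float encoding, with one missing value.
float_feature = pd.Series([0.0, 0.25, np.nan, 0.75, 1.0])

# Float encoding: the neutral fill value is the midpoint, 0.5.
float_filled = float_feature.fillna(0.5)

# Integer encoding: the same five levels become 0..4 (float value * 4),
# so the midpoint is 2. Fill before casting, because NaN cannot be
# stored in a plain int8 column.
int8_filled = (float_feature * 4).fillna(2).astype("int8")

print(float_filled.tolist())
print(int8_filled.tolist())
```

Filling an int8 column with 0.5 would either fail or be truncated on the cast, which is why the fill value has to move to 2.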

gcotti · January 22, 2022, 11:25pm

Ahhhh yeah how embarrassing… here I was thinking it was something to do with the pandas fillna method.

Thanks!

objectscience · January 22, 2022, 11:58pm

I actually made the exact same mistake, so don’t think anything of it.

joakim · March 31, 2022, 9:56am

Hey @objectscience, I’m not able to access the repo anymore.

objectscience · April 1, 2022, 2:49am

Almost sent you a message the other day, haven’t seen you on the forum in a minute. Hope all is well.

Things on this end have gotten a little dicey since the first of the year, and I’ve largely had to pull away from the comp. I was having a really hard time staying on top of the things that needed to be addressed with the repo, and with TC and the new dataset coming, I thought it was best to pull it so people didn’t waste time trying to use something that is out of date.

I still have some of the output from the run though, and can paste it here if it will help.