16GB Intermediate solution: XGB Era Boosting

I modified example_advanced_32GB.py as follows:

    # do ensembles

    training_data["ensemble_neutral_riskiest_50"] = sum(
        [training_data[pred_col] for pred_col in pred_cols if pred_col.endswith("neutral_riskiest_50")]).rank(
        pct=True)
    training_data["ensemble_not_neutral"] = sum(
        [training_data[pred_col] for pred_col in pred_cols if "neutral" not in pred_col]).rank(pct=True)
    training_data["ensemble_all"] = sum([training_data[pred_col] for pred_col in pred_cols]).rank(pct=True)
    training_data["preds_model_target_neutral_riskiest_50"] = sum([training_data[pred_col] for pred_col in pred_cols]).rank(pct=True)

    ensemble_cols.add("ensemble_neutral_riskiest_50")
    ensemble_cols.add("ensemble_not_neutral")
    ensemble_cols.add("ensemble_all")
    ensemble_cols.add("preds_model_target_neutral_riskiest_50")
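For anyone following along, here is a minimal, self-contained sketch of the rank-averaging pattern used above. The toy frame and its values are made up for illustration; only the column-name suffixes follow the snippet. It ranks across the whole frame, as the snippet does, and assumes nothing about how the rest of the example script groups by era.

```python
import pandas as pd

# Hypothetical toy predictions; the real script builds these columns from trained models.
training_data = pd.DataFrame({
    "preds_model_target": [0.1, 0.4, 0.3, 0.9],
    "preds_model_target_neutral_riskiest_50": [0.2, 0.5, 0.1, 0.8],
})
pred_cols = list(training_data.columns)

# Split the prediction columns the same way the snippet above does.
neutral_cols = [c for c in pred_cols if c.endswith("neutral_riskiest_50")]
plain_cols = [c for c in pred_cols if "neutral" not in c]

# Sum the member predictions row-wise, then convert the sums to percentile ranks in (0, 1].
training_data["ensemble_neutral_riskiest_50"] = training_data[neutral_cols].sum(axis=1).rank(pct=True)
training_data["ensemble_not_neutral"] = training_data[plain_cols].sum(axis=1).rank(pct=True)
training_data["ensemble_all"] = training_data[pred_cols].sum(axis=1).rank(pct=True)

print(training_data[["ensemble_neutral_riskiest_50", "ensemble_not_neutral", "ensemble_all"]])
```

Because the final step is a percentile rank, summing and averaging the member columns give the same ensemble.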

Much appreciated, I resolved it. I was off chasing the error in utils, thinking the error was raised there, when it actually came from the call to validation_metrics.

@bigbertha, which model do you use: example_advanced_32GB.py or the intermediate one?

I’m just getting caught up on all of this, I’ll try to get all of this updated in the next week or so. Holidays have me way behind. @mesomachukwu12 appreciate your help here. Thank you!

I have a different way of constructing my models from the way the example model is built. But I am using all available information on feature groupings (I really miss feature groups!)

@bigbertha, I would appreciate it if you could share it with me at [email protected]

I’ve been giving this a lot of thought since yesterday’s TC announcement: I’m not sure I’m going to be able to reproduce the boruta output when the new classic data drops. The initial run took a little over a week and when the data 3x’s, that could obviously push out much further. This is being further complicated by my new signals pipeline which is turning into a compute black hole.

I just wanted to give you all a heads up and time to prepare for the next data drop. GL!

Thanks for the heads up. The challenge is good

No worries, your work is highly appreciated!

Hey, can you explain why exactly you need to fill int8 with 2 instead of 0.5 in this case? I've been trying to find the answer myself and can't seem to sort it out.

int8 means integers only, so instead of the values 0, 0.25, 0.5, 0.75, 1.0 you have 0, 1, 2, 3, 4. So 2 is the middle/neutral value instead of 0.5.
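A small sketch of what that looks like in pandas, in case it helps. The two Series below are hypothetical stand-ins for one feature column in each encoding, not the actual dataset columns.

```python
import numpy as np
import pandas as pd

# Hypothetical feature with one missing value, in both encodings.
float_feature = pd.Series([0.0, 0.25, np.nan, 0.75, 1.0], dtype="float32")
int_feature = pd.Series([0, 1, pd.NA, 3, 4], dtype="Int8")  # nullable integer dtype

# The float encoding's midpoint is 0.5; the integer encoding's midpoint is 2.
float_filled = float_feature.fillna(0.5)
int_filled = int_feature.fillna(2).astype("int8")  # safe to cast once no NA remains

# Dividing the integer levels by 4 recovers the float levels, so 2 maps to 0.5.
print((int_filled / 4).tolist())
print(float_filled.tolist())
```

The point is only that the fill value has to be the midpoint on whichever scale the column uses, and an int8 column cannot hold 0.5 anyway.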

Ahhhh yeah how embarrassing… here I was thinking it was something to do with the pandas fillna method.

Thanks!

I actually made the exact same mistake, don’t think anything of it.

Hey @objectscience, I’m not able to access the repo anymore. :frowning:

Almost sent you a message the other day, haven’t seen you on the forum in a minute. Hope all is well.

Things on this end have gotten a little dicey since the first of the year and I’ve largely had to pull away from the comp. I was having a really hard time staying on top of the things that needed to be addressed with the repo, and with TC and the new dataset coming I thought it was best to pull it, so people didn’t waste time trying to use something that is out of date.

I still have some of the output from the run, though, and can paste it here if it will help.