Four things collided during the long Christmas weekend:
- My building anxiety over not sciencing the numerai data for a year
- Faith II v5.2 changes everything (yet again)
- Codex CLI with gpt-52-xhigh is a beast. It will not let go until it gets stuff done. Biggest improvement to my AI workflows in (checks watch) about four weeks.
- The Kaggle Grandmasters Playbook (ht @ia_ai) has been begging for attention like a lonely golden retriever.
So I figured, let’s have some fun.
The Skill
I asked Claude Opus to build a skill based on the Kaggle Grandmasters Playbook.
Here is the unedited conversation, including my silly-ass questions and Claude’s thinking traces. I am the “Human” (say it with a Ferengi accent for additional fun).
Claude decided to specialize the tabular-ml-modeling skill for numerai instead of describing the numerai bits in the prompt. It’s his show, so I went along. Feeling the AGI.
The Machine
Started with the Kaggle machine (ht @svendaj, your examples and data rock).
Codex couldn’t find its groove in 30GB of RAM though. So I said what the heck, it’s Christmas, and splurged on a month of Colab Pro+. This got me enough units for roughly 80 hours of a 160GB RAM / A100-80GB instance.
Installed Codex on Colab. Told it to use the skill to solve the numerai problem, and keep iterating until extraordinary.
What Happened So Far
Codex got to work. No questions, no hesitation, total confidence. Used the skill, built and ran a bunch of throwaway scripts to explore data, built a pipeline for ensembling XGBoost, CatBoost, LightGBM, and just started cranking.
I got a few hours of sleep (not getting much of that since they doubled the Claude and Codex limits until New Year).
Woke up to see numerai correlation > 0.20 and tmbc > 0.17 in the interim models. After allowing myself to bask in the glory of my prompting skills for a full minute, I copied the Codex training code into a ChatGPT conversation to find the data leak.
Well, not a leak from val to train exactly, but let's say GroupKFold is not the right way to cut temporal data: it keeps eras intact but ignores their order, so folds end up training on the future and validating on the past. Quick search on Discord, and I updated the skill with TimeSeriesSplitEras, code from a grandmaster (ht @shatteredx).
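The actual TimeSeriesSplitEras isn't shown here, but a minimal sketch of an era-aware walk-forward split might look like the following. The function name, fold count, and embargo width are my assumptions, not the grandmaster's code:

```python
import numpy as np

def era_time_series_splits(eras, n_splits=4, embargo=12):
    """Walk-forward splits over ordered era labels.

    Each fold trains only on eras strictly earlier than its validation
    block, and skips `embargo` eras before it to limit overlap leakage.
    Illustrative sketch, not the TimeSeriesSplitEras implementation.
    """
    unique = np.array(sorted(set(eras)))
    eras = np.asarray(eras)
    fold_size = len(unique) // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train_eras = unique[: i * fold_size - embargo]
        val_eras = unique[i * fold_size : (i + 1) * fold_size]
        train_idx = np.where(np.isin(eras, train_eras))[0]
        val_idx = np.where(np.isin(eras, val_eras))[0]
        yield train_idx, val_idx
```

Unlike GroupKFold, no validation era ever precedes a training era, which is the whole point.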
Oh well, still got plenty of Colab credits. Restarted the runtime, gave Codex the new skill; it rebuilt its complete pipeline and started running.
That Brings Us to This Morning
Codex finished building the deep XGBoost models on six targets: out-of-fold validation numerai_corr ~0.0330, correlation contribution ~0.0030 (against v52_lgbm_ender20).
It has now moved on to CatBoost, which is taking a long time on a single CPU core before each GPU burst. I may need to get in there and adjust some n-threads parameter manually, but I'm letting Codex work through things itself for now.
During the first (failed GroupKFold) run I used a second terminal to work on a different task on Colab. Codex saw it on the process list and freaked out, so I’m giving it its space and not touching anything.
I read that Colab Pro+ instances shut down after 24 hours. I'm taking snapshots of the ~/.codex folder every minute, so I'll restart and resume the conversation if that happens. We'll see.
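The snapshot loop is easy enough to sketch. The paths, retention window, and function name here are illustrative, not my actual script:

```python
import shutil
import time
from datetime import datetime, timezone
from pathlib import Path

def take_snapshot(src="~/.codex", dest="~/codex_snapshots", keep=30):
    """Copy the Codex state dir to a timestamped folder and prune old
    copies, so a killed Colab runtime can be resumed from the latest one."""
    src, dest = Path(src).expanduser(), Path(dest).expanduser()
    dest.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S.%fZ")
    shutil.copytree(src, dest / stamp)
    for old in sorted(dest.iterdir())[:-keep]:  # drop all but newest `keep`
        shutil.rmtree(old)
    return dest / stamp

# Run roughly once a minute, e.g.:
# while True:
#     take_snapshot()
#     time.sleep(60)
```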
It still needs to build the CatBoost and LightGBM models, stack and ensemble them, and if it follows the skill, go back and rebuild models with 100% of the data.
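The shape of that stacking step is simple enough to sketch with synthetic numbers; the three columns below are fake out-of-fold predictions standing in for XGBoost, CatBoost, and LightGBM, and the least-squares blend is my illustration, not Codex's actual stacker:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
target = rng.normal(size=n)
# Synthetic OOF prediction matrix: each column is the target plus
# model-specific noise (stand-ins for xgb / catboost / lgbm).
oof = np.column_stack([target + rng.normal(scale=s, size=n)
                       for s in (0.8, 1.0, 1.2)])

# Level-1 "stacker": least-squares weights fit on the OOF matrix.
weights, *_ = np.linalg.lstsq(oof, target, rcond=None)
blend = oof @ weights

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

per_model = [corr(oof[:, j], target) for j in range(oof.shape[1])]
```

Because each model's errors are partly independent, the blend correlates with the target better than any single column does.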
I don’t think my Colab hours will be enough at the rate training is going, but there are enough logs and indicators that it doesn’t need to finish for me to get some value out of this exercise.
I’m not expecting a eureka moment out of this and will post back with whatever happens. Or not, if the results happen to be post-human good!
Are We Having Fun?
I gotta tell you, going from not having anything remotely resembling AI agency 10 months ago to watching Codex do its thing is just mind-blowing.
“Keep iterating until extraordinary” was the last sentence in Claude’s prompt to Codex. I suppose that’s the dream?