What is your machine spec (CPU and GPU) for Numerai?

Yeah, I probably could’ve been smarter about some things, but I ran out of RAM when I was modeling with XGBoost and I’m too lazy to get it working with less memory.

2 Likes

I currently use Google Colab Pro+ which has some sort of Intel Xeon, 50GB RAM, and an Nvidia Tesla V100. Occasionally I get lucky and score an A100 instance.

Yes, XGBoost is very greedy with memory, especially on Windows. It can easily run out of memory even with 64GB on the full v4 dataset.

For some reason, XGBoost behaves better under Linux, with fewer “out of memory” scenarios. I am not sure why.

In my experience, LightGBM can handle twice as much data as XGBoost using the same amount of RAM.
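
To illustrate why (a rough sketch, not a benchmark): LightGBM bins every feature into a small number of integer histogram bins when the `Dataset` is constructed, and with `free_raw_data=True` it can drop the original float matrix afterwards, so peak memory is closer to the binned representation than to the raw frame. The synthetic data and parameter values below are just placeholders.

```python
# Rough sketch (not a benchmark) of why LightGBM's peak memory is lower:
# features are binned into integer histograms at Dataset construction,
# and the raw float matrix can be released afterwards.
import gc
import numpy as np
import lightgbm as lgb

X = np.random.rand(100_000, 300).astype(np.float32)  # stand-in for Numerai features
y = np.random.rand(100_000).astype(np.float32)

train_set = lgb.Dataset(
    X, label=y,
    params={"max_bin": 63},   # fewer histogram bins -> smaller binned dataset
    free_raw_data=True,       # let LightGBM release the raw matrix after binning
)
del X, y
gc.collect()

booster = lgb.train(
    {"objective": "regression", "learning_rate": 0.01, "num_leaves": 32},
    train_set,
    num_boost_round=100,
)
```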

EDIT: Google is now enforcing a credit system with Colab, so I can’t use their high-end GPUs nonstop anymore :frowning: I will probably shift model training to my desktop, which is an Intel 8700K, 64 GB RAM, and an Nvidia 3090.

1 Like

Yeah, but you can’t say XGBoooOOOOOOSTTT with LightGBM :confused:

1 Like

The dataset will probably get bigger in the future and even 128 GB won’t be enough. Hopefully the price of NMR goes 10x before that so I can buy a machine with even more memory.

4 Likes

@dzheng1887
By the way, why do you use XGBoost? You do not use LightGBM?

CPU: AMD TR 1920x
GPU: RTX 3090 x2
RAM: 128GB

I am running my models for free :crazy_face: in Kaggle notebooks on the CPU runtime, mostly XGBoost or LGBM without GPU or TPU.
But @nyuton here explains how you can train your model on the full V4 dataset with only 8GB RAM:
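
For anyone who wants the gist: I can’t speak for nyuton’s exact recipe, but the usual low-memory tricks are to grab the int8-encoded parquet and to read only the columns you actually need. The dataset file names below are assumptions — check `napi.list_datasets()` for the current ones.

```python
# Sketch of the usual low-memory approach (not necessarily nyuton's exact
# recipe): int8-encoded parquet + reading only a subset of columns.
# File names are assumptions -- check napi.list_datasets().
import json
import pandas as pd
from numerapi import NumerAPI

napi = NumerAPI()
napi.download_dataset("v4.1/train_int8.parquet", "train_int8.parquet")
napi.download_dataset("v4.1/features.json", "features.json")

with open("features.json") as f:
    feature_set = json.load(f)["feature_sets"]["small"]   # smaller feature set

columns = feature_set + ["era", "target"]
train = pd.read_parquet("train_int8.parquet", columns=columns)  # int8 features
print(f"{train.memory_usage(deep=True).sum() / 1e9:.1f} GB in RAM")
```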

3 Likes

3060 Ti (8GB) + Intel i9-10980XE + 64GB RAM. Heavily leaning on Intel MKL for a lot of the calculations, and only sending the NN training off to the GPU. A future GPU with more memory might allow me to stay on the GPU more, but I wasn’t going to pay more than what they asked for a 3060 Ti back when I bought one :-).

1 Like

No reason in particular; it wins a lot of Kaggle competitions and it’s sort of a running gag for me now to act silly and push data through XGBoost to solve all my problems.

That said, I learned recently that you can generally think of boosting algorithms as a non-parametric approach to estimation, so there’s good reason why it does well in general. Combining a non-parametric approach with true structural assumptions about how the data behaves is always better, but it requires more work and thinking about the actual underlying process than just getting predictions out of XGBoost.

1 Like

Primary machine:
CPU: 12-core Intel i9-10920X @ 3.5GHz
RAM: 256GB
GPU: 2x 3090 with 24GB of GPU memory on each card

Older secondary machine:
CPU: Quad-core Intel i7-7700K @ 4.2GHz
RAM: 64GB
GPU: 2x 2080TI with 11GB of GPU memory on each card

Both machines are headless.

3 Likes

Hi JRB,

I have 2x 3090 as well, blower-style 2-slot, so there’s room for 2 more if my PSU can handle it (at least 1 would require a riser cable).

Would linking them with NVLink help at all on the Numerai data, you reckon? Or not really? I’m mostly looking for an easy way to pool the RAM together for a total of 48GB.

I thought someone would be using a Threadripper. Does no one use it?

With PyTorch and cuML you can push all your ML load to GPUs. They are faster and cheaper than a Threadripper.
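
For GBDT specifically, the simplest way to try the GPU is XGBoost’s histogram tree method (LightGBM has a similar `device_type` option, and cuML covers the sklearn-style estimators). A minimal sketch; the synthetic data and parameter values are placeholders:

```python
# Minimal sketch: GBDT training on the GPU with XGBoost's histogram method.
# On XGBoost >= 2.0 this is tree_method="hist" + device="cuda"; on 1.x use
# tree_method="gpu_hist" instead. Data and parameters are placeholders.
import numpy as np
import xgboost as xgb

X = np.random.rand(100_000, 300).astype(np.float32)  # stand-in for Numerai features
y = np.random.rand(100_000).astype(np.float32)

model = xgb.XGBRegressor(
    tree_method="hist",
    device="cuda",            # GPU training; "gpu_hist" on older versions
    n_estimators=500,
    learning_rate=0.01,
    max_depth=5,
    colsample_bytree=0.1,
)
model.fit(X, y)
```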

2 Likes

Thank you very much. I was under the impression that for GBDT a many-core CPU is faster than a GPU. I will try it.

1 Like

I have NVLink on both my machines. I haven’t used it for Numerai data, although I have used it for some large computer vision models in the past. I don’t know of any automatic way to use multiple GPUs as one for this use case (model parallelism). When people say multi-GPU training, they usually mean training large batches on multiple GPUs (data parallelism), which is the easy case and trivially automated by all frameworks.

It’s fairly straightforward to place some layers (i.e. the weights for those layers) on different GPUs. I’ve done this with TensorFlow and JAX. It’s still a bit slower than using a single GPU, but noticeably faster than doing the same without NVLink, because device-to-device copies are much faster with it.
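
Roughly, the pattern looks like this (a PyTorch sketch of the same idea — the layer sizes are arbitrary): split the layers across the two cards and copy the activations between them, which is exactly the transfer NVLink accelerates.

```python
# Sketch of manual model parallelism: some layers live on cuda:0, the rest on
# cuda:1, and activations are copied between devices in forward() -- that
# device-to-device copy is what NVLink speeds up. Layer sizes are arbitrary.
import torch
import torch.nn as nn

class TwoGpuMLP(nn.Module):
    def __init__(self, n_features: int):
        super().__init__()
        self.part0 = nn.Sequential(nn.Linear(n_features, 2048), nn.ReLU()).to("cuda:0")
        self.part1 = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(),
                                   nn.Linear(512, 1)).to("cuda:1")

    def forward(self, x):
        x = self.part0(x.to("cuda:0"))
        x = x.to("cuda:1")              # cross-GPU copy (faster over NVLink)
        return self.part1(x)

model = TwoGpuMLP(n_features=1000)
out = model(torch.randn(64, 1000))      # output tensor ends up on cuda:1
```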

2 Likes

My CPU is a first-gen Threadripper: 12 cores, 24 threads, and fairly slow compared to latest-gen CPUs (I’ve overclocked it though, so all cores are running at 4GHz all the time). I use it mostly for pre- and post-processing. I’ve been contemplating whether I should upgrade to a second-gen 2990WX (32 cores, 64 threads), which is the biggest one the motherboard supports. I don’t really need it though.

1 Like

Thanks for a very thorough answer!

3090 x 2… Do the house lights dim just a little when you fire that up? :slight_smile: Nice setup!

3 Likes

An Asus Eee PC 901, shipped with an Intel N270 Atom CPU clocked at 1.6GHz and 1GB of RAM.
It runs Debian 9 and I use it for inference only!!!
I have models using XGBoost and that’s fine. I had to struggle a bit more with my PyTorch models, as it is a 32-bit device and PyTorch does not work on 32-bit systems. So I have broken my models down to run them with simple linear algebra using NumPy.
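
A rough sketch of the idea (the layer names and the .npz file are just illustrative): dump the trained network’s weights on a 64-bit machine, and then inference on the netbook is only a couple of NumPy matrix multiplies.

```python
# On the training machine (64-bit, PyTorch available), dump the weights once:
#   np.savez("mlp.npz", **{k: v.cpu().numpy() for k, v in model.state_dict().items()})
#
# On the 32-bit netbook, inference is plain NumPy linear algebra.
# Layer names ("fc1", "fc2") and the file name are illustrative.
import numpy as np

weights = np.load("mlp.npz")

def predict(x: np.ndarray) -> np.ndarray:
    h = np.maximum(x @ weights["fc1.weight"].T + weights["fc1.bias"], 0.0)  # Linear + ReLU
    return h @ weights["fc2.weight"].T + weights["fc2.bias"]                # output layer

live_features = np.random.rand(10, 1000).astype(np.float32)  # stand-in for live data
preds = predict(live_features)
```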

Of course, for training I have other devices: an MSI laptop (i7, 16GB, GTX 1060) and a retired open-air mining rig (AMD CPU, 12GB, GTX 1080), but no data science war machine.

All models are running in the cloud. I started with Azure ML Studio, moved to Colab and Python as others suggested, then moved to Kaggle notebooks, and now I’m using Deepnote for the daily submissions.