Modern Portfolio Theory for my models


I’ve built a good number of models in recent months, and some of them look promising.
So I was wondering how I should stake on these models to get good returns with low volatility,
i.e. what weight I should give each of them.

Apparently modern portfolio theory is exactly for that.
Check out the math here: Modern portfolio theory - Wikipedia
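
For reference, the version of the problem on that page that is easiest to turn into code is the risk-tolerance form. A sketch in my own notation: $w$ is the vector of model weights, $\Sigma$ the covariance matrix of the model returns, $R$ the vector of expected returns and $q \ge 0$ the risk tolerance.

```latex
\min_{w} \; w^{\mathsf T} \Sigma \, w \;-\; q \, R^{\mathsf T} w
\quad \text{subject to} \quad \sum_i w_i = 1
```

With $q = 0$ this gives the minimum-variance weights; a larger $q$ accepts more volatility in exchange for more expected return.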

I also implemented it to check how I should split my stake (or the weights in an ensemble) between two models.
You can run it on your own models too: numerai/Portfolio optimalization.ipynb at main · nemethpeti/numerai · GitHub

Just replace the model names and multipliers with your own.
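
To give an idea of what a two-model split can look like, here is a minimal sketch. It is not the notebook's code: `best_two_model_weight` is a hypothetical helper, the returns are random numbers, and picking the split by the highest mean/std ratio is just one possible criterion.

```python
import numpy as np

def best_two_model_weight(returns_a, returns_b, n_steps=101):
    """Scan the weight of model A from 0 to 1 and return the split with the
    highest mean/std ratio of the combined per-round returns."""
    returns_a = np.asarray(returns_a, dtype=float)
    returns_b = np.asarray(returns_b, dtype=float)
    best_w, best_ratio = 0.0, -np.inf
    for w in np.linspace(0.0, 1.0, n_steps):
        combined = w * returns_a + (1.0 - w) * returns_b
        std = combined.std(ddof=1)
        if std == 0:
            continue  # ratio is undefined for a constant return series
        ratio = combined.mean() / std
        if ratio > best_ratio:
            best_w, best_ratio = w, ratio
    return best_w, best_ratio

# Made-up per-round returns for two hypothetical models
rng = np.random.default_rng(0)
model_a = rng.normal(0.02, 0.03, 20)
model_b = rng.normal(0.015, 0.02, 20)

w_a, ratio = best_two_model_weight(model_a, model_b)
print(f"weight A: {w_a:.2f}, weight B: {1 - w_a:.2f}, mean/std: {ratio:.2f}")
```

Because the scan includes the endpoints 0 and 1, the chosen split is never worse (by this in-sample measure) than staking on either model alone.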

Have fun!
Feedback is welcome!


Scary how I thought earlier this week about how to optimize my model returns, and then you create this post. Will definitely give it a try if I find some time on the weekend.

So I couldn’t wait: I realised that, thanks to my research earlier this week, I already had most of what I needed, and my curiosity became too great.

Your code only works for two models, which I didn’t like, so I read the wiki article you posted and quickly cobbled together some code that does the same thing for any number of models. I don’t have shareable code for collecting the necessary model performance data, but I wanted to share the optimization part. Feel free to update your GitHub repo with this, as I don’t have a GitHub account yet.

So suppose you have a representation of your own model performances in a pandas dataframe that looks like the result generated by this code:

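import numpy as np
import pandas as pd

models = ["modelA", "modelB", "modelC"]

#random per-round corr and tc values, 20 rounds per model
fake_tcs = np.random.normal(0., 0.08, (20, len(models)))
fake_corrs = np.random.normal(0.02, 0.02, (20, len(models)))
corr_columns = ["corr_" + m for m in models]
tc_columns = ["tc_" + m for m in models]

#one row per round, corr columns first, then tc columns
fake_df = pd.DataFrame(
    np.hstack([fake_corrs, fake_tcs]),
    columns = corr_columns + tc_columns
)

fake_df["round"] = np.arange(20) + 312

df = fake_df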

With a dataframe like the dummy one above, the optimization part looks like this:

import pandas as pd
from scipy.optimize import minimize
import numpy as np

#Your desired multipliers for each model
corr_multiplier = 1.
tc_multiplier = 0.5

#calculate the returns: one "return_<model>" column per model,
#equal to corr * corr_multiplier + tc * tc_multiplier in each round
mpo_data = pd.DataFrame({
    "return_" + m: df[ccol] * corr_multiplier + df[tcol] * tc_multiplier
    for m, ccol, tcol in zip(models, corr_columns, tc_columns)
})

def mpo_function(x, Rcov, q, R):
    #objective from the wiki article: portfolio variance minus q times portfolio return
    return x @ Rcov @ x - q * (R @ x)

#covariance matrix of the returns
Rcov = mpo_data.cov().values

#risk tolerance factor, if too high optimization will fail
q = 0.01

#return per model
R = mpo_data.sum(axis=0).values

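#assumed setup: start from equal weights, keep every weight between 0 and 1,
#and force the weights to sum to 1
n_models = len(models)
result = minimize(
    mpo_function,
    x0=np.full(n_models, 1. / n_models),
    args=(Rcov, q, R),
    bounds=[(0., 1.)] * n_models,
    constraints={"type": "eq", "fun": lambda x: x.sum() - 1.},
)
print("Minimization successful:", result.success)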

for w, m in zip(result.x, models):
    print(m, w)
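
Once you have weights, a quick sanity check is to compare them against plain equal weighting on the same return history. A minimal sketch: `portfolio_stats`, the random returns and the example weight vector are all made up for illustration and are not part of the code above.

```python
import numpy as np
import pandas as pd

def portfolio_stats(returns, weights):
    """Mean, std and mean/std of the weighted per-round return."""
    weights = np.asarray(weights, dtype=float)
    combined = returns.to_numpy() @ weights
    mean = combined.mean()
    std = combined.std(ddof=1)
    return {"mean": mean, "std": std, "mean/std": mean / std}

# Made-up per-round returns for three hypothetical models
models = ["modelA", "modelB", "modelC"]
rng = np.random.default_rng(1)
fake_returns = pd.DataFrame(
    rng.normal(0.02, 0.03, (20, len(models))),
    columns=["return_" + m for m in models],
)

candidates = {
    "equal": np.full(len(models), 1.0 / len(models)),
    "example": np.array([0.5, 0.3, 0.2]),  # placeholder, e.g. result.x
}
for name, w in candidates.items():
    stats = portfolio_stats(fake_returns, w)
    print(name, {k: round(v, 4) for k, v in stats.items()})
```

Keep in mind that both rows are measured on the same rounds the weights were fitted on, so the optimized weights will tend to look better here than they will on future rounds.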

Have fun with it, feel free to share / modify / improve, but don’t hold me liable if you lose money because I made a mistake :grin:

Awesome, thanks!

So I made some tweaks and merged it into my code, so it now works for many models.
Here you can find the generic version:


Super interesting approach @nyuton! I’m wondering what time window (from starting_round to end_round) would be necessary for the mean and std to be reliable.


At least 3 months, I guess. I don’t have any calculations to back that up, though.
But you need those 3 months anyway to validate the model in production.
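
One rough way to put numbers on this is the standard error of the mean, which shrinks with the square root of the number of rounds. A minimal sketch under loud assumptions: the per-round volatility is made up, 13/26/52 rounds stand in for roughly 3/6/12 months of weekly rounds, and rounds are treated as independent. If rounds overlap in time their results are correlated, so the real uncertainty would be larger than this.

```python
import numpy as np

def mean_standard_error(per_round_std, n_rounds):
    """Standard error of the average per-round return,
    assuming the rounds are independent."""
    return per_round_std / np.sqrt(n_rounds)

per_round_std = 0.03  # made-up per-round volatility of a model's return

for n_rounds in (13, 26, 52):
    se = mean_standard_error(per_round_std, n_rounds)
    print(f"{n_rounds:>2} rounds: standard error of the mean = {se:.4f}")
```

If the standard error is of the same size as the difference in mean return between two models, the window is too short to tell them apart on the mean alone.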