Leaderboard Bonus Exploit Uncovered

As many of you are aware, we plan to remove the leaderboard bonus going forward. The primary reason I’ve been pushing for this discontinuation is that it is susceptible to certain attacks by bad actors. We can’t afford to have the most profitable piece of the tournament be exploitable: as the profitability of a system increases, so does the effort people will put into gaming it. I have been suspicious of a couple of accounts for some time now, but only today was I able to prove it.

P, 1-P

There is an attack that some users are aware of called the p, 1-p attack. This attack is specifically designed to take advantage of asymmetries in the reward structure. The premise is this:

  • Submit one high variance model.
  • Make another model that is the exact opposite of the first model.
  • Stake the exact same amount on both models.
  • Due to the high variance, one will likely do very poorly, while the other does extremely well.
  • However, payouts are now guaranteed to be exactly opposite for both models round to round.
  • The model which does well will have a high position on the leaderboard and receive the leaderboard bonus.
  • Since the round-to-round payouts will cancel out, there is exactly 0 risk from burns. There remains only potential profit from leaderboard bonus.
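
The cancellation in the last step can be sketched in a few lines: because Spearman correlation depends only on ranks, and 1-p reverses the ranks of p exactly, the two models’ scores are exact negatives every round. A minimal simulation (random targets and predictions chosen for illustration, not real tournament data):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical round: random targets and one high-variance prediction p
y = rng.random(5000)   # targets for 5000 tickers
p = rng.random(5000)   # the high-variance model's predictions

# The "opposite" model simply submits 1 - p
score_p  = spearmanr(p,     y).correlation
score_1p = spearmanr(1 - p, y).correlation

# Spearman only depends on ranks, and 1 - p reverses them exactly,
# so the two scores cancel and net round-to-round payout risk is zero.
print(score_p + score_1p)  # ~0 up to floating-point error
```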

The Exploit

Madmin, numer.ai/madmin, has been at the top of the staked leaderboard for some 2 months now.

Some of our analytics have shown the model itself is not particularly interesting, just some linear combination of a few features. This leads to a very high variance model that is able to take high risk and has the potential to blow every other user away in the proper regime. We might be okay with this as a standalone model, as the downside risk is high as well.

What I discovered though is another account called Madmax, numer.ai/madmax.

This model has been consistently outside of the top 300.

I immediately decided to check if it’s a p, 1-p attack as the names might imply. Though close to opposite, the models are not actually inverses of one another (their scores do not sum to 0). However, they both started staking exactly 40 NMR on exactly the same date. While this is suspicious, it’s not enough evidence to take action against an exploit at this point.

While working on an MMC payouts proposal, though, I realized that I should find some hard evidence to make my case for discontinuing the leaderboard bonus.

So through some investigation, I was able to uncover a third account. The_Guy, numer.ai/the_guy.

This user sits outside of the top 300 as well. They started staking exactly 40 NMR on exactly the same date as Madmax and Madmin. The account was created exactly 1 day before Madmax. And for any given round, if you sum the scores of Madmax, Madmin, and The_Guy, the result is always 0.

So like the p, 1-p attack, this set of models achieves 0 risk, but it actually has an even higher chance of one model achieving high reputation, because three equally spaced high-variance models are spread across the spectrum rather than two.

These models began with 40 NMR each in October 2019.
To date, the models have a combined 222 NMR.
17 for Madmax, 38 for The_Guy, and 167 for Madmin.
This is a clear exploitation of the payout system for over 100 NMR and 85% returns in less than 6 months.

The Punishment

From docs.numer.ai:

Let me first say, that as a “crypto company”, we are sympathetic towards the idea that code is law, and if someone exploits it, it is the code writer’s fault. It is a goal of the company to decentralize to the point where we can live by this.

On the other hand, as the docs say for now, “We reserve the right to refund your stake and void all earnings and burns if we believe that you are actively abusing or exploiting the payout rules.” We feel like protecting the integrity of the tournament for legitimate users is quite obviously more critical than adhering to this crypto idealism at present.

In this case, we believe the following actions are just:

  • Revert all payouts. This means they will receive all of their originally staked 120 NMR, but all payouts and burns will be undone.
  • Ban all three accounts from the tournament going forward.

We will be discontinuing the leaderboard bonus in 100 days, as planned, removing this asymmetry and disabling the attack vector. There is little risk of exploitation in the coming 100 days, as new stakes will not take effect by then.

And of course, we will be monitoring the leaderboard and banning without remorse any users suspected of such an attack.

We have reconsidered the punishment terms since the writing of this post. Now the punishment is this:

  • The accounts are ineligible for the leaderboard and the leaderboard bonus for its remaining 100 days.
  • No ban going forward.
  • They keep all NMR earned up to this point.

There are a few key points contributing to this decision.

  1. No rebalancing. If the user was truly trying to achieve zero risk, they would need to rebalance their NMR across the models every few weeks. The fact that no rebalancing occurred points to it being a viable strategy and genuine “skin in the game” on Madmin, with belief that it was going to continue to be the best model going forward.
  2. We don’t want users to have to think about potential punishments. They should be able to play the game given the incentives we’ve given them. That’s why the leaderboard bonus is going away. We see it as our responsibility to build payout systems that are not exploitable. We think this type of modeling could be viable even without the leaderboard bonus, and we don’t want to inhibit this type of creative thinking and hedging going forward.
  3. Already received payments should be final. We don’t want to set a precedent where we take away NMR that’s already been earned and in the user’s account.

Ultimately it is our responsibility to make a tournament in which users are incentivized to work towards goals in the intended way. Users should not have the burden of thinking about what is legal and what is not, or about how the Numerai team will view their behavior.

We still reserve the right to revert payouts, as described in this original post, but we will only do so in the most extreme, unquestionable circumstances. This case does not meet that bar.


Great job Mike! After reading your post I had to think for a bit to figure out how one would pull this off and how to generalize it to any number of models. As long as you are using linear models it’s fairly simple; it just takes a bit of geometric thinking. For three models you just need the three 310-length vectors that define the models to be coplanar and 120 degrees apart. Any projection onto three such vectors will sum to exactly 0, so you get three models whose predictions perfectly cancel each other out, just like p/1-p. Because Spearman correlation isn’t a simple linear operation, the average score, and thus the total risk, doesn’t come out to exactly 0 (nor does it for madmax, madmin, and the_guy), but it’s close enough! I wrote some code below to illustrate this and made it work for an arbitrary number of models.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import spearmanr

def angle_between(v1, v2):
    # angle in radians between two vectors
    v1_u = v1 / np.linalg.norm(v1)
    v2_u = v2 / np.linalg.norm(v2)
    return np.arccos(np.clip(np.dot(v1_u, v2_u), -1.0, 1.0))

def planar_vector(q1, q2, angle):
    # point on the unit circle spanned by orthonormal axes q1, q2
    return q1 * np.cos(angle) + q2 * np.sin(angle)

def orthogonalize(v1, v2):
    # Gram-Schmidt: orthonormal e1, e2 spanning the same plane as v1, v2
    e1 = v1 / np.linalg.norm(v1)
    u2 = v2 - v1 * np.dot(v1, v2) / np.dot(v1, v1)
    e2 = u2 / np.linalg.norm(u2)
    return e1, e2

# Define our axes. v1 and v2 could be random vectors, fitted models, etc.,
# whatever you think might define a good plane. The first axis is going
# to be collinear with v1.
v1 = np.random.randn(310)
v2 = np.random.randn(310)

# Make them 90 degrees apart to serve as axes for our plane
axis1, axis2 = orthogonalize(v1, v2)

number_of_models = 3

# Evenly spaced vectors on the plane: 0, 120, 240 degrees for 3 models
models = [planar_vector(axis1, axis2, a * np.pi / 180.)
          for a in np.arange(0, 360, 360 / number_of_models)]

perfs = []

for era in range(20):
    # Simulate live data that changes every week,
    # along with a "true" model that changes every week

    # generate random data
    x = np.random.randn(5000, 310)
    # generate random model
    h = np.random.randn(310)
    # get output from random model
    y = np.dot(x, h)

    # get predictions from co-planar models
    preds = [np.dot(x, model) for model in models]

    # get correlations for all co-planar models
    perfs.append([spearmanr(pred, y).correlation for pred in preds])

perfs = np.array(perfs)

plt.plot(perfs)
plt.plot(perfs.mean(axis=1), 'k--')
plt.legend([f'model {i+1}: {perfs[:, i].mean():.4f}'
            for i in range(number_of_models)] + ['average correlation'])
plt.show()

So you might be thinking: with a lot of models on a plane, it might start getting fairly crowded, with the models becoming quite correlated with each other. Is there any way to achieve zero risk but spread the models out in space a bit more? Well, in three dimensions, rather than on a plane, spreading the model vectors out such that an arbitrary projection always cancels doesn’t work for an arbitrary number of vectors. I believe the only way to do it exactly is to have the models be the vertices of a regular polytope, so in three dimensions you can only do it for 4, 6, 8, 12, and 20 models/vertices, i.e. the vertices of the Platonic solids. I leave the work of extending the code to 3+ dimensions as an exercise for the reader :wink:
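
To make the 3-D claim concrete, here is a small sketch of my own construction (not from the post): embed the six vertices of a regular octahedron in a random 3-D subspace of the 310-dimensional feature space. Since the vertices of a regular polytope sum to zero, the linear predictions cancel exactly for any data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Pick a random orthonormal 3-D basis inside the 310-D feature space
basis, _ = np.linalg.qr(rng.standard_normal((310, 3)))

# Octahedron vertices: +/- each basis vector -> six model weight vectors
models = [s * basis[:, i] for i in range(3) for s in (+1, -1)]

# Vertices of a regular polytope sum to zero
assert np.allclose(sum(models), 0)

# Any projection of any data matrix onto these models cancels per row
x = rng.standard_normal((5000, 310))
preds = np.stack([x @ m for m in models])
print(np.abs(preds.sum(axis=0)).max())  # ~0: six predictions cancel
```

As with the planar version, Spearman scores on these predictions will only cancel approximately, since rank correlation is not a linear operation.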


What I have argued is that this is a very natural thing to try, but starting from models with positive correlation rather than valueless ones. In the right basis one can see that the models are orthogonal but that their sum is nonzero; the center of the polytope is offset from zero. However, the orthogonalization process can reduce the performance of the individual models, and in that case it might look like the same exploit being discussed.

Ok, so there’s a second flavor of the p, 1-p attack. For that attack to be feasible it has a footprint: some collection of models adds up to 0 (or close to 0, as in the chart above). The probability of that being a coincidence seems tiny, ESPECIALLY combined with symmetrical staking. Couldn’t the Numerai backend systems behind the API detect that as people submit, to put a stop to it?

As a person doing this tournament on the up and up it’s been rough watching all of these massive shifts, on a regular basis. I’d hate to see this tournament throw out the baby with the bath water on this one.

So what exact rule was broken? Where is it made explicit that submitting multiple (inverse) models is “abuse” or an “exploit”?

One could argue that this is just a case of a user or group of users hedging against their own models, right? They staked all models and one stuck and is contributing to your meta-model—consistently! The other models sucked and lost their stake.

I don’t really see how the best-scoring model so far can be considered an “attack”. It provides valuable predictions against a risk and contributes to the meta-model!

I’m just trying to wrap my head around this. Isn’t hedging the name of the game?


It does undermine the intent of the tournament. If a staked model is considered trusted, a group of models like this is actually negative. 2 trusted voices say do this garbage (really just hedge), and one trusted voice says do this better thing. Even if the 1 good model is the best in the whole world, the person who said it also told 2 lies.

The fix isn’t too bad. Take a small sample of data and check heavily for pairings of models, if some look suspect, analyze them further, and repeat. That wouldn’t even be too terrible on compute costs.
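
As a rough sketch of that detection idea (the function name, group size, and threshold here are hypothetical, not Numerai’s actual backend check), one could center each submission and flag groups whose predictions sum to nearly zero:

```python
import numpy as np
from itertools import combinations

def find_cancelling_groups(preds, group_size=2, tol=1e-6):
    """Flag groups of models whose centered predictions sum to ~0.

    preds: dict mapping model name -> 1-D prediction array over the
    same tickers. Hypothetical sketch of the footprint check above.
    """
    # Center each prediction so constant offsets (e.g. p vs 1-p,
    # which sum to 1 everywhere) are also caught.
    centered = {m: v - v.mean() for m, v in preds.items()}
    flagged = []
    for group in combinations(centered, group_size):
        total = sum(centered[m] for m in group)
        # A cancelling group leaves a residual that is tiny relative
        # to the magnitude of the individual predictions.
        scale = max(np.linalg.norm(centered[m]) for m in group)
        if np.linalg.norm(total) < tol * scale:
            flagged.append(group)
    return flagged

# Demo: a p / 1-p pair plus an unrelated honest model
rng = np.random.default_rng(1)
p = rng.random(1000)
preds = {"madmin": p, "madmax": 1 - p, "honest": rng.random(1000)}
print(find_cancelling_groups(preds))  # [('madmin', 'madmax')]
```

Checking every pair is quadratic in the number of models, but on a small sample of rows per model it stays cheap, and larger group sizes could be checked only for models flagged by other signals (identical stake amounts, same start date, etc.).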

Isn’t the intent of the tournament to let a ton of models compete, and let Numerai weed out the garbage models through a system of reputation, correlation, staking, etc.? And isn’t that working, since the other two models started at 40 NMR and now have a lower balance?

Now the best-scoring model submitted so far is getting kicked off the leaderboard, but why? I understand that people trying to find ways to game Numerai’s system is a pain in the butt. However, without clearly defining which rules cannot be broken this is a very arbitrary decision.

Should other users submitting multiple models be worried now if their models display negative correlations?


It seems to be staying on the leaderboard (just not on the staked one), and it still gets payouts if it performs. Just no more bonus.