Are predictions discrete or continuous?

Ranked sequences can have ties; there’s nothing strange or illegal about that. Indeed, pandas.Series.rank() offers several options for how to manage (and preserve) ties in rank. So having ties in a series doesn’t mean that it isn’t ranked.
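To make the tie-handling options concrete, here is a minimal sketch that ranks one small series with each `method` that `pandas.Series.rank()` accepts. The sample values are made up for illustration and are not taken from the tournament data.

```python
import pandas as pd

# Made-up sample with two tied pairs (0.25 and 0.5); illustration only.
s = pd.Series([0.25, 0.5, 0.5, 0.75, 0.25, 1.0])

# One column per tie-handling method supported by Series.rank().
methods = ["average", "min", "max", "first", "dense"]
ranks = pd.DataFrame({m: s.rank(method=m) for m in methods})

print(ranks)
```

With "average", "min", "max" and "dense", tied values share a rank, so the ties survive ranking. Only "first" breaks them, by order of appearance.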

But who am I, right? I’ll let @jrb chime in:

Are the targets ranked or not? We have two more senior folks here in disagreement (@wigglemuse @jrb). Maybe you two can work it out publicly so the rest of us can benefit from the debate?

I’m sticking with the idea that they are ranked, using a tie-handling method similar to, but not quite the same as, the pandas “dense” method. If I’m right, I’d like to see the true ranking code shared, because it seems strange to me to compute the Spearman between sequences that were ranked using different techniques for resolving ties.

I did discover that if I compute the Spearman between the targets and the targets re-ranked using the pandas “dense” method, I finally get a correlation coefficient of 1.0, which is nice, even though the re-ranked sequence doesn’t match the original sequence.

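import numpy as np

def correlation(predictions, targets):
    # Percentile-rank the predictions; "dense" gives tied values the same rank, with no gaps.
    ranked_preds = predictions.rank(pct=True, method="dense")
    return np.corrcoef(ranked_preds, targets)[0, 1]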

correlation(valid_df["target"], valid_df["target"])
> 0.99999999999

I’m sure someone will object to this change of ranking methods, even though it feels right to have the targets be 100% correlated with themselves.
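For anyone who wants to reproduce the self-correlation check without the tournament data, here is a rough sketch. It assumes the targets take five evenly spaced values (0, 0.25, 0.5, 0.75, 1), which is my assumption and not something established in this thread. The `target` series and the `self_correlation` helper are hypothetical stand-ins for `valid_df["target"]` and the function above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for valid_df["target"]: five evenly spaced levels (assumption).
target = pd.Series(rng.choice([0.0, 0.25, 0.5, 0.75, 1.0], size=1000))

def self_correlation(series, method):
    # Pearson correlation between the percentile-ranked series and the raw series.
    ranked = series.rank(pct=True, method=method)
    return np.corrcoef(ranked, series)[0, 1]

dense_corr = self_correlation(target, "dense")  # ties keep a shared rank
first_corr = self_correlation(target, "first")  # ties broken by order of appearance

print(dense_corr, first_corr)
```

Under that assumption, the "dense" percentile ranks are a linear function of the target values, so the correlation comes out at 1.0 up to floating-point error. With "first" the ties are broken arbitrarily and the correlation falls below 1. If the levels were unevenly spaced, "dense" would no longer give exactly 1.0 either.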