I am just starting out creating a model. What score should I be aiming for when I run the "score" method against the validation rows (data_type == 'validation')? I see that the top 100 people in the tournament score above 0.03, but that seems like a high target for a first model.
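import numpy as np

# Submissions are scored by Spearman (rank) correlation:
# rank the predictions, then take the Pearson correlation with the targets
def correlation(predictions, targets):
    ranked_preds = predictions.rank(pct=True, method="first")
    return np.corrcoef(ranked_preds, targets)[0, 1]

# convenience method for scoring a DataFrame with prediction and target columns
def score(df):
    return correlation(df[PREDICTION_NAME], df[TARGET_NAME])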
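
For context, here is a minimal sketch of how the scoring call above might be run on just the validation rows. The values of TARGET_NAME and PREDICTION_NAME are assumptions (they are not shown in the snippet above), and the DataFrame is synthetic toy data standing in for the tournament data, so the printed number says nothing about what a real model would score.

```python
import numpy as np
import pandas as pd

# Assumed column names; the snippet above does not show these constants.
TARGET_NAME = "target"
PREDICTION_NAME = "prediction"


def correlation(predictions, targets):
    # Rank predictions to percentiles, then take the Pearson correlation with targets.
    ranked_preds = predictions.rank(pct=True, method="first")
    return np.corrcoef(ranked_preds, targets)[0, 1]


def score(df):
    return correlation(df[PREDICTION_NAME], df[TARGET_NAME])


# Synthetic stand-in for the tournament data (not real tournament data).
rng = np.random.default_rng(0)
n = 1000
toy = pd.DataFrame({
    "data_type": np.where(np.arange(n) % 2 == 0, "train", "validation"),
    TARGET_NAME: rng.choice([0.0, 0.25, 0.5, 0.75, 1.0], size=n),
})
# Hypothetical predictions: the target buried under heavy noise.
toy[PREDICTION_NAME] = toy[TARGET_NAME] + rng.normal(scale=5.0, size=n)

# Keep only the validation rows, then score them.
validation = toy[toy["data_type"] == "validation"]
validation_score = score(validation)
print(f"validation correlation: {validation_score:.4f}")
```

Because the ranking uses method="first", tied predictions are ordered by their position in the frame, so even predictions identical to a binned target would score slightly below 1.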