Calculating LogLoss in Python


Hi, noob here.

Can someone explain how to calculate the logloss in Python? I tried using sklearn.metrics.log_loss(), but my values are way off.



Solved after reading the following:



Just some information for those who are stuck getting different logloss values in their own calculations.

Numerai calculates the logloss on the “validation” rows of the tournament test data. So simply copy the rows of that file that have “validation” in the data_type column to a new CSV.
Then you can simply make the test prediction on this file like this:

eval_y is the “target” column from the copied part of the test file.
prediction is your predict_proba output on eval_x (the feature columns of the same file).
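
For anyone who wants those steps spelled out, here is a rough sketch. It uses two tiny made-up frames instead of the real Numerai files, and the column names feature1/feature2, the LogisticRegression model and the variable names are my assumptions, not anything Numerai prescribes.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Hypothetical stand-in for the training file.
train = pd.DataFrame({
    "feature1": [0.1, 0.9, 0.2, 0.8, 0.3, 0.7],
    "feature2": [0.5, 0.4, 0.6, 0.3, 0.7, 0.2],
    "target": [0, 1, 0, 1, 0, 1],
})

# Hypothetical stand-in for the tournament file: only the
# "validation" rows carry a known target.
tournament = pd.DataFrame({
    "data_type": ["validation", "validation", "test", "validation", "live"],
    "feature1": [0.15, 0.85, 0.5, 0.6, 0.4],
    "feature2": [0.55, 0.35, 0.5, 0.45, 0.5],
    "target": [0, 1, None, 1, None],
})

feature_cols = ["feature1", "feature2"]

# Keep only the rows Numerai scores the logloss on.
validation = tournament[tournament["data_type"] == "validation"]
eval_x = validation[feature_cols]
eval_y = validation["target"].astype(int)

# Any classifier with predict_proba would do; this one is just an example.
model = LogisticRegression()
model.fit(train[feature_cols], train["target"])

# predict_proba returns one column per class; column 1 is P(target == 1).
prediction = model.predict_proba(eval_x)[:, 1]

score = log_loss(eval_y, prediction, labels=[0, 1])
print("Logloss : %f" % score)
```

The important part is that eval_y and prediction come from exactly the same rows, in the same order. Scoring against anything else (the whole file, or the training data) will give a different number than the site shows.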



There’s probably a more elegant way (or 20), but I cobbled together this block that should work with

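from sklearn import metrics

# prediction_data is the tournament file; results_df holds the predicted probabilities
validation_data = prediction_data.loc[prediction_data['data_type'] == "validation"]
num_validation_rows = len(validation_data)
eval_y = validation_data["target"]
# Assumes the validation rows come first in results_df, in the same order
predictions = results_df.iloc[:num_validation_rows]
print("Logloss : %f" % metrics.log_loss(eval_y, predictions))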

# Logloss : 0.692814
# Uploaded predictions Numerai: 0.69281
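
If your number still looks way off, it can help to compute the logloss by hand and compare it with sklearn. Below is a small sketch with made-up labels and probabilities; the helper name manual_log_loss is mine, not part of any library. It also shows that always predicting 0.5 gives ln(2), roughly 0.693, which is a handy baseline for scores like the one above.

```python
import numpy as np
from sklearn.metrics import log_loss


def manual_log_loss(y_true, p, eps=1e-15):
    """Binary logloss: -mean(y*log(p) + (1-y)*log(1-p))."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so that log(0) never occurs.
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))


# Made-up example values, not Numerai data.
y = [0, 1, 1, 0]
p = [0.45, 0.55, 0.60, 0.40]  # predicted P(target == 1)

by_hand = manual_log_loss(y, p)
by_sklearn = log_loss(y, p)
baseline = manual_log_loss(y, [0.5] * len(y))  # always predicting 0.5

print("by hand  : %f" % by_hand)
print("sklearn  : %f" % by_sklearn)
print("baseline : %f" % baseline)
```

If the two values agree on a toy example but your real score does not match the site, the mismatch is most likely in which rows you are scoring, not in the metric itself.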