Calculating LogLoss in Python


#1

Hi, noob here.

Can someone explain how to calculate the logloss in Python? I tried using sklearn.metrics.log_loss(), but my values are way off.

Stephane


#2

Solved after reading the following:

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html

http://www.exegetic.biz/blog/2015/12/making-sense-logarithmic-loss/

https://www.kaggle.com/wiki/LogLoss
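The linked pages describe the same formula for binary logloss. Below is a minimal plain-Python sketch of it, assuming 0/1 labels and predicted probabilities for class 1. The function name and the clipping constant `eps` (which keeps log(0) from ever being evaluated) are illustrative assumptions, not scikit-learn's exact internals.

```python
import math


def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary logloss: -(1/N) * sum(y*ln(p) + (1-y)*ln(1-p)).

    y_true: iterable of 0/1 labels.
    p_pred: iterable of predicted probabilities that the label is 1.
    eps:    clipping constant (an assumption) so ln(0) is never evaluated.
    """
    y_true = list(y_true)
    p_pred = list(p_pred)
    if len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)
```

For example, `binary_log_loss([1, 0], [0.9, 0.2])` gives about 0.1643, the mean of -ln 0.9 and -ln 0.8. Confident wrong answers are punished hard: a probability near 0 for a row whose label is 1 adds roughly -ln(eps) to the sum.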


#3

Hello,

Just for information, for anyone stuck getting different logloss values in their own calculations:

Numerai calculates the logloss on the “validation” set in the tournament data. So simply copy the rows of this file that contain “validation” in the data_type column to a new CSV.
Then make the test prediction on this file and compute the logloss from eval_y and prediction, where:

eval_y is the “target” column from the copied part of the tournament file.
prediction is the output of predict_proba on eval_x (the feature columns of the same file).
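A minimal sketch of this procedure, filtering the validation rows in memory instead of copying them to a new CSV. It assumes the tournament data is already loaded into a pandas DataFrame with a `data_type` column, a `target` column and feature columns whose names contain "feature", and that `model` is any fitted scikit-learn classifier with `predict_proba`. The helper name `validation_logloss` and the column-name convention are hypothetical.

```python
import pandas as pd
from sklearn.metrics import log_loss


def validation_logloss(tournament_df: pd.DataFrame, model) -> float:
    """Score a fitted classifier on the rows whose data_type is "validation".

    Hypothetical helper: assumes feature columns contain "feature" in their
    name and the label column is called "target".
    """
    validation = tournament_df[tournament_df["data_type"] == "validation"]
    feature_cols = [c for c in validation.columns if "feature" in c]
    eval_x = validation[feature_cols]
    eval_y = validation["target"]
    # predict_proba returns one probability column per class (ordered as
    # model.classes_); log_loss needs these probabilities, not the hard
    # 0/1 labels that predict() returns.
    prediction = model.predict_proba(eval_x)
    return log_loss(eval_y, prediction, labels=model.classes_)
```

Passing the full two-column `predict_proba` output together with `labels=model.classes_` keeps the probability columns matched to the right classes.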


#4

Thanks @DEUSVULT

There’s probably a more elegant way (or 20), but I cobbled together this block, which should work with example_model.py:

from sklearn import metrics

# Select the validation rows by mask instead of assuming they are the first rows.
# results_df is assumed to hold one prediction per tournament row, in the same order.
validation_mask = (prediction_data["data_type"] == "validation").values
eval_y = prediction_data.loc[validation_mask, "target"]
predictions = results_df.loc[validation_mask]
print("Logloss : %f" % metrics.log_loss(eval_y, predictions))

# Logloss : 0.692814
# Uploaded predictions Numerai: 0.69281
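To put numbers like 0.692814 in context: always predicting the base rate q of the target gives a logloss equal to the binary entropy -(q ln q + (1 - q) ln(1 - q)), which is ln 2 ≈ 0.6931 when the classes are balanced. If the validation targets are roughly balanced, a score of 0.692814 is therefore only slightly better than a constant guess. A small sketch of that baseline (the helper name is hypothetical):

```python
import math


def constant_prediction_logloss(y_true):
    """Logloss of always predicting the base rate q = mean(y_true).

    Equals the binary entropy -(q*ln q + (1-q)*ln(1-q)); a model has to
    score below this to be better than a constant guess. Hypothetical helper.
    """
    y_true = list(y_true)
    if not y_true:
        raise ValueError("need at least one label")
    q = sum(y_true) / len(y_true)
    if q in (0.0, 1.0):
        return 0.0  # all labels identical: the constant prediction is exact
    return -(q * math.log(q) + (1 - q) * math.log(1 - q))


print(constant_prediction_logloss([0, 1, 0, 1]))  # ln 2, about 0.6931
```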