Consistency Algorithm


#1

What is the algorithm used to test for consistency?

I can see the code used for testing concordance and originality in this GitHub repo, but I cannot find where consistency is calculated.


#2

Ohh, I found it: it's in the DatabaseManager class inside the database_manager.py file.


#3

For anyone interested, I wrote a simplified version of the consistency calculation. The function expects numpy arrays for all of its inputs:

from sklearn.metrics import log_loss
import numpy as np

def calc_consistency(labels, preds, eras):
    """Calculate the consistency score.

    Args:
        labels: (np array) The correct class ids (0 or 1)
        preds:  (np array) The predicted probabilities for class 1
        eras:   (np array) The era each sample belongs to

    Returns:
        The percentage of eras whose log loss beats random guessing.
    """
    unique_eras = np.unique(eras)
    better_than_random_era_count = 0
    for era in unique_eras:
        # Boolean mask selecting the samples that belong to this era
        this_era_filter = eras == era
        # Pass labels=[0, 1] so an era that happens to contain only one
        # class does not raise an error
        era_logloss = log_loss(labels[this_era_filter],
                               preds[this_era_filter],
                               labels=[0, 1])
        # A constant prediction of 0.5 scores exactly -ln(0.5) ~= 0.693,
        # so any era below that threshold beats random guessing
        if era_logloss < -np.log(0.5):
            better_than_random_era_count += 1

    consistency = better_than_random_era_count / len(unique_eras) * 100
    return consistency
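
As a quick sanity check, here is one way you could call it. The arrays below are synthetic and only meant to show the expected shapes, not real tournament data:

n_eras, era_size = 10, 100
eras = np.repeat(np.arange(n_eras), era_size)            # 10 eras, 100 samples each
labels = np.tile([0, 1], n_eras * era_size // 2)         # alternating 0/1 class ids
preds = np.where(labels == 1, 0.55, 0.45)                # mildly informative predictions

print(calc_consistency(labels, preds, eras))
# 100.0 here, since every era's log loss is -ln(0.55) ~= 0.598, below -ln(0.5)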