Some info on multiple targets?

This post made me curious, so here is the correlation matrix between the targets, plus a dendrogram that highlights how they cluster.

Here is the code.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


targets = [
    'target_nomi_v4_20',
    'target_nomi_v4_60',
    'target_jerome_v4_20',
    'target_jerome_v4_60',
    'target_janet_v4_20',
    'target_janet_v4_60',
    'target_ben_v4_20',
    'target_ben_v4_60',
    'target_alan_v4_20',
    'target_alan_v4_60',
    'target_paul_v4_20',
    'target_paul_v4_60',
    'target_george_v4_20',
    'target_george_v4_60',
    'target_william_v4_20',
    'target_william_v4_60',
    'target_arthur_v4_20',
    'target_arthur_v4_60',
    'target_thomas_v4_20',
    'target_thomas_v4_60']

# analyse the validation data, but we could do the same on the training data
df = pd.read_parquet('v4/validation.parquet', columns=targets + ['era'])

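# within each era, compute the Spearman correlation of every target with every
# other target, then average over eras (DataFrame.mean(level=...) was removed in
# pandas 2.0, so group on the target level of the (era, target) index instead)
corr = df.groupby('era').corr(method='spearman').groupby(level=1).mean()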

# arrange the order of the rows and columns (for visualization) so that they
# are sorted by correlation with the target 'target_nomi_v4_20'
# (the matrix is symmetric, so the same order works for both axes)
order = corr['target_nomi_v4_20'].sort_values(ascending=False).index
corr = corr.loc[order, order]

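# a 20x20 annotated heatmap needs a large figure and short number labels
plt.figure(figsize=(14, 12))
sns.heatmap(corr, annot=True, fmt='.2f')
plt.show()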

sns.clustermap(corr)
plt.show()
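For anyone who wants to check the two steps above without downloading the data, here is a minimal sketch on synthetic data. The helper names `era_mean_corr` and `dendrogram_order`, and the made-up columns `target_a_20`, `target_a_60`, `target_b_20`, `target_b_60`, are hypothetical and are not Numerai targets. The clustering uses the distance 1 - corr with average linkage, which is an assumption on my part: `sns.clustermap` by default clusters the rows of the matrix with Euclidean distance (and average linkage), so its leaf order can differ.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform


def era_mean_corr(df: pd.DataFrame, targets: list, era_col: str = "era") -> pd.DataFrame:
    """Spearman correlation between targets within each era, averaged over eras."""
    per_era = df.groupby(era_col)[targets].corr(method="spearman")
    # rows are indexed by (era, target): average over the era level
    mean_corr = per_era.groupby(level=1).mean()
    # groupby sorts the target labels, so restore the original order on both axes
    return mean_corr.loc[targets, targets]


def dendrogram_order(corr: pd.DataFrame) -> list:
    """Leaf order of an average-linkage clustering on the distance 1 - corr."""
    dist = 1.0 - corr.to_numpy()
    # force exact symmetry and a zero diagonal before converting to condensed form
    dist = (dist + dist.T) / 2.0
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    return list(corr.index[leaves_list(Z)])


# synthetic data (hypothetical): two pairs of strongly related "targets"
rng = np.random.default_rng(0)
n = 2000
base_a = rng.normal(size=n)
base_b = rng.normal(size=n)
demo = pd.DataFrame({
    "era": np.repeat(["era1", "era2", "era3", "era4"], n // 4),
    "target_a_20": base_a + 0.1 * rng.normal(size=n),
    "target_a_60": base_a + 0.3 * rng.normal(size=n),
    "target_b_20": base_b + 0.1 * rng.normal(size=n),
    "target_b_60": base_b + 0.3 * rng.normal(size=n),
})
cols = ["target_a_20", "target_a_60", "target_b_20", "target_b_60"]

corr = era_mean_corr(demo, cols)
order = dendrogram_order(corr)
print(corr.round(2))
print(order)
```

To make `sns.clustermap` use this correlation distance instead of its Euclidean default, the linkage matrix can be passed in through its `row_linkage` and `col_linkage` parameters.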
