Era distribution

Interesting era distribution. Am I missing something? I was expecting the eras to be continuous. Or maybe the numbers do not matter? I also expected the training set to cover a longer range of eras.

import pandas as pd
import plotly.express as px
import plotly.io as pio

# Open figures in the default web browser
pio.renderers.default = "browser"

training_data = pd.read_csv("numerai_training_data.csv")
tournament_data = pd.read_csv("numerai_tournament_data.csv")

# Stack training and tournament rows into one frame
df = pd.concat([training_data, tournament_data], ignore_index=True)

# The "eraX" label has no number; map it to a placeholder so every label parses as an int
df.loc[df["era"] == "eraX", "era"] = "era0"
df["era_value"] = df["era"].str[3:].astype(int)

# Place the former "eraX" rows just after the highest numbered era
max_era = df["era_value"].max()
df.loc[df["era_value"] == 0, "era_value"] = max_era + 1

fig = px.scatter(df, x="era_value", y="data_type", color="data_type",
                 title="Interesting era distribution")
fig.show()
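To check the "continuous eras" expectation numerically rather than by eye, a minimal sketch like the one below lists, for each data_type, the first and last era and any era numbers missing in between. It assumes an integer era_value column like the one built above; the helper name era_summary and the toy data are made up for illustration and are not real Numerai eras.

```python
import pandas as pd


def era_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per data_type: first/last era, number of distinct eras, and missing eras.

    Assumes an integer 'era_value' column (hypothetical helper, not a Numerai API).
    """
    rows = []
    for data_type, group in df.groupby("data_type"):
        eras = sorted(int(e) for e in group["era_value"].unique())
        first, last = eras[0], eras[-1]
        # Any integer between first and last that never appears is a gap
        missing = sorted(set(range(first, last + 1)) - set(eras))
        rows.append({"data_type": data_type, "first": first, "last": last,
                     "n_eras": len(eras), "missing": missing})
    return pd.DataFrame(rows).set_index("data_type")


# Toy data for illustration only
toy = pd.DataFrame({
    "era_value": [1, 2, 3, 5, 6, 10, 11, 12],
    "data_type": ["train"] * 5 + ["validation"] * 3,
})
print(era_summary(toy))
```

On the real frame you would call era_summary(df) after the era_value column has been created.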


Not all eras are monthly. The live, test and val2 eras are weekly, which is why they contain more eras than you would expect. Below is how the data is currently formatted. In the latest fireside chat they said this will change soon, though: they are planning to give us the target values for all train, val and test eras in a weekly format (if I understood correctly). Hope this helps.


Edit: ignore the “current”/“proposed” titles. This post is from a long time ago, so only the “proposed” image is relevant today (but, as I said, it will probably change soon).
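To see why weekly eras inflate the count: the same stretch of calendar time yields roughly four times as many weekly eras as monthly ones. A toy calculation, assuming exactly 12 monthly or 52 weekly eras per year (a simplification; the actual era dates are not given in this thread, and both helper names are hypothetical):

```python
# Simplifying assumption: exactly 12 monthly or 52 weekly eras per year
ERAS_PER_YEAR = {"monthly": 12, "weekly": 52}


def eras_for_span(years: float, period: str) -> int:
    """Approximate number of eras needed to cover `years` of data."""
    return round(years * ERAS_PER_YEAR[period])


def years_covered(n_eras: int, period: str) -> float:
    """Approximate calendar span, in years, of `n_eras` eras."""
    return n_eras / ERAS_PER_YEAR[period]


print(eras_for_span(2, "monthly"))  # 24
print(eras_for_span(2, "weekly"))   # 104
```

So a block of weekly eras looks much "longer" on an era-number axis than a monthly block covering the same period.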