Era distribution

marianotir · July 29, 2021, 3:38pm

Interesting era distribution. Am I missing something? I was expecting the eras to be continuous. Or maybe the numbers do not matter? I also expected a longer run of train eras.

import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
from plotly.subplots import make_subplots

# Open figures in the default web browser
pio.renderers.default = "browser"

training_data = pd.read_csv("numerai_training_data.csv")
tournament_data = pd.read_csv("numerai_tournament_data.csv")

df = pd.concat([training_data, tournament_data], ignore_index=True)

# The live era is labelled 'eraX'; map it to 'era0' so the suffix parses as an int
df.loc[df['era'] == 'eraX', 'era'] = 'era0'
df['era_value'] = df['era'].str[3:].astype(int)
max_era = df['era_value'].max()
print(max_era)
# Place the live era right after the highest numbered era
df.loc[df['era_value'] == 0, 'era_value'] = max_era + 1

fig = px.scatter(df, x="era_value", y="data_type", color="data_type",
                 title="Interesting era distribution")
fig.show()
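
To check the "continuous era" expectation numerically rather than by eye, here is a minimal sketch that lists the era numbers missing between the minimum and maximum of each data_type. The helper name missing_eras and the toy values are made up for illustration; only the era_value and data_type column names come from the code above.

```python
import pandas as pd

def missing_eras(df: pd.DataFrame) -> dict:
    """Per data_type, list era numbers absent between that type's min and max."""
    gaps = {}
    for data_type, group in df.groupby("data_type"):
        present = set(group["era_value"])
        full = set(range(min(present), max(present) + 1))
        gaps[data_type] = sorted(full - present)
    return gaps

# Toy data, invented for illustration (not real Numerai era numbers)
toy = pd.DataFrame({
    "data_type": ["train", "train", "train", "validation", "validation"],
    "era_value": [1, 2, 3, 121, 125],
})
gaps = missing_eras(toy)
print(gaps)
```

On the real data this would be called as missing_eras(df) after era_value has been computed.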

andy_shaps · July 30, 2021, 10:22am

Not all eras are monthly. The live, test and val2 eras are weekly, which results in more eras in those sets than you would expect. Below is how the data is currently formatted. In the latest fireside chat they said this will change soon, though: they are planning to give us the target values for all train, val and test eras in a weekly format (if I understood correctly). Hope this helps.

Edit: ignore the current/proposed titles. This post is from a long time ago, so only the "proposed" image is relevant today (but as I said, this will probably change soon).
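
One way to see the monthly versus weekly difference in the data itself is to count distinct eras per data_type. This is only a sketch on invented rows; the era and data_type column names are taken from the original post's code, and the counts here say nothing about the real dataset.

```python
import pandas as pd

# Toy rows, invented for illustration
toy = pd.DataFrame({
    "era": ["era1", "era1", "era2", "era121", "era122", "era123", "era124"],
    "data_type": ["train", "train", "train",
                  "validation", "validation", "validation", "validation"],
})

# Number of distinct eras in each data_type
eras_per_type = toy.groupby("data_type")["era"].nunique()
print(eras_per_type)
```

If a set covers a similar calendar span but shows several times as many eras, that would be consistent with it using weekly rather than monthly eras.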
