eleele
April 16, 2022, 10:49pm
Hi, as far as I understand, v4/train.parquet should not change from week to week, but today I noticed that the v4/train.parquet I downloaded one week ago differs from the one I downloaded today. The md5sums are:
00b6c902c6c01aa9b93928509f0a7d70 311/train.parquet
5e6b4f650354d5de0b274d983ef4c76a 312/train.parquet
Is there a reason?
Thanks
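For anyone who wants to reproduce this comparison without the md5sum command-line tool, here is a minimal Python sketch that uses only hashlib. The helper names md5sum and same_file are hypothetical; the file is read in chunks so a large train.parquet does not have to fit in memory.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b""
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file(path_a: str, path_b: str) -> bool:
    """True if both files have identical MD5 digests."""
    return md5sum(path_a) == md5sum(path_b)
```

With the two downloads above, same_file("311/train.parquet", "312/train.parquet") would return False, because the digests differ.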
jaqume
April 17, 2022, 3:11am
The new file has fewer lines:
8896499 v4_311/train.parquet
8879285 v4_312/train.parquet
and reading any file from the previous week gives this error:
OSError: Could not open Parquet input source '<Buffer>': Couldn't deserialize thrift: TProtocolException: Invalid data
It looks like the napi.download_dataset() function left them corrupted.
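A Parquet file begins with the 4-byte magic `PAR1` and ends with a 4-byte little-endian footer length followed by `PAR1` again. The footer is Thrift-encoded metadata, and that is the part the error above could not deserialize. As a rough first check for a damaged download, the stdlib-only sketch below (the helper `looks_like_complete_parquet` is hypothetical) verifies that framing. It is only a heuristic: a file can pass it and still have corrupt metadata, so actually opening the file, for example with `pyarrow.parquet.read_metadata`, remains the real test.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"


def looks_like_complete_parquet(path: str) -> bool:
    """Cheap structural sanity check for a Parquet file.

    Layout assumed: b"PAR1" <data> <footer metadata>
    <4-byte little-endian footer length> b"PAR1".
    Passing this check does NOT prove the footer metadata is valid.
    """
    with open(path, "rb") as fh:
        head = fh.read(4)
        fh.seek(0, os.SEEK_END)  # jump to the end to learn the file size
        size = fh.tell()
        if size < 12:  # leading magic + footer length + trailing magic
            return False
        fh.seek(size - 8)
        tail = fh.read(8)  # footer length (4 bytes) + trailing magic
    footer_len = struct.unpack("<I", tail[:4])[0]
    return (
        head == PARQUET_MAGIC
        and tail[4:] == PARQUET_MAGIC
        # the footer must fit between the leading magic and its length field
        and footer_len <= size - 12
    )
```

A truncated download typically loses the trailing magic, so this returns False for it; a file that passes should still be opened with a real Parquet reader before being trusted.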
eleele
April 17, 2022, 3:36pm
So it's a bug in numerapi? That's a bit unexpected, given how many users rely on that basic function…
Anyway, thanks!