I have a background in statistics, but what’s the most basic example of how data could be encrypted while preserving signal?
This is a paper discussing the type of method Numerai has previously suggested it uses.
It’s like a generative adversarial network in that you have two opposing networks: an encryption/decryption network and an eavesdropping network. The encryption/decryption pair shares a secret key and learns a reversible encoding of the data plus key. The eavesdropping network doesn’t have the key, so it only sees the encrypted output. The objective of the encryption/decryption pair is to minimize its own reconstruction error on the data while keeping the eavesdropper from reconstructing it. So at the end of the day, the “encrypted” data is really just a particular complicated transformation of the data. Presumably, since Numerai has access to the ground truth data and labels for everything, they can confirm that some sort of structure is preserved in the encrypted data, though unfortunately there is almost certainly some loss of information/structure as well.
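To make the two opposing objectives concrete, here is a minimal sketch in Python. It assumes mean squared error for both sides and a simple weighted difference for the encryption/decryption loss; the function names and the `penalty_weight` parameter are made up for illustration, the paper's actual loss may be shaped differently, and no networks or training loop are shown.

```python
import numpy as np

def reconstruction_error(original, guess):
    """Mean squared error between the true data and a reconstruction."""
    return float(np.mean((np.asarray(original) - np.asarray(guess)) ** 2))

def eavesdropper_loss(original, eve_guess):
    """The eavesdropper (no key) just wants to reconstruct the data well."""
    return reconstruction_error(original, eve_guess)

def encoder_decoder_loss(original, decoded, eve_guess, penalty_weight=1.0):
    """The keyed pair wants low error for itself and high error for the eavesdropper.

    penalty_weight is a hypothetical knob trading off the two goals.
    """
    own_error = reconstruction_error(original, decoded)
    eve_error = reconstruction_error(original, eve_guess)
    return own_error - penalty_weight * eve_error

# Toy numbers: the keyed decoder recovers the data exactly,
# while the eavesdropper can only guess zeros.
original = np.array([1.0, -1.0, 1.0, -1.0])
decoded = original.copy()
eve_guess = np.zeros(4)

pair_loss = encoder_decoder_loss(original, decoded, eve_guess)
eve_loss = eavesdropper_loss(original, eve_guess)
```

In training you would alternate updates: the eavesdropper steps to lower `eve_loss`, then the encryption/decryption pair steps to lower `pair_loss`, which pushes the eavesdropper's error back up.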
For other examples of encryption that can preserve signal in the data, check out homomorphic encryption: https://en.wikipedia.org/wiki/Homomorphic_encryption
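As a toy illustration of the homomorphic idea, here is a sketch of the Paillier scheme, which is additively homomorphic: multiplying two ciphertexts gives an encryption of the sum of the plaintexts. Nothing above says Numerai uses this; the tiny primes and the function names are my own choices for the example, and keys this small are completely insecure. It needs Python 3.8+ for `pow(x, -1, n)`.

```python
import math
import random

def keygen(p=293, q=433):
    """Toy Paillier key pair from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because we use g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public key n, with g = n + 1."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:  # r must be invertible mod n
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover m from ciphertext c using the private key (lam, mu)."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

n, priv = keygen()
c1 = encrypt(n, 20)
c2 = encrypt(n, 22)

# Multiply ciphertexts without ever decrypting the inputs...
c_sum = (c1 * c2) % (n * n)

# ...and the result decrypts to the sum of the plaintexts.
total = decrypt(n, priv, c_sum)
```

So someone holding only `c1` and `c2` can compute an encrypted sum, which is the sense in which the ciphertexts still carry usable structure.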