Strange correlation behavior

I was watching this NNTaleb video on correlation ("MINI-LESSON 5: Correlation, the intuition. Doesn't mean what people usually think it means." on YouTube), in which he explains that correlation is often a poor measure of dependence between variables.

Here’s the example:

[image]

A nonlinear model can still use x to predict y, even when the linear correlation between the two tells you very little. The takeaway could be not to rely on correlation alone when deciding which features to include.
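
To make that concrete, here is a minimal sketch. The plot itself isn't reproduced here, so the data below is an assumed stand-in: a symmetric "triangle", y = |x|. The binned-mean predictor is just one hand-picked example of a simple nonlinear model, and the helper names (`pearson`, `r_squared`, `binned_mean_predictions`) are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def r_squared(ys, preds):
    """Fraction of the variance of ys explained by preds."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def binned_mean_predictions(xs, ys, n_bins=10):
    """Predict y as the mean of y within the bin that x falls into."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins
    def bin_of(x):
        return min(int((x - lo) / width), n_bins - 1)
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        b = bin_of(x)
        sums[b] += y
        counts[b] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [means[bin_of(x)] for x in xs]

# Assumed stand-in for the plotted example: y = |x| on [-1, 1].
xs = [i / 100 for i in range(-100, 101)]
ys = [abs(x) for x in xs]

r = pearson(xs, ys)
r2_binned = r_squared(ys, binned_mean_predictions(xs, ys))

print(f"Pearson correlation of x and y: {r:.3f}")
print(f"R^2 of the binned-mean model:   {r2_binned:.3f}")
```

On this toy data the correlation comes out at essentially zero, while the binned model explains most of the variance in y, so a feature screen based on correlation alone would have thrown x away.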

It is critical to distinguish between correlation and dependence…
https://cran.r-project.org/web/packages/NNS/vignettes/NNSvignette_Correlation_and_Dependence.html

I think a better takeaway is simply that correlation is limited and should be used judiciously. If you take your triangle example (or Taleb’s; thanks for posting the video, btw), you’ll note that while a single correlation over the whole domain doesn’t produce any useful information, two correlations (one on each leg of the triangle) would. That then raises a new question: how to partition the domain under analysis into suitable “regimes” where simple methods suffice.
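
A rough sketch of the two-correlations idea, again assuming the triangle is y = |x| (my stand-in, not the original plot). The first part splits at a known kink; the second part is a naive, hypothetical way to look for the split by scanning candidates and scoring each by the summed absolute correlation of its two legs. It is only meant to illustrate the partitioning question, not to be a real regime detector.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def leg_correlations(xs, ys, split):
    """Correlation of x and y on each side of a split point."""
    left = [(x, y) for x, y in zip(xs, ys) if x < split]
    right = [(x, y) for x, y in zip(xs, ys) if x >= split]
    r_left = pearson([x for x, _ in left], [y for _, y in left])
    r_right = pearson([x for x, _ in right], [y for _, y in right])
    return r_left, r_right

# Assumed triangle: y = |x| on [-1, 1].
xs = [i / 100 for i in range(-100, 101)]
ys = [abs(x) for x in xs]

# Case 1: the kink is known to be at x = 0.
r_left, r_right = leg_correlations(xs, ys, split=0.0)
print(f"left leg r = {r_left:.3f}, right leg r = {r_right:.3f}")

# Case 2: naive scan over candidate split points, keeping the one whose
# two legs are, taken together, the most linear.
candidates = [i / 100 for i in range(-50, 51)]
best_split = max(
    candidates,
    key=lambda c: sum(abs(r) for r in leg_correlations(xs, ys, c)),
)
print(f"best split found near x = {best_split:.2f}")
```

With noiseless data the scan lands on the kink almost trivially; with noisy data and more than one breakpoint, choosing the partition is exactly the hard part.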

The regime question surfaces here from time to time, and it’s one I do find fascinating. In practical terms, one might think of regimes in the Tournament as groups of eras: in one regime a specific set of features correlates well with the targets, while in another regime a different set of features does. If one could identify the regime of an era before inverting the features to estimate targets, then Bob’s-yer-uncle, you’ll be rich :laughing:.
