I find this mathematically quite beautiful and wanted to share it.
The plot shows the almost complete orthogonality of two large clusters of targets. The actual ‘target’ is in one of the clusters; I haven’t looked yet, but I’m guessing it’s in the larger of the two.
Why the decay at the North and South corners, and not the East and West corners? That’s quite informative.
Look carefully and you can see striping at 0.25, 0.5, and 0.75, both SW to NE, and NW to SE. Colour depth indicates an anomaly score, light=low.
Look at the target vectors; in each orthogonal direction there are two groups of dominant (longer), and less dominant (shorter) vectors.
I’m not going to say much about how this is derived, except to say that it’s a low rank, accurate representation of the full rank data.
I think there are lots of opportunities here for segmenting and ensembling.
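For anyone wondering what a "low rank, accurate representation" can look like in practice, here is a minimal sketch using a truncated SVD on synthetic data. To be clear, this is an assumption on my part: the post does not say how the plot was derived, and the data, the helper names `low_rank_approx` and `relative_error`, and the noise level are all made up for illustration.

```python
import numpy as np


def low_rank_approx(X, k):
    """Rank-k approximation of X via truncated SVD (least-squares optimal)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only the k largest singular values and their vectors.
    return (U[:, :k] * s[:k]) @ Vt[:k]


def relative_error(X, X_k):
    """Frobenius norm of the residual, relative to the norm of X."""
    return np.linalg.norm(X - X_k) / np.linalg.norm(X)


rng = np.random.default_rng(42)

# Synthetic stand-in (NOT the real data): a rank-2 signal plus small noise.
signal = rng.standard_normal((500, 2)) @ rng.standard_normal((2, 12))
X = signal + 0.05 * rng.standard_normal((500, 12))

X2 = low_rank_approx(X, 2)
err = relative_error(X, X2)

print("rank of approximation:", np.linalg.matrix_rank(X2))
print(f"relative reconstruction error: {err:.4f}")
```

When the relative error is small, the rank-2 picture is a faithful summary of the full-rank matrix, which is the sense of "accurate" I am assuming here.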
Follow-up with the same analysis for features. (Every 4th era, 412789 rows, 1586 cols)
Labelling only the dominant (longer) feature vectors, there looks to be some interesting grouping for feature selection. Again the colour depth is an anomaly score, but it all looks very normally distributed (nice-looking Tukey box plots, not shown).
A more detailed, side-by-side display showing the associated scree plots and box plots; features on the left, targets on the right.
You can see the relationships between the targets nicely in this more conventional correlation cluster map; they’re a bit obscured in the biplot.
Doing separate decompositions for the target variants shows the dominant targets (arthur, alan, janet) and their orthogonal relation to the rest.
It makes sense that the 20-day targets and the 60-day targets would each cluster together, although I wouldn’t really expect them to be super orthogonal to each other, since they are the same targets just farther out.
Yes, it’s important to realise that the extreme orthogonality is in the low-rank (rank-2) approximation, but also that this is the overwhelmingly dominant subspace. There is more beyond rank 2; we can either compute the relations in ever more inclusive dimensions (all the way up to exact full rank), or visualise the more subtle relations as they come out in subsets, as in the last plot.
What we see in the biplot is an extraction of the most dominant relationships.
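To make the rank-2 versus full-rank point concrete, here is a rough sketch that compares the angles between variable vectors in the rank-2 subspace with the ordinary full-rank correlations. It assumes an SVD-based biplot on column-centred data, which may not be how the plots in this thread were made; the synthetic two-factor data and the names `rank_k_loadings` and `cosine_matrix` are hypothetical.

```python
import numpy as np


def rank_k_loadings(X, k=2):
    """Variable (column) vectors in the top-k SVD subspace of centred X."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (s[:k] ** 2).sum() / (s ** 2).sum()
    loadings = Vt[:k].T * s[:k]  # shape: (n_columns, k)
    return loadings, explained


def cosine_matrix(vectors):
    """Pairwise cosines between the rows of `vectors`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T


rng = np.random.default_rng(0)
n_rows, n_per_group, noise = 2000, 5, 0.3

# Synthetic stand-in (NOT the real targets): two independent latent factors,
# each driving its own group of five noisy columns.
f1 = rng.standard_normal((n_rows, 1))
f2 = rng.standard_normal((n_rows, 1))
group_a = f1 + noise * rng.standard_normal((n_rows, n_per_group))
group_b = f2 + noise * rng.standard_normal((n_rows, n_per_group))
targets = np.hstack([group_a, group_b])

loadings, explained = rank_k_loadings(targets, k=2)
cos_rank2 = cosine_matrix(loadings)
corr_full = np.corrcoef(targets, rowvar=False)

print(f"variance captured by rank 2: {explained:.3f}")
print("rank-2 cosine, same group:   ", round(cos_rank2[0, 1], 3))
print("rank-2 cosine, across groups:", round(cos_rank2[0, 5], 3))
print("full-rank corr, same group:  ", round(corr_full[0, 1], 3))
print("full-rank corr, across groups:", round(corr_full[0, 5], 3))
```

In this toy setup the rank-2 cosines are cleaner than the full-rank correlations, because everything outside the dominant two-dimensional subspace has been dropped. Whether the real targets behave the same way is something only the actual decomposition can show.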
Aha. Then the orthogonality makes sense also when looking at LR2.