Definition: interpolation occurs for a sample if it lies inside the convex hull of a set of observed samples; extrapolation is defined as the converse.
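This definition is directly checkable: a point lies in the convex hull of a sample set iff a small linear program is feasible. A minimal sketch, assuming `numpy` and `scipy` are available (the function name is mine, not the paper's):

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, X):
    """True iff x (shape (d,)) is a convex combination of the rows of X
    (shape (n, d)): find lam >= 0 with sum(lam) = 1 and X.T @ lam = x."""
    n = X.shape[0]
    res = linprog(
        c=np.zeros(n),                           # pure feasibility problem
        A_eq=np.vstack([X.T, np.ones((1, n))]),  # X.T @ lam = x and sum(lam) = 1
        b_eq=np.concatenate([x, [1.0]]),
        bounds=[(0, None)] * n,                  # lam >= 0
        method="highs",
    )
    return res.status == 0                       # 0 means the LP was feasible

# Corners of the unit square: the centre interpolates, (2, 0.5) extrapolates.
square = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
print(in_convex_hull(np.array([0.5, 0.5]), square))  # True
print(in_convex_hull(np.array([2.0, 0.5]), square))  # False
```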
Common assumption: as an algorithm transitions from interpolation to extrapolation, its performance decreases
Goal of the paper: show that interpolation almost surely never occurs in high-dimensional spaces (dimension > ~100), regardless of the intrinsic dimension of the underlying data manifold.
- DL models basically always extrapolate
- Extrapolation regime is not necessarily to be avoided
- Generalisation should not be thought of in terms of extrapolation/interpolation
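One quick way to see why high dimension forces extrapolation: lying inside the convex hull requires, at minimum, lying inside the axis-aligned bounding box of the data, and even that loose necessary condition fails more and more often as dimension grows. A Monte Carlo sketch with Gaussian data (my own illustration, not an experiment from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def bbox_hit_rate(n_train, d, n_test=2000):
    """Fraction of fresh Gaussian samples that fall inside the axis-aligned
    bounding box of n_train Gaussian training samples.  Convex-hull
    membership implies bounding-box membership, so this is an (optimistic)
    upper bound on the interpolation rate."""
    X = rng.standard_normal((n_train, d))
    lo, hi = X.min(axis=0), X.max(axis=0)
    T = rng.standard_normal((n_test, d))
    return np.mean(np.all((T >= lo) & (T <= hi), axis=1))

for d in (2, 10, 100, 500):
    print(f"d={d:3d}  bounding-box hit rate={bbox_hit_rate(1000, d):.2f}")
```

The true in-hull probability decays far faster than this crude bound; the paper's theoretical results show that the number of samples needed to keep new points inside the hull grows exponentially with dimension.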
From the conclusion:
Interpolation and extrapolation [...] provide an intuitive geometrical characterization on the location of new samples with respect to a given dataset. Those terms are commonly used as geometrical proxy to predict a model’s performances on unseen samples and many have reached the conclusion that a model’s generalization performance depends on how a model interpolates. In other words, how accurate is a model within a dataset’s convex-hull defines its generalization performances. In this paper, we proposed to debunk this (mis)conception.
Ambient dimension: the dimension of the space in which the data lives
Intrinsic dimension: the number of variables needed in a minimal representation of the data (the dimension of the underlying data manifold)
Convex hull dimension: the dimension of the smallest affine subspace that contains the entire data manifold
Claim: the probability of interpolation occurring depends on the convex hull dimension, not the intrinsic (manifold) dimension.
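A toy illustration of this claim (my own construction, not from the paper): samples on a one-dimensional curve embedded in R^d have intrinsic dimension 1 for every d, yet their convex hull is generically full-dimensional, and fresh samples from the same curve land inside it less and less often as d grows. Assumes `scipy` for the hull-membership LP:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

def in_hull(x, X):
    """LP feasibility test: is x a convex combination of the rows of X?"""
    n = X.shape[0]
    res = linprog(np.zeros(n),
                  A_eq=np.vstack([X.T, np.ones((1, n))]),
                  b_eq=np.concatenate([x, [1.0]]),
                  bounds=[(0, None)] * n, method="highs")
    return res.status == 0

def curve(t, d):
    """1-D nonlinear manifold in R^d: x(t) = (sin t, sin 2t, ..., sin dt).
    Intrinsic dimension is 1 for every d; the hull dimension is d."""
    return np.sin(np.outer(t, np.arange(1, d + 1)))

for d in (2, 5, 20):
    X = curve(rng.uniform(0, 2 * np.pi, 300), d)  # "training" set on the curve
    T = curve(rng.uniform(0, 2 * np.pi, 100), d)  # fresh samples, same curve
    rate = np.mean([in_hull(x, X) for x in T])
    print(f"d={d:2d}  interpolation rate={rate:.2f}")
```

Every sample here has the same one-variable description, yet the interpolation rate collapses as the ambient (and hence hull) dimension grows.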
Question: what if real datasets have a special type of low-dim manifold embedding that means we are still in the interpolation regime?
Result: on MNIST, CIFAR and ImageNet, despite the low-dimensional manifold, finding samples in the interpolation regime remains exponentially difficult as dimension grows.
Question: "one could argue that the key interest of machine learning is not to perform interpolation in the data space, but rather in a (learned) latent space" - so do we interpolate in the latent space?
Result: apparently not!? This seems remarkable but makes total sense when you consider how high-dimensional the latent space is.
We observed that embedding-spaces provide seemingly organized representations (with linear separability of the classes), yet, interpolation remains an elusive goal even for embedding-spaces of only 30 dimensions. Hence current deep learning methods operate almost surely in an extrapolation regime in both the data space, and their embedding space.
Dimensionality reduction methods lose the interpolation/extrapolation information and lead to visual misconceptions significantly skewed towards interpolation.