🪅

# Learning in High Dimension Always Amounts to Extrapolation

- Title: Learning in High Dimension Always Amounts to Extrapolation
- Authors: Randall Balestriero, Jerome Pesenti, Yann LeCun
- Date: 2021
- Venue: DBLP
- Keywords: extrapolation

### Introduction

Definition: interpolation occurs for a new sample if it lies within the convex hull of a set of observed samples; extrapolation is defined conversely (the sample lies outside the hull).
Common assumption: as an algorithm transitions from interpolation to extrapolation, its performance decreases
Goal of the paper: show that interpolation almost surely never occurs in high-dimensional spaces (dimension > 100), regardless of the intrinsic dimension of the underlying data manifold.
Corollaries:
1. DL models basically always extrapolate
2. The extrapolation regime is not necessarily to be avoided
3. Generalisation should not be thought of in terms of interpolation vs. extrapolation
From the conclusion:
> Interpolation and extrapolation [...] provide an intuitive geometrical characterization on the location of new samples with respect to a given dataset. Those terms are commonly used as geometrical proxy to predict a model's performances on unseen samples and many have reached the conclusion that a model's generalization performance depends on how a model interpolates. In other words, how accurate is a model within a dataset's convex-hull defines its generalization performances. In this paper, we proposed to debunk this (mis)conception.

### Interpolation is Doomed by the Curse of Dimensionality

#### The Role of the Intrinsic, Ambient and Convex Hull Dimensions

Ambient dimension: the dimension of the space in which the data lives.
Intrinsic dimension: the number of variables needed in a minimal representation of the data (i.e. the dimension of the underlying data manifold).
Convex hull dimension: the dimension of the smallest affine subspace that contains the whole data manifold.

Claim: the probability of interpolation occurring is governed by the convex hull dimension, not the intrinsic (manifold) dimension.
Evidence: a classical result cited in the paper (Bárány & Füredi, 1988) shows that for N i.i.d. samples in dimension d, a new sample falls inside their convex hull with non-vanishing probability only if N grows exponentially with d (roughly like 2^{d/2}); the paper pairs this with Monte Carlo experiments on data constrained to low-dimensional linear and nonlinear manifolds, where only the convex hull dimension matters.
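The claim is easy to probe numerically (a sketch, not the paper's code): convex-hull membership is a linear-programming feasibility problem, and a Monte Carlo estimate of the interpolation probability collapses as the ambient dimension grows. `in_convex_hull` and `interpolation_rate` are hypothetical helper names introduced here for illustration.

```python
# Sketch: estimate P(new sample lies in the convex hull of the training set)
# for Gaussian data, as a function of the ambient dimension d.
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, points):
    """True iff x is a convex combination of the rows of `points`."""
    n = points.shape[0]
    # Feasibility LP: find lambda >= 0 with sum(lambda) = 1
    # and points.T @ lambda = x.
    A_eq = np.vstack([points.T, np.ones((1, n))])
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    return res.status == 0  # status 0 = feasible, i.e. x is inside the hull

def interpolation_rate(d, n_train=300, trials=100, seed=0):
    """Fraction of fresh Gaussian samples that land inside the hull."""
    rng = np.random.default_rng(seed)
    train = rng.standard_normal((n_train, d))
    return sum(in_convex_hull(rng.standard_normal(d), train)
               for _ in range(trials)) / trials

for d in (2, 5, 10, 20):
    print(f"d={d:2d}: P(interpolation) ~ {interpolation_rate(d):.2f}")
```

The LP formulation (zero objective, equality constraints on the barycentric weights) is a standard way to test hull membership without building the hull explicitly, which would be intractable in high dimension.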

### Real Datasets and Embeddings are no Exception

Question: what if real datasets have a special type of low-dim manifold embedding that means we are still in the interpolation regime?
Result: on MNIST, CIFAR, and ImageNet, despite the low-dimensional data manifold, finding samples in the interpolation regime remains exponentially difficult (the number of samples required grows exponentially with the relevant dimension).

#### No interpolation in embedding-space (!)

Question: "one could argue that the key interest of machine learning is not to perform interpolation in the data space, but rather in a (learned) latent space" - so do we interpolate in the latent space?
Result: apparently not!? This seems remarkable but makes total sense when you consider how high-dimensional the latent space is.
Key quote:
> We observed that embedding-spaces provide seemingly organized representations (with linear separability of the classes), yet, interpolation remains an elusive goal even for embedding-spaces of only 30 dimensions. Hence current deep learning methods operate almost surely in an extrapolation regime in both the data space, and their embedding space.
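A toy version of this (an assumption-laden sketch, not the paper's experiment with trained networks): push high-dimensional data through a hypothetical random ReLU "encoder" down to 30 dimensions and check hull membership there. Even in the embedding space, fresh samples essentially never interpolate.

```python
# Sketch: interpolation is still rare in a 30-D embedding space.
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, points):
    """True iff x is a convex combination of the rows of `points`."""
    n = points.shape[0]
    A_eq = np.vstack([points.T, np.ones((1, n))])
    b_eq = np.concatenate([x, [1.0]])
    return linprog(np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).status == 0

rng = np.random.default_rng(0)
# Hypothetical untrained "encoder": random projection + ReLU, 784-D -> 30-D.
W = rng.standard_normal((784, 30)) / np.sqrt(784)
embed = lambda X: np.maximum(X @ W, 0.0)

train = embed(rng.standard_normal((1000, 784)))
hits = sum(in_convex_hull(embed(rng.standard_normal((1, 784)))[0], train)
           for _ in range(50))
print(f"interpolation rate in 30-D embedding: {hits / 50:.2f}")
```

The encoder here is random rather than trained, so this only illustrates the dimensionality argument, not the paper's measurements on actual network embeddings.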

#### Is interpolation/extrapolation info preserved when using dimensionality reduction techniques?

TL;DR:
dimensionality reduction methods lose the interpolation/extrapolation information and lead to visual misconceptions, significantly skewed towards interpolation
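This skew is easy to reproduce (a sketch under my own assumptions, not the paper's experiment): a fresh sample is almost surely outside the hull of 300 points in 50 dimensions, yet after projecting everything to 2-D with PCA the very same point typically lands inside the projected hull.

```python
# Sketch: extrapolation in the ambient space can look like interpolation
# after 2-D dimensionality reduction (here plain PCA via SVD).
import numpy as np
from scipy.optimize import linprog
from scipy.spatial import Delaunay

def in_convex_hull(x, points):
    """True iff x is a convex combination of the rows of `points`."""
    n = points.shape[0]
    A_eq = np.vstack([points.T, np.ones((1, n))])
    b_eq = np.concatenate([x, [1.0]])
    return linprog(np.zeros(n), A_eq=A_eq, b_eq=b_eq, bounds=(0, 1)).status == 0

rng = np.random.default_rng(0)
d = 50
train = rng.standard_normal((300, d))
new = rng.standard_normal(d)

# In 50 dimensions a fresh sample is (almost surely) outside the hull.
print("inside hull, 50-D:", in_convex_hull(new, train))

# PCA to 2-D via SVD of the centred data.
mean = train.mean(axis=0)
_, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
proj = (train - mean) @ Vt[:2].T
new_proj = ((new - mean) @ Vt[:2].T).reshape(1, -1)

# After projection the same point usually falls inside the 2-D hull:
# the visual misconception the paper warns about.
print("inside hull, 2-D:", Delaunay(proj).find_simplex(new_proj)[0] >= 0)
```

Any single seed could get unlucky on the 2-D check (a few percent of projected points still fall outside), but the aggregate skew towards apparent interpolation is robust.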