### Introduction

**Definition:** interpolation occurs for a new sample if it lies inside the convex hull of a set of training samples; extrapolation occurs when it lies outside.
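Convex-hull membership can be tested exactly as a linear feasibility problem: `x` is in the hull of the rows of `X` iff there exist weights `w >= 0` with `sum(w) = 1` and `X.T @ w = x`. A minimal sketch using `scipy.optimize.linprog` (the helper name `in_convex_hull` is my own, not from the paper):

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, X):
    """True iff x lies in the convex hull of the rows of X.

    Feasibility LP: find w >= 0 with X.T @ w = x and sum(w) = 1.
    """
    n = X.shape[0]
    A_eq = np.vstack([X.T, np.ones((1, n))])  # stack X.T @ w = x with 1.T @ w = 1
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n)
    return res.status == 0  # status 0: a feasible solution was found

# Triangle (0,0), (1,0), (0,1): (0.25, 0.25) interpolates, (1,1) extrapolates.
tri = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(in_convex_hull(np.array([0.25, 0.25]), tri))  # True
print(in_convex_hull(np.array([1.0, 1.0]), tri))    # False
```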

**Common assumption:** as an algorithm transitions from interpolation to extrapolation, its performance decreases.

**Goal of the paper:** show that interpolation almost surely never occurs in high-dimensional spaces (> 100 dimensions), regardless of the intrinsic dimension of the underlying data manifold.

**Corollaries:**

- DL models basically always extrapolate

- Extrapolation regime is not necessarily to be avoided

- Generalisation should not be thought of in terms of extrapolation/interpolation

From the conclusion:

Interpolation and extrapolation [...] provide an intuitive geometrical characterization on the location of new samples with respect to a given dataset. Those terms are commonly used as geometrical proxy to predict a model’s performances on unseen samples and many have reached the conclusion that a model’s generalization performance depends on how a model interpolates. In other words, how accurate is a model within a dataset’s convex-hull defines its generalization performances. In this paper, we proposed to debunk this (mis)conception.

### Interpolation is Doomed by the Curse of Dimensionality

#### The Role of the Intrinsic, Ambient and Convex Hull Dimensions

**Ambient dimension:** the dimension of the space in which the data lives.

**Intrinsic dimension** (of the underlying data manifold)**:** the number of variables needed in a minimal representation of the data.

**Convex hull dimension:** the dimension of the smallest affine subspace that contains the entire data manifold.

**Claim:** the probability of interpolation occurring depends on the convex hull dimension, *not* the intrinsic (manifold) dimension.

**Evidence:**
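The core of the argument is that, for a fixed dataset size, the chance that a new point falls inside the convex hull collapses as the dimension grows. A toy Monte Carlo illustration of this effect (my own sketch with Gaussian data, not the paper's experimental protocol; `in_hull` is the standard LP feasibility test for hull membership):

```python
import numpy as np
from scipy.optimize import linprog

def in_hull(x, X):
    """LP feasibility test: is x in the convex hull of the rows of X?"""
    n = X.shape[0]
    A_eq = np.vstack([X.T, np.ones((1, n))])
    b_eq = np.concatenate([x, [1.0]])
    return linprog(np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                   bounds=[(0, None)] * n).status == 0

rng = np.random.default_rng(0)
n = 500  # dataset size, held fixed across dimensions
for d in (2, 5, 10, 20):
    # Fraction of fresh Gaussian samples that land inside the hull
    # of n Gaussian training samples in dimension d.
    hits = sum(
        in_hull(rng.standard_normal(d), rng.standard_normal((n, d)))
        for _ in range(100)
    )
    print(f"d={d:2d}: empirical P(interpolation) ~ {hits / 100:.2f}")
```

With the dataset size held fixed, the empirical interpolation probability is near 1 in low dimension and drops towards 0 well before d reaches 100, which is the behaviour the paper's claim predicts.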

### Real Datasets and Embeddings are no Exception

**Question:** what if real datasets have a special kind of low-dimensional manifold embedding that keeps us in the interpolation regime?

**Result:** on MNIST, CIFAR and ImageNet, despite the low-dimensional manifold, finding samples in the interpolation regime remains exponentially difficult.

#### No interpolation in pixel-space

#### No interpolation in embedding-space (!)

**Question:** "one could argue that the key interest of machine learning is not to perform interpolation in the data space, but rather in a (learned) latent space" - so do we interpolate in the latent space?

**Result:** apparently not!?

**This seems remarkable** but makes total sense when you consider how high-dimensional the latent space is.

Key quote:

We observed that embedding-spaces provide seemingly organized representations (with linear separability of the classes), yet, interpolation remains an elusive goal even for embedding-spaces of only 30 dimensions. Hence current deep learning methods operate almost surely in an extrapolation regime in both the data space, and their embedding space.

#### Is interpolation/extrapolation info preserved when using dimensionality reduction techniques?

TL;DR:

dimensionality reduction methods lose the interpolation/extrapolation information and lead to visual misconceptions significantly skewed towards interpolation.
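A toy sketch of that skew (my own illustration, not the paper's protocol): a fresh point that is virtually never inside the convex hull in the ambient space will often appear inside the hull once everything is projected onto the top two principal components.

```python
import numpy as np
from scipy.optimize import linprog

def in_hull(x, X):
    """LP feasibility test: is x in the convex hull of the rows of X?"""
    n = X.shape[0]
    A_eq = np.vstack([X.T, np.ones((1, n))])
    b_eq = np.concatenate([x, [1.0]])
    return linprog(np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                   bounds=[(0, None)] * n).status == 0

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))  # dataset in ambient dimension 50
x = rng.standard_normal(50)         # fresh test point

# Project dataset and test point onto the top-2 principal components of X.
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
P = Vt[:2].T  # (50, 2) projection matrix

print("in hull, 50-D:", in_hull(x, X))                        # almost surely False
print("in hull, 2-D: ", in_hull((x - mu) @ P, (X - mu) @ P))  # typically True
```

The same point, the same dataset: extrapolation in the ambient space, but the 2-D scatter plot would show it comfortably "inside" the data, which is exactly the visual misconception described above.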