Introduction
Deployment of (inductive?) ML models has 2 parts:
- Generate the representation
- Use the representation for some downstream application (cost scales with the embedding dimensionality d and the data size N)
Problem: downstream apps dominate compute at web-scale, and each has different cost constraints
Solution:
By encoding coarse-to-fine-grained representations, which are as accurate as independently trained counterparts, MRL learns (with minimal overhead) a single representation that can be deployed adaptively at no additional cost during inference.
Focus on two key web-scale tasks: classification & retrieval
Method
Matryoshka Representation Learning (MRL):
- Run the full neural network as normal to generate the output embedding z (dimension d).
- Take each power-of-two-sized prefix of z (its first m dimensions), and apply a per-size projection (the input sizes differ, so the matrices must too) to get a prediction for each
- Apply the loss to each prediction, and minimise the sum (a minimal sketch follows below)
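A minimal PyTorch sketch of these steps, assuming a classification setup; the names (`MRLHead`), the nesting sizes, and the unit loss weights are illustrative assumptions, not the paper's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MRLHead(nn.Module):
    def __init__(self, embed_dim=2048, num_classes=1000,
                 nesting_dims=(8, 16, 32, 64, 128, 256, 512, 1024, 2048)):
        super().__init__()
        self.nesting_dims = nesting_dims
        # One separate linear classifier per nested prefix size m.
        self.heads = nn.ModuleList(nn.Linear(m, num_classes)
                                   for m in nesting_dims)

    def forward(self, z, labels):
        # z: (batch, embed_dim) embedding from the full backbone.
        loss = 0.0
        for head, m in zip(self.heads, self.nesting_dims):
            logits = head(z[:, :m])            # predict from the first m dims
            loss = loss + F.cross_entropy(logits, labels)
        return loss                            # minimise the summed losses
```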
Result:
- One trained model yields O(log d) different usable embedding sizes
- In fact you can truncate to intermediate sizes that were never explicitly trained and still get accurate embeddings, i.e. the sizes interpolate (see the snippet after this list)
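At inference, adapting the size is just truncation of the prefix; a hedged snippet, where the helper name and the renormalisation step (standard for retrieval) are assumptions:

```python
import torch
import torch.nn.functional as F

def truncate_embedding(z: torch.Tensor, m: int) -> torch.Tensor:
    """Keep the first m dims of a Matryoshka embedding, re-unit-normalised."""
    return F.normalize(z[:, :m], dim=-1)

# usage: z = backbone(x); z64 = truncate_embedding(z, 64)
```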
Efficient MRL:
- Share projections across sizes by taking an initial slice of the largest size's projection matrix (sketch below)
- Saves (almost) half the projection parameters, since the separate per-size matrices (8 + 16 + … + d columns) otherwise sum to just under 2d columns
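A sketch of this weight-tying, under the same assumed setup as `MRLHead` above (the class name and initialisation are illustrative): one full-size matrix W, where the size-m classifier is just its first m columns.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EfficientMRLHead(nn.Module):
    def __init__(self, embed_dim=2048, num_classes=1000,
                 nesting_dims=(8, 16, 32, 64, 128, 256, 512, 1024, 2048)):
        super().__init__()
        self.nesting_dims = nesting_dims
        # A single shared weight matrix instead of one per size.
        self.W = nn.Parameter(torch.empty(num_classes, embed_dim))
        nn.init.kaiming_uniform_(self.W)

    def forward(self, z, labels):
        loss = 0.0
        for m in self.nesting_dims:
            logits = z[:, :m] @ self.W[:, :m].T   # slice of the big projection
            loss = loss + F.cross_entropy(logits, labels)
        return loss
```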