πŸͺ†

Matryoshka Representations

Title: Matryoshka Representations for Adaptive Deployment
Authors: Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford, Aditya Sinha, Vivek Ramanujan, William Howard-Snyder, Kaifeng Chen, Sham Kakade, Prateek Jain, Ali Farhadi
Date: 2022
Venue: DBLP

Introduction

Deployment of (inductive?) ML models has 2 parts:
  1. Generate the representation
  2. Use the representation for some downstream application (cost scales with the embedding dimension d and the number of data points N)
Problem: downstream apps dominate compute at web-scale, and each has different costs
Solution:
By encoding coarse-to-fine-grained representations that are as accurate as their independently trained counterparts, MRL learns, with minimal training overhead, a single representation that can be deployed adaptively at no additional cost during inference.
Focus on two key web-scale tasks: large-scale classification & retrieval

Method

Matryoshka Representation Learning (MRL):
  1. Run the full neural network as normal to generate the output embedding z.
  2. Take each power-of-two-sized prefix of z (its first m dimensions, for m = 8, 16, …, d) and apply a projection to each to get a prediction (the projections are necessarily different, since their input sizes differ).
  3. Apply the loss to each prediction, and minimise the sum (a sketch follows below).
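
A minimal PyTorch sketch of steps 2–3, assuming an image-classification setup; `NESTED_DIMS`, `MRLHead`, and `mrl_loss` are illustrative names, not the paper's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NESTED_DIMS = [8, 16, 32, 64, 128, 256, 512, 1024, 2048]  # powers of two up to d (assumed)
NUM_CLASSES = 1000

class MRLHead(nn.Module):
    """One separate linear classifier per nested prefix size."""
    def __init__(self, dims, num_classes):
        super().__init__()
        self.dims = dims
        self.heads = nn.ModuleList([nn.Linear(m, num_classes) for m in dims])

    def forward(self, z):
        # z: (batch, d) full embedding; one set of logits per prefix z[:, :m]
        return [head(z[:, :m]) for m, head in zip(self.dims, self.heads)]

def mrl_loss(logits_per_dim, targets):
    # Sum the usual cross-entropy over every nested size
    # (the paper also allows per-size weights; uniform here).
    return sum(F.cross_entropy(logits, targets) for logits in logits_per_dim)

# Usage (backbone is whatever encoder you train):
#   z = backbone(images)
#   loss = mrl_loss(MRLHead(NESTED_DIMS, NUM_CLASSES)(z), labels)
```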
Result:
  • O(log d) different embedding sizes explicitly trained in a single model
  • In fact, intermediate sizes interpolate effectively, so you can slice the embedding to (almost) any size up to d (usage sketch below)
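
A hypothetical sketch of how one such embedding can be sliced at inference time for retrieval; `truncate` and the random stand-in embeddings are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def truncate(z, m):
    """Keep the first m dims and re-normalize, so cosine similarity stays meaningful."""
    return F.normalize(z[:, :m], dim=-1)

db = F.normalize(torch.randn(10_000, 2048), dim=-1)   # stand-in database embeddings
query = F.normalize(torch.randn(1, 2048), dim=-1)      # stand-in query embedding

for m in (16, 64, 256, 2048):                          # pick a size per compute budget
    scores = truncate(query, m) @ truncate(db, m).T    # retrieval cost scales with m
    top5 = scores.topk(5).indices
```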
Efficient MRL:
  • Share projections across sizes: the head for size m is the initial m-column slice of the largest-size projection
  • Saves (almost) half the projection parameters, since 8 + 16 + … + d ≈ 2d but only the d-sized matrix is stored (sketch below)
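
A sketch of the weight-sharing trick under the same assumed setup as above: keep only the largest head and slice its weight matrix for the smaller sizes.

```python
import torch.nn as nn

class EfficientMRLHead(nn.Module):
    """Shared-weight variant: one d-sized linear head serves every nested size."""
    def __init__(self, dims, num_classes):
        super().__init__()
        self.dims = dims
        self.head = nn.Linear(max(dims), num_classes)  # only the d-sized weights are stored

    def forward(self, z):
        # nn.Linear weight has shape (num_classes, d); slice its first m columns
        # instead of keeping a separate matrix per size. The bias is shared.
        return [z[:, :m] @ self.head.weight[:, :m].T + self.head.bias
                for m in self.dims]
```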