Introduction
BigScience Large Open-science Open-access Multilingual Language Model (BLOOM)
Motivation
- Costs of LLM training are only affordable for big tech companies
- Prior to OPT & BLOOM, most LLMs were not publicly available
- Previous LLMs were trained primarily on English-language text
Overview
- 176B params, released publicly (a minimal loading sketch follows below)
- 46 natural languages, 13 programming languages
- Compute provided by the French government via the Jean Zay supercomputer
- Aim of the paper is to document the development process for the benefit of the community
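Since the weights are public, here is a minimal sketch of loading the checkpoint with the Hugging Face `transformers` library. The dtype, `device_map`, and prompt are illustrative assumptions, not the paper's setup, and `device_map="auto"` additionally requires `accelerate`:

```python
# Sketch: load the public BLOOM checkpoint and generate a continuation.
# The full 176B model needs multi-GPU sharding; the smaller variants
# (e.g. bigscience/bloom-560m) are convenient for local experiments.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloom"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BLOOM was trained in bfloat16 (see Numerics below)
    device_map="auto",           # shard across available devices (requires accelerate)
)

inputs = tokenizer("BLOOM is a multilingual language model that", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```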
Model
Training Data
- Uses ROOTS corpus
- Emphasis on needs and rights of “data subjects” (those who create text or whom it is about)
- And on reducing bias resulting from naive web-crawling
- Tools for visualising dataset available on 🤗 website
- Some web-crawled data (the OSCAR dataset) is still used for the sake of volume (38% of the corpus); a toy filtering sketch follows below
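To make the “naive web-crawling” point concrete, here is a toy, purely illustrative sketch of the kind of cleanup (simple quality heuristics plus exact deduplication) that separates a curated corpus from raw crawl dumps. The thresholds and helper names are assumptions for illustration, not the actual ROOTS pipeline:

```python
# Toy sketch of the kind of cleanup applied to web-crawled text before it
# enters a training corpus: simple quality heuristics plus exact deduplication.
# Illustration of the idea only, not the actual ROOTS processing pipeline.
import hashlib

def looks_like_natural_text(doc: str) -> bool:
    words = doc.split()
    if len(words) < 20:                       # drop very short fragments
        return False
    alpha_ratio = sum(c.isalpha() for c in doc) / max(len(doc), 1)
    return alpha_ratio > 0.6                  # drop markup/boilerplate-heavy docs

def dedup_and_filter(docs):
    seen = set()
    for doc in docs:
        digest = hashlib.sha256(doc.strip().encode("utf-8")).hexdigest()
        if digest in seen or not looks_like_natural_text(doc):
            continue
        seen.add(digest)
        yield doc

crawl = [
    "a short fragment",
    "This is a longer paragraph of ordinary prose " * 5,
    "This is a longer paragraph of ordinary prose " * 5,  # exact duplicate
]
print(len(list(dedup_and_filter(crawl))))  # -> 1
```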
Model Architecture
- Balanced the tradeoff between existing, proven LLM architectures and promising but untested architectural innovations
- Chose a GPT-style causal decoder-only model because of its zero/few-shot abilities; finetuning 100B-param LLMs is unwieldy (a causal-attention sketch follows after this list)
- Main objective here is zero-shot generalisation
- Results from their architecture investigation suggest causal decoder-only models perform best at this
- Did not consider MoEs “due to a lack of widely used GPU-based codebases suitable for training them at scale”
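A minimal sketch (not the paper's code) of what “causal decoder-only” means in practice: every position may attend only to itself and earlier positions, enforced with a lower-triangular attention mask:

```python
# Sketch: causal (autoregressive) self-attention masking, the property that
# distinguishes a GPT/BLOOM-style decoder from an encoder or encoder-decoder.
import torch
import torch.nn.functional as F

def causal_self_attention(x, w_q, w_k, w_v):
    # x: (seq_len, d_model); w_*: (d_model, d_head) projection matrices
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = (q @ k.T) / (k.shape[-1] ** 0.5)                 # (seq_len, seq_len)
    causal_mask = torch.tril(torch.ones_like(scores)).bool()  # lower-triangular
    scores = scores.masked_fill(~causal_mask, float("-inf"))  # hide future tokens
    return F.softmax(scores, dim=-1) @ v

seq_len, d_model, d_head = 8, 16, 16
x = torch.randn(seq_len, d_model)
proj = lambda: torch.randn(d_model, d_head) / d_model ** 0.5
print(causal_self_attention(x, proj(), proj(), proj()).shape)  # torch.Size([8, 16])
```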
Numerics:
- Started off with `float16` but switched to `bfloat16` because of “training instabilities” (they cite OPT and GLM-130B as examples)
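A small sketch of why the dtype matters: `bfloat16` keeps float32's exponent range, so large activations and gradients do not overflow to inf, at the cost of fewer mantissa bits; this is the usual explanation for the instabilities seen with `float16` at this scale:

```python
# Sketch: compare the numeric ranges of float16, bfloat16 and float32 in PyTorch.
import torch

for dtype in (torch.float16, torch.bfloat16, torch.float32):
    info = torch.finfo(dtype)
    print(f"{str(dtype):>15}: max={info.max:.3e}, eps={info.eps:.3e}")

# float16 overflows just above 6.5e4, so large intermediate values become inf;
# bfloat16 shares float32's ~3.4e38 range, trading precision for headroom.
x = torch.tensor(70000.0)
print(x.to(torch.float16))   # inf (out of float16 range)
print(x.to(torch.bfloat16))  # ~70144 (representable, with coarse rounding)
```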