I couldn’t find any good visualisations of machine learning number formats online, so I’ve decided to make one. It’s interactive, and hopefully gives a sense of the trade-offs between using different formats. Enjoy!

**Note: please refresh the page to make the visualisation appear!**

**How to use/read this chart:**

- This is a histogram of the positive values represented by each number format.

- Scroll on the body of the chart to zoom, drag to pan.

- Click on the 🟦🟩… boxes on the right to show/hide formats. `shift-click` to select multiple.

- Be careful - the x-axis values are all log₂ values. The y-axis values aren’t, but the spacing is still logarithmic.

### What does it all mean?

Below are some anticipated questions about the visualisation, and number formats in general. Click the ➤ icon to expand:

**What does this chart show me? How do I interpret the axes?**

On the x-axis we have a series of histogram bins. The numbers below the axis represent the log₂ of the bin edges.

On the y-axis we have the count of how many values fall within that bin for each format.
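One way to check these counts is to brute-force every bit pattern of a format and see how many land in a given bin. Below is an illustrative sketch for FP16 (using Python's built-in half-precision `struct` format; this is not the chart's own code):

```python
import struct

def fp16_to_float(bits: int) -> float:
    # reinterpret a 16-bit pattern as IEEE FP16 ("e" = half precision)
    return struct.unpack("<e", struct.pack("<H", bits))[0]

# NaN comparisons are always False, so NaNs/infinities are excluded automatically
count = sum(1 for bits in range(1 << 16) if 0.5 <= fp16_to_float(bits) < 0.75)
```

Here `count` comes out at 512: FP16's 10 mantissa bits give 1024 values in `[0.5, 1)`, half of which fall below 0.75.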

In short, if the bin `[0.5, 0.75)` has a count of 33 for BF16, then we know that BF16 can represent 33 different numbers between 0.5 and 0.75.

**What are number formats and why should I care about what values they can represent? What are these new FP8 formats?**

On paper, machine learning uses continuous numbers for activations, weights and gradients. In practice, we have to represent these values approximately using a fixed number of bits. The different methods for doing this are known as number formats.

There’s a trade-off when selecting a number format: the fewer bits we use the faster our models will run, but the less accurate our approximation will be. At a certain point this approximation starts to hurt our ML models and degrade their performance.

The use of smaller number formats has contributed more to improving ML performance than any other hardware development in the last decade. Initially, practitioners used the FP32 “single-precision” format for training. In the last few years, researchers have found some values can be reduced to 16-bit “half-precision” formats (BF16 & FP16).

Recently, two proposals for 8-bit formats have emerged (the “GAQ” format, and the “NAI” format). In both cases, it’s been shown that, remarkably, training is possible with some values in FP8 (GAQ paper, NAI paper). For an overview of the difference between the two proposals, see the table below.

This brings us into a world of very small numerics indeed! Whereas the original FP32 format has ~4 billion different numbers it can represent, the FP8 formats have fewer than 256 values available.

With this in mind, understanding the values you can represent in different formats is becoming increasingly important. 8-bit formats promise substantial speedups and memory savings, but also introduce great challenges due to reduced precision and range.

This visualisation is intended to highlight the relationship between the different number formats used in machine learning, in order to help practitioners make the most of the formats available to them.

**What are the different formats and how are they defined?**

There’s not room to give a full summary of the different formats here, but a quick overview of the IEEE 754 floating point standard should cover a lot. All of the 16 and 32-bit formats above are based on this standard, as well as the FP8_E5M2_NAI format.

The standard says that to specify a *floating-point format* you need the following:

- a number of mantissa bits (M)

- a number of exponent bits (E)

- a base (b) → always 2 for binary numbers

Then to specify a *particular number* within that format, you need to fill in:

- the sign bit (S)

- the exponent bits (E)

- the mantissa bits (M)

This string of bits is then interpreted in the following way:

(-1)^S × 2^(e - bias) × (1 + m / 2^M)

Where: `e` is the unsigned integer encoded by the exponent bits, `m` is the unsigned integer encoded by the mantissa bits, and `bias = 2^(E-1) - 1` (E and M being the number of exponent and mantissa bits).
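This interpretation rule can be sketched in code (normal numbers only; subnormals and the special values covered below are ignored, and the field names are illustrative):

```python
def decode(sign: int, exp_field: int, mantissa_field: int, E: int, M: int) -> float:
    """Decode a normal IEEE-style number from its raw bit fields.

    E and M are the number of exponent and mantissa bits; subnormals,
    infinities and NaNs are not handled here.
    """
    bias = 2 ** (E - 1) - 1
    return (-1) ** sign * 2.0 ** (exp_field - bias) * (1 + mantissa_field / 2 ** M)
```

For example, `decode(0, 127, 0, 8, 23)` recovers 1.0 for FP32, and `decode(0, 126, 64, 8, 7)` recovers 0.75 for BF16.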

There are also a small number of “special values” which override the above rule. These define bit strings that are used to represent infinities, NaN (not-a-number) and special “subnormal numbers”. See the wiki page for further details.

For the FP8 formats, the best sources of information are the links to blog posts and papers provided in the previous section.

**What is the *normal dist.* box? What do the sliders mean?**

Included in the visualisation is a histogram of values sampled from a normal distribution. This is provided to give an example of a set of numbers we might wish to represent using one of the formats.

Note that this doesn’t look like a typical bell curve, as a) it’s only positive values, b) it’s in log space. The bin counts are also the *expected* count, rather than an actual sample (hence why some aren’t exact integers).

The first slider below the chart changes the (log) standard deviation of the distribution. In regular-space, this would scale the width of the distribution; in log-space it simply moves it left-or-right.

The second slider changes the (log) number of samples taken. This slider increases the height of the histogram (we have more values per-bin) and its width (as the height increases, some values which are usually hidden below the y-axis now become visible. This may sound strange - it’s a result of our logarithmic y-axis). The starting value of samples corresponds to the hidden size of a GPT-3 6.7B model.
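The expected bin counts can be computed from the normal CDF. A sketch, assuming a zero-mean normal and counting only the positive samples (names hypothetical, not the chart's own code):

```python
import math

def normal_cdf(x: float) -> float:
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_count(lo: float, hi: float, sigma: float, n: int) -> float:
    """Expected number of samples from N(0, sigma^2) landing in [lo, hi)."""
    return n * (normal_cdf(hi / sigma) - normal_cdf(lo / sigma))
```

For instance, with `sigma=1` and `n=100`, the expected count over all positive values, `expected_count(0.0, float("inf"), 1.0, 100)`, is 50 - half the samples.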

**Can you explain the shapes of the different formats’ histograms?**

**Width:** (excluding the slope on the left end) this is proportional to the number of exponent bits used by the format. The more exponent bits available, the larger the *dynamic range* of values that can be represented.

**Height:** this is proportional to the number of mantissa bits. The mantissa bits interpolate between exponent values. Having more of them means we have more values to choose from when representing a continuous number, giving us greater *precision*.

**Saw-tooth peaks at the top of the histograms:**zoomed-out, the floating-point formats look uniformly distributed. This can be attributed to their exponent bits generating uniformly-spread values in log-space.

However, when zoomed-in, we see small peaks across the top of the bins. This is because the mantissa bits (which interpolate between the exponent values) are only uniform in regular-space. They look like the diagonal lines seen here when put into log-space.
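This linear-vs-log spacing effect is easy to see with a few lines of code (an illustrative 3-mantissa-bit format, one binade):

```python
import math

# 3 mantissa bits => 8 evenly-spaced values within the binade [1, 2)
values = [1 + m / 8 for m in range(8)]
linear_gaps = [b - a for a, b in zip(values, values[1:] + [2.0])]
log_gaps = [math.log2(b) - math.log2(a) for a, b in zip(values, values[1:] + [2.0])]
# linear_gaps are all 0.125, but log_gaps shrink towards the top of the
# binade - producing the diagonal saw-tooth seen in the chart
```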

**Slope on the left end:** on the left end of the larger formats we see a diagonal slope down to zero. This is a result of the special *subnormal* numbers defined by the IEEE standard. These values are more spread-out than the rest, giving decreasing precision (the bins get shorter), but increasing the dynamic range (width) relative to what we’d otherwise have.

Note that the same effect is not observed on the right, as the standard doesn’t define an equivalent of subnormal values for large numbers.

**FP8 formats:** the `FP8_1.5.2_GAQ` and `FP8_E5M2_NAI` formats only have two mantissa bits - this is so few that the above histogram, with its bin width, only has a maximum of one representable value in each bin. Hence why it looks like a square. The few separate bins on the left represent subnormal numbers.

The `FP8_1.4.3_GAQ` and `FP8_E4M3_NAI` formats look very similar, with the difference that the extra mantissa bit means we have two values in each bin.

**INT8 formats:** the INT8 format is very similar to a floating-point format with no exponent bits. Just as the mantissa part of the floating-point formats creates diagonal peaks in their histograms, the INT8 format is one large peak. Again, this is due to the values being distributed uniformly in regular-space rather than uniformly in log-space.
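In practice INT8 is used together with a scaling factor. A hypothetical round-and-clamp quantiser sketch (symmetric, max magnitude 127 - a common convention, not necessarily what the chart code does):

```python
def quantise_int8_scaled(x: float, scale: float = 1.0) -> float:
    """Round x to the nearest representable value of 'INT8 x scale'."""
    q = round(x / scale)           # nearest integer step
    q = max(-127, min(127, q))     # clamp to the symmetric INT8 range
    return q * scale
```

Multiplying the representable values by `scale` is equivalent to dividing the inputs by `scale` before quantising with plain INT8.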

For INT8 there are two versions provided: `INT8x2` and `INT8x512`. The values after the `x` denote scalar multiplication of the representable values. 2 and 512 were chosen to align the max values with the two classes of FP8 formats. These scaled-up versions of INT8 don’t actually exist, but when using INT8 one typically does something equivalent - dividing the values to be represented by a factor before quantisation. The two formats chosen are thus equivalent to dividing by 2 and 512 respectively.

**How did you make this chart & the table below?**

### Comparing ML number formats

The following table outlines how each number format is defined, their min/max values, and how they differ from the IEEE 754 standard:

| Format | E bits | M bits | Max | \|Min normal\| | \|Min\| | Bias | Max exp | Min exp | Inf encoding | NaN encoding | Zero encoding |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| IEEE compliant | E | M | 2^max_e * (2 - 2^-M) | 2^min_e | 2^(min_e - M) | 2^(E-1) - 1 | 2^(E-1) - 1 | 2 - 2^(E-1) | E=1s, M=0s | E=1s, M≠0s | S=0/1, E=0s, M=0s |
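The min/max formulas above can be evaluated directly from E and M. A sketch (function name hypothetical):

```python
def format_stats(E: int, M: int) -> dict:
    """Min/max values of an IEEE-compliant format with E exponent and M mantissa bits."""
    bias = 2 ** (E - 1) - 1
    max_e = 2 ** (E - 1) - 1       # largest exponent of a finite value
    min_e = 2 - 2 ** (E - 1)       # smallest normal exponent
    return {
        "max": 2.0 ** max_e * (2 - 2.0 ** -M),
        "min_normal": 2.0 ** min_e,
        "min_subnormal": 2.0 ** (min_e - M),
        "bias": bias,
    }
```

For FP16 (E=5, M=10) this gives the familiar maximum of 65504.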

### Implementation

Below is the code used to generate the above visualisation / table. These classes define exactly which numbers the formats are able to represent. For the full implementation, visit: **visualising-ml-number-formats** (thecharlieblake, updated Feb 17, 2023).

#### Floating point code
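The code itself isn't embedded in this export. As a stand-in, here is a minimal hypothetical sketch of what such a class might look like (not the repo's actual implementation), enumerating every positive value an IEEE-style format can represent:

```python
from dataclasses import dataclass
from typing import Iterator

@dataclass
class FloatFormat:
    """Sketch of an IEEE-style floating-point format."""

    exponent_bits: int
    mantissa_bits: int

    @property
    def bias(self) -> int:
        return 2 ** (self.exponent_bits - 1) - 1

    def positive_values(self) -> Iterator[float]:
        """Yield every positive finite representable value, subnormals included."""
        E, M = self.exponent_bits, self.mantissa_bits
        for e_field in range(2 ** E - 1):      # top exponent code is inf/NaN
            for m_field in range(2 ** M):
                if e_field == 0:               # subnormals: no implicit leading 1
                    if m_field == 0:
                        continue               # that pattern encodes zero
                    yield 2.0 ** (1 - self.bias) * (m_field / 2 ** M)
                else:
                    yield 2.0 ** (e_field - self.bias) * (1 + m_field / 2 ** M)
```

As a check, `FloatFormat(5, 10)` (FP16) yields 31743 positive values with a maximum of 65504, matching the table above.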

#### Integer code
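Again, as a stand-in for the missing embed, a hypothetical sketch of a scaled integer format such as `INT8x512` (not the repo's actual implementation):

```python
class IntFormat:
    """Sketch of a symmetric scaled integer format (e.g. INT8x512)."""

    def __init__(self, bits: int = 8, scale: float = 1.0):
        self.bits = bits
        self.scale = scale

    def positive_values(self) -> list:
        # 2^(bits-1) - 1 positive values, uniformly spaced in regular space
        return [i * self.scale for i in range(1, 2 ** (self.bits - 1))]
```

`IntFormat(8, 512.0)` gives 127 positive values with a maximum of 127 × 512 = 65024.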

If you have any questions/feedback, feel free to get in touch with me at thecharlieblake [at] gmail.com, or via my twitter @thecharlieblake.

If you have a Notion account you can also leave comments on the page directly.