## Benefits of sub-32-bit fp training

- Less memory usage

- Less memory bandwidth required (local and network)

- Faster math (lower-precision multiply-accumulate units have higher throughput)

## Minimum non-denorm value in FP16

- Bias = 15 (for 5 exponent bits: 2^(5-1) - 1)

- Min exp field for non-denorm = 1

- The minimum value therefore is 2^(1-15) = 2^-14 ≈ 6.10e-5
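A quick check of the arithmetic above in Python (numpy is assumed available; `finfo` reports float16's smallest normal value as `tiny`):

```python
import numpy as np

# FP16: 5 exponent bits, so bias = 2**(5-1) - 1 = 15.
# Smallest normal value: exponent field = 1 -> 2**(1 - 15) = 2**-14.
min_normal = 2.0 ** -14
print(min_normal)                 # 6.103515625e-05
print(np.finfo(np.float16).tiny)  # numpy agrees
```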

## Minimum denorm value in FP16

- Minimum denorm value starts from the minimum normal value, 2^-14

- If only the smallest significand bit is set then we multiply by 2^-10

- Giving 2^-24 ≈ 5.96e-8
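Checking the denorm arithmetic the same way (the `smallest_subnormal` attribute assumes numpy >= 1.22):

```python
import numpy as np

# Smallest subnormal: the minimum normal value (2**-14) scaled by the
# smallest significand bit (2**-10) -> 2**-24.
min_denorm = 2.0 ** -14 * 2.0 ** -10
print(min_denorm)  # ≈ 5.96e-08
print(np.finfo(np.float16).smallest_subnormal)  # numpy agrees
```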

## Maximum value in FP16

- Set all exp bits to 1 except smallest (all-ones exponent is reserved for Inf/NaN)

- Bias = 15

- Exp field = 11110 in binary = 30

- Combined with the bias this gives a value of 2^(30-15) = 2^15

- The significand bits set all to 1 then multiply by 2 - 2^-10

- To give a max value of (2 - 2^-10) * 2^15

- = 65504
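The max-value derivation can be verified directly (numpy assumed available):

```python
import numpy as np

# Exponent field 11110 (binary) = 30; bias 15 -> exponent 15.
# All significand bits set -> 1 + (1 - 2**-10) = 2 - 2**-10.
max_fp16 = (2 - 2.0 ** -10) * 2.0 ** 15
print(max_fp16)                  # 65504.0
print(np.finfo(np.float16).max)  # numpy agrees
```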

## Loss scaling steps

All in FP16 (with a few specific exceptions):

- Multiply loss by scale factor

- Standard backprop (chain rule ensures scaling propagates)

- Multiply the weight gradient by 1/scale factor (unscale) and feed it to the optimiser
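A minimal sketch of the three steps on a toy one-weight model (the scale value, learning rate, and all variable names are illustrative, not from the source; gradients here are written out by hand rather than by autodiff):

```python
import numpy as np

SCALE = 2.0 ** 10  # illustrative static scale factor

w = np.float16(0.5)                      # FP16 weight
x, y = np.float16(2.0), np.float16(1.5)  # one toy training example

# Forward in FP16, then multiply the loss by the scale factor.
loss = (w * x - y) ** 2
scaled_loss = np.float16(SCALE) * loss

# Backprop in FP16: the chain rule carries the scale into every gradient.
# d(scaled_loss)/dw = SCALE * 2 * (w*x - y) * x
scaled_grad = np.float16(SCALE) * np.float16(2) * (w * x - y) * x

# Unscale the weight gradient (here in FP32) before the optimiser step.
grad = np.float32(scaled_grad) / SCALE
w = np.float16(np.float32(w) - 0.1 * grad)
```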

## Loss scaling: when to use FP32 typically

- Master copy of weights

- Large reductions → e.g. batch-norm mean & var statistics

## Loss scaling: how to choose a scaling factor dynamically

Increase the scale gradually until overflow occurs, then decrease it and skip that step's weight update
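The rule above can be sketched as a small scaler class (the class name, method names, and default constants are all illustrative; the overall shape loosely mirrors dynamic-scaling utilities found in mixed-precision libraries, but this is not any particular library's API):

```python
import math

class DynamicLossScaler:
    """Illustrative dynamic loss scaler: grow the scale after a run of
    overflow-free steps, shrink it immediately on overflow."""

    def __init__(self, init_scale=2.0 ** 15, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0  # consecutive overflow-free steps

    def update(self, grads):
        """Inspect this step's gradients; return True if the optimiser
        step should run, False if it should be skipped."""
        overflow = any(not math.isfinite(g) for g in grads)
        if overflow:
            self.scale *= self.backoff_factor  # decrease, skip the update
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale *= self.growth_factor  # increase gradually
                self._good_steps = 0
        return not overflow
```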