
Experiment with bf16 on nvidia graphics cards #647

Open
rminsil opened this issue Feb 7, 2025 · 1 comment · Fixed by #648
Assignees
Labels
enhancement New feature or request

Comments

@rminsil
Collaborator

rminsil commented Feb 7, 2025

Overview

Our nllb models use the standard fp16 floating-point format under the hood. This issue proposes exploring the benefits of using bf16 ("brain floating point"), which may:

  • improve accuracy
  • reduce training time
  • reduce memory requirements

Proposal

  • understand what support nvidia GPUs have for bf16
  • add an experimental flag that allows enabling bf16 (current default behavior won't change)
  • for nllb, investigate what effect using bf16 instead of fp16 has in terms of
    • model accuracy
    • training time
    • GPU memory usage

Investigation related to tf32 is out of scope.

In the future, if bf16 proves to be stable and generally better, we may make it the default, but that's out of scope for this issue.

Current settings

Currently we use this logic to set the preferred floating-point settings:

def _create_training_arguments(self) -> Seq2SeqTrainingArguments:
    ...
    merge_dict(
        args,
        {
            "fp16": self._mixed_precision and not self._is_t5, # <------------
            "bf16": self._mixed_precision and self._is_t5,     # <------------
            "tf32": self._mixed_precision,
        },
    )
    ...

(For context, the _is_t5 field is related to the Google MADLAD model and will be False for nllb.)

For nllb, mixed precision is enabled by default, so usually the above will reduce to:

{
    "fp16": True,
    "bf16": False,
    "tf32": True,
}

This means the models are using fp16 currently.
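For reference, those three flags map directly onto Hugging Face's Seq2SeqTrainingArguments. A minimal sketch of the resolved values (illustrative only: "tmp_out" is a placeholder output directory, not our real config, and constructing this with fp16/tf32 enabled assumes a CUDA GPU, Ampere or newer for tf32):

from transformers import Seq2SeqTrainingArguments

# Illustrative only: placeholder output_dir, standalone construction outside our config code.
training_args = Seq2SeqTrainingArguments(
    output_dir="tmp_out",
    fp16=True,   # current default for nllb when mixed precision is enabled
    bf16=False,  # the flag this issue proposes experimenting with
    tf32=True,   # out of scope for this issue
)
print(training_args.fp16, training_args.bf16, training_args.tf32)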

Notes on floating point formats

fp16

fp16 = "half precision floating point"

https://en.wikipedia.org/wiki/Half-precision_floating-point_format

It's the standard 16 bit floating point representation (see IEEE 754 standard)

The 16 bits are used like so:

       1   5       10
       s eeeee ffffffffff
          exp. significand

s = sign bit
e = exponent bits
f = fractional bits

Because fp16 has only 5 exponent bits, the largest normal value that can be represented is 65504.
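A quick way to see those limits (a minimal sketch using numpy, not part of our codebase):

import numpy as np

info = np.finfo(np.float16)
print(info.max)             # 65504.0, the largest normal fp16 value
print(info.tiny)            # ~6.1e-05, the smallest positive normal fp16 value
print(np.float16(70000.0))  # too large for fp16, rounds to inf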

fp32

fp32 = "single precision floating point"

https://en.wikipedia.org/wiki/Single-precision_floating-point_format

The big brother of fp16

       1    8               23
fp32   s eeeeeeee fffffffffffffffffffffff

s = sign bit
e = exponent bits
f = fractional bits

fp16, fp32 and mixed precision

32-bit floats are more accurate, but they take twice the space, halving the number of values you can fit into a given area of memory.

In the context of GPUs and model training, this can make training much slower.

That is why it can be helpful to use "mixed precision", where you sometimes drop down to 16-bit values like fp16.

There is a trade-off between accuracy and speed. Using fp16 also introduces problems because it cannot represent very large numbers or very small positive numbers. These issues can lower the accuracy of the model, and the conversions involved in loss scaling add processing time and memory.

By default our model training uses mixed precision, but it can be disabled with --disable-mixed-precision.
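For illustration, mixed precision with loss scaling looks roughly like this in plain PyTorch (a generic sketch, not our training loop; the Hugging Face Trainer does the equivalent internally when fp16 is enabled, and this assumes a CUDA GPU):

import torch

model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()  # scales the loss so small fp16 gradients don't underflow

for step in range(10):
    batch = torch.randn(32, 512, device="cuda")
    optimizer.zero_grad()
    # Selected ops run in fp16 inside autocast; master weights stay in fp32.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(batch).pow(2).mean()
    scaler.scale(loss).backward()  # backward pass on the scaled loss
    scaler.step(optimizer)         # unscales gradients, then steps
    scaler.update()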

bf16

bf16 = "brain floating point"

Created by the Google Brain AI research group.

https://en.wikipedia.org/wiki/Bfloat16_floating-point_format

       1    8        7
       s eeeeeeee fffffff
          exp.    significand

s = sign bit
e = exponent bits
f = fractional bits

You can think of it as fp16 with 3 bits moved from the significand to the exponent. This means it can represent much larger numbers and much smaller positive numbers, but you get fewer significant figures.

Another way to understand it is in relation to fp32:

bf16   s eeeeeeee fffffff (7)
fp32   s eeeeeeee fffffffffffffffffffffff (23)
                         ^^^^^^^^^^^^^^^^
                              16
                       additional precision

It is really just fp32 with the last 16 precision bits "chopped off", and they have the same number of exponent bits.

Conversions between fp32 and bf16 are very efficient because of this bit layout (see the sketch after the list):

  • fp32 -> bf16: chop off the back 16 bits
  • bf16 -> fp32: pad the back with 16 zero bits
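A minimal Python sketch of those two conversions (illustrative only; real hardware typically rounds to nearest rather than simply truncating):

import struct

def fp32_to_bf16_bits(x: float) -> int:
    # Reinterpret the float32 as 32 raw bits and drop the low 16 bits (truncation).
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bf16_bits_to_fp32(bits: int) -> float:
    # Pad the back with 16 zero bits and reinterpret as a float32.
    (value,) = struct.unpack("<f", struct.pack("<I", bits << 16))
    return value

x = 3.14159265
b = fp32_to_bf16_bits(x)
print(hex(b), bf16_bits_to_fp32(b))  # 0x4049 -> 3.140625, only ~3 significant decimal digits survive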

There's a nice summary in this post of the potential benefits for bf16 over fp16.

tf32

tf32 = "tensor float 32"

https://en.wikipedia.org/wiki/TensorFloat-32

It only uses 19 bits in total:

       1    8        *
bf16   s eeeeeeee fffffff (7)
fp32   s eeeeeeee fffffffffffffffffffffff (23)
tf32   s eeeeeeee ffffffffff (10)

My impression is that it is intended as a way to "reinterpret" existing fp32 values at lower precision so they can be processed faster. I don't think it packs data into 19-bit chunks; values still occupy 32 bits of storage, so it is effectively a lazy, "backwards compatible" fp32.
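For context, this is how tf32 is typically toggled in PyTorch (illustration only, since tf32 investigation is out of scope for this issue):

import torch

# Allow matmuls and cuDNN convolutions to use TF32 tensor cores on Ampere+ GPUs.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True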

Historical support

My impression is that bf16 was initially created by Google, with native support added to their custom TPUs.
This was back around 2018, but I haven't found an exact date.

At that point, the existing nvidia GPUs wouldn't have had native support for bf16.

However, since then other chip manufacturers have started adding native support, and I suspect that most nvidia hardware
used by the silnlp team for local development and in our clearml infra would support it.

For example, these GPUs should support bf16:

  • 3000 series
  • 4000 series
  • A100
  • H100

However, it is not clear to me yet what effect enabling it would have.
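One quick way to check what a particular card reports is a small PyTorch snippet like this (a sketch to run on the machine or clearml agent in question):

import torch

if torch.cuda.is_available():
    name = torch.cuda.get_device_name(0)
    major, minor = torch.cuda.get_device_capability(0)
    # Native bf16 support starts with Ampere (compute capability 8.0), e.g. 3000 series, A100.
    print(name, f"compute capability {major}.{minor}", "bf16 supported:", torch.cuda.is_bf16_supported())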

@rminsil rminsil linked a pull request Feb 7, 2025 that will close this issue
@rminsil rminsil reopened this Feb 7, 2025
@rminsil rminsil self-assigned this Feb 7, 2025
@davidbaines davidbaines added the enhancement New feature or request label Feb 7, 2025
@ddaspit ddaspit moved this from 🆕 New to 📋 Backlog in SIL-NLP Research Feb 7, 2025
@rminsil rminsil assigned davidbaines and unassigned rminsil Feb 9, 2025
@rminsil
Collaborator Author

rminsil commented Feb 9, 2025

David and I caught up to discuss this.

David volunteered to do the experimentation. For initial experimentation/POC, we don't even need a branch or experimental CLI support - all David needs to do is hack changes directly into the method _create_training_arguments at line 1225 of hugging_face_config.py:

         merge_dict(
             args,
             {
-                "fp16": self._mixed_precision and not self._is_t5,
-                "bf16": self._mixed_precision and self._is_t5,
+                "fp16": False,
+                "bf16": True,
                 "tf32": self._mixed_precision,
             },
         )
