Releases: huggingface/optimum-neuron

v0.0.18: AWS Neuron SDK 2.16.1, NeuronX TGI improvements, PP for Training

01 Feb 10:18

What's Changed

AWS SDK

  • Use AWS Neuron SDK 2.16.1 (#449)

Inference

  • Preliminary support for neff/weights decoupling by @JingyaHuang (#402)
  • Allow exporting decoder models using optimum-cli by @dacorvo (#422)
  • Add Neuron X cache registry by @dacorvo (#442)
  • Add StoppingCriteria to generate() of NeuronModelForCausalLM by @dacorvo (#454)
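
The StoppingCriteria hook follows the standard transformers convention: each criterion is called after every decoding step and returns True to halt generation. Below is a minimal sketch of a custom criterion, shown as a plain callable so it runs standalone; in real code you would subclass transformers.StoppingCriteria and wrap it in a StoppingCriteriaList. The model path in the commented usage is a placeholder.

```python
class StopOnTokenId:
    """Stop generation once the most recent token equals stop_id.

    Sketch only: in practice, subclass transformers.StoppingCriteria.
    """

    def __init__(self, stop_id):
        self.stop_id = stop_id

    def __call__(self, input_ids, scores, **kwargs):
        # input_ids is a batch of token-id sequences; inspect the newest token.
        return int(input_ids[0][-1]) == self.stop_id

# Hypothetical usage with a compiled decoder model (path is a placeholder):
# from optimum.neuron import NeuronModelForCausalLM
# from transformers import StoppingCriteriaList
# model = NeuronModelForCausalLM.from_pretrained("llama_neuron/")
# outputs = model.generate(
#     **inputs,
#     stopping_criteria=StoppingCriteriaList([StopOnTokenId(eos_id)]),
# )
```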

TGI

  • TGI: support vanilla transformer models whose configuration is cached by @dacorvo (#445)

Major bugfixes

  • TGI: correctly identify special tokens during generation by @dacorvo (#438)
  • TGI: do not include the input_text in generated text by @dacorvo (#454)
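
The second fix amounts to stripping the prompt prefix from the decoded output before returning it to the client. A hedged sketch of the idea (not TGI's actual implementation):

```python
def strip_prompt(full_text: str, prompt: str) -> str:
    """Return only the newly generated portion of a decoded sequence.

    When the decoded output starts with the input text, drop that prefix
    so callers only see the new tokens.
    """
    if full_text.startswith(prompt):
        return full_text[len(prompt):]
    return full_text
```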

Full Changelog: v0.0.17...v0.0.18

v0.0.17: AWS Neuron SDK 2.16, Mistral, sentence transformers, inference cache

19 Jan 07:19

What's Changed

AWS SDK

  • Use AWS Neuron SDK 2.16 (#398)
  • Use the official serialization API for transformers_neuronx models instead of the beta one by @aws-yishanm (#387, #393)

Inference

  • Improve the support of sentence transformers by @JingyaHuang (#408)
  • Add Neuronx compile cache Hub proxy and use it for LLM decoder models by @dacorvo (#410)
  • Add support for Mistral models by @dacorvo (#411)
  • Do not upload Neuron LLM weights when they can be fetched from the hub by @dacorvo (#413)
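
Embeddings produced by a Neuron-compiled sentence-transformers model are consumed exactly as on any other backend. As an illustration, here is a stdlib sketch of cosine similarity between two embedding vectors; the commented model-loading lines are hypothetical usage (class name and path are assumptions) and require an inf2/trn1 host:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical usage with a Neuron-compiled model (placeholder path):
# from optimum.neuron import NeuronModelForSentenceTransformers
# model = NeuronModelForSentenceTransformers.from_pretrained("st_neuron/")
```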

Full Changelog: v0.0.16...v0.0.17

v0.0.16: T5 export and inference, general training fixes

19 Dec 13:29

What's Changed

Training

A few fixes related to precompilation and checkpointing. These fixes enable training LLMs on AWS Trainium instances without friction.

  • Skip model saving during precompilation and provide an option to skip cache push (#365)
  • Fix checkpoint saving and consolidation for tensor parallelism (#378)
  • A torch_xla-compatible version of safetensors.torch.save_file is now used in the NeuronTrainer (#329)

Inference

  • Support for the export and inference of T5 (#267)
  • New documentation for Stable Diffusion XL Turbo (#374)

v0.0.15: Mistral training, Tensor parallelism improvement, better integration with the AWS SDK

24 Nov 17:46

What's Changed

Training

Distributed Training

  • parallel_cross_entropy loss support for tensor parallelism (#246)
  • Support for training the Mistral architecture with tensor parallelism (#303)

AWS SDK

  • Fix: neuron_parallel_compile is compatible with the cache system (#352)
  • Full support for neuron_parallel_compile with the cache system: compilation files produced by neuron_parallel_compile will be pushed to the remote cache repo on the Hugging Face Hub at the beginning of the next training job (#354)
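
The cache system resolves its target repository on the Hugging Face Hub from the CUSTOM_CACHE_REPO environment variable. A minimal sketch, assuming that variable is read at startup (the repository name is a placeholder):

```python
import os

# Point the Neuron compilation cache at a custom Hub repository before
# launching training. "my-org/optimum-neuron-cache" is a placeholder;
# assumption: optimum-neuron reads this variable when the job starts.
os.environ["CUSTOM_CACHE_REPO"] = "my-org/optimum-neuron-cache"
```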

Documentation

  • Guide explaining how distributed training works in optimum-neuron (#339)

Inference

  • Data parallelism option for Stable Diffusion LCM, allowing multi-device inference (#346)
  • Support decoding sequences of byte tokens in TGI (#350)
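
Byte-fallback tokenizers represent out-of-vocabulary characters as byte tokens such as <0xC3>, and a single multi-byte character spans several of them, so they can only be decoded once the full byte run is available. A hedged sketch of the idea (not TGI's actual implementation):

```python
def decode_byte_tokens(tokens):
    """Decode tokens where byte-fallback tokens look like "<0xNN>".

    Consecutive byte tokens are accumulated and UTF-8 decoded together,
    since one multi-byte character spans several tokens.
    """
    out, pending = [], bytearray()
    for tok in tokens:
        if tok.startswith("<0x") and tok.endswith(">"):
            pending.append(int(tok[3:-1], 16))
        else:
            if pending:
                out.append(pending.decode("utf-8", errors="replace"))
                pending = bytearray()
            out.append(tok)
    if pending:
        out.append(pending.decode("utf-8", errors="replace"))
    return "".join(out)
```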

Documentation

  • Updated the documentation on LCM (#351)

v0.0.14: LCM support

17 Nov 16:38

What's Changed

LCM support

  • [Stable Diffusion] Add LCM (Latent Consistency Models) support by @JingyaHuang in #323

Full Changelog: v0.0.13...v0.0.14

v0.0.13: AWS Neuron SDK 2.15

27 Oct 09:08

What's Changed

The main change in this release is the alignment with AWS Neuron SDK 2.15.

Full Changelog: v0.0.12...v0.0.13

v0.0.12.1: Patch release for training with Neuron SDK 2.14

27 Oct 14:08

v0.0.12: SDXL refiner, Sequence parallelism training

16 Oct 08:42

What's Changed

Stable Diffusion: SDXL Refiner, Stable Diffusion Img2Img, Inpaint support

Distributed Training

Text generation updates

Full Changelog: v0.0.11...v0.0.12

v0.0.11: SDXL, Llama v2 training and inference, Inf2-powered TGI

12 Sep 13:50

SDXL Export and Inference

Optimum CLI now supports compiling components in the SDXL pipeline for inference on neuron devices (inf2/trn1).

Below is an example of compiling an SDXL model. You can compile it either on an inf2 instance (inf2.8xlarge or larger recommended) or on a CPU-only instance (disabling validation with --disable-validation):

optimum-cli export neuron --model stabilityai/stable-diffusion-xl-base-1.0 --task stable-diffusion-xl --batch_size 1 --height 1024 --width 1024 --auto_cast matmul --auto_cast_type bf16 sdxl_neuron/

You can then run inference with the NeuronStableDiffusionXLPipeline class:

from optimum.neuron import NeuronStableDiffusionXLPipeline

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained(
    model_id="sdxl_neuron/", device_ids=[0, 1]
)
image = stable_diffusion_xl(prompt).images[0]

Llama v1, v2 Inference

  • Add support for Llama inference through NeuronModelForCausalLM by @dacorvo in #223

Full Changelog: v0.0.10...v0.0.11

v0.0.10: Bugfixes and enhancements

28 Aug 11:55

Major bugfixes

  • Improve and fix the Inferentia exporter by @JingyaHuang in #168
  • [Stable Diffusion] Fix image size inference by @JingyaHuang in #167
  • Fix inference of dynamic batch size from the config and ensure compatibility with transformers 4.32 by @JingyaHuang in #190

Full Changelog: v0.0.9...v0.0.10