Releases · huggingface/optimum-neuron

01 Feb 10:18

dacorvo

v0.0.18

7b18de9

v0.0.18: AWS Neuron SDK 2.16.1, NeuronX TGI improvements, PP for Training

What's Changed

AWS SDK

Use AWS Neuron SDK 2.16.1 (#449)

Inference

Preliminary support for neff/weights decoupling by @JingyaHuang (#402)
Allow exporting decoder models using optimum-cli by @dacorvo (#422)
Add Neuron X cache registry by @dacorvo (#442)
Add StoppingCriteria to generate() of NeuronModelForCausalLM by @dacorvo (#454)

Training

Initial support for pipeline parallelism by @michaelbenayoun (#279)

TGI

TGI: support vanilla transformer models whose configuration is cached by @dacorvo (#445)

Tutorials and doc improvement

Various fixes by @jimburtoft @michaelbenayoun @JingyaHuang (#428 #429 #432)
Improve Stable Diffusion Notebooks by @JingyaHuang (#431)
Add Sentence Transformers Guide and Notebook by @philschmid (#434)
Add benchmark section by @dacorvo (#435)

Major bugfixes

TGI: correctly identify special tokens during generation by @dacorvo (#438)
TGI: do not include the input_text in generated text by @dacorvo (#454)

Other changes

API change to be compatible to Optimum by @JingyaHuang (#421)

New Contributors

@jimburtoft made their first contribution in #432

Full Changelog: v0.0.17...v0.0.18

Contributors

dacorvo, michaelbenayoun, and 3 other contributors

Assets 2

19 Jan 07:19

dacorvo

v0.0.17

8d4b6dc

v0.0.17: AWS Neuron SDK 2.16, Mistral, sentence transformers, inference cache

What's Changed

AWS SDK

Use AWS Neuron SDK 2.16 (#398)
Use offical serialization API for transformers_neuronx models instead of beta by @aws-yishanm (#387, #393)

Inference

Improve the support of sentence transformers by @JingyaHuang (#408)
Add Neuronx compile cache Hub proxy and use it for LLM decoder models by @dacorvo (#410)
Add support for Mistral models by @dacorvo (#411)
Do not upload Neuron LLM weights when they can be fetched from the hub by @dacorvo (#413)

Training

Add general support for generation on TRN with NxD by @aws-tianquaw (#370)

Tutorials and doc improvement

Add llama 2 fine tuning tutorial by @philschmid (#390)

Major bugfixes

Skip pushing if the user does not have write access to the cache repo by @michaelbenayoun (#405)

Other changes

Bump Hugging Face library versions by @JingyaHuang (#403)

New Contributors

@aws-tianquaw made their first contribution in #370
@aws-yishanm made their first contribution in #387

Full Changelog: v0.0.16...v0.0.17

Contributors

dacorvo, michaelbenayoun, and 4 other contributors

Assets 2

19 Dec 13:29

michaelbenayoun

v0.0.16

c0c1fc8

v0.0.16: T5 export and inference, general training fixes

What's Changed

Training

A few fixes related to precompilation and checkpoiting. Those fixes enable training LLMs on AWS Trainium instances without friction.

Skip model saving during precompilation and provide option to skip cache push (#365)
Fixes checkpoint saving and consolidtation for TP (#378)
A torch_xla compatible version of safetensors.torch.save_file is now used in the NeuronTrainer (#329)

Inference

Support for the export and inference of T5 (#267)
New documentation for Stable Diffusion XL Turbo (#374)

Assets 2

24 Nov 17:46

michaelbenayoun

v0.0.15

3f88322

v0.0.15: Mistral training, Tensor parallelism improvement, better integration with the AWS SDK

What's Changed

Training

Distributed Training

parallel_cross_entropy loss support for tensor parallelism (#246)
Support for training the Mistral architecture with tensor parallelism (#303)

AWS SDK

Fix: neuron_parallel_compile is compatible with the cache system (#352)
Full support for neuron_parallel_compile with the cache system: compilation files produced by neuron_parallel_compile will be pushed to the remote cache repo on the Hugging Face Hub at the beginning of the next training job (#354)

Documentation

Guide explaining how distributed training works in optimum-neuron (#339)

Inference

Data parallelism option for Stable Diffusion - LCM allowing multi-device inference (#346)
Support decoding sequences of byte tokens in TGI (#350)

Documentation

Updated the documentation on LCM (#351)

Assets 2

17 Nov 16:38

JingyaHuang

v0.0.14

d65449e

v0.0.14: LCM support

What's Changed

LCM support

[Stable Diffusion] Add LCM(Latent Consistency Models) support by @JingyaHuang in #323

Tutorials and doc improvement

notebooks: add llama2 chatbot example by @dacorvo in #300
Add llama 2 tutorial by @dacorvo in #321
Migrate documentation of Stable Diffusion and add notebooks by @JingyaHuang in #312

Major bugfixes

Noisy loss fix by @bocchris-aws in #293
Fix neuron cache starting compilation before fetching by @michaelbenayoun in #280
fix(pipelines): support passing decoder model + tokenizer by @dacorvo in #319

Other changes

chore: update dev version by @dacorvo in #276
Explicitly mention aws repo extra url in documentation by @dacorvo in #277
Update supported architecture in the doc by @JingyaHuang in #281
Fix doc build source code broken links by @JingyaHuang in #282
Add revision to push_to_hub by @philschmid in #292
Set default device id for SD and SDXL by @JingyaHuang in #297
Add missing decoder model architectures by @dacorvo in #298
Official support for AWS inferentia2 TGI container by @dacorvo in #302
Transformers fix by @dacorvo in #320
Add sagemaker compatible image by @dacorvo in #322
Fix broken tests by @michaelbenayoun in #274
chore: align with AWS Neuron SDK 2.15.1 by @dacorvo in #325
Deleted the 'maybe_free_model_hooks()' from Diffusers Pipelines by @Cerrix in #330
Bump diffusers version by @JingyaHuang in #335

New Contributors

@Cerrix made their first contribution in #330

Full Changelog: v0.0.13...v0.0.14

Contributors

dacorvo, Cerrix, and 4 other contributors

Assets 2

27 Oct 09:08

dacorvo

v0.0.13

cf97838

v0.0.13: AWS Neuron SDK 2.15

What's Changed

The main change in this release is the alignment with AWS Neuron SDK 2.15.

Text-generation

add support for bloom and opt models by @dacorvo in #275

Other changes

Use attention masks for TGI generation by @dacorvo in #264
Various fixes for TP by @michaelbenayoun in #260
Fix neuron pipelines by @dacorvo in #265
Fix #241 by @michaelbenayoun in #268
Fixes generation during the evaluation step by @michaelbenayoun in #266
Save / load from checkpoint TP by @michaelbenayoun in #269

Full Changelog: v0.0.12...v0.0.13

Contributors

dacorvo and michaelbenayoun

Assets 2

27 Oct 14:08

JingyaHuang

v0.0.12.1

fe11ccf

v0.0.12.1: Patch release for training with Neuron SDK 2.14

Major bugfixes

Fix #241 by @michaelbenayoun in #268

Full Changelog: v0.0.12...v0.0.12.1

Contributors

michaelbenayoun

Assets 2

16 Oct 08:42

JingyaHuang

v0.0.12

78c2c12

v0.0.12: SDXL refiner, Sequence parallelism training

What's Changed

Stable Diffusion: SDXL Refiner, Stable Diffusion Img2Img, Inpaint support

[Stable Diffusion] Image2image and inpaint pipeline support by @JingyaHuang in #161
[SDXL] Add SDXL image to image support by @JingyaHuang in #239

Distributed Training:

Sequence parallelism by @michaelbenayoun in #233
Parallelism support for GPTNeoX by @michaelbenayoun in #244

Text generation updates

Add text generation pipeline by @dacorvo in #258

Other changes

TGI stability fixes by @dacorvo in #226
Remove experimental compilation flag for text-generation models by @dacorvo in #228
Patch for diffusers 0.21.0 release by @JingyaHuang in #229
test_examples uses ExampleRunner by @michaelbenayoun in #227
Using the real model name instead of hard code "model" by @davidshtian in #231
Replace transformers list of logits warpers by a fused logic warper by @dacorvo in #234
Use AWS Neuron SDK 2.14 by @dacorvo in #236
Weight loading after lazy loading fix by @michaelbenayoun in #238
Add debug attribute to NeuronPartialState by @michaelbenayoun in #240
Update tests/test_examples.py for AWS team by @michaelbenayoun in #242
Rework text-generation example by @dacorvo in #245
Fix evaluation recompilation issue by @michaelbenayoun in #248
test(generation): specify revision for hub test model by @dacorvo in #250
Add sequence length for generative models and llama tests by @dacorvo in #251
Fix noisy loss for T5 when doing TP by @michaelbenayoun in #257
Fix bug with transformers 4.34 by @michaelbenayoun in #259

New Contributors

@davidshtian made their first contribution in #231

Full Changelog: v0.0.11...v0.0.12

Contributors

dacorvo, davidshtian, and 2 other contributors

Assets 2

12 Sep 13:50

JingyaHuang

v0.0.11

608f869

v0.0.11: SDXL, LLama v2 training and inference, Inf2 powered TGI

SDXL Export and Inference

Optimum CLI now supports compiling components in the SDXL pipeline for inference on neuron devices (inf2/trn1).

Below is an example of compiling SDXL models. You can either compile it with an inf2 instance (inf2.8xlarge or larger recommended) or a CPU-only instance (disable the validation with --disable-validation) :

optimum-cli export neuron --model stabilityai/stable-diffusion-xl-base-1.0 --task stable-diffusion-xl --batch_size 1 --height 1024 --width 1024 --auto_cast matmul --auto_cast_type bf16 sdxl_neuron/

And then run inference with the class NeuronStableDiffusionXLPipeline

from optimum.neuron import NeuronStableDiffusionXLPipeline

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained(
    model_id="sdxl_neuron/", device_ids=[0, 1]
)
image = stable_diffusion_xl(prompt).images[0]

Add sdxl exporter support by @JingyaHuang in #203
Add Stable Diffusion XL inference support by @JingyaHuang in #212

Llama v1, v2 Inference

Add support for Llama inference through NeuronModelForCausalLM by @dacorvo in #223

Llama v2 Training

Llama V2 training support by @michaelbenayoun in #211
LLama V1 training fix by @michaelbenayoun in #211

TGI

AWS Inferentia2 TGI server by @dacorvo in #214

Major bugfixes

neuron_parallel_compile, ParallelLoader and Zero-1 fixes for torchneuron 8+ by @michaelbenayoun in #200
flan-t5 fix: T5Parallelizer, NeuronCacheCallback and NeuronHash refactors by @michaelbenayoun in #207
Fix optimum-cli broke by optimum 1.13.0 release by @JingyaHuang in #217

Other changes

Bump Inference APIs to Neuron 2.13 by @JingyaHuang in #206
Add log for SD when applying optim attn & pipelines lazy loading by @JingyaHuang in #208
Cancel concurreny CIs for inference by @JingyaHuang in #218
fix(tgi): typer does not support Union types by @dacorvo in #219
Bump neuron-cc version to 1.18.* by @JingyaHuang in #224

Full Changelog: v0.0.10...v0.0.11

Contributors

dacorvo, michaelbenayoun, and JingyaHuang

Assets 2

28 Aug 11:55

JingyaHuang

v0.0.10

14dc28c

v0.0.10: Bugfixes and enhancement

Major bugfixes

Improve and Fix inferentia exporter by @JingyaHuang in #168
[Stable Diffusion] Fix the image size value inferral by @JingyaHuang in #167
Fix inferral of dynamic batch size from the config & Be compatible with transformers 4.32 by @JingyaHuang in #190

Enhancements of APIs

Enable exporter on non INF instances by @JingyaHuang in #178
Support multiple prompts for generation example by @dacorvo in #173
Fix unet export when using optimized attn score by @JingyaHuang in #165
Improve default compilation arguments for stable diffusion by @JingyaHuang in #182
Add num_image_per_prompt support for stable diffusion by @JingyaHuang in #192

Other changes

minor doc fix by @oOraph in #164
Fix duplicates handling in converting to safetensors by @michaelbenayoun in #172
Fix empty preprocessor issue by @JingyaHuang in #180
Update models.mdx by @philschmid in #183
Only run INF2 CI for .code change by @JingyaHuang in #184
Improve Readme and installation guide by @JingyaHuang in #181
Fixes #150 by @michaelbenayoun in #177
Fix TP for t5 by @michaelbenayoun in #179
Improve SD logging by @JingyaHuang in #194
Add mark step after optimizer step by @michaelbenayoun in #195
Option to disable the parallelization of the embedding with TP by @michaelbenayoun in #191
Restrict generation to sampling and greedy search by @dacorvo in #201

New Contributors

@oOraph made their first contribution in #164

Full Changelog: v0.0.9...v0.0.10

Contributors

dacorvo, oOraph, and 3 other contributors

Assets 2

Releases: huggingface/optimum-neuron

v0.0.18: AWS Neuron SDK 2.16.1, NeuronX TGI improvements, PP for Training

What's Changed

AWS SDK

Inference

Training

TGI

Tutorials and doc improvement

Major bugfixes

Other changes

New Contributors

Contributors

v0.0.17: AWS Neuron SDK 2.16, Mistral, sentence transformers, inference cache

What's Changed

AWS SDK

Inference

Training

Tutorials and doc improvement

Major bugfixes

Other changes

New Contributors

Contributors

v0.0.16: T5 export and inference, general training fixes

What's Changed

Training

Inference

v0.0.15: Mistral training, Tensor parallelism improvement, better integration with the AWS SDK

What's Changed

Training

Distributed Training

AWS SDK

Documentation

Inference

Documentation

v0.0.14: LCM support

What's Changed

LCM support

Tutorials and doc improvement

Major bugfixes

Other changes

New Contributors

Contributors

v0.0.13: AWS Neuron SDK 2.15

What's Changed

Text-generation

Other changes

Contributors

v0.0.12.1: Patch release for training with Neuron SDK 2.14

Major bugfixes

Contributors

v0.0.12: SDXL refiner, Sequence parallelism training

What's Changed

Stable Diffusion: SDXL Refiner, Stable Diffusion Img2Img, Inpaint support

Distributed Training:

Text generation updates

Other changes

New Contributors

Contributors

v0.0.11: SDXL, LLama v2 training and inference, Inf2 powered TGI

SDXL Export and Inference

Llama v1, v2 Inference

Llama v2 Training

TGI

Major bugfixes

Other changes

Contributors

v0.0.10: Bugfixes and enhancement

Major bugfixes

Enhancements of APIs

Other changes

New Contributors

Contributors