Releases: huggingface/optimum-neuron
v0.0.18: AWS Neuron SDK 2.16.1, NeuronX TGI improvements, PP for Training
What's Changed
AWS SDK
- Use AWS Neuron SDK 2.16.1 (#449)
Inference
- Preliminary support for neff/weights decoupling by @JingyaHuang (#402)
- Allow exporting decoder models using optimum-cli by @dacorvo (#422)
- Add Neuron X cache registry by @dacorvo (#442)
- Add StoppingCriteria to generate() of NeuronModelForCausalLM by @dacorvo (#454)
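A minimal sketch of the new stopping-criteria support in `NeuronModelForCausalLM.generate()` (#454), assuming the transformers-style `stopping_criteria` argument; the model id, compilation parameters and the criterion itself are illustrative, not a verbatim API reference:

```python
from transformers import AutoTokenizer, StoppingCriteria, StoppingCriteriaList
from optimum.neuron import NeuronModelForCausalLM

class StopOnToken(StoppingCriteria):
    """Stop generation as soon as a given token is produced."""
    def __init__(self, stop_token_id: int):
        self.stop_token_id = stop_token_id

    def __call__(self, input_ids, scores, **kwargs) -> bool:
        return input_ids[0, -1].item() == self.stop_token_id

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # assumption: any supported decoder
# export=True compiles the decoder for Neuron; batch size and sequence
# length are static compilation parameters.
model = NeuronModelForCausalLM.from_pretrained(
    "gpt2", export=True, batch_size=1, sequence_length=128
)
inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(
    **inputs,
    stopping_criteria=StoppingCriteriaList([StopOnToken(tokenizer.eos_token_id)]),
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```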
Training
- Initial support for pipeline parallelism by @michaelbenayoun (#279)
TGI
Tutorials and doc improvement
- Various fixes by @jimburtoft, @michaelbenayoun and @JingyaHuang (#428, #429, #432)
- Improve Stable Diffusion Notebooks by @JingyaHuang (#431)
- Add Sentence Transformers Guide and Notebook by @philschmid (#434)
- Add benchmark section by @dacorvo (#435)
Major bugfixes
- TGI: correctly identify special tokens during generation by @dacorvo (#438)
- TGI: do not include the input_text in generated text by @dacorvo (#454)
Other changes
- API change to be compatible with Optimum by @JingyaHuang (#421)
New Contributors
- @jimburtoft made their first contribution in #432
Full Changelog: v0.0.17...v0.0.18
v0.0.17: AWS Neuron SDK 2.16, Mistral, sentence transformers, inference cache
What's Changed
AWS SDK
- Use AWS Neuron SDK 2.16 (#398)
- Use the official serialization API for `transformers_neuronx` models instead of the beta one by @aws-yishanm (#387, #393)
Inference
- Improve the support of sentence transformers by @JingyaHuang (#408); see the sketch after this list
- Add Neuronx compile cache Hub proxy and use it for LLM decoder models by @dacorvo (#410)
- Add support for Mistral models by @dacorvo (#411)
- Do not upload Neuron LLM weights when they can be fetched from the hub by @dacorvo (#413)
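A minimal hedged sketch of the improved sentence-transformers support (#408); the `NeuronModelForSentenceTransformers` class, checkpoint and compilation arguments follow optimum.neuron conventions and are assumptions rather than a verbatim API reference:

```python
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForSentenceTransformers

model_id = "sentence-transformers/all-MiniLM-L6-v2"  # assumption
tokenizer = AutoTokenizer.from_pretrained(model_id)
# export=True compiles the model for Neuron; batch size and sequence
# length are static compilation parameters.
model = NeuronModelForSentenceTransformers.from_pretrained(
    model_id, export=True, batch_size=1, sequence_length=128
)
inputs = tokenizer("A sentence to embed", return_tensors="pt")
outputs = model(**inputs)
embedding = outputs.sentence_embedding  # pooled sentence-level vector
```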
Training
- Add general support for generation on TRN with NxD by @aws-tianquaw (#370)
Tutorials and doc improvement
- Add llama 2 fine tuning tutorial by @philschmid (#390)
Major bugfixes
- Skip pushing if the user does not have write access to the cache repo by @michaelbenayoun (#405)
Other changes
- Bump Hugging Face library versions by @JingyaHuang (#403)
New Contributors
- @aws-tianquaw made their first contribution in #370
- @aws-yishanm made their first contribution in #387
Full Changelog: v0.0.16...v0.0.17
v0.0.16: T5 export and inference, general training fixes
What's Changed
Training
A few fixes related to precompilation and checkpointing. These fixes enable training LLMs on AWS Trainium instances without friction.
- Skip model saving during precompilation and provide an option to skip cache push (#365)
- Fixes checkpoint saving and consolidation for TP (#378)
- A `torch_xla`-compatible version of `safetensors.torch.save_file` is now used in the `NeuronTrainer` (#329)
Inference
v0.0.15: Mistral training, Tensor parallelism improvement, better integration with the AWS SDK
What's Changed
Training
Distributed Training
- `parallel_cross_entropy` loss support for tensor parallelism (#246)
- Support for training the Mistral architecture with tensor parallelism (#303); see the sketch below
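A minimal hedged sketch of enabling tensor parallelism for such a training run; the `tensor_parallel_size` argument, model id and dummy dataset are illustrative assumptions:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM
from optimum.neuron import NeuronTrainer, NeuronTrainingArguments

# assumption: a Mistral checkpoint whose weights get sharded across cores
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# tiny dummy dataset so the sketch stays self-contained
train_dataset = Dataset.from_dict(
    {"input_ids": [[1, 2, 3, 4]], "attention_mask": [[1, 1, 1, 1]], "labels": [[1, 2, 3, 4]]}
)

training_args = NeuronTrainingArguments(
    output_dir="mistral_tp",
    per_device_train_batch_size=1,
    tensor_parallel_size=8,  # assumption: shard the model across 8 Neuron cores
)
trainer = NeuronTrainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()
```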
AWS SDK
- Fix: `neuron_parallel_compile` is compatible with the cache system (#352)
- Full support for `neuron_parallel_compile` with the cache system: compilation files produced by `neuron_parallel_compile` will be pushed to the remote cache repo on the Hugging Face Hub at the beginning of the next training job (#354)
Documentation
Inference
- Data parallelism option for Stable Diffusion / LCM, allowing multi-device inference (#346); see the sketch below
- Support decoding sequences of byte tokens in TGI (#350)
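A hedged sketch of this multi-device data parallelism for Stable Diffusion pipelines; the checkpoint, compilation arguments and the `data_parallel_mode` value are illustrative assumptions, not necessarily the exact API introduced in #346:

```python
from optimum.neuron import NeuronStableDiffusionPipeline

# Compile the pipeline and replicate it across Neuron devices so that
# the prompts in a batch are denoised in parallel.
pipe = NeuronStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumption: any supported SD checkpoint
    export=True,
    batch_size=2,
    height=512,
    width=512,
    data_parallel_mode="all",  # assumption: replicate the whole pipeline
)
prompts = ["a photo of an astronaut", "a watercolor mountain landscape"]
images = pipe(prompts).images  # one image per prompt, generated in parallel
```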
Documentation
- Updated the documentation on LCM (#351)
v0.0.14: LCM support
What's Changed
LCM support
- [Stable Diffusion] Add LCM (Latent Consistency Models) support by @JingyaHuang in #323
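A minimal hedged sketch of running an LCM on Neuron; the `NeuronLatentConsistencyModelPipeline` name follows the optimum.neuron naming conventions, and the checkpoint and compilation arguments are illustrative assumptions:

```python
from optimum.neuron import NeuronLatentConsistencyModelPipeline

model_id = "SimianLuo/LCM_Dreamshaper_v7"  # assumption: an LCM checkpoint
# export=True compiles the pipeline components for Neuron on the fly;
# height/width/batch size are static compilation parameters.
pipe = NeuronLatentConsistencyModelPipeline.from_pretrained(
    model_id, export=True, batch_size=1, height=768, width=768
)
# LCMs only need a handful of denoising steps.
image = pipe("Self-portrait oil painting, a beautiful cyborg", num_inference_steps=4).images[0]
image.save("lcm_neuron.png")
```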
Tutorials and doc improvement
- notebooks: add llama2 chatbot example by @dacorvo in #300
- Add llama 2 tutorial by @dacorvo in #321
- Migrate documentation of Stable Diffusion and add notebooks by @JingyaHuang in #312
Major bugfixes
- Noisy loss fix by @bocchris-aws in #293
- Fix neuron cache starting compilation before fetching by @michaelbenayoun in #280
- fix(pipelines): support passing decoder model + tokenizer by @dacorvo in #319
Other changes
- chore: update dev version by @dacorvo in #276
- Explicitly mention aws repo extra url in documentation by @dacorvo in #277
- Update supported architecture in the doc by @JingyaHuang in #281
- Fix doc build source code broken links by @JingyaHuang in #282
- Add revision to push_to_hub by @philschmid in #292
- Set default device id for SD and SDXL by @JingyaHuang in #297
- Add missing decoder model architectures by @dacorvo in #298
- Official support for AWS inferentia2 TGI container by @dacorvo in #302
- Transformers fix by @dacorvo in #320
- Add sagemaker compatible image by @dacorvo in #322
- Fix broken tests by @michaelbenayoun in #274
- chore: align with AWS Neuron SDK 2.15.1 by @dacorvo in #325
- Deleted `maybe_free_model_hooks()` from the Diffusers pipelines by @Cerrix in #330
- Bump diffusers version by @JingyaHuang in #335
New Contributors
- @bocchris-aws made their first contribution in #293
- @Cerrix made their first contribution in #330
Full Changelog: v0.0.13...v0.0.14
v0.0.13: AWS Neuron SDK 2.15
What's Changed
The main change in this release is the alignment with AWS Neuron SDK 2.15.
Text-generation
Other changes
- Use attention masks for TGI generation by @dacorvo in #264
- Various fixes for TP by @michaelbenayoun in #260
- Fix neuron pipelines by @dacorvo in #265
- Fix #241 by @michaelbenayoun in #268
- Fixes generation during the evaluation step by @michaelbenayoun in #266
- Save / load from checkpoint TP by @michaelbenayoun in #269
Full Changelog: v0.0.12...v0.0.13
v0.0.12.1: Patch release for training with Neuron SDK 2.14
v0.0.12: SDXL refiner, Sequence parallelism training
What's Changed
Stable Diffusion: SDXL Refiner, Stable Diffusion Img2Img, Inpaint support
- [Stable Diffusion] Image2image and inpaint pipeline support by @JingyaHuang in #161
- [SDXL] Add SDXL image to image support by @JingyaHuang in #239
Distributed Training:
- Sequence parallelism by @michaelbenayoun in #233
- Parallelism support for GPTNeoX by @michaelbenayoun in #244
Text generation updates
Other changes
- TGI stability fixes by @dacorvo in #226
- Remove experimental compilation flag for text-generation models by @dacorvo in #228
- Patch for diffusers 0.21.0 release by @JingyaHuang in #229
- test_examples uses ExampleRunner by @michaelbenayoun in #227
- Use the real model name instead of the hard-coded "model" by @davidshtian in #231
- Replace the transformers list of logits warpers by a fused logits warper by @dacorvo in #234
- Use AWS Neuron SDK 2.14 by @dacorvo in #236
- Weight loading after lazy loading fix by @michaelbenayoun in #238
- Add a `debug` attribute to `NeuronPartialState` by @michaelbenayoun in #240
- Update `tests/test_examples.py` for the AWS team by @michaelbenayoun in #242
- Rework text-generation example by @dacorvo in #245
- Fix evaluation recompilation issue by @michaelbenayoun in #248
- test(generation): specify revision for hub test model by @dacorvo in #250
- Add sequence length for generative models and llama tests by @dacorvo in #251
- Fix noisy loss for T5 when doing TP by @michaelbenayoun in #257
- Fix bug with transformers 4.34 by @michaelbenayoun in #259
New Contributors
- @davidshtian made their first contribution in #231
Full Changelog: v0.0.11...v0.0.12
v0.0.11: SDXL, LLama v2 training and inference, Inf2 powered TGI
SDXL Export and Inference
Optimum CLI now supports compiling components of the SDXL pipeline for inference on Neuron devices (inf2/trn1).
Below is an example of compiling SDXL models. You can compile them either on an inf2 instance (`inf2.8xlarge` or larger recommended) or on a CPU-only instance (disable the validation with `--disable-validation`):
```bash
optimum-cli export neuron --model stabilityai/stable-diffusion-xl-base-1.0 --task stable-diffusion-xl --batch_size 1 --height 1024 --width 1024 --auto_cast matmul --auto_cast_type bf16 sdxl_neuron/
```
Then run inference with the `NeuronStableDiffusionXLPipeline` class:

```python
from optimum.neuron import NeuronStableDiffusionXLPipeline

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
stable_diffusion_xl = NeuronStableDiffusionXLPipeline.from_pretrained(
    model_id="sdxl_neuron/", device_ids=[0, 1]
)
image = stable_diffusion_xl(prompt).images[0]
```
- Add sdxl exporter support by @JingyaHuang in #203
- Add Stable Diffusion XL inference support by @JingyaHuang in #212
Llama v1, v2 Inference
Llama v2 Training
- Llama V2 training support by @michaelbenayoun in #211
- Llama V1 training fix by @michaelbenayoun in #211
TGI
Major bugfixes
- `neuron_parallel_compile`, `ParallelLoader` and Zero-1 fixes for torchneuron 8+ by @michaelbenayoun in #200
- flan-t5 fix: `T5Parallelizer`, `NeuronCacheCallback` and `NeuronHash` refactors by @michaelbenayoun in #207
- Fix optimum-cli broken by the optimum 1.13.0 release by @JingyaHuang in #217
Other changes
- Bump Inference APIs to Neuron 2.13 by @JingyaHuang in #206
- Add log for SD when applying optim attn & pipelines lazy loading by @JingyaHuang in #208
- Cancel concurrency CIs for inference by @JingyaHuang in #218
- fix(tgi): typer does not support Union types by @dacorvo in #219
- Bump neuron-cc version to 1.18.* by @JingyaHuang in #224
Full Changelog: v0.0.10...v0.0.11
v0.0.10: Bugfixes and enhancements
Major bugfixes
- Improve and fix the Inferentia exporter by @JingyaHuang in #168
- [Stable Diffusion] Fix the image size value inference by @JingyaHuang in #167
- Fix inference of dynamic batch size from the config and be compatible with transformers 4.32 by @JingyaHuang in #190
Enhancements of APIs
- Enable the exporter on non-INF instances by @JingyaHuang in #178
- Support multiple prompts for generation example by @dacorvo in #173
- Fix unet export when using optimized attn score by @JingyaHuang in #165
- Improve default compilation arguments for stable diffusion by @JingyaHuang in #182
- Add `num_image_per_prompt` support for stable diffusion by @JingyaHuang in #192
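A hedged sketch of generating several images per prompt; the `num_images_per_prompt` spelling follows the diffusers convention and, like the checkpoint, is an assumption rather than a verbatim API reference:

```python
from optimum.neuron import NeuronStableDiffusionPipeline

# On Neuron, the number of images per prompt is a static compilation
# parameter: compile with it, then request that many images at inference.
pipe = NeuronStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumption: any supported SD checkpoint
    export=True,
    batch_size=1,
    num_images_per_prompt=4,
    height=512,
    width=512,
)
images = pipe("a photo of an astronaut riding a horse").images  # 4 images
```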
Other changes
- minor doc fix by @oOraph in #164
- Fix duplicates handling in converting to `safetensors` by @michaelbenayoun in #172
- Fix empty preprocessor issue by @JingyaHuang in #180
- Update models.mdx by @philschmid in #183
- Only run INF2 CI for .code change by @JingyaHuang in #184
- Improve Readme and installation guide by @JingyaHuang in #181
- Fixes #150 by @michaelbenayoun in #177
- Fix TP for t5 by @michaelbenayoun in #179
- Improve SD logging by @JingyaHuang in #194
- Add mark step after optimizer step by @michaelbenayoun in #195
- Option to disable the parallelization of the embedding with TP by @michaelbenayoun in #191
- Restrict generation to sampling and greedy search by @dacorvo in #201
New Contributors
- @oOraph made their first contribution in #164
Full Changelog: v0.0.9...v0.0.10