Update dependency transformers to v4.51.3 #52
This PR contains the following updates:
transformers: ==4.48.2 -> ==4.51.3
Release Notes
huggingface/transformers (transformers)
v4.51.3
v4.51.2: Patch Release 4.51.2
This is another round of bug fixes, but they are a lot more minor and outputs were not really affected!
v4.51.1: Patch release v4.51.1
Since the release of Llama 4, we have fixed a few issues that we are now releasing in patch v4.51.1
Thanks all for your patience
v4.51.0: Llama 4, Phi4-Multimodal, DeepSeek-v3, Qwen3
New Model Additions
Llama 4
Llama 4, developed by Meta, introduces a new auto-regressive Mixture-of-Experts (MoE) architecture. This generation includes two models: Llama 4 Maverick and Llama 4 Scout.
Both models leverage early fusion for native multimodality, enabling them to process text and image inputs. Maverick and Scout are both trained on up to 40 trillion tokens on data encompassing 200 languages (with specific fine-tuning support for 12 languages including Arabic, Spanish, German, and Hindi).
For deployment, Llama 4 Scout is designed for accessibility, fitting on a single server-grade GPU via on-the-fly 4-bit or 8-bit quantization, while Maverick is available in BF16 and FP8 formats. These models are released under the custom Llama 4 Community License Agreement, available on the model repositories
Getting started with Llama 4 using transformers is straightforward. Make sure you have transformers v4.51.0 or later installed.
Here's a quick example using the instruction-tuned Maverick model responding about two images, using tensor parallel for maximum speed. You need to run this script on an instance with 8 GPUs, using a launch command like the one in the sketch below.
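A minimal sketch of such a script follows, not the official example: the checkpoint id, the placeholder image URLs, and the torchrun launch command in the comments are assumptions, so check the model repos for the exact values and prompt format.

```python
# Sketch only. Assumes transformers >= 4.51.0 (pip install -U transformers) and a launch on an
# 8-GPU instance, e.g. with: torchrun --nproc-per-node=8 llama4_maverick_demo.py
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "meta-llama/Llama-4-Maverick-17B-128E-Instruct"  # assumed checkpoint id; see the model card

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    tp_plan="auto",  # tensor parallel across the GPUs made visible by the launcher
)

# Two placeholder image URLs; replace with real images.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/image_1.jpg"},
            {"type": "image", "url": "https://example.com/image_2.jpg"},
            {"type": "text", "text": "Describe the differences between these two images."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(outputs[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True)[0])
```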
Make sure to check the model cards on the repos (Llama 4 Maverick (~400B) and Llama 4 Scout (~109B)) for detailed usage instructions, including multimodal examples, specific prompt formats (like system prompts), quantization details, and advanced configuration options!
Phi4-Multimodal
Phi-4-multimodal-instruct is a lightweight open multimodal foundation model that leverages the language, vision, and speech research and datasets used for the Phi-3.5 and 4.0 models. The model processes text, image, and audio inputs, generating text outputs, and comes with a 128K token context length. The model underwent an enhancement process, incorporating supervised fine-tuning, direct preference optimization, and RLHF (Reinforcement Learning from Human Feedback) to support precise instruction adherence and safety measures. The languages that each modality supports are the following:
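As a rough illustration, a minimal load sketch is below; the dtype/device settings are assumptions, older checkpoint revisions may still require trust_remote_code=True, and the exact prompt format (including image and audio placeholders) is documented on the model card rather than reproduced here.

```python
# Minimal load sketch for Phi-4-multimodal (assumptions noted in the lead-in above).
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = "microsoft/Phi-4-multimodal-instruct"
processor = AutoProcessor.from_pretrained(model_id)  # add trust_remote_code=True if the revision needs it
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
```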
DeepSeek-v3
DeepSeek-v3 is heavily referenced in the following model-based release, and we recommend reading it if you want all the information about that model.
The model is detailed in the following paper.
Overview
The DeepSeek-V3 model was proposed in DeepSeek-V3 Technical Report by DeepSeek-AI Team.
The abstract from the paper is the following:
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at https://github.com/deepseek-ai/DeepSeek-V3.
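Since the full 671B-parameter checkpoint is impractical to load casually, the small sketch below only inspects the architecture through its configuration; the repo id and the expert-count attribute name are assumptions, so fall back to the model documentation if they differ.

```python
# Inspect the DeepSeek-V3 architecture without downloading the (very large) weights.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-V3")  # assumed repo id
print(config.model_type)                          # expected to resolve natively in v4.51+
print(config.num_hidden_layers)                   # depth of the decoder stack
print(getattr(config, "n_routed_experts", None))  # MoE expert count, if the attribute is present
```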
Qwen3
The Qwen3 architecture has been contributed to transformers and is available in v4.51.0. At the time of release, the models themselves have not yet been released - stay tuned for a release from the Qwen team!
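Because the architecture ships ahead of any official checkpoints, the sketch below simply instantiates a tiny, randomly initialized Qwen3 from a config to exercise the new classes; all the hyperparameter values are arbitrary assumptions.

```python
# Tiny, randomly initialized Qwen3 built from a config (no pretrained weights exist yet).
from transformers import Qwen3Config, Qwen3ForCausalLM

config = Qwen3Config(
    vocab_size=32000,
    hidden_size=512,
    intermediate_size=1024,
    num_hidden_layers=4,
    num_attention_heads=8,
    num_key_value_heads=4,
)
model = Qwen3ForCausalLM(config)
print(f"parameters: {sum(p.numel() for p in model.parameters()):,}")
```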
Documentation
Model docs are getting a significant overhaul, providing much-needed, ready-to-use examples one can copy and paste into their modules or consoles. We will adapt these examples to each model, with the goal of providing relevant examples on a per-model basis.
Significant model improvements
A very large PR provided by @nikosanto13 added modular files to all speech models in the library; seeing the differences between them is now much simpler, as are maintenance and eventual refactors.
Bugfixes and improvements
original_max_position_embeddings to YARN rope_scaling optional keys by @JustinTong0323 in #36877
trainer_pt_utils.py docstrings for consistency by @ethanknights in #36912
DataCollatorForWholeWordMask by @capemox in #36903
uv for installing packages by @Sai-Suraj-27 in #36957
networkx==3.2.1 manually in some CircleCI jobs after #36957 by @ydshieh in #37000
to_py_obj for python-native numeric lists and scalars by @n0gu-furiosa in #36885
qwen2_vl.md to Korean by @MinJu-Ha in #36750
AwqConfigTest by @faaany in #37032
test_assisted_decoding_in_different_gpu test on XPU by @yao-matrix in #37120
_VALID_DICT_FIELDS to class attribute for shared dict parsing in subclasses by @Tavish9 in #36736
[ModernBERT] Never save 'reference_compile' config; should be set based on end user by @tomaarsen in #36305
307 in RequestCounter by @ydshieh in #36953
TASK_MAPPING by @saattrupdan in #37107
min_new_tokens to prevent flaky length checks by @gante in #37175
num_items_in_batch if necessary by @regisss in #36967
utils/check_bad_commit.py by @ydshieh in #37272
return_tensors in audio chat templates by @zucchini-nlp in #34601
0.11.2 by @ydshieh in #36962
lru_cache for tokenization tests by @ydshieh in #36818
return_dict logic to remove complicated if/else paths by @qubvel in #36794
Significant community contributions
The following contributors have made significant changes to the library over the last release:
v4.50.3: Patch release v4.50.3
Thanks to the vLLM team, we caught a few more bugs that had slipped in!
[generate] beam search -- fix output cropping (#37080) by @gante
[blip-2] Fix dtype mismatch when keep in fp32 (#37068) by @zucchini-nlp
Fix PixtralProcessor patch_size when spatial_merge_size is used (#37019)
v4.50.2: Patch release v4.50.2
I completely forgot to put these in the previous patch, sorry!
This should put the transformers backend in a good spot!
[Utils] torch version checks optionally accept dev versions (#36847) by @gante
Fix processor kwargs qwen2 vl (#36890) by @yonigozlan
Fix Pan and Scan on batched images Gemma3 (#36864) by @yonigozlan
v4.50.1: Patch release v4.50.1
There were some very minor bugs with the new hub kernels and with remote code that we had to fix.
Deprecate #36741 and map Causal to Conditional (#36917) by @zucchini-nlp
Fix pytorch deform attn path (#36923) by @qubvel
[chameleon] fix num image token check (#36918) by @zucchini-nlp
Fix torch version guard at import (#36907) by @zucchini-nlp
v4.50.0: Release v4.50.0
New Model Additions
Model-based releases
Starting with version v4.49.0, we have been doing model-based releases in addition to our traditional, software-based monthly releases. These model-based releases provide a tag from which models may be installed.
Contrary to our software releases, these are not pushed to PyPI and are kept on our GitHub. Each release has a tag attributed to it, such as:
v4.49.0-Gemma-3
v4.49.0-AyaVision
Each new model release will always be based on the current state of the main branch at the time of its creation. This ensures that new models start with the latest features and fixes available.
For example, if two models (Gemma-3 and AyaVision) are released from main, and a fix for Gemma-3 is then merged, it will look something like this:
We strive to merge model specific fixes on their respective branches as fast as possible!
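As a sketch of how one of these tags can be consumed before the next PyPI release (using the v4.49.0-Gemma-3 tag listed above; the exact tag depends on the model you want):

```python
# Model-based releases live on GitHub only, so install transformers from the corresponding tag, e.g.:
#   pip install git+https://github.com/huggingface/transformers@v4.49.0-Gemma-3
# Afterwards, the installed version can be checked from Python:
import transformers

print(transformers.__version__)  # should correspond to the 4.49.0-based tag build
```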
Gemma 3
Gemma 3 is heavily referenced in the following model-based release, and we recommend reading it if you want all the information about that model.
The Gemma 3 model was proposed by Google. It is a vision-language model composed of a SigLIP vision encoder and a Gemma 2 language decoder, linked by a multimodal linear projection.
It cuts an image into a fixed number of tokens, in the same way as SigLIP, as long as the image does not exceed a certain aspect ratio. For images that exceed the given aspect ratio, it crops the image into multiple smaller patches and concatenates them with the base image embedding.
One particularity is that the model uses bidirectional attention on all the image tokens. The model also interleaves sliding-window local attention with full causal attention in the language backbone, where every sixth layer is a full causal attention layer.
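For a quick feel of the model, a hedged sketch using the image-text-to-text pipeline is below; the checkpoint id and image URL are placeholders/assumptions, and gated Gemma checkpoints may require accepting the license and logging in to the Hub first.

```python
# Hedged sketch: Gemma 3 through the image-text-to-text pipeline.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3-4b-it",  # assumed instruction-tuned checkpoint id
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/some_image.png"},  # placeholder image
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

out = pipe(text=messages, max_new_tokens=40, return_full_text=False)
print(out[0]["generated_text"])
```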
Shield Gemma2
ShieldGemma 2, built on Gemma 3, is a 4 billion (4B) parameter model that checks the safety of both synthetic and natural images against key categories to help you build robust datasets and models. With this addition to the Gemma family of models, researchers and developers can now easily minimize the risk of harmful content in their models across key areas of harm as defined below:
We recommend using ShieldGemma 2 as an input filter to vision language models, or as an output filter of image generation systems. To train a robust image safety model, we curated training datasets of natural and synthetic images and instruction-tuned Gemma 3 to demonstrate strong performance.
Aya Vision
AyaVision is heavily referenced in the following model-based release, and we recommend reading it if you want all the information about that model.
The Aya Vision 8B and 32B models are state-of-the-art multilingual multimodal models developed by Cohere For AI. They build on the Aya Expanse recipe to handle both visual and textual information without compromising the strong multilingual textual performance of the original model.
Aya Vision 8B combines the Siglip2-so400-384-14 vision encoder with the Cohere CommandR-7B language model, further post-trained with the Aya Expanse recipe, creating a powerful vision-language model capable of understanding images and generating text across 23 languages. Aya Vision 32B, meanwhile, uses Aya Expanse 32B as the language model.
Key features of Aya Vision include:
Mistral 3.1
Mistral 3.1 is heavily referenced in the following model-based release, and we recommend reading it if you want all the information about that model.
Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance. With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks.
It is ideal for:
Smol VLM 2
SmolVLM-2 is heavily referenced in the following model-based release, and we recommend reading it if you want all the information about that model.
SmolVLM2 is an adaptation of the Idefics3 model with two main differences:
SigLIP-2
SigLIP-2 is heavily referenced in the following model-based release, and we recommend reading it if you want all the information about that model.
The SigLIP2 model was proposed in SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features by Michael Tschannen, Alexey Gritsenko, Xiao Wang, Muhammad Ferjad Naeem, Ibrahim Alabdulmohsin,
Nikhil Parthasarathy, Talfan Evans, Lucas Beyer, Ye Xia, Basil Mustafa, Olivier Hénaff, Jeremiah Harmsen,
Andreas Steiner and Xiaohua Zhai.
The model comes in two variants.
Prompt Depth Anything
PromptDepthAnything is a high-resolution, accurate metric depth estimation model that leverages prompting, inspired by its success in vision-language models (VLMs) and large language models (LLMs). Using iPhone LiDAR as a prompt, the model generates precise depth maps at up to 4K resolution, unlocking the potential of depth foundation models.
New tool: attention visualization
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.