
Fix Qwen3 when --dtype float16 on CPU / MPS #653


Closed

alvarobartt wants to merge 2 commits from the fix-cpu-metal-f16-qwen3 branch

Conversation

alvarobartt
Member

What does this PR do?

This PR is a follow-up fix to #648: float16 inference for Qwen3 models was broken because the attention_bias tensor's dtype defaulted to float32. This PR adds a dtype field to Qwen3Model so that the requested dtype is inherited.

As a side fix, this PR also removes the to_device call after Tensor::zeros_like, since it's not required: the resulting tensor is placed on the same device as the input tensor by default.
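For illustration, here is a minimal sketch of both changes, assuming candle's Tensor API; the trimmed-down struct and the build_attention_bias helper are hypothetical stand-ins, not the actual patch:

```rust
use candle::{DType, Device, Result, Tensor};

// Trimmed-down stand-in for the real model struct: the fix is to store the
// requested dtype (e.g. F16) next to the device so that tensors built at
// inference time inherit it instead of defaulting to F32.
struct Qwen3Model {
    device: Device,
    dtype: DType, // field added by this PR
}

impl Qwen3Model {
    // Hypothetical helper: build the additive attention bias in the model
    // dtype from the start.
    fn build_attention_bias(&self, bs: usize, seq_len: usize) -> Result<Tensor> {
        let attention_bias = Tensor::zeros((bs, 1, 1, seq_len), self.dtype, &self.device)?;
        // zeros_like already matches the device *and* dtype of its input,
        // so no to_device (or to_dtype) round-trip is needed afterwards.
        let zeros = attention_bias.zeros_like()?;
        debug_assert_eq!(zeros.dtype(), self.dtype);
        attention_bias.broadcast_add(&zeros)
    }
}
```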

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@Narsil or @kozistr

@alvarobartt force-pushed the fix-cpu-metal-f16-qwen3 branch from 0856372 to 30f9d1f on June 24, 2025 at 11:43
@Narsil
Collaborator

Narsil commented Jun 24, 2025

```diff
index 989968d..cfe056c 100644
--- a/backends/candle/src/models/qwen3.rs
+++ b/backends/candle/src/models/qwen3.rs
@@ -454,13 +454,14 @@ impl Qwen3Model {
         let causal_mask = Tensor::from_slice(&mask, (seq_len, seq_len), &Device::Cpu)?;
         let causal_mask = causal_mask.expand(&[bs, dim, seq_len, seq_len])?;
+        let device = attention_bias.device();
         let negatives =
             Tensor::full(f32::MIN, attention_bias.shape(), &Device::Cpu)?.to_dtype(self.dtype)?;
-        let zeros = Tensor::zeros_like(&attention_bias)?.to_dtype(self.dtype)?;
+        let zeros = Tensor::zeros_like(&attention_bias)?.to_device(&Device::Cpu)?.to_dtype(self.dtype)?;
         let causal_mask = causal_mask
             .where_cond(&negatives, &zeros)?
-            .to_device(&self.device)?;
+            .to_device(device)?;
         attention_bias.broadcast_add(&causal_mask)
     }
```

This one is working for me. The issue here is a device mismatch: the zeros inherit the device of attention_bias (e.g., Metal), while the negatives and the causal mask are on the CPU, so where_cond mixes devices. The fix builds everything on the CPU and moves the finished mask to the target device afterwards.
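To make the device handling concrete, here is a self-contained sketch of the pattern in the diff above, under stated assumptions: Device::Cpu stands in for the accelerator device that attention_bias would live on in the real failure case, and the shapes and mask construction are made up for the example:

```rust
use candle::{DType, Device, Tensor};

fn main() -> candle::Result<()> {
    let (bs, dim, seq_len) = (1, 1, 4);
    // Stand-in for attention_bias on the target device (Metal/CUDA in the
    // real failure case; CPU here so the example runs anywhere).
    let device = Device::Cpu;
    let attention_bias = Tensor::zeros((bs, dim, seq_len, seq_len), DType::F16, &device)?;

    // Mask out future positions (1 above the diagonal), built on the CPU.
    let mask: Vec<u8> = (0..seq_len * seq_len)
        .map(|i| (i % seq_len > i / seq_len) as u8)
        .collect();
    let causal_mask = Tensor::from_slice(&mask, (seq_len, seq_len), &Device::Cpu)?
        .expand(&[bs, dim, seq_len, seq_len])?;

    // All three where_cond operands must share a device: negatives and zeros
    // are therefore created on (or moved to) the CPU as well.
    let negatives =
        Tensor::full(f32::MIN, attention_bias.shape(), &Device::Cpu)?.to_dtype(DType::F16)?;
    let zeros = attention_bias.zeros_like()?.to_device(&Device::Cpu)?.to_dtype(DType::F16)?;

    // Select, then move the finished mask to the target device in one hop.
    let causal_mask = causal_mask
        .where_cond(&negatives, &zeros)?
        .to_device(&device)?;
    let biased = attention_bias.broadcast_add(&causal_mask)?;
    println!("{biased}");
    Ok(())
}
```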

@kozistr
Contributor

kozistr commented Jun 25, 2025

Thanks for catching this! I missed that part, sorry about that.

@alvarobartt
Member Author

Closed in favor of #654.
