
as_chunks leads to unnecessary loop unrolling #150647

@Evian-Zhang

Description


I tried this code:

fn foo(
    indices: &[usize],
    values: &mut [u8],
) {
    let (_, rem) = indices.as_chunks::<4>();

    for index in rem {
        let value = unsafe { values.get_unchecked_mut(*index) };
        *value = value.wrapping_add(1);
    }
}

I expected to see this happen: the for loop iterates at most 3 times, because rem holds the indices.len() % 4 elements left over after the full 4-element chunks, so its length cannot exceed 3.

Instead, this happened:

The assembly produced by a release build (tested with both the stable and nightly compilers):

foo:
	lea	ecx, [8*rsi]
	and	ecx, 24
	je	.LBB0_7
	movabs	rax, 2305843009213693948
	and	rsi, rax
	lea	rax, [rdi + 8*rsi]
	lea	rdi, [rcx - 8]
	mov	r8d, edi
	not	r8d
	mov	rsi, rax
	test	r8b, 24
	je	.LBB0_4
	mov	r8d, edi
	shr	r8d, 3
	inc	r8d
	and	r8d, 3
	mov	rsi, rax

.LBB0_3:
	mov	r9, qword ptr [rsi]
	add	rsi, 8
	inc	byte ptr [rdx + r9]
	dec	r8
	jne	.LBB0_3

.LBB0_4:
	cmp	rdi, 24
	jb	.LBB0_7
	add	rax, rcx

.LBB0_6:
	mov	rcx, qword ptr [rsi]
	inc	byte ptr [rdx + rcx]
	mov	rcx, qword ptr [rsi + 8]
	inc	byte ptr [rdx + rcx]
	mov	rcx, qword ptr [rsi + 16]
	inc	byte ptr [rdx + rcx]
	mov	rcx, qword ptr [rsi + 24]
	inc	byte ptr [rdx + rcx]
	add	rsi, 32
	cmp	rsi, rax
	jne	.LBB0_6

.LBB0_7:
	ret
Detailed explanation of the assembly

On entry, rsi holds the length of indices. The first two instructions

	lea	ecx, [8*rsi]  ; equivalent to shifting rsi left by 3 bits
	and	ecx, 24       ; 24 is 0b0001_1000

keep only bits 3 and 4 of 8 * rsi, which are the lowest two bits of rsi. Since writing ecx zero-extends into rcx, rcx takes one of the following values:

  • 0 if the lowest two bits of rsi are 00
  • 8 if the lowest two bits of rsi are 01
  • 16 if the lowest two bits of rsi are 10
  • 24 if the lowest two bits of rsi are 11

If rcx is 0, je .LBB0_7 returns immediately, so on the remaining path rcx can only be 8, 16, or 24. In other words, rcx = 8 * (len % 4), the byte length of rem.
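The masking arithmetic can be replayed in Rust to confirm that rcx always equals 8 * (len % 4). The function name rcx_after_mask is hypothetical; it models lea ecx, [8*rsi] as a 32-bit wrapping multiply because the destination is the 32-bit register ecx.

```rust
/// Hypothetical model of `lea ecx, [8*rsi]` followed by `and ecx, 24`.
/// Writing the 32-bit `ecx` keeps only the low 32 bits of `8 * rsi`
/// (zero-extended into `rcx`), so a wrapping 32-bit multiply matches it.
fn rcx_after_mask(len: u64) -> u64 {
    let ecx = (len as u32).wrapping_mul(8); // lea ecx, [8*rsi]
    (ecx & 24) as u64 // and ecx, 24 (24 == 0b0001_1000)
}
```

Only bits 0 and 1 of len survive the shift and mask, which is len % 4 scaled by the 8-byte size of a usize.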

lea rdi, [rcx - 8] then sets rdi to 0, 8, or 16.

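Consequently, at .LBB0_4 the comparison cmp rdi, 24 never finds rdi greater than or equal to 24, so jb .LBB0_7 is always taken and the 4x-unrolled loop at .LBB0_6 never executes. All 1 to 3 remainder elements are handled by the one-element loop at .LBB0_3.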
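To check the whole argument at once, the sketch below replays the integer arithmetic of the compiled foo for a given indices.len() and reports how many times each loop would run. It is a hypothetical model written from the assembly above, not something rustc or LLVM provides, and the names simulate and LoopCounts are illustrative. For every length, the one-element loop at .LBB0_3 runs len % 4 times and the unrolled loop at .LBB0_6 runs zero times.

```rust
/// Trip counts of the two loops in the compiled `foo` (hypothetical model).
#[derive(Debug, PartialEq)]
struct LoopCounts {
    single: u64,   // iterations of the one-element loop at .LBB0_3
    unrolled: u64, // iterations of the 4x-unrolled loop at .LBB0_6
}

/// Replays the integer arithmetic of `foo` for `len = indices.len()`.
fn simulate(len: u64) -> LoopCounts {
    // lea ecx, [8*rsi]; and ecx, 24; je .LBB0_7
    let rcx = ((len as u32).wrapping_mul(8) & 24) as u64;
    if rcx == 0 {
        return LoopCounts { single: 0, unrolled: 0 };
    }
    // lea rdi, [rcx - 8]  (rcx >= 8 on this path, so no wrap-around)
    let rdi = rcx - 8;

    // mov r8d, edi; not r8d; test r8b, 24; je .LBB0_4
    let single = if (((!(rdi as u32)) as u8) & 24) != 0 {
        // shr r8d, 3; inc r8d; and r8d, 3 -> trip count of .LBB0_3
        ((((rdi as u32) >> 3) + 1) & 3) as u64
    } else {
        0
    };

    // cmp rdi, 24; jb .LBB0_7
    let unrolled = if rdi < 24 {
        0
    } else {
        // Never reached here: .LBB0_6 would walk from rax + 8*single
        // to rax + rcx in 32-byte steps.
        (rcx - 8 * single) / 32
    };
    LoopCounts { single, unrolled }
}
```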

I tried to figure out where this unnecessary unrolling comes from, and it turns out that LLVM's LoopUnrollPass generates it (compiler explorer). I'm not sure whether this is an LLVM bug or whether rustc fails to give LLVM enough information, such as the fact that rem.len() is less than 4.

Meta

rustc --version --verbose:

rustc 1.92.0 (ded5c06cf 2025-12-08)
binary: rustc
commit-hash: ded5c06cf21d2b93bffd5d884aa6e96934ee4234
commit-date: 2025-12-08
host: x86_64-unknown-linux-gnu
release: 1.92.0
LLVM version: 21.1.3

Metadata


Labels

  • A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
  • C-bug: Category: This is a bug.
  • C-optimization: Category: An issue highlighting optimization opportunities or PRs implementing such
  • I-heavy: Issue: Problems and improvements with respect to binary size of generated code.
  • I-slow: Issue: Problems and improvements with respect to performance of generated code.
  • needs-triage: This issue may need triage. Remove it if it has been sufficiently triaged.
