Conditions using bitwise OR on booleans may produce de-optimized code #32414

Closed
@mzabaluev

When writing optimization-friendly code, it can seem like a good idea to unroll branches by hand: perform a multi-step computation optimistically and record the possible failure of each intermediate step as a boolean. These failure flags are then combined with bitwise logic when the validity of the result is finally checked. Here's an example of how it might work with overflow-checked arithmetic, inspired by some code in std::hash:

fn calculate_size(elem_size: usize,
                  length: usize,
                  offset: usize)
                  -> Option<usize> {
    let (acc, oflo1) = elem_size.overflowing_mul(length);
    let (acc, oflo2) = acc.overflowing_add(offset);
    if oflo1 | oflo2 {
        None
    } else {
        Some(acc)
    }
}
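
For comparison, the branching form that this pattern tries to avoid can be sketched with the standard `checked_mul` and `checked_add` methods. The name `calculate_size_checked` is hypothetical and used only for illustration. It should return the same results as `calculate_size`, but conceptually it tests for failure after each step instead of once at the end.

```rust
// Hypothetical baseline for comparison (not part of the original report):
// each checked step is tested on its own, so there are conceptually two
// failure checks instead of one combined check at the end.
fn calculate_size_checked(elem_size: usize,
                          length: usize,
                          offset: usize)
                          -> Option<usize> {
    // `checked_mul` yields None on overflow; `and_then` then skips the addition.
    elem_size.checked_mul(length)
             .and_then(|acc| acc.checked_add(offset))
}
```

The flag-based version aims to replace these per-step checks with a single test of the combined flags.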

However, in optimized builds, at least on x86-64, the bitwise OR of booleans is sometimes decomposed into a series of checks and branches, defeating the whole purpose. Here's a condensed benchmark comparing boolean OR with integer bitwise OR, where the result of each is used as the condition for branching.

#![feature(test)]

extern crate test;

use test::Bencher;

#[inline(never)]
fn or_bools(a: bool, b: bool, c: bool) -> Option<u64> {
    if a | b | c { Some(1) } else { None }
}

#[inline(never)]
fn or_bytes(a: u8, b: u8, c: u8) -> Option<u64> {
    if (a | b | c) != 0 { Some(1) } else { None }
}

#[bench]
fn bench_or_bools(b: &mut Bencher) {
    const DATA: [(bool, bool, bool); 4]
              = [(false, false, false),
                 (true , false, false),
                 (false, true , false),
                 (false, false, true)];
    b.iter(|| {
        for i in 0 .. 4 {
            let (a, b, c) = DATA[i];
            test::black_box(or_bools(a, b, c));
        }
    })
}

#[bench]
fn bench_or_bytes(b: &mut Bencher) {
    const DATA: [(u8, u8, u8); 4]
              = [(0u8, 0u8, 0u8),
                 (1u8, 0u8, 0u8),
                 (0u8, 1u8, 0u8),
                 (0u8, 0u8, 1u8)];
    b.iter(|| {
        for i in 0 .. 4 {
            let (a, b, c) = DATA[i];
            test::black_box(or_bytes(a, b, c));
        }
    })
}
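
Since the benchmark contrasts boolean OR with integer OR, one possible workaround for the earlier `calculate_size` example is to widen the overflow flags to `u8` before combining them, mirroring `or_bytes`. This is only a sketch: it assumes the integer form is not decomposed into branches, which is not verified here and may depend on the compiler version and target. The name `calculate_size_bytes` is hypothetical.

```rust
// Hypothetical variant of `calculate_size` (an assumption, not a confirmed fix):
// the overflow flags are combined as integers, like `or_bytes` in the benchmark.
fn calculate_size_bytes(elem_size: usize,
                        length: usize,
                        offset: usize)
                        -> Option<usize> {
    let (acc, oflo1) = elem_size.overflowing_mul(length);
    let (acc, oflo2) = acc.overflowing_add(offset);
    // `bool as u8` maps false to 0 and true to 1, so the OR is nonzero
    // exactly when at least one step overflowed.
    if ((oflo1 as u8) | (oflo2 as u8)) != 0 {
        None
    } else {
        Some(acc)
    }
}
```

The results are the same as for the boolean version; whether the generated code actually differs would need to be checked in the assembly output for the toolchain in use.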

The de-optimization looks like the work of LLVM, as the IR for or_bools preserves the original intent:

; Function Attrs: noinline norecurse nounwind uwtable
define internal fastcc void @_ZN8or_bools20h51bacbaed15b22f4gaaE(%"2.core::option::Option<u64>"* noalias nocapture dereferenceable(16), i1 zeroext, i1 zeroext, i1 zeroext) unnamed_addr #0 {
entry-block:
  %4 = or i1 %1, %2
  %5 = or i1 %4, %3
  %6 = bitcast %"2.core::option::Option<u64>"* %0 to i8*
  br i1 %5, label %then-block-26-, label %else-block

then-block-26-:                                   ; preds = %entry-block
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %6, i8* nonnull bitcast ({ i64, i64, [0 x i8] }* @const5784 to i8*), i64 16, i32 8, i1 false)
  br label %join

else-block:                                       ; preds = %entry-block
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %6, i8* nonnull bitcast ({ i64, [8 x i8] }* @const5785 to i8*), i64 16, i32 8, i1 false)
  br label %join

join:                                             ; preds = %else-block, %then-block-26-
  ret void
}


Labels: I-slow (Issue: Problems and improvements with respect to performance of generated code.)
