Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 33 additions & 9 deletions cpp/src/arrow/compute/util.cc
Original file line number Diff line number Diff line change
Expand Up @@ -30,10 +30,6 @@ namespace util {
namespace bit_util {

inline uint64_t SafeLoadUpTo8Bytes(const uint8_t* bytes, int num_bytes) {
// This will not be correct on big-endian architectures.
#if !ARROW_LITTLE_ENDIAN
ARROW_DCHECK(false);
#endif
ARROW_DCHECK(num_bytes >= 0 && num_bytes <= 8);
if (num_bytes == 8) {
return util::SafeLoad(reinterpret_cast<const uint64_t*>(bytes));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work on big-endian system?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out this.. Now with the way we are handling the tail_bytes and loading the word data, we dont actually need to change "SafeLoadUpTo8Bytes()" function.. With the conditional compilation, this function will never be called on Big-endian architecture.
I have reverted this change.. Tested completely on s390x to see if all the test work. I have pushed a new commit. Please give your review comments. Thanks.

Expand All @@ -47,17 +43,20 @@ inline uint64_t SafeLoadUpTo8Bytes(const uint8_t* bytes, int num_bytes) {
}

inline void SafeStoreUpTo8Bytes(uint8_t* bytes, int num_bytes, uint64_t value) {
// This will not be correct on big-endian architectures.
#if !ARROW_LITTLE_ENDIAN
ARROW_DCHECK(false);
#endif
ARROW_DCHECK(num_bytes >= 0 && num_bytes <= 8);
if (num_bytes == 8) {
util::SafeStore(reinterpret_cast<uint64_t*>(bytes), value);
} else {
#if ARROW_LITTLE_ENDIAN
for (int i = 0; i < num_bytes; ++i) {
bytes[i] = static_cast<uint8_t>(value >> (8 * i));
}
#else
// Big-endian: most significant byte first
for (int i = 0; i < num_bytes; ++i) {
bytes[i] = static_cast<uint8_t>(value >> (8 * (num_bytes - 1 - i)));
}
#endif
}
}

Expand Down Expand Up @@ -118,7 +117,22 @@ void bits_to_indexes_internal(int64_t hardware_flags, const int num_bits,
// Optionally process the last partial word with masking out bits outside range
if (tail) {
const uint8_t* bits_tail = bits + (num_bits - tail) / 8;
#if ARROW_LITTLE_ENDIAN
uint64_t word = SafeLoadUpTo8Bytes(bits_tail, (tail + 7) / 8);
#else
int tail_bytes = (tail + 7) / 8;
uint64_t word;
if (tail_bytes == 8) {
word = util::SafeLoad(reinterpret_cast<const uint64_t*>(bits_tail));
} else {
// For bit manipulation, always load into least significant bits
// to ensure compatibility with CountTrailingZeros on Big-endian systems
word = 0;
for (int i = 0; i < tail_bytes; ++i) {
word |= static_cast<uint64_t>(bits_tail[i]) << (8 * i);
}
}
#endif
Comment on lines +120 to +135
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this?

The SafeLoadUpTo8Bytes() change adds support for big-endian, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I have removed the big-endian support to SafeLoadUpTo8Bytes() function, these changes are required as these handle the way we handle the tail_bytes on big-endian systems. If the tail_bytes are equal to 8, then we call directly the SafeLoad to load the data onto "word" variable. And for rest other cases, we need to take care of loading least significant bits to ensure compatibility with "CountTrailingZeros". This is the reason why we wont be able to make a direct call "SafeLoadUpTo8Bytes()" for every tail_bytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have fixed the lint errors and pushed my changes. Thanks..

if (bit_to_search == 0) {
word = ~word;
}
Expand Down Expand Up @@ -299,7 +313,17 @@ void bytes_to_bits(int64_t hardware_flags, const int num_bits, const uint8_t* by
}
int tail = num_bits % unroll;
if (tail) {
uint64_t bytes_next = SafeLoadUpTo8Bytes(bytes + num_bits - tail, tail);
uint64_t bytes_next;
#if ARROW_LITTLE_ENDIAN
bytes_next = SafeLoadUpTo8Bytes(bytes + num_bits - tail, tail);
#else
// On Big-endian systems, for bytes_to_bits, load all tail bytes in little-endian
// order to ensure compatibility with subsequent bit operations
bytes_next = 0;
for (int i = 0; i < tail; ++i) {
bytes_next |= static_cast<uint64_t>((bytes + num_bits - tail)[i]) << (8 * i);
}
#endif
Comment on lines +316 to +326
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we revert this change with the latest SafeLoadUpTo8Bytes() (that has big endian support)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kou.. I tried doing that but the testcase failed on s390x.. We need the "bytes_next |= static_cast<uint64_t>((bytes + num_bits - tail)[i]) << (8 * i);" on big-endian, which we dont get it when we directly call the SafeLoadUpTo8Bytes()..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you share code you tried?

#48180 (review) includes word |= static_cast<uint64_t>(bytes[num_bytes - 1 - i]) << (8 * i);.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kou.. The SafeLoadUpTo8Bytes() function remains unchanged..

inline uint64_t SafeLoadUpTo8Bytes(const uint8_t* bytes, int num_bytes) {
  ARROW_DCHECK(num_bytes >= 0 && num_bytes <= 8);
  if (num_bytes == 8) {
    return util::SafeLoad(reinterpret_cast<const uint64_t*>(bytes));
  } else {
    uint64_t word = 0;
    for (int i = 0; i < num_bytes; ++i) {
      word |= static_cast<uint64_t>(bytes[i]) << (8 * i);
    }
    return word;
  }
}

The bytes_to_bits() function is handling the endianness fix independently.. If I do the endianness conversion inside SafeLoadUpTo8Bytes() function, rather than here, then the testcase is not working..

if (tail) {
   uint64_t bytes_next;
#if ARROW_LITTLE_ENDIAN
   bytes_next = SafeLoadUpTo8Bytes(bytes + num_bits - tail, tail);
#else
   // On Big-endian systems, for bytes_to_bits, load all tail bytes in little-endian
   // order to ensure compatibility with subsequent bit operations
   bytes_next = 0;
   for (int i = 0; i < tail; ++i) {
     bytes_next |= static_cast<uint64_t>((bytes + num_bits - tail)[i]) << (8 * i);
   }
#endif
   bytes_next &= 0x0101010101010101ULL;
   bytes_next |= (bytes_next >> 7);  // Pairs of adjacent output bits in individual bytes
   bytes_next |= (bytes_next >> 14);  // 4 adjacent output bits in individual bytes
   bytes_next |= (bytes_next >> 28);  // All 8 output bits in the lowest byte
   bits[num_bits / 8] = static_cast<uint8_t>(bytes_next & 0xff);
 }

bytes_next &= 0x0101010101010101ULL;
bytes_next |= (bytes_next >> 7); // Pairs of adjacent output bits in individual bytes
bytes_next |= (bytes_next >> 14); // 4 adjacent output bits in individual bytes
Expand Down
Loading