@poshul (Contributor) commented Dec 1, 2025

Summary by CodeRabbit

  • New Features

    • Arrow library is now a selectable build option and is built cross‑platform with enhanced logging and clearer failure reporting.
  • Chores

    • Archive checksum validation migrated from SHA1 to SHA256 for contributed libraries.
    • Archive/source metadata and download references updated to reflect new checksum format and adjusted download locations.

coderabbitai bot commented Dec 1, 2025

Walkthrough

Adds Apache Arrow as a selectable contrib build target with a new OPENMS_CONTRIB_BUILD_ARROW CMake macro, and replaces SHA1-based archive checksum handling with SHA256 across contrib download/validation logic.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Build configuration: CMakeLists.txt | Adds ARROW to BUILD_TYPE / VALID_BUILD_TYPES; declares ARROW_DIR, ARCHIVE_ARROW, ARCHIVE_ARROW_TAR, ARCHIVE_ARROW_SHA256, INCLUDE_DIR_ARROW; includes libraries.cmake/arrow.cmake; integrates ARROW build/clean/copy steps into the main build loop. |
| Arrow build automation: libraries.cmake/arrow.cmake | New macro OPENMS_CONTRIB_BUILD_ARROW that extracts the Arrow archive and runs platform-aware configure/build flows: MSVC (CMake configure + Debug/Release builds) and Linux/macOS (CMake configure with PIC/APPLE flags + Release/Install). Logs output and fails on errors. |
| Archive validation & download macros: macros.cmake | Replaces SHA1-based validation/download checks with SHA256 variants (_target_SHA256 / _downloaded_SHA256), updates sentinel/empty-archive checks and mismatch messages, and adjusts the base download URL path. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Files needing extra attention:
    • libraries.cmake/arrow.cmake — verify CMake arguments, platform branches, RESULT_VARIABLE usage, build target names, and log-file handling.
    • CMakeLists.txt — confirm variable names, inclusion order, and ARROW integration in the main build loop.
    • macros.cmake — ensure all SHA1→SHA256 substitutions are complete and empty-sentinel handling and URL change are correct.

Poem

🐇
I hopped through CMake fields at dawn,
Carried Arrow home before the lawn,
SHA256 tucked in my paw,
Logs that hum and builds that claw,
A rabbit smiles — the build's not gone. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately captures both main changes: adding Arrow to the contrib build system and migrating from SHA1 to SHA256 validation across multiple libraries. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f942b8e and a551a9c.

📒 Files selected for processing (1)
  • libraries.cmake/arrow.cmake (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build (ubuntu-22.04-arm)
  • GitHub Check: build (windows-latest)
  • GitHub Check: build (ubuntu-latest)
🔇 Additional comments (4)
libraries.cmake/arrow.cmake (4)

5-13: LGTM!

Macro definition and archive extraction setup are correctly structured with platform-appropriate extraction arguments.


16-76: LGTM!

The MSVC build path is well-structured with proper error handling at each stage (configure, debug build, release build). Log output is captured and written to file, and failures are correctly reported via FATAL_ERROR.


98-98: Consider also setting CMAKE_CXX_FLAGS for Arrow's C++ sources.

Arrow is primarily a C++ library. Setting only CMAKE_C_FLAGS means the -fPIC flag won't apply to C++ compilation units, which could cause linking issues when building shared libraries.

                           -D CMAKE_C_FLAGS=${ARROW_CFLAGS}
+                          -D CMAKE_CXX_FLAGS=${ARROW_CFLAGS}

Alternatively, verify that Arrow's CMake configuration internally propagates position-independent code settings or that shared library builds work correctly without this change.


111-138: LGTM!

Error handling for the configure and build steps is properly implemented, with detailed output on failure and appropriate logging to the log file.
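
The pattern praised here (run a step, keep its output in a log file, abort with the captured output on failure) can be sketched outside CMake as well. The following is a shell analogue only, not the code from arrow.cmake; the helper name `run_step` and the log file are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not part of the PR): run one build step, append its
# output to a log file, and report a clear error if the step fails.
run_step() {
  local description="$1" logfile="$2"
  shift 2
  local output status
  # Capture stdout and stderr together; keep the exit code of the step.
  output="$("$@" 2>&1)" && status=0 || status=$?
  printf '%s\n%s\n' "== ${description} ==" "${output}" >> "${logfile}"
  if [ "${status}" -ne 0 ]; then
    printf 'ERROR: %s failed (exit code %s)\n%s\n' \
      "${description}" "${status}" "${output}" >&2
    return "${status}"
  fi
  printf '%s .. done\n' "${description}"
}

# Demo with a harmless command standing in for a real configure call.
log_file="$(mktemp)"
run_step "Configuring (demo)" "${log_file}" echo "configure output"
```

In the CMake macro the same roles would be played by the result and output variables of the step and a fatal error message; the sketch only illustrates the control flow.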


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between a63c22f and 4728615.

📒 Files selected for processing (3)
  • CMakeLists.txt (5 hunks)
  • libraries.cmake/arrow.cmake (1 hunks)
  • macros.cmake (3 hunks)
🔇 Additional comments (4)
macros.cmake (1)

114-120: LGTM!

The SHA256 migration in download_contrib_archive is correctly implemented. The empty file SHA256 hash (e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855) is correct, and variable naming is consistent.

CMakeLists.txt (2)

624-631: LGTM!

The Arrow build section correctly follows the established pattern used by other libraries (COINOR, EIGEN, HDF5, etc.) with OPENMS_CLEAN_LIB, OPENMS_CONTRIB_BUILD_ARROW, and OPENMS_COPY_LIBS.


217-257: LGTM!

The SHA256 hash migration for all existing libraries is correctly implemented. The hash values are the expected 64-character hexadecimal strings consistent with SHA256 format.
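
A quick format check like the one described can be done by hand before committing new digests. This is a minimal sketch; the function name `is_sha256_hex` is hypothetical, and it accepts only lowercase digits on the assumption that the expected values are stored in lowercase (the STREQUAL comparison in macros.cmake is case-sensitive).

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the argument looks like a SHA256
# digest, i.e. exactly 64 lowercase hexadecimal characters.
is_sha256_hex() {
  [ "${#1}" -eq 64 ] || return 1
  case "$1" in
    *[!0-9a-f]*) return 1 ;;   # contains a character outside 0-9a-f
  esac
  return 0
}

# A 40-character SHA1 digest is rejected, a 64-character SHA256 one passes.
for digest in \
  "da39a3ee5e6b4b0d3255bfef95601890afd80709" \
  "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
do
  if is_sha256_hex "${digest}"; then
    echo "${digest}: looks like SHA256"
  else
    echo "${digest}: not a SHA256 digest"
  fi
done
```

A check of this kind confirms only the format, not that a digest matches its archive.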

libraries.cmake/arrow.cmake (1)

18-31: ARROW_EXTRA_CMAKE_FLAG is intentionally undefined and follows the established pattern.

This variable is used at line 26 but left undefined in CMakeLists.txt. This is an intentional design pattern used consistently across multiple libraries in the codebase (zlib.cmake, xercesc.cmake), allowing users to pass optional CMake flags without modifying the source files. When undefined, it safely expands to an empty string with no functional impact.

Comment on lines +89 to +90
# CFLAGS for libsvm compiler (see libsvm Makefile)
set(ARROW_CFLAGS "-Wall -O3 -fPIC")

⚠️ Potential issue | 🟡 Minor

Copy/paste error in comment.

The comment references "libsvm compiler" but this is the Arrow build macro.

-  # CFLAGS for libsvm compiler (see libsvm Makefile)
+  # CFLAGS for Arrow compiler
   set(ARROW_CFLAGS "-Wall -O3 -fPIC")
🤖 Prompt for AI Agents
In libraries.cmake/arrow.cmake around lines 89 to 90, the inline comment
incorrectly references "libsvm compiler" due to a copy/paste error; update the
comment to correctly describe that these CFLAGS apply to the Arrow build (e.g.,
"CFLAGS for Arrow build") so it accurately documents the set(ARROW_CFLAGS ...)
line and keep the existing flag assignment unchanged.

Comment on lines +56 to +67
   set(_target_SHA256 ${ARCHIVE_${libname}_SHA256})

   message(STATUS "Validating archive for ${libname} .. ")

-  file(SHA1 ${_target_file} _downloaded_sha1 )
-  if(NOT "${_downloaded_sha1}" STREQUAL "${_target_sha1}")
+  file(SHA256 ${_target_file} _downloaded_SHA256 )
+  if(NOT "${_downloaded_SHA256}" STREQUAL "${_target_SHA256}")
     file(REMOVE ${_target_file})
-    if("${_downloaded_sha1}" STREQUAL "da39a3ee5e6b4b0d3255bfef95601890afd80709")
+    if("${_downloaded_SHA256}" STREQUAL "da39a3ee5e6b4b0d3255bfef95601890afd80709")
       message(STATUS "Validating archive for ${libname} .. Found empty archive.")
       message(STATUS "Check your internet connection.")
     else()
-      message(STATUS "Validating archive for ${libname} .. sha1 mismatch (expected: ${_target_sha1} got: ${_downloaded_sha1})")
+      message(STATUS "Validating archive for ${libname} .. SHA256 mismatch (expected: ${_target_SHA256} got: ${_downloaded_SHA256})")

⚠️ Potential issue | 🟠 Major

Inconsistent empty-archive hash: SHA1 used instead of SHA256.

Line 63 still uses the SHA1 hash of an empty file (da39a3ee5e6b4b0d3255bfef95601890afd80709), but the variable _downloaded_SHA256 now contains a SHA256 hash. This comparison will never match, so the "Found empty archive" message will never be shown in validate_archive.

Compare with line 117 which correctly uses the SHA256 hash of an empty file.

Apply this diff to fix the inconsistency:

     if(NOT "${_downloaded_SHA256}" STREQUAL "${_target_SHA256}")
       file(REMOVE ${_target_file})
-      if("${_downloaded_SHA256}" STREQUAL "da39a3ee5e6b4b0d3255bfef95601890afd80709")
+      if("${_downloaded_SHA256}" STREQUAL "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855")
         message(STATUS "Validating archive for ${libname} .. Found empty archive.")
🤖 Prompt for AI Agents
In macros.cmake around lines 56 to 67, the empty-file check compares
_downloaded_SHA256 to the SHA1 empty-file digest
(da39a3ee5e6b4b0d3255bfef95601890afd80709) so it will never match; replace that
value with the SHA256 digest of an empty file
(e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855) so the
condition can detect and report an empty archive correctly.
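
The two sentinel values can be reproduced locally by hashing empty input. The sketch below assumes either the coreutils tools (`sha1sum`, `sha256sum`) or `shasum` is installed; the helper name `empty_digest` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the digest of empty input for SHA1 (pass 1)
# or SHA256 (pass 256), using whichever hashing tool is available.
empty_digest() {
  if command -v "sha${1}sum" >/dev/null 2>&1; then
    "sha${1}sum" </dev/null | cut -d ' ' -f 1
  else
    shasum -a "$1" </dev/null | cut -d ' ' -f 1
  fi
}

echo "SHA1   of empty input: $(empty_digest 1)"
echo "SHA256 of empty input: $(empty_digest 256)"
```

The first line should show the 40-character value that was left behind in validate_archive, the second the 64-character value the review asks for.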

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 22c2300 and 56fffd5.

📒 Files selected for processing (1)
  • CMakeLists.txt (5 hunks)
🔇 Additional comments (5)
CMakeLists.txt (5)

94-95: Arrow added to build type lists.

Arrow is correctly added to both BUILD_TYPE and VALID_BUILD_TYPES, making it selectable for compilation.


271-271: Verify INCLUDE_DIR_ARROW target location.

INCLUDE_DIR_ARROW is set to ${CONTRIB_BIN_INCLUDE_DIR}/arrow. Confirm that the Arrow build macro (OPENMS_CONTRIB_BUILD_ARROW in libraries.cmake/arrow.cmake) installs headers to this location during the build process.


485-485: Arrow CMake macros correctly included.

The Arrow build macro file is properly included alongside other library macros.


624-631: Arrow build section follows established pattern.

The Arrow build logic is consistent with other optional libraries (HDF5, KISSFFT, OPENMP): clean, build, and copy libraries. The conditional check (if (BUILD_TYPE STREQUAL "ALL" OR "ARROW" IN_LIST BUILD_TYPE)) properly integrates Arrow as a selectable build target.
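
The selection rule in that condition (build everything for "ALL", otherwise build a library only if its name is an element of the list) can be illustrated with a shell analogue. This is a sketch, not the CMake code; the function name `should_build` is hypothetical, and the build type is modelled as a semicolon-separated string, which is how CMake represents lists.

```shell
#!/bin/bash
# Hypothetical analogue of:
#   if (BUILD_TYPE STREQUAL "ALL" OR "ARROW" IN_LIST BUILD_TYPE)
# $1 = library name, $2 = semicolon-separated build type list.
should_build() {
  if [ "$2" = "ALL" ]; then
    return 0
  fi
  # Wrap both sides in semicolons so only whole list elements match.
  case ";$2;" in
    *";$1;"*) return 0 ;;
  esac
  return 1
}

for build_type in "ALL" "ZLIB;ARROW" "ZLIB;BOOST" "ARROW_EXTRA"; do
  if should_build "ARROW" "${build_type}"; then
    echo "BUILD_TYPE=${build_type}: build Arrow"
  else
    echo "BUILD_TYPE=${build_type}: skip Arrow"
  fi
done
```

Matching whole elements rather than substrings is the point of the list test: a value such as ARROW_EXTRA does not select Arrow.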


199-199: Verify Arrow archive extracts to the expected directory structure.

ARROW_DIR references arrow-cpp-22.0.0/cpp, and the archive is arrow-cpp-22.0.0.tar.xz. While the version numbers now match (fixing the prior mismatch), you must confirm that extracting this .tar.xz produces a cpp/ subdirectory at the expected depth. If the archive root differs from arrow-cpp-22.0.0/, or if the C++ sources are not in a cpp/ subdirectory, the build will fail.

You can verify the archive structure by extracting it locally or by checking the Apache Arrow GitHub repository for version 22.0.0 and confirming the directory layout matches.

#!/bin/bash
# Verify Arrow archive structure (if you have access to the archive locally)
# Expected: arrow-cpp-22.0.0/cpp directory should exist after extraction

archive_file="arrow-cpp-22.0.0.tar.xz"
if [ -f "$archive_file" ]; then
  # -J selects xz decompression (-z would be gzip and fails on a .tar.xz)
  tar -tJf "$archive_file" | grep -E '^arrow-cpp-22\.0\.0/cpp/' | head -5
  echo "---"
  tar -tJf "$archive_file" | head -10
fi

Alternatively, check the upstream repository on GitHub to confirm the C++ source location within the archive.

Also applies to: 259-261

-D CMAKE_INSTALL_PREFIX=${PROJECT_BINARY_DIR}
-D ARROW_CSV=ON
-D ARROW_PARQUET=ON
${ARROW_EXTRA_CMAKE_FLAG}
Contributor

One could try to make it recognize the compression libs that we are building in our contrib like we do for boost.


3 participants