fix: macOS universal build #1171

Open
atteneder wants to merge 17 commits into KhronosGroup:main from atteneder:fix/macOS-universal-5.0.0

@atteneder
Contributor

Summary

Since ~4.4.0 I have not been able to build a universal libktx or libktx_read macOS binary. Even though universal binaries will fade out as Apple deprecates Intel CPU support, I still want to support them for the time being.

Here's what this PR does to fix that:

  1. Multiple astc-encoder* targets may get added as dependencies (in most cases astcenc-sse4.1-static and astcenc-neon-static).
  2. SSE 4.1 is enabled for Basis Universal on x86_64 only (so not for arm64).

Test

On a Mac navigate to the root of this repository clone and run:

cmake . -B build-mac-test -G Xcode -D CMAKE_OSX_ARCHITECTURES="arm64;x86_64"
cmake --build build-mac-test

Verify SSE instructions:

LIB=build-mac-test/Debug/libktx_read.4.0.0.dylib

# Disassemble the dylib slice and look for x86 SSE instructions that
# can only exist if SSE was actually compiled in. pshufb / pmaddubsw
# are SSSE3+ and appear in basisu's SSE kernels but never in arm64 code.
otool -tV -arch x86_64 "$LIB" | grep -cE '\b(pshufb|pmaddubsw|pmulhrsw)\b'
# output: 16
otool -tV -arch arm64  "$LIB" | grep -cE '\b(pshufb|pmaddubsw|pmulhrsw)\b'
# output: 0 (makes sense :) )

Behavioral change

Due to a (presumably CMake) limitation the build only succeeds if CMAKE_OSX_ARCHITECTURES is set explicitly to arm64;x86_64. Using $(ARCHS_STANDARD) leads to linking errors.

Since before it did not compile at all, I consider this an acceptable improvement.

To reduce confusion, I added an explicit configuration error that instructs users to set arm64;x86_64 explicitly. I prefer that over silently overriding the value, which would cause problems should Apple ever change the standard arch set.
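
For illustration, a configure-time check of this kind could look roughly like the sketch below. This is not the code from the PR; it assumes the check runs after `cmake_minimum_required()` with a version of at least 3.3 (so that `IN_LIST` is available), and the message wording is made up.

```cmake
# Sketch only (not the PR's actual code): reject the unexpanded Xcode build
# setting at configure time instead of failing later with linker errors.
# "$(ARCHS_STANDARD)" is a literal string here; CMake does not expand it.
if(APPLE AND "$(ARCHS_STANDARD)" IN_LIST CMAKE_OSX_ARCHITECTURES)
    message(FATAL_ERROR
        "CMAKE_OSX_ARCHITECTURES contains the literal $(ARCHS_STANDARD), "
        "which is not supported for universal builds. "
        "Pass -D CMAKE_OSX_ARCHITECTURES=\"arm64;x86_64\" instead.")
endif()
```

Failing early keeps the requirement visible to the user, whereas a silent substitution would hide the fact that the architecture list is hard-coded.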

Not Addressed

I verified building universal iOS simulator binaries works, but have not checked whether the x86_64 arch has SSE enabled. I presume not, but do not consider this a blocker.

History

This picks up the previous attempt from #1043, which was not accepted because BasisU was built without SSE optimization in that one.

atteneder added 4 commits May 14, 2026 16:38
ASTC encoder library with matching SIMD instruction set will be added as dependency per architecture.
When using the Xcode generator to build a universal binary, SSE 4.1 is enabled for x86_64 architecture only.
…)`, which is known to generate broken linking paths in conjunction with object library dependencies.
@atteneder changed the title from "Fix/mac os universal 5.0.0" to "fix: macOS universal build" on May 14, 2026
atteneder added 3 commits May 14, 2026 21:06
…lds.

chore: Removed outdated description about falling back to disabling SSE for Basis Universal on universal builds.
…_type`.

Updated comments describing the outdated BasisU SSE behavior for multi-arch builds.
@atteneder
Contributor Author

I noticed legacy code that sets compiler option -msse4.1 (lib/CMakeLists.txt:495). It's likely safe to remove:

It's effectively redundant now, and the per-arch block accidentally proves it.

The target_compile_options at line 495 fires when BASISU_SSE is TRUE and the compiler is Clang/AppleClang/GNU. After
the change at line 541 forces BASISU_SSE FALSE for any universal build, the only configuration where it still fires
is single-arch x86_64 with Clang/GCC (Linux, Windows-clang, or a hypothetical Intel-only macOS build). For
arm64-only, universal macOS, and MSVC builds it's a no-op.

The empirical evidence that it isn't actually needed even in that remaining case comes from the per-arch override
block at lines 745–761:

  • For the x86_64 slice of ktx in a universal build, we deliberately inject BASISU_SUPPORT_SSE=1 via -Xarch_x86_64 but
    do not inject -msse4.1 on ktx (only on basisu_encoder).
  • That x86_64 slice compiles basis_encode.cpp cleanly. So basisu's PUBLIC headers, when consumed by ktx, do not parse
    any SSE 4.1 intrinsics that would require -msse4.1 — they just reference an extern flag (g_cpu_supports_sse41). The
    actual SSE 4.1 instructions live inside basisu_kernels_sse.cpp, where basisu_encoder's own -msse4.1 (from
    external/basis_universal/CMakeLists.txt) is what matters.

So the same logic applies to the single-arch x86_64 case: ktx doesn't need -msse4.1 either, only basisu_encoder does,
and basisu already arranges that on its own target.

Tradeoff: it's defensive code that's been there a long time, so removing it has a small chance of breaking some
configuration nobody currently exercises (e.g. Linux GCC x86_64 build with a future basisu version that does inline
SSE 4.1 intrinsics in a header). If you'd like to delete it for clarity I can; otherwise leaving it as harmless dead
weight on the cases where it still fires is also defensible.

@atteneder
Contributor Author

atteneder commented May 14, 2026

I noticed that it's currently no longer possible to force SSE off via the BASISU_SSE option.

To solve this, I suggest introducing an option LIBKTX_FEATURE_BASISU_SSE.

After that change, the last message about the legacy compiler option may become obsolete.

edit: I can confirm that the -msse4.1 compiler option mentioned above is indeed still a candidate for removal.

…ing support for SSE 4.1 in Basis Universal.

The now internal `BASISU_SSE` used to be exposed prior.
@MarkCallow
Collaborator

@atteneder, I'll be away for the next 5 days. I will review when I return. I am puzzled by your statement that it is not possible to force SSE off via BASISU_SSE. I can see no deliberate reason for that. Perhaps there is an error in basis_universal's CMakeLists.txt. Make sure the CMake cache is cleared before running your tests. Maybe the value is coming from there.

atteneder added 3 commits May 15, 2026 10:40
When cross-compiling to platforms other than iOS/tvOS/visionOS (e.g. Android) this was wrongfully set to `ON` leading to follow-up errors.
@atteneder
Contributor Author

> @atteneder, I'll be away for the next 5 days. I will review when I return.

Much appreciated! No rush.

> @atteneder, I am puzzled by your statement that it is not possible to force SSE off via BASISU_SSE. I can see no deliberate reason for that. Perhaps there is an error in basis_universal's CMakeLists.txt. Make sure the CMake cache is cleared before running your tests. Maybe the value is coming from there.

Apologies for the confusing statements. Of course BASISU_SSE still works as intended by the authors of BasisU. However, this PR changes how it's used. To work around BasisU's limiting compile-time switch, BASISU_SSE is disabled, but its effects (the compile option -msse4.1 and the define BASISU_SUPPORT_SSE) are later injected for the x86_64 arch only, like so:

if(APPLE_MAC_OS AND LIBKTX_FEATURE_BASISU_SSE)
    ####################################################
    # Per-architecture SSE 4.1 in universal macOS builds.
    ####################################################
    # basisu_encoder cannot enable SSE for the whole project in a universal build
    # (the -msse4.1 flag would break the arm64 slice). With BASISU_SSE OFF, basisu
    # emits BASISU_SUPPORT_SSE=0 PUBLIC and no -msse4.1; we then use clang's
    # -Xarch_<arch> driver option to inject -msse4.1 and override
    # BASISU_SUPPORT_SSE to 1 only for the x86_64 sub-invocation. -Xarch_x86_64
    # applies to the immediately-following argument only, so each flag needs its
    # own prefix. Works with any generator (Xcode, Ninja, Make) on Apple.
    if(APPLE_MAC_OS_UNIVERSAL AND APPLE_MAC_OS_ARCH_x86_64)
        # SHELL: prevents CMake from deduplicating the repeated -Xarch_x86_64
        # tokens, which would otherwise leave only the first flag arch-qualified.
        target_compile_options(basisu_encoder PRIVATE
            "SHELL:-Xarch_x86_64 -msse4.1"
            "SHELL:-Xarch_x86_64 -UBASISU_SUPPORT_SSE"
            "SHELL:-Xarch_x86_64 -DBASISU_SUPPORT_SSE=1"
        )
        foreach(t ktx ktx_read)
            if(TARGET ${t})
                target_compile_options(${t} PRIVATE
                    "SHELL:-Xarch_x86_64 -UBASISU_SUPPORT_SSE"
                    "SHELL:-Xarch_x86_64 -DBASISU_SUPPORT_SSE=1"
                )
            endif()
        endforeach()
    endif()
endif()

Source

This ensures SSE is enabled for x86_64 but won't break arm64 builds.

I needed to introduce LIBKTX_FEATURE_BASISU_SSE so that users are able to opt-out of SSE again.

I opened the PR as soon as the macOS universal builds were functional. There are still issues with some (cross-compiling) platform builds that I'm trying to address now.

What to do with set_target_processor_type?

It appears that set_target_processor_type (in cputypetest.cmake) has historically been used to work around the BasisU SSE issue. It added a certain amount of complexity to the setup and still handles certain cases of cross-compilation poorly (embedded Linux aarch64 as an example).

I'd consider getting rid of it, if this PR proves to handle the BasisU SSE issue reliably.

That being said, this would need extensive testing of all platform compilation targets to ensure that there's no regression.

I can offer limited support in getting the CI to succeed and test my own CI setups in addition.

@atteneder
Contributor Author

atteneder commented May 15, 2026

Update: I've tested macOS Universal with Xcode and Ninja generators, but Unix Makefiles did not succeed.

edit: I personally think that's not a blocker, but you be the judge.

Supposed to fix compilation of ktx/ktxdiff tools against dynamic library (Windows).
Collaborator

@MarkCallow MarkCallow left a comment

Looks okay but I have a lot of questions. Please answer them.

Comment thread cmake/cputypetest.cmake Outdated
# arch (e.g. "x86_64" or "arm64") or to a list ("arm64;x86_64") to
# request a universal build. The literal "$(ARCHS_STANDARD)" is
# rejected by the root CMakeLists.txt because CMake stores it
# unexpanded and breaks $<TARGET_OBJECTS:...> paths.
Collaborator

Please add a pointer to the detailed explanation of the problem in the root CMakeLists.txt.

Too bad we can't query the ARCHS_STANDARD build setting value and expand it ourselves.

Contributor Author

Will do.

And agreed, that would be nice.

That got me thinking and...what if we substitute the literal with hard-coded values and add a custom pre-build command to check if it (still) matches up:

if(ADD_XCODE_MACOS_ARCHS_STANDARD_CHECK)
    add_custom_command(TARGET ${target} PRE_BUILD
        COMMAND /bin/sh -c [[
            expected="arm64 x86_64"
            actual="$ARCHS_STANDARD"
            if [ "$actual" != "$expected" ]; then
                echo "ARCHS_STANDARD mismatch: got [$actual], expected [$expected]" >&2
                exit 1
            else
                echo "ARCHS_STANDARD check passed: [$actual]"
            fi
        ]]
        VERBATIM
    )
endif()

I tested it and it worked (with both ONLY_ACTIVE_ARCH=NO and YES), so I'll add it.

Collaborator

Will this work on Windows? If not, you can use cmake -P with cmake code to do the same check.

Contributor Author

I'd only add this when targeting macOS with the Xcode generator, so it wouldn't affect other platforms.

What would that cmake -P solution look like? Maybe it's more elegant than this.
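
One possible shape for the `cmake -P` variant is sketched below. It assumes Xcode exports the `ARCHS_STANDARD` build setting into the environment of the pre-build command, the same way the shell version reads `$ARCHS_STANDARD`. The file name `check_archs_standard.cmake` is hypothetical.

```cmake
# check_archs_standard.cmake (hypothetical file name), run via `cmake -P`.
# Compares the ARCHS_STANDARD value seen at build time with the hard-coded
# architecture set that was substituted for it at configure time.
cmake_minimum_required(VERSION 3.10)

set(expected "arm64 x86_64")
# Assumption: the build setting is visible as an environment variable.
set(actual "$ENV{ARCHS_STANDARD}")

if(NOT "${actual}" STREQUAL "${expected}")
    # FATAL_ERROR makes the script exit with a non-zero status,
    # which fails the build step.
    message(FATAL_ERROR
        "ARCHS_STANDARD mismatch: got [${actual}], expected [${expected}]")
endif()
message(STATUS "ARCHS_STANDARD check passed: [${actual}]")
```

It could then be hooked up in place of the `/bin/sh` command, again only for macOS with the Xcode generator. `${target}` is taken from the snippet above and the script location is an assumption:

```cmake
if(APPLE AND CMAKE_GENERATOR STREQUAL "Xcode")
    add_custom_command(TARGET ${target} PRE_BUILD
        COMMAND ${CMAKE_COMMAND}
                -P ${CMAKE_SOURCE_DIR}/cmake/check_archs_standard.cmake
        VERBATIM
    )
endif()
```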

Comment thread cmake/cputypetest.cmake
# Building for iOS, iPadOS, etc. Since we don't care what
# type of ARM processor, arbitrarily set armv8.
# It should be arm64 but there is a check in tests/CMakeLists.txt
# that is dropping loadtests for Apple Silicon arm64.
Collaborator

Probably should fix this and tests/CMakeLists.txt. I don't recall why it wants to drop loadtests for Apple Silicon arm64.

Comment thread BUILDING.md
Macs are either based on Intel or the newer Apple Silicon architecture. By default CMake configures to build for your host's platform, whichever it is. If you want to cross compile universal binaries (that support both platforms), add the parameter `-DCMAKE_OSX_ARCHITECTURES="arm64;x86_64"` to cmake.

> **Known limitations:**
> - Intel Macs have support for SSE, but if you're building universal binaries,
Collaborator

This limitation is now outdated. Delete.

Comment thread lib/CMakeLists.txt Outdated
Comment thread lib/CMakeLists.txt
set(APPLE_MAC_OS_UNIVERSAL OFF)
if(APPLE_MAC_OS)
# Check CMAKE_OSX_ARCHITECTURES for multiple architectures.
list_contains(CMAKE_OSX_ARCHITECTURES "$(ARCHS_STANDARD)" APPLE_MAC_OS_ARCH_STANDARD)
Collaborator

I thought we can't use $(ARCHS_STANDARD).

Comment thread lib/CMakeLists.txt
# Set ordinary variable to override astc-encoder option's ON default
# and hide the option.
set(ASTCENC_UNIVERSAL_BUILD ${universal_build})
set(ASTCENC_APPLE_MAC_OS_UNIVERSAL ON)
Collaborator

This is no longer an astc-encoder option, is it? So, at the least, the comment is inaccurate. But why create this new variable?

Comment thread lib/CMakeLists.txt
if(APPLE_MAC_OS_UNIVERSAL)
if(APPLE_MAC_OS_ARCH_x86_64h)
list(APPEND ASTCENC_LIB_TARGETS astcenc-avx2-static)
elseif(APPLE_MAC_OS_ARCH_x86_64 OR APPLE_MAC_OS_ARCH_STANDARD)
Collaborator

By using APPLE_MAC_OS_ARCH_STANDARD, aren't you enshrining the problem you mentioned elsewhere of failing to handle a new architecture added to $(ARCHS_STANDARD)?

Comment thread lib/CMakeLists.txt
# SHELL: prevents CMake from deduplicating the repeated -Xarch_x86_64
# tokens, which would otherwise leave only the first flag arch-qualified.
target_compile_options(basisu_encoder PRIVATE
"SHELL:-Xarch_x86_64 -msse4.1"
Collaborator

How/where do these get set for a build that is just for x86_64 or x86_64h?

Comment thread lib/CMakeLists.txt
@MarkCallow
Collaborator

I think you are correct about the -msse4.1 setting at line 495. I think it is historical from when we directly included the basisu_encoder sources in libktx. Please remove it.

I originally introduced set_target_processor_type to determine whether the BASISU_SSE option should be shown or not, to remove a possible point of confusion for users. It wasn't connected to universal builds. I still want to avoid showing this option when it is of no use. I had hoped it would be temporary until basisu_encoder was changed to use compiler pre-defined macros and run-time queries to determine if SSE could be used and what variant. It doesn't look as if that is going to happen.

The key here is to not show the BASISU_SSE option, now LIBKTX_FEATURE_BASISU_SSE, if the target processor does not support it.
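
As a rough sketch of that idea, CMake's `CMakeDependentOption` module can hide an option whenever a condition is false. `TARGET_SUPPORTS_SSE` is a hypothetical variable standing in for whatever the processor detection (for example `set_target_processor_type`) reports, and the help text is made up.

```cmake
include(CMakeDependentOption)

# Hypothetical input: assume earlier detection logic sets this to ON only
# when the target processor is x86/x86_64. Default to OFF if it is unset.
if(NOT DEFINED TARGET_SUPPORTS_SSE)
    set(TARGET_SUPPORTS_SSE OFF)
endif()

# While TARGET_SUPPORTS_SSE is true, the option is user-visible and defaults
# to ON. Otherwise it is hidden from the cache UI and forced to OFF.
cmake_dependent_option(LIBKTX_FEATURE_BASISU_SSE
    "Compile Basis Universal with SSE 4.1 support"
    ON
    "TARGET_SUPPORTS_SSE"
    OFF
)
```

With this arrangement users on x86_64 can still opt out with `-D LIBKTX_FEATURE_BASISU_SSE=OFF`, while the option does not appear at all on targets where it would have no effect.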

atteneder added 5 commits May 23, 2026 23:22
…ting.

Instead it's substituted by the currently known value ('arm64;x86_64') and a custom Xcode pre-build command certifies that that's still correct.
This reverts commit 7a67f3d.

Android linking failed, because astc_codec.cpp depends on ASTC encoder targets.