
Commit bfc698e

jonkeane and thisisnic authored
GH-49067: [R] Disable GCS on macos (#49068)
### Rationale for this change

Builds that complete on CRAN

### What changes are included in this PR?

Disable GCS by default

### Are these changes tested?

### Are there any user-facing changes?

Hopefully not

**This PR includes breaking changes to public APIs.** (If there are any breaking changes to public APIs, please explain which changes are breaking. If not, you can remove this.)

**This PR contains a "Critical Fix".** (If the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld), please provide explanation. If not, you can remove this.)

* GitHub Issue: #49067

---------

Co-authored-by: Nic Crane <thisisnic@gmail.com>
1 parent 7dacbd0 commit bfc698e


4 files changed (+197, -3 lines)


compose.yaml

Lines changed: 3 additions & 1 deletion
@@ -441,7 +441,9 @@ services:
 ARROW_HOME: /arrow
 ARROW_DEPENDENCY_SOURCE: BUNDLED
 LIBARROW_MINIMAL: "false"
-ARROW_MIMALLOC: "ON"
+# explicitly enable GCS when we build libarrow so that binary libarrow
+# users get more fully-featured builds
+ARROW_GCS: "ON"
 volumes: *ubuntu-volumes
 command: &cpp-static-command
 /bin/bash -c "

dev/tasks/r/github.packages.yml

Lines changed: 0 additions & 1 deletion
@@ -81,7 +81,6 @@ jobs:
 env:
 {{ macros.github_set_sccache_envvars()|indent(8) }}
 MACOSX_DEPLOYMENT_TARGET: "11.6"
-ARROW_S3: ON
 ARROW_GCS: ON
 ARROW_DEPENDENCY_SOURCE: BUNDLED
 CMAKE_GENERATOR: Ninja

r/tools/nixlibs.R

Lines changed: 1 addition & 1 deletion
@@ -597,7 +597,7 @@ build_libarrow <- function(src_dir, dst_dir) {
 env_var_list <- c(
 env_var_list,
 ARROW_S3 = Sys.getenv("ARROW_S3", "ON"),
-ARROW_GCS = Sys.getenv("ARROW_GCS", "ON"),
+# ARROW_GCS = Sys.getenv("ARROW_GCS", "ON"),
 ARROW_WITH_ZSTD = Sys.getenv("ARROW_WITH_ZSTD", "ON")
 )
 }
Lines changed: 193 additions & 0 deletions
@@ -0,0 +1,193 @@
---
title: "Libarrow binary features"
description: >
  Understanding which C++ features are enabled in Arrow R package builds
output: rmarkdown::html_vignette
---

This document explains which C++ features are enabled in the different Arrow R
package build configurations and records the decisions behind our default
feature set. It is internal developer documentation for understanding which
features are enabled in which builds; it is not a guide to installing the Arrow
R package. For that, see the
[installation guide](../install.html).

## Overview

When the Arrow R package is installed, it needs a copy of the Arrow C++ library
(libarrow). This can come from:

1. **Prebuilt binaries** we host (for releases and nightlies)
2. **Source builds** when binaries aren't available or users opt out

The features available in libarrow depend on how it was built. This document
covers the feature configuration for both scenarios.

## Prebuilt libarrow binary configuration

We produce prebuilt libarrow binaries for macOS, Windows, and Linux. These
binaries include **more features** than the default source build to provide
users with a fully-featured experience out of the box.

### Current binary feature set

| Platform | S3 | GCS | Configured in |
|----------|----|-----|---------------|
| macOS (ARM64, x86_64) | ON | ON | `dev/tasks/r/github.packages.yml` |
| Windows | ON | ON | `ci/scripts/PKGBUILD` |
| Linux (x86_64) | ON | ON | `compose.yaml` (`ubuntu-cpp-static`) |

### Exceptions to our build defaults

Even though GCS defaults to OFF for source builds, we explicitly enable it in
our prebuilt binaries because:

1. **Binary users expect features to "just work"** - they shouldn't need to
   rebuild from source to access cloud storage
2. **Build time is not a concern** - we build binaries once in CI, not on
   user machines
3. **Parity across platforms** - users get the same features regardless of OS

## Feature configuration in source builds of libarrow

Source builds are controlled by `r/inst/build_arrow_static.sh`. The key
environment variable is `LIBARROW_MINIMAL`:

- `LIBARROW_MINIMAL` unset: Default feature set (Parquet, Dataset, JSON, and common compression ON; S3, GCS, and jemalloc OFF)
- `LIBARROW_MINIMAL=false`: Full feature set (adds S3, jemalloc, and additional compression; GCS stays OFF)
- `LIBARROW_MINIMAL=true`: Truly minimal (disables Parquet, Dataset, JSON, most compression, and SIMD)
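
The three modes can be summarised as a small lookup. The sketch below is a hypothetical R helper (`libarrow_source_defaults()` is not part of the arrow package) that encodes only what the list above states for a few representative flags; the real logic lives in `build_arrow_static.sh` and may differ in detail, for example in how it treats values other than `true` and `false`.

```r
# Hypothetical helper encoding the three LIBARROW_MINIMAL modes listed above.
# It illustrates the documented behaviour; it is not code from build_arrow_static.sh.
libarrow_source_defaults <- function(minimal = Sys.getenv("LIBARROW_MINIMAL")) {
  # Assumption: any value other than "true" or "false" behaves like unset
  mode <- if (minimal %in% c("true", "false")) minimal else "unset"

  # ARROW_DEFAULT_PARAM is ON only for LIBARROW_MINIMAL=false
  default_param <- if (mode == "false") "ON" else "OFF"

  # Parquet, Dataset, and JSON are built unless the build is truly minimal
  core <- if (mode == "true") "OFF" else "ON"

  c(
    ARROW_PARQUET  = core,
    ARROW_DATASET  = core,
    ARROW_JSON     = core,
    ARROW_S3       = default_param,
    ARROW_JEMALLOC = default_param,
    ARROW_GCS      = "OFF"  # GCS needs an explicit ARROW_GCS=ON in every mode
  )
}

# Defaults for a full (non-minimal) source build
libarrow_source_defaults("false")
```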

### Features enabled by default

These features default to ON in the build script whether `LIBARROW_MINIMAL` is
unset or `false`. Setting `LIBARROW_MINIMAL=true` turns several of them off
(Parquet, Dataset, JSON, and most compression), as noted above.

| Feature | CMake Flag | Notes |
|---------|------------|-------|
| Compute | `ARROW_COMPUTE=ON` | Core compute functions |
| CSV | `ARROW_CSV=ON` | CSV reading/writing |
| Filesystem | `ARROW_FILESYSTEM=ON` | Local filesystem support |
| JSON | `ARROW_JSON=ON` | JSON reading |
| Parquet | `ARROW_PARQUET=ON` | Parquet file format |
| Dataset | `ARROW_DATASET=ON` | Multi-file datasets |
| Acero | `ARROW_ACERO=ON` | Query execution engine |
| Mimalloc | `ARROW_MIMALLOC=ON` | Memory allocator |
| LZ4 | `ARROW_WITH_LZ4=ON` | LZ4 compression |
| Snappy | `ARROW_WITH_SNAPPY=ON` | Snappy compression |
| RE2 | `ARROW_WITH_RE2=ON` | Regular expressions |
| UTF8Proc | `ARROW_WITH_UTF8PROC=ON` | Unicode support |

### Features controlled by LIBARROW_MINIMAL

When `LIBARROW_MINIMAL=false`, the build script sets `ARROW_DEFAULT_PARAM=ON`,
and the following features, which use `ARROW_DEFAULT_PARAM` as their default,
are enabled as well:

| Feature | CMake Flag | Default |
|---------|------------|---------|
| S3 | `ARROW_S3` | `$ARROW_DEFAULT_PARAM` |
| Jemalloc | `ARROW_JEMALLOC` | `$ARROW_DEFAULT_PARAM` |
| Brotli | `ARROW_WITH_BROTLI` | `$ARROW_DEFAULT_PARAM` |
| BZ2 | `ARROW_WITH_BZ2` | `$ARROW_DEFAULT_PARAM` |
| Zlib | `ARROW_WITH_ZLIB` | `$ARROW_DEFAULT_PARAM` |
| Zstd | `ARROW_WITH_ZSTD` | `$ARROW_DEFAULT_PARAM` |
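
A default of `$ARROW_DEFAULT_PARAM` means that an explicitly set variable wins and an unset one falls back to the shared default. The sketch below mirrors that pattern in R; the helper name `resolve_flag()` is hypothetical, and treating an empty value as unset is an assumption borrowed from shell `${VAR:-default}` expansion rather than something stated above.

```r
# Hypothetical helper mirroring shell-style ${VAR:-default} resolution:
# an explicitly set, non-empty environment variable wins; otherwise the
# shared default (ARROW_DEFAULT_PARAM) applies.
resolve_flag <- function(name, default_param) {
  value <- Sys.getenv(name, unset = "")
  if (nzchar(value)) value else default_param
}

# The flags in the table above all use ARROW_DEFAULT_PARAM as their default
flags <- c("ARROW_S3", "ARROW_JEMALLOC", "ARROW_WITH_BROTLI",
           "ARROW_WITH_BZ2", "ARROW_WITH_ZLIB", "ARROW_WITH_ZSTD")

# LIBARROW_MINIMAL=false corresponds to ARROW_DEFAULT_PARAM=ON
default_param <- "ON"
vapply(flags, resolve_flag, character(1), default_param = default_param)
```

With `LIBARROW_MINIMAL` unset, passing `default_param = "OFF"` instead leaves every flag in the table off unless its variable is set explicitly.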

### Features that require explicit opt-in

GCS (Google Cloud Storage) is **off by default**, even when
`LIBARROW_MINIMAL=false`:

| Feature | CMake Flag | Default | Reason |
|---------|------------|---------|--------|
| GCS | `ARROW_GCS` | `OFF` | Build complexity, dependency size |

To enable GCS in a source build, set `ARROW_GCS=ON` explicitly.
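
From an R session, one way to opt in is to set the variable before starting the installation, because processes launched from R (including the libarrow build script) inherit the session's environment. This is a sketch: whether `install.packages()` actually builds libarrow from source, rather than downloading a prebuilt binary, depends on other settings covered in the installation guide, and the `sh` check assumes a POSIX shell as on Linux or macOS.

```r
# Opt in to GCS for a source build of libarrow started from this R session.
Sys.setenv(ARROW_GCS = "ON")

# Child processes inherit the session environment, so a build script run
# from here sees the same value (requires a POSIX shell, as on Linux/macOS)
system2("sh", c("-c", shQuote('echo "$ARROW_GCS"')), stdout = TRUE)

# Then start the source installation, for example:
# install.packages("arrow", type = "source")
```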

**Why is GCS off by default?**

GCS was turned off by default in [#48343](https://github.com/apache/arrow/pull/48343)
(December 2025) because:

1. Building google-cloud-cpp is fragile and adds significant build time
2. The dependency on abseil (ABSL) has caused compatibility issues
3. Users who need GCS can still enable it explicitly

## Configuration file locations

### libarrow source build configuration

The main build script that controls source builds:

**`r/inst/build_arrow_static.sh`** - CMake flags and defaults
([view source](https://github.com/apache/arrow/blob/main/r/inst/build_arrow_static.sh)).
The environment variables to look for are `LIBARROW_MINIMAL`, `ARROW_*`, and
`ARROW_DEFAULT_PARAM`.
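
When debugging a source build, it can help to list which of these variables are set in the current R session. A minimal base-R sketch; `build_env_vars()` is a hypothetical helper, and its name pattern simply matches the prefixes listed above without knowing which variables the script actually reads.

```r
# Hypothetical helper: list LIBARROW_* and ARROW_* variables set in this session.
# A build started from this session inherits these values.
build_env_vars <- function() {
  env <- Sys.getenv()  # all environment variables, named by variable
  keep <- grepl("^(LIBARROW_|ARROW_)", names(env))
  env[keep]
}

build_env_vars()
```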

### libarrow binary build configuration

Each platform has its own configuration file:

| Platform | Config file | Key settings |
|----------|-------------|--------------|
| macOS | `dev/tasks/r/github.packages.yml` | `LIBARROW_MINIMAL=false`, `ARROW_GCS=ON` |
| Windows | `ci/scripts/PKGBUILD` | `ARROW_GCS=ON`, `ARROW_S3=ON` |
| Linux | `compose.yaml` (`ubuntu-cpp-static`) | `LIBARROW_MINIMAL=false`, `ARROW_GCS=ON` |

## R-universe builds

[R-universe](https://apache.r-universe.dev/arrow) builds the Arrow R package
for users who want newer versions than CRAN. R-universe behavior varies by
platform and architecture:

| Platform | Architecture | Build method | Features |
|----------|--------------|--------------|----------|
| macOS | ARM64 | Downloads prebuilt binary | Full (S3 + GCS) |
| macOS | x86_64 | Downloads prebuilt binary | Full (S3 + GCS) |
| Windows | x86_64 | Downloads prebuilt binary | Full (S3 + GCS) |
| Windows | ARM64 | Not supported | NA |
| Linux | x86_64 | Downloads prebuilt binary | Full (S3 + GCS) |
| Linux | ARM64 | Builds from source | S3 only (no GCS) |

### Why Linux ARM64 builds from source

We only publish prebuilt Linux binaries for the x86_64 architecture. The binary
selection logic in `r/tools/nixlibs.R` (line 263) checks for this explicitly:

```r
if (identical(os, "darwin") || (identical(os, "linux") && identical(arch, "x86_64"))) { # excerpt: body omitted
}
```

When R-universe builds on Linux ARM64 runners, no binary is available, so it
falls back to building from source using `build_arrow_static.sh`. Since GCS
defaults to OFF in that script, Linux ARM64 users don't get GCS support.
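
The same check can be tried on any machine. In the sketch below, the helper name `has_prebuilt_libarrow()` and the derivation of `os` and `arch` from `Sys.info()` are assumptions for illustration; `nixlibs.R` computes those values its own way.

```r
# Hypothetical re-statement of the platform check quoted above.
has_prebuilt_libarrow <- function(os, arch) {
  identical(os, "darwin") || (identical(os, "linux") && identical(arch, "x86_64"))
}

# Assumption: map Sys.info() output onto the os/arch values the check expects
# ("darwin"/"linux" and "x86_64"); nixlibs.R may derive them differently.
os <- tolower(Sys.info()[["sysname"]])
arch <- Sys.info()[["machine"]]

has_prebuilt_libarrow(os, arch)
```

On a Linux ARM64 machine, `Sys.info()[["machine"]]` is typically `"aarch64"`, so the check returns `FALSE`, which corresponds to the source-build fallback described above.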

### Enabling GCS for Linux ARM64

To provide full feature parity for Linux ARM64, we would need to:

1. Add an ARM64 Linux build job to `dev/tasks/r/github.packages.yml`
2. Update `select_binary()` in `nixlibs.R` to recognize `linux-aarch64`
3. Add the artifact pattern to `dev/tasks/tasks.yml`
4. Update the nightly upload workflow
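
Step 2 amounts to widening the platform check quoted earlier. The following is a hypothetical sketch of that change: the helper name, the `"aarch64"` spelling of the architecture, and the decision to accept it are assumptions, and no Linux ARM64 binary would exist to download until steps 1, 3, and 4 are done.

```r
# Hypothetical widened check: treat Linux ARM64 ("aarch64") as having a
# prebuilt binary, alongside macOS and Linux x86_64.
linux_binary_arches <- c("x86_64", "aarch64")

has_prebuilt_libarrow_arm64 <- function(os, arch) {
  identical(os, "darwin") || (identical(os, "linux") && arch %in% linux_binary_arches)
}

has_prebuilt_libarrow_arm64("linux", "aarch64")
```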

See [GH-36193](https://github.com/apache/arrow/issues/36193), which tracks this work.

Alternatively, changing the GCS default in `build_arrow_static.sh` from `OFF`
to `$ARROW_DEFAULT_PARAM` would enable GCS in every source build where
`ARROW_DEFAULT_PARAM` is `ON`, including Linux ARM64 on R-universe.

## Checking installed features

Users can check which features are enabled in their installation:

```r
# Show all capabilities
arrow::arrow_info()

# Check specific features
arrow::arrow_with_s3()
arrow::arrow_with_gcs()
```
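
In a script that should degrade gracefully, the two checks can be wrapped so they return `NA` when the package is not installed. A minimal sketch: `cloud_support()` is a hypothetical helper, and only `arrow_with_s3()` and `arrow_with_gcs()` come from the arrow package.

```r
# Hypothetical helper: report S3/GCS support, or NA if arrow is not installed.
cloud_support <- function() {
  if (!requireNamespace("arrow", quietly = TRUE)) {
    return(c(s3 = NA, gcs = NA))
  }
  c(s3 = arrow::arrow_with_s3(), gcs = arrow::arrow_with_gcs())
}

support <- cloud_support()
if (isFALSE(support[["gcs"]])) {
  message("GCS is not enabled in this libarrow build.")
}
```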

## Related documentation

- [Installation guide](../install.html) - User-facing installation docs
- [Installation details](./install_details.html) - How the build system works
- [Developer setup](./setup.html) - Building Arrow for development
