
GH-48593: [C++] C++20: use standard calendar / timezone APIs #48601

Open
rok wants to merge 75 commits into apache:main from rok:cpp20_use_chrono

@rok
Member

@rok rok commented Dec 19, 2025

Rationale for this change

Switch to std::chrono on MSVC so that the system-provided timezone data is used automatically on Windows.

What changes are included in this PR?

This adds chrono_internal.h, which uses the C++20 std::chrono timezone/calendar APIs on compilers that support them (only MSVC for now) and falls back to the vendored date.h otherwise.

Are these changes tested?

Partially tested locally; the rest is to be tested on CI.

Are there any user-facing changes?

Yes, Windows users will no longer need to install the IANA tzdb (see instructions here and here). We possibly have tzdb download set up in CI too and should update it appropriately.

@github-actions

⚠️ GitHub issue #48593 has been automatically assigned in GitHub to PR creator.

@rok rok force-pushed the cpp20_use_chrono branch 9 times, most recently from 4283740 to d82f990 on December 23, 2025 18:14
@rok
Member Author

rok commented Dec 23, 2025

It seems that std::chrono on GCC (14.3.0, 15.2.0) potentially has a bug that triggers some of our tests. Meanwhile, std::chrono on MSVC 19.44 (14.44) appears to pass them and is correct, or at least consistent with the vendored date.h. I would therefore suggest switching to std::chrono only on MSVC for now, as that gives us the most benefit anyway (users no longer have to deal with the tzdb).

Below is the explanation and reproduction of the bug.

// GCC libstdc++ DST bug reproduction
//
// The AN to AS Transition Bug
// ---------------------------
// Source https://github.com/eggert/tz/blob/c37fbc3249c1a1334948b38f3bca47dee5c11dd1/australasia#L165-L192
// Australia/Broken_Hill used the AN (New South Wales) rules until 2000, then
// switched to AS (South Australia) rules. Under AN rules, DST started the last
// Sunday of October and ended the last Sunday of March. For the 1999-2000
// summer, DST started October 31, 1999 and would end March 26, 2000. February
// 29, 2000 falls squarely within this DST period, so the correct offset should
// be 9:30 base + 1:00 DST = 10:30 (630 minutes).
//
// Why GCC's Data is Wrong
// -----------------------
// When libstdc++ processes the zone transition from AN rules to AS rules (which
// happens in year 2000), it appears to lose or reset the DST state inherited
// from the AN rules. Instead of recognizing that DST is still active from the
// October 1999 transition, it reports offset=570 (just the 9:30 base) with
// save=0. The inconsistency is evident: it returns abbrev="ACDT" (daylight
// time) but the offset and save values indicate standard time. The AN rules
// clearly show DST should be active until the last Sunday of March 2000.
//
// Compile: g++ -std=c++20 -o gcc_dst_bug gcc_libstdcxx_dst_bug.cpp
// Expected: 630 (10:30 = 9:30 base + 1:00 DST)
// Actual:   570 (9:30 = base only, DST missing)

#include <chrono>
#include <iostream>

int main() {
  using namespace std::chrono;
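  // Throws std::runtime_error if the zone is missing from the tzdb.
  const time_zone* tz = locate_zone("Australia/Broken_Hill");
  // 2000-02-29 23:23:23 UTC, inside the 1999-2000 AN-rule DST period.
  auto info = tz->get_info(sys_days{2000y / February / 29d} + 23h + 23min + 23s);
  // UTC offset in minutes: 630 expected, 570 with the affected libstdc++.
  std::cout << duration_cast<minutes>(info.offset).count() << "\n";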
}

@rok rok marked this pull request as ready for review December 23, 2025 18:46
@rok
Member Author

rok commented Dec 23, 2025

@pitrou

@rok rok changed the title GH-48593: [Draft][C++] C++20: use standard calendar / timezone APIs GH-48593: [C++] C++20: use standard calendar / timezone APIs Dec 23, 2025
@pitrou
Member

pitrou commented Jan 5, 2026

It seems that std::chrono on GCC (14.3.0, 15.2.0) potentially has a bug that triggers some of our tests.

Is the bug reported somewhere? If not, can you do that?

@pitrou
Member

pitrou commented Jan 5, 2026

We possibly have tzdb download set up in CI too and should update it appropriately.

Yes, I think we should do so. There are also a bunch of code snippets in the C++ and Python codebase that could be removed, IIRC.

@rok
Member Author

rok commented Jan 5, 2026

Is the bug reported somewhere? If not, can you do that?

It seems to be related to a known issue; I added a comment there explaining our case.

@github-actions github-actions bot added the "awaiting changes" and "awaiting change review" labels and removed the "awaiting committer review" and "awaiting changes" labels Jan 5, 2026
@github-actions github-actions bot added the "awaiting changes" label and removed the "awaiting change review" label Jan 5, 2026
rok and others added 21 commits February 27, 2026 20:27
diff --git c/python/pyarrow/tests/test_compute.py i/python/pyarrow/tests/test_compute.py
index 3975a03..08bb6b2 100644
--- c/python/pyarrow/tests/test_compute.py
+++ i/python/pyarrow/tests/test_compute.py
@@ -2367,15 +2367,17 @@ def _compare_strftime_strings_on_windows(result, expected):
     # instead of timezone abbreviations (e.g. "CET")
     # apache#48767

-    p = "(UTC|GMT[+-][0-9]+)$"
+    # Match timezone suffixes: UTC, GMT offsets (GMT+1, GMT-5), or abbreviations (CET, CEST)
+    p = "(UTC|GMT[+-]?[0-9]*|[A-Z]{2,5})$"

-    ends_with_offset = pc.match_substring_regex(result, p)
-    all_end_with_offset = pc.all(ends_with_offset, skip_nulls=True).as_py()
-    assert all_end_with_offset, "All timezone values should be GMT offset format "\
-                                f"or UTC \nActual: {result}"
+    ends_with_tz = pc.match_substring_regex(result, p)
+    all_end_with_tz = pc.all(ends_with_tz, skip_nulls=True).as_py()
+    assert all_end_with_tz, "All timezone values should be GMT offset format, "\
+                            f"UTC, or timezone abbreviation\nActual: {result}"

     result_substring = pc.replace_substring_regex(result, pattern=p, replacement="")
-    assert expected.starts_with(result_substring), \
+    expected_substring = pc.replace_substring_regex(expected, pattern=p, replacement="")
+    assert result_substring.equals(expected_substring), \
         f"Expected: {expected}, \nActual: {result} " \
         "\nNote: tz suffix is not being compared"

diff --git c/python/pyarrow/util.py i/python/pyarrow/util.py
index 71ec865..123e226 100644
--- c/python/pyarrow/util.py
+++ i/python/pyarrow/util.py
@@ -244,6 +244,7 @@ def _download_requests(url, out_path):
         with open(out_path, 'wb') as f:
             f.write(response.content)

+
 # TODO(apacheGH-48593): Remove when libc++ supports std::chrono timezone
 # apache#48593
 def download_tzdata_on_windows():
Co-authored-by: Raúl Cumplido <raulcumplido@gmail.com>
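The updated `_compare_strftime_strings_on_windows` check strips any trailing timezone token from both the result and the expected string and compares what remains, rather than checking that the expected string merely starts with the stripped result. The same logic can be illustrated in C++ with `std::regex` on a single pair of strings. This is only a sketch: the real test operates on Arrow arrays through `pyarrow.compute`, and `EqualIgnoringTzSuffix` is a hypothetical name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Same pattern as the updated test: UTC, GMT offsets (GMT, GMT+1, GMT-5)
// or 2-5 letter abbreviations (CET, CEST) at the end of the string.
// In the default ECMAScript grammar, '$' anchors at the end of the input.
inline const std::regex kTzSuffix("(UTC|GMT[+-]?[0-9]*|[A-Z]{2,5})$");

// Hypothetical helper: the result must end with a timezone token, and the
// two strings must be identical once their timezone suffixes are removed.
inline bool EqualIgnoringTzSuffix(const std::string& result,
                                  const std::string& expected) {
  if (!std::regex_search(result, kTzSuffix)) return false;
  return std::regex_replace(result, kTzSuffix, "") ==
         std::regex_replace(expected, kTzSuffix, "");
}
```

With this check, `"2021-01-01 01:02:03 GMT+1"` and `"2021-01-01 01:02:03 CET"` compare equal, while a difference anywhere before the suffix still fails.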
@rok rok force-pushed the cpp20_use_chrono branch from 863e56f to 9e87d57 on February 27, 2026 19:28
@rok
Member Author

rok commented Feb 27, 2026

@github-actions crossbow submit test-r-macos-as-cran

@github-actions

Revision: 9e87d57

Submitted crossbow builds: ursacomputing/crossbow @ actions-a1f6d9bce5

Task Status
test-r-macos-as-cran GitHub Actions

@rok
Member Author

rok commented Feb 27, 2026

I've tried a couple of rebases; rebasing on the commit before e37c516 succeeded. Trying a rebase on 59e0ba6 now. I hope there isn't too much to work around.

…ty issues (apache#49223)

Now that we have CI for it, check on other issues with C++20 compatibility on CRAN. I know that the code in apache#49105 is likely problematic.

Resolves: apache#49287

### Rationale for this change

### What changes are included in this PR?

### Are these changes tested?

### Are there any user-facing changes?
* GitHub Issue: apache#49287

Authored-by: Jonathan Keane <jkeane@gmail.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
@rok
Member Author

rok commented Feb 27, 2026

@github-actions crossbow submit test-r-macos-as-cran

@github-actions

Revision: c6caa3c

Submitted crossbow builds: ursacomputing/crossbow @ actions-10e324e6df

Task Status
test-r-macos-as-cran GitHub Actions

@rok
Member Author

rok commented Feb 28, 2026

@jonkeane I rebased onto the commit before #49298 and cherry-picked the #49223 commit. CRAN CI is now passing, so I suppose this PR by itself is above board for CRAN?

@rok
Member Author

rok commented Feb 28, 2026

@pitrou, since this won't cause issues with CRAN, shall we proceed with the review?



8 participants