GH-48593: [C++] C++20: use standard calendar / timezone APIs #48601
rok wants to merge 75 commits into apache:main
std::chrono in GCC's libstdc++ (14.3.0, 15.2.0) appears to have a bug that causes some of our tests to fail. Meanwhile, std::chrono on MSVC 19.44 (14.44) passes them and appears to be correct, or at least consistent with the vendored date.h. Below is an explanation and reproduction of the bug.
// GCC libstdc++ DST bug reproduction
//
// The AN to AS Transition Bug
// ---------------------------
// Source https://github.com/eggert/tz/blob/c37fbc3249c1a1334948b38f3bca47dee5c11dd1/australasia#L165-L192
// Australia/Broken_Hill used the AN (New South Wales) rules until 2000, then
// switched to AS (South Australia) rules. Under AN rules, DST started the last
// Sunday of October and ended the last Sunday of March. For the 1999-2000
// summer, DST started October 31, 1999 and would end March 26, 2000. February
// 29, 2000 falls squarely within this DST period, so the correct offset should
// be 9:30 base + 1:00 DST = 10:30 (630 minutes).
//
// Why GCC's Data is Wrong
// -----------------------
// When libstdc++ processes the zone transition from AN rules to AS rules (which
// happens in year 2000), it appears to lose or reset the DST state inherited
// from the AN rules. Instead of recognizing that DST is still active from the
// October 1999 transition, it reports offset=570 (just the 9:30 base) with
// save=0. The inconsistency is evident: it returns abbrev="ACDT" (daylight
// time) but the offset and save values indicate standard time. The AN rules
// clearly show DST should be active until the last Sunday of March 2000.
//
// Compile: g++ -std=c++20 -o gcc_dst_bug gcc_libstdcxx_dst_bug.cpp
// Expected: 630 (10:30 = 9:30 base + 1:00 DST)
// Actual: 570 (9:30 = base only, DST missing)
#include <chrono>
#include <iostream>

int main() {
  using namespace std::chrono;
  // Look up the zone and query its UTC offset at 2000-02-29 23:23:23 UTC.
  auto* tz = locate_zone("Australia/Broken_Hill");
  auto info = tz->get_info(sys_days{2000y / February / 29d} + 23h + 23min + 23s);
  std::cout << duration_cast<minutes>(info.offset).count() << "\n";
}
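The calendar claims in the reproduction comment can be checked without relying on std::chrono or any tz database. The following is a minimal sketch using plain integer arithmetic (the well-known days_from_civil algorithm); all function and constant names here are hypothetical and are not part of the PR. It confirms that the last Sundays of October 1999 and March 2000 fall on the 31st and 26th, and that February 29, 2000 lies between them.

```cpp
#include <cassert>

// Days since 1970-01-01 for a proleptic Gregorian date
// (days_from_civil algorithm; hypothetical helper, not Arrow code).
int DaysFromCivil(int y, int m, int d) {
  y -= (m <= 2) ? 1 : 0;
  const int era = (y >= 0 ? y : y - 399) / 400;
  const int yoe = y - era * 400;                                     // year of era
  const int doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;   // day of year (March-based)
  const int doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;             // day of era
  return era * 146097 + doe - 719468;
}

// Day of week for a day count since the epoch: 0 = Sunday ... 6 = Saturday.
int WeekdayFromDays(int z) {
  return z >= -4 ? (z + 4) % 7 : (z + 5) % 7 + 6;
}

// Day of month of the last Sunday in the given month.
int LastSundayOfMonth(int y, int m) {
  const int first_of_next =
      (m == 12) ? DaysFromCivil(y + 1, 1, 1) : DaysFromCivil(y, m + 1, 1);
  const int last_day = first_of_next - 1;
  const int last_sunday = last_day - WeekdayFromDays(last_day);
  return last_sunday - DaysFromCivil(y, m, 1) + 1;
}

// True if 2000-02-29 lies inside the 1999-2000 DST period under the AN rules
// as described in the comment (last Sunday of October to last Sunday of March).
bool LeapDayIsInsideAnDst() {
  const int dst_start = DaysFromCivil(1999, 10, LastSundayOfMonth(1999, 10));
  const int dst_end = DaysFromCivil(2000, 3, LastSundayOfMonth(2000, 3));
  const int leap_day = DaysFromCivil(2000, 2, 29);
  return dst_start <= leap_day && leap_day < dst_end;
}

// Offsets in minutes, taken from the comment: 9:30 base plus 1:00 DST.
constexpr int kBaseOffsetMinutes = 9 * 60 + 30;
constexpr int kExpectedOffsetMinutes = kBaseOffsetMinutes + 60;

// Self-check of the dates and offsets quoted in the reproduction comment.
void CheckReproductionClaims() {
  assert(LastSundayOfMonth(1999, 10) == 31);
  assert(LastSundayOfMonth(2000, 3) == 26);
  assert(LeapDayIsInsideAnDst());
  assert(kExpectedOffsetMinutes == 630);
}
```

Keeping this check independent of `<chrono>` means it does not depend on the library whose behavior is in question.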
Is the bug reported somewhere? If not, can you do that?
Yes, I think we should do so. There are also a bunch of code snippets in the C++ and Python codebase that could be removed, IIRC.
It seems to be related to a known issue; I added a comment explaining our case.
diff --git c/python/pyarrow/tests/test_compute.py i/python/pyarrow/tests/test_compute.py
index 3975a03..08bb6b2 100644
--- c/python/pyarrow/tests/test_compute.py
+++ i/python/pyarrow/tests/test_compute.py
@@ -2367,15 +2367,17 @@ def _compare_strftime_strings_on_windows(result, expected):
     # instead of timezone abbreviations (e.g. "CET")
     # apache#48767
-    p = "(UTC|GMT[+-][0-9]+)$"
+    # Match timezone suffixes: UTC, GMT offsets (GMT+1, GMT-5), or abbreviations (CET, CEST)
+    p = "(UTC|GMT[+-]?[0-9]*|[A-Z]{2,5})$"
-    ends_with_offset = pc.match_substring_regex(result, p)
-    all_end_with_offset = pc.all(ends_with_offset, skip_nulls=True).as_py()
-    assert all_end_with_offset, "All timezone values should be GMT offset format "\
-        f"or UTC \nActual: {result}"
+    ends_with_tz = pc.match_substring_regex(result, p)
+    all_end_with_tz = pc.all(ends_with_tz, skip_nulls=True).as_py()
+    assert all_end_with_tz, "All timezone values should be GMT offset format, "\
+        f"UTC, or timezone abbreviation\nActual: {result}"
     result_substring = pc.replace_substring_regex(result, pattern=p, replacement="")
-    assert expected.starts_with(result_substring), \
+    expected_substring = pc.replace_substring_regex(expected, pattern=p, replacement="")
+    assert result_substring.equals(expected_substring), \
         f"Expected: {expected}, \nActual: {result} " \
         "\nNote: tz suffix is not being compared"
diff --git c/python/pyarrow/util.py i/python/pyarrow/util.py
index 71ec865..123e226 100644
--- c/python/pyarrow/util.py
+++ i/python/pyarrow/util.py
@@ -244,6 +244,7 @@ def _download_requests(url, out_path):
         with open(out_path, 'wb') as f:
             f.write(response.content)
+# TODO(apacheGH-48593): Remove when libc++ supports std::chrono timezone
 # apache#48593
 def download_tzdata_on_windows():
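To illustrate what the widened suffix pattern in the diff changes, here is a small sketch using std::regex with the two pattern strings copied from the diff. This is only an approximation: the Python test goes through pyarrow's regex kernels rather than std::regex, the helper names are hypothetical, and the sample timestamps are made up for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Pattern strings copied from the diff (old and new).
const char* const kOldPattern = "(UTC|GMT[+-][0-9]+)$";
const char* const kNewPattern = "(UTC|GMT[+-]?[0-9]*|[A-Z]{2,5})$";

// True if the string ends with a timezone suffix matched by the pattern.
bool EndsWithTzSuffix(const std::string& value, const std::string& pattern) {
  return std::regex_search(value, std::regex(pattern));
}

// Removes the trailing timezone suffix, if the pattern matches one.
std::string StripTzSuffix(const std::string& value, const std::string& pattern) {
  return std::regex_replace(value, std::regex(pattern), "");
}

// Self-check on made-up sample strings.
void CheckSuffixPatterns() {
  // The old pattern accepts only UTC or signed GMT offsets.
  assert(EndsWithTzSuffix("2000-02-29 GMT+1", kOldPattern));
  assert(!EndsWithTzSuffix("2000-02-29 CET", kOldPattern));
  // The new pattern also accepts abbreviations such as CET or CEST.
  assert(EndsWithTzSuffix("2000-02-29 CET", kNewPattern));
  // After stripping, strings that differ only in the suffix compare equal.
  assert(StripTzSuffix("2000-02-29 CEST", kNewPattern) ==
         StripTzSuffix("2000-02-29 GMT+2", kNewPattern));
}
```

Note that `[A-Z]{2,5}` accepts any trailing run of two to five uppercase letters, so the new pattern is deliberately looser than the old one; stripping the suffix from both sides before comparing is what keeps the assertion meaningful.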
Co-authored-by: Raúl Cumplido <raulcumplido@gmail.com>
@github-actions crossbow submit test-r-macos-as-cran
Revision: 9e87d57 Submitted crossbow builds: ursacomputing/crossbow @ actions-a1f6d9bce5
…ty issues (apache#49223)
Now that we have CI for it, check on other issues with C++20 compatibility on CRAN. I know that the code in apache#49105 is likely problematic.
Resolves: apache#49287
### Rationale for this change
### What changes are included in this PR?
### Are these changes tested?
### Are there any user-facing changes?
* GitHub Issue: apache#49287
Authored-by: Jonathan Keane <jkeane@gmail.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
@github-actions crossbow submit test-r-macos-as-cran
Revision: c6caa3c Submitted crossbow builds: ursacomputing/crossbow @ actions-10e324e6df
@pitrou since this won't cause issues with CRAN, shall we proceed with the review?
Rationale for this change
Switch to std::chrono for MSVC to be able to use the system-provided timezone automatically on Windows.
What changes are included in this PR?
This adds chrono_internal.h, which uses C++20 std::chrono timezone/calendar APIs on compilers with support (MSVC only for now) and falls back to the vendored date.h otherwise.
Are these changes tested?
Partially tested locally and partially to be tested on CI.
Are there any user-facing changes?
Yes, Windows users will no longer need to install the IANA tzdb (see instructions here and here). CI may also have a tzdb download step set up; if so, it should be updated accordingly.
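The compile-time selection described under "What changes are included in this PR?" could look roughly like the sketch below. This is an assumption-laden illustration, not the contents of chrono_internal.h: the macro name `EXAMPLE_USE_STD_CHRONO_TZ`, the constant, and the helper function are hypothetical, and the condition the PR actually uses may differ. The sketch relies only on the standard feature-test macro `__cpp_lib_chrono` and the compiler macro `_MSC_VER`.

```cpp
#include <cassert>
#include <chrono>
#include <string>

// Hypothetical switch: use the standard library's C++20 calendar/timezone
// support only on MSVC, and only when the library advertises it.
#if defined(_MSC_VER) && defined(__cpp_lib_chrono) && __cpp_lib_chrono >= 201907L
#define EXAMPLE_USE_STD_CHRONO_TZ 1
#else
#define EXAMPLE_USE_STD_CHRONO_TZ 0
#endif

constexpr bool kUsesStdChronoTz = (EXAMPLE_USE_STD_CHRONO_TZ == 1);

// Reports which backend this translation unit would select.
std::string TimezoneBackendName() {
  return kUsesStdChronoTz ? "std::chrono (C++20)" : "vendored date.h";
}

// Self-check: the reported name agrees with the compile-time switch.
void CheckBackendSelection() {
  if (kUsesStdChronoTz) {
    assert(TimezoneBackendName() == "std::chrono (C++20)");
  } else {
    assert(TimezoneBackendName() == "vendored date.h");
  }
}
```

In a real header, the two branches would typically expose the same names (for example through namespace aliases) so that calling code does not need to know which backend was chosen.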