Add Temporal half-rounding boundary tests across all units #4996
MidnightDesign wants to merge 9 commits into tc39:main
…and dayOfWeek
The existing rounding mode tests for PlainDate.prototype.since() and PlainDate.prototype.until() use dates that produce ~31.97 months of difference, well above the 0.5 boundary. All half-* rounding modes produce identical results, making it impossible to distinguish halfExpand from halfTrunc, or halfEven from halfCeil.
These new tests use dates that produce exactly 0.5 fractional progress (183/366 days in a leap year), causing all nine rounding modes to produce distinct result patterns. The 2.5-year case specifically distinguishes halfEven (rounds to nearest even integer 2) from halfExpand (rounds away from zero to 3).
Also adds:
- inLeapYear century-year tests (1700, 1800, 1900, 2100, 2200) exercising the 100/400 rule that the basic test does not cover
- dayOfWeek tests across all 12 months of a year, since the basic test only checks 7 consecutive days within a single month
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
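To illustrate why the old test data could not tell the half-* modes apart, here is a minimal sketch of integer rounding under the nine rounding modes. `roundToInteger` is a hypothetical helper written for this illustration; it is not code from the PR, from test262, or from any Temporal implementation.

```javascript
// Hypothetical reference rounding to an integer for the nine rounding modes.
function roundToInteger(x, mode) {
  const lower = Math.floor(x);
  const upper = Math.ceil(x);
  if (lower === upper) return x; // already an integer, nothing to round
  const towardZero = x > 0 ? lower : upper;
  const awayFromZero = x > 0 ? upper : lower;
  const frac = x - lower; // strictly between 0 and 1
  // Round to nearest; fall back to tieBreak only on an exact 0.5 tie.
  const nearestOr = (tieBreak) =>
    frac < 0.5 ? lower : frac > 0.5 ? upper : tieBreak;
  switch (mode) {
    case "ceil": return upper;
    case "floor": return lower;
    case "expand": return awayFromZero;
    case "trunc": return towardZero;
    case "halfCeil": return nearestOr(upper);
    case "halfFloor": return nearestOr(lower);
    case "halfExpand": return nearestOr(awayFromZero);
    case "halfTrunc": return nearestOr(towardZero);
    case "halfEven": return nearestOr(lower % 2 === 0 ? lower : upper);
    default: throw new RangeError(`unknown rounding mode: ${mode}`);
  }
}

const halfModes = ["halfCeil", "halfFloor", "halfExpand", "halfTrunc", "halfEven"];
const resultsFor = (x) => halfModes.map((mode) => roundToInteger(x, mode));

console.log(resultsFor(31.97)); // [32, 32, 32, 32, 32]: every half-* mode agrees
console.log(resultsFor(2.5));   // [3, 2, 3, 2, 2]: halfEven (2) differs from halfExpand (3)
console.log(resultsFor(-1.5));  // [-1, -2, -2, -1, -2]: halfCeil and halfExpand diverge
```

Only an exact tie reaches the `tieBreak` argument, which is why a value such as 31.97 never exercises the mode-specific branches.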
Remove unrelated dayOfWeek and inLeapYear tests. Add RoundRelativeDuration spec references to info blocks. Extend half-boundary coverage to PlainDateTime, PlainYearMonth, and ZonedDateTime (until + since). PlainYearMonth uses June-starting dates because RoundRelativeDuration converts month remainders to days: Jun-Nov = 183 days in a 366-day year span (Jun 2019 - Jun 2020 crossing Feb 29), giving exactly 183/366 = 0.5. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
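The day counts quoted for PlainYearMonth can be checked with plain calendar arithmetic. This is a sketch using `Date.UTC` rather than Temporal; the choice of the 1st of each month as the reference day is an assumption made for the illustration, and `daysBetween` is a hypothetical helper.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days between two ISO calendar dates given as [year, month, day],
// with a 1-based month. UTC is used so no DST shifts are involved.
function daysBetween([y1, m1, d1], [y2, m2, d2]) {
  return (Date.UTC(y2, m2 - 1, d2) - Date.UTC(y1, m1 - 1, d1)) / MS_PER_DAY;
}

// Remainder: June through November (assumed reference day: the 1st).
const remainderDays = daysBetween([2019, 6, 1], [2019, 12, 1]);
// Year window: June 2019 to June 2020, which crosses Feb 29, 2020.
const yearSpanDays = daysBetween([2019, 6, 1], [2020, 6, 1]);

console.log(remainderDays, yearSpanDays, remainderDays / yearSpanDays); // 183 366 0.5
```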
ptomato left a comment
Thanks! This looks good at first glance.
A couple questions and comments:
- Would you mind sharing approximately the process you used for prompting Claude Code and turning the output into this PR? As the technology is new we are still finding our way around. Thank you for disclosing it up front, by the way.
- Did the mutation tool only find a lack of test coverage when rounding to years, or is the coverage for units such as months also lacking? It might make sense to find similar pairs of objects for other units. (If you do that, it probably makes sense to loop over rounding modes that have the same outcome in each case, to prevent the tests from getting overly long.)
- To make sure we are testing what we expect to be testing, it might be helpful to add assertions at the beginning such as
assert.sameValue(earlier1.since(later).total({ unit: "years", relativeTo: earlier1 }), -1.5, "duration is on a 0.5 boundary");
I told it to write a minimal transpiler for the syntax used in the Temporal tests and it came up with this. It grew as it implemented more classes and methods that required different syntax to be converted. Then I told it to start implementing classes and methods one by one against the test suite. At one point it was pretty much done and all that was left were concepts that are untranslatable, like JS I was wondering whether there was any dead or untested code in there (which there shouldn't be if test262 is exhaustive). I fired up my go-to technique for that, mutation testing, using Infection. To my surprise it actually flagged some mutants, specifically pointing out that the rounding modes were pretty much uncovered. I then told it in a different session (directly in the test262 repo) to double- and triple-check, and it was positive that that's an actual gap in the test suite. I told it to add the tests, it did, it verified them against V8, and I let it open the PR, making sure to include the fact that this was written by an agent.
Yeah of course, that was important to me. We're all still figuring out how to handle this stuff and I thought it would be important to be upfront with that.
Claude Code did not flag other units by itself specifically, and I didn't ask. Will check tomorrow.
Great idea, I will add that. @ptomato Thanks for taking the time.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clarifies the half-* rounding mode assertion messages by using standard tie-breaking terminology for consistency with non-half mode phrasing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The copyright hallucination makes me very skeptical of this contribution.
I checked the code, not the boilerplate-looking legal header.
Absolutely fair. I'm skeptical of AI-written code that I haven't generated myself either. But given the radical shift in the last couple of months I think these kinds of policies will need to be updated. What a weird point in time; people still see anything involving AIs instinctively as low-quality (myself included) while even the greatest skeptics have started using it all the time. Personally, I'm in favor of being transparent with the use of AI and allowing it in a limited capacity instead of banning it outright (and people using it anyway without telling).
I'm comfortable with this contribution. I read the AI policy as "don't copy-paste LLM-generated text into discussions that TC39 delegates have to waste time reading", not "don't accept any LLM-assisted PRs" (and from what I caught of the discussion when the policy was adopted, that's how it was intended.) Furthermore, I believe I can vouch for this PR being (1) correct and (2) filling an actual coverage gap, because I checked those things. And the modifications I suggested (looping over rounding modes that expect the same result, asserting that I overlooked the copyright line because I hadn't met Rudi Theunissen and I just assumed that was MidnightDesign's real name 😄 I can understand others' skepticism after seeing that, but I'm pretty confident these are useful tests.
Adds assert.sameValue checks using .until().total() at the start of each test to prove the test data produces exactly x.5 years, as suggested by @ptomato. Uses .until().total() rather than .since().total() for the since tests because .total() with a negative duration traverses backward from relativeTo into a non-leap year (2018, 365 days), yielding 183/365 ≈ 0.5014 instead of 183/366 = 0.5. Verified against V8 14.8.37 via test262-harness. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
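The asymmetry described in this commit message comes down to which year window the 183 remaining days are divided by. The sketch below only reproduces that arithmetic; it does not call Temporal. Taking 2020 as the leap year is an assumption for illustration (2018 is the non-leap year named above), and `daysInYearStarting` is a hypothetical helper.

```javascript
const MS_PER_DAY = 86_400_000;

// Length in days of the one-year window starting on January 1 of `year`.
const daysInYearStarting = (year) =>
  (Date.UTC(year + 1, 0, 1) - Date.UTC(year, 0, 1)) / MS_PER_DAY;

const remainderDays = 183;
const leapWindow = daysInYearStarting(2020);   // 366 (assumed leap year)
const commonWindow = daysInYearStarting(2018); // 365 (non-leap year from the message)

console.log(remainderDays / leapWindow);   // 0.5, exactly on the boundary
console.log(remainderDays / commonWindow); // roughly 0.5014, just past the boundary
```

With the 365-day window the value is no longer a tie, so every half-* mode would round it the same way and the boundary assertion would fail.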
Added Note: the All 8 files verified against V8 14.8.37 via test262-harness (16/16 pass). (This comment was AI-generated — I don't fully understand the |
Extend roundingmode-half-boundary.js tests to cover all units where an exact 0.5 fractional boundary can be achieved, not just years. Loop over rounding modes with identical outcomes to keep tests concise.
Units by type:
- PlainDate: years, months
- PlainDateTime/ZonedDateTime: years, months, weeks, days, hours, minutes, seconds, milliseconds, microseconds
- PlainYearMonth: years (months always exact for this type)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
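For the fixed-length units, the half-way remainders can be sanity-checked with integer arithmetic in nanoseconds. This is a sketch, not the test code: the remainders are the ones quoted in the PR description (3.5 days, 12 hours, 30 minutes, and so on), the table names `NS` and `HALF` are hypothetical, and days are assumed to be 24 hours long, which need not hold for a ZonedDateTime near a DST transition.

```javascript
// Unit lengths in nanoseconds (assumes 24-hour days).
const NS = {
  weeks: 7n * 86_400_000_000_000n,
  days: 86_400_000_000_000n,
  hours: 3_600_000_000_000n,
  minutes: 60_000_000_000n,
  seconds: 1_000_000_000n,
  milliseconds: 1_000_000n,
  microseconds: 1_000n,
};

// Remainder placed on top of a whole number of units, per unit.
const HALF = {
  weeks: 3n * NS.days + 12n * NS.hours, // 3.5 days
  days: 12n * NS.hours,
  hours: 30n * NS.minutes,
  minutes: 30n * NS.seconds,
  seconds: 500n * NS.milliseconds,
  milliseconds: 500n * NS.microseconds,
  microseconds: 500n,                   // 500 ns
};

for (const unit of Object.keys(NS)) {
  // BigInt keeps the comparison exact; no floating-point error can creep in.
  console.log(unit, HALF[unit] * 2n === NS[unit]);
}
```

Every unit prints `true`. Nanoseconds are absent from the table because there is no smaller unit in which to express half a nanosecond.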
@ptomato Added tests for all units in ec3a307. I've uploaded the full assistant session since you were interested in it in your first message here: https://gist.github.com/MidnightDesign/a9fd0fffff198d6c59929e64062855a5
@MidnightDesign Thank you for your contribution here! 🙏 Please be mindful of our AI policy. Specifically:
We ask that contributors understand, and be able to explain, what they are posting. That includes code, of course. It also includes comments and posts, which should be your own writing and not the output of AI/LLMs.
@ctcpip Thank you, I will be mindful of that in the future.
Note
This PR was drafted with the help of Claude Code. Apologies if that's not welcome here — happy to revise anything by hand.
Context
We're building temporal-php, a PHP 8.4 port of the TC39 Temporal API. We run the test262 suite as part of our CI (transpiled to PHP) and also use Infection for mutation testing. Infection systematically modifies source code and checks whether the test suite catches each mutation. Out of 11,983 mutations, several hundred escaped — and many of those escaped because the upstream test262 data doesn't exercise certain code paths. This PR adds tests to close the most impactful gaps we found.
Summary
Eight new test files across four Temporal types, all exercising the exact 0.5 fractional boundary in RoundRelativeDuration:
- PlainDate (since/ and until/): years (183/366 = 0.5 in a leap year) and months (14/28 = 0.5 in February).
- PlainDateTime and ZonedDateTime (since/ and until/): years, months, weeks (3.5/7), days (12/24), hours (30/60), minutes (30/60), seconds (500/1000 ms), milliseconds (500/1000 µs), and microseconds (500/1000 ns).
- PlainYearMonth (since/ and until/): years only (months always produce exact results for this type). Uses June-starting dates because RoundRelativeDuration converts month remainders to days: Jun–Nov = 183 days in the 366-day span crossing Feb 29, 2020.
Each unit is tested with both an odd integer part (e.g. 1.5) and an even integer part (e.g. 2.5) to distinguish halfEven from halfExpand. Rounding modes with identical outcomes are looped to keep tests concise. The until tests cover the positive direction; the since tests cover the negative direction (where halfExpand and halfCeil diverge). Each scenario includes a .total() assertion to verify the duration is exactly on the 0.5 boundary.
How the gaps were found
Infection rewrites code like swapping halfExpand match arms, etc. If no test fails, the mutant "escapes." We traced ~80 escaped mutants back to the fact that all half-* rounding modes produce the same output with the current test data (no value near the 0.5 boundary), so entire match arms can be deleted or swapped without detection.
All expected values were verified against V8's Temporal implementation via test262-harness with esvu-installed V8 (d8).
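The escape mechanism can be reproduced in miniature. The sketch below is a JavaScript analogue, not the temporal-php code and not Infection itself: `makeRounder`, `killedBy`, and the test-case tables are hypothetical names, and the rounder handles positive inputs only. It shows a mutant with swapped tie-breaking arms surviving test data far from the boundary and being caught by data exactly on it.

```javascript
// Build a positive-input rounder whose tie-breaking behaviour is supplied
// per mode, standing in for the arms of a PHP `match` expression.
function makeRounder(arms) {
  return (x, mode) => {
    const lower = Math.floor(x);
    const frac = x - lower;
    if (frac < 0.5) return lower;
    if (frac > 0.5) return lower + 1;
    return arms[mode](lower); // exact tie: the only place the arms matter
  };
}

const original = makeRounder({
  halfExpand: (lower) => lower + 1, // away from zero
  halfTrunc: (lower) => lower,      // toward zero
});

// Mutant: the two arms are swapped.
const mutant = makeRounder({
  halfExpand: (lower) => lower,
  halfTrunc: (lower) => lower + 1,
});

// A suite is a list of [input, mode, expected]; an implementation is
// "killed" when at least one case fails against it.
const killedBy = (cases, impl) =>
  cases.some(([x, mode, expected]) => impl(x, mode) !== expected);

const oldCases = [[31.97, "halfExpand", 32], [31.97, "halfTrunc", 32]];
const newCases = [[1.5, "halfExpand", 2], [1.5, "halfTrunc", 1]];

console.log(killedBy(oldCases, mutant));   // false: the mutant escapes
console.log(killedBy(newCases, mutant));   // true: boundary data catches it
console.log(killedBy(newCases, original)); // false: the original still passes
```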