Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

intl402/Temporal/Plain(Date|MonthDay)/from/basic are Invalid tests #3320

Closed
FrankYFTang opened this issue Nov 19, 2021 · 36 comments · Fixed by #3321
Closed

intl402/Temporal/Plain(Date|MonthDay)/from/basic are Invalid tests #3320

FrankYFTang opened this issue Nov 19, 2021 · 36 comments · Fixed by #3321

Comments

@FrankYFTang
Copy link
Contributor

These two tests coded with values not stated/specified in any specification / proposal

https://github.com/tc39/test262/blob/main/test/intl402/Temporal/PlainDate/from/basic.js
the value "ce" and "bce" is not specified in any where in the temporal proposal nor in UTS35 or CLDR.

https://github.com/tc39/test262/blob/main/test/intl402/Temporal/PlainMonthDay/from/basic.js

Same issue

The exact value of era is not specified in the proposal YET and therefore we should NOT code test262 to test against it here. We should remove these two test until the champion specify the value of era somewhere.

@ryzokuken @sffc @ptomato @ljharb

@ptomato
Copy link
Contributor

ptomato commented Nov 19, 2021

I don't think I agree with this, though my opinion would depend on what exactly the status of the tests in the intl402/ directory is regarding implementation-definedness.

@FrankYFTang
Copy link
Contributor Author

so, where in the spec mention "ce" and "bce" now?

@sffc
Copy link
Contributor

sffc commented Nov 20, 2021

These are valid tests; the spec work for era/month codes is pending. CC @ryzokuken

@FrankYFTang
Copy link
Contributor Author

Again, could you tell me where in the spec text stated "ce" and "bce" for era code? why is not "CE" and "BCE" or "AD" and "BC" or "ad" and "bc" ? Have it been discussed and agree within the champion and/or TC39? If not, they are NOT YET valid, right?

@sffc
Copy link
Contributor

sffc commented Nov 20, 2021

This is in the polyfill and is aligned with what I recall discussing with the champions, but you are correct that I do not see this behavior actually documented anywhere yet.

Previous discussions include, e.g., tc39/proposal-temporal#901

I do think this is still a case of the polyfill and the tests being ahead of the spec.

@FrankYFTang
Copy link
Contributor Author

Even in the discussion tc39/proposal-temporal#901 I there are no mention of the exact era code lower case "ce" and "bce" there. So even if there is a spec text (which does not exit yet) already documented what you stated in that discussion, this test is STILL invalid because no body ever mention lowercase "ce" and "bce" anywhere. (not even in the dicussion 901)

@sffc
Copy link
Contributor

sffc commented Nov 20, 2021

Is the precedence in the polyfill sufficient?

@justingrant may also know if there is another source for the "ce" and "bce" era codes.

@FrankYFTang
Copy link
Contributor Author

FrankYFTang commented Nov 20, 2021

Is the precedence in the polyfill sufficient?

Of course NOT . These two tests should be removed until a spec text got agree upon at least. There are no document about using lower "ce" and "bce" anywhere. No standard use it, not part of the proposal (yet), not even in any prior discussion thread. We should remove them for now until they are officially part of the proposal.

@sffc
Copy link
Contributor

sffc commented Nov 20, 2021

These two tests should be removed until a spec text got agree upon at least.

It's standard practice for Test262 for ECMA-402 that we have ILD behavior in the test suite. The era codes are implementation-dependent behavior, at least until we have the spec from ryzokuken.

@FrankYFTang
Copy link
Contributor Author

what is "ILD behavior" ?

@FrankYFTang
Copy link
Contributor Author

All the locale depend tests in current test262 intl402 are based on at least some external sources: for example, IANA timezone db, or CLDR data. But lowercase code"ce" and "bce" are nowhere in ANY external reference standard, not in CLDR nor in UTS35 or any standards any implementation could based on. CLDR does has data file mention the UPPERCASE "CE" and "BCE" but nowhere in CLDR use lowercase "ce" and "bce" at all so no possible implementation will based on that to produce "ce" and "bce" for the implementation.

@FrankYFTang
Copy link
Contributor Author

UTS35 use "0" for BC and "1" for AD as era code in the https://unicode.org/reports/tr35/tr35-dates.html#months_days_quarters_eras
so if we want to defer the era code to the defintion in UTS35, we should use era: "0" for BC and era: "1" for AD . Notice the "BC" and "BCE" in this part are Localized value, not code for the era. The code for the era in CLDR is "0" and "1" for the case of gregorian calendar.

        <eraAbbr>
            <era type="0">BC</era>
            <era type="0" alt="variant">BCE</era>
            <era type="1">AD</era>
            <era type="1" alt="variant">CE</era>
        </eraAbbr>

@justingrant
Copy link
Contributor

Is the precedence in the polyfill sufficient?

@justingrant may also know if there is another source for the "ce" and "bce" era codes.

As @FrankYFTang implies above, I could not find a 3rd-party source for enumerated identifiers for eras. If such a thing existed, I would have used it in the polyfill. For Gregorian, I implemented 'ce' and 'bce' in the polyfill based on @sffc's request in unicode-org/icu4x#470.

I don't however think the right answer is to use numeric indexes like "0" or "1" (which is what https://unicode.org/reports/tr35/tr35-dates.html#Calendar_Data does), for the same reason we don't use numeric enumerations for other things in Intl, because it's harder to understand what the identifier does when reading or writing code. We could have used 0-3 instead of 'h23', 'h12', etc. but we didn't. I think the same applies to era identifiers too.

In theory we could try to generate these identifiers from CLDR data like what Frank excerpted above. The algorithm could be "Find CLDR's localized name of this era in English (using the "variant" option if present), then convert it to lowercase". Although this seems like a reasonable starting point to populate a first draft of era identifiers, I'd be skeptical of this for long-term maintenance because:

  • CLDR localizations can change, which would introduce interop issues between implementations
  • Last I checked, there are no English localizations for some calendars. Example:
new Date().toLocaleDateString(undefined, {calendar: 'ethiopic'})
// => '3/11/2014 ERA1'
  • Using numbers may induce developers to make assumptions around numeric values (e.g. they are in temporal order, there are no gaps, etc.) that may not hold true, esp. in Japanese where future eras will be created and where AFAIK historical eras are occasionally added in response to scholars' work with historical sources.
  • Using number identifiers was strongly rejected in @Manishearth's doc for Japanese era identifiers.

I think there's value in standardizing era names, but I don't think that a lack of standard names should make us use numeric identifiers. My opinion is that "unstandardized but readable" is better than "numeric IDs from CLDR". To be clear, I'd support standardizing identifiers, but I don't think we should use numbers just because there's no standard yet.

@Manishearth has given era identifiers a lot of thought for Japanese eras so I'd be interested in hearing his opinion about how to define and maintain identifiers for built-in era names for all calendars.

@FrankYFTang
Copy link
Contributor Author

FrankYFTang commented Nov 22, 2021

Just want to be clear, this is issue is NOT about what ear code should be, but the request to remove the test UNTIL Temporal spec clearly spec out what it is. We should MOVE the discussion of how should Temporal spec out the era code in Temporal. The only question we need to answer is very simple -
"AS Nov 22, 2021 , does Temporal spec text include a definition of using lowercase "ce" and "bce" as era code?"
IF the answer is YES, we should keep these tests now,
IF the anser is NO , we should remove these tests now.

We can surely add back tests against era code AFTER they got defined in the Temporal. But as today, I see no spec text stated, that. Do you agree about that? All other questions are beyond this issue.

@Manishearth
Copy link

@justingrant so given the complexities of Japanese era naming my hope was that we would define era codes as a per-calendar deal and in the spec (or somewhere) define how the mapping worked. Of course Japanese will need a bit of an algorithm for the mapping (though it can also be displayed as a table that gets extended every now and then), and for the other calendars we can just show a small 1-3 element table. We should pick some identifier that looks like the era name, ideally "ce" and "bce" for gregorian, "be" for Buddhist, "am" for

In general I think Temporal will need calendar-specific pluggability in the spec, and ideally spec text for each calendar that plugs into those hooks. The era mappings would be a part of that.

@FrankYFTang
Copy link
Contributor Author

Can we agree the fact that currently in the Temporal spec the word "ce" and "bce" are not mentioned, nor in any defined value for KEY in UTS35 or CLDR?

@FrankYFTang
Copy link
Contributor Author

These are valid tests; the spec work for era/month codes is pending. CC @ryzokuken

If the spec work is "PENDING" as you said, then the validaity of the test is also "PENDING", and therefore not valid YET.

@FrankYFTang
Copy link
Contributor Author

FrankYFTang commented Nov 22, 2021

What I am asking here is we follow a right procedure-

  1. agree upon what the spec is and put the agree upon spec into a PR and merge into Temporal proposal
  2. add the tests to test262 based on the agreed spec

We should NOT reverse the order here. If we do not have 1 we should not have 2. If we have tests but not 1, we should remove these tests because the end result of 1 MAY or MAY NOT be the same as currently got put into the tests prematurely.

@Manishearth
Copy link

Yeah, I agree with Frank about the order of the tests: we should spec first and then write tests, otherwise the current ambiguity is hard on implementors.

@justingrant
Copy link
Contributor

I have no objection to removing tests until the table that @Manishearth describes (excepted below) is complete and added to the spec. I assume that would be an editorial change because the current spec is not implementable without it? FYI @ptomato @gibson042

for the other calendars we can just show a small 1-3 element table. We should pick some identifier that looks like the era name, ideally "ce" and "bce" for gregorian, "be" for Buddhist

Agreed. @ryzokuken are you the lucky person who's going to build this table as part of your existing calendar-focused refactoring of the 402 spec? Or should this be a separate issue to track?

@justingrant
Copy link
Contributor

BTW, the only outcome that I wouldn't support is changing tests to some arbitrary value like "0" or "1". Either we should have tests for the values we know are gonna be in the spec, or we should have no tests.

@FrankYFTang
Copy link
Contributor Author

I am glad you agree we should remove these tests now until the issue got resolved and spec out. We need to add some tests back later, but we have to have a agree upon spec text first.

@ryzokuken
Copy link
Member

Sorry for being late to the party, but I agree with Frank here, I have WIP work that will specify these values and we can keep the tests on hold until it's done.

Agreed. @ryzokuken are you the lucky person who's going to build this table as part of your existing calendar-focused refactoring of the 402 spec? Or should this be a separate issue to track?

Yes, this is the current plan.

BTW, the only outcome that I wouldn't support is changing tests to some arbitrary value like "0" or "1". Either we should have tests for the values we know are gonna be in the spec, or we should have no tests.

I agree completely. We should temporarily disable or remove the tests and re-add them once the spec is finished.

@Manishearth
Copy link

I have WIP work that will specify these values and we can keep the tests on hold until it's done.

Mind sharing the WIP with me? I'm facing a similar problem in ICU4X and they're initially internal data model values so it doesn't matter as much but it's worth trying to pick values in rough alignment with Temporal.

@ryzokuken
Copy link
Member

@Manishearth you meant for era codes and such?

@Manishearth
Copy link

@ryzokuken yeah

@ptomato
Copy link
Contributor

ptomato commented Nov 23, 2021

I would still rather not remove these tests, because in the end it'll come down to being my responsibility to keep track that they exist and were deleted, and to resubmit them when the corresponding ECMA-402 parts are added. I've already got plenty of things on my plate in service of the larger goal of complete test coverage for Temporal, without keeping track of tests that need to be reverted and re-added. So for that reason, I'd really like to ask for some flexibility from the consumers of test262 in this regard, to tolerate having these underspecified tests while we close this gap.

However, I can tell my opinion is squarely in the minority here, so if people really don't want to do that, then I'll adapt 🤷

@srl295
Copy link
Member

srl295 commented Nov 23, 2021

BTW, the only outcome that I wouldn't support is changing tests to some arbitrary value like "0" or "1". Either we should have tests for the values we know are gonna be in the spec, or we should have no tests.

thanks @ryzokuken for the ping.

summary: I'd recommend using arbitrary values such as 0 or 1.

just as a comment… i fail to see the utility in stabilizing such identifiers. i think they should come from users (UI input) and/or be queried based on metadata. (iterate over all eras with a rich metadata api that tells you what they are for.) Otherwise you'll be reinventing parse-without-parse of era names, and within cldr there definitely issues around the identity of eras. I appreciate 262/402, but I think this is also a case of overspecification of what should come from data, and vary with data.

@ryzokuken
Copy link
Member

Thanks @srl295!

For context, I'd been talking to @srl295 about eras and the task of coming up with identifiers for each era (CLDR already supports indices as Frank pointed out but no identifiers).

@FrankYFTang
Copy link
Contributor Author

without keeping track of tests that need to be reverted and re-added.

You are ASSUMING the final spec would be exactly as what you currently put into the test.

I am ASSUMING the final spec MAY NOT be exactly as what you currently put into the test.

If the final spec text is changed to use "CE" and "BCE" (uppercase not lowercase) or any other values, you still need to change all these tests anyway and therefore you still need to track it, simply because the fact that this part of spec is not yet specified.

@justingrant
Copy link
Contributor

However, I can tell my opinion is squarely in the minority here, so if people really don't want to do that, then I'll adapt 🤷

You're doing a lot of work here, so my opinion is whatever will make your job easier. What about using expected-failures.txt to ignore those tests until we can get identifiers into the spec?

@justingrant
Copy link
Contributor

justingrant commented Nov 24, 2021

i think they should come from users (UI input) and/or be queried based on metadata. (iterate over all eras with a rich metadata api that tells you what they are for.) Otherwise you'll be reinventing parse-without-parse of era names, and within cldr there definitely issues around the identity of eras.

I think it'd be helpful to clarify the use cases where eras will be used:

  1. Era entry in UI
  2. Era display in UI
  3. Era initialization in code, e.g. when setting a default era for UI display, or for initializing a date with an era/eraYear pair.
  4. Debugging, e.g. viewing era identifiers in an IDE debugger, in logs, or in persisted JSON.

For use cases (1) and (2), it doesn't matter what era identifiers we use, because only localized strings (not identifiers) will be shown to the user. However, use cases (3) and (4) are much easier when using human-readable strings because reading, writing, and debugging code gets easier if the code is more self-describing without having to look up values in the docs. This is especially true for niche features like eras that most developers will probably never use, which IMHO makes self-describing-ness more important.

Also, using strings prevents developers from being tempted to assume common behavior across calendars (e.g. 1 means "the latest era") which won't apply to some calendars like 'japanese'. It also prevents trying to use eras for chronological sorting which may not work depending on how 'japanese' eras are assigned, or if negative values for eraYear are used in the source data.

i fail to see the utility in stabilizing such identifiers.

Other than making code easier to read, write, and debug, another reason to use stable alphanumeric identifiers is because (unlike other platforms like C) ECMAScript built-ins typically uses alphanumeric identifiers for similar kinds of metadata, e.g. names of calendars, numbering systems, options for displaying date/time values, etc.

Are there any examples of ECMAScript built-in objects where a numeric lookup table is used instead of a human-readable value? I looked but couldn't find any. We could of course establish a new pattern, but the TC39 committee sets a very high bar to deviate from how the language already accomplishes similar tasks.

For the Japanese calendar we already have consensus on an alphanumeric scheme. Are you suggesting to change this to also use numeric identifiers? Could you and @Manishearth (who drove the Japanese era identifier decision) sync up on this issue. We should use a consistent scheme for all calendars, IMHO.

In all other (non-Japanese) supported calendars, the maximum number of eras is three, no new eras have been created in the last 100 years, and only one supported calendar ('roc') has had an era created in the last 1700 years. In other words, non-Japanese eras are effectively constants, just like constant enumerations like hourCycle: "h11", "h12", "h23", "h24" which are based on much more recent conventions. Why would the latter use strings but the former use numbers?

Otherwise you'll be reinventing parse-without-parse of era names

Not sure I understand. It's an equality comparison, not parsing. There's no fuzziness supported. Could you clarify what you had in mind?

and within cldr there definitely issues around the identity of eras.

My understanding is that these issues (some of which I'm painfully familiar with) are about the era data (start and stop dates), not era identifiers. Would naming an era 1234 instead of 'meiji' in source code have made these issues less bad?

@sffc
Copy link
Contributor

sffc commented Nov 24, 2021

summary: I'd recommend using arbitrary values such as 0 or 1.

This was my preference originally, but @Manishearth and others convinced me otherwise.

just as a comment… i fail to see the utility in stabilizing such identifiers. i think they should come from users (UI input) and/or be queried based on metadata. (iterate over all eras with a rich metadata api that tells you what they are for.)

You are talking about eras as something that is returned from a UI input or an iterator. This implies that you agree that eras are entities. Since they are entities, programs need a way to talk about them during interchange, such as from a date picker UI to a Temporal program. The identifiers serve that purpose. The identifiers could be numbers, as you suggested, or they could be strings, but either way, they are identifiers for entities.

Otherwise you'll be reinventing parse-without-parse of era names

Has this been "invented" before? I'd love to see it. As far as I know, the only precedent are CLDR's numbers.

and within cldr there definitely issues around the identity of eras.

These issues are primarily in calendars with a lot of eras that change over time, like Japanese. @Manishearth's document proposes multiple ways to resolve the identity issue, such as when new eras or added or old eras are discovered. The solution that reached consensus when this was discussed in ICU4X/Temporal earlier this year was to use 4-digit ISO year annotations, like "reiwa-1583" for a hypothetical Reiwa era that started in ISO year 1583.

I appreciate 262/402, but I think this is also a case of overspecification of what should come from data, and vary with data.

We haven't needed this in ECMA-402 before because we didn't talk about eras as entities; we only talked about them as a field in DateTimeFormat. But now Temporal is enabling us, by design, to talk about date arithmetic in non-Gregorian calendar systems. This is why we are now crossing this bridge.

@ptomato
Copy link
Contributor

ptomato commented Nov 24, 2021

It seems like this discussion should take place on an issue such as tc39/ecma402#541

@FrankYFTang
Copy link
Contributor Author

This is a test262 issue. I think we should only discuss are these two tests valid AS TODAY with the CURRENT Temporal proposal. It is clear that there will be some discussion and debate in tc39/ecma402#541 to resolve that and the final spec may or may not be the same as what expect in these two tests. So I think we should just remove these two invalid test and discuss that somewhere else, and once we reach agreement, spec out the spec text and add the test accordingly. There is a bar for what should be accepted into test262- expectation which is not based on the current spec text (the proposal) should not be kept in test262 unless we know a quick fix for the spec text is coming soon. (which is NOT the case as I understand, I see no PR for the spec change yet)

@ptomato
Copy link
Contributor

ptomato commented Nov 29, 2021

I agree, I meant that this is the place to discuss the test262 issue and tc39/ecma402#541 is the place to discuss whether to use "0" or "bce" or "BC" for the era identifiers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants