Ambiguity in numeric literal formatting in test cases #252

Open
thomasrstorey opened this issue Dec 12, 2021 · 6 comments


@thomasrstorey

Howdy!
I am working on a Perl implementation of a KDL parser (which I would love to submit for the official 1.0 compliant list once it is closer to completion). One issue I am running into is that I am not clear on how to handle numeric literals correctly and consistently so that they match the expected output of the test cases. The only note on numeric formatting in tests/README.md is:

All numbers must be converted to their simplest decimal representation. That means that hex, octal, and binary must all be converted to decimals. All floats must be represented using E notation, with a single digit left of the decimal point if the float is less than 1. While parsers are required to consume different number syntaxes, they are under no obligation to represent numbers in any particular way.
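
For the radix part of that note, a minimal Python sketch of converting hex, octal, and binary literals to decimal text might look like the following. The function name `radix_literal_to_decimal` is hypothetical, and the sketch assumes the literals carry the `0x`, `0o`, and `0b` prefixes and may contain `_` separators; it is not taken from any KDL implementation.

```python
def radix_literal_to_decimal(literal: str) -> str:
    """Convert a prefixed hex/octal/binary literal to decimal text.

    Assumption: the literal starts with 0x, 0o, or 0b (optionally signed).
    Passing base 0 makes int() infer the radix from that prefix, and int()
    also accepts single underscores between digits.
    """
    return str(int(literal, 0))


hex_text = radix_literal_to_decimal("0xdead_beef")
octal_text = radix_literal_to_decimal("0o17")
binary_text = radix_literal_to_decimal("-0b1010")
```

Plain decimal literals should not go through this helper: with base 0, `int()` rejects decimal strings with leading zeros such as `010`.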

So for example, why does:

node 1e10

Result in:

node 1E+10

instead of:

10000000000

? What is the definition of simplest decimal representation? If a parser is under no obligation to represent a number in any particular way, what is intrinsic to the value derived from 1e10 that means it should be represented with E notation instead of as a simple decimal integer?

Thanks!

@zkat
Member

zkat commented Dec 12, 2021

I believe these instructions are for how to run things against the test suite, because of the specific output the test suite expects. You can represent things and even present them in your own API however you want. But your test-specific printer has to represent things in this way for the test suite to work. Does that make sense?

Likewise, there's a ton of flexibility in how your parser handles numbers. You can keep hex numbers as hex numbers (for example, if you're writing a parser that's meant to edit kdl files programmatically), etc.

@thomasrstorey
Author

thomasrstorey commented Dec 13, 2021

Thanks for the quick response! That does make sense, but I am not sure if it answers my question...
After studying the test cases and the wording a bit more closely, I will try to express my confusion more clearly. :)

For no_decimal_exponent.kdl, the input is:

node 1e10

and the expected output is

node 1E+10

However, for positive_exponent.kdl, the input is

node 1.0e+10

and the expected output is

node 1.0E+10

Now, my parser can parse both inputs fine, but they effectively result in identical internal values. So when they print, they both come out as 1.0E+10, which is wrong. Does my pretty printer need to remember that the original parsed string was formatted with a decimal point and use that to inform the output format? Or do I need to allow for controls in my pretty print API and choose to print certain values a certain way on a per-test case basis?
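
The loss of information described above can be reproduced in a few lines of Python. This is only an illustration, not part of any KDL implementation: binary floats collapse the two literals into one value, while Python's `decimal.Decimal` keeps the digits as written and so happens to print them in the two forms the tests expect.

```python
from decimal import Decimal

inputs = ["1e10", "1.0e+10"]

# As binary floats the two literals are the same value and print identically.
as_floats = [float(s) for s in inputs]
floats_equal = as_floats[0] == as_floats[1]
float_text = [repr(f) for f in as_floats]

# Decimal stores the coefficient digits and exponent as written, so the
# trailing ".0" survives parsing and shows up again when printing.
as_decimals = [Decimal(s) for s in inputs]
decimal_text = [str(d) for d in as_decimals]

print(floats_equal)   # True
print(float_text)     # ['10000000000.0', '10000000000.0']
print(decimal_text)   # ['1E+10', '1.0E+10']
```

A parser that stores numbers in a digit-preserving decimal type, or keeps the raw literal alongside the value, does not need per-test printer settings to tell these two cases apart.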

@tabatkins
Contributor

Yes, we could use a little more detail on precisely how the test suite expects numeric output to work. I had to do some slightly non-trivial work to hold on to enough information to print numbers in the form the tests expect, and I put that output form behind a flag so the default is a little simpler.

It looks like I'm passing those two tests in particular by accident, just because I'm relying on Python's native number parsing and printing, and it distinguishes ints from floats and prints them differently. I'm wondering if we should canonicalize those in the test suite to one form or the other?

@tabatkins
Contributor

In particular, right now the test suite expects:

  • numbers to be canonicalized to decimal (base 10)
  • the presence or absence of a decimal part and an exponent part to be preserved
  • the precise value of the exponent, if present, to be preserved (no canonicalizing to, say, engineering notation's "divisible by 3" exponents)
  • the exponent, when printed, always uses "E", and has a "+" or "-" in front of it

Aside from the fact that 1.0 needs to be preserved in that way, insignificant zeros are never preserved. It probably makes sense to relax that final case and instead expect insignificant zeros to always be omitted?
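
One way to read the expectations listed above is as a purely textual transformation of the original literal. The sketch below is a hypothetical helper (`format_for_tests` is not part of any KDL library) that handles decimal literals only. It assumes that trailing zeros in the fraction are dropped down to a single digit and that leading zeros are dropped from the integer part and the exponent; the test suite may differ in details not covered in this thread.

```python
import re

# Hypothetical pattern for a decimal literal: optional sign, integer digits,
# optional fraction, optional exponent. Underscores are removed beforehand.
_DECIMAL = re.compile(
    r"(?P<sign>[+-]?)(?P<int>\d+)"
    r"(?:\.(?P<frac>\d+))?"
    r"(?:[eE](?P<esign>[+-]?)(?P<exp>\d+))?"
)


def format_for_tests(raw: str) -> str:
    """Reprint a decimal literal, keeping which parts were present."""
    m = _DECIMAL.fullmatch(raw.replace("_", ""))
    if m is None:
        raise ValueError(f"not a decimal literal: {raw!r}")

    out = "-" if m["sign"] == "-" else ""
    out += m["int"].lstrip("0") or "0"
    if m["frac"] is not None:
        # Keep the decimal part, but strip insignificant trailing zeros.
        out += "." + (m["frac"].rstrip("0") or "0")
    if m["exp"] is not None:
        # Keep the exponent value; always print "E" and an explicit sign.
        out += "E" + (m["esign"] or "+") + str(int(m["exp"]))
    return out


print(format_for_tests("1e10"))      # 1E+10
print(format_for_tests("1.0e+10"))   # 1.0E+10
print(format_for_tests("1.500e-3"))  # 1.5E-3
```

Because this works on the literal text, it requires the parser to keep the raw token (or the equivalent flags) next to the numeric value.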

@thomasrstorey
Author

Thanks @tabatkins those expectations and your explanation of your approach are very helpful!

I would like to avoid tightly coupling the behavior of my pretty-printer to the specific formatting observed by my parser, and would rather control the output of my pretty-printer explicitly by configuration. It sounds like there isn't any specific reason not to do so, as long as the parser can parse the input files and is capable of representing them in the way shown in the expected output.

@tjol
Contributor

tjol commented Oct 7, 2022

FWIW, in ckdl I don't keep track of significant digits, or of whether scientific notation was used, but, for the purposes of running the test suite, configured float output to:

  • Always print a decimal point
  • Use a capital E for the exponent, if present
  • Always print a + or - after the E, if present
  • Use an exponent starting at E+/-4

This covers all tests except no_decimal_exponent.kdl, which happens to rely on keeping track of significant digits. (That test I've excluded from the ckdl test suite)
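
A value-only formatter along these lines can be sketched in Python. This is not ckdl's code (ckdl is written in C); `format_float` and its `threshold` parameter are hypothetical, finite inputs are assumed, and "starting at E+/-4" is interpreted here as switching to exponent form once the leading digit's power of ten reaches 4 in magnitude.

```python
from decimal import Decimal


def format_float(x: float, threshold: int = 4) -> str:
    """Format a finite float with a decimal point and an E+/-N exponent."""
    # repr() yields the shortest round-tripping digits; normalize() then
    # drops trailing zeros from the coefficient.
    d = Decimal(repr(x)).normalize()
    sign, digits, _ = d.as_tuple()
    adjusted = d.adjusted()  # power of ten of the leading digit

    if abs(adjusted) >= threshold:
        text = "".join(str(n) for n in digits)
        mantissa = text[0] + "." + (text[1:] or "0")
        exp_sign = "+" if adjusted >= 0 else "-"
        return f"{'-' if sign else ''}{mantissa}E{exp_sign}{abs(adjusted)}"

    fixed = format(d, "f")  # plain positional notation, sign included
    return fixed if "." in fixed else fixed + ".0"


print(format_float(1e10))     # 1.0E+10
print(format_float(1234.5))   # 1234.5
print(format_float(12345.0))  # 1.2345E+4
print(format_float(0.0001))   # 1.0E-4
```

As in the comment above, a formatter of this kind always prints `1.0E+10` for the value 1e10, so it cannot reproduce the `1E+10` expected by no_decimal_exponent.kdl.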
