Add documentation example of retrieving position information of an arbitrary node #29

alberth · 2018-02-08T09:17:20Z

Hopefully reducing time spend looking for how to do X :)

Maybe move this function into the regular parglare code?

…bitrary node

coveralls · 2018-02-08T09:25:04Z

Coverage remained the same at 85.185% when pulling c329aa0 on alberth:doc-add-position-example into db66177 on igordejanovic:master.

coveralls · 2018-02-08T09:25:04Z

Coverage remained the same at 85.185% when pulling c329aa0 on alberth:doc-add-position-example into db66177 on igordejanovic:master.

coveralls · 2018-02-08T09:25:04Z

Coverage remained the same at 85.185% when pulling c329aa0 on alberth:doc-add-position-example into db66177 on igordejanovic:master.

coveralls · 2018-02-08T09:25:04Z

Coverage remained the same at 85.185% when pulling c329aa0 on alberth:doc-add-position-example into db66177 on igordejanovic:master.

igordejanovic · 2018-03-01T16:33:03Z

docs/actions.md

+def get_node_position(context, nodes, index):
+    if index < 0:
+        index = index + len(nodes) # Handle negative list indices.
+    node_startpos = context.start_position + sum(len(nodes[n]) for n in range(index))


This function assumes that all nodes have len defined which might not be true as the elements of nodes can be arbitrary objects produced by lower level actions.
The only way I see to track positions of individual nodes is to take note of them in all actions. That is handled by parglare if tree building is used.

context has a start and end position, so it must know the length of everything. Unfortunately, it does not provide a start position for each token, it only does that for the production rule as a whole. (That is, what I really want is context.token_start_position[token_index].) As a work-around, I need to calculate the offset from the production rule start to the token, so I can use the start position of the rule, and apply parglare.pos_to_line_col to get a human-readable terminal position.

I wrote this code, since in my opinion, line (and column) information is required in almost all non-trivial programs that use a parser. Parglare doesn't provide a standard function for it, so I think almost every Parglare user will have to dig through the API in the manual like I did, figuring out how to get useful position information. The example code shows how to do it in a common use case (ie parsing plain text).

While for you, parsing is the one and only job, for me, parsing is just the simple first step (takes a day or two if the language design is finished, and if it fits nicely in the parser assumptions).
Right behind it is static semantics checking and/or interpretation of the input, and error detection and reporting about the input is a dominant part there (takes several months at least rather than 2 days). In my experience, you always need line/column position from the input, with ascii-ish text being the dominant stuff being parsed, and the need to be able to fallback to dumping error messages onto stderr. (That is, incorrect type at line 32, column 54 is so much nicer than incorrect type in the type definition starting at offset 33536.)

I agree you need length of a token in my code, but I don't know how else to get a (line, column) position of an arbitrary terminal in a production rule. As a secondary consideration, in case there is no length of a token, my guess is that a (line, column) notion won't make sense either, ie, you don't need this code in that case.

I didn't look at building a parse tree, since my actions immediately abstract to an AST. I don't want a tree with all irrelevant stuff still in there, since I will have to crawl over it with manually written code rather than with your automagically generated callbacks. (That is, a parse tree is more work.)

Sorry, probably I wasn't clear enough. It's not that I think that position is not important. It is very much important to me as well as the parsing is not the only thing I do. I do stuff after parsing :)

The note I made was that the code above assumes that all objects returned from the lower level actions in the bottom-up processing have length defined which might not be true. So, this example should make that clear.

parglare idea is to be general enough and flexible to be used in different context. So I try to provide bare minimum functionality and build on top of it. For example, support for assignments and auto-building of ASTs is provided using built-in obj action which uses the same action calling mechanism as any other action.

Keeping positions of all tokens might be a significant waste of space/time if the information is not needed. If it is needed than a simple action decorator that is applied to all actions and keeps track of all tokens shouldn't be more than a few lines of code in Python. I've been doing recently some support for action decorators to make action registration easier.
It sounds like a good idea to provide decorator for position tracking in parglare, and that would be a more general approach which won't have assumption about results of the previous actions.

I typically need the position of one token in a production rule (which gets copied into the AST). I can even point out which token(s) exactly in advance. In another parser we use the @ to denote a token should be an argument in the callback, so we can retrieve its position, eg

@Override // CompoundProgram : @PROCKW @IDENTIFIERTK PAROPENTK PARCLOSETK COLONTK Body @ENDKW; public List<Declaration> parseCompoundProgram01(Token t1, Token t2, ParserBody p6, Token t7) {

It's the wrong programming language, but you get the idea :p

This parser is much more aimed at parsing text, so 'Token.position' is a line and column already, which avoids a lot of boilerplate position conversions.

Another option could be to drop all the position conversion here, and just get and store the offset of a token in the AST. Then, conversion becomes a separate problem outside parsing, although I can see a user wanting to convert the offset into position while constructing the AST rather than at error reporting time. You'd likely want to provide a simple additional API for the 'text input' case though as a service.

Ah, ok. You can easily do that now in parglare. Default actions in parglare will convert token to string it matches and non-terminal rule to a Python list. If you use assignments, obj action will be executed on those rules by default which creates object with attributes created by the LHS of assignments and values matched by RHS. The class of the object will be named after the grammar rule. Thus, you have automatic AST building out-of-the box. Position is kept in all AST created objects as _pg_start_position and _pg_end_position (that is done by obj action). The only thing you might need in addition is the position of tokens/terminal matches. That can be easily provided by overriding default shift action that converts token to a string.

For example:

@token some_token: /.../; @token some_other_token: /.../;

where token is action you provide that will take what is matched and create object with matched content and position for example. Notice that in this way you can choose which token keeps its position.

The other way to achieve similar is by converting each terminal rule to a non-terminal rule using assignments:

some_token: value=/.../; some_other_token: value=/.../;

Now, these rules are actually non-terminal rules using assignments. They will be processed by obj action and they will get _pg_start/end_position attributes. Matched value will be in the value attribute.

Great that it's already possible. The main problem might be lack of a "how to get positions in my ast" guide?

I typically don't use automagic AST generators, since the structure of the parsed text normally does not match the structure of my AST. As a simple example, consider

vardef: VAR type variables SEMICOL ; type: NAME ; variables: NAME | variables COMMA NAME ; NAME: /[A-Z][A-Za-z0-9]+/;

This declares variables in C style, ie int i, j, k; In my AST, I'd like

t = Type('int', intPos) AST = [VarDecl(t, 'i', iPos), VarDecl(t, "j", jPos), VarDecl(t, "k", kPos)]

This makes further processing of variable declarations easier. Automagic AST generation can't handle such structural changes, for the simple reason that it doesn't know what I want.

Great that it's already possible. The main problem might be lack of a "how to get positions in my ast" guide?

Yes, there is a lack of this in the docs. I'll see to provide some form of action decorator for this and document it.

I typically don't use automagic AST generators, since the structure of the parsed text normally does not match the structure of my AST. As a simple example, consider...

You got the point. Indeed, there is a need to manually handle such cases. That's why I didn't go with textX route like before where all you have is just auto-AST building. In parglare you have some sane defaults (at least sane in my view :) ) but you can override what you don't like. Here you could override default obj action for vardef and provide your own as there is the change in the structure. But for the part where the syntactic structure fits wanted AST you don't need to do anything.

I agree it's less work, but I prefer to have a proper class for each AST node, since it has room for documentation, and it serves as a stable, independent, easy to find reference, while working on semantics checking. That's likely a personal preference though, ymmv.

Add documentation example of retrieving position information of an ar…

c329aa0

…bitrary node

igordejanovic reviewed Mar 1, 2018

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add documentation example of retrieving position information of an arbitrary node #29

Add documentation example of retrieving position information of an arbitrary node #29

alberth commented Feb 8, 2018

coveralls commented Feb 8, 2018 •

edited

Loading

coveralls commented Feb 8, 2018

coveralls commented Feb 8, 2018

coveralls commented Feb 8, 2018

igordejanovic Mar 1, 2018

alberth Mar 2, 2018 •

edited

Loading

igordejanovic Mar 2, 2018

alberth Mar 2, 2018

igordejanovic Mar 2, 2018 •

edited

Loading

alberth Mar 2, 2018 •

edited

Loading

igordejanovic Mar 2, 2018

alberth Mar 3, 2018

Add documentation example of retrieving position information of an arbitrary node #29

Are you sure you want to change the base?

Add documentation example of retrieving position information of an arbitrary node #29

Conversation

alberth commented Feb 8, 2018

coveralls commented Feb 8, 2018 • edited Loading

coveralls commented Feb 8, 2018

coveralls commented Feb 8, 2018

coveralls commented Feb 8, 2018

igordejanovic Mar 1, 2018

Choose a reason for hiding this comment

alberth Mar 2, 2018 • edited Loading

Choose a reason for hiding this comment

igordejanovic Mar 2, 2018

Choose a reason for hiding this comment

alberth Mar 2, 2018

Choose a reason for hiding this comment

igordejanovic Mar 2, 2018 • edited Loading

Choose a reason for hiding this comment

alberth Mar 2, 2018 • edited Loading

Choose a reason for hiding this comment

igordejanovic Mar 2, 2018

Choose a reason for hiding this comment

alberth Mar 3, 2018

Choose a reason for hiding this comment

coveralls commented Feb 8, 2018 •

edited

Loading

alberth Mar 2, 2018 •

edited

Loading

igordejanovic Mar 2, 2018 •

edited

Loading

alberth Mar 2, 2018 •

edited

Loading