Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow access to skipped whitespace #47

Open
codeZeilen opened this issue Jul 6, 2020 · 13 comments
Open

Allow access to skipped whitespace #47

codeZeilen opened this issue Jul 6, 2020 · 13 comments

Comments

@codeZeilen
Copy link
Member

In order to support Pretty Printers, the CST should allow access to whitespace. The whitespace does not have to be explicitly materialized in the CST but could also be re-constructed after the fact.

Here are two suggestions:

  • CST nodes support the method skippedSpaces that returns the spaces skipped before the node was applied. There are two options how the spaces could be returned:
    • The spaces are returned as a simple string
    • The spaces are returned as a spaces node
  • CST nodes support the method childrenIncludingSpaces that returns a collection of all parsed nodes including the space nodes in between

Any thoughts from your side @Paula-Kli?

@Paula-Kli
Copy link

Paula-Kli commented Jul 7, 2020

From our perspective skippedSpaces returning a spaces node would be the better option. Since than you can apply this method in the value:-Method of OhmAttributes. In Addition you can then apply value: to the returned spaces node and using this method one can evaluate it.

What we don't really get is your second suggestion: what do you mean by "a collection of all parsed nodes"? It is not a new CST right?

@codeZeilen
Copy link
Member Author

In addition to the children method wthere would be the childrenIncludingSpaces method that returns the children as if the spaces were parsed from the start on. For example

aNode children -> {leftHandNode . operatorNode . rightHandNode}

aNode childrenIncludingSpaces -> {leftHandNode . spacesNode . operatorNode . spacesNode . rightHandNode}
"I still have to figure out whether there should be a spaces node at the first position in that array

Instead of getting the skippedSpaces for a specific node, this interface allows for a more "natural" tree traversal. However, it might be very difficult to implement correctly... :)

@Paula-Kli
Copy link

Ah alright I see!
In addition to saying it might be more difficult to implement correctly we still think skippedSpaces would be a more useful method.
Thanks a lot for asking!

@codeZeilen
Copy link
Member Author

Please have a look / try out the new methods added in 31c8c08. They are a first sketch of how this could work and not all corner cases are covered yet. If this works for you, I'll see to getting it right :)

@felixauringer
Copy link

felixauringer commented Jul 11, 2020

Thanks a lot for prototyping this so quickly. In general it works as we have expected it.

The prototype is based on the assumption that nodes that are next to each other in the CST are also neighbours in the original string. However we see a problem with rules that contain a * or a + (and maybe more):

grammar := OhmGrammar new: 'OhmNodeTestGrammar {
	StartRule = (";" firstRule)+
	firstRule = "a"
	space += comment
	comment = "\"" (~"\"" any)* "\""
}'.

result := (grammar
	match: ';a ; "comment" a'
	startingFrom: #StartRule) cst.

Here are parts of the CST that is produced with the code above:

If we want to retrieve the comment before the last a in the input string with

result children last children last skippedSpacesNodes

your code takes the other a from the _list-node as preceding node so the space interval ranges from 3 to 15 and also includes a ;.
When we had a look at how to get the comments back before we asked you, we stumbled upon this problem as well and didn't know how to get the right nodes.

We really appreciate your effort! Thanks a lot!

@codeZeilen
Copy link
Member Author

Ah! Thanks a lot! I suspected something like this but didn't have the time to think this through. I'll look into this case.

@codeZeilen
Copy link
Member Author

Please have a look at 9b85725 :) Again it is more of a sketch as I currently do not have time to think this trough. The current approach might still miss cases but is easier to reason about (but might take a little longer on the first call due to the construction of the source map).

@felixauringer
Copy link

When I try to send the message skippedSpacesString, I get the error MessageNotUnderstood: Array>>findFirst:startingAt: which also occurs in your tests. The message can easily be implemented but I do not think that our package would be the place for it. After implementing the message everything seems to work as expected 👍
Furthermore in the following snippet occurs an error with a node that is outside of the interval (at size + 1):

result := (OhmExplicitSendsSmalltalk
	match: 'header <pragma>.'
	startingFrom: #MethodDeclaration) cst.
spaces := result children second skippedSpacesString

This produces an error when trying to access the source map (index out of bound). It seems to be related to #43 because there are nodes of optional rules that have an interval outside the interval of method declaration:
image

@codeZeilen
Copy link
Member Author

Thanks for the PR! I will look at it Monday. Also: Sorry for not advancing this further. The week was somewhat busy. I am working on #48 currently, which resulted in a major rework of the whitespace skipping and might also help with the Optional problem (all maybe instances of skipped whitespace although the rule actually failed, but the parse succeeded as the rule was optional/negated/lookahead).

@felixauringer
Copy link

felixauringer commented Jul 24, 2020

We looked further into the problem with wrong intervals. As you can see in the following screenshot, the interval of the last node also covers whitespace characters (including comments) at the end of the method.
image

This causes problems when we try to extract comments after the last node with the source map.

@codeZeilen
Copy link
Member Author

I actually have a solution to this in my local image. The space skipping logic was inconsistent throughout nodes, as I did not completely understand the rule when to do it when I first implemented it. I now adjusted it but I did not manage to adjust all tests accordingly before the exam preparations started. I hope I can attend to it beginning of next week. It should resolve a number of issues...

@felixauringer
Copy link

That sounds really great! We really appreciate your work for this issue :)

@codeZeilen
Copy link
Member Author

I uploaded the intermediate state of the work on a new branch called new-whitespace-handling. You can try that one. I would advise loading it into a fresh image though... It works okay in my image so far, but not all tests are green so something is still off. I think I can attend to it Tuesday afternoon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants