@blindFS
Contributor

@blindFS blindFS commented Nov 25, 2025

Makes things easier for blindFS/topiary-nushell#39 (comment).

Also fixes a minor issue with empty command_list.

Although this PR introduces 6 more conflicts, it hardly affects lex state counts or WASM compile time, so it should be safe to merge.

@blindFS
Contributor Author

blindFS commented Nov 25, 2025

@mkatychev, sorry to ping you on an unrelated PR. I'm confused by the failing CI pipeline. Do you happen to know what's going on?

@mkatychev
Contributor

I'll look into it more tomorrow, but I've encountered similar messages before, and it was an issue with package-lock.json.

@mkatychev
Contributor

@blindFS I finally narrowed down the problem: until tree-sitter/node-tree-sitter#258 is merged, node basically isn't compatible with 0.25, since there are breaking ABI changes that haven't been addressed.

So either temporarily drop node support or temporarily downgrade the ABI.

@blindFS
Contributor Author

blindFS commented Nov 26, 2025

> @blindFS I finally narrowed down the problem: until tree-sitter/node-tree-sitter#258 is merged, node basically isn't compatible with 0.25, since there are breaking ABI changes that haven't been addressed.
>
> So either temporarily drop node support or temporarily downgrade the ABI.

Thanks a lot, I'll disable test-node for now.

BTW, it would be super helpful if you could review this PR in your spare time.

A lot of conflicts for this minor edge case is kinda silly, but I've really run out of ideas here.

The basic idea behind this PR is that, previously, there was a list_body node in empty lists like

[
,
]

which covers the commas and newlines between the brackets. That's somewhat annoying for topiary, so this PR makes this kind of empty body content anonymous.

The general_body_rules function helps generate this kind of repeated pattern with separators in between, which is especially helpful when extra newlines are allowed.
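
For readers who haven't opened the diff, here is a rough sketch of what a helper of this kind can look like. It is not the actual general_body_rules from this PR: separatedBody is a hypothetical name, and the seq/repeat/choice/optional functions are plain-object stand-ins (assumptions for illustration) so the snippet runs under plain Node. In a real grammar.js they are globals provided by the tree-sitter CLI.

```javascript
// Plain-object stand-ins for the tree-sitter DSL so this runs under plain Node.
// In a real grammar.js these are globals supplied by the tree-sitter CLI.
const seq = (...members) => ({ type: "SEQ", members });
const choice = (...members) => ({ type: "CHOICE", members });
const repeat = (content) => ({ type: "REPEAT", content });
const optional = (content) => choice(content, { type: "BLANK" });

// Hypothetical helper, NOT the PR's general_body_rules.
// Builds: newline* entry (newline* separator newline* entry)* separator? newline*
function separatedBody(entry, separator, newline) {
  const gap = repeat(newline);
  return seq(
    gap,
    entry,
    repeat(seq(gap, separator, gap, entry)),
    optional(separator),
    gap
  );
}

// Placeholder symbols; a real grammar would pass $.val_entry and friends.
const body = separatedBody(
  { type: "SYMBOL", name: "val_entry" },
  { type: "SYMBOL", name: "_entry_separator" },
  { type: "SYMBOL", name: "_newline" }
);
console.log(JSON.stringify(body, null, 2));
```

The point of the sketch is only that every place a newline may appear has to be spelled out by hand once newlines are not in extras, which is where the extra rules and conflicts come from.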

@fdncred
Contributor

fdncred commented Nov 26, 2025

ping me when you're ready to land it.

@blindFS
Contributor Author

blindFS commented Nov 26, 2025

> ping me when you're ready to land it.

Sure

@mkatychev
Contributor

mkatychev commented Nov 26, 2025

@blindFS I'll post my general comments initially:

  • getting rid of the general_body_rules function would remove a lot of indirection
  • adding newlines to extras would remove the need for most of this PR (ignoring regressions in other rules ofc) [1].

I managed to greatly simplify the $._collection_body node while adding support for commas inside of collection types without regressions in #235:

In general, whitespace does not need to be handled explicitly (the choice(punc().comma, /\s/) below), because tree-sitter nodes [2] treat whitespace delimitation between nodes as opt-out [1]:

Given the whitespace point above, general_body_rules does not feel justified, and removing it would greatly simplify a lot of the rulesets (in my opinion).

Ideally a list node should have a simple definition:
list: ($) => seq('[', repeat1(choice(',', $._list_entry, '\n')), ']'),

...this could be reduced further by keeping newlines as part of extras and making the rules that depend on their presence [3] (such as the end of a function or statement) explicitly opt-in:
extras: ($) => [/\s/, $.comment],

Footnotes

  1. https://tree-sitter.github.io/tree-sitter/creating-parsers/3-writing-the-grammar.html#using-extras

  2. named or anonymous (like ,)

  3. token.immediate(rule)

),
general_body_rules('entry', $.val_entry, $._entry_separator, $._newline),

_list_body_or_empty: ($) =>
Contributor

@mkatychev mkatychev Nov 26, 2025

going more off of #234 (comment)

The presence of rules like _list_body_or_empty feels like a smell, because such a case should simply be handled by a repeat.

Ideally a list should have as simple a rule as possible so that most edge cases are handled within the _*_entry rule:

  • list: ($) => seq('[', repeat(choice($._list_entry, ',')), ']'),

Handling newlines explicitly where their presence does not matter feels like a design problem (exclusion from extras).
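
To make the repeat(choice(...)) shape concrete, here is a toy recognizer in plain JavaScript. It is not tree-sitter and not this repo's grammar: parseList is a hypothetical function, entries are assumed to be bare words, and nested lists are left out. It only mimics a rule of the form seq('[', repeat(choice(entry, ',')), ']') with whitespace, newlines included, skipped between tokens the way extras would skip it.

```javascript
// Toy recognizer, not tree-sitter: mimics
//   list: seq('[', repeat(choice(entry, ',')), ']')
// with all whitespace (including newlines) skipped between tokens, like `extras`.
function parseList(source) {
  // Tokens are brackets, commas, or runs of any other non-space characters.
  const tokens = source.match(/[\[\],]|[^\s\[\],]+/g) ?? [];
  const last = tokens.length - 1;
  if (tokens.length < 2 || tokens[0] !== "[" || tokens[last] !== "]") {
    throw new SyntaxError("expected a bracketed list");
  }
  const entries = [];
  for (const token of tokens.slice(1, last)) {
    if (token === "[" || token === "]") {
      throw new SyntaxError("nested lists are out of scope for this sketch");
    }
    // Commas are accepted anywhere inside the brackets and simply dropped.
    if (token !== ",") entries.push(token);
  }
  return entries;
}

console.log(parseList("[\n,\n]"));    // an "empty" list that only holds a comma
console.log(parseList("[1, 2\n3]"));  // entries separated by commas or newlines
```

With this shape there is no separate "empty body" case: a list containing only commas and newlines is just a repeat that produced no entries.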

@mkatychev
Contributor

mkatychev commented Nov 26, 2025

> > @blindFS I finally narrowed down the problem: until tree-sitter/node-tree-sitter#258 is merged, node basically isn't compatible with 0.25, since there are breaking ABI changes that haven't been addressed.
> > So either temporarily drop node support or temporarily downgrade the ABI.
>
> Thanks a lot, I'll disable test-node for now.
>
> BTW, it would be super helpful if you could review this PR in your spare time.
>
> A lot of conflicts for this minor edge case is kinda silly, but I've really run out of ideas here.
>
> The basic idea behind this PR is that, previously, there was a list_body node in empty lists like
>
> [
> ,
> ]
>
> which covers the commas and newlines between the brackets. That's somewhat annoying for topiary, so this PR makes this kind of empty body content anonymous.
>
> The general_body_rules function helps generate this kind of repeated pattern with separators in between, which is especially helpful when extra newlines are allowed.

In topiary's case, I think turning a [ , ] into [] (through @delete) would be reasonable (I'm assuming that's what you're trying to do) and should not conflict with some changes.

@blindFS
Contributor Author

blindFS commented Nov 27, 2025

@mkatychev Thanks a lot for the review. Removing newlines from extras in #139 does seem to have caused a lot of trouble.

As I recall, it mainly solved the issue of multiline binary-op.

That issue might be solvable with some careful tuning of precedence. I'll try it on the old grammar.js from before #139, but I feel pessimistic about it. I'll let you know if I run into trouble.

@mkatychev
Contributor

Please let me know if you have issues. You can generally get away with this kind of thing without resorting to scanner.c, so long as you don't maintain state (unlike something like Python, where indentation is "stateful").

@blindFS
Contributor Author

blindFS commented Nov 27, 2025

> Ideally a list node should have a simple definition:
> list: ($) => seq('[', repeat1(choice(',', $._list_entry, '\n')), ']'),

You mean getting rid of list_body? That would be a breaking change affecting many downstream projects. And we'd have to do the same for all xxx_body nodes.

@blindFS
Contributor Author

blindFS commented Nov 27, 2025

@mkatychev Oh, I think I remember what the problem was:

There's a fundamental difference between

(foo
bar # bar is the argument of command foo
)

and

foo
bar # a different command

It seems the only way to differentiate them is to handle newlines explicitly.
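
The distinction can be sketched with a toy splitter in plain JavaScript. This is only an illustration of the behaviour wanted, not how the grammar does or should implement it: splitCommands is a hypothetical function, every `#` is assumed to start a comment, and the parenthesis depth counter is exactly the kind of context a context-free rule has to encode in its structure instead.

```javascript
// Toy model, not the tree-sitter grammar: split source into commands,
// treating a newline as a terminator only when outside parentheses.
function splitCommands(source) {
  const commands = [];
  let words = [];
  let depth = 0;
  const flush = () => {
    if (words.length > 0) commands.push(words);
    words = [];
  };
  // Drop `#` comments first (assumption: every `#` starts a comment).
  const stripped = source.replace(/#[^\n]*/g, "");
  // Tokens are parentheses, newlines, or runs of other non-space characters.
  for (const token of stripped.match(/[()]|\n|[^\s()]+/g) ?? []) {
    if (token === "(") {
      depth += 1;
    } else if (token === ")") {
      depth = Math.max(0, depth - 1);
    } else if (token === "\n") {
      if (depth === 0) flush(); // newline ends a command only at top level
    } else {
      words.push(token);
    }
  }
  flush();
  return commands;
}

console.log(splitCommands("(foo\nbar # bar is the argument of command foo\n)"));
console.log(splitCommands("foo\nbar # a different command"));
```

The first call yields a single command with two words, the second yields two separate commands, which is the difference the grammar has to preserve.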

Two possibilities:

Explicit \n in parenthesized rules, to force the current pipe-element to keep matching

That's what it looked like before #139. It makes everything with higher precedence difficult, for example:

(
  ls | where $it.name != "foo"
  and $it.name != "bar"
)

Explicit \n in non-parenthesized blocks, as a terminator between pipelines

This is probably what you would expect. However, I have trouble making it work for input like:

1
# comment
| $in

The best bet I'm aware of is moving the $._pipe_separator rule to scanner.c so that it can take precedence over the \n in $._terminator, but the comments in between are still pretty annoying. Any ideas?
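
For what it's worth, the lookahead I have in mind is roughly the following, written as a plain JavaScript sketch rather than real scanner.c code. newlineContinuesPipeline is a hypothetical name, and it assumes that every `#` after the newline starts a comment running to the end of the line.

```javascript
// Toy lookahead, loosely mirroring what an external scanner could do.
// Starting just after a newline, skip whitespace and `#` comment lines,
// then report whether the next significant character is a pipe.
function newlineContinuesPipeline(source, indexAfterNewline) {
  let i = indexAfterNewline;
  while (i < source.length) {
    const ch = source[i];
    if (ch === " " || ch === "\t" || ch === "\r" || ch === "\n") {
      i += 1;
    } else if (ch === "#") {
      // Skip the comment body up to (not including) the next newline.
      while (i < source.length && source[i] !== "\n") i += 1;
    } else {
      break;
    }
  }
  // Past the end of the source, source[i] is undefined, so this is false.
  return source[i] === "|";
}

const piped = "1\n# comment\n| $in";
console.log(newlineContinuesPipeline(piped, 2));  // newline is not a terminator here
console.log(newlineContinuesPipeline("1\n2", 2)); // newline terminates the pipeline
```

The annoying part is that the skipped comments still need to show up as nodes in the tree, which this sketch does not address.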

@blindFS blindFS closed this Nov 28, 2025
@blindFS
Contributor Author

blindFS commented Nov 28, 2025

I think moving the terminator to the external scanner is a simple and feasible approach. Closing this for now.

@mkatychev
Contributor

I see your case. I think the terminator case is most similar to Python/justfile indentation:

justfile scanner.c
python scanner.c
