Commit 6f6e46e

authored Jan 13, 2025
Migrate tree-sitter support to MODULE.bazel (carbon-language#4783)
The WORKSPACE file is deprecated; support is already off by default, and it'll be removed in the next major bazel release. Our main dependency on WORKSPACE is tree-sitter, and I'm trying to address that here.

We're currently using https://github.com/elliottt/rules_tree_sitter, but that hasn't been updated in a couple of years, meaning it lacks MODULE.bazel support. In the registry, there's https://registry.bazel.build/modules/tree-sitter-bazel, but this is only the *parser* libraries of tree-sitter, not the *generator*. I'm using it for that much, at least.

For the *generator*, which transforms grammar.js to parser.c/h, I'm just requiring a non-hermetic invocation (i.e., people who want to work on it will need to install tree-sitter; see the README.md updates). I tried running it manually, but parser.c is about 600 KB; pre-commit rejects files that large, and I don't think it makes sense to add an exception for this (it'd probably also grow substantially if the grammar were updated to cover more syntax).

To keep the non-hermetic call from breaking "bazel build //..." for most developers, I'm marking most targets in the package as manual.

Note, I did look long and hard at using `aspect_rules_js`/`rules_nodejs` to invoke npm. This took a lot of time, and I have a commit that's mostly working, except I hit a point where it uses `declare_symlink`, which we disallow for compatibility reasons (commit "Lots of work for figuring out rule_js uses declare_symlink" on the PR). As a consequence, I think we can't use the primary supported ways to have hermetic npm calls.

Also, `treesitter` -> `tree_sitter` because it's generally called `tree-sitter`, two words. We even had a `treesitter/src/tree_sitter` directory, so it was a bit inconsistent.

As far as bugs here, the parser library breaks bazel queries, e.g., with the error:

```
ERROR: Evaluation of query "somepath(//..., @llvm-project//third-party/unittest:gtest)" failed: preloading transitive closure failed: no such package '@@[unknown repo 'platforms' requested from @@tree-sitter-bazel+]//': The repository '@@[unknown repo 'platforms' requested from @@tree-sitter-bazel+]' could not be resolved: No repository visible as '@platforms' from repository '@@tree-sitter-bazel+'
```

I'm just excluding tree_sitter from queries where I can to work around the error.
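The query workaround mentioned in the message can be sketched in Python, the language of the repository's scripts. This is a minimal illustration under stated assumptions, not the code from this commit: the helper names `exclude_from_pattern` and `somepath_query` are hypothetical, and only the `except` operator of the Bazel query language and the labels themselves come from the change.

```python
def exclude_from_pattern(pattern: str, excluded: str) -> str:
    """Builds a Bazel query expression for `pattern` minus `excluded`.

    Hypothetical helper: it only formats a string that uses the query
    language's `except` operator.
    """
    return f"{pattern} except {excluded}"


def somepath_query(start: str, end: str) -> str:
    """Formats a `somepath(start, end)` query expression (hypothetical helper)."""
    return f"somepath({start}, {end})"


# Labels copied from the diff; the tree_sitter package is excluded because
# evaluating it fails on `@platforms`.
TREE_SITTER_PACKAGES = "//utils/tree_sitter/..."
GTEST_LABEL = "@llvm-project//third-party/unittest:gtest"

query = somepath_query(
    exclude_from_pattern("//...", TREE_SITTER_PACKAGES), GTEST_LABEL
)
print(query)
```

The resulting string is what would be passed as the argument to `bazel query`; keeping the exclusion in one helper would make it easier to drop once the `@platforms` resolution issue is fixed.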
1 parent 0d70091 commit 6f6e46e

33 files changed: +200 -149 lines changed

‎.bazelrc

+8 -16
@@ -57,11 +57,6 @@ build --action_env=CMAKE_SYSROOT --host_action_env=CMAKE_SYSROOT
 build --per_file_copt=external/.*\.(c|cc|cpp|cxx)$@-w
 build --host_per_file_copt=external/.*\.(c|cc|cpp|cxx)$@-w

-# The `rules_treesitter` synthesized libraries don't allow us to inject flags,
-# and compile generated code where we can't fix warnings.
-build --per_file_copt=utils/treesitter/_treesitter.tree_sitter/.*\.c$@-w
-build --host_per_file_copt=utils/treesitter/_treesitter.tree_sitter/.*\.c$@-w
-
 # Default dynamic linking to off. While this can help build performance in some
 # edge cases with very large linked executables and a slow linker, between using
 # fast linkers on all platforms (LLD and the Apple linker), as well as having
@@ -128,17 +123,6 @@ build --allow_unresolved_symlinks=false
 # RC file here if present.
 try-import %workspace%/user.bazelrc

-# Incompatible with `rules_tree_sitter`.
-# TODO: WORKSPACE will be removed in bazel 9, and we need to move off.
-# TODO: The registry has a different treesitter rule set, and we should
-# investigate switching. See:
-# https://registry.bazel.build/modules/tree-sitter-bazel
-common --enable_workspace
-# This is on by default in bazel 8.
-common --incompatible_disallow_empty_glob=false
-# common --incompatible_auto_exec_groups
-# common --incompatible_disable_starlark_host_transitions
-
 # This excludes things like rules_android to reduce warnings.
 # TODO: There's a pending fix, so hopefully we can remove it soon.
 # - Issue: https://github.com/bazelbuild/bazel/issues/23929
@@ -150,11 +134,19 @@ common --incompatible_autoload_externally=+@rules_java,+@rules_python,+@rules_sh
 # TODO: Enable the flag once compatibility issues are fixed.
 # common --incompatible_disable_non_executable_java_binary

+# Incompatible with the clang-tidy build mode.
+# TODO: Enable the flag once compatibility issues are fixed.
+# common --incompatible_auto_exec_groups
+
 # Incompatible with `rules_cc`.
 # TODO: Enable the flag once compatibility issues are fixed.
 # common --incompatible_no_rule_outputs_param
 # common --incompatible_stop_exporting_language_modules

+# Incompatible with `rules_flex`.
+# TODO: Enable the flag once compatibility issues are fixed.
+# common --incompatible_disable_starlark_host_transitions
+
 # Incompatible with `rules_pkg`.
 # TODO: Enable the flag once compatibility issues are fixed.
 # common --incompatible_disable_target_default_provider_fields

‎.gitignore

+1 -1
@@ -35,7 +35,7 @@
 # vim temporary files
 .*.sw[a-p]

-# generated by utils/treesitter/helix.sh
+# generated by utils/tree_sitter/helix.sh
 /.helix/

 # Ignore .DS_Store files

‎MODULE.bazel

+5 -4
@@ -28,12 +28,13 @@ http_archive = use_repo_rule(
     "http_archive",
 )

-bazel_dep(name = "bazel_skylib", version = "1.7.1")
-bazel_dep(name = "rules_pkg", version = "1.0.1")
 bazel_dep(name = "abseil-cpp", version = "20240722.0.bcr.2")
-bazel_dep(name = "re2", version = "2024-07-02.bcr.1")
-bazel_dep(name = "googletest", version = "1.15.2")
+bazel_dep(name = "bazel_skylib", version = "1.7.1")
 bazel_dep(name = "google_benchmark", version = "1.8.5")
+bazel_dep(name = "googletest", version = "1.15.2")
+bazel_dep(name = "re2", version = "2024-07-02.bcr.1")
+bazel_dep(name = "rules_pkg", version = "1.0.1")
+bazel_dep(name = "tree-sitter-bazel", version = "0.24.4")

 # The registry only has an old version. We use that here to avoid a miss but
 # override it with a newer version.

‎MODULE.bazel.lock

+32
Some generated files are not rendered by default.

‎WORKSPACE

-36
This file was deleted.

‎bazel/check_deps/BUILD

+5 -2
@@ -18,14 +18,17 @@ filegroup(
         "//migrate_cpp:rewriter",
         "//migrate_cpp/cpp_refactoring",
         "//toolchain/install:carbon-busybox",
-        "//utils/treesitter",
+        # The tree sitter rules can't be queried; evaluation fails on
+        # @platforms.
+        # "//utils/tree_sitter",
     ],
     tags = ["manual"],
 )

 genquery(
     name = "non_test_cc_deps.txt",
-    expression = "kind('cc.* rule', deps(//bazel/check_deps:non_test_cc_rules))",
+    expression =
+        "kind('cc.* rule', deps(//bazel/check_deps:non_test_cc_rules))",
     opts = [
         "--notool_deps",
         "--noimplicit_deps",

‎docs/design/lexical_conventions/words.md

+1 -1
@@ -38,7 +38,7 @@ in Unicode Normalization Form C (NFC).
 <!--
 Keep in sync:
 - utils/textmate/Syntaxes/Carbon.plist
-- utils/treesitter/queries/highlights.scm
+- utils/tree_sitter/queries/highlights.scm
 -->

 The following words are interpreted as keywords:

‎explorer/BUILD

+2 -2
@@ -95,12 +95,12 @@ file_test(
 )

 filegroup(
-    name = "treesitter_testdata",
+    name = "tree_sitter_testdata",
     srcs = glob(
         ["testdata/**/*.carbon"],
         exclude = [
             "testdata/**/fail_*",
         ],
     ),
-    visibility = ["//utils/treesitter:__pkg__"],
+    visibility = ["//utils/tree_sitter:__pkg__"],
 )

‎scripts/create_compdb.py

+3 -1
@@ -36,7 +36,9 @@ def _build_generated_files(bazel: str) -> None:
     kinds_query = (
         "filter("
         ' ".*\\.(h|cpp|cc|c|cxx|def|inc)$",'
-        ' kind("generated file", deps(//...))'
+        # tree_sitter is excluded here because it causes the query to fail on
+        # `@platforms`.
+        ' kind("generated file", deps(//... except //utils/tree_sitter/...))'
         ")"
     )
     generated_file_labels = subprocess.check_output(

‎scripts/forbid_llvm_googletest.py

+5 -1
@@ -38,7 +38,11 @@ def main() -> None:
     args = [
         scripts_utils.locate_bazel(),
         "query",
-        "somepath(//..., @llvm-project//third-party/unittest:gtest)",
+        "somepath("
+        # tree_sitter is excluded here because it causes the query to fail on
+        # `@platforms`.
+        + "//... except //utils/tree_sitter/..., "
+        + "@llvm-project//third-party/unittest:gtest)",
     ]
     p = subprocess.run(
         args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, encoding="utf-8"

‎utils/README.md

+1 -1
@@ -22,4 +22,4 @@ developers and developers writing Carbon code.
 Any editor that supports Language server protocol and/or tree-sitter is
 supported. The editor just needs to be configured manually.
 `bazel build //toolchain` produces the language server binary.
-`utils/treesitter` contains the treesitter grammar.
+`utils/tree_sitter` contains the tree-sitter grammar.

‎utils/nvim/setup.sh

+3 -3
@@ -11,15 +11,15 @@ ROOT="$(git rev-parse --show-toplevel)"
 mkdir -p ~/.config/nvim/{lua,parser,queries}

 # add highlight queries
-ln -sTf "$PWD/utils/treesitter/queries" ~/.config/nvim/queries/carbon
+ln -sTf "$PWD/utils/tree_sitter/queries" ~/.config/nvim/queries/carbon

 # add carbon.lua
 ln -sf "$PWD/utils/nvim/carbon.lua" ~/.config/nvim/lua/carbon.lua

 # load carbon.lua on startup
 grep 'require "carbon"' ~/.config/nvim/init.lua || echo 'require "carbon"' >> ~/.config/nvim/init.lua

-# build treesitter
-cd utils/treesitter
+# build tree_sitter
+cd utils/tree_sitter
 tree-sitter generate
 clang -o ~/.config/nvim/parser/carbon.so -shared src/parser.c src/scanner.c -I ./src -Os -fPIC
File renamed without changes.

‎utils/tree_sitter/BUILD

+118
@@ -0,0 +1,118 @@
+# Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+# Exceptions. See /LICENSE for license information.
+# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+load("@rules_cc//cc:defs.bzl", "cc_binary", "cc_library", "cc_test")
+
+package(default_visibility = ["//bazel/check_deps:__pkg__"])
+
+# See README.md for instructions on tree-sitter setup and use. These rules are
+# manual because the tree-sitter invocation is non-hermetic, and most developers
+# won't have it installed; we don't want to break "bazel test //..." if we can
+# avoid it.
+#
+# We use tree-sitter non-hermetically for two key reasons:
+#
+# - The main way of hermetically using npms in bazel, `aspect_rules_js`, uses
+# declare_symlink; we disallow that for important compatibility reasons.
+# - When generated, src/parser.c is over 500 KB, which is larger than we want to
+# check in. It should also be expected to grow if the grammar becomes more complete.
+
+# Convenience target for running all tests, including manual tests.
+test_suite(
+    name = "tests",
+    tags = ["manual"],
+    tests = [
+        ":explorer_tests",
+        ":string_fail_tests",
+        ":string_tests",
+    ],
+)
+
+# Call tree-sitter to generate parser files.
+genrule(
+    name = "parser_files",
+    srcs = ["grammar.js"],
+    outs = [
+        "src/parser.c",
+        "src/tree_sitter/parser.h",
+    ],
+    cmd = "tree-sitter generate $(location grammar.js) &&\n" +
+          "cp src/parser.c $(location src/parser.c) &&\n" +
+          "cp src/tree_sitter/parser.h $(location src/tree_sitter/parser.h)",
+    tags = ["manual"],
+)
+
+cc_library(
+    name = "parser",
+    srcs = [
+        "src/scanner.c",
+        ":src/parser.c",
+    ],
+    hdrs = [":src/tree_sitter/parser.h"],
+    copts = ["-Wno-missing-prototypes"],
+    tags = ["manual"],
+    deps = ["@tree-sitter-bazel//:tree-sitter"],
+)
+
+cc_binary(
+    name = "test_runner",
+    testonly = 1,
+    srcs = ["test_runner.cpp"],
+    tags = ["manual"],
+    deps = [
+        ":parser",
+    ],
+)
+
+cc_test(
+    name = "explorer_tests",
+    size = "small",
+    srcs = ["test_runner.cpp"],
+    args = ["$(locations //explorer:tree_sitter_testdata)"],
+    data = ["//explorer:tree_sitter_testdata"],
+    tags = ["manual"],
+    deps = [
+        ":parser",
+    ],
+)
+
+filegroup(
+    name = "string_testdata",
+    srcs = glob(
+        ["testdata/string/*.carbon"],
+        exclude = ["testdata/string/fail_*.carbon"],
+    ),
+)
+
+filegroup(
+    name = "string_fail_testdata",
+    srcs = glob(["testdata/string/fail_*.carbon"]),
+)
+
+cc_test(
+    name = "string_tests",
+    size = "small",
+    srcs = ["test_runner.cpp"],
+    args = ["$(locations :string_testdata)"],
+    data = [":string_testdata"],
+    tags = ["manual"],
+    deps = [
+        ":parser",
+    ],
+)
+
+cc_test(
+    name = "string_fail_tests",
+    size = "small",
+    srcs = ["test_runner.cpp"],
+    args = ["$(locations :string_fail_testdata)"],
+    data = [":string_fail_testdata"],
+    env = {
+        "FAIL_TESTS": "1",
+    },
+    tags = ["manual"],
+    deps = [
+        ":parser",
+    ],
+)
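
The effect of the `manual` tags in this BUILD file can be modeled with a small Python sketch. This is a toy model, not Bazel itself: the `Target` class and `expand_wildcard` function are hypothetical names, and the only behavior borrowed from Bazel is that targets tagged `manual` are left out when a wildcard pattern such as `//...` is expanded, while they can still be requested by explicit label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Hypothetical stand-in for a Bazel target: a label plus its tags."""

    label: str
    tags: tuple = ()


def expand_wildcard(targets, package_prefix):
    """Models wildcard expansion: targets tagged "manual" are skipped."""
    return [
        t.label
        for t in targets
        if t.label.startswith(package_prefix) and "manual" not in t.tags
    ]


# A few labels from this change; tags mirror the BUILD files in the diff.
targets = [
    Target("//utils/tree_sitter:parser", ("manual",)),
    Target("//utils/tree_sitter:string_tests", ("manual",)),
    Target("//utils/tree_sitter:string_testdata"),
    Target("//explorer:tree_sitter_testdata"),
]

# Only the untagged filegroups survive the wildcard expansion.
print(expand_wildcard(targets, "//"))
```

Under this model, a developer without tree-sitter installed never triggers the non-hermetic genrule through `//...`; someone working on the grammar would name the targets explicitly, for example through the `:tests` suite.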

‎utils/treesitter/README.md ‎utils/tree_sitter/README.md

+11 -2
@@ -1,13 +1,22 @@
+# Tree-sitter grammar for Carbon
+
 <!--
 Part of the Carbon Language project, under the Apache License v2.0 with LLVM
 Exceptions. See /LICENSE for license information.
 SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 -->

-# Tree-sitter grammar for Carbon
-
 Tree-sitter is currently used for syntax highlighting in supported editors.

+## Development
+
+We use a non-hermetic tree-sitter invocation, so it must be installed locally.
+To install tree-sitter, run:
+
+```
+npm install -g tree-sitter-cli
+```
+
 ## Editor Installation

 ### Helix
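
Because the README's new Development section makes a locally installed tree-sitter a prerequisite, a preflight check could report a missing binary before Bazel fails inside the genrule. The following is a minimal sketch under stated assumptions and is not part of this commit: the function name `tree_sitter_hint` is hypothetical, and it assumes the CLI is installed under the name `tree-sitter`, as in the scripts shown in this diff.

```python
import shutil


def tree_sitter_hint(binary: str = "tree-sitter") -> str:
    """Reports whether `binary` is on PATH (hypothetical preflight helper)."""
    path = shutil.which(binary)
    if path is None:
        # Install command taken from the README change in this commit.
        return f"{binary} not found on PATH; try: npm install -g tree-sitter-cli"
    return f"{binary} found at {path}"


print(tree_sitter_hint())
```

A check like this only confirms that some executable with that name is reachable; it does not verify the tree-sitter version.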
File renamed without changes.

‎utils/treesitter/helix.sh ‎utils/tree_sitter/helix.sh

+1 -1
@@ -7,7 +7,7 @@
 set -euo pipefail

 ROOT="$(git rev-parse --show-toplevel)"
-cd "$ROOT/utils/treesitter"
+cd "$ROOT/utils/tree_sitter"

 tree-sitter generate --no-bindings

‎utils/treesitter/src/scanner.c ‎utils/tree_sitter/src/scanner.c

+1 -1
@@ -2,7 +2,7 @@
 // Exceptions. See /LICENSE for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

-#include "tree_sitter/parser.h"
+#include "utils/tree_sitter/src/tree_sitter/parser.h"

 enum TokenType {
   BINARY_STAR,

‎utils/treesitter/test_runner.cpp ‎utils/tree_sitter/test_runner.cpp

+3 -2
@@ -3,7 +3,6 @@
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

 #include <tree_sitter/api.h>
-#include <tree_sitter/parser.h>

 #include <cstdlib>
 #include <filesystem>
@@ -13,6 +12,8 @@
 #include <string>
 #include <vector>

+#include "utils/tree_sitter/src/tree_sitter/parser.h"
+
 extern "C" {
 auto tree_sitter_carbon() -> TSLanguage*;
 }
@@ -29,7 +30,7 @@ static auto ReadFile(std::filesystem::path path) -> std::string {
 // TODO: use file_test.cpp
 auto main(int argc, char** argv) -> int {
   if (argc < 2) {
-    std::cerr << "Usage: treesitter_carbon_tester <file>...\n";
+    std::cerr << "Usage: test_runner <file>...\n";
     return 2;
   }

‎utils/treesitter/BUILD

-75
This file was deleted.
