tree-sitter API for Java using WebAssembly.
This library brings tree-sitter parsing capabilities to Java by compiling tree-sitter and its language grammars to WebAssembly and running them through Endive, a pure-Java WASM runtime. No JNI or native binaries required.
treesitter4j is a Maven multi-modules project:
The Java module that provides the public API. It contains:
TreeSitter-- factory for creating parser instances (manages the WASM module lifecycle via Endive)TreeSitterParser-- wraps a tree-sitter parser; set a language, parse a string, get a treeTreeSitterTree/TreeSitterNode-- navigate the resulting AST (node types, children, S-expressions, byte ranges, ...)TreeSitterPool-- thread-safe pool ofTreeSitterinstances for concurrent parsingLanguage-- enum of available grammars (see below)
The Rust/WebAssembly module. It compiles tree-sitter core and the grammar crates listed in Cargo.toml into a single tree-sitter.wasm binary. The Makefile handles downloading WASI SDK and Rust, building, and optimizing the WASM output.
The following tree-sitter grammars are compiled into the WASM module and exposed through the Language enum:
| Language | Crate | Version | Repository |
|---|---|---|---|
| core | tree-sitter |
0.26.9 | [tree-sitter](https://docs.rs/tree-sitter/0.26.9/tree_sitter/ |
| JSON | tree-sitter-json |
0.24.8 | tree-sitter-json |
| Java | tree-sitter-java |
0.23.5 | tree-sitter-java |
| Properties | tree-sitter-properties |
0.3.0 | tree-sitter-properties |
| HTML | tree-sitter-html |
0.23.2 | tree-sitter-html |
| XML | tree-sitter-xml |
0.7.0 | tree-sitter-xml |
| Markdown | tree-sitter-md |
0.5.3 | tree-sitter-md |
| YAML | tree-sitter-yaml |
0.7.2 | tree-sitter-yaml |
To add a new language, add the grammar crate to wasm-build/Cargo.toml, register it in wasm-build/src/lib.rs with a new ID, and add the corresponding entry to the Language enum in core.
try (TreeSitter ts = TreeSitter.create();
TreeSitterParser parser = ts.newParser()) {
parser.setLanguage(Language.JAVA);
try (TreeSitterTree tree = parser.parseString("class Foo {}")) {
TreeSitterNode root = tree.rootNode();
System.out.println(root.toSexp());
}
}Each TreeSitter instance owns an isolated WASM module with its own linear memory, so instances must not be shared across threads. TreeSitterPool caps the number of live instances and reuses them across parse operations. It uses only lock-free data structures and Semaphore internally (no synchronized blocks), so it is safe to use with virtual threads.
try (TreeSitterPool pool = TreeSitterPool.create(4)) {
try (TreeSitterPool.Loan loan = pool.borrow()) {
try (TreeSitterParser parser = loan.instance().newParser(Language.JAVA);
TreeSitterTree tree = parser.parseString(source)) {
System.out.println(tree.rootNode().toSexp());
}
}
}If an error leaves the instance in a bad state, call loan.discard() before closing the loan to destroy it rather than returning it to the pool.
- Java 17+
- Maven 3.9+
- Rust toolchain with the
wasm32-wasip1target (only needed to rebuild the WASM binary)
If wasm-build/wasm/tree-sitter.wasm is already present (checked into the repo), you only need to run this command:
mvn clean installIf it is needed to rebuild the wasm binary file for whatever reason like to add a new grammar/language, then you will have to perform the following steps:
- Add the new grammar to the Cargo.toml file under the section
[dependencies], - Include the new language id and method to call under the rust file
wasm-build/src/lib.rs. see section// --- Language helpers ---, - Update the README.md file to add the new language
## Supported languages
Then, execute these commands
cd wasm-build
make allAdditionally, include to this new language part of the Enum io.roastedroot.treesitter.Language and rebuild the java core module
public enum Language {
JSON(0),
JAVA(1),
PROPERTIES(2),
HTML(3),
XML(4),
MARKDOWN(5),
YAML(6);
// NEWLANGUAGE(7);NOTE: The Makefile includes the instructions needed to install localy: wasi-sdk and Binaryen !