Adding a language

This is a first stab at a tutorial on adding a new language. First we demonstrate installing a Tree-sitter parser library for JSON then for the rest of the tutorial the Snake programming language is used.

Installing a parser

You need a language for which a Tree-sitter parse with Python language bindings exists. For JSON there is the “tree-sitter-json” package. Within Vim you can run:

Treesit hint install tree-sitter-json

This will display a pip command that should be suitable for installing the package, taking any virtual environment into account.

After installation the following should work, within Vim.

py3 import tree_sitter_json

JSON is not very demanding in terms of providing syntax highlighting so for the rest of the tutorial we switch to Snake.

Make Snake supported by VPE-Sitter

The Snake programming language we are interested in just happens to look remarkably similar to Python [1]. In fact it is so similar that we can use the tree-sitter-python library to perform the parsing.

We need the VPE-Sitter to recognise Snake as a supported language. This simply involves a small amount of editing of a languages.conf file in the subdirectory plugin/vpe_sitter within your Vim configuration directory. The command:

Treesit openconfig

will open this file for you and create the directory if necessary. It will also provide template content if the file does not already exist. The line required for this example is:

snake tree_sitter_python

The following command will list all the configured languages. The --log option will write the output to the VPE log.

Treesit [--log] info languages

which will output something like:

Languages configured:
   c
   python
   snake (user provided)
Note: Support depends on correctly installed Tree-sitter code.

Any user configured languages are defined in:
    /home/paul/.config/vim/plugin/vpe_sitter/languages.conf

Create a language configuration

Getting set up

You will need:

  1. Some example code, which you should open in Vim.

  2. A snake.syn file in your plugin/vpe_syntax directory (the syn-file).

  3. The list of highlight groups that VPE-Syntax makes available.

  4. The VPE log open in a window.

For this tutorial we will use example.snk.

 1"""Module docstring."""
 2
 3import inspect
 4from typing import TypeAlias
 5
 6MIN_HEIGHT: Final[int] = 40            # Height of game display.
 7PropertySpec: TypeAlias = [int, int]
 8
 9# Mapping from character to highlight group.
10lookup_table = {
11    'X': 'String',
12    42: 'Number',
13}
14
15def hello_world() -> int:
16    """Greet the globe upon which we live."""
17    print('Hello, Wordl!')
18    return 42
19
20class LevelStore:
21    """Source of the levels."""
22    def __init__(self):
23        self.the_answer = 42.0

Edit this file. It will most likely display without any syntax highlighting because the .snk extension is not recognised by Vim. You will need to set the filetype option; use the setfiletype command - setfiletype snake.

In the same Vim session, use the following commands to start editing the snake.syn file.

split
Synsit openconfig snake.syn

Will create any necessary directory and provide basic template text. Save the file because VPE-Syntax will not enable parsing unless it finds a syn-file.

To see the list of VPE-Syntax highlight groups you can run the scheme tweaker. Make sure the current buffer is your example Snake code and run the command:

Synsit tweak

This will split the window horizontally. The top window will display VPE-Syntax’s built-in (experimental) colour scheme editor. The VPE-Syntax “standard” groups are also listed in the VPE-Syntax highlight groups section.

Open the VPE log in a window:

Vpe log show

You should now have 4 windows which you can arrange as you see fit.

Now start syntax highlighting by making the example code your current window and running:

Synsit on

If everything is working you will see messages in the log that looks like:

VPE-sitter: Can parse snake
VPE-sitter:    parser=<tree_sitter.Parser object at 0x7f9d7d7f8f30>
VPE-sitter:    parser.language=<Language id=140314393664544, version=14>

If there is a problem then, hopefully, error messages will be displayed that help you figure out the problem. If you find the diagnostics lacking in any way please raise an issue (https://github.com/paul-ollis/vpe_syntax/issues).

Writing the language (.syn) file

This is basically a process of displaying partial syntax trees in the VPE log and using the displayed tree to add rules to the language configuration (snake.syn) file. Bit by bit you should be able to fairly quickly build up a useful configuration.

Start with line 6 by placing the cursor on it and entering the command:

Treesit log tree

This will write a partial tree that contains all Tree-sitter nodes for line 2, plus all ancestor nodes:

module
    expression_statement
        assignment
            left:identifier
            :
            type:type
                generic_type
                    identifier
                    type_parameter
                        [
                        type
                            identifier
                        ]
            =
            right:integer
    comment

The mapping from the above tree to the Snake code should be fairly obvious, but sometimes you might find it easier to show line/column range information for each node by using Treesit log tree --ranges.

The above tree contains several Tree-sitter nodes of immediate interest - left:identifier, identifier type_parameter and comment. Let’s add simple entries for each. Update the syn-file to look like:

# Tree structure                   Property name
identifier                         Identifier
comment                            Comment
type                               Type

The two column layout is a (strongly) recommended convention. Note that there is no rule for left:identifier. The identifier rule will match left:identifier as well as any plain identifier. The left: part is called a “field name prefix” (or just “prefix”) and it can be included in rules for more precise matching, as will be shown later.

To see the result go to the example code window and execute the commands:

Synsit rebuild
edit             " Reloading triggers a reparse of the buffer.

The exact appearance will depend on your colour scheme, but you should now see the comments and identifiers highlighted. Next we can examine line 11, which has the tree:

module
    expression_statement
        assignment
            right:dictionary
                pair
                    key:string
                        string_start
                        string_content
                        string_end
                    :
                    value:string
                        string_start
                        string_content
                        string_end
                ,

This has two string nodes, one with a key prefix and one with a value prefix. Add these as key:string and string so the syn-file now reads:

# Tree structure                   Property name
identifier                         Identifier
comment                            Comment
type                               Type
string                             String
key:string                         Property

and do Synsit rebuild | edit to see the results. Unless you have already created an extended colour scheme, you will now see all strings highlighted identically. However, the ‘X’ on line 11 uses the Property highlight rather than String. By default, VPE-Syntax links Property to string, so the ‘X’ looks like other strings.

As a diversion may now wish to experiment with the scheme tweaker to make the keys distinguishable from string value. Go to the scheme tweaker’s window, find the Property group and hit <Enter>. Press ‘K’ to break the link to the String group and then experiment with changing the colour. If you wish, you can copy all or the modified the highlight commands into a personal colour scheme file (:help colorscheme for details).

So far, all of the rules we have added are very simple - node name, highlight group name. I prefer my Snake docstrings to look more like comments than strings. So let’s make it so. First line 1, with the tree:

module
    expression_statement
        string
            string_start
            string_content
            string_end

We need a rule that is more specific for this. In this case we can do this:

# Tree structure                   Property name
identifier                         Identifier
comment                            Comment
type                               Type
string                             String
key:string                         Property

module
    expression_statement
        string                     StringDocumentation

We have now added a rule that means:

If a module contains an expression_statement which in turn contains a string then highlight the string using the StringDocumentation property.

The basic rule for applying syn-file rules is “longest matchinf rule wins”. The new rule involves 3 nodes and so trumps the simpler, single node rule for strings.

The blank line before the rule is not required, but the indentation used is mandatory and must use multiples of 4 spaces. The output from Treesit log tree uses 4 spaces, so you can cut and paste from the VPE log. Once you start adding ‘tree-style’ rules like this, the two column layout convention makes your syn-file easier to read.

With this change in place, the module docstring will be highlighted like a comment because the StringDocumentation highlight group links to the Comment group. Once again, you may want to edit your colour scheme to make docstrings look slightly (or very) different to comments.

Docstrings for functions and classes require additional rules. Here they are:

class_definition
    block
        expression_statement
            string                 StringDocumentation

function_definition
    block
        expression_statement
            string                 StringDocumentation

By now the syntax highlighting of your Snake code should be looking quite reasonable, but much more can, of course be done. For a start, we could pick out the def keyword on line 15 and we might also want to make function names standout. The tree logged for line 15 is:

module
    function_definition
        def
        name:identifier
        parameters:parameters
            (
            )
        ->
        return_type:type
            identifier
        :

We could add the rule:

function_definition
    def                            FunctionDef
    name:identifier                FunctionName

but we can also merge this rule with the one we created earlier to get:

function_definition
    def                            FunctionDef
    name:identifier                FunctionName
    block
        expression_statement
            string                 StringDocumentation

That just about covers the process. Keep logging selected tree snippets, using them to add match rules. You can view the Python syn-file provided with VPE-Syntax with the command Synsit openconfig --std python.syn.

You may find some more useful information in ref:lang files, but much of it overlaps with this tutorial.