Language configuration files¶
Naming and location¶
Each supported language requires a configuration file that defines how to
process the Tree-sitter syntax tree provided by VPE-Sitter. The most important
(and currently only) function is to define how to add syntax highlighting.
Hence the files have a .syn extension. For example the configuration file
for Python is called ‘python.syn’.
Files are searched for in two places:
Under the installation directory of VPE-Syntax. The following command will display its name.
Synsit confdir
In your Vim configuration directory tree in the sub-directory
plugin/vpe_syntax. Use this command to display the full directory name.
Synsit confdir --user
Settings in the user configuration file, if it exists, override matching settings in the installation configuration file.
Syntax trees¶
In order to understand the language configuration files, it is necessary to understand the Tree-sitter syntax trees used by VPE-Syntax. Let us start with a very short Python module:
 1 """Module docstring."""
 2
 3 WIDTH = 30
 4
 5 class LevelStore:
 6     """Source of the levels."""
 7     def retrieve_content(self, level: int) -> list[str]:
 8         """The text for this level."""
 9         return [line.replace('@', 'X')
10             for line in text.decode('utf-8').splitlines()]
If you are editing this file and have enabled VPE-Syntax for the buffer then you can display the Tree-sitter tree using the command:
Treesit log tree --all
Which produces a tree representation of over 90 lines in the VPE log! The whole tree is typically not easy to use, so instead we can get a subset for a given line using the command:
Treesit log tree --start <lnum>
Or by placing the cursor on a line and doing:
Treesit log tree --ranges
For the third line, the tree produced is:
module (0, 0)->(9, 58)
    expression_statement (2, 0)->(2, 10)
        assignment (2, 0)->(2, 10)
            left:identifier (2, 0)->(2, 5)
            = (2, 6)->(2, 7)
            right:integer (2, 8)->(2, 10)
All the syntactic elements for line 3 (index 2) are displayed along with
ancestor elements up to the top module element. The output is fairly easy
to interpret.
The numbers in parentheses are row and byte indices. For example the
(0, 0)->(9, 58) after module means that the entire module starts at line zero, byte zero and ends at line 9, byte 58 (note that 58 is the index just after the last byte).
The syntactic elements are known as “nodes” and consist of two parts:
A name. Examples from above are “module”, “identifier” and “=”.
An optional field name prefix - “left” and “right” above.
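The row and byte ranges can be illustrated with a short Python sketch. It is not part of VPE-Syntax; the function name node_text is hypothetical and the source is just the first three lines of the example module. It shows that the end index is exclusive and that columns count bytes, not characters.

```python
# First three lines of the example module, held as bytes because the
# column values in the tree are byte offsets.
SOURCE = b'"""Module docstring."""\n\nWIDTH = 30\n'


def node_text(source: bytes, start: tuple, end: tuple) -> str:
    """Return the text covered by a (row, byte) range; the end is exclusive.

    node_text is a hypothetical helper written for this illustration.
    """
    lines = source.split(b"\n")
    (start_row, start_col), (end_row, end_col) = start, end
    if start_row == end_row:
        return lines[start_row][start_col:end_col].decode("utf-8")
    parts = (
        [lines[start_row][start_col:]]
        + lines[start_row + 1:end_row]
        + [lines[end_row][:end_col]]
    )
    return b"\n".join(parts).decode("utf-8")


print(node_text(SOURCE, (2, 0), (2, 5)))    # WIDTH
print(node_text(SOURCE, (2, 6), (2, 7)))    # =
print(node_text(SOURCE, (2, 8), (2, 10)))   # 30
```

The three calls use the ranges of the left:identifier, = and right:integer nodes from the tree for line 3.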
For our discussion, the ranges of the above tree are not of much interest, so they are normally omitted to provide cleaner partial trees:
module
    expression_statement
        assignment
            left:identifier
            =
            right:integer
Configuration files¶
The job of a configuration file is to map parts of the syntax tree to Vim highlight group names. It has fairly simple formatting rules:
Lines that start with a ‘#’ followed by a space are comments.
Blank lines are ignored and optional.
All other lines provide tree-match rules.
A tree-match rule consists of one or more lines that form a tree structure, which is very similar to a portion of the syntax tree of the language. For example:
yield
    yield                   Keyword
module
    expression_statement
        string              StringDocumentation
The indentation used to form the tree structure must increase by four spaces for each level. The words on the right are the Vim highlight groups to be used for matching syntax tree nodes. It is not necessary to align the right-hand side as shown above, but it is highly recommended.
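The formatting rules above can be sketched as a small Python parser. This is an illustration only, not the parser used by VPE-Syntax: the names RuleLine and parse_rule_lines are hypothetical, and it assumes that comments start in the first column and that the node name and highlight group are separated by whitespace.

```python
from typing import List, NamedTuple, Optional

INDENT = 4  # Four spaces per tree level, as stated above.


class RuleLine(NamedTuple):
    """One line of a tree-match rule (hypothetical type for this sketch)."""
    depth: int             # Nesting level: 0 for the top of a rule.
    node: str              # Node name, possibly with a field name prefix.
    group: Optional[str]   # Vim highlight group, or None if not given.


def parse_rule_lines(text: str) -> List[RuleLine]:
    """Split configuration text into rule lines, skipping comments and blanks."""
    parsed = []
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith("# "):
            continue  # Blank line or comment.
        stripped = raw.lstrip(" ")
        indent = len(raw) - len(stripped)
        if indent % INDENT:
            raise ValueError(f"indent is not a multiple of {INDENT}: {raw!r}")
        fields = stripped.split()
        group = fields[1] if len(fields) > 1 else None
        parsed.append(RuleLine(indent // INDENT, fields[0], group))
    return parsed


EXAMPLE = """\
# Rules taken from the example above.
yield
    yield                   Keyword

module
    expression_statement
        string              StringDocumentation
"""

for rule_line in parse_rule_lines(EXAMPLE):
    print(rule_line)
```

Each parsed line keeps only its depth, node name and optional highlight group, which is all that is needed to rebuild the tree structure of a rule.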
A tree-match rule may consist of a single node. The following rule causes any
identifier node (with or without a field name prefix) to be highlighted using
the “Identifier” group, unless a more specific match is found - see later. So
the left:identifier above would be matched by the rule.
identifier Identifier
The algorithm that maps tree nodes to highlight groups chooses the most specific match. Basically “longest match wins”. Here is the example Python module again.
 1 """Module docstring."""
 2
 3 WIDTH = 30
 4
 5 class LevelStore:
 6     """Source of the levels."""
 7     def retrieve_content(self, level: int) -> list[str]:
 8         """The text for this level."""
 9         return [line.replace('@', 'X')
10             for line in text.decode('utf-8').splitlines()]
The partial tree for the docstring on line 1 is:
1 module (0, 0)->(9, 58)
2     expression_statement (0, 0)->(0, 23)
3         string (0, 0)->(0, 23)
4             string_start (0, 0)->(0, 3)
5             string_content (0, 3)->(0, 20)
6             string_end (0, 20)->(0, 23)
The relevant tree-match rules from the supplied configuration are:
string                      String
module
    expression_statement
        string              StringDocumentation
The first rule will match the string node on line 3 of the tree, but the second rule
matches the parent-child sequence module -> expression_statement -> string,
which is 3 nodes long. So the docstring on line 1 of the module is highlighted
using the “StringDocumentation” group.
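The “longest match wins” idea can be sketched in Python as follows. This is not the VPE-Syntax implementation. It assumes that a rule matches when its node sequence equals the tail of the path from the root to the node being highlighted, and it ignores field name prefixes. The name choose_group and the second example path are hypothetical.

```python
from typing import List, Optional, Tuple

Path = Tuple[str, ...]

# The two rules discussed above, written as (node sequence, highlight group).
RULES: List[Tuple[Path, str]] = [
    (("string",), "String"),
    (("module", "expression_statement", "string"), "StringDocumentation"),
]


def choose_group(node_path: Path, rules: List[Tuple[Path, str]]) -> Optional[str]:
    """Return the group of the longest rule matching the tail of node_path.

    Assumption: a rule matches when it equals the last len(rule) names of
    the root-to-node path. Returns None when no rule matches.
    """
    best_group, best_len = None, 0
    for rule_path, group in rules:
        size = len(rule_path)
        if size > best_len and node_path[-size:] == rule_path:
            best_group, best_len = group, size
    return best_group


# The module docstring: both rules match, the 3-node rule wins.
print(choose_group(("module", "expression_statement", "string"), RULES))

# A string nested more deeply (illustrative path): only the short rule matches.
print(choose_group(
    ("module", "expression_statement", "call", "argument_list", "string"),
    RULES,
))
```

The first call selects StringDocumentation and the second falls back to String, which mirrors the behaviour described for the docstring on line 1.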
A tree-match rule can appear quite complex. This is one of the longest in the supplied Python rule set:
class_definition
    class                       Class
    name:identifier             ClassName
    block
        expression_statement
            string              StringDocumentation
        function_definition
            def                 MethodDef
        function_definition
            identifier          MethodName
However, it is actually just a more compact way of representing multiple rules within one tree structure. The above could be split up as:
class_definition
    class                       Class
class_definition
    name:identifier             ClassName
class_definition
    block
        expression_statement
            string              StringDocumentation
class_definition
    block
        function_definition
            def                 MethodDef
class_definition
    block
        function_definition
            identifier          MethodName
The second form can be thought of as a set of ‘pure’ rules, where each node has only a single child.
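Splitting a compound rule into pure rules can be sketched in a few lines of Python. The function name expand_rule and the (depth, node, group) input format are assumptions made for this illustration, not part of VPE-Syntax. The depth is the indentation level, that is the number of leading spaces divided by four.

```python
from typing import List, Optional, Tuple

# The compound class_definition rule as (depth, node, group) entries.
COMPOUND: List[Tuple[int, str, Optional[str]]] = [
    (0, "class_definition", None),
    (1, "class", "Class"),
    (1, "name:identifier", "ClassName"),
    (1, "block", None),
    (2, "expression_statement", None),
    (3, "string", "StringDocumentation"),
    (2, "function_definition", None),
    (3, "def", "MethodDef"),
    (2, "function_definition", None),
    (3, "identifier", "MethodName"),
]


def expand_rule(lines):
    """Return one (node sequence, group) pair per line that names a group."""
    path: List[str] = []
    pure = []
    for depth, node, group in lines:
        del path[depth:]      # Drop nodes at this depth or deeper.
        path.append(node)     # The current chain from the top of the rule.
        if group is not None:
            pure.append((tuple(path), group))
    return pure


for node_sequence, group in expand_rule(COMPOUND):
    print(" -> ".join(node_sequence), "=", group)
```

The five pairs produced correspond to the five pure rules in the split form of the rule.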
Field name prefix¶
When a field name prefix appears in the Tree-sitter tree it can be used in a
tree-match rule as a way of making the rule more specific. For example the
class_definition compound rule above uses name:identifier rather than
just identifier. In general, rules that include field name prefixes are preferred
over those that do not.
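One way to picture how prefixes affect matching is the Python sketch below. It is a guess at the behaviour described, not the actual VPE-Syntax algorithm. The helper names are hypothetical, and ranking rules first by length and then by the number of prefixed parts is an assumption.

```python
from typing import Optional, Tuple


def split_field(label: str) -> Tuple[Optional[str], str]:
    """Split 'name:identifier' into ('name', 'identifier').

    A label without a usable prefix, including the ':' token itself,
    is returned as (None, label).
    """
    field, sep, name = label.partition(":")
    if sep and field and name:
        return field, name
    return None, label


def part_matches(rule_part: str, node_label: str) -> bool:
    """A rule part without a prefix matches a node with any or no prefix."""
    rule_field, rule_name = split_field(rule_part)
    node_field, node_name = split_field(node_label)
    if rule_name != node_name:
        return False
    return rule_field is None or rule_field == node_field


def specificity(rule_path: Tuple[str, ...]) -> Tuple[int, int]:
    """Rank rules by length, then by the number of prefixed parts (assumed)."""
    prefixed = sum(1 for part in rule_path if split_field(part)[0] is not None)
    return len(rule_path), prefixed


print(part_matches("identifier", "name:identifier"))       # True
print(part_matches("name:identifier", "left:identifier"))  # False
print(specificity(("class_definition", "name:identifier"))
      > specificity(("class_definition", "identifier")))   # True
```

Under these assumptions a plain identifier rule still matches left:identifier, while a rule using name:identifier outranks an otherwise identical rule without the prefix.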