============================
Language configuration files
============================

Naming and location
===================

Each supported language requires a configuration file that defines how to
process the Tree-sitter syntax tree provided by VPE-Sitter. The most important
(and currently only) function is to define how to add syntax highlighting.
Hence the files have a ``.syn`` extension. For example the configuration file
for Python is called 'python.syn'.

Files are searched for in 2 places:

1. Under the installation directory of VPE-Syntax. The following command will
   display its name.

   .. code-block:: vim

       Synsit confdir

2. In your Vim configuration directory tree in the sub-directory
   ``plugin/vpe_syntax``. Use this command to display the full directory name.

   .. code-block:: vim

       Synsit confdir --user

Settings in the user configuration file, if it exists, override matching
settings in the installation configuration file.


Syntax trees
============

In order to understand the language configuration files, it is necessary to
understand the Tree-sitter syntax trees used by VPE-Syntax. Let us start with a
very short Python module:

.. code-block:: python
    :linenos:

    """Module docstring."""

    WIDTH = 30

    class LevelStore:
        """Source of the levels."""
        def retrieve_content(self, level: int) -> list[str]:
            """The text for this level."""
            return [line.replace('@', 'X')
                for line in text.decode('utf-8').splitlines()]

If you are editing this file and have enabled VPE-Syntax for the buffer then
you can display the Tree-sitter tree using the command:

.. code-block:: vim

    Treesit log tree --all

Which produces a tree representation of over 90 lines in the VPE log! The whole
tree is typically not easy to use, so instead we can get a subset for a given
line using the command:

.. code-block:: vim

    Treesit log tree --start <lnum>

Or by placing the cursor on a line and doing:

.. code-block:: vim

    Treesit log tree --ranges

For the third line, the tree produced is::

    module (0, 0)->(9, 58)
      expression_statement (2, 0)->(2, 10)
        assignment (2, 0)->(2, 10)
          left:identifier (2, 0)->(2, 5)
          = (2, 6)->(2, 7)
          right:integer (2, 8)->(2, 10)

All the syntactic elements for line 3 (index 2) are displayed along with
ancestor elements up to the top ``module`` element. The output is fairly easy
to interpret.

- The numbers in parentheses are row and byte indices. For example the ``(0,
  0)->(7, 58)`` after ``module`` means that the entire module starts at line
  zero, byte zero and ends at line 7, byte 58 (note that 58 is the index just
  after the last byte).

- The syntactic elements are known as "nodes" are and consist of two parts:

  1. A name. Examples from a above are "module", "identifier" and "=".

  2. A field name prefix - "left" and "right" above.

For our discussion, the ranges of the above tree are not of much interest, so
this discussion normally omits them provide cleaner partial trees.::

    module
      expression_statement
        assignment
          left:identifier
          =
          right:integer


Configuration files
===================

The job of a configuration file is to map parts of the syntax tree to Vim
highlight group names. It has fairly simply formatting rules.

A configuration file has a fairly simple format.

1. Lines that start with a '#' followed by a space are comments.
2. Blank lines are ignored and optional.
3. All other lines provide tree-match rules.

A tree-match rule consists of one or more lines that form tree structures,
which is very similar to a portion of the syntax tree of the language. For
example::

    yield
        yield                          Keyword

    module
        expression_statement
            string                     StringDocumentation

The indentation used to form the tree structure **must** use increasing blocks
of four spaces for each level. The words on the right are Vim highlight groups
to be used for matching syntax tree nodes. It is not necessary to align the
right hand side as shown above, but it is highly recommended.

A tree-match rule may consist of a single node. The following rule causes any
identifier node (with or without a field name prefix) to be highlighted using
the "Identifier" group, unless a more specific match is found - see later. So
the ``left:identifier`` above would be matched by the rule.
::

    identifier                         Identifier

The algorithm that maps tree nodes to highlight groups chooses the most
specific match. Basically "longest match wins". Here is the example Python
module again.

.. code-block:: python
    :linenos:

    """Module docstring."""

    WIDTH = 30

    class LevelStore:
        """Source of the levels."""
        def retrieve_content(self, level: int) -> list[str]:
            """The text for this level."""
            return [line.replace('@', 'X')
                for line in text.decode('utf-8').splitlines()]

The partial tree for for the docstring on line 1 is:

.. code-block::
    :linenos:

    module (0, 0)->(9, 58)
     expression_statement (0, 0)->(0, 23)
       string (0, 0)->(0, 23)
         string_start (0, 0)->(0, 3)
         string_content (0, 3)->(0, 20)
         string_end (0, 20)->(0, 23)

The relevant tree-match rules from the supplied configuration are::

    string                             String

    module
        expression_statement
            string                     StringDocumentation

The first rule will match the ``string`` node on line 3, but the second rule
matches the parent-child sequence ``module -> expression_statement -> string``,
which is 3 nodes long. So the string on line 1 is highlighted using the
"StringDocumentation" group.

A tree-match rule can appear quite complex. This is one of the longest in the
supplied Python rule set.::

    class_definition
        class                          Class
        name:identifier                ClassName
        block
            expression_statement
                string                 StringDocumentation
            function_definition
                def                    MethodDef
            function_definition
                identifier             MethodName

However, it is actually just a more compact way of representing multiple rules
within one tree structure. The above could be split up as::

    class_definition
        class                          Class

    class_definition
        name:identifier                ClassName

    class_definition
        block
            expression_statement
                string                 StringDocumentation

    class_definition
        block
            function_definition
                def                    MethodDef

    class_definition
        block
            function_definition
                identifier             MethodName

The second forms can be thought of as 'pure' rules, where each node has only a
single child.


Field name prefix
-----------------

When a field name prefix appears in the Tree-sitter tree it can be used in a
tree-match rule as a way of making the rule more specific. For example the
``class_definition`` compound rule above uses ``name:identifier`` rather than
just ``name``. In general, rules that include field name prefixes are preferred
over those that do not.

.. vim: nospell