For a very long time, I wanted to make a syntax highlighter for the SR Linux command-line interface mainly because I belong to a cohort of readers who appreciate visual aids in lengthy CLI snippets. Give me a piece of code that is not syntax highlighted, and my reading speed will significantly drop.
And even though Network OS CLI snippets do not contain code per se, they have markers (such as the current command, IP addresses, up/down statuses, etc.) that, when highlighted, contribute to the clarity of the snippet.
So during a lazy first Thursday of 2023, I finally made myself look into it and created srlinux-pygments - a Pygments lexer to highlight SR Linux CLI snippets.
Whenever you see a nicely highlighted code block on the web, chances are high that syntax highlighting was done using Pygments.
Info
Pygments is a generic syntax highlighter suitable for use in code hosting, forums, wikis or other applications that need to prettify source code. Highlights are:
a wide range of 548 languages and other text formats is supported
special attention is paid to details that increase highlighting quality
support for new languages and formats are added easily; most languages use a simple regex-based lexing mechanism
a number of output formats is available, among them HTML, RTF, LaTeX and ANSI sequences
it is usable as a command-line tool and as a library
Almost all Python-based documentation engines use Pygments for syntax highlighting; the mkdocs-material engine, which powers this learning portal, is no exception. When you create a code block in your markdown file and annotate it with a language class, Pygments kicks in and colorizes it.
A lexer is a Pygments component that parses the code block's content and generates tokens. Tokens are then rendered by a formatter in one of the supported output formats, for example, HTML. This might sound confusing at first, but the key takeaway is that a lexer is a Python program that leverages Pygments' API to parse the raw text and extract the tokens that will be highlighted later on. So when you need syntax highlighting for a custom language, you typically only need to create a new lexer.
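To make the lexer/formatter pipeline concrete, here is a tiny end-to-end example using Pygments' bundled Python lexer and the HTML formatter:

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import PythonLexer

# the lexer splits the raw text into tokens; the formatter renders
# those tokens, here as HTML where every token gets a short CSS class
html = highlight('print("hello")', PythonLexer(), HtmlFormatter())
```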
Before jumping to creating our new lexer, let's draft the requirements. What do we want to highlight? There is no public standard for a Network OS CLI syntax, thus, we can choose what tokens we want to highlight.
Consider the following SR Linux CLI snippet that displays static routes configuration stanza:
In this "wall of text", I think it is suitable to make the following adjustments:
Make the 1st line of the prompt less intrusive. It displays auxiliary information about the selected datastore and present context, but it is not the "meat" of the snippet, so it's better to make it less visible.
On the 2nd prompt line, we typically have the command - info static-routes in our example. This is the crux of the snippet, the key piece of information. Thus it makes sense to put the accent on the command we display.
Interface names, IP addresses, string, and number literals are often the key user input in many configuration blocks. It makes sense to highlight these tokens to improve visibility.
Keywords like enable/disable/up/down are often the most important part of a code block, especially in a show command output. We need to accentuate those keywords visually.
Authors often augment raw CLI snippets with comments; we need to make those strings render with comments style.
Those styling requirements laid the base for the srlinux-pygments lexer project, and you can see their effect at the beginning of this post.
Once the requirements are fleshed out, let's create a custom Pygments lexer for SR Linux CLI snippets. Pygments documentation on writing a lexer is a good start, but it is not as welcoming as I wanted it to be, so let me fill in the gaps.
A lexer is a Python module that leverages Pygments' API to parse the input text and emit tokens which are subject to highlighting. Typically, the lexer module contains a single class that subclasses Pygments' RegexLexer class:
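Here is a minimal sketch of such a class, seeded with the two starter rules for comments and string literals discussed below; the filenames glob is a hypothetical placeholder:

```python
from pygments.lexer import RegexLexer
from pygments.token import Comment, String


class SRLinuxLexer(RegexLexer):
    """A lexer to highlight SR Linux CLI snippets."""

    name = "SR Linux"      # human-readable lexer name
    aliases = ["srl"]      # identifiers usable in fenced code blocks
    filenames = ["*.srl"]  # hypothetical glob for auto-guessing by filename

    tokens = {
        "root": [
            (r"^\s*#.*$", Comment),  # comment lines
            (r'"[^"]*"', String),    # double-quoted string literals
        ],
    }
```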
With the name field, we give the lexer a name. The aliases list defines the aliases our lexer can be found by (as in a fenced code block). The filenames field lets Pygments auto-guess this lexer for files matching the provided pattern.
The heart of the lexer is the tokens variable, which defines states and, per state, tuples with regular expressions and corresponding tokens. Let's zoom in.
The tokens var defines a single state called root, which contains a list of tuples. Each tuple contains at most three elements:
regexp expression
token to emit for the match
next state
What are states?
I have mentioned states a few times by now; they are a powerful concept for complex syntax highlighting rules. Luckily, in our simple case, we don't have to deal with states, thus we have only a single root state. Consequently, all our tuples have at most two elements.
Currently, our lexer has a single state with two tuples containing match rules written to handle Comments and String literals. Let's consider the first tuple that handles comments in our snippets:
```python
(r"^\s*#.*$", Comment)
```
The regexp matches every line that may start with whitespace characters, followed by the # char and any number of characters until the end of the line. The whole match of this regexp is assigned the Comment token.
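The rule can be checked in isolation with Python's re module (the sample input is illustrative):

```python
import re

# the comment rule: optional leading whitespace, a '#', then the
# rest of the line
comment_re = re.compile(r"^\s*#.*$", re.MULTILINE)

match = comment_re.search("set something\n# enable the uplink\n")
```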
Pygments maintains an extensive collection of Tokens for different cases. When HTML output is used, each token is marked with a distinctive CSS class, which makes it possible to highlight it differently.
As in the case above, when the lexer matches a comment string and HTML output is used, the whole match is assigned a CSS class of c (short for Comment), and documentation themes may create CSS rules to style elements with this particular class according to their theme.
Tip
Read along to see how mkdocs-material uses those classes to style the elements in the code blocks.
By now, you have probably figured out that, in a nutshell, a simple lexer is just a bunch of regexps and associated tokens. Let's see which match rules and tokens we chose for SR Linux CLI snippets and for what purpose.
The SR Linux prompt consists of two lines. The first one holds the current datastore and its state, plus the current working context. On the second line, you get the active CPM literal and the hostname. The rest is vacant for the command to type in.
Since the prompt potentially appears many times in a snippet (when you show multiple commands typed in), it makes sense to make it less intrusive. The command you typed, on the other hand, is what needs to stand out, and thus it is better highlighted.
We used two match tuples to handle the prompt lines. The first one handles the first line and marks it with a Comment token; the second one marks the command string with a Name token.
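Assuming the second prompt line always follows the A:&lt;hostname&gt;# shape, the two tuples could be sketched like this; the regexes and token choices below are illustrative, not the exact ones from srlinux-pygments:

```python
from pygments.lexer import bygroups
from pygments.token import Comment, Generic, Name

# illustrative prompt rules
srl_prompt = [
    # 1st prompt line "--{ ... }--[ ... ]--" is dimmed like a comment
    (r"^--\{.*\}--\[.*\]--\s*$", Comment),
    # 2nd prompt line: the "A:<hostname># " part, then the command itself
    (r"^(A:\S+#\s*)(.*)$", bygroups(Generic.Prompt, Name)),
]
```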
```srl
--{ * candidate shared default }--[ network-instance black ]--
A:leaf1# info static-routes
```
Tip
Most parsers you find in the srlinux-pygments repo are augmented with regex101.com links to visualize the work of the matching expression.
All CLIs have some keywords like enable, enter or commit. Those keywords bear significant value and thus are good candidates for highlighting. In the same spirit, words like up or established and down or disabled are important markers that a human desperately searches for during a debugging session.
What unites these three categories is that they are all simple words which can easily be matched using a list of those words. This is exactly what Pygments allows us to do with its words() function. We keep the lists of keywords, positive, and negative words in a words.py file, and the corresponding parser tuples leverage those.
```srl
enter candidate
set / interface ethernet-1/49 admin-state enable
set / interface ethernet-1/50 admin-state disable
```
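A sketch of how such word lists can be wired up with Pygments' words() helper; the word lists and token choices below are illustrative stand-ins for what words.py actually contains:

```python
from pygments.lexer import words
from pygments.token import Generic, Keyword

# illustrative word lists -- the real ones are longer
KEYWORDS = ("enter", "commit", "set", "show", "info")
POSITIVE = ("enable", "up", "established")
NEGATIVE = ("disable", "down")

# words() compiles a list of words into one alternation regex;
# \b boundaries prevent matches inside longer words
keywords = [(words(KEYWORDS, prefix=r"\b", suffix=r"\b"), Keyword)]
pos_words = [(words(POSITIVE, prefix=r"\b", suffix=r"\b"), Generic.Inserted)]
neg_words = [(words(NEGATIVE, prefix=r"\b", suffix=r"\b"), Generic.Deleted)]
```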
Highlighting interface names and IP addresses is equally important. They are the beacons and key elements in a Network OS configuration to which many objects bind. Making them distinguishable aids clarity.
We also decided to highlight digits (aka numerals) as they often indicate an index, a VLAN ID, or some other significant parameter.
Here is a parser responsible for matching digits in different positions in the text.
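As a simplified illustration of the idea (the actual rule set in srlinux-pygments is more elaborate), a standalone-integer rule could look like this:

```python
from pygments.token import Number

# hypothetical sketch: match integers delimited by word boundaries,
# e.g. counters, AS numbers, VLAN IDs
nums = [
    (r"\b\d+\b", Number),
]
```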
```srl
--{ + running }--[ ]--
A:leaf1# show network-instance default protocols bgp summary
-------------------------------------------------------------
BGP is enabled and up in network-instance "default"
Global AS number : 101
BGP identifier : 10.0.0.1
-------------------------------------------------------------
Total paths : 3
Received routes : 3
Received and active routes: None
Total UP peers : 1
Configured peers : 1, 0 are disabled
Dynamic peers : None
```
Highlighting numbers can be too much for some users; for that reason, we also created a minified lexer that has everything but numbers highlighted. It can be selected with the srlmin language identifier.
At this stage, we have the match tuples contained in the parsers.py file, but the parsers need to be attached to the tokens variable of the lexer class, as discussed in the Lexer structure section.
This is done in the srlinux.py file where parsers are imported and added to the root state of the token variable:
```python
"""A Pygments lexer for SR Linux configuration snippets."""
import re

from pygments.lexer import RegexLexer
from pygments.token import *

from .parsers import (
    srl_prompt,
    comments,
    strings,
    keywords,
    pos_words,
    neg_words,
    sys_lo_if,
    eth_if,
    ipv4,
    ipv6,
    nums,
    rt,
)

__all__ = ("SRLinuxLexer",)


class SRLinuxLexer(RegexLexer):
    """
    A lexer to highlight SR Linux CLI snippets.
    """

    name = "SR Linux"
    aliases = ["srl"]

    flags = re.MULTILINE | re.IGNORECASE

    tokens = {"root": []}

    tokens["root"].extend(srl_prompt)
    tokens["root"].extend(comments)
    tokens["root"].extend(strings)
    tokens["root"].extend(keywords)
    tokens["root"].extend(pos_words)
    tokens["root"].extend(neg_words)
    tokens["root"].extend(eth_if)
    tokens["root"].extend(sys_lo_if)
    tokens["root"].extend(ipv4)
    tokens["root"].extend(ipv6)
    tokens["root"].extend(nums)
    tokens["root"].extend(rt)
```
Note
The order of adding the parsers is important, as they are processed sequentially.
Now that our lexer has its structure formed with the parser tuples attached, the question is: how do we install it so that the pygments package can use it?
Luckily, pygments uses the setuptools entry_points mechanism, which allows plugins to register easily. In the setup.py file, we specify the entry_points values, registering our lexer classes with pygments.lexers.
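A sketch of the relevant part of setup.py; the package and module names below are assumptions about the repository layout:

```python
from setuptools import setup

setup(
    name="srlinux-pygments",
    # ... other metadata trimmed ...
    entry_points={
        # lexer classes listed under the "pygments.lexers" entry point
        # group are discovered by pygments automatically on install
        "pygments.lexers": [
            "srl = srlinux_pygments.srlinux:SRLinuxLexer",
        ],
    },
)
```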
Now, to install our custom lexer and make it known to pygments, all we need to do is:
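Assuming a checkout of the lexer repository, a regular pip install from its root does the trick (add -e for an editable install during development):

```shell
pip install .
```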
To use your custom syntax highlighter, use the alias you provided in the lexer class definition (in our case, aliases=['srl']) in the fenced code block:
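For example, in a markdown page the fenced block would look like this (the snippet content is illustrative):

````
```srl
--{ running }--[ ]--
A:leaf1# show version
```
````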
When doing the initial development of the lexer, I wanted an immediate feedback loop to see the results of the changes I made to parsers. To assist with that, I created a dockerized test environment consisting of the mkdocs-material doc engine, which installs the lexer on startup.
The make test command starts the mkdocs-material container with the lexers installed in editable mode. Then, to start the dev server, paste in the mkdocs serve -a 0.0.0.0:8000 command; you should be able to open the web page with the mkdocs-material doc portal that displays various CLI snippets with the highlighting applied.
When you make changes to the parsers, simply Ctrl+C the live web server and start it again to reload pygments.
OK, this is all cool, but how do you make mkdocs-material use the custom parser we just created? And how do you know which colors it uses for which tokens? All the hard questions.
First, we have to install the custom lexer alongside mkdocs-material. If you use mkdocs-material as a Python project, install the lexer, as explained before, in the same virtual environment that mkdocs-material uses. Should you use the mkdocs-material container image (you really should), you have to either modify the container's run command to embed the pip install step before calling mkdocs build/serve, or create your own image based on the original mkdocs-material image and add this step in the Dockerfile.
mkdocs-material offers a single color palette for code block syntax, so the question is how to figure out which color is used for which token. To discover that, we have to dig into some source files.
First, we need to know which tokens are associated with which CSS classes (aka short names). You can find the mapping between a token name and its CSS class in the token.py file of the pygments project. For example, the Comment token is associated with the c class.
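You can also query this mapping programmatically via the STANDARD_TYPES dictionary that token.py exposes:

```python
from pygments.token import STANDARD_TYPES, Comment, Name

# STANDARD_TYPES maps token types to their short CSS class names
comment_class = STANDARD_TYPES[Comment]  # "c"
name_class = STANDARD_TYPES[Name]        # "n"
```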
Knowing the CSS class of a particular token, let's find which color variable mkdocs-material uses. This information is available in the _highlight.scss file of mkdocs-material. For example, there we can find that the c CSS class is associated with var(--md-code-hl-comment-color).
With this information, you can pick the tokens and the corresponding colors to make your syntax highlighting style match your design ideas.
Making a simple custom highlighter for Pygments turned out to be an easy job. The only prerequisite is familiarity with regular expressions; Pygments handles the rest.
I am quite happy with the result and plan to fine-tune the parsers based on users' feedback. Likely, there is a bunch of important keywords we will discover in CLI snippets worth highlighting.
You can check the EVPN Layer 2 Tutorial, where snippets have been fixed to use the srl highlighting style.
Tip
Make sure to subscribe to receive email/rss notifications when new blog posts are published.