Literate Tests with Org Mode and Emacs Lisp
In case you don't know, I'm the creator of and main contributor to the Lensor Compiler Collection (abbreviated LCC).
LCC has one design standard: "no third-party dependencies". That means that I have written the code that transforms source code into tokens, as well as the code that issues diagnostics, as well as the code that selects machine instructions to emit, as well as the code that emits said machine instructions. As well as about 18 kajillion other steps that I'm omitting (not for the sake of brevity, but because they've been blocked out by my subconscious in an attempt to protect me from the horrors of the world).
While that's all well and good, I'm only human, and humans make mistakes; especially over the course of years, which is how much time I have sunk into this little hobby project. The issue with a compiler collection is, well, finding the bugs. Sometimes they are bugs of omission (i.e. not doing something that is actually required), sometimes they are bugs of logical error (i.e. my mental model is incorrect and doesn't match reality), sometimes they are off-by-one bugs, or accidentally using + instead of +=. It's not easy to just scan through code and find bugs; it often requires a deep understanding of what the compiler is trying to accomplish, not the code that is currently there (otherwise, it wouldn't be buggy code).
On top of all that, the compiler performs many data transformations, separated and grouped into stages. This means that, if one stage is wrong, the problem often appears in another stage, despite originating from an earlier one. For example, if semantic analysis of a language fails to set the type of some node, and that language's IR code generation tries to read the type and gets NULL, it will crash in IR generation, not in semantic analysis…
All of that to say, compilers are inherently complex systems, and complex systems have lots of places for errors to be introduced, and often have lots of other places for errors to actually manifest.
This is why, for compiler collections, testing is of the utmost importance. At its most basic, a test is just an assertion of some output after some transformation is applied to some input.
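To make that definition concrete, here is a minimal sketch in Emacs Lisp. The name `sketch-test-passes-p` is made up for illustration and isn't part of LCC.

```elisp
;; Hypothetical helper for illustration only; not part of LCC.
(defun sketch-test-passes-p (transform input expected)
  "Return non-nil if applying TRANSFORM to INPUT yields EXPECTED."
  (equal (funcall transform input) expected))

;; Usage: assert that `upcase' turns "lcc" into "LCC".
(sketch-test-passes-p #'upcase "lcc" "LCC")
```

For a compiler, the "transformation" is just a lot bigger than `upcase`.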
LCC has many testing frameworks in place to ensure all parts of the codebase work as expected (or, at least, I'm working on it). But, one of these frameworks is of interest today: RunTest.
RunTest
You can think of RunTest as an end-to-end testing framework for LCC. That is, it treats LCC as a black box, and just pushes data through it to get an expected output. The catch is that the input is compiled with LCC, linked with GCC, and then the resulting binary is executed; RunTest asserts the output of that binary, given an input of a source language to LCC.
Specifically, RunTest asserts the exit status and printed output of the final binary.
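The last step of that pipeline (run the binary, check its status and output) could be sketched roughly like this. Both function names are hypothetical, and I'm assuming here that stdout and stderr are captured together; the real code in runtest/runtest.el may well do it differently.

```elisp
;; A sketch only: these helpers are made up, not RunTest's actual code.
(defun sketch-run-binary (program &optional args)
  "Run PROGRAM with the list ARGS; return (STATUS . OUTPUT).
STATUS is the exit code (or a string if a signal killed the process).
OUTPUT is everything the process printed (stdout and stderr together,
which is an assumption of this sketch)."
  (with-temp-buffer
    ;; DESTINATION t inserts the process output into the current buffer.
    (let ((status (apply #'call-process program nil t nil args)))
      (cons status (buffer-string)))))

(defun sketch-binary-matches-p (program args expected-status expected-output)
  "Return non-nil if PROGRAM exits with EXPECTED-STATUS and prints EXPECTED-OUTPUT."
  (let ((result (sketch-run-binary program args)))
    (and (equal (car result) expected-status)
         (string= (cdr result) expected-output))))

;; Usage, with a POSIX `sh' standing in for a freshly linked test binary:
;; (sketch-binary-matches-p "sh" '("-c" "printf hi; exit 3") 3 "hi")
```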
So, unless we want a single test to span across multiple files (something that I abhor), we need some sort of polyglot file format which may contain arbitrary source code, binary data (for asserting that a program outputs control characters, NUL bytes, etc.), and a number for the exit status.
We could just use a .txt file, but, if we include binary data, a lot of text editors (which is what .txt files will be opened with, by default) will crash, or display unexpected/wrong characters. This is not good for the test author, as you may imagine. On top of that, some languages that LCC supports may not support comments; that means that if we relied on defining tests within a language, the language's lexer would have to work in tandem with the test suite. That doesn't quite match our "arbitrary" source code specification, does it?
I was writing some documentation for Glint, typing some Glint source code in a source block within an Org mode file using Emacs, when it hit me that I already had the answer: Org mode. I write source code of arbitrary languages in Org mode all the time. In fact, I do it for this blog. Why not use Org mode to write tests in? We can include binary data with example blocks, and we can include source code. Obviously, we can write a number in an Org file… It fits our specifications! On top of that, it allows the author of the test to define it literately, i.e. with included text to describe (any part of) the test.
The one "downside" to this is that Org mode is deeply tied into the Emacs ecosystem; in fact, the sole implementation of Org mode is Emacs. There aren't really parsers available for Org mode files that don't use Emacs itself. So, if we want to extract the test data from our Org mode test files, we need to use—you guessed it—Emacs!
Of course, Emacs is wonderful and provides Emacs Lisp, so we can just write a script, and Emacs will run it for us whenever we ask (just like Bash). Emacs is also very portable (just like Bash) ((holy heck, it even runs on my phone now!!)), so I think anyone who wants to hack on LCC will be able to get a working Emacs on their system.
With that out of the way, I decided on a basic syntax for the tests; here's Glint's "Empty File" test.
* Empty File

In Glint, an empty program is completely valid. The rules for implicit return at the top level still apply. So, because the last expression is not of =int= type, Glint inserts =return 0;=.

#+begin_src glint
#+end_src

#+begin_example
0
#+end_example

#+begin_example
#+end_example
The name of the test is the top-level headline, and the following named blocks constitute the test's source code input, and the test's expected status and output when the resulting binary has been run.
As you can see, a test author may mark up the different portions of their test with explanations, information, or whatever data is needed to understand the test case and why it is the way it is.
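For the curious, pulling those pieces out of an Org file with Emacs Lisp might look something like the following. This is a minimal sketch using Org's own `org-element` parser; the function name and the plist layout are my inventions for this post, and runtest/runtest.el may go about it differently.

```elisp
(require 'org)
(require 'org-element)

;; Hypothetical helper: the name and the returned plist shape are
;; assumptions made for illustration, not RunTest's real interface.
(defun sketch-parse-test (org-text)
  "Parse ORG-TEXT and return a plist (:name NAME :sources LIST :examples LIST)."
  (with-temp-buffer
    (insert org-text)
    ;; The parser expects an Org buffer; skip user hooks to keep this quiet.
    (delay-mode-hooks (org-mode))
    (let ((tree (org-element-parse-buffer)))
      (list
       ;; First headline in the file: the test's name.
       :name (org-element-map tree 'headline
               (lambda (h) (org-element-property :raw-value h))
               nil t)
       ;; Contents of every source block, in document order.
       :sources (org-element-map tree 'src-block
                  (lambda (b) (org-element-property :value b)))
       ;; Contents of every example block, in document order.
       :examples (org-element-map tree 'example-block
                   (lambda (b) (org-element-property :value b)))))))
```

Fed the "Empty File" test, I'd expect this to give back the name "Empty File", one (empty) source block, and two example blocks, which this sketch assumes are the expected status followed by the expected output.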
Now, an LCC contributor may define some language input and assert that it properly compiles to a program with the given semantics. This asserts that the binding between language and compiler is properly aligned, and semantics are preserved. For example, if semantics are not preserved, the language may be generating IR that doesn't match their intentions, or LCC may have a bug in it where it isn't respecting some feature of the IR when applying a transformation, or it could be a cosmic ray sent by aliens, bit-flipping your RAM just for funsies. You never know with these computer things.
Either way, it's good to have a system that alerts you when things are broken, rather than (gasp!) silently producing broken code (my nightmare), and this framework does that in a (I believe) neat and tidy way.
LISP-based Output Matcher
#+begin_src
(lambda (output) (string-empty-p output))
#+end_src
If you use a source block to define the expected output, instead of an example block, RunTest expects it to contain a read-able function that accepts one argument; that function is applied to the program's output. If it returns non-nil, then the output is seen as "expected". This means you can use things like string-search to ensure a Glint Runtime Error without knowing the exact source location it will produce (since the test runner creates test files with arbitrary names).
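Here's a rough sketch of how a matcher like that could be read and applied. `sketch-output-expected-p` is a made-up name, and the error text in the usage example is invented too (I'm not quoting Glint's real message format here). Note that `string-search` needs Emacs 28 or newer.

```elisp
;; A sketch under stated assumptions; not RunTest's actual code.
(defun sketch-output-expected-p (matcher-source output)
  "Read a one-argument function from the string MATCHER-SOURCE and apply it to OUTPUT.
Return whatever the function returns; non-nil means the output is expected."
  (let* ((form (car (read-from-string matcher-source)))
         ;; Evaluate with lexical binding so we get a proper closure.
         (matcher (eval form t)))
    (funcall matcher output)))

;; Usage: accept any output that mentions a runtime error, wherever
;; the (arbitrarily named) source file says it happened.
(sketch-output-expected-p
 "(lambda (output) (string-search \"Runtime Error\" output))"
 "tmp123:1:1: Runtime Error: something went wrong")
```

Since `string-search` returns the match position or nil, and every number (even 0) is non-nil in Emacs Lisp, it works as a matcher as-is.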
Oh, Yeah
I forgot to mention, the Org syntax accepts multiple source blocks; each one is compiled separately, and the resulting artifacts are collected and linked together. This means that, if your language has modules or something like that, you can define a module and a program that uses it, and RunTest will automagically compile it properly. It also means that Module/Type/AST serialisation and deserialisation may be tested by RunTest as well, which is a large pain-point for a lot of language developers (binary formats tend to be annoying like that).
Deep Dive
You can read actual RunTest tests at runtest/corpus/ within the LCC repository. This can be a good way to get a feel for a language LCC compiles, as there are lots of example programs combined with an assertion of exactly what they do.
Still interested? Go read runtest/runtest.el within the LCC repository; it showcases a lot of very LISPy things, like gensym, macros, asynchronous subprocesses, and more, while still being short and sweet.
For a really deep dive, clone LCC, then build and run the testing suite in tst/. This will run RunTest, CodeTest, and LangTest.