Last week I worked on improving prolog-markdown (see my earlier post about it). I had two main goals: supporting more Markdown constructs, including GitHub-style fenced code blocks, and compatibility with SWI-Prolog version 7. The problem with the new version of SWI-Prolog is that it introduced a separate string type, which breaks the convention that double-quoted literals are read as lists of character codes. I had taken this convention for granted, and my code depended on it in many places. Instead of relying on the compatibility flags, I decided to make the parser fully compatible with the new default behavior.
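A minimal sketch of the difference, assuming SWI-Prolog 7 or later with default flags: "abc" is read as a string object, while back-quoted text (`abc`) is still read as a list of character codes. The names md_codes/2 and heading//2 are hypothetical illustrations, not prolog-markdown's actual API. The DCG uses back-quoted terminals so it behaves the same whatever the double_quotes flag says, and the entry point converts any text input (atom, string or code list) into codes before phrase/2 sees it.

```prolog
% Sketch for SWI-Prolog 7+: with default flags "abc" is a string object,
% while `abc` (back-quoted) is still a list of character codes.

% md_codes(+Text, -Codes) is a hypothetical entry point: it accepts an
% atom, a string or a code list and always yields a list of codes, so
% the DCG below never depends on the double_quotes flag.
md_codes(Text, Codes) :-
    text_to_string(Text, String),
    string_codes(String, Codes).

% heading(-Level, -Title)//: a simplified ATX heading such as "## Title".
% Terminals are written as back-quoted code lists.
heading(Level, Title) -->
    hashes(Level),
    ` `,
    rest(Title).

% hashes(-N)//: one or more '#' characters; N is their count.
hashes(N) --> `#`, hashes(N0), { N is N0 + 1 }.
hashes(1) --> `#`.

% rest(-Codes)//: consumes the remainder of the input.
rest(Codes, Codes, []).
```

For example, `?- md_codes("## Hello", Cs), phrase(heading(L, T), Cs), atom_codes(A, T).` answers L = 2, A = 'Hello'. The compatibility route would instead be a directive such as `:- set_prolog_flag(double_quotes, codes).` in each module; the sketch avoids it by never writing code-list terminals as "..." literals.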
I started by rewriting the block-level parser because I was not satisfied with some of its predicates. I also wanted to move some complex code (such as list item parsing) into separate modules, which makes it easier to add more features to the parser. To support tight Markdown, I changed the algorithm that detects block boundaries. Previously I assumed that blocks are always separated by at least one empty line, but that is not always the case. Now the parser tries to recognize a block at the beginning of each line. Lines that do not start a block are accumulated and turned into normal paragraphs.
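The line-oriented boundary detection can be sketched as follows. This is a simplified illustration under stated assumptions, not the parser's real code: the input is already split into lines (code lists), only blank lines and ATX headings count as block starts, and the predicate names blocks/2, take_para/3 and starts_block/1 are hypothetical. Each line is first tested as a block start; any other line opens a paragraph that keeps absorbing lines until a blank line or a block start appears, so a heading placed directly under paragraph text (tight Markdown) still ends the paragraph.

```prolog
% Hypothetical sketch of line-oriented block detection (SWI-Prolog 7+).
% Lines are code lists; blocks are h(Level, Title) or p(ParagraphLines).

blocks([], []).
blocks([Line|Lines], Blocks) :-          % blank lines separate blocks
    blank(Line), !,
    blocks(Lines, Blocks).
blocks([Line|Lines], [h(Level, Title)|Blocks]) :-
    phrase(heading(Level, Title), Line), !,
    blocks(Lines, Blocks).
blocks([Line|Lines], [p([Line|Para])|Blocks]) :-
    % Not a block start: accumulate following lines into a paragraph.
    take_para(Lines, Para, Rest),
    blocks(Rest, Blocks).

% take_para(+Lines, -Para, -Rest): Para is the longest prefix of Lines
% in which no line starts a new block.
take_para([Line|Lines], [Line|Para], Rest) :-
    \+ starts_block(Line), !,
    take_para(Lines, Para, Rest).
take_para(Rest, [], Rest).

starts_block(Line) :- blank(Line).
starts_block(Line) :- phrase(heading(_, _), Line).

% A line is blank if it contains only whitespace.
blank(Line) :-
    forall(member(C, Line), code_type(C, space)).

% Simplified ATX heading: one or more '#', a space, then the title.
heading(Level, Title) --> hashes(Level), ` `, rest(Title).

hashes(N) --> `#`, hashes(N0), { N is N0 + 1 }.
hashes(1) --> `#`.

rest(Codes, Codes, []).
```

With this scheme, the lines `Intro text`, `more text`, `# Title`, `Body` yield a paragraph of two lines, a level-1 heading and a one-line paragraph, even though no empty line separates them. Adding a block type means adding one clause to blocks/2 and one to starts_block/1.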
A new pass had to be added to extract reference links. Their definitions must be known before the span-level parser runs, because that parser emits HTML terms right away (there is no separate AST). While extracting links, this pass also replaces the different line endings with the canonical \n. This does not simplify the later parsing stages much, but it makes them a bit faster.
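A single preprocessing pass of this kind might look like the sketch below. It is a hypothetical illustration, not prolog-markdown's code: preprocess/3, lines//1, extract_refs/3 and ref_def//2 are invented names, and reference definitions are restricted to the simplest one-line [id]: url form, without titles or indentation. The DCG splits the input on \r\n, \r or \n, so later stages only see plain lines; lines that define references are removed and collected as Id-Url pairs with lowercased ids.

```prolog
% Hypothetical preprocessing pass (SWI-Prolog 7+): split the input into
% lines, normalising \r\n, \r and \n, and collect reference definitions.

% preprocess(+Codes, -Lines, -Refs)
preprocess(Codes, Lines, Refs) :-
    phrase(lines(AllLines), Codes),
    extract_refs(AllLines, Lines, Refs).

% lines(-Lines)//: Lines is a list of code lists without line endings.
lines([Line|Lines]) -->
    line_codes(Line),
    (   eol
    ->  lines(Lines)
    ;   { Lines = [] }                % end of input reached
    ).

line_codes([C|Cs]) -->
    [C], { C =\= 0'\n, C =\= 0'\r }, !,
    line_codes(Cs).
line_codes([]) --> [].

% Any of the three line-ending conventions counts as one line break.
eol --> [0'\r, 0'\n], !.
eol --> [0'\r], !.
eol --> [0'\n].

% extract_refs(+Lines, -Rest, -Refs): Refs holds Id-Url pairs,
% Rest holds all lines that are not reference definitions.
extract_refs([], [], []).
extract_refs([Line|Lines], Rest, [Id-Url|Refs]) :-
    phrase(ref_def(Id, Url), Line), !,
    extract_refs(Lines, Rest, Refs).
extract_refs([Line|Lines], [Line|Rest], Refs) :-
    extract_refs(Lines, Rest, Refs).

% Simplified definition "[id]: url"; ids are lowercased.
ref_def(Id, Url) -->
    `[`, label(IdCodes), `]:`, spaces, nonspace(UrlCodes), spaces,
    {   IdCodes \== [], UrlCodes \== [],
        atom_codes(Id0, IdCodes), downcase_atom(Id0, Id),
        atom_codes(Url, UrlCodes)
    }.

label([C|Cs]) --> [C], { C =\= 0'] }, label(Cs).
label([]) --> [].

spaces --> [C], { code_type(C, white) }, !, spaces.
spaces --> [].

nonspace([C|Cs]) --> [C], { \+ code_type(C, space) }, !, nonspace(Cs).
nonspace([]) --> [].
```

Lowercasing the ids makes [Foo] and [foo] resolve to the same definition, as Markdown reference ids are case-insensitive; the span-level parser can then look a link up with, for example, memberchk(Id-Url, Refs).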
The test suite more than doubled in size. I added many test cases for block boundaries in tight and non-tight Markdown.
- Lines of code (without tests): 1709
- Lines of test code: 985
- Number of test cases: 169