MarkdownDeep - Design Notes
Understanding the code base
MarkdownDeep was designed with the following goals in mind:
To provide an extremely efficient server-side implementation of Markdown.
To provide customization of the generated output markdown.
To provide built in XSS attack prevention.
The Markdown to HTML Transformation Process
MarkdownDeep takes a different approach from most other Markdown processors. Instead of using a series of regular expression replacements, MarkdownDeep builds a document hierarchy, which it then renders.
Block processing is the process of parsing the input document into a hierarchical tree representing the structure of the document.
The input text is loaded into a StringScanner, which is then passed to a BlockProcessor.
The BlockProcessor evaluates the input text, creating a Block for each line.
Sequences of line blocks are then assessed, and related lines are merged to form the actual block structure of the document (e.g. consecutive indent blocks are folded into code blocks, consecutive list items into lists, and so on).
Nested structures (e.g. blockquotes inside blockquotes) are also handled by the BlockProcessor, which creates a new input string containing the nested input and recursively passes it through another BlockProcessor.
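The block-processing steps above can be illustrated with a small sketch. It is written in Python for brevity and is not MarkdownDeep's C# code: the `Block` fields and the `classify` and `parse_blocks` helpers are hypothetical names chosen for this example, and only indented code, blockquotes and paragraphs are handled.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    # Hypothetical stand-in for a block node, not MarkdownDeep's Block class.
    kind: str                                    # "blank", "code", "quote" or "para"
    text: str = ""                               # content with the marker stripped
    children: list = field(default_factory=list)


def classify(line):
    """Pass 1: turn one input line into a line-level block."""
    if not line.strip():
        return Block("blank")
    if line.startswith("    "):
        return Block("code", line[4:])
    if line.startswith(">"):
        rest = line[1:]
        if rest.startswith(" "):                 # drop one optional space after ">"
            rest = rest[1:]
        return Block("quote", rest)
    return Block("para", line.strip())


def parse_blocks(text):
    """Pass 2: merge runs of related line blocks, then recurse into quotes."""
    blocks = []
    for line_block in map(classify, text.splitlines()):
        same_kind = bool(blocks) and blocks[-1].kind == line_block.kind
        if line_block.kind != "blank" and same_kind:
            # Fold this line into the block started by the previous line.
            blocks[-1].text += "\n" + line_block.text
        else:
            blocks.append(line_block)
    # Blank lines only served as separators; drop them from the result.
    blocks = [b for b in blocks if b.kind != "blank"]
    for block in blocks:
        if block.kind == "quote":
            # The nested input is parsed again as a document of its own.
            block.children = parse_blocks(block.text)
    return blocks
```

Calling `parse_blocks("> outer\n> > inner")` yields one quote block whose children are a paragraph and a further quote block, mirroring the recursive handling of nested structures described above.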
Once the block structure of the document has been parsed, those blocks are recursively rendered to a StringBuilder to produce the final output. As part of the rendering process, spans of text are formatted using a SpanFormatter.
Each span of text is tokenized into a series of Tokens representing the internal content of the text span.
Balanced tokens such as emphasis and strong are matched against each other.
Tokens are rendered to a StringBuilder to produce the final output.
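As a rough illustration of the tokenize, match and render steps, the following Python sketch handles only `*` and `**` markers. It is an assumption-laden toy rather than MarkdownDeep's implementation: `Token`, `tokenize`, `match_marks`, `render` and `format_span` are hypothetical names, and the matching rule (pair a marker with the most recent unmatched marker of the same kind) is deliberately simpler than real Markdown emphasis rules.

```python
import html
import re
from dataclasses import dataclass

TAGS = {"*": "em", "**": "strong"}


@dataclass
class Token:
    kind: str     # "text", "mark", "open" or "close"
    value: str    # literal text, or the marker itself ("*" or "**")


def tokenize(span):
    """Split a span into text tokens and emphasis-marker tokens."""
    tokens = []
    for piece in re.split(r"(\*\*|\*)", span):
        if piece:
            tokens.append(Token("mark" if piece in TAGS else "text", piece))
    return tokens


def match_marks(tokens):
    """Pair each marker with the most recent unmatched marker of the same kind."""
    unmatched = []                       # stack of indexes into tokens
    for i, tok in enumerate(tokens):
        if tok.kind != "mark":
            continue
        if unmatched and tokens[unmatched[-1]].value == tok.value:
            tokens[unmatched.pop()].kind = "open"
            tok.kind = "close"
        else:
            unmatched.append(i)
    # Markers left on the stack keep kind "mark" and render as literal text.
    return tokens


def render(tokens):
    """Write the tokens out; the list plays the role of a StringBuilder."""
    out = []
    for tok in tokens:
        if tok.kind == "open":
            out.append("<%s>" % TAGS[tok.value])
        elif tok.kind == "close":
            out.append("</%s>" % TAGS[tok.value])
        else:
            out.append(html.escape(tok.value, quote=False))
    return "".join(out)


def format_span(span):
    return render(match_marks(tokenize(span)))
```

For example, `format_span("a *b **c** d* e")` returns `a <em>b <strong>c</strong> d</em> e`, while an unbalanced marker, as in `format_span("2 * 3")`, is emitted unchanged.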
This approach has its pros and cons:
The C# implementation is significantly faster as the input text is scanned far fewer times.
The code base is larger, but can also be debugged and extended more easily.
Ambiguous Markdown Input
There are cases where Markdown input can be interpreted in multiple ways. In these cases MarkdownDeep favours performance, code maintainability and correct mark-up over compatibility with other implementations, and may therefore generate different output.
For example, when dealing with nested emphasis and bold indicators such as this:
many implementations of Markdown will generate this:
whereas MarkdownDeep will produce the more correct:
In all cases where the Markdown syntax is unambiguous, MarkdownDeep generates equivalent output.
Whitespace in Output
MarkdownDeep makes no effort to generate output that matches the whitespace output of other implementations, nor does it maintain the whitespace of the input text except where that whitespace affects the final rendered page (e.g. code blocks).
At some point an option may be added to produce pretty-printed (indented) output.
Use of HTML Entities
MarkdownDeep makes no promises about the use of HTML entities in its output and may generate different (but equivalent) output from other Markdown processors. For example, MarkdownDeep transforms > into &gt;, whereas some other Markdown processors do not.
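To see why two such outputs count as equivalent, the short Python sketch below uses the standard library's `html` module (not MarkdownDeep) to show that the escaped and unescaped spellings decode to the same text. The helper name `same_rendered_text` is hypothetical.

```python
import html


def same_rendered_text(a, b):
    """True if two HTML text fragments decode to the same characters."""
    return html.unescape(a) == html.unescape(b)


# One processor might emit the entity, another the raw character.
escaped = html.escape("1 > 0", quote=False)
literal = "1 > 0"
equivalent = same_rendered_text(escaped, literal)
```

Here `escaped` holds the entity form and `equivalent` is true, since a browser displays both fragments identically.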