MarkdownDeep - Design Notes

Understanding the code base


MarkdownDeep was designed with the following goals in mind:

  • To provide an extremely efficient server-side implementation of Markdown.

  • To provide a 100% compatible JavaScript implementation.

  • To allow customization of the generated output.

  • To provide built-in XSS attack prevention.

The Markdown to HTML Transformation Process

MarkdownDeep takes a different approach from most other Markdown processors. Instead of applying a series of regular expression replacements, MarkdownDeep builds a document hierarchy, which it then renders.

Block processing is the process of parsing the input document into a hierarchical tree representing the structure of the document.

  1. The input text is loaded into a StringScanner which is then passed to a BlockProcessor.

  2. The BlockProcessor evaluates the input text creating a Block for each line.

  3. Sequences of line blocks are then assessed and related lines are merged to form the actual block structure of the document (e.g. consecutive indent blocks are folded into code blocks, consecutive list items into lists, etc.).

  4. Nested structures (e.g. blockquotes inside blockquotes) are also handled by the BlockProcessor, which creates a new input string containing the nested input and recursively passes it through another BlockProcessor instance.
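
The two-pass idea in the steps above can be illustrated with a minimal JavaScript sketch. This is not MarkdownDeep's actual code: the function names (classifyLines, buildBlocks) and the block shapes are hypothetical, it splits on newlines rather than using a StringScanner, and it only recognises paragraphs, indented code and blockquotes.

```javascript
// Hypothetical sketch; names and block shapes are NOT MarkdownDeep's API.

// Pass 1: turn every input line into its own "line block".
function classifyLines(text) {
  return text.split("\n").map((line) => {
    if (/^\s*$/.test(line)) return { type: "blank", text: "" };
    if (/^( {4}|\t)/.test(line)) {
      return { type: "indent", text: line.replace(/^( {4}|\t)/, "") };
    }
    if (/^> ?/.test(line)) {
      return { type: "quote", text: line.replace(/^> ?/, "") };
    }
    return { type: "plain", text: line };
  });
}

// Pass 2: fold runs of related line blocks into structural blocks.
function buildBlocks(text) {
  const lines = classifyLines(text);
  const blocks = [];
  let i = 0;
  while (i < lines.length) {
    const type = lines[i].type;
    if (type === "blank") { i++; continue; }

    // Collect the run of consecutive lines of the same kind.
    const run = [];
    while (i < lines.length && lines[i].type === type) {
      run.push(lines[i].text);
      i++;
    }

    if (type === "indent") {
      blocks.push({ type: "code", text: run.join("\n") });
    } else if (type === "quote") {
      // Nested structure: the quote markers were stripped in pass 1, so the
      // inner text is simply parsed again by a recursive call.
      blocks.push({ type: "blockquote", children: buildBlocks(run.join("\n")) });
    } else {
      blocks.push({ type: "para", text: run.join(" ") });
    }
  }
  return blocks;
}
```

Calling buildBlocks("> > inner\n> outer") in this sketch yields a blockquote whose children are a nested blockquote and a paragraph, which mirrors how a nested input string is handed to a fresh processor.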

Once the block structure of the document has been parsed, those blocks are recursively rendered to a StringBuilder to produce the final output. As part of the rendering process, spans of text are formatted using a SpanFormatter:

  1. Each span of text is tokenized into a series of Tokens representing the internal content of the text span.

  2. Balanced tokens such as emphasis and strong are matched against each other.

  3. Tokens are rendered to a StringBuilder to produce the final output.
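
As an illustration of the three steps above, here is a minimal JavaScript sketch of a token-based span formatter. It is an assumption-laden toy rather than MarkdownDeep's SpanFormatter: the names (tokenize, matchMarks, render, formatSpan) are hypothetical, only asterisk emphasis is handled, and no HTML escaping is performed.

```javascript
// Hypothetical sketch; names and token shapes are NOT MarkdownDeep's API.

// Step 1: split the span into text tokens and runs of asterisks.
function tokenize(text) {
  const tokens = [];
  const re = /\*+|[^*]+/g;
  let m;
  while ((m = re.exec(text)) !== null) {
    if (m[0].charAt(0) === "*") {
      const before = text.charAt(m.index - 1);          // "" at start of input
      const after = text.charAt(m.index + m[0].length); // "" at end of input
      tokens.push({
        type: "run",
        count: m[0].length,
        canClose: before !== "" && !/\s/.test(before),
        canOpen: after !== "" && !/\s/.test(after),
      });
    } else {
      tokens.push({ type: "text", value: m[0] });
    }
  }
  return tokens;
}

// Step 2: pair opening and closing marks using a stack, so that matched
// pairs are always properly nested.
function matchMarks(tokens) {
  const out = [];
  const stack = []; // currently open marks, innermost last
  for (const tok of tokens) {
    if (tok.type === "text") { out.push(tok); continue; }
    let n = tok.count;
    if (tok.canClose) {
      // Close innermost marks first while the run has enough asterisks.
      while (n > 0 && stack.length > 0 && stack[stack.length - 1].size <= n) {
        const opener = stack.pop();
        opener.matched = true;
        out.push({ type: "close", size: opener.size });
        n -= opener.size;
      }
    }
    if (tok.canOpen) {
      // For an odd run, emphasis (1) is opened before strong (2), so the
      // emphasis ends up as the outer element.
      const sizes = [];
      if (n % 2 === 1) sizes.push(1);
      for (let k = 0; k < Math.floor(n / 2); k++) sizes.push(2);
      for (const size of sizes) {
        const opener = { type: "open", size, matched: false };
        stack.push(opener);
        out.push(opener);
      }
      n = 0;
    }
    // Asterisks that could neither open nor close stay literal.
    if (n > 0) out.push({ type: "text", value: "*".repeat(n) });
  }
  return out;
}

// Step 3: render tokens; openers that never found a partner stay literal.
function render(tokens) {
  const tags = { 1: "em", 2: "strong" };
  return tokens.map((tok) => {
    if (tok.type === "text") return tok.value;
    if (tok.type === "close") return "</" + tags[tok.size] + ">";
    return tok.matched ? "<" + tags[tok.size] + ">" : "*".repeat(tok.size);
  }).join("");
}

function formatSpan(text) {
  return render(matchMarks(tokenize(text)));
}
```

In this sketch, formatSpan("***test** test*") returns "<em><strong>test</strong> test</em>". Because closing marks can only ever close the innermost open mark, the output cannot contain overlapping tags.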

This approach has its pros and cons:

  • The C# implementation is significantly faster than a regular-expression-based approach, because the input text is scanned far fewer times.

  • The code base is larger, but can also be debugged and extended more easily.

  • The performance of the JavaScript implementation depends heavily on the browser. In Chrome, Firefox and Safari it performs as well as, if not better than, other regular-expression-based JavaScript implementations. In Internet Explorer it performs considerably worse, but is still acceptable for typical use. Initial testing in the IE9 Technology Preview shows performance comparable to the other browsers.

Ambiguous Markdown Input

There are cases where Markdown input can be interpreted in multiple ways. In these cases MarkdownDeep favours performance, code maintainability and correct mark-up over compatibility with other implementations, and may generate different output.

For example, when dealing with nested emphasis and bold indicators such as this:

***test** test*

many implementations of Markdown will generate this:

<p><strong><em>test</strong> test</em></p>

whereas MarkdownDeep will produce the more correct, properly nested:

<p><em><strong>test</strong> test</em></p>

In all cases where the Markdown syntax is unambiguous, MarkdownDeep generates equivalent output.

Whitespace in Output

MarkdownDeep makes no effort to generate output that matches the whitespace output of other implementations, nor does it preserve the whitespace of the input text except where that whitespace affects the rendered page (e.g. code blocks).

At some point an option may be added to produce pretty-printed (indented) output.

Use of HTML Entities

MarkdownDeep makes no promises about the use of HTML entities in its output and may generate different (but equivalent) output to other Markdown processors. For example, MarkdownDeep transforms > into &gt; whereas some other Markdown processors do not.
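
To make the idea of "different but equivalent" output concrete, the following JavaScript sketch shows one possible escaping helper. It is purely illustrative: escapeHtml is a hypothetical name, and the exact set of characters MarkdownDeep escapes is not stated here, so the set below is an assumption.

```javascript
// Hypothetical helper; NOT MarkdownDeep's actual escaping routine.
// The set of escaped characters is an assumption made for illustration.
const ENTITIES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" };

function escapeHtml(text) {
  // A single pass over the text, so an inserted "&" is never escaped twice.
  return text.replace(/[&<>"]/g, (ch) => ENTITIES[ch]);
}
```

With this helper, escapeHtml("a > b") returns "a &gt; b". A processor that leaves the > untouched emits different bytes, but a browser renders both forms identically in text content.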