MarkdownDeep - Implementation Notes

Getting started with MarkdownDeep's code base

This page describes important information for anyone interested in working on the MarkdownDeep code base.

Code Repository

The source code for MarkdownDeep is available from GitHub:

(If you're not familiar with Git, GitHub has several useful tutorials.)

Development Environment

Developing against the MarkdownDeep solution requires:

  • Visual Studio 2008, with C#, web development and C++ installed.

  • NUnit for running unit tests.

  • MiniME for minification of the JavaScript edition (included in the Tools directory)

  • Info-zip command line utility (included in the Tools directory)

To run the unit tests, edit the project settings for the MarkdownDeepTests project and set the following:

  • Start External Program - C:\Program Files (x86)\NUnit 2.5.5\bin\net-2.0\nunit.exe (or equivalent)

  • Command line arguments - MarkdownDeepTests.dll /run

Project Overview

The MarkdownDeep codebase is contained within a single Visual Studio 2008 solution with the following projects:

MarkdownDeep

The C# implementation

MarkdownDeepBenchmark

Benchmark tests for the C# implementation (tests are the same as the benchmark tests included in MarkdownSharp).

MarkdownDeepGui

A WinForms project that allows entry of Markdown text into a textbox and shows the transformed text output and preview in a WebBrowser control. Great for testing.

MarkdownDeepJS

The Javascript implementation. Contained in a Visual Studio Utility project that runs MiniME to generate the minimized Javascript.

MarkdownDeepTests

A comprehensive set of 350+ NUnit test cases. Tests both the C# and Javascript implementations.

MarkdownDevBed

A simple console app that processes a single input file - used during development.

Classes Overview

The implementation of both the C# and Javascript editions is pretty much identical so these class descriptions apply to both code bases.

Abbreviation class

Stores information about a single abbreviation declaration.

Block class

Represents the block level structures of a Markdown document, e.g. a paragraph, list, list item, blockquote or code block. Also used by the BlockProcessor to store information about a single line of input.

BlockProcessor class

Parses the input text from a StringScanner into a tree of blocks.

FootnoteReference class

Stores information about a single footnote declaration.

HtmlTag class

Represents an HTML element tag, including name, attributes, closing tag info etc. Has static methods for parsing an HTML tag out of a StringScanner.

LinkDefinition class

Stores information about a link or image definition - either parsed from a Markdown reference style link definition, or an inline link definition. Has static methods for parsing a link definition or link target from a StringScanner. A LinkDefinition doesn't include the link text - see LinkInfo for that.

LinkInfo class

Stores information about a specific link instance - specifically the link text and a reference to the associated LinkDefinition.

Markdown class

Implements the public API to MarkdownDeep, including the main Transform method, properties for configurable options and object pooling.

SpanFormatter class

Scans a string of input Markdown text from a StringScanner, tokenizes it and renders the final output into a StringBuilder. Spans are internal to blocks (similar to HTML block vs inline elements).

StringBuilder class

For C#, the standard .NET StringBuilder class. For Javascript a simple class that performs similar functionality.

StringScanner class

Provides a framework for maintaining a position in a string being scanned with helper methods for skipping, finding and generally working with the input text.
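To make the idea concrete, here is a minimal sketch of a scanner that keeps a position in a string and offers a few skip/find helpers. The name MiniScanner and all of its method names are hypothetical and chosen for illustration only; they are not taken from MarkdownDeep's actual StringScanner.

```javascript
// Hypothetical minimal scanner; names are illustrative, not MarkdownDeep's API.
function MiniScanner(text) {
    this.text = text;
    this.pos = 0;
}

// Character at the current position, or "" at the end of the input.
MiniScanner.prototype.current = function () {
    return this.pos < this.text.length ? this.text.charAt(this.pos) : "";
};

// True once the whole input has been consumed.
MiniScanner.prototype.eof = function () {
    return this.pos >= this.text.length;
};

// Skip spaces and tabs; returns true if anything was skipped.
MiniScanner.prototype.skipLinespace = function () {
    var start = this.pos;
    while (this.current() === " " || this.current() === "\t") {
        this.pos++;
    }
    return this.pos > start;
};

// If the input at the current position starts with str, consume it.
MiniScanner.prototype.skipString = function (str) {
    if (this.text.substring(this.pos, this.pos + str.length) === str) {
        this.pos += str.length;
        return true;
    }
    return false;
};

// Move to the next occurrence of str; the position is unchanged if not found.
MiniScanner.prototype.find = function (str) {
    var i = this.text.indexOf(str, this.pos);
    if (i < 0) {
        return false;
    }
    this.pos = i;
    return true;
};

// Extract the text between a previously saved position and the current one.
MiniScanner.prototype.extract = function (from) {
    return this.text.substring(from, this.pos);
};
```

A caller would typically save a position, advance with one of the helpers, and then extract the text in between, for example to pull the id out of a "[id]: url" style line.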

TableSpec class

Stores the definition of a simple table, including column alignments, header text and row information.

Token class

Represents a single element in a tokenized span. Used by the SpanFormatter class to represent the internal structure of a span of text.

Differences in the Javascript Edition

The Javascript edition is nearly identical to the C# edition with some notable exceptions:

  • The StringScanner class is never derived from - rather it's passed around as a parameter. The C# implementation will probably be changed to match.

  • Members that are properties in the C# version are implemented as either member variables or as accessor methods of the form get_xxx() and set_xxx().

  • For performance reasons, there is occasional use of regular expressions. Particularly in Internet Explorer, Javascript is not as efficient at string scanning as C#, and some limited use of Regex allows better performance without radically differentiating the code base of both editions.

  • The entire implementation is wrapped in a closure to provide appropriate namespacing/access control. Unfortunately this seemed to adversely affect performance, but not enough to warrant removal of the closure.

  • Some properties have been prefixed with m_ to facilitate minification.
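The closure, accessor and m_ conventions listed above can be sketched together as follows. This is only an illustration of the pattern: ExampleNamespace, Formatter, safeMode and escapeHtml are hypothetical names, not identifiers from the MarkdownDeep source.

```javascript
// Hypothetical sketch: everything lives inside one closure and only the
// returned namespace object is visible to callers.
var ExampleNamespace = (function () {
    // Private helper, not reachable from outside the closure.
    function escapeHtml(str) {
        return str.replace(/&/g, "&amp;")
                  .replace(/</g, "&lt;")
                  .replace(/>/g, "&gt;");
    }

    function Formatter() {
        // m_ prefixed member standing in for a C# property's backing field.
        this.m_safeMode = false;
    }

    // Accessor pair standing in for a C# property.
    Formatter.prototype.get_safeMode = function () {
        return this.m_safeMode;
    };
    Formatter.prototype.set_safeMode = function (value) {
        this.m_safeMode = value;
    };

    // Escapes the input only when safe mode is switched on.
    Formatter.prototype.format = function (str) {
        return this.m_safeMode ? escapeHtml(str) : str;
    };

    // Only what is returned here becomes part of the public surface.
    return { Formatter: Formatter };
})();
```

Callers write f.set_safeMode(true) where the C# edition would assign to a property, and helpers such as escapeHtml stay invisible outside the closure.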

Special Case Handling

MarkdownDeep needs special handling in a number of cases:

Reverting setext headings to horizontal rules or normal paragraphs

During line parsing, lines that look like === or --- are marked as BlockType.post_h1 and BlockType.post_h2. If these can't be matched to a preceding line to be made into a heading, they're reverted to either a horizontal rule for the --- or a normal paragraph for the ===.
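A minimal sketch of this revert step is shown below. Only the post_h1 and post_h2 type names come from the description above; the function names, the other type strings and the simplified line patterns (three or more = or - characters) are assumptions made for illustration.

```javascript
// Hypothetical line classifier; deliberately simplified.
function classifyLine(line) {
    if (/^={3,}\s*$/.test(line)) return "post_h1";
    if (/^-{3,}\s*$/.test(line)) return "post_h2";
    if (/^\s*$/.test(line)) return "blank";
    return "p";
}

// Turns classified lines into blocks, reverting unmatched underlines.
function resolveSetext(lines) {
    var blocks = [];
    for (var i = 0; i < lines.length; i++) {
        var type = classifyLine(lines[i]);
        var prev = blocks.length > 0 ? blocks[blocks.length - 1] : null;

        if (type === "post_h1" || type === "post_h2") {
            if (prev !== null && prev.type === "p") {
                // Matched: the preceding paragraph line becomes the heading
                // and the underline itself produces no block.
                prev.type = (type === "post_h1") ? "h1" : "h2";
                continue;
            }
            // Unmatched: --- reverts to a horizontal rule, === to a paragraph.
            type = (type === "post_h2") ? "hr" : "p";
        }
        blocks.push({ type: type, text: lines[i] });
    }
    return blocks;
}
```

In this sketch an underline directly after a paragraph line upgrades that line to a heading, while an underline after a blank line (or at the start of the input) is reverted.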

Reverting list items to normal paragraphs

When a line starts with a * or 1. style list item indicator, it's marked as a list item. If that line immediately follows a normal paragraph line the line needs to be considered part of the paragraph and not a list item. In this case the list item is reverted to a plain paragraph including the leading * or 1. prefix.
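The rule can be sketched as a single pass that remembers the type of the previous line. The function names and type strings are hypothetical, and the marker pattern only covers the * and 1. styles mentioned above.

```javascript
// Hypothetical classifier: blank, list item, or plain paragraph line.
function classifyListLine(line) {
    if (/^\s*$/.test(line)) return "blank";
    if (/^\s*(\*|\d+\.)\s+/.test(line)) return "li";
    return "p";
}

// Reverts list items that directly follow a paragraph line.
function revertListItems(lines) {
    var result = [];
    var prevType = "blank";
    for (var i = 0; i < lines.length; i++) {
        var type = classifyListLine(lines[i]);
        if (type === "li" && prevType === "p") {
            // Continuation of the paragraph: the text, including its
            // leading marker, is kept unchanged.
            type = "p";
        }
        result.push({ type: type, text: lines[i] });
        prevType = type;
    }
    return result;
}
```

Because the reverted line is itself recorded as a paragraph line, a second marker line directly after it is reverted as well.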

Leading spaces on list items

List item levels can be increased by spaces rather than the normal 4 character or tab indent mechanism. In determining list levels, the normal indent classification of a block can't be relied on. In this case, when building the list blocks, the leading space on list items is examined and any lines that have more leading space than the first item in the list are promoted from list item blocks to indent blocks.
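As a rough sketch of the promotion step, the function below compares each item's leading space with that of the first item in the list. The function names and the "li" and "indent" type strings are assumptions for illustration, and only space characters are counted.

```javascript
// Number of leading space characters on a line.
function leadingSpaces(line) {
    return line.length - line.replace(/^ +/, "").length;
}

// Hypothetical promotion pass over the raw lines of one list.
function promoteNestedItems(items) {
    if (items.length === 0) {
        return [];
    }
    var base = leadingSpaces(items[0]);
    return items.map(function (line) {
        return {
            // More leading space than the first item means a nested level.
            type: leadingSpaces(line) > base ? "indent" : "li",
            text: line
        };
    });
}
```

Here a line such as " * nested" following "* one" is promoted to an indent block even though it has far less than a four character indent.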

Html Block Processing

Normally the block processor works on a line by line basis. When it detects a block HTML tag, it instead processes the entire multi-line structure as a single block immediately.

Matching of <strong> and <em> tokens

The algorithm for matching emphasis markers is documented in the span formatter class.

Titled Figures

In order to implement titled figures, what would normally be rendered as a p tag needs to be replaced with a div tag when the paragraph contains only an image. Since we don't know about the content of the paragraph until after the paragraph has been tokenized by the SpanFormatter, there's a special method on the SpanFormatter called FormatParagraph that is used when rendering p blocks. It tokenizes the paragraph content and then renders either the normal p or the titled figure div tag.
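The decision described above can be sketched as "tokenize first, then pick the wrapper". Everything in this example is an assumption made for illustration: the toy tokenizer only recognizes a paragraph consisting solely of one image, and the "figure" class name and output markup are not MarkdownDeep's actual output.

```javascript
// Escape text for use in HTML content and attribute values.
function esc(str) {
    return str.replace(/&/g, "&amp;").replace(/</g, "&lt;")
              .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Toy tokenizer: one img token for ![alt](url "title"), else one text token.
function tokenizeParagraph(text) {
    var m = /^\s*!\[([^\]]*)\]\(\s*(\S+?)(?:\s+"([^"]*)")?\s*\)\s*$/.exec(text);
    if (m) {
        return [{ type: "img", alt: m[1], url: m[2], title: m[3] || "" }];
    }
    return [{ type: "text", text: text }];
}

// Hypothetical stand-in for a FormatParagraph style method.
function formatParagraph(text) {
    var tokens = tokenizeParagraph(text);
    if (tokens.length === 1 && tokens[0].type === "img") {
        // Paragraph contains only an image: render a div instead of a p.
        var t = tokens[0];
        var caption = t.title ? "<p>" + esc(t.title) + "</p>" : "";
        return '<div class="figure"><img src="' + esc(t.url) +
               '" alt="' + esc(t.alt) + '" />' + caption + "</div>";
    }
    // Anything else renders as a normal paragraph.
    return "<p>" + esc(text) + "</p>";
}
```

The point of the sketch is the ordering: the wrapper tag is chosen only after tokenizing, because the paragraph's content is not known before then.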

Miscellaneous Notes

  • Currently the Markdown object is passed to many objects and methods where it's not actually used. The intention is that as more configuration options are added, this object will act as the storage for these settings and having this object available everywhere will make those properties available where needed.