MarkdownDeep - Implementation Notes
Getting started with MarkdownDeep's code base
This page contains important information for anyone interested in working on the MarkdownDeep code base.
The source code for MarkdownDeep is available from GitHub:
(If you're not familiar with Git, GitHub has several useful tutorials.)
Developing against the MarkdownDeep solution requires:
Visual Studio 2008, with C#, web development and C++ installed.
NUnit for running unit tests.
Info-zip command line utility (included in the Tools directory)
To run the unit tests, edit the project settings for the MarkdownDeepTests project and set the following:
Start External Program -
C:\Program Files (x86)\NUnit 2.5.5\bin\net-2.0\nunit.exe (or equivalent)
Command line arguments -
The MarkdownDeep codebase is contained within a single Visual Studio 2008 solution with the following projects:
The C# implementation
Benchmark tests for the C# implementation (tests are the same as the benchmark tests included in MarkdownSharp).
A WinForms project that allows entry of Markdown text into a textbox and shows the transformed text output and preview in a WebBrowser control. Great for testing.
A simple console app that processes a single input file - used during development.
Stores information about a single abbreviation declaration.
Represents the block-level structures of a Markdown document, e.g. a paragraph, list, list item, blockquote or code block. Also used by the BlockProcessor to store information about a single line of input.
Parses the input text from a StringScanner into a tree of blocks.
Stores information about a single footnote declaration.
Represents an HTML element tag, including name, attributes, closing tag info etc. Has static methods for parsing an HTML tag out of
Stores information about a link or image definition - either parsed from a Markdown reference style link definition, or an inline link definition. Has static methods for parsing a link definition or link target from a
LinkDefinition doesn't include the link text - see
Stores information about a specific link instance - specifically the link text and a reference to the associated LinkDefinition.
Implements the public API to MarkdownDeep, including the main Transform method, properties for configurable options and object pooling.
Scans a string of input Markdown text from a StringScanner, tokenizes it and renders the final output into a StringBuilder. Spans are internal to blocks (similar to HTML block vs inline elements).
For C#, the standard .NET
Provides a framework for maintaining a position in a string being scanned with helper methods for skipping, finding and generally working with the input text.
Stores the definition of a simple table, including column alignments, header text and row information.
Represents a single element in a tokenized span. Used by the SpanFormatter class to represent the internal structure of a span of text.
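To make the role of StringScanner more concrete, here is a minimal JavaScript sketch of a scanner that tracks a position in a string and offers helpers for skipping, finding and extracting text. The method names (eof, current, skipWhitespace, find, setMark, extract) are illustrative assumptions and are not taken from MarkdownDeep's source.

```javascript
// Illustrative sketch only: method names are assumptions, not MarkdownDeep's API.
class StringScanner {
  constructor(text) {
    this.text = text;
    this.pos = 0;  // current scan position
    this.mark = 0; // start of the span to extract
  }

  // True once the position has reached the end of the input.
  eof() {
    return this.pos >= this.text.length;
  }

  // Character at the current position, or "\0" at end of input.
  current() {
    return this.eof() ? "\0" : this.text[this.pos];
  }

  // Advance past any run of whitespace.
  skipWhitespace() {
    while (!this.eof() && /\s/.test(this.current())) this.pos++;
  }

  // Move to the next occurrence of needle; stay put and return false if absent.
  find(needle) {
    const i = this.text.indexOf(needle, this.pos);
    if (i < 0) return false;
    this.pos = i;
    return true;
  }

  // Remember the current position as the start of a span.
  setMark() {
    this.mark = this.pos;
  }

  // Text between the mark and the current position.
  extract() {
    return this.text.substring(this.mark, this.pos);
  }
}

// Usage: pull the id out of a reference-style link definition.
const scanner = new StringScanner("   [id]: http://example.com");
scanner.skipWhitespace();
scanner.pos++; // step over "["
scanner.setMark();
scanner.find("]");
const linkId = scanner.extract();
```

Because the scanner is an ordinary object holding only text and a position, it can be handed to any parsing routine as a parameter, which matches the "passed around rather than derived from" style described in these notes.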
The StringScanner class is never derived from - rather it's passed around as a parameter. The C# implementation will probably be changed to match.
Members that are properties in the C# version are implemented as either member variables or as accessor methods of the form
The entire implementation is wrapped in a closure to provide appropriate namespacing/access control. Unfortunately this seemed to adversely affect performance, but not enough to warrant removal of the closure.
Some properties have been prefixed with m_ to facilitate minification.
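As a rough illustration of the closure approach, the sketch below wraps a tiny implementation in an immediately invoked function so that helpers stay private and only a chosen constructor is exposed. Every name in it (Demo, Converter, escapeAll, m_count, escapeHtml) is hypothetical and chosen for the example; none is taken from MarkdownDeep.

```javascript
// Illustrative sketch only: all names here are hypothetical.
const Demo = (function () {
  // Private helper: visible only inside the closure.
  function escapeHtml(s) {
    return s
      .replace(/&/g, "&amp;")
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;");
  }

  function Converter() {
    this.escapeAll = true; // configurable option kept as a plain member variable
    this.m_count = 0;      // internal member, using the m_ prefix mentioned above
  }

  Converter.prototype.convert = function (text) {
    this.m_count++;
    return this.escapeAll ? escapeHtml(text) : text;
  };

  // Only what is returned here is reachable from outside the closure.
  return { Converter: Converter };
})();

const converter = new Demo.Converter();
const converted = converter.convert("a < b");
```

The trade-off is the one noted above: every call to a private helper goes through the closure's scope, which can cost some performance in exchange for keeping the global namespace clean.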
Special Case Handling
MarkdownDeep needs special handling in a number of cases:
- Reverting setext headings to horizontal rules or normal paragraphs
During line parsing, lines that look like --- are marked as BlockType.post_h2. If these can't be matched to a preceding line to form a heading, they're reverted to either a horizontal rule for the --- or a normal paragraph for the
- Reverting list items to normal paragraphs.
When a line starts with a 1. style list item indicator, it's marked as a list item. If that line immediately follows a normal paragraph line, it needs to be considered part of the paragraph and not a list item. In this case the list item is reverted to a plain paragraph including the leading
- Leading spaces on list items
List item levels can be increased by spaces rather than the normal 4 character or tab indent mechanism. In determining list levels, the normal indent classification of a block can't be relied on. In this case, when building the list blocks, the leading space on list items is examined and any lines that have more leading space than the first item in the list are promoted from list item blocks to indent blocks.
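The promotion step described above could be sketched as follows. This is a simplified illustration, not MarkdownDeep's code: the block shape ({ type, text }) and the type names "li" and "indent" are assumptions made for the example.

```javascript
// Illustrative sketch only: block shape and type names are assumptions.

// Number of leading whitespace characters on a line.
function leadingSpaces(line) {
  return line.length - line.trimStart().length;
}

// Promote any list item with more leading space than the first item
// in the list from a list item block to an indent block.
function promoteIndentedItems(items) {
  if (items.length === 0) return items;
  const base = leadingSpaces(items[0].text);
  return items.map((item) =>
    item.type === "li" && leadingSpaces(item.text) > base
      ? { ...item, type: "indent" }
      : item
  );
}

const promoted = promoteIndentedItems([
  { type: "li", text: "* one" },
  { type: "li", text: "  * nested" },
  { type: "li", text: "* two" },
]);
const promotedTypes = promoted.map((b) => b.type);
```

Comparing against the first item, rather than against a fixed 4-character indent, is what lets a two-space indent still count as a deeper level.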
- HTML Block Processing
Normally the block processor works on a line-by-line basis. When it detects a block-level HTML tag, it immediately processes the entire multi-line structure as a single block instead.
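One way to picture this is a helper that, on seeing an opening tag at the start of a line, keeps consuming lines until the matching closing tag is found. The sketch below is an assumption-laden simplification (the function name collectHtmlBlock is hypothetical, and it ignores self-closing tags, comments and attribute values containing ">"); it is not how MarkdownDeep itself parses HTML.

```javascript
// Illustrative sketch only: a simplified stand-in for real HTML tag parsing.
// Returns the index of the last line of the block and its text, or null
// if the line does not open a tag or the tag is never closed.
function collectHtmlBlock(lines, start) {
  const open = /^<([a-zA-Z][a-zA-Z0-9]*)\b/.exec(lines[start]);
  if (open === null) return null;

  // Match opening and closing tags with the same name as the first tag.
  const tagRe = new RegExp("<(/?)" + open[1] + "\\b[^>]*>", "gi");
  let depth = 0;

  for (let i = start; i < lines.length; i++) {
    tagRe.lastIndex = 0;
    let m;
    while ((m = tagRe.exec(lines[i])) !== null) {
      depth += m[1] === "/" ? -1 : 1;
      if (depth === 0) {
        return { end: i, html: lines.slice(start, i + 1).join("\n") };
      }
    }
  }
  return null; // unbalanced: caller falls back to line-by-line handling
}

const sampleLines = [
  "A paragraph.",
  '<div class="x">',
  "  <div>inner</div>",
  "</div>",
  "Another paragraph.",
];
const htmlBlock = collectHtmlBlock(sampleLines, 1);
```

After a successful match, a line-by-line processor would resume at the line following the returned end index.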
- Matching of
The algorithm for matching emphasis markers is documented in the span formatter class.
- Titled Figures
In order to implement titled figures, what would normally be rendered as a p tag needs to be replaced with a div tag when the paragraph contains only an image. Since we don't know about the content of the paragraph until after it has been tokenized by the SpanFormatter, there's a special method on the SpanFormatter called FormatParagraph that is used when rendering p blocks. It tokenizes the paragraph content and then renders either the normal
- Currently the Markdown object is passed to many objects and methods where it's not actually used. The intention is that as more configuration options are added, this object will act as the storage for these settings and having this object available everywhere will make those properties available where needed.
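Returning to the titled figures case above, the decision between a p tag and a div tag can be sketched roughly as follows. This is only an illustration under stated assumptions: a single regular expression stands in for the span tokenizer, the function name formatParagraph is borrowed loosely from the description, and the emitted markup (including the "figure" class) is invented for the example rather than taken from MarkdownDeep's output.

```javascript
// Illustrative sketch only: the regex replaces real tokenizing, and the
// output markup is an assumption, not MarkdownDeep's actual HTML.
const IMAGE_ONLY = /^!\[([^\]]*)\]\((\S+?)(?:\s+"([^"]*)")?\)$/;

function escapeText(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/"/g, "&quot;");
}

function formatParagraph(text) {
  const content = text.trim();
  const m = IMAGE_ONLY.exec(content);

  // Anything other than a lone image renders as a normal paragraph
  // (inline formatting is omitted to keep the sketch short).
  if (m === null) {
    return "<p>" + escapeText(content) + "</p>";
  }

  // The paragraph holds only an image: render a div instead of a p.
  const alt = m[1];
  const url = m[2];
  const title = m[3];
  let html =
    '<div class="figure"><img src="' + escapeText(url) +
    '" alt="' + escapeText(alt) + '" />';
  if (title) html += "<p>" + escapeText(title) + "</p>";
  return html + "</div>";
}

const figureHtml = formatParagraph('![Chart](chart.png "Sales")');
const plainHtml = formatParagraph("Just some text");
```

The point of the sketch is the ordering: the wrapper element can only be chosen after the paragraph's content has been examined, which is why the decision lives in the formatter rather than in the block renderer.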