Implementation Notes
This page describes important information for anyone interested in working on the MarkdownDeep code base.
Code Repository
The source code for MarkdownDeep is available from GitHub:
(If you're not familiar with Git, GitHub has several useful tutorials.)
Development Environment
Developing against the MarkdownDeep solution requires:
- Visual Studio 2008, with C#, web development and C++ installed.
- NUnit for running unit tests.
- MiniME for minification of the JavaScript edition (included in the Tools directory).
- Info-zip command line utility (included in the Tools directory).
To run the unit tests, edit the project settings for the MarkdownDeepTests project and set the following:
- Start External Program - C:\Program Files (x86)\NUnit 2.5.5\bin\net-2.0\nunit.exe (or equivalent)
- Command line arguments - MarkdownDeepTests.dll /run
Project Overview
The MarkdownDeep codebase is contained within a single Visual Studio 2008 solution with the following projects:
- MarkdownDeep
  - The C# implementation.
- MarkdownDeepBenchmark
  - Benchmark tests for the C# implementation (tests are the same as the benchmark tests included in MarkdownSharp).
- MarkdownDeepGui
  - A WinForms project that allows entry of Markdown text into a textbox and shows the transformed text output and preview in a WebBrowser control. Great for testing.
- MarkdownDeepJS
  - The Javascript implementation. Contained in a Visual Studio Utility project that runs MiniME to generate the minimized Javascript.
- MarkdownDeepTests
  - Comprehensive set of 350+ NUnit test cases. Tests both the C# and Javascript implementations.
- MarkdownDevBed
  - A simple console app that processes a single input file. Used during development.
Classes Overview
The implementation of both the C# and Javascript editions is pretty much identical so these class descriptions apply to both code bases.
- Abbreviation class - Stores information about a single abbreviation declaration.
- Block class - Represents the block level structures of a Markdown document, e.g. a paragraph, list, list item, blockquote or code block. Also used by the BlockProcessor to store information about a single line of input.
- BlockProcessor class - Parses the input text from a StringScanner into a tree of blocks.
- FootnoteReference class - Stores information about a single footnote declaration.
- HtmlTag class - Represents an HTML element tag, including name, attributes, closing tag info etc. Has static methods for parsing an HTML tag out of a StringScanner.
- LinkDefinition class - Stores information about a link or image definition, either parsed from a Markdown reference style link definition or from an inline link definition. Has static methods for parsing a link definition or link target from a StringScanner. A LinkDefinition doesn't include the link text; see LinkInfo for that.
- LinkInfo class - Stores information about a specific link instance, specifically the link text and a reference to the associated LinkDefinition.
- Markdown class - Implements the public API to MarkdownDeep, including the main Transform method, properties for configurable options and object pooling.
- SpanFormatter class - Scans a string of input Markdown text from a StringScanner, tokenizes it and renders the final output into a StringBuilder. Spans are internal to blocks (similar to HTML block vs inline elements).
- StringBuilder class - For C#, the standard .NET StringBuilder class. For Javascript, a simple class that performs similar functionality.
- StringScanner class - Provides a framework for maintaining a position in a string being scanned, with helper methods for skipping, finding and generally working with the input text.
- TableSpec class - Stores the definition of a simple table, including column alignments, header text and row information.
- Token class - Represents a single element in a tokenized span. Used by the SpanFormatter class to represent the internal structure of a span of text.
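To make the StringScanner description more concrete, here is a minimal Javascript sketch of a class that maintains a position in a string and offers skip/find helpers. It is an illustration only: the member and method names (buf, pos, eof, current, skipLinespace, skipString, find, extract) are assumptions made for this example and are not taken from the MarkdownDeep source.

```javascript
// Hypothetical sketch of a position-tracking scanner. The names used here
// are illustrative and are not MarkdownDeep's actual StringScanner API.
function StringScanner(text) {
    this.buf = text; // the input being scanned
    this.pos = 0;    // current position within buf
}

// True once the position has reached the end of the input.
StringScanner.prototype.eof = function () {
    return this.pos >= this.buf.length;
};

// Character at the current position, or "\0" at end of input.
StringScanner.prototype.current = function () {
    return this.eof() ? "\0" : this.buf.charAt(this.pos);
};

// Advance over spaces and tabs; returns true if anything was skipped.
StringScanner.prototype.skipLinespace = function () {
    var start = this.pos;
    while (this.current() === " " || this.current() === "\t") {
        this.pos++;
    }
    return this.pos > start;
};

// Consume str if the input continues with it at the current position.
StringScanner.prototype.skipString = function (str) {
    if (this.buf.startsWith(str, this.pos)) {
        this.pos += str.length;
        return true;
    }
    return false;
};

// Move to the next occurrence of str; the position is unchanged on failure.
StringScanner.prototype.find = function (str) {
    var i = this.buf.indexOf(str, this.pos);
    if (i < 0) {
        return false;
    }
    this.pos = i;
    return true;
};

// Text between a previously saved position and the current one.
StringScanner.prototype.extract = function (from) {
    return this.buf.substring(from, this.pos);
};

// Usage: pull the id out of a reference-style definition line.
var s = new StringScanner("  [id]: target");
s.skipLinespace();
s.skipString("[");
var mark = s.pos;
s.find("]");
var id = s.extract(mark);
```

A parser built this way never slices the input while scanning. It only moves pos and extracts a substring once the end of a token is known, which is the kind of usage the BlockProcessor and SpanFormatter descriptions imply.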
Differences in the Javascript Edition
The Javascript edition is nearly identical to the C# edition with some notable exceptions:
- The StringScanner class is never derived from; rather, it's passed around as a parameter. The C# implementation will probably be changed to match.
- Members that are properties in the C# version are implemented as either member variables or as accessor methods of the form get_xxx() and set_xxx().
- For performance reasons, there is occasional use of regular expressions. Particularly in Internet Explorer, Javascript is not as efficient at string scanning as C#, and some limited use of Regex allows better performance without radically differentiating the code base of both editions.
- The entire implementation is wrapped in a closure to provide appropriate namespacing/access control. Unfortunately this seemed to adversely affect performance, but not enough to warrant removal of the closure.
- Some properties have been prefixed with m_ to facilitate minification.
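The closure, accessor-method and m_ conventions listed above can be illustrated with a small sketch. Everything named here (ExampleLib, Example, clamp, m_value, get_value, set_value) is hypothetical and chosen only for the example; none of it is taken from the MarkdownDeep source.

```javascript
// Illustrative only: all names in this block are hypothetical.
var ExampleLib = (function () {
    // Private helper: visible only inside the closure.
    function clamp(n) {
        return n < 0 ? 0 : n;
    }

    function Example() {
        // m_ prefix as described in the notes above. How the minifier
        // treats such names is not shown here.
        this.m_value = 0;
    }

    // Accessor methods standing in for what would be a property in C#.
    Example.prototype.get_value = function () {
        return this.m_value;
    };
    Example.prototype.set_value = function (v) {
        this.m_value = clamp(v);
    };

    // Only the names returned here are reachable from outside the closure.
    return { Example: Example };
})();

var e = new ExampleLib.Example();
e.set_value(-5);
var stored = e.get_value(); // clamp() stored 0 instead of -5
```

Callers see only ExampleLib.Example, while clamp stays private to the closure. That is the namespacing and access control benefit the notes refer to.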
Special Case Handling
MarkdownDeep handles a number of special cases:
- Reverting setext headings to horizontal rules or normal paragraphs
  - During line parsing, lines that look like === or --- are marked as BlockType.post_h1 and BlockType.post_h2. If these can't be matched to a preceding paragraph line, they're reverted to either a horizontal rule for the --- or a normal paragraph for the ===.
- Reverting list items to normal paragraphs
  - When a line starts with a * or 1. style list item indicator, it's marked as a list item. If that line immediately follows a normal paragraph line, it needs to be considered part of the paragraph and not a list item. In this case the list item is reverted to a plain paragraph, including the leading * or 1. prefix.
- Leading spaces on list items
  - List item levels can be increased by spaces rather than the normal 4 character or tab indent mechanism. In determining list levels, the normal indent classification of a block can't be relied on. In this case, when building the list blocks, the leading space on list items is examined and any lines that have more leading space than the first item in the list are promoted from list item blocks to indent blocks.
- HTML Block Processing
  - Normally the block processor works on a line by line basis. When it detects a block HTML tag, it instead processes the entire multi-line structure as a single block immediately.
- Matching of <strong> and <em> tokens
  - The algorithm for matching emphasis markers is documented in the span formatter class.
- Titled Figures
  - In order to implement titled figures, what would normally be rendered as a p tag needs to be replaced with a div tag when the paragraph contains only an image. Since we don't know about the content of the paragraph until after the paragraph has been tokenized by the SpanFormatter, there's a special method on the SpanFormatter called FormatParagraph that is used when rendering p blocks. It tokenizes the paragraph content and then renders either the normal p or the titled figure div tag.
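The first special case, reverting unmatched setext underlines, can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual BlockProcessor code: only the names post_h1 and post_h2 come from the notes above, while the other block type names, the { type, text } block shape and the resolveSetext function are hypothetical. It also treats every paragraph as a single line, which the real implementation need not do.

```javascript
// Simplified sketch. Only post_h1 and post_h2 are named in the notes above;
// the other block types and resolveSetext itself are hypothetical.
var BlockType = {
    p: "p", h1: "h1", h2: "h2", hr: "hr",
    post_h1: "post_h1", post_h2: "post_h2"
};

// Takes line-level blocks of the form { type, text } and returns a new array
// in which each setext underline is either merged into a heading or reverted.
function resolveSetext(lines) {
    var out = [];
    for (var i = 0; i < lines.length; i++) {
        var b = lines[i];
        var isUnderline = b.type === BlockType.post_h1 ||
                          b.type === BlockType.post_h2;
        if (!isUnderline) {
            out.push({ type: b.type, text: b.text });
            continue;
        }
        var prev = out[out.length - 1];
        if (prev && prev.type === BlockType.p) {
            // Underline follows a paragraph line: promote it to a heading.
            prev.type = b.type === BlockType.post_h1 ? BlockType.h1 : BlockType.h2;
        } else if (b.type === BlockType.post_h2) {
            // Unmatched "---" reverts to a horizontal rule.
            out.push({ type: BlockType.hr, text: b.text });
        } else {
            // Unmatched "===" reverts to a normal paragraph.
            out.push({ type: BlockType.p, text: b.text });
        }
    }
    return out;
}

var resolved = resolveSetext([
    { type: BlockType.post_h2, text: "---" },   // nothing before it
    { type: BlockType.p, text: "Title" },
    { type: BlockType.post_h1, text: "===" },   // matches "Title"
    { type: BlockType.post_h1, text: "===" }    // follows a heading, not a paragraph
]);
```

The design point is that line classification stays purely local (a line of === is always tagged post_h1), and the decision about what the line really means is deferred until its neighbours are known.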
Miscellaneous Notes
- Currently the Markdown object is passed to many objects and methods where it's not actually used. The intention is that as more configuration options are added, this object will act as the storage for these settings and having this object available everywhere will make those properties available where needed.