Implementation Notes

Getting started with MiniME's code base

This page describes important information for anyone interested in working on the MiniME code base.

Code Repository

The source code for MiniME is available from GitHub:

(If you're not familiar with Git, GitHub has several useful tutorials.)

Development Environment

To build MiniME you will need:

  • Visual Studio 2008, with C# installed.
  • NUnit for running unit tests.
  • Info-ZIP command line utility.

Unit Tests

To run the unit tests, edit the project settings for the MiniMETestCases project and set the following:

  • Start External Program - C:\Program Files (x86)\NUnit 2.5.5\bin\net-2.0\nunit.exe (or equivalent)
  • Command line arguments - MiniMETestCases.dll /run

Each unit test is defined as a text file resource that uses lines of hyphens to delimit the end of the input script from the expected output script, for example:

// This is a test script
function fn()
{
}
-----
function fn(){}
-----

Comments about the test can be inserted as regular JavaScript comments in the input text area.

Some MiniME options can be set using a JavaScript comment with the option name surrounded in square brackets - see the included test scripts for examples.
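
As a rough illustration of the test-case format described above, the following sketch splits a test resource into its input script and expected output. It is not taken from the MiniME test runner: the function name `splitTestCase` is hypothetical, and it assumes that a delimiter is any line made up only of three or more hyphens.

```javascript
// Hypothetical helper, not part of MiniME: splits a test-case resource
// into the input script and the expected output script.
// Assumption: a delimiter is a line consisting only of three or more hyphens.
function splitTestCase(text) {
  const sections = [[]];
  for (const line of text.split(/\r?\n/)) {
    if (/^-{3,}\s*$/.test(line)) {
      sections.push([]); // a hyphen line closes the current section
    } else {
      sections[sections.length - 1].push(line);
    }
  }
  const [input = [], expected = []] = sections;
  return { input: input.join("\n"), expected: expected.join("\n") };
}

const sample = [
  "// This is a test script",
  "function fn()",
  "{",
  "}",
  "-----",
  "function fn(){}",
  "-----",
].join("\n");

const testCase = splitTestCase(sample);
console.log(testCase.input);    // the four lines before the first delimiter
console.log(testCase.expected); // function fn(){}
```

A test then passes when minifying `testCase.input` produces exactly `testCase.expected`.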

How it Works

In processing a JavaScript file, MiniME does the following:

  1. Loads the file into a StringScanner.
  2. Creates a Tokenizer that reads characters from the StringScanner.
  3. Creates a Parser that reads tokens from the Tokenizer and generates a complete Abstract Syntax Tree.
  4. Runs the VisitorScopeBuilder visitor over the AST, creates a hierarchy of SymbolScopes and connects each to its defining function.
  5. Runs the VisitorCombineVarDecls visitor to combine consecutive variable declarations into a single statement.
  6. Runs the VisitorSymbolDeclaration visitor to find all symbol and member declarations and populates each SymbolScope with the symbols and members it defines.
  7. Runs the three VisitorConstDetectorPass[1,2,3] visitors to detect constant symbol declarations, eliminate symbols that are subsequently modified (and therefore not const) and finally remove symbols that are determined to be constant.
  8. Runs the VisitorSimplifyExpressions visitor that instructs all expressions to attempt to simplify themselves.
  9. Runs the VisitorSymbolUsage visitor to calculate the usage frequency of all symbols and members.
  10. Prepares all SymbolScopes by sorting used symbols by frequency and determining which symbols can be obfuscated.
  11. Renders the AST, building the final output script.
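
Steps 9 and 10 above can be sketched roughly as follows. This is a simplified illustration, not MiniME's implementation: the names `shortName` and `allocateNames` are hypothetical, the input is a flat list of symbol usages rather than an AST, and the sketch ignores scoping, accessibility and reserved words.

```javascript
// Hypothetical sketch, not MiniME's code: rank symbols by usage frequency
// and give the shortest names to the most frequently used symbols.

// Maps 0 -> "a", 25 -> "z", 26 -> "aa", 27 -> "ab", ...
// A real allocator would also have to skip reserved words such as "do" or "if".
function shortName(index) {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  let name = "";
  let n = index;
  do {
    name = letters[n % 26] + name;
    n = Math.floor(n / 26) - 1;
  } while (n >= 0);
  return name;
}

function allocateNames(usages) {
  // Count how often each symbol is used.
  const frequency = new Map();
  for (const name of usages) {
    frequency.set(name, (frequency.get(name) || 0) + 1);
  }
  // Sort by descending frequency; the position in this list is the rank.
  const ranked = [...frequency.entries()].sort((a, b) => b[1] - a[1]);
  const allocation = new Map();
  ranked.forEach(([name], rank) => allocation.set(name, shortName(rank)));
  return allocation;
}

const allocation = allocateNames([
  "count", "index", "count", "total", "count", "index",
]);
console.log(Object.fromEntries(allocation));
// { count: 'a', index: 'b', total: 'c' }
```

Because `count` is used most often it receives the first (shortest) name, which is the point of sorting by frequency before allocating.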

Terms

In this documentation and the MiniME codebase itself, the following terms are used:

Abstract Syntax Tree (AST)

A hierarchical representation of the entire structure of a JavaScript file as a set of objects.

Accessibility
Whether a symbol is private or public.
Compile

The process of tokenizing and parsing a JavaScript file into an Abstract Syntax Tree, optionally performing a set of transformations and then re-rendering the AST as a new JavaScript file.

This is not what compiling means in the strict sense; it is simply how the term is used in the context of MiniME.

Frequency

The number of times a Symbol or Member is used. Frequency is used to allocate shorter names to more frequently used symbols.

Member
Any symbol on the right-hand side of a period, i.e. object properties and methods.
Parse
The process of reading a stream of tokens and producing an AST.
Rank
The position of a symbol in a list of symbols sorted by frequency.
Render
The process of generating JavaScript code from an AST.
Scope
Either the global scope of a JavaScript file, or the scope of a function body.
Symbol

Any identifier that is not on the right-hand side of a period, i.e. parameters, local variables, global variables and non-anonymous function names.

Token

A single element in a JavaScript file, e.g. an operator, identifier, keyword, string or number literal.

Tokenize
The process of reading a stream of characters and generating a stream of tokens.
Visitor
An object that enumerates the entire AST, inspecting each node.
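
To make the Visitor, Symbol and Member terms more concrete, here is a minimal sketch of a visitor that walks a toy AST and counts usages. The node shapes (`identifier`, `member`, `binary`, `assign`) and the `visit` function are assumptions made for this example and do not mirror MiniME's ast classes.

```javascript
// Toy AST for the expression: total = item.price * count + item.tax
// The node shapes are hypothetical and much simpler than MiniME's ast classes.
const toyAst = {
  type: "assign",
  left: { type: "identifier", name: "total" },
  right: {
    type: "binary",
    op: "+",
    left: {
      type: "binary",
      op: "*",
      left: {
        type: "member",
        object: { type: "identifier", name: "item" },
        name: "price",
      },
      right: { type: "identifier", name: "count" },
    },
    right: {
      type: "member",
      object: { type: "identifier", name: "item" },
      name: "tax",
    },
  },
};

// Generic traversal: calls visitor(node) for every node in the tree.
function visit(node, visitor) {
  visitor(node);
  for (const value of Object.values(node)) {
    if (value && typeof value === "object") visit(value, visitor);
  }
}

const symbolCounts = new Map(); // names not on the right-hand side of a period
const memberCounts = new Map(); // names on the right-hand side of a period
const increment = (map, key) => map.set(key, (map.get(key) || 0) + 1);

visit(toyAst, (node) => {
  if (node.type === "identifier") increment(symbolCounts, node.name);
  if (node.type === "member") increment(memberCounts, node.name);
});

console.log(Object.fromEntries(symbolCounts)); // { total: 1, item: 2, count: 1 }
console.log(Object.fromEntries(memberCounts)); // { price: 1, tax: 1 }
```

Keeping the traversal separate from the per-node callback is what lets several independent passes share the same tree.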

Project Overview

The MiniME codebase is contained within a single Visual Studio 2008 solution with the following projects:

MiniME
The main MiniME.dll assembly that contains all the code of interest.
MiniMETestCases
NUnit test cases.
mm
The command line console application, which serves as a simple front end for MiniME.dll.

Only the MiniME project is discussed in any detail here.

Class Overview

The following is a brief description of each of the classes in the MiniME project:

MiniME namespace

AccessibilitySpec class

Stores a symbol specifier for a private or public directive with methods to parse and match the specifier against an ast.ExprNodeIdentifier.

CommandLine class

Utility class for processing command line arguments and helpers for displaying the command line logo and help.

This functionality is included in the main assembly because it was thought the ability to process response files through the API might be useful in a server environment.

CompileError class

Exception class thrown when errors in processing are encountered. Typically stores an error message and a Bookmark reference that points to the offending JavaScript code.

Compiler class

The main MiniME compiler class - stores options, loads JavaScript files, processes all files, writes output files, checks file times.

Parser class
Reads tokens from a Tokenizer and builds an Abstract Syntax Tree for a JavaScript program.
RenderContext class

Provides context for rendering the final output, including storing the current function scope, providing references to symbol and member name allocators and providing methods for rendering the actual output text.

StringScanner class
Stores a current position in a string being parsed.
Symbol class
Stores information about a symbol including its scope, frequency, rank and accessibility.
SymbolAllocator class

Stores a mapping of original symbol names to obfuscated names. Internally manages its own concept of scope and is responsible for allocating symbols according to frequency of use.

SymbolFrequency class

Stores information about the frequency of a set of symbols and provides methods to sort accordingly.

SymbolScope class

Represents a global or function scope and stores information about all symbols declared and used in that scope.

TextFileUtils class
Helpers for working with file encodings.
Tokenizer class

Reads characters from a StringScanner and produces a stream of tokens for the Parser.

The Tokenizer is also responsible for processing include directives and producing a continuous sequence of tokens to the parser.

Utils class
Miscellaneous utility functions.
VisitorCombineVarDecls class
Visitor to merge consecutive var declarations into a single statement.
VisitorConstDetectorPassN class
Three visitors, run as consecutive passes, that detect constant declarations.
VisitorScopeBuilder class
Allocates a SymbolScope for each function and connects scopes into a hierarchy.
VisitorSimplifyExpressions class
Visits all expressions and requests they simplify themselves.
VisitorSymbolDeclaration class
Finds symbol declarations and registers them with the enclosing symbol scope.
VisitorSymbolUsage class
Updates the frequency count of symbols.
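
The relationship between the StringScanner and the Tokenizer can be sketched as follows. This is a deliberately small stand-in, not MiniME's API: the members `eof`, `current` and `next`, the generator `tokenize`, and the token shapes are all assumptions, and the sketch handles only identifiers, a few keywords, integer literals and single-character punctuation.

```javascript
// Hypothetical stand-in for a scanner: stores a current position in a string.
class StringScanner {
  constructor(text) {
    this.text = text;
    this.position = 0;
  }
  get eof() {
    return this.position >= this.text.length;
  }
  get current() {
    return this.eof ? "" : this.text[this.position];
  }
  next() {
    this.position++;
  }
}

// Hypothetical tokenizer: reads characters from the scanner and yields tokens.
function* tokenize(scanner) {
  const keywords = new Set(["var", "function", "return"]);
  const isIdentStart = (c) => /[A-Za-z_$]/.test(c);
  const isIdentPart = (c) => /[A-Za-z0-9_$]/.test(c);
  const isDigit = (c) => /[0-9]/.test(c);

  while (!scanner.eof) {
    const c = scanner.current;
    if (/\s/.test(c)) {
      scanner.next(); // whitespace separates tokens but is not a token
      continue;
    }
    const start = scanner.position;
    if (isIdentStart(c)) {
      while (!scanner.eof && isIdentPart(scanner.current)) scanner.next();
      const text = scanner.text.slice(start, scanner.position);
      yield { type: keywords.has(text) ? "keyword" : "identifier", text };
    } else if (isDigit(c)) {
      while (!scanner.eof && isDigit(scanner.current)) scanner.next();
      yield { type: "number", text: scanner.text.slice(start, scanner.position) };
    } else {
      scanner.next();
      yield { type: "punctuation", text: c };
    }
  }
}

const tokens = [...tokenize(new StringScanner("var x = 42;"))];
console.log(tokens.map((t) => t.type + ":" + t.text).join(" "));
// keyword:var identifier:x punctuation:= number:42 punctuation:;
```

A parser would consume this token stream one token at a time instead of collecting it into an array first.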

MiniME.ast namespace

The ast namespace encapsulates all the classes used to store the Abstract Syntax Tree (AST) of the parsed input file(s).

ast.Node class
Base class for all AST elements.
ast.ExprNode class
Base class for all expression nodes.
ast.ExprNode* classes
Implementations of each of the expression node types.
ast.Expression class
Holds the root node of an expression.
ast.CodeBlock class

Holds a sequence of statements that define a code block. Code blocks may or may not have surrounding braces, depending on context: the code block for the root scope of a file doesn't have braces, the code block for a try statement always does, and the code block of a while statement does if it contains more than one statement.

ast.Parameter class
Represents a parameter to a function.
ast.Statement class
Base class for all statements. A statement is anything that can appear in a CodeBlock.
ast.Statement* classes
Implementations of each of the statement types.
ast.StatementExpression class

Deserves special mention: it stores an expression as a statement by simply wrapping an ast.Expression, allowing it to be stored in a code block.
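
The way a StatementExpression wraps an Expression, and the way a CodeBlock decides whether to emit braces, might look roughly like the sketch below. The class names echo the descriptions above, but the constructors, the `render` methods and the `braces` setting ("never", "always", "auto") are assumptions made for this example, not MiniME's actual interface.

```javascript
// Hypothetical, simplified node classes; not MiniME's actual ast classes.
class Expression {
  constructor(source) {
    this.source = source; // stands in for the root node of an expression tree
  }
  render() {
    return this.source;
  }
}

// Wraps an expression so that it can be stored in a code block.
class StatementExpression {
  constructor(expression) {
    this.expression = expression;
  }
  render() {
    return this.expression.render() + ";";
  }
}

class CodeBlock {
  // braces: "never" (root scope), "always" (e.g. try), "auto" (e.g. while)
  constructor(statements, braces) {
    this.statements = statements;
    this.braces = braces;
  }
  render() {
    const body = this.statements.map((s) => s.render()).join("");
    // Assumption: in "auto" mode an empty block also gets braces,
    // so that the rendered output remains valid.
    const needBraces =
      this.braces === "always" ||
      (this.braces === "auto" && this.statements.length !== 1);
    return needBraces ? "{" + body + "}" : body;
  }
}

const callA = new StatementExpression(new Expression("a()"));
const callB = new StatementExpression(new Expression("b()"));

console.log(new CodeBlock([callA, callB], "never").render()); // a();b();
console.log(new CodeBlock([callA], "auto").render());         // a();
console.log(new CodeBlock([callA, callB], "auto").render());  // {a();b();}
console.log(new CodeBlock([callA], "always").render());       // {a();}
```

Omitting braces around a single-statement body is one of the small savings a minifier can make when rendering.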