Implementation Notes
Getting started with MiniME's code base
This page provides important information for anyone interested in working on the MiniME code base.
Code Repository
The source code for MiniME is available from GitHub:
(If you're not familiar with Git, GitHub has several useful tutorials).
Development Environment
To build MiniME you will need:
Unit Tests
To run the unit tests, edit the project settings for the MiniMETestCases project and set the following:
- Start External Program - C:\Program Files (x86)\NUnit 2.5.5\bin\net-2.0\nunit.exe (or equivalent)
- Command line arguments - MiniMETestCases.dll /run
Each unit test is defined as a text file resource that uses lines of hyphens to separate the input script from the expected output script, for example:
// This is a test script
function fn()
{
}
-----
function fn(){}
-----
Comments about the test can be inserted as regular JavaScript comments in the input text area.
Some MiniME options can be set using a JavaScript comment with the option name surrounded in square brackets - see the included test scripts for examples.
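As a rough illustration of the test file format described above, the following JavaScript sketch splits a test resource into its input and expected output sections. The function name splitTestCase and the rule that a delimiter is any line of three or more hyphens are assumptions made for this example; MiniME's own test loader is part of the .NET test project and may behave differently.

```javascript
// Hypothetical helper: splits a test-case text into input and expected output.
// Assumption: a delimiter is any line consisting only of three or more hyphens.
function splitTestCase(text) {
  const sections = [];
  let current = [];
  for (const line of text.split(/\r?\n/)) {
    if (/^-{3,}\s*$/.test(line)) {
      // A delimiter line closes the section collected so far.
      sections.push(current.join("\n"));
      current = [];
    } else {
      current.push(line);
    }
  }
  // Text after the final delimiter (if any) is ignored in this sketch.
  return { input: sections[0], expected: sections[1] };
}

const sample = [
  "// This is a test script",
  "function fn()",
  "{",
  "}",
  "-----",
  "function fn(){}",
  "-----",
].join("\n");

const testCase = splitTestCase(sample);
console.log(testCase.expected);
```

A test runner built this way would feed testCase.input to the compiler and compare the result with testCase.expected.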
How it Works
In processing a JavaScript file, MiniME does the following:
- Loads the file into a StringScanner.
- Creates a Tokenizer that reads characters from the StringScanner.
- Creates a Parser that reads tokens from the Tokenizer and generates a complete Abstract Syntax Tree.
- Runs the VisitorScopeBuilder visitor over the AST, which creates a hierarchy of SymbolScopes and connects each to its defining function.
- Runs the VisitorCombineVarDecls visitor to combine consecutive variable declarations into a single statement.
- Runs the VisitorSymbolDeclaration visitor to find all symbol and member declarations and populate each SymbolScope with the symbols and members it defines.
- Runs the three VisitorConstDetectorPass[1,2,3] visitors to detect constant symbol declarations, eliminate symbols that are subsequently modified (and therefore not const) and finally remove symbols that are determined to be constant.
- Runs the VisitorSimplifyExpressions visitor, which instructs all expressions to attempt to simplify themselves.
- Runs the VisitorSymbolUsage visitor to calculate the usage frequency of all symbols and members.
- Prepares all SymbolScopes by sorting used symbols by frequency and determining which symbols can be obfuscated.
- Renders the AST, building the final output script.
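The first two stages of the pipeline above can be sketched in JavaScript as follows. This is only an illustration of the scanner/tokenizer split: the class names are borrowed from the list above, but every member (peek, next, eof, nextToken, readWhile) and the token shapes are hypothetical, and keywords are treated as plain identifiers to keep the example short.

```javascript
// Simplified stand-ins for the first two stages; the member names here are
// hypothetical and do not mirror MiniME's actual classes.
class StringScanner {
  constructor(text) {
    this.text = text;
    this.pos = 0; // current position in the string being scanned
  }
  get eof() { return this.pos >= this.text.length; }
  peek() { return this.text[this.pos]; }
  next() { return this.text[this.pos++]; }
}

class Tokenizer {
  constructor(scanner) {
    this.scanner = scanner;
  }
  // Returns the next token, or null at end of input.
  nextToken() {
    const s = this.scanner;
    while (!s.eof && /\s/.test(s.peek())) s.next(); // skip whitespace
    if (s.eof) return null;
    const ch = s.peek();
    if (/[A-Za-z_$]/.test(ch)) {
      // Keywords are not distinguished from identifiers in this sketch.
      return { type: "identifier", text: this.readWhile(/[A-Za-z0-9_$]/) };
    }
    if (/[0-9]/.test(ch)) {
      return { type: "number", text: this.readWhile(/[0-9.]/) };
    }
    return { type: "punctuation", text: s.next() };
  }
  // Consumes characters while they match the given single-character pattern.
  readWhile(pattern) {
    let text = "";
    while (!this.scanner.eof && pattern.test(this.scanner.peek())) {
      text += this.scanner.next();
    }
    return text;
  }
}

const tokenizer = new Tokenizer(new StringScanner("var x = 42;"));
const tokens = [];
for (let t = tokenizer.nextToken(); t !== null; t = tokenizer.nextToken()) {
  tokens.push(t.text);
}
console.log(tokens);
```

Keeping the scanner separate from the tokenizer means position tracking lives in one place, which is presumably what allows errors to point back at the offending source text.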
Terms
In this documentation and the MiniME codebase itself, the following terms are used:
- Abstract Syntax Tree (AST)
- A hierarchical representation of the entire structure of a JavaScript file as a set of objects.
- Accessibility
- Whether a symbol is private or public.
- Compile
- The process of tokenizing and parsing a JavaScript file into an Abstract Syntax Tree, optionally performing a set of transformations and then re-rendering the AST as a new JavaScript file. This is not a true definition of what it means to compile code, but it explains how the term is used in the context of MiniME.
- Frequency
- The number of times a Symbol or Member is used. It is used to allocate shorter names to more frequently used symbols.
- Member
- Any symbol on the right-hand side of a period, i.e. object properties and methods.
- Parse
- The process of reading a stream of tokens and producing an AST.
- Rank
- The position of a symbol in a list of symbols sorted by frequency.
- Render
- The process of generating JavaScript code from an AST.
- Scope
- Either the global scope of a JavaScript file, or the scope of a function body.
- Symbol
- Any symbol not on the right-hand side of a period, i.e. parameters, local variables, global variables and non-anonymous function names.
- Token
- A single element in a JavaScript file, e.g. an operator, identifier, keyword, string or number literal.
- Tokenize
- The process of reading a stream of characters and generating a stream of tokens.
- Visitor
- An object that enumerates the entire AST, inspecting each node.
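To make the Frequency and Rank terms concrete, here is a minimal JavaScript sketch that counts symbol usages, ranks them, and gives the shortest names to the most frequently used symbols. The function names and the a, b, ..., z, aa, ab naming scheme are assumptions of this example, not MiniME's actual allocation strategy, and the sketch makes no attempt to avoid reserved words or to respect scope.

```javascript
// Maps a rank (0, 1, 2, ...) to a short name: a..z, then aa, ab, ...
// Assumption: this naming scheme is invented for the sketch.
function shortName(rank) {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  let name = "";
  let n = rank;
  do {
    name = letters[n % 26] + name;
    n = Math.floor(n / 26) - 1;
  } while (n >= 0);
  return name;
}

// Takes a list of symbol usages and returns a Map of original name -> short name.
function allocateNames(usages) {
  const frequency = new Map();
  for (const symbol of usages) {
    frequency.set(symbol, (frequency.get(symbol) || 0) + 1);
  }
  // Rank = position in the list sorted by descending frequency.
  const ranked = [...frequency.entries()].sort((a, b) => b[1] - a[1]);
  const names = new Map();
  ranked.forEach(([symbol], rank) => names.set(symbol, shortName(rank)));
  return names;
}

const names = allocateNames(["count", "index", "count", "total", "count", "index"]);
console.log(names);
```

Here count is used three times, index twice and total once, so they receive ranks 0, 1 and 2 respectively.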
Project Overview
The MiniME codebase is contained within a single Visual Studio 2008 solution with the following projects:
- MiniME
- The main MiniME.dll assembly that contains all the code of interest.
- MiniMETestCases
- NUnit test cases
- mm
- The command line console application, which serves as a simple front end for MiniME.dll.
Only the MiniME project is discussed in any detail here.
Class Overview
The following is a brief description of each of the classes in the MiniME project:
MiniME namespace
- AccessibilitySpec class
- Stores a symbol specifier for a private or public directive, with methods to parse and match the specifier against an ast.ExprNodeIdentifier.
- CommandLine class
- Utility class for processing command line arguments, with helpers for displaying the command line logo and help. This functionality is included in the main assembly because it was thought the ability to process response files through the API might be useful in a server environment.
- CompileError class
- Exception class thrown when errors in processing are encountered. Typically stores an error message and a Bookmark reference that points to the offending JavaScript code.
- Compiler class
- The main MiniME compiler class: stores options, loads JavaScript files, processes all files, writes output files and checks file times.
- Parser class
- Reads tokens from a Tokenizer and builds an Abstract Syntax Tree for a JavaScript program.
- RenderContext class
- Provides context for rendering the final output, including storing the current function scope, providing references to symbol and member name allocators and providing methods for rendering the actual output text.
- StringScanner class
- Stores a current position in a string being parsed.
- Symbol class
- Stores information about a symbol including its scope, frequency, rank and accessibility.
- SymbolAllocator class
- Stores a mapping of original symbol names to obfuscated names. Internally manages its own concept of scope and is responsible for allocating symbols according to frequency of use.
- SymbolFrequency class
- Stores information about the frequency of a set of symbols and provides methods to sort accordingly.
- SymbolScope class
- Represents a global or function scope and stores information about all symbols declared and used in that scope.
- TextFileUtils class
- Helpers for working with file encodings.
- Tokenizer class
- Reads characters from a StringScanner and produces a stream of tokens for the Parser. The Tokenizer is also responsible for processing include directives and producing a continuous sequence of tokens to the parser.
- Utils class
- Miscellaneous utility functions.
- VisitorCombineVarDecls class
- Visitor to merge consecutive var declarations into a single statement.
- VisitorConstDetectorPassN class
- Three visitors, one per pass, that detect constant declarations.
- VisitorScopeBuilder class
- Allocates a SymbolScope for each function and connects scopes into a hierarchy.
- VisitorSimplifyExpressions class
- Visits all expressions and requests they simplify themselves.
- VisitorSymbolDeclaration class
- Finds symbol declarations and registers them with the enclosing symbol scope.
- VisitorSymbolUsage class
- Updates the frequency count of symbols.
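As an example of what one of the visitors listed above does, the following JavaScript sketch merges consecutive var statements in a toy AST, in the spirit of VisitorCombineVarDecls. The node shapes (type, statements, names, body) and the function name combineVarDecls are invented for this illustration and are not MiniME's ast classes.

```javascript
// Toy AST: a code block is { type: "block", statements: [...] } and a
// variable statement is { type: "var", names: [...] }. These node shapes are
// hypothetical and exist only for this sketch.
function combineVarDecls(node) {
  if (!node || !Array.isArray(node.statements)) return node;
  const merged = [];
  for (const stmt of node.statements) {
    combineVarDecls(stmt.body); // recurse into a nested block, if there is one
    const prev = merged[merged.length - 1];
    if (stmt.type === "var" && prev && prev.type === "var") {
      prev.names.push(...stmt.names); // fold into the preceding var statement
    } else {
      merged.push(stmt);
    }
  }
  node.statements = merged;
  return node;
}

const program = {
  type: "block",
  statements: [
    { type: "var", names: ["a"] },
    { type: "var", names: ["b"] },
    { type: "call", name: "f" },
    { type: "var", names: ["c"] },
  ],
};
combineVarDecls(program);
console.log(program.statements.map((s) => s.type));
```

Only adjacent declarations are merged; the var statement after the call is left alone because moving it could change the order in which initializers run.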
MiniME.ast namespace
The ast namespace encapsulates all the classes used to store the Abstract Syntax Tree (AST) of the parsed input file(s).
- ast.Node class
- Base class for all AST elements.
- ast.ExprNode class
- Base class for all expression nodes.
- ast.ExprNode* classes
- Implementations of each of the expression node types.
- ast.Expression class
- Holds the root node of an expression.
- ast.CodeBlock class
- Holds a sequence of statements that define a code block. Code blocks may or may not have surrounding braces depending on context (e.g. the code block for the root scope of a file doesn't have braces, the code block for a try statement always does, and the code block of a while statement will if there is more than one contained statement).
- ast.Parameter class
- Represents a parameter to a function.
- ast.Statement class
- Base class for all statements. A statement is anything that can appear in a CodeBlock.
- ast.Statement* classes
- Implementations of each of the statement types.
- ast.StatementExpression class
- Deserves special mention: it is used to store an expression as a statement, simply wrapping an ast.Expression so that it can be stored in a code block.
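The relationship between expression nodes, the Expression holder and StatementExpression can be sketched in JavaScript as follows. The class names echo the descriptions above, but the simplify and render methods, the ExprNodeNumber and ExprNodeAdd node types, and the use of constant folding as the simplification are all assumptions of this example; the source does not specify which simplifications MiniME performs.

```javascript
// Hypothetical node classes loosely modelled on the descriptions above.
class ExprNodeNumber {
  constructor(value) { this.value = value; }
  simplify() { return this; }
  render() { return String(this.value); }
}

class ExprNodeIdentifier {
  constructor(name) { this.name = name; }
  simplify() { return this; }
  render() { return this.name; }
}

class ExprNodeAdd {
  constructor(lhs, rhs) { this.lhs = lhs; this.rhs = rhs; }
  // Assumed simplification: fold the addition when both operands are literals.
  simplify() {
    const lhs = this.lhs.simplify();
    const rhs = this.rhs.simplify();
    if (lhs instanceof ExprNodeNumber && rhs instanceof ExprNodeNumber) {
      return new ExprNodeNumber(lhs.value + rhs.value);
    }
    return new ExprNodeAdd(lhs, rhs);
  }
  // This toy renderer never emits parentheses, so it is only correct for
  // left-nested additions such as (1+2)+x.
  render() { return this.lhs.render() + "+" + this.rhs.render(); }
}

// Holds the root node of an expression.
class Expression {
  constructor(rootNode) { this.rootNode = rootNode; }
  simplify() { this.rootNode = this.rootNode.simplify(); }
  render() { return this.rootNode.render(); }
}

// Wraps an Expression so it can be stored in a code block as a statement.
class StatementExpression {
  constructor(expression) { this.expression = expression; }
  render() { return this.expression.render() + ";"; }
}

const expr = new Expression(
  new ExprNodeAdd(
    new ExprNodeAdd(new ExprNodeNumber(1), new ExprNodeNumber(2)),
    new ExprNodeIdentifier("x")
  )
);
const statement = new StatementExpression(expr);
const before = statement.render();
expr.simplify();
const after = statement.render();
console.log(before, "->", after);
```

Because each node decides how to simplify itself, a visitor only needs to find every Expression and call simplify on it.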