How to Write a Compiler #5 - The Visitor Pattern
If ever there was a design pattern that's perfectly suited to a job, it's the visitor pattern for working with an abstract syntax tree. But what is the visitor pattern and why is it so well suited?
If you've been doing software development for any length of time you've almost certainly heard of the visitor pattern - it's even got its own Wikipedia page.
But it's also quite possible that you've never actually used it. Rather than try to explain it in theoretical terms, let's look at it from a practical point of view.
(Normally I wouldn't devote a whole post to a design pattern, but it really is central to everything else the compiler does. It's also a rarely used pattern so I wanted to make sure we're on the same page on how it works).
Working with the AST
Now that our compiler has parsed the source code into an AST we need a way to work with the AST. We might want to go over it to check for errors, evaluate a constant expression node, or generate code. Or perhaps we just want to print it out.
Let's take that last one - "print it out" - and see how to implement it.
The Traditional OO Approach
In order to print out the AST we need to start at the top-level element and print the parts related to that element while also recursively calling its connected elements to print themselves.
The traditional OO way to do this would be to add an abstract method to the base AstElement
class named Print()
.
class AstElement
{
// omitted
public abstract void Print(IndentedTextwriter target);
}
(Let's assume a helper class IndentedTextWriter
that we can print to).
Next, we'd implement the Print
method on every class. For example, we'd add a Print
method to the if
statement like this:
public override void Print(IndentedTextWriter target)
{
target.Write("if (");
Condition.Print(target);
target.WriteLine(")");
TrueBlock.Print(target);
if (FalseBlock != null)
{
target.WriteLine("else");
FalseBlock.Print(target);
}
}
To print the entire AST all we have to do is call Print
on the top level AstCompilationUnit
and the entire program can be printed out. Nice!
Need to generate code? Repeat the above adding a GenerateCode()
method.
Need something else? Repeat again adding another method.
Very quickly a problem emerges: the code for each of these operations is spread over many different classes and files. Also, any state or context information for the operations (such as the target writer in the above example) needs to be passed around as parameters.
Containing an Operation to One Class
What we'd like to do is contain each operation on the AST to its own class.
Continuing with our example of printing out the AST, let's make a class called AstFormatter
:
class AstFormatter
{
public AstFormatter(IndentedTextWriter target)
{
_target = target;
}
public void Print(AstElement element)
{
// TODO
}
}
and let's move the print methods from all the AstElement
classes into our new AstFormatter
. The function to print if
statements now looks like this:
void PrintIfStatemnt(AstIfStatement stmt)
{
_target.Write("if (");
Print(stmt.Condition);
_target.WriteLine(")");
Print(stmt.TrueBlock);
if (stmt.FalseBlock != null)
{
_target.WriteLine("else");
Print(FalseBlock);
}
}
This is looking good:
- We've moved all our printing code to one class
- We don't need to pass the target text writer around anymore since we can just store it as a member variable on the
AstFormatter
class.
But we've also created a mess. The worst kind of mess. A slow mess.
See that TODO
comment in the Print
method? It needs to inspect every element's type to work out which print method to call for each element:
if (element is AstIfStatement stmt_if)
PrintIfStatement(stmt_if);
else if (element is AstForStatement stmt_for)
PrintForStatement(stmt_for);
else if (element is AstWhileStatement stmt_while)
PrintWhileStatement(stmt_while);
else if
// ... for every ... type ... of ... element. Ugh
We've solved the original problem, but:
- it's slow - we need to test every element for every type.
- it's messy - just look at it!
- it's unsafe - we won't get compiler errors if we forget to implement an element (or add a new element type).
- it's painful - we need to re-type (or cut/copy/replace) that code for every different operation.
What we need is a type-safe, fast way to dispatch those PrintXXX()
methods.
Using a Dispatch Interface
Imagine we've implemented all our operations using the above approach: a separate class for each operation, with a function that dispatches to methods for each element type.
What you'll notice is that you end up with each class having a method to handleif
statements, a method to handle for
statements, a method to handle while
statement etc...
Let's force each of those classes to implement those methods by defining an interface IStatementProcesor
that has a method for each statement type.
interface IStatementProcessor
{
void ProcessIfStatement(AstIfStatement stmt);
void ProcessForStatement(AstForStatement stmt);
void ProcessWhileStatement(AstForStatement stmt);
// etc.. for all statement types
}
By implementing this interface on an operation class we can ensure that it implements the functionality for all element types:
class AstFormatter : IStatementProcessor
{
// If we don't implement a particular element type...
// the compiler will complain
}
That solves the "unsafe" problem with the previous approach, but we can also use it to solve the "slow" and "messy" problems by adding a Process
method to the element classes:
abstract class AstElement
{
public abstract void Process(IStatementProcessor processor)
}
class AstIfStatement : AstElement
{
public override void Process(IStatementProcessor processor)
{
processor.ProcessIfStatment(this);
}
}
In other words, if you call Process
on an element, passing an instance of IStatementProcessor
you'll be called back through that interface on the correct method for the element's type.
Now the big slow mess can be fixed:
// GET RID OF THIS
/*
if (element is AstIfStatement stmt_if)
PrintIfStatement(stmt_if);
else if (element is AstForStatement stmt_for)
PrintForStatement(stmt_for);
else if (element is AstWhileStatement stmt_while)
PrintWhileStatement(stmt_while);
else if ...
*/
element.Process(this);
Oh, and by the way, we've also solved the "painful" problem. Right clicking on the interface in Visual Studio there's a command "Implement Interface Explicitly" and stubs for every method will be automatically generated and we can just fill in the blanks.
What I've just described is the Visitor Pattern... so let's call it that.
The Visitor Pattern
Now that we understand what the visitor pattern is, let's look at how it's actually implemented in C-minor for the AST.
As before, there are two categories of visitors - statements and expressions and since we might want to have return values when processing elements the interfaces are defined with a generic type parameter T
for the return type:
public interface IAstStatementVisitor<T>
{
T Visit(AstIfStatement stmt);
T Visit(AstForStatement stmt);
T Visit(AstWhileStatement stmt);
// etc
}
public interface IAstExprNodeVisitor<T>
{
T Visit(AstExprNodeLiteral node);
T Visit(AstExprNodeUnaryOp node);
T Visit(AstExprNodeBinaryOp node);
// etc...
}
(Note: we can use method overloading to name every method Visit()
instead of VisitIfStatement()
, VisitForStatement()
etc...)
The AstStatement
and AstExprNode
bases classes both get an abstract Visit
method:
public class AstStatement
{
public abstract T Visit<T>(IAstStatementVisitor<T> visitor);
}
public class AstExprNode
{
public abstract T Visit<T>(IAstExprNodeVisitor<T> visitor);
}
and every statement and expression node implements the Visit
method with a simple one liner:
public class AstIfStatement
{
public override T Visit<T>(IAstStatementVisitor<T> visitor) => visitor.Visit(this);
}
Revisiting (ha-ha) the AstFormatter
Now that we've formalized our visitor pattern, let's take a look at how our original AstFormatter
class now looks:
class AstFormatter :
IAstStatementVisitor<bool>,
IAstExprNodeVisitor<bool>
{
public AstFormatter(IndentedTextWriter target)
{
_target = target;
}
public void Print(AstStatment stmt)
{
stmt.Visit(this);
}
bool IAstStatementVisitor<bool>.Visit(AstIfStatement stmt)
{
_target.Write("if (");
stmt.Condition.Visit(this);
_target.WriteLine(")");
stmt.TrueBlock.Visit(this);
if (stmt.FalseBlock != null)
{
_target.WriteLine("else");
stmt.FalseBlock.Visit(this);
}
return true;
}
bool IAstStatementVisitor<bool>.Visit(AstForStatement stmt)
{
// TODO
}
// etc...
}
Note:
- C# doesn't allow a
void
generic type parameter sobool
is used and its return value is always ignored. - The interfaces are implemented explicitly which keeps the public API to the
AstFormatter
nice and clean.
Who Are These Mysterious Visitors?
To put into perspective just how important the visitor pattern is, here's the full list of visitors currently used in C-minor's compiler:
- AstFormatter
- CodeGenerator (C code)
- ControlFlowAnalysis
- HydrateTypes
- NameResolver
- ScaffoldTypes
- StatementBinder
- TopLevelStatements
- ConstantExpressionEvaluator
- SideEffects
Some of these classes are quite complex and without the visitor pattern their code would be scattered across many classes and files and we'd need to pass some sort of context class around to store the state of the operation.
The Down Side of the Visitor Pattern
There's only one small downside I've found with the visitor pattern - we can't pass parameters to the visit methods.
For example, in the ControlFlowAnalysis
visitor there's an "entry state" that needs to be passed to each element as it's visited.
My solution is to store that information as a member of the operation class with a helper function that sets it before visiting the element. The Visit
callback then picks it up again once it's been called.
// Helper function to visit a statement, passing an entryState
CFSTate Visit(AstStatement stmt, CFState entryState)
{
_entryState = entryState;
return stmt.Visit(this);
}
CFState Visit(AstIfStatement stmt)
{
// Would be nicer if this was a parameter, but oh well.
var entryState = _entryState;
}
It's a bit of a hacky solution, and I could update the visitor interfaces with at least one generic parameter, but I don't think it's worth it.
Enough Already!
Ok, so I think I've gone on enough about the visitor pattern, but just to sum up why it's used:
- Code for each operation on the AST can be contained to a single class.
- The AST elements know nothing about the operations being performed on them (this is good).
- We can add operations without touching the AST classes.
- It's fast since the dispatch is through virtual methods, not slow type tests.
- The operations can implement the separate statement and/or expression node visitor patterns as required.
- If we add a new AST element type, we have to add the associated method to the visitor interface in order to implement its
Visit
method... and then we're forced to implement the method on all classes. ie: it's type safe.
From now on, pretty much everything the C-minor compiler does centers around the AST - and the visitor pattern is the mechanism for doing it.