How to Write a Compiler #7 - Top Level Statements

C-minor's top-level statements are implemented by manipulating the AST before invoking the other compilation stages, and they make an easy introduction to working with the AST.

One of C-minor's primary goals is to provide a nice language for scripting and automation, so it should be possible to write code with a minimum of boilerplate and plumbing.

In C# this feature is called "top-level statements" and it removes the need for a Program class and a Main method. Internally, the compiler implements this by wrapping the top-level statements in a method of a class.

A similar approach is used with C-minor, but there's a small catch.

The Catch

Eventually I'd like C-minor to support the same kind of top-level statements as C#, in which all the top-level statements are simply wrapped in a function. However, this requires support for closures and nested functions - which C-minor doesn't yet have.

So, for the time being there's a restriction on C-minor's top-level support: only functions and variables can be used at the top level - not control flow or expression statements - and they're wrapped in a class, not a method in a class.

Later, after I've implemented closures, this restriction will be relaxed and full support for top-level statements will be enabled.

It's Just Syntactic Sugar

In C-minor, top-level statements are just syntactic sugar: the compiler automatically wraps any such statements in a class.

💡
"Syntactic sugar" is a term used to describe a language feature that makes things easier to read or express.

In other words, all code in a C-minor program must reside inside a class - but for top-level statements, the compiler automatically generates this class and puts the code in it.

The mechanism for doing this is to manipulate the AST.

The TopLevelStatements Visitor

As explained in previous posts, the way to work with the AST is with the visitor pattern and that's exactly what we do here.

The TopLevelStatements class implements the IAstStatementVisitor interface and its job is to locate any top-level statements and move them into an enclosing wrapper class called $global.

public class TopLevelStatements : IAstStatementVisitor<bool>
{
   ...

   AstClassOrStructDeclarationStatement _globalClass;
}

Firstly, we need a helper function to create the global class. Creating the global class is delayed until needed in case there are no top-level statements.

// Create (or return) the global class that encloses
// all top-level statements
AstClassOrStructDeclarationStatement GetGlobalClass()
{
    // First time? If so create the class
    if (_globalClass == null)
    {
        _globalClass = new AstClassOrStructDeclarationStatement(_unit.Position)
        {
            Name = "$global",
            IsClass = true,
            Modifiers = Modifiers.Partial | Modifiers.Public | Modifiers.Static,
            Statements = new(),
        };
    }
    return _globalClass;
}
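The lazy creation can also be written with C#'s null-coalescing assignment operator (`??=`, available from C# 8). The following is only a minimal sketch of that idea: `ClassNode` and `LazyGlobal` are hypothetical stand-ins, not the real C-minor types.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for AstClassOrStructDeclarationStatement
public class ClassNode
{
    public string Name = "";
    public List<string> Statements = new List<string>();
}

public class LazyGlobal
{
    ClassNode? _globalClass;

    // True only once something has asked for the global class
    public bool HasGlobalClass => _globalClass != null;

    // Create the class on first use, then keep returning the same instance
    public ClassNode GetGlobalClass() =>
        _globalClass ??= new ClassNode { Name = "$global" };
}
```

If nothing ever calls `GetGlobalClass()`, the field stays null and no wrapper class is produced, which is the behavior the helper above relies on.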

Next, since we're only looking for top-level statements, there's no need to recurse through the entire AST. Instead, each statement in the root compilation unit is visited, and if the visit returns false that means it was a top-level statement and should be removed from the root compilation unit.

After processing all the statements, if any top-level statements were found, the global class will have been created and we add it to the root compilation unit.

bool IAstStatementVisitor<bool>.Visit(AstCompilationUnit stmt)
{
    // Process all statements
    for (int i = 0; i < stmt.Statements.Count; i++)
    {
        if (!stmt.Statements[i].Visit(this))
        {
            stmt.Statements.RemoveAt(i);
            i--;
        }
    }

    // Add the global class declaration to the unit
    if (_globalClass != null)
        stmt.Statements.Add(_globalClass);

    return true;
}
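The `i--` after `RemoveAt` matters: removing an item shifts the following items down by one, so without it the loop would skip the statement that slides into slot `i`. Here's a small self-contained sketch of the same loop shape on a plain list. `HoistDemo.Hoist` and its `keepInPlace` callback are names made up for this illustration and are not part of the C-minor compiler.

```csharp
using System;
using System.Collections.Generic;

public static class HoistDemo
{
    // Removes every item for which keepInPlace returns false
    // and returns the removed items in their original order
    public static List<T> Hoist<T>(List<T> source, Func<T, bool> keepInPlace)
    {
        var moved = new List<T>();
        for (int i = 0; i < source.Count; i++)
        {
            if (!keepInPlace(source[i]))
            {
                moved.Add(source[i]);
                source.RemoveAt(i);

                // The next item has shifted into slot i, so visit slot i again
                i--;
            }
        }
        return moved;
    }
}
```

For example, hoisting everything that doesn't start with "class" out of the list "class A", "void f()", "int x", "class B" leaves "class A" and "class B" in place and returns "void f()" and "int x", even though the two removed items sit next to each other.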

Here's the visitor for the function declaration statement - it just adds the statement to the global class and returns false - so it's removed from the root compilation unit.

bool IAstStatementVisitor<bool>.Visit(AstFunctionDeclarationStatement stmt)
{
    // Move it to the global class
    GetGlobalClass().Statements.Add(stmt);
    stmt.Modifiers |= Modifiers.Public | Modifiers.Static;
    return false;
}

Note the statement is also updated to implicitly mark it as public and static.

Variable declarations are handled in exactly the same way and all other statements throw an error - since they're not supported at the top level (at least not yet).
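To make those three cases concrete, here's a rough sketch using a miniature AST. Everything in it is an assumption made for illustration: the `Mini*` types, the `Mods` enum and the use of `NotSupportedException` are stand-ins, and the real visitor may report errors differently. It also assumes that class declarations simply return true so they stay where they are.

```csharp
using System;
using System.Collections.Generic;

[Flags]
public enum Mods { None = 0, Public = 1, Static = 2, Partial = 4 }

// Hypothetical miniature AST; the real C-minor node types are richer
public interface IMiniVisitor<T>
{
    T Visit(MiniClass stmt);
    T Visit(MiniFunction stmt);
    T Visit(MiniVariable stmt);
    T Visit(MiniExpression stmt);
}

public abstract class MiniStmt
{
    public Mods Modifiers;
    public abstract T Visit<T>(IMiniVisitor<T> visitor);
}

public class MiniClass : MiniStmt
{
    public List<MiniStmt> Statements = new List<MiniStmt>();
    public override T Visit<T>(IMiniVisitor<T> visitor) => visitor.Visit(this);
}

public class MiniFunction : MiniStmt
{
    public override T Visit<T>(IMiniVisitor<T> visitor) => visitor.Visit(this);
}

public class MiniVariable : MiniStmt
{
    public override T Visit<T>(IMiniVisitor<T> visitor) => visitor.Visit(this);
}

public class MiniExpression : MiniStmt
{
    public override T Visit<T>(IMiniVisitor<T> visitor) => visitor.Visit(this);
}

public class MiniTopLevel : IMiniVisitor<bool>
{
    public MiniClass? GlobalClass { get; private set; }

    MiniClass GetGlobalClass()
    {
        if (GlobalClass == null)
            GlobalClass = new MiniClass { Modifiers = Mods.Partial | Mods.Public | Mods.Static };
        return GlobalClass;
    }

    // Shared by functions and variables: move into the global class,
    // mark as public static, and return false so the caller removes it
    bool Move(MiniStmt stmt)
    {
        GetGlobalClass().Statements.Add(stmt);
        stmt.Modifiers |= Mods.Public | Mods.Static;
        return false;
    }

    // Assumption: class declarations are left in the compilation unit
    public bool Visit(MiniClass stmt) => true;

    public bool Visit(MiniFunction stmt) => Move(stmt);
    public bool Visit(MiniVariable stmt) => Move(stmt);

    // Anything else isn't supported at the top level yet
    public bool Visit(MiniExpression stmt) =>
        throw new NotSupportedException("expression statements are not supported at the top level");
}
```

Sharing one `Move` helper keeps the function and variable cases identical, which matches the "exactly the same way" description above.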

Before and After

To see the effect of the top-level statement visitor we can create a test case that shows the raw and the post-processed AST.

void main()
{
	Console.WriteLine("Hello World");
}

## Raw-AST

void main()
{
    Console.WriteLine("Hello World");
}

## AST

public static partial class $global
{
    public static void main()
    {
        Console.WriteLine("Hello World");
    }
}

Notice how the original void main() function is now public static void main() on the $global class.

Wrapping Up

That's it for top-level statements.

Top-level statements remove the need for much of the plumbing and boilerplate code that would otherwise be necessary when all code must reside in a class and/or function.

The TopLevelStatements visitor is the first step after parsing and it just provides syntactic sugar that wraps any such statements in a class.

This approach means that code is always in a class and it removes the need for any further special handling by the later stages of the compiler.