How to Write a Compiler #1 - Introducing C-minor

C-minor is a strongly typed, garbage collected language that compiles to in-memory machine code for direct execution. Here's how I built it.

This post is part of the series How to Write a Compiler based on my experience building C-minor - a strongly typed, garbage collected, direct to in-memory machine-code scripting language.

Like many developers, I've often thought about writing a compiler for a programming language. Recently I had reason enough to look into making a proof-of-concept scripting language for my music software, Cantabile.

Although it's just a proof-of-concept, I've learned a lot along the way and thought it would be worth sharing as a series on "How to Write a Compiler".

First up though, what is this language?

Introducing C-minor

C-minor is a strongly typed, garbage collected language that compiles to in-memory machine code for direct execution.

  • The compiler is written in C#.
  • The supporting runtime library and garbage collector are written in plain C.
  • The language is heavily influenced by C#.
  • It has a focus on speed - which is why it's strongly typed and compiled to machine code instead of interpreted.
  • It's strictly single threaded - as a feature.
  • The name "C-minor" was chosen to reflect its musical connection to Cantabile and its C-style syntax. The word "minor" suggests this is a language for scripting and automation and not for building fully-fledged applications.

(Although influenced by C#, it obviously doesn't have most of C#'s features - but anything it does have has been shamelessly stolen.)

Just how "Proof-of-Concepty" is it?

Very! It's not useful for anything other than as a proof-of-concept.

That said, the initial goal for this project was just to get something working, and it is "working":

  • Primitive data types - bool, char, sbyte, byte, short, ushort, int, uint, long, ulong, float, double and string.
  • All the standard math, logical, relational and bitwise operators for the built-in types.
  • Explicit and implicit type casting of primitive data types.
  • Functions with simple parameters and local variables.
  • Function overloading and overload resolution.
  • Flow control statements - if/else, while, do-while, for, switch.
  • Support for exceptions (currently by throwing a string, since there's no class support yet), along with try/catch/finally blocks.
  • Interpolated strings and numeric formatting.
  • An incremental "in-series" garbage collector.
  • A few library functions - just enough to run the test cases (e.g. string.Substring, Console.Write/WriteLine).
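
To make the "throw a string" style of exception a little more concrete, here is a minimal sketch of one common way try/catch/finally can be lowered onto plain C using setjmp/longjmp. This is an assumption for illustration only, not necessarily how C-minor does it: the names current_handler, current_exception, throw_string, checked_divide and try_divide are all hypothetical, and the global handler pointer relies on the language being single threaded.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical runtime state: the innermost active try block and the
   string most recently thrown. Globals suffice in a single thread. */
static jmp_buf *current_handler = NULL;
static const char *current_exception = NULL;

/* Hypothetical lowering of `throw "message";` */
static void throw_string(const char *message)
{
    current_exception = message;
    if (current_handler != NULL)
        longjmp(*current_handler, 1);   /* jump to the nearest catch */
    abort();                            /* no try block is active */
}

/* A function that may throw. */
static int checked_divide(int a, int b)
{
    if (b == 0)
        throw_string("divide by zero");
    return a / b;
}

/* Hypothetical lowering of:
     try     { result = checked_divide(a, b); }
     catch   { caught = <the thrown string>; }
     finally { finally_count++; }
   Returns the caught string, or NULL if nothing was thrown. */
const char *try_divide(int a, int b, int *result, int *finally_count)
{
    jmp_buf handler;
    jmp_buf *outer = current_handler;   /* remember the enclosing try */
    const char *caught = NULL;

    current_handler = &handler;
    if (setjmp(handler) == 0)
    {
        *result = checked_divide(a, b); /* try body */
    }
    else
    {
        caught = current_exception;     /* catch body */
    }

    current_handler = outer;            /* leave this try block */
    (*finally_count)++;                 /* finally body runs on both paths */
    return caught;
}
```

Calling try_divide(10, 2, &r, &n) stores 5 in r and returns NULL, while try_divide(1, 0, &r, &n) leaves r untouched and returns the thrown string. The finally counter is incremented in both cases.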

Other things of note:

  • The compiler generates C code that is then compiled to in-memory machine code using Tiny-C. I might do a direct native code generator at a later date.
  • There's a front-end command line program to run C-minor source files and test cases.
  • The front-end can run C-minor programs directly, emit the generated C code or produce .exe files.

An Example Program

Here's an example program:

void main()
{
  for (int i = 0; i < 3; i++)
  {
    Console.WriteLine($"#{i+1}: Hello World from C-minor");
  }
}

and its output:

#1: Hello World from C-minor
#2: Hello World from C-minor
#3: Hello World from C-minor

(I'd like to get rid of that main and have true top-level statements - I'll explain why I haven't in a later post).
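
Since the compiler emits C, it may help to picture roughly what that output could look like for the program above. The following is a hand-written sketch, not actual compiler output: format_greeting and cminor_main are hypothetical names, the interpolated string is lowered to a plain snprintf call, and a stack buffer stands in for a garbage collected string.

```c
#include <stdio.h>

/* Hypothetical lowering of $"#{i+1}: Hello World from C-minor".
   Formats into a caller-supplied buffer and returns snprintf's result. */
int format_greeting(char *buf, size_t size, int n)
{
    return snprintf(buf, size, "#%d: Hello World from C-minor", n);
}

/* Hypothetical lowering of C-minor's main(), renamed so it does not
   clash with the C entry point. */
void cminor_main(void)
{
    for (int i = 0; i < 3; i++)
    {
        char line[64];
        format_greeting(line, sizeof line, i + 1);
        puts(line);   /* stands in for Console.WriteLine */
    }
}
```

Calling cminor_main() from a C main prints the same three lines as the C-minor program.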


What's Next?

I'm going to chip away at this as a side project to Cantabile - partly because I'm enjoying the challenge and partly because it might get to the point of being useful.

In the meantime, I'm going to write some articles that go pretty deep on how it works. I hope to cover everything from tokenization, parsing, semantic analysis and control flow analysis through to code generation, the runtime and the garbage collector.

I'll also be touching on some interesting design patterns (in particular the much underrated visitor pattern), some Big-O analysis and an interesting approach to testing.

In other words, this won't be hand-waving about abstract concepts - this will be everything you need to know to write your own compiler - or at least have a better understanding of how one works.

If you've ever been curious about how a modern compiler works (and who hasn't?) then I think you'll enjoy these upcoming posts.

It always starts with "Hello World" and who knows where it goes from there.