<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Topten Software Blog]]></title><description><![CDATA[Design, Development, Technology]]></description><link>https://www.toptensoftware.com/blog/</link><image><url>https://www.toptensoftware.com/blog/favicon.png</url><title>Topten Software Blog</title><link>https://www.toptensoftware.com/blog/</link></image><generator>Ghost 5.75</generator><lastBuildDate>Mon, 27 Apr 2026 06:34:57 GMT</lastBuildDate><atom:link href="https://www.toptensoftware.com/blog/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[How to Write a Compiler #7 - Top Level Statements]]></title><description><![CDATA[C-minor's top-level statements are implemented by manipulating the AST before invoking the other compilation stages and are easy introduction to working with the AST.]]></description><link>https://www.toptensoftware.com/blog/how-to-write-a-compiler-7-top-level-statements/</link><guid isPermaLink="false">65da6d1e649514000101052e</guid><category><![CDATA[h2wac]]></category><category><![CDATA[c-minor]]></category><category><![CDATA[C#]]></category><category><![CDATA[AMD]]></category><category><![CDATA[big80]]></category><dc:creator><![CDATA[Brad Robinson]]></dc:creator><pubDate>Wed, 28 Feb 2024 05:59:30 GMT</pubDate><media:content url="https://www.toptensoftware.com/blog/content/images/2024/02/BannerTopLevelStatements.png" medium="image"/><content:encoded><![CDATA[<img src="https://www.toptensoftware.com/blog/content/images/2024/02/BannerTopLevelStatements.png" alt="How to Write a Compiler #7 - Top Level Statements"><p>One of C-minor&apos;s primary goals is to provide a nice language for scripting and automation and as such it should be possible to write code with a minimum amount of boilerplate and plumbing code.</p><p>In C# this feature is called &quot;top-level statements&quot; (<a href="https://learn.microsoft.com/en-us/dotnet/csharp/tutorials/top-level-statements?ref=toptensoftware.com" rel="noreferrer">see here</a>) and it removes the need for a <code>Program</code> class and a <code>main</code> method.  Internally this is implemented by wrapping the top-level statements in a method of a class. </p><p>A similar approach is used with C-minor, but there&apos;s a small catch-22.</p><h3 id="the-catch">The Catch</h3><p>Eventually I&apos;d like C-minor to support that same kind of top-level statements as C# in which all the top-level statements are simply wrapped in a function.  However, this requires support for closures and nested functions - which C-minor doesn&apos;t yet have.</p><p>So, for the time being there&apos;s a restriction on C-minor&apos;s top-level support: only functions and variables can be used at the top level - not control flow or expression statements - and they&apos;re wrapped in a class, not a method in a class.</p><p>Later, after I&apos;ve implemented closures, this restriction will be relaxed and full support for top-level statements will be enabled.</p><h3 id="its-just-syntactic-sugar">It&apos;s Just Syntactic Sugar</h3><p>In C-minor, top-level statements are just syntactic sugar in which the compiler automatically wraps any such statements in class.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text">&quot;Syntactic Sugar&quot; is a term used to describe a language feature that makes things easier to read or express - <a href="https://en.wikipedia.org/wiki/Syntactic_sugar?ref=toptensoftware.com" rel="noreferrer">read more</a>.</div></div><p>In other words, all code in a C-minor program must reside inside a class - but for top-level statements, the compiler automatically generates this class and puts the code in it.</p><p>The mechanism for do this is by manipulating the AST.</p><h3 id="the-toplevelstatements-visitor">The TopLevelStatements Visitor</h3><p>As explained in previous posts, the way to work with the AST is with the visitor pattern and that&apos;s exactly what we do here.  </p><p>The <code>TopLevelStatements</code> class implements the <code>IAstStatementVisitor</code> interface and its job is to locate any top-level statements and move them into an enclosing wrapper class called <code>$global</code>.</p><pre><code class="language-csharp">public class TopLevelStatements : IAstStatementVisitor&lt;bool&gt;
{
   ...

   AstClassOrStructDeclarationStatement _globalClass;
}</code></pre><p>Firstly, we need a helper function to create the global class.  Creating the global class is delayed until needed in case there are no top-level statements.</p><pre><code class="language-csharp">// Create (or return) the global class that encloses
// all top-level statements
AstClassOrStructDeclarationStatement GetGlobalClass()
{
    // First time? If so create the class
    if (_globalClass == null)
    {
        _globalClass = new AstClassOrStructDeclarationStatement(_unit.Position)
        {
            Name = &quot;$global&quot;,
            IsClass = true,
            Modifiers = Modifiers.Partial | Modifiers.Public | Modifiers.Static,
            Statements = new(),
        };
    }
    return _globalClass;
}</code></pre><p>Next, since we&apos;re only looking for top-level statements there&apos;s no need to recurse through the entire AST. Instead each statement in the root compilation unit is visited and if it returns <code>false</code> that means it was a top-level statement and should be removed from the root compilation unit.</p><p>After processing all the statements, if any top-level statements were found, the global class will have been created and we add it to the root compilation unit.</p><pre><code class="language-csharp">bool IAstStatementVisitor&lt;bool&gt;.Visit(AstCompilationUnit stmt)
{
    // Process all statements
    for (int i = 0; i &lt; stmt.Statements.Count; i++)
    {
        if (!stmt.Statements[i].Visit(this))
        {
            stmt.Statements.RemoveAt(i);
            i--;
        }
    }

    // Add the global class declaration to the unit
    if (_globalClass != null)
        stmt.Statements.Add(_globalClass);

    return true;
}
</code></pre><p>Here&apos;s the visitor for the function declaration statement - it just adds the statement to the global class and returns false - so it&apos;s removed from the root compilation unit.</p><pre><code class="language-csharp">bool IAstStatementVisitor&lt;bool&gt;.Visit(AstFunctionDeclarationStatement stmt)
{
    // Move it to the global class
    GetGlobalClass().Statements.Add(stmt);
    stmt.Modifiers |= Modifiers.Public | Modifiers.Static;
    return false;
}
</code></pre><p>Note the statement is also updated to implicitly mark it as <code>public</code> and <code>static</code>.</p><p>Variable declarations are handled in exactly the same way and all other statements throw an error - since they&apos;re not supported a the top-level (at least not yet).</p><h3 id="before-and-after">Before and After</h3><p>To see the effect of the top-level statement visitor we can create a test case that shows the raw and the post-processed AST. </p><pre><code class="language-csharp">void main()
{
	Console.WriteLine(&quot;Hello World&quot;);
}

## Raw-AST

void main()
{
    Console.WriteLine(&quot;Hello World&quot;);
}

## AST

public static partial class $global
{
    public static void main()
    {
        Console.WriteLine(&quot;Hello World&quot;);
    }
}
</code></pre><p>Notice how the original <code>void main()</code> function is now <code>public static void main()</code> on the <code>$global</code> class.</p><h3 id="wrapping-up">Wrapping Up</h3><p>That&apos;s it for top-level statements. </p><p>Top-level Statements remove the need for much of the plumbing and boilerplate code that would otherwise be necessary when all code must reside in a class and/or function.</p><p>The <code>TopLevelStatements</code> visitor is the first step after parsing and it just provides syntactic sugar that wraps any such statements in a class. </p><p>This approach means that code is always in a class and it removes the need for any further special handling by the later stages of the compiler.</p><p></p>]]></content:encoded></item><item><title><![CDATA[How to Write a Compiler #6 - Testing]]></title><description><![CDATA[For language products typical unit testing doesn't work well long term.  Instead I recommend sandboxing for early prototyping and a dedicated test case file runner for more end-to-end style testing.]]></description><link>https://www.toptensoftware.com/blog/how-to-write-a-compiler-6-testing/</link><guid isPermaLink="false">65b471d9649514000101027e</guid><category><![CDATA[h2wac]]></category><category><![CDATA[c-minor]]></category><category><![CDATA[C#]]></category><dc:creator><![CDATA[Brad Robinson]]></dc:creator><pubDate>Thu, 22 Feb 2024 02:40:37 GMT</pubDate><media:content url="https://www.toptensoftware.com/blog/content/images/2024/02/TestingBanner.png" medium="image"/><content:encoded><![CDATA[<img src="https://www.toptensoftware.com/blog/content/images/2024/02/TestingBanner.png" alt="How to Write a Compiler #6 - Testing"><p>On other language-based projects that I&apos;ve worked on I&apos;ve found that coded unit tests are cumbersome and a better approach is end-to-end testing with a dedicated test runner. </p><p>This post describes the approach I&apos;ve taken with C-minor.</p><h3 id="starting-with-sandboxing">Starting with Sandboxing</h3><p>For this project I didn&#x2019;t want to waste time writing a comprehensive set of unit-tests for the tokenizer and parser as I knew those tests would soon be superceded by end-to-end testing. </p><p>Instead, I used a throw away sandbox program to exercise these parts to confirm they were basically working. Along with an AST formatter, I then had all the pieces required to build a custom test runner. </p><h3 id="requirements-for-a-testing-framework">Requirements for a Testing Framework</h3><p>The main thing I wanted to avoid with testing is the need to embed C-minor programs as strings in C# test case code because:</p><ul><li>it would require escaping C-minor code as C# strings.</li><li>it&apos;s hard to correlate line numbers in error messages from the C-minor compiler to line numbers in an embedded string.</li></ul><p>In other words, the source C-minor code had to come straight from a file. </p><p>Besides the C-minor program I knew the test cases would also be working with other blocks of text including reformatted AST listings, error messages from the compiler, generated code, perhaps assembly listings and the output from running a C-minor program. </p><p>Here&#x2019;s the final requirements I decided on for the test cases:</p><ul><li>a test case file that included an input C-minor program</li><li>line number error messages should match directly with the input program source code file</li><li>ability to work with other blocks of text - including output from the compiler for diagnostic purposes and other blocks of text to be tested for correct output</li><li>ability to list the input program&#x2019;s AST and compare to an expected AST</li><li>ability to check the output of a C-minor program to confirm correct behaviour</li><li>ability to check for expected compiler error messages</li><li>ability to show actual results when they don&#x2019;t match expected results</li><li>ability to produce diagnostic listings including generated code and post-processed AST listings </li><li>the test runner should be able to run individual test cases or a whole recursive directory of test cases</li><li>the test runner should be a console mode app so it can be run on any platform and from within a coding editor such as VS Code</li><li>the output of the test runner itself should be minimal just listing failed tests</li></ul><h3 id="c-minor-test-case-file-format">C-minor Test Case File Format</h3><p>The file format I decided on for the test case files is extremely simple:</p><ul><li>at the top of the file is a C-minor program</li><li>this is followed by sections of text delimited by section headings each starting with <code>##</code>.</li></ul><p>A <code>TestCaseFile</code> utility class (<a href="https://www.toptensoftware.com/cminor/ref/Topten.Cminor.Test.TestCaseFile?ref=toptensoftware.com" rel="noreferrer">see here</a>) provides support for loading a file, splitting it into sections, methods to add/remove and replace sections and then write it out again. </p><h3 id="testing-the-tokenizer-and-parser">Testing the Tokenizer and Parser</h3><p>To test the tokenizer and parser the test cases support a section named <code>## Expected-Raw-AST</code>.  When a test contains this section the test runner parses the code section to an AST before reformatting it back into code and comparing it to the expected AST. </p><pre><code>// C-minor program at the top
void main()
{
    // Print a message
    Console.WriteLine(&quot;Hello World&quot;);

    /* Notice comments have been stripped below */
}

## Expected-Raw-AST
void main()
{
    Console.WriteLine();
}</code></pre><p>If the AST matches, the test passes and there&apos;s nothing else to do.  If the test fails (as the above example would), the test runner adds a<code>## Actual-Raw-AST</code> to the test case file:</p><pre><code>// C-minor program at the top
void main()
{
    // Print a message
    Console.WriteLine(&quot;Hello World&quot;);

    /* Notice comments have been stripped below */
}

## Expected-Raw-AST
void main()
{
    Console.WriteLine();
}

## Actual-Raw-AST
void main()
{
    Console.WriteLine(&quot;Hello World&quot;);
}

</code></pre><p>From here, there&apos;s two possibilities:</p><ul><li>There&apos;s a bug in the tokenizer/parser. Once fixed the test can be re-run, the expected AST will match and the<code>## Actual-Raw-AST</code> section will be automatically deleted by the test runner.</li><li>Or, there&apos;s a mistake in the test case and to fix it I can just copy the Actual AST to the Expected AST section and the test will now pass.</li></ul><p>By having the actual AST added right there to the file, everything about the test is contained to one file. It also makes for an easy way to setup these tests:</p><ol><li>Create the input program</li><li>Add an empty <code>## Expected-Raw-Ast</code> section</li><li>Run the test, it will fail and the test runner will add the <code>## Actual-Raw-Ast</code> section</li><li>Review the actual AST to ensure it&apos;s correct</li><li>Copy the actual to expected AST</li></ol><p><em>(Note: the reason it&apos;s called the &quot;raw&quot; AST is that later we&apos;ll be doing manipulations on the AST to produce a post-processed AST.  Using the prefix &quot;raw&quot; keeps the two AST&apos;s separately testable by the test runner)</em></p><h3 id="testing-for-errors">Testing for Errors</h3><p>It&#x2019;s also important to test the compiler generates correct error messages when it should.  To support this, the test case can include an <code>## Expected-Error</code> section.  </p><p>If the compiler doesn&apos;t generate this text as an error output, the test runner adds an <code>## Actual-Error</code> section with the errors that were actually generated (if any).</p><pre><code>void main()
{
    Console.WriteLine(&quot;Hello World&quot;)
}

## Expected-Error
error: code(4,1,4,2): syntax error: expected &apos;;&apos;, found &apos;}&apos;
</code></pre><p>Just like the AST sections the <code>## Actual-Error</code> section makes it easy to initially setup these tests.</p><h3 id="project-structure-overview">Project Structure Overview</h3><p>We&apos;re now almost at the point where we can actually run some tests, but first let&apos;s have a look at the project structure and where everything lives.</p><p>For this project I&apos;ve decided to split things into a few top-level projects:</p><ul><li><code>Topten.CMinor.Compiler</code> - the compiler itself as an assembly (.dll)</li><li><code>Topten.Cminor.FrontEnd</code> - a command line program for launching the compiler (cmc.exe)</li><li><code>Topten.Cminor.Runtime</code> - the .NET API to the runtime</li><li><code>cmrt.dll</code> - the actual (runtime written in C)</li></ul><p><em>(I haven&apos;t covered the runtime yet, but I&apos;ve included it here for completeness)</em></p><p>The <code>Topten.CMinor.Compiler</code> project contains everything related to actually compiling code.  It also includes a <code>Compiler</code> class that wraps all the compilation stages - <a href="https://www.toptensoftware.com/cminor/ref/Topten.Cminor.Compiler?ref=toptensoftware.com" rel="noreferrer">see here</a>.</p><p>Here&apos;s an example of using the compiler to parse a file (or string) and render it as an AST:</p><pre><code class="language-csharp">// Create an instance of the compiler
var c = new Compiler();

// Pass compiler messages to console
c.OnMessage = (m) =&gt; Console.Error.WriteLine(m.ToString());

// Add files to be compiled
c.AddFile(&quot;myfile.cm&quot;);

// Or, add code from a string
//c.AddCode(&quot;void main() {}&quot;, &quot;mycode.cm&quot;);

// Parse it
c.Parse();

// Render AST
c.RenderAST(Console.Out);
</code></pre><h3 id="the-test-runner">The Test Runner</h3><p>All the pieces are now in place for building and running tests: </p><ol><li>the <code>TestRunner</code>  class (<a href="https://www.toptensoftware.com/cminor/ref/Topten.Cminor.Test.TestRunner?ref=toptensoftware.com" rel="noreferrer">see here</a>) in the <code>Topten.Cminor.Compiler</code> project loads a test case file, and then drives the <code>Compiler</code> class to actually run the tests.  It does this according to the <code>Expected-XXX</code> sections it finds in the test case file.</li><li>the front-end (cmc.exe) supports loading C-minor test case files (these have <code>.cmt</code> file extension). For <code>.cmt</code> files the front-end uses the <code>TestRunner</code> class to run the test instead of compiling or running the file directly (as it would for<code>.cm</code> files).</li><li>the front-end can run an entire recursive directory full of <code>.cmt </code>files to run a comprehensive set of tests with one command.</li></ol><h3 id="using-vs-code-to-develop-and-run-tests">Using VS Code to Develop and Run Tests</h3><p>To make running tests a little easier, I setup a couple of VS Code tasks:</p><pre><code class="language-json">{
    // See https://go.microsoft.com/fwlink/?LinkId=733558
    // for the documentation about the tasks.json format
    &quot;version&quot;: &quot;2.0.0&quot;,
    &quot;tasks&quot;: [
        {
            &quot;label&quot;: &quot;Run Current Test&quot;,
            &quot;type&quot;: &quot;shell&quot;,
            &quot;command&quot;: &quot;cmc ${file}&quot;,
            &quot;problemMatcher&quot;: [],
            &quot;group&quot;: {
                &quot;kind&quot;: &quot;build&quot;,
                &quot;isDefault&quot;: &quot;**/*.cmt&quot;
            }
        },
        {
            &quot;label&quot;: &quot;Run All Tests&quot;,
            &quot;type&quot;: &quot;shell&quot;,
            &quot;command&quot;: &quot;cmc .&quot;,
            &quot;problemMatcher&quot;: [],
            &quot;group&quot;: {
                &quot;kind&quot;: &quot;build&quot;,
                &quot;isDefault&quot;: true
            }
        }
    ]
}</code></pre><p>By putting this in a .vscode directory in the base test case directory I can:</p><ul><li>Open a <code>.cmt</code> file and press <code>Ctrl+Shift+B</code> to run it with the test runner.  Any changes made by the test runner (eg: adding an <code>Actual-XXX</code> section for a failed test) will automatically be reloaded by VS Code and appear immediately.</li></ul><p>Close all files and press <code>Ctrl+Shift+B</code> to run all the tests in the directory.  This produces a list of files that failed in the output window that I can <code>Ctrl+click</code> to open them and see what went wrong.</p><h3 id="testing-in-action">Testing in Action</h3><p>Here&apos;s a video showing the test runner in action.</p><figure class="kg-card kg-embed-card"><iframe width="200" height="113" src="https://www.youtube.com/embed/t6ThKwiK01A?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen title="Testing the C-minor Compiler"></iframe></figure><h3 id="time-to-get-testing">Time to Get Testing</h3><p>The approach I&apos;ve taken to testing with C-minor is a custom test case file that&apos;s updated with detailed information by the test runner when a test fails.  The output of the test runner is just a list of failed tests and a couple of tasks makes it easy to run individual or the entire test set from within VS Code.</p><figure class="kg-card kg-image-card"><img src="https://www.toptensoftware.com/blog/content/images/2024/02/image.png" class="kg-image" alt="How to Write a Compiler #6 - Testing" loading="lazy" width="1581" height="418" srcset="https://www.toptensoftware.com/blog/content/images/size/w600/2024/02/image.png 600w, https://www.toptensoftware.com/blog/content/images/size/w1000/2024/02/image.png 1000w, https://www.toptensoftware.com/blog/content/images/2024/02/image.png 1581w" sizes="(min-width: 720px) 720px"></figure><p>From here on, these test case files will be central to everything I do with the compiler.  I might use sandboxing for testing and experimenting with one-off things for the runtime, but anything related to the compiler itself will be developed and tested using these tests case files.</p><p></p>]]></content:encoded></item><item><title><![CDATA[How to Write a Compiler #5 - The Visitor Pattern]]></title><description><![CDATA[If ever there was a design pattern that's perfectly suited to a job, it's the visitor pattern for working with an abstract syntax tree. But what is the visitor pattern and why is it so well suited?]]></description><link>https://www.toptensoftware.com/blog/how-to-write-a-compiler-5-the-visitor-pattern/</link><guid isPermaLink="false">65acd6d6649514000101003b</guid><category><![CDATA[h2wac]]></category><category><![CDATA[c-minor]]></category><category><![CDATA[C#]]></category><category><![CDATA[design patterns]]></category><category><![CDATA[visitor pattern]]></category><dc:creator><![CDATA[Brad Robinson]]></dc:creator><pubDate>Wed, 14 Feb 2024 03:30:13 GMT</pubDate><media:content url="https://www.toptensoftware.com/blog/content/images/2024/01/VisitorPattern.png" medium="image"/><content:encoded><![CDATA[<div class="kg-card kg-callout-card kg-callout-card-green"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text"><i><em class="italic" style="white-space: pre-wrap;">This post is part of the series </em></i><a href="https://www.toptensoftware.com/blog/tag/h2wac" rel="noreferrer"><i><em class="italic" style="white-space: pre-wrap;">How to Write a Compiler</em></i></a><i><em class="italic" style="white-space: pre-wrap;"> based on my experience building C-minor - a strongly typed, garbage collected, direct to in-memory machine-code scripting language.</em></i></div></div><img src="https://www.toptensoftware.com/blog/content/images/2024/01/VisitorPattern.png" alt="How to Write a Compiler #5 - The Visitor Pattern"><p>If you&apos;ve been doing software development for any length of time you&apos;ve almost certainly heard of the visitor pattern - it&apos;s even got its own <a href="https://en.wikipedia.org/wiki/Visitor_pattern?ref=toptensoftware.com" rel="noreferrer">Wikipedia page</a>.  </p><p>But it&apos;s also quite possible that you&apos;ve never actually used it.  Rather than try to explain it in theoretical terms, let&apos;s look at it from a practical point of view.</p><p><em>(Normally I wouldn&apos;t devote a whole post to a design pattern, but it really is central to everything else the compiler does.  It&apos;s also a rarely used pattern so I wanted to make sure we&apos;re on the same page on how it works).</em></p><h3 id="working-with-the-ast">Working with the AST</h3><p>Now that our compiler has parsed the source code into an AST we need a way to work with the AST.  We might want to go over it to check for errors, evaluate a constant expression node, or generate code.  Or perhaps we just want to print it out.</p><p>Let&apos;s take that last one - &quot;print it out&quot; - and see how to implement it.</p><h3 id="the-traditional-oo-approach">The Traditional OO Approach</h3><p>In order to print out the AST we need to start at the top-level element and print the parts related to that element while also recursively calling its connected elements to print themselves.</p><p>The traditional OO way to do this would be to add an abstract method to the base <code>AstElement</code> class named <code>Print()</code>. </p><pre><code class="language-csharp">class AstElement
{
    // omitted

    public abstract void Print(IndentedTextwriter target);
}</code></pre><p><em>(Let&apos;s assume a helper class <code>IndentedTextWriter</code> that we can print to).</em></p><p>Next, we&apos;d implement the <code>Print</code> method on every class.  For example, we&apos;d add a <code>Print</code> method to the <code>if</code> statement like this:</p><pre><code class="language-csharp">public override void Print(IndentedTextWriter target)
{
    target.Write(&quot;if (&quot;);
    Condition.Print(target);
    target.WriteLine(&quot;)&quot;);

    TrueBlock.Print(target);

    if (FalseBlock != null)
    {
        target.WriteLine(&quot;else&quot;);
        FalseBlock.Print(target);
    }
}</code></pre><p>To print the entire AST all we have to do is call <code>Print</code> on the top level <code>AstCompilationUnit</code> and the entire program can be printed out.  Nice!</p><p>Need to generate code? Repeat the above adding a <code>GenerateCode()</code> method.</p><p>Need something else? Repeat again adding another method.</p><p>Very quickly a problem emerges:  the code for each of these operations is spread over many different classes and files.  Also, any state or context information for the operations (such as the target writer in the above example) needs to be passed around as parameters.</p><h3 id="containing-an-operation-to-one-class">Containing an Operation to One Class</h3><p>What we&apos;d like to do is contain each operation on the AST to its own class.</p><p>Continuing with our example of printing out the AST, let&apos;s make a class called <code>AstFormatter</code>:</p><pre><code class="language-csharp">class AstFormatter
{
    public AstFormatter(IndentedTextWriter target)
    {
        _target = target;
    }

    public void Print(AstElement element)
    {
        // TODO
    }
}</code></pre><p>and let&apos;s move the print methods from all the <code>AstElement</code> classes into our new <code>AstFormatter</code> .  The function to print <code>if</code> statements now looks like this:</p><pre><code class="language-csharp">void PrintIfStatemnt(AstIfStatement stmt)
{
    _target.Write(&quot;if (&quot;);
    Print(stmt.Condition);
    _target.WriteLine(&quot;)&quot;);

    Print(stmt.TrueBlock);

    if (stmt.FalseBlock != null)
    {
        _target.WriteLine(&quot;else&quot;);
        Print(FalseBlock);
    }
}</code></pre><p>This is looking good:</p><ol><li>We&apos;ve moved all our printing code to one class</li><li>We don&apos;t need to pass the target text writer around anymore since we can just store it as a member variable on the <code>AstFormatter</code> class.</li></ol><p>But we&apos;ve also created a mess.  The worst kind of mess.  A slow mess.  </p><p>See that <code>TODO</code> comment in the <code>Print</code> method?  It needs to inspect every element&apos;s type to work out which print method to call for each element:</p><pre><code class="language-csharp">if (element is AstIfStatement stmt_if)
  PrintIfStatement(stmt_if);
else if (element is AstForStatement stmt_for)
  PrintForStatement(stmt_for);
else if (element is AstWhileStatement stmt_while)
  PrintWhileStatement(stmt_while);
else if 

// ... for every ... type ... of ... element. Ugh</code></pre><p>We&apos;ve solved the original problem, but:</p><ul><li>it&apos;s slow - we need to test every element for every type.</li><li>it&apos;s messy - just look at it!</li><li>it&apos;s unsafe - we won&apos;t get compiler errors if we forget to implement an element (or add a new element type).</li><li>it&apos;s painful - we need to re-type (or cut/copy/replace) that code for every different operation.</li></ul><p>What we need is a type-safe, fast way to dispatch those <code>PrintXXX()</code> methods.  </p><h3 id="using-a-dispatch-interface">Using a Dispatch Interface</h3><p>Imagine we&apos;ve implemented all our operations using the above approach: a separate class for each operation, with a function that dispatches to methods for each element type.  </p><p>What you&apos;ll notice is that you end up with each class having a method to handle<code>if</code> statements, a method to handle <code>for</code> statements, a method to handle <code>while</code> statement etc...</p><p>Let&apos;s force each of those classes to implement those methods by defining an interface <code>IStatementProcesor</code> that has a method for each statement type.</p><pre><code class="language-csharp">interface IStatementProcessor
{
    void ProcessIfStatement(AstIfStatement stmt);
    void ProcessForStatement(AstForStatement stmt);
    void ProcessWhileStatement(AstForStatement stmt);
    // etc.. for all statement types
}</code></pre><p>By implementing this interface on an operation class we can ensure that it implements the functionality for all element types:</p><pre><code class="language-csharp">class AstFormatter : IStatementProcessor
{
    // If we don&apos;t implement a particular element type...
    // the compiler will complain
}</code></pre><p>That solves the &quot;unsafe&quot; problem with the previous approach, but we can also use it to solve the &quot;slow&quot; and &quot;messy&quot; problems by adding a <code>Process</code> method to the element classes:</p><pre><code class="language-csharp">abstract class AstElement
{
    public abstract void Process(IStatementProcessor processor)
}

class AstIfStatement : AstElement
{
    public override void Process(IStatementProcessor processor)
    {
        processor.ProcessIfStatment(this);
    }
}</code></pre><p>In other words, if you call <code>Process</code> on an element, passing an instance of <code>IStatementProcessor</code> you&apos;ll be called back through that interface on the correct method for the element&apos;s type.</p><p>Now the big slow mess can be fixed:</p><pre><code class="language-csharp">// GET RID OF THIS
/*
if (element is AstIfStatement stmt_if)
  PrintIfStatement(stmt_if);
else if (element is AstForStatement stmt_for)
  PrintForStatement(stmt_for);
else if (element is AstWhileStatement stmt_while)
  PrintWhileStatement(stmt_while);
else if ...
*/

element.Process(this);</code></pre><p>Oh, and by the way, we&apos;ve also solved the &quot;painful&quot; problem.  Right clicking on the interface in Visual Studio there&apos;s a command &quot;Implement Interface Explicitly&quot; and stubs for every method will be automatically generated and we can just fill in the blanks.</p><p>What I&apos;ve just described is the Visitor Pattern... so let&apos;s call it that.</p><h3 id="the-visitor-pattern">The Visitor Pattern</h3><p>Now that we understand what the visitor pattern is, let&apos;s look at how it&apos;s actually implemented in C-minor for the AST.</p><p>As before, there are two categories of visitors - statements and expressions and since we might want to have return values when processing elements the interfaces are defined with a generic type parameter <code>T</code> for the return type:</p><pre><code class="language-csharp">public interface IAstStatementVisitor&lt;T&gt;
{
  T Visit(AstIfStatement stmt);
  T Visit(AstForStatement stmt);
  T Visit(AstWhileStatement stmt);
  // etc
}

public interface IAstExprNodeVisitor&lt;T&gt;
{
  T Visit(AstExprNodeLiteral node);
  T Visit(AstExprNodeUnaryOp node);
  T Visit(AstExprNodeBinaryOp node);
  // etc...
}</code></pre><p><em>(Note: we can use method overloading to name every method <code>Visit()</code> instead of <code>VisitIfStatement()</code>, <code>VisitForStatement()</code> etc...)</em></p><p>The <code>AstStatement</code> and <code>AstExprNode</code> bases classes both get an abstract <code>Visit</code> method:</p><pre><code class="language-csharp">public class AstStatement
{
    public abstract T Visit&lt;T&gt;(IAstStatementVisitor&lt;T&gt; visitor);
}

public class AstExprNode
{
    public abstract T Visit&lt;T&gt;(IAstExprNodeVisitor&lt;T&gt; visitor);
}

</code></pre><p>and every statement and expression node implements the <code>Visit</code> method with a simple one liner:</p><pre><code class="language-csharp">public class AstIfStatement
{
    public override T Visit&lt;T&gt;(IAstStatementVisitor&lt;T&gt; visitor) =&gt; visitor.Visit(this);
}</code></pre><h3 id="revisiting-ha-ha-the-astformatter">Revisiting (ha-ha) the AstFormatter</h3><p>Now that we&apos;ve formalized our visitor pattern, let&apos;s take a look at how our original <code>AstFormatter</code> class now looks:</p><pre><code class="language-csharp">class AstFormatter : 
    IAstStatementVisitor&lt;bool&gt;,
    IAstExprNodeVisitor&lt;bool&gt;
{
    public AstFormatter(IndentedTextWriter target)
    {
        _target = target;
    }

    public void Print(AstStatment stmt)
    {
        stmt.Visit(this);
    }

    bool IAstStatementVisitor&lt;bool&gt;.Visit(AstIfStatement stmt)
    {
        _target.Write(&quot;if (&quot;);
        stmt.Condition.Visit(this);
        _target.WriteLine(&quot;)&quot;);

        stmt.TrueBlock.Visit(this);

        if (stmt.FalseBlock != null)
        {
            _target.WriteLine(&quot;else&quot;);
            stmt.FalseBlock.Visit(this);
        }
        return true;
    }

    bool IAstStatementVisitor&lt;bool&gt;.Visit(AstForStatement stmt)
    {
        // TODO
    }

    // etc...
}</code></pre><p>Note:</p><ul><li>C# doesn&apos;t allow a <code>void</code> generic type parameter so<code>bool</code> is used and its return value is always ignored.</li><li>The interfaces are implemented explicitly which keeps the public API to the <code>AstFormatter</code> nice and clean.</li></ul><h3 id="who-are-these-mysterious-visitors">Who Are These Mysterious Visitors?</h3><p>To put into perspective just how important the visitor pattern is, here&apos;s the full list of visitors currently used in C-minor&apos;s compiler:</p><ul><li>AstFormatter</li><li>CodeGenerator (C code)</li><li>ControlFlowAnalysis</li><li>HydrateTypes</li><li>NameResolver</li><li>ScaffoldTypes</li><li>StatementBinder</li><li>TopLevelStatements</li><li>ConstantExpressionEvaluator</li><li>SideEffects</li></ul><p>Some of these classes are quite complex and without the visitor pattern their code would be scattered across many classes and files and we&apos;d need to pass some sort of context class around to store the state of the operation.</p><h3 id="the-down-side-of-the-visitor-pattern">The Down Side of the Visitor Pattern</h3><p>There&apos;s only one small downside I&apos;ve found with the visitor pattern - we can&apos;t pass parameters to the visit methods.</p><p>For example, in the <code>ControlFlowAnalysis</code> visitor there&apos;s an &quot;entry state&quot; that needs to be passed to each element as it&apos;s visited.</p><p>My solution is to store that information as a member of the operation class with a helper function that sets it before visiting the element. The <code>Visit</code> callback then picks it up again once it&apos;s been called.</p><pre><code class="language-csharp">// Helper function to visit a statement, passing an entryState
CFSTate Visit(AstStatement stmt, CFState entryState)
{
    _entryState = entryState;
    return stmt.Visit(this);
}

CFState Visit(AstIfStatement stmt)
{
    // Would be nicer if this was a parameter, but oh well.
    var entryState = _entryState;
}</code></pre><p>It&apos;s a bit of a hacky solution, and I could update the visitor interfaces with at least one generic parameter, but I don&apos;t think it&apos;s worth it.</p><h3 id="enough-already">Enough Already!</h3><p>Ok, so I think I&apos;ve gone on enough about the visitor pattern, but just to sum up why it&apos;s used:</p><ul><li>Code for each operation on the AST can be contained to a single class.</li><li>The AST elements know nothing about the operations being performed on them (this is good).</li><li>We can add operations without touching the AST classes.</li><li>It&apos;s fast since the dispatch is through virtual methods, not slow type tests.</li><li>The operations can implement the separate statement and/or expression node visitor patterns as required.</li><li>If we add a new AST element type, we have to add the associated method to the visitor interface in order to implement its <code>Visit</code> method... and then we&apos;re forced to implement the method on all classes.  ie: it&apos;s type safe.</li></ul><p>From now on, pretty much everything the C-minor compiler does centers around the AST - and the visitor pattern is the mechanism for doing it.</p>]]></content:encoded></item><item><title><![CDATA[How to Write a Compiler #4 - The Parser]]></title><description><![CDATA[The Parser takes a stream of tokens, checks the syntax is valid and produces an Abstract Syntax Tree.]]></description><link>https://www.toptensoftware.com/blog/how-to-write-a-compiler-4-the-parser/</link><guid isPermaLink="false">65ac9d72649514000100fed5</guid><category><![CDATA[h2wac]]></category><category><![CDATA[c-minor]]></category><category><![CDATA[C#]]></category><dc:creator><![CDATA[Brad Robinson]]></dc:creator><pubDate>Wed, 07 Feb 2024 03:00:11 GMT</pubDate><media:content url="https://www.toptensoftware.com/blog/content/images/2024/01/ParserBanner.png" medium="image"/><content:encoded><![CDATA[<div class="kg-card kg-callout-card kg-callout-card-green"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text"><i><em class="italic" style="white-space: pre-wrap;">This post is part of the series </em></i><a href="https://www.toptensoftware.com/blog/tag/h2wac" rel="noreferrer"><i><em class="italic" style="white-space: pre-wrap;">How to Write a Compiler</em></i></a><i><em class="italic" style="white-space: pre-wrap;"> based on my experience building C-minor - a strongly typed, garbage collected, direct to in-memory machine-code scripting language.</em></i></div></div><h3 id="the-parser">The Parser</h3><img src="https://www.toptensoftware.com/blog/content/images/2024/01/ParserBanner.png" alt="How to Write a Compiler #4 - The Parser"><p>Now that we&apos;ve got a Tokenizer and an Abstract Syntax Tree we&apos;ve got everything we need to build the Parser.</p><p>The parser&apos;s job is to read the tokens from the tokenizer, check they conform to the syntax rules of the language and produce a fully populated abstract syntax tree.</p><p>Just like the AST, this falls into two main categories - statements and expressions.</p><h3 id="parsing-statements">Parsing Statements</h3><p>Parsing statements is a relatively straight forward process:</p><ol><li>If the current token is a flow control statement <code>if</code>, <code>for</code>, <code>while</code> etc... then call a dedicated function  to parse the supported syntax of each statement type.</li><li>If the current token is an open brace <code>{</code> then the statement is a code block, and the parser just recursively parses its contents into an <code>AstCodeBlock</code> statement.</li><li>Otherwise, the statement must be an expression or declaration statement - call a dedicated function for this.</li></ol><p>Here&apos;s the code from C-minor that does this:</p><pre><code class="language-csharp">AstStatement ParseStatement()
{
    // Flow control statements
    switch (_tokenizer.Token)
    {
        case Token.Keyword_break:
            return ParseBreakStatement();

        case Token.Keyword_continue:
            return ParseContinueStatement();

        case Token.Keyword_for:
            return ParseForStatement();

        case Token.Keyword_if:
            return ParseIfStatement();

        case Token.Keyword_return:
            return ParseReturnStatement();

        case Token.Keyword_switch:
            return ParseSwitchStatement();

        case Token.Keyword_do:
            return ParseDoWhileStatement();

        case Token.Keyword_while:
            return ParseWhileStatement();

        case Token.Keyword_throw:
            return ParseThrowStatement();

        case Token.Keyword_try:
            return ParseTryStatement();
    }

    // Braced statement block?
    if (_tokenizer.Token == Token.OpenBrace)
    {
        return ParseCodeBlock();
    }

    // Function declaration, variable declaration or expression statement
    return ParseExpressionOrDeclarationStatement(0);
}
</code></pre><h3 id="flow-control-statements">Flow Control Statements</h3><p>Flow control statements are parsed by following the required and optional parts of the statement and building the associated AST element.</p><p>Here&apos;s an example for the <code>if</code> statement:</p><pre><code class="language-csharp">AstIfStatement ParseIfStatement()
{
    // Create &apos;if&apos; Statement
    var stmt = new AstIfStatement(_tokenizer.TokenPosition);

    // Skip &quot;if&quot;
    _tokenizer.Next();

    // Parse the condition expression
    // &apos;(&apos; &lt;condition&gt; &apos;)&apos;
    _tokenizer.SkipToken(Token.OpenRound);
    stmt.Condition = ParseExpression();
    _tokenizer.SkipToken(Token.CloseRound);

    // Parse the true block
    stmt.TrueBlock = ParseCodeBlock();

    // If there&apos;s an else block parse it too
    if (_tokenizer.TrySkipToken(Token.Keyword_else))
        stmt.FalseBlock = ParseCodeBlock();

    // Done
    return stmt;
}
</code></pre><p>That&apos;s the entire code for parsing an <code>if</code> statement. The other flow control statements are all very similar.</p><p>The parsing code makes extensive use of helper functions provided by the tokenizer:</p><ul><li><code>SkipToken</code> - checks the current token matches a particular token and if so moves to the next token. Otherwise, it throws a <code>CodeException</code> reporting a syntax error.</li><li><code>TrySkipToken</code> - checks if the current token matches a particular token and if so moves to the next token and return true.  Otherwise, it returns false.</li><li>There are other similar functions for skipping identifiers and various unexpected token helpers.</li></ul><p>Note that at this point we&apos;ve enforced the syntax of the <code>if</code> statement but haven&apos;t checked the statement is correct.  eg: we haven&apos;t checked the condition expression evaluates to a boolean value.</p><p>Similarly, we haven&apos;t checked for statements that are only supported in certain contexts.  For example, a <code>continue</code> statement can only be used inside a loop but the parser doesn&apos;t care and will happily include it anywhere in the AST.</p><h3 id="code-blocks">Code Blocks</h3><p>A code block is a sequence of one or more statements, optionally enclosed in braces. Here&apos;s how they&apos;re parsed:</p><pre><code class="language-csharp">AstCodeBlock ParseCodeBlock()
{
    // Create a code block statement
    var block = new AstCodeBlock(_tokenizer.TokenPosition);

    // Is it a braced code block?
    if (_tokenizer.TrySkipToken(Token.OpenBrace))
    {
        // Yes, parse statements until the close brace
        block.WasBraced = true;
        while (!_tokenizer.TrySkipToken(Token.CloseBrace))
        {
            block.Statements.Add(ParseStatement());
        }
    }
    else
    {
        // Not braced, parse just a single statement
        block.Statements.Add(ParseStatement());
    }

    return block;
}
</code></pre><p>Notice how braces are handled to group a block of statements as a single statement.  This allows the same parsing code to be used for cases where braces are required vs optional. Eg:</p><ul><li>The true/false blocks of an <code>if</code> statement can be either a single statement or a braced block of statements. <code>ParseCodeBlock</code> handles either case.</li><li>The code blocks of a <code>try</code>/<code>catch</code> statement are always braced.  To enforce this the code that parses <code>try</code> statements just checks for an open brace before calling <code>ParseCodeBlock</code>.</li></ul><p>Although not strictly necessary for the abstract syntax tree, the code block remembers if the block was originally braced or not.  This is only used for reformatting the AST to text for debugging purposes and isn&apos;t used otherwise.</p><h3 id="parsing-declarations">Parsing Declarations</h3><p>Declarations are probably the trickiest part of the parser because we can&apos;t always tell whether it&apos;s a declaration or an expression until we get some way into things.</p><p>Declarations include:</p><ul><li>variables eg: <code>int x</code>;</li><li>functions eg: <code>void fn() { }</code></li><li>type declarations eg: <code>class MyClass { }</code></li></ul><p>The approach here is to:</p><ol><li>Capture the current position in the tokenizer.</li><li>Try to parse a declaration - if it succeeds, then use it and continue on.</li><li>Otherwise, rewind the tokenizer to the saved position and try again as an expression.</li></ol><pre><code class="language-csharp">AstStatement ParseExpressionOrDeclarationStatement(ParseFlags flags)
{
    // Function or variable declaration
    var decl = TryParseDeclaration(flags);
    if (decl != null)
        return decl;

    // Must be an expression statement
    var pos = _tokenizer.TokenPosition.Start;
    var expr = ParseExpression();
    if (!flags.HasFlag(ParseFlags.NoConsumeSemicolon))
        _tokenizer.SkipToken(Token.SemiColon);
    return new AstExpressionStatement(pos)
    {
        Expression = expr
    };
}
</code></pre><p><em>(the <code>NoConsumeSemicolon</code> flag is used when parsing the initializer of a <code>for</code> statement).</em></p><h3 id="parsing-expressions">Parsing Expressions </h3><p>Finally, we come to parsing expression nodes.  The main thing to get right here is the order-of-operation which is established by having the parser functions for lower order of operations call parse functions for higher order operations.</p><p>Let&apos;s take a look at the function for parsing the add and subtract binary operator.  Add and subtract have equal order of operation, are evaluated left to right and have lower order of operation than multiply and divide.</p><pre><code class="language-csharp">// Parse add/subtract expression nodes
AstExprNode ParseAddSub()
{
    // Parse the LHS
    var lhs = ParseMulDiv();

    // Parse RHS while it&apos;s a add or subtract
    while (true)
    {
        switch (_tokenizer.Token)
        {
            case Token.Plus:
            {
                // Create binary add operator
                var binOp = new AstExprNodeBinaryOp(_tokenizer.TokenPosition);
                _tokenizer.Next();
                binOp.LHS = lhs;
                binOp.RHS = ParseMulDiv();
                binOp.Operator = BinaryOperatorType.Add;

                // Binary op becomes the new LHS
                lhs = binOp;
                break;
            }

            case Token.Minus:
            {
                // Create binary subtract operator
                var binOp = new AstExprNodeBinaryOp(_tokenizer.TokenPosition);
                _tokenizer.Next();
                binOp.LHS = lhs;
                binOp.RHS = ParseMulDiv();
                binOp.Operator = BinaryOperatorType.Subtract;

                // Binary op becomes the new LHS
                lhs = binOp;
                break;
            }

            default:
                // Not an add or subtract, finish.
                return lhs;
        }
    }
}
</code></pre><p>Consider this expression:</p><pre><code>a * b + c * d</code></pre><p>and imagine the tokenizer is currently on the <code>a</code> identifier token and we&apos;ve just entered the <code>ParseAddSub</code> method:</p><ul><li>The <code>ParseMulDiv</code> function is called.  This will consume the <code>a * b</code> and return an binary expression node to multiply <code>a</code> and <code>b</code></li><li>The <code>ParseAddSub</code> function then sees the <code>Token.Plus</code> token and creates a new <code>AstExprNodeBinaryOp</code> for addition.</li><li>It sets the left-hand side to the previously parsed result from ParseMulDiv. </li><li>It then parses another mul/div expression and sets it as the right-hand side.</li><li>The new expression node is then set as the left-hand side and the whole operation is repeated for as long as there&apos;s an add or subtract token.</li></ul><p>What we end up with is an expression tree that looks like this:</p><pre><code class="language-mermaid">flowchart TD

    add[+] --&gt; mul
    add[+] --&gt; mul2
    mul[*] --&gt; a
    mul[*] --&gt; b

    mul2[*] --&gt; c
    mul2[*] --&gt; d</code></pre><p>You might be wondering how parentheses work?  Let&apos;s look at this expression:</p><pre><code>a * (b + c) * d</code></pre><p>The <code>ParseMulDiv</code> function is basically an exact copy of the <code>ParseAddSub</code> function except it calls an even higher order of operation parser for its operands.  At the very top of the order of operation the parser looks for round parentheses and recurses back into itself.</p><pre><code class="language-csharp">AstExprNode ParsePrimaryRoot()
{
    switch (_tokenizer.Token)
    {
        /* OTHER CASES OMITTED */

        // Grouped expression `(` &lt;subexpression&gt; `)`
        case Token.OpenRound:
        {
            // Skip &apos;(&apos;
            _tokenizer.Next();

            // Parse the sub-expression
            node = ParseExpression();

            // Must end with a &apos;)&apos;
            _tokenizer.SkipToken(Token.CloseRound);

            return node;
        }

        // ...

    </code></pre><p>So when <code>ParseMulDiv</code> parses its right-hand operand after parsing the <code>a</code> operand, it will get back a binary node adding <code>b + c</code> and the expression tree looks like this:</p><pre><code class="language-mermaid">flowchart TD

    mul[*] --&gt; a
    mul[*] --&gt; add
    add[+] --&gt; b
    add[+] --&gt; c

    mul2[*] --&gt; mul[*]
    mul2[*] --&gt; d</code></pre><p>It&apos;s the order that these expression parsing functions call each other that establishes order of operation.  </p><p>Here&apos;s the full set of expression parsing functions from highest to lowest order of operation (ie: each function calls the one above it to parse its operands).</p><pre><code class="language-csharp">// literals, identifiers, &apos;(&apos; &apos;)&apos;
AstExprNode ParsePrimaryRoot()

// function call `()`, member accessor `.`, postfix `++` and `--`
AstExprNode ParsePrimarySuffix()

// (type cast)
AstExprNode ParsePrimary()

// prefix `++` and `--`, negate `-`, unary plus `+`, 
// logical not `!`, one&apos;s complement `~`
AstExprNode ParseUnary()

// `*`, `/` and  `%`
AstExprNode ParseMulDiv()

// `+` and `-`
AstExprNode ParseAddSub()

// `&lt;&lt;` and `&gt;&gt;`
AstExprNode ParseShift()

// `&lt;`, `&lt;=`, `&gt;` and `&gt;=`
AstExprNode ParseRelational()

// `==` and `!=`
AstExprNode ParseEquality()

// `&amp;`
AstExprNode ParseBitwiseAnd()

// `^`
AstExprNode ParseBitwiseXor()

// `|`
AstExprNode ParseBitwiseOr()

// `&amp;&amp;`
AstExprNode ParseLogicalAnd()

// `||`
AstExprNode ParseLogicalOr()

// `? :` (conditional operator)
AstExprNode ParseTernary()

// Entry point to parsing an expression
AstExprNode ParseExpression()
</code></pre><h3 id="parser-api">Parser API</h3><p>That&apos;s most of the internal workings of the Parser but you might be wondering where it all starts?  What kicks it off?</p><p>The API to the Parser is essentially just one function <code>ParseCompilationUnit</code> - <a href="https://www.toptensoftware.com/cminor/ref/Topten.Cminor.Syntax.Parser.ParseCompilationUnit?ref=toptensoftware.com" rel="noreferrer">see here</a>.  </p><p>A compilation unit represents an entire source code file.</p><pre><code class="language-csharp">/// &lt;summary&gt;
/// Parses a &quot;compilation unit&quot; (aka a source code file)
/// &lt;/summary&gt;
/// &lt;param name=&quot;source&quot;&gt;The &lt;see cref=&quot;CodeFile&quot;/&gt; to parse&lt;/param&gt;
/// &lt;returns&gt;The AST of the entire file&lt;/returns&gt;
public AstCompilationUnit ParseCompilationUnit(CodeFile source)
{
    // Create tokenizer
    _tokenizer = new Tokenizer(source);

    // Create compilation unit ast
    var unit = new AstCompilationUnit(_tokenizer.TokenPosition);

    // Parse all statements
    unit.Statements = ParseStatements();

    // Check nothing unexpected at the end of the file
    _tokenizer.CheckToken(Token.EOF);

    return unit;
}

</code></pre><p>Here&apos;s everything needed to parse a file:</p><pre><code class="language-csharp">try
{
    // Load code
    var codefile = CodeFile.FromFile(&quot;my-cminor-program.cm&quot;);

    // Parse it
    var parser = new Parser();
    var ast = parser.ParseCompilationUnit(codefile);

    // Work with ast...
}
catch (CodeException x)
{
    // Syntax error
    Console.WriteLine(x.Message);
}

</code></pre><p><em>(Note the parser has no internal state so the one instance can be re-used to parse multiple files).</em></p><h3 id="syntax-checked">Syntax Checked!</h3><p>The Tokenizer and Parser have now been brought together to produce an Abstract Syntax Tree.  At this point we have the entire structure of the program loaded and we know it&apos;s syntactically correct.</p><p>We&apos;re now ready to start working with the AST. To do that we&apos;ll be making extensive use of the <a href="https://en.wikipedia.org/wiki/Visitor_pattern?ref=toptensoftware.com" rel="noreferrer">visitor design pattern</a> and in the next post I&apos;ll be showing how it&apos;s implemented and used with C-minor&apos;s AST.</p><p></p><p></p><p></p>]]></content:encoded></item><item><title><![CDATA[How to Write a Compiler #3 - Abstract Syntax Trees]]></title><description><![CDATA[Ask anyone who's worked on a compiler, and they'll all agree that the Abstract Syntax Tree is the most important structure in the whole compiler.  But what exactly is it and what is it used for?]]></description><link>https://www.toptensoftware.com/blog/how-to-write-a-compiler-3-abstract-syntax-trees/</link><guid isPermaLink="false">65ab2cb0649514000100fe12</guid><category><![CDATA[h2wac]]></category><category><![CDATA[c-minor]]></category><category><![CDATA[C#]]></category><dc:creator><![CDATA[Brad Robinson]]></dc:creator><pubDate>Wed, 31 Jan 2024 03:28:45 GMT</pubDate><media:content url="https://www.toptensoftware.com/blog/content/images/2024/01/ASTBanner2.png" medium="image"/><content:encoded><![CDATA[<div class="kg-card kg-callout-card kg-callout-card-green"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text"><i><em class="italic" style="white-space: pre-wrap;">This post is part of the series </em></i><a href="https://www.toptensoftware.com/blog/tag/h2wac" rel="noreferrer"><i><em class="italic" style="white-space: pre-wrap;">How to Write a Compiler</em></i></a><i><em class="italic" style="white-space: pre-wrap;"> based on my experience building C-minor - a strongly typed, garbage collected, direct to in-memory machine-code scripting language.</em></i></div></div><h3 id="what-is-an-abstract-syntax-tree">What is an Abstract Syntax Tree?</h3><img src="https://www.toptensoftware.com/blog/content/images/2024/01/ASTBanner2.png" alt="How to Write a Compiler #3 - Abstract Syntax Trees"><p>An Abstract Syntax Tree (aka AST) is a tree graph of elements that represent the entire input program.</p><p>The easiest way to understand it is with a simple example.  Consider this <code>if</code> statement:</p><pre><code class="language-csharp">if (x &lt; y)
{
  print(&quot;less&quot;);
}
else
{
  return 0;
}</code></pre><p>The abstract syntax tree for this statement would look something like this:</p><pre><code class="language-mermaid">flowchart TD
    if[if statement]
    cond[&lt;]
    x[x]
    y[y]
    
    if --condition--&gt; cond
    cond --&gt; x
    cond --&gt; y

    if --trueblock--&gt; true
    true[codeblock] --statements--&gt; meth1

    if --falseblock--&gt; false
    false[codeblock] --statements--&gt; retstmt

    meth1[method call]
    meth1--LHS--&gt;id1
    meth1--parameters--&gt;p1
    p1[expr literal &apos;less&apos;]
    id1[identifier &apos;print&apos;]
    
    retstmt[return] --value--&gt; retval
    retval[expr literal 0]</code></pre><p>This AST can be read as follows:</p><ul><li>An<code>if</code> statement with three connected elements<ul><li>a condition expression</li><li>a code block of statements to execute if the condition is true</li><li>a code block of statements to execute if the condition is false</li></ul></li><li>The <code>condition</code> is a less than comparison <code>&lt;</code> operator with two operands <code>x</code> and <code>y</code></li><li>The <code>true block</code> is a method call, on the identifier <code>print</code> with a single parameter - a literal string &quot;less&quot;</li><li>The <code>false block</code> is a return statement with the literal value <code>0</code></li></ul><h3 id="statements-vs-expression-nodes">Statements vs Expression Nodes</h3><p>Everything in the AST falls under one of two main categories:</p><ul><li>Statements - control flow statements and declaration (functions, variables, types etc...).</li><li>Expression Nodes - anything that has a value (literals, function calls, operator results etc...)</li></ul><p>In C-minor these two categories are represented by the base classes <a href="https://www.toptensoftware.com/cminor/ref/Topten.Cminor.Ast.AstStatement?ref=toptensoftware.com" rel="noreferrer"><code>AstStatement</code></a> and <a href="https://www.toptensoftware.com/cminor/ref/Topten.Cminor.Ast.AstExprNode?ref=toptensoftware.com" rel="noreferrer"><code>AstExprNode</code></a> - both of which derived from a common base class <a href="https://www.toptensoftware.com/cminor/ref/Topten.Cminor.Ast.AstElement?ref=toptensoftware.com" rel="noreferrer"><code>AstElement</code></a>.</p><pre><code class="language-mermaid">classDiagram
    class AstElement{
        +CodePosition Position
    }
    AstElement &lt;|-- AstStatement
    AstElement &lt;|-- AstExprNode

    AstStatement &lt;|-- all_statements
    AstExprNode &lt;|-- all_exprnodes

    class all_statements[&quot;all statement types&quot;]

    class all_exprnodes[&quot;all expression nodes types&quot;]
</code></pre><p>From <code>AstStatement</code> and <code>AstExprNode</code> there are derived classes for all the different element types - you can see the <a href="https://www.toptensoftware.com/cminor/ref/Topten.Cminor.Ast?ref=toptensoftware.com" rel="noreferrer">full list here</a>.</p><p><em>(note that even though these classes are in a <code>Topten.Cminor.Ast</code> ok namespace I&apos;ve kept the &quot;Ast&quot; prefix because their names tend to become a bit generic without it).</em></p><h3 id="declarations-are-statements">Declarations are Statements</h3><p>As far as C-minor is concerned declarations (functions, variables, classes etc...) are also statements even though not technically statements if you take that word to mean something that executes.</p><h3 id="expression-statements">Expression Statements</h3><p>Although statements and expressions are generally different things, there&apos;s a special statement for expressions that are statements.</p><p>Consider a statement like this:</p><pre><code class="language-csharp">print(&quot;Hello World&quot;);</code></pre><p>This is an expression, but it&apos;s used as a statement. The <a href="https://www.toptensoftware.com/cminor/ref/Topten.Cminor.Ast.AstExpressionStatement?ref=toptensoftware.com" rel="noreferrer"><code>AstExpressionStatement</code></a> class is used to represent this.</p><h3 id="order-of-operation">Order of Operation</h3><p>It&apos;s worth noting that order of operation is implicit in the AST and doesn&apos;t need to record grouping tokens like parentheses.</p><p>Consider these two expressions:</p><pre><code># expression A
x + y * z

# expression B
(x + y) * z</code></pre><p>Expression A is represented as:</p><pre><code class="language-mermaid">flowchart TD

    add --&gt; x
    add --&gt; multiply

    multiply --&gt; y
    multiply --&gt; z</code></pre><p>Expression B is represented as:</p><pre><code class="language-mermaid">flowchart TD

    add --&gt; x
    add --&gt; y

    multiply --&gt; add
    multiply --&gt; z</code></pre><p>The grouping parentheses in the original expression B is used by the parser to construct the correct tree and then discarded.</p><h3 id="syntactic-structure-little-meaning">Syntactic Structure, Little Meaning</h3><p>The AST stores the syntactic structure of the input program but by itself ascribes very little meaning to its elements.</p><p>For example:</p><ul><li>identifiers are recorded as identifier nodes but say nothing about whether the identifier represents a local variable, parameter, method or function, a class name etc...</li><li>the AST allows function and class declarations anywhere statements are allowed so there&apos;s nothing to stop the declaration of a class inside a function (even though this is not supported in C-minor)   </li><li>the AST will happily record operations on incompatible arguments (eg: string + integer) or passing an incorrect number of arguments to a function.</li></ul><p>In other words, at this stage all we&apos;re doing is capturing the syntactic structure of the program and meaning will be applied and correctness checked later as part of  &quot;semantic analysis&quot;.</p><h3 id="position-information">Position Information</h3><p>Just like every token produced by the tokenizer has a <code>CodePosition</code> describing where it came from, every element in the AST also has a position.   In later stages as we&apos;re performing checks on the AST and we find problems we&apos;ve got the position readily available to report the original location of the error.</p><h3 id="time-to-parse">Time to Parse!</h3><p>The AST is a tree of elements that describe a program&apos;s syntactic structure but not its meaning.</p><p>Each node is either a statement or an expression. Order of operation is implicit in the arrangement of nodes that make an expression. Every node in the AST has position information attached for error reporting.</p><p>The AST is central to everything else the compiler does - it&apos;ll be used for building the type system, semantic analysis, code generation and more.</p><p>Now that we have a way to represent the AST the next challenge is to build a correctly formed AST from a stream of tokens... and that&apos;s the job of the parser.</p>]]></content:encoded></item><item><title><![CDATA[How to Write a Compiler #2 - Tokenization]]></title><description><![CDATA[Tokenization is the process of reading source code and splitting it into meaningful symbols that describe the program's content. It's the first step of any compiler.]]></description><link>https://www.toptensoftware.com/blog/how-to-write-a-compiler-2-tokenization/</link><guid isPermaLink="false">65aa525a649514000100fc43</guid><category><![CDATA[h2wac]]></category><category><![CDATA[c-minor]]></category><category><![CDATA[C#]]></category><dc:creator><![CDATA[Brad Robinson]]></dc:creator><pubDate>Wed, 24 Jan 2024 03:00:58 GMT</pubDate><media:content url="https://www.toptensoftware.com/blog/content/images/2024/01/TokenizationBanner.png" medium="image"/><content:encoded><![CDATA[<div class="kg-card kg-callout-card kg-callout-card-green"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text"><i><em class="italic" style="white-space: pre-wrap;">This post is part of the series </em></i><a href="https://www.toptensoftware.com/blog/tag/h2wac" rel="noreferrer"><i><em class="italic" style="white-space: pre-wrap;">How to Write a Compiler</em></i></a><i><em class="italic" style="white-space: pre-wrap;"> based on my experience building C-minor - a strongly typed, garbage collected, direct to in-memory machine-code scripting language.</em></i></div></div><h3 id="what-is-a-token">What is a Token?</h3><img src="https://www.toptensoftware.com/blog/content/images/2024/01/TokenizationBanner.png" alt="How to Write a Compiler #2 - Tokenization"><p>A token represents a single indivisible piece of source code:</p><ul><li>Operators - <code>+</code>, <code>-</code>, <code>&gt;=</code>, <code>&gt;&gt;</code> etc...</li><li>Literals - numbers, strings, <code>null</code>, <code>true</code>, <code>false</code> etc...</li><li>Identifiers - name of variables, functions, classes, etc...</li><li>Keywords - identifiers with special meaning in the language <code>if</code>/<code>else</code>, <code>while</code>, <code>class</code> etc...</li></ul><p>As an example, this piece of code...</p><pre><code>// A comment
int myvar = myothervar &lt;&lt; 4;</code></pre><p>...would be tokenized into the following tokens (assuming an enumerated type <code>Token</code>):</p><ol><li><code>Token.Keyword_int</code></li><li><code>Token.Identifier</code> &quot;myvar&quot;</li><li><code>Token.Assignment</code></li><li><code>Token.Identifier</code> &quot;myothervar<code>&quot;</code></li><li><code>Token.ShiftLeft</code></li><li><code>Token.Literal</code> 4</li><li><code>Token.Semicolon</code></li></ol><p>Note that whitespace and comments aren&apos;t included in the token stream.</p><h3 id="tokenization-algorithm">Tokenization Algorithm</h3><p>The tokenization process is to skip whitespace and comments and then look at the next character to figure out the type of token and consume as many characters as necessary to produce the token.</p><ol><li>Skip any whitespace</li><li>If the next character is a <code>/</code> , skip it and...<ol><li>If the next character is a <code>*</code> then this is a comment - skip to the end of the comment and start again from step 1.</li><li>Else, if the next character is a <code>/</code> then this is a single line comment - skip to the next line and start again from step 1</li><li>Else, if the next character is a <code>=</code>, then skip it and return <code>Token.DivideAssign</code></li><li>Else, return <code>Token.Divide</code>.</li></ol></li><li>If the next few characters match an operator token (eg: <code>+</code>, <code>-</code>, <code>&lt;</code>, <code>&lt;=</code>, <code>&lt;&lt;</code> etc...) skip those characters and return the associated token.</li><li>If the next character is double or single quote, parse and store a string or character (while handling escape sequences) and return <code>Token.Literal</code> </li><li>If the next character is a digit, skip and store all characters valid for a numeric literal, convert the number to a numeric value and return <code>Token.Literal</code>.  Also consume suffix letters (<code>U</code> and <code>L</code>) for unsigned and long numbers and make sure the literal value reflects those.</li><li>If the next character is a valid first character for an identifier, read all characters that are valid identifier characters and store as a string.<ol><li>If the resulting string is a named literal (eg: <code>true</code>, <code>false</code>, <code>null</code>) return <code>Token.Literal</code>.</li><li>If the resulting string matches a keyword, return the <code>Token</code> for that keyword.</li><li>Otherwise, return <code>Token.Identifier</code>.</li></ol></li><li>Throw a syntax error exception. (see <code>CodeException</code> below)</li></ol><h3 id="the-tokenizer-class">The Tokenizer Class</h3><p>The <code>Tokenizer</code> class wraps the above algorithm. The main API to the tokenizer consists of:</p><ul><li><code>Next()</code> method - reads the next token (ie: the above algorithm)</li><li><code>Token</code> property - the current token (<a href="https://www.toptensoftware.com/cminor/ref/Topten.Cminor.Lexical.Token?ref=toptensoftware.com" rel="noreferrer">see here</a> for a list of tokens)</li><li><code>TokenLiteral</code> property - the value of the current<code>Token.Literal</code> token</li><li><code>TokenIdentifier</code>  property - the identifier name of the current<code>Token.Identifier</code> token</li><li><code>TokenPosition</code> property - position information for the current token (see below)</li></ul><p>The tokenizer also has utility methods that can check for particular expected token and either skip it or throw an error.  These are <a href="https://www.toptensoftware.com/cminor/ref/Topten.Cminor.Lexical.Tokenizer?ref=toptensoftware.com" rel="noreferrer">listed here</a> and we&apos;ll see how they&apos;re used when we talk about the parser.</p><h3 id="related-classes">Related Classes</h3><p>Besides the tokenizer itself, there are a number of supporting classes - primarily for tracking the position of a tokens in the source file for later error reporting:</p>
<!--kg-card-begin: html-->
<div class="mermaid">
classDiagram
    class Tokenizer {
        +Token Token
        +String TokenIdentifier
        +Object TokenLiteral
        +CodePosition TokenPosition
        -CodeFile CodeFile
    }

    class CodeFile {
        String Filename
        String Code
        int Position
    }

    class CodePosition{
        +CodeFile CodeFile
        +int Position
        +LinePosition get_LinePosition()
    }

    class LineNumbers{
        Array~int~ lineOffsets
    }

    class LinePosition {
        +int LineNumber
        +int Offset
    }

    Tokenizer --o CodeFile 
    CodeFile --o LineNumbers
    CodePosition --o CodeFile
    CodePosition -- LinePosition
</div>
<!--kg-card-end: html-->
<h3 id></h3><h3 id="the-codefile-class">The CodeFile Class</h3><p>The tokenizer reads from the source file via the <code>CodeFile</code> class which:</p><ul><li>stores the filename of the input file</li><li>holds the entire input file as a string</li><li>maintains a current position in the file as an offset from the start of the file (ie: the current character index)</li><li>provides helper methods for skipping whitespace, extracting sub-strings etc...</li></ul><p><a href="https://www.toptensoftware.com/cminor/ref/Topten.Cminor.Lexical.CodeFile?ref=toptensoftware.com" rel="noreferrer">See here</a> for more about this class.</p><h3 id="file-offsets-vs-line-number-positions">File Offsets vs Line Number Positions</h3><p>A file position can be represented in one of two ways:</p><ol><li>a character offset from the start of the file, or</li><li>a line number and a character offset from the start of the line.</li></ol><p>Internally, only file offsets are used because they&apos;re cheaper to store and easier to work with.  When a file position needs to be presented to the user as part of an error message, it&apos;s converted to line number/offset.</p><p>In order to convert from a file offset to a line number the <code>CodeFile</code> class has an internal instance of the <a href="https://www.toptensoftware.com/cminor/ref/Topten.Cminor.Lexical.LineNumbers?ref=toptensoftware.com" rel="noreferrer"><code>LineNumbers</code> class</a> that uses a list of line offsets and a binary search to quickly convert file offsets to line numbers.</p><p>The <code>LinePosition</code> struct encapsulates a line number and a character offset as a single value. </p><h3 id="the-codeposition-struct">The CodePosition Struct</h3><p>Every token has an association position represented by a <code>CodePosition</code> struct.  It&apos;s a struct to save on memory allocations and it stores:</p><ul><li>a reference to the <code>CodeFile</code></li><li>a file offset</li></ul><p>Given a <code>CodePosition</code> we have everything needed to construct useful position information for error messages:</p><ul><li>The name of the file (available via the <code>CodeFile</code> reference)</li><li>The line number and offset from the start of the line (by converting the file offset using the line number mapping stored in the <code>CodeFile</code>)</li></ul><h3 id="error-handling">Error Handling</h3><p>Any syntax errors encountered during tokenization and parsing cause the compilation to immediately abort by throwing a <code>CodeException</code> - an exception with an associated <code>CodePosition </code>(<a href="https://www.toptensoftware.com/cminor/ref/Topten.Cminor.CodeException?ref=toptensoftware.com" rel="noreferrer">see here</a>).</p><p>These exceptions are caught by the compiler and reported as errors using the message and position information stored in the <code>CodeException</code>.</p><p><em>(Some compilers will attempt error recovery from syntax errors. This is important if you need to work with the file&apos;s abstract syntax tree as the user is editing the file - eg: for code completion.  We don&apos;t need that functionality so the complexity of error recovery is avoided - at least for now).</em></p><h3 id="rewinding">Rewinding</h3><p>There are a couple of instances where the parser needs to look ahead in the token stream. </p><p>To support this the tokenizer has a <code>Rewind</code> method which simply resets the input position to a previously saved position and re-reads the next token.</p><h3 id="special-handling-for-interpolated-strings">Special Handling for Interpolated Strings</h3><p>Normally the tokenizer is &quot;context free&quot; in that the next token can always be determined without regard to the previously supplied tokens.</p><p>The only time this isn&#x2019;t true is with interpolated strings as these contain embedded expressions.  </p><p><em>(interpolated strings are string literals with embedded expressions such as <code>$&quot;answer {x} + {y} = {x+y}&quot;</code> where <code>x</code>, <code>y</code> and <code>x+y</code> are expressions to be evaluated and substituted into the final string)</em></p><p>You&apos;ll notice the tokenizer has methods and tokens related to interpolated strings. I&#x2019;ll cover these in a later article because it involves the tokenizer, the parser and the runtime.</p><h3 id="tokenized">Tokenized!</h3><p>There&apos;s not much more can be said about the tokenizer - it reads a stream of input characters, strips whitespace and comments and produces tokens with position information.</p><p>The next step is parsing, but first we need a structure in which to store the results of the parsing - an Abstract Syntax Tree.</p><p></p>]]></content:encoded></item><item><title><![CDATA[How to Write a Compiler #1 - Introducing C-minor]]></title><description><![CDATA[C-minor is a strongly typed, garbage collected language that compiles to in-memory machine code for direct execution.  Here's how I built it.]]></description><link>https://www.toptensoftware.com/blog/how-to-write-a-compiler-1-introducing-c-minor/</link><guid isPermaLink="false">65a8c279649514000100f760</guid><category><![CDATA[h2wac]]></category><category><![CDATA[c-minor]]></category><category><![CDATA[C#]]></category><dc:creator><![CDATA[Brad Robinson]]></dc:creator><pubDate>Wed, 17 Jan 2024 03:14:00 GMT</pubDate><media:content url="https://www.toptensoftware.com/blog/content/images/2024/01/CminorBanner.png" medium="image"/><content:encoded><![CDATA[<div class="kg-card kg-callout-card kg-callout-card-green"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text"><i><em class="italic" style="white-space: pre-wrap;">This post is part of the series </em></i><a href="https://www.toptensoftware.com/blog/tag/h2wac" rel="noreferrer"><i><em class="italic" style="white-space: pre-wrap;">How to Write a Compiler</em></i></a><i><em class="italic" style="white-space: pre-wrap;"> based on my experience building C-minor - a strongly typed, garbage collected, direct to in-memory machine-code scripting language.</em></i></div></div><img src="https://www.toptensoftware.com/blog/content/images/2024/01/CminorBanner.png" alt="How to Write a Compiler #1 - Introducing C-minor"><p>Like many developers I&apos;ve often thought about writing a compiler for a programming language.  Recently I had reason enough to look into making a proof-of-concept scripting language for my music software <a href="https://www.cantabilesoftware.com/?ref=toptensoftware.com" rel="noreferrer">Cantabile</a>.</p><p>Although just a proof-of-concept I&apos;ve learned a lot and thought it might be worth sharing what I&apos;ve learned about &quot;How to Write a Compiler&quot;.</p><p>First up though, what is this language?</p><h3 id="introducing-c-minor">Introducing C-minor</h3><p>C-minor is a strongly typed, garbage collected language that compiles to in-memory machine code for direct execution.</p><ul><li>The compiler is written in C#.</li><li>The supporting runtime library and garbage collector are written in plain C.</li><li>The language is heavily influenced by C#.</li><li>It has a focus on speed - which is why it&apos;s strongly typed and compiled to machine code instead of interpreted.</li><li>It&apos;s strictly single threaded - as a feature.</li><li>The name &quot;C-minor&quot; was chosen to reflect its musical connection to Cantabile and its C-style syntax.  The word &quot;minor&quot; suggests this is a language for scripting and automation and not for building fully-fledged applications.</li></ul><p><em>(Although influenced by C# it obviously doesn&apos;t have most of C#&apos;s features - but anything it does has been shamelessly stolen).</em></p><h3 id="just-how-proof-of-concepty-is-it">Just how &quot;Proof-of-Concepty&quot; is it?</h3><p>Very! It&apos;s not useful for anything other than as a proof-of-concept.  </p><p>That said the initial goal for this project was just to get something working and it is &quot;working&quot;:</p><ul><li>Primitive data types - <code>bool</code>, <code>char</code>, <code>sbyte</code>, <code>byte</code>, <code>short</code>, <code>ushort</code>, <code>int</code>, <code>uint</code>, <code>long</code>, <code>ulong</code>, <code>float</code>,  <code>double</code> and <code>string</code>. </li><li>All the standard math, logical, relational and bitwise operators for the built in types.</li><li>Explicit and implicit type casting of primitive data types.</li><li>Functions with simple parameters and local variables.</li><li>Function overloading and overload resolution.</li><li>Flow control statements - <code>if/else</code>, <code>while</code>, <code>do-while</code>, <code>for</code>, <code>switch</code> .</li><li>Support for exceptions (currently by throwing a string since no class support), along with <code>try</code>/<code>catch</code>/<code>finally</code> blocks.</li><li>Interpolated strings and numeric formatting.</li><li>An incremental &quot;in-series&quot; garbage collector.</li><li>A few library functions - just enough to run the test cases (eg: <code>string.Substring</code>, <code>Console.Write/WriteLine</code>).</li></ul><p>Other things of note:</p><ul><li>The compiler generates C code that is then compiled to in-memory machine code using Tiny-C.  I might do a direct native code generator at a later date.</li><li>There&apos;s a front-end command line program to run C-minor source files and test cases.</li><li>The front-end can either run C-minor programs directly, produce C code or produce .exe files.</li></ul><h3 id="an-example-program">An Example Program</h3><p>Here&apos;s an example program:</p><pre><code class="language-csharp">void main()
{
  for (int i = 0; i &lt; 3; i++)
  {
    Console.WriteLine($&quot;#{i+1}: Hello World from C-minor&quot;);
  }
}</code></pre><p>and its output:</p><pre><code class="language-txt">#1: Hello World from C-minor
#2: Hello World from C-minor
#3: Hello World from C-minor
</code></pre><p><em>(I&apos;d like to get rid of that <code>main</code> and have true top-level statements - I&apos;ll explain why I haven&apos;t in a later post).</em></p><p>Here&apos;s a more interactive look at it in action:</p><figure class="kg-card kg-embed-card"><iframe width="200" height="150" src="https://www.youtube.com/embed/BHZj_Y3cXUw?feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen title="Introducing C minor"></iframe></figure><h3 id="whats-next">What&apos;s Next?</h3><p>I&apos;m going to chip away at this as a side project to Cantabile - partly because I&apos;m enjoying the challenge and partly because it might get to the point of being useful.</p><p>In the meantime, I&apos;m going to write some articles that go pretty deep on how it works and hope to cover everything from tokenization, parsing and semantic and control flow analysis through to code generation, the runtime and the garbage collector.</p><p>I&apos;ll also be touching on some interesting design patterns (in particular the much under-rated visitor pattern), some Big-O analysis and an interesting approach to testing.</p><p>In other words, this won&apos;t be hand-waving about abstract concepts - this will be everything you need to know to write your own compiler - or at least have a better understanding of how one works.</p><p>If you&apos;ve ever been curious about how a modern compiler works (and who hasn&apos;t) then I think you&apos;ll enjoy these upcoming posts.  </p><p><em>It always starts with &quot;Hello World&quot; and who knows where it goes from there.</em></p>]]></content:encoded></item><item><title><![CDATA[The Microsoft Surface Laptop 4 Has A Fatal Usability Flaw]]></title><description><![CDATA[<p>Recently I purchased a Microsoft Surface Laptop 4 to replace my aging MacBook Pro. &#xA0;The Surface is a great machine: excellent keyboard and trackpad, beautiful display. It&apos;s slim, lightweight, powerful, just enough storage and has good battery life.</p><p>But what a pity that all of that is</p>]]></description><link>https://www.toptensoftware.com/blog/the-surface-laptop-4-has-a-fatal-usability-flaw/</link><guid isPermaLink="false">65a60a2f649514000100f3e9</guid><category><![CDATA[surface]]></category><category><![CDATA[laptop]]></category><category><![CDATA[microsoft]]></category><category><![CDATA[Windows]]></category><category><![CDATA[keyboard]]></category><category><![CDATA[fnkey]]></category><dc:creator><![CDATA[Brad Robinson]]></dc:creator><pubDate>Sat, 02 Jul 2022 01:32:37 GMT</pubDate><media:content url="https://www.toptensoftware.com/blog/content/images/2022/07/banner-1.png" medium="image"/><content:encoded><![CDATA[<img src="https://www.toptensoftware.com/blog/content/images/2022/07/banner-1.png" alt="The Microsoft Surface Laptop 4 Has A Fatal Usability Flaw"><p>Recently I purchased a Microsoft Surface Laptop 4 to replace my aging MacBook Pro. &#xA0;The Surface is a great machine: excellent keyboard and trackpad, beautiful display. It&apos;s slim, lightweight, powerful, just enough storage and has good battery life.</p><p>But what a pity that all of that is ruined by one stupid design decision - the behaviour of the Fn key is completely intolerable and drives me to distraction.</p><h3 id="the-fn-key">The Fn Key</h3><p>The Surface Laptop 4&apos;s Fn key is both a toggle (like capslock) and a modifier (like shift):</p><ul><li>If you press and release the Fn key without pressing another key it toggles - the light toggles on/off and the behaviour of the F1-F12 keys inverts. &#xA0;However...</li><li>If you press and hold the Fn key while using the arrow keys, it makes them work as Page Up/Down and Home/End.</li></ul><p>Now that seems like a clever design decision to get double use out of a single key but in real-world use it makes for a terrible experience.</p><h3 id="developers-and-keyboards">Developers and Keyboards</h3><p>I&apos;m a software developer and make heavy use of the keyboard. &#xA0;In particular I&apos;m constantly using the Fn+arrow keys to navigate around in code. I also make heavy use of the F1-F12 keys for building, debugging, browsing etc...</p><p>And here&apos;s where the problem starts. If you&apos;re constantly using the Fn+arrows for navigation it&apos;s <em>incredibly easy</em> to press/release the Fn key without another key and all of a sudden <em>the behaviour of all the function keys has flipped</em>.</p><p>It seems like such a minor thing but it completely kills productivity. Instead of being able to focus on what you&apos;re working on you end up constantly distracted by the keyboard not behaving as you expect it to. </p><p>In other words, the behaviour of the F1-F12 keys becomes random coin flip.</p><ul><li>Press F3 and it might jump to the next search match or it might increase the volume.</li><li>Press F7 to start a build, but no the screen gets brighter.</li><li>Press F8 to step in the debugger, nope that takes a screen shot.</li><li>Press F10 to step-over in the debugger, why didn&apos;t it step? And why did the cursor move to the end of the line?</li><li>Press F12 to browse to a symbol definition - that doesn&apos;t look right. &#xA0;Oh, that&apos;s because it did a page down instead. OMG!</li></ul><p>But it gets worse. Once you&apos;ve been tricked by the Fn key trap, you can&apos;t just toggle it and try again. &#xA0;No, you also need to undo what you just did, re-adjust the volume or screen brightness or navigate back to where you were, then you can toggle the Fn key and then you can try again - by which time you&apos;ve probably forgotten what you were trying to do in the first place (mostly due to the rising rising frustration levels).</p><p>Argh! &#xA0;I hate you Fn key!</p><h3 id="this-is-not-a-muscle-memory-issue">This is not a Muscle Memory Issue</h3><p>At first I thought this might be just a muscle memory issue. &#xA0;I mean I get it, moving between Mac and PC is always painful when it comes to Ctrl, Alt and Options/Windows keys. &#xA0;But I&apos;ve been through that multiple times and you learn and adapt reasonably quickly.</p><p>But this isn&apos;t like that. &#xA0;Muscle memory isn&apos;t going to learn that if you press the Fn key you absolutely must press another key to avoid toggling the function keys.</p><p>And it&apos;s easy to do: you press the Fn key in anticipation of pressing an arrow and there&apos;s no going back - you have to press another key to avoid the toggle. It&apos;s nuts.</p><h3 id="this-is-worse-than-the-stupid-power-button">This is Worse than the Stupid Power Button</h3><p>The Surface Laptop 4 has another annoying keyboard antic - accidentally bump the power button and the machine will go to sleep. &#xA0;</p><p>Conveniently (not) the power button is exactly where the F12 key is on my MacBook so I&apos;ve accidentally hit it multiple times already. But muscle memory will kick in eventually and this will become less of an issue over time. &#xA0;Also the machine wakes quickly and face recognition for login means I can back pretty quickly.</p><p><em>(That said, why not make it a long press on the power button to put the machine to sleep? Who needs to be able to sleep their machine at the touch of button - especially when you can just close the lid?)</em></p><p>But think about that... accidentally putting the machine to sleep is less annoying and easier to deal with than randomly toggling the function keys.</p><h3 id="one-possible-saving-grace">One Possible Saving Grace</h3><p>The only possible saving grace here is that at the moment, I might be accidentally hitting the Fn key more often than I will be once I adjust to its different position. &#xA0;</p><p>The Fn and Ctrl keys on the Surface are swapped compared to the MacBook and I&apos;m still adjusting to that. &#xA0;Eventually, as I get used to it, this issue should become less frequent but it certainly won&apos;t go away completely.</p><h3 id="there-is-no-fix">There Is No Fix</h3><p>Now you&apos;d think there&apos;d be some work around for this right?</p><p>Well if you search you&apos;ll find suggestions like pressing Fn+CapsLock or pressing and holding Fn for 10 seconds. &#xA0;Oh how I wish these ideas worked - they might on some Surface models, but they absolutely do not work on a Surface Laptop 4. &#xA0;</p><p><em>(Microsoft Support: stop suggesting these ideas - they don&apos;t work, you should know that)</em></p><p>There is no firmware setting for this.</p><p>What&apos;s worse, the Fn key appears to be completely implemented in firmware. Windows doesn&apos;t see the Fn key presses so this can&apos;t be worked around in software. &#xA0;No amount of key intercepting or key remapping can fix this.</p><p>So until Microsoft decide to fix this (presumably with a firmware update, assuming they decide to and that it&apos;s even possible) this is a problem that anyone with a Surface Laptop 4 is stuck with.</p><h3 id="please-please-please-microsoft-">Please, please, please Microsoft...</h3><p>Microsoft you have an excellent machine here but:</p><blockquote>I hate, hate, hate that stupid f***ing Fn key.</blockquote><p>Please fix it.</p><p><em>(And while you&apos;re at it, long press for the power button and Fn+Backspace for delete would be the icing on the cake).</em></p>]]></content:encoded></item><item><title><![CDATA[Booting Multiple Windows Installations from a Third-Party Boot Manager]]></title><description><![CDATA[Configuring the Windows boot manager so third-party boot loaders can directly load different versions of Windows.]]></description><link>https://www.toptensoftware.com/blog/booting-multiple-windows-with-refind/</link><guid isPermaLink="false">65a60a2f649514000100f3e8</guid><category><![CDATA[Windows]]></category><category><![CDATA[boot]]></category><category><![CDATA[rEFInd]]></category><dc:creator><![CDATA[Brad Robinson]]></dc:creator><pubDate>Tue, 02 Nov 2021 23:33:37 GMT</pubDate><media:content url="https://www.toptensoftware.com/blog/content/images/2021/11/refindbanner.png" medium="image"/><content:encoded><![CDATA[<img src="https://www.toptensoftware.com/blog/content/images/2021/11/refindbanner.png" alt="Booting Multiple Windows Installations from a Third-Party Boot Manager"><p>When you install multiple copies of Windows on machine, Windows updates it&apos;s boot manager configuration so that at boot time you can choose which version you want to run. &#xA0;This works fine if you&apos;re only running Windows...</p><p>If you&apos;re using a third-party boot manager like Grub or rEFInd it&apos;s less than ideal because it becomes a two-step process - first you need to choose Windows Boot Manager and then from the Windows boot manager choose which version of Windows.</p><p>But wait, it&apos;s even worse than that. &#xA0;If you choose the non-default Windows installation the Windows boot manager reboots the machine before launching the selected operating system... so you need to go through your first boot loader again.</p><p>On my machine, getting into Windows 10 went like this:</p><ol><li>Machine boots, rEFInd menu appears with Windows and Ubuntu options</li><li>Choose Windows</li><li>Windows boot menu appears, with Windows 10 and Windows 11</li><li>Choose Windows 10</li><li>Machine reboots and goes back to the rEFInd menu again</li><li>Choose Windows again</li><li>Windows 10 boots.</li></ol><p>What&apos;s needed is a way to split up the two Windows entries so other boot loaders can see them as separate operating systems.</p><h3 id="a-little-background">A Little Background</h3><p>On EFI systems there&apos;s a special partition called the EFI System Partition (aka the ESP). This FAT32 formatted partition stores the boot loaders for all the installed operating systems. </p><p>On a Windows machine you can get to the ESP with the following command from an Administrator command prompt.</p><p><code>&gt; mountvol b: /s</code></p><p>If you inspect the folder <code>b:\EFI</code> you&apos;ll see folders for each of the installed operating systems and boot tools. &#xA0;</p><p>The Windows boot loader is in the folder <code>b:\EFI\Microsoft\Boot</code> . &#xA0;Inside that folder is a file <code>BCD</code> that stores the boot configurations for the installed versions of Windows. </p><p>The problem is that third party bootloaders typically create menu entries based on the folders found in the <code>EFI</code> directory. &#xA0;Because there&apos;s only one folder for Windows only one menu item appears. &#xA0;The trick is to create two separate folders for Windows - each booting a particular version.</p><p><em>Note that all this only applies if the second installation of Windows found the original ESP and updated it. &#xA0;If you disconnect the original ESP drive when you install the second version of Windows it&apos;ll create a new ESP on the installation drive and your boot manager should find both when you reconnect the original drive. &#xA0;In this case you&apos;ll automatically get two menu entries.</em></p><h3 id="the-fix">The Fix</h3><p>The basic idea here is to create two copies of the Windows boot loader folder and edit the contained BCD file in each to only reference one installation of Windows instead of both.</p><p><em>The following assumes you&apos;re using rEFInd and I&apos;ve only tested it with rEFInd. The same approach should work with other boot managers but might require additional configuration of the boot manager.</em></p><p>Assuming you have Windows 10 and Windows 11 installed, here&apos;s how you can split the Windows boot loader:</p><ol><li>Make sure you know what you&apos;re doing. &#xA0;If you mess up the ESP you can end up with an unbootable machine. &#xA0;Be careful.</li><li>Mount the ESP as described above</li><li>Create two copies of the <code>b:\Microsoft\Boot</code> folder as <code>b:\EFI\win10</code> and <code>b:\EFI\win11</code>. eg:<br><code>xcopy /s b:\EFI\Microsoft\Boot b:\EFI\win10</code><br><code>xcopy /s b:\EFI\Microsoft\Boot b:\EFI\win11</code><br>(Note these folders must be in the <code>\EFI</code> folder, and not in the <code>\EFI\Microsoft</code> folder otherwise rEFInd won&apos;t find them). </li><li>List out the contents of the BCD file with this command: <br><code>bcdedit /store b:\EFI\win10\BCD /enum</code>. &#xA0;<br>You should see three entries: one titled &quot;Windows Boot Manager&quot; and two titled &quot;Windows Boot Loader&quot; (one for each installed version of Windows). Note of the <code>identifier</code> value listed under each boot loader entry - you&apos;ll need these for the next two steps.</li><li>Remove the Windows 10 entry from the Windows 11 folder:<br> <code>bcdedit /store b:\EFI\win11\BCD /delete {ID_FOR_WIN10}</code>.</li><li>Vice-versa for the other OS: <br> <code>bcdedit /store b:\EFI\win10\BCD /delete {ID_FOR_WIN11}</code>.</li></ol><p>You&apos;ll now have two new, but separate boot folders for each version of Windows. Since each folder is now only configured for just one operating system the Windows boot loader won&apos;t prompt to select a version.</p><p>Reboot your machine and check that rEFInd has two new entries for Windows 10 and Windows 11 and check they both boot directly into the appropriate OS.</p><h3 id="finishing-touches">Finishing Touches</h3><p>The rEFInd menu will probably now have a total of three entries for Windows - the original one and the two you just created. &#xA0;To hide the original Windows boot manager just select it and press Delete. &#xA0;You can get it back through the hidden items menu.</p><p>Finally, if you&apos;re after a nice clean theme for rEFInd here&apos;s one I put together that includes an updated icon for Windows 11&apos;s new logo. (<a href="https://github.com/toptensoftware/brads-refind-theme?ref=toptensoftware.com">get it here</a>).</p><figure class="kg-card kg-image-card"><img src="https://www.toptensoftware.com/blog/content/images/2021/11/screenshot2.png" class="kg-image" alt="Booting Multiple Windows Installations from a Third-Party Boot Manager" loading="lazy" width="2000" height="1250" srcset="https://www.toptensoftware.com/blog/content/images/size/w600/2021/11/screenshot2.png 600w, https://www.toptensoftware.com/blog/content/images/size/w1000/2021/11/screenshot2.png 1000w, https://www.toptensoftware.com/blog/content/images/size/w1600/2021/11/screenshot2.png 1600w, https://www.toptensoftware.com/blog/content/images/size/w2400/2021/11/screenshot2.png 2400w" sizes="(min-width: 720px) 720px"></figure>]]></content:encoded></item><item><title><![CDATA[How it Works - nvpatch]]></title><description><![CDATA[nvpatch is a command line utility to patch Windows x64 applications to enable NVidia and AMD discreet graphics GPUs on low power devices.]]></description><link>https://www.toptensoftware.com/blog/nvpatch-how-it-works/</link><guid isPermaLink="false">65a60a2f649514000100f3e7</guid><category><![CDATA[C#]]></category><category><![CDATA[PE FIle Format]]></category><category><![CDATA[nvpatch]]></category><category><![CDATA[NVidia]]></category><category><![CDATA[AMD]]></category><category><![CDATA[Games]]></category><dc:creator><![CDATA[Brad Robinson]]></dc:creator><pubDate>Tue, 31 Aug 2021 02:27:16 GMT</pubDate><media:content url="https://www.toptensoftware.com/blog/content/images/2021/08/nvpatch.png" medium="image"/><content:encoded><![CDATA[<img src="https://www.toptensoftware.com/blog/content/images/2021/08/nvpatch.png" alt="How it Works - nvpatch"><p><code>nvpatch</code> is a command line utility that patches Windows x86 and x64 .exe files to include the export symbols required for some machines to enable their discreet GPU.  </p><p>Although called &quot;<code><strong>nv</strong>patch</code>&quot; it works for both NVidia and AMD GPUs with appropriate drivers installed.  It was originally written to be used with <a href="https://sectorsedge.com/?ref=toptensoftware.com">Sector&apos;s Edge</a>, a game on which my son <a href="https://twitter.com/vercidium?ref=toptensoftware.com">Mitch</a> is the lead developer.</p><p>This article explains how it works.  If you just want to get it and use it, then <a href="https://github.com/toptensoftware/nvpatch?ref=toptensoftware.com">see here</a> for instructions.</p><h3 id="background">Background</h3><p><a href="https://docs.nvidia.com/gameworks/content/technologies/desktop/optimus.htm?ref=toptensoftware.com">NVidia Optimus</a> and <a href="https://www.amd.com/en/technologies/enduro?ref=toptensoftware.com">AMD Enduro</a> are technologies that switch GPU behaviour between low power and high performance modes.  One of the ways the drivers determine which mode to run is to look for special symbols exported from the main .exe of the process.</p><p>Typically graphics intensive applications like games will have these symbols included in their .exe files to enabling switching to the high performance GPU mode.</p><p>The NVidia drivers look for an export symbol named <code>NvOptimusEnablement</code> while the AMD drivers look for <code>AmdPowerXpressRequestHighPerformance</code>.  In both cases the exported symbol refers to a 32-bit integer where a value of 1 indicates high performance mode should be enabled.</p><p>While these symbols are trivial to add to C and C++ program, for other languages it&apos;s more difficult.  In C# for example, it&apos;s not possible to export native symbols. One solution is <a href="http://lucasmagder.com/blog/2014/05/exporting-data-symbols-in-c-for-nvidia-optimus/?ref=toptensoftware.com">described here</a> by Lucas Magder where he decompiles the exe IL, patches the IL to make an export symbol, recompiles and then patches the exported data at runtime. </p><p>That approach works in previous versions of .NET but with .NET 5 the executable is actually a stub program that launches the .NET runtime and then loads your program from an associated assembly dll - and the export symbols need to be on the .exe and not the .dll.  In other words the .exe is produced by the Microsoft toolchain and your program has no influence over its content.</p><p>There&apos;s a couple of options here:</p><ol><li>Petition Microsoft (and perhaps other language vendors) to add support for this in their stub .exe files.</li><li>Petition NVidia and AMD to provide alternative mechanisms to enable these features</li><li>Work out how to build the .NET stub executable and add those symbols</li><li>Get creative and patch the exe to add these symbols</li></ol><p>I&apos;m not going to hold my breath for options 1 and 2.  Option 3 is the most technically correct but seems fragile and probably complicated to setup a build environment for such a trivial change.</p><p>Option 4 sounds complicated but it&apos;s actually not that hard and a fun dive in the PE .exe file format.</p><h3 id="pe-file-format-overview">PE File Format Overview</h3><p>Windows .exe files are in the Portable Executable (aka &quot;PE&quot;) file format.  While the <a href="https://docs.microsoft.com/en-us/windows/win32/debug/pe-format?ref=toptensoftware.com">Microsoft documentation on the PE Format</a> is a very long document, we only need to understand a few basic concepts and one section in detail in order to patch in these new symbols.</p><p>The basic format of a PE file is simply a bunch of headers followed by a bunch of sections.  The headers provide important information for the Windows loader (as well as pointers to significant data in the sections), while the sections contain the actual data and code that&apos;s loaded into memory by the loader.</p><p>In order to add new symbols we&apos;re interested in the <a href="https://docs.microsoft.com/en-us/windows/win32/debug/pe-format?ref=toptensoftware.com#the-edata-section-image-only">Export Table</a>.   While the documentation suggests that the export table lives in a special &quot;.edata&quot; section, in practice it can reside in any section and will often be found in the &quot;.rdata&quot; section along with other initialized read-only data. To find the export table, we can&apos;t just look for the &quot;.edata&quot; section. Instead, in among the headers is a data directory entry that points to the export table - in whatever section in happens to live.</p><h3 id="how-it-works">How It Works</h3><p>So this is how nvpatch does its tricks:</p><ol><li>Loads the .exe and locates all the various headers and sections</li><li>Finds the export table (if it exists) and parses it into its own set of data structures so it can be manipulated</li><li>Adds the new symbols to the parsed export table</li><li>Re-writes the modified export table to a new section at the end of the file called &quot;.nvpatch&quot; along with the <code>0x00000001</code> data constants that the symbols point to</li><li>Inserts a new section header in the headers area that points to the new .nvpatch section</li><li>Updates the Export Table data directory entry (in the header area) to point to the new export table (in the .nvpatch section)</li><li>Updates various sizes and counts in the headers so everything checks out</li><li>Rewrites all the changes as new .exe file</li></ol><p>Note that if the exe had an existing Export Table (many exe files don&apos;t) it&apos;s left in the file - but nothing points to it so it&apos;s ignored.  The reason the existing table isn&apos;t updated is there&apos;s no guarantee there will be room to extend the existing table without overwriting other important data.  It&apos;s simpler and safer to just rewrite the table to a new section.</p><p>Note too, that by good luck there is almost always room at the end of the existing section header table to insert a new section header. That&apos;s because the section headers are the last of the headers and the actual section data that follows is typically aligned to 512 byte boundaries meaning there&apos;s usually room there.  If by bad luck there isn&apos;t room, nvpatch will fail.</p><h3 id="about-the-code">About the Code</h3><p>nvpatch is written in C#.  It could have been written in any language really, but C# means it can be easily packaged up as a dotnet tool for publishing.</p><p>The code is not at all shy about using unsafe code and pointers.  In fact much of the code looks more like C.</p><ul><li>PEFile - the main class the reads and writes the .exe files and provides helpers for adding new sections</li><li>PEExportTable - the class responsible for reading and writing the Export Table</li><li>PESectionBuilder - helper class for adding new sections</li><li>PEStructs.cs  - C# definitions of various structures used in the PE format</li><li>Utils.cs - various utilities functions</li><li>Program.cs - logic to actually patch the exe</li></ul><p>Just to explain the PEFile class a little more, it works like this:</p><ol><li>Reads the entire .exe into a <code>byte</code> array</li><li>Pins the array and gets a fixed pointer to its content</li><li>The various header addresses are calculated and available as direct pointers into the loaded byte array as <code>PEFile</code> properties.  These pointers are used to directly read and manipulate data in the headers.</li><li>The <code>AddSection</code> method creates a new <code>PESectionBuilder</code> instance that has a <code>MemoryStream</code> into which the new section content can be written.</li><li>When the file is rewritten, the originally loaded byte array (which has now be modified) is written first, followed the contents the memory stream of the <code>PESectionBuilder</code> of any new sections.</li></ol><h3 id="limitations">Limitations</h3><p>In the interest of expediency (aka laziness) I&apos;ve taken a few shortcuts that introduce a couple of limitations:</p><ul>
<li><strike>Only the PE32+ file format (as used by x64) is supported.  The PE32 (without the plus) format typically used by x86 executables isn&apos;t supported and the tool will fail with an error message.  It should be reasonably easy to add support for this, but I just haven&apos;t bothered.</strike> x86 support has now been added (12 Jan 2026), but has had minimal testing.  Let me know if you find issues.</li>
<li>If there&apos;s no room for a new section header it will fail with an error message.</li>
<li>The export table reading doesn&apos;t handle forwarding exports.  It doesn&apos;t even try to detect nor warn about them.</li>
</ul>
<p>That&apos;s it!</p>]]></content:encoded></item><item><title><![CDATA[Fast Bit Flag Boolean Expressions]]></title><description><![CDATA[An algorithm for converting boolean bit flag expressions to fast bit mask and test operations and even faster execution via dynamic IL method generation.]]></description><link>https://www.toptensoftware.com/blog/bitmask-expressions/</link><guid isPermaLink="false">65a60a2f649514000100f3e6</guid><category><![CDATA[C#]]></category><category><![CDATA[boolean logic]]></category><category><![CDATA[bitwise operators]]></category><category><![CDATA[optimization]]></category><category><![CDATA[ILGenerator]]></category><category><![CDATA[MSIL]]></category><dc:creator><![CDATA[Brad Robinson]]></dc:creator><pubDate>Sat, 28 Aug 2021 08:01:10 GMT</pubDate><media:content url="https://www.toptensoftware.com/blog/content/images/2021/08/BitmaskBanner.png" medium="image"/><content:encoded><![CDATA[<img src="https://www.toptensoftware.com/blog/content/images/2021/08/BitmaskBanner.png" alt="Fast Bit Flag Boolean Expressions"><p>Recently I&apos;ve been working on a new theming engine for my <a href="https://www.cantabilesoftware.com/?ref=toptensoftware.com">music app Cantabile</a> and ran across the need for an algorithm that I&apos;ve not seen discussed in computer science circles before.</p><p>The theming engine is built around a custom language called GTL (GuiKit Theming Language) that has a feature where different visual representations for a UI element can be specified based on the current state of that element.</p><p>For example, you can specify different colors for a control based on the control&apos;s current state:</p><pre><code>BackgroundColor:
{
    Checked|Pressed: Color.Blue,
    Checked: Color.LightBlue,
    Else: Color.Black,
}</code></pre><p>Currently GTL expects these states to be specified as a bitmask and evaluates them to true when <code>(ControlState &amp; BitMask) == BitMask</code>. This is fine for many cases but is limited in that you can only specify a set of bits that must be set. What if I want to test that one of two bits are set, or that a bit isn&apos;t set?</p><p>What&apos;s really needed here are boolean expressions, or more specifically &quot;Bit Flag Boolean Expressions&quot;. &#xA0;That is, expressions that use typical C style boolean operators ( <code>&amp;&amp;</code> , <code>||</code> , and <code>!</code> ), but work on the individual bits in an input word.</p><ul><li><code>Checked &amp;&amp; Pressed</code> </li><li><code>Focused &amp;&amp; !Disabled</code> </li><li><code>Focused || Selected</code> </li></ul><p>Besides being more flexible in the types of conditions that can be expressed, boolean expressions are also a more intuitive way to express these conditions. eg: <code>Checked | Pressed</code> reads as &quot;checked <em><u>or</u></em> pressed&quot;, but the condition we&apos;re actually trying to describe is &quot;checked <em><u>and</u></em> pressed&quot;.</p><p>While it&apos;s pretty easy to parse and evaluate a boolean expression walking and evaluating an expression tree is going to be slower than a simple bit mask and test where multiple boolean operations can often be reduced to a single operation. </p><p>But what if we could convert the expression tree to a series of bit mask and test operations automatically? So, the question is:</p><blockquote>How do we convert an abstract syntax tree for a boolean expression into an optimal set of mask and test operations?</blockquote><p>I <a href="https://stackoverflow.com/questions/68949922/converting-a-boolean-expression-tree-to-a-series-of-bitwise-operations?ref=toptensoftware.com">posted the question to StackOverflow</a>, but ended up finding a good solution myself and thought it was worth writing up.</p><h3 id="tokenizing-and-parsing">Tokenizing and Parsing</h3><p>I&apos;ve written about tokenizing and parsing expressions into abstract syntax trees before so I won&apos;t cover it again here. &#xA0;If this is new to you, see <a href="https://www.toptensoftware.com/blog/writing-a-simple-math-expression-engine-in-csharp/">this article</a>.</p><p>For this algorithm to work properly the AST needs to represent boolean AND and OR operators as multi-input nodes rather than a chain of binary nodes as is often seen in expression syntax trees.</p><p>Since these operations are commutative (ie: can be evaluated in any order) making them all inputs to a single node provides more opportunities to re-order and coalesce boolean operations into a wider bitmask operation.</p><p>For an existing AST/parser implementation this might involve:</p><ol><li>Updating the AST nodes for boolean AND and OR to accept multiple inputs</li><li>Updating the parser to parse chains of AND and OR operations into a single node</li><li>Updating the parser to simplify the AST so that it promotes nodes of the same type into their parent node. &#xA0;eg: most parsers would produce two binary AND nodes for this expression: <code>A &amp;&amp; (B &amp;&amp; C)</code> but ideally this should be flattened to one three input node: <code>A &amp;&amp; B &amp;&amp; C</code>.</li></ol><h3 id="the-algorithm">The Algorithm</h3><p>Now that we&apos;ve got an AST tree in suitable format, lets think about the algorithm. Most of these boolean operations can be reduced to one of two forms of bitmask operations:</p><ul><li><code>(input &amp; mask) == testValue</code></li><li><code>(input &amp; mask) != testValue</code></li></ul><p>The first form which I call <code>MaskEqual</code> is used for boolean AND operations. eg: the boolean expression <code>A &amp;&amp; B &amp;&amp; !C</code> becomes <code>(input &amp; (A|B|C)) == (A|B)</code></p><p>The second form which I call <code>MaskNotEqual</code> is used for boolean OR operations. eg: the boolean expression <code>A || B || C</code> becomes <code>(input &amp; (A|B|C)) != 0</code>.</p><p><em>Note: throughout this article and in the code base I use the single capital letter symbols such that <code>A</code> = <code>0x01</code>, <code>B</code> = <code>0x02</code>, <code>C</code> = <code>0x04</code> etc... &#xA0;So <code>A &amp;&amp; B &amp;&amp; !C</code> can also be expressed in bitwise form as <code>(input &amp; 0x07) == 0x03</code>.</em></p><p>Next we have expressions that always evaluate to <code>true</code> or <code>false</code>:</p><ul><li><code>A &amp;&amp; !A</code> is always <code>false</code></li><li><code>A || !A</code> is always <code>true</code></li></ul><p>And finally, we have expressions that can&apos;t be reduced to a single bitwise operation. &#xA0;eg: &#xA0;the best we can do with this:</p><p><code>A &amp;&amp; (B || C)</code> </p><p>is this:</p><p><code>((input &amp; 0x1) == 0x1) &amp;&amp; ((input &amp; 0x6) != 0x0)</code></p><p>For those cases we have <code>EvalAnd</code>, <code>EvalOr</code> and <code>EvalNot</code>.</p><p>Based on the above we need to calculate an execution plan for every node in the tree where each plan will be one of the following types:</p><pre><code class="language-csharp">enum ExecPlanKind
{
    True,
    False,
    MaskEqual,
    MaskNotEqual,
    EvalAnd,
    EvalOr,
    EvalNot,
}
</code></pre><p>Each execution plan node will also have additional data depending on its type:</p><ul><li><code>True</code> / <code>False</code> - no other data</li><li><code>MaskEqual</code> / <code>MaskNoteEqual</code> - an associated mask and test value</li><li><code>EvalAnd</code> / <code>EvalOr</code> / <code>EvalNot</code> - a set of input execution plan nodes</li></ul><p>The goal of the algorithm is walk the AST and build a tree of execution plan nodes that minimizes the number of <code>EvalAnd</code>, <code>EvalOr</code> and <code>EvalNot</code> operations - since these require extra steps to execute.</p><p>One thing to note about the <code>MaskEqual</code> and <code>MaskNotEqual</code> execution plans is that if the mask has only a single bit set then it&apos;s possible to switch the plan from one kind to another by flipping the unmasked bits in the test value and changing the plan kind.</p><p>eg: <code>(value &amp; 0x01) == 0x01</code> is the same as <code>(value &amp; 0x01) != 0x00</code>. </p><p>If more that one mask bit is set then the plan <em>can&apos;t</em> be converted to the other kind. </p><p>Next, let&apos;s think about how to translate each node in the AST into an execution plan node.</p><h3 id="the-identifier-node">The Identifier Node</h3><p>The identifier node represents a name in the expression (eg: A, B, C etc...) and has an associated bit that it represents in the input value. &#xA0;The execution plan for this kind of node is trivial:</p><ul><li>Kind:<code>ExecPlanKind.MaskEqual</code> </li><li>Mask: the identifier bit value</li><li>TestValue: the identifier bit value</li></ul><p>In other words, the expression <code>A</code> which has an bitmask value of <code>0x01</code> can be evaluated as <code>(input &amp; 0x01) == 0x01</code>.</p><h3 id="the-not-operator">The Not Operator</h3><p>The Not operator has a single input node. &#xA0;To generate an execution plan, first we calculate the execution plan for the input node and depending on its <code>ExecPlanKind</code> we create a new execution plan that inverts it.</p><ul><li><code>ExecPlanKind.True</code> =&gt; <code>ExecPlanKind.False</code></li><li><code>ExecPlanKind.False</code> =&gt; <code>ExecPlanKind.True</code></li><li><code>ExecPlanKind.MaskEqual</code> =&gt; <code>ExecPlanKind.MaskNotEqual</code> (keeping the same mask and value)</li><li><code>ExecPlanKind.MaskNotEqual</code> =&gt; <code>ExecPlanKind.MaskEqual</code> (keeping the same mask and value)</li><li>Otherwise <code>ExecPlanKind.EvalNot</code>, with the input plan of the input operand as the argument</li></ul><h3 id="the-and-and-or-operators">The And and Or Operators</h3><p>The And and Or operators are the most complex, but very similar to each other.</p><p>Here&apos;s the logic for the And operator:</p><ol><li>Calculate the execution plan for each of the inputs and place them in a collection.</li><li>If any of the input plans are <code>ExecPlanKind.False</code> then regardless of the other inputs the result is also <code>ExecPlanKind.False</code>.</li><li>If all of the input plans are <code>ExecPlanKind.True</code> the the result is also <code>ExecPlanKind.True</code>.</li><li>Remove any nodes from the collection that are <code>ExecPlanKind.True</code> since they don&apos;t contribute anything to the result.</li><li>Where possible, convert input plans of type <code>ExecPlanKind.MaskNotEqual</code> to <code>ExecPlanKind.MaskEqual</code>. Without this conversion, expressions like <code>A &amp;&amp; B &amp;&amp; !C</code> can&apos;t be reduced because of the way the Not operator works. (see the note above about only converting these when a single bit is set).</li><li>Coalesce all <code>ExecPlanKind.MaskEqual</code> input plans into a single <code>MaskEqual</code> plan that Or&apos;s the input masks and test values together. &#xA0;The only catch here is to detect cases where the there&apos;s a conflicting test value bit for any of the masked bits. &#xA0;Such a case will always be false (eg: <code>A &amp;&amp; !A</code>) and if detected the execution plan for this node becomes just <code>ExecPlanKind.False</code>. </li><li>After applying the above steps if there&apos;s only one input plan left in the collection then that plan becomes the plan for this node.</li><li>Otherwise, there are multiple input plans left in the collection and they need to be evaluated sequentially at runtime. The plan for this node becomes <code>ExecPlanKind.EvalAnd</code> with the current collection of input plans as its input.</li></ol><p>As mentioned the Or operator is basically the same with some of the logic flipped - see the source code for details.</p><h3 id="executing-the-execution-plan">Executing the Execution Plan</h3><p>The output of the above algorithm is an execution plan and to execute it we just need an input value to test against:</p><pre><code class="language-csharp">public bool Evaluate(ulong input)
{
    switch (Kind)
    {
        case ExecPlanKind.True: 
            return true;

        case ExecPlanKind.False: 
            return false;

        case ExecPlanKind.MaskEqual: 
            return (input &amp; Mask) == TestValue;

        case ExecPlanKind.MaskNotEqual: 
            return (input &amp; Mask) != TestValue;

        case ExecPlanKind.EvalAnd: 
            return InputPlans.All(x =&gt; x.Evaluate(input));

        case ExecPlanKind.EvalOr: 
            return InputPlans.Any(x =&gt; x.Evaluate(input));

        case ExecPlanKind.EvalNot: 
            return !InputPlans[0].Evaluate(input);
    }
}</code></pre><p></p><h3 id="testing-and-results">Testing and Results</h3><p>To verify the results of the optimized execution plan, an expression evaluator that works directly using the AST was also implemented and the results compared with the optimized plan over a range of inputs and against a representative set of expressions </p><p>To get an idea of the kinds of operations that are generated by this algorithm, the sample program displays the boolean expression and its equivalent bitmask expression:</p><pre><code>A &amp;&amp; !A  =&gt;  false
A || !A  =&gt;  true
A  =&gt;  (input &amp; 0x1) == 0x1
B  =&gt;  (input &amp; 0x2) == 0x2
C  =&gt;  (input &amp; 0x4) == 0x4
D  =&gt;  (input &amp; 0x8) == 0x8
A &amp;&amp; B  =&gt;  (input &amp; 0x3) == 0x3
A || B  =&gt;  (input &amp; 0x3) != 0x0
A &amp;&amp; (B || C)  =&gt;  ((input &amp; 0x1) == 0x1) &amp;&amp; ((input &amp; 0x6) != 0x0)
(A &amp;&amp; B) || C  =&gt;  ((input &amp; 0x4) != 0x0) || ((input &amp; 0x3) == 0x3)
(A &amp;&amp; B) || (C &amp;&amp; D)  =&gt;  ((input &amp; 0x3) == 0x3) || ((input &amp; 0xC) == 0xC)
(A || B) &amp;&amp; (C || D)  =&gt;  ((input &amp; 0x3) != 0x0) &amp;&amp; ((input &amp; 0xC) != 0x0)
(A || B) &amp;&amp; (C || D) &amp;&amp; (A || C) &amp;&amp; (A || D)  =&gt;  ((input &amp; 0x3) != 0x0) &amp;&amp; ((input &amp; 0xC) != 0x0) &amp;&amp; ((input &amp; 0x5) != 0x0) &amp;&amp; ((input &amp; 0x9) != 0x0)</code></pre><p>But what about performance? &#xA0;On the above set of expressions the optimized execution plans ran on average more than 60% faster that the non-optimized version. (approx 2.9 down to 1.1 seconds)</p><h3 id="more-performance-with-ilgenerator">More Performance with ILGenerator</h3><p>I really wanted to squeeze as much performance as I could out of these expressions since the UI library calls them fairly regularly during rendering and control state updates.</p><p>One of the coolest features of .NET are dynamic methods - the ability to generate IL code at runtime that effectively gets compiled to machine code. </p><p>Given the optimized execution plans from above it&apos;s really quite simple to generate IL code to actually execute them. &#xA0;Here&apos;s the entire function that generates the IL for an execution plan:</p><pre><code class="language-csharp">void Emit(ExecPlan plan)
{
    switch (plan.Kind)
    {
        case ExecPlanKind.True:
            _il.Emit(OpCodes.Ldc_I4_1);
            break;

        case ExecPlanKind.False:
            _il.Emit(OpCodes.Ldc_I4_0);
            break;

        case ExecPlanKind.MaskEqual:
            _il.Emit(OpCodes.Ldarg_0);
            _il.Emit(OpCodes.Ldc_I4, plan.Mask);
            _il.Emit(OpCodes.And);
            _il.Emit(OpCodes.Ldc_I4, plan.TestValue);
            _il.Emit(OpCodes.Ceq);
            break;

        case ExecPlanKind.MaskNotEqual:
            _il.Emit(OpCodes.Ldarg_0);
            _il.Emit(OpCodes.Ldc_I4, plan.Mask);
            _il.Emit(OpCodes.And);
            _il.Emit(OpCodes.Ldc_I4, plan.TestValue);
            _il.Emit(OpCodes.Ceq);
            _il.Emit(OpCodes.Ldc_I4_0);
            _il.Emit(OpCodes.Ceq);
            break;

        case ExecPlanKind.EvalAnd:
            {
                var lblFalse = _il.DefineLabel();
                var lblDone = _il.DefineLabel();
                for (int i = 0; i &lt; plan.InputPlans.Count; i++)
                {
                    // Generate the input plan
                    Emit(plan.InputPlans[i]);
                    _il.Emit(OpCodes.Brfalse, lblFalse);
                }
                _il.Emit(OpCodes.Ldc_I4_1);
                _il.Emit(OpCodes.Br_S, lblDone);
                _il.MarkLabel(lblFalse);
                _il.Emit(OpCodes.Ldc_I4_0);
                _il.MarkLabel(lblDone);
                return;
            }

        case ExecPlanKind.EvalOr:
            {
                var lblTrue = _il.DefineLabel();
                var lblDone = _il.DefineLabel();
                for (int i = 0; i &lt; plan.InputPlans.Count; i++)
                {
                    // Generate the input plan
                    Emit(plan.InputPlans[i]);
                    _il.Emit(OpCodes.Brtrue, lblTrue);
                }
                _il.Emit(OpCodes.Ldc_I4_0);
                _il.Emit(OpCodes.Br_S, lblDone);
                _il.MarkLabel(lblTrue);
                _il.Emit(OpCodes.Ldc_I4_1);
                _il.MarkLabel(lblDone);
                break;
            }

        case ExecPlanKind.EvalNot:
            {
                Emit(plan.InputPlans[0]);
                _il.Emit(OpCodes.Ldc_I4_0);
                _il.Emit(OpCodes.Ceq);
                break;
            }

        default:
            throw new NotImplementedException();
    }
}
</code></pre><p>Using the above approach, performance jumped to almost 90% faster than directly evaluating the unoptimized tree. &#xA0;(approx 2.9 down to 0.4 seconds). Well worth the effort!</p><h3 id="about-the-code">About the Code</h3><p>The code for this is <a href="https://github.com/toptensoftware/BitmaskExpressions?ref=toptensoftware.com">available here</a>. &#xA0;Some notes:</p><ul><li>A <a href="https://www.toptensoftware.com/blog/bitmask-expressions/en.wikipedia.org/wiki/Visitor_pattern">Visitor Pattern</a> is used for walking the AST</li><li>It should probably use the Visitor pattern for <code>ExecPlan</code> nodes too (but it doesn&apos;t)</li><li>Most of the logic can be found where you&apos;d expect - the <code>Tokenizer</code> is in Tokenizer.cs, the <code>Parser</code> in Parser.cs etc...</li><li>The <code>Evaluator</code> class evaluates an unoptimized AST tree against an input value</li><li>The <code>Planner</code> class produces an optimized <code>ExecPlan</code> from a supplied AST</li><li>The <code>Compiler</code> class generates IL code for a supplied<code>ExecPlan</code></li><li>The <code>Logger</code> class is used log an expression AST to the console</li><li>All the AST nodes are in AstNodes.cs</li><li>The <code>IBitNames</code> interface allows plugging in a mapping from expression identifier to bit mask</li><li>The <code>BitFromLetter</code> class provides a hardcoded implementation of <code>IBitNames</code> where <code>A</code> = <code>0x01</code>, <code>B</code> = <code>0x02</code>, <code>C</code> = <code>0x04</code> etc... and is used only for testing.</li><li>The <code>EnumNames</code> class provides an implementation of <code>IBitNames</code> that retrieves bit values from a C# <code>enum</code> type.</li></ul><p>Finally, the <code>Compiler</code> class also includes this method:</p><pre><code class="language-csharp">public static Func&lt;T, bool&gt; Compile&lt;T&gt;(string expression) where T: Enum
</code></pre><p>It can be used to create an IL optimized method for an expression based on a C# enum:</p><pre><code class="language-csharp">[Flags]
enum Fruit
{
    Apples = 0x01,
    Pears = 0x02,
    Bananas = 0x04,
}

var compiledExpression = Compiler.Compile&lt;Fruit&gt;(&quot;Apples &amp;&amp; (Bananas || Pears)&quot;);
var result = compiledExpression(Fruit.Apples|Fruit.Bananas);
</code></pre><p>The compiler supports generating IL code for <code>enum</code> types with a 8, 16, 32 and 64-bit underlying types.</p><p>Now I just need to retrofit it into GTL.</p><hr><p>PS: If you&apos;ve seen this or a similar algorithm before, I&apos;d be interested in reading about it - <a href="https://twitter.com/toptensoftware?ref=toptensoftware.com">let me know</a>.</p>]]></content:encoded></item><item><title><![CDATA[Running EA Origin Games under Linux via Steam and Proton]]></title><description><![CDATA[Explains how to run EA Origin games on Linux under Steam and Proton, using NFS The Run as an example.]]></description><link>https://www.toptensoftware.com/blog/running-ea-origin-games-under-linux-via-steam-and-proton/</link><guid isPermaLink="false">65a60a2f649514000100f3e5</guid><category><![CDATA[gaming]]></category><category><![CDATA[linux]]></category><category><![CDATA[ubuntu]]></category><category><![CDATA[steam]]></category><category><![CDATA[origin]]></category><category><![CDATA[nfs]]></category><category><![CDATA[nfstherun]]></category><dc:creator><![CDATA[Brad Robinson]]></dc:creator><pubDate>Tue, 01 Dec 2020 05:02:25 GMT</pubDate><media:content url="https://www.toptensoftware.com/blog/content/images/2020/12/nfstherun-banner.png" medium="image"/><content:encoded><![CDATA[<img src="https://www.toptensoftware.com/blog/content/images/2020/12/nfstherun-banner.png" alt="Running EA Origin Games under Linux via Steam and Proton"><p>One of my favourite PS3 games was <a href="https://en.wikipedia.org/wiki/Need_for_Speed:_The_Run?ref=toptensoftware.com">Need for Speed: The Run</a>. When I saw the PC version on sale I couldn&apos;t resist grabbing a copy to see if I could get it to work on my new Linux gaming machine (which I <a href="https://www.toptensoftware.com/blog/building-a-linux-based-console-style-lounge-room-gaming-pc/">wrote about here</a>).  </p><p>The Run isn&apos;t listed on <a href="https://www.protondb.com/?ref=toptensoftware.com">ProtonDB</a> so there was a chance it wouldn&apos;t be playable but at just $5 there wasn&apos;t much to lose. Since this was the first time I&apos;d tried a non-Steam game under Proton I knew there&apos;d be some tinkering around but I got it working in the end.</p><h3 id="update-1">Update 1</h3><p>Since posting this article there&apos;s a been some <a href="https://www.reddit.com/r/linux_gaming/comments/k4e9rf/how_to_run_ea_origin_games_under_linux_via_steam/?ref=toptensoftware.com">discussion on Reddit</a> about why this is necessary:</p><ul><li>&quot;Why not just use Lutris?&quot; No reason except that I just like using Steam as the launcher for this lounge-room based, controller-only gaming machine.  I personally haven&apos;t looked into Lutris, but have only heard good things about it. For me, everything else I play is in Steam, it works fine and I just wanted to add this one non-Steam game.</li><li>&quot;Why do this when EA Games already install OriginThinSetup.&quot;  This is specifically for games that aren&apos;t on Steam, in this case NFS The Run.</li></ul><h3 id="update-2">Update 2</h3><p>Since posting, EA has shutdown the servers causing an infinite hang when starting the game showing &quot;Connecting to Auto Log servers&quot;.  On Windows this can be worked around by setting up a firewall rule to block the game .exe file. </p><p>Since Linux doesn&apos;t seem to support firewall blocking by process, certainly not by process name, and definitely not for Wine process I did some digging and figured out the actual ports to block.  The following two <code>ufw</code> rules seems to let the game run again:</p><pre><code>sudo ufw reject out 42127/tcp
sudo ufw reject out 1900/udp
</code></pre>
<p>Actually, only the first rule is required, but in trying to figure this out I noticed it was connecting to 1900/udp as well... might as well block it too.</p><h3 id="ugh-ea-origin">Ugh, EA Origin</h3><p>Unfortunately, the only way to download, install and run most EA games is via their launcher Origin.  It&apos;s like EA&apos;s version of Steam but crappier - especially for lounge room gaming machines that don&apos;t have a mouse or keyboard since it doesn&apos;t support controllers.</p><p>Luckily there&apos;s a way to launch Origin and then get it to run a specific game - which I&apos;ll cover below.  Unfortunately, there doesn&apos;t seem to be a way to have it automatically shutdown when you close the game so that needs to be done manually.</p><p>This post explains the best way I found to set it up. There might be other better ways, in which case please <a href="http://www.twitter.com/toptensoftware?ref=toptensoftware.com">let me know</a>.</p><h3 id="heres-how">Here&apos;s How...</h3><ol><li>On a Windows machine, download OriginThinSetup.exe from their site. You need to do this because viewing the site from a Linux machine doesn&apos;t give the Windows download option. (Alternatively, Redditor <a href="https://www.reddit.com/r/linux_gaming/comments/k4e9rf/how_to_run_ea_origin_games_under_linux_via_steam/ge8onw8?utm_source=share&amp;utm_medium=web2x&amp;context=3">GGG_246</a> informs these are also available at <a href="https://appdb.winehq.org/objectManager.php?sClass=version&amp;iId=26175&amp;ref=toptensoftware.com">winehq</a>)</li><li>Transfer OriginThinSetup.exe to your Linux machine. It doesn&apos;t matter where you put it but your Downloads folder is a good option.</li><li>In Steam, choose the &quot;Add non-Steam Game&quot; command and select OriginThinSetup.exe from where ever you placed it.  Also, choose to run it using Proton.  I used Proton 5.0.</li><li>Start the newly added &quot;game&quot; ie: the Origin installer and install it.</li><li>Once Origin is installed you can launch it directly from its installer.  Login to your account and choose to download and install the game.</li><li>You should now be able to run the game and with a little luck it should basically work.</li></ol><p>Now that Origin and the game are installed, the trick is figuring out how to start it directly instead of running the Origin installer again.  The method I ended up using was a bash script:</p><ol><li>Close Origin if it&apos;s still running.</li><li>Go to the directory <code>~/.steam/steam/steamapps/compatdata/</code> and look for a sub-directory named with 10 digits.  In my case it was called <code>3627082160</code>. If you have multiple directories go into each one and work out which has the Origin.exe program.  It should be in a sub-folder named <code>pfx/drive_c/Program Files (x86)/Origin/Origin.exe</code>.</li><li>Next you&apos;ll need to create a bash script with the code shown below.</li><li>In the script update the variable <code>COMPATDIR</code> from <code>3627082160</code> to whatever the folder is called on your machine.</li><li>Also update the variable <code>GAMEID</code> to the Origin Id of the game you installed. You can get this by logging into origin.com and clicking on the game in your library and the game ID will show up in the URL (see screen shot below).  The Origin Game Id for NFS The Run is <code>231088400</code>.</li><li>If you&apos;ve got other versions of Proton installed you can experiment with the PROTONVER variable in the script to change which version will be used.  I found for The Run, Proton-5.21-GE-1 worked best.</li><li>Save the script somewhere convenient (I just put it in my home directory) and use chmod to mark it executable. eg: <code>~$ chmod +x nfstherun</code></li><li>Back in Steam, delete the previously created &quot;non-Steam&quot; shortcut to OriginThinSetup.exe.</li><li>Use the Add non-Steam game command again to add a shortcut to the script (note this time don&apos;t choose to run under Proton as this will cause Steam to create a second compatdata prefix directory which we don&apos;t want as we&apos;ve already got everything setup in the existing prefix).</li><li>Finally you can set and icon and grid artwork as you would for any other non-Steam game.</li></ol><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.toptensoftware.com/blog/content/images/2020/11/Screenshot-from-2020-11-30-18-20-30.png" class="kg-image" alt="Running EA Origin Games under Linux via Steam and Proton" loading="lazy" width="1920" height="1080" srcset="https://www.toptensoftware.com/blog/content/images/size/w600/2020/11/Screenshot-from-2020-11-30-18-20-30.png 600w, https://www.toptensoftware.com/blog/content/images/size/w1000/2020/11/Screenshot-from-2020-11-30-18-20-30.png 1000w, https://www.toptensoftware.com/blog/content/images/size/w1600/2020/11/Screenshot-from-2020-11-30-18-20-30.png 1600w, https://www.toptensoftware.com/blog/content/images/2020/11/Screenshot-from-2020-11-30-18-20-30.png 1920w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Configuring Steam to run the script</span></figcaption></figure><p>Here&apos;s the script:</p><pre><code class="language-bash">#!/bin/bash

# Set this the folder where Steam created the Origin prefix
COMPATDIR=3627082160

# Set the to the Origin Game ID of the game to launch
GAMEID=231088400

# Pick a Proton Version
#PROTONVER=steam/steamapps/common/Proton\ 5.0
PROTONVER=root/compatibilitytools.d/Proton-5.21-GE-1

# Location of Origin.exe within the compatdata folder
ORIGIN_EXE=&quot;pfx/drive_c/Program Files (x86)/Origin/Origin.exe&quot;

# Steam prefix directory
export STEAM_COMPAT_DATA_PATH=~/.steam/steam/steamapps/compatdata/$COMPATDIR/

# Proton settings
export PROTON_USE_WINED3D=0

# Run
~/.steam/$PROTONVER/proton waitforexitandrun \
    &quot;${STEAM_COMPAT_DATA_PATH}${ORIGIN_EXE}&quot; \
    origin://LaunchGame/DR:${GAMEID}
</code></pre><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://www.toptensoftware.com/blog/content/images/2020/11/image.png" class="kg-image" alt="Running EA Origin Games under Linux via Steam and Proton" loading="lazy" width="2000" height="561" srcset="https://www.toptensoftware.com/blog/content/images/size/w600/2020/11/image.png 600w, https://www.toptensoftware.com/blog/content/images/size/w1000/2020/11/image.png 1000w, https://www.toptensoftware.com/blog/content/images/size/w1600/2020/11/image.png 1600w, https://www.toptensoftware.com/blog/content/images/2020/11/image.png 2077w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Locating the Origin Game ID for a game.</span></figcaption></figure><h3 id="in-practice-origin">In Practice: Origin</h3><p>That&apos;s everything I did to setup Origin on my lounge room gaming machine. There&apos;s a couple of caveats:</p><ol><li>For some reason sometimes the game either takes a really long time to start, or never starts.  I&apos;ve found that moving the mouse cursor around using the track pad on the PS4 controller seems to hurry this along quite a bit.</li><li>Once the game is closed, Origin will rear it&apos;s ugly head. I haven&apos;t found a way to prevent this so I just use the controller track pad to shut it down. Unfortunately you can&apos;t just leave it running because if you launch the game again it doesn&apos;t seem to start. (For a possible work around for this, see <a href="https://www.reddit.com/r/linux_gaming/comments/k4e9rf/how_to_run_ea_origin_games_under_linux_via_steam/ge8s7ga?utm_source=share&amp;utm_medium=web2x&amp;context=3">this reddit post</a> by <a href="https://www.reddit.com/user/lucasrizzini/?ref=toptensoftware.com">lucasrizzini</a>)</li></ol><h3 id="in-practice-nfs-the-run">In Practice: NFS The Run</h3><p>As for NFS The Run, it seems to run really well and mostly looks like any other game in Steam:</p><figure class="kg-card kg-image-card"><img src="https://www.toptensoftware.com/blog/content/images/2020/11/Screenshot-from-2020-11-30-18-19-10.png" class="kg-image" alt="Running EA Origin Games under Linux via Steam and Proton" loading="lazy" width="1920" height="1080" srcset="https://www.toptensoftware.com/blog/content/images/size/w600/2020/11/Screenshot-from-2020-11-30-18-19-10.png 600w, https://www.toptensoftware.com/blog/content/images/size/w1000/2020/11/Screenshot-from-2020-11-30-18-19-10.png 1000w, https://www.toptensoftware.com/blog/content/images/size/w1600/2020/11/Screenshot-from-2020-11-30-18-19-10.png 1600w, https://www.toptensoftware.com/blog/content/images/2020/11/Screenshot-from-2020-11-30-18-19-10.png 1920w" sizes="(min-width: 720px) 720px"></figure><p>There are a couple of minor issues:</p><ul><li>Proton 5 seemed to give fairly frequent micro-stutters.  Switching to <a href="https://github.com/GloriousEggroll/proton-ge-custom/releases?ref=toptensoftware.com">Glorious Eggroll</a> 5.1 seemed to really help this.  There&apos;s still the occasional stutter but I seem to remember similar behaviour on the PS3 - it could just be the game. </li><li>Some of the instruction popup screens appeared blank with no text and just a close button.  This didn&apos;t bother me since I knew the game anyway.</li><li>In the snow levels the kicked up spray from other cars appears like black diesel smoke instead of a white mist.  I didn&apos;t notice this on PS3 or in online videos of the PC version so I&apos;m guessing this might be a bug in Proton.</li><li>If you disable V-Sync, the same kicked up spray renders really weirdly in front of your own car and rises vertically from other cars.  This is a <a href="https://answers.ea.com/t5/Other-Need-for-Speed-Games/Dust-water-snow-visual-bug/td-p/6120881?ref=toptensoftware.com">known bug</a> in the PC version and nothing to do with running under Linux/Proton.</li><li>There&apos;s some lip-sync issues in the cut scenes.  Not sure if this is a problem with Linux/Proton or just a problem with PC edition of the game. This didn&apos;t happen in the PS3 version.  No big deal.</li></ul><p>On the positive side, I&apos;ve played through the entire &quot;Run&quot; part of the game and it&apos;s very playable - better than the PS3.  On my GTX-2070 I can set all graphics settings to ultra and run it at 1920 x 1080, the sound is great, PS4 controller works well (although requires mental mapping of Playstation buttons to ABXY style buttons), it looks better than PS3 and I feel like I can see further down the road.</p><p>I also have a suspicion things are slightly better balanced on the PC.  Some levels on the PS3 version seemed unusually difficult compared to the levels before and after and I didn&apos;t notice it this time through.</p><p>TL;DR: Definitely playable and a ton of fun :)</p>]]></content:encoded></item><item><title><![CDATA[Rich Text Editor in C# - Part 10 - View Updates and Clipboard]]></title><description><![CDATA[<p>This video finishes the implementation of Undo/Redo in the view as well as support for multiple views and plain text clipboard operations.</p><figure class="kg-card kg-embed-card"><iframe width="459" height="344" src="https://www.youtube.com/embed/TpE4t9fugGE?feature=oembed" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></figure><p>For source code, see <a href="https://github.com/toptensoftware/RichTextKit/tree/UndoMultiViewClipboard?ref=toptensoftware.com">this tagged branch of the RichTextKit repo</a>. </p><p>Got questions or comments? Find me on Twitter - <a href="https://twitter.com/toptensoftware?ref=toptensoftware.com">@toptensoftware</a></p>]]></description><link>https://www.toptensoftware.com/blog/rich-text-editor-in-csharp-10/</link><guid isPermaLink="false">65a60a2f649514000100f3e0</guid><category><![CDATA[richedit]]></category><category><![CDATA[skia]]></category><category><![CDATA[skiasharp]]></category><category><![CDATA[C#]]></category><category><![CDATA[richtextkit]]></category><category><![CDATA[editor]]></category><dc:creator><![CDATA[Brad Robinson]]></dc:creator><pubDate>Mon, 31 Aug 2020 00:30:37 GMT</pubDate><media:content url="https://www.toptensoftware.com/blog/content/images/2020/08/RichEdBanner-9.png" medium="image"/><content:encoded><![CDATA[<img src="https://www.toptensoftware.com/blog/content/images/2020/08/RichEdBanner-9.png" alt="Rich Text Editor in C# - Part 10 - View Updates and Clipboard"><p>This video finishes the implementation of Undo/Redo in the view as well as support for multiple views and plain text clipboard operations.</p><figure class="kg-card kg-embed-card"><iframe width="459" height="344" src="https://www.youtube.com/embed/TpE4t9fugGE?feature=oembed" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></figure><p>For source code, see <a href="https://github.com/toptensoftware/RichTextKit/tree/UndoMultiViewClipboard?ref=toptensoftware.com">this tagged branch of the RichTextKit repo</a>. </p><p>Got questions or comments? Find me on Twitter - <a href="https://twitter.com/toptensoftware?ref=toptensoftware.com">@toptensoftware</a></p>]]></content:encoded></item><item><title><![CDATA[Rich Text Editor in C# - Part 9 - Undo/Redo Support]]></title><description><![CDATA[<p>This video looks at the document side aspects of undo/redo support.</p><figure class="kg-card kg-embed-card"><iframe width="459" height="344" src="https://www.youtube.com/embed/epvqDR63fak?feature=oembed" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></figure><p>For source code, see <a href="https://github.com/toptensoftware/RichTextKit/tree/UndoMultiViewClipboard?ref=toptensoftware.com">this tagged branch of the RichTextKit repo</a>. </p><p>Got questions or comments? Find me on Twitter - <a href="https://twitter.com/toptensoftware?ref=toptensoftware.com">@toptensoftware</a></p>]]></description><link>https://www.toptensoftware.com/blog/rich-text-editor-in-csharp-9/</link><guid isPermaLink="false">65a60a2f649514000100f3df</guid><category><![CDATA[richedit]]></category><category><![CDATA[skia]]></category><category><![CDATA[skiasharp]]></category><category><![CDATA[C#]]></category><category><![CDATA[richtextkit]]></category><category><![CDATA[editor]]></category><dc:creator><![CDATA[Brad Robinson]]></dc:creator><pubDate>Mon, 24 Aug 2020 07:06:22 GMT</pubDate><media:content url="https://www.toptensoftware.com/blog/content/images/2020/08/RichEdBanner-8.png" medium="image"/><content:encoded><![CDATA[<img src="https://www.toptensoftware.com/blog/content/images/2020/08/RichEdBanner-8.png" alt="Rich Text Editor in C# - Part 9 - Undo/Redo Support"><p>This video looks at the document side aspects of undo/redo support.</p><figure class="kg-card kg-embed-card"><iframe width="459" height="344" src="https://www.youtube.com/embed/epvqDR63fak?feature=oembed" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></figure><p>For source code, see <a href="https://github.com/toptensoftware/RichTextKit/tree/UndoMultiViewClipboard?ref=toptensoftware.com">this tagged branch of the RichTextKit repo</a>. </p><p>Got questions or comments? Find me on Twitter - <a href="https://twitter.com/toptensoftware?ref=toptensoftware.com">@toptensoftware</a></p>]]></content:encoded></item><item><title><![CDATA[Rich Text Editor in C# - Part 8 - Basic Edits Working]]></title><description><![CDATA[<p>In this video we reach a bit of a milestone with basic edit operations now working!</p><figure class="kg-card kg-embed-card"><iframe width="459" height="344" src="https://www.youtube.com/embed/qBToGzTWuJQ?feature=oembed" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></figure><p>Got questions or comments? Find me on Twitter - <a href="https://twitter.com/toptensoftware?ref=toptensoftware.com">@toptensoftware</a></p>]]></description><link>https://www.toptensoftware.com/blog/rich-text-editor-in-csharp-8/</link><guid isPermaLink="false">65a60a2f649514000100f3d9</guid><category><![CDATA[richedit]]></category><category><![CDATA[skia]]></category><category><![CDATA[skiasharp]]></category><category><![CDATA[C#]]></category><category><![CDATA[richtextkit]]></category><category><![CDATA[editor]]></category><dc:creator><![CDATA[Brad Robinson]]></dc:creator><pubDate>Thu, 20 Aug 2020 23:50:33 GMT</pubDate><media:content url="https://www.toptensoftware.com/blog/content/images/2020/08/RichEdBanner-7.png" medium="image"/><content:encoded><![CDATA[<img src="https://www.toptensoftware.com/blog/content/images/2020/08/RichEdBanner-7.png" alt="Rich Text Editor in C# - Part 8 - Basic Edits Working"><p>In this video we reach a bit of a milestone with basic edit operations now working!</p><figure class="kg-card kg-embed-card"><iframe width="459" height="344" src="https://www.youtube.com/embed/qBToGzTWuJQ?feature=oembed" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></figure><p>Got questions or comments? Find me on Twitter - <a href="https://twitter.com/toptensoftware?ref=toptensoftware.com">@toptensoftware</a></p>]]></content:encoded></item></channel></rss>