A Language Called [something] - Code Path Analysis

Saturday, August 13th, 2011     #javascript #everything

This is not so much a language feature but the compiler now does a fairly good job of code path analysis.

This post is about Prefix a modern language that compiles to JavaScript. A language where C# is the inspiration - not the goal. Read more on the Prefix Project Page.

What is Code Path Analysis?

Code path analysis is the ability of a compiler to analyse the possible execution paths through an input program and generate errors and warnings for things that are probably not correct.

[Somethings]'s Code Path Analyser

[Something]'s compiler can now analyse individual functions and global scope code. It doesn't analyse execution paths between functions nor does it track variable values but it does evaluate constant expressions.

It can detect:

  • Non-void functions that don't return a value on all code paths (error)
  • Use of unassigned variables (error)
  • Unreachable code (warning)
  • Uninitialized variables (warning)
  • Variables that are initialized but never used (warning)

Devil in the Detail

When I sat down to design how code path analysis should work I was under no illusions that it would be trivial to implement. In the end it turned out simpler than I thought but there were a few interesting cases that cropped up in trying to get it just right.

Constant Expressions

At first I didn't think I'd need to evaluate and handle constant expressions in the code path, but this soon proved wrong. Take these two examples:

{{C#}}
string x;
while (true)
{
    x="Hello World";
    break;
}
Console.WriteLine(x);

vs

{{C#}}
string x;
bool b=true;
while (b)
{
    x = "Hello World";
    break;
}
Console.WriteLine(x)

The first example shouldn't generate an unassigned variable error, but in the second one it's probably reasonable.

Short-circuit Logical Expressions

In evaluating logical And and Or expressions, the code path analysis needs to account for the short-circuit evaluation that most languages (include JavaScript) do.

I struggled to come up with a decent example to show this, but this is what I used in the unit tests:

{{C#}}
string x;
bool t=true;
bool b = ((x="a")=="a" && t);
Console.WriteLine(x);               // This is fine (x="a" in the previous line will always execute)

vs

{{C#}}
string x;
bool t=true;
bool b = (t && (x="a")=="a" );
Console.WriteLine(x);              // This is an uninitialized variable error

In the second example t could be true or false (even though this example sets t to true in the previous line variable values aren't analysed). If t is false the right hand side of the && won't execute and therefore x is uninitialized.

Ternary Operator

The ternary operator <cond> ? <truevalue> : <falsevalue> is effectively an if/else statement and therefore needs to be tracked in the code analysis.

Break Statements

Break statements are a pain for code analysis and I needed to change the design of the analyser a couple of times to get this working. Basically a break statement is the main form of jump or goto in this language and requires a bit of special handling to track properly.

Conclusion

Initially I was going to leave the code path analysis until after most other language features were implemented. Given that this is an area that is going to require continuous ongoing testing I'm glad it's now done.

Pleasingly, in the process of getting this working the analyser found a bunch of errors in the existing unit test scripts.

« Older - A Language (temporarily) called Prefix Newer - A Language Called [something] - more language features »