A Language Called [something] - The Story So Far

Friday, August 5th, 2011     #javascript #everything

As mentioned in my previous post, I've made some pretty good progress on this little language I'm working on.

This post is about Prefix, a modern language that compiles to JavaScript. A language where C# is the inspiration - not the goal. Read more on the Prefix Project Page.

Statements

These statements are implemented and type safe (e.g. an if statement must have a boolean expression as its condition):

  • if/else
  • switch/case
  • while
  • do/while
  • for
  • return
  • break
  • continue

These are not:

  • foreach
  • throw/catch

Expressions

Most expression elements are implemented:

  • parentheses: ( )
  • assignment: =, += etc...
  • all binary operators: +, -, *, / etc...
  • all unary operators: -, ~, ! etc...
  • function calls: fn(params)
  • literals: strings, integers, doubles, null, true/false
  • prefix and postfix increment operators: ++x and x++
  • ternary operator: ? :
  • this

Again these are all type safe, so you can't multiply a string by an integer for example.

The indexer operator [] is implemented for dynamic types (see below) but not yet for static types.

Variable Declarations

Variable declarations with initialization expressions are implemented with the typical syntax:

// Declare an integer variable
int x = 10;

The var keyword is also supported for local variables and will infer its type from the initialization expression. This statement is identical to the one above:

// Infer x's type from right hand side of assignment
var x = 10;

The dynamic keyword defines a variable that is not typed. The compiler doesn't check method calls or operations on these objects and will simply generate the associated JavaScript code. Think of dynamic as pretty much identical to JavaScript's var keyword.

// Declare an untyped variable
dynamic x = getSomeObject();
x.DoWhateverYouLikeButItMightExplodeAtRuntime();

Primitive Types

So far these primitive types are implemented and working:

  • Object
  • String
  • Integer
  • Double
  • Boolean

Integers and Doubles

Unlike JavaScript, there are two numeric types: integer and double. Although both are implemented as numbers in JavaScript, the compiler ensures that integer division and casts from double to integer produce whole numbers (the generated |0 truncates toward zero rather than rounding to nearest), e.g.:

{{C#}}
// input
int x = 11 / 5;
{{JavaScript}}
// JavaScript
var x = (11 / 5)|0;

(resulting in 2, not 2.2)
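
To make the behaviour of the |0 trick concrete, here is a small hand-written sketch (not compiler output). The intDiv helper is a hypothetical name used only for illustration; it just wraps the same (a / b)|0 pattern shown above.

```javascript
// Hypothetical helper wrapping the (a / b)|0 pattern from the generated code.
function intDiv(a, b) {
    return (a / b) | 0;
}

console.log(intDiv(11, 5));        // 2
console.log(intDiv(-11, 5));       // -2: |0 truncates toward zero
console.log(Math.floor(-11 / 5));  // -3: floor rounds toward negative infinity

// |0 also converts to a signed 32-bit integer, so very large values wrap.
console.log(3000000000 | 0);       // -1294967296
```

So the result matches C#-style integer division for values that fit in 32 bits. Anything larger would wrap around, which is worth keeping in mind.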

It also works for /= and is smart enough to not cause side effects:

{{C#}}
GetInstance().val/=10;
{{JavaScript}}
var $temp;
($temp=GetInstance()).val = (($temp.val/10)|0);

Note the use of a temporary variable to avoid two calls to GetInstance() which might have side effects.
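
A quick way to check the single-call claim is to run the generated shape against a stand-in. In this sketch GetInstance and the calls counter are assumptions made up for the test; only the last statement mirrors the generated code above.

```javascript
var calls = 0;
var instance = { val: 45 };

// Hypothetical stand-in for GetInstance(); its side effect is counting calls.
function GetInstance() {
    calls++;
    return instance;
}

// Same shape as the generated code: the target object is evaluated once,
// stored in $temp, and reused on the right-hand side.
var $temp;
($temp = GetInstance()).val = (($temp.val / 10) | 0);

console.log(instance.val); // 4
console.log(calls);        // 1
```

A naive expansion such as GetInstance().val = (GetInstance().val / 10)|0 would bump the counter twice.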

Global Scope Code

Unlike C#, global statements, functions and variables are supported. Here's an example:

{{C#}}
extern dynamic Console;

string GetString()
{
    return "Hello World";
}

Console.WriteLine(GetString());
{{JavaScript}}
(function() {

// System.String GetString()
function GetString()
{
    return "Hello World";
}
Console.WriteLine(GetString());

})();

Notice the following about the above:

  • Everything generated by the compiler is wrapped in a global closure that prevents pollution of the external JavaScript namespace.
  • The extern dynamic Console statement declares an external dynamic variable called Console. Since it's dynamic we can call any method on it without a compile time error. Console is an object defined in my test harness and you'll see it popping up in many of these examples.
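
The effect of the wrapping closure can be sketched in plain JavaScript. The names here (GetString, greeting) are only illustrative; the point is that nothing declared inside the wrapper is visible outside it.

```javascript
(function () {
    // Declarations inside the wrapper are local to it.
    function GetString() {
        return "Hello World";
    }
    var greeting = GetString();
})();

// Outside the wrapper neither name exists.
console.log(typeof GetString); // "undefined"
console.log(typeof greeting);  // "undefined"
```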

Simple Class Declarations

A simple class:

{{C#}}
extern dynamic Console;

class MyClass
{
    public string message()
    {
        return "Hello World";
    }
}

Console.WriteLine(new MyClass().message());
{{JavaScript}}
(function() {

// class MyClass
var MyClass = function()
{
};

// System.String MyClass.message()
MyClass.prototype.message = function()
{
    return "Hello World";
};


Console.WriteLine(new MyClass().message());

})();

Note that currently the members of a class are implemented as <classname>.prototype.<methodname> = function. There is also the ability to generate the prototype as an object literal, which generates smaller code and would be suitable for a release mode build. For readability I've disabled it in debug builds. Compare the above to this:

{{JavaScript}}
MyClass.prototype = 
{
    // System.String MyClass.message()
    message: function()
    {
        return "Hello World";
    }
        
    // Other functions would go here
};
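
For comparison, here is a hand-written sketch of both shapes side by side (the class names A and B are made up). Method calls behave the same either way. The one difference I know of in plain JavaScript is that replacing the whole prototype drops the default constructor back-reference, which only matters if something relies on it.

```javascript
// Style 1: one assignment per member (the debug-build shape).
var A = function () {};
A.prototype.message = function () { return "Hello World"; };

// Style 2: the whole prototype as an object literal (the release-build shape).
var B = function () {};
B.prototype = {
    message: function () { return "Hello World"; }
};

console.log(new A().message() === new B().message()); // true

// Replacing the prototype object loses the default `constructor` property.
console.log(A.prototype.constructor === A); // true
console.log(B.prototype.constructor === B); // false
```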

Inheritance - abstract, virtual and override modifiers

Now we're getting into some fun stuff. What you can't see here is the error checking the compiler is doing in the background, including:

  • Trying to new Base would give an error because the class is marked as abstract
  • Marking fn1 as abstract without marking the class as abstract would give an error
  • Using the override modifier would give an error if there's no matching method in the base class
  • Declaring a body for an abstract method would give an error
  • The assignment Base b = new Derived() is only allowed because Derived can be implicitly cast to Base.

{{C#}}
extern dynamic Console;

abstract class Base
{
    public abstract void fn1();

    public virtual void fn2()
    {
        Console.WriteLine("Base fn2");
    }
}

class Derived : Base
{
    public override void fn1()
    {
        Console.WriteLine("Derived fn1");
    }

    public override void fn2()
    {
        Console.WriteLine("Derived fn2");
    }
}

Base b = new Derived();
b.fn1();
b.fn2();
{{JavaScript}}
(function() {

// class Base
var Base = function()
{
};

//  Base.fn2()
Base.prototype.fn2 = function()
{
    Console.WriteLine("Base fn2");
};


// class Derived
var Derived = function()
{
};

Derived.prototype = new Base();

//  Derived.fn1()
Derived.prototype.fn1 = function()
{
    Console.WriteLine("Derived fn1");
};

//  Derived.fn2()
Derived.prototype.fn2 = function()
{
    Console.WriteLine("Derived fn2");
};


var b=new Derived();
b.fn1();
b.fn2();

})();

Note that base member hiding is not supported. This is actually very difficult to do efficiently in JavaScript and its usefulness doesn't justify the required effort. So, the new modifier on members is not supported and hiding base members generates an error.
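
One way to see the difficulty, as a hand-written sketch rather than anything the compiler emits: JavaScript looks members up by name along the prototype chain, so a same-named member on the derived prototype always wins, no matter what static type the caller had in mind. Getting C#-style hiding would presumably need explicit base calls or extra name mangling at every call site. The name member below is made up for the example.

```javascript
var Base = function () {};
Base.prototype.name = function () { return "Base"; };

var Derived = function () {};
Derived.prototype = new Base();
// Same property name on the derived prototype: lookup finds this one first,
// so it always behaves like an override, never like a hidden member.
Derived.prototype.name = function () { return "Derived"; };

var b = new Derived();       // think: Base b = new Derived();
console.log(b.name());       // "Derived"

// Reaching the base version means naming it explicitly at the call site.
console.log(Base.prototype.name.call(b)); // "Base"
```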

Properties

This example demonstrates class properties. It's a weird example because it's from my test cases, but shows what gets generated:

{{C#}}
extern dynamic Console;
class MyClass
{
    string _prop;
    string SimpleProperty
    {
        get
        {
            Console.WriteLine("get");
            return _prop;
        }
        set
        {
            Console.WriteLine("set");
            _prop = value;
        }
    }
}

var x = new MyClass();

x.SimpleProperty = "Hello World";

Console.WriteLine(x.SimpleProperty);
{{JavaScript}}
(function() {

// class MyClass
var MyClass = function()
{
};

MyClass.prototype._prop = null

// System.String MyClass.get_SimpleProperty()
MyClass.prototype.get_SimpleProperty = function()
{
    Console.WriteLine("get");
    return this._prop;
};

// System.String MyClass.set_SimpleProperty(System.String value)
MyClass.prototype.set_SimpleProperty = function(value)
{
    Console.WriteLine("set");
    this._prop = value;
    return value;
};


var x=new MyClass();
x.set_SimpleProperty("Hello World");
Console.WriteLine(x.get_SimpleProperty());

})();

Of course, properties also require special handling for the compound assignment operators:

{{C#}}
x.SimpleProperty +=" Again";
{{JavaScript}}
x.set_SimpleProperty((x.get_SimpleProperty() + " Again"));

Although not shown here, this is another case where side effects could arise, so temporary variables are used where necessary, and integer division assignment /= is truncated to a whole number - as you'd expect.
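
Since that case isn't shown, here is a guess at what a side-effect-safe expansion might look like. This is a hand-written sketch modelled on the $temp pattern from the integer division example, not actual compiler output. GetObject and the calls counter are hypothetical.

```javascript
var calls = 0;
var obj = {
    _prop: "Hello World",
    get_SimpleProperty: function () { return this._prop; },
    set_SimpleProperty: function (value) { this._prop = value; return value; }
};

// Hypothetical function with a side effect (it counts its calls).
function GetObject() {
    calls++;
    return obj;
}

// Possible expansion of:  GetObject().SimpleProperty += " Again";
// The target is evaluated once into $temp, then used for both get and set.
var $temp;
($temp = GetObject()).set_SimpleProperty($temp.get_SimpleProperty() + " Again");

console.log(obj._prop); // "Hello World Again"
console.log(calls);     // 1
```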

Function Overloading

Function overloading is something that doesn't really map well onto JavaScript - at least not without some sort of runtime inspection/decision making of the arguments variable. I've struggled somewhat with the best way to provide this feature and have come up with the following:

  • Overloaded functions are allowed and are internally implemented by qualifying their name with a $<number> suffix - call this the "mangled" name.
  • To make public overloaded functions callable from external JavaScript you can provide an alternate name through an attribute on the original method.
  • Eventually, though not yet implemented, there may be an unmangled function automatically generated that can make simple decisions about which mangled function to call based on the input arguments.

I think this provides the best of all worlds because:

  • It allows transparent function overloading within the confines of the language.
  • It keeps the generated code free of long, complicated, unreadable mangled names (as would happen if I were to mangle with all the parameter types encoded in some weird format).
  • It provides controllable access to overloaded public methods when necessary.
  • It provides some automatic function overloading to external JavaScript (maybe, if I get around to it).

So here's an example, first without the external name attribute:

{{C#}}
extern dynamic Console;

class Foo
{
    public string Write(int x)
    {
        Console.WriteLine(x.toString());
    }

    public string Write(string message)
    {
        Console.WriteLine(message);
    }
}

var x=new Foo();
x.Write("Hello World");;
x.Write(23);;
{{JavaScript}}
(function() {

// class Foo
var Foo = function()
{
};

// System.String Foo.Write(System.Integer x)
Foo.prototype.Write$1 = function(x)
{
    Console.WriteLine(x.toString());
};

// System.String Foo.Write(System.String message)
Foo.prototype.Write$2 = function(message)
{
    Console.WriteLine(message);
};

var x=new Foo();
x.Write$2("Hello World");;
x.Write$1(23);;

})();

And now with a manually specified name attribute. Note the only change in the generated code is the additional prototype entries that map to the same mangled function names. (And I just decided I don't like jsname as the name of that attribute... expect that to change.)

{{C#}}
extern dynamic Console;

class Foo
{
    [jsname("WriteInt")]
    public string Write(int x)
    {
        Console.WriteLine(x.toString());
    }

    [jsname("WriteString")]
    public string Write(string message)
    {
        Console.WriteLine(message);
    }
}

var x=new Foo();
x.Write("Hello World");;
x.Write(23);;
{{JavaScript}}
(function() {

// class Foo
var Foo = function()
{
};

// System.String Foo.Write(System.Integer x)
Foo.prototype.Write$1 = function(x)
{
    Console.WriteLine(x.toString());
};

// System.String Foo.Write(System.String message)
Foo.prototype.Write$2 = function(message)
{
    Console.WriteLine(message);
};

Foo.prototype.WriteInt = Foo.prototype.Write$1;
Foo.prototype.WriteString = Foo.prototype.Write$2;

var x=new Foo();
x.Write$2("Hello World");;
x.Write$1(23);;

})();

I'm pretty happy with that... I don't think the mangling is too hard on the eyes.
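
As for the not-yet-implemented unmangled function mentioned earlier, a hand-written sketch of what it could look like is below. It reuses the Write$1/Write$2 names from the example. The dispatching Write function and the log array are assumptions for illustration, not something the compiler generates.

```javascript
var log = [];
var Foo = function () {};

// Mangled overloads, as in the generated code (bodies simplified to record calls).
Foo.prototype.Write$1 = function (x) { log.push("int:" + x); };
Foo.prototype.Write$2 = function (message) { log.push("string:" + message); };

// Hypothetical unmangled entry point: picks an overload from the runtime
// type of the argument and forwards to the mangled function.
Foo.prototype.Write = function (arg) {
    if (typeof arg === "string") {
        return this.Write$2(arg);
    }
    if (typeof arg === "number" && (arg | 0) === arg) {
        return this.Write$1(arg);
    }
    throw new TypeError("No matching overload for Write");
};

var foo = new Foo();
foo.Write("Hello World");
foo.Write(23);
console.log(log); // [ 'string:Hello World', 'int:23' ]
```

This only works for "simple decisions". Overloads that differ only by int versus double, for instance, can't be told apart at runtime when the value happens to be a whole number.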

Finally, note that function overloading also works on global scope functions and on constructors, which I haven't mentioned yet...

Constructors

Constructors have been the hardest thing to implement so far because:

  • of their relationship to the prototype chain
  • the fact they can be overloaded
  • the fact that they're not declared "inside" the class like other methods
  • the fact they need to automatically or explicitly call base constructors
  • the fact that the function name for the class needs to be kept unmangled to work in instanceof checks
  • the fact that the base class needs to be 'newed' as part of setting up the prototype chain, and that this has to happen without side effects.

In short there's a whole set of problems to resolve with constructors. The solution I came up with is this:

  • The function used to declare the class is not really the class constructor - rather it represents the "type". The only place this function is used with the new operator is in setting up derived class prototype chains.
  • Constructors are implemented as separate functions mangled in the same way as methods are (as described above)
  • The constructor function prototype is set to the class function prototype.
  • Default and empty parameterless constructors are not generated to keep the code clean.

Here's an example:

{{C#}}
extern dynamic Console;

class Fruit
{
    public Fruit(int param)
    {
        Console.WriteLine("Fruit Constructor " + param.toString());
    }
}

class Apple : Fruit
{
    public Apple() : base(99)
    {
        Console.WriteLine("Apple Constructor");
    }
}
var x = new Apple();
{{JavaScript}}
(function() {

// class Fruit
var Fruit = function()
{
};

// Fruit.Fruit(System.Integer param)
var Fruit$1 = function(param)
{
    Console.WriteLine("Fruit Constructor " + param.toString());
};


Fruit$1.prototype = Fruit.prototype;

// class Apple
var Apple = function()
{
};

// Apple.Apple()
var Apple$1 = function()
{
    Fruit$1.call(this, 99);
    Console.WriteLine("Apple Constructor");
};


Apple.prototype = new Fruit();

Apple$1.prototype = Apple.prototype;

var x=new Apple$1();

})();

The points of interest here are:

  • The constructor functions are mangled - Fruit$1 and Apple$1
  • Each constructor's prototype is set to the class function's prototype to get instanceof working correctly - Fruit$1.prototype = Fruit.prototype
  • The derived class prototype is initialized by instantiating the base class function. Apple.prototype = new Fruit()
  • The call from the derived class constructor to the base constructor: Fruit$1.call(this, 99)

That last bit (Fruit$1.call(this, 99)) took me ages to work out how to generate, because the expression tree needs to be flipped on its head to get the this inside the call instead of on the left.
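
To check that this arrangement really keeps instanceof working, here is the generated code again with Console.WriteLine swapped for a plain array (out is an assumption for the test; everything else follows the output above).

```javascript
var out = [];

var Fruit = function () {};                 // the "type" function
var Fruit$1 = function (param) {            // the real constructor
    out.push("Fruit Constructor " + param);
};
Fruit$1.prototype = Fruit.prototype;

var Apple = function () {};
var Apple$1 = function () {
    Fruit$1.call(this, 99);                 // base constructor call
    out.push("Apple Constructor");
};
Apple.prototype = new Fruit();              // empty type function: no side effects
Apple$1.prototype = Apple.prototype;

var x = new Apple$1();

console.log(x instanceof Apple);  // true
console.log(x instanceof Fruit);  // true
console.log(out);                 // [ 'Fruit Constructor 99', 'Apple Constructor' ]
```

Because the prototype chain is set up with new Fruit() rather than new Fruit$1(...), no constructor body runs until an object is actually created.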

And More...

I've actually done some more bits and pieces but I think that covers the major pieces of work done so far. Still need to come up with a name...
