Building my Own [something] to JavaScript Compiler

Thursday, 4 August 2011

If JavaScript is the "assembly language of the web", then there really needs to be a better compiler for it. And since I can't find one I like, I thought I'd have a crack at writing one myself...

This post is about Prefix, a modern language that compiles to JavaScript. A language where C# is the inspiration - not the goal. Read more on the Prefix Project Page.

Although there's an elegance to JavaScript that I really admire, for any sizeable project its dynamic typing really does scare me. So I decided to have a look at some of the options for compiling a more statically typed language into JavaScript.

I looked at a few - ScriptSharp, SharpKit, JSIL and GWT - and while they're all impressive, they all miss the mark for what I want, which is just a better way to write JavaScript.

I also looked at CoffeeScript and loved the concept - that "it's just JavaScript" with a cleaner syntax and some of the uglier, more annoying quirks of JavaScript patched over. And while I don't have any real problem with braces and semicolons, I do agree with this sentiment from the CoffeeScript home page:

Underneath all of those embarrassing braces and semicolons, JavaScript has always had a gorgeous object model at its heart

But CoffeeScript is still a dynamic language and in some ways it's just a syntax improvement (albeit a major one).

So, after not finding something I liked and a pretty severe bout of Not Invented Here Syndrome, I've started work on a new language and a compiler for it (as one does).

C# as the Inspiration, Not the Goal

C# is without doubt my favourite language so it's only natural to borrow heavily from it. The goal however is not to build a C# to JavaScript compiler. ScriptSharp and SharpKit are already doing this and it's not what I want.

This language will share much of the syntax and features of C#, but not the runtime environment. It will favour working well with JavaScript over perfectly implementing another language/framework.

Complete JavaScript Interoperability

Like CoffeeScript, this project won't attempt to hide the fact that you're still just writing JavaScript. Rather, it will include language features that allow direct interoperability with JavaScript without requiring bridging libraries or classes. In particular, jQuery needs to work as is, so $ will be supported as a valid identifier character.

Similarly, using code written in this language needs to be easily and intuitively accessible from JavaScript.

Strongly Typed

Primarily the language will be strongly/statically typed but allow dropping back to dynamic typing. Internally, strong typing will be enforced where you use it. Imported external objects can be treated as strongly typed by casting to defined types, or accessed directly as dynamic objects. Objects returned out of the language will obviously need to be dynamic.

Elimination of Evil and Ambiguity (and some annoyances)

There are a few constructs in JavaScript that are commonly considered "evil" - most notably the with statement and eval, which simply won't be supported.

Ambiguous constructs such as JavaScript's optional semicolons won't be supported.
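A minimal sketch of the kind of ambiguity optional semicolons cause in plain JavaScript. The function names are made up for illustration. Because a line break after return ends the statement, the first function silently returns undefined and the braces that follow are parsed as a block, not an object literal.

```javascript
// Automatic semicolon insertion: a newline after `return` terminates the statement.
function brokenReturn() {
  return
  {
    answer: 42   // parsed as a labelled statement inside a block, never reached
  };
}

// Keeping the opening brace on the same line as `return` gives the intended result.
function fixedReturn() {
  return {
    answer: 42
  };
}

console.log(brokenReturn());        // undefined
console.log(fixedReturn().answer);  // 42
```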

JavaScript has a few bits that are at least annoying if not borderline broken:

  • Local variable scoping
  • Handling of this in nested closures (i.e. the private "that" convention described here)
  • The need to reference one's own member variables via this.
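The annoyances listed above can be reproduced in a few lines of plain JavaScript. This is only an illustrative sketch: Counter, makeIncrementer and scopingQuirk are hypothetical names, not anything from the compiler.

```javascript
// 1. Local variable scoping: `var` is scoped to the function, not the block.
function scopingQuirk() {
  for (var i = 0; i < 3; i++) {
    // loop body intentionally empty
  }
  return i; // `i` is still visible after the loop
}

// 2 and 3. Members must be reached through `this.`, and a nested function
// gets its own `this`, so the outer one is saved in a local named `that`.
function Counter() {
  this.count = 0;
}

Counter.prototype.makeIncrementer = function () {
  var that = this; // the "that" convention
  return function () {
    that.count++;  // writing `this.count++` here would not touch the Counter
    return that.count;
  };
};

var counter = new Counter();
var increment = counter.makeIncrementer();
increment();
increment();

console.log(scopingQuirk()); // 3
console.log(counter.count);  // 2
```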

Code Generation

The code generated by this compiler will strive for:

  • No Runtime Dependencies - This is paramount. There may be a need for a few internal helper functions, but these will be tiny, neatly hidden away and automatically written by the compiler.

  • Readability - The generated code will be readable and recognizable as the original input program. This is important in order to make debugging as easy as possible. This includes maintaining original comments, minimal mangling of identifiers and simply producing clean code.

  • Best practice techniques - rather than provide a bunch of different options for how code is generated, as much as possible I hope to choose one technique that is considered best practice and stick to it.

  • Compatibility - The generated JavaScript will work across all browsers (well probably, that's not a fixed goal - perhaps I should say all modern browsers)

  • Assistance with Minification - A strongly typed language can do a much better job of facilitating minification than a general purpose JavaScript minifier. There will be an option to generate code intended for minification - where identifiers known to be private are pre-minified, leaving less work for the minifier. (The compiler won't produce fully minified code but may have the option to invoke a minifier.)
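To make the pre-minification idea concrete, here is a rough sketch of what the two output modes might look like. Everything in it is an assumption for illustration: the module shape, the names and the renaming scheme are hypothetical, not the compiler's actual output. The point is only that a name known to be private can be shortened safely, while the public name has to stay intact.

```javascript
// Readable mode (hypothetical): the private helper keeps its source name.
var readable = (function () {
  function addShippingCost(amount) {
    return amount + 10;
  }
  return {
    total: function (amount) {
      return addShippingCost(amount);
    }
  };
})();

// Pre-minified mode (hypothetical): the private helper and the parameters are
// renamed, but the public member `total` is left alone so callers still work.
var preMinified = (function () {
  function a(b) {
    return b + 10;
  }
  return {
    total: function (b) {
      return a(b);
    }
  };
})();

console.log(readable.total(100));    // 110
console.log(preMinified.total(100)); // 110
```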

Simple Tooling

The compiler will be a simple command line utility. Give it one or more input files and it will spit out a single self-contained JavaScript file. It will be written in C# and run on .NET and Mono - so it'll work on just about any platform.

Getting Started

So that's the plan - where to start? The code base for MiniME seemed like a good place.

After cloning it and ripping out the minification logic I was left with just the tokenizer, parser and renderer. I then spent a few hours tinkering and soon had a simple proof of concept up and running - it could process a simple class syntax and generate a prototype-based "class" as the output.
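For readers unfamiliar with the term, a prototype-based "class" in JavaScript typically looks something like the sketch below. The input syntax in the comment and the Point example are purely hypothetical, since the language's real syntax and the proof of concept's real output aren't shown here.

```javascript
// Hypothetical C#-like input (not the language's actual syntax):
//
//   class Point {
//     int x;
//     int y;
//     Point(int x, int y) { this.x = x; this.y = y; }
//     int lengthSquared() { return x * x + y * y; }
//   }

// One plausible prototype-based rendering of that class:
function Point(x, y) {
  this.x = x;
  this.y = y;
}

// Methods live on the prototype so all instances share one copy.
Point.prototype.lengthSquared = function () {
  // implicit member access in the input becomes explicit `this.` here
  return this.x * this.x + this.y * this.y;
};

var p = new Point(3, 4);
console.log(p.lengthSquared()); // 25
```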

Since then I've made good progress (which I'll write up over the coming days) but there's still a fair way to go.

I'm not sure yet whether this will be an open source project. At least initially I'll be keeping it closed but developing it in public. There'll probably be a series of blog posts over the coming weeks as I'd like to keep a record of what's been done and why.

Finally, I'm going to have to decide on a name. I've got something in mind but I'm not committed yet... stay tuned.


8 Comments

Any chance you could show us a gist or something for what this would look like? Go on, go on, go on...

4 August 2011 07:46 AM
Bob Kummer

Once again, you have scoped out a project that describes exactly what I have been looking for. I wish I could provide some constructive suggestions, but you sound spot on. Can’t wait to see how this unfolds!

I have been using PetaPoco and now PetaTest with great success. PetaTest is ideal for checking my POCOs against the DB for schema changes and functional testing on my repositories. Is a PetaPoco community forming somewhere? I see some posts at Stack Overflow, Twitter and this blog, but PP & PT really deserve their own meeting place.

4 August 2011 05:51 PM

Shouldn't you rather focus on PetaPoco, since there are still outstanding features to be done?

4 August 2011 09:11 PM

@James: there will be plenty of examples coming...

@Bob: Thanks. The "community" for PetaPoco is Stack Overflow (for questions) and GitHub (for issues and discussion)

@Robert: Right now PetaPoco does everything it set out to achieve and more and there are no new features I've committed to adding.

4 August 2011 11:03 PM

@Brad: Well, that's not necessarily completely true. There's still the outstanding multi-resultset feature that many would love to have implemented (including me). And you commented on the feature suggestion https://github.com/toptensoftware/PetaPoco/issues/39#issuecomment-1277897

MultiPoco does provide getting several objects from a single row, but having code return several simple resultsets (or multi ones) may run faster on DB.

Example:

This is a simplified example with only two joined tables, but in the real world these joins can be complex and returning them in a single table with a huge row length is not feasible. So instead of having

select p.*, c.* from dbo.Post p left join dbo.Comment c on (c.PostID = p.PostID);

one would rather have: select * from dbo.Post; select * from dbo.Comment;

Two separate resultsets instead of one.

5 August 2011 04:53 PM

@Robert: yep, I was planning on doing multiple record sets but at the time was just thinking of the ability to retrieve the different result sets. It soon became apparent though that what was actually needed was the relationship binding between the multiple record sets. Since I've never really done that in a real world project I'm a little reluctant to write it myself as I'd probably miss the mark. I'd certainly consider merging it in though if someone else could figure it out in an elegant way.

6 August 2011 03:39 AM

You might be interested in the project I am currently working on at Mozilla to aid debugging code compiled from X -> JavaScript.

https://wiki.mozilla.org/DevTools/Features/SourceMap

The library for creating source maps already exists, and I'm in the middle of integrating support for source maps into the webconsole. If you're up for it, I'd love to hear your experiences using the lib to generate source maps alongside the outputted JS.

6 August 2011 06:17 PM

Really hope to see your first product soon.

17 August 2011 03:38 PM
