About:

I'm Dmitry Popov,
lead developer and director of Infognition.

Known in the interwebs as Dee Mon since 1997. You could see me as thedeemon on reddit or LiveJournal.

From native code to browser: Flash, Haxe, Dart or asm.js?
November 17, 2014

If you developed your own video codec and wanted to watch the video in a browser, what would you do? That is a question we faced a few years ago with ScreenPressor, and at that time the answer was Flash. It was cross-platform, cross-browser, widely available and pretty fast if you used the right programming language, i.e. Haxe instead of ActionScript. So we implemented a decoder (and a player) in Flash back then. But now Flash is clearly in decline and supported only on desktop, while browsers, in their race for speed, have made JavaScript significantly faster than it was a few years ago. So I decided it was time to check how JS compares to Flash on a computation-intensive task such as a video codec. I really don't like the idea of writing in pure JavaScript 'cause the language is such a mess; I'd rather use something that at least gets closures, objects and modules right and has some static type checking. Since I already had working code in C++ for native code and in Haxe for Flash, the obvious choices were using Emscripten to generate asm.js from the C++ code and retargeting the Haxe code to JS (just another target for the Haxe compiler). Also, Dart is pretty close as a language (porting to Dart is simpler than rewriting in some Haskell or Lisp clone), and the Dart VM is marketed as a faster and better replacement for JS engines, so I was curious to try it.

To test and compare the different languages and compilers I decided to implement a small part of the codec in each of them, the most CPU-intensive one: decompression of a key frame to RGB24. I'm going to show the results first and then follow with some notes on each language.

Here are the times for decompressing one particular 960x540 px frame on a laptop with a 2.4 GHz Core i3 CPU and Windows 8.1:

| Target | Chrome 38, ms | Firefox 33, ms | IE 11, ms | Size |
|---|---|---|---|---|
| Flash | 57 | 58 | 58 | 6 KB |
| Dart to JS | 60 | 70 | 95 | 135 KB |
| C++ to asm.js | 54 | 43 | 149 | 212 KB |
| Haxe to JS | 52 | 49 | 62 | 12 KB |

Native C++: 34 ms
Dart on Dart VM: 172 ms

Follow the links to run the benchmarks for yourself.

And here is the mobile story: times for the same operation, in ms.

A tablet with a 1.3 GHz CPU on Android 4.2.2:

| Target | Firefox 33.1, ms | Chrome 39, ms |
|---|---|---|
| Haxe to JS | 250 | 289 |
| asm.js | 197 | 330 |

A phone with a 1.4 GHz CPU on Android 2.3.5, Firefox 33.0:

| Target | Time, ms |
|---|---|
| Flash | 403 |
| Haxe to JS | 313 |
| asm.js | 296 |

Another curious comparison:

Compilation time, in seconds:

| Compiler | Time, s |
|---|---|
| Haxe to JS | 0.18 |
| Haxe to Flash | 0.13 |
| dart2js | 10.55 |
| Emscripten | 3.44 |

By the way, the picture used in the test shows a lossless compression from 960x540 = 518400 pixels = 1555200 bytes of RGB24 data to 149321 bytes, i.e. ~10x lossless compression. I couldn't reach similar size with PNG even with special tools, and JPG at this size shows visible artifacts.

General thoughts

Although people keep repeating "Flash is slow", it's actually pretty fast: 1.67 times slower than native C++ here, and in some other tests of mine sometimes only 20% slower. That's comparable to Java, C# or Go, and faster than most other languages/VMs. Of course, that holds only if you use Haxe; with ActionScript it can easily be twice as slow.

It seems that at least in Firefox and Chrome (Chromium also) JavaScript can finally be faster than Flash and be just 1.3 - 2 times slower than native C++ code. That is really impressive, taking into account its lack of static types and simple integer values (every number is a double there). Internet Explorer is somewhat behind: Flash is still the fastest option there.
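The slowdown factors quoted here follow directly from the desktop table above. A small sketch of the arithmetic (the object layout and the `slowdown` helper are just illustrative names, not part of the benchmark code):

```javascript
// Decode times in ms, copied from the desktop results table.
const nativeMs = 34;
const times = {
  "Flash":         { chrome: 57, firefox: 58, ie: 58 },
  "Dart to JS":    { chrome: 60, firefox: 70, ie: 95 },
  "C++ to asm.js": { chrome: 54, firefox: 43, ie: 149 },
  "Haxe to JS":    { chrome: 52, firefox: 49, ie: 62 },
};

// Slowdown factor relative to native C++, rounded to two decimals.
function slowdown(ms) {
  return Math.round((ms / nativeMs) * 100) / 100;
}

for (const [target, t] of Object.entries(times)) {
  console.log(target, slowdown(t.chrome), slowdown(t.firefox), slowdown(t.ie));
}
```

The best JS result in Chrome and Firefox (asm.js in Firefox, 43 ms) and the worst one (Dart to JS in Firefox, 70 ms) give the two ends of the range.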

Among the tested languages none is a clear winner in all browsers; each browser has its own favorite. For example, asm.js is really great in Firefox, but only because Firefox has a special ahead-of-time compiler that turns on for this code. Other browsers treat asm.js code as ordinary JavaScript, and due to its increased size it often works slower than simple JS.

Haxe is really great in all three respects: the speed of the generated code, its size, and the speed of code generation itself. A lesson to future compiler makers: if you want your compiler to be really fast, use OCaml, not Java! Haxe is more complex than Dart language-wise, having a more sophisticated type system, real type inference and some macros, and yet it translates freaking 60 times faster.

Mobile web apps: slooow. Even when CPUs have only 2x lower frequency, due to ARM vs. Intel differences they turn out to be 5-6 times slower. But who knows, they may catch up in a couple of years.

Some notes on particular languages / targets:

Haxe targeting Flash

If you need compact code that works consistently fast in all desktop browsers, Flash compiled from Haxe is a really nice option. It's compact because it is distributed as bytecode. In Flash 10 they added special instructions for fast direct memory access. These instructions were not available in ActionScript, but Alchemy (a C++ to Flash compiler) and Haxe can use them. They are available via the flash.Memory API and work on a single array: you first select some array to be this fast piece of memory and then use functions like Memory.getI32() and Memory.setI32() to access it. This works faster than ordinary arrays, but if you need to access many different arrays you have to manually allocate them inside the selected one and use indices with offsets.
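To make the "many arrays inside one selected array" idea concrete, here is a minimal sketch in JavaScript rather than Haxe. It is not the flash.Memory API: the `FastMemory` class and its `allocI32` method are hypothetical, and `getI32`/`setI32` are named that way only to echo the Flash functions mentioned above.

```javascript
// A single block of "fast memory"; every logical array lives inside it.
class FastMemory {
  constructor(sizeInBytes) {
    this.view = new DataView(new ArrayBuffer(sizeInBytes));
    this.top = 0; // next free byte (simple bump allocator, nothing is ever freed)
  }

  // Reserve room for `count` 32-bit ints and return the base byte offset.
  allocI32(count) {
    const base = this.top;
    const end = base + count * 4;
    if (end > this.view.byteLength) throw new RangeError("out of fast memory");
    this.top = end;
    return base;
  }

  getI32(addr) { return this.view.getInt32(addr, true); }
  setI32(addr, value) { this.view.setInt32(addr, value, true); }
}

const mem = new FastMemory(1024);
const a = mem.allocI32(4); // base offset of the first logical array
const b = mem.allocI32(4); // base offset of the second one

for (let i = 0; i < 4; i++) {
  mem.setI32(a + i * 4, i + 1);        // a = [1, 2, 3, 4]
  mem.setI32(b + i * 4, (i + 1) * 10); // b = [10, 20, 30, 40]
}

// Element i of a logical array sits at base + i * 4.
const sum = mem.getI32(a + 2 * 4) + mem.getI32(b + 2 * 4);
console.log(sum);
```

The price of the scheme is visible in the last lines: every access carries a base offset and an element size by hand.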

Also, Haxe generally optimizes code much better than ActionScript; combined with the special memory API, this gives very fast code compared to AS3. As you could see above, the Haxe compiler generates Flash incredibly fast, while the default ActionScript compiler (again, written in Java) is significantly slower.

Haxe targeting JavaScript

Haxe is a multi-target language, but each target has some specific APIs and there are also some semantic differences. I had a fully functioning ScreenPressor decoder in Haxe for Flash, but making a good JS version of it turned out harder than I expected. After changing the API from Flash to JS (first of all, moving from flash.Memory to typed JS arrays) I got a working JS version, but it took more than 170 ms in Chrome to decode that frame. I knew that was too slow: by then I already had a JS version generated from Dart and it worked ~3x faster.

The slowdown was caused by UInts. In ScreenPressor we use a range coder, a variant of arithmetic coder, and in our original C++ code it operates on 32-bit unsigned integers, doing some arithmetic and bit shifts. Haxe has a proper type for them, UInt, and in Flash it works perfectly fine. In JavaScript, however, there's no such thing: all numbers are really doubles. And any bitwise operation, like a shift or bitwise-or, turns its operands and result into a signed 32-bit int value (still stored as a double). That means 0xFF00 << 16 becomes a negative number. To keep UInts working, every time we use a UInt in an arithmetic expression Haxe inserts a comparison with 0 and an addition of 4294967296.0 in case its JS value is negative. Comparisons with positive constants for some reason turn into weird calls to in-place lambda functions containing some weird constant comparisons. All these things make arithmetic with UInts very slow, hence the 3x slowdown of our code.

What's interesting is that while Haxe keeps UInts 32-bit, it doesn't keep the 32-bitness of Ints (signed integers) in JS, so they allow values like 0xFFFFFFFF (which would become -1 in other Haxe targets). But again, any bitwise operation on them can turn them back into 32-bit signed ints. So, when targeting JS from Haxe, if you want UInts, i.e. values 0 .. 0xFFFFFFFF, you can use Ints as long as you don't use shifts or bitwise logical operations. Just replace "<< 8" with "* 256" and "a | b" with "a + b" (where appropriate) and it works well and fast. Changing our UInts to Ints and doing these transformations gave us simple and fast generated JS code.
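The effect is easy to reproduce in plain JavaScript. The snippet below is a hand-written illustration, not Haxe output; the `packOr` and `packAdd` helpers are made-up names that stand for the "before" and "after" forms of the transformation.

```javascript
// Bitwise operators work on signed 32-bit values, so the top bit becomes a sign bit.
const shifted = 0xFF00 << 16; // -16777216 instead of 0xFF000000

// The kind of fix-up described above: compare with 0, add 2^32 if negative.
const fixedUp = shifted < 0 ? shifted + 4294967296 : shifted;

// Multiplication is ordinary double arithmetic and never leaves the 0 .. 0xFFFFFFFF range here.
const viaMul = 0xFF00 * 65536;

// Appending a byte to a 24-bit value: "<<" and "|" go through int32, "*" and "+" do not.
function packOr(hi, lo) { return (hi << 8) | lo; }
function packAdd(hi, lo) { return hi * 256 + lo; }

console.log(shifted, fixedUp, viaMul);
console.log(packOr(0xFFFFFF, 0xFF), packAdd(0xFFFFFF, 0xFF));
```

The "where appropriate" caveat matters: `a + b` equals `a | b` only when the two operands have no set bits in common, as in the byte-packing case here, and `* 256` matches `<< 8` only while the result still fits in 32 bits.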

Another note: the JS code generated by Haxe actually failed to work in IE, saying it cannot cast HTMLInputElement into HTMLInputElement. After editing the generated code manually to skip this cast it worked fine.

Dart

Dart is a nice language (although its generics are too limited, someone ate too much Java for breakfast), it comes as a package with Dartium (version of Chrome with Dart VM on board) and DartEditor that does static analysis as you type, infers types and shows lots of useful info in auto-completion pop-ups. Its API, although generally based on JavaScript's, is often more convenient.

Porting our Haxe code to Dart went smoothly. The surprise came when I tried dart2js and did some benchmarks. Code in the Dart VM worked ~3x slower than the same code translated to JS with dart2js! And the reason was again our use of integers. There is only one integer type in Dart: int. Semantically it's an unbounded integer. Internally, in the 32-bit Dart VM, values that fit into 31 bits are called "smi" ("small int") and stored in a 32-bit word. Larger values are stored as boxed 64-bit integers, and even larger values use full-blown BigInts. When we write something like "x = 0xFFFFFFFF" it's a positive number with 32 "1"s in binary, so you need at least 33 bits to store it as a signed integer. So it doesn't fit in a "smi" and gets boxed. And although our original code only needs 32-bit unsigned integers, in the 32-bit Dart VM many of them turn into boxed 64-bit values, and this causes the slowdown. When translated to JavaScript they all fit perfectly into ordinary unboxed JS numbers, so it works faster. Thanks to Vyacheslav "mr_aleph" Egorov, a Dart VM group insider, for explaining this. He shows that in the 64-bit Dart VM my code runs 26% faster than the generated JS. However, there is no 64-bit Dartium available for Windows yet.
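The bit counting behind this can be checked with a few lines of JavaScript. This is only a sketch of the arithmetic, not of how the Dart VM tags values; the helper names are hypothetical and the 31-bit limit is simply the figure quoted above.

```javascript
// Bits needed to store integer n in two's complement, sign bit included.
// Intended for safe integers only.
function signedBitsNeeded(n) {
  const magnitude = n >= 0 ? n : -n - 1;
  const bitLength = magnitude === 0 ? 0 : magnitude.toString(2).length;
  return bitLength + 1;
}

// True if n fits into a 31-bit signed value (the "smi" size mentioned above).
function fitsIn31Bits(n) {
  return signedBitsNeeded(n) <= 31;
}

console.log(signedBitsNeeded(0xFFFFFFFF)); // 32 one-bits plus a sign bit
console.log(fitsIn31Bits(0x3FFFFFFF), fitsIn31Bits(0x40000000), fitsIn31Bits(0xFFFFFFFF));
```

Any intermediate value of 2^30 or more falls outside the small range, which is most of what a 32-bit unsigned range coder works with.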

There is, however, one obvious thing that makes dart2js generate less efficient code than Haxe. Each array access gets preceded by an explicit bounds check and a call to a Dart-specific function in case of bounds error. It doubles the number of bounds checks and bloats the code, hence the slowdown compared to cleaner Haxe-generated code.
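The difference in shape looks roughly like this. The code is a hand-written approximation, not actual dart2js output, and `indexError` is a hypothetical stand-in for the Dart-specific helper.

```javascript
// Hypothetical stand-in for a runtime helper that reports a bad index.
function indexError(array, index) {
  throw new RangeError("index " + index + " is outside 0.." + (array.length - 1));
}

// Checked access: an explicit test in front of the real array read.
function checkedGet(array, index) {
  if (index >>> 0 !== index || index >= array.length) indexError(array, index);
  return array[index]; // the JS engine still does its own bounds handling here
}

function sumChecked(data) {
  let total = 0;
  for (let i = 0; i < data.length; i++) total += checkedGet(data, i);
  return total;
}

// Plain access: the loop condition already guarantees a valid index.
function sumPlain(data) {
  let total = 0;
  for (let i = 0; i < data.length; i++) total += data[i];
  return total;
}

const pixels = new Uint8Array([10, 20, 30]);
console.log(sumChecked(pixels), sumPlain(pixels));
```

Both functions return the same result for valid input; the checked one just pays for the test twice, once in the generated code and once inside the engine.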

Emscripten and asm.js

Emscripten uses clang/LLVM to generate asm.js code from C++. Hearing about clang and LLVM made me instinctively expect difficulties on Windows, and I was ready to reboot into Linux for this experiment. However, it turned out that installing Emscripten on Windows is the easiest: just a one-click installer, and everything works well out of the box.

Using C++ code from JavaScript is pretty simple once you get how it all works. Just like Flash with its fast memory access inside a single array, asm.js works with one fixed-size array which serves as the main memory (heap) for all your C++ code. You can't just pass two JS arrays into your C++ code; you need to manually allocate memory in this asm.js heap and copy the data there. After the C++ code has finished working on them, you can read the results back from the array that serves as the asm.js heap.
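The copy-in, call, copy-out pattern can be sketched without any compiled code at all. Everything below is a mock: `heap`, `heapAlloc` and `invertBytes` are hypothetical stand-ins for the heap view, the allocator and a compiled C++ function. In real Emscripten output the corresponding pieces are usually reached through the generated module object, so check the names against your own build.

```javascript
// Mock of the single fixed-size heap that all "compiled" code works in.
const heap = new Uint8Array(64 * 1024);
let heapTop = 8; // keep address 0 unused, like a null pointer

// Trivial bump allocator with 8-byte alignment; this sketch never frees memory.
function heapAlloc(size) {
  const ptr = heapTop;
  const next = (ptr + size + 7) & ~7;
  if (next > heap.length) throw new RangeError("heap exhausted");
  heapTop = next;
  return ptr;
}

// Plays the role of a compiled C++ function: it sees only heap addresses.
function invertBytes(srcPtr, dstPtr, length) {
  for (let i = 0; i < length; i++) heap[dstPtr + i] = 255 - heap[srcPtr + i];
}

// JS-side wrapper: allocate, copy in, call, copy out.
function invert(input) {
  const src = heapAlloc(input.length);
  const dst = heapAlloc(input.length);
  heap.set(input, src);                        // copy the JS array into the heap
  invertBytes(src, dst, input.length);         // work happens inside the heap
  return heap.slice(dst, dst + input.length);  // copy the result back out
}

const out = invert(new Uint8Array([0, 1, 254, 255]));
console.log(Array.from(out));
```

For a decoder this means the compressed frame goes into the heap once and the RGB24 output is read from it once, so the copying cost stays small next to the decoding itself.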

Porting our C++ code to asm.js was easy, since it was just pure algorithms and computations, no external libraries were used. I don't know how exactly Emscripten handles the uints question but everything worked well and pretty fast without me having to worry about it and having to turn uints into ints. The size of generated code was expectedly the largest: ~500 KB without minification and ~200 KB with it. Since the JS code is generated from LLVM bitcode, it's rather hard to trace it back to the source manually, so I didn't even try.

As for speed, as mentioned above, it only works really fast in Firefox where asm.js code gets special treatment. In other browsers it's not faster than much shorter code generated by other languages and sometimes (like in IE) significantly slower.

Conclusion

Given the numbers above and the preconditions mentioned, I think it's pretty obvious what choice we're going to make: Haxe seems the best option for us to make the ScreenPressor decoder in JS.