Lately, I’ve been working on a Flash ActionScript 3 decompiler, and I noticed an interesting pattern. Normally, if you work with a piece of well-known software and something goes wrong, it’s your fault. With Flash, it’s nothing like that! If something doesn’t work, it’s probably a compiler bug that was preserved for compatibility. Or the specification is plain wrong. Or it’s a compiler bug that no one noticed, or that everyone attributed to cosmic rays.
I’ll give a few examples.
Specification is wrong
The official AVM2 specification is often plain incorrect. Apart from the examples already covered in the semi-official, Mozilla-authored errata, there are a few subtle mistakes. One of them is mixing up a sign bit with sign extension: section 4.1 of the spec says that signed integers are stored with sign extension, whereas in reality negative values are stored with bit 31 set.
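To make the difference concrete, here is a minimal Python sketch of both readings. It assumes the usual layout for ABC variable-length integers (7 payload bits per byte, least significant group first, high bit meaning "more bytes follow", at most five bytes); the function names are hypothetical. `read_s32_as_observed` decodes the bytes as an unsigned 32-bit value and treats bit 31 as the sign, as described above, while `read_s32_sign_extended` is an SLEB128-style reading of what section 4.1 suggests.

```python
# Two readings of an ABC signed 32-bit integer (s32). Assumes the usual
# variable-length layout: 7 payload bits per byte, least significant group
# first, high bit 0x80 = "more bytes follow", at most five bytes.

def read_u32(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a variable-length unsigned 32-bit value; return (value, next_pos)."""
    result = 0
    for i in range(5):
        byte = data[pos + i]
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return result & 0xFFFFFFFF, pos + i + 1
    return result & 0xFFFFFFFF, pos + 5


def read_s32_as_observed(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode as u32, then interpret bit 31 as the sign (two's complement)."""
    value, next_pos = read_u32(data, pos)
    if value & 0x80000000:
        value -= 1 << 32
    return value, next_pos


def read_s32_sign_extended(data: bytes, pos: int = 0) -> tuple[int, int]:
    """SLEB128-style reading: sign-extend from bit 6 of the last byte read."""
    result = 0
    shift = 0
    for i in range(5):
        byte = data[pos + i]
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    if byte & 0x40:
        result -= 1 << shift
    return result, pos + i + 1


minus_one = bytes([0xFF, 0xFF, 0xFF, 0xFF, 0x0F])
print(read_s32_as_observed(minus_one))    # (-1, 5)
print(read_s32_sign_extended(minus_one))  # (4294967295, 5)
```

The same five bytes decode to -1 under one reading and to 4294967295 under the other, which is exactly the kind of discrepancy a decompiler following the spec literally would trip over.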
There are other mistakes too (e.g. the pushliteral opcodes are screwed up in the spec), but they’re not worth explaining.
Compiler generates dangerously invalid code
When working on support for the lookupswitch opcode, I wrote a small snippet to test my code with. Disassembling it yielded strange results; the code was seemingly invalid. I scratched my head over it for half an hour and then just tried to execute it. And you know what? It actually was invalid.
(The “actual” results are derived from assembler listings. The Tamarin shell refused to execute the code due to verification errors.)
Update: as one Reddit commenter pointed out, recent versions of ASC no longer have this problem.
No optimization ever
The ActionScript compiler does not optimize, period. This produces a lot of weird code and some pieces of modern art.
Consider this switch statement (taken from the abcdump.as utility):
Not only does it produce a piece of modern art in the IR dump, it also contains a statement so beautifully useless that it should be preserved for future generations:
For those unfamiliar with s-expressions and Lisp: not only does this conditional always execute the same branch, but its result wouldn’t differ even if the other branch were taken.
For extra horror, the “piece of modern art” above is executed from scratch each time the VM encounters it, constant expressions included. Any doubt left about why Flash is so slow and power-hungry?
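The useless conditional is exactly what even a naive constant-folding pass would remove before the code ever reached the VM. As an illustration only, here is a minimal Python sketch over a toy, side-effect-free s-expression language encoded as nested tuples; the encoding and the `fold` function are hypothetical and have nothing to do with ASC’s actual IR.

```python
# Toy constant folder over s-expressions written as nested tuples, e.g.
# ("if", cond, then, else). Integers are constants, strings are variable
# names, and the language has no side effects.

def fold(expr):
    """Fold constants bottom-up and drop conditionals whose outcome is fixed."""
    if not isinstance(expr, tuple):
        return expr  # constant or variable name: nothing to fold
    op, *args = expr
    args = [fold(arg) for arg in args]
    if op == "if":
        cond, then_arm, else_arm = args
        if then_arm == else_arm:
            # Both arms agree, so the condition cannot matter (sound here only
            # because evaluating cond has no side effects).
            return then_arm
        if isinstance(cond, int):
            return then_arm if cond != 0 else else_arm
    elif op in ("+", "==") and all(isinstance(arg, int) for arg in args):
        if op == "+":
            return sum(args)
        left, right = args  # "==" is binary
        return int(left == right)
    return (op, *args)


print(fold(("if", ("==", 1, 1), ("+", 2, 3), ("+", 4, 1))))  # 5
print(fold(("if", "x", "y", "z")))  # ('if', 'x', 'y', 'z')
```

The first call shows both cases from the text at once: the condition is constant, and the two arms fold to the same value anyway, so the whole expression collapses to a single constant.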
Compiler intentionally generates invalid code
As I’ve already shown, ASC contains enough stupid errors (see this similar bug) to accidentally generate invalid code in not-so-rare cases. But it also intentionally generates invalid code in one very frequent case: a finally block.
Let’s compile this function:
The compiler will emit a shitload of bytecode (including two catch and two throw statements), but the relevant part is here:
As you can see, the opcode at address 0025 is invalid because it tries to pop an object from an empty stack. The virtual machine actually recognizes the finally clause by encountering these invalid opcodes. Think about it a little longer, and you’ll go insane.
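A decompiler has to recognize these deliberate stack underflows instead of rejecting the code. One hypothetical way to flag them is a straight-line stack-depth check like the Python sketch below. The instruction listing is made up (it is not the listing from this post), only a handful of opcodes are modelled with their AVM2 stack effects, and a real verifier would also follow branches and exception handlers.

```python
# Straight-line stack-depth check. Offsets and listing are hypothetical; the
# (pops, pushes) pairs are the AVM2 stack effects of these opcodes.
STACK_EFFECTS = {
    "getlocal_0": (0, 1),
    "pushbyte": (0, 1),
    "pushnull": (0, 1),
    "coerce_a": (1, 1),
    "setlocal_1": (1, 0),
    "pop": (1, 0),
    "throw": (1, 0),
    "returnvalue": (1, 0),
    "returnvoid": (0, 0),
    "label": (0, 0),
}


def find_underflows(instructions, depth=0):
    """Return offsets of instructions that pop more values than the stack holds.

    `instructions` is a list of (offset, mnemonic) pairs executed in order;
    branches and exception handlers are deliberately ignored.
    """
    underflows = []
    for offset, mnemonic in instructions:
        pops, pushes = STACK_EFFECTS[mnemonic]
        if pops > depth:
            underflows.append(offset)
            depth = 0  # pretend the missing values existed and keep scanning
        else:
            depth -= pops
        depth += pushes
    return underflows


listing = [(0, "pushbyte"), (2, "setlocal_1"), (3, "label"), (4, "coerce_a"), (5, "throw")]
print(find_underflows(listing))  # [4]
```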
Also, the recommended way to direct control flow after a finally block is… the lookupswitch opcode. The PushByte -1 is actually a marker for that lookupswitch trampoline, telling it to jump to a rethrow entry point.
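The reason a negative marker works is in the lookupswitch definition itself: the opcode pops an index, and any index below zero or above case_count selects default_offset. All of its offsets are relative to the address of the lookupswitch opcode, not the following instruction. Here is a minimal Python sketch of decoding and dispatching one such instruction, following the operand layout in the AVM2 specification (an s24 default_offset, a u30 case_count, then case_count + 1 s24 case offsets); the function names and the sample bytes are hypothetical.

```python
LOOKUPSWITCH = 0x1B


def read_s24(code: bytes, pos: int) -> int:
    """Three-byte little-endian signed integer."""
    value = code[pos] | (code[pos + 1] << 8) | (code[pos + 2] << 16)
    return value - (1 << 24) if value & 0x800000 else value


def read_u30(code: bytes, pos: int) -> tuple[int, int]:
    """Variable-length unsigned integer (7 bits per byte); return (value, next_pos)."""
    result = 0
    for i in range(5):
        byte = code[pos + i]
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return result, pos + i + 1
    return result, pos + 5


def lookupswitch_target(code: bytes, base: int, index: int) -> int:
    """Branch target of the lookupswitch at `base` when `index` is popped."""
    if code[base] != LOOKUPSWITCH:
        raise ValueError(f"no lookupswitch at offset {base}")
    default_offset = read_s24(code, base + 1)
    case_count, cases_pos = read_u30(code, base + 4)
    # There are case_count + 1 case offsets, so valid indices are 0..case_count.
    if 0 <= index <= case_count:
        return base + read_s24(code, cases_pos + 3 * index)
    # Out of range (e.g. the -1 marker): fall back to default_offset.
    # All offsets are relative to the lookupswitch opcode itself.
    return base + default_offset


# default_offset = 16, case_count = 1, case offsets = [8, 12]
code = bytes([0x1B, 0x10, 0x00, 0x00, 0x01, 0x08, 0x00, 0x00, 0x0C, 0x00, 0x00])
print([lookupswitch_target(code, 0, i) for i in (0, 1, -1)])  # [8, 12, 16]
```

Under these rules the -1 pushed on the exceptional path always takes the default branch, so that is presumably where ASC places the rethrow entry point.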
There’s more interesting stuff, like jumps past the end of a function (hardwired in the VM to do the same as the returnvoid opcode) or deliberately emitted dead code.
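In a decompiler, the jump-past-the-end quirk can be normalized away while resolving branch targets. A minimal Python sketch, assuming the AVM2 jump encoding (opcode 0x10 followed by an s24 offset relative to the end of the 4-byte instruction) and treating any target at or beyond the end of the method body as the returnvoid behaviour described above; `resolve_jump` is a hypothetical helper name.

```python
from typing import Optional

JUMP = 0x10


def read_s24(code: bytes, pos: int) -> int:
    """Three-byte little-endian signed integer."""
    value = code[pos] | (code[pos + 1] << 8) | (code[pos + 2] << 16)
    return value - (1 << 24) if value & 0x800000 else value


def resolve_jump(code: bytes, pos: int) -> Optional[int]:
    """Target of the `jump` at `pos`, or None if it lands past the end of the body.

    Branch offsets are relative to the end of the 4-byte jump instruction.
    A None result is meant to be treated like returnvoid (an assumption
    based on the VM behaviour described in the text).
    """
    if code[pos] != JUMP:
        raise ValueError(f"no jump at offset {pos}")
    target = pos + 4 + read_s24(code, pos + 1)
    return None if target >= len(code) else target


# 0x47 is returnvoid.
print(resolve_jump(bytes([0x10, 0x01, 0x00, 0x00, 0x47]), 0))  # None
print(resolve_jump(bytes([0x10, 0x00, 0x00, 0x00, 0x47]), 0))  # 4
```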