(a bit of) whitespace

Has someone just said “lowlevel”?

Reaching the Limits of Adobe Stupidity

Lately, I’ve been working on a Flash ActionScript 3 decompiler, and I noticed an interesting pattern. Normally, if you work with a piece of well-known software and something goes wrong, it’s your fault. But with Flash it’s nothing like that! If it doesn’t work, then it’s probably a bug in the compiler which was preserved for compatibility. Or the specification is plain wrong. Or it’s a bug in the compiler which no one noticed, or which was attributed to cosmic rays instead.

I’ll give a few examples.

Specification is wrong

The official AVM2 specification is often plain incorrect. Apart from the examples already covered in the semi-official Mozilla-authored errata, there are a few subtle mistakes, like mixing up a sign bit and sign extension: section 4.1 of the spec says that signed integers are stored with sign extension, whereas in reality they’re stored with the 31st bit set when the value is negative.
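
To make the difference concrete, here is a minimal Python sketch of the two readings. It assumes the usual variable-length layout (seven payload bits per byte, high bit meaning "more bytes follow") and takes "31st bit" to mean the top bit of the decoded 32-bit value (mask 0x80000000); the function names are made up for illustration and are not taken from any real decoder.

```python
def read_varint(data, pos=0):
    """Read a 7-bits-per-byte variable-length integer (assumed layout).

    Returns (raw value masked to 32 bits, payload bits read, next position).
    """
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        # Stop when the continuation bit is clear or after five bytes.
        if not (byte & 0x80) or shift >= 35:
            break
    return result & 0xFFFFFFFF, shift, pos


def decode_sign_extended(data):
    """Reading 1: sign-extend from the top payload bit of the last byte."""
    raw, bits, _ = read_varint(data)
    bits = min(bits, 32)
    if raw & (1 << (bits - 1)):
        raw -= 1 << bits
    return raw


def decode_sign_bit(data):
    """Reading 2: decode as unsigned, then treat bit 31 as the sign."""
    raw, _, _ = read_varint(data)
    return raw - (1 << 32) if raw & 0x80000000 else raw
```

Under these assumptions the single byte 0x7F decodes to -1 with the first reading and to 127 with the second, while the full five-byte encoding FF FF FF FF 0F gives -1 either way. Short encodings are exactly where a decompiler following the wrong reading goes astray.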

There are some other ones (e.g. the pushliteral opcodes are screwed up in the spec), but they’re not worth explaining.

Compiler generates dangerously invalid code

When working on support for the lookupswitch opcode, I wrote a small snippet to test my code with. Disassembling it yielded strange results; the code was seemingly invalid. I scratched my head over it for half an hour and then just went and tried to execute it. And you know what? It actually was invalid.

function propel_switch(q:int):Boolean {
  switch(q) {
  case 1:
    print("hoge");
  break;
  case 2:
    print("fuga");
  break;
  case 3:
    print("piyo");
  break;
  case 5:
    print("bar");
  break;
  default:
    print("baz");
  break;
  }
  return false;
}

//                   expected   actual
propel_switch(0); // baz        baz
propel_switch(1); // hoge       hoge
propel_switch(2); // fuga       fuga
propel_switch(3); // piyo       <nothing printed>
propel_switch(4); // baz        <infinite loop>
propel_switch(5); // bar        bar

(The “actual” results are derived from assembler listings. The Tamarin shell refused to execute the code due to verification errors.)
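
For comparison, this is roughly what a correct dispatch for the snippet looks like when modelled in Python. It is a sketch of lookupswitch-style semantics (an index into a table of targets, with a default for anything out of range), not a reconstruction of what ASC emitted; the table and function names are hypothetical, and strings stand in for jump targets.

```python
def lookupswitch(index, default_target, case_targets):
    """Pick a jump target by index; out-of-range indices go to the default."""
    if 0 <= index < len(case_targets):
        return case_targets[index]
    return default_target


# A dense table covering indices 0..5. The switch has no case 0 or case 4,
# so those slots must point at the default branch.
CASE_TARGETS = ["baz", "hoge", "fuga", "piyo", "baz", "bar"]


def propel_switch(q):
    """Return the string the corresponding branch would print."""
    return lookupswitch(q, "baz", CASE_TARGETS)
```

The interesting slots are the holes: a sparse switch turned into a dense table needs every gap filled with the default target, and a table with a wrong or missing entry gives the kind of misbehaviour shown in the “actual” column.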

Update: As one Reddit commenter corrects me, recent versions of ASC no longer have this problem.

No optimization ever

The ActionScript compiler does not optimize, period. This produces a lot of weird code and some pieces of modern art.

Consider this switch statement (taken from abcdump.as utility):

 switch (version) {
 case 46<<16|14:
 case 46<<16|15:
 case 46<<16|16:
     var abc:Abc = new Abc(data)
     abc.dump()
     break
 case 67|87<<8|83<<16|10<<24: // SWC10
 case 67|87<<8|83<<16|9<<24: // SWC9
 case 67|87<<8|83<<16|8<<24: // SWC8
 case 67|87<<8|83<<16|7<<24: // SWC7
 case 67|87<<8|83<<16|6<<24: // SWC6
     var udata:ByteArray = new ByteArray
     udata.endian = "littleEndian"
     data.position = 8
     data.readBytes(udata,0,data.length-data.position)
     var csize:int = udata.length
     udata.uncompress()
     infoPrint("decompressed swf "+csize+" -> "+udata.length)
     udata.position = 0
     /*var swf:Swf =*/ new Swf(udata)
     break
 case 70|87<<8|83<<16|10<<24: // SWF10
 case 70|87<<8|83<<16|9<<24: // SWF9
 case 70|87<<8|83<<16|8<<24: // SWF8
 case 70|87<<8|83<<16|7<<24: // SWF7
 case 70|87<<8|83<<16|6<<24: // SWF6
 case 70|87<<8|83<<16|5<<24: // SWF5
 case 70|87<<8|83<<16|4<<24: // SWF4
     data.position = 8 // skip header and length
     /*var swf:Swf =*/ new Swf(data)
     break
 default:
     print('unknown format '+version)
     break
 }

Not only does it generate a piece of modern art in an IR dump, it also contains a statement so beautifully useless it should be preserved for future generations:

  (ternary (false) (integer 15) (integer 15))

For those unaware of s-expressions and Lisp: not only does this conditional always execute the same branch, but its result wouldn’t be any different even if the other one were taken.
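
Folding such a node away takes only a few lines. The sketch below works on a made-up tuple encoding of the s-expression above (it is not the decompiler's real IR) and only handles literal true/false conditions.

```python
def fold(node):
    """Recursively replace ternaries with a constant condition by the taken branch."""
    if not isinstance(node, tuple):
        return node  # leaf: an operator name or a literal value
    node = tuple(fold(child) for child in node)
    if node and node[0] == "ternary":
        _, condition, then_branch, else_branch = node
        if condition == ("true",):
            return then_branch
        if condition == ("false",):
            return else_branch
    return node


useless = ("ternary", ("false",), ("integer", 15), ("integer", 15))
folded = fold(useless)  # ("integer", 15)
```

Conditions that are not literal constants are left untouched, since dropping them could discard side effects.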

For extra horror, the “piece of modern art” above is executed from scratch each time the VM encounters it, including the constant expressions. Any doubt left why Flash is so slow and power-hungry?
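
None of those case labels depend on runtime data, so they could be evaluated exactly once. Here is a Python sketch of the idea: the labels are the little-endian bytes of "CWS" or "FWS" followed by a version byte, and they are precomputed into sets. The helper names and the returned tags are invented for illustration.

```python
def signature(tag, version):
    """Pack a three-byte tag and a version byte into a little-endian integer."""
    return int.from_bytes(tag + bytes([version]), "little")


# Evaluated once at import time instead of on every dispatch.
ABC_VERSIONS = frozenset(46 << 16 | minor for minor in (14, 15, 16))
SWC_VERSIONS = frozenset(signature(b"CWS", v) for v in range(6, 11))
SWF_VERSIONS = frozenset(signature(b"FWS", v) for v in range(4, 11))


def classify(version):
    """Return which branch of the original switch a header value would take."""
    if version in ABC_VERSIONS:
        return "abc"
    if version in SWC_VERSIONS:
        return "compressed swf"
    if version in SWF_VERSIONS:
        return "swf"
    return "unknown"
```

A compiler with constant folding would get the same effect for free by turning each label into a plain integer literal.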

Compiler intentionally generates invalid code

As I’ve already shown, ASC contains enough stupid errors (see this similar bug) to accidentally generate invalid code in not-so-rare cases. But it also intentionally generates invalid code in one very frequent case: a finally block.

Let’s compile this function:

    function c() {
      try {
        hoge();
      } finally {
        piyo();
      }
    }

The compiler will emit a shitload of bytecode (including two catch and two throw statements), but the relevant part is here:

; This is an exception handler. Stack is empty upon jump to an
; exception handler.
;  Address          Opcode    Args   Stack state, comments
   0016             GetLocal0        ; [local0]
   0017             PushScope        ; []
   0018             GetLocal1        ; [local1]
   0019             PushScope        ; []
   0020              NewCatch        ; [catch]
   0022                   Dup        ; [catch catch_dup]
   0023             SetLocal2        ; [catch]
   0024             PushScope        ; []
   0025                 Throw        ; I want an object to throw! Ouch!
   0026              PopScope
   0027                  Kill     2
   0029              PushByte    -1
   0031                  Jump   +32  ; Jump to rethrow

As you can see, the opcode at address 0025 is invalid because it tries to pop an object from an empty stack. The virtual machine actually recognizes the finally clause by encountering these invalid opcodes. Think about it a little longer, and you’ll go insane.
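
The stack bookkeeping from the listing can be replayed mechanically. The Python sketch below tracks only stack depth, using the (pops, pushes) counts implied by the comments in the listing; the table and function names are hypothetical, and the starting depth is a parameter because the verdict depends entirely on what the stack holds on entry to the handler.

```python
# (pops, pushes) per opcode, as implied by the stack-state comments above.
STACK_EFFECTS = {
    "GetLocal0": (0, 1),
    "GetLocal1": (0, 1),
    "PushScope": (1, 0),
    "NewCatch": (0, 1),
    "Dup": (1, 2),
    "SetLocal2": (1, 0),
    "Throw": (1, 0),
}

# The handler up to and including the Throw, with addresses from the listing.
HANDLER = [
    (16, "GetLocal0"), (17, "PushScope"), (18, "GetLocal1"),
    (19, "PushScope"), (20, "NewCatch"), (22, "Dup"),
    (23, "SetLocal2"), (24, "PushScope"), (25, "Throw"),
]


def first_underflow(code, initial_depth=0):
    """Return (address, opcode) of the first pop from a too-shallow stack, or None."""
    depth = initial_depth
    for address, opcode in code:
        pops, pushes = STACK_EFFECTS[opcode]
        if depth < pops:
            return address, opcode
        depth += pushes - pops
    return None
```

With the empty entry stack stated in the listing, first_underflow(HANDLER) reports the Throw at address 25. Calling it with initial_depth=1 models a VM that pushes the caught value before entering the handler, and under that assumption the same sequence passes.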

Also, the recommended way to handle control flow after a finally statement is… using the lookupswitch opcode. The PushByte -1 is actually a marker for that lookupswitch trampoline, which makes it jump to a rethrow entry point.

There’s some more interesting stuff, like jumps past the end of a function (hardwired in the VM to do the same as the returnvoid opcode) or deliberately emitted dead code.
