Subj : Re: Compacting JS source via ParseTree or Decompile
To   : Dan Libby
From : Brendan Eich
Date : Mon Jun 07 2004 07:16 pm

Dan Libby wrote:
> For an initial release, I should probably scale things back a bit and 
> just concentrate on making a good code compactor -- one that will work 
> with the existing libjs.  Then, if we agree on a good design, perhaps we 
> can extend both to perform the more advanced pretty-printing.

Right, that's almost always a sound approach (drool, roll, crawl, ..., 
run ;-).

> BTW, with what I have already, our largest JS file shrunk from 119K to 
> 59K.  A 50% reduction!

Cool!

> Towards the goal of code compaction, there are a couple more items:
> 
> 1) I notice that even with the JS_DONT_PRETTY_PRINT flag, the decompiler 
>  always places a newline after "for" statements, eg:
> 
> for(...) {
>   code}more code;
> 
> I think the culprit is this bit of code in jsopcode.c, which does not 
> use the "pretty" flag:
> 
> 1490                 rval = OFF2STR(&ss->sprinter, ss->offsets[ss->top-1]);
> 1491                 js_printf(jp, " in %s) {\n", rval);
> 1492                 jp->indent += 4;
> 1493                 DECOMPILE_CODE(pc + oplen, tail - oplen);
> 1494                 jp->indent -= 4;
> 1495                 js_printf(jp, "\t}\n");

Wrong culprit -- here is the bad boy:

991                     /* Do the loop body. */
992                     js_puts(jp, ") {\n");
993                     jp->indent += 4;
994                     oplen = (cond) ? js_CodeSpec[pc[cond]].length : 0;
995                     DECOMPILE_CODE(pc + cond + oplen, next - cond - oplen);
996                     jp->indent -= 4;
997                     js_printf(jp, "\t}\n");

The problem is the js_puts call: unlike js_printf, it copies its string 
verbatim, so the newline survives even when pretty-printing is off.  
Compare the closing-brace js_printf at line 997.
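A simplified, self-contained model of that difference follows. It assumes, as the fix implies, that the printf-style routine drops a trailing newline (and expands a leading tab into indentation only) when pretty-printing is on, while the puts-style routine copies its argument unchanged. `Printer`, `printer_printf`, and `printer_puts` are hypothetical stand-ins, not the jsopcode.c code:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified printer; not the real JSPrinter. */
typedef struct Printer {
    char buf[1024];  /* accumulated output, always NUL-terminated */
    size_t len;      /* bytes used in buf */
    int pretty;      /* nonzero: keep newlines and indentation */
    int indent;      /* current indentation, in spaces */
} Printer;

/* Append n raw bytes, silently truncating when buf is full. */
void printer_append(Printer *p, const char *s, size_t n)
{
    size_t room = sizeof p->buf - 1 - p->len;

    if (n > room)
        n = room;
    memcpy(p->buf + p->len, s, n);
    p->len += n;
    p->buf[p->len] = '\0';
}

/* puts-style: copies s verbatim and never looks at p->pretty. */
void printer_puts(Printer *p, const char *s)
{
    printer_append(p, s, strlen(s));
}

/*
 * printf-style: a leading '\t' requests indentation (emitted only when
 * pretty-printing), and a trailing '\n' is dropped when not pretty.
 */
void printer_printf(Printer *p, const char *format, ...)
{
    const char *f = format;
    char fmt[256];
    char out[512];
    size_t n;
    va_list ap;

    if (*f == '\t') {
        if (p->pretty)
            for (int i = 0; i < p->indent; i++)
                printer_append(p, " ", 1);
        f++;
    }

    /* Work on a copy of the format so its trailing newline can be cut. */
    snprintf(fmt, sizeof fmt, "%s", f);
    n = strlen(fmt);
    if (!p->pretty && n > 0 && fmt[n - 1] == '\n')
        fmt[n - 1] = '\0';

    va_start(ap, format);
    vsnprintf(out, sizeof out, fmt, ap);
    va_end(ap);
    printer_append(p, out, strlen(out));
}
```

With `pretty` off, `printer_puts(&p, ") {\n")` still leaves a newline in the output, which is the stray line break after `for (...) {`; the same string passed through `printer_printf` comes out as `) {`.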

> This is not a very big deal, but it would be nice to be able to strip 
> all newlines, saving those extra bytes.  Would it be safe to simply do a 
> global string replace of "\n" to "" on the decompiled script?   Or might 
> that break intentional user code?

It's just a bug, fixed in the patch for 
http://bugzilla.mozilla.org/show_bug.cgi?id=245795.

> 2) A killer feature for code compaction would be the ability to 
> automatically abbreviate local scope variables.  Global vars, class 
> vars, and function names are trickier, because they can be referenced 
> externally.  But a variable internal to a function should always be safe 
> to rename, so long as you also change the references within that 
> function.  Agreed?  Caveats?

Sure, why not?

> Okay, so I think that for the jscompact app to do this, it needs to be 
> able to:
>  - compile script (duh)
>  - walk all the functions in the bytecode
>    - walk the local vars in the function
>    - change the variable name  (store in a hash, oldname => newname )
>    - walk the opcodes to determine where oldname is used
>    - update the reference
>  - decompile bytecode
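The oldname => newname table in the steps above needs a supply of short, unique names. A minimal sketch, assuming names are handed out in the order a..z, aa..zz, aaa, ... and that a small, deliberately incomplete reserved-word list is skipped. `short_name`, `rename_local`, and `RenameMap` are hypothetical, and a linear array stands in for the hash; a real tool would also need the full keyword list and would have to skip any name the function reads from an enclosing scope:

```c
#include <string.h>

#define MAX_LOCALS 64

/* Hypothetical, deliberately short list; a real tool needs every keyword. */
static const char *const reserved[] = {
    "do", "if", "in", "for", "new", "try", "var"
};

int is_reserved(const char *name)
{
    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++)
        if (strcmp(name, reserved[i]) == 0)
            return 1;
    return 0;
}

/*
 * Write the n-th name of the sequence a..z, aa..zz, aaa, ... (bijective
 * base 26) into buf, which must hold at least 8 bytes.
 */
void short_name(unsigned long n, char *buf)
{
    char tmp[7];
    int len = 0;

    n++;                           /* bijective numbering is 1-based */
    while (n > 0 && len < 7) {
        n--;
        tmp[len++] = (char)('a' + n % 26);
        n /= 26;
    }
    for (int i = 0; i < len; i++)  /* letters were produced last-first */
        buf[i] = tmp[len - 1 - i];
    buf[len] = '\0';
}

/* The oldname => newname table; a linear array stands in for a hash. */
typedef struct RenameMap {
    const char *old_names[MAX_LOCALS];
    char new_names[MAX_LOCALS][8];
    int count;
    unsigned long next;            /* next index into the name sequence */
} RenameMap;

/* Return the short name for old, assigning a fresh one on first use. */
const char *rename_local(RenameMap *m, const char *old)
{
    for (int i = 0; i < m->count; i++)
        if (strcmp(m->old_names[i], old) == 0)
            return m->new_names[i];
    if (m->count == MAX_LOCALS)
        return old;                /* table full: keep the original name */
    do {
        short_name(m->next++, m->new_names[m->count]);
    } while (is_reserved(m->new_names[m->count]));
    m->old_names[m->count] = old;
    return m->new_names[m->count++];
}
```

The table would be reset per function, so different functions can reuse `a`, `b`, ... for their own locals.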

All you need to do is to rename the function's properties that have 
js_GetLocalVariable as their getter.  Are you renaming arguments too?
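The selection rule can be modeled outside the engine. This is not SpiderMonkey code: `Prop`, `Getter`, `get_local`, `get_arg`, and `rename_locals` are hypothetical stand-ins for a function's properties and their getters (`get_local` plays the role of js_GetLocalVariable), and the `include_args` flag reflects the open question about renaming arguments. The point is only that locals are identified by comparing the getter pointer, not by inspecting opcodes:

```c
#include <stddef.h>

/* Hypothetical getter type; the engine's real getters take more arguments. */
typedef int (*Getter)(int slot);

/* Stand-ins for the engine's getters, each a distinct function. */
int get_local(int slot) { return slot; }    /* plays js_GetLocalVariable */
int get_arg(int slot)   { return -slot; }   /* plays an argument getter */
int get_other(int slot) { (void)slot; return 0; }

typedef struct Prop {
    const char *name;   /* identifier as written in the source */
    Getter getter;      /* how the property is read */
} Prop;

/*
 * Rename every property whose getter marks it as a local variable (and,
 * when include_args is set, as an argument). new_names supplies the
 * replacements in order; returns how many properties were renamed.
 */
size_t rename_locals(Prop *props, size_t nprops,
                     const char *const *new_names, size_t nnames,
                     int include_args)
{
    size_t used = 0;

    for (size_t i = 0; i < nprops && used < nnames; i++) {
        int is_local = props[i].getter == get_local;
        int is_arg = include_args && props[i].getter == get_arg;

        if (is_local || is_arg)
            props[i].name = new_names[used++];
    }
    return used;
}
```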

>> You'll need to extend the front end to keep comments.  Currently the 
>> scanner strips them.
>  
> Tricky.  I suppose it would need to somehow parse the comments and then 
> record some sort of "marker" for where to place the comment in the 
> decompiled string.  Erggg.

Rather than try to pass things through compiled script to the 
decompiler, I would recommend just annotating the parse tree, using 
#ifdef'd code, with white space, comments, and extra braces.  Then your 
back end would walk the parse tree created by js_ParseTokenStream, and 
you would avoid hacking or calling js_EmitTree etc. altogether.
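A sketch of the annotation idea, assuming a hypothetical `JSCOMPACT_ANNOTATE` build macro and a toy `Node` type standing in for JSParseNode (whose real layout is not reproduced here). The extra fields exist only when the macro is defined, so an ordinary engine build is unaffected, and one back end can either keep the annotations (pretty-printer) or drop them (compactor):

```c
#include <string.h>

#define JSCOMPACT_ANNOTATE 1   /* hypothetical build switch */

typedef struct Node {
    const char *text;          /* token text for this toy node */
#ifdef JSCOMPACT_ANNOTATE
    const char *comment;       /* comment the scanner saw before the node */
    int blank_lines_before;    /* white space worth preserving */
#endif
} Node;

/* Append s to the NUL-terminated out, never exceeding cap bytes. */
static void append(char *out, size_t cap, const char *s)
{
    size_t len = strlen(out);

    if (len + 1 < cap)
        strncat(out, s, cap - len - 1);
}

/* Emit one node, reproducing its annotations only if asked to. */
void emit_node(const Node *n, int keep_annotations, char *out, size_t cap)
{
#ifdef JSCOMPACT_ANNOTATE
    if (keep_annotations) {
        for (int i = 0; i < n->blank_lines_before; i++)
            append(out, cap, "\n");
        if (n->comment) {
            append(out, cap, n->comment);
            append(out, cap, "\n");
        }
    }
#else
    (void)keep_annotations;
#endif
    append(out, cap, n->text);
}
```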

> When you say "back end", exactly which code component and/or set of 
> APIs are you talking about?  Clearly, if Decompile() and its 
> sub-functions were substantially re-written, that should do the trick.

See above -- I'm proposing you use the JS_FRIEND_API exported from 
jsparse.h.

> I suppose it would be possible to duplicate these in the jscompact 
> frontend, but with the necessary pretty-printing modifications. That 
> way, the library doesn't need to change at all. But it seems rather 
> brittle/gross using internal data structures.

Don't duplicate the decompiler (don't even generate code to decompile).  
Do walk the JSParseNode tree yourself.
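A minimal sketch of a back end that walks a tree instead of decompiling bytecode. The node kinds and fields below are a hypothetical toy subset, not the real JSParseNode arities. It parenthesizes every nested binary operand (correct but not minimal), assumes symbolic operators so no spaces are needed around them, and has no unary operators, so cases such as `a - -b`, where dropping the space would change the tokens, do not arise here but must be handled in a real back end:

```c
#include <string.h>

typedef enum { N_NAME, N_NUMBER, N_BINARY, N_VAR, N_LIST } Kind;

typedef struct Node {
    Kind kind;
    const char *text;           /* identifier, number, or operator */
    struct Node *left, *right;  /* operands (N_BINARY) or name/init (N_VAR) */
    struct Node **kids;         /* statements (N_LIST) */
    int nkids;
} Node;

/* Append s to the NUL-terminated buf, never exceeding cap bytes. */
static void out(char *buf, size_t cap, const char *s)
{
    size_t len = strlen(buf);

    if (len + 1 < cap)
        strncat(buf, s, cap - len - 1);
}

void emit(const Node *n, char *buf, size_t cap);

/* Conservatively parenthesize nested binary expressions. */
static void emit_operand(const Node *n, char *buf, size_t cap)
{
    int wrap = n->kind == N_BINARY;

    if (wrap)
        out(buf, cap, "(");
    emit(n, buf, cap);
    if (wrap)
        out(buf, cap, ")");
}

/* Print n as compact source with no optional white space. */
void emit(const Node *n, char *buf, size_t cap)
{
    switch (n->kind) {
    case N_NAME:
    case N_NUMBER:
        out(buf, cap, n->text);
        break;
    case N_BINARY:
        emit_operand(n->left, buf, cap);
        out(buf, cap, n->text);
        emit_operand(n->right, buf, cap);
        break;
    case N_VAR:
        out(buf, cap, "var ");  /* this space is required */
        emit(n->left, buf, cap);
        if (n->right) {
            out(buf, cap, "=");
            emit(n->right, buf, cap);
        }
        break;
    case N_LIST:
        for (int i = 0; i < n->nkids; i++) {
            emit(n->kids[i], buf, cap);
            out(buf, cap, ";");
        }
        break;
    }
}
```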

>> What do these templates look like?  Are they just for documentation, 
>> or for other purposes?
> 
> 
> Yes, for documentation.  I was just thinking of auto generating a 
> comment block above each function that doesn't already have one.  It 
> could use the javadoc (or other) format, and basically just saves the 
> tedium of copy/pasting the same base comment block all over the place. 
> Definitely not a big deal.

Easy to do if you write your own back end to the parser.
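To show how small such a pass can be, here is a sketch that builds a javadoc-style template from a function's name and parameter names. It assumes the caller has already pulled those names out of the function's parse node and has checked that no comment precedes it; `doc_template` is hypothetical:

```c
#include <string.h>

/* Append s to the NUL-terminated out, never exceeding cap bytes. */
static void append(char *out, size_t cap, const char *s)
{
    size_t len = strlen(out);

    if (len + 1 < cap)
        strncat(out, s, cap - len - 1);
}

/* Write a javadoc-style comment template into out (truncated to fit). */
void doc_template(const char *name, const char *const *params, int nparams,
                  char *out, size_t cap)
{
    if (cap == 0)
        return;
    out[0] = '\0';
    append(out, cap, "/**\n * ");
    append(out, cap, name);
    append(out, cap, "\n *\n");
    for (int i = 0; i < nparams; i++) {
        append(out, cap, " * @param ");
        append(out, cap, params[i]);
        append(out, cap, "\n");
    }
    append(out, cap, " * @return\n */\n");
}
```

For `function add(x, y)` this yields a comment block with one `@param` line per parameter, ready to be emitted just before the function's own text.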

>>> I am still interested in the "walk pn and check sanity" approach, 
>>> modified] source code, or am I getting in too deep for a weekend 
>>> project?
>>
>> It depends on how much you know the engine, but let's design first.
>  
> I'm just taking my first look at the engine this week.  (Well, except 
> for a few headaches making its ancestor and the rest of Communicator 
> build on OS/2 back in 97-98.)

Hey, I thought I recognized your name!

You're doing fine.

/be
