ecmanaut: 200503

I've seriously started digging into Venkman now, the Mozilla project's javascript debugger and profiler. I'm sure it's a great tool if it's your own baby or if you have someone initiated around to teach you its ways, but short of that, you need to find good webpages to help you get anywhere, such as figuring out how to set a simple breakpoint. It's a bit like learning to make good use of Emacs, though in a GUI application. Striking.

Anyway, once you acquire some basic working skills, it has a lot to offer. I fell in love with the profiling tools, not so much because I tend to write javascript code in need of optimisation, but for being beautifully done. (Once you are sitting with the profiling data and have left Venkman's GUI safely behind, anyway.)

As it happens, though, I read through Svend Tofte's good guide referred to above, and ended up at the BrainJar JavaScript Crunchinator. "Hey, cool hack!" I thought to myself, and tried feeding it my present work project, an application weighing in at 32 kilobytes, and it sat there for a long time grinding on it. Minutes later, it spat out a big chunk of code that started much like my own code and ended in mid air, three kilobytes short of the end of the file. Weird.

So I inspected my own source code, and found that I had commented out a block with /*...*/ just before where the crunchinator had given up, and the block ended in a // comment, inside the block comment -- and lo, the mystery was solved.

As I was curious to see if the crunched code would actually work once that issue was resolved, I peeked at the comment stripper, decided it was beyond fixing and decided to roll my own instead. After trying a regexp cut and paste approach, I was again annoyed at Javascript RegExps, for some reason not eating entire input strings (why does (.*) not match the rest of my input data? M'kay, I suppose I will find the answer myself in ECMA-262 next time I'm bit by this and sufficiently annoyed to learn from the specification).
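A likely explanation for the (.*) puzzle, sketched here rather than quoted from the specification: in ECMAScript regexps the dot matches any character except line terminators, so (.*) stops at the first newline. The variable names below are made up for illustration.

```javascript
var input = 'first line\nsecond line';

// The dot stops at the first line terminator, so (.*) captures one line only.
var dotMatch = /(.*)/.exec(input)[1];

// [\s\S] matches any character at all, newlines included,
// so this captures the whole input.
var anyMatch = /([\s\S]*)/.exec(input)[1];
```

Swapping `.` for `[\s\S]` is the usual way to make a pattern span lines.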

On the other hand, a regexp cut-and-paste solution is by rule of thumb always the wrong solution, for one reason or another, and after having given the matter some thought and made a brief inventory of the search methods on offer in javascript (thank you so much for Javascript: the Definitive Guide, David Flanagan!), I found a much more aesthetic solution built from String.search, String.indexOf and Array.join:

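function removeComments( s )
{
  // commentStart matches the opener of either comment style: "//" or "/*".
  var found, code = [], commentStart = /\x2f[\x2f\x2a]/, commentEnd;
  while( (found = s.search( commentStart )) >= 0 )
  {
    // Keep everything in front of the comment.
    code.push( s.substring( 0, found ) );
    // charAt rather than s[...], which not every browser supports.
    if( s.charAt( found + 1 ) == '*' )
      commentEnd = '*/';
    else
      commentEnd = '\n';
    // Search past the two-character opener, so "/*/" is not taken
    // for a complete comment.
    if( (found = s.indexOf( commentEnd, found + 2 )) >= 0 )
      s = s.substring( found + commentEnd.length );
    else
      s = ''; // unterminated comment: drop the rest of the input
  }
  // Note: comment markers inside string or regexp literals are not
  // recognised as such, and get stripped too.
  s = code.join(' ') + s;
  return s.replace( /\n/g, ' ' );
}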

I paste it back into the crunchinator, fire away, and in mere seconds, the result pops up this time, no truncation to be seen. Surprised, I test it again. Sure enough, a speedy weasel indeed. I apply my newfound Venkman knowledge, sleep through the original code's 154.34 seconds worth of heavy processing (81 of which were spent in the original removeComments function), run my own version and get a lean 4.70 seconds for running the entire script. That's some mean garbage collection gains. Just to be sure I'm not measuring something irrelevant, I run the tests again in the other order. No difference worth mentioning.
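The figures above came out of Venkman's profiler. For a cruder sanity check without a profiler, a stopwatch around repeated calls gives a rough picture. This is only a sketch: `timeRuns`, `concatPieces` and `joinPieces` are hypothetical names, and the two piece-assembling functions merely mimic the styles of the two comment strippers (repeated `+=` versus `Array.join`). Which one wins, and by how much, depends heavily on the javascript engine.

```javascript
// Hypothetical stopwatch: call fn(input) `runs` times and
// return the elapsed wall-clock time in milliseconds.
function timeRuns(fn, input, runs) {
  var start = new Date().getTime();
  for (var i = 0; i < runs; i++)
    fn(input);
  return new Date().getTime() - start;
}

// Builds the result by repeated concatenation.
function concatPieces(pieces) {
  var t = '';
  for (var i = 0; i < pieces.length; i++)
    t += pieces[i];
  return t;
}

// Builds the same result with a single join.
function joinPieces(pieces) {
  return pieces.join('');
}

// Made-up test data: a thousand short lines of code.
var pieces = [];
for (var n = 0; n < 1000; n++)
  pieces.push('var x' + n + ' = ' + n + ';\n');

var concatMs = timeRuns(concatPieces, pieces, 100);
var joinMs = timeRuns(joinPieces, pieces, 100);
```

As with the profiler runs, it is worth repeating the measurement in the opposite order before trusting the difference.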

I submit my improvements to the original author, notice that my additions just fell into the GPL (you know where to find the license, folks) and figure it's been a decent hack. Maybe someone could even learn from it. For reference, here is the original source code (don't do this at home):

function removeComments(s) {
  var lines, i, t;
  // Remove '/* ... */' comments.
  lines = s.split("*/");
  t = "";
  for (i = 0; i < lines.length; i++)
    t += lines[i].replace(/(.*)\x2f\x2a(.*)$/g, "$1 ");
  // Remove '//' comments from each line.
  lines = t.split("\n");

  t = "";
  for (i = 0; i < lines.length; i++)
    t += lines[i].replace(/([^\x2f]*)\x2f\x2f.*$/, "$1");
  // Replace newline characters with spaces.
  t = t.replace(/(.*)\n(.*)/g, "$1 $2");
  return t;
}

"The results?" I hear you asking. Well, my original 32 kilobyte application weighed in at 19, a pleasing 59.8% of its original weight, without resorting to variable renaming and similar destructive modifications. It worked, after fixing only six slight misses, half of which were my own (missing end-of-line semicolons and a case of an operator on both sides of a newline). The other half were inside regexps -- two related to apostrophes and quotation marks, the last one being the regexp / /, which had been optimised to // (...ow! -- and the rest of the line , or the rest of the script if you so prefer, was thus effectively cut off :-).

I suppose that means that another healthy exercise would be to rewrite the string literal parsing code too, but I suspect that any improvement over the present would mean parsing by language grammar rather than crude string matching, and somehow it doesn't feel like very gratifying work. Not that I have peeked at the code, though.
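Short of a grammar-aware rewrite, one workaround is to keep the troublesome characters out of regexp literals altogether by writing them as hex escapes, much as the crunchinator source itself spells slashes as \x2f. A small sketch, with made-up variable names:

```javascript
// Same meaning as / /, but with no literal space for a cruncher to collapse.
var space = /\x20/;

// Matches a quotation mark (\x22) or an apostrophe (\x27) without
// putting either character in the source text.
var quotes = /[\x22\x27]/;

var hasSpace = space.test('two words');
var hasQuote = quotes.test("it's");
```

This only sidesteps the problem in your own code, of course; the cruncher still cannot tell a regexp from a comment.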


I absolutely detest all numeric non-ISO date formats, M/D/Y probably most of all. So when I encountered the Kingdom of Loathing calendar some benevolent (albeit calendrally challenged) person had published, I did not track down said person to tell him how glad I was at finding what I was looking for and how I felt about the format in which it was published. The meld of feelings would just not make any sense, and after all, the information was both there and fairly easily deciphered. I just strongly feel that deciphering is best left to computers.

Enter today's bookmarklet (feel free to bookmark it). It will ask you which (numeric) date format to convert from, harvest all frames for dates in that format and reformat them to readable ISO YYYY-MM-DD dates. If you go with the default M/D/Y, it will find 3/2/5, being sillyspeak for yesterday, and turn it into 2005-03-02. Short dates in the future (such as 3/3/6) will be assumed to mean the corresponding date from last century. Run it a year from now and you will see 2006-03-03, though. Unless your clock is off, by a lot.
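The conversion rule described above can be sketched roughly like this. It is not the bookmarklet's actual source: `pad` and `toISODate` are hypothetical helpers, the frame harvesting is left out, and the current time is passed in as `now` so the century rule is easy to try out.

```javascript
// Hypothetical helper: left-pad a number with zeroes to the given width.
function pad(n, width) {
  var s = String(n);
  while (s.length < width)
    s = '0' + s;
  return s;
}

// Hypothetical helper: convert one slash-separated numeric date, in the
// field order given (for instance 'M/D/Y'), to ISO YYYY-MM-DD.
// Anything that does not look like such a date is returned untouched.
function toISODate(text, order, now) {
  var parts = text.split('/');
  var keys = order.toUpperCase().split('/');
  if (parts.length != 3 || keys.length != 3)
    return text;
  var value = {}, shortYear = false;
  for (var i = 0; i < 3; i++) {
    if (!/^\d+$/.test(parts[i]))
      return text;
    value[keys[i]] = parseInt(parts[i], 10);
    if (keys[i] == 'Y')
      shortYear = parts[i].length <= 2;
  }
  if (value.Y === undefined || value.M === undefined || value.D === undefined)
    return text;
  var year = value.Y;
  if (shortYear) {
    // Assume the current century first...
    year += Math.floor(now.getFullYear() / 100) * 100;
    // ...and fall back a century if that would land in the future.
    if (new Date(year, value.M - 1, value.D) > now)
      year -= 100;
  }
  return pad(year, 4) + '-' + pad(value.M, 2) + '-' + pad(value.D, 2);
}

// The examples from the text, with the clock pinned to 2005-03-03 at noon.
var today = new Date(2005, 2, 3, 12, 0, 0);
var yesterday = toISODate('3/2/5', 'M/D/Y', today);
var nextYear = toISODate('3/3/6', 'M/D/Y', today);
```

With the clock pinned as above, `yesterday` comes out as 2005-03-02 and `nextYear` as 1906-03-03; pass in a `now` one year later and the latter becomes 2006-03-03.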

Upon googling for date tables to try it out on, I found a hilarious hallmark of stupidity - an excel sheet featuring the column "Employee Start date", "m/d/y or y/m/d e.g. 5/17/2 or 2002/5/17". The web is a silly place. Let's not go there.


ecmanaut

2005-03-06

Profiling javascript with Venkman

2005-03-03

Calendar rant and date format conversion scriptlet