2006-07-02

Expressive user scripts with XPath and E4X

This is another tricks-of-the-trade post about some basic tooling and patterns I almost always end up using in my user scripts, much like the post I wrote about the event manager class.

First, there is the $x( xpath, root ) method, which takes an XPath expression (see an earlier article on an XPath bookmarklet, or pick up the Firefox XPath Checker extension to play with XPath expressions interactively) and an optional root node to resolve it from, and returns an array of the nodes matching the expression. The name is borrowed from FireBug, though FireBug's version unfortunately does not accept the second parameter, which is great for limiting node searches to subtrees of a page, and for writing tidy, small functions that parse out data from the document on their own, given a context node. Mine looks like this:
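function $x( xpath, root )
{
  // Use root itself if it is a document, else the document that owns it.
  var doc = root ? root.evaluate ? root : root.ownerDocument : document;
  // Result type 0 is XPathResult.ANY_TYPE; node sets come back as iterators.
  var got = doc.evaluate( xpath, root||doc, null, 0, null ), next;
  var result = [];
  while( next = got.iterateNext() )
    result.push( next );
  return result;
}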
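To illustrate the optional root argument, here is a small sketch of a helper that collects the link targets found under a single context node. The helper name link_targets and the './/a[@href]' expression are made up for this example, and $x is repeated from above only so the snippet stands on its own. Note the leading dot, which makes the expression relative to the context node; a bare '//a' would still search the whole document.

```javascript
// $x as defined above, repeated so this sketch is self-contained.
function $x( xpath, root )
{
  var doc = root ? root.evaluate ? root : root.ownerDocument : document;
  var got = doc.evaluate( xpath, root||doc, null, 0, null ), next;
  var result = [];
  while( next = got.iterateNext() )
    result.push( next );
  return result;
}

// Hypothetical helper: the href of every link below the given node.
function link_targets( node )
{
  var links = $x( './/a[@href]', node );
  var urls = [];
  for( var i=0; i<links.length; i++ )
    urls.push( links[i].href );
  return urls;
}
```

In a user script this might be called as link_targets( some_div ) for whatever subtree is of interest.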
Since it's not a practice I've seen many adopt in user scripts, and since it readily condenses much of what I do into a few very expressive lines of code, let me share my second largest time saver and node-dribbling readability improver. It marries map with the $x XPath slicer above to form a function that takes two or three parameters: an XPath expression (cutting out some relevant subset of DOM nodes in a page), a function (to apply to all of them) and again the optional root node (to resolve the expression from). I tend to name mine foreach in the interest of brevity:
function foreach( xpath, cb, root )
{
  var nodes = $x( xpath, root ), e = 0;
  for( var i=0; i<nodes.length; i++ )
    e += cb( nodes[i], i ) || 0; // sum the callbacks' return values
  return e;
}
Usage is simple; here's an example I just tossed up that stops links from opening in new windows (direct install link):
foreach( '//a[@target]', dont_open_new_windows );
foreach( '//base[@target]', dont_open_new_windows );

// Drop the target attribute unless it names an existing frame.
function dont_open_new_windows( a )
{
  if( !has_frame_named( top, a.target ) )
    a.removeAttribute( 'target' );
}

// Recursively search window w and its subframes for a frame by name.
function has_frame_named( w, name )
{
  if( w.name == name )
    return true;
  for( var i=0; i<w.frames.length; i++ )
    if( has_frame_named( w.frames[i], name ) )
      return true;
  return false;
}
People from functional language backgrounds take higher-order functions like map for granted: that beautiful little Swiss Army knife that iterates over a list, performs some function on each element, and returns a new list with the results. It makes for very neat and tidy code, without lots of looping constructs clogging up the flow, and with properly named functions as above, it even adds a free touch of documentation as to what action these loops perform. Bonus maintainability!

As the observant reader might have noticed, though, I allow the callback passed to foreach to return a value, which is added up and returned after the call. That is not necessary for the rather typical case of just performing some action on the matched nodes, but once in a while that operation is conditional, and the caller may want to know how many times it was performed. If the callback returns 1 whenever it did something, foreach returns that count, allowing an early exit or similar when appropriate.
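The counting behaviour can be tried outside a browser with a sketch like the following, where a plain array stands in for the nodes $x would return. The names count_applied and strip_target, and the shape of the objects, are assumptions made up for illustration only.

```javascript
// Same accumulation as foreach above, but fed an array directly
// instead of an XPath expression, so it runs without a DOM.
function count_applied( nodes, cb )
{
  var e = 0;
  for( var i=0; i<nodes.length; i++ )
    e += cb( nodes[i], i ) || 0; // undefined or false count as 0
  return e;
}

// Hypothetical callback: returns 1 only when it changed something.
function strip_target( link )
{
  if( link.target != '_blank' )
    return 0;
  delete link.target;
  return 1;
}

var links = [ { target:'_blank' }, { target:'nav' }, { target:'_blank' } ];
var changed = count_applied( links, strip_target );
if( !changed )
  console.log( 'nothing to do' ); // the kind of early exit mentioned above
```

Callbacks that return nothing at all, like dont_open_new_windows, simply contribute 0 to the total.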

For improved sanity (and, admittedly, for the hell of it), I tossed up another script that disposes of Dr. Phil references from match.com, making use of this functionality to add a menu command that reincarnates them (...whyever someone would want to do that).

As merely hiding or removing a subset of nodes in an HTML document typically leaves weird-looking holes or layouts, I opted to also add a subtle indicator of what was once there, which could also serve as a click-to-restore link. And while at it, I added a hover title hinting at how many Phils would pop back into view on clicking it. As you see, the real action gets rather terse and readable using the tools above:
var phils = foreach( '//*[contains(@id,"Phil")]', hide_node );
if( phils )
{
  phils = 'Restore '+ phils +' Dr. Phil reference'+
    (phils==1?'':'s');
  GM_registerMenuCommand( phils, show_phils );
}
foreach( '//div[@class="ugh"]',
  function( div ){ div.title = phils; } );
(The class="ugh" divs being my click-to-restore indicators.) Some people prefer adding icons using images and the data: protocol, which gets a bit messy but works. I opted for a Unicode thumbnail of the original content, post mortem for added symbolism -- U+2620, or "☠".

As a programmer, never underestimate Unicode as the source of useful graphics for this kind of thing; there is plenty on offer. Googling for "unicode" and the artwork you seek often strikes gold. If you don't have a keyboard sporting a ☠ skull key, just have JavaScript render the proper string for you on its own instead:
var skull = String.fromCharCode( 0x2620 );
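An escape sequence in a string literal gives the same character, in case that reads better in your script. A quick sketch, where the variable names escaped and same are only for illustration:

```javascript
var skull = String.fromCharCode( 0x2620 );
var escaped = '\u2620'; // same character, written as a literal escape
var same = skull === escaped;
```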
And while this post is already getting a bit lengthy, I thought I'd at least mention the script that got me writing this article in the first place, on realizing how quickly it came to be, thanks to all the handy tools introduced above: another scratch-an-itch script reshaping the pages of match.com.

It employs another usability improvement technique I can warmly recommend for pages on web sites that do not use the document title for useful page-relevant content. The document title being the name of tabs in modern browsers, it is very useful for bringing order to an unruly browsing experience. Just set document.title to whatever title you would prefer, and if you feel like it, you may also change the tab favicon.

That can be done using this bit of code from Mark My Links, also part of my standard library of useful assorted goodies:
function override_favicon( url )
{
  foreach( '//link[@rel="shortcut icon"]', remove_node );
  append_to( <link rel="shortcut icon" href={url}/>,
             $x('//head')[0] );
}

function remove_node( node )
{
  node.parentNode.removeChild( node );
}

function append_to( e4x, node, doc )
{
  doc = doc || (node ? node.ownerDocument : document);
  return node.appendChild( import_node( e4x, doc ) );
}
This is as terse as it is thanks largely to the expressive power of E4X, and made somewhat verbose again in the library's import_node method due to the present lack of the optional domNode() API, which unfortunately didn't make it into Firefox 1.5. Here's hoping it's coming to a browser near you before long so we can drop that mess.
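In the meantime, one way around import_node (which is not shown in this post) is to skip E4X for this particular element and build it with ordinary DOM calls. The following is only a sketch of that alternative, not the library's code; the names make_icon_link and add_favicon are made up here.

```javascript
// Hypothetical E4X-free variant: build the <link> with plain DOM calls.
function make_icon_link( url, doc )
{
  var link = doc.createElement( 'link' );
  link.setAttribute( 'rel', 'shortcut icon' );
  link.setAttribute( 'href', url );
  return link;
}

// Append the new icon link to a given <head> node and return it.
function add_favicon( url, head )
{
  return head.appendChild( make_icon_link( url, head.ownerDocument ) );
}
```

In a user script the call would be something like add_favicon( url, $x('//head')[0] ), after removing any old icon links as override_favicon does. It is wordier than the E4X literal, which is rather the point of wanting E4X here.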

Happy hacking! And don't write more code than you have to.

3 comments:

  1. Nice article!

    My xpath function is a little simpler than yours and looks like
    this:

    function xpath(query, context) {
    context = context ? context : document;

    return document.evaluate(query, context, null,
    XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
    }

    "If you don't have a keyboard sporting a skull key..." who doesn't? :)

  2. Great post.

    Shouldn't the override_favicon function in the last code snippet use foreach and not $x?

  3. Quite right; thanks -- fixed. (Slight thinko in last-minute code touch-ups for the context of the post.)

