First, we might recognize that an XPath expression essentially is to a DOM Node tree what a RegExp is to a string. It's a coarse but valid comparison, and one that should guide how we use XPath in web programming. From here, let's thus copy good design and sketch the outlines of an XPath class that is as good and versatile for the DOM as RegExp is for strings.
Javascript RegExps are first class objects, with methods tied to them that take a data parameter to operate on. XPaths should be too. Instantiation, using the class constructor, thus looks like:
var re = new RegExp("pattern"[, "flags"]);
var xp = new XPath("pattern"[, "flags"]);
The respective patterns already have their grammars defined and are thus left without further commentary here. RegExp flags are limited to combinations of:"g"
- global match
"i"
- case insensitive
"m"
- multiline match
Nodes wanted | Behaviour | UnOrdered | Ordered |
---|---|---|---|
Multiple | Iterator | UNORDERED | ORDERED |
Multiple | Snapshot | UNORDERED | ORDERED |
Single | Snapshot | ANY | FIRST |
that are reducible to permutations of whether you want a single or multiple items, want results sorted in document order or don't care, and if you want a snapshot, or something that just lets you iterate through the matches until you perform your first modification to a match. There are really ten options in all, but NUMBER
Copying some more good design, let's make those options a learnable set of three one-letter flags, over the hostile set of ten types, averaging 31.6 characters worth of typing each (or a single digit, completely devoid of semantic memorability). When desigining flags, we get to pick a default flag-less mode and an override. In RegExp the least expensive case is the default, bells and whistles invokable by flag override; we might, for instance, heed the same criteria (or, probably better still, deciding on what constitutes the most useful default behaviour, and naming the flags after the opposite behaviour instead):
"m"
- Multiple nodes
"o"
- Ordered nodes
"s"
- Snapshot nodes
It might be time we decided on calling conventions, so we have some context to anchor up what the results returned are with. Our XPath object (which, contrary to a result set, makes lots of sense sticking into an object, to keep around for doing additional queries with later on without parsing the path again) has an
exec()
method, as does RegExp, and it takes zero to two arguments.xp.exec( contextNode, nsResolver );
The first argument is a context node, from which the expression will resolve. If undefined or null, we resolve against document.documentElement. The context node may be anything accepted as a context node by present day document.evaluate, or an E4X object.
The second argument is, if provided as a function, a namespace resolver (of type XPathNSResolver, just as with the DOM API). If we instead provide an object, do a lookup for namespace prefixes from it by indexing out the value from it, as with an associative array. In the interest of collateral damage control, should the user have cluttered up Object.prototype, we might be best off to only pick up namespaces from it whose
nsResolver.hasOwnProperty(nsprefix)
yields true
.The return value from this method is similar in intended spirit to
XPathResult. ANY _ TYPE
, but without the ravioli. XPaths yielding number, string or boolean output returns a number, string or boolean. And the rest, which return node sets, return a proper javascript Array of the nodes matched. Or, if for some reason an enhanced object be needed, one which inherits from Array, so that all (native or prototype enhanced) Array methods; shift, splice, push and friends, work on this object, too.Finally, RegExps enjoy a nice, terse, literal syntax. I would argue that XPaths should, as well. My best proposal (as most of US-ASCII is already allocated) is to piggy-back onto the RegExp literal syntax, but mandate the flag "x" to signify being an XPath expression. Further on, as / is a too common character in XPath to even for a moment consider having to quote it, make the contents of the
/.../
containment a '
-encased string, following all the common string quoting conventions.var links_xp = /'.//a[@href]'/xmos;
var posts_xp = /'//div[@class="post"]'/xmos;
for instance, for slicing up a local
document.links
variant for some part of your DOM tree, and for picking up the root nodes of all posts on this blog page respectively. And it probably already shows that the better default is to make those flags the default behaviour, and name the inverse set instead. Perhaps these?"s"
- Single node only
"u"
- Unordered nodes
"i"
- Iterable nodes
When requesting a single node, you get either the node that matched, or null. Illegal combinations of flags and XPath expressions ideally yield compile time errors. Being able to instantiate new XPath objects off old ones, given new flags, would be another welcome feature. There probably are additional improvements and clarifications to make. Good ideas shape the future.