2006-01-31

Bookmarklet tool: Find links to you!

If you are the least bit interested in who links to you, and use some service that shows you the HTTP referrers of visitors arriving via inbound links from elsewhere, or see trackbacks from other sites, and so on, you have most likely encountered this problem when visiting the linking page:

Where is that link to me?

I figured I'd make a quick bookmarklet that scrolls to the exact spot in a page where the link is or, if invoked again, jumps further down the page to the next link, if there is one.

It turned out quite okay, and as it wasn't much work making it customizable, I went that extra bit to make it a tool useful to almost anybody. Edit: as it eventually turned out, that is only almost true; I took a full hour to polish it up to work better than my original quick and dirty hack. It also illustrates a few good bookmarklet making techniques (more thorough descriptions of a few of these are presented at gazingus.org):

  • Encasing the script in an anonymous function, so it doesn't leak any variable or function names to the page you invoke it on and won't upset any scripts running on the target page.

  • Creating support functions inside this function casing.

  • Using var to define function local variables.

  • Using the void operator to throw away the return value of a function, so the target page isn't replaced with a document with the value your bookmarklet produced.
I made two variants; the first bookmarklet prompts for a domain name (regexp) to match against all the links in the page, the second matches a fixed domain without ever prompting.
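
The techniques listed above can be sketched in a minimal skeleton. The names twice, answer and outcome are made up for illustration and appear in neither of the scripts below; the point is only that nothing declared inside the wrapper is visible outside it, and that void turns whatever the function returns into undefined.

```javascript
// Hypothetical skeleton; as a real bookmarklet it would be prefixed with
// "javascript:" and saved as a bookmark URL.
var outcome = void (function () {
  // Support function, visible only inside the wrapper.
  function twice(n) {
    return 2 * n;
  }
  // Declared with var, so local to the wrapper function.
  var answer = twice(21);
  // Without void, a returned value like this one would replace the page.
  return answer;
})();

console.log(outcome);        // undefined
console.log(typeof twice);   // undefined (nothing leaked)
console.log(typeof answer);  // undefined (nothing leaked)
```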

Click either of the configure buttons below to set up both scripts to the domain of your preference before bookmarking them. You can click either button again and again to reconfigure the links in the page and bookmark the resulting scripts, if you have a whole slew of sites you want to have bookmarklets for. For the script that prompts for a domain, this sets up the default suggestion; for the other one, it sets which links it will look for.

Regardless of how you configure the scripts, the test will only be performed against the domain name of links, case insensitively (as domain names are not case sensitive) -- if you want to change that, you should be advanced enough to be able to tweak the script to your liking on your own.
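
As a rough illustration of that test, here is a small sketch of matching a host name against a configured pattern. The helper name hostMatches and the sample host names are assumptions made for this example only. Note that when the pattern is written as a string, the backslash has to be doubled for the regexp to see a literal dot.

```javascript
// Hypothetical helper: does a link's host match the configured pattern?
function hostMatches(host, pattern) {
  // The 'i' flag makes the test case insensitive, as domain names are.
  return new RegExp(pattern, 'i').test(host);
}

// In a string literal the backslash itself must be escaped, so that the
// regexp sees "\." (a literal dot) rather than "." (any character).
var pattern = '^ecmanaut\\.blogspot\\.com$';

console.log(hostMatches('ecmanaut.blogspot.com', pattern));      // true
console.log(hostMatches('ECMAnaut.Blogspot.COM', pattern));      // true
console.log(hostMatches('www.ecmanaut.blogspot.com', pattern));  // false
console.log(hostMatches('ecmanautxblogspot.com', pattern));      // false
```

The ^ and $ anchors are what restrict the match to exactly that host; dropping the leading ^ would also accept subdomains.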

Configure scripts by

The first script, which prompts for a domain name regexp:
javascript:void(function()
{
  /* Absolute vertical position of node n in the page. */
  function Y( n )
  {
    var y = n.offsetTop;
    while( n = n.offsetParent )
      y += n.offsetTop;
    return y
  }
  var l = document.links, i, u, y, o = [];
  if( u = prompt( 'Find links to what domain? (regexp)',
                  '^ecmanaut\\.blogspot\\.com$' ) )
  {
    u = new RegExp( u, 'i' );
    /* Collect the position of every link whose host matches. */
    for( i = 0; i<l.length; i++ )
      if( l[i].host.match( u ) )
        o.push( Y(l[i]) );
    /* Go top-down; scroll to the first match below the current offset. */
    o.sort( function( a, b ){ return a - b } );
    for( i = 0; i<o.length; i++ )
      if( (y = o[i]) > pageYOffset )
        return scrollTo( 0, y );
    alert('No more links found.')
  }
})()
The second script, which does not prompt for the domain matcher regexp:
javascript:void(function()
{
  /* Absolute vertical position of node n in the page. */
  function Y( n )
  {
    var y = n.offsetTop;
    while( n = n.offsetParent )
      y += n.offsetTop;
    return y
  }
  var l = document.links, i, y, o = [],
      u = /^ecmanaut\.blogspot\.com$/i;
  /* Collect the position of every link whose host matches. */
  for( i = 0; i<l.length; i++ )
    if( l[i].host.match( u ) )
      o.push( Y(l[i]) );
  /* Go top-down; scroll to the first match below the current offset. */
  o.sort( function( a, b ){ return a - b } );
  for( i = 0; i<o.length; i++ )
    if( (y = o[i]) > pageYOffset )
      return scrollTo( 0, y );
  alert('No more links found.')
})()
If you want to, you can try clicking either script rather than bookmarking them, to go chasing around this page for links to places so you get a feel for how they work. The default setup prior to customization will look for links staying on this blog.
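
To get a feel for the logic without a browser, the two core steps of the scripts can be sketched as plain functions. This is an illustration under assumptions, not the bookmarklet itself: absoluteTop and nextStop are names invented here, and the plain objects merely stand in for DOM nodes with offsetTop and offsetParent properties.

```javascript
// Sum offsetTop up the offsetParent chain, as the Y() helper does.
function absoluteTop(node) {
  var y = node.offsetTop;
  while ((node = node.offsetParent)) {
    y += node.offsetTop;
  }
  return y;
}

// Return the first position strictly below the current scroll offset,
// or null when there is none (where the scripts show their alert).
function nextStop(positions, currentOffset) {
  var sorted = positions.slice().sort(function (a, b) { return a - b; });
  for (var i = 0; i < sorted.length; i++) {
    if (sorted[i] > currentOffset) return sorted[i];
  }
  return null;
}

// Plain objects standing in for DOM nodes (an assumption for illustration).
var body = { offsetTop: 0, offsetParent: null };
var column = { offsetTop: 120, offsetParent: body };
var link = { offsetTop: 35, offsetParent: column };

console.log(absoluteTop(link));                // 155
console.log(nextStop([900, 155, 400], 0));     // 155
console.log(nextStop([900, 155, 400], 155));   // 400
console.log(nextStop([900, 155, 400], 900));   // null
```

Comparing with a strict greater-than is what makes a second invocation move on to the next link rather than staying at the one just scrolled to.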

Things I (re)learned (or remembered a bit late, depending on how you see it) on writing the above scripts:

  • Chasing through document.links processes links in document order (DOM tree order), not to be confused with the top-down order of the fully laid-out page.

  • Array.prototype.sort() by default converts the elements to strings and compares them as text, which sorts the array [0,3,6,17,4711] as [0,17,3,4711,6], which can be quite different from what you wanted. To sort by numeric values instead, assuming the array only contains numbers, provide a comparison function function( a, b ){ return a-b; } to the sort method.
So those of you who saw the post within the first half hour or so of my posting it, might want to pick up the scripts again. (I opted not to clutter it all down with change markers, to keep the code readable.)
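
The sorting pitfall is easy to reproduce in isolation. A short sketch, using the same numbers as in the list above (the variable names are just for this example):

```javascript
var offsets = [0, 3, 6, 17, 4711];

// Default sort: elements are converted to strings and compared as text.
var asText = offsets.slice().sort();

// Numeric sort: the comparison function orders by value.
var asNumbers = offsets.slice().sort(function (a, b) { return a - b; });

console.log(asText.join(','));     // 0,17,3,4711,6
console.log(asNumbers.join(','));  // 0,3,6,17,4711
```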

If you use Firefox (or another Mozilla derivative) I recommend editing the bookmarks you save to give each a keyword ("links-to", for instance). That way, you won't have to put it on some panel (or remember where in your maze of bookmarks you put it) to access it when the need arises; just type links-to in the address field and hit return, and the script will be run (and pulled into the address field, should you want to edit it afterwards). This is a very useful technique for keeping lots and lots of tools (not to mention sites) easily accessible, if you, like me, find it easier to recall names than to traverse menu structures. Just hit Ctrl+L, type the script name and hit return. Quick and easy, and it even works in full-screen presentation mode when you have no menus or toolbars visible.

This is unfortunately as close to access keys for bookmarks as you get in the Mozilla browser family; go vote for bug 47199 now if you too want to put that feature on the development agenda. It has an embarrassing seven votes, was filed in the year 2000, and has not seen much action since. Imagine having any site or handy tool like this a keypress away. You know you want it. So off you go; vote away! Be heard.

JSONP: Why? How?

Codedread took me up on a few points worth discussion in relation to my previous post about JSONP:

  • Isn't the feed consumer very susceptible to incompatible changes in the source feed?

    Yes; naming issues are the same for JSON as for any XML dialect; the day your data provider breaks backwards compatibility with the format they previously committed to, your application breaks. It can be argued that widely adopted schemas such as RSS and Atom are safer bets to write code for, but in my opinion it's a bit of a moot point. As soon as you use somebody else's data, you are at their mercy to keep making it available to you in whatever format they picked. Content providers are still kings of their reign. When Del.icio.us has downtime (and doesn't care enough about its JSON feed consumers to degrade gracefully, still providing a valid, though empty, JSON feed) your application breaks.

  • How do you ensure that the JSONP feeds don’t embed bad code with side effects you did not opt in on?

    If you live solely client side, having just the provisions of the browser sandbox in unprivileged mode, as is the common case when JSONP is interesting at all: you can't. You are at the mercy of your feed provider's better judgement. Should you discover that they break the "contract" in passing code rather than data your way, wreak havoc and give them the devastating publicity they deserve. Pick your feed with the same attention to the trustworthiness of the source as when you pick your food. Don't eat it if you fear poisoned food. Or feeds.

    If you can't trust your feed source not to send code with unwanted side effects, and you have elevated privileges (either from being a signed script, or perhaps because you are a Greasemonkey user script or similar), so you can fetch content by way of XMLHttpRequest (side-stepping the same origin policy), you can use a JSON parser rather than eval() to process the feed. The JSON site provides a JSON parser written in javascript.

    Otherwise you would have to use some server side script to process and cleanse the feed first. In which case picking up just any feed at all and reformatting it as JSONP would be the solution closest at hand. Of course this is always an option open to you regardless of the original format (RSS, Atom or anything you know how to read) if you have a server side base of operations where you can cook your own JSONP. Making a generic XML to JSONP converter, as Jeff suggests, is of course a neat idea.

  • How do you include JSONP feeds dynamically into a web page?

    Point a script tag at the feed. If you plan on spawning off multiple JSONP requests throughout the lifetime of your page, clean up too, removing the script tag you added for the request, for instance from the callback you get when the script has loaded.

  • What about XML?

    What you do with XML, you should probably keep doing with XML, as there is a much larger tool base available for leveraging it. JSONP isn't here to replace XML in any way; its strength is solely in overcoming the same origin policy of the browser security model. (Sure, the markup overhead of JSONP is roughly 50% less than that of XML, and XML formats are typically designed in a bulkier fashion than typical JSON formats in prevalent use today, but neither is really much of an argument for most practical purposes.)

  • Why should you adopt JSONP too?

    The way I see it, providing JSONP feeds for external consumption is really only interesting when you invite the wide public to innovate around your data, client side, unaided by any kind of server side resources on their part. If you don't, there is not much point in doing this at all. Doing JSONP is just lowering the bar as far as you can possibly go, in inviting others to use your service programmatically.

    It's comparable with providing a dynamic image meant to sit on a web page, or a really small web page component meant to go in an iframe of its own. The striking difference is that your data becomes available to programmatic leverage, which neither the image nor the iframe does, due to the browser security model (not counting the provisions on offer for partially hiding them by way of CSS).

  • What is the guaranteed payoff with making a JSONP feed?

    None. If your users don't want to use your data, they won't. Nor will they, if they don't know it's there, or how to make use of it. JSONP isn't widely known yet, and javascript has much bad heritage of being a language people cut and paste from web sites to get nifty annoying page effects going on their web pages. To this public, a JSONP feed is useless. It's for the growing community of "Web 2.0 programmers", to use a nasty but understandable name for them, that your JSONP feed will be useful. And you get no guarantees that anyone will use it for anything either; your data might not be interesting, or you might smell like the kind of shady person that would salt their JSONP feed with nasty side effects, stifling people that would otherwise have considered building something with your data.

    The only firm guarantee is that nobody will use a feed you don't provide.
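
For the case in the answers above where you can get at the feed as text and don't trust its source, a parser-based approach can be sketched as follows. In current javascript engines the built-in JSON.parse fills the role of such a parser: it accepts data only and throws on anything else. The helper name parseFeed and the sample feed strings are assumptions made for this example.

```javascript
// Hypothetical helper: parse feed text as pure JSON data, never as code.
function parseFeed(text) {
  try {
    return JSON.parse(text);
  } catch (e) {
    // Anything that is not valid JSON (script code included) ends up here.
    return null;
  }
}

var good = parseFeed('{"title":"ecmanaut","lat":"58.415294"}');
var bad = parseFeed('alert("gotcha"), {"title":"ecmanaut"}');

console.log(good.title);  // ecmanaut
console.log(bad);         // null
```

Unlike eval(), nothing in the rejected text is ever executed; the price is that a padded JSONP response has to be fetched as text and unwrapped before it can be parsed this way.
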
All of the above is the price we pay to use data we don't produce ourselves, and for overcoming the same origin policy of the browser security model, which is the very heart of the JSONP idea. Nothing has changed in this field since the millennium (except we have now got a name for a best practices approach for the behaviour/design pattern), though the WHAT WG are churning away at better hopes for the future. Until anything concrete comes out of that, we are stuck with JSONP for things like this, though. Be sure to make the best of it.
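
The script tag technique from the answers above, including the cleanup step, might look roughly like this. It is a sketch under assumptions: requestJsonp is a made-up name, the feed is assumed to take the callback name in a query parameter called callback, and the URL passed in is assumed to have no query string of its own. The document and global scope are passed in as arguments so the function can be exercised outside a browser too.

```javascript
// Hypothetical helper: request a JSONP feed by adding a script tag, and
// remove the tag again once the feed has called back.
function requestJsonp(doc, scope, url, name, handler) {
  var script = doc.createElement('script');
  scope[name] = function (data) {
    // Clean up: drop the global callback and the script tag we added.
    delete scope[name];
    if (script.parentNode) script.parentNode.removeChild(script);
    handler(data);
  };
  // Assumed calling convention: ?callback=<name of the function to call>.
  script.src = url + '?callback=' + encodeURIComponent(name);
  doc.getElementsByTagName('head')[0].appendChild(script);
  return script;
}
```

In a page, a call could look like requestJsonp(document, window, feedUrl, 'gotFeed', showFeed), where feedUrl and showFeed are whatever feed address and handler function you have at hand.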

2006-01-30

JSONP: The recipe for visitor innovation

What RSS and Atom are to feeds, for making your data easy to access and subscribe to for humans, JSONP is for making your data easy to access for web page applications. To date, Yahoo are more or less alone on the large player front in having realized this. (They are also only almost JSONP compliant, but it's close enough to still be very useful.)

Where RSS and Atom are well defined XML formats, JSON is more similar to XML, in just being a data encoding, and a very light-weight one, at that. Besides being very light-weight, JSON is also the native javascript format for writing object literals, hence the name "JavaScript Object Notation". And since the native scripting language of web pages is javascript, it should come as no surprise that you open up your data for user innovation by providing it in JSONP form. The P stands for "with Padding", and it only defines a very basic URL calling convention for allowing a remote web page to supply your feed generator with a query parameter callback to encapsulate the generated feed in such a way as to make it usable for the page that requested it.
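
On the feed generator side, the padding step amounts to very little. A minimal sketch, assuming the callback name has already been read from the query string; padJson and showVisitor are hypothetical names, and falling back to plain JSON when no callback is given is an assumption of this example rather than part of the convention described above.

```javascript
// Hypothetical feed generator step: wrap JSON data in the caller's callback.
function padJson(data, callback) {
  var json = JSON.stringify(data);
  // Assumption: without a callback parameter, serve plain JSON.
  return callback ? callback + '(' + json + ')' : json;
}

var feed = padJson({ title: 'ecmanaut', lat: '58.415294' }, 'showVisitor');
console.log(feed);  // showVisitor({"title":"ecmanaut","lat":"58.415294"})
```

A page that defines a showVisitor function and loads that response through a script tag gets the function called with the data as its argument.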

So, as neither JSON nor JSONP specifies anything about the actual feed content format for data, you are encouraged to think up something for yourself whichever way you please, and to remain consistent with yourself if and when you opt to upgrade the feeds you provide with more data. I'll pick GeoURL for an example here, which has an RSS feed that contains this item structure:
<item rdf:about="http://ecmanaut.blogspot.com/">
<title>ecmanaut</title>
<link>http://ecmanaut.blogspot.com/</link>
<description>Near Linköping.</description>
<geo:lat>58.415294</geo:lat>
<geo:long>15.602978</geo:long>
<geourl:longitude>15.602978</geourl:longitude>
<geourl:latitude>58.415294</geourl:latitude>
</item>
Formatted as JSON, paying attention to making the data as useful as possible to an application rather than formatting it for human consumption, it might end up looking like this instead:

{"title":"ecmanaut", "url":"http://ecmanaut.blogspot.com/", "cityname":"Linköping", "lat":"58.415294", "long":"15.602978"}

I substituted the "description" field for a "cityname" entry for the name of the closest city. You can of course easily improve on this with other relevant data too, for instance, why not add "citylat" and "citylong" values while at it? Or, you might nest it stylishly in another object, as

{"title":"ecmanaut", "url":"http://ecmanaut.blogspot.com/", "city":{"name":"Linköping", "lat":"58.4166667", "long":"15.6166667"}, "lat":"58.415294", "long":"15.602978"}

which would allow an application to access the data as city.name, city.lat and city.long. (Or, given that long is a reserved word in javascript, perhaps more safely as city["long"]. Hence the common practice of naming longitudes lng in javascript code instead.)
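
As a quick illustration, here is the nested example above read into an object and accessed both ways. The variable names are chosen for this sketch only.

```javascript
var item = JSON.parse(
  '{"title":"ecmanaut", "url":"http://ecmanaut.blogspot.com/", ' +
  '"city":{"name":"Linköping", "lat":"58.4166667", "long":"15.6166667"}, ' +
  '"lat":"58.415294", "long":"15.602978"}'
);

var cityName = item.city.name;     // dot notation
var cityLong = item.city["long"];  // bracket notation, safe for any key

console.log(cityName);  // Linköping
console.log(cityLong);  // 15.6166667
```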

The spaces in the above examples are purely optional, and were only added here for easy reading. Whether to provide coordinates as strings, as above, or numeric literals (dropping the quotes) is of course up to you, but I deliberately chose the string representation above, as that leaves it up to the application whether to treat them as numbers to do math on or text data to do some kind of visual or URL formatting operations on. Using floats instead would have lost precision information, as decimal and rational numbers are represented by IEEE floating point numbers in javascript, meaning that there would be no telling the difference between 0.0 and 0.000, or between different numbers with many figures where the differences appear only near the end. Opting to use strings, your consumers can pick either; the difference is just another parseFloat() call at their end.
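
A small sketch of what is kept and what is lost. The value 15.600 is made up for the example, to have trailing zeroes to lose.

```javascript
var asString = "15.600";              // as it would arrive in the feed
var asNumber = parseFloat(asString);  // converted for doing math

console.log(asString);          // 15.600
console.log(String(asNumber));  // 15.6

// As numbers, differently written zeroes are indistinguishable.
console.log(parseFloat("0.0") === parseFloat("0.000"));  // true
```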

Anyway, you might ask yourself what good providing JSONP feeds does you, as a service and feed provider. At least you should, as it's a very good question. A JSONP feed will (technically) allow any web page anywhere on the web to use your data in any way it pleases. This may be what you want when you provide a blogging service you want to popularize, a user customizable guest book with built-in feed support et cetera, and it may be quite contrary to what you want for information whose reach you want to be in greater control of yourself. (Of course, you can technically limit the scope of a feed using referrer detection tricks just as you can to avoid image hot linking, and it naturally also has the same issues with browsers and proxies set up to provide some amount of browsing anonymization.)

The real value with JSONP distribution is this: user innovation. It allows a third party to very easily leverage the functionality your tool or service provides, mash it up with other tools or services, and build really great things in general; things you probably never would have thought of yourself or, if you did, would at least not have had the ambition to do of your own accord. And definitely not at no investment cost to you.

Opening up your data allows anyone and everyone on the web to do those things for you, should they want to. Ask for a link back from their cool applications, and you have a sudden generator of incoming traffic and visitors that builds on its own as others innovate around your service. This is pretty much what the fluffy word Web 2.0 is all about, by the way, at least to me. Data transcending barriers between sites and applications. Users playing an active role on the web rather than consuming a web made by others to solve problems dictated by slowly rusting business models.

The visitor map on top of my blog is just one example of what has been (and hence can be) made with JSONP; the feed of coordinates for recent visitors is provided by GVisit, and the naked feed is combined with Google Maps using the public Google Maps API. Yes, that rather boring text data becomes the pretty eye candy that lets us boggle at the many interesting places from which people drop by for a casual read here.

Assuming GeoURL provided JSONP feeds too, I could have asked it for blogs and web pages created near places where recent visitors of mine reside, and perhaps feature a rolling feed of pages made in their surroundings, as my visitor map zooms around the corners of the globe, tracking my erratic visitor patterns as curiosity drives eyes this way from all over the world. Add to that some optional topic tags data to each URL from that service, and a way of filtering the feed on the same tags, and it might even be made an opt-in self moderated list of links to peers of mine around the world for other blogs on topics such as javascript, blog and web technology, or what have you.

As you see, it's actually rather easy to come up with good web services that interesting applications could be built around in an open community fashion this way, and JSONP is the brilliantly no investment, user interest driven way of doing it. It's trivial to make a JSONP feed, it's easy to leverage it into applications of your own as a web programmer, and the possibilities it opens up are nothing short of astounding.

I warmly recommend following Yahoo's lead here, but don't squash the callback parameter, the way they do; put it in the feed verbatim, so the applications developer won't have to jump through needless hoops to put your feed to good use making tomorrow's killer applications.