2010-07-28

Showing inline HTML comments of Paul Graham's

I browsed a few of the older of Paul Graham's Essays tonight, dug up my Paul Graham click-to-inline footnotes user-script, which wasn't installed in this Google Chrome profile (install link here), and peeked at the source of a few of the essays which it still doesn't grok.

In doing so, I happened upon some HTML comments -- the next level of shaved-off cutting-room-floor material left around for our prying eyes, if you will -- many of which were interesting, much like his footnotes. So I hacked up a new version of the script, which inlines those too, showing them as little <a>, <b>, <c>, and onward -- expanding to something looking like an HTML comment when clicked.
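
The labelling scheme is simple enough to sketch. The snippet below is not the user script itself, just a minimal illustration that assumes the page source is available as a string; the helper names extractComments and labelFor are made up for this example, and a real script would more likely walk the DOM's comment nodes instead.

```javascript
// Hypothetical helpers, not taken from the actual user script.

// Return the trimmed text of every HTML comment in a source string, in order.
function extractComments(html) {
  var comments = [];
  var re = /<!--([\s\S]*?)-->/g;
  var match;
  while ((match = re.exec(html)) !== null) {
    comments.push(match[1].trim());
  }
  return comments;
}

// Map 0, 1, 2, ... to the markers <a>, <b>, <c>, ...
// (this sketch simply wraps around to <a> again after <z>).
function labelFor(index) {
  return '<' + String.fromCharCode(97 + (index % 26)) + '>';
}

var source = '<p>Kept text<!-- cut aside --> and more<!-- another cut --></p>';
var labelled = extractComments(source).map(function (text, i) {
  // Collapsed form is the label; expanded form looks like the comment again.
  return { label: labelFor(i), expanded: '<!-- ' + text + ' -->' };
});
```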

Share and enjoy!

To test drive the new feature, you might want to re-read Why Smart People Have Bad Ideas, A Unified Theory of VC Suckage and The Age of the Essay. (And before anyone mentions it: no, I didn't actually get to making it augment his old-style [non-]markup for footnotes. Maybe next time. ;-)

2010-07-14

Optimizing SVGs

The other night I came across this cool Tree of Life page, featuring some pdfs and images of the family relationship of all life on Earth. Great stuff. Among them, this simplified rendition divided into about a hundred sub-families, and their relations:


You see our really ancient common heritage starting at 0 radians, progressing through evolution towards the really highly evolved creatures at two pi radians; birds, crocodiles, turtles and (you are here!) mammals (but in reverse order; sorry -- us mammals are not the last cry in evolution in all ways conceivable :-).

I liked it, but it felt wrong that it was trapped in a pdf; this kind of thing should really be a Scalable Vector Graphics image (SVG, henceforth) with cut-and-pastable text, and both readable and hackable right in the page source, for people like you and me that like to poke around in things.

So I made an exercise of turning it into a somewhat nice SVG, to see both how small I could make it, without much effort, and where browsers are at in terms of rendering an inline SVG, these days. I haven't actually tested yet, so it'll be a fun surprise for me too, upon publishing this post. And if your browser doesn't render it, you still have the rasterized version above, or the source pdf (35008 bytes long).

Oh, and for the curious, there's a public git repository of all the changes on github, one step at a time, from the first version (where it's helpful to have a friend that has Adobe Illustrator, for instance, to do an initial machine translation of the pdf to a workable yet messy SVG). For reference, this page does not embed the minimized end result, which weighed in at 14852 bytes (or 6038, gzipped to an svgz).

(I consider those cheating, as the line data itself has been compressed somewhat beyond the point where it's still hackable.)

If you want to play around with this kind of thing, and get familiar with hand-editing SVG files, I can whole-heartedly recommend Sam Ruby's great library of sub-kilobyte hand-made SVGs. While I can't find a statement to attest to it at the moment, I believe they are all freely MIT licensed (I think I asked him in person at SVG Open 2009), encouraging you to learn from and play with them. It is a great resource if you want to pick up on some of the tricks of the trade, since they, on average, contain pretty much 100% signal, 0% noise.

Oh, and the SVG specification when you are curious about something specific. If you want to learn a minimal subset only that can do almost everything, look at the <path d="turtle graphics here"/> attribute.
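
To give a taste of that turtle-graphics feel, here is a small sketch in JavaScript that builds a d attribute value from a list of points. The function name polygonPath is an invention for this example; the commands it emits (M for "move to", L for "line to", Z for "close path") are from the SVG path syntax.

```javascript
// Hypothetical helper: turn [[x, y], ...] into an SVG path "d" string.
function polygonPath(points) {
  return points.map(function (p, i) {
    // First point: move the pen there (M); later points: draw a line to them (L).
    return (i === 0 ? 'M' : 'L') + p[0] + ' ' + p[1];
  }).join(' ') + ' Z'; // Z closes the shape back to the starting point
}

var d = polygonPath([[10, 10], [90, 10], [50, 80]]);

// Wrap the path in a minimal standalone SVG document (a plain triangle).
var svg = '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">' +
          '<path d="' + d + '"/></svg>';
```

Everything the shape needs lives in that one attribute string, which is a large part of why hand-edited SVGs can stay so small.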

And here is the outcome of my own craftsmanship, for the browsers that get it:

Spirochaetes Chlamydias Hyperthermophilic bacteria Cyanobacteria Low-GC Gram-positives High-GC Gram-positives Deinococcus/Thermus Proteobacteria Crenarchaeota Euryarchaeota Haptophytes Brown algae Diatoms Oomycetes Dinoflagellates Apicomplexans Ciliates Eudicots Monocots Magnoliids Star anise Water lilies Amborella Conifers Gnetophytes Ginkgo Cycads Ferns Horsetails Whisk ferns Club mosses and relatives Hornworts Mosses Liverworts Charales Coleochaetales Chlorophytes Red Algae Glaucophytes Kinetoplastids Euglenids Heteroloboseans Parabasalids Diplomonads Foraminiferans Cercozoans Radiolarians Amoebozoans Club Fungi Sac Fungi Arbuscular Mycorrhizal Fungi "Zygospore Fungi" "Chytrids" Microsporidia Choanoflagellates Glass sponges Demosponges Calcareous sponges Placozoans Ctenophores Cnidarians Bryozoans Flatworms Rotifers Ribbon worms Brachiopods Phoronids Annelids Mollusks Arrow worms Priapulids Kinorhynchs Loriciferans Horsehair worms Nematodes Tardigrades Onychophorans Chelicerates Myriapods Crustaceans Hexapods Echinoderms Hemichordates Cephalochordates Urochordates Hagfishes Lampreys Chondrichthyans Ray-finned fishes Lobe-finned fishes Lungfishes Amphibians Mammals Turtles Lepidosaurs Crocodilians Birds

Unfortunately Blogger intersperses it with <br> tags if I leave the new-lines in, so see github for a cleaner version. No luck with my current set of browsers, with at least this doctype and HTML version. It does degrade to showing the text content of all the families, though, which a PDF wouldn't.

2010-07-11

Google styleguides for JSON APIs

I just eyed through Google's dos and don'ts style guide for when exporting JSON APIs. Overall it's pretty good, ranging from the very basics of "abide by the JSON specification" (though stated at depth, presumably for the JSON illiterate, with all implications of what data types are available, what syntax is legal and the like) to how to do naming, grouping of properties, how to introduce and deprecate identifiers, what to leave out, how to represent what kinds of data, and so on.

It of course doesn't guarantee that the outcome will be good APIs (the JSON exported by Google Spreadsheets, at least in the incarnations I peeked at some years ago, was an absolutely horrible auto-translation from XML that even mangled parts of the data due to its imperfect representation of a data grid, for instance), but it does protect against many needless pitfalls.

Time


Not all of its tips are great, though. The rest of this post is a rant about time, and how it's more complicated than you think (unless you have ever run across this). I specifically want to warn about its recommendation on Time Duration Property Values (my emphasis that it's talking about amounts of time rather than timestamps, which ISO 8601 is great for), which it suggests to be encoded ISO 8601 style. Example (comments are of course not part of the output):

{
  // three years, six months, four days, twelve hours,
  // thirty minutes, and five seconds
  "duration": "P3Y6M4DT12H30M5S"
}

Don't do this!

That is a daft idea and/or example. Think about it for a moment. If you truly want to convey the length of a duration for a period of three years, six months, four days, twelve hours, thirty minutes and five seconds, the total number of seconds of that should be computable with perfect precision, right? To this application, after all (whatever it is) -- the number of days, hours, minutes -- and even those last five seconds -- are significant, so we should get them right.

Here be dragons. Human readouts of time durations like the one above don't convey that information. If you talk about durations, you have to pick one well-defined unit of time (or -- less usefully -- a number of units that translate to each other by well-defined rules, requiring no additional data inputs), and stick to that one.

I'd recommend picking either days, seconds, milliseconds or nanoseconds as your (one!) unit of choice, dependent on what kind of duration you represent and what kinds of likely uses it has. Declare the unit (so the property name suggests the unit, if you're kind) and stick to integers.

Because the number of seconds in three years, six months, four days, twelve hours, thirty minutes and five seconds depends on when you start counting, and/or what you mean by "year", "month", "day" and "hour" (earth-centric time is complicated -- some minutes even have more than 60 seconds).

Typically, it's in reference to some specific reference time, from which to increment the year by 3, the month number by six, the date by four days, and finally add another 12h, 30m and 5s. But we didn't get a reference time; we just have a duration. It's like a vector denoting a coordinate in a coordinate system. You can't tell what it points at, without knowing where it points from.

And humans happily think up some well-defined case like counting from midnight this January 1st, find that it works out to becoming 2013-07-05 12:30:05 after adding, and maybe even compute it to 110723405 seconds total (in UTC, ignoring leap seconds) and believe it's a well-defined amount of time. It just isn't; those six months only just turned out to be 181 days because 2013 isn't a leap year. If we had started counting from 2009 and ended up on 2012-07-05 12:30:05, they would have wound up 182. And if we had started counting from March 1st instead of January 1st, they would have been 184. No matter how you see that, our seemingly second-precision duration suddenly has a fuzz margin of plus or minus a day and a half -- a range of more than 250,000 seconds, which compares most unfavourably to the advertised second-precision.
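
The dependence on the starting point is easy to check for yourself. The sketch below is only an illustration: secondsSpanned is a made-up helper, it works in UTC, and like JavaScript's Date it ignores leap seconds. It adds the same calendar duration to two different start dates and compares how many seconds each one actually spans.

```javascript
// Hypothetical helper: add the calendar duration P3Y6M4DT12H30M5S to a UTC
// start date and return the elapsed seconds. Leap seconds are ignored,
// because Date ignores them too.
function secondsSpanned(startYear, startMonth, startDay) {
  // Date.UTC takes a zero-based month, hence the "- 1".
  var start = Date.UTC(startYear, startMonth - 1, startDay);
  // Date.UTC normalizes months and days that overflow their usual range.
  var end = Date.UTC(startYear + 3, startMonth - 1 + 6, startDay + 4, 12, 30, 5);
  return (end - start) / 1000;
}

var fromJanuary = secondsSpanned(2010, 1, 1); // counting from 2010-01-01
var fromMarch = secondsSpanned(2010, 3, 1);   // counting from 2010-03-01
var spread = fromMarch - fromJanuary;         // same "duration", different length
```

With these two starting points the spread comes out at three full days' worth of seconds, for what is nominally one and the same duration.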

And if you decide that your years are a special 365 * 24 * 60 * 60 seconds long, then ten years from now won't be July 11th, but July 8th. Humans might disagree.
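
That drift can be verified with a few lines of JavaScript. This is just a sketch working in UTC, with this post's date as the starting point; the variable names are arbitrary.

```javascript
// Ten "years" of exactly 365 * 24 * 60 * 60 seconds, counted from 2010-07-11 (UTC).
var SECONDS_PER_FIXED_YEAR = 365 * 24 * 60 * 60;

var start = Date.UTC(2010, 6, 11); // months are zero-based: 6 is July
var end = new Date(start + 10 * SECONDS_PER_FIXED_YEAR * 1000);

// The three leap days in between (2012, 2016, 2020) are what we lose.
var landedOn = end.toISOString().slice(0, 10); // "2020-07-08"
```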

So if you want to represent a duration of time, and you want the API consumer to know how long that duration is, pick a unit and pass an integer. And if, say, your first API version had a timestamp of the start of something and you want API v2 consumers to be able to tell how long after that it completed, pick a unit and pass an integer. And if you have good reason to believe that the consumer (maybe a human) is interested in the end time, pass start and stop timestamps. Programs consuming your JSON will have access to date functions that can compute the amount of time between them, or what date and time it will be after a fixed number of given time units from a reference time.
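
As a minimal sketch of what that could look like (the property names startedAt, finishedAt and durationSeconds are made up for this example, not taken from any real API):

```javascript
// Hypothetical payload: the property names are invented for this example.
var payload = {
  "startedAt": "2010-07-11T09:00:00Z",
  "finishedAt": "2010-07-11T09:02:30Z",
  "durationSeconds": 150 // one declared unit, one integer
};

// A consumer parses the JSON and can derive the same number from the
// two timestamps with nothing but the built-in date functions.
var parsed = JSON.parse(JSON.stringify(payload));
var elapsedSeconds =
  (Date.parse(parsed.finishedAt) - Date.parse(parsed.startedAt)) / 1000;
```

Either form is unambiguous: the integer needs no calendar to interpret, and the timestamps carry their own reference points.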

Thank you for taking your time to indulge in thinking and talking about time over JSON APIs.

2010-07-09

List hardlinks on HFS volumes

This one goes out to all of you mac users out there.

I recently made a little hack, hardlinks, that takes a path (or inode) of a hardlinked file and lists the paths of all of its clones (including its own), provided it lives on an HFS volume. Usage is simple:

sudo hardlinks path

or

sudo hardlinks -c inode

You need to have the hfsdebug-lite binary Amit Singh provides installed for it to work, and if you're on a modern mac, you need to install the Rosetta tools from your MacOS X install disk to get hfsdebug to run. (That sudo is needed for hfsdebug to access the raw device of the volume -- after that, hfsdebug drops privileges to the nobody user.)

Source code:

2010-07-06

A peek at Dropbox

On a friend's suggestion, I had a peek at Dropbox for syncing directories between multiple computers, sharing files with people without posting email attachments, and the like. It's got many rather useful properties, but seems a little immature in the unix world; it is unaware of file modes and symlinks (so symlinks will show up as real files or as nothing at all, if orphaned), making it less suitable for syncing git checkouts across multiple machines, as I was hoping to. As noted in the linked thread, though, the upcoming 0.8 release will become aware of file modes, which is a good start.

What it does seem really good for, though, is auto-syncing preference files, data sets of stuff you want comfy access to wherever you are, breaking through NATs to sync stuff to home machines behind firewalls, and the like. One of the neater ideas I came across in the otherwise mostly uninteresting comments on this tips and tricks post was to set up a home machine to poll for torrent files dropped into some Dropbox-synced directory and start downloading them (to another directory, presumably), instead of doing the same via ssh.

The free plan covers 2GB data stored (plus keeping 30 days of backup history), and if you sign up through a referral link (here's mine), both signee and referrer get a quarter-gig extra quota.

Their iPhone application delivers browsability of the files you sync (your own and those others share with you -- and it should be noted that "sharing" currently implements read/write access, only, so you'll want to keep backups and/or trust sharees as you trust yourself) and lets you micromanage pictures into your on-phone picture album one by one, lets you play music and video, but not add them to your on-phone music library.

Similarly, it lets you micromanage a picture at a time back from the device photo library to cloud storage (and connected computers), or make (now read-only), copy and email urls to any of the files in your dropbox, instead of mailing them as large attachments.

I am not surprised that it doesn't much address the main pain points of the major data interop inconvenience that is the iPhone (App store terms probably don't allow them to), but I was more than a little surprised that it doesn't yet measure up to what a normal rsync does for typical machine-to-machine file transfers. It does a good job as a dual-direction sync feature for basic data between yourself and non-technical friends and relatives, though.

2010-07-02

Stop gif animations in Chrome with escape

It occurred to me that one of the basic browser features still missing in Chrome, to turn off gif animations as you hit the escape key, ought to be implementable as a tiny user script through canvas:

document.addEventListener('keydown', freeze_gifs_on_escape, true);

function freeze_gifs_on_escape(e) {
  if (e.keyCode == 27 && !e.shiftKey && !e.ctrlKey && !e.altKey && !e.metaKey) {
    [].slice.apply(document.images).filter(is_gif_image).map(freeze_gif);
  }
}

function is_gif_image(i) {
  return /^(?!data:).*\.gif/i.test(i.src);
}

function freeze_gif(i) {
  var c = document.createElement('canvas');
  var w = c.width = i.width;
  var h = c.height = i.height;
  c.getContext('2d').drawImage(i, 0, 0, w, h);
  try {
    i.src = c.toDataURL("image/gif"); // if possible, retain all css aspects
  } catch(e) { // cross-domain -- mimic original with all its tag attributes
    for (var j = 0, a; a = i.attributes[j]; j++)
      c.setAttribute(a.name, a.value);
    i.parentNode.replaceChild(c, i);
  }
}

It mostly works, though for gif images loaded from another domain, we're unfortunately still out of luck. I hope Chrome will soon offer an extension flag for doing privileged canvas operations, such as drawImage, for an image loaded from another domain, like here.

That privilege could even involve a manual extension review process in the Chrome extension gallery, for all I care; it is jarring that we can't fix user experience bugs like this due to the enforced security model.


Edit: As suggested in the tip below, we don't really need to .toDataURL the image, although that gives the best results on pages that apply css styling to img tags, which the canvas tag won't inherit. The script has been updated to work everywhere; direct install link here.