2010-07-11

Google styleguides for JSON APIs

I just eyed through Google's dos and don'ts style guide for when exporting JSON APIs. Overall it's pretty good, ranging from the very basics of "abide by the JSON specification" (though stated at depth, presumably for the JSON illiterate, with all implications of what data types are available, what syntax is legal and the like) to how to do naming, grouping of properties, how to introduce and deprecate identifiers, what to leave out, how to represent what kinds of data, and so on.

It of course doesn't guarantee that the outcome will be good APIs (the JSON exported by Google Spreadsheets, at least in the incarnations I peeked at some years ago, was an absolutely horrible auto-translation from XML that even mangled parts of the data due to its imperfect representation of a data grid, for instance), but it prevents does protect against many needless pitfalls.

Time


Not all of its tips are great, though. The rest of this post is a rant about time, and how it's more complicated than you think (unless you have ever run across this). I specifically want to warn about its recommendation on Time Duration Property Values (my emphasis that it's talking about amounts of time rather than timestamps, which ISO 8601 is great for), which it suggests to be encoded ISO 8601 style. Example (comments are of course not part of the output):

{
  // three years, six months, four days, twelve hours,
  // thirty minutes, and five seconds
  "duration": "P3Y6M4DT12H30M5S"
}

Don't do this!

That is a daft idea and/or example. Think about it for a moment. If you truly want to convey the length of a duration for a period of three years, six months, four days, twelve hours, thirty minutes and five seconds, the total number of seconds of that should be computable with perfect precision, right? To this application, after all (whatever it is) -- the number of days, hours, minutes -- and even those last five seconds -- are significant, so we should get them right.

Here be dragons. Human readouts of time durations like the one above don't convey that information. If you talk about durations, you have to pick one, well-defined unit of time (or -- less usefully -- a number of units that translate to each other by well-defined rules, requiring no additional data inputs) of time, and stick to that one.

I'd recommend picking either days, seconds, milliseconds or nanoseconds as your (one!) unit of choice, dependent on what kind of duration you represent and what kinds of likely uses it has. Declare the unit (so the property name suggests the unit, if you're kind) and stick to integers.

Because the number of seconds in three years, six months, four days, twelve hours, thirty minutes and five seconds depends on when you start counting, and/or what you mean by "year", "month", "day" and "hour" (earth-centric time is complicated -- some minutes even have more than 60 seconds).

Typically, it's in reference to some specific reference time, from which to increment the year by 3, the month number by six, the date by four days, and finally add another 12h, 30m and 5s. But we didn't get a reference time; we just have a duration. It's like a vector denoting a coordinate in a coordinate system. You can't tell what it points at, without knowing where it points from.

And humans happily think up some well-defined case like counting from midnight this January 1st, finds that it works out to becoming 2013-06-04 12:30:05 after adding, and maybe even computes it to 108041406 seconds total and believes it's a well-defined amount of time. It just isn't; those six months only just turned out to be 182 days because 2013 isn't a leap year. If we had started counting from 2009 and ended up on 2012-06-04 12:30:05, they would have wound up 183. And if we had started counting from March 1st instead of January 1st, they would have been 185. No matter how you see that, we've suddenly got all this seemingly second-precision duration -- with a fuzz margin of plus or minus a day and a half -- a range of more than 250,000 seconds, which compares most unfavourably to advertised second-precision.

And if you decide that your years are a special 365 * 24 * 60 * 60 seconds long, then ten years from now won't be July 11th, but July 8:th. Humans might disagree.

So if you have want to represent a duration of time, and you want the API consumer to know how long that duration is, pick a unit and pass an integer. And if, say, your first API version had a timestamp of the start of something and you want API v2 consumers to be able to tell how long after that it completed, pick a unit and pass an integer. And if you have good reason to believe that the consumer (maybe a human) is interested in the end time, pass start and stop timestamps. Programs consuming your JSON will have access to date functions that can compute the amount of time between them, or what date and time it will be after a fix number of given time units from a reference time.

Thank you for taking your time to indulge in thinking and talking about time over JSON APIs.

2 comments:

  1. I understand your argument, but I don't agree that it's *always* better to express durations as a simple number of seconds. That overlooks the simple reality that complicated Earth time really is understood by all sorts of existing human systems. Consider that a duration like "10 business days" is in common use and is commonly understood, and it is most definitely not a fixed number of seconds. That the actual duration-in-microseconds of such a duration cannot be determined until a starting point is chosen is also implicitly understood.

    It really depends on the semantics of the particular duration being communicated. Sometimes, duration-in-microseconds is clearly correct, but sometimes it's not.

    ReplyDelete
  2. Johan SundströmMon Jul 12, 09:43:00 AM PDT

    The point of my post is not to use a string that needs parsing (and guessing!) to decipher its meaning, but to use an integer representing amount of a given unit -- not that that unit be seconds. To represent a period in business days, put businessDays: 10.

    As long as the underlying semantics of the API makes it clear what it applies to ("this service, once initiated, will take this number of business days to complete", for instance), it is of course irrelevant whether it can be translated to some other particular unit or not.

    The danger of the string-encoded example is that it might seem to be highly precise to a casual glance, while it actually just ships highly indeterminate data, by its broken design.

    ReplyDelete

Limited HTML (such as <b>, <i>, <a>) is supported. (All comments are moderated by me amd rel=nofollow gets added to links -- to deter and weed out monetized spam.)

I would prefer not to have to do this as much as you do. Comments straying too far off the post topic often lost due to attention dilution.

Note: Only a member of this blog may post a comment.