2007-03-07

Firefox content type bugfix extension

The not much heard of Open in browser extension is a great improvement over the Firefox deficiency of disallowing user overrides to the Content-Type header passed by web servers. The Content-Type header, if unfamiliar, is what states the data type of files you download, so that your browser can pick a suitable mode of decoding (if it is a JPEG picture, use the native JPEG decoder, for instance, while showing text files as text) and presenting the data.

Web server admins are people too that make mistakes, or occasionally have weird ideas contrary to yours about how to present a file (prompting with a Save As... dialog for plain text, HTML, or images, most frequently), instead of showing it right in the browser, and a native Firefox lets them rule you. This extension grants you the option of choice.

It presently (v1.1) does not allow you to specify any legal content type override, but it handles the basic xxx/yyy types, while considering "text/plain; charset=UTF-8", for instance, illegal. This seems to be a common misconception about Content Types (or MIME types, as they are also commonly called), which I would like to see fade away. interested in references, §14.17 of RFC 2616 (HTTP) states that the leading type/subtype declaration may be followed by any number of {semicolon, attribute=value pair} blocks, so if you are tempted to do validation on legal content type declarations, for some reason, don't disallow those.

Excerpt of the relevant ABNF, if you want to generate a proper grammar validator:

media-type     = type "/" subtype *( ";" parameter )
type = token
subtype = token
parameter = attribute "=" value

attribute = token
value = token | quoted-string
token = 1*<any CHAR except CTLs or separators>

CHAR = <any US-ASCII character (octets 0 - 127)>
separators = "(" | ")" | "<" | ">" | "@"
| "," | ";" | ":" | "\" | <">
| "/" | "[" | "]" | "?" | "="
| "{" | "}" | SP | HT

CTL = <any US-ASCII control character
(octets 0 - 31) and DEL (127)>

quoted-string = ( <"> *(qdtext | quoted-pair ) <"> )
qdtext = <any TEXT except <">>
<"> = <US-ASCII double-quote mark (34)>

TEXT = <any OCTET except CTLs,
but including LWS>
OCTET = <any 8-bit sequence of data>
CTL = <any US-ASCII control character
(octets 0 - 31) and DEL (127)>
LWS = [CRLF] 1*( SP | HT )

CRLF = CR LF
CR = <US-ASCII CR, carriage return (13)>
LF = <US-ASCII LF, linefeed (10)>
SP = <US-ASCII SP, space (32)>
HT = <US-ASCII HT, horizontal-tab (9)>

quoted-pair = "\" CHAR


If not, and you, say, want to do it with a Javascript regexp instead, here is a free regexp to choke on, if you really do want to do strict content type validity checking by (javascript) regexp, rather than, say, check for validity using a more lax variant, perhaps /.+\/.+/:

var validMIME = /^[^\0- "(),\/:-@\[-\]{}\x80-\xFF]+\/[^\0- "(),\/:-@\[-\]{}\x80-\xFF]+(;[^\0- "(),\/:-@\[-\]{}\x80-\xFF]+=([^\0- "(),\/:-@\[-\]{}\x80-\xFF]+|"(\\[\0-\x7F]|[^\\"\0-\x08\x0B\x0C\x0E-\x1F])*?"))*$/;

As you see, regexps are really horrible tools for doing this kind of thing, though, but with a bit of pain they can do the work. I'd suggest keeping a link to this page in a short comment in your code, should you adopt that monster, in case you will ever have any issues with it, or need to work out why it bugs out. Chances are your IDE does not let you mark up token semantics the way I did above.
blog comments powered by Disqus