2006-08-11

IIS your server well tuned?

For fun, and to scratch some web site usability itches, I have been playing with two IIS-backed web sites. While I believe my readership doesn't overlap much with people who run their web sites on IIS, I hope this could be useful to someone who ends up here by way of a search engine. These are two tips, hoping to improve the world in the very slightest sense, offering best-practice thoughts that easily slip past IIS site developers.

Drop the needless X-Powered-By: ASP.NET header!

With default settings, an IIS server running ASP.NET will happily announce this fact to a less-than-caring web browser in every response. It does so not only in the standard HTTP Server header but also in a special X-Powered-By: ASP.NET header. That adds needless bytes to every response, for every visitor of your site, for the sole purpose of feeding the ego of some Microsoft employee, committee or other source of duhcision making. HTTP is a talkative protocol as it is; you don't need to feed it additional payload for the sake of proving a point. Turn it off.
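To put a rough number on "needless", the sketch below counts the bytes the header line takes on the wire in an uncompressed HTTP/1.x response. It is a back-of-the-envelope illustration only. The daily response count is a made-up figure, and `headerLineBytes` is a hypothetical helper.

```javascript
// Bytes one header line adds to an uncompressed HTTP/1.x response:
// "Name: value" followed by CRLF. Assumes ASCII, so one char = one byte.
function headerLineBytes(name, value) {
  return `${name}: ${value}\r\n`.length;
}

const perResponse = headerLineBytes("X-Powered-By", "ASP.NET");
console.log(perResponse); // 23

// Hypothetical traffic figure, purely for illustration.
const responsesPerDay = 1000000;
console.log(perResponse * responsesPerDay); // 23000000
```

That is small per response, but it is paid again on every single one, and it buys the visitor nothing.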

URLs are case sensitive!

I know, you don't see much of this as an IIS developer, because in Microsoft land URLs are case insensitive. To IIS they are, too. This is probably a trade of worse performance for customer web sites against a lower toll on Microsoft support lines: paths on Windows file systems have historically been case insensitive, and users expected the same of URLs.

Anyway, the bad thing about treating URLs as if they were case insensitive is that you get no visual indication of anything being wrong when you mix and match the capitalization of URLs on your site as you extend it. You end up linking to the "read post" page as /blog/readpost.aspx in one view and /Blog/ReadPost.aspx in another. Most likely you also link to /blog/ReadPost.aspx and /Blog/readpost.aspx through relative links from the same directory that omit the leading path segment. (When a visitor follows a link to /blog/readpost.aspx and that page links adjacent posts with the ReadPost.aspx capitalization, following one of those links lands them at /blog/ReadPost.aspx.)
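The parenthetical above is just relative URL resolution at work. A relative href keeps its own casing and inherits the casing of the directory part of the page it sits on. The small illustration below uses the standard WHATWG `URL` class, which is available in browsers and Node.js. The host name and post IDs are made up.

```javascript
// The base page URL decides the directory part; the relative href decides
// the file name. Neither side's casing is normalized.
const fromLowercasePage = new URL(
  "ReadPost.aspx?PostID=42",
  "http://example.com/blog/readpost.aspx?PostID=41"
);
console.log(fromLowercasePage.href);
// http://example.com/blog/ReadPost.aspx?PostID=42

const fromCamelCasePage = new URL(
  "readpost.aspx?PostID=42",
  "http://example.com/Blog/ReadPost.aspx?PostID=41"
);
console.log(fromCamelCasePage.href);
// http://example.com/Blog/readpost.aspx?PostID=42
```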

Multiply that by the number of capitalizations used for the query parameters, i.e. a random number of variants of UserID, PostID and date, for instance. You end up with a huge number of different URLs that all point to the same resource: a specific post, reached through a combinatorially explosive number of same-name, different-case aliases.
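Here is the arithmetic behind "combinatorially explosive". Every independently cased part of a link multiplies the number of aliases, and the theoretical ceiling is two casings per letter. The sketch uses hypothetical variant counts: the four path spellings above, plus two spellings each for three parameter names.

```javascript
// Number of distinct URLs when each part of the link comes in
// independently chosen spellings: the product of the per-part counts.
function aliasCount(variantsPerPart) {
  return variantsPerPart.reduce((product, n) => product * n, 1);
}

// Theoretical ceiling for one string: each ASCII letter can be either case.
function maxCasings(text) {
  const letters = text.replace(/[^a-z]/gi, "").length;
  return 2 ** letters;
}

// Hypothetical: 4 path spellings, 2 each for UserID, PostID and date.
console.log(aliasCount([4, 2, 2, 2])); // 32
console.log(maxCasings("readpost.aspx")); // 4096
```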

IIS will not care about the case discrepancy, BUT HTTP DOES.

Why? For two reasons: caching, and browser history. HTTP and your browser agree that the path and query parts of a URL are case sensitive. When you load /blog/readpost.aspx and then /Blog/ReadPost.aspx, they are different URLs with no assumption of being the same document. So there will be no attempt at serving the second page from the cache entry your browser stored for the first. Had the same casing been used, the second page load would get a short 304 Not Modified response without the page body payload (made slightly less short by the extraneous header above), and the page would be served straight from the browser's cache. Great! Less load on your server!
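The toy model below illustrates the caching argument. It assumes a browser cache keyed by the URL's path and query exactly as written. The function `plannedResponse` and the stored entry are hypothetical, and the page is assumed unchanged since the first visit. A real browser also weighs freshness headers before deciding to revalidate at all.

```javascript
// Toy browser cache: keys are URL paths plus queries, compared as exact strings.
const cache = new Map();

// What the second page load costs, given what is already cached
// (assuming the page has not changed on the server in the meantime).
function plannedResponse(url) {
  if (cache.has(url)) {
    // Conditional GET (If-Modified-Since / If-None-Match): an unchanged
    // page comes back as headers only.
    return "304 Not Modified";
  }
  // No entry under this exact key: plain GET, full page body.
  return "200 OK (full body)";
}

// First visit, via a lowercase link.
cache.set("/blog/readpost.aspx?postid=42", "<html>...</html>");

console.log(plannedResponse("/blog/readpost.aspx?postid=42")); // 304 Not Modified
console.log(plannedResponse("/Blog/ReadPost.aspx?PostID=42")); // 200 OK (full body)
```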

Furthermore, there is browser history. Good web sites employ style sheets that differentiate visually between visited and non-visited links, typically with different shades of the link color (or they serve no styling information at all, leaving that to browser defaults). Great for usability: it's easier for your visitors to overview what content they have already read and concentrate on the rest as they explore your site further. There is no need to click through to yesterday's blog post again just because you didn't recognize the title, only to recall it on seeing the contents and realize you have wasted your time.

This visualization also can't tell that different URLs are really the same resource. The same blog post, linked from a view predisposed towards CamelCase, will not be recognized as the one you visited yesterday through an all-lowercase link, and the visual cue won't be there. So the view you came from yesterday will say you had read the post, but the view you see today won't. Browse a few posts from today's view, then load the view you came through yesterday, and only one post will show as read there. It's all very confusing to your users, but IIS will make sure the pages load. You won't see that your server draws combinatorially more bandwidth than it needs to because of all the different ways of ending up at the same post. And to save face, your web designers might just as well end up hiding the visual styling of visited and unvisited links, since it seems to break and confuse rather than help, for some weird reason nobody can quite understand.
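The history half of the argument works the same way. This toy model assumes visited-link styling matches the exact URL string. Two hypothetical views list the same four posts with different casing. One post was read yesterday via view A, and three were read today via view B.

```javascript
// Toy model: a link shows as visited only if its exact URL is in history.
const visited = new Set();

// Yesterday: post 1, reached through a lowercase link in view A.
visited.add("/blog/readpost.aspx?postid=1");

// Today: posts 2 to 4, reached through CamelCase links in view B.
for (const id of [2, 3, 4]) {
  visited.add(`/Blog/ReadPost.aspx?PostID=${id}`);
}

// Both views list posts 1 to 4, each with its own casing habit.
const viewA = [1, 2, 3, 4].map((id) => `/blog/readpost.aspx?postid=${id}`);
const viewB = [1, 2, 3, 4].map((id) => `/Blog/ReadPost.aspx?PostID=${id}`);

const shownAsRead = (view) => view.filter((url) => visited.has(url)).length;

console.log(shownAsRead(viewA)); // 1, although four posts have been read
console.log(shownAsRead(viewB)); // 3
```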

But now, at least, you will!

I have been playing with using just user CSS to improve web sites, but this fallacy of random link capitalization made my "visited" markers unreliable. (A ✓ prefixed to visited links, with non-breaking space characters filling the same column for unvisited links, makes for a great instant overview in vertical lists of which posts are read and which are not.) Users of the Stylish extension might want to have a peek at the source code of my checkmark-for-read-posts user CSS. To be really good with sites like these, though, it probably takes a full user script that case-normalizes all links on the site first, which I haven't quite gotten around to doing yet. I expect downcasing site-internal paths and query parameter names might make a perfect solution.
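Below is a minimal sketch of that downcasing idea. It assumes the server really does treat paths and parameter names case-insensitively (as IIS does), while parameter values may be case sensitive, so values are left alone. `normalizeUrl` is a hypothetical helper, not the user script mentioned in the comments. It only touches links on the given origin, so external, `javascript:` and `mailto:` links pass through unchanged.

```javascript
// Case-normalize a site-internal link: lowercase the path and the query
// parameter *names*; keep parameter values and the #fragment as they are.
// siteOrigin is e.g. "http://example.com" (no trailing slash), matching
// the format of URL.prototype.origin.
function normalizeUrl(href, siteOrigin) {
  const url = new URL(href, siteOrigin);
  if (url.origin !== siteOrigin) {
    return href; // other hosts, javascript:, mailto: and so on
  }
  url.pathname = url.pathname.toLowerCase();
  if (url.search) {
    const pairs = url.search
      .slice(1) // drop the leading "?"
      .split("&")
      .map((pair) => {
        const eq = pair.indexOf("=");
        return eq === -1
          ? pair.toLowerCase()
          : pair.slice(0, eq).toLowerCase() + pair.slice(eq);
      });
    url.search = "?" + pairs.join("&");
  }
  return url.href;
}

console.log(
  normalizeUrl(
    "http://example.com/Blog/ReadPost.aspx?PostID=42&UserID=AbC#Top",
    "http://example.com"
  )
);
// http://example.com/blog/readpost.aspx?postid=42&userid=AbC#Top
```

In a user script, this would run over every link on the page, for instance `for (const a of document.links) a.href = normalizeUrl(a.href, location.origin);`, before the visited styling has anything to match against.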

I have started seeing these "checkmark for read pages" markers appear on blogs recently, which is nice. So far I have only seen them as suffixes to links. Those are harder to overview in a flat list, where the right edge is ragged, than the variant that adds a separate column for the visited / non-visited marker, but it's a good start. And if you don't want to meddle with Unicode, you can of course give visited links a different background, perhaps a nicer graphic of some kind. I suggest using some left padding to give the image room of its own, rather than cluttering up the link text with the "background" imagery.

5 comments:

  1. This might take care of the case insensitivity - in a javascript you load on all pages add this to be done onload:

    l=document.links;
    for(var i=0;i<l.length;i++){l[i].href=l[i].href.toLowerCase()};

    Now all requests to the server will be in lowercase only. The server will still find the page regardless of how its real name is cased - but the browser will register visited links in a coherent way.

    I don't use IIS servers so I haven't tested this...

    The servers at Studentlitteratur - where I work - run a "cleansing script" on every page after it has been built to detect anomalies in urls, make the code more compact, cache parts of pages etc - this might be a way to go too.

  2. Nor do I, actually, but I am a user of several such sites. Lowercasing the entire URL is a near-certain recipe for disaster, as query parameter values are not necessarily case insensitive. Also, javascript: URLs are not case insensitive at all.

    I think I'll try running a user script that tests how safe this kind of URL cleansing is, though (taking such relevant aspects into account), and whether it might prove useful, or actually destructive. (Limiting it to http pages on IIS servers only.)

  3. You're right (oh how I hate being wrong ;-)...

    Well, this would lowercase only the pathname and query parameter names - not query values, javascript: URLs or the hash/anchor part of the URL:

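    // Lowercase each link's path, then each query parameter name.
    var l=document.links;
    for(var i=0;i<l.length;i++){
      l[i].pathname=l[i].pathname.toLowerCase();
      // search starts with "?", which toLowerCase leaves alone
      var s=l[i].search.split("&");
      for(var j=0;j<s.length;j++){
        var t=s[j].split("=");
        t[0]=t[0].toLowerCase(); // name only; values keep their case
        s[j]=t.join("=");
      }
      l[i].search=s.join("&");
    }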

  4. Well, I'm not sure I'd call it wrong, just not perhaps something within safety guarantees. You're somewhat trigger happy, though; my response already links a completed user script that implements the above. :-)

    My experiences from running it so far point out two other misfeatures with one of the two sites that prompted the experiment. First, a whole slew of query parameter values list randomly cased GUIDs (the long ugly dashed hex clusters within curly braces). Second, encasing them in curly braces appears to be optional, which is also something one page does differently from another.

    I think that approaches my limit for how much server-side breakage it is an interesting challenge to mend client-side, rather than just calling it quits, but I might try playing with some heuristics first.

    If you find yourself writing a lot of string dribbling code, by the way (which I believe you do), you should try to pick up regexp dribbling. Especially if you learn when to stop using them, regexps can make code a whole lot quicker and more readable at the same time.

  5. Sorry for being trigger happy :)...

    I do use reg exps for serious string dribbling - though I'm a bit burned from my Perl days back in the 90s - trying to maintain code where every other line is an intricate reg exp is really no fun a year or a programmer later ;-)

    The last two years or so I find myself often reusing code snippets between JavaScript and ActionScript - this has also made me stay off reg exps a bit.

    Having said that, your solution to this problem, although similar to mine, benefited in speed from your reg exp.

    Really hate those ugly hex clusters, so I think I'll call it quits too.

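The fourth comment mentions randomly cased GUIDs with optional curly braces. A heuristic could fold those into one canonical spelling too. The sketch below rests on loud assumptions: that the server accepts the lowercase, brace-less form, and that any query value shaped like a GUID (with braces literal or %7B/%7D-encoded) really is one. `canonicalizeGuidsInQuery` is a hypothetical helper.

```javascript
// A GUID value, optionally wrapped in literal or percent-encoded braces.
const GUID_VALUE =
  /^(?:\{|%7B)?([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})(?:\}|%7D)?$/i;

// Rewrite GUID-looking query values to lowercase without braces; all other
// values, and all parameter names, pass through unchanged.
function canonicalizeGuidsInQuery(search) {
  if (!search) return search;
  const pairs = search
    .replace(/^\?/, "")
    .split("&")
    .map((pair) => {
      const eq = pair.indexOf("=");
      if (eq === -1) return pair;
      const match = GUID_VALUE.exec(pair.slice(eq + 1));
      return match ? pair.slice(0, eq + 1) + match[1].toLowerCase() : pair;
    });
  return "?" + pairs.join("&");
}

console.log(canonicalizeGuidsInQuery("?id={3F2504E0-4F89-11D3-9A0C-0305E82C3301}&q=AbC"));
// ?id=3f2504e0-4f89-11d3-9a0c-0305e82c3301&q=AbC
```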
