Sunday 09 February 2003

Only a matter of time

I’d pretty much resolved to retool my individual archives using the anders/gord dirified file naming system when Michael from i·me·michael tossed a spanner into the works (in the form of a first-rate piece of expository writing masquerading as a comment):

While I think gord has an interesting point, and one with merit, I don’t believe it has any more merit than the current system of archiving. The analogy of the generic filing system, where memos are indexed with a seemingly meaningless number, fails to acknowledge the more realistic scenario where “Joanne” probably has an indexing key which unlocks the numerical meaning attached to the memos. Any good filing/archiving system has a method for filing, and a method of retrieval. The default method of sequential numbering is neither better nor worse than a system where archived pages have names that are in some way tied to the content contained within.

Back to square one, I thought to myself, just before I clicked on the email notification of Phil Ringnalda’s comment (since I read most of the comments on my posts in SpamKiller, I’d read Michael’s ahead of Phil’s).

I’m not going to push my approach either, even if it bears fruit, having already twice inspired you to do things that I’d rather not be doing myself (entry ids and .php), but I will note that a .htaccess file using mod_rewrite on my 499 entries doesn’t seem to cause any major server stress. That I’ve noticed. In light testing.

Phil shouldn’t feel defensive since I’ve already made it clear that I was perfectly happy with entry_id archiving until gord pointed out its shortcomings and, following Burningbird’s advice, I’m only using PHP on my index page—all my archives are .html pages (though I may introduce other PHP pages for special purposes).

Then Bill Humphries responded to John’s LazyWeb request with a hybrid mod_rewrite/PHP solution:

What if you have several hundred, or several thousand entries?

You don’t want to use mod_rewrite. This is because in the case of .htaccess, Apache has to read that file for every request. And even if you put the rules in httpd.conf, that becomes a large ruleset that uses memory and processor. The set of mod_rewrite rules should be small.

An intermediate step would be to write a PHP script that handles the redirection for us.
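A minimal sketch of the intermediate step Bill describes, written in Python rather than PHP purely for illustration: one catch-all rule hands every old numeric URL to a single script, which looks the entry ID up in a table and answers with a permanent redirect. The table contents, the `/archives/NNNNNN.html` pattern, and the function name are assumptions made for the example, not Bill's code.

```python
import re

# Hypothetical table: old entry ID -> new "dirified" path.
# In practice this would be generated from the weblog's entry list.
REDIRECTS = {
    1: "/2003/01/first_post",
    499: "/2003/02/only_a_matter_of_time",
}

# Assumed shape of the old numeric archive URLs.
OLD_URL = re.compile(r"^/archives/(\d{6})\.html$")


def redirect_for(path):
    """Return (status, location) for an old-style path, or (404, None)."""
    match = OLD_URL.match(path)
    if match is None:
        return 404, None
    target = REDIRECTS.get(int(match.group(1)))
    if target is None:
        return 404, None
    # 301 tells browsers and crawlers the move is permanent.
    return 301, target
```

The server configuration then needs only one rule pointing old URLs at the script, however many entries the table holds.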

But wait, there’s more! Jay Allen “just recently made the switch to smarter URLs away from the numerical form.” He explains how in a comment on Burningbird’s post, Don’t Touch That Button (in which she explained how to make the switch by creating a hashed database of redirects stored directly in the file system).

I don’t know how or when it will happen (I can’t use Alex’s method because my host doesn’t allow me to play with httpd.conf and I am drawn to Burningbird’s method because I like the idea of a “hashed database”), but I’m pretty sure I will make the switch to smarter URLs, despite Michael’s conclusion:

As I said in the beginning, gord’s point has merit, and it is neither better nor worse than the archiving schemes already in use, but I don’t personally see the value in switching to it.

Why switch in the face of a logical, persuasive argument? In some strange way, discovering that Phil is using dirified individual archive names may well be the clincher, in much the same way that I’m now holding off buying a Macintosh until Phil buys one. Even though he appears to be reluctant, Phil posts Mac-related entries often enough to suggest that he, like me, is teetering on the brink. (Assuming Phil does succumb, I’ll have to ask him to immediately email me the exact time and date he placed his order so that I can place mine 3.14159 days after.)

Because ultimately decisions like this are best made by balancing logic with emotion, intuition, and superstition.


Comments

Sure Jonathon, ping my site just when I'm in the middle of a redesign. Looks like I'll need to fix my trackback template earlier rather than later. :-)

Posted by john on 9 February 2003 (Comment Permalink)

A man's got to do what a man's got to do, but before you switch to "smarter" URLs, have a look at what the W3C says (http://www.w3.org/TR/2003/NOTE-chips-20030128/#gl1) about choosing URLs (or rather URIs in their terminology). Specifically, they point out that you shouldn't overload a URL with meaning. If you've got important metadata, the proper place for it is in metadata tags. The title belongs in the <title> tag, for example, not as part of the URL. The date probably belongs in a <meta> tag with DC.date as the value of its name attribute. Overloading the URL with metadata is a bad idea. Your current system probably works well enough, and follows W3C recommendations for URL creation. Best to leave well enough alone.

Posted by ralph on 10 February 2003 (Comment Permalink)

Exactly!

And by way of fixing that slightly broken URI (the parentheses did it), take it from the top of that section:

W3C | Understanding URIs - http://www.w3.org/TR/2003/NOTE-chips-20030128/#uri

Posted by michael on 10 February 2003 (Comment Permalink)

The CHIPs note has confused me twice now, but what jumped out at me this time through as being most germane to our issue was the example URI in http://www.w3.org/TR/2003/NOTE-chips-20030128/#gl4 where while discussing how a newsletter publisher should redirect from a "newest issue" URI to the (permalink) URI for the current issue, they give the example http://www.example.org/2042/02/12-newsletter

The bit about "Do not put too much meaning in a URI", coming as it does between two cites of "Cool URIs don't change," seems to me to be saying to only put in the meaning CoolURIs recommends, and no more, and that's creation date plus title. My use of /blog/ at the start of my URIs and .php at the end are both the result of my being too lazy to fight with my tools, but otherwise if TimBL had a blog I would expect the permalinks to be something like /2003/02(/01)/cool_uris.

There seems to be a bit of dissent within the W3C about whether URIs should be opaque or not, but since I rather like being able to back up from an entry's URI and have /blog/2003/02/ return a monthly archive, I left off the day (a day's worth of my entries not being worth much), but otherwise I think if you are looking for W3C guidance for your permalinks, you won't find much support for /archives/000001.php. What you do with that fact is up to you, of course.

One thing that did occur to me in favor of date + title permalinks: they should be easier to transfer from one program to another, or even just within the same program. My MT entry numbers were a bit off (1900 or so too high), thanks to some playing around with importing, and if I had to export and then import the export, I would have ended up with all new permalinks, since MT just assigns EntryID sequentially. Even without the gap at the start, aborted drafts would still mess up the numbering, but any weblog program that lets me design the URI and gets the date and title imported should be able to recreate my new URI scheme without any problem.
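To illustrate the portability point: a date + title permalink can be rebuilt from nothing but the entry's date and title, whereas a sequential ID depends on the order in which a particular installation created its entries. The sketch below is a rough approximation of a "dirified" title, not Movable Type's actual implementation, and the `/blog/YYYY/MM/` layout is assumed from the URIs mentioned above.

```python
import re
from datetime import date


def dirify(title):
    """Approximate a 'dirified' title: lowercase, no punctuation, underscores for spaces."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", title.lower())
    return re.sub(r"\s+", "_", cleaned.strip())


def permalink(posted, title):
    """Build a /blog/YYYY/MM/title path from the entry's date and title alone."""
    return f"/blog/{posted:%Y/%m}/{dirify(title)}"
```

Any program that imports the same dates and titles would produce the same paths, which is exactly what sequential entry IDs cannot promise.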

But two things really sold me. First, while I was playing around, I needed a link to the SimpleComments plugin entry. I knew I'd posted it, but I'd also posted a link to the previous, "Comments, please" entry. With the title in the link, I knew right off that I had the right one. Second, I got to thinking about people getting hits from my random blog entry in their referrer log, coming over to see what I said about them in entry number 2281, and then feeling duped when they saw what it really was. A referrer from .../random_blogs.php should be a bit more obvious.

Do I think that you (the general you) should change? Certainly not if you don't want to: you're not hurting me any with your entry number links. Am I happy I switched? Well, right up until I start to title a post with one of my usual long, rambling, too cute titles I am.

Posted by Phil Ringnalda on 10 February 2003 (Comment Permalink)

Phil, I think you're actually missing the point. It's an easy one to miss, and a subtle distinction, but the point is not "Do not put too much meaning in a URI", but rather, don't attach more meaning (in your thinking or design) to a URI than is necessary. The idea is to design a URI scheme that is as simple as possible in an effort to facilitate use, as well as ensure persistence.

Don't confuse a URI with directory structure or file names. They needn't be the same.

The newsletter example, while illustrating a good point, isn't actually germane to weblogs at all, since a resource (post/entry/archive, or what have you) is never pointed to from two different URIs. The main index page of any weblog is its own resource, and individual entry pages, as well as archive pages, are their own resource. If you had a link on your site for the latest entry, then the example would come into play because then you can create two URIs. One which might look like this:

http://www.philringnalda.com/blog/latest/

which would always point to the latest entry, and the current URI of the latest entry might, for example, look like this:

http://www.philringnalda.com/blog/2003/02/bloggerclonerediff

This still should not be confused with your directory structure, or file naming scheme, since that is designed and put in place for your administrative purposes. The actual file in this case, the latest entry, could be located here:

/home/philring/blog/entries/archive/200302-bloggerclonerediff.html

And I specifically used the path in the last example to illustrate the point that URIs aren't (necessarily) directory structures. They are in essence (or should be) a permanent alias for where the resource actually resides within the file system. The semantics of each can be separate and independent.
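The separation between URI and file location can be sketched as a small mapping function. This is only an illustration of the idea, using the example URI and file path from above; the pattern, the constant, and the function name are hypothetical.

```python
import re

# Public URI layout, taken from the example URI above.
ENTRY_URI = re.compile(r"^/blog/(\d{4})/(\d{2})/([a-z0-9_]+)$")

# Private storage location, taken from the example file path above.
ARCHIVE_ROOT = "/home/philring/blog/entries/archive"


def file_for(uri_path):
    """Translate a public entry URI into the file that currently holds it."""
    match = ENTRY_URI.match(uri_path)
    if match is None:
        return None
    year, month, slug = match.groups()
    return f"{ARCHIVE_ROOT}/{year}{month}-{slug}.html"
```

If the files are later moved or renamed, only `ARCHIVE_ROOT` and the return line change; the public URIs stay exactly as they were.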

The point about not creating a URI that is tied to specific information ABOUT the resource is that some of the information may change over time. Directory structures can change, file extensions can change, titles can change. A well designed URI scheme is unaffected by changes to file systems, applications, titles and the like.

Going back and re-reading gord's original post on the subject, I find myself leaning towards the idea of using the date and time method, since it's the least subject to change, is portable to future systems, and is somewhat semantic. I'd even leave out the redundant parts of his example URI and maybe simply use a URI like the following:

http://weblog.delacour.net/2003/02/05/1355h

Again, I think that's easily rewritten in mod_rewrite:

RewriteRule ^/([0-9]+)/([0-9]+)/([0-9]+)/(([0-9]+)[a-z]) archives/$3$2$1-$5$4.php

but that is strictly forward looking. Redirecting all the previous entries would still be a pain.
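For the forward-looking half, a date-and-time URI of the kind suggested above can be generated from an entry's timestamp and parsed back again. A minimal sketch, assuming the `/YYYY/MM/DD/HHMMh` layout of the example URI; the function names are hypothetical.

```python
from datetime import datetime

# Layout of the example URI path: /2003/02/05/1355h
PATH_FORMAT = "/%Y/%m/%d/%H%Mh"


def timestamp_path(posted):
    """Build the URI path for an entry from its posting time."""
    return posted.strftime(PATH_FORMAT)


def posted_from_path(path):
    """Recover the posting time from a URI path in the same layout."""
    return datetime.strptime(path, PATH_FORMAT)
```

Because the path carries nothing but the timestamp, it survives changes of title, file extension, and weblog software, though two entries posted in the same minute would collide.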

As always, the W3C documents are guidelines, and it's up to the webmaster to design schemes that work best for his/her application.

Posted by michael on 10 February 2003 (Comment Permalink)

I was too quick with my comment regarding the redirection example not being germane to weblogs. Tantek does this. His 'weblog' is referenced by the URI http://tantek.com/log/ but this redirects to his latest entries which are currently located at http://tantek.com/log/2003/02.html , so I'll recant my use of absolutes, and stick to prefacing comments with generalizations like 'usually', 'seems', and 'perhaps'.

Posted by michael on 11 February 2003 (Comment Permalink)

The first thing I did to MT was to make the archives /{date}/{title} and write a little hack that lets you link to a file without having the extension on it ("/2003jan08/whatever" instead of "/2003jan08/whatever.php"). I hate ugly URLs. Then I turned on Options +MultiViews in .htaccess. This has nothing to do with moving or redirecting, it's just how I personally link to archives. I think the {ID}.php is kinda lame since it's not particularly informative except to the program that generated the ID.

Posted by Phillip Harrington on 14 February 2003 (Comment Permalink)

This discussion is now closed. My thanks to everyone who contributed.

© Copyright 2007 Jonathon Delacour