Thursday 30 January 2003

Archive organization re-viewed

Dorothea Salo takes Mark Pilgrim to task over what she regards as his blog’s unnecessarily complex archive structure:

Maybe it’s that I read too damn fast, but individual-post archives never fail to annoy me. When I’m trying to catch up on a blog, as for example after my trip to Indiana a while back, I hate having to click on individual post links. Hate it hate it hate it. Larger date-based chunks, please.

Am I all alone here, or does this irk anyone else?

Actually, you’re not alone here, and it does irk someone else. Me. Though perhaps not in the way you anticipated.

Firstly, let’s file Dorothea’s complaint under Pot Calls Kettle Black or, alternatively, Glass House Dweller Throws Stones, given that she lavishes praise on her own archiving method, which happens to be the worst of all possible choices: the execrable weekly archive. To paraphrase: “When I’m trying to find a particular post in a blog, I hate having to click on weekly archive links. Loathe it, detest it, abhor it, despise it, feel revulsion towards it, am repelled by it, cannot stomach it, find it intolerable, hate its guts.”

Larger date-based chunks, please? Certainly. How about a delicious, nutritious, more substantial serving of monthly archive? With side orders of individual archive and category-based archive? In other words, something for every taste. Unless you like the taste of vomit… in which case, since I don’t offer weekly archives, I can’t help you.

Gee, Jonathon, I get the feeling you don’t care for weekly weblog archives. Why is that?

Because the purpose of an archive is (or should be) to provide easy access to past weblog entries. And weekly archives make it far more difficult than it needs to be.

Dorothea uses the example of wanting to catch up on a weblog after being out of town for a few days, noting that with many Blogspot blogs she can only do that by “hitting the archives.” I admit that weekly archives meet that need perfectly: click on the most recent weekly archive link to read the latest posts. But doesn’t that suggest that weekly archives are useful only for a negative reason? Because they allow you to route around problems caused by a third-rate blogging tool and/or hosting service?

A far more common use of a weblog archive is to find a post that’s fallen off the main index page. If you’re lucky, the weblog offers a searchable archive (Movable Type does this particularly well) and, as long as you can recall a keyword or two, you should have no trouble finding the post you want.

But let’s imagine you’re in the unhappy situation of looking for an entry in a weblog that offers no proper search facility and nothing but weekly archives. (You could, of course, do a Google search on “keyword” but since Dorothea has framed her argument in user interface terms, by comparing mouse clicks, let’s deal with it accordingly.)

The item was posted a month or two ago, in November or December last year. So you click on the link for 12/01/02 - 12/07/02 (another source of irritation, which I’ll get to later) and either scroll through the entries or do a browser Find on a keyword. Nothing. Now you have the choice of trying either 12/08/02 - 12/14/02 or 11/24/02 - 11/30/02. You try them both. Still nothing. And so it goes until you finally locate the entry you were after.
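To put rough numbers on that hunt, here is a minimal Python sketch that counts how many archive pages cover November and December 2002. It assumes Sunday-to-Saturday weekly pages (as in the 12/01/02 - 12/07/02 link above); the function names `weekly_pages` and `monthly_pages` are made up for the illustration and belong to no blogging tool.

```python
from datetime import date, timedelta


def weekly_pages(start: date, end: date) -> list[date]:
    """Return the Sunday that opens each weekly archive page overlapping [start, end]."""
    # Step back to the Sunday on or before `start` (Monday is weekday 0, Sunday is 6).
    sunday = start - timedelta(days=(start.weekday() + 1) % 7)
    pages = []
    while sunday <= end:
        pages.append(sunday)
        sunday += timedelta(days=7)
    return pages


def monthly_pages(start: date, end: date) -> list[tuple[int, int]]:
    """Return the (year, month) of each monthly archive page overlapping [start, end]."""
    pages = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        pages.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return pages


weeks = weekly_pages(date(2002, 11, 1), date(2002, 12, 31))
months = monthly_pages(date(2002, 11, 1), date(2002, 12, 31))
print(len(weeks), "weekly pages versus", len(months), "monthly pages")
```

Under those assumptions the two months spread across ten weekly pages but only two monthly ones, so the worst case is ten clicks rather than two.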

If the weblog had instead offered monthly archives, you could have found the post in a fraction of the time. And you wouldn’t have had to deal with the wretchedly unintuitive—to those of us who are not American—MM/DD/YY date format. When I see 12/01/02, I automatically think January.
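The ambiguity is easy to demonstrate. The sketch below parses the same archive label under both conventions using Python's standard `datetime` module; the variable names are only illustrative, and the abbreviated month name produced by `%b` depends on the locale in use.

```python
from datetime import datetime

label = "12/01/02"

# American reading: month first, then day.
american = datetime.strptime(label, "%m/%d/%y").date()

# The reading most of the rest of the world would assume: day first, then month.
elsewhere = datetime.strptime(label, "%d/%m/%y").date()

# A DD MMM YY label names the month, so it can only be read one way.
unambiguous = american.strftime("%d %b %y")

print(american.isoformat(), elsewhere.isoformat(), unambiguous)
```

The first reading gives 1 December 2002 and the second 12 January 2002: the same eight characters, nearly eleven months apart.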

To her credit, Dorothea displays her weekly archives in a DD MMM YY format that’s easy to understand (allowing for the mild confusion of some Latin abbreviations). And by making her weblog searchable and also providing category-based archives, she makes up for the deficiencies of her weekly archives.

I guess it comes back to what you regard as the natural unit of blogging discourse. I see it as the individual post, which is why I base my permalinks on individual entries. Since I allow comments, this enables me to combine post and associated comments on a single archive page. And I provide a proper search plus monthly and category archives. But not weekly archives. Never.

Because… weekly archives stink.

I agree with you, Jonathon. I also prefer individual archive links for the sake of future searchability. A weekly (or longer) archive contains so many words that search engines match it against bizarre keyword combinations, leaving both the searching user and yourself (paying for the bandwidth) disappointed.

More info on search engines and Movable Type:

Posted by: andersja on 30 January 2003 at 08:59 PM

Since we're on the subject, Jonathon, I've recently tried to hunt down posts from you in your category archives, and have been chagrined to note that they are ordered temporally, but from earliest to latest. This, I have repeatedly discovered, well thought out as it may be, annoys the heck outta me.

Just thought I'd mention it.

Posted by: stavrosthewonderchicken on 30 January 2003 at 09:52 PM

I have received correspondence as well to the effect that my monthly and categorical archives are ordered chronologically rather than in reverse chronological order. I don't know, but it makes sense to me that when I look at the page for a whole month, it goes from the first day to the last day.

That said, I also was indecisive over how best to archive my blog. What I settled on was individual, monthly and categorical archives, with the monthly and categorical being listed with one entry on each line and some metadata about the entry on the line as well. This does not solve her clicking issue as you would still have to click, read, back, click, read . . . But, in my individual entry archives you can go to the next post or the previous post with one click. So to catch up, you go to the last post you remember and keep your mouse over the previous link and just keep clicking, once per entry.

Posted by: gord on 30 January 2003 at 10:38 PM

I find myself agreeing with you on all points. I had fun a couple of weeks ago looking through your archives and found them very easy to navigate through. Once over the initial oddness, I found the chronological order very useful. Whilst reverse on the front page makes sense, when you are looking through an archive, you don't need the freshest post at the top. [I've just realised that my archival system is woeful and I'm off to fix it.]

Posted by: Paul Freeman on 30 January 2003 at 11:13 PM

OK, I'm in a really foul mood this morning, so I shouldn't be replying to this at all. But...

Dorothea, my site structure is based on the assumption that there are 3 classes of people:

1. people who regularly visit my site every day, and want to read everything I write.
2. people who don't regularly visit my site every day, and want to read only certain things I write.
3. people who don't regularly visit my site at all, and come via search engines looking for a specific piece of information.

Group 1 is satisfied, and well-served, by my current structure, because my home page is as small as possible.

Group 2 is satisfied, and well-served, by my current structure, because they can read excerpts of recent posts (at the bottom of the home page, in category archives, in daily archives) and decide whether they want to read further.

Group 3 is satisfied, and well-served, by my current structure, because (as others have noted) individual archive pages minimize the chances of false positives that result from mixing keywords between posts.

You are apparently in group 4: people who don't regularly visit my site every day but want to read everything I write anyway. You are not well-served by my current site structure, and I see no way to serve you better without harming group 1, 2, or 3, who, frankly, are more important than you are.

If keeping up with every last word is really that important to you, I suggest you get a news aggregator. Browsing manually is so 20th century.

Posted by: Mark on 30 January 2003 at 11:51 PM

I suspect I came across as rather more annoyed than I actually am. My annoyance is mostly directed at Phoenix, Mark, not you. In any case, I apologize unreservedly for causing offense.

Jonathon, I used to have monthly archives, and for people who don't run off at the keyboard the way I do I agree that they're the best solution.

Unfortunately, monthly archive files for me can get freakin' *huge*, enough to be a concern for people on low-speed connections (like, say, me at home). The proliferation of single-post files bothers me just enough so that I prefer the weekly archive format.

I do notice that for some reason my weekly archive files don't contain pointers to next and previous. *That* is a *problem*, and I shall fix it forthwith. Well, this weekend, probably.

Posted by: Dorothea Salo on 31 January 2003 at 12:04 AM

Mark: your three categories ignore one important class of visitor.

4. The first-time visitor who arrives at a page on your site through a link, Google search, or serendipity; likes what they see; and wants to read more of your writing.

I fell into that category last November through a Python-related link. I enjoyed your writing, and now fall into category 1. But when I was in "this is interesting, what else is here" mode ("Dive Into Archives"?) I found navigating through past history via the monthly calendar pages enormously frustrating.

I imagine us Hungry Gulping Noobs are quite common: certainly my first instinct, on stumbling on something interesting, is to look around for what's around and what's passed before.

Posted by: James Kew on 31 January 2003 at 05:56 AM

I must admit to being partial to having as much information as possible onscreen at any one time, but not too much. But, of course, only the information I'm interested in. A tricky dynamic, indeed.

I agree that monthly archive pages can get large and unwieldy. And most default calendars don't provide any information other than whether or not something was posted on a given day.

A while back, I borrowed the format that Mark had borrowed for displaying a month's worth of posts at a time. I still think it's a pretty effective way to browse a large amount of content with a small amount of bandwidth. You still need to drill down to an individual post, but, depending on how you've formatted your individual pages, you can still provide lots of jumping-off points for other content.

Posted by: RKB on 31 January 2003 at 06:18 AM

One of the reasons I stick (however laboriously) with hand-coding is the ability it gives me to decide the archive boundary myself based on number of screens or size of file rather than on date. (Bundling archives by arbitrary calendar marks reminds me of those awful tools that try to mimic paper-publishing with middle-of-a-sentence page breaks.)

Another is that, like gord, I prefer my archived entries re-ordered earliest-to-latest, mostly because I serialize essays so often.

It sounds as if MovableType can handle the second requirement. Does any automated tool handle the first yet?

Posted by: Ray on 31 January 2003 at 07:37 AM
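A minimal Python sketch of the size-based archive boundary Ray describes, offered as an illustration only: the function name `chunk_by_size` and the byte limit are assumptions, and it reflects neither Ray's hand-coded process nor anything Movable Type provides.

```python
def chunk_by_size(posts: list[str], max_bytes: int) -> list[list[str]]:
    """Group posts, in their original order, into archive pages of at most max_bytes.

    A single post larger than max_bytes still gets a page to itself,
    so no post is ever split in the middle.
    """
    pages: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for post in posts:
        size = len(post.encode("utf-8"))
        # Close the current page if adding this post would push it over the limit.
        if current and current_size + size > max_bytes:
            pages.append(current)
            current, current_size = [], 0
        current.append(post)
        current_size += size
    if current:
        pages.append(current)
    return pages


# Five stand-in posts of 400, 700, 300, 1500 and 100 bytes.
posts = ["a" * 400, "b" * 700, "c" * 300, "d" * 1500, "e" * 100]
pages = chunk_by_size(posts, max_bytes=1000)
print([len(page) for page in pages])
```

Because the posts are never reordered, feeding them in earliest-to-latest order would also keep a serialized essay readable from top to bottom within each page.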

Hm. Not based on size, I don't think, though I also would consider that ideal. There might be a way to gimmick Movable Type to do it by number of posts, though -- no, I guess not, actually; you can have it *display* a certain number of posts, but not archive on that basis.

Posted by: Dorothea Salo on 31 January 2003 at 08:23 AM

I solved the archive problem by writing my own software. If you want to read entry-by-entry, you're covered. If you want day-by-day, again, no problem. Even month-by-month. Weekly ... you can do, but you will have to adjust the URL accordingly.

In fact, I'd like to see more people pick up on the technology, but since it was written in C (and not Perl or PHP) I think the interest in it is just not there. But in any case, more information:

And yes, on the main page, the entries are in reverse chronological order (newest first) but in the archives they're in normal order (but if you like, you can have them in reverse too---just reorder the URL).

Posted by: Sean Conner on 1 February 2003 at 07:28 PM

I've solved my archiving problem by removing the archives. 31 days' worth on the main page, and then they are gone. I'm aware of the weaknesses of this approach, but since the URLs are so fragile anyway...

Posted by: Ed Bilodeau on 2 February 2003 at 11:38 AM

Well, this has been a productive discussion. I was surprised (astonished might be a better word) to learn of Stavros's annoyance at my archive pages being in chronological order. That is a deliberate choice, which seemed to find general agreement amongst others who commented. I occasionally write a series of entries (and intend to do much more of this) so it's logical to present my archived (monthly- or category-based) posts in the order in which they were written.

Anders, thanks for the pointer to your Optimizing MT post. I'd bookmarked it and the preceding post, meaning to systematically implement your suggestions. I've done quite a number of them but you've reminded me that there's more to do: the monthly calendar page and related entries might be next.

Sean, I checked your approach and it's certainly simple and elegant but I'm not sure that less technical users would be able to modify the URLs. I could be wrong though...

Way to go, Ed! Your approach could be described as "we had to destroy the archives in order to save them."

Posted by: Jonathon on 2 February 2003 at 01:11 PM

Well, even though a user may not want to munge the URL, you can still set up a weekly archive using my system---just give a link with the appropriate range:

and so on. Now, I've yet to actually write code to generate the archive link pages automatically but it's nothing horribly difficult; just probably tedious.

(heck, you want them in reverse chronological order? although there still seem to be bugs in that part of the code since it's rarely used).

Posted by: Sean Conner on 5 February 2003 at 10:13 AM

Oops, forgot the year in the last link:

Posted by: Sean Conner on 5 February 2003 at 10:14 AM

This discussion is now closed. My thanks to everyone who contributed.

