Tuesday 26 November 2002

Excerpts or full posts in RSS

Kevin Laurence, who uses AmphetaDesk to keep up with his regular weblog reading list, emailed me to point out that in late October my RSS feed had changed from displaying a paragraph or so of each post to displaying the entire text of each post. He also listed his reasons for disliking the change.

Give your users the choice, he suggested, and let them configure their news readers accordingly.

I was surprised, since the only change I’d made was to update and validate my Movable Type RSS templates. I use AmphetaDesk too, and as far as I could see the RSS 1.0 and RSS 2.0 feeds hadn’t changed. Initially I agreed that the RSS feed should include nothing more than the title plus (say) the first fifty words of the first paragraph. On reflection, I realized I had no problem with people reading my weblog posts in a news aggregator, though I thought the choice should be up to the individual user. So I emailed AmphetaDesk’s creator, Morbus Iff, explaining the problem:

I note that both the RSS 2.0 and RDF 1.0 feeds contain a <content:encoded> block with the full text of each weblog entry. But there doesn’t seem to be any setting in AmphetaDesk that would allow one to select either the <description> or the <content:encoded>. I’m sure that Mark Pilgrim has said that lots of people read diveintomark.org in their RSS aggregator, in which case either he or they must be doing something to provide the full content rather than the excerpt. I’d be grateful for any thoughts you might have about this.

I also searched diveintomark and found the reference to handling either excerpts or full posts:

Back to the template. Other than guid, the most important thing to note about this template is that the title, link, and description are all plain text. (description is an excerpt; if you do not enter an excerpt manually for a post, Movable Type will auto-generate one. You can control how long this auto-generated excerpt is by going to Blog Config, then Preferences, then “Number of words in excerpt”.) title was always supposed to be plain text, but sticking to plain text in the description tag is an intentional compromise, to support parsers that can not handle HTML, or handle it improperly. Never fear, the full HTML text of your post is still included; it’s stored in the content:encoded element. (Aggie already supports this.) This allows more robust news readers—that can handle either text or HTML—to offer the end user the choice of whether to see excerpts or full posts. Some people use news aggregators to find things to read, others like reading everything directly in their aggregator. RSS 0.9x made you (the author) choose one or the other; RSS 2.0 allows you to offer both, and pass the choice along to the end user. This is a good thing.
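To make the excerpt/full-post split concrete, here is a minimal Perl sketch of how a plain-text <description> excerpt might be derived from the full HTML that goes into <content:encoded>, in the spirit of Movable Type’s “Number of words in excerpt” setting. The make_excerpt helper is hypothetical and uses naive regex tag stripping; it is not Movable Type’s actual code.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a post's full HTML (the <content:encoded> text)
# into a plain-text excerpt of at most $max_words words (the <description>).
# Naive sketch: regex tag stripping ignores comments, entities and '>'
# inside attribute values.
sub make_excerpt {
    my ($html, $max_words) = @_;
    (my $text = $html) =~ s/<[^>]*>//g;    # drop all tags
    my @words = split ' ', $text;          # split on runs of whitespace
    return join ' ', @words if @words <= $max_words;
    return join(' ', @words[0 .. $max_words - 1]) . '...';
}

my $full = '<p>Kevin Laurence, who uses <a href="#">AmphetaDesk</a>, '
         . 'emailed me about my <em>RSS feed</em>.</p>';
print make_excerpt($full, 4), "\n";    # prints "Kevin Laurence, who uses..."
```

Publishing both forms is what lets the reader’s aggregator, rather than the author, decide which one to show.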

Morbus replied promptly (I don’t understand how he can provide such superb technical support for a free software application):

AmphetaDesk will display the <content:encoded> instead of the <description>, if it exists. My assumption is that:

  a) not all aggregators support <content:encoded>.
  b) publishers know about a).
  c) publishers want aggregators that do support <content:encoded> to have richer content than those that don’t.

And thus, Ampheta displays <content:encoded> above all else. You’d rather see this be an optional value that people can choose from? Something like ‘Show more content if it exists?’. In either case, Kevin can turn this feature off pretty easily with a simple one-line modification.

  1. open up templates/default/index.html in a text editor.
  2. look for (two lines in the code, three here for readability):

# is mod_content's <content:encoded> used?
$item->{description} = $item->{"content:encoded"}
     if defined($item->{"content:encoded"});

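  3. Add a # character in front of the second line (the $item->{description} assignment; if your copy wraps that statement across two lines as shown above, comment out both of them). This will cause <content:encoded> to be ignored entirely.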

He can do this while AmphetaDesk is running; simply refresh the page once he’s saved his changes.
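Rather than commenting the line out, the same precedence rule could be driven by a preference like the ‘Show more content if it exists?’ option Morbus mentions. Here is a minimal Perl sketch, assuming a hypothetical $show_full_content setting and a hypothetical display_text helper, with $item keyed by element name as in the template line above; AmphetaDesk’s real templates may be organized differently.

```perl
use strict;
use warnings;

# Hypothetical user preference: 1 = prefer <content:encoded> (full post),
# 0 = always show the <description> excerpt.
my $show_full_content = 0;

# Return the text to display for one parsed item. $item is assumed to be a
# hash reference whose keys mirror the element names, as in the template code.
sub display_text {
    my ($item, $prefer_full) = @_;
    # Use mod_content's <content:encoded> only if it exists AND the reader wants it.
    return $item->{"content:encoded"}
        if $prefer_full && defined $item->{"content:encoded"};
    return $item->{description};
}

my $item = {
    description       => 'Kevin Laurence, who uses AmphetaDesk...',
    'content:encoded' => '<p>Kevin Laurence, who uses AmphetaDesk, emailed me...</p>',
};
print display_text($item, $show_full_content), "\n";    # prints the excerpt
```

With the preference set to 1, feeds that carry <content:encoded> show the full post, while feeds that lack it still fall back to <description>, which matches Morbus’s assumption that richer content should win when it exists.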

To further clarify:

  1. in v0.93, <content:encoded> wasn’t supported. Instead, we used the much longer version of mod_content (which no one used either).
  2. in v0.93.1, that was switched to <content:encoded>. In that case, you’re either using an old version of AmphetaDesk, or some templates that you’ve tweaked yourself, which don’t have the new code.

That explained why I hadn’t seen the change: I was still using AmphetaDesk v0.93. Now I’ve upgraded and made the change to the template. Until Morbus adds the switch (which I have no doubt he will), you might like to do the same.


© Copyright 2002-2003 Jonathon Delacour