We have noted in our alpha invitations that we intend for FeedLounge (company, people and application) to be as open as we can possibly be. So along those lines, I will be posting here and on the FeedLounge Blog about architecture, features and development of FeedLounge, so that everyone can see inside the beast, so to speak.
Which feed parser should we use?
When you are building a web-based feed reader like FeedLounge, having data to read is step one. Luckily, there are many feed parsers already out there, so the “build vs. buy” decision was fairly easy. Since our focus is the user experience of the feed reader, the feed parser part of the application is only a ‘necessary evil’ in the scheme of things. After checking out several possibilities, including using my own Java/SAX framework, we decided on feedparser, the canonical namesake of the feed parsing world. Built by Mark Pilgrim, and currently at version 3.3, this is probably the most forgiving feed parser on the planet. Had I gone with my own solution, I would have spent months and months creating something as good. And since it has a liberal open source license, I am allowed to use it in a commercial project like this.
feedparser features
- feed format support - v3.3 has impressive support for 4 feed formats and 15 different versions of those formats. Building that support ourselves would probably have taken a good chunk of time.
- encoding detection - Anyone who has done this understands the difficulty without any explanation.
- tidy support - Want clean HTML content as output? No problem, it’s in there.
- translated access between specific terms - If you know channel instead of feed, these are the same thing in feedparser. Use the terms that you are comfortable with.
- relative url support - Useful to us since we are ripping the feed apart to store it. Having no relative URLs is a great relief.
- great documentation - Mark produces some of the best, most useful documentation in the open source world, and feedparser is no exception. It is terse, but it covers what you need to know. Need to do 401 auth? Here. Wondering about ETag support? There.
- over 2000 unit tests - I may run into some arcane case not covered here, but the likelihood is not very high.
- HTML sanitizing - Extremely useful for a feed reader, to prevent bad things. You don’t want to let someone else’s JavaScript run inside your app. Debugging that would be a nightmare, and maliciousness is also a concern.
- date parsing - Support for every date format they came across. You get a simple date format, consistent from feed to feed.
- It just works! - The best is saved for last, as this point cannot be made often enough. In the months of development so far, feedparser has never been the source of a single problem. The closest we have come to a problem is our not checking for the existence of some item before accessing it. feedparser has been a huge net positive on development, with almost nil overhead. When alpha testers say that some of the feeds that don’t open in nearly anything else show up in FeedLounge, that wasn’t us; it was feedparser and its magic voodoo.
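To make a couple of those bullets concrete, here is a minimal sketch of how a reader might consume feedparser output. This is not FeedLounge's actual code; the function names `summarize` and `fetch` are made up for illustration. It assumes the parsed result behaves like a dictionary with `feed` and `entries` keys, and that entries expose `title`, `link` and `updated_parsed`, which is how feedparser documents them (older releases such as 3.3 called the date field `modified_parsed`). Every field is read with `.get()`, since the one kind of problem mentioned above came from assuming an item was present.

```python
import time


def summarize(parsed):
    """Flatten a feedparser-style result into plain dicts, tolerating missing keys."""
    feed = parsed.get("feed", {})
    rows = []
    for entry in parsed.get("entries", []):
        # A time.struct_time when the feed supplied a parseable date, else None.
        stamp = entry.get("updated_parsed")
        rows.append({
            "title": entry.get("title", "(untitled)"),
            "link": entry.get("link"),
            "updated": time.strftime("%Y-%m-%d %H:%M", stamp) if stamp else None,
        })
    return {"feed_title": feed.get("title", "(untitled)"), "entries": rows}


def fetch(url, etag=None, modified=None):
    """Fetch and summarize a feed; return None if the server reports no change."""
    # Third-party package, imported lazily so summarize() works without it.
    import feedparser

    # Passing the previous ETag / Last-Modified values lets the server answer 304.
    parsed = feedparser.parse(url, etag=etag, modified=modified)
    if parsed.get("status") == 304:
        return None
    return summarize(parsed)
```

Calling `fetch` with the `etag` and `modified` values saved from the previous poll keeps repeat requests cheap, and `summarize` can be exercised on a plain dictionary without touching the network.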
Mark, thanks a million. I know you have ‘gone dark’ in the blogging world, but you are still rocking mine.
June 11th, 2005 at 4:58 pm
One of the two of you made some oblique Python reference the other day—probably in a conversation we were having—and I was hoping that you were using Mark Pilgrim’s parser. Yes, it’s wonderful to be using something that parses liberally and all that brings.
June 11th, 2005 at 5:47 pm
FeedLounge: Tagging and Renaming
Two things I’m really, really loving about Feedlounge:
Tagging feeds. As Dougal notes, you can tag feeds to give you clouds of feeds. He uses the “perl programming” “php programming” example, which really should be a…
June 13th, 2005 at 8:47 am
[…] of a series of posts on the development. If you missed the first one, check it out here: FeedLounge development: the parser. The feed validator We have fe […]