January 4, 2010

Yahoo Pipes

I heard about Yahoo Pipes a while ago, but it only occurred to me today that it could help with a few tasks I’ve been meaning to work on. (The gtkpod documentation suggests that if a podcast’s RSS feed doesn’t work, you can fix it by making an identity-function pipe that passes the feed through unchanged. I guess Yahoo’s RSS-reading code is more robust than gtkpod’s.)

My first task: taking the boston.com BostonUpdate Twitter feed and making it easy to follow in an RSS reader. Boston.com has many syndication options, but they often have flaws. Many of the RSS feeds are automatically generated, so they can be noisy, with a low ratio of content to noise. Also, much of the content partially overlaps across feeds, so finding the best collection of feeds is tough: not too many duplicates across feeds, not too many useless articles, but enough to find all of the content I want to read.

The Twitter feed takes little advantage of what Twitter provides (very few hashtags, very little linking to other users), but the account’s RSS feed does have the advantage of being written by news producers. The only drawbacks of the Twitter RSS feed are that posts tend to use URL-shortening services like bit.ly, and that the item links go back to Twitter rather than to the original article.

Yahoo Pipes turns out to be a convenient way to work with Twitter and get an improved version of the BostonUpdate feed. The Twitter RSS Feed pipe takes a Twitter ID, fetches the RSS feed for that account, converts all bit.ly links to the URLs they point to, makes each item’s link point to the linked article (rather than to the tweet’s page on Twitter), and turns the result into a new RSS feed. All of this takes under a dozen statements, which Yahoo Pipes displays as flow-chart-style blocks that connect together.
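Yahoo Pipes does this with visual blocks rather than code, but the link-rewriting step can be sketched in Python to show what it amounts to. This is a minimal sketch, assuming an RSS 2.0 feed in which each tweet’s text sits in the item’s `<title>`; the function name `rewrite_tweet_feed` and its `expand` hook are hypothetical and not part of Yahoo Pipes.

```python
import re
import xml.etree.ElementTree as ET

# Rough pattern for URLs inside tweet text; not a full URL grammar.
URL_RE = re.compile(r"https?://\S+")


def rewrite_tweet_feed(rss_xml, expand=lambda url: url):
    """Point each item's <link> at the first URL found in its title.

    `expand` is a hypothetical hook for a short-link resolver; by default
    URLs are left unchanged. Items with no URL keep their original link.
    """
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        match = URL_RE.search(title)
        if not match:
            continue  # nothing to redirect to; keep the Twitter link
        target = expand(match.group(0))
        link = item.find("link")
        if link is None:
            link = ET.SubElement(item, "link")
        link.text = target
        # Also swap the short URL for the expanded one in the visible text.
        item.find("title").text = title.replace(match.group(0), target)
    return ET.tostring(root, encoding="unicode")
```

For example, `rewrite_tweet_feed(feed_xml, expand=some_resolver)` returns the rewritten feed as a string; tweets that contain no URL pass through with their original Twitter link.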

I built the bit.ly URL expansion as a separate Yahoo pipe, so I could publish it independently of the Twitter-specific parts and reuse it.
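A standalone short-link expander can be sketched the same way. This is a rough approximation, not how Yahoo Pipes implements it: it assumes a bit.ly link answers a `HEAD` request with an HTTP redirect whose `Location` header holds the original URL, and it follows only that one hop. `resolve_via_redirect` and `expand_bitly_links` are hypothetical names.

```python
import http.client
import re
import urllib.parse

# Rough pattern for bit.ly short links inside free text.
BITLY_RE = re.compile(r"https?://bit\.ly/[A-Za-z0-9_-]+")


def resolve_via_redirect(short_url, timeout=10.0):
    """Return where `short_url` redirects to, or `short_url` itself.

    Sends a single HEAD request and does not follow further redirects.
    """
    parts = urllib.parse.urlsplit(short_url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        location = resp.getheader("Location")
        if resp.status in (301, 302, 303, 307, 308) and location:
            # Resolve a relative Location against the short URL.
            return urllib.parse.urljoin(short_url, location)
        return short_url
    finally:
        conn.close()


def expand_bitly_links(text, resolve=resolve_via_redirect):
    """Replace every bit.ly link in `text` with whatever `resolve` returns."""
    return BITLY_RE.sub(lambda m: resolve(m.group(0)), text)
```

Keeping the resolver as a parameter mirrors the reason for making it its own pipe: the text-rewriting part can be reused, or tested with a stand-in resolver, without touching the network.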

My next task will be a pipe that takes multiple RSS feeds and dedups them. That way, when the same article shows up in the Boston.com most popular, Boston.com top stories, and BostonUpdate feeds, I only see it once.
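One way such a dedup step could work is to key each item on a normalized version of its link and keep only the first item seen per key. This is a sketch under assumptions: the normalization rules in the hypothetical `normalize_link` (lowercasing scheme and host, dropping the fragment and any trailing slash) are guesses at what counts as "the same article", and items without a link fall back to their title as the key.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit


def normalize_link(url):
    """Hypothetical dedup key: lowercase scheme and host, drop the fragment
    and any trailing slash on the path. Query strings are kept."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       path, parts.query, ""))


def merge_feeds(*rss_docs):
    """Return (title, link) pairs from all feeds, in feed order, keeping only
    the first item seen for each normalized link (or title, if no link)."""
    seen = set()
    merged = []
    for doc in rss_docs:
        for item in ET.fromstring(doc).iter("item"):
            title = item.findtext("title", default="")
            link = item.findtext("link", default="").strip()
            key = normalize_link(link) if link else title
            if key in seen:
                continue  # already have this article from an earlier feed
            seen.add(key)
            merged.append((title, link))
    return merged
```

Feed order decides which copy survives, so passing the preferred feed first (say, BostonUpdate) keeps its version of each story.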

