Using Ruby to parse HTML and create RSS
Dec. 20th, 2007 02:35 pm
I can't remember if it was jaylake or Charlie Stross who first turned me on to the blog SF Novelists, a group blog which quickly entered my daily feed of things to read. There was one problem with this blog: it had a secondary column that was completely worth having as an RSS feed, but which the purveyors of the website had somehow neglected to syndicate.
If you look at the second column, "What our members are saying," it's a list of things the website's contributors are doing elsewhere. That kind of niftiness should have its own feed but doesn't. Naturally, your Elf has a solution.
Many RSS programs can take a script as a source, rather than a URL. The script just has to spew out RSS like any other RSS feed. Either you can use that facility, or you can put this script somewhere in a CGI-aware portion of a website and run it like any other script, and use that URL as the source for your feed. (Note how blithely I assume everyone has access to these sorts of things.)
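For the CGI route, the only extra work is printing a header before the feed. Here is a minimal sketch of that idea, not the script from this post: the `cgi_response` helper and the placeholder feed body are made up for illustration, and a real script would generate the XML rather than hard-code it.

```ruby
# Hypothetical placeholder feed; a real script would build this from the scraped page.
feed_xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>Example feed</title>
      <link>http://example.com/</link>
      <description>Placeholder channel</description>
    </channel>
  </rss>
XML

# A CGI response is one or more header lines, a blank line, then the body.
# cgi_response is an illustrative helper name, not part of any library.
def cgi_response(body, content_type = 'application/rss+xml; charset=utf-8')
  "Content-Type: #{content_type}\r\n\r\n#{body}"
end

# When run by the web server, whatever goes to stdout is what the feed reader sees.
print cgi_response(feed_xml)
```

If your RSS program runs the script directly instead, it most likely wants just the XML with no header, so you would print the feed body on its own.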
Hpricot supports XPath, so it was easy to find the second column, and then disassembling it took little more than a few regular expressions. I used RSS/Maker not because it was the best thing I found, but because it was the first that did the job.
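To give a feel for the regular-expression and RSS/Maker half of that, here is a rough sketch under some loud assumptions. It is not the original script. The HTML fragment is invented and only stands in for whatever the XPath query returns (the real column's markup may look nothing like it), the Hpricot step is left out so the example runs without fetching the live page, and `extract_links` and `build_feed` are names made up for illustration.

```ruby
require 'rss'

# Invented stand-in for the inner HTML of the "What our members are saying"
# column. The real site's markup is an unknown here.
column_html = <<~HTML
  <ul>
    <li><a href="http://example.com/alice/post-1">Alice: On revising a draft</a></li>
    <li><a href="http://example.com/bob/post-2">Bob: Convention report</a></li>
  </ul>
HTML

# Pull [url, title] pairs out of the fragment with a regular expression.
# Any tags nested inside the link text are stripped from the title.
def extract_links(html)
  html.scan(/<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)<\/a>/m).map do |url, title|
    [url, title.gsub(/<[^>]+>/, '').strip]
  end
end

# Build an RSS 2.0 document with RSS::Maker. A 2.0 channel needs a title,
# a link, and a description; the values below are placeholders.
def build_feed(links)
  RSS::Maker.make('2.0') do |maker|
    maker.channel.title = 'What our members are saying'
    maker.channel.link = 'http://example.com/'
    maker.channel.description = 'Links scraped from a sidebar column'
    links.each do |url, title|
      maker.items.new_item do |item|
        item.link = url
        item.title = title
      end
    end
  end
end

feed = build_feed(extract_links(column_html))
puts feed
```

Regular expressions are a fragile way to read HTML in general; they hold up here only because the column is a short, regular list of links.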
( Source code! )