I found that the problem lies in the namespace declarations.
For FreeBSD's RSS feed:
<rss xmlns:atom="http://www.w3.org/2005/Atom" xmlns="http://www.w3.org/1999/xhtml" version="2.0">
For Ubuntu's feed:
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
When I remove the extra namespace declaration from FreeBSD's feed, everything works as expected.
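To see what that extra declaration does, here is a small sketch using the standard library's xml.etree.ElementTree (not feedparser, whose internals I have not inspected). The two strings are trimmed-down stand-ins for the real feeds, and child_tags is a helper name I made up for the illustration. It is written for Python 3.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-ins for the two feeds; only the root tag differs.
FREEBSD_STYLE = (
    '<rss xmlns:atom="http://www.w3.org/2005/Atom" '
    'xmlns="http://www.w3.org/1999/xhtml" version="2.0">'
    '<channel><item><title>example</title></item></channel></rss>'
)
UBUNTU_STYLE = (
    '<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">'
    '<channel><item><title>example</title></item></channel></rss>'
)


def child_tags(xml_text):
    """Return the fully qualified tag names of the root's direct children."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


# Without a default namespace, the child is a plain 'channel'.
print(child_tags(UBUNTU_STYLE))
# With xmlns="...xhtml", every unprefixed element is placed in the XHTML namespace.
print(child_tags(FREEBSD_STYLE))
```

With the default namespace in place, channel, item, title and the rest are no longer plain RSS element names, which is consistent with a parser failing to recognize them.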
So what does this mean for you? I can think of a couple of different approaches:
- Use something else, such as BeautifulSoup. I tried it and it seems to work.
- Download the whole RSS feed, apply some search/replace to fix up the namespaces, then use feedparser.parse() afterward. This approach is a big hack; I would not use it myself.
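If you do go down the search/replace route, a minimal sketch of the clean-up step could look like the following. The function name strip_xhtml_default_namespace is my own invention, and the pattern assumes the declaration appears exactly as it does in the FreeBSD feed shown above.

```python
import re

# Matches only the default-namespace declaration seen in the FreeBSD feed.
# Prefixed declarations such as xmlns:atom="..." do not match, because the
# pattern requires '=' directly after 'xmlns'.
_XHTML_DEFAULT_NS = re.compile(r'\s+xmlns="http://www\.w3\.org/1999/xhtml"')


def strip_xhtml_default_namespace(xml_text):
    """Remove the first xmlns="http://www.w3.org/1999/xhtml" declaration."""
    return _XHTML_DEFAULT_NS.sub('', xml_text, count=1)
```

The cleaned text can then be handed to feedparser.parse(), which accepts a string as well as a URL. It is still a hack: any other quirk in the feed would need its own patch.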
Update
Here is sample code for rss_get_items(), a generator that yields the items of an RSS feed one at a time. Each item is a dictionary with some standard keys such as title, pubdate, link, and guid.
# Python 2 code (urllib2, print statement)
from bs4 import BeautifulSoup
import urllib2

def rss_get_items(url):
    # Fetch the feed and let BeautifulSoup parse it
    request = urllib2.Request(url)
    response = urllib2.urlopen(request)
    soup = BeautifulSoup(response)

    for item_node in soup.find_all('item'):
        # Collect each child tag of <item> as key -> text
        item = {}
        for subitem_node in item_node.findChildren():
            key = subitem_node.name
            value = subitem_node.text
            item[key] = value
        yield item

if __name__ == '__main__':
    url = 'http://www.freebsd.org/security/rss.xml'
    for item in rss_get_items(url):
        print item['title']
        print item['pubdate']
        print item['link']
        print item['guid']
        print '---'
Output:
FreeBSD-SA-14:04.bind
Tue, 14 Jan 2014 00:00:00 PST
http://security.FreeBSD.org/advisories/FreeBSD-SA-14:04.bind.asc
http://security.FreeBSD.org/advisories/FreeBSD-SA-14:04.bind.asc
---
FreeBSD-SA-14:03.openssl
Tue, 14 Jan 2014 00:00:00 PST
http://security.FreeBSD.org/advisories/FreeBSD-SA-14:03.openssl.asc
http://security.FreeBSD.org/advisories/FreeBSD-SA-14:03.openssl.asc
---
...
Notes:
- I omitted error checking for the sake of brevity.
- I recommend using the BeautifulSoup API only when feedparser fails, because feedparser is the right tool for the job. Hopefully, it will be updated to be more forgiving in the future.
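The "feedparser first, BeautifulSoup only on failure" advice could be wired up roughly like this. It is only a sketch: items_with_fallback and its primary/fallback parameters are hypothetical names, and treating "raised an exception or returned nothing" as failure is my assumption about what failing looks like. Written for Python 3.

```python
def items_with_fallback(url, primary, fallback):
    """Return items from primary(url), or from fallback(url) if that fails.

    `primary` and `fallback` are any callables that take a URL and return
    an iterable of items. "Failure" here means the primary parser raised
    an exception or produced no items (an assumption made for this sketch).
    """
    try:
        items = list(primary(url))
    except Exception:
        # Deliberately broad for the sketch; narrow this in real code.
        items = []
    if items:
        return items
    return list(fallback(url))
```

For example, primary could be a small wrapper returning feedparser.parse(url).entries and fallback could be rss_get_items from above. Note that the two would yield differently shaped items, so the caller still has to account for that.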