I am working on a Python script to parse RSS links.
I use the Universal Feed Parser and I am encountering issues with some links, for example when trying to parse the FreeBSD Security Advisories feed. Here is the sample code:
    feed = feedparser.parse(url)
    items = feed["items"]
Basically, feed["items"] should return all the entries in the feed (the elements that start with item), but it always returns an empty list.
I can also confirm that the following links are parsed as expected:
Is this an issue with the feeds, in that the ones from FreeBSD do not respect the standard?
EDIT:
I am using Python 2.7. I ended up using feedparser in combination with BeautifulSoup, as Hai Vu proposed. Here is the sample code I ended up with, slightly changed:
    import urllib2
    import feedparser
    from bs4 import BeautifulSoup

    def rss_get_items_feedparser(self, webData):
        feed = feedparser.parse(webData)
        items = feed["items"]
        return items

    def rss_get_items_beautifulSoup(self, webData):
        soup = BeautifulSoup(webData)
        for item_node in soup.find_all('item'):
            item = {}
            # Collect each child tag of the item as a name -> text pair.
            for subitem_node in item_node.findChildren():
                if subitem_node.name is not None:
                    item[str(subitem_node.name)] = str(subitem_node.contents[0])
            yield item

    def rss_get_items(self, webData):
        # Try feedparser first; fall back to BeautifulSoup if it finds nothing.
        items = self.rss_get_items_feedparser(webData)
        if len(items) > 0:
            return items
        return self.rss_get_items_beautifulSoup(webData)

    def parse(self, url):
        request = urllib2.Request(url)
        response = urllib2.urlopen(request)
        webData = response.read()
        for item in self.rss_get_items(webData):
            pass  # parse items
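For comparison, the same fallback idea (collect the child tags of every item into a dictionary) can be sketched with only the standard library. This is a minimal sketch, not the code I actually run: it assumes Python 3 and well-formed XML, and the names `local_name`, `rss_get_items_etree` and `SAMPLE_FEED` are made up for illustration. Namespaces are stripped so that namespaced item tags are matched as well.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix of the form '{uri}name'."""
    return tag.rsplit("}", 1)[-1]


def rss_get_items_etree(web_data):
    """Yield one dict per item element, mapping child tag names to text."""
    root = ET.fromstring(web_data)
    for node in root.iter():
        if local_name(node.tag) != "item":
            continue
        # Children without text (empty tags) are stored as "".
        yield {local_name(child.tag): (child.text or "").strip() for child in node}


# Hypothetical sample data standing in for a downloaded feed.
SAMPLE_FEED = b"""<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First</title><link>http://example.com/1</link></item>
    <item><title>Second</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

items = list(rss_get_items_etree(SAMPLE_FEED))
print(items)
```

Unlike BeautifulSoup, ElementTree raises an error on malformed XML, so this would only help with feeds that are well formed but not recognised by feedparser.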
I also tried passing the response directly to rss_get_items, without reading it first, but it throws an exception when BeautifulSoup tries to read it:
      File "bs4/__init__.py", line 161, in __init__
        markup = markup.read()
    TypeError: 'NoneType' object is not callable
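One possible explanation, which I have not confirmed, is that the response is a one-shot stream: once feedparser has read it, nothing usable is left for BeautifulSoup. The sketch below only illustrates that one-shot behaviour. It assumes Python 3 and uses `io.BytesIO` as a stand-in for the HTTP response, so it does not reproduce the exact error above.

```python
import io

# io.BytesIO stands in for the HTTP response (an assumption for illustration).
response = io.BytesIO(b"<rss><channel><item><title>A</title></item></channel></rss>")

first_read = response.read()   # what the first parser would receive
second_read = response.read()  # what a second parser would receive

print(len(first_read))   # number of bytes in the document
print(second_read)       # nothing is left after the first read

# Reading once into a bytes object lets both parsers work from the same data.
web_data = first_read
```

This is why reading the response once and passing the resulting data to both parsers, as in the code above, avoids the problem.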