Universal Feed Parser issue

I am working on a python script to parse RSS links.

I use the Universal Feed Parser and I am running into problems with some links, for example when trying to parse the FreeBSD Security Advisories. Here is the sample code:

    feed = feedparser.parse(url)
    items = feed["items"]

Basically, feed["items"] should return all the entries in the feed, that is, the fields that start with item, but it always comes back empty.
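
When the list comes back empty, feedparser's own diagnostics are a reasonable first thing to look at. The sketch below summarizes a parse result. It assumes the result carries the `bozo` flag and `bozo_exception` value that feedparser records for feeds it could not parse cleanly; `summarize_parse_result` is a hypothetical helper name, not part of the library, and an empty list does not necessarily mean `bozo` is set.

```python
def summarize_parse_result(parsed):
    """Return a short summary of a feedparser-style result.

    `parsed` is anything dict-like, for example the value returned by
    feedparser.parse(url). Hypothetical helper, not part of feedparser.
    """
    entries = parsed.get("entries", [])
    # feedparser sets `bozo` to a truthy value when the feed was not well-formed.
    bozo = bool(parsed.get("bozo", False))
    reason = parsed.get("bozo_exception")
    summary = "%d entries, bozo=%s" % (len(entries), bozo)
    if bozo and reason is not None:
        summary += ", reason: %r" % (reason,)
    return summary
```

It could be called as `print(summarize_parse_result(feedparser.parse(url)))` to see whether the parser itself reports a problem with the feed.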

I can also confirm that the following links are parsed as expected:

Is this an issue with the feeds, in that the ones from FreeBSD do not respect the standard?

EDIT:

I am using Python 2.7. I ended up using feedparser in combination with BeautifulSoup, as Hai Vu proposed. Here is the sample code I ended up with, slightly changed:

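    import urllib2

    import feedparser
    from bs4 import BeautifulSoup

    # The functions below are methods of a class (class definition not shown).

    def rss_get_items_feedparser(self, webData):
        # First attempt: let feedparser extract the entries.
        feed = feedparser.parse(webData)
        items = feed["items"]
        return items

    def rss_get_items_beautifulSoup(self, webData):
        # Fallback: collect every <item> element into a dict by hand.
        soup = BeautifulSoup(webData)
        for item_node in soup.find_all('item'):
            item = {}
            for subitem_node in item_node.findChildren():
                # Skip unnamed nodes and empty elements.
                if subitem_node.name is not None and subitem_node.contents:
                    item[str(subitem_node.name)] = str(subitem_node.contents[0])
            yield item

    def rss_get_items(self, webData):
        # Use feedparser's result if it found anything, else fall back.
        items = self.rss_get_items_feedparser(webData)
        if len(items) > 0:
            return items
        return self.rss_get_items_beautifulSoup(webData)

    def parse(self, url):
        request = urllib2.Request(url)
        response = urllib2.urlopen(request)
        webData = response.read()
        for item in self.rss_get_items(webData):
            pass  # parse items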
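
The fallback idea above (collect every `item` element into a dict of child tag to text) can also be sketched with the standard library alone. Note that this swaps in `xml.etree.ElementTree` for BeautifulSoup, so it only works on well-formed XML. `local_name`, `items_from_xml`, and the sample feed are made up for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a '{namespace}' prefix so namespaced and plain tags compare equal."""
    return tag.rsplit("}", 1)[-1]


def items_from_xml(web_data):
    """Return one dict per <item> element, mapping child tag names to text."""
    root = ET.fromstring(web_data)
    items = []
    for node in root.iter():
        if local_name(node.tag) != "item":
            continue
        item = {}
        for child in node:
            item[local_name(child.tag)] = (child.text or "").strip()
        items.append(item)
    return items


# Made-up sample feed, used only to exercise the function.
SAMPLE = b"""<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example</title>
<item><title>First</title><link>http://example.com/1</link></item>
<item><title>Second</title><link>http://example.com/2</link></item>
</channel></rss>"""

print(items_from_xml(SAMPLE))
```

Unlike BeautifulSoup, `ET.fromstring` raises `ET.ParseError` on malformed markup, so this variant is stricter than the BeautifulSoup fallback and would need its own error handling.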

I also tried passing the response directly to rss_get_items without reading it first, but that throws an exception when BeautifulSoup tries to read it:

      File "bs4/__init__.py", line 161, in __init__
        markup = markup.read()
    TypeError: 'NoneType' object is not callable
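
A file-like response can only be consumed once, so a second parser handed the same object may find it exhausted or closed. The traceback is consistent with that, although this is an assumption about the cause and not something verified here. A minimal sketch of the read-once pattern, using `io.BytesIO` as a stand-in for the HTTP response and two made-up parsers:

```python
import io


def first_non_empty(stream, parsers):
    """Read `stream` once, then hand the same bytes to each parser in turn."""
    web_data = stream.read()  # the file-like object is consumed exactly once
    for parse in parsers:
        items = list(parse(web_data))
        if items:
            return items
    return []


# Stand-in parsers for illustration only; real code would wrap feedparser
# and BeautifulSoup here.
def strict_parser(data):
    return []  # pretends to find nothing


def lenient_parser(data):
    return data.split(b",")


stream = io.BytesIO(b"a,b")
print(first_non_empty(stream, [strict_parser, lenient_parser]))
print(stream.read())  # nothing is left to read a second time
```

Because every parser receives the bytes rather than the stream, it no longer matters whether an earlier parser reads or closes the underlying object.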
