2012-10-13, 20:14
Here is a short example of how to extract text from web pages using Python and BeautifulSoup (the code targets Python 2 and BeautifulSoup 3).
import urllib2
from BeautifulSoup import BeautifulSoup

# http://stackoverflow.com/questions/1752662/beautifulsoup-easy-way-to-to-obtain-html-free-contents
def textOf(soup):
    return u''.join(soup.findAll(text=True))

soup = BeautifulSoup(urllib2.urlopen('http://www.fmylife.com/').read())
for item in soup.findAll('div', attrs={'class': 'post article'}):
    item = textOf(item)
    print item[:item.find("FML#")]
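One detail of the slice in the example is worth a note: `str.find` returns -1 when the marker is missing, so `item[:item.find("FML#")]` would then silently drop the last character of the text. A minimal sketch of a more forgiving variant, using `str.partition`, could look like this. The helper name `text_before` and the sample strings are made up for illustration and are not part of the original snippet; unlike the original slice, the helper also strips surrounding whitespace.

```python
def text_before(text, marker="FML#"):
    """Return the part of text before marker, or all of text if marker is absent."""
    # partition() splits at the first occurrence of marker; if the marker is
    # missing, head is the whole string and the other two parts are empty.
    head, _sep, _tail = text.partition(marker)
    # Assumption: surrounding whitespace is not wanted (the original keeps it).
    return head.strip()


# Hypothetical sample strings standing in for extracted post texts.
samples = [
    "Today, my alarm did not go off. FML#12345 I agree",
    "A post without the marker",
]
for sample in samples:
    print(text_before(sample))
```

In the loop above, `print item[:item.find("FML#")]` could then be replaced by a call to this helper. `str.partition` also exists in Python 2, so only the `print` call would need adapting there.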
Uwe Ziegenhagen likes LaTeX and Python, sometimes even combined.