如何通过python获得网页数据

2024-12-22 19:10:06

推荐回答（1个）

回答1：

用Beautiful Soup这类解析模块：
Beautiful Soup 是用Python写的一个HTML/XML的解析器，它可以很好的处理不规范标记并生成剖析树(parse tree)；
它提供简单又常用的导航(navigating)，搜索以及修改剖析树的操作；
用urllib或者urllib2(推荐)将页面的html代码下载后，用beautifulsoup解析该html；
然后用beautifulsoup的查找模块或者正则匹配将你想获得的内容找出来，就可以进行相关处理了，例如：
from BeautifulSoup import BeautifulSoup
html = 'test

test body

'
soup = BeautifulSoup(html)
soup.contents[0].name
# u'html'
soup.comtents[0].contents[0].name
# u'head'
head = soup.comtents[0].contents[0]
head.parent.name
# u'html'
head.next
# u'test