How do I use a regular expression in Python to extract the content between <p> tags in XML?

2025-01-04 03:52:10

Recommended answers (3)

Answer 1:

# Code
import re

html_text = '''
<p>When ES cells differentiate, they migrate out from colonies on gelatin-coated dishes, similar to the ES cells on the 
[17] and nanog ,
,[19] well-known markers for undifferentiated ES cells. </p>

<p>(A) R1 cells were cultured for 5 days in the presence of 
[1] and nanog 
[2], [3] various doses of LIF (0–1,000 units/ml). </p>

'''

# The non-greedy group captures only the text between each <p> and </p> pair
pattern = r'<p>(.*?)</p>'
# Strip newlines first, because "." does not match "\n" by default
html_text = re.sub('\n', '', html_text)
text = re.findall(pattern, html_text)
print(text)

# Output
['When ES cells differentiate, they migrate out from colonies on gelatin-coated dishes, similar to the ES cells on the [17] and nanog ,,[19] well-known markers for undifferentiated ES cells. ',
 '(A) R1 cells were cultured for 5 days in the presence of [1] and nanog [2], [3] various doses of LIF (0–1,000 units/ml). ']
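
A possible variant of the regex approach, as a rough sketch rather than the answerer's own method: instead of stripping newlines, the re.DOTALL flag lets "." match across line breaks, and the opening-tag pattern tolerates attributes. The sample string and the helper name extract_paragraphs are made up for illustration, and nested <p> tags are not handled.

```python
import re

# Hypothetical sample; the attribute and the inner line break are deliberate.
sample = '''
<p class="intro">First paragraph,
split over two lines.</p>
<p>Second paragraph.</p>
'''

# <p\b[^>]*> matches <p> with or without attributes, but not <pre> or <param>.
# (.*?) is non-greedy, and re.DOTALL lets "." match newlines as well.
P_PATTERN = re.compile(r'<p\b[^>]*>(.*?)</p>', re.DOTALL | re.IGNORECASE)


def extract_paragraphs(markup):
    """Return the text between each <p>...</p> pair with whitespace collapsed."""
    return [' '.join(match.split()) for match in P_PATTERN.findall(markup)]


print(extract_paragraphs(sample))
# ['First paragraph, split over two lines.', 'Second paragraph.']
```

Collapsing whitespace with split() and join() keeps a space where the line break was, so words on adjacent lines do not run together.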

Answer 2:

I'd suggest parsing the XML directly with Python's BeautifulSoup; there is no need for regex matching at all!

Answer 3:

Wouldn't it be more convenient to just read the XML with one of Python's libraries?
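
As one way this might look, here is a minimal sketch using the standard-library xml.etree.ElementTree module. It assumes the input is well-formed XML with a single root element; the sample document and the helper name paragraph_texts are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample with an inline tag and a nested section.
xml_text = '''<article>
  <p>When ES cells <i>differentiate</i>, they migrate.</p>
  <sec><p>Second paragraph.</p></sec>
</article>'''


def paragraph_texts(root):
    """Collect the text of every <p> element at any depth under root."""
    # iter('p') walks all <p> descendants; itertext() also yields the text
    # inside nested tags such as <i>, so inline markup is flattened.
    return [''.join(p.itertext()).strip() for p in root.iter('p')]


root = ET.fromstring(xml_text)
print(paragraph_texts(root))
# ['When ES cells differentiate, they migrate.', 'Second paragraph.']
```

Unlike a regex, the parser raises xml.etree.ElementTree.ParseError on malformed input, so this route only fits documents that are valid XML.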