# Code
import re
html_text = '''When ES cells differentiate, they migrate out from colonies on gelatin-coated dishes, similar to the ES cells on the
[17] andnanog ,
,[19] well-known markers for undifferentiated ES cells.(A) R1 cells were cultured for 5 days in the presence of
[1] andnanog [2] ,[3] various doses of LIF (0–1,000 units/ml).
'''
pattern = r'(.*?
)'
html_text = re.sub('\n', '', html_text)
text = re.findall(pattern, html_text)
print(text)
# Output
['When ES cells differentiate, they migrate out from colonies on gelatin-coated dishes, similar to the ES cells on the
',[17] andnanog ,,[19] well-known markers for undifferentiated ES cells.
'(A) R1 cells were cultured for 5 days in the presence of
'][1] andnanog [2] ,[3] various doses of LIF (0–1,000 units/ml).
I'd suggest parsing the XML directly with Python's BeautifulSoup and not using regex matching at all!
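BeautifulSoup (`bs4`) is a third-party package, where `BeautifulSoup(html_text, "html.parser").get_text()` flattens the markup. The sketch below uses the standard library's `html.parser.HTMLParser` instead so it runs without extra installs. The idea is the same: let a parser deal with the tags, and collect the text of each `<p>` including the text inside inline tags. The sample markup (`<p>`, `<sup>`, `<a>`) and the names `ParagraphTextParser` and `extract_paragraphs` are hypothetical, because the real tags in the post above were lost.

```python
from html.parser import HTMLParser


class ParagraphTextParser(HTMLParser):
    """Collect the plain text of each <p> element, ignoring inline tags."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None means we are currently outside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            # Join the pieces and collapse newlines/repeated spaces into one space.
            text = " ".join("".join(self._buffer).split())
            self.paragraphs.append(text)
            self._buffer = None

    def handle_data(self, data):
        # Text inside <sup>, <a>, etc. still belongs to the enclosing <p>.
        if self._buffer is not None:
            self._buffer.append(data)


def extract_paragraphs(html):
    """Hypothetical helper: return the text of every <p> in the given HTML."""
    parser = ParagraphTextParser()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Hypothetical sample: assumes the page wrapped each paragraph in <p> and the
# citation numbers in <sup><a>...</a></sup>.
html_text = """<p>When ES cells differentiate, they migrate out from colonies
on gelatin-coated dishes <sup><a href="#r17">[17]</a></sup>.</p>
<p>(A) R1 cells were cultured for 5 days in the presence of
various doses of LIF.</p>"""

for paragraph in extract_paragraphs(html_text):
    print(paragraph)
```

Because the parser tracks where each `<p>` opens and closes, a citation tag in the middle of a sentence no longer splits the paragraph, and no separate `re.sub` pass is needed to remove the newlines.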
Wouldn't it be easier to read the XML directly with a Python library?
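If the source really is well-formed XML, the standard library's `xml.etree.ElementTree` is enough: `Element.itertext()` yields the text of an element and all of its descendants (including the text that follows each child tag), so inline tags such as citations do not break a paragraph into pieces. A minimal sketch; the JATS-like element names (`<sec>`, `<p>`, `<italic>`, `<xref>`), the sample content and the `paragraph_texts` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element names are an assumption modeled on
# JATS-style article XML, since the post's real markup was not preserved.
xml_text = """<sec>
<p>Expression of <italic>nanog</italic> <xref ref-type="bibr">[19]</xref>,
a well-known marker for undifferentiated ES cells, was examined.</p>
<p>R1 cells were cultured for 5 days in the presence of LIF.</p>
</sec>"""


def paragraph_texts(xml_string):
    """Hypothetical helper: return the flattened text of every <p> element."""
    root = ET.fromstring(xml_string)
    texts = []
    for p in root.iter("p"):
        # itertext() covers <p>'s own text, its children and their tails.
        text = "".join(p.itertext())
        # Collapse newlines and repeated whitespace into single spaces.
        texts.append(" ".join(text.split()))
    return texts


for text in paragraph_texts(xml_text):
    print(text)
```

Note that `ET.fromstring` raises `xml.etree.ElementTree.ParseError` on markup that is not well-formed XML. For messy real-world HTML, an HTML parser such as BeautifulSoup or `html.parser` is the safer choice.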