Python Learning (10): Regular Expressions

2021/10/10 17:14:12


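Regular expressions are a powerful tool and technique for string processing.

A regular expression uses a predefined pattern to match a class of strings that share common features. It is used mainly for string processing, where it can carry out complex search and replace tasks quickly and accurately, and it is widely applied in text editing and processing, web crawlers, and similar settings.

In Python, the re module provides the functionality needed to work with regular expressions.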

Main functions of the re module

>>> import re                                   # import the re module
>>> text = 'alpha. beta....gamma delta'         # test string
>>> re.split(r'[\. ]+', text)                   # split on runs of dots and spaces
['alpha', 'beta', 'gamma', 'delta']
>>> re.split(r'[\. ]+', text, maxsplit=2)       # split at most 2 times
['alpha', 'beta', 'gamma delta']
>>> re.split(r'[\. ]+', text, maxsplit=1)       # split at most 1 time
['alpha', 'beta....gamma delta']
>>> pat = '[a-zA-Z]+'
>>> re.findall(pat, text)                       # find all words
['alpha', 'beta', 'gamma', 'delta']
>>> pat = '{name}'
>>> text = 'Dear {name}...'
>>> re.sub(pat, 'Mr.Dong', text)                # string replacement
'Dear Mr.Dong...'
>>> s = 'a s d'
>>> re.sub('a|s|d', 'good', s)                  # replace any of several alternatives
'good good good'
>>> s = "It's a very good good idea"
>>> re.sub(r'(\b\w+) \1', r'\1', s)             # collapse a consecutively repeated word
"It's a very good idea"
>>> re.sub(r'((\w+) )\1', r'\2', s)             # group 1 includes the space, so the space after the word is lost
"It's a very goodidea"
>>> re.sub('a', lambda x: x.group(0).upper(), 'aaa abc abde')   # repl is a callable
'AAA Abc Abde'
>>> re.sub('[a-z]', lambda x: x.group(0).upper(), 'aaa abc abde')
'AAA ABC ABDE'
>>> re.sub('[a-zA-Z]', lambda x: chr(ord(x.group(0)) ^ 32), 'aaa aBc abde')   # swap the case of English letters
'AAA AbC ABDE'
>>> re.subn('a', 'dfg', 'aaa abc abde')         # return the new string and the number of replacements
('dfgdfgdfg dfgbc dfgbde', 5)
>>> re.sub('a', 'dfg', 'aaa abc abde')
'dfgdfgdfg dfgbc dfgbde'
>>> re.escape('http://www.python.org')          # escape special characters (Python 3.7+ output; older versions also escape ':' and '/')
'http://www\\.python\\.org'
>>> example = 'Beautiful is better than ugly.'
>>> re.findall(r'\bb.+?\b', example)            # whole words starting with the letter b; the ? makes the match non-greedy
['better']
>>> re.findall(r'\bb.+\b', example)             # result of the greedy version
['better than ugly']
>>> re.findall(r'\bb\w*\b', example)
['better']
>>> re.findall(r'\Bh.+?\b', example)            # the rest of a word, from an h that is not its first letter
['han']
>>> re.findall(r'\b\w.+?\b', example)           # all words (each must have at least two characters)
['Beautiful', 'is', 'better', 'than', 'ugly']
>>> re.findall(r'\d+\.\d+\.\d+', 'Python 2.7.13')   # find numbers of the form x.x.x
['2.7.13']
>>> re.findall(r'\d+\.\d+\.\d+', 'Python 2.7.13,Python 3.6.0')
['2.7.13', '3.6.0']
>>> s = '<html><head>This is head.</head><body>This is body.</body></html>'
>>> pattern = r'<html><head>(.+)</head><body>(.+)</body></html>'
>>> result = re.search(pattern, s)
>>> result.group(1)                             # first subpattern
'This is head.'
>>> result.group(2)                             # second subpattern
'This is body.'
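
The difference between greedy and non-greedy matching matters most when a closing delimiter appears more than once. The following is a minimal sketch of that case, together with re.escape() used to search for text literally. The sample string html and all variable names are invented for illustration and do not come from the session above.

```python
import re

# Hypothetical sample text, made up for this illustration.
html = '<b>first</b> and <b>second</b>'

# Greedy: .+ runs to the LAST </b>, so one match swallows both elements.
greedy = re.findall(r'<b>(.+)</b>', html)

# Non-greedy: .+? stops at the FIRST </b> that lets the match succeed.
lazy = re.findall(r'<b>(.+?)</b>', html)

# re.escape() makes special characters literal, so '.' no longer matches any character.
literal = re.escape('a.b')
hits = re.findall(literal, 'a.b axb a.b')

print(greedy)  # ['first</b> and <b>second']
print(lazy)    # ['first', 'second']
print(hits)    # ['a.b', 'a.b']
```

Without re.escape(), the pattern 'a.b' would also match 'axb', because an unescaped dot matches any character.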

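Using regular expression objects

First compile the regular expression into a regular expression object with the compile() function of the re module, then process strings with the methods that object provides.

Reusing a compiled regular expression object can speed up string processing, particularly when the same pattern is applied many times.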
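
As a rough sketch of the compile-once pattern described above, the example below compiles a pattern outside a loop and reuses it for every line. The list lines and the name code_pattern are hypothetical and serve only as illustration.

```python
import re

# Hypothetical log lines, used only for illustration.
lines = ['error 404 at /index', 'ok', 'error 500 at /login', 'error at /x']

# Compile once, outside the loop, then reuse the pattern object.
code_pattern = re.compile(r'error (\d{3})')

codes = []
for line in lines:
    m = code_pattern.search(line)
    if m:  # search() returns None when nothing matches
        codes.append(int(m.group(1)))

print(codes)  # [404, 500]
```

The module-level functions such as re.search() also cache recently compiled patterns, so the speed gain from compile() is often modest; a named pattern object mainly keeps the loop body short and makes the reuse explicit.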

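The match(string[, pos[, endpos]]) method of a regular expression object tries to match at the beginning of the string or at the specified position; the pattern must occur exactly there.

The search(string[, pos[, endpos]]) method of a regular expression object searches the whole string for the first match.

The findall(string[, pos[, endpos]]) method of a regular expression object returns a list of all substrings that match the regular expression.

>>> import re
>>> example = 'ShanDong Institute of Business and Technology'
>>> pattern = re.compile(r'\bB\w+\b')           # find words starting with B
>>> pattern.findall(example)                    # use the findall() method of the pattern object
['Business']
>>> pattern = re.compile(r'\w+g\b')             # find words ending with the letter g
>>> pattern.findall(example)
['ShanDong']
>>> pattern = re.compile(r'\b[a-zA-Z]{3}\b')    # find words exactly 3 letters long
>>> pattern.findall(example)
['and']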
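
The optional pos and endpos arguments of match(), search(), and findall() are not exercised in the session above. The following sketch, with an invented test string, shows how they change where matching starts and stops.

```python
import re

pattern = re.compile(r'\d+')
text = 'abc123def456'  # invented test string

at_start = pattern.match(text)        # None: text does not begin with a digit
at_pos3 = pattern.match(text, 3)      # matching is attempted at index 3
first = pattern.search(text)          # first match anywhere in text
from_6 = pattern.search(text, 6)      # scanning begins at index 6
everything = pattern.findall(text)    # every non-overlapping match
window = pattern.findall(text, 0, 5)  # only text[0:5] ('abc12') is scanned

print(at_start)          # None
print(at_pos3.group())   # 123
print(first.group())     # 123
print(from_6.group())    # 456
print(everything)        # ['123', '456']
print(window)            # ['12']
```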

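Subpatterns

Parentheses () mark a subpattern, and the content inside them is treated as a single unit. For example, '(red)+' matches one or more consecutive repetitions of 'red', such as 'redred' and 'redredred'.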
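
To make the effect of the parentheses concrete, here is a small sketch that compares 'red+' with '(red)+'. The test strings and variable names are chosen only for illustration.

```python
import re

# Without parentheses, + applies only to the preceding character 'd'.
no_group = re.fullmatch(r'red+', 'redred')      # None: 'redred' is not 're' followed by d's
only_d = re.fullmatch(r'red+', 'reddd')         # matches 'reddd'

# With parentheses, + repeats the whole subpattern 'red'.
grouped = re.fullmatch(r'(red)+', 'redredred')  # matches the full string

# A repeated capturing group keeps only its last repetition.
last = grouped.group(1)

# (?:...) groups without capturing, so findall() returns whole matches.
runs = re.findall(r'(?:red)+', 'red redred blue')

print(no_group)         # None
print(only_d.group())   # reddd
print(grouped.group())  # redredred
print(last)             # red
print(runs)             # ['red', 'redred']
```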

>>> telNumber = '''Suppose my Phone No. is 0535-1234567,
yours is 010-12345678, his is 025-87654321.'''
>>> pattern = re.compile(r'(\d{3,4})-(\d{7,8})')
>>> pattern.findall(telNumber)
[('0535', '1234567'), ('010', '12345678'), ('025', '87654321')]
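
The shape of the list returned by findall() depends on how many subpatterns the pattern contains, which explains the tuples in the result above. The sketch below, using a shortened and invented sample string, compares the three cases and uses finditer() to reach match positions.

```python
import re

text = 'Tel: 0535-1234567, 010-12345678'  # invented sample string

# No subpattern: a list of whole matches.
whole = re.findall(r'\d{3,4}-\d{7,8}', text)

# One subpattern: a list of that subpattern's text only.
area = re.findall(r'(\d{3,4})-\d{7,8}', text)

# Two subpatterns: a list of tuples, one item per subpattern.
pairs = re.findall(r'(\d{3,4})-(\d{7,8})', text)

# finditer() yields match objects, which also expose positions.
starts = [m.start() for m in re.finditer(r'\d{3,4}-\d{7,8}', text)]

print(whole)   # ['0535-1234567', '010-12345678']
print(area)    # ['0535', '010']
print(pairs)   # [('0535', '1234567'), ('010', '12345678')]
print(starts)  # [5, 19]
```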

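When the match() and search() methods of a regular expression object succeed, they return a match object; when they fail, they return None. The main methods of a match object are:

 group(): returns the content of one or more matched subpatterns

 groups(): returns a tuple containing the content of all matched subpatterns

 groupdict(): returns a dictionary containing the content of all matched named subpatterns

 start(): returns the start position of the specified subpattern

 end(): returns the end position of the specified subpattern, that is, the index just past its last character

 span(): returns a tuple (start, end) holding the start and end positions of the specified subpattern

>>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist")
>>> m.group(0)                                  # the whole match
'Isaac Newton'
>>> m.group(1)                                  # the first subpattern
'Isaac'
>>> m.group(2)                                  # the second subpattern
'Newton'
>>> m.group(1, 2)                               # several subpatterns at once
('Isaac', 'Newton')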
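
The session above only exercises group(). As a sketch of the remaining methods in the list, the example below matches the same string with named subpatterns; the group names first and last are assumptions added for illustration.

```python
import re

# Same input as above; the names 'first' and 'last' are added for illustration.
text = 'Isaac Newton, physicist'
m = re.match(r'(?P<first>\w+) (?P<last>\w+)', text)

all_groups = m.groups()     # tuple of every subpattern
by_name = m.groupdict()     # dict of the named subpatterns
start2 = m.start(2)         # index where subpattern 2 begins
end2 = m.end(2)             # index just past the end of subpattern 2
whole_span = m.span()       # (start, end) of the whole match

print(all_groups)           # ('Isaac', 'Newton')
print(by_name)              # {'first': 'Isaac', 'last': 'Newton'}
print(start2, end2)         # 6 12
print(text[start2:end2])    # Newton
print(whole_span)           # (0, 12)
```

Because end() is exclusive, slicing the original string with start() and end() reproduces the matched text exactly.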
