Python Web Scraping
2021/10/16 11:09:39
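This article introduces web scraping in Python through short examples that use requests, BeautifulSoup, re, and json. It is meant as a reference for programmers working on similar problems.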
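import requests

# Fetch a page and compare the different views of the response body
response = requests.get("http://www.baidu.com")
print(response.text)  # str; may be garbled if the guessed encoding is wrong

response.encoding = 'utf8'  # tell requests how to decode the body
print(response.text)  # str, now decoded as UTF-8

print(response.content)  # raw bytes
print(response.content.decode())  # decode the bytes to str (UTF-8 by default)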
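The garbled output in the example above comes from decoding UTF-8 bytes with the wrong codec. The following is a minimal offline sketch of that effect, not what requests does internally: the sample string and the choice of ISO-8859-1 as the "wrong" codec are assumptions made for illustration.

```python
# Stand-in for response.content: UTF-8 encoded bytes of a Chinese string
# (the sample text is hypothetical, chosen only for illustration).
raw = "百度一下".encode("utf-8")

# Decoding with the wrong codec does not fail, it silently produces mojibake.
garbled = raw.decode("iso-8859-1")

# Decoding with the codec the bytes were written in restores the text.
correct = raw.decode("utf-8")

print(repr(garbled))
print(correct)
```

Setting response.encoding before reading response.text, or calling response.content.decode() with an explicit codec, both amount to choosing the right codec for this step.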
from bs4 import BeautifulSoup

# Build a BeautifulSoup object from an HTML string, using the lxml parser
soup = BeautifulSoup('<html>data</html>', 'lxml')
print(soup)
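from bs4 import BeautifulSoup

'''
find(self, name=None, attrs={}, recursive=True, text=None, **kwargs)
    name:      tag name, e.g. 'a' for <a>
    attrs:     dict of attributes to match, e.g. {'class': ...}
    recursive: whether to search all descendants recursively
    text:      search by text content
    returns:   the first matching element

find_all takes the same arguments and returns all matching elements.

Examples:
    find(id='link1')
    find(attrs={'id': 'link1'})
    find(text='123')

Attributes of a returned tag a:
    a.name   tag name
    a.attrs  attributes
    a.text   text content
'''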
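The find calls listed above can be tried on a small inline document. This is a sketch under a few assumptions: the HTML snippet is made up for illustration, the built-in 'html.parser' is used instead of 'lxml' so that no extra parser has to be installed, and the text search uses string=, which is the newer name for the text= argument (Beautiful Soup 4.4.0 and later).

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, made up for this example
html = """
<html><body>
<a id="link1" class="nav" href="/one">123</a>
<a id="link2" class="nav" href="/two">456</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

first = soup.find("a")                       # first <a> tag in the document
by_id = soup.find(id="link1")                # match on the id attribute
by_attrs = soup.find(attrs={"id": "link1"})  # same match, via an attrs dict
by_text = soup.find(string="123")            # matches the text node itself
all_links = soup.find_all("a")               # list of every <a> tag

print(first.name)      # tag name
print(first.attrs)     # dict of attributes
print(first.text)      # text content
print(len(all_links))  # number of <a> tags found
```

Note that a text search returns the matching string rather than the tag; by_text.parent gives the enclosing tag.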
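import requests
from bs4 import BeautifulSoup

# Fetch the DXY COVID-19 page and decode the response body as UTF-8
response = requests.get('https://ncov.dxy.cn/ncovh5/view/pneumonia')
home_page = response.content.decode()

# Find the element whose id is 'getListByCountryTypeService2true'
soup = BeautifulSoup(home_page, 'lxml')
script = soup.find(id='getListByCountryTypeService2true')

# Print the text inside that element
text = script.text
print(text)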
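import re

# Plain strings: '\n' is a newline in both the pattern and the text
rs = re.findall('a\nbc', 'a\nbc')
print(rs)  # ['a\nbc']

# The pattern a\nbc matches a newline, but the text contains a literal backslash and 'n'
rs = re.findall('a\\nbc', 'a\\nbc')
print(rs)  # []

# Matching a literal backslash takes four backslashes in a plain-string pattern
rs = re.findall('a\\\\nbc', 'a\\nbc')
print(rs)  # ['a\\nbc']

# Raw string: the regex engine receives \n and matches the newline in the text
rs = re.findall(r'a\nbc', 'a\nbc')
print(rs)  # ['a\nbc']

# Raw strings keep patterns such as \d readable
rs = re.findall(r'\d', 'a123')
print(rs)  # ['1', '2', '3']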
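import json
import re

import requests
from bs4 import BeautifulSoup

# Fetch the DXY COVID-19 page and decode the response body as UTF-8
response = requests.get('https://ncov.dxy.cn/ncovh5/view/pneumonia')
home_page = response.content.decode()

# Find the element whose id is 'getListByCountryTypeService2true'
soup = BeautifulSoup(home_page, 'lxml')
script = soup.find(id='getListByCountryTypeService2true')
text = script.text
# print(text)

# Extract the JSON array embedded in the text: everything from the first '[' to the last ']'
json_str = re.findall(r'\[.+\]', text)[0]
# print(json_str)

# Convert the JSON string into Python objects
last_day = json.loads(json_str)
print(last_day)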
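The extraction step above (regex, then json.loads) can be tested without network access. The sketch below wraps it in a helper; the function name extract_json_array, the sample text, and its field names are hypothetical and only imitate the general shape of a script that assigns a JSON array to a variable. It also adds re.DOTALL and an explicit error, which the original code does not have, so that a multi-line array or a missing array is handled.

```python
import json
import re


def extract_json_array(script_text):
    """Return the JSON array embedded in script_text as Python objects.

    The greedy pattern spans from the first '[' to the last ']'.
    re.DOTALL lets '.' match newlines in case the array is multi-line.
    """
    match = re.search(r"\[.+\]", script_text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON array found in the given text")
    return json.loads(match.group(0))


# Hypothetical sample text, not real page content
sample = (
    'try { window.example = '
    '[{"name": "A", "count": 1}, {"name": "B", "count": 2}]'
    ' } catch (e) {}'
)

records = extract_json_array(sample)
print(records)
```

Because the pattern is greedy, it relies on the text containing no other square brackets outside the array; if the surrounding script has any, the match will include them and json.loads will raise an error.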