Two ways to write a Python crawler
2021/11/14 1:10:34
This article introduces two ways to write a Python crawler. It should be a useful reference for anyone working through similar programming problems, so follow along and learn with us!
1. The requests approach
(1) Without request headers
import requests

url = "https://www.cnblogs.com/dearvee/p/6558571.html"
response = requests.get(url)
response.encoding = 'utf-8'  # decode the body as UTF-8
print(response.text)
(2) With request headers
import requests

url = "https://www.cnblogs.com/dearvee/p/6558571.html"
# Send a browser-like User-Agent so the site treats us as a normal visitor
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"}
response = requests.get(url, headers=headers)
response.encoding = 'utf-8'
print(response.text)
2. The urllib.request approach
(1) Without a Request object
from urllib import request

url = "https://www.cnblogs.com/dearvee/p/6558571.html"
response = request.urlopen(url)
print(response.read().decode('utf-8'))
(2) Building a Request object
from urllib import request

url = "https://www.cnblogs.com/dearvee/p/6558571.html"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"}
# Wrap the URL and headers in a Request object, then open it
req = request.Request(url, headers=headers)
response = request.urlopen(req)
print(response.read().decode('utf-8'))
3. Catching errors
from urllib import request, error

url = "https://www.douban.com"
try:
    req = request.Request(url)
    response = request.urlopen(req)
    print(response.read().decode('utf-8'))
except error.HTTPError as e:
    # Server responded with an error status (e.g. 403, 404)
    print(e)
except error.URLError as e:
    # Request never completed (e.g. DNS failure, connection refused);
    # HTTPError is a subclass of URLError, so it must be caught first
    print(e.reason)
4. Randomizing the request headers
from fake_useragent import UserAgent

ua = UserAgent()
print(ua.ie)      # a random IE version
print(ua.firefox) # a random Firefox version
print(ua.chrome)  # a random Chrome version
print(ua.random)  # a random browser from any vendor
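fake_useragent fetches its browser data from an online source, so it may not always be available. The same idea can be sketched with only the standard library: keep a small hand-maintained list of User-Agent strings (the strings below are illustrative examples, not current browser versions) and pick one at random for each request.

```python
import random

# Hand-maintained fallback list of example User-Agent strings (assumption:
# you would keep this list updated yourself; these are not current versions).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.5 Safari/605.1.15",
]

def random_headers():
    """Build a headers dict with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(USER_AGENTS)}

print(random_headers())
```

Use it like the earlier requests example: `requests.get(url, headers=random_headers())`.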
That wraps up this article on the two ways to write a Python crawler. We hope the article is helpful, and we hope you will keep supporting 为之网!