python爬虫-2.requests库
2021/8/23 9:28:31
本文主要是介绍python爬虫-2.requests库,对大家解决编程问题具有一定的参考价值,需要的程序猿们随着小编来一起学习吧!
requests
requests 是在 urllib 的基础上搞出来的,通过它我们可以用更少的代码,模拟浏览器操作,使用比urllib方便
对于不是 python 的内置库
我们需要安装一下,直接使用 pip 安装pip install requests
导入 requests 模块
import requests
具体使用
一行代码 Get 请求
r = requests.get('https://api.github.com/events')
一行代码 Post 请求
r = requests.post('https://httpbin.org/post', data = {'key':'value'})
其它乱七八糟的 Http 请求
>>> r = requests.put('https://httpbin.org/put', data = {'key':'value'}) >>> r = requests.delete('https://httpbin.org/delete') >>> r = requests.head('https://httpbin.org/get') >>> r = requests.options('https://httpbin.org/get')
想要携带请求参数是吧?
>>> payload = {'key1': 'value1', 'key2': 'value2'} >>> r = requests.get('https://httpbin.org/get', params=payload)
假装自己是浏览器
>>> url = 'https://api.github.com/some/endpoint' >>> headers = {'user-agent': 'my-app/0.0.1'} >>> r = requests.get(url, headers=headers)
获取服务器响应文本内容
>>> import requests >>> r = requests.get('https://api.github.com/events') >>> r.text u'[{"repository":{"open_issues":0,"url":"https://github.com/...">https://github.com/... >>> r.encoding 'utf-8'
获取字节响应内容
>>> r.content b'[{"repository":{"open_issues":0,"url":"https://github.com/...
获取响应码
>>> r = requests.get('https://httpbin.org/get')>>> r.status_code200
获取响应头
>>> r.headers{ 'content-encoding': 'gzip', 'transfer-encoding': 'chunked', 'connection': 'close', 'server': 'nginx/1.0.4', 'x-runtime': '148ms', 'etag': '"e1ca502697e5c9317743dc078f67693f"', 'content-type': 'application/json' }
获取 Json 响应内容
>>> import requests >>> r = requests.get('https://api.github.com/events') >>> r.json() [{u'repository': {u'open_issues': 0, u'url': 'https://github.com/...
获取 socket 流响应内容
>>> r = requests.get('https://api.github.com/events', stream=True) >>> r.raw<urllib3.response.HTTPResponse object at 0x101194810 >>>> r.raw.read(10) '\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'
Post请求
当你想要一个键里面添加多个值的时候
>>> payload_tuples = [('key1', 'value1'), ('key1', 'value2')] >>> r1 = requests.post('https://httpbin.org/post', data=payload_tuples) >>> payload_dict = {'key1': ['value1', 'value2']} >>> r2 = requests.post('https://httpbin.org/post', data=payload_dict) >>> print(r1.text) { ... "form": { "key1": [ "value1", "value2" ] }, ...} >>> r1.text == r2.textTrue
请求的时候用 json 作为参数
>>> url = 'https://api.github.com/some/endpoint' >>> payload = {'some': 'data'} >>> r = requests.post(url, json=payload)
想上传文件?
>>> url = 'https://httpbin.org/post' >>> files = {'file': open('report.xls', 'rb')} >>> r = requests.post(url, files=files) >>> r.text { ... "files": { "file": "<censored...binary...data>" }, ...}
获取 cookie 信息
>>> url = 'http://example.com/some/cookie/setting/url' >>> r = requests.get(url) >>> r.cookies['example_cookie_name'] 'example_cookie_value'
发送 cookie 信息
>>> url = 'https://httpbin.org/cookies' >>> cookies = dict(cookies_are='working') >>> r = requests.get(url, cookies=cookies) >>> r.text '{"cookies": {"cookies_are": "working"}}'
设置超时
>>> requests.get('https://github.com/', timeout=0.001)Traceback (most recent call last): File "<stdin>", line 1, in <module>requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)
wistbean/learn_python3_spider: python爬虫教程系列
这篇关于python爬虫-2.requests库的文章就介绍到这儿,希望我们推荐的文章对大家有所帮助,也希望大家多多支持为之网!
- 2025-01-03用FastAPI掌握Python异步IO:轻松实现高并发网络请求处理
- 2025-01-02封装学习:Python面向对象编程基础教程
- 2024-12-28Python编程基础教程
- 2024-12-27Python编程入门指南
- 2024-12-27Python编程基础
- 2024-12-27Python编程基础教程
- 2024-12-27Python编程基础指南
- 2024-12-24Python编程入门指南
- 2024-12-24Python编程基础入门
- 2024-12-24Python编程基础:变量与数据类型