Learning the Requests Library

About the Requests library

Requests makes sending HTTP requests very simple and is widely used for web scraping.

Setup

First, install the Requests library:
```
pip install requests
```
Import the library before use:
```
import requests
```

Getting started

requests.get

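Requests provides get, post, put, delete, head, options and similar functions, one per HTTP method. They accept keyword arguments such as params, data, json, headers, cookies, files, auth, timeout and proxies, and each call returns a Response object.

For a GET request, a dict passed to params is URL-encoded and appended to the URL after a ?.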
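To see exactly where params ends up without making a network call, the request can be built with requests.Request(...).prepare() and the final URL inspected. A minimal sketch; the /get path is only an example endpoint.

```python
import requests

base = "http://httpbin.org/get"

# Build (but do not send) a GET request and inspect the final URL
prepared = requests.Request("GET", base, params={"key": "value1", "key1": 2}).prepare()
print(prepared.url)  # http://httpbin.org/get?key=value1&key1=2

# A list value repeats the key; a key whose value is None is left out
prepared2 = requests.Request("GET", base, params={"tag": ["a", "b"], "skip": None}).prepare()
print(prepared2.url)  # http://httpbin.org/get?tag=a&tag=b
```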

```python
import requests

# GET query parameters: params is appended to the URL after "?"
# (for a POST, form fields are passed via data instead)
params = {'key': 'value1', 'key1': 2}
proxies = {}  # no proxy
header = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36",
}
# json= sends a JSON request body (unusual for GET; included only to show the parameter)
my_json = {'some': 'data'}

# Call get
resp = requests.get('http://httpbin.org', params=params, headers=header, timeout=20, proxies=proxies, json=my_json)
print(resp.url)  # http://httpbin.org/?key=value1&key1=2
print(resp.status_code)  # 200
print(resp.cookies)  # <RequestsCookieJar[]>
print(resp.headers)  # {'Date': 'Sat, 27 Mar 2021 03:02:42 GMT', 'Content-Type': 'text/html; charset=utf-8', ...}
print(resp.encoding)  # utf-8
print(resp.history)  # [] (no redirects happened)
print(resp.text)  # the whole HTML page, decoded with resp.encoding
resp.encoding = 'ISO-8859-1'  # assigning resp.encoding changes how resp.text is decoded
print(resp.encoding)  # ISO-8859-1

# Other HTTP methods
resp = requests.put('http://httpbin.org/put', data={'key': 'value'})
resp = requests.delete('http://httpbin.org/delete')
resp = requests.head('http://httpbin.org/get')
resp = requests.options('http://httpbin.org/get')

# Download and save an image (binary content)
image_url = "https://t7.baidu.com/it/u=2780797146,595893742&fm=193&f=GIF"
# stream=True defers downloading the body, so iter_content reads it in chunks
# instead of loading the whole file into memory at once
resp = requests.get(image_url, headers=header, stream=True)
resp.raise_for_status()
with open('views.jpg', 'wb') as f:
    for chunk in resp.iter_content(chunk_size=1024):
        f.write(chunk)
```
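resp.text is resp.content (the raw bytes) decoded with resp.encoding, which is why assigning resp.encoding changes what resp.text returns. The sketch below simulates that offline with plain bytes standing in for a real response body (an assumption made so no request is needed). When a server declares the wrong charset, resp.apparent_encoding, Requests' guess based on the body, is a common value to try instead.

```python
# Offline simulation: `body` stands in for resp.content (bytes from the server)
body = "Requests 爬虫".encode("utf-8")

# Decoding with the right encoding recovers the text...
right = body.decode("utf-8")
# ...while a wrong one (as after resp.encoding = 'ISO-8859-1') yields mojibake
wrong = body.decode("ISO-8859-1")

print(right)                   # Requests 爬虫
print(wrong == right)          # False
print(len(right), len(wrong))  # 11 15
```

Each CJK character takes 3 bytes in UTF-8, and ISO-8859-1 maps every byte to its own character, so the wrongly decoded string is longer.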

requests.post

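A POST request works like submitting an HTML form: pass a dict to data and Requests encodes it as form data (application/x-www-form-urlencoded) when the request is sent, so the server receives the entries as ordinary form fields.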

```python
# -*- coding: utf-8 -*-
import requests

# POST form fields
data = {"name": "admin", "password": "//pq.2JF"}
# Alternatively, pass a sequence of (key, value) tuples so several form fields share the same key
data2 = (('name', 'admin'), ('name', 'administrator'))
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36",
}

resp = requests.post('https://httpbin.org/post', data=data2, headers=headers)
print(resp.text)  # httpbin echoes the submitted fields under "form"
```
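To check what data= actually sends, the request can be prepared without sending it. This sketch compares a form body (including repeated keys from a list of tuples) with the JSON body produced by json=; the field values are placeholders.

```python
import json

import requests

url = "https://httpbin.org/post"

# data= with a dict: form-encoded body
form = requests.Request("POST", url, data={"name": "admin", "password": "secret"}).prepare()
print(form.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form.body)  # name=admin&password=secret

# data= with (key, value) pairs: the same key can appear more than once
repeated = requests.Request("POST", url, data=[("name", "admin"), ("name", "administrator")]).prepare()
print(repeated.body)  # name=admin&name=administrator

# json= instead of data=: the dict is serialized to JSON and Content-Type is set automatically
as_json = requests.Request("POST", url, json={"name": "admin"}).prepare()
print(as_json.headers["Content-Type"])  # application/json
print(json.loads(as_json.body))  # {'name': 'admin'}
```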

POSTing a multipart-encoded file

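```python
# -*- coding: utf-8 -*-
import requests

# Upload a file
url = "http://httpbin.org/post"

# Upload an .xlsx file, explicitly setting the filename, content type and per-part headers;
# the with block closes the file once the request has been sent
with open('report.xlsx', 'rb') as fp:
    files = {'file': ('report.xlsx', fp,
                      'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
                      {'Expires': '0'})}
    resp = requests.post(url, files=files)

# Send a string to be received as a file
# files = {'file': ('report.csv', 'some,data,to,send\nanother,row,to,send\n')}

# Upload an image (views.jpg can be swapped for report.xlsx and it uploads the same way)
# files = {'file': open('views.jpg', 'rb')}

print(resp.status_code)
print(resp.text)
print(resp.headers)
```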
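Passing files= switches the request body to multipart/form-data. Preparing the request offline makes the generated parts visible; in this sketch an in-memory bytes string stands in for an opened file (an assumption for illustration), and an ordinary form field named comment (hypothetical) is sent alongside it via data=.

```python
import requests

# In-memory CSV content stands in for open('report.csv', 'rb')
files = {"file": ("report.csv", b"some,data,to,send\nanother,row,to,send\n", "text/csv")}
prepared = requests.Request("POST", "http://httpbin.org/post",
                            files=files, data={"comment": "monthly"}).prepare()

# Requests generates a random boundary and puts it in the Content-Type header
print(prepared.headers["Content-Type"])  # multipart/form-data; boundary=...

body = prepared.body  # bytes containing one part per form field and file
print(b'name="comment"' in body)          # True
print(b'filename="report.csv"' in body)   # True
print(b"Content-Type: text/csv" in body)  # True
```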

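Cookies set by the server can be read from the Response object's cookies attribute.
Cookies can also be sent to the server through the cookies parameter.

Cookies are returned as a RequestsCookieJar, which behaves like a dict but offers a fuller interface suited to use across multiple domains and paths. A cookie jar can also be passed into Requests.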

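```python
# -*- coding: utf-8 -*-
import requests

url = "http://httpbin.org/cookies"

# Cookies to send to the server, as a plain dict
cookies = dict(cookies_are="it's Working!")
resp = requests.get(url, cookies=cookies)
print(resp.status_code)
print(resp.text)  # httpbin echoes back the cookies it received

# Build a RequestsCookieJar; each cookie is bound to a domain and a path
jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')

# Pass the jar itself; only tasty_cookie matches the /cookies path, so only it is sent
resp = requests.get(url, cookies=jar)
print(resp.text)
```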
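Because RequestsCookieJar behaves like a dict, its contents can be inspected directly, and preparing a request shows which cookies would actually be attached. This sketch reuses the domains and paths from the example above and sends nothing over the network.

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set("tasty_cookie", "yum", domain="httpbin.org", path="/cookies")
jar.set("gross_cookie", "blech", domain="httpbin.org", path="/elsewhere")

# Dict-style access plus domain/path-aware lookups
print(jar["tasty_cookie"])                         # yum
print(sorted(jar.keys()))                          # ['gross_cookie', 'tasty_cookie']
print(jar.get("gross_cookie", path="/elsewhere"))  # blech
print(jar.get_dict(path="/cookies"))               # {'tasty_cookie': 'yum'}

# Only cookies whose domain and path match the request URL end up in the Cookie header
prepared = requests.Request("GET", "http://httpbin.org/cookies", cookies=jar).prepare()
print(prepared.headers.get("Cookie"))              # tasty_cookie=yum
```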

Setting a proxy server

The proxies parameter specifies which proxies to use.

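```python
# -*- coding: utf-8 -*-
import requests

url = "http://httpbin.org"
# Proxy settings: map each URL scheme to a proxy URL
# (10.10.1.10 is a placeholder address; replace it with a real proxy)
proxies = {'http': 'http://10.10.1.10:3128', 'https': 'http://10.10.1.10:1080'}

# proxies must be passed by keyword: the second positional argument of get() is params
resp = requests.get(url, proxies=proxies)
print(resp.status_code)
```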
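Requests chooses a proxy by URL scheme, and a scheme://hostname key applies a proxy to one host only. The sketch below uses requests.utils.select_proxy, the helper Requests' HTTP adapter calls internally to make this choice (it lives in requests.utils rather than the documented top-level API); all proxy addresses are placeholders.

```python
from requests.utils import select_proxy

proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
    # A scheme://hostname key overrides the scheme-wide proxy for that host only
    "http://httpbin.org": "http://10.10.1.20:3128",
}

print(select_proxy("http://httpbin.org/get", proxies))  # http://10.10.1.20:3128
print(select_proxy("http://example.com/", proxies))     # http://10.10.1.10:3128
print(select_proxy("https://example.com/", proxies))    # http://10.10.1.10:1080
print(select_proxy("ftp://example.com/", proxies))      # None (no matching key)
```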

Redirects and request history, timeouts, errors and exceptions

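By default, Requests follows redirects automatically for every method except HEAD.
The Response object's history attribute can be used to track redirects.
Response.history is a list of the Response objects that were created to complete the request, ordered from oldest to most recent.
Pass allow_redirects=False to disable redirect handling.
The timeout parameter (e.g. timeout=1) sets a timeout in seconds. It is not a limit on the total download time: an exception is raised if the server does not respond within that time, more precisely if no bytes arrive on the socket for that many seconds. Without a timeout, a request can hang indefinitely.
Exceptions: every exception that Requests explicitly raises inherits from requests.exceptions.RequestException:
- On a network problem (e.g. a DNS failure or a refused connection), Requests raises ConnectionError.
- If the HTTP response has an unsuccessful status code, Response.raise_for_status() raises HTTPError.
- If the request times out, Timeout is raised.
- If the request exceeds the configured maximum number of redirects, TooManyRedirects is raised.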
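The hierarchy above can be checked directly, and it suggests catching the specific exceptions first with RequestException as the fallback. In this sketch, fetch_text and its return messages are hypothetical; the (connect, read) timeout tuple and raise_for_status() are standard Requests features. The demo call uses a URL without a scheme, which fails while the request is being prepared, so no network access is needed.

```python
import requests
from requests import exceptions as exc

# Every exception listed above derives from RequestException
for cls in (exc.ConnectionError, exc.HTTPError, exc.Timeout, exc.TooManyRedirects):
    print(cls.__name__, issubclass(cls, exc.RequestException))  # ... True

# A connect timeout counts both as a Timeout and as a ConnectionError
print(issubclass(exc.ConnectTimeout, exc.Timeout),
      issubclass(exc.ConnectTimeout, exc.ConnectionError))  # True True

# The redirect limit that triggers TooManyRedirects
with requests.Session() as session:
    print(session.max_redirects)  # 30


def fetch_text(url, timeout=(3.05, 10)):
    """Hypothetical helper: GET url and map Requests' exceptions to short messages."""
    try:
        resp = requests.get(url, timeout=timeout)  # (connect timeout, read timeout) in seconds
        resp.raise_for_status()  # raises HTTPError for 4xx/5xx status codes
        return resp.text
    except exc.Timeout:
        return "timed out"
    except exc.HTTPError as e:
        return f"bad status: {e.response.status_code}"
    except exc.RequestException as e:  # any other error raised by Requests
        return f"request failed: {type(e).__name__}"


# A URL with no scheme is rejected before any network I/O (MissingSchema)
print(fetch_text("httpbin.org/get"))  # request failed: MissingSchema
```

Timeout is caught before the generic handler so that ConnectTimeout, which is also a ConnectionError, is reported as a timeout.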
