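In modern web development, working with APIs is an essential skill. Whether you are calling a third-party endpoint to fetch data or building your own backend service, HTTP requests are the bridge between client and server.
requests is one of the most popular HTTP client libraries in the Python ecosystem, known for its clean and elegant API. This article walks through the library from the basics to more advanced usage.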
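Why choose requests?#
Before requests existed, Python developers typically sent HTTP requests with urllib. Its API is comparatively verbose, however, and leaves details such as encoding, session management, and cookies to the caller. requests changed that:
- Concise API: a GET request takes a single line of code
- Automatic encoding: no need to encode or decode by hand
- Session management: cookies and sessions are easy to manage
- Timeout control: prevents a request from waiting forever
- File upload/download: built-in support
- Proxy support: proxy servers are easy to configure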
# Install requests
pip install requests
Basic usage#
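GET requests#
GET is the most commonly used HTTP method; it retrieves data from the server.
import requests
# The simplest GET request
response = requests.get('https://api.github.com')
# Inspect the response status code
print(response.status_code)  # 200
# Inspect the response body
print(response.text)
# Inspect the response headers
print(response.headers)
# Parse a JSON response (if the server returned JSON)
data = response.json()
print(data)
GET requests with parameters#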
There are two ways to pass URL parameters:
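Both approaches are meant to produce the same request URL. As a minimal sketch, assuming only that requests is installed, the check below prepares each request without sending it, so it runs offline. The `c++` query is an illustrative value chosen to show that params also percent-encodes characters a hand-built URL would need escaped.

```python
import requests

BASE = "https://api.github.com/search/repositories"

# Option 1: the query string is written into the URL by hand.
by_hand = requests.Request("GET", BASE + "?q=python&sort=stars").prepare()

# Option 2: requests encodes the params dict into the query string.
by_params = requests.Request(
    "GET", BASE, params={"q": "python", "sort": "stars"}
).prepare()

print(by_hand.url)
print(by_params.url)

# params also escapes characters that are not safe in a query string
# ("c++" is an illustrative search term).
encoded = requests.Request("GET", BASE, params={"q": "c++"}).prepare()
print(encoded.url)
```

Preparing a request this way is also a convenient way to debug what will be sent before any traffic leaves the machine.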
# Option 1: append the query string to the URL by hand
response = requests.get('https://api.github.com/search/repositories?q=python&sort=stars')
# Option 2: use the params argument (recommended, clearer)
params = {
    'q': 'python',
    'sort': 'stars',
    'order': 'desc'
}
response = requests.get('https://api.github.com/search/repositories', params=params)
print(response.url)  # requests builds the full URL, including the query string
POST requests#
POST requests submit data to the server.
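The difference between the data and json arguments is easiest to see in the request that requests builds. The following sketch prepares both requests without sending them, so it needs no network access; the payload values are illustrative.

```python
import json
import requests

URL = "https://httpbin.org/post"

# data= sends a form-encoded body.
form_request = requests.Request("POST", URL, data={"username": "test"}).prepare()

# json= serializes the payload and sets the JSON content type.
json_request = requests.Request(
    "POST", URL, json={"name": "test", "age": 25}
).prepare()

print(form_request.headers["Content-Type"])
print(form_request.body)
print(json_request.headers["Content-Type"])
print(json.loads(json_request.body))  # decode the body only to display it
```

Which argument to use depends on what the server expects: HTML-style form endpoints usually want data, while most JSON APIs want json.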
# Submit form data
data = {
    'username': 'test',
    'password': '123456'
}
response = requests.post('https://httpbin.org/post', data=data)
print(response.json())
# Submit JSON data (the json argument serializes the payload, so importing json is not needed)
response = requests.post(
    'https://httpbin.org/post',
    json={'name': '张三', 'age': 25}
)
print(response.json())
Other HTTP methods#
requests supports all the standard HTTP methods:
# PUT - update a resource
response = requests.put('https://httpbin.org/put', data={'key': 'value'})
# DELETE - delete a resource
response = requests.delete('https://httpbin.org/delete')
# PATCH - partially update a resource
response = requests.patch('https://httpbin.org/patch', data={'key': 'new_value'})
# HEAD - fetch only the response headers
response = requests.head('https://httpbin.org/get')
# OPTIONS - ask which HTTP methods are supported
response = requests.options('https://httpbin.org/get')
print(response.headers.get('allow'))
The Response object in detail#
Sending a request returns a Response object, which holds everything the server sent back.
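Two members of the Response object are easy to misread: ok and raise_for_status(). As a rough sketch, the status code can be set on a bare Response object to show how both behave for success, redirect, and error codes. Building a Response by hand is a shortcut for illustration only; real responses come from a request.

```python
import requests


def describe(status_code):
    """Return (ok, error message or None) for a bare Response with this status."""
    response = requests.Response()   # empty Response, built by hand for illustration
    response.status_code = status_code
    try:
        response.raise_for_status()  # raises HTTPError only for 4xx and 5xx
        error = None
    except requests.HTTPError as exc:
        error = str(exc)
    return response.ok, error


for code in (200, 204, 301, 404, 503):
    print(code, describe(code))
```

Note that a redirect status such as 301 still counts as ok: the property is False only for 4xx and 5xx codes.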
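Common attributes#
response = requests.get('https://httpbin.org/get')
# Status code
print(response.status_code)  # 200
# Whether the request succeeded (True for any status code below 400)
print(response.ok)  # True
# Response body
print(response.text)  # as text (decoded automatically)
print(response.content)  # as bytes (raw binary data)
# JSON response
print(response.json())  # parsed into Python objects
# Response headers
print(response.headers)  # all headers
print(response.headers['Content-Type'])  # one specific header
# Request information
print(response.url)  # final URL of the request
print(response.request)  # the PreparedRequest that was sent
Handling status codes#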
# Option 1: check the status code by hand
response = requests.get('https://httpbin.org/get')
if response.status_code == 200:
    print('Request succeeded')
elif response.status_code == 404:
    print('Resource not found')
else:
    print(f'Unexpected status: {response.status_code}')
# Option 2: use the status code constants built into requests
from requests import codes
response = requests.get('https://httpbin.org/get')
if response.status_code == codes.ok:  # codes.ok == 200
    print('Request succeeded')
# Common status code constants
print(codes.ok)  # 200
print(codes.created)  # 201
print(codes.no_content)  # 204
print(codes.not_found)  # 404
print(codes.server_error)  # 500 (alias of internal_server_error)
Working with request headers#
Adding custom request headers#
headers = {
    'User-Agent': 'MyApp/1.0',
    'Accept': 'application/json',
    'Authorization': 'Bearer your_token_here',
    'X-Custom-Header': 'custom_value'
}
response = requests.get('https://httpbin.org/headers', headers=headers)
print(response.json())
Checking the response encoding#
response = requests.get('https://httpbin.org/encoding/utf8')
print(response.encoding)  # encoding inferred from the response headers
# Set the encoding by hand
response.encoding = 'utf-8'
print(response.text)
Working with cookies#
Sending a request with cookies#
# Option 1: send a raw Cookie header
response = requests.get('https://httpbin.org/cookies', headers={'Cookie': 'session_id=abc123'})
# Option 2: use the cookies argument
response = requests.get('https://httpbin.org/cookies', cookies={'session_id': 'abc123'})
print(response.json())
# Set several cookies at once
# Header values are sent as Latin-1, so non-ASCII text is percent-encoded first
from urllib.parse import quote
cookies = {
    'name': quote('张三'),
    'age': '25',
    'city': quote('北京')
}
response = requests.get('https://httpbin.org/cookies', cookies=cookies)
Reading cookies from a response#
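# This endpoint sets the cookie on a redirect, so stop at the first response to read it
response = requests.get('https://httpbin.org/cookies/set/name/value', allow_redirects=False)
print(response.cookies)  # all cookies set by this response
print(response.cookies['name'])  # one specific cookie
Session objects#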
A Session object persists certain parameters and data, such as cookies and request headers, across multiple requests.
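What a Session persists can be inspected without any network traffic. In this sketch, prepare_request() merges the session's stored headers and cookies into two outgoing requests so they can be printed. The header and cookie values are placeholders.

```python
import requests

session = requests.Session()
# Defaults stored on the session apply to every request it sends.
session.headers.update({"User-Agent": "MyApp/1.0"})
session.cookies.set("session_id", "abc123")

# Build (but do not send) two requests through the same session.
first = session.prepare_request(
    requests.Request("GET", "https://httpbin.org/cookies")
)
second = session.prepare_request(
    requests.Request(
        "GET",
        "https://httpbin.org/headers",
        headers={"Accept": "application/json"},  # per-request header
    )
)

print(first.headers["User-Agent"], first.headers["Cookie"])
print(second.headers["User-Agent"], second.headers["Accept"], second.headers["Cookie"])
session.close()
```

Per-request headers are merged with the session defaults rather than replacing them, which is why the second request carries both the custom User-Agent and its own Accept header.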
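Why use a Session?#
A plain request opens a new connection every time, whereas a Session reuses the underlying TCP connection, which improves performance. It also carries cookies from one request to the next:
# Plain calls - every request is independent
requests.get('https://httpbin.org/cookies/set/session_id/abc123')
requests.get('https://httpbin.org/cookies')  # does not see the cookie set above
# With a Session - cookies are kept automatically
session = requests.Session()
session.get('https://httpbin.org/cookies/set/session_id/abc123')
response = session.get('https://httpbin.org/cookies')
print(response.json())  # {'cookies': {'session_id': 'abc123'}}
Using a Session in practice#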
# Simulate a login and keep the session
session = requests.Session()
# 1. Log in to obtain the cookie
login_data = {
    'username': 'your_username',
    'password': 'your_password'
}
session.post('https://example.com/login', data=login_data)
# 2. Later requests send the cookie automatically
response = session.get('https://example.com/profile')
print(response.text)
# 3. Set default headers (sent with every request)
session.headers.update({
    'User-Agent': 'MyApp/1.0',
    'Accept': 'application/json'
})
# 4. Close the session when done
session.close()
Timeouts and retries#
Setting a timeout#
Setting a timeout matters: without one, a request can wait indefinitely and hang the program:
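To see a read timeout without depending on a remote service, the sketch below points requests at a local socket that accepts connections but never answers. The listener is a stand-in invented for this example, and trust_env is switched off only so that proxy environment variables do not interfere with the local address.

```python
import socket
import requests


def demo_read_timeout():
    """Send a request to a silent local listener and report what happened."""
    # Hypothetical stand-in for a slow server: it listens but never replies.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    session = requests.Session()
    session.trust_env = False  # ignore proxy settings from the environment
    try:
        # (connect timeout, read timeout) in seconds
        session.get(f"http://127.0.0.1:{port}/", timeout=(1, 0.5))
        return "completed"
    except requests.exceptions.ReadTimeout:
        return "read timeout"
    except requests.exceptions.ConnectTimeout:
        return "connect timeout"
    finally:
        session.close()
        listener.close()


outcome = demo_read_timeout()
print(outcome)
```

The connection itself is established quickly, so it is the read timeout that fires here; without the timeout argument the call would block until the listener went away.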
# Set a timeout (in seconds)
response = requests.get('https://httpbin.org/delay/1', timeout=5)  # 5-second timeout
# Set the connect and read timeouts separately
response = requests.get(
    'https://httpbin.org/delay/2',
    timeout=(3, 10)  # (connect timeout, read timeout)
)
# Without a timeout the program may hang
# response = requests.get('https://httpbin.org/delay/10')  # Dangerous!
Handling timeout exceptions#
from requests.exceptions import Timeout, RequestException
try:
    response = requests.get('https://httpbin.org/delay/10', timeout=5)
    print(response.json())
except Timeout:
    print('Request timed out: the server is responding too slowly')
except RequestException as e:
    print(f'Request failed: {e}')
Configuring retries#
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
def create_retry_session(retries=3, backoff_factor=0.5):
    """Create a Session with a retry policy."""
    session = requests.Session()
    retry = Retry(
        total=retries,  # total number of retries
        read=retries,  # retries on read errors
        connect=retries,  # retries on connection errors
        backoff_factor=backoff_factor,  # multiplier for the delay between retries
        status_forcelist=[500, 502, 503, 504]  # retry on these status codes
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session
# Usage
session = create_retry_session()
# This endpoint always returns 500, so the call raises
# requests.exceptions.RetryError once the retries are exhausted
response = session.get('https://httpbin.org/status/500')
Error handling#
requests defines several exception types:
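How these exceptions relate to one another determines which except clause catches what. The following sketch only inspects the class hierarchy, so it runs offline. ConnectTimeout and ReadTimeout are two further requests exceptions, included here because they show why the order of except clauses matters.

```python
from requests.exceptions import (
    ConnectionError,
    ConnectTimeout,
    HTTPError,
    ReadTimeout,
    RequestException,
    Timeout,
    TooManyRedirects,
)

# Each of these derives from RequestException, so catching
# RequestException last works as a catch-all.
for exc in (ConnectionError, HTTPError, Timeout, TooManyRedirects):
    print(exc.__name__, issubclass(exc, RequestException))

# ConnectTimeout is both a ConnectionError and a Timeout, so whichever
# of those two except clauses comes first will handle it.
print(issubclass(ConnectTimeout, ConnectionError), issubclass(ConnectTimeout, Timeout))

# ReadTimeout is a Timeout but not a ConnectionError.
print(issubclass(ReadTimeout, Timeout), issubclass(ReadTimeout, ConnectionError))
```

A practical consequence: to treat every timeout the same way, put the except Timeout clause before except ConnectionError.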
from requests.exceptions import (
    ConnectionError,  # network connection error
    HTTPError,  # HTTP error response
    Timeout,  # request timed out
    TooManyRedirects,  # too many redirects
    RequestException  # base class of all requests exceptions
)
try:
    response = requests.get('https://httpbin.org/get', timeout=10)
    response.raise_for_status()  # raises HTTPError for 4xx and 5xx status codes
except HTTPError as e:
    print(f'HTTP error: {e.response.status_code}')
except ConnectionError:
    print('Network connection failed')
except Timeout:
    print('Request timed out')
except TooManyRedirects:
    print('Too many redirects')
except RequestException as e:
    print(f'Request failed: {e}')
Using raise_for_status()#
response = requests.get('https://httpbin.org/status/404')
try:
    response.raise_for_status()
except HTTPError as e:
    print(f'Error: {e}')
    print(f'Response body: {response.text}')
File upload and download#
Uploading files#
# Upload a file (the with block closes it after the request)
with open('example.txt', 'rb') as f:
    response = requests.post('https://httpbin.org/post', files={'file': f})
# Specify the file name and content type
with open('example.txt', 'rb') as f:
    files = {'file': ('custom_name.txt', f, 'text/plain')}
    response = requests.post('https://httpbin.org/post', files=files)
# Upload an image
with open('photo.jpg', 'rb') as f:
    files = {'image': ('photo.jpg', f, 'image/jpeg')}
    response = requests.post('https://httpbin.org/post', files=files)
# Upload several files under the same field name
with open('img1.jpg', 'rb') as f1, open('img2.jpg', 'rb') as f2:
    files = [
        ('images', ('img1.jpg', f1, 'image/jpeg')),
        ('images', ('img2.jpg', f2, 'image/jpeg'))
    ]
    response = requests.post('https://httpbin.org/post', files=files)
Downloading files#
# Download a small file
response = requests.get('https://example.com/image.jpg')
with open('image.jpg', 'wb') as f:
    f.write(response.content)
# Download a large file (streaming)
response = requests.get('https://example.com/large_file.zip', stream=True)
with open('large_file.zip', 'wb') as f:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)
# Stream inside a with block so the connection is released afterwards
with requests.get('https://example.com/image.jpg', stream=True) as response:
    with open('image.jpg', 'wb') as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
Showing download progress#
def download_with_progress(url, filename):
    """Download a file and print the progress."""
    response = requests.get(url, stream=True, timeout=10)
    response.raise_for_status()
    # Content-Length may be missing (e.g. chunked responses); 0 means unknown
    total_size = int(response.headers.get('content-length', 0))
    downloaded = 0
    with open(filename, 'wb') as f:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)
                downloaded += len(chunk)
                if total_size:
                    progress = (downloaded / total_size) * 100
                    print(f'\rProgress: {progress:.1f}%', end='')
                else:
                    print(f'\rDownloaded: {downloaded} bytes', end='')
    print(f'\nFile saved to: {filename}')
# Usage
download_with_progress(
    'https://example.com/large_file.zip',
    'large_file.zip'
)
SSL certificate verification#
Skipping SSL certificate verification (testing only)#
# Warning: never do this in production!
response = requests.get('https://example.com', verify=False)
# Silence the InsecureRequestWarning
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
response = requests.get('https://example.com', verify=False)
Specifying a CA certificate#
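# Use a custom CA bundle
response = requests.get('https://example.com', verify='/path/to/ca-bundle.crt')
# Pass the certifi CA bundle explicitly (the bundle requests uses by default, not the operating system store)
import certifi
response = requests.get('https://example.com', verify=certifi.where())
Proxy settings#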
HTTP proxies#
# Plain proxy
proxies = {
    'http': 'http://proxy.example.com:8080',
    'https': 'http://proxy.example.com:8080'
}
response = requests.get('https://httpbin.org/ip', proxies=proxies)
# Proxy with authentication
proxies = {
    'http': 'http://user:password@proxy.example.com:8080',
    'https': 'http://user:password@proxy.example.com:8080'
}
response = requests.get('https://httpbin.org/ip', proxies=proxies)
Using a proxy with a Session#
session = requests.Session()
session.proxies = {
    'http': 'http://proxy.example.com:8080',
    'https': 'http://proxy.example.com:8080'
}
response = session.get('https://httpbin.org/ip')
Authentication#
Basic authentication#
from requests.auth import HTTPBasicAuth
response = requests.get(
    'https://api.github.com/user',
    auth=HTTPBasicAuth('username', 'password')
)
# Shorthand
response = requests.get('https://api.github.com/user', auth=('username', 'password'))
Other authentication methods#
from requests.auth import HTTPDigestAuth
# Digest authentication
response = requests.get('https://httpbin.org/digest-auth/auth/user/passwd',
                        auth=HTTPDigestAuth('user', 'passwd'))
# Bearer token authentication (for example, an OAuth 2.0 access token)
headers = {
    'Authorization': 'Bearer your_oauth_token'
}
response = requests.get('https://api.example.com/protected', headers=headers)
Practical examples#
Calling the GitHub API#
import requests
class GitHubAPI:
    """A thin wrapper around the GitHub API."""
    def __init__(self, token=None):
        self.base_url = 'https://api.github.com'
        self.headers = {
            'Accept': 'application/vnd.github.v3+json'
        }
        if token:
            self.headers['Authorization'] = f'token {token}'
    def get_user(self, username):
        """Fetch a user's profile."""
        response = requests.get(
            f'{self.base_url}/users/{username}',
            headers=self.headers,
            timeout=10
        )
        response.raise_for_status()
        return response.json()
    def get_repos(self, username, sort='updated'):
        """Fetch a user's repository list."""
        response = requests.get(
            f'{self.base_url}/users/{username}/repos',
            headers=self.headers,
            params={'sort': sort, 'per_page': 10},
            timeout=10
        )
        response.raise_for_status()
        return response.json()
    def create_issue(self, owner, repo, title, body):
        """Create an issue."""
        url = f'{self.base_url}/repos/{owner}/{repo}/issues'
        data = {'title': title, 'body': body}
        response = requests.post(url, headers=self.headers, json=data, timeout=10)
        response.raise_for_status()
        return response.json()
# Usage
github = GitHubAPI(token='your_token_here')
user_info = github.get_user('Hungerdream')
print(f"Name: {user_info['name']}")
print(f"Public repos: {user_info['public_repos']}")
repos = github.get_repos('Hungerdream')
for repo in repos[:5]:
    print(f"- {repo['name']}: ⭐ {repo['stargazers_count']}")
Downloading images in bulk#
import requests
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
def download_image(url, save_path):
    """Download a single image."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        # Derive the file name from the URL
        filename = url.split('/')[-1]
        filepath = os.path.join(save_path, filename)
        with open(filepath, 'wb') as f:
            f.write(response.content)
        return f'OK: {filename}'
    except Exception as e:
        return f'Failed: {url} - {e}'
def batch_download(urls, save_path, max_workers=5):
    """Download a batch of images concurrently."""
    os.makedirs(save_path, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(download_image, url, save_path): url
                   for url in urls}
        for future in as_completed(futures):
            print(future.result())
# Usage
image_urls = [
    'https://example.com/image1.jpg',
    'https://example.com/image2.jpg',
    'https://example.com/image3.jpg',
]
batch_download(image_urls, 'downloads/')
Best practices#
1. Manage resources with a context manager#
# Ensures the session's connections are closed properly
with requests.Session() as session:
    response = session.get('https://api.example.com/data')
    data = response.json()
2. Always set a timeout#
response = requests.get(url, timeout=10)  # always set a sensible timeout
3. Handle exceptions#
import logging
logger = logging.getLogger(__name__)
try:
    response = requests.get(url, timeout=10)
    response.raise_for_status()
except requests.RequestException as e:
    logger.error(f'Request failed: {e}')
    raise
4. Reuse a Session#
# Don't write this
for url in urls:
    response = requests.get(url)  # opens a new connection every time
# Write this instead
with requests.Session() as session:
    for url in urls:
        response = session.get(url)  # reuses the connection
5. Don't disable SSL verification#
# Dangerous!
response = requests.get(url, verify=False)
# If you must trust a self-signed certificate, point verify at that certificate or its CA bundle
response = requests.get(url, verify='/path/to/ca-bundle.crt')
Summary#
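requests is a go-to choice for handling HTTP requests in Python. Its design philosophy is to "make HTTP requests simple," and it delivers on that.
This article covered most of what the requests library offers:
- ✅ GET, POST, and the other HTTP methods
- ✅ Request headers, cookies, and session management
- ✅ Timeouts, retries, and error handling
- ✅ File upload and download
- ✅ SSL certificates and proxy settings
- ✅ Several authentication methods
- ✅ Practical examples and best practices
With these in hand, you can handle the vast majority of HTTP-related development tasks.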