Published: 2025-03-10 09:30:26 Editor: 123
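asyncio runs asynchronous tasks on a single-threaded event loop. While one task waits on I/O, the loop switches to another, which makes it especially efficient for handling large numbers of concurrent network connections. asyncio code is written with Python's async/await syntax (language keywords since Python 3.5, not something asyncio itself defines). This style is more intuitive and easier to maintain than the older callback style.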
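The following minimal sketch illustrates the single-threaded event loop using only the standard library. The coroutine names and delays are illustrative. Three coroutines start one after another on the same thread, then wait concurrently, so the total runtime should be close to the longest delay rather than the sum of all three.

```python
import asyncio
import threading
import time

events = []         # order in which the coroutines start and finish
thread_ids = set()  # identities of the threads that ran coroutine code


async def worker(name, delay):
    thread_ids.add(threading.get_ident())
    events.append(f"start {name}")
    # await yields control to the event loop, which runs other coroutines meanwhile
    await asyncio.sleep(delay)
    events.append(f"end {name}")
    return name


async def main():
    # gather wraps each coroutine in a task on the same loop and waits for all of them
    return await asyncio.gather(worker("a", 0.3), worker("b", 0.2), worker("c", 0.1))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(results)            # ['a', 'b', 'c']  (gather preserves argument order)
print(events[:3])         # ['start a', 'start b', 'start c']
print(len(thread_ids))    # 1
print(f"{elapsed:.2f}s")  # expected near 0.3 s (the longest delay), well under the 0.6 s sum
```

If `await asyncio.sleep(delay)` were replaced with a blocking `time.sleep(delay)`, the coroutines would run one after another, because nothing would hand control back to the loop.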
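asyncio provides the core framework and tools for asynchronous programming in Python. As requirements grew, however, developers needed more functionality to build asynchronous services. This led to asynchronous networking libraries such as aiohttp, which is built on asyncio and provides a simple asynchronous HTTP client and server.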
Target site: https://spa5.scrape.center/
Example code:
```python
import asyncio
import json

import aiohttp

# Target site: https://spa5.scrape.center/
index_url = 'https://spa5.scrape.center/api/book/?limit=18&offset={offset}'  # list page API
detail_url = 'https://spa5.scrape.center/api/book/{id}'  # book detail API
page_size = 18  # items per list page
concurrency = 10  # maximum number of concurrent requests
page_number = 20  # number of list pages to fetch (the site actually has 503)

headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/133.0.0.0 Safari/537.36"}

# Both are created inside main() so they belong to the running event loop
semaphore = None
session = None


async def scrape_api(url):
    """Fetch a URL and return the decoded JSON, or None on a client error."""
    async with semaphore:  # at most `concurrency` requests in flight
        try:
            async with session.get(url, headers=headers) as response:
                await asyncio.sleep(0.5)  # throttle: each slot is held for at least 0.5 s
                return await response.json()
        except aiohttp.ClientError:
            print('error', url)
            return None


async def scrape_index(page):
    """
    Fetch one list page.
    :param page: 1-based page number
    :return: JSON of the list page
    """
    url = index_url.format(offset=page_size * (page - 1))
    return await scrape_api(url)


async def scrape_detail(id):
    """
    Fetch the detail page of one book.
    :param id: book id
    :return: JSON of the detail page
    """
    url = detail_url.format(id=id)
    return await scrape_api(url)


async def main():
    global session, semaphore
    semaphore = asyncio.Semaphore(concurrency)
    async with aiohttp.ClientSession() as session:  # closed automatically, even on errors
        # Stage 1: fetch all list pages concurrently
        scrape_index_tasks = [asyncio.ensure_future(scrape_index(page)) for page in range(1, page_number + 1)]
        results = await asyncio.gather(*scrape_index_tasks)

        # Collect book ids, skipping list pages that failed
        ids = []
        for index_data in results:
            if not index_data:
                continue
            for item in index_data.get('results', []):
                ids.append(item.get('id'))

        # Stage 2: fetch all detail pages concurrently
        scrape_detail_tasks = [asyncio.ensure_future(scrape_detail(i)) for i in ids]
        ids_results = await asyncio.gather(*scrape_detail_tasks)

    # Append this run's results to data.json as one JSON line
    with open('data.json', 'a', encoding='utf-8') as file:
        file.write(json.dumps(ids_results, ensure_ascii=False))
        file.write("\n")


if __name__ == '__main__':
    asyncio.run(main())
```
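The scraper above limits concurrency with `asyncio.Semaphore` and runs two `gather` stages: list pages first, then detail pages. The sketch below reproduces that structure offline so the cap can be observed without contacting https://spa5.scrape.center/. It relies on some assumptions. `fake_api()` is a hypothetical stand-in for `scrape_api()` that sleeps instead of making HTTP requests. The ids and book names it returns are made up. `CONCURRENCY` is set to 3 here (instead of 10) so the limit is easy to see.

```python
import asyncio

PAGE_SIZE = 18    # mirrors page_size in the scraper
CONCURRENCY = 3   # smaller than the article's 10 so the cap is easy to see
PAGE_NUMBER = 4   # hypothetical: only 4 list pages

in_flight = 0      # simulated requests currently in progress
max_in_flight = 0  # highest value in_flight ever reached


async def fake_api(semaphore, kind, value):
    """Hypothetical stand-in for scrape_api(): sleeps instead of doing HTTP."""
    global in_flight, max_in_flight
    async with semaphore:
        in_flight += 1
        max_in_flight = max(max_in_flight, in_flight)
        await asyncio.sleep(0.01)  # simulated network latency
        in_flight -= 1
        if kind == "index":
            # made-up data: the list page at `offset` holds ids offset+1 .. offset+PAGE_SIZE
            return {"results": [{"id": value + k + 1} for k in range(PAGE_SIZE)]}
        return {"id": value, "name": f"book {value}"}  # made-up detail record


async def main():
    semaphore = asyncio.Semaphore(CONCURRENCY)  # created inside the running loop
    # Stage 1: list pages, offset = page_size * (page - 1) as in scrape_index()
    offsets = [PAGE_SIZE * (page - 1) for page in range(1, PAGE_NUMBER + 1)]
    pages = await asyncio.gather(*(fake_api(semaphore, "index", o) for o in offsets))
    ids = [item["id"] for page in pages for item in page["results"]]
    # Stage 2: detail pages for every collected id
    return await asyncio.gather(*(fake_api(semaphore, "detail", i) for i in ids))


details = asyncio.run(main())
print(len(details), max_in_flight)  # 72 3
```

The 4 list pages produce 4 × 18 = 72 ids. Even though up to 72 tasks are scheduled at once, the semaphore never allows more than 3 simulated requests to run simultaneously.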