Python Web Scraping: Using requests and BeautifulSoup to Scrape a Novel's Volume Titles and Chapters
2021/10/21 20:09:46
This article shows how to scrape a novel's volume titles and chapter titles in Python with requests and BeautifulSoup.
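- The example scrapes the volume titles and chapter titles of the novel 雪鹰领主 (Xue Ying Ling Zhu).
- First, view the page source. Volume titles sit in elements with class `volumn`, each inside a `<b>` tag; chapters sit in elements with class `chapter`, each containing an `<a>` tag whose `title` attribute holds the chapter title.
- Fetching the HTML content: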
import requests

# Send a browser-like User-Agent with the request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko'}
response = requests.get('https://quanxiaoshuo.com/177913/', headers=headers)
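The request above assumes the server always answers successfully. As a minimal sketch, the fetch can be wrapped in a helper that sets a timeout and fails loudly on HTTP errors. The name `fetch_html`, the `get` parameter (there only so the function can be exercised without network access), and the 10-second timeout are assumptions made for illustration, not part of the original code.

```python
import requests

HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko'}


def fetch_html(url, get=requests.get, timeout=10):
    """Return the raw bytes of the page at `url`, raising on HTTP errors.

    `get` defaults to requests.get; it is a hypothetical hook that lets a
    stand-in function be passed for offline testing.
    """
    response = get(url, headers=HEADERS, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses.
    response.raise_for_status()
    return response.content
```

It would be called as `html = fetch_html('https://quanxiaoshuo.com/177913/')`, and the returned bytes can be handed directly to BeautifulSoup.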
- Extracting the volume titles:
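from bs4 import BeautifulSoup

# Pass the raw bytes: from_encoding is ignored when the input is already a str.
# 'lxml' can be used instead of 'html.parser' if it is installed.
soup = BeautifulSoup(response.content, 'html.parser', from_encoding='utf-8')
title = []
for volumn in soup.find_all(class_="volumn"):
    b = volumn.find('b')
    if b is not None:
        b_title = b.string
        title.append({'volumn': b_title})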
- Extracting the chapter titles:
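chapters = []
for chapter in soup.find_all(class_='chapter'):
    # Each chapter element is expected to contain an <a> tag with a title attribute
    a = chapter.find('a')
    if a is None:
        continue  # skip elements without a link
    chapter_title = a.get('title')
    chapters.append({'chapter_title': chapter_title})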
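The chapter loop only stores the `title` attribute, although each `<a>` tag presumably carries a link as well. The following sketch collects both and runs offline against a small sample. `SAMPLE_HTML`, its `href` values, and the function name `extract_chapters` are hypothetical; the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure the code above expects.
SAMPLE_HTML = """
<div class="volumn"><b>Volume 1</b></div>
<div class="chapter"><a href="/177913/1.html" title="Chapter 1">Chapter 1</a></div>
<div class="chapter"><a href="/177913/2.html" title="Chapter 2">Chapter 2</a></div>
<div class="chapter">no link here</div>
"""


def extract_chapters(html):
    """Return a list of {'chapter_title', 'url'} dicts, one per linked chapter."""
    soup = BeautifulSoup(html, 'html.parser')
    chapters = []
    for chapter in soup.find_all(class_='chapter'):
        a = chapter.find('a')
        if a is None:
            continue  # skip chapter elements that contain no link
        chapters.append({'chapter_title': a.get('title'), 'url': a.get('href')})
    return chapters


chapters = extract_chapters(SAMPLE_HTML)
```

If the links on the real page are relative, `urllib.parse.urljoin` from the standard library can combine them with the page URL.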
- Saving the results as JSON:
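import json

# encoding='utf-8' and ensure_ascii=False keep the Chinese text readable in the files
with open('xylz_title.json', 'w', encoding='utf-8') as fp:
    json.dump(title, fp=fp, indent=4, ensure_ascii=False)
with open('xylz_chapters.json', 'w', encoding='utf-8') as fp:
    json.dump(chapters, fp=fp, indent=4, ensure_ascii=False)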
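Because the scraped titles are Chinese, it may help to see how `json` treats non-ASCII text. By default it writes `\uXXXX` escapes; with `ensure_ascii=False` the characters are written as-is. Both forms load back to the same data. The sample value below is made up for illustration.

```python
import json

titles = [{'volumn': '第一卷'}]  # hypothetical sample data

escaped = json.dumps(titles)                       # non-ASCII written as \uXXXX escapes
readable = json.dumps(titles, ensure_ascii=False)  # characters kept as-is

# Either representation decodes to the original Python objects.
restored_from_escaped = json.loads(escaped)
restored_from_readable = json.loads(readable)
```

The escaped form is still valid JSON; `ensure_ascii=False` mainly matters when the files are meant to be read by people.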
- The complete code:
import json

import requests
from bs4 import BeautifulSoup

# Fetch the HTML content
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko'}
response = requests.get('https://quanxiaoshuo.com/177913/', headers=headers)

# Parse the page and extract the volume titles and chapter titles.
# Pass the raw bytes: from_encoding is ignored when the input is already a str.
# 'lxml' can be used instead of 'html.parser' if it is installed.
soup = BeautifulSoup(response.content, 'html.parser', from_encoding='utf-8')

title = []
for volumn in soup.find_all(class_="volumn"):
    b = volumn.find('b')
    if b is not None:
        b_title = b.string  # volume title
        title.append({'volumn': b_title})

chapters = []
for chapter in soup.find_all(class_='chapter'):
    a = chapter.find('a')
    if a is None:
        continue  # skip elements without a link
    chapter_title = a.get('title')  # chapter title from the <a> tag
    chapters.append({'chapter_title': chapter_title})

# Store the volume titles and chapter titles as JSON
with open('xylz_title.json', 'w', encoding='utf-8') as fp:
    json.dump(title, fp=fp, indent=4, ensure_ascii=False)
with open('xylz_chapters.json', 'w', encoding='utf-8') as fp:
    json.dump(chapters, fp=fp, indent=4, ensure_ascii=False)