Python Crawler Example: Batch-Downloading Bing Wallpapers
2021/4/17 20:26:26
This article walks through a Python crawler example that batch-downloads Bing wallpapers. It should be a useful reference for anyone tackling a similar problem; follow along below.
Complete code
import requests
from lxml import etree
import os


def get_user_input():
    print('Which pages do you want? Enter them below like "4 6 8", separated by spaces, or use a dash for a range, like "4-7"')
    user_input = input()
    if len(user_input) == 1:
        start_end_ = user_input
        print('Page to download: ' + str(start_end_))
    else:
        if '-' in user_input:
            test = list(user_input.replace('-', ' ').split())
            start_end_ = list(range(int(test[0]), int(test[1]) + 1))
            print('Pages to download: ' + str(start_end_))
        else:
            start_end_ = [int(n) for n in user_input.split()]
            print('Pages to download: ' + str(start_end_))
    return start_end_


def get_page_urls(start_end_):
    all_page_urls = []
    for num in start_end_:
        all_page_urls.append('https://bing.ioliu.cn/?p={}'.format(str(num)))
    print('These are the pages you will download from:')
    print(all_page_urls)
    return all_page_urls


if __name__ == '__main__':
    header = {'User-Agent': 'w'}
    page_number = 0
    start_end = get_user_input()
    for page_url in get_page_urls(start_end):
        img_number = 1
        res = requests.get(page_url, headers=header).text
        html = etree.HTML(res)
        img_url = html.xpath('//img/@src')
        if not os.path.exists('D:/Downloads/bing_wallpaper'):
            os.mkdir('D:/Downloads/bing_wallpaper')
        print('Downloading images from page {}'.format(start_end[page_number]))
        for img_list in img_url:
            img_list = img_list.replace('640x480', '1920x1080')
            img = requests.get(img_list, headers=header).content
            html_text = html.xpath("/html/body/div[3]/div[" + str(img_number) + "]/div/div[1]/h3/text()")[0]
            html_text_format = str(html_text).replace(',', '_').replace('/', '_')
            img_name = (str(page_number * 12 + img_number) + '_' + str(html_text_format) + '.jpg')
            with open('D:\\Downloads\\bing_wallpaper\\' + img_name, 'wb') as save_img:
                # write the image data
                save_img.write(img)
            img_number += 1
        page_number += 1
Key points
Import the packages
import requests
from lxml import etree
import os
Define a function that reads the user's input (purely for convenience); start_end_ records the page numbers to crawl
def get_user_input():
    print('Which pages do you want? Enter them below like "4 6 8", separated by spaces, or use a dash for a range, like "4-7"')
    user_input = input()
    if len(user_input) == 1:
        start_end_ = user_input
        print('Page to download: ' + str(start_end_))
    else:
        if '-' in user_input:
            test = list(user_input.replace('-', ' ').split())
            start_end_ = list(range(int(test[0]), int(test[1]) + 1))
            print('Pages to download: ' + str(start_end_))
        else:
            start_end_ = [int(n) for n in user_input.split()]
            print('Pages to download: ' + str(start_end_))
    return start_end_
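The input-parsing branch above can be factored into a pure helper so it is testable without calling input(). A minimal sketch — the name parse_pages is my own, and the single-character branch is normalized to a list of ints:

```python
def parse_pages(user_input):
    # "4-7" -> [4, 5, 6, 7]; "4 6 8" -> [4, 6, 8]; "4" -> [4]
    if '-' in user_input:
        start, end = user_input.split('-')
        return list(range(int(start), int(end) + 1))
    return [int(n) for n in user_input.split()]
```

Returning a list of ints in every branch also avoids the quirk in the original, where a single-character input leaves start_end_ as a string.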
Define a function that builds the URL of every page to crawl (this keeps things readable)
def get_page_urls(start_end_):
    all_page_urls = []
    for num in start_end_:
        all_page_urls.append('https://bing.ioliu.cn/?p={}'.format(str(num)))
    print('These are the pages you will download from:')
    print(all_page_urls)
    return all_page_urls
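The URL construction can be checked in isolation; in this sketch the print calls are dropped so the result is easy to assert:

```python
def get_page_urls(start_end_):
    # Build one bing.ioliu.cn listing URL per requested page number
    all_page_urls = []
    for num in start_end_:
        all_page_urls.append('https://bing.ioliu.cn/?p={}'.format(num))
    return all_page_urls

urls = get_page_urls([4, 5])
```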
Main function
if __name__ == '__main__':
Give it a header
    header = {'User-Agent': 'w'}
page_number tracks how many pages have been downloaded; it is used below together with start_end[page_number]
    page_number = 0
    start_end = get_user_input()
Start parsing the pages here
    for page_url in get_page_urls(start_end):
        img_number = 1
        res = requests.get(page_url, headers=header).text
        html = etree.HTML(res)
        img_url = html.xpath('//img/@src')
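You can see what //img/@src returns by running the same XPath on a static snippet (the sample HTML below is made up, not real bing.ioliu.cn markup):

```python
from lxml import etree

sample = '<html><body><img src="https://h2.bing.com/pic_640x480.jpg"/></body></html>'
tree = etree.HTML(sample)
srcs = tree.xpath('//img/@src')  # list of src attribute values
hd = [s.replace('640x480', '1920x1080') for s in srcs]  # same resolution bump as the crawler
```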
Create the download folder
        if not os.path.exists('D:/Downloads/bing_wallpaper'):
            os.mkdir('D:/Downloads/bing_wallpaper')
        print('Downloading images from page {}'.format(start_end[page_number]))
Loop over the image URLs and adjust each image's resolution
        for img_list in img_url:
            img_list = img_list.replace('640x480', '1920x1080')
Get the content
            img = requests.get(img_list, headers=header).content
Pick any naming rule you like for the images; mine produces names like "1_New River Gorge Bridge in New River Gorge National Park_West Virginia (© Entropy Workshop_iStock_Getty Images Plus)"
            html_text = html.xpath("/html/body/div[3]/div[" + str(img_number) + "]/div/div[1]/h3/text()")[0]
            html_text_format = str(html_text).replace(',', '_').replace('/', '_')
            img_name = (str(page_number * 12 + img_number) + '_' + str(html_text_format) + '.jpg')
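Note that the replace() chain only strips ',' and '/', but Windows filenames also reject characters such as ':' and '?', which can appear in wallpaper titles. A broader sanitizer is a one-liner — this is a sketch, and the exact character set is my own choice:

```python
import re

def sanitize_filename(title):
    # Replace every character Windows forbids in filenames, plus commas, with '_'
    return re.sub(r'[<>:"/\\|?*,]', '_', title)
```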
Save the image into the folder
            with open('D:\\Downloads\\bing_wallpaper\\' + img_name, 'wb') as save_img:
                save_img.write(img)
            img_number += 1
Remember to increment page_number here
        page_number += 1
That wraps up this Python crawler example for batch-downloading Bing wallpapers. We hope the article is helpful, and please keep supporting 为之网!