Scraping 站长素材 (chinaz.com)
2021/8/21 23:06:53
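This article shows how to scrape downloadable templates from 站长素材 with requests and lxml. It may serve as a reference for readers working on similar scraping problems.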
import os

import requests
from lxml import etree

if __name__ == "__main__":
    url = "https://aspx.sc.chinaz.com/query.aspx"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3868.400 QQBrowser/10.8.4394.400"
    }
    # Create the output directory on the first run.
    if not os.path.exists('./zhanzhangsucai1'):
        os.mkdir('./zhanzhangsucai1')
    # Walk search-result pages 11 through 25.
    for page in range(11, 26):
        page = str(page)
        param = {
            "keyword": "免费",  # search keyword ("free")
            "issale": "",
            "classID": "864",
            "page": page
        }
        page_text = requests.get(url=url, params=param, headers=headers).text
        tree = etree.HTML(page_text)
        # Each search result on the listing page is one of these divs.
        div_list = tree.xpath('//div[@class="box col3 ws_block"]')
        print('Downloading page ' + page)
        for div in div_list:
            # The listing href has no scheme, so prepend one.
            detail_url = 'https:' + div.xpath('./a/@href')[0]
            detail_page_text = requests.get(url=detail_url, headers=headers).text
            # The detail page comes back decoded as ISO-8859-1; re-decode it as UTF-8.
            detail_page_text = detail_page_text.encode('iso-8859-1').decode('utf-8')
            detail_tree = etree.HTML(detail_page_text)
            # Use the first entry in the list of download links.
            href_li = detail_tree.xpath('//div[@class="clearfix mt20 downlist"]/ul/li')[0]
            ppt_url = href_li.xpath('./a/@href')[0]
            # Name the file after the thumbnail's alt text.
            ppt_Name = div.xpath('./a/img/@alt')[0] + '.rar'
            file_data = requests.get(url=ppt_url, headers=headers).content
            ppt_path = 'zhanzhangsucai1/' + ppt_Name
            with open(ppt_path, 'wb') as fp:
                fp.write(file_data)
            print(ppt_Name, 'downloaded successfully')
        print('Finished page ' + page)
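Two steps in the script are easy to get wrong, and they can be tried without touching the network. The first is the `.encode('iso-8859-1').decode('utf-8')` round trip, which repairs text that was decoded as ISO-8859-1 although the bytes were UTF-8. requests falls back to ISO-8859-1 for text responses whose headers declare no charset, which is presumably what happens on the detail pages. The second is building a file name from the `alt` text, which may contain characters that are not allowed in file names. The sketch below is a minimal illustration using only the standard library. The helper names `fix_mojibake` and `safe_filename` are hypothetical and do not appear in the original script.

```python
import re


def fix_mojibake(text: str) -> str:
    """Undo a wrong ISO-8859-1 decode of bytes that were really UTF-8."""
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not mis-decoded in this way; leave it unchanged.
        return text


def safe_filename(title: str, extension: str = ".rar") -> str:
    """Replace characters that are invalid in Windows or POSIX file names."""
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', "_", title).strip(" .")
    # Fall back to a placeholder if nothing usable is left.
    return (cleaned or "untitled") + extension


# Simulate what a wrong ISO-8859-1 decode does to UTF-8 bytes.
garbled = "免费模板".encode("utf-8").decode("iso-8859-1")
print(fix_mojibake(garbled))
print(safe_filename("简约/商务: PPT模板?"))
```

Unlike the bare round trip in the script, `fix_mojibake` returns its input unchanged when the text was already decoded correctly, so it will not raise on pages that do declare their charset. In the script, `safe_filename` could be applied to the `alt` value before `ppt_path` is built.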