python爬虫爬取免费简历模板实例
2021/7/20 17:40:05
本文主要是介绍python爬虫爬取免费简历模板实例,对大家解决编程问题具有一定的参考价值,需要的程序猿们随着小编来一起学习吧!
爬取目标网站https://sc.chinaz.com/jianli/free.html
思路
思路捋清,直接上代码
# -*- codeing = utf-8 -*- # @Time : 2021/7/20 10:13 # @Author : ArthurHuang # @File : 10_xpath解析案例_站长素材中免费简历模板爬取.py # @Software : PyCharm import requests from lxml import html etree = html.etree #新版本etree现在需要这样导入 import os if __name__ == "__main__": url = 'http://sc.chinaz.com/jianli/free_%d.html' for page in range(1, 6): # 循环取前5页,每页20张简历 # UA伪装:将对应的User-Agent封装到一个字典中 headers = { "User-Agent": "Mozilla / 5.0(Windows NT 10.0;Win64;x64) AppleWebKit / 537.36(KHTML, like Gecko) Chrome / 91.0.4472.77 Safari / 537.36" } if page == 1: # 第一页与其余几页的url不同,需要分开写 new_url = 'http://sc.chinaz.com/jianli/free.html' else: new_url = format(url % page) page_text = requests.get(url=new_url, headers=headers).text # 实例化etree对象 tree = etree.HTML(page_text) # 创建一个文件夹保存图片 if not os.path.exists('./jianliLibs'): os.mkdir('./jianliLibs') a_list = tree.xpath('//div[@id="container"]/div/a') for a in a_list: # 获取简历名称列表 all_titles = a.xpath('./img/@alt')[0]+'.zip' all_titles = all_titles.encode('iso-8859-1').decode('utf-8') # 通用处理中文乱码的解决方案 #print(all_titles) # 获取每个简历对应的单独网页地址 all_href = 'https:'+a.xpath('./@href')[0] response = requests.get(url=all_href, headers=headers) resume_data = response.text resumetree = etree.HTML(resume_data) resume_download_list = resumetree.xpath('//div[@id="down"]/div[2]/ul/li[1]') # 每个简历对应的点击下载的地址 for download in resume_download_list: all_downloads = download.xpath('./a/@href')[0] resume_rar_page = requests.get(url=all_downloads, headers=headers).content # 向点击下载的url发送请求,把简历下载到本地 resume_path = 'jianliLibs/' + all_titles with open(resume_path, 'wb')as fp: fp.write(resume_rar_page ) print(all_titles, "下载成功!!!")
成功获取
这篇关于python爬虫爬取免费简历模板实例的文章就介绍到这儿,希望我们推荐的文章对大家有所帮助,也希望大家多多支持为之网!
- 2024-05-08有遇到过吗?同样的规则 Excel 中 比Python 结果大
- 2024-03-30开始python成长之路
- 2024-03-29python optparse
- 2024-03-29python map 函数
- 2024-03-20invalid format specifier python
- 2024-03-18pool.map python
- 2024-03-18threads in python
- 2024-03-14python Ai 应用开发基础训练,字符串,字典,文件
- 2024-03-13id3 algorithm python
- 2024-03-13sum array elements python