Scraping Rental Listings from Fang.com (房天下)
2021/6/4 18:23:35
Code for scraping Fang.com rental listings
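import re

import requests
from lxml import etree

# XPath expressions for each listing's link and title
url_xpath = '//dd/p[1]/a[1]/@href'
title_xpath = '//dd/p[1]/a[1]/@title'
data_xpath = '//dd/p[2]/text()'  # defined but not used below; the details are pulled with regexes instead

headers = {
    'Referer': 'https://sh.zu.fang.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.90 Safari/537.36',
}

rp = requests.get('https://sh.zu.fang.com/', headers=headers)
rp.encoding = rp.apparent_encoding
html = etree.HTML(rp.text)

url = html.xpath(url_xpath)
title = html.xpath(title_xpath)
# One detail paragraph per listing, with fields separated by "splitline" spans
data = re.findall(r'<p class="font15 mt12 bold">(.*?)</p>', rp.text, re.S)

mold_lis = []        # first field of the detail paragraph
house_type_lis = []  # second field (layout)
area_lis = []        # third field (floor area)

for a in data:
    # After decoding, the area unit arrives as U+FFFD followed by "O"
    # (probably a mis-decoded "㎡"); replace it with 平方米 (square meters).
    a = a.replace('\ufffdO', '平方米')
    mold = re.findall(r'\r\n\s.*?(\S.*?)<span class="splitline">', a)
    house_type_area = re.findall(r'</span>(.*?)<span class="splitline">', a)
    try:
        row = (mold[0], house_type_area[0], house_type_area[1])
    except IndexError:
        # A field is missing: store blanks so the three lists stay
        # aligned with title and url when zipped.
        row = ('', '', '')
    mold_lis.append(row[0])
    house_type_lis.append(row[1])
    area_lis.append(row[2])

data_zip = zip(title, url, mold_lis, house_type_lis, area_lis)

# Append one tuple per listing to info.txt
with open('info.txt', 'a', encoding='utf8') as fa:
    for a in data_zip:
        fa.write(str(a))
        fa.write('\n')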
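The field extraction in the loop above can be tried offline before hitting the site. The following is a minimal sketch, not the author's code: `SAMPLE` is an invented fragment that only imitates the detail paragraph the regexes expect (the real markup may differ), and `parse_detail` is a hypothetical helper name. It returns `None` for a paragraph with missing fields, so the caller decides how to keep rows aligned.

```python
import re

# Hypothetical fragment imitating one listing's detail paragraph;
# the markup actually served by the site may differ.
SAMPLE = (
    '<p class="font15 mt12 bold">\r\n'
    '                整租<span class="splitline">|</span>'
    '2室1厅<span class="splitline">|</span>'
    '75\ufffdO<span class="splitline">|</span>朝南\r\n'
    '            </p>'
)

DETAIL_RE = re.compile(r'<p class="font15 mt12 bold">(.*?)</p>', re.S)
MOLD_RE = re.compile(r'\r\n\s.*?(\S.*?)<span class="splitline">')
FIELD_RE = re.compile(r'</span>(.*?)<span class="splitline">')


def parse_detail(paragraph):
    """Return (mold, house_type, area), or None if any field is missing."""
    # Normalise the mis-decoded area unit, as the scraping code does.
    paragraph = paragraph.replace('\ufffdO', '平方米')
    mold = MOLD_RE.findall(paragraph)
    fields = FIELD_RE.findall(paragraph)
    if not mold or len(fields) < 2:
        return None
    return mold[0], fields[0], fields[1]


# Parse every detail paragraph found in the sample.
rows = [parse_detail(p) for p in DETAIL_RE.findall(SAMPLE)]
print(rows)
```

Handling one listing at a time makes it easier to see which paragraph failed to parse than a bare `except` would.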
To be continued.
The next step is to scrape the listings district by district.
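# District name -> URL path segment.
# Hongkou through Changning repeat Huangpu's 'house-a024'; these look like
# unfilled placeholders and should be checked against the site.
area_dict = {
    '不限': 'house',            # no restriction (all districts)
    '浦东': 'house-a025',       # Pudong
    '嘉定': 'house-a029',       # Jiading
    '宝山': 'house-a030',       # Baoshan
    '闵行': 'house-a018',       # Minhang
    '松江': 'house-a0586',      # Songjiang
    '普陀': 'house-a028',       # Putuo
    '静安': 'house-a021',       # Jing'an
    '黄浦': 'house-a024',       # Huangpu
    '虹口': 'house-a024',       # Hongkou
    '青浦': 'house-a024',       # Qingpu
    '奉贤': 'house-a024',       # Fengxian
    '金山': 'house-a024',       # Jinshan
    '杨浦': 'house-a024',       # Yangpu
    '徐汇': 'house-a024',       # Xuhui
    '长宁': 'house-a024',       # Changning
    '崇明': 'house-a0996',      # Chongming
    '上海周边': 'house-a01046',  # areas surrounding Shanghai
}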
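One way the district table above might be used is to turn each path segment into a listing URL and then run the same scraping code against it. This is only a sketch under one assumption: that a district page lives at the base URL followed by the segment and a trailing slash. That layout is a guess to verify against the live site. `district_slugs` and `district_url` are hypothetical names, and only a few entries with distinct codes are copied here.

```python
from urllib.parse import urljoin

BASE_URL = 'https://sh.zu.fang.com/'

# A few entries copied from the district table above (distinct codes only).
district_slugs = {
    '浦东': 'house-a025',
    '嘉定': 'house-a029',
    '崇明': 'house-a0996',
}


def district_url(slug, base=BASE_URL):
    """Build a district page URL, assuming the layout <base><slug>/."""
    return urljoin(base, slug + '/')


# Map each district name to the URL that would be requested.
district_urls = {name: district_url(slug) for name, slug in district_slugs.items()}

for name, link in district_urls.items():
    print(name, link)
```

Each resulting URL could be passed to `requests.get` with the same headers as the first request.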