A Simple Python Web Scraper Example: Scraping the Maoyan Top 100
2021/11/12 17:39:57
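This article walks through a simple Python web-scraping example that collects the Maoyan Top 100 film list. It may be a useful reference for readers working on similar problems.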
import requests
from bs4 import BeautifulSoup
import openpyxl

# Request headers copied from a browser session; the Cookie value is session-specific
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
    'Cookie': '__mta=156144900.1636616654407.1636632158574.1636632889189.43; uuid_n_v=v1; uuid=FD45EE3042C211EC99BFC50282E54185B3C7DC56EE0D41BBB1F564A9125A1ACF; _csrf=2617d894c54088c5a9ea4d4ec82429d1651a7a26cc5c5b27e786c094416ad25e; Hm_lvt_703e94591e87be68cc8da0da7cbd0be2=1636616653; _lxsdk_cuid=17d0df449d8c8-05fa68edf298bf-376b4502-2a3000-17d0df449d8c8; _lxsdk=FD45EE3042C211EC99BFC50282E54185B3C7DC56EE0D41BBB1F564A9125A1ACF; __mta=156144900.1636616654407.1636616659285.1636616673893.3; Hm_lpvt_703e94591e87be68cc8da0da7cbd0be2=1636632887; _lxsdk_s=17d0e1e4301-169-cb1-4f5%7C%7C64'
}

# Build the URLs of the 10 result pages (offset = 0, 10, ..., 90)
lst_1 = []
for i in range(0, 100, 10):
    url = 'https://www.maoyan.com/board/4?requestCode=9daba57e99d92be6fe6549abd87e9f258ifwl&offset={}'.format(i)
    lst_1.append(url)

lst0 = []  # film titles
lst1 = []  # starring cast
lst2 = []  # release dates
lst3 = []  # scores

# Fetch each page once and collect the four fields from its <p> tags
for lst_2 in lst_1:
    resp = requests.get(lst_2, headers=headers)
    cateye1 = BeautifulSoup(resp.text, 'lxml')

    name = cateye1.find_all('p', class_='name')
    for name1 in name:
        lst0.append(name1.text)

    star = cateye1.find_all('p', class_='star')
    for star1 in star:
        lst1.append(star1.text.strip())

    time = cateye1.find_all('p', class_='releasetime')
    for time1 in time:
        lst2.append(time1.text)

    score = cateye1.find_all('p', class_='score')
    for score1 in score:
        lst3.append(score1.text)

# Alternative: print the records directly with zip
# for names, stars, times, scores in zip(lst0, lst1, lst2, lst3):
#     print('片名:', names, '|', stars.strip(), '|', times, '|', '评分', scores, '\n')

# Combine the fields of each film into one comma-separated string
# (commas inside the cast text are replaced with '|' so they do not split the row later)
lst_1 = []
for i in range(len(lst0)):
    str1 = (lst0[i] + ',' + lst1[i].replace(',', '|') + ',' + lst2[i] + '|' + '评分:' + lst3[i])
    lst_1.append(str1)

# Write the records to a txt file, one film per line
with open('猫眼top100.txt', 'w', encoding='utf-8') as file:
    for lst_2 in lst_1:
        file.write(lst_2 + '\n')

# Write the records to an Excel file, one film per row
wb = openpyxl.Workbook()
sheet = wb.active
for lst_2 in lst_1:
    lst_3 = lst_2.split(',')
    print(lst_3)
    sheet.append(lst_3)
wb.save('猫眼top100.xlsx')
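The script above joins each film's fields into one comma-separated string and later splits that string again for Excel. A possible alternative, sketched here with the standard library only, is to pair the four lists with zip and keep each film as a tuple. The helper name build_rows and the sample values are made up for illustration; they are not part of the original script and are not real site data.

```python
def build_rows(names, stars, times, scores):
    """Pair up four parallel lists into one (name, star, time, score) tuple per film."""
    rows = []
    # zip stops at the shortest list, so a page with a missing field
    # shortens the result instead of raising IndexError
    for name, star, release, score in zip(names, stars, times, scores):
        rows.append((name.strip(), star.strip(), release.strip(), score.strip()))
    return rows


# Invented sample values standing in for the scraped text
sample_names = ["Film A", "Film B"]
sample_stars = ["\n  Actor One,Actor Two\n", "Actor Three"]
sample_times = ["1993-01-01", "1994-09-10"]
sample_scores = ["9.5", "9.4"]

rows = build_rows(sample_names, sample_stars, sample_times, sample_scores)
for row in rows:
    print(" | ".join(row))
```

Because every row is already a sequence of fields, it could be handed to sheet.append(row) as it is, and a comma inside a cast list would stay in a single cell without the replace(',', '|') step. The trade-off is that zip silently drops trailing items when the lists differ in length, so comparing the four lengths first may be worthwhile.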