Python Beginner Practice: 1,500 Page Views in 10 Minutes
2021/4/7 20:15:57
Without further ado, here is the code:
# author : sunzd
# date : 2019/9/01
# position : beijing
from fake_useragent import UserAgent
from bs4 import BeautifulSoup
from urllib import request
from urllib import error
import re

# Create the User-Agent generator once instead of on every request
ua = UserAgent()


def html_request(url):
    if url is None:
        return None
    print("download html is: {0}".format(url))
    # If the URL contains Chinese characters it must be percent-encoded first (not handled here)
    # Imitate a browser: the header name is "User-Agent", with a random browser string as its value
    headers = {'User-Agent': ua.random}
    req = request.Request(url, headers=headers)
    try:
        html = request.urlopen(req).read().decode('utf-8')
    except error.URLError as e:
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
        return None
    return html


def html_parser(url, html):
    """Parse one article-list page, fetch every article on it, and return True on the last page."""
    if url is None or html is None:
        return None
    # pattern = '<main>(.+?)</main>'  # <main> is followed directly by '\n', so the re.S flag is needed to let '.' match any character
    # articles = re.compile(pattern, re.S).findall(html)
    # articles = articles[0]
    pattern_art = '<div class="article-item-box csdn-tracking-statistics" data(.+?)</div>'
    articles = re.compile(pattern_art, re.S).findall(html.replace('\n', ''))
    print(len(articles))
    for article in articles:
        soup = BeautifulSoup(article, 'html.parser')
        title = soup.find('a', attrs={'target': '_blank'})
        if title is None:
            continue
        # The <span> in the link holds the type label: "原" (original) or "转" (repost)
        art_type = title.span.text if title.span else ''
        art_title = title.text.replace(' ', '').replace("原", "").replace("转", "")
        print("Title: {0}\nType: {1}".format(art_title, art_type))
        print("Link: {0}".format(title.attrs['href']))
        # Fetching each article page is what raises its view count
        html_request(title.attrs['href'])
        infors = soup.find('div', attrs={'class': 'info-box d-flex align-content-center'})
        # for infor in infors.p.next_siblings:  # next_siblings excludes the node itself, so the first <p> would be lost
        # for infor in infors.children:
        #     if infor == ' ':  # whitespace text nodes also count as children, so skip them
        #         continue
        #     if infor.span:  # only the <span> nodes carry the info we want
        #         print("{0}".format(infor.span.text))
    # On the last list page the "next" button carries the ui-pager-disabled class
    pattern_next = '<li class="js-page-next js-page-action ui-pager ui-pager-disabled">'
    next_btn = re.compile(pattern_next).findall(html)
    print("Last page: {0}----{1}".format(len(next_btn), next_btn))
    return len(next_btn) > 0


if __name__ == '__main__':
    name = 'your-csdn-username'  # replace with your own CSDN user name
    page = 1
    url = "https://blog.csdn.net/" + name + "/article/list/" + str(page) + '?'
    # Cycle through the list pages forever (stop with Ctrl+C)
    while True:
        html = html_request(url)
        is_last = html_parser(url, html)
        page += 1
        # Start over after the last list page, or after page 6 at the latest
        if is_last or page > 6:
            page = 1
        url = "https://blog.csdn.net/" + name + "/article/list/" + str(page) + '?'
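The parsing half of the script can be tried offline. The sketch below runs the same two regular expressions (the article-box pattern and the disabled "next" button) against a small hand-written HTML string instead of a live CSDN page. The sample markup in `SAMPLE_HTML` is invented and only mimics the class names the script looks for; real CSDN markup may differ and has likely changed since 2019. A simple `href` regex stands in for the BeautifulSoup lookup so the sketch needs only the standard library, and `list_page_url` is a hypothetical helper showing the percent-encoding (`urllib.parse.quote`) that the script's comment mentions but does not perform.

```python
import re
from urllib.parse import quote

# Hypothetical sample markup that only mimics the class names the script searches for
SAMPLE_HTML = """
<div class="article-item-box csdn-tracking-statistics" data-articleid="1">
  <h4><a href="https://blog.csdn.net/demo/article/details/1" target="_blank">
    <span class="article-type">原</span> First post</a></h4>
</div>
<div class="article-item-box csdn-tracking-statistics" data-articleid="2">
  <h4><a href="https://blog.csdn.net/demo/article/details/2" target="_blank">
    <span class="article-type">转</span> Second post</a></h4>
</div>
<li class="js-page-next js-page-action ui-pager ui-pager-disabled">next</li>
"""

# Same patterns as in the script
PATTERN_ART = '<div class="article-item-box csdn-tracking-statistics" data(.+?)</div>'
PATTERN_NEXT = '<li class="js-page-next js-page-action ui-pager ui-pager-disabled">'
# Stand-in for the BeautifulSoup step: grab the first href inside each captured box
PATTERN_HREF = 'href="(.+?)"'


def extract_links(html):
    """Return the article links found on one list page."""
    # As in the script: drop newlines, then match lazily so each box is captured separately
    boxes = re.findall(PATTERN_ART, html.replace('\n', ''), re.S)
    links = []
    for box in boxes:
        m = re.search(PATTERN_HREF, box)
        if m:
            links.append(m.group(1))
    return links


def is_last_page(html):
    """True if the disabled 'next' button is present, i.e. this is the last list page."""
    return len(re.findall(PATTERN_NEXT, html)) > 0


def list_page_url(name, page):
    """Hypothetical helper: build a list-page URL, percent-encoding a non-ASCII user name."""
    return "https://blog.csdn.net/" + quote(name) + "/article/list/" + str(page) + "?"


if __name__ == '__main__':
    print(extract_links(SAMPLE_HTML))
    print(is_last_page(SAMPLE_HTML))
    print(list_page_url("张三", 2))
```

Because the lazy `(.+?)` stops at the first `</div>`, each article box is captured on its own; the same choice means a box containing nested `<div>` elements would be cut short, which is one reason the script hands each fragment to BeautifulSoup rather than parsing it further with regular expressions.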