Reading a PDF with Python and exporting it to TXT or HTML
2022/9/9 1:23:14
Article link: https://www.cnblogs.com/tujia/p/16670374.html
1. Install pdfminer.six
pip install pdfminer.six
2. Read the PDF from Python code
from io import StringIO
from pdfminer.layout import LAParams
from pdfminer.high_level import extract_text_to_fp

# Collect the converted output in memory before writing it to disk.
output_string = StringIO()
with open('test.pdf', 'rb') as fin:
    # Export as plain text:
    # extract_text_to_fp(fin, output_string)
    # Export as HTML (codec=None because output_string is a text stream):
    extract_text_to_fp(fin, output_string, laparams=LAParams(), output_type='html', codec=None)

with open('test.html', 'w', encoding='utf-8') as f:
    f.write(output_string.getvalue().strip())
Official documentation:
https://pdfminersix.readthedocs.io/en/latest/tutorial/highlevel.html
https://pdfminersix.readthedocs.io/en/latest/reference/highlevel.html
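The snippet in this section hard-codes the HTML export and leaves the TXT export commented out. As a minimal sketch, the two variants could be wrapped in one function that picks the output type from the target file's extension. The names output_type_for and convert_pdf are hypothetical helpers introduced here for illustration; only extract_text_to_fp and LAParams come from pdfminer.six.

```python
from io import StringIO
from pathlib import Path


def output_type_for(out_path):
    """Hypothetical helper: choose 'html' or 'text' from the file extension."""
    suffix = Path(out_path).suffix.lower()
    return "html" if suffix in {".html", ".htm"} else "text"


def convert_pdf(pdf_path, out_path):
    """Hypothetical helper: convert pdf_path to TXT or HTML and write out_path."""
    # Imported lazily so output_type_for works without pdfminer.six installed.
    from pdfminer.high_level import extract_text_to_fp
    from pdfminer.layout import LAParams

    output_type = output_type_for(out_path)
    buffer = StringIO()
    with open(pdf_path, "rb") as fin:
        if output_type == "html":
            # Same call as in the article: codec=None for a text stream.
            extract_text_to_fp(fin, buffer, laparams=LAParams(),
                               output_type="html", codec=None)
        else:
            # Plain-text export; output_type defaults to 'text'.
            extract_text_to_fp(fin, buffer, laparams=LAParams())

    Path(out_path).write_text(buffer.getvalue().strip(), encoding="utf-8")
    return output_type


# Example calls (assume test.pdf exists in the working directory):
# convert_pdf("test.pdf", "test.txt")
# convert_pdf("test.pdf", "test.html")
```

Keeping the extension check in its own function makes the choice of output type easy to verify without a PDF at hand.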
3. Read the PDF with the command-line script
https://pdfminersix.readthedocs.io/en/latest/tutorial/commandline.html
https://pdfminersix.readthedocs.io/en/latest/reference/commandline.html
Details are omitted here; see the links above.
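Since the article leaves the command-line usage to the linked documentation, here is a rough sketch of how the pdf2txt.py script that ships with pdfminer.six might be driven from Python. It assumes the script is on PATH and accepts -t (output type) and -o (output file) as described in the command-line reference; the function names build_pdf2txt_command and run_pdf2txt are hypothetical. On Windows the script may need to be launched through the Python interpreter instead.

```python
import subprocess


def build_pdf2txt_command(pdf_path, out_path, output_type="text"):
    """Hypothetical helper: build the argument list for pdf2txt.py."""
    # Assumed flags: -t selects the output type, -o names the output file.
    return ["pdf2txt.py", "-t", output_type, "-o", out_path, pdf_path]


def run_pdf2txt(pdf_path, out_path, output_type="text"):
    """Hypothetical helper: run pdf2txt.py and raise if it exits with an error."""
    subprocess.run(build_pdf2txt_command(pdf_path, out_path, output_type),
                   check=True)


# Example call (assumes test.pdf exists and pdf2txt.py is on PATH):
# run_pdf2txt("test.pdf", "test.html", output_type="html")
```

The equivalent shell invocation is the same list joined by spaces, so the helper can also be used just to print the command before running it by hand.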
End.