Reading PDFs in Python and exporting them to TXT or HTML
2022/9/9 1:23:14
Article link: https://www.cnblogs.com/tujia/p/16670374.html
1. Install pdfminer.six
pip install pdfminer.six
2. Reading a PDF in code
from io import StringIO

from pdfminer.high_level import extract_text_to_fp
from pdfminer.layout import LAParams

output_string = StringIO()
with open('test.pdf', 'rb') as fin:
    # Export as TXT:
    # extract_text_to_fp(fin, output_string)

    # Export as HTML (codec=None because output_string holds str, not bytes)
    extract_text_to_fp(fin, output_string, laparams=LAParams(),
                       output_type='html', codec=None)

with open('test.html', 'w', encoding='utf-8') as f:
    f.write(output_string.getvalue().strip())
Official documentation:
https://pdfminersix.readthedocs.io/en/latest/tutorial/highlevel.html
https://pdfminersix.readthedocs.io/en/latest/reference/highlevel.html
3. Reading a PDF with the command-line scripts
https://pdfminersix.readthedocs.io/en/latest/tutorial/commandline.html
https://pdfminersix.readthedocs.io/en/latest/reference/commandline.html
Details: omitted (see the command-line documentation linked above).
The end.