Generating Reports with the pandas_profiling Library in Python
2022/1/27 17:05:15
- Installing pandas_profiling
Command-line installation
pip install pandas_profiling
To pin a specific version:
pip install pandas_profiling==2.10.1
Installation via the Tsinghua mirror
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple pandas_profiling
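To confirm which version pip actually installed (for example, after pinning 2.10.1), a short check with the standard library's importlib.metadata (Python 3.8+) can help. This is a minimal sketch: the helper name installed_version is hypothetical, and it assumes the package is registered under the distribution name pandas_profiling.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Hypothetical usage: report the pandas_profiling version in this environment
    v = installed_version("pandas_profiling")
    print(v if v is not None else "pandas_profiling is not installed")
```

If the printed version differs from the one you pinned, pip most likely installed into a different Python environment than the one you are running.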
Handling installation errors: uninstall pandas_profiling
pip uninstall pandas_profiling
Error:
ERROR: Cannot uninstall 'PyYAML'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
Solution: uninstall, then reinstall.
Example of installing a package from a mirror:
pip install -i https://pypi.douban.com/simple scrapy
Commonly used PyPI mirrors:
Douban (fairly stable and fast): https://pypi.douban.com/simple
Tsinghua University: https://pypi.tuna.tsinghua.edu.cn/simple
University of Science and Technology of China: https://mirrors.ustc.edu.cn/pypi/web/simple
Alibaba Cloud: https://mirrors.aliyun.com/pypi/simple/
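The mirror installs above can also be scripted. The sketch below only builds the command list; pip_install_cmd and the MIRRORS keys are hypothetical names, while the URLs are the ones listed above. Passing the result to subprocess.run(cmd, check=True) would actually run it.

```python
import sys
from typing import List, Optional

# Mirror index URLs from the list above (dictionary keys are illustrative)
MIRRORS = {
    "douban": "https://pypi.douban.com/simple",
    "tsinghua": "https://pypi.tuna.tsinghua.edu.cn/simple",
    "ustc": "https://mirrors.ustc.edu.cn/pypi/web/simple",
    "aliyun": "https://mirrors.aliyun.com/pypi/simple/",
}


def pip_install_cmd(package: str, mirror: Optional[str] = None,
                    version: Optional[str] = None) -> List[str]:
    """Build a 'python -m pip install' command, optionally pinned and using a mirror."""
    spec = f"{package}=={version}" if version else package
    # Run pip through the current interpreter so the package lands in this environment
    cmd = [sys.executable, "-m", "pip", "install"]
    if mirror is not None:
        cmd += ["-i", MIRRORS[mirror]]  # -i is pip's short form of --index-url
    cmd.append(spec)
    return cmd


if __name__ == "__main__":
    print(" ".join(pip_install_cmd("pandas_profiling", mirror="tsinghua", version="2.10.1")))
```

Invoking pip as python -m pip ties the install to the interpreter running the script, which avoids installing pandas_profiling into a different Python than the one that will import it.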
- Python code for generating the report:
import os

import pandas as pd
import pandas_profiling

input_dir = r"../test_data"
output_dir = "../test_data"
hospital = "XX"

for path, dir_list, file_list in os.walk(input_dir):
    for file_name in file_list:
        # Used when running pandas_profiling on a single table
        if file_name == "XX.csv":
            file_path = os.path.join(path, file_name)
            df = pd.read_csv(file_path)
            # Table name: the file name without its extension, lower-cased.
            # (re.findall(r'\w+', 'XX.csv') returns ['XX', 'csv'], so looping
            # over it would leave table_name == 'csv'.)
            table_name = os.path.splitext(file_name)[0].lower()
            # minimal=True produces a lighter report; without it,
            # pandas_profiling generates a more detailed report
            profile = pandas_profiling.ProfileReport(
                df, title=f"{hospital} {table_name} table data quality report", minimal=True
            )
            profile.to_file(output_file=os.path.join(output_dir, table_name + ".html"))
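The script above only profiles XX.csv. To profile every CSV under a directory, the file discovery and table naming can be factored out. This is a sketch, not part of the original script: table_name_from_file and find_csv_tables are hypothetical helper names, and it assumes every file ending in .csv (any letter case) is a table to profile.

```python
import os
from typing import Iterator, Tuple


def table_name_from_file(file_name: str) -> str:
    """Return the file name without its extension, lower-cased ('XX.csv' -> 'xx')."""
    return os.path.splitext(file_name)[0].lower()


def find_csv_tables(root: str) -> Iterator[Tuple[str, str]]:
    """Yield (file_path, table_name) for every .csv file under root."""
    for path, _dirs, files in os.walk(root):
        # Sort so the reports are generated in a stable order
        for file_name in sorted(files):
            if os.path.splitext(file_name)[1].lower() == ".csv":
                yield os.path.join(path, file_name), table_name_from_file(file_name)


if __name__ == "__main__":
    for file_path, table_name in find_csv_tables("../test_data"):
        print(table_name, "<-", file_path)
```

In the original loop, for file_path, table_name in find_csv_tables(input_dir): could then replace the os.walk loop and the if file_name == "XX.csv" check, with the read_csv, ProfileReport and to_file lines unchanged.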
- The following is taken from the official Pandas Profiling (v2.11) documentation:
Pandas Profiling
Generates profile reports from a pandas DataFrame.
The pandas df.describe() function is great but a little basic for serious exploratory data analysis. pandas_profiling extends the pandas DataFrame with df.profile_report() for quick data analysis.
For each column, the following statistics (if relevant for the column type) are presented in an interactive HTML report:
- Type inference: detect the types of columns in a dataframe.
- Essentials: type, unique values, missing values
- Quantile statistics: minimum value, Q1, median, Q3, maximum, range, interquartile range
- Descriptive statistics: mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
- Most frequent values
- Histogram
- Correlations: highlighting of highly correlated variables; Spearman, Pearson and Kendall matrices
- Missing values: matrix, count, heatmap and dendrogram of missing values
- Text analysis: learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data.
- File and Image analysis: extract file sizes, creation dates and dimensions, and scan for truncated images or those containing EXIF information.
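To make the essentials, quantile and descriptive statistics above concrete, here is a standard-library-only sketch that computes several of them for a single numeric column. It follows the definitions as named in the list (for example, median absolute deviation as the median of |x - median(x)|); it is an illustration, not pandas_profiling's internal implementation, and it assumes missing values are represented as None. column_summary is a hypothetical name.

```python
import math
import statistics
from typing import Dict, Optional, Sequence


def column_summary(values: Sequence[Optional[float]]) -> Dict[str, float]:
    """Compute several per-column statistics for one numeric column.

    Missing values are assumed to be represented as None.
    """
    present = [v for v in values if v is not None]
    summary: Dict[str, float] = {
        # Essentials: missing and distinct (unique) counts
        "n_missing": len(values) - len(present),
        "n_unique": len(set(present)),
    }
    if len(present) < 2:
        # Quartiles and the sample standard deviation need at least two values
        return summary

    # Quantile statistics. method="inclusive" is linear interpolation,
    # matching pandas' default Series.quantile(interpolation="linear").
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    low, high = min(present), max(present)
    summary.update(min=low, q1=q1, median=median, q3=q3, max=high,
                   range=high - low, iqr=q3 - q1)

    # Descriptive statistics; stdev is the sample standard deviation (ddof=1)
    mean = statistics.fmean(present)
    std = statistics.stdev(present)
    summary.update(
        mean=mean,
        std=std,
        sum=math.fsum(present),
        # Median absolute deviation: median of |x - median(x)|
        mad=statistics.median(abs(v - median) for v in present),
        # Coefficient of variation: std / mean (undefined when the mean is 0)
        cv=std / mean if mean != 0 else math.nan,
    )
    return summary


if __name__ == "__main__":
    print(column_summary([1, 2, 3, 4, 5, None]))
```

For [1, 2, 3, 4, 5, None] this gives one missing value, five unique values, Q1 = 2, median = 3, Q3 = 4 and an interquartile range of 2.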
Announcements
Version v2.10.0rc1 released
v2.10.0rc1 includes a major overhaul of the type system, now fully reliant on visions.
See the changelog below to know what has changed.
Spark backend in progress
We can happily announce that we’re nearing v1 for the Spark backend for generating profile reports.
Stay tuned.
Support pandas-profiling
The development of pandas-profiling
relies completely on contributions.
If you find value in the package, we welcome you to support the project through GitHub Sponsors!
It’s extra exciting that GitHub matches your contribution for the first year.
Find more information here:
- Changelog v2.10.0rc1
- Sponsor the project on GitHub
January 5, 2021