Generating Reports in Python with the pandas_profiling Library

2022/1/27 17:05:15


  • Installing pandas_profiling for Python

Install from the command line
pip install pandas_profiling
pip install pandas_profiling==2.10.1  # pin a specific version

Install from the Tsinghua mirror
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple pandas_profiling

Uninstall pandas_profiling
pip uninstall pandas_profiling
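
After installing or uninstalling, it can be useful to confirm from Python which version (if any) is present. The following is a minimal sketch using the standard library's importlib.metadata (Python 3.8+); the helper name installed_version is hypothetical and not part of pandas_profiling.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pandas_profiling")
if version is None:
    print("pandas_profiling is not installed")
else:
    print(f"pandas_profiling {version} is installed")
```

This only reads package metadata, so it does not import pandas_profiling itself and is quick to run.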

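Handling an error when installing pandas_profiling
Error:
ERROR: Cannot uninstall 'PyYAML'.  It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.

Cause: the existing PyYAML was installed by distutils, so pip cannot tell which files belong to it and refuses to perform what would be only a partial uninstall.

Solution: remove the old PyYAML installation, then install again. If pip cannot remove it, a commonly used workaround is to run pip install --ignore-installed PyYAML first and then install pandas_profiling again.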

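Installing a package online from a mirror (scrapy is used as the example package)
pip install -i https://pypi.douban.com/simple  scrapy

Commonly used PyPI mirrors
Douban (relatively stable and fast)
https://pypi.douban.com/simple

Tsinghua University
https://pypi.tuna.tsinghua.edu.cn/simple

University of Science and Technology of China (USTC)
https://mirrors.ustc.edu.cn/pypi/web/simple

Alibaba Cloud (Aliyun)
https://mirrors.aliyun.com/pypi/simple/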
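
When the same install has to be repeated against different mirrors, the pip command line can be assembled in Python. The sketch below is only an illustration: the MIRRORS dictionary keys and the pip_install_command helper are hypothetical names, while the URLs are the ones listed in this article. It prints the command rather than running it.

```python
import sys

# Mirror index URLs as listed in this article; the short keys are made up here.
MIRRORS = {
    "douban": "https://pypi.douban.com/simple",
    "tsinghua": "https://pypi.tuna.tsinghua.edu.cn/simple",
    "ustc": "https://mirrors.ustc.edu.cn/pypi/web/simple",
    "aliyun": "https://mirrors.aliyun.com/pypi/simple/",
}


def pip_install_command(package, mirror=None, version=None):
    """Return the argument list that installs `package` with pip."""
    requirement = f"{package}=={version}" if version else package
    command = [sys.executable or "python", "-m", "pip", "install"]
    if mirror is not None:
        # -i selects the package index to install from.
        command += ["-i", MIRRORS[mirror]]
    command.append(requirement)
    return command


# Print the command instead of executing it.
print(" ".join(pip_install_command("pandas_profiling", mirror="tsinghua", version="2.10.1")))
```

To actually install, the returned list could be passed to subprocess.run(command, check=True).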


  • The Python code is as follows:
import os

import pandas as pd
import pandas_profiling

input_dir = r"../test_data"   # directory that is walked recursively for CSV files
output_dir = '../test_data'   # directory the HTML reports are written to
hospital = 'XX'               # prefix used in the report title

for path, dir_list, file_list in os.walk(input_dir):
    for file_name in file_list:
        # Restrict the run to a single file when profiling only one table.
        if file_name == 'XX.csv':
            file_path = os.path.join(path, file_name)
            df = pd.read_csv(file_path)
            # Derive the table name from the file name without its extension.
            # (Splitting on \w+ would also yield "csv" and produce a second,
            # redundant report named csv.html.)
            table_name = os.path.splitext(file_name)[0].lower()
            # minimal=True produces a reduced report; omit it for a more detailed one.
            profile = pandas_profiling.ProfileReport(
                df, title=f'{hospital}{table_name} table data quality report', minimal=True)
            profile.to_file(output_file=os.path.join(output_dir, table_name + '.html'))
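
One detail worth checking when a table name is derived from a file name: a \w+ regular expression matches every run of word characters, so the extension is returned as a separate token, whereas os.path.splitext strips it. The short comparison below illustrates this; the helper name table_name_from_file is hypothetical.

```python
import os
import re


def table_name_from_file(file_name):
    """Return the lower-cased file name without its extension."""
    return os.path.splitext(file_name)[0].lower()


file_name = "XX.csv"

# The regex returns the stem and the extension as two tokens.
print(re.findall(r"\w+", file_name))    # ['XX', 'csv']

# splitext keeps only the stem.
print(table_name_from_file(file_name))  # xx
```

Looping over the regex tokens would therefore generate one report per token, including one for "csv".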

  • The following is taken from the official Pandas Profiling documentation (version 2.11):

Pandas Profiling

Generates profile reports from a pandas DataFrame.

The pandas df.describe() function is great but a little basic for serious exploratory data analysis.
pandas_profiling extends the pandas DataFrame with df.profile_report() for quick data analysis.
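
As a rough illustration of the df.profile_report() accessor, the sketch below builds a small DataFrame and writes its report to an HTML file. It assumes pandas and pandas_profiling 2.x are installed; the function name profile_to_html and the sample rows are made up for this example, and the imports are placed inside the function so that the snippet can be loaded even where the packages are missing.

```python
def profile_to_html(rows, output_file, title="Example report", minimal=True):
    """Build a DataFrame from `rows` and write its profile report as HTML."""
    # Third-party imports are deferred until the function is called.
    import pandas as pd
    import pandas_profiling  # noqa: F401  (importing registers DataFrame.profile_report)

    df = pd.DataFrame(rows)
    report = df.profile_report(title=title, minimal=minimal)
    report.to_file(output_file)
    return report


# Example call (not executed here because it needs pandas_profiling installed):
# profile_to_html(
#     [{"age": 31, "city": "Beijing"}, {"age": 45, "city": "Shanghai"}],
#     "example.html",
# )
```

For comparison, df.describe() on the same DataFrame returns only a small table of summary statistics.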

For each column the following statistics - if relevant for the column type - are presented in an interactive HTML report:

  • Type inference: detect the types of columns in a dataframe.
  • Essentials: type, unique values, missing values
  • Quantile statistics like minimum value, Q1, median, Q3, maximum, range, interquartile range
  • Descriptive statistics like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
  • Most frequent values
  • Histogram
  • Correlations: highlighting of highly correlated variables, Spearman, Pearson and Kendall matrices
  • Missing values: matrix, count, heatmap and dendrogram of missing values
  • Text analysis: learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data.
  • File and Image analysis: extract file sizes, creation dates and dimensions and scan for truncated images or those containing EXIF information.
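
To make the essentials, quantile and descriptive statistics in the list above more concrete, here is a small standard-library sketch that computes a few of them for one numeric column. It is not how pandas_profiling computes them internally; the summarize helper, the sample column and the choice of the "inclusive" quantile method are assumptions made for illustration.

```python
import statistics


def summarize(values):
    """Summarize a numeric column; None marks a missing value."""
    present = sorted(v for v in values if v is not None)
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "unique": len(set(present)),
        "min": present[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": present[-1],
        "range": present[-1] - present[0],
        "iqr": q3 - q1,
        "mean": statistics.mean(present),
        "mode": statistics.mode(present),
        "std": statistics.stdev(present),
    }


column = [2, 4, None, 4, 5, 7, 9, None, 10]
for name, value in summarize(column).items():
    print(f"{name}: {value}")
```

The report presents values of this kind for every column, alongside histograms and correlation matrices.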

Announcements

Version v2.10.0rc1 released

v2.10.0rc1 includes a major overhaul of the type system, now fully reliant on visions.
See the changelog below to know what has changed.

Spark backend in progress

We can happily announce that we’re nearing v1 for the Spark backend for generating profile reports.
Stay tuned.

Support pandas-profiling

The development of pandas-profiling relies completely on contributions.
If you find value in the package, we welcome you to support the project through GitHub Sponsors!
It’s extra exciting that GitHub matches your contribution for the first year.

Find more information here:

  • Changelog v2.10.0rc1
  • Sponsor the project on GitHub

January 5, 2021


