Data Cleaning and Preparation: Data Transformation
2021/11/16 6:11:38
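This article covers the data transformation part of data cleaning and preparation with pandas, as a reference for everyday programming problems.
Data Cleaning and Preparation
II. Data Transformation
Removing duplicate data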
import numpy as np
import pandas as pd

data = pd.DataFrame({'k1': ['one', 'two'] * 3 + ['two'],
                     'k2': [1, 1, 2, 3, 3, 4, 4]})
data
Out:
    k1  k2
0  one   1
1  two   1
2  one   2
3  two   3
4  one   3
5  two   4
6  two   4

# Check which rows repeat an earlier row
data.duplicated()
Out:
0    False
1    False
2    False
3    False
4    False
5    False
6     True
dtype: bool

# Drop the duplicate rows
data.drop_duplicates()
Out:
    k1  k2
0  one   1
1  two   1
2  one   2
3  two   3
4  one   3
5  two   4

data['v1'] = range(7)
data
Out:
    k1  k2  v1
0  one   1   0
1  two   1   1
2  one   2   2
3  two   3   3
4  one   3   4
5  two   4   5
6  two   4   6

# Drop duplicates based on the k1 column only
data.drop_duplicates(['k1'])
Out:
    k1  k2  v1
0  one   1   0
1  two   1   1

# Compare on k2 only and keep the last occurrence of each value
data.drop_duplicates(['k2'], keep='last')
Out:
    k1  k2  v1
1  two   1   1
2  one   2   2
4  one   3   4
6  two   4   6

# Rebuild the original two-column frame
data = pd.DataFrame({'k1': ['one', 'two'] * 3 + ['two'],
                     'k2': [1, 1, 2, 3, 3, 4, 4]})
data
Out:
    k1  k2
0  one   1
1  two   1
2  one   2
3  two   3
4  one   3
5  two   4
6  two   4

# Compare whole rows and keep the last occurrence
data.drop_duplicates(keep='last')
Out:
    k1  k2
0  one   1
1  two   1
2  one   2
3  two   3
4  one   3
6  two   4
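
The examples above keep one row from each group of duplicates. As a small supplementary sketch (not part of the original walkthrough), `keep=False` treats every member of a duplicate group as a duplicate, which helps when you want to inspect or discard all of them. The variable names here are illustrative only.

```python
import pandas as pd

data = pd.DataFrame({'k1': ['one', 'two'] * 3 + ['two'],
                     'k2': [1, 1, 2, 3, 3, 4, 4]})

# keep=False flags every row that belongs to a duplicate group,
# not only the later repeats
all_dupes = data[data.duplicated(keep=False)]

# With keep=False, drop_duplicates removes the whole group and
# leaves only rows that occur exactly once
unique_only = data.drop_duplicates(keep=False)

# Count how many rows share each (k1, k2) combination
counts = data.groupby(['k1', 'k2']).size()

print(all_dupes)
print(unique_only)
print(counts)
```

Here rows 5 and 6 form the only duplicate group, so `all_dupes` contains both of them and `unique_only` contains rows 0 through 4.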
Transforming data with a function or mapping
data = pd.DataFrame({'food': ['Apple', 'banana', 'orange', 'apple', 'Mango', 'tomato'],
                     'price': [4, 3, 3.5, 6, 12, 3]})
data
Out:
     food  price
0   Apple    4.0
1  banana    3.0
2  orange    3.5
3   apple    6.0
4   Mango   12.0
5  tomato    3.0

# Lookup table from lower-case food name to category
meat = {'apple': 'fruit', 'banana': 'fruit', 'orange': 'fruit',
        'mango': 'fruit', 'tomato': 'vegetables'}

# Lower-case the values so they match the dictionary keys
low = data['food'].str.lower()
low
Out:
0     apple
1    banana
2    orange
3     apple
4     mango
5    tomato
Name: food, dtype: object

# Map each lower-cased name to its category
data['class'] = low.map(meat)
data
Out:
     food  price       class
0   Apple    4.0       fruit
1  banana    3.0       fruit
2  orange    3.5       fruit
3   apple    6.0       fruit
4   Mango   12.0       fruit
5  tomato    3.0  vegetables

# The same transformation in one step, using a function instead of a dict
data['class1'] = data['food'].map(lambda x: meat[x.lower()])
data
Out:
     food  price       class      class1
0   Apple    4.0       fruit       fruit
1  banana    3.0       fruit       fruit
2  orange    3.5       fruit       fruit
3   apple    6.0       fruit       fruit
4   Mango   12.0       fruit       fruit
5  tomato    3.0  vegetables  vegetables
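
The dict form and the lambda form of `map` agree when every value has an entry in the lookup table, but they differ when one is missing. The sketch below illustrates that difference with a made-up table; the `category` dict, the `'kiwi'` entry and the `'unknown'` fallback label are assumptions for illustration, not part of the original data.

```python
import pandas as pd

# Hypothetical lookup table that deliberately has no entry for 'kiwi'
category = {'apple': 'fruit', 'tomato': 'vegetables'}
food = pd.Series(['Apple', 'kiwi', 'tomato'])

# Passing the dict to map: values without a matching key become NaN
mapped = food.str.lower().map(category)

# Indexing the dict inside a lambda raises KeyError for 'kiwi'
try:
    food.map(lambda x: category[x.lower()])
    raised = False
except KeyError:
    raised = True

# dict.get with a default supplies an explicit fallback label instead
with_default = food.map(lambda x: category.get(x.lower(), 'unknown'))

print(mapped)
print(raised)
print(with_default)
```

Which behavior is preferable depends on the data: NaN makes unmatched values easy to find with `isna()`, while a `KeyError` fails fast on unexpected input.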
Replacing values
data = pd.Series([1, -999, 2, -1000, 3])
data
Out:
0       1
1    -999
2       2
3   -1000
4       3
dtype: int64

data.replace(-999, np.nan)
Out:
0       1.0
1       NaN
2       2.0
3   -1000.0
4       3.0
dtype: float64

# Replace several values at once
data.replace([-999, -1000], np.nan)
Out:
0    1.0
1    NaN
2    2.0
3    NaN
4    3.0
dtype: float64

# replace returns a new object; a different replacement for each value
data1 = data.replace([-999, -1000], [np.nan, 0])
data1
Out:
0    1.0
1    NaN
2    2.0
3    0.0
4    3.0
dtype: float64

# The same replacement written as a dict
data.replace({-999: np.nan, -1000: 0})
Out:
0    1.0
1    NaN
2    2.0
3    0.0
4    3.0
dtype: float64
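
As a quick check of the two points made in this section, the sketch below compares the list form and the dict form of `replace` and confirms that the original Series is left unchanged. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.Series([1, -999, 2, -1000, 3])

# Two spellings of the same replacement
by_lists = data.replace([-999, -1000], [np.nan, 0])
by_dict = data.replace({-999: np.nan, -1000: 0})

# Series.equals treats NaN values in the same position as equal
same = by_lists.equals(by_dict)

# replace returned new objects, so the original values are still there
original_unchanged = data.tolist() == [1, -999, 2, -1000, 3]

print(same, original_unchanged)
```

To keep the result, assign it to a name as the `data1` example does.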
Renaming axis indexes
data = pd.DataFrame(np.arange(12).reshape((3, 4)),
                    index=['BeiJing', 'Tokyo', 'New York'],
                    columns=['one', 'two', 'three', 'four'])
data
Out:
          one  two  three  four
BeiJing     0    1      2     3
Tokyo       4    5      6     7
New York    8    9     10    11
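
# Reindexing
# reindex does not rename labels: it looks rows up by existing label,
# so labels that are not in the index produce rows of NaN
data.reindex(['a', 'b', 'c'])
Out:
   one  two  three  four
a  NaN  NaN    NaN   NaN
b  NaN  NaN    NaN   NaN
c  NaN  NaN    NaN   NaN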

data
Out:
          one  two  three  four
BeiJing     0    1      2     3
Tokyo       4    5      6     7
New York    8    9     10    11
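
The all-NaN result of the `reindex` call follows from how `reindex` matches labels. A minimal sketch of the contrast with `rename` is below; the `'Paris'` label and the `a`/`b`/`c` mapping are made up for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(12).reshape((3, 4)),
                    index=['BeiJing', 'Tokyo', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# reindex selects and reorders rows by label; existing labels keep
# their data, the unknown label 'Paris' gets a row of NaN
picked = data.reindex(['Tokyo', 'BeiJing', 'Paris'])

# rename swaps the labels themselves and leaves the data in place
renamed = data.rename(index={'BeiJing': 'a', 'Tokyo': 'b', 'New York': 'c'})

print(picked)
print(renamed)
```

So `reindex` is the tool for selecting or reordering rows, and `rename` (or assigning to `data.index`) is the tool for changing labels.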

# Upper-case the first four characters of each index label
tran = lambda x: x[:4].upper()
data.index.map(tran)
Out:
Index(['BEIJ', 'TOKY', 'NEW '], dtype='object')

# Assign the mapped labels back to modify the index in place
data.index = data.index.map(tran)
data
Out:
      one  two  three  four
BEIJ    0    1      2     3
TOKY    4    5      6     7
NEW     8    9     10    11

# rename returns a transformed copy and leaves data unchanged
data.rename(index=str.title, columns=str.upper)
Out:
      ONE  TWO  THREE  FOUR
Beij    0    1      2     3
Toky    4    5      6     7
New     8    9     10    11

# Pass dicts to update only a subset of the labels
data.rename(index={'TOKY': '东京'}, columns={'three': '第三年'})
Out:
      one  two  第三年  four
BEIJ    0    1    2     3
东京      4    5    6     7
NEW     8    9   10    11

# inplace=True modifies data itself instead of returning a copy
data.rename(index={'TOKY': '东京'}, columns={'three': '第三年'}, inplace=True)
data
Out:
      one  two  第三年  four
BEIJ    0    1    2     3
东京      4    5    6     7
NEW     8    9   10    11