Python数据分析pandas入门练习题(八)
2021/7/10 11:06:01
This post works through Part 8 of an introductory set of pandas exercises for Python data analysis; the solutions below may be a useful reference when you get stuck.
Python数据分析基础
- Preparation
- Exercise 1- US - Baby Names
- Introduction:
- Step 1. Import the necessary libraries
- Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/US_Baby_Names/US_Baby_Names_right.csv).
- Step 3. Assign it to a variable called baby_names.
- Step 4. See the first 10 entries
- Step 5. Delete the column 'Unnamed: 0' and 'Id'
  - Step 6. Are there more male or female names in the dataset?
- Step 7. Group the dataset by name and assign to names
- Step 8. How many different names exist in the dataset?
- Step 9. What is the name with most occurrences?
- Step 10. How many different names have the least occurrences?
- Step 11. What is the median name occurrence?
- Step 12. What is the standard deviation of names?
- Step 13. Get a summary with the mean, min, max, std and quartiles.
- Exercise 2- Wind Statistics
- Introduction:
- Step 1. Import the necessary libraries
- Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/Wind_Stats/wind.data)
- Step 3. Assign it to a variable called data and replace the first 3 columns by a proper datetime index.
- Step 4. Year 2061? Do we really have data from this year? Create a function to fix it and apply it.
  - Step 5. Set the right dates as the index. Pay attention to the data type, it should be datetime64[ns].
- Step 6. Compute how many values are missing for each location over the entire record.
- They should be ignored in all calculations below.
- Step 7. Compute how many non-missing values there are in total.
- Step 8. Calculate the mean windspeeds of the windspeeds over all the locations and all the times.
- A single number for the entire dataset.
- Step 9. Create a DataFrame called loc_stats and calculate the min, max and mean windspeeds and standard deviations of the windspeeds at each location over all the days
- A different set of numbers for each location.
- Step 10. Create a DataFrame called day_stats and calculate the min, max and mean windspeed and standard deviations of the windspeeds across all the locations at each day.
- A different set of numbers for each day.
- Step 11. Find the average windspeed in January for each location.
- Treat January 1961 and January 1962 both as January.
- Step 12. Downsample the record to a yearly frequency for each location.
- Step 13. Downsample the record to a monthly frequency for each location.
- Step 14. Downsample the record to a weekly frequency for each location.
- Step 15. Calculate the min, max and mean windspeeds and standard deviations of the windspeeds across all locations for each week (assume that the first week starts on January 2 1961) for the first 52 weeks.
- Conclusion
Preparation
The datasets are all public, so you can find them online yourself (or message me). I have not uploaded them to CSDN, since downloading from there requires a membership, and the dataset links below are not guaranteed to work.
Exercise 1- US - Baby Names
Introduction:
We are going to use a subset of US Baby Names from Kaggle.
The file contains names from 2004 through 2014.
Step 1. Import the necessary libraries
Code:

```python
import pandas as pd
```
Step 2. Import the dataset from this address.
Step 3. Assign it to a variable called baby_names.
Code:

```python
baby_names = pd.read_csv("US_Baby_Names_right.csv")
baby_names.info()
```

Output:

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1016395 entries, 0 to 1016394
Data columns (total 7 columns):
Unnamed: 0    1016395 non-null int64
Id            1016395 non-null int64
Name          1016395 non-null object
Year          1016395 non-null int64
Gender        1016395 non-null object
State         1016395 non-null object
Count         1016395 non-null int64
dtypes: int64(4), object(3)
memory usage: 54.3+ MB
```
Step 4. See the first 10 entries
Code:

```python
baby_names.head(10)
```

Output:
 | Unnamed: 0 | Id | Name | Year | Gender | State | Count
---|---|---|---|---|---|---|---|
0 | 11349 | 11350 | Emma | 2004 | F | AK | 62 |
1 | 11350 | 11351 | Madison | 2004 | F | AK | 48 |
2 | 11351 | 11352 | Hannah | 2004 | F | AK | 46 |
3 | 11352 | 11353 | Grace | 2004 | F | AK | 44 |
4 | 11353 | 11354 | Emily | 2004 | F | AK | 41 |
5 | 11354 | 11355 | Abigail | 2004 | F | AK | 37 |
6 | 11355 | 11356 | Olivia | 2004 | F | AK | 33 |
7 | 11356 | 11357 | Isabella | 2004 | F | AK | 30 |
8 | 11357 | 11358 | Alyssa | 2004 | F | AK | 29 |
9 | 11358 | 11359 | Sophia | 2004 | F | AK | 28 |
Step 5. Delete the column ‘Unnamed: 0’ and ‘Id’
Code:

```python
del baby_names['Id']
# OR: del baby_names['Unnamed: 0'] works for that column too; here a
# pattern match drops any column whose name starts with 'Unnamed'
baby_names = baby_names.loc[:, ~baby_names.columns.str.contains('^Unnamed')]
baby_names.head()
```
Output:
 | Name | Year | Gender | State | Count
---|---|---|---|---|---|
0 | Emma | 2004 | F | AK | 62 |
1 | Madison | 2004 | F | AK | 48 |
2 | Hannah | 2004 | F | AK | 46 |
3 | Grace | 2004 | F | AK | 44 |
4 | Emily | 2004 | F | AK | 41 |
Step 6. Are there more male or female names in the dataset?
Code:

```python
# baby_names['Gender'].value_counts()  # would count rows, not babies
baby_names.groupby('Gender').Count.sum()
```

Output:

```
Gender
F    16380293
M    19041199
Name: Count, dtype: int64
```
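The difference between the two approaches matters: `value_counts()` counts rows, while summing `Count` counts babies. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data: each row is one (name, state, year) record, and
# Count is how many babies it covers, so row counts != baby counts.
toy = pd.DataFrame({
    "Gender": ["F", "F", "M"],
    "Count":  [10, 20, 5],
})

rows_per_gender = toy["Gender"].value_counts()            # rows per gender
babies_per_gender = toy.groupby("Gender")["Count"].sum()  # babies per gender
```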
Step 7. Group the dataset by name and assign to names
Code:

```python
del baby_names["Year"]
names = baby_names.groupby("Name").sum()
names.head()
print(names.shape)
names.sort_values("Count", ascending=False).head()
# OR, to inspect the groups without aggregating:
# names = baby_names.groupby('Name')
# names.head(1)
```

Output:

```
(17632, 1)
```
Name | Count
---|---
Jacob | 242874 |
Emma | 214852 |
Michael | 214405 |
Ethan | 209277 |
Isabella | 204798 |
Step 8. How many different names exist in the dataset?
Code:

```python
len(names)
```

Output:

```
17632
```
Step 9. What is the name with most occurrences?
Code:

```python
names.Count.idxmax()  # idxmax() returns the index label of the Series' maximum value
```

Output:

```
'Jacob'
```
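A quick sketch of the `idxmax()`/`max()` distinction on a hypothetical Series:

```python
import pandas as pd

# idxmax() returns the index label of the maximum; max() returns the value.
s = pd.Series({"Jacob": 242874, "Emma": 214852, "Michael": 214405})
top_name = s.idxmax()
top_count = s.max()
```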
Step 10. How many different names have the least occurrences?
Code:

```python
len(names[names.Count == names.Count.min()])
```

Output:

```
2578
```
Step 11. What is the median name occurrence?
Code:

```python
names[names.Count == names.Count.median()]
```

Output:
Name | Count
---|---
Aishani | 49 |
Alara | 49 |
Alysse | 49 |
Ameir | 49 |
Anely | 49 |
Antonina | 49 |
Aveline | 49 |
Aziah | 49 |
Baily | 49 |
Caleah | 49 |
Carlota | 49 |
Cristine | 49 |
Dahlila | 49 |
Darvin | 49 |
Deante | 49 |
Deserae | 49 |
Devean | 49 |
Elizah | 49 |
Emmaly | 49 |
Emmanuela | 49 |
Envy | 49 |
Esli | 49 |
Fay | 49 |
Gurshaan | 49 |
Hareem | 49 |
Iven | 49 |
Jaice | 49 |
Jaiyana | 49 |
Jamiracle | 49 |
Jelissa | 49 |
... | ... |
Kyndle | 49 |
Kynsley | 49 |
Leylanie | 49 |
Maisha | 49 |
Malillany | 49 |
Mariann | 49 |
Marquell | 49 |
Maurilio | 49 |
Mckynzie | 49 |
Mehdi | 49 |
Nabeel | 49 |
Nalleli | 49 |
Nassir | 49 |
Nazier | 49 |
Nishant | 49 |
Rebecka | 49 |
Reghan | 49 |
Ridwan | 49 |
Riot | 49 |
Rubin | 49 |
Ryatt | 49 |
Sameera | 49 |
Sanjuanita | 49 |
Shalyn | 49 |
Skylie | 49 |
Sriram | 49 |
Trinton | 49 |
Vita | 49 |
Yoni | 49 |
Zuleima | 49 |
66 rows × 1 columns
Step 12. What is the standard deviation of names?
Code:

```python
names.Count.std()
```

Output:

```
11006.069467891111
```
Step 13. Get a summary with the mean, min, max, std and quartiles.
Code:

```python
names.describe()
```

Output:
 | Count
---|---
count | 17632.000000 |
mean | 2008.932169 |
std | 11006.069468 |
min | 5.000000 |
25% | 11.000000 |
50% | 49.000000 |
75% | 337.000000 |
max | 242874.000000 |
Exercise 2- Wind Statistics
Introduction:
The data have been modified to contain some missing values, identified by NaN.
Using pandas should make this exercise easier, in particular for the bonus question.
You should be able to perform all of these operations without using a for loop or other looping construct.
- The data in ‘wind.data’ has the following format:
""" Yr Mo Dy RPT VAL ROS KIL SHA BIR DUB CLA MUL CLO BEL MAL 61 1 1 15.04 14.96 13.17 9.29 NaN 9.87 13.67 10.25 10.83 12.58 18.50 15.04 61 1 2 14.71 NaN 10.83 6.50 12.62 7.67 11.50 10.04 9.79 9.67 17.54 13.83 61 1 3 18.50 16.88 12.33 10.13 11.17 6.17 11.25 NaN 8.50 7.67 12.75 12.71 """
The first three columns are year, month and day. The remaining 12 columns are average windspeeds in knots at 12 locations in Ireland on that day.
More information about the dataset is available here.
Step 1. Import the necessary libraries
Code:

```python
import pandas as pd
import datetime
```
Step 2. Import the dataset from this address
Step 3. Assign it to a variable called data and replace the first 3 columns by a proper datetime index.
Code:

```python
# parse_dates=[[0, 1, 2]] combines the Yr, Mo and Dy columns into one date column
data = pd.read_table('wind.data', sep=r'\s+', parse_dates=[[0, 1, 2]])
data.head()
```

Output:
 | Yr_Mo_Dy | RPT | VAL | ROS | KIL | SHA | BIR | DUB | CLA | MUL | CLO | BEL | MAL
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2061-01-01 | 15.04 | 14.96 | 13.17 | 9.29 | NaN | 9.87 | 13.67 | 10.25 | 10.83 | 12.58 | 18.50 | 15.04 |
1 | 2061-01-02 | 14.71 | NaN | 10.83 | 6.50 | 12.62 | 7.67 | 11.50 | 10.04 | 9.79 | 9.67 | 17.54 | 13.83 |
2 | 2061-01-03 | 18.50 | 16.88 | 12.33 | 10.13 | 11.17 | 6.17 | 11.25 | NaN | 8.50 | 7.67 | 12.75 | 12.71 |
3 | 2061-01-04 | 10.58 | 6.63 | 11.75 | 4.58 | 4.54 | 2.88 | 8.63 | 1.79 | 5.83 | 5.88 | 5.46 | 10.88 |
4 | 2061-01-05 | 13.33 | 13.25 | 11.42 | 6.17 | 10.71 | 8.21 | 11.92 | 6.54 | 10.92 | 10.34 | 12.92 | 11.83 |
Step 4. Year 2061? Do we really have data from this year? Create a function to fix it and apply it.
Code:

```python
def fix_century(x):
    # pandas parses a two-digit year like 61 as 2061, so shift any
    # year after 1989 back one century
    year = x.year - 100 if x.year > 1989 else x.year
    return datetime.date(year, x.month, x.day)

data['Yr_Mo_Dy'] = data['Yr_Mo_Dy'].apply(fix_century)
data.head()
```

Output:
 | Yr_Mo_Dy | RPT | VAL | ROS | KIL | SHA | BIR | DUB | CLA | MUL | CLO | BEL | MAL
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1961-01-01 | 15.04 | 14.96 | 13.17 | 9.29 | NaN | 9.87 | 13.67 | 10.25 | 10.83 | 12.58 | 18.50 | 15.04 |
1 | 1961-01-02 | 14.71 | NaN | 10.83 | 6.50 | 12.62 | 7.67 | 11.50 | 10.04 | 9.79 | 9.67 | 17.54 | 13.83 |
2 | 1961-01-03 | 18.50 | 16.88 | 12.33 | 10.13 | 11.17 | 6.17 | 11.25 | NaN | 8.50 | 7.67 | 12.75 | 12.71 |
3 | 1961-01-04 | 10.58 | 6.63 | 11.75 | 4.58 | 4.54 | 2.88 | 8.63 | 1.79 | 5.83 | 5.88 | 5.46 | 10.88 |
4 | 1961-01-05 | 13.33 | 13.25 | 11.42 | 6.17 | 10.71 | 8.21 | 11.92 | 6.54 | 10.92 | 10.34 | 12.92 | 11.83 |
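The century bug comes from pandas mapping a two-digit year such as 61 to 2061. A minimal sketch of the same fix on a hypothetical timestamp:

```python
import datetime
import pandas as pd

def fix_century(x):
    # Shift years parsed into the future (e.g. 2061) back one century;
    # 1989 is the cutoff used in the exercise.
    year = x.year - 100 if x.year > 1989 else x.year
    return datetime.date(year, x.month, x.day)

parsed = pd.Timestamp("2061-01-01")  # what '61 1 1' parses to
fixed = fix_century(parsed)
```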
Step 5. Set the right dates as the index. Pay attention at the data type, it should be datetime64[ns].
Code:

```python
data["Yr_Mo_Dy"] = pd.to_datetime(data["Yr_Mo_Dy"])  # convert to datetime64[ns]
data = data.set_index('Yr_Mo_Dy')
data.head()
```

Output:
Yr_Mo_Dy | RPT | VAL | ROS | KIL | SHA | BIR | DUB | CLA | MUL | CLO | BEL | MAL
---|---|---|---|---|---|---|---|---|---|---|---|---
1961-01-01 | 15.04 | 14.96 | 13.17 | 9.29 | NaN | 9.87 | 13.67 | 10.25 | 10.83 | 12.58 | 18.50 | 15.04 |
1961-01-02 | 14.71 | NaN | 10.83 | 6.50 | 12.62 | 7.67 | 11.50 | 10.04 | 9.79 | 9.67 | 17.54 | 13.83 |
1961-01-03 | 18.50 | 16.88 | 12.33 | 10.13 | 11.17 | 6.17 | 11.25 | NaN | 8.50 | 7.67 | 12.75 | 12.71 |
1961-01-04 | 10.58 | 6.63 | 11.75 | 4.58 | 4.54 | 2.88 | 8.63 | 1.79 | 5.83 | 5.88 | 5.46 | 10.88 |
1961-01-05 | 13.33 | 13.25 | 11.42 | 6.17 | 10.71 | 8.21 | 11.92 | 6.54 | 10.92 | 10.34 | 12.92 | 11.83 |
Step 6. Compute how many values are missing for each location over the entire record.
They should be ignored in all calculations below.
Code:

```python
data.isnull().sum()
```

Output:

```
RPT    6
VAL    3
ROS    2
KIL    5
SHA    2
BIR    0
DUB    3
CLA    2
MUL    3
CLO    1
BEL    0
MAL    4
dtype: int64
```
Step 7. Compute how many non-missing values there are in total.
Code:

```python
data.shape[0] - data.isnull().sum()
# OR
data.notnull().sum()
```

Output:

```
RPT    6568
VAL    6571
ROS    6572
KIL    6569
SHA    6572
BIR    6574
DUB    6571
CLA    6572
MUL    6571
CLO    6573
BEL    6574
MAL    6570
dtype: int64
```
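A tiny sketch of why the two expressions agree, on hypothetical data with a couple of gaps:

```python
import numpy as np
import pandas as pd

# notnull() marks present values True, so summing it counts the
# non-missing entries per column; shape[0] - isnull().sum() is the same.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})
per_column = df.notnull().sum()
same_thing = df.shape[0] - df.isnull().sum()
total = per_column.sum()
```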
Step 8. Calculate the mean windspeeds of the windspeeds over all the locations and all the times.
A single number for the entire dataset.
Code:

```python
# flatten() collapses the 2-D values to 1-D (row by row). Note that
# fillna(0) counts the missing values as zero readings, so this mean is
# slightly lower than one that truly ignores them, e.g.
# data.sum().sum() / data.notnull().sum().sum()
data.fillna(0).values.flatten().mean()
```

Output:

```
10.223864592840483
```
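A toy illustration (hypothetical numbers) of how `fillna(0)` differs from ignoring NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [10.0, np.nan], "b": [20.0, 30.0]})

# Missing value counted as a zero reading: (10 + 0 + 20 + 30) / 4
mean_fill_zero = df.fillna(0).values.flatten().mean()
# Missing value ignored entirely: (10 + 20 + 30) / 3
mean_skip_nan = df.sum().sum() / df.notnull().sum().sum()
```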
Step 9. Create a DataFrame called loc_stats and calculate the min, max and mean windspeeds and standard deviations of the windspeeds at each location over all the days
A different set of numbers for each location.
Code:

```python
loc_stats = data.describe(percentiles=[])
# OR, selecting the location columns explicitly:
# loc_stats = data.loc[:, 'RPT':'MAL'].describe(percentiles=[])
loc_stats
```

Output:
 | RPT | VAL | ROS | KIL | SHA | BIR | DUB | CLA | MUL | CLO | BEL | MAL
---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 6568.000000 | 6571.000000 | 6572.000000 | 6569.000000 | 6572.000000 | 6574.000000 | 6571.000000 | 6572.000000 | 6571.000000 | 6573.000000 | 6574.000000 | 6570.000000 |
mean | 12.362987 | 10.644314 | 11.660526 | 6.306468 | 10.455834 | 7.092254 | 9.797343 | 8.495053 | 8.493590 | 8.707332 | 13.121007 | 15.599079 |
std | 5.618413 | 5.267356 | 5.008450 | 3.605811 | 4.936125 | 3.968683 | 4.977555 | 4.499449 | 4.166872 | 4.503954 | 5.835037 | 6.699794 |
min | 0.670000 | 0.210000 | 1.500000 | 0.000000 | 0.130000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.040000 | 0.130000 | 0.670000 |
50% | 11.710000 | 10.170000 | 10.920000 | 5.750000 | 9.960000 | 6.830000 | 9.210000 | 8.080000 | 8.170000 | 8.290000 | 12.500000 | 15.000000 |
max | 35.800000 | 33.370000 | 33.840000 | 28.460000 | 37.540000 | 26.160000 | 30.370000 | 31.080000 | 25.880000 | 28.210000 | 42.380000 | 42.540000 |
Step 10. Create a DataFrame called day_stats and calculate the min, max and mean windspeed and standard deviations of the windspeeds across all the locations at each day.
A different set of numbers for each day.
Code:

```python
day_stats = pd.DataFrame()
day_stats['min'] = data.min(axis=1)    # axis=1: across locations, per day
day_stats['max'] = data.max(axis=1)
day_stats['mean'] = data.mean(axis=1)
day_stats['std'] = data.std(axis=1)
day_stats.head()
```

Output:
Yr_Mo_Dy | min | max | mean | std
---|---|---|---|---
1961-01-01 | 9.29 | 18.50 | 13.018182 | 2.808875 |
1961-01-02 | 6.50 | 17.54 | 11.336364 | 3.188994 |
1961-01-03 | 6.17 | 18.50 | 11.641818 | 3.681912 |
1961-01-04 | 1.79 | 11.75 | 6.619167 | 3.198126 |
1961-01-05 | 6.17 | 13.33 | 10.630000 | 2.445356 |
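The `axis=1` argument is what switches the statistics from per-location to per-day. A minimal sketch with hypothetical readings for two locations on two days:

```python
import pandas as pd

df = pd.DataFrame({"RPT": [10.0, 4.0], "VAL": [20.0, 8.0]})

row_means = df.mean(axis=1)  # across columns: one value per day (row)
col_means = df.mean(axis=0)  # down columns: one value per location
```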
Step 11. Find the average windspeed in January for each location.
Treat January 1961 and January 1962 both as January.
Code:

```python
data.loc[data.index.month == 1].mean()
```

Output:

```
RPT    14.847325
VAL    12.914560
ROS    13.299624
KIL     7.199498
SHA    11.667734
BIR     8.054839
DUB    11.819355
CLA     9.512047
MUL     9.543208
CLO    10.053566
BEL    14.550520
MAL    18.028763
dtype: float64
```
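Filtering on `index.month == 1` pools every January together regardless of year, which is exactly what the hint asks for. A sketch on a hypothetical three-day series:

```python
import pandas as pd

idx = pd.to_datetime(["1961-01-15", "1961-06-01", "1962-01-20"])
s = pd.Series([10.0, 99.0, 20.0], index=idx)

# Both Januaries (1961 and 1962) survive the mask; June does not.
january_mean = s.loc[s.index.month == 1].mean()
```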
Step 12. Downsample the record to a yearly frequency for each location.
Code:

```python
# to_period('A') converts the DatetimeIndex to an annual PeriodIndex;
# a Period is a time span (a timestamp plus a freq giving its length),
# so grouping by it yields one row per year
data.groupby(data.index.to_period('A')).mean()
```

Output:
Yr_Mo_Dy | RPT | VAL | ROS | KIL | SHA | BIR | DUB | CLA | MUL | CLO | BEL | MAL
---|---|---|---|---|---|---|---|---|---|---|---|---
1961 | 12.299583 | 10.351796 | 11.362369 | 6.958227 | 10.881763 | 7.729726 | 9.733923 | 8.858788 | 8.647652 | 9.835577 | 13.502795 | 13.680773 |
1962 | 12.246923 | 10.110438 | 11.732712 | 6.960440 | 10.657918 | 7.393068 | 11.020712 | 8.793753 | 8.316822 | 9.676247 | 12.930685 | 14.323956 |
1963 | 12.813452 | 10.836986 | 12.541151 | 7.330055 | 11.724110 | 8.434712 | 11.075699 | 10.336548 | 8.903589 | 10.224438 | 13.638877 | 14.999014 |
1964 | 12.363661 | 10.920164 | 12.104372 | 6.787787 | 11.454481 | 7.570874 | 10.259153 | 9.467350 | 7.789016 | 10.207951 | 13.740546 | 14.910301 |
1965 | 12.451370 | 11.075534 | 11.848767 | 6.858466 | 11.024795 | 7.478110 | 10.618712 | 8.879918 | 7.907425 | 9.918082 | 12.964247 | 15.591644 |
1966 | 13.461973 | 11.557205 | 12.020630 | 7.345726 | 11.805041 | 7.793671 | 10.579808 | 8.835096 | 8.514438 | 9.768959 | 14.265836 | 16.307260 |
1967 | 12.737151 | 10.990986 | 11.739397 | 7.143425 | 11.630740 | 7.368164 | 10.652027 | 9.325616 | 8.645014 | 9.547425 | 14.774548 | 17.135945 |
1968 | 11.835628 | 10.468197 | 11.409754 | 6.477678 | 10.760765 | 6.067322 | 8.859180 | 8.255519 | 7.224945 | 7.832978 | 12.808634 | 15.017486 |
1969 | 11.166356 | 9.723699 | 10.902000 | 5.767973 | 9.873918 | 6.189973 | 8.564493 | 7.711397 | 7.924521 | 7.754384 | 12.621233 | 15.762904 |
1970 | 12.600329 | 10.726932 | 11.730247 | 6.217178 | 10.567370 | 7.609452 | 9.609890 | 8.334630 | 9.297616 | 8.289808 | 13.183644 | 16.456027 |
1971 | 11.273123 | 9.095178 | 11.088329 | 5.241507 | 9.440329 | 6.097151 | 8.385890 | 6.757315 | 7.915370 | 7.229753 | 12.208932 | 15.025233 |
1972 | 12.463962 | 10.561311 | 12.058333 | 5.929699 | 9.430410 | 6.358825 | 9.704508 | 7.680792 | 8.357295 | 7.515273 | 12.727377 | 15.028716 |
1973 | 11.828466 | 10.680493 | 10.680493 | 5.547863 | 9.640877 | 6.548740 | 8.482110 | 7.614274 | 8.245534 | 7.812411 | 12.169699 | 15.441096 |
1974 | 13.643096 | 11.811781 | 12.336356 | 6.427041 | 11.110986 | 6.809781 | 10.084603 | 9.896986 | 9.331753 | 8.736356 | 13.252959 | 16.947671 |
1975 | 12.008575 | 10.293836 | 11.564712 | 5.269096 | 9.190082 | 5.668521 | 8.562603 | 7.843836 | 8.797945 | 7.382822 | 12.631671 | 15.307863 |
1976 | 11.737842 | 10.203115 | 10.761230 | 5.109426 | 8.846339 | 6.311038 | 9.149126 | 7.146202 | 8.883716 | 7.883087 | 12.332377 | 15.471448 |
1977 | 13.099616 | 11.144493 | 12.627836 | 6.073945 | 10.003836 | 8.586438 | 11.523205 | 8.378384 | 9.098192 | 8.821616 | 13.459068 | 16.590849 |
1978 | 12.504356 | 11.044274 | 11.380000 | 6.082356 | 10.167233 | 7.650658 | 9.489342 | 8.800466 | 9.089753 | 8.301699 | 12.967397 | 16.771370 |
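The grouping trick generalizes: `to_period` maps each timestamp to the span containing it, so grouping by the result is a downsample. A sketch on a hypothetical three-point series (using the `'Y'` alias, equivalent to the annual `'A'` alias above):

```python
import pandas as pd

idx = pd.to_datetime(["1961-01-01", "1961-12-31", "1962-06-15"])
s = pd.Series([10.0, 30.0, 50.0], index=idx)

# Both 1961 timestamps land in the same annual Period.
yearly = s.groupby(s.index.to_period("Y")).mean()
```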
Step 13. Downsample the record to a monthly frequency for each location.
Code:

```python
data.groupby(data.index.to_period('M')).mean().head()
```

Output:
Yr_Mo_Dy | RPT | VAL | ROS | KIL | SHA | BIR | DUB | CLA | MUL | CLO | BEL | MAL
---|---|---|---|---|---|---|---|---|---|---|---|---
1961-01 | 14.841333 | 11.988333 | 13.431613 | 7.736774 | 11.072759 | 8.588065 | 11.184839 | 9.245333 | 9.085806 | 10.107419 | 13.880968 | 14.703226 |
1961-02 | 16.269286 | 14.975357 | 14.441481 | 9.230741 | 13.852143 | 10.937500 | 11.890714 | 11.846071 | 11.821429 | 12.714286 | 18.583214 | 15.411786 |
1961-03 | 10.890000 | 11.296452 | 10.752903 | 7.284000 | 10.509355 | 8.866774 | 9.644194 | 9.829677 | 10.294138 | 11.251935 | 16.410968 | 15.720000 |
1961-04 | 10.722667 | 9.427667 | 9.998000 | 5.830667 | 8.435000 | 6.495000 | 6.925333 | 7.094667 | 7.342333 | 7.237000 | 11.147333 | 10.278333 |
1961-05 | 9.860968 | 8.850000 | 10.818065 | 5.905333 | 9.490323 | 6.574839 | 7.604000 | 8.177097 | 8.039355 | 8.499355 | 11.900323 | 12.011613 |
Step 14. Downsample the record to a weekly frequency for each location.
Code:

```python
data.groupby(data.index.to_period('W')).mean().head()
```

Output:
Yr_Mo_Dy | RPT | VAL | ROS | KIL | SHA | BIR | DUB | CLA | MUL | CLO | BEL | MAL
---|---|---|---|---|---|---|---|---|---|---|---|---
1960-12-26/1961-01-01 | 15.040000 | 14.960000 | 13.170000 | 9.290000 | NaN | 9.870000 | 13.670000 | 10.250000 | 10.830000 | 12.580000 | 18.500000 | 15.040000 |
1961-01-02/1961-01-08 | 13.541429 | 11.486667 | 10.487143 | 6.417143 | 9.474286 | 6.435714 | 11.061429 | 6.616667 | 8.434286 | 8.497143 | 12.481429 | 13.238571 |
1961-01-09/1961-01-15 | 12.468571 | 8.967143 | 11.958571 | 4.630000 | 7.351429 | 5.072857 | 7.535714 | 6.820000 | 5.712857 | 7.571429 | 11.125714 | 11.024286 |
1961-01-16/1961-01-22 | 13.204286 | 9.862857 | 12.982857 | 6.328571 | 8.966667 | 7.417143 | 9.257143 | 7.875714 | 7.145714 | 8.124286 | 9.821429 | 11.434286 |
1961-01-23/1961-01-29 | 19.880000 | 16.141429 | 18.225714 | 12.720000 | 17.432857 | 14.828571 | 15.528571 | 15.160000 | 14.480000 | 15.640000 | 20.930000 | 22.530000 |
Step 15. Calculate the min, max and mean windspeeds and standard deviations of the windspeeds across all locations for each week (assume that the first week starts on January 2 1961) for the first 52 weeks.
Code:

```python
# resample('W') re-buckets the daily records into weekly bins
weekly = data.resample('W').agg(['min', 'max', 'mean', 'std'])
# skip the partial first week (1961-01-01 on its own) and keep the next 52
weekly.loc[weekly.index[1:53], "RPT":"MAL"].head(10)
```

Output:
Yr_Mo_Dy | RPT min | RPT max | RPT mean | RPT std | VAL min | VAL max | VAL mean | VAL std | ROS min | ROS max | ... | CLO mean | CLO std | BEL min | BEL max | BEL mean | BEL std | MAL min | MAL max | MAL mean | MAL std
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1961-01-08 | 10.58 | 18.50 | 13.541429 | 2.631321 | 6.63 | 16.88 | 11.486667 | 3.949525 | 7.62 | 12.33 | ... | 8.497143 | 1.704941 | 5.46 | 17.54 | 12.481429 | 4.349139 | 10.88 | 16.46 | 13.238571 | 1.773062 |
1961-01-15 | 9.04 | 19.75 | 12.468571 | 3.555392 | 3.54 | 12.08 | 8.967143 | 3.148945 | 7.08 | 19.50 | ... | 7.571429 | 4.084293 | 5.25 | 20.71 | 11.125714 | 5.552215 | 5.17 | 16.92 | 11.024286 | 4.692355 |
1961-01-22 | 4.92 | 19.83 | 13.204286 | 5.337402 | 3.42 | 14.37 | 9.862857 | 3.837785 | 7.29 | 20.79 | ... | 8.124286 | 4.783952 | 6.50 | 15.92 | 9.821429 | 3.626584 | 6.79 | 17.96 | 11.434286 | 4.237239 |
1961-01-29 | 13.62 | 25.04 | 19.880000 | 4.619061 | 9.96 | 23.91 | 16.141429 | 5.170224 | 12.67 | 25.84 | ... | 15.640000 | 3.713368 | 14.04 | 27.71 | 20.930000 | 5.210726 | 17.50 | 27.63 | 22.530000 | 3.874721 |
1961-02-05 | 10.58 | 24.21 | 16.827143 | 5.251408 | 9.46 | 24.21 | 15.460000 | 5.187395 | 9.04 | 19.70 | ... | 9.460000 | 2.839501 | 9.17 | 19.33 | 14.012857 | 4.210858 | 7.17 | 19.25 | 11.935714 | 4.336104 |
1961-02-12 | 16.00 | 24.54 | 19.684286 | 3.587677 | 11.54 | 21.42 | 16.417143 | 3.608373 | 13.67 | 21.34 | ... | 14.440000 | 1.746749 | 15.21 | 26.38 | 21.832857 | 4.063753 | 17.04 | 21.84 | 19.155714 | 1.828705 |
1961-02-19 | 6.04 | 22.50 | 15.130000 | 5.064609 | 11.63 | 20.17 | 15.091429 | 3.575012 | 6.13 | 19.41 | ... | 13.542857 | 2.531361 | 14.09 | 29.63 | 21.167143 | 5.910938 | 10.96 | 22.58 | 16.584286 | 4.685377 |
1961-02-26 | 7.79 | 25.80 | 15.221429 | 7.020716 | 7.08 | 21.50 | 13.625714 | 5.147348 | 6.08 | 22.42 | ... | 12.730000 | 4.920064 | 9.59 | 23.21 | 16.304286 | 5.091162 | 6.67 | 23.87 | 14.322857 | 6.182283 |
1961-03-05 | 10.96 | 13.33 | 12.101429 | 0.997721 | 8.83 | 17.00 | 12.951429 | 2.851955 | 8.17 | 13.67 | ... | 12.370000 | 1.593685 | 11.58 | 23.45 | 17.842857 | 4.332331 | 8.83 | 17.54 | 13.951667 | 3.021387 |
1961-03-12 | 4.88 | 14.79 | 9.376667 | 3.732263 | 8.08 | 16.96 | 11.578571 | 3.230167 | 7.54 | 16.38 | ... | 10.458571 | 3.655113 | 10.21 | 22.71 | 16.701429 | 4.358759 | 5.54 | 22.54 | 14.420000 | 5.769890 |
10 rows × 48 columns
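A minimal sketch of `resample('W').agg(...)` on one hypothetical week of daily data (weekly bins end on Sunday by default, and January 8, 1961 was a Sunday):

```python
import pandas as pd

idx = pd.date_range("1961-01-02", periods=7, freq="D")  # Mon Jan 2 .. Sun Jan 8
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], index=idx)

# All seven days fall into the single weekly bin ending 1961-01-08.
weekly = s.resample("W").agg(["min", "max", "mean", "std"])
```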
Conclusion
That wraps up today's pandas exercises. Keep practicing!
That concludes this article on the introductory pandas exercises for Python data analysis (Part 8). I hope it helps, and thanks for your support!