Exercises
import pandas as pd
import numpy as np
# Load the table, using the ID column as the row index
data = pd.read_csv('table.csv', index_col='ID')
data.head()
ID | School | Class | Gender | Address | Height | Weight | Math | Physics
---|---|---|---|---|---|---|---|---
1101 | S_1 | C_1 | M | street_1 | 173 | 63 | 34.0 | A+
1102 | S_1 | C_1 | F | street_2 | 192 | 73 | 32.5 | B+
1103 | S_1 | C_1 | M | street_2 | 186 | 82 | 87.2 | B+
1104 | S_1 | C_1 | F | street_2 | 167 | 81 | 80.4 | B-
1105 | S_1 | C_1 | F | street_4 | 159 | 64 | 84.8 | B+
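Convert the physics grade ('Physics') from object dtype to the categorical type Categorical and define its order. List how many students received each grade (in descending order), and the mean math score of the students with each physics grade.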
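# Declare the grades from lowest to highest so sorting and comparisons follow this order
data['Physics'] = pd.Categorical(data['Physics'], categories=['C', 'B-', 'B', 'B+', 'A-', 'A', 'A+'], ordered=True)
# Count students per grade; sort_index on a categorical index uses the category order
data.value_counts('Physics').sort_index(ascending=False)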
Physics
A+ 3
A 4
A- 3
B+ 9
B 8
B- 6
C 2
dtype: int64
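To see why the explicit order matters, here is a small sketch on a few hypothetical grades (not taken from table.csv). As plain strings the grades sort lexicographically, which puts 'A' before 'A+' and 'A-'. An ordered Categorical sorts by the declared category order instead. The sketch also shows that value_counts on a categorical reports every category, including grades that no student received.

```python
import pandas as pd

# Grade order from the exercise, lowest to highest
grade_order = ['C', 'B-', 'B', 'B+', 'A-', 'A', 'A+']

# Hypothetical sample of grades, not taken from table.csv
grades = pd.Series(['B+', 'A', 'C', 'A+', 'B-', 'A-'])

# As plain strings, sorting is lexicographic: 'A' < 'A+' < 'A-' < 'B+' ...
lexical = grades.sort_values().tolist()

# As an ordered Categorical, sorting follows grade_order
ordered = pd.Series(pd.Categorical(grades, categories=grade_order, ordered=True))
by_grade = ordered.sort_values(ascending=False).tolist()

# value_counts keeps unobserved categories (here 'B') with a count of 0
counts = ordered.value_counts().sort_index(ascending=False)

print(lexical)
print(by_grade)
print(counts)
```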
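# Mean math score per physics grade; observed=False keeps every grade category
# (set explicitly because newer pandas versions warn when it is left implicit)
data.groupby('Physics', observed=False)['Math'].mean().sort_index()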
Physics
C 74.600000
B- 62.283333
B 52.612500
B+ 62.144444
A- 93.400000
A 49.350000
A+ 55.533333
Name: Math, dtype: float64
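Here is a minimal sketch of how groupby treats a categorical key, using hypothetical toy scores rather than table.csv. With observed=False every category appears in the result, so grades nobody received give NaN, and the index follows the declared order. With observed=True, only the grades that actually occur are kept.

```python
import pandas as pd

grade_order = ['C', 'B-', 'B', 'B+', 'A-', 'A', 'A+']

# Hypothetical toy data, not taken from table.csv
df = pd.DataFrame({
    'Physics': pd.Categorical(['A', 'C', 'A', 'B+'], categories=grade_order, ordered=True),
    'Math': [80.0, 40.0, 60.0, 75.0],
})

# Every category is kept; empty groups produce NaN
mean_math = df.groupby('Physics', observed=False)['Math'].mean()

# Only categories present in the data are kept, still in category order
observed_only = df.groupby('Physics', observed=True)['Math'].mean()

print(mean_math)
print(observed_only)
```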
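Convert the math score ('Math') into the categorical grades 'A+', 'A', 'A-', 'B+', 'B', 'B-', 'C' using the score bands 0-40, 40-50, 50-60, 60-70, 70-80, 80-90 and 90-100 (left-open, right-closed intervals), so that the highest band becomes 'A+' and the lowest becomes 'C'. List how many students fall into each grade (in ascending order), and find the students who got an A in both math and physics.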
bins = [0, 40, 50, 60, 70, 80, 90, 100]
grades = ['C', 'B-', 'B', 'B+', 'A-', 'A', 'A+']
# pd.cut uses left-open, right-closed bins by default (right=True)
cuts = pd.cut(data['Math'], bins=bins, labels=grades)
data['Math'] = pd.Categorical(cuts, categories=grades, ordered=True)
data.value_counts('Math').sort_index(ascending=True)
Math
C 7
B- 6
B 4
B+ 6
A- 3
A 7
A+ 2
dtype: int64
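The interval rule can be checked on a few hypothetical boundary scores. This sketch assumes the same bins and labels as above. With pd.cut's default right=True, a score of exactly 40 lands in (0, 40] and becomes 'C'. A score of 0 falls outside every bin and becomes NaN unless include_lowest=True is passed. When labels are given, pd.cut already returns an ordered categorical, so the pd.Categorical wrap above mainly makes the order explicit.

```python
import pandas as pd

bins = [0, 40, 50, 60, 70, 80, 90, 100]
labels = ['C', 'B-', 'B', 'B+', 'A-', 'A', 'A+']

# Hypothetical boundary scores used to probe the (left, right] intervals
scores = pd.Series([0, 40, 40.5, 90, 100])

# right=True (the default): each bin is left-open, right-closed, so 0 is not binned
graded = pd.cut(scores, bins=bins, labels=labels, right=True)

# include_lowest=True also closes the first bin on the left, so 0 maps to 'C'
graded_incl = pd.cut(scores, bins=bins, labels=labels, include_lowest=True)

print(graded.tolist())
print(graded_incl.tolist())
print(graded.cat.ordered)
```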
# Parentheses are needed because & binds more tightly than ==
data[(data.Math == 'A') & (data.Physics == 'A')]
ID | School | Class | Gender | Address | Height | Weight | Math | Physics
---|---|---|---|---|---|---|---|---
1304 | S_1 | C_3 | M | street_2 | 195 | 70 | A | A
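Both columns are now ordered categoricals, so comparisons other than equality also work. They are evaluated by category position. Here is a sketch with hypothetical students (not from table.csv) that contrasts the exact-match filter above with an "A or better in both subjects" filter.

```python
import pandas as pd

grade_order = ['C', 'B-', 'B', 'B+', 'A-', 'A', 'A+']
dtype = pd.CategoricalDtype(categories=grade_order, ordered=True)

# Hypothetical students, not taken from table.csv
df = pd.DataFrame({
    'Math': ['A', 'A+', 'B', 'A'],
    'Physics': ['A', 'A-', 'A+', 'A+'],
}, index=[1, 2, 3, 4]).astype(dtype)

# Exact match, as in the exercise
both_a = df[(df['Math'] == 'A') & (df['Physics'] == 'A')]

# Ordered categories support >= against a category label
at_least_a = df[(df['Math'] >= 'A') & (df['Physics'] >= 'A')]

print(both_a)
print(at_least_a)
```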