Exercises

import pandas as pd
import numpy as np
data = pd.read_csv('table.csv',index_col='ID')
data.head()
     School Class Gender   Address  Height  Weight  Math Physics
ID
1101    S_1   C_1      M  street_1     173      63  34.0      A+
1102    S_1   C_1      F  street_2     192      73  32.5      B+
1103    S_1   C_1      M  street_2     186      82  87.2      B+
1104    S_1   C_1      F  street_2     167      81  80.4      B-
1105    S_1   C_1      F  street_4     159      64  84.8      B+
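
table.csv itself is not reproduced here. For readers who want to experiment without the file, the following is a minimal stand-in built only from the five rows printed by data.head() above. The names `csv_text` and `sample` are hypothetical, and the remaining rows of the real table are not reconstructed, so the counts in the solutions below will not match on this sample.

```python
import io
import pandas as pd

# Stand-in for table.csv: only the five rows shown by data.head() above.
csv_text = """ID,School,Class,Gender,Address,Height,Weight,Math,Physics
1101,S_1,C_1,M,street_1,173,63,34.0,A+
1102,S_1,C_1,F,street_2,192,73,32.5,B+
1103,S_1,C_1,M,street_2,186,82,87.2,B+
1104,S_1,C_1,F,street_2,167,81,80.4,B-
1105,S_1,C_1,F,street_4,159,64,84.8,B+
"""

# Same call as above, reading from an in-memory buffer instead of a file.
sample = pd.read_csv(io.StringIO(csv_text), index_col='ID')
print(sample.dtypes)
```

Printing the dtypes shows which columns are numeric and which still hold plain strings before any conversion to Categorical.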

Questions

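  1. Convert the physics grade ('Physics') from object dtype to an ordered Categorical, defining the order of its categories. List how many students received each grade (in descending order), and compute the mean math score of the students at each physics grade.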

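# Categories are listed from lowest to highest grade; ordered=True makes that order meaningful
data['Physics'] = pd.Categorical(data['Physics'],categories=['C','B-','B','B+','A-','A','A+'], ordered=True)
# Count students per grade, then sort by category order from highest to lowest
data.value_counts('Physics').sort_index(ascending=False)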
Physics
A+    3
A     4
A-    3
B+    9
B     8
B-    6
C     2
dtype: int64
# Mean math score for each physics grade, listed from lowest to highest grade
data.groupby('Physics')['Math'].mean().sort_index()
Physics
C     74.600000
B-    62.283333
B     52.612500
B+    62.144444
A-    93.400000
A     49.350000
A+    55.533333
Name: Math, dtype: float64
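
To illustrate what declaring an order buys in question 1, here is a small sketch on made-up grades (the names `grade_order`, `toy`, and the values are hypothetical, not taken from table.csv). It shows that sorting and comparisons follow the declared category order rather than alphabetical order, and that a label outside the declared categories becomes missing.

```python
import pandas as pd

grade_order = ['C', 'B-', 'B', 'B+', 'A-', 'A', 'A+']  # lowest to highest
toy = pd.Series(['B+', 'A+', 'C', 'A-', 'B+'])         # made-up grades
toy = pd.Series(pd.Categorical(toy, categories=grade_order, ordered=True))

# Sorting follows the declared category order, not string order.
sorted_desc = toy.sort_values(ascending=False).tolist()
print(sorted_desc)

# Ordered categoricals can be compared against a category label.
at_least_a_minus = toy[toy >= 'A-'].tolist()
print(at_least_a_minus)

# A label that is not in the declared categories ('D' here) becomes NaN.
with_unknown = pd.Categorical(['A', 'D'], categories=grade_order, ordered=True)
n_missing = int(pd.isna(with_unknown).sum())
print(n_missing)
```

Because unlisted labels silently turn into NaN, it is worth checking that the category list covers every grade present in the column before converting.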
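  2. Convert the math score ('Math') into categorical data with the grades 'A+', 'A', 'A-', 'B+', 'B', 'B-', 'C', using the score bands 0-40, 40-50, 50-60, 60-70, 70-80, 80-90, 90-100 (left-open, right-closed intervals). The solution below assigns 'C' to the lowest band and 'A+' to the highest. List how many students fall into each grade (in ascending order). Find the students who got an A in both math and physics.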

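bins = [0,40,50,60,70,80,90,100]
labels = ['C','B-','B','B+','A-','A','A+']
# pd.cut uses left-open, right-closed intervals by default (right=True)
# and returns an ordered Categorical, so no extra conversion is needed
data['Math'] = pd.cut(data['Math'],bins=bins,labels=labels)
# Count students per grade, sorted from lowest to highest grade
data.value_counts('Math').sort_index(ascending=True)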
Math
C     7
B-    6
B     4
B+    6
A-    3
A     7
A+    2
dtype: int64
data[(data.Math=='A') & (data.Physics=='A')]
     School Class Gender   Address  Height  Weight Math Physics
ID
1304    S_1   C_3      M  street_2     195      70    A       A
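
The left-open, right-closed intervals in question 2 matter at the band edges. The following sketch uses made-up scores (the names `scores`, `graded`, and `graded_incl` are hypothetical) to show how `pd.cut` treats boundary values with the same bins and labels. In particular, a score of exactly 0 falls outside (0, 40] and becomes NaN unless `include_lowest=True` is passed.

```python
import pandas as pd

bins = [0, 40, 50, 60, 70, 80, 90, 100]
labels = ['C', 'B-', 'B', 'B+', 'A-', 'A', 'A+']
scores = pd.Series([0, 40, 40.5, 90, 100])  # made-up boundary cases

# Default right=True: each band is (left, right], so 40 is 'C' and 90 is 'A'.
graded = pd.cut(scores, bins=bins, labels=labels)
print(graded.tolist())

# include_lowest=True closes the first band on the left, so 0 is 'C' too.
graded_incl = pd.cut(scores, bins=bins, labels=labels, include_lowest=True)
print(graded_incl.tolist())

# The result is already an ordered categorical.
print(graded.cat.ordered)
```

Whether a score of 0 should count as 'C' is a choice the exercise does not spell out; the sketch only shows what each option does.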