Numpy Pandas

Anaconda简介

本教程来源于学堂在线

MacOS下部分命令

  • 创建名为ml的python3.6环境 conda create -n ml python=3.6

  • 创建名为ml的环境 conda env remove -n ml

  • 进入名为ml的环境 source activate ml

  • 退出环境 conda deactivate

  • 安装工具包 conda install numpy pandas scikit-learn

  • 查看所有环境名字 conda env list

  • 查看当前环境下所有已安装的工具包 conda list

Pandas & Numpy

为帮助python基础较弱的同学完成案例作业,下面为大家演示讲解numpy和pandas里的一些常用函数,了解基本操作方便之后使用。

载入工具包

import pandas as pd
import numpy as np

Pandas

初始化一个pandas的DataFrame

比较常用的一种初始化方式是从python的字典初始化,列名是字典的key,每列的元素是字典的value,要求每列长度相同

df = pd.DataFrame({'age': [1,2,3], 'name': ['a', 'b', 'c']})
print(df)
print(type(df))
   age name
0    1    a
1    2    b
2    3    c
<class 'pandas.core.frame.DataFrame'>

pandas也可以读入一个csv文件成DataFrame

df = pd.read_csv('./data/high_diamond_ranked_10min.csv', sep=',')
print(type(df))
<class 'pandas.core.frame.DataFrame'>

查看DataFrame的信息

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9879 entries, 0 to 9878
Data columns (total 40 columns):
gameId                          9879 non-null int64
blueWins                        9879 non-null int64
blueWardsPlaced                 9879 non-null int64
blueWardsDestroyed              9879 non-null int64
blueFirstBlood                  9879 non-null int64
blueKills                       9879 non-null int64
blueDeaths                      9879 non-null int64
blueAssists                     9879 non-null int64
blueEliteMonsters               9879 non-null int64
blueDragons                     9879 non-null int64
blueHeralds                     9879 non-null int64
blueTowersDestroyed             9879 non-null int64
blueTotalGold                   9879 non-null int64
blueAvgLevel                    9879 non-null float64
blueTotalExperience             9879 non-null int64
blueTotalMinionsKilled          9879 non-null int64
blueTotalJungleMinionsKilled    9879 non-null int64
blueGoldDiff                    9879 non-null int64
blueExperienceDiff              9879 non-null int64
blueCSPerMin                    9879 non-null float64
blueGoldPerMin                  9879 non-null float64
redWardsPlaced                  9879 non-null int64
redWardsDestroyed               9879 non-null int64
redFirstBlood                   9879 non-null int64
redKills                        9879 non-null int64
redDeaths                       9879 non-null int64
redAssists                      9879 non-null int64
redEliteMonsters                9879 non-null int64
redDragons                      9879 non-null int64
redHeralds                      9879 non-null int64
redTowersDestroyed              9879 non-null int64
redTotalGold                    9879 non-null int64
redAvgLevel                     9879 non-null float64
redTotalExperience              9879 non-null int64
redTotalMinionsKilled           9879 non-null int64
redTotalJungleMinionsKilled     9879 non-null int64
redGoldDiff                     9879 non-null int64
redExperienceDiff               9879 non-null int64
redCSPerMin                     9879 non-null float64
redGoldPerMin                   9879 non-null float64
dtypes: float64(6), int64(34)
memory usage: 3.0 MB

查看头/尾10条数据

# df.head(10)
# df.tail(10)
print(df)
print(df.drop(columns=['gameId']))
print(df)
          gameId  blueWins  blueWardsPlaced  blueWardsDestroyed  \
0     4519157822         0               28                   2   
1     4523371949         0               12                   1   
2     4521474530         0               15                   0   
3     4524384067         0               43                   1   
4     4436033771         0               75                   4   
...          ...       ...              ...                 ...   
9874  4527873286         1               17                   2   
9875  4527797466         1               54                   0   
9876  4527713716         0               23                   1   
9877  4527628313         0               14                   4   
9878  4523772935         1               18                   0   

      blueFirstBlood  blueKills  blueDeaths  blueAssists  blueEliteMonsters  \
0                  1          9           6           11                  0   
1                  0          5           5            5                  0   
2                  0          7          11            4                  1   
3                  0          4           5            5                  1   
4                  0          6           6            6                  0   
...              ...        ...         ...          ...                ...   
9874               1          7           4            5                  1   
9875               0          6           4            8                  1   
9876               0          6           7            5                  0   
9877               1          2           3            3                  1   
9878               1          6           6            5                  0   

      blueDragons  ...  redTowersDestroyed  redTotalGold  redAvgLevel  \
0               0  ...                   0         16567          6.8   
1               0  ...                   1         17620          6.8   
2               1  ...                   0         17285          6.8   
3               0  ...                   0         16478          7.0   
4               0  ...                   0         17404          7.0   
...           ...  ...                 ...           ...          ...   
9874            1  ...                   0         15246          6.8   
9875            1  ...                   0         15456          7.0   
9876            0  ...                   0         18319          7.4   
9877            1  ...                   0         15298          7.2   
9878            0  ...                   0         15339          6.8   

      redTotalExperience  redTotalMinionsKilled  redTotalJungleMinionsKilled  \
0                  17047                    197                           55   
1                  17438                    240                           52   
2                  17254                    203                           28   
3                  17961                    235                           47   
4                  18313                    225                           67   
...                  ...                    ...                          ...   
9874               16498                    229                           34   
9875               18367                    206                           56   
9876               19909                    261                           60   
9877               18314                    247                           40   
9878               17379                    201                           46   

      redGoldDiff  redExperienceDiff  redCSPerMin  redGoldPerMin  
0            -643                  8         19.7         1656.7  
1            2908               1173         24.0         1762.0  
2            1172               1033         20.3         1728.5  
3            1321                  7         23.5         1647.8  
4            1004               -230         22.5         1740.4  
...           ...                ...          ...            ...  
9874        -2519              -2469         22.9         1524.6  
9875         -782               -888         20.6         1545.6  
9876         2416               1877         26.1         1831.9  
9877          839               1085         24.7         1529.8  
9878         -927                 58         20.1         1533.9  

[9879 rows x 40 columns]
      blueWins  blueWardsPlaced  blueWardsDestroyed  blueFirstBlood  \
0            0               28                   2               1   
1            0               12                   1               0   
2            0               15                   0               0   
3            0               43                   1               0   
4            0               75                   4               0   
...        ...              ...                 ...             ...   
9874         1               17                   2               1   
9875         1               54                   0               0   
9876         0               23                   1               0   
9877         0               14                   4               1   
9878         1               18                   0               1   

      blueKills  blueDeaths  blueAssists  blueEliteMonsters  blueDragons  \
0             9           6           11                  0            0   
1             5           5            5                  0            0   
2             7          11            4                  1            1   
3             4           5            5                  1            0   
4             6           6            6                  0            0   
...         ...         ...          ...                ...          ...   
9874          7           4            5                  1            1   
9875          6           4            8                  1            1   
9876          6           7            5                  0            0   
9877          2           3            3                  1            1   
9878          6           6            5                  0            0   

      blueHeralds  ...  redTowersDestroyed  redTotalGold  redAvgLevel  \
0               0  ...                   0         16567          6.8   
1               0  ...                   1         17620          6.8   
2               0  ...                   0         17285          6.8   
3               1  ...                   0         16478          7.0   
4               0  ...                   0         17404          7.0   
...           ...  ...                 ...           ...          ...   
9874            0  ...                   0         15246          6.8   
9875            0  ...                   0         15456          7.0   
9876            0  ...                   0         18319          7.4   
9877            0  ...                   0         15298          7.2   
9878            0  ...                   0         15339          6.8   

      redTotalExperience  redTotalMinionsKilled  redTotalJungleMinionsKilled  \
0                  17047                    197                           55   
1                  17438                    240                           52   
2                  17254                    203                           28   
3                  17961                    235                           47   
4                  18313                    225                           67   
...                  ...                    ...                          ...   
9874               16498                    229                           34   
9875               18367                    206                           56   
9876               19909                    261                           60   
9877               18314                    247                           40   
9878               17379                    201                           46   

      redGoldDiff  redExperienceDiff  redCSPerMin  redGoldPerMin  
0            -643                  8         19.7         1656.7  
1            2908               1173         24.0         1762.0  
2            1172               1033         20.3         1728.5  
3            1321                  7         23.5         1647.8  
4            1004               -230         22.5         1740.4  
...           ...                ...          ...            ...  
9874        -2519              -2469         22.9         1524.6  
9875         -782               -888         20.6         1545.6  
9876         2416               1877         26.1         1831.9  
9877          839               1085         24.7         1529.8  
9878         -927                 58         20.1         1533.9  

[9879 rows x 39 columns]
          gameId  blueWins  blueWardsPlaced  blueWardsDestroyed  \
0     4519157822         0               28                   2   
1     4523371949         0               12                   1   
2     4521474530         0               15                   0   
3     4524384067         0               43                   1   
4     4436033771         0               75                   4   
...          ...       ...              ...                 ...   
9874  4527873286         1               17                   2   
9875  4527797466         1               54                   0   
9876  4527713716         0               23                   1   
9877  4527628313         0               14                   4   
9878  4523772935         1               18                   0   

      blueFirstBlood  blueKills  blueDeaths  blueAssists  blueEliteMonsters  \
0                  1          9           6           11                  0   
1                  0          5           5            5                  0   
2                  0          7          11            4                  1   
3                  0          4           5            5                  1   
4                  0          6           6            6                  0   
...              ...        ...         ...          ...                ...   
9874               1          7           4            5                  1   
9875               0          6           4            8                  1   
9876               0          6           7            5                  0   
9877               1          2           3            3                  1   
9878               1          6           6            5                  0   

      blueDragons  ...  redTowersDestroyed  redTotalGold  redAvgLevel  \
0               0  ...                   0         16567          6.8   
1               0  ...                   1         17620          6.8   
2               1  ...                   0         17285          6.8   
3               0  ...                   0         16478          7.0   
4               0  ...                   0         17404          7.0   
...           ...  ...                 ...           ...          ...   
9874            1  ...                   0         15246          6.8   
9875            1  ...                   0         15456          7.0   
9876            0  ...                   0         18319          7.4   
9877            1  ...                   0         15298          7.2   
9878            0  ...                   0         15339          6.8   

      redTotalExperience  redTotalMinionsKilled  redTotalJungleMinionsKilled  \
0                  17047                    197                           55   
1                  17438                    240                           52   
2                  17254                    203                           28   
3                  17961                    235                           47   
4                  18313                    225                           67   
...                  ...                    ...                          ...   
9874               16498                    229                           34   
9875               18367                    206                           56   
9876               19909                    261                           60   
9877               18314                    247                           40   
9878               17379                    201                           46   

      redGoldDiff  redExperienceDiff  redCSPerMin  redGoldPerMin  
0            -643                  8         19.7         1656.7  
1            2908               1173         24.0         1762.0  
2            1172               1033         20.3         1728.5  
3            1321                  7         23.5         1647.8  
4            1004               -230         22.5         1740.4  
...           ...                ...          ...            ...  
9874        -2519              -2469         22.9         1524.6  
9875         -782               -888         20.6         1545.6  
9876         2416               1877         26.1         1831.9  
9877          839               1085         24.7         1529.8  
9878         -927                 58         20.1         1533.9  

[9879 rows x 40 columns]

DataFrame每行的index

这里的index就是普通的数字0-9879,有时index也可以是其他一些特殊对象,如日期时间等。

print(df.index)
RangeIndex(start=0, stop=9879, step=1)

列名

csv文件一般列名读入是字符串str,有些情况下没有列名或简单以数字作为列名。

print(df.columns)
cols = df.columns[:2]
print(type(cols))
cols = list(cols) + ['blueAvgLevel', 'redAvgLevel']
print(cols)
Index(['gameId', 'blueWins', 'blueWardsPlaced', 'blueWardsDestroyed',
       'blueFirstBlood', 'blueKills', 'blueDeaths', 'blueAssists',
       'blueEliteMonsters', 'blueDragons', 'blueHeralds',
       'blueTowersDestroyed', 'blueTotalGold', 'blueAvgLevel',
       'blueTotalExperience', 'blueTotalMinionsKilled',
       'blueTotalJungleMinionsKilled', 'blueGoldDiff', 'blueExperienceDiff',
       'blueCSPerMin', 'blueGoldPerMin', 'redWardsPlaced', 'redWardsDestroyed',
       'redFirstBlood', 'redKills', 'redDeaths', 'redAssists',
       'redEliteMonsters', 'redDragons', 'redHeralds', 'redTowersDestroyed',
       'redTotalGold', 'redAvgLevel', 'redTotalExperience',
       'redTotalMinionsKilled', 'redTotalJungleMinionsKilled', 'redGoldDiff',
       'redExperienceDiff', 'redCSPerMin', 'redGoldPerMin'],
      dtype='object')
<class 'pandas.core.indexes.base.Index'>
['gameId', 'blueWins', 'blueAvgLevel', 'redAvgLevel']

访问列

可以传入列名字符串(返回Series)或列名的list(返回DataFame)

df['redAvgLevel']
# df[cols]
type(df[['redAvgLevel']])
pandas.core.frame.DataFrame

访问行

df[0:3]
gameId blueWins blueWardsPlaced blueWardsDestroyed blueFirstBlood blueKills blueDeaths blueAssists blueEliteMonsters blueDragons ... redTowersDestroyed redTotalGold redAvgLevel redTotalExperience redTotalMinionsKilled redTotalJungleMinionsKilled redGoldDiff redExperienceDiff redCSPerMin redGoldPerMin
0 4519157822 0 28 2 1 9 6 11 0 0 ... 0 16567 6.8 17047 197 55 -643 8 19.7 1656.7
1 4523371949 0 12 1 0 5 5 5 0 0 ... 1 17620 6.8 17438 240 52 2908 1173 24.0 1762.0
2 4521474530 0 15 0 0 7 11 4 1 1 ... 0 17285 6.8 17254 203 28 1172 1033 20.3 1728.5

3 rows × 40 columns

访问某个区域

例如前3行,cols对应的4列

df.loc[range(0,3), cols]
gameId blueWins blueAvgLevel redAvgLevel
0 4519157822 0 6.6 6.8
1 4523371949 0 6.6 6.8
2 4521474530 0 6.4 6.8

不建议在loc中使用形如0:3这样的index,因为行为略为反常。

df.loc[0:3]
gameId blueWins blueWardsPlaced blueWardsDestroyed blueFirstBlood blueKills blueDeaths blueAssists blueEliteMonsters blueDragons ... redTowersDestroyed redTotalGold redAvgLevel redTotalExperience redTotalMinionsKilled redTotalJungleMinionsKilled redGoldDiff redExperienceDiff redCSPerMin redGoldPerMin
0 4519157822 0 28 2 1 9 6 11 0 0 ... 0 16567 6.8 17047 197 55 -643 8 19.7 1656.7
1 4523371949 0 12 1 0 5 5 5 0 0 ... 1 17620 6.8 17438 240 52 2908 1173 24.0 1762.0
2 4521474530 0 15 0 0 7 11 4 1 1 ... 0 17285 6.8 17254 203 28 1172 1033 20.3 1728.5
3 4524384067 0 43 1 0 4 5 5 1 0 ... 0 16478 7.0 17961 235 47 1321 7 23.5 1647.8

4 rows × 40 columns

访问某个元素

loc是传入index和列名, 但是iloc传入的是编号,无论index和column是否为数字,都传入0-xxx的数字下标

df.loc[0, 'gameId']
df.iloc[0, 0]
df.at[0, 'gameId']
4519157822

拷贝

=是引用一个DataFrame对象,修改df_copy则df也会发生变化。 如果不想原df被修改,可以使用copy深拷贝一个DataFrame对象。

df_copy = df
df_copy = df.copy()

这样修改df_copy时df不会发生变化

df_copy.loc[0, 'gameId'] = 1234
df.loc[0, 'gameId']
print(df_copy.loc[0, 'gameId'])
1234

过滤行

实际df['blueWins'] > 0返回的是一个true/false的列

df[df['blueWins'] > 0]
# df['blueWins'] > 0
gameId blueWins blueWardsPlaced blueWardsDestroyed blueFirstBlood blueKills blueDeaths blueAssists blueEliteMonsters blueDragons ... redTowersDestroyed redTotalGold redAvgLevel redTotalExperience redTotalMinionsKilled redTotalJungleMinionsKilled redGoldDiff redExperienceDiff redCSPerMin redGoldPerMin
5 4475365709 1 18 0 0 5 3 6 1 1 ... 0 15201 7.0 18060 221 59 -698 -101 22.1 1520.1
6 4493010632 1 18 3 1 7 6 7 1 1 ... 0 14463 6.4 15404 164 35 -2411 -1563 16.4 1446.3
9 4509433346 1 13 1 1 4 5 5 1 1 ... 0 16605 6.8 18379 247 43 1548 1574 24.7 1660.5
12 4515594785 1 18 1 1 7 1 11 1 1 ... 0 14591 6.8 17443 240 50 -3274 -1659 24.0 1459.1
14 4516505202 1 15 3 1 4 4 4 0 0 ... 0 16192 7.0 18083 242 48 470 187 24.2 1619.2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9872 4527650398 1 12 0 1 7 7 9 0 0 ... 0 16399 7.0 18001 216 58 -756 -1 21.6 1639.9
9873 4527878058 1 18 2 1 12 6 13 0 0 ... 0 15934 6.6 17027 197 38 -2639 -2364 19.7 1593.4
9874 4527873286 1 17 2 1 7 4 5 1 1 ... 0 15246 6.8 16498 229 34 -2519 -2469 22.9 1524.6
9875 4527797466 1 54 0 0 6 4 8 1 1 ... 0 15456 7.0 18367 206 56 -782 -888 20.6 1545.6
9878 4523772935 1 18 0 1 6 6 5 0 0 ... 0 15339 6.8 17379 201 46 -927 58 20.1 1533.9

4930 rows × 40 columns

也可以有更复杂的条件

df[(df['blueWardsPlaced'] > 10) & (df['blueWardsPlaced'].isin([9, 15, 17]))]
gameId blueWins blueWardsPlaced blueWardsDestroyed blueFirstBlood blueKills blueDeaths blueAssists blueEliteMonsters blueDragons ... redTowersDestroyed redTotalGold redAvgLevel redTotalExperience redTotalMinionsKilled redTotalJungleMinionsKilled redGoldDiff redExperienceDiff redCSPerMin redGoldPerMin
2 4521474530 0 15 0 0 7 11 4 1 1 ... 0 17285 6.8 17254 203 28 1172 1033 20.3 1728.5
14 4516505202 1 15 3 1 4 4 4 0 0 ... 0 16192 7.0 18083 242 48 470 187 24.2 1619.2
15 4482120064 0 17 1 0 3 7 3 0 0 ... 0 17011 7.2 18778 237 51 1996 1804 23.7 1701.1
22 4480384157 0 17 2 0 4 6 3 0 0 ... 0 17027 7.0 18129 231 60 1254 567 23.1 1702.7
25 4523978853 0 17 1 0 4 8 4 0 0 ... 0 17887 7.0 17114 221 36 2472 1067 22.1 1788.7
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9845 4527853173 0 17 0 0 2 8 1 0 0 ... 0 18122 7.2 19051 243 50 4676 3551 24.3 1812.2
9851 4527637091 0 15 2 0 4 8 2 0 0 ... 0 17300 7.2 18342 236 55 2591 1250 23.6 1730.0
9853 4527865649 0 17 2 0 7 8 9 1 0 ... 0 17206 7.0 18580 212 56 -188 676 21.2 1720.6
9855 4527433973 0 15 3 0 2 7 3 0 0 ... 0 16148 6.6 17538 198 42 880 -179 19.8 1614.8
9874 4527873286 1 17 2 1 7 4 5 1 1 ... 0 15246 6.8 16498 229 34 -2519 -2469 22.9 1524.6

2205 rows × 40 columns

内置函数

DataFrame内置了一些非常方便的函数,比如算均值mean,最小值min,最大值max等等。 API文档

df[cols].mean()
gameId          4.500084e+09
blueWins        4.990384e-01
blueAvgLevel    6.916004e+00
redAvgLevel     6.925316e+00
dtype: float64

某列Series也是一样有很多内置函数 API文档

print(df['redAvgLevel'].mean())
print(df['redAvgLevel'].min())
print(df['redAvgLevel'].max())
6.925316327563518
4.8
8.2

比如可以快速统计出某一列各个数据出现的次数。

df['blueWins'].value_counts()
0    4949
1    4930
Name: blueWins, dtype: int64

apply函数

apply是个非常强大的函数,可以传入一个函数,这一列中的每个元素都会被这个函数作用,最后返回一个新的列。

df['blueWins'].apply(lambda x: 'win' if x==1 else 'lose')
0       lose
1       lose
2       lose
3       lose
4       lose
        ... 
9874     win
9875     win
9876    lose
9877    lose
9878     win
Name: blueWins, Length: 9879, dtype: object

也可以不用lambda表达式,定义更复杂的函数

def win(x):
    return 'win' if x == 1 else 'lose'
df_copy['blueWins'] = df['blueWins'].apply(win)
df_copy['blueWins']
0       lose
1       lose
2       lose
3       lose
4       lose
        ... 
9874     win
9875     win
9876    lose
9877    lose
9878     win
Name: blueWins, Length: 9879, dtype: object

列运算

df['brGoldDiff'] = df['blueTotalGold'] - df['redTotalGold']
print(df['brGoldDiff'])
print(df['brGoldDiff'] == df['blueGoldDiff'])
0        643
1      -2908
2      -1172
3      -1321
4      -1004
        ... 
9874    2519
9875     782
9876   -2416
9877    -839
9878     927
Name: brGoldDiff, Length: 9879, dtype: int64
0       True
1       True
2       True
3       True
4       True
        ... 
9874    True
9875    True
9876    True
9877    True
9878    True
Length: 9879, dtype: bool
type(df['brGoldDiff'] == df['blueGoldDiff'])
pandas.core.series.Series

小例子

案例1中可能会要求大家对列数据进行离散话,这里举个例子,把某列数据离散化成最小值到最大值的k个区间。

(实际上还有pandas还有cut和qcut函数可以帮助离散化,感兴趣的同学可以进一步了解使用)。

print(df['blueTotalGold'])
def min_max(x, max_v, min_v, k):
    return (x - min_v) // ((max_v - min_v + 1) // k)
min_v = df['blueTotalGold'].min()
max_v = df['blueTotalGold'].max()
df_copy['blueTotalGold'] = df['blueTotalGold'].apply(lambda x: min_max(x, max_v=max_v, min_v=min_v, k=10))
print(df_copy['blueTotalGold'])
df_copy['blueTotalGold'].value_counts()
0       17210
1       14712
2       16113
3       15157
4       16400
        ...  
9874    17765
9875    16238
9876    15903
9877    14459
9878    16266
Name: blueTotalGold, Length: 9879, dtype: int64
0       4
1       3
2       4
3       3
4       4
       ..
9874    5
9875    4
9876    3
9877    2
9878    4
Name: blueTotalGold, Length: 9879, dtype: int64





4     3236
3     2753
5     1970
2      879
6      721
7      198
1       67
8       40
9       12
0        2
10       1
Name: blueTotalGold, dtype: int64

其他

其他更多应用和函数可查询API文档(中文) API文档(英文)

Numpy

pandas的DataFrame和列Series都可以直接取出数据为numpy的矩阵

print(type(df.values))
print(df.values)
print(type(df['blueTotalGold'].values))
<class 'numpy.ndarray'>
[[ 4.51915782e+09  0.00000000e+00  2.80000000e+01 ...  1.97000000e+01
   1.65670000e+03  6.43000000e+02]
 [ 4.52337195e+09  0.00000000e+00  1.20000000e+01 ...  2.40000000e+01
   1.76200000e+03 -2.90800000e+03]
 [ 4.52147453e+09  0.00000000e+00  1.50000000e+01 ...  2.03000000e+01
   1.72850000e+03 -1.17200000e+03]
 ...
 [ 4.52771372e+09  0.00000000e+00  2.30000000e+01 ...  2.61000000e+01
   1.83190000e+03 -2.41600000e+03]
 [ 4.52762831e+09  0.00000000e+00  1.40000000e+01 ...  2.47000000e+01
   1.52980000e+03 -8.39000000e+02]
 [ 4.52377294e+09  1.00000000e+00  1.80000000e+01 ...  2.01000000e+01
   1.53390000e+03  9.27000000e+02]]
<class 'numpy.ndarray'>

数据是没有列名的,类似一个n维数组

blueTotalGold = df['blueTotalGold'].values
print(blueTotalGold)
print(blueTotalGold.dtype)
[17210 14712 16113 ... 15903 14459 16266]
int64

矩阵运算

比n维数组好的地方是支持矩阵运算,比如如果是python数组,要对每个元素+1,需要做循环

python_list = [[1,2,3,4,5], [11,22,33,44,55]]
python_list = [[i + 1 for i in l] for l in python_list]
print(python_list)
[[2, 3, 4, 5, 6], [12, 23, 34, 45, 56]]

但是numpy方便矩阵运算

numpy_array = np.array([[1,2,3,4,5], [11,22,33,44,55]])
print(numpy_array + 1)
[[ 2  3  4  5  6]
 [12 23 34 45 56]]

新建特殊矩阵

print(np.zeros((3, 4)))
print(np.ones((2, 2)))
print(np.empty( (2,3) ) )
print(np.arange( 10, 30, 5 ))
print(np.arange(12).reshape(4,3))
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[[1. 1.]
 [1. 1.]]
[[8.5029e-320 7.2687e-320 7.9609e-320]
 [7.8571e-320 7.1437e-320 8.0365e-320]]
[10 15 20 25]
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]

矩阵属性信息

print(df[cols])
arr = df[cols].values
print(arr)
print(arr.shape)
print(arr.ndim)
print(arr.size)
print(arr.dtype)

          gameId  blueWins  blueAvgLevel  redAvgLevel
0     4519157822         0           6.6          6.8
1     4523371949         0           6.6          6.8
2     4521474530         0           6.4          6.8
3     4524384067         0           7.0          7.0
4     4436033771         0           7.0          7.0
...          ...       ...           ...          ...
9874  4527873286         1           7.2          6.8
9875  4527797466         1           7.2          7.0
9876  4527713716         0           7.0          7.4
9877  4527628313         0           6.6          7.2
9878  4523772935         1           7.0          6.8

[9879 rows x 4 columns]
[[4.51915782e+09 0.00000000e+00 6.60000000e+00 6.80000000e+00]
 [4.52337195e+09 0.00000000e+00 6.60000000e+00 6.80000000e+00]
 [4.52147453e+09 0.00000000e+00 6.40000000e+00 6.80000000e+00]
 ...
 [4.52771372e+09 0.00000000e+00 7.00000000e+00 7.40000000e+00]
 [4.52762831e+09 0.00000000e+00 6.60000000e+00 7.20000000e+00]
 [4.52377294e+09 1.00000000e+00 7.00000000e+00 6.80000000e+00]]
(9879, 4)
2
39516
float64

元素访问

访问第2行

arr[1, :]
array([4.52337195e+09, 0.00000000e+00, 6.60000000e+00, 6.80000000e+00])

访问第3列

arr[:, 2]
array([6.6, 6.6, 6.4, ..., 7. , 6.6, 7. ])

访问前3行,3-最后一列

arr[:3, 2:]
array([[6.6, 6.8],
       [6.6, 6.8],
       [6.4, 6.8]])

数学运算

arr[:3, 2:] + 1
array([[7.6, 7.8],
       [7.6, 7.8],
       [7.4, 7.8]])
(arr[:3, 2:] - 1) * 10
array([[56., 58.],
       [56., 58.],
       [54., 58.]])
arr[:3, 2:] ** 2
array([[43.56, 46.24],
       [43.56, 46.24],
       [40.96, 46.24]])
arr[:3, 2:].sum()
40.0
arr[:3, 2:].mean()
6.666666666666667
arr[:3, 2:].max()
arr[:3, 2:].min()
6.4
np.sum(arr[:3, 2:])
np.sin(arr[:3, 2:])
np.cos(arr[:3, 2:])
array([[0.95023259, 0.86939749],
       [0.95023259, 0.86939749],
       [0.99318492, 0.86939749]])

*是元素乘,@是矩阵乘

A = np.array( [[1,1],
               [0,1]] )
B = np.array( [[2,0],
               [3,4]] )
A * B
array([[2, 0],
       [0, 4]])
A @ B
array([[5, 4],
       [3, 4]])
A + B
array([[3, 1],
       [3, 5]])
A - B
array([[-1,  1],
       [-3, -3]])

按元素相除,除0会抛出warning,返回无穷inf

A / B
/home/shisy13/anaconda3/envs/conda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in true_divide
  """Entry point for launching an IPython kernel.





array([[0.5 ,  inf],
       [0.  , 0.25]])

数学运算矩阵的大小必须要合法

# A * np.array([1, 2, 3])
A * np.array([1, 2])
array([[1, 2],
       [0, 2]])

拷贝

和pandsa一样,一般等号负值是引用,copy之后才是新的对象复制

A_copy = A.copy()
A_copy[0,0] = 2
print(A)
print(A_copy)
[[1 1]
 [0 1]]
[[2 1]
 [0 1]]

小例子

假设要对平均等级做离散化,用类似之前pandas里的实现方式,最大最小值之间以0.2分割区间

blueAvgLevel = df['blueAvgLevel'].values
print(blueAvgLevel)
[6.6 6.6 6.4 ... 7.  6.6 7. ]
print(blueAvgLevel.min(), blueAvgLevel.max())
4.6 8.0

float运算后还是float

(blueAvgLevel - 4.6)/0.1
array([20., 20., 18., ..., 24., 20., 24.])

np.around()可以四舍五入,astype可以变换数据类型

print((blueAvgLevel - 4.6)/0.2)
blueAvgLevel_new = np.around((blueAvgLevel - 4.6)/0.2).astype(int)
print(blueAvgLevel_new)
[10. 10.  9. ... 12. 10. 12.]
[10 10  9 ... 12 10 12]

可以把array赋值给pandas的某一列

df['blueAvgLevel'] = blueAvgLevel_new
print(df['blueAvgLevel'])
df[cols]
del df
del arr
0       10
1       10
2        9
3       12
4       12
        ..
9874    13
9875    13
9876    12
9877    10
9878    12
Name: blueAvgLevel, Length: 9879, dtype: int64

其他

其他更多应用和函数可查询API文档(中文) API文档(英文)