【2.1】Pandas Series

May 15, 2018 pandas

A Series consists of a sequence of data values together with an associated index of labels.

Index	Data
index_0	data_a
index_1	data_b
index_2	data_c
index_3	data_d

Example:

import pandas as pd
d = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])
print(d)

Output:

a    9
b    8
c    7
d    6
dtype: int64
#index can be omitted; pandas then uses the default integer index 0, 1, 2, ...
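A minimal sketch of what happens when index is omitted, assuming a recent pandas version: pandas generates a default integer index (a RangeIndex) starting at 0.

```python
import pandas as pd

# No index argument: pandas generates a default integer index starting at 0
s = pd.Series([9, 8, 7, 6])
print(s)

# The default index is a RangeIndex covering 0 .. len(s) - 1
print(list(s.index))  # [0, 1, 2, 3]
```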

1. Creating a Series

A Series can be created from:

a Python list
a scalar value
a Python dict
an ndarray
other functions

1.1 Creating from a scalar value

import pandas as pd
d = pd.Series(25, index=['a', 'b', 'c', 'd'])
print(d)

Output:

a    25
b    25
c    25
d    25
dtype: int64
#index must be given here: it sets the size of the Series (pd.Series(25) alone has a single element)
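Because a scalar has no length of its own, the index is what fixes the size of the result. A small sketch comparing the two cases (the name `single` is only for illustration):

```python
import pandas as pd

# A scalar is broadcast to every label in index
s = pd.Series(25, index=['a', 'b', 'c', 'd'])

# Without index, the scalar produces a Series of length 1 (default label 0)
single = pd.Series(25)

print(len(s), len(single))  # 4 1
```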

1.2 Creating from a dict

import pandas as pd
d = pd.Series({'a': 23, 'd': 12, 'b': 15}, index=['a', 'b', 'c', 'd'])
print(d)

Output:

a    23.0
b    15.0
c     NaN
d    12.0
dtype: float64
#index selects entries from the dict; 'c' is not a key, so it becomes NaN and the dtype changes to float64
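For comparison, a sketch of the same dict with and without index. It assumes pandas 0.23 or later on Python 3.6 or later, where a Series built from a dict keeps the dict's insertion order:

```python
import pandas as pd

data = {'a': 23, 'd': 12, 'b': 15}

# Without index: the dict keys become the index, in insertion order
s1 = pd.Series(data)
print(s1.index.tolist())  # ['a', 'd', 'b']
print(s1.dtype)           # int64

# With index: labels are looked up in the dict; missing keys become NaN
s2 = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(s2.isna().tolist())  # [False, False, True, False]
```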

1.3 Creating from an ndarray

import pandas as pd
import numpy as np
d = pd.Series(np.arange(5), index=np.arange(9, 4, -1))
print(d)

Output:

9    0
8    1
7    2
6    3
5    4
dtype: int64

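A Series can be created from:

a Python list: index must have the same number of elements as the list
a scalar value: index determines the size of the Series
a Python dict: the keys become the index; if index is given, it selects entries from the dict
an ndarray: both the index and the data can be ndarrays
other functions, such as range()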
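A short sketch of two items from the list above: building a Series from range(), and what happens when a list and its index differ in length. The labels 'x', 'y', 'z' are illustrative.

```python
import pandas as pd

# From range(): treated like a list of integers
r = pd.Series(range(3), index=['x', 'y', 'z'])
print(r.tolist())  # [0, 1, 2]

# From a list: the index length must match the number of elements
try:
    pd.Series([1, 2, 3], index=['a', 'b'])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```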

2. Operations

2.1 Basic Series operations:

A Series has two parts: index and values
Series operations resemble those of ndarray
Series operations resemble those of a Python dict

Sample data:

import pandas as pd
import numpy as np
d = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

Print the index:

print(d.index)
#Index(['a', 'b', 'c', 'd'], dtype='object')

Print the values:

print(d.values)
#[9 8 7 6]

Access by label and by position:

print(d['d'])
#6

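print(d.iloc[1])
#8
#The automatic (positional) index and the custom (label) index coexist.
#Older code wrote d[1]; recent pandas versions warn that positional keys in [] are deprecated, so .iloc is used here.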

print(d[['a', 'b', 'c']])

a    9
b    8
c    7
dtype: int64

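print(d[['a', 'b', 'c', 0]])
#Output in older pandas (0 is not a label, so its entry is NaN):
a    9.0
b    8.0
c    7.0
0    NaN
dtype: float64
#pandas 1.0 and later raise a KeyError instead, because 0 is not in the index.
#The two index systems coexist, but they cannot be mixed in one lookup.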
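Because positional and label lookups through [] can be ambiguous, pandas also provides .loc (labels only) and .iloc (positions only). A sketch, assuming pandas 1.0 or later:

```python
import pandas as pd

d = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

# .loc selects by custom label only
print(d.loc['b'])                  # 8
print(d.loc[['a', 'c']].tolist())  # [9, 7]

# .iloc selects by position only
print(d.iloc[1])                   # 8
print(d.iloc[[0, 2]].tolist())     # [9, 7]

# Mixing a label and a position in one .loc lookup raises KeyError
try:
    d.loc[['a', 0]]
    mixed_ok = True
except KeyError:
    mixed_ok = False
print(mixed_ok)  # False
```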

2.2 Series operations that resemble ndarray:

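Indexing uses [], as with ndarray
NumPy operations and functions can be applied to a Series
A subset can be selected with a list of custom labels
Slicing by the automatic (positional) index also slices the custom index along with the data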

Example:

import pandas as pd
import numpy as np
d = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

print(d)
a    9
b    8
c    7
d    6
dtype: int64

print(d.iloc[3])
6

print(d[:2])
a    9
b    8
dtype: int64

print(d[d > d.median()])
a    9
b    8
dtype: int64

print(np.exp(d))
a    8103.083928
b    2980.957987
c    1096.633158
d     403.428793
dtype: float64
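One detail the slicing rules above leave implicit: a slice by custom labels includes its end label, while a positional slice excludes its end position, as Python lists do. A quick check on the same data:

```python
import pandas as pd

d = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

# Label-based slice: both endpoints are included
print(d['a':'c'].tolist())      # [9, 8, 7]

# Position-based slice: the end position is excluded, like a Python list
print(d.iloc[0:2].tolist())     # [9, 8]

# Boolean mask from a vectorized comparison, as with ndarray
print(d[d > 7].index.tolist())  # ['a', 'b']
```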

2.3 Series operations that resemble a Python dict:

Access by custom label
The in keyword
The .get() method

Example:

import pandas as pd
import numpy as np

d = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

d['b']
Out[27]: 8

'c' in d
Out[28]: True

0 in d
Out[29]: False

d.get('f',100)
Out[31]: 100
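Continuing the dict analogy, a Series also offers keys() and items(), and `in` tests labels rather than values. A minimal sketch:

```python
import pandas as pd

d = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

# keys() returns the index, like dict.keys()
print(list(d.keys()))  # ['a', 'b', 'c', 'd']

# items() yields (label, value) pairs, like dict.items()
label, value = list(d.items())[0]
print(label, value)    # a 9

# `in` checks labels, not values: 9 is a value, not a label
print(9 in d)          # False
print(9 in d.values)   # True

# get() without a default returns None for a missing label
print(d.get('f'))      # None
```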

2.4 Index alignment

In arithmetic, a Series automatically aligns data by index label; a label present in only one of the operands yields NaN.

import pandas as pd
import numpy as np
d = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])
a = pd.Series([1, 2, 3], index=['c', 'd', 'f'])

a+d
Out[36]: 
a    NaN
b    NaN
c    8.0
d    8.0
f    NaN
dtype: float64
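If NaN is not wanted for labels that appear in only one operand, the method form .add() accepts a fill_value. A sketch that treats a missing entry as 0 before adding:

```python
import pandas as pd

d = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])
a = pd.Series([1, 2, 3], index=['c', 'd', 'f'])

# Plain +: labels missing from either side give NaN (a, b and f here)
print((a + d).isna().sum())  # 3

# .add with fill_value substitutes 0 for a missing entry on one side
total = a.add(d, fill_value=0)
print(total.index.tolist())  # ['a', 'b', 'c', 'd', 'f']
print(total.tolist())        # [9.0, 8.0, 8.0, 8.0, 3.0]
```

The result is still float64, because alignment introduces missing values before they are filled.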

2.5 The name attribute

Both a Series object and its index can have a name, stored in the .name attribute.

import pandas as pd
d = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])
d.name  # None until a name is assigned
d.name = 'Series object'
d.index.name = 'index column'

d
Out[42]: 
index column
a    9
b    8
c    7
d    6
Name: Series object, dtype: int64
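Names can also be set without mutating the original object. The sketch below uses rename() and rename_axis(), which return new objects; the names 'score' and 'label' are hypothetical. It also shows that the Series name becomes the column name after to_frame().

```python
import pandas as pd

d = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

# rename() with a scalar returns a copy whose .name is set
named = d.rename('score')
# rename_axis() returns a copy whose index .name is set
named = named.rename_axis('label')

# When converted to a DataFrame, the Series name becomes the column name
df = named.to_frame()
print(df.columns.tolist())  # ['score']
print(df.index.name)        # label
print(d.name)               # None (the original is unchanged)
```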

2.6 Modifying a Series

A Series can be modified at any time, and changes take effect immediately.

import pandas as pd

d = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])
d[['b', 'c']] = 20

d
Out[44]: 
a     9
b    20
c    20
d     6
dtype: int64
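Beyond assigning to a list of labels, a Series can be modified through a boolean mask, and assigning to a new label with .loc adds that label. A sketch; the label 'e' is illustrative:

```python
import pandas as pd

d = pd.Series([9, 8, 7, 6], index=['a', 'b', 'c', 'd'])

# Assign through a boolean mask: every value below 8 becomes 0
d[d < 8] = 0
print(d.tolist())  # [9, 8, 0, 0]

# Assigning to a label that does not exist yet appends it
d.loc['e'] = 5
print(d.index.tolist())  # ['a', 'b', 'c', 'd', 'e']
```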

A Series is a one-dimensional labeled array.

Basic Series operations resemble those of ndarray and dict, and data is aligned by index.

2.7 Converting to a dict

>>> import pandas as pd
>>> s = pd.Series([1, 2, 3, 4])

>>> s.to_dict()
{0: 1, 1: 2, 2: 3, 3: 4}
>>> from collections import OrderedDict, defaultdict

>>> s.to_dict(into=OrderedDict)
OrderedDict([(0, 1), (1, 2), (2, 3), (3, 4)])

>>> dd = defaultdict(list)
>>> s.to_dict(into=dd)
defaultdict(<class 'list'>, {0: 1, 1: 2, 2: 3, 3: 4})
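The conversion also works in the other direction: a dict passed to pd.Series comes back with the same labels. A round-trip sketch with illustrative labels:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=['w', 'x', 'y', 'z'])

# Series -> dict: labels become keys
as_dict = s.to_dict()
print(as_dict)  # {'w': 1, 'x': 2, 'y': 3, 'z': 4}

# dict -> Series: keys become the index again, in insertion order
back = pd.Series(as_dict)
print(back.equals(s))  # True
```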

Once the data is a dict, it is easy to turn it into a DataFrame:

# final_df is an existing DataFrame with a 'cdrs' column
cdrs_size = final_df.groupby(['cdrs'])['cdrs'].count()

cdr_dict = cdrs_size.to_dict()

# Wrap keys()/values() in list() so pandas receives plain lists
cdr_df = pd.DataFrame.from_dict({'cdrs': list(cdr_dict.keys()), 'cdr_num': list(cdr_dict.values())})

final_df = pd.merge(final_df, cdr_df, on='cdrs', how='left')
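A self-contained sketch of the snippet above. Since final_df is not defined in this post, a small hypothetical DataFrame stands in for it. The first approach is a variant of the one in the text that skips the dict step by renaming the count Series and calling reset_index(); the second uses groupby(...).transform('count'), which returns one count per original row and needs no merge. Both are swapped-in alternatives, not the author's code.

```python
import pandas as pd

# Hypothetical stand-in for final_df: only the 'cdrs' column matters here
final_df = pd.DataFrame({'cdrs': ['x', 'y', 'x', 'z', 'x', 'y']})

# Count per group, rename the counts so reset_index() does not clash
# with the existing 'cdrs' column, then merge back as a new column
cdrs_size = final_df.groupby('cdrs')['cdrs'].count()
cdr_df = cdrs_size.rename('cdr_num').reset_index()
merged = pd.merge(final_df, cdr_df, on='cdrs', how='left')

# Shortcut: transform('count') returns a count aligned to each original row
final_df['cdr_num'] = final_df.groupby('cdrs')['cdrs'].transform('count')

print(final_df['cdr_num'].tolist())  # [3, 2, 3, 1, 3, 2]
print(merged['cdr_num'].tolist() == final_df['cdr_num'].tolist())  # True
```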

References

Song Tian, Beijing Institute of Technology, www.python123.org
