1.Series
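Create a one-dimensional array; the index is shown on the left and the values on the right (the sessions here assume "import numpy as np", "import pandas as pd" and "from pandas import Series, DataFrame" have already been run):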
In [3]: obj = Series([1,2,3,4,5])

In [4]: obj
Out[4]:
0    1
1    2
2    3
3    4
4    5
dtype: int64

In [5]: obj.values
Out[5]: array([1, 2, 3, 4, 5], dtype=int64)

In [6]: obj.index
Out[6]: RangeIndex(start=0, stop=5, step=1)
Create an index that labels each data point:
In [7]: obj2 = Series([4,1,9,7], index=["a","c","e","ff"])

In [8]: obj2
Out[8]:
a     4
c     1
e     9
ff    7
dtype: int64

In [9]: obj2.index
Out[9]: Index(['a', 'c', 'e', 'ff'], dtype='object')
Select a single value or a set of values by label:
In [10]: obj2["c"]
Out[10]: 1

In [11]: obj2[["c","e"]]
Out[11]:
c    1
e    9
dtype: int64
Array operations such as boolean filtering keep the index labels attached to the values:
In [12]: obj2[obj2>3]
Out[12]:
a     4
e     9
ff    7
dtype: int64
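The point about index labels surviving array operations can be sketched as follows. This is a minimal illustration that reuses the values of obj2 from the session above; the names filtered and doubled are introduced here only for the example.

```python
import pandas as pd

obj2 = pd.Series([4, 1, 9, 7], index=["a", "c", "e", "ff"])

# Boolean filtering keeps the labels of the elements that pass the test.
filtered = obj2[obj2 > 3]

# Element-wise arithmetic keeps every label and transforms only the values.
doubled = obj2 * 2

print(list(filtered.index))  # ['a', 'e', 'ff']
print(doubled["e"])          # 18
```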
A Series can also be viewed as an ordered dict that maps index labels to values, so many dict operations work on it:
In [13]: "c" in obj2
Out[13]: True
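A few more dict-style operations, as a hedged sketch rather than an exhaustive list. It rebuilds obj2 with the same values as above; the label "zz" and the variable names are made up for the example.

```python
import pandas as pd

obj2 = pd.Series([4, 1, 9, 7], index=["a", "c", "e", "ff"])

# Membership is tested against the index labels, not the values.
has_label_c = "c" in obj2
has_label_9 = 9 in obj2          # 9 is a value, not a label

# Like dict.get, Series.get returns the default when the label is missing.
fallback = obj2.get("zz", 0)

# Convert back to a plain Python dict.
as_dict = obj2.to_dict()

print(has_label_c, has_label_9, fallback)  # True False 0
```

Because membership looks at labels, testing whether a value is present needs something like `9 in obj2.values` instead.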
Create a Series directly from a dict:
In [14]: data = {"name":"liu","year":18,"sex":"man"}

In [15]: obj3 = Series(data)

In [16]: obj3
Out[16]:
name    liu
year     18
sex     man
dtype: object
Create a Series from a dict together with a list of index labels:
In [17]: list1 = ["name","year","mobile"]

In [18]: obj4 = Series(data,index=list1)

In [19]: obj4
Out[19]:
name      liu
year       18
mobile    NaN
dtype: object
PS: the data dict has no "mobile" key, so that value is NaN. The "sex" key is dropped because it is not in the index list.
Detect missing data, either with the top-level pandas functions or with the equivalent Series methods:
In [20]: pd.isnull(obj4)
Out[20]:
name      False
year      False
mobile     True
dtype: bool

In [21]: pd.notnull(obj4)
Out[21]:
name       True
year       True
mobile    False
dtype: bool

In [22]: obj4.isnull()
Out[22]:
name      False
year      False
mobile     True
dtype: bool

In [23]: obj4.notnull()
Out[23]:
name       True
year       True
mobile    False
dtype: bool
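The boolean masks returned by these checks are usually put to work rather than just displayed. A small sketch, assuming the same data dict and index list as above (missing_count and present are names chosen for this example):

```python
import pandas as pd

data = {"name": "liu", "year": 18, "sex": "man"}
obj4 = pd.Series(data, index=["name", "year", "mobile"])

# Summing a boolean mask counts the True entries, i.e. the missing values.
missing_count = int(obj4.isnull().sum())

# Using the notnull mask as a filter keeps only the labels that have data.
present = obj4[obj4.notnull()]

print(missing_count)        # 1
print(list(present.index))  # ['name', 'year']
```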
The name attribute of a Series and of its index:
In [7]: obj4.name = "hahaha"

In [8]: obj4.index.name = "state"

In [9]: obj4
Out[9]:
state
name      liu
year       18
mobile    NaN
Name: hahaha, dtype: object
2.DataFrame
Build a DataFrame from a dict of equal-length lists:
In [13]: data = {
    ...:     "state":[1,1,2,1,1],
    ...:     "year":[2000,2001,2002,2004,2005],
    ...:     "pop":[1.5,1.7,3.6,2.4,2.9]
    ...: }

In [14]: frame = DataFrame(data)

In [15]: frame
Out[15]:
   state  year  pop
0      1  2000  1.5
1      1  2001  1.7
2      2  2002  3.6
3      1  2004  2.4
4      1  2005  2.9
Specify the column order and the row labels; a requested column that is not found in the data is filled with NA values:
In [18]: frame2 = DataFrame(
    ...:     data,
    ...:     columns=["year","state","pop","debt"],
    ...:     index=["one","two","three","four","five"]
    ...: )

In [19]: frame2
Out[19]:
       year  state  pop debt
one    2000      1  1.5  NaN
two    2001      1  1.7  NaN
three  2002      2  3.6  NaN
four   2004      1  2.4  NaN
five   2005      1  2.9  NaN
Retrieve a DataFrame column as a Series:
In [7]: frame2.year
Out[7]:
one      2000
two      2001
three    2002
four     2004
five     2005
Name: year, dtype: int64
PS: the returned Series keeps the DataFrame's index, and its name attribute is set to the column name.
Retrieve a row by label:
In [11]: frame2.loc["three"]
Out[11]:
year     2002
state       2
pop       3.6
debt      NaN
Name: three, dtype: object
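Row access by label has a positional counterpart, iloc. The sketch below uses a cut-down, three-row version of frame2 (an assumption made to keep the example short) and shows that both accessors return the same row as a Series whose name is the row label.

```python
import pandas as pd

frame2 = pd.DataFrame(
    {"year": [2000, 2001, 2002], "state": [1, 1, 2]},
    index=["one", "two", "three"],
)

by_label = frame2.loc["three"]   # select the row by its index label
by_position = frame2.iloc[2]     # select the same row by integer position

print(by_label.name)                 # three
print(by_label.equals(by_position))  # True
```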
Assign to a column (a scalar is broadcast to every row):
In [12]: frame2['debt'] = 16.5

In [13]: frame2
Out[13]:
       year  state  pop  debt
one    2000      1  1.5  16.5
two    2001      1  1.7  16.5
three  2002      2  3.6  16.5
four   2004      1  2.4  16.5
five   2005      1  2.9  16.5
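When assigning a list or an array, its length must equal the number of rows; when assigning a Series, its values are matched exactly on the index labels, and rows without a matching label get NaN: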
In [17]: val = Series([1.2,1.5,1.7], index=["two","four","five"])

In [18]: frame2['debt'] = val

In [19]: frame2
Out[19]:
       year  state  pop  debt
one    2000      1  1.5   NaN
two    2001      1  1.7   1.2
three  2002      2  3.6   NaN
four   2004      1  2.4   1.5
five   2005      1  2.9   1.7
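The two assignment rules can be contrasted in one short sketch. It assumes a three-row frame built only for this example; the label "zzz" is deliberately absent from the frame to show that unmatched Series labels are discarded.

```python
import pandas as pd

frame2 = pd.DataFrame(
    {"year": [2000, 2001, 2002], "pop": [1.5, 1.7, 3.6]},
    index=["one", "two", "three"],
)

# A list needs exactly one value per row; a shorter list is rejected.
try:
    frame2["debt"] = [1.2, 1.5]
    length_error = None
except ValueError as exc:
    length_error = exc

# A Series is aligned on labels: "two" is matched, "zzz" is ignored,
# and rows with no matching label are filled with NaN.
frame2["debt"] = pd.Series([1.2, 9.9], index=["two", "zzz"])

print(type(length_error).__name__)  # ValueError
print(frame2)
```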
Assigning to a column that does not exist creates it:
In [21]: frame2["eastern"] = frame2.state == 1

In [22]: frame2
Out[22]:
       year  state  pop  debt  eastern
one    2000      1  1.5   NaN     True
two    2001      1  1.7   1.2     True
three  2002      2  3.6   NaN    False
four   2004      1  2.4   1.5     True
five   2005      1  2.9   1.7     True
Given a nested dict, DataFrame interprets the outer keys as columns and the inner keys as the row index:
In [23]: dic = {"name":{"one":"liu","two":"rui"},"year":{"one":"23","two":"22"}}

In [24]: frame3 = DataFrame(dic)

In [25]: frame3
Out[25]:
    name year
one  liu   23
two  rui   22
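When the inner dicts do not share the same keys, the row index is the union of all inner keys and the gaps become NaN. The sketch below is an illustrative variation on the example above; the "three" entry is invented for the demonstration.

```python
import pandas as pd

dic = {
    "name": {"one": "liu", "two": "rui"},
    "year": {"one": "23", "three": "21"},  # "three" is a hypothetical extra row
}
frame3 = pd.DataFrame(dic)

# Outer keys become the columns.
columns = list(frame3.columns)

# Inner keys from every column are combined into the row index.
row_labels = set(frame3.index)

# Cells with no source value are NaN.
name_missing = pd.isna(frame3.loc["three", "name"])
year_missing = pd.isna(frame3.loc["two", "year"])

print(frame3)
```

The order of the combined row labels is not asserted here, since it can depend on the pandas version.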
If the index and the columns are given names, the names are shown in the output:
In [26]: frame3.index.name = "index"

In [27]: frame3.columns.name = "state"

In [28]: frame3
Out[28]:
state name year
index
one    liu   23
two    rui   22
The values attribute returns the data as a two-dimensional ndarray:
In [29]: frame3.values
Out[29]:
array([['liu', '23'],
       ['rui', '22']], dtype=object)
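The dtype of the returned array depends on the columns: pandas picks one dtype that can hold every column. A minimal sketch with two small frames made up for this example:

```python
import pandas as pd

# Integer and float columns are upcast to a common float dtype.
numeric = pd.DataFrame({"year": [2000, 2001], "pop": [1.5, 1.7]})
numeric_dtype = numeric.values.dtype

# Mixing strings with numbers falls back to the generic object dtype.
mixed = pd.DataFrame({"name": ["liu", "rui"], "year": [23, 22]})
mixed_dtype = mixed.values.dtype

print(numeric_dtype)          # float64
print(mixed_dtype)            # object
print(numeric.values.shape)   # (2, 2)
```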
3.Index objects
In [30]: obj = Series(range(3),index=["a","b","c"])

In [31]: index = obj.index

In [32]: index
Out[32]: Index(['a', 'b', 'c'], dtype='object')
Index objects are immutable, which makes it safe to share one index among multiple data structures:
In [35]: index = pd.Index(np.arange(3))

In [36]: obj2 = Series([1.5,0.5,2],index=index)

In [37]: obj2.index is index
Out[37]: True
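Immutability can be checked directly: assigning to an element of an Index raises TypeError. The sketch below also builds two Series from one Index; the names s1 and s2 are introduced only for this example. It compares the indexes with equals rather than with "is", since whether the very same object is reused may vary between pandas versions.

```python
import pandas as pd

index = pd.Index(["a", "b", "c"])

# In-place modification of an Index is not allowed.
try:
    index[1] = "d"
    mutation_error = None
except TypeError as exc:
    mutation_error = exc

# The same Index can label several structures without risk of one
# of them changing the labels seen by the other.
s1 = pd.Series([1.5, 0.5, 2.0], index=index)
s2 = pd.Series([10, 20, 30], index=index)

print(type(mutation_error).__name__)  # TypeError
print(s1.index.equals(s2.index))      # True
```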