pandas Essentials (Data Structures)

Published: 2019-04-09 21:42:19    Editor: auto

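    1. Series
    The examples in this post assume that NumPy and pandas have been imported as follows:
    In [1]: import numpy as np
    In [2]: import pandas as pd; from pandas import Series, DataFrame
    A Series is a one-dimensional array of values with an associated index. When printed, the index is on the left and the values are on the right:
    In [3]: obj = Series([1,2,3,4,5])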
    In [4]: obj
    Out[4]:
    0    1
    1    2
    2    3
    3    4
    4    5
    dtype: int64
    In [5]: obj.values
    Out[5]: array([1, 2, 3, 4, 5], dtype=int64)
    In [6]: obj.index
    Out[6]: RangeIndex(start=0, stop=5, step=1)
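As a standalone script, the same inspection might look like the sketch below. It uses `to_numpy()`, which returns the values as a NumPy array much like `.values` does; the variable names `values` and `index_labels` are illustrative only.

```python
import pandas as pd

obj = pd.Series([1, 2, 3, 4, 5])

# The values as a NumPy array
values = obj.to_numpy()

# With no explicit index, pandas builds a default RangeIndex 0..n-1
index_labels = list(obj.index)

print(values)
print(index_labels)
```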

     

    Create a Series with an index that labels each data point:

    In [7]: obj2 = Series([4,1,9,7], index=["a","c","e","ff"])
    In [8]: obj2
    Out[8]:
    a     4
    c     1
    e     9
    ff    7
    dtype: int64
    In [9]: obj2.index
    Out[9]: Index(['a', 'c', 'e', 'ff'], dtype='object')

     

    Select a single value or a set of values by label:

    In [10]: obj2["c"]
    Out[10]: 1
    In [11]: obj2[["c","e"]]
    Out[11]:
    c    1
    e    9
    dtype: int64
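Label-based selection can also be written explicitly with `.loc`, which makes it clear that labels rather than positions are meant. This is a small sketch, not the only way to do it; `single` and `subset` are illustrative names.

```python
import pandas as pd

obj2 = pd.Series([4, 1, 9, 7], index=["a", "c", "e", "ff"])

single = obj2.loc["c"]          # one label gives a scalar
subset = obj2.loc[["c", "e"]]   # a list of labels gives a Series

# Labels can be used for assignment as well
obj2.loc["ff"] = 70
```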

     

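    NumPy-style array operations, such as filtering with a boolean array, preserve the link between index labels and values: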

    In [12]: obj2[obj2>3]
    Out[12]:
    a     4
    e     9
    ff    7
    dtype: int64
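The same holds for arithmetic and NumPy functions: the result is a new Series that keeps the original labels. A minimal sketch, with illustrative variable names:

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([4, 1, 9, 7], index=["a", "c", "e", "ff"])

doubled = obj2 * 2           # element-wise arithmetic
roots = np.sqrt(obj2)        # NumPy ufuncs return a Series too
filtered = obj2[obj2 > 3]    # boolean filtering keeps matching labels

print(doubled)
print(roots)
print(filtered)
```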

     

    A Series can also be thought of as an ordered dict that maps index labels to values, so many dict operations work on it:
    In [13]: "c" in obj2
    Out[13]: True
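A few more dict-like operations, sketched under the assumption that `obj2` is defined as above. Note that `in` tests the index labels, not the values.

```python
import pandas as pd

obj2 = pd.Series([4, 1, 9, 7], index=["a", "c", "e", "ff"])

has_label_c = "c" in obj2       # True: "c" is an index label
has_label_4 = 4 in obj2         # False: 4 is a value, not a label
fallback = obj2.get("zz", 0)    # like dict.get, returns the default for a missing label
as_dict = obj2.to_dict()        # convert back to a plain dict
```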

     

    Create a Series directly from a dict:
    In [14]: data = {"name":"liu","year":18,"sex":"man"}
    In [15]: obj3 = Series(data)
    In [16]: obj3
    Out[16]:
    name    liu
    year     18
    sex     man
    dtype: object

     

    Create a Series from a dict together with a list of index labels:
    In [17]: list1 = ["name","year","mobile"]
    In [18]: obj4 = Series(data,index=list1)
    In [19]: obj4
    Out[19]:
    name      liu
    year       18
    mobile    NaN
    dtype: object

    PS: the data dict has no "mobile" key, so the value for that label is NaN.
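The matching works in both directions: labels missing from the dict become NaN, and dict keys that are not in the index list are left out. A short check of that behaviour:

```python
import pandas as pd

data = {"name": "liu", "year": 18, "sex": "man"}
obj4 = pd.Series(data, index=["name", "year", "mobile"])

mobile_missing = pd.isna(obj4["mobile"])   # "mobile" is not a key of data
sex_dropped = "sex" not in obj4            # "sex" is not in the index list
```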

     
    Detect missing data, either with the top-level pandas functions or with the equivalent Series methods:
    In [20]: pd.isnull(obj4)
    Out[20]:
    name      False
    year      False
    mobile     True
    dtype: bool
     
    In [21]: pd.notnull(obj4)
    Out[21]:
    name       True
    year       True
    mobile    False
    dtype: bool
     
    In [22]: obj4.isnull()
    Out[22]:
    name      False
    year      False
    mobile     True
    dtype: bool
     
    In [23]: obj4.notnull()
    Out[23]:
    name       True
    year       True
    mobile    False
    dtype: bool
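Once missing values are detected, they are usually counted, dropped, or filled. The sketch below uses `isna`, `dropna`, and `fillna`; the fill value "unknown" is an arbitrary choice for illustration.

```python
import pandas as pd

data = {"name": "liu", "year": 18, "sex": "man"}
obj4 = pd.Series(data, index=["name", "year", "mobile"])

n_missing = int(obj4.isna().sum())   # isna is an alias of isnull
cleaned = obj4.dropna()              # drop the labels whose value is missing
filled = obj4.fillna("unknown")      # or replace missing values instead
```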

     

    Both the Series itself and its index have a name attribute:
    In [7]: obj4.name = "hahaha"
    In [8]: obj4.index.name = "state"
    In [9]: obj4
    Out[9]:
    state
    name      liu
    year       18
    mobile    NaN
    Name: hahaha, dtype: object
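One place where these names matter is conversion to a DataFrame: the Series name becomes the column name and the index name labels the index. A small sketch, with `as_frame` and `flat` as illustrative names:

```python
import pandas as pd

data = {"name": "liu", "year": 18, "sex": "man"}
obj4 = pd.Series(data, index=["name", "year", "mobile"])
obj4.name = "hahaha"
obj4.index.name = "state"

as_frame = obj4.to_frame()    # one column, named after the Series
flat = obj4.reset_index()     # the index becomes a regular column named "state"
```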

     

    2. DataFrame
    Build a DataFrame from a dict of equal-length lists:
    In [13]: data = {
    "state":[1,1,2,1,1],
    "year":[2000,2001,2002,2004,2005],
    "pop":[1.5,1.7,3.6,2.4,2.9]
    }
    In [14]: frame = DataFrame(data)
    In [15]: frame
    Out[15]:
       state  year  pop
    0      1  2000  1.5
    1      1  2001  1.7
    2      2  2002  3.6
    3      1  2004  2.4
    4      1  2005  2.9

     

    Specify the column order and the row labels. A requested column that is not in the data is filled with NA values:
    In [18]: frame2 = DataFrame(
    data,
    columns=["year","state","pop","debt"],
    index=["one","two","three","four","five"]
    )
    In [19]: frame2
    Out[19]:
           year  state  pop debt
    one    2000      1  1.5  NaN
    two    2001      1  1.7  NaN
    three  2002      2  3.6  NaN
    four   2004      1  2.4  NaN
    five   2005      1  2.9  NaN
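The `columns` argument also works as a filter: dict keys that are not listed are not included in the result. A minimal sketch, where `subset` is an illustrative name:

```python
import pandas as pd

data = {
    "state": [1, 1, 2, 1, 1],
    "year": [2000, 2001, 2002, 2004, 2005],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
}

# Only "year" and "pop" are requested, so "state" is left out
subset = pd.DataFrame(data, columns=["year", "pop"])
print(subset)
```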

     

    Retrieve a column of a DataFrame as a Series:
    In [7]: frame2.year
    Out[7]:
    one      2000
    two      2001
    three    2002
    four     2004
    five     2005
    Name: year, dtype: int64

    PS: the returned Series has the same index as the DataFrame, and its name attribute is set to the column name.

     

    Retrieve a row by label:
    In [11]: frame2.loc["three"]
    Out[11]:
    year     2002
    state       2
    pop       3.6
    debt      NaN
    Name: three, dtype: object
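A caveat on attribute-style column access: it only works when the column name does not clash with an existing DataFrame attribute. In this data, `frame2.pop` is the `DataFrame.pop` method rather than the "pop" column, so bracket notation is the safer choice. The sketch below also shows `iloc` for selecting a row by position; the variable names are illustrative.

```python
import pandas as pd

data = {
    "state": [1, 1, 2, 1, 1],
    "year": [2000, 2001, 2002, 2004, 2005],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
}
frame2 = pd.DataFrame(
    data,
    columns=["year", "state", "pop", "debt"],
    index=["one", "two", "three", "four", "five"],
)

pop_attr = frame2.pop        # the DataFrame.pop method, not the column
pop_col = frame2["pop"]      # bracket notation always gives the column

row_by_label = frame2.loc["three"]   # select a row by index label
row_by_position = frame2.iloc[2]     # select the same row by integer position
```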

     

    Assign a scalar value to a column:
    In [12]: frame2['debt'] = 16.5
    In [13]: frame2
    Out[13]:
           year  state  pop  debt
    one    2000      1  1.5  16.5
    two    2001      1  1.7  16.5
    three  2002      2  3.6  16.5
    four   2004      1  2.4  16.5
    five   2005      1  2.9  16.5

     

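    When assigning a list or array, its length must match the length of the DataFrame. When assigning a Series, its labels are aligned exactly to the DataFrame's index, and rows without a matching label get NaN: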
    In [17]: val = Series([1.2,1.5,1.7], index=["two","four","five"])
    In [18]: frame2['debt'] = val
    In [19]: frame2
    Out[19]:
           year  state  pop  debt
    one    2000      1  1.5   NaN
    two    2001      1  1.7   1.2
    three  2002      2  3.6   NaN
    four   2004      1  2.4   1.5
    five   2005      1  2.9   1.7
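Both rules can be checked directly. In this sketch the Series carries an extra label "six" that does not exist in the frame (added here for illustration); it is ignored during alignment, while a list of the wrong length raises a `ValueError`.

```python
import pandas as pd

data = {
    "state": [1, 1, 2, 1, 1],
    "year": [2000, 2001, 2002, 2004, 2005],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
}
frame2 = pd.DataFrame(data, index=["one", "two", "three", "four", "five"])

# Series assignment: aligned by label, unmatched rows become NaN
val = pd.Series([1.2, 1.5, 1.7, 9.9], index=["two", "four", "five", "six"])
frame2["debt"] = val

# List assignment: the length must equal the number of rows
try:
    frame2["debt2"] = [1.0, 2.0]
    length_checked = False
except ValueError:
    length_checked = True
```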

     

    Assigning to a column that does not exist creates it:
    In [21]: frame2["eastern"] = frame2.state == 1
    In [22]: frame2
    Out[22]:
           year  state  pop  debt  eastern
    one    2000      1  1.5   NaN     True
    two    2001      1  1.7   1.2     True
    three  2002      2  3.6   NaN    False
    four   2004      1  2.4   1.5     True
    five   2005      1  2.9   1.7     True
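Creating a column this way requires bracket notation. Assigning to a new attribute name only sets a Python attribute on the object (pandas emits a warning) and no column is added. A sketch using a smaller frame and a hypothetical "western" name:

```python
import warnings

import pandas as pd

frame = pd.DataFrame({"state": [1, 1, 2], "year": [2000, 2001, 2002]})

# Bracket assignment creates the column
frame["eastern"] = frame["state"] == 1

# Attribute assignment does not; the warning is silenced for this demo
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    frame.western = frame["state"] == 2

print(list(frame.columns))
```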

     

    Given a nested dict, DataFrame treats the outer keys as columns and the inner keys as row labels:
    In [23]: dic = {"name":{"one":"liu","two":"rui"},"year":{"one":"23","two":"22"}}
    In [24]: frame3 = DataFrame(dic)
    In [25]: frame3
    Out[25]:
        name year
    one  liu   23
    two  rui   22
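When the inner dicts do not share the same keys, the row labels are the union of all inner keys and the gaps are filled with NaN. The sketch below changes one inner key to "three" for illustration and also shows `.T`, which swaps rows and columns.

```python
import pandas as pd

dic = {
    "name": {"one": "liu", "two": "rui"},
    "year": {"one": "23", "three": "22"},   # "three" instead of "two"
}
frame = pd.DataFrame(dic)

# Transpose: the outer keys become the row labels
transposed = frame.T

print(frame)
print(transposed)
```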

     

    Set names for the index and the columns; they are then shown in the output:
    In [26]: frame3.index.name = "index"
    In [27]: frame3.columns.name = "state"
    In [28]: frame3
    Out[28]:
    state name year
    index
    one    liu   23
    two    rui   22

     

    Return the data as a two-dimensional ndarray:
    In [29]: frame3.values
    Out[29]:
    array([['liu', '23'],
           ['rui', '22']], dtype=object)
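The dtype of the returned array is chosen to accommodate every column. A sketch with two small illustrative frames: integer and float columns combine to float64, while a mix of strings and numbers falls back to object.

```python
import pandas as pd

numeric = pd.DataFrame({"year": [2000, 2001], "pop": [1.5, 1.7]})
mixed = pd.DataFrame({"name": ["liu", "rui"], "year": [23, 22]})

numeric_array = numeric.to_numpy()   # int64 and float64 columns
mixed_array = mixed.to_numpy()       # string and int64 columns

print(numeric_array.dtype)
print(mixed_array.dtype)
```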

     

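    3. Index objects
    The labels passed when constructing a Series or DataFrame are converted into an Index object: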
    In [30]: obj = Series(range(3),index=["a","b","c"])
    In [31]: index = obj.index
    In [32]: index
    Out[32]: Index(['a', 'b', 'c'], dtype='object')

     

    Index objects are immutable, which makes it safe to share one Index among multiple data structures:
    In [35]: index = pd.Index(np.arange(3))
    In [36]: obj2 = Series([1.5,0.5,2],index=index)
    In [37]: obj2.index is index
    Out[37]: True
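Immutability can be seen by trying to assign to an element, which raises a `TypeError`. Methods such as `insert` return a new Index instead of changing the existing one. A minimal sketch, with `extended` as an illustrative name:

```python
import pandas as pd

index = pd.Index(["a", "b", "c"])

# In-place modification is not allowed
try:
    index[1] = "d"
    mutated = True
except TypeError:
    mutated = False

# insert() builds a new Index and leaves the original untouched
extended = index.insert(3, "d")

print(list(index))
print(list(extended))
```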
     

     
