Python (Head First) Study Notes: Part 5

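5 Comprehending Data: processing data, formats, encoding, decoding, sorting

Processing data: download the resource files from Head First Python, namely james.txt, julie.txt, mikey.txt and sarah.txt.

Example 1: open the files above and extract the data into lists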

>>> with open('james.txt') as jaf:
    data = jaf.readline()
    james = data.strip().split(',')
    with open('julie.txt')as juf:
        data = juf.readline()
        julie = data.strip().split(',')

        
>>> print(james)
['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']
>>> print(julie)
['2.59', '2.11', '2:11', '2:23', '3-10', '2-23', '3:10', '3.21', '3-21']
>>> with open('mikey.txt')as mif:
    data = mif.readline()
    mikey = data.strip().split(',')
    with open('sarah.txt')as saf:
        data = saf.readline()
        sarah = data.strip().split(',')

        
>>> print(mikey)
['2:22', '3.01', '3:01', '3.02', '3:02', '3.02', '3:22', '2.49', '2:38']
>>> print(sarah)
['2:58', '2.58', '2:39', '2-25', '2-55', '2:54', '2.18', '2:55', '2:55']


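Code of the form data.strip().split(',') is called method chaining.

The first method, strip(), is applied to the line of text held in data and removes unwanted whitespace from both ends of the string;

the second method, split(','), then creates a list by splitting the stripped string at each comma.

Several methods can be chained together this way to produce the required result. Read the chain from left to right.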
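As a small illustration of reading a chain from left to right, the sketch below applies the same chain to a made-up line of text (the sample string is an assumption, not the contents of any of the data files) and then repeats the work one step at a time:

```python
# Hypothetical sample line; the real files hold one comma-separated line each.
line = ' 2-34,3:21,2.34\n'

# Chained form: strip() runs first, then split(',') runs on its result.
chained = line.strip().split(',')

# The same work written as two separate steps.
stripped = line.strip()          # whitespace and the newline are removed
stepwise = stripped.split(',')   # the string is cut at each comma

print(chained)
print(chained == stepwise)
```

Both forms should produce the same list; the chained form simply avoids naming the intermediate string.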

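Sorting: there are two approaches

  1. In-place sorting

     The sort() method replaces the original data with the sorted data;

  2. Copied sorting

     The original data is kept, and sorted() returns a new, sorted copy;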

>>> data2=[6,3,1,2,4,5]
>>> data2
[6, 3, 1, 2, 4, 5]

>>> sorted(data2)
[1, 2, 3, 4, 5, 6]
>>> data2
[6, 3, 1, 2, 4, 5]
>>> data3=sorted(data2)
>>> data3
[1, 2, 3, 4, 5, 6]
>>> data1=[2,4,6,5,1,3]
>>> data1.sort()
>>> data1
[1, 2, 3, 4, 5, 6]

Printing the james, julie, mikey and sarah lists from earlier with print(sorted(data)) gives:

>>> print(sorted(james))
['2-22', '2-34', '2.34', '2.45', '2:01', '2:01', '3.01', '3:10', '3:21']
>>> print(sorted(julie))
['2-23', '2.11', '2.59', '2:11', '2:23', '3-10', '3-21', '3.21', '3:10']
>>> print(sorted(mikey))
['2.49', '2:22', '2:38', '3.01', '3.02', '3.02', '3:01', '3:02', '3:22']
>>> print(sorted(sarah))
['2-25', '2-55', '2.18', '2.58', '2:39', '2:54', '2:55', '2:55', '2:58']

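The sort order is not what we want: the goal is times in ascending order from left to right.

Looking closely, the data uses '-', ':' and '.' as separators. Because the separators are inconsistent, they distort the sort order.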
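Why the separators matter can be checked directly. Python compares strings character by character using each character's code point, so when the minutes digit is the same the separator decides the order. The sketch below uses three made-up times to show this:

```python
# Code points of the three separators that appear in the data.
for ch in '-.:':
    print(repr(ch), ord(ch))

# All three strings start with '2', so the second character (the
# separator) determines the order, not the seconds that follow it.
times = ['2:01', '2.34', '2-22']
print(sorted(times))
```

Here '2:01' ends up last even though it is the fastest of the three times, because ':' has the highest code point of the three separators.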

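Next, create a function named sanitize(). It takes one time string from an athlete's list, replaces any '-' or ':' it finds with '.', and returns the cleaned string. If the string contains neither '-' nor ':' (that is, it already uses '.'), it is returned unchanged.

The function code is as follows: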

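>>> def sanitize(time_string):
    # Work out which separator, if any, needs replacing.
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        # Already uses '.', so return it unchanged.
        return(time_string)
    # Split into minutes and seconds, then rejoin with '.'.
    (mins, secs) = time_string.split(splitter)
    return(mins + '.' + secs)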
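For comparison, here is a shorter variant built on str.replace(). It is not the version used in these notes; the name sanitize_replace is hypothetical and chosen only for this sketch:

```python
def sanitize_replace(time_string):
    """Return time_string with every '-' and ':' replaced by '.'."""
    return time_string.replace('-', '.').replace(':', '.')


for raw in ['2-34', '3:21', '2.34']:
    print(raw, '->', sanitize_replace(raw))
```

One difference worth noting: sanitize() unpacks the result of split() into exactly two names, so a string with two separators would raise a ValueError, whereas this variant would silently replace both.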

Example 2: sort the lists built from the four files above correctly

>>> with open('james.txt') as jaf:
    data = jaf.readline()
    james=data.strip().split(',')
    with open('julie.txt')as juf:
        data = juf.readline()
        julie=data.strip().split(',')
    with open('mikey.txt')as mif:
        data = mif.readline()
        mikey=data.strip().split(',')
    with open('sarah.txt')as saf:
        data = saf.readline()
        sarah=data.strip().split(',')
    clean_james=[]
    clean_julie=[]
    clean_mikey=[]
    clean_sarah=[]
    for each_t in james:
        clean_james.append(sanitize(each_t))
    for each_t in julie:
        clean_julie.append(sanitize(each_t))
    for each_t in mikey:
        clean_mikey.append(sanitize(each_t))
    for each_t in sarah:
        clean_sarah.append(sanitize(each_t))

        
>>> print(clean_james)
['2.34', '3.21', '2.34', '2.45', '3.01', '2.01', '2.01', '3.10', '2.22']
>>> print(clean_julie)
['2.59', '2.11', '2.11', '2.23', '3.10', '2.23', '3.10', '3.21', '3.21']
>>> print(clean_mikey)
['2.22', '3.01', '3.01', '3.02', '3.02', '3.02', '3.22', '2.49', '2.38']
>>> print(clean_sarah)
['2.58', '2.58', '2.39', '2.25', '2.55', '2.54', '2.18', '2.55', '2.55']


Sorting again gives:

>>> print(sorted(clean_james))
['2.01', '2.01', '2.22', '2.34', '2.34', '2.45', '3.01', '3.10', '3.21']
>>> print(sorted(clean_julie))
['2.11', '2.11', '2.23', '2.23', '2.59', '3.10', '3.10', '3.21', '3.21']
>>> print(sorted(clean_mikey))
['2.22', '2.38', '2.49', '3.01', '3.01', '3.02', '3.02', '3.02', '3.22']
>>> print(sorted(clean_sarah))
['2.18', '2.25', '2.39', '2.54', '2.55', '2.55', '2.55', '2.58', '2.58']

List comprehensions

>>> print(sorted([sanitize(t)for t in james]))
['2.01', '2.01', '2.22', '2.34', '2.34', '2.45', '3.01', '3.10', '3.21']
>>> print(sorted([sanitize(t)for t in julie]))
['2.11', '2.11', '2.23', '2.23', '2.59', '3.10', '3.10', '3.21', '3.21']
>>> print(sorted([sanitize(t)for t in mikey]))
['2.22', '2.38', '2.49', '3.01', '3.01', '3.02', '3.02', '3.02', '3.22']
>>> print(sorted([sanitize(t)for t in sarah]))
['2.18', '2.25', '2.39', '2.54', '2.55', '2.55', '2.55', '2.58', '2.58']

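Python's list comprehensions are one example of the language's support for functional programming concepts.

The beauty of list comprehensions: they can greatly reduce the amount of code that has to be maintained.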
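To make the saving concrete, the sketch below writes one transformation twice, first as an explicit loop and then as a list comprehension. The data is made up for illustration:

```python
mins = [1, 2, 3]  # made-up sample data: minutes

# Loop form: create a list, iterate, transform, append.
secs_loop = []
for m in mins:
    secs_loop.append(m * 60)

# Comprehension form: the same steps on one line.
secs_comp = [m * 60 for m in mins]

print(secs_comp)
print(secs_loop == secs_comp)
```

The comprehension replaces three lines with one and leaves no separate empty-list initialisation to keep in sync.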

Removing duplicates by iteration:

>>> unique_james=[]
>>> for each_t in james:
    if each_t not in unique_james:
        unique_james.append(each_t)

>>> print(unique_james[0:3])
['2-34', '3:21', '2.34']

The not in operator is used to skip items that are already in the new list, which filters out the duplicates.

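Removing duplicates with a set:

set() creates a new set. It is a "factory function", that is, a function used to create a new data item of a particular type.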
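A minimal sketch of the set-based approach, using a made-up list that contains repeats:

```python
times = ['2.01', '2.01', '2.22', '2.34', '2.34']  # sample data with repeats

unique = set(times)      # duplicates collapse; a set has no defined order
print(len(unique))
print(sorted(unique))    # sorted() turns the set into an ordered list
```

Because a set is unordered, it is usually passed through sorted() before slicing or printing.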

Define a new function to shorten the code: it splits the line and strips the whitespace before returning the data to the caller.

>>> def get_coach_data(filename):
    try:
        with open(filename)as f:
            data=f.readline()
        return(data.strip().split(','))
    except IOError as ioerr:
        print('File error: '+ str(ioerr))
        return(None)

    
>>> sarah1 = get_coach_data('sarah.txt')
>>> print(sorted(set([sanitize(t)for t in james]))[0:3])
['2.01', '2.22', '2.34']


>>> print(sarah1)
['2:58', '2.58', '2:39', '2-25', '2-55', '2:54', '2.18', '2:55', '2:55']
>>> print(sorted(set([sanitize(t)for t in sarah1]))[0:3])
['2.18', '2.25', '2.39']

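Function chaining: a chain such as print(sorted(set([sanitize(t)for t in sarah1]))[0:3]) has to be read from right to left (from the innermost call outward), the opposite of method chaining.

In essence it is a set of nested function calls.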
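The nesting can be unwound into one step per line. The sketch below does this for Sarah's data as printed earlier in these notes; the two replace() calls are a stand-in for sanitize() so that the block runs on its own:

```python
# Sarah's times as printed earlier in these notes.
sarah = ['2:58', '2.58', '2:39', '2-25', '2-55', '2:54', '2.18', '2:55', '2:55']

# Innermost step: clean every string (stand-in for sanitize()).
cleaned = [t.replace('-', '.').replace(':', '.') for t in sarah]
unique = set(cleaned)      # next: drop duplicates
ordered = sorted(unique)   # next: sort into a new list
top3 = ordered[0:3]        # last step before print(): take the first three
print(top3)
```

Each line corresponds to one layer of the nested call, starting from the innermost.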

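Summary

  Python terms:

    1 In-place sorting: transforms and then replaces;

    2 Copied sorting: transforms and then returns;

    3 Method chaining: applies a series of methods to data;

    4 Function chaining: applies a series of functions to data;

    5 List comprehension: specifies a transformation on one line;

    6 Slice: accesses more than one item from a list;

    7 Set: an unordered collection of data items that contains no duplicates.

  Specific methods:

    1 sort(): in-place sorting;

    2 sorted(): copied sorting;

    3 The following code:

        new=[]
        for t in old:
            new.append(len(t))

      can be replaced by a list comprehension: new=[len(t) for t in old];

    4 Slicing: my_list[3:6] accesses the items of my_list from index 3 up to, but not including, index 6;

    5 The set() factory function creates a set.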
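The slice boundaries are easy to get wrong, so here is a short check with made-up data. The start index is included and the stop index is excluded:

```python
my_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']  # made-up data

middle = my_list[3:6]   # items at indexes 3, 4 and 5; index 6 is excluded
first3 = my_list[0:3]   # same result as my_list[:3]

print(middle)
print(first3)
```

A slice [start:stop] therefore always contains stop - start items when both indexes are within the list.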

------------------------------------------------The End of Fifth Chapter------------------------------------------------

posted @ 2017-09-25 14:12 Blog_WHP
