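Python (Head First) Study Notes: Part 5
5 Comprehending Data: Processing, Formatting, Encoding, Decoding, and Sorting Data
Processing data: download the resource files for Head First Python, namely james.txt, julie.txt, mikey.txt, and sarah.txt.
Example 1: Open the files above and extract the data into lists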
>>> with open('james.txt') as jaf:
        data = jaf.readline()
>>> james = data.strip().split(',')
>>> with open('julie.txt') as juf:
        data = juf.readline()
>>> julie = data.strip().split(',')
>>> print(james)
['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']
>>> print(julie)
['2.59', '2.11', '2:11', '2:23', '3-10', '2-23', '3:10', '3.21', '3-21']
>>> with open('mikey.txt') as mif:
        data = mif.readline()
>>> mikey = data.strip().split(',')
>>> with open('sarah.txt') as saf:
        data = saf.readline()
>>> sarah = data.strip().split(',')
>>> print(mikey)
['2:22', '3.01', '3:01', '3.02', '3:02', '3.02', '3:22', '2.49', '2:38']
>>> print(sarah)
['2:58', '2.58', '2:39', '2-25', '2-55', '2:54', '2.18', '2:55', '2:55']
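Code of the form data.strip().split(',') is called method chaining.
The first method, strip(), is applied to the line of data held in data and removes unwanted leading and trailing whitespace from the string;
the second method, split(','), splits the stripped string at each comma and creates a list.
Several methods can be chained together in this way to produce the required result. Read a method chain from left to right.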
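To make the left-to-right reading concrete, here is a minimal sketch that compares the chained form with a step-by-step version. The sample line is made up for illustration and only mimics the shape of the data files.

```python
# A made-up line in the same shape as the data files (trailing newline included).
line = "2-34,3:21,2.34\n"

# Chained form: strip() runs first, then split(',') runs on its result.
times = line.strip().split(',')

# Equivalent step-by-step form using a temporary variable.
stripped = line.strip()               # '2-34,3:21,2.34'
times_stepwise = stripped.split(',')  # ['2-34', '3:21', '2.34']

print(times)                    # ['2-34', '3:21', '2.34']
print(times == times_stepwise)  # True
```

The chained form simply avoids naming the intermediate string.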
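Sorting: there are two approaches
1. In-place sorting
The sort() method replaces the original data with the newly sorted data;
2. Copied sorting
The sorted() function keeps the original data and returns a new, sorted copy;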
>>> data2=[6,3,1,2,4,5]
>>> data2
[6, 3, 1, 2, 4, 5]
>>> sorted(data2)
[1, 2, 3, 4, 5, 6]
>>> data2
[6, 3, 1, 2, 4, 5]
>>> data3=sorted(data2)
>>> data3
[1, 2, 3, 4, 5, 6]
>>> data1=[2,4,6,5,1,3]
>>> data1.sort()
>>> data1
[1, 2, 3, 4, 5, 6]
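One practical consequence of the difference is worth a small sketch: sort() returns None, so assigning its result to a variable is a common slip. The variable names here are illustrative only.

```python
data = [6, 3, 1, 2, 4, 5]

copied = sorted(data)      # new sorted list; data is untouched at this point
snapshot = list(data)      # copy of data taken before the in-place sort

result = data.sort()       # sorts data itself and returns None

print(copied)    # [1, 2, 3, 4, 5, 6]
print(snapshot)  # [6, 3, 1, 2, 4, 5]
print(data)      # [1, 2, 3, 4, 5, 6]
print(result)    # None
```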
Use print(sorted(data)) to display the james, julie, mikey, and sarah lists from earlier:
>>> print(sorted(james))
['2-22', '2-34', '2.34', '2.45', '2:01', '2:01', '3.01', '3:10', '3:21']
>>> print(sorted(julie))
['2-23', '2.11', '2.59', '2:11', '2:23', '3-10', '3-21', '3.21', '3:10']
>>> print(sorted(mikey))
['2.49', '2:22', '2:38', '3.01', '3.02', '3.02', '3:01', '3:02', '3:22']
>>> print(sorted(sarah))
['2-25', '2-55', '2.18', '2.58', '2:39', '2:54', '2:55', '2:55', '2:58']
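The sort order is not right: the goal is times ordered from smallest to largest, reading left to right.
Look closely and you will see the '-', ':', and '.' characters. The items are strings, so they are compared character by character, and the inconsistent separators distort the order.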
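The effect of the separators can be checked directly. This short sketch prints the character codes involved and sorts three made-up strings that differ mainly in their separator.

```python
# Character codes of the three separators.
codes = {ch: ord(ch) for ch in "-.:"}
print(codes)  # {'-': 45, '.': 46, ':': 58}

# Strings are ordered character by character, so when the first character
# ties, the separator decides the order rather than the time value.
mixed = ['2:01', '2.34', '2-22']
print(sorted(mixed))  # ['2-22', '2.34', '2:01']
```

Here '2:01' is the smallest time but sorts last, because ':' has the highest code of the three separators.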
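Next, create a function named sanitize(). It takes one time string from an athlete's list,
replaces any '-' or ':' it finds with '.', and returns the cleaned string. If the string
already contains '.', no cleaning is needed.
The function is as follows: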
>>> def sanitize(time_string):
        if '-' in time_string:
            splitter = '-'
        elif ':' in time_string:
            splitter = ':'
        else:
            return(time_string)
        (mins, secs) = time_string.split(splitter)
        return(mins + '.' + secs)
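For comparison, a shorter variant can be sketched with str.replace(). This is not the version used in these notes; the name sanitize_with_replace is hypothetical, and it assumes each string contains at most one separator, as the sample data does.

```python
def sanitize_with_replace(time_string):
    """Hypothetical alternative: turn every '-' and ':' into '.'."""
    return time_string.replace('-', '.').replace(':', '.')

samples = ['2-34', '3:21', '2.34']
cleaned = [sanitize_with_replace(t) for t in samples]
print(cleaned)  # ['2.34', '3.21', '2.34']
```

One difference in behaviour: sanitize() unpacks exactly two parts and raises ValueError if a string contains the same separator twice, whereas this variant silently replaces every occurrence.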
Example 2: Sort the lists produced from the four files correctly
>>> with open('james.txt') as jaf:
        data = jaf.readline()
>>> james = data.strip().split(',')
>>> with open('julie.txt') as juf:
        data = juf.readline()
>>> julie = data.strip().split(',')
>>> with open('mikey.txt') as mif:
        data = mif.readline()
>>> mikey = data.strip().split(',')
>>> with open('sarah.txt') as saf:
        data = saf.readline()
>>> sarah = data.strip().split(',')
>>> clean_james = []
>>> clean_julie = []
>>> clean_mikey = []
>>> clean_sarah = []
>>> for each_t in james:
        clean_james.append(sanitize(each_t))
>>> for each_t in julie:
        clean_julie.append(sanitize(each_t))
>>> for each_t in mikey:
        clean_mikey.append(sanitize(each_t))
>>> for each_t in sarah:
        clean_sarah.append(sanitize(each_t))
>>> print(clean_james)
['2.34', '3.21', '2.34', '2.45', '3.01', '2.01', '2.01', '3.10', '2.22']
>>> print(clean_julie)
['2.59', '2.11', '2.11', '2.23', '3.10', '2.23', '3.10', '3.21', '3.21']
>>> print(clean_mikey)
['2.22', '3.01', '3.01', '3.02', '3.02', '3.02', '3.22', '2.49', '2.38']
>>> print(clean_sarah)
['2.58', '2.58', '2.39', '2.25', '2.55', '2.54', '2.18', '2.55', '2.55']
Sorting again gives:
>>> print(sorted(clean_james))
['2.01', '2.01', '2.22', '2.34', '2.34', '2.45', '3.01', '3.10', '3.21']
>>> print(sorted(clean_julie))
['2.11', '2.11', '2.23', '2.23', '2.59', '3.10', '3.10', '3.21', '3.21']
>>> print(sorted(clean_mikey))
['2.22', '2.38', '2.49', '3.01', '3.01', '3.02', '3.02', '3.02', '3.22']
>>> print(sorted(clean_sarah))
['2.18', '2.25', '2.39', '2.54', '2.55', '2.55', '2.55', '2.58', '2.58']
List comprehensions
>>> print(sorted([sanitize(t)for t in james]))
['2.01', '2.01', '2.22', '2.34', '2.34', '2.45', '3.01', '3.10', '3.21']
>>> print(sorted([sanitize(t)for t in julie]))
['2.11', '2.11', '2.23', '2.23', '2.59', '3.10', '3.10', '3.21', '3.21']
>>> print(sorted([sanitize(t)for t in mikey]))
['2.22', '2.38', '2.49', '3.01', '3.01', '3.02', '3.02', '3.02', '3.22']
>>> print(sorted([sanitize(t)for t in sarah]))
['2.18', '2.25', '2.39', '2.54', '2.55', '2.55', '2.55', '2.58', '2.58']
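Python's list comprehensions are one example of the language's support for functional programming concepts.
The beauty of list comprehensions: they can greatly reduce the amount of code you have to maintain.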
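As a quick check that the comprehension does the same work as the explicit loop, the sketch below builds the cleaned list both ways from the james data shown earlier. sanitize() is repeated here only so the snippet runs on its own.

```python
def sanitize(time_string):
    # Same logic as the sanitize() defined in these notes.
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)
    return mins + '.' + secs

james = ['2-34', '3:21', '2.34', '2.45', '3.01', '2:01', '2:01', '3:10', '2-22']

# Loop form: create an empty list, then append one item at a time.
clean_loop = []
for each_t in james:
    clean_loop.append(sanitize(each_t))

# Comprehension form: the same transformation on one line.
clean_comp = [sanitize(t) for t in james]

print(clean_loop == clean_comp)  # True
print(sorted(clean_comp))
```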
Removing duplicates by iteration:
>>> unique_james = []
>>> for each_t in james:
        if each_t not in unique_james:
            unique_james.append(each_t)
>>> print(unique_james[0:3])
['2-34', '3:21', '2.34']
The not in operator is used to filter duplicates out of the list.
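Removing duplicates with a set:
set() creates a new set. It is a "factory function", that is, a function used to create a new data item of a particular type.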
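A small sketch of what set() does to the cleaned james times from earlier. The dict.fromkeys() line is an extra not covered in these notes: it is one way to drop duplicates while keeping first-seen order, which relies on dictionaries preserving insertion order (Python 3.7 or later).

```python
clean_james = ['2.34', '3.21', '2.34', '2.45', '3.01', '2.01', '2.01', '3.10', '2.22']

# A set keeps one copy of each value but has no defined order.
unique = set(clean_james)
print(len(clean_james), len(unique))  # 9 7

# sorted() accepts any iterable, so it turns the set back into an ordered list.
print(sorted(unique))

# Assumption: Python 3.7+, where dict keys keep insertion order.
ordered_unique = list(dict.fromkeys(clean_james))
print(ordered_unique)
```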
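Define a new function to streamline the code: it performs the split and whitespace-stripping steps before returning the data to the calling code.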
>>> def get_coach_data(filename):
        try:
            with open(filename) as f:
                data = f.readline()
            return(data.strip().split(','))
        except IOError as ioerr:
            print('File error: ' + str(ioerr))
            return(None)
>>> sarah1 = get_coach_data('sarah.txt')
>>> print(sorted(set([sanitize(t) for t in james]))[0:3])
['2.01', '2.22', '2.34']
>>> print(sarah1)
['2:58', '2.58', '2:39', '2-25', '2-55', '2:54', '2.18', '2:55', '2:55']
>>> print(sorted(set([sanitize(t)for t in sarah1]))[0:3])
['2.18', '2.25', '2.39']
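Because get_coach_data() returns None when the file cannot be opened, the caller should check the result before using it. The sketch below exercises both paths; it writes a temporary copy of the sarah data with tempfile so that it runs without the downloaded files, and the name no_such_file.txt is made up.

```python
import os
import tempfile

def get_coach_data(filename):
    # Same logic as the function above: read one line, strip it, split on commas.
    try:
        with open(filename) as f:
            data = f.readline()
        return data.strip().split(',')
    except IOError as ioerr:
        print('File error: ' + str(ioerr))
        return None

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sarah.txt')
    with open(path, 'w') as f:
        f.write('2:58,2.58,2:39,2-25,2-55,2:54,2.18,2:55,2:55\n')

    found = get_coach_data(path)                                    # file exists
    missing = get_coach_data(os.path.join(tmp, 'no_such_file.txt')) # prints a file error

print(found[0:3])  # ['2:58', '2.58', '2:39']
print(missing)     # None
```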
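Function chaining: an expression such as print(sorted(set([sanitize(t) for t in sarah1]))[0:3]) has to be read from right to left, the exact opposite of a method chain.
In essence it is a series of nested function calls.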
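Unrolling a nested expression into named steps can make the evaluation order easier to follow. This sketch uses made-up data and str.strip() in place of sanitize() so that it stands alone.

```python
raw = [' 2.58', '2.18 ', '2.58', '2.39', '2.25']

# Nested form: the innermost call is evaluated first.
nested = sorted(set([t.strip() for t in raw]))[0:3]

# The same work, one step per line, in the order Python performs it.
cleaned = [t.strip() for t in raw]  # 1. transform each item
unique = set(cleaned)               # 2. drop duplicates
ordered = sorted(unique)            # 3. sort into a new list
top3 = ordered[0:3]                 # 4. slice the first three items

print(top3)            # ['2.18', '2.25', '2.39']
print(nested == top3)  # True
```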
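Summary
Python terms: 1 In-place sorting: transforms and then replaces;
2 Copied sorting: transforms and then returns;
3 Method chaining: applies a series of methods to data;
4 Function chaining: applies a series of functions to data;
5 List comprehension: specifies a transformation on one line;
6 Slice: accesses more than one item from a list;
7 Set: an unordered collection of data items that contains no duplicates.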
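Specific methods: 1 sort(): in-place sorting;
2 sorted(): copied sorting;
3 The following code:
new = []
for t in old:
    new.append(len(t))
can be replaced by a list comprehension: new = [len(t) for t in old];
4 Slicing: my_list[3:6] accesses the items of my_list from index 3 up to, but not including, index 6;
5 The set() factory function creates a set.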
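A short sketch of the slice boundaries from item 4, using a made-up list of letters so the positions are easy to see.

```python
my_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']

middle = my_list[3:6]  # indexes 3, 4 and 5; index 6 is excluded
first3 = my_list[:3]   # omitting the start means "from the beginning"
rest = my_list[6:]     # omitting the end means "through the last item"

print(middle)  # ['d', 'e', 'f']
print(first3)  # ['a', 'b', 'c']
print(rest)    # ['g', 'h']
```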
------------------------------------------------The End of Fifth Chapter------------------------------------------------
