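Scraping the Douban Movie Top 250
Goal
Practice writing a web crawler by scraping the Douban chart, and learn how to extract information from static pages.
Douban Movie Top 250: https://movie.douban.com/top250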
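
The chart is paginated: the script in this post requests ten pages of 25 movies each, selected by the `start` query parameter. As a minimal sketch of that URL pattern, the helper `page_urls` below is hypothetical (it is not part of the original post) and builds the same list with the standard library's `urlencode`. The page size of 25 and the empty `filter` parameter are taken from the URLs the script uses.

```python
from urllib.parse import urlencode

BASE_URL = 'https://movie.douban.com/top250'


def page_urls(total=250, page_size=25):
    """Build one URL per result page (hypothetical helper for illustration)."""
    return [BASE_URL + '?' + urlencode({'start': start, 'filter': ''})
            for start in range(0, total, page_size)]


urls = page_urls()
print(len(urls))   # number of pages
print(urls[0])     # first page
print(urls[-1])    # last page
```

Building the query string with `urlencode` avoids manual string concatenation, which becomes error-prone once more parameters are involved.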

Code
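import requests
from bs4 import BeautifulSoup

# Assumption: Douban may reject the default requests User-Agent,
# so a browser-like header is sent with every request.
HEADERS = {'User-Agent': 'Mozilla/5.0'}


def getHTMLText(url):
    """Return the page text, or None if the request fails."""
    try:
        r = requests.get(url, headers=HEADERS, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException as e:
        print('Request failed:', url, e)
        return None


if __name__ == '__main__':
    i = 0  # running rank, used as the file name prefix
    # 10 pages of 25 movies each: start=0, 25, ..., 225
    urls = ['https://movie.douban.com/top250?start=' + str(n) + '&filter='
            for n in range(0, 250, 25)]
    for url in urls:
        html = getHTMLText(url)
        if html is None:
            continue  # skip pages that could not be fetched
        soup = BeautifulSoup(html, 'html.parser')
        titles = soup.select('div.hd a')
        rates = soup.select('span.rating_num')
        pics = soup.select('img[width="100"]')
        for title, rate, pic in zip(titles, rates, pics):
            data = {'title': list(title.stripped_strings),
                    'rate': rate.get_text(),
                    'pic': pic.get('src')}
            i += 1
            # File name pattern: <rank>_<title> <rating>分.jpg ('分' means "points")
            fileName = str(i) + '_' + data['title'][0] + ' ' + data['rate'] + '分.jpg'
            pic1 = requests.get(data['pic'], headers=HEADERS, timeout=30)
            # The G:\test\ directory must already exist
            with open('G:\\test\\' + fileName, 'wb') as photo:
                photo.write(pic1.content)
            print(data)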
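
Movie titles can contain characters that Windows does not allow in file names (such as `:` or `?`), and the hard-coded `G:\test\` directory has to exist before the script runs. The following is a minimal sketch of one way to handle both. The helpers `safe_name` and `poster_path` are hypothetical and not part of the original post, and the file name follows the script's `<rank>_<title> <rating>` scheme without the trailing suffix.

```python
import re
from pathlib import Path

# Characters that Windows rejects in file names
INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def safe_name(name):
    """Replace characters that are invalid in Windows file names with '_'."""
    return INVALID_CHARS.sub('_', name).strip()


def poster_path(out_dir, rank, title, rate):
    """Build the target path, creating the output directory if needed."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / safe_name(f'{rank}_{title} {rate}.jpg')
```

In the download loop, the `open(...)` call could then take `poster_path('G:/test', i, data['title'][0], data['rate'])` instead of a concatenated string.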
Scraping results

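Author: 九命貓幺
Blog source: http://www.rzrgm.cn/yongestcat/
Reposting is welcome; please credit the source.
If you found this post useful and it helped your learning, please click the Recommend button at the bottom right.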