08 Multitasking Crawlers

Review of threads and processes

Ways to implement a multitasking crawler: multiple processes / multiple threads

Process: the smallest unit of resource allocation in the operating system

Thread: the smallest unit that the CPU can schedule

When we run a Python script, the operating system creates a process, a thread is created inside that process, and the code is executed by that thread.
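To see the process and thread described above, a small sketch like the following can help. The function name describe_current_execution is made up for this note; os.getpid and threading.current_thread are standard library calls.

```python
import os
import threading


def describe_current_execution():
    """Return the id of the running process and the name of the running thread."""
    return {
        "pid": os.getpid(),
        "thread": threading.current_thread().name,
        "active_threads": threading.active_count(),
    }


if __name__ == "__main__":
    info = describe_current_execution()
    # When run as a plain script, the thread name is normally "MainThread".
    print(info)
```

The pid changes from run to run, so no fixed output is shown here.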

Creating processes/threads

import time
from multiprocessing import Process


# 1. Put the work the child process should do inside a function
def func1():
    print('start')
    time.sleep(3)
    print('done')

if __name__ == '__main__':
    # 2. Create a process object with the Process class and bind the function to it
    p1 = Process(target=func1)
    # 3. Call start() on the process object
    p1.start()

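# Alternative: subclass Process and override run()
import time
from multiprocessing import Process


class MyProcess(Process):
    def __init__(self, value):
        super().__init__()
        self.value = value

    def run(self):
        # run() is what the child process executes after start() is called
        print('start')
        print(self.value)
        time.sleep(3)
        print('done')


if __name__ == '__main__':
    p1 = MyProcess("jcx")
    p1.start()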
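The heading also mentions threads, which the examples above do not show. As a rough sketch, the same two styles (a target function, or a subclass that overrides run) look like this with threading.Thread. The results list and the run_threads helper are assumptions added for illustration, and the sleep is shortened to keep the demo quick.

```python
import time
from threading import Thread


def func1(results):
    # Work for the thread, written as a plain function (mirrors the Process example).
    time.sleep(0.1)
    results.append("func1 done")


class MyThread(Thread):
    def __init__(self, value, results):
        super().__init__()
        self.value = value
        self.results = results

    def run(self):
        # run() is what executes in the new thread after start() is called.
        time.sleep(0.1)
        self.results.append(f"MyThread done: {self.value}")


def run_threads():
    results = []
    t1 = Thread(target=func1, args=(results,))
    t2 = MyThread("jcx", results)
    t1.start()
    t2.start()
    # join() blocks until each thread has finished.
    t1.join()
    t2.join()
    return results


if __name__ == "__main__":
    print(run_threads())
```

The order of the two entries in the list is not guaranteed, since the threads run concurrently.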

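Process characteristics

A process we start is a child of the original process, and the child gets its own copy of the parent's code
The child only runs the target function; everything else is executed by the parent, but the child also has its own copy of the parent's resources
Processes run independently of each other, with no guaranteed order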
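A minimal sketch of the "independent" point above: the child changes its own copy of a module-level variable, and the parent's copy stays as it was. The counter dict and the bump function are made-up names for this illustration.

```python
from multiprocessing import Process

counter = {"value": 0}


def bump():
    # Runs in the child: changes the child's own copy of `counter`.
    counter["value"] += 1
    print("child sees", counter["value"])


if __name__ == "__main__":
    p = Process(target=bump)
    p.start()
    p.join()  # wait for the child to finish
    # The parent's copy was not touched by the child.
    print("parent sees", counter["value"])
```

Run as a script, the child reports 1 while the parent still reports 0. To pass data back to the parent, something like multiprocessing.Queue is needed.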

All threads within a process share that process's resources
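By contrast with processes, threads see the same objects. The sketch below, with made-up names shared, add_many and run_shared_demo, has several threads update one dict. The lock is there because shared data also means threads can interfere with each other's updates.

```python
import threading

shared = {"count": 0}
lock = threading.Lock()


def add_many(n):
    for _ in range(n):
        # The lock keeps the read-modify-write step from interleaving.
        with lock:
            shared["count"] += 1


def run_shared_demo(workers=4, n=1000):
    shared["count"] = 0
    threads = [threading.Thread(target=add_many, args=(n,)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Every thread updated the same dict, so the total is workers * n.
    return shared["count"]


if __name__ == "__main__":
    print(run_shared_demo())
```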

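GIL: Global Interpreter Lock

The GIL is a feature of CPython, not of the Python language itself
The Python we normally install is CPython
Because of the GIL, only one thread in a process can execute Python bytecode at any given moment
To take advantage of multiple CPU cores, use processes
If multiple cores are not needed (for example I/O-bound work such as crawling, where threads spend most of their time waiting), use threads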
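Why threads are still useful for crawlers despite the GIL can be sketched with a fake request. Here time.sleep stands in for waiting on the network (an assumption of this demo, not real HTTP), and sleeping releases the GIL, so the waits overlap. All function names below are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(delay):
    # Stand-in for a network call; time.sleep releases the GIL while waiting.
    time.sleep(delay)
    return delay


def timed(fn):
    # Return how many seconds fn() took to run.
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def run_sequential(delays):
    return [fake_request(d) for d in delays]


def run_threaded(delays):
    # One worker per task, so all the waits happen at the same time.
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        return list(pool.map(fake_request, delays))


if __name__ == "__main__":
    delays = [0.2] * 5
    print("sequential:", timed(lambda: run_sequential(delays)))
    print("threaded:  ", timed(lambda: run_threaded(delays)))
```

The sequential run takes roughly the sum of the delays, while the threaded run takes roughly the longest single delay. For CPU-heavy work the threaded version would not show this gain under the GIL.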

LoL hero skins

import os

import requests
from concurrent.futures import ThreadPoolExecutor


class LOLImageSpider:
    def __init__(self):
        self.hero_list_url = 'https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js?ts=2795830'
        self.hero_info_url = 'https://game.gtimg.cn/images/lol/act/img/js/hero/{}.js?ts=2795830'
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 2.0.50727; SLCC2; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; Tablet PC 2.0; .NET4.0E)'
        }
        self.pool = ThreadPoolExecutor(10)

    def get_hero_url(self, url):
        """Fetch each hero id and request that hero's skin data."""
        json_data = requests.get(url, headers=self.headers).json()

        hero_list = json_data['hero']

        for hero in hero_list:
            hero_id = hero['heroId']
            info_url = self.hero_info_url.format(hero_id)
            response = requests.get(info_url, headers=self.headers).json()
            self.parse_data(response)

    def parse_data(self, response):
        """Extract the image links and hand each download to the thread pool."""
        skins_list = response["skins"]
        for skins in skins_list:
            name = skins['name']
            main_img = skins['mainImg']
            if main_img:
                self.pool.submit(self.save, name, main_img)

    def save(self, name, main_img):
        """Download one image and save it to disk."""
        img = requests.get(main_img, headers=self.headers).content
        dirname = 'lol1'
        # Create the folder in the current working directory if it is missing;
        # exist_ok avoids an error when several threads reach this line at once
        os.makedirs(dirname, exist_ok=True)

        # e.g. lol1/xx.jpg
        with open(os.path.join(dirname, f"{name.replace('/', '')}.jpg"), 'wb') as f:
            f.write(img)
        print(name, main_img)

    def run(self):
        self.get_hero_url(self.hero_list_url)


if __name__ == '__main__':
    spider = LOLImageSpider()
    spider.run()
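One thing to keep in mind with pool.submit: if a task raises an exception, nothing is printed unless the returned future is checked. The sketch below shows one possible way to collect futures and report failures. fake_save is a hypothetical stand-in for the save method above (no network involved), and save_all is a made-up helper.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_save(name):
    # Hypothetical stand-in for the save step; fails for one name on purpose.
    if name == "broken":
        raise ValueError(f"could not download {name}")
    return f"{name}.jpg"


def save_all(names, max_workers=10):
    saved, failed = [], []
    # The with-block waits for every submitted task before exiting.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fake_save, n): n for n in names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                # result() re-raises any exception from the task.
                saved.append(future.result())
            except Exception as exc:
                failed.append((name, str(exc)))
    return sorted(saved), failed


if __name__ == "__main__":
    print(save_all(["a", "broken", "b"]))
```

In the spider, the same idea would mean keeping the futures returned by self.pool.submit and checking them once the hero loop has finished.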

posted @ 2023-04-05 00:47 LePenseur