
      CentOS 7: Distributed Crawling with Scrapy + scrapy-redis on Docker

      Principle: scrapy-redis moves the request queue and the dedup fingerprint set into Redis, so any number of Scrapy workers can share one crawl state and pull requests from the same queue.

      1. Scrapy distributed crawler configuration:

      settings.py

      BOT_NAME = 'first'
      
      SPIDER_MODULES = ['first.spiders']
      NEWSPIDER_MODULE = 'first.spiders'
      
      
      # Crawl responsibly by identifying yourself (and your website) on the user-agent
      #USER_AGENT = 'first (+http://www.yourdomain.com)'
      
      # Obey robots.txt rules
      ROBOTSTXT_OBEY = False
      
      # Configure a delay for requests for the same website (default: 0)
      # See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
      # See also autothrottle settings and docs
      #DOWNLOAD_DELAY = 3
      # The download delay setting will honor only one of:
      #CONCURRENT_REQUESTS_PER_DOMAIN = 16
      #CONCURRENT_REQUESTS_PER_IP = 16
      
      # Disable cookies (enabled by default)
      #COOKIES_ENABLED = False
      
      
      ITEM_PIPELINES = {
         # 'first.pipelines.FirstPipeline': 300,
         'scrapy_redis.pipelines.RedisPipeline':300,
         'first.pipelines.VideoPipeline': 100,
      }
      
      # Distributed crawling
      # Redis connection parameters
      REDIS_HOST = '172.17.0.2'
      REDIS_PORT = 15672  # note: the Redis default is 6379; this must match the port Redis actually listens on
      REDIS_ENCODING = 'utf-8'
      # REDIS_PARAMS = {
      #     'password': '123456',  # password of the Redis server
      # }
      # Use the scrapy_redis scheduler, which hands requests out through Redis
      SCHEDULER = "scrapy_redis.scheduler.Scheduler"
      # Ensure all spiders share the same dedup fingerprints
      DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
      # Request scheduling strategy; the priority queue is the default
      SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.PriorityQueue'
      # (Optional) Keep the scrapy-redis queues in Redis instead of clearing them,
      # which allows pausing and resuming the crawl
      SCHEDULER_PERSIST = True
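
      Before launching any workers, it is worth confirming that these connection parameters actually reach Redis. A minimal sketch with redis-py (already a dependency of the spider below), reusing the host and port from the settings:

      # Quick connectivity check against the Redis instance configured above
      import redis

      r = redis.StrictRedis(host='172.17.0.2', port=15672, encoding='utf-8')
      print(r.ping())  # True when the scheduler will be able to connect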

      spider.py

      # -*- coding: utf-8 -*-
      import scrapy
      from ..items import Video
      from scrapy_redis.spiders import RedisSpider


      class VideoRedis(RedisSpider):
          name = 'video_redis'
          allowed_domains = ['zuidazy2.net']
          # start_urls = ['http://zuidazy2.net/']
          # The spider idles until a start URL is pushed to this Redis key
          redis_key = 'zuidazy2:start_urls'

          def parse(self, response):
              # Each <li> in the listing is one video entry; iterate over all
              # of them rather than taking only the first match on the page
              for row in response.xpath('//div[@class="xing_vb"]/ul/li'):
                  name = row.xpath('span[@class="xing_vb4"]/a/text()').extract_first()
                  url = row.xpath('span[@class="xing_vb4"]/a/@href').extract_first()
                  v_type = row.xpath('span[@class="xing_vb5"]/text()').extract_first()
                  u_time = row.xpath('span[@class="xing_vb6"]/text()').extract_first()
                  if name is not None:
                      item = Video()
                      item['name'] = name
                      item['v_type'] = v_type
                      item['u_time'] = u_time
                      url = 'http://www.zuidazy2.net' + url
                      yield scrapy.Request(url, callback=self.info_data,
                                           meta={'item': item}, dont_filter=True)
              # Follow the pager: the second-to-last link is the next page
              next_link = response.xpath('//div[@class="xing_vb"]/ul/li/div[@class="pages"]/a[last()-1]/@href').extract_first()
              if next_link:
                  yield scrapy.Request('http://www.zuidazy2.net' + next_link,
                                       callback=self.parse, dont_filter=True)

          def info_data(self, response):
              # Detail page: attach the play URLs to the item passed via meta
              item = response.meta['item']
              res = response.xpath('//div[@id="play_2"]/ul/li/text()').extract()
              item['url'] = res if res else ''
              yield item
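
      To sanity-check the listing-page XPaths without hitting the site, they can be run against a hand-written HTML snippet shaped the way parse() expects. The markup below is an assumption for illustration, not a capture of the real page:

      # Offline check of the listing-page XPaths; the HTML is a hand-made
      # stand-in for zuidazy2.net markup, not real page data.
      from scrapy.selector import Selector

      html = '''
      <div class="xing_vb"><ul>
        <li><span class="xing_vb4"><a href="/?m=vod-detail-id-1.html">Example Title</a></span>
            <span class="xing_vb5">Example Type</span>
            <span class="xing_vb6">2020-02-26</span></li>
      </ul></div>
      '''
      sel = Selector(text=html)
      for row in sel.xpath('//div[@class="xing_vb"]/ul/li'):
          print(row.xpath('span[@class="xing_vb4"]/a/text()').extract_first(),
                row.xpath('span[@class="xing_vb5"]/text()').extract_first())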

      items.py

      import scrapy
      
      
      class FirstItem(scrapy.Item):
          # define the fields for your item here like:
          content = scrapy.Field()
      
      class Video(scrapy.Item):
          name = scrapy.Field()
          url = scrapy.Field()
          v_type = scrapy.Field()
          u_time = scrapy.Field()

      pipelines.py

      import pymysql


      class FirstPipeline(object):
          def process_item(self, item, spider):
              print('*' * 30, item['content'])
              return item


      class VideoPipeline(object):
          def __init__(self):
              self.conn = pymysql.connect(host="39.99.37.85", port=3306, user="root",
                                          password="", database="queyou")
              self.cur = self.conn.cursor()

          def process_item(self, item, spider):
              try:
                  # Parameterized query: avoids the quoting bugs and SQL injection
                  # risk of building the statement by string concatenation.
                  # item['url'] is a list of play links; join it for storage.
                  self.cur.execute(
                      "insert into video values(0, %s, %s, %s, %s)",
                      (item['name'], ','.join(item['url']), item['v_type'], item['u_time'])
                  )
                  self.conn.commit()
              except Exception as e:
                  print(e)
              finally:
                  return item

          def close_spider(self, spider):
              self.conn.commit()  # flush any pending writes
              self.cur.close()
              self.conn.close()
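
      The insert assumes a five-column video table whose first column is an auto-increment id (passing 0 lets MySQL assign it). The post never shows the DDL, so the column names below are assumptions that merely match the insert:

      # Hypothetical schema matching the insert above; column names are assumed.
      import pymysql

      conn = pymysql.connect(host="39.99.37.85", port=3306, user="root",
                             password="", database="queyou")
      with conn.cursor() as cur:
          cur.execute("""
              CREATE TABLE IF NOT EXISTS video (
                  id INT AUTO_INCREMENT PRIMARY KEY,
                  name VARCHAR(255),
                  url TEXT,
                  v_type VARCHAR(64),
                  u_time VARCHAR(64)
              )
          """)
      conn.commit()
      conn.close()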

      2. Install Docker, pull the centos7 image, and install the dependencies (skipped here)

      3. Copy the project from the host into the CentOS 7 container

      docker cp /root/first 35fa:/root   # 35fa is the (abbreviated) container ID

      4. Install Redis and enable remote connections

      Remote access is configured in /etc/redis.conf (e.g. adjust the bind address and protected-mode); then start the service:

      redis-server /etc/redis.conf

      5. Run multiple containers and link them so they can communicate

      Run the first container (docker run takes an image, not a container ID)
      docker run -tid --name 11 IMAGE_ID
      
      Run a second container linked to the first
      docker run -tid --name 22 --link 11 IMAGE_ID
      
      Attach to a container:  docker attach CONTAINER_ID
      
      Detach without killing the container (a plain exit would stop it):  Ctrl+P, then Ctrl+Q
      Inspect the name-to-IP mappings the link created:  cat /etc/hosts
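
      One nicety of --link: inside container 22 the first container is reachable by name instead of by the hard-coded bridge IP 172.17.0.2. A sketch, assuming Redis runs in container 11 on the same port as in settings.py:

      # Inside container 22: '--link 11' adds an /etc/hosts entry for '11',
      # so the container name can stand in for the fixed IP.
      import redis

      r = redis.StrictRedis(host='11', port=15672)  # assumes Redis runs in container 11
      print(r.ping())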

       

      6. Start the spider in every container and push the start_url (see the seeding sketch below)
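
      Each worker is launched with scrapy crawl video_redis and then idles until a URL appears under redis_key. A minimal seeding sketch with redis-py, reusing the commented-out start URL from the spider (redis-cli LPUSH works just as well):

      # Seed the crawl: every idle worker is blocked on 'zuidazy2:start_urls'
      import redis

      r = redis.StrictRedis(host='172.17.0.2', port=15672)
      r.lpush('zuidazy2:start_urls', 'http://zuidazy2.net/')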

       

       Done!

      A packaged image is attached: https://pan.baidu.com/s/1Sj244da0pOZvL3SZ_qagUg (extraction code: qp23)

      posted @ 2020-02-26 17:39  Xcsg