The Selenium Module

Purpose

Conveniently retrieve data that a page loads dynamically
Conveniently simulate logging in

Basic usage

Environment setup:

pip install selenium

Download the web driver that matches your browser version: http://npm.taobao.org/mirrors/chromedriver (Chrome)

Selenium workflow:

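from selenium import webdriver

# Instantiate the web driver and bind it to the driver program; "driverpath" is the local path to the driver executable
# (executable_path is the Selenium 3 form; Selenium 4 passes the path through a Service object instead)
web = webdriver.Chrome(executable_path="driverpath")

# Send the request
web.get("url")

# Get the page source
page_text = web.page_source

# Close the browser
web.quit()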
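
The workflow above uses the Selenium 3 constructor. As a hedged sketch of the same four steps on Selenium 4, where the driver path goes through a `Service` object, the version below also wraps the session in `try`/`finally` so the browser is closed even if the request fails. The names `managed_driver`, `chrome_factory` and `fetch_page_source` are illustrative helpers, not part of Selenium, and `./chromedriver` is an assumed path.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with factory() and always quit it on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()


def chrome_factory(driver_path="./chromedriver"):
    """Build a Chrome driver the Selenium 4 way (the path is an assumption)."""
    # Imported lazily so the other helpers can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=driver_path))


def fetch_page_source(url, factory=chrome_factory):
    """Open url, return the rendered page source, then close the browser."""
    with managed_driver(factory) as web:
        web.get(url)
        return web.page_source
```

Calling `fetch_page_source("https://www.baidu.com/")` would then return the page HTML as a string, with the browser closed afterwards.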

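Selenium methods:

Locating elements:

The find methods:

# ============ find methods
# find_element* returns the first matching element; find_elements* returns a list of all matches.
# The find_element_by_* helpers below are the Selenium 3 API; newer Selenium 4 releases removed them in favor of find_element(by, value).

# Generic lookup by id, class name, tag name, etc.  Parameters: (by=<locator strategy>, value="<value to match>")
web.find_element()
web.find_elements()

# === Lookup helpers
# Find elements by their name attribute
web.find_element_by_name()
web.find_elements_by_name()
# Find elements by class name
web.find_element_by_class_name()
web.find_elements_by_class_name()
# Find elements by tag name
web.find_element_by_tag_name()
web.find_elements_by_tag_name()

# Find elements by id
web.find_element_by_id()
web.find_elements_by_id()


# Find links by their exact link text
web.find_element_by_link_text()
web.find_elements_by_link_text()

# Find links by partial link text
web.find_element_by_partial_link_text()
web.find_elements_by_partial_link_text()

# Find elements by XPath
web.find_element_by_xpath('//div/td[1]')
web.find_elements_by_xpath('//div/td[1]')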
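
Newer Selenium 4 releases no longer ship the `find_element_by_*` helpers listed above; every lookup goes through `find_element(by, value)` or `find_elements(by, value)`. The sketch below shows how each helper maps onto a locator strategy. The strategy strings are the values of the constants on `selenium.webdriver.common.by.By` (for example, `By.CLASS_NAME` is `"class name"`); `STRATEGIES`, `find_one` and `find_all` are hypothetical names introduced only for illustration.

```python
# Suffix of the old helper -> locator strategy string.
# These strings equal the By.* constants, e.g. By.ID == "id".
STRATEGIES = {
    "name": "name",
    "class_name": "class name",
    "tag_name": "tag name",
    "id": "id",
    "link_text": "link text",
    "partial_link_text": "partial link text",
    "xpath": "xpath",
    "css_selector": "css selector",
}


def find_one(driver, kind, value):
    """Equivalent of driver.find_element_by_<kind>(value)."""
    return driver.find_element(STRATEGIES[kind], value)


def find_all(driver, kind, value):
    """Equivalent of driver.find_elements_by_<kind>(value)."""
    return driver.find_elements(STRATEGIES[kind], value)
```

With a live driver, `find_one(web, "xpath", "//div/td[1]")` plays the role of `web.find_element_by_xpath('//div/td[1]')`.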

Page navigation:

Forward: web.forward()

Back: web.back()

Application example:

from selenium import webdriver
from lxml import etree
import time
# Instantiate the web driver and bind it to the driver program
web = webdriver.Chrome(executable_path="./chromedriver")
# Send the request
web.get("http://125.35.6.84:81/xk")
# Get the page source
page_text = web.page_source
# Parse the company information
tree = etree.HTML(page_text)
li_list = tree.xpath("//ul[@id='gzlist']/li")

for li in li_list:
    name = li.xpath('./dl/@title')[0]
    print(name)

time.sleep(2)

web.get("https://www.baidu.com/")
time.sleep(2)
# Go back to the previous page in the browser history
web.back()
time.sleep(2)

# Go forward to the next page in the browser history
web.forward()
time.sleep(3)
# Locate the search box on the Baidu home page
search_input = web.find_element_by_id("kw")
# Type text into the search box
search_input.send_keys("美女")
time.sleep(1)
# Locate the search button and click it to run the search
search = web.find_element_by_css_selector(".s_btn")
search.click()

time.sleep(5)
# Close the browser
web.quit()
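
The fixed `time.sleep` calls in the example either waste time or turn out too short when the page is slow. A common alternative is to poll until a condition holds; Selenium ships this as `WebDriverWait(driver, timeout).until(condition)`. The sketch below is a hand-written, library-free version of the same idea, meant only to show what such a wait does; `wait_until` and its parameters are names made up here.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value or timeout expires.

    Returns the truthy value; raises TimeoutError if time runs out.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)
```

For example, `wait_until(lambda: web.find_elements_by_id("kw"))` would return as soon as the search box exists, because `find_elements_by_id` returns an empty list until then.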

Iframes and action chains

What is an iframe?

An iframe is a way of nesting one front-end page inside another. It looks like this:

<div id="iframewrapper">
  <iframe frameborder="0" id="iframeResult" style="height: 302.6px;">
    <html>
      <head>
      </head>
      <body>
      </body>
    </html>
  </iframe>
</div>

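To work with elements inside an iframe in Selenium, proceed as follows:

Switch scope: before you can handle elements inside an iframe, you must first switch the driver's scope to that iframe: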

from selenium import webdriver
# Instantiate the web driver and bind it to the driver program
web = webdriver.Chrome(executable_path="./chromedriver")
# Send the request
web.get("http://125.35.6.84:81/xk")
# Switch scope into the iframe (pass the iframe element's id)
web.switch_to.frame("id of the iframe element")
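
Once the scope has been switched, lookups only see elements inside the iframe until you switch back with `web.switch_to.default_content()`. A small context manager can make that pairing harder to forget. This is only a sketch; `inside_frame` is a hypothetical helper name, and the frame reference is whatever `switch_to.frame` accepts (an id or name, an index, or the iframe element itself).

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into an iframe, and back to the top-level page afterwards."""
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Return to the main document even if the body raised.
        driver.switch_to.default_content()
```

Used as `with inside_frame(web, "login_frame"): ...`, element lookups inside the `with` body run against the iframe, and the scope is restored when the block ends.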

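Handling action chains

1. Import the ActionChains class and instantiate an action object

2. Create an action bound to the element it should act on

3. Call the methods that carry out the action

4. Release the action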

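from selenium import webdriver
from selenium.webdriver import ActionChains
# Instantiate the web driver and bind it to the driver program
web = webdriver.Chrome(executable_path="./chromedriver")
# Send the request
web.get("http://125.35.6.84:81/xk")
# Switch scope into the iframe
web.switch_to.frame("id of the iframe element")
# Locate the element inside the iframe that the action should act on (placeholder id)
target = web.find_element_by_id("id of the element to drag")
# Placeholder distances in pixels: xoffset is horizontal, yoffset is vertical
x_distance = 0
y_distance = 0
# Instantiate the action chain object
action = ActionChains(web)
# Press and hold the element, move it by the offset, then release it;
# the calls are only queued until perform() executes them
action.click_and_hold(target).move_by_offset(xoffset=x_distance, yoffset=y_distance).release().perform()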
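
For slider-style widgets it is sometimes useful to move the held element in several small steps rather than one jump. The sketch below splits a horizontal distance into whole-pixel steps and queues them on a single chain before one `perform()` call. `split_offset` and `drag_in_steps` are hypothetical helpers; `action` is expected to be an `ActionChains` instance and `element` an already located element.

```python
def split_offset(total, steps):
    """Split total pixels into `steps` integer moves that sum to total."""
    base, remainder = divmod(total, steps)
    return [base + 1] * remainder + [base] * (steps - remainder)


def drag_in_steps(action, element, x_total, steps=5):
    """Hold element, move it horizontally in small steps, then release."""
    action.click_and_hold(element)
    for dx in split_offset(x_total, steps):
        action.move_by_offset(dx, 0)
    # Nothing runs until perform(); it executes the whole queued chain.
    action.release().perform()
```

`split_offset(17, 5)` gives `[4, 4, 3, 3, 3]`, so for a located element `target`, `drag_in_steps(ActionChains(web), target, 17)` would queue five moves totalling 17 pixels.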

Qzone login example

from selenium import webdriver
import time

# Instantiate the browser object, passing in the driver
web = webdriver.Chrome(executable_path="./chromedriver")

# Send the request
web.get("https://qzone.qq.com/")

# Switch scope into the iframe
web.switch_to.frame("login_frame")

time.sleep(2)
# Choose the account-and-password login option
username_login = web.find_element_by_id("switcher_plogin")
username_login.click()

time.sleep(2)
# Locate the username and password input boxes and fill them in
username_tag = web.find_element_by_id("u")
password_tag = web.find_element_by_id("p")
username_tag.send_keys("122342423")
time.sleep(1)
password_tag.send_keys("122342423")
time.sleep(2)

# Locate the login button and click it to log in
login_btn = web.find_element_by_id("login_button")
login_btn.click()

time.sleep(2)

# Close the browser
web.quit()

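Headless browsers and evading Selenium detection

Headless browser

For a crawler, we do not want a browser window to pop up when the program runs; it should just do its crawling quietly in the background.

So how do we keep the window from being shown?

1. Import the module:

# Run without a visible UI (headless browser)
from selenium.webdriver.chrome.options import Options

2. Configure the arguments: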

# Instantiate the Options object
option = Options()
# Add the arguments
option.add_argument("--headless")
option.add_argument("--disable-gpu")

3. Pass the option object to the browser object

# Instantiate the browser object, passing in the driver and the options
web = webdriver.Chrome(executable_path="./chromedriver", options=option)

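Evading Selenium detection

As Selenium has grown more popular, many portal sites have added Selenium detection, an anti-crawling mechanism, to stop us from freely crawling their dynamically loaded data.

As the saying goes, for every policy from above there is a countermeasure from below, and so counter-detection strategies emerged:

# Evade Selenium detection
from selenium import webdriver
from selenium.webdriver import ChromeOptions

option = ChromeOptions()
option.add_experimental_option("excludeSwitches", ["enable-automation"])
# (to run headless as well, add the --headless argument to this same object)
# Instantiate the browser object, passing in the driver and the options
web = webdriver.Chrome(executable_path="./chromedriver", options=option)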
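
Both settings in this section live on the same options object, so a crawler that wants to run headless and also drop the `enable-automation` switch can configure them together. The helper below is a sketch that works on any object exposing `add_argument` and `add_experimental_option`, such as `ChromeOptions`; `configure_options` is a name invented for this example, and whether these switches are enough to avoid detection depends on the target site.

```python
def configure_options(options, headless=True, hide_automation=True):
    """Apply the headless and anti-detection settings to an options object."""
    if headless:
        options.add_argument("--headless")
        options.add_argument("--disable-gpu")
    if hide_automation:
        # Exclude Chrome's enable-automation switch.
        options.add_experimental_option("excludeSwitches", ["enable-automation"])
    return options
```

With Selenium installed, this could be used as `webdriver.Chrome(executable_path="./chromedriver", options=configure_options(ChromeOptions()))`.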

posted @ 2020-06-09 18:01 繁華無殤
