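Batch monitoring of nginx HTTP status codes for critical business sites
Some business sites are critical, such as API endpoints or web sites, and the HTTP status codes they return need to be monitored, for example over the last 10 minutes or over the most recent 1000 requests. Triggering an alert when too many 50x status codes appear exposes problems early so they can be handled promptly.
1. Write the script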
# cat /usr/local/zabbix_agents_3.2.0/scripts/web_nginx_code.sh
#!/bin/bash
# function: monitor store nginx access error code

LOG_DIR=/data/www/logs/nginx_log/access
SITE_FILE=/usr/local/zabbix_agents_3.2.0/scripts/web_site.txt

# Print the site list as Zabbix low-level discovery JSON
web_domain_discovery () {
    WEB_DOMAIN=($(grep -v "^#" "$SITE_FILE"))
    last=$(( ${#WEB_DOMAIN[@]} - 1 ))
    printf '{\n'
    printf '\t"data":[\n'
    for ((i = 0; i < ${#WEB_DOMAIN[@]}; ++i)); do
        if [ "$i" -ne "$last" ]; then
            printf '\t\t{\n\t\t\t"{#SITENAME}":"%s"},\n' "${WEB_DOMAIN[$i]}"
        else
            printf '\t\t{\n\t\t\t"{#SITENAME}":"%s"}]}\n' "${WEB_DOMAIN[$i]}"
        fi
    done
}

# Resolve the access log path; store.chinasoft.jp uses a ".access.log" suffix
access_log () {
    if [ "$1" == 'store.chinasoft.jp' ]; then
        echo "$LOG_DIR/$1.access.log"
    else
        echo "$LOG_DIR/$1_access.log"
    fi
}

# Count 50x codes in the last 1000 lines of the nginx access log
error_code_count () {
    tail -n 1000 "$(access_log "$1")" | awk '{print $1" "$10" "$11}' | grep 'HTTP/1.1" 50' | uniq | wc -l
}

# Count 50x codes logged in the last 10 minutes.
# Note: only HH:MM:SS is compared, so the window is wrong during the
# first 10 minutes after midnight.
last10_mins_error_code_count () {
    /usr/bin/tac "$(access_log "$1")" | awk 'BEGIN {
            "date -d \"-10 minute\" +\"%H:%M:%S\"" | getline min10ago
        }
        { if (substr($4, 14) > min10ago) print $0; else exit }' |
        tac | awk '{print $1" "$10" "$11}' | grep 'HTTP/1.1" 50' | uniq | wc -l
}

case "$1" in
    web_domain_discovery)
        web_domain_discovery
        ;;
    error_code_count)
        error_code_count "$2"
        ;;
    last10_mins_error_code_count)
        last10_mins_error_code_count "$2"
        ;;
    *)
        echo "Usage: $0 {web_domain_discovery|error_code_count|last10_mins_error_code_count}"
        ;;
esac
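The grep pattern 'HTTP/1.1" 50' in the script above depends on this site's log_format and will not match requests logged as HTTP/2. As a rough alternative, the count could be taken from the status field directly. This is only a sketch: the function name count_50x is made up for illustration, and it assumes the default nginx "combined" log format, where the status code is the 9th whitespace-separated field. Field positions will differ with a custom log_format.

```shell
#!/bin/bash
# Hypothetical helper: count 50x responses among the last N lines of a log.
# Assumes the nginx "combined" log_format, where the status code is field 9.
count_50x() {
    local log_file=$1 lines=${2:-1000}
    tail -n "$lines" "$log_file" |
        awk '$9 ~ /^50[0-9]$/ { n++ } END { print n + 0 }'
}

# Example call (the path is only an illustration):
#   count_50x /data/www/logs/nginx_log/access/mm.chinasoft.cn_access.log 1000
```

Matching on the status field avoids tying the check to one protocol version, at the cost of assuming where that field sits in each log line.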
Format of the site list file read by the script (one domain per line; lines starting with # are skipped)
# cat /usr/local/zabbix_agents_3.2.0/scripts/web_site.txt
account.chinasoft.cn
distriapi.chinasoft.cn
innerapi.chinasoft.cn
masterapi.chinasoft.cn
mm.chinasoft.cn
userapi.chinasoft.cn
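To make the link between this file and the discovery output concrete, here is a minimal sketch of turning a site list into the low-level discovery JSON that Zabbix expects, with one {#SITENAME} entry per domain. The function name sites_to_lld_json is hypothetical and is not part of the script above. It prints compact JSON on one line and skips blank lines as well as comment lines.

```shell
#!/bin/bash
# Hypothetical helper: print Zabbix LLD JSON for every site in a list file.
# Lines starting with '#' and empty lines are skipped.
sites_to_lld_json() {
    local site_file=$1 sep='' site
    printf '{"data":['
    while IFS= read -r site || [ -n "$site" ]; do
        case $site in ''|'#'*) continue ;; esac
        printf '%s{"{#SITENAME}":"%s"}' "$sep" "$site"
        sep=','
    done < "$site_file"
    printf ']}\n'
}

# Example call:
#   sites_to_lld_json /usr/local/zabbix_agents_3.2.0/scripts/web_site.txt
```

Writing the separator before each element except the first avoids special-casing the last entry, and an empty list still yields valid JSON.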
2. Write the monitoring configuration
# cat /usr/local/zabbix_agents_3.2.0/conf/zabbix_agentd/web_nginx_code_discovery.conf
UserParameter=web.domain.discovery,/usr/local/zabbix_agents_3.2.0/scripts/web_nginx_code.sh web_domain_discovery
UserParameter=web.domain.code[*],/usr/local/zabbix_agents_3.2.0/scripts/web_nginx_code.sh error_code_count $1
UserParameter=web.domain.10mins.code[*],/usr/local/zabbix_agents_3.2.0/scripts/web_nginx_code.sh last10_mins_error_code_count $1
3. Create the items
Discovery rule: the name and key are both web.domain.discovery

Item
Status codes within the last 10 minutes
name: web.domain.10mins.code ON $1
key: web.domain.10mins.code[{#SITENAME},]

Item
name:web.domain.code ON $1
key: web.domain.code[{#SITENAME},]

Trigger
name: {#SITENAME} last 10 minutes nginx 50x greater than 50
Expression
{Template alisz nginx site access error_code count:web.domain.10mins.code[{#SITENAME},].last()}>50
Alert when more than 50 50x responses occur within 10 minutes

Alert when more than 200 of the most recent 1000 requests returned 50x
name: {#SITENAME} nginx 50x code greater than 20%
Trigger expression:
{Template alisz nginx site access error_code count:web.domain.code[{#SITENAME},].last()}>200
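The threshold of 200 corresponds to 20% only when the sample really contains 1000 lines: 200 / 1000 = 20%. If a log has fewer than 1000 lines, for example just after rotation, tail returns fewer lines and the same count represents a higher rate. A small sketch of the arithmetic, using a hypothetical helper error_rate_percent that is not part of the monitoring script:

```shell
#!/bin/bash
# Hypothetical helper: integer percentage of error lines in a sample.
# Uses integer division, so the result is rounded down.
error_rate_percent() {
    local errors=$1 sample=$2
    if [ "$sample" -le 0 ]; then
        echo 0
        return
    fi
    echo $(( errors * 100 / sample ))
}

# Example calls:
#   error_rate_percent 200 1000   # 200 errors in a full 1000-line sample
#   error_rate_percent 200 400    # same count in a shorter log
```

If the rate matters more than the absolute count, the script could report a percentage like this and the trigger could compare against 20 instead of 200. That would be a change to the setup described here, not something the original configuration does.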

