Installing and Deploying Prometheus + VictoriaMetrics + Grafana
Test environment
prometheus-2.54.1.linux-amd64.tar.gz
Download:
https://www.prometheus.io/download/
node_exporter-1.8.2.linux-amd64.tar.gz
Download:
https://prometheus.io/download/#node_exporter
pushgateway-1.9.0.linux-amd64.tar.gz
Download:
https://www.prometheus.io/download/#pushgateway
victoria-metrics-linux-amd64-v1.103.0.tar.gz
Download:
https://github.com/VictoriaMetrics/VictoriaMetrics/releases
https://github.com/VictoriaMetrics/VictoriaMetrics/releases/tag/v1.103.0
grafana-7.5.6-1.x86_64.rpm
Download: https://dl.grafana.com/oss/release/grafana-7.5.6-1.x86_64.rpm
CentOS 7.9
Note: prometheus, victoria-metrics, grafana, and pushgateway can all be installed on different machines. This article is only for learning and practice, so everything is installed on the same machine.
Walkthrough
Installing VictoriaMetrics
# wget https://github.com/VictoriaMetrics/VictoriaMetrics/releases/download/v1.103.0/victoria-metrics-linux-amd64-v1.103.0.tar.gz
# tar -xvzf victoria-metrics-linux-amd64-v1.103.0.tar.gz -C /usr/local/bin # extracting produces a single binary named victoria-metrics-prod
# Create a directory for storing VictoriaMetrics data
# mkdir -p /usr/data/victoria-metrics
# Create the systemd service
# vi /etc/systemd/system/victoriametrics.service
[Unit]
Description=Victoria metrics service
After=network.target
[Service]
Type=simple
TimeoutStartSec=30
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/victoria-metrics-prod -storageDataPath=/usr/data/victoria-metrics -retentionPeriod=30d -selfScrapeInterval=10s
ExecStop=/bin/kill $MAINPID
ExecReload=/bin/kill -HUP $MAINPID
PrivateTmp=yes
[Install]
WantedBy=multi-user.target
Notes:
-storageDataPath sets the path to the data directory (if the directory does not exist, it is created automatically at startup). VictoriaMetrics stores all its data in this directory. The default is the victoria-metrics-data directory under the current working directory.
-retentionPeriod sets the retention for stored data; older data is deleted automatically. The default retention is 1 month (31 days). The minimum value is 24h or 1d. -retentionPeriod=3 keeps data for only 3 months, while -retentionPeriod=1d keeps data for only 1 day.
In most cases, only the two flags above need to be set. The other flags have good enough defaults and should only be changed when really needed. Run ./victoria-metrics-prod --help to see descriptions and default values for all available flags.
By default, VictoriaMetrics listens on port 8428 for requests to the Prometheus querying API.
It is recommended to set up monitoring for VictoriaMetrics.
Start in the foreground to verify
# /usr/local/bin/victoria-metrics-prod -storageDataPath=/usr/data/victoria-metrics -retentionPeriod=30d -selfScrapeInterval=10s
2024-09-03T16:33:42.187Z info VictoriaMetrics/lib/logger/flag.go:12 build version: victoria-metrics-20240828-135248-tags-v1.103.0-0-g5aeb759df9
2024-09-03T16:33:42.187Z info VictoriaMetrics/lib/logger/flag.go:13 command-line flags
2024-09-03T16:33:42.187Z info VictoriaMetrics/lib/logger/flag.go:20 -retentionPeriod="30d"
2024-09-03T16:33:42.187Z info VictoriaMetrics/lib/logger/flag.go:20 -selfScrapeInterval="10s"
2024-09-03T16:33:42.187Z info VictoriaMetrics/lib/logger/flag.go:20 -storageDataPath="/usr/data/victoria-metrics"
2024-09-03T16:33:42.187Z info VictoriaMetrics/app/victoria-metrics/main.go:73 starting VictoriaMetrics at "[:8428]"...
2024-09-03T16:33:42.187Z info VictoriaMetrics/app/vmstorage/main.go:107 opening storage at "/usr/data/victoria-metrics" with -retentionPeriod=30d
2024-09-03T16:33:42.189Z info VictoriaMetrics/lib/memory/memory.go:42 limiting caches to 611758080 bytes, leaving 407838720 bytes to the OS according to -memory.allowedPercent=60
2024-09-03T16:33:42.205Z info VictoriaMetrics/app/vmstorage/main.go:121 successfully opened storage "/usr/data/victoria-metrics" in 0.018 seconds; partsCount: 0; blocksCount: 0; rowsCount: 0; sizeBytes: 0
2024-09-03T16:33:42.205Z info VictoriaMetrics/app/vmselect/promql/rollup_result_cache.go:127 loading rollupResult cache from "/usr/data/victoria-metrics/cache/rollupResult"...
2024-09-03T16:33:42.207Z info VictoriaMetrics/app/vmselect/promql/rollup_result_cache.go:156 loaded rollupResult cache from "/usr/data/victoria-metrics/cache/rollupResult" in 0.001 seconds; entriesCount: 0, sizeBytes: 0
2024-09-03T16:33:42.207Z info VictoriaMetrics/app/victoria-metrics/main.go:84 started VictoriaMetrics in 0.020 seconds
2024-09-03T16:33:42.207Z info VictoriaMetrics/lib/httpserver/httpserver.go:121 starting server at http://127.0.0.1:8428/
2024-09-03T16:33:42.207Z info VictoriaMetrics/lib/httpserver/httpserver.go:122 pprof handlers are exposed at http://127.0.0.1:8428/debug/pprof/
2024-09-03T16:33:42.208Z info VictoriaMetrics/app/victoria-metrics/self_scraper.go:46 started self-scraping `/metrics` page with interval 10.000 seconds
2024-09-03T16:33:52.293Z info VictoriaMetrics/lib/storage/partition.go:202 creating a partition "2024_09" with smallPartsPath="/usr/data/victoria-metrics/data/small/2024_09", bigPartsPath="/usr/data/victoria-metrics/data/big/2024_09"
2024-09-03T16:33:52.295Z info VictoriaMetrics/lib/storage/partition.go:211 partition "2024_09" has been created
Stop the foreground process, then start the service and enable it at boot
# systemctl daemon-reload && systemctl enable --now victoriametrics.service
# Check that the service started successfully
# systemctl status victoriametrics.service
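A quick smoke test, assuming VictoriaMetrics is listening on its default port 8428 (/health should return OK, and the service's own metrics are exposed on /metrics):
# curl -s http://127.0.0.1:8428/health
# curl -s http://127.0.0.1:8428/metrics | head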
Environment variables
All VictoriaMetrics components allow referencing environment variables via %{ENV_VAR} syntax in yaml configuration files (such as -promscrape.config) and in command-line flags. For example, if the METRICS_AUTH_KEY=top-secret environment variable exists when VictoriaMetrics starts, then -metricsAuthKey=%{METRICS_AUTH_KEY} is automatically expanded by VictoriaMetrics into -metricsAuthKey=top-secret.
VictoriaMetrics recursively expands %{ENV_VAR} references in environment variables on startup. For example, if BAR=a%{BAZ} and BAZ=bc, then the FOO=%{BAR} environment variable expands to FOO=abc.
In addition, all VictoriaMetrics components allow setting flag values via environment variables according to the following rules (see the example after this list):
- the -envflag.enable flag must be set
- each . character in a flag name must be replaced with _ (for example, -insert.maxQueueDuration <duration> becomes insert_maxQueueDuration=<duration>)
- for repeating flags, an alternative syntax can be used by joining the different values into one, separated by , (for example, -storageNode <nodeA> -storageNode <nodeB> becomes storageNode=<nodeA>,<nodeB>)
- a prefix for environment variable names can be set via -envflag.prefix; for example, if -envflag.prefix=VM_, then environment variable names must start with VM_
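As an illustration of these rules (the flag and values below are only examples), the -retentionPeriod flag could be supplied through a prefixed environment variable like this:
# export VM_retentionPeriod=30d
# /usr/local/bin/victoria-metrics-prod -envflag.enable -envflag.prefix=VM_ -storageDataPath=/usr/data/victoria-metrics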
Open the firewall port
# firewall-cmd --permanent --zone=public --add-port=8428/tcp
success
# firewall-cmd --reload
success
Besides running the binary directly, VictoriaMetrics can also be installed with Docker; see https://hub.docker.com/r/victoriametrics/victoria-metrics/ for details.
Reference
https://docs.victoriametrics.com/quick-start/
Prometheus installation and configuration
# wget https://github.com/prometheus/prometheus/releases/download/v2.54.1/prometheus-2.54.1.linux-amd64.tar.gz
# tar -C /usr/local/ -xvzf prometheus-2.54.1.linux-amd64.tar.gz
# cd /usr/local/prometheus-2.54.1.linux-amd64
# ls
console_libraries consoles LICENSE NOTICE prometheus prometheus.yml promtool
# ln -s /usr/local/prometheus-2.54.1.linux-amd64/prometheus /usr/local/bin/prometheus
# cp prometheus.yml prometheus.yml.bak
# echo ''> prometheus.yml
# vi prometheus.yml
Replace the contents of prometheus.yml with the following:
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']

remote_write:
  - url: http://192.168.88.132:8428/api/v1/write
    queue_config:
      max_samples_per_send: 10000
      capacity: 20000
      max_shards: 30
Notes:
To send data to VictoriaMetrics, add a remote_write section to the Prometheus configuration file (usually prometheus.yml):
remote_write:
  - url: http://<victoriametrics-addr>:8428/api/v1/write
Note: replace <victoriametrics-addr> with the VictoriaMetrics host name or IP address, for example:
remote_write:
  - url: http://192.168.88.132:8428/api/v1/write
Start the Prometheus service
# ./prometheus
ts=2024-09-04T15:50:32.906Z caller=main.go:601 level=info msg="No time or size retention was set so using the default time retention" duration=15d
ts=2024-09-04T15:50:32.906Z caller=main.go:645 level=info msg="Starting Prometheus Server" mode=server version="(version=2.54.1, branch=HEAD, revision=e6cfa720fbe6280153fab13090a483dbd40bece3)"
ts=2024-09-04T15:50:32.906Z caller=main.go:650 level=info build_context="(go=go1.22.6, platform=linux/amd64, user=root@812ffd741951, date=20240827-10:56:41, tags=netgo,builtinassets,stringlabels)"
ts=2024-09-04T15:50:32.906Z caller=main.go:651 level=info host_details="(Linux 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 localhost.localdomain (none))"
ts=2024-09-04T15:50:32.906Z caller=main.go:652 level=info fd_limits="(soft=4096, hard=4096)"
ts=2024-09-04T15:50:32.906Z caller=main.go:653 level=info vm_limits="(soft=unlimited, hard=unlimited)"
ts=2024-09-04T15:50:32.917Z caller=web.go:571 level=info component=web msg="Start listening for connections" address=0.0.0.0:9090
ts=2024-09-04T15:50:32.925Z caller=main.go:1160 level=info msg="Starting TSDB ..."
ts=2024-09-04T15:50:32.930Z caller=tls_config.go:313 level=info component=web msg="Listening on" address=[::]:9090
ts=2024-09-04T15:50:32.930Z caller=tls_config.go:316 level=info component=web msg="TLS is disabled." http2=false address=[::]:9090
ts=2024-09-04T15:50:32.932Z caller=head.go:626 level=info component=tsdb msg="Replaying on-disk memory mappable chunks if any"
ts=2024-09-04T15:50:32.932Z caller=head.go:713 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=5.601μs
ts=2024-09-04T15:50:32.933Z caller=head.go:721 level=info component=tsdb msg="Replaying WAL, this may take a while"
ts=2024-09-04T15:50:32.933Z caller=head.go:793 level=info component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
ts=2024-09-04T15:50:32.933Z caller=head.go:830 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=25.237μs wal_replay_duration=560.14μs wbl_replay_duration=141ns chunk_snapshot_load_duration=0s mmap_chunk_replay_duration=5.601μs total_replay_duration=605.091μs
ts=2024-09-04T15:50:32.938Z caller=main.go:1181 level=info fs_type=XFS_SUPER_MAGIC
ts=2024-09-04T15:50:32.938Z caller=main.go:1184 level=info msg="TSDB started"
ts=2024-09-04T15:50:32.938Z caller=main.go:1367 level=info msg="Loading configuration file" filename=prometheus.yml
ts=2024-09-04T15:50:32.940Z caller=dedupe.go:112 component=remote level=info remote_name=b93975 url=http://192.168.88.132:8428/api/v1/write msg="Starting WAL watcher" queue=b93975
ts=2024-09-04T15:50:32.940Z caller=dedupe.go:112 component=remote level=info remote_name=b93975 url=http://192.168.88.132:8428/api/v1/write msg="Starting scraped metadata watcher"
ts=2024-09-04T15:50:32.945Z caller=main.go:1404 level=info msg="updated GOGC" old=100 new=75
ts=2024-09-04T15:50:32.945Z caller=main.go:1415 level=info msg="Completed loading of configuration file" filename=prometheus.yml totalDuration=6.619214ms db_storage=25.792μs remote_storage=1.190631ms web_handler=652ns query_engine=18.267μs scrape=3.897727ms scrape_sd=49.586μs notify=1.164μs notify_sd=954ns rules=60.122μs tracing=62.555μs
ts=2024-09-04T15:50:32.945Z caller=main.go:1145 level=info msg="Server is ready to receive web requests."
ts=2024-09-04T15:50:32.945Z caller=manager.go:164 level=info component="rule manager" msg="Starting rule manager..."
ts=2024-09-04T15:50:32.945Z caller=dedupe.go:112 component=remote level=info remote_name=b93975 url=http://192.168.88.132:8428/api/v1/write msg="Replaying WAL" queue=b93975
ts=2024-09-04T15:50:40.288Z caller=dedupe.go:112 component=remote level=info remote_name=b93975 url=http://192.168.88.132:8428/api/v1/write msg="Done replaying WAL" duration=7.342804783s
Note: to use a non-default configuration file, specify it on the command line, for example:
# ./prometheus --config.file=./custom_prometheus.yml
Note: the configuration can be reloaded with
kill -HUP `pid_of_prometheus`
You can also just press Ctrl + C to stop the Prometheus process and run it again.
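Before reloading, the configuration can be validated with promtool, which ships in the same tarball (the path below assumes the layout used in this article):
# /usr/local/prometheus-2.54.1.linux-amd64/promtool check config /usr/local/prometheus-2.54.1.linux-amd64/prometheus.yml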
Prometheus writes incoming data to local storage and replicates it to the remote storage at the same time. This means the locally stored data remains available for the retention period set by --storage.tsdb.retention.time even if the remote storage is unavailable.
If you need to send data to VictoriaMetrics from multiple Prometheus instances, add an external_labels section to the global section of the Prometheus configuration file, for example:
global:
  external_labels:
    datacenter: dc-123
The configuration above tells Prometheus to add the datacenter=dc-123 label to every time series sent to the remote storage. The label name can be anything, datacenter is just an example. The label value must be unique across all Prometheus instances, so that time series can be filtered and grouped by it.
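For example, once several Prometheus instances write into the same VictoriaMetrics, the series from one of them can be selected by that label; a sketch using the example label above and the VictoriaMetrics address used in this article:
# curl -s -G 'http://192.168.88.132:8428/api/v1/query' --data-urlencode 'query=up{datacenter="dc-123"}'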
For heavily loaded Prometheus instances (200k+ samples per second), the following tuning can be applied:
remote_write:
  - url: http://<victoriametrics-addr>:8428/api/v1/write
    queue_config:
      max_samples_per_send: 10000
      capacity: 20000
      max_shards: 30
Using remote write increases memory usage for Prometheus by up to ~25%, depending on the shape of the data. If memory consumption becomes too high, try lowering the max_samples_per_send and capacity values (note that these two parameters are tightly connected). See the remote write tuning documentation for more details.
It is recommended to upgrade Prometheus to v2.12.0 or newer, since earlier versions have issues with remote_write.
Also take a look at vmagent and vmalert, which can be used as a faster and less resource-hungry alternative to Prometheus.
Reference: https://docs.victoriametrics.com/#prometheus-setup
Create the systemd service
# vi /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus service
After=network.target
[Service]
Type=simple
TimeoutStartSec=30
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/prometheus-2.54.1.linux-amd64/prometheus --config.file=/usr/local/prometheus-2.54.1.linux-amd64/prometheus.yml
ExecStop=/bin/kill $MAINPID
ExecReload=/bin/kill -HUP $MAINPID
[Install]
WantedBy=multi-user.target
Note: the configuration file must be given as an absolute path, otherwise the service fails with an error that the file /prometheus.yml cannot be found.
First stop the Prometheus instance running in the foreground above, then run the following commands
# systemctl daemon-reload && systemctl enable --now prometheus
# systemctl status prometheus
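As a quick smoke test (assuming the default ports used above), Prometheus exposes a health endpoint, and the remote-written samples can be queried back from VictoriaMetrics:
# curl -s http://localhost:9090/-/healthy
# curl -s -G 'http://192.168.88.132:8428/api/v1/query' --data-urlencode 'query=up{job="prometheus"}'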
Grafana installation and configuration
# yum install grafana-7.5.6-1.x86_64.rpm
Note: if the package cannot be found via yum, download the rpm and install it as follows
# wget https://dl.grafana.com/oss/release/grafana-7.5.6-1.x86_64.rpm
# yum install -y fontconfig urw-fonts
# rpm -ivh grafana-7.5.6-1.x86_64.rpm
warning: grafana-7.5.6-1.x86_64.rpm: Header V4 RSA/SHA256 Signature, key ID 24098cb6: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:grafana-7.5.6-1 ################################# [100%]
### NOT starting on installation, please execute the following statements to configure grafana to start automatically using systemd
sudo /bin/systemctl daemon-reload
sudo /bin/systemctl enable grafana-server.service
### You can start grafana-server by executing
sudo /bin/systemctl start grafana-server.service
POSTTRANS: Running script
# /bin/systemctl daemon-reload
# /bin/systemctl enable grafana-server.service
Created symlink from /etc/systemd/system/multi-user.target.wants/grafana-server.service to /usr/lib/systemd/system/grafana-server.service.
# /bin/systemctl start grafana-server.service
Note: if the yum install -y fontconfig urw-fonts command is not run first, installing Grafana may fail with the following error
warning: grafana-7.5.6-1.x86_64.rpm: Header V4 RSA/SHA256 Signature, key ID 24098cb6: NOKEY
error: Failed dependencies:
fontconfig is needed by grafana-7.5.6-1.x86_64
urw-fonts is needed by grafana-7.5.6-1.x86_64
Modify the Grafana configuration [optional]
# vim /etc/grafana/grafana.ini
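For example, the [server] section of grafana.ini controls the HTTP port Grafana listens on; a minimal sketch of the relevant key (3000 is already the default, and the firewall rule below assumes it):
[server]
# HTTP port to listen on
http_port = 3000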
Open http://<grafana-addr>:3000 in a browser to check the result:

Note: Grafana's default login credentials are admin/admin.
Reference: https://grafana.com/grafana/download?pg=get&plcmt=selfmanaged-box1-cta1
Open the firewall port
# firewall-cmd --permanent --zone=public --add-port=3000/tcp
success
# firewall-cmd --reload
success
Create a Prometheus data source
Create a Prometheus data source in Grafana with the following URL (when creating the data source, only change the URL and keep the other settings at their defaults):
http://<victoriametrics-addr>:8428
Replace <victoriametrics-addr> with the VictoriaMetrics host name or IP address, e.g. http://192.168.88.132:8428, then build dashboards on the new data source using PromQL or MetricsQL.
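The data source can also be created through Grafana's HTTP API instead of the UI; a sketch, assuming the default admin/admin credentials and the addresses used in this article ("access": "proxy" corresponds to the Server access mode described below):
# curl -s -u admin:admin -H 'Content-Type: application/json' -X POST http://192.168.88.132:3000/api/datasources -d '{"name":"VictoriaMetrics","type":"prometheus","url":"http://192.168.88.132:8428","access":"proxy"}'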

About access modes
The access mode controls how requests to the data source are handled. Server (default) should be the preferred way unless there is a reason to do otherwise.
Server access mode (default)
All requests originating from the browser are sent to the Grafana backend/server, which forwards them to the data source, thereby circumventing possible Cross-Origin Resource Sharing (CORS) requirements. If this access mode is selected, the URL must be reachable from the Grafana backend/server.
Browser access mode
All requests from the browser are sent directly to the data source and may be subject to Cross-Origin Resource Sharing requirements. If this access mode is selected, the URL must be reachable from the browser.
Reference: https://docs.victoriametrics.com/#grafana-setup
Installing pushgateway
When to use the Pushgateway
The Pushgateway is an intermediary service which allows you to push metrics from jobs that Prometheus cannot scrape. For details, see the Pushing metrics documentation.
It is recommended to use the Pushgateway only in certain limited cases. There are several pitfalls when blindly using the Pushgateway instead of Prometheus's usual pull model for general metrics collection:
- When monitoring multiple instances through a single Pushgateway, the Pushgateway becomes both a single point of failure and a potential bottleneck.
- You lose Prometheus's automatic instance health monitoring via the up metric (generated on every scrape).
- The Pushgateway never forgets series pushed to it and will expose them to Prometheus forever, unless those series are manually deleted via the Pushgateway API.
The latter point is especially relevant when multiple instances of a job differentiate their metrics in the Pushgateway via an instance label or similar. Metrics for an instance then remain in the Pushgateway even if the original instance is renamed or removed. This is because the lifecycle of the Pushgateway as a metrics cache is fundamentally separate from the lifecycle of the processes pushing metrics to it. Contrast this with Prometheus's usual pull-style monitoring: when an instance disappears (intentionally or not), its metrics automatically disappear along with it. With the Pushgateway this is not the case; you must manually delete any stale metrics yourself or automate this lifecycle synchronization.
Usually, the only valid use case for the Pushgateway is capturing the outcome of a service-level batch job. A "service-level" batch job is one that is not semantically related to a specific machine or job instance (for example, a batch job that deletes a number of users for an entire service). Such a job's metrics should not include a machine or instance label, so that the lifecycle of specific machines or instances is decoupled from the pushed metrics. This reduces the burden of managing stale metrics in the Pushgateway. See also the best practices for monitoring batch jobs.
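As mentioned above, stale series have to be removed through the Pushgateway's HTTP API; a minimal sketch, assuming a Pushgateway listening on its default port 9091 and a hypothetical job name some_batch_job:
# curl -X DELETE http://localhost:9091/metrics/job/some_batch_job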
References:
https://prometheus.io/docs/practices/pushing/?#when-to-use-the-pushgateway
Install pushgateway
# wget https://github.com/prometheus/pushgateway/releases/download/v1.9.0/pushgateway-1.9.0.linux-amd64.tar.gz
# tar -C /usr/local/ -xvzf pushgateway-1.9.0.linux-amd64.tar.gz
# ln -s /usr/local/pushgateway-1.9.0.linux-amd64/pushgateway /usr/local/bin/pushgateway
# pushgateway
ts=2024-09-04T17:38:16.325Z caller=main.go:87 level=info msg="starting pushgateway" version="(version=1.9.0, branch=HEAD, revision=d1ca1a6a426126a09a21f745e8ffbaba550f9643)"
ts=2024-09-04T17:38:16.325Z caller=main.go:88 level=info build_context="(go=go1.22.4, platform=linux/amd64, user=root@2167597b1e9c, date=20240608-15:04:08, tags=unknown)"
ts=2024-09-04T17:38:16.328Z caller=tls_config.go:313 level=info msg="Listening on" address=[::]:9091
ts=2024-09-04T17:38:16.328Z caller=tls_config.go:316 level=info msg="TLS is disabled." http2=false address=[::]:9091
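A quick way to check that the gateway accepts pushes (the metric name some_metric and job name some_job are only examples):
# echo "some_metric 3.14" | curl --data-binary @- http://localhost:9091/metrics/job/some_job
# curl -s http://localhost:9091/metrics | grep some_metric
The pushed value should appear on the Pushgateway's /metrics page and, once the scrape job below is in place, in Prometheus as well.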
Configure the Prometheus instance
To access the pushgateway metrics, the locally running Prometheus instance needs to be configured properly. For reference, the following example prometheus.yml tells a Prometheus instance to scrape a Node Exporter via localhost:9100, and how frequently:
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']
Modify the Prometheus configuration file
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'pushgateway'
    static_configs:
      # Note: targets must not include the scheme; writing ['http://192.168.88.132:9091'] here causes an error at runtime.
      - targets: ['192.168.88.132:9091']

remote_write:
  - url: http://192.168.88.132:8428/api/v1/write
    queue_config:
      max_samples_per_send: 10000
      capacity: 20000
      max_shards: 30
Restart Prometheus
# systemctl restart prometheus
# systemctl status prometheus
Create the systemd service
# vi /etc/systemd/system/pushgateway.service
[Unit]
Description=Pushgateway service
After=network.target
[Service]
Type=simple
TimeoutStartSec=30
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/pushgateway-1.9.0.linux-amd64/pushgateway
ExecStop=/bin/kill $MAINPID
ExecReload=/bin/kill -HUP $MAINPID
[Install]
WantedBy=multi-user.target
First stop the pushgateway instance running in the foreground above, then run the following commands
# systemctl daemon-reload && systemctl enable --now pushgateway
# systemctl status pushgateway
Node Exporter installation and configuration
Note: install it only on the machines to be monitored (in this example, a redis server with IP 192.168.88.131)
# wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
# tar -C /usr/local/ -xvzf node_exporter-1.8.2.linux-amd64.tar.gz
# ln -s /usr/local/node_exporter-1.8.2.linux-amd64/node_exporter /usr/local/bin/node_exporter
# node_exporter
ts=2024-09-04T16:01:28.241Z caller=node_exporter.go:193 level=info msg="Starting node_exporter" version="(version=1.8.2, branch=HEAD, revision=f1e0e8360aa60b6cb5e5cc1560bed348fc2c1895)"
ts=2024-09-04T16:01:28.241Z caller=node_exporter.go:194 level=info msg="Build context" build_context="(go=go1.22.5, platform=linux/amd64, user=root@03d440803209, date=20240714-11:53:45, tags=unknown)"
ts=2024-09-04T16:01:28.242Z caller=node_exporter.go:196 level=warn msg="Node Exporter is running as root user. This exporter is designed to run as unprivileged user, root is not required."
ts=2024-09-04T16:01:28.242Z caller=filesystem_common.go:111 level=info collector=filesystem msg="Parsed flag --collector.filesystem.mount-points-exclude" flag=^/(dev|proc|run/credentials/.+|sys|var/lib/docker/.+|var/lib/containers/storage/.+)($|/)
ts=2024-09-04T16:01:28.242Z caller=filesystem_common.go:113 level=info collector=filesystem msg="Parsed flag --collector.filesystem.fs-types-exclude" flag=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
ts=2024-09-04T16:01:28.242Z caller=diskstats_common.go:111 level=info collector=diskstats msg="Parsed flag --collector.diskstats.device-exclude" flag=^(z?ram|loop|fd|(h|s|v|xv)d[a-z]|nvme\d+n\d+p)\d+$
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:111 level=info msg="Enabled collectors"
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=arp
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=bcache
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=bonding
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=btrfs
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=conntrack
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=cpu
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=cpufreq
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=diskstats
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=dmi
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=edac
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=entropy
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=fibrechannel
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=filefd
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=filesystem
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=hwmon
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=infiniband
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=ipvs
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=loadavg
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=mdadm
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=meminfo
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=netclass
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=netdev
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=netstat
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=nfs
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=nfsd
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=nvme
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=os
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=powersupplyclass
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=pressure
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=rapl
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=schedstat
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=selinux
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=sockstat
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=softnet
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=stat
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=tapestats
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=textfile
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=thermal_zone
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=time
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=timex
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=udp_queues
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=uname
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=vmstat
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=watchdog
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=xfs
ts=2024-09-04T16:01:28.243Z caller=node_exporter.go:118 level=info collector=zfs
ts=2024-09-04T16:01:28.244Z caller=tls_config.go:313 level=info msg="Listening on" address=[::]:9100
ts=2024-09-04T16:01:28.244Z caller=tls_config.go:316 level=info msg="TLS is disabled." http2=false address=[::]:9100
The output shows that Node Exporter is now running and exposing metrics on port 9100.
Node Exporter metrics
Confirm that the metrics are exposed successfully by requesting the /metrics endpoint:
# curl http://localhost:9100/metrics
Output similar to the following indicates that the metrics are exposed successfully.
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
...
Node Exporter now exposes metrics that Prometheus can scrape, including a wide variety of system metrics further down in the output (prefixed with node_). To view those metrics, run:
# curl http://localhost:9100/metrics | grep "node_"
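If node_exporter should start automatically like the other components, it can be run under systemd as well; a minimal sketch following the same pattern as the units above (paths assume the install location used in this article; stop the foreground process first):
# vi /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter service
After=network.target
[Service]
Type=simple
TimeoutStartSec=30
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/node_exporter-1.8.2.linux-amd64/node_exporter
[Install]
WantedBy=multi-user.target
# systemctl daemon-reload && systemctl enable --now node_exporter
If firewalld is running on the monitored host, remember to open port 9100 there as well (firewall-cmd --permanent --zone=public --add-port=9100/tcp && firewall-cmd --reload).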
Configure the Prometheus instance
Modify the Prometheus configuration file
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'pushgateway'
    static_configs:
      # Note: targets must not include the scheme; writing ['http://192.168.88.131:9091'] here causes an error at runtime.
      - targets: ['192.168.88.131:9091']
  - job_name: 'redis_node'
    static_configs:
      - targets: ['192.168.88.131:9100']

remote_write:
  - url: http://192.168.88.132:8428/api/v1/write
    queue_config:
      max_samples_per_send: 10000
      capacity: 20000
      max_shards: 30
Restart Prometheus
View Node Exporter metrics via the Prometheus expression browser

Metrics specific to the Node Exporter are prefixed with node_ and include metrics such as node_cpu_seconds_total and node_exporter_build_info.
Some example metrics:
| Metric | Meaning |
|---|---|
| rate(node_cpu_seconds_total{mode="system"}[1m]) | The average amount of CPU time spent in system mode, per second, over the last minute (in seconds) |
| node_filesystem_avail_bytes | The filesystem space available to non-root users (in bytes) |
| rate(node_network_receive_bytes_total[1m]) | The average network traffic received, per second, over the last minute (in bytes) |
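These expressions can also be issued directly against the Prometheus HTTP API, for example (address as used above):
# curl -s -G 'http://localhost:9090/api/v1/query' --data-urlencode 'query=rate(node_cpu_seconds_total{mode="system"}[1m])'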
Verify that Grafana displays the data correctly

Author: 授客
WeChat/QQ: 1033553122
National software testing QQ group: 7156436
Git: https://gitee.com/ishouke
Friendly reminder: due to time constraints, this article may contain mistakes; corrections and comments are welcome!