
      [Tip] Auto-generating Prometheus config files with kube-state-metrics

      The node lists of the k8s clusters I operate have been changing frequently lately, and manually updating the Prometheus config every time is a pain. So I looked into doing service discovery for the nodes, so that the monitoring target list updates itself.

      First, a look at an existing config-maintenance approach

      Prometheus can load its targets dynamically via file_sd_config; a Python script talks to each cluster's apiserver, fetches the node list, and generates the corresponding JSON config files, roughly as sketched below.
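      A minimal, hypothetical sketch of such a script (the apiserver URL, token, and CA paths are placeholders; /api/v1/nodes is the standard Kubernetes API endpoint for listing nodes):

      # coding: utf-8
      # Hypothetical sketch of the apiserver-polling approach. Per-cluster
      # auth (token, CA cert) has to live with the script -- the part I dislike.
      import json

      import requests

      API_SERVER = "https://k8s-cluster-a:6443"   # placeholder
      TOKEN_FILE = "/path/to/cluster-a.token"     # placeholder
      CA_FILE = "/path/to/cluster-a-ca.crt"       # placeholder

      with open(TOKEN_FILE) as f:
          token = f.read().strip()

      resp = requests.get(
          API_SERVER + "/api/v1/nodes",
          headers={"Authorization": "Bearer " + token},
          verify=CA_FILE,
      )
      resp.raise_for_status()

      # One target group per node, in the file_sd JSON format.
      targets = [
          {
              "labels": {"cluster": "cluster-a"},
              "targets": [item["metadata"]["name"] + ":9100"],
          }
          for item in resp.json()["items"]
      ]

      with open("groups/node-exporter/config_node_exporter.json", "w") as f:
          json.dump(targets, f, indent=4)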

      This approach could be reused as-is, but it is not flexible to configure. I don't want to maintain every cluster's authentication inside a script; I just want to quietly update Prometheus's own config. Pass.

      Next, the approach Prometheus provides officially

      Prometheus ships kubernetes_sd_config: you configure each cluster's authentication in prometheus.yml, and Prometheus periodically queries each apiserver for the list of nodes to monitor. After half a day of fiddling in a test environment, I found this works very nicely for a Prometheus deployed inside the cluster. But when several clusters share one Prometheus, maintaining the certificates gets cumbersome, and it is easy to end up in the awkward situation where a cluster's credentials were rotated but the copies in the Prometheus config were not. Credential rotation is rare, but having to restart Prometheus on every rotation is still inconvenient.
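      For reference, a minimal sketch of such a scrape job (the cluster endpoint and credential paths are placeholders; the field names follow the Prometheus kubernetes_sd_config documentation):

        - job_name: 'k8s-nodes'
          kubernetes_sd_configs:
            - role: node
              api_server: https://k8s-cluster-a:6443   # placeholder
              tls_config:
                ca_file: /etc/prometheus/certs/cluster-a-ca.crt    # per-cluster cert...
              bearer_token_file: /etc/prometheus/certs/cluster-a.token   # ...and token to keep in sync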

      So while this is indeed the most elegant approach, it too got passed over.

      My approach

      Finally, while going through the various Prometheus monitoring projects, I noticed that Kubernetes officially provides detailed node-state metrics via kube-state-metrics, and that these can likewise be loaded dynamically through file_sd_config. Which led to the following plan:

      Maintain the kube-state-metrics target files groups/kube-state-metrics/*.json by hand, and let a periodically executed script update the node lists from the metrics served by the local Prometheus (the target-file format is sketched below).
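      A file_sd target file is simply a JSON list of target groups; the files the Python script below writes look like this (node address and labels are illustrative):

      [
          {
              "labels": {
                  "cluster": "cluster-a",
                  "group": "default"
              },
              "targets": [
                  "10.0.0.11:9100"
              ]
          }
      ]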

      Prometheus deployment

      • Required settings in prometheus.yml:
      # my global config
      global:
        scrape_interval:     60s # Scrape every 60 seconds; the default is every 1 minute.
        evaluation_interval: 15s # Evaluate rules every 15 seconds; the default is every 1 minute.
        scrape_timeout: 60s      # Raised from the global default of 10s.
      
        # Attach these labels to any time series or alerts when communicating with
        # external systems (federation, remote storage, Alertmanager).
        external_labels:
            monitor: 'k8s-prometheus-monitor'
      
      alerting:
        alertmanagers:
        - static_configs:
          - targets: ["localhost:9093"]
      
      # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
      rule_files:
        # - "first.rules"
        # - "second.rules"
        - /home/server/prometheus/rule.yml
      # A scrape configuration containing exactly one endpoint to scrape:
      # Here it's Prometheus itself.
      
      
      scrape_configs:
        # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
        - job_name: 'prometheus'
      
          # metrics_path defaults to '/metrics'
          # scheme defaults to 'http'.
          static_configs:
            - targets: ['localhost:9090']
      
        - job_name: 'kube-state-metrics'
          scrape_interval: 180s
          scrape_timeout:  30s
          file_sd_configs:
            - files: ['groups/kube-state-metrics/*.json']
          metric_relabel_configs:
            - source_labels: [__name__]
              regex: "(kube_node_status_condition|kube_node_labels|kube_node_info|kube_pod_container_resource_requests_cpu_cores|kube_node_status_allocatable_cpu_cores|kube_pod_container_resource_requests_memory_bytes|kube_node_status_allocatable_memory_bytes)"
              action: keep
      
        - job_name: 'cadvisor'
          scrape_interval: 90s
          scrape_timeout:  30s
          file_sd_configs:
            - files: ['groups/cadvisor/*.json']
          metric_relabel_configs:
            - source_labels: [__name__]
              regex: "(container_cpu_usage_seconds_total|container_memory_rss|container_memory_usage_bytes|container_spec_memory_limit_bytes|container_spec_cpu_quota|container_memory_swap|container_memory_cache|container_network_receive_bytes_total|container_network_transmit_bytes_total|container_cpu_cfs_throttled_periods_total|container_cpu_cfs_periods_total|container_cpu_user_seconds_total|container_cpu_system_seconds_total|container_memory_failures_total|container_fs_reads_bytes_total|container_fs_writes_bytes_total|container_cpu_cfs_throttled_seconds_total|container_memory_working_set_bytes|kube_deployment_spec_replicas|kube_node_status_capacity_cpu_cores|kube_pod_container_resource_limits|kube_pod_container_resource_limits_cpu_cores|kube_pod_container_resource_limits_memory_bytes|kube_pod_container_resource_requests|kube_pod_container_resource_requests_cpu_cores|kube_replicationcontroller_spec_replicas|kube_replicationcontroller_status_replicas)"
              action: keep
      
        # kubernetes > 1.13
        - job_name: 'cadvisor-standalone'
          scrape_interval: 90s
          scrape_timeout:  30s
          file_sd_configs:
            - files: ['groups/cadvisor-standalone/*.json']
          metric_relabel_configs:
            - source_labels: ['container_label_io_kubernetes_pod_name']
              target_label: 'pod_name'
            - source_labels: ['container_label_io_kubernetes_container_name']
              target_label: 'container_name'
            - source_labels: [__name__]
              regex: "(container_cpu_usage_seconds_total|container_memory_rss|container_memory_usage_bytes|container_spec_memory_limit_bytes|container_spec_cpu_quota|container_memory_swap|container_memory_cache|container_network_receive_bytes_total|container_network_transmit_bytes_total|container_cpu_cfs_throttled_periods_total|container_cpu_cfs_periods_total|container_cpu_user_seconds_total|container_cpu_system_seconds_total|container_memory_failures_total|container_fs_reads_bytes_total|container_fs_writes_bytes_total|container_cpu_cfs_throttled_seconds_total|container_memory_working_set_bytes|kube_deployment_spec_replicas|kube_node_status_capacity_cpu_cores|kube_pod_container_resource_limits|kube_pod_container_resource_limits_cpu_cores|kube_pod_container_resource_limits_memory_bytes|kube_pod_container_resource_requests|kube_pod_container_resource_requests_cpu_cores|kube_replicationcontroller_spec_replicas|kube_replicationcontroller_status_replicas)"
              action: keep
      
        - job_name: 'node-exporter'
          scrape_interval: 90s
          scrape_timeout:  30s
          file_sd_configs:
            - files: ['groups/node-exporter/*.json']
          metric_relabel_configs:
            - source_labels: [__name__]
              regex: "(node_cpu_seconds_total|node_memory_MemAvailable_bytes|node_memory_MemTotal_bytes|node_load1|node_load5)"
              action: keep
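      Note that Prometheus watches file_sd files and reloads them as soon as they change, with a periodic re-read as a fallback, so updating the JSON files requires no restart. The fallback interval can be tuned per job if needed:

          file_sd_configs:
            - files: ['groups/node-exporter/*.json']
              refresh_interval: 5m   # fallback re-read; on-change reloads usually happen sooner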
      
      • docker-compose.yaml configuration
      prometheus:
        image: prom/prometheus:v2.24.1
        net: host
        restart: always
        environment:
         - TZ=Asia/Shanghai
        volumes:
         - /etc/localtime:/etc/localtime:ro
         - ./prometheus/:/etc/prometheus/
         - /home/data/prometheus_data/:/prometheus_data/:rw
        command:
         - '--config.file=/etc/prometheus/prometheus.yml'
         - '--storage.tsdb.path=/prometheus_data/'
         - '--storage.tsdb.retention.time=2d'
         - '--storage.tsdb.max-block-duration=2h'
         - '--storage.tsdb.min-block-duration=2h'
         - '--query.max-samples=100000000'
         - '--web.console.libraries=/usr/share/prometheus/console_libraries'
         - '--web.console.templates=/usr/share/prometheus/consoles'
        ports:
         - 9090:9090
      
      alertmanager:
        image: prom/alertmanager
        ports:
          - 9093:9093
        volumes:
          - ./alertmanager/:/etc/alertmanager/
        net: host
        restart: always
        command:
          - '--config.file=/etc/alertmanager/config.yml'
          - '--storage.path=/alertmanager'
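      Before (re)starting the stack it is worth validating the config file; promtool ships inside the official prom/prometheus image, so something along these lines works (host paths assume the layout used below):

      docker run --rm --entrypoint promtool \
          -v /home/server/prometheus:/etc/prometheus \
          prom/prometheus:v2.24.1 check config /etc/prometheus/prometheus.yml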
      
      • Startup script
      #!/bin/bash
      
      mkdir -p /home/server/prometheus/groups/{kube-state-metrics,node-exporter,cadvisor,cadvisor-standalone}
      mkdir -p /home/data/prometheus_data/
      mkdir -p /home/server/alertmanager/
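      The script only pre-creates the directories the containers mount; presumably the stack is then brought up next to the compose file with something like:

      docker-compose up -d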
      
      • Python script that generates the node target files
      # coding: utf-8
      
      import json
      import socket
      import time
      from datetime import datetime
      
      import requests
      
      
      def send_alarm(msg):
          # Hook for whatever alerting channel is in use (IM, mail, ...); stubbed out here.
          pass
      
      def simple_query(query='kube_node_status_condition{status="true"}==1'):
          """
          獲取5分鐘前的監控數據
          """
          step = 20
          now = int(time.time())
          start = now - 300
          end = now - 300
          params = (
              ('query', str(query)),
              ('start', str(start)),
              ('end', str(end)),
              ('step', str(step)),
          )
      
          response = requests.get(
              'http://localhost:9090/api/v1/query_range', params=params)
      
          if response:
              return response.json()
          else:
              return None
      
      
      def json_node_exporter_cadvisor():
          kube_node_status_condition = simple_query(
              query='kube_node_status_condition{status="true"}==1')
          kube_node_labels = simple_query(query='kube_node_labels')
          kube_node_info = simple_query(query='kube_node_info')
          try:
              cluster_count_old = {}
              with open("/home/server/prometheus/groups/node-exporter/config_node_exporter.json") as f:
                  config = json.loads(f.read())
                  for node in config:
                      cluster = node["labels"]["cluster"]
                      if cluster not in cluster_count_old:
                          cluster_count_old[cluster] = 1
                      else:
                          cluster_count_old[cluster] += 1
          except Exception:
              # First run or unreadable file: no previous node counts to compare against.
              cluster_count_old = {}
      
      
          try:
              nodes_ready = [i["metric"]["node"]
                             for i in kube_node_status_condition["data"]["result"]]
              node_label_dict = {
                  i["metric"]["node"]: {
                      "cluster": i["metric"]["cluster"],
                      "group": i["metric"].get("label_group")
                  } for i in kube_node_labels["data"]["result"]
              }
              node_version_dict = {
                  i["metric"]["node"]: str(
                      i["metric"]["kubelet_version"]).strip("v").split(".")
                  for i in kube_node_info["data"]["result"]
              }
              config_node_exporter = []
              config_cadvisor = []
              config_cadvisor_standalone = []
              cluster_count_new = {}
              for n in nodes_ready:
                  targets_9100 = [str(n) + ":9100"]
                  targets_4194 = [str(n) + ":4194"]
                  version = node_version_dict[n]
                  cluster = node_label_dict[n]["cluster"]
                  if cluster not in cluster_count_new:
                      cluster_count_new[cluster] = 1
                  else:
                      cluster_count_new[cluster] += 1
                  item_node = {"labels": node_label_dict[n], "targets": targets_9100}
                  item_cadvisor = {
                      "labels": node_label_dict[n], "targets": targets_4194}
                  config_node_exporter.append(item_node)
                  if int(version[0]) == 1 and int(version[1]) >= 14:  # standalone cadvisor on kubernetes >= 1.14
                      config_cadvisor_standalone.append(item_cadvisor)
                  else:
                      config_cadvisor.append(item_cadvisor)
      
              cluster_config_change = {
                  cluster: cluster_count_new.get(cluster, 0) - cluster_count_old[cluster]
                  for cluster in cluster_count_old
              }
              # On the very first run there is no previous config, so treat it as "no change".
              change_min = min(cluster_config_change.values()) if cluster_config_change else 0
              if change_min >= -3:  # skip the automatic update if any cluster lost more than 3 nodes
                  with open("/home/server/prometheus/groups/node-exporter/config_node_exporter.json", "w") as f:
                      f.write(json.dumps(config_node_exporter, indent=4))
                  with open("/home/server/prometheus/groups/cadvisor/config_cadvisor.json", "w") as f:
                      f.write(json.dumps(config_cadvisor, indent=4))
                  with open("/home/server/prometheus/groups/cadvisor-standalone/config_cadvisor_standalone.json", "w") as f:
                      f.write(json.dumps(config_cadvisor_standalone, indent=4))
              else:
                  with open("/home/server/prometheus/groups/node-exporter/config_node_exporter.json.new", "w") as f:
                      f.write(json.dumps(config_node_exporter, indent=4))
                  with open("/home/server/prometheus/groups/cadvisor/config_cadvisor.json.new", "w") as f:
                      f.write(json.dumps(config_cadvisor, indent=4))
                  with open("/home/server/prometheus/groups/cadvisor-standalone/config_cadvisor_standalone.json.new", "w") as f:
                      f.write(json.dumps(config_cadvisor_standalone, indent=4))
      
                  msg = "node count in the prometheus config changed by {}, please verify the config: {}".format(str(change_min), str(cluster_config_change))
                  send_alarm(msg)
              return cluster_config_change
          except Exception as e:
              print(e)
      
      if __name__ == "__main__":
          res = json_node_exporter_cadvisor()
          print(res)
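      The script is then run on a schedule; a crontab entry like the following would do (the script path is hypothetical):

      # refresh the target files every 10 minutes
      */10 * * * * python /home/server/prometheus/update_targets.py >> /var/log/prometheus_sd.log 2>&1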
      