K8S Production Practice 16-3: Prometheus in Action

Part 1: Helm & Operator

Choosing a deployment approach

  • 1. Manual deployment
  • 2. Helm
  • 3. Prometheus Operator
  • 4. Helm + Prometheus Operator

Tool overview

Helm in brief

  • Like apt-get on Ubuntu or yum on CentOS
  • The package manager for Kubernetes
  • One package per Chart (a directory)


Helm is essentially a package (resource bundle) manager for Kubernetes: it organizes an application's related resources into Charts and manages packages through them. Put simply, it is the counterpart of the yum mechanism on RHEL/CentOS — just as there is yum install, there is helm install. See other introductions online for details.

Kubernetes solved the hard problem of container maintenance: by writing YAML (Deployment, Job, StatefulSet, ConfigMap, and so on) and relying on control loops, container images become easy to manage and cluster maintenance gets far easier (Google has 15 years of production-workload experience, running billions of containers per week).

Maintaining a pile of YAML files while managing an application, however, carries a fairly steep learning curve.

Hence Helm. Helm defines a Chart format for describing an application. By analogy: an Android program packaged as an APK can be installed on any phone running Android. If Kubernetes is the Android system and a Kubernetes application is an Android program, then a Chart is the APK. This also means that once a Kubernetes application is packaged as a Chart, Helm can deploy it to any Kubernetes cluster.
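Continuing the APK analogy, a Chart is just a directory with a fixed layout. The following sketch creates the smallest useful skeleton by hand; all names here (such as `mychart`) are placeholders for illustration, not taken from this course:

```shell
# Minimal Helm chart skeleton (placeholder names; illustrative only).
mkdir -p mychart/templates          # rendered manifests live under templates/
cat > mychart/Chart.yaml <<'EOF'    # chart metadata
apiVersion: v2
name: mychart
description: A hypothetical example chart
version: 0.1.0
EOF
cat > mychart/values.yaml <<'EOF'   # default configurable values
replicaCount: 1
EOF
ls -R mychart
```

With this layout, `helm install myrelease ./mychart` would render templates/ against values.yaml; the chart installed later in this section has the same skeleton, just with many more templates.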

The Helm community maintains an official Helm Hub, so we can directly reuse Helm Charts that others have already built; with Helm, managing relatively complex applications becomes much simpler.

So, for us:

  • Programming for Java: .java files
  • Programming for Docker: a Dockerfile
  • Programming for Kubernetes: YAML
  • Programming for Helm: a Chart

Official site: https://helm.sh/docs/using_helm/#quickstart-guide
GitHub:https://github.com/helm/helm

Deploying Helm

First, make sure the node where you deploy Helm can run kubectl normally.

1. Installing the Helm client

======================== Updated to Helm 3.x; the content between these markers is obsolete — Begin ========================

Download

Helm ships as a single binary; download it directly from the GitHub releases page:
https://github.com/helm/helm/releases

Readers in mainland China who cannot reach GitHub can download it from my netdisk instead; the version is 2.13.1-linux-amd64.
Link: https://pan.baidu.com/s/1bu-cpjVaSVGVXuWvWoqHEw
Extraction code: 5wds

Install
# Download
$ wget https://get.helm.sh/helm-v2.13.1-linux-amd64.tar.gz

# Extract
$ tar -zxvf helm-v2.13.1-linux-amd64.tar.gz
$ mv linux-amd64/helm /usr/local/bin/

# Add /usr/local/bin to PATH first if it is not already there
$ export PATH=$PATH:/usr/local/bin/

# Verify
$ helm version

[root@hombd03 softwards]# helm version
Client: &version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"}
Error: could not find tiller

2. Installing Tiller

Tiller is deployed into the Kubernetes cluster as a Deployment. Since Helm pulls its image from storage.googleapis.com by default, we assume here that it cannot be reached:

# Point to the Alibaba Cloud repository
$ helm init --client-only --stable-repo-url https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/charts/
$ helm repo add incubator https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/charts-incubator/
$ helm repo update

# The official image cannot be pulled, so use -i to specify an alternative image
$ helm init --service-account tiller --upgrade -i registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.13.1  --stable-repo-url https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts

# Create a TLS-enabled Tiller (server side)
$ helm init --service-account tiller --upgrade -i registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.13.1 --tiller-tls-cert /etc/kubernetes/ssl/tiller001.pem --tiller-tls-key /etc/kubernetes/ssl/tiller001-key.pem --tls-ca-cert /etc/kubernetes/ssl/ca.pem --tiller-namespace kube-system --stable-repo-url https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts

3. Granting Tiller permissions

Helm's server side, Tiller, runs as a Deployment inside Kubernetes and calls the API server to operate on the cluster. The default Tiller deployment defines no authorized ServiceAccount, so its requests to the API server are rejected; we therefore need to grant it permissions explicitly.

# Create the ServiceAccount
$ kubectl create serviceaccount --namespace kube-system tiller
# Create the cluster role binding
$ kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller

4. Verification

# Check Tiller's serviceAccount; it must match the name we created: tiller
$ kubectl get deploy --namespace kube-system tiller-deploy -o yaml|grep serviceAccount

# Check the pods
$ kubectl -n kube-system get pods|grep tiller

# Check the version
$ helm version

======================== Updated to Helm 3.x; the content between these markers is obsolete — End ========================

Installing Tiller as above throws errors, so we upgrade to Helm 3.x here; 3.x has removed Tiller, which avoids a lot of trouble.

Just download and extract, and it is ready to use (Tiller is gone in 3.x):
https://github.com/helm/helm/releases

wget https://get.helm.sh/helm-v3.8.0-linux-amd64.tar.gz
tar xf helm-v3.8.0-linux-amd64.tar.gz
mv linux-amd64/helm /usr/local/bin/helm

Output:

[root@homaybd03 softwards]# helm version
version.BuildInfo{Version:"v3.8.0", GitCommit:"d14138609b01886f544b2025f5000351c9eb092e", GitTreeState:"clean", GoVersion:"go1.17.5"}

2. Add a repository

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

3. Install a chart

helm install <release-name> <chart> -f ./values.yaml --namespace <namespace>
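The `-f ./values.yaml` flag supplies per-install overrides of the chart's default values. Which keys are valid depends entirely on the chart; the following is only an illustrative sketch, not a file from this course:

```yaml
# Hypothetical values override file; real keys depend on the chart installed.
replicaCount: 2
image:
  repository: nginx
  tag: "1.21"
service:
  type: ClusterIP
  port: 80
```

Values passed with -f are merged over the chart's bundled values.yaml, and individual --set flags override both.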

4. List releases

helm list -n <namespace>

5. Installing the Prometheus Operator chart

In Helm 2, a random release name was generated automatically if none was specified. In Helm 3 you must either give a name explicitly or pass --generate-name to have one generated for you.

In Helm v3, use:

helm install [NAME] [CHART]

Examples:

helm install rancher rancher-stable/rancher
helm install rancher-stable/rancher --generate-name

Hands-on:

# Create the namespace
kubectl create namespace monitoring

helm install   imooc-prom stable/prometheus-operator --namespace monitoring

The installation failed with an error:

[root@hombd03 softwards]# helm install   imooc-prom stable/prometheus-operator --namespace monitoring
Error: INSTALLATION FAILED: failed to download "stable/prometheus-operator"
[root@homaybd03 softwards]# 

That is because the helm repo does not carry the prometheus-operator chart; it has to be pulled directly from GitHub:
https://github.com/helm/charts/tree/master/stable/prometheus-operator

Download the charts:

# yum -y install git
[root@homaybd03 softwards]# git clone https://github.com/helm/charts.git

After downloading, inspect the directory:

[root@hombd03 softwards]# cd charts/
[root@hombd03 charts]# ls -l
total 76
-rw-r--r--.   1 root root   137 Jun 26 09:40 code-of-conduct.md
-rw-r--r--.   1 root root  6765 Jun 26 09:40 CONTRIBUTING.md
drwxr-xr-x.  75 root root  4096 Jun 26 09:40 incubator
-rw-r--r--.   1 root root 11343 Jun 26 09:40 LICENSE
-rw-r--r--.   1 root root   240 Jun 26 09:40 OWNERS
-rw-r--r--.   1 root root  3248 Jun 26 09:40 PROCESSES.md
-rw-r--r--.   1 root root 10057 Jun 26 09:40 README.md
-rw-r--r--.   1 root root 15223 Jun 26 09:40 REVIEW_GUIDELINES.md
drwxr-xr-x. 284 root root  8192 Jun 26 09:40 stable
drwxr-xr-x.   3 root root   168 Jun 26 09:40 test
[root@homaybd03 charts]# cd stable/
[root@homaybd03 stable]# ls -l
total 4
drwxr-xr-x. 3 root root   96 Jun 26 09:40 acs-engine-autoscaler
drwxr-xr-x. 3 root root   96 Jun 26 09:40 aerospike
...
[root@homaybd03 stable]# cd prometheus-operator/
[root@homaybd03 prometheus-operator]# ls -l
total 160
-rw-r--r--. 1 root root   790 Jun 26 09:40 Chart.yaml
drwxr-xr-x. 2 root root    83 Jun 26 09:40 ci
-rw-r--r--. 1 root root   658 Jun 26 09:40 CONTRIBUTING.md
drwxr-xr-x. 2 root root   181 Jun 26 09:40 crds
drwxr-xr-x. 3 root root   129 Jun 26 09:40 hack
-rw-r--r--. 1 root root 78549 Jun 26 09:40 README.md
-rw-r--r--. 1 root root   457 Jun 26 09:40 requirements.lock
-rw-r--r--. 1 root root   468 Jun 26 09:40 requirements.yaml
drwxr-xr-x. 7 root root   140 Jun 26 09:40 templates
-rw-r--r--. 1 root root 64481 Jun 26 09:40 values.yaml
[root@homaybd03 prometheus-operator]# 

Copy the needed files into the current directory:

[root@hombd03 prometheus-operator]# cd ~
[root@hombd03 ~]# cp -r /opt/softwards/charts/stable/prometheus-operator/ .
[root@hombd03 ~]# ls -l
total 248
-rw-------. 1 root root   1820 Jan 27  2021 anaconda-ks.cfg
-rw-r--r--. 1 root root 202880 Jun  4 23:02 calico.yaml
-rw-r--r--. 1 root root   5159 Jun  4 23:53 coredns.yaml
-rw-r--r--. 1 root root  15169 Jun 19 01:06 deploy.yaml
drwxr-xr-x. 2 root root     53 Jun 19 09:36 ingress-nginx
drwxr-xr-x. 2 root root     42 Jun 18 00:59 nginx
-rw-r--r--. 1 root root    497 Jun  5 00:28 nginx-ds.yml
-rw-r--r--. 1 root root   3807 Jun  5 00:04 nodelocaldns.yaml
drwxr-xr-x. 2 root root   4096 Jun  4 01:46 pki
-rw-r--r--. 1 root root    160 Jun  5 01:26 pod-nginx.yaml
drwxr-xr-x. 6 root root    203 Jun 26 09:44 prometheus-operator
drwxr-xr-x. 2 root root     34 Mar  3 16:41 software
-rw-r--r--. 1 root root   1473 Jun 23 00:45 tiller.yaml
drwxr-xr-x. 2 root root     55 Mar 17 11:29 wing-test
[root@homaybd03 ~]# 

Then try the installation again:

helm install   imooc-prom ./prometheus-operator/ --namespace monitoring
[root@hombd03 ~]# helm install   imooc-prom ./prometheus-operator/ --namespace monitoring
WARNING: This chart is deprecated
Error: INSTALLATION FAILED: An error occurred while checking for chart dependencies. You may need to run `helm dependency build` to fetch missing dependencies: found in Chart.yaml, but missing in charts/ directory: kube-state-metrics, prometheus-node-exporter, grafana
[root@hombd03 ~]# 

The dependencies are missing; fix that by copying them in:

[root@hombd03 ~]# mkdir prometheus-operator/charts
[root@hombd03 ~]# cp -r /opt/softwards/charts/stable/kube-state-metrics/ prometheus-operator/charts/
[root@hombd03 ~]# cp -r /opt/softwards/charts/stable/prometheus-node-exporter/ prometheus-operator/charts/
[root@hombd03 ~]# cp -r /opt/softwards/charts/stable/grafana/ prometheus-operator/charts/

Then run the install command again:

# kubectl create namespace monitoring
[root@hombd03 ~]# helm install   imooc-prom ./prometheus-operator/ --namespace monitoring

Warning: admissionregistration.k8s.io/v1beta1 ValidatingWebhookConfiguration is deprecated in v1.16+, unavailable in v1.22+; use admissionregistration.k8s.io/v1 ValidatingWebhookConfiguration
NAME: imooc-prom
LAST DEPLOYED: Sun Jun 26 10:42:48 2022
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
NOTES:
*******************
*** DEPRECATED ****
*******************
* stable/prometheus-operator chart is deprecated.
* Further development has moved to https://github.com/prometheus-community/helm-charts
* The chart has been renamed kube-prometheus-stack to more clearly reflect
* that it installs the `kube-prometheus` project stack, within which Prometheus
* Operator is only one component.

The Prometheus Operator has been installed. Check its status by running:
  kubectl --namespace monitoring get pods -l "release=imooc-prom"

Visit https://github.com/coreos/prometheus-operator for instructions on how
to create & configure Alertmanager and Prometheus instances using the Operator.
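As the NOTES say, Prometheus and Alertmanager instances are declared through the Operator's CRDs rather than as raw StatefulSets. For orientation, this is roughly what a Prometheus custom resource looks like; every value below is illustrative (the chart generates a far more complete one):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example                  # hypothetical name
  namespace: monitoring
spec:
  replicas: 1
  serviceAccountName: prometheus # assumes a ServiceAccount with scrape RBAC exists
  serviceMonitorSelector:
    matchLabels:
      release: imooc-prom        # pick up ServiceMonitors carrying this label
  resources:
    requests:
      memory: 400Mi
```

The Operator watches objects like this and materializes the corresponding prometheus-* StatefulSet, which is why one appears in the resource listings below.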

Check the status:

[root@hombd03 ~]# kubectl --namespace monitoring get pods -l "release=imooc-prom"
NAME                                                   READY   STATUS    RESTARTS   AGE
imooc-prom-prometheus-node-exporter-7kngz              1/1     Running   0          111s
imooc-prom-prometheus-node-exporter-fj6pk              1/1     Running   0          111s
imooc-prom-prometheus-oper-operator-69897755d9-c8jtg   2/2     Running   0          111s
[root@homaybd03 ~]# 

Inspect the custom CRDs:

[root@hombd03 ~]# kubectl get crd|grep coreos
alertmanagers.monitoring.coreos.com                   2022-06-26T02:39:39Z
podmonitors.monitoring.coreos.com                     2022-06-26T02:39:39Z
prometheuses.monitoring.coreos.com                    2022-06-26T02:39:39Z
prometheusrules.monitoring.coreos.com                 2022-06-26T02:39:40Z
servicemonitors.monitoring.coreos.com                 2022-06-26T02:39:40Z
thanosrulers.monitoring.coreos.com                    2022-06-26T02:39:40Z
[root@homaybd03 ~]# 
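Of these CRDs, ServiceMonitor is the one used most in day-to-day work: it tells Prometheus which Services to scrape. A minimal sketch follows; the names and labels are invented for illustration, not taken from this cluster:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app          # hypothetical name
  namespace: monitoring
  labels:
    release: imooc-prom      # so the chart's Prometheus instance selects it
spec:
  selector:
    matchLabels:
      app: example-app       # must match the target Service's labels
  endpoints:
  - port: web                # a named port on the Service
    interval: 30s
```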

Inspect a CRD's full YAML:

[root@hombd03 ~]# kubectl get crd alertmanagers.monitoring.coreos.com -o yaml | less

List all resources in the namespace:

[root@hombd03 ~]# kubectl get all -n monitoring
[root@homaybd03 ~]# kubectl get alertmanager -n monitoring
NAME                                      VERSION   REPLICAS   AGE
imooc-prom-prometheus-oper-alertmanager   v0.21.0   1          25m

Check the Pods:

[root@hombd03 ~]# kubectl get pods -n monitoring
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-imooc-prom-prometheus-oper-alertmanager-0   2/2     Running   0          29m
imooc-prom-grafana-7958597d48-6j62s                      2/2     Running   0          29m
imooc-prom-kube-state-metrics-b5586675f-cr5dh            1/1     Running   0          29m
imooc-prom-prometheus-node-exporter-7kngz                1/1     Running   0          29m
imooc-prom-prometheus-node-exporter-fj6pk                1/1     Running   0          29m
imooc-prom-prometheus-oper-operator-69897755d9-c8jtg     2/2     Running   0          29m
prometheus-imooc-prom-prometheus-oper-prometheus-0       3/3     Running   1          29m
[root@homaybd03 ~]# 

Check the Deployments:

[root@hombd03 ~]# kubectl get deploy -n monitoring
NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
imooc-prom-grafana                    1/1     1            1           34m
imooc-prom-kube-state-metrics         1/1     1            1           34m
imooc-prom-prometheus-oper-operator   1/1     1            1           34m
[root@homaybd03 ~]# 

Check the DaemonSet:

[root@hombd03 ~]# kubectl get ds -n monitoring
NAME                                  DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
imooc-prom-prometheus-node-exporter   2         2         2       2            2           <none>          35m
[root@homaybd03 ~]# 

Check the StatefulSets:

[root@hombd03 ~]# kubectl get statefulset -n monitoring
NAME                                                   READY   AGE
alertmanager-imooc-prom-prometheus-oper-alertmanager   1/1     37m
prometheus-imooc-prom-prometheus-oper-prometheus       1/1     36m
[root@homaybd03 ~]# 

Check the Services:

[root@hombd03 ~]# kubectl get svc -n monitoring
NAME                                      TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)                      AGE
alertmanager-operated                     ClusterIP   None              <none>        9093/TCP,9094/TCP,9094/UDP   39m
imooc-prom-grafana                        ClusterIP   192.233.63.180    <none>        80/TCP                       39m
imooc-prom-kube-state-metrics             ClusterIP   192.233.158.170   <none>        8080/TCP                     39m
imooc-prom-prometheus-node-exporter       ClusterIP   192.233.99.65     <none>        9100/TCP                     39m
imooc-prom-prometheus-oper-alertmanager   ClusterIP   192.233.249.182   <none>        9093/TCP                     39m
imooc-prom-prometheus-oper-operator       ClusterIP   192.233.158.223   <none>        8080/TCP,443/TCP             39m
imooc-prom-prometheus-oper-prometheus     ClusterIP   192.233.119.38    <none>        9090/TCP                     39m
prometheus-operated                       ClusterIP   None              <none>        9090/TCP                     39m
[root@hombd03 ~]# 

Inspect the Prometheus Service to see how to access it:

[root@hombd03 ~]# kubectl get svc -n monitoring imooc-prom-prometheus-oper-prometheus -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: imooc-prom
    meta.helm.sh/release-namespace: monitoring
  creationTimestamp: "2022-06-26T02:43:13Z"
  labels:
    app: prometheus-operator-prometheus
    app.kubernetes.io/managed-by: Helm
    chart: prometheus-operator-9.3.2
    heritage: Helm
    release: imooc-prom
    self-monitor: "true"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:meta.helm.sh/release-name: {}
          f:meta.helm.sh/release-namespace: {}
        f:labels:
          .: {}
          f:app: {}
          f:app.kubernetes.io/managed-by: {}
          f:chart: {}
          f:heritage: {}
          f:release: {}
          f:self-monitor: {}
      f:spec:
        f:ports:
          .: {}
          k:{"port":9090,"protocol":"TCP"}:
            .: {}
            f:name: {}
            f:port: {}
            f:protocol: {}
            f:targetPort: {}
        f:selector:
          .: {}
          f:app: {}
          f:prometheus: {}
        f:sessionAffinity: {}
        f:type: {}
    manager: helm
    operation: Update
    time: "2022-06-26T02:43:13Z"
  name: imooc-prom-prometheus-oper-prometheus
  namespace: monitoring
  resourceVersion: "2591280"
  uid: 5cc3cb90-3107-4e67-893a-c2969a22d983
spec:
  clusterIP: 192.233.119.38
  clusterIPs:
  - 192.233.119.38
  ports:
  - name: web
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    app: prometheus
    prometheus: imooc-prom-prometheus-oper-prometheus
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
[root@homaybd03 ~]# 

Deploy ingress-prometheus:

[root@homaybd03 12-monitoring]# kubectl apply -f ingress-prometheus.yaml
Warning: extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
Error from server (InternalError): error when creating "ingress-prometheus.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post "https://ingress-nginx-controller-admission.ingress-nginx.svc:443/networking/v1/ingresses?timeout=10s": dial tcp 192.233.236.91:443: i/o timeout
[root@homaybd03 12-monitoring]# 

Creating the custom Ingress fails with: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io"

[root@homaybd03 ~]# kubectl get validatingwebhookconfigurations
NAME                                   WEBHOOKS   AGE
imooc-prom-prometheus-oper-admission   1          116m
ingress-nginx-admission                1          7d11h
[root@homaybd03 ~]# 

Delete the validating webhook configurations:

kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission
kubectl delete -A ValidatingWebhookConfiguration imooc-prom-prometheus-oper-admission

Then apply it again:

[root@homaybd03 12-monitoring]# kubectl apply -f ingress-prometheus.yaml
Warning: extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
ingress.extensions/prometheus created

Configure the hosts file:

192.168.1.124 prometheus.mooc.com

Open the UI:
http://prometheus.mooc.com/graph
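The ingress-prometheus.yaml applied above is not reproduced in these notes. Judging from the apiVersion deprecation warning in the output, it used the extensions/v1beta1 API; the following is a hypothetical reconstruction (host and backend inferred from the hosts entry and the Service list — treat it as an assumption, not the original file):

```yaml
# Hypothetical reconstruction of ingress-prometheus.yaml; not the original file.
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: prometheus
  namespace: monitoring
spec:
  rules:
  - host: prometheus.mooc.com   # matches the hosts entry above
    http:
      paths:
      - path: /
        backend:
          serviceName: imooc-prom-prometheus-oper-prometheus  # the 9090/TCP Service
          servicePort: 9090
```

On Kubernetes v1.22+ the same rule would need to be rewritten against networking.k8s.io/v1.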

(Screenshots of the Prometheus web UI omitted.)

Uninstalling Prometheus:
Prometheus is resource-hungry (it maxed out the CPU here), so uninstall it once testing is done. Note that helm delete does not remove the CRDs the chart installed; delete those separately with kubectl delete crd if you want a clean slate:

[root@hombd03 ~]# helm list -n monitoring
NAME        NAMESPACE   REVISION    UPDATED                                 STATUS      CHART                       APP VERSION
imooc-prom  monitoring  1           2022-06-26 10:42:48.532812441 +0800 CST deployed    prometheus-operator-9.3.2   0.38.1     
[root@homaybd03 ~]# 

[root@hombd03 ~]# helm delete imooc-prom -n monitoring

Uninstall output:

[root@hombd03 ~]# helm delete imooc-prom -n monitoring
W0706 10:52:12.535602   22712 warnings.go:70] rbac.authorization.k8s.io/v1beta1 RoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 RoleBinding
W0706 10:52:12.547651   22712 warnings.go:70] rbac.authorization.k8s.io/v1beta1 Role is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 Role
W0706 10:52:12.774223   22712 warnings.go:70] admissionregistration.k8s.io/v1beta1 MutatingWebhookConfiguration is deprecated in v1.16+, unavailable in v1.22+; use admissionregistration.k8s.io/v1 MutatingWebhookConfiguration
W0706 10:52:12.969331   22712 warnings.go:70] admissionregistration.k8s.io/v1beta1 ValidatingWebhookConfiguration is deprecated in v1.16+, unavailable in v1.22+; use admissionregistration.k8s.io/v1 ValidatingWebhookConfiguration
release "imooc-prom" uninstalled
[root@hombd03 ~]# 


Graphs (screenshots omitted): node- and component-level metrics can be monitored here.

Grafana (screenshots omitted).

Those who act often succeed; those who keep walking often arrive.