K8S Production Practice - 16-3 - Prometheus in Action
I. Helm & Operator
Choosing a deployment approach
- 1. Manual deployment
- 2. Helm
- 3. Prometheus Operator
- 4. Helm + Prometheus Operator
Tool overview
About Helm
- Like apt-get on Ubuntu or yum on CentOS
- The package manager for Kubernetes
- One package per Chart (a directory)
Helm is essentially a package manager for Kubernetes-based applications: it organizes an application's resources into Charts and manages packages through them. Put simply, it plays the same role as the yum mechanism on RHEL/CentOS: just as yum has `yum install`, Helm has `helm install`, and so on. See other introductions online for more details.
Kubernetes solved the problem of maintaining containers: resources such as Deployment, Job, StatefulSet, and ConfigMap are written as YAML, and control loops keep containerized workloads easy to manage, drastically reducing the effort of cluster maintenance (built on 15 years of Google's production experience, running billions of containers per week).
However, managing an application this way means maintaining a pile of YAML files, which raises the learning curve.
That is where Helm comes in. Helm defines a Chart format to describe an application. By analogy: an Android program packaged as an APK can be installed on any phone running Android. If Kubernetes is the Android system and a Kubernetes application is the Android program, then a Chart is the APK. Once a Kubernetes application is packaged as a Chart, Helm can deploy it to any Kubernetes cluster.
The Helm community maintains an official Helm Hub, so we can reuse Charts that others have already built; Helm makes it much simpler to manage complex applications.
So, for us:
- Programming for Java: .java files
- Programming for Docker: Dockerfile
- Programming for Kubernetes: YAML
- Programming for Helm: Chart
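For a concrete picture of the Chart format: a chart is just a directory whose metadata lives in Chart.yaml. A minimal sketch (the name and values here are made up for illustration):

```yaml
# mychart/Chart.yaml -- minimal chart metadata (illustrative values)
apiVersion: v1          # chart API version (use v2 for Helm 3-only charts)
name: mychart           # chart name; conventionally matches the directory name
version: 0.1.0          # chart version, SemVer
description: A minimal example chart
# the rest of the directory:
#   mychart/values.yaml   - default configuration values
#   mychart/templates/    - templated Kubernetes manifests
#   mychart/charts/       - dependency charts
```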
官网:https://helm.sh/docs/using_helm/#quickstart-guide
GitHub:https://github.com/helm/helm
Deploying Helm
First, make sure the node where you deploy Helm can run kubectl successfully.
1. Installing the Helm client
======================== Updated to Helm 3.x; the content between these markers is obsolete. Begin ========================
Download
Helm is a single binary; download it directly from the GitHub releases page:
https://github.com/helm/helm/releases
If the release page is unreachable from your network, you can grab version 2.13.1-linux-amd64 from my netdisk instead.
Link: https://pan.baidu.com/s/1bu-cpjVaSVGVXuWvWoqHEw
Access code: 5wds
Install
# Download
$ wget https://get.helm.sh/helm-v2.13.1-linux-amd64.tar.gz
# Extract
$ tar -zxvf helm-v2.13.1-linux-amd64.tar.gz
$ mv linux-amd64/helm /usr/local/bin/
# Configure PATH first if it is not set up yet
$ export PATH=$PATH:/usr/local/bin/
# Verify
$ helm version
[root@hombd03 softwards]# helm version
Client: &version.Version{SemVer:"v2.13.1", GitCommit:"618447cbf203d147601b4b9bd7f8c37a5d39fbb4", GitTreeState:"clean"}
Error: could not find tiller
2. Installing Tiller
Tiller is deployed into the Kubernetes cluster as a Deployment. Helm pulls the Tiller image from storage.googleapis.com by default; here we assume that host is unreachable:
# Point the stable repo at the Aliyun mirror
$ helm init --client-only --stable-repo-url https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/charts/
$ helm repo add incubator https://aliacs-app-catalog.oss-cn-hangzhou.aliyuncs.com/charts-incubator/
$ helm repo update
# The official image cannot be pulled, so use -i to supply a mirror image
$ helm init --service-account tiller --upgrade -i registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.13.1 --stable-repo-url https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts
# Or, to set up the server side with TLS authentication:
$ helm init --service-account tiller --upgrade -i registry.cn-hangzhou.aliyuncs.com/google_containers/tiller:v2.13.1 --tiller-tls-cert /etc/kubernetes/ssl/tiller001.pem --tiller-tls-key /etc/kubernetes/ssl/tiller001-key.pem --tls-ca-cert /etc/kubernetes/ssl/ca.pem --tiller-namespace kube-system --stable-repo-url https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts
3. Granting Tiller permissions
Helm's server side, Tiller, is a Deployment in Kubernetes that calls the API server to operate on the cluster. The default Tiller deployment has no authorized ServiceAccount, so its API server requests would be rejected; we must grant one explicitly.
# Create the serviceaccount
$ kubectl create serviceaccount --namespace kube-system tiller
# Create the cluster role binding
$ kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
4. Verification
# Check Tiller's serviceAccount; it must match the one we created: tiller
$ kubectl get deploy --namespace kube-system tiller-deploy -o yaml|grep serviceAccount
# Verify the pod
$ kubectl -n kube-system get pods|grep tiller
# Verify the version
$ helm version
======================== Updated to Helm 3.x; the content between these markers is obsolete. End ========================
Installing Tiller as above fails, so we install Helm 3.x instead. Helm 3.x has removed Tiller, which avoids a lot of trouble: just download and extract, and it is ready to use.
https://github.com/helm/helm/releases
wget https://get.helm.sh/helm-v3.8.0-linux-amd64.tar.gz
tar xf helm-v3.8.0-linux-amd64.tar.gz
mv linux-amd64/helm /usr/local/bin/helm
Output:
[root@homaybd03 softwards]# helm version
version.BuildInfo{Version:"v3.8.0", GitCommit:"d14138609b01886f544b2025f5000351c9eb092e", GitTreeState:"clean", GoVersion:"go1.17.5"}
2. Add a repository
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
3. Install
helm install <release-name> <chart> -f ./values.yaml --namespace <namespace>
4. List releases
helm list -n <namespace>
Installing prometheus-operator
In Helm 2, a random release name was generated if none was given. In Helm 3 you must either specify a name or pass --generate-name to have one generated automatically.
With Helm v3 the form is:
helm install [NAME] [CHART]
Examples:
helm install rancher rancher-stable/rancher
helm install rancher-stable/rancher --generate-name
Hands-on:
# Create the namespace
kubectl create namespace monitoring
helm install imooc-prom stable/prometheus-operator --namespace monitoring
The installation failed:
[root@hombd03 softwards]# helm install imooc-prom stable/prometheus-operator --namespace monitoring
Error: INSTALLATION FAILED: failed to download "stable/prometheus-operator"
[root@homaybd03 softwards]#
This is because the Helm stable repo no longer serves the prometheus-operator chart, so we fetch it from GitHub directly:
https://github.com/helm/charts/tree/master/stable/prometheus-operator
Download the charts:
# yum -y install git
[root@homaybd03 softwards]# git clone https://github.com/helm/charts.git
After the download, inspect the directory:
[root@hombd03 softwards]# cd charts/
[root@hombd03 charts]# ls -l
total 76
-rw-r--r--. 1 root root 137 Jun 26 09:40 code-of-conduct.md
-rw-r--r--. 1 root root 6765 Jun 26 09:40 CONTRIBUTING.md
drwxr-xr-x. 75 root root 4096 Jun 26 09:40 incubator
-rw-r--r--. 1 root root 11343 Jun 26 09:40 LICENSE
-rw-r--r--. 1 root root 240 Jun 26 09:40 OWNERS
-rw-r--r--. 1 root root 3248 Jun 26 09:40 PROCESSES.md
-rw-r--r--. 1 root root 10057 Jun 26 09:40 README.md
-rw-r--r--. 1 root root 15223 Jun 26 09:40 REVIEW_GUIDELINES.md
drwxr-xr-x. 284 root root 8192 Jun 26 09:40 stable
drwxr-xr-x. 3 root root 168 Jun 26 09:40 test
[root@homaybd03 charts]# cd stable/
[root@homaybd03 stable]# ls -l
total 4
drwxr-xr-x. 3 root root 96 Jun 26 09:40 acs-engine-autoscaler
drwxr-xr-x. 3 root root 96 Jun 26 09:40 aerospike
...
[root@homaybd03 stable]# cd prometheus-operator/
[root@homaybd03 prometheus-operator]# ls -l
total 160
-rw-r--r--. 1 root root 790 Jun 26 09:40 Chart.yaml
drwxr-xr-x. 2 root root 83 Jun 26 09:40 ci
-rw-r--r--. 1 root root 658 Jun 26 09:40 CONTRIBUTING.md
drwxr-xr-x. 2 root root 181 Jun 26 09:40 crds
drwxr-xr-x. 3 root root 129 Jun 26 09:40 hack
-rw-r--r--. 1 root root 78549 Jun 26 09:40 README.md
-rw-r--r--. 1 root root 457 Jun 26 09:40 requirements.lock
-rw-r--r--. 1 root root 468 Jun 26 09:40 requirements.yaml
drwxr-xr-x. 7 root root 140 Jun 26 09:40 templates
-rw-r--r--. 1 root root 64481 Jun 26 09:40 values.yaml
[root@homaybd03 prometheus-operator]#
Copy the needed files to the current directory:
[root@hombd03 prometheus-operator]# cd ~
[root@hombd03 ~]# cp -r /opt/softwards/charts/stable/prometheus-operator/ .
[root@hombd03 ~]# ls -l
total 248
-rw-------. 1 root root 1820 Jan 27 2021 anaconda-ks.cfg
-rw-r--r--. 1 root root 202880 Jun 4 23:02 calico.yaml
-rw-r--r--. 1 root root 5159 Jun 4 23:53 coredns.yaml
-rw-r--r--. 1 root root 15169 Jun 19 01:06 deploy.yaml
drwxr-xr-x. 2 root root 53 Jun 19 09:36 ingress-nginx
drwxr-xr-x. 2 root root 42 Jun 18 00:59 nginx
-rw-r--r--. 1 root root 497 Jun 5 00:28 nginx-ds.yml
-rw-r--r--. 1 root root 3807 Jun 5 00:04 nodelocaldns.yaml
drwxr-xr-x. 2 root root 4096 Jun 4 01:46 pki
-rw-r--r--. 1 root root 160 Jun 5 01:26 pod-nginx.yaml
drwxr-xr-x. 6 root root 203 Jun 26 09:44 prometheus-operator
drwxr-xr-x. 2 root root 34 Mar 3 16:41 software
-rw-r--r--. 1 root root 1473 Jun 23 00:45 tiller.yaml
drwxr-xr-x. 2 root root 55 Mar 17 11:29 wing-test
[root@homaybd03 ~]#
Then try the install again:
helm install imooc-prom ./prometheus-operator/ --namespace monitoring
[root@hombd03 ~]# helm install imooc-prom ./prometheus-operator/ --namespace monitoring
WARNING: This chart is deprecated
Error: INSTALLATION FAILED: An error occurred while checking for chart dependencies. You may need to run `helm dependency build` to fetch missing dependencies: found in Chart.yaml, but missing in charts/ directory: kube-state-metrics, prometheus-node-exporter, grafana
[root@hombd03 ~]#
Dependencies are missing; fix that:
[root@hombd03 ~]# mkdir prometheus-operator/charts
[root@hombd03 ~]# cp -r /opt/softwards/charts/stable/kube-state-metrics/ prometheus-operator/charts/
[root@hombd03 ~]# cp -r /opt/softwards/charts/stable/prometheus-node-exporter/ prometheus-operator/charts/
[root@hombd03 ~]# cp -r /opt/softwards/charts/stable/grafana/ prometheus-operator/charts/
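Manually copying the dependency charts works; the error message itself suggests `helm dependency build` as the alternative. A sketch, assuming the repositories referenced in requirements.yaml are still reachable (they may not be, since the stable repo has been decommissioned):

```shell
cd ~/prometheus-operator
# reads requirements.lock / Chart.yaml and downloads the missing
# kube-state-metrics, prometheus-node-exporter and grafana charts into charts/
helm dependency build
```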
Then run the install command again:
# kubectl create namespace monitoring
[root@hombd03 ~]# helm install imooc-prom ./prometheus-operator/ --namespace monitoring
Warning: admissionregistration.k8s.io/v1beta1 ValidatingWebhookConfiguration is deprecated in v1.16+, unavailable in v1.22+; use admissionregistration.k8s.io/v1 ValidatingWebhookConfiguration
NAME: imooc-prom
LAST DEPLOYED: Sun Jun 26 10:42:48 2022
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
NOTES:
*******************
*** DEPRECATED ****
*******************
* stable/prometheus-operator chart is deprecated.
* Further development has moved to https://github.com/prometheus-community/helm-charts
* The chart has been renamed kube-prometheus-stack to more clearly reflect
* that it installs the `kube-prometheus` project stack, within which Prometheus
* Operator is only one component.
The Prometheus Operator has been installed. Check its status by running:
kubectl --namespace monitoring get pods -l "release=imooc-prom"
Visit https://github.com/coreos/prometheus-operator for instructions on how
to create & configure Alertmanager and Prometheus instances using the Operator.
Check the status:
[root@hombd03 ~]# kubectl --namespace monitoring get pods -l "release=imooc-prom"
NAME READY STATUS RESTARTS AGE
imooc-prom-prometheus-node-exporter-7kngz 1/1 Running 0 111s
imooc-prom-prometheus-node-exporter-fj6pk 1/1 Running 0 111s
imooc-prom-prometheus-oper-operator-69897755d9-c8jtg 2/2 Running 0 111s
[root@homaybd03 ~]#
Check the custom CRDs:
[root@hombd03 ~]# kubectl get crd|grep coreos
alertmanagers.monitoring.coreos.com 2022-06-26T02:39:39Z
podmonitors.monitoring.coreos.com 2022-06-26T02:39:39Z
prometheuses.monitoring.coreos.com 2022-06-26T02:39:39Z
prometheusrules.monitoring.coreos.com 2022-06-26T02:39:40Z
servicemonitors.monitoring.coreos.com 2022-06-26T02:39:40Z
thanosrulers.monitoring.coreos.com 2022-06-26T02:39:40Z
[root@homaybd03 ~]#
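These CRDs are how the Operator is configured. For example, a ServiceMonitor tells Prometheus which Services to scrape; a minimal sketch (the app name and labels below are hypothetical, but this chart's Prometheus does select ServiceMonitors by the release label by default):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app            # hypothetical name
  namespace: monitoring
  labels:
    release: imooc-prom        # so this release's Prometheus picks it up
spec:
  selector:
    matchLabels:
      app: example-app         # match the target Service's labels
  endpoints:
  - port: metrics              # named port on the Service
    interval: 30s
```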
Inspect one CRD's YAML:
[root@hombd03 ~]# kubectl get crd alertmanagers.monitoring.coreos.com -o yaml | less
List all resources in the namespace:
[root@hombd03 ~]# kubectl get all -n monitoring
[root@homaybd03 ~]# kubectl get alertmanager -n monitoring
NAME VERSION REPLICAS AGE
imooc-prom-prometheus-oper-alertmanager v0.21.0 1 25m
Check the pods:
[root@hombd03 ~]# kubectl get pods -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-imooc-prom-prometheus-oper-alertmanager-0 2/2 Running 0 29m
imooc-prom-grafana-7958597d48-6j62s 2/2 Running 0 29m
imooc-prom-kube-state-metrics-b5586675f-cr5dh 1/1 Running 0 29m
imooc-prom-prometheus-node-exporter-7kngz 1/1 Running 0 29m
imooc-prom-prometheus-node-exporter-fj6pk 1/1 Running 0 29m
imooc-prom-prometheus-oper-operator-69897755d9-c8jtg 2/2 Running 0 29m
prometheus-imooc-prom-prometheus-oper-prometheus-0 3/3 Running 1 29m
[root@homaybd03 ~]#
Check the deployments:
[root@hombd03 ~]# kubectl get deploy -n monitoring
NAME READY UP-TO-DATE AVAILABLE AGE
imooc-prom-grafana 1/1 1 1 34m
imooc-prom-kube-state-metrics 1/1 1 1 34m
imooc-prom-prometheus-oper-operator 1/1 1 1 34m
[root@homaybd03 ~]#
Check the DaemonSet:
[root@hombd03 ~]# kubectl get ds -n monitoring
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
imooc-prom-prometheus-node-exporter 2 2 2 2 2 <none> 35m
[root@homaybd03 ~]#
Check the StatefulSets:
[root@hombd03 ~]# kubectl get statefulset -n monitoring
NAME READY AGE
alertmanager-imooc-prom-prometheus-oper-alertmanager 1/1 37m
prometheus-imooc-prom-prometheus-oper-prometheus 1/1 36m
[root@homaybd03 ~]#
Check the Services:
[root@hombd03 ~]# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 39m
imooc-prom-grafana ClusterIP 192.233.63.180 <none> 80/TCP 39m
imooc-prom-kube-state-metrics ClusterIP 192.233.158.170 <none> 8080/TCP 39m
imooc-prom-prometheus-node-exporter ClusterIP 192.233.99.65 <none> 9100/TCP 39m
imooc-prom-prometheus-oper-alertmanager ClusterIP 192.233.249.182 <none> 9093/TCP 39m
imooc-prom-prometheus-oper-operator ClusterIP 192.233.158.223 <none> 8080/TCP,443/TCP 39m
imooc-prom-prometheus-oper-prometheus ClusterIP 192.233.119.38 <none> 9090/TCP 39m
prometheus-operated ClusterIP None <none> 9090/TCP 39m
[root@hombd03 ~]#
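Before any Ingress is set up, a quick way to reach the Prometheus UI is to port-forward the Service shown above (no DNS changes needed):

```shell
# forward local port 9090 to the Prometheus Service in the cluster
kubectl port-forward -n monitoring svc/imooc-prom-prometheus-oper-prometheus 9090:9090
# then open http://localhost:9090/graph locally
```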
Inspect the Prometheus Service for access details:
[root@hombd03 ~]# kubectl get svc -n monitoring imooc-prom-prometheus-oper-prometheus -o yaml
apiVersion: v1
kind: Service
metadata:
annotations:
meta.helm.sh/release-name: imooc-prom
meta.helm.sh/release-namespace: monitoring
creationTimestamp: "2022-06-26T02:43:13Z"
labels:
app: prometheus-operator-prometheus
app.kubernetes.io/managed-by: Helm
chart: prometheus-operator-9.3.2
heritage: Helm
release: imooc-prom
self-monitor: "true"
managedFields:
- apiVersion: v1
fieldsType: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.: {}
f:meta.helm.sh/release-name: {}
f:meta.helm.sh/release-namespace: {}
f:labels:
.: {}
f:app: {}
f:app.kubernetes.io/managed-by: {}
f:chart: {}
f:heritage: {}
f:release: {}
f:self-monitor: {}
f:spec:
f:ports:
.: {}
k:{"port":9090,"protocol":"TCP"}:
.: {}
f:name: {}
f:port: {}
f:protocol: {}
f:targetPort: {}
f:selector:
.: {}
f:app: {}
f:prometheus: {}
f:sessionAffinity: {}
f:type: {}
manager: helm
operation: Update
time: "2022-06-26T02:43:13Z"
name: imooc-prom-prometheus-oper-prometheus
namespace: monitoring
resourceVersion: "2591280"
uid: 5cc3cb90-3107-4e67-893a-c2969a22d983
spec:
clusterIP: 192.233.119.38
clusterIPs:
- 192.233.119.38
ports:
- name: web
port: 9090
protocol: TCP
targetPort: 9090
selector:
app: prometheus
prometheus: imooc-prom-prometheus-oper-prometheus
sessionAffinity: None
type: ClusterIP
status:
loadBalancer: {}
[root@homaybd03 ~]#
Deploy ingress-prometheus:
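The ingress-prometheus.yaml manifest is not shown in the original; this is a plausible reconstruction, matching the API version in the apply warning, the Service name and port seen earlier, and the hostname configured in hosts further down:

```yaml
# ingress-prometheus.yaml (illustrative reconstruction)
apiVersion: extensions/v1beta1   # deprecated; use networking.k8s.io/v1 on v1.22+
kind: Ingress
metadata:
  name: prometheus
  namespace: monitoring
spec:
  rules:
  - host: prometheus.mooc.com
    http:
      paths:
      - path: /
        backend:
          serviceName: imooc-prom-prometheus-oper-prometheus
          servicePort: 9090
```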
[root@homaybd03 12-monitoring]# kubectl apply -f ingress-prometheus.yaml
Warning: extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
Error from server (InternalError): error when creating "ingress-prometheus.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post "https://ingress-nginx-controller-admission.ingress-nginx.svc:443/networking/v1/ingresses?timeout=10s": dial tcp 192.233.236.91:443: i/o timeout
[root@homaybd03 12-monitoring]#
Creating the custom Ingress fails with: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io"
[root@homaybd03 ~]# kubectl get validatingwebhookconfigurations
NAME WEBHOOKS AGE
imooc-prom-prometheus-oper-admission 1 116m
ingress-nginx-admission 1 7d11h
[root@homaybd03 ~]#
Delete the validating webhook configurations:
kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission
kubectl delete -A ValidatingWebhookConfiguration imooc-prom-prometheus-oper-admission
Then apply it again:
[root@homaybd03 12-monitoring]# kubectl apply -f ingress-prometheus.yaml
Warning: extensions/v1beta1 Ingress is deprecated in v1.14+, unavailable in v1.22+; use networking.k8s.io/v1 Ingress
ingress.extensions/prometheus created
Configure hosts:
192.168.1.124 prometheus.mooc.com
Open the UI:
http://prometheus.mooc.com/graph
Uninstalling Prometheus:
Prometheus consumes too many resources here (CPU maxed out), so uninstall it once testing is done:
[root@hombd03 ~]# helm list -n monitoring
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
imooc-prom monitoring 1 2022-06-26 10:42:48.532812441 +0800 CST deployed prometheus-operator-9.3.2 0.38.1
[root@homaybd03 ~]#
[root@hombd03 ~]# helm delete imooc-prom -n monitoring
Uninstall output:
[root@hombd03 ~]# helm delete imooc-prom -n monitoring
W0706 10:52:12.535602 22712 warnings.go:70] rbac.authorization.k8s.io/v1beta1 RoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 RoleBinding
W0706 10:52:12.547651 22712 warnings.go:70] rbac.authorization.k8s.io/v1beta1 Role is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 Role
W0706 10:52:12.774223 22712 warnings.go:70] admissionregistration.k8s.io/v1beta1 MutatingWebhookConfiguration is deprecated in v1.16+, unavailable in v1.22+; use admissionregistration.k8s.io/v1 MutatingWebhookConfiguration
W0706 10:52:12.969331 22712 warnings.go:70] admissionregistration.k8s.io/v1beta1 ValidatingWebhookConfiguration is deprecated in v1.16+, unavailable in v1.22+; use admissionregistration.k8s.io/v1 ValidatingWebhookConfiguration
release "imooc-prom" uninstalled
[root@hombd03 ~]#
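If the concern is resource usage rather than the stack itself, an alternative to uninstalling is to disable optional components with an override file. A sketch; the keys below exist in this chart's values.yaml, but verify them against your copy:

```yaml
# light-values.yaml -- trim optional components to save CPU/memory
grafana:
  enabled: false
alertmanager:
  enabled: false
kubeStateMetrics:
  enabled: false
# install with:
#   helm install imooc-prom ./prometheus-operator/ -f light-values.yaml --namespace monitoring
```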
Dashboards:
Node and component metrics can now be monitored.
Grafana:
He who acts will often succeed; he who walks will often arrive.
Free to repost, non-commercial, no derivatives, keep attribution (Creative Commons 3.0 license).