I decided to set up HPA on my home Raspberry Pi Kubernetes cluster and installed metrics-server, but it wouldn't start, so here is how I troubleshot it.
Environment
- Kubernetes v1.30.0
- metrics-server v0.7.1
Installation
Following the instructions, run the install command:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
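If the manifest applied cleanly, metrics-server also registers an APIService with the aggregation layer; checking it is a quick way to see whether the API server considers metrics-server healthy (a sketch; the APIService name below is the one shipped in components.yaml):

```shell
# Check that the metrics.k8s.io APIService was registered by components.yaml.
# AVAILABLE stays False until the metrics-server pod passes its readiness probe.
kubectl get apiservice v1beta1.metrics.k8s.io
```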
Checking that it started
Deployment
READY stays at 0/1...
k get deployment metrics-server -n kube-system
NAME READY UP-TO-DATE AVAILABLE AGE
metrics-server 0/1 1 0 10h
Pod
Only metrics-server never becomes READY
k get po -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-7db6d8ff4d-67gfj 1/1 Running 0 8d
coredns-7db6d8ff4d-xlznj 1/1 Running 0 8d
etcd-k8s-master01 1/1 Running 0 8d
kube-apiserver-k8s-master01 1/1 Running 0 8d
kube-controller-manager-k8s-master01 1/1 Running 0 8d
kube-proxy-dww28 1/1 Running 0 8d
kube-proxy-lb9h2 1/1 Running 0 8d
kube-proxy-t9lsn 1/1 Running 0 8d
kube-scheduler-k8s-master01 1/1 Running 0 8d
metrics-server-7ffbc6d68-pqtdt 0/1 Running 0 10h
Describe Pods
kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
It looks like the readiness probe is failing
k describe po metrics-server-7ffbc6d68-pqtdt -n kube-system
Name: metrics-server-7ffbc6d68-pqtdt
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Service Account: metrics-server
Node: k8s-worker01/192.168.1.111
Start Time: Wed, 29 May 2024 23:51:51 +0900
Labels: k8s-app=metrics-server
pod-template-hash=7ffbc6d68
Annotations: <none>
Status: Running
IP: 10.244.1.73
IPs:
IP: 10.244.1.73
Controlled By: ReplicaSet/metrics-server-7ffbc6d68
Containers:
metrics-server:
Container ID: containerd://603b5f91968f0afb410fa3a49786c563bd26f87de59c11ce1d4822e743a7fa29
Image: registry.k8s.io/metrics-server/metrics-server:v0.7.1
Image ID: registry.k8s.io/metrics-server/metrics-server@sha256:db3800085a0957083930c3932b17580eec652cfb6156a05c0f79c7543e80d17a
Port: 10250/TCP
Host Port: 0/TCP
Args:
--cert-dir=/tmp
--secure-port=10250
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--kubelet-use-node-status-port
--metric-resolution=15s
State: Running
Started: Wed, 29 May 2024 23:52:00 +0900
Ready: False
Restart Count: 0
Requests:
cpu: 100m
memory: 200Mi
Liveness: http-get https://:https/livez delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:https/readyz delay=20s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/tmp from tmp-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6szcf (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
tmp-dir:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-6szcf:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 4m41s (x4114 over 10h) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
Logs
k logs metrics-server-7ffbc6d68-6srfk -n kube-system
I0530 03:08:07.651554 1 serving.go:374] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0530 03:08:08.554701 1 handler.go:275] Adding GroupVersion metrics.k8s.io v1beta1 to ResourceManager
I0530 03:08:08.686371 1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0530 03:08:08.686459 1 shared_informer.go:311] Waiting for caches to sync for RequestHeaderAuthRequestController
I0530 03:08:08.686466 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0530 03:08:08.686518 1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0530 03:08:08.686517 1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0530 03:08:08.687032 1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0530 03:08:08.689211 1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key"
E0530 03:08:08.689974 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.1.101:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.1.101 because it doesn't contain any IP SANs" node="k8s-master01"
I0530 03:08:08.690832 1 secure_serving.go:213] Serving securely on [::]:10250
I0530 03:08:08.691791 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
E0530 03:08:08.692009 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.1.112:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.1.112 because it doesn't contain any IP SANs" node="k8s-worker02"
E0530 03:08:08.701807 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.1.111:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.1.111 because it doesn't contain any IP SANs" node="k8s-worker01"
I0530 03:08:08.786953 1 shared_informer.go:318] Caches are synced for RequestHeaderAuthRequestController
I0530 03:08:08.787115 1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0530 03:08:08.788043 1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
E0530 03:08:23.685957 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.1.112:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.1.112 because it doesn't contain any IP SANs" node="k8s-worker02"
E0530 03:08:23.692988 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.1.111:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.1.111 because it doesn't contain any IP SANs" node="k8s-worker01"
E0530 03:08:23.695203 1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.1.101:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.1.101 because it doesn't contain any IP SANs" node="k8s-master01"
I0530 03:08:36.493054 1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"
From the logs, the reason metrics-server won't start appears to be that the kubelet certificates don't contain any IP SANs (Subject Alternative Names), so TLS verification of the scrape targets fails. Next, I'll disable TLS verification.
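To confirm the diagnosis, the kubelet's serving certificate can be inspected directly from any machine that can reach port 10250. The IP below is one of this cluster's nodes; substitute your own:

```shell
# Dump the kubelet serving certificate and look at its SAN field.
# A default self-signed kubelet cert typically lists only a DNS name
# (the hostname), which is why validating the connection by IP fails.
echo | openssl s_client -connect 192.168.1.111:10250 2>/dev/null \
  | openssl x509 -noout -text \
  | grep -A1 'Subject Alternative Name'
```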
Disabling TLS verification
Since the kubelet certificates on my home Raspberry Pi cluster aren't signed by the cluster CA, I'll pass --kubelet-insecure-tls in args to disable certificate verification.
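As an aside, the cleaner long-term fix is to have each kubelet request a serving certificate signed by the cluster CA and then approve the resulting CSRs; --kubelet-insecure-tls is the pragmatic shortcut for a home lab. A sketch of that route, assuming a kubeadm-built cluster:

```shell
# On each node, enable serving-certificate bootstrap by adding
#   serverTLSBootstrap: true
# to the kubelet config (/var/lib/kubelet/config.yaml on kubeadm),
# then restart the kubelet:
sudo systemctl restart kubelet

# Each kubelet then submits a CSR that must be approved manually:
kubectl get csr
kubectl certificate approve <csr-name>
```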
k edit deploy metrics-server -n kube-system
template:
metadata:
creationTimestamp: null
labels:
k8s-app: metrics-server
spec:
containers:
- args:
- --cert-dir=/tmp
- --secure-port=10250
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --kubelet-use-node-status-port
- --metric-resolution=15s
- --kubelet-insecure-tls
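Instead of kubectl edit, the same change can be applied non-interactively, which is handy for scripts (a sketch; container index 0 assumes the deployment's default single container):

```shell
# Append --kubelet-insecure-tls to the metrics-server container args
kubectl -n kube-system patch deployment metrics-server --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
```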
Verifying again
Pods
k get po -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-7db6d8ff4d-67gfj 1/1 Running 0 12d
coredns-7db6d8ff4d-xlznj 1/1 Running 0 12d
etcd-k8s-master01 1/1 Running 0 12d
kube-apiserver-k8s-master01 1/1 Running 0 12d
kube-controller-manager-k8s-master01 1/1 Running 0 12d
kube-proxy-dww28 1/1 Running 0 12d
kube-proxy-lb9h2 1/1 Running 0 12d
kube-proxy-t9lsn 1/1 Running 0 12d
kube-scheduler-k8s-master01 1/1 Running 0 12d
metrics-server-d994c478f-8f7hh 1/1 Running 0 23m
Deployments
k get deploy -n kube-system
NAME READY UP-TO-DATE AVAILABLE AGE
coredns 2/2 2 2 12d
metrics-server 1/1 1 1 3d10h
And with that, metrics-server is up and running!
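With metrics-server serving, the original goal (HPA) can be sanity-checked by confirming that the metrics API now returns data:

```shell
# These commands error out without a working metrics-server;
# now they should print CPU/memory usage for nodes and pods.
kubectl top nodes
kubectl top pods -n kube-system
```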