Tocyuki's Blog

I love guitar, jiu-jitsu, and programming!

Troubleshooting a metrics-server that wouldn't start after installing it for HPA

I wanted to use HPA (Horizontal Pod Autoscaler) on my home Raspberry Pi Kubernetes cluster, so I installed metrics-server — but it wouldn't start. Here's how I troubleshot it.

Environment

  • Kubernetes v1.30.0
  • metrics-server v0.7.1

Installation

Run the install command as described in the documentation:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml


Checking that it started

Deployment

READY stays stuck at 0/1...

k get deployment metrics-server -n kube-system

NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   0/1     1            0           10h

Pod

Only metrics-server never becomes READY:

k get po -n kube-system

NAME                                   READY   STATUS    RESTARTS   AGE
coredns-7db6d8ff4d-67gfj               1/1     Running   0          8d
coredns-7db6d8ff4d-xlznj               1/1     Running   0          8d
etcd-k8s-master01                      1/1     Running   0          8d
kube-apiserver-k8s-master01            1/1     Running   0          8d
kube-controller-manager-k8s-master01   1/1     Running   0          8d
kube-proxy-dww28                       1/1     Running   0          8d
kube-proxy-lb9h2                       1/1     Running   0          8d
kube-proxy-t9lsn                       1/1     Running   0          8d
kube-scheduler-k8s-master01            1/1     Running   0          8d
metrics-server-7ffbc6d68-pqtdt         0/1     Running   0          10h

Describe Pods

kubelet Readiness probe failed: HTTP probe failed with statuscode: 500

It looks like the Readiness probe is failing.

k describe po metrics-server-7ffbc6d68-pqtdt -n kube-system

Name:                 metrics-server-7ffbc6d68-pqtdt
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      metrics-server
Node:                 k8s-worker01/192.168.1.111
Start Time:           Wed, 29 May 2024 23:51:51 +0900
Labels:               k8s-app=metrics-server
                      pod-template-hash=7ffbc6d68
Annotations:          <none>
Status:               Running
IP:                   10.244.1.73
IPs:
  IP:           10.244.1.73
Controlled By:  ReplicaSet/metrics-server-7ffbc6d68
Containers:
  metrics-server:
    Container ID:  containerd://603b5f91968f0afb410fa3a49786c563bd26f87de59c11ce1d4822e743a7fa29
    Image:         registry.k8s.io/metrics-server/metrics-server:v0.7.1
    Image ID:      registry.k8s.io/metrics-server/metrics-server@sha256:db3800085a0957083930c3932b17580eec652cfb6156a05c0f79c7543e80d17a
    Port:          10250/TCP
    Host Port:     0/TCP
    Args:
      --cert-dir=/tmp
      --secure-port=10250
      --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
      --kubelet-use-node-status-port
      --metric-resolution=15s
    State:          Running
      Started:      Wed, 29 May 2024 23:52:00 +0900
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        100m
      memory:     200Mi
    Liveness:     http-get https://:https/livez delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:    http-get https://:https/readyz delay=20s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /tmp from tmp-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6szcf (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  tmp-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  kube-api-access-6szcf:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                     From     Message
  ----     ------     ----                    ----     -------
  Warning  Unhealthy  4m41s (x4114 over 10h)  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 500

Logs

k logs metrics-server-7ffbc6d68-6srfk -n kube-system

I0530 03:08:07.651554       1 serving.go:374] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0530 03:08:08.554701       1 handler.go:275] Adding GroupVersion metrics.k8s.io v1beta1 to ResourceManager
I0530 03:08:08.686371       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0530 03:08:08.686459       1 shared_informer.go:311] Waiting for caches to sync for RequestHeaderAuthRequestController
I0530 03:08:08.686466       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0530 03:08:08.686518       1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0530 03:08:08.686517       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0530 03:08:08.687032       1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0530 03:08:08.689211       1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key"
E0530 03:08:08.689974       1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.1.101:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.1.101 because it doesn't contain any IP SANs" node="k8s-master01"
I0530 03:08:08.690832       1 secure_serving.go:213] Serving securely on [::]:10250
I0530 03:08:08.691791       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
E0530 03:08:08.692009       1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.1.112:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.1.112 because it doesn't contain any IP SANs" node="k8s-worker02"
E0530 03:08:08.701807       1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.1.111:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.1.111 because it doesn't contain any IP SANs" node="k8s-worker01"
I0530 03:08:08.786953       1 shared_informer.go:318] Caches are synced for RequestHeaderAuthRequestController
I0530 03:08:08.787115       1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0530 03:08:08.788043       1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
E0530 03:08:23.685957       1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.1.112:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.1.112 because it doesn't contain any IP SANs" node="k8s-worker02"
E0530 03:08:23.692988       1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.1.111:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.1.111 because it doesn't contain any IP SANs" node="k8s-worker01"
E0530 03:08:23.695203       1 scraper.go:149] "Failed to scrape node" err="Get \"https://192.168.1.101:10250/metrics/resource\": tls: failed to verify certificate: x509: cannot validate certificate for 192.168.1.101 because it doesn't contain any IP SANs" node="k8s-master01"
I0530 03:08:36.493054       1 server.go:191] "Failed probe" probe="metric-storage-ready" err="no metrics to serve"

The logs show why metrics-server wasn't starting: the kubelet certificates don't contain an IP SAN (Subject Alternative Name), so TLS verification of the scrape targets fails. The next step is to disable that TLS verification.
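The x509 error can be reproduced locally: a certificate whose SAN contains only a DNS name cannot be validated against an IP address, which is exactly what happens when metrics-server scrapes the kubelet by node IP. A minimal sketch with openssl (the hostname, IP, and file paths are just examples):

```shell
# Generate a self-signed cert whose SAN has only a DNS entry, no IP entry,
# mimicking a kubelet serving cert that was issued without IP SANs
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/kubelet-demo-key.pem -out /tmp/kubelet-demo.pem \
  -subj "/CN=k8s-worker01" -addext "subjectAltName=DNS:k8s-worker01"

# Show the SAN section: a DNS name is present, but no "IP Address:" entries
openssl x509 -in /tmp/kubelet-demo.pem -noout -ext subjectAltName

# Verifying the cert against an IP address fails, matching the scraper error
openssl verify -CAfile /tmp/kubelet-demo.pem -verify_ip 192.168.1.111 \
  /tmp/kubelet-demo.pem || echo "verification failed as expected: no IP SAN"
```

To check a real node, the same inspection can be pointed at the kubelet's serving port, e.g. echo | openssl s_client -connect 192.168.1.111:10250 2>/dev/null | openssl x509 -noout -ext subjectAltName.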

Disabling TLS verification

Since the kubelet certificates in my home Raspberry Pi cluster aren't signed by the cluster CA, I'll pass the --kubelet-insecure-tls argument to disable certificate verification (acceptable for a homelab, but not recommended for production).

k edit deploy metrics-server -n kube-system
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=10250
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls # ← add this line
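The same change can also be applied non-interactively instead of via kubectl edit, which is handy for scripting. A sketch using a JSON patch (it assumes metrics-server is the first container in the pod spec, as in the manifest above; requires a running cluster):

```shell
# Append --kubelet-insecure-tls to the args of the first container
# in the metrics-server Deployment; the Deployment rolls out a new pod
kubectl -n kube-system patch deployment metrics-server --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
```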

Verifying again

Pods

k get po -n kube-system

NAME                                   READY   STATUS    RESTARTS   AGE
coredns-7db6d8ff4d-67gfj               1/1     Running   0          12d
coredns-7db6d8ff4d-xlznj               1/1     Running   0          12d
etcd-k8s-master01                      1/1     Running   0          12d
kube-apiserver-k8s-master01            1/1     Running   0          12d
kube-controller-manager-k8s-master01   1/1     Running   0          12d
kube-proxy-dww28                       1/1     Running   0          12d
kube-proxy-lb9h2                       1/1     Running   0          12d
kube-proxy-t9lsn                       1/1     Running   0          12d
kube-scheduler-k8s-master01            1/1     Running   0          12d
metrics-server-d994c478f-8f7hh         1/1     Running   0          23m

Deployments

k get deploy -n kube-system

NAME             READY   UP-TO-DATE   AVAILABLE   AGE
coredns          2/2     2            2           12d
metrics-server   1/1     1            1           3d10h

And with that, metrics-server is up and running!
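Once metrics-server is Ready, the Metrics API should start serving node and pod metrics, which is what HPA consumes. A quick smoke test (requires a running cluster; node and pod names will differ):

```shell
# Resource usage per node and per pod, served by metrics-server
kubectl top nodes
kubectl top pods -n kube-system

# The same data via the raw Metrics API endpoint
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
```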