Kubernetes Certificate Renewal

Disclaimer: the environment used below is k8s v1.21.0; other versions may behave differently.

One day k8s suddenly would not start. Checking the kubelet logs with journalctl -xefu kubelet turned up this message:

part of the existing bootstrap client certificate in /etc/kubernetes/kubelet.conf is expired: 2023-05-19 01:57:11 +0000 UTC

This is clearly a certificate expiry problem. To confirm it, check the certificate expiration dates with kubeadm certs check-expiration:

master> kubeadm certs check-expiration 
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[check-expiration] Error reading configuration from the Cluster. Falling back to default configuration

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 May 22, 2023 05:24 UTC   <invalid>                               no      
apiserver                  May 22, 2023 05:24 UTC   <invalid>       ca                      no      
apiserver-etcd-client      May 22, 2023 05:24 UTC   <invalid>       etcd-ca                 no      
apiserver-kubelet-client   May 22, 2023 05:24 UTC   <invalid>       ca                      no      
controller-manager.conf    May 22, 2023 05:25 UTC   <invalid>                               no      
etcd-healthcheck-client    May 22, 2023 05:24 UTC   <invalid>       etcd-ca                 no      
etcd-peer                  May 22, 2023 05:24 UTC   <invalid>       etcd-ca                 no      
etcd-server                May 22, 2023 05:24 UTC   <invalid>       etcd-ca                 no      
front-proxy-client         May 22, 2023 05:24 UTC   <invalid>       front-proxy-ca          no      
scheduler.conf             May 22, 2023 05:25 UTC   <invalid>                               no      

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Nov 20, 2030 05:24 UTC   7y              no      
etcd-ca                 Nov 20, 2030 05:24 UTC   7y              no      
front-proxy-ca          Nov 20, 2030 05:24 UTC   7y              no
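To pick out only the expired entries from that table, a small awk filter is enough. This is a minimal sketch: expired_certs is a hypothetical helper name, and it assumes the output keeps the layout shown above, with the certificate name in the first column and <invalid> in the RESIDUAL TIME column for expired entries.

```shell
# expired_certs (hypothetical helper): read `kubeadm certs check-expiration`
# output on stdin and print the name of every certificate whose
# RESIDUAL TIME column shows "<invalid>".
expired_certs() {
  # Column 1 of each certificate row is the certificate name.
  awk '/<invalid>/ { print $1 }'
}

# Usage on the master:
# kubeadm certs check-expiration | expired_certs
```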

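That confirms the certificates have expired. By default the certificates kubeadm issues are valid for only one year (the CAs above are valid for ten), and renewing them as follows fixes the problem.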
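As an extra precaution that is not one of the steps below, taking a copy of /etc/kubernetes first makes it possible to roll back. A minimal sketch: backup_k8s_dir is a hypothetical helper, and the .bak-<timestamp> naming is an arbitrary choice.

```shell
# backup_k8s_dir (hypothetical helper): copy a directory to a timestamped
# sibling, e.g. /etc/kubernetes -> /etc/kubernetes.bak-20230522-083000,
# and print the path of the copy.
backup_k8s_dir() {
  local src="${1:-/etc/kubernetes}"
  local dest
  dest="${src%/}.bak-$(date +%Y%m%d-%H%M%S)"
  # -a keeps permissions and ownership, so private keys stay readable by root only.
  cp -a "$src" "$dest" || return 1
  echo "$dest"
}

# Usage on the master, before running kubeadm certs renew all:
# backup_k8s_dir /etc/kubernetes
```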

First, run kubeadm certs renew all on the master to issue new certificates:

master> kubeadm certs renew all
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.
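These components run as static Pods managed by the kubelet, so kubectl cannot restart them. The kubeadm certificate-management documentation suggests temporarily moving a manifest out of /etc/kubernetes/manifests, waiting about 20 seconds (the kubelet's default fileCheckFrequency) for the Pod to stop, then moving it back. A minimal sketch, assuming the default manifest directory and a running kubelet; restart_static_pods is a hypothetical helper, and it restarts all control-plane Pods at once, which means a brief API outage on a single-master cluster like this one.

```shell
# restart_static_pods (hypothetical helper): park every manifest outside the
# kubelet's manifest directory, wait, then put them back so the kubelet
# recreates the Pods with the renewed certificates.
restart_static_pods() {
  local manifests="${1:-/etc/kubernetes/manifests}"
  local wait_s="${2:-20}"   # default kubelet fileCheckFrequency is 20s
  local parked
  parked=$(mktemp -d) || return 1
  mv "$manifests"/*.yaml "$parked"/ || return 1
  sleep "$wait_s"           # kubelet notices the manifests are gone and stops the Pods
  mv "$parked"/*.yaml "$manifests"/ || return 1
  rmdir "$parked"
  sleep "$wait_s"           # kubelet notices the manifests are back and starts the Pods
}

# Usage on the master (as root):
# restart_static_pods /etc/kubernetes/manifests
```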

Then run kubeadm certs check-expiration again to confirm that the expiration dates have been updated.

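Next I restarted the kubelet service, but it still failed to start, this time reporting: failed to run Kubelet: unable to load bootstrap kubeconfig. The reason is that kubeadm certs renew all does not renew /etc/kubernetes/kubelet.conf (it is absent from the check-expiration list above), so the kubelet's client certificate was still expired.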

I found the following fix online, and it worked:

$ cd /etc/kubernetes/pki/
$ mv {apiserver.crt,apiserver-etcd-client.key,apiserver-kubelet-client.crt,front-proxy-ca.crt,front-proxy-client.crt,front-proxy-client.key,front-proxy-ca.key,apiserver-kubelet-client.key,apiserver.key,apiserver-etcd-client.crt} ~/
$ kubeadm init phase certs all
$ cd /etc/kubernetes/
$ mv {admin.conf,controller-manager.conf,kubelet.conf,scheduler.conf} ~/
$ kubeadm init phase kubeconfig all
$ cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

After restarting the kubelet service again, it came up successfully. Now check the cluster status:

master> kubectl get nodes
NAME            STATUS      ROLES    AGE      VERSION
s1001.lab.org   Ready       master   2y182d   v1.21.0
s1002.lab.org   NotReady    node     2y182d   v1.21.0
s1003.lab.org   NotReady    node     2y182d   v1.21.0

At this point the master node is running normally, but the worker nodes are still NotReady.
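On a larger cluster, the nodes that still need attention can be listed from that output with another awk filter. Again a sketch: not_ready_nodes is a hypothetical helper, and it assumes the default kubectl get nodes layout with a header row and STATUS in the second column. Cordoned nodes (Ready,SchedulingDisabled) are treated as ready.

```shell
# not_ready_nodes (hypothetical helper): read `kubectl get nodes` output on
# stdin and print the name of every node whose STATUS is not Ready.
not_ready_nodes() {
  # NR > 1 skips the header row; "Ready,SchedulingDisabled" still counts as Ready.
  awk 'NR > 1 && $2 != "Ready" && $2 !~ /^Ready,/ { print $1 }'
}

# Usage on the master:
# kubectl get nodes | not_ready_nodes
```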

My approach was to have each worker node rejoin the cluster, which gives it a fresh certificate automatically. First, create a token on the master:

master> kubeadm token create --print-join-command
kubeadm join 172.16.10.11:6443 --token xu2m0g.7t5rlkq2yjwpkrtv --discovery-token-ca-cert-hash sha256:0bc85cc15d5b5b76e4209d8c3781330aa38309f29c271f3c45617a053d4bb23a

Then, on the worker node, delete the old certificate files (back them up first in case anything goes wrong) and run the join command printed above:

rm -rf /etc/kubernetes/kubelet.conf
rm -rf /etc/kubernetes/pki/ca.crt
kubeadm join 172.16.10.11:6443 --token xu2m0g.7t5rlkq2yjwpkrtv --discovery-token-ca-cert-hash sha256:0bc85cc15d5b5b76e4209d8c3781330aa38309f29c271f3c45617a053d4bb23a
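Since the files should be backed up before deletion anyway, moving them aside does both in one step. A minimal sketch: prepare_rejoin is a hypothetical helper, it assumes the default /etc/kubernetes layout, and it moves the same two files removed above, which kubeadm join's preflight checks would otherwise reject as already existing.

```shell
# prepare_rejoin (hypothetical helper): move kubelet.conf and pki/ca.crt into a
# timestamped backup directory so `kubeadm join` can be run again on this node.
prepare_rejoin() {
  local root="${1:-/etc/kubernetes}"
  local backup
  backup="${root%/}.rejoin-bak-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$backup/pki" || return 1
  # Move instead of delete, so the old files can be restored if the join fails.
  mv "$root/kubelet.conf" "$backup/" || return 1
  mv "$root/pki/ca.crt" "$backup/pki/" || return 1
  # Print the backup location for the caller.
  echo "$backup"
}

# Usage on the worker node, then run the join command printed on the master:
# prepare_rejoin /etc/kubernetes
```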

Once the node has rejoined successfully, repeat the same steps to rejoin the remaining nodes.

Checking the cluster status again, all nodes are now up:

master> kubectl get nodes
NAME            STATUS   ROLES    AGE      VERSION
s1001.lab.org   Ready    master   2y182d   v1.21.0
s1002.lab.org   Ready    node     2y182d   v1.21.0
s1003.lab.org   Ready    node     2y182d   v1.21.0

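PS: The k8s v1.21 I'm running here is two years old and counts as an old release by now; the latest k8s release is currently v1.27. This post just records how I solved the problem; I'll upgrade k8s when I have more time.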
