Fully Recovering Accidentally Deleted Kubernetes Data from an Etcd Backup
2021/7/22 6:06:24
This article walks through recovering accidentally deleted Kubernetes data by restoring an Etcd backup.
Accidental deletion or a server crash can lose Etcd data, or leave the Etcd data on one node in an abnormal state; when data is deleted by mistake it has to be restored, and in real environments this situation is unavoidable. The following walks through deleting the Pods in two namespaces and then restoring the data for those namespaces.
1. Environment
- 3 nodes each running both master and Etcd, plus 1 worker node
- Pods created in a newly created namespace and in the default namespace
2. Prerequisites
An Etcd snapshot was taken before the accidental deletion.
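The article assumes this backup already exists. A minimal sketch of how such a snapshot could be taken with `etcdctl snapshot save` is shown below; the endpoint and certificate paths are assumptions modeled on the cluster used later in this article, not commands from the original environment.

```shell
# Hedged sketch: take an etcd v3 snapshot before any incident.
# Endpoint and certificate paths are assumptions; adjust to your cluster.
export ETCDCTL_API=3
etcdctl --endpoints=https://192.168.0.25:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/node-node1.pem \
  --key=/etc/ssl/etcd/ssl/node-node1-key.pem \
  snapshot save snapshot.db
```

Running this on a schedule (e.g. from cron) is what produces the timestamped backup directory seen later in this walkthrough.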
3. Setting up the data to recover
3.1. Create a test namespace and some Pods in it. Once created, this data is stored in Etcd:
```
kubectl get pod -n test
NAME                              READY   STATUS    RESTARTS   AGE
details-v1-7d78fc5688-ntvtd       1/1     Running   0          12h
productpage-v1-844495cb4b-qn6l9   1/1     Running   0          12h
ratings-v1-55ccf46fb4-pwqw7       1/1     Running   0          12h
reviews-v1-68bb7b8c4f-h6vr9       1/1     Running   0          12h
```
3.2. Create Pods in the default namespace as shown below; once created, this data is also stored in Etcd:
```
kubectl get pod
NAME                       READY   STATUS    RESTARTS   AGE
nginx-6db489d4b7-jdfkw     1/1     Running   0          12h
nginx1-675bf6c9f-jndrx     1/1     Running   0          12h
nginx2-6dfc958b55-vgsc4    1/1     Running   0          12h
nginx3-8f65b44f9-2pk4l     1/1     Running   0          12h
nginx4-9bb966479-svhs6     1/1     Running   0          12h
nginx5-7d4d998c6c-7xlp8    1/1     Running   0          12h
nginx6-79444687cf-rc8vk    1/1     Running   0          12h
```
3.3. An Etcd snapshot containing all of the data above is already stored in a directory:
```
ls
member  snapshot.db
```
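Before relying on a snapshot for recovery, it is worth confirming that the file is readable. A hedged sketch using `etcdctl snapshot status` (not part of the original walkthrough; run it from the directory containing the snapshot):

```shell
# Hedged sketch: verify the snapshot file is intact before restoring.
# Prints the snapshot's hash, revision, total key count, and size;
# a corrupt or unreadable file makes this command fail.
export ETCDCTL_API=3
etcdctl snapshot status snapshot.db --write-out=table
```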
3.4. Simulate the accidental deletion: all Pods in both the default and test namespaces are deleted:
```
kubectl get pod
No resources found in default namespace.
[root@node1 etcd-2021-01-19_02:00:01]# kubectl get pod -n test
No resources found in test namespace.
```
4. Recovering the data
First stop the kube-apiserver service on every master, because kube-apiserver is the component that talks to Etcd. This step does not affect Pods that are already running.
4.1. On every master, run the following to move kube-apiserver.yaml into the current directory, which stops the kube-apiserver service:
```
mv /etc/kubernetes/manifests/kube-apiserver.yaml .
```
4.2. Verify that the kube-apiserver service has stopped:
```
kubectl get pod -n kube-system | grep kube-apiserver
Unable to connect to the server: EOF
```
4.3. On every Etcd machine, stop Etcd. This does not affect existing Pods; it only means Pods cannot be created or deleted in the meantime:
```
systemctl stop etcd
```
4.4. Check that the Etcd service has stopped:
```
docker ps -a | grep etcd
18dfbbfb8317   kubesphere/etcd:v3.3.12   "/usr/local/bin/etcd"   15 hours ago   Exited (0) 45 seconds ago   etcd3
```
4.5. Move the existing data out of the Etcd data directory on every Etcd machine. The directory may differ between environments; check the Etcd configuration parameters with `systemctl status etcd`:
```
mv /var/lib/etcd /var/lib/etcd.bak
```
4.6. Copy the snapshot snapshot.db from node1 to the other two Etcd machines:
```
scp snapshot.db 192.168.0.26:/root
scp snapshot.db 192.168.0.28:/root
```
5. Restoring the backup
Directories may differ between environments; check the Etcd configuration parameters with `systemctl status etcd`. Pay particular attention to the values of the name, initial-cluster, initial-cluster-token, initial-advertise-peer-urls, and data-dir parameters.
5.1. On the first Etcd node. Note that ETCDCTL_API=3 is required, and that the name, IP, snapshot.db path, and data-dir must match this node:
```
export ETCDCTL_API=3
etcdctl snapshot restore snapshot.db --name etcd1 \
  --initial-cluster "etcd1=https://192.168.0.25:2380,etcd2=https://192.168.0.26:2380,etcd3=https://192.168.0.28:2380" \
  --initial-cluster-token k8s_etcd \
  --initial-advertise-peer-urls https://192.168.0.25:2380 \
  --data-dir=/var/lib/etcd
2021-01-19 11:17:06.773113 I | mvcc: restore compact to 96139
2021-01-19 11:17:06.800086 I | etcdserver/membership: added member 7370b1d3dc967c [https://192.168.0.25:2380] to cluster e4d7f96e88cc9d71
2021-01-19 11:17:06.800159 I | etcdserver/membership: added member 2ef3cfc4ca48ad38 [https://192.168.0.26:2380] to cluster e4d7f96e88cc9d71
2021-01-19 11:17:06.800190 I | etcdserver/membership: added member 3a0c86c4c744477c [https://192.168.0.28:2380] to cluster e4d7f96e88cc9d71
```
5.2. Restore the data on the second and third Etcd nodes the same way, again adjusting ETCDCTL_API=3, the name, the IP, the snapshot.db path, and the data-dir for each node.
```
export ETCDCTL_API=3
etcdctl snapshot restore snapshot.db --name etcd2 \
  --initial-cluster "etcd1=https://192.168.0.25:2380,etcd2=https://192.168.0.26:2380,etcd3=https://192.168.0.28:2380" \
  --initial-cluster-token k8s_etcd \
  --initial-advertise-peer-urls https://192.168.0.26:2380 \
  --data-dir=/var/lib/etcd
2021-01-19 11:19:59.857363 I | mvcc: restore compact to 96139
2021-01-19 11:19:59.873793 I | etcdserver/membership: added member 7370b1d3dc967c [https://192.168.0.25:2380] to cluster e4d7f96e88cc9d71
2021-01-19 11:19:59.873837 I | etcdserver/membership: added member 2ef3cfc4ca48ad38 [https://192.168.0.26:2380] to cluster e4d7f96e88cc9d71
2021-01-19 11:19:59.873852 I | etcdserver/membership: added member 3a0c86c4c744477c [https://192.168.0.28:2380] to cluster e4d7f96e88cc9d71
```

```
export ETCDCTL_API=3
etcdctl snapshot restore snapshot.db --name etcd3 \
  --initial-cluster "etcd1=https://192.168.0.25:2380,etcd2=https://192.168.0.26:2380,etcd3=https://192.168.0.28:2380" \
  --initial-cluster-token k8s_etcd \
  --initial-advertise-peer-urls https://192.168.0.28:2380 \
  --data-dir=/var/lib/etcd
2021-01-19 11:22:21.423215 I | mvcc: restore compact to 96139
2021-01-19 11:22:21.438319 I | etcdserver/membership: added member 7370b1d3dc967c [https://192.168.0.25:2380] to cluster e4d7f96e88cc9d71
2021-01-19 11:22:21.438357 I | etcdserver/membership: added member 2ef3cfc4ca48ad38 [https://192.168.0.26:2380] to cluster e4d7f96e88cc9d71
2021-01-19 11:22:21.438371 I | etcdserver/membership: added member 3a0c86c4c744477c [https://192.168.0.28:2380] to cluster e4d7f96e88cc9d71
```
5.3. Start Etcd on all three machines:
```
systemctl start etcd
```
5.4. Check the Etcd cluster status. Note: if TLS is enabled, the certificates must be passed; use the certificate files of whichever Etcd machine you run this on, and append `endpoint health` at the end:
```
etcdctl --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/node-node3.pem \
  --key=/etc/ssl/etcd/ssl/node-node3-key.pem \
  --endpoints=https://192.168.0.25:2379,https://192.168.0.26:2379,https://192.168.0.28:2379 \
  endpoint health
https://192.168.0.28:2379 is healthy: successfully committed proposal: took = 11.664519ms
https://192.168.0.26:2379 is healthy: successfully committed proposal: took = 5.04665ms
https://192.168.0.25:2379 is healthy: successfully committed proposal: took = 1.837265ms
```
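Besides `endpoint health`, the restored membership can be cross-checked with `etcdctl member list`. A hedged sketch, assuming the same certificate paths as the health check above; expect three members matching the IDs printed during the restore:

```shell
# Hedged sketch: confirm all three restored members rejoined the cluster.
# Certificate paths are the same assumptions as in the health check above.
export ETCDCTL_API=3
etcdctl --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/node-node3.pem \
  --key=/etc/ssl/etcd/ssl/node-node3-key.pem \
  --endpoints=https://192.168.0.25:2379 \
  member list
```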
5.5. With all three Etcd members healthy, go to each master and start kube-apiserver by moving its manifest back:
```
mv /root/kube-apiserver.yaml /etc/kubernetes/manifests/
```
5.6. Check that the Kubernetes cluster and the previously created Pods are back to normal, including the Etcd service, the kube-apiserver service, and the deleted Pods in each namespace:
```
kubectl get pod -n kube-system | grep kube-apiserver
kube-apiserver-node1   1/1   Running   0   16h
kube-apiserver-node2   1/1   Running   0   16h
kube-apiserver-node3   1/1   Running   0   16h
[root@node1 ssl]# docker ps -a | grep etcd
156b19a72bf0   kubesphere/etcd:v3.3.12   "/usr/local/bin/etcd"   15 minutes ago   Up 15 minutes   etcd1
[root@node1 ssl]# kubectl get pod
NAME                       READY   STATUS              RESTARTS   AGE
nginx-6db489d4b7-jdfkw     1/1     Running             0          13h
nginx1-675bf6c9f-jndrx     1/1     Running             0          13h
nginx2-6dfc958b55-vgsc4    1/1     Running             0          13h
nginx3-8f65b44f9-2pk4l     1/1     Running             0          13h
nginx4-9bb966479-svhs6     1/1     Running             0          13h
nginx5-7d4d998c6c-7xlp8    0/1     ContainerCreating   0          13h
nginx6-79444687cf-rc8vk    1/1     Running             0          13h
[root@node1 ssl]# kubectl get pod -n test
NAME                              READY   STATUS    RESTARTS   AGE
details-v1-7d78fc5688-ntvtd       1/1     Running   0          13h
productpage-v1-844495cb4b-qn6l9   1/1     Running   0          13h
ratings-v1-55ccf46fb4-pwqw7       1/1     Running   0          13h
reviews-v1-68bb7b8c4f-h6vr9       1/1     Running   0          13h
```