【Kubernetes】A calico-node pod keeps failing and restarting
2021/9/4 22:07:21
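【Background】
Today I tested scaling out the worker nodes of a K8s cluster. The scale-out itself went smoothly, but afterwards I noticed that on the newly added node (k8s-node04) a calico-node pod kept failing and restarting.
【Symptoms】
The pod status listing below shows one pod (calico-node-xl9bc) that is never ready and has already restarted 30 times.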
[root@k8s-master01 ~]# kubectl get pods -A| grep calico
kube-system   calico-kube-controllers-78d6f96c7b-tv2g6   1/1   Running   0    75m
kube-system   calico-node-6dk7g                          1/1   Running   0    75m
kube-system   calico-node-dlf26                          1/1   Running   0    75m
kube-system   calico-node-s5phd                          1/1   Running   0    75m
kube-system   calico-node-xl9bc                          0/1   Running   30   3m28s
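On a cluster with more nodes, the unhealthy pod is easier to spot by filtering on the READY column. Here is a minimal sketch, assuming the six-column layout of `kubectl get pods -A` shown above; the function name `not_ready_calico_pods` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get pods -A` output on stdin and print
# the names of calico-node pods whose READY column is not N/N.
# Assumes the columns are NAMESPACE NAME READY STATUS RESTARTS AGE.
not_ready_calico_pods() {
  awk '$2 ~ /^calico-node-/ {
         split($3, ready, "/")   # e.g. 0/1 gives ready[1]=0 and ready[2]=1
         if (ready[1] != ready[2]) print $2
       }'
}

# Usage on a live cluster (not run here):
#   kubectl get pods -A | not_ready_calico_pods
```

Fed the listing above, the filter should print only calico-node-xl9bc.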
【Troubleshooting】
Check the pod's logs:
[root@k8s-master01 ~]# kubectl logs calico-node-xl9bc -n kube-system -f
-----part of the log omitted----------------
2021-09-04 12:32:45.011 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:32:46.025 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:32:47.038 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:32:48.050 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:32:49.061 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:32:50.072 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:32:51.079 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:32:52.093 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:32:53.104 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:32:54.114 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:32:55.127 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:32:56.138 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:32:57.148 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:32:58.162 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:32:59.176 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:33:00.186 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:33:01.199 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:33:02.211 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:33:03.225 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:33:04.238 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
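I searched online for this error for a long time and found almost nothing that addressed it directly.
Eventually, going by the error message, I found an article describing an identical error. The cause there was that /etc/hosts was missing the two loopback entries for IPv4 and IPv6. That fits the log: Felix tries to open its health endpoint on localhost, and with no hosts entry the name lookup falls through to the DNS server 114.114.114.114, which has no record for it. I checked /etc/hosts on my own node and the two lines really were missing. I have no idea what strange thing I did when I installed this virtual machine last night.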
### /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
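The check and the fix can be scripted so that they are safe to rerun. This is a minimal sketch, assuming a hosts file in the format shown above. The function names `has_localhost_entry` and `ensure_loopback_entries` are hypothetical, and the file path is passed as an argument so the script can be tried on a copy first.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Kubernetes or Calico.

# Succeeds if the file has a line whose first field is the given address
# and which lists "localhost" among its names.
has_localhost_entry() {
  local file=$1 addr=$2
  awk -v addr="$addr" '
    $1 == addr { for (i = 2; i <= NF; i++) if ($i == "localhost") found = 1 }
    END { exit !found }' "$file"
}

# Appends the IPv4 and IPv6 loopback lines only when they are missing.
ensure_loopback_entries() {
  local file=$1
  has_localhost_entry "$file" 127.0.0.1 ||
    echo '127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4' >> "$file"
  has_localhost_entry "$file" ::1 ||
    echo '::1 localhost localhost.localdomain localhost6 localhost6.localdomain6' >> "$file"
}

# Usage on the affected node, as root (not run here):
#   ensure_loopback_entries /etc/hosts
```

Because each line is appended only when its address has no localhost entry yet, running the function a second time leaves the file unchanged.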
After adding these two lines to /etc/hosts on k8s-node04 and restarting the network service, the pod went into CrashLoopBackOff.
[root@k8s-master01 ~]# kubectl get pods -A| grep calico
kube-system   calico-kube-controllers-78d6f96c7b-tv2g6   1/1   Running            0   80m
kube-system   calico-node-6dk7g                          1/1   Running            0   80m
kube-system   calico-node-dlf26                          1/1   Running            0   80m
kube-system   calico-node-s5phd                          1/1   Running            0   80m
kube-system   calico-node-xl9bc                          0/1   CrashLoopBackOff   7   8m24s
I deleted the pod, and the replacement pod that was created in its place finally reached a normal running state.
[root@k8s-master01 ~]# kubectl delete pod calico-node-xl9bc -n kube-system
pod "calico-node-xl9bc" deleted
[root@k8s-master01 ~]# kubectl get pods -A| grep calico
kube-system   calico-kube-controllers-78d6f96c7b-tv2g6   1/1   Running   0   81m
kube-system   calico-node-6dk7g                          1/1   Running   0   81m
kube-system   calico-node-dlf26                          1/1   Running   0   81m
kube-system   calico-node-mz58r                          0/1   Running   0   5s
kube-system   calico-node-s5phd                          1/1   Running   0   81m
[root@k8s-master01 ~]# kubectl get pods -A| grep calico
kube-system   calico-kube-controllers-78d6f96c7b-tv2g6   1/1   Running   0   81m
kube-system   calico-node-6dk7g                          1/1   Running   0   81m
kube-system   calico-node-dlf26                          1/1   Running   0   81m
kube-system   calico-node-mz58r                          0/1   Running   0   7s
kube-system   calico-node-s5phd                          1/1   Running   0   81m
[root@k8s-master01 ~]# kubectl get pods -A| grep calico
kube-system   calico-kube-controllers-78d6f96c7b-tv2g6   1/1   Running   0   81m
kube-system   calico-node-6dk7g                          1/1   Running   0   81m
kube-system   calico-node-dlf26                          1/1   Running   0   81m
kube-system   calico-node-mz58r                          0/1   Running   0   8s
kube-system   calico-node-s5phd                          1/1   Running   0   81m
[root@k8s-master01 ~]# kubectl get pods -A| grep calico
kube-system   calico-kube-controllers-78d6f96c7b-tv2g6   1/1   Running   0   81m
kube-system   calico-node-6dk7g                          1/1   Running   0   81m
kube-system   calico-node-dlf26                          1/1   Running   0   81m
kube-system   calico-node-mz58r                          1/1   Running   0   11s
kube-system   calico-node-s5phd                          1/1   Running   0   81m
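Instead of rerunning `kubectl get pods` by hand, the wait can be a small polling loop. This is only a sketch: `wait_until`, `all_ready` and `calico_node_ready` are hypothetical names, and the label selector `k8s-app=calico-node` is an assumption based on the stock Calico manifests that should be verified on the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command every <interval> seconds until it
# succeeds or roughly <timeout> seconds have passed.
# Returns 0 on success, 1 on timeout.
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  local waited=0
  until "$@"; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Reads `kubectl get pods --no-headers` lines (NAME READY STATUS ...) on
# stdin; succeeds only if there is at least one pod and every READY is N/N.
all_ready() {
  awk '{ n++; split($2, r, "/"); if (r[1] != r[2]) bad = 1 }
       END { if (n > 0 && !bad) exit 0; exit 1 }'
}

# Assumed selector; adjust it to match the labels on your calico-node pods.
calico_node_ready() {
  kubectl get pods -n kube-system -l k8s-app=calico-node --no-headers | all_ready
}

# Usage on a live cluster (not run here): poll every 2 s for up to 120 s.
#   wait_until 120 2 calico_node_ready && echo "calico-node is ready"
```

kubectl also has a built-in `kubectl wait --for=condition=Ready` that covers the same need; the loop above simply mirrors the manual checks in the transcript.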
References:
http://www.manongjc.com/detail/15-tufllpwnqavcxef.html
https://www.lmonkey.com/t/DExg407BK