This document is based on:
https://github.com/gjmzj/kubeasz
Further reading:
Deploying a cluster with kubeadm
https://blog.frognew.com/2018/08/kubeadm-install-kubernetes-1.11.html
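Hardware and software requirements
1) CPU and memory: master: at least 1 core / 2 GB, 2 cores / 4 GB recommended; node: at least 1 core / 2 GB
2) Linux: kernel 3.10 or later; CentOS 7 / RHEL 7 recommended
3) Docker: version 1.9 or later
4) etcd: version 2.0 or later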
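The kernel requirement above can be checked before installing anything. The following is a minimal sketch, not part of kubeasz: the function names `version_ge` and `kernel_ok` are made up for illustration, and the comparison relies on GNU `sort -V` being available.

```shell
# version_ge A B: succeed when version A >= version B (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# kernel_ok: compare the numeric prefix of the running kernel with the 3.10 minimum.
kernel_ok() {
    local running
    running="$(uname -r | cut -d- -f1)"   # e.g. 3.10.0-1062.el7.x86_64 -> 3.10.0
    version_ge "$running" "3.10"
}
```

On each machine, `kernel_ok && echo "kernel ok" || echo "kernel too old"` gives a quick answer.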
Official Kubernetes releases on GitHub
https://github.com/kubernetes/kubernetes/releases
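Node plan for a high-availability cluster
deploy node ----- x1 : runs the ansible/easzctl scripts; can share a machine with a master
etcd nodes ------ x3 : an etcd cluster needs an odd number of members (1, 3, 5, 7, ...)
master nodes ---- x2 : a high-availability cluster needs at least 2 master nodes
worker nodes ---- x2 : run the actual application workloads; increase machine specs and node count as needed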
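The odd-member rule for etcd follows from majority quorum: a cluster of N members needs N/2 + 1 (integer division) of them alive, so going from 3 to 4 members raises the quorum without tolerating any more failures. The arithmetic can be sketched as follows; the function names are illustrative only.

```shell
# etcd_quorum N: members needed for a majority in an N-member cluster.
etcd_quorum() { echo $(( $1 / 2 + 1 )); }

# etcd_fault_tolerance N: members that can fail while a majority remains.
etcd_fault_tolerance() { echo $(( ($1 - 1) / 2 )); }

for n in 1 2 3 4 5; do
    printf 'members=%d quorum=%d tolerated_failures=%d\n' \
        "$n" "$(etcd_quorum "$n")" "$(etcd_fault_tolerance "$n")"
done
```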
Machine plan
IP | Hostname | Roles |
---|---|---|
10.18.30.28 | k8s-master-1 | deploy, master1, etcd |
10.18.30.29 | k8s-node-1 | etcd, node1 |
10.18.30.30 | k8s-node-2 | etcd, node2 |
10.18.30.32 | k8s-master-2 | master2 |
Preparation
Run on all four machines
yum install -y epel-release python && yum -y update && init 6
Install and prepare ansible on the deploy node
yum install git python-pip -y
pip install pip --upgrade -i https://mirrors.aliyun.com/pypi/simple/
pip install ansible==2.6.18 netaddr==0.7.19 -i https://mirrors.aliyun.com/pypi/simple/
Set up passwordless SSH login from the deploy node
ssh-keygen # generate the key pair
for ip in 28 29 30 32; do ssh-copy-id 10.18.30.$ip; done
Verify that the keys were installed
[root@k8s-master-1 ~]# for ip in 28 29 30 32; do ssh 10.18.30.$ip; done
Last login: Wed Mar 18 17:37:24 2020 from 10.18.30.131
[root@k8s-master-1 ~]# exit
登出
Connection to 10.18.30.28 closed.
Last login: Wed Mar 18 17:39:27 2020 from 10.18.30.131
[root@k8s-node-1 ~]# exit
登出
Connection to 10.18.30.29 closed.
Last login: Wed Mar 18 17:40:19 2020 from 10.18.30.131
[root@k8s-node-2 ~]# exit
登出
Connection to 10.18.30.30 closed.
Last login: Wed Mar 18 17:37:25 2020 from 10.18.30.131
[root@k8s-master-2 ~]# exit
登出
Connection to 10.18.30.32 closed.
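The interactive loop above needs a manual `exit` on every host. A non-interactive variant is sketched below, assuming the same four addresses; the function names are illustrative, while `BatchMode` and `ConnectTimeout` are standard OpenSSH client options that make the login fail instead of prompting for a password.

```shell
# expand_hosts PREFIX SUFFIX...: print one IP address per line.
expand_hosts() {
    local prefix="$1"; shift
    local s
    for s in "$@"; do printf '%s.%s\n' "$prefix" "$s"; done
}

# check_ssh HOST: succeed only if key-based login works without any prompt.
check_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# check_all: read hosts on stdin, report each one, return non-zero if any failed.
check_all() {
    local host rc=0
    while read -r host; do
        # </dev/null keeps ssh from swallowing the rest of the host list
        if check_ssh "$host" </dev/null; then
            echo "$host ok"
        else
            echo "$host FAILED"; rc=1
        fi
    done
    return "$rc"
}
```

Usage: `expand_hosts 10.18.30 28 29 30 32 | check_all`.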
Orchestrate k8s from the deploy node
Preparation
Download the required packages
export release=2.2.0
curl -C- -fLO --retry 3 https://github.com/easzlab/kubeasz/releases/download/${release}/easzup
chmod +x ./easzup
./easzup -D
Configure cluster parameters
cd /etc/ansible && cp example/hosts.multi-node hosts # adjust the IP addresses to your environment
vim /etc/ansible/hosts
[etcd]
10.18.30.28 NODE_NAME=etcd1
10.18.30.29 NODE_NAME=etcd2
10.18.30.30 NODE_NAME=etcd3
[kube-master]
10.18.30.28
10.18.30.32
[kube-node]
10.18.30.29
10.18.30.30
After editing hosts, test connectivity
ansible all -m ping
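Beyond the ping test, the inventory itself can be sanity-checked, for example that the [etcd] group holds an odd number of hosts. This is a rough sketch that assumes the simple INI layout shown above (one host per line under a `[group]` header); `count_group_hosts` and `etcd_count_is_odd` are made-up helper names.

```shell
# count_group_hosts FILE GROUP: count non-empty, non-comment lines under [GROUP].
count_group_hosts() {
    awk -v group="[$2]" '
        /^\[/ { in_group = ($1 == group); next }   # a new section header starts
        in_group && NF && $1 !~ /^#/ { n++ }        # host line inside the wanted section
        END { print n + 0 }
    ' "$1"
}

# etcd_count_is_odd FILE: succeed when the [etcd] group has an odd number of hosts.
etcd_count_is_odd() {
    [ $(( $(count_group_hosts "$1" etcd) % 2 )) -eq 1 ]
}
```

Usage: `etcd_count_is_odd /etc/ansible/hosts || echo "etcd group should have an odd number of hosts"`.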
Start the installation (pick one of the two approaches below)
One-step installation
ansible-playbook 90.setup.yml
Step-by-step installation
1) Create certificates and prepare for installation
ansible-playbook 01.prepare.yml
2) Install the etcd cluster
ansible-playbook 02.etcd.yml
Check the health of the etcd nodes
bash
export NODE_IPS="28 29 30"
for ip in ${NODE_IPS}; do ETCDCTL_API=3 etcdctl --endpoints=https://10.18.30.${ip}:2379 --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem endpoint health; done
Output:
https://10.18.30.28:2379 is healthy: successfully committed proposal: took = 7.859522ms
https://10.18.30.29:2379 is healthy: successfully committed proposal: took = 6.022965ms
https://10.18.30.30:2379 is healthy: successfully committed proposal: took = 6.306561ms
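When the health check runs from a script, the output can be counted instead of read by eye. A minimal sketch, assuming the `is healthy` wording shown in the output above; the helper names are illustrative.

```shell
# count_healthy: read `etcdctl endpoint health` output on stdin and print
# the number of endpoints reported as healthy.
count_healthy() { grep -c 'is healthy' || true; }

# all_healthy EXPECTED: succeed when exactly EXPECTED endpoints are healthy.
all_healthy() {
    local expected="$1" healthy
    healthy="$(count_healthy)"
    [ "$healthy" -eq "$expected" ]
}
```

Piping the loop from the previous step into `all_healthy 3` then yields an exit status that a script can act on; adding `2>&1` to the loop helps if the installed etcdctl writes its report to stderr.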
3) Install docker
ansible-playbook 03.docker.yml
4) Install the master nodes
ansible-playbook 04.kube-master.yml
kubectl get componentstatus # check cluster status
Output:
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-1 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}
etcd-0 Healthy {"health":"true"}
5) Install the worker nodes
ansible-playbook 05.kube-node.yml
kubectl get nodes # list the nodes
Output:
NAME STATUS ROLES AGE VERSION
10.18.30.28 Ready,SchedulingDisabled master 3m42s v1.17.2
10.18.30.29 Ready node 44s v1.17.2
10.18.30.30 Ready node 44s v1.17.2
10.18.30.32 Ready,SchedulingDisabled master 3m43s v1.17.2
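Note that the masters report `Ready,SchedulingDisabled`, which still counts as ready. A small filter over the same output can list any node that is not ready; this is a sketch with a made-up function name, and it expects the header row to be suppressed with `--no-headers`.

```shell
# not_ready_nodes: read `kubectl get nodes --no-headers` output on stdin and
# print the names of nodes whose STATUS does not begin with "Ready".
not_ready_nodes() {
    awk '{ split($2, s, ","); if (s[1] != "Ready") print $1 }'
}
```

Usage: `kubectl get nodes --no-headers | not_ready_nodes`. Empty output means every node is ready.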
6) Deploy the cluster network
ansible-playbook 06.network.yml
kubectl get pod -n kube-system # list pods in the kube-system namespace; the flannel pods should appear here
Output:
NAME READY STATUS RESTARTS AGE
kube-flannel-ds-amd64-4q54p 1/1 Running 0 63s
kube-flannel-ds-amd64-6t75r 1/1 Running 0 63s
kube-flannel-ds-amd64-h2fgf 1/1 Running 0 63s
kube-flannel-ds-amd64-xpwqq 1/1 Running 0 63s
7) Install cluster add-ons (dns, dashboard)
ansible-playbook 07.cluster-addon.yml
kubectl get svc -n kube-system # list services in the kube-system namespace
Output:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
dashboard-metrics-scraper ClusterIP 10.68.133.114 <none> 8000/TCP 23s
kube-dns ClusterIP 10.68.0.2 <none> 53/UDP,53/TCP,9153/TCP 52s
kubernetes-dashboard NodePort 10.68.135.156 <none> 443:23744/TCP 23s
metrics-server ClusterIP 10.68.34.18 <none> 443/TCP 47s
traefik-ingress-service NodePort 10.68.87.63 <none> 80:23456/TCP,8080:36336/TCP 16s
View cluster information
kubectl cluster-info
Output:
Kubernetes master is running at https://10.18.30.28:6443
CoreDNS is running at https://10.18.30.28:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
kubernetes-dashboard is running at https://10.18.30.28:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
Metrics-server is running at https://10.18.30.28:6443/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
View resource usage of nodes and pods
kubectl top node
Output:
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
10.18.30.28 47m 2% 1518Mi 49%
10.18.30.29 37m 2% 1417Mi 45%
10.18.30.30 35m 1% 1401Mi 45%
10.18.30.32 37m 2% 1493Mi 48%
kubectl top pod --all-namespaces
Output:
NAMESPACE NAME CPU(cores) MEMORY(bytes)
kube-system coredns-76b74f549-4q442 2m 12Mi
kube-system dashboard-metrics-scraper-7b8b58dc8b-vrvvt 1m 11Mi
kube-system kube-flannel-ds-amd64-4q54p 1m 10Mi
kube-system kube-flannel-ds-amd64-6t75r 1m 10Mi
kube-system kube-flannel-ds-amd64-h2fgf 2m 10Mi
kube-system kube-flannel-ds-amd64-xpwqq 1m 8Mi
kube-system kubernetes-dashboard-567b96c67d-lgccc 1m 14Mi
kube-system metrics-server-745cb4496f-2c2lt 1m 11Mi
kube-system traefik-ingress-controller-857568d49f-pcjws 2m 15Mi
Test DNS
a) Create an nginx service
kubectl run nginx --image=nginx --expose --port=80
kubectl get svc
Output:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.68.0.1 <none> 443/TCP 31m
nginx ClusterIP 10.68.67.252 <none> 80/TCP 17s
kubectl get pods
Output:
NAME READY STATUS RESTARTS AGE
nginx-5578584966-pgk4x 1/1 Running 0 29s
b) Create a busybox test pod
kubectl run busybox --rm -it --image=busybox:1.28.4 /bin/sh # opens a shell inside the busybox pod
nslookup nginx.default.svc.cluster.local
Output:
Server: 10.68.0.2
Address 1: 10.68.0.2 kube-dns.kube-system.svc.cluster.local
Name: nginx.default.svc.cluster.local
Address 1: 10.68.67.252 nginx.default.svc.cluster.local
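The name resolved above follows the pattern service.namespace.svc.cluster-domain. The helper below is only an illustration of that pattern; `svc_fqdn` is a made-up name, and `cluster.local` is assumed as the cluster domain because that is what the output above shows.

```shell
# svc_fqdn NAME [NAMESPACE] [DOMAIN]: build the in-cluster DNS name of a Service.
svc_fqdn() {
    printf '%s.%s.svc.%s\n' "$1" "${2:-default}" "${3:-cluster.local}"
}

svc_fqdn nginx
svc_fqdn kube-dns kube-system
```

The same lookup can also be run without an interactive shell, for example `kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28.4 -- nslookup "$(svc_fqdn nginx)"`, where `dns-test` is an arbitrary pod name.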
Other operations
Add a worker node
https://github.com/easzlab/kubeasz/blob/master/docs/op/op-node.md
Add a master node
https://github.com/easzlab/kubeasz/blob/master/docs/op/op-master.md
Upgrade the cluster
https://github.com/easzlab/kubeasz/blob/master/docs/op/upgrade.md
Cluster backup and restore
Backup
ansible-playbook /etc/ansible/23.backup.yml
ll /etc/ansible/.cluster/backup/
总用量 3384
-rw-r--r--. 1 root root 1842 3月 19 14:21 hosts
-rw-r--r--. 1 root root 1842 3月 19 14:21 hosts-202003191421
-rw-------. 1 root root 1724448 3月 19 14:21 snapshot-202003191421.db
-rw-------. 1 root root 1724448 3月 19 14:21 snapshot.db
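Each backup run leaves a timestamped snapshot next to `snapshot.db`. When several backups exist, the newest timestamped file can be found as sketched below; `latest_snapshot` is a made-up helper that relies only on the `snapshot-YYYYMMDDhhmm.db` naming visible in the listing above.

```shell
# latest_snapshot DIR: print the file name of the newest timestamped snapshot in DIR.
# Glob results are sorted, and the fixed-width numeric timestamps sort chronologically,
# so the last match is the newest one.
latest_snapshot() {
    local f latest=""
    for f in "$1"/snapshot-*.db; do
        if [ -e "$f" ]; then latest="$f"; fi
    done
    [ -n "$latest" ] || return 1
    basename "$latest"
}
```

Usage: `latest_snapshot /etc/ansible/.cluster/backup`.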
Make a change to the cluster
kubectl run mysql --image=mysql:5.7 --expose --port=3306 --env="MYSQL_ROOT_PASSWORD=ITsupport.0"
kubectl get pods
Output:
NAME READY STATUS RESTARTS AGE
mysql-9b94874c7-9fs8b 1/1 Running 0 65s
nginx-5578584966-pgk4x 1/1 Running 0 3h8m
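Restore
In roles/cluster-restore/defaults/main.yml you can choose which etcd backup to restore (pick one from the backup directory above); by default the most recent backup is used. After the restore runs, pods, services, and other resources need some time to be rebuilt.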
ansible-playbook /etc/ansible/24.restore.yml
kubectl get pods
Output:
NAME READY STATUS RESTARTS AGE
nginx-5578584966-pgk4x 1/1 Running 0 3h12m
If the core cluster components (master/etcd/node) hit an unrecoverable problem, try the following sequence: clean --> create --> restore
Clean
Run the cleanup playbook
ansible-playbook /etc/ansible/99.clean.yml
init 6
Reconnect and check that everything has been removed
kubectl get pods
The connection to the server localhost:8080 was refused - did you specify the right host or port?
pstree
systemd─┬─NetworkManager───2*[{NetworkManager}]
├─VGAuthService
├─abrt-dbus───2*[{abrt-dbus}]
├─abrt-watch-log
├─abrtd
├─agetty
├─atd
├─auditd───{auditd}
├─chronyd
├─crond
├─dbus-daemon───{dbus-daemon}
├─irqbalance
├─lsmd
├─lvmetad
├─master─┬─pickup
│ └─qmgr
├─polkitd───6*[{polkitd}]
├─rngd
├─rpcbind
├─rsyslogd
├─smartd
├─5*[ssh]
├─sshd─┬─sshd───bash───bash───bash───bash───pstree
│ └─2*[sshd]
├─systemd-journal
├─systemd-logind
├─systemd-udevd
├─tuned───4*[{tuned}]
└─vmtoolsd
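Instead of scanning the process tree by eye, the process list can be filtered for leftover cluster daemons. This is a sketch: the list of process names is an assumption about what this deployment runs, not something taken from kubeasz, and `kube-controller` is the 15-character truncated command name that `ps` reports for kube-controller-manager.

```shell
# leftover_k8s_procs: read process names on stdin (one per line) and print any
# that look like cluster daemons. The name list is an assumption; adjust as needed.
leftover_k8s_procs() {
    grep -E -x 'kube-apiserver|kube-controller|kube-scheduler|kubelet|kube-proxy|etcd|dockerd|flanneld' || true
}
```

Usage: `ps -e -o comm= | leftover_k8s_procs`. Empty output suggests the cleanup removed the running services.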
Create
Run the one-step playbook to redeploy the cluster
cd /etc/ansible/
ansible-playbook 90.setup.yml
Verify the deployment
bash
kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
10.18.30.28 51m 2% 2667Mi 86%
10.18.30.29 32m 1% 1313Mi 42%
10.18.30.30 33m 1% 1284Mi 41%
10.18.30.32 26m 1% 1435Mi 46%
kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-76b74f549-rxglz 1/1 Running 0 2m14s
dashboard-metrics-scraper-7b8b58dc8b-txfq6 1/1 Running 0 2m14s
kube-flannel-ds-amd64-hcgtz 1/1 Running 0 3m2s
kube-flannel-ds-amd64-jxfxf 1/1 Running 0 3m2s
kube-flannel-ds-amd64-k6stc 1/1 Running 0 3m2s
kube-flannel-ds-amd64-smhms 1/1 Running 0 3m2s
kubernetes-dashboard-567b96c67d-fw5ff 1/1 Running 0 2m14s
metrics-server-745cb4496f-fzrck 1/1 Running 0 2m14s
traefik-ingress-controller-857568d49f-2tsnm 1/1 Running 0 86s
Restore
Run the restore playbook
ansible-playbook 24.restore.yml
Check whether the backup was restored
kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-5578584966-pgk4x 1/1 Running 0 3h42m
kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.68.0.1 <none> 443/TCP 4h13m
nginx ClusterIP 10.68.67.252 <none> 80/TCP 3h42m
Deploy harbor
To deploy harbor on a standalone server, see the following article
https://www.itwordsweb.com/linux_doc/docker_private.html
Deploy harbor with kubeasz (run the commands below on the ansible machine unless stated otherwise)
First, provision a new machine with IP 10.18.30.33
1) Download the harbor offline installer
cd /etc/ansible/down/
wget https://github.com/goharbor/harbor/releases/download/v1.10.1/harbor-offline-installer-v1.10.1.tgz
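2) Edit the ansible hosts file and the role defaults file
# NEW_INSTALL=(yes/no): yes installs a new harbor and configures docker on the k8s nodes to use it
# no only configures docker on the k8s nodes to use an existing harbor registry
# SELF_SIGNED_CERT=(yes/no): yes uses a self-signed certificate generated by the installer (browsers will not trust it out of the box)
# no uses an existing certificate, for example from letsencrypt or another certificate authority; in that case place the certificate files in the down directory beforehand, named harbor.pem and harbor-key.pem
# If harbor does not need to be reached by domain name, set HARBOR_DOMAIN=""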
vim /etc/ansible/hosts
[harbor]
10.18.30.33 HARBOR_DOMAIN="" NEW_INSTALL=yes SELF_SIGNED_CERT=yes
vim /etc/ansible/roles/harbor/defaults/main.yml
HARBOR_VER: "v1.10.1"
3) Install harbor
ssh-copy-id 10.18.30.33
ansible all -m shell -a 'mkdir -p /etc/cloud/templates/ && touch /etc/cloud/templates/hosts.redhat.tmpl && chmod 755 /etc/cloud/templates/hosts.redhat.tmpl'
cd /etc/ansible/
ansible-playbook 11.harbor.yml
4) On 10.18.30.33, verify that harbor is installed
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6c4e172d4c4c goharbor/nginx-photon:v1.10.1 "nginx -g 'daemon of…" 2 minutes ago Up 2 minutes (healthy) 0.0.0.0:80->8080/tcp, 0.0.0.0:443->8443/tcp nginx
8092b5309d4e goharbor/harbor-jobservice:v1.10.1 "/harbor/harbor_jobs…" 2 minutes ago Up 2 minutes (healthy) harbor-jobservice
efe36fe4b88a goharbor/clair-adapter-photon:v1.0.1-v1.10.1 "/clair-adapter/clai…" 2 minutes ago Up 2 minutes (healthy) 8080/tcp clair-adapter
c24da60b5e29 goharbor/harbor-core:v1.10.1 "/harbor/harbor_core" 2 minutes ago Up 2 minutes (healthy) harbor-core
f9681720d440 goharbor/clair-photon:v2.1.1-v1.10.1 "./docker-entrypoint…" 2 minutes ago Up 2 minutes (healthy) 6060-6061/tcp clair
36aa5f06a93e goharbor/harbor-portal:v1.10.1 "nginx -g 'daemon of…" 2 minutes ago Up 2 minutes (healthy) 8080/tcp harbor-portal
54d75c4a030c goharbor/registry-photon:v2.7.1-patch-2819-2553-v1.10.1 "/home/harbor/entryp…" 2 minutes ago Up 2 minutes (healthy) 5000/tcp registry
9d276104e6dd goharbor/harbor-db:v1.10.1 "/docker-entrypoint.…" 2 minutes ago Up 2 minutes (healthy) 5432/tcp harbor-db
b2f24c28923c goharbor/harbor-registryctl:v1.10.1 "/home/harbor/start.…" 2 minutes ago Up 2 minutes (healthy) registryctl
f92511776b7d goharbor/redis-photon:v1.10.1 "redis-server /etc/r…" 2 minutes ago Up 2 minutes (healthy) 6379/tcp redis
f218f5339217 goharbor/harbor-log:v1.10.1 "/bin/sh -c /usr/loc…" 2 minutes ago Up 2 minutes (healthy) 127.0.0.1:1514->10514/tcp harbor-log
5) On 10.18.30.33, look up the password and open harbor in a browser
Browse to the node's IP address, https://10.18.30.33. The administrator account is admin; the password is the harbor_admin_password value in harbor.cfg (v1.5-v1.7) or harbor.yml (v1.8+)
cat /data/harbor/harbor.yml
harbor_admin_password: sT8dzn54iBuA7pgP
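The password line can also be extracted with a small helper instead of reading the whole file. This is a sketch: `harbor_admin_password` as a function name is made up, and it assumes the key sits at the top level of the file with an unquoted value, as in the line above.

```shell
# harbor_admin_password FILE: print the value of the top-level
# harbor_admin_password key (assumes an unquoted value on the same line).
harbor_admin_password() {
    awk -F': *' '/^harbor_admin_password:/ { print $2; exit }' "$1"
}
```

Usage: `harbor_admin_password /data/harbor/harbor.yml`. Where the file is readable from the machine that logs in, the value can be piped into `docker login 10.18.30.33 -u admin --password-stdin`, which avoids typing the password interactively.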
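Use harbor from the k8s cluster
Unless stated otherwise, the steps below are run on the ansible machine, which is also a master node
1) Push an image
This uses harbor's default public project (library) directly
First check which docker images are available locally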
docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
easzlab/flannel v0.11.0-amd64 ff281650a721 13 months ago 52.6MB
mirrorgooglecontainers/pause-amd64 3.1 da86e6ba6ca1 2 years ago 742kB
Log in to harbor
docker login 10.18.30.33
Username: admin
Password:
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store
Login Succeeded
Tag and push
docker tag easzlab/flannel:v0.11.0-amd64 10.18.30.33/library/easzlab/flannel:1.0
docker push 10.18.30.33/library/easzlab/flannel:1.0
The push refers to repository [10.18.30.33/library/easzlab/flannel]
9ce0bb155166: Pushed
3f3a4ce2b719: Pushed
9b48060f404d: Pushed
5d3f68f6da8f: Pushed
7bff100f35cb: Pushed
1.0: digest: sha256:bd76b84c74ad70368a2341c2402841b75950df881388e43fc2aca000c546653a size: 1369
2) Check in the harbor web UI that the push succeeded
3) Use harbor from k8s
kubectl create secret docker-registry my-secret --docker-server=10.18.30.33 --docker-username=admin --docker-password=sT8dzn54iBuA7pgP --docker-email=team@test.com
kubectl get secrets
NAME TYPE DATA AGE
default-token-77rck kubernetes.io/service-account-token 3 56m
my-secret kubernetes.io/dockerconfigjson 1 14s
vim easzlab-test.yaml
apiVersion: v1
kind: Pod
metadata:
  name: easzlab-test
spec:
  containers:
  - image: 10.18.30.33/library/easzlab/flannel:1.0
    name: easzlab-test
  imagePullSecrets:
  - name: my-secret
kubectl create -f easzlab-test.yaml
kubectl get pods
NAME READY STATUS RESTARTS AGE
easzlab-test 1/1 Running 0 22s
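For testing several images from the registry, the same manifest can be generated from parameters rather than edited by hand. A minimal sketch; `render_pod` is a made-up helper, and it reuses the pod name as the container name, as the manifest above does.

```shell
# render_pod NAME IMAGE SECRET: print a Pod manifest that pulls IMAGE
# using the image pull secret SECRET.
render_pod() {
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $1
spec:
  containers:
  - image: $2
    name: $1
  imagePullSecrets:
  - name: $3
EOF
}
```

Usage: `render_pod easzlab-test 10.18.30.33/library/easzlab/flannel:1.0 my-secret | kubectl create -f -`.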
References
https://github.com/easzlab/kubeasz/tree/master/docs