A Brief Record of Deploying Kubernetes 1.31 on Debian 12 VMs on PVE
status
Published
type
Post
slug
pve-debian-12-vm-deploy-kubernetes-1-31-brief-record
date
Sep 23, 2024
tags
PVE
Linux
Config
K8s
DevOps
summary
This post records the process of deploying a Kubernetes 1.31 cluster on PVE virtual machines: preparing the environment, installing and configuring containerd as the container runtime, initializing the cluster with kubeadm, and deploying Cilium. Finally, it verifies cluster functionality and shows how to tear the cluster down.
Kubernetes recently released version 1.31, and the Kubernetes test environment already running in my PVE setup was getting a bit dated. So why not create fresh virtual machines and redeploy Kubernetes!
Following the link above 👆, I used Terraform to create four virtual machines: one as the master node and the other three as worker nodes. (Due to resource constraints, this uses a single-node control plane with no high availability.)
Cluster host configuration

```bash
# Set hosts
cat <<EOF >> /etc/hosts
192.168.31.20 debian-0
192.168.31.21 debian-1
192.168.31.22 debian-2
192.168.31.23 debian-3
EOF

# Run on the master
hostnamectl set-hostname k8s-master

# Run on each worker
hostnamectl set-hostname k8s-node1
```
Prerequisites
Uninstall Docker (skip on a fresh environment or if Docker was never installed)
```bash
sudo apt-get purge docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin docker-ce-rootless-extras
sudo rm -rf /var/lib/docker
sudo rm -rf /var/lib/containerd
sudo rm -rf /etc/docker
```
Disable services

```bash
# Turn off swap
swapoff -a                           # temporarily disable all active swap areas
sed -ri 's/.*swap.*/#&/' /etc/fstab  # permanently disable (no swap mount at boot)

# Disable SELinux (optional; consider it if problems come up later)
sudo setenforce 0
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

# Stop and disable the AppArmor service
systemctl stop apparmor.service
systemctl disable apparmor.service

# Flush iptables
iptables -F
iptables -X

# Disable Uncomplicated Firewall (ufw), then stop and disable its service
ufw disable
systemctl stop ufw.service
systemctl disable ufw.service
```
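A quick sanity check that swap is really off and the services are down:

```bash
# No output from swapon means no active swap devices
swapon --show
free -h | grep -i swap

# Both should report inactive / disabled
systemctl is-active apparmor.service ufw.service
systemctl is-enabled apparmor.service ufw.service
```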
Kernel parameters

```bash
cat > /etc/sysctl.conf << EOF
vm.swappiness=0
vm.overcommit_memory=1
vm.panic_on_oom=0
fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 512
EOF
sysctl -p  # apply

cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.conf.default.rp_filter=1
net.ipv4.conf.all.rp_filter=1
EOF
sysctl --system  # apply (no reboot needed)
```
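Note that the `net.bridge.*` keys only exist once the `br_netfilter` module (loaded in the next step) is in the kernel, so if `sysctl --system` complains here, rerun it after loading the module. A spot check:

```bash
sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables
```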
Kernel modules

```bash
apt-get install -y ipset ipvsadm apt-transport-https ca-certificates curl gpg

# Kernel modules to load automatically at boot
# (on kernels >= 4.19, such as Debian 12's, the module is nf_conntrack, not nf_conntrack_ipv4)
cat <<EOF | tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
ip_tables
EOF

# Load the modules now
modprobe br_netfilter  # bridge netfilter module
modprobe overlay       # union filesystem module
lsmod | grep -e br_netfilter -e overlay
```
CRI (Containerd)
Installing containerd

```bash
# The cri-containerd tarball bundles runc on top of the plain containerd tarball,
# which saves installing runc separately
wget https://github.com/containerd/containerd/releases/download/v1.7.22/cri-containerd-1.7.22-linux-amd64.tar.gz
tar Cxzvf /usr/local cri-containerd-1.7.22-linux-amd64.tar.gz

# containerd config directory and default config file
mkdir -p /etc/containerd/
containerd config default > /etc/containerd/config.toml

# vim /etc/containerd/config.toml
# Configure the cgroup driver here: the default config ships SystemdCgroup = false,
# which the sed below flips to true
# Add registry mirror config to use a mainland-China mirror (see the sketch below)

# Switch the sandbox (pause) image to the Aliyun mirror and bump its version
sed -i 's#registry.k8s.io/pause:3.8#registry.aliyuncs.com/google_containers/pause:3.10#' /etc/containerd/config.toml

# Have the container runtime (containerd + CRI) use the systemd cgroup driver when creating containers
sed -i '/SystemdCgroup/s/false/true/' /etc/containerd/config.toml

# Optionally relocate the storage directory
# mkdir -p /data/containerd
# sed -i 's#root = "/var/lib/containerd"#root = "/data/containerd"#' /etc/containerd/config.toml
```
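The registry-mirror step is only mentioned above, not shown. A minimal sketch using containerd 1.7's certs.d mechanism, with `mirror.example.com` standing in for a real mirror URL:

```bash
# Enable per-registry host configuration (the default config.toml ships config_path = "")
sed -i 's#config_path = ""#config_path = "/etc/containerd/certs.d"#' /etc/containerd/config.toml

# Mirror for docker.io; mirror.example.com is a placeholder -- substitute a real mirror
mkdir -p /etc/containerd/certs.d/docker.io
cat <<EOF > /etc/containerd/certs.d/docker.io/hosts.toml
server = "https://docker.io"

[host."https://mirror.example.com"]
  capabilities = ["pull", "resolve"]
EOF
```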
containerd service

```bash
# Create the unit directory (skip if it already exists), then fetch the unit file
mkdir -p /usr/local/lib/systemd/system/
curl -o /usr/local/lib/systemd/system/containerd.service https://raw.githubusercontent.com/containerd/containerd/main/containerd.service

# Enable and start containerd now
systemctl daemon-reload
systemctl enable --now containerd.service
# Check the current status of the containerd service
systemctl status containerd.service

# Check the containerd / crictl / runc versions
containerd --version
crictl --version
runc --version

# Point crictl at the containerd CRI socket
crictl config runtime-endpoint unix:///run/containerd/containerd.sock
```
containerd.service
```ini
[Unit]
Description=containerd container runtime
Documentation=https://containerd.io
After=network.target local-fs.target

[Service]
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/local/bin/containerd

Type=notify
Delegate=yes
KillMode=process
Restart=always
RestartSec=5

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not supports it.
# Only systemd 226 and above support this version.
TasksMax=infinity
OOMScoreAdjust=-999

[Install]
WantedBy=multi-user.target
```
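As an optional smoke test that crictl can talk to containerd (pulling the same pause image configured earlier):

```bash
crictl info | head   # RuntimeReady should be true; NetworkReady stays false until a CNI is installed
crictl pull registry.aliyuncs.com/google_containers/pause:3.10
crictl images
```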
Installing CNI plugins

```bash
wget https://github.com/containernetworking/plugins/releases/download/v1.5.1/cni-plugins-linux-amd64-v1.5.1.tgz
mkdir -p /opt/cni/bin
tar Cxzvf /opt/cni/bin cni-plugins-linux-amd64-v1.5.1.tgz
```
nerdctl
nerdctl is a command-line tool for working with containerd that offers Docker-compatible commands, which makes day-to-day operations more comfortable.

```bash
cd /tmp
wget https://github.com/containerd/nerdctl/releases/download/v1.7.7/nerdctl-1.7.7-linux-amd64.tar.gz
tar xf nerdctl-1.7.7-linux-amd64.tar.gz
mv nerdctl /usr/sbin
```
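Kubernetes-managed containers live in containerd's `k8s.io` namespace, so pass `--namespace` to see them, for example:

```bash
nerdctl --namespace k8s.io images
nerdctl --namespace k8s.io ps -a
```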
Installing the Kubernetes components
Update packages

```bash
sudo apt-get update
# Install required packages (skip if already installed earlier)
sudo apt-get install -y apt-transport-https ca-certificates curl gpg

# Import the public signing key for the Kubernetes package repositories.
# All repositories use the same signing key, so the version in the URL can be ignored.
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.31/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

# Add the Kubernetes apt repository
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.31/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list

sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
# Pin the versions
sudo apt-mark hold kubelet kubeadm kubectl
```
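A quick check that all three components landed at the expected 1.31.x version:

```bash
kubeadm version
kubelet --version
kubectl version --client
```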
Initializing the cluster
On the machine that will serve as the control-plane node, initialize the cluster with the kubeadm command.
- Command-line approach:

```bash
# kubeadm config images list --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers
# kubeadm config images pull --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers

# --apiserver-advertise-address sets the address the API server advertises; omit it to auto-detect.
# Other nodes and clients will use this address to connect to the API server.
kubeadm init \
  --image-repository=registry.cn-hangzhou.aliyuncs.com/google_containers \
  --control-plane-endpoint="192.168.31.20" \
  --pod-network-cidr=10.100.0.0/16 \
  --service-cidr=10.200.0.0/16 \
  --token-ttl=0 \
  --upload-certs
```
- Config-file approach
Generate the default config file:

```bash
kubeadm config print init-defaults > kubeadm.yaml
```
```yaml
# Adjust kubeadm.yaml as needed, then append the content below.
# In v1.22 and later, if cgroupDriver is not set under KubeletConfiguration,
# kubeadm defaults it to systemd:
#
# ---
# apiVersion: kubelet.config.k8s.io/v1beta1
# kind: KubeletConfiguration
# cgroupDriver: systemd
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
# Use the Aliyun image repository and pin the Kubernetes version
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.31.1
# Added: stable control-plane endpoint
controlPlaneEndpoint: 192.168.31.20:6443
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.200.0.0/16  # service subnet
  podSubnet: 10.100.0.0/16      # pod subnet
scheduler: {}
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs
```
```bash
# Verify that the image repository setting took effect
kubeadm config images list --config=kubeadm.yaml
# Pre-pull the images
kubeadm config images pull --config=kubeadm.yaml
# Confirm the images were downloaded
crictl images

# Run the initialization
kubeadm init --config=kubeadm.yaml
```
(Screenshot of the initialization process)
When initialization completes, it prints the commands for joining the cluster, like so:
```
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join 192.168.31.20:6443 --token emq8ij.a2gsgdlpbxpfpru8 \
	--discovery-token-ca-cert-hash sha256:7a5c23337fb6e87d25a38ff11370e512b0ba621fa1a5b8981e905b0765186527 \
	--control-plane --certificate-key 0dcde4a8bd52e4645c1ea40e8296c96040ffb04d4103b2a592a4c567a82215f6

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.31.20:6443 --token emq8ij.a2gsgdlpbxpfpru8 \
	--discovery-token-ca-cert-hash sha256:7a5c23337fb6e87d25a38ff11370e512b0ba621fa1a5b8981e905b0765186527
```
Run the `kubeadm join` command printed above on the remaining nodes as prompted, then set up kubectl access:

```bash
# kubectl config for a regular user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# For the root user
echo 'export KUBECONFIG=/etc/kubernetes/admin.conf' >> /etc/profile
source /etc/profile
```
```bash
# List tokens
kubeadm token list
# Compute the CA cert hash
openssl x509 -in /etc/kubernetes/pki/ca.crt -pubkey -noout | openssl pkey -pubin -outform DER | openssl dgst -sha256

# Check node status
kubectl get nodes
# Check cluster info
kubectl cluster-info
```
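If the bootstrap token has expired (the config-file path keeps the default 24-hour TTL; the CLI variant above used `--token-ttl=0` to disable expiry), a fresh worker join command can be printed in one step:

```bash
kubeadm token create --print-join-command
```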
At this point the nodes show as NotReady because no pod network add-on has been installed yet.
Because cluster nodes are usually initialized one after another, all CoreDNS Pods tend to land on the first control-plane node. For better availability, rebalance them with `kubectl -n kube-system rollout restart deployment coredns` after at least one new node has joined, as shown below.
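To see where the CoreDNS replicas currently sit (kubeadm labels them `k8s-app=kube-dns` for compatibility) and then trigger the rebalance:

```bash
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
kubectl -n kube-system rollout restart deployment coredns
```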
CNI network add-on
Cilium
Since this is a test environment, it's a good chance to try out the eBPF-based Cilium. The installation steps follow.

```bash
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
```

```bash
cilium install --version 1.16.1
cilium status --wait
```
At this point some Pods failed to start; `kubectl describe` showed the failure reason:

```
Last State:  Terminated
  Reason:    Error
  Message:   cp: cannot create regular file '/hostbin/cilium-mount': Permission denied
```
Run the following to fix it:

```bash
sudo chown -R root:root /opt/cni/bin/
sudo chmod -R 755 /opt/cni/bin/
```
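Once `cilium status` is green, cilium-cli also ships an end-to-end connectivity suite; it deploys test workloads into the cluster and takes a few minutes to run:

```bash
cilium connectivity test
```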
Calico
For reference, here is how Calico was installed previously.

```bash
curl -O https://raw.githubusercontent.com/projectcalico/calico/v3.28.2/manifests/calico.yaml
```
Set `CALICO_IPV4POOL_CIDR` to your custom pod subnet, and set `CALICO_IPV4POOL_IPIP` to `Always` to enable IPIP encapsulation, then apply the manifest; a sketch follows.
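The edits can be made by hand in an editor; a sed sketch against the v3.28.2 manifest (using the 10.100.0.0/16 pod subnet from the kubeadm config above; comment formatting can drift between Calico versions, so double-check with grep) might look like:

```bash
# Uncomment CALICO_IPV4POOL_CIDR and point it at the pod subnet
sed -i 's|# - name: CALICO_IPV4POOL_CIDR|- name: CALICO_IPV4POOL_CIDR|' calico.yaml
sed -i 's|#   value: "192.168.0.0/16"|  value: "10.100.0.0/16"|' calico.yaml

# IPIP is "Always" by default in recent manifests; verify both settings
grep -A1 -E 'CALICO_IPV4POOL_(CIDR|IPIP)' calico.yaml

kubectl apply -f calico.yaml
```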
Verify that CoreDNS DNS forwarding works:

```bash
apt install -y dnsutils
kubectl get svc -n kube-system
# Query via the CoreDNS service IP (10.200.0.10 under the 10.200.0.0/16 service subnet)
dig -t a www.163.com @10.200.0.10
```
Verifying the cluster

```bash
# Create pods
kubectl create deployment nginx --image=nginx --replicas 3

# Expose nginx with a service mapping the port
# For a quick ad-hoc test: kubectl port-forward deployment nginx 3000:3000
kubectl expose deployment nginx --name nginx-svc --type=NodePort --port=80 --target-port 80

# Check pod and svc status
kubectl get pod,svc

# Clean up after testing
kubectl delete deployment nginx
kubectl delete svc nginx-svc
```
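Before the cleanup commands above, the NodePort can be exercised from outside the cluster; a sketch using one of the node IPs created earlier:

```bash
# Look up the allocated NodePort and hit it on any node's IP
NODE_PORT=$(kubectl get svc nginx-svc -o jsonpath='{.spec.ports[0].nodePort}')
curl -I http://192.168.31.21:${NODE_PORT}   # expect an HTTP 200 from nginx
```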
Tearing down the cluster

```bash
kubeadm reset -f         # reset the node; -f forces it
rm -rf /var/lib/kubelet  # remove the core component directory
rm -rf /etc/kubernetes   # remove cluster config
rm -rf /etc/cni/net.d/   # remove container network config
rm -rf /var/log/pods && rm -rf /var/log/containers  # remove pod and container logs
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X  # flush iptables rules
service kubelet restart

# Images are usually kept; list the images on this node with:
crictl images

# To wipe all images on the node:
# rm -rf /var/lib/containerd/*
# A reboot may then be needed before the node can rejoin a cluster
reboot
```