A Complete Guide to Building a Microservice Architecture on K8s
All servers in this build run Ubuntu 22.04 LTS, minimal installation.
1 Nginx + Keepalived Load Balancing and High Availability
1.1 Server List
| Hostname | IP Address | Spec | Role | VIP |
|---|---|---|---|---|
| nginx-01 | 192.168.2.143 | 2c/8g/200g | master | 192.168.2.170 |
| nginx-02 | 192.168.2.144 | 2c/8g/200g | backup | 192.168.2.170 |
1.2 Install Nginx
sudo apt-get install nginx -y
Verify:
sudo nginx -t
# Output
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
1.3 Modify the Default Configuration
Change the default listen port from 80 to 81; port 80 is reserved for the load balancer.
sudo vim /etc/nginx/sites-enabled/default
server {
listen 81 default_server;
listen [::]:81 default_server;
....
}
Restart the nginx service:
sudo systemctl restart nginx
Modify the default page so the two nodes can be told apart:
sudo vim /var/www/html/index.nginx-debian.html
<p><em>Thank you for using nginx.</em></p>
<p><em>This is from Nginx-01</em></p> <!-- add this line -->
Verify by requesting the page:
curl localhost:81
1.4 Configure Load Balancing
1.4.1 Configure load balancing for the k8s masters
sudo vim /etc/nginx/nginx.conf
Enter the following:
...
events {
worker_connections 1024;
# multi_accept on;
}
stream {
log_format main '$remote_addr $upstream_addr - [$time_local] $status $upstream_bytes_sent';
access_log /var/log/nginx/k8s-access.log main;
upstream k8s-apiserver {
server 192.168.2.143:6443; # k8s-master-01 APISERVER IP:PORT
server 192.168.2.144:6443; # k8s-master-02 APISERVER IP:PORT
}
server {
listen 6443;
proxy_pass k8s-apiserver;
}
}
...
Restart the nginx service:
sudo systemctl restart nginx
1.4.2 Configure application load balancing
cd /etc/nginx/conf.d/
sudo vim nginx.conf
Enter the following:
# application upstream server list
upstream balance_server {
# backend addresses; weighted round-robin is used here, other algorithms are possible
server 192.168.2.141:81 weight=1;
server 192.168.2.142:81 weight=2;
}
# load balancing server
server {
# listening port for the load balancer
listen 80 default_server;
listen [::]:80 default_server;
# server name; use _ when there is none
server_name _;
location / {
# proxy to the application upstream
proxy_pass http://balance_server;
}
}
Restart the nginx service:
sudo systemctl restart nginx
Test the load balancing:
curl localhost
Across three consecutive requests, two responses come from nginx-02 and one from nginx-01, confirming the weighted load balancing works.
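To see the weighting at a glance, a short loop helps (a sketch; the grep pattern assumes the default pages were edited as in section 1.3):
for i in 1 2 3; do curl -s localhost | grep "This is from"; done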
1.5 Install Keepalived
1.5.1 Deploy
sudo apt install keepalived -y
Set it to start automatically with the system:
sudo vim /etc/rc.local
Check the host's NIC name:
ip a
# Output
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:b0:e3:d7 brd ff:ff:ff:ff:ff:ff
altname enp2s1
inet 192.168.2.141/24 brd 192.168.2.255 scope global ens33
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:feb0:e3d7/64 scope link
valid_lft forever preferred_lft forever
The NIC name here is ens33.
Edit the keepalived configuration file and set the VIP address 192.168.2.170:
sudo vim /etc/keepalived/keepalived.conf
On the master node, add the following:
global_defs {
router_id 192.168.2.141
}
vrrp_script chk_nginx {
script "/etc/keepalived/nginx_chk.sh"
interval 2
}
vrrp_instance VI_1 {
state MASTER
interface ens33 # the NIC name found above
virtual_router_id 100
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1369
}
virtual_ipaddress {
192.168.2.170
}
track_script {
chk_nginx
}
}
On the backup node, add the following:
global_defs {
router_id 192.168.2.142
}
vrrp_script chk_nginx {
script "/etc/keepalived/nginx_chk.sh"
interval 2
}
vrrp_instance VI_1 {
state BACKUP
interface ens33
virtual_router_id 100
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1369
}
virtual_ipaddress {
192.168.2.170
}
track_script {
chk_nginx
}
}
1.5.2 Create the nginx check script
sudo vim /etc/keepalived/nginx_chk.sh
Enter the following:
#!/bin/bash
# check for running nginx processes
A=`ps -C nginx --no-header |wc -l`
# if there are none
if [ $A -eq 0 ];then
# restart nginx and wait 2 seconds
service nginx restart
sleep 2
# check again for nginx processes
if [ `ps -C nginx --no-header |wc -l` -eq 0 ];then
# still no nginx process: kill keepalived so the VIP fails over to the backup
killall keepalived
fi
fi
Make the script executable:
sudo chmod +x /etc/keepalived/nginx_chk.sh
Run the script once; if it reports no errors it is fine:
cd /etc/keepalived
./nginx_chk.sh
1.5.3 Verify the keepalived service
Restart the keepalived service:
sudo systemctl restart keepalived
On the master node, check that the VIP has been assigned:
ip a
The VIP now appears in the address list; note that the backup node does not hold the VIP at this point.
Access the VIP to verify nginx responds:
curl 192.168.2.170
If keepalived on the master is stopped, the VIP disappears from the master's address list and appears on the backup:
sudo systemctl stop keepalived
The Nginx + Keepalived high availability setup is complete.
2 Building the K8s Cluster with kubeadm
2.1 Server Environment
2.1.1 Server preparation
kubeadm requires at least 2 CPU cores per server.
| Hostname | IP Address | Spec | Role |
|---|---|---|---|
| k8s-master-01 | 192.168.2.143 | 4c/16g/200g | master,dashboard |
| k8s-master-02 | 192.168.2.144 | 4c/16g/200g | master |
| k8s-node-01 | 192.168.2.145 | 4c/16g/200g | node |
| k8s-node-02 | 192.168.2.146 | 4c/16g/200g | node |
| k8s-node-03 | 192.168.2.151 | 4c/16g/200g | node |
| k8s-node-monitor | 192.168.2.149 | 4c/8g/200g | node |
| vip | 192.168.2.170 | - | vip |
Since multiple masters provide high availability here, the HA VIP 192.168.2.170 is required.
Set each server's hostname:
sudo hostnamectl set-hostname k8s-master-01
Add host mappings on every server:
sudo vim /etc/hosts
# add
192.168.2.143 k8s-master-01
192.168.2.144 k8s-master-02
192.168.2.145 k8s-node-01
192.168.2.146 k8s-node-02
192.168.2.151 k8s-node-03
192.168.2.149 k8s-node-monitor
2.1.2 Disable swap (run on all servers)
sudo swapoff -a # turn swap off
sudo vim /etc/fstab # comment out the swap line so the change persists
sudo swapon --show # verify swap is off; no output means it is disabled
2.1.3 Kernel configuration (run on all servers)
echo "net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1" | sudo tee -a /etc/sysctl.d/k8s.conf
sudo modprobe br_netfilter
sudo sysctl --system
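To confirm the settings took effect, they can be read back (a quick check, not part of the original write-up):
sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward
# both should report = 1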
2.2 Install Docker (run on all servers)
sudo apt-get install docker.io -y
Configure a registry mirror reachable in China; you can use your own Aliyun mirror accelerator:
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<-'EOF'
{
"registry-mirrors": ["https://ntcaogap.mirror.aliyuncs.com"]
}
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
2.3 Install kubeadm, kubectl and kubelet (run on all servers)
2.3.1 Create the install script
sudo mkdir k8s
cd k8s
sudo vim kubeadm-install.sh
Paste in the install script below:
#!/bin/bash
# do not delete the shebang line above
apt update && apt install -y ca-certificates curl software-properties-common apt-transport-https curl
curl -fsSL https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
add-apt-repository "deb [arch=amd64] https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main"
apt-get update
apt-cache madison kubelet kubectl kubeadm |grep '1.23.1-00'
apt install -y kubelet=1.23.1-00 kubectl=1.23.1-00 kubeadm=1.23.1-00
apt-mark hold kubelet kubeadm kubectl
sudo sh kubeadm-install.sh # run the install script
Installation complete.
2.3.2 Disable swap again
sudo vim /etc/default/kubelet
KUBELET_EXTRA_ARGS="--fail-swap-on=false" # add this line
sudo systemctl daemon-reload && sudo systemctl restart kubelet # reload the config and restart kubelet
2.3.3 Change the cgroup driver
sudo vim /etc/docker/daemon.json
{
"exec-opts": [
"native.cgroupdriver=systemd"
], // add this setting
"registry-mirrors": ["https://xxxx.mirror.aliyuncs.com"] // use your own Aliyun mirror accelerator
}
sudo systemctl restart docker
sudo systemctl restart kubelet
2.4 Create the Cluster
2.4.1 Create the master node (run on k8s-master-01)
sudo kubeadm init \
--kubernetes-version=v1.23.1 \
--image-repository=registry.aliyuncs.com/google_containers \
--apiserver-advertise-address=192.168.2.143 \
--control-plane-endpoint=192.168.2.143:6443 \
--service-cidr=10.96.0.0/16 \
--pod-network-cidr=10.24.0.0/16 \
--token-ttl=0 \
--apiserver-cert-extra-sans="192.168.2.144,192.168.2.141,192.168.2.142,192.168.2.170,kubernetes.xxxxxx.com" \
--ignore-preflight-errors=Swap
- --image-repository: Kubernetes' default registry is k8s.gcr.io, which is unreachable from inside China, so the Aliyun mirror is specified here
- --apiserver-advertise-address: the address of this master
- --control-plane-endpoint: needed because this is a multi-master HA setup; kubeadm cannot convert a single control-plane cluster created without --control-plane-endpoint into an HA cluster. It is set to the local IP first and pointed at the HA VIP later (see the sketch after this list)
- --service-cidr: the Service IP range, here 10.96.0.0~10.96.255.255
- --pod-network-cidr: the Pod IP range, here 10.24.0.0~10.24.255.255
- --token-ttl: the token lifetime; the default is 24h0m0s, and 0 means it never expires
- --apiserver-cert-extra-sans: extra SANs for the certificate; include all master IPs, the HA VIP, and any IPs or domains that may be needed later, to avoid having to reissue the apiserver certificate when they are added
- --ignore-preflight-errors: preflight errors to ignore; multiple errors need multiple flags rather than string concatenation, e.g. if [ERROR NumCPU] and [ERROR Swap] occur, pass both --ignore-preflight-errors=NumCPU and --ignore-preflight-errors=Swap
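On later pointing the endpoint at the VIP (mentioned in the --control-plane-endpoint bullet above), one common approach is sketched below; this step is not spelled out in the original write-up, so treat it as an assumption to verify:
# update the endpoint stored in the cluster's kubeadm ConfigMap
kubectl -n kube-system edit cm kubeadm-config # set controlPlaneEndpoint: 192.168.2.170:6443
# update client kubeconfigs to match
sudo sed -i 's/192.168.2.143:6443/192.168.2.170:6443/' $HOME/.kube/config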
On success, the following is printed:
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.2.143:6443 --token yhqzst.y1aupc1ebpzox9sw \
--discovery-token-ca-cert-hash sha256:5b7c5fdf7823d70861fde94781c9d6aa3a402067e8885877733933f3d4f2dc17
Following the prompt above, run on k8s-master-01:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
2.4.2 Join the other master nodes
First run the following on k8s-master-01:
sudo kubeadm init phase upload-certs --upload-certs
This prints a certificate key:
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
1f68ef066fc0133ca1a127fb6e9d91ef13baedba7940bdb6627507b1d0468648
Then, on k8s-master-02, combine the join command from the init output with the key just obtained:
sudo kubeadm join 192.168.2.143:6443 --token yhqzst.y1aupc1ebpzox9sw \
--discovery-token-ca-cert-hash sha256:5b7c5fdf7823d70861fde94781c9d6aa3a402067e8885877733933f3d4f2dc17 --control-plane --certificate-key 1f68ef066fc0133ca1a127fb6e9d91ef13baedba7940bdb6627507b1d0468648
The --control-plane and --certificate-key flags make the node join as a master; a plain join without them adds a worker node.
2.4.3 Join the worker nodes
Run the following on every worker node:
sudo kubeadm join 192.168.2.143:6443 --token yhqzst.y1aupc1ebpzox9sw \
--discovery-token-ca-cert-hash sha256:5b7c5fdf7823d70861fde94781c9d6aa3a402067e8885877733933f3d4f2dc17
For a multi-master cluster, replace the master IP above with the load balancer VIP, as in the example below.
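For example, with the VIP from section 1 and the same token and hash as above (this assumes the stream block from 1.4.1 forwards the VIP's port 6443 to the apiservers):
sudo kubeadm join 192.168.2.170:6443 --token yhqzst.y1aupc1ebpzox9sw \
--discovery-token-ca-cert-hash sha256:5b7c5fdf7823d70861fde94781c9d6aa3a402067e8885877733933f3d4f2dc17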
2.4.4 View the cluster nodes
Run on k8s-master-01:
kubectl get nodes -o wide
This shows the 2 master nodes and 3 worker nodes deployed so far; at this point the nodes are still NotReady.
2.5 Install a Network Plugin
Pick one of the two plugins below; run the commands only on k8s-master-01.
2.5.1 Install Calico
Download the deployment yaml; if the URL is unreachable, download it from an environment with outside access and copy it over:
curl https://docs.projectcalico.org/manifests/calico.yaml -O
Modify the pod network in the yaml:
sudo vim calico.yaml
...
- name: CALICO_IPV4POOL_CIDR
value: "10.24.0.0/16"
...
The CALICO_IPV4POOL_CIDR setting is commented out by default, which leaves it at 192.168.0.0/16, inconsistent with the --pod-network-cidr we passed when initializing the cluster.
Deploy calico:
kubectl apply -f calico.yaml
2.5.2 Install Flannel
Download the deployment yaml; if the URL is unreachable, download it from an environment with outside access and copy it over:
curl https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml -O
Modify the pod network in the yaml:
...
net-conf.json: |
{
"Network": "10.24.0.0/16",
"Backend": {
"Type": "vxlan"
}
}
...
Flannel's default pod network is 10.244.0.0/16; change it to match the --pod-network-cidr set earlier.
Deploy flannel:
sudo kubectl apply -f kube-flannel.yml
2.5.3 Check node status
Once everything is up, check the cluster state:
sudo kubectl get nodes -o wide
All nodes showing Ready means this step is complete.
3 Kubernetes Dashboard
Dropped (the article ran over the posting length limit). Kuboard is recommended instead; its UI is clearer and friendlier.
4 Ingress-Nginx
4.1 Download the deploy yaml
wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.6.4/deploy/static/provider/cloud/deploy.yaml
Update the image references in three places in the yaml to mirrors reachable in China:
registry.k8s.io/ingress-nginx/controller:v1.4.0@sha256:34ee929b111ffc7aa426ffd409af44da48e5a0eea1eb2207994d9e0c0882d143 → anjia0532/google-containers.ingress-nginx.controller:v1.6.4
registry.k8s.io/ingress-nginx/kube-webhook-certgen:v20220916-gd32f8c343@sha256:39c5b2e3310dc4264d638ad28d9d1d96c4cbb2b2dcfb52368fe4e3c63f61e10f → anjia0532/google-containers.ingress-nginx.kube-webhook-certgen:v20220916-gd32f8c343
Change the workload kind to DaemonSet:
...
apiVersion: apps/v1
# kind: Deployment
kind: DaemonSet
metadata:
labels:
app.kubernetes.io/component: controller
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/part-of: ingress-nginx
app.kubernetes.io/version: 1.6.4
name: ingress-nginx-controller
namespace: ingress-nginx
...
# dnsPolicy: ClusterFirst
nodeSelector:
# kubernetes.io/os: linux
app: ingress
hostNetwork: true # add this line
serviceAccountName: ingress-nginx
...
With hostNetwork enabled, the controller listens directly on the node's host IP, which gives it a fixed address.
4.2 Label the worker nodes
Because the node selector above matches nodes labeled app=ingress, every worker node needs that label; run on the master:
kubectl label nodes k8s-node-01 app="ingress"
kubectl label nodes k8s-node-02 app="ingress"
kubectl label nodes k8s-node-03 app="ingress"
4.3 Deploy ingress-nginx
sudo kubectl apply -f deploy.yaml
Note: if the logs show the apiserver is unreachable, e.g. dial tcp 10.96.0.1:443: timeout, adjust kube-proxy:
- Edit the config: kubectl edit cm kube-proxy -n kube-system
...
iptables:
  masqueradeAll: true # default is false
mode: "ipvs" # default is empty
...
- Delete all kube-proxy pods: kubectl get pod -n kube-system |grep kube-proxy |awk '{system("kubectl delete pod "$1" -n kube-system")}'
They are recreated automatically afterwards.
4.4 Configure Ingress for the Dashboard
Since the Dashboard was dropped, this is no longer needed and is dropped as well.
4.5 Configure load balancing for ingress
4.5.1 Generate a certificate
sudo openssl req -x509 -newkey rsa:4096 -sha256 -nodes -keyout xxxxxx.com.key -out xxxxxx.com.pem -days 3650
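The configuration below reads the certificate from /etc/nginx/cert/, so move the generated files there first (the directory comes from the ssl_certificate paths that follow):
sudo mkdir -p /etc/nginx/cert
sudo mv xxxxxx.com.key xxxxxx.com.pem /etc/nginx/cert/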
4.5.2 Create the load balancing configuration
Edit /etc/nginx/conf.d/nginx.conf:
sudo vim /etc/nginx/conf.d/nginx.conf
Paste in the following configuration:
upstream balance_ingress_server {
# ingress node addresses; default weighted round-robin, other algorithms are possible
# these are the IPs of the nodes running the ingress controller
server 192.168.2.145:443;
server 192.168.2.146:443;
server 192.168.2.151:443;
}
# load balancing server
# port 80 redirects straight to 443
server {
# listening port
listen 80;
listen [::]:80;
# server name; use _ when there is none
server_name xxxxxx.com;
rewrite ^/(.*) https://$server_name$request_uri? permanent;
# return 301 https://$server_name$request_uri;
}
server {
# listening port
listen 443 ssl http2;
listen [::]:443 ssl http2;
# server name; use _ when there is none
server_name xxxxxx.com;
ssl_certificate /etc/nginx/cert/xxxxxx.com.pem;
ssl_certificate_key /etc/nginx/cert/xxxxxx.com.key;
ssl_session_timeout 10m;
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE:ECDH:AES:HIGH:!NULL:!aNULL:!MD5:!ADH:!RC4;
ssl_prefer_server_ciphers on;
access_log /var/log/nginx/k8s-ingress-access.log;
error_log /var/log/nginx/k8s-ingress-error.log;
location / {
# proxy to the ingress servers
proxy_pass https://balance_ingress_server;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
}
Note: some services are served over https only, so the load balancer must target port 443 on the backends; otherwise you end up with ERR_TOO_MANY_REDIRECTS.
5 Private Image Registry
5.1 Label the target node
kubectl label nodes k8s-node-01 registry="yes"
5.2 Prepare the deployment yaml
Create the Namespace, registry-namespace.yaml:
apiVersion: v1
kind: Namespace
metadata:
name: docker-registry
Create the StorageClass, registry-sc.yaml:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
Create the PersistentVolume, registry-pv.yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
name: docker-registry-pv
labels:
pv: docker-registry-pv
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: local-storage
local:
path: /data/docker # remember to create this directory
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- k8s-node-01 # the node labeled earlier
Create the PersistentVolumeClaim, registry-pvc.yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: docker-registry-pvc
namespace: docker-registry
spec:
resources:
requests:
storage: 10Gi
accessModes:
- ReadWriteMany
storageClassName: local-storage
selector:
matchLabels:
pv: docker-registry-pv
Create the Deployment, registry-deploy.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: docker-registry
name: docker-registry
namespace: docker-registry
spec:
replicas: 1
revisionHistoryLimit: 5
selector:
matchLabels:
registry: "yes" # 指定打标签的节点
template:
metadata:
labels:
registry: "yes"
spec:
securityContext:
runAsUser: 0
containers:
- name: docker-registry
image: registry:latest
imagePullPolicy: IfNotPresent
ports:
- containerPort: 5000
name: web
protocol: TCP
resources:
requests:
memory: 200Mi
cpu: "0.1"
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/registry/
name: docker-registry-data
volumes:
- name: docker-registry-data
persistentVolumeClaim:
claimName: docker-registry-pvc
Create the Service, registry-svc.yaml:
apiVersion: v1
kind: Service
metadata:
name: docker-registry-service
namespace: docker-registry
spec:
ports:
- name: port-name
port: 5000
protocol: TCP
targetPort: 5000
nodePort: 30500
selector:
registry: "yes" # 指定打标签的节点
type: NodePort
5.3 Apply the yaml files
kubectl apply -f registry-namespace.yaml
kubectl apply -f registry-sc.yaml
kubectl apply -f registry-pv.yaml
kubectl apply -f registry-pvc.yaml
kubectl apply -f registry-deploy.yaml
kubectl apply -f registry-svc.yaml
5.4 Push a local image to the registry
5.4.1 Add the registry address
On Windows, install Docker Desktop and edit Docker Engine in its settings:
...
"insecure-registries": [
"192.168.2.145:30500"
],
"registry-mirrors": [
"http://192.168.2.145:30500"
]
On Linux/macOS, edit /etc/docker/daemon.json and paste in the same snippet.
5.4.2 Tag the local image
docker tag miniapi:dev 192.168.2.145:30500/miniapi:v1.0.0
5.4.3 Push and inspect
# push
docker push 192.168.2.145:30500/miniapi:v1.0.0
# list all images in the registry
curl http://192.168.2.145:30500/v2/_catalog
# list the tags of a given image
curl http://192.168.2.145:30500/v2/miniapi/tags/list
5.4.4 Delete an image by tag
curl -X DELETE http://192.168.2.145:30500/v2/miniapi/manifests/v1.0.0
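Note that the registry's delete endpoint actually expects a content digest rather than a tag, and deletion is disabled unless the registry container runs with REGISTRY_STORAGE_DELETE_ENABLED=true; the command above will fail on a stock registry. A sketch of the usual flow (not from the original write-up):
# fetch the manifest digest for the tag
curl -sI -H "Accept: application/vnd.docker.distribution.manifest.v2+json" http://192.168.2.145:30500/v2/miniapi/manifests/v1.0.0 | grep -i docker-content-digest
# delete by digest
curl -X DELETE http://192.168.2.145:30500/v2/miniapi/manifests/sha256:<digest>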
6 Service Registration and Discovery
6.1 Label the nodes
kubectl label nodes k8s-node-01 serviceDiscovery="consul"
kubectl label nodes k8s-node-02 serviceDiscovery="consul"
kubectl label nodes k8s-node-03 serviceDiscovery="consul"
6.2 Deploy with kubectl
6.2.1 Deploy the consul service
Create consul-svc.yaml:
apiVersion: v1
kind: Namespace
metadata:
name: consul
---
apiVersion: v1
kind: Service
metadata:
name: consul-svc
namespace: consul
labels:
name: consul
spec:
type: ClusterIP
ports:
- name: http
port: 8500
targetPort: 8500
- name: https
port: 8443
targetPort: 8443
- name: rpc
port: 8400
targetPort: 8400
- name: serflan-tcp
protocol: "TCP"
port: 8301
targetPort: 8301
- name: serflan-udp
protocol: "UDP"
port: 8301
targetPort: 8301
- name: serfwan-tcp
protocol: "TCP"
port: 8302
targetPort: 8302
- name: serfwan-udp
protocol: "UDP"
port: 8302
targetPort: 8302
- name: server
port: 8300
targetPort: 8300
- name: consuldns
port: 8600
targetPort: 8600
selector:
serviceDiscovery: consul
kubectl apply -f consul-svc.yaml
6.2.2 Deploy the consul servers
Create consul-statefulset.yaml:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: consul
namespace: consul
spec:
serviceName: consul-svc
replicas: 3
selector:
matchLabels:
serviceDiscovery: consul
template:
metadata:
labels:
serviceDiscovery: consul
spec:
terminationGracePeriodSeconds: 10
containers:
- name: consul
image: hashicorp/consul:latest
args:
- "agent"
- "-server"
- "-bootstrap-expect=3"
- "-ui"
- "-data-dir=/consul/data"
- "-bind=0.0.0.0"
- "-client=0.0.0.0"
- "-advertise=$(PODIP)"
- "-retry-join=consul-0.consul-svc.$(NAMESPACE).svc.cluster.local"
- "-retry-join=consul-1.consul-svc.$(NAMESPACE).svc.cluster.local"
- "-retry-join=consul-2.consul-svc.$(NAMESPACE).svc.cluster.local"
- "-domain=cluster.local"
- "-disable-host-node-id"
volumeMounts:
- name: data
mountPath: /consul/data
env:
- name: PODIP
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
ports:
- containerPort: 8500
name: ui-port
- containerPort: 8400
name: alt-port
- containerPort: 53
name: udp-port
- containerPort: 8443
name: https-port
- containerPort: 8080
name: http-port
- containerPort: 8301
name: serflan
- containerPort: 8302
name: serfwan
- containerPort: 8600
name: consuldns
- containerPort: 8300
name: server
volumes:
- name: data
hostPath:
path: /data/consul
kubectl apply -f consul-statefulset.yaml
A StatefulSet is used here so that pod names stay fixed: with a fixed name and replica count, pods are named $(name)-(ordinal), so their DNS names are known in advance and the servers can join the cluster automatically, as in the parameter -retry-join=consul-0.consul-svc.$(NAMESPACE).svc.cluster.local.
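Concretely, with three replicas behind consul-svc in the consul namespace, the predictable names follow $(pod-name).$(service-name).$(namespace).svc.cluster.local:
consul-0.consul-svc.consul.svc.cluster.local
consul-1.consul-svc.consul.svc.cluster.local
consul-2.consul-svc.consul.svc.cluster.local
Note that Kubernetes creates these per-pod DNS records for headless Services; if the names do not resolve with the ClusterIP Service defined above, setting clusterIP: None on consul-svc is the usual fix.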
6.3 Deploy with Helm (recommended)
6.3.1 Add the Helm repository
helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update
6.3.2 Write a custom values file
# view the default values
helm inspect values hashicorp/consul
# copy the defaults into a file, then edit it
sudo vim values.yaml
Change the following:
...
server:
replicas: 3 # run three replicas
...
6.3.3 Deploy PVs
The consul Helm chart stores its data in PVs by default, so PVs must be created first or the PVCs will stay Pending. Alternatively, deploy consul first, look up the PVC names, and then create PVs whose name and storageClassName match those PVCs; otherwise they cannot bind.
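With the deploy-first approach, the pending PVC names can be listed directly:
kubectl get pvc -n consul # note the NAME and STORAGECLASS columns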
Edit consul-pv.yaml:
apiVersion: v1
kind: Namespace
metadata:
name: consul
---
kind: PersistentVolume
apiVersion: v1
metadata:
name: data-consul-consul-consul-server-0
namespace: consul
labels:
type: local
spec:
storageClassName: ""
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/data/consul"
---
kind: PersistentVolume
apiVersion: v1
metadata:
name: data-consul-consul-consul-server-1
namespace: consul
labels:
type: local
spec:
storageClassName: ""
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/data/consul"
---
kind: PersistentVolume
apiVersion: v1
metadata:
name: data-consul-consul-consul-server-2
namespace: consul
labels:
type: local
spec:
storageClassName: ""
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/data/consul"
# grant permissions on the mount directory
sudo mkdir /data/consul
sudo chmod -R 777 /data/consul
# create the pvs
kubectl apply -f consul-pv.yaml -n consul
6.3.4 Deploy
helm install consul hashicorp/consul -n consul --values values.yaml
6.4 Deploy the consul ingress
Create consul-ingress.yaml:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: consul-ingress
namespace: consul
spec:
ingressClassName: "nginx"
rules:
- host: consul.xxxxxx.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: consul-consul-ui # service name from the helm deployment; with the plain yaml deployment use consul-svc to match the service name
port:
number: 80 # port from the helm deployment; with the plain yaml deployment use 8500 to match the service port
kubectl apply -f consul-ingress.yaml
After configuring local DNS resolution, consul is reachable at https://consul.xxxxxx.com.
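For a quick test without a real DNS server, a hosts entry pointing the domain at the VIP is enough (assuming traffic enters through the Nginx load balancer from section 4.5):
# /etc/hosts on the client machine
192.168.2.170 consul.xxxxxx.com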
7 Service Monitoring
7.1 Deploy node-exporter
Create node-exporter.yaml:
kind: DaemonSet
apiVersion: apps/v1
metadata:
labels:
app: node-exporter
name: node-exporter
namespace: monitor
spec:
revisionHistoryLimit: 10
selector:
matchLabels:
app: node-exporter
template:
metadata:
labels:
app: node-exporter
spec:
containers:
- name: node-exporter
image: prom/node-exporter:latest
ports:
- containerPort: 9100
protocol: TCP
name: http
hostNetwork: true # expose the node's physical metrics
hostPID: true # expose the node's physical metrics
# tolerations: # uncomment to also run on master nodes
# - effect: NoSchedule
# operator: Exists
---
kind: Service
apiVersion: v1
metadata:
labels:
app: node-exporter
name: node-exporter-svc
namespace: monitor
spec:
ports:
- name: http
port: 9100
nodePort: 31672
protocol: TCP
type: NodePort
selector:
app: node-exporter
kubectl apply -f node-exporter.yaml
# verify access after deployment
curl http://192.168.2.145:31672/metrics
7.2 Deploy Prometheus
Create prometheus.yaml:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: prometheus
rules:
- apiGroups: [""] # "" indicates the core API group
resources:
- nodes
- nodes/proxy
- services
- endpoints
- pods
verbs:
- get
- watch
- list
- apiGroups:
- extensions
resources:
- ingresses
verbs:
- get
- watch
- list
- nonResourceURLs: ["/metrics"]
verbs:
- get
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus
namespace: monitor
labels:
app: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: prometheus
subjects:
- kind: ServiceAccount
name: prometheus
namespace: monitor
roleRef:
kind: ClusterRole
name: prometheus
apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: monitor
labels:
app: prometheus
data:
prometheus.yml: |-
# my global config
global:
scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
# scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
alertmanagers:
- static_configs:
- targets:
# - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
# - "first_rules.yml"
# - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: 'prometheus'
# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.
static_configs:
- targets: ['localhost:9090']
- job_name: 'grafana'
static_configs:
- targets:
- 'grafana-svc.monitor:3000'
- job_name: 'kubernetes-apiservers'
kubernetes_sd_configs:
- role: endpoints
# Default to scraping over https. If required, just disable this or change to
# `http`.
scheme: https
# This TLS & bearer token file config is used to connect to the actual scrape
# endpoints for cluster components. This is separate to discovery auth
# configuration because discovery & scraping are two separate concerns in
# Prometheus. The discovery auth config is automatic if Prometheus runs inside
# the cluster. Otherwise, more config options have to be provided within the
# <kubernetes_sd_config>.
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
# If your node certificates are self-signed or use a different CA to the
# master CA, then disable certificate verification below. Note that
# certificate verification is an integral part of a secure infrastructure
# so this should only be disabled in a controlled environment. You can
# disable certificate verification by uncommenting the line below.
#
# insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
# Keep only the default/kubernetes service endpoints for the https port. This
# will add targets for each API server which Kubernetes adds an endpoint to
# the default/kubernetes service.
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: default;kubernetes;https
# Scrape config for nodes (kubelet).
#
# Rather than connecting directly to the node, the scrape is proxied though the
# Kubernetes apiserver. This means it will work if Prometheus is running out of
# cluster, or can't connect to nodes for some other reason (e.g. because of
# firewalling).
- job_name: 'kubernetes-nodes'
# Default to scraping over https. If required, just disable this or change to
# `http`.
scheme: https
# This TLS & bearer token file config is used to connect to the actual scrape
# endpoints for cluster components. This is separate to discovery auth
# configuration because discovery & scraping are two separate concerns in
# Prometheus. The discovery auth config is automatic if Prometheus runs inside
# the cluster. Otherwise, more config options have to be provided within the
# <kubernetes_sd_config>.
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics
# Scrape config for Kubelet cAdvisor.
#
# This is required for Kubernetes 1.7.3 and later, where cAdvisor metrics
# (those whose names begin with 'container_') have been removed from the
# Kubelet metrics endpoint. This job scrapes the cAdvisor endpoint to
# retrieve those metrics.
#
# In Kubernetes 1.7.0-1.7.2, these metrics are only exposed on the cAdvisor
# HTTP endpoint; use "replacement: /api/v1/nodes/${1}:4194/proxy/metrics"
# in that case (and ensure cAdvisor's HTTP server hasn't been disabled with
# the --cadvisor-port=0 Kubelet flag).
#
# This job is not necessary and should be removed in Kubernetes 1.6 and
# earlier versions, or it will cause the metrics to be scraped twice.
- job_name: 'kubernetes-cadvisor'
# Default to scraping over https. If required, just disable this or change to
# `http`.
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
# Scrape config for service endpoints.
#
# The relabeling allows the actual service scrape endpoint to be configured
# via the following annotations:
#
# * `prometheus.io/scrape`: Only scrape services that have a value of `true`
# * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
# to set this to `https` & most likely set the `tls_config` of the scrape config.
# * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
# * `prometheus.io/port`: If the metrics are exposed on a different port to the
# service then set this appropriately.
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
# Example scrape config for probing services via the Blackbox Exporter.
#
# The relabeling allows the actual service scrape endpoint to be configured
# via the following annotations:
#
# * `prometheus.io/probe`: Only probe services that have a value of `true`
- job_name: 'kubernetes-services'
metrics_path: /probe
params:
module: [http_2xx]
kubernetes_sd_configs:
- role: service
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
action: keep
regex: true
- source_labels: [__address__]
target_label: __param_target
- target_label: __address__
replacement: blackbox-exporter.example.com:9115
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
target_label: kubernetes_name
# Example scrape config for probing ingresses via the Blackbox Exporter.
#
# The relabeling allows the actual ingress scrape endpoint to be configured
# via the following annotations:
#
# * `prometheus.io/probe`: Only probe services that have a value of `true`
- job_name: 'kubernetes-ingresses'
metrics_path: /probe
params:
module: [http_2xx]
kubernetes_sd_configs:
- role: ingress
relabel_configs:
- source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
action: keep
regex: true
- source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
regex: (.+);(.+);(.+)
replacement: ${1}://${2}${3}
target_label: __param_target
- target_label: __address__
replacement: blackbox-exporter.example.com:9115
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_ingress_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_ingress_name]
target_label: kubernetes_name
# Example scrape config for pods
#
# The relabeling allows the actual pod scrape endpoint to be configured via the
# following annotations:
#
# * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
# * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
# * `prometheus.io/port`: Scrape the pod on the indicated port instead of the
# pod's declared ports (default is a port-free target if none are declared).
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name
---
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-rules
namespace: monitor
labels:
app: prometheus
data:
cpu-usage.rule: |
groups:
- name: NodeCPUUsage
rules:
- alert: NodeCPUUsage
expr: (100 - (avg by (instance) (irate(node_cpu{name="node-exporter",mode="idle"}[5m])) * 100)) > 75
for: 2m
labels:
severity: "page"
annotations:
summary: "{{$labels.instance}}: High CPU usage detected"
description: "{{$labels.instance}}: CPU usage is above 75% (current value is: {{ $value }})"
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: "prometheus-data-pv"
labels:
name: prometheus-data-pv
release: stable
spec:
capacity:
storage: 5Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Recycle
storageClassName: local-storage
local:
path: /data/prometheus
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: app
operator: In
values:
- monitor
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: prometheus-data-pvc
namespace: monitor
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
storageClassName: local-storage
selector:
matchLabels:
name: prometheus-data-pv
release: stable
---
kind: Deployment
apiVersion: apps/v1
metadata:
labels:
app: prometheus
name: prometheus
namespace: monitor
spec:
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: prometheus
template:
metadata:
labels:
app: prometheus
spec:
serviceAccountName: prometheus
securityContext:
runAsUser: 0
containers:
- name: prometheus
image: bitnami/prometheus:latest
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /prometheus
name: prometheus-data-volume
- mountPath: /etc/prometheus/prometheus.yml
name: prometheus-config-volume
subPath: prometheus.yml
- mountPath: /etc/prometheus/rules
name: prometheus-rules-volume
ports:
- containerPort: 9090
protocol: TCP
volumes:
- name: prometheus-data-volume
persistentVolumeClaim:
claimName: prometheus-data-pvc
- name: prometheus-config-volume
configMap:
name: prometheus-config
- name: prometheus-rules-volume
configMap:
name: prometheus-rules
nodeSelector:
app: monitor
---
kind: Service
apiVersion: v1
metadata:
annotations:
prometheus.io/scrape: 'true'
labels:
app: prometheus
name: prometheus-svc
namespace: monitor
spec:
ports:
- port: 9090
targetPort: 9090
selector:
app: prometheus
type: NodePort
kubectl apply -f prometheus.yaml
# verify after deployment; 32101 is the NodePort assigned in the author's environment
curl http://192.168.2.149:32101/targets
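The Service above does not pin a nodePort, so Kubernetes assigns one at random. To find the one assigned in your cluster:
kubectl get svc -n monitor prometheus-svc # the PORT(S) column shows 9090:<nodeport>/TCP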
7.3 Deploy Grafana
Create grafana.yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
name: "grafana-data-pv"
labels:
name: grafana-data-pv
release: stable
spec:
capacity:
storage: 5Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Recycle
storageClassName: local-storage
local:
path: /data/grafana
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: app
operator: In
values:
- monitor
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: grafana-data-pvc
namespace: monitor
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
storageClassName: local-storage
selector:
matchLabels:
name: grafana-data-pv
release: stable
---
kind: Deployment
apiVersion: apps/v1
metadata:
labels:
app: grafana
name: grafana
namespace: monitor
spec:
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: grafana
template:
metadata:
labels:
app: grafana
spec:
securityContext:
runAsUser: 0
containers:
- name: grafana
image: grafana/grafana:latest
imagePullPolicy: IfNotPresent
env:
- name: GF_AUTH_BASIC_ENABLED
value: "true"
- name: GF_AUTH_ANONYMOUS_ENABLED
value: "false"
readinessProbe:
httpGet:
path: /login
port: 3000
volumeMounts:
- mountPath: /var/lib/grafana
name: grafana-data-volume
ports:
- containerPort: 3000
protocol: TCP
volumes:
- name: grafana-data-volume
persistentVolumeClaim:
claimName: grafana-data-pvc
nodeSelector:
app: monitor
---
kind: Service
apiVersion: v1
metadata:
labels:
app: grafana
name: grafana-svc
namespace: monitor
spec:
ports:
- port: 3000
targetPort: 3000
selector:
app: grafana
type: NodePort
kubectl apply -f grafana.yaml
Open http://192.168.2.149:32129/ in a browser (again, the NodePort assigned in the author's environment; check yours with kubectl get svc -n monitor grafana-svc). The default credentials are admin/admin, and the password must be changed on first login.
8 Distributed Tracing
Create zipkin-server.yaml:
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: zipkin
namespace: monitor
labels:
app: zipkin
spec:
replicas: 1
selector:
matchLabels:
app: zipkin
template:
metadata:
labels:
app: zipkin
spec:
containers:
- name: zipkin
image: openzipkin/zipkin:latest
imagePullPolicy: IfNotPresent
env:
- name: JAVA_OPTS
value: "
-Xms512m -Xmx512m
-Dlogging.level.zipkin=DEBUG
-Dlogging.level.zipkin2=DEBUG
-Duser.timezone=Asia/Shanghai
"
# - name: STORAGE_TYPE
# value: "elasticsearch" # store data in ES
# - name: ES_HOSTS
# value: "elasticsearch-svc.monitor:9200" # ES address
# - name: ES_INDEX # name of the zipkin index in ES
# value: "zipkin"
# - name: ES_INDEX_REPLICAS # number of index replicas
# value: "1"
# - name: ES_INDEX_SHARDS # number of shards
# value: "3"
resources:
limits:
cpu: 1000m
memory: 512Mi
requests:
cpu: 500m
memory: 256Mi
nodeSelector:
app: monitor
---
apiVersion: v1
kind: Service
metadata:
name: zipkin-svc
namespace: monitor
labels:
app: zipkin
spec:
type: NodePort
ports:
- port: 9411
targetPort: 9411
nodePort: 30190
selector:
app: zipkin
kubectl apply -f zipkin-server.yaml
After deployment, verify at http://192.168.2.149:30190/zipkin/.
9 ELK
Deploy this stack on the k8s-node-monitor node.
9.1 Install Elasticsearch
Create elasticsearch.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
name: elasticsearch-config
namespace: monitor
data:
elasticsearch.yml: |-
cluster.name: ${CLUSTER_NAME}
node.name: ${NODE_NAME}
network.host: 0.0.0.0
xpack.security.enabled: false
xpack.monitoring.collection.enabled: true
xpack.security.http.ssl.enabled: false
# xpack.security.http.ssl.keystore.path: certs/certificate.pem
# xpack.security.http.ssl.keystore.password: xxxxxx@888
xpack.license.self_generated.type: basic
xpack.security.transport.ssl.enabled: false
# xpack.security.transport.ssl.verification_mode: certificate
# xpack.security.transport.ssl.keystore.path: certs/certificate.pem
# xpack.security.transport.ssl.truststore.path: certs/certificate.pem
# xpack.security.transport.ssl.keystore.password: xxxxxx@888
---
apiVersion: apps/v1
kind: Deployment
metadata:
generation: 1
labels:
app: elasticsearch
name: elasticsearch
namespace: monitor
spec:
minReadySeconds: 10
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: elasticsearch
strategy:
type: Recreate
template:
metadata:
creationTimestamp: null
labels:
app: elasticsearch
spec:
containers:
- env:
- name: TZ
value: Asia/Shanghai
- name: xpack.security.enrollment.enabled
value: 'true'
- name: CLUSTER_NAME
value: elasticsearch
- name: NODE_NAME
value: elasticsearch
- name: ELASTIC_USERNAME
value: elastic
- name: ELASTIC_PASSWORD
value: xxxxxx@999
- name: discovery.type
value: single-node
- name: ES_JAVA_OPTS
value: -Xms512m -Xmx512m
- name: MINIMUM_MASTER_NODES
value: "1"
image: docker.elastic.co/elasticsearch/elasticsearch:8.11.1
imagePullPolicy: IfNotPresent
name: elasticsearch
ports:
- containerPort: 9200
name: db
protocol: TCP
- containerPort: 9300
name: transport
protocol: TCP
resources:
limits:
cpu: "1"
memory: 1Gi
requests:
cpu: "1"
memory: 1Gi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /data
name: es-persistent-storage
- mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
name: elasticsearch-config
subPath: elasticsearch.yml
- mountPath: /usr/share/elasticsearch/config/certs
name: elasticsearch-certs
dnsPolicy: ClusterFirst
imagePullSecrets:
- name: user-1-registrysecret
initContainers:
- command:
- /sbin/sysctl
- -w
- vm.max_map_count=262144
image: alpine:3.6
imagePullPolicy: IfNotPresent
name: elasticsearch-init
resources: {}
securityContext:
privileged: true
procMount: Default
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
volumes:
- hostPath:
path: /opt/monitor/es_data
type: ""
name: es-persistent-storage
- configMap:
defaultMode: 429
name: elasticsearch-config
name: elasticsearch-config
- secret:
secretName: elasticsearch-certs
name: elasticsearch-certs
nodeSelector:
app: monitor
Create elasticsearch-svc.yaml:
apiVersion: v1
kind: Service
metadata:
namespace: monitor
name: elasticsearch-svc
labels:
app: elasticsearch
spec:
type: NodePort
ports:
- port: 9200
targetPort: 9200
nodePort: 30920
name: elasticsearch
selector:
app: elasticsearch
Apply the yaml:
kubectl apply -f elasticsearch.yaml
kubectl apply -f elasticsearch-svc.yaml
9.2 Install Kibana
Create kibana.yaml:
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: kibana
namespace: monitor
labels:
app: kibana
spec:
replicas: 1
selector:
matchLabels:
app: kibana
template:
metadata:
labels:
app: kibana
spec:
affinity:
nodeAffinity: {}
containers:
- name: kibana
image: docker.elastic.co/kibana/kibana:8.11.1
ports:
- containerPort: 5601
protocol: TCP
env:
- name: ELASTICSEARCH_URL
value: http://10.96.81.0:9200 # the ES service's cluster IP and port
nodeSelector:
app: monitor
---
apiVersion: v1
kind: Service
metadata:
name: kibana-svc
namespace: monitor
labels:
app: kibana
spec:
ports:
- protocol: TCP
port: 80
targetPort: 5601
selector:
app: kibana
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: kibana-ingress
namespace: monitor
spec:
ingressClassName: "nginx"
rules:
- host: kibana.xxxxxx.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: kibana-svc
port:
number: 80
Apply the yaml:
kubectl apply -f kibana.yaml
After deployment, the kibana pod's log contains a URL of the form http://0.0.0.0:5601?code=xxxxxx
Note the 6-digit code at the end. After configuring local DNS resolution, open https://kibana.xxxxxx.com to reach the setup page, choose manual configuration, enter the correct ES address (cluster IP + port), then enter the 6-digit code to get into Kibana.
PS: if Kibana starts reporting index_not_found errors after running for a while, ES is not auto-creating indices; run the following in Kibana:
PUT _cluster/settings
{
"persistent": {
"action.auto_create_index": "true"
}
}
10 Relational Database Primary/Standby
10.1 Install PostgreSQL
10.1.1 Add the apt source
sudo sh -c 'echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
10.1.2 Install
sudo apt-get -y install postgresql
sudo apt-get install postgresql-contrib
10.1.3 Change the default user's password
sudo -u postgres psql
alter user postgres with password 'xxxxxx@999';
10.1.4 Enable remote access
sudo vim /etc/postgresql/16/main/postgresql.conf
Change listen_addresses from localhost to *:
#------------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#------------------------------------------------------------------------------
# - Connection Settings -
listen_addresses = '*' # what IP address(es) to listen on;
sudo vim /etc/postgresql/16/main/pg_hba.conf
Append at the end:
host all all 0.0.0.0/0 md5
Save and restart:
sudo service postgresql restart
Enable start on boot:
sudo systemctl enable postgresql.service
10.1.5 Configure primary/standby replication
Server environment:
| Hostname | Host Address | Database Version | Role |
|---|---|---|---|
| db-01 | 192.168.2.147 | PostgreSQL 16.1 | Master |
| db-02 | 192.168.2.148 | PostgreSQL 16.1 | Slave |
10.1.5.1 Configure the Master
Create the replication user:
sudo -u postgres psql
# create a user with replication and login privileges (used for replication below)
create role replica login replication encrypted password 'xxxxxx@888';
Configure parameters:
sudo vim /etc/postgresql/16/main/postgresql.conf
max_connections = 500
hot_standby = on # enable hot standby
wal_level = replica # WAL level replica
max_wal_senders = 10 # number of WAL senders; adjust as needed
archive_mode = on
archive_command = 'test ! -f /data/postgresql/archive/%f && cp %p /data/postgresql/archive/%f'
wal_keep_size = 256
wal_sender_timeout = 60s
sudo mkdir -p /data/postgresql/archive
Allow the standby to replicate:
sudo vim /etc/postgresql/16/main/pg_hba.conf
Append at the end:
host replication replica 192.168.2.148/24 md5
Save and restart:
sudo service postgresql restart
10.1.5.2 Configure the Standby
sudo systemctl stop postgresql
su - postgres
# make a copy first, in case of mistakes
cp -r /var/lib/postgresql/16/main /var/lib/postgresql/16/main.bak
# remove the local data
rm -rf /var/lib/postgresql/16/main
Back up the data from the primary:
pg_basebackup -h 192.168.2.147 -U replica -F p -X stream -P -R -D /var/lib/postgresql/16/main
- -h: the primary server host
- -D: the data directory
- -U: the connection user
- -P: enable progress reporting
- -v: enable verbose mode
- -R: write recovery configuration: creates a standby.signal file and appends connection settings to the data directory
Configure parameters:
sudo vim /etc/postgresql/16/main/postgresql.conf
# WAL level replica
wal_level = replica
# number of WAL senders; adjust as needed
max_wal_senders = 10
# when the standby serves most reads, set this higher than on the primary
max_connections = 1000
# allow queries during recovery
hot_standby = on
# maximum streaming replication delay (optional)
max_standby_streaming_delay = 30s
# maximum interval for the standby to report status to the primary (optional)
wal_receiver_status_interval = 10s
# report query conflicts back to the primary (optional)
hot_standby_feedback = on
Save and restart:
sudo systemctl restart postgresql
10.1.5.3 Verify
On the Master, run:
sudo -u postgres psql
select client_addr,sync_state from pg_stat_replication;
Create a database or table on the master and check that it shows up on the standby, e.g.:
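A minimal end-to-end check (replica_test is an arbitrary name used for illustration):
# on the master
sudo -u postgres psql -c 'create database replica_test;'
# on the slave, the new database should appear in the list
sudo -u postgres psql -c '\l'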
10.2 Install MySQL
10.2.1 Install
sudo apt-get update
sudo apt-get install mysql-server -y
10.2.2 Modify the configuration (on both primary and replica)
sudo vi /etc/mysql/mysql.conf.d/mysqld.cnf
Change the following:
bind-address=0.0.0.0 # comment this line out, or change it to 0.0.0.0
server-id = 147 # unique ID of this mysql server, globally unique across primary and replica; the last octet of the IP is a common choice
log_bin = /var/log/mysql/mysql-bin.log # binlog path
# binlog_do_db = include_database_name # databases to replicate; use one line per database, do not separate with commas
Save and restart:
sudo systemctl restart mysql.service
10.2.3 Change the root password
# look up the default maintenance credentials
sudo cat /etc/mysql/debian.cnf
# log in with the maintenance user
mysql -u debian-sys-maint -p
mysql> use mysql;
mysql> update user set authentication_string='' where user='root';
mysql> alter user 'root'@'localhost' identified with mysql_native_password by 'xxxxxx@999';
Restart the service after the change:
sudo service mysql restart
10.2.4 Configure replication
Server environment:
| Hostname | Host Address | Database Version | Role |
|---|---|---|---|
| db-01 | 192.168.2.147 | MySQL 8.0.35 | Master |
| db-02 | 192.168.2.148 | MySQL 8.0.35 | Slave |
10.2.4.1 Configure the primary
mysql -u root -p
# create the replication account
mysql> CREATE USER 'repl'@'%' IDENTIFIED WITH mysql_native_password BY 'xxxxxx@888';
# grant replication privileges to the account
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
mysql> flush privileges;
10.2.4.2 Configure the replica
mysql -u root -p
mysql> stop replica;
mysql> CHANGE REPLICATION SOURCE TO SOURCE_HOST='192.168.2.147',SOURCE_PORT=3306,SOURCE_USER='repl',SOURCE_PASSWORD='xxxxxx@888';
mysql> start replica;
10.2.5 Verify
# check replication status
mysql> show replica status\G;
The following two items showing Yes means success; Connecting or No means failure:
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
Create a database or table on the primary and check that it appears on the replica.
11 Redis Primary/Replica
Server environment:
| Hostname | Host Address | Version | Role |
|---|---|---|---|
| db-01 | 192.168.2.147 | Redis 6.0.16 | Master |
| db-02 | 192.168.2.148 | Redis 6.0.16 | Slave |
11.1 Install
sudo apt-get update
sudo apt-get install redis-server -y
11.2 Configure the primary
sudo vim /etc/redis/redis.conf
Change the following settings:
bind 0.0.0.0
port 6379
daemonize yes
protected-mode no
requirepass xxxxxx@999
masterauth xxxxxx@999
sudo systemctl restart redis
11.3 Configure the replica
sudo vim /etc/redis/redis.conf
Change the following settings:
bind 0.0.0.0
port 6379
daemonize yes
protected-mode no
requirepass xxxxxx@999
masterauth xxxxxx@999
replicaof 192.168.2.147 6379
replica-read-only yes # replicas are read-only by default
sudo systemctl restart redis
11.4 Verify
Watch the log on the primary:
tail -f /var/log/redis/redis-server.log
The log shows the initial sync completing.
Create a key on the primary and check that it is replicated to the replica, e.g.:
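A quick check with redis-cli (password from the configuration above; sync_test is an arbitrary key):
# on the primary
redis-cli -a 'xxxxxx@999' set sync_test hello
# on the replica; should print "hello"
redis-cli -a 'xxxxxx@999' get sync_test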
12 RabbitMQ Cluster
Server environment:
| Hostname | Host Address | Version | Role |
|---|---|---|---|
| db-01 | 192.168.2.147 | RabbitMQ 3.9.13 | Master |
| db-02 | 192.168.2.148 | RabbitMQ 3.9.13 | Slave |
sudo vim /etc/hosts
# add the ip mappings
192.168.2.147 db-01
192.168.2.148 db-02
Save and reboot:
sudo reboot
12.1 Install
# install erlang
sudo apt-get install erlang-nox -y
# install rabbitmq
wget -O- https://www.rabbitmq.com/rabbitmq-release-signing-key.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get install rabbitmq-server -y
# check the status
sudo systemctl status rabbitmq-server
# enable the management UI
sudo rabbitmq-plugins enable rabbitmq_management
# create an admin user
sudo rabbitmqctl add_user admin xxxxxx@999
sudo rabbitmqctl set_user_tags admin administrator
sudo rabbitmqctl set_permissions -p / admin ".*" ".*" ".*"
12.2 Configure the cluster
12.2.1 Copy the .erlang.cookie file
# copy the master's .erlang.cookie to the slave
sudo scp /var/lib/rabbitmq/.erlang.cookie yj@db-02:/var/lib/rabbitmq/
# if permissions are insufficient, copy it to the user's home first, then move it to /var/lib/rabbitmq/
# the new .erlang.cookie must be re-owned
sudo chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
sudo chmod 600 /var/lib/rabbitmq/.erlang.cookie
Keep the .erlang.cookie identical on master and slave.
12.2.2 Restart each node
sudo rabbitmqctl stop
sudo rabbitmq-server -detached
Check rabbitmq's status; if it did not start, or keeps failing on restart, a reboot usually fixes it.
12.2.3 Join the cluster
Run on the slave:
sudo rabbitmqctl stop_app
sudo rabbitmqctl join_cluster rabbit@db-01
sudo rabbitmqctl start_app
Check the cluster status:
sudo rabbitmqctl cluster_status
Open the management UI at http://192.168.2.147:15672/#/ to see the cluster nodes.
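Note that clustering by itself does not replicate queue contents; for classic queues that takes a mirroring policy, e.g. (ha-all is an arbitrary policy name, run on either node):
sudo rabbitmqctl set_policy ha-all ".*" '{"ha-mode":"all"}'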
13 Deploy the Apollo Configuration Center
13.1 Create the databases
Go to apollo/scripts/sql at v2.1.0 · apolloconfig/apollo · GitHub for the database scripts; download and run them to create the ApolloConfigDB and ApolloPortalDB databases.
mysql -u root -p
# run the scripts to create the databases
mysql> source /home/yj/apollo/apolloconfigdb.sql
mysql> source /home/yj/apollo/apolloportaldb.sql
# create a user and grant privileges
mysql> create user 'apollo'@'%' identified by 'xxxxxxApollo@999';
mysql> grant all privileges on ApolloConfigDB.* to 'apollo'@'%';
mysql> grant all privileges on ApolloPortalDB.* to 'apollo'@'%';
Credentials: apollo/xxxxxxApollo@999
13.2 Install Helm
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
sudo chmod 700 get_helm.sh
sudo ./get_helm.sh
# output
Downloading https://get.helm.sh/helm-v3.13.1-linux-amd64.tar.gz
Verifying checksum... Done.
Preparing to install helm into /usr/local/bin
helm installed into /usr/local/bin/helm
Once installed, run helm to see its usage.
13.3 Add the repository
helm repo add apollo https://charts.apolloconfig.com
# after adding, it can be searched
helm search repo apollo
13.4 Deploy
Create the namespace:
kubectl create namespace apollo
Label the nodes to deploy onto:
kubectl label nodes k8s-node-01 config="apollo"
kubectl label nodes k8s-node-02 config="apollo"
kubectl label nodes k8s-node-03 config="apollo"
13.4.1 Deploy the services
Custom values file service-values.yaml:
configdb:
host: 192.168.2.148
port: 3306
dbName: ApolloConfigDB
userName: apollo
password: xxxxxxApollo@999
service:
enabled: true
configService:
replicaCount: 3
nodeSelector:
config: apollo
adminService:
replicaCount: 3
nodeSelector:
config: apollo
helm install apollo-service-dev -f service-values.yaml -n apollo apollo/apollo-service
# output
NAME: apollo-service-dev
LAST DEPLOYED: Fri Dec 8 05:54:34 2023
NAMESPACE: apollo
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Meta service url for current release:
echo http://apollo-service-dev-apollo-configservice.apollo:8080
For local test use:
export POD_NAME=$(kubectl get pods --namespace apollo -l "app=apollo-service-dev-apollo-configservice" -o jsonpath="{.items[0].metadata.name}")
echo http://127.0.0.1:8080
kubectl --namespace apollo port-forward $POD_NAME 8080:8080
Urls registered to meta service:
Config service: http://apollo-service-dev-apollo-configservice.apollo:8080
Admin service: http://apollo-service-dev-apollo-adminservice.apollo:8090
Note the URLs above; they are needed when deploying the portal.
13.4.2 Deploy the portal
Custom values file portal-values.yaml:
portaldb:
host: 192.168.2.147
userName: apollo
password: xxxxxxApollo@999
connectionStringProperties: characterEncoding=utf8&useSSL=false
service:
enabled: true
config:
envs: dev
metaServers:
dev: http://apollo-service-dev-apollo-configservice.apollo:8080
replicaCount: 3
containerPort: 8070
nodeSelector:
config: apollo
ingress:
enabled: true
annotations:
kubernetes.io/ingress.class: nginx
hosts:
- host: apollo.xxxxxx.com
paths:
- /
helm install apollo-portal -f portal-values.yaml -n apollo apollo/apollo-portal
# output
NAME: apollo-portal
LAST DEPLOYED: Fri Dec 8 06:23:47 2023
NAMESPACE: apollo
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Portal url for current release:
export POD_NAME=$(kubectl get pods --namespace apollo -l "app=apollo-portal" -o jsonpath="{.items[0].metadata.name}")
echo "Visit http://127.0.0.1:8070 to use your application"
kubectl --namespace apollo port-forward $POD_NAME 8070:8070
Ingress:
http://apollo.xxxxxx.com/
After configuring local DNS resolution, open https://apollo.xxxxxx.com. The default credentials are apollo/admin; the password can be changed under "Administrator Tools - User Management".
基于K8s的微服务架构搭建完整方案
本次环境构建所有服务器操作系统均为 Ubuntu 22.04 LTS 最小安装版本
1 Nginx + KeepAlived 负载均衡和高可用环境
1.1 服务器列表
| 主机名称 | IP地址 | 配置 | 角色 | VIP |
|---|---|---|---|---|
| nginx-01 | 192.168.2.143 | 2c/8g/200g | master | 192.168.2.170 |
| nginx-02 | 192.168.2.144 | 2c/8g/200g | backup | 192.168.2.170 |
1.2 安装Nginx
sudo apt-get install nginx -y
验证:
sudo nginx -t
# 输出
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
1.3 修改默认配置
将默认的80端口改为81端口,80端口留给负载均衡
sudo vim /etc/nginx/sites-enabled/default
server {
listen 81 default_server;
listen [::]:81 default_server;
....
}
重启nginx服务
sudo systemctl restart nginx
修改默认页面方便区分是哪个节点
sudo vim /var/www/html/index.nginx-debian.html
<p><em>Thank you for using nginx.</em></p>
<p><em>This is from Nginx-01</em></p> /**新增这行 */
访问地址验证
curl localhost:81
1.4 配置负载均衡
1.4.1 配置k8s master负载均衡
sudo vim /etc/nginx/nginx.conf
输入以下信息:
...
events {
worker_connections 1024;
# multi_accept on;
}
stream {
log_format main '$remote_addr $upstream_addr - [$time_local] $status $upstream_bytes_sent';
access_log /var/log/nginx/k8s-access.log main;
upstream k8s-apiserver {
server 192.168.2.143:6443; # k8s-master-01 APISERVER IP:PORT
server 192.168.2.144:6443; # k8s-master-02 APISERVER IP:PORT
}
server {
listen 6443;
proxy_pass k8s-apiserver;
}
}
...
重启nginx服务:
sudo systemctl restart nginx
1.4.2 配置应用负载均衡
cd /etc/nginx/conf.d/
sudo vim nginx.conf
输入以下信息:
# 应用服务器地址列表配置
upstream balance_server {
# 服务器的访问地址,负载均衡算法使用权重轮询,也可以采用其他算法。
server 192.168.2.141:81 weight=1;
server 192.168.2.142:81 weight=2;
}
# 负载均衡服务
server {
# 负载均衡的监听端口
listen 80 default_server;
listen [::]:80 default_server;
# 负载均衡服务器的服务名称,没有时填写 _
server_name _;
location / {
# 代理转发应用服务
proxy_pass http://balance_server;
}
}
重启nginx服务:
sudo systemctl restart nginx
测试负载均衡:
curl localhost
连续响应三次之后,可以看到有两次是nginx-02,一次是nginx-01,说明负载均衡已生效。
1.5 安装Keepalived
1.5.1 部署
sudo apt install keepalived -y
设置随系统自动启动:
sudo vim /etc/rc.local
查看主机网卡名称:
ip a
# 输出
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:b0:e3:d7 brd ff:ff:ff:ff:ff:ff
altname enp2s1
inet 192.168.2.141/24 brd 192.168.2.255 scope global ens33
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:feb0:e3d7/64 scope link
valid_lft forever preferred_lft forever
可以看到网卡名称为:ens33
编辑keepalived配置文件,设置VIP地址:192.168.2.170
sudo vim /etc/keepalived/keepalived.conf
再master节点中添加以下内容:
global_defs {
router_id 192.168.2.141
}
vrrp_script chk_nginx {
script "/etc/keepalived/nginx_chk.sh"
interval 2
}
vrrp_instance VI_1{
state MASTER
interface ens33 # 这个就是上面查看的网卡名称
virtual_router_id 100
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1369
}
virtual_ipaddress {
192.168.2.170
}
track_script {
chk_nginx
}
}
在backup节点中添加以下内容:
global_defs {
router_id 192.168.2.142
}
vrrp_script chk_nginx {
script "/etc/keepalived/nginx_chk.sh"
interval 2
}
vrrp_instance VI_1{
state BACKUP
interface ens33
virtual_router_id 100
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1369
}
virtual_ipaddress {
192.168.2.170
}
track_script {
chk_nginx
}
}
1.5.2 创建nginx检查脚本
sudo vim /etc/keepalived/nginx_chk.sh
输入以下内容:
#!/bin/bash
#检查是否有nginx相关的进程
A=`ps -C nginx --no-header |wc -l`
#如果没有
if [ $A -eq 0 ];then
# 重启nginx,延迟2秒
service nginx restart
sleep 2
# 重新检查是否有nginx相关的进程
if [ `ps -C nginx --no-header |wc -l` -eq 0 ];then
# 仍然没有nginx相关的进程,杀死当前keepalived,切换到备用机
killall keepalived
fi
fi
为脚本添加执行权限:
sudo chmod +x /etc/keepalived/nginx_chk.sh
检查脚本,不报错即可:
cd /etc/keepalived
./nginx_chk.sh
1.5.3 Verify the keepalived service
Restart the keepalived service:
sudo systemctl restart keepalived
On the master node, check whether the VIP has been assigned:
ip a
The VIP now appears in the address list; note that the backup node does not hold the VIP at this point.
Request the VIP address to verify that nginx answers:
curl 192.168.2.170
If you stop the keepalived service on the master, the VIP disappears from the master's address list and appears on the backup:
sudo systemctl stop keepalived
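To watch the failover as it happens, a simple probe loop against the VIP can run in a second terminal while keepalived is stopped on the master; a minimal sketch:
# request the VIP once per second; the responding node should switch
# from nginx-01 to nginx-02 within a couple of seconds of the stop
while true; do
curl -s --max-time 1 192.168.2.170 | grep -o 'This is from Nginx-0[12]'
sleep 1
done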
The Nginx + Keepalived high-availability setup is complete!
2 Building a K8s Cluster with kubeadm
2.1 Server Environment
2.1.1 Server preparation
kubeadm requires at least 2 CPU cores per server to install a k8s cluster.
| Hostname | IP address | Spec | Role |
|---|---|---|---|
| k8s-master-01 | 192.168.2.143 | 4c/16g/200g | master, dashboard |
| k8s-master-02 | 192.168.2.144 | 4c/16g/200g | master |
| k8s-node-01 | 192.168.2.145 | 4c/16g/200g | node |
| k8s-node-02 | 192.168.2.146 | 4c/16g/200g | node |
| k8s-node-03 | 192.168.2.151 | 4c/16g/200g | node |
| k8s-node-monitor | 192.168.2.149 | 4c/8g/200g | node |
| vip | 192.168.2.170 | - | vip |
Because this is a multi-master high-availability setup, the HA VIP 192.168.2.170 is required.
Set the hostname on each server:
sudo hostnamectl set-hostname k8s-master-01
Add the host mappings on every server:
sudo vim /etc/hosts
# enter
192.168.2.143 k8s-master-01
192.168.2.144 k8s-master-02
192.168.2.145 k8s-node-01
192.168.2.146 k8s-node-02
192.168.2.151 k8s-node-03
192.168.2.149 k8s-node-monitor
2.1.2 Disable swap (run on all servers)
sudo swapoff -a # turn swap off immediately
sudo vim /etc/fstab # comment out the swap line to persist the change
sudo swapon --show # verify swap is off; no output means it is disabled
2.1.3 Kernel configuration (run on all servers)
echo "net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1" | sudo tee -a /etc/sysctl.d/k8s.conf
sudo modprobe br_netfilter
sudo sysctl --system
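To confirm the settings actually took effect, the values can be read back; each should print 1:
# br_netfilter must be loaded for the bridge settings to exist
lsmod | grep br_netfilter
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward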
2.2 Install Docker (run on all servers)
sudo apt-get install docker.io -y
Configure a domestic mirror accelerator; substitute your own Aliyun accelerator address:
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<-'EOF'
{
"registry-mirrors": ["https://ntcaogap.mirror.aliyuncs.com"]
}
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
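A quick way to confirm the mirror configuration was picked up after the restart:
# the configured mirror should be listed under "Registry Mirrors"
sudo docker info | grep -A 1 'Registry Mirrors'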
2.3 Install kubeadm, kubectl, kubelet (run on all servers)
2.3.1 Create the install script
sudo mkdir k8s
cd k8s
sudo vim kubeadm-install.sh
Paste in the install script below:
#!/bin/bash
# do not delete the shebang line above
apt update && apt install -y ca-certificates curl software-properties-common apt-transport-https
curl -fsSL https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
add-apt-repository "deb [arch=amd64] https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main"
apt-get update
apt-cache madison kubelet kubectl kubeadm | grep '1.23.1-00'
apt install -y kubelet=1.23.1-00 kubectl=1.23.1-00 kubeadm=1.23.1-00
apt-mark hold kubelet kubeadm kubectl
sudo sh kubeadm-install.sh # run the install script
Installation complete.
2.3.2 Disable swap for kubelet as well
sudo vim /etc/default/kubelet
KUBELET_EXTRA_ARGS="--fail-swap-on=false" # add this line
sudo systemctl daemon-reload && sudo systemctl restart kubelet # reload the configuration and restart kubelet
2.3.3 Change the cgroup driver
sudo vim /etc/docker/daemon.json
{
"exec-opts": [
"native.cgroupdriver=systemd"
],
"registry-mirrors": ["https://xxxx.mirror.aliyuncs.com"]
}
Add the "exec-opts" entry and keep your own Aliyun mirror accelerator address; JSON does not allow comments, so do not paste any into the file.
sudo systemctl restart docker
sudo systemctl restart kubelet
2.4 Create the Cluster
2.4.1 Create the master node (run on k8s-master-01)
sudo kubeadm init \
--kubernetes-version=v1.23.1 \
--image-repository=registry.aliyuncs.com/google_containers \
--apiserver-advertise-address=192.168.2.143 \
--control-plane-endpoint=192.168.2.143:6443 \
--service-cidr=10.96.0.0/16 \
--pod-network-cidr=10.24.0.0/16 \
--token-ttl=0 \
--apiserver-cert-extra-sans="192.168.2.144,192.168.2.141,192.168.2.142,192.168.2.170,kubernetes.xxxxxx.com" \
--ignore-preflight-errors=Swap
- --image-repository: the default Kubernetes registry k8s.gcr.io is unreachable from mainland China, so the Aliyun mirror is specified instead
- --apiserver-advertise-address: the address the master advertises
- --control-plane-endpoint: required for a multi-master HA setup; kubeadm cannot convert a single control-plane cluster created without --control-plane-endpoint into an HA cluster. Set it to the local IP for now; it will later point at the HA VIP
- --service-cidr: the Service IP range, here 10.96.0.0~10.96.255.255
- --pod-network-cidr: the Pod IP range, here 10.24.0.0~10.24.255.255
- --token-ttl: token lifetime; the default is 24h0m0s, and 0 means the token never expires
- --apiserver-cert-extra-sans: extra SANs for the certificate; include every master IP, the HA VIP, and any IPs or domains you may use later, so the apiserver certificate does not have to be regenerated when they are added
- --ignore-preflight-errors: ignore preflight errors; multiple errors need multiple flags rather than string concatenation, e.g. to ignore both [ERROR NumCPU] and [ERROR Swap] add --ignore-preflight-errors=NumCPU and --ignore-preflight-errors=Swap
On success, output like the following is printed:
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.2.143:6443 --token yhqzst.y1aupc1ebpzox9sw \
--discovery-token-ca-cert-hash sha256:5b7c5fdf7823d70861fde94781c9d6aa3a402067e8885877733933f3d4f2dc17
Following the hints above, run these commands on k8s-master-01:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
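With the kubeconfig in place, the control plane can be checked right away; a minimal sanity check:
# both should answer without certificate or connection errors
kubectl cluster-info
kubectl get pods -n kube-system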
2.4.2 Join the other master node
First run the following on k8s-master-01:
sudo kubeadm init phase upload-certs --upload-certs
This prints a certificate key:
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
1f68ef066fc0133ca1a127fb6e9d91ef13baedba7940bdb6627507b1d0468648
Then, on k8s-master-02, combine this key with the join command produced during initialization:
sudo kubeadm join 192.168.2.143:6443 --token yhqzst.y1aupc1ebpzox9sw \
--discovery-token-ca-cert-hash sha256:5b7c5fdf7823d70861fde94781c9d6aa3a402067e8885877733933f3d4f2dc17 --control-plane --certificate-key 1f68ef066fc0133ca1a127fb6e9d91ef13baedba7940bdb6627507b1d0468648
The --control-plane --certificate-key arguments are what make the node join as a master; a plain join adds it as a worker node.
2.4.3 Join the worker nodes
Run the following on every worker node:
sudo kubeadm join 192.168.2.143:6443 --token yhqzst.y1aupc1ebpzox9sw \
--discovery-token-ca-cert-hash sha256:5b7c5fdf7823d70861fde94781c9d6aa3a402067e8885877733933f3d4f2dc17
In a multi-master cluster, replace the master IP above with the load balancer's VIP address.
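If the original join command has been lost or the token expired, a fresh worker join command can be generated on a master at any time (a control-plane join additionally needs the --control-plane --certificate-key arguments from section 2.4.2):
# prints a complete "kubeadm join ..." command with a new token
sudo kubeadm token create --print-join-command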
2.4.4 List the cluster nodes
On k8s-master-01, run:
kubectl get nodes -o wide
You can see the 2 master nodes and 3 worker nodes deployed so far; the node status is still NotReady at this point.
2.5 Install a Network Plugin
Pick one of the two network plugins below; it only needs to be installed from k8s-master-01.
2.5.1 Install Calico
Download the deployment yaml; if the URL is unreachable, download it first from an environment with access and copy it over:
curl https://docs.projectcalico.org/manifests/calico.yaml -O
Edit the pod network in the yaml:
sudo vim calico.yaml
...
- name: CALICO_IPV4POOL_CIDR
value: "10.24.0.0/16"
...
The CALICO_IPV4POOL_CIDR parameter is commented out by default and then falls back to 192.168.0.0/16, which does not match the --pod-network-cidr we set when initializing the cluster, so it must be changed.
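The edit can also be scripted instead of done by hand; a minimal sketch that assumes the stock comment layout in calico.yaml, so verify the result with grep afterwards:
# uncomment CALICO_IPV4POOL_CIDR and set our pod CIDR
sed -i 's|# - name: CALICO_IPV4POOL_CIDR|- name: CALICO_IPV4POOL_CIDR|' calico.yaml
sed -i 's|#   value: "192.168.0.0/16"|  value: "10.24.0.0/16"|' calico.yaml
grep -A 1 'CALICO_IPV4POOL_CIDR' calico.yaml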
Deploy calico:
kubectl apply -f calico.yaml
2.5.2 Install Flannel
Download the deployment yaml; if the URL is unreachable, download it first from an environment with access and copy it over:
curl https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml -O
Edit the pod network in the yaml:
...
net-conf.json: |
{
"Network": "10.24.0.0/16",
"Backend": {
"Type": "vxlan"
}
}
...
Flannel's default pod network is 10.244.0.0/16; change it to match the --pod-network-cidr set earlier.
Deploy flannel:
kubectl apply -f kube-flannel.yml
2.5.3 Check the cluster node status
Once the plugin finishes rolling out, check the cluster again:
kubectl get nodes -o wide
All nodes showing Ready means the cluster is ready.
3 Kubernetes Dashboard
Dropped (the article ran over the publishing length limit). Kuboard is recommended instead: its interface is clearer and friendlier.
4 Ingress-Nginx
4.1 Download the deploy yaml
wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.6.4/deploy/static/provider/cloud/deploy.yaml
Replace the three image references in the yaml with domestically reachable mirrors:
registry.k8s.io/ingress-nginx/controller:v1.4.0@sha256:34ee929b111ffc7aa426ffd409af44da48e5a0eea1eb2207994d9e0c0882d143 → anjia0532/google-containers.ingress-nginx.controller:v1.6.4
registry.k8s.io/ingress-nginx/kube-webhook-certgen:v20220916-gd32f8c343@sha256:39c5b2e3310dc4264d638ad28d9d1d96c4cbb2b2dcfb52368fe4e3c63f61e10f → anjia0532/google-containers.ingress-nginx.kube-webhook-certgen:v20220916-gd32f8c343
Change the deployment type to DaemonSet:
...
apiVersion: apps/v1
# kind: Deployment
kind: DaemonSet
metadata:
labels:
app.kubernetes.io/component: controller
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/part-of: ingress-nginx
app.kubernetes.io/version: 1.6.4
name: ingress-nginx-controller
namespace: ingress-nginx
...
# dnsPolicy: ClusterFirst
nodeSelector:
# kubernetes.io/os: linux
app: ingress
hostNetwork: true # add this line
serviceAccountName: ingress-nginx
...
The added hostNetwork setting makes the controller bind directly to the host IP of each node it runs on, which fixes its address.
4.2 Label the worker nodes
Since the node selector above was changed to match nodes labeled app=ingress, every worker node needs that label. On a master node, run:
kubectl label nodes k8s-node-01 app="ingress"
kubectl label nodes k8s-node-02 app="ingress"
kubectl label nodes k8s-node-03 app="ingress"
4.3 Deploy ingress-nginx
kubectl apply -f deploy.yaml
Note: if the logs show the apiserver is unreachable, e.g. dial tcp 10.96.0.1:443: timeout, adjust kube-proxy:
- Edit the configuration: kubectl edit cm kube-proxy -n kube-system
...
iptables:
masqueradeAll: true # default is false
mode: "ipvs" # default is ""
...
- Delete all kube-proxy pods: kubectl get pod -n kube-system |grep kube-proxy |awk '{system("kubectl delete pod "$1" -n kube-system")}'
They are recreated automatically after deletion.
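Once the DaemonSet is up, it is worth confirming that each labeled node runs a controller pod bound to the host network; a quick check:
# one controller pod per labeled node, with the pod IP equal to the node IP
kubectl -n ingress-nginx get pods -o wide
# on each ingress node, ports 80/443 should be listening directly on the host
sudo ss -lntp | grep -E ':(80|443)\s'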
4.4 Configure Ingress for the Dashboard
Since the Dashboard itself was dropped, this is not needed either; dropped as well.
4.5 Configure ingress load balancing
4.5.1 Generate a certificate
sudo openssl req -x509 -newkey rsa:4096 -sha256 -nodes -keyout xxxxxx.com.key -out xxxxxx.com.pem -days 3650
4.5.2 Create the load-balancing configuration
Edit /etc/nginx/conf.d/nginx.conf:
sudo vim /etc/nginx/conf.d/nginx.conf
Paste in the following configuration:
upstream balance_ingress_server {
# backend addresses; weighted round-robin is used by default, other algorithms work too
# these are the IPs of the nodes running the ingress controller
server 192.168.2.145:443;
server 192.168.2.146:443;
server 192.168.2.151:443;
}
# load-balancing server
# port 80 redirects straight to 443
server {
# listening port
listen 80;
listen [::]:80;
# server name; use _ when there is none
server_name xxxxxx.com;
rewrite ^/(.*) https://$server_name$request_uri? permanent;
# return 301 https://$server_name$request_uri;
}
server {
# listening port
listen 443 ssl http2;
listen [::]:443 ssl http2;
# server name; use _ when there is none
server_name xxxxxx.com;
ssl_certificate /etc/nginx/cert/xxxxxx.com.pem;
ssl_certificate_key /etc/nginx/cert/xxxxxx.com.key;
ssl_session_timeout 10m;
ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
ssl_ciphers ECDHE-RSA-AES128-GCM-SHA256:ECDHE:ECDH:AES:HIGH:!NULL:!aNULL:!MD5:!ADH:!RC4;
ssl_prefer_server_ciphers on;
access_log /var/log/nginx/k8s-ingress-access.log;
error_log /var/log/nginx/k8s-ingress-error.log;
location / {
# proxy to the ingress servers
proxy_pass https://balance_ingress_server;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
}
Note: some of the services are only served over https, so the load balancer must point at port 443 of the targets; otherwise you end up with ERR_TOO_MANY_REDIRECTS.
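The 80-to-443 redirect chain can be tested from any machine without touching DNS by pinning the hostname to the VIP; a minimal sketch, with xxxxxx.com standing in for the real domain:
# -k: the certificate is self-signed; -L: follow the 301 from port 80
curl -k -L -s -o /dev/null -w 'final status: %{http_code}\n' \
--resolve xxxxxx.com:80:192.168.2.170 --resolve xxxxxx.com:443:192.168.2.170 \
http://xxxxxx.com/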
5 Private Image Registry
5.1 Label the target node
kubectl label nodes k8s-node-01 registry="yes"
5.2 Prepare the deployment yaml
Create the Namespace, registry-namespace.yaml:
apiVersion: v1
kind: Namespace
metadata:
name: docker-registry
Create the StorageClass, registry-sc.yaml:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
Create the PersistentVolume, registry-pv.yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
name: docker-registry-pv
labels:
pv: docker-registry-pv
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteMany
persistentVolumeReclaimPolicy: Retain
storageClassName: local-storage
local:
path: /data/docker # remember to create this directory on the node
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/hostname
operator: In
values:
- k8s-node-01 # the node labeled earlier
Create the PersistentVolumeClaim, registry-pvc.yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: docker-registry-pvc
namespace: docker-registry
spec:
resources:
requests:
storage: 10Gi
accessModes:
- ReadWriteMany
storageClassName: local-storage
selector:
matchLabels:
pv: docker-registry-pv
Create the Deployment, registry-deploy.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: docker-registry
name: docker-registry
namespace: docker-registry
spec:
replicas: 1
revisionHistoryLimit: 5
selector:
matchLabels:
registry: "yes" # 指定打标签的节点
template:
metadata:
labels:
registry: "yes"
spec:
securityContext:
runAsUser: 0
containers:
- name: docker-registry
image: registry:latest
imagePullPolicy: IfNotPresent
ports:
- containerPort: 5000
name: web
protocol: TCP
resources:
requests:
memory: 200Mi
cpu: "0.1"
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/registry/
name: docker-registry-data
volumes:
- name: docker-registry-data
persistentVolumeClaim:
claimName: docker-registry-pvc
Create the Service, registry-svc.yaml:
apiVersion: v1
kind: Service
metadata:
name: docker-registry-service
namespace: docker-registry
spec:
ports:
- name: port-name
port: 5000
protocol: TCP
targetPort: 5000
nodePort: 30500
selector:
registry: "yes" # 指定打标签的节点
type: NodePort
5.3 Apply the yaml files
kubectl apply -f registry-namespace.yaml
kubectl apply -f registry-sc.yaml
kubectl apply -f registry-pv.yaml
kubectl apply -f registry-pvc.yaml
kubectl apply -f registry-deploy.yaml
kubectl apply -f registry-svc.yaml
5.4 Push a Local Image to the Registry
5.4.1 Add the registry address
On Windows, install Docker Desktop and edit Docker Engine in Settings:
...
"insecure-registries": [
"192.168.2.145:30500"
],
"registry-mirrors": [
"http://192.168.2.145:30500"
]
On Linux/macOS, edit /etc/docker/daemon.json and paste in the same snippet.
5.4.2 Tag the local image
docker tag miniapi:dev 192.168.2.145:30500/miniapi:v1.0.0
5.4.3 Push and inspect
# push
docker push 192.168.2.145:30500/miniapi:v1.0.0
# list all repositories in the registry
curl http://192.168.2.145:30500/v2/_catalog
# list the tags of a given image
curl http://192.168.2.145:30500/v2/miniapi/tags/list
5.4.4 Delete an image by tag
The registry HTTP API deletes manifests by digest, not by tag, and deletion must be enabled on the registry (environment variable REGISTRY_STORAGE_DELETE_ENABLED=true); look up the digest for the tag first and then delete it, as sketched below.
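A minimal sketch, reusing the miniapi:v1.0.0 example from above:
# 1. look up the manifest digest for the tag (the Accept header is required)
DIGEST=$(curl -sI -H 'Accept: application/vnd.docker.distribution.manifest.v2+json' \
http://192.168.2.145:30500/v2/miniapi/manifests/v1.0.0 \
| grep -i 'docker-content-digest' | awk '{print $2}' | tr -d '\r')
# 2. delete the manifest by digest
curl -X DELETE "http://192.168.2.145:30500/v2/miniapi/manifests/${DIGEST}"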
6 Service Registration and Discovery
6.1 Label the target nodes
kubectl label nodes k8s-node-01 serviceDiscovery="consul"
kubectl label nodes k8s-node-02 serviceDiscovery="consul"
kubectl label nodes k8s-node-03 serviceDiscovery="consul"
6.2 Deploying with kubectl
6.2.1 Deploy the consul service
Create consul-svc.yaml:
apiVersion: v1
kind: Namespace
metadata:
name: consul
---
apiVersion: v1
kind: Service
metadata:
name: consul-svc
namespace: consul
labels:
name: consul
spec:
type: ClusterIP
ports:
- name: http
port: 8500
targetPort: 8500
- name: https
port: 8443
targetPort: 8443
- name: rpc
port: 8400
targetPort: 8400
- name: serflan-tcp
protocol: "TCP"
port: 8301
targetPort: 8301
- name: serflan-udp
protocol: "UDP"
port: 8301
targetPort: 8301
- name: serfwan-tcp
protocol: "TCP"
port: 8302
targetPort: 8302
- name: serfwan-udp
protocol: "UDP"
port: 8302
targetPort: 8302
- name: server
port: 8300
targetPort: 8300
- name: consuldns
port: 8600
targetPort: 8600
selector:
serviceDiscovery: consul
kubectl apply -f consul-svc.yaml
6.2.2 Deploy the consul servers
Create consul-statefulset.yaml:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: consul
namespace: consul
spec:
serviceName: consul-svc
replicas: 3
selector:
matchLabels:
serviceDiscovery: consul
template:
metadata:
labels:
serviceDiscovery: consul
spec:
terminationGracePeriodSeconds: 10
containers:
- name: consul
image: hashicorp/consul:latest
args:
- "agent"
- "-server"
- "-bootstrap-expect=3"
- "-ui"
- "-data-dir=/consul/data"
- "-bind=0.0.0.0"
- "-client=0.0.0.0"
- "-advertise=$(PODIP)"
- "-retry-join=consul-0.consul-svc.$(NAMESPACE).svc.cluster.local"
- "-retry-join=consul-1.consul-svc.$(NAMESPACE).svc.cluster.local"
- "-retry-join=consul-2.consul-svc.$(NAMESPACE).svc.cluster.local"
- "-domain=cluster.local"
- "-disable-host-node-id"
volumeMounts:
- name: data
mountPath: /consul/data
env:
- name: PODIP
valueFrom:
fieldRef:
fieldPath: status.podIP
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
ports:
- containerPort: 8500
name: ui-port
- containerPort: 8400
name: alt-port
- containerPort: 53
name: udp-port
- containerPort: 8443
name: https-port
- containerPort: 8080
name: http-port
- containerPort: 8301
name: serflan
- containerPort: 8302
name: serfwan
- containerPort: 8600
name: consuldns
- containerPort: 8300
name: server
volumes:
- name: data
hostPath:
path: /data/consul
kubectl apply -f consul-statefulset.yaml
A StatefulSet is used here so that pod names stay fixed: with a fixed name and replica count, pods are named $(name)-(ordinal), so their DNS names are known in advance and each agent can join the cluster automatically, as in the parameter -retry-join=consul-0.consul-svc.$(NAMESPACE).svc.cluster.local.
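Once the three pods are running, membership and raft state can be verified from inside any pod; a quick check:
# all three servers should be listed as alive, with one of them the leader
kubectl exec -n consul consul-0 -- consul members
kubectl exec -n consul consul-0 -- consul operator raft list-peers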
6.3 Deploying with Helm (recommended)
6.3.1 Add the Helm repository
helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update
6.3.2 Write a custom deployment configuration
# view the default values
helm inspect values hashicorp/consul
# create a config file, copy in the defaults, then edit
sudo vim values.yaml
Change the following:
...
server:
replicas: 3 # run three replicas
...
6.3.3 Create the PVs
The consul Helm chart stores its data in PVs by default, so PVs must be created first or the PVCs stay Pending forever. Alternatively, deploy consul first, note the generated PVC names, then create PVs whose name and storageClassName match those PVCs; otherwise they cannot bind.
Edit consul-pv.yaml:
apiVersion: v1
kind: Namespace
metadata:
name: consul
---
kind: PersistentVolume
apiVersion: v1
metadata:
name: data-consul-consul-consul-server-0
namespace: consul
labels:
type: local
spec:
storageClassName: ""
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/data/consul"
---
kind: PersistentVolume
apiVersion: v1
metadata:
name: data-consul-consul-consul-server-1
namespace: consul
labels:
type: local
spec:
storageClassName: ""
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/data/consul"
---
kind: PersistentVolume
apiVersion: v1
metadata:
name: data-consul-consul-consul-server-2
namespace: consul
labels:
type: local
spec:
storageClassName: ""
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/data/consul"
# create the mount directory and open up its permissions (on each node that will host the data)
sudo mkdir -p /data/consul
sudo chmod -R 777 /data/consul
# create the PVs
kubectl apply -f consul-pv.yaml -n consul
6.3.4 Deploy
helm install consul hashicorp/consul -n consul --values values.yaml
6.4 Deploy the consul Ingress
Create consul-ingress.yaml:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: consul-ingress
namespace: consul
spec:
ingressClassName: "nginx"
rules:
- host: consul.xxxxxx.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: consul-consul-ui # service name created by the Helm deployment; for the yaml deployment use consul-svc so it matches that service
port:
number: 80 # port exposed by the Helm deployment; for the yaml deployment use 8500 so it matches the service port
kubectl apply -f consul-ingress.yaml
After configuring local DNS resolution, the UI is reachable at https://consul.xxxxxx.com.
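Without a real DNS record, a local hosts entry pointing at the load-balancer VIP is enough for testing; a minimal sketch, with xxxxxx.com standing in for the real domain:
# resolve the UI host to the VIP, then check that the ingress answers
echo '192.168.2.170 consul.xxxxxx.com' | sudo tee -a /etc/hosts
curl -k -s -o /dev/null -w '%{http_code}\n' https://consul.xxxxxx.com/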
7 Service Monitoring
7.1 Deploy node-exporter
Create node-exporter.yaml:
kind: DaemonSet
apiVersion: apps/v1
metadata:
labels:
app: node-exporter
name: node-exporter
namespace: monitor
spec:
revisionHistoryLimit: 10
selector:
matchLabels:
app: node-exporter
template:
metadata:
labels:
app: node-exporter
spec:
containers:
- name: node-exporter
image: prom/node-exporter:latest
ports:
- containerPort: 9100
protocol: TCP
name: http
hostNetwork: true # expose the node's physical metrics
hostPID: true # expose the node's physical metrics
# tolerations: # uncomment to also schedule onto master nodes
# - effect: NoSchedule
# operator: Exists
---
kind: Service
apiVersion: v1
metadata:
labels:
app: node-exporter
name: node-exporter-svc
namespace: monitor
spec:
ports:
- name: http
port: 9100
nodePort: 31672
protocol: TCP
type: NodePort
selector:
app: node-exporter
kubectl apply -f node-exporter.yaml
# once deployed, verify by requesting the metrics endpoint
curl http://192.168.2.145:31672/metrics
7.2 Deploy Prometheus
Create prometheus.yaml:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: prometheus
rules:
- apiGroups: [""] # "" indicates the core API group
resources:
- nodes
- nodes/proxy
- services
- endpoints
- pods
verbs:
- get
- watch
- list
- apiGroups:
- extensions
resources:
- ingresses
verbs:
- get
- watch
- list
- nonResourceURLs: ["/metrics"]
verbs:
- get
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus
namespace: monitor
labels:
app: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: prometheus
subjects:
- kind: ServiceAccount
name: prometheus
namespace: monitor
roleRef:
kind: ClusterRole
name: prometheus
apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: monitor
labels:
app: prometheus
data:
prometheus.yml: |-
# my global config
global:
scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
# scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
alertmanagers:
- static_configs:
- targets:
# - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
# - "first_rules.yml"
# - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: 'prometheus'
# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.
static_configs:
- targets: ['localhost:9090']
- job_name: 'grafana'
static_configs:
- targets:
- 'grafana-svc.monitor:3000'
- job_name: 'kubernetes-apiservers'
kubernetes_sd_configs:
- role: endpoints
# Default to scraping over https. If required, just disable this or change to
# `http`.
scheme: https
# This TLS & bearer token file config is used to connect to the actual scrape
# endpoints for cluster components. This is separate to discovery auth
# configuration because discovery & scraping are two separate concerns in
# Prometheus. The discovery auth config is automatic if Prometheus runs inside
# the cluster. Otherwise, more config options have to be provided within the
# <kubernetes_sd_config>.
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
# If your node certificates are self-signed or use a different CA to the
# master CA, then disable certificate verification below. Note that
# certificate verification is an integral part of a secure infrastructure
# so this should only be disabled in a controlled environment. You can
# disable certificate verification by uncommenting the line below.
#
# insecure_skip_verify: true
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
# Keep only the default/kubernetes service endpoints for the https port. This
# will add targets for each API server which Kubernetes adds an endpoint to
# the default/kubernetes service.
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: default;kubernetes;https
# Scrape config for nodes (kubelet).
#
# Rather than connecting directly to the node, the scrape is proxied though the
# Kubernetes apiserver. This means it will work if Prometheus is running out of
# cluster, or can't connect to nodes for some other reason (e.g. because of
# firewalling).
- job_name: 'kubernetes-nodes'
# Default to scraping over https. If required, just disable this or change to
# `http`.
scheme: https
# This TLS & bearer token file config is used to connect to the actual scrape
# endpoints for cluster components. This is separate to discovery auth
# configuration because discovery & scraping are two separate concerns in
# Prometheus. The discovery auth config is automatic if Prometheus runs inside
# the cluster. Otherwise, more config options have to be provided within the
# <kubernetes_sd_config>.
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics
# Scrape config for Kubelet cAdvisor.
#
# This is required for Kubernetes 1.7.3 and later, where cAdvisor metrics
# (those whose names begin with 'container_') have been removed from the
# Kubelet metrics endpoint. This job scrapes the cAdvisor endpoint to
# retrieve those metrics.
#
# In Kubernetes 1.7.0-1.7.2, these metrics are only exposed on the cAdvisor
# HTTP endpoint; use "replacement: /api/v1/nodes/${1}:4194/proxy/metrics"
# in that case (and ensure cAdvisor's HTTP server hasn't been disabled with
# the --cadvisor-port=0 Kubelet flag).
#
# This job is not necessary and should be removed in Kubernetes 1.6 and
# earlier versions, or it will cause the metrics to be scraped twice.
- job_name: 'kubernetes-cadvisor'
# Default to scraping over https. If required, just disable this or change to
# `http`.
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
# Scrape config for service endpoints.
#
# The relabeling allows the actual service scrape endpoint to be configured
# via the following annotations:
#
# * `prometheus.io/scrape`: Only scrape services that have a value of `true`
# * `prometheus.io/scheme`: If the metrics endpoint is secured then you will need
# to set this to `https` & most likely set the `tls_config` of the scrape config.
# * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
# * `prometheus.io/port`: If the metrics are exposed on a different port to the
# service then set this appropriately.
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_name
# Example scrape config for probing services via the Blackbox Exporter.
#
# The relabeling allows the actual service scrape endpoint to be configured
# via the following annotations:
#
# * `prometheus.io/probe`: Only probe services that have a value of `true`
- job_name: 'kubernetes-services'
metrics_path: /probe
params:
module: [http_2xx]
kubernetes_sd_configs:
- role: service
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
action: keep
regex: true
- source_labels: [__address__]
target_label: __param_target
- target_label: __address__
replacement: blackbox-exporter.example.com:9115
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
target_label: kubernetes_name
# Example scrape config for probing ingresses via the Blackbox Exporter.
#
# The relabeling allows the actual ingress scrape endpoint to be configured
# via the following annotations:
#
# * `prometheus.io/probe`: Only probe services that have a value of `true`
- job_name: 'kubernetes-ingresses'
metrics_path: /probe
params:
module: [http_2xx]
kubernetes_sd_configs:
- role: ingress
relabel_configs:
- source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_probe]
action: keep
regex: true
- source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
regex: (.+);(.+);(.+)
replacement: ${1}://${2}${3}
target_label: __param_target
- target_label: __address__
replacement: blackbox-exporter.example.com:9115
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_ingress_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_ingress_name]
target_label: kubernetes_name
# Example scrape config for pods
#
# The relabeling allows the actual pod scrape endpoint to be configured via the
# following annotations:
#
# * `prometheus.io/scrape`: Only scrape pods that have a value of `true`
# * `prometheus.io/path`: If the metrics path is not `/metrics` override this.
# * `prometheus.io/port`: Scrape the pod on the indicated port instead of the
# pod's declared ports (default is a port-free target if none are declared).
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_pod_name]
action: replace
target_label: kubernetes_pod_name
---
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-rules
namespace: monitor
labels:
app: prometheus
data:
cpu-usage.rule: |
groups:
- name: NodeCPUUsage
rules:
- alert: NodeCPUUsage
expr: (100 - (avg by (instance) (irate(node_cpu{name="node-exporter",mode="idle"}[5m])) * 100)) > 75
for: 2m
labels:
severity: "page"
annotations:
summary: "{{$labels.instance}}: High CPU usage detected"
description: "{{$labels.instance}}: CPU usage is above 75% (current value is: {{ $value }})"
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: "prometheus-data-pv"
labels:
name: prometheus-data-pv
release: stable
spec:
capacity:
storage: 5Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Recycle
storageClassName: local-storage
local:
path: /data/prometheus
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: app
operator: In
values:
- monitor
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: prometheus-data-pvc
namespace: monitor
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
storageClassName: local-storage
selector:
matchLabels:
name: prometheus-data-pv
release: stable
---
kind: Deployment
apiVersion: apps/v1
metadata:
labels:
app: prometheus
name: prometheus
namespace: monitor
spec:
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: prometheus
template:
metadata:
labels:
app: prometheus
spec:
serviceAccountName: prometheus
securityContext:
runAsUser: 0
containers:
- name: prometheus
image: bitnami/prometheus:latest
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /prometheus
name: prometheus-data-volume
- mountPath: /etc/prometheus/prometheus.yml
name: prometheus-config-volume
subPath: prometheus.yml
- mountPath: /etc/prometheus/rules
name: prometheus-rules-volume
ports:
- containerPort: 9090
protocol: TCP
volumes:
- name: prometheus-data-volume
persistentVolumeClaim:
claimName: prometheus-data-pvc
- name: prometheus-config-volume
configMap:
name: prometheus-config
- name: prometheus-rules-volume
configMap:
name: prometheus-rules
nodeSelector:
app: monitor
---
kind: Service
apiVersion: v1
metadata:
annotations:
prometheus.io/scrape: 'true'
labels:
app: prometheus
name: prometheus-svc
namespace: monitor
spec:
ports:
- port: 9090
targetPort: 9090
selector:
app: prometheus
type: NodePort
kubectl apply -f prometheus.yaml
# after deployment, verify via the targets page
curl http://192.168.2.149:32101/targets
7.3 Deploy Grafana
Create grafana.yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
name: "grafana-data-pv"
labels:
name: grafana-data-pv
release: stable
spec:
capacity:
storage: 5Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Recycle
storageClassName: local-storage
local:
path: /data/grafana
nodeAffinity:
required:
nodeSelectorTerms:
- matchExpressions:
- key: app
operator: In
values:
- monitor
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: grafana-data-pvc
namespace: monitor
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
storageClassName: local-storage
selector:
matchLabels:
name: grafana-data-pv
release: stable
---
kind: Deployment
apiVersion: apps/v1
metadata:
labels:
app: grafana
name: grafana
namespace: monitor
spec:
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: grafana
template:
metadata:
labels:
app: grafana
spec:
securityContext:
runAsUser: 0
containers:
- name: grafana
image: grafana/grafana:latest
imagePullPolicy: IfNotPresent
env:
- name: GF_AUTH_BASIC_ENABLED
value: "true"
- name: GF_AUTH_ANONYMOUS_ENABLED
value: "false"
readinessProbe:
httpGet:
path: /login
port: 3000
volumeMounts:
- mountPath: /var/lib/grafana
name: grafana-data-volume
ports:
- containerPort: 3000
protocol: TCP
volumes:
- name: grafana-data-volume
persistentVolumeClaim:
claimName: grafana-data-pvc
nodeSelector:
app: monitor
---
kind: Service
apiVersion: v1
metadata:
labels:
app: grafana
name: grafana-svc
namespace: monitor
spec:
ports:
- port: 3000
targetPort: 3000
selector:
app: grafana
type: NodePort
kubectl apply -f grafana.yaml
Open http://192.168.2.149:32129/ in a browser (the NodePort assigned in this environment); the default username/password is admin/admin, and the password must be changed on first login.
8 Distributed Tracing
Create zipkin-server.yaml:
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: zipkin
namespace: monitor
labels:
app: zipkin
spec:
replicas: 1
selector:
matchLabels:
app: zipkin
template:
metadata:
labels:
app: zipkin
spec:
containers:
- name: zipkin
image: openzipkin/zipkin:latest
imagePullPolicy: IfNotPresent
env:
- name: JAVA_OPTS
value: "
-Xms512m -Xmx512m
-Dlogging.level.zipkin=DEBUG
-Dlogging.level.zipkin2=DEBUG
-Duser.timezone=Asia/Shanghai
"
# - name: STORAGE_TYPE
# value: "elasticsearch" # store the data in ES
# - name: ES_HOSTS
# value: "elasticsearch-svc.monitor:9200" # ES address
# - name: ES_INDEX # name of the zipkin index in ES
# value: "zipkin"
# - name: ES_INDEX_REPLICAS # ES index replicas
# value: "1"
# - name: ES_INDEX_SHARDS # ES shard count
# value: "3"
resources:
limits:
cpu: 1000m
memory: 512Mi
requests:
cpu: 500m
memory: 256Mi
nodeSelector:
app: monitor
---
apiVersion: v1
kind: Service
metadata:
name: zipkin-svc
namespace: monitor
labels:
app: zipkin
spec:
type: NodePort
ports:
- port: 9411
targetPort: 9411
nodePort: 30190
selector:
app: zipkin
kubectl apply -f zipkin-server.yaml
Once deployed, verify at: http://192.168.2.149:30190/zipkin/
9 ELK
The stack is deployed on the k8s-node-monitor node.
9.1 Install Elasticsearch
Create elasticsearch.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
name: elasticsearch-config
namespace: monitor
data:
elasticsearch.yml: |-
cluster.name: ${CLUSTER_NAME}
node.name: ${NODE_NAME}
network.host: 0.0.0.0
xpack.security.enabled: false
xpack.monitoring.collection.enabled: true
xpack.security.http.ssl.enabled: false
# xpack.security.http.ssl.keystore.path: certs/certificate.pem
# xpack.security.http.ssl.keystore.password: xxxxxx@888
xpack.license.self_generated.type: basic
xpack.security.transport.ssl.enabled: false
# xpack.security.transport.ssl.verification_mode: certificate
# xpack.security.transport.ssl.keystore.path: certs/certificate.pem
# xpack.security.transport.ssl.truststore.path: certs/certificate.pem
# xpack.security.transport.ssl.keystore.password: xxxxxx@888
---
apiVersion: apps/v1
kind: Deployment
metadata:
generation: 1
labels:
app: elasticsearch
name: elasticsearch
namespace: monitor
spec:
minReadySeconds: 10
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: elasticsearch
strategy:
type: Recreate
template:
metadata:
creationTimestamp: null
labels:
app: elasticsearch
spec:
containers:
- env:
- name: TZ
value: Asia/Shanghai
- name: xpack.security.enrollment.enabled
value: 'true'
- name: CLUSTER_NAME
value: elasticsearch
- name: NODE_NAME
value: elasticsearch
- name: ELASTIC_USERNAME
value: elastic
- name: ELASTIC_PASSWORD
value: xxxxxx@999
- name: discovery.type
value: single-node
- name: ES_JAVA_OPTS
value: -Xms512m -Xmx512m
- name: MINIMUM_MASTER_NODES
value: "1"
image: docker.elastic.co/elasticsearch/elasticsearch:8.11.1
imagePullPolicy: IfNotPresent
name: elasticsearch
ports:
- containerPort: 9200
name: db
protocol: TCP
- containerPort: 9300
name: transport
protocol: TCP
resources:
limits:
cpu: "1"
memory: 1Gi
requests:
cpu: "1"
memory: 1Gi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /data
name: es-persistent-storage
- mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
name: elasticsearch-config
subPath: elasticsearch.yml
- mountPath: /usr/share/elasticsearch/config/certs
name: elasticsearch-certs
dnsPolicy: ClusterFirst
imagePullSecrets:
- name: user-1-registrysecret
initContainers:
- command:
- /sbin/sysctl
- -w
- vm.max_map_count=262144
image: alpine:3.6
imagePullPolicy: IfNotPresent
name: elasticsearch-init
resources: {}
securityContext:
privileged: true
procMount: Default
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
volumes:
- hostPath:
path: /opt/monitor/es_data
type: ""
name: es-persistent-storage
- configMap:
defaultMode: 429
name: elasticsearch-config
name: elasticsearch-config
- secret:
secretName: elasticsearch-certs
name: elasticsearch-certs
nodeSelector:
app: monitor
Create elasticsearch-svc.yaml:
apiVersion: v1
kind: Service
metadata:
namespace: monitor
name: elasticsearch-svc
labels:
app: elasticsearch
spec:
type: NodePort
ports:
- port: 9200
targetPort: 9200
nodePort: 30920
name: elasticsearch
selector:
app: elasticsearch
Apply the yaml:
kubectl apply -f elasticsearch.yaml
kubectl apply -f elasticsearch-svc.yaml
9.2 Install Kibana
Create kibana.yaml:
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: kibana
namespace: monitor
labels:
app: kibana
spec:
replicas: 1
selector:
matchLabels:
app: kibana
template:
metadata:
labels:
app: kibana
spec:
affinity:
nodeAffinity: {}
containers:
- name: kibana
image: docker.elastic.co/kibana/kibana:8.11.1
ports:
- containerPort: 5601
protocol: TCP
env:
- name: ELASTICSEARCH_URL
value: http://10.96.81.0:9200 # the ClusterIP and port of the ES service
nodeSelector:
app: monitor
---
apiVersion: v1
kind: Service
metadata:
name: kibana-svc
namespace: monitor
labels:
app: kibana
spec:
ports:
- protocol: TCP
port: 80
targetPort: 5601
selector:
app: kibana
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: kibana-ingress
namespace: monitor
spec:
ingressClassName: "nginx"
rules:
- host: kibana.xxxxxx.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: kibana-svc
port:
number: 80
Apply the yaml:
kubectl apply -f kibana.yaml
Once deployed, check the kibana pod's logs for a URL of the form http://0.0.0.0:5601?code=xxxxxx
Note the 6-digit code at the end. After configuring local DNS resolution, open https://kibana.xxxxxx.com to reach the setup page, choose manual configuration, enter the correct ES address (the service ClusterIP plus port), then enter the 6-digit code to get into Kibana.
PS: if Kibana reports errors like index_not_found after running for a while, ES is not set to create indices automatically; run the following in Kibana's dev tools:
PUT _cluster/settings
{
"persistent": {
"action.auto_create_index": "true"
}
}
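The same setting can also be applied with curl against the NodePort from section 9.1, without going through Kibana:
# enable automatic index creation cluster-wide
curl -X PUT 'http://192.168.2.149:30920/_cluster/settings' \
-H 'Content-Type: application/json' \
-d '{"persistent": {"action.auto_create_index": "true"}}'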
10 Relational Database Primary/Standby
10.1 Install PostgreSQL
10.1.1 Add the repository
sudo sh -c 'echo "deb https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
10.1.2 Install
sudo apt-get -y install postgresql
sudo apt-get install postgresql-contrib
10.1.3 Change the default user's password
sudo -u postgres psql
alter user postgres with password 'xxxxxx@999';
10.1.4 Enable remote access
sudo vim /etc/postgresql/16/main/postgresql.conf
Change listen_addresses from localhost to *:
#------------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#------------------------------------------------------------------------------
# - Connection Settings -
listen_addresses = '*' # what IP address(es) to listen on;
sudo vim /etc/postgresql/16/main/pg_hba.conf
Append as the last line:
host all all 0.0.0.0/0 md5
Save and restart:
sudo service postgresql restart
Enable start on boot:
sudo systemctl enable postgresql.service
10.1.5 Configure primary/standby replication
Server environment
| Hostname | IP address | Database version | Role |
|---|---|---|---|
| db-01 | 192.168.2.147 | PostgreSQL 16.1 | Master |
| db-02 | 192.168.2.148 | PostgreSQL 16.1 | Slave |
10.1.5.1 Configure the Master
Add a replication user:
sudo -u postgres psql
# create a user with replication and login privileges (used below for replication)
create role replica login replication encrypted password 'xxxxxx@888';
Set the parameters:
sudo vim /etc/postgresql/16/main/postgresql.conf
max_connections = 500
hot_standby = on # enable hot standby
wal_level = replica # set the WAL level to replica
max_wal_senders = 10 # number of WAL senders allowed; adjust as needed
archive_mode = on
archive_command = 'test ! -f /data/postgresql/archive/%f && cp %p /data/postgresql/archive/%f'
wal_keep_size = 256
wal_sender_timeout = 60s
sudo mkdir -p /data/postgresql/archive
sudo chown postgres:postgres /data/postgresql/archive # the postgres user must be able to write archives here
Allow the slave to replicate:
sudo vim /etc/postgresql/16/main/pg_hba.conf
Append as the last line (use /32 to allow just the standby host):
host replication replica 192.168.2.148/32 md5
Save and restart:
sudo service postgresql restart
10.1.5.2 Configure the Slave
sudo systemctl stop postgresql
su - postgres
# keep a copy in case of mistakes
cp -r /var/lib/postgresql/16/main /var/lib/postgresql/16/main.bak
# clear the local data
rm -rf /var/lib/postgresql/16/main
Back up the data from the primary:
pg_basebackup -h 192.168.2.147 -U replica -F p -X stream -P -R -D /var/lib/postgresql/16/main
-h – the host of the primary server
-D – the data directory
-U – the connection user
-P – enable progress reporting
-v – enable verbose mode
-R – create recovery configuration: writes a standby.signal file and appends the connection settings to the data directory
Set the parameters:
sudo vim /etc/postgresql/16/main/postgresql.conf
# set the WAL level to replica
wal_level = replica
# number of WAL senders allowed; adjust as needed
max_wal_senders = 10
# when the standby serves most reads, set this higher than on the primary
max_connections = 1000
# allow queries while in recovery
hot_standby = on
# maximum streaming replication delay (optional)
max_standby_streaming_delay = 30s
# maximum interval between status reports from standby to primary (optional)
wal_receiver_status_interval = 10s
# send feedback to the primary on query conflicts (optional)
hot_standby_feedback = on
Save and restart:
sudo systemctl restart postgresql
10.1.5.3 Verify
On the Master server, run:
sudo -u postgres psql
select client_addr,sync_state from pg_stat_replication;
Create a database or table on the master and check that it is replicated to the slave.
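A quick end-to-end smoke test; the table name repl_test is just an example:
# on the master: create a throwaway table
sudo -u postgres psql -c 'create table repl_test(id int);'
# on the slave: the table should appear within a second or two,
# and writes must fail because the standby is read-only
sudo -u postgres psql -c '\dt repl_test'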
10.2 Install MySQL
10.2.1 Install
sudo apt-get update
sudo apt-get install mysql-server -y
10.2.2 Edit the configuration (on both master and slave)
sudo vi /etc/mysql/mysql.conf.d/mysqld.cnf
Change the following:
bind-address=0.0.0.0 # comment this line out, or change it to 0.0.0.0
server-id = 147 # unique ID of the mysql server, globally unique between master and slave; the last octet of the IP is a common choice
log_bin = /var/log/mysql/mysql-bin.log # binlog path
# binlog_do_db = include_database_name # databases to replicate; add one line per database rather than separating them with commas
Save and restart:
sudo systemctl restart mysql.service
10.2.3 Change the root password
# look up the default maintenance account credentials
sudo cat /etc/mysql/debian.cnf
# log in with that account
mysql -u debian-sys-maint -p
mysql> use mysql;
mysql> update user set authentication_string='' where user='root';
mysql> alter user 'root'@'localhost' identified with mysql_native_password by 'xxxxxx@999';
Restart the service after the change:
sudo service mysql restart
10.2.4 Configure master/slave replication
Server environment
| Hostname | IP address | Database version | Role |
|---|---|---|---|
| db-01 | 192.168.2.147 | MySQL 8.0.35 | Master |
| db-02 | 192.168.2.148 | MySQL 8.0.35 | Slave |
10.2.4.1 Configure the master
mysql -u root -p
# create the replication account
mysql> CREATE USER 'repl'@'%' IDENTIFIED WITH mysql_native_password BY 'xxxxxx@888';
# grant the account replication privileges
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
mysql> flush privileges;
10.2.4.2 Configure the slave
mysql -u root -p
mysql> stop replica;
mysql> CHANGE REPLICATION SOURCE TO SOURCE_HOST='192.168.2.147',SOURCE_PORT=3306,SOURCE_USER='repl',SOURCE_PASSWORD='xxxxxx@888';
mysql> start replica;
10.2.4.3 Verify
# check the replication status
mysql> show replica status\G
Replication is healthy when both of the following are Yes; Connecting or No means failure:
Replica_IO_Running: Yes
Replica_SQL_Running: Yes
Create a database or table on the master and check that it is replicated to the slave.
11 Redis Master/Slave
Server environment
| Hostname | IP address | Database version | Role |
|---|---|---|---|
| db-01 | 192.168.2.147 | Redis 6.0.16 | Master |
| db-02 | 192.168.2.148 | Redis 6.0.16 | Slave |
11.1 Install
sudo apt-get update
sudo apt-get install redis-server -y
11.2 Master configuration
sudo vim /etc/redis/redis.conf
Change the following settings:
bind 0.0.0.0
port 6379
daemonize yes
protected-mode no
requirepass xxxxxx@999
masterauth xxxxxx@999
sudo systemctl restart redis
11.3 Slave configuration
sudo vim /etc/redis/redis.conf
Change the following settings:
bind 0.0.0.0
port 6379
daemonize yes
protected-mode no
requirepass xxxxxx@999
masterauth xxxxxx@999
replicaof 192.168.2.147 6379
replica-read-only yes # replicas are read-only by default
sudo systemctl restart redis
11.4 Verify
Tail the log on the master:
tail -f /var/log/redis/redis-server.log
The log shows the synchronization with the replica completing.
Create a key-value pair on the master and check that it appears on the slave.
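A one-line smoke test per node; the key name k1 is just an example:
# on the master: write a key (authenticating with the configured password)
redis-cli -a 'xxxxxx@999' set k1 hello
# on the slave: the key should be readable, and writes should be refused
# because replica-read-only is enabled
redis-cli -a 'xxxxxx@999' get k1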
12 RabbitMQ Master/Slave
Server environment
| Hostname | IP address | Version | Role |
|---|---|---|---|
| db-01 | 192.168.2.147 | RabbitMQ 3.9.13 | Master |
| db-02 | 192.168.2.148 | RabbitMQ 3.9.13 | Slave |
sudo vim /etc/hosts
# add the IP mappings
192.168.2.147 db-01
192.168.2.148 db-02
Save and reboot:
sudo reboot
12.1 Install
# install erlang
sudo apt-get install erlang-nox -y
# install rabbitmq
wget -O- https://www.rabbitmq.com/rabbitmq-release-signing-key.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get install rabbitmq-server -y
# check the status
sudo systemctl status rabbitmq-server
# enable the management UI
sudo rabbitmq-plugins enable rabbitmq_management
# create an admin user
sudo rabbitmqctl add_user admin xxxxxx@999
sudo rabbitmqctl set_user_tags admin administrator
sudo rabbitmqctl set_permissions -p / admin ".*" ".*" ".*"
12.2 Configure the cluster
12.2.1 Copy the .erlang.cookie file
# copy the master's .erlang.cookie file to the slave
sudo scp /var/lib/rabbitmq/.erlang.cookie yj@db-02:/var/lib/rabbitmq/
# if permissions are insufficient, copy it to the user's home directory first, then move it to /var/lib/rabbitmq/
# the new .erlang.cookie must be re-owned and locked down
sudo chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
sudo chmod 600 /var/lib/rabbitmq/.erlang.cookie
The .erlang.cookie must be identical on master and slave.
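A checksum comparison confirms the cookies really match:
# run on both nodes; the two hashes must be identical
sudo md5sum /var/lib/rabbitmq/.erlang.cookie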
12.2.2 Restart every node
sudo rabbitmqctl stop
sudo rabbitmq-server -detached
Check rabbitmq's status; if it did not start, or restarting keeps failing, a reboot usually clears it.
12.2.3 Join the cluster
Run on the slave:
sudo rabbitmqctl stop_app
sudo rabbitmqctl join_cluster rabbit@db-01
sudo rabbitmqctl start_app
Check the cluster status:
sudo rabbitmqctl cluster_status
Open the management UI at http://192.168.2.147:15672/#/ to see the cluster nodes.
13 Deploying the Apollo Configuration Center
13.1 Create the databases
Browse apollo/scripts/sql at v2.1.0 · apolloconfig/apollo · GitHub for the database scripts, download them, and run them to create the ApolloConfigDB and ApolloPortalDB databases.
mysql -u root -p
# run the scripts to create the databases
mysql> source /home/yj/apollo/apolloconfigdb.sql
mysql> source /home/yj/apollo/apolloportaldb.sql
# add a user and grant privileges
mysql> create user 'apollo'@'%' identified by 'xxxxxxApollo@999';
mysql> grant all privileges on ApolloConfigDB.* to 'apollo'@'%';
mysql> grant all privileges on ApolloPortalDB.* to 'apollo'@'%';
Username/password: apollo/xxxxxxApollo@999
13.2 Install Helm
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
sudo chmod 700 get_helm.sh
sudo ./get_helm.sh
# output
Downloading https://get.helm.sh/helm-v3.13.1-linux-amd64.tar.gz
Verifying checksum... Done.
Preparing to install helm into /usr/local/bin
helm installed into /usr/local/bin/helm
Once installed, running helm with no arguments prints its usage information.
13.3 Add the repository
helm repo add apollo https://charts.apolloconfig.com
# once added, the charts can be searched
helm search repo apollo
13.4 Deploy
Create the namespace:
kubectl create namespace apollo
Label the nodes to deploy onto:
kubectl label nodes k8s-node-01 config="apollo"
kubectl label nodes k8s-node-02 config="apollo"
kubectl label nodes k8s-node-03 config="apollo"
13.4.1 Deploy the services
Custom configuration file service-values.yaml:
configdb:
host: 192.168.2.148
port: 3306
dbName: ApolloConfigDB
userName: apollo
password: xxxxxxApollo@999
service:
enabled: true
configService:
replicaCount: 3
nodeSelector:
config: apollo
adminService:
replicaCount: 3
nodeSelector:
config: apollo
helm install apollo-service-dev -f service-values.yaml -n apollo apollo/apollo-service
# output
NAME: apollo-service-dev
LAST DEPLOYED: Fri Dec 8 05:54:34 2023
NAMESPACE: apollo
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Meta service url for current release:
echo http://apollo-service-dev-apollo-configservice.apollo:8080
For local test use:
export POD_NAME=$(kubectl get pods --namespace apollo -l "app=apollo-service-dev-apollo-configservice" -o jsonpath="{.items[0].metadata.name}")
echo http://127.0.0.1:8080
kubectl --namespace apollo port-forward $POD_NAME 8080:8080
Urls registered to meta service:
Config service: http://apollo-service-dev-apollo-configservice.apollo:8080
Admin service: http://apollo-service-dev-apollo-adminservice.apollo:8090
Record the URLs above; they are needed when deploying the portal.
13.4.2 Deploy the portal
Custom configuration file portal-values.yaml:
portaldb:
host: 192.168.2.147
userName: apollo
password: xxxxxxApollo@999
connectionStringProperties: characterEncoding=utf8&useSSL=false
service:
enabled: true
config:
envs: dev
metaServers:
dev: http://apollo-service-dev-apollo-configservice.apollo:8080
replicaCount: 3
containerPort: 8070
nodeSelector:
config: apollo
ingress:
enabled: true
annotations:
kubernetes.io/ingress.class: nginx
hosts:
- host: apollo.xxxxxx.com
paths:
- /
helm install apollo-portal -f portal-values.yaml -n apollo apollo/apollo-portal
# output
NAME: apollo-portal
LAST DEPLOYED: Fri Dec 8 06:23:47 2023
NAMESPACE: apollo
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Portal url for current release:
export POD_NAME=$(kubectl get pods --namespace apollo -l "app=apollo-portal" -o jsonpath="{.items[0].metadata.name}")
echo "Visit http://127.0.0.1:8070 to use your application"
kubectl --namespace apollo port-forward $POD_NAME 8070:8070
Ingress:
http://apollo.xxxxxx.com/
After configuring local DNS resolution, open https://apollo.xxxxxx.com . The default username/password is apollo/admin; the password can be changed under "Admin Tools - User Management".