Deploying the BOBAI Service on K8S (NVIDIA Edition)
I. Installing the NVIDIA Driver and CUDA on GPU Nodes
1. Prerequisites
Check that the machine has a CUDA-capable NVIDIA GPU:

```bash
lspci | grep -i nvidia
```

Check whether your system (distribution and kernel) is supported.

Verify that a GCC build environment is available:

```bash
gcc -v
```

Verify that the matching kernel headers and development packages are installed:

```bash
sudo yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
```
2. Installation
Disable nouveau
nouveau is a third-party open-source NVIDIA driver that most Linux distributions install by default. It conflicts with the official NVIDIA driver, so disable it before installing the NVIDIA driver and CUDA.
```bash
# Check whether the system is currently using nouveau
lsmod | grep nouveau

# If anything is printed, disable it. The steps below are for CentOS 7.
# Create a new config file
sudo vim /etc/modprobe.d/blacklist-nouveau.conf

# Add the following content
blacklist nouveau
options nouveau modeset=0

# Save and exit
:wq

# Back up the current initramfs image
sudo mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak

# Build a new image
sudo dracut /boot/initramfs-$(uname -r).img $(uname -r)

# Reboot
sudo reboot

# After rebooting, rerun the check; no output means nouveau is disabled
lsmod | grep nouveau
```
Install the NVIDIA Driver (optional; you can skip this and go straight to installing CUDA, since the CUDA .run installer can also install the driver)
?? Make sure the NVIDIA Driver version is appropriate for your GPU.
- Download the NVIDIA Driver
First, go to the NVIDIA driver download page and download the driver for your card:

Before downloading, confirm that the driver supports your GPU.

- Install the NVIDIA Driver

```bash
rpm -ivh nvidia-driver-local-repo-rhel9-580.82.07-1.0-1.x86_64.rpm
```

- Verify that the driver installed successfully
```bash
# Run the following command
root@GPU1:~ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       On  | 00000000:3B:00.0 Off |                    0 |
| N/A   51C    P0              29W /  70W |  12233MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla T4                       On  | 00000000:86:00.0 Off |                    0 |
| N/A   49C    P0              30W /  70W |   6017MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
```

At this point the GPU driver is installed and the system can detect the GPUs. Note that the CUDA Version shown here is the highest CUDA version the current driver supports, not an installed toolkit.
Install the CUDA Toolkit
- Download the CUDA Toolkit
First, download the CUDA Toolkit from the NVIDIA website.

Select your operating system and version.

Downloading the .run installer is recommended.
- Install the CUDA Toolkit
```bash
wget https://developer.download.nvidia.com/compute/cuda/13.0.0/local_installers/cuda_13.0.0_580.65.06_linux.run
sudo sh cuda_13.0.0_580.65.06_linux.run

# Example log of a successful installation
===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-13.0/

Please make sure that
 -   PATH includes /usr/local/cuda-13.0/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-13.0/lib64, or, add /usr/local/cuda-13.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-13.0/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall
Logfile is /var/log/cuda-installer.log
```

Download and run the installer. If you skipped the NVIDIA Driver step, this installer also installs a driver that matches the toolkit; this particular .run file, for example, installs driver version 580.65.06.
- Configure environment variables

```bash
vim /etc/profile.d/cuda.sh
# Create a new file with the following content:

# Add CUDA 13.0 to PATH
export PATH=/usr/local/cuda-13.0/bin:$PATH
# Add the CUDA 13.0 lib64 directory to LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cuda-13.0/lib64:$LD_LIBRARY_PATH
```

Save, then reload the profile:

```bash
source /etc/profile.d/cuda.sh
```
Check that the installation succeeded:

```bash
# If a version number is printed, the installation succeeded
(base) root@Colourdata-GPU:~# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Fri_Feb_21_20:23:50_PST_2025
Cuda compilation tools, release 12.8, V12.8.93
Build cuda_12.8.r12.8/compiler.35583870_0
```
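To go one step further than `nvcc -V`, you can compile and run a trivial CUDA program. This is an optional sanity check, not part of the BOBAI deployment; the file name `hello.cu` is just an example.

```bash
# Optional sanity check: compile and run a minimal CUDA kernel with the freshly installed toolchain
cat > hello.cu <<'EOF'
#include <cstdio>

__global__ void hello() {
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main() {
    hello<<<1, 4>>>();        // launch 1 block of 4 threads
    cudaDeviceSynchronize();  // wait for the kernel (and its printf output) to finish
    return 0;
}
EOF
nvcc hello.cu -o hello && ./hello
```

If four "Hello from GPU thread" lines are printed, the driver, toolkit, and GPU are working together.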
II. Container Environment (Docker or Containerd)
1. Install nvidia-container-toolkit
Note:
The main job of the NVIDIA Container Toolkit is to mount NVIDIA GPU devices into containers. It is compatible with Docker, containerd, CRI-O, and other runtimes.
With dnf: RHEL/CentOS, Fedora, Amazon Linux
```bash
# Configure the production repository
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo

# Install the NVIDIA Container Toolkit packages
export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.8-1
sudo dnf install -y \
    nvidia-container-toolkit-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    nvidia-container-toolkit-base-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container-tools-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
    libnvidia-container1-${NVIDIA_CONTAINER_TOOLKIT_VERSION}
```
2. Configure NVIDIA as the Runtime
Docker
```bash
# Configure the Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker

# It is recommended to review /etc/docker/daemon.json and also set the default runtime to nvidia
(base) root@Colourdata-GPU:~# vim /etc/docker/daemon.json
{
    "registry-mirrors": [
        "https://ihsxva0f.mirror.aliyuncs.com",
        "https://docker.m.daocloud.io",
        "https://registry.docker-cn.com"
    ],
    "exec-opts": ["native.cgroupdriver=systemd"],
    "log-driver": "json-file",
    "log-opts": {
        "max-size": "10m",
        "max-file": "3"
    },
    "storage-driver": "overlay2",
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}

systemctl daemon-reload
systemctl restart docker
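Before moving on, it is worth smoke-testing the Docker integration. A minimal sketch, assuming the node can pull the public `nvidia/cuda` image (the tag below is an example; pick one your driver's supported CUDA version covers):

```bash
# Run nvidia-smi inside a throwaway CUDA container;
# seeing the GPU table here means the runtime wiring works
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```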
Containerd
```bash
# Configure the containerd runtime
sudo nvidia-ctk runtime configure --runtime=containerd

# It is recommended to review /etc/containerd/config.toml and also set the default runtime to nvidia
# After making changes, restart containerd
sudo systemctl restart containerd
```
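For containerd you can run a similar smoke test with `ctr`; a sketch, assuming your containerd version's `ctr run` supports the `--gpus` flag and the image tag suits your driver:

```bash
# Pull a CUDA image and run nvidia-smi in a throwaway container via containerd
sudo ctr image pull docker.io/nvidia/cuda:12.2.0-base-ubuntu22.04
sudo ctr run --rm --gpus 0 docker.io/nvidia/cuda:12.2.0-base-ubuntu22.04 gpu-smoke nvidia-smi
```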
With the above complete, you can configure K8S to use the GPUs.
III. Using GPUs from K8S
Note:
The device plugin is provided by NVIDIA; see the official documentation.
Deploy the Plugin
- It is recommended to first label the GPU nodes with gpu=true (see the labeling sketch after this list)
- Deploy the service

```bash
# Download (a recent version is recommended):
# https://github.com/NVIDIA/k8s-device-plugin/blob/main/deployments/static/nvidia-device-plugin.yml

# Deploy
root@test:~# kubectl apply -f nvidia-device-plugin.yml
root@test:~# kubectl get po -l app=gpu -n bobai
NAME                                   READY   STATUS    RESTARTS   AGE
nvidia-device-plugin-daemonset-7nkjw   1/1     Running   0          10m
```

- Check that the plugin deployed successfully
```bash
# If nvidia.com/gpu is listed, the plugin deployed successfully
root@test:~# kubectl describe node GPU | grep nvidia.com/gpu
  nvidia.com/gpu: 2
```
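The labeling mentioned in the first bullet is a single kubectl command; a sketch, where `gpu-node-1` is a placeholder node name:

```bash
# Label the GPU node so GPU workloads (and the device plugin, via a nodeSelector) can target it
kubectl label node gpu-node-1 gpu=true

# Confirm the label
kubectl get nodes -l gpu=true
```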
Once this is done, your K8S cluster can schedule and use GPUs.
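Before deploying the actual model services, you can verify end-to-end GPU scheduling with a throwaway pod; a minimal sketch, with the image tag and pod name as assumptions:

```bash
# Create a one-shot pod that requests a GPU and prints nvidia-smi output
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
  namespace: bobai
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF

# Once the pod completes, its logs should show the nvidia-smi table
kubectl -n bobai logs gpu-test
kubectl -n bobai delete pod gpu-test
```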
IV. Deploying the Services
1. Deploy DeepSeek-V3
An example YAML file follows, for reference only:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deepseek-v3
  namespace: bobai
spec:
  replicas: 1
  selector:
    matchLabels:
      app: deepseek-v3
  template:
    metadata:
      labels:
        app: deepseek-v3
    spec:
      containers:
      - command:
        - sh
        - -c
        - vllm serve --port 8000 --trust-remote-code --served-model-name deepseek-v3 --dtype=fp8 --max-model-len 65536 --gpu-memory-utilization 0.95 /models/DeepSeek-V3
        name: deepseek-v3
        image: registry.cn-shanghai.aliyuncs.com/colourdata/bobai-dependency:vllm
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8000
        volumeMounts:
        - name: model-volume
          mountPath: /models
        resources:
          requests:
            nvidia.com/gpu: 8
            memory: "16Gi"
            cpu: "8"
          limits:
            nvidia.com/gpu: 8
            memory: "32Gi"
            cpu: "16"
        livenessProbe:
          tcpSocket:
            port: 8000
          initialDelaySeconds: 300
          periodSeconds: 10
          failureThreshold: 3
        readinessProbe:
          tcpSocket:
            port: 8000
          initialDelaySeconds: 300
          periodSeconds: 10
          failureThreshold: 3
      volumes:
      - name: model-volume
        hostPath:
          path: /models
          type: Directory
```
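Once the pod is running, vLLM exposes an OpenAI-compatible API on port 8000. A quick way to test it from a workstation, assuming no Service has been created for this Deployment yet, is a port-forward plus curl:

```bash
# Forward the Deployment's port locally and send a test chat completion
kubectl -n bobai port-forward deploy/deepseek-v3 8000:8000 &
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-v3", "messages": [{"role": "user", "content": "Hello"}]}'
```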
2. Deploy the Qwen Embedding Model
An example YAML file follows, for reference only:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-embedding
  namespace: bobai
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-embedding
  template:
    metadata:
      labels:
        app: vllm-embedding
    spec:
      containers:
      - command:
        - sh
        - -c
        - vllm serve --port 8000 --trust-remote-code --served-model-name vllm-embedding --max-model-len 4096 --gpu-memory-utilization 0.85 /models/Qwen3-Embedding-0.6B
        name: vllm-embedding
        image: registry.cn-shanghai.aliyuncs.com/colourdata/bobai-dependency:vllm
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8000
        volumeMounts:
        - name: model-volume
          mountPath: /models
        resources:
          requests:
            nvidia.com/gpu: 1
            memory: "8Gi"
            cpu: "4"
          limits:
            nvidia.com/gpu: 1
            memory: "16Gi"
            cpu: "8"
        livenessProbe:
          tcpSocket:
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 10
          failureThreshold: 3
        readinessProbe:
          tcpSocket:
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 3
      volumes:
      - name: model-volume
        nfs:
          server: 192.168.2.250
          path: /data/bobai/models
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: vllm-embedding
  namespace: bobai
spec:
  type: ClusterIP
  ports:
  - port: 8000
    protocol: TCP
    targetPort: 8000
  selector:
    app: vllm-embedding
```
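Since this Deployment ships with a ClusterIP Service, other pods in the cluster can call the vLLM embeddings endpoint directly; a sketch of an in-cluster test:

```bash
# From any pod in the cluster (or via kubectl port-forward), request an embedding
curl http://vllm-embedding.bobai.svc.cluster.local:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "vllm-embedding", "input": "hello world"}'
```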
3. Deploy Tika
An example YAML file follows, for reference only:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tika
  namespace: bobai
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tika
  template:
    metadata:
      labels:
        app: tika
    spec:
      containers:
      - name: tika
        image: tika-ocr-cn:v1
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9998
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1"
        livenessProbe:
          tcpSocket:
            port: 9998
          initialDelaySeconds: 30
          periodSeconds: 10
          failureThreshold: 3
        readinessProbe:
          tcpSocket:
            port: 9998
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 3
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: tika-service
  namespace: bobai
spec:
  type: ClusterIP
  selector:
    app: tika
  ports:
  - protocol: TCP
    port: 9998
    targetPort: 9998
```
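A quick in-cluster check of the Tika server, assuming you have a sample.pdf at hand; Tika returns the extracted text for files PUT to /tika:

```bash
# GET /tika with no body returns a greeting, which doubles as a liveness check
curl http://tika-service.bobai.svc.cluster.local:9998/tika

# PUT a document to Tika and get the extracted plain text back
curl -T sample.pdf http://tika-service.bobai.svc.cluster.local:9998/tika
```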
4. Deploy TTS (openai-edge-tts)
An example YAML file follows, for reference only:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openai-edge-tts
  namespace: bobai
spec:
  replicas: 1
  selector:
    matchLabels:
      app: openai-edge-tts
  template:
    metadata:
      labels:
        app: openai-edge-tts
    spec:
      containers:
      - name: openai-edge-tts
        image: travisvn/openai-edge-tts:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5050
        env:
        - name: API_KEY
          value: "Colourdata1234@"
        - name: PORT
          value: "5050"
        - name: DEFAULT_VOICE
          value: "en-US-AvaNeural"
        - name: DEFAULT_RESPONSE_FORMAT
          value: "mp3"
        - name: DEFAULT_SPEED
          value: "1.0"
        - name: DEFAULT_LANGUAGE
          value: "en-US"
        - name: REQUIRE_API_KEY
          value: "True"
        - name: REMOVE_FILTER
          value: "False"
        - name: EXPAND_API
          value: "True"
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1"
        livenessProbe:
          tcpSocket:
            port: 5050
          initialDelaySeconds: 30
          periodSeconds: 10
          failureThreshold: 3
        readinessProbe:
          tcpSocket:
            port: 5050
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 3
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: openai-edge-tts-service
  namespace: bobai
spec:
  type: ClusterIP
  selector:
    app: openai-edge-tts
  ports:
  - protocol: TCP
    port: 5050
    targetPort: 5050
```
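The container exposes an OpenAI-style speech endpoint; a sketch of an in-cluster test using the API_KEY configured above (the `tts-1` model name is the OpenAI-compatible placeholder this project accepts, to the best of our knowledge):

```bash
# Synthesize a short MP3 via the OpenAI-compatible /v1/audio/speech endpoint
curl http://openai-edge-tts-service.bobai.svc.cluster.local:5050/v1/audio/speech \
  -H "Authorization: Bearer Colourdata1234@" \
  -H "Content-Type: application/json" \
  -d '{"model": "tts-1", "input": "Hello from BOBAI", "voice": "en-US-AvaNeural"}' \
  --output speech.mp3
```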