Machine Vision: Trying AMD ROCm Acceleration in a Docker Container on Windows (2)
Conclusions:
- On Windows, an AMD ROCm Docker container running through WSL2 cannot find the GPU. See: https://unix.stackexchange.com/questions/715847/wsl2-issue-installing-new-kernel/715922#715922
- With Linux as the host it should work. See: https://github.com/harakas/amd_igpu_yolo_v8
The rest of this post records what I tried under WSL.
Start Docker Desktop
Start Docker Desktop. An HTTP/HTTPS proxy can be configured in its settings UI to speed up access to the Docker image registry.
Prepare the Dockerfile
This is based on https://github.com/harakas/amd_igpu_yolo_v8 , a repo that uses Linux as the host.
Create a file named Dockerfile in d:\my_workspace\docker\ :
# Based on
# https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/pytorch-install.html#using-docker-with-pytorch-pre-installed
# https://pytorch.org/hub/ultralytics_yolov5/
FROM rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1
RUN apt-get -y update
RUN apt-get -y upgrade
RUN apt-get install -y migraphx
RUN apt-get install -y fonts-freefont-ttf
RUN pip install -U pip
RUN pip install -U 'ultralytics' 'gitpython>=3.1.30' 'Pillow>=10.0.1' 'numpy>=1.23.5' 'scipy>=1.11.4' 'onnx>=1.12.0' 'onnxruntime'
RUN mkdir /opt/cwd
WORKDIR /opt/cwd
ENTRYPOINT ["/opt/conda/envs/py_3.10/bin/python"]
Build the image from this Dockerfile
The Docker image is tagged rocm-pytorch. The build took me nearly two hours.
cd D:\my_workspace\docker
d:
"C:\Program Files\Docker\Docker\resources\bin\docker.exe" build -t rocm-pytorch .
Run the Docker image
Create a bash script docker_run.sh with the following contents, then run it:
#! /bin/bash
#
# Based on
# https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/pytorch-install.html#using-docker-with-pytorch-pre-installed
#
# The following environment variables are necessary for ROCM to work properly:
# -e HSA_ENABLE_SDMA=0
# So that ROCM would work at all due to missing PCIe atomics
# -e HSA_OVERRIDE_GFX_VERSION=9.0.0
# ROCM/Tensile, etc. do not work with gfx90c due to missing profile files
# I have gfx90c so I override the gfx version for the libraries to work properly
# See https://github.com/ROCm/ROCm/issues/1743#issuecomment-1149902796
#
# Also you need the following groups for ROCM to run properly (as we do not run as root):
# --group-add video --group-add _ssh --group-add render
#
docker run -it \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
--privileged \
--device=/dev/kfd \
--device=/dev/dri \
--group-add video \
--group-add _ssh \
--group-add render \
-e HSA_ENABLE_SDMA=0 \
-e HSA_OVERRIDE_GFX_VERSION=9.0.0 \
-e PYTHONPATH=/opt/rocm-5.7.0/lib/ \
--ipc=host \
--shm-size 16G \
--name my_docker \
-v /d/my_workspace/docker_vol:/docker_vol \
rocm-pytorch
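Notes:
- The --shm-size parameter sets the container's shared memory size. Increase it as needed when training very large models.
- The d:\my_workspace\docker_vol directory is mapped into the container as /docker_vol.
- The image already has /opt/cwd created as its working directory.
- You can also try adding the --network=host parameter.
- You can also try adding the -e DEVICE=cuda parameter.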
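The options in docker_run.sh can also be assembled programmatically, which makes it easier to toggle the optional parameters. The following is only a minimal sketch: `build_docker_run` is a hypothetical helper name, not part of Docker or ROCm, and it simply reproduces the flags already used in the script.

```python
import shlex


def build_docker_run(image, name, host_dir, container_dir,
                     shm_size="16G", extra_env=None, host_network=False):
    """Assemble a `docker run` argument list mirroring docker_run.sh.

    Hypothetical helper for illustration; all flags come from the script.
    """
    env = {
        "HSA_ENABLE_SDMA": "0",
        "HSA_OVERRIDE_GFX_VERSION": "9.0.0",
        "PYTHONPATH": "/opt/rocm-5.7.0/lib/",
    }
    env.update(extra_env or {})

    args = [
        "docker", "run", "-it",
        "--cap-add=SYS_PTRACE",
        "--security-opt", "seccomp=unconfined",
        "--privileged",
        "--device=/dev/kfd",
        "--device=/dev/dri",
    ]
    # Groups the script adds because the container does not run as root.
    for group in ("video", "_ssh", "render"):
        args += ["--group-add", group]
    for key, value in env.items():
        args += ["-e", f"{key}={value}"]
    args += ["--ipc=host", "--shm-size", shm_size, "--name", name]
    if host_network:
        args.append("--network=host")
    args += ["-v", f"{host_dir}:{container_dir}", image]
    return args


if __name__ == "__main__":
    cmd = build_docker_run(
        "rocm-pytorch", "my_docker",
        "/d/my_workspace/docker_vol", "/docker_vol",
        extra_env={"DEVICE": "cuda"}, host_network=True,
    )
    # Print a shell-quoted command line instead of executing it.
    print(shlex.join(cmd))
```

The sketch only prints the command, so it can be reviewed before being pasted into a shell or passed to `subprocess.run`.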
Once the my_docker container has been created, it can be managed by its container name, for example:
docker container start my_docker
docker container stop my_docker
docker container rm my_docker
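Can Python detect the impersonated CUDA device?
ROCm builds of PyTorch report AMD GPUs through the torch.cuda API. Check from the command line: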
python3 -c 'import torch' 2> /dev/null && echo 'Success' || echo 'Failure'
python3 -c 'import torch; print(torch.cuda.is_available())'
Check from the Python interpreter:
import torch
torch.cuda.is_available()
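The two checks above can be combined into one small script that also reports whether the installed PyTorch is a ROCm build. This is a sketch under assumptions: `probe_torch` is a hypothetical helper name, and it relies on `torch.version.hip` being a version string on ROCm builds and `None` otherwise.

```python
import importlib


def probe_torch():
    """Return a small report on the PyTorch build and GPU visibility."""
    report = {
        "installed": False,
        "hip": None,
        "gpu_available": False,
        "device": None,
    }
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        # torch is missing or its native libraries failed to load.
        return report

    report["installed"] = True
    # Version string on ROCm builds, None on CUDA or CPU-only builds.
    report["hip"] = getattr(torch.version, "hip", None)
    report["gpu_available"] = bool(torch.cuda.is_available())
    if report["gpu_available"]:
        report["device"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in probe_torch().items():
        print(f"{key}: {value}")
```

A ROCm build with `gpu_available` still reported as false suggests the problem lies below PyTorch, in the driver or device passthrough.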
The result is False, meaning the GPU is not recognized as a CUDA device. Running the rocminfo command inside the container shows:
# Run rocminfo
ROCk module is NOT loaded, possibly no GPU devices
# Run sudo modprobe amdgpu
modprobe: FATAL: Module amdgpu not found in directory /lib/modules/5.15.133.1-microsoft-standard-WSL2
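The same symptoms can be checked from Python before launching anything heavier. The sketch below is illustrative only: `rocm_host_report` is a hypothetical helper, and treating a kernel release string that contains "microsoft" as a sign of WSL2 is an assumption based on the modprobe message above. It also checks for the /dev/kfd and /dev/dri device nodes that docker_run.sh passes into the container.

```python
import os
import platform


def rocm_host_report(kernel_release=None, path_exists=os.path.exists):
    """Summarize whether this environment looks able to expose an AMD GPU.

    Both parameters can be overridden, which keeps the function testable.
    """
    if kernel_release is None:
        kernel_release = platform.release()
    return {
        "kernel_release": kernel_release,
        # Assumption: WSL2 kernels carry "microsoft" in the release string.
        "looks_like_wsl": "microsoft" in kernel_release.lower(),
        # Device nodes that docker_run.sh maps with --device.
        "has_kfd": path_exists("/dev/kfd"),
        "has_dri": path_exists("/dev/dri"),
    }


if __name__ == "__main__":
    for key, value in rocm_host_report().items():
        print(f"{key}: {value}")
```

If `looks_like_wsl` is true and `has_kfd` is false, the environment matches the failure recorded here.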
According to https://unix.stackexchange.com/questions/715847/wsl2-issue-installing-new-kernel/715922#715922 , this problem cannot be solved, so I gave up exploring further.
