nVidia

nVidia GPU compute capability lookup - cn
nVidia GPU compute capability lookup - en
nVidia GPU compute capability vs. CUDA version table
nVidia driver download
nVidia CUDA

nVidia Ampere CUDA
CUDA C++ Programming Guide
CUDA C++ Best Practices Guide
CUDA Toolkit Archive
GPU, CUDA Toolkit, and CUDA Driver Requirements

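Note: do not use the install command that Ubuntu recommends; it installs an nvidia-cuda-toolkit package that does not match your setup.

How to choose versions

PyTorch, torchvision, and CUDA version correspondence and install commands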

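Note: do not install torch with a plain pip download. A torch installed that way may be unable to use the GPU, because pip can pull the CPU-only build by default; use the install command that matches your CUDA version instead.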
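To see which kind of torch build actually ended up in the environment, you can ask torch itself. The sketch below assumes `python3` is on PATH; the function name `check_torch_cuda` is made up for illustration. On a CPU-only build, `torch.version.cuda` is `None` and `torch.cuda.is_available()` is `False`.

```shell
# Hypothetical helper: report whether the installed torch build can use the GPU.
check_torch_cuda() {
    if ! command -v python3 >/dev/null 2>&1; then
        echo "python3 not found"
        return 1
    fi
    # Ask torch which CUDA version it was built for and whether a GPU is reachable.
    python3 - <<'EOF'
try:
    import torch
except ImportError:
    print("torch is not installed")
else:
    print("torch version :", torch.__version__)
    print("built for CUDA:", torch.version.cuda)
    print("GPU available :", torch.cuda.is_available())
EOF
}
```

Run `check_torch_cuda` in the same environment (conda env, venv) that your project uses, since each environment can hold a different torch build.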

Relationship between the GPU driver, CUDA, and PyTorch
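  • The GPU driver is the software interface between the operating system and the GPU hardware; it manages the card's features and performance. Only one driver version can be installed on a host at a time.
  • CUDA is NVIDIA's C/C++-based parallel computing toolkit, which lets developers write high-performance parallel programs for the GPU. Each driver version has a highest CUDA version that it supports, and it is backward compatible with older CUDA versions. Several CUDA versions can be installed on one host at the same time; to use one of them, add the install path of that CUDA version to the environment variables in ~/.bashrc.
  • PyTorch is an open-source machine learning framework built on CUDA and used to build deep learning models. Each CUDA version may support several PyTorch versions. In general, a PyTorch version that differs slightly from what an open-source project asks for still works, but the CUDA version PyTorch was built against must match the CUDA version set in the host's environment variables.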
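Switching between CUDA versions installed side by side comes down to three environment variables. The function below is a minimal sketch that could live in ~/.bashrc; the name `use_cuda` and the `/usr/local/cuda-<version>` layout are assumptions, so adjust the path to wherever your toolkits are installed.

```shell
# Hypothetical helper: point the current shell at one installed CUDA toolkit.
# $1 = toolkit version, e.g. 11.4
# $2 = install root (optional, assumed default: /usr/local)
use_cuda() {
    _cuda_root="${2:-/usr/local}"
    _cuda_dir="$_cuda_root/cuda-$1"
    # Refuse to touch the environment if that toolkit is not installed.
    if [ ! -d "$_cuda_dir/bin" ]; then
        echo "no CUDA toolkit found at $_cuda_dir" >&2
        return 1
    fi
    CUDA_HOME="$_cuda_dir"
    # Prepend so this toolkit's nvcc and libraries win over any other version.
    PATH="$CUDA_HOME/bin:$PATH"
    LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export CUDA_HOME PATH LD_LIBRARY_PATH
}
```

After `use_cuda 11.4`, running `nvcc -V` should report the toolkit you selected. The function only changes the current shell; put the call itself in ~/.bashrc if you want it in every new shell.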
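  1. Constraint 1: start from torch. Decide which torch version you want to install; each PyTorch release supports its own set of CUDA versions. This constraint is not strictly mandatory: a newer CUDA version can also be installed without problems.
  2. Constraint 2: find the highest CUDA version that your nvidia driver supports (keep the driver as new as you can, because a driver is backward compatible with older CUDA versions).
  3. Constraint 3: the CUDA version must match the compute capability of the GPU.
  4. Finally, use these three constraints to find the CUDA version you need, then install cuDNN and torch to match that CUDA version.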
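Constraints 1 and 2 can be combined mechanically: from the CUDA versions a torch release is built for, take the newest one that does not exceed what the driver supports. The sketch below does only that; `pick_cuda` is a hypothetical name, the candidate versions passed in are illustrative rather than an official list, and constraint 3 (compute capability) still has to be checked against the table by hand.

```shell
# Hypothetical helper: pick the newest candidate CUDA version the driver supports.
# $1        = highest CUDA version the driver supports ("CUDA Version" in nvidia-smi)
# $2, $3... = CUDA versions the chosen torch release is built for (illustrative input)
pick_cuda() {
    _max="$1"
    shift
    printf '%s\n' "$@" | awk -v max="$_max" '
        # Turn "major.minor" into one number so versions compare numerically.
        function key(v,   p) { split(v, p, "."); return p[1] * 1000 + p[2] }
        key($1) <= key(max) && key($1) > best { best = key($1); pick = $1 }
        END { if (pick == "") exit 1; print pick }'
}
```

For example, with a driver that reports CUDA 11.4, `pick_cuda 11.4 10.2 11.3 11.8` prints `11.3`. The function returns a non-zero status when no candidate fits, which means the driver should be upgraded first.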

Fixing a GPU driver that cannot support the CUDA version

CUDA vs. compute capability table

Uninstall, install, and inspect the nvidia driver

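## Remove the installed nvidia drivers (patterns are quoted so the shell does not expand them)
apt-get purge 'nvidia*'
apt-get purge 'nvidia-*'
apt-get remove --purge 'nvidia*'
## Uninstall everything related to nvidia
apt autoremove '*nvidia*'
## Remove leftover dependency packages
apt autoremove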

## Add the GPU drivers PPA
add-apt-repository ppa:graphics-drivers
## Update the package index
apt-get update
## List the driver versions supported for this GPU
ubuntu-drivers list
ubuntu-drivers devices

ERROR:root:aplay command not found
== /sys/devices/pci0000:00/0000:00:0d.0 ==
modalias : pci:v000010DEd00001EB8sv000010DEsd000012A2bc03sc02i00
vendor   : NVIDIA Corporation
model    : TU104GL [Tesla T4]
manual_install: True
driver   : nvidia-driver-535 - distro non-free recommended
driver   : nvidia-driver-450-server - distro non-free
driver   : nvidia-driver-470 - distro non-free
driver   : nvidia-driver-545 - distro non-free
driver   : nvidia-driver-418-server - distro non-free
driver   : nvidia-driver-470-server - distro non-free
driver   : nvidia-driver-535-server - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin


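## Install a specific version from the supported list
apt install nvidia-driver-xxx

## Reboot the machine after the install finishes
reboot

## Check that the driver was updated successfully
nvidia-smi
### If it prints GPU information, the install succeeded; otherwise it failed.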
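If you want to script the "pick a driver from the list" step, the `ubuntu-drivers devices` output shown above is easy to filter. This is a small sketch, with `recommended_driver` as a made-up name; it relies on the line layout seen in that output (`driver : <package> - ... recommended`), which may differ on other releases.

```shell
# Hypothetical helper: print the package that `ubuntu-drivers devices` marks as recommended.
# Reads the command output on stdin.
recommended_driver() {
    awk '$1 == "driver" && /recommended/ { print $3; exit }'
}

# Usage (needs the ubuntu-drivers tool, so it is left commented out here):
#   ubuntu-drivers devices 2>/dev/null | recommended_driver
```

For the Tesla T4 output above this prints `nvidia-driver-535`. Treat the recommended entry as a starting point only: the driver you install still has to support the CUDA version you settled on.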

View the driver

(base) [~ Thu Jun 06 11:16:59 root@ecs-811c]#nvidia-smi   // V100 Cuda 11.4

Tue Apr 30 15:12:35 2024  
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02   Driver Version: 470.223.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  On   | 00000000:00:0D.0 Off |                    0 |
| N/A   27C    P0    23W / 250W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

(base) [~ Thu Jun 06 11:16:59 root@ecs-811c]#nvidia-smi  // T4 Cuda 11.4
Thu Jun  6 11:17:19 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.129.06   Driver Version: 470.129.06   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:0D.0 Off |                    0 |
| N/A   39C    P8    15W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
(base) [~ Thu Jun 06 11:17:19 root@ecs-811c]#
(base) [~ Thu Jun 06 15:24:21 root@ecs-811c]#nvidia-smi  // T4 Cuda 12.3
Thu Jun  6 15:24:41 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:0D.0 Off |                    0 |
| N/A   39C    P8               9W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
(base) [~ Thu Jun 06 15:24:41 root@ecs-811c]#
(base) [~/open-webui/backend Wed Jun 19 12:02:20 root@ecs-811c]#nvidia-smi // T4 Cuda 12.5
Wed Jun 19 12:02:21 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.02              Driver Version: 555.42.02      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla T4                       Off |   00000000:00:0D.0 Off |                    0 |
| N/A   51C    P0             66W /   70W |    4979MiB /  15360MiB |     89%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     38802      C   ...unners/cuda_v11/ollama_llama_server       4976MiB |
+-----------------------------------------------------------------------------------------+
(base) [~/open-webui/backend Wed Jun 19 12:02:21 root@ecs-811c]#nvidia-smi
nvcc -V
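The two commands answer different questions: the "CUDA Version" in the `nvidia-smi` banner is the highest CUDA version the driver supports, while `nvcc -V` reports the CUDA toolkit currently on PATH. A rough sketch for pulling both numbers out is below; the function names are hypothetical, and the `nvcc` pattern assumes the usual "Cuda compilation tools, release X.Y, ..." line.

```shell
# Hypothetical helper: print "<driver version> <max CUDA version>" from nvidia-smi output on stdin.
smi_versions() {
    sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/\1 \2/p'
}

# Hypothetical helper: print the toolkit release from `nvcc -V` output on stdin.
nvcc_version() {
    sed -n 's/.*release \([0-9.]*\),.*/\1/p'
}

# Usage (needs a machine with the driver and toolkit installed):
#   nvidia-smi | smi_versions
#   nvcc -V | nvcc_version
```

On the T4 machine above with driver 545.29.06, `nvidia-smi | smi_versions` would print `545.29.06 12.3`. If the toolkit release from `nvcc_version` is higher than that CUDA version, the driver is too old for the toolkit.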

pyTorch

pyTorch official site

Contributor: Michael-LiuQ