|
|
@@ -9,7 +9,7 @@
|
|
|
&& sudo apt -y install nvidia-container-toolkit nvidia-container-runtime nvidia-docker2
|
|
|
|
|
|
|
|
|
- DCGM on host machine running nvidia GPU
|
|
|
+ DCGM on host machine running Nvidia GPU
|
|
|
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin \
&& sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600 \
&& sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub \
|
|
|
@@ -21,15 +21,11 @@
|
|
|
## Deployment
|
|
|
|
|
|
1. Modify the Prometheus configuration template at `/etc/prometheus/prometheus.yml`.
|
|
|
-2. # job for nvidia DCGM exporter
|
|
|
+# job for nvidia DCGM exporter
|
|
|
- job_name: 'nvidia_exporter'
  static_configs:
  - targets: ['nvidia_exporter:9400'] # if the nvidia_exporter container is not on the same Docker network, change this line to "- targets: ['whichever ip your host is:9400']"
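The diff view flattens the YAML indentation, so here is a minimal sketch of how the job might look in context. It assumes the job sits under the standard top-level `scrape_configs` key of `prometheus.yml`; the surrounding keys in your file may differ.

```yaml
# Sketch only: assumes the job is nested under the top-level scrape_configs key.
scrape_configs:
  # job for nvidia DCGM exporter
  - job_name: 'nvidia_exporter'
    static_configs:
      # Use the container name when Prometheus shares a Docker network with
      # the exporter; otherwise replace it with the host's IP address.
      - targets: ['nvidia_exporter:9400']
```

After editing, running `promtool check config /etc/prometheus/prometheus.yml` should catch indentation mistakes before Prometheus is restarted.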
|
|
|
|
|
|
-## Configuration
|
|
|
-
|
|
|
-None
|
|
|
-
|
|
|
# Additional References
|
|
|
[Official DCGM Documentation](https://github.com/NVIDIA/DCGM)
|
|
|
[NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#install-guide)
|