
Select GPU for Desmond MD jobs on a multi-GPU computer (compute and persistence mode) and set power limit

posted Dec 14, 2015, 5:32 PM by Dong Xu   [ updated Jan 1, 2016, 9:59 AM ]
Reference: http://on-demand.gputechconf.com/gtc/2014/presentations/S4253-tools-tips-for-managing-a-gpu-cluster.pdf

Initializing a GPU in runlevel 3
Most clusters operate at runlevel 3, so you should initialize the GPU explicitly in an init script.
At minimum:

— Load kernel modules: nvidia + nvidia_uvm (in CUDA 6)
— Create devices with mknod (see the sketch below)
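
A minimal sketch of those two steps, assuming a two-GPU node (the NVIDIA character devices use major number 195, while nvidia_uvm is assigned a dynamic major that can be read from /proc/devices; NVIDIA's Linux installation guide has a more general version that counts the GPUs with lspci before the mknod loop):

/sbin/modprobe nvidia
/sbin/modprobe nvidia_uvm

mknod -m 666 /dev/nvidia0 c 195 0
mknod -m 666 /dev/nvidia1 c 195 1
mknod -m 666 /dev/nvidiactl c 195 255

D=$(grep nvidia-uvm /proc/devices | awk '{print $1}')
mknod -m 666 /dev/nvidia-uvm c $D 0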

Optional steps:
— Configure compute mode
— Set driver persistence
— Set power limits


Set GPU power limits
Power consumption limits can be set with NVML/nvidia-smi.
Set on a per-GPU basis.
Useful in power-constrained environments.

nvidia-smi -pl <power in watts>

Settings don't persist across reboots, so set this in your init script.
Requires driver persistence.
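
For example, to check a card's default and min/max enforceable power limits and then cap only the first GPU (the index 0 and the 120 W value here are just illustrative):

nvidia-smi -q -d POWER
sudo nvidia-smi -i 0 -pl 120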

======================================

sudo vi /etc/init.d/after.local

add:
/usr/bin/nvidia-smi -c 3     # compute mode 3 = EXCLUSIVE_PROCESS (one process per GPU)
/usr/bin/nvidia-smi -pm 1    # enable driver persistence mode

/usr/bin/nvidia-smi -pl 120  # optional: 120 W power limit, used on the T5500 with a GTX 970 due to power constraints

sudo chmod 755 /etc/init.d/after.local
reboot
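
After the reboot, a quick check that the settings took effect is to run nvidia-smi and confirm that Persistence-M reads On, Compute M. reads E. Process, and the power cap shows 120W on the limited card:

nvidia-smi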

This was added to all lab workstations with more than one GPU. Most of the time, CUDA_VISIBLE_DEVICES is still needed on top of this configuration to choose which GPU a given job runs on.
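
For example, to pin a single Desmond job to the second GPU (CUDA device index 1), export the variable before launching the job; run_desmond_job.sh below is only a hypothetical placeholder for the actual Desmond launch command:

export CUDA_VISIBLE_DEVICES=1
./run_desmond_job.sh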