
GPU nvidia-smi and Schrodinger

posted Jul 13, 2016, 9:36 AM by Dong Xu   [ updated Aug 11, 2016, 10:10 PM ]

Schrodinger:

First, ensure that all your cards have ECC memory turned off, are set to Exclusive Process compute mode, and have persistence mode turned on.

  1. Disable ECC by running the following command as root:

    nvidia-smi -e 0
    
  2. Reboot the machine. This disables ECC on all the GPUs. Verify that ECC has been disabled on every GPU by running nvidia-smi.
  3. Set the compute mode of the GPUs to "exclusive process" at boot by adding the following line to /etc/rc.local:
    nvidia-smi -c 3
    

    On some systems this command does not work for all cards at once; in that case, you must set each card individually:

    nvidia-smi -c 3 -i 0
    nvidia-smi -c 3 -i 1
    ...
    
  4. Set persistence mode to allow quick startup of CUDA:
    nvidia-smi -pm 1
    

    This line should also be added to /etc/rc.local to ensure persistence mode is set on every boot (a consolidated rc.local sketch covering steps 3-6 appears after step 6).

  5. Ensure that /dev/nvidiactl is created. You may need to add the line below to /etc/rc.local:
    modprobe nvidia
    
  6. Ensure that users can read and write to the /dev/nvidia* devices:
    chmod 666 /dev/nvidia*
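
For convenience, the boot-time commands from steps 3-6 can be collected in /etc/rc.local. The following is a minimal sketch only; it assumes rc.local runs as root at boot on your distribution, and it includes the chmod from step 6 on the assumption that device permissions should also be reset at every boot:

    # /etc/rc.local additions (run as root at boot) -- minimal sketch
    modprobe nvidia          # make sure /dev/nvidiactl and /dev/nvidia* are created (step 5)
    nvidia-smi -c 3          # exclusive-process compute mode (step 3); use -c 3 -i N per card if needed
    nvidia-smi -pm 1         # persistence mode on (step 4)
    chmod 666 /dev/nvidia*   # let users read and write the devices (step 6)

After the reboot in step 2, the ECC, compute-mode, and persistence settings can be checked in one pass (the field names assume a reasonably recent driver):

    nvidia-smi --query-gpu=index,ecc.mode.current,compute_mode,persistence_mode --format=csv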
    

Next, set up and configure the software.

  1. Install the Schrödinger software, release 2013-1 or later. Only the Linux-x86_64 software is supported for GPU use.
  2. Ensure that the GPU cards are recognized by the Schrödinger software by running:
    $SCHRODINGER/utilities/query_gpgpu
    
  3. Install a queueing system, if one is not already installed. A queueing system is required for FEP jobs and is highly recommended for other jobs.
  4. Add explicit entries to the hosts file, schrodinger.hosts, for running on GPUs. See Article 1844 for detailed instructions and examples; a rough illustrative entry is sketched below.
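
As a rough illustration only, a GPU host entry might look something like the sketch below; the host name, processor count, and the gpgpu line are assumptions made for this example, so take the authoritative key names and syntax from Article 1844:

    # Hypothetical schrodinger.hosts entry for a GPU host -- verify the syntax against Article 1844
    name: gpu-node1
    host: gpu-node1
    processors: 12
    gpgpu: 0, Tesla K40
    env: SCHRODINGER_CUDA_VISIBLE_DEVICES="0"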

Important Notes

  • When a GPU card is placed into exclusive mode, it can only serve one job at a time. Overloading the machine with more jobs than available GPUs will cause some of the jobs to crash.
  • If the machine you are setting up has a single GPU, you must make an important choice between two modes:
    • Set the GPU compute mode to Default (nvidia-smi -c 0). This will allow Maestro and Desmond to share the GPU, but at significantly reduced performance.
    • Set the GPU compute mode to Exclusive Process (nvidia-smi -c 3) as described above. You may run Maestro or Desmond, but not both simultaneously. You will need to launch Desmond jobs from the command line after exiting Maestro.
  • If the machine has an extra card that is reserved for display, you may want to explicitly list the devices to be considered in the computation. To do this, add a line to the hosts file entry like the following example, which does not include device 0:
    env: SCHRODINGER_CUDA_VISIBLE_DEVICES="1, 2, 3"
    

=======================================================================

Also see https://sourceforge.net/p/xcat/wiki/xCAT_P8LE_cuda_installing/

Persistence Mode

On Linux, you can set GPUs to persistence mode to keep the NVIDIA driver loaded even when no applications are accessing the cards. This is particularly useful when you have a series of short jobs running. Persistence mode uses more power, but prevents the fairly long delays that occur each time a GPU application is started. It is also necessary if you’ve assigned specific clock speeds or power limits to the GPUs (as those changes are lost when the NVIDIA driver is unloaded). Enable persistence mode on all GPUs by running:
nvidia-smi -pm 1
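
To confirm the setting, the persistence flag can be read back for each GPU (the field name assumes a reasonably recent driver):

nvidia-smi --query-gpu=index,persistence_mode --format=csv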

On Windows, nvidia-smi is not able to set persistence mode. Instead, you need to set your computational GPUs to TCC mode. This should be done through NVIDIA’s graphical GPU device management panel.

Querying GPU Status

Microway’s GPU Test Drive cluster, which we provide as a benchmarking service to our customers, contains a group of NVIDIA’s latest Tesla GPUs. These are NVIDIA’s high-performance compute GPUs and provide a good deal of health and status information. The examples below are taken from this internal cluster.

To list all available NVIDIA devices, run:

[root@md ~]# nvidia-smi -L
GPU 0: Tesla K40m (UUID: GPU-d0e093a0-c3b3-f458-5a55-6eb69fxxxxxx)
GPU 1: Tesla K40m (UUID: GPU-d105b085-7239-3871-43ef-975ecaxxxxxx)

To list certain details about each GPU, try:

[root@md ~]# nvidia-smi --query-gpu=index,name,uuid,serial --format=csv
0, Tesla K40m, GPU-d0e093a0-c3b3-f458-5a55-6eb69fxxxxxx, 0323913xxxxxx
1, Tesla K40m, GPU-d105b085-7239-3871-43ef-975ecaxxxxxx, 0324214xxxxxx
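
The same query interface also works for lightweight periodic monitoring. A minimal sketch (the fields are standard --query-gpu properties; the 5-second interval is arbitrary):

# Print utilization, memory use and temperature for every GPU, refreshing every 5 seconds (Ctrl-C to stop)
nvidia-smi --query-gpu=timestamp,index,utilization.gpu,utilization.memory,memory.used,memory.total,temperature.gpu --format=csv -l 5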

Monitoring and Managing GPU Boost

The GPU Boost feature, which NVIDIA has included with more recent GPUs, allows the GPU clocks to vary depending upon load (achieving maximum performance so long as power and thermal headroom are available). However, the amount of available headroom will vary by application (and even by input file!), so users and administrators should keep an eye on the status of the GPUs.

A listing of available clock speeds can be shown for each GPU (in this case, the Tesla K80):

nvidia-smi -q -d SUPPORTED_CLOCKS

GPU 0000:04:00.0
    Supported Clocks
        Memory                      : 2505 MHz
            Graphics                : 875 MHz
            Graphics                : 862 MHz
            Graphics                : 849 MHz
            Graphics                : 836 MHz
            Graphics                : 823 MHz
            Graphics                : 810 MHz
            Graphics                : 797 MHz
            Graphics                : 784 MHz
            Graphics                : 771 MHz
            Graphics                : 758 MHz
            Graphics                : 745 MHz
            Graphics                : 732 MHz
            Graphics                : 719 MHz
            Graphics                : 705 MHz
            Graphics                : 692 MHz
            Graphics                : 679 MHz
            Graphics                : 666 MHz
            Graphics                : 653 MHz
            Graphics                : 640 MHz
            Graphics                : 627 MHz
            Graphics                : 614 MHz
            Graphics                : 601 MHz
            Graphics                : 588 MHz
            Graphics                : 575 MHz
            Graphics                : 562 MHz
        Memory                      : 324 MHz
            Graphics                : 324 MHz

The above output indicates that only two memory clock speeds are supported (2505 MHz and 324 MHz). With the memory running at 2505 MHz, there are 25 supported GPU clock speeds. With the memory running at 324 MHz, only a single GPU clock speed is supported (which is the idle GPU state). On the Tesla K80, GPU Boost automatically manages these speeds and runs as fast as possible. On other models, such as Tesla K40, the administrator must specifically select the desired GPU clock speed.
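
On those cards, a supported memory/graphics pair can be applied with the -ac option listed near the end of this page (this typically requires root; the values below are simply the fastest pair from the K80 listing above and are illustrative only, so pick a pair from your own card's SUPPORTED_CLOCKS output):

# Pin GPU 0 to 2505 MHz memory / 875 MHz graphics application clocks
nvidia-smi -i 0 -ac 2505,875

# Return GPU 0 to its default application clocks
nvidia-smi -i 0 -rac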

To review the current GPU clock speed, default clock speed, and maximum possible clock speed, run:

nvidia-smi -q -d CLOCK

GPU 0000:04:00.0
    Clocks
        Graphics                    : 875 MHz
        SM                          : 875 MHz
        Memory                      : 2505 MHz
    Applications Clocks
        Graphics                    : 875 MHz
        Memory                      : 2505 MHz
    Default Applications Clocks
        Graphics                    : 562 MHz
        Memory                      : 2505 MHz
    Max Clocks
        Graphics                    : 875 MHz
        SM                          : 875 MHz
        Memory                      : 2505 MHz
    SM Clock Samples
        Duration                    : 3730.56 sec
        Number of Samples           : 8
        Max                         : 875 MHz
        Min                         : 324 MHz
        Avg                         : 873 MHz
    Memory Clock Samples
        Duration                    : 3730.56 sec
        Number of Samples           : 8
        Max                         : 2505 MHz
        Min                         : 324 MHz
        Avg                         : 2500 MHz
    Clock Policy
        Auto Boost                  : On
        Auto Boost Default          : On

Ideally, you’d like all clocks to be running at the highest speed all the time. However, this will not be possible for all applications. To review the current state of each GPU and any reasons for clock slowdowns, use the PERFORMANCE flag:

nvidia-smi -q -d PERFORMANCE

GPU 0000:04:00.0
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
        Unknown                     : Not Active

If any of the GPU clocks is running at a slower speed, one or more of the above Clocks Throttle Reasons will be marked as active. The most concerning condition would be if HW Slowdown or Unknown are active, as these would most likely indicate a power or cooling issue. The remaining conditions typically indicate that the card is idle or has been manually set into a slower mode by a system administrator.
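
To watch for these conditions while a job is running, the same query can simply be repeated on an interval:

# Re-check the throttle reasons every 10 seconds (Ctrl-C to stop)
nvidia-smi -q -d PERFORMANCE -l 10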

Reviewing System Topology

To take full advantage of more advanced NVIDIA GPU features (such as GPU Direct), it is often vital that the system topology be properly configured. The topology refers to how the PCI-Express devices (GPUs, InfiniBand HCAs, storage controllers, etc.) connect to each other and to the system’s CPUs. If the topology is wrong, certain features may slow down or even stop working altogether. To help answer such questions, recent versions of nvidia-smi include an experimental system topology view:

nvidia-smi topo --matrix

        GPU0    GPU1    GPU2    GPU3    mlx4_0  CPU Affinity
GPU0     X      PIX     PHB     PHB     PHB     0-11
GPU1    PIX      X      PHB     PHB     PHB     0-11
GPU2    PHB     PHB      X      PIX     PHB     0-11
GPU3    PHB     PHB     PIX      X      PHB     0-11
mlx4_0  PHB     PHB     PHB     PHB      X 

Legend:

  X   = Self
  SOC = Path traverses a socket-level link (e.g. QPI)
  PHB = Path traverses a PCIe host bridge
  PXB = Path traverses multiple PCIe internal switches
  PIX = Path traverses a PCIe internal switch

Reading this output takes some getting used to, but it can be very valuable. The above configuration shows two Tesla K80 GPUs and one Mellanox FDR InfiniBand HCA all connected to the first CPU of a server. Because the CPUs are 12-core Xeons, the topology tool recommends that jobs be assigned to the first 12 CPU cores (although this will vary by application). Get in touch with one of our HPC GPU experts if you have questions on this topic.
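
As one example of acting on that recommendation, a job intended for GPU0 could be bound to that GPU's local cores with taskset; the application name below is just a placeholder:

# Run a hypothetical GPU application on GPU0, bound to the CPU cores local to it (0-11 in the matrix above)
CUDA_VISIBLE_DEVICES=0 taskset -c 0-11 ./my_gpu_app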

Printing all GPU Details

To list all available data on a particular GPU, specify the ID of the card with -i. Here’s the output from an older Tesla GPU card:

nvidia-smi -i 0 -q

==============NVSMI LOG==============

Timestamp                       : Mon Dec  5 22:05:49 2011

Driver Version                  : 270.41.19

Attached GPUs                   : 2

GPU 0:2:0
    Product Name                : Tesla M2090
    Display Mode                : Disabled
    Persistence Mode            : Disabled
    Driver Model
        Current                 : N/A
        Pending                 : N/A
    Serial Number               : 032251100xxxx
    GPU UUID                    : GPU-2b1486407f70xxxx-98bdxxxx-660cxxxx-1d6cxxxx-9fbd7e7cd9bf55a7cfb2xxxx
    Inforom Version
        OEM Object              : 1.1
        ECC Object              : 2.0
        Power Management Object : 4.0
    PCI
        Bus                     : 2
        Device                  : 0
        Domain                  : 0
        Device Id               : 109110DE
        Bus Id                  : 0:2:0
    Fan Speed                   : N/A
    Memory Usage
        Total                   : 5375 Mb
        Used                    : 9 Mb
        Free                    : 5365 Mb
    Compute Mode                : Default
    Utilization
        Gpu                     : 0 %
        Memory                  : 0 %
    Ecc Mode
        Current                 : Enabled
        Pending                 : Enabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Total           : 0
            Double Bit            
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Total           : 0
        Aggregate
            Single Bit            
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Total           : 0
            Double Bit            
                Device Memory   : 0
                Register File   : 0
                L1 Cache        : 0
                L2 Cache        : 0
                Total           : 0
    Temperature
        Gpu                     : N/A
    Power Readings
        Power State             : P12
        Power Management        : Supported
        Power Draw              : 31.57 W
        Power Limit             : 225 W
    Clocks
        Graphics                : 50 MHz
        SM                      : 100 MHz
        Memory                  : 135 MHz

The above example shows an idle card. Here is an excerpt for a card running GPU-accelerated AMBER:

nvidia-smi -i 0 -q -d MEMORY,UTILIZATION,POWER,CLOCK,COMPUTE

==============NVSMI LOG==============

Timestamp                       : Mon Dec  5 22:32:00 2011

Driver Version                  : 270.41.19

Attached GPUs                   : 2

GPU 0:2:0
    Memory Usage
        Total                   : 5375 Mb
        Used                    : 1904 Mb
        Free                    : 3470 Mb
    Compute Mode                : Default
    Utilization
        Gpu                     : 67 %
        Memory                  : 42 %
    Power Readings
        Power State             : P0
        Power Management        : Supported
        Power Draw              : 109.83 W
        Power Limit             : 225 W
    Clocks
        Graphics                : 650 MHz
        SM                      : 1301 MHz
        Memory                  : 1848 MHz

You’ll notice that, unfortunately, the earlier passively-cooled M-series Tesla GPUs do not report temperatures to nvidia-smi. More recent Quadro and Tesla GPUs report a much richer set of metrics:

==============NVSMI LOG==============

Timestamp                           : Tue Apr  7 13:01:34 2015
Driver Version                      : 346.46

Attached GPUs                       : 2
GPU 0000:05:00.0
    Product Name                    : Tesla K80
    Product Brand                   : Tesla
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Enabled
    Accounting Mode Buffer Size     : 128
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : 0324614xxxxxx
    GPU UUID                        : GPU-81dexxxx-87xx-4axx-79xx-3ddf4dxxxxxx
    Minor Number                    : 0
    VBIOS Version                   : 80.21.1B.00.01
    MultiGPU Board                  : Yes
    Board ID                        : 0x300
    Inforom Version
        Image Version               : 2080.0200.00.04
        OEM Object                  : 1.1
        ECC Object                  : 3.0
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    PCI
        Bus                         : 0x05
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x102D10DE
        Bus Id                      : 0000:05:00.0
        Sub System Id               : 0x106C10DE
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 1
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : PLX
            Firmware                : 0xF0472900
        Replays since reset         : 0
        Tx Throughput               : N/A
        Rx Throughput               : N/A
    Fan Speed                       : N/A
    Performance State               : P8
    Clocks Throttle Reasons
        Idle                        : Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
        Unknown                     : Not Active
    FB Memory Usage
        Total                       : 12287 MiB
        Used                        : 56 MiB
        Free                        : 12231 MiB
    BAR1 Memory Usage
        Total                       : 16384 MiB
        Used                        : 2 MiB
        Free                        : 16382 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Ecc Mode
        Current                     : Disabled
        Pending                     : Disabled
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Total               : N/A
        Aggregate
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Total               : N/A
    Retired Pages
        Single Bit ECC              : 0
        Double Bit ECC              : 0
        Pending                     : No
    Temperature
        GPU Current Temp            : 34 C
        GPU Shutdown Temp           : 93 C
        GPU Slowdown Temp           : 88 C
    Power Readings
        Power Management            : Supported
        Power Draw                  : 25.65 W
        Power Limit                 : 149.00 W
        Default Power Limit         : 149.00 W
        Enforced Power Limit        : 149.00 W
        Min Power Limit             : 100.00 W
        Max Power Limit             : 175.00 W
    Clocks
        Graphics                    : 324 MHz
        SM                          : 324 MHz
        Memory                      : 324 MHz
    Applications Clocks
        Graphics                    : 875 MHz
        Memory                      : 2505 MHz
    Default Applications Clocks
        Graphics                    : 562 MHz
        Memory                      : 2505 MHz
    Max Clocks
        Graphics                    : 875 MHz
        SM                          : 875 MHz
        Memory                      : 2505 MHz
    Clock Policy
        Auto Boost                  : On
        Auto Boost Default          : On
    Processes                       : None

Of course, we haven’t covered all the possible uses of the nvidia-smi tool. To read the full list of options, run nvidia-smi -h (it’s fairly lengthy). If you need to change settings on your cards, you’ll want to look at the device modification section:

    -pm,  --persistence-mode=   Set persistence mode: 0/DISABLED, 1/ENABLED
    -e,   --ecc-config=         Toggle ECC support: 0/DISABLED, 1/ENABLED
    -p,   --reset-ecc-errors=   Reset ECC error counts: 0/VOLATILE, 1/AGGREGATE
    -c,   --compute-mode=       Set MODE for compute applications:
                                0/DEFAULT, 1/EXCLUSIVE_THREAD,
                                2/PROHIBITED, 3/EXCLUSIVE_PROCESS
          --gom=                Set GPU Operation Mode:
                                    0/ALL_ON, 1/COMPUTE, 2/LOW_DP
    -r    --gpu-reset           Trigger reset of the GPU.
                                Can be used to reset the GPU HW state in situations
                                that would otherwise require a machine reboot.
                                Typically useful if a double bit ECC error has
                                occurred.
                                Reset operations are not guaranteed to work in
                                all cases and should be used with caution.
                                --id= switch is mandatory for this switch
    -ac   --applications-clocks= Specifies <memory,graphics> clocks as a
                                    pair (e.g. 2000,800) that defines GPU's
                                    speed in MHz while running applications on a GPU.
    -rac  --reset-applications-clocks
                                Resets the applications clocks to the default values.
    -acp  --applications-clocks-permission=
                                Toggles permission requirements for -ac and -rac commands:
                                0/UNRESTRICTED, 1/RESTRICTED
    -pl   --power-limit=        Specifies maximum power management limit in watts.
    -am   --accounting-mode=    Enable or disable Accounting Mode: 0/DISABLED, 1/ENABLED
    -caa  --clear-accounted-apps
                                Clears all the accounted PIDs in the buffer.
          --auto-boost-default= Set the default auto boost policy to 0/DISABLED
                                or 1/ENABLED, enforcing the change only after the
                                last boost client has exited.
          --auto-boost-permission=
                                Allow non-admin/root control over auto boost mode:
                                0/UNRESTRICTED, 1/RESTRICTED
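
As a quick example of using these modification options, the power limit of a single card can be capped (run as root; 140 W is an arbitrary value that must fall between the card's Min and Max Power Limit shown earlier, and the change is lost when the driver unloads unless persistence mode is on):

# Cap GPU 0 at 140 W
nvidia-smi -i 0 -pl 140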

With this tool, checking the status and health of NVIDIA GPUs is simple. If you’re looking to monitor the cards over time, then nvidia-smi might be more resource-intensive than you’d like. For that, have a look at NVIDIA’s GPU Management Library (NVML), which offers C, Perl and Python bindings. Commonly-used cluster tools, such as Ganglia, use these bindings to query GPU status.

