Re: Utilising the GPU in NAMD (NVIDIA CUDA acceleration) in Windows

From: Darin Lory
Date: Thu Feb 21 2019 - 16:52:52 CST


Check out nvtop for GPU stats. I built it on my AWS EC2 P3 RHEL instances.

My notes are below. You will need cmake version 3.x. I used RHEL 7, but
the same steps are easy to adapt to Ubuntu.

Best regards,


"The most exciting phrase to hear in science, the one that heralds new
discoveries, is not 'Eureka!' (I found it!) but 'That's funny ...'"
  -Isaac Asimov

Darin S. Lory

Compiling nvtop:

git clone
mkdir -p nvtop/build && cd nvtop/build
cmake3 ..
make install
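A note on the cmake3 step: RHEL 7's stock `cmake` package is typically a 2.8.x release, which is why the commands above call `cmake3`. The sketch below is one way to pick whichever binary is new enough before building. The helper names `cmake_major` and `pick_cmake` are made up for illustration, and the assumed format of the first `--version` line ("cmake version X.Y.Z", or "cmake3 version X.Y.Z" on some packages) should be checked on your own system.

```shell
# Read `cmake --version` output on stdin and print the major version number.
# Assumption: the first line looks like "cmake version 3.4.3"; some packages
# print "cmake3 version ..." instead, which the pattern also accepts.
cmake_major() {
    sed -n '1s/^cmake[0-9]* version \([0-9][0-9]*\)\..*/\1/p'
}

# Print the name of the first cmake binary on PATH that reports version 3
# or newer. Returns non-zero if none is found.
pick_cmake() {
    for candidate in cmake3 cmake; do
        if command -v "$candidate" >/dev/null 2>&1; then
            major=$("$candidate" --version | cmake_major)
            if [ "${major:-0}" -ge 3 ]; then
                echo "$candidate"
                return 0
            fi
        fi
    done
    return 1
}
```

From inside nvtop/build this could be used as `CMAKE=$(pick_cmake) && "$CMAKE" ..`, falling back to an explicit path to a cmake 3.x binary if nothing suitable is on PATH.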

Output from git clone, cmake3, make, and make install:

[root@ip-10-246-148-209 packages]# git clone
Cloning into 'nvtop'...
remote: Enumerating objects: 93, done.
remote: Counting objects: 100% (93/93), done.
remote: Compressing objects: 100% (64/64), done.
remote: Total 496 (delta 51), reused 64 (delta 26), pack-reused 403
Receiving objects: 100% (496/496), 242.52 KiB | 0 bytes/s, done.
Resolving deltas: 100% (295/295), done.
[root@ip-10-246-148-209 packages]# mkdir -p nvtop/build && cd nvtop/build
[root@ip-10-246-148-209 build]# cmake3 ..
-- The C compiler identification is GNU 4.8.5
-- Check for working C compiler: /usr/lib64/ccache/cc
-- Check for working C compiler: /usr/lib64/ccache/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Setting build type to 'Release' as none was specified.
-- Found NVML: /usr/local/cuda-10.0/include (found version "10")
-- Looking for cbreak in /usr/lib64/
-- Looking for cbreak in /usr/lib64/ - found
-- Found Curses: /usr/lib64/
-- Performing Test compiler_has-Wall
-- Performing Test compiler_has-Wall - Success
-- Performing Test compiler_has-Wpedantic
-- Performing Test compiler_has-Wpedantic - Success
-- Performing Test compiler_has-Wextra
-- Performing Test compiler_has-Wextra - Success
-- Performing Test compiler_has-Waddress
-- Performing Test compiler_has-Waddress - Success
-- Performing Test compiler_has-Waggressive-loop-optimizations
-- Performing Test compiler_has-Waggressive-loop-optimizations - Success
-- Performing Test compiler_has-Wcast-qual
-- Performing Test compiler_has-Wcast-qual - Success
-- Performing Test compiler_has-Wcast-align
-- Performing Test compiler_has-Wcast-align - Success
-- Performing Test compiler_has-Wbad-function-cast
-- Performing Test compiler_has-Wbad-function-cast - Success
-- Performing Test compiler_has-Wmissing-declarations
-- Performing Test compiler_has-Wmissing-declarations - Success
-- Performing Test compiler_has-Wmissing-parameter-type
-- Performing Test compiler_has-Wmissing-parameter-type - Success
-- Performing Test compiler_has-Wmissing-prototypes
-- Performing Test compiler_has-Wmissing-prototypes - Success
-- Performing Test compiler_has-Wnested-externs
-- Performing Test compiler_has-Wnested-externs - Success
-- Performing Test compiler_has-Wold-style-declaration
-- Performing Test compiler_has-Wold-style-declaration - Success
-- Performing Test compiler_has-Wold-style-definition
-- Performing Test compiler_has-Wold-style-definition - Success
-- Performing Test compiler_has-Wstrict-prototypes
-- Performing Test compiler_has-Wstrict-prototypes - Success
-- Performing Test compiler_has-Wpointer-sign
-- Performing Test compiler_has-Wpointer-sign - Success
-- Performing Test compiler_has-Wdouble-promotion
-- Performing Test compiler_has-Wdouble-promotion - Success
-- Performing Test compiler_has-Wuninitialized
-- Performing Test compiler_has-Wuninitialized - Success
-- Performing Test compiler_has-Winit-self
-- Performing Test compiler_has-Winit-self - Success
-- Performing Test compiler_has-Wstrict-aliasing
-- Performing Test compiler_has-Wstrict-aliasing - Success
-- Performing Test compiler_has-Wsuggest-attribute-const
-- Performing Test compiler_has-Wsuggest-attribute-const - Success
-- Performing Test compiler_has-Wtrampolines
-- Performing Test compiler_has-Wtrampolines - Success
-- Performing Test compiler_has-Wfloat-equal
-- Performing Test compiler_has-Wfloat-equal - Success
-- Performing Test compiler_has-Wshadow
-- Performing Test compiler_has-Wshadow - Success
-- Performing Test compiler_has-Wunsafe-loop-optimizations
-- Performing Test compiler_has-Wunsafe-loop-optimizations - Success
-- Performing Test compiler_has-Wfloat-conversion
-- Performing Test compiler_has-Wfloat-conversion - Failed
-- Performing Test compiler_has-Wlogical-op
-- Performing Test compiler_has-Wlogical-op - Success
-- Performing Test compiler_has-Wnormalized
-- Performing Test compiler_has-Wnormalized - Failed
-- Performing Test compiler_has-Wdisabled-optimization
-- Performing Test compiler_has-Wdisabled-optimization - Success
-- Performing Test compiler_has-Whsa
-- Performing Test compiler_has-Whsa - Failed
-- Performing Test compiler_has-Wconversion
-- Performing Test compiler_has-Wconversion - Success
-- Performing Test compiler_has-Wunused-result
-- Performing Test compiler_has-Wunused-result - Success
-- Performing Test compiler_has-Werror-implicit-function-declaration
-- Performing Test compiler_has-Werror-implicit-function-declaration -
-- Performing Test linker_has-Wl_-z_relro
-- Performing Test linker_has-Wl_-z_relro - Success
-- Performing Test sanitizer-address-available
-- Performing Test sanitizer-address-available - Failed
-- Performing Test sanitizer-undefined-available
-- Performing Test sanitizer-undefined-available - Failed
-- Configuring done
-- Generating done
-- Build files have been written to: /apps/packages/nvtop/build
[root@ip-10-246-148-209 build]# pwd
[root@ip-10-246-148-209 build]# make
Scanning dependencies of target nvtop
[ 12%] Building C object src/CMakeFiles/nvtop.dir/nvtop.c.o
[ 25%] Building C object src/CMakeFiles/nvtop.dir/interface.c.o
[ 37%] Building C object
[ 50%] Building C object src/CMakeFiles/nvtop.dir/get_process_info_linux.c.o
[ 62%] Building C object src/CMakeFiles/nvtop.dir/extract_gpuinfo.c.o
[ 75%] Building C object src/CMakeFiles/nvtop.dir/time.c.o
[ 87%] Building C object src/CMakeFiles/nvtop.dir/plot.c.o
[100%] Linking C executable nvtop
[100%] Built target nvtop
[root@ip-10-246-148-209 build]# make install
[100%] Built target nvtop
Install the project...
-- Install configuration: "Release"
-- Installing: /usr/local/share/man/man1/nvtop.1
-- Installing: /usr/local/bin/nvtop
-- Set runtime path of "/usr/local/bin/nvtop" to "/usr/local/lib"
[root@ip-10-246-148-209 build]# which nvtop


NVIDIA commands:

nvidia-smi -q -g 0 -d UTILIZATION -l
nvidia-smi -a -q
nvidia-smi -i 3 -l -q -d
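For a quick scripted check instead of the full `-q` dump, nvidia-smi also has a CSV query mode (`--query-gpu=... --format=csv,noheader,nounits`). The sketch below filters that output for GPUs that look idle. The function name `idle_gpus` and the 5% default threshold are arbitrary choices for illustration, and the input is assumed to be one "index, utilization, memory" line per GPU.

```shell
# Print the index of every GPU whose utilization is below a threshold.
# Reads "index, utilization_percent, memory_used" lines on stdin, e.g.
# "0, 1, 120". The threshold (percent) is the first argument, default 5.
# Caveat: a non-numeric utilization field (some boards report the value
# as unsupported) is treated as 0 and is therefore listed as idle.
idle_gpus() {
    threshold="${1:-5}"
    awk -F', *' -v t="$threshold" '$2 + 0 < t + 0 { print $1 }'
}

# Intended use on a machine with an NVIDIA driver installed:
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used \
#              --format=csv,noheader,nounits | idle_gpus 5
```

Running this while a simulation is in progress gives a one-line answer to "is any GPU sitting idle?", which is the symptom reported in the original question.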

yum install ncurses-devel
cd /apps
mkdir nvtop
cd nvtop
git clone
mkdir -p nvtop/build && cd nvtop/build
/data/apps/build/cmake/cmake-3.4.3-Linux-x86_64/bin/cmake ..
make install
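The cmake output above reports NVML under /usr/local/cuda-10.0/include and Curses under /usr/lib64/, so a build on a fresh machine needs both the CUDA headers and ncurses-devel in place. A small pre-check along these lines can save a failed configure. `find_header` is a hypothetical helper, and the directories in the usage note are guesses based on the paths in that output.

```shell
# Print the first directory that contains the given header file.
# Usage: find_header HEADER DIR [DIR...]
# Returns non-zero (and prints nothing) if no directory has it.
find_header() {
    header="$1"
    shift
    for dir in "$@"; do
        if [ -f "$dir/$header" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}
```

For example, `find_header curses.h /usr/include || echo "install ncurses-devel first"` and `find_header nvml.h /usr/local/cuda*/include || echo "CUDA headers not found"` before running cmake.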



On Thu, Feb 21, 2019 at 1:25 PM Vermaas, Joshua wrote:

> And the logfile. The top usually has stuff like this (important parts in
> bold) if the GPU is actually being detected and is being used:
> Charm++: standalone mode (not using charmrun)
> Charm++> Running in Multicore mode: 36 threads
> Charm++> Using recursive bisection (scheme 3) for topology aware partitions
> Converse/Charm++ Commit ID:
> v6.8.2-0-g26d4bd8-namd-charm-6.8.2-build-2018-Jan-11-30463
> CharmLB> Load balancer assumes all CPUs are same.
> Charm++> Running on 1 unique compute nodes (36-way SMP).
> Charm++> cpu topology info is gathered in 0.002 seconds.
> *Info: Built with CUDA version 9010*
> Did not find +devices i,j,k,... argument, using all
> Pe 0 physical rank 0 will use CUDA device of pe 16
> Pe 8 physical rank 8 will use CUDA device of pe 16
> Pe 24 physical rank 24 will use CUDA device of pe 32
> Pe 30 physical rank 30 will use CUDA device of pe 32
> Pe 23 physical rank 23 will use CUDA device of pe 32
> Pe 25 physical rank 25 will use CUDA device of pe 32
> Pe 7 physical rank 7 will use CUDA device of pe 16
> Pe 14 physical rank 14 will use CUDA device of pe 16
> Pe 9 physical rank 9 will use CUDA device of pe 16
> Pe 27 physical rank 27 will use CUDA device of pe 32
> Pe 1 physical rank 1 will use CUDA device of pe 16
> Pe 13 physical rank 13 will use CUDA device of pe 16
> Pe 20 physical rank 20 will use CUDA device of pe 32
> Pe 19 physical rank 19 will use CUDA device of pe 32
> Pe 15 physical rank 15 will use CUDA device of pe 16
> Pe 17 physical rank 17 will use CUDA device of pe 16
> Pe 4 physical rank 4 will use CUDA device of pe 16
> Pe 26 physical rank 26 will use CUDA device of pe 32
> Pe 5 physical rank 5 will use CUDA device of pe 16
> Pe 31 physical rank 31 will use CUDA device of pe 32
> Pe 21 physical rank 21 will use CUDA device of pe 32
> Pe 28 physical rank 28 will use CUDA device of pe 32
> Pe 22 physical rank 22 will use CUDA device of pe 32
> Pe 3 physical rank 3 will use CUDA device of pe 16
> Pe 29 physical rank 29 will use CUDA device of pe 32
> Pe 34 physical rank 34 will use CUDA device of pe 32
> Pe 6 physical rank 6 will use CUDA device of pe 16
> Pe 18 physical rank 18 will use CUDA device of pe 32
> Pe 2 physical rank 2 will use CUDA device of pe 16
> Pe 10 physical rank 10 will use CUDA device of pe 16
> Pe 33 physical rank 33 will use CUDA device of pe 32
> Pe 35 physical rank 35 will use CUDA device of pe 32
> Pe 11 physical rank 11 will use CUDA device of pe 16
> Pe 12 physical rank 12 will use CUDA device of pe 16
> *Pe 16 physical rank 16 binding to CUDA device 0 on r103u01: 'Tesla
> V100-PCIE-16GB' Mem: 16130MB Rev: 7.0 PCI: 0:37:0*
> *Pe 32 physical rank 32 binding to CUDA device 1 on r103u01: 'Tesla
> V100-PCIE-16GB' Mem: 16130MB Rev: 7.0 PCI: 0:86:0*
> *Info: NAMD 2.13 for Linux-x86_64-multicore-CUDA*
> If these sorts of messages aren't coming up, you'll need to do some
> debugging to actually figure out what's what.
> -Josh
> On 2019-02-21 08:38:14-07:00 wrote:
> Do you have CUDA installed with a requisite NVIDIA driver and are running
> the NAMD CUDA version? If so, what are the outputs of `nvidia-smi`?
> ------------------------------
> *From:* Denish Poudyal
> *Sent:* Thursday, February 21, 2019 8:59:29 AM
> *To:* NAMD list
> *Subject:* namd-l: Utilising the GPU in NAMD (NVIDIA CUDA acceleration)
> in Windows
> I have a system with an NVIDIA Quadro K420 GPU and 12 CPU cores. When I
> run a command like
> namd2 +idlepoll +p10 <.conf file>
> I still see GPU usage around 1% and CPU usage around 90%. How can I make
> NAMD use the GPU in this simulation? We don't have GTX cards, so we are
> trying to use what we've got. What is missing to get the Quadro involved
> in this NAMD simulation?
> *Denish Poudyal*
> *CDPTU, Nepal*
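To automate the check Josh describes, a NAMD log can be scanned for the two kinds of lines he highlights: the "Built with CUDA version" line and the "binding to CUDA device" lines. This is a minimal sketch; `namd_uses_gpu` is a made-up name, and matching on these exact phrases is an assumption based only on the NAMD 2.13 log quoted above.

```shell
# Read a NAMD log on stdin, print a one-line summary, and succeed only if
# the log shows a CUDA build that bound at least one device.
# Assumption: the phrases matched below are taken from the quoted 2.13 log
# and may differ in other NAMD versions.
namd_uses_gpu() {
    awk '
        /Info: Built with CUDA version/ { built = 1 }
        /binding to CUDA device/        { bound++ }
        END {
            printf "cuda_build=%d binding_lines=%d\n", built, bound
            exit !(built && bound > 0)
        }
    '
}
```

With the output of `namd2 +idlepoll +p10 <.conf file>` redirected to a file (say run.log, a name chosen here for illustration), `namd_uses_gpu < run.log || echo "no GPU in use"` flags the case from the original question: a log with no CUDA lines usually means the non-CUDA NAMD binary is being run.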
