Re: FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 0 (thomasASUS): CUDA driver version is insufficient for CUDA runtime version

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Mon Oct 22 2012 - 01:01:02 CDT

Hi Thomas,

 

as NAMD is only partly ported to the GPU, it needs to switch between GPU and
CPU at every timestep. To reduce this switching, you can use a higher value
for fullElectFrequency, for instance 4, so that NAMD stays on the GPU for 4
steps before returning to the CPU to do the full electrostatics. This will
harm energy conservation and comes with a slight drift in temperature, but
the drift can be controlled with a low-damping Langevin thermostat.

 

Even without this hack, there should be a speedup of about 2-3 times
compared to CPU only, and about 5-10 times with it. As you have a mobile
chipset, you should check the following things:

 

1. Make sure the GPU is allowed to run in performance mode rather than
energy-saving mode. (nvidia-smi)

2. Make sure it's running on PCIe 2.0 or higher. (nvidia-smi)

3. Compare the timings for increasing numbers of CPU cores, with and
without the GPU.

This will show whether you oversubscribe the GPU or the PCIe bus.

4. Are you really sure that your notebook has 8 physical cores?

It doesn't make much sense to oversubscribe the GPU with HT cores.

5. Why do you need to set the +devices?
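
The scan suggested in point 3 could be scripted roughly as follows. This is
only a sketch: it builds the command lines, using the flags from the command
quoted below, and leaves running them to you. The function name
scan_commands and the install paths are made up for illustration; point them
at your own CUDA and non-CUDA NAMD directories.

```python
def scan_commands(namd_home, config, max_cores, cuda=True):
    """Build one charmrun command line per core count from 1 to max_cores.

    namd_home, config and max_cores are placeholders for your own setup.
    The GPU flags (+idlepoll +devices 0) are only added when cuda is True.
    """
    commands = []
    for n in range(1, max_cores + 1):
        cmd = [f"{namd_home}/charmrun", "++local", f"+p{n}",
               f"{namd_home}/namd2"]
        if cuda:
            cmd += ["+idlepoll", "+devices", "0"]
        cmd.append(config)
        commands.append(cmd)
    return commands


# Hypothetical install locations; replace with the real ones.
for home, cuda in (("/opt/namd-multicore", False), ("/opt/namd-cuda", True)):
    for cmd in scan_commands(home, "production.namd", 4, cuda=cuda):
        print(" ".join(cmd))
```

Run each command in turn and note the "Info: Benchmark time:" line it
prints. If the s/step value stops improving, or gets worse, as cores are
added in the GPU runs but not in the CPU-only runs, the GPU or the PCIe bus
is probably the bottleneck.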

 

Let us know

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
Behalf Of Thomas Evangelidis
Sent: Saturday, 20 October 2012 18:47
To: namd-l
Subject: Re: namd-l: FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 0
(thomasASUS): CUDA driver version is insufficient for CUDA runtime version

 

Hi again,

I managed to install the latest NVIDIA drivers (NVIDIA-Linux-x86_64-304.51)
and the latest production CUDA-5.0 release on my Asus N56V with i7-3610QM
and GeForce GT 650M. The trick for NAMD to find my GPU was to explicitly
pass "+devices 0" on the command line. The whole command line looked like
this:

${NAMD_HOME}/charmrun ++local +p8 ${NAMD_HOME}/namd2 +idlepoll +devices 0
production_default.amberff.octahedron.namd

I used the precompiled binaries NAMD_CVS-2012-09-22_Linux-x86_64-multicore
and NAMD_CVS-2012-09-22_Linux-x86_64-multicore-CUDA to compare the
performance on my system, which is a truncated octahedron containing a
protein (the force field I use is Amber99SB-NMR1-ILDN; 2796 atoms), 131788
TIP4P-Ew water atoms (32947 waters; each TIP4P-Ew counts as 4 atoms in the
Amber .prmtop), 93 Na and 113 Cl ions, i.e. 131788+93+113+2796=134790 atoms
in total. Surprisingly, the performance without the GPU is better, as you
can see below.

With the GPU:
Info: Benchmark time: 8 CPUs 0.238132 s/step 1.37808 days/ns 359.961 MB memory

Without the GPU:
Info: Benchmark time: 8 CPUs 0.206626 s/step 1.19575 days/ns 720.852 MB memory

The only case I get better performance with the GPU is when I run NAMD in
serial mode:

With the GPU:
Info: Benchmark time: 1 CPUs 0.26001 s/step 1.50469 days/ns 256.984 MB memory

Without the GPU:
Info: Benchmark time: 1 CPUs 0.808154 s/step 4.67682 days/ns 504.398 MB memory
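
To put numbers on the comparison, the benchmark lines above can be parsed
and turned into a GPU speedup factor. This is a small sketch, not part of
NAMD; the regular expression only assumes the "Info: Benchmark time:" format
shown in this message, and the names parse_benchmark and gpu_speedup are
made up here.

```python
import re

# Matches the part of a NAMD benchmark line up to the days/ns value.
BENCH = re.compile(
    r"Info: Benchmark time: (\d+) CPUs ([\d.]+) s/step ([\d.]+) days/ns"
)


def parse_benchmark(line):
    """Return (cpus, seconds_per_step, days_per_ns), or None if no match."""
    m = BENCH.search(line)
    if m is None:
        return None
    return int(m.group(1)), float(m.group(2)), float(m.group(3))


def gpu_speedup(cpu_line, gpu_line):
    """CPU-only s/step divided by GPU s/step; below 1 means the GPU run was slower."""
    return parse_benchmark(cpu_line)[1] / parse_benchmark(gpu_line)[1]


# Benchmark lines copied from the runs above.
gpu_8 = "Info: Benchmark time: 8 CPUs 0.238132 s/step 1.37808 days/ns 359.961 MB memory"
cpu_8 = "Info: Benchmark time: 8 CPUs 0.206626 s/step 1.19575 days/ns 720.852 MB memory"
gpu_1 = "Info: Benchmark time: 1 CPUs 0.26001 s/step 1.50469 days/ns 256.984 MB memory"
cpu_1 = "Info: Benchmark time: 1 CPUs 0.808154 s/step 4.67682 days/ns 504.398 MB memory"

print(f"8 processes: {gpu_speedup(cpu_8, gpu_8):.2f}x")
print(f"1 process:   {gpu_speedup(cpu_1, gpu_1):.2f}x")
```

With these figures the single-process run gains roughly a factor of three
from the GPU, while the 8-process run is somewhat slower with the GPU than
without it.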

For the apoa1 benchmark, NAMD complained about "++local", so I used the
following command line:

${NAMD_HOME}//charmrun +p8 ${NAMD_HOME}//namd2 +idlepoll +devices 0
apoa1.namd

This time the performance was almost the same with and without the GPU:

With the GPU:
Info: Benchmark time: 8 CPUs 0.22935 s/step 2.65451 days/ns 280.375 MB memory

Without the GPU:
Info: Benchmark time: 8 CPUs 0.223781 s/step 2.59006 days/ns 696.184 MB memory
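
Note that the s/step values here are close to those of the octahedron
system, while the days/ns values are about twice as large. The relation
between the two columns can be sketched as below. The timestep is not stated
anywhere in this message; 1 fs for apoa1 and 2 fs for the octahedron system
are inferred from the reported numbers, so treat them as assumptions. The
function name days_per_ns is made up for this example.

```python
def days_per_ns(seconds_per_step, timestep_fs):
    """Convert wall-clock seconds per MD step into days per simulated ns."""
    steps_per_ns = 1_000_000 / timestep_fs  # 1 ns = 1,000,000 fs
    return seconds_per_step * steps_per_ns / 86_400  # 86,400 s per day


# s/step values copied from the GPU benchmark lines above.
# The timesteps are assumptions inferred from the reported days/ns.
print(f"apoa1, 1 fs:      {days_per_ns(0.22935, 1.0):.5f} days/ns")
print(f"octahedron, 2 fs: {days_per_ns(0.238132, 2.0):.5f} days/ns")
```

When comparing systems or settings, s/step is therefore the safer column to
look at, since days/ns also depends on the timestep.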

Is there any parameter I can tweak to get better GPU performance for my
system? Below is the GPU assignment when I run on all available cores.

Pe 7 physical rank 7 will use CUDA device of pe 4
Pe 2 physical rank 2 will use CUDA device of pe 4
Pe 3 physical rank 3 will use CUDA device of pe 4
Pe 6 physical rank 6 will use CUDA device of pe 4
Pe 1 physical rank 1 will use CUDA device of pe 4
Pe 5 physical rank 5 will use CUDA device of pe 4
Pe 4 physical rank 4 binding to CUDA device 0 on thomasASUS: 'GeForce GT 650M' Mem: 2047MB Rev: 3.0
Pe 0 physical rank 0 will use CUDA device of pe 4

Thanks,
Thomas
