Re: FATAL ERROR: CUDA error in cudaGetDeviceCount on Pe 0 (thomasASUS): CUDA driver version is insufficient for CUDA runtime version

From: Thomas Evangelidis (tevang3_at_gmail.com)
Date: Mon Oct 22 2012 - 16:42:57 CDT

Thank you all for your comments! I'll try to address all your questions
below:

@Norman

> As NAMD is only partly ported to GPU, it needs to switch between GPU and CPU
> at every timestep. To prevent NAMD from doing this, you can use a higher
> value for fullelectfrequency, for instance 4, to let NAMD stay at the GPU
> for 4 steps, before returning to CPU to do the electrostatic stuff. This
> will harm energy conservation and comes with a slight drift in temperature,
> but can be controlled with a low damping langevin.
>
I set stepsPerCycle 20, nonBondedFreq 2, fullElectFrequency 4, but still
the non-CUDA binary runs faster.
Without the GPU:
Info: Initial time: 4 CPUs 0.127345 s/step 0.736951 days/ns 690.531 MB memory
With the GPU:
Info: Initial time: 4 CPUs 0.132138 s/step 0.764688 days/ns 325.098 MB memory
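
For reference, the days/ns figure that NAMD prints follows directly from s/step once the timestep is known. The sketch below assumes a 2 fs timestep, which is not stated anywhere in this thread but is the value that reproduces the logged numbers; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 86400.0
FS_PER_NS = 1.0e6


def days_per_ns(seconds_per_step, timestep_fs=2.0):
    """Convert wall-clock seconds per MD step into days per simulated ns.

    timestep_fs defaults to 2 fs, an assumption inferred from the log lines.
    """
    steps_per_ns = FS_PER_NS / timestep_fs
    return seconds_per_step * steps_per_ns / SECONDS_PER_DAY


def relative_slowdown(reference, candidate):
    """Fractional increase in s/step of `candidate` over `reference`."""
    return (candidate - reference) / reference


# s/step values copied from the two 4-CPU runs above.
cpu_only = 0.127345
with_gpu = 0.132138

print(f"CPU only: {days_per_ns(cpu_only):.4f} days/ns")
print(f"With GPU: {days_per_ns(with_gpu):.4f} days/ns")
print(f"GPU run is {100 * relative_slowdown(cpu_only, with_gpu):.1f}% slower")
```

With these two timings the CUDA run comes out a few percent slower than the CPU-only run, rather than the several-fold faster that was expected.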

> Nevertheless, there should be a speedup of about 2-3 times compared to CPU
> only without this hack and about 5-10 with. As you got a mobile chipset,
> you should check the following things:
>
> 1. Make sure the GPU is allowed to run in performance rather than
> energy saving mode. (nvidia-smi)
>
I did: nvidia-smi -i 0 -c 3
         nvidia-smi -i 0 -pm 1
Unfortunately, my GPU supports neither performance monitoring with
nvidia-smi nor setting the GPU Operation Mode to COMPUTE (--gom=1).

> 2. Make sure it's running on PCIE 2 or higher (nvidia-smi)
>
From my graphics card's specifications I know it supports PCI Express 2.0 and
3.0, but I cannot get that information from nvidia-smi.

> 3. Try comparing the timing of raising numbers of cpus with and
> without GPU.
>
> This will show if you oversubscribe the GPU or the PCIE.
>
The following statistics were measured using the default amber ff
parameters taken from http://ambermd.org/namd/namd_amber.html

Without the GPU:
Info: Initial time: 1 CPUs 0.815043 s/step 4.71669 days/ns 496.727 MB memory
Info: Initial time: 2 CPUs 0.433734 s/step 2.51003 days/ns 561.973 MB memory
Info: Initial time: 3 CPUs 0.296255 s/step 1.71444 days/ns 579.684 MB memory
Info: Initial time: 4 CPUs 0.240091 s/step 1.38942 days/ns 685.805 MB memory
Info: Initial time: 5 CPUs 0.301051 s/step 1.74219 days/ns 802.668 MB memory
Info: Initial time: 6 CPUs 0.261714 s/step 1.51455 days/ns 600.703 MB memory
Info: Initial time: 7 CPUs 0.230685 s/step 1.33498 days/ns 660.172 MB memory
Info: Initial time: 8 CPUs 0.234672 s/step 1.35805 days/ns 721.027 MB memory

With the GPU:
Info: Initial time: 1 CPUs 0.259564 s/step 1.50211 days/ns 275.074 MB memory
Info: Initial time: 2 CPUs 0.242015 s/step 1.40055 days/ns 304.035 MB memory
Info: Initial time: 3 CPUs 0.240104 s/step 1.38949 days/ns 308.801 MB memory
Info: Initial time: 4 CPUs 0.236633 s/step 1.36941 days/ns 332.348 MB memory
Info: Initial time: 5 CPUs 0.241345 s/step 1.39667 days/ns 338.609 MB memory
Info: Initial time: 6 CPUs 0.239742 s/step 1.3874 days/ns 342.359 MB memory
Info: Initial time: 7 CPUs 0.236587 s/step 1.36914 days/ns 372.566 MB memory
Info: Initial time: 8 CPUs 0.241 s/step 1.39468 days/ns 367.969 MB memory
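
To make the comparison easier to read, the "Initial time" lines can be parsed and turned into a CPU-only vs. GPU ratio per core count. This is only a rough sketch: the regular expression and the helper names are my own, and just the 1, 2, 4 and 8 CPU lines from above are pasted in to keep it short.

```python
import re

# Matches e.g. "Info: Initial time: 4 CPUs 0.240091 s/step ..."
LINE_RE = re.compile(r"Info: Initial time: (\d+) CPUs ([\d.]+) s/step")


def parse_timings(log_text):
    """Return {cpu_count: seconds_per_step} from NAMD 'Initial time' lines."""
    return {int(m.group(1)): float(m.group(2))
            for m in LINE_RE.finditer(log_text)}


def gpu_speedup(cpu_timings, gpu_timings):
    """Ratio of CPU-only to GPU s/step for core counts present in both."""
    return {n: cpu_timings[n] / gpu_timings[n]
            for n in sorted(cpu_timings) if n in gpu_timings}


cpu_log = """\
Info: Initial time: 1 CPUs 0.815043 s/step 4.71669 days/ns 496.727 MB memory
Info: Initial time: 2 CPUs 0.433734 s/step 2.51003 days/ns 561.973 MB memory
Info: Initial time: 4 CPUs 0.240091 s/step 1.38942 days/ns 685.805 MB memory
Info: Initial time: 8 CPUs 0.234672 s/step 1.35805 days/ns 721.027 MB memory
"""

gpu_log = """\
Info: Initial time: 1 CPUs 0.259564 s/step 1.50211 days/ns 275.074 MB memory
Info: Initial time: 2 CPUs 0.242015 s/step 1.40055 days/ns 304.035 MB memory
Info: Initial time: 4 CPUs 0.236633 s/step 1.36941 days/ns 332.348 MB memory
Info: Initial time: 8 CPUs 0.241 s/step 1.39468 days/ns 367.969 MB memory
"""

speedups = gpu_speedup(parse_timings(cpu_log), parse_timings(gpu_log))
for cpus, ratio in speedups.items():
    print(f"{cpus} CPUs: GPU run is {ratio:.2f}x the CPU-only speed")
```

The ratio falls from roughly 3x with one core to about 1x with four, while the GPU timings stay almost flat at about 0.24 s/step regardless of core count. That would be consistent with the GPU, or the transfer to it, being the limiting factor.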

> 4. Are you really sure that your notebook got 8 physical cores??
>
> It doesn't make much sense to oversubscribe the GPU with HT cores.
>
My i7 processor has 4 physical cores, which hyper-threading exposes as 8
hardware threads.
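
The physical vs. logical core count can be checked on Linux by counting unique (physical id, core id) pairs in /proc/cpuinfo. The snippet below is a minimal sketch that assumes an x86 Linux machine where those two fields are present; on systems that do not report them it will simply return 0. The function name is hypothetical.

```python
import os


def count_physical_cores(cpuinfo_text):
    """Count unique (physical id, core id) pairs in /proc/cpuinfo text."""
    cores = set()
    physical_id = core_id = None
    # The trailing "" makes sure the last record is flushed as well.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one logical-processor record.
            if physical_id is not None and core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            physical_id = value.strip()
        elif key == "core id":
            core_id = value.strip()
    return len(cores)


if os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as fh:
        print("physical cores:", count_physical_cores(fh.read()))
print("logical CPUs:", os.cpu_count())
```

If the first number is half the second, hyper-threading is active, and +p values above the physical core count only add hyperthreads.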

> 5. Why do you need to set the +devices?
>
Because otherwise I get:

FATAL ERROR: CUDA error on Pe 1 (thomasASUS device 0): All CUDA devices are
in prohibited mode, of compute capability 1.0, or otherwise unusable.

Possibly this has to do with Optimus technology: NAMD finds only the Intel
on-board graphics card.

@Aron

> Thomas, I strongly second what Norman says about the hyper-threading,
> telling NAMD to use hyperthreads is extremely punishing to performance.
> test +p2, +p3, and +p4, one should work fairly well.
>

Is there a way to disable hyperthreading apart from just using +p4 or less?

> Is there a really good reason for using the newest CUDA release, rather
> than 4.x? NAMD comes with its own cuda library so maybe it doesn't
> matter, but still, it wasn't made for 5.0.
>
It has better support for the Kepler architecture.

> On the same note, it looks like you installed the latest non-development
> drivers, you might want to instead install the latest development (295 or
> something)
>
>
I installed the latest available, 304.51.

> Do you need to use charmrun? You should download the binaries for
> NAMD_2.9_Linux-x86_64-multicore-CUDA, and then you should just be able to
> run: namd2 +p n +idlepoll myconfig.namd
>

Apparently not. I discovered it later.
