Re: malloc memory error using CUDA devices

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Thu Nov 01 2012 - 02:01:50 CDT

Hi,

since you are not able to run NAMD long enough to watch the memory
consumption on the GPU, how do you know that you are not running out of
memory? To verify this, you could try an even smaller test system and see
whether that works. Also, NAMD should output the name of the GPU device it
uses. Does it show the GTX 470? Maybe it is using the wrong device for some
reason, so try device 0 as well. You could also try fewer processes; I'm not
sure, but the child processes may need additional resources.
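
To work through these combinations systematically, here is a minimal Python
sketch that only prints the command lines to try, one per device index and
process count. It reuses just the flags from the command quoted below; the
helper name candidate_commands and its default values are assumptions made for
illustration and are not part of NAMD. Keep in mind that the device index CUDA
hands to NAMD may not match the order nvidia-smi lists the cards in, which is
one more reason to test both indices.

```python
from itertools import product


def candidate_commands(devices=(0, 1), proc_counts=(1, 3), config="apoa1.namd"):
    """Return one namd2 argument list per (device, process count) pair.

    Hypothetical helper: only the flags seen in the quoted command
    (+idlepoll, +pN, +devices) are used.
    """
    return [
        ["namd2", "+idlepoll", f"+p{n}", "+devices", str(dev), config]
        for dev, n in product(devices, proc_counts)
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so each run can be
    # started by hand and its startup output checked for the GPU name.
    for argv in candidate_commands():
        print(" ".join(argv))
```

Running each printed command in turn and noting which ones get past the
startup phase should show whether the device index or the process count
matters.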

If this doesn't solve it, we can dig deeper.

Norman Geist.

> -----Original Message-----
> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> Behalf Of Blake Mertz
> Sent: Wednesday, October 31, 2012 13:25
> To: namd-l_at_ks.uiuc.edu
> Subject: namd-l: malloc memory error using CUDA devices
>
> Hello,
>
> I've been attempting to troubleshoot this problem, with no progress so
> far, and was hoping someone had run into this before. I'm using a
> pre-compiled NAMD 2.9 x86_64 multicore CUDA build on a Debian 6.0
> machine, using Debian's NVIDIA drivers (304.18) and CUDA libraries.
> I've verified that these drivers will work using namd on another
> similar build, so I know the drivers are not the issue.
>
> While attempting to run the apoa1 benchmark, I get the following error
> after specifying my GPU card:
>
> namd2 +idlepoll +p3 +devices 1 apoa1.namd
>
> NAMD will start up, but after startup phase I get this:
>
> Pe 2 has 72 local and 72 remote patches and 1944 local and 1944 remote computes.
> FATAL ERROR: CUDA error malloc everything on Pe 2 (NEI-GPU device 1): out of memory
> ------------- Processor 2 Exiting: Called CmiAbort ------------
> Reason: FATAL ERROR: CUDA error malloc everything on Pe 2 (NEI-GPU device 1): out of memory
>
> And here is my output from nvidia-smi:
>
> Wed Oct 31 08:24:29 2012
> +------------------------------------------------------+
> | NVIDIA-SMI 4.304.48   Driver Version: 304.48         |
> |-------------------------------+----------------------+----------------------+
> | GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
> | Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
> |===============================+======================+======================|
> |   0  nForce 980a/780a SLI     | 0000:02:00.0     N/A |                  N/A |
> | N/A   63C  N/A     N/A /  N/A |  35%   43MB /  125MB |     N/A      Default |
> +-------------------------------+----------------------+----------------------+
> |   1  GeForce GTX 470          | 0000:05:00.0     N/A |                  N/A |
> | 40%   32C  N/A     N/A /  N/A |   0%    4MB / 1279MB |     N/A      Default |
> +-------------------------------+----------------------+----------------------+
>
> +-----------------------------------------------------------------------------+
> | Compute processes:                                               GPU Memory |
> |  GPU       PID  Process name                                     Usage      |
> |=============================================================================|
> |    0            Not Supported                                               |
> |    1            Not Supported                                               |
> +-----------------------------------------------------------------------------+
>
> So I know I shouldn't be running out of memory on the GTX470, since
> I'm only using 4MB out of 1.2GB. I've had the same error when using a
> GTX680 instead of the 470. If anyone has some help with this issue, it
> would be greatly appreciated. Thanks.
>
> Blake
> --
> Assistant Professor
> C. Eugene Bennett Department of Chemistry
> (304) 293-9166
>
> "Life is not easy for any of us. But what of that? We must have
> perseverance and above all confidence in ourselves. We must believe
> that we are gifted for something and that this thing must be
> attained." Marie Curie
>
> "Start by doing what's necessary; then do what's possible; and
> suddenly you are doing the impossible." St. Francis of Assisi
