Re: AW: CUDA error on Pe 4: device cannot map host memory

From: Vignesh (vignesh757_at_gmail.com)
Date: Wed Jun 13 2012 - 02:16:24 CDT

I actually solved this issue. The problem was that the libcudart.so.4
library that comes with NAMD 2.9 is different from the libcudart.so.4 that
comes with the CUDA library. Because both files have the same name, the
wrong version was being picked up. The latter does not work with multiple
GPUs on a single node, so the one that comes with NAMD must be the one
found through LD_LIBRARY_PATH. Once that is done, this works perfectly.
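
For anyone hitting the same error, the fix can be sketched roughly as
follows. This is only an illustration, not the exact commands I ran:
NAMD_DIR is a hypothetical variable, and its default path is an assumption
about where the NAMD tarball was unpacked. Adjust it to your own setup.

```shell
# Assumption: NAMD_DIR points at the unpacked NAMD directory, which holds
# namd2 and the bundled libcudart.so.4 (the default path is hypothetical).
NAMD_DIR="${NAMD_DIR:-$HOME/NAMD_2.9b1_Linux-x86-multicore-CUDA}"

# Put the NAMD directory first so its libcudart.so.4 is found before any
# other copy with the same file name.
export LD_LIBRARY_PATH="$NAMD_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Show which libcudart the binary would load (skipped if namd2 is absent).
if [ -x "$NAMD_DIR/namd2" ]; then
    ldd "$NAMD_DIR/namd2" | grep libcudart || echo "libcudart not listed"
fi
```

If the ldd line shows a path outside the NAMD directory, the wrong copy is
still being found first.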

There was something about this in the manual, but it was not clear because
both libraries have the same name.

Best,
Vignesh
On Jun 13, 2012 1:58 AM, "Norman Geist" <norman.geist_at_uni-greifswald.de>
wrote:

> Hi,
>
> as you aren't able to run on device 1, it sounds like it is not correctly
> configured. I've never seen this error before. You should check whether
> both GPUs run the same driver version and the same settings. Also check
> the permissions on the /dev/nvidia devices.
>
> BTW, are you using SLI?
> Which slots are the GPUs plugged into? (recommended: PCI-E 2.0 x16 full)
> Is a screen working on this GPU? (to check that the GPU and driver are OK)
>
> Let us know
>
> Norman Geist.
>
> > -----Original Message-----
> > From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> > Behalf Of Vignesh
> > Sent: Wednesday, 13 June 2012 01:57
> > To: namd-l_at_ks.uiuc.edu
> > Subject: namd-l: CUDA error on Pe 4: device cannot map host memory
> >
> > Hello,
> >
> > I am having trouble initiating a NAMD run on two GPUs (GTX 560). Both
> > my GPUs are recognized and the compute mode for both of them is set
> > to 0/default. I am using NAMD_2.9b1_Linux-x86-multicore-CUDA.
> >
> >
> > The error I get is as follows:
> >
> > Exiting: Called CmiAbort -------
> > Reason: FATAL ERROR: CUDA error on Pe 4 (device 1) : device cannot map
> > host memory
> >
> > Charm++ fatal error:
> > FATAL ERROR: CUDA error on Pe 4 (device 1): device cannot map host
> > memory
> >
> > The command used was:
> > ./charmrun +p4 ./namd2 +idlepoll +devices 0,1 abc.conf > abc.log &
> >
> > If I initiate the run on only one GPU (device 0) it works fine, but I
> > cannot initiate a run on the other GPU (i.e. device 1) or on both GPUs
> > together. It would be very helpful if someone could help me understand
> > where I am going wrong.
> >
> > Sincerely,
> > Vignesh
> >
> > P.S: The motherboard has 3x6Gb RAM in case someone was wondering.
>
>
>
