Re: Fwd: "cuda error cudastreamcreate",

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Fri Jun 17 2011 - 00:50:05 CDT

Does NAMD parallelization work with OpenCL on ATI GPUs? Is VMD supported
with OpenCL on ATI GPUs?
If not, what could I do with an ATI GPU?
francesco

On Fri, Jun 17, 2011 at 4:13 AM, Brian Morris <cymraegish_at_gmail.com> wrote:
> Huh ?
>
> As far as I can see, apart from technical details, OpenCL and CUDA are the
> same, except that OpenCL works on both NVIDIA and ATI/AMD. *I don't mean
> OpenGL !!*
>
> Besides being cross-platform, OpenCL is, well, open. Microsoft also has
> its own idea, which only complicates things further.
>
> For scientific purposes there is a compelling reason to use OpenCL (given
> that development tools are available, which they are): the repeatability
> and reviewability of scientific results. If I have an ATI GPU and you have
> NVIDIA, we cannot share code very well unless we have a standard. If you
> are doing scientific research rather than commercial development, it is
> self-defeating to support proprietary standards, unless of course your
> funding is tied to them, in which case it is bad for science. A good
> reason for supporting open standards is precisely so that researchers are
> not subject to this sort of manipulation.
>
> Personally, I have just bought a new, rather expensive (for me) machine
> with an ATI GPU, which I intend to use with OpenCL for machine learning
> research and (open source) code development. There is plenty of support in
> code libraries and open source project examples to get me started.
>
>
>
>
> On Thu, Jun 16, 2011 at 7:12 AM, Francesco Pietra <chiendarret_at_gmail.com>
> wrote:
>>
>> I forgot the list.
>> f.
>>
>>
>> ---------- Forwarded message ----------
>> From: Francesco Pietra <chiendarret_at_gmail.com>
>> Date: Thu, Jun 16, 2011 at 4:11 PM
>> Subject: Re: Fwd: "cuda error cudastreamcreate",
>> To: Brian Morris <cymraegish_at_gmail.com>
>>
>>
>> Oh, no, absolutely not. Where are the scientific OpenCL applications? And
>> not only for that.
>> f.
>>
>> On Thu, Jun 16, 2011 at 3:59 AM, Brian Morris <cymraegish_at_gmail.com>
>> wrote:
>> > Why are you using CUDA rather than OpenCL? Nvidia has said they are
>> > cutting back on their GPU business and moving into CPUs for tablets,
>> > which are now appearing on the market. If you have to move to AMD/ATI
>> > in the future, OpenCL will still work, but CUDA will not.
>> >
>> >
>> >
>> > On Wed, Jun 15, 2011 at 8:22 AM, Francesco Pietra
>> > <chiendarret_at_gmail.com>
>> > wrote:
>> >>
>> >> Running "nvidia-smi -L" as root restores the visibility of the graphics
>> >> cards. After every boot that visibility vanishes again. So it is a
>> >> small problem, or no problem. francesco
>> >>
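A minimal sketch of one way to make this survive a reboot, assuming the
cards "vanish" only because nothing has yet created the /dev/nvidia* device
nodes (running nvidia-smi as root creates them as a side effect). The
function name is hypothetical. Major number 195 matches the char-major-195
alias in the modinfo output quoted in this thread; minor 255 for
/dev/nvidiactl and mode 666 are assumptions based on the usual NVIDIA
device-node layout. The function only prints commands, so they can be
inspected before anything is run as root:

```shell
# Hypothetical helper: print guarded mknod commands for N NVIDIA cards.
# Assumptions: char major 195, one node per card (minor = card index),
# and a control node /dev/nvidiactl with minor 255.
nvidia_mknod_cmds() {
    count=$1
    i=0
    while [ "$i" -lt "$count" ]; do
        # Skip creation when the node already exists.
        echo "[ -e /dev/nvidia$i ] || mknod -m 666 /dev/nvidia$i c 195 $i"
        i=$((i + 1))
    done
    echo "[ -e /dev/nvidiactl ] || mknod -m 666 /dev/nvidiactl c 195 255"
}
```

For example, "nvidia_mknod_cmds 2 | sh", run as root from a boot script,
would cover two cards. The count has to include every NVIDIA card in the
box.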
>> >>
>> >> ---------- Forwarded message ----------
>> >> From: Francesco Pietra <chiendarret_at_gmail.com>
>> >> Date: Wed, Jun 15, 2011 at 4:37 PM
>> >> Subject: Fwd: Fwd: "cuda error cudastreamcreate",
>> >> To: Lennart Sorensen <lsorense_at_csclub.uwaterloo.ca>, amd64 Debian
>> >> <debian-amd64_at_lists.debian.org>
>> >>
>> >>
>> >> The simulation (pressure equilibration) was completed successfully.
>> >> The next run (just a continuation of the previous pressure
>> >> equilibration) failed, again with 'Device Emulation (CPU)'; see the log
>> >> below. A further attempt gave the same error.
>> >>
>> >> # modinfo nvidia
>> >> filename:       /lib/modules/2.6.38-2-amd64/updates/dkms/nvidia.ko
>> >> alias:          char-major-195-*
>> >> supported:      external
>> >> license:        NVIDIA
>> >> alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
>> >> alias:          pci:v000010DEd00000AA3sv*sd*bc0Bsc40i00*
>> >> alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
>> >> alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
>> >> depends:        i2c-core
>> >> vermagic:       2.6.38-2-amd64 SMP mod_unload modversions
>> >> parm:           NVreg_EnableVia4x:int
>> >> parm:           NVreg_EnableALiAGP:int
>> >> parm:           NVreg_ReqAGPRate:int
>> >> parm:           NVreg_EnableAGPSBA:int
>> >> parm:           NVreg_EnableAGPFW:int
>> >> parm:           NVreg_Mobile:int
>> >> parm:           NVreg_ResmanDebugLevel:int
>> >> parm:           NVreg_RmLogonRC:int
>> >> parm:           NVreg_ModifyDeviceFiles:int
>> >> parm:           NVreg_DeviceFileUID:int
>> >> parm:           NVreg_DeviceFileGID:int
>> >> parm:           NVreg_DeviceFileMode:int
>> >> parm:           NVreg_RemapLimit:int
>> >> parm:           NVreg_UpdateMemoryTypes:int
>> >> parm:           NVreg_InitializeSystemMemoryAllocations:int
>> >> parm:           NVreg_UseVBios:int
>> >> parm:           NVreg_RMEdgeIntrCheck:int
>> >> parm:           NVreg_UsePageAttributeTable:int
>> >> parm:           NVreg_EnableMSI:int
>> >> parm:           NVreg_MapRegistersEarly:int
>> >> parm:           NVreg_RegisterForACPIEvents:int
>> >> parm:           NVreg_RegistryDwords:charp
>> >> parm:           NVreg_RmMsg:charp
>> >> parm:           NVreg_NvAGP:int
>> >>
>> >> However:
>> >>
>> >> $ nvidia-smi -L
>> >> Could not open device /dev/nvidia1 (no such file)
>> >> Failed to initialize NVML: unknown error.
>> >>
>> >>
>> >> I am unable to draw technical conclusions from this 'unknown error'. I
>> >> wonder whether other information can be extracted to fix the problems.
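The message "Could not open device /dev/nvidia1 (no such file)" can be
checked directly. A small sketch, assuming the driver expects one
/dev/nvidiaN node per card plus /dev/nvidiactl; the function name and its
arguments are hypothetical:

```shell
# Hypothetical helper: list expected NVIDIA device nodes that are absent.
# $1 = number of cards expected, $2 = device directory (default /dev).
missing_nvidia_nodes() {
    count=$1
    dir=${2:-/dev}
    i=0
    while [ "$i" -lt "$count" ]; do
        [ -e "$dir/nvidia$i" ] || echo "$dir/nvidia$i"
        i=$((i + 1))
    done
    [ -e "$dir/nvidiactl" ] || echo "$dir/nvidiactl"
    return 0
}
```

Empty output from "missing_nvidia_nodes 2" would mean the nodes for two
cards exist; it says nothing about whether the kernel module itself is
healthy.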
>> >>
>> >> Thanks for advice.
>> >>
>> >> francesco
>> >>
>> >>
>> >>
>> >>
>> >> Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
>> >> Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
>> >> Info: 1 NAMD  CVS-2011-06-04  Linux-x86_64-CUDA  6    gig64  francesco
>> >> Info: Running on 6 processors, 6 nodes, 1 physical nodes.
>> >> Info: CPU topology information available.
>> >> Info: Charm++/Converse parallel runtime startup completed at 0.00658393 s
>> >> Pe 2 sharing CUDA device 0 first 0 next 3
>> >> Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'Device Emulation (CPU)'  Mem: 0MB  Rev: 9999.9999
>> >> FATAL ERROR: CUDA error cudaStreamCreate on Pe 2 (gig64 device 0): no CUDA-capable device is available
>> >>
>> >>
>> >> ---------- Forwarded message ----------
>> >> From: Francesco Pietra <chiendarret_at_gmail.com>
>> >> Date: Wed, Jun 15, 2011 at 9:04 AM
>> >> Subject: Re: Fwd: "cuda error cudastreamcreate",
>> >> To: Fabricio Cannini <fabricio_at_versatushpc.com.br>, Lennart Sorensen
>> >> <lsorense_at_csclub.uwaterloo.ca>, amd64 Debian
>> >> <debian-amd64_at_lists.debian.org>
>> >>
>> >>
>> >> The "nvidia-smi -L" output was from a machine of Jim Phillips, the
>> >> main developer of NAMD. He provided it to show that it should also
>> >> work with my GTX 470 cards.
>> >>
>> >> That said, my problems seem to have been solved by following Lennart's
>> >> indications. The driver was rebuilt (dated 15 June), and the NAMD
>> >> simulation could be started normally. However, we have to wait before
>> >> claiming full victory. Please see below.
>> >>
>> >> In retrospect, the nvidia.ko I had before, dated 5 June, must also
>> >> have been built within Debian. Renaming it no_nvidia.ko prevented
>> >> rebuilding, for the reasons that Lennart clarified.
>> >>
>> >> For some reason, the previous installation of nvidia.ko must have had
>> >> some problem: for example, "nvidia-smi -L" did not work (there was a
>> >> single installation of nvidia-smi, "nvidia-smi 270.41.19-1"), while
>> >> the "modinfo nvidia" output was correct. Now both are correct:
>> >>
>> >> $ nvidia-smi -L
>> >> GPU 0: GeForce GTX 470 (UUID: N/A)
>> >> GPU 1: GeForce GTX 470 (UUID: N/A)
>> >>
>> >> # modinfo nvidia
>> >> filename:       /lib/modules/2.6.38-2-amd64/updates/dkms/nvidia.ko
>> >> alias:          char-major-195-*
>> >> supported:      external
>> >> license:        NVIDIA
>> >> alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
>> >> alias:          pci:v000010DEd00000AA3sv*sd*bc0Bsc40i00*
>> >> alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
>> >> alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
>> >> depends:        i2c-core
>> >> vermagic:       2.6.38-2-amd64 SMP mod_unload modversions
>> >> parm:           NVreg_EnableVia4x:int
>> >> parm:           NVreg_EnableALiAGP:int
>> >> parm:           NVreg_ReqAGPRate:int
>> >> parm:           NVreg_EnableAGPSBA:int
>> >> parm:           NVreg_EnableAGPFW:int
>> >> parm:           NVreg_Mobile:int
>> >> parm:           NVreg_ResmanDebugLevel:int
>> >> parm:           NVreg_RmLogonRC:int
>> >> parm:           NVreg_ModifyDeviceFiles:int
>> >> parm:           NVreg_DeviceFileUID:int
>> >> parm:           NVreg_DeviceFileGID:int
>> >> parm:           NVreg_DeviceFileMode:int
>> >> parm:           NVreg_RemapLimit:int
>> >> parm:           NVreg_UpdateMemoryTypes:int
>> >> parm:           NVreg_InitializeSystemMemoryAllocations:int
>> >> parm:           NVreg_UseVBios:int
>> >> parm:           NVreg_RMEdgeIntrCheck:int
>> >> parm:           NVreg_UsePageAttributeTable:int
>> >> parm:           NVreg_EnableMSI:int
>> >> parm:           NVreg_MapRegistersEarly:int
>> >> parm:           NVreg_RegisterForACPIEvents:int
>> >> parm:           NVreg_RegistryDwords:charp
>> >> parm:           NVreg_RmMsg:charp
>> >> parm:           NVreg_NvAGP:int
>> >>
>> >>
>> >> I said above that time will show whether the system is stable. In fact,
>> >> this morning the NAMD simulation did not start (I recall commands from
>> >> the console history, so there was no typing error). I had not carried
>> >> out any amd64 upgrade in between. From the simulation log:
>> >>
>> >>
>> >> Info: Charm++/Converse parallel runtime startup completed at 0.00989103 s
>> >> Pe 2 sharing CUDA device 0 first 0 next 3
>> >> Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'Device Emulation (CPU)'  Mem: 0MB  Rev: 9999.9999
>> >> FATAL ERROR: CUDA error cudaStreamCreate on Pe 2 (gig64 device 0): no CUDA-capable device is available
>> >>
>> >> 'Device Emulation (CPU)' indicates (for reasons unclear to me) that
>> >> things have gone wrong.
>> >>
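Since a failed start is recognizable from the log, a wrapper can test for
it before a long run is trusted. A sketch, assuming the two strings quoted
above are reliable markers of the failure; the function name is
hypothetical:

```shell
# Hypothetical helper: exit status 0 if the given NAMD log shows the CUDA
# runtime reporting CPU emulation or no usable device, non-zero otherwise.
namd_log_cuda_failed() {
    # -F: match the patterns as fixed strings, not regular expressions.
    grep -qF -e 'Emulation (CPU)' \
             -e 'CUDA-capable device is available' "$1"
}
```

For example, with run.log as a placeholder file name:
namd_log_cuda_failed run.log && echo "no usable CUDA device"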
>> >> On a second identical attempt (after having explored the driver
>> >> location and carried out info commands), NAMD simulation started, with
>> >> the correct log output:
>> >>
>> >> Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
>> >> Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
>> >> Info: 1 NAMD  CVS-2011-06-04  Linux-x86_64-CUDA  6    gig64  francesco
>> >> Info: Running on 6 processors, 6 nodes, 1 physical nodes.
>> >> Info: CPU topology information available.
>> >> Info: Charm++/Converse parallel runtime startup completed at 0.00650811 s
>> >>
>> >>
>> >> We will see whether the failure/success pattern shows up again (a
>> >> simulation now lasts several hours, which would be days on an
>> >> 8-processor machine). If failure occurs again, there are many possible
>> >> reasons, including problems with the NAMD code.
>> >>
>> >> I was so discouraged yesterday that I alluded to changing the driver
>> >> source, which was unfair.
>> >>
>> >> Thanks a lot
>> >> francesco
>> >>
>> >> On Wed, Jun 15, 2011 at 2:22 AM, Fabricio Cannini
>> >> <fabricio_at_versatushpc.com.br> wrote:
>> >> > On Tuesday 14 June 2011, at 16:01:57, Lennart Sorensen wrote:
>> >> >> On Tue, Jun 14, 2011 at 07:23:38PM +0200, Francesco Pietra wrote:
>> >> >> > I forgot to answer: yes, sometimes it works, sometimes not,
>> >> >> > everything being the same.
>> >> >> >
>> >> >> > As a matter of fact, after a day of failure, I have now renamed
>> >> >> >
>> >> >> > /lib/modules/2.6.38-2-amd64/updates/dkms/no_nvidia.ko
>> >> >> >
>> >> >> > back to
>> >> >> >
>> >> >> > /lib/modules/2.6.38-2-amd64/updates/dkms/nvidia.ko
>> >> >> >
>> >> >> > and the NAMD simulation started normally, using both GTX 470
>> >> >> > cards. The machine had not been touched either.
>> >> >>
>> >> >> I wonder if having the 9800 card in there along with the GTX 470
>> >> >> cards is confusing the driver.  Maybe the card order is getting
>> >> >> swapped around on some boots.
>> >> >>
>> >> >> What is the 9800 doing in the box anyhow?
>> >> >
>> >> > Hi All.
>> >> >
>> >> > I'm thinking the same as Lennart. It seems to me that the order in
>> >> > which the cards are named varies, thus confusing the application(s).
>> >> > I'd try to fix the order in /etc/X11/xorg.conf and see if it works.
>> >> > Look in the CUDA docs for how to do that.
>> >> >
>> >> > Good luck.
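A different route from xorg.conf, sketched under assumptions: if the card
order really changes between boots, the GTX 470 indices could be looked up
by name on each start and handed to NAMD, whose CUDA builds accept a
"+devices" list. The function name is hypothetical, the input format is the
"GPU N: name (UUID: ...)" layout shown in this thread, and it is an
assumption (not guaranteed) that nvidia-smi indices coincide with CUDA
device numbering:

```shell
# Hypothetical helper: read `nvidia-smi -L` output on stdin and print the
# comma-separated indices of the cards whose name contains the string $1.
gpu_indices_for_model() {
    model=$1
    # Turn "GPU 0: GeForce GTX 470 (UUID: N/A)" into "0 GeForce GTX 470 ...".
    sed -n 's/^GPU \([0-9][0-9]*\): \(.*\)$/\1 \2/p' |
    while read -r idx name; do
        case $name in
            *"$model"*) echo "$idx" ;;
        esac
    done | paste -sd, -
}
```

For example, "nvidia-smi -L | gpu_indices_for_model 'GTX 470'" would give a
list that can be passed as the argument of "+devices" on the NAMD command
line.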
>> >> >
>> >> >
>> >> > --
>> >> > To UNSUBSCRIBE, email to debian-amd64-REQUEST_at_lists.debian.org
>> >> > with a subject of "unsubscribe". Trouble? Contact
>> >> > listmaster_at_lists.debian.org
>> >> > Archive:
>> >> > http://lists.debian.org/201106142122.04376.fcannini@gmail.com
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >
>> >
>
>
