Re: vmd-l: Re: Fwd: "cuda error cudastreamcreate",

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Fri Jun 17 2011 - 02:36:49 CDT

Hello John:
Thanks for making the matter clear to us. My own message was a reply to Brian.

Personally, I have no interest in Mac, and far less in Windows. As a
GNU-Linux Debian amd64 (extended to "contrib" and "non-free") user, I
never had obstacles in scientific computing as far as the OS is
concerned. When I had some, as recently with NAMD-CUDA, it was my
own silly misuse. Actually, the nvidia driver builds smoothly on amd64
wheezy without the need for external compilation with NVIDIA's software
(thanks to Lennart Sorensen), and the graphics cards are easily made visible
(thanks to Axel).
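
For anyone checking the same thing, here is a minimal Python sketch (an illustration only, not part of the driver tooling) that reports which NVIDIA device nodes are absent. It assumes the usual layout of one /dev/nvidiaN node per card plus /dev/nvidiactl; the function name missing_nvidia_nodes and its dev_dir parameter are hypothetical names chosen for this example.

```python
import os


def missing_nvidia_nodes(gpu_count, dev_dir="/dev"):
    """Return the expected NVIDIA device nodes that do not exist.

    Assumes one node per card (nvidia0, nvidia1, ...) plus the
    control node nvidiactl. dev_dir is a parameter only so the
    check can be pointed at a test directory.
    """
    expected = ["nvidiactl"] + ["nvidia%d" % i for i in range(gpu_count)]
    paths = [os.path.join(dev_dir, name) for name in expected]
    return [p for p in paths if not os.path.exists(p)]


if __name__ == "__main__":
    # Two cards are assumed here, matching the two GTX 470 in this thread.
    missing = missing_nvidia_nodes(gpu_count=2)
    if missing:
        print("Missing device nodes: " + ", ".join(missing))
    else:
        print("All expected NVIDIA device nodes are present.")
```

On this machine a missing node such as /dev/nvidia1 came back after running "nvidia-smi -L" as root, as reported further down the thread.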

I am working with the means at hand. Should hardware for CUDA one day
become unavailable, we will change. We have invested some 1000 euros
in it, nearly pocket money, while we plan to build a cluster with
InfiniBand. In any event, hardware was never an investment, and
I suppose that will hold for ATI too. For us, the GTX 470 was a
start. Perhaps our main problem is with the motherboard. My feeling
is that the Gigabyte GA-890FXA-UD5 we use does not fully exploit the
two GTX 470 cards installed. More importantly, we could become interested in
ATI-OpenCL if NAMD runs faster, or offers wider tools, with that. At
present, NAMD/VMD with CUDA are unbelievable gifts.

I wish you a nice day
francesco

On Fri, Jun 17, 2011 at 8:22 AM, John Stone <johns_at_ks.uiuc.edu> wrote:
>
> Hi Francesco,
>  The main thing that has prevented OpenCL from being enabled
> by default in production versions of VMD, NAMD, and other tools
> has been lack of stability across the implementations
> by multiple vendors, and lack of compatibility between binaries compiled
> with one vendor's OpenCL SDK and another vendor's runtime (something
> we've had working for a decade for OpenGL for graphics...)
> There is hope though, and I've recently heard that the major problems
> were either recently fixed or will be very shortly.
>
> What operating system are you running?  Up to now there have not been
> stable multi-vendor-compatible OpenCL shared libraries for platforms
> other than MacOS X 10.6.  VMD 1.9 does include OpenCL support for one of the
> electrostatics algorithms and for molecular orbital display, but it is
> not turned on in the binaries we distribute due to the issues I describe above
> about cross-vendor compatibility of the shared libraries.
>
> In my testing cross-vendor OpenCL already works fine on MacOS X 10.6,
> so I am currently hoping to release an OpenCL-enabled VMD 1.9.1 for
> MacOS X 10.6.  If cross-vendor OpenCL works properly for Windows
> and/or Linux, I may enable it on those platforms for the next release as well.
>
> The main thing that will determine whether we release VMD with
> OpenCL enabled by default will be how stable it is on the average
> user's machine.  This will be something we get a lot of feedback
> on during the early beta versions.
> We'll have to see how things go in testing.
>
> OpenCL and CUDA are actually quite different in a number of
> important respects.  Also, while the same OpenCL kernel can run
> on devices made by different vendors, it will not necessarily
> be _fast_ on devices made by multiple vendors...  In VMD I had
> to make 3 different versions of the molecular orbital kernels for
> OpenCL in order to get good performance on NVIDIA, ATI, and Intel x86
> CPUs.  They could each run the other kernel versions, but often at
> a fraction of the performance achievable when kernels were written
> with knowledge of the underlying hardware.
>
> Also, while the OpenCL/CUDA kernel code looks very similar for
> trivial examples, OpenCL still lacks many of the advanced
> features of CUDA that were added in the past two years,
> and some things that are more fundamental, e.g. C pointer
> manipulation, support for C++ operator overloading,
> templates, etc.  While one can get by without the advanced features
> of CUDA that OpenCL is missing, they frequently make code development
> a lot easier and are sorely missed when you're used to having them...
>
> Cheers,
>  John Stone
>  vmd_at_ks.uiuc.edu
>
> On Fri, Jun 17, 2011 at 07:50:05AM +0200, Francesco Pietra wrote:
>> Does NAMD parallelization work on OpenCL/ATI GPU? Is VMD supported on
>> OpenCL/ATI GPU?
>> If not, what could I do with ATI GPU?
>> francesco
>>
>> On Fri, Jun 17, 2011 at 4:13 AM, Brian Morris <cymraegish_at_gmail.com> wrote:
>> > Huh ?
>> >
>> > As far as I can see other than for technical details OpenCL and CUDA are the
>> > same, except that OpenCL works for both NVIDIA and ATI/AMD. *I don't mean
>> > OpenGL !!*
>> >
>> > Besides being cross-platform, OpenCL is, well, open. Indeed, Microsoft also has
>> > their own idea, which only complicates things further.
>> >
>> > For scientific purposes there is a strong, compelling reason to use OpenCL
>> > (given that development tools are available, which they are): the
>> > repeatability and reviewability of scientific results. If I have an ATI GPU and
>> > you have NVIDIA, we cannot share code very well unless we have a standard. If
>> > you are doing scientific research rather than commercial development, it is
>> > self-defeating to be supporting proprietary standards, unless of course your
>> > funding is tied to it in which case well it is bad for science, but a good
>> > reason for supporting open standards is so that researchers are not subject
>> > to this sort of manipulation.
>> >
>> > Personally I have just bought a rather expensive (for me) new machine
>> > which has an ATI GPU, which I intend to use with OpenCL for Machine Learning
>> > research and (open source) code development. There is plenty of support in
>> > code libraries and open source project examples to get me started.
>> >
>> >
>> >
>> >
>> > On Thu, Jun 16, 2011 at 7:12 AM, Francesco Pietra <chiendarret_at_gmail.com>
>> > wrote:
>> >>
>> >> I forgot the list.
>> >> f.
>> >>
>> >>
>> >> ---------- Forwarded message ----------
>> >> From: Francesco Pietra <chiendarret_at_gmail.com>
>> >> Date: Thu, Jun 16, 2011 at 4:11 PM
>> >> Subject: Re: Fwd: "cuda error cudastreamcreate",
>> >> To: Brian Morris <cymraegish_at_gmail.com>
>> >>
>> >>
>> >> Oh, no, absolutely not. Where are the scientific OpenCL applications? And
>> >> not only for that.
>> >> f.
>> >>
>> >> On Thu, Jun 16, 2011 at 3:59 AM, Brian Morris <cymraegish_at_gmail.com>
>> >> wrote:
>> >> > Why are you using CUDA rather than OpenCL? Nvidia has said they are
>> >> > cutting back on their GPU business and moving into CPUs for tablets,
>> >> > which are now appearing on the market. If you have to move to AMD/ATI
>> >> > in the future, OpenCL will still work, but CUDA will not.
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Jun 15, 2011 at 8:22 AM, Francesco Pietra
>> >> > <chiendarret_at_gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Running "nvidia-smi -L" as root restores the visibility of the graphics
>> >> >> cards. At every boot that visibility vanishes again. So it is a small
>> >> >> problem, or no problem. francesco
>> >> >>
>> >> >>
>> >> >> ---------- Forwarded message ----------
>> >> >> From: Francesco Pietra <chiendarret_at_gmail.com>
>> >> >> Date: Wed, Jun 15, 2011 at 4:37 PM
>> >> >> Subject: Fwd: Fwd: "cuda error cudastreamcreate",
>> >> >> To: Lennart Sorensen <lsorense_at_csclub.uwaterloo.ca>, amd64 Debian
>> >> >> <debian-amd64_at_lists.debian.org>
>> >> >>
>> >> >>
>> >> >> The simulation (pressure equilibration) was completed successfully.
>> >> >> The next run (just a continuation of the previous pressure equilibration)
>> >> >> failed, again with 'Device Emulation (CPU)'; see the log file below. I tried
>> >> >> again and got the same error.
>> >> >>
>> >> >> # modinfo nvidia
>> >> >> filename:       /lib/modules/2.6.38-2-amd64/updates/dkms/nvidia.ko
>> >> >> alias:          char-major-195-*
>> >> >> supported:      external
>> >> >> license:        NVIDIA
>> >> >> alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
>> >> >> alias:          pci:v000010DEd00000AA3sv*sd*bc0Bsc40i00*
>> >> >> alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
>> >> >> alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
>> >> >> depends:        i2c-core
>> >> >> vermagic:       2.6.38-2-amd64 SMP mod_unload modversions
>> >> >> parm:           NVreg_EnableVia4x:int
>> >> >> parm:           NVreg_EnableALiAGP:int
>> >> >> parm:           NVreg_ReqAGPRate:int
>> >> >> parm:           NVreg_EnableAGPSBA:int
>> >> >> parm:           NVreg_EnableAGPFW:int
>> >> >> parm:           NVreg_Mobile:int
>> >> >> parm:           NVreg_ResmanDebugLevel:int
>> >> >> parm:           NVreg_RmLogonRC:int
>> >> >> parm:           NVreg_ModifyDeviceFiles:int
>> >> >> parm:           NVreg_DeviceFileUID:int
>> >> >> parm:           NVreg_DeviceFileGID:int
>> >> >> parm:           NVreg_DeviceFileMode:int
>> >> >> parm:           NVreg_RemapLimit:int
>> >> >> parm:           NVreg_UpdateMemoryTypes:int
>> >> >> parm:           NVreg_InitializeSystemMemoryAllocations:int
>> >> >> parm:           NVreg_UseVBios:int
>> >> >> parm:           NVreg_RMEdgeIntrCheck:int
>> >> >> parm:           NVreg_UsePageAttributeTable:int
>> >> >> parm:           NVreg_EnableMSI:int
>> >> >> parm:           NVreg_MapRegistersEarly:int
>> >> >> parm:           NVreg_RegisterForACPIEvents:int
>> >> >> parm:           NVreg_RegistryDwords:charp
>> >> >> parm:           NVreg_RmMsg:charp
>> >> >> parm:           NVreg_NvAGP:int
>> >> >>
>> >> >> However:
>> >> >>
>> >> >> $ nvidia-smi -L
>> >> >> Could not open device /dev/nvidia1 (no such file)
>> >> >> Failed to initialize NVML: unknown error.
>> >> >>
>> >> >>
>> >> >> I am unable to draw technical conclusions from this 'unknown error'. I
>> >> >> wonder whether other information can be extracted to fix the problems.
>> >> >>
>> >> >> Thanks for advice.
>> >> >>
>> >> >> francesco
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
>> >> >> Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
>> >> >> Info: 1 NAMD  CVS-2011-06-04  Linux-x86_64-CUDA  6    gig64  francesco
>> >> >> Info: Running on 6 processors, 6 nodes, 1 physical nodes.
>> >> >> Info: CPU topology information available.
>> >> >> Info: Charm++/Converse parallel runtime startup completed at 0.00658393 s
>> >> >> Pe 2 sharing CUDA device 0 first 0 next 3
>> >> >> Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'Device Emulation (CPU)'  Mem: 0MB  Rev: 9999.9999
>> >> >> FATAL ERROR: CUDA error cudaStreamCreate on Pe 2 (gig64 device 0): no CUDA-capable device is available
>> >> >>
>> >> >>
>> >> >> ---------- Forwarded message ----------
>> >> >> From: Francesco Pietra <chiendarret_at_gmail.com>
>> >> >> Date: Wed, Jun 15, 2011 at 9:04 AM
>> >> >> Subject: Re: Fwd: "cuda error cudastreamcreate",
>> >> >> To: Fabricio Cannini <fabricio_at_versatushpc.com.br>, Lennart Sorensen
>> >> >> <lsorense_at_csclub.uwaterloo.ca>, amd64 Debian
>> >> >> <debian-amd64_at_lists.debian.org>
>> >> >>
>> >> >>
>> >> >> The "nvidia-smi -L"  output was for a machine of Jim Phillips, the
>> >> >> main developer of NAMD. He provided that to show that it should also
>> >> >> work with my GTX 470 cards.
>> >> >>
>> >> >> That said, my problems seem to have been solved by following Lennart's
>> >> >> indications. The driver was rebuilt, date 15 June, and NAMD simulation
>> >> >> could be started regularly. However, we have to wait before claiming
>> >> >> full victory. Please see below.
>> >> >>
>> >> >> In retrospect, the nvidia.ko I had before, dated 5 June, must have
>> >> >> also been built within Debian. Renaming it no_nvidia.ko prevented
>> >> >> rebuilding for the reasons that Lennart clarified.
>> >> >>
>> >> >> For some reason, the previous installation of nvidia.ko must have had
>> >> >> some problems: for example, "nvidia-smi -L" did not work (there
>> >> >> was a single installation of nvidia-smi, "nvidia-smi 270.41.19-1"),
>> >> >> while the "modinfo nvidia" output was correct. Now both are correct:
>> >> >>
>> >> >> $ nvidia-smi -L
>> >> >> GPU 0: GeForce GTX 470 (UUID: N/A)
>> >> >> GPU 1: GeForce GTX 470 (UUID: N/A)
>> >> >>
>> >> >> # modinfo nvidia
>> >> >> filename:       /lib/modules/2.6.38-2-amd64/updates/dkms/nvidia.ko
>> >> >> alias:          char-major-195-*
>> >> >> supported:      external
>> >> >> license:        NVIDIA
>> >> >> alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
>> >> >> alias:          pci:v000010DEd00000AA3sv*sd*bc0Bsc40i00*
>> >> >> alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
>> >> >> alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
>> >> >> depends:        i2c-core
>> >> >> vermagic:       2.6.38-2-amd64 SMP mod_unload modversions
>> >> >> parm:           NVreg_EnableVia4x:int
>> >> >> parm:           NVreg_EnableALiAGP:int
>> >> >> parm:           NVreg_ReqAGPRate:int
>> >> >> parm:           NVreg_EnableAGPSBA:int
>> >> >> parm:           NVreg_EnableAGPFW:int
>> >> >> parm:           NVreg_Mobile:int
>> >> >> parm:           NVreg_ResmanDebugLevel:int
>> >> >> parm:           NVreg_RmLogonRC:int
>> >> >> parm:           NVreg_ModifyDeviceFiles:int
>> >> >> parm:           NVreg_DeviceFileUID:int
>> >> >> parm:           NVreg_DeviceFileGID:int
>> >> >> parm:           NVreg_DeviceFileMode:int
>> >> >> parm:           NVreg_RemapLimit:int
>> >> >> parm:           NVreg_UpdateMemoryTypes:int
>> >> >> parm:           NVreg_InitializeSystemMemoryAllocations:int
>> >> >> parm:           NVreg_UseVBios:int
>> >> >> parm:           NVreg_RMEdgeIntrCheck:int
>> >> >> parm:           NVreg_UsePageAttributeTable:int
>> >> >> parm:           NVreg_EnableMSI:int
>> >> >> parm:           NVreg_MapRegistersEarly:int
>> >> >> parm:           NVreg_RegisterForACPIEvents:int
>> >> >> parm:           NVreg_RegistryDwords:charp
>> >> >> parm:           NVreg_RmMsg:charp
>> >> >> parm:           NVreg_NvAGP:int
>> >> >>
>> >> >>
>> >> >> I said above that time will show whether the system is stable. In fact,
>> >> >> this morning the NAMD simulation did not start (I recall commands from
>> >> >> the console history, so there was no typing error). I had not
>> >> >> carried out any amd64 upgrade in between. From the simulation log:
>> >> >>
>> >> >>
>> >> >> Info: Charm++/Converse parallel runtime startup completed at 0.00989103 s
>> >> >> Pe 2 sharing CUDA device 0 first 0 next 3
>> >> >> Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'Device Emulation (CPU)'  Mem: 0MB  Rev: 9999.9999
>> >> >> FATAL ERROR: CUDA error cudaStreamCreate on Pe 2 (gig64 device 0): no CUDA-capable device is available
>> >> >>
>> >> >> 'Device Emulation (CPU)' indicates (for reasons unclear to me)
>> >> >> that things have gone bad.
>> >> >>
>> >> >> On a second identical attempt (after having explored the driver
>> >> >> location and carried out info commands), NAMD simulation started, with
>> >> >> the correct log output:
>> >> >>
>> >> >> Info: Based on Charm++/Converse 60303 for net-linux-x86_64-iccstatic
>> >> >> Info: Built Sat Jun 4 02:22:51 CDT 2011 by jim on lisboa.ks.uiuc.edu
>> >> >> Info: 1 NAMD  CVS-2011-06-04  Linux-x86_64-CUDA  6    gig64  francesco
>> >> >> Info: Running on 6 processors, 6 nodes, 1 physical nodes.
>> >> >> Info: CPU topology information available.
>> >> >> Info: Charm++/Converse parallel runtime startup completed at 0.00650811 s
>> >> >>
>> >> >>
>> >> >> We will see whether this failure/success pattern shows up again (a
>> >> >> simulation now lasts several hours, which would be days on an 8-processor
>> >> >> machine). If it fails again, there are many possible
>> >> >> reasons, including problems with the NAMD code.
>> >> >>
>> >> >> I was so discouraged yesterday that I alluded to changing the driver
>> >> >> source, which was unfair.
>> >> >>
>> >> >> Thanks a lot
>> >> >> francesco
>> >> >>
>> >> >> On Wed, Jun 15, 2011 at 2:22 AM, Fabricio Cannini
>> >> >> <fabricio_at_versatushpc.com.br> wrote:
>> >> >> > On Tuesday 14 June 2011, at 16:01:57, Lennart Sorensen wrote:
>> >> >> >> On Tue, Jun 14, 2011 at 07:23:38PM +0200, Francesco Pietra wrote:
>> >> >> >> > I forgot to answer: yes, sometimes it works, sometimes not,
>> >> >> >> > everything being the same.
>> >> >> >> >
>> >> >> >> > As a matter of fact, after a day of failure, I have now renamed
>> >> >> >> >
>> >> >> >> > /lib/modules/2.6.38-2-amd64/updates/dkms/no_nvidia.ko
>> >> >> >> >
>> >> >> >> > back to
>> >> >> >> >
>> >> >> >> > /lib/modules/2.6.38-2-amd64/updates/dkms/nvidia.ko
>> >> >> >> >
>> >> >> >> > and the NAMD simulation started regularly using both GTX 470 cards. The
>> >> >> >> > machine had not been touched either.
>> >> >> >>
>> >> >> >> I wonder if having the 9800 card in there along with the 470 gtx
>> >> >> >> cards is confusing the driver.  Maybe the card order is getting
>> >> >> >> swapped around on some boots.
>> >> >> >>
>> >> >> >> What is the 9800 doing in the box anyhow?
>> >> >> >
>> >> >> > Hi All.
>> >> >> >
>> >> >> > I'm thinking the same as Lennart. It seems to me that the order in
>> >> >> > which the cards are named varies, thus confusing the application(s).
>> >> >> > I'd try to fix the order in /etc/X11/xorg.conf and see if it works.
>> >> >> > Look in the CUDA docs for how to do that.
>> >> >> >
>> >> >> > Good luck.
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > To UNSUBSCRIBE, email to debian-amd64-REQUEST_at_lists.debian.org
>> >> >> > with a subject of "unsubscribe". Trouble? Contact
>> >> >> > listmaster_at_lists.debian.org
>> >> >> > Archive:
>> >> >> > http://lists.debian.org/201106142122.04376.fcannini@gmail.com
>> >> >> >
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >
>> >
>
> --
> NIH Resource for Macromolecular Modeling and Bioinformatics
> Beckman Institute for Advanced Science and Technology
> University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
> http://www.ks.uiuc.edu/~johns/           Phone: 217-244-3349
> http://www.ks.uiuc.edu/Research/vmd/       Fax: 217-244-6078
>
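
The failure signature in the NAMD logs quoted above can be detected automatically. What follows is a minimal Python sketch, not part of NAMD, that assumes only the two log lines shown in this thread: the binding line containing 'Device Emulation (CPU)' and the "FATAL ERROR: CUDA error" line. The function name cuda_startup_problems is hypothetical, and the input is expected to be the raw log text, without mail quoting prefixes.

```python
import re


def cuda_startup_problems(log_text):
    """Return a list of problem descriptions found in a NAMD-CUDA log."""
    # Long log lines may have been wrapped, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    problems = []

    # A Pe that was bound to the CPU emulation device instead of a real GPU.
    binding = re.compile(
        r"Pe (\d+) physical rank \d+ binding to CUDA device (\d+) on \S+ "
        r"'Device Emulation \(CPU\)'"
    )
    for pe, dev in binding.findall(flat):
        problems.append("Pe %s bound device %s to CPU emulation" % (pe, dev))

    # The fatal CUDA error that follows, naming the failing CUDA call.
    fatal = re.compile(r"FATAL ERROR: CUDA error (\w+) on Pe (\d+)")
    for call, pe in fatal.findall(flat):
        problems.append("Pe %s: %s failed" % (pe, call))

    return problems


if __name__ == "__main__":
    import sys

    # Usage (assumed): python check_log.py < namd.log
    for line in cuda_startup_problems(sys.stdin.read()):
        print(line)
```

An empty result only means that neither pattern was found; it does not prove that the run used the GPUs.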

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2012 - 23:20:28 CST