From: Robert Wohlhueter (bobwohlhueter_at_earthlink.net)
Date: Wed Nov 27 2013 - 09:12:57 CST

John, Josh,

Thanks, all, for your tips. It is almost certainly a CUDA problem. As a
CUDA test, I simply used NAMD (specifically
NAMD_CVS-2013-07-06_Linux-x86_64-multicore-CUDA/namd2, which worked
before my upgrade to Ubuntu 13.10). SourceForge has a utility,
"cuda-memtest", which, however, hit fatal errors on compilation (a
function-parameter-count mismatch; I'll deal with that later). Do any of
you have suggestions for alternative CUDA tests?
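
In the meantime, here is about the smallest CUDA sanity check I can
think of (the file name is arbitrary and this is only a sketch); if even
this fails, the problem presumably lies in the driver/runtime
installation rather than in NAMD or VMD:

// cuda_sanity.cu -- minimal check that the CUDA runtime can see the GPU.
// Build with something like: nvcc -o cuda_sanity cuda_sanity.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        // A failure here points at the driver/runtime, not at any application.
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Found %d CUDA device(s)\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("  Device %d: %s (compute capability %d.%d)\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}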

A couple of OpenGL programs seem to work fine on my machine.

The output from `ldd` on the vmd_LINUXAMD64 binary is also copied below;
there are no missing libraries, and there are several obviously
satisfied references to the relevant CUDA and GL libs. Does either of
you see anything pathological in that output?
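
As a further cross-check that the libcudart which ldd finds actually
agrees with the installed driver, something like the sketch below should
do (again only a sketch, nothing specific to VMD or NAMD, and the file
name is arbitrary):

// cuda_versions.cu -- compare the CUDA version supported by the kernel
// driver with the version of the libcudart loaded at run time.
// Build with something like: nvcc -o cuda_versions cuda_versions.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int driverVersion = 0, runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);    // highest CUDA version the installed driver supports
    cudaRuntimeGetVersion(&runtimeVersion);  // version of the libcudart actually in use
    printf("Driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
           driverVersion / 1000, (driverVersion % 100) / 10,
           runtimeVersion / 1000, (runtimeVersion % 100) / 10);
    if (runtimeVersion > driverVersion)
        printf("Runtime is newer than the driver supports -- likely a mismatch.\n");
    return 0;
}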

Worst of all is the result of running NAMD (also copied below). After
the "CharmLB>" message nothing more was output, even after running
overnight, yet all 4 CPUs ran continuously at 100%! `ps -ef | grep namd`
shows one process for charmrun and four for
NAMD_CVS-2013-07-06_Linux-x86_64-multicore-CUDA/namd2.

As for CUDA software, the most I can tell you is that I have installed
essentially all of the nvidia-cuda packages that show up in a Synaptic
search for "cuda".

Bob W.

bobw_at_winter-linux: ...vmd/test_cuda [4]> charmrun /usr/local/lib/namd/NAMD_CVS-2013-07-06_Linux-x86_64-multicore-CUDA/namd2 ++local +idlepoll +p4 ./2htq_box.config | tee 2htq_box.log
charmrun /usr/local/lib/namd/NAMD_CVS-2013-07-06_Linux-x86_64-multicore-CUDA/namd2 ++local +idlepoll +p4 ./2htq_box.config
tee 2htq_box.log
Charmrun> started all node programs in 0.448 seconds.
Converse/Charm++ Commit ID: v6.5.1-rc2-2-g1bd45bf
CharmLB> Load balancer assumes all CPUs are same.

bobw_at_winter-linux: ...vmd/vmd-1.9.1 [9]> ldd vmd_LINUXAMD64
ldd vmd_LINUXAMD64
     linux-vdso.so.1 => (0x00007fff38f76000)
     libGL.so.1 => /usr/lib/nvidia-319/libGL.so.1 (0x00007fe4522cd000)
     libGLU.so.1 => /usr/lib/x86_64-linux-gnu/libGLU.so.1 (0x00007fe45204d000)
     libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007fe451d18000)
     libcudart.so.4 => /usr/local/cuda/lib64/libcudart.so.4 (0x00007fe451aba000)
     libXinerama.so.1 => /usr/lib/x86_64-linux-gnu/libXinerama.so.1 (0x00007fe4518b6000)
     libXi.so.6 => /usr/lib/x86_64-linux-gnu/libXi.so.6 (0x00007fe4516a6000)
     libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fe451489000)
     libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fe451184000)
     libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fe450f80000)
     libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007fe450d7d000)
     libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fe450a78000)
     libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fe450862000)
     libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe45049a000)
     libnvidia-tls.so.319.32 => /usr/lib/nvidia-319/tls/libnvidia-tls.so.319.32 (0x00007fe450296000)
     libnvidia-glcore.so.319.32 => /usr/lib/nvidia-319/libnvidia-glcore.so.319.32 (0x00007fe44dd3f000)
     libXext.so.6 => /usr/lib/x86_64-linux-gnu/libXext.so.6 (0x00007fe44db2d000)
     libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007fe44d90e000)
     librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fe44d706000)
     /lib64/ld-linux-x86-64.so.2 (0x00007fe452629000)
     libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007fe44d501000)
     libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007fe44d2fb000)
bobw_at_winter-linux: ...vmd/vmd-1.9.1 [10]>

On 11/26/13 8:08 PM, John Stone wrote:
> Robert, Josh,
> Regarding rlwrap, if it causes you trouble, feel free to disable it...
> VMD doesn't care about rlwrap. This is something that we added to the
> VMD startup script to please users that prefer command interfaces with
> up/down arrow command histories and similar features as one might have in
> popular command shells and various GNU tools on Linux. VMD itself doesn't
> know anything about rlwrap and will run just fine without it.
>
> Cheers,
> John Stone
> vmd_at_ks.uiuc.edu
>
> On Tue, Nov 26, 2013 at 06:42:09PM -0600, Josh Vermaas wrote:
>> Hi Robert,
>>
>> Based on when the segfault is occurring, and the general list of things
>> that break on an upgrade, it might just be a version mismatch caused by
>> conflicting versions of the nvidia driver package. This happens to me when
>> nvidia-current gets upgraded, as it will pick up the new driver, but won't
>> get rid of the old ones. One thing I would check is the result of ldd
>> vmd_LINUXAMD64 in /usr/local/lib. On my system, which uses version 319.37,
>> this is what a fraction of it looks like:
>> libGL.so.1 => /usr/lib/nvidia-current/libGL.so.1 (0x00002b4635b53000)
>> libGLU.so.1 => /usr/lib/x86_64-linux-gnu/libGLU.so.1 (0x00002b4635e82000)
>> libcudart.so.4 => not found
>> libnvidia-tls.so.319.37 => /usr/lib/nvidia-current/tls/libnvidia-tls.so.319.37 (0x00002b4637a31000)
>> libnvidia-glcore.so.319.37 => /usr/lib/nvidia-current/libnvidia-glcore.so.319.37 (0x00002b4637c34000)
>> Libcudart isn't actually needed unless you need one of the commands that
>> uses GPU acceleration, but the other 4 had better resolve, and should all
>> resolve to libraries corresponding to the right version (in my case
>> 319.37). Manually removing old installed versions of the nvidia drivers is
>> how I tend to fix these problems when they come up.
>>
>> In terms of the rlwrap "fun" you've been having, I know this seems like a
>> stupid thing to do, but unless you need rlwrap for something else, the
>> stock VMD distribution actually works better without rlwrap installed:
>> the script then just complains about the missing rlwrap command and goes
>> on to load VMD, instead of exiting on a malformed rlwrap invocation. Using
>> this approach, nothing obvious appears to be broken.
>>
>> -Josh Vermaas
>>
>> On 11/26/13, 4:15 PM, Robert Wohlhueter wrote:
>>
>> Using Ubuntu 13.10 on an AMD64 computer with an NVIDIA GTX275 and NVIDIA
>> driver 319.32, the vmd-1.9.1 binary distribution is broken. The same binary
>> on the same hardware (with the NVIDIA 304 driver) under Ubuntu 12.10 worked
>> fine.
>>
>> Using the originally installed vmd.csh script, startup seems to hang
>> because of an inability to set up rlwrap (though in fact the file
>> "vmd_completion.dat" is present):
>>
>> ############################################################################
>> bobw_at_winter-linux: ...lib/vmd [56]> /usr/local/bin/vmd
>> /usr/local/bin/vmd.wrap
>> rlwrap: No match.
>> ############################################################################
>>
>> If I comment out the lines relevant to loading vmd_completion.dat and then
>> run the script, the rlwrap error is avoided, but I get no output at
>> all:
>>
>> ###########################################################################
>> obw_at_winter-linux: ...lib/vmd [59]> /usr/local/bin/vmd.nowrap
>> /usr/local/bin/vmd
>> bobw_at_winter-linux: ...lib/vmd [60]>
>> ###########################################################################
>>
>> But these are probably minor problems. If I bypass the script entirely
>> (but with the VMDDIR and MASTERVMDDIR environment variables set manually),
>> I get a little further before dumping core:
>>
>> ############################################################################
>> bobw_at_winter-linux: ...vmd/vmd-1.9.1 [60]>./vmd_LINUXAMD64
>> ./vmd_LINUXAMD64
>> Info) VMD for LINUXAMD64, version 1.9.1 (February 1, 2012)
>> Info) http://www.ks.uiuc.edu/Research/vmd/
>> Info) Email questions and bug reports to vmd_at_ks.uiuc.edu
>> Info) Please include this reference in published work using VMD:
>> Info) Humphrey, W., Dalke, A. and Schulten, K., `VMD - Visual
>> Info) Molecular Dynamics', J. Molec. Graphics 1996, 14.1, 33-38.
>> Info) -------------------------------------------------------------
>> Info) Multithreading available, 4 CPUs detected.
>> Info) Free system memory: 6482MB (81%)
>> Segmentation fault (core dumped)
>> ###########################################################################
>>
>> I would guess the problem lies not with Ubuntu 13.10 per se, but with the
>> change in video driver between 12.10 and 13.10. I'm reluctant to muck
>> around with video drivers, in particular to try to revert to NVIDIA 304.x,
>> since this always breaks a lot of programs. Still, my hardware/video
>> driver combination must be fairly commonplace.
>>
>> Anyone have clues to what's wrong? I'm grateful for any pointers.
>>
>> Bob Wohlhueter