From: Vlad Cojocaru (vlad.cojocaru_at_mpi-muenster.mpg.de)
Date: Fri Oct 07 2011 - 07:50:19 CDT

John,

Thanks a lot for the answer. This is what I am thinking as well.
Cleaning doesn't help.
However, if I install the Quadro 4000 from the other workstation,
everything has been stable so far and the problem does not occur.
Also, when I put the Quadro FX 3800 in the other workstation, the problem
appears on that machine as well (the error messages from dmesg are below).

The OS and driver version are the same on the 2 machines; only the CPU
differs (mine has AMD, the other has Intel).
Therefore, everything points to a hardware issue...

However, the biggest problem is that I cannot seem to reproduce this
issue in Windows... Therefore the manufacturer (HP) refuses to replace
the graphics card (they don't support Linux)...

So, I would like to ask other VMD users who use the Quadro FX 3800 whether
they have had any problems with VMD. Please write to me privately; there is
no need for the list to receive all these messages. I will send a summary
mail at the end if a solution or conclusion comes out. If you have noticed
anything unusual (segfaults, lagging Xorg) that repeats itself and appears
to be graphics-card dependent, I would appreciate it if you could let me know.

I know this is probably not a VMD-related issue, and I apologize if you
feel I am abusing the list with topics that might be irrelevant...

However, I have neither the time nor the nerves to start opening
accounts on other forums (NVIDIA or others)... And my problems appear
while using VMD, so I am just asking whether anybody else has experienced
something similar.

Best,
Vlad

P.S. I wouldn't advise anybody (at least in Germany) to buy HP
workstations. Their technical support is incredibly bad, and within 1
year of using my workstation I have had many problems with the machine. I
spent at least 2-3 weeks (effective time) resolving hardware issues
myself, even with an on-site support contract. It's just unbelievable.
Sorry for my frustration...

--------- the errors
Oct 6 10:43:00 kratos kernel: [ 204.188177] NVRM: Xid (0000:18:00): 8, Channel 00000005
Oct 6 10:43:02 kratos kernel: [ 206.188092] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 6 10:43:24 kratos kernel: [ 228.232161] NVRM: Xid (0000:18:00): 8, Channel 00000005
Oct 6 10:43:26 kratos kernel: [ 230.232081] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 6 10:43:44 kratos kernel: [ 248.268094] NVRM: Xid (0000:18:00): 8, Channel 00000005
Oct 6 10:43:46 kratos kernel: [ 250.268045] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 6 10:44:04 kratos kernel: [ 268.302101] NVRM: Xid (0000:18:00): 8, Channel 00000005
Oct 6 10:44:06 kratos kernel: [ 270.302051] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 6 10:44:36 kratos kernel: [ 300.362110] NVRM: Xid (0000:18:00): 8, Channel 00000005
Oct 6 10:44:38 kratos kernel: [ 302.362060] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 6 10:45:00 kratos kernel: [ 324.406251] NVRM: Xid (0000:18:00): 8, Channel 00000005
Oct 6 10:45:02 kratos kernel: [ 326.406165] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 6 10:45:28 kratos kernel: [ 352.458249] NVRM: Xid (0000:18:00): 8, Channel 00000005
Oct 6 10:45:30 kratos kernel: [ 354.458167] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 6 10:45:48 kratos kernel: [ 372.494210] NVRM: Xid (0000:18:00): 8, Channel 00000005
Oct 6 10:45:50 kratos kernel: [ 374.494137] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 6 10:46:12 kratos kernel: [ 396.534145] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 6 10:46:36 kratos kernel: [ 418.574072] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 6 10:47:19 kratos kernel: [ 463.660178] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 6 10:47:34 kratos kernel: [ 478.686137] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context
Oct 6 10:48:17 kratos kernel: [ 521.768150] NVRM: os_schedule: Attempted to yield the CPU while in atomic or interrupt context

On 10/05/2011 04:36 PM, John Stone wrote:
> Vlad,
> This is most likely a GPU driver or hardware problem since you
> have had cases where the entire machine locks up and you're unable
> to access it remotely. What happens if you install a different GPU,
> or swap the GPUs between the machine that has been stable and the
> one that has been unstable? If the problem moves with the GPU, then
> you most likely have a GPU that is beginning to fail. Have you inspected
> the GPU to determine if it is being cooled properly? You might blow
> the dust out of it using canned air to see if it makes any difference.
> We dust out our machines every two or three years as we've previously
> run into problems caused by poor airflow through the CPU or GPU heat sinks.
> See if either swapping the GPUs or doing some cleaning helps at all.
>
> Cheers,
> John Stone
> vmd_at_ks.uiuc.edu
>
> On Wed, Oct 05, 2011 at 01:47:08PM +0200, Vlad Cojocaru wrote:
>> Dear VMD users,
>>
>> I have some strange problems when running VMD on my workstation. There
>> are 3 symptoms I have noticed for a while already, and recently they have
>> gotten worse:
>>
>> 1. VMD fails to start, stalling while the message "Creating CUDA device
>> pool and initializing hardware ..." appears on the console.
>> 2. VMD stalls upon loading a structure (3000 atoms, nothing special),
>> freezing the entire graphics system... The console is blocked with the
>> message "Determining bond structure from distance search".
>> 3. VMD simply crashes with a segmentation fault.
>>
>> My workstation has 2 AMD quad-core Opterons, 32 GB RAM, an NVIDIA Quadro
>> FX 3800 + NVIDIA 3D kit, and 2 3D-capable SAMSUNG SyncMaster monitors. The
>> operating system is openSUSE 11.4, and the NVIDIA graphics driver version
>> is 280.13 (latest).
>>
>> Tests I performed:
>> 1. Removing the stereo devices (emitter) and starting with no stereo in
>> the xorg.conf does not solve the problem.
>> 2. Removing 1 monitor does not solve the problem.
>> 3. The problem is independent of the NVIDIA graphics driver (I tried
>> several of them).
>> 4. The problem remains both when using the executable supplied with VMD and
>> when using my own compilation of VMD 1.9 or VMD CVS from 26.09 (CUDA 3.2,
>> GCC 4.5 - default).
>> 5. The problem is not observed on a second machine with exactly the same
>> operating system but with Intel Xeon CPUs and a Quadro 4000 graphics
>> card (neither with the pre-compiled binaries nor with my own compilations)...
>>
>> The worst part is that the problem is not always reproducible... Sometimes
>> VMD starts normally; sometimes one of the symptoms described appears.
>> Lately, the crashes occur more frequently than before. Sometimes symptom
>> 2 blocks the computer so badly that I cannot even restart it from a
>> different machine and need to shut it down completely... Sometimes
>> flickering on the screens is observed (both with stereo ON or OFF, and
>> with 1 or 2 monitors connected).
>>
>> My question is: Has somebody observed this behavior before? Can it be a
>> graphics card issue?
>>
>> Thanks for any thoughts on this
>>
>> Best wishes
>> Vlad
>>
>> --
>> Dr. Vlad Cojocaru
>> Max Planck Institute for Molecular Biomedicine
>> Department of Cellular and Developmental Biology
>> Roentgenstrasse 20
>> 48149 Muenster, Germany
>> tel: +49-251-70365-324
>> fax: +49-251-70365-399
>> email: vlad.cojocaru[at]mpi-muenster.mpg.de
>>
