Re: Fwd: nvidia issue with namd12 Debian 11

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Mon Jan 17 2022 - 14:50:14 CST

Hi Josh, no big system:
Info) Analyzing structure ...
Info) Atoms: 107292
Info) Bonds: 77829
Info) Angles: 61441 Dihedrals: 46455 Impropers: 1604 Cross-terms: 158
Info) Bondtypes: 0 Angletypes: 0 Dihedraltypes: 0 Impropertypes: 0
Info) Residues: 31152
Info) Waters: 30102
Info) Segments: 128
Info) Fragments: 30587 Protein: 9 Nucleic: 25

Following your hint, I tried MD with a very small system:

Info) Analyzing structure ...
Info) Atoms: 1448
Info) Bonds: 1187
Info) Angles: 1618 Dihedrals: 699 Impropers: 0 Cross-terms: 0
Info) Bondtypes: 0 Angletypes: 0 Dihedraltypes: 0 Impropertypes: 0
Info) Residues: 261
Info) Waters: 0
Info) Segments: 33
Info) Fragments: 261 Protein: 0 Nucleic: 0

Exactly the same error messages that I reported for the bigger system. So
it is not a problem of insufficient memory on the GTX 680.
My very tentative guess is that there is a mismatch between the Linux kernel
and the nvidia driver, but both were selected by Debian's own packaging, so
other people should have met the issue too. I am not sure that Debian 11 would
work correctly with a downgraded kernel/nvidia driver pair. Perhaps
it would be easier to downgrade to Debian 10, which worked correctly on my
raid1 box.
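
To put numbers on the two suspicions above (GPU memory and a kernel/driver
mismatch), something like the following could be run on the box. It is only a
minimal sketch, assuming Python 3, that 'nvidia-smi' is on the PATH, and that
/proc/driver/nvidia/version exists (it is only present while the nvidia kernel
module is loaded). The function names are hypothetical, and the expected
version is simply the Debian package version quoted further down in this
thread. It only gathers facts; it does not diagnose the CUDA error itself.

```python
import platform
import re
import subprocess
from pathlib import Path

# Assumption: this file is only present while the nvidia kernel module is loaded.
PROC_VERSION = Path("/proc/driver/nvidia/version")


def kernel_module_version(text):
    """Extract the version from the 'NVRM version: ... Kernel Module  X.Y.Z' line."""
    match = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def upstream_version(debian_version):
    """Strip an optional epoch and the Debian revision: '460.91.03-1' -> '460.91.03'."""
    return debian_version.split(":", 1)[-1].rsplit("-", 1)[0]


def parse_gpu_rows(csv_text):
    """Parse 'name, memory.total, memory.used' rows (values in MiB, no units)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # rsplit keeps the name intact even if it happened to contain a comma.
        name, total, used = [field.strip() for field in line.rsplit(",", 2)]
        gpus.append({"name": name, "total_mib": int(total), "used_mib": int(used)})
    return gpus


def summarize(gpus, module_version, package_version):
    """Return one line per GPU plus one line comparing module and package versions."""
    lines = []
    for index, gpu in enumerate(gpus):
        free = gpu["total_mib"] - gpu["used_mib"]
        lines.append(f"GPU {index} ({gpu['name']}): {free} MiB free of {gpu['total_mib']} MiB")
    expected = upstream_version(package_version)
    status = "match" if module_version == expected else "MISMATCH"
    lines.append(f"kernel module {module_version} vs package {expected}: {status}")
    return lines


def main(package_version="460.91.03-1"):
    """Collect the facts from the running system and print them."""
    module = None
    if PROC_VERSION.exists():
        module = kernel_module_version(PROC_VERSION.read_text())
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    print(f"running kernel: {platform.release()}")
    for line in summarize(parse_gpu_rows(output), module, package_version):
        print(line)
```

Calling main() prints the running kernel, the free memory per card, and whether
the loaded module matches the packaged driver version. A MISMATCH would suggest
a stale module (for instance, a driver upgrade without a reboot) rather than a
problem in NAMD.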

thanks
francesco

Incidentally, I wrote namd12, but it is actually 14.

On Mon, Jan 17, 2022 at 1:24 PM Vermaas, Josh <vermaasj_at_msu.edu> wrote:

> How big is your system? The error being tossed back is that you are out of
> memory. The GTX 680 only has 2GB of memory, and so depending on your system
> size you may run yourself out of memory.
>
>
>
> -Josh
>
>
>
> *From: *<owner-namd-l_at_ks.uiuc.edu> on behalf of Francesco Pietra <
> chiendarret_at_gmail.com>
> *Reply-To: *"namd-l_at_ks.uiuc.edu" <namd-l_at_ks.uiuc.edu>, Francesco Pietra <
> chiendarret_at_gmail.com>
> *Date: *Monday, January 17, 2022 at 4:40 AM
> *To: *NAMD <namd-l_at_ks.uiuc.edu>, debian-users <
> debian-user_at_lists.debian.org>
> *Subject: *namd-l: Fwd: nvidia issue with namd12 Debian 11
>
>
>
> I forgot to add that the commands 'nvidia-detect' and 'nvidia-smi' detect both
> GTX 680 cards as active and report that they are supported by all driver
> versions, including those for Tesla 450.
>
> Actually, legacy nvidia drivers are only required for very old nvidia
> graphics cards, from the 400 series downwards.
>
>
>
> I should also add that the box is at CUDA 11.2
>
>
>
> ---------- Forwarded message ---------
> From: *Francesco Pietra* <chiendarret_at_gmail.com>
> Date: Mon, Jan 17, 2022 at 4:15 AM
> Subject: nvidia issue with namd12 Debian 11
> To: NAMD <namd-l_at_ks.uiuc.edu>, debian-users <debian-user_at_lists.debian.org>
>
>
>
> I have a Debian 11 box with two GTX 680 cards and am unable to get them
> working. The problem appeared after upgrading from Debian 10 to 11 and from
> namd 11 to 12 (/NAMD_Git-2021-11-27_Linux-x86_64-multicore-CUDA)
>
>
>
> nvidia-driver 460.91.03-1
>
> linux-image-amd64 5.10.84-1
>
> linux kernel 5.10.0-10-amd64
>
>
>
> Error when trying a minimization:
>
>
>
> TCL: Minimizing for 3000 steps
> FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file
> src/CudaTileListKernel.cu, function sortTileLists, line 1577
> on Pe 2 (gig64 device 0 pci 0:2:0): an illegal memory access was
> encountered
> FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file
> src/CudaTileListKernel.cu, function sortTileLists, line 1577
> on Pe 2 (gig64 device 0 pci 0:2:0): an illegal memory access was
> encountered
> [Partition 0][Node 0] End of program
> FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file
> src/CudaTileListKernel.cu, function sortTileLists, line 1577
> on Pe 4 (gig64 device 1 pci 0:3:0): an illegal memory access was
> encountered
> FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file
> src/CudaTileListKernel.cu, function sortTileLists, line 1577
> on Pe 4 (gig64 device 1 pci 0:3:0): an illegal memory access was
> encountered
>
>
>
> I have also reconfigured the xserver, to no avail.
>
>
>
> I have noticed issues about namd12/nvidia on the web, apparently
> unresolved.
>
>
>
> Thanks for advice
>
> francesco pietra
>
>
>
>
>

This archive was generated by hypermail 2.1.6 : Tue Dec 13 2022 - 14:32:44 CST