Re: NAMD-2.12 handful of issues with CUDA

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Fri Mar 10 2017 - 05:41:10 CST

Thanks for your interest,

 

Yes, since it works with GROMACS, CP2K and NAMD versions < 2.12. Maybe I should also mention that I'm using the Amber force field and Amber input files.

 

Best,

Norman Geist

 

From: Ajasja Ljubetič [mailto:ajasja.ljubetic_at_gmail.com]
Sent: Friday, 10 March 2017 12:14
To: namd-l <namd-l_at_ks.uiuc.edu>; Norman Geist <norman.geist_at_uni-greifswald.de>
Subject: Re: namd-l: NAMD-2.12 handful of issues with CUDA

 

Are you sure your graphics card is OK?

Have you tried any of the available memory checkers?
https://www.raymond.cc/blog/having-problems-with-video-card-stress-test-its-memory/

Best,

Ajasja

 

On 10 March 2017 at 11:55, Norman Geist <norman.geist_at_uni-greifswald.de> wrote:

3. Constraint errors also occur at random; is some memory left uninitialized somewhere?

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On behalf of Norman Geist
Sent: Friday, 10 March 2017 10:16
To: namd-l_at_ks.uiuc.edu
Subject: namd-l: NAMD-2.12 handful of issues with CUDA

 

Dear experts,

 

somehow I have a lot of problems with the NAMD-2.12 version. All CUDA jobs will:

 

1. Fail immediately in SMP single-process runs when using more than one thread via ++ppn:

FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function sortTileLists

on Pe 4 (gpu5 device 1): an illegal memory access was encountered

------------- Processor 4 Exiting: Called CmiAbort ------------

Reason: FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function sortTileLists

on Pe 4 (gpu5 device 1): an illegal memory access was encountered

 

This happens for my own compiled versions (CUDA-7.5) as well as for the precompiled multicore version (CUDA-6.5).

 

2. Fail after a random number of steps (a few ps up to tens of ns) with either a segfault or even an illegal instruction O_o (MPI + CUDA-7.5 + SMP build)

 

I already upgraded the GPU driver, but nothing changed. By the way, I remember that I also had problems with NAMD-2.11 and GBIS when using CUDA (illegal instruction).
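
One way to narrow down whether a single card is at fault is to tally which GPU each failed job reports. The following is only a minimal sketch, assuming the job logs contain a location line in the same form as the error quoted above ("on Pe 4 (gpu5 device 1): ..."), and assuming that "gpu5" there is the host name. The function name tally_cuda_failures is hypothetical and not part of NAMD.

```python
import re
from collections import Counter

# Matches the location line from the error quoted above, e.g.
# "on Pe 4 (gpu5 device 1): an illegal memory access was encountered"
# Assumption: the token before "device" is the host name.
PE_LINE = re.compile(r"on Pe (\d+) \((\S+) device (\d+)\): (.+)")


def tally_cuda_failures(logs):
    """Count failed jobs per (host, device).

    `logs` is an iterable of log texts, one string per job.
    NAMD prints the error twice (once directly, once after "Reason:"),
    so each (host, device) pair is counted at most once per job.
    """
    counts = Counter()
    for log_text in logs:
        failed_here = set()
        for line in log_text.splitlines():
            match = PE_LINE.search(line)
            if match:
                _pe, host, device, _message = match.groups()
                failed_here.add((host, int(device)))
        counts.update(failed_here)
    return counts
```

If the counts pile up on one (host, device) pair, that points towards a hardware problem with that card; if they are spread evenly over all GPUs, a software issue seems more likely.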

 

Any hints?

 

Regards

 

Norman Geist

 
