Re: NAMD-2.12 handful of issues with CUDA

From: Ajasja Ljubetič (ajasja.ljubetic_at_gmail.com)
Date: Fri Mar 10 2017 - 05:14:10 CST

Are you sure your graphics card is OK?
Have you tried any of the available memory checkers?
https://www.raymond.cc/blog/having-problems-with-video-card-stress-test-its-memory/
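
If you have logs from several failed runs, a quick tally of which host and device show up in the fatal CUDA errors can also hint at whether the crashes follow one particular card. A minimal Python sketch, assuming the error line has the shape quoted further down in this thread ("on Pe 4 (gpu5 device 1): ..."); the function name failing_devices and the regular expression are only illustrative and are not part of NAMD:

```python
import re
from collections import Counter

# Pattern modelled on the fatal error line quoted in this thread, e.g.
#   "on Pe 4 (gpu5 device 1): an illegal memory access was encountered"
# Assumption: other NAMD builds may format this line differently.
CUDA_FAIL = re.compile(r"on Pe (\d+) \((\S+) device (\d+)\): (.+)")


def failing_devices(log_text):
    """Count matching CUDA error lines per (host, device index)."""
    counts = Counter()
    for line in log_text.splitlines():
        # search() rather than match(), so leading text such as mail
        # quoting ("> ") in a pasted excerpt does not prevent a hit.
        m = CUDA_FAIL.search(line)
        if m:
            host, device = m.group(2), int(m.group(3))
            counts[(host, device)] += 1
    return counts
```

Note that the abort message repeats the line (once under "FATAL ERROR" and once under "Reason"), so the counts are only meaningful relative to each other. If nearly all hits point at the same host and device, a hardware check of that card looks worthwhile; if they are spread evenly, the hardware is a less likely culprit.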

Best,
Ajasja

On 10 March 2017 at 11:55, Norman Geist <norman.geist_at_uni-greifswald.de>
wrote:

> 3. Constraint errors also occur at random; perhaps some memory is
> uninitialized somewhere?
>
>
>
> *From:* owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] *On
> Behalf Of* Norman Geist
> *Sent:* Friday, 10 March 2017 10:16
> *To:* namd-l_at_ks.uiuc.edu
> *Subject:* namd-l: NAMD-2.12 handful of issues with CUDA
>
>
>
> Dear experts,
>
>
>
> I am having a lot of problems with NAMD-2.12. All CUDA jobs
> will:
>
>
>
> 1. Immediately fail in single-process SMP runs when using more
> than one thread via ++ppn:
>
> FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file
> src/CudaTileListKernel.cu, function sortTileLists
>
> on Pe 4 (gpu5 device 1): an illegal memory access was encountered
>
> ------------- Processor 4 Exiting: Called CmiAbort ------------
>
> Reason: FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file
> src/CudaTileListKernel.cu, function sortTileLists
>
> on Pe 4 (gpu5 device 1): an illegal memory access was encountered
>
>
>
> This happens for my own compiled versions (CUDA-7.5) as well as for the
> precompiled multicore version (CUDA-6.5).
>
>
>
> 2. Fail after a random number of steps (a few ps up to tens of ns)
> with either a segfault or even an illegal instruction O_o (MPI + CUDA-7.5 +
> SMP build)
>
>
>
> I have already upgraded the GPU driver, but nothing changed. By the way, I
> remember that I also had problems with namd-2.11 and GBIS when using CUDA
> (illegal instruction).
>
>
>
> Any hints?
>
>
>
> Regards
>
>
>
> Norman Geist
>
