NAMD-2.12 CUDA2 and PMECUDA problems

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Mon Aug 28 2017 - 05:54:58 CDT

Dear experts,

for some unknown reason we are not able, here at our site, to benefit
from the extensive speedup that namd-2.12-cuda offers, because of
random stability issues and other errors. There seems to be a relation
to systems containing large amounts of vacuum. We are using Amber
input files.

Using 1 process with 10 threads and 2 K20 GPUs is very fast, but for
some systems it crashes immediately:

FATAL ERROR: CUDA error cudaStreamSynchronize(stream) in file src/CudaTileListKernel.cu, function sortTileLists on Pe 8 (gpu5 device 1): an illegal memory access was encountered
FATAL ERROR: CUDA error cudaGetLastError() in file src/CudaTileListKernel.cu, function sortTileLists on Pe 4 (gpu5 device 0): an illegal memory access was encountered
FATAL ERROR: CUDA error cudaMallocHost(&p, size) in file src/ComputePmeCUDAMgr.C, function alloc_ on Pe 7 (gpu5 device 1): an illegal memory access was encountered
[8] Stack Traceback:
  [8:0] CmiAbortHelper+0x6b  [0xe0a0fb]
  [8:1] _Z8NAMD_diePKc+0x74  [0x6d1414]
  [8:2] _Z7cudaDiePKc9cudaError+0x9f  [0x9a382f]
  [8:3] _ZN18CudaTileListKernel13sortTileListsEbibiiNS_7PtrSizeI8TileListEENS0_IiEENS0_IjEES3_NS0_I15PatchPairRecordEENS0_I8TileExclEEiiS2_S3_S4_S3_S6_S8_P11CUstream_st+0x1bbe  [0xbbe8de]
  [8:4] _ZN18CudaTileListKernel15reSortTileListsEbP11CUstream_st+0x1c2  [0xbc0a82]
  [8:5] _ZN20CudaComputeNonbonded15reSortTileListsEv+0x81  [0x9709d1]
  [8:6] _ZN20CudaComputeNonbonded14forceDoneCheckEPvd+0x137  [0x970c37]
  [8:7] CcdCallBacks+0x1d4  [0xe14d94]
  [8:8] CsdScheduleForever+0xf7  [0xe0e5d7]
  [8:9] CsdScheduler+0x2d  [0xe0e7dd]
  [8:10]   [0xe0c6f6]
  [8:11]   [0xe0cda5]
  [8:12] +0x80db  [0x7ff65cc230db]
  [8:13] clone+0x6d  [0x7ff65b81fe3d]
[4] Stack Traceback:
  [4:0] CmiAbortHelper+0x6b  [0xe0a0fb]
  [4:1] _Z8NAMD_diePKc+0x74  [0x6d1414]
  [4:2] _Z7cudaDiePKc9cudaError+0x9f  [0x9a382f]
  [4:3] _ZN18CudaTileListKernel13sortTileListsEbibiiNS_7PtrSizeI8TileListEENS0_IiEENS0_IjEES3_NS0_I15PatchPairRecordEENS0_I8TileExclEEiiS2_S3_S4_S3_S6_S8_P11CUstream_st+0x2989  [0xbbf6a9]
  [4:4] _ZN18CudaTileListKernel15reSortTileListsEbP11CUstream_st+0x1c2  [0xbc0a82]
  [4:5] _ZN20CudaComputeNonbonded15reSortTileListsEv+0x81  [0x9709d1]
  [4:6] _ZN20CudaComputeNonbonded14forceDoneCheckEPvd+0x137  [0x970c37]
  [4:7] CcdCallBacks+0x1d4  [0xe14d94]
  [4:8] CsdScheduleForever+0xf7  [0xe0e5d7]
  [4:9] CsdScheduler+0x2d  [0xe0e7dd]
  [4:10]   [0xe0c6f6]
  [4:11]   [0xe0cda5]
  [4:12] +0x80db  [0x7ff65cc230db]
  [4:13] clone+0x6d  [0x7ff65b81fe3d]
[7] Stack Traceback:
  [7:0] CmiAbortHelper+0x6b  [0xe0a0fb]
  [7:1] _Z8NAMD_diePKc+0x74  [0x6d1414]
  [7:2] _Z7cudaDiePKc9cudaError+0x9f  [0x9a382f]
  [7:3] _ZN18CudaPmeAtomStorage6alloc_Ei+0x46  [0x91cf26]
  [7:4] _ZN20ComputePmeCUDADevice9recvAtomsEP10PmeAtomMsg+0x859  [0x91c759]
  [7:5] _ZN14ComputePmeCUDA9sendAtomsEv+0x6ff  [0x90262f]
  [7:6] _ZN19CkIndex_WorkDistrib29_call_enqueuePme_LocalWorkMsgEPvS0_+0xd  [0xb2465d]
  [7:7] CkDeliverMessageFree+0x22  [0xd24872]
  [7:8] _Z15_processHandlerPvP11CkCoreState+0x192c  [0xd2b5dc]
  [7:9] CsdScheduleForever+0x48  [0xe0e528]
  [7:10] CsdScheduler+0x2d  [0xe0e7dd]
  [7:11]   [0xe0c6f6]
  [7:12]   [0xe0cda5]
  [7:13] +0x80db  [0x7ff65cc230db]
  [7:14] clone+0x6d  [0x7ff65b81fe3d]

Using 2 processes with 5 threads each and 2 K20 GPUs with "useCUDA2 no"
runs for a while, but slower than with a single process, and later
crashes with either a RATTLE error or a segfault.
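For reference, the single-process setup described above would be
launched roughly as sketched below (a multicore-CUDA NAMD build is
assumed; the binary path, device numbering, and config file name
"run.namd" are placeholders, and exact charmrun flags for the
two-process SMP variant depend on the local Charm++ build, so only the
single-process line is shown):

# Single process, 10 worker threads, both K20 GPUs
# (the fast configuration that crashes immediately on some systems):
#   namd2 +p10 +devices 0,1 run.namd
#
# In run.namd, the NAMD 2.12 kernels can be toggled back to the
# older code path, as in the slower two-process runs:
useCUDA2    no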

I'd really like to help find the issue with the new CUDA kernels,
since they are so impressively fast.

Norman Geist

This archive was generated by hypermail 2.1.6 : Sun Dec 31 2017 - 23:21:35 CST