Re: NAMD CUDA Error

From: Mcguire, Kelly (klmcguire_at_UCSD.EDU)
Date: Sat Mar 12 2022 - 19:29:13 CST

Actually, the full error is:

FATAL ERROR: CUDA error cudaMemsetAsync(data, 0, sizeofT*ndata, stream) in file src/CudaUtils.C, function clear_device_array_async_T, line 52
 on Pe 16 (gpu-12 device 1 pci 0:d8:0): invalid argument

[Partition 0][Node 0] End of program

FATAL ERROR: CUDA error cudaGetLastError() in file src/ComputeBondedCUDAKernel.cu, function bondedForce, line 1857
 on Pe 16 (gpu-12 device 1 pci 0:d8:0): invalid argument

________________________________
From: Mcguire, Kelly
Sent: Saturday, March 12, 2022 5:26 PM
To: namd-l_at_ks.uiuc.edu <namd-l_at_ks.uiuc.edu>
Subject: NAMD CUDA Error

Has anyone had this error before? It seems somewhat random (i.e., hard to reproduce), but it still happens often enough to be a nuisance. So far, it has happened with the NAMD 2.14 multicore CUDA version dated 8/05/2020, on nodes that each have either two NVIDIA RTX 2080 Ti or two Titan X GPUs. I'm running the same jobs on NAMD 2.13 right now to see whether the same error occurs. I showed the error to the person who manages our Linux cluster; he hasn't seen it before and is looking into it. In the meantime, I'm asking here too. Thanks!

FATAL ERROR: CUDA error cudaMemsetAsync(data, 0, sizeofT*ndata, stream) in file src/CudaUtils.C, function clear_device_array_async_T

This archive was generated by hypermail 2.1.6 : Tue Dec 13 2022 - 14:32:44 CST