NAMD3 Alpha7 MultiGPU errors

From: Raman Preet Singh (ramanpreetsingh_at_hotmail.com)
Date: Sun Mar 07 2021 - 07:01:04 CST

Dear NAMD Community,

I have been trying to use the NAMD3 alpha7 MultiGPU build but keep running into errors. My system runs Ubuntu 20.04 LTS with CUDA 11.2 and has three GTX 1050 Ti GPUs (Pascal architecture).

I can run the NAMD conf file with NAMD3 alpha8 SingleGPU (with "CUDASOAintegrate on" and "margin 4") as well as with NAMD 2.14 (with both "CUDASOAintegrate on" and "margin 4" removed from the conf file) without any errors. However, when I run the same input file with the MultiGPU flavor, it fails every time. So I guess the problem lies with this specific NAMD flavor.

When I use 3 GPUs and 3 cores ("namd3 +p3 +devices 0,1,2 <NAMD conf file>"), I get the error appended below. The same error appears if I use "+p2" with different combinations of +devices arguments (0,1 or 0,2 or 1,2).
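
The launch variants described above can be written out systematically. The following is only a minimal sketch: `build_namd_cmd` and `all_launches` are hypothetical helper names, `run.conf` is a placeholder for the actual conf file, and the rule "one core per listed GPU" is simply what the commands in this post do, not a documented NAMD requirement.

```python
from itertools import combinations


def build_namd_cmd(devices, conf_file, binary="namd3"):
    """Build an argument list using one CPU core (+p) per listed GPU.

    Mirrors the commands quoted in the post, e.g.
    namd3 +p3 +devices 0,1,2 <NAMD conf file>
    """
    device_list = ",".join(str(d) for d in devices)
    return [binary, f"+p{len(devices)}", "+devices", device_list, conf_file]


def all_launches(gpu_ids, conf_file):
    """Enumerate every multi-GPU combination (two or more devices)."""
    commands = []
    for count in range(2, len(gpu_ids) + 1):
        for combo in combinations(gpu_ids, count):
            commands.append(build_namd_cmd(combo, conf_file))
    return commands


if __name__ == "__main__":
    # "run.conf" is a placeholder, not a file named in the post.
    for cmd in all_launches([0, 1, 2], "run.conf"):
        print(" ".join(cmd))
```

Each list can be passed to `subprocess.run` as-is, which makes it easy to record which device combination produced which console output.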

I was wondering whether I am doing this the right way or whether this has something to do with the old architecture of my GPUs.

Thanks in advance for help.

Regards,
Raman

PS: The last few lines of the console output are:
Info: Startup phase 2 took 0.0116317 s, 0 MB of memory in use
Info: Startup phase 3 took 7.2254e-05 s, 0 MB of memory in use
Info: Startup phase 4 took 0.000177379 s, 0 MB of memory in use
Info: Startup phase 5 took 3.4625e-05 s, 0 MB of memory in use
Info: PATCH GRID IS 9 (PERIODIC) BY 2 (PERIODIC) BY 1 (PERIODIC)
Info: PATCH GRID IS 1-AWAY BY 1-AWAY BY 1-AWAY
Info: Reading from binary file step4_equilibration.vel
Info: REMOVING COM VELOCITY -0.0219164 -0.0463273 -0.043283
Info: LARGEST PATCH (10) HAS 2867 ATOMS
Info: TORUS A SIZE 1 USING 0
Info: TORUS B SIZE 1 USING 0
Info: TORUS C SIZE 1 USING 0
Info: TORUS MINIMAL MESH SIZE IS 1 BY 1 BY 1
Info: Placed 100% of base nodes on same physical node as patch
Info: Startup phase 6 took 0.0113036 s, 0 MB of memory in use
Info: Use 3D box decompostion in PME FFT.
Info: PME using 1 x 1 x 1 pencil grid for FFT and reciprocal sum.
Info: Startup phase 7 took 0.000145019 s, 0 MB of memory in use
Info: Updated CUDA force table with 4096 elements.
Info: Updated CUDA LJ table with 704 x 704 elements.
Info: Updated CUDA force table with 4096 elements.
Info: Updated CUDA LJ table with 704 x 704 elements.
Info: Startup phase 8 took 0.0145335 s, 0 MB of memory in use
Info: Startup phase 9 took 7.8326e-05 s, 0 MB of memory in use
Info: Startup phase 10 took 3.0677e-05 s, 0 MB of memory in use
Info: Startup phase 11 took 0.00250443 s, 0 MB of memory in use
LDB: Central LB being created...
Info: Startup phase 12 took 0.00139309 s, 0 MB of memory in use
Info: CREATING 394 COMPUTE OBJECTS
CUDANBOND[0]: Allocating patch data structure with 12 patches!
CUDANBOND[1]: Allocating patch data structure with 12 patches!
Info: Found 564 unique exclusion lists needing 21824 bytes
Info: Found 564 unique exclusion lists needing 21824 bytes
Info: useSync: 0 useProxySync: 0
Info: Startup phase 13 took 0.0858623 s, 0 MB of memory in use
Info: Startup phase 14 took 7.2114e-05 s, 0 MB of memory in use
Info: Startup phase 15 took 3.5726e-05 s, 0 MB of memory in use
Info: Finished startup at 11.7413 s, 0 MB of memory in use

TCL: Running for 25000000 steps
FATAL ERROR: CUDA error cub::DeviceSelect::If(d_temp_storage, temp_storage_bytes, hgi, hgi, d_nHG, natoms, notZero(), stream) in file src/SequencerCUDAKernel.cu, function buildRattleLists, line 4461
 on Pe 0 (raman device 0 pci 0:3:0): invalid device function
FATAL ERROR: CUDA error cub::DeviceSelect::If(d_temp_storage, temp_storage_bytes, hgi, hgi, d_nHG, natoms, notZero(), stream) in file src/SequencerCUDAKernel.cu, function buildRattleLists, line 4461
 on Pe 0 (raman device 0 pci 0:3:0): invalid device function
[Partition 0][Node 0] End of program
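
When comparing runs, it can help to pull the failing call, source location, device and CUDA message out of console output like the above. This is a minimal sketch, assuming the two-line "FATAL ERROR ... / on Pe ..." layout seen in this one log; `parse_cuda_fatal_errors` is a hypothetical helper, not part of NAMD.

```python
import re

# Layout inferred from the single log in this post (an assumption):
#   FATAL ERROR: CUDA error <call> in file <file>, function <name>, line <n>
#    on Pe <pe> (<host> device <id> pci <addr>): <message>
FATAL_RE = re.compile(
    r"FATAL ERROR: CUDA error (?P<call>.+?) in file (?P<file>\S+), "
    r"function (?P<function>\w+), line (?P<line>\d+)\s*\n"
    r"\s*on Pe (?P<pe>\d+) \((?P<host>\S+) device (?P<device>\d+) "
    r"pci (?P<pci>[\w:.]+)\): (?P<message>.+)"
)


def parse_cuda_fatal_errors(log_text):
    """Return the distinct CUDA fatal errors found in NAMD console output."""
    errors = []
    for match in FATAL_RE.finditer(log_text):
        record = match.groupdict()
        # Convert numeric fields so they can be compared or sorted.
        for key in ("line", "pe", "device"):
            record[key] = int(record[key])
        record["message"] = record["message"].strip()
        # The same error is printed twice in this log; keep one copy.
        if record not in errors:
            errors.append(record)
    return errors


if __name__ == "__main__":
    import sys

    for err in parse_cuda_fatal_errors(sys.stdin.read()):
        print(f"{err['function']} ({err['file']}:{err['line']}) "
              f"device {err['device']}: {err['message']}")
```

For the log above this yields a single record for `buildRattleLists` with the message "invalid device function". That string corresponds to the CUDA runtime error `cudaErrorInvalidDeviceFunction`, which the CUDA documentation describes as a device function that does not exist or is not compiled for the proper device architecture.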
