Fwd: CUDA error in cuda_check_remote_progress on Pe 2

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Sat Apr 06 2013 - 04:49:21 CDT

Solved by thinking it through more carefully: I ran 2,000,000 steps at
ts=0.01fs with both useFlexibleCell and useConstantArea set to "no" (this
metalloprotein is surrounded by water; no membrane is involved).
Equilibration is now running at the final desired ts=1fs.
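
For reference, the two stages described above might look roughly like the
following NAMD configuration fragment. This is only a sketch: timestep,
useFlexibleCell, useConstantArea and run are standard NAMD keywords, but
splitting the stages into two separate config files is an assumption, and
all other settings (force field, pressure control, output, restart input)
are omitted.

```tcl
# Stage 1 (sketch): very short timestep to relax the system under CUDA
timestep           0.01    ;# fs
useFlexibleCell    no      ;# water box only, no membrane
useConstantArea    no
run                2000000

# Stage 2 (assumed: a separate config file restarted from stage 1 output)
# timestep         1.0     ;# fs, the final desired value
```

The cell settings only matter when constant-pressure control is enabled.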
fp

---------- Forwarded message ----------
From: Francesco Pietra <chiendarret_at_gmail.com>
Date: Fri, Apr 5, 2013 at 5:04 PM
Subject: Fwd: namd-l: CUDA error in cuda_check_remote_progress on Pe 2
To: NAMD <namd-l_at_ks.uiuc.edu>

Sorry, I also forgot to show how the system works:

Info: Charm++/Converse parallel runtime startup completed at 0.00402617 s
Pe 4 physical rank 4 binding to CUDA device 1 on gig64: 'GeForce GTX 680'
Mem: 2047MB Rev: 3.0
Pe 1 physical rank 1 will use CUDA device of pe 2
Pe 2 physical rank 2 binding to CUDA device 0 on gig64: 'GeForce GTX 680'
Mem: 2047MB Rev: 3.0
Pe 5 physical rank 5 will use CUDA device of pe 4
Pe 3 physical rank 3 will use CUDA device of pe 4
Did not find +devices i,j,k,... argument, using all
Pe 0 physical rank 0 will use CUDA device of pe 2

---------- Forwarded message ----------
From: Francesco Pietra <chiendarret_at_gmail.com>
Date: Fri, Apr 5, 2013 at 4:58 PM
Subject: Fwd: namd-l: CUDA error in cuda_check_remote_progress on Pe 2
To: NAMD <namd-l_at_ks.uiuc.edu>

Forgot to add that my namd-cuda works fine with other metalloproteins,
same metal center, with ts=1fs (I never used any ts longer than that)
fp

---------- Forwarded message ----------
From: Francesco Pietra <chiendarret_at_gmail.com>
Date: Fri, Apr 5, 2013 at 4:55 PM
Subject: namd-l: CUDA error in cuda_check_remote_progress on Pe 2
To: NAMD <namd-l_at_ks.uiuc.edu>

Hello:

Any guess about the following error when running namd-2.9_cuda4.0 (two
GTX-680) MD on a sensitive metalloprotein?

WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 107200
WRITING COORDINATES TO RESTART FILE AT STEP 107200
FINISHED WRITING RESTART COORDINATES
WRITING VELOCITIES TO RESTART FILE AT STEP 107200
FINISHED WRITING RESTART VELOCITIES
WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 107300
WRITING COORDINATES TO RESTART FILE AT STEP 107300
FINISHED WRITING RESTART COORDINATES
WRITING VELOCITIES TO RESTART FILE AT STEP 107300
FINISHED WRITING RESTART VELOCITIES
FATAL ERROR: CUDA error in cuda_check_remote_progress on Pe 2 (gig64 device
0): unspecified launch failure
------------- Processor 2 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error in cuda_check_remote_progress on Pe 2
(gig64 device 0): unspecified launch failure

Charm++ fatal error:
FATAL ERROR: CUDA error in cuda_check_remote_progress on Pe 2 (gig64 device
0): unspecified launch failure

Used ts=0.01fs. In contrast, with non-CUDA namd-2.9 there are no problems
even with ts=1fs (in all cases all bonds, except water, are free to
vibrate). Enlarging the margin did not help. Even for minimization I had to
use the non-CUDA code. With CPU only, my hardware is too limited.

I understand that the error output is rather cryptic. Just a try.

thanks

francesco pietra
