From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Thu Apr 17 2014 - 04:50:10 CDT
It seems you have some NAMD zombie processes still hanging on the GPUs, and
they are producing the "launch failure". Try "pkill -9 namd2" to remove them,
or reboot. Also, minimization on GPUs is less stable than on CPUs. So if you
have a bad initial structure, you might want to use CPUs for the minimization
and switch over to GPUs afterwards to prevent the "constraint failure". If
minimizing on CPUs produces the same error, check your structure; there must
be something wrong (close contacts, superimposed atoms, etc.).
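
One way to look for close contacts or superimposed atoms is to scan the PDB
for atom pairs that sit closer together than any bonded pair should. The
following is only a minimal sketch in Python (3.8 or newer), not a NAMD or VMD
feature. The function names and the 0.8 Angstrom threshold are assumptions
chosen for illustration, and the plain pairwise loop ignores periodic images
and becomes slow for large systems.

```python
import math
from itertools import combinations


def parse_pdb_atoms(pdb_text):
    """Return (serial, name, x, y, z) tuples from ATOM/HETATM records.

    Uses the fixed PDB columns; the serial is kept as a string because
    large systems often overflow the five-digit serial field.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            serial = line[6:11].strip()
            name = line[12:16].strip()
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            atoms.append((serial, name, x, y, z))
    return atoms


def close_contacts(atoms, min_dist=0.8):
    """List (serial_a, serial_b, distance) for pairs closer than min_dist.

    min_dist is in Angstrom; 0.8 is an assumed threshold, below typical
    covalent bond lengths, so hits suggest overlapping atoms.
    """
    hits = []
    for a, b in combinations(atoms, 2):
        d = math.dist(a[2:], b[2:])
        if d < min_dist:
            hits.append((a[0], b[0], round(d, 3)))
    return hits


# Example use (assumes ionized.pdb is in the working directory):
# with open("ionized.pdb") as fh:
#     for hit in close_contacts(parse_pdb_atoms(fh.read())):
#         print(hit)
```

Any pair it reports is worth inspecting in a viewer before minimizing again.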
 
Norman Geist.
 
From: Abhishek TYAGI [mailto:atyagiaa_at_connect.ust.hk] 
Sent: Thursday, April 17, 2014 11:24
To: Norman Geist
Cc: Namd Mailing List
Subject: RE: namd-l: CUDA error in cuda_check_local_progress
 
Hi,
 
I also tried running with +p1, +p2, +p3, and +p4 separately. With +p1 it runs
for a few minutes, then the same output appears. But when I run nvidia-smi, I
see that namd is still running. Finally, the output from the log file is as
follows:
 
**
WRITING EXTENDED SYSTEM TO RESTART FILE AT STEP 10000
WRITING COORDINATES TO DCD FILE AT STEP 10000
WRITING COORDINATES TO RESTART FILE AT STEP 10000
FINISHED WRITING RESTART COORDINATES
WRITING VELOCITIES TO RESTART FILE AT STEP 10000
FINISHED WRITING RESTART VELOCITIES
REINITIALIZING VELOCITIES AT STEP 10000 TO 288 KELVIN.
TCL: Running for 5000 steps
ERROR: Constraint failure in RATTLE algorithm for atom 4170!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Constraint failure in RATTLE algorithm for atom 4236!
ERROR: Constraint failure; simulation has become unstable.
ERROR: Exiting prematurely; see error messages above.
====================================================
WallClock: 50792.585938  CPUTime: 50638.324219  Memory: 1220.495789 MB
Program finished.
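
The atom numbers named in the RATTLE errors above are a natural starting
point for inspecting the structure. As a rough sketch (the function name is
made up for illustration, and the regular expression is based only on the
message format shown in this log), they can be collected from a log file like
this in Python:

```python
import re

# Pattern taken from the error lines in the log above.
RATTLE_RE = re.compile(r"Constraint failure in RATTLE algorithm for atom (\d+)!")


def rattle_failure_atoms(log_text):
    """Return the atom numbers from RATTLE errors, in order, without repeats."""
    seen = []
    for match in RATTLE_RE.finditer(log_text):
        atom = int(match.group(1))
        if atom not in seen:
            seen.append(atom)
    return seen


# Example use (assumes the log was written to eq1.log):
# with open("eq1.log") as fh:
#     print(rattle_failure_atoms(fh.read()))
```

How these numbers map to PDB serial numbers or to indices in a viewer is not
stated in this thread, so check the numbering convention before selecting
atoms.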
 
 
 
 
Can you suggest some other way to do it?
 
regards
Abhi
  _____  
From: Norman Geist <norman.geist_at_uni-greifswald.de>
Sent: Thursday, April 17, 2014 4:21 PM
To: Abhishek TYAGI
Cc: Namd Mailing List
Subject: AW: namd-l: CUDA error in cuda_check_local_progress 
 
What GPUs are those? This error occurs, for example, if your cutoff,
pairlistdist, etc. are too large to fit in the GPU's memory. What's the
output of "nvidia-smi -q"? Maybe there are multiple GPUs where one is only
for display and therefore doesn't have enough memory. Try setting +devices to
select the GPU IDs manually and see if it works with one GPU separately.
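
To illustrate picking GPU IDs by memory, here is a small Python sketch that
reads the CSV query output of nvidia-smi and builds a +devices argument. The
function names and the memory threshold are assumptions for illustration, and
the parser expects every GPU to report a numeric total memory, which some
cards may not.

```python
import subprocess

# nvidia-smi CSV query: one line per GPU, memory in MiB without units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_table(csv_text):
    """Parse 'index, name, memory' lines into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, rest = line.split(",", 1)
        name, mem_mib = rest.rsplit(",", 1)
        gpus.append({
            "index": int(index),
            "name": name.strip(),
            "memory_mib": int(mem_mib),
        })
    return gpus


def devices_flag(gpus, min_memory_mib):
    """Build a '+devices i,j' string from GPUs with enough memory.

    Returns an empty string if no GPU meets the (assumed) threshold.
    """
    ids = [str(g["index"]) for g in gpus if g["memory_mib"] >= min_memory_mib]
    return "+devices " + ",".join(ids) if ids else ""


def query_gpus():
    """Run nvidia-smi on this machine and parse its output."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_table(result.stdout)


# Example use on a GPU node:
# print(devices_flag(query_gpus(), min_memory_mib=2000))
```

The resulting string can be added to the namd2 command line; to test one GPU
at a time, pass a single ID such as "+devices 0".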
 
 
Norman Geist.
 
From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf
Of Abhishek TYAGI
Sent: Thursday, April 17, 2014 09:41
To: namd-l_at_ks.uiuc.edu
Subject: namd-l: CUDA error in cuda_check_local_progress
 
Hi,
 
I am running a simulation of a graphene and DNA system. On my CPU there is no
error. On the GPU cluster (NVIDIA, CUDA) I am using the NAMD build available
on the website (NAMD_2.9_Linux-x86_64-multicore-CUDA.tar.gz), and the
following error appears every time. I tried changing the timestep, the
frequencies, and other things too, but I really don't understand what to do
in this case. 
 
I run this command for minimization, but it fails every time:
% charmrun namd2 +idlepoll +p4 eq1.namd > eq1.log &
 
 
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error in cuda_check_local_progress on Pe 0 (gpu10
device 0): unspecified launch failure
Charm++ fatal error:
FATAL ERROR: CUDA error in cuda_check_local_progress on Pe 0 (gpu10 device
0): unspecified launch failure
 
 
 
The eq1.namd conf file is as follows:
#############################################################
## JOB DESCRIPTION                                         ##
#############################################################
# Minimization and Equilibration of 
# COMMENT ON YOUR SYSTEM HERE 
#############################################################
## ADJUSTABLE PARAMETERS                                   ##
#############################################################
structure          ionized.psf 
coordinates        ionized.pdb
set temperature    298
set outputname     eq1
firsttimestep      0
#############################################################
## SIMULATION PARAMETERS                                   ##
#############################################################
# Input
paraTypeCharmm         on
parameters          par_all27_na.prm
parameters          par_graphene.prm
temperature         $temperature
# Force-Field Parameters
exclude             scaled1-4
1-4scaling          1.0
cutoff              12.
switching           on
switchdist          10.
pairlistdist        13.5
# Integrator Parameters
timestep            0.5 
rigidBonds          all  
nonbondedFreq       2
fullElectFrequency  4  
stepspercycle       10
# Constant Temperature Control
langevin            off    
langevinDamping     5     
langevinTemp        $temperature
langevinHydrogen    off    
# Output
outputName          $outputname
restartfreq         500     ;# 500 steps = every 0.25 ps at timestep 0.5 fs
dcdfreq             300
outputEnergies      100
outputPressure      100
#############################################################
## PBC PARAMETERS                                        ##
#############################################################
# Periodic Boundary Conditions
cellBasisVector1     40.0   0.0    0.0
cellBasisVector2     0.0    40.0   0.0
cellBasisVector3     0.0    0.0    30.0
cellOrigin           0.0    0.0    0.0
#############################################################
## EXECUTION SCRIPT                                        ##
#############################################################
# Minimization
minimize            100000
reinitvels          $temperature
run 50000 
 
Please suggest how I can resolve this issue.
Thanks in advance
 
Abhishek
 
This archive was generated by hypermail 2.1.6 : Wed Dec 31 2014 - 23:22:21 CST