From: Abhishek TYAGI (atyagiaa_at_connect.ust.hk)
Date: Thu Apr 17 2014 - 04:07:36 CDT
Hi,
The nvidia-smi output is as follows:
[atyagiaa_at_gpu10 bin]$ nvidia-smi
Thu Apr 17 17:06:26 2014
+------------------------------------------------------+
| NVIDIA-SMI 5.319.37   Driver Version: 319.37         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M2090         On   | 0000:09:00.0     Off |                  Off |
| N/A   N/A    P0    78W /  N/A |       85MB /  6143MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M2090         On   | 0000:0A:00.0     Off |                  Off |
| N/A   N/A    P0    79W /  N/A |       85MB /  6143MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M2090         On   | 0000:0D:00.0     Off |                  Off |
| N/A   N/A    P0   122W /  N/A |      255MB /  6143MB |     93%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M2090         On   | 0000:0E:00.0     Off |                  Off |
| N/A   N/A    P0    79W /  N/A |       85MB /  6143MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0     35889  /home/atyagiaa/lib/vmd/vmd_LINUXAMD64                240MB  |
|    1     35889  /home/atyagiaa/lib/vmd/vmd_LINUXAMD64                240MB  |
|    2     35862  namd2                                                168MB  |
|    2     35889  /home/atyagiaa/lib/vmd/vmd_LINUXAMD64                240MB  |
|    3     35889  /home/atyagiaa/lib/vmd/vmd_LINUXAMD64                240MB  |
+-----------------------------------------------------------------------------+
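The process table above is easier to compare against the NAMD error when grouped by GPU. A minimal sketch, assuming the rows are pasted in exactly as shown and that process names contain no spaces (the regular expression and the function name `processes_by_gpu` are illustrative, not part of nvidia-smi or NAMD):

```python
import re
from collections import defaultdict

# Compute-process rows copied verbatim from the nvidia-smi output above.
PROCESS_TABLE = """\
|    0     35889  /home/atyagiaa/lib/vmd/vmd_LINUXAMD64                240MB  |
|    1     35889  /home/atyagiaa/lib/vmd/vmd_LINUXAMD64                240MB  |
|    2     35862  namd2                                                168MB  |
|    2     35889  /home/atyagiaa/lib/vmd/vmd_LINUXAMD64                240MB  |
|    3     35889  /home/atyagiaa/lib/vmd/vmd_LINUXAMD64                240MB  |
"""

# Columns: GPU index, PID, process name (assumed to have no spaces), memory in MB.
ROW = re.compile(r"^\|\s+(\d+)\s+(\d+)\s+(\S+)\s+(\d+)MB\s+\|$")


def processes_by_gpu(table):
    """Group (pid, name, memory_mb) tuples by GPU index."""
    by_gpu = defaultdict(list)
    for line in table.splitlines():
        match = ROW.match(line.strip())
        if match:
            gpu, pid, name, mem = match.groups()
            by_gpu[int(gpu)].append((int(pid), name, int(mem)))
    return dict(by_gpu)


usage = processes_by_gpu(PROCESS_TABLE)
for gpu in sorted(usage):
    names = ", ".join(name.rsplit("/", 1)[-1] for _, name, _ in usage[gpu])
    print(f"GPU {gpu}: {names}")
```

Grouped this way, the table shows the same VMD process (PID 35889) holding memory on all four GPUs, while namd2 appears only on GPU 2.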
________________________________
From: Norman Geist <norman.geist_at_uni-greifswald.de>
Sent: Thursday, April 17, 2014 4:21 PM
To: Abhishek TYAGI
Cc: Namd Mailing List
Subject: AW: namd-l: CUDA error in cuda_check_local_progress
Which GPUs are these? This error occurs, for example, if your cutoff, pairlistdist, etc. are too large to fit in the GPU's memory. What is the output of "nvidia-smi -q"? Maybe there are multiple GPUs, one of which is only for display and therefore doesn't have enough memory. Try setting +devices to select the GPU IDs manually, and see whether it works with a single GPU on its own.
Norman Geist.
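The suggestion to test one GPU at a time could be scripted roughly as follows. This is only a sketch: it builds one command line per device using the `+devices`, `+idlepoll` and `+p` options mentioned in this thread, assumes four GPUs (as listed by nvidia-smi) and the config name eq1.namd from the original post, and the helper name `per_device_commands` is hypothetical.

```python
def per_device_commands(num_gpus, config="eq1.namd"):
    """Return one argument list per GPU id, each pinning namd2 to a single device."""
    commands = []
    for dev in range(num_gpus):
        # +p1 runs a single worker thread so that only the selected GPU is used.
        commands.append(["namd2", "+idlepoll", "+p1", "+devices", str(dev), config])
    return commands


# Print the commands instead of running them; namd2 must be on PATH to execute them.
for cmd in per_device_commands(4):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run` with the output redirected to a separate log per device, so that a failure can be tied to a specific GPU id.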
From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On behalf of Abhishek TYAGI
Sent: Thursday, 17 April 2014 09:41
To: namd-l_at_ks.uiuc.edu
Subject: namd-l: CUDA error in cuda_check_local_progress
Hi,
I am running a simulation of a graphene and DNA system. When running on my CPU there is no error, but on a GPU cluster (NVIDIA, CUDA), using the NAMD build available on the website (NAMD_2.9_Linux-x86_64-multicore-CUDA.tar.gz), the following error appears every time. I tried changing the timestep, the frequencies, and other settings, but I really don't understand what to do in this case.
I run the following command for minimization, but it fails every time:
% charmrun namd2 +idlepoll +p4 eq1.namd > eq1.log &
------------- Processor 0 Exiting: Called CmiAbort ------------
Reason: FATAL ERROR: CUDA error in cuda_check_local_progress on Pe 0 (gpu10 device 0): unspecified launch failure
Charm++ fatal error:
FATAL ERROR: CUDA error in cuda_check_local_progress on Pe 0 (gpu10 device 0): unspecified launch failure
The eq1.namd conf file is as follows:
#############################################################
## JOB DESCRIPTION                                         ##
#############################################################
# Minimization and Equilibration of
# COMMENT ON YOUR SYSTEM HERE
#############################################################
## ADJUSTABLE PARAMETERS                                   ##
#############################################################
structure          ionized.psf
coordinates        ionized.pdb
set temperature    298
set outputname     eq1
firsttimestep      0
#############################################################
## SIMULATION PARAMETERS                                   ##
#############################################################
# Input
paraTypeCharmm         on
parameters          par_all27_na.prm
parameters          par_graphene.prm
temperature         $temperature
# Force-Field Parameters
exclude             scaled1-4
1-4scaling          1.0
cutoff              12.
switching           on
switchdist          10.
pairlistdist        13.5
# Integrator Parameters
timestep            0.5
rigidBonds          all
nonbondedFreq       2
fullElectFrequency  4
stepspercycle       10
# Constant Temperature Control
langevin            off
langevinDamping     5
langevinTemp        $temperature
langevinHydrogen    off
# Output
outputName          $outputname
restartfreq         500     ;# 500 steps = 0.25 ps at timestep 0.5 fs
dcdfreq             300
outputEnergies      100
outputPressure      100
#############################################################
## PBC PARAMETERS                                          ##
#############################################################
# Periodic Boundary Conditions
cellBasisVector1     40.0   0.0    0.0
cellBasisVector2     0.0    40.0   0.0
cellBasisVector3     0.0    0.0    30.0
cellOrigin           0.0    0.0    0.0
#############################################################
## EXECUTION SCRIPT                                        ##
#############################################################
# Minimization
minimize            100000
reinitvels          $temperature
run 50000
Please suggest how I can resolve this issue.
Thanks in advance
Abhishek
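As a quick cross-check of the values posted in eq1.namd, the step counts can be converted to simulated time and the distance parameters compared with each other. A minimal sketch, assuming the timestep is in femtoseconds (NAMD's unit) and that switchdist, cutoff and pairlistdist are expected in non-decreasing order; the helper name `steps_to_ps` is illustrative:

```python
TIMESTEP_FS = 0.5  # "timestep 0.5" in the posted config, in femtoseconds


def steps_to_ps(steps, timestep_fs=TIMESTEP_FS):
    """Convert a number of MD steps to simulated time in picoseconds."""
    return steps * timestep_fs / 1000.0


# Step counts taken from the posted config.
intervals = {"restartfreq": 500, "dcdfreq": 300, "outputEnergies": 100, "run": 50000}
for name, steps in intervals.items():
    print(f"{name:15s} {steps:6d} steps = {steps_to_ps(steps):g} ps")

# Distance parameters from the posted config, in Angstrom.
switchdist, cutoff, pairlistdist = 10.0, 12.0, 13.5
distances_ordered = switchdist <= cutoff <= pairlistdist
print("switchdist <= cutoff <= pairlistdist:", distances_ordered)
```

With these numbers, the 50000-step run covers 25 ps, and the three distances are in the expected order, so the check does not by itself point to an oversized cutoff.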
This archive was generated by hypermail 2.1.6 : Wed Dec 31 2014 - 23:22:21 CST