From: Chris Harrison (char_at_ks.uiuc.edu)
Date: Tue Apr 28 2009 - 16:50:57 CDT
Grace,
When you say your cluster, I'm assuming this isn't an XT5. ;)
Can you provide some details on your cluster and clarify if you mean 128
procs on both clusters, irrespective of architecture?
Also, you have confirmed that using the lower # of procs you can exceed the
timestep at which the "128 proc" job dies, correct?
C.
--
Chris Harrison, Ph.D.
Theoretical and Computational Biophysics Group
NIH Resource for Macromolecular Modeling and Bioinformatics
Beckman Institute for Advanced Science and Technology
University of Illinois, 405 N. Mathews Ave., Urbana, IL 61801

char_at_ks.uiuc.edu            Voice: 217-244-1733
http://www.ks.uiuc.edu/~char   Fax: 217-244-6078

On Tue, Apr 28, 2009 at 2:16 PM, Grace Brannigan <grace_at_vitae.cmm.upenn.edu> wrote:

> Hi all,
>
> I have been simulating a protein in a truncated octahedral water box(~90k
> atoms) using NAMD2.7b1. On both our local cluster and Jim's kraken build,
> the job runs fine if I use up to 96 processors. With 128 the job crashes
> after an error message, which is not consistent and can either be "bad
> global exclusion count", atoms with nan velocities, or just a seg fault. I
> haven't had any problems like this with the other jobs I've been running
> using v2.7b1, which, admittedly, have more conventional geometries. My conf
> file is below - any ideas?
>
> -Grace
>
>
> **********************
>
> # FILENAMES
> set outName [file rootname [file tail [info script]]]
> #set inFleNum [expr [scan [string range $outName end-1 end] "%d"]
> - 1]
> #set inName [format "%s%02u" [string range $outName 0 end-2]
> $inFileNum]
> #set inName ionized
> set inName min01
> set homedir ../../..
> set sourcepath ../../solvate_and_ionize/riso
>
> timestep 2.0
>
> structure $sourcepath/ionized.psf
> parameters $homedir/toppar/par_all27_prot_lipid.prm
> parameters $homedir/toppar/par_isoflurane_RS.inp
> paraTypeCharmm on
>
> set temp 300.0
> #temperature $temp
> # RESTRAINTS
>
> constraints on
> consref $sourcepath/constraints.pdb
> conskfile $sourcepath/constraints.pdb
> conskcol O
>
> # INPUT
>
> coordinates $sourcepath/ionized.pdb
> extendedsystem $inName.xsc
> binvelocities $inName.vel
> bincoordinates $inName.coor
> #cellBasisVector1 108 0 0
> #cellBasisVector2 0 108 0
> #cellBasisVector3 54 54 54
>
> # OUTPUT
>
> outputenergies 500
> outputtiming 500
> outputpressure 500
> binaryoutput yes
> outputname [format "%so" $outName]
> restartname $outName
> restartfreq 500
> binaryrestart yes
>
> XSTFreq 500
> COMmotion no
>
> # DCD TRAJECTORY
>
> DCDfile $outName.dcd
> DCDfreq 5000
>
> # CUT-OFFs
>
> splitpatch hydrogen
> hgroupcutoff 2.8
> stepspercycle 20
> switching on
> switchdist 10.0
> cutoff 12.0
> pairlistdist 13.0
>
> #margin 1.0
>
> wrapWater no
>
> # CONSTANT-T
>
> langevin on
> langevinTemp $temp
> langevinDamping 0.1
>
> # CONSTANT-P
>
> useFlexibleCell no
> useConstantRatio no
> useGroupPressure yes
>
> langevinPiston on
> langevinPistonTarget 1
> langevinPistonPeriod 200
> langevinPistonDecay 100
> langevinPistonTemp $temp
>
> # PME
>
> PME yes
> PMETolerance 10e-6
> PMEInterpOrder 4
>
> PMEGridSizeX 120
> PMEGridSizeY 120
> PMEGridSizeZ 96
>
> # MULTIPLE TIME-STEP
>
> fullelectfrequency 2
> nonbondedfreq 1
>
> # SHAKE/RATTLE
>
> rigidBonds all
>
> # 1-4's
>
> exclude scaled1-4
> 1-4scaling 1.0
>
> constraintscaling 1.0
> run 250000
> constraintscaling 0.0
> 1250000
>