From: Ryan Woltz (rlwoltz_at_ucdavis.edu)
Date: Tue Jul 20 2021 - 18:14:31 CDT
I have a 300k-atom membrane-embedded channel system that is stable on
NAMD2.14, but I wanted to upgrade to take advantage of the increased
speed, so I downloaded NAMD3.0 alpha9. I'm also using the V100 GPUs on
EXPANSE, if that matters. I've hit several errors and fixed a few of them,
but I'm not sure how the fixes affect my system. I don't expect that they
did, but I'll describe them as I go along just in case. Almost all of the
errors relate to cudasoaintegrate.
My protocol is as follows: minimization and equilibration (step6.1-6.6),
gradual release of the protein restraints to prevent large RMSD jumps
(step7.1-7.13), and production (step7.14).
My first attempt was to use NAMD3 to continue a NAMD2.14 production run
that had already run for 30 ns. This failed immediately with the error:
OPENING EXTENDED SYSTEM TRAJECTORY FILE
FATAL ERROR: CUDA cuRAND error curandGenerateNormal(gen, gaussrand_x, n, 0,
1) in file src/SequencerCUDAKernel.cu, function langevinVelocitiesBBK2,
on Pe 0 (exp-12-57 device 0 pci 0:af:0): status value 202
From what I gathered from other posts, you cannot continue a NAMD2
simulation with NAMD3. More specifically, the post said that I cannot
continue a simulation that did not previously have cudasoaintegrate turned
on.
Running NAMD3 from the beginning:
1. I then tried starting from scratch, separating the minimization step
from the equilibration steps (6.1-6.6), but having cudasoaintegrate turned
on is not compatible with reassigntemp or reassignfreq.
2. I then turned cudasoaintegrate off for steps 6.1-6.6 and on for step
7.1 onward, since steps 7.1-7.13, in which the restraints on the protein CA
atoms are slowly released, do not require velocity reassignment. I used
the suggested options: margin 8, outputEnergies/outputTiming 400,
pairlistspercycle 4, and stepspercycle 40. The simulation fails quickly
with atoms moving too fast.
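The split described in item 2 (cudasoaintegrate off for the reassignment-based equilibration, on from step7.1 onward) can be sketched as a small, hypothetical Python helper that emits per-stage NAMD keyword lines. Nothing here is part of NAMD or CHARMM-GUI: the stage names simply follow the step numbering above, the shared values in the usage block are the suggested ones from item 2, and the only rule it enforces is the incompatibility reported in item 1 (cudasoaintegrate on together with reassignFreq/reassignTemp). NAMD keywords are case-insensitive, so the check lowercases them.

```python
"""Hypothetical helper: per-stage NAMD3 keyword lines for the step6.x /
step7.x protocol, toggling CUDASOAintegrate per stage (not a NAMD tool)."""

# Keywords reported in this post as incompatible with CUDASOAintegrate on.
INCOMPATIBLE_WITH_SOA = {"reassignfreq", "reassigntemp"}


def stages():
    """Stage names following the step numbering used above."""
    equil = [f"step6.{i}" for i in range(1, 7)]     # step6.1 .. step6.6
    release = [f"step7.{i}" for i in range(1, 14)]  # step7.1 .. step7.13
    return equil + release + ["step7.14"]           # step7.14 = production


def stage_settings(stage, shared):
    """Return a keyword -> value dict for one stage.

    step6.x keeps CUDASOAintegrate off so the existing reassignFreq /
    reassignTemp lines can stay in those inputs; every step7.x stage
    turns it on.
    """
    settings = dict(shared)
    settings["CUDASOAintegrate"] = "off" if stage.startswith("step6.") else "on"
    return settings


def validate(settings):
    """Raise ValueError if CUDASOAintegrate is on with a clashing keyword."""
    lowered = {k.lower(): str(v).lower() for k, v in settings.items()}
    if lowered.get("cudasoaintegrate") == "on":
        clash = sorted(INCOMPATIBLE_WITH_SOA.intersection(lowered))
        if clash:
            raise ValueError(f"CUDASOAintegrate on conflicts with: {clash}")


def render(settings):
    """Format settings as NAMD config lines ('keyword  value')."""
    return "\n".join(f"{key:<18} {value}" for key, value in settings.items())


if __name__ == "__main__":
    # Suggested values from item 2; adjust to your own inputs.
    shared = {"margin": 8, "outputEnergies": 400, "outputTiming": 400}
    for name in stages():
        cfg = stage_settings(name, shared)
        validate(cfg)
        print(f"# {name}")
        print(render(cfg))
```

Keeping the toggle and the compatibility check in one place makes it harder to accidentally leave reassignFreq in a stage that has cudasoaintegrate on.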
I then changed outputEnergies/outputTiming to 5000 (the CHARMM-GUI
default). However, step7.1 then fails after 105000 steps with the fatal
error "Periodic cell has become too small for original patch grid!
Possible solutions are to restart from a recent checkpoint, increase
margin, or disable useFlexibleCell for liquid simulation."
I tried margin values from 0 to 20. Once I no longer got the
atoms-escaping error, I got the "allocated memory exceeded, too many atoms
in a patch" error instead.
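A rough way to see why changing margin can trade one of these errors for the other is the simplified model below. It is an assumption, not NAMD's actual code: the patch grid is taken to be fixed at startup with each patch edge at least pairlistdist + margin, and the run is assumed to stop once the shrinking cell makes a patch narrower than pairlistdist. Under that model a larger margin tolerates more box shrinkage but yields fewer, larger patches that each hold more atoms, which is consistent with (though not proof of) the "too many atoms in a patch" error. It ignores NAMD's patch-splitting options, and the box lengths and pairlistdist are hypothetical since the post does not give them.

```python
"""Simplified, assumed model of a fixed patch grid (not NAMD source)."""
import math


def patches_per_axis(box_len, pairlistdist, margin):
    """Patches along one axis, assuming edge >= pairlistdist + margin."""
    return max(1, math.floor(box_len / (pairlistdist + margin)))


def min_box_len(box_len, pairlistdist, margin):
    """Smallest axis length the startup grid tolerates under this model."""
    return patches_per_axis(box_len, pairlistdist, margin) * pairlistdist


def atoms_per_patch(n_atoms, box, pairlistdist, margin):
    """Average atoms per patch for a rectangular box (x, y, z lengths)."""
    n_patches = 1
    for length in box:
        n_patches *= patches_per_axis(length, pairlistdist, margin)
    return n_atoms / n_patches


if __name__ == "__main__":
    box = (130.0, 130.0, 120.0)  # hypothetical box lengths in angstroms
    pld = 16.0                   # hypothetical pairlistdist in angstroms
    for margin in (0, 4, 8, 12, 20):
        shrink_ok = 1 - min_box_len(box[2], pld, margin) / box[2]
        print(f"margin {margin:>2}: z may shrink {shrink_ok:.1%}, "
              f"~{atoms_per_patch(300_000, box, pld, margin):.0f} atoms/patch")
```

Comparing the printed shrink tolerance against atoms per patch shows the trade-off the model predicts; the real limits in NAMD3 may differ.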
Finally, I also removed pairlistspercycle and stepspercycle, since the
NAMD3 website notes that they are obsolete. Now the error is:
ERROR: Atoms moving too fast at timestep 135902; simulation has become
unstable (0 atoms on pe 0).
FATAL ERROR: SequencerCUDA: Atoms moving too fast
I was able to collect bits and pieces from the forums, but most of it
concerns NAMD2, or concerns NAMD3 errors that are similar but not identical
to mine. Regarding the "too many atoms in a patch" memory error, I don't
think I'm running out of RAM, since I have 93 GB allocated and the system
runs fine on NAMD2.
I’ve gotten most of my information from NVIDIA’s website and the NAMD3
website, and adjusted my .inp files based on them.
I suspect that things fail after equilibration because I’m turning on
cudasoaintegrate after dynamics has already started. However, I don’t know
how to equilibrate with cudasoaintegrate on while using
reassignTemp/reassignFreq. I’ve worked for a year to get this system
stable, so I don’t want to experiment too much with options I’m unfamiliar
with. Again, the system is stable with NAMD2, and most of the failures I
get occur with the cudasoaintegrate option on. I’d be happy if any of these
steps or errors could be fixed, even if the only option is to run steps
6.1-7.13 without cudasoaintegrate and turn it on only for production. I’m
also wondering whether cudasoaintegrate has trouble with restraints on the
protein, as I’ve been told by others who used very early versions.
Any suggestions on how to fix these errors? Do I just need to keep
adjusting the margin, outputEnergies, outputTiming, pairlistspercycle, and
stepspercycle parameters until it works?