From: Gumbart, JC (gumbart_at_physics.gatech.edu)
Date: Tue Jul 27 2021 - 10:52:32 CDT
You absolutely can continue a NAMD2 run with NAMD3 and the CUDA integrator. I've done it with membrane systems without issue. Your first error looks unusual to me. Are the configuration files identical except for turning on the CUDA integrator (and removing margin and stepspercycle)?
You could also try switching from NAMD2 to 3 at a different time, just in case you had something pathological about your system.
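One way to answer the "are the files identical" question is to compare the two configuration files keyword by keyword. The following is only a rough sketch: `read_namd_config` and `diff_configs` are hypothetical helper names, and the parser assumes plain `keyword value` (or `keyword = value`) lines. It does not evaluate Tcl, so variables, `source`d files, and conditionals are compared as literal text.

```python
def read_namd_config(text):
    """Collect {keyword: [values]} from simple 'keyword value' lines.

    Simplification (assumption): the first token on a line is treated as
    the keyword. NAMD keywords are case-insensitive, so keys are lowercased.
    """
    settings = {}
    for raw in text.splitlines():
        # Drop comments and a trailing ';' left over from ';# comment'.
        line = raw.split("#", 1)[0].strip().rstrip(";").strip()
        if not line:
            continue
        parts = line.split(None, 1)
        key = parts[0].lower()
        value = parts[1] if len(parts) > 1 else ""
        if value.startswith("="):  # tolerate 'keyword = value'
            value = value[1:].strip()
        # Keep every occurrence so repeated keywords are not lost.
        settings.setdefault(key, []).append(value)
    return settings


def diff_configs(old_text, new_text):
    """Return {keyword: (old_values, new_values)} for keywords that differ."""
    old, new = read_namd_config(old_text), read_namd_config(new_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }


# Made-up example configs, not the poster's actual files.
namd2_cfg = "margin 8\nstepspercycle 40\nlangevin on\nlangevinTemp 303.15\n"
namd3_cfg = "CUDASOAintegrate on\nlangevin on\nlangevinTemp 303.15\n"

for key, (before, after) in diff_configs(namd2_cfg, namd3_cfg).items():
    print(f"{key}: NAMD2={before} NAMD3={after}")
```

If anything other than the integrator switch, margin, and stepspercycle shows up in the output, that difference is the first place to look.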
Best,
JC
________________________________
From: owner-namd-l_at_ks.uiuc.edu <owner-namd-l_at_ks.uiuc.edu> on behalf of Ryan Woltz <rlwoltz_at_ucdavis.edu>
Sent: Tuesday, July 20, 2021 7:06 PM
To: namd-l_at_ks.uiuc.edu <namd-l_at_ks.uiuc.edu>
Subject: namd-l: difficulty converting namd2 channel system to namd3 Atom velocity too fast, box too small errors.
Dear community,
I have a 300k-atom, membrane-embedded channel system that is stable on NAMD2.14, and I wanted to upgrade to take advantage of the increased speed. I downloaded NAMD3.0 alpha9. I'm also using the V100 GPUs on EXPANSE, if that matters. I've hit several errors and fixed a few of them, but I'm not sure whether the fixes affect my system. I don't expect they did, but I'll state them as I go along just in case. Almost all of the errors relate to cudasoaintegrate.
My system is set up as follows: minimize, equilibrate the system (step6.1-6.6), slowly release the restraints on the protein to prevent large RMSD jumps (step7.1-7.13), then production (step7.14).
A. My first attempt was to use NAMD3 to continue a NAMD2.14 run that had been in production for 30 ns. This failed immediately with the error:
OPENING EXTENDED SYSTEM TRAJECTORY FILE
FATAL ERROR: CUDA cuRAND error curandGenerateNormal(gen, gaussrand_x, n, 0, 1) in file src/SequencerCUDAKernel.cu, function langevinVelocitiesBBK2, line 4263
on Pe 0 (exp-12-57 device 0 pci 0:af:0): status value 202
From what I gathered from posts, you cannot continue a NAMD2 simulation with NAMD3. More specifically, the post said I cannot continue a simulation that did not previously have cudasoaintegrate turned on.
B. I then tried starting from scratch by separating the minimization step from the equilibration steps (6.1-6.6), but having cudasoaintegrate turned on is not compatible with reassignTemp or reassignFreq.
C. I then turned off cudasoaintegrate for steps 6.1-6.6 and turned it on for steps 7.1 and beyond, since the steps in which the restraints on the protein CA atoms are slowly released (7.1-7.13) do not require temperature reassignment. I used suggested options such as 1) margin 8, 2) outputEnergies/outputTiming = 400, 3) pairlistsPerCycle = 4, and 4) stepspercycle = 40. The simulation fails quickly with atoms moving too fast.
D. I then readjusted outputEnergies/outputTiming to 5000 (the CHARMM-GUI default). However, step7.1 fails after 105000 steps with the fatal error: Periodic cell has become too small for original patch grid! Possible solutions are to restart from a recent checkpoint, increase margin, or disable useFlexibleCell for liquid simulation.
E. I tried margin values from 0 to 20. Once I no longer got the atoms-escaping error, I instead got the "allocated memory exceeded, too many atoms in a patch" error.
F. Finally, I also took out pairlistsPerCycle and stepspercycle, since the NAMD3 website notes that these are obsolete. Now the error is:
ERROR: Atoms moving too fast at timestep 135902; simulation has become unstable (0 atoms on pe 0).
FATAL ERROR: SequencerCUDA: Atoms moving too fast
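For the "Periodic cell has become too small" error above, it may help to see how much the cell actually changed before the crash. The sketch below reads an extended system trajectory (XST) and reports the relative change in each cell vector length. It assumes the usual column order (step, then the a, b, and c vectors as three components each), which should be checked against the `#$LABELS` header of the actual file. `cell_lengths` and `relative_change` are hypothetical helper names, and the sample data is made up.

```python
import math


def cell_lengths(xst_text):
    """Return [(step, |a|, |b|, |c|), ...] from XST-formatted text.

    Assumed column order: step a_x a_y a_z b_x b_y b_z c_x c_y c_z ...
    Header lines starting with '#' are skipped.
    """
    rows = []
    for line in xst_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        step = int(fields[0])
        comps = [float(x) for x in fields[1:10]]
        lengths = tuple(
            math.sqrt(sum(c * c for c in comps[i:i + 3])) for i in (0, 3, 6)
        )
        rows.append((step,) + lengths)
    return rows


def relative_change(rows):
    """Fractional change of |a|, |b|, |c| between the first and last frame."""
    first, last = rows[0], rows[-1]
    return tuple((l - f) / f for f, l in zip(first[1:], last[1:]))


# Made-up two-frame example: the membrane plane shrinks while z grows.
sample = """# NAMD extended system trajectory file
#$LABELS step a_x a_y a_z b_x b_y b_z c_x c_y c_z o_x o_y o_z
0 100 0 0 0 100 0 0 0 120 0 0 0
5000 95 0 0 0 95 0 0 0 126 0 0 0
"""
for name, change in zip("abc", relative_change(cell_lengths(sample))):
    print(f"|{name}| changed by {change:+.1%}")
```

A steady drift in one direction points to the cell genuinely changing shape, while a sudden jump shortly before the failure is more consistent with the run becoming unstable first.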
I was able to collect bits and pieces from the forums, but most posts either concern NAMD2 or concern NAMD3 with errors that are similar but not the same. Regarding error E), I don’t think I’m running out of RAM, since I have 93 GB allocated and the system runs fine on NAMD2.
I’ve gotten most of my information from NVIDIA’s website and the NAMD3 website, and I adjusted my .inp files based on these sites.
https://www.ks.uiuc.edu/Research/namd/alpha/3.0alpha/
https://developer.nvidia.com/blog/delivering-up-to-9x-throughput-with-namd-v3-and-a100-gpu/
I have a suspicion that things are failing after equilibration because I’m turning on cudasoaintegrate after dynamics has started. However, I don’t know how to equilibrate with cudasoaintegrate on together with reassignTemp/reassignFreq. I’ve worked for a year to get this system stable, so I don’t want to play around too much with options I’m unfamiliar with. Again, the system is stable with NAMD2, and most of the errors I get are failures that appear when the cudasoaintegrate option is on. If any of these steps or errors could be fixed, I’d be happy, even if the only option were to run steps 6.1-7.13 without cudasoaintegrate and turn it on for production. I’m also wondering whether cudasoaintegrate doesn’t like restraints on the protein, as I’ve been told by others who used very early versions.
Any suggestions on how to fix any of these errors? Do I just need to keep playing with the margin/outputEnergies/outputTiming/pairlistsPerCycle/stepspercycle parameters until it works?
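A small pre-flight check can catch the keyword conflicts described in this thread before a job is submitted. The sketch below is hedged accordingly: the `FLAGGED` table only restates what is reported in this thread (reassignTemp/reassignFreq clashing with cudasoaintegrate, and margin, stepspercycle, and pairlistsPerCycle being candidates for removal). It is not an authoritative list from the NAMD documentation, and `flag_keywords` is a hypothetical helper name.

```python
# Keywords this thread associates with trouble when cudasoaintegrate is on.
# Assumption: the list reflects only the reports in this thread.
FLAGGED = {
    "reassignfreq": "reported in this thread as incompatible",
    "reassigntemp": "reported in this thread as incompatible",
    "margin": "suggested for removal in this thread",
    "stepspercycle": "suggested for removal in this thread",
    "pairlistspercycle": "described as obsolete in this thread",
}


def flag_keywords(config_text):
    """Return warnings for flagged keywords if cudasoaintegrate is 'on'.

    Simplified parsing: first token per line is the keyword, '#' starts a
    comment, and an optional '=' between keyword and value is ignored.
    """
    present = {}
    for raw in config_text.splitlines():
        tokens = raw.split("#", 1)[0].replace("=", " ").split()
        if tokens:
            present[tokens[0].lower()] = tokens[1:]
    integrator = present.get("cudasoaintegrate", [])
    if not integrator or integrator[0].lower() != "on":
        return []  # nothing to warn about without the GPU integrator
    return [f"{key}: {why}" for key, why in FLAGGED.items() if key in present]


# Made-up equilibration config for illustration.
example = """
CUDASOAintegrate on
reassignFreq 500
reassignTemp 303.15
margin 8
"""
for warning in flag_keywords(example):
    print("WARNING", warning)
```

Running it over each of the step6.x and step7.x input files would show which stages can switch the integrator on without further changes, under the assumptions above.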
Thank you,
Ryan
This archive was generated by hypermail 2.1.6 : Fri Dec 31 2021 - 23:17:11 CST