Dear Dr. Vermaas and Dr. Hafner,
Thank you for the feedback.
NAMD PACE cannot currently be compiled with CUDA; I confirmed this with the CHARMM-GUI team. Therefore, the NAMD startup does not print the CUDA messages mentioned in your previous email. My startup output is as follows:
Charm++: standalone mode (not using charmrun)
Converse/Charm++ Commit ID: v6.5.0-beta1-293-gd148fb7
CharmLB> Load balancer assumes all CPUs are same.
Charm++> Running on 1 unique compute nodes (40-way SMP).
Charm++> cpu topology info is gathered in 0.001 seconds.
Info: NAMD 2.9 for Linux-x86_64-multicore
Info: Please visit
Info: for updates, documentation, and support information.
Info: Please cite Phillips et al., J. Comp. Chem. 26:1781-1802 (2005)
Info: in all publications reporting results obtained with NAMD.
Info: Based on Charm++/Converse 60500 for multicore-linux64
Info: Built Tue May 25 19:00:30 KST 2021 by Prathit on master
Info: 1 NAMD 2.9 Linux-x86_64-multicore 1 gpu1 Prathit
Info: Running on 1 processors, 1 nodes, 1 physical nodes.
Info: CPU topology information available.
Info: Charm++/Converse parallel runtime startup completed at 0.00398183 s
Info: 34.4961 MB of memory in use based on /proc/self/stat
Info: Configuration file is step7_run.inp
Info: Working in the current directory /home2/Prathit/APP/PACE-CG/APP-Gamma_1000/charmm-gui-2444606374/namd
TCL: Suspending until startup complete.
Instead, I will have to check whether I can run the required simulations with multiple CPU processes.
Thanks a lot anyway. Kindly let me know if you have any further related information.
Sincere Regards,
Is the binary under /home2/Prathit/apps/NAMD_PACE_Source/Linux-x86_64-g++/namd2 compiled with CUDA support enabled or not? On a GPU build of NAMD, you should get output like this at the very beginning of NAMD startup:
Charm++> cpu topology info is gathered in 0.001 seconds.
Info: Built with CUDA version 10010
Did not find +devices i,j,k,... argument, using all
Pe 0 physical rank 0 binding to CUDA device 0 on PRL-VERMAAS-WS1: 'NVIDIA Quadro RTX 8000' Mem: 48567MB Rev: 7.5 PCI: 0:81:0
Info: NAMD 2.14 for Linux-x86_64-multicore-CUDA
Note the “Info:” lines. The first says that the NAMD build was compiled with CUDA 10.1. The second “Info” line says that this is a multicore (one node) build with CUDA support. What do those lines say for you when NAMD starts?
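As a quick way to answer that question, something like the following sketch could be run on the NAMD log. It only looks for the two banner lines quoted above; the function name namd_log_has_cuda and the log file name in the usage comment are made up for illustration, not part of NAMD.

```sh
#!/bin/sh
# Report whether a NAMD startup log comes from a CUDA-enabled build.
# Assumption: the log contains the startup banner, i.e. either
#   "Info: Built with CUDA version ..."  or
#   "Info: NAMD <version> for <platform>-CUDA"
namd_log_has_cuda() {
    log="$1"
    if grep -q '^Info: Built with CUDA version' "$log" ||
       grep -Eq '^Info: NAMD .*-CUDA' "$log"; then
        echo "CUDA build"
    else
        echo "CPU-only build"
    fi
}

# Usage (hypothetical log name):
#   namd_log_has_cuda step7_run.out
```

A log whose banner reads "Info: NAMD 2.9 for Linux-x86_64-multicore" with no "Built with CUDA" line would be reported as a CPU-only build.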
Dear Experts,
For your information, and to help you make suggestions, here are a few more details.
Apart from compiling NAMD with CUDA, I tried a few other things, as follows.
Here is part of my submission script:
module load compiler/gcc-7.5.0 cuda/11.2 mpi/openmpi-4.0.2-gcc-7
export PATH=/home2/Prathit/apps/NAMD_PACE_Source/Linux-x86_64-g++:${PATH}
/home2/Prathit/apps/NAMD_PACE_Source/Linux-x86_64-g++/namd2 +p${SLURM_NTASKS_PER_NODE} +idlepoll ${prod_step}_run.inp > ${outputname}.out
Nevertheless, the error remains.
The job is visible in my submitted jobs list as follows:
(base) [Prathit@master]~/APP/PACE-CG/APP-Gamma_1000/charmm-gui-2444606374/namd>sq
Thu Jul 1 20:11:40 2021
326924 gpu PCCG1000 Prathit RUNNING 2:03:10 3-00:00:00 1 gpu1
326891 g3090 2000-APP Prathit RUNNING 5:54:45 3-00:00:00 1 gpu6
326890 g3090 1500-APP Prathit RUNNING 5:57:55 3-00:00:00 1 gpu6
It also shows as running in the "top" output after logging in to the GPU node:
36566 exay 20 0 18.9g 2.3g 307624 S 120.9 1.5 6368:00 python
49595 junsu 20 0 27.2g 3.4g 323256 R 100.7 2.1 2162:55 python
49633 junsu 20 0 19.9g 3.5g 323120 R 100.3 2.2 2010:20 python
65081 Prathit 20 0 3514556 1.7g 5600 R 100.3 1.1 127:02.83 namd2
49453 junsu 20 0 27.2g 2.3g 323252 R 100.0 1.5 1908:17 python
49502 junsu 20 0 30.9g 2.2g 323248 R 100.0 1.4 2008:01 python
Yet the job does not appear in the process list reported by the nvidia-smi command:
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
| 0 N/A N/A 36566 C python 9351MiB |
| 1 N/A N/A 49453 C ...nvs/conda_lgbm/bin/python 2413MiB |
| 2 N/A N/A 49502 C ...nvs/conda_lgbm/bin/python 2135MiB |
| 4 N/A N/A 49595 C ...nvs/conda_lgbm/bin/python 2939MiB |
| 5 N/A N/A 49633 C ...nvs/conda_lgbm/bin/python 2541MiB |
As a consequence, the job runs very slowly.
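One way to make this check without reading the table by eye is sketched below. It assumes that nvidia-smi supports the --query-compute-apps query and that each line of its CSV output starts with the PID; the function name pid_on_gpu is hypothetical.

```sh
#!/bin/sh
# Succeed (exit 0) if the given PID is listed among the GPU compute processes.
# $1:    PID to look for
# stdin: output of
#   nvidia-smi --query-compute-apps=pid,process_name --format=csv,noheader
pid_on_gpu() {
    awk -F, -v pid="$1" '
        { gsub(/[[:space:]]/, "", $1)      # strip padding around the PID field
          if ($1 == pid) found = 1 }
        END { exit !found }                # 0 if found, 1 otherwise
    '
}

# Usage:
#   nvidia-smi --query-compute-apps=pid,process_name --format=csv,noheader \
#       | pid_on_gpu 65081 && echo "namd2 is using a GPU"
```

A non-zero exit status for the namd2 PID (65081 in the "top" output above) means the process holds no GPU context, which is what a CPU-only build would show.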
Any further suggestions on how to run a job with the compiled NAMD_PACE under a proper queueing system would be very helpful.
I regret any inconvenience caused,
Maybe SLURM wants NAMD to be located somewhere else, i.e. not in your home folder? Ask your IT department; they will probably want to install it themselves.
> I just understood that you have a special version there.
> You probably need to (re-)compile your adapted NAMD PACE Source with CUDA support first.
>> Hi
>> Did you actually use a GPU version of NAMD?
>> You should see this in the logfile.
>> If you rely on single-node GPU runs, the precompiled CUDA binaries should be sufficient.
>> And do add `+p${SLURM_NTASKS_PER_NODE} +idlepoll` to the namd exec line below for faster execution.
>> Kind regards
>>> Dear Experts,
>>> This is regarding GPU job submission in a SLURM environment, using NAMD compiled specifically for the PACE CG force field and inputs generated by CHARMM-GUI.
>>> Kindly see my submit script below:
>>> #!/bin/csh
>>> #
>>> #SBATCH -J PCCG2000
>>> #SBATCH -N 1
>>> #SBATCH -n 1
>>> #SBATCH -p g3090 # Using a 3090 node
>>> #SBATCH --gres=gpu:1 # Number of GPUs (per node)
>>> #SBATCH -o output.log
>>> #SBATCH -e output.err
>>> # Generated by CHARMM-GUI v3.5
>>> #
>>> # The following shell script assumes your NAMD executable is namd2 and that
>>> # the NAMD inputs are located in the current directory.
>>> #
>>> # Only one processor is used below. To parallelize NAMD, use this scheme:
>>> # charmrun namd2 +p4 input_file.inp > output_file.out
>>> # where the "4" in "+p4" is replaced with the actual number of processors you
>>> # intend to use.
>>> module load compiler/gcc-7.5.0 cuda/11.2 mpi/openmpi-4.0.2-gcc-7
>>> set equi_prefix = step6.%d_equilibration
>>> set prod_prefix = step7.1_production
>>> set prod_step = step7
>>> # Running equilibration steps
>>> set cnt = 1
>>> set cntmax = 6
>>> while ( ${cnt} <= ${cntmax} )
>>> set step = `printf ${equi_prefix} ${cnt}`
>>> ## /home2/Prathit/apps/NAMD_PACE_Source/Linux-x86_64-g++/charmrun /home2/Prathit/apps/NAMD_PACE_Source/Linux-x86_64-g++/namd2 ${step}.inp > ${step}.out
>>> /home2/Prathit/apps/NAMD_PACE_Source/Linux-x86_64-g++/namd2 ${step}.inp > ${step}.out
>>> @ cnt += 1
>>> end
>>> ================
>>> Although the jobs are submitted, they do not appear to enter the GPU queue: their PIDs are not listed by the "nvidia-smi" command, but they do show up in "top" on the GPU node.
>>> Any suggestions for resolving this discrepancy would be very helpful.
>>> Thank you and Regards,
>>> Prathit
