Re: NAMD jobs in SLURM environment, not entering queueing system

From: Natalia Ostrowska (n.ostrowska_at_cent.uw.edu.pl)
Date: Mon Jun 28 2021 - 04:37:25 CDT

Maybe SLURM wants NAMD to be located somewhere else? I mean not in your home
folder. Ask your IT department; they will probably want to install it
themselves.

Natalia Ostrowska
University of Warsaw, Poland
Centre of New Technologies
Biomolecular Machines Laboratory

Mon, 28 Jun 2021 at 11:28, René Hafner TUK <hamburge_at_physik.uni-kl.de>
wrote:

> I just realized that you have a special version there.
>
> You probably need to (re-)compile your adapted NAMD PACE Source with CUDA
> support first.
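>
> In case a rebuild is needed, a minimal sketch of the usual NAMD 2.x CUDA
> build procedure is below. The archive names, the bundled Charm++ version,
> and the CUDA prefix are assumptions here and will likely differ for the
> adapted PACE source tree; the essential part is configuring with --with-cuda.
>
>     # Build the bundled Charm++ first (multicore is enough for single-node runs)
>     tar xf NAMD_PACE_Source.tar.gz && cd NAMD_PACE_Source
>     tar xf charm-6.10.2.tar && cd charm-6.10.2
>     ./build charm++ multicore-linux-x86_64 --with-production
>     cd ..
>
>     # Configure and compile NAMD itself with CUDA enabled
>     ./config Linux-x86_64-g++ --charm-arch multicore-linux-x86_64 \
>         --with-cuda --cuda-prefix /usr/local/cuda
>     cd Linux-x86_64-g++ && make
>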
> On 6/28/2021 11:03 AM, René Hafner TUK wrote:
>
> Hi
>
> Did you actually use a GPU version of NAMD?
>
> You should see this in the logfile.
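>
> Two quick ways to check (a sketch only; the path and log filename are taken
> from the submit script further below, so adjust as needed):
>
>     # Does the namd2 binary link against CUDA libraries at all?
>     ldd /home2/Prathit/apps/NAMD_PACE_Source/Linux-x86_64-g++/namd2 | grep -iE 'cuda|cufft'
>
>     # A CUDA-enabled build typically reports the GPU devices it binds to
>     # near the top of its log; a CPU-only build does not mention CUDA.
>     grep -i cuda step6.1_equilibration.out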
>
> If you rely on single-node GPU runs, the precompiled CUDA binaries
> should be sufficient.
>
> And do add `+p${SLURM_NTASKS_PER_NODE} +idlepoll` to the namd exec
> line below for faster execution.
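>
> For example, the equilibration line in the loop below would then read
> something like:
>
>     /home2/Prathit/apps/NAMD_PACE_Source/Linux-x86_64-g++/namd2 \
>         +p${SLURM_NTASKS_PER_NODE} +idlepoll ${step}.inp > ${step}.out
>
> (Note that SLURM only sets SLURM_NTASKS_PER_NODE when --ntasks-per-node is
> requested in the #SBATCH header; with the plain "-n 1" used below it may be
> unset, in which case a fixed "+p1" or "+p${SLURM_NTASKS}" is a safer choice.)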
>
> Kind regards
>
> René
> On 6/28/2021 10:54 AM, Prathit Chatterjee wrote:
>
> Dear Experts,
>
> This is regarding GPU job submission with NAMD, compiled specifically for
> the PACE CG force field with CHARMM-GUI, in a SLURM environment.
>
> Kindly see my submit script below:
>
> #!/bin/csh
> #
> #SBATCH -J PCCG2000
> #SBATCH -N 1
> #SBATCH -n 1
> #SBATCH -p g3090       # Using a 3090 node
> #SBATCH --gres=gpu:1   # Number of GPUs (per node)
> #SBATCH -o output.log
> #SBATCH -e output.err
>
> # Generated by CHARMM-GUI (http://www.charmm-gui.org) v3.5
> #
> # The following shell script assumes your NAMD executable is namd2 and that
> # the NAMD inputs are located in the current directory.
> #
> # Only one processor is used below. To parallelize NAMD, use this scheme:
> #   charmrun namd2 +p4 input_file.inp > output_file.out
> # where the "4" in "+p4" is replaced with the actual number of processors you
> # intend to use.
>
> module load compiler/gcc-7.5.0 cuda/11.2 mpi/openmpi-4.0.2-gcc-7
>
> echo "SLURM_NODELIST $SLURM_NODELIST"
> echo "NUMBER OF CORES $SLURM_NTASKS"
>
> set equi_prefix = step6.%d_equilibration
> set prod_prefix = step7.1_production
> set prod_step = step7
>
> # Running equilibration steps
> set cnt = 1
> set cntmax = 6
>
> while ( ${cnt} <= ${cntmax} )
>     set step = `printf ${equi_prefix} ${cnt}`
>     ## /home2/Prathit/apps/NAMD_PACE_Source/Linux-x86_64-g++/charmrun /home2/Prathit/apps/NAMD_PACE_Source/Linux-x86_64-g++/namd2 ${step}.inp > ${step}.out
>     /home2/Prathit/apps/NAMD_PACE_Source/Linux-x86_64-g++/namd2 ${step}.inp > ${step}.out
>     @ cnt += 1
> end
>
> ================
>
> While the jobs are getting submitted, they are not entering the queueing
> system: the PIDs of the jobs are not visible with the "*nvidia-smi*" command,
> but they do show up with the "*top*" command inside the GPU node.
>
> Any suggestions for rectifying the current discrepancy would be greatly
> appreciated.
>
> Thank you and Regards,
> Prathit
>
>
> --
> --
> Dipl.-Phys. René Hafner
> TU Kaiserslautern
> Germany
>
>
>
