Re: NAMD jobs in SLURM environment, not entering queueing system

From: Prathit Chatterjee (pc20apr_at_yahoo.co.in)
Date: Mon Jun 28 2021 - 04:32:05 CDT

Hello,

Thank you for your suggestions. I shall implement them and get back to you again if further help is needed.

Thanks again for the help,
Sincere Regards,
Prathit

On Monday, 28 June, 2021, 06:29:05 pm GMT+9, René Hafner TUK <hamburge_at_physik.uni-kl.de> wrote:

I just realized that you have a special version there.

You probably need to (re-)compile your adapted NAMD PACE Source with CUDA support first.
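
For reference, a stock NAMD source tree is usually built with CUDA support along the lines below; the charm++ architecture and CUDA prefix are only examples and may need adjusting for the adapted PACE source, assuming it follows the standard NAMD build scheme:

    # example only -- adjust charm++ tarball name, architecture and CUDA prefix to your system
    cd /home2/Prathit/apps/NAMD_PACE_Source
    tar xf charm-*.tar && cd charm-*/
    ./build charm++ multicore-linux-x86_64 --with-production
    cd ..
    ./config Linux-x86_64-g++ --charm-arch multicore-linux-x86_64 \
        --with-cuda --cuda-prefix /usr/local/cuda
    cd Linux-x86_64-g++
    make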

On 6/28/2021 11:03 AM, René Hafner TUK wrote:

Hi   

    Did you actually use a GPU version of NAMD?

    You should see this in the logfile.
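
    A CUDA-enabled build announces the GPU devices it binds to near the top of the log, so a quick check like the one below should turn up matching lines (the step name is just an example taken from your script):

        grep -i cuda step6.1_equilibration.out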

    If you rely on single-node GPU runs, the precompiled CUDA binaries should be sufficient.

    And do add `+p${SLURM_NTASKS_PER_NODE} +idlepoll` to the namd exec line below for faster execution.
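
    For the exec line in your loop that would look roughly like this (the binary path and ${step} variable are taken from your script; ${SLURM_NTASKS_PER_NODE} assumes you also request more than one task per node, e.g. via --ntasks-per-node, in the batch header):

        /home2/Prathit/apps/NAMD_PACE_Source/Linux-x86_64-g++/namd2 +p${SLURM_NTASKS_PER_NODE} +idlepoll ${step}.inp > ${step}.out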

Kind regards

René

On 6/28/2021 10:54 AM, Prathit Chatterjee wrote:


Dear Experts,

This is regarding GPU job submission in a SLURM environment with NAMD compiled specifically for the PACE CG force field, with inputs generated by CHARMM-GUI.

Kindly see my submit script below:

#!/bin/csh
#
#SBATCH -J PCCG2000
#SBATCH -N 1
#SBATCH -n 1
#SBATCH -p g3090        # Using a 3090 node
#SBATCH --gres=gpu:1    # Number of GPUs (per node)
#SBATCH -o output.log
#SBATCH -e output.err

# Generated by CHARMM-GUI (http://www.charmm-gui.org) v3.5
#
# The following shell script assumes your NAMD executable is namd2 and that
# the NAMD inputs are located in the current directory.
#
# Only one processor is used below. To parallelize NAMD, use this scheme:
#     charmrun namd2 +p4 input_file.inp > output_file.out
# where the "4" in "+p4" is replaced with the actual number of processors you
# intend to use.

module load compiler/gcc-7.5.0 cuda/11.2 mpi/openmpi-4.0.2-gcc-7

echo "SLURM_NODELIST $SLURM_NODELIST"
echo "NUMBER OF CORES $SLURM_NTASKS"

set equi_prefix = step6.%d_equilibration
set prod_prefix = step7.1_production
set prod_step   = step7

# Running equilibration steps
set cnt    = 1
set cntmax = 6

while ( ${cnt} <= ${cntmax} )
    set step = `printf ${equi_prefix} ${cnt}`
##    /home2/Prathit/apps/NAMD_PACE_Source/Linux-x86_64-g++/charmrun /home2/Prathit/apps/NAMD_PACE_Source/Linux-x86_64-g++/namd2 ${step}.inp > ${step}.out
    /home2/Prathit/apps/NAMD_PACE_Source/Linux-x86_64-g++/namd2 ${step}.inp > ${step}.out
    @ cnt += 1
end

================

While the jobs are being submitted, they are not entering the queueing system: the job PIDs are invisible to "nvidia-smi" but do show up with "top" inside the GPU node.

Any suggestions for rectifying this discrepancy would be greatly appreciated.

Thank you and Regards,

Prathit

--
Dipl.-Phys. René Hafner
TU Kaiserslautern
Germany
