Re: NAMD jobs in SLURM environment, not entering queueing system

From: René Hafner TUK (hamburge_at_physik.uni-kl.de)
Date: Mon Jun 28 2021 - 04:27:06 CDT

I just realized that you have a special version there.

You probably need to (re-)compile your adapted NAMD PACE Source with
CUDA support first.
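
Roughly, a CUDA rebuild of a NAMD-derived source tree looks like the sketch below (the Charm++ arch, the CUDA prefix and the exact directory layout are assumptions and may differ for the PACE source):

    cd /home2/Prathit/apps/NAMD_PACE_Source
    # Charm++ ships inside the NAMD source; multicore is enough for single-node runs
    cd charm-*
    ./build charm++ multicore-linux-x86_64 --with-production
    cd ..
    # move the old CPU-only build aside, then configure a CUDA build and compile
    mv Linux-x86_64-g++ Linux-x86_64-g++.cpu
    ./config Linux-x86_64-g++ --charm-arch multicore-linux-x86_64 \
        --with-cuda --cuda-prefix /usr/local/cuda
    cd Linux-x86_64-g++
    make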

On 6/28/2021 11:03 AM, René Hafner TUK wrote:
>
> Hi
>
>     Did you actually use a GPU version of NAMD?
>
>     You should see this in the logfile.
>
>     If you only run single-node GPU jobs, the precompiled CUDA binaries
> should be sufficient.
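>
>     A quick way to check (a sketch; the binary path is the one from your
> script, and the log name just follows your equi_prefix pattern):
>
>     grep -i cuda step6.1_equilibration.out
>     ldd /home2/Prathit/apps/NAMD_PACE_Source/Linux-x86_64-g++/namd2 | grep -iE "cuda|cufft"
>
>     If neither prints anything CUDA-related, the binary is almost certainly
> a CPU-only build (though a statically linked CUDA build would not show up
> in ldd).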
>
>     And do add `+p${SLURM_NTASKS_PER_NODE} +idlepoll` to the namd exec
> line below for faster execution.
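>
>     For example, the exec line in the loop might then read (a sketch;
> +p${SLURM_NTASKS_PER_NODE} requires --ntasks-per-node in the SBATCH
> header, otherwise hard-code the core count):
>
>     /home2/Prathit/apps/NAMD_PACE_Source/Linux-x86_64-g++/namd2 +p${SLURM_NTASKS_PER_NODE} +idlepoll ${step}.inp > ${step}.out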
>
> Kind regards
>
> René
>
> On 6/28/2021 10:54 AM, Prathit Chatterjee wrote:
>> Dear Experts,
>>
>> This is regarding GPU job submission in a SLURM environment with NAMD,
>> compiled specifically for the PACE CG force field as set up with CHARMM-GUI.
>>
>> Kindly see my submit script below:
>>
>> #!/bin/csh
>> #
>> #SBATCH -J PCCG2000
>> #SBATCH -N 1
>> #SBATCH -n 1
>> #SBATCH -p g3090       # Using a 3090 node
>> #SBATCH --gres=gpu:1   # Number of GPUs (per node)
>> #SBATCH -o output.log
>> #SBATCH -e output.err
>>
>> # Generated by CHARMM-GUI (http://www.charmm-gui.org) v3.5
>> #
>> # The following shell script assumes your NAMD executable is namd2 and that
>> # the NAMD inputs are located in the current directory.
>> #
>> # Only one processor is used below. To parallelize NAMD, use this scheme:
>> # charmrun namd2 +p4 input_file.inp > output_file.out
>> # where the "4" in "+p4" is replaced with the actual number of processors you
>> # intend to use.
>>
>> module load compiler/gcc-7.5.0 cuda/11.2 mpi/openmpi-4.0.2-gcc-7
>>
>> echo "SLURM_NODELIST $SLURM_NODELIST"
>> echo "NUMBER OF CORES $SLURM_NTASKS"
>>
>> set equi_prefix = step6.%d_equilibration
>> set prod_prefix = step7.1_production
>> set prod_step = step7
>>
>> # Running equilibration steps
>> set cnt = 1
>> set cntmax = 6
>>
>> while ( ${cnt} <= ${cntmax} )
>>     set step = `printf ${equi_prefix} ${cnt}`
>>     ## /home2/Prathit/apps/NAMD_PACE_Source/Linux-x86_64-g++/charmrun /home2/Prathit/apps/NAMD_PACE_Source/Linux-x86_64-g++/namd2 ${step}.inp > ${step}.out
>>     /home2/Prathit/apps/NAMD_PACE_Source/Linux-x86_64-g++/namd2 ${step}.inp > ${step}.out
>>     @ cnt += 1
>> end
>>
>>
>> ================
>>
>> While the jobs are getting submitted, they are not entering the queueing
>> system: the job PIDs are invisible to "nvidia-smi", but they do show up
>> with the "top" command inside the GPU node.
>>
>> Any suggestions for rectifying this discrepancy would be greatly
>> appreciated.
>>
>> Thank you and Regards,
>> Prathit
>>
>>
> --
> --
> Dipl.-Phys. René Hafner
> TU Kaiserslautern
> Germany

-- 
--
Dipl.-Phys. René Hafner
TU Kaiserslautern
Germany

This archive was generated by hypermail 2.1.6 : Fri Dec 31 2021 - 23:17:11 CST