Re: Job submission error for NAMD-2.13 ( version netlrts with cuda-10.0) using Torque job scheduler

From: Victor Kwan (vkwan8_at_uwo.ca)
Date: Tue Nov 19 2019 - 14:57:56 CST

Use the multicore CUDA build if you are running on a single node. It is as simple as:

namd2 +p20 +idlepoll config > log
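
As a rough sketch, this is how that command might look inside a Torque script. The install path, the NAMD and DRY_RUN variables, and the run_namd helper are placeholders made up for illustration; they are not part of NAMD or Torque. The resource request and file names are copied from the question below.

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=20:gpus=1

# Hypothetical path: point this at wherever the multicore-CUDA namd2 binary lives.
NAMD=${NAMD:-$HOME/Software/NAMD_2.13_Linux-x86_64-multicore-CUDA/namd2}

# Run namd2 as a single process with N worker threads (no charmrun needed).
# If DRY_RUN is set, the command line is printed instead of executed.
run_namd() {
    local ncores=$1 config=$2
    ${DRY_RUN:+echo} "$NAMD" "+p$ncores" +idlepoll "$config"
}

# Torque sets PBS_O_WORKDIR to the directory qsub was called from,
# so the run below only starts when the script is executed as a job.
if [ -n "${PBS_O_WORKDIR:-}" ]; then
    cd "$PBS_O_WORKDIR" || exit 1
    run_namd 20 equil.0.inp > equil.0.log
fi
```

Because the multicore build starts one process with 20 threads, the single GPU is not split across 20 separate processes.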

Victor

On Tue, Nov 19, 2019 at 3:50 PM Daipayan Sarkar <sdaipayan_at_gmail.com> wrote:
Hello Users,

I have compiled the netlrts version of NAMD-2.13 with cuda-10.0. The job scheduler is Torque, and I have been having trouble submitting a job. My NAMD job submission script (submit.sh) is shown at the end of this message. I submit it through Torque with:

qsub -l nodes=1:ppn=20:gpus=1,naccesspolicy=singleuser submit.sh

Using "$charm ++local +p20 $namd +idlepoll ++ppn 19 equil.0.inp > equil.0.log" gives the error:
---
++ppn not recognized
---

Removing ++ppn gives the following error:
---
Reason: FATAL ERROR: Number of devices (1) is not a multiple of number of processes (20). Sharing devices between processes is inefficient. Specify +ignoresharing (each process uses all visible devices) if not all devices are visible to each process, otherwise adjust number of processes to evenly divide number of devices, specify subset of devices with +devices argument (e.g., +devices 0,2), or multiply list shared devices (e.g., +devices 0,1,2,0).
---

Please advise on how to proceed.

==== submit.sh ======
#!/usr/local/bin/bash

namd=$HOME/Software/NAMD_2.13_Source_netlrts_cuda/Linux-x86_64-g++/namd2
charm=$HOME/Software/NAMD_2.13_Source_netlrts_cuda/Linux-x86_64-g++/charmrun

#$charm ++local +p20 $namd +idlepoll ++ppn 19 equil.0.inp > equil.0.log
$charm ++local +p20 +idlepoll $namd equil.0.inp > equil.0.log
===================

Many thanks,
Daipayan
