Re: Job submission error for NAMD-2.13 (version netlrts with cuda-10.0) using Torque job scheduler

From: Daipayan Sarkar (sdaipayan_at_gmail.com)
Date: Tue Nov 19 2019 - 15:19:10 CST

Thanks Victor! I am testing on a single node, but for production I will have
to use multiple nodes. Each node in the cluster has two GPUs. How do I
modify the command for the multi-node case?

Daipayan
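
For the multi-node case, one possible approach with a netlrts build is to
hand charmrun a nodelist file built from the hosts Torque assigned to the
job. The sketch below has not been run on this cluster and rests on a few
assumptions: make_nodelist and the file name namd2.nodelist are hypothetical
names chosen for illustration, the job requests two nodes, and one NAMD
process is started per GPU so that the device check passes.

```shell
#!/bin/bash
# Sketch: convert a Torque node file into a Charm++ nodelist for charmrun.
# make_nodelist is a hypothetical helper, not part of NAMD or Charm++.
make_nodelist() {
    local nodefile="$1" nodelist="$2"
    # A Charm++ nodelist starts with a group line followed by host lines.
    echo "group main" > "$nodelist"
    # Torque repeats each host once per requested core; keep one line per host.
    sort -u "$nodefile" | while read -r host; do
        [ -n "$host" ] && echo "host $host" >> "$nodelist"
    done
}

# Assumed usage inside submit.sh (2 nodes x 2 GPUs, so 4 processes in total):
#   cd "$PBS_O_WORKDIR"
#   make_nodelist "$PBS_NODEFILE" namd2.nodelist
#   $charm ++nodelist namd2.nodelist +p4 $namd +idlepoll equil.0.inp > equil.0.log
```

The job would then be requested with something like
"qsub -l nodes=2:ppn=20:gpus=2 submit.sh", with the counts adjusted to the
cluster. As far as I know, ++ppn is accepted only by SMP builds (for example
netlrts-smp), which would explain the "++ppn not recognized" error from a
plain netlrts build. An SMP build is probably the better choice for
production GPU runs, since one process per GPU can then drive many cores.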

On Tue, Nov 19, 2019 at 3:57 PM Victor Kwan <vkwan8_at_uwo.ca> wrote:

> Use the multicore CUDA build if you are running on one node. It is as
> simple as:
>
> namd2 +p20 +idlepoll config > log
>
> Victor
>
> On Tue, Nov 19, 2019 at 3:50 PM Daipayan Sarkar <sdaipayan_at_gmail.com>
> wrote:
>
>> Hello Users,
>>
>> I have compiled the netlrts version of NAMD-2.13 with cuda-10.0. The job
>> scheduler is Torque, and I have been having trouble submitting a job.
>> My NAMD job submission script is at the end of this message. I submit
>> it with the Torque job scheduler as follows:
>>
>> qsub -l nodes=1:ppn=20:gpus=1,naccesspolicy=singleuser submit.sh
>>
>> Using "$charm ++local +p20 $namd +idlepoll ++ppn 19 equil.0.inp >
>> equil.0.log" gives the error:
>> ---
>> ++ppn not recognized
>> ---
>>
>> Removing ++ppn gives the following error:
>> ---
>> Reason: FATAL ERROR: Number of devices (1) is not a multiple of number of
>> processes (20). Sharing devices between processes is inefficient. Specify
>> +ignoresharing (each process uses all visible devices) if not all devices
>> are visible to each process, otherwise adjust number of processes to evenly
>> divide number of devices, specify subset of devices with +devices argument
>> (e.g., +devices 0,2), or multiply list shared devices (e.g., +devices
>> 0,1,2,0).
>> ---
>>
>> Please advise on how to proceed.
>>
>> ==== submit.sh ======
>> #!/usr/local/bin/bash
>>
>> namd=$HOME/Software/NAMD_2.13_Source_netlrts_cuda/Linux-x86_64-g++/namd2
>>
>> charm=$HOME/Software/NAMD_2.13_Source_netlrts_cuda/Linux-x86_64-g++/charmrun
>>
>> #$charm ++local +p20 $namd +idlepoll ++ppn 19 equil.0.inp > equil.0.log
>> $charm ++local +p20 +idlepoll $namd equil.0.inp > equil.0.log
>> ===================
>>
>> Many thanks,
>> Daipayan
>>
>>
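
The device check quoted in the original post ("Number of devices (1) is not
a multiple of number of processes (20)") comes down to simple arithmetic.
The sketch below mirrors only the wording of that error message, not NAMD's
actual source, and check_gpu_share is a hypothetical helper name. It shows
why 20 processes with one visible GPU fail, while one process per GPU, or a
device list repeated to match the process count, passes.

```shell
#!/bin/bash
# Sketch: the rule stated in the error message, as plain arithmetic.
# check_gpu_share is a hypothetical helper, not a NAMD command.
check_gpu_share() {
    local devices="$1" procs="$2"
    # The message requires the device count to be a multiple of the process count.
    if [ "$procs" -gt 0 ] && [ $((devices % procs)) -eq 0 ]; then
        echo "ok"
    else
        echo "mismatch"
    fi
}

check_gpu_share 1 20   # the failing case in this thread: prints "mismatch"
check_gpu_share 2 2    # one process per GPU on a two-GPU node: prints "ok"
check_gpu_share 4 4    # four list entries, as in +devices 0,1,2,0: prints "ok"
```

With a non-SMP build this means the per-node process count is tied to the
number of listed devices, unless +ignoresharing is given as the message
suggests.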
