Re: Error while simulating on NAMD

From: Hrishikesh Dhondge (hbdhondge_at_gmail.com)
Date: Tue Mar 22 2022 - 01:40:27 CDT

Hi,

There's a misspelling in the line where you ask for GPUs.

#SBATHC --gres=gpu:2

Replace it with

#SBATCH --gres=gpu:2
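
Slurm treats any line that does not begin with exactly "#SBATCH" as an ordinary comment, so a typo like the one above is ignored silently and the job starts without GPUs. A rough way to catch this before submitting is sketched below. The function name check_sbatch_typos is made up for illustration and is not part of Slurm, and the check only catches misspellings that still begin with "#SB".

```shell
#!/bin/bash
# Sketch only: check_sbatch_typos is a hypothetical helper, not a Slurm tool.
# It prints (with line numbers) every line of a batch script that starts
# with "#SB" but is not a well-formed "#SBATCH" directive.
check_sbatch_typos() {
    # First grep: candidate directive lines, prefixed with "N:" by -n.
    # Second grep: drop the lines that are spelled correctly.
    grep -n '^#SB' "$1" | grep -v -E '^[0-9]+:#SBATCH([[:space:]]|$)'
}
```

Running check_sbatch_typos on the submission script before sbatch should list the misspelled line, and print nothing when every directive is spelled correctly.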

If the error persists, try increasing the number of CPUs:

#SBATCH --cpus-per-task=20
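
As René points out in the quoted message, an empty SLURM_JOB_GPUS means no GPU is visible to the job. A small guard near the top of the batch script can make that failure obvious before NAMD starts. This is only a sketch: require_gpu is a hypothetical helper name, and it assumes your cluster sets SLURM_JOB_GPUS for batch jobs, as the attached output suggests.

```shell
#!/bin/bash
# Sketch only: require_gpu is a hypothetical helper, not part of Slurm or NAMD.
# It fails early when Slurm has not allocated any GPU to the job.
require_gpu() {
    if [ -z "${SLURM_JOB_GPUS:-}" ]; then
        # Empty or unset: the --gres request was missing or ignored.
        echo "No GPU allocated to this job; check the #SBATCH --gres line" >&2
        return 1
    fi
    echo "GPUs allocated to job: ${SLURM_JOB_GPUS}"
}
```

Calling "require_gpu || exit 1" before the NAMD command would stop the job right away with a clearer message than the CUDA error.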

On Mon, Mar 21, 2022 at 11:31 PM René Hafner TUK <hamburge_at_physik.uni-kl.de>
wrote:

> Hi Anirvinya,
>
>
> Your Slurm error and NAMD output file tell you everything: "no
> CUDA-capable device is detected".
>
> Your SLURM_JOB_GPUS environment variable is also empty, so no GPU is
> visible to the job.
>
>
> How do you specifically request a GPU for the job?
>
>
> Kind regards
>
> René
>
>
> On 3/21/2022 2:53 PM, Anirvinya Gururajan wrote:
>
> Hey Josh!
>
> Thanks for the reply. I do ask for GPUs in my batch script (PFA). The error
> dumped by Slurm and the NAMD stdout output are attached to the previous
> message.
>
> Regards,
> Anirvinya G
> CCNSB, IIITH
> ------------------------------
> *From:* Josh Vermaas <vermaasj_at_msu.edu>
> *Sent:* 21 March 2022 18:52
> *To:* namd-l_at_ks.uiuc.edu <namd-l_at_ks.uiuc.edu>;
> Anirvinya Gururajan <anirvinya.gururajan_at_research.iiit.ac.in>
> *Subject:* Re: namd-l: Error while simulating on NAMD
>
>
> Hi Anirvinya,
>
>
> In your slurm script, are you asking for any GPUs on the nodes? It looks
> like you are using a GPU-accelerated executable, which requires a GPU to be
> present in order to run. With slurm, the typical way to ask for GPUs to be
> allocated to the job is something like '#SBATCH --gres=gpu:1'. Do you have
> a line like that in your submission script?
>
>
> -Josh
>
>
> On 3/20/22 5:16 PM, Anirvinya Gururajan wrote:
>
>
> Hi all,
>
> Recently, I have been having trouble with a very specific system that I am
> trying to simulate using NAMD/2.13. PFA the slurm output file generated.
> When I simulate it in an interactive job on the cluster node, it
> seems to work fine. But if it is submitted as a batch job on the same
> cluster node, it throws the following error. I am not sure what
> the source of the problem is. The system is large and has about 800k atoms
> (I don't know how relevant that is to this issue).
>
> Regards,
> Anirvinya G
> CCNSB, IIITH
>
>
> --
> Josh Vermaas
> Assistant Professor
> MSU-DOE Plant Research Laboratory, Department of Biochemistry and Molecular Biology
> Michigan State University
> https://vermaaslab.github.io/
>
> --
> Dipl.-Phys. René Hafner
> TU Kaiserslautern
> Germany
>
>

-- 
With regards
Hrishikesh Dhondge
PhD student,
LORIA - INRIA Nancy
