Re: Error while simulating on NAMD

From: Anirvinya Gururajan (anirvinya.gururajan_at_research.iiit.ac.in)
Date: Tue Mar 22 2022 - 02:27:05 CDT

Hi René and Hrishikesh,

Thanks a lot for your replies. test.sh was a copy of the script I was testing with; the typo must have crept in when I uploaded it here. The GPUs on the cluster node had fallen off the PCI bus, so the job could not be allocated any GPUs. I had seen a different error right before this one, which had led me to rule out a hardware mishap. It has been fixed and works now!

Regards,
Anirvinya G
CCNSB, IIITH
________________________________
From: Hrishikesh Dhondge <hbdhondge_at_gmail.com>
Sent: 22 March 2022 12:10
To: namd-l_at_ks.uiuc.edu <namd-l_at_ks.uiuc.edu>; Anirvinya Gururajan <anirvinya.gururajan_at_research.iiit.ac.in>
Cc: René Hafner TUK <hamburge_at_physik.uni-kl.de>
Subject: Re: namd-l: Error while simulating on NAMD

Hi,

There's a misspelling in the line where you ask for GPUs.

#SBATHC --gres=gpu:2

Replace it with

#SBATCH --gres=gpu:2

If the error persists, try increasing the number of CPUs:

#SBATCH --cpus-per-task=20
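
(For what it's worth, sbatch only parses lines that begin exactly with "#SBATCH"; the misspelled "#SBATHC" line is treated as an ordinary shell comment, so the GPU request never reaches the scheduler. Once it is fixed, you can confirm that the scheduler actually recorded the request with something like the line below, run from a login shell; <jobid> is a placeholder for the submitted job's ID.)

scontrol show job <jobid> | grep -i -E 'gres|tres|numcpus'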

On Mon, Mar 21, 2022 at 11:31 PM René Hafner TUK <hamburge_at_physik.uni-kl.de> wrote:

Hi Anirvinya,

your slurm error and NAMD output file tell you everything: "no CUDA-capable device is detected"

and your SLURM_JOB_GPUS environment variable is empty, hence no GPU is visible to the job.

How do you specifically request a GPU for the job?
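
A quick way to check this from inside the job itself is to print the GPU-related variables and query the driver right before the namd2 line; this is just a sketch to drop into the batch script, not taken from your script:

echo "SLURM_JOB_GPUS = ${SLURM_JOB_GPUS}"
echo "CUDA_VISIBLE_DEVICES = ${CUDA_VISIBLE_DEVICES}"
nvidia-smi    # lists the GPUs visible to this job; fails if none were allocated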

Kind regards

René

On 3/21/2022 2:53 PM, Anirvinya Gururajan wrote:
Hey Josh!

Thanks for the reply. I do ask for GPUs in my batch script (PFA). The error dumped by slurm and the NAMD stdout output are attached in the previous message.

Regards,
Anirvinya G
CCNSB, IIITH
________________________________
From: Josh Vermaas <vermaasj_at_msu.edu>
Sent: 21 March 2022 18:52
To: namd-l_at_ks.uiuc.edu <namd-l_at_ks.uiuc.edu>; Anirvinya Gururajan <anirvinya.gururajan_at_research.iiit.ac.in>
Subject: Re: namd-l: Error while simulating on NAMD

Hi Anirvinya,

In your slurm script, are you asking for any GPUs on the nodes? It looks like you are using a GPU-accelerated executable, which requires a GPU to be present in order to run. With slurm, the typical way to ask for GPUs to be allocated to the job is something like '#SBATCH --gres=gpu:1'. Do you have a line like that in your submission script?
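
If it helps, a minimal sketch of a GPU submission script for a multicore-CUDA NAMD build looks roughly like this (the partition, module, and file names are placeholders you would need to adapt to your cluster):

#!/bin/bash
#SBATCH --job-name=namd-gpu          # placeholder job name
#SBATCH --partition=gpu              # placeholder partition name
#SBATCH --nodes=1
#SBATCH --cpus-per-task=16           # CPU threads for NAMD
#SBATCH --gres=gpu:1                 # request one GPU for the job
#SBATCH --time=24:00:00

module load namd/2.13-cuda           # placeholder module name

# the CUDA build uses all GPUs visible to the job by default
namd2 +p${SLURM_CPUS_PER_TASK} +idlepoll config.namd > output.log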

-Josh

On 3/20/22 5:16 PM, Anirvinya Gururajan wrote:

Hi all,

Recently, I have been facing trouble with one specific system that I am trying to simulate using NAMD/2.13. PFA the slurm output file that was generated. When I run it as an interactive job on the cluster node, it works fine, but when it is submitted as a batch job on the same node, it throws the following error. I am not sure where the problem originates. The system is large, about 800k atoms (I don't know whether that is relevant to this issue).

Regards,
Anirvinya G
CCNSB, IIITH

--
Josh Vermaas
Assistant Professor
MSU-DOE Plant Research Laboratory, Department of Biochemistry and Molecular Biology
Michigan State University
https://vermaaslab.github.io/
--
Dipl.-Phys. René Hafner
TU Kaiserslautern
Germany
--
With regards
Hrishikesh Dhondge
PhD student,
LORIA - INRIA Nancy

This archive was generated by hypermail 2.1.6 : Tue Dec 13 2022 - 14:32:44 CST