From: Anirvinya Gururajan (anirvinya.gururajan_at_research.iiit.ac.in)
Date: Tue Mar 22 2022 - 02:27:05 CDT
Hi René and Hrishikesh,
Thanks a lot for your replies. test.sh was a copy of the script I was testing with; the typo must have crept in when I uploaded it here. The GPUs on the cluster had fallen off the PCI bus, so the job could not be allocated any GPUs. I had hit a different error right before this one, which is why I had ruled out a hardware mishap. It has been fixed and works now!
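In case someone hits the same symptom later: when a GPU falls off the PCI bus it disappears from the driver tools, so a quick check on the node looks like this (a rough sketch; dmesg may need elevated privileges):
nvidia-smi                              # errors out or reports no devices when the GPU is gone
dmesg | grep -i "fallen off the bus"    # the NVIDIA kernel driver logs this phrase on a bus drop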
Regards,
Anirvinya G
CCNSB, IIITH
________________________________
From: Hrishikesh Dhondge <hbdhondge_at_gmail.com>
Sent: 22 March 2022 12:10
To: namd-l_at_ks.uiuc.edu <namd-l_at_ks.uiuc.edu>; Anirvinya Gururajan <anirvinya.gururajan_at_research.iiit.ac.in>
Cc: René Hafner TUK <hamburge_at_physik.uni-kl.de>
Subject: Re: namd-l: Error while simulating on NAMD
Hi,
There's a misspelling in the line where you ask for GPUs.
#SBATHC --gres=gpu:2
Replace it with
#SBATCH --gres=gpu:2
If the error persists, try increasing the number of CPUs:
#SBATCH --cpus-per-task=20
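Note that slurm will not warn you about the misspelled line: anything that does not start with exactly #SBATCH is treated as an ordinary shell comment and silently ignored, which is why the job ran without GPUs instead of failing at submission. A quick way to catch such typos before submitting:
grep -n '^#SBAT' test.sh    # every match should read #SBATCH; misspelled directives are silently skipped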
On Mon, Mar 21, 2022 at 11:31 PM René Hafner TUK <hamburge_at_physik.uni-kl.de> wrote:
Hi Anirvinya,
your slurm error and NAMD output file tell you everything: "no CUDA-capable device is detected",
and your SLURM_JOB_GPUS environment variable is empty, hence no GPU is visible to the job.
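A quick way to check what the job actually got is to add a few lines at the top of the batch script (a minimal sketch, assuming the standard NVIDIA tools are on the node):
echo "SLURM_JOB_GPUS=${SLURM_JOB_GPUS}"              # set by slurm when GPUs are granted via --gres
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}"  # what CUDA applications such as NAMD will see
nvidia-smi                                           # lists the devices visible inside the allocation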
How do you specifically request a GPU for the job?
Kind regards
René
On 3/21/2022 2:53 PM, Anirvinya Gururajan wrote:
Hey Josh!
Thanks for the reply. I do ask for GPUs in my batch script (PFA). The error dumped by slurm and the NAMD stdout output are attached to the previous message.
Regards,
Anirvinya G
CCNSB, IIITH
________________________________
From: Josh Vermaas <vermaasj_at_msu.edu>
Sent: 21 March 2022 18:52
To: namd-l_at_ks.uiuc.edu <namd-l_at_ks.uiuc.edu>; Anirvinya Gururajan <anirvinya.gururajan_at_research.iiit.ac.in>
Subject: Re: namd-l: Error while simulating on NAMD
Hi Anirvinya,
In your slurm script, are you asking for any GPUs on the nodes? It looks like you are using a GPU-accelerated executable, which requires a GPU to be present in order to run. With slurm, the typical way to ask for GPUs to be allocated to the job is something like '#SBATCH --gres=gpu:1'. Do you have a line like that in your submission script?
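For reference, a minimal sketch of a GPU submission script along those lines (the module name, core count, and config file name are placeholders for whatever your cluster actually provides):
#!/bin/bash
#SBATCH --gres=gpu:1                 # request one GPU for the job
#SBATCH --cpus-per-task=8            # placeholder core count
module load namd/2.13-cuda           # placeholder module name
namd2 +p${SLURM_CPUS_PER_TASK} +idlepoll run.namd > run.log    # run.namd is a placeholder config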
-Josh
On 3/20/22 5:16 PM, Anirvinya Gururajan wrote:
Hi all,
Recently, I have been facing trouble with a very specific system that I am trying to simulate using NAMD/2.13. PFA the slurm output file that was generated. When I simulate it in an interactive job on the cluster node, it works fine, but when it is submitted as a batch job on the same node, it throws the error below. I am not sure where the source of the problem is. The system is large, about 800k atoms (I don't know whether that is relevant to this issue).
Regards,
Anirvinya G
CCNSB, IIITH
--
Josh Vermaas
Assistant Professor
MSU-DOE Plant Research Laboratory, Department of Biochemistry and Molecular Biology
Michigan State University
https://vermaaslab.github.io/

--
Dipl.-Phys. René Hafner
TU Kaiserslautern
Germany

--
With regards
Hrishikesh Dhondge
PhD student, LORIA - INRIA Nancy