Re: Right NAMD version

From: Giacomo Fiorin (giacomo.fiorin_at_gmail.com)
Date: Tue Sep 18 2018 - 08:19:50 CDT

Hi Fidan, the first errors are due either to missing execute permission bits
on the files, or to the sysadmins having disabled execution for the entire
/scratch filesystem (e.g. by mounting it noexec) because of overzealous
security policies.
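
A minimal sketch for telling those two causes apart, assuming a POSIX shell
and the util-linux `findmnt` tool on the cluster. The function names
`mount_is_noexec` and `diagnose_exec` are hypothetical helpers, not part of
NAMD or Charm++:

```shell
#!/bin/sh
# Hypothetical helpers (not part of NAMD/Charm++) to check why a file under
# /scratch cannot be executed. Assumes util-linux `findmnt` is available.

# Succeeds if the filesystem holding path $1 is mounted with "noexec".
mount_is_noexec() {
    opts=$(findmnt -n -o OPTIONS --target "$1" 2>/dev/null) || return 1
    case ",$opts," in
        *,noexec,*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints a one-line diagnosis for file $1.
diagnose_exec() {
    if [ ! -e "$1" ]; then
        echo "missing: $1"
    elif mount_is_noexec "$1"; then
        # chmod cannot help here: copy the files elsewhere or ask the admins
        echo "noexec mount: $1"
    elif [ ! -x "$1" ]; then
        echo "no execute bit: $1 (try: chmod u+x $1)"
    else
        echo "ok: $1"
    fi
}

# Example: check the programs in the unpacked NAMD directory
# for f in "$NAMD_DIR"/charmrun "$NAMD_DIR"/namd2; do diagnose_exec "$f"; done
```

The mount check comes first because on a noexec filesystem the execute bits
are ignored, so running chmod there would not fix anything.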

Also, any "multicore" version runs well on multiple cores on the same node,
but not across nodes. To run on multiple nodes, you will need a version
that either uses ibverbs directly, or wraps around the MPI library.
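
For the multi-node case, one detail in the script below is worth checking:
`srun hostname` prints one bare hostname per task, whereas a Charm++ nodelist
file expects a `group main` line followed by one `host <name>` line per node.
A minimal sketch, assuming a SLURM allocation and an ibverbs (non-MPI) build;
`make_nodelist` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: turn hostnames (one per line on stdin) into a Charm++
# nodelist file for an ibverbs build of NAMD.
make_nodelist() {
    echo "group main"
    # sort -u removes the duplicates that `srun hostname` prints (one per task)
    sort -u | while read -r host; do
        if [ -n "$host" ]; then
            echo "host $host"
        fi
    done
}

# Possible use inside the SLURM job script. With a nodelist, charmrun starts
# the node programs over ssh, which some clusters forbid; the ++mpiexec mode
# used in the original script is the alternative.
# scontrol show hostnames "$SLURM_JOB_NODELIST" | make_nodelist > nodelist
# charmrun ++nodelist nodelist +p"$SLURM_NTASKS" "$NAMD_DIR/namd2" config.conf
```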

Most details are explained in the notes.txt file that comes with NAMD: read
it carefully, and if you lack specific technical information about the
cluster, the admins must provide it to you. You could also share the
document with them, but you must also do a bit of legwork yourself.

Issues with combining MPI and CUDA are also discussed there.

Giacomo

On Tue, Sep 18, 2018 at 7:02 AM Fidan Sumbul <fidansumbul_at_gmail.com> wrote:

> Dear NAMD users,
>
> I'm trying to run NAMD on an HPC cluster that I have access to, which has
> multiple nodes connected by OFED, CUDA support, and MPI support.
> I'm having difficulty running NAMD on it.
> I've tried different pre-compiled versions of NAMD, simply downloading
> and untarring them, but I got the following error messages.
>
> Error messages with multicore-CUDA
> /scratch/xxx/NAMD_2.13b1_Linux-x86_64-multicore-CUDA/charmrun: line 34:
> /scratch/xxx/xxx/xxx/charmlist: Permission denied
> /scratch/xxx/NAMD_2.13b1_Linux-x86_64-multicore-CUDA/charmrun: line 34:
> exec: /scratch/xxx/xxx/xxx/charmlist: cannot execute: Permission denied
>
> Error messages with ibverbs
> Charmrun> charmrun started...
> Charmrun> mpiexec started
> Charmrun> node programs all started
> Charmrun> node programs all connected
> ------------- Processor 9 Exiting: Called CmiAbort ------------
> Reason:
>
> Length mismatch!!
>
>
> I found some information on the web saying that the “Length mismatch”
> error means you are not using the correct version.
> Which version should I use?
>
> Thanks a lot for your help. Please find the sh script below.
>
> Best,
>
>
> #!/bin/sh
>
> # number of nodes
> #SBATCH -N 2
>
> # number of tasks per node (2 nodes x 32 = 64 tasks, matching -n below)
> #SBATCH --ntasks-per-node 32
> #SBATCH -n 64
>
> # name of job, optional
> #SBATCH -J trial_job
>
> #SBATCH -p xxx
>
> #SBATCH -A xxx
>
> # wall time, max 7 days; format: #SBATCH -t D-HH:MM:SS
> #SBATCH -t 2:00:00
>
> # name of output file, optional
> #SBATCH -o slurm_trial.out
> #SBATCH -e slurm_trial.err
>
> # mail recipient, optional
> #SBATCH --mail-user=xxx_at_gmail.com
>
> # send mail notifications? optional
> #SBATCH --mail-type=ALL
>
> # source /share/apps/scripts/env_vars.sh
>
> module purge
> module load userspace/all
> module load openmpi/2.1.2/2018
>
> #export NAMD_DIR=/softs/NAMD_2.12_Linux-x86_64-multicore-CUDA
> export NAMD_DIR=/scratch/fsumbul/NAMD_2.13b1_Linux-x86_64-ibverbs
> export PATH=$PATH:$NAMD_DIR
>
> echo "Running on: $SLURM_NODELIST"
> srun -s hostname > $PWD/charmlist
>
> charmrun ++verbose ++nodelist $PWD/charmlist ++p $SLURM_NPROCS ++mpiexec \
> $NAMD_DIR/namd2 $PWD/xxx.conf > $PWD/xxx.log
>
> ---
> Fidan Sumbul, PhD
> Postdoctoral Researcher
> Force microscopy group <https://sites.google.com/view/fm4b-lab/> @LAI
> U1067
> Aix-Marseille Université / Inserm / CNRS
> 163, Av. de Luminy, Bât TPR2 bloc 5, case 909
> 13288 Marseille Cedex 9, France
> fidan.sumbul_at_inserm.fr
> fidansumbul_at_gmail.com

-- 
Giacomo Fiorin
Associate Professor of Research, Temple University, Philadelphia, PA
Contractor, National Institutes of Health, Bethesda, MD
http://goo.gl/Q3TBQU
https://github.com/giacomofiorin

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2019 - 23:20:13 CST