Re: Which NAMD-Build for my local HPC?

From: Giacomo Fiorin (giacomo.fiorin_at_gmail.com)
Date: Wed Nov 01 2017 - 08:30:27 CDT

Hi Bassam, I would suggest that you and your HPC manager check the release
notes accompanying NAMD 2.12 and the NAMD Wiki entries relevant to your
cluster.

Giacomo
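
(For reference, the running notes that ship with the prebuilt verbs/ibverbs-smp
binaries describe a charmrun launch roughly along the following lines. This is
only a minimal sketch; the host names, core counts, and paths below are
placeholders, not values taken from this thread.)

    # nodelist file in charmrun's format, declaring available cores per host
    group main
    host node01 ++cpus 19
    host node02 ++cpus 19

    # start 2 SMP processes (19 worker threads each) and pass the rest to namd2
    ./charmrun ++nodelist nodelist ++p 38 ++ppn 19 \
        ./namd2 +setcpuaffinity +isomalloc_sync test.conf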

On Tue, Oct 31, 2017 at 4:50 PM, Bassam Haddad <bhaddad_at_pdx.edu> wrote:

>
>
> *Hi again,*
> *I decided to go ahead and test the ibverbs binary, and it seems to be
> working fine on one node; however, I get the following error when I run
> the slurm.submit script:*
>
> Charmrun> scalable start enabled.
> Charmrun> IBVERBS version of charmrun
> Charmrun> started all node programs in 0.068 seconds.
> Charm++> Running in SMP mode: numNodes 2, 19 worker threads per process
> Charm++> The comm. thread both sends and receives messages
> Charm++> Using recursive bisection (scheme 3) for topology aware partitions
> Converse/Charm++ Commit ID: v6.7.1-0-gbdf6a1b-namd-charm-6.7.1-build-2016-Nov-07-136676
> Warning> Randomization of stack pointer is turned on in kernel.
> Charm++> synchronizing isomalloc memory region...
> [0] consolidated Isomalloc memory region: 0x2b7200000000 - 0x7ff900000000 (88633344 megs)
> Charm++> scheduler running in netpoll mode.
> CharmLB> Load balancer assumes all CPUs are same.
> Charm++> cpu affinity enabled.
> ------------- Processor 38 Exiting: Called CmiAbort ------------
> Reason:
>
> Length mismatch!!
>
>
> [38] Stack Traceback:
> [38:0] [0x1349a90]
> [38:1] [0x134e4bf]
> [38:2] [0x13847d6]
> [38:3] [0x11b450b]
> [38:4] [0x51d3b2]
> [38:5] [0x1330b85]
> [38:6] [0x132efd2]
> [38:7] +0x7dc5 [0x2b6db3439dc5]
> [38:8] clone+0x6d [0x2b6db437c73d]
> Fatal error on PE 38>
>
> Length mismatch!!
>
> *Below is my submit script...*
>
> #!/bin/bash
> #SBATCH --job-name namd86
> #SBATCH --partition long
> #SBATCH --nodes 2
> #SBATCH --ntasks-per-node 20
> #SBATCH --time 00:10:00
> #SBATCH --output namd-test.%j.out
>
>
> # choose version of NAMD to use
> # export NAMD_DIR=/projects/username/NAMD/NAMD_2.11_Linux-x86_64-ibverbs-smp
> # export PATH=$PATH:$NAMD_DIR
>
>
> cd /scratch/bhaddad/NAMD/Coeus_test
>
>
> # generate NAMD nodelist
> for n in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
> echo "host $n ++cpus 19" >> nodelist.$SLURM_JOBID
> done
>
>
> # calculate total processes (P) and procs per node (PPN)
> PPN=`expr $SLURM_NTASKS_PER_NODE - 1`
> P="$(($PPN * $SLURM_NNODES))"
>
>
> charmrun ++mpiexec ++remote-shell srun \
>     /home/bhaddad/NAMD_2.12_Linux-x86_64-verbs-smp/namd2 \
>     ++p $P ++ppn $PPN +setcpuaffinity +isomalloc_sync test.conf
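> # note: the nodelist.$SLURM_JOBID file built above is never passed to
> # charmrun here; with ++mpiexec ++remote-shell srun the node programs are
> # launched through srun, so the nodelist file is not used.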
>
> rm nodelist.$SLURM_JOBID
>
> *Thank you all, again!*
>
> *~ Bassam*
>
>
> On Tue, Oct 31, 2017 at 11:34 AM, Bassam Haddad <bhaddad_at_pdx.edu> wrote:
>
>> Hi NAMD Users,
>>
>> I am working with the HPC manager at my university to get a multi-node
>> build of NAMD installed on our cluster. We are at a bit of a crossroads
>> trying to determine the best build, and whether we need to compile from
>> source or can simply run the prebuilt binaries.
>>
>> Coeus HPC Specs
>> <https://www.pdx.edu/oit/linux-parallel-computing-clusters>:
>>
>>
>> - 128 compute nodes, each with 20 cores and 128 GB RAM
>>   - Dual Intel Xeon E5-2630 v4
>>     <https://ark.intel.com/products/92981/Intel-Xeon-Processor-E5-2630-v4-25M-Cache-2_20-GHz>,
>>     10 cores @ 2.2 GHz
>>   - 128 GB 2133 MHz RAM
>>   - 200 GB SSD drive
>> - 12 Intel Xeon Phi processor nodes, each with 64 cores and 96 GB RAM
>>   - Intel Xeon Phi 7210
>>     <https://ark.intel.com/products/94033/Intel-Xeon-Phi-Processor-7210-16GB-1_30-GHz-64-core>,
>>     64 cores @ 1.3 GHz
>>   - 96 GB 2133 MHz RAM
>>   - 200 GB SSD drive
>> - 2 large-memory compute nodes, each with 24 cores and 768 GB RAM
>>   - Dual Intel Xeon E5-2650 v4
>>     <https://ark.intel.com/products/92981/Intel-Xeon-Processor-E5-2630-v4-25M-Cache-2_20-GHz>,
>>     12 cores @ 2.2 GHz
>>   - 768 GB 1866 MHz RAM
>>   - 6 TB local storage
>> - Data Transfer Node to support high-bandwidth data transfers
>>   - Dual Intel Xeon E5-2650 v4, 12 cores @ 2.2 GHz
>>   - 256 GB 2133 MHz RAM
>>   - ~40 TB local disk storage in a RAID 6 array
>> - *Intel Omni-Path* high-performance (100 Gbps) network fabric
>> - 1 Gb Ethernet cluster management and IPMI networks
>>
>>
>> To the best of my knowledge, the correct build to use would be
>> Linux-x86_64-ibverbs-smp
>> <http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD>;
>> however, the more I read about multi-node NAMD builds, the more confused I
>> get. Also, could I just download the binaries to my cluster home directory
>> and then run the multi-node version?
>>
>> I am currently doing all of my research on a few GPU desktops and need
>> the HPC.
>>
>> Thank you for hearing me out!
>>
>> ~ Bassam
>>
>>
>>
>

-- 
Giacomo Fiorin
Associate Professor of Research, Temple University, Philadelphia, PA
Contractor, National Institutes of Health, Bethesda, MD
http://goo.gl/Q3TBQU
https://github.com/giacomofiorin
