Re: Which NAMD-Build for my local HPC?

From: Bassam Haddad (bhaddad_at_pdx.edu)
Date: Tue Oct 31 2017 - 15:50:19 CDT

*Hi Again,*
*I decided to go ahead and test the ibverbs binary, and it seems to be
working fine on a single node; however, I get the following error when I
run the slurm.submit script...*

Charmrun> scalable start enabled.
Charmrun> IBVERBS version of charmrun
Charmrun> started all node programs in 0.068 seconds.
Charm++> Running in SMP mode: numNodes 2, 19 worker threads per process
Charm++> The comm. thread both sends and receives messages
Charm++> Using recursive bisection (scheme 3) for topology aware partitions
Converse/Charm++ Commit ID:
v6.7.1-0-gbdf6a1b-namd-charm-6.7.1-build-2016-Nov-07-136676
Warning> Randomization of stack pointer is turned on in kernel.
Charm++> synchronizing isomalloc memory region...
[0] consolidated Isomalloc memory region: 0x2b7200000000 - 0x7ff900000000
(88633344 megs)
Charm++> scheduler running in netpoll mode.
CharmLB> Load balancer assumes all CPUs are same.
Charm++> cpu affinity enabled.
------------- Processor 38 Exiting: Called CmiAbort ------------
Reason:

                Length mismatch!!

[38] Stack Traceback:
  [38:0] [0x1349a90]
  [38:1] [0x134e4bf]
  [38:2] [0x13847d6]
  [38:3] [0x11b450b]
  [38:4] [0x51d3b2]
  [38:5] [0x1330b85]
  [38:6] [0x132efd2]
  [38:7] +0x7dc5 [0x2b6db3439dc5]
  [38:8] clone+0x6d [0x2b6db437c73d]
Fatal error on PE 38>

                Length mismatch!!

*Below I have added my submit script...*

#!/bin/bash
#SBATCH --job-name namd86
#SBATCH --partition long
#SBATCH --nodes 2
#SBATCH --ntasks-per-node 20
#SBATCH --time 00:10:00
#SBATCH --output namd-test.%j.out

# choose version of NAMD to use
# export NAMD_DIR=/projects/username/NAMD/NAMD_2.11_Linux-x86_64-ibverbs-smp
# export PATH=$PATH:$NAMD_DIR

cd /scratch/bhaddad/NAMD/Coeus_test

# generate NAMD nodelist
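# (19 = 20 cores per node minus one core left for the Charm++ communication thread)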
for n in `scontrol show hostnames $SLURM_NODELIST`; do
  echo "host $n ++cpus 19" >> nodelist.$SLURM_JOBID
done

# calculate total processes (P) and procs per node (PPN)
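# (PPN leaves one core per node for the communication thread, matching "++cpus 19" above;
#  P counts only worker threads across all nodes)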
PPN=`expr $SLURM_NTASKS_PER_NODE - 1`
P="$(($PPN * $SLURM_NNODES))"

charmrun ++mpiexec ++remote-shell srun \
  /home/bhaddad/NAMD_2.12_Linux-x86_64-verbs-smp/namd2 ++p $P ++ppn $PPN \
  +setcpuaffinity +isomalloc_sync test.conf

rm nodelist.$SLURM_JOBID
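For reference, a minimal sketch of the same launch written as one command,
assuming charmrun is taken from the same NAMD_2.12 verbs-smp directory as
namd2 and the generated nodelist file is passed explicitly with ++nodelist
instead of ++mpiexec. Both of those choices are assumptions for illustration,
not necessarily the launch method intended here:

NAMD_DIR=/home/bhaddad/NAMD_2.12_Linux-x86_64-verbs-smp

$NAMD_DIR/charmrun ++p $P ++ppn $PPN \
  ++nodelist nodelist.$SLURM_JOBID ++remote-shell ssh \
  $NAMD_DIR/namd2 +setcpuaffinity +isomalloc_sync test.conf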

*Thank you all, again!*

*~ Bassam*

On Tue, Oct 31, 2017 at 11:34 AM, Bassam Haddad <bhaddad_at_pdx.edu> wrote:

> Hi NAMD Users,
>
> I am working with the HPC manager at my university to get a multi-node
> build of NAMD installed on it. We are at a bit of a crossroads trying to
> determine the best build, and whether compiling from source is necessary
> or the prebuilt binaries will do.
>
> Coeus HPC Specs
> <https://www.pdx.edu/oit/linux-parallel-computing-clusters>:
>
>
> - 128 compute nodes, each with 20 cores and 128 GB RAM
>   - Dual Intel Xeon E5-2630 v4
>     <https://ark.intel.com/products/92981/Intel-Xeon-Processor-E5-2630-v4-25M-Cache-2_20-GHz>,
>     10 cores @ 2.2 GHz
>   - 128 GB 2133 MHz RAM
>   - 200 GB SSD drive
> - 12 Intel Xeon Phi processor nodes, each with 64 cores and 96 GB RAM
>   - Intel Xeon Phi 7210
>     <https://ark.intel.com/products/94033/Intel-Xeon-Phi-Processor-7210-16GB-1_30-GHz-64-core>,
>     64 cores @ 1.3 GHz
>   - 96 GB 2133 MHz RAM
>   - 200 GB SSD drive
> - 2 large-memory compute nodes, each with 24 cores and 768 GB RAM
>   - Dual Intel Xeon E5-2650 v4, 12 cores @ 2.2 GHz
>   - 768 GB 1866 MHz RAM
>   - 6 TB local storage
> - Data Transfer Node to support high-bandwidth data transfers
>   - Dual Intel Xeon E5-2650 v4, 12 cores @ 2.2 GHz
>   - 256 GB 2133 MHz RAM
>   - ~40 TB local disk storage in a RAID 6 array
> - *Intel Omni-Path* high-performance (100 Gbps) network fabric
> - 1 Gb Ethernet cluster management and IPMI networks
>
>
> To the best of my knowledge, the correct build to use would be
> Linux-x86_64-ibverbs-smp
> <http://www.ks.uiuc.edu/Development/Download/download.cgi?PackageName=NAMD>;
> however, the more I read about multi-node NAMD builds, the more confused I
> get. Also, could I just download the binaries to my cluster home directory
> and then run the multi-node version?
>
> I am currently doing all of my research on a few GPU desktops, and need
> the HPC.
>
> Thank you for hearing me out!
>
> ~ Bassam
>
>
>
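As for the question in the quoted message about simply downloading the
binaries into a cluster home directory, here is a minimal sketch of that
workflow. The tarball name is inferred from the directory used in the submit
script above, and the final line assumes a verbs build can run a
single-process test without charmrun:

cd $HOME
tar xzf NAMD_2.12_Linux-x86_64-verbs-smp.tar.gz
export NAMD_DIR=$HOME/NAMD_2.12_Linux-x86_64-verbs-smp
export PATH=$NAMD_DIR:$PATH   # namd2 and charmrun live at the top of this directory
# single-process sanity check from a directory containing a config file;
# multi-node runs still go through charmrun as in the submit script above
namd2 test.conf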
