Re: NAMD2.7 on BluegeneL hang at "LDB: Central LB being created..."

From: Dong Luo (us917_at_yahoo.com)
Date: Fri Mar 04 2011 - 10:40:54 CST

Well, disabling virtual node mode does not really solve the problem; it just delays the
occurrence of "FATAL ERROR: Memory allocation failed on processor 0." With 128
physical nodes, it occurred after about 270000 steps, compared to 121000 steps
with 256 virtual nodes, for the same 50k-atom test system. For a bigger
170k-atom system, this FATAL ERROR happened after only 20000
steps. NAMD 2.6 does not have this problem.
Each Blue Gene/L node consists of two 32-bit PPC440 cores (700 MHz) sharing
512 MB of main memory. Each node has a 32 KB L1 cache, a 2 KB L2 cache, and
a 4 MB L3 cache.
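The per-process memory arithmetic may help explain why virtual node mode runs out of memory first. This is a minimal sketch that assumes the node's 512 MB is split evenly between the two processes in virtual node (VN) mode and that OS/runtime overhead can be ignored; the function name is hypothetical.

```python
# Per-process memory on one Blue Gene/L compute node (512 MB, 2 cores, from the post).
# Assumption: VN mode splits node memory evenly between the two processes;
# coprocessor (CO) mode runs one process per node. OS/runtime overhead is ignored.
NODE_MEMORY_MB = 512
CORES_PER_NODE = 2


def memory_per_process_mb(virtual_node_mode: bool) -> float:
    """Return the memory available to each NAMD process on one node, in MB."""
    processes = CORES_PER_NODE if virtual_node_mode else 1
    return NODE_MEMORY_MB / processes


print(memory_per_process_mb(True))   # VN mode: half the node per process
print(memory_per_process_mb(False))  # CO mode: the whole node for one process
```

Under this assumption each VN-mode process has half the headroom of a CO-mode process, which is consistent with (though not proof of) the allocation failure appearing earlier in VN mode.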

Dong

________________________________
From: Dong Luo <us917_at_yahoo.com>
To: Chris Harrison <charris5_at_gmail.com>
Cc: namd-l_at_ks.uiuc.edu
Sent: Fri, March 4, 2011 9:08:38 AM
Subject: Re: namd-l: NAMD2.7 on BluegeneL hang at "LDB: Central LB being
created..."

I wasn't clear. I'm using the CVS version of Charm++, but I modified the
configure file to skip the MPI test; otherwise it refuses to compile.
 
But now I have run into another problem with the freshly compiled namd2.
With virtual node mode enabled, the simulation fails with "FATAL ERROR: Memory
allocation failed on processor 0." at about step 121000 (repeatably). The
simulation system contains only 50905 atoms. Disabling virtual node mode solves the
problem but slows the calculation from "256 CPUs 0.0204817 s/step 0.237057
days/ns" to "128 CPUs 0.0299266 s/step 0.346373 days/ns". Each physical node has
2 CPUs on it. That's why 128 nodes count as 256 CPUs in virtual node
mode.
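The s/step and days/ns figures quoted above are consistent with a 1 fs timestep (an inference from the numbers, not something stated in the post). A minimal sketch of the conversion and of the resulting slowdown; the helper name days_per_ns is hypothetical.

```python
# Convert NAMD benchmark timings (seconds per step) to days per nanosecond.
# Assumption: 1 fs timestep, i.e. 1,000,000 steps per ns (inferred, not stated).
SECONDS_PER_DAY = 86400.0


def days_per_ns(seconds_per_step: float, timestep_fs: float = 1.0) -> float:
    """Wall-clock days needed to simulate 1 ns at the given speed."""
    steps_per_ns = 1.0e6 / timestep_fs
    return seconds_per_step * steps_per_ns / SECONDS_PER_DAY


vn = days_per_ns(0.0204817)  # 256 CPUs, virtual node mode
co = days_per_ns(0.0299266)  # 128 CPUs, virtual node mode disabled
print(f"VN: {vn:.6f} days/ns, CO: {co:.6f} days/ns")
print(f"Without virtual node mode: {co / vn:.2f}x slower per step")
```

With these inputs the sketch reproduces the 0.237057 and 0.346373 days/ns quoted above, i.e. roughly a 1.46x slowdown from giving up virtual node mode.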
 
Dong

________________________________
From: Chris Harrison <charris5_at_gmail.com>
To: Dong Luo <us917_at_yahoo.com>
Cc: akohlmey_at_gmail.com; namd-l_at_ks.uiuc.edu
Sent: Thu, March 3, 2011 9:08:20 PM
Subject: Re: namd-l: NAMD2.7 on BluegeneL hang at "LDB: Central LB being
created..."

Are you really using Charm++ 2.2?! Is there a reason?

This may work for you, but you should really upgrade to Charm++
6.2.1 or later when possible. Otherwise you're missing improvements
to performance from the more recent Charm++ versions.

Best,
Chris

--
Chris Harrison, Ph.D.
Theoretical and Computational Biophysics Group
NIH Resource for Macromolecular Modeling and Bioinformatics
Beckman Institute for Advanced Science and Technology
University of Illinois, 405 N. Mathews Ave., Urbana, IL 61801
char_at_ks.uiuc.edu                          Voice: 217-244-1733
http://www.ks.uiuc.edu/~char              Fax:  217-244-6078
Dong Luo <us917_at_yahoo.com> writes:
> Date: Thu, 3 Mar 2011 17:59:10 -0800 (PST)
> From: Dong Luo <us917_at_yahoo.com>
> To: Chris Harrison <charris5_at_gmail.com>, akohlmey_at_gmail.com
> Cc: namd-l_at_ks.uiuc.edu
> Subject: Re: namd-l: NAMD2.7 on BluegeneL hang at "LDB: Central LB being
>  created..."
> 
> Chris, the CVS versions of namd/charm++ work. The only catch is that I have to
> comment out the MPI check in the charm++ configure file because it fails on
> Blue Gene/L. That check is not present in charm++ 2.2.
> 
> Axel, namd/charm++ are cross-compiled for Blue Gene/L because the login host uses
> a different OS from the compute nodes. I have not figured out a way to test
> charm++.
> 
> Dong
> 
>  
> 
> ________________________________
> From: Chris Harrison <charris5_at_gmail.com>
> To: Dong Luo <us917_at_yahoo.com>
> Cc: namd-l_at_ks.uiuc.edu
> Sent: Thu, March 3, 2011 1:41:50 AM
> Subject: Re: namd-l: NAMD2.7 on BluegeneL hang at "LDB: Central LB being 
> created..."
> 
> We've made recent improvements to startup and load-balancing.  Can you
> try the CVS version or one of the nightly builds of namd, with the most 
> recent git archive or nightly build of charm++?
> 
> Best,
> Chris
> 
> 
> 
> 
> Dong Luo <us917_at_yahoo.com> writes:
> > Date: Wed, 2 Mar 2011 19:42:57 -0800 (PST)
> > From: Dong Luo <us917_at_yahoo.com>
> > To: namd-l_at_ks.uiuc.edu
> > Subject: namd-l: NAMD2.7 on BluegeneL hang at "LDB: Central LB being
> >   created..."
> > 
> > Hi,
> > 
> > I'm trying to use colvars in NAMD 2.7 for distance restraints. There is no
> > precompiled version for Blue Gene/L in the download section, so I downloaded the
> > source code and compiled it following the instructions at this link:
> > http://bluegene.bnl.gov/comp/buildnamd.html
> > 
> > However, the simulation (with or without colvars) using this namd2 2.7
> > build always hangs after Startup phase 5, as shown in the log:
> > "
> > Info: REMOVING COM VELOCITY 0.0209799 0.0192793 0.000362722
> > Info: LARGEST PATCH (156) HAS 345 ATOMS
> > Info: Startup phase 3 took 0.246489 s, 17.3047 MB of memory in use
> > Info: PME using 40 and 32  processors for FFT and reciprocal sum.
> > Info: PME GRID LOCATIONS: 7 15 23 27 31 39 47 55 59 63 ...
> > Info: PME TRANS LOCATIONS: 3 11 19 29 35 43 51 61 67 75 ...
> > Info: Startup phase 4 took 0.00254185 s, 17.3047 MB of memory in use
> > Info: Startup phase 5 took 0.0261579 s, 17.3047 MB of memory in use
> > LDB: Central LB being created...
> > "
> > The namd2 2.6 build runs normally, but I assume it lacks the colvars feature.
> > 
> > Any directions?
> > 
> > Thank you.
> > 
> > Dong

This archive was generated by hypermail 2.1.6 : Wed Feb 29 2012 - 05:23:38 CST