From: Peter Freddolino (petefred_at_ks.uiuc.edu)
Date: Thu Feb 12 2009 - 08:32:59 CST
Hi Vlad,
There is, in fact, no 512-core limit in NAMD (we frequently run on
more); both instances of the number 512 in the code that you mention
are red herrings. PROCESSORMAX is never used, and numcpus is a
character array used to *print* the number of processors being used
(so the 512 there is only the length of a string buffer, in
characters). Segmentation faults on startup (or shortly thereafter)
with large systems usually mean that node 0 is running out of memory.
I'd recommend trying the steps at
http://www.ks.uiuc.edu/Research/namd/wiki/index.cgi?NamdMemoryReduction
to reduce your memory usage. Out of curiosity, how large is your system?
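
To illustrate the numcpus point, here is a minimal sketch (not the actual
NAMD source; format_cpu_count is a hypothetical helper made up for this
example) of how a 512-character buffer can be used purely to format a
processor count for printing:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not NAMD code: writes the processor count into a
// fixed-size character buffer and returns it as a string.
std::string format_cpu_count(int ncpus) {
    // 512 is the buffer length in characters, not a limit on ncpus.
    char numcpus[512];
    // snprintf never writes more than sizeof(numcpus) bytes, including
    // the terminating null character.
    std::snprintf(numcpus, sizeof(numcpus), "%d", ncpus);
    return std::string(numcpus);
}
```

Calling format_cpu_count(1024) returns the string "1024"; the size of
the buffer only bounds how many digits can be printed, not how many
cores the job can use.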
Best,
Peter
Vlad Cojocaru wrote:
> Dear NAMDers,
>
> On August 6th last year I compiled the NAMD CVS code on an
> Opteron-based Linux cluster with InfiniBand (Intel compiler, MVAPICH).
> I was running it all the time on 512 cores and everything worked fine.
> Now, I have a much bigger system and I wanted to run on 1024 cores.
> However, I started getting "Segmentation fault" errors (nothing else
> in the error message) on jobs that I could run on 512 cores. I was
> puzzled by this, as it didn't make sense, and with some help I
> discovered that the code was compiled to run on a maximum of 512 cores
>
> (defined in LdbCoordinator.h (PROCESSORMAX = 512) and in main.C (char
> numcpus[512])).
>
> Since I didn't change anything in the code before compiling, I
> assume this was built into the CVS code. What I found even stranger
> is the error message. Instead of something like "The maximum
> number of cpus (512) was exceeded", I get only "Segmentation
> fault" (which doesn't tell me much).
>
> Now, my questions are:
> 1. Why does the code define a maximum number of cpus, since NAMD is
> meant to be run on large parallel machines?
> 2. Is that the case with the newest CVS code as well?
> 3. If I want to compile for running on more cpus, what do I have to
> modify in the cvs code ?
> 4. If this definition of max no of cpus is kept, is it possible to add
> a relevant error message when trying to run on more cpus ?
>
> Thanks a lot
>
> Best wishes
> vlad
>