Re: namd cvs compilation with a maximum number of cores to run on

From: Vlad Cojocaru (Vlad.Cojocaru_at_eml-r.villa-bosch.de)
Date: Thu Feb 12 2009 - 09:04:14 CST

Hi Peter,

My system has about 250K atoms. However, the simulation runs without
problems on 512 cores. The segmentation fault only appears when I
request 1024 cores for the same job.

Vlad

Peter Freddolino wrote:
> Hi Vlad,
> there is, in fact, not a 512 core limit in namd (we frequently run on
> more); both of the instances of the number 512 in the code that you
> mention are red herrings. PROCESSORMAX is never used, and numcpus is a
> character array used to *print* the number of processors being used (and
> thus the limit is a string 512 characters long). Segmentation faults on
> startup (or shortly thereafter) with large systems usually mean that
> node 0 is running out of memory. I'd recommend trying the steps at
> http://www.ks.uiuc.edu/Research/namd/wiki/index.cgi?NamdMemoryReduction
> to reduce your memory usage. Out of curiosity, how large is your system?
>
> Best,
> Peter
>
> Vlad Cojocaru wrote:
>
>> Dear NAMDers,
>>
>> On August 6th last year I compiled the namd cvs code on an
>> opteron-based linux cluster with infiniband (intel compiler, mvapich).
>> I have been running it on 512 cores ever since and everything worked
>> fine. Now I have a much bigger system and wanted to run on 1024 cores.
>> However, I started getting "Segmentation fault" errors (nothing else
>> in the error message) on jobs that I could run on 512 cores. This
>> puzzled me as it didn't make sense, and with some help I discovered
>> that the code was compiled to run on a maximum of 512 cores.
>>
>> (defined in LdbCoordinator.h (PROCESSORMAX = 512) and in main.C (char
>> numcpus[512]))
>>
>> Since I didn't change anything in the code before compiling, I
>> assume this limit is built into the cvs code. What I find even
>> stranger is the error message. Instead of something like "The maximum
>> number of cpus (512) was exceeded" I just get "Segmentation
>> fault" (which doesn't tell me much).
>>
>> Now, my questions are:
>> 1. Why does the code define a maximum number of cpus, since namd is
>> meant to be run on large parallel machines?
>> 2. Is that the case with the newest cvs code as well?
>> 3. If I want to compile for running on more cpus, what do I have to
>> modify in the cvs code?
>> 4. If this definition of the maximum number of cpus is kept, is it
>> possible to add a relevant error message when trying to run on more
>> cpus?
>>
>> Thanks a lot
>>
>> Best wishes
>> vlad
>>
>>
>
>
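
To illustrate Peter's point that numcpus[512] is a text buffer and not a
processor limit, here is a minimal C++ sketch. It is not the actual code
from main.C: the function name format_cpu_count and the constant
kNumCpusBufferSize are hypothetical, and the use of snprintf is an
assumption about how such a buffer might be filled. It only shows that a
512-character buffer easily holds the decimal text of a count such as 1024.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the buffer declared in main.C: 512 is the
// capacity of the text buffer in characters, not an upper bound on the
// number of processors.
constexpr std::size_t kNumCpusBufferSize = 512;

// Format a processor count into a fixed-size character buffer, the way a
// startup banner might, and return the resulting text.
std::string format_cpu_count(int processor_count) {
    char numcpus[kNumCpusBufferSize];
    int written = std::snprintf(numcpus, sizeof numcpus, "%d", processor_count);
    // snprintf returns the number of characters needed, excluding the
    // terminating NUL; check that the text fit into the buffer.
    assert(written > 0 && static_cast<std::size_t>(written) < sizeof numcpus);
    return std::string(numcpus);
}
```

Calling format_cpu_count(1024) yields a four-character string, far below
the 512-character capacity, so a buffer like this would not by itself
explain a crash at 1024 cores.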

-- 
----------------------------------------------------------------------------
Dr. Vlad Cojocaru
EML Research gGmbH
Schloss-Wolfsbrunnenweg 33
69118 Heidelberg
Tel: ++49-6221-533202
Fax: ++49-6221-533298
e-mail:Vlad.Cojocaru[at]eml-r.villa-bosch.de
http://projects.villa-bosch.de/mcm/people/cojocaru/
----------------------------------------------------------------------------
EML Research gGmbH
Amtgericht Mannheim / HRB 337446
Managing Partner: Dr. h.c. Klaus Tschira
Scientific and Managing Director: Prof. Dr.-Ing. Andreas Reuter
http://www.eml-r.org
----------------------------------------------------------------------------
