Re: A larger protein data bank for performance benchmark

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Tue May 14 2013 - 00:46:34 CDT

Hi,

 

I agree with the previous poster. You need to read up a bit more on NAMD and on HPC. I’m an IT specialist too, so I know what I’m talking about. There are several configuration options that have a huge impact on NAMD’s scaling behavior (factors of 2 or 4, i.e. several hundred percent). These range from the compile stage through the OS settings to the configuration of NAMD itself. This is not unique to NAMD but holds for HPC in general, as every computational problem and its parallelization has its own nature.

I also agree with the others that there are several benchmark systems out there that are used frequently and are fairly representative of NAMD’s performance on different platforms. Two of these are ApoA1 (small to medium sized) and STMV (very large), and either should be a good choice. NAMD usually scales well out of the box when running on a single node (CPU version), and the distributed-memory version is usually faster than the SMP version (this also holds for other HPC applications).

To get reliable results, you should bind the NAMD processes to the “real” physical cores instead of letting the OS decide, if you have Hyper-Threading or Magny-Cours CPUs. Otherwise you will not measure the best performance. You will notice less impact of memory bandwidth with NAMD, but if you have many cores per node, check whether you get the best performance with all physical cores in use or with fewer, and pay attention to how the processes are distributed over multiple CPU sockets.

Also, people use different settings for “fullElectFrequency”, which controls how often the long-range electrostatic forces (PME) are updated; this is one of the most expensive parts of the computation. So you should publish the settings you used for the benchmark along with the results. If you search the net for “namd tuning”, you will also find some information on how to improve scaling over large numbers of processors (several hundred).

If you also plan to benchmark multi-node performance, there are some more things to take care of, especially regarding the network. Let me know and I’ll tell you.
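
To make the "all physical cores or fewer" comparison concrete, here is a minimal Python sketch that turns measured times per step into speedup and parallel efficiency. The function name and the timings are invented placeholders, not measurements from any real machine.

```python
def scaling_table(seconds_per_step):
    """Return (cores, speedup, efficiency) rows relative to the smallest core count.

    seconds_per_step maps a core count to the measured time per MD step.
    """
    base_cores = min(seconds_per_step)
    base_time = seconds_per_step[base_cores]
    rows = []
    for cores in sorted(seconds_per_step):
        speedup = base_time / seconds_per_step[cores]
        # Ideal speedup is cores / base_cores; efficiency is the fraction achieved.
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Hypothetical timings in seconds per step; replace with your own results.
    measured = {1: 0.80, 2: 0.40, 4: 0.25, 8: 0.20}
    for cores, speedup, efficiency in scaling_table(measured):
        print(f"{cores:3d} cores  speedup {speedup:5.2f}  efficiency {efficiency:6.1%}")
```

If efficiency drops sharply between two core counts, the smaller count may be the better operating point for that node.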

 

For now, here is how to start a precompiled NAMD:

 

charmrun +p4 namd2 ++idlepoll apoa1.namd

 

Make sure your machine can log on via ssh to 127.0.0.1, localhost, its own IP address and its own hostname without a password and without being asked to save the ssh host key fingerprint.
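
One way to check this before a run is sketched below in Python. It assumes the OpenSSH client is installed; `BatchMode=yes` makes ssh fail instead of prompting for a password or for confirmation of an unknown host key. The helper names are only illustrative.

```python
import socket
import subprocess


def local_targets():
    """Names this machine should reach over ssh without any prompt."""
    hostname = socket.gethostname()
    try:
        # Address the hostname resolves to; this may again be a loopback address.
        address = socket.gethostbyname(hostname)
    except OSError:
        address = None
    targets = []
    for name in ("127.0.0.1", "localhost", hostname, address):
        if name and name not in targets:
            targets.append(name)
    return targets


def ssh_probe_command(target):
    # Run the no-op command "true" remotely; never fall back to interactive prompts.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", target, "true"]


def can_login_silently(target):
    try:
        result = subprocess.run(ssh_probe_command(target), capture_output=True, timeout=15)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for target in local_targets():
        status = "ok" if can_login_silently(target) else "needs fixing"
        print(f"{target}: {status}")
```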

 

charmrun is the parallel launcher of the Charm++ parallel programming model (in HPC you will more often find MPI and mpirun instead).

The “idlepoll” parameter causes the NAMD processes to fill idle CPU time with polling for messages (Open MPI does this by default). This reduces the latency of inter-process communication and therefore usually improves scaling. The only additional thing you need is a NAMD configuration file. This is a plain text file containing NAMD Tcl commands, so it won’t hurt to read the file and the manual and find out which parameter controls what. You will also notice that you can increase the runtime by increasing the number of molecular dynamics steps (numsteps or run xxx). For very big molecular systems and without a high-speed file system, you can also run into bottlenecks when the output intervals for the DCD and restart files are set too low, i.e. the files are written too often. Depending on how serious these benchmarks should be, or on the reason why you want to do this (a special customer?), you could find some more parameters to modify and measure their impact on scaling.
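
A minimal Python sketch of how benchmark variants of a configuration file could be generated, assuming the file uses plain "keyword value" lines. The helper `override_params` and the sample fragment are hypothetical; the fragment is not the contents of the real apoa1.namd. Overrides missing from the file are appended at the end, which would come too late if the file uses a run command instead of numsteps.

```python
def override_params(config_text, overrides):
    """Return config_text with simple 'keyword value' lines replaced.

    Keywords are matched case-insensitively on the first token of a line.
    Comments and Tcl logic are left untouched. Overrides that do not occur
    in the text are appended at the end.
    """
    wanted = {key.lower(): (key, str(value)) for key, value in overrides.items()}
    seen = set()
    out = []
    for line in config_text.splitlines():
        tokens = line.split()
        key = tokens[0].lower() if tokens else ""
        if key in wanted:
            name, value = wanted[key]
            out.append(f"{name} {value}")
            seen.add(key)
        else:
            out.append(line)
    for key, (name, value) in wanted.items():
        if key not in seen:
            out.append(f"{name} {value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Illustrative fragment only; use your own configuration file here.
    base = "timestep 1.0\nnumsteps 500\nfullElectFrequency 4\n"
    print(override_params(base, {"numsteps": 5000, "DCDfreq": 1000}))
```

Keeping each generated variant next to its timing results also covers the point about publishing the settings together with the benchmark.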

 

Regards

 

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On Behalf Of Aron Broom
Sent: Tuesday, May 14, 2013 05:57
To: linux freaker
Cc: Rebecca Swett; Ajasja Ljubetič; Namd Mailing List
Subject: Re: namd-l: A larger protein data bank for performance benchmark

 

I think you may need to read a bit more about what NAMD does, how it runs and what its inputs are. But the short answer to your question is no, you can't just supply the coordinates. The user manual and tutorials on the main site are particularly good sources of information.

As has already been mentioned, there are existing benchmark systems that you can use, which cover a range of sizes and would allow you to compare against other hardware. These are ready to use, so they would save some time.

Have you opened the src/alanin file that you ran to see what the input was? Moreover, how do you even know the run completed successfully? The 0.0085 seconds could be the time it took to fail from an inappropriate input.
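
A quick way to answer the "did it actually complete" question is to scan the log. The Python sketch below assumes the log contains "Benchmark time:" lines, a final "WallClock:" line on success and "FATAL ERROR" on failure; check these markers against your own output, since the exact format may differ between versions. The sample log text is invented.

```python
import re

# Matches e.g. "Info: Benchmark time: 4 CPUs 0.25 s/step ..."
BENCHMARK = re.compile(r"Benchmark time:\s+(\d+)\s+CPUs\s+([0-9.eE+-]+)\s+s/step")


def summarize_log(log_text):
    """Report whether a log looks complete and collect benchmark timings."""
    lines = log_text.splitlines()
    fatal = [line for line in lines if "FATAL ERROR" in line]
    finished = any(line.startswith("WallClock:") for line in lines)
    timings = [(int(cpus), float(sec)) for cpus, sec in BENCHMARK.findall(log_text)]
    return {"ok": finished and not fatal, "fatal": fatal, "timings": timings}


if __name__ == "__main__":
    # Invented sample; pass the text of your own log file instead.
    sample = (
        "Info: Benchmark time: 4 CPUs 0.25 s/step 2.89 days/ns 300 MB memory\n"
        "WallClock: 130.5  CPUTime: 129.8  Memory: 300.0 MB\n"
    )
    print(summarize_log(sample))
```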

 

On Mon, May 13, 2013 at 10:02 PM, linux freaker <linuxfreaker_at_gmail.com> wrote:

Rebecca,

I appreciate your response.
Actually, I am in a High Performance Computing team where I need to benchmark different processor families using NAMD.

I just installed it and saw that running ./namd src/alanin took 0.0085 sec. It finishes in no time.

As I am quite new, I need to know whether, as with alanin, running something directly like:

./namd2 src/x.pdb

just works, or whether I need to build the PDB format first. Please advise.

It would be great if you could send me one or two large PDBs that take some time for NAMD to finish.
Also, they should run directly like ./namd2 src/?.pdb.

 

On Tue, May 14, 2013 at 7:22 AM, Rebecca Swett <rswett_at_chem.wayne.edu> wrote:

I think you are being somewhat unclear. Are you looking for the PDB structure of a large protein, or are you looking for a large set of PDBs?
The acronym PDB stands for Protein Data Bank; www.pdb.org is the repository of known crystal structures. Used as a file extension, it denotes crystallographic coordinates saved in the standardized Protein Data Bank format. www.pdb.org is where you would find most known structure files in the downloadable format of numbers and letters. For example, 2aa1.pdb is today's structure of the day and is a large toxin. If you are looking for a collection of structure files in PDB format, you may want to look at some of the search and download options on that site. However, I strongly suggest doing some more reading. It is imperative to good science that users understand the files and formats they are working with.

R.J. Swett
Wayne State University
357 Chemistry
Detroit, MI 48201
 
Lab Phone 313-577-0552
Cell Phone 906-235-0768

On 5/13/2013 8:51 PM, linux freaker wrote:

Hi All,

I looked into pdb.org but couldn't find a large protein data bank file. As I am new to NAMD, please suggest a larger PDB.
http://chemistry.gsu.edu/Glactone/PDB/Amino_Acids/aa.html is the link I tried, but those are small PDBs.

Can you please paste the link for a large PDB?

Also, is it possible to accumulate several PDBs and run ./namd2 src/* to select among them?

 

On Mon, May 13, 2013 at 11:42 PM, linux freaker <linuxfreaker_at_gmail.com> wrote:

Which shall I choose? Can you suggest a PDB, just like alanin, that I can use directly?

On 13 May 2013 22:43, "Ajasja Ljubetič" <ajasja.ljubetic_at_gmail.com> wrote:

Here is a ready-to-run benchmark (ApoA1):

http://www.ks.uiuc.edu/Research/namd/performance.html (direct download: <http://www.ks.uiuc.edu/Research/namd/utilities/apoa1.tar.gz>)

On 13 May 2013 18:52, Cesar Millan <pachequin_at_gmail.com> wrote:

www.pdb.org

 

On Mon, May 13, 2013 at 11:49 AM, linux freaker <linuxfreaker_at_gmail.com> wrote:

I installed NAMD on a Linux machine. I ran src/alanin on 4 processors and it completed in no time. I need a run that takes some 5-10 minutes so that I can benchmark. Where can I download a larger PDB?

-- 
Aron Broom M.Sc
PhD Student
Department of Chemistry
University of Waterloo

This archive was generated by hypermail 2.1.6 : Tue Dec 31 2013 - 23:23:14 CST