Re: Two GPU-based workstation

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Thu Nov 07 2013 - 01:51:43 CST

Hi James,

 

I guess you already use a fullelectfrequency > 1; 4 is probably a good value for GPUs. Additionally, depending on system size, “twoawayx yes” can have a huge impact, but this must be checked for each individual system, as it can also slow down the simulation. If twoawayx gives a speedup, it's worth trying twoawayy as well. If twoawayy also gives a speedup, try twoawayz too, of course.
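
The step-by-step check described here could be prepared with a small shell helper. This is only a sketch: the function name make_twoaway_variants and the output file names are made up for illustration, and it assumes the base config does not already set any twoaway option. It writes three copies of the config with the options added cumulatively (x, then x and y, then x, y and z), so that each copy can be benchmarked against the plain config.

```shell
#!/bin/sh
# Hypothetical helper: write cumulative twoaway variants of a base NAMD config.
# Assumption: the base config does not already set twoawayx/y/z.
make_twoaway_variants() {
    base=$1
    opts=""
    for axis in x y z; do
        # Accumulate the options: x, then x+y, then x+y+z.
        opts="${opts}twoaway${axis} yes
"
        out="${base%.conf}_twoaway_${axis}.conf"
        # Put the extra options in front of the unchanged base config,
        # so they are read before any run command in it.
        { printf '%s' "$opts"; cat "$base"; } > "$out"
        echo "$out"
    done
}
```

Calling make_twoaway_variants ./md.conf would write md_twoaway_x.conf, md_twoaway_y.conf and md_twoaway_z.conf next to the original. Only continue to the next variant if the previous one was actually faster.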

 

Norman Geist.

 

From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On behalf of James Starlight
Sent: Thursday, 7 November 2013 06:32
To: Norman Geist; Namd Mailing List
Subject: Re: namd-l: Two GPU-based workstation

 

I've come to the conclusion that using 2 GPUs simultaneously gives me the same performance as 1 GPU, comparing

charmrun +p12 ++runscript ./runscript.sh namd2 +idlepoll +devices 0 ./md.conf >> ./output/log_1gpus_12proc

charmrun +p12 +ppn6 ++runscript ./runscript.sh namd2 +idlepoll +devices 0,1 ./md.conf >> ./output/log_2gpus_12proc_ppn 2

Is this due to the small number of CPU cores, or is additional RAM needed (this system has 32 GB)? Or are some extra options needed in the config?

James

 

2013/11/4 Norman Geist <norman.geist_at_uni-greifswald.de>

Yes, but note that using virtual cores (Intel Hyperthreading, HT) usually gives no speedup or even a slowdown. So assuming 6 instead of 12 cores might be better. The ratio between processes and threads should be benchmarked.
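
One way to set up such a benchmark is to list every +ppn value that divides the number of worker threads and to time each resulting command. The sketch below only prints the commands and launches nothing. The function name list_ratio_commands and the log file names are illustrative, and your.config is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: print one charmrun command per +ppn value that
# divides the total number of worker threads. Dry run only.
list_ratio_commands() {
    total=$1
    config=$2
    ppn=1
    while [ "$ppn" -le "$total" ]; do
        # Keep only thread counts per process that divide the total evenly.
        if [ $((total % ppn)) -eq 0 ]; then
            echo "charmrun +p${total} +ppn${ppn} ++local namd2 +idlepoll +devices 0,1 ${config} >> log_p${total}_ppn${ppn} 2>&1"
        fi
        ppn=$((ppn + 1))
    done
}
```

For example, list_ratio_commands 6 your.config prints four commands (+ppn1, +ppn2, +ppn3 and +ppn6). Run them one after another and compare the speed reported in each log.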

 

Norman Geist.

 

From: James Starlight [mailto:jmsstarlight_at_gmail.com]
Sent: Monday, 4 November 2013 11:43
To: Norman Geist; Namd Mailing List
Subject: Re: namd-l: Two GPU-based workstation

 

Norman,

On my system I've noticed a trivial thing: using a higher number of CPUs (with both GPUs active) gives higher performance.

My 6-core i7 is recognized as 12 cores in Debian. Following your suggestions, the best parallelization on my workstation, without any connection to other nodes, would be

charmrun +p12 +ppn6 ++runscript ./runscript.sh namd2 +idlepoll +devices 0,1 ./md.conf >> ./output/log_2gpus_12proc_ppn 2

wouldn't it?

James

 

2013/11/4 Norman Geist <norman.geist_at_uni-greifswald.de>

 

From: James Starlight [mailto:jmsstarlight_at_gmail.com]
Sent: Monday, 4 November 2013 08:52
To: Norman Geist; Namd Mailing List
Subject: Re: namd-l: Two GPU-based workstation

 

Norman,

 

James,

Thanks for the suggestions! I've noticed that the NAMD directory also contains libcudart.so.4, like the one I found in the VMD directory, which corresponds to the 4.xx version of the CUDA toolkit (I have installed cuda-5.00). Could this be a source of conflicts between the older library and the newest NVIDIA drivers?

Usually not, and it's recommended to use the libcudart shipped with namd anyway.

What are the other advantages of running namd via charmrun? Is it possible to show the time remaining until the end of the simulation?

To understand this, you need to know that there are two main ways to implement parallelism in software. The first is “shared memory”, the second is “distributed memory”. Shared-memory parallelism, also called “threading”, usually uses one single process that drives multiple CPU cores with threads (top shows one process using a multiple of 100 %). Usually it just runs mostly independent loop iterations on different cores, instead of running them serially, to speed up the software. Distributed-memory parallelism uses multiple processes that do predefined parts of the work in parallel while exchanging information over a network protocol.

So if you have a multicore-only build of namd, without network support, you cannot run across multiple machines, as they do not share the same memory. Both parallelization methods have advantages and disadvantages. Luckily, namd has two layers of parallelism and uses both methods. Distributed memory is usually faster for most applications, but there can be sweet spots on various platforms that use a mixture of both, for example running one process per CPU socket and threading over all of its cores, but that's just theory.

I didn’t notice that you just use the multicore version. Possibly running a dedicated process per GPU will give an advantage in speed and is worth trying. You will need a build with network support. As long as you stay on one node, you can simply pass ++local to charmrun. For multiple nodes you will need passwordless ssh login between the nodes and the mentioned runscript method.

 

So for one node, something like the following should show 2 processes, threading over the rest of the cores:

charmrun +p6 +ppn3 ++local namd2 +idlepoll +devices 0,1 your.config

Norman Geist

James

 

 

2013/11/4 Norman Geist <norman.geist_at_uni-greifswald.de>

The log file is simply what you see on the screen when you start namd the way you did. To get it into a file, append >> my.log 2>> my.errors to the command to redirect stdout and stderr.
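
To compare runs from such log files, the performance figure can be pulled out with awk. This is a sketch that assumes the log contains lines of the form "Info: Benchmark time: ... <value> days/ns ..."; check your own log, since the exact wording may differ between versions. The function name ns_per_day is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print ns/day for every "Benchmark time" line in a log.
# Assumption: the field right before the token "days/ns" is the cost in days per ns.
ns_per_day() {
    awk '/Benchmark time:/ {
        for (i = 2; i <= NF; i++)
            if ($i == "days/ns" && $(i-1) > 0)
                printf "%.2f\n", 1 / $(i-1)   # invert days/ns to get ns/day
    }' "$1"
}
```

Running ns_per_day my.log prints one ns/day value per benchmark line, which makes it easy to compare the 1-GPU and 2-GPU logs.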

GPU load monitoring is only enabled on Quadro and Tesla series cards.

The easiest option to bypass the libcudart stuff is to use the charmrun ++runscript option. Save the following three lines to a file called runscript.sh in your namd folder, adapt the path to your namd location, and make the file executable with "chmod +x runscript.sh".

#!/bin/bash
export LD_LIBRARY_PATH=/your/namd/folder/:$LD_LIBRARY_PATH
"$@"

Now always start namd like "/your/namd/folder/charmrun +p6 ++runscript /your/namd/folder/runscript.sh /your/namd/folder/namd2 +idlepoll +devices 0,1 your.config >> log 2>> errors"

Norman Geist.

> -----Original Message-----
> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> behalf of James Starlight
> Sent: Sunday, 3 November 2013 15:04
> To: Ajasja Ljubetič; Namd Mailing List
> Subject: Re: namd-l: Two GPU-based workstation
>

> It's strange, but no log file is found in the working directory, and I could
> not find a suitable option in the conf file for saving the log. :) All
> that I have is the 645000 steps computed during 6 hours of simulation
> (100k-atom protein in explicit water).
> Also, I'd be thankful for a list of all possible switches accepted by the
> namd2 terminal command.
>
> James
>
>
>
>
> 2013/11/3 Ajasja Ljubetič <ajasja.ljubetic_at_gmail.com>
>
> >
> > On 3 November 2013 09:38, James Starlight <jmsstarlight_at_gmail.com> wrote:
> >
> >> Update:
> >>
> >> using namd2 +idlepoll +p4 +devices 0,1 ./restart.conf
> >> I've launched the simulation on both GPUs (according to the thermal
> >> monitoring in nvidia-settings), but only half of the CPUs were fully loaded.
> >>
> > Yes, naturally. Look up what the +p4 switch does. (Also read up on
> > hyperthreading.)
> >
> >> By the way, how could I monitor the real GPU load as well as namd
> >> performance (in ns/day or GFlops)?
> >>
> >
> > Try looking in the namd log file for the ns/day speed.
> >
> > And out of interest do report the ns/day of
> >
> > namd2 +idlepoll +p6 +devices 0 ./restart.conf
> > vs
> > namd2 +idlepoll +p6 +devices 0,1 ./restart.conf
> >
> > Regards,
> > Ajasja
> >
> >
> >>
> >>
> >> James
> >>
> >>
> >> 2013/11/1 James Starlight <jmsstarlight_at_gmail.com>
> >>
> >>> Ok. I'll try to run some simulations with this configuration. The main
> >>> issue that I may face is a possible conflict between the older cuda
> >>> library (used from vmd) and the newer development driver (5.5 version)
> >>> which comes with the installed cuda-5.5.
> >>>
> >>> By the way, how could I use both of the GPUs simultaneously? Just use
> >>> the command below?
> >>>
> >>> namd2 +idlepoll +p4 +devices 0,1 ./restart.conf
> >>>
> >>> Where 0 and 1 are the ids of my GPUs? Are there additional options for
> >>> synchronizing the simulation in the dual-GPU regime?
> >>>
> >>> James
> >>>
> >>>
> >>> 2013/10/31 Aron Broom <broomsday_at_gmail.com>
> >>>
> >>>> don't replace anything, just point to the version of the library in
> >>>> your NAMD directory as you did. It should work fine.
> >>>>
> >>>>
> >>>> On Thu, Oct 31, 2013 at 1:24 PM, James Starlight <
> >>>> jmsstarlight_at_gmail.com> wrote:
> >>>>
> >>>>> Dear Namd users,
> >>>>>
> >>>>> I've built my new workstation, consisting of two Titans with an i7 (Linux
> >>>>> recognizes it as a 12-core processor, but it actually consists of 6
> >>>>> cores)

 

 

 
