Re: AW: AW: Low CPU usage with NAMD running on linux cluster

From: amin_at_imtech.res.in
Date: Wed Aug 01 2012 - 00:16:30 CDT

Thanks for the replies. I tried the Linux-x86_64-ibverbs version of NAMD and, to
my surprise, it worked: it shows almost 100% usage on all 24 processors. I guess
the InfiniBand network is usable after all, and I was under the wrong impression
based on my conversations with other users. Thanks for suggesting lshw. I am
getting close to 5 ns a day on my 20K-atom system, which is good enough for me.
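
For anyone comparing numbers, the ns/day figure can be turned into seconds per
step. This is only a minimal sketch, assuming the 1 fs time step I gave earlier
in the thread; the function name is just for illustration.

```python
SECONDS_PER_DAY = 86_400


def seconds_per_step(ns_per_day: float, timestep_fs: float) -> float:
    """Convert throughput in ns/day to wall-clock seconds per MD step."""
    steps_per_ns = 1_000_000 / timestep_fs  # 1 ns = 1,000,000 fs
    steps_per_day = ns_per_day * steps_per_ns
    return SECONDS_PER_DAY / steps_per_day


if __name__ == "__main__":
    # 5 ns/day at a 1 fs time step
    print(f"{seconds_per_step(5.0, 1.0):.5f} s/step")  # prints 0.01728 s/step
```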

Regards.
Amin.

> Hi again,
>
> contrary to what Aron supposed, you don't have a Hyper-Threading capable CPU. You
> have two sockets, each with one quad-core Xeon X5460. You also have Gigabit
> Ethernet as well as an InfiniBand HCA, so you should check whether the InfiniBand
> network is usable. If you have an administrator, ask him. If not, show us the
> output of ifconfig so we can check whether IPoIB is active, and check whether the
> InfiniBand driver itself is installed by trying tools such as ibnodes, ibhosts,
> etc. Remember that at least one subnet manager (opensm) has to run on one of
> the nodes.
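
A rough way to script these checks on each node, as a sketch only. It looks for
the tools named above on the PATH and picks out interface names that look like
IPoIB devices from ifconfig text. The helper names are made up for illustration,
and a tool being installed does not by itself prove the fabric is up.

```python
import re
import shutil

# Tool names taken from the message above; whether they are installed
# (and on PATH for a non-root user) depends on the node.
TOOLS = ["ifconfig", "ibnodes", "ibhosts", "opensm"]


def find_tools(names):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def ipoib_interfaces(ifconfig_output: str):
    """Return interface names that look like IPoIB devices (ib0, ib1, ...)."""
    return re.findall(r"^(ib\d+)\b", ifconfig_output, flags=re.MULTILINE)


if __name__ == "__main__":
    for name, path in find_tools(TOOLS).items():
        print(f"{name:10s} {path or 'not found'}")
```

Feeding the text printed by ifconfig into ipoib_interfaces() shows whether an
ib0-style interface is listed at all.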
>
> If the InfiniBand is not usable because it is not cabled, Gigabit Ethernet
> should not be bad either. In that case, IMHO, the UDP (not TCP) binary will
> run best.
>
> Ideally, show us the Initial time or, better, the Benchmark time lines from
> your output for different numbers of nodes, so we can check whether there is
> a scaling problem at all or whether the low CPU utilization is due to your small
> simulation system.
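
Collecting those lines from a log can be scripted. The sketch below assumes the
benchmark lines have the shape "Info: Benchmark time: <N> CPUs <t> s/step ...";
please check the pattern against your own log. The sample text is invented for
the example, not taken from my runs.

```python
import re
from statistics import mean

# Assumed line shape:
#   Info: Benchmark time: 24 CPUs 0.0175 s/step 0.2025 days/ns 150.0 MB memory
BENCH_RE = re.compile(r"Benchmark time:\s+(\d+)\s+CPUs\s+([0-9.eE+-]+)\s+s/step")


def benchmark_times(log_text):
    """Return (cpus, seconds_per_step) for every benchmark line found."""
    return [(int(cpus), float(sps)) for cpus, sps in BENCH_RE.findall(log_text)]


def mean_seconds_per_step(log_text):
    """Average s/step over all benchmark lines, or None if there are none."""
    times = [sps for _, sps in benchmark_times(log_text)]
    return mean(times) if times else None


if __name__ == "__main__":
    # Invented sample lines, for illustration only.
    sample = (
        "Info: Benchmark time: 24 CPUs 0.0175 s/step 0.2025 days/ns 150.0 MB memory\n"
        "Info: Benchmark time: 24 CPUs 0.0171 s/step 0.1979 days/ns 150.0 MB memory\n"
    )
    print(benchmark_times(sample))
    print(mean_seconds_per_step(sample))
```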
>
> Norman Geist.
>
>> -----Original Message-----
>> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
>> Behalf Of amin_at_imtech.res.in
>> Sent: Tuesday, 31 July 2012 07:01
>> To: Norman Geist
>> Cc: namd-l_at_ks.uiuc.edu
>> Subject: Re: AW: namd-l: Low CPU usage with NAMD running on linux cluster
>>
>> Thanks for the suggestions. Here is what I got using lshw -short. Please have a
>> look at it.
>> Regards.
>>
>> Amin.
>>
>>
>> WARNING: you should run this program as super-user.
>> H/W path          Device           Class      Description
>> ==================================================================================================
>>                                    system     Computer
>> /0                                 bus        Motherboard
>> /0/0                               memory     13GiB System memory
>> /0/1                               processor  Intel(R) Xeon(R) CPU X5460 @ 3.16GHz
>> /0/2                               processor  Intel(R) Xeon(R) CPU X5460 @ 3.16GHz
>> /0/100                             bridge     5000P Chipset Memory Controller Hub
>> /0/100/2                           bridge     5000 Series Chipset PCI Express x8 Port 2-3
>> /0/100/2/0                         bridge     6311ESB/6321ESB PCI Express Upstream Port
>> /0/100/2/0/0                       bridge     6311ESB/6321ESB PCI Express Downstream Port E1
>> /0/100/2/0/0/0    ib0              bus        MT25204 [InfiniHost III Lx HCA]
>> /0/100/2/0/2                       bridge     6311ESB/6321ESB PCI Express Downstream Port E3
>> /0/100/2/0/2/0    __tmp1817455902  network    80003ES2LAN Gigabit Ethernet Controller (Copper)
>> /0/100/2/0/2/0.1  eth3             network    80003ES2LAN Gigabit Ethernet Controller (Copper)
>> /0/100/2/0.3                       bridge     6311ESB/6321ESB PCI Express to PCI-X Bridge
>> /0/100/2/0.3/2    scsi0            storage    SAS1064 PCI-X Fusion-MPT SAS
>> /0/100/3                           bridge     5000 Series Chipset PCI Express x4 Port 3
>> /0/100/4                           bridge     5000 Series Chipset PCI Express x8 Port 4-5
>> /0/100/5                           bridge     5000 Series Chipset PCI Express x4 Port 5
>> /0/100/6                           bridge     5000 Series Chipset PCI Express x8 Port 6-7
>> /0/100/7                           bridge     5000 Series Chipset PCI Express x4 Port 7
>> /0/100/8                           system     5000 Series Chipset DMA Engine
>> /0/100/1c                          bridge     631xESB/632xESB/3100 Chipset PCI Express Root Port 1
>> /0/100/1c/0       eth0             network    82571EB Gigabit Ethernet Controller
>> /0/100/1c/0.1     eth1             network    82571EB Gigabit Ethernet Controller
>> /0/100/1d                          bus        631xESB/632xESB/3100 Chipset UHCI USB Controller #1
>> /0/100/1d/1       usb2             bus        UHCI Host Controller
>> /0/100/1d.1                        bus        631xESB/632xESB/3100 Chipset UHCI USB Controller #2
>> /0/100/1d.1/1     usb3             bus        UHCI Host Controller
>> /0/100/1d.2                        bus        631xESB/632xESB/3100 Chipset UHCI USB Controller #3
>> /0/100/1d.2/1     usb4             bus        UHCI Host Controller
>> /0/100/1d.3                        bus        631xESB/632xESB/3100 Chipset UHCI USB Controller #4
>> /0/100/1d.3/1     usb5             bus        UHCI Host Controller
>> /0/100/1d.7                        bus        631xESB/632xESB/3100 Chipset EHCI USB2 Controller
>> /0/100/1d.7/1     usb1             bus        EHCI Host Controller
>> /0/100/1d.7/1/8                    storage    Multidevice
>> /0/100/1e                          bridge     82801 PCI Bridge
>> /0/100/1e/c                        display    ES1000
>> /0/100/1f                          bridge     631xESB/632xESB/3100 Chipset LPC Interface Controller
>> /0/100/1f.1                        storage    631xESB/632xESB IDE Controller
>> /0/100/1f.1/0     ide0             bus        IDE Channel 0
>> /0/100/1f.1/0/1   /dev/hdb         disk       Optiarc DVD RW AD-7560A
>> /0/100/1f.2                        storage    631xESB/632xESB/3100 Chipset SATA IDE Controller
>> /0/100/1f.3                        bus        631xESB/632xESB/3100 Chipset SMBus Controller
>> /0/101                             bridge     5000 Series Chipset FSB Registers
>> /0/102                             bridge     5000 Series Chipset FSB Registers
>> /0/103                             bridge     5000 Series Chipset FSB Registers
>> /0/104                             bridge     5000 Series Chipset Reserved Registers
>> /0/105                             bridge     5000 Series Chipset Reserved Registers
>> /0/106                             bridge     5000 Series Chipset FBD Registers
>> /0/107                             bridge     5000 Series Chipset FBD Registers
>> /1                scsi3            storage
>> /2                scsi4            storage
>> /3                scsi5            storage
>> /4                scsi6            storage
>>
>> > Hi all,
>> >
>> > to me this sounds more like a problem with the node interconnect. BTW, as long
>> > as the scaling is linear, you shouldn't worry too much about low CPU
>> > utilization. If the scaling is bad, meaning the speedup does not keep pace with
>> > the number of nodes used, you should be more worried.
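
To put a number on "linear scaling": speedup is the s/step of the smallest run
divided by the s/step of the larger run, and efficiency is that speedup divided
by the ideal one. A small sketch follows; the timings in it are hypothetical
placeholders, not measurements from my cluster.

```python
def scaling_table(timings):
    """Compute speedup and parallel efficiency relative to the smallest run.

    timings maps node count -> seconds per step.
    Returns a list of (nodes, speedup, efficiency) tuples sorted by node count.
    """
    base_nodes = min(timings)
    base_time = timings[base_nodes]
    rows = []
    for nodes in sorted(timings):
        speedup = base_time / timings[nodes]
        ideal = nodes / base_nodes  # perfectly linear scaling
        rows.append((nodes, speedup, speedup / ideal))
    return rows


if __name__ == "__main__":
    # Hypothetical s/step values for 1, 2 and 4 nodes.
    for nodes, speedup, eff in scaling_table({1: 0.080, 2: 0.042, 4: 0.025}):
        print(f"{nodes} nodes: speedup {speedup:.2f}, efficiency {eff:.0%}")
```

An efficiency that stays near 100% as nodes are added is the linear case; a
steady drop points at the interconnect or at a system too small for the node
count.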
>> >
>> > First of all we need to know what interconnect you have. You already told us
>> > it is Ethernet, but there are 10/100/1000/10000 Mbit/s versions out there, so
>> > please tell us which one it is.
>> >
>> > You can check that by running lshw on the nodes and looking for Ethernet
>> > adapters; post the names of the devices here.
>> > The name or model of the switch is of course important too.
>> >
>> > Let us know
>> >
>> > Norman Geist.
>> >
>> >> -----Original Message-----
>> >> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
>> >> Behalf Of amin_at_imtech.res.in
>> >> Sent: Sunday, 29 July 2012 07:10
>> >> To: Rajan Vatassery
>> >> Cc: namd-l_at_ks.uiuc.edu
>> >> Subject: Re: namd-l: Low CPU usage with NAMD running on linux cluster
>> >>
>> >> What I meant was that I am distributing 24 processes over 4 nodes having 8
>> >> processors each. Therefore each node will have 2 processors free. I have not
>> >> completed any run so far, so I can't give the "TIMING INFORMATION", because
>> >> the simulation is running extremely slowly. My 2 ns simulation has been
>> >> running for more than 2 days now. I will post that information as soon as the
>> >> run is completed.
>> >> Thanks.
>> >>
>> >> Amin.
>> >>
>> >>
>> >>
>> >>
>> >> > Amin,
>> >> > Do you really mean that you have requested 4 nodes x 8 processors each
>> >> > = 32 processors? I'm wondering because you said you have only 24
>> >> > processors. Also, Branko is not asking about output frequencies, but
>> >> > rather the details of the output that your simulation is giving. For
>> >> > example, the "TIMING" information from the log file indicates the
>> >> > seconds/step that your simulation is running at.
>> >> >
>> >> > rajan
>> >> >
>> >> > On Sat, 2012-07-28 at 20:40 +0530, amin_at_imtech.res.in wrote:
>> >> >> Thanks. I have read the link. I have a PBC system with 20,000 atoms, with
>> >> >> time step=1, dcdFreq=500, outputEnergies=1000. I am trying to run an NPT
>> >> >> simulation using 24 processors on 4 nodes having 8 processors each.
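
For scale, these settings translate into step and output counts as sketched
below. It assumes the time step is in femtoseconds and uses the 2 ns run length
mentioned elsewhere in the thread; the function and key names are illustrative,
and the counts ignore any extra output at step 0.

```python
def run_plan(length_ns, timestep_fs, dcd_freq, energy_freq):
    """Return the number of MD steps and approximate output counts for a run."""
    steps = round(length_ns * 1_000_000 / timestep_fs)  # 1 ns = 1,000,000 fs
    return {
        "steps": steps,
        "dcd_frames": steps // dcd_freq,
        "energy_outputs": steps // energy_freq,
    }


if __name__ == "__main__":
    # 2 ns at 1 fs, dcdFreq=500, outputEnergies=1000
    print(run_plan(2, 1, 500, 1000))
```

With these values the run is 2,000,000 steps, about 4,000 trajectory frames and
about 2,000 energy outputs, so output frequency is unlikely to be what slows
the job down.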
>> >> >>
>> >> >> Regards.
>> >> >> Amin.
>> >> >>
>> >> >>
>> >> >> > Amin,
>> >> >> >
>> >> >> > Provide more data about the size of your system and the output data, and see:
>> >> >> >
>> >> >> > http://www.ks.uiuc.edu/Research/namd/wiki/?NamdPerformanceTuning
>> >> >> >
>> >> >> > Branko
>> >> >> >
>> >> >> > On 7/28/2012 8:33 AM, amin_at_imtech.res.in wrote:
>> >> >> >> Dear all,
>> >> >> >> I am trying to run NAMD on a Linux cluster. I am using NAMD Linux-x86_64
>> >> >> >> (64-bit Intel/AMD with ethernet). While I am able to run the program on the
>> >> >> >> nodes listed in the nodelist file, I find that all the processes are
>> >> >> >> running at only 8-12% CPU usage. Can someone please guide me?
>> >> >> >>
>> >> >> >> Regards.
>> >> >> >> Amin.
>> >> >> >>
>> >> >> >>
>> >> >> >> ______________________________________________________________________
>> >> >> >> सूक्ष्मजीव प्रौद्योगिकी संस्थान (वैज्ञानिक औद्योगिक अनुसंधान परिषद)
>> >> >> >> Institute of Microbial Technology (A CONSTITUENT ESTABLISHMENT OF CSIR)

______________________________________________________________________
सूक्ष्मजीव प्रौद्योगिकी संस्थान (वैज्ञानिक औद्योगिक अनुसंधान परिषद)
Institute of Microbial Technology (A CONSTITUENT ESTABLISHMENT OF CSIR)
सैक्टर 39 ए, चण्डीगढ़ / Sector 39-A, Chandigarh
पिन कोड/PIN CODE :160036
दूरभाष/EPABX :0172 6665 201-202
