Re: Re: Low CPU usage with NAMD running on linux cluster

From: Norman Geist (norman.geist_at_uni-greifswald.de)
Date: Tue Jul 31 2012 - 01:06:32 CDT

Hi again,

contrary to what Aron assumed, your CPUs are not Hyper-Threading capable. You have two sockets, each with one quad-core Xeon X5460, so 8 physical cores per node. You also have Gigabit Ethernet as well as an InfiniBand HCA, so you should check whether the InfiniBand network is usable. If you have an administrator, ask him. If not, show us the output of ifconfig so we can check whether IPoIB is active, and check whether the InfiniBand driver itself is installed by trying tools such as ibnodes and ibhosts. Remember that at least one subnet manager (opensm) has to be running on one of the nodes.
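
A minimal sketch of these checks in Python, to be run on each node. It only looks for the tools named above on the PATH and scans ifconfig output for an ib* interface that has an IPv4 address. The function names are made up for illustration, and the interface naming (ib0, ib1, ...) is an assumption based on the lshw listing; it does not prove that the fabric is cabled or that opensm is actually running.

```python
import re
import shutil
import subprocess

# Tools mentioned above; being on PATH only shows that they are installed.
IB_TOOLS = ("ibnodes", "ibhosts", "opensm")


def installed_tools(names=IB_TOOLS):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def ipoib_interfaces(ifconfig_text):
    """Return the ib* interfaces that carry an IPv4 address.

    ifconfig prints one block per interface, separated by blank lines.
    Both the older "inet addr:..." and the newer "inet ..." layouts are
    accepted; "inet6" lines are ignored.
    """
    active = []
    for block in re.split(r"\n\s*\n", ifconfig_text.strip()):
        lines = block.splitlines()
        if not lines:
            continue
        name = lines[0].split()[0].rstrip(":")
        has_ipv4 = any(line.strip().startswith("inet ") for line in lines[1:])
        if name.startswith("ib") and has_ipv4:
            active.append(name)
    return active


if __name__ == "__main__":
    for tool, path in installed_tools().items():
        print(f"{tool}: {path or 'not found'}")
    if shutil.which("ifconfig"):
        result = subprocess.run(["ifconfig", "-a"], capture_output=True, text=True)
        print("IPoIB interfaces with an address:", ipoib_interfaces(result.stdout))
```

If the last line prints an empty list, IPoIB is most likely not configured on that node, even if the HCA shows up in lshw.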

If InfiniBand is not usable, for example because it is not cabled, Gigabit Ethernet should not be bad either. IMHO it will run best with the UDP (not TCP) binary.

It would also help if you showed us the "Initial time" or, better, the "Benchmark time" lines from your output for different numbers of nodes, so we can check whether there is a scaling problem at all or whether the low CPU utilization is simply due to your small simulation system.
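
To compare such runs, something like the following sketch could be used. It assumes the log lines look like "Info: Benchmark time: 8 CPUs 0.2 s/step ..." and computes speedup and parallel efficiency relative to the smallest run. The function names are hypothetical, and any timings fed to it have to come from your own logs.

```python
import re

# Assumed layout of a NAMD benchmark line:
#   Info: Benchmark time: <N> CPUs <t> s/step <d> days/ns <m> MB memory
BENCH = re.compile(r"Benchmark time:\s+(\d+)\s+CPUs\s+([0-9.eE+-]+)\s+s/step")


def benchmark_times(log_text):
    """Return {cpu_count: s/step}, keeping the last value seen per count."""
    times = {}
    for match in BENCH.finditer(log_text):
        times[int(match.group(1))] = float(match.group(2))
    return times


def scaling_table(times):
    """Return (cpus, speedup, efficiency) rows relative to the smallest run.

    Efficiency 1.0 means perfectly linear scaling; values well below 1.0
    point to an interconnect limit or to a system too small for that many cores.
    """
    base_cpus = min(times)
    base_time = times[base_cpus]
    rows = []
    for cpus in sorted(times):
        speedup = base_time / times[cpus]
        efficiency = speedup / (cpus / base_cpus)
        rows.append((cpus, speedup, efficiency))
    return rows


if __name__ == "__main__":
    import sys

    # Usage: python scaling.py run_8cpu.log run_16cpu.log ...
    text = ""
    for path in sys.argv[1:]:
        with open(path) as handle:
            text += handle.read() + "\n"
    if text:
        for cpus, speedup, efficiency in scaling_table(benchmark_times(text)):
            print(f"{cpus:4d} CPUs  speedup {speedup:5.2f}  efficiency {efficiency:5.2f}")
```

With runs on 1, 2, 3 and 4 nodes, an efficiency that stays near 1.0 means the low CPU percentage is nothing to worry about, while a sharp drop after the first node points at the network.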

Norman Geist.

> -----Original Message-----
> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> Behalf Of amin_at_imtech.res.in
> Sent: Tuesday, 31 July 2012 07:01
> To: Norman Geist
> Cc: namd-l_at_ks.uiuc.edu
> Subject: Re: Re: namd-l: Low CPU usage with NAMD running on linux
> cluster
>
> Thanks for the suggestions. Here is what I got using lshw -short.
> Please have a look at it.
> Regards.
>
> Amin.
>
>
> WARNING: you should run this program as super-user.
> H/W path          Device           Class      Description
> ===========================================================================================
>                                    system     Computer
> /0                                 bus        Motherboard
> /0/0                               memory     13GiB System memory
> /0/1                               processor  Intel(R) Xeon(R) CPU X5460 @ 3.16GHz
> /0/2                               processor  Intel(R) Xeon(R) CPU X5460 @ 3.16GHz
> /0/100                             bridge     5000P Chipset Memory Controller Hub
> /0/100/2                           bridge     5000 Series Chipset PCI Express x8 Port 2-3
> /0/100/2/0                         bridge     6311ESB/6321ESB PCI Express Upstream Port
> /0/100/2/0/0                       bridge     6311ESB/6321ESB PCI Express Downstream Port E1
> /0/100/2/0/0/0    ib0              bus        MT25204 [InfiniHost III Lx HCA]
> /0/100/2/0/2                       bridge     6311ESB/6321ESB PCI Express Downstream Port E3
> /0/100/2/0/2/0    __tmp1817455902  network    80003ES2LAN Gigabit Ethernet Controller (Copper)
> /0/100/2/0/2/0.1  eth3             network    80003ES2LAN Gigabit Ethernet Controller (Copper)
> /0/100/2/0.3                       bridge     6311ESB/6321ESB PCI Express to PCI-X Bridge
> /0/100/2/0.3/2    scsi0            storage    SAS1064 PCI-X Fusion-MPT SAS
> /0/100/3                           bridge     5000 Series Chipset PCI Express x4 Port 3
> /0/100/4                           bridge     5000 Series Chipset PCI Express x8 Port 4-5
> /0/100/5                           bridge     5000 Series Chipset PCI Express x4 Port 5
> /0/100/6                           bridge     5000 Series Chipset PCI Express x8 Port 6-7
> /0/100/7                           bridge     5000 Series Chipset PCI Express x4 Port 7
> /0/100/8                           system     5000 Series Chipset DMA Engine
> /0/100/1c                          bridge     631xESB/632xESB/3100 Chipset PCI Express Root Port 1
> /0/100/1c/0       eth0             network    82571EB Gigabit Ethernet Controller
> /0/100/1c/0.1     eth1             network    82571EB Gigabit Ethernet Controller
> /0/100/1d                          bus        631xESB/632xESB/3100 Chipset UHCI USB Controller #1
> /0/100/1d/1       usb2             bus        UHCI Host Controller
> /0/100/1d.1                        bus        631xESB/632xESB/3100 Chipset UHCI USB Controller #2
> /0/100/1d.1/1     usb3             bus        UHCI Host Controller
> /0/100/1d.2                        bus        631xESB/632xESB/3100 Chipset UHCI USB Controller #3
> /0/100/1d.2/1     usb4             bus        UHCI Host Controller
> /0/100/1d.3                        bus        631xESB/632xESB/3100 Chipset UHCI USB Controller #4
> /0/100/1d.3/1     usb5             bus        UHCI Host Controller
> /0/100/1d.7                        bus        631xESB/632xESB/3100 Chipset EHCI USB2 Controller
> /0/100/1d.7/1     usb1             bus        EHCI Host Controller
> /0/100/1d.7/1/8                    storage    Multidevice
> /0/100/1e                          bridge     82801 PCI Bridge
> /0/100/1e/c                        display    ES1000
> /0/100/1f                          bridge     631xESB/632xESB/3100 Chipset LPC Interface Controller
> /0/100/1f.1                        storage    631xESB/632xESB IDE Controller
> /0/100/1f.1/0     ide0             bus        IDE Channel 0
> /0/100/1f.1/0/1   /dev/hdb         disk       Optiarc DVD RW AD-7560A
> /0/100/1f.2                        storage    631xESB/632xESB/3100 Chipset SATA IDE Controller
> /0/100/1f.3                        bus        631xESB/632xESB/3100 Chipset SMBus Controller
> /0/101                             bridge     5000 Series Chipset FSB Registers
> /0/102                             bridge     5000 Series Chipset FSB Registers
> /0/103                             bridge     5000 Series Chipset FSB Registers
> /0/104                             bridge     5000 Series Chipset Reserved Registers
> /0/105                             bridge     5000 Series Chipset Reserved Registers
> /0/106                             bridge     5000 Series Chipset FBD Registers
> /0/107                             bridge     5000 Series Chipset FBD Registers
> /1                scsi3            storage
> /2                scsi4            storage
> /3                scsi5            storage
> /4                scsi6            storage
>
> > Hi all,
> >
> > to me this sounds more like a problem with the node interconnect. BTW,
> > as long as the scaling is linear, you shouldn't worry too much about low
> > CPU utilization. If the scaling is bad, meaning the speedup does not
> > grow in step with the number of nodes used, you should worry more.
> >
> > First of all, we need to know what interconnect you have. You already
> > said Ethernet, but there are 10/100/1000/10000 Mbit/s versions out
> > there, so please tell us which one it is.
> >
> > You can check that by running lshw on the nodes and looking for the
> > Ethernet adapters; post the device names here.
> > The name or model of the switch is of course important as well.
> >
> > Let us know
> >
> > Norman Geist.
> >
> >> -----Original Message-----
> >> From: owner-namd-l_at_ks.uiuc.edu [mailto:owner-namd-l_at_ks.uiuc.edu] On
> >> Behalf Of amin_at_imtech.res.in
> >> Sent: Sunday, 29 July 2012 07:10
> >> To: Rajan Vatassery
> >> Cc: namd-l_at_ks.uiuc.edu
> >> Subject: Re: namd-l: Low CPU usage with NAMD running on linux cluster
> >>
> >> What I meant was that I am distributing 24 processes over 4 nodes
> >> having 8 processors each. Therefore each node will have 2 processors
> >> free. I have not completed any run so far, so I can't give the
> >> "TIMING INFORMATION", because the simulation is running extremely
> >> slowly. My 2 ns simulation has been running for more than 2 days now.
> >> I will post that information as soon as the run is completed.
> >> Thanks.
> >>
> >> Amin.
> >>
> >>
> >>
> >>
> >> > Amin,
> >> > Do you really mean that you have requested 4 nodes x 8 processors
> >> > each = 32 processors? I'm wondering because you said you have only
> >> > 24 processors. Also, Branko is not asking about output frequencies,
> >> > but rather about the details of the output that your simulation is
> >> > giving. For example, the "TIMING" information from the log file
> >> > indicates the seconds/step that your simulation is running at.
> >> >
> >> > rajan
> >> >
> >> > On Sat, 2012-07-28 at 20:40 +0530, amin_at_imtech.res.in wrote:
> >> >> Thanks. I have read the link. I have a PBC system with 20,000 atoms
> >> >> with time step=1, dcdFreq=500, outputEnergies=1000. I am trying to
> >> >> run an NPT simulation using 24 processors on 4 nodes having 8
> >> >> processors each.
> >> >>
> >> >> Regards.
> >> >> Amin.
> >> >>
> >> >>
> >> >> > Amin,
> >> >> >
> >> >> > Provide more data about the size of your system, output data, and
> >> >> > see:
> >> >> >
> >> >> > http://www.ks.uiuc.edu/Research/namd/wiki/?NamdPerformanceTuning
> >> >> >
> >> >> > Branko
> >> >> >
> >> >> > On 7/28/2012 8:33 AM, amin_at_imtech.res.in wrote:
> >> >> >> Dear all,
> >> >> >> I am trying to run NAMD on a Linux cluster. I am using NAMD
> >> >> >> Linux-x86_64 (64-bit Intel/AMD with ethernet). While I am able to
> >> >> >> run the program on the nodes listed in the nodelist file, I find
> >> >> >> that all the processes are running at only 8-12 % CPU usage. Can
> >> >> >> someone please guide me?
> >> >> >>
> >> >> >> Regards.
> >> >> >> Amin.
> >> >> >>
> >> >> >>
> >>
> >>
> >>
> >> ______________________________________________________________________
> >> सूक्ष्मजीव प्रौद्योगिकी संस्थान (वैज्ञानिक औद्योगिक अनुसंधान परिषद)
> >> Institute of Microbial Technology (A CONSTITUENT ESTABLISHMENT OF CSIR)
> >> सैक्टर 39 ए, चण्डीगढ़ / Sector 39-A, Chandigarh
> >> पिन कोड/PIN CODE :160036
> >> दूरभाष/EPABX :0172 6665 201-202
> >
> >
> >
>
>

This archive was generated by hypermail 2.1.6 : Mon Dec 31 2012 - 23:21:52 CST