Re: PCIexpress 3.0 for MD with NAMD on GPUs

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Thu Nov 21 2013 - 05:00:54 CST

Hi Johny:
I don't understand exactly how you operate. I just boot Linux to the console
prompt (no startx). Then, as superuser, I activate the GPUs with

nvidia-smi -L (that shows two lines, one for each card with their names)

nvidia-smi -pm 1 (that also shows two lines, one for each card, this time
saying that persistence mode has been enabled on each card)

Then, as user, I launch MD with

charmrun $NAMD_HOME/bin/namd2 filename.conf +p# +idlepoll 2>&1 | tee filename.log

(where # is the number of processors)

I get both GPUs working, they use the same amount of memory, # lspci -vvvv
shows the same for both, and the first part of the NAMD log shows how the
system under study is partitioned between the two cards.
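
In case it is useful: to check the negotiated link speed while a job is
actually running (the link may fall back to 5GT/s when the GPU is idle, as
Thomas notes below), one can run in a second terminal something like

watch -n 1 "lspci -vv -s 02:00.0 | grep LnkSta"

where 02:00.0 is just the bus address of the first card from my lspci output;
yours may differ.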

As to using selected GPUs, the NAMD manual explains how to do it; I have never
used selected GPUs myself.
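
If you do want to pin a run to particular GPUs, something like the following
should work (only a sketch, since, as said, I never tried it; the +devices
flag is described in the NAMD release notes, and 0,1 are the device numbers
reported by nvidia-smi -L):

charmrun $NAMD_HOME/bin/namd2 filename.conf +p# +idlepoll +devices 0,1 2>&1 | tee filename.log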

I am no expert in either hardware or software, just a biochemist.

cheers
francesco
On Nov 20, 2013 3:32 PM, "Johny Telecaster" <johnytelecaster_at_gmail.com>
wrote:

> Francesco,
>
>
> It's very strange, but with MDM turned off (or after switching from GNOME to
> the console) I cannot run jobs using both GPUs simultaneously. The jobs run
> only with +devices 1 and crash with some CUDA error in the case of +devices
> 0,1 or +devices 0 (currently device 1 is the one connected to the monitor).
> As I said, I have never seen such an error when launching jobs from GNOME.
> Besides, I have not noticed any performance increase when launching jobs from
> the terminal (on 1 GPU + 6 ppc I get 3 ns/day).
>
> Johny
>
>
> 2013/11/19 Francesco Pietra <chiendarret_at_gmail.com>
>
>> OK, got PCIe 3.0 (LnkSta 8GT/s for both GTX-680) by sending the request
>> directly to the kernel.
>>
>> In conclusion, the change from Sandy Bridge and 1066MHz RAM to Ivy Bridge
>> and 1866MHz RAM gave no NAMD 2.9 acceleration for a system of 150K atoms
>> and some 13% acceleration with a system of 500K atoms. One might wonder
>> whether this is worth the money.
>>
>> francesco pietra
>> ---------- Forwarded message ----------
>> From: "Francesco Pietra" <chiendarret_at_gmail.com>
>> Date: Nov 18, 2013 8:13 AM
>> Subject: Fwd: namd-l: PCIexpress 3.0 for MD with NAMD on GPUs
>> To: "Thomas Albers" <talbers_at_binghamton.edu>, "NAMD" <namd-l_at_ks.uiuc.edu>,
>> "Lennart Sorensen" <lsorense_at_csclub.uwaterloo.ca>
>> Cc:
>>
>> It is getting hard, unless I misunderstood what was suggested by nvidia.
>> Thus, I added the option suggested by nvidia to GRUB by
>>
>> 1) typing 'e' at the GRUB prompt,
>> 2) adding the option to the linux line,
>> 3) Ctrl-x to boot
>>
>> If that procedure is correct (and probably it is, as the kernel command line below shows):
>>
>> francesco_at_gig64:~$ cat /proc/cmdline
>> BOOT_IMAGE=/boot/vmlinuz-3.10-3-amd64 root=/dev/mapper/vg1-root ro 1.
>> nvidia.NVreg_EnablePCIeGen3=1 quiet
>> francesco_at_gig64:~$
>>
>> No luck: both LnkCap and LnkSta were at 5GT/s, as for PCIe 2.0.
>> Molecular dynamics, accordingly, was not accelerated.
>>
>> I wonder whether the "1." preceding "nvidia..." is really what is needed for
>> a GRUB bootloader option. I did not find any other instance of that nvidia
>> suggestion on the internet.
>>
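>> If my reading is right, the permanent form of this would presumably go into
>> /etc/default/grub rather than being typed at the GRUB prompt each time. A
>> minimal sketch (assuming the "1." really is just the list marker copied from
>> the nvidia forum post, and that the module is loaded as "nvidia"):
>>
>> GRUB_CMDLINE_LINUX_DEFAULT="quiet nvidia.NVreg_EnablePCIeGen3=1"
>>
>> in /etc/default/grub, followed by running update-grub as root, should put the
>> parameter on the boot line at every boot.
>>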
>> I hope someone has a better idea.
>>
>> francesco pietra
>>
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Francesco Pietra <chiendarret_at_gmail.com>
>> Date: Sun, Nov 17, 2013 at 4:06 PM
>> Subject: Fwd: namd-l: PCIexpress 3.0 for MD with NAMD on GPUs
>> To: NAMD <namd-l_at_ks.uiuc.edu>
>>
>>
>> This addendum is to let you know that simply adding
>>
>> 1. options nvidia NVreg_EnablePCIeGen3=1
>>
>> to /etc/modprobe.d/nvidia.conf
>>
>> as suggested in
>>
>>
>> https://devtalk.nvidia.com/default/topic/545186/enabling-pcie-3-0-with-nvreg_enablepciegen3-on-titan/
>>
>> had no effect. Also, please note that what should be added to the kernel
>> boot string, according to the same source, is
>>
>>
>> 1. nvidia.NVreg_EnablePCIeGen3=1
>>
>>
>>
>> unlike what I wrote before (i.e., no "options", and a dot between nvidia and NVreg).
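>>
>> One thing I still have to check (a guess only): on Debian the nvidia module
>> can be loaded from the initramfs, in which case an option added under
>> /etc/modprobe.d/ is only picked up after rebuilding it with, as root,
>>
>> update-initramfs -u
>>
>> and whether the option actually reached the driver could then be verified,
>> once the module is loaded, with something like
>>
>> cat /sys/module/nvidia/parameters/NVreg_EnablePCIeGen3
>>
>> (assuming the driver exposes the parameter there).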
>>
>> francesco pietra
>>
>> ---------- Forwarded message ----------
>> From: Francesco Pietra <chiendarret_at_gmail.com>
>> Date: Sun, Nov 17, 2013 at 11:56 AM
>> Subject: Re: namd-l: PCIexpress 3.0 for MD with NAMD on GPUs
>> To: Thomas Albers <talbers_at_binghamton.edu>
>> Cc: Namd Mailing List <namd-l_at_ks.uiuc.edu>
>>
>>
>> Hello Thomas:
>> Thanks for sharing your benchmarks. They were very useful.
>>
>> On my Gigabyte X79-UD3 with two GTX-680s, I replaced the Sandy Bridge i7-3930K
>> with an Ivy Bridge i7-4930K, and also replaced the 1066MHz RAM with 1866MHz RAM:
>>
>> # cat /proc/cpuinfo
>> processor : 0
>> vendor_id : GenuineIntel
>> cpu family : 6
>> model : 62
>> model name : Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
>> stepping : 4
>> microcode : 0x416
>> cpu MHz : 1200.000
>> cache size : 12288 KB
>> physical id : 0
>> siblings : 12
>> core id : 0
>> cpu cores : 6
>> apicid : 0
>> initial apicid : 0
>> fpu : yes
>> fpu_exception : yes
>> cpuid level : 13
>> wp : yes
>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
>> pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl
>> xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor
>> ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic
>> popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb
>> xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep
>> erms
>> bogomips : 6800.08
>> clflush size : 64
>> cache_alignment : 64
>> address sizes : 46 bits physical, 48 bits virtual
>> power management:
>>
>> (the same for processors 1-11)
>>
>> # cat /proc/driver/nvidia/version
>> NVRM version: NVIDIA UNIX x86_64 Kernel Module  319.60  Wed Sep 25 14:28:26 PDT 2013
>> GCC version:  gcc version 4.7.3 (Debian 4.7.3-8)
>>
>> **************************************
>>
>> I observed no speed increase for NAMD 2.9 MD with a light job (150K atoms)
>> and only a few percent speed increase with a large job (500K atoms). All MD
>> simulations were carried out from the Linux prompt, without an X server,
>> activating the GPUs with:
>>
>> # nvidia-smi -L
>> # nvidia-smi -pm 1
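>>
>> (Assuming the field name is the same in this driver version, that persistence
>> mode actually took effect can be double-checked with
>>
>> # nvidia-smi -q | grep -i persistence
>>
>> which should report "Enabled" for each card.)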
>>
>>
>> ***************************************
>>
>>
>> With all such MDs, both the capability LnkCap and the actual link speed
>> LnkSta turned out to be 5GT/s, as for PCIe 2.0.
>>
>> I only observed a capability of 8GT/s when launching gnome:
>>
>> # lspci -vvvv
>> 02:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX
>> 680] (rev a1) (prog-if 00 [VGA controller])
>> Subsystem: NVIDIA Corporation Device 0969
>> ...............
>> LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Latency
>> L0 <512ns, L1 <4us
>>
>> VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 680]
>> (rev a1) (prog-if 00 [VGA controller])
>> Subsystem: Micro-Star International Co., Ltd. Device 2820
>> .........................
>> LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Latency
>> L0 <512ns, L1 <4us
>> *****************************************
>> .........................
>> LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0
>> <512ns, L1 <4us
>>
>>
>> 03:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX
>> 680] (rev a1) (prog-if 00 [VGA controller])
>> Subsystem: Micro-Star International Co., Ltd. Device 2820
>> ...................
>> LnkCap: Port #0, Speed 5GT/s, Width x16, ASPM L0s L1, Latency L0
>> <512ns, L1 <4us
>> *******************************************
>>
>> As far as I could investigate, to activate PCIe 3.0 nvidia suggests either:
>>
>> (1) modifying /etc/modprobe.d/local.conf (which does not exist on my Debian
>> amd64 jessie), or creating a new /etc/modprobe.d/nvidia.conf, adding to it
>>
>> 1. options nvidia NVreg_EnablePCIeGen3=1
>>
>> Actually, on my jessie, nvidia.conf reads
>>
>> alias nvidia nvidia-current
>> remove nvidia-current rmmod nvidia
>>
>>
>> Some people found that useless, even when both grub-efi and the initramfs
>> were updated accordingly, so nvidia offered a different approach, updating
>> the kernel boot string by appending this:
>>
>> 1. options nvidia NVreg_EnablePCIeGen3=1
>>
>>
>> Could you offer any suggestion about this? On the Gigabyte motherboard itself,
>> I set "automatic", which correctly reads the speed of the CPU and RAM. I found
>> no settings for PCIe, unless this requires manual setting instead of automatic.
>> I have no experience with manipulating the kernel as suggested above by nvidia.
>>
>> Thanks a lot
>> francesco pietra
>>
>> PS: I did not try nvidia tools to investigate the link speed, nor CPU-Z
>> (which is a 32-bit binary requiring installation of i386 libraries); the
>> latter would uninstall the 64-bit nvidia-smi.
>>
>>
>>
>>
>>
>> On Sat, Nov 16, 2013 at 4:58 PM, Thomas Albers <talbers_at_binghamton.edu> wrote:
>>
>>> Hello!
>>>
>>> > Which version of the nvidia driver is needed to activate PCIexpress 3.0
>>> > between the GPUs and RAM for MD with NAMD2.9 or NAMD2.10? As far as I
>>> can
>>> > remember, nvidia deactivated PCIe 3.0 for linux from version 295.xx
>>> until
>>> > at least 310.xx. Is that correct?
>>>
>>> I am using an i5-Ivy Bridge CPU and a GTX 660 GPU with Nvidia driver
>>> 304.43 and can confirm that PCI-e 3.0 works. (I have not done
>>> benchmarking to see if there is any speedup compared to PCI-e 2.0.)
>>>
>>> # cat /proc/cpuinfo
>>> processor : 0
>>> vendor_id : GenuineIntel
>>> cpu family : 6
>>> model : 58
>>> model name : Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz
>>>
>>> # cat /proc/driver/nvidia/version
>>> NVRM version: NVIDIA UNIX x86_64 Kernel Module 304.43 Sun Aug 19
>>> 20:14:03 PDT 2012
>>> GCC version: gcc version 4.5.4 (Gentoo 4.5.4 p1.0, pie-0.4.7)
>>>
>>> # lspci -vvvv
>>> 01:00.0 VGA compatible controller: nVidia Corporation Device 11c0 (rev
>>> a1) (prog-if 00 [VGA controller])
>>> Subsystem: ZOTAC International (MCO) Ltd. Device 1281
>>> ....
>>> LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+
>>> DLActive- BWMgmt- ABWMgmt-
>>>
>>> What the driver does is fall back to PCI-e 2.0 when not under load, so
>>> one has to check while crunching numbers on the GPU. If the GPU is
>>> idle it reports a 5 GT/s transfer rate. I do not know if this
>>> behaviour is peculiar to Nvidia or part of the PCI-e standard.
>>>
>>> Hope that helps,
>>> Thomas
>>>
>>>
>>
>>
>>
>
