Re: Simultaneous calculation on CPU-only nodes and CPU/GPU node (with or without rCUDA)

From: Francesco Pietra (
Date: Tue May 29 2012 - 00:10:08 CDT

On Tue, May 29, 2012 at 12:19 AM, Axel Kohlmeyer <> wrote:
> On Mon, May 28, 2012 at 6:11 PM, Francesco Pietra <> wrote:
>> As to "CPU memory bandwidth" and "PCIe bandwidth", it would be useful to
>> have tabulated data in physically unambiguous terms (bit/s or the
>> like) for typical motherboards and for CPU and GPU memory. It would provide a
>> ground of reference on which to pose questions to hardware producers
>> about their hardware. At least with consumer motherboards (but also
>> with certain server motherboards, such as the one I have from
>> Supermicro), indications are never numeric and often even the word
>> bandwidth is omitted.
> are you going to volunteer to generate, collect and maintain this data?

Had I some competence in hardware, I would do that. In any event, a
general indication in physically unambiguous terms, such as the minimum
bytes/s needed for adequate performance, would help in choosing
CPU/GPU mainboards. The fact that such figures are not given in mainboard
specifications makes the potential user suspicious and leaves him no
basis for questioning the producer about performance. This is particularly
so with the 990FXA-GD80 mainboard, which declares four x16 PCIe 2.x slots
against only six logical CPUs.
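A minimal sketch of the kind of figures asked for, assuming only the published interface parameters: PCIe 1.x and 2.x signal at 2.5 and 5 GT/s per lane with 8b/10b encoding, PCIe 3.0 at 8 GT/s with 128b/130b encoding, and a 64-bit DDR3 channel moves 8 bytes per transfer. The function names are hypothetical, and the results are theoretical peaks per direction, not measured throughput. Whether all four x16 slots of a board receive 16 electrical lanes at the same time has to be checked in the board's manual.

```python
"""Theoretical peak bandwidths in bytes/s from published link parameters.

These are upper bounds from the interface specifications; measured
throughput on a real board is lower and depends on the chipset and on
how lanes are shared between slots.
"""

# PCIe per-lane signalling rate (transfers/s) and line-encoding efficiency.
PCIE_GEN = {
    "1.x": (2.5e9, 8 / 10),     # 8b/10b encoding
    "2.x": (5.0e9, 8 / 10),     # 8b/10b encoding
    "3.0": (8.0e9, 128 / 130),  # 128b/130b encoding
}


def pcie_bandwidth(gen: str, lanes: int) -> float:
    """Peak one-direction PCIe bandwidth in bytes/s (hypothetical helper)."""
    rate, efficiency = PCIE_GEN[gen]
    # One bit per transfer per lane; divide by 8 to convert bits to bytes.
    return rate * efficiency * lanes / 8


def dram_bandwidth(channels: int, mega_transfers: float, bus_bytes: int = 8) -> float:
    """Peak DRAM bandwidth in bytes/s: channels x transfers/s x bytes per transfer."""
    return channels * mega_transfers * 1e6 * bus_bytes


if __name__ == "__main__":
    for gen in PCIE_GEN:
        for lanes in (8, 16):
            gbs = pcie_bandwidth(gen, lanes) / 1e9
            print(f"PCIe {gen} x{lanes}: {gbs:.2f} GB/s per direction")
    # Example: dual-channel DDR3-1600 (1600 MT/s, 64-bit channels).
    print(f"2-channel DDR3-1600: {dram_bandwidth(2, 1600) / 1e9:.1f} GB/s")
```

For example, a PCIe 2.x x16 slot comes out at 8 GB/s per direction, and a slot that falls back to PCIe 1.x, or to x8, halves that.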

Thanks. I'll examine the specifications of server-grade CPU/GPU
mainboards. Perhaps some of them give unambiguous figures that can be
used, with due caution, in queries to producers of consumer CPU/GPU
mainboards.

Francesco Pietra
> axel.
>> thanks
>> francesco pietra
>> On Mon, May 28, 2012 at 9:58 PM, Axel Kohlmeyer <> wrote:
>>> On Sun, May 27, 2012 at 10:10 AM, Benjamin Merget
>>> <> wrote:
>>>> The problem is that I can only reach up to about 25% GPU utilization on
>>>> each of the 4 Tesla cards. I thought that maybe I could increase the GPU
>>>> utilization by creating more processes to bind to the Tesla cards. But to do
>>>> so, I need more CPUs, i.e. the CPUs of my CPU-only nodes...
>>> no. that won't work. if you have a low GPU utilization
>>> then this is more likely due to:
>>> - your simulation system is too small to result in good
>>>  GPU utilization. remember that you need to have sufficient
>>>  GPU work to offset the cost of data transfers to and from
>>>  the GPU and also the non-accelerated work on the CPU.
>>>  attaching more processes to one GPU reduces the latter,
>>>  but increases the number of (competing) data transfers.
>>> - your host machine's CPU doesn't have much memory bandwidth
>>> - your GPUs are not in full bandwidth PCIe v2.x slots,
>>>  or you have a PCIe v1.x card somewhere that causes
>>>  the neighboring GPU to drop to PCIe v1.x speed as well
>>>  (depends on the mainboard).
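To make the trade-offs in the list above concrete, here is a deliberately crude, hypothetical model of one MD time step. It assumes the GPU sits idle during PCIe transfers and during the non-accelerated CPU work (real MD codes overlap some of this), and every per-atom cost is a made-up placeholder, not a NAMD measurement; only the direction of the effects is meant to carry over.

```python
"""Toy, fully serialized model of GPU utilization for one MD time step.

Assumption (not taken from NAMD): nothing overlaps, so the GPU idles while
data moves over PCIe and while the CPU does the non-accelerated work.
All costs below are hypothetical placeholders, not measurements.
"""
from dataclasses import dataclass


@dataclass
class StepModel:
    gpu_s_per_atom: float = 20e-9       # hypothetical GPU kernel cost per atom
    cpu_s_per_atom: float = 5e-9        # hypothetical non-accelerated CPU cost per atom
    bytes_per_atom: int = 24            # hypothetical: float32 coords in + forces out
    transfer_latency_s: float = 10e-6   # hypothetical fixed cost per host<->GPU transfer
    pcie_bytes_per_s: float = 8e9       # peak PCIe 2.x x16, one direction

    def utilization(self, atoms: int, procs_per_gpu: int = 1) -> float:
        """Fraction of the step during which the GPU is computing."""
        t_gpu = atoms * self.gpu_s_per_atom
        # CPU work is split across the processes sharing this GPU ...
        t_cpu = atoms * self.cpu_s_per_atom / procs_per_gpu
        # ... but each process makes its own two transfers (in and out)
        # over the same link, so the fixed overhead grows with their number.
        t_xfer = (atoms * self.bytes_per_atom / self.pcie_bytes_per_s
                  + 2 * procs_per_gpu * self.transfer_latency_s)
        return t_gpu / (t_gpu + t_cpu + t_xfer)


if __name__ == "__main__":
    fast, slow = StepModel(), StepModel(pcie_bytes_per_s=4e9)  # 4e9: PCIe 1.x x16
    for atoms in (2_000, 1_000_000):
        for procs in (1, 4):
            print(f"{atoms:>9} atoms, {procs} proc(s)/GPU: "
                  f"PCIe 2.x {fast.utilization(atoms, procs):.2f}, "
                  f"PCIe 1.x {slow.utilization(atoms, procs):.2f}")
```

With these placeholder numbers, a small system shows lower utilization than a large one, a link at half bandwidth lowers it further, and putting four processes on one GPU helps the large system but hurts the small one, because the extra per-transfer overhead outweighs the reduced CPU time.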
>>>> Is there another way to increase my GPU utilization with the 8 CPU cores of
>>>> my GPU node?
>>> maybe. but that depends on the cause and that is
>>> impossible to tell remotely.
>>> axel.
>>>> Benny
>>>>> no, and it is not worth it. just run one calculation on the GPU node
>>>>> and a second on the rest and enjoy efficient utilization of your hardware.
>>>>> anything else is just wasting your time.
>>>>> axel.
>>> --
>>> Dr. Axel Kohlmeyer
>>> College of Science and Technology
>>> Temple University, Philadelphia PA, USA.
