Re: Simultaneous calculation on CPU-only nodes and CPU/GPU node (with or without rCUDA)

From: Axel Kohlmeyer (akohlmey_at_gmail.com)
Date: Mon May 28 2012 - 17:19:42 CDT

On Mon, May 28, 2012 at 6:11 PM, Francesco Pietra <chiendarret_at_gmail.com> wrote:
> As to "CPU memory bandwidth" and "PCI bandwidth", it would be useful to
> have tabulated data in physically unambiguous units (bit/s or
> similar) for typical motherboards and for CPU and GPU memory. That would
> give a point of reference for putting questions to hardware vendors
> about their products. At least for consumer motherboards (but also
> for certain server motherboards, such as the one I have from
> Supermicro), the specifications are never numeric, and often even the word
> bandwidth is omitted.

are you going to volunteer to generate, collect and maintain this data?

axel.
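
In the absence of such a table, the theoretical peak figures can at least be
worked out from published interface parameters. The sketch below is only an
illustration: the function names are made up for this example, the DDR3-1333
triple-channel configuration is a hypothetical host and not the hardware
discussed in this thread, and the results are upper bounds that measured
transfer rates will not reach.

```python
# Rough upper bounds only; real transfers are slower because of protocol
# overhead, chipset limits and contention between devices.

# Per-lane signaling rate in GT/s for the PCIe generations named in the thread.
PCIE_GT_PER_S = {"1.x": 2.5, "2.x": 5.0}

# PCIe 1.x and 2.x use 8b/10b line coding: 8 payload bits per 10 bits sent.
ENCODING_EFFICIENCY = 8 / 10


def pcie_peak_gbytes_per_s(generation, lanes):
    """Theoretical one-direction PCIe bandwidth in GB/s (10**9 bytes/s)."""
    gbit_per_s = PCIE_GT_PER_S[generation] * ENCODING_EFFICIENCY * lanes
    return gbit_per_s / 8  # bits -> bytes


def dram_peak_gbytes_per_s(mega_transfers_per_s, channels, bus_bytes=8):
    """Theoretical DRAM bandwidth in GB/s for 64-bit (8-byte) channels."""
    return mega_transfers_per_s * bus_bytes * channels / 1000


if __name__ == "__main__":
    for generation in ("1.x", "2.x"):
        for lanes in (4, 8, 16):
            peak = pcie_peak_gbytes_per_s(generation, lanes)
            print(f"PCIe {generation} x{lanes}: {peak:.1f} GB/s per direction")

    # Hypothetical host: three channels of DDR3-1333 (assumption, see above).
    print(f"DDR3-1333 x3: {dram_peak_gbytes_per_s(1333, 3):.1f} GB/s")
```

A GPU that falls back from a v2.x x16 link to v1.x speed loses half of its
theoretical host transfer bandwidth, which is the effect described in the
quoted message below.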

> thanks
> francesco pietra
>
> On Mon, May 28, 2012 at 9:58 PM, Axel Kohlmeyer <akohlmey_at_gmail.com> wrote:
>> On Sun, May 27, 2012 at 10:10 AM, Benjamin Merget
>> <benjamin.merget_at_uni-wuerzburg.de> wrote:
>>> The problem is that I can only reach about 25% GPU utilization on
>>> each of the 4 Tesla cards. I thought that maybe I could increase the GPU
>>> utilization by creating more processes to bind to the Tesla cards. But to do
>>> so, I need more CPUs, i.e. the CPUs of my CPU-only nodes...
>>
>> no. that won't work. if you have a low GPU utilization
>> then this is more likely due to:
>> - your simulation system is too small to result in good
>>  GPU utilization. remember that you need to have sufficient
>>  GPU work to offset the cost of data transfers to and from
>>  the GPU and also the non-accelerated work on the CPU.
>>  attaching more processes to one GPU reduces the latter,
>>  but increases the number of (competing) data transfers.
>>
>> - your host machine's CPU doesn't have much memory bandwidth
>>
>> - your GPUs are not in full bandwidth PCIe v2.x slots,
>>  or you have a PCIe v1.x card somewhere that forces
>>  the neighboring GPU to drop to PCIe v1.x speed as well.
>>  (depends on the main board)
>>
>>> Is there another way to increase my GPU utilization with the 8 CPU cores of
>>> my GPU node?
>>
>> maybe. but that depends on the cause and that is
>> impossible to tell remotely.
>>
>> axel.
>>
>>> Benny
>>>
>>>
>>>> no, and it is not worth it. just run one calculation on the GPU node
>>>> and a second on the rest and enjoy efficient utilization of your hardware.
>>>> anything else is just wasting your time.
>>>>
>>>> axel.
>>>>
>>>
>>
>>
>>
>> --
>> Dr. Axel Kohlmeyer
>> akohlmey_at_gmail.com
>>
>> College of Science and Technology
>> Temple University, Philadelphia PA, USA.
>>

-- 
Dr. Axel Kohlmeyer
akohlmey_at_gmail.com  http://goo.gl/1wk0
College of Science and Technology
Temple University, Philadelphia PA, USA.
