Re: Simultaneous calculation on CPU-only nodes and CPU/GPU node (with or without rCUDA)

From: Francesco Pietra (chiendarret_at_gmail.com)
Date: Mon May 28 2012 - 17:11:54 CDT

As to "CPU memory bandwidth" and "PCI bandwidth", it would be useful to
have tabulated data in physically unambiguous units (bit/s or similar)
for typical combinations of motherboard, CPU, GPU, and memory. That would
give a reference against which to question hardware producers
about their products. At least for consumer motherboards (but also
for certain server motherboards, such as the one I have from
Supermicro), the specifications are never numeric, and often even the
word "bandwidth" is omitted.
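
As a rough point of reference, theoretical peak figures can at least be
computed from published signalling rates. Below is a minimal sketch in
Python, assuming the standard per-lane rates for PCIe 1.x and 2.x (2.5 and
5 GT/s, both with 8b/10b encoding) and a 64-bit DRAM channel. The function
names and the triple-channel DDR3-1333 example are illustrative only and do
not describe any particular board mentioned in this thread. The results are
upper bounds; sustained throughput on real hardware is lower.

```python
# Theoretical peak bandwidths in bytes per second.
# These are upper bounds, not measurements of any specific machine.

# PCIe per-lane signalling rate (transfers/s) and line-encoding efficiency.
PCIE_GENERATIONS = {
    "1.x": (2.5e9, 8 / 10),  # 2.5 GT/s, 8b/10b encoding
    "2.x": (5.0e9, 8 / 10),  # 5.0 GT/s, 8b/10b encoding
}


def pcie_peak_bytes_per_s(generation, lanes):
    """Peak one-direction PCIe bandwidth for a link with `lanes` lanes."""
    rate, efficiency = PCIE_GENERATIONS[generation]
    return rate * efficiency / 8 * lanes  # payload bits -> bytes, times lanes


def dram_peak_bytes_per_s(mega_transfers_per_s, channels, bus_bytes=8):
    """Peak DRAM bandwidth: transfers/s * bus width (8 bytes) * channels."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels


if __name__ == "__main__":
    for gen in ("1.x", "2.x"):
        for lanes in (4, 8, 16):
            gb = pcie_peak_bytes_per_s(gen, lanes) / 1e9
            print(f"PCIe {gen} x{lanes}: {gb:.1f} GB/s per direction")

    # Hypothetical memory configuration, nominal DDR3-1333, three channels.
    gb = dram_peak_bytes_per_s(1333, channels=3) / 1e9
    print(f"DDR3-1333, 3 channels: {gb:.1f} GB/s")
```

A table like this does not replace vendor data, but it states in bytes/s
what a "full bandwidth" x16 slot should deliver compared with a slot that
has fallen back to v1.x speed or is wired with fewer lanes.
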
thanks
francesco pietra

On Mon, May 28, 2012 at 9:58 PM, Axel Kohlmeyer <akohlmey_at_gmail.com> wrote:
> On Sun, May 27, 2012 at 10:10 AM, Benjamin Merget
> <benjamin.merget_at_uni-wuerzburg.de> wrote:
>> The problem is that I can only reach about 25% GPU utilization on
>> each of the 4 Tesla cards. I thought that maybe I could increase the GPU
>> utilization by creating more processes to bind to the Tesla cards. But to do
>> so, I need more CPUs, i.e. the CPUs of my CPU-only nodes...
>
> no. that won't work. if you have a low GPU utilization
> then this is more likely due to:
> - your simulation system is too small to result in good
>  GPU utilization. remember that you need to have sufficient
>  GPU work to offset the cost of data transfers to and from
>  the GPU and also the non-accelerated work on the CPU.
>  attaching more processes to one GPU reduces the latter,
>  but increases the number of (competing) data transfers.
>
> - your host machine's CPU doesn't have much memory bandwidth
>
> - your GPUs are not in full bandwidth PCIe v2.x slots,
>  or you have a PCIe v1.x card somewhere that forces
>  the neighboring GPU to drop to PCIe v1.x speed as well.
>  (depends on the main board)
>
>> Is there another way to increase my GPU utilization with the 8 CPU cores of
>> my GPU node?
>
> maybe. but that depends on the cause, and that is
> impossible to tell remotely.
>
> axel.
>
>> Benny
>>
>>
>>> no, and it is not worth it. just run one calculation on the GPU node
>>> and a second on the rest and enjoy efficient utilization of your hardware.
>>> anything else is just wasting your time.
>>>
>>> axel.
>>>
>>
>
>
>
> --
> Dr. Axel Kohlmeyer
> akohlmey_at_gmail.com
>
> College of Science and Technology
> Temple University, Philadelphia PA, USA.
>
