Re: Simultaneous calculation on CPU-only nodes and CPU/GPU node (with or without rCUDA)

From: Axel Kohlmeyer
Date: Mon May 28 2012 - 14:58:53 CDT

On Sun, May 27, 2012 at 10:10 AM, Benjamin Merget wrote:
> The problem is that I can only reach up to about 25% GPU utilization of
> each of the 4 Tesla cards. I thought that maybe I could increase the GPU
> utilization by creating more processes to bind to the Tesla cards. But to do
> so, I need more CPUs, i.e. the CPUs of my CPU-only nodes...

no. that won't work. if you have a low GPU utilization
then this is more likely due to:
- your simulation system is too small to result in good
  GPU utilization. remember that you need to have sufficient
  GPU work to offset the cost of data transfers to and from
  the GPU and also the non-accelerated work on the CPU.
  attaching more processes to one GPU reduces the latter,
  but increases the number of (competing) data transfers.

- your host machine's CPU doesn't have much memory bandwidth

- your GPUs are not in full bandwidth PCIe v2.x slots,
  or you have a PCIe v1.x card somewhere that forces
  the neighboring GPU to drop to PCIe v1.x speed as well.
  (depends on the main board)
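
to make the trade-off in the list above concrete, here is a rough
toy model in python. it is only a sketch: the function names, the
timings and the assumptions (nothing overlaps, CPU work splits evenly
across the processes sharing a GPU, every process pays its own
serialized transfer cost) are hypothetical and not taken from NAMD or
from any measurement. the per-lane PCIe numbers are theoretical peak
values per direction.

```python
# toy model only: all timings are made-up inputs, not measurements.

# theoretical per-lane, per-direction PCIe bandwidth in GB/s
PCIE_GB_PER_S_PER_LANE = {1: 0.25, 2: 0.5}


def transfer_time(gigabytes, pcie_gen=2, lanes=16):
    """Seconds needed to move `gigabytes` over the bus at theoretical peak."""
    return gigabytes / (PCIE_GB_PER_S_PER_LANE[pcie_gen] * lanes)


def gpu_utilization(t_gpu, t_transfer, t_cpu, procs_per_gpu=1):
    """Fraction of one time step during which the GPU is computing.

    Assumed (hypothetical) cost model:
      - GPU compute time t_gpu is fixed per step
      - each process sharing the GPU adds one serialized transfer
      - non-accelerated CPU work is split evenly over those processes
    """
    step = t_gpu + procs_per_gpu * t_transfer + t_cpu / procs_per_gpu
    return t_gpu / step


if __name__ == "__main__":
    # small system, one process per GPU
    print(gpu_utilization(1.0, 1.0, 2.0))
    # same system, four processes per GPU: less CPU time, more transfers
    print(gpu_utilization(1.0, 1.0, 2.0, procs_per_gpu=4))
    # ten times more GPU work per step with the same overheads
    print(gpu_utilization(10.0, 1.0, 2.0))
    # moving the same data through a v1.x vs. a v2.x x16 slot
    print(transfer_time(1.0, pcie_gen=1), transfer_time(1.0, pcie_gen=2))
```

with these made-up inputs, adding processes per GPU does not raise
the utilization, while a larger system does. real numbers depend on
how much work and transfer actually overlap, so treat this as an
illustration of the argument and not as a prediction.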

> Is there another way to increase my GPU utilization with the 8 CPU cores of
> my GPU node?

maybe. but that depends on the cause, and that is
impossible to tell remotely.


> Benny
>> no, and it is not worth it. just run one calculation on the GPU node
>> and a second on the rest and enjoy efficient utilization of your hardware.
>> anything else is just wasting your time.
>> axel.

Dr. Axel Kohlmeyer
College of Science and Technology
Temple University, Philadelphia PA, USA.
