From: Stober, Spencer T (spencer.t.stober_at_exxonmobil.com)
Date: Fri Sep 26 2014 - 10:35:07 CDT

Hi John,

Thanks for the fast reply.

I think that I almost have it fixed. Here is a test I ran:

MASK    DEV.       RESULT

0x1     0          works

0x3     0,1        works

0x5     0,2        works

0x1d    0,2,3,4    does not work

So, devices 0, 1, and 2 work. Device 3 or 4 must not be compatible. Therefore, I think that my (four) K20X's work, but my (single) Quadro 2000D does not. Presumably, the Quadro is either device 3 or 4.

I'm not sure how the hex mask value translates to the devices that will be used. Could you possibly send the masks for the following device sets? One of them must be OK:

Devices 0,1,2,3

Devices 0,1,2,4

Also, below is the output from nvidia-smi (if you still need it). Device 0 (the Quadro) is missing from the startup information in VMD because I set export VMDCUDANODISPLAYGPUS=1.

Thanks very much for your help; I'm happy to run a few tests to help support the beta release.

Best regards, Spence

~~~~~~~~~~~~~~~ nvidia-smi ~~~~~~~~~~~~~~~~~~

Fri Sep 26 11:19:49 2014
+------------------------------------------------------+
| NVIDIA-SMI 331.20     Driver Version: 331.20         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro 2000D        Off  | 0000:01:00.0      On |                  N/A |
| 30%   39C    P0    N/A /  N/A |    198MiB /  1023MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20Xm         Off  | 0000:02:00.0     Off |                    0 |
| N/A   36C    P0    56W / 235W |     82MiB /  5759MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K20Xm         Off  | 0000:03:00.0     Off |                    0 |
| N/A   36C    P0    55W / 235W |     82MiB /  5759MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K20Xm         Off  | 0000:83:00.0     Off |                    0 |
| N/A   36C    P0    57W / 235W |     82MiB /  5759MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K20Xm         Off  | 0000:84:00.0     Off |                    0 |
| N/A   36C    P0    55W / 235W |     82MiB /  5759MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    1      5951  /data/lib/vmd/vmd_LINUXAMD64                         369MiB |
|    2      5951  /data/lib/vmd/vmd_LINUXAMD64                         369MiB |
|    3      5951  /data/lib/vmd/vmd_LINUXAMD64                         369MiB |
|    4      5951  /data/lib/vmd/vmd_LINUXAMD64                         369MiB |
+-----------------------------------------------------------------------------+

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Spencer T. Stober, Ph.D.

ExxonMobil Research and Engineering

600 Billingsport Rd

Paulsboro, NJ 08066

Phone: 856-224-2638

Email: spencer.t.stober_at_exxonmobil.com

-----Original Message-----
From: John Stone [mailto:johns_at_ks.uiuc.edu]
Sent: Friday, September 26, 2014 10:56 AM
To: Stober, Spencer T
Cc: vmd-l_at_ks.uiuc.edu
Subject: Re: vmd-l: VMD 1.9.2 b1 OptiX Tachyon Error

Spencer,

  The CUDA device list shown by VMD seems to be missing device [1].

What kind of GPU is device [1]? Can you email the output of "nvidia-smi"?

Most likely what's going on here is that you have a mix of GPU hardware

generations, and OptiX is trying to use all 5 of them, although [1] is

perhaps not usable for some reason. If so, you'll need to tell VMD not

to let OptiX try to use the problematic GPU, by setting the VMDOPTIXDEVICEMASK

environment variable appropriately.

In C-shell you would do:

  setenv VMDOPTIXDEVICEMASK 0x1

In Bourne/bash shell you would do:

  VMDOPTIXDEVICEMASK=0x1

  export VMDOPTIXDEVICEMASK

The device mask represents binary bits associated with each GPU.

Here are some simple examples:

Use only device 0:

  setenv VMDOPTIXDEVICEMASK 0x1

Use devices 0, 1:

  setenv VMDOPTIXDEVICEMASK 0x3

Use devices 0, 2:

  setenv VMDOPTIXDEVICEMASK 0x5

Use devices 0, 2, 3, 4:

  setenv VMDOPTIXDEVICEMASK 0x1d
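
To make the bit arithmetic behind these examples concrete, here is a minimal sketch in Bourne-style shell. It assumes only what is described above: bit N of the mask enables device N. The helper name optix_mask is hypothetical and is not part of VMD or OptiX.

```shell
# Hypothetical helper (not part of VMD): build a hex device mask
# from a list of device indices by setting one bit per device.
optix_mask() {
  mask=0
  for dev in "$@"; do
    # Bit number "dev" corresponds to device "dev".
    mask=$(( mask | (1 << dev) ))
  done
  printf '0x%x\n' "$mask"
}

# Same device set as the last example above (0, 2, 3, 4).
optix_mask 0 2 3 4
```

By the same arithmetic, devices 0,1,2,3 give 0xf and devices 0,1,2,4 give 0x17. The result can be assigned with VMDOPTIXDEVICEMASK=$(optix_mask 0 1 2 3) followed by export VMDOPTIXDEVICEMASK.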

The tricky part is that OptiX uses a different device numbering than

CUDA does, so you may have to fiddle with the mask to get it right.

I plan to add more VMD startup output to emit the OptiX device numbering

much like the CUDA code does, to make this process simpler.
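
Until that output exists, one way to find the problematic GPU is to try each device on its own. The sketch below only prints the single-device mask for each OptiX device index. It assumes one bit per device as described above, and the function name single_device_masks is hypothetical. Each mask still has to be tested by hand in a fresh VMD session.

```shell
# Hypothetical helper (not part of VMD): print the mask that selects
# exactly one device, for device indices 0 .. count-1.
single_device_masks() {
  count=$1
  dev=0
  while [ "$dev" -lt "$count" ]; do
    printf '0x%x\n' $(( 1 << dev ))
    dev=$(( dev + 1 ))
  done
}

# Five GPUs in this machine, so five candidate masks.
single_device_masks 5
```

Any mask whose render fails with the "list of devices is incompatible" error identifies an OptiX device index to leave out of the final mask.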

Cheers,

  John Stone

  vmd_at_ks.uiuc.edu

On Fri, Sep 26, 2014 at 10:40:07AM -0400, Stober, Spencer T wrote:

> Hello,

>

>

>

> I am having issues using the new CUDA accelerated Tachyon renderer. Any

> help is much appreciated.

>

>

>

> I receive the following error when I attempt to render a scene using OptiX

> Tachyon:

>

>

>

> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

>

> Info) Rendering current scene to 'vmdscene.ppm' ...

>

> ERROR) OptiX error: Invalid value (Details: Function "RTresult

> _rtContextSetDevices(RTcontext, unsigned int, const int*)" caught

> exception: The list of devices is incompatible., [1575404])

> (OptiXRenderer.C:505

>

> OptiX: An error occured validating the context. Rendering is aborted.

>

> Total OptiX rendering time: 0.2 sec

>

> Info) Executing post-render cmd 'display vmdscene.ppm' ...

>

> display: Improper image header `vmdscene.ppm' @ pnm.c/ReadPNMImage/297