From: Brendan Dennis (bdennis_at_physics.ucsd.edu)
Date: Tue Jan 31 2023 - 16:48:10 CST

Hi Josh,

The problem we are experiencing is that, on new systems with multiple GPUs
with compute capability 8.6 (which requires CUDA 11.1+), rendering with
TachyonL-OptiX produces a checkered pattern across the output. If we then
use the same exact compilation of VMD 1.9.4a57 (CUDA 11.2, OptiX 6.5.0) on
systems with older GPUs, we do not have this checkered pattern problem in
the output. So, it's not so much that we're having problems with OptiX 6.5
specifically, but rather that we're having problems with VMD rendering on
SM 8.6 GPUs. Although I can't say for certain that OptiX 6.5.0 is the
cause, the fact that the OptiX release notes only start mentioning
compatibility with CUDA 11.1+ in the 7.2.0 release is what made me suspect
an OptiX version issue.

However, I had some further troubleshooting ideas after thinking things
through while reading your reply and typing up the above, and I've now been
able to verify that the checkered output problem goes away if I use
the VMDOPTIXDEVICE envvar at runtime to restrict VMD to using a single GPU
in one of these dual A5000 systems. It doesn't matter which GPU I restrict
it to though; if I render on one GPU, then exit and relaunch VMD to switch
to rendering with the other GPU, both renders turn out fine. But if I set
VMDOPTIXDEVICE or VMDOPTIXDEVICEMASK in such a way as to allow use of both
GPUs, the checkering problem comes back.
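For anyone else who runs into this, here's the workaround as I'm applying it on our dual-A5000 boxes (the device indices are specific to our systems, and I'm assuming the mask is the usual hex bitmask format; adjust to taste):

```shell
# Restrict VMD's OptiX renderer to a single GPU (index 0 here).
# With only one device in play, the checkered output goes away.
export VMDOPTIXDEVICE=0
vmd

# Alternatively, VMDOPTIXDEVICEMASK takes a mask of allowed devices;
# assuming a hex bitmask, 0x1 = GPU 0 only, 0x2 = GPU 1 only.
# Allowing both GPUs (0x3) brings the checkering back.
export VMDOPTIXDEVICEMASK=0x1
vmd
```

Either variable works for us; the key point is just that VMD must only see one of the two GPUs at render time.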

After doing some more digging into how these systems were purchased and
built by the vendor, it looks like the lab actually bought them with an
NVLink interconnect in place between the two A5000 GPUs. Although
nvidia-smi and similar tools give me no confirmation that the NVLink
interconnect is present, VMD is reporting a GPU P2P link as available. So,
I'm now wondering if the lack of CUDA 11 support in pre-v7 OptiX was a
misdirect, and whether this might actually be some sort of NVLink issue
instead.
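In case anyone wants to check their own systems, these are the sort of nvidia-smi queries I was using to look for NVLink (neither reported an active NVLink on our workstations, despite VMD seeing a P2P link):

```shell
# Print the GPU interconnect topology matrix; an active NVLink between
# two GPUs shows up as NV1/NV2/... in the cell joining them, while
# PHB/PIX/SYS indicate PCIe-only paths.
nvidia-smi topo -m

# Query per-GPU NVLink status; prints per-link speeds when NVLink is up,
# and reports the links as inactive or absent otherwise.
nvidia-smi nvlink --status
```
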

I can't really find any documentation for VMD and NVLink, so I'm not quite
sure how one is supposed to tune VMD to work with NVLink'd GPUs, or if it's
all supposed to be automatic. Who knows, maybe it'll still wind up being a
pre-v7 OptiX problem specifically with NVLink'd SM 8.6+ GPUs. Regardless,
for now I've asked someone who is on-site to see if they can check one of
the workstations for a physical NVLink interconnect, and to then remove it
if they find it. Once that's done, I'll give VMD another try, and see if I
still run into this checkering issue without the NVLink interconnect being
in place.

--
Brendan Dennis (he/him/his)
Systems Administrator
UCSD Physics Computing Facility
https://pcf.ucsd.edu/
Mayer Hall 3410
(858) 534-9415
On Tue, Jan 31, 2023 at 12:05 PM Vermaas, Josh <vermaasj_at_msu.edu> wrote:
> Hi Brendan,
>
>
>
> My point is that OptiX 6.5 works just fine with newer versions of CUDA.
> That is what we use in my lab here, and we haven’t noticed any graphical
> distortions. As you noted, porting VMD’s innards to a newer version of
> OptiX is something beyond the capabilities of a single scientist with other
> things to do for a dayjob. 😃 Do you have a minimal working example of
> something that makes a checkerboard in your setup? I’d be happy to render
> something here just to demonstrate that 6.5 works just fine, even with more
> modern CUDA libraries.
>
>
>
> -Josh
>
>
>
> *From: *Brendan Dennis <bdennis_at_physics.ucsd.edu>
> *Date: *Tuesday, January 31, 2023 at 2:17 PM
> *To: *"Vermaas, Josh" <vermaasj_at_msu.edu>
> *Cc: *"vmd-l_at_ks.uiuc.edu" <vmd-l_at_ks.uiuc.edu>
> *Subject: *Re: vmd-l: Running VMD 1.9.4alpha on newer GPUs that require
> CUDA 11+ and OptiX 7+
>
>
>
> Hi Josh,
>
>
>
> Thanks for the link, from looking at your repo it looks like we both
> figured out a lot of the same tweaks needed to get VMD building from source
> on newer systems with newer versions of various dependencies and CUDA.
> Unfortunately though, I don't think tweaking of the configure scripts or
> similar will get VMD building against OptiX 7, as NVIDIA made some pretty
> substantial changes in the OptiX 7.0.0 release that VMD's OptiX code
> doesn't yet reflect. Although it looks like the relevant portions of code
> in the most recent standalone release of Tachyon (0.99.5) have been
> rewritten to support OptiX 7, those changes have not been ported over to
> VMD's internal Tachyon renderer (or at least not as of VMD 1.9.4a57), and
> sadly it's all a bit over my head to port it myself.
>
> --
>
> Brendan Dennis (he/him/his)
>
> Systems Administrator
>
> UCSD Physics Computing Facility
>
> https://pcf.ucsd.edu/
>
> Mayer Hall 3410
>
> (858) 534-9415
>
>
>
>
>
> On Tue, Jan 31, 2023 at 6:58 AM Josh Vermaas <vermaasj_at_msu.edu> wrote:
>
> Hi Brendan,
>
> I've been running VMD with CUDA 12.0 and OptiX 6.5, so I think it can be
> done. I've put instructions for how to do this on github.
> https://github.com/jvermaas/vmd-packaging-instructions
> This set of instructions was designed with my own use case in mind, where I
> have multiple Ubuntu machines all updating from my own repository. This
> saves me time on installing across the multiple machines, while respecting
> the licenses to both OptiX and CUDA. There may be some modifications you
> need to do for your own purposes, as admittedly I haven't updated the
> instructions for more recent alpha versions of VMD.
>
> -Josh
>
> On 1/30/23 9:16 PM, Brendan Dennis wrote:
>
> Hi,
>
>
>
> I provide research IT support to a lab that makes heavy use of VMD. They
> recently purchased several new Linux workstations with NVIDIA RTX A5000
> GPUs, which are only compatible with CUDA 11.1 and above. If they attempt
> to use the binary release of VMD 1.9.4a57, which is built against CUDA 10
> and OptiX 6.5.0, then they run into problems with anything using GPU
> acceleration. Of particular note is rendering an image using the internal
> TachyonL-OptiX option; the image is rendered improperly, with a severe
> checkered pattern throughout.
>
>
>
> I have been attempting to compile VMD 1.9.4a57 from source for them in
> order to try and get GPU acceleration working. Although I am able to
> compile against CUDA 11.2 successfully, the maximum version of OptiX that
> appears to be supported by VMD is 6.5.0. When built against CUDA 11.2 and
> OptiX 6.5.0, the image checkering still occurs on render, but is not nearly
> as severe as it was with the CUDA 10 binary release. My guess is that some
> version of OptiX 7 is also needed to fix this for these newer GPUs.
>
>
>
> In researching OptiX 7 support, it appears that how one would use OptiX in
> one's code changed pretty substantially with the initial 7.0.0 release, but
> also that CUDA 11 was not supported until the 7.2.0 release. It
> additionally looks like Tachyon 0.99.5 uses OptiX 7, and I was able to
> build the libtachyonoptix.a library with every OptiX 7 version <= 7.4.0.
> However, there does not appear to be a way to use this external Tachyon
> OptiX library with VMD, as all of VMD's OptiX support is internal.
>
>
>
> Is there any way to use an external Tachyon OptiX library with VMD? If
> not, is there any chance that support for OptiX 7 in VMD is not too far off
> on the horizon, perhaps even in the form of a new alpha Linux binary
> release built against CUDA 11.1+ and OptiX 7.2.0+? For now, I've had to
> tell people that they'll need to make do with using the Intel OSPray or
> other CPU-based rendering options, but I imagine that's going to get
> frustrating fairly quickly as they watch renders take minutes on their
> brand new systems, while older workstations with older GPUs can do them in
> seconds.
>
> --
>
> Brendan Dennis (he/him/his)
>
> Systems Administrator
>
> UCSD Physics Computing Facility
>
> https://pcf.ucsd.edu/
>
> Mayer Hall 3410
>
> (858) 534-9415
>
>
>
> --
>
> Josh Vermaas
>
>
>
> vermaasj_at_msu.edu
>
> Assistant Professor, Plant Research Laboratory and Biochemistry and Molecular Biology
>
> Michigan State University
>
> vermaaslab.github.io
>
>