From: John Stone (johns_at_ks.uiuc.edu)
Date: Wed Nov 09 2005 - 10:36:41 CST

Hi Bram,

On Wed, Nov 09, 2005 at 11:51:10AM +0100, Bram Stolk wrote:
> John Stone wrote:
> >Bram,
> > If you're running with separate projectors for each eye, then I'm
> >almost certain you're experiencing a race condition. I've seen it before
>
> Nope, it's not a race condition: this would be impossible, as the
> app is multi-process, and not multi-threaded. The shared resources
> between the processes are minimal: They all have there own graphics pipe.

Oh, not impossible at all. :-)
A race condition can occur in any concurrent program where there's
improperly handled synchronization, even in distributed memory programs.

> I don't think there is a shared resource that the processes write to,
> so race-conditions are not possible.

This is not correct. CaveScene implements a large shared memory region
which is written to by the main VMD process at every frame. The slave
renderers read this shared memory region every time they draw.

If you read the source code to CaveDisplayDevice and CaveScene closely,
you'll also see that VMD is written somewhat differently than a typical
run-of-the-mill CAVE program. Most CAVE apps live in the event loop
provided by the CAVE library. VMD retains top level control of its own
event loop by performing its own synchronization with the slave renderers.
The reason for this is so that state changes that affect the CaveScene
shared memory arena will only get done at times when the slave renderers
aren't reading the shared memory arena, and that the slave renderers will
only read the shared memory arena when there's no writing/updating going on
from the main VMD process.

> It turned out to be exactly like I suspected: lights are off for one eye.
> This is because the code in Scene.C tries to do lazy setting of lights:
> Only call GL cmds if the lights changed.
> This is hazardous in stereo setups: if lights change, you need to make
> the GL calls for all eyes, for all walls in a cave.

Absolutely correct. The light_changed variable and lightSate array
is shared among all of the processes since it's stored in the
shared memory area (CaveScene subclass). This is why they are able
to get the light positions and other parameters from that area.
(these values can change while the program runs..)

When you added the extra CAVE check, you're forcing the light state to be
written every time. This does cause the slave renderers to update, but
it's not really the right way to fix the problem.
I think I see what the real bug is now however!!!

The real bug is line 230 of Scene.C:
   230 light_changed = 0;

I believe THAT is the race condition and the actual cause of the problem
with the way the code runs.

What's happening is that the first of the slave processes that encounters
the draw code is going to set that variable to zero after it updates
its lights, and then none of the other walls are going to update their
lighting state. This is why you're getting the correct lighting
for a single wall, but not for the others.

This state change should be protected with a synchronization primitive
but it's not. I think the best way to solve this without incurring the
performance hit that you take if you reset the light state every time
you draw would be to move the "light_changed = 0" state setting to
a location that would get called only after all of the slaved renderers
have finished their drawing. My suggestion would be to change the code
so that the 'light_changed = 0' is executed at line 230 when in non-CAVE
mode, but that when running in CAVE mode, that code is skipped and
instead, we execute it at line 84 of CaveScene.C (right after
the write lock is set by the master process), and at line 90 of
CaveScene.C after the second barrier synchronization. (depending on
which sync method the code is compiled with...)

> If you give me write access, I'll commit it for you, or else you would
> have to commit yourself. Write access could be handy if I stumble onto
> more problems.

I'd rather we try the fix I suggest above as it will retain the performance
advantage of only setting light state when necessary, but should fix the
race condition where the first wall sets the light state and the subsequent
walls "lose".

> Also note: against a tiny performance penalty, you could opt for sending
> lighting cmds each frame, and scrap the light_changed mechanism. I think
> that FreeVR thing, and other stereo setups may suffer as well from this.

The CAVE and FreeVR code are nearly identical in that regard, but all of
the other display devices work fine because they don't have the shared
memory arena and synchronization issues to worry about, so that code
works ok. If the fix I suggest cures your problem with CaveScene, then
the same fix can be trivially made for the FreeVRScene class as well.

  John

-- 
NIH Resource for Macromolecular Modeling and Bioinformatics
Beckman Institute for Advanced Science and Technology
University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
Email: johns_at_ks.uiuc.edu                 Phone: 217-244-3349
  WWW: http://www.ks.uiuc.edu/~johns/      Fax: 217-244-6078