From: John Stone (johns_at_ks.uiuc.edu)
Date: Wed Nov 09 2005 - 11:32:57 CST

Bram,
  I just checked in new code that should cure the race for both
CAVE and FreeVR with minimal changes, try it out and let me know if it
works on your CAVE or not. I pushed the update to the anon CVS tree so
if you run an update you should get it now.

  John

On Wed, Nov 09, 2005 at 10:36:41AM -0600, John Stone wrote:
>
> Hi Bram,
>
> On Wed, Nov 09, 2005 at 11:51:10AM +0100, Bram Stolk wrote:
> > John Stone wrote:
> > >Bram,
> > > If you're running with separate projectors for each eye, then I'm
> > >almost certain you're experiencing a race condition. I've seen it before
> >
> > Nope, it's not a race condition: this would be impossible, as the
> > app is multi-process, and not multi-threaded. The shared resources
> > between the processes are minimal: They all have there own graphics pipe.
>
> Oh, not impossible at all. :-)
> A race condition can occur in any concurrent program where there's
> improperly handled synchronization, even in distributed memory programs.
>
> > I don't think there is a shared resource that the processes write to,
> > so race-conditions are not possible.
>
> This is not correct. CaveScene implements a large shared memory region
> which is written to by the main VMD process at every frame. The slave
> renderers read this shared memory region every time they draw.
>
> If you read the source code to CaveDisplayDevice and CaveScene closely,
> you'll also see that VMD is written somewhat differently than a typical
> run-of-the-mill CAVE program. Most CAVE apps live in the event loop
> provided by the CAVE library. VMD retains top level control of its own
> event loop by performing its own synchronization with the slave renderers.
> The reason for this is so that state changes that affect the CaveScene
> shared memory arena will only get done at times when the slave renderers
> aren't reading the shared memory arena, and that the slave renderers will
> only read the shared memory arena when there's no writing/updating going on
> from the main VMD process.
>
> > It turned out to be exactly like I suspected: lights are off for one eye.
> > This is because the code in Scene.C tries to do lazy setting of lights:
> > Only call GL cmds if the lights changed.
> > This is hazardous in stereo setups: if lights change, you need to make
> > the GL calls for all eyes, for all walls in a cave.
>
> Absolutely correct. The light_changed variable and lightSate array
> is shared among all of the processes since it's stored in the
> shared memory area (CaveScene subclass). This is why they are able
> to get the light positions and other parameters from that area.
> (these values can change while the program runs..)
>
> When you added the extra CAVE check, you're forcing the light state to be
> written every time. This does cause the slave renderers to update, but
> it's not really the right way to fix the problem.
> I think I see what the real bug is now however!!!
>
> The real bug is line 230 of Scene.C:
> 230 light_changed = 0;
>
> I believe THAT is the race condition and the actual cause of the problem
> with the way the code runs.
>
> What's happening is that the first of the slave processes that encounters
> the draw code is going to set that variable to zero after it updates
> its lights, and then none of the other walls are going to update their
> lighting state. This is why you're getting the correct lighting
> for a single wall, but not for the others.
>
> This state change should be protected with a synchronization primitive
> but it's not. I think the best way to solve this without incurring the
> performance hit that you take if you reset the light state every time
> you draw would be to move the "light_changed = 0" state setting to
> a location that would get called only after all of the slaved renderers
> have finished their drawing. My suggestion would be to change the code
> so that the 'light_changed = 0' is executed at line 230 when in non-CAVE
> mode, but that when running in CAVE mode, that code is skipped and
> instead, we execute it at line 84 of CaveScene.C (right after
> the write lock is set by the master process), and at line 90 of
> CaveScene.C after the second barrier synchronization. (depending on
> which sync method the code is compiled with...)
>
> > If you give me write access, I'll commit it for you, or else you would
> > have to commit yourself. Write access could be handy if I stumble onto
> > more problems.
>
> I'd rather we try the fix I suggest above as it will retain the performance
> advantage of only setting light state when necessary, but should fix the
> race condition where the first wall sets the light state and the subsequent
> walls "lose".
>
> > Also note: against a tiny performance penalty, you could opt for sending
> > lighting cmds each frame, and scrap the light_changed mechanism. I think
> > that FreeVR thing, and other stereo setups may suffer as well from this.
>
> The CAVE and FreeVR code are nearly identical in that regard, but all of
> the other display devices work fine because they don't have the shared
> memory arena and synchronization issues to worry about, so that code
> works ok. If the fix I suggest cures your problem with CaveScene, then
> the same fix can be trivially made for the FreeVRScene class as well.
>
> John
>
> --
> NIH Resource for Macromolecular Modeling and Bioinformatics
> Beckman Institute for Advanced Science and Technology
> University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
> Email: johns_at_ks.uiuc.edu Phone: 217-244-3349
> WWW: http://www.ks.uiuc.edu/~johns/ Fax: 217-244-6078

-- 
NIH Resource for Macromolecular Modeling and Bioinformatics
Beckman Institute for Advanced Science and Technology
University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
Email: johns_at_ks.uiuc.edu                 Phone: 217-244-3349
  WWW: http://www.ks.uiuc.edu/~johns/      Fax: 217-244-6078