From: John Stone (johns_at_ks.uiuc.edu)
Date: Mon Apr 04 2022 - 11:37:34 CDT

Hi,
  Right, using the non-MPI Tachyon build within VMD is the correct
approach. It results in each MPI rank doing its own Tachyon
rendering, which is the right thing for most typical VMD+MPI
workloads such as movie rendering.
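
For example, a minimal per-rank movie-rendering sketch looks roughly
like the following (just an illustration: the frame loop, the file
names, and the assumption that every rank has already loaded the
trajectory are placeholders; "parallel noderank"/"parallel nodecount"
are the MPI query commands in MPI-enabled VMD, and TachyonInternal is
the built-in Tachyon renderer):

   # each MPI rank renders its own, interleaved subset of frames
   set rank    [parallel noderank]
   set nranks  [parallel nodecount]
   set nframes [molinfo top get numframes]
   for {set f $rank} {$f < $nframes} {incr f $nranks} {
       animate goto $f
       display update
       render TachyonInternal [format "frame%04d.tga" $f]
   }
   parallel barrier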

If you're running out of node memory, there are a few ways
we might "tame" the memory use in VMD/Tachyon for your cube file
scenario. The 9GB cube file doesn't sound like it should result
in a scene that would create a huge memory footprint. Are you still
running multiple VMD MPI ranks on the same machine? If so, I would
start by avoiding that, so that each MPI process gets the full node
memory.
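
For example, launching one rank per node (a sketch; the exact flags
depend on your MPI and batch system, and "render.tcl" is just a
placeholder script name):

   # OpenMPI: 4 nodes, one VMD rank per node
   mpirun -np 4 --map-by ppr:1:node vmd -dispdev text -e render.tcl

   # or, under Slurm
   srun --nodes=4 --ntasks-per-node=1 vmd -dispdev text -e render.tcl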

Regarding the rendering of the cube file, what representations are you
using? Just isosurface, or do you have lots of other representations
as well? Is there any other molecular geometry?

I may have further suggestions to help you reduce that memory footprint,
assuming you've already switched to running only one MPI rank per node.

Best,
  John Stone

On Mon, Apr 04, 2022 at 05:45:52PM +0200, Lenz Fiedler wrote:
> Hi John,
>
>
> Thank you so much - the error was indeed from the MPI version of
> Tachyon! It was just as you described: I had compiled both VMD and
> Tachyon with MPI enabled. After switching to the serial Tachyon
> build, I no longer get the crash! :)
>
> Does this mean that the rendering will be done serially, only on
> rank 0? I am trying to render an image based on a very large (9GB)
> .cube file (with an isosurface), and so far using 1, 2, or 4 nodes
> with 360GB of shared memory has resulted in a segmentation fault. I
> assume it is memory related, because I can render smaller files
> just fine.
>
>
> Also thanks for the info regarding the threading, I will keep that in mind!
>
>
> Kind regards,
>
> Lenz
>
>
> --
> Lenz Fiedler, M. Sc.
> PhD Candidate | Matter under Extreme Conditions
>
> Tel.: +49 3581 37523 55
> E-Mail: l.fiedler_at_hzdr.de
> https://www.casus.science
>
> CASUS - Center for Advanced Systems Understanding
> Helmholtz-Zentrum Dresden-Rossendorf e.V. (HZDR)
> Untermarkt 20
> 02826 Görlitz
>
> Vorstand: Prof. Dr. Sebastian M. Schmidt, Dr. Diana Stiller
> Vereinsregister: VR 1693 beim Amtsgericht Dresden
>
> On 4/4/22 17:04, John Stone wrote:
> >Hi,
> > The MPI bindings for VMD are really intended for multi-node runs
> >rather than for dividing up the CPUs within a single node. The output
> >you're seeing shows that VMD is counting 48 CPUs (hyperthreading, no doubt)
> >for each MPI rank, even though they're all being launched on the same node.
> >The existing VMD startup code doesn't automatically determine when sharing
> >like this occurs, so it's just behaving the same way it would if you had
> >launched the job on 8 completely separate cluster nodes. You can set some
> >environment variables to restrict the number of shared memory threads
> >VMD/Tachyon use if you really want to run all of your ranks on the same node.
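> >
> >For example, something like this in the job script (a sketch;
> >VMDFORCECPUCOUNT is the VMD environment variable that caps the CPU
> >count VMD detects, the thread counts are placeholders, and the
> >built-in Tachyon's behavior may vary by build):
> >
> >   export VMDFORCECPUCOUNT=6    # limit each VMD rank to 6 threads
> >   export OMP_NUM_THREADS=6     # also cap any OpenMP code paths
> >   mpirun -np 8 vmd -dispdev text -e render.tcl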
> >
> >The warning you're getting from OpenMPI about multiple initialization
> >is interesting. When you compiled VMD, you didn't compile both VMD
> >and the built-in Tachyon with MPI enabled did you? If Tachyon is also
> >trying to call MPI_Init() or MPI_Init_Thread() that might explain
> >that particular error message. Have a look at that and make sure
> >that (for now at least) you're not compiling the built-in Tachyon
> >with MPI turned on, and let's see if we can rid you of the
> >OpenMPI initialization errors+warnings.
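> >
> >If you're not sure how the Tachyon library you linked against was
> >built, a quick sanity check (just a generic suggestion, assuming the
> >library is named libtachyon.a) is to look for MPI symbols in it:
> >
> >   nm libtachyon.a | grep -i mpi_init
> >
> >If that turns up MPI_Init references, rebuild Tachyon with a
> >threads-only (non-MPI) target before relinking VMD.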
> >
> >Best,
> > John Stone
> > vmd_at_ks.uiuc.edu
> >
> >On Mon, Apr 04, 2022 at 04:39:17PM +0200, Lenz Fiedler wrote:
> >>Dear VMD users and developers,
> >>
> >>
> >>I am facing a problem in running VMD using MPI.
> >>
> >>I compiled VMD from source (alongside Tachyon, which I would like to
> >>use for rendering). I had first checked everything in serial, there
> >>it worked. Now, after parallel compilation, I struggle to run VMD.
> >>
> >>For example, I am allocating 8 CPUs on a cluster node that has 24
> >>CPUs in total. Afterwards, I run:
> >>
> >>mpirun -np 8 vmd
> >>
> >>and I get this output:
> >>
> >>Info) VMD for LINUXAMD64, version 1.9.3 (April 4, 2022)
> >>Info) http://www.ks.uiuc.edu/Research/vmd/
> >>Info) Email questions and bug reports to vmd_at_ks.uiuc.edu
> >>Info) Please include this reference in published work using VMD:
> >>Info)    Humphrey, W., Dalke, A. and Schulten, K., `VMD - Visual
> >>Info)    Molecular Dynamics', J. Molec. Graphics 1996, 14.1, 33-38.
> >>Info) -------------------------------------------------------------
> >>Info) Initializing parallel VMD instances via MPI...
> >>Info) Found 8 VMD MPI nodes containing a total of 384 CPUs and 0 GPUs:
> >>Info)    0:  48 CPUs, 324.9GB (86%) free mem, 0 GPUs, Name: gv002.cluster
> >>Info)    1:  48 CPUs, 324.9GB (86%) free mem, 0 GPUs, Name: gv002.cluster
> >>Info)    2:  48 CPUs, 324.9GB (86%) free mem, 0 GPUs, Name: gv002.cluster
> >>Info)    3:  48 CPUs, 324.9GB (86%) free mem, 0 GPUs, Name: gv002.cluster
> >>Info)    4:  48 CPUs, 324.9GB (86%) free mem, 0 GPUs, Name: gv002.cluster
> >>Info)    5:  48 CPUs, 324.9GB (86%) free mem, 0 GPUs, Name: gv002.cluster
> >>Info)    6:  48 CPUs, 324.9GB (86%) free mem, 0 GPUs, Name: gv002.cluster
> >>Info)    7:  48 CPUs, 324.9GB (86%) free mem, 0 GPUs, Name: gv002.cluster
> >>--------------------------------------------------------------------------
> >>Open MPI has detected that this process has attempted to initialize
> >>MPI (via MPI_INIT or MPI_INIT_THREAD) more than once.  This is
> >>erroneous.
> >>--------------------------------------------------------------------------
> >>[gv002:139339] *** An error occurred in MPI_Init
> >>[gv002:139339] *** reported by process [530644993,2]
> >>[gv002:139339] *** on a NULL communicator
> >>[gv002:139339] *** Unknown error
> >>[gv002:139339] *** MPI_ERRORS_ARE_FATAL (processes in this
> >>communicator will now abort,
> >>[gv002:139339] ***    and potentially your MPI job)
> >>
> >>
> >>From the output it seems to me that each of the 8 MPI ranks assumes
> >>it is rank zero? At least the fact that each rank reports 48 CPUs
> >>(24*2 with hyperthreading, I assume?) makes me believe that.
> >>
> >>Could anyone give me a hint on what I might be doing wrong? The
> >>OpenMPI installation I am using has been used for many other
> >>programs on this cluster, so I would assume it is working correctly.
> >>
> >>
> >>Kind regards,
> >>
> >>Lenz
> >>
> >>--
> >>Lenz Fiedler, M. Sc.
> >>PhD Candidate | Matter under Extreme Conditions
> >>
> >>Tel.: +49 3581 37523 55
> >>E-Mail: l.fiedler_at_hzdr.de
> >>https://www.casus.science
> >>
> >>CASUS - Center for Advanced Systems Understanding
> >>Helmholtz-Zentrum Dresden-Rossendorf e.V. (HZDR)
> >>Untermarkt 20
> >>02826 Görlitz
> >>
> >>Vorstand: Prof. Dr. Sebastian M. Schmidt, Dr. Diana Stiller
> >>Vereinsregister: VR 1693 beim Amtsgericht Dresden
> >>
> >>
> >
> >
>

-- 
NIH Center for Macromolecular Modeling and Bioinformatics
Beckman Institute for Advanced Science and Technology
University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
http://www.ks.uiuc.edu/~johns/           Phone: 217-244-3349
http://www.ks.uiuc.edu/Research/vmd/