From: Oliver Beckstein (orbeckst_at_jhmi.edu)
Date: Wed Nov 21 2007 - 19:05:14 CST

> John,
>
> Thank you very much for the explanation, it was very enlightening!

Yes, thanks from me as well; I wondered, too.

>>>> Hi Marcos, Oliver,
>>>> While inconvenient due to the way the authors of PMEPot and VolMap
>>>> wrote their code, it can still be done using BigDCD by changing the
>>>> BigDCD script to load batches of frames before triggering an
>>>> execution
>>>> of VolMap or PMEPot. In order to work around the limitation of
>>>> these two codes, you'd have to do the averaging of the batches in
>>>> your own script, as neither of these two tools lets the user
>>>> "continue" a partial calculation.
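
That is essentially what I ended up doing. In case it helps others,
here is roughly how I drive VMD from the outside, one batch at a time.
Only a sketch: the psf/dcd names, the selection, and the frame counts
are placeholders for whatever your system needs, and you have to know
the total number of frames beforehand (e.g. from catdcd). Each run
writes one batchNNNN.dx map; averaging comes below.

  #!/usr/bin/env python
  """Run VolMap over a huge trajectory in batches of frames,
  using VMD in text mode; writes one dx map per batch."""
  import os

  PSF, DCD = "system.psf", "huge.dcd"    # placeholders for your files
  NFRAMES, BATCH = 5000, 500             # total frames and batch size

  # Tcl template: load one batch of frames, run volmap, write a dx map
  TCL = """
  mol new %(psf)s
  mol addfile %(dcd)s type dcd first %(first)d last %(last)d waitfor all
  set sel [atomselect top "name OH2"]
  volmap density $sel -res 1.0 -allframes -combine avg -o %(out)s
  quit
  """

  for i, first in enumerate(range(0, NFRAMES, BATCH)):
      last = min(first + BATCH - 1, NFRAMES - 1)
      open("batch.tcl", "w").write(
          TCL % dict(psf=PSF, dcd=DCD, first=first, last=last,
                     out="batch%04d.dx" % i))
      # text mode, no graphics; only one batch is ever in memory
      os.system("vmd -dispdev text -e batch.tcl")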

At the moment I only need densities from VolMap, so I can get away
with averaging the density maps computed for each batch (I have some
Python code to manipulate the dx files, in case anyone needs it; I'm
just better with Python than with Tcl...).
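
For the record, the averaging itself is only a few lines. A minimal
sketch: it assumes that all maps come from VolMap runs on the identical
grid (so the dx headers match and only the data values differ) and that
every batch contains the same number of frames, so equal weights are
correct:

  #!/usr/bin/env python
  """Average OpenDX maps, e.g.: python avg_dx.py averaged.dx batch*.dx"""
  import sys

  def read_dx(filename):
      # split a dx file into header lines, data values and footer lines
      header, data, footer = [], [], []
      state = "header"
      for line in open(filename):
          if state == "header":
              header.append(line)
              if "data follows" in line:       # end of the dx header
                  state = "data"
          elif state == "data":
              s = line.strip()
              if s and s[0] in "0123456789+-.":
                  data.extend(float(x) for x in s.split())
              else:                            # 'attribute'/'object' trailer
                  state = "footer"
                  footer.append(line)
          else:
              footer.append(line)
      return header, data, footer

  def write_dx(filename, header, data, footer):
      out = open(filename, "w")
      out.writelines(header)
      for i in range(0, len(data), 3):         # three values per row
          out.write(" ".join("%g" % v for v in data[i:i+3]) + "\n")
      out.writelines(footer)
      out.close()

  if __name__ == "__main__":
      outname, innames = sys.argv[1], sys.argv[2:]
      header, total, footer = read_dx(innames[0])
      for name in innames[1:]:
          data = read_dx(name)[1]
          assert len(data) == len(total), "grid mismatch in " + name
          total = [a + b for a, b in zip(total, data)]
      n = float(len(innames))
      write_dx(outname, header, [v / n for v in total], footer)

(A real version should of course compare the grid headers, not just the
number of data values.)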

Thanks,
Oliver

>
> Cheers,
> Michel
>
> 2007/11/21, John Stone <johns_at_ks.uiuc.edu>:
>>
>> Michel,
>> To answer your question, in short, VMD has evolved tremendously from
>> where it started...
>> The main reason why VMD keeps timesteps in physical memory is that it
>> was originally more of a visualization tool and less of an analysis
>> tool; specifically, it was originally developed to run in the CAVE,
>> an immersive VR environment that requires screen redraws of 20-30 fps
>> in
>> order to be usable at all. In that type of environment, it's
>> typically
>> not practical to load things from disk during visualization unless
>> you have
>> a huge RAID array and use separate I/O threads for read-ahead while
>> the
>> main thread is doing visualization.
>>
>> For the purposes of desktop visualization, loading timesteps from disk
>> on-the-fly is typically annoyingly slow for anything but batch movie
>> making, unless you have a very small simulation. That's the reason
>> why it was originally written to load timesteps directly into memory.
>> As the program has become much more powerful on the analysis side,
>> we've progressively taken steps to make it possible to run batch
>> analyses, and added significant functionality in support of
>> out-of-core scripting such as BigDCD.
>>
>> The next step is to teach the VMD internals and the molfile plugins to
>> directly support the use of arbitrarily large out-of-core data and use
>> program-managed I/O to load and evict timesteps as necessary. The
>> script-based approach we've used for the last five years has worked
>> well for
>> the types of analyses people did in VMD up to this point, but with
>> the large number of plugins and extensions that VMD now enjoys,
>> it makes much more sense to do this sort of thing in VMD itself
>> and not require script and/or plugin writers to worry
>> about this in most cases.
>>
>> Cheers,
>> John Stone
>> vmd_at_ks.uiuc.edu
>>
>> On Wed, Nov 21, 2007 at 01:03:25AM +0100, L. Michel Espinoza-Fonseca
>> wrote:
>>> Hi John,
>>>
>>> That would be a great improvement in VMD. I know it sounds a bit
>>> ignorant, but I always wondered why VMD uses the physical memory to
>>> store the trajectories instead of using the hard disk to temporarily
>>> store them, for example. I also have huge trajectories (like 10 GB)
>>> and indeed, I always need to use the big machines to perform the
>>> analysis and/or to reduce the size of the files by keeping only the
>>> parts of the system I'm interested in. (Un)fortunately it seems that
>>> we're reaching a point where we can easily get tens of
>>> nanoseconds for systems with more than 200K atoms. That creates a lot
>>> of trouble when analyzing the huge trajectories created by NAMD :).
>>>
>>> Cheers,
>>> Michel
>>>
>>> 2007/11/21, John Stone <johns_at_ks.uiuc.edu>:
>>>>
>>>> Hi Marcos, Oliver,
>>>> While inconvenient due to the way the authors of PMEPot and VolMap
>>>> wrote their code, it can still be done using BigDCD by changing the
>>>> BigDCD script to load batches of frames before triggering an
>>>> execution
>>>> of VolMap or PMEPot. In order to work around the limitation of
>>>> these two codes, you'd have to do the averaging of the batches in
>>>> your own script, as neither of these two tools lets the user
>>>> "continue" a partial calculation. This is something that would be best solved in
>>>> in
>>>> a future rev of VMD by fixing both PMEPot and VolMap to allow an
>>>> existing calculation to be done in stages, or to allow continuation
>>>> by incorporating more frames, etc.
>>>>
>>>> I've already been planning to change the internals of VMD to allow
>>>> out-of-core data processing for huge datasets that can't possibly
>>>> fit
>>>> into the physical memory of the host machine, but it will probably
>>>> be
>>>> at least a couple more months before I have time to work on that
>>>> seriously
>>>> due to various ongoing efforts with NAMD and other projects.
>>>>
>>>> When I implement that feature, it will (almost) entirely eliminate
>>>> the
>>>> need for scripts like BigDCD to be used at all, as VMD will do this
>>>> automatically.
>>>>
>>>> Cheers,
>>>> John Stone
>>>> vmd_at_ks.uiuc.edu
>>>>
>>>> On Tue, Nov 20, 2007 at 02:47:21PM -0600, Marcos Sotomayor wrote:
>>>>>
>>>>> Hi John,
>>>>>
>>>>> I have had the same problem that Oliver mentioned. It would indeed be
>>>>> great and very useful if one could analyze big trajectories
>>>>> without using
>>>>> all the RAM of the most powerful computer in the lab...
>>>>>
>>>>> I know about and have used BigDCD before, but so far I don't see
>>>>> any easy way to use it along with VolMap and PMEPot (am I missing
>>>>> something?).
>>>>>
>>>>> Regards,
>>>>> Marcos.
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> Date: Tue, 20 Nov 2007 15:28:45 -0500
>>>>> From: Oliver Beckstein <orbeckst_at_jhmi.edu>
>>>>> To: vmd-l_at_ks.uiuc.edu
>>>>> Subject: vmd-l: analysing big trajectories
>>>>>
>>>>> Hi,
>>>>>
>>>>> Is there a way to analyze trajectories that are bigger than the
>>>>> available
>>>>> RAM? For instance, I have trajectories > 5GiB in size that I would
>>>>> like to
>>>>> analyze with VolMap but they can't be loaded because VMD insists
>>>>> on keeping
>>>>> the whole trajectory in memory.
>>>>>
>>>>> A cumbersome workaround would be to split the trajectory into
>>>>> smaller
>>>>> chunks, run volmap on each chunk, then average the resulting dx
>>>>> files.
>>>>> However, I can think of situations when a simple average is not
>>>>> enough (for
>>>>> instance for time correlation functions) and it would be very
>>>>> convenient if
>>>>> one could just have a (python-style) iterator over a trajectory
>>>>> (similar to
>>>>> the 'for timestep in universe.dcd: ....' idiom in
>>>>> http://code.google.com/p/mdanalysis/ ).
>>>>>
>>>>> (Note: I don't think that increasing swap space is a solution
>>>>> because that
>>>>> leads to the computer almost grinding to a halt when the trajectory
>>>>> is
>>>>> loaded.)
>>>>>
>>>>> Thanks,
>>>>> Oliver
>>>>>
>>>>> --
>>>>> Oliver Beckstein * orbeckst_at_jhmi.edu
>>>>>
>>>>> Johns Hopkins University, School of Medicine
>>>>> Dept. of Physiology, Biophysics 206
>>>>> 725 N. Wolfe St
>>>>> Baltimore, MD 21205, USA
>>>>>
>>>>> Tel.: +1 (410) 614-4435
>>>>
>>>> --
>>>> NIH Resource for Macromolecular Modeling and Bioinformatics
>>>> Beckman Institute for Advanced Science and Technology
>>>> University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
>>>> Email: johns_at_ks.uiuc.edu Phone: 217-244-3349
>>>> WWW: http://www.ks.uiuc.edu/~johns/ Fax: 217-244-6078
>>>>
>>
>> --
>> NIH Resource for Macromolecular Modeling and Bioinformatics
>> Beckman Institute for Advanced Science and Technology
>> University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
>> Email: johns_at_ks.uiuc.edu Phone: 217-244-3349
>> WWW: http://www.ks.uiuc.edu/~johns/ Fax: 217-244-6078
>>
>
>
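
P.S. Regarding the (python-style) iterator I asked about in my original
message below: the point is that one can then accumulate a quantity
frame by frame, with only a single timestep in memory at any time. A
rough sketch of a streaming density in the spirit of the mdanalysis
package linked above (API and file names are from memory/placeholders,
so treat them as approximate):

  import numpy
  from MDAnalysis import Universe  # http://code.google.com/p/mdanalysis/

  universe = Universe("system.psf", "huge.dcd")   # placeholder file names
  water = universe.selectAtoms("name OH2")        # water oxygens

  # 1 A bins over a hypothetical 60x60x60 A box
  bins = [numpy.linspace(0, 60, 61)] * 3
  counts = numpy.zeros((60, 60, 60))
  nframes = 0
  for ts in universe.dcd:                  # one frame in memory at a time
      h, edges = numpy.histogramdd(water.coordinates(), bins=bins)
      counts += h
      nframes += 1
  density = counts / nframes               # mean occupancy per bin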

--
Oliver Beckstein * orbeckst_at_jhmi.edu
Johns Hopkins University, School of Medicine
Dept. of Physiology, Biophysics 206
725 N. Wolfe St
Baltimore, MD 21205, USA
Tel.: +1 (410) 614-4435