From: John Stone (johns_at_ks.uiuc.edu)
Date: Mon Aug 30 2010 - 11:57:25 CDT

Jerome,
  It would be relatively easy to teach VMD to store per-timestep
secondary structure data so that one wouldn't have to use the sscache
script. I haven't put any effort into doing this, in part because I'd
been hoping to come up with a better tool for SS determination than
STRIDE. Beyond its single-threading limitation, the current version
of STRIDE also has problems with very large structures and with
structures that are far from the origin, and it often misbehaves
when there are minor variations in atom naming conventions.

My thought has been that rather than running multiple instances of STRIDE
in parallel, it might be time to begin a project to rewrite STRIDE itself,
and address not only the performance issue that you're experiencing, but
also some of the other problems I described above. Even better would
be to make a GPU accelerated version of STRIDE, applicable to very large
structures where even a multi-core CPU version would leave a lot to be
desired in terms of performance.
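
For comparison, the parallel-instances approach could be prototyped
outside of VMD in a few lines. What follows is only a rough Python
sketch, not how sscache or VMD actually work. It assumes that each frame
has already been written to its own PDB file, that a `stride` executable
is on the PATH and prints its report to standard output, and that the
one-letter structure code is the sixth whitespace-separated field of
each ASG record. The function names are made up for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def assign_frame(pdb_path, command=("stride",)):
    """Run the external program on one frame file and return its
    per-residue one-letter codes as a single string."""
    result = subprocess.run(
        [*command, pdb_path],
        capture_output=True,
        text=True,
        check=True,  # raise if the program exits with an error
    )
    codes = []
    for line in result.stdout.splitlines():
        fields = line.split()
        # Assumed record layout:
        # ASG <resname> <chain> <resid> <index> <code> <name> ...
        if len(fields) > 5 and fields[0] == "ASG":
            codes.append(fields[5])
    return "".join(codes)


def assign_trajectory(pdb_paths, command=("stride",), workers=4):
    """Process several frame files concurrently, preserving frame order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: assign_frame(p, command), pdb_paths))
```

Threads are sufficient here because each worker only waits on a child
process. The per-frame process startup and file I/O costs remain, so
this does nothing for the other problems listed above.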

I haven't taken any action on rewriting STRIDE as I haven't been able
to justify spending the time yet, but every month that goes by, it becomes
a bigger issue, so it may soon be time to deal with it whether it is
convenient or not.

If a small group of people wanted to undertake the work collectively,
that would almost certainly be the best way to go about getting it done.

Cheers,
  John

On Mon, Aug 30, 2010 at 06:28:45PM +0200, Jérôme Hénin wrote:
> Hi all,
>
> The current system VMD uses for protein secondary structure
> assignment, while it works, has shortcomings. One is performance: when
> dealing with large trajectories, the initial assignment performed by
> the sscache script is fairly slow. One idea would be to take advantage
> of multicore machines by running stride on multiple frames at once.
> The issue here is that VMD itself knows only one secondary structure
> assignment per molecule - right now that's handled fully in Tcl by
> sscache. After a quick look at threads in Tcl, I've decided that it
> might not be a good idea to parallelize on that side. So it seems that
> as long as VMD does not have secondary structure as per-timestep data,
> we're stuck.
>
> Another idea is to avoid writing to disk the data fed to Stride -
> either using named pipes, or requesting from the authors the
> permission to link Stride into VMD. I am not sure how much performance
> would be gained by doing this though - it could be that disk caching
> reduces the performance impact of the current implementation to almost
> zero.
>
> I'm sure others have been thinking about this for a while, so I'd like
> to hear your perspective. I can name at least two people who are going
> to answer this :-)
>
> Cheers,
> Jerome
>
>
> PS: unsubscribe vmd-l. Seriously guys. And namd-l too. I mean it.

-- 
NIH Resource for Macromolecular Modeling and Bioinformatics
Beckman Institute for Advanced Science and Technology
University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
Email: johns_at_ks.uiuc.edu                 Phone: 217-244-3349
  WWW: http://www.ks.uiuc.edu/~johns/      Fax: 217-244-6078