From: John Stone (johns_at_ks.uiuc.edu)
Date: Tue Aug 31 2010 - 10:57:35 CDT

Jerome,

On Tue, Aug 31, 2010 at 05:38:49PM +0200, Jérôme Hénin wrote:
> John,
>
> Whether VMD keeps using Stride or switches to a different program or
> implementation, my understanding is that per-timestep storage of SS
> will be needed anyway, right?

Yes, that's right. Per-timestep storage becomes substantially more
interesting, however, once we have a means of rapidly computing the SS data.
With the existing STRIDE code, the relatively low speed inhibits computing
SS separately for every frame by default. There's no reason not to implement
it in the core of VMD except the cost in development time, given that the
sscache script already provides a solution, albeit a very limited one.
If we had a fast STRIDE replacement, I would give this a far higher
priority than I do at the moment.
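
To make the idea of per-timestep SS storage concrete, here is a minimal
sketch in Python. It is not VMD code and not how sscache works internally;
the class name PerFrameSSCache and the assign callback are hypothetical,
and assign stands in for whatever SS program is used (STRIDE or a faster
replacement).

```python
from typing import Callable, Dict, Sequence

Coords = Sequence[Sequence[float]]  # one (x, y, z) triple per atom


class PerFrameSSCache:
    """Hypothetical per-timestep secondary structure store (not a VMD API)."""

    def __init__(self, assign: Callable[[Coords], str]) -> None:
        # `assign` is an assumed callback: it takes one frame's coordinates
        # and returns a string of secondary structure codes.
        self._assign = assign
        self._by_frame: Dict[int, str] = {}

    def get(self, frame: int, coords: Coords) -> str:
        # Compute on first access only, so a slow assignment is paid
        # once per frame rather than on every redraw.
        if frame not in self._by_frame:
            self._by_frame[frame] = self._assign(coords)
        return self._by_frame[frame]

    def invalidate(self, frame: int) -> None:
        # Drop a stale entry, e.g. after that frame's coordinates change.
        self._by_frame.pop(frame, None)
```

With a slow assign, every frame costs one full run on first display, which
is why a fast replacement changes the priority of this feature.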

> About Stride itself: I don't know of a better algorithm at this point.
> Optimizing the current implementation would require an extended
> license agreement in any case[1], so I suppose it would ideally be
> done in collaboration with the authors (at least Prof. Frishman).
>
> If the licensing issue cannot be sorted out, then re-implementing from
> scratch will be the only way. I'd offer to contribute, but anything
> CUDA-related is beyond my coding abilities.

One could either collaborate, or one could write an entirely new
piece of code based on the published STRIDE algorithm but not
reuse any of the original code. It would be great to collaborate
with Dr. Frishman if he has the time and inclination, and it might
make it easier to make a STRIDE 2.0 first, before doing more
radical things like SSE, CUDA, OpenCL, etc.

In the case of a more radical improvement, such as x86 SSE or
GPU-accelerated versions of a new STRIDE code, the rewrite is likely
to be the direction one would ultimately want to take anyway.
A good place to start would be to read the STRIDE papers
and analyze the existing code to determine which parts of the
current code are really important to the algorithm, find and fix
some of the flaws I mentioned earlier (e.g. bad behavior for
structures far from the origin, atom name issues, etc.), and
then work out what is needed to write a replacement
that does better than the existing STRIDE version.
The whole STRIDE code is only 7,700 lines, less than half the
size of Tachyon, and likely much simpler, and I'm guessing it contains
a lot of utility routines that also already exist in some form in
VMD itself (e.g. grid search, bond determination, etc.).
So maybe the real meat of the code is only roughly half the total?
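
Two of the flaws mentioned above (structures far from the origin and atom
naming variations) could in principle be handled in a preprocessing step
before any SS code sees the data. The sketch below is Python and purely
illustrative: the function preprocess and the alias table NAME_ALIASES are
hypothetical, the two aliases shown are only examples of naming differences
between conventions, and the thread does not say which names actually
cause trouble. Recentering relies on the assumption that SS assignment
depends only on relative atom positions, so a rigid translation should not
change the result.

```python
from typing import Dict, List, Tuple

Atom = Tuple[str, float, float, float]  # (atom name, x, y, z)

# Hypothetical alias table; illustrative entries only.
NAME_ALIASES: Dict[str, str] = {"HN": "H", "OT1": "O"}


def preprocess(atoms: List[Atom]) -> List[Atom]:
    """Recenter on the centroid and map alias atom names to one convention."""
    if not atoms:
        return []
    n = len(atoms)
    # Centroid of all atoms; subtracting it moves the structure to the origin.
    cx = sum(a[1] for a in atoms) / n
    cy = sum(a[2] for a in atoms) / n
    cz = sum(a[3] for a in atoms) / n
    out: List[Atom] = []
    for name, x, y, z in atoms:
        # Normalize whitespace and case before looking up an alias.
        key = name.strip().upper()
        out.append((NAME_ALIASES.get(key, key), x - cx, y - cy, z - cz))
    return out
```

A real replacement would need a vetted name table and would have to map
results back to the original atom order, which this sketch preserves.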

>
> Cheers,
> Jerome
>
> [1] The Stride license does allow for modifying the code, but it
> contains a copyleft-style restriction ("viral" copyright) and a
> non-commercial-only provision, both of which conflict - as far as I
> understand - with VMD's permissive license.
>
>
> On 30 August 2010 18:57, John Stone <johns_at_ks.uiuc.edu> wrote:
> >
> > Jerome,
> >  It would be relatively easy to teach VMD to store per-timestep
> > secondary structure data so that one wouldn't have to use the sscache
> > script.  I haven't put any effort into doing this, in part because I'd
> > been hoping to come up with a better tool for SS determination than
> > STRIDE.  Beyond the single-threading limitation that the current
> > version of STRIDE has, it also has various issues with
> > very large structures, structures that are far from the origin, and
> > it often doesn't behave well with minor variations in atom naming
> > conventions.
> >
> > My thought has been that rather than running multiple instances of STRIDE
> > in parallel, it might be time to begin a project to rewrite STRIDE itself,
> > and address not only the performance issue that you're experiencing, but
> > also some of the other problems I described above.  Even better would
> > be to make a GPU accelerated version of STRIDE, applicable to very large
> > structures where even a multi-core CPU version would leave a lot to be
> > desired in terms of performance.
> >
> > I haven't taken any action on rewriting STRIDE as I haven't been able
> > to justify spending the time yet, but every month that goes by, it becomes
> > a bigger issue, so it may soon be time to deal with it whether it is
> > convenient or not.
> >
> > If a small group of people wanted to undertake the work collectively,
> > that would almost certainly be the best way to go about getting it done.
> >
> > Cheers,
> >  John
> >
> >
> > On Mon, Aug 30, 2010 at 06:28:45PM +0200, Jérôme Hénin wrote:
> >> Hi all,
> >>
> >> The current system VMD uses for protein secondary structure
> >> assignment, while it works, has shortcomings. One is performance: when
> >> dealing with large trajectories, the initial assignment performed by
> >> the sscache script is fairly slow. One idea would be to take advantage
> >> of multicore machines by running stride on multiple frames at once.
> >> The issue here is that VMD itself knows only one secondary structure
> >> assignment per molecule - right now that's handled fully in Tcl by
> >> sscache. After a quick look at threads in Tcl, I've decided that it
> >> might not be a good idea to parallelize on that side. So it seems that
> >> as long as VMD does not have secondary structure as per-timestep data,
> >> we're stuck.
> >>
> >> Another idea is to avoid writing to disk the data fed to Stride -
> >> either using named pipes, or requesting from the authors the
> >> permission to link Stride into VMD. I am not sure how much performance
> >> would be gained by doing this though - it could be that disk caching
> >> reduces the performance impact of the current implementation to almost
> >> zero.
> >>
> >> I'm sure others have been thinking about this for a while, so I'd like
> >> to hear your perspective. I can name at least two people who are going
> >> to answer this :-)
> >>
> >> Cheers,
> >> Jerome
> >>
> >>
> >> PS: unsubscribe vmd-l. Seriously guys. And namd-l too. I mean it.
> >
> > --
> > NIH Resource for Macromolecular Modeling and Bioinformatics
> > Beckman Institute for Advanced Science and Technology
> > University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
> > Email: johns_at_ks.uiuc.edu                 Phone: 217-244-3349
> >  WWW: http://www.ks.uiuc.edu/~johns/      Fax: 217-244-6078
> >

-- 
NIH Resource for Macromolecular Modeling and Bioinformatics
Beckman Institute for Advanced Science and Technology
University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
Email: johns_at_ks.uiuc.edu                 Phone: 217-244-3349
  WWW: http://www.ks.uiuc.edu/~johns/      Fax: 217-244-6078