
From: Anurag Sethi (anurag.sethi_at_gmail.com)
Date: Fri Sep 16 2011 - 17:04:29 CDT

Hi,

The problem does not reduce to two dimensions in the absence of gaps.

In structure QR factorization, the dimensions of the matrix are 4 * Naln *
Nstr, where Naln is the number of alignment columns (which reduces to the
number of residues in the absence of gaps) and Nstr is the number of
structures. The 4 refers to four dimensions that are treated as orthogonal
to one another: one each for the x, y, and z coordinates of the C-alpha
atoms in each structure, and one additional dimension for gaps.

In the absence of gaps, it reduces to 3 * Naln * Nstr, which is still a
multidimensional problem, but it should be faster than the original QR
code. The code for sequence QR is written in C++ and is much faster than
Matlab. I wouldn't recommend using Matlab, because that is what we used in
the early days of QR factorization (until Elijah Roberts coded the
algorithm in C++).
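
As a rough illustration of the array layout described above, the following numpy sketch arranges aligned C-alpha coordinates into a 3 x Naln x Nstr array, or 4 x Naln x Nstr when a gap mask is supplied. It is not the MultiSeq implementation. The function name `build_structure_array`, the input shapes, the 0/1 encoding of the gap dimension, and the zeroing of coordinates at gap positions are all assumptions made for this example.

```python
import numpy as np

def build_structure_array(coords, gap_mask=None):
    """Arrange aligned C-alpha coordinates as (3 or 4, Naln, Nstr).

    coords   : array of shape (Nstr, Naln, 3), x/y/z per alignment column.
    gap_mask : optional boolean array of shape (Nstr, Naln), True at gaps.
               (Hypothetical input; the 0/1 gap encoding is an assumption.)
    """
    coords = np.asarray(coords, dtype=float)
    # Move the x/y/z axis to the front: (Nstr, Naln, 3) -> (3, Naln, Nstr).
    xyz = np.transpose(coords, (2, 1, 0))
    if gap_mask is None:
        return xyz  # no gaps: 3 * Naln * Nstr entries

    # Gap indicator as a fourth "dimension": shape (1, Naln, Nstr).
    gaps = np.asarray(gap_mask, dtype=bool).T[np.newaxis]
    # Assumption: coordinates at gap positions carry no information,
    # so they are set to zero here.
    xyz = np.where(gaps, 0.0, xyz)
    return np.concatenate([xyz, gaps.astype(float)], axis=0)

# Example: 5 structures, 10 alignment columns, no gaps.
example = build_structure_array(np.zeros((5, 10, 3)))
print(example.shape)
```

For an MD trajectory of a single protein, every frame has the same residues, so `gap_mask` would be omitted and only the three coordinate dimensions remain.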

Also, a final point: using QR factorization, you cannot order more than
Naln structures. That is an algorithmic constraint that I can explain
offline if you want. You can still use it for clustering, if I am not
mistaken, but you might have to write code based on the Q-score of
different structures during the trajectory (you can send me a message
offline about how you should be able to do this if you want).
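
To give a feel for why a pivoted QR can only order a limited number of structures, here is a small numpy sketch of a plain two-dimensional, column-pivoted ordering on a flattened matrix. This is the simplified 2D analogue, not the multidimensional structure QR described above, so it does not reproduce the Naln bound itself: in the flattened case the ordering stops at the number of rows. The function `pivoted_order`, the tolerance, and the random test matrix are all assumptions made for illustration.

```python
import numpy as np

def pivoted_order(A, tol=1e-10):
    """Greedy column-pivoted ordering (Gram-Schmidt form of pivoted QR).

    Repeatedly picks the column with the largest residual norm, then
    removes its direction from the remaining columns. Stops once the
    remaining columns lie (numerically) in the span of the chosen ones.
    """
    R = np.array(A, dtype=float)            # working copy of residuals
    remaining = list(range(R.shape[1]))
    order = []
    scale = np.linalg.norm(R, axis=0).max()
    while remaining:
        norms = np.linalg.norm(R[:, remaining], axis=0)
        k = int(np.argmax(norms))
        if norms[k] <= tol * scale:
            break                           # nothing new left to rank
        j = remaining.pop(k)
        q = R[:, j] / norms[k]              # unit vector for chosen column
        # Project the chosen direction out of every remaining column.
        R[:, remaining] -= np.outer(q, q @ R[:, remaining])
        order.append(j)
    return order

# Hypothetical data: 6 rows (e.g. 2 atoms * 3 coordinates), 20 frames.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 20))
order = pivoted_order(A)
print(len(order), "of", A.shape[1], "columns could be ordered")
```

Once as many columns have been chosen as there are independent rows, every remaining column has essentially zero residual, so any further ordering would be arbitrary.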

The person who has used QR factorization for ordering structures in a
trajectory is Rommie Amaro. While she was a postdoc with Andy McCammon,
this was used to find the most distinct conformations for drug docking in
their study.

Regards,
Anurag

On Fri, Sep 16, 2011 at 14:42, Vijay Vammi <vsvammi_at_iastate.edu> wrote:

> Hi John,
>
> Thanks for looking into it.
>
> But I am still interested to know if anyone has done this before, and to
> confirm whether I am right that the problem reduces to two-dimensional QR
> factorization when dealing with MD trajectories.
> I am writing numpy code for it and will check it against the existing
> implementation on a smaller dataset.
>
> Thanks
> Santhosh
>
>
> On Fri, Sep 16, 2011 at 3:02 PM, John Stone <johns_at_ks.uiuc.edu> wrote:
>
>>
>> Hi,
>> It sounds to me like whichever GUI is being used is causing VMD to
>> load all of the files concurrently, and that the host operating system
>> is running out of open file handles. If the file loading is done through
>> multiseq, then it may be that we need to add a "waitfor all" to the
>> code that causes the list of files to be loaded. I'll talk to Kirby
>>
>> Cheers,
>> John Stone
>> vmd_at_ks.uiuc.edu
>>
>> On Fri, Sep 16, 2011 at 02:52:47PM -0500, Vijay Vammi wrote:
>> > Hi Kirby,
>> >
>> > I am writing the code in numpy for QR factorization. Since I don't have
>> > any gaps in my alignment, the problem reduces to simple QR factorization.
>> >
>> > I get this error message: "couldn't read directory "/usr/tmp/": too many
>> > open files" after 1092 files. This limits me to using only 1092 files.
>> >
>> > I am loading using the GUI, selecting all the files from import data. I do
>> > have
>> >
>> > Thanks,
>> > Santhosh
>> >
>> > On Fri, Sep 16, 2011 at 2:22 PM, Kirby Vandivort <kvandivo_at_gmail.com>
>> > wrote:
>> >
>> > Santhosh,
>> >
>> > version of MultiSeq that is being distributed with VMD 1.9 (are you using
>> > the latest version?), so I want to make sure that everything is working
>> > as it should.
>> >
>> > How are you attempting to load the 16,000 frames?
>> >
>> > Thanks,
>> >
>> > Kirby
>> >
>> > On Fri, Sep 16, 2011 at 8:13 AM, Vijay Vammi <vsvammi_at_iastate.edu>
>> > wrote:
>> >
>> > Hello,
>> >
>> > I have an 8 ns simulation with about 16000 frames. I want to cluster
>> > these frames to get representative structures.
>> >
>> > I want to use the QR factorization method as described in
>> > "Evolutionary Profiles Derived from the QR Factorization of Multiple
>> > Structural Alignments Gives an Economy of Information" by Patrick
>> > O'Donoghue and Zaida Luthey-Schulten. I see that this has been part of
>> > the multiseq plugin in VMD. I have a couple of questions regarding its
>> > use:
>> >
>> > 1). Since we are dealing with a rather large number of PDBs, is it
>> > advisable that I run this via the command line instead of the GUI? I
>> > tried using the GUI, but I see that multiseq did not load all the frames,
>> > only 2000 of them.
>> >
>> > 2). In the actual implementation the problem becomes
>> > multi-dimensional because of gaps. Since I don't have gaps in the
>> > structure alignment, the problem should be reducible to a simple QR
>> > factorization. (noofCAAtoms*3 X numframes would be the size of the
>> > matrix I am trying to decompose.) Would it be much faster and better if
>> > I use numpy or Matlab instead of VMD to get this done? Please correct me
>> > if I am wrong here.
>> >
>> > Any help or advice on this is appreciated.
>> >
>> > Thanks
>> > Santhosh
>> >
>> > --
>> >
>> > Kirby Vandivort
>>
>> --
>> NIH Resource for Macromolecular Modeling and Bioinformatics
>> Beckman Institute for Advanced Science and Technology
>> University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
>> http://www.ks.uiuc.edu/~johns/ Phone: 217-244-3349
>> http://www.ks.uiuc.edu/Research/vmd/ Fax: 217-244-6078
>>
>
>