From: Vijay Vammi (vsvammi_at_iastate.edu)
Date: Wed Sep 21 2011 - 15:06:13 CDT

Hi Kirby,

Thanks for the reply.
I intended to use MultiSeq for my project, but it is too slow to load all my
PDB files in VMD and then run structure QR.

I was able to run the program from the command line itself, and it's much faster.
I am only interested in the QR order, not the clusters, so running from the
command line solves my problem.
I do not have a smaller dataset that causes this issue. It worked fine
up to 2000 files or so and only crashed with a larger number of files (if that
is any hint).
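
For reference, here is a minimal sketch of how a QR order could be obtained in
numpy when the alignment has no gaps. This is not the MultiSeq code: it is plain
column-pivoted QR (done as greedy Gram-Schmidt) on the 3*Naln x Nstr coordinate
matrix. The function name qr_order, the tol parameter, and the assumption that
the frames are already superimposed are all mine.

```python
import numpy as np

def qr_order(coords, tol=1e-10):
    """Rank frames by column-pivoted QR (greedy Gram-Schmidt).

    coords: array of shape (n_frames, n_atoms, 3) holding C-alpha
            coordinates of frames that are ASSUMED to be pre-aligned.
    Returns the frame indices in pivot order (most distinct first).
    """
    n_frames = np.shape(coords)[0]
    # One column per frame, 3 * n_atoms rows (hypothetical flattening).
    A = np.asarray(coords, dtype=float).reshape(n_frames, -1).T.copy()
    order = []
    first_norm = None
    for _ in range(min(A.shape)):
        norms = np.linalg.norm(A, axis=0)
        j = int(np.argmax(norms))          # column with largest residual
        if first_norm is None:
            first_norm = norms[j]
        if norms[j] <= tol * first_norm:   # nothing independent is left
            break
        order.append(j)
        q = A[:, j] / norms[j]
        A -= np.outer(q, q @ A)            # project q out of every column
    return order
```

Note that for this flattened matrix the returned list can never be longer than
min(3*Naln, Nstr), because that is the largest possible rank. With 76 residues
this sketch would rank at most 228 frames, however many are supplied.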

Sorry could not be of help,
Thanks
Santhosh

On Wed, Sep 21, 2011 at 2:52 PM, Kirby Vandivort <kvandivo_at_ks.uiuc.edu> wrote:

> Santhosh,
>
> From looking at the multiseq code, the only thing that could be causing
> the problem is the 'qpair' program not returning any data. I can add
> in error checking that will prevent the specific error message you are
> getting, but it won't actually solve your problem to do that. I would
> prefer to figure out why qpair isn't returning anything and go from there.
>
> Do you have a minimal data set (that doesn't have 4,000 files) that creates
> this problem? If so, that would be helpful to debug what is causing the
> problem you are having.
>
> Kirby
>
> Mon, Sep 19, 2011 at 02:33:34PM -0500: Vijay Vammi sent 401 lines:
> > Hi
> >
> > I was finally able to load my pdb files using vmd -m and was able to load
> > the files in multiseq.
> >
> > But when I run Multiseq select NR set with all the structures loaded, I get
> > the following error message:
> >
> > MultiSeq Error)
> > missing operand at _@_
> > in expression "1.0-_@_"
> > (parsing expression "1.0-")
> > invoked from within
> > "expr 1.0-[lindex $scores [lindex $ordering $j]]"
> > (procedure "::Libbiokit::getNonRedundantStructures" line 65)
> > invoked from within
> > "::Libbiokit::getNonRedundantStructures $structureIDs $options(qhCutoff)
> > $numberSequencesToPreserve ".
> >
> > I currently have 4000 files in the alignment, and the protein has only 76
> > residues. This ran well when the number of structures was less than the
> > number of residues. Should any modification be made when this special case
> > happens?
> >
> > Thanks,
> > Santhosh
> >
> > On Fri, Sep 16, 2011 at 5:04 PM, Anurag Sethi <anurag.sethi_at_gmail.com> wrote:
> >
> > > Hi,
> > >
> > > The problem does not reduce to two dimensions in the absence of gaps.
> > >
> > > In structure QR factorization, the dimensions of the matrix are 4 * Naln *
> > > Nstr, where Naln refers to the number of alignment columns (which reduces to
> > > the number of residues in the absence of gaps), and Nstr refers to the number
> > > of structures. The 4 refers to four dimensions that are treated as orthogonal
> > > to one another: one each for the x-, y-, and z-coordinates of the C-alpha
> > > atoms in each structure, and one additional dimension for gaps.
> > >
> > > In the absence of gaps, it reduces to 3*Naln*Nstr, which is still a
> > > multidimensional problem, but it should still be faster than the original QR
> > > code. The code for sequence QR is written in C++ and is way faster than
> > > Matlab. I wouldn't recommend using Matlab because that is what we used to
> > > do in the early days of QR factorization (until Elijah Roberts coded the
> > > algorithm in C++).
> > >
> > > Also, a final point is that using QR factorization, you cannot order more
> > > than Naln structures. That is an algorithmic constraint that I can explain
> > > offline if you want. You can still use it for clustering, if I am not wrong,
> > > but you might have to write code based on the Q-score of different
> > > structures during the trajectory (you can send me a message offline about
> > > how you should be able to do this if you want).
> > >
> > > The person who has used QR factorization for ordering structures in a
> > > trajectory is Rommie Amaro. While she was a postdoc with Andy McCammon,
> > > this was used to find the most unique conformations for drug docking in
> > > their study.
> > >
> > > Regards,
> > > Anurag
> > >
> > > On Fri, Sep 16, 2011 at 14:42, Vijay Vammi <vsvammi_at_iastate.edu> wrote:
> > >
> > >> Hi John,
> > >>
> > >> Thanks for looking into it.
> > >>
> > >> But I am still interested to know whether anyone has done this before and
> > >> can confirm that I am right when I say the problem reduces to
> > >> two-dimensional QR factorization when dealing with MD trajectories.
> > >> I am writing numpy code for it and will check it against the existing
> > >> implementation on a smaller dataset.
> > >>
> > >> Thanks
> > >> Santhosh
> > >>
> > >>
> > >> On Fri, Sep 16, 2011 at 3:02 PM, John Stone <johns_at_ks.uiuc.edu> wrote:
> > >>
> > >>>
> > >>> Hi,
> > >>> It sounds to me like whichever GUI is being used is causing VMD to
> > >>> load all of the files concurrently, and that the host operating system
> > >>> is running out of open file handles. If the file loading is done through
> > >>> multiseq, then it may be that we need to add a "waitfor all" to the
> > >>> code that causes the list of files to be loaded. I'll talk to Kirby
> > >>> about this and see what we come up with.
> > >>>
> > >>> Cheers,
> > >>> John Stone
> > >>> vmd_at_ks.uiuc.edu
> > >>>
> > >>> On Fri, Sep 16, 2011 at 02:52:47PM -0500, Vijay Vammi wrote:
> > >>> > Hi Kirby,
> > >>> >
> > >>> > I am writing the code in numpy for QR factorization. Since I don't have
> > >>> > any gaps in my alignment, the problem reduces to simple QR factorization.
> > >>> >
> > >>> > But to answer your question:
> > >>> > I get this error message: "couldn't read directory "/usr/tmp/": too
> > >>> > many open files" after 1092 files. This limits me to only 1092 files.
> > >>> >
> > >>> > I am loading using the GUI, selecting all the files from import data. I
> > >>> > do have the latest version.
> > >>> >
> > >>> > Thanks,
> > >>> > Santhosh
> > >>> >
> > >>> > On Fri, Sep 16, 2011 at 2:22 PM, Kirby Vandivort <kvandivo_at_gmail.com> wrote:
> > >>> >
> > >>> > Santhosh,
> > >>> >
> > >>> > We have successfully loaded about 100,000 sequences into the version of
> > >>> > MultiSeq that is being distributed with VMD 1.9 (are you using the
> > >>> > latest version?), so I want to make sure that everything is working as
> > >>> > it should.
> > >>> >
> > >>> > How are you attempting to load the 16,000 frames?
> > >>> >
> > >>> > Thanks,
> > >>> >
> > >>> > Kirby
> > >>> >
> > >>> > On Fri, Sep 16, 2011 at 8:13 AM, Vijay Vammi <vsvammi_at_iastate.edu> wrote:
> > >>> >
> > >>> > Hello,
> > >>> >
> > >>> > I have an 8 ns simulation with about 16000 frames. I want to cluster
> > >>> > these frames to get representative structures.
> > >>> >
> > >>> > I want to use the QR factorization method as described in
> > >>> > "Evolutionary Profiles Derived from the QR Factorization of Multiple
> > >>> > Structural Alignments Gives an Economy of Information" by Patrick
> > >>> > O'Donoghue and Zaida Luthey-Schulten. I see that this has been part of
> > >>> > the multiseq plugin in VMD. I have a couple of questions regarding its
> > >>> > use:
> > >>> >
> > >>> > 1). Since we are dealing with a rather large number of PDBs, is it
> > >>> > advisable that I run this via the command line instead of the GUI? I
> > >>> > tried using the GUI, but I see that multiseq did not load all the
> > >>> > frames, only 2000 of them.
> > >>> >
> > >>> > 2). In the actual implementation the problem becomes
> > >>> > multi-dimensional because of gaps. Since I don't have gaps in the
> > >>> > structure alignment, the problem should be reducible to traditional QR
> > >>> > factorization (noofCAAtoms*3 X numframes would be the size of the
> > >>> > matrix I am trying to decompose). Would it be much faster and better
> > >>> > if I use numpy or Matlab instead of VMD to get this done? Please
> > >>> > correct me if I am wrong here.
> > >>> >
> > >>> > Any help or advice on this is appreciated.
> > >>> >
> > >>> > Thanks
> > >>> > Santhosh
> > >>> >
> > >>> > --
> > >>> >
> > >>> > Kirby Vandivort
> > >>>
> > >>> --
> > >>> NIH Resource for Macromolecular Modeling and Bioinformatics
> > >>> Beckman Institute for Advanced Science and Technology
> > >>> University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
> > >>> http://www.ks.uiuc.edu/~johns/ Phone: 217-244-3349
> > >>> http://www.ks.uiuc.edu/Research/vmd/ Fax: 217-244-6078
> > >>>
> > >>
> > >>
> > >
>
> --
>
> Kirby Vandivort Theoretical and
> Senior Research Programmer Computational Biophysics
> Email: kvandivo_at_ks.uiuc.edu 3061 Beckman Institute
> http://www.ks.uiuc.edu/~kvandivo/ University of Illinois
> Phone: (217) 244-1928 405 N. Mathews Ave
> Fax : (217) 244-6078 Urbana, IL 61801, USA
>