From: Kirby Vandivort (kvandivo_at_ks.uiuc.edu)
Date: Wed Sep 21 2011 - 14:52:54 CDT

Santhosh,

From looking at the MultiSeq code, the only thing that could be causing
the problem is the 'qpair' program not returning any data. I can add
error checking that will prevent the specific error message you are
getting, but that won't actually solve your problem. I would prefer to
figure out why qpair isn't returning anything and go from there.

Do you have a minimal data set (one without 4,000 files) that reproduces
this problem? If so, it would help us debug what is causing it.

Kirby

Mon, Sep 19, 2011 at 02:33:34PM -0500: Vijay Vammi sent 401 lines:
> Hi
>
> I was finally able to load my PDB files using "vmd -m" and then load
> them into MultiSeq.
>
> But when I run MultiSeq's select NR set with all the structures loaded,
> I get the following error message:
>
> MultiSeq Error)
> missing operand at _@_
> in expression "1.0-_@_"
>     (parsing expression "1.0-")
>     invoked from within
> "expr 1.0-[lindex $scores [lindex $ordering $j]]"
>     (procedure "::Libbiokit::getNonRedundantStructures" line 65)
>     invoked from within
> "::Libbiokit::getNonRedundantStructures $structureIDs $options(qhCutoff) $numberSequencesToPreserve ".
>
> I currently have 4000 files in the alignment, and the protein has only
> 76 residues. This ran well when the number of structures was less than
> the number of residues. Is there any modification needed to handle this
> special case?
>
> Thanks,
> Santhosh
>
> On Fri, Sep 16, 2011 at 5:04 PM, Anurag Sethi <anurag.sethi_at_gmail.com> wrote:
>
> > Hi,
> >
> > The problem does not reduce to two dimensions in the absence of gaps.
> >
> > In structure QR factorization, the matrix has dimensions 4 x Naln x
> > Nstr, where Naln is the number of alignment columns (which reduces to
> > the number of residues in the absence of gaps) and Nstr is the number of
> > structures. The 4 refers to four dimensions that are treated as orthogonal
> > to one another: one each for the x, y, and z coordinates of the C-alpha
> > atoms in each structure, and one additional dimension for gaps.
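The numpy sketch below makes the 4 x Naln x Nstr layout concrete. It makes two illustrative assumptions, neither of which is MultiSeq's actual encoding. First, each aligned structure arrives as an (Naln, 3) array of C-alpha coordinates with NaN rows at gap positions. Second, a gap is encoded as zero coordinates plus 1.0 in the fourth component. `encode_structures` is a hypothetical helper.

```python
import numpy as np


def encode_structures(coords):
    """Stack aligned structures into a 4 x Naln x Nstr array.

    coords: list of Nstr arrays, each of shape (Naln, 3), with NaN rows
    marking gaps. Hypothetical encoding (an assumption, not MultiSeq's):
    axis 0 holds x, y, z and a gap flag; a gap gets zero coordinates and
    a gap flag of 1.0.
    """
    n_str = len(coords)
    n_aln = coords[0].shape[0]
    enc = np.zeros((4, n_aln, n_str))
    for k, xyz in enumerate(coords):
        gap = np.isnan(xyz).any(axis=1)    # True where structure k has a gap
        enc[:3, ~gap, k] = xyz[~gap].T     # x, y, z of the C-alpha atoms
        enc[3, gap, k] = 1.0               # extra orthogonal "gap" dimension
    return enc


a = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]])
b = np.array([[0.5, 0.0, 0.0], [np.nan, np.nan, np.nan]])  # gap at position 1
enc = encode_structures([a, b])
print(enc.shape)  # (4, 2, 2)
```

When there are no gaps, the fourth slice is all zeros, which leaves the 3 x Naln x Nstr case.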
> >
> > In the absence of gaps, it reduces to 3 x Naln x Nstr, which is still a
> > multidimensional problem, but it should still be faster than the original
> > QR code. The code for sequence QR is written in C++ and is much faster
> > than Matlab. I wouldn't recommend using Matlab, because that is what we
> > used in the early days of QR factorization (until Elijah Roberts
> > implemented the algorithm in C++).
> >
> > Also, a final point: with QR factorization, you cannot order more than
> > Naln structures. That is an algorithmic constraint that I can explain
> > offline if you want. If I am not mistaken, you can still use it for
> > clustering, but you might have to write code based on the Q-scores of the
> > different structures along the trajectory (send me a message offline if
> > you want to discuss how to do this).
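A few lines of numpy can check the general linear-algebra fact behind a limit like this. This is only an illustration using a random stand-in matrix. It is not Anurag's own argument, which he offered to explain offline, and it flattens coordinates into m plain rows rather than using MultiSeq's multidimensional formulation. The fact is that a matrix with m rows has rank at most m. Once m linearly independent structures have been picked, every further structure is a linear combination of them and contributes nothing new to order by.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_str = 6, 10              # illustrative sizes: more structures than rows
A = rng.standard_normal((n_rows, n_str))

# The rank of A can never exceed its number of rows.
rank = np.linalg.matrix_rank(A)
print("rank:", rank)

# With n_rows independent columns chosen, any further column is a linear
# combination of them: its least-squares residual is numerically zero.
basis = A[:, :n_rows]
extra = A[:, n_rows]
coef, *_ = np.linalg.lstsq(basis, extra, rcond=None)
residual = np.linalg.norm(basis @ coef - extra)
print("residual of extra column:", residual)
```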
> >
> > The person who has used QR factorization to order structures in a
> > trajectory is Rommie Amaro, while she was a postdoc with Andy McCammon;
> > in their study it was used to find the most distinct conformations for
> > drug docking.
> >
> > Regards,
> > Anurag
> >
> > On Fri, Sep 16, 2011 at 14:42, Vijay Vammi <vsvammi_at_iastate.edu> wrote:
> >
> >> Hi John,
> >>
> >> Thanks for looking into it.
> >>
> >> But I am still interested to know whether anyone has done this before,
> >> and whether I am right that the problem reduces to a two-dimensional QR
> >> factorization when dealing with MD trajectories.
> >> I am writing numpy code for it and will check it against the existing
> >> implementation on a smaller dataset.
> >>
> >> Thanks
> >> Santhosh
> >>
> >>
> >> On Fri, Sep 16, 2011 at 3:02 PM, John Stone <johns_at_ks.uiuc.edu> wrote:
> >>
> >>>
> >>> Hi,
> >>> It sounds to me like whichever GUI is being used is causing VMD to
> >>> load all of the files concurrently, and that the host operating system
> >>> is running out of open file handles. If the file loading is done through
> >>> multiseq, then it may be that we need to add a "waitfor all" to the
> >>> code that causes the list of files to be loaded. I'll talk to Kirby
> >>> about this and see what we come up with.
> >>>
> >>> Cheers,
> >>> John Stone
> >>> vmd_at_ks.uiuc.edu
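John attributes the error to running out of open file handles. The per-process limit on open file descriptors can be inspected from the shell with `ulimit -n`, or from Python as sketched below. This is a generic Unix illustration, not VMD's or MultiSeq's loading code. `load_sequentially` and its `parse` callback are hypothetical. The only point is that opening files one at a time inside a `with` block keeps a single handle open at once.

```python
import resource  # Unix-only standard-library module

# Soft and hard per-process limits on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open-file limit (soft, hard):", soft, hard)


def load_sequentially(paths, parse):
    """Parse files one at a time so at most one handle is open at once.

    `parse` is a hypothetical callback that reads an open file object and
    returns whatever representation the caller needs.
    """
    results = []
    for path in paths:
        with open(path) as fh:  # closed automatically when the block exits
            results.append(parse(fh))
    return results
```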
> >>>
> >>> On Fri, Sep 16, 2011 at 02:52:47PM -0500, Vijay Vammi wrote:
> >>> > Hi Kirby,
> >>> >
> >>> > I am writing the code in numpy for QR factorization. Since I don't have any
> >>> > gaps in my alignment, the problem reduces to simple QR factorization.
> >>> >
> >>> > But to answer your question:
> >>> > I get this error message after 1092 files: "couldn't read directory
> >>> > "/usr/tmp/": too many open files". So I can only use 1092 files.
> >>> >
> >>> > I am loading them through the GUI, selecting all the files from Import
> >>> > Data. I do have the latest version.
> >>> >
> >>> > Thanks,
> >>> > Santhosh
> >>> >
> >>> > On Fri, Sep 16, 2011 at 2:22 PM, Kirby Vandivort <kvandivo_at_gmail.com> wrote:
> >>> >
> >>> > Santhosh,
> >>> >
> >>> > We have successfully loaded about 100,000 sequences into the version of
> >>> > MultiSeq that is being distributed with VMD 1.9 (are you using the
> >>> > latest version?), so I want to make sure that everything is working as
> >>> > it should.
> >>> >
> >>> > How are you attempting to load the 16,000 frames?
> >>> >
> >>> > Thanks,
> >>> >
> >>> > Kirby
> >>> >
> >>> > On Fri, Sep 16, 2011 at 8:13 AM, Vijay Vammi <vsvammi_at_iastate.edu> wrote:
> >>> >
> >>> > Hello,
> >>> >
> >>> > I have an 8 ns simulation with about 16000 frames. I want to cluster
> >>> > these frames to get representative structures.
> >>> >
> >>> > I want to use the QR factorization method as described in
> >>> > "Evolutionary Profiles Derived from the QR Factorization of Multiple
> >>> > Structural Alignments Gives an Economy of Information" by Patrick
> >>> > O'Donoghue and Zaida Luthey-Schulten. I see that this is part of the
> >>> > MultiSeq plugin in VMD. I have a couple of questions regarding its use:
> >>> >
> >>> > 1) Since we are dealing with a rather large number of PDBs, is it
> >>> > advisable to run this from the command line instead of the GUI? I tried
> >>> > using the GUI, but MultiSeq did not load all the frames, only 2000 of
> >>> > them.
> >>> >
> >>> > 2) In the actual implementation the problem becomes multidimensional
> >>> > because of gaps. Since I don't have gaps in the structure alignment, the
> >>> > problem should be reducible to a traditional QR factorization (the
> >>> > matrix I am trying to decompose would be (3 x number of CA atoms) x
> >>> > (number of frames)). Would it be much faster and better to use numpy
> >>> > or Matlab instead of VMD to get this done? Please correct me if I am
> >>> > wrong here.
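Santhosh's gap-free simplification can be sketched in numpy as a standard column-pivoted QR on the flattened (3 x number of CA atoms) x (number of frames) matrix. The pivoting here is written as modified Gram-Schmidt. This is a sketch under stated assumptions:

- The frames are already superposed.
- No centering or scaling is applied.
- `pivoted_qr_order` and its `rel_tol` parameter are hypothetical.

Anurag's reply in this thread says MultiSeq's structural QR stays multidimensional even without gaps. This plain 2-D version should therefore not be expected to reproduce MultiSeq's ordering.

```python
import numpy as np


def pivoted_qr_order(frames, rel_tol=1e-10):
    """Order frames by QR factorization with column pivoting.

    frames: array of shape (n_frames, n_ca, 3) of superposed C-alpha
    coordinates (superposition is assumed to be done beforehand).
    Returns (order, norms): frame indices in pivot order, and each frame's
    residual norm at the moment it was picked. The loop stops once every
    remaining residual is negligible, because the (3*n_ca) x n_frames
    matrix has rank at most min(3*n_ca, n_frames).
    """
    n_frames = frames.shape[0]
    # Columns are frames; rows are the flattened x, y, z of each C-alpha atom.
    A = frames.reshape(n_frames, -1).T.astype(float)
    resid = A.copy()                   # residual of each column after projections
    remaining = list(range(n_frames))
    order, norms = [], []
    tol = rel_tol * np.linalg.norm(A)
    while remaining:
        col_norms = np.linalg.norm(resid[:, remaining], axis=0)
        best = int(np.argmax(col_norms))
        if col_norms[best] <= tol:
            break                      # remaining frames already lie in the span
        j = remaining.pop(best)
        q = resid[:, j] / col_norms[best]
        order.append(j)
        norms.append(float(col_norms[best]))
        # Modified Gram-Schmidt step: remove direction q from the other columns.
        resid[:, remaining] -= np.outer(q, q @ resid[:, remaining])
    return order, norms


rng = np.random.default_rng(1)
frames = rng.standard_normal((10, 2, 3))   # 10 frames but only 2 * 3 = 6 rows
order, norms = pivoted_qr_order(frames)
print(len(order))  # 6
```

Frames picked early have the largest residuals and are the most linearly independent of the others. Frames that are never picked are already spanned by the chosen set.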
> >>> >
> >>> > Any help or advice on this is appreciated.
> >>> >
> >>> > Thanks
> >>> > Santhosh
> >>> >
> >>> > --
> >>> >
> >>> > Kirby Vandivort
> >>>
> >>> --
> >>> NIH Resource for Macromolecular Modeling and Bioinformatics
> >>> Beckman Institute for Advanced Science and Technology
> >>> University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
> >>> http://www.ks.uiuc.edu/~johns/ Phone: 217-244-3349
> >>> http://www.ks.uiuc.edu/Research/vmd/ Fax: 217-244-6078
> >>>
> >>
> >>
> >

-- 
Kirby Vandivort                      Theoretical and 
Senior Research Programmer            Computational Biophysics 
Email: kvandivo_at_ks.uiuc.edu          3061 Beckman Institute
http://www.ks.uiuc.edu/~kvandivo/    University of Illinois
Phone: (217) 244-1928                405 N. Mathews Ave
Fax  : (217) 244-6078                Urbana, IL  61801, USA