From: Ana Celia Araujo Vila Verde (avilaverde_at_engr.psu.edu)
Date: Tue Feb 21 2006 - 10:53:16 CST

Hi all,

I finally figured out what I was doing wrong, and decided to email the list about it in the hope that it will help others. Also, I'd like to thank Nuno Ferreira, who sent me the tcl scrip he uses to get the RMSD, and that enabled me to understand what I was doing wrong.

In order to get the RMSD, we compensate for the differences caused by rotation and translation of a molecule at time t relative to time zero by computing a matrix (measure fit $selectionFrame0 $selectionFramei) and then moving the selection by that matrix before calculating the RMSD.

My problem was exactly that: I was moving the selection, and not the entire molecule. So, if I called the script a first time to get the RMSD of all the sheets, it would work OK. But if subsequently I called it a second time to get the RMSD of the entire protein, the atoms that belong to the sheet would be in the wrong place (because I had moved them previously), and I'd get the wrong RMSD. In order for the script to work for various selections, the entire protein must be aligned with the reference configuration every time the RMSD is calculated.

Thanks to all the people that replied to my question.

Ana

_________________________________
Ana Célia Araújo Vila Verde
Penn State University
Department of Chemical Engineering
Fenske Laboratory
University Park, PA 16802
USA
 
Phone: +(1) (814) 863-2879
Fax: +(1) (814) 865-7846
avilaverde_at_engr.psu.edu
_________________________________

-----Original Message-----
From: owner-vmd-l_at_ks.uiuc.edu [mailto:owner-vmd-l_at_ks.uiuc.edu] On Behalf Of Ana Celia Araujo Vila Verde
Sent: Friday, February 17, 2006 3:10 PM
To: vmd-l_at_ks.uiuc.edu
Subject: RE: vmd-l: problem with script to get RMSD

Hi!

(Rui, sim, pt)

To answer Rui's question: the RMSD values that I get from both ways are very different: they differ in the first or second decimal place, and not just at the last significant digits.

By the way, I tried Nuno's suggestion with my code (calling the procedure 3 times with a different selection each time, instead of doing a foreach loop inside a single call of the procedure). I still get the same problem:

If I call the procedure 3 times inside the same file, first for sheet, then for protein, then for backbone, I get particular values for RMSD.

If I call the procedure 3 times but the script is in 3 different files - one for each selection - the values I get for the protein and the backbone selections are different from the ones I get for those same selections using the exact same script saved on a single file (like I described on the previous paragraph). I'm send you the code just after my signature (now modified for Nuno's suggestion)
(to run, save to a file and then type at a unix console: vmd -dispdev text -e nameofFile.tcl)

If someone could take 10 min to run my script in the ways I described it would really be very helpful. I'm guessing other people may be experiencing similar problems with their RMSD scripts and simply not noticing that they're occuring.

Ana

# loading files into VMD
set psf "3GBP1_2_wb_neutral.psf"
set pdb "3GBP1_2_wb_neutral.pdb"
set dcd "3GBP1_2_wb_neutral.dcd"
mol load psf $psf pdb $pdb ;#this gets the molid 0
mol load psf $psf dcd $dcd ;# this gets the molid 1

# procedure to get RMSD of all sheets relative to frame 0
# and the first \timestep for the given molecule id (default: top)

proc print_rmsd_BetterStats_through_time {{mol top} {sel}} {

        set sampFreq 500; # the sampling frequency throught the entire dataset
        set durTimestep 0.000002; # duration of the timestep in the simulation, in nanoseconds
        set stepsFrame 500; # number of timesteps in each frame in *dcd fil
#--------------------------------------------------------------------------- set outFile try24RMSD_${sel}_averagedSampFreq${sampFreq}.txt; # result goes to this file
                set out [open $outFile w]
                puts $out "computing average RMSD for $sel (for each point an average over the entire dataset is taken; sampling frequency is $sampFreq)"
                puts $out "time (ns) RMSD (angstrom)"

                set num_steps [molinfo $mol get numframes]
# puts "number of steps is $num_steps"
                # the frames being compared
                array set timeRMSD ""
# puts "cscobidoo1"
                for {set int 1} {$int <= $num_steps} {incr int} {
                        set rmsdAux1 "0"
                        set countInterval 0
                        puts "interval is $int..."
                        for {set nostep 0} {$nostep <= $num_steps} {incr nostep $sampFreq} {
# puts "scobidoo3"
                                set oldFrame [atomselect $mol "protein and $sel" frame $nostep]
# puts "scobidoo6"
                                set auxNew [expr $nostep + $int]
# puts "auxnew is $auxNew"
                                if { $auxNew <= $num_steps } {
                                        incr countInterval
# puts "count interval is $countInterval"
                                        set newFrame [atomselect $mol "protein and $sel" frame $auxNew]
# puts "scobidoo4"
# compute the transformation
                                        set trans_mat [measure fit $newFrame $oldFrame]
# puts "scobidoo5"
                                        # do the alignment
                                        $newFrame move $trans_mat
                                        # compute the RMSD
                                        set RootMSD [measure rmsd $newFrame $oldFrame]
                                        puts "frame new: $auxNew; frame old: $nostep; rootMSD is $RootMSD"
                                        append rmsdAux1 " + $RootMSD"
                                        $newFrame delete
                                        unset trans_mat
# unset $RootMSD

                                } else {
# $newFrame delete
                                        break
                                }
# unset auxNew
                                        $oldFrame delete
                        }
# puts "count interval is $countInterval"
# puts "sum is [expr $rmsdAux1]"
                        set avRMSD [expr [expr $rmsdAux1] / $countInterval]
# puts "avRMSD is $avRMSD"
                        array set timeRMSD "$int $avRMSD"
                }
                unset avRMSD
                # print the RMSD
# puts "RMSD of $frame is $rmsd"
                for {set i 1} {$i <= $num_steps} {incr i} {
                        foreach {Dframe RMS} [array get timeRMSD $i] {break}
                        set Dtime [expr {$Dframe*$stepsFrame*$durTimestep}]
                        puts $out "$Dtime $RMS"
                        unset Dtime
                 }

                close $out

}
# Call procedure
print_rmsd_BetterStats_through_time "top" "sheet"
print_rmsd_BetterStats_through_time "top" "protein"
print_rmsd_BetterStats_through_time "top" "backbone"
########################################################################################
# quitting VMD
quit

-----Original Message-----
From: J. Rui Rodrigues [mailto:jrui_at_ci.uc.pt]
Sent: Friday, February 17, 2006 2:29 PM
To: Ana Celia Araujo Vila Verde
Subject: Re: vmd-l: problem with script to get RMSD

Olá?

How different are the files you are getting? Are the RMSD values very
different? Or is it only the last decimal places?

Rui Rodrigues

Ana Celia Araujo Vila Verde wrote:

>Hi John,
>
>Thanks for replying. I don't think I was very clear on my previous email.
>Let me see if I can explain my problem better:
>The procedure has a foreach loop, looping through the elements in variable $selection.
>
>When $selection is set to "sheet protein backbone" (lets call this mode A), then in the first loop it will calculate, for all frames, the RMSD of all the atoms belonging to sheets relative to those same atoms in frame; it will output those RMSD values into a file. In the second loop it does the same thing but for all the atoms in the protein, and will output to a different file. In the third time it does the same thing for all the atoms belonging to the backbone.
>
>If I set selection to just "protein" (lets call this mode B), the foreach loop will only take place once, and will calculate the RMSD of all the atoms belonging to the protein relative to those same atoms in frame 0 and it will output those into a file.

>
>The problem is that, even though I didn't make any other changes in the tcl script, the RMSD for the protein atoms calculated using mode A is different from the RMSD that I get using mode B, even though I'm doing exactly the same thing in both! It's like information from the first loop somehow passes into the second loop, even though I unset all the variables at the end of each loop.
>
>I hope I explained my problem better.
>
>Thanks,
>
>Ana
>
>
>
> _________________________________
>Ana Célia Araújo Vila Verde
>Penn State University
>Department of Chemical Engineering
>Fenske Laboratory
>University Park, PA 16802
>USA
>
>Phone: +(1) (814) 863-2879
>Fax: +(1) (814) 865-7846
>avilaverde_at_engr.psu.edu
>_________________________________
>
>
>-----Original Message-----
>From: John Stone [mailto:johns_at_ks.uiuc.edu]
>Sent: Thursday, February 16, 2006 2:41 PM
>To: Ana Celia Araujo Vila Verde
>Cc: vmd-l_at_ks.uiuc.edu
>Subject: Re: vmd-l: problem with script to get RMSD
>
>
>Ana,
> It would be helpful to see exactly how you've changed the scripts so there's
>no confusion on what you're actually comparing. You didn't say anything
>about how different the RMSD values are, but they are bound to be somewhat
>different just because you have different atoms in each selection, and
>different numbers of atoms, so you'll definitely get different results,
>it's just a question of how different... One thing that may help is to
>use the same selections you're using for the RMSD calculations as
>graphical representations to highlight the particular selections you're
>interested in. Then, in order to see more clearly, you may want to use
>the Trajectory tab "Draw Multiple Frames" option to show the timesteps
>that you're comparing superimposed on each other. In order to see them
>more easily, you can then change the coloring method to "Timestep".
>Give that a try and see if it helps make more sense of the numbers you're
>getting from the RMSD calculation.
>
> John
>
>On Thu, Feb 16, 2006 at 02:31:28PM -0500, Ana Celia Araujo Vila Verde wrote:
>
>
>>Dear all,
>>
>>
>>
>>Based on the script to obtain the RMSD of one molecule relative to the same molecule at time 0, which can be found on the VMD manual, I built a very similar one to calculate the RMSD just for certain selections of the protein (see below signature). My problem is that if I do
>>
>>
>>
>>selection "sheet protein backbone", the RMSD for sheet is OK but for protein and for backbone is wrong (I think).
>>
>>
>>
>>If I take the exact same procedure and simply change to
>>
>>
>>
>>selection " protein "
>>
>>
>>
>> and then to
>>
>>
>>
>>selection "backbone ",
>>
>>
>>
>>I get different values for their respective RMSD! This leads me to think that the RMSD I get for protein and for backbone when I use """ selection "sheet protein backbone" """ is incorrect.
>>
>>
>>
>>Could anyone shine some light on this? Clearly I'm doing something wrong, but I looked at the code for hours and could not find it.
>>
>>
>>
>>Thanks Ana
>>
>>
>>
>>_________________________________
>>
>>Ana Célia Araújo Vila Verde
>>
>>Penn State University
>>
>>Department of Chemical Engineering
>>
>>Fenske Laboratory
>>University Park, PA 16802
>>
>>USA
>>
>>
>>
>>Phone: +(1) (814) 863-2879
>>Fax: +(1) (814) 865-7846
>>
>>avilaverde_at_engr.psu.edu
>>
>>_________________________________
>>
>>
>>
>>To execute I type vmd -dispdev text -e nameOfFile.tcl on the unix command line.
>>
>>
>>
>>proc print_rmsd0_through_time {{mol top}} {
>>
>>
>>
>>##################################
>>
>> set selection "sheet protein backbone" ;# the RMSD calculation is done for the elements of this list
>>
>>#################################
>>
>> set refFrame 0 ;# the RMDS is calculated against this reference
>>
>> set incrm 1; # the comparison is done every incrm number of frames
>>
>> set durTimestep 0.000002; # duration of the timestep in the simulation, in nanoseconds
>>
>> set stepsFrame 500; # number of timesteps in each frame in *dcd fil
>>
>>#-----------------------------------------------------------------------------
>>
>>
>>
>> foreach sel $selection {
>>
>> set outFile try21RMSD_${sel}_refFrame${refFrame}.txt; # result goes to this file
>>
>> set out [open $outFile w]
>>
>> puts $out "computing RMSD for all sheets using frame $refFrame as reference"
>>
>> puts $out "time (ns) RMSD (angstrom)"
>>
>>
>>
>> set reference [atomselect $mol "protein and $sel" frame $refFrame]
>>
>> # the frame being compared
>>
>> set compare [atomselect $mol "protein and $sel"]
>>
>>
>>
>> set num_steps [molinfo $mol get numframes]
>>
>># Does comparison every incrm 30 frames
>>
>> for {set frme $refFrame} {$frme <= $num_steps} {incr frme $incrm} {
>>
>> # get the correct frame
>>
>> $compare frame $frme
>>
>>
>>
>> # compute the transformation
>>
>> set trans_mat [measure fit $compare $reference]
>>
>> # do the alignment
>>
>> $compare move $trans_mat
>>
>> # compute the RMSD
>>
>> set rmsd [measure rmsd $compare $reference]
>>
>> # print the RMSD
>>
>># puts "RMSD of $frame is $rmsd"
>>
>> set time [expr {$frme*$stepsFrame*$durTimestep}]
>>
>> puts $out "$time $rmsd"
>>
>> }
>>
>> close $out
>>
>> $compare delete
>>
>> unset trans_mat
>>
>> unset rmsd
>>
>> unset reference
>>
>> unset num_steps
>>
>> }
>>
>>}
>>
>># Call procedure
>>
>>print_rmsd0_through_time
>>
>>#
>>
>>quit
>>
>>
>>
>
>
>