## VMD-L Mailing List

**From:** Andrew Dalke (*dalke_at_ks.uiuc.edu*)

**Date:** Sun Apr 13 1997 - 01:28:43 CDT

**Next message:**Brian Corrie: "Run time error on Max Impact"**Previous message:**Andrew Dalke: "Re: Segmentation error!"**Messages sorted by:**[ date ] [ thread ] [ subject ] [ author ] [ attachment ]

Hello,

We had planned to talk about other interesting variables to trace in

this week's VMDTech, but that will have to wait for a few weeks.

Instead, this week we'll give another example of how to manipulate

trajectories using the VMD scripting language, in this case by

averaging each frame with its neighboring frames.

In other VMD news, if anyone here is looking for a job or post-doc in

east-central Illinois, or knows someone doing so, we have several

positions open. See http://www.ks.uiuc.edu/Services/Events/ for more

information. We have also reorganized the VMD web pages (thanks to

Jeff Ulrich) and the archive of the VMD mailing list (thanks to Jim

Phillips).

As always, suggestions and comments about VMD or ideas for future

VMDTechs are welcome. Please send them to this list or to

vmd_at_ks.uiuc.edu.

Andrew Dalke

dalke_at_ks.uiuc.edu

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

Smoothing Trajectories

One problem with viewing the results of MD simulations is that random

thermal motions often distract you from the motions you are trying to

observe. To reduce the problem, one trick many people use is to

average several frames together to smooth out the random component and

emphasize the non-random one.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

The Basics

As we pointed out in a previous VMDTech on Morphing, atom coordinate

information for each timestep is available to the scripting language.

It is a two step process. First, make an "atom selection" of the

atoms about which you are interested in getting more information. To

do this, you'll also need to know the appropriate molecule id, since

VMD will let you load several molecules. The molecule id is an

integer value or the word "top". The top molecule is designated in

the "Molecules" form and if there is only one molecule, it is by

definition the top molecule. (Note that all of this is describe in

the User's Guide.)

set sel1 [atomselect top "resid 23"]

set sel2 [atomselect top "all"]

Information about the selection is returned by using the "get" option,

followed by a list of keywords, as shown in these examples.

puts [$sel1 get charge]

puts [$sel1 get {index name}]

puts [$sel2 get {x y z}]

Some information can be modified, and coordinates are one of them. To

change coordinates, use the "set" option. For now, assume that there

are 5 atoms in "resid 23" ($sel1 from above), then the coordinates can

be changed like this:

$sel1 set {x y z} {{0 0 0} {1 0 0} {2 1 0} {3 1 0} {4 0 -3}}

(Yes, these coordinates are completely unreasonable.) Notice that the

coordinates are list of lists. It is a list of size N, and each item

on that list contains 3 values, for x, y and z, respectively.

For the scripts described for this VMDTech, we'll only need to get and

set the coordinate information, but we will need to get it over each

timestep. The selections have an option "frame" parameter to let you

access just that. There are a couple of ways to set this value, and

here are a couple of examples, assuming you've stored a selection in

the variable "sel1".

# get the coordinates for the first timestep

$sel1 frame 0

puts "First timestep: [$sel1 get {x y z}]"

# the sixth

$sel1 frame 5

puts "Sixth timestep: [$sel1 get {x y z}]"

# and the last

$sel1 frame last

puts "Last timestep: [$sel1 get {x y z}]"

We'll also need to know how many timesteps are available for a given

molecule, and that can be found using the "molinfo" command. Again,

you will need to know the molecule id, but the example will use "top".

puts "The top molecule has [molinfo top get numframes] timesteps."

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Simple Averaging

For the first script, the implementation is pretty easy; average the

coordinates over a window of size N, where N is odd. That is, the new

coordinates for frame a is:

(N-1)/2

1 ----

xyz'(a) = --- \ xyz(a + r)

N /

----

r = -(N-1)/2

One problem is how to treat cases near the edge where a + n <

0 or a + n >= the number of frames. For this case, we'll just reuse

the first or last frame. (Another option would be to not compute

those terms.)

The biggest problem is storing the coordinate information. The

coordinates for a given timestep cannot be changed until the original

information was completely used. In other words, once xyz'(a) is

computed, the replacement of xyz'(a) -> xyz(a) cannot be done until

the values of xyz'(a-r) to xyz'(a+r) are computed. Thus, some of the

information must be temporarily stored elsewhere before using it.

There are several ways to do this. One is to copy all the original

coordinate information to a Tcl array -- making a copy of the full

trajectory. This is the easiest implementation to understand, and is

shown here.

proc simple_average_frame {window_size {molid top}} {

# check that the window size is correct (must be odd)

if [expr $window_size % 2 == 0] {

error "window size must be odd for 'simple_average_frames'"

}

# window goes from -r to r

set r [expr ($window_size - 1) / 2]

# averaging over all the atoms

set sel [atomselect $molid all]

# get the number of frames

set n [molinfo $molid get numframes]

# get the coordinate information

for {set a -$r} {$a <= $n + $r} {incr a} {

set f $a

if {$a < 0} {set f 0}

if {$a >= $n} {set f [expr $n - 1]}

$sel frame $f

set xyz($a) [$sel get {x y z}]

}

# average over the coordinates

for {set a 0} {$a < $n} {incr a} {

puts "Computing frame $a of $n"

# initial coordinates

set temp_xyz $xyz([expr $a - $r])

# sum over the N coordinate sets

for {set i [expr $a - $r + 1]} {$i <= $a + $r} {incr i} {

set new_xyz {}

foreach x1 $temp_xyz x2 $xyz($i) {

lappend new_xyz [vecadd $x1 $x2]

}

set temp_xyz $new_xyz

}

set new_xyz {}

set Winv [expr 1.0 / $window_size]

foreach x1 $temp_xyz {

lappend new_xyz [vecscale $Winv $x1]

}

# set the coordinates

$sel frame $a

$sel set {x y z} $new_xyz

}

}

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Tastes Great -- Less Memory

However, this isn't very useful for large coordinate sets since it

does make a copy of the full animation, and does so using Tcl arrays,

which store numbers as strings rather than floats (though Tcl 8.0 is

supposed to change that). Here is a somewhat more complicated version

that stores only the number of coordinates as specified by the window

size. It is hard to say which is the better program to use; on a

small system, this new version was about 5% faster, but on a large

one, the previous example was about 5% faster. For the most part, it

doesn't matter.

The coordinates are stored in the queue 'xyz', which ranges from -r

to r. For each new frame, the information is shifted left, and the new

coordinates placed in xyz($r).

# average timesteps given a window size (from -(N-1)/2 to (N-1)/2)

proc average_frames {window_size {molid top}} {

# check that the window size is correct (must be odd)

if [expr $window_size % 2 == 0] {

error "window size must be odd for 'average_frames'"

}

# window goes from -r to r

set r [expr ($window_size - 1) / 2]

# averaging over all the atoms

set sel [atomselect $molid all]

# get the number of frames

set n [molinfo $molid get numframes]

# initialize the queue

# create a set of selections storing the different timesteps

# initialized to the first timesteps

for {set i -$r} {$i <= $r} {incr i} {

if {$i < 0} {

set f 0

} else {

if {$i >= $n} {

set i [expr $n - 1]

} else {

set f $i

}

}

$sel frame $f

set xyz($i) [$sel get {x y z}]

}

# loop over each timestep

for {set a 0} {$a < $n} {incr a} {

puts "Computing frame $a of $n"

# sum over all N coordinates

set xyz_avg $xyz(-$r)

for {set i [expr -$r + 1]} {$i <= $r} {incr i} {

set tmp_xyz {}

foreach pos $xyz_avg pos2 $xyz($i) {

lappend tmp_xyz [vecadd $pos $pos2]

}

set xyz_avg $tmp_xyz

}

# average them

set tmp_xyz {}

set Ninv [expr 1.0 / $window_size]

foreach pos $xyz_avg {

lappend tmp_xyz [vecscale $Ninv $pos]

}

# save the coordinates

$sel frame $a

$sel set {x y z} $tmp_xyz

# shift the stored data by one

for {set i [expr -$r + 1]} {$i <= $r} {incr i} {

set xyz([expr $i - 1]) $xyz($i)

}

# and add the new term to the coordinate queue

set f [expr $r + $a + 1]

if {$f >= $n} {

set f [expr $n - 1]

}

$sel frame $f

set xyz($r) [$sel get {x y z}]

}

}

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Using the Script

For those of you without a DCD or CRD trajectory file, or without a

set of PDB files you can load into VMD (hmmm, that would be a useful

VMDTech topic!), you can get a PDB and DCD file for bacteriorhodopsin

from http://www.ks.uiuc.edu/Research/vmd/workshop/ where it states:

*> The examples required two data files, the PDB structure of
*

*> bacteriorhopsin and a DCD(21 MB!) animation of the retinal being
*

*> pulled out. ( Here is a shorter version but still 4 MB.)
*

To load these into VMD, read the files as a "pdb and dcd" file in the

Files menu, or load the PDB file then append the DCD file with the

Edit menu.

You may also be able to use the 'urlload' option of VMD to load the

files (semi-)directly. Try the following commands (note that it saves

a temporary file in the directory $env(TMPDIR) and calls a perl script

directly in a manner than may not be supported in future versions):

set base_url http://www.ks.uiuc.edu/Research/vmd/workshop

# get the PDB file and load it

mol urlload pdb $base_url/bR.pdb

# get the DCD file

exec $env(VMDDIR)/scripts/vmd/url_get $base_url/bRshort.dcd \

* > $env(TMPDIR)/bRshort.dcd
*

# and load it

animate read dcd $env(TMPDIR)/bRshort.dcd

# you will have to delete the temporary file yourself

# _after_ VMD has read all the frames

if {[molinfo top get numframes] == 97} {

file delete $env(TMPDIR)/bRshort.dcd

}

With the structure and trajectory loaded, and with the script entered

into VMD (either through a cut-and-paste or by saving it to a file and

'source'ing it) choose a window size (3 and 5 are good options) and

run the 'average_frames' command like this:

average_frames 5

This will take a while (a bit over seven minutes on my Crimson! :( )

so for your own testing you will want to delete some frames. Use the

"skip" option in the Edit form, for instance, delete every 2 frames

twice, to cut out 3/4 of the frames).

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Weighted Average

The last two examples gave the same weight to each of the timesteps

used in the averaging. As another possibility, you might want to

weight the terms in the window so the center is emphasized more than

the rest. For example, in a window of size 3 you might want the

middle weighted twice as much as each side (so the weights are {1, 2,

1} -- and if I remember FFTs right, this is called a "triangle

filter"). Here are two such functions.

# returns a list of window_size 1s.

# This is the same as a simple average

proc square_filter {window_size} {

if [expr $window_size % 2 == 0] {

error "window size must be odd for 'square_filter'"

}

set ret {}

for {set i 0} {$i < $window_size} {incr i} {

lappend ret 1

}

return $ret

}

# edges are 1, second from edges are 2, ..., middle is (window_size-1)/2

proc triangle_filter {window_size} {

if [expr $window_size % 2 == 0] {

error "window size must be odd for 'triangle_filter'"

}

set ret {}

set r [expr ($window_size - 1) / 2]

for {set i 0} {$i < $r} {incr i} {

lappend ret $i

}

for {set i $r} {$i >= 0} {incr i -1} {

lappend ret $i

}

return $ret

}

And you can add whatever other filters you want (how about a gaussian

curve?)

So I can run

triangle_filter 11

and get

Info) 0 1 2 3 4 5 4 3 2 1 0

or

square_filter 5

to get

Info) 1 1 1 1 1

Formally, given a list of weights w(r): -(N-1)/2 <= r <= (N-1)/2

where W = sum of all weights w(r),

(N-1)/2

1 ----

xyz'(a) = --- \ w(r) * xyz(a + r)

W /

----

r = -(N-1)/2

and you can see that if w(r) = 1, this is the same as the original

formulation.

These weights can be used with the following procedure, called

"weighted_frames", which takes as input the list of weights (there

must be an odd number of them) and the molecule id. Both of these

parameters have defaults; the first is {1 2 1}, which is a triangle

filter that weights the current frame twice as much as its neighbors.

As usual, the molecule id defaults to the top molecule.

Here are some examples:

weighted_frames {1 3 1} top

# identical to "average_frames 3" -- we tested! :)

weighted_frames [square_filter 3]

weighted_frames [triangle_filter 5] 2

and the full source is:

# Given a list of weights, compute the weighted average

# For example: weighted_frames {1 2 3 2 1} top

# which is the same as: weighted_frames [triangle_filter 5] top

# The default is a triangle average of window size 3 applied to

# the top moleucle

proc weighted_frames {{filter {1 2 1}} {molid top}} {

# check that the window size is correct (must be odd)

set window_size [llength $filter]

if [expr $window_size % 2 == 0] {

error "filter size must be odd for 'weighted_frames'"

}

# window goes from -r to r

set r [expr ($window_size - 1) / 2]

# averaging over all the atoms

set sel [atomselect $molid all]

# get the number of frames

set n [molinfo $molid get numframes]

# compute the total weight (using a cute Tcl trick

# to call a VMD function)

set total_weight [eval "vecadd $filter"]

# Check that it is somewhat sane

if {$total_weight == 0} {

error "cannot use weights that sum to 0 in 'weighted_frames'"

}

# save the weights in an array

set tmp $filter

for {set i -$r} {$i <= $r} {incr i} {

set weight($i) [lvarpop tmp]

}

# initialize the 'xyz' queue

# create a set of selections storing the different timesteps

# initialized to the first timestep

for {set i -$r} {$i <= $r} {incr i} {

if {$i < 0} {

set f 0

} else {

if {$i >= $n} {

set i [expr $n - 1]

} else {

set f $i

}

}

$sel frame $f

set xyz($i) [$sel get {x y z}]

}

# loop over each timestep

for {set a 0} {$a < $n} {incr a} {

puts "Computing frame $a of $n"

##sum over all N coordinates

# get the first coordinate

set xyz_avg {}

foreach pos $xyz(-$r) {

lappend xyz_avg [vecscale $weight(-$r) $pos]

}

# get the rest

for {set i [expr -$r + 1]} {$i <= $r} {incr i} {

set tmp_xyz {}

foreach pos $xyz_avg pos2 $xyz($i) {

lappend tmp_xyz [vecadd $pos [vecscale $weight($i) $pos2]]

}

set xyz_avg $tmp_xyz

}

# average them

set xyz_avg {}

set Winv [expr 1.0 / $total_weight]

foreach pos $tmp_xyz {

lappend xyz_avg [vecscale $Winv $pos]

}

# save the coordinates

$sel frame $a

$sel set {x y z} $xyz_avg

# shift the stored data by one

for {set i [expr -$r + 1]} {$i <= $r} {incr i} {

set xyz([expr $i - 1]) $xyz($i)

}

# and add the new term

set f [expr $r + $a + 1]

if {$f >= $n} {

set f [expr $n - 1]

}

$sel frame $f

set xyz($r) [$sel get {x y z}]

}

}

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Proposed VMD Vector List Commands

After doing these scripts and the morphing scripts from a few weeks

back, we've come to the conclusion that there needs to be a better way

to manipulate lists of vectors. For instance, given the coordinate

and the weight arrays used for 'weighted_frames', the final averaged,

weighted coordinates are computed as:

set xyz_avg {}

foreach pos $xyz(-$r) {

lappend xyz_avg [vecscale $weight(-$r) $pos]

}

# get the rest

for {set i [expr -$r + 1]} {$i <= $r} {incr i} {

set tmp_xyz {}

foreach pos $xyz_avg pos2 $xyz($i) {

lappend tmp_xyz [vecadd $pos [vecscale $weight($i) $pos2]]

}

set xyz_avg $tmp_xyz

}

# average them

set tmp_xyz {}

set Winv [expr 1.0 / $total_weight]

foreach pos $tmp_xyz {

lappend tmp_xyz [vecscale $Winv $pos]

}

The problem is there is no way in VMD to add lists of vectors element-

wise. (In other words, Tcl isn't APL :). Suppose instead there were

two functions 'veclistadd' and 'veclistscale' which operated on

"vector lists", using the reasonable definition. So you could say:

veclistadd {{1 0 1} {2 3 4}} {{-1 0 -1} {2 2 2}}

veclistscale 0.5 {{6 8 10} {-1 -2 -3}}

and get

{0.0 0.0 0.0} {5.0 6.0 7.0}

{3.0 4.0 5.0} {-0.5 -1.0 -1.5}

respectively.

If these functions existed, the average, weighted coordinates are

computed as:

set xyz_sum $xyz(-$r)

for {set i [expr $r + 1]} {$i <= $r} {incr i} {

set xyz_sum [veclistadd $xyz_sum \

[veclistscale $weight($i) $xyz($i)]]

}

set xyz_sum [veclistscale [expr 1.0 / $total_weight] $xyz_sum]

which is much easier (IMHO, of course) to understand and use. So in

the final 1.2 version of VMD, there will be a full set of veclist

functions that complement the regular vector functions. And as with

the vector functions, the most commonly ones will also be implemented

in C++ for speed.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

If there are problems, comments or suggestions about these examples,

please send them to us. These scripts are making their way to the VMD

script library (http://www.ks.uiuc.edu/Research/vmd/script_library)

and will be in the user's guide, so we want ideas for improvements.

Andrew Dalke

dalke_at_ks.uiuc.edu

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

**Next message:**Brian Corrie: "Run time error on Max Impact"**Previous message:**Andrew Dalke: "Re: Segmentation error!"**Messages sorted by:**[ date ] [ thread ] [ subject ] [ author ] [ attachment ]