Introduction

Recent projects such as the genome sequencing of several organisms and high-throughput X-ray structure analysis have given the scientific community a large amount of data on the sequences and structures of several thousand proteins. This information can be used effectively for medical and biological research only if one can extract functional insight from the sequence and structural data. To achieve this, we need to understand how proteins perform their functions. Two main computational techniques exist to reach this goal: a bioinformatics approach and atomistic molecular dynamics simulations. Bioinformatics uses the statistical analysis of protein sequences and structures to understand their function and to predict structures when only sequence information is available. Molecular modeling and molecular dynamics simulations use principles from physics and physical chemistry to study the function and folding of proteins.

Bioinformatics methods are among the most powerful technologies available in the life sciences today. They are used in fundamental research on theories of evolution and in more practical considerations of protein design. The algorithms and approaches used in these studies range from sequence and structure alignments, secondary structure prediction, functional classification of proteins, and threading and modeling of distantly related homologous proteins to modeling the progress of protein expression through a cell's life cycle.

In this tutorial you will use classical sequence alignment methods, namely the Smith-Waterman [1] and Needleman-Wunsch algorithms. You will start out with only sequence and biological information about class II aminoacyl-tRNA synthetases, key players in the translational mechanism of the cell. Then you will classify protein domains and align the determined protein domains structurally. If structural alignments are considered to be the true alignments, you will see that simple pairwise sequence alignment of two proteins with low sequence identity has serious limitations. Finally, you will determine the phylogenetic relationship of class II tRNA synthetases with a dendrogram-creation algorithm.

You will carry out the exercises with the programs VMD, MOE, MATLAB, and a Needleman-Wunsch alignment program provided by A. Sethi. Many of the tools of the field can be freely accessed by anyone with a web browser; a listing of our favorite bioinformatics tools and resources is provided. The entire tutorial takes about 2 hours to complete.

[Sidebar with figure; text truncated: "...e, Adenine, Adenine, Guanine, ...", or "C, A, A, G, ...".]

This tutorial assumes that VMD, MOE, MATLAB, and the other required software have been correctly installed on the user's computer. Please ask a lab attendant for help if you have any trouble locating software or data files during the tutorial.

To set up the exercises...

You will make a copy of the files needed for these exercises in your home directory. Open a terminal window, and, if you don't already have one, make a ~/tbss.work directory by typing at the Unix prompt:
mkdir ~/tbss.work
Make sure that you have a ~/tbss.work directory:
ls ~/tbss.work
Copy the needed directory, but instead of typing TOP_DIR, type the location of the Summer School directory tree:
cp -rp TOP_DIR/sumschool03/tutorials/07-bioinformatics/files/Bioinformatics/ ~/tbss.work/
For instance, if the materials are located at /mnt/cdrom, you will type:
cp -rp /mnt/cdrom/sumschool03/tutorials/07-bioinformatics/files/Bioinformatics/ ~/tbss.work/
Check that you have the files in this directory by listing the contents:
cd ~/tbss.work/Bioinformatics
ls -lR
In this tutorial, when we refer to ~/tbss.work/Bioinformatics/ and its subdirectories, we are referring to the copy which you have just made in your own home directory.
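The setup steps above can also be collected into a single shell function. What follows is only a minimal sketch and is not part of the tutorial materials: the function name setup_bioinformatics and its optional second argument (an alternative work directory) are assumptions made for illustration. The source path is written without a trailing slash so that both GNU and BSD versions of cp copy the Bioinformatics directory itself rather than only its contents.

```shell
#!/bin/sh
# Sketch only: setup_bioinformatics is a hypothetical helper,
# not a script shipped with the Summer School materials.
# Usage: setup_bioinformatics TOP_DIR [WORK_DIR]
setup_bioinformatics() {
    top_dir=$1
    # Default to the work directory used throughout this tutorial.
    work_dir=${2:-"$HOME/tbss.work"}
    src="$top_dir/sumschool03/tutorials/07-bioinformatics/files/Bioinformatics"

    # Stop early if the Summer School tree is not where we expect it.
    if [ ! -d "$src" ]; then
        echo "Cannot find $src; check TOP_DIR" >&2
        return 1
    fi

    # -p lets mkdir succeed even if the directory already exists.
    mkdir -p "$work_dir" || return 1

    # Recursive copy that preserves modes and timestamps, as in the
    # cp -rp command above. No trailing slash on the source path.
    cp -rp "$src" "$work_dir/" || return 1

    # List what was copied so the result can be checked by eye.
    ls -lR "$work_dir/Bioinformatics"
}
```

For instance, if the materials are located at /mnt/cdrom, you would load the function into your shell and then type setup_bioinformatics /mnt/cdrom.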


zan@uiuc.edu