Gene 508, Evan Eichler, Ph.D.
I. The Purpose
1) Determine a consensus sequence from a family of related sequences.
2) Identify regions of conservation and regions of rapid divergence.
3) Provide the most basic first step in phylogenetic analysis: almost all tree-building methods require a multiple alignment as input.
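As an illustration of purposes 1 and 2, here is a minimal sketch that takes an already-aligned set of sequences and reports a majority-rule consensus plus a per-column conservation score. The function name, the toy alignment, and the choice of "fraction of the most common character" as the conservation measure are assumptions made for this example only.

```python
from collections import Counter

def consensus_and_conservation(aligned):
    """Return (consensus string, per-column fraction of the most common character)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    conservation = []
    for column in zip(*aligned):          # walk the alignment column by column
        residue, count = Counter(column).most_common(1)[0]
        consensus.append(residue)
        conservation.append(count / len(column))
    return "".join(consensus), conservation

# Toy alignment (hypothetical data); '-' marks a gap.
alignment = [
    "ACGTAC",
    "ACGTTC",
    "ACCTAC",
    "ACG-AC",
]
consensus, conservation = consensus_and_conservation(alignment)
print(consensus)
print(conservation)
```

Columns scoring 1.0 are fully conserved in this toy set; lower values flag the diverging positions.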
II. The Problem
Rigorous multiple global alignment of sequences is possible, BUT the computational time required is proportional to the product of the sequence lengths.
--In Big O notation the cost becomes exponential in the number of sequences: O(L^N) for N sequences of length L
--if a pairwise alignment requires n x m time, then a multiple alignment of three sequences using the Needleman-Wunsch algorithm would take n x m x o time, etc.
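To make the n x m cost concrete, the sketch below fills the Needleman-Wunsch table for two sequences and then counts how many table cells a full dynamic-programming alignment would need for more sequences. The scoring values (match +1, mismatch -1, linear gap -2) and the function names are assumptions chosen for illustration, not parameters from these notes.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of a and b using an (n+1) x (m+1) table."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap             # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap             # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # gap in b
            left = score[i][j - 1] + gap  # gap in a
            score[i][j] = max(diag, up, left)
    return score[n][m]

def dp_cells(lengths):
    """Number of table cells for a full DP alignment of sequences with these lengths."""
    cells = 1
    for length in lengths:
        cells *= length + 1
    return cells

print(needleman_wunsch_score("ACGT", "AGT"))
print(dp_cells([100, 100]))        # two sequences of length 100
print(dp_cells([100, 100, 100]))   # three sequences of length 100
```

Each added sequence multiplies the table size by roughly its length, which is why the rigorous approach stops being practical so quickly.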
Instead of a true multiple alignment…perform a series of progressive pairwise alignments in a step-by-step fashion.
The Steps:
1) Perform all possible pairwise alignments among the sequences. There are n(n-1)/2 pairs for n given sequences, so this becomes impractical beyond some number of sequences: about 20 min if one pairwise alignment takes 1 second and n=50 (1,225 pairs).
2) Establish a hierarchy of relationships (based on a UPGMA phylogeny or simple ordering). Distance methods are usually chosen because they are the most rapid.
3) Identify the most similar pair, generate a consensus, and use this consensus (averaged scores) to compare against the next closest members. Repeat.
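The steps above can be sketched roughly as follows: count the pairs, compute a distance for every pair, then let UPGMA decide the order in which sequences (and groups of sequences) are joined. This is a simplified sketch under stated assumptions: the distance is a plain fraction of differing positions on equal-length toy sequences rather than a score from real pairwise alignments, and all function names and data are hypothetical.

```python
from itertools import combinations

def pair_count(n):
    """Number of pairwise alignments needed for n sequences: n(n-1)/2."""
    return n * (n - 1) // 2

def p_distance(a, b):
    """Fraction of positions that differ (assumes equal-length sequences)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def upgma_merges(names, dist):
    """Return merges as (cluster_x, cluster_y, distance), closest pair first.

    dist maps frozenset({name_i, name_j}) to a distance; clusters are tuples of names.
    """
    sizes = {(name,): 1 for name in names}
    d = {frozenset([(a,), (b,)]): dist[frozenset([a, b])]
         for a, b in combinations(names, 2)}
    merges = []
    while len(sizes) > 1:
        closest = min(d, key=d.get)       # most similar pair of clusters
        x, y = sorted(closest)
        dxy = d.pop(closest)
        size_x, size_y = sizes.pop(x), sizes.pop(y)
        merged = x + y
        for other in sizes:               # size-weighted average distance to the new cluster
            dx = d.pop(frozenset([x, other]))
            dy = d.pop(frozenset([y, other]))
            d[frozenset([merged, other])] = (size_x * dx + size_y * dy) / (size_x + size_y)
        sizes[merged] = size_x + size_y
        merges.append((x, y, dxy))
    return merges

print(pair_count(50), "pairs,", round(pair_count(50) / 60, 1), "min at 1 s per pair")

seqs = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACTA", "D": "TCGAACTT"}
dist = {frozenset([a, b]): p_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)}
merges = upgma_merges(list(seqs), dist)
for x, y, dxy in merges:
    print(x, y, dxy)
```

The merge list plays the role of the guide tree: a progressive aligner would align the first pair, then the next, and finally combine the two groups.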
Some Problems with the Solution:
1) Still requires a trained eye to resolve some obvious discrepancies.
2) An inherent guide tree is created, which will dictate the phylogenetic analysis to some degree.
3) Order of input sequences sometimes matters.
· Thompson, Higgins and Gibson, 1994: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994 Nov 11;22(22):4673-80.
· “unabashedly ad hoc”: A Heuristic that works sometimes.
· Why does it work:
1) Weights sequences to compensate for biased representation
2) Proteins: close sequences (Blosum 80, hard matrix); distant sequences (Blosum 50, soft matrix)
3) Minimizes a gap blitzkrieg (by increasing affine parameters based on proximity)
4) Treatment of low-scoring alignments.
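To illustrate the idea behind point 1, the sketch below down-weights sequences that have many near-identical relatives in the input. Note that this neighbour-count scheme is a deliberately simple stand-in and is not the weighting CLUSTAL W itself uses; the 0.8 identity threshold, the function names, and the toy sequences are all assumptions for this example.

```python
def identity(a, b):
    """Fraction of identical positions (assumes equal-length, aligned sequences)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def neighbour_weights(aligned, threshold=0.8):
    """Give each sequence weight 1/(number of sequences at or above the identity
    threshold, itself included), then normalise so the weights sum to 1."""
    raw = []
    for s in aligned:
        neighbours = sum(identity(s, t) >= threshold for t in aligned)
        raw.append(1.0 / neighbours)
    total = sum(raw)
    return [w / total for w in raw]

# Three near-identical sequences and one outlier (hypothetical data).
seqs = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "TTGTCCGAAC"]
weights = neighbour_weights(seqs)
print(weights)
```

The three over-represented sequences end up sharing about as much total weight as the single divergent one, so the crowded group no longer dominates the averaged scores.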
clustalw
· Input file: a single file with multiple sequences in one of six formats (FASTA suggested)
· Menu driven options
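For the input file mentioned above, a multi-sequence FASTA file is simply a series of records, each a ">" header line followed by one or more sequence lines. The small parser below is a sketch to show that layout; the function name and the example records are hypothetical.

```python
def parse_fasta(text):
    """Parse multi-sequence FASTA text into {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""   # first word of the header is the name
            records[name] = []
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            records[name].append(line)    # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}

example = """>seq1 hypothetical example
ACGTACGT
ACGT
>seq2
ACGTTCGT
"""
sequences = parse_fasta(example)
print(sequences)
```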
(Guy Bottu, http://ben.vub.ac.be/embnet.news/vol2_1/align.html)
**************************************************************
******** CLUSTAL W(1.4) Multiple Sequence Alignments ********
**************************************************************
1. Sequence Input From Disc
2. Multiple Alignments
3. Profile Alignments
4. Phylogenetic trees
S. Execute a system command
H. HELP
X. EXIT (leave program)
Your choice: 2
To do a multiple alignment, enter "2" and the following menu is displayed:
****** MULTIPLE ALIGNMENT MENU ******
1. Do complete multiple alignment now (Slow/Accurate)
2. Produce guide tree file only
3. Do alignment using old guide tree file
4. Toggle Slow/Fast pairwise alignments = SLOW
5. Pairwise alignment parameters
6. Multiple alignment parameters
7. Reset gaps between alignments? = ON
8. Toggle screen display = ON
9. Output format options
S. Execute a system command
H. HELP
or press [RETURN] to go back to main menu
Your choice: 5
This menu contains some new features compared with the clustalv program (options 4, 7 and 8).
Option 5 shows the default pairwise alignment parameters.
********* PAIRWISE ALIGNMENT PARAMETERS *********
Slow/Accurate alignments:
1. Gap Open Penalty :10.00
2. Gap Extension Penalty :0.10
3. Protein weight matrix :BLOSUM30
Fast/Approximate alignments:
4. Gap penalty :3
5. K-tuple (word) size :1
6. No. of top diagonals :5
7. Window size :5
8. Toggle Slow/Fast pairwise alignments = SLOW
H. HELP
Enter number (or [RETURN] to exit):
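To see what the slow/accurate defaults above imply (gap open 10.00, gap extension 0.10), here is a small worked sketch of an affine gap cost. It assumes the common convention cost = open + extension x length; the exact formula inside CLUSTAL W may differ, and the function name is hypothetical.

```python
def affine_gap_cost(length, gap_open=10.00, gap_extend=0.10):
    """Cost of a single gap spanning `length` positions (assumed convention:
    gap_open + gap_extend * length)."""
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * length

one_long_gap = affine_gap_cost(10)          # one gap of 10 positions
ten_short_gaps = 10 * affine_gap_cost(1)    # ten separate gaps of 1 position
print(one_long_gap, ten_short_gaps)
```

With a high opening penalty and a low extension penalty, one long gap is far cheaper than many scattered short ones, which pushes the alignment toward fewer, longer gaps.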
Pressing [RETURN] brings back the MULTIPLE ALIGNMENT MENU shown above.
Your choice: 6
Hit return to see the default multiple alignment parameters.
****** MULTIPLE ALIGNMENT PARAMETERS ******
1. Gap Opening Penalty :10.00
2. Gap Extension Penalty :0.05
3. Delay divergent sequences :40 %
4. Toggle Transitions (DNA) :Weighted
5. Protein weight matrix :BLOSUM series
6. Use negative matrix :OFF
7. Protein Gap Parameters
H. HELP
Enter number (or [RETURN] to exit):
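One way to read option 3 above ("Delay divergent sequences :40 %") is that any sequence whose best percent identity to every other sequence falls below the cutoff is held back and added to the alignment last. The sketch below illustrates that reading only; it is an assumption about the option's behaviour, and the function name and identity values are hypothetical.

```python
def split_delayed(identity, cutoff=40.0):
    """Split sequence names into (align_first, delayed).

    identity maps frozenset({name_i, name_j}) to percent identity. A sequence is
    delayed when its best identity to any other sequence is below the cutoff.
    """
    names = sorted({name for pair in identity for name in pair})
    align_first, delayed = [], []
    for name in names:
        best = max(value for pair, value in identity.items() if name in pair)
        (delayed if best < cutoff else align_first).append(name)
    return align_first, delayed

# Hypothetical pairwise percent identities for four sequences.
identity = {
    frozenset(["A", "B"]): 85.0,
    frozenset(["A", "C"]): 62.0,
    frozenset(["B", "C"]): 60.0,
    frozenset(["A", "D"]): 31.0,
    frozenset(["B", "D"]): 28.0,
    frozenset(["C", "D"]): 35.0,
}
align_first, delayed = split_delayed(identity)
print(align_first, delayed)
```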