Guide to Using MSA3D
in Protein Explorer -- by Eric Martz, August 2000
Portions of PE's MSA3D routines were generously contributed by
Paul Stothard.
Walk-Through for First-Time Users: Enolase
Thanks to Garry Duncan, Nebraska Wesleyan College, for providing
the enolase alignment.
- Please start by reading carefully the overview on the "MSA3D
Procedure" page (accessed from Advanced Explorer).
- Now click on the link "MSA3D ALIGNMENT FORM"
and carefully read the "MSA3D Form" page.
You can skip the "Advanced Options" section at the bottom.
If portions of this page
are unclear, they will become clear as we proceed.
- Now, in the "MSA3D Form" window,
please find the Ready-Made Examples and click the link "Enolase".
Accept the offer to fetch 4enl.pdb via Internet, or else you
will need to load a local copy.
Notice that clicking "Enolase" caused the relevant alignments
to be pasted into both boxes.
- Press the button "Color Alignment & Molecule". A new window
will appear containing the "MSA3D Alignment Listing". Read carefully
the explanation at the top, and see also the summary counts and
percentages at the bottom of this window.
- After you have scrutinized the Listing, click on the molecule
to bring it to the foreground. Notice that the alignment colors have
been applied to the molecule.
- Click on the links Identical, Similar, Different to spacefill
the residues in these categories.
The catalytic site is marked by a bound sulfate ion, and deeper, a
brown zinc ion.
Notice that the entire area around
the active site is "Identical" in an evolutionary span from Archaebacteria
through man!
Pasting in an alignment and correcting mismatches with "sliding".
Thanks to Gabe McCool, University of Massachusetts Amherst,
for acquainting me with these molecules.
- Instructions for making an alignment in Biology Workbench
are given in a later section below. Before you do that, the prepared alignment
in this section will give you some
useful experience in using the MSA3D feature. This is an alignment between
chain B of tubulin and the bacterial cell division protein FtsZ.
These two proteins have less than 20% sequence identity, but a high
level of structural similarity.
(For more information on these proteins,
please see
Exploring structure and function of FtsZ, a prokaryotic cell
division protein and
tubulin-homologue
by Gabe J. McCool.) The alignment below was done by ClustalW in
Biology Workbench, using default settings. Given the low level of
sequence homology, the alignment may not be very meaningful, but it
is useful to illustrate some features of MSA3D.
>1TUB_B; Tubulin from Sus scrofa, electron diffraction
MREIVHIQAGQCGNQIGAKFWEVISDEHGIDPTGSYHGDSDLQL--ERINVYYNEAAGNKYVPRAILVDLEP
GTMDSVRSGPFGQIFRPDNFVFGQSGAGNNWAKGHYTEGAELVDSVLDVVRKESESCDCLQGFQLTHSLG
GGTGSGMGTLLISKIREEYPDRIMNTFSVVPSPKVSDTVVEPYNATLSVHQLVENTDETYCIDNEALYDI
CFRTLKLTTPTYGDLNHLVSATMSGVTTCLRFPGQLNADLRKLAVNMVPFPRLHFFMPGFAPLTSRGSQQ
YRALTVPELTQQMFDAKNMMAACDPRHGRYLTVAAVFRGRMSMKEVDEQMLNVQNKNSSYFVEWIPNNVK
TAVCDIPPRGLKMSATFIGNSTAIQELFKRISEQFTAMFRRKAFLHWYTGEGMDEMEFTEAESNMNDLVS
EYQQYQD
>1FSZ; Methanococcus jannaschii
-------------SPEDKELLEYLQQTKAKITVVGCGGAGNNTI--TRLKMEG--------IEGAKTVAINT
DAQQLIRTKADKKILIGKKLTRG-LGAG-----GNPKIGEEAAKESAEEIKAAIQDSDMVF---ITCGLG
GGTGTGS-APVVAEISKKIG---ALTVAVVTLPFVMEGKVRMKNAMEGLERLKQHTDTLVVIPNEKLFEI
VPN--MPLKLAFKVADEVLINAVKGLVELITKDGLINVDFADVKAVMN---NGGLAMIGIG--ESDSEKR
AKEAVSMALNSPLLDVD-----IDGATGALIHVMGPED--LTLEEAREVVATVSSR--------------
----------LDPNATIIWG--------ATIDENLENTVRVLLVITGVQSR----IEFTDTGLKRKKL--
-------
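A figure such as "less than 20% sequence identity" can be checked for any aligned pair with a few lines of Python. This is only a sketch: the name percent_identity is made up here, the two short strings in the example are invented fragments rather than real proteins, and the choice to ignore every column that has a gap in either sequence is an assumption (other definitions of percent identity use a different denominator).

```python
GAP_CHARS = "-."


def percent_identity(aligned_a, aligned_b):
    """Percent of compared columns holding the same residue in both sequences.

    Columns with a gap in either sequence are skipped (an assumption; other
    conventions divide by the full alignment length instead).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # nothing to compare in a gapped column
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Invented fragments, only to show the call.
print(percent_identity("AC-DE", "ACQDF"))
```

To try it on the alignment above, join the rows of each sequence into one string (without the ">" label lines) and pass the two strings to the function.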
- Load 1fsz.pdb.
- In the "MSA3D Form" window, press the "Clear Form" button
and OK the confirmation. Block and paste the alignment above into
the top "Alignment Box" on the MSA3D Form window. (Don't worry
about the spaces at the beginning of each line -- spaces will be
ignored.)
- Block the 1FSZ portion of the above alignment and
paste it into the lower "3D Sequence" box.
- Uncheck "Apply colors to molecule". This is optional but will save
some time until we get the mismatches fixed.
- Click on "Color Alignment & Molecule". (The molecule won't be colored,
however, since we unchecked that option.) Notice that nearly all residues
are red, signifying mismatches with the aligned 3D sequence.
Had we loaded
the wrong PDB file altogether, this would be the result, and this coloring
would prevent you from inadvertently thinking the alignment colors
could be meaningfully applied to the 3D structure.
Notice that the number of residues not mismatched is 4 + 1 + 10 = 15.
We can expect up to about 10% matches
at random, in the absence of any sequence identity. Note that 15 residues is
close to 5% of the 312 residues shown.
- In the "Alignment Listing" window,
touch the N-terminal Ser
with the mouse and notice (in the status bar) that it is residue 23 in the
sequence of 1fsz.pdb. This causes 22 dots to be prefixed, representing
the missing 22 residues (presumably disordered and unresolved in the crystal).
These types of gaps are typically closed up in aligned sequences.
Notice that the leading sequences labeled 1FSZ and 1fsz.pdb agree, but
are offset by 22 residues. To make them match, we need to slide
the PDB file sequence 22 residues to the left. To instruct MSA3D to do this, enter
"-22" (a negative number slides to the left) in the slot labeled "slide the PDB file sequence to the right
positions". Now press the "Color Alignment & Molecule" button again.
There are now 0 mismatches (check summary line at the bottom of the Listing
window).
- Here is a more complicated example.
Bring the main Protein Explorer window to the foreground,
click on the link "MSA3D Procedure".
Enter 1tub (tubulin) into the slot near "Load"
at item 4 on the "MSA3D Procedure" page.
- Bring the "MSA3D Form" window to the foreground, and replace
the contents of the bottom box "3D Sequence" with the aligned
sequence 1TUB_B. (Leave the contents of the top box unchanged.)
- Delete the "-22" in the slot.
- Uncheck "Apply colors to molecule".
- Press the "Color Alignment & Molecule" button. Examining the listing
will reveal that about 90% of the 1tub.pdb sequence is mismatched, and there
is no obvious offset that would correct this. The problem is that we did
not specify which chain is in the alignment, so chain A was used by default.
There is not much sequence similarity between the two chains in tubulin.
Enter "b" in the "Apply colors to chain(s)" slot.
Press the "Color Alignment & Molecule" button again.
- The first 44 residues are matched, but a 2-residue gap causes
a mismatch thereafter. Touching the first dot in the gap reports
in the status line that it is position 45. Therefore we must slide
the PDB file sequence 2 positions to the left starting at position 45.
In the "slide the PDB file sequence to the right" slot, enter
"-2@45". Press the "Color Alignment & Molecule" button again.
- Mismatches are now avoided up to an 8-residue gap beginning with a dot
at position 361.
In the "slide the PDB file sequence to the right" slot, enter
"-2@45;-8@361". Press the "Color Alignment & Molecule" button again.
Zero mismatches -- hooray!
- Now check "Apply colors to molecule", and
press the "Color Alignment & Molecule" button again.
Pull the main Protein Explorer window to the foreground so you can see
the molecule. The Ready/Busy indicator below the molecule will be busy
while the colors are applied, and again while the Identical/Similar/Different
buttons are generated.
- The main purpose of the above exercise was to make clear what
mismatches mean, and how to correct them when sliding is needed.
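The sliding values used above ("-22" for 1fsz, "-2@45;-8@361" for chain B of 1tub) follow directly from the gaps in the residue numbering of the PDB file. The sketch below is not part of MSA3D; suggest_slides is a hypothetical helper that turns a list of residue numbers into a string in the same style. It assumes that the alignment has every such gap closed up, that numbering would start at 1 if nothing were missing, and that a leading slide and "@" slides can be combined with semicolons (this guide shows each form only separately).

```python
def suggest_slides(residue_numbers):
    """Build a slide string such as "-2@45;-8@361" from PDB residue numbers.

    residue_numbers: one integer per residue, in file order.
    Assumes the first residue number is 1 or greater.
    """
    slides = []
    expected = 1  # the number the next residue would have with no gap
    for number in residue_numbers:
        missing = number - expected
        if missing > 0:
            if expected == 1:
                slides.append(f"-{missing}")             # leading gap
            else:
                slides.append(f"-{missing}@{expected}")  # internal gap
        expected = number + 1
    return ";".join(slides)


# Numbering like chain B of 1tub as described above: residues 45-46 and
# 361-368 are absent. The end value (445) is arbitrary for this example.
tubulin_like = list(range(1, 45)) + list(range(47, 361)) + list(range(369, 446))
print(suggest_slides(tubulin_like))

# Numbering like 1fsz as described above: the first residue is number 23.
print(suggest_slides(range(23, 100)))
```

Treat the result as a starting point only. If the alignment does not close up a gap, or the "3D Sequence" differs from the PDB file in some other way, the Listing will still show mismatches and the values must be adjusted by hand.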
Characteristics of alignments suitable for MSA3D
- The multiple sequence alignment must be for amino acids.
It would be possible to adapt MSA3D to handle RNA alignments as well.
If you wish to use MSA3D for RNA,
please contact
Eric Martz.
- The alignment must be in FASTA format.
PIR formatted alignments can be used with minor editing (see
example below).
- The alignment cannot exceed about 30,000 bytes because larger blocks of
text are truncated to that size when pasted into a browser form box.
A mechanism has been designed and tested (but not released) that can handle
much larger alignments. If you need this,
please contact
Eric Martz.
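Before pasting, an alignment can be checked against these requirements outside the browser. The following is a minimal sketch, not MSA3D's own code: parse_fasta and fits_in_form are names invented here, and the constant 30,000 is simply the approximate figure quoted above (the exact cutoff in any given browser is an assumption).

```python
MAX_FORM_BYTES = 30_000  # approximate limit quoted above; exact cutoff assumed


def parse_fasta(text):
    """Split FASTA text into a list of (label, sequence) pairs."""
    records = []
    label, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                records.append((label, "".join(chunks)))
            label, chunks = line[1:].strip(), []
        elif label is not None:
            chunks.append(line.replace(" ", ""))  # spaces carry no information
        else:
            raise ValueError("sequence data found before the first '>' line")
    if label is not None:
        records.append((label, "".join(chunks)))
    return records


def fits_in_form(text):
    """True if the text is no larger than the assumed form limit."""
    return len(text.encode("utf-8")) <= MAX_FORM_BYTES


example = ">seq1\nACDE\nFGH-\n>seq2\nAC-E\nFGHI\n"
print(parse_fasta(example))
print(fits_in_form(example))
```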
Ready-made alignments from HOMSTRAD
- The Homologous Structure Alignment Database (HOMSTRAD) offers
alignments of families of sequences within the Protein Data Bank.
That is, the only sequences included are those corresponding to
3D structure entries in the Protein Data Bank. This has the advantage
that the alignments usually contain fewer than a dozen sequences, and hence
fit easily into MSA3D's form and are processed rapidly.
- For example, go to HOMSTRAD and search for "recombinase".
Several families are displayed, with 2 to 5 sequences per alignment.
Click on the family
Bacterial DNA recombination protein, RuvA: holliday junction DNA helicase RuvA.
- You will see a table containing (at the time I tried it) only two
PDB codes, 1cuk and 1bvs.
- Click on the link pir in the bottom line of the table. This
displays the alignment in PIR format. PIR format is very close to FASTA
format; however, PIR has two lines of comments preceding the sequence,
while FASTA has only one. This confuses MSA3D, so you need to edit the
alignment to reduce the two lines to one. This can be done simply by
deleting the carriage return to join the first two lines into one long
one. (It will wrap in the form, but still be processed as one line.)
Alternatively, you can delete the text in red, which is tedious if you
have a lot of sequences in your alignment, but enables the labels
to show in the MSA3D Alignment Listing. (Labels are truncated at the first
semicolon.) Example: change this
>P1;1cuk
structureX:1cuk: 1 : : 203 : :holliday junction DNA helicase RuvA:Escherichia coli: 1.9: 20.9
MIGRLRGIIIEKQPPLVLIEVGGVGYEVHMPMTCFYELPEAGQEAIVFTHFVVREDAQLLYGFNNKQERTLFKEL
IKTNGVGPKLALAILSGMSAQQFVNAVEREEVGALVKLPGIGKKTAERLIVEMKDRFKGLHGDLFTPTDDAEQEA
VARLVALGYKPQEASRMVSKIARPDASSETLIREALRAAL--*
to this
>1cuk holliday junction DNA helicase RuvA:Escherichia coli
MIGRLRGIIIEKQPPLVLIEVGGVGYEVHMPMTCFYELPEAGQEAIVFTHFVVREDAQLLYGFNNKQERTLFKEL
IKTNGVGPKLALAILSGMSAQQFVNAVEREEVGALVKLPGIGKKTAERLIVEMKDRFKGLHGDLFTPTDDAEQEA
VARLVALGYKPQEASRMVSKIARPDASSETLIREALRAAL--*
- Copy the PIR alignment from HOMSTRAD and paste it into a text editor
(Word, WordPad, BBEdit, etc.). Edit it as explained above. Now it is ready
to paste into the MSA3D form and use to color the 3D image. We'll assume
you've done the sections on How to Use MSA3D, so you
know how to proceed.
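If the alignment has many sequences, the PIR editing described above can be scripted instead of done by hand. The sketch below is a hypothetical helper (pir_to_fasta is not part of Protein Explorer or HOMSTRAD). It assumes every entry looks like the 1cuk example: a ">P1;code" line followed by one colon-separated description line in which the seventh and eighth fields are the protein name and the source organism. Entries that do not fit that layout simply keep their whole description line.

```python
def pir_to_fasta(pir_text):
    """Join each two-line PIR header into one FASTA-style header line."""
    out = []
    lines = pir_text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        if line.startswith(">") and ";" in line:
            code = line.split(";", 1)[1].strip()
            description = lines[i + 1].strip() if i + 1 < len(lines) else ""
            fields = [field.strip() for field in description.split(":")]
            if len(fields) >= 8:
                label = f"{fields[6]}:{fields[7]}"  # name and organism
            else:
                label = description  # unknown layout: keep it all
            out.append(f">{code} {label}".rstrip())
            i += 2  # skip the description line just consumed
        else:
            if line:
                out.append(line)  # sequence lines pass through unchanged
            i += 1
    return "\n".join(out)


pir = (
    ">P1;1cuk\n"
    "structureX:1cuk: 1 : : 203 : :holliday junction DNA helicase RuvA"
    ":Escherichia coli: 1.9: 20.9\n"
    "MIGRLRGIIIEKQPPLVLIEVGGVGYEVHMPMTCFYELPEAGQEAIVFTHFVVREDAQLLYGFNNKQERTLFKEL\n"
)
print(pir_to_fasta(pir))
```

As in the hand-edited example, the sequence lines (including the trailing "*") are left untouched.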
Preparing an alignment in Biology Workbench (BW).
BW is a very flexible and powerful system. The method described
below is only one of many ways it could be used to prepare a multiple
protein sequence alignment.
BW is not very user friendly, but
the fact that it saves your sessions makes it worth the trouble.
Once you get the hang of it, you can try variations on the method below.
The instructions below were written for Biology Workbench. After
they were written,
Biology Workbench for Students became available. It has a more limited
set of options, and is a bit more user friendly, and it also saves your sessions
(indeed, shares them with Biology Workbench!).
Students may prefer to use it, while researchers will prefer the
additional options in the full Workbench.
In the Student Workbench, the process is similar to the
one outlined below for the full
Biology Workbench -- a few details are different, but
you should have no trouble
adapting the procedure below.
Caveat: I have little experience preparing alignments!
If you know of a tutorial with better or more complete advice, please
tell
me about it.
- Go to the Biology Workbench (BW).
- If you have not used BW before, click "Setup a free account".
It takes only a few
minutes. The advantage is that your sessions will be saved, so you can easily
resume one.
- After you enter BW, click the [Session Tools] button.
- Select "Start New Session", and press the [Run] button.
- Enter a session description, such as the name of the molecule
of interest. Press the [Start New Session] button.
SELECTING SEQUENCES
Sequences can be selected to address different questions. Often,
one wants to know which residues are conserved over a broad range
of phylogenetic distance. Another question is which residues have
mutated in closely related molecules, for example wild type human
hemoglobin vs. sickle cell hemoglobin.
The Sickle Hemoglobin Mutation in MSA3D
If you want to visualize the difference between wild type vs. sickle human
hemoglobin using MSA3D, view 2HBS (human sickle hemoglobin)
in Protein Explorer,
and use an alignment between the sequences of the beta chains (chain B
in each case) of
2HBS and 2HHD (human
wild type hemoglobin). You can use the steps below to search for
these two PDB ID codes ("2HBS or 2HHD") and align chains B, or
if you're impatient,
here is that alignment ready to paste into MSA3D's form:
>2HHD_B
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGA
FSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANA
LAHKYH
>2HBS_B
VHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGA
FSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANA
LAHKYH
Be sure to enter BDFH (all the beta chains) in the slot
"Apply colors to chain(s)" on the MSA3D form. Otherwise the alignment will
be compared with the alpha chain, and you'll get mostly mismatches!
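To see in advance which residues MSA3D should report as "Different" for this pair, the two sequences can be compared position by position. This is a sketch only: differences is a name made up for this example, and it assumes two sequences of equal length with no gaps, which holds for the hemoglobin alignment above. Only the first row of each sequence is used here, since the remaining rows are identical.

```python
def differences(reference, variant):
    """Return (position, reference residue, variant residue) for each change.

    Positions count from 1. The sequences must have equal length, no gaps.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(reference, variant), start=1)
        if a != b
    ]


# First row of 2HHD_B and of 2HBS_B from the alignment above.
wild_type = "VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGA"
sickle = "VHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGA"
print(differences(wild_type, sickle))  # [(6, 'E', 'V')]
```

The single difference at position 6 is the E6V mutation.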
If you want to find out what the mutant valine residue (E6V) is contacting
in order to aggregate the two hemoglobin molecules:
- At the MSA3D Result, click Different to spacefill just the
mutant valine. (If the Identical residues are spacefilled, click
Hide after the word Identical.)
- Go to Advanced Explorer, then Quickviews.
- SELECT Clicked, one residue per click, and select the one valine that is
touching the adjacent hemoglobin molecule.
- Click Stop at the top of the middle frame, to stop selection.
- DISPLAY Contacts.
- Zoom to enlarge the view.
- To tell which chain each contacting residue belongs to, enter this
command in the command entry slot: select not balled.
- COLOR Chain.
- To find out more, visit this
Hemoglobin Tutorial. You'll learn that the residues contacting
the mutant valine in 2HBS are not the same ones involved in sickle
cell disease. Nevertheless, the principle is similar: the additional
surface hydrophobicity of the mutant valine "precipitates" the hemoglobin.
- Press the [Protein Tools] button.
- Select "Ndjinn - Multiple Database Search", and press the [Run]
button.
- Check only one database: PDBFINDER. (Look for the first blue line,
OMIM; just below it is PDBFINDER.) This guarantees that there is a
3D structure (PDB file) for everything you find.
- Enter the name of the molecule of interest in the slot at the top,
and press the [Search] button.
- A list of sequence names is displayed.
(If you get "No sequences found",
go back and make sure you checked PDBFINDER -- if you don't check any
databases, it doesn't tell you, but simply reports "No sequences found".
This is an example of user unfriendliness.)
Each sequence name is prefixed with "PDBFINDER"
(meaning it has a published 3D structure). You need to select at least
one of these to include in your alignment. More than one is also OK.
Use the [Show Records] button to get more information
about the checked sequences.
- After you have checked the desired sequences, and unchecked others,
press the [Import Sequences] button.
- At this point, you may already have enough sequences to try
an alignment. If so, skip ahead to ALIGNMENT.
Even two sequences sometimes give an informative result.
GETTING MORE SEQUENCES
- Method I: Searching by Name. Go to Protein Tools and use the Ndjinn
search as in the preceding steps, but this time check the SWISSPROT
database (instead of only PDBFINDER). You'll get a lot more sequences.
(If you can't find it in the long list
of databases, use Netscape's Edit, Find in Page to look for "swiss".)
- Method II: Searching by Sequence Similarity. Go to
Protein Tools and check just one sequence name, the one
representing the PDB file you want to color in Protein Explorer.
(You may want to preview the PDB files you've selected in Protein
Explorer to select the best one for 3D viewing.)
- Select BLASTP, then press Run.
- On the BLASTP page, select the SWISSPROT database.
- Just below the databases list is a slot for "Expectation value".
The default (10) is much too high. Enter 0.1.
- Scroll to the bottom and press the Submit button.
- Now you need to select some of the sequences in this list
for importing. Use the [Show Records] button to help your selection.
- Use the [Import Sequence(s)] button to import sequences you
wish to align.
ALIGNMENT
- Now you have a list of sequences with checkboxes. Select
"Select All Sequences" and press [Run].
- Uncheck any sequences you don't want to include in the alignment.
- Make sure you check at least one sequence for which a
3D structure is available. All PDBFINDER sequences have 3D structures.
SWISSPROT sequences usually don't.
- Scroll down in the list of operations at the top until you find
CLUSTALW (near the middle of the list). Select it and [Run]. On the
next screen titled CLUSTALW, press [Submit].
- Examine the alignment carefully. An alignment that has very few
identities, or very few differences, may not be informative. If you wish
to exclude one or more sequences, press the [Return] button and rerun
the alignment.
- Once you are satisfied with the alignment, press [Import Alignment].
- You should now see a list of all alignments you have made (initially
just one), each with a checkbox. Notice that you are now in the Alignment
Tools, no longer in Protein Tools.
- Now we need to get the alignment in FASTA format. Check the
checkbox for the desired alignment. Select "Edit
Aligned Sequences", press [Run].
- Find the Format menu, and change it to "Fasta".
- Block and copy the alignment. Paste it directly into
Protein Explorer's MSA3D form. Optionally, also paste into a word processor
and save it as a file for later use.
- Select the one sequence that matches the 3D structure PDB file
you wish to view and color. Copy
that sequence into the bottom "3D Sequence" box on the MSA3D Form. Load the
corresponding PDB file. Assuming you have done the tutorial above,
you will now know how to proceed.
- Sometimes the PDB file sequence does not begin anywhere in
the alignment listing -- the PDB file sequence line is all
dashes. If this happens, use the Molecule Information Window to open the
Sequences display. Note the number of the first residue
(we'll call it "N1"). Enter the value -(N1 - 1) in the slot on
the MSA3D Alignment Form labeled "slide the PDB file sequence".
For example, if N1 is 389, enter -388 in the slot, then press the
[Color Alignment and Molecule] button.
- Occasionally, CLUSTALW will fail to align a sequence correctly
with other sequences. If the alignment is important to you, inspect it carefully in the
MSA3D Alignment Listing. Look for crucial sequence motifs known for
this family of molecules
and make sure they are aligned. (If the alignment is incorrect, I don't know how to fix it.
Please send suggestions to me.)
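The -(N1 - 1) rule above can also be worked out from the PDB file itself rather than from the Sequences display. This is a sketch under stated assumptions: first_residue_number and leading_slide are hypothetical helpers, not Protein Explorer functions, and the file is assumed to use the standard fixed-column PDB layout in which an ATOM record carries the chain identifier in column 22 and the residue sequence number in columns 23-26.

```python
def first_residue_number(pdb_text, chain=None):
    """Residue number (N1) of the first ATOM record, optionally for one chain."""
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) >= 26:
            # Column 22 (index 21) is the chain ID;
            # columns 23-26 (index 22:26) are the residue number.
            if chain is None or line[21].upper() == chain.upper():
                return int(line[22:26])
    raise ValueError("no ATOM record found for that chain")


def leading_slide(n1):
    """Value for the slide slot when the first residue is numbered n1."""
    return -(n1 - 1)


print(leading_slide(389))  # -388, as in the example above
print(leading_slide(23))   # -22, as in the 1fsz walk-through
```

With the text of a PDB file in a string named pdb_text, leading_slide(first_residue_number(pdb_text, "B")) would give the value to try for chain B.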
Error avoidance and design features of the MSA3D tool.
- MSA3D refuses to proceed unless all the aligned sequences have
the same length. Were an unaligned sequence of the loaded PDB file
to be pasted into
the lower box, most likely the length would differ from the alignment, and
hence this would be caught.
- When a residue in the PDB file sequence is not identical to the
residue in that position in the aligned "3D Sequence", both residues
will be colored "mismatch" in the listing, and the mismatch color will
also be applied to the 3D structure.
This avoids applying colors to the wrong residues, as might occur if
(i) the wrong PDB file is loaded, or (ii)
the "3D Sequence" in the alignment is not 100% identical with the
"PDB file sequence", or
(iii) a sliding correction is needed
to get a match between the "3D Sequence" and the "PDB file sequence".
- The sequence of the PDB file can be longer or shorter than the
alignment, and vice versa.
- The residue counts (and percentages, and total residues) in the
summary at the bottom of the alignment listing include only the portion
of PDB file residues that fit underneath the alignment.
If sliding to the left causes residues in the PDB file to be skipped,
they will neither be listed nor included in the summary counts.
- If the PDB file sequence
is longer than the alignment, residues beyond the end of the alignment
will be listed in the "No Info" color, colored "No Info" in the 3D
structure, and excluded from
the summary counts at the bottom of the listing window.
- When the first residue in the PDB file has a negative or zero
sequence number, the negative or zero numbers
will be reported in the MSA3D Alignment Listing
window (in the status line, when the residues are touched with the mouse).
[Prior to PE version 1.901, released 8/31/01, the listing always assigned
1 to the first residue, and this caused colors to be applied to the wrong
residues when the first sequence number in the PDB file was negative.]
- Gaps, including a leading gap, in the PDB file sequence
will be represented by dots (periods) and will
require sliding corrections
to avoid mismatches.
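The two main checks described in this section, equal lengths within the alignment and residue-by-residue agreement between the "3D Sequence" and the PDB file sequence, can be sketched as follows. This illustrates the idea only and is not MSA3D's actual code: the function names are invented, the example sequences are made-up fragments, and skipping the "-" gap characters of the aligned sequence before comparing is an assumption about how the comparison lines up.

```python
def check_equal_lengths(aligned_sequences):
    """Raise ValueError unless all aligned sequences have the same length."""
    lengths = {len(sequence) for sequence in aligned_sequences}
    if len(lengths) > 1:
        raise ValueError(f"aligned sequences differ in length: {sorted(lengths)}")


def count_matches(aligned_3d, pdb_sequence):
    """Return (matches, mismatches) over the overlapping residues.

    aligned_3d: the "3D Sequence" row of the alignment; "-" marks gaps.
    pdb_sequence: sequence from the PDB file; "." marks missing residues.
    """
    residues = [c for c in aligned_3d.upper() if c != "-"]
    matches = 0
    mismatches = 0
    # zip stops at the shorter sequence, so only the overlap is counted.
    for aligned_residue, pdb_residue in zip(residues, pdb_sequence.upper()):
        if aligned_residue == pdb_residue:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches


aligned = "--SPED-KELL"  # made-up fragment with alignment gaps
pdb = "..SPEDKELL"       # same residues, preceded by 2 missing residues

check_equal_lengths([aligned, "MRSPEDQKELL"])
print(count_matches(aligned, pdb))      # before sliding: (0, 8)
print(count_matches(aligned, pdb[2:]))  # slid 2 to the left: (8, 0)
```

Dropping the first two characters of the PDB sequence plays the role of entering "-2" in the slide slot.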