Introduction to Bioinformatics (KB7004) → Lab: Multiple sequence alignments

Lab: Multiple sequence alignments

Instructions

Answer all questions and send your answers to Marco by email.

Please send a PDF file (not .docx, .odt, etc.) named “name_lab_your_name.pdf”.

Video instructions

Grading

The lab reports are training in writing scientific texts and an important part of the course. To pass, you need to submit a final version of the lab report within 7 days after the lab. You have one chance to submit a preliminary version and receive feedback on it (optional). The preliminary version must be submitted within 72 hours of the lab, and feedback will be provided at the latest 24 hours before the deadline of the assignment. If the final report is not submitted in time or contains an error, you will get an Fx on the lab course. This means that you have to submit an updated lab report within 7 days after the exam, and that the highest grade you can get on the entire course is an E. If this does not happen, you will have to re-register for the course the next time it is given (normally the following year) and complete the missing parts then, and you still cannot receive a grade higher than E. In short, these are the rules for the lab reports:

A. Day 0 – Lab

B. 0-3 days after the lab – Preliminary version

C. 3-6 days after the lab – Feedback

D. 7 days after the lab – Final version

The lab report must be complete upon submission, i.e. you have to answer every question. The assistant has the right to return or reject the submission if it is not complete. Linguistic accuracy is a must.

Returned reports must be resubmitted to us within a week after you have received them. The resubmitted version must show all comments. Every comment must be considered carefully; otherwise the resubmission does not count as a return. A maximum of 1 return is allowed. A second return means that you have to re-register for the course the next time it is given (normally the following year) and complete the lab then.

Complete reports in time for all bioinformatics and programming labs are compulsory.

Assignment Instructions

Download all you need first!

Resources

P08487.fasta

Get the sequences of the proteins with the following accession numbers: A2AP18, O75038, Q4KWH8, Q8N3E9, Q4KWH5, Q8K2J0. The easiest way is to use UniProt and its Retrieve/ID mapping tool.
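If you prefer to script the download, the sketch below fetches each entry from UniProt's REST API and writes them all into one multi-FASTA file. It assumes network access and the URL pattern https://rest.uniprot.org/uniprotkb/<accession>.fasta; the output file name hits.fasta is hypothetical.

```python
"""Fetch UniProtKB entries in FASTA format (a sketch; assumes network access)."""
import urllib.request

# Accession numbers listed in the assignment.
ACCESSIONS = ["A2AP18", "O75038", "Q4KWH8", "Q8N3E9", "Q4KWH5", "Q8K2J0"]

# Assumed endpoint: UniProt's REST API serves single entries as FASTA at this pattern.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{acc}.fasta"


def fasta_url(accession: str) -> str:
    """Return the REST URL for one UniProtKB entry in FASTA format."""
    return UNIPROT_FASTA_URL.format(acc=accession.strip().upper())


def fetch_fasta(accession: str, timeout: float = 30.0) -> str:
    """Download one entry as FASTA text."""
    with urllib.request.urlopen(fasta_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {header line without '>': sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


if __name__ == "__main__":
    # Hypothetical output file holding all six entries.
    with open("hits.fasta", "w") as out:
        for acc in ACCESSIONS:
            text = fetch_fasta(acc)
            out.write(text if text.endswith("\n") else text + "\n")
```

Run it as a script to create the file; parse_fasta can then be used to check that all six entries arrived.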

PSI-BLAST

PSI-BLAST is an extension of BLAST and is employed to find sequences that are more distantly related to the query sequence. You can find information about how it works at
http://www.ncbi.nlm.nih.gov/books/NBK2590/

    1. How is PSI-BLAST different from blastp?

    2. What is a PSSM? Explain what kind of information it contains and how it is used.

    3. What does it mean when a PSI-BLAST search converges?
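To make the idea of a profile concrete, here is a simplified sketch of how a PSSM can be built from a gap-free block of aligned sequences. Each column gets a log-odds score per amino acid that compares the observed, pseudocount-smoothed frequency with a background frequency. This is not PSI-BLAST's actual procedure, which adds sequence weighting and more elaborate pseudocounts; the uniform background and the pseudocount of 1 are assumptions made to keep it short.

```python
import math
from typing import Dict, List, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(
    alignment: List[str],
    pseudocount: float = 1.0,
    background: Optional[Dict[str, float]] = None,
) -> List[Dict[str, float]]:
    """One {residue: log2-odds score} dict per column of an equal-length alignment."""
    if background is None:
        # Assumption: uniform background; real tools use observed amino acid frequencies.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    pssm = []
    for col in range(length):
        # Residues observed in this column; gaps and unknown letters are skipped.
        residues = [seq[col] for seq in alignment if seq[col] in AMINO_ACIDS]
        total = len(residues) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((residues.count(aa) + pseudocount) / total) / background[aa])
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm: List[Dict[str, float]], sequence: str) -> float:
    """Ungapped score of a sequence against the PSSM; unknown letters score 0."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, sequence))
```

Positive scores mark residues seen more often than expected by chance at that position, so a highly conserved column rewards essentially only its own residue.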

We will use PSI-BLAST to find evolutionarily related (homologous) sequences. The query sequence is taken from the Swiss-Prot database and is the same as the one used at the end of the last assignment (P08487.fasta.txt).
On the Protein BLAST page, PSI-BLAST can be chosen as the algorithm. The first round is similar to running blastp. The sequences found in the first round are then used to build a so-called sequence profile, which is used to search the database in the next round.
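The iteration logic can be summarised as a loop: search with the current profile, rebuild the profile from the hits above the inclusion threshold, and stop when a round finds no sequences that were not already found in the previous round. In the sketch below, search is a hypothetical callable standing in for one PSI-BLAST round; it receives the IDs the profile is built from and returns the IDs of the hits that pass the threshold.

```python
from typing import Callable, FrozenSet, Iterable, List, Optional


def iterate_until_converged(
    search: Callable[[FrozenSet[str]], Iterable[str]],
    query_id: str,
    max_rounds: int = 10,
) -> List[FrozenSet[str]]:
    """Return the hit set of each round; stop once a round adds no new hits."""
    members = frozenset({query_id})  # round 1: the profile is just the query (like blastp)
    previous: Optional[FrozenSet[str]] = None
    history: List[FrozenSet[str]] = []
    for _ in range(max_rounds):
        hits = frozenset(search(members))
        history.append(hits)
        if previous is not None and hits <= previous:
            break  # converged: nothing new passed the inclusion threshold
        previous = hits
        members = hits | {query_id}  # next profile is built from query + included hits
    return history
```

The max_rounds limit mirrors the fact that a search may never converge, in which case you simply stop after a fixed number of iterations.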

    4. Which sequence is most similar to P08487 in the first PSI-BLAST round when searching the Swiss-Prot database?

    5. Run PSI-BLAST iteration 2. Which of the sequences found in iteration 2 but not in iteration 1 is most similar to your query protein P08487?
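If you download the hit list of each iteration as a tab-separated hit table, the comparison in question 5 can be double-checked with a few lines of code. The sketch assumes the standard 12-column BLAST tabular layout (subject accession in column 2, bit score in column 12) with optional '#' comment lines; the column order is an assumption here, so check the header of your own download.

```python
import csv
from typing import Dict, Optional, Tuple


def best_bitscores(table_text: str) -> Dict[str, float]:
    """Map subject accession -> best bit score, from BLAST tabular text."""
    rows = (
        line for line in table_text.splitlines()
        if line.strip() and not line.startswith("#")  # skip blanks and comment lines
    )
    best: Dict[str, float] = {}
    for fields in csv.reader(rows, delimiter="\t"):
        subject, bitscore = fields[1], float(fields[11])  # assumed column positions
        best[subject] = max(bitscore, best.get(subject, float("-inf")))
    return best


def best_new_hit(iter1_text: str, iter2_text: str) -> Optional[Tuple[str, float]]:
    """Top-scoring subject found in iteration 2 but absent from iteration 1."""
    first = best_bitscores(iter1_text)
    new = {acc: s for acc, s in best_bitscores(iter2_text).items() if acc not in first}
    return max(new.items(), key=lambda item: item[1]) if new else None
```

Ranking by bit score is one choice; ranking by E-value (lower is better) normally gives the same top hit within one search.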

Multiple sequence alignments
Kalign

    6. Explain briefly why you would use a multiple sequence alignment to study a group of related proteins.

    7. What does it mean that a site is conserved?
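One common way to put a number on conservation is the Shannon entropy of each alignment column: a column in which every sequence has the same residue has entropy 0, and the value grows as the residues become more varied. The sketch below ignores gaps and treats all substitutions alike; Jalview's own Conservation annotation instead uses a score based on shared physico-chemical properties, so the values will not match.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_entropy(column: Sequence[str]) -> float:
    """Shannon entropy (bits) of the residues in one column; gaps are ignored."""
    residues = [r for r in column if r not in "-."]
    if not residues:
        return float("nan")  # all-gap column: conservation undefined
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def conservation_profile(alignment: List[str]) -> List[float]:
    """Entropy for every column of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*alignment)]
```

Low entropy means a conserved site; for example, in ["MKVL", "MKIL", "MRVA"] the first column (all M) scores 0.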

Now you are going to create a multiple sequence alignment (MSA) of some of the hits from the PSI-BLAST search above. BLAST only shows the part of each hit sequence that aligns to the query, so to build a proper MSA we first need to retrieve the full-length sequences.
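Once the full-length sequences are downloaded, they and the query need to be in a single FASTA file before they can be given to an aligner. A minimal sketch that concatenates FASTA texts and drops records whose header line repeats (for example, if the query also appears among the hits); the file names in the usage note are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def merge_fasta_texts(texts: Iterable[str]) -> str:
    """Combine several FASTA texts into one, dropping records whose header repeats."""
    seen = set()
    out_lines = []
    for text in texts:
        keep = False  # ignore any stray lines before the first header of each text
        for line in text.splitlines():
            if line.startswith(">"):
                keep = line not in seen
                seen.add(line)
            if keep and line.strip():
                out_lines.append(line)
    return "\n".join(out_lines) + "\n"


def merge_fasta_files(paths: List[str], out_path: str) -> None:
    """File-based wrapper: read each FASTA file and write the merged result."""
    merged = merge_fasta_texts(Path(p).read_text() for p in paths)
    Path(out_path).write_text(merged)
```

For example, merge_fasta_files(["P08487.fasta", "hits.fasta"], "all.fasta") would produce one input file for the aligner.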

Next, all the sequences you downloaded, together with the query sequence, will be aligned in a multiple alignment. The first program we will look at is T-Coffee, which accepts sequences in FASTA format as input. Save the multiple sequence alignment created by T-Coffee using “Download Alignment File” and visualize it with Jalview, which you can start by typing “jalview” in a terminal window. More information about Jalview is available here.
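Once the alignment is open in Jalview, you can compare its coloring with the rule behind the BLOSUM62 scheme. According to the Jalview documentation (check the Jalview help for the exact rules), a residue identical to the column consensus is colored dark blue, a residue with a positive BLOSUM62 score against the consensus is colored light blue, and everything else is left uncolored. The sketch below applies that rule to one column; score is any function returning the BLOSUM62 score of two residues, and the consensus tie-breaking is an assumption that may differ from Jalview's.

```python
from collections import Counter
from typing import Callable, List, Sequence


def consensus_residue(column: Sequence[str]) -> str:
    """Most common non-gap residue (ties go to the residue seen first)."""
    residues = [r for r in column if r not in "-."]
    return Counter(residues).most_common(1)[0][0] if residues else "-"


def blosum_color_classes(column: Sequence[str], score: Callable[[str, str], float]) -> List[str]:
    """Label each residue 'dark' (= consensus), 'light' (positive score) or 'none'."""
    cons = consensus_residue(column)
    labels = []
    for r in column:
        if r in "-." or cons == "-":
            labels.append("none")  # gaps stay uncolored
        elif r == cons:
            labels.append("dark")
        elif score(r, cons) > 0:
            labels.append("light")
        else:
            labels.append("none")
    return labels
```

With Biopython installed, real scores can come from substitution_matrices.load("BLOSUM62") in Bio.Align, e.g. score = lambda a, b: blosum62[a][b].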

    8. Color the alignment according to BLOSUM62. Explain the coloring scheme.

    9. Calculate a phylogenetic tree using Neighbour Joining with BLOSUM62. Include an image of the resulting tree and an image of your full alignment. (Hint: use Save As/Export.)
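Neighbour joining builds the tree by repeatedly merging the pair of clusters i, j that minimises Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of distances from i to all current clusters, and then recomputing distances to the new node. Jalview first derives pairwise distances from BLOSUM62 scores of the aligned sequences; the sketch below starts from a ready-made distance matrix instead and returns an unrooted tree in Newick format. The five-taxon matrix in the example is a toy, not lab data.

```python
from typing import Dict, List


def neighbour_joining(names: List[str], dist: List[List[float]]) -> str:
    """Unrooted NJ tree (Newick) from a symmetric distance matrix."""
    if len(names) < 2:
        raise ValueError("need at least two sequences")
    # Distances keyed by cluster label; merged clusters are labelled by their Newick subtree.
    d: Dict[str, Dict[str, float]] = {
        a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)
    }
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # row sums
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new node to the two merged clusters.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    if len(nodes) == 2:
        a, b = nodes
        return f"({a}:{d[a][b]:g},{b}:0);"
    x, y, z = nodes  # join the last three at one central node
    lx = 0.5 * (d[x][y] + d[x][z] - d[y][z])
    ly = 0.5 * (d[x][y] + d[y][z] - d[x][z])
    lz = 0.5 * (d[x][z] + d[y][z] - d[x][y])
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"


if __name__ == "__main__":
    toy = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
           [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
    print(neighbour_joining(list("abcde"), toy))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Branch lengths are printed with six significant digits; the tree is unrooted, so viewers may draw it with a different root than Jalview.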

ClustalW
Now we will use another program, ClustalW, to create a multiple sequence alignment. ClustalW is also available as a web server, but we will run it locally.

Start ClustalW with the command clustalw, choose 1. Sequence Input From Disc and type in the name of your sequence file. Next, create a multiple alignment with default parameters (2. Multiple Alignments, then 1. Do complete multiple alignment now (Slow/Accurate)). Save the output as a *.aln file; you should now have a file containing the alignment.
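The menu-driven session above is what the lab asks for. For repeated runs, ClustalW can also be started non-interactively; the sketch below builds and runs such a call from Python. The option names (-INFILE, -ALIGN, -OUTFILE, -TYPE) are assumed from ClustalW's command-line help, so confirm them with clustalw -help on the lab machine (the executable may also be installed as clustalw2). The file names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def clustalw_command(infile: str, outfile: str, executable: str = "clustalw") -> List[str]:
    """Argument list for a default-parameter ClustalW protein alignment (assumed flags)."""
    return [executable, f"-INFILE={infile}", "-ALIGN", f"-OUTFILE={outfile}", "-TYPE=PROTEIN"]


def run_clustalw(infile: str, outfile: str, executable: str = "clustalw") -> None:
    """Run ClustalW if it is on PATH; raise if it is missing or exits with an error."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} not found on PATH")
    subprocess.run(clustalw_command(infile, outfile, executable), check=True)
```

For example, run_clustalw("all.fasta", "all.aln") should leave the alignment in all.aln (CLUSTAL format by default), alongside a guide tree file.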

You can view the multiple alignment in Jalview as before.
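Besides viewing it in Jalview, the .aln file is plain text and easy to read programmatically, for example to compute pairwise identities and compare the ClustalW and T-Coffee alignments. A minimal parser for the CLUSTAL format, assuming the usual layout: a header line starting with CLUSTAL, then blocks of "name  sequence-chunk" lines, each block followed by a line of conservation marks that begins with whitespace. Percent identity is computed here over gap-free columns only, which is one of several common definitions.

```python
from typing import Dict


def read_clustal(text: str) -> Dict[str, str]:
    """Parse CLUSTAL-format (.aln) text into {name: aligned sequence}."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("not a CLUSTAL alignment")
    seqs: Dict[str, str] = {}
    for line in lines[1:]:
        if not line.strip() or line[0].isspace():
            continue  # blank lines and conservation-mark lines (*, :, .)
        fields = line.split()
        if len(fields) < 2:
            continue
        name, chunk = fields[0], fields[1]  # an optional residue count in fields[2] is ignored
        seqs[name] = seqs.get(name, "") + chunk
    return seqs


def percent_identity(a: str, b: str) -> float:
    """Identity (%) over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)
```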

Once the alignment is displayed, try out the different coloring schemes available.

Calculate a phylogenetic tree from your multiple alignment using Neighbour Joining with BLOSUM62.

    10. Explain (briefly) the ClustalX coloring scheme.

    11. Again, include an image of the alignment and of the resulting phylogenetic tree.