Lab: Phylogeny and Pfam
Instructions
Answer all questions and send your answers to Marco by email.
***Please send a PDF file (not .docx, .odt, etc.) named “name_lab_your_name.pdf”.
Video
Grading
The lab reports are training in writing scientific texts and an important part of the course. To pass, you need to submit a final version of the lab report within 7 days after the lab. You have one chance to submit a preliminary version and receive feedback on it (not mandatory). The preliminary version must be submitted within 72 hours of the lab, and feedback will be provided no later than 24 hours before the assignment deadline. If the final report is not submitted in time or it contains an error, you will get an Fx on the lab course. This means that you have to submit an updated lab report within 7 days after the exam and that the highest grade you can get is an E on the entire course. If you do not, you will have to re-register for the course the next time it is given (normally next year) and complete the missing parts then, and you still cannot receive a grade higher than E. In short, these are the rules for the lab reports:
A. Day 0 – Lab
B. 0-3 days after the lab – Preliminary version
C. 3-6 days after the lab – Feedback
D. 7 days after the lab – Final version
The lab report must be complete upon submission, i.e. you have to provide an answer to each question. The assistant has the right to return or not approve the submission if it is not complete. Linguistic accuracy is a must. A returned report should be resubmitted to us within a week after you have received it. The resubmission must be a version in which all comments are still visible. All comments must be considered carefully, or else the resubmission does not count as a return. A maximum of one return is allowed; a second return means that you have to re-register for the course the next time it is given (normally next year) and complete the lab then. Complete reports, submitted in time, are compulsory for all bioinformatics and programming labs.
Assignment Instructions
Download all you need first!
Resources
Phylogeny
In this assignment you will learn how to build and interpret different types of phylogenetic trees.
1. What kind of information does a phylogenetic tree illustrate?
2. What is the difference, in terms of the information displayed, between rooted and unrooted trees?
3. Use the UPGMA method together with the Hamming distance to build the phylogenetic tree for the following DNA sequences: ACTT, AGGG, GATT, TGGG. Show every step you performed when building the tree. What is one of the main disadvantages of this method?
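The UPGMA procedure named in question 3 can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function names `hamming` and `upgma` and the toy sequences `s1` to `s4` are assumptions made for this example and are deliberately not the sequences from the question. At each step the two closest clusters are merged, the new node is placed at half their distance, and distances to the remaining clusters are recomputed as size-weighted averages.

```python
from itertools import combinations


def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def upgma(seqs):
    """Cluster named sequences with UPGMA.

    Returns (topology, steps): topology is a Newick-like string without
    branch lengths, and each step is (left, right, node_height) in merge order.
    """
    names = list(seqs)
    # cluster id -> (label, number of leaves)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # pairwise distances keyed by (smaller id, larger id)
    dist = {
        (i, j): float(hamming(seqs[names[i]], seqs[names[j]]))
        for i, j in combinations(range(len(names)), 2)
    }
    steps = []
    next_id = len(names)
    while len(clusters) > 1:
        # Closest pair; on ties min() keeps the first pair it encounters.
        i, j = min(dist, key=dist.get)
        d = dist[(i, j)]
        label_i, n_i = clusters.pop(i)
        label_j, n_j = clusters.pop(j)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted mean of the two old distances.
        merged = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            merged[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Drop every distance that involves one of the two merged clusters.
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        dist.update(merged)
        # UPGMA places the new node at half the distance between the clusters.
        steps.append((label_i, label_j, d / 2))
        clusters[next_id] = (f"({label_i},{label_j})", n_i + n_j)
        next_id += 1
    topology = next(iter(clusters.values()))[0]
    return topology, steps


# Toy input (made up for this sketch, not the sequences from the question).
toy = {"s1": "AAAA", "s2": "AAAT", "s3": "CCGG", "s4": "CCGT"}
topology, steps = upgma(toy)
print(topology)
for left, right, height in steps:
    print(f"merge {left} + {right} at height {height}")
```

Printing the merge steps mirrors what the question asks you to show by hand: which clusters were joined, in which order, and at what height.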

Figure 1. Example of a species phylogenetic tree. The lengths of the branches correspond to distances between sequences.
Look at the phylogenetic tree shown in Figure 1 and answer the following questions:
4. Which mouse has the largest evolutionary distance to the common ancestor (tree root)?
5. In terms of ordinary time, which mouse diverged first, the grey one or the purple one?
Read the abstract from the paper below:
Science 1994 Nov 18;266(5188):1229-1232
DNA was extracted from 80-million-year-old bone fragments found in strata
of the Upper Cretaceous Blackhawk Formation in the roof of an underground
coal mine in eastern Utah. This DNA was used as the template in a polymerase
chain reaction that amplified and sequenced a portion of the gene encoding
mitochondrial cytochrome b. These sequences differ from all other cytochrome b
sequences investigated, including those in the GenBank and European Molecular
Biology Laboratory databases. DNA isolated from these bone fragments and the
resulting gene sequences demonstrate that small fragments of DNA may survive
in bone for millions of years.
The authors of this research paper concluded that the DNA sequence appears to be from a dinosaur that lived 80 million years ago. Woodward et al. did not present an evolutionary tree; instead, they discussed their sequences in terms of percentage of sequence difference, noting that the discovered cytochrome b sequences differed from all other cytochrome b sequences found in the databases.
Using the dinosaur sequence attached (dino_2.fa.txt) and the cytochrome b sequences from other species attached (dino_others.fa.txt), together with your skills from previous computer practicals, try to answer the following questions.
6. Use one of the methods shown in the previous lab to align the sequences and construct an evolutionary tree. Create and submit a screenshot of the multiple sequence alignment and of the phylogenetic tree.
7. Which three species' sequences are the most related to the dinosaur sequence according to your phylogenetic tree?
8. What can you conclude from your observations, in terms of biology and evolution?
* Obtained from http://myweb.dal.ca/js551958/Tutorial/Resources.html
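The "percentage of sequence difference" that Woodward et al. reported can be computed directly from an alignment, and it is also the kind of pairwise distance that distance-based tree methods start from. The sketch below is a minimal illustration under stated assumptions: the helper names `read_fasta` and `percent_difference` are made up for this example, the three toy sequences are invented, and columns with a gap in either sequence are simply skipped. It does not reproduce what the alignment or tree-building tools from the previous lab do internally.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def percent_difference(a, b):
    """Percentage of mismatching columns between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper())
                if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    mismatches = sum(x != y for x, y in compared)
    return 100.0 * mismatches / len(compared)


# Toy alignment (invented for this sketch, not real cytochrome b data).
toy_alignment = """\
>query
ACGT-ACGTA
>ref1
ACGTTACGTA
>ref2
ACCTTAC-TT
"""

seqs = read_fasta(toy_alignment)
for name in ("ref1", "ref2"):
    print(name, percent_difference(seqs["query"], seqs[name]))
```

To use it on the lab data, the aligned sequences would first have to be exported in FASTA format from the alignment tool and read into a string.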
Pfam – Protein Domain Family Database (Pfam paper and profile HMM)
Pfam is a tool, resource and database for protein domain families. It groups protein sequences into families based on sequence similarity and has two levels of organization, called family and clan. A clan contains two or more families; some families are not members of any clan. New releases of the Pfam database are made available for public use on a regular basis. The latest release (26) has more than 13,000 sequence families. Use the Pfam website, the textbook, and other resources, e.g. online information, to answer the following questions.
9. What do you think is a protein domain?
10. What do you think is a protein fold?
11. What procedure does Pfam use to create sequence families and to add new members to existing families? What do you think is the role of HMM profiles in this process?
12. Find the entry "PLCG1_BOVIN" in the UniProt (Swiss-Prot) database. Get its peptide sequence and search it using the Pfam Search utility. How many non-overlapping Pfam-A domains does it have? Include a screenshot of the Pfam predictions for this protein in your report.
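When counting non-overlapping domains in question 12, it can help to check the start and end coordinates of the reported hits rather than judging by eye. The sketch below assumes each hit is a tuple of (name, start, end) with 1-based inclusive coordinates. The function names and the hits `domA` to `domD` are hypothetical and are not the Pfam results for PLCG1_BOVIN. The greedy selection shown here is a generic interval technique and is not a description of how Pfam itself resolves overlapping matches.

```python
from itertools import combinations


def overlapping_pairs(hits):
    """Return the names of every pair of hits whose ranges share a residue."""
    pairs = []
    for (name_a, start_a, end_a), (name_b, start_b, end_b) in combinations(hits, 2):
        # Inclusive ranges overlap when each one starts before the other ends.
        if start_a <= end_b and start_b <= end_a:
            pairs.append((name_a, name_b))
    return pairs


def select_non_overlapping(hits):
    """Greedily pick a largest set of mutually non-overlapping hits.

    Hits are visited in order of their end coordinate, and a hit is kept
    only if it starts after the last kept hit has ended.
    """
    selected = []
    last_end = 0
    for hit in sorted(hits, key=lambda h: h[2]):
        _, start, end = hit
        if start > last_end:
            selected.append(hit)
            last_end = end
    return selected


# Hypothetical hits (name, start, end); not real Pfam output.
hits = [
    ("domA", 10, 120),
    ("domB", 100, 180),
    ("domC", 200, 290),
    ("domD", 291, 350),
]

print(overlapping_pairs(hits))
print([name for name, _, _ in select_non_overlapping(hits)])
```

If `overlapping_pairs` returns an empty list for the hits you copied from the Pfam results, every reported domain counts as non-overlapping.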