Lab 1+2: Databases and genomes
Instructions
Answer all questions and send your answers to Marco by email.
***Please send a PDF file (not .docx, .odt, etc.) named “name_lab_your_name.pdf”.
Video introduction
https://www.youtube.com/watch?v=ypqTnpBSsq8
Assignment Instructions
Download all you need first!
Resources
Genes and Open Reading Frames
In this part of the assignment, you will use the ORF finder tool to search for open reading frames (ORFs) in the DNA sequence you enter. Before you start, it is a good idea to read the online information about ORFs on Wikipedia.
HINT: when you search for ORFs consider all 6 possible reading frames (3 forward and 3 reverse)!!!
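To make the hint concrete, here is a minimal sketch of a six-frame scan in Python. It is not the algorithm used by the ORF finder tool, and its counts may differ from the tool's. It assumes that an ORF starts at ATG and runs to the first in-frame stop codon, that the stop codon counts towards the length, and that the input contains only A, C, G and T. The names `find_orfs`, `reverse_complement` and `min_len` are made up for this illustration.

```python
# Minimal six-frame ORF scan (illustrative only; not the ORF finder algorithm).
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=300):
    """Yield (strand, frame, start, end, length) for each ATG...stop ORF.

    Assumptions: an ORF starts at ATG and ends at the first in-frame stop
    codon (stop included in the length); start and end are 0-based positions
    on the scanned strand; later ATGs inside an open ORF are ignored.
    """
    seq = seq.upper()
    # Scan the forward strand and its reverse complement: 3 frames each.
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    length = i + 3 - start
                    if length >= min_len:
                        yield (strand, frame + 1, start, i + 3, length)
                    start = None


# Toy sequence with one short ORF (ATG AAA TTT TAG) in forward frame 3.
print(list(find_orfs("CCATGAAATTTTAGGG", min_len=9)))
# → [('+', 3, 2, 14, 12)]
```

The toy sequence has no complete ORF on the reverse strand, so only the forward hit is reported. With the default `min_len=300` the same call returns an empty list.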
1. What is an Open Reading Frame? Why is it easier to find or predict genes in prokaryotic genomes than in eukaryotic genomes?
2. Give examples of a few key features used to find genes in eukaryotic genomes.
3. Attached you will find part of a sequence from E. coli (E.coli.txt). Use ORF finder to find open reading frames longer than 300 nucleotides in that sequence. How many do you find (it does not matter if they overlap)?
4. Attached you will find an unknown genomic sequence (unknown-seq.txt). Use ORF finder again to find ORFs longer than 300 nucleotides in this sequence. How many do you find?
5. Based on your findings, do you think that the sequence is prokaryotic or eukaryotic? Justify your answer.
Entrez/GQuery
Now you will use the Entrez (GQuery) search tool to search for information about Bacteriorhodopsin. Right-click the Entrez link and choose “Open in new window (or tab)”. Before you start, it is a good idea to read the online information available about Entrez. When you are done, you should know which databases can be searched with Entrez and how to search them.
6. How many known structures of Bacteriorhodopsin (and its variants) did you find with Entrez?
7. How many Bacteriorhodopsin proteins and variants are found in each of the following: Bacteria, Eukaryotes and Archaea? Describe briefly how you performed the search.
PubMed
PubMed is a resource for searching and finding research articles or papers on various subjects. In the PubMed search field, you have several Boolean operators at your disposal, e.g. AND, OR, and NOT. Putting double quotes (“”) around a query phrase makes the search match that exact phrase. Limits can also make your search more specific. Alternatively, you can specify which field to search by using [keyword], e.g. lemon[Title]. Consult the search field descriptions for more options. When answering the following questions, please include the exact search string.
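As an illustration of how operators, quoted phrases and field tags combine into one search string, here is a small Python sketch. It only builds strings and does not contact PubMed. The helper names `combine` and `esearch_url` and the phrase "citrus fruit" are made up for this example. The URL targets NCBI's E-utilities ESearch endpoint, which is an optional alternative to typing the string into the PubMed search field and is not required for this lab.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (optional alternative to the web form).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def combine(op, *terms):
    """Join terms with a Boolean operator (AND, OR, NOT), parenthesised."""
    return "(" + f" {op} ".join(terms) + ")"


def esearch_url(term, db="pubmed"):
    """Build an ESearch URL that asks only for the number of hits."""
    params = {"db": db, "term": term, "rettype": "count", "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


# A field-tagged word combined with a quoted phrase (hypothetical query).
query = combine("AND", "lemon[Title]", '"citrus fruit"')
print(query)               # the string you would paste into the search field
print(esearch_url(query))  # the same query as a URL-encoded request
```

The printed query string is what counts as the "exact search string" in your answers. The URL form is only a convenience if you prefer to script the lookup.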
8. How many articles are there about "molecular dynamics"?
9. How many articles have an author named Lindahl?
10. How many articles about "molecular dynamics" have an author named Lindahl?
11. How many review articles about "protein structure prediction" can you find?
The UniProt Knowledgebase
The UniProt Knowledgebase consists of both SwissProt and TrEMBL. Release statistics should be linked from the front page.
12. How many sequences are there in SwissProt and in TrEMBL?
13. Why does TrEMBL have considerably more sequences than SwissProt?
14. Search for CCR5 (the C-C chemokine receptor type 5) in UniProt. How long is the human CCR5 protein?
15. Check out CCR5 in human. How many scientific articles refer to this protein (excluding computationally mapped references)?
16. Which disease is it associated with?
Other sequence databases
Locate DDBJ, EMBL-ENA and GenBank.
17. What kind of databases are they?
18. How are they different from SwissProt and TrEMBL?
Biochemical (pathway) databases
19. What does KEGG stand for? What is KEGG?
20. Sucrose is involved in a number of pathways. Search for it in KEGG and find out which ones.
Genetic disorder database
21. What does OMIM stand for? What is OMIM?
22. Look up oculocutaneous albinism (type II) in OMIM. What eye colour can this lead to? Please provide the link you used.
23. What gene is thought to be connected with this disorder and where is it located? Please provide the link you used.
Gene Ontology (GO)
24. What is Gene Ontology (GO)?
25. What three categories are at the first classification level in GO?
26. Search for alcohol dehydrogenase in GO. What definition for the alcohol dehydrogenase activity is there for the first hit?