Lab: Alignments and BLAST
Instructions
Answer all questions and send your answers to john.lamb.su@analys.urkund.se by email.
***Please send a PDF file (not .docx, .odt, etc.) named “yourname_lastname_labAlignments.pdf”.
Video instructions
https://www.youtube.com/watch?v=_lMPwWjLBoI
Assignment Instructions
Download all you need first!
Resources
- P08487.fasta (download and place it in /scratch/)
- uniref90.fasta (copy it to /scratch/ by writing the following in the terminal):
cp /common/courses/introduction_to_bioinformatics/uniprot/uniref90.fasta /scratch/
Grading
The lab reports are practice in writing scientific texts and an important part of the course. To pass, you need to provide a final version of the lab report within 7 days after the lab. You have one chance to submit a preliminary version and receive feedback on it (not mandatory). The preliminary version needs to be submitted within 72 hours of the lab, and feedback will be provided at the latest 24 hours before the deadline of the assignment. If the final report is not submitted in time or it contains an error, you will get an Fx on the lab course. This means that you have to submit an updated lab report within 7 days after the exam and that the highest grade you can get is an E on the entire course. If you do not, you will have to re-register for the course the next time it is given (normally next year) and complete the missing parts that year, and you still cannot receive a grade higher than E. In short, these are the rules for the lab reports:
A. Day 0 – Lab
B. 0-3 days after the lab – Preliminary version
C. 3-6 days after the lab – Feedback
D. 7 days after the lab – Final version
The lab report must be complete upon submission, i.e. you have to provide an answer to each question. The assistant has the right to return / not approve the submission if it is not complete. Linguistic accuracy is a must. A returned report should be sent back to us within a week after you have received it. The resubmission must be a version in which all comments appear. All comments should be considered carefully, or else it does not count as a resubmission. A maximum of 1 return is allowed. A second return means that you have to re-register for the course the next time it is given (normally next year) and complete the lab then. Complete reports submitted in time for all bioinformatics and programming labs are compulsory.
BLAST
BLAST is a collection of programs for performing biological sequence searches on large sequence collections or databases. The most well-known BLAST programs are blastn for nucleotide sequence searches and blastp for protein sequence searches. Several BLAST tutorials are available online, e.g. http://www.ncbi.nlm.nih.gov/books/NBK21097/
BLAST programs can be used on a web server online or as a command line tool offline. When you have only a few query sequences, the web server is usually preferable. When you have a large number of query sequences, the command line version is usually preferred. Here, we will use the command line version.
BLAST should already be installed on your computer. The program blastall contains several BLAST programs, including blastn and blastp. It has many options, and several of them will be useful during this practical. List these options by running “blastall -h” or “blastall --help”. You can also send the output of blastall -h through less using the pipe character |. Try this by running blastall -h | less.
The different options for blastall are known as flags and they start with the “-” character, e.g. -p and -i. Some flags accept one parameter directly after them while other flags accept zero parameters and function more like stand-alone boolean variables.
To run blastall with flags, you need to write blastall together with its flags on a single line in the shell. If you have a single fasta sequence in file “test.fasta” and you have a database called “/path/uniprot.fasta” then you can search your protein against the database by running e.g. blastall -p blastp -d /path/uniprot.fasta -i test.fasta
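If you prefer to start the search from a script, the same command can be assembled as a list of arguments. The following is a minimal Python sketch, not part of the lab requirements; the function name build_blastall_command is a hypothetical helper, and only the flags already shown in this lab (-p, -d, -i, -o) are used.

```python
import shlex


def build_blastall_command(program, database, query, output=None):
    """Assemble a blastall call as a list of arguments (hypothetical helper)."""
    command = ["blastall", "-p", program, "-d", database, "-i", query]
    if output is not None:
        # -o writes the report to a file instead of the terminal
        command += ["-o", output]
    return command


command = build_blastall_command("blastp", "/path/uniprot.fasta", "test.fasta")
# Show the command as it would be typed on a single line in the shell
print(shlex.join(command))
```

To actually run the search, the list could be passed to subprocess.run(command, check=True), assuming blastall is on your PATH.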
Questions:
1. What is the difference between blastn and blastp?
2. Which blastall flag would you use to specify the blast program (e.g. blastp)?
3. What does the e-value stand for and which e-values are better and which are worse?
At this point, you should have downloaded the attached file P08487.fasta to your /scratch/ directory. Use blastp and search the unknown protein sequence against the database (uniref90.fasta) attached to this assignment. Remember that you should have copied or downloaded all the files to your /scratch/ directory.
Two steps are needed in order to continue. First, format your database using formatdb, so that the formatted database can be specified as input to blastall:
formatdb -i /scratch/uniref90.fasta
Note that with the flag “-n” you can give a specific name to your formatted database.
Second, run:
blastall -p blastp -d /scratch/uniref90.fasta -i /scratch/P08487.fasta -o result.txt
Now, you can answer the following questions:
4. What is the best hit you obtain? You can just copy paste the line from the Blast output.
5. How many hits did you find at e-values less than or equal to 10?
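Counting hits by hand in a long report is error-prone, so a small script can help you check your count. The sketch below assumes that the search was rerun with tabular output (the blastall option -m 8, which is not used elsewhere in this lab), where each line has 12 tab-separated columns with the subject identifier in column 2 and the e-value in column 11. The function names and the sample lines are hypothetical and are not real results for P08487.

```python
def parse_evalue(text):
    """Convert an e-value string to a float."""
    # Some BLAST output abbreviates 1e-100 as "e-100"; float() needs the leading 1
    return float("1" + text if text.startswith("e") else text)


def count_hits(tabular_lines, max_evalue):
    """Count distinct subject sequences with an e-value of at most max_evalue."""
    subjects = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if parse_evalue(fields[10]) <= max_evalue:
            subjects.add(fields[1])  # one subject may have several alignments
    return len(subjects)


# Made-up example lines, for illustration only
sample_lines = [
    "query1\tsubjA\t98.5\t200\t3\t0\t1\t200\t1\t200\te-110\t390",
    "query1\tsubjB\t45.0\t180\t90\t4\t5\t184\t10\t190\t2e-20\t95.1",
    "query1\tsubjC\t28.0\t60\t40\t2\t20\t79\t33\t92\t7.5\t30.0",
    "query1\tsubjC\t25.0\t40\t28\t1\t90\t129\t100\t139\t9.0\t29.0",
]
print(count_hits(sample_lines, 1e-5))  # 2
print(count_hits(sample_lines, 10))    # 3
```

With a real file, the lines could be read with open("result_tab.txt"), where result_tab.txt is a hypothetical file name for the tabular output.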
Dotplots and Sequence alignments
6. Name one advantage and one disadvantage when using dotplots to detect evolutionary relationships between sequences.
7. What is the alignment corresponding to the path illustrated in the following dotplot?
8. What is the Hamming distance between the sequences TGGGC and TAGGC?
9. What kind of information does a BLOSUM matrix describe?
10. Why do you think it is necessary to use substitution matrices when scoring an alignment?
11. Using a BLOSUM matrix, which substitution is more probable: F<->Y or H<->R?
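The Hamming distance asked about in question 8 is the number of positions at which two sequences of equal length differ. A minimal Python sketch of this definition is given below; the function name and the example sequences are chosen for illustration only and are not the sequences from the question.

```python
def hamming_distance(seq_a, seq_b):
    """Count the positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Illustrative sequences: they differ at positions 3 and 6
print(hamming_distance("GATTACA", "GACTATA"))  # 2
```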
Below are two pairwise alignments between two sequences. The BLOSUM62 substitution matrix can be found at
http://www.uky.edu/Classes/BIO/520/BIO520WWW/blosum62.htm
X1G: VMMMVIH--F
Y1G: VM--VIHWHY

X2G: VMMMVIHF--
Y2G: ---MVIHWHY
12. What are the total substitution scores of the alignments? (Hint: sum the scores from the BLOSUM62 matrix)
13. What are the total gap penalties for the two alignments when
A. using the gap opening penalty 1 and gap extension 1?
B. using the gap opening penalty 10 and gap extension 0.5?
Explain how you computed your results.
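Gap penalties depend on the convention used, so state yours in the report. The Python sketch below assumes that a gap of length n costs gap_open + (n - 1) * gap_extend, and offers the alternative gap_open + n * gap_extend through the hypothetical parameter extend_first. The function names and the example alignment are made up for illustration and are not the alignments from this lab.

```python
import re


def gap_lengths(aligned_row):
    """Return the length of every run of '-' characters in one aligned row."""
    return [len(run) for run in re.findall(r"-+", aligned_row)]


def gap_penalty(aligned_rows, gap_open, gap_extend, extend_first=False):
    """Sum the affine gap penalties over all rows of an alignment.

    extend_first=False: a gap of length n costs gap_open + (n - 1) * gap_extend
    extend_first=True:  a gap of length n costs gap_open + n * gap_extend
    """
    total = 0.0
    for row in aligned_rows:
        for length in gap_lengths(row):
            extended = length if extend_first else length - 1
            total += gap_open + gap_extend * extended
    return total


# Made-up alignment with one gap of length 2 and one gap of length 1
rows = ["AC--GT-A", "ACTTGTCA"]
print(gap_lengths(rows[0]))                            # [2, 1]
print(gap_penalty(rows, 10, 0.5))                      # 20.5
print(gap_penalty(rows, 10, 0.5, extend_first=True))   # 21.5
```

The two conventions give different totals for the same alignment, which is why the explanation of how you computed your results matters.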
