Lab: 3D Structure Prediction
Instructions
Answer all questions and send your report to John by email.
***Please send a PDF file (not .docx, .odt, etc.) named “name_lab_your_name.pdf”.
Video
Assignment Instructions
Download all you need first!
Resources
Grading
The lab reports are training in writing scientific texts and an important part of the course. To pass, you need to submit a final version of the lab report within 7 days after the lab. You have one chance to submit a preliminary version and receive feedback on it (not mandatory). The preliminary version must be submitted within 72 hours of the lab, and feedback will be provided at the latest 24 hours before the assignment deadline. If the final report is not submitted in time or it contains an error, you will get an Fx on the lab course. This means that you have to submit an updated lab report within 7 days after the exam and that the highest grade you can get is an E on the entire course. If you do not, you will have to re-register for the course the next time it is given (normally next year) and complete the missing parts that year, and you still cannot receive a grade higher than E. In short, these are the rules for the lab reports:
A. Day 0 – Lab
B. 0-3 days after the lab – Preliminary version
C. 3-6 days after the lab – Feedback
D. 7 days after the lab – Final version
The lab report must be complete upon submission, i.e. you have to provide an answer to each question. The assistant has the right to return / not approve the submission if it is not complete. Linguistic accuracy is a must. Returns should get back to us within a week after you have received them. A resubmitted return must be a version in which all comments are visible. All comments must be considered carefully, or else it does not count as a return. A maximum of 1 return is allowed. A second return means that you have to re-register for the course the next time it is given (normally next year) and complete the lab then. Complete reports submitted in time for all bioinformatics and programming labs are compulsory.
Introduction
The tertiary structure of a protein is important to determine and study, because proteins perform their function in their 3D form. The Protein Data Bank (PDB) has been and still is the main database for publicly known 3D structures of large biomolecules, including proteins. The 3D structure determination of proteins is done mainly by X-ray crystallography and NMR spectroscopy. Both of these methods require a lot of time and effort, so determining all protein structures experimentally would take an unacceptable amount of time. Researchers have therefore been searching for ways to infer 3D structures for structurally uncharacterized proteins by using information from structurally characterized proteins. Subsequent efforts have shown that two proteins generally agree strongly in 3D structure down to roughly 50% sequence identity, and sometimes even down to 30% sequence identity.
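The sequence identity thresholds mentioned above can be made concrete with a small calculation. The following is a minimal Python sketch, assuming the two sequences have already been aligned (gaps written as "-") and that identity is counted only over columns where neither sequence has a gap. Other definitions divide by the alignment length or by the length of the shorter sequence, so the numbers reported by alignment tools may differ. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are not counted in this definition
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from the lab material.
    identity = percent_identity("MKT-AYIAKQR", "MKSLAYLA-QR")
    print(f"{identity:.1f}% identity")
```

In this toy alignment, 7 of the 9 gap-free columns are identical, which would place the pair well above the 50% level discussed above.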
1. Give a short description of how X-ray crystallography is performed.
2. Why did researchers develop methods for protein structure prediction?
3. Use the PDB Advanced Search to find out how many PDB entries have had their structures determined via X-ray crystallography.
3D Structure Databases
The tertiary structure, or three-dimensional structure, of a protein is generally built up from secondary structure elements positioned in a particular manner with respect to each other. Secondary structure elements are alpha helices, beta strands and loop regions. In protein structure terminology, one often refers to protein chains. A protein chain is the tertiary structure of a continuous peptide chain; in other words, it is the 3D structure of a full or partial peptide sequence. All the residues that build up a single protein chain originate from one and the same protein sequence, while the residues in two different chains do not have this restriction. Proteins often form protein complexes when they perform their function in vivo. A protein complex is defined as a structure built up of two or more chains. Complexes built up of two or more chains with identical sequences are called homodimers or homomultimers. To get an overview of protein structures and to study them better, it is useful to classify them into families or similar groupings. SCOP and CATH are two databases that focus on this task. Use the textbook, the provided websites, and the know-how from the lectures to try to answer the following questions.
There is a research paper on SCOP that might be useful.
4. Search for Azurin in the SCOP database. Select the top hit. What class, fold and family does it belong to?
5. Search for Azurin in the CATH database. Select the top hit. What class, architecture and topology does it belong to?
6. Search for Azurin in the ECOD database. Select the top hit. What architecture, H-group and topology does it belong to?
7. To make their web services more user-friendly, ECOD, SCOP and CATH have divided their protein folds into classes. Which classes do they have?
8. Which of the three databases is the most up to date?
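To make the notion of a protein chain from this section concrete, the sketch below counts the residues in each chain of a PDB-format file. It is a minimal Python example that assumes the standard fixed-column layout of ATOM records (chain identifier in column 22, residue number plus insertion code in columns 23-27). The function name is hypothetical and the coordinates in the example are made up for illustration.

```python
def residues_per_chain(pdb_text: str) -> dict:
    """Count distinct residues per chain ID using the ATOM records of PDB-format text."""
    chains = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or len(line) < 27:
            continue  # skip HETATM, headers and truncated lines
        chain_id = line[21]        # column 22: chain identifier
        residue_key = line[22:27]  # columns 23-27: residue number + insertion code
        chains.setdefault(chain_id, set()).add(residue_key)
    return {chain: len(residues) for chain, residues in chains.items()}


# Hypothetical two-chain fragment; the coordinates are invented.
EXAMPLE_PDB = """\
ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N
ATOM      2  CA  ALA A   1      11.639   6.071  -5.147  1.00  0.00           C
ATOM      3  N   GLY A   2      10.200   7.300  -4.100  1.00  0.00           N
ATOM      4  CA  GLY A   2       9.850   8.120  -2.950  1.00  0.00           C
ATOM      5  N   SER B   1      20.310   1.250   3.400  1.00  0.00           N
"""

if __name__ == "__main__":
    for chain, count in residues_per_chain(EXAMPLE_PDB).items():
        print(f"chain {chain}: {count} residue(s)")
```

A structure that yields two or more chain identifiers in this way is a protein complex in the sense defined above.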
Fold Recognition (Threading)
Fold recognition (threading) is a method used to predict the structure of a sequence with an unknown structure using a known structure. In general terms, the method is called threading because it involves a procedure in which the query sequence is threaded, one residue at a time, onto the template structure. There are different implementations of the method: straight sequence methods (like PSI-BLAST); sequence methods combined with structural information (such as predicted and known secondary structure, structural environments, and structural alignments); and “true” threading approaches. Generally, the method includes scanning the query sequence against a library of known protein domains. The method generally works better when the query sequence is a single protein domain, because there is currently no reliable method for splitting a multi-domain sequence into single-domain sequences based solely on sequence information. Therefore, for this practical, we will use a single-domain sequence, shown below.
CW_binding_2
MRSYIKVLTMCFLGLILFVPTALADNSVKRVGGSNRYGTAVQISKQMYSTASTAVIVGGS
SYADAISAAPLAYQKNAPLLYTNSDKLSYETKTRLKEMQTKNVIIVGGTPAVSSNTANQI
KSLGISIKRIAGSNRYDTAARVAKAMGATSKAVILNGFLYADAPAVIPYAAKNGYPILFT
NKTSINSATTSVIKDKGISSTVVVGGTGSISNTVYNKLPSPTRISGSNRYELAANIVQKL
NLSTSTVYVSNGFSYPDSIAGATLAAKKKQSLILTNGENLSTGARKIIGSKNMSNFMIIG
NTPAVSTKVANQLKNPVVGETIFIDPGHGDQDSGAIGNGLLEKEVNLDIAKRVNTKLNAS
GALPVLSRSNDTFYSLQERVNKAASAQADLFLSIHANANDSSSPNGSETYYDTTYQAANS
KRLAEQIQPKLAANLGTRDRGVKTAAFYVIKYSKMPSVLVETAFITNASDASKLKQAVYK
DKAAQAIHDGTVSYYR
Please click on the sequence’s link and have a look at its Pfam page. In the top right corner of the page, you might notice that the domain family CW_binding_2 has 0 known structures, which makes it a superb query for 3D structure prediction. The Weizmann website has a large collection of links to web-based tools for 3D structure prediction. Please have a look at it. As you can see, there are different types of tools; they use different strategies and have different specializations. Please use the course material and web material to try to answer the following questions.
9. Give a short description of how fold recognition (threading) generally works.
10. What is a template structure, and how does one generally find it?
11. What do you think is a meta-server?
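The scanning step described in this section, where a query is scored against a library of known folds, can be illustrated with a deliberately simplified Python sketch. It assumes that each template is reduced to a string of structural environments ("B" for buried, "E" for exposed), that hydrophobic residues fit buried positions and polar residues fit exposed ones, and that the query is placed without gaps. Real threading programs use statistically derived potentials and gapped alignments, so this shows only the idea. All names, the residue classes and the template strings are hypothetical.

```python
# Toy residue classes; a real method would use statistically derived potentials.
HYDROPHOBIC = set("AVILMFWC")


def position_score(residue: str, environment: str) -> int:
    """+1 if the residue suits the environment ('B' buried, 'E' exposed), else -1."""
    is_hydrophobic = residue in HYDROPHOBIC
    if environment == "B":
        return 1 if is_hydrophobic else -1
    return -1 if is_hydrophobic else 1


def best_gapless_fit(query: str, template_env: str):
    """Slide the query along the template; return (score, offset) of the best
    gapless placement, or None if the query is longer than the template."""
    best = None
    for offset in range(len(template_env) - len(query) + 1):
        score = sum(
            position_score(res, template_env[offset + i])
            for i, res in enumerate(query)
        )
        if best is None or score > best[0]:
            best = (score, offset)
    return best


def rank_templates(query: str, library: dict) -> list:
    """Score the query against every template and sort by descending score."""
    results = []
    for name, env in library.items():
        fit = best_gapless_fit(query, env)
        if fit is not None:
            results.append((name, fit[0], fit[1]))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical two-template library.
    library = {"template_1": "EEBEBEEE", "template_2": "BBBBBBBB"}
    for name, score, offset in rank_templates("VKLD", library):
        print(f"{name}: score {score} at offset {offset}")
```

The top-ranked entry plays the role of the template structure: the known fold whose environments best accommodate the query sequence.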
Homology modelling
Homology modelling is a method for creating 3D models of a query sequence based on a template structure. The strategy is to perform a fragment assembly, in other words, to find important structural fragments based on sequence alignments and then assemble them in 3D space. The Weizmann website has a large collection of links to web-based tools for 3D structure prediction and lists several tools under “Automated homology modelling server”. One of these tools is SWISS-MODEL. An advantage of using the auto mode of an automated homology modelling server is that the server selects the template structure for the query sequence. Try to answer the following questions.
12. Which methods did Pcons.net use to perform the homology modelling? (Hint: see the paper “Pcons.net: protein structure prediction meta server”.)
13. Give a short description of how homology modelling generally works.
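The role of the alignment in homology modelling can be illustrated with a small Python sketch. It assumes a pairwise query-template alignment (gaps written as "-") and one coordinate per template residue, for example the C-alpha atom, and simply copies each template coordinate to the query residue aligned with it. Query residues aligned to a gap receive no coordinate; in a real tool such as MODELLER or SWISS-MODEL these regions are built by loop modelling, followed by side-chain placement and refinement, none of which is shown here. The function name and the data are hypothetical.

```python
def transfer_coordinates(aligned_query: str, aligned_template: str, template_coords: list) -> list:
    """Return one (residue, coordinate_or_None) pair per query residue,
    copying coordinates from the template residues aligned to them."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    n_template = sum(1 for t in aligned_template if t != "-")
    if n_template != len(template_coords):
        raise ValueError("need exactly one coordinate per template residue")

    model = []
    t_index = 0  # position in template_coords
    for q, t in zip(aligned_query, aligned_template):
        coord = None
        if t != "-":
            if q != "-":
                coord = template_coords[t_index]  # aligned pair: copy coordinate
            t_index += 1
        if q != "-":
            model.append((q, coord))  # None marks a residue left unmodelled
    return model


if __name__ == "__main__":
    # Hypothetical alignment and made-up C-alpha coordinates for template residues M, K, T, L.
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    for residue, coord in transfer_coordinates("MK-LV", "MKTL-", coords):
        print(residue, coord)
```

The sketch also shows why the quality of the alignment matters: a shifted alignment would assign every copied coordinate to the wrong query residue.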
You could use the homology model builder MODELLER directly by providing it with an alignment. Use the attached one (modeller.txt). The web server will ask you for a key; enter “MODELIRANJE”. The server has some technical issues, so you might need a few attempts to get the correct output format. When the output looked strange, what worked for me was to copy the URL of the output into a new tab and reload it.
14. Look at the "Quality by SOLVX" tab. What does it mean when the blue line is inside the red area?
15. Download the PDB file, open it in PyMOL, save an image and attach it to your report. (To save the molecule as an image, PyMOL needs to create a 2D rendering of the 3D model. This can be done by simply typing "ray" into the PyMOL command line in the structure window. Then you can save the image.)
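If you prefer a reproducible alternative to typing commands in the PyMOL window, the rendering steps can be collected in a PyMOL script. The Python sketch below only generates the text of such a script from the standard PyMOL commands load, hide, show, orient, ray and png. The file names, image size and function name are hypothetical, so adjust them to your own model.

```python
def pymol_render_script(pdb_path: str, image_path: str,
                        width: int = 1200, height: int = 900, dpi: int = 300) -> str:
    """Build the text of a PyMOL script that loads a model and saves a ray-traced image."""
    lines = [
        f"load {pdb_path}",               # read the model
        "hide everything",
        "show cartoon",                   # cartoon view of the backbone
        "orient",                         # centre the molecule in the view
        f"ray {width}, {height}",         # ray-trace at the requested size
        f"png {image_path}, dpi={dpi}",   # write the rendered image
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names.
    print(pymol_render_script("model.pdb", "model.png"), end="")
```

Saving the printed text as, for example, render.pml and running `pymol -cq render.pml` should produce the image without opening the graphical interface, assuming PyMOL is installed and available on the command line.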