Python: Data structures
Session 4: Data structures
Content
- List comprehension
- dict
- set
Reading
Chapter 5 (Data Structures) of the Python tutorial: https://docs.python.org/3/tutorial/
When you need to make a new kind of plot, a good way to learn how is to look at the matplotlib gallery (http://matplotlib.org/gallery.html).
Grading
The lab reports are training in writing scientific texts and an important part of the course. To pass, you need to submit a final version of the lab report within 7 days after the lab. You may, but are not required to, submit a preliminary version and receive feedback on it. The preliminary version must be submitted within 72 hours of the lab, and feedback will be provided at the latest 24 hours before the deadline of the assignment. If the final report is not submitted in time or contains an error, you will get an Fx on the lab course. This means that you have to submit an updated lab report within 7 days after the exam, and that you will receive an E on the entire course. If this does not happen, you will have to re-register for the course the next time it is given (normally the following year) and complete the missing parts then; even so, you cannot receive a grade higher than E.
Exercises
Only the mandatory assignments should be handed in. The file name should be yourname_yoursurname_sess4.py; leave out accents and special characters. Before submitting, please make sure your code is correct, then send it to david.menendez.hurtado@scilifelab.se. You can find the template and necessary attachments here: https://gist.github.com/Dapid/84c98d14f362a6d0d57956e225ee7d37
Don’t hesitate to ask us to clarify if you have any questions.
Introductory Assignments
- What is the difference between a list and a tuple?
- Given two lists [1, 3, 6, 78, 35, 55] and [12, 24, 35, 24, 88, 120, 155], find the common elements.
- Write a program that reads a list of words from the keyboard and prints only the ones that appear exactly once.
- Define a function that takes a string and counts the number of times each letter appears. Decide on the best way to store this information.
- Given a list of numbers, use list comprehensions to:
  - square each number.
  - square every odd number and remove the even ones.
  - square every even number, leaving the odd ones unchanged.
- Compute pi using Leibniz’s formula: π/4 = 1 − 1/3 + 1/5 − 1/7 + 1/9 − … To improve accuracy, sort the terms so that you sum the smallest ones first. How many terms do you need to get six decimal places? You can compare your values with math.pi.
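For the set and dictionary exercises above, here is a minimal sketch of one possible approach. The function names are hypothetical, and it assumes the words have already been split into a list (`input().split()` would produce one from keyboard input). Letters are counted exactly as given, so upper and lower case are kept apart.

```python
from collections import Counter


def common_elements(a, b):
    """Return the elements present in both lists, via set intersection."""
    return set(a) & set(b)


def words_seen_once(words):
    """Return the words that occur exactly once, in their original order."""
    counts = Counter(words)  # Counter is a dict subclass: word -> count
    return [w for w in words if counts[w] == 1]


def letter_counts(text):
    """Count how often each letter appears, storing the result in a plain dict."""
    counts = {}
    for char in text:
        if char.isalpha():  # skip spaces, digits and punctuation
            counts[char] = counts.get(char, 0) + 1
    return counts


print(common_elements([1, 3, 6, 78, 35, 55], [12, 24, 35, 24, 88, 120, 155]))  # {35}
print(words_seen_once(["cat", "dog", "cat", "fish"]))  # ['dog', 'fish']
print(letter_counts("banana"))  # {'b': 1, 'a': 3, 'n': 2}
```

A dict keyed by letter is a natural fit for the counting exercise: lookups and updates are fast, and only letters that actually occur take up space.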
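The three list comprehensions can be sketched as follows, using an arbitrary example list. The key distinction is where the condition goes: a trailing `if` filters elements out, while an `x if cond else y` expression before `for` transforms every element and drops none.

```python
numbers = [1, 2, 3, 4, 5, 6]  # arbitrary example input

# Square each number.
squares = [n ** 2 for n in numbers]

# Square the odd numbers; the trailing filter removes the even ones.
odd_squares = [n ** 2 for n in numbers if n % 2 == 1]

# Square the even numbers and keep the odd ones unchanged
# (a conditional expression, not a filter, so the length is preserved).
even_squared = [n ** 2 if n % 2 == 0 else n for n in numbers]

print(squares)       # [1, 4, 9, 16, 25, 36]
print(odd_squares)   # [1, 9, 25]
print(even_squared)  # [1, 4, 3, 16, 5, 36]
```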
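A sketch of the Leibniz exercise, using a hypothetical helper `leibniz_pi(n_terms)` that sums the first `n_terms` terms of the series. Sorting by absolute value puts the smallest terms first, as the exercise asks; `math.pi` is used only for comparison.

```python
import math


def leibniz_pi(n_terms):
    """Approximate pi from the first n_terms of Leibniz's series.

    The terms are summed smallest-first (by absolute value) to reduce
    floating-point rounding error.
    """
    terms = [(-1) ** k / (2 * k + 1) for k in range(n_terms)]
    terms.sort(key=abs)  # smallest magnitudes first
    return 4 * sum(terms)


approx = leibniz_pi(1000)
print(approx, abs(approx - math.pi))
```

To find how many terms are needed, you could increase `n_terms` until `abs(leibniz_pi(n_terms) - math.pi)` drops below 5e-7. Because the series alternates, the error after n terms is at most 4/(2n + 1), so expect to need a very large number of terms.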
Mandatory Assignments
- Write a FASTA parser that stores the records in a dictionary, where the keys are the ids and the values are the sequences. Remember, the > does not belong to the id.
- Two labs, A and B, have performed experiments measuring gene expression for 500 genes each, and some of the genes overlap. Read both files (attached) and merge them into a single space-separated file, one gene per line:
GreC 1.2
GreD 0.8
For the genes that both labs have measured, save the average. The order is irrelevant.
- Same as before, but now save only the unique genes, that is, ignore the ones that were measured by both labs.
- Make a list of all the amino acids present in UniRef50. You can find the dataset on the lab computers at /common/courses/introduction_to_bioinformatics/uniprot/uniref50.fasta
Include every letter that is part of a sequence, but ignore the headers. Warning: the file is big, so you won’t be able to load it all into memory.
Save the result as a hard-coded list, as in the template.
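A minimal sketch of a FASTA parser. It takes any iterable of lines, so it works both with an open file and with a list of strings for testing. Two assumptions worth checking against your data: the id is taken as the first word of the header (the rest of the header is dropped), and a sequence may be wrapped over several lines.

```python
def parse_fasta(lines):
    """Parse FASTA records into a dict mapping each id to its sequence."""
    sequences = {}
    current_id = None
    chunks = []  # sequence lines of the record being read
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if current_id is not None:
                sequences[current_id] = "".join(chunks)  # store previous record
            words = line[1:].split()  # drop the '>' before taking the id
            current_id = words[0] if words else ""
            chunks = []
        else:
            chunks.append(line)
    if current_id is not None:
        sequences[current_id] = "".join(chunks)  # store the last record
    return sequences


example = [">seq1 some description\n", "ACGT\n", "TTGA\n", ">seq2\n", "MKV\n"]
print(parse_fasta(example))  # {'seq1': 'ACGTTTGA', 'seq2': 'MKV'}
```

For a real file you would pass the open handle: `with open(path) as handle: records = parse_fasta(handle)`.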
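A sketch of the two merging exercises, assuming each input file holds one `gene value` pair per line, like the output format shown. Reading into dicts makes both tasks short: a key lookup finds the shared genes for averaging, and the symmetric difference of the key views (`a.keys() ^ b.keys()`) gives the genes measured by only one lab. The function names and example values are made up for illustration.

```python
import sys


def read_expression(lines):
    """Read 'gene value' lines into a dict {gene: value}."""
    values = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            values[fields[0]] = float(fields[1])
    return values


def merge_average(a, b):
    """All genes from both labs; genes measured by both get the mean value."""
    merged = dict(a)
    for gene, value in b.items():
        merged[gene] = (a[gene] + value) / 2 if gene in a else value
    return merged


def merge_unique(a, b):
    """Only the genes measured by exactly one lab."""
    only_one = a.keys() ^ b.keys()  # symmetric difference of the key sets
    return {gene: a[gene] if gene in a else b[gene] for gene in only_one}


def write_expression(values, handle):
    """Write one space-separated 'gene value' pair per line."""
    for gene, value in values.items():
        handle.write(f"{gene} {value}\n")


lab_a = read_expression(["GreC 1.0\n", "GreD 0.5\n"])
lab_b = read_expression(["GreC 1.5\n", "GreE 2.0\n"])
write_expression(merge_average(lab_a, lab_b), sys.stdout)  # GreC 1.25, GreD 0.5, GreE 2.0
print(sorted(merge_unique(lab_a, lab_b).items()))  # [('GreD', 0.5), ('GreE', 2.0)]
```

With the real files you would pass open file handles to `read_expression`, and a file opened with `open(..., "w")` to `write_expression` instead of `sys.stdout`.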
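Because the file cannot be loaded all at once, one option is to iterate over it line by line and collect the letters in a set, which discards duplicates automatically. The function name is hypothetical and the example lines are made up.

```python
def sequence_letters(lines):
    """Collect the set of letters used in FASTA sequence lines, skipping headers.

    Only one line is held in memory at a time, so this also works on files
    that are far larger than the available RAM.
    """
    letters = set()
    for line in lines:
        if line.startswith(">"):
            continue  # header line: not part of any sequence
        letters.update(line.strip())  # adds each character of the line
    return letters


print(sorted(sequence_letters([">sp|X\n", "MKVL\n", "AKM\n"])))  # ['A', 'K', 'L', 'M', 'V']
```

On the lab computers you would pass the open UniRef50 file, e.g. `with open(path) as handle: letters = sequence_letters(handle)`, and then copy `sorted(letters)` into the template as the hard-coded list. Expect this to take a while on a file of that size.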
Additional assignments
- Use the previous parser to load a file and make a dictionary mapping ids to GC content.
- Sometimes we have functions that take a long time to run, but we call them again and again with the same arguments. In those cases it is convenient to store the computed values in a dictionary and look them up the next time. This is called memoisation. Implement a memoised version of the factorial. (Python has this built in as functools.lru_cache, which you should use if you actually need it, but try to implement it yourself.)
- Go back to the exercises from previous sessions. Is there anything you can simplify using what you have learned in this session? If so, implement it.
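A sketch of the GC-content exercise. It assumes the sequences are already in an `{id: sequence}` dict, as a FASTA parser like the one in the mandatory assignments would return, and defines GC content as the fraction of G and C characters over the full sequence length (case-insensitive). Other definitions, for example ignoring ambiguous bases such as N, are possible.

```python
def gc_content(sequence):
    """Fraction of G and C characters in a sequence (0.0 for an empty one)."""
    if not sequence:
        return 0.0
    seq = sequence.upper()  # treat lowercase bases the same as uppercase
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_by_id(records):
    """Map each id in an {id: sequence} dict to its GC content."""
    return {seq_id: gc_content(seq) for seq_id, seq in records.items()}


records = {"seq1": "ACGTTTGA", "seq2": "GGCC"}  # made-up example records
print(gc_by_id(records))  # {'seq1': 0.375, 'seq2': 1.0}
```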
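One way to memoise the factorial is sketched below, with a module-level dict as the cache (the names `memo_factorial` and `_factorial_cache` are hypothetical). Instead of caching each recursive call, it extends the cache iteratively from the largest known value, which also avoids Python's recursion limit for large n. The `functools.lru_cache` version is included only for comparison.

```python
from functools import lru_cache

# Cache of known results, n -> n!. It always holds every n from 0 up to the
# largest value computed so far, so it can be extended one step at a time.
_factorial_cache = {0: 1}


def memo_factorial(n):
    """Factorial with memoisation: results are stored in and read from a dict."""
    if n < 0:
        raise ValueError("factorial is not defined for negative numbers")
    if n not in _factorial_cache:
        largest = len(_factorial_cache) - 1  # keys are exactly 0..largest
        for k in range(largest + 1, n + 1):
            _factorial_cache[k] = k * _factorial_cache[k - 1]
    return _factorial_cache[n]


# For comparison: the standard-library decorator caches a recursive definition.
@lru_cache(maxsize=None)
def cached_factorial(n):
    return 1 if n == 0 else n * cached_factorial(n - 1)


print(memo_factorial(10))    # 3628800
print(cached_factorial(10))  # 3628800
```

After the first call, `memo_factorial(k)` for any k up to 10 is a single dict lookup.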
Template for the assignments: https://gist.github.com/Dapid/84c98d14f362a6d0d57956e225ee7d37