Classical papers in Bioinformatics

Ph.D. course, Fall 2012,
Scilifelab, Stockholm. Arne Elofsson, 5 ECTS

Course description

The subject of the course is a set of classical articles that contain fundamental contributions to bioinformatics. This includes papers describing results or novel approaches that have influenced the thinking in bioinformatics, or recent papers that are considered particularly important.

Course format

All students must read the given articles before the meeting. Two students will be responsible for presenting the articles, in order to start the discussion about them. These two students are supposed to meet and prepare the presentation together. The two students should be responsible for a subject that is different from the field of their Ph.D. projects.

All students must prepare at least one question about the articles for the presenting students to consider. Send this question to the course organizer (Arne Elofsson), who will forward it to the presenting students, no later than the Monday before the meeting.

All students must attend every meeting. A student who is unable to attend a meeting must write a one-page summary (A4, in their own words) of the articles and send it to Arne Elofsson (arne(at)) no later than three weeks after the missed session. At most three sessions may be missed.

The senior scientist(s) will not lead the discussion, only participate. That’s why the senior scientist is called ‘sponsor’, not teacher. The presenting students are encouraged to contact the sponsor well in advance of the meeting to discuss the paper(s).

You will have to get copies of the articles yourself. In a few cases (some very old papers), we may provide paper copies or pdfs.

Format of the discussion

Here is a list of questions to discuss.

  1. What was novel about these papers?
  2. How have they changed the field?
  3. What has happened in the field since they were published?

Email List

The email list is accessible by sending an email to

Meeting date

The meetings are held weekly on Wednesdays, 15:00-18:00, in Kilsbergen on floor 4 at Scilifelab. There will be 14 meetings during the fall of 2012, starting on the 19th of September and running until Christmas.



Sponsors

  • Arne Elofsson
  • Bengt Persson
  • Erik Lindahl
  • Gunnar von Heijne
  • Jens Carlsson
  • Jens Lagergren
  • Lars Arvestad
  • Lukas Käll
  • Lukasz Humaniecki
  • Sara Light


Students

  1. Auwn Muhammad
  2. Carlos Talavera-Lopez
  3. Frances Vega
  4. Ikram Ullah
  5. Joel Sjöstrand
  6. Karolis Uziela
  7. Konstantinos Tsirigos
  8. Linus Östberg
  9. Luisa Hugerth
  10. M Owais Mahmudi
  11. Özge Yoluk
  12. Oxana Sachenkova
  13. Per Warholm
  14. Raja Hashim
  15. Ranganathan Anirudh
  16. Valentine Svensson
  17. Viktor Granholm
  18. Walter Basile
  19. Viveca Lindahl


Location: Kilsbergen, Floor 4 Scilifelab.


Date Topic Article(s) Students Sponsor
120919 Alignments and Homology detection
  • Needleman SB, Wunsch CD.
    A general method applicable to the search for similarities in
    the amino acid sequence of two proteins.
    J Mol Biol. 1970 Mar;48(3):443-53.
  • Smith TF, Waterman MS.
    Identification of common molecular subsequences.
    J Mol Biol. 1981 Mar 25;147(1):195-7.
  • Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ.
    Gapped BLAST and PSI-BLAST: a new generation of
    protein database search programs.
    Nucleic Acids Res. 1997 Sep 1;25(17):3389-402. Review.
  • Kent WJ. BLAT–the BLAST-like alignment tool.
    Genome Res. 2002 Apr;12(4):656-64.
  • Söding J.
    Protein homology detection by HMM-HMM comparison.
    Bioinformatics. 2005 Apr 1;21(7):951-60. Epub 2004 Nov 5. Erratum in: Bioinformatics. 2005 May 1;21(9):2144.
Students: Carlos Talavera-Lopez, Frances Vega
Sponsor: Arne Elofsson
120926 Phylogenetic inference
  • Fitch, W.M.
    Toward defining the course of evolution: minimum change for a
    specific tree topology.
    Systematic Zoology 20 (1971) 406-416.
  • Felsenstein, J. 1981. Evolutionary Trees From DNA
    Sequences: a Maximum Likelihood Approach.
    Journal of Molecular Evolution 17 (6): 368–376.
  • Huelsenbeck, John P, F Ronquist, R Nielsen, and J P Bollback. 2001. Bayesian Inference of Phylogeny and Its Impact on Evolutionary Biology. Science (New York, NY) 294 (5550) (December 14): 2310–2314. doi:10.1126/science.1065889.
  • Wong, Karen M, Marc A Suchard, and John P Huelsenbeck. 2008. Alignment Uncertainty and Genomic Analysis. Science (New York, NY) 319 (5862) (January 25): 473–476. doi:10.1126/science.1151532.
  • Redelings, Benjamin, and Marc Suchard. 2005. Joint Bayesian Estimation of Alignment and Phylogeny. Systematic Biology 54 (3) (June 1): 401–418. doi:10.1080/10635150590947041
Students: Linus Östberg, Luisa Hugerth, Valentine Svensson
Sponsor: Lars Arvestad
121003 Signal peptides and TM proteins
  • Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites.
    Nielsen H, Engelbrecht J, Brunak S, von Heijne G.
    Protein Eng. 1997 Jan;10(1):1-6.
  • A new method for predicting signal sequence cleavage sites.
    von Heijne G.
    Nucleic Acids Res. 1986 Jun 11;14(11):4683-90.
  • Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.
    Krogh A, Larsson B, von Heijne G, Sonnhammer EL.
    J Mol Biol. 2001 Jan 19;305(3):567-80.
  • Membrane protein structure prediction. Hydrophobicity analysis and the positive-inside rule.
    von Heijne G.
    J Mol Biol. 1992 May 20;225(2):487-94.
Students: Ikram Ullah, Joel Sjöstrand, Oxana Sachenkova
Sponsor: Gunnar von Heijne
121010 Networks, function predictions and interactions
  • Rhodes DR, Tomlins SA, Varambally S, Mahavisno V, Barrette T, Kalyana-Sundaram S, Ghosh D, Pandey A, Chinnaiyan AM.
    Probabilistic model of the human protein-protein interaction network.
    Nat Biotechnol. 2005 Aug;23(8):951-9.
  • Lee I, Lehner B, Crombie C, Wong W, Fraser AG, Marcotte EM.
    A single gene network accurately predicts phenotypic effects of gene perturbation in Caenorhabditis elegans.
    Nat. Genet. 2008, 40:181-8
  • Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabasi AL.
    The human disease network.
    Proc Natl Acad Sci U S A. 2007 104:8685-90
Students: Karolis Uziela, Per Warholm, Raja Hashim
Sponsor: Bengt Persson
121017 Secondary structure predictions
  • Chou PY, Fasman GD.
    Conformational parameters for amino acids in helical,
    beta-sheet, and random coil regions calculated from proteins.
    Biochemistry. 1974 Jan 15;13(2):211-22.
  • Chou PY, Fasman GD.
    Prediction of protein conformation.
    Biochemistry. 1974 Jan 15;13(2):222-45.
  • Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins.
    Garnier J, Osguthorpe DJ, Robson B.
    J Mol Biol. 1978 Mar 25;120(1):97-120.
  • Rost B, Sander C.
    Prediction of protein secondary structure at better than 70% accuracy.
    J Mol Biol. 1993 Jul 20;232(2):584-99.
Students: Viveca Lindahl, Ranganathan Anirudh, Viktor Granholm
Sponsor: Arne Elofsson
121024 Hidden Markov Models
  • Hidden Markov models in computational biology. Applications to protein modeling.
    Krogh A, Brown M, Mian IS, Sjölander K, Haussler D.
    J Mol Biol. 1994 Feb 4;235(5):1501-31.
  • A probabilistic model of local sequence alignment that simplifies
    statistical significance estimation.
    Eddy SR.
    PLoS Comput Biol. 2008 May 30;4(5):e1000069.
  • A tutorial on hidden Markov models and selected applications in speech recognition.
    Lawrence R Rabiner.
Students: Raja Hashim, Walter Basile, Auwn Muhammad
Sponsor: Lukas Käll
121031 Expression analysis
  • A gene atlas of the mouse and human protein-encoding transcriptomes.
    Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J,
    Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB.
    Proc Natl Acad Sci U S A. 2004 Apr 20;101(16):6062-7. Epub 2004 Apr 9.
  • Schena M, Shalon D, Davis RW, Brown PO.
    Quantitative monitoring of gene expression patterns with a complementary DNA microarray.
    Science. 1995 Oct 20;270(5235):467-70.
  • Eisen MB, Spellman PT, Brown PO, Botstein D.
    Cluster analysis and display of genome-wide expression patterns.
    Proc Natl Acad Sci U S A. 1998 Dec 8;95(25):14863-8.
  • Quackenbush J.
    Computational analysis of microarray data.
    Nat Rev Genet. 2001 Jun;2(6):418-27.
Students: Ranganathan Anirudh, Auwn Muhammad, Frances Vega
Sponsor: Lukasz Humaniecki
121107 Genome Assembly and gene annotation
  • Sanger F, Nicklen S, Coulson AR.
    DNA sequencing with chain-terminating inhibitors.
    Proc Natl Acad Sci U S A. 1977 Dec;74(12):5463-7.
  • Ronaghi M, Uhlén M, Nyrén P.
    A sequencing method based on real-time pyrophosphate.
    Science. 1998 Jul 17;281(5375):363, 365.
  • Lukashin AV, Borodovsky M.
    GeneMark.hmm: new solutions for gene finding.
    Nucleic Acids Res. 1998 Feb 15;26(4):1107-15.
  • Anson E & Myers G
    Algorithms for whole genome shotgun sequencing
    Annual Conference on Research in Computational Molecular Biology, 1999
  • Myers EW, Sutton GG, Delcher AL, Dew IM, Fasulo DP, Flanigan MJ, Kravitz SA, Mobarry CM, Reinert KH, Remington KA, Anson EL, Bolanos RA, Chou HH, Jordan CM, Halpern AL, Lonardi S, Beasley EM, Brandon RC, Chen L, Dunn PJ, Lai Z, Liang Y, Nusskern DR, Zhan M, Zhang Q, Zheng X, Rubin GM, Adams MD, Venter JC.
    A whole-genome assembly of Drosophila.
    Science. 2000 Mar 24;287(5461):2196-204.
Students: Oxana Sachenkova, Özge Yoluk, Carlos Talavera-Lopez
Sponsor: Arne Elofsson
121114 Mapping
  • Li, H., & Durbin, R. (2009). Fast and accurate short read
    alignment with Burrows-Wheeler transform. Bioinformatics, 25(14),
  • Yang Li, Hong-Mei Li, Paul Burns, Mark Borodovsky, Gene E. Robinson,
    Jian Ma. TrueSight: Self-training Algorithm for Splice Junction
    Detection Using RNA-seq.
    Research in Computational Molecular Biology,
    Lecture Notes in Computer Science Volume 7262, 2012, pp 163-164
  • Trapnell, C., Williams, B., Pertea, G., Mortazavi, A., Kwan, G., Van
    Baren, M., Salzberg, S., et al. (2010). Transcript assembly and
    quantification by RNA-Seq reveals unannotated transcripts and
    isoform switching during cell differentiation.
    Nature Biotechnology,
    28(5), 511–515.
Students: Konstantinos Tsirigos, Walter Basile, Per Warholm
Sponsor: Jens Lagergren
121121 Protein design
  • Koga N, Tatsumi-Koga R, Liu G, Xiao R, Acton TB, Montelione GT,
    Baker D,
    Principles for designing ideal protein structures, Nature 491, 222-227 (2012)
  • Yin H, Slusky JS, Berger BW, Walters RS, Vilaire G, Litvinov RI,
    Lear JD, Caputo GA, Bennett JS, Degrado WF,
    Computational design of peptides that target transmembrane helices, Science 315, 1817-1822 (2007)
  • Dahiyat BI, Mayo SL. De novo protein design: fully automated sequence selection. Science. 1997 Oct 3;278(5335):82-7.
Students: Joel Sjöstrand, Ikram Ullah
Sponsor: Erik Lindahl
121128 Protein Structure prediction
  • Jones DT, Taylor WR, Thornton JM.
    A new approach to protein fold recognition.
    Nature. 1992 Jul 2;358(6381):86-9.
  • Sali A, Blundell TL. Comparative protein modelling by satisfaction of spatial restraints.
    J Mol Biol. 1993 Dec 5;234(3):779-815.
  • Bradley P, Misura KM, Baker D.
    Toward high-resolution de novo structure prediction for small proteins.
    Science. 2005 Sep 16;309(5742):1868-71.
  • Marks DS, Colwell LJ, Sheridan R, Hopf TA, Pagnani A, Zecchina
    R, Sander C. Protein 3D structure computed from evolutionary sequence variation.
    PLoS One. 2011;6(12):e28766.
Students: Luisa Hugerth, M Owais Mahmudi
Sponsor: Arne Elofsson
121205 Molecular docking: Methods for fast scoring of protein-ligand complexes
  • Kuntz, ID; Blaney, JM; Oatley, SJ; et al.
    A geometric approach to macromolecule-ligand interactions
    Journal of Molecular Biology Volume: 161 Issue: 2 Pages: 269-288
  • Meng, EC; Shoichet, BK; Kuntz, ID
    Automated docking with grid-based energy evaluation
    Journal of Computational Chemistry Volume: 13 Issue: 4 Pages: 505-524
    DOI: 10.1002/jcc.540130412 Published: MAY 1992
  • Böhm, HJ
    The development of a simple empirical scoring function to
    estimate the binding constant for a protein ligand complex of known
    3-dimensional structure
    Journal of Computer-Aided Molecular Design Volume: 8 Issue: 3 Pages: 243-256
    DOI: 10.1007/BF00126743 Published: JUN 1994
  • Muegge, I; Martin, YC
    A general and fast scoring function for protein-ligand interactions: A simplified potential approach
    Journal of Medicinal Chemistry Volume: 42 Issue: 5 Pages: 791-804
    DOI: 10.1021/jm980536j Published: MAR 11 1999
  • Warren, Gregory L.; Andrews, C. Webster; Capelli, Anna-Maria; et al.
    A critical assessment of docking programs and scoring functions
    Journal of Medicinal Chemistry Volume: 49 Issue: 20 Pages: 5912-5931
    DOI: 10.1021/jm050362n Published: OCT 5 2006
Students: Valentine Svensson, Viktor Granholm, Konstantinos Tsirigos
Sponsor: Jens Carlsson
121212 Protein Domains
  • Sonnhammer EL, Eddy SR, Durbin R. Pfam: a comprehensive
    database of protein domain families based on seed alignments.
    Proteins. 1997 Jul;28(3):405-20.
  • Gordana Apic, Julian Gough, Sarah A Teichmann, Domain combinations in
    archaeal, eubacterial and eukaryotic proteomes,
    Journal of Molecular Biology, Volume 310, Issue 2, 6 July 2001, Pages 311-325
  • Gabrielle A. Reeves, Timothy J. Dallman, Oliver C. Redfern, Adrian
    Akpor, Christine A. Orengo, Structural Diversity of Domain
    Superfamilies in the CATH Database
    Journal of Molecular Biology,
    Volume 360, Issue 3, 14 July 2006, Pages 725-741
  • Levy ED, Boeri Erba E, Robinson CV, Teichmann SA. Assembly reflects
    evolution of protein complexes.
    Nature. 2008 Jun 26;453(7199):1262-5.
    Epub 2008 Jun 18.
Students: Linus Östberg, Özge Yoluk
Sponsor: Sara Light
121219 Molecular Dynamics and computational chemistry
  • Chothia C, Levitt M, Richardson D.
    Structure of proteins: Packing of alpha-helices and pleated sheets
    Proc Natl Acad Sci U S A. 1977 Oct;74(10):4130-4.
  • McCammon JA, Gelin BR, Karplus M.
    Dynamics of folded proteins.
    Nature. 1977 Jun 16;267(5612):585-90.
  • Levinthal C.
    Molecular model-building by computer.
    Sci Am. 1966 Jun;214(6):42-52.
Students: M Owais Mahmudi, Karolis Uziela
Sponsor: Erik Lindahl