Course Resources


Administrative

Syllabus

Gradescope

Ed Discussion

Anonymous Feedback Form


Slides

Whole Genome Shotgun Sequencing

Idury-Waterman Algorithm (overview)

Idury-Waterman Algorithm (details + example)

Velvet Algorithm

HMMs: The Learning Problem

Maximum Likelihood & EM Algorithm

Clustering Theory

k-Means Clustering Algorithm


COVID-19

Coronavirus: Nobel Prize winner predicts US will get through crisis sooner than expected (The Independent, March 24, 2020)

Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation (Wrapp et al., March 2020)

Genetic Analysis of the COVID-19 Virus and Other Pathogens (Scherer, April 2020)

Phylogenetic network analysis of SARS-CoV-2 genomes (Forster et al., April 2020)

Homologous protein domains in SARS-CoV-2 and measles, mumps and rubella viruses: preliminary evidence that MMR vaccine might provide protection against COVID-19 (Franklin et al., April 2020) (preprint)


Other

How to Succeed in Science (Lecture given at Brown by Jonathan Yewdell, 2012)


Supplementary Readings


Chapter 1: BLAST and Karlin-Altschul Statistics

A Model of Evolutionary Change in Proteins (Dayhoff et al., 1976)

Identification of Common Molecular Subsequences (Smith and Waterman, 1981)

Viral src gene products are related to the catalytic chain of mammalian cAMP-dependent protein kinase (Barker and Dayhoff, 1982)

Basic Local Alignment Search Tool (Altschul et al., 1990)

Amino Acid Substitution Matrices from an Information Theoretic Perspective (Altschul, 1991)

Applications and statistics for multiple high-scoring segments in molecular sequences (Karlin and Altschul, 1993)

On-Line Construction of Suffix Trees (Ukkonen, 1995)

BLAT—The BLAST-Like Alignment Tool (Kent, 2002)

BLAST Program Selection Guide (NCBI, 2009)


Chapter 2: Genome Assembly and Lander-Waterman Statistics

Genomic Mapping by Fingerprinting Random Clones: A Mathematical Analysis (Lander and Waterman, 1988)

A New Algorithm for DNA Sequence Assembly (Idury and Waterman, 1995)

The Sequence of the Human Genome (Venter et al., 2001)

An Eulerian path approach to DNA fragment assembly (Pevzner et al., 2001)

Whole-genome shotgun assembly and comparison of human genome assemblies (Istrail et al., 2004)

Velvet: Algorithms for de novo short read assembly using de Bruijn graphs (Zerbino and Birney, 2008)

How to apply de Bruijn graphs to genome assembly (Compeau et al., 2011)

Why are de Bruijn graphs useful for genome assembly? (Compeau et al., 2011)

Evaluation of the impact of Illumina error correction tools on de novo genome assembly (Heydari et al., 2017)

Human contamination in bacterial genomes has created thousands of spurious proteins (Breitwieser et al., 2019)

Scalable Genome Assembly through Parallel de Bruijn Graph Construction for Multiple k-mers (Mahadik et al., 2019)


Chapter 3: Coalescent Theory and Ancestral Recombination Graphs

Genealogical Trees, Coalescent Theory and the Analysis of Genetic Polymorphisms (Rosenberg and Nordborg, 2002)

Comparative immunopeptidomics of humans and their pathogens (Istrail et al., 2004)

Coalescent-Based Association Mapping and Fine Mapping of Complex Trait Loci (Zöllner and Pritchard, 2005)

Mapping Trait Loci by Use of Inferred Ancestral Recombination Graphs (Minichiello and Durbin, 2006)

Approximating the coalescent with recombination (McVean and Cardin, 2005)

Genome-Wide Inference of Ancestral Recombination Graphs (Rasmussen et al., 2014)

Developments in coalescent theory from single loci to chromosomes (Wakeley, 2019)


Chapter 4: Hidden Markov Models: The Learning Problem

Maximum Likelihood from Incomplete Data via the EM Algorithm (Dempster, Laird and Rubin, 1977)

CpG Islands in Vertebrate Genomes (Gardiner-Garden and Frommer, 1987)

A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition (Rabiner, 1989)

Maximum-Likelihood Estimation of Molecular Haplotype Frequencies in a Diploid Population (Excoffier and Slatkin, 1995)

A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models (Bilmes, 1998)

What is a hidden Markov model? (Eddy, 2004)

Identification of CpG islands in DNA sequences using statistically optimal null filters (Kakumani et al., 2012)


Chapter 5: Clustering Theory and Spectral Clustering

Gene Expression Clustering with Functional Mixture Models (Chudova et al., 2003)

Incremental genetic K-means algorithm and its application in gene expression data analysis (Lu et al., 2004)

A Tutorial on Spectral Clustering (von Luxburg, 2007)

The EM Algorithm and the Rise of Computational Biology (Fan et al., 2010)

Spectral Clustering Based Classification Algorithm for Text Classification (Suganthi and Manimekalai, 2018)

Spectral clustering via half-quadratic optimization (Zhu et al., 2019)


Chapter 6: Protein Folding

The Protein Folding Problem (Dill, 2008)