For more than a decade, gene sequencers have been improving more rapidly than the computers required to make sense of their outputs. Searching for DNA sequences in existing genomic databases can already take hours, and the problem is likely to get worse. Recently, Bonnie Berger’s group at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has been investigating techniques to make biological and chemical data easier to analyze by, in some sense, compressing it.
In the latest issue of the journal Cell Systems, Berger and colleagues (first authors Noah Daniels, a postdoc in her group, and William Yu, a graduate student in applied mathematics, along with David Danko, an undergraduate majoring in computational biology) present a theoretical analysis that demonstrates why their previous compression schemes have been so successful. They identify properties of data sets that make them amenable to compression and present an algorithm for determining whether a given data set has those properties. They also show that several existing databases of chemical compounds and biological molecules do indeed exhibit them. Read the full story at MIT News.