In everyday life, we encounter DNA through forensic crime dramas and heavily advertised ancestry kits. In these settings, DNA is presented as the genetic fingerprint that defines our unique traits, both as humans and as biological organisms.
Even if we appreciate that DNA is a form of biological software code, treating it as a potential rival to the silicon chip may seem like a category error: what has our genome got to do with computation?
To exploit the affordances of any medium, there must be a toolset to encode, decode, store, and compress its information. DNA is no exception. Biomechanical in nature, the information processes that have evolved within our cells are fiendishly complex.
As science develops more robust ways of interacting with DNA, its potential advantages and efficiencies (optimised by nature itself) become obvious: extreme durability, information density, radically parallel forms of information processing, or the more prosaic cost advantages of exploiting the chemistry of, say, a potato (whose genome the Potato Council’s Potato Genome Sequencing Consortium puts at an estimated 727 Mb, i.e. million bases).[1]
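To translate the potato figure into computing units, here is a back-of-the-envelope sketch in Python. It assumes that "727 Mb" counts million bases and that each base carries the theoretical maximum of two bits (log2 of four symbols); the complementary strand is not counted, since it repeats the same information.

```python
import math

# Assumption: each of the four bases (A, C, G, T) carries log2(4) = 2 bits.
# Real genomes are not random data, so this is an upper bound on raw capacity.
BITS_PER_BASE = math.log2(4)  # 2.0 bits for a four-letter alphabet


def genome_capacity_bytes(n_bases: int) -> float:
    """Raw capacity in bytes of a single strand of n_bases nucleotides."""
    return n_bases * BITS_PER_BASE / 8


potato_bases = 727_000_000  # "727 Mb", read as 727 million bases
capacity = genome_capacity_bytes(potato_bases)
print(f"{capacity / 1e6:.2f} MB")  # 181.75 MB
```

On these assumptions the result is about 182 MB, an upper bound on raw capacity rather than a measure of how much biologically meaningful information the genome contains.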
However, the science is not only developing rapidly; it is also being commercialised. At the end of 2023, the French startup Biomemory unveiled its proof-of-concept DNA device: a memory card that stores 1 kB of data in DNA.[2]
This may seem unremarkable, given that commercially available portable drives usually hold tens to hundreds of GB, but it marks an important milestone on the road to usable DNA-based biocomputers. To appreciate the computational potential of DNA, one must first unlock its potential as a data storage medium, and then move towards operationalising its unique ability to process and edit the information it stores.
In biology, and in discussions of life in general, DNA has become the bedrock of scientific discourse: to talk about living matter is to talk about matter capable of self-reproduction through the replication of genetic information. Storing and processing genes thus lies at the very core of terrestrial biology. Viruses aside, the genetic information of organisms is encoded in organic molecules known as nucleotides; in eukaryotic cells (those of animals, plants, and fungi, among others) it is stored mainly in a compartment called the nucleus, whereas bacteria and archaea, which lack a nucleus, keep it in the cytoplasm.
As heirs of the Hindu-Arabic numeral system, we are used to counting with ten digits. We also often encounter the binary 0 and 1 of digital computation. DNA, however, is quaternary (base four), which sounds much less intuitive, especially since its standard symbols are letters taken from the initials of its chemical bases. Although DNA computing experiments do not often map these symbols to numerical values (0, 1, 2, 3), thinking of them as digits brings us closer to understanding how they work together.
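Treating the four bases as base-4 digits makes the link to binary concrete: each base holds two bits, so one byte fits in four bases. The sketch below assumes a hypothetical mapping A=0, C=1, G=2, T=3; real DNA storage schemes typically add encoding constraints and error correction that are omitted here.

```python
# Hypothetical digit assignment, chosen only for illustration.
BASE_TO_DIGIT = {"A": 0, "C": 1, "G": 2, "T": 3}
DIGIT_TO_BASE = {d: b for b, d in BASE_TO_DIGIT.items()}


def bytes_to_bases(data: bytes) -> str:
    """Write each byte as four base-4 digits (4 bases x 2 bits = 8 bits)."""
    out = []
    for byte in data:
        # Most significant base-4 digit first.
        for shift in (6, 4, 2, 0):
            out.append(DIGIT_TO_BASE[(byte >> shift) & 0b11])
    return "".join(out)


def bases_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_bases by reading the sequence four bases at a time."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for base in seq[i:i + 4]:
            value = value * 4 + BASE_TO_DIGIT[base]
        out.append(value)
    return bytes(out)


encoded = bytes_to_bases(b"Hi")
print(encoded)                  # CAGACGGC
print(bases_to_bytes(encoded))  # b'Hi'
```

The letter "H" (byte value 72) becomes the base-4 number 1020, i.e. C-A-G-A, which shows how a quaternary alphabet stores binary data at two bits per symbol.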
Nucleotides are the building blocks of two kinds of chains: ribonucleic and deoxyribonucleic acid, i.e. RNA and DNA. RNA is built from sequences of four nucleotides: Adenine (A), Cytosine (C), Guanine (G), and Uracil (U). Because RNA usually forms a single strand, it lacks the paired template that lets DNA copy itself, and in cells it functions mainly as an intermediary, carrying information from DNA to other parts of the cell, where it is used to synthesise proteins, including enzymes.
The long-term storage of genetic information is the business of double-helix DNA, which can copy itself. DNA also uses four nucleotide bases, but with Thymine (T) in place of Uracil (U). Its bases form two complementary pairs, A-T and C-G, which join two strands whose backbones are built from phosphate and the sugar deoxyribose, hence the name Deoxyribonucleic Acid. The concatenation of these pairs into a ladder-like sequence forms the elementary information code of life on Earth.
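Because A always pairs with T and C with G, either strand fully determines the other, which is what lets one strand act as a template when DNA is copied. A minimal Python sketch of the pairing rule; it ignores the fact that the two strands run in opposite directions, as well as all of the actual chemistry:

```python
# Base-pairing rule from the text: A pairs with T, and C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Build the partner strand base by base, as a template would dictate."""
    return "".join(PAIR[base] for base in strand)


def pairs_correctly(top: str, bottom: str) -> bool:
    """True if two equal-length strands pair as A-T / C-G at every position."""
    return len(top) == len(bottom) and all(PAIR[a] == b for a, b in zip(top, bottom))


template = "GATTACA"
partner = complement(template)
print(partner)                          # CTAATGT
print(complement(partner) == template)  # True: the partner of the partner is the original
```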
Multiple estimates suggest that DNA is by far the densest data storage medium on this planet. Theoretically, one gram of DNA can store 215 petabytes of information (215 000 000 GB), an amount that has been described as enough to hold the entire publicly available content of the World Wide Web.[3] Besides this storage efficiency, comparing contemporary silicon-based computers with organically stored information reveals several other important differences.
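The unit conversion, and what the quoted density would mean for a small payload such as Biomemory's 1 kB card, can be checked with a few lines of Python. The sketch takes the 215 PB per gram figure at face value and uses decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes):

```python
# Quoted theoretical density: about 215 petabytes per gram of DNA.
PB = 10**15  # decimal petabyte, in bytes
GB = 10**9   # decimal gigabyte, in bytes
BYTES_PER_GRAM = 215 * PB


def grams_of_dna(n_bytes: float) -> float:
    """Mass of DNA needed to hold n_bytes at the quoted theoretical density."""
    return n_bytes / BYTES_PER_GRAM


print(BYTES_PER_GRAM // GB)             # 215000000 (GB per gram)
print(f"{grams_of_dna(1_000):.2e} g")   # 4.65e-15 g for a 1 kB payload
```

At that density a kilobyte would occupy only a few femtograms of DNA; this is a theoretical figure and says nothing about the overhead of real devices.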
Moreover, DNA enables an entirely novel approach to computer architecture. Computers are traditionally divided into memory and processing units, so every piece of information the computer operates on must first be loaded into the processing unit, which creates a potential bottleneck in the flow of data.
This separation of storage from processing, shared by both the von Neumann and the Harvard architectures, has been paradigmatic for hardware design from the early room-sized computers to today's ultra-portable laptops and mobile phones. The alternative offered by DNA computing relies on DNA’s ability to rewrite the information it stores in real time, leading to the idea of in-storage processing as a possible basis for future biocomputers. This also paves the way for massively parallel information processing (which makes DNA a serious contender to quantum computation) and a virtually inexhaustible supply of data storage, thanks to DNA’s exponential replication capacity (one DNA molecule is copied into two, the two into four, four into eight, and so on).[5]
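The doubling described above can be put in numbers with a toy model that assumes every molecule is copied perfectly in every round; real replication and laboratory amplification fall short of this. The function names are illustrative only:

```python
def copies_after(rounds: int, start: int = 1) -> int:
    """Copies after `rounds` of ideal replication, each round doubling the count."""
    return start * 2**rounds


def rounds_needed(target: int, start: int = 1) -> int:
    """Fewest ideal doubling rounds needed to reach at least `target` copies."""
    if start < 1:
        raise ValueError("need at least one starting molecule")
    rounds, copies = 0, start
    while copies < target:
        copies *= 2
        rounds += 1
    return rounds


print([copies_after(r) for r in range(4)])  # [1, 2, 4, 8]
print(rounds_needed(1_000_000))             # 20
```

Three rounds turn one molecule into eight, as in the sequence above, and twenty ideal rounds already pass a million copies.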
In terms of computational architecture, there is also a striking similarity between DNA and early computers. DNA replication can be understood as a biological analogue of the Turing machine: one strand of DNA represents an “input tape” read by the head (in the case of DNA, the enzyme DNA polymerase), and the results are written onto a new strand of DNA, which represents an “output tape”.[6] This analogy led Leonard Adleman (who is also credited with coining the term ‘computer virus’) to build the first DNA computer, called TT-100, in 1994. It solved an instance of the Hamiltonian path problem, a close relative of the better-known travelling salesman problem.
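Adleman's approach is usually described as generate-and-filter: create a large pool of candidate paths at once, then discard those that fail each condition. The Python sketch below mirrors those filtering steps on a hypothetical five-vertex directed graph (not Adleman's original instance). It enumerates the candidate walks one by one, whereas in the test tube they form in parallel as DNA strands encoding vertices and edges bind together; the step comments are a simplified paraphrase of the laboratory procedure.

```python
# Hypothetical directed graph with vertices 0..4, NOT Adleman's original instance.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 4)]


def all_walks(edges, n_vertices, max_len):
    """Step 1 stand-in: list every walk of 1..max_len vertices.

    In the test tube, strands for vertices and edges join into many walks at
    once; here the walks are simply enumerated.
    """
    frontier = [[v] for v in range(n_vertices)]
    walks = list(frontier)
    for _ in range(max_len - 1):
        frontier = [w + [b] for w in frontier for (a, b) in edges if a == w[-1]]
        walks.extend(frontier)
    return walks


def hamiltonian_paths(edges, n_vertices, start, end):
    walks = all_walks(edges, n_vertices, max_len=n_vertices + 2)
    # Step 2: keep walks that begin at `start` and end at `end`.
    kept = [w for w in walks if w[0] == start and w[-1] == end]
    # Step 3: keep walks with exactly n_vertices vertices (separation by length).
    kept = [w for w in kept if len(w) == n_vertices]
    # Step 4: keep walks that visit every vertex (e.g. drops [0, 3, 1, 3, 4]).
    kept = [w for w in kept if set(w) == set(range(n_vertices))]
    return kept


print(hamiltonian_paths(EDGES, 5, start=0, end=4))
# [[0, 1, 2, 3, 4], [0, 2, 1, 3, 4]]
```

The number of candidate walks grows exponentially with the size of the graph, which is why exploring them all at once in a test tube was an appealing alternative to enumerating them one by one.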
A decade later, more advanced DNA computers were already able to solve complex mathematical problems, although they were still constrained by very long computing times and ambiguities in interpreting the results. This began to change as tools for reading and editing genetic information became faster and more affordable: the CRISPR-Cas9 gene editing technology radically improved the editing of DNA, while new sequencing techniques made reading it cheaper and quicker. Today, portable DNA readers such as the MinION nanopore sequencer provide cheap and fast access to information stored in DNA while easily fitting into your pocket.[7] These advances have also fed the current progress in DNA data storage; for example, Catalog’s DNA platform Shannon encodes and stores information in strands of synthetic DNA.[8]