Addressing the Data Storage Crisis

Our increasingly digitized world is creating more data every year, including videos from ubiquitous smartphones, observations from billions of sensors and surveillance cameras, output from artificial intelligence, and much more. Until now, exponential growth in data storage capacity has largely kept pace with the flood of data at a steady cost.

This trend may not continue, according to John Monroe at Furthur Market Research, who for almost 25 years, until 2022, was an analyst at Gartner. “We’ve just spoiled all the users: Storage is infinitely there and always cheaper, year on year. That isn’t necessarily going to be the case in the near-term future.”

Hard-disk drives (HDDs) have been critical for keeping pace with exploding demand, along with their solid-state equivalents, which are widely used for “warm” data that needs to be accessed frequently. The cost of these solutions is no longer falling quickly enough, however. In 2020, Monroe (no relation to this author) produced a Gartner report that projected a growing storage gap this decade, one that will demand “new technologies that can deliver millions of enterprise-grade petabytes at costs approaching $0.001 per gigabyte.”
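
To give a sense of scale for that target, the short sketch below converts the quoted $0.001 per gigabyte into other units. It assumes decimal units (1 petabyte = 1,000,000 gigabytes) and reads "millions of petabytes" as exactly one million, purely for illustration; the function and constant names are hypothetical, not from the report.

```python
# Back-of-the-envelope conversion of the quoted cost target.
# Assumption: decimal storage units (1 TB = 1,000 GB; 1 PB = 1,000,000 GB).
GB_PER_TB = 1_000
GB_PER_PB = 1_000_000

# Target quoted from the 2020 Gartner report.
TARGET_USD_PER_GB = 0.001


def cost_usd(capacity_gb: float, usd_per_gb: float) -> float:
    """Return the media cost in U.S. dollars for a given capacity."""
    return capacity_gb * usd_per_gb


cost_per_tb = cost_usd(GB_PER_TB, TARGET_USD_PER_GB)
cost_per_pb = cost_usd(GB_PER_PB, TARGET_USD_PER_GB)
# Illustrative reading of "millions of petabytes": one million of them.
cost_million_pb = cost_usd(1_000_000 * GB_PER_PB, TARGET_USD_PER_GB)

print(f"per terabyte:          ${cost_per_tb:,.2f}")
print(f"per petabyte:          ${cost_per_pb:,.2f}")
print(f"per million petabytes: ${cost_million_pb:,.2f}")
```

Under these assumptions the target works out to about $1 per terabyte and about $1,000 per petabyte, so a million petabytes would cost on the order of $1 billion in media alone.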

In the near term, magnetic tape is likely to be the best option to fill the gap. But much of the growing demand is for long-term archival storage of “cold” data that will be retrieved only rarely—or never. Archiving such data demands long-term robustness and low upfront and maintenance costs.

To this end, researchers are exploring techniques such as laser writing of inorganic films or bulk modification of fused silica. Vastly denser long-term storage, however, could potentially be achieved with organic molecules, especially DNA, which can stably store genetic information over centuries. Biology tools for cheaply reading DNA sequences have already vastly improved. More recently, investigators have adapted microelectronics techniques to encode information in the molecules quickly and cheaply, but the techniques are still at an early stage.

The Return of Tape

Datacenters are a prime driver of storage demand, and HDDs, based on magnetic disks, are key to meeting it. Solid-state drives, built on flash memory, conveniently expand this capacity, but at a higher cost per gigabyte. Technologies such as static, dynamic, and magnetic random-access memory (SRAM, DRAM, and MRAM) can be integrated closely with computation, but are too expensive for high-volume use.

A large fraction of the increased demand, however, is for “cold” data that needs to be cheaply preserved indefinitely. According to the 2019 Tape Roadmap report from the Information Storage Industry Consortium (INSIC), “60% of the total datasphere is neither frequently accessed nor does it require rapid access,” making it suitable for tape storage.

The areal bit density of tapes is much lower than that on HDDs, whose bits shrank exponentially over the years. However, HDD density growth has slowed dramatically since about 2009. In contrast, INSIC projected sustained rapid increases in tape storage density. “There’s technical problems with going to higher density on tape, but it’s nowhere near the physical limits that hard drives are on,” agreed Eric Fullerton, director of the Center for Memory and Recording Research at the University of California, San Diego.

More importantly, areal bit density is only part of the story, Fullerton said. “One advantage that tape has is you can put large capacity because it’s essentially three-dimensional because you wind the tape on top of itself,” without multiplying the reading and writing hardware.
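
The geometric argument can be sketched with a rough area comparison. Every dimension below is a round, illustrative assumption rather than a specification of any real cartridge or drive: a half-inch-wide tape one kilometer long, compared with a hypothetical drive holding ten platters recorded on both sides.

```python
import math

# All dimensions are illustrative assumptions, not product specifications.
TAPE_LENGTH_M = 1_000.0    # assumed tape length in one cartridge
TAPE_WIDTH_M = 0.0127      # assumed half-inch tape width
PLATTER_OUTER_R_M = 0.047  # assumed usable outer radius of a platter
PLATTER_INNER_R_M = 0.015  # assumed inner radius lost to the spindle
PLATTER_SURFACES = 20      # hypothetical drive: 10 platters, 2 sides each


def tape_area_m2(length_m: float, width_m: float) -> float:
    """Recording area of an unwound tape, treated as a long rectangle."""
    return length_m * width_m


def platter_surface_area_m2(outer_r_m: float, inner_r_m: float) -> float:
    """Recording area of one platter side, treated as a flat ring."""
    return math.pi * (outer_r_m**2 - inner_r_m**2)


tape_area = tape_area_m2(TAPE_LENGTH_M, TAPE_WIDTH_M)
disk_area = PLATTER_SURFACES * platter_surface_area_m2(
    PLATTER_OUTER_R_M, PLATTER_INNER_R_M
)
area_ratio = tape_area / disk_area

print(f"tape recording area: {tape_area:.2f} m^2")
print(f"disk recording area: {disk_area:.3f} m^2")
print(f"tape / disk ratio:   {area_ratio:.0f}x")
```

With these assumed numbers, the wound tape offers roughly a hundred times the recording surface of the drive, which is why a medium with much lower areal density can still hold a comparable amount of data per cartridge.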

Fullerton (a National Academy of Engineering member) co-authored a “Rapid Expert Consultation” on Archival Data Storage Technologies for the Intelligence Community. He and three other experts were tasked by the U.S. National Academies with exploring archival options that would be ready by 2030 to store large amounts of intelligence data for decades. They concluded that “Tape is really the way to go in many ways” for cold data, Fullerton said, even though “If you had asked me back when I was at IBM, I would have said tape was over.”

“Within the next three years, certainly, chief information officers are going to be forced to use tape (or something else),” Monroe agreed. “Why? Because hard drives and solid-state drives are too expensive and suck up too much energy.”

Eyeing Optical Storage

What that “something else” might be, especially after 2030, remains an open question. Consumers, of course, are familiar with other mass-storage technologies, such as optical disks, which offer a variety of (incompatible) formats. The costs, however, are not competitive with hard drives, let alone tape.

Companies also are working on alternative archival solutions. Group 47, for example, has acquired rights to the Digital Optical Technology System (DOTS) archival optical technology devised by Kodak. Cerabyte is working to commercialize a system that uses high-power femtosecond lasers to write more than a million bits in parallel on a thin ceramic layer.

Storing information in holograms also garnered “enormous effort” even decades ago, Fullerton said. “It works in principle,” but it proved challenging to find materials that can be written with low-power lasers yet do not degrade over time or during reading. Nonetheless, even now, U.K.-based HoloMem is pursuing commercialization of storage in small holograms written in photopolymer materials that are stable over a wide temperature range.

Transparent media also offer increased optical storage capacity through multi-layer recording. For example, Folio Photonics features optical disks using multiple layers of organic molecules whose fluorescence is selectively quenched in a nonlinear response to focused lasers. Microsoft’s Research Lab in Cambridge, U.K., is also examining optical storage with hundreds of layers; its “Project Silica” uses high-power femtosecond lasers to permanently modify the local structure within slabs of fused silica, building on “5D” research from the University of Southampton.

“By improving core metrics of the technology (i.e., density, throughput, write energy efficiency), we could create truly sustainable archival storage,” Ioan Stefanovici, principal research manager in the Cloud Infrastructure Group at Microsoft Research Cambridge, wrote by email. “We envision this cutting-edge technology becoming a mainstay in Azure datacenters,” which support Microsoft’s cloud services.

Although multiple layers significantly improve capacity, optical storage is limited to feature sizes comparable to optical wavelengths. “IBM gave up on optical storage,” Fullerton said, “as soon as magnetic hard drives passed the optical resolution limit.”

Molecular Information

What could improve density by orders of magnitude is molecular storage, notably in the sequence of the genetic molecule DNA. In 2012, a 5-megabit book was encoded in DNA and read back by George Church of Harvard Medical School and the Wyss Institute for Biologically Inspired Engineering in Boston and his colleagues.

Following that feasibility demonstration, David A. Markowitz, now of R&D company STR, spearheaded the Molecular Information Storage (MIST) program at the U.S. governmental Intelligence Advanced Research Projects Activity (IARPA). In adapting biological tools for synthesizing and sequencing DNA, “We’ve been scaling them up for an entirely different application,” Markowitz said, “so that industry would be willing to commit follow-on investments and actually work to build out this new industry, which is in the process of happening.”

Indeed, Microsoft has also been exploring DNA storage in a collaboration with the University of Washington. In 2021, the team demonstrated synthesis of a million short chains of DNA in parallel. “We expect that write speeds will scale beyond that in the future,” said Microsoft researcher Karin Strauss by email.

“A read operation takes some time to be initiated due to how long the steps in preparing the DNA for reading take,” Strauss acknowledged. “This makes it a technology with high read latency, constraining it to the archival storage space for the moment.”

For Markowitz, write speed (DNA synthesis) was a critical goal for the first phase of MIST. One of its synthesis leads, Twist Bioscience, “produced a device that was capable of writing 100 million DNA oligomers in parallel,” Markowitz said. “We massively exceeded the state of the art for write throughput for DNA.”

In 2020, Microsoft, Twist, sequencing titan Illumina, and storage vendor Western Digital founded the DNA Data Storage Alliance. With many later members, this organization is a big improvement over the fractious optical-storage industry, Monroe said. “You’ve got 50 some-odd companies trying to come up with a compatible, symbiotic ecosystem.”

For archiving, “Biomolecules like DNA can be so incredibly stable under reasonable conditions that it’s basically write it and forget it. You never need to do integrity checks or replacement,” Markowitz said, unlike magnetic media that need “integrity checks and media replacements every two and five years. It’s a huge cost driver.”

DNA storage profoundly differs from other formats, however, since each free-floating snippet needs to carry addressing information. Also, “DNA is prone to insertions and deletions, in addition to substitutions (the equivalent of bit flips),” Strauss wrote. “However, this type of error is not new to coding theorists, since such errors are common in networking.”
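
The addressing requirement can be illustrated with a toy sketch. It assumes a naive mapping of two bits per nucleotide and a hypothetical two-byte index at the front of every snippet; real DNA storage systems add sequence constraints and error-correcting codes, which are omitted here, and none of the names below come from any actual system.

```python
import random

# Assumed toy mapping: two bits per nucleotide (00=A, 01=C, 10=G, 11=T).
BASES = "ACGT"
# Hypothetical fixed-width address prepended to every snippet.
ADDRESS_BYTES = 2


def bytes_to_bases(data: bytes) -> str:
    """Map each byte to four nucleotides, most significant bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def bases_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_bases; expects a length that is a multiple of four."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i : i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def encode(data: bytes, payload_bytes: int = 8) -> list[str]:
    """Split data into chunks and emit one addressed snippet per chunk."""
    chunks = [data[i : i + payload_bytes] for i in range(0, len(data), payload_bytes)]
    if len(chunks) > 256**ADDRESS_BYTES:
        raise ValueError("too many chunks for the address width")
    return [
        bytes_to_bases(index.to_bytes(ADDRESS_BYTES, "big") + chunk)
        for index, chunk in enumerate(chunks)
    ]


def decode(snippets: list[str]) -> bytes:
    """Reassemble the data from snippets read back in any order."""
    records = {}
    for snippet in snippets:
        raw = bases_to_bytes(snippet)
        index = int.from_bytes(raw[:ADDRESS_BYTES], "big")
        records[index] = raw[ADDRESS_BYTES:]
    return b"".join(records[index] for index in sorted(records))


message = b"Cold data, written once and rarely read."
pool = encode(message)
random.shuffle(pool)  # snippets float freely, so read order is arbitrary
recovered = decode(pool)
print(recovered == message)
```

Because each snippet carries its own index, the message can be reassembled however the pool is shuffled. The sketch does not handle the insertions and deletions Strauss describes: a single dropped nucleotide would shift every later base in that snippet, which is why real systems rely on more sophisticated codes.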

Computational expertise thus has a key role to play, Markowitz stressed. “I think that the greatest enabler of progress in the molecular information storage space will be algorithm development,” he said. “A clever person who’s got a great idea could drive tremendous progress in the field.”

Further Reading

Monroe, J. Preservation or Deletion: Archiving and Accessing the Dataverse, Furthur Market Research (March 2023)

National Academies of Sciences, Engineering, and Medicine. “Rapid Expert Consultation on Archival Data Storage Technologies for the Intelligence Community,” The National Academies Press (January 2024), https://doi.org/10.17226/27445

2019 INSIC Tape Report, Information Storage Industry Consortium (2019)

Anderson, P. et al. Project Silica: Towards Sustainable Cloud Archival Storage in Glass, The 29th ACM Symposium on Operating Systems Principles (October 2023)

Landsman, D. and Strauss, K. The DNA Storage Model, Computer (June 2023), https://doi.org/10.1109/MC.2023.3272188