Automate building guardrails for Amazon Bedrock using test-driven development
As companies of all sizes continue to build generative AI applications, the need for robust governance and control mechanisms becomes crucial. With the growing complexity of generative AI models, organizations face challenges in maintaining compliance, mitigating risks, and upholding ethical standards. This is where the concept of guardrails comes into play, providing a comprehensive framework for implementing governance and control measures with safeguards customized to your application requirements and responsible AI policies.
Amazon Bedrock Guardrails helps implement safeguards for generative AI applications based on specific use cases and responsible AI policies. Amazon Bedrock Guardrails assists in controlling the interaction between users and foundation models (FMs) by detecting and filtering out undesirable and potentially harmful content, while maintaining safety and privacy. Organizations can define denied topics, making sure that FMs refrain from providing information or advice on undesirable subjects; configure content filters to set thresholds for blocking harmful content across categories such as hate, insults, sexual, violence, and misconduct; redact sensitive and personally identifiable information (PII) to protect privacy; and block inappropriate content with a custom word filter. You can create multiple guardrails with different configurations, each tailored to specific use cases, and continuously monitor and analyze user inputs and FM responses that might violate customer-defined policies. By proactively implementing guardrails, companies can future-proof their generative AI applications while maintaining a steadfast commitment to ethical and responsible AI practices.
In this post, we explore a solution that automates building guardrails using a test-driven development approach.
Iterative development
Although implementing Amazon Bedrock Guardrails is a crucial step in practicing responsible AI, it’s important to recognize that these safeguards aren’t static. As models evolve and new use cases emerge, organizations must be proactive in refining and adapting their guardrails to maintain effectiveness and alignment with their responsible AI policies.
To address this challenge, we recommend builders adopt a test-driven development (TDD) approach when building and maintaining their guardrails. TDD is a software development methodology that emphasizes writing tests before implementing actual code. By applying this methodology to guardrails, organizations can proactively identify edge cases, potential vulnerabilities, and areas for improvement, making sure that their guardrails remain robust and fit for purpose. TDD for guardrails offers several benefits. It promotes a structured and systematic approach to refining and validating guardrails, reducing the risk of unintended consequences or gaps in coverage. Additionally, TDD facilitates collaboration and knowledge sharing among teams, because tests serve as living documentation and a shared understanding of the expected behavior and constraints.
In this post, we present a solution that takes a TDD approach to guardrail development, allowing you to improve your guardrails over time.
Solution overview
In this solution, you use a TDD approach to improve your guardrails. You first create a guardrail, then build a testing dataset, and finally evaluate the guardrail using the testing dataset. Using the test results from your evaluation, you can go back, update the guardrail, and reevaluate it. This allows you to maintain the TDD approach and improve your guardrail over multiple iterations. The solution also includes an optional step where you invoke an FM to generate and implement changes to your guardrail based on the test results. Because that step doesn’t guarantee that all test cases will pass, we recommend using it to explore the different ways to update the guardrail rather than as a final fix.
This workflow is shown in the following diagram.
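The evaluate-and-iterate loop can be sketched in a few lines of Python. This is a minimal illustration, not the solution's actual code: the `GuardrailCase` class, the `evaluate` helper, and the `keyword_stub` checker are hypothetical names introduced here. The `make_bedrock_check` function shows one way the check might be backed by the ApplyGuardrail API through boto3, assuming a guardrail already exists and AWS credentials are configured.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class GuardrailCase:
    prompt: str
    should_block: bool  # expected guardrail behavior for this prompt


def evaluate(cases: Iterable[GuardrailCase],
             is_blocked: Callable[[str], bool]) -> List[GuardrailCase]:
    """Return the cases whose actual outcome differs from the expected one."""
    return [c for c in cases if is_blocked(c.prompt) != c.should_block]


def make_bedrock_check(guardrail_id: str, version: str = "DRAFT") -> Callable[[str], bool]:
    """Build a checker backed by ApplyGuardrail (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs offline

    client = boto3.client("bedrock-runtime")

    def is_blocked(prompt: str) -> bool:
        response = client.apply_guardrail(
            guardrailIdentifier=guardrail_id,
            guardrailVersion=version,
            source="INPUT",
            content=[{"text": {"text": prompt}}],
        )
        return response["action"] == "GUARDRAIL_INTERVENED"

    return is_blocked


def keyword_stub(prompt: str) -> bool:
    """Hypothetical offline stand-in for a guardrail: blocks prompts mentioning 'history'."""
    return "history" in prompt.lower()


# The tests are written first; the guardrail is then refined until none fail.
cases = [
    GuardrailCase("Can you help me factor x^2 - 9?", should_block=False),
    GuardrailCase("Write my history essay on the Roman Empire.", should_block=True),
    GuardrailCase("Can you tutor my second grader in addition?", should_block=True),
]
failures = evaluate(cases, keyword_stub)
for case in failures:
    print("FAILED:", case.prompt)
```

Each failing case points at a gap in the guardrail configuration. After an update, rerunning `evaluate` with the same cases shows whether the change fixed the gap without breaking cases that previously passed.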
Prerequisites
Before you start, make sure you have the following prerequisites in place:
- Create an AWS account, or sign in to your existing account.
- Make sure that you have the correct AWS Identity and Access Management (IAM) permissions to use Amazon Bedrock.
- Have access to the large language model (LLM) that will be used. This solution uses Anthropic’s Claude 3 Sonnet and Claude 3 Haiku models.
- Install Python 3.8 or greater in your environment.
- Install pip.
- Configure your AWS credentials.
Clone the repo
To get started, clone the repository by running the following command, and then switch to the working directory:
Build your guardrail
To build the guardrail, you can use the CreateGuardrail API. There are multiple components to a guardrail for Amazon Bedrock. This API allows you to configure the following policies programmatically:
- Content filters – You can configure thresholds to block input prompts or model responses containing harmful content such as hate, insults, sexual, violence, misconduct (including criminal activity), and prompt attacks (prompt injection and jailbreaks). For example, an ecommerce site can design its online assistant to not use inappropriate language, such as hate speech or insults.
- Denied topics – You can define a set of topics to deny within your generative AI application. For example, a banking assistant application can be designed to deny topics related to illegal investment advice.
- Word filters – You can configure a set of custom words or phrases that you want to detect and block in the interaction between your users and generative AI applications. For example, you can detect and block profanity as well as specific custom words such as competitor names, or other offensive words.
- Sensitive information filters – You can detect sensitive content such as PII or custom regular expressions (regex) entities in user inputs and FM responses. Based on the use case, you can reject inputs that contain sensitive information or redact them in FM responses. For example, you can redact users’ PII while generating summaries from customer and agent conversation transcripts.
- Contextual grounding check – You can detect and filter hallucinations in model responses that aren’t grounded in the source information (that is, responses that are factually inaccurate or add new information) or that are irrelevant to the user’s query. For example, you can block or flag responses in Retrieval Augmented Generation (RAG) applications if the model’s responses deviate from the information in the retrieved passages or don’t answer the user’s questions.
To test this solution, you create a guardrail for a math tutoring business. The guardrail stops the model from responding to requests for non-math tutoring, in-person tutoring, or tutoring outside grades 6–12. See the following code:
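As a rough sketch, assuming boto3, a CreateGuardrail request for this scenario might look like the following. The guardrail name, topic names, definitions, example prompts, filter strengths, and blocked messages are illustrative assumptions, not the values used by the solution. The request is built as a plain dictionary so that it can be inspected without AWS access; `create_guardrail` is only called if you invoke the helper yourself.

```python
def build_guardrail_request() -> dict:
    """Assemble CreateGuardrail arguments for the math tutoring example (illustrative values)."""
    denied_topics = [
        ("NonMathTutoring",
         "Requests for tutoring or homework help in any subject other than mathematics.",
         "Can you help me with my history essay?"),
        ("InPersonTutoring",
         "Requests to arrange or receive tutoring in person rather than online.",
         "Can a tutor come to my house on Saturday?"),
        ("OutsideGrades6to12",
         "Requests for tutoring for students below grade 6 or above grade 12.",
         "Can you tutor my second grader in addition?"),
    ]
    return {
        "name": "math-tutoring-guardrail",
        "description": "Keeps the assistant to online math tutoring for grades 6-12.",
        "topicPolicyConfig": {
            "topicsConfig": [
                {"name": name, "definition": definition,
                 "examples": [example], "type": "DENY"}
                for name, definition, example in denied_topics
            ]
        },
        "contentPolicyConfig": {
            "filtersConfig": [
                {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
                {"type": "INSULTS", "inputStrength": "HIGH", "outputStrength": "HIGH"},
                # Prompt attack filtering applies to inputs only.
                {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
            ]
        },
        "blockedInputMessaging": "Sorry, I can only help with online math tutoring for grades 6-12.",
        "blockedOutputsMessaging": "Sorry, I can only help with online math tutoring for grades 6-12.",
    }


def create_guardrail(request: dict) -> str:
    """Send the request to Amazon Bedrock and return the new guardrail ID."""
    import boto3  # imported lazily so building the request works offline

    client = boto3.client("bedrock")
    response = client.create_guardrail(**request)
    return response["guardrailId"]


request = build_guardrail_request()
for topic in request["topicPolicyConfig"]["topicsConfig"]:
    print(topic["name"], "->", topic["type"])
```

Keeping the denied topics in one small list makes it easier to revise a single definition after a failed test and re-create or update the guardrail.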
How the Department of Energy’s AI Initiatives Are Transforming Science, Industry and Government
The U.S. Department of Energy oversees national energy policy and production. As big a job as that is, the DOE also does so much more — enough to have earned the nickname “Department of Everything.”
In this episode of the NVIDIA AI Podcast, Helena Fu, director of the DOE’s Office of Critical and Emerging Technologies (CET) and DOE’s chief AI officer, talks about the department’s latest AI efforts. With initiatives touching national security, infrastructure and utilities, and oversight of 17 national labs and 34 scientific user facilities dedicated to scientific discovery and industry innovation, DOE and CET are central to AI-related research and development throughout the country.
Hear more from Helena Fu by watching the on-demand session, AI for Science, Energy and Security, from AI Summit DC. And learn more about software-defined infrastructure for power and utilities.
Time Stamps
2:20: Four areas of focus for the CET include AI, microelectronics, quantum information science and biotechnology.
10:55: Introducing AI-related initiatives within the DOE, including FASST, or Frontiers in AI for Science, Security and Technology.
16:30: Discussing future applications of AI, large language models and more.
19:35: The opportunity of AI applied to materials discovery and applications across science, energy and national security.
You Might Also Like…
NVIDIA’s Josh Parker on How AI and Accelerated Computing Drive Sustainability – Ep. 234
AI isn’t just about building smarter machines. It’s about building a greener world. AI and accelerated computing are helping industries tackle some of the world’s toughest environmental challenges. Joshua Parker, senior director of corporate sustainability at NVIDIA, explains how these technologies are powering a new era of energy efficiency.
Currents of Change: ITIF’s Daniel Castro on Energy-Efficient AI and Climate Change
AI is everywhere. So, too, are concerns about advanced technology’s environmental impact. Daniel Castro, vice president of the Information Technology and Innovation Foundation and director of its Center for Data Innovation, discusses his AI energy use report that addresses misconceptions about AI’s energy consumption. He also talks about the need for continued development of energy-efficient technology.
How the Ohio Supercomputer Center Drives the Future of Computing – Ep. 213
The Ohio Supercomputer Center’s Open OnDemand program empowers the state’s educational institutions and industries with computational services, training and educational programs. They’ve even helped NASCAR simulate race car designs. Alan Chalker, the director of strategic programs at the OSC, talks about all things supercomputing.
Anima Anandkumar on Using Generative AI to Tackle Global Challenges – Ep. 204
Anima Anandkumar, Bren Professor at Caltech and former senior director of AI research at NVIDIA, speaks to generative AI’s potential to make splashes in the scientific community, from accelerating drug and vaccine research to predicting extreme weather events like hurricanes or heat waves.
MLSysBook.AI: Principles and Practices of Machine Learning Systems Engineering
Posted by Jason Jabbour, Kai Kleinbard and Vijay Janapa Reddi (Harvard University)
Everyone wants to do the modeling work, but no one wants to do the engineering.
If ML developers are like astronauts exploring new frontiers, ML systems engineers are the rocket scientists designing and building the engines that take them there.
Introduction
“Everyone wants to do modeling, but no one wants to do the engineering,” highlights a stark reality in the machine learning (ML) world: the allure of building sophisticated models often overshadows the critical task of engineering them into robust, scalable, and efficient systems.
The reality is that ML and systems are inextricably linked. Models, no matter how innovative, are computationally demanding and require substantial resources—with the rise of generative AI and increasingly complex models, understanding how ML infrastructure scales becomes even more critical. Ignoring the system’s limitations during model development is a recipe for disaster.
Unfortunately, educational resources on the systems side of machine learning are lacking. There are plenty of textbooks and materials on deep learning theory and concepts. However, we truly need more resources on the infrastructure and systems side of machine learning. Critical questions—such as how to optimize models for specific hardware, deploy them at scale, and ensure system efficiency and reliability—are still not adequately understood by ML practitioners. This lack of understanding is not due to disinterest but rather a gap in available knowledge.
One significant resource addressing this gap is MLSysBook.ai. This blog post explores key ML systems engineering concepts from MLSysBook.ai and maps them to the TensorFlow ecosystem to provide practical insights for building efficient ML systems.
The Connection Between Machine Learning and Systems
Many think machine learning is solely about extracting patterns and insights from data. While this is fundamental, it’s only part of the story. Training and deploying these “deep” neural network models often necessitates vast computational resources, from powerful GPUs and TPUs to massive datasets and distributed computing clusters.
Consider the recent wave of large language models (LLMs) that have pushed the boundaries of natural language processing. These models highlight the immense computational challenges in training and deploying large-scale machine learning models. Without carefully considering the underlying system, training times can stretch from days to weeks, inference can become sluggish, and deployment costs can skyrocket.
Building a successful machine-learning solution involves the entire system, not just the model. This is where ML systems engineering takes the reins, allowing you to optimize model architecture, hardware selection, and deployment strategies, ensuring that your models are not only powerful in theory but also efficient and scalable.
To draw an analogy, if developing algorithms is like being an astronaut exploring the vast unknown of space, then ML systems engineering is similar to the work of rocket scientists building the engines that make those journeys possible. Without the precise engineering of rocket scientists, even the most adventurous astronauts would remain earthbound.
Bridging the Gap: MLSysBook.ai and System-Level Thinking
One important resource for insights into ML systems engineering is MLSysBook.ai, an open-source “textbook” developed initially as part of Harvard University’s CS249r Tiny Machine Learning course and HarvardX’s TinyML online series. This project, which has expanded into an open, collaborative initiative, dives deep into the end-to-end ML lifecycle.
It highlights that the principles governing ML systems, whether designed for tiny embedded devices or large data centers, are fundamentally similar. For instance, while tiny machines might employ INT8 for numeric operations to save resources, larger systems often utilize FP16 for higher precision—the fundamental concepts, such as quantization, span across both scenarios.
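To make the quantization idea concrete, here is a minimal pure-Python sketch of symmetric INT8 quantization. It is an illustration under simple assumptions (one scale per tensor, no zero point) and is not taken from MLSysBook.ai or any TensorFlow API; the function names `quantize_int8` and `dequantize` are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric quantization: map floats onto integers in [-127, 127]."""
    # One scale for the whole list; fall back to 1.0 when every value is zero.
    scale = max(abs(v) for v in values) / 127 or 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("codes:", q)
print("scale:", scale)
print("largest round-trip error:", max_error)
```

Each value is stored in 8 bits instead of 16 or 32, and the round-trip error is bounded by half the scale. The same trade-off between memory and precision applies whether the target is a microcontroller or a data center accelerator.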
Key concepts covered in this resource include:
- Data Engineering: Setting the foundation by efficiently collecting, preprocessing, and managing data to prepare it for the machine learning pipeline.
- Model Development: Crafting and refining machine learning models to meet specific tasks and performance goals.
- Optimization: Fine-tuning model performance and efficiency, ensuring effective use of hardware and resources within the system.
- Deployment: Transitioning models from development to real-world production environments while scaling and adapting them to existing infrastructure.
- Monitoring and Maintenance: Continuously tracking system health and performance to maintain reliability, address issues, and adapt to evolving data and requirements.
In an efficient ML system, data engineering lays the groundwork by preparing and organizing raw data, which is essential for any machine learning process. This ensures data can be transformed into actionable insights during model development, where machine learning models are created and refined for specific tasks. Following development, optimization becomes critical for enhancing model performance and efficiency, ensuring that models are tuned to run effectively on the designated hardware and within the system’s constraints.
The seamless integration of these steps then extends into the deployment phase, where models are brought into real-world production environments. Here, they must be scaled and adapted to function effectively within existing infrastructure, highlighting the importance of robust ML systems engineering. However, the lifecycle of an ML system continues after deployment; continuous monitoring and maintenance are vital. This ongoing process ensures that ML systems remain healthy, reliable and perform optimally over time, adapting to new data and requirements as they arise.
SocratiQ: An Interactive AI-Powered Generative Learning Assistant
One of the exciting innovations we’ve integrated into MLSysBook.ai is SocratiQ—an AI-powered learning assistant designed to foster a deeper and more engaging connection with content focused on machine learning systems. By leveraging a Large Language Model (LLM), SocratiQ turns learning into a dynamic, interactive experience that allows students and practitioners to engage with and co-create their educational journey actively.
With SocratiQ, readers transition from passive content consumption to an active, personalized learning experience. Here’s how SocratiQ makes this possible:
- Interactive Quizzes: SocratiQ enhances the learning process by automatically generating quizzes based on the reading content. This feature encourages active reflection and reinforces understanding without disrupting the learning flow. Learners can test their comprehension of complex ML systems concepts.
- Adaptive, In-Content Learning: SocratiQ offers real-time conversations with the LLM without pulling learners away from the content they’re engaging with. Acting as a personalized Teaching Assistant (TA), it provides tailored explanations.
- Progress Assessment and Gamification: Learners’ progress is tracked and stored locally in their browser, providing a personalized path to developing skills without privacy concerns. This allows for evolving engagement as the learner progresses through the material.
SocratiQ strives to be a supportive guide that respects the primacy of the content itself. It subtly integrates into the learning flow, stepping in when needed to provide guidance, quizzes, or explanations—then stepping back to let the reader continue undistracted. This design ensures that SocratiQ works harmoniously within the natural reading experience, offering support and personalization while keeping the learner immersed in the content.
We plan to integrate capabilities such as research lookups and case studies. The aim is to create a unique learning environment where readers can study and actively engage with the material. This blend of content and AI-driven assistance transforms MLSysBook.ai into a living educational resource that grows alongside the learner’s understanding.
Mapping MLSysBook.ai’s Concepts to the TensorFlow Ecosystem
MLSysBook.AI focuses on the core concepts in ML systems engineering while providing strategic tie-ins to the TensorFlow ecosystem, which offers a rich environment for realizing many of the principles the book discusses. Each tool in the ecosystem supports a specific stage of the machine learning process:
- TensorFlow Data (Data Engineering): Supports efficient data preprocessing and input pipelines.
- TensorFlow Core (Model Development): Central to model creation and training.
- TensorFlow Lite (Optimization): Enables model optimization for various deployment scenarios, especially critical for edge devices.
- TensorFlow Serving (Deployment): Facilitates smooth model deployment in production environments.
- TensorFlow Extended (Monitoring and Maintenance): Offers comprehensive tools for ongoing system health and performance.
Note that MLSysBook.AI does not explicitly teach or focus on TensorFlow-specific concepts or implementations. The book’s primary goal is to explore fundamental ML system engineering principles. The connections drawn in this blog post to the TensorFlow ecosystem are simply intended to illustrate how these core concepts align with tools and practices used by industry practitioners, providing a bridge between theoretical understanding and real-world application.
Support ML Systems Education: Every Star Counts 🌟
If you find this blog post valuable and want to improve ML systems engineering education, please consider giving the MLSysBook.ai GitHub repository a star ⭐.
Thanks to our sponsors, each ⭐ added to the MLSysBook.ai GitHub repository translates to donations supporting students and minorities globally by funding their research scholarships, empowering them to drive innovation in machine learning systems research worldwide.
Every star counts—help us reach the generous funding cap!
Conclusion
The gap between ML modeling and system engineering is closing, and understanding both aspects is important for creating impactful AI solutions. By embracing ML system engineering principles and leveraging powerful tools like those in the TensorFlow ecosystem, we can go beyond building models to creating complete, optimized, and scalable ML systems.
As AI continues to evolve, the demand for professionals who can bridge the gap between ML algorithms and systems implementation will only grow. Whether you’re a seasoned practitioner or just starting your ML journey, investing time in understanding ML systems engineering will undoubtedly pay dividends in your career and the impact of your work. If you’d like to learn more, listen to our MLSysBook.AI podcast, generated by Google’s NotebookLM.
Remember, even the most brilliant astronauts need skilled engineers to build their rockets!
Acknowledgments
We thank Josh Gordon for his suggestion to write this blog post and for encouraging and sharing ideas on how the book could be a useful resource for the TensorFlow community.
Ideas: The journey to DNA data storage
Behind every emerging technology is a great idea propelling it forward. In the Microsoft Research Podcast series Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets.
Accommodating the increasing amounts of digital data the world is producing requires out-of-the-box thinking. In this episode, guest host Karin Strauss, an innovation strategist and senior principal research manager at Microsoft, brings together members of her team to explore a more sustainable, more cost-effective alternative for archival data storage: synthetic DNA. Strauss, Principal Researcher Bichlien Nguyen, Senior Researcher Jake Smith, and Partner Research Manager Sergey Yekhanin discuss how Microsoft Research’s contributions have helped bring “science fiction,” as Strauss describes it, closer to reality, including its role in establishing the DNA Data Storage Alliance to foster collaboration in developing the technology and to establish specifications for interoperability. They also talk about the scope of collaboration with other fields, such as the life sciences and electrical and mechanical engineering, and the coding theory behind the project, including the group’s most powerful algorithm for DNA error correction, Trellis BMA, which is now open source.
Learn more:
- Trellis BMA: coded trace reconstruction on IDS channels for DNA storage
  Publication, July 2021
- Evaluating the risk of data loss due to particle radiation damage in a DNA data storage system | Nature Communications
  Publication, September 2024
- DNA Data Storage Alliance
  Alliance homepage
- Architecting Datacenters for Sustainability: Greener Data Storage using Synthetic DNA
  Publication, September 2021
- Microsoft and UW demonstrate first fully automated DNA data storage
  Video, March 2019
- Storing digital data in synthetic DNA with Dr. Karin Strauss
  Microsoft Research Podcast, October 2018
- Microsoft and University of Washington DNA Storage Research Project
  Video, July 2016
Transcript
[TEASER] [MUSIC PLAYS UNDER DIALOGUE]
JAKE SMITH: This really starts from the fundamental data production–data storage gap, where we produce way more data nowadays than we could ever have imagined years ago. And it’s more than we can practically store in magnetic media. And so we really need a denser medium on the other side to contain that. DNA is extremely dense. It holds far, far more information per unit volume, per unit mass than any storage media that we have available today. This, along with the fact that DNA is itself a relatively rugged molecule—it lives in our body; it lives outside our body for thousands and thousands of years if we, you know, leave it alone to do its thing—makes it a very attractive media.
BICHLIEN NGUYEN: It’s such a futuristic technology, right? When you begin to work on the tech, you realize how many disciplines and domains you actually have to reach in and leverage. It’s really interesting, this multidisciplinarity, because we’re, in a way, bridging software with wetware with hardware. And so you, kind of, need all the different disciplines to actually get you to where you need to go.
SERGEY YEKHANIN: We all work for Microsoft; we are all Microsoft researchers. Microsoft isn’t a startup. But that team, the team that drove the DNA Data Storage Project, it did feel like a startup, and it was something unusual and exciting for me.
SERIES INTRO: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.
[MUSIC FADES]
GUEST HOST KARIN STRAUSS: I’m your guest host Karin Strauss, a senior principal research manager at Microsoft. For nearly a decade, my colleagues and I—along with a fantastic and talented group of collaborators from academia and industry—have been working together to help close the data creation–data storage gap. We’re producing far more digital information than we can possibly store. One solution we’ve explored uses synthetic DNA as a medium, and over the years, we’ve contributed to steady and promising progress in the area. We’ve helped push the boundaries of how much a DNA writer can simultaneously store, shown that full automation is possible, and helped create an ecosystem for the commercial success of DNA data storage. And just this week, we’ve made one of our most advanced tools for encoding and decoding data in DNA open source. Joining me today to discuss the state of DNA data storage and some of our contributions are several members of the DNA Data Storage Project at Microsoft Research: Principal Researcher Bichlien Nguyen, Senior Researcher Jake Smith, and Partner Research Manager Sergey Yekhanin. Bichlien, Jake, and Sergey, welcome to the podcast.
BICHLIEN NGUYEN: Thanks for having us, Karin.
SERGEY YEKHANIN: Thank you so much.
JAKE SMITH: Yes, thank you.
STRAUSS: So before getting into the details of DNA data storage and our work, I’d like to talk about the big idea behind the work and how we all got here. I’ve often described the DNA Data Storage Project as turning science fiction into reality. When we started the project in 2015, though, the idea of using DNA for archival storage was already out there and had been for over five decades. Still, when I talked about the work in the area, people were pretty skeptical in the beginning, and I heard things like, “Wow, why are you thinking about that? It’s so far off.” So, first, please share a bit of your research backgrounds and then how you came to work on this project. Where did you first encounter this idea, what do you remember about your initial impressions—or the impressions of others—and what made you want to get involved? Sergey, why don’t you start.
YEKHANIN: Thanks so much. So I’m a coding theorist by training, so, like, my core areas of research have been error-correcting codes and also computational complexity theory. So I joined the project probably, like, within half a year of the time that it was born, and thanks, Karin, for inviting me to join. So, like, that was roughly the time when I moved from a different lab, from the Silicon Valley lab in California to the Redmond lab, and actually, it just so happened that at that moment, I was thinking about what to do next. Like, in California, I was mostly working on coding for distributed storage, and when I joined here, that effort kept going. But I had some free cycles, and that was the moment when Karin came just to my office and told me about the project. So, indeed, initially, it did feel a lot like science fiction. Because, I mean, we are used to coding for digital storage media, like for magnetic storage media, and here, like, this is biology, and, like, why exactly these kind of molecules? There are so many different molecules. Like, why that? But honestly, like, I didn’t try to pretend to be a biologist and make conclusions about whether this is the right medium or the wrong medium. So I tried to look into these kinds of questions from a technical standpoint, and there was a lot of, kind of, deep, interesting coding questions, and that was the main attraction for me. At the same time, I wasn’t convinced that we will get as far as we actually got, and I wasn’t immediately convinced about the future of the field, but, kind of, just the depth and the richness of the, what I’ll call, technical problems, that’s what made it appealing for me, and I, kind of, enthusiastically joined. And also, I guess, the culture of the team. So, like, it did feel like a startup. Like, we all work for Microsoft; we’re all Microsoft researchers. Microsoft isn’t a startup. But that team, the team that drove the DNA Data Storage Project, it did feel like a startup, and it was something unusual and exciting for me.
NGUYEN: Oh, I love that, Sergey. So my background is in organic chemistry, and Karin had reached out to me, and I interviewed not knowing what Karin wanted. Actually … so I took the job kind of blind because I was like, “Hmm, Microsoft Research? … DNA biotech? …” I was very, very curious, and then when she told me that this project was about DNA data storage, I was like, this is a crazy, crazy idea. I definitely was not sold on it, but I was like, well, look, I get to meet and work with so many interesting people from different backgrounds that, one, even if it doesn’t work out, I’m going to learn something, and, two, I think it could work, like it could work. And so I think that’s really what motivated me to join.
SMITH: The first thing that you think when you hear about we’re going to take what is our hard drive and we’re going to turn that into DNA is that this is nuts. But, you know, it didn’t take very long after that. I come from a chemistry, biotech-type background where I’ve been working on designing drugs, and there, DNA is this thing off in the nethers, you know. You look at it every now and then to see what information it can tell you about, you know, what maybe your drug might be hitting on the target side, and it’s, you know, that connection—that the DNA contains the information in the living systems, the DNA contains the information in our assays, and why could the DNA not contain the information that we, you know, think more about every day, that information that lives in our computers—as an extremely cool idea.
STRAUSS: Through our work, we’ve had years to wrap our heads around DNA data storage. But, Jake, could you tell us a little bit about how DNA data storage works and why we’re interested in looking into the technology?
SMITH: So you mentioned it earlier, Karin, that this really starts from the fundamental data production–data storage gap, where we produce way more data nowadays than we could ever have imagined years ago. And it’s more than we can practically store in magnetic media. This is a problem because, you know, we have data; we have recognized the value of data with the rise of large language models and these other big generative models. The data that we do produce, our video has gone from, you know, substantially small, down at 480 resolution, all the way up to things at 8K resolution that now take orders of magnitude more storage. And so we really need a denser medium on the other side to contain that. DNA is extremely dense. It holds far, far more information per unit volume, per unit mass than any storage media that we have available today. This, along with the fact that DNA is itself a relatively rugged molecule—it lives in our body; it lives outside our body for thousands and thousands of years if we, you know, leave it alone to do its thing—makes it a very attractive media, particularly compared to the traditional magnetic media, which has lower density and a much shorter lifetime on the, you know, scale of decades at most.
So how does DNA data storage actually work? Well, at a very high level, we start out in the digital domain, where we have our information represented as ones and zeros, and we need to convert that into a series of A’s, C’s, T’s, and G’s that we could then actually produce, and this is really the domain of Sergey. He’ll tell us much more about how this works later on. For now, let’s just assume we’ve done this. And now our information, you know, lives in the DNA base domain. It’s still in the digital world. It’s just represented as A’s, C’s, T’s, and G’s, and we now need to make this physical so that we can store it. This is accomplished through large-scale DNA synthesis. Once the DNA has been synthesized with the sequences that we specified, we need to store it. There’s a lot of ways we can think about storing it. Bichlien’s done great work looking at DNA encapsulation, as well as, you know, other more raw just DNA-on-glass-type techniques. And we’ve done some work looking at the susceptibility of DNA stored in this unencapsulated form to things like atmospheric humidity, to temperature changes and, most excitingly, to things like neutron radiation. So we’ve stored our data in this physical form, we’ve archived it, and coming back to it, likely many years in the future because the properties of DNA match up very well with archival storage, we need to convert it back into the digital domain. And this is done through a technique called DNA sequencing. What this does is it puts the molecules through some sort of machine, and on the other side of the machine, we get out, you know, a noisy representation of what the actual sequence of bases in the molecules were. We have one final step. We need to take this series of noisy sequences and convert it back into ones and zeros. Once we do this, we return to our original data and we’ve completed, let’s call it, one DNA data storage cycle.
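As a rough illustration of the digital-to-base conversion step described above, the sketch below maps every two bits to one base and back again. The 2-bits-per-base table and the function names `encode` and `decode` are assumptions made for illustration only. They are not the team's encoding, which also has to add redundancy so that errors can be corrected.

```python
# Hypothetical mapping: two bits per base. A practical encoding would also
# add error-correcting redundancy, which this sketch leaves out.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Convert bytes into a string of bases (digital domain -> base domain)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Convert an error-free string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"hello")
print(strand)          # CGGACGCCCGTACGTACGTT
print(decode(strand))  # b'hello'
```

With this mapping, each byte becomes four bases, so the five bytes of "hello" turn into a 20-base sequence. The round trip only works on an error-free read; the noisy reads that come out of sequencing need the error-correction step discussed later in the conversation.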
STRAUSS: We’ll get into this in more detail later, but maybe, Sergey, we dig a little bit on encoding-decoding end of things and how DNA is different as a medium from other types of media.
YEKHANIN: Sure. So, like, I mean, coding is an important aspect of this whole idea of DNA data storage because we have to deal with errors—it’s a new medium—but talking about error-correcting codes in the context of DNA data storage, so, I mean, usually, like … what are error-correcting codes about? Like, on the very high level, right, I mean, you have some data—think of it as a binary string—you want to store it, but there are errors. So usually, like, in most, kind of, forms of media, the errors are bit flips. Like, you store a 0; you get a 1. Or you store a 1; you get a 0. So these are called substitution errors. The field of error-correcting codes, it started, like, in the 1950s, so, like, it’s 70 years old at least. So we, kind of, we understand how to deal with this kind of error reasonably well, so with substitution errors. In DNA data storage, the way you store your data is that given, like, some large amount of digital data, you have the freedom of choosing which short DNA molecules to generate. So in a DNA molecule, it’s a sequence of the bases A, G, C, and T, and you have the freedom to decide, like, which of the short molecules you need to generate, and then those molecules get stored, and then during the storage, some of them are lost; some of them can be damaged. There can be insertions and deletions of bases on every molecule. Like, we call them strands. So you need redundancy, and there are two forms of redundancy. There’s redundancy that goes across strands, and there is redundancy on the strand. And so, yeah, so, kind of, from the error-correcting side of things, like, we get to decide what kind of redundancy we want to introduce—across strands, on the strand—and then, like, we want to make sure that our encoding and decoding algorithms are efficient. So that’s the coding theory angle on the field.
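To make the idea of redundancy across strands concrete, here is a minimal sketch that adds one XOR parity chunk to a group of equal-length data chunks, so that any single lost chunk can be rebuilt. The single-parity scheme and the names `xor_parity` and `recover_missing` are assumptions chosen for brevity. The conversation does not say which code the team uses across strands, and a practical system would need a stronger code that tolerates more than one loss.

```python
def xor_parity(chunks):
    """Byte-wise XOR of equal-length chunks; serves as one extra parity chunk."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(surviving_chunks, parity):
    """Rebuild the one missing chunk from the survivors plus the parity chunk."""
    return xor_parity(list(surviving_chunks) + [parity])


# Each chunk stands for the payload of one strand (hypothetical data).
chunks = [b"DNA_", b"data", b"stor"]
parity = xor_parity(chunks)

# Suppose the strand carrying chunks[1] is lost during storage.
print(recover_missing([chunks[0], chunks[2]], parity))  # b'data'
```

Redundancy on the strand works differently: extra symbols are added inside each strand so that insertions, deletions, and substitutions within that strand can be corrected.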
NGUYEN: Yeah, and then, you know, from there, once you have that data encoded into DNA, the question is how do you make that data on a scale that’s compatible with digital data storage? And so that’s where a lot of the work came in for really automating the synthesis process and also the reading process, as well. So synthesis is what we consider the writing process of DNA data storage. And so, you know, we came up with some unique ideas there. We made a chip that enabled us to get to the densities that we needed. And then on the reading side, we used different sequencing technologies. And it was great to see that we could actually just, kind of, pull sequencing technologies off the shelf because people are so interested in reading biological DNA. So we explored the Illumina technologies and also Oxford Nanopore, which is a new technology coming on the horizon. And then preservation, too, because we have to make sure that the data that’s stored in the DNA doesn’t get damaged and that we can recover it using the error-correcting codes.
STRAUSS: Yeah, absolutely. And it’s clear that—and it’s also been our experience that—DNA data storage and projects like this require more than just a team of computer scientists. Bichlien, you’ve had the opportunity to collaborate with many people in all different disciplines. So do you want to talk a little bit about that? What kind of expertise, you know, other disciplines that are relevant to bringing DNA data storage to reality?
NGUYEN: Yeah, well, it’s such a futuristic technology, right? When you begin to work on the tech, you realize how many disciplines and domains you actually have to reach in and leverage. One concrete example is that in order to fabricate an electronic chip to synthesize DNA, we really had to pull in a lot of material science research because there’s different capabilities that are needed when trying to use liquid on a chip. We, you know, have to think about DNA data storage itself. And that’s a very different beast than, you know, the traditional storage mediums. And so we worked with teams who literally create, you know, these little tiny micro- or nanocapsules in glass and being able to store that there. It’s really interesting, this multidisciplinarity, because we’re, in a way, bridging software with wetware with hardware. And so you, kind of, need all the different disciplines to actually get you to where you need to go.
STRAUSS: Yeah, absolutely. And, you know, building on, you know, collaborators, I think one area that was super interesting, as well, and was pretty early on in the project was building that first end-to-end system that we collaborated with University of Washington, the Molecular Information Systems Lab there, to build. And really, at that point, you know, there had been work suggesting that DNA data storage was viable, but nobody had really shown an end-to-end system, from beginning to end, and in fact, my manager at the time, Doug Carmean, used to call it the “bubble gum and shoestring” system. But it was a crucial first step because it shows it was possible to really fully automate the process. And there have been several interesting challenges there in the system, but we noticed that one particularly challenging one was synthesis. That first system that we built was capable of storing the word “hello,” and that was all we could store. So it wasn’t a very high-capacity system. But in order to be able to store a lot more volumes of data instead of a simple word, we really needed much more advanced synthesis systems. And this is what both Bichlien and Jake ended up working on, so do you want to talk a little bit about that and the importance of that particular work?
SMITH: Yeah, absolutely. As you said, Karin, the amount of DNA that is required to store the massive amount of data we spoke about earlier is far beyond the amount of DNA that’s needed for any, air quotes, traditional applications of synthetic DNA, whether it’s your gene construction or it’s your primer synthesis or such. And so we really had to rethink how you make DNA at scale and think about how could this actually scale to meet the demand. And so Bichlien started out looking at a thing called a microelectrode array, where you have this big checkerboard of small individual reaction sites, and in each reaction site, we used electrochemistry in order to control base by base—A, C, T, or G by A, C, T, or G—the sequence that was growing at that particular reaction site. We got this down to the nanoscale. And so what this means practically is that on one of these chips, we could synthesize at any given time on the order of hundreds of millions of individual strands. So once we had the synthesis working with the traditional chemistry where you’re doing chemical synthesis—each base is added in using a mixture of chemicals that are added to the individual spots—they’re activated. But each coupling happens due to some energy you prestored in the synthesis of your reagents. And this makes the synthesis of those reagents costly and themselves a bottleneck. 
And so taking, you know, a look forward at what else was happening in the synthetic biology world, the, you know, next big word in DNA synthesis was and still is enzymatic synthesis, where rather than having to, you know, spend a lot of energy to chemically pre-activate reagents that will go in to make your actual DNA strands, we capitalize on nature’s synthetic robots—enzymes—to start with less-activated, less-expensive-to-get-to, cheaply-produced-through-natural-processes substrates, and we use the enzymes themselves, toggling their activity over each of the individual chips, or each of the individual spots on our checkerboard, to construct DNA strands. And so we got a little bit into this project. You know, we successfully showed that we could put down selectively one base at a given time. We hope that others will, kind of, take up the work that we’ve put out there, you know, particularly our wonderful collaborators at Ansa who helped us design the enzymatic system. And one day we will see, you know, a truly parallelized, in this fashion, enzymatic DNA system that can achieve the scales necessary.
NGUYEN: It’s interesting to note that even though it’s DNA and we’re still storing data in these DNA strands, chemical synthesis and enzymatic synthesis provide different errors that you see in the actual files, right, in the DNA files. And so I know that we talked to Sergey about how do we deal with these new types of errors and also the new capabilities that you can have, for example, if you don’t control base by base the DNA synthesis.
YEKHANIN: This whole field of DNA data storage, like, the technologies on the biology side are advancing rapidly, right. And there are different approaches to synthesis. There are different approaches to sequencing. And, presumably, the way the storage is actually done, like, is also progressing, right, and we had works on that. So there is, kind of, this very general, kind of, high-level error profile that you can say that these are the type of errors that you encounter in DNA data storage. Like, in DNA molecules—just the sequence of these bases, A, G, C, T, in maybe a length of, like, 200 or so and you store a very, very large number of them—the errors that you see is that some of these strands, kind of, will disappear. Some of these strings can be torn apart like, let’s say, in two pieces, maybe even more. And then on every strand, you also encounter these errors—insertions, deletions, substitutions—with different rates. Like, the likelihood of all kinds of these errors may differ very significantly across different technologies that you use on the biology side. And also there can be error bursts somehow. Maybe you can get an insertion of, I don’t know, 10 A’s, like, in a row, or you can lose, like, you know, 10 bases in a row. So if you don’t, kind of, quantify, like, what are the likelihoods of all these bad events happening, then I think this still, kind of, fits at least the majority of approaches to DNA data storage, maybe not exactly all of them, but it fits the majority. So when we design coding schemes, we are trying also, kind of, to look ahead in the sense that, like, we don’t know, like, in five years, like, how will these error profiles, how will it look like. 
So the technologies that we develop on the error-correction side, we try to keep them very flexible, so whether it’s enzymatic synthesis, whether it’s Nanopore technology, whether it’s Illumina technology that is being used, the error-correction algorithms would be able to adapt and would still be useful. But, I mean, this makes also coding aspect harder because, [LAUGHTER] kind of, you want to keep all this flexibility in mind.
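The high-level error profile described above (lost strands, plus insertions, deletions, and substitutions at technology-dependent rates) can be sketched as a small channel simulator. Everything specific here is an assumption for illustration: the function names `corrupt` and `read_pool`, the default rates, and the choice to treat errors as independent per base. Strand breaks and error bursts, which are also mentioned, are left out to keep the sketch short.

```python
import random

BASES = "ACGT"


def corrupt(strand, p_sub=0.01, p_ins=0.01, p_del=0.01, rng=None):
    """Return one noisy read of `strand` with independent per-base errors."""
    rng = rng or random.Random()
    read = []
    for base in strand:
        if rng.random() < p_ins:      # insertion of a random base before this one
            read.append(rng.choice(BASES))
        roll = rng.random()
        if roll < p_del:              # deletion: the base is skipped
            continue
        if roll < p_del + p_sub:      # substitution: replaced by a different base
            read.append(rng.choice([b for b in BASES if b != base]))
        else:                         # base is read correctly
            read.append(base)
    return "".join(read)


def read_pool(strands, reads_per_strand=3, p_loss=0.05, rng=None, **rates):
    """Drop whole strands with probability p_loss, then read survivors noisily."""
    rng = rng or random.Random()
    reads = []
    for strand in strands:
        if rng.random() < p_loss:     # the whole strand disappears
            continue
        reads.extend(corrupt(strand, rng=rng, **rates)
                     for _ in range(reads_per_strand))
    return reads


rng = random.Random(0)  # fixed seed so a run is repeatable
for read in read_pool(["ACGTACGTACGT", "TTGGCCAATTGG"], p_del=0.1, rng=rng):
    print(read)
```

Keeping the rates as parameters mirrors the flexibility point made above: the same decoder can be tested against different synthesis and sequencing technologies by changing only the numbers passed in.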
STRAUSS: So, Sergey, we are at an interesting moment now because you’re open sourcing the Trellis BMA piece of code, right, that you published a few years ago. Can you talk a little bit about that specific problem of trace reconstruction and then the paper specifically and how it solves it?
YEKHANIN: Absolutely, yeah, so this Trellis BMA paper for that we are releasing the source code right now, this is, kind of, this is the latest in our sequence of publications on error-correction for DNA data storage. And I should say that, like, we already discussed that the project is, kind of, very interdisciplinary. So, like, we have experts from all kinds of fields. But really even within, like, within this coding theory, like, within computer science/information theory, coding theory, in our algorithms, we use ideas from very different branches. I mean, there are some core ideas from, like, core algorithm space, and I won’t go into these, but let me just focus, kind of, on two aspects. So when we just faced this problem of coding for DNA data storage and we were thinking about, OK, so how to exactly design the coding scheme and what are the algorithms that we’ll be using for error correction, so, I mean, we’re always studying the literature, and we came up on this problem called trace reconstruction that was pretty popular—I mean, somewhat popular, I would say—in computer science and in statistics. It didn’t have much motivation, but very strong mathematicians had been looking at it. And the problem is as follows. So, like, there is a long binary string picked at random, and then it’s transmitted over a deletion channel, so some bits—some zeros and some ones—at certain coordinates get deleted and you get to see, kind of, the shortened version of the string. But you get to see it multiple times. And the question is, like, how many times do you need to see it so that you can get a reasonably accurate estimate of the original string that was transmitted? So that was called trace reconstruction, and we took a lot of motivation—we took a lot of inspiration—from the problem, I would say, because really, in DNA data storage, if we think about a single strand, like, a single strand which is being stored, after we read it, we usually get multiple reads of this string. 
And, well, the errors there are not just deletions. There are insertions, substitutions, and, like, inversive errors, but still we could rely on this literature in computer science that already had some ideas. So there was an algorithm called BMA, Bitwise Majority Alignment. We extended it—we adopted it, kind of, for the needs of DNA data storage—and it became, kind of, one of the tools in our toolbox for error correction.
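The trace reconstruction setting above can be illustrated with a simplified, deletion-only form of Bitwise Majority Alignment. Each trace keeps a pointer; at every output position the symbols under the pointers vote, the majority symbol is emitted, and only the traces that agreed advance. This is a sketch under stated assumptions, not the team's adapted algorithm and not Trellis BMA: it assumes the original length is known, it ignores insertions and substitutions, and the function name `bma_reconstruct` is made up for this example.

```python
from collections import Counter


def bma_reconstruct(traces, length):
    """Estimate the original sequence from traces that suffered only deletions."""
    pointers = [0] * len(traces)
    estimate = []
    for _ in range(length):
        # Vote among the symbols currently under each trace's pointer.
        votes = Counter(trace[p] for trace, p in zip(traces, pointers)
                        if p < len(trace))
        if not votes:                 # every trace is exhausted
            break
        winner = votes.most_common(1)[0][0]
        estimate.append(winner)
        # Traces that disagree are assumed to have lost this symbol,
        # so only the agreeing traces move forward.
        for i, trace in enumerate(traces):
            if pointers[i] < len(trace) and trace[pointers[i]] == winner:
                pointers[i] += 1
    return "".join(estimate)


# Three reads of "ACGTACGT", each missing one base at a different position.
traces = ["AGTACGT", "ACGTCGT", "ACGTACT"]
print(bma_reconstruct(traces, length=8))  # ACGTACGT
```

In this example each read has lost a different base, so at every position at least two of the three reads agree and the original sequence is recovered. With higher deletion rates or long runs of the same base, this simple voting can drift out of alignment, which is part of why stronger methods are needed.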
So we also started to use ideas from literature on electrical engineering, what are called convolutional error-correcting codes and a certain, kind of, class of algorithms for decoding errors in these convolutional error-correcting codes called, like, I mean, Trellis is the main data structure, like, Trellis-based algorithms for decoding convolutional codes, like, Viterbi algorithm or BCJR algorithm. Convolutional codes allow you to introduce redundancy on the string. So, like, with algorithms kind of similar to BMA, like, they were good for doing error correction when there was no redundancy on the strand itself. Like, when there is redundancy on the strand, kind of, we could do some things, but really it was very limited. With Trellis-based approaches, like, again inspired by the literature in electrical engineering, we had an approach to introduce redundancy on the strand, so that allowed us to have more powerful error-correction algorithms. And then in the end, we have this algorithm, which we call Trellis BMA, which, kind of, combines ideas from both fields. So it’s based on Trellis, but it’s also more efficient than standard Trellis-based algorithms because it uses ideas from BMA from computer science literature. So this is, kind of, this is a mix of these two approaches. And, yeah, that’s the paper that we wrote about three years ago. And now we’re open sourcing it. So it is the most powerful algorithm for DNA error correction that we developed in the group. We’re really happy that now we are making it publicly available so that anybody can experiment with the source code. Because, again, the field has expanded a lot, and now there are multiple groups around the globe that work just specifically on error correction apart from all other aspects, so, yeah, so we are really happy that it’s become publicly available to hopefully further advance the field.
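For readers unfamiliar with the electrical engineering ingredients mentioned above, the following sketch shows a textbook rate-1/2 convolutional encoder and a hard-decision Viterbi decoder that walks the trellis of encoder states. It is standard Viterbi decoding on a binary sequence with substitution errors only, not Trellis BMA, which handles insertions and deletions across multiple reads. The generator choice (7, 5 in octal) and the function names are assumptions picked because they make a compact example.

```python
def conv_encode(bits):
    """Rate-1/2 convolutional encoder with generators 7 and 5 (octal)."""
    s1 = s2 = 0                       # the two most recent input bits
    out = []
    for b in bits:
        out += [b ^ s1 ^ s2, b ^ s2]  # two output bits per input bit
        s1, s2 = b, s1
    return out


def viterbi_decode(received):
    """Hard-decision Viterbi: find the input whose encoding is closest."""
    cost = {(0, 0): 0}                # encoder starts in the all-zero state
    paths = {(0, 0): []}
    for t in range(len(received) // 2):
        r1, r2 = received[2 * t], received[2 * t + 1]
        new_cost, new_paths = {}, {}
        for (s1, s2), c in cost.items():
            for b in (0, 1):
                o1, o2 = b ^ s1 ^ s2, b ^ s2
                candidate = c + (o1 != r1) + (o2 != r2)  # Hamming distance so far
                state = (b, s1)
                if candidate < new_cost.get(state, float("inf")):
                    new_cost[state] = candidate          # keep the best survivor
                    new_paths[state] = paths[(s1, s2)] + [b]
        cost, paths = new_cost, new_paths
    best_state = min(cost, key=cost.get)
    return paths[best_state]


message = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]   # last two zeros act as tail bits
coded = conv_encode(message)
coded[3] ^= 1                               # one substitution error in storage
print(viterbi_decode(coded) == message)     # True
```

The encoder doubles the length of the sequence, and that on-strand redundancy is what lets the decoder pick the correct path through the trellis despite the flipped bit.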
STRAUSS: Yeah, absolutely, and I’m always amazed by, you know, how, it is really about building on other people’s work. Jake and Bichlien, you recently published a paper in Nature Communications. Can you tell us a little bit about what it was, what you exposed the DNA to, and what it was specifically about?
NGUYEN: Yeah. So that paper was on the effects of neutron radiation on DNA data storage. So, you know, when we started the DNA Data Storage Project, it was really a comparison, right, between the different storage medias that exist today. And one of the issues that have come up through the years of development of those technologies was, you know, hard errors and soft errors that were induced by radiation. So we wanted to know, does that maybe happen in DNA? We know that DNA, in humans at least, is affected by radiation from cosmic rays. And so that was really the motivation for this type of experiment. So what we did was we essentially took our DNA files and dried them and threw them in a neutron accelerator, which was fantastic. It was so exciting. That’s, kind of, the merge of, you know, sci fi with sci fi at the same time. [LAUGHS] It was fantastic. And we irradiated for over 80 million years—
STRAUSS: The equivalent of …
NGUYEN: The equivalent of 80 million years.
STRAUSS: Yes, because it’s a lot of radiation all at the same time, …
NGUYEN: It’s a lot of radiation …
STRAUSS: … and it’s accelerated radiation exposure?
NGUYEN: Yeah, I would say it’s accelerated aging with radiation. It’s an insane amount of radiation. And it was surprising that even though we irradiated our DNA files with that much radiation, there wasn’t that much damage. And that’s surprising because, you know, we know that humans, if we were to be irradiated like that, it would be disastrous. But in, you know, DNA, our files were able to be recovered with zero bit errors.
STRAUSS: And why that difference?
NGUYEN: Well, we think there’s a few reasons. One is that when you look at the interaction between a neutron and the actual elemental composition of DNA—which is basically carbons, oxygens, and hydrogens, maybe a phosphorus—the neutrons don’t interact with the DNA much. And if it did interact, we would have, for example, a strand break, which based on the error-correcting codes, we can recover from. So essentially, there’s not much … one, there’s not much interaction between neutrons and DNA, and second, we have error-correcting codes that would prevent any data loss.
STRAUSS: Awesome, so yeah, this is another milestone that contributes towards the technology becoming a reality. There are also other conditions that are needed for technology to be brought to the market. And one thing I’ve worked on is to, you know, create the DNA Data Storage Alliance; this is something Microsoft co-founded with Illumina, Twist Bioscience, and Western Digital. And the goal there was to essentially provide the right conditions for the technology to thrive commercially. We did bring together multiple universities and companies that were interested in the technology. And one thing that we’ve seen with storage technologies that’s been pretty important is standardization and making sure that the technology’s interoperable. And, you know, we’ve seen stalemate situations like Blu-ray and high-definition DVD, where, you know, really we couldn’t decide on a standard, and the technology, it took a while for the technology to be picked up, and the intent of the DNA Data Storage [Alliance] is to provide an ecosystem of companies, universities, groups interested in making sure that this time, it’s an interoperable technology from the get-go, and that increases the chances of commercial adoption. As a group, we often talk about how amazing it is to work for a company that empowers us to do this kind of research. And for me, one of Microsoft Research’s unique strengths, particularly in this project, is the opportunity to work with such a diverse set of collaborators on such a multidisciplinary project like we have. How do you all think where you’ve done this work has impacted how you’ve gone about it and the contributions you’ve been able to make?
NGUYEN: I’m going to start with if we look around this table and we see who’s sitting at it, which is two chemists, a computer architect, and a coding theorist, and we come together and we’re like, what can we make that would be super, super impactful? I think that’s the answer right there, is that being at Microsoft and being in a culture that really fosters this type of interdisciplinary collaboration is the key to getting a project like this off the ground.
SMITH: Yeah, absolutely. And we should acknowledge the gigantic contributions made by our collaborators at the University of Washington. Many of them would fall in not any of these three categories. They’re electrical engineers, they’re mechanical engineers, they’re pure biologists that we worked with. And each of them brought their own perspective, and particularly when you talk about going to a true end-to-end system, those perspectives were invaluable as we were trying to fit all the puzzle pieces together.
STRAUSS: Yeah, absolutely. We’ve had great collaborations over time—University of Washington, ETH Zürich, Los Alamos National Lab, ChipIr, Twist Bioscience, Ansa Biotechnologies. Yeah, it’s been really great and a great set of different disciplines, all the way from coding theorists to the molecular biology and chemistry, electrical and mechanical engineering. One of the great things about research is there’s never a shortage of interesting questions to pursue, and for us, this particular work has opened the door to research in adjacent domains, including sustainability fields. DNA data storage requires small amounts of materials to accommodate the large amounts of data, and early on, we wanted to understand if DNA data storage was, as it seemed, a more sustainable way to store information. And we learned a lot. Bichlien and Jake, you had experience in green chemistry when you came to Microsoft. What new findings did we make, and what sustainability benefits do we get with DNA data storage? And, finally, what new sustainability work has the project led to?
NGUYEN: As a part of this project, if we’re going to bring new technologies to the forefront, you know, to the world, we should make sure that they have a lower carbon footprint, for example, than previous technologies. And so we ran a life cycle assessment—which is a way to systematically evaluate the environmental impacts of anything of interest—and we did this on DNA data storage and compared it to electronic storage medium[1], and we noticed that if we were able to store all of our digital information in DNA, that we would have benefits associated with carbon emissions. We would be able to reduce that because we don’t need as much infrastructure compared to the traditional storage methods. And there would be an energy reduction, as well, because this is a passive way of archival data storage. So that was, you know, the main takeaways that we had. But that also, kind of, led us to think about other technologies that would be beneficial beyond data storage and how we could use the same kind of life cycle thinking towards that.
SMITH: This design approach that you’ve, you know, talked about us stumbling on, not inventing but seeing other people doing in the literature and trying to implement ourselves on the DNA Data Storage Project, you know, is something that can be much bigger than any single material. And where we think there’s a, you know, chance for folks like ourselves at Microsoft Research to make a real impact on this sustainability-focused design is through the application of machine learning, artificial intelligence—the new tools that will allow us to look at much bigger design spaces than we could previously to evaluate sustainability metrics that were not possible when everything was done manually and to ultimately, you know, at the end of the day, take a sustainability-first look at what a material should be composed of. And so we’ve tried to prototype this with a few projects. We had another wonderful collaboration with the University of Washington where we looked at recyclable circuit boards and a novel material called a vitrimer that it could possibly be made out of[2]. We’ve had another great collaboration with the University of Michigan, where we’ve looked at the design of charge-carrying molecules in these things called flow batteries that have good potential for energy smoothing in, you know, renewables production, trying to get us out of that day-night, boom-bust cycle[3]. And we had one more project, you know, this time with collaborators at the University of Berkeley, where we looked at, you know, design of a class of materials called a metal organic framework, which have great promise in low-energy-cost gas separation, such as pulling CO2 out of the, you know, plume of a smokestack or, you know, ideally out of the air itself[4].
STRAUSS: For me, the DNA work has made me much more open to projects outside my own research area—as Bichlien mentioned, my core research area is computer architecture, but we’ve ventured in quite a bit of other areas here—and going way beyond my own comfort zone and really made me love interdisciplinary projects like this and try, really try, to do the most important work I can. And this is what attracted me to these other areas of environmental sustainability that Bichlien and Jake covered, where there’s absolutely no lack of problems. Like them, I’m super interested in using AI to solve many of them. So how do each of you think working on the DNA Data Storage Project has influenced your research approach more generally and how you think about research questions to pursue next?
YEKHANIN: It definitely expanded the horizons a lot, like, just, kind of, just having this interactions with people, kind of, whose core areas of research are so different from my own and also a lot of learning even within my own field that we had to do to, kind of, carry this project out. So, I mean, it was a great and rewarding experience.
NGUYEN: Yeah, for me, it’s kind of the opposite of Karin, right. I started as an organic chemist and then now really, one, appreciate the breadth and depth of going from a concept to a real end-to-end prototype and all the requirements that you need to get there. And then also, really the importance of having, you know, a background in computer science and really being able to understand the lingo that is used in multidisciplinary projects because you might say something and someone else interprets it very differently, and it’s because you’re not speaking the same language. And so that understanding that you have to really be … you have to learn a little bit of vocabulary from each person and understand how they contribute and then how your ideas can contribute to their ideas has been really impactful in my career here.
SMITH: Yeah, I think the key change in approach that I took away—and I think many of us took away from the DNA Data Storage Project—was rather than starting with an academic question, we started with a vision of what we wanted to happen, and then we derived the research questions from analyzing what would need to happen in the world—what are the bottlenecks that need to be solved in order for us to achieve, you know, that goal? And this is something that we’ve taken with us into the sustainability-focused research and, you know, something that I think will affect all the research I do going forward.
STRAUSS: Awesome. As we close, let’s reflect a bit on what a world in which DNA data storage is widely used might look like. If everything goes as planned, what do you hope the lasting impact of this work will be? Sergey, why don’t you lead us off.
YEKHANIN: Sure, I remember that, like, when … in the early days when I started working on this project actually, you, Karin, told me that you were taking an Uber ride somewhere and you were talking to the taxi driver, and the taxi driver—I don’t know if you remember that—but the taxi driver mentioned that he has a camera which is recording everything that’s happening in the car. And then you had a discussion with him about, like, how long does he keep the data, how long does he keep the videos. And he told you that he keeps it for about a couple of days because it’s too expensive. But otherwise, like, if it weren’t that expensive, he would keep it for much, much longer because, like, he wants to have these recordings if later somebody is upset about the ride and, I don’t know, he is getting sued or something. So this is, like, this is one small narrow application area where DNA data storage would clearly, kind of, if it happens, then it will solve it. Because then, kind of, this long-term archival storage will become very cheap, available to everybody; it would become a commodity basically. There are many things that will be enabled, like this helping the Uber drivers, for instance. But also one has to think of, of course, like, about, kind of, the broader implications so that we don’t get into something negative because again this power of recording everything and storing everything, it can also lead to some use cases that might be, kind of, morally wrong. So, again, hopefully by the time that we get to, like, really wide deployments of this technology, the regulation will also be catching up and the, like, we will have great use cases and we won’t have bad ones. I mean, that’s how I think of it. But definitely there are lots of, kind of, great scenarios that this can enable.
SMITH: Yeah. I’ll grab onto the word you use there, which is making DNA a commodity. And one of the things that I hope comes out of this project, you know, besides all the great benefits of DNA data storage itself is spillover benefits into the field of health—where if we make DNA synthesis at large scale truly a commodity thing, which I hope some of the work that we’ve done to really accelerate the throughput of synthesis will do—then this will open new doors in what we can do in terms of gene synthesis, in terms of, like, fundamental biotech research that will lead to that next set of drugs and, you know, give us medications or treatments that we could not have thought possible if we were not able to synthesize DNA and related molecules at that scale.
NGUYEN: So much information gets lost because of just time. And so I think being able to recover really ancient history that humans wrote in the future, I think, is something that I really hope could be achieved because we’re so information rich, but in the course of time, we become information poor, and so I would like for our future generations to be able to understand the life of, you know, an everyday 21st-century person.
STRAUSS: Well, Bichlien, Jake, Sergey, it’s been fun having this conversation with you today and collaborating with you in all of this amazing project [MUSIC] and all the research we’ve done together. Thank you so much.
YEKHANIN: Thank you, Karin.
SMITH: Thank you.
NGUYEN: Thanks.
[MUSIC FADES]
[1] The team presented the findings from their life cycle assessment of DNA data storage in the paper Architecting Datacenters for Sustainability: Greener Data Storage using Synthetic DNA.
[2] For more information, check out the podcast episode Collaborators: Sustainable electronics with Jake Smith and Aniruddh Vashisth and the paper Recyclable vitrimer-based printed circuit boards for sustainable electronics.
[3] For more information, check out the podcast episode Collaborators: Renewable energy storage with Bichlien Nguyen and David Kwabi.
[4] For more information, check out the paper MOFDiff: Coarse-grained Diffusion for Metal-Organic Framework Design.
NVIDIA and Microsoft Showcase Blackwell Preview, Omniverse Industrial AI and RTX AI PCs at Microsoft Ignite
NVIDIA and Microsoft today unveiled product integrations designed to advance full-stack NVIDIA AI development on Microsoft platforms and applications.
At Microsoft Ignite, Microsoft announced the launch of the first cloud private preview of the Azure ND GB200 V6 VM series, based on the NVIDIA Blackwell platform. The Azure ND GB200 V6 is a new AI-optimized virtual machine (VM) series that combines the NVIDIA GB200 NVL72 rack design with NVIDIA Quantum InfiniBand networking.
In addition, Microsoft revealed that Azure Container Apps now supports NVIDIA GPUs, enabling simplified and scalable AI deployment. Plus, the NVIDIA AI platform on Azure includes new reference workflows for industrial AI and an NVIDIA Omniverse Blueprint for creating immersive, AI-powered visuals.
At Ignite, NVIDIA also announced multimodal small language models (SLMs) for RTX AI PCs and workstations, enhancing digital human interactions and virtual assistants with greater realism.
NVIDIA Blackwell Powers Next-Gen AI on Microsoft Azure
Microsoft’s new Azure ND GB200 V6 VM series will harness the powerful performance of NVIDIA GB200 Grace Blackwell Superchips, coupled with advanced NVIDIA Quantum InfiniBand networking. This offering is optimized for large-scale deep learning workloads to accelerate breakthroughs in natural language processing, computer vision and more.
The Blackwell-based VM series complements previously announced Azure AI clusters with ND H200 V5 VMs, which provide increased high-bandwidth memory for improved AI inferencing. The ND H200 V5 VMs are already being used by OpenAI to enhance ChatGPT.
Azure Container Apps Enables Serverless AI Inference With NVIDIA Accelerated Computing
Serverless computing provides AI application developers increased agility to rapidly deploy, scale and iterate on applications without worrying about underlying infrastructure. This enables them to focus on optimizing models and improving functionality while minimizing operational overhead.
The Azure Container Apps serverless containers platform simplifies deploying and managing microservices-based applications by abstracting away the underlying infrastructure.
Azure Container Apps now supports NVIDIA-accelerated workloads with serverless GPUs, allowing developers to use the power of accelerated computing for real-time AI inference applications in a flexible, consumption-based, serverless environment. This capability simplifies AI deployments at scale while improving resource efficiency and application performance without the burden of infrastructure management.
Serverless GPUs allow development teams to focus more on innovation and less on infrastructure management. With per-second billing and scale-to-zero capabilities, customers pay only for the compute they use, helping ensure resource utilization is both economical and efficient. NVIDIA is also working with Microsoft to bring NVIDIA NIM microservices to serverless NVIDIA GPUs in Azure to optimize AI model performance.
NVIDIA Unveils Omniverse Reference Workflows for Advanced 3D Applications
NVIDIA announced reference workflows that help developers to build 3D simulation and digital twin applications on NVIDIA Omniverse and Universal Scene Description (OpenUSD) — accelerating industrial AI and advancing AI-driven creativity.
A reference workflow for 3D remote monitoring of industrial operations is coming soon to enable developers to connect physically accurate 3D models of industrial systems to real-time data from Azure IoT Operations and Power BI.
These two Microsoft services integrate with applications built on NVIDIA Omniverse and OpenUSD to provide solutions for industrial IoT use cases. This helps remote operations teams accelerate decision-making and optimize processes in production facilities.
The Omniverse Blueprint for precise visual generative AI enables developers to create applications that let nontechnical teams generate AI-enhanced visuals while preserving brand assets. The blueprint supports models like SDXL and Shutterstock Generative 3D to streamline the creation of on-brand, AI-generated images.
Leading creative groups, including Accenture Song, Collective, GRIP, Monks and WPP, have adopted this NVIDIA Omniverse Blueprint to personalize and customize imagery across markets.
Accelerating Gen AI for Windows With RTX AI PCs
NVIDIA’s collaboration with Microsoft extends to bringing AI capabilities to personal computing devices.
At Ignite, NVIDIA announced its new multimodal SLM, NVIDIA Nemovision-4B Instruct, for understanding visual imagery in the real world and on screen. It’s coming soon to RTX AI PCs and workstations and will pave the way for more sophisticated and lifelike digital human interactions.
Plus, updates to NVIDIA TensorRT Model Optimizer (ModelOpt) offer Windows developers a path to optimize a model for ONNX Runtime deployment. TensorRT ModelOpt enables developers to create AI models for PCs that are faster and more accurate when accelerated by RTX GPUs. This enables large models to fit within the constraints of PC environments, while making it easy for developers to deploy across the PC ecosystem with ONNX Runtime.
RTX AI-enabled PCs and workstations offer enhanced productivity tools, creative applications and immersive experiences powered by local AI processing.
Full-Stack Collaboration for AI Development
NVIDIA’s extensive ecosystem of partners and developers brings a wealth of AI and high-performance computing options to the Azure platform.
SoftServe, a global IT consulting and digital services provider, today announced the availability of SoftServe Gen AI Industrial Assistant, based on the NVIDIA AI Blueprint for multimodal PDF data extraction, on the Azure marketplace. The assistant addresses critical challenges in manufacturing by using AI to enhance equipment maintenance and improve worker productivity.
At Ignite, AT&T will showcase how it’s using NVIDIA AI and Azure to enhance operational efficiency, boost employee productivity and drive business growth through retrieval-augmented generation and autonomous assistants and agents.
Learn more about NVIDIA and Microsoft’s collaboration and sessions at Ignite.
Microsoft and NVIDIA Supercharge AI Development on RTX AI PCs
Generative AI-powered laptops and PCs are unlocking advancements in gaming, content creation, productivity and development. Today, over 600 Windows apps and games are already running AI locally on more than 100 million GeForce RTX AI PCs worldwide, delivering fast, reliable and low-latency performance.
At Microsoft Ignite, NVIDIA and Microsoft announced tools to help Windows developers quickly build and optimize AI-powered apps on RTX AI PCs, making local AI more accessible. These new tools enable application and game developers to harness powerful RTX GPUs to accelerate complex AI workflows for applications such as AI agents, app assistants and digital humans.
RTX AI PCs Power Digital Humans With Multimodal Small Language Models
NVIDIA ACE is a suite of digital human technologies that brings life to agents, assistants and avatars. To achieve a higher level of understanding so that they can respond with greater context-awareness, digital humans must be able to visually perceive the world like humans do.
Enhancing digital human interactions with greater realism demands technology that enables perception and understanding of their surroundings with greater nuance. To achieve this, NVIDIA developed multimodal small language models that can process both text and imagery, excel in role-playing and are optimized for rapid response times.
The NVIDIA Nemovision-4B-Instruct model, soon to be available, uses the latest NVIDIA VILA and NVIDIA NeMo framework for distilling, pruning and quantizing to become small enough to perform on RTX GPUs with the accuracy developers need.
The model enables digital humans to understand visual imagery in the real world and on the screen to deliver relevant responses. Multimodality serves as the foundation for agentic workflows and offers a sneak peek into a future where digital humans can reason and take action with minimal assistance from a user.
NVIDIA is also introducing the Mistral NeMo Minitron 128k Instruct family, a suite of large-context small language models designed for optimized, efficient digital human interactions, coming soon. Available in 8B-, 4B- and 2B-parameter versions, these models offer flexible options for balancing speed, memory usage and accuracy on RTX AI PCs. They can handle large datasets in a single pass, eliminating the need for data segmentation and reassembly. Built in the GGUF format, these models enhance efficiency on low-power devices and support compatibility with multiple programming languages.
Turbocharge Gen AI With NVIDIA TensorRT Model Optimizer for Windows
When bringing models to PC environments, developers face the challenge of limited memory and compute resources for running AI locally. And they want to make models available to as many people as possible, with minimal accuracy loss.
Today, NVIDIA announced updates to NVIDIA TensorRT Model Optimizer (ModelOpt) to offer Windows developers an improved way to optimize models for ONNX Runtime deployment.
With the latest updates, TensorRT ModelOpt enables models to be optimized into an ONNX checkpoint for deployment within ONNX Runtime environments, using GPU execution providers such as CUDA, TensorRT and DirectML.
TensorRT ModelOpt includes advanced quantization algorithms, such as INT4 Activation-aware Weight Quantization (AWQ). Compared to other tools such as Olive, the new method reduces the memory footprint of the model and improves throughput performance on RTX GPUs.
During deployment, the models can have a memory footprint up to 2.6x smaller than FP16 models. This results in faster throughput, with minimal accuracy degradation, allowing them to run on a wider range of PCs.
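To make the memory comparison concrete, here is a rough back-of-the-envelope sketch in Python of weight-only memory for FP16 versus 4-bit storage. It is not how TensorRT ModelOpt measures footprint. The 4B parameter count, the group size of 128, and the one-FP16-scale-per-group layout are all assumptions chosen for illustration.

```python
from typing import Optional


def weight_memory_gib(
    num_params: int,
    bits_per_weight: float,
    group_size: Optional[int] = None,
    scale_bits: int = 16,
) -> float:
    """Approximate memory for model weights only (no activations or KV cache)."""
    total_bits = num_params * bits_per_weight
    if group_size is not None:
        # Assumed layout: one scale value stored per group of quantized weights.
        total_bits += (num_params / group_size) * scale_bits
    return total_bits / 8 / 2**30


params = 4_000_000_000  # hypothetical 4B-parameter model
fp16_gib = weight_memory_gib(params, 16)
int4_gib = weight_memory_gib(params, 4, group_size=128)
ratio = fp16_gib / int4_gib

print(f"FP16 weights: {fp16_gib:.2f} GiB")
print(f"INT4 weights: {int4_gib:.2f} GiB")
print(f"Weight-only reduction: {ratio:.2f}x")
```

Under these assumptions the weight-only reduction comes out just under 4x. A whole-model figure would plausibly be lower if some tensors stay in higher precision and runtime buffers are counted, though the source does not say how its number was measured.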
Learn more about how developers on Microsoft systems, from Windows RTX AI PCs to NVIDIA Blackwell-powered Azure servers, are transforming how users interact with AI on a daily basis.
Do Compressed LLMs Forget Knowledge? An Experimental Study with Practical Implications
This paper was accepted at the Machine Learning and Compression Workshop at NeurIPS 2024.
Compressing Large Language Models (LLMs) often leads to reduced performance, especially for knowledge-intensive tasks. In this work, we dive into how compression damages LLMs’ inherent knowledge and the possible remedies. We start by proposing two conjectures on the nature of the damage: one is certain knowledge being forgotten (or erased) after LLM compression, hence necessitating the compressed model to (re)learn from data with additional parameters; the other presumes that knowledge is internally…
Apple Machine Learning Research
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
Large language models (LLMs) are commonly trained on datasets consisting of fixed-length token sequences. These datasets are created by randomly concatenating documents of various lengths and then chunking them into sequences of a predetermined target length (concat-and-chunk). Recent attention implementations mask cross-document attention, reducing the effective length of a chunk of tokens. Additionally, training on long sequences becomes computationally prohibitive due to the quadratic cost of attention. In this study, we introduce dataset decomposition, a novel variable sequence length…
Apple Machine Learning Research
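The concat-and-chunk baseline described in this abstract can be sketched in a few lines of Python. This is only an illustrative reading of the description, not the paper's code. The function name, the toy token IDs, and the choice to drop the trailing partial chunk are assumptions.

```python
import random
from typing import List, Sequence


def concat_and_chunk(
    docs: Sequence[Sequence[int]], target_len: int, seed: int = 0
) -> List[List[int]]:
    """Shuffle tokenized documents, concatenate them, and cut fixed-length chunks."""
    rng = random.Random(seed)
    order = list(docs)
    rng.shuffle(order)  # random concatenation order
    stream = [tok for doc in order for tok in doc]
    n_full = len(stream) // target_len
    # Assumption: the trailing partial chunk is dropped rather than padded.
    return [stream[i * target_len:(i + 1) * target_len] for i in range(n_full)]


# Toy "tokenized documents" of lengths 3, 2 and 6 (11 tokens in total).
docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9, 10, 11]]
chunks = concat_and_chunk(docs, target_len=4)
print(chunks)
```

Because chunk boundaries ignore document boundaries, a single chunk can hold pieces of several unrelated documents. That is the situation in which masking cross-document attention shortens the effective context.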
Towards Low-Bit Communication for Tensor Parallel LLM Inference
This paper was accepted at the Efficient Natural Language and Speech Processing (ENLSP) Workshop at NeurIPS 2024.
Tensor parallelism provides an effective way to increase server large language model (LLM) inference efficiency despite adding a communication cost. However, as server LLMs continue to scale in size, they will need to be distributed across more devices, magnifying the communication cost. One way to approach this problem is with quantization, but current methods for LLMs tend to avoid quantizing the features that tensor parallelism needs to communicate. Taking advantage…
Apple Machine Learning Research