AI Chases the Storm: New NVIDIA Research Boosts Weather Prediction, Climate Simulation

As hurricanes, tornadoes and other extreme weather events occur with increased frequency and severity, it’s more important than ever to improve and accelerate climate research and prediction using the latest technologies.

Amid peaks in the current Atlantic hurricane season, NVIDIA Research today announced a new generative AI model, dubbed StormCast, for emulating high-fidelity atmospheric dynamics. This means the model can enable reliable weather prediction at mesoscale — a scale larger than storms but smaller than cyclones — which is critical for disaster planning and mitigation.

Detailed in a paper written in collaboration with the Lawrence Berkeley National Laboratory and the University of Washington, StormCast arrives as extreme weather phenomena are taking lives, destroying homes and causing more than $150 billion in damage annually in the U.S. alone.

It’s just one example of how generative AI is supercharging thundering breakthroughs in climate research and actionable extreme weather prediction, helping scientists tackle challenges of the highest stakes: saving lives and the world.

NVIDIA Earth-2 — a digital twin cloud platform that combines the power of AI, physical simulations and computer graphics — enables simulation and visualization of weather and climate predictions at a global scale with unprecedented accuracy and speed.

At COMPUTEX in June, NVIDIA founder and CEO Jensen Huang announced CorrDiff, available through Earth-2.

In Taiwan, for example, the National Science and Technology Center for Disaster Reduction predicts fine-scale details of typhoons using CorrDiff, an NVIDIA generative AI model offered as part of Earth-2.

CorrDiff can super-resolve 25-kilometer-scale atmospheric data by 12.5x down to 2 kilometers — 1,000x faster and using 3,000x less energy for a single inference than traditional methods.

That means the center’s potentially lifesaving work, which previously cost nearly $3 million on CPUs, can be accomplished using about $60,000 on a single system with an NVIDIA H100 Tensor Core GPU. It’s a massive reduction that shows how generative AI and accelerated computing increase energy efficiency and lower costs.

The center also plans to use CorrDiff to predict downwash — when strong winds funnel down to street level, damaging buildings and affecting pedestrians — in urban areas.

Now, StormCast adds hourly autoregressive prediction capabilities to CorrDiff, meaning it can predict future outcomes based on past ones.
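To make the idea of hourly autoregressive prediction concrete, the following is a minimal Python sketch of an autoregressive rollout: a model predicts the state one hour ahead, and each prediction is fed back in as the input for the next step. The predict_next_hour function and the array shapes are illustrative placeholders, not StormCast's actual interface or architecture.

```python
import numpy as np

def predict_next_hour(state: np.ndarray) -> np.ndarray:
    # Placeholder for a learned one-hour forecast step. A real emulator would map the
    # current atmospheric state (temperature, moisture, wind, ...) to the state one hour later.
    return state + np.random.normal(scale=0.01, size=state.shape)

def autoregressive_rollout(initial_state: np.ndarray, hours: int) -> list:
    # Roll the model forward `hours` steps, reusing each output as the next input.
    states = [initial_state]
    for _ in range(hours):
        states.append(predict_next_hour(states[-1]))
    return states

# Toy example: a state tensor (variables x lat x lon) rolled out for 6 hours.
forecast = autoregressive_rollout(np.zeros((100, 64, 64)), hours=6)
print(len(forecast))  # 7 states: the initial condition plus 6 hourly predictions
```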

A Global Impact From a Regional Focus

Global climate research begins at a regional level.

Physical hazards of weather and climate change can vary dramatically on regional scales. But reliable numerical weather prediction at this level comes with substantial computational costs. This is due to the high spatial resolution needed to represent the underlying fluid-dynamic motions at mesoscale.

Regional weather prediction models — often referred to as convection-allowing models, or CAMs — have traditionally forced researchers to face varying tradeoffs in resolution, ensemble size and affordability.

CAMs are useful to meteorologists for tracking the evolution and structure of storms, as well as for monitoring their convective mode, or how a storm is organized when it forms. For example, the likelihood of a tornado depends on a storm’s structure and convective mode.

A mesoscale convective system visualized using NOAA’s Geostationary Operational Environmental Satellite. Image courtesy of NOAA.

CAMs also help researchers understand the implications for weather-related physical hazards at the infrastructure level.

For example, global climate model simulations can be used to inform CAMs, helping them translate slow changes in the moisture content of large atmospheric rivers into flash-flooding projections in vulnerable coastal areas.

At lower resolutions, machine learning models trained on global data have emerged as useful emulators of numerical weather prediction models that can be used to improve early-warning systems for severe events. These machine learning models typically have a spatial resolution of about 30 kilometers and a temporal resolution of six hours.

Now, with the help of generative diffusion, StormCast enables this at a 3-kilometer, hourly scale.

Despite being in its infancy, the model — when applied with precipitation radars — already offers forecasts with lead times of up to six hours that are up to 10% more accurate than the U.S. National Oceanic and Atmospheric Administration (NOAA)’s state-of-the-art 3-kilometer operational CAM.

Plus, outputs from StormCast exhibit physically realistic heat and moisture dynamics, and can predict over 100 variables, such as temperature, moisture concentration, wind and rainfall radar reflectivity values at multiple, finely spaced altitudes. This enables scientists to confirm the realistic 3D evolution of a storm’s buoyancy — a first-of-its-kind accomplishment in AI weather simulation.

NVIDIA researchers trained StormCast on approximately three-and-a-half years of NOAA climate data from the central U.S., using NVIDIA accelerated computing to speed calculations.

More Innovations Brewing

Scientists are already looking to harness the model’s benefits.

“Given both the outsized impacts of organized thunderstorms and winter precipitation, and the major challenges in forecasting them with confidence, the production of computationally tractable storm-scale ensemble weather forecasts represents one of the grand challenges of numerical weather prediction,” said Tom Hamill, head of innovation at The Weather Company. “StormCast is a notable model that addresses these challenges, and The Weather Company is excited to collaborate with NVIDIA on developing, evaluating and potentially using these deep learning forecast models.”

“Developing high-resolution weather models requires AI algorithms to resolve convection, which is a huge challenge,” said Imme Ebert-Uphoff, machine learning lead at Colorado State University’s Cooperative Institute for Research in the Atmosphere. “The new NVIDIA research explores the potential of accomplishing this with diffusion models like StormCast, which presents a significant step toward the development of future AI models for high-resolution weather prediction.”

Alongside the acceleration and visualization of physically accurate climate simulations, as well as a digital twin of our planet, such research breakthroughs signify how NVIDIA Earth-2 is enabling a new, vital era of climate research.

Learn more about sustainable computing and NVIDIA Research, a global team of hundreds of scientists and engineers focused on topics including climate AI, computer graphics, computer vision, self-driving cars and robotics.

Featured image courtesy of NASA.

Can You Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?

Self-supervised features are typically used in place of filter-bank features in speaker verification models. However, these models were originally designed to ingest filter-banks as inputs, and thus, training them on self-supervised features assumes that both feature types require the same amount of learning for the task. In this work, we observe that pre-trained self-supervised speech features inherently include information required for a downstream speaker verification task, and therefore, we can simplify the downstream model without sacrificing performance. To this end, we revisit the…

Apple Machine Learning Research

Perform generative AI-powered data prep and no-code ML over any size of data using Amazon SageMaker Canvas

Amazon SageMaker Canvas now empowers enterprises to harness the full potential of their data by enabling support of petabyte-scale datasets. Starting today, you can interactively prepare large datasets, create end-to-end data flows, and invoke automated machine learning (AutoML) experiments on petabytes of data—a substantial leap from the previous 5 GB limit. With over 50 connectors, an intuitive Chat for data prep interface, and petabyte support, SageMaker Canvas provides a scalable, low-code/no-code (LCNC) ML solution for handling real-world, enterprise use cases.

Organizations often struggle to extract meaningful insights and value from their ever-growing volume of data. You need data engineering expertise and time to develop the proper scripts and pipelines to wrangle, clean, and transform data. Then you must experiment with numerous models and hyperparameters requiring domain expertise. Afterward, you need to manage complex clusters to process and train your ML models over these large-scale datasets.

Starting today, you can prepare your petabyte-scale data and explore many ML models with AutoML by chat and with a few clicks. In this post, we show you how you can complete all these steps with the new integration in SageMaker Canvas with Amazon EMR Serverless without writing code.

Solution overview

For this post, we use a sample dataset of a 33 GB CSV file containing flight purchase transactions from Expedia between April 16, 2022, and October 5, 2022. We use the features to predict the base fare of a ticket based on the flight date, distance, seat type, and others.

In the following sections, we demonstrate how to import and prepare the data, optionally export the data, create a model, and run inference, all in SageMaker Canvas.

Prerequisites

You can follow along by completing the following prerequisites:

  1. Set up SageMaker Canvas.
  2. Download the dataset from Kaggle and upload it to an Amazon Simple Storage Service (Amazon S3) bucket.
  3. Add emr-serverless as a trusted entity to the SageMaker Canvas execution role to allow Amazon EMR processing jobs.
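If you prefer to script step 3 rather than edit the trust relationship in the IAM console, the following sketch shows one way to do it with boto3. The role name is a placeholder for your own SageMaker Canvas execution role, and the policy shown is a minimal assumption about what the trust relationship needs to contain.

```python
import json
import boto3

iam = boto3.client("iam")
role_name = "AmazonSageMaker-ExecutionRole-EXAMPLE"  # placeholder: your Canvas execution role

# Trust policy allowing both SageMaker and EMR Serverless to assume the execution role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": ["sagemaker.amazonaws.com", "emr-serverless.amazonaws.com"]
            },
            "Action": "sts:AssumeRole",
        }
    ],
}

iam.update_assume_role_policy(RoleName=role_name, PolicyDocument=json.dumps(trust_policy))
```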

Import data in SageMaker Canvas

We start by importing the data from Amazon S3 using Amazon SageMaker Data Wrangler in SageMaker Canvas. Complete the following steps:

  1. In SageMaker Canvas, choose Data Wrangler in the navigation pane.
  2. On the Data flows tab, choose Tabular on the Import and prepare dropdown menu.
  3. Enter the S3 URI for the file and choose Go, then choose Next.
  4. Give your dataset a name, choose Random for Sampling method, then choose Import.

Importing data from the SageMaker Data Wrangler flow allows you to interact with a sample of the data before scaling the data preparation flow to the full dataset. This saves time and improves performance because you don’t need to work with the entirety of the data during preparation. You can later use EMR Serverless to handle the heavy lifting. When SageMaker Data Wrangler finishes importing, you can start transforming the dataset.

After you import the dataset, you can first look at the Data Quality Insights Report to see recommendations from SageMaker Canvas on how to improve the data quality and therefore improve the model’s performance.

  1. In the flow, choose the options menu (three dots) for the node, then choose Get data insights.
  2. Give your analysis a name, select Regression for Problem type, choose baseFare for Target column, select Sampled dataset for Data Size, then choose Create.

Assessing the data quality and analyzing the report’s findings is often the first step because it can guide the subsequent data preparation steps. Within the report, you will find dataset statistics, high-priority warnings about target leakage, skewness, and anomalies, and a feature summary.

Prepare the data with SageMaker Canvas

Now that you understand your dataset characteristics and potential issues, you can use the Chat for data prep feature in SageMaker Canvas to simplify data preparation with natural language prompts. This generative artificial intelligence (AI)-powered capability reduces the time, effort, and expertise required for the often complex tasks of data preparation.

  1. Choose the .flow file on the top banner to go back to your flow canvas.
  2. Choose the options menu for the node, then choose Chat for data prep.

For our first example, converting searchDate and flightDate to datetime format might help us perform date manipulations and extract useful features such as year, month, day, and the difference in days between searchDate and flightDate. These features can find temporal patterns in the data that can influence the baseFare.

  1. Provide a prompt like “Convert searchDate and flightDate to datetime format” to view the code and choose Add to steps.
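For readers who want to see what such a transform boils down to, here is a rough pandas equivalent of that prompt, using the column names from this dataset on a tiny stand-in DataFrame. The code Canvas actually generates in the chat may differ.

```python
import pandas as pd

# Tiny stand-in for the sampled flight-purchase data.
df = pd.DataFrame({
    "searchDate": ["2022-04-16", "2022-04-17"],
    "flightDate": ["2022-05-01", "2022-06-15"],
})

# Convert both columns to datetime.
df["searchDate"] = pd.to_datetime(df["searchDate"])
df["flightDate"] = pd.to_datetime(df["flightDate"])

# Derived temporal features: year, month, day, and days between search and flight.
df["flightYear"] = df["flightDate"].dt.year
df["flightMonth"] = df["flightDate"].dt.month
df["flightDay"] = df["flightDate"].dt.day
df["daysToFlight"] = (df["flightDate"] - df["searchDate"]).dt.days
print(df)
```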

In addition to data preparation using the chat UI, you can use LCNC transforms with the SageMaker Data Wrangler UI to transform your data. For example, we use one-hot encoding as a technique to convert categorical data into numerical format using the LCNC interface.

  1. Add the transform Encode categorical.
  2. Choose One-hot encode for Transform and add the following columns: startingAirport, destinationAirport, fareBasisCode, segmentsArrivalAirportCode, segmentsDepartureAirportCode, segmentsAirlineName, segmentsAirlineCode, segmentsEquipmentDescription, and segmentsCabinCode.

You can use the advanced search and filter option in SageMaker Canvas to select columns that are of String data type to simplify the process.

Refer to the SageMaker Canvas blog for other examples using SageMaker Data Wrangler. For this post, we simplify our efforts with these two steps, but we encourage you to use both chat and transforms to add data preparation steps on your own. In our testing, we successfully ran all our data preparation steps through the chat using the following prompts as an example:

  • “Add another step that extracts relevant features such as year, month, day, and day of the week which can enhance temporality to our dataset”
  • “Have Canvas convert the travelDuration, segmentsDurationInSeconds, and segmentsDistance column from string to numeric”
  • “Handle missing values by imputing the mean for the totalTravelDistance column, and replacing missing values as ‘Unknown’ for the segmentsEquipmentDescription column”
  • “Convert boolean columns isBasicEconomy, isRefundable, and isNonStop to integer format (0 and 1)”
  • “Scale numerical features like totalFare, seatsRemaining, totalTravelDistance using Standard Scaler from scikit-learn”
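As a point of reference, the last prompt above corresponds roughly to the scikit-learn snippet below, applied here to a tiny stand-in DataFrame with the same column names. It is only an approximation of what the chat generates.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tiny stand-in for the prepared dataset.
df = pd.DataFrame({
    "totalFare": [217.6, 351.2, 189.0],
    "seatsRemaining": [9, 3, 5],
    "totalTravelDistance": [947.0, 2468.0, 1302.0],
})

# Standardize the numeric columns to zero mean and unit variance.
numeric_cols = ["totalFare", "seatsRemaining", "totalTravelDistance"]
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])
print(df)
```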

When these steps are complete, you can move to the next step of processing the full dataset and creating a model.

(Optional) Export your data in Amazon S3 using an EMR Serverless job

You can process the entire 33 GB dataset by running the data flow using EMR Serverless for the data preparation job without worrying about the infrastructure.

  1. From the last node in the flow diagram, choose Export and Export data to Amazon S3.
  2. Provide a dataset name and output location.
  3. We recommend keeping Auto job configuration selected unless you want to change any of the Amazon EMR or SageMaker Processing configs. (If your data is greater than 5 GB, data processing will run in EMR Serverless; otherwise, it will run within the SageMaker Canvas workspace.)
  4. Under EMR Serverless, provide a job name and choose Export.

You can view the job status in SageMaker Canvas on the Data Wrangler page on the Jobs tab.

You can also view the job status on the Amazon EMR Studio console by choosing Applications under Serverless in the navigation pane.

Create a model

You can also create a model at the end of your flow.

  1. Choose Create model from the node options, and SageMaker Canvas will create a dataset and then navigate you to create a model.
  2. Provide a dataset and model name, select Predictive analysis for Problem type, choose baseFare as the target column, then choose Export and create model.

The model creation process will take a couple of minutes to complete.

  1. Choose My Models in the navigation pane.
  2. Choose the model you just exported and navigate to version 1.
  3. Under Model type, choose Configure model.
  4. Select Numeric model type, then choose Save.
  5. On the dropdown menu, choose Quick Build to start the build process.

When the build is complete, on the Analyze page, you can review the following tabs:

  • Overview – This gives you a general overview of the model’s performance, depending on the model type.
  • Scoring – This shows visualizations that you can use to get more insights into your model’s performance beyond the overall accuracy metrics.
  • Advanced metrics – This contains your model’s scores for advanced metrics and additional information that can give you a deeper understanding of your model’s performance. You can also view information such as the column impacts.

Run inference

In this section, we walk through the steps to run batch predictions against the generated dataset.

  1. On the Analyze page, choose Predict.
  2. To generate predictions on your test dataset, choose Manual.
  3. Select the test dataset you created and choose Generate predictions.
  4. When the predictions are ready, either choose View in the pop-up message at the bottom of the page or navigate to the Status column to choose Preview on the options menu (three dots).

You’re now able to review the predictions.

You have now used the generative AI data preparation capabilities in SageMaker Canvas to prepare a large dataset, train a model using AutoML techniques, and run batch predictions at scale. All of this was done with a few clicks and a natural language interface.

Clean up

To avoid incurring future session charges, log out of SageMaker Canvas. To log out, choose Log out in the navigation pane of the SageMaker Canvas application.

When you log out of SageMaker Canvas, your models and datasets aren’t affected, but SageMaker Canvas cancels any Quick build tasks. If you log out of SageMaker Canvas while running a Quick build, your build might be interrupted until you relaunch the application. When you relaunch, SageMaker Canvas automatically restarts the build. Standard builds continue even if you log out.

Conclusion

The introduction of petabyte-scale AutoML support within SageMaker Canvas marks a significant milestone in the democratization of ML. By combining the power of generative AI, AutoML, and the scalability of EMR Serverless, we’re empowering organizations of all sizes to unlock insights and drive business value from even the largest and most complex datasets.

The benefits of ML are no longer confined to the domain of highly specialized experts. SageMaker Canvas is revolutionizing the way businesses approach data and AI, putting the power of predictive analytics and data-driven decision-making into the hands of everyone. Explore the future of no-code ML with SageMaker Canvas today.


About the authors

Bret Pontillo is a Sr. Solutions Architect at AWS. He works closely with enterprise customers building data lakes and analytical applications on the AWS platform. In his free time, Bret enjoys traveling, watching sports, and trying new restaurants.

Polaris Jhandi is a Cloud Application Architect with AWS Professional Services. He has a background in AI/ML & big data. He is currently working with customers to migrate their legacy Mainframe applications to the Cloud.

Peter Chung is a Solutions Architect serving enterprise customers at AWS. He loves to help customers use technology to solve business problems on various topics like cutting costs and leveraging artificial intelligence. He wrote a book on AWS FinOps, and enjoys reading and building solutions.

Abstracts: August 15, 2024

Microsoft Research Podcast - Abstracts

Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.

In this episode, Microsoft Product Manager Shrey Jain and OpenAI Research Scientist Zoë Hitzig join host Amber Tingle to discuss “Personhood credentials: Artificial intelligence and the value of privacy-preserving tools to distinguish who is real online.” In their paper, Jain, Hitzig, and their coauthors describe how malicious actors can draw on increasingly advanced AI tools to carry out deception, making online deception harder to detect and more harmful. Bringing ideas from cryptography into AI policy conversations, they identify a possible mitigation: a credential that allows its holder to prove they’re a person––not a bot––without sharing any identifying information. This exploratory research reflects a broad range of collaborators from across industry, academia, and the civil sector specializing in areas such as security, digital identity, advocacy, and policy.

Transcript

[MUSIC]

AMBER TINGLE: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research—in brief. I’m Amber Tingle. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.

[MUSIC FADES]

Our guests today are Shrey Jain and Zoë Hitzig. Shrey is a product manager at Microsoft, and Zoë is a research scientist at OpenAI. They are two of the corresponding authors on a new paper, “Personhood credentials: Artificial intelligence and the value of privacy-preserving tools to distinguish who is real online.” This exploratory research comprises multidisciplinary collaborators from across industry, academia, and the civil sector. The paper is available now on arXiv. Shrey and Zoë, thank you so much for joining us, and welcome back to the Microsoft Research Podcast.


SHREY JAIN: Thank you. We’re happy to be back.

ZOË HITZIG: Thanks so much.

TINGLE: Shrey, let’s start with a brief overview of your paper. Why is this research important, and why do you think this is something we should all know about?

JAIN: Malicious actors have been exploiting anonymity as a way to deceive others online. And historically, deception has been viewed as this unfortunate but necessary cost as a way to preserve the internet’s commitment to privacy and unrestricted access to information. And today, AI is changing the way we should think about malicious actors’ ability to be successful in those attacks. It makes it easier to create content that is indistinguishable from human-created content, and it is possible to do so in a way that is only getting cheaper and more accessible. And so this paper aims to address a countermeasure to protect against AI-powered deception at scale while also protecting privacy. And I think the reason why people should care about this problem is for two reasons. One is it can very soon become very logistically annoying to deal with these various different types of scams that can occur. I think we’ve all been susceptible to different types of attacks or scams that, you know, people have had. But now these scams are going to become much more persuasive and effective. And so for various different recovery purposes, it can become very challenging to get access back to your accounts or rebuild your reputation that someone may damage online. But more importantly, there’s also very dangerous things that can happen. Kids might not be safe online anymore. Or our ability to communicate online for democratic processes. A lot of the way in which we shape political views today happens online. And that’s also at risk. And in response to that, we propose in this paper a solution titled personhood credentials. Personhood credentials enable people to prove that they are in fact a real person without revealing anything more about themselves online.

TINGLE: Zoë, walk us through what’s already been done in this field, and what’s your unique contribution to the literature here?

HITZIG: I see us as intervening on two separate bodies of work. And part of what we’re doing in this paper is bringing together those two bodies of work. There’s been absolutely amazing work for decades in cryptography and in security. And what cryptographers have been able to do is to figure out protocols that allow people to prove very specific claims about themselves without revealing their full identity. So when you think about walking into a bar and the bartender asks you to prove that you’re over 21—or over 18, depending on where you are—you typically have to show your full driver’s license. And now that’s revealing a lot of information. It says, you know, where you live, whether you’re an organ donor. It’s revealing a lot of information to that bartender. And online, we don’t know what different service providers are storing about us. So, you know, the bartender might not really care where we live or whether we’re an organ donor. But when we’re signing up for digital services and we have to show a highly revealing credential like a driver’s license just to get access to something, we’re giving over too much information in some sense. And so this one body of literature that we’re really drawing on is a literature in cryptography. The idea that I was talking about there, where you can prove privately just isolated claims about yourself, that’s an idea called an anonymous credential. It allows you to be anonymous with respect to some kind of service provider while still proving a limited claim about yourself, like “I am over 18,” or in the case of personhood credentials, you prove, “I am a person.” So that’s all one body of literature. Then there’s this huge other body of literature and set of conversations happening in policy circles right now around what to do about AI. Huge questions abounding. Shrey and I have written a prior paper called “Contextual Confidence and Generative AI,” which we talked about on this podcast, as well, and in that paper, we offered a framework for thinking about the specific ways that generative AI, sort of, threatens the foundations of our modes of communication online. And we outlined about 16 different solutions that could help us to solve the coming problems that generative AI might bring to our online ecosystems. And what we decided to do in this paper was focus on a set of solutions that we thought are not getting enough attention in those AI and AI policy circles. And so part of what this paper is doing is bringing together these ideas from this long body of work in cryptography into those conversations.

TINGLE: I’d like to know more about your methodology, Shrey. How did your team go about conducting this research?

JAIN: So we had a wide range of collaborators from industry, academia, the civil sector who work on topics of digital identity, privacy, advocacy, security, and AI policy which came together to think about, what is the clearest way in which we can explain what we believe is a countermeasure that can protect against AI-powered deception that, from a technological point of view, there’s already a large body of work that we can reference but from a “how this can be implemented.” Discussing the tradeoffs that various different types of academics and industry leaders are thinking about. Can we communicate that very clearly? And so the methodology here was really about bringing together a wide range of collaborators to really bridge these two bodies of work together and communicate it clearly—not just the technical solutions but also the tradeoffs.

TINGLE: So, Zoë, what are the major findings here, and how are they presented in the paper?

HITZIG: I am an economist by training. Economists love to talk about tradeoffs. You know, when you have some of this, it means you have a little bit less of that. It’s kind of like the whole business of economics. And a key finding of the paper, as I see it, is that we begin with what feels like a tradeoff, which is on the one hand, as Shrey was saying, we want to be able to be anonymous online because that has great benefits. It means we can speak truth to power. It means we can protect civil liberties and invite everyone into online spaces. You know, privacy is a core feature of the internet. And at the same time, the, kind of, other side of the tradeoff that we’re often presented is, well, if you want all that privacy and anonymity, it means that you can’t have accountability. There’s no way of tracking down the bad actors and making sure that they don’t do something bad again. And we’re presented with this tradeoff between anonymity on the one hand and accountability on the other hand. All that is to say, a key finding of this paper, as I see it, is that personhood credentials and more generally this class of anonymous credentials that allow you to prove different pieces of your identity online without revealing your entire identity actually allow you to evade the tradeoff and allow you to, in some sense, have your cake and eat it, too. What it allows us to do is to create some accountability, to put back some way of tracing people’s digital activities to an accountable entity. What we also present in the paper are a number of different, sort of, key challenges that will have to be taken into account in building any kind of system like this. But we present all of that, all of those challenges going forward, as potentially very worth grappling with because of the potential for this, sort of, idea to allow us to preserve the internet’s commitment to privacy, free speech, and anonymity while also creating accountability for harm.

TINGLE: So Zoë mentioned some of these tradeoffs. Let’s talk a little bit more about real-world impact, Shrey. Who benefits most from this work?

JAIN: I think there’s many different people that benefit. One is anyone who’s communicating or doing anything online in that they can have more confidence in their interactions. And it, kind of, builds back on the paper that Zoë and I wrote last year on contextual confidence and generative AI, which is that we want to have confidence in our interactions, and in order to do that, one component is being able to identify who you’re speaking with and also doing it in a privacy-preserving way. And I think another person who benefits is policymakers. I think today, when we think about the language and technologies that are being promoted, this complements a lot of the existing work that’s being done on provenance and watermarking. And I think the ability for those individuals to be successful in their mission, which is creating a safer online space, this work can help guide these individuals to be more effective in their mission in that it highlights a technology that is not currently as discussed comparatively to these other solutions and complements them in order to protect online communication.

HITZIG: You know, social media is flooded with bots, and sometimes the problem with bots is that they’re posting fake content, but other times, the problem with bots is that there are just so many of them and they’re all retweeting each other and it’s very hard to tell what’s real. And so what a personhood credential can do is say, you know, maybe each person is only allowed to have five accounts on a particular social media platform.

TINGLE: So, Shrey, what’s next on your research agenda? Are there lingering questions—I know there are—and key challenges here, and if so, how do you hope to answer them?

JAIN: We believe we’ve aggregated a strong set of industry, academic, and, you know, civil sector collaborators, but we’re only a small subset of the people who are going to be interacting with these systems. And so the first area of next steps is to gather feedback about the proposal of a solution that we’ve had and how can we improve that: are there tradeoffs that we’re missing? Are there technical components that we weren’t thinking as deeply through? And I think there’s a lot of narrow open questions that come out of this. For instance, how do personhood credentials relate to existing laws regarding identity theft or protection laws? In areas where service providers can’t require government IDs, how does that apply to personhood credentials that rely on government IDs? I think that there’s a lot of these open questions that we address in the paper that I think need more experimentation and thinking through but also a lot of empirical work to be done. How do people react to personhood credentials, and does it actually enhance confidence in their interactions online? I think that there’s a lot of open questions on the actual effectiveness of these tools. And so I think there’s a large area of work to be done there, as well.

HITZIG: I’ve been thinking a lot about the early days of the internet. I wasn’t around for that, but I know that every little decision that was made in a very short period of time had incredibly lasting consequences that we’re still dealing with now. There’s an enormous path dependence in every kind of technology. And I feel that right now, we’re in that period of time, the small window where generative AI is this new thing to contend with, and it’s uprooting many of our assumptions about how our systems can work or should work. And I’m trying to think about how to set up those institutions, make these tiny decisions right so that in the future we have a digital architecture that’s really serving the goals that we want it to serve.

[MUSIC]

TINGLE: Very thoughtful. With that, Shrey Jain, Zoë Hitzig, thank you so much for joining us today.

HITZIG: Thank you so much, Amber.

TINGLE: And thanks to our listeners, as well. If you’d like to learn more about Shrey and Zoë’s work on personhood credentials and advanced AI, you’ll find a link to this paper at aka.ms/abstracts, or you can read it on arXiv. Thanks again for tuning in. I’m Amber Tingle, and we hope you’ll join us next time on Abstracts.

[MUSIC FADES]

The post Abstracts: August 15, 2024 appeared first on Microsoft Research.

ReALM: Reference Resolution as Language Modeling

Reference resolution is an important problem, one that is essential to understand and successfully handle contexts of different kinds. This context includes both previous turns and context that pertains to non-conversational entities, such as entities on the user’s screen or those running in the background. While LLMs have been shown to be extremely powerful for a variety of tasks, their use in reference resolution, particularly for non-conversational entities, remains underutilized. This paper demonstrates how LLMs can be used to create an effective system to resolve references of various…

Apple Machine Learning Research

Novel-View Acoustic Synthesis From 3D Reconstructed Rooms

We investigate the benefit of combining blind audio recordings with 3D scene information for novel-view acoustic synthesis. Given audio recordings from 2-4 microphones and the 3D geometry and material of a scene containing multiple unknown sound sources, we estimate the sound anywhere in the scene. We identify the main challenges of novel-view acoustic synthesis as sound source localization, separation, and dereverberation. While naively training an end-to-end network fails to produce high-quality results, we show that incorporating room impulse responses (RIRs) derived from 3D reconstructed…

Apple Machine Learning Research

RepCNN: Micro-Sized, Mighty Models for Wakeword Detection

Always-on machine learning models require a very low memory and compute footprint. Their restricted parameter count limits the model’s capacity to learn, and the effectiveness of the usual training algorithms to find the best parameters. Here we show that a small convolutional model can be better trained by first refactoring its computation into a larger redundant multi-branched architecture. Then, for inference, we algebraically re-parameterize the trained model into the single-branched form with fewer parameters for a lower memory footprint and compute cost. Using this technique, we show…

Apple Machine Learning Research

Delight your customers with great conversational experiences via QnABot, a generative AI chatbot

QnABot on AWS (an AWS Solution) now provides access to Amazon Bedrock foundational models (FMs) and Knowledge Bases for Amazon Bedrock, a fully managed end-to-end Retrieval Augmented Generation (RAG) workflow. You can now provide contextual information from your private data sources that can be used to create rich, contextual, conversational experiences.

The advent of generative artificial intelligence (AI) provides organizations unique opportunities to digitally transform customer experiences. Enterprises with contact center operations are looking to improve customer satisfaction by providing self-service, conversational, interactive chat bots that have natural language understanding (NLU). Enterprises want to automate frequently asked transactional questions, provide a friendly conversational interface, and improve operational efficiency. In turn, customers can ask a variety of questions and receive accurate answers powered by generative AI.

In this post, we discuss how to use QnABot on AWS to deploy a fully functional chatbot integrated with other AWS services, and delight your customers with human agent like conversational experiences.

Solution overview

QnABot on AWS is an AWS Solution that enterprises can use to enable a multi-channel, multi-language chatbot with NLU to improve end customer experiences. QnABot provides a flexible, tiered conversational interface empowering enterprises to meet customers where they are and provide accurate responses. Some responses need to be exact (for example, regulated industries like healthcare or capital markets), some responses need to be searched from large, indexed data sources and cited, and some answers need to be generated on the fly, conversationally, based on semantic context. With QnABot on AWS, you can achieve all of the above by deploying the solution using an AWS CloudFormation template, with no coding required. The solution is extensible, uses AWS AI and machine learning (ML) services, and integrates with multiple channels such as voice, web, and text (SMS).

QnABot on AWS provides access to multiple FMs through Amazon Bedrock, so you can create conversational interfaces based on your customers’ language needs (such as Spanish, English, or French), sophistication of questions, and accuracy of responses based on user intent. You now have the capability to access various large language models (LLMs) from leading AI enterprises (such as Amazon Titan, Anthropic Claude 3, Cohere Command, Meta Llama 3, Mistral AI Large Model, and others on Amazon Bedrock) to find a model best suited for your use case. Additionally, native integration with Knowledge Bases for Amazon Bedrock allows you to retrieve specific, relevant data from your data sources via pre-built data source connectors (Amazon Simple Storage Service (Amazon S3), Confluence, Microsoft SharePoint, Salesforce, or web crawlers), with the retrieved content automatically converted to text embeddings and stored in a vector database of your choice. You can then retrieve your company-specific information with source attribution (such as citations) to improve transparency and minimize hallucinations. Lastly, if you don’t want to set up custom integrations with large data sources, you can simply upload your documents and support multi-turn conversations. With prompt engineering, managed RAG workflows, and access to multiple FMs, you can provide your customers rich, human agent-like experiences with precise answers.

Deploying the QnABot solution builds the following environment in the AWS Cloud.

Figure 1: QnABot Architecture Diagram

The high-level process flow for the solution components deployed with the CloudFormation template is as follows:

  1. The admin deploys the solution into their AWS account, opens the Content Designer UI or Amazon Lex web client, and uses Amazon Cognito to authenticate.
  2. After authentication, Amazon API Gateway and Amazon S3 deliver the contents of the Content Designer UI.
  3. The admin configures questions and answers in the Content Designer and the UI sends requests to API Gateway to save the questions and answers.
  4. The Content Designer AWS Lambda function saves the input in Amazon OpenSearch Service in a questions bank index. If using text embeddings, these requests first pass through an LLM hosted on Amazon Bedrock or Amazon SageMaker to generate embeddings before being saved into the question bank on OpenSearch Service.
  5. Users of the chatbot interact with Amazon Lex through the web client UI, Amazon Alexa, or Amazon Connect.
  6. Amazon Lex forwards requests to the Bot Fulfillment Lambda function. Users can also send requests to this Lambda function through Amazon Alexa devices.
  7. The user and chat information is stored in Amazon DynamoDB to disambiguate follow-up questions from previous question and answer context.
  8. The Bot Fulfillment Lambda function takes the user’s input and uses Amazon Comprehend and Amazon Translate (if necessary) to translate non-native language requests to the native language selected by the user during the deployment, and then looks up the answer in OpenSearch Service. If using LLM features such as text generation and text embeddings, these requests first pass through various LLM models hosted on Amazon Bedrock or SageMaker to generate the search query and embeddings to compare with those saved in the question bank on OpenSearch Service.
  9. If no match is returned from the OpenSearch Service question bank, then the Bot Fulfillment Lambda function forwards the request as follows:
    1. If an Amazon Kendra index is configured for fallback, the Bot Fulfillment Lambda function forwards the request to Amazon Kendra. The text generation LLM can optionally be used to create the search query and synthesize a response from the returned document excerpts.
    2. If a knowledge base ID is configured, the Bot Fulfillment Lambda function forwards the request to the knowledge base. The Bot Fulfillment Lambda function uses the RetrieveAndGenerate API to fetch the relevant results for a user query, augment the FM’s prompt, and return the response.
  10. User interactions with the Bot Fulfillment function generate logs and metrics data, which is sent to Amazon Kinesis Data Firehose and then to Amazon S3 for later data analysis.
  11. OpenSearch Dashboards can be used to view usage history, logged utterances, no hits utterances, positive user feedback, and negative user feedback, and also provides the ability to create custom reports.

Prerequisites

To get started, you need the following:

  • An AWS account
  • An active deployment of QnABot on AWS (version 6.0.0 or later)
  • Amazon Bedrock model access (required) for all embeddings and LLM models that will be used in QnABot

Figure 2: Request Access to Bedrock Foundational Models (FMs)

In the following sections, we explore some of QnABot’s generative AI features.

Semantic question matching using an embeddings LLM

QnABot on AWS can use text embeddings to provide semantic search capabilities by using LLMs. The goal of this feature is to improve question matching accuracy while reducing the amount of tuning required when compared to the default OpenSearch Service keyword-based matching.

Some of the benefits include:

  • Improved FAQ accuracy from semantic matching vs. keyword matching (comparing the meaning vs. comparing individual words)
  • Fewer training utterances required to match a diverse set of queries
  • Better multi-language support, because translated utterances only need to match the meaning of the stored text, not the wording

Configure Amazon Bedrock to enable semantic question matching

To enable these expanded semantic search capabilities, QnABot uses the Amazon Bedrock FM specified by the EmbeddingsBedrockModelId CloudFormation stack parameter to generate text embeddings. These models provide the best performance and operate on a pay-per-request model. At the time of writing, QnABot on AWS supports a selection of Amazon Bedrock embeddings models for this purpose.

For the CloudFormation stack, set the following parameters:

  • Set EmbeddingsApi to BEDROCK
  • Set EmbeddingsBedrockModelId to one of the available options
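If you manage the stack programmatically instead of through the CloudFormation console, a parameter update along these lines sets the two values above. The stack name and model ID are placeholders, and a real update must also carry (or reuse) every other stack parameter, so treat this as an outline rather than a complete command.

```python
import boto3

cfn = boto3.client("cloudformation")

# Placeholder stack name and an example Bedrock embeddings model ID.
cfn.update_stack(
    StackName="QnABot-Example",
    UsePreviousTemplate=True,
    Parameters=[
        {"ParameterKey": "EmbeddingsApi", "ParameterValue": "BEDROCK"},
        {"ParameterKey": "EmbeddingsBedrockModelId", "ParameterValue": "amazon.titan-embed-text-v1"},
        # The remaining stack parameters would typically be passed with
        # {"ParameterKey": "...", "UsePreviousValue": True}.
    ],
    Capabilities=["CAPABILITY_NAMED_IAM", "CAPABILITY_AUTO_EXPAND"],
)
```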

For example, with semantic matching enabled, the question “What’s the address of the White House?” matches to “Where does the President live?” This example doesn’t match using keywords because they don’t share any of the same words.
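To build intuition for why those two questions match, the sketch below embeds both with a Bedrock embeddings model and compares them with cosine similarity; semantically related questions score close together even when they share no keywords. The model ID is one example, the request and response shapes shown are those of the Titan text embeddings family, and this is not QnABot's internal matching code.

```python
import json
import boto3
import numpy as np

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "amazon.titan-embed-text-v1"  # example embeddings model

def embed(text: str) -> np.ndarray:
    # Titan text embeddings request/response shape: {"inputText": ...} -> {"embedding": [...]}.
    response = bedrock.invoke_model(
        modelId=MODEL_ID,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({"inputText": text}),
    )
    return np.array(json.loads(response["body"].read())["embedding"])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

q1 = embed("What's the address of the White House?")
q2 = embed("Where does the President live?")
print(cosine(q1, q2))  # semantically similar questions score high despite sharing no keywords
```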

Figure 3: Semantic matching in QnABot

In the UI designer, you can set ENABLE_DEBUG_RESPONSE to true to see the user input, source, or any errors of the answer, as illustrated in the preceding screenshot.

You can also evaluate the matching score on the TEST tab in the content designer UI. In this example, we add a match on “qna item question” with the question “Where does the President live?”

Figure 4: Test and evaluate answers in QnABot

Similarly, you can try a match on “item text passage” with the question “Where did Humpty Dumpty sit?”

Figure 5: Match items or text passages in QnABot

Recommendations for tuning with an embeddings LLM

When using embeddings in QnABot, we recommend generalizing questions because more user utterances will match a general statement. For example, the embeddings LLM model will cluster “checking” and “savings” with “account,” so if you want to match both account types, use “account” in your questions.

Similarly, for the question and utterance of “transfer to an agent,” consider using “transfer to someone” because it will better match with “agent,” “representative,” “human,” “person,” and so on.

In addition, we recommend tuning EMBEDDINGS_SCORE_THRESHOLD, EMBEDDINGS_SCORE_ANSWER_THRESHOLD, and EMBEDDINGS_TEXT_PASSAGE_SCORE_THRESHOLD based on the scores. The default values are generalized across multiple models, but you might need to adjust them based on your embeddings model and your experiments.

Text generation and query disambiguation using a text LLM

QnABot on AWS can use LLMs to provide a richer, more conversational chat experience. The goal of these features is to minimize the amount of individually curated answers administrators are required to maintain, improve question matching accuracy by providing query disambiguation, and enable the solution to provide more concise answers to users, especially when using a knowledge base in Amazon Bedrock or the Amazon Kendra fallback feature.

Configure an Amazon Bedrock FM with AWS CloudFormation

To enable these capabilities, QnABot uses the Amazon Bedrock FM specified by the LLMBedrockModelId CloudFormation stack parameter to generate text. These models provide the best performance and operate on a pay-per-request model.

For the CloudFormation stack, set the following parameters:

  • Set LLMApi to BEDROCK
  • Set LLMBedrockModelId to one of the available LLM options

Figure 6: Setup QnABot to use Bedrock FMs

Query disambiguation (LLM-generated query)

By using an LLM, QnABot can take the user’s chat history and generate a standalone question for the current utterance. This enables users to ask follow-up questions that on their own may not be answerable without context of the conversation. The new disambiguated, or standalone, question can then be used as search queries to retrieve the best FAQ, passage, or Amazon Kendra match.
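Conceptually, the disambiguation step hands the chat history plus the new utterance to an LLM and asks for a standalone rewrite. The sketch below illustrates that idea with the Amazon Bedrock Converse API; the prompt wording, history format, and model ID are illustrative placeholders rather than QnABot's internal template or fulfillment code.

```python
import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"  # example Bedrock chat model

history = [
    "user: Who was Little Bo Peep?",
    "assistant: Little Bo Peep is a nursery rhyme character who lost her sheep.",
]
follow_up = "Did she find them again?"

# Ask the model to rewrite the follow-up as a standalone question.
prompt = (
    "Given the chat history below, rewrite the final question as a standalone question "
    "that can be understood without the history.\n\n"
    "History:\n" + "\n".join(history) +
    f"\n\nFinal question: {follow_up}\n\nStandalone question:"
)

response = bedrock.converse(
    modelId=MODEL_ID,
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)
print(response["output"]["message"]["content"][0]["text"])
# Expected along the lines of: "Did Little Bo Peep find her lost sheep again?"
```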

In QnABot’s Content Designer, you can further customize the prompt and model listed in the Query Matching section:

  • LLM_GENERATE_QUERY_PROMPT_TEMPLATE – The prompt template used to construct a prompt for the LLM to disambiguate a follow-up question. The template may use the following placeholders:
    • history – A placeholder for the last LLM_CHAT_HISTORY_MAX_MESSAGES messages in the conversational history, to provide conversational context.
    • input – A placeholder for the current user utterance or question.
  • LLM_GENERATE_QUERY_MODEL_PARAMS – The parameters sent to the LLM model when disambiguating follow-up questions. Refer to the relevant model documentation for additional values that the model provider accepts.

The following screenshot shows an example with the new LLM disambiguation feature enabled, given the chat history context after answering “Who was Little Bo Peep” and the follow-up question “Did she find them again?”

Figure 7: LLM query disambiguation feature enabled

QnABot rewrites that question to provide all the context required to search for the relevant FAQ or passage: “Did Little Bo Peep find her lost sheep again?”

Figure 8: With query disambiguation with LLMs, context is maintained

Answer text generation using QnABot

You can now generate answers to questions from context provided by knowledge base search results, or from text passages created or imported directly into QnABot. This allows you to generate answers that reduce the number of FAQs you have to maintain, because you can now synthesize concise answers from your existing documents in a knowledge base, Amazon Kendra index, or document passages stored in QnABot as text items. Additionally, your generated answers can be concise and therefore suitable for voice or contact center chatbots, website bots, and SMS bots. Lastly, these generated answers are compatible with the solution’s multi-language support—customers can interact in their chosen languages and receive generated answers in the same language.

With QnABot, you can use two different data sources to generate responses: text passages or a knowledge base in Amazon Bedrock.

Generate answers to questions from text passages

In the content designer web interface, administrators can store full text passages for QnABot on AWS to use. When a question gets asked that matches against this passage, the solution can use LLMs to answer the user’s question based on information found within the passage. We highly recommend you use this option with semantic question matching using Amazon Bedrock text embedding. In QnABot content designer, you can further customize the prompt and model listed under Text Generation using the General Settings section.

Let’s look at a text passage example:

  1. In the Content Designer, choose Add.
  2. Select the text, enter an item ID and a passage, and choose Create.

You can also import your passages from a JSON file using the Content Designer Import feature. On the tools menu, choose Import, open Examples/Extensions, and choose LOAD next to TextPassage-NurseryRhymeExamples to import two nursery rhyme text items.

The following example shows QnABot generating an answer using a text passage item that contains the nursery rhyme, in response to the question “Where did Humpty Dumpty sit?”

Figure 9: Generate answers from text passages

You can also use query disambiguation and text generation together, by asking “Who tried to fix Humpty Dumpty?” and the follow-up question “Did they succeed?”

Figure 10: Text generation with query disambiguation to maintain context

You can also modify LLM_QA_PROMPT_TEMPLATE in the Content Designer to answer in different languages. In the prompt template, you can specify instructions and answers in different languages (for example, prompts in French or Spanish).

Figure 11: Answer in different languages

You can also specify answers in two languages with bulleted points.

Figure 12: Answer in multiple languages

RAG using an Amazon Bedrock knowledge base

By integrating with a knowledge base, QnABot on AWS can generate concise answers to users’ questions from configured data sources. This prevents the need for users to sift through larger text passages to find the answer. You can also create your own knowledge base from files stored in an S3 bucket. Amazon Bedrock knowledge bases with QnABot don’t require EmbeddingsApi and LLMApi because the embeddings and generative response are already provided by the knowledge base. To enable this option, create an Amazon Bedrock knowledge base and use your knowledge base ID for the CloudFormation stack parameter BedrockKnowledgeBaseId.
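For orientation, the knowledge base call that the Bot Fulfillment function makes (the RetrieveAndGenerate API described in the architecture flow) looks roughly like the sketch below. The knowledge base ID and model ARN are placeholders, and this is a hand-written illustration of the API rather than QnABot's actual fulfillment code.

```python
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

KNOWLEDGE_BASE_ID = "KBEXAMPLE123"  # placeholder: your knowledge base ID
MODEL_ARN = (
    "arn:aws:bedrock:us-east-1::foundation-model/"
    "anthropic.claude-3-haiku-20240307-v1:0"  # example model ARN
)

response = agent_runtime.retrieve_and_generate(
    input={"text": "What services are available in AWS for container orchestration?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": KNOWLEDGE_BASE_ID,
            "modelArn": MODEL_ARN,
        },
    },
)

print(response["output"]["text"])       # generated answer grounded in retrieved passages
for citation in response.get("citations", []):
    print(citation)                     # source attributions used for the answer
```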

To configure QnABot to use the knowledge base, refer to Create a knowledge base. The following is a quick setup guide to get started:

  1. Provide your knowledge base details.

Figure 13: Setup Amazon Bedrock Knowledge Base for RAG use cases

  1. Configure your data source based on the available options. For this example, we use Amazon S3 as the data source; note that the bucket name has to be prefixed with qna or QNA.

Figure 14: Setup your RAG data sources for Amazon Knowledge Base

  1. Upload your documents to Amazon S3. For this example, we uploaded the aws-overview.pdf whitepaper to test integration.
  2. Create or choose your vector store to allow Amazon Bedrock to store, update, and manage embeddings.
  3. Sync the data source and use your knowledge base ID for the CloudFormation stack parameter BedrockKnowledgeBaseId.

Figure 15: Complete setting up Amazon Bedrock Knowledge Base for your RAG use cases

In the QnABot Content Designer, you can customize additional settings listed under Text Generation using RAG with the Amazon Bedrock knowledge base.

QnABot on AWS can now answer questions from the AWS whitepapers, such as “What services are available in AWS for container orchestration?” and “Are there any upfront fees with ECS?”

Figure 16: Generate answers from your Amazon Bedrock Knowledge Base (RAG)

Conclusion

Customers expect quick and efficient service from enterprises in today’s fast-paced world. But providing an excellent customer experience can be significantly challenging when the volume of inquiries outpaces the human resources employed to address them. Companies of all sizes can use QnABot on AWS with built-in Amazon Bedrock integrations to provide access to many market-leading FMs, meet specialized lookup needs using RAG to reduce hallucinations, and provide a friendly conversational AI experience. With QnABot on AWS, you can provide high-quality natural text conversations, content management, and multi-turn dialogues. The solution comes with one-click deployment for custom implementation, a content designer for Q&A management, and rich reporting. You can also integrate with contact center systems like Amazon Connect and Genesys Cloud CX. Get started with QnABot on AWS.


About the Author

Ajay Swamy is the Product Leader for Data, ML and Generative AI AWS Solutions. He specializes in building AWS Solutions (production-ready software packages) that deliver compelling value to customers by solving for their unique business needs. Other than QnABot on AWS, he manages Generative AI Application Builder, Enhanced Document Understanding, Discovering Hot Topics using Machine Learning, and other AWS Solutions. He lives with his wife and dog (Figaro), in New York, NY.

Abhishek Patil is a Software Development Engineer at Amazon Web Services (AWS) based in Atlanta, GA, USA. With over 7 years of experience in the tech industry, he specializes in building distributed software systems, with a primary focus on Generative AI and Machine Learning. Abhishek is a primary builder on AI solution QnABot on AWS and has contributed to other AWS Solutions including Discovering Hot Topics using Machine Learning and OSDU® Data Platform. Outside of work, Abhishek enjoys spending time outdoors, reading, resistance training, and practicing yoga.
