Calling All Creators: GeForce RTX 5070 Ti GPU Accelerates Generative AI and Content Creation Workflows in Video Editing, 3D and More

The NVIDIA GeForce RTX 5070 Ti graphics cards — built on the NVIDIA Blackwell architecture — are out now, ready to power generative AI content creation and accelerate creative performance.

GeForce RTX 5070 Ti GPUs feature fifth-generation Tensor Cores with support for FP4, doubling performance and reducing VRAM requirements to run generative AI models.

In addition, the GPU comes equipped with two ninth-generation encoders and a sixth-generation decoder that add support for the 4:2:2 pro-grade color format and increase encoding quality for HEVC and AV1. This combination accelerates video editing workflows, reducing export times by 8x compared with single-encoder GPUs without 4:2:2 support, such as the GeForce RTX 3090.

The GeForce RTX 5070 Ti GPU also includes 16GB of fast GDDR7 memory and 896 GB/sec of total memory bandwidth — a 78% increase over the GeForce RTX 4070 Ti GPU.

The GeForce RTX 5070 Ti GPU — a game changer.

NVIDIA DLSS 4, a suite of neural rendering technologies that uses AI to boost frames per second (fps) and improve image quality, is now available in professional-grade 3D apps like Chaos Vantage. D5 Render also adds DLSS 4 in beta with the new Multi Frame Generation feature to boost frame rates by 3x. 3D rendering software Maxon Redshift also added NVIDIA Blackwell support, providing a 30% performance increase.

The February NVIDIA Studio Driver, with support for the GeForce RTX 5070 Ti GPU, will be ready for download next week. For automatic Studio Driver notifications, download the NVIDIA app.

Use NVIDIA’s product finder to pick up a GeForce RTX 5070 Ti GPU or prebuilt system today. Check back regularly after 6 a.m. PT, as retail partners list their available models. Explore complete specifications.

Ready for the Generative AI Era

Black Forest Labs’ FP4-optimized FLUX.1 [dev] suite of image generation models is now available on Hugging Face.

FP4 is a lower-precision quantization format, similar in spirit to file compression, that decreases model sizes. FLUX.1 [dev] at FP4 requires less than 10GB of VRAM, compared with over 23GB at FP16.

This means the state-of-the-art FLUX.1 [dev] model can run on the GeForce RTX 5070 Ti GPU as well as all GeForce RTX 50 Series GPUs. This is important because, given memory constraints, the model couldn’t otherwise run at FP16 on these GPUs.

On the GeForce RTX 5070 Ti GPU, the FLUX.1 [dev] model can generate images in just over eight seconds on FP4, compared with 20 seconds on FP8 on a GeForce RTX 4070 Ti GPU.
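
As a rough back-of-the-envelope check on those VRAM figures, weight memory scales with bytes per parameter. The sketch below assumes FLUX.1 [dev]’s roughly 12 billion parameters and counts weights only; actual VRAM use also includes text encoders, activations and framework overhead.

# Rough weight-memory estimate for a ~12B-parameter model (weights only, illustrative).
params = 12e9
bytes_per_param = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}

for fmt, nbytes in bytes_per_param.items():
    print(f"{fmt}: ~{params * nbytes / 1e9:.0f} GB of weights")

# FP16: ~24 GB, FP8: ~12 GB, FP4: ~6 GB -- consistent with the >23GB vs. <10GB figures above.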

Versatile Viewports

DLSS 4 is now available in Chaos Vantage and D5 Render in beta — popular professional-grade 3D apps for architects, animators and designers.

Both apps natively support DLSS 4’s improved Super Resolution and Ray Reconstruction models — powered by transformers — to increase image detail and improve stability.

D5 Render also supports DLSS 4’s DLSS Multi Frame Generation to boost frame rates by using AI to generate up to three frames per traditionally rendered frame.

This enables animators to smoothly navigate a scene with multiplied frame rates and render 3D content, even with massive file sizes, at 60 fps or more.

Maxon Redshift — a 3D rendering software that uses GPU acceleration to visualize 3D models, scenes, animations and designs — has released an update to fully harness GeForce RTX 50 Series GPUs, accelerating performance by up to 30%.

Every month brings new creative app updates and optimizations powered by the NVIDIA Studio platform. Follow NVIDIA Studio on Instagram, X and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter.

See notice regarding software product information.

Read More

Evaluating Sample Utility for Data Selection by Mimicking Model Weights

Foundation models are trained on large-scale web-crawled datasets, which often contain noise, biases, and irrelevant information. This motivates the use of data selection techniques, which can be divided into model-free variants — relying on heuristic rules and downstream datasets — and model-based, e.g., using influence functions. The former can be expensive to design and risk introducing unwanted dependencies, while the latter are often computationally prohibitive. Instead, we propose an efficient, model-based approach using the Mimic Score, a new data quality metric that leverages the…Apple Machine Learning Research

Wearable Accelerometer Foundation Models for Health via Knowledge Distillation

Modern wearable devices can conveniently record various biosignals in the many different environments of daily living, enabling a rich view of individual health. However, not all biosignals are the same: high-fidelity biosignals, such as photoplethysmogram (PPG), contain more physiological information, but require optical sensors with a high power footprint. Alternatively, a lower-fidelity biosignal such as accelerometry has a significantly smaller power footprint and is available in almost any wearable device. While accelerometry is widely used for activity recognition and fitness, it is less…Apple Machine Learning Research

Grounding Multimodal Large Language Models in Actions

Multimodal Large Language Models (MLLMs) have demonstrated a wide range of capabilities across many domains, including Embodied AI. In this work, we study how to best ground a MLLM into different embodiments and their associated action spaces, with the goal of leveraging the multimodal world knowledge of the MLLM. We first generalize a number of methods through a unified architecture and the lens of action space adaptors. For continuous actions, we show that a learned tokenization allows for sufficient modeling precision, yielding the best performance on downstream tasks. For discrete actions…Apple Machine Learning Research

Temenos’ Barb Morgan Shares How Chatbots and AI Agents Are Reshaping Customer Service in Banking

In financial services, AI has traditionally been used primarily for fraud detection and risk modeling. With recent advancements in generative AI, the banking industry as a whole is becoming smarter and more intuitive, offering hyper-personalized services and real-time insights for customers.

In the latest episode of the NVIDIA AI Podcast, Barb Morgan, chief product and technology officer at banking and financial services technology company Temenos, shares how AI is reshaping the banking landscape, from enhancing customer experiences to ensuring robust data security.

Morgan explains that AI can tailor financial products and services to customer needs, making interactions more meaningful and relevant. Plus, AI-powered chatbots and digital interfaces can provide 24/7 support, addressing customer queries in real time.

AI adoption has grown significantly in financial services. Notably, the use of generative AI for customer experience, especially through chatbots and virtual assistants, has more than doubled, rising from 25% to 60% over the last year. Learn more in NVIDIA’s fifth annual “State of AI in Financial Services” report.

And see more of the latest technological advancements by registering for NVIDIA GTC, the conference for the era of AI, taking place March 17-21. Temenos will share more insights and examples in the session titled, “Generative AI for Core Banking.”

Time Stamps

08:30 – How AI can help banks process and analyze vast amounts of data to provide deeper insights and predictions.

11:56 – The importance of data management for effective AI implementation.

16:13 – Sustainability in the banking industry, and how AI can help banks and customers track and reduce their carbon footprints.

You Might Also Like… 

Firsthand’s Jon Heller Shares How AI Agents Enhance Consumer Journeys in Retail

Learn how AI agents are transforming the retail landscape by personalizing customer journeys, converting marketing interactions into valuable research data and enhancing the customer experience with hyper-personalized insights and recommendations.

Snowflake’s Baris Gultekin on Unlocking the Value of Data With Large Language Models

See how Snowflake’s AI Data Cloud platform helps enterprises unlock the value of data by transforming it into actionable insights and applications, using large language models.

Sequoia Capital’s Pat Grady and Sonya Huang on Generative AI

Hear how AI is revolutionizing art, design and media by enabling unique, personalized content creation at an unprecedented scale.

Subscribe to the AI Podcast

Get the AI Podcast through Amazon Music, Apple Podcasts, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, SoundCloud, Spotify, Stitcher and TuneIn.

Read More

Build verifiable explainability into financial services workflows with Automated Reasoning checks for Amazon Bedrock Guardrails

Foundation models (FMs) and generative AI are transforming how financial service institutions (FSIs) operate their core business functions. AWS FSI customers, including NASDAQ, State Bank of India, and Bridgewater, have used FMs to reimagine their business operations and deliver improved outcomes.

FMs are probabilistic in nature and produce a range of outcomes. Though these models can produce sophisticated outputs through the interplay of pre-training, fine-tuning, and prompt engineering, their decision-making process remains less transparent than classical predictive approaches. Although emerging techniques such as tool use and Retrieval Augmented Generation (RAG) aim to enhance transparency, they too rely on probabilistic mechanisms—whether in retrieving relevant context or selecting appropriate tools. Even methods such as attention visualization and prompt tracing produce probabilistic insights rather than deterministic explanations.

AWS customers operating in regulated industries such as insurance, banking, payments, and capital markets, where decision transparency is paramount, want to launch FM-powered applications with the same confidence as traditional, deterministic software. To address these challenges, we’re introducing Automated Reasoning checks in Amazon Bedrock Guardrails (preview). Automated Reasoning checks can detect hallucinations, suggest corrections, and highlight unstated assumptions in the response of your generative AI application. More importantly, Automated Reasoning checks can explain why a statement is accurate using mathematically verifiable, deterministic formal logic.

To use Automated Reasoning checks, you first create an Automated Reasoning policy by encoding a set of logical rules and variables from available source documentation. Automated Reasoning checks can then validate that the questions (prompts) and the FM-suggested answers are consistent with the rules defined in the Automated Reasoning policy using sound mathematical techniques. This fundamentally changes the approach to a solution’s transparency in FM applications, adding a deterministic verification for process-oriented workflows common in FSI organizations.

In this post, we explore how Automated Reasoning checks work through various common FSI scenarios such as insurance legal triaging, underwriting rules validation, and claims processing.

What is Automated Reasoning and how does it help?

Automated Reasoning is a field of computer science focused on mathematical proof and logical deduction—similar to how an auditor might verify financial statements or how a compliance officer makes sure that regulatory requirements are met. Rather than using probabilistic approaches such as traditional machine learning (ML), Automated Reasoning tools rely on mathematical logic to definitively verify compliance with policies and provide certainty (under given assumptions) about what a system will or won’t do. Automated Reasoning checks in Amazon Bedrock Guardrails is the first such offering from a major cloud provider in the generative AI space.

The following financial example serves as an illustration.

Consider a basic trading rule: “If a trade is over $1 million AND the client is not tier-1 rated, THEN additional approval is required.”

An Automated Reasoning system would analyze this rule by breaking it down into logical components:

  1. Trade value > $1,000,000
  2. Client rating ≠ tier-1
  3. Result: Additional approval required

When presented with a scenario, the system can provide a deterministic (yes or no) answer about whether additional approval is needed, along with the exact logical path it used to reach that conclusion. For instance:

  • Scenario A – $1.5M trade, tier-2 client → Additional approval required (Both conditions met)
  • Scenario B – $2M trade, tier-1 client → No additional approval (Second condition not met)
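
As a minimal illustration of this determinism, the rule above can be written as a single boolean predicate: the same inputs always produce the same output. This sketch is only an analogy for what the Automated Reasoning system derives formally; the function and variable names are illustrative.

# Illustrative sketch of the trading rule as deterministic logic.
def additional_approval_required(trade_value_usd: float, client_rating: str) -> bool:
    # Rule: trade value > $1,000,000 AND client rating != tier-1 => additional approval.
    return trade_value_usd > 1_000_000 and client_rating != "tier-1"

print(additional_approval_required(1_500_000, "tier-2"))  # Scenario A -> True (approval required)
print(additional_approval_required(2_000_000, "tier-1"))  # Scenario B -> False (no additional approval)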

What makes Automated Reasoning different is its fundamental departure from probabilistic approaches common in generative AI. At its core, Automated Reasoning provides deterministic outcomes where the same input consistently produces the same output, backed by verifiable proof chains that trace each conclusion to its original rules. This mathematical certainty, based on formal logic rather than statistical inference, enables complete verification of possible scenarios within defined rules (and under given assumptions).

FSIs regularly apply Automated Reasoning to verify regulatory compliance, validate trading rules, manage access controls, and enforce policy frameworks. However, it’s important to understand its limitations. Automated Reasoning can’t predict future events or handle ambiguous situations, nor can it learn from new data the way ML models do. It requires precise, formal definition of rules and isn’t suitable for subjective decisions that require human judgment. This is where the combination of generative AI and Automated Reasoning comes into play.

As institutions seek to integrate generative AI into their decision-making processes, Amazon Bedrock Guardrails Automated Reasoning checks provides a way to incorporate Automated Reasoning into the generative AI workflow. Automated Reasoning checks deliver deterministic verification of model outputs against documented rules, complete with audit trails and mathematical proof of policy adherence. This capability makes it particularly valuable for regulated processes where accuracy and governance are essential, such as risk assessment, compliance monitoring, and fraud detection. Most importantly, through its deterministic rule-checking and explainable audit trails, Automated Reasoning checks effectively address one of the major barriers to generative AI adoption: model hallucination, where models generate unreliable or unfaithful responses to the given task.

Using Automated Reasoning checks for Amazon Bedrock in financial services

A great candidate for applying Automated Reasoning in FSI is any scenario where a process or workflow can be translated into a set of logical rules. Hard-coding rules as programmatic functions provides deterministic outcomes, but it becomes complex to maintain and requires highly structured inputs, potentially compromising the user experience. Alternatively, using an FM as the decision engine offers flexibility but introduces uncertainty, because FMs operate as black boxes where the internal reasoning process remains opaque and difficult to audit. In addition, the FM’s potential to hallucinate or misinterpret inputs means that conclusions would require human review to verify accuracy.

Solution overview

This is where Automated Reasoning checks come into play. The following diagram shows a workflow that combines generative AI with Automated Reasoning to get the benefits of both methods.

ARC Policy Diagram

The following steps explain the workflow in detail:

  1. The source document along with the intent instructions are passed to the Automated Reasoning checks service to build the rules and variables and create an Automated Reasoning checks policy.
  2. An Automated Reasoning checks policy is created and versioned.
  3. An Automated Reasoning checks policy and version is associated with an Amazon Bedrock guardrail.
  4. An ApplyGuardrail API call is made with the question and an FM response to the associated Amazon Bedrock guardrail.
  5. The Automated Reasoning checks model is triggered with the inputs from the ApplyGuardrail API, building a logical representation of the input and FM response.
  6. An Automated Reasoning check is completed based on the created rules and variables from the source document and the logical representation of the inputs.
  7. The results of the Automated Reasoning check are shared with the user along with what rules, variables, and variable values were used in its determination, plus suggestions on what would make the assertion valid.
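
As a minimal sketch of step 4, the ApplyGuardrail call can be made with the boto3 bedrock-runtime client. The guardrail identifier and version below are placeholders, and the content qualifiers used to separate the question from the FM answer should be confirmed against the Amazon Bedrock documentation.

import boto3

# Placeholder identifiers for a guardrail with an Automated Reasoning policy attached.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-west-2")

response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier="my-guardrail-id",  # hypothetical value
    guardrailVersion="1",
    source="OUTPUT",  # validating an FM response rather than a user prompt
    content=[
        {"text": {"text": "Is the risk acceptable for this driver profile?", "qualifiers": ["query"]}},
        {"text": {"text": "Driver has unacceptable risk.", "qualifiers": ["guard_content"]}},
    ],
)

# The assessments contain the Automated Reasoning findings (applied rules, extracted
# variables, and suggestions) behind the validation result.
print(response["assessments"])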

Prerequisites

Before you build your first Automated Reasoning check for Amazon Bedrock Guardrails, make sure you have the following:

  • An AWS account that provides access to AWS services, including Amazon Bedrock.
  • The new Automated Reasoning checks safeguard is available today in preview in Amazon Bedrock Guardrails in the US West (Oregon) AWS Region. Make sure that you have access to the Automated Reasoning checks preview within Amazon Bedrock. To request access to the preview today, contact your AWS account team. To learn more, visit Amazon Bedrock Guardrails.
  • An AWS Identity and Access Management (IAM) user set up for the Amazon Bedrock API and appropriate permissions added to the IAM user

Solution walkthrough

To build an Automated Reasoning check for Amazon Bedrock Guardrails, follow these steps:

  1. On the Amazon Bedrock console, under Safeguards in the navigation pane, select Automated Reasoning.
  2. Choose Create policy, as shown in the following screenshot.

Create Policy Console view

  3. In the Create policy section, shown in the following screenshot, enter the following inputs:
  • Name – Name of the Automated Reasoning checks policy.
  • Description – Description of the Automated Reasoning checks policy.
  • Source content – The document to create the rules and variables from. You need to upload a document in PDF format.
  • Intent – Instructions on how to approach the creation of the rules and variables.

Create Policy Form

The following sections dive into some example uses of Automated Reasoning checks.

Automated Reasoning checks for insurance underwriting rules validation

Consider a scenario for an auto insurance company’s underwriting rules validation process.

Underwriting is a fundamental function within the insurance industry, serving as the foundation for risk assessment and management. Underwriters are responsible for evaluating insurance applications, determining the level of risk associated with each applicant, and making decisions on whether to accept or reject the application based on the insurer’s guidelines and risk appetite.

One of the key challenges in underwriting is the process of rule validations, which is the verification that the information provided in the documents adheres to the insurer’s underwriting guidelines. This is a complex task that deals with unstructured data and varying document formats.

This example uses an auto insurance company’s underwriting rules guideline document. A typical underwriting manual can have rules to define unacceptable drivers, unacceptable vehicles, and other definitions, as shown in the following example:

Unacceptable drivers

  • Drivers with 3 or more DUIs.
  • For new business or additional drivers, drivers with 3 or more accidents, regardless of fault.
  • Drivers with more than 2 major violations.
  • Drivers with more than 3 chargeable accidents.
  • Military personnel not stationed in California.
  • Drivers 75 and older without a completed company Physician’s Report form.
  • Any driver disclosing physical or mental conditions that might affect the driver’s ability to safely operate a motor vehicle may be required to complete a company Physician’s Report form to verify their ability to drive. In addition, if in the course of an investigation we discover an undisclosed medical concern, a completed company Physician’s Report form will be required.
  • Any unlisted or undisclosed driver that is a household member or has regular use of a covered vehicle.

Unacceptable Vehicles

  • Vehicles principally garaged outside the state of California.
  • Vehicles with more or less than 4 wheels.
  • Vehicles with cargo capacity over 1 ton.
  • Motor vehicles not eligible to be licensed for highway use.
  • Taxicabs, limousines, emergency vehicles, escort vehicles, and buses.
  • Vehicles used for pickup or delivery of goods at any time including pizzas, magazines, and newspapers.
  • Vehicles used for public livery, conveyance, and company fleets.
  • Vehicles made available to unlisted drivers for any use including business use such as sales, farming, or artisan use (for example, pooled vehicles).
  • Vehicles used to transport nursery or school children, migrant workers, or hotel or motel guests.
  • Vehicles with permanent or removable business-solicitation logos or advertising.
  • Vehicles owned or leased by a partnership or corporation.
  • Step vans, panel vans, dump trucks, flatbed trucks, amphibious vehicles, dune buggies, motorcycles, scooters, motor homes, travel trailers, micro or kit cars, antique or classic vehicles, custom, rebuilt, altered or modified vehicles.
  • Physical damage coverage for vehicles with an ISO symbol of more than 20 for model year 2010 and earlier or ISO symbol 41 for model year 2011 and later.
  • Liability coverage for vehicles with an ISO symbol of more than 25 for vehicles with model year 2010 and earlier or ISO symbol 59 for model year 2011 and later.
  • Salvaged vehicles for comprehensive and collision coverage. Liability only policies for salvaged vehicles are acceptable.
  • Physical damage coverage for vehicles over 15 years old for new business or for vehicles added during the policy term.

For this example, we entered the following inputs for the Automated Reasoning check:

  • Name – Auto Policy Rule Validation.
  • Description – A policy document outlining the rules and criteria that define unacceptable drivers and unacceptable vehicles.
  • Source content – A document describing the company’s underwriting manual and guidelines. You can copy and paste the example provided and create a PDF document. Upload this document as your source content.
  • Intent – Create a logical model for auto insurance underwriting policy approval. An underwriter associate will provide the driver profile and type of vehicle and ask whether a policy can be written for this potential customer. The underwriting guideline document uses a list of unacceptable driver profiles and unacceptable vehicles. Make sure to create a separate rule for each unacceptable condition listed in the document, and create a variable to capture whether the driver is an acceptable risk or not. A customer that doesn’t violate any rule is acceptable. Here is an example: ” Is the risk acceptable for a driver with the following profile? A driver has 4 car accidents, uses the car as a Uber-Taxi, and has 3 DUIs”. The model should determine: “The driver has unacceptable risks. Driving a taxi is an unacceptable risk. The driver has multiple DUIs.”

The model creates rules and variables from the source content. Depending on the size of the source content, this process may take more than 10 minutes.

The process of rule and variable creation is probabilistic in nature, and we highly recommend that you edit the created rules and variables to align better with your source content.

After the process is complete, a set of rules and variables will be created and can be reviewed and edited.

The following screenshots show an extract of the rules and variables created by the Automated Reasoning checks feature. The actual policy will have more rules and variables that can be viewed in Amazon Bedrock, but we’re not showing them here due to space limits.

Rules Underwriting Auto

The Automated Reasoning checks policy must be associated with an Amazon Bedrock guardrail. For more information, refer to Create a guardrail.

Create guardrail Console view

Test the policy

To test this policy, we considered a hypothetical scenario with an FM-generated response to validate.

Question: Is the risk acceptable for a driver with the following profile? Has 2 chargeable accidents in a span of 10 years. Driving records show a negligent driving charge and one DUI.

Answer: Driver has unacceptable risk. Number of chargeable accidents count is 2.

After entering the question and answer inputs, choose Submit, as shown in the following screenshot.

The Automated Reasoning check returned as Invalid, as shown in the following screenshot. The components shown in the screenshot are as follows:

  • Validation result – This is the Automated Reasoning checks validation output. This conclusion is reached by computing the extracted variable assignments against the rules defined in the Automated Reasoning policy.
  • Applied rules – These are the rules that were used to reach the validation result for this finding.
  • Extracted variables – This list shows how Automated Reasoning checks interpreted the input Q&A and used it to assign values to variables in the Automated Reasoning policy. These variable values are computed against the rules in the policy to reach the validation result.
  • Suggestions – When the validation result is invalid, this list shows a set of variable assignments that would make the conclusion valid. When the validation result is valid, this list shows a list of assignments that are necessary for the result to hold; these are unstated assumptions in the answer. You can use these values alongside the rules to generate a string that provides feedback to your FM.
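
For example, once the applied rules and suggestions have been extracted from a finding, they can be folded into a corrective prompt for the FM. The dictionary shape below is a hypothetical placeholder for parsed output, not the exact API response structure.

# Hypothetical parsed finding (placeholder shape, not the exact API response).
finding = {
    "result": "INVALID",
    "applied_rules": ["A driver is an acceptable risk if and only if their violation count is at most 2."],
    "suggestions": {"is_acceptable_risk": True},
}

def build_feedback(finding: dict) -> str:
    # Turn applied rules and suggested variable assignments into feedback for the FM.
    rules = "; ".join(finding["applied_rules"])
    fixes = ", ".join(f"{name} = {value}" for name, value in finding["suggestions"].items())
    return (
        f"Automated Reasoning checks marked the previous answer {finding['result']}. "
        f"Applied rules: {rules} "
        f"Regenerate the answer so that: {fixes}."
    )

print(build_feedback(finding))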

validation result Invalid case - Underwriting Auto

The model evaluated the answer against the Automated Reasoning logical rules, and in this scenario the following rule was triggered:

“A driver is considered an acceptable risk if and only if their number of violations is less than or equal to 2.”

The Extracted variables value for violation_count is 2, and the is_acceptable_risk variable was set to false, which is wrong according to the Automated Reasoning logic. Therefore, the answer isn’t valid.

The suggested value for is_acceptable_risk is true.
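
Read as a deterministic check, the triggered rule and the extracted values play out as follows. This is an illustrative sketch only; the actual rule lives in the Automated Reasoning policy generated from the underwriting manual.

# Illustrative sketch of the triggered underwriting rule.
def is_acceptable_risk(violation_count: int) -> bool:
    return violation_count <= 2

# Extracted variables: violation_count = 2, so is_acceptable_risk must be True.
# The FM answer asserted the driver was an unacceptable risk (False), so the check is Invalid.
print(is_acceptable_risk(2))  # True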

Here is an example with a revised answer.

Question: Is the risk acceptable for a driver with the following profile? Has 2 chargeable accidents in a span of 10 years. Driving records show a negligent driving charge and one DUI.

Answer: Driver has acceptable risk.

Because no rules were violated, the Automated Reasoning logic determines the assertion is Valid, as shown in the following screenshot.

validation result Valid case - Underwriting Auto

Automated Reasoning checks for insurance legal triaging

For the next example, consider a scenario where an underwriter is evaluating whether a long-term care (LTC) claim requires legal intervention.

For this example, we entered the following inputs:

  • Name – Legal LTC Triage
  • Description – A workflow document outlining the criteria, process, and requirements for referring LTC claims to legal investigation
  • Source content – A document describing your LTC legal triaging process. You need to upload your own legal LTC triage document in PDF format. This document should outline the criteria, process, and requirements for referring LTC claims to legal investigation.
  • Intent – Create a logical model that validates compliance requirements for LTC claims under legal investigation. The model must evaluate individual policy conditions including benefit thresholds, care durations, and documentation requirements that trigger investigations. It should verify timeline constraints, proper sequencing of actions, and policy limits. Each requirement must be evaluated independently, where a single violation results in noncompliance. For example: “A claim has two care plan amendments within 90 days, provider records covering 10 months, and a review meeting at 12 days. Is this compliant?” The model should determine: “Not compliant because: multiple amendments require investigation, provider records must cover 12 months, and review meetings must be within 10 days.”

The process of rule and variable creation is probabilistic in nature, and we highly recommend that you edit the created rules and variables to align better with your source content.

After the process is complete, a set of rules and variables will be created. To review and edit a rule or variable, select the more options icon under Actions and then choose Edit. The following screenshots show the Rules and Variables screens.

Legal LTC Triage Rules

Legal LTC Triage Variables

Test the policy

From here, we can test our Automated Reasoning checks in the test playground. Note that to do this, the Automated Reasoning checks policy must be associated with an Amazon Bedrock guardrail. To test this policy, we posed the following hypothetical scenario with an FM-generated response for the Automated Reasoning checks policy to validate.

Question: A claim with care duration of 28 months, no documentation irregularities, and total projected benefit value of $200,000 has been submitted. Does this require legal investigation?

Answer: This claim does not require legal investigation because the total projected benefit value is below $250,000 and there are no documentation irregularities.

Legal LTC Triage Playground Console

After completing the check, the Automated Reasoning tool produces the validation result, which for this example was Invalid, as shown in the following screenshot. This means the FM generated response violates one or more rules from the generated Automated Reasoning checks policy.

Legal LTC Triage Invalid result

The rule that was triggered was the following:

“A claim is flagged for legal investigation if and only if there are documentation irregularities, or the total projected benefit exceeds $250,000, or the care duration is more than 24 months, or the number of care plan amendments within a 90-day period is greater than 1.”

Based on our input the model determined our variable inputs to be:

  • total_projected_benefit (Real number) – 200,000 – The total projected monetary value of benefits for a long-term care claim
  • flag_for_legal_investigation (Boolean) – FALSE – Indicates whether a claim should be flagged for legal investigation based on the specified criteria
  • has_documentation_irregularities (Boolean) – FALSE – Presence of irregularities in the care provider’s documentation
  • care_duration_months (Integer) – 28 – The length of time for which care is provided or expected to be provided

From this, we can determine exactly where our rule was found invalid: our input had care_duration_months > 24, yet flag_for_legal_investigation was set to FALSE, which invalidated the rule.
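
Expressed as a deterministic predicate, the triggered rule makes the required flag clear. This is an illustrative sketch of the rule quoted above, not the generated policy itself; the amendment count is assumed to be zero for this scenario.

# Illustrative sketch of the triggered legal-investigation rule.
def flag_for_legal_investigation(has_documentation_irregularities: bool,
                                 total_projected_benefit: float,
                                 care_duration_months: int,
                                 care_plan_amendments_90_days: int) -> bool:
    return (has_documentation_irregularities
            or total_projected_benefit > 250_000
            or care_duration_months > 24
            or care_plan_amendments_90_days > 1)

# Extracted values: no irregularities, $200,000 projected benefit, 28 months of care.
print(flag_for_legal_investigation(False, 200_000, 28, 0))  # True, yet the answer claimed no investigation was needed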

In the suggestions, we observe that for our original Q&A to be correct, we’d have to have flag_for_legal_investigation as TRUE, along with the total_projected_benefit being 200,000.

We can validate whether the suggestion will yield a VALID response by adjusting our answer to the original question to the following.

“This claim does require legal investigation even though the total projected benefit value is below $250,000 and there are no documentation irregularities.”

Legal LTC Triage Valid results

As shown in the following screenshot, no rules were triggered. However, what changed is our extracted variables and our suggestions.

Legal LTC Triage Extracted Variables

Now that the assertion is valid, the other requirements appear as unstated assumptions that must hold, according to our rules, for this to be a VALID response. We can use the suggestions to give the end user a more granular explanation in our response.

Automated Reasoning checks for insurance claims processing

The final example demonstrates an Automated Reasoning checks example for claims processing.

Claims processing is another fundamental function within insurance companies, and it’s the process used by policy holders to exercise their policy to get compensation for an event (a car accident, for example). Claims processors work to validate the claim and the beneficiaries, determine the amount of compensation, and work to settle the claim. This process includes verification of the people involved, proof of the incident, and a host of legal guidelines that they’re required to follow.

One of the key issues in claims processing is validating the claim and the parties involved. In this example, we use Automated Reasoning checks to provide recommendations to individuals attempting to file a claim in the case of a house fire.

As in the previous examples, we create an Automated Reasoning guardrail policy as follows:

  • Name – Home Owners Insurance Claims Policy
  • Description – This policy is used for the validation of homeowners’ insurance claims and includes the processes and procedures needed to file a claim.
  • Source content – A document describing the company’s homeowners’ insurance claims process. This document should outline the necessary processes and procedures needed to file a claim.
  • Intent – Create a logical model that validates the requirements for homeowner claims. The model must evaluate individual policy conditions, including benefit thresholds, durations, and documentation requirements needed for the creation of a claim. It should verify timeline constraints, proper sequencing of actions, and policy limits. Each requirement must be evaluated independently, where any single violation results in noncompliance. For example: “I had a fire at my house. What documents do I need in order to file a claim?” The model should determine: “You will need to provide a fire department report, police report, photos, and your policy number.”

The following screenshots show an extract of the rules and variables created by the Automated Reasoning checks feature. The actual policy will have more rules and variables that can be viewed in Amazon Bedrock, but we’re not showing them due to space limits.

Rules Policy Claims Processing

Variables Policy Claims Processing

Test the policy

To test this policy, we considered a hypothetical scenario with an FM-generated response to validate.

Question: I had a fire at my house. What documents do I need to file a claim?

Answer: You provide a report from the fire department, a police report, photos, and policy number.

In this case, the Automated Reasoning check returned as Valid, as shown in the following screenshot. Automated Reasoning checks validated that the answer is correct and aligns to the provided claims processing document.

Valid Result - Rules Policy Claims Processing

Conclusion

In this post, we demonstrated that Automated Reasoning checks solve a core challenge with FMs: the ability to verifiably demonstrate the reasoning behind decision-making. By incorporating Automated Reasoning checks into our workflow, we were able to validate a complex triage scenario and determine the exact reason why a decision was made. Automated Reasoning is deterministic, meaning that with the same ruleset, same variables, and same input and FM response, the determination will be reproducible. This means you can reproduce findings for compliance or regulatory reporting.

Automated Reasoning checks in Amazon Bedrock Guardrails empowers financial service professionals to work more effectively with generative AI by providing deterministic validation of FM responses for decision-oriented documents. This enhances human decision-making by reducing hallucination risk and creating reproducible, explainable safeguards that help professionals better understand and trust FM-generated insights.

The new Automated Reasoning checks safeguard is available today in preview in Amazon Bedrock Guardrails in the US West (Oregon) AWS Region. We invite you to build your first Automated Reasoning checks. For detailed guidance, visit our documentation and code examples in our GitHub repo. Please share your experiences in the comments or reach out to the authors with questions. Happy building!


About the Authors

Alfredo Castillo is a Senior Solutions Architect at AWS, where he works with Financial Services customers on all aspects of internet-scale distributed systems, and specializes in Machine Learning, Natural Language Processing, Intelligent Document Processing, and GenAI. Alfredo has a background in both electrical engineering and computer science. He is passionate about family, technology, and endurance sports.

Andy Hall is a Senior Solutions Architect with AWS and is focused on helping Financial Services customers with their digital transformation to AWS. Andy has helped companies to architect, migrate, and modernize large-scale applications to AWS. Over the past 30 years, Andy has led efforts around Software Development, System Architecture, Data Processing, and Development Workflows for large enterprises.

Raj Pathak is a Principal Solutions Architect and Technical advisor to Fortune 50 and Mid-Sized FSI (Banking, Insurance, Capital Markets) customers across Canada and the United States. Raj specializes in Machine Learning with applications in Generative AI, Natural Language Processing, Intelligent Document Processing, and MLOps.

Read More

Best practices for Amazon SageMaker HyperPod task governance

At AWS re:Invent 2024, we launched a new innovation in Amazon SageMaker HyperPod on Amazon Elastic Kubernetes Service (Amazon EKS) that enables you to run generative AI development tasks on shared accelerated compute resources efficiently and reduce costs by up to 40%. Administrators can use SageMaker HyperPod task governance to govern allocation of accelerated compute to teams and projects, and enforce policies that determine the priorities across different types of tasks. The resulting improvement in utilization of compute resources enables organizations to focus on accelerating their generative AI innovation and time to market, instead of spending time coordinating resource allocation and continuously replanning their generative AI development tasks.

In this post, we provide best practices to maximize the value of SageMaker HyperPod task governance and make the administration and data science experiences seamless. We also discuss common governance scenarios when administering and running generative AI development tasks.

Prerequisites

To get started with SageMaker HyperPod task governance on an existing SageMaker HyperPod cluster orchestrated by Amazon EKS, make sure you uninstall any existing Kueue installations, and have a Kubernetes cluster running version 1.30+.

Administration experience

Administrators are the first persona interacting with SageMaker HyperPod task governance. They are responsible for managing the cluster compute allocation according to the organization’s priorities and goals.

Managing compute

The first step to managing capacity across teams is to set up compute allocations. When setting up a compute allocation, keep in mind the following considerations:

  • What type of tasks does this team typically run?
  • Does this team constantly run tasks and require reserved capacity?
  • What is this team’s priority relative to other teams?

When setting up a compute allocation, an administrator sets the team’s fair-share weight, which provides relative prioritization compared with other teams when vying for the same idle compute. A higher weight enables a team to access unutilized resources within shared capacity sooner. As a best practice, set the fair-share weight higher for teams that will require access to capacity sooner than other teams.

After the fair-share weight is set, the administrator then sets up the quota and borrowing strategy. Quota determines the allocation per instance type within the cluster’s instance groups. Borrowing strategy determines whether a team will share or reserve their allotted capacity. To enforce proper quota management, the total reserved quota should not surpass the cluster’s available capacity for that resource. For instance, if a cluster comprises 20 ml.c5.2xlarge instances, the cumulative quota assigned to teams should remain under 20.

If the compute allocations for teams allow for “Lend and Borrow” or “Lend,” idle capacity is shared between those teams. For example, if Team A has a quota of 6 but is using only 2 for its tasks, Team B has a quota of 5 and is using 4 for its tasks, and a task requiring 4 resources is submitted to Team B, then 3 will be borrowed from Team A based on Team A’s “Lend and Borrow” setting. If any team’s compute allocation setting is set to “Don’t Lend,” the team will not be able to borrow any additional capacity beyond its reserved capacity.
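
The arithmetic behind that example looks like the following sketch. This is an illustration only; SageMaker HyperPod task governance enforces quotas and borrowing for you through Kueue rather than through code you write.

# Illustrative bookkeeping for the Team A / Team B borrowing example.
cluster_capacity = 20  # e.g., 20 instances of a given type

teams = {
    "team-a": {"quota": 6, "in_use": 2, "strategy": "Lend and Borrow"},
    "team-b": {"quota": 5, "in_use": 4, "strategy": "Lend and Borrow"},
}

# Best practice: cumulative reserved quota must stay at or below cluster capacity.
assert sum(team["quota"] for team in teams.values()) <= cluster_capacity

# Team B submits a task needing 4 instances: 1 comes from its own unused quota,
# and the remaining 3 are borrowed from Team A's idle capacity.
needed = 4
own_idle = teams["team-b"]["quota"] - teams["team-b"]["in_use"]             # 1
available_to_borrow = teams["team-a"]["quota"] - teams["team-a"]["in_use"]  # 4
borrowed = min(max(0, needed - own_idle), available_to_borrow)              # 3
print(f"Borrowed from Team A: {borrowed}")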

To maintain a pool of resources that all teams can borrow from, you can set up a dedicated team with resources that bridge the gap between the other teams’ allocations and the total cluster capacity. Make sure that this cumulative resource allocation includes the appropriate instance types and doesn’t exceed the total cluster capacity. To make sure that these resources can be shared among teams, set the participating teams’ compute allocations to “Lend and Borrow” or “Lend” for this common pool of resources. In addition, whenever new teams are introduced, quota allocations change, or cluster capacity changes, revisit the quota allocations of all teams to make sure the cumulative quota remains at or below cluster capacity.

After compute allocations have been set, the administrator also needs to set a cluster policy, which comprises two components: task prioritization and idle compute allocation. Administrators set up task prioritization, which determines the priority level for tasks running in a cluster. Next, an administrator sets the idle compute allocation setting to either “first come, first serve,” in which tasks are not prioritized, or “fair-share allocation,” in which idle compute is distributed to teams based on their fair-share weight.

Observability

To get started with observability, install the Amazon CloudWatch Observability add-on with Kueue metrics selected. The SageMaker HyperPod task governance dashboard provides a single-pane-of-glass view of cluster utilization across teams. At present, you can view running PyTorch, TensorFlow, and MPI tasks. Administrators can analyze the graphs within the dashboard to understand equity in resource sharing and utilization of resources.

To view utilization of resources, users can see the following dashboard showing GPU and vCPU utilization. These graphs inform administrators where teams can further maximize their GPU utilization. In this example, administrators observe GPU utilization around 52%.

Administrators have a real-time view of instance utilization as tasks run or are moved to pending during preemption. In this example, the ML engineering team is borrowing 5 GPUs for their training task.

With SageMaker HyperPod, you can additionally set up observability tools of your choice. In our public workshop, we have steps on how to set up Amazon Managed Prometheus and Grafana dashboards.

Data scientist experience

Data scientists are the second persona interacting with SageMaker HyperPod clusters. Data scientists are responsible for the training, fine-tuning, and deployment of models on accelerated compute instances. It’s important to make sure data scientists have the necessary capacity and permissions when interacting with clusters of GPUs.

Access control

When working with SageMaker HyperPod task governance, data scientists will assume their specific role. Each data science team will need to have their own role and associated role-based access control (RBAC) on the cluster. RBAC prevents data scientists from submitting tasks to teams in which they do not belong. For more information about data science role permissions, see AWS Identity and Access Management for SageMaker HyperPod. As a best practice, administrators should limit data scientists according to the principle of least privilege. After roles and access entries are set up, data scientists can assume their associated AWS Identity and Access Management (IAM) role to submit tasks to corresponding namespaces. It’s important to note that users interacting with the console dashboard who didn’t create the associated EKS cluster will need to have their role added to the AccessEntry list for the EKS cluster.

Submitting tasks

There are two ways to submit tasks on Amazon EKS orchestrated SageMaker HyperPod clusters: kubectl and the SageMaker HyperPod CLI. With both options, data scientists need to reference their team’s namespace and task priority class in the task configuration file in order to use their allocated quota with appropriate prioritization. If the user doesn’t specify a priority class, SageMaker HyperPod task governance will automatically assume the lowest priority.

In the following code snippet, we show the labels required in a kubectl manifest file for the researchers namespace with inference priority. Priority classes will have -priority appended to the name set in the cluster policy. For further guidance on submitting tasks to SageMaker HyperPod task governance, follow the documentation here.

metadata:
    name: job-name
    namespace: hyperpod-ns-researchers
    labels:
        kueue.x-k8s.io/queue-name: hyperpod-ns-researchers-localqueue
        kueue.x-k8s.io/priority-class: inference-priority

HyperPod CLI

The HyperPod CLI was created to abstract the complexities of working with kubectl and enable developers using SageMaker HyperPod to iterate faster with custom commands. HyperPod CLI v2.0.0 introduces a new default scheduler type with autofill commands, auto discovery of namespaces, improved cluster and task management features, and enhanced visibility into task priorities and accelerator quota allocations. Data scientists can use the new HyperPod CLI to quickly submit tasks, iterate, and experiment in their generative AI development lifecycle.

Sample commands

The following is a short reference guide for helpful commands when interacting with SageMaker HyperPod task governance:

  • Describing cluster policy with the AWS CLI – This AWS Command Line Interface (AWS CLI) command is useful to view the cluster policy settings for your cluster.
  • List compute quota allocations with the AWS CLI – This AWS CLI command is useful to view the different teams and set up task governance and their respective quota allocation settings.
  • HyperPod CLI – The HyperPod CLI abstracts common kubectl commands used to interact with SageMaker HyperPod clusters, such as submitting, listing, and cancelling tasks. Refer to the HyperPod CLI documentation for a full list of commands.
  • kubectl – You can also use kubectl to interact with task governance with the following example commands:
    • kubectl get pytorchjobs -n hyperpod-ns-<team-name> – This command shows you the PyTorch tasks running in the specified team namespace.
    • kubectl get workloads -n hyperpod-ns-<team-name> / kubectl describe workload <workload-name> -n hyperpod-ns-<team-name> – These commands show the workloads running in your cluster per namespace and provide detailed reasonings on Kueue Admission. You can use these commands to answer questions such as “Why was my task preempted?” or “Why did my task get admitted?”

Common scenarios

SageMaker HyperPod task governance enables allocating compute quota to teams, increasing utilization of compute resources, reducing costs, and accelerating waiting tasks by priority, which in turn accelerates time to market. To relate these value propositions to real-world scenarios, we will talk about an enterprise and a startup situation.

Enterprises have different teams working towards various business goals, each with budgets that limit their compute access. To maximize resource utilization within budget constraints, SageMaker HyperPod task governance allows enterprises to allocate compute quotas to teams for artificial intelligence and machine learning (AI/ML) tasks. When teams use up their allocation, they can access idle compute from other teams to accelerate waiting tasks, providing optimal resource utilization across the organization.

Startups aim to maximize compute resource utilization while achieving timely allocation for high-priority tasks. SageMaker HyperPod task governance’s prioritization feature allows you to assign priorities to different task types, such as prioritizing inference over training. This makes sure that high-priority tasks receive necessary compute resources before lower-priority ones, optimizing overall resource allocation.

Now we will walk you through two common scenarios for users interacting with SageMaker HyperPod task governance.

Scenario 1: Enterprise

In the first scenario, we have an enterprise company who wants to manage compute allocations to optimize for cost. This company has five teams sharing 80 GPUs, with the following configuration:

  • Team 1 – Compute allocation: 20; Strategy: Don’t Lend
  • Team 2 – Compute allocation: 20; Strategy: Don’t Lend
  • Team 3 – Compute allocation: 5; Strategy: Lend & Borrow at 150%; Fair-share weight: 100
  • Team 4 – Compute allocation: 10; Strategy: Lend & Borrow at 100%; Fair-share weight: 75
  • Team 5 – Compute allocation: 25; Strategy: Lend & Borrow at 50%; Fair-share weight: 50

This sample configuration reserves capacity to teams that will be constantly using instances for high-priority tasks. In addition, a few teams have the option to lend and borrow idle compute from other teams—this improves cost optimization by reserving capacity as needed and allowing non-consistent workloads to run using available idle compute with prioritization.

Scenario 2: Startup

In the second scenario, we have a startup customer who wants to provide equitable compute allocation for members of their engineering and research teams. This company has three teams sharing 15 GPUs:

  • Team 1 (ML engineering) – Compute allocation: 6; Strategy: Lend & Borrow at 50%; Fair-share weight: 100
  • Team 2 (Researchers) – Compute allocation: 5; Strategy: Lend & Borrow at 50%; Fair-share weight: 100
  • Team 3 (Real-time chatbot) – Compute allocation: 4; Strategy: Don’t Lend; Fair-share weight: 100

This sample configuration promotes equitable compute allocation across the company because all teams have the same fair-share weight and are able to preempt tasks with lower priority.

Conclusion

In this post, we discussed best practices for efficient use of SageMaker HyperPod task governance. We also provided certain patterns that you can adopt while administering generative AI tasks, whether you are aiming to optimize for cost or optimize for equitable compute allocation. To get started with SageMaker HyperPod task governance, refer to the Amazon EKS Support in Amazon SageMaker HyperPod workshop and SageMaker HyperPod task governance.


About the Author

Nisha Nadkarni is a Senior GenAI Specialist Solutions Architect at AWS, where she guides companies through best practices when deploying large scale distributed training and inference on AWS. Prior to her current role, she spent several years at AWS focused on helping emerging GenAI startups develop models from ideation to production.

Chaitanya Hazarey leads software development for SageMaker HyperPod task governance at Amazon, bringing extensive expertise in full-stack engineering, ML/AI, and data science. As a passionate advocate for responsible AI development, he combines technical leadership with a deep commitment to advancing AI capabilities while maintaining ethical considerations. His comprehensive understanding of modern product development drives innovation in machine learning infrastructure.

 Kareem Syed-Mohammed is a Product Manager at AWS. He is focused on compute optimization and cost governance. Prior to this, at Amazon QuickSight, he led embedded analytics, and developer experience. In addition to QuickSight, he has been with AWS Marketplace and Amazon retail as a Product Manager. Kareem started his career as a developer for call center technologies, Local Expert and Ads for Expedia, and management consultant at McKinsey.

Read More

Introducing Muse: Our first generative AI model designed for gameplay ideation

Today, the journal Nature is publishing our latest research, which introduces the first World and Human Action Model (WHAM). The WHAM, which we’ve named “Muse,” is a generative AI model of a video game that can generate game visuals, controller actions, or both.

The paper in Nature offers a detailed look at Muse, which was developed by the Microsoft Research Game Intelligence and Teachable AI Experiences (Tai X) teams in collaboration with Xbox Game Studios’ Ninja Theory. Simultaneously, to help other researchers explore these models and build on our work, we are open sourcing the weights and sample data and making the executable available for the WHAM Demonstrator—a concept prototype that provides a visual interface for interacting with WHAM models and multiple ways of prompting the models. Developers can learn and experiment with the weights, sample data, and WHAM Demonstrator on Azure AI Foundry.

In our research, we focus on exploring the capabilities that models like Muse need to effectively support human creatives. I’m incredibly proud of our teams and the milestone we have achieved, not only by showing the rich structure of the game world that a model like Muse can learn, as you see in the video demo below, but also, and even more importantly, by demonstrating how to develop research insights to support creative uses of generative AI models.

Generated gameplay examples

A 10-second video generated by Muse. The character Gizmo from the game Bleeding Edge is attacking an enemy player, jumps forward, and then turns around.
A 10-second video generated by Muse. The character Daemon from the game Bleeding Edge destroys a Cannister and then collects the Power Cell within. Daemon then mounts their hoverboard and moves towards another set of Cannisters to destroy them.
A 10-second video generated by Muse. The character Gizmo from the game Bleeding Edge is moving forward on a hoverboard towards a group of enemies.
A 10-second video generated by Muse. The character Zero Cool from the game Bleeding Edge is moving forward up a set of stairs towards a group of enemies. They then activate their ability to jump up to a higher platform.
A 10-second video generated by Muse. The character Nidhoggr from the game Bleeding Edge is navigating through the game map.
A 10-second video generated by Muse. The character Makuto from the game Bleeding Edge is being healed by an ally whilst they dash forwards.
A 10-second video generated by Muse. The character Miko from the game Bleeding Edge is on a hoverboard moving towards a group of Cannisters.
A 10-second video generated by Muse. The character Buttercup from the game Bleeding Edge is attacking players from the opposing team.
A 10-second video generated by Muse. The character Makuto from the game Bleeding Edge is fleeing from a fight with enemy players.
Example gameplay sequences generated by Muse (based on WHAM-1.6B) demonstrate that our model can generate complex gameplay sequences that are consistent over several minutes. All examples shown here were generated by prompting the model with 10 initial frames (1 second) of human gameplay and the controller actions of the whole play sequence. Muse is used in “world model mode” meaning that it is used to predict how the game will evolve from the initial prompt sequence. The more closely the generated gameplay sequence resembles the actual game, the more accurately Muse has captured the dynamics of that game.
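
Conceptually, world model mode is an autoregressive rollout: seed the model with a short window of real frames plus the full controller-action sequence, then let it predict every subsequent frame from what came before. The sketch below is purely illustrative; the model interface shown is hypothetical, not the published WHAM code or API.

# Purely illustrative sketch of "world model mode" (hypothetical interface, not the WHAM API).
def rollout_world_model(model, prompt_frames, actions):
    # prompt_frames: the 10 real frames (1 second) used as the prompt.
    # actions: controller inputs for the whole play sequence.
    frames = list(prompt_frames)
    for step in range(len(prompt_frames), len(actions)):
        next_frame = model.predict_next_frame(frames, actions[: step + 1])  # hypothetical call
        frames.append(next_frame)
    return frames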

What motivated this research?

As we release our research insights and model today, I keep thinking back to how this all started.  There was a key moment back in December 2022 that I remember clearly. I had recently returned from maternity leave, and while I was away the machine learning world had changed in fundamental ways. ChatGPT had been publicly released, and those who had tried it were in awe of OpenAI’s technical achievements and the model’s capabilities. It was a powerful demonstration of what transformer-based generative models could do when trained on large amounts of (text) data. Coming back from leave at that moment, the key question on my mind was, “What are the implications of this achievement for our team’s work at the intersection of artificial intelligence and video games?”

A new research opportunity enabled by data

In our team, we had access to a very different source of data. For years, we had collaborated with Xbox Game Studios’ Ninja Theory (based in Cambridge, UK, just like our research team) to collect gameplay data from Bleeding Edge, their 2020 Xbox game. Bleeding Edge is a 4-versus-4 game where all games are played online, and matches are recorded if the player agrees to the End User License Agreement (EULA). We worked closely with our colleagues at Ninja Theory and with Microsoft compliance teams to ensure that the data was collected ethically and used responsibly for research purposes.

“It’s been amazing to see the variety of ways Microsoft Research has used the Bleeding Edge environment and data to explore novel techniques in a rapidly moving AI industry,” said Gavin Costello, technical director at Ninja Theory. “From the hackathon that started it all, where we first integrated AI into Bleeding Edge, to building AI agents that could behave more like human players, to the World and Human Action Model being able to dream up entirely new sequences of Bleeding Edge gameplay under human guidance, it’s been eye-opening to see the potential this type of technology has.” 

Muse Training Data

Current Muse instances were trained on human gameplay data (visuals and controller actions) from the Xbox game Bleeding Edge – shown here at the 300×180 px resolution at which we train current models. Muse (using WHAM-1.6B) has been trained on more than 1 billion images and controller actions, corresponding to over 7 years of continuous human gameplay.
The Game Intelligence and Teachable AI Experiences teams playing the Bleeding Edge game together.

Until that point in late 2022, we had used Bleeding Edge as a platform for human-like navigation experiments, but we had not yet made meaningful use of the large amount of human player data we now had available. With the powerful demonstration of text models, the next question was clear: “What could we achieve if we trained a transformer-based model on large amounts of human gameplay data?”

Scaling up model training

As the team got to work, one of the key challenges was scaling up model training. We initially used a V100 cluster, where we proved out training on up to 100 GPUs; that eventually paved the way to training at scale on H100s. Key early design decisions focused on how best to leverage insights from the large language model (LLM) community, including how to effectively represent controller actions and, especially, images.
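The post does not spell out the exact representation, but one common LLM-style approach, sketched below purely as an assumption for illustration, is to discretize each frame into image tokens and each controller input into action tokens, then interleave them into a single sequence for autoregressive training. The image_tokenizer and action_to_token components are hypothetical stand-ins, not the actual WHAM pipeline.

```python
# Illustrative only: interleaving image tokens and action tokens into one
# sequence, in the spirit of LLM-style training. The tokenizers here are
# hypothetical placeholders.

def encode_episode(frames, actions, image_tokenizer, action_to_token):
    tokens = []
    for frame, action in zip(frames, actions):
        tokens.extend(image_tokenizer(frame))    # discrete tokens for one frame
        tokens.append(action_to_token(action))   # discrete token for the controller input
    return tokens
```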

The first sign that the hard work of scaling up training was paying off came in the form of a demo that thoroughly impressed me. Tim Pearce, at that time a researcher in Game Intelligence, had put together examples of what happened early versus later in training. You can see the demo here – it’s like watching the model learn. This led to our follow-up work showing how scaling laws emerge in these kinds of models.

Muse consistency over the course of training

Ground truth (“Original”): human gameplay. Generated: game visuals from Muse with 206M parameters, conditioned on 1 second of real gameplay and 9 seconds of actions. Checkmarks indicate which capabilities the generated visuals exhibit at each stage of training:

Capability                               10k training updates   100k training updates   1M training updates
Character recognizable                   ✔                      ✔                       ✔
Basic movements and geometry             ✔                      ✔                       ✔
No degeneration over time                                       ✔                       ✔
Correct interaction with power cell                                                     ✔
Models flying mechanic correctly                                                        ✔
Comparing ground truth human gameplay (left) to visuals generated using Muse (using WHAM-206M) when prompted with 1 second of human gameplay (visuals and controller actions) and 9 seconds of controller actions from the ground truth. In this setting, if Muse can generate visuals that closely match the ground truth, then it has captured the game dynamics. We see that the quality of generated visuals improves visibly over the course of training. In early training (10k training updates) we see signs of life, but quality deteriorates quickly. After 100k training updates, the model is consistent over time but does not yet capture relatively less frequent aspects of the game dynamics, such as the flying mechanic. Consistency with the ground truth continues to improve with additional training, e.g., the flying mechanic is captured after 1M training updates.

Multidisciplinary collaboration: Involving users from the beginning

We had started to investigate how to evaluate these types of models early on. For example, we wanted to understand the learned representations using linear probing (driven by Research Intern Gunshi Gupta and Senior Research Scientist Sergio Valcarcel Macua), to explore online evaluation (driven by Senior Research Scientist Raluca Georgescu), and to generate both visuals and actions, initially termed “full dreaming” (driven by Research Intern Tarun Gupta). But working through how to systematically evaluate Muse required a much broader set of insights. More importantly, we needed to understand how people might use these models in order to know how to evaluate them.

This was where the opportunity for multidisciplinary research became crucial. We had discussed aspects of this work with Senior Principal Research Manager Cecily Morrison and her Teachable AI Experiences team for several months. And we had already partnered on an engagement with game creatives (driven by Cecily, Design Researcher Linda Wen, and Principal Research Software Development Engineer Martin Grayson) to investigate how game creators would like to use generative AI capabilities in their creative practice.

“It was a great opportunity to join forces at this early stage to shape model capabilities to suit the needs of creatives right from the start, rather than try to retrofit an already developed technology,” Cecily said. 

Linda offered some valuable insights about how we approached the work: “We’ve seen how technology-driven AI innovation has disrupted the creative industry—often catching creators off guard and leaving many feeling excluded,” she said. “This is why we invited game creators to help us shape this technology from the start. Recognizing that most AI innovations are developed in the Global North, we also made it a priority to recruit game creators from underrepresented backgrounds and geographies. Our goal was to create a technology that benefits everyone—not just those already in positions of privilege.” 

Unlocking new creative use cases with the WHAM Demonstrator

Now, with the model’s emerging capabilities and user insights in mind, it was time to put all the pieces together. The teams joined forces during a Microsoft internal hackathon to explore new interaction paradigms and creative uses that Muse could unlock. As a result, we developed a prototype that we call the WHAM Demonstrator, which allows users to directly interface with the model.

“The Global Hackathon was the perfect opportunity for everyone to come together and build our first working prototype,” Martin said. “We wanted to develop an interface for the WHAM model that would allow us to explore its creative potential and start to test ideas and uses we had learned from our interviews with game developers.” 

WHAM Demonstrator

The WHAM Demonstrator provides a visual interface for interacting with World and Human Action Models (WHAMs) like Muse.

In this example, the user is loading a visual as an initial prompt to the model, here a single promotional image for the game Bleeding Edge. They use Muse to generate multiple potential continuations from this starting point.
The user explores the generated sequences and can tweak them, for example using a game controller to direct the character. These features demonstrate how Muse’s capabilities can enable iteration as part of the creative process.

Identifying key capabilities and how to evaluate them

The hands-on experience of exploring Muse capabilities with the WHAM Demonstrator, and drawing on insights we gained from the user study, allowed us to systematically identify capabilities that game creatives would require to use generative models like Muse. This in turn allowed us to establish evaluation protocols for three key capabilities: consistency, diversity, and persistency. Consistency refers to a model’s ability to generate gameplay sequences that respect the dynamics of the game. For example, the character moves consistently with controller actions, does not walk through walls, and generally reflects the physics of the underlying game. Diversity refers to a model’s ability to generate a range of gameplay variants given the same initial prompt, covering a wide range of ways in which gameplay could evolve. Finally, persistency refers to a model’s ability to incorporate (or “persist”) user modifications into generated gameplay sequences, such as a character that is copy-pasted into a game visual. We give an overview of these capabilities below. 

Muse evaluation of consistency, diversity and persistency

Consistency

We evaluate consistency by prompting the model with ground truth gameplay sequences and controller actions, and letting the model generate game visuals. The videos shown here are generated using Muse (based on WHAM-1.6B) and demonstrate the model’s ability to generate consistent gameplay sequences of up to two minutes. In our paper, we also compare the generated visuals to the ground truth visuals using FVD (Fréchet Video Distance), an established metric in the video generation community.
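For reference, FVD is the standard Fréchet distance between Gaussian fits to feature embeddings of real and generated videos. With means and covariances of the two embedding distributions denoted below, the squared distance is

```latex
\mathrm{FVD} = \lVert \mu_r - \mu_g \rVert_2^{2}
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right),
```

so lower values mean the generated sequences’ feature statistics sit closer to those of real gameplay.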

Diversity

Muse (based on WHAM-1.6B) generated examples of behavioral and visual diversity, conditioned on the same initial 10 frames (1 second) of real gameplay. The three examples at the top show behavioral diversity (diverse camera movement, loitering near the spawn location, and navigating various paths to the middle jump pad). The three examples below show visual diversity (different hoverboards for the character). In the paper, we also quantitatively assess diversity using the Wasserstein distance, a measure of distance between two distributions, to compare the model-generated sequences to the diversity reflected in human gameplay recordings.
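For reference, the Wasserstein-1 distance between two distributions P and Q is the standard optimal-transport quantity

```latex
W_1(P, Q) = \inf_{\gamma \in \Gamma(P, Q)} \; \mathbb{E}_{(x, y) \sim \gamma}\!\left[ \lVert x - y \rVert \right],
```

where \(\Gamma(P, Q)\) is the set of couplings whose marginals are P and Q; smaller values mean the generated diversity is distributed more like the diversity seen in human gameplay.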

With our evaluation framework in place, and access to an H100 compute allocation, the team was able to further improve Muse instances, including higher resolution image encoders (our current models generate visuals at a resolution of 300×180 pixels, up from the 128×128 resolution of our earliest models) and larger models, and expand to all seven Bleeding Edge maps. To show some of the capabilities of the model we are publishing today, we have included videos of 2-minute-long generated gameplay sequences above, which give an impression of the consistency and diversity of gameplay sequences that the model can generate.

According to Senior Researcher Tabish Rashid: “Being handed an allocation of H100s was initially quite daunting, especially in the early stages figuring out how to make best use of it to scale to larger models with the new image encoders. After months of experimentation, it was immensely rewarding to finally see outputs from the model on a different map (not to knock the lovely greenery of Skygarden) and not have to squint so much at smaller images. I’m sure at this point many of us have watched so many videos from Muse that we’ve forgotten what the real game looks like.”

One of my favorite capabilities of the model is how it can be prompted with modifications of gameplay sequences and persist newly introduced elements. For example, in the demo below, we’ve added a character onto the original visual from the game. Prompting the model with the modified visual, we can see how the model “persists” the added character and generates plausible variants of how the gameplay sequence could have evolved from this modified starting point.

Persistency

Demonstrations of how Muse (based on WHAM-1.6B) can persist modifications. A visual is taken from the original gameplay data and an image of an additional character is edited into it. The generated gameplay sequence then shows how the added character is incorporated and persists as the gameplay evolves.

Conclusion

Today, our team is excited to be publishing our work in Nature and simultaneously releasing Muse open weights, the WHAM Demonstrator, and sample data to the community.

I look forward to seeing the many ways in which the community will explore these models and build on our research. I cannot wait to see all the ways that these models and subsequent research will help shape and increase our understanding of how generative AI models of human gameplay may support gameplay ideation and pave the way for future, novel, AI-based game experiences, including the use cases that our colleagues at Xbox have already started to explore.

The post Introducing Muse: Our first generative AI model designed for gameplay ideation appeared first on Microsoft Research.

Read More

Ideas: Quantum computing redefined with Chetan Nayak

Ideas: Quantum computing redefined with Chetan Nayak

Outline illustration of Chetan Nayak | Ideas podcast

Behind every emerging technology is a great idea propelling it forward. In the Microsoft Research Podcast series Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets.

In this episode, host Gretchen Huizinga talks with Dr. Chetan Nayak, a technical fellow focused on quantum hardware at Microsoft. As a preteen, Nayak became engrossed in the world of scientific discovery, “accidentally exposed,” he says, to the theory of relativity, advanced mathematics, and the like while exploring the shelves of his local bookstores. In studying these big ideas, he began to develop his own understanding of the forces and phenomena at work around us and ultimately realized he could make his own unique contributions, which have since included advancing the field of quantum computing. Nayak examines the defining moments in the history of quantum computing; explains why we still need quantum computing, even with the rise of generative AI; and discusses how Microsoft Quantum is re-engineering the quantum computer with the creation of the world’s first topoconductor and first quantum processing unit (QPU) architecture with a topological core, called the Majorana 1.

Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

CHETAN NAYAK: People sometimes say, well, quantum computers are just going to be like classical computers but faster. And that’s not the case. So I really want to emphasize the fact that quantum computers are an entirely different modality of computing. You know, there are certain problems which quantum computers are not just faster at than classical computers but quantum computers can solve and classical computers have no chance of solving.

[TEASER ENDS]

GRETCHEN HUIZINGA: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. I’m Gretchen Huizinga. In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

[MUSIC FADES]

My guest today is Dr. Chetan Nayak, a technical fellow of Quantum Hardware at Microsoft Quantum. Under Chetan’s leadership, the Microsoft Quantum team has published a paper that demonstrates a fundamental operation for a scalable topological quantum computer. The team also announced the creation of the world’s first topoconductor—more on that later—and first QPU architecture with a topological core, called the Majorana 1. Chetan Nayak, I can’t wait to find out what all of this is … welcome to Ideas!


CHETAN NAYAK: Thank you. Thanks for having me. And I’m excited to tell you about this stuff.

HUIZINGA: Well, you have a huge list of accomplishments, accolades, and awards—little alliteration there. But I want to start by getting to know a bit more about you and what got you there. So specifically, what’s your “research origin story,” as it were? What big idea inspired you to study the smallest parts of the universe?

NAYAK: It’s a great question. I think if I really have to go back to the origin story, it starts when I was a kid, you know, probably a preteen. And, you know, I’d go to bookstores to … I know, I guess many of the people listening to this may not know what that is, [LAUGHTER] but there used to be these brick-and-mortar storefronts where they would sell books, physical books, …

HUIZINGA: Right.

NAYAK: … and I’d go to bookstores to, you know, to buy books to read, you know, fiction. But I would browse through them, and there’d be a nonfiction section. And often there’d be used books, you know, sometimes used textbooks or used popular science books. And I remember, even though they were bookstores, not libraries, I would spend a lot of time there leafing through books and got exposed to—accidentally exposed to—a lot of ideas that I wouldn’t otherwise have been. You know, just, sort of, you know, I maybe went there, you know, looking to pick up the next Lord of the Rings book, and while I was there, you know, wander into a book that was sort of explaining the theory of relativity to non-scientists. And I remember leafing through those books and actually reading about Einstein’s discoveries, you know, most famously E = mc2, but actually a lot of those books were explaining these thought experiments that Einstein did where he was thinking about, you know, if he were on a train that were traveling at the speed of light, what would light look like to him? [LAUGHTER] Would he catch up to it? You know, and all these incredible thought experiments that he did to try to figure out, you know, to really play around with the basic laws as they were currently understood, of physics, and by, you know, stretching and pulling them and going into extreme … taking them to extreme situations, you could either find the flaws in them or in some cases see what the next steps were. And that was, you know, really inspirational to me. I, you know, around the same time, also started leafing through various advanced math books and a little later picked up a book on calculus and started flipping through it, used book with, like, you know, the cover falling apart and the pages starting to fall out. But there was a lot of, you know, accidental discovery of topics through wandering through bookstores, actually. I also, you know, went to this great magnet high school in New York City called Stuyvesant High School, where I was surrounded by people who were really interested in science and math and technology. So I think, you know, for me, that origin story really starts, you know, maybe even earlier, but at least in my preteen years when, you know, I went through a process of learning new things and trying to understand them in my own way. And the more you do that, eventually you find maybe you’re understanding things in a little different way than anybody else ever did. And then pretty soon, you know, you’re discovering things that no one’s ever discovered before. So that’s, sort of, how it started.

HUIZINGA: Yeah. Well, I want to drill in a little bit there because you’ve brought to mind a couple of images. One is from a Harry Potter movie, And the Half-Blood Prince, where he discovers the potions handbook, but it’s all torn up and they were fighting about who didn’t get that book. And it turned out to be … so there’s you in a bookstore somewhere between the sci-fi and the non-fi, shall we call it. And you’re, kind of, melding the two together. And I love how you say, I was accidentally exposed. [LAUGHTER] Sounds kind of like radiation of some kind and you’ve turned into a scientist. A little bit more on that. This idea of quantum, because you’ve mentioned Albert Einstein, there’s quantum physics, quantum mechanics, now quantum computing. Do these all go together? I mean, what came out of what in that initial, sort of, exploration with you? Where did you start getting interested in the quantum of things?

NAYAK: Yeah, so I definitely started with relativity, not quantum. That was the first thing I heard about. And I would say in a lot of ways, that’s the easier one. I mean, those are the two big revolutions in physics in the 20th century, relativity and quantum theory, and quantum mechanics is by far, at least for me and for many people, the harder one to get your head around because it is so counterintuitive. Quantum mechanics in some sense, or quantum theory in some sense, for most of what we experience in the world is down many abstraction layers away from what we experience. What I find amazing is that the people who created, you know, discovered quantum mechanics, they had nothing but the equations to guide them. You know, they didn’t really understand what they were doing. They knew that there were some holes or gaps in the fundamental theory, and they kind of stumbled into these equations, and they gave the right answers, and they just had to follow it. I was actually just a few weeks ago, I was in Arosa, which is a small Swiss town in the Alps. That’s actually the town where Schrödinger discovered Schrödinger’s equation.

HUIZINGA: No!

NAYAK: Yeah, a hundred years ago, this summer …

HUIZINGA: Amazing!

NAYAK: So Schrödinger suffered tuberculosis, which eventually actually killed him much later in his life. And so he went into the mountains …

HUIZINGA: … for the cure.

NAYAK: … for his health, yeah, to a sanatorium to recover from tuberculosis. And while he was there in Arosa, he discovered his equation. And it’s a remarkable story because, you know, that equation, he didn’t even know what the equation meant. He just knew, well, particles are waves, and waves have wave equations. Because that’s ultimately Maxwell’s equation. You can derive wave equations for light waves and radio waves and microwaves, x-rays. And he said, you know, there has to be a wave equation for this thing and this wave equation needs to somehow correctly predict the energy levels in hydrogen.

HUIZINGA: Oh, my gosh.

NAYAK: And he, you know, worked out this equation and then solved it, which is for that time period not entirely trivial. And he got correctly the energy levels of hydrogen, which people had … the spectra, the different wavelengths of light that hydrogen emits. And lo and behold, it works. He had no idea why. No idea what it even meant. And, um, but knew that he was onto something. And then remarkably, other people were able to build on what he’d done, were able to say, no, there must be a grain of truth here, if not the whole story, and let’s build on this, and let’s make something that is richer and encompasses more and try to understand the connections between this and other things. And Heisenberg was, around the same time, developing his what’s called matrix mechanics, a different way of thinking about quantum mechanics, and then people realized the connections between those, like Dirac. So it’s a remarkable story how people, how scientists, took these things they understood, you know, imposed on it a certain level of mathematical consistency and a need for the math to predict things that you could observe, and once you had, sort of, the internal mathematical consistency and it was correctly explaining a couple of data points about the world, you could build this huge edifice based on that. And so that was really impressive to me as I learned that. And that’s 100 years ago! It was 1925.

HUIZINGA: Right. Well, let me …

NAYAK: And that’s quantum mechanics!

HUIZINGA: OK.

NAYAK: You’re probably going to say, well, how does quantum computing fit into this, you know? [LAUGHTER] Right? And that’s a much later development. People spent a long time just trying to understand quantum mechanics, extend it, use it to understand more things, to understand, you know, other particles. So it was initially introduced to understand the electron, but you could understand atoms, molecules, and subatomic things and quarks and positrons. So there was a rich, you know, decades of development and understanding, and then eventually it got combined with relativity, at least to some extent. So there was a lot to do there to really understand and build upon the early discoveries of quantum mechanics. One of those directions, which was kicked off by Feynman around, I think, 1982 and independently by a Russian mathematician named Yuri Manin was, OK, great, you know, today’s computers, again, is many abstraction layers away from anything quantum mechanical, and in fact, it’s sort of separated from the quantum world by many classical abstraction layers. But what if we built a technology that didn’t do that? Like, that’s a choice. It was a choice. It was a choice that was partially forced on us just because of the scale of the things we could build. But as computers get smaller and smaller and the way Moore’s law is heading, you know, at some point, you’re going to get very close to that point at which you cannot abstract away quantum mechanics, [LAUGHTER] where you must deal with quantum mechanics, and it’s part and parcel of everything. You are not in the fortunate case where, out of quantum theory has emerged the classical world that behaves the way we expect it to intuitively. And, you know, once we go past that, that potentially is really catastrophic and scary because, you know, you’re trying to make things smaller for the sake of, you know, Moore’s law and for making computers faster and potentially more energy efficient. But, you know, if you get down to this place where the momentum and position of things, of the electrons, you know, or of the currents that you’re relying on for computation, if they’re not simultaneously well-defined, how are you going to compute with that? It looks like this is all going to break down. And so it looks like a real crisis. But, you know, what they realized and what Feynman realized was actually it’s an opportunity. It’s actually not just a crisis. Because if you do it the right way, then actually it gives you way more computational power than you would otherwise have. And so rather than looking at it as a crisis, it’s an opportunity. And it’s an opportunity to do something that would be otherwise unimaginable.

HUIZINGA: Chetan, you mentioned a bunch of names there. I have to say I feel sorry for Dr. Schrödinger because most of what he’s known for to people outside your field is a cat, a mysterious cat in a box, meme after meme. But you’ve mentioned a number of really important scientists in the field of quantum everything. I wonder, who are your particular quantum heroes? Are there any particular, sort of, modern-day 21st-century or 20th-century people that have influenced you in such a way that it’s like, I really want to go deep here?

NAYAK: Well, definitely, you know, the one person I mentioned, Feynman, is later, so he’s the second wave, you could say, of, OK, so if the first wave is like Schrödinger and Heisenberg, and you could say Einstein was the leading edge of that first wave, and Planck. But … and the second wave, maybe you’d say is, is, I don’t know, if Dirac is first or second wave. You might say Dirac is second wave and potentially Landau, a great Russian physicist, second wave. Then maybe Feynman’s the third wave, I guess? I’m not sure if he’s second or third wave, but anyway, he’s post-war and was really instrumental in the founding of quantum computing as a field. He had a famous statement, which is, you know, in his lectures, “There’s always room at the bottom.” And, you know, what he was thinking about there was, you can go to these extreme conditions, like very low temperatures and in some cases very high magnetic fields, and new phenomena emerge when you go there, phenomena that you wouldn’t otherwise observe. And in a lot of ways, many of the early quantum theorists, to some extent, were extreme reductionists because, you know, they were really trying to understand smaller and smaller things and things that in some ways are more and more basic. At the same time, you know, some of them, if not all of them, at the same time held in their mind the idea that, you know, actually, more complex behaviors emerge out of simple constituents. Einstein famously, in his miracle year of 1905, one of the things he did was he discovered … he proposed the theory of Brownian motion, which is an emergent behavior that relies on underlying atomic theory, but it is several layers of abstraction away from the underlying atoms and molecules and it’s a macroscopic thing. So Schrödinger famously, among the other things, he’s the person who came up with the concept of entanglement …

HUIZINGA: Yes.

NAYAK: … in understanding his theory. And for that matter, Schrödinger’s cat is a way to understand the paradoxes that occur when the classical world emerges from quantum mechanics. So they were thinking a lot about how these really incredible, complicated things arise or emerge from very simple constituents. And I think Feynman is one those people who really bridged that as a post-war scientist because he was thinking a lot about quantum electrodynamics and the basic underlying theory of electrons and photons and how they interact. But he also thought a lot about liquid helium and ultimately about quantum computing. Motivation for him in quantum computing was, you have these complex systems with many underlying constituents and it’s really hard to solve the equation. The equations are basically unsolvable.

HUIZINGA: Right.

NAYAK: They’re complicated equations. You can’t just, sort of, solve them analytically. Schrödinger was able to do that with his equation because it was one electron, one proton, OK. But when you have, you know, for a typical solid, you’ll have Avogadro’s number of electrons and ions inside something like that, there’s no way you’re going to solve that. And what Feynman recognized, as others did, really, coming back to Schrödinger’s observation on entanglement, is you actually can’t even put it on a computer and solve a problem like that. And in fact, it’s not just that with Avogadro’s number you can’t; you can’t put it on a computer and solve it with a thousand, you know, [LAUGHTER] atoms, right? And actually, you aren’t even going to be able to do it with a hundred, right. And when I say you can’t do that on a computer, it’s not that, well, datacenters are getting bigger, and we’re going to have gigawatt datacenters, and then that’s the point at which we’ll be able to see—no, the fact is the amazing thing about quantum theory is if, you know, you go from, let’s say, you’re trying to solve a problem with 1,000 atoms in it. You know, if you go to 1,001, you’re doubling the size of the problem. As far as if you were to store it on a cloud, just to store the problem on the classical computer, just to store the answer, I should say, on a classical computer, you’d have to double the size. So there’s no chance of getting to 100, even if, you know, with all the buildout of datacenters that’s happening at this amazing pace, which is fantastic and is driving all these amazing advances in AI, that buildout is never going to lead to a classical computer that can even store the answer to a difficult quantum mechanical problem.
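A back-of-the-envelope version of the scaling argument Nayak describes: writing down the state of n quantum two-level degrees of freedom takes on the order of \(2^n\) complex amplitudes, so each additional one doubles the storage. Concretely,

```latex
2^{100} \approx 1.3 \times 10^{30}, \qquad 2^{1000} \approx 1.1 \times 10^{301},
```

while the observable universe is commonly estimated to contain only about \(10^{80}\) atoms, so even one byte per atom falls hopelessly short of storing the answer for 1,000 such degrees of freedom.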

HUIZINGA: Yeah, so basically in answer to the “who are your quantum heroes,” you’ve kind of given us a little history of quantum computing, kind of, the leadup and the questions that prompted it. So we’ll get back to that in one second, because I want you to go a little bit further on where we are today. But before we do that, you’ve also alluded to something that’s super interesting to me, which is in light of all the recent advances and claims in AI, especially generative AI, that are making claims like we’ll be able to shorten the timeline on scientific discovery and things like that, why then, do we need quantum computing? Why do we need it?

NAYAK: Great question, so at least AI is … AI and machine learning, at least so far, is only as good as the training data that you have for it. So if you train AI on all the data we have, and if you train AI on problems we can solve, which at some level are classical, you will be able to solve classical problems. Now, protein folding is one of those problems where the solution is basically classical, very complicated and difficult to predict but basically classical, and there was a lot of data on it, right. And so it was clearly a big data problem that’s basically classical. As far as we know, there’s no classical way to simulate or mimic quantum systems at scale, that there’s a clean separation between the classical and quantum worlds. And so, you know, that the quantum theory is the fundamental theory of the world, and there is no hidden classical model that is lurking [LAUGHTER] in the background behind it, and people sometimes would call these things like hidden variable theories, you know, which Einstein actually really was hoping, late in his life, that there was. That there was, hiding behind quantum mechanics, some hidden classical theory that was just obscured from our view. We didn’t know enough about it, and the quantum thing was just our best approximation. If that’s true, then, yeah, maybe an AI can actually discover that classical theory that’s hiding behind the quantum world and therefore would be able to discover it and answer the problems we need to answer. But that’s almost certainly not the case. You know, there’s just so much experimental evidence about the correctness of quantum mechanics and quantum theory and many experiments that really, kind of, rule out many aspects of such a classical theory that I think we’re fairly confident there isn’t going to be some classical approximation or underlying theory hiding behind quantum mechanics. And therefore, an AI model, which at the end of the day is some kind of very large matrix—you know, a neural network is some very large classical model obeying some very classical rules about, you take inputs and you produce outputs through many layers—that that’s not going to produce, you know, a quantum theory. Now, on the other hand, if you have a quantum computer and you can use that quantum computer to train an AI model, then the AI model is learning—you’re teaching it quantum mechanics—and at least within a certain realm of quantum problems, it can interpolate what we’ve learned about quantum mechanics and quantum problems to solve new problems that, you know, you hadn’t already solved. Actually, you know, like I said, in the early days, I was reading these books and flipping through these bookstores, and I’d sometimes figure out my own ways to solve problems different from how it was in the books. And then eventually I ended up solving problems that hadn’t been solved. Well, that’s sort of what an AI does, right? It trains off of the internet or off of playing chess against itself many times. You know, it learns and then takes that and eventually by learning its own way to do things, you know, it learns things that we as humans haven’t discovered yet.

HUIZINGA: Yeah.

NAYAK: And it could probably do that with quantum mechanics if it were trained on quantum data. So, but without that, you know, the world is ultimately quantum mechanical. It’s not classical. And so something classical is not going to be a general-purpose substitute for quantum theory.

HUIZINGA: OK, Chetan, this is fascinating. And as you’ve talked about pretty well everything so far, that’s given us a really good, sort of, background on quantum history as we know it in our time. Talk a little bit about where we are now, particularly—and we’re going get into topology in a minute, topological stuff—but I want to know where you feel like the science is now, and be as concise as you can because I really want get to your cool work that we’re going to talk about. And this question includes, what’s a Majorana and why is it important?

NAYAK: Yeah. So … OK, unfortunately, it won’t be that concise an answer. OK, so, you know, early ’80s, ideas about quantum computing were put forward. But I think most people thought, A, this is going to be very difficult, you know, to do. And I think, B, it wasn’t clear that there was enough motivation. You know, I think Feynman said, yes, if you really want to simulate quantum systems, you need a quantum computer. And I think at that point, people weren’t really sure, is that the most pressing thing in the world? You know, simulating quantum systems? It’s great to understand more about physics, understand more about materials, understand more about chemistry, but we weren’t even at that stage, I think, there where, hey, that’s the limiting thing that’s limiting progress for society. And then, secondly, there was also this feeling that, you know, what you’re really doing is some kind of analog computing. You know, this doesn’t feel digital, and if it doesn’t feel digital, there’s this question about error correction and how reliable is it going to be. So Peter Shor actually, you know, did two amazing things, one of which is a little more famous in the general public but one of which is probably more important technically, is he did these two amazing things in the mid-’90s. He first came up with Shor’s algorithm, where he said, if you have a quantum computer, yeah, great for simulating quantum systems, but actually you can also factor large numbers. You can find the prime factors of large numbers, and the difficulty of that problem is the underlying security feature under RSA [encryption], and many of these public key cryptography systems rely on certain types of problems that are really hard. It’s easy to multiply two large primes together and get the output, and you can use that to encrypt data. But to decrypt it, you need to know those two numbers, and it’s hard to find those factors. What Peter Shor discovered is that ideally, a quantum computer, an ideal quantum computer, would be really good at this, OK. So that was the first discovery. And at that point, what seemed at the time an academic problem of simulating quantum systems, which seemed like in Feynman’s vision, that’s what quantum computers are for, that seemingly academic problem, all of a sudden, also, you know, it turns out there’s this very important both financially and … economically and national security-wise other application of a quantum computer. And a lot of people sat up and took notice at that point. So that’s huge. But then there’s a second thing that he, you know, discovered, which was quantum error correction. Because everyone, when he first discovered it, said, sure, ideally that’s how a quantum computer works. But quantum error correction, you know, this thing sounds like an analog system. How are you going to correct errors? This thing will never work because it’ll never operate perfectly. Schrödinger’s problem with the cat’s going to happen, is that you’re going to have entanglement. The thing is going to just end up being basically classical, and you’ll lose all the supposed gains you’re getting from quantum mechanics. And quantum error correction, that second discovery of Peter Shors, really, you know, suddenly made it look like, OK, at least in principle, this thing can happen. And people built on that. Peter Shor’s original quantum error correction, I would say, it was based on a lot of ideas from classical error correction. 
Because you have the same problem with classical communication and classical computing. Alexei Kitaev then came up with, you know, a new set of quantum error correction procedures, which really don’t rely in the same way on classical error correction. Or if they do, it’s more indirect and in many ways rely on ideas in topology and physics. And, you know, those ideas, which lead to quantum error correcting codes, but also ideas about what kind of underlying physical systems would have built-in hardware error protection, led to what we now call topological quantum computing and topological qubits, because it’s this idea that, you know, just like people went from the early days of computers from vacuum tubes to silicon, actually, initially germanium transistors and then silicon transistors, that similarly that you had to have the right underlying material in order to make qubits.
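To put rough numbers on the factoring gap (standard textbook figures, not from the podcast): the best known classical algorithm, the general number field sieve, is sub-exponential in the number of digits of N, while Shor’s algorithm on an ideal quantum computer is polynomial:

```latex
T_{\text{classical}}(N) \approx \exp\!\left( c\,(\ln N)^{1/3} (\ln\ln N)^{2/3} \right), \quad c \approx 1.9,
\qquad
T_{\text{Shor}}(N) = O\!\left( (\log N)^{3} \right).
```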

HUIZINGA: OK.

NAYAK: And that the right underlying material platform, just as for classical computing, it’s been silicon for decades and decades, it was going to be at one of these so-called topological states of matter. And that these would be states of matter whose defining feature, in a sense, would be that they protect quantum information from errors, at least to some extent. Nothing’s perfect, but, you know, in a controllable way so that you can make it better as needed and good enough that any subsequent error correction that you might call software-level error correction would not be so cumbersome and introduce so much overhead as to make a quantum computer impractical. I would say, you know, there were these … the field had a, I would say, a reboot or a rebirth in the mid-1990s, and pretty quickly those ideas, in addition to the applications and algorithms, you know, coalesced around error correction and what’s called fault tolerance. And many of those ideas came, you know, freely interchanged between ideas in topology and the physics of what are called topological phases and, you know, gave birth to this, I would say, to the set of ideas on which Microsoft’s program has been based, which is to look for the right material … create the right material and qubits based on it so that you can get to a quantum computer at scale. Because there’s a number of constraints there. And the work that we’re really excited about right now is about getting the right material and harnessing that material for qubits.

HUIZINGA: Well, let’s talk about that in the context of this paper that you’re publishing and some pretty big news in topology. You just published a paper in Nature that demonstrates—with receipts—a fundamental operation for a scalable topological quantum computer relying on, as I referred to before, Majorana zero modes. That’s super important. So tell us about this and why it’s important.

NAYAK: Yeah, great. So building on what I was just saying about having the right material, what we’re relying on is, to an extent, is superconductivity. So that’s one of the, you know, really cool, amazing things about the physical world. That many metals, including aluminum, for instance, when you cool them down, they’re able to carry electricity with no dissipation, OK. No energy loss associated with that. And that property, the remarkable … that property, what underlies it is that the electrons form up into pairs. These things called Cooper pairs. And those Cooper pairs, their wave functions kind of lock up and go in lockstep, and as a result, actually the number of them fluctuates wildly, you know, in any place locally. And that enables them to, you know, to move easily and carry current. But also, a fundamental feature, because they form pairs, is that there’s a big difference between an even and odd number of electrons. Because if there’s an odd electron, then actually there’s some electron that’s unpaired somewhere, and there’s an energy penalty associated, an energy cost to that. It turns out that that’s not always true. There’s actually a subclass of superconductors called topological superconductors, or topoconductors, as we call them, and topoconductors have this amazing property that actually they’re perfectly OK with an odd number of electrons! In fact, when there’s an odd number of electrons, there isn’t any unpaired electron floating around. But actually, topological superconductors, they don’t have that. That’s the remarkable thing about it. I’ve been warned not to say what I’m about to say, but I’ll just go ahead [LAUGHTER] and say it anyway. I guess that’s bad way to introduce something …

HUIZINGA: No, it’s actually really exciting!

NAYAK: OK, but since you brought up, you know, Harry Potter and the Half-Blood Prince, you know, Voldemort famously split his soul into seven or, I guess, technically eight, accidentally. [LAUGHTER] He split his soul into seven Horcruxes, so in some sense, there was no place where you could say, well, that’s where his soul is.

HUIZINGA: Oh, my gosh!

NAYAK: So Majorana zero modes do kind of the same thing! Like, there’s this unpaired electron potentially in the system, but you can’t find it anywhere. Because to an extent, you’ve actually figured out a way to split it and put it … you know, sometimes we say like you put it at the two ends of the system, but that’s sort of a mathematical construct. The reality is there is no place where that unpaired electron is!

HUIZINGA: That’s crazy. Tell me, before you go on, we’re talking about Majorana. I had to look it up. That’s a guy’s name, right? So do a little dive into what this whole Majorana zero mode is.

NAYAK: Yeah, so Majorana was an Italian physicist, or maybe technically Sicilian physicist. He was very active in the ’20s and ’30s and then just disappeared mysteriously around 1937, ’38, around that time. So no one knows exactly what happened to him. You know, but one of his last works, which I think may have only been published after he disappeared, he proposed this equation called the Majorana equation. And he was actually thinking about neutrinos at the time and particles, subatomic particles that carry no charge. And so, you know, he was thinking about something very, very different from quantum computing, actually, right. So Majorana—didn’t know anything about quantum computing, didn’t know anything about topological superconductors, maybe even didn’t know much about superconductivity at all—was thinking about subatomic particles, but he wrote down this equation for neutral objects, or some things that don’t carry any charge. And so when people started, you know, in the ’90s and 2000s looking at topological superconductors, they realized that there are these things called Majorana zero modes. So, as I said, and let me explain how they enter the story, so Majorana zero modes are … I just said that topological superconductors, there’s no place you can find that even or odd number of electrons. There’s no penalty. Now superconductors, they do have a penalty—and it’s called the energy gap—for breaking a pair. Even topological superconductors. You take a pair, a Cooper pair, you break it, you have to pay that energy cost, OK. And it’s, like, double the energy, in a sense, of having an unpaired electron because you’ve created two unpaired electrons and you break that pair. Now, somehow a topological superconductor has to accommodate that unpaired electron. It turns out the way it accommodates it is it can absorb or emit one of these at the ends of the wire. If you have a topological superconductor, a topoconductor wire, at the ends, it can absorb or emit one of these things. And once it goes into one end, then it’s totally delocalized over the system, and you can’t find it anywhere. You can say, oh, it got absorbed at this end, and you can look and there’s nothing you can tell. Nothing has changed about the other end. It’s now a global property of the whole thing that you actually need to somehow figure out, and I’ll come to this, somehow figure out how to connect the two ends and actually measure the whole thing collectively to see if there’s an even or odd number of electrons. Which is why it’s so great as a qubit because the reason it’s hard for Schrödinger’s cat to be both dead and alive is because you’re going to look at it, and then you look at it, photons are going to bounce off it and you’re going to know if it’s dead or alive. And the thing is, the thing that was slightly paradoxical is actually a person doesn’t have to perceive it. If there’s anything in the environment that, you know, if a photon bounces off, it’s sort of like if a tree falls in the forest …

HUIZINGA: I was just going to say that!

NAYAK: … it still makes a sound. I know! It still makes a sound in the sense that Schrödinger’s cat is still going to be dead or alive once a photon or an air molecule bounces off it because of the fact that it’s gotten entangled with, effectively, the rest of the universe … you know many other parts of the universe at that point. And so the fact that there is no place where you can go and point to that unpaired electron means it does that “even or oddness” which we call parity, whether something’s even or odd is parity. And, you know, these are wires with, you know, 100 million electrons in them. And it’s a difference between 100 million and 100 million and one. You know, because one’s an even or odd number. And that difference, you have to be able to, like, the environment can’t detect it. So it doesn’t get entangled with anything, and so it can actually be dead and alive at the same time, you know, unlike Schrödinger’s cat, and that’s what you need to make a qubit, is to create those superpositions. And so Majorana zero modes are these features of the system that actually don’t actually carry an electrical charge. But they are a place where a single unpaired electron can enter the system and then disappear. And so they are this remarkable thing where you can hide stuff. [LAUGHS]
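A compact way to state the even/odd (parity) distinction from the topological superconductivity literature (standard background, not a result specific to this paper): in a conventional superconductor the odd-parity ground state costs at least the pairing gap \(\Delta\), while in a topological wire with Majorana zero modes at its ends the two parities are degenerate up to a splitting that falls off exponentially with wire length L:

```latex
E_{\text{odd}} - E_{\text{even}} \gtrsim \Delta \ \ \text{(conventional)}, \qquad
E_{\text{odd}} - E_{\text{even}} \sim e^{-L/\xi} \ \ \text{(topological, with coherence length } \xi\text{)}.
```

That exponentially small splitting is what makes the parity invisible to the local environment yet measurable as a global property of the wire.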

HUIZINGA: So how does that relate to your paper and the discoveries that you’ve made here?

NAYAK: Yeah, so in an earlier paper … so now the difficulty is you have to actually make this thing. So, you know, you put a lot of problems up front, is that you’re saying, OK, the solution to our problem is we need this new material and we need to harness it for qubits, right. Great. Well, where are we going to get this material from, right? You might discover it in nature. Nature may hand it to you. But in many cases, it doesn’t. And that’s … this is one of those cases where we actually had to engineer the material. And so engineering the material is, it turns out to be a challenge. People had ideas early on that they could put some combination of semiconductors and superconductors. But, you know, for us to really make progress, we realized that, you know, it’s a very particular combination. And we had to develop—and we did develop—simulation capabilities, classical. Unfortunately, we don’t have a quantum computer, so we had to do this classically with classical computers. We had to classically simulate various kinds of materials combinations to find one, or find a class, that would get us into the topological phase. And it turned out lots of details mattered there, OK. It involves a semiconductor, which is indium arsenide. It’s not silicon, and it’s not the second most common semiconductor, which is gallium nitride, which is used in LED lights. It’s something called indium arsenide. It has some uses as an infrared detector, but it’s a different semiconductor. And we’re using it in a nonstandard way, putting it into contact with aluminum and getting, kind of, the best of both worlds of a superconductor and a semiconductor so that we can control it and get into this topological phase. And that’s a previously published paper in American Physical [Society] journal. But that’s great. So that enables … that shows that you can create this state of matter. Now we need to then build on it; we have to harness it, and we have to, as I said, we have to make one of these wires or, in many cases, multiple wires, qubits, et cetera, complex devices, and we need to figure out, how do we measure whether we have 100 million or 100 million and one electrons in one of these wires? And that was the problem we solved, which is we made a device where we took something called a quantum dot—you should think of [it] as a tiny little capacitor—and that quantum dot is coupled to the wire in such a way that the coupling … that an electron—it’s kind of remarkable—an electron can quantum mechanically tunnel from … you know, this is like an electron, you don’t know where it is at any given time. You know, its momentum and its position aren’t well defined. So it’s, you know, an electron whose, let’s say, energy is well defined … actually, there is some probability amplitude that it’s on the wire and not on the dot. Even though it should be on the dot, it actually can, kind of, leak out or quantum mechanically end up on the wire and come back. And because of that fact—the simple fact that its quantum mechanical wave function can actually have it be on the wire—it actually becomes sensitive to that even or oddness.

HUIZINGA: Interesting.

NAYAK: And that causes a small change in the capacitance of this tiny little parallel plate capacitor, effectively, that we have. And that tiny little change in capacitance, which is, just to put into numbers, is the femtofarad, OK. So that’s a decimal point followed by, you know, 15 zeros and a one … 14 zeros and a one. So that’s how tiny it is. That that tiny change in the capacitance, if we put it into a larger resonant circuit, then that larger resonant circuit shows a small shift in its resonant frequency, which we can detect. And so what we demonstrated is we can detect the difference, that one electron difference, that even or oddness, which is, again, it’s not local property of anywhere in the wire, that we can nevertheless detect. And that’s, kind of, the fundamental thing you have to have if you want to be able to use these things for quantum information processing, you know, this parity, you have to be able to measure what that parity is, right. That’s a fundamental thing. Because ultimately, the information you need is classical information. You’re going to want to know the answer to some problem. It’s going to be a string of zeros and ones. You have to measure that. But moreover, the particular architecture we’re using, the basic operations for us are measurements of this type, which is a … it’s a very digital process. The process … I mentioned, sort of, how quantum computing looks a little analog in some ways, but it’s not really analog. Well, that’s very manifestly true in our architecture, that our operations are a succession of measurements that we turn on and off, but different kinds of measurements. And so what the paper shows is that we can do these measurements. We can do them fast. We can do them accurately.
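To see why a femtofarad-scale, parity-dependent capacitance shift is detectable, consider the textbook relation for an LC resonator (the numbers here are illustrative assumptions, not the device’s actual parameters):

```latex
f = \frac{1}{2\pi\sqrt{LC}}, \qquad \frac{\Delta f}{f} \approx -\frac{1}{2}\,\frac{\Delta C}{C},
```

so a shift of about 1 fF against a total capacitance on the order of a picofarad would move the resonant frequency by a few parts in \(10^4\), which microwave readout can resolve.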

HUIZINGA: OK.

NAYAK: And the additional, you know, announcements that we’re making, you know, right now are work that we’ve done extending and building on that with showing additional types of measurements, a scalable qubit design, and then building on that to multi-qubit arrays.

HUIZINGA: Right.

NAYAK: So that really unlocked our ability to do a number of things. And I think you can see the acceleration now with the announcements we have right now.

HUIZINGA: So, Chetan, you’ve just talked about the idea of living in a classical world and having to simulate quantum stuff.

NAYAK: Yup.

HUIZINGA: Tell us about the full stack here and how we go from, in your mind, from quantum computing at the bottom all the way to the top.

NAYAK: OK, so one thing to keep in mind is quantum computers are not a general-purpose accelerator for every problem. You know, so people sometimes say, well, quantum computers are just going to be like classical computers but faster. And that’s not the case. So I really want to emphasize the fact that quantum computers are an entirely different modality of computing. You know, there are certain problems which quantum computers are not just faster at than classical computers but quantum computers can solve and classical computers have no chance of solving. On the other hand, there are lots of things that classical computers are good at that quantum computers aren’t going to be good at, because it’s not going to give you any big scale up. Like a lot of big data problems where you have lots of classical data, you know, a quantum computer with, let’s say, let’s call it 1,000 qubits, and here I mean 1,000 logical qubits, and we come back to what that means, but 1,000 error-corrected qubits can solve problems that you have no chance of solving with a classical computer, even with all the world’s computing. But in fact, if it were a 1,000 qubits, you would have to take every single atom in the entire universe, OK, and turn that into a transistor, and it still wouldn’t be big enough. You don’t have enough bytes, even if every single atom in the universe were a byte. So that’s how big these quantum problems are when you try to store them on a classical computer, just to store the answer, let’s say.

HUIZINGA: Yeah.

NAYAK: But conversely, if you have a lot of classical data, like all the data on the internet, which we train, you know, our AI models with, you can’t store that on 1,000 qubits, right. You actually can’t really store more than 1,000 bits of classical information on 1,000 qubits. So for many things where we have big data classically, we don’t have the ability to really, truly store it within a quantum computer in a way that you can do anything with it. So we should definitely not view quantum computers as replacing classical computers. There are lots of things that classical computers are already good at, and we’re not trying to do those things. But there are many things that classical computers are not good at at all. A quantum computer we should think of as a complementary thing, an accelerator for those types of problems. It will have to work in collaboration with a classical computer that is going to do the classical steps, and the quantum computer will do the quantum steps. So that’s one thing to just keep in mind. When we talk about a quantum computer, it is part of a larger computing, you know, framework where there are many classical elements. It might be CPUs, it might be GPUs, it might be custom ASICs for certain things, and then a quantum computer, you know, a quantum processor, as well. So …
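[For reference: the limit Nayak mentions reflects the Holevo bound, which says the classical information retrievable from $n$ qubits is at most $n$ bits, so loading internet-scale classical data into a small register of qubits is ruled out in principle, not just in practice. The bound itself is standard background and is not named in the conversation.]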

HUIZINGA: Is that called a QPU?

NAYAK: A QPU is the quantum processing unit, exactly! So we’ll have CPUs, GPUs, and QPUs. And so, you know, at the lowest layer of that stack is the underlying substrate, the physical substrate. That’s our topoconductor. It’s the material out of which we build our QPUs. That’s the quantum processing unit. The quantum processing unit includes all of the qubits that we have in our architecture on a single chip. And that’s, kind of, one of the big key features, key design features, that the qubits be small and manufacturable on a single wafer. And then the QPU also has to enable that quantum world to talk to the classical world …

HUIZINGA: Right.

NAYAK: … because you have to send it, you know, instructions and you have to get back answers. And for us, that is turning measurements on and off, because our instructions are a sequence of measurements. And then we ultimately have to get back a string of zeros and ones. But initially that comes from these measurements where we’re getting, you know, phase shifts on microwaves, which are in turn telling us about small capacitance shifts, which are in turn telling us the parity of electrons in a wire.

HUIZINGA: Right.

NAYAK: So really, this is a quantum machine in which, you know, you have the qubits that are built on the quantum plane. You’ve then got this quantum-classical interface where the classical information is going in and out of the quantum processor. And then there’s a lot of classical processing that has to happen, both to enable error correction and to enable computations. And the whole thing has to be inside of a cryogenic environment. So it’s a very special environment in which, A, it’s kept cold, because that’s what you need in order to have a topoconductor, and that’s also what you need, just in general, for the qubits to be very stable. So when we talk about the full stack, just on the hardware side, there are many layers to this. And then of course, you know, there is the classical firmware that takes instructions and turns them into the physical things that need to happen. And then, of course, we have algorithms and then ultimately applications.

HUIZINGA: Yeah, so I would say, Chetan, that people can probably go do their own little research on how you go from temperatures that are lower than deep space to the room you’re working in. We don’t have time to unpack that on this show. And also, I was going to ask you what could possibly go wrong if you indeed got everything right. You mentioned earlier, you know, what happens in an AI world if we get everything right. If you put quantum and AI together, it’s an interesting question what that world looks like. Can you just take a brief second to say whether you’re thinking about what could happen to cryptography, to, you know, just all kinds of things that we might be wondering about in a post-quantum world?

NAYAK: Great question. So, you know, first of all, one of the things I want to, kind of, emphasize is that when we think about the potential for a technology, often the limit ultimately comes down to physics. There are physics limits. You know, if you think about, like, interstellar travel and things like that, well, the speed of light is kind of a hard cutoff, [LAUGHTER] and actually, you’re not going to be able to go faster than the speed of light, and you have to bake that in. If you think of a datacenter, ultimately there’s a certain amount of energy, and there’s a certain amount of cooling power you have. And you can say, well, this datacenter is 100 megawatts, and then in the future, we’ll have a gigawatt to use. But ultimately, that energy has to come from somewhere, and you’ve got some hard physical constraints. So similarly, you could ask, you know, with quantum computers, what are the hard physical constraints? Because you can’t make a perpetual motion machine; you can’t violate the laws of quantum mechanics. And I think in the early days, there was this concern that, you know, this idea relies on violating something, that you’re doing something that’s not going to work. I’d say the theory of quantum error correction, the theory of fault tolerance, and many of the algorithms that have been developed really do show that there is no fundamental physical constraint saying that this isn’t going to happen. That somehow you would need to have either more power than you can really generate, or you would need to go much colder than you can actually get. There’s no physical, you know, no-go result. So that’s an important thing to keep in mind. Now, the thing is, some people might then be tempted to say, well, OK, now it’s just an engineering problem because we know this in principle can work, and we just have to figure out how to make it work. But the truth is, there isn’t any such, like, hard barrier where you say, well, up until here, it’s fundamental physics, and beyond this, it’s just an engineering problem. The reality is, new difficulties and challenges arise every step along the way. And one person might call it an engineering or an implementation challenge, and another person may call it a fundamental, you know, barrier or obstruction, and I think people will probably profitably disagree, you know, agree to disagree on, like, where that line goes. I think for us, like, it was really crucial, you know, to look out at the scale at which quantum computers are going to really make an impact. We’re going to need, you know, hundreds to thousands of logical qubits, that is, error-corrected qubits. And when you look at what that means, that means really a million physical qubits. That is a very large scale in a world in which people have mostly learned what we know about these things from 10 to 100 qubits. To project out from that to a million, you know, it would surprise me if the solutions that are optimal for 10 to 100 qubits are the same solutions that are optimal for a million qubits, right.
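[For reference: the implied error-correction overhead here is on the order of a thousand physical qubits per logical qubit, since $10^3 \text{ logical} \times 10^3 \ \text{physical/logical} \approx 10^6$ physical qubits. The ratio is an illustrative figure consistent with the numbers quoted, not a published specification.]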

HUIZINGA: Yeah.

NAYAK: And that has been a motivation for us: let’s try to think, based on what we now know, of things that at least have a chance to work at that million-qubit scale. Let’s not do anything that looks like it’s going to clearly hit a dead end before then.

HUIZINGA: Right.

NAYAK: Now, obviously in science, nothing is certain, and you learn new things along the way, but we didn’t want to start out with things that looked like they were not going to, you know, work for a million qubits. That was the reason that we developed this new material, that we created, engineered, this new material, you know, these topoconductors, precisely because we said we need a material that gives us something we can operate fast, make small, and control. So, you know, I think that’s one key thing. And, you know, what we’ve demonstrated now is that we can harness this, that we’ve got a qubit. And that’s why we have a lot of confidence that, you know, these are things that aren’t going to be decades away, that these things are going to be years away. And that was the basis for our interaction with DARPA [Defense Advanced Research Projects Agency]. We’ve just signed a contract with DARPA to go into the next phase of the DARPA US2QC program. And, you know, DARPA, the US government, wants to see a fault-tolerant quantum computer, because they do not want any surprises.

HUIZINGA: Right?!? [LAUGHS]

NAYAK: And, you know, there are people out there who said, you know, quantum computers are decades away; don’t worry about it. But I think the US government realizes they might be years, not decades, away, and they want to get ahead of that. And so that’s why they’ve entered into this agreement, this contract, with us.

HUIZINGA: Yeah.

NAYAK: And so, you know, the thing I just want to make sure that listeners to the podcast understand is that we fundamentally re-engineered, re-architected, what we think a quantum computer should look like and what the qubit should be, even going all the way down to the underlying materials, which is high risk, right? I mean, there was no guarantee, there’s no guarantee, that any of this is going to work, A. And, B, there was no guarantee we would even be able to do the things we’ve done so far. I mean, you know, that’s the nature of it. If you’re going to try to do something really different, you’re going to have to take risks. And we did take risks by really starting at, you know, the ground floor and trying to redesign and re-engineer these things. So that was a necessary part of this journey and the story, for us to re-engineer these things in a high-risk way. What that leads to is, you know, potentially changing that timeline. And so in that context, it’s really important to make this transition to post-quantum crypto because, you know, the cryptography systems in use up until now are not safe from quantum attacks if you have a utility-scale quantum computer. We do know that there are cryptosystems which, at least as far as we know, appear to be safe from quantum attacks. That’s what’s called post-quantum cryptography. You know, they rely on different types of hard math problems, which quantum computers probably aren’t good at. And so, you know, changing over to a new crypto standard isn’t something that happens at the flip of a switch.

HUIZINGA: No.

NAYAK: It’s something that takes time. You know, the first, early part of that was based around the National Institute of Standards and Technology aligning on one or a few standard systems that people would implement, which they certified to be quantum safe, and, you know, those processes have occurred. And so now is the time to switch over. Given that we know we can do this and that it won’t happen overnight, now’s the time to make that switch.
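[For reference: the quantum risk to today’s public-key systems comes from Shor’s algorithm, which factors an $n$-bit integer in time polynomial in $n$, whereas the best-known classical method, the general number field sieve, takes roughly $\exp\!\big(O\big(n^{1/3}(\log n)^{2/3}\big)\big)$. Post-quantum schemes instead rest on problems, such as those on lattices, for which no comparable quantum speedup is known. This is standard background, not part of the conversation.]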

HUIZINGA: And we’ve had several cryptographers on the show who’ve been working on this for years. It’s not like they’re just starting. They saw this coming even before you had some solidity in your work. But listen, I would love to talk to you for hours, but we’re coming to a close here. And as we close, I want to refer to a conversation you had with Distinguished University Professor Sankar Das Sarma. He suggested that with the emergence of Majorana zero modes, you had reached the end of the beginning and that you were now sort of embarking on the beginning of the end in this work. Well, maybe that’s a sort of romanticized vision of it. But could you give us a little bit of a hint about the next milestones on your road to a scalable, reliable quantum computer, and what’s on your research roadmap to reach them?

NAYAK: Yeah, so interestingly, we actually just posted on the arXiv a paper that shows some aspects of our roadmap, kind of the more scientific aspects of our roadmap. And that roadmap is, kind of, continuously going from the scientific discovery phase through the engineering phase, OK. Again, as I said, it’s a matter of debate, and even taste, what exactly you want to call scientific discovery versus engineering (which will be hotly debated, I’m sure), but it is definitely a continuum that’s going from one towards the other. And I would say, you know, at a high level, logical qubits, you know, error-corrected, reliable qubits, are the basis of quantum computation at scale, and developing, demonstrating, and building those logical qubits at scale is kind of the big thing that, for us and for the whole industry, is, I would say, sort of the next level of quantum computing. Jason Zander wrote this blog where he talked about level one, level two, level three, where level one was this NISQ, noisy intermediate-scale quantum, era; level two is foundations of, you know, reliable and logical qubits; and level three is, you know, at-scale logical qubits. I think we’re heading towards level two, and so in my mind, that’s sort of, you know, the next North Star. I think there will be a lot of very interesting and important things that are more technical and maybe not as accessible to a big audience. But I’d say that’s, kind of, the thing to keep in mind as a big exciting thing happening in the field.

HUIZINGA: Yeah. Well, Chetan Nayak, what a ride this show has been. I’m going to be watching this space—and the timelines thereof because they keep getting adjusted!

[MUSIC]

Thank you for taking time to share your important work with us today.

NAYAK: Thank you very much, my pleasure!

[MUSIC FADES]
