Research Focus: Week of May 13, 2024

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning 

Large language models (LLMs) have shown remarkable performance in generating text similar to that created by people, proving to be a valuable asset across various applications. However, adapting these models to incorporate new, out-of-domain knowledge remains a challenge, particularly for facts and events that occur after the model’s training knowledge cutoff date.

In a recent paper: Injecting New Knowledge into Large Language Models via Supervised Fine-Tuning, researchers from Microsoft investigate the effectiveness of supervised fine-tuning (SFT) as a method for knowledge injection in LLMs, specifically focusing on recent sporting events. They compare different dataset generation strategies—token-based and fact-based scaling—to create training data that helps the model learn new information. Their experiments on GPT-4 demonstrate that while token-based scaling can lead to improvements in Q&A accuracy, it may not provide uniform coverage of new knowledge. Fact-based scaling, on the other hand, offers a more systematic approach to ensure even coverage across all facts. The researchers present a novel dataset generation process that leads to more effective knowledge ingestion through SFT, and results show considerable performance improvements in Q&A tasks related to out-of-domain knowledge. 


A Reflection on Human-Notebook Experiences in the Era of AI

Computational notebooks provide an interactive way to work with data. They have been widely used by data professionals to write code, explore data, and generate visualizations, all in one document. Previous research has revealed unique pain points around the user experience in computational notebooks. However, as AI tools like ChatGPT or Copilot have emerged, it is unclear whether these pain points have been reduced or changed, or whether new pain points have arisen. Due to the fast pace of advances in AI technology, the development of new AI tools has been driven primarily by technology rather than by user experience.

In a recent paper: A Reflection on Human-Notebook Experiences in the Era of AI, researchers from Microsoft summarize literature on how new AI technology has impacted human-notebook interaction and human-computer interaction (HCI) paradigms, new challenges and user behavior around using AI assistants, and recent research on AI assistants in computational notebook scenarios. They outline gaps in existing literature and suggest a future focus on improving macro human-notebook experiences throughout a user’s workflow, measuring and quantifying the value of AI systems, and establishing a set of standards and best practices for AI tools.

Microsoft Research Podcast

Collaborators: Renewable energy storage with Bichlien Nguyen and David Kwabi

Dr. Bichlien Nguyen and Dr. David Kwabi explore their work in flow batteries and how machine learning can help more effectively search the vast organic chemistry space to identify compounds with properties just right for storing waterpower and other renewables.


Jacdac: Service-Based Prototyping of Embedded Systems

The traditional approach to programming embedded systems is monolithic: firmware on a microcontroller contains both application code and the drivers needed to communicate with sensors and actuators, using low-level protocols such as I2C, SPI, and RS232. In comparison, software development for the cloud has moved to a service-based development and operation paradigm: a service provides a discrete unit of functionality that can be accessed remotely by an application, or other service, but is independently managed and updated.

In a recent paper: Jacdac: Service-Based Prototyping of Embedded Systems, researchers from Microsoft propose, design, implement, and evaluate a service-based approach to prototyping embedded systems called Jacdac. Jacdac defines a service specification language, designed especially for embedded systems, along with a host of specifications for a variety of sensors and actuators. With Jacdac, each sensor/actuator in a system is paired with a low-cost microcontroller that advertises the services that represent the functionality of the underlying hardware over an efficient and low-cost single-wire bus protocol. A separate microcontroller executes the user’s application program, which is a client of the Jacdac services on the bus.

Three Jacdac kits, comprising over twenty modules, have been produced by third-party manufacturers: KittenBot and Forward Education.


PARIKSHA: A Scalable, Democratic, Transparent Evaluation Platform for Assessing Indic Large Language Models

Evaluation of multilingual LLMs is challenging due to a variety of factors – the lack of benchmarks with sufficient linguistic diversity, contamination of popular benchmarks in LLM pre-training data, and the lack of local, cultural nuances in translated benchmarks. Hence, it is difficult to extensively evaluate LLMs in a multilingual setting, leading to a lack of fair comparisons between models and difficulties in replicating the evaluation setup used by some models. Recently, several Indic (Indian language) LLMs have been created to help build more locally and culturally relevant LLMs.

In a recent paper: PARIKSHA: A Scalable, Democratic, Transparent Evaluation Platform for Assessing Indic Large Language Models, researchers from Microsoft present an evaluation framework, which is the first comprehensive evaluation of Indic LLMs using a combination of human and LLM-based evaluation. The researchers conduct a total of 90,000 human evaluations and 50,000 LLM-based evaluations of 29 models to present leaderboards for 10 Indic languages. Pariksha provides inclusive evaluation by engaging a community of workers that represent India’s large and diverse workforce and also serves as a research platform for improving the process of evaluation. For transparency on the process, the evaluation artifacts will be released. Conducting Pariksha at regular intervals, the researchers aim to enable models to improve over time with insights and artifacts from their evaluations. 


Tinker, Tailor, Configure, Customize: The Articulation Work of Customizing AI Fairness Checklists

Many responsible AI resources, such as toolkits, playbooks, and checklists, have been developed to support AI practitioners in identifying, measuring, and mitigating potential fairness-related harms. These resources are often designed to be general purpose, in order to address a variety of use cases, domains, and deployment contexts. However, this can lead to decontextualization, where such resources lack the level of relevance or specificity needed to use them.

To understand how AI practitioners might contextualize one such resource, an AI fairness checklist, for their particular use cases, domains, and deployment contexts, researchers from Microsoft conducted a retrospective contextual inquiry with 13 AI practitioners from seven organizations. In a recent paper: Tinker, Tailor, Configure, Customize: The Articulation Work of Customizing AI Fairness Checklists, they identify how contextualizing this checklist introduces new forms of work for AI practitioners and other stakeholders, while opening up new sites for negotiation and contestation of values in AI. The researchers also identify how the contextualization process may help AI practitioners develop a shared language around AI fairness, as well as dynamics related to ownership over this process that suggest larger issues of accountability in responsible AI work.


MS MARCO Web Search: A Large-scale Information-rich Web Dataset with Millions of Real Click Labels

LLMs are becoming indispensable tools for many creative and information-related tasks, but they still come with limitations, including a tendency to fabricate content. State-of-the-art algorithms pair the LLM with an external, dynamically updated knowledge base to ground the LLM’s answers and provide up-to-date information. However, these techniques require large amounts of relevant, labeled training data that have not previously been publicly available.

In a recent paper: MS MARCO Web Search: A Large-scale Information-rich Web Dataset with Millions of Real Click Labels presented at the 2024 ACM Web Conference, researchers from Microsoft introduce a novel dataset that closely mimics real-world web document and query distribution. MS MARCO Web Search contains 10 million unique queries across 93 languages with millions of relevant labeled query-document pairs. It uses ClueWeb22’s 10 billion high-quality web pages as the document corpus and provides rich information for various kinds of downstream tasks. 

This dataset unlocks several new research directions that previous datasets could not well support, including generic end-to-end neural indexer models, generic embedding models, and next-generation information access systems with LLMs. MS MARCO Web Search offers a retrieval benchmark with three web-scale retrieval challenge tasks, each with automatic evaluation and a leaderboard. These tasks demand innovation in both machine learning and information retrieval systems. The researchers intend for MS MARCO Web Search to lay the groundwork for future advancements in AI and systems research.


AI Case Studies for Natural Science Research with Bonnie Kruft

Among the stunning changes and disruptions driven by AI, one of the most significant is the impact on scientific discovery. In her presentation at EmTech Digital 2024, Bonnie Kruft, partner deputy director at Microsoft Research AI for Science, outlined some examples of how generative AI enables groundbreaking research in the natural sciences. Recent breakthroughs aided by AI include small-molecule inhibitors for treating infectious disease, the discovery of new materials for energy storage, and new drug development.

Catch a replay of the presentation, including a follow-up Q&A with the audience, and hear how researchers are reducing discovery times from years to months. The discussion explores safe and responsible AI practices, how large language models can work with science-based models, and what lies ahead for AI in science. 

Microsoft Research in the news


The tiny glass blocks that can preserve your data for centuries 

The Times UK | April 27, 2024

Microsoft’s Project Silica is an innovative form of long-term storage – potentially revolutionizing how important data can be preserved for future generations.


These Recyclable Circuit Boards Could Stem E-Waste 

IEEE Spectrum | May 2, 2024

New research from the University of Washington and Microsoft shows that vitrimer-based PCBs can be broken down into a gel for repeated reuse. The research stems from the Microsoft Research Climate Initiative.


Today’s AI models are impressive. Teams of them will be formidable 

The Economist | May 13, 2024

Teams of LLMs are more capable and intelligent than solitary agents because a single job can be split into many smaller, more specialized tasks, says Chi Wang, a principal researcher at Microsoft Research in Redmond, Washington.


You Only Cache Once: Decoder-Decoder Architectures for Language Models 

Microsoft Research LinkedIn | May 11, 2024

YOCO is a novel decoder-decoder architecture for LLMs, enhancing memory efficiency by caching key-value pairs only once. It slashes KV cache memory and prefilling time and makes 1M-length LLMs practical.


Peter Lee discusses new technologies that will drive the future of drug discovery 

AAPS | May 10, 2024

The president of Microsoft Research explores how advances in technologies such as AI and machine learning are transforming biotechnology, in the closing plenary of the AAPS National Biotechnology Conference (NBC) on Thursday, May 16.


PKSHA develops advanced LLMs in collaboration with Microsoft Japan 

Business Wire | April 29, 2024

PKSHA Technology has developed one of the first Japanese-English LLMs in collaboration with Microsoft Japan. This development primarily focuses on boosting productivity within contact centers and corporate help desks.


BRAID fellowships include three collaborations with Microsoft Research 

Bridging Responsible AI Divides | May 2024

BRAID fellowships support individual researchers in partnership with public and private organizations to address challenges in the field of responsible AI. Among the latest fellowships are three supported by Microsoft Research.


Build a serverless exam generator application from your own lecture content using Amazon Bedrock

Crafting new questions for exams and quizzes can be tedious and time-consuming for educators. The time required varies based on factors like subject matter, question types, experience level, and class level. Multiple-choice questions require substantial time to generate quality distractors and ensure a single unambiguous answer, and composing effective true-false questions demands careful effort to avoid vagueness and assess deeper understanding. Creating high-quality assessment questions of any format necessitates meticulous attention to detail from educators in order to produce fair and valid student evaluations. To streamline this cumbersome process, we propose an automated exam generation solution based on Amazon Bedrock.

In this post, we explore how to build an application that generates tests tailored to your own lecture content. We cover the technical implementation using the Anthropic Claude large language model (LLM) on Amazon Bedrock and AWS Lambda deployed with the AWS Serverless Application Model (AWS SAM). This solution enables educators to instantly create curriculum-aligned assessments with minimal effort. Students can take personalized quizzes and get immediate feedback on their performance. This solution simplifies the exam creation process while benefiting both teachers and learners.

Amazon Bedrock

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. In this post, we focus on a text generation use case; you can choose from Amazon Titan Text G1 and other models on Amazon Bedrock, including Anthropic Claude, AI21 Labs Jurassic, Meta Llama 2, and Cohere Command.

With the ability to scale up to 200,000-token context windows, Anthropic Claude v2.1 on Amazon Bedrock is our preferred choice for this post. It is typically helpful when working with lengthy documents such as entire books. When we talk about tokens, we refer to the smallest individual “atoms” of a language model, which can correspond to words, subwords, characters, or even bytes (in the case of Unicode). For Anthropic Claude on Amazon Bedrock, the average token is about 3.5 English characters. The 200,000 tokens supported by Anthropic Claude v2.1 on Amazon Bedrock would be equivalent to roughly 150,000 words or over 500 pages of documents.
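
As a rough sanity check on those numbers (exact ratios vary by tokenizer and text), the arithmetic can be reproduced in a few lines of Python; the words-per-token and words-per-page ratios below are common rules of thumb, not Bedrock specifications:

# Back-of-the-envelope context window math for Claude v2.1 on Amazon Bedrock.
context_tokens = 200_000
chars_per_token = 3.5      # average for English text, per the paragraph above
words_per_token = 0.75     # common rule of thumb, not an exact tokenizer ratio
words_per_page = 300       # a reasonably dense page of text

approx_chars = context_tokens * chars_per_token   # 700,000 characters
approx_words = context_tokens * words_per_token   # ~150,000 words
approx_pages = approx_words / words_per_page      # ~500 pages

print(f"~{approx_words:,.0f} words across ~{approx_pages:,.0f} pages")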

This post demonstrates how to use advanced prompt engineering to control an LLM’s behavior and responses. It shows how to randomly generate questions and answers from lecture files, implemented as a simple serverless application.

Solution overview

The following diagram illustrates the application architecture. We distinguish two paths: the educator path (1) and the learner path (2).

As first-time users, both the educator and the learner need to complete the sign-up process, which is handled by two separate Amazon Cognito user pools. For the educator, when the sign-up is complete, Amazon Cognito invokes the Lambda function called CognitoPostSignupFn to subscribe the educator to an Amazon Simple Notification Service (Amazon SNS) topic. The educator must approve the subscription to this topic in order to be notified by email with the scorecard of each learner who takes the generated exam. A sketch of this trigger appears after Figure 1.

Figure 1: Architectural diagram of the exam generator application
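
While the post does not include the function’s source, a minimal sketch of what CognitoPostSignupFn might look like follows; the environment variable name is an assumption, and the subscription call is the standard boto3 SNS API:

# Hypothetical sketch of CognitoPostSignupFn: subscribe the newly confirmed
# educator's email address to the SNS topic used for scorecard notifications.
import os

import boto3

sns = boto3.client("sns")
TOPIC_ARN = os.environ["SCORECARD_TOPIC_ARN"]  # assumed environment variable


def lambda_handler(event, context):
    # Cognito post-confirmation events carry the user's attributes.
    email = event["request"]["userAttributes"]["email"]
    sns.subscribe(TopicArn=TOPIC_ARN, Protocol="email", Endpoint=email)
    # Cognito triggers must return the event object to continue the flow.
    return event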

The workflow includes the following steps:

  1. The educator opens the landing page for generating an exam under the domain gen-exam.<your-domain-name> through Amazon Route 53, which redirects the request to the Application Load Balancer (ALB).

1.1 The ALB communicates with Amazon Cognito to authenticate the educator on the educator user pool.

1.2 The educator uploads a lecture as a PDF file into the exam generation front-end.

1.3 The Amazon Elastic Container Service (Amazon ECS) container running on AWS Fargate uploads the file to Amazon Simple Storage Service (Amazon S3) in the Examgen bucket under the prefix exams.

1.4 The S3 bucket is configured with an event notification: whenever a new file is uploaded, the PutObject event invokes the ExamGenFn Lambda function (see the sketch after these workflow steps).

1.5 The Lambda function ExamGenFn invokes the Anthropic Claude v2.1 model on Amazon Bedrock to generate exam questions and answers as a JSON file.

1.6 The Amazon Bedrock API returns the output Q&A JSON file to the Lambda function.

1.7 The ExamGenFn Lambda function saves the output file to the same S3 bucket under the prefix Questions-bank. (You can choose to save it to a different S3 bucket.)

1.8 The ExamGenFn Lambda function sends an email notification to the educator through the SNS topic to notify that the exam has been generated.

  2. The learner opens the landing page to take the exam under the domain take-exam.<your-domain-name> through Route 53, which redirects the request to the ALB.

2.1 The ALB communicates with Amazon Cognito to authenticate the learner on the learner user pool.

2.2 The learner accesses the frontend and selects a test to take.

2.3 The container image sends the REST API request to Amazon API Gateway (using the GET method).

2.4 API Gateway communicates with the TakeExamFn Lambda function as a proxy.

2.5 The TakeExamFn Lambda function retrieves the available exam in JSON format from the S3 bucket under the prefix Questions-bank.

2.6 The JSON file is returned to API Gateway.

2.7 API Gateway transmits the JSON file to the ECS container in the front-end.

2.8 The container presents the exam as a UI using the Streamlit framework. The learner then takes the exam. When the learner is finished and submits their answers, the ECS container compares the answers provided with the correct answers, and then shows the score results to the learner.

2.9 The ECS container stores the scorecard in an Amazon DynamoDB table.

2.10 The Lambda DynamoDBTriggerFn function detects the new scorecard record on the DynamoDB table and sends an email notification to the educator with the learner’s scorecard.
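
To make steps 1.4 through 1.8 concrete, here is a minimal, hypothetical sketch of the ExamGenFn handler; the environment variable, prefixes, and simplified prompt are illustrative assumptions rather than the actual production code (the real function would first extract text from the uploaded PDF):

# Hypothetical sketch of ExamGenFn (steps 1.4-1.8): triggered by an S3 event,
# it generates questions with Amazon Bedrock, saves them, and notifies the educator.
import json
import os
import urllib.parse

import boto3

s3 = boto3.client("s3")
bedrock = boto3.client("bedrock-runtime")
sns = boto3.client("sns")
TOPIC_ARN = os.environ["NOTIFICATION_TOPIC_ARN"]  # assumed environment variable


def lambda_handler(event, context):
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

    # Simplification: treat the object as plain text; the real function
    # extracts text from the uploaded PDF first.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    lecture_text = body.decode("utf-8", errors="ignore")

    # Drastically simplified stand-in for the full prompt template shown later.
    prompt = (
        f"\n\nHuman: Create exam questions from this book:"
        f"\n<exam_book>{lecture_text}</exam_book>\n\nAssistant:"
    )

    # Step 1.5: invoke Claude v2.1 through the Bedrock text-completions API.
    response = bedrock.invoke_model(
        modelId="anthropic.claude-v2:1",
        body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 2048}),
    )
    questions = json.loads(response["body"].read())["completion"]

    # Steps 1.7 and 1.8: persist the Q&A file and notify the educator.
    out_key = f"Questions-bank/{os.path.basename(key)}.json"
    s3.put_object(Bucket=bucket, Key=out_key, Body=questions.encode("utf-8"))
    sns.publish(TopicArn=TOPIC_ARN, Message=f"Exam generated: {out_key}")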

This is an event-driven architecture made up of individual AWS services that are loosely integrated with each other, with each service handling a specific function. It uses AWS serverless technologies, allowing you to build and run your application without having to manage your own servers. All server management is done by AWS, providing many benefits such as automatic scaling and built-in high availability, letting you take your idea to production quickly.

Prerequisites

In this section, we go through the prerequisite steps to complete before you can set up this solution.

Enable model access through Amazon Bedrock

You can add access to a model from the Amazon Bedrock console. For this walkthrough, you need to request access to the Anthropic Claude model on Amazon Bedrock. For more information, see Model access.

Install the necessary packages

You need to install the following: at minimum, the AWS CLI, the AWS SAM CLI, and Docker, all of which are used in the walkthrough that follows.

Register a DNS domain and create certificates

If you don’t already have a DNS domain registered, you need to register one in order not to expose the DNS name of your ALB. For instructions, refer to Registering a new domain.

You also need to request two public certificates, one for each front-end: gen-exam.<your-domain-name> and take-exam.<your-domain-name>. Refer to Requesting a public certificate to request a public certificate on AWS Certificate Manager.

Save the values for genCertificateArn and takeCertificateArn.
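
If you prefer to script this step, the equivalent boto3 calls look roughly like the following; the domain names and region are placeholders to substitute with your own values:

# Hypothetical sketch: request both public certificates with DNS validation
# and print the ARNs to save as genCertificateArn and takeCertificateArn.
import boto3

acm = boto3.client("acm", region_name="<your-region>")  # placeholder region

for domain in ("gen-exam.<your-domain-name>", "take-exam.<your-domain-name>"):
    response = acm.request_certificate(DomainName=domain, ValidationMethod="DNS")
    print(domain, response["CertificateArn"])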

If you want to build the app in a development environment without using your own domain, you can uncomment the following section in the sam template:

# un-comment if you need to test with HTTP traffic and no certificate
#  ExamGenALBHTTPListener:
#    Type: AWS::ElasticLoadBalancingV2::Listener
#    Properties:
#      LoadBalancerArn: !Ref ExamGenALB
#      Protocol: HTTP
#      Port: 80
#      DefaultActions:
#        - Type: forward
#          TargetGroupArn: !Ref ExamGenTG

Chain-of-Thought (CoT) Prompting

Before we embark on constructing the app, let’s delve into prompt engineering. We use Chain-of-Thought (CoT) Prompting, which allows the model to break down complex reasoning into smaller, more manageable steps. By providing the AI with intermediate prompts that guide its reasoning process step by step, CoT prompting enables the model to tackle sophisticated reasoning tasks. Guiding the AI through an analytical chain of thought in this way allows it to develop complex reasoning capabilities that would otherwise be beyond its unaided abilities.

In the ExamGenFn Lambda function, we use the following prompt to guide the model through reasoning steps. You can change the prompt and give it different personas and instructions, and see how it behaves.

template_instruction = f"""Human: 
You are a teacher during examination time and you are responsible for creating exam questions from the student study book.
Before creating the questions
- Analyze the book found between <exam_book> </exam_book> tags, to identify distinct chapters, sections, or themes for question generation.
- For true/false questions, select statements that can be clearly identified as true or false based on the book's content.
- For MCQs, develop questions that challenge the understanding of the material, ensuring one correct answer and {n_mcq_options-1} distractors that are relevant but incorrect.
- Randomize the selection of pages or topics for each run to generate a new set of questions, ensuring no two sets are identical.
Please provide the questions in this format exactly for MCQ:
- The output should be like     
"question": "What is the colour of the car in the book?",
"options": ["Blue", "Green", "Yellow", "Grey"],
"correct_answer": "Yellow"
For True/False:
- the output should be like     
"question": "is the sky Blue?",
"options": ["True", "False"],
"correct_answer": "True"
                               
Generate {n_tfq} true/false and {n_mcq} multiple-choice questions (MCQs) ensuring each question pertains to different pages or topics within the book. For MCQs, provide {n_mcq_options} options for each question. Focus on creating unique questions that cover a broad spectrum of the book's content, avoiding repetition and ensuring a diverse examination of the material. Use the following guidelines:
                               
1. True/False Questions:
- Craft each true/false question based on factual statements or key concepts from the book.
- Ensure each question spans a wide range of topics to cover the book comprehensively.
                               
                               
2. Multiple-Choice Questions (MCQs):
- Formulate each MCQ to assess understanding of significant themes, events, or facts.
- Include {n_mcq_options} options per MCQ, making sure one is correct and the others are plausible but incorrect.
- Diversify the content areas and pages/topics for each MCQ to avoid overlap and repetition. 
""" 

Build the exam generator application

The application presented in this post is available in the following GitHub repo with the building blocks code. Let’s start by cloning the repo with git clone.

We recommend using temporary credentials with the AWS CLI to make programmatic requests for AWS resources using the AWS CLI.

Build the front-end using Streamlit and Docker

You build two containers, one for generating exams and one for taking exams. Let’s start with building the generating exam Docker image:

  1. Go to the following path in the repo and build your Docker image:
user@exam-gen ~ % cd exam-gen-ai-blog/frontend/generate-exam-fe

user@exam-gen generate-exam-fe % docker build -t <your-image-name>:tag .
  2. Authenticate the Docker CLI to Amazon Elastic Container Registry (Amazon ECR):
aws ecr get-login-password --region <your-region> | docker login --username AWS --password-stdin <your-account-id>.dkr.ecr.<your-region>.amazonaws.com
  3. Create a new repository in Amazon ECR:
aws ecr create-repository --repository-name <your-repository-name>
  4. Tag your Docker image with the ECR repository URI:
docker tag <your-image-name>:tag <your-account-id>.dkr.ecr.<your-region>.amazonaws.com/<your-ecr-repository>:tag
  5. Push your tagged Docker image to your ECR repository:
docker push <your-account-id>.dkr.ecr.<your-region>.amazonaws.com/<your-ecr-repository>:tag
  6. Navigate to this path in the repo to build your Docker image for taking the exam:
user@exam-gen ~ % cd exam-gen-ai-blog/frontend/take-exam-fe
  7. Because the authentication and the ECR repo are already set up, run the following commands directly:
user@exam-gen take-exam-fe % docker build -t <your-image-name>:tag .

docker tag <your-image-name>:tag <your-account-id>.dkr.ecr.<your-region>.amazonaws.com/<your-ecr-repository>:tag

docker push <your-account-id>.dkr.ecr.<your-region>.amazonaws.com/<your-ecr-repository>:tag
  8. Copy the values for GenExamImageUri and TakeExamImageUri.

Now that you have both containers ready to run, let’s build the rest of the components using AWS SAM.

Build solution components with AWS SAM

AWS SAM consists of two parts:

  • AWS SAM template specification – An open source framework that you can use to define your serverless application infrastructure on AWS
  • AWS SAM CLI – A command line tool that you can use with AWS SAM templates and supported third-party integrations to build and run your serverless applications

For further information, refer to Using the AWS Serverless Application Model (AWS SAM).

  1. Go to the home directory user@exam-gen ~ % cd exam-gen-ai-blog and run the sam build command.

Before you run sam deploy, be aware of the following:

  • The ECS containers are deployed on Fargate, which needs a VPC with two subnets in different Availability Zones. We use the default VPC for simplicity. You can create your own VPC or use an existing one in your AWS account and update the sam template. To list your VPC IDs and subnets within a selected VPC ID, run the following commands to extract your VpcId and your two SubnetId:
aws ec2 describe-vpcs
aws ec2 describe-subnets
  • GenExamCallbackURL (for generating exam) and TakeExamCallbackURL (for taking exam) are used by Amazon Cognito. They are URLs where the user is redirected to after a successful sign-in.
  2. Now let’s deploy the sam template:
sam deploy --stack-name <your-stack-name> --guided \
 --parameter-overrides \
 DefaultVPCID="your-default-vpc-id" \
 SubnetIdOne="your-subnet-one-id" \
 SubnetIdTwo="your-subnet-two-id" \
 genCertificateArn="arn:aws:acm:<your-region>:<your-account-id>:certificate/<your-certificate-id>" \
 takeCertificateArn="arn:aws:acm:<your-region>:<your-account-id>:certificate/<your-certificate-id>" \
 GenExamImageUri="<your-gen-image-uri>" \
 TakeExamImageUri="<your-take-image-uri>" \
 GenExamCallbackURL="gen-exam.<your-domain-name>" \
 TakeExamCallbackURL="take-exam.<your-domain-name>" \
 NotificationEmail="your-email-address@example.com" \
 --capabilities CAPABILITY_NAMED_IAM
        #Shows you resources changes to be deployed and require a 'Y' to initiate deploy
        Confirm changes before deploy [Y/n]: n
        #SAM needs permission to be able to create roles to connect to the resources in your template
        Allow SAM CLI IAM role creation [Y/n]: y
        #Preserves the state of previously provisioned resources when an operation fails
        Disable rollback [Y/n]: n
        Save arguments to configuration file [Y/n]: n

        Looking for resources needed for deployment:
        Creating the required resources...

        Successfully created!

You can follow the creation on the AWS CloudFormation console.

The following video demonstrates running the sam build and sam deploy commands.

Figure 2: SAM build and SAM deploy execution

  3. The final step is to get the DNS names for the deployed ALB, map them to the certificate domain names in Route 53, and add them as CNAME records.
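
This mapping can also be scripted. A hedged sketch using boto3, with placeholder values to replace:

# Hypothetical sketch: create CNAME records pointing each front-end domain
# at its ALB DNS name. The hosted zone ID and record values are placeholders.
import boto3

route53 = boto3.client("route53")
HOSTED_ZONE_ID = "<your-hosted-zone-id>"

records = {
    "gen-exam.<your-domain-name>": "<gen-exam-alb-dns-name>",
    "take-exam.<your-domain-name>": "<take-exam-alb-dns-name>",
}

for name, alb_dns in records.items():
    route53.change_resource_record_sets(
        HostedZoneId=HOSTED_ZONE_ID,
        ChangeBatch={"Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "CNAME",
                "TTL": 300,
                "ResourceRecords": [{"Value": alb_dns}],
            },
        }]},
    )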

Test the solution

You can use your browser to test the solution.

  1. Navigate to gen-exam.<your-domain-name>.

You’ll receive an email with a confirmation code.

  2. Enter the verification code and choose Confirm account.

Once verified, you will land on a page to generate your quiz.

  3. Choose the number of multiple-choice and true/false questions you want to generate, then choose Browse files to upload an input file.

For this example, we use the whitepaper AWS Cloud Adoption Framework: Security Perspective as our input file. We generate four multiple-choice questions and one true/false question.

  4. Confirm your subscription to the SNS topic (you’ll receive an email).

Then you’ll receive an email confirming the exam has been generated.

  5. Switch to take-exam.<your-domain-name>, and you’ll find the exam on the dropdown menu.
  6. Choose the exam, then choose Load quiz.

  7. Then you can take the exam and choose Submit to display the results.

The educator will receive an email with the scorecard of the learner.

You have just built a simple application that randomly generates questions and answers from uploaded documents. Learners can take the generated exams and educators can receive scorecards via email when tests are complete. The integration with the DynamoDB table allows you to store the responses on a long-term basis.
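
As an illustration of that long-term storage, writing one scorecard item might look like the following; the table name and attribute names are assumptions, since the post does not show the table schema:

# Hypothetical sketch of the scorecard write (step 2.9). A DynamoDB Streams
# trigger on this table then fires DynamoDBTriggerFn to email the educator.
from datetime import datetime, timezone

import boto3

table = boto3.resource("dynamodb").Table("<your-scorecard-table>")  # placeholder

table.put_item(Item={
    "learner_id": "learner-123",             # assumed partition key
    "exam_id": "aws-caf-security.json",      # assumed sort key
    "score": 4,
    "total": 5,
    "submitted_at": datetime.now(timezone.utc).isoformat(),
})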

Expanding the solution

There are many possibilities to build on top of this and create a fully featured learning and testing application. One area of expansion is uploading multiple documents at once. As of this writing, users can only upload one document at a time, but support for bulk uploads would improve efficiency and make it easier to work with large sets of source materials. Educators could be empowered to gather and upload content from various documents and websites as source material for questions. This provides greater flexibility compared to using a single document. Moreover, with a data store, they could view and analyze learner answers via a scorecard interface to track progress over time.

Clean up

It’s important to clean up your resources in the following order:

  1. On the Amazon S3 console, empty the bucket by deleting any files and folders.
  2. On the AWS CloudFormation console, delete the stack.

Conclusion

In this post, we showed how to build a generative AI application powered by Amazon Bedrock that creates exam questions using lecture documents as input, giving educators an automated tool to continuously modernize quiz material and improve learners’ skills. Learners can take the freshly generated exam and get their score results. With the capabilities of Amazon Bedrock and AWS SAM, you can increase educators’ productivity and foster student success.

For more information on working with generative AI on AWS for education use cases, refer to Generative AI in education: Building AI solutions using course lecture content.


About the Authors

Merieme Ezzaouia is a Solutions Architect at AWS dedicated to the public sector. She helps customers in education and sports turn their concepts into tangible solutions, develop new services, and foster innovation. Beyond work, Merieme’s passions include gardening, traveling the world, and reading.

Mohammed Reda is a Solutions Architect at Amazon Web Services. He helps UK schools, universities, and EdTech companies adopt cloud technologies, improve their educational offerings, and innovate on AWS. Outside of work, Mohammed enjoys running and watching cooking shows.


Accelerate NLP inference with ONNX Runtime on AWS Graviton processors

ONNX is an open source machine learning (ML) framework that provides interoperability across a wide range of frameworks, operating systems, and hardware platforms. ONNX Runtime is the runtime engine used for model inference and training with ONNX.

AWS Graviton3 processors are optimized for ML workloads, including support for bfloat16, Scalable Vector Extension (SVE), and Matrix Multiplication (MMLA) instructions. Bfloat16-accelerated SGEMM kernels and int8 MMLA-accelerated Quantized GEMM (QGEMM) kernels in ONNX have improved inference performance by up to 65% for fp32 inference and up to 30% for int8 quantized inference for several natural language processing (NLP) models on AWS Graviton3-based Amazon Elastic Compute Cloud (Amazon EC2) instances. Starting with version 1.17.0, ONNX Runtime supports these optimized kernels.

In this post, we show how to run ONNX Runtime inference on AWS Graviton3-based EC2 instances and how to configure them to use optimized GEMM kernels. We also demonstrate the resulting speedup through benchmarking.

Optimized GEMM kernels

ONNX Runtime supports the Microsoft Linear Algebra Subroutine (MLAS) backend as the default Execution Provider (EP) for deep learning operators. AWS Graviton3-based EC2 instances (c7g, m7g, r7g, c7gn, and Hpc7g instances) support bfloat16 format and MMLA instructions for the deep learning operator acceleration. These instructions improve the SIMD hardware utilization and reduce the end-to-end inference latency by up to 1.65 times compared to the armv8 DOT product instruction-based kernels.

The AWS team implemented MLAS kernels for bfloat16 fast math and int8 quantized General Matrix Multiply (GEMM) using BFMMLA, SMMLA, and UMMLA instructions, which have higher matrix multiplication throughput compared to DOT instructions. The bfloat16 support allows efficient deployment of models trained using bfloat16, fp32, and automatic mixed precision (AMP) without the need for quantization. As shown in the following diagrams, the optimized GEMM kernels are integrated into the ONNX Runtime CPU EP as MLAS kernels.

The first figure illustrates the ONNX software stack, highlighting (in orange) the components optimized for inference performance improvement on the AWS Graviton3 platform.


The following diagram illustrates the ONNX Runtime EP flow, highlighting (in orange) the components optimized for inference performance improvement on the AWS Graviton3 platform.


Enable the optimizations

The optimizations are part of the ONNX Runtime 1.17.0 release, and are available starting with the onnxruntime-1.17.0 Python wheels and conda-1.17.0 packages. Optimized int8 kernels are enabled by default and are picked up automatically on AWS Graviton3 processors. Bfloat16 fast math kernels, on the other hand, are not enabled by default and must be turned on through the following session options in ONNX Runtime:

// For C++ applications

SessionOptions so;
so.config_options.AddConfigEntry(kOrtSessionOptionsMlasGemmFastMathArm64Bfloat16, "1");

# For Python applications

import onnxruntime

sess_options = onnxruntime.SessionOptions()
sess_options.add_session_config_entry("mlas.enable_gemm_fastmath_arm64_bfloat16", "1")
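
To show where these options fit, here is a small end-to-end sketch of fp32 inference with the fast math kernels enabled; the model path and input names are placeholders for a BERT-style model exported to ONNX:

# Minimal sketch: fp32 inference on Graviton3 with bfloat16 fast math enabled.
import numpy as np
import onnxruntime

sess_options = onnxruntime.SessionOptions()
sess_options.add_session_config_entry("mlas.enable_gemm_fastmath_arm64_bfloat16", "1")

# "bert-base-cased.onnx" is a placeholder for your exported model.
session = onnxruntime.InferenceSession(
    "bert-base-cased.onnx", sess_options, providers=["CPUExecutionProvider"]
)

# Dummy BERT-style inputs: batch size 1, sequence length 128. The input
# names below are typical for exported BERT models but depend on the export.
seq = np.ones((1, 128), dtype=np.int64)
outputs = session.run(None, {
    "input_ids": seq,
    "attention_mask": seq,
    "token_type_ids": np.zeros((1, 128), dtype=np.int64),
})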

Benchmark results

We started by measuring the inference throughput, in queries per second, for the fp32 model without any of our optimizations (using ONNX Runtime 1.16.0), which is marked at 1.0 with the red dotted line in the following graph. Then we compared the improvements from bfloat16 fast math kernels in ONNX Runtime 1.17.1 for the same fp32 model inference. The normalized results are plotted in the graph. You can see that for the BERT, RoBERTa, and GPT2 models, the throughput improvement is up to 65%. Similar improvements are observed for the inference latency.

Figure: Normalized fp32 inference throughput improvement with bfloat16 fast math kernels

Similar to the preceding fp32 inference comparison, we started by measuring the inference throughput, in queries per second, for the int8 quantized model without any of our optimizations (using ONNX Runtime 1.16.0), which is marked at 1.0 with the red dotted line in the following graph. Then we compared the improvements from the optimized MMLA kernels in ONNX Runtime 1.17.1 for the same model inference. The normalized results are plotted in the graph. You can see that for the BERT, RoBERTa, and GPT2 models, the throughput improvement is up to 30%. Similar improvements are observed for the inference latency.

Figure: Normalized int8 quantized inference throughput improvement with optimized MMLA kernels

Benchmark setup

We used an AWS Graviton3-based c7g.4xl EC2 instance with an Ubuntu 22.04-based AMI to demonstrate the performance improvements with the optimized GEMM kernels from ONNX Runtime. The instance and AMI details are listed in the following snippet:

Instance: c7g.4xl instance
Region: us-west-2
AMI: ami-0a24e6e101933d294 (Ubuntu 22.04/Jammy with 6.5.0-1014-aws kernel)

The ONNX Runtime repo provides inference benchmarking scripts for transformers-based language models. The scripts support a wide range of models, frameworks, and formats. We picked PyTorch-based BERT, RoBERTa, and GPT models to cover the common language tasks like text classification, sentiment analysis, and predicting the masked word. The models cover both encoder and decoder transformers architecture.

The following code lists the steps to run inference for the fp32 model with bfloat16 fast math mode and int8 quantized mode using the ONNX Runtime benchmarking script. The script downloads the models, exports them to ONNX format, quantizes them into int8 for int8 inference, and runs inference for different sequence lengths and batch sizes. Upon successful completion of the script, it will print the inference throughput in queries/sec (QPS) and latency in msec along with the system configuration. Refer to the ONNX Runtime Benchmarking script for more details.

# Install Python
sudo apt-get update
sudo apt-get install -y python3 python3-pip

# Upgrade pip3 to the latest version
python3 -m pip install --upgrade pip

# Install onnx and onnx runtime
# NOTE: We used 1.17.1 instead of 1.17.0 as it was the latest
# version available while collecting data for this post
python3 -m pip install onnx==1.15.0 onnxruntime==1.17.1

# Install the dependencies
python3 -m pip install transformers==4.38.1 torch==2.2.1 psutil==5.9.8

# Clone onnxruntime repo to get the benchmarking scripts
git clone --recursive https://github.com/microsoft/onnxruntime.git
cd onnxruntime
git checkout 430a086f22684ad0020819dc3e7712f36fe9f016
cd onnxruntime/python/tools/transformers

# To run bert-large fp32 inference with bfloat16 fast math mode
python3 benchmark.py -m bert-large-uncased -p fp32 --enable_arm64_bfloat16_fastmath_mlas_gemm

# To run bert-base  fp32 inference with bfloat16 fast math mode
python3 benchmark.py -m bert-base-cased -p fp32 --enable_arm64_bfloat16_fastmath_mlas_gemm

# To run roberta-base  fp32 inference with bfloat16 fast math mode
python3 benchmark.py -m roberta-base -p fp32 --enable_arm64_bfloat16_fastmath_mlas_gemm

# To run gpt2  fp32 inference with bfloat16 fast math mode
python3 benchmark.py -m gpt2 -p fp32 --enable_arm64_bfloat16_fastmath_mlas_gemm

# To run bert-large int8 quantized inference
python3 benchmark.py -m bert-large-uncased -p int8

# To run bert-base int8 quantized inference
python3 benchmark.py -m bert-base-cased -p int8

# To run roberta-base int8 quantized inference
python3 benchmark.py -m roberta-base -p int8

# To run gpt2 int8 quantized inference
python3 benchmark.py -m gpt2 -p int8

Conclusion

In this post, we discussed how to run ONNX Runtime inference on an AWS Graviton3-based EC2 instance and how to configure the instance to use optimized GEMM kernels. We also demonstrated the resulting speedups. We hope that you will give it a try!

If you find use cases where similar performance gains are not observed on AWS Graviton, please open an issue on the AWS Graviton Technical Guide GitHub to let us know about it.


About the Author

Sunita Nadampalli is a Software Development Manager at AWS. She leads Graviton software performance optimizations for Machine Learning and HPC workloads. She is passionate about open source software development and delivering high-performance and sustainable software solutions with Arm SoCs.


Microsoft at CHI 2024: Innovations in human-centered design


The ways people engage with technology, through its design and functionality, determine its utility and acceptance in everyday use, setting the stage for widespread adoption. When computing tools and services respect the diversity of people’s experiences and abilities, technology is not only functional but also universally accessible. Human-computer interaction (HCI) plays a crucial role in this process, examining how technology integrates into our daily lives and exploring ways digital tools can be shaped to meet individual needs and enhance our interactions with the world.

The ACM CHI Conference on Human Factors in Computing Systems is a premier forum that brings together researchers and experts in the field, and Microsoft is honored to support CHI 2024 as a returning sponsor. We’re pleased to announce that 33 papers by Microsoft researchers and their collaborators have been accepted this year, with four winning the Best Paper Award and seven receiving honorable mentions.

This research aims to redefine how people work, collaborate, and play using technology, with a focus on design innovation to create more personalized, engaging, and effective interactions. Several projects emphasize customizing the user experience to better meet individual needs, such as exploring the potential of large language models (LLMs) to help reduce procrastination. Others investigate ways to boost realism in virtual and mixed reality environments, using touch to create a more immersive experience. There are also studies that address the challenges of understanding how people interact with technology. These include applying psychology and cognitive science to examine the use of generative AI and social media, with the goal of using the insights to guide future research and design directions. This post highlights these projects.



Best Paper Award recipients

DynaVis: Dynamically Synthesized UI Widgets for Visualization Editing 
Priyan Vaithilingam, Elena L. Glassman, Jeevana Priya Inala, Chenglong Wang 
GUIs used for editing visualizations can overwhelm users or limit their interactions. To address this, the authors introduce DynaVis, which combines natural language interfaces with dynamically synthesized UI widgets, enabling people to initiate and refine edits using natural language.  

Generative Echo Chamber? Effects of LLM-Powered Search Systems on Diverse Information Seeking  
Nikhil Sharma, Q. Vera Liao, Ziang Xiao  
Conversational search systems powered by LLMs potentially improve on traditional search methods, yet their influence on increasing selective exposure and fostering echo chambers remains underexplored. This research suggests that LLM-driven conversational search may enhance biased information querying, particularly when the LLM’s outputs reinforce user views, emphasizing significant implications for the development and regulation of these technologies.  

Piet: Facilitating Color Authoring for Motion Graphics Video  
Xinyu Shi, Yinghou Wang, Yun Wang, Jian Zhao 
Motion graphic (MG) videos use animated visuals and color to effectively communicate complex ideas, yet existing color authoring tools are lacking. This work introduces Piet, a tool prototype that offers an interactive palette and support for quick theme changes and controlled focus, significantly streamlining the color design process.

The Metacognitive Demands and Opportunities of Generative AI 
Lev Tankelevitch, Viktor Kewenig, Auste Simkute, Ava Elizabeth Scott, Advait Sarkar, Abigail Sellen, Sean Rintel 
Generative AI systems offer unprecedented opportunities for transforming professional and personal work, yet they present challenges around prompting, evaluating and relying on outputs, and optimizing workflows. This paper shows that metacognition—the psychological ability to monitor and control one’s thoughts and behavior—offers a valuable lens through which to understand and design for these usability challenges.  


Honorable Mentions

Big or Small, It’s All in Your Head: Visuo-Haptic Illusion of Size-Change Using Finger-Repositioning
Myung Jin Kim, Eyal Ofek, Michel Pahud, Mike J. Sinclair, Andrea Bianchi 
This research introduces a fixed-sized VR controller that uses finger repositioning to create a visuo-haptic illusion of dynamic size changes in handheld virtual objects, allowing users to perceive virtual objects as significantly smaller or larger than the actual device. 

LLMR: Real-time Prompting of Interactive Worlds Using Large Language Models 
Fernanda De La Torre, Cathy Mengying Fang, Han Huang, Andrzej Banburski-Fahey, Judith Amores, Jaron Lanier 
Large Language Model for Mixed Reality (LLMR) is a framework for the real-time creation and modification of interactive mixed reality experiences using LLMs. It uses novel strategies to tackle difficult cases where ideal training data is scarce or where the design goal requires the synthesis of internal dynamics, intuitive analysis, or advanced interactivity. 

Observer Effect in Social Media Use 
Koustuv Saha, Pranshu Gupta, Gloria Mark, Emre Kiciman, Munmun De Choudhury 
This work investigates the observer effect in behavioral assessments on social media use. The observer effect is a phenomenon in which individuals alter their behavior due to awareness of being monitored. Conducted over an average of 82 months (about 7 years) retrospectively and five months prospectively using Facebook data, the study found that deviations from expected behavior and language after enrollment in the study reflected individual psychological traits. The authors recommend ways to mitigate the observer effect in these scenarios.

Reading Between the Lines: Modeling User Behavior and Costs in AI-Assisted Programming 
Hussein Mozannar, Gagan Bansal, Adam Fourney, Eric Horvitz 
By investigating how developers use GitHub Copilot, the authors created CUPS, a taxonomy of programmer activities during system interaction. This approach not only elucidates interaction patterns and inefficiencies but can also drive more effective metrics and UI design for code-recommendation systems with the goal of improving programmer productivity. 

SharedNeRF: Leveraging Photorealistic and View-dependent Rendering for Real-time and Remote Collaboration 
Mose Sakashita, Bala Kumaravel, Nicolai Marquardt, Andrew D. Wilson 
SharedNeRF, a system for synchronous remote collaboration, utilizes neural radiance field (NeRF) technology to provide photorealistic, viewpoint-specific renderings that are seamlessly integrated with point clouds to capture dynamic movements and changes in a shared space. A preliminary study demonstrated its effectiveness, as participants used this high-fidelity, multi-perspective visualization to successfully complete a flower arrangement task. 

Understanding the Role of Large Language Models in Personalizing and Scaffolding Strategies to Combat Academic Procrastination 
Ananya Bhattacharjee, Yuchen Zeng, Sarah Yi Xu, Dana Kulzhabayeva, Minyi Ma, Rachel Kornfield, Syed Ishtiaque Ahmed, Alex Mariakakis, Mary P. Czerwinski, Anastasia Kuzminykh, Michael Liut, Joseph Jay Williams 
In this study, the authors explore the potential of LLMs for customizing academic procrastination interventions, employing a technology probe to generate personalized advice. Their findings emphasize the need for LLMs to offer structured, deadline-oriented advice and adaptive questioning techniques, providing key design insights for LLM-based tools while highlighting cautions against their use for therapeutic guidance.

Where Are We So Far? Understanding Data Storytelling Tools from the Perspective of Human-AI Collaboration 
Haotian Li, Yun Wang, Huamin Qu
This paper evaluates data storytelling tools using a dual framework to analyze the stages of the storytelling workflow—analysis, planning, implementation, communication—and the roles of humans and AI in each stage, such as creators, assistants, optimizers, and reviewers. The study identifies common collaboration patterns in existing tools, summarizes lessons from these patterns, and highlights future research opportunities for human-AI collaboration in data storytelling.


Learn more about our work and contributions to CHI 2024, including our full list of publications, on our conference webpage.


Learn how Amazon Ads created a generative AI-powered image generation capability using Amazon SageMaker

Amazon Ads helps advertisers and brands achieve their business goals by developing innovative solutions that reach millions of Amazon customers at every stage of their journey. At Amazon Ads, we believe that what makes advertising effective is delivering relevant ads in the right context and at the right moment within the consumer buying journey. With that goal, Amazon Ads has used artificial intelligence (AI), applied science, and analytics to help its customers drive desired business outcomes for nearly two decades.

In a March 2023 survey, Amazon Ads found that among advertisers who were unable to build successful campaigns, nearly 75 percent cited building the creative content as one of their biggest challenges. To help advertisers more seamlessly address this challenge, Amazon Ads rolled out an image generation capability that quickly and easily develops lifestyle imagery, which helps advertisers bring their brand stories to life. This blog post shares more about how generative AI solutions from Amazon Ads help brands create more visually rich consumer experiences.

In this blog post, we describe the architectural and operational details of how Amazon Ads implemented its generative AI-powered image creation solution on AWS. Before diving deeper into the solution, we start by highlighting the creative experience of an advertiser enabled by generative AI. Next, we present the solution architecture and process flows for machine learning (ML) model building, deployment, and inferencing. We end with lessons learned.

Advertiser creative experience

When building ad creative, advertisers prefer to customize the creative in a way that makes it relevant to their desired audiences. For example, an advertiser might have static images of their product against a white background. From an advertiser point of view, the process is handled in three steps:

  1. Image generation converts product-only images into rich, contextually relevant images using generative AI. The approach preserves the original product features, requiring no technical expertise.
  2. Anyone with access to the Amazon Ads console can create custom brand images without needing technical or design expertise.
  3. Advertisers can create multiple contextually relevant and engaging product images with no additional cost.

A benefit of the image-generation solution is the automatic creation of relevant product images based on product selection only, with no additional input required from the advertisers. While there are options to enhance background imagery such as prompts, themes, and custom product images, they are not necessary to generate compelling creative. If advertisers do not supply this information, the model will infer it based on information from their product listing on amazon.com.


Figure 1. An example from the image generation solution showing a hydro flask with various backgrounds.

Solution overview

Figure 2 shows a simplified solution architecture for inferencing and model deployment. The steps for model development and deployment are shown in blue circles and labeled with Roman numerals (i, ii, …, viii), whereas inferencing steps are shown in orange with Arabic numerals (1, 2, …, 8).


Figure 2. Solution architecture for inferencing and model deployment.

Amazon SageMaker is at the center of model development and deployment. The team used Amazon SageMaker JumpStart to rapidly prototype and iterate under their desired conditions (step i). Acting as a model hub, JumpStart provided a large selection of foundation models and the team quickly ran their benchmarks on candidate models. After selecting candidate large language models (LLMs), the science teams can proceed with the remaining steps by adding more customization. Amazon Ads applied scientists use SageMaker Studio as the web-based interface to work with SageMaker (step ii). SageMaker has the appropriate access policies to view some intermediary model results, which can be used for further experimentation (step iii).
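
The post does not share the team’s prototyping code, but deploying a candidate text-to-image foundation model from the JumpStart hub with the SageMaker Python SDK generally follows a pattern like this; the model ID and instance type below are public examples, not necessarily what Amazon Ads used:

# Hypothetical JumpStart prototyping sketch (step i), using a public
# text-to-image model ID as a stand-in for the team's candidate models.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(
    model_id="model-txt2img-stabilityai-stable-diffusion-v2-1-base"
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")

# Payload format varies by model; this prompt-only request is illustrative.
response = predictor.predict({"prompt": "a water bottle on a mossy forest rock"})

predictor.delete_endpoint()  # tear down the prototype endpoint when finished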

The Amazon Ads team manually reviewed images at scale through a human-in-the-loop process, ensuring that the application produces high-quality and responsible images. To do that, the team deployed testing endpoints using SageMaker and generated a large number of images spanning various scenarios and conditions (step iv). Here, Amazon SageMaker Ground Truth allowed ML engineers to easily build the human-in-the-loop workflow (step v). The workflow allowed the Amazon Ads team to experiment with different foundation models and configurations through blind A/B testing to ensure that feedback on the generated images is unbiased. After the chosen model is ready to be moved into production, the model is deployed (step vi) using the team’s own in-house Model Lifecycle Manager tool. Under the hood, this tool uses artifacts generated by SageMaker (step vii), which are then deployed into the production AWS account (step viii) using SageMaker SDKs.

Regarding the inference, customers using Amazon Ads now have a new API to receive these generated images. The Amazon API Gateway receives the PUT request (step 1). The request is then processed by AWS Lambda, which uses AWS Step Functions to orchestrate the process (step 2). The product image is fetched from an image repository, which is a part of an existing solution predating this creative feature. The next step is to process customer text prompts and customize the image through content ingestion guardrails. Amazon Comprehend is used to detect undesired context in the text prompt, whereas Amazon Rekognition processes images for content moderation purposes (step 3). If the inputs pass the inspection, then the text continues as a prompt, while the image is processed by removing the background (step 4). Then, the deployed text-to-image model is used for image generation using the prompt and the processed image (step 5). The image is then uploaded into an Amazon Simple Storage Service (Amazon S3) bucket for images and the metadata about the image is stored in an Amazon DynamoDB table (step 6). This whole process starting from step 2 is orchestrated by AWS Step Functions. Finally, the Lambda function receives the image and metadata (step 7), which are then sent to the Amazon Ads client service through the API Gateway (step 8).
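
As an illustration of the guardrails in step 3, the two moderation checks might be composed roughly as follows. The Rekognition call is the service’s standard content-moderation API; the Comprehend sentiment check is a simplified stand-in, since the post does not detail the actual text-ingestion rules:

# Hypothetical sketch of step 3: screen the text prompt with Amazon Comprehend
# and the product image with Amazon Rekognition before generation proceeds.
import boto3

comprehend = boto3.client("comprehend")
rekognition = boto3.client("rekognition")


def passes_guardrails(prompt: str, bucket: str, key: str) -> bool:
    # Text check (simplified stand-in): reject strongly negative prompts.
    sentiment = comprehend.detect_sentiment(Text=prompt, LanguageCode="en")
    if sentiment["Sentiment"] == "NEGATIVE":
        return False

    # Image check: reject images carrying any high-confidence moderation label.
    labels = rekognition.detect_moderation_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=80,
    )
    return not labels["ModerationLabels"]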

Conclusion

This post presented the technical solution for the Amazon Ads generative AI-powered image generation solution, which advertisers can use to create customized brand images without needing a dedicated design team. Advertisers have a series of features to generate and customize images such as writing text prompts, selecting different themes, swapping the featured product, or uploading a new image of the product from their device or asset library allowing them to create impactful images for advertising their products.

The architecture uses modular microservices, with separate components for model development; a model registry; model lifecycle management (an orchestration, Step Functions-based solution that processes advertiser inputs, selects the appropriate model, and tracks the job throughout the service); and a customer-facing API. Amazon SageMaker is at the center of the solution, from JumpStart prototyping through final SageMaker deployment.

If you plan to build your generative AI application on Amazon SageMaker, the fastest way to get started is with SageMaker JumpStart. Watch this presentation to learn how to start your project with JumpStart.


About the Authors

Anita Lacea is the Single-Threaded Leader of generative AI image ads at Amazon, enabling advertisers to create visually stunning ads with the click of a button. Anita pairs her broad expertise across the hardware and software industry with the latest innovations in generative AI to develop performant and cost-optimized solutions for her customers, revolutionizing the way businesses connect with their audiences. She is passionate about traditional visual arts and is an exhibiting printmaker.

Burak Gozluklu is a Principal AI/ML Specialist Solutions Architect located in Boston, MA. He helps strategic customers adopt AWS technologies, and specifically generative AI solutions, to achieve their business objectives. Burak has a PhD in Aerospace Engineering from METU, an MS in Systems Engineering, and completed postdoctoral work in system dynamics at MIT in Cambridge, MA. He remains a research affiliate at MIT. Burak is passionate about yoga and meditation.

Christopher de Beer is a senior software development engineer at Amazon located in Edinburgh, UK. With a background in visual design, he works on creative building products for advertising, focusing on video generation and helping advertisers reach their customers through visual communication. He builds products that automate creative production, using traditional as well as generative techniques, to reduce friction and delight customers. Outside of his work as an engineer, Christopher is passionate about Human-Computer Interaction (HCI) and interface design.

Yashal Shakti Kanungo is an Applied Scientist III at Amazon Ads. His focus is on generative foundation models that take a variety of user inputs and generate text, images, and videos. It’s a blend of research and applied science, constantly pushing the boundaries of what’s possible in generative AI. Over the years, he has researched and deployed a variety of these models in production across the online advertising spectrum, from ad sourcing and click prediction to headline and image generation.

Sravan Sripada is a Senior Applied Scientist at Amazon located in Seattle, WA. His primary focus lies in developing generative AI models that enable advertisers to create engaging ad creatives (images, video, etc.) with minimal effort. Previously, he worked on using machine learning to prevent fraud and abuse on the Amazon store. When not at work, he enjoys outdoor activities and dedicating time to meditation.

Cathy Willcock is a Principal Technical Business Development Manager located in Seattle, WA. Cathy leads the AWS technical account team supporting Amazon Ads’ adoption of AWS cloud technologies. Her team works across Amazon Ads, enabling discovery, testing, design, analysis, and deployment of AWS services at scale, with a particular focus on innovation to shape the landscape across the AdTech and MarTech industry. Cathy has led engineering, product, and marketing teams and is an inventor of ground-to-air calling (1-800-RINGSKY).


Needle-Moving AI Research Trains Surgical Robots in Simulation

A collaboration between NVIDIA and academic researchers is prepping robots for surgery.

ORBIT-Surgical — developed by researchers from the University of Toronto, UC Berkeley, ETH Zurich, Georgia Tech and NVIDIA — is a simulation framework to train robots that could augment the skills of surgical teams while reducing surgeons’ cognitive load.

It supports more than a dozen maneuvers inspired by the training curriculum for laparoscopic procedures, aka minimally invasive surgery, such as grasping small objects like needles, passing them from one arm to another and placing them with high precision.

The physics-based framework was built using NVIDIA Isaac Sim, a robotics simulation platform for designing, training and testing AI-based robots. The researchers trained reinforcement learning and imitation learning algorithms on NVIDIA GPUs and used NVIDIA Omniverse, a platform for developing and deploying advanced 3D applications and pipelines based on Universal Scene Description (OpenUSD), to enable photorealistic rendering.

Using the community-supported da Vinci Research Kit, provided by the Intuitive Foundation, a nonprofit supported by robotic surgery leader Intuitive Surgical, the ORBIT-Surgical research team demonstrated how training a digital twin in simulation transfers to a physical robot in a lab environment.

ORBIT-Surgical will be presented Thursday at ICRA, the IEEE International Conference on Robotics and Automation, taking place this week in Yokohama, Japan. The open-source code package is now available on GitHub.

A Stitch in AI Saves Nine

ORBIT-Surgical is based on Isaac Orbit, a modular framework for robot learning built on Isaac Sim. Orbit includes support for various libraries for reinforcement learning and imitation learning, where AI agents are trained to mimic ground-truth expert examples.

The surgical framework enables developers to train robots like the da Vinci Research Kit robot, or dVRK, to manipulate both rigid and soft objects using reinforcement learning and imitation learning frameworks running on NVIDIA RTX GPUs.

ORBIT-Surgical introduces more than a dozen benchmark tasks for surgical training, including one-handed tasks such as picking up a piece of gauze, inserting a shunt into a blood vessel or lifting a suture needle to a specific position. It also includes two-handed tasks, like handing a needle from one arm to another, passing a threaded needle through a ring pole and reaching two arms to specific positions while avoiding obstacles.

One of ORBIT-Surgical’s benchmark tests is inserting a shunt, shown on the left with a real-world robot and on the right in simulation.

By developing a surgical simulator that takes advantage of GPU acceleration and parallelization, the team boosted robot learning speed by an order of magnitude compared with existing surgical frameworks. They found that the robot digital twin could be trained to complete tasks like inserting a shunt and lifting a suture needle in under two hours on a single NVIDIA RTX GPU.
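To make the parallelization idea concrete, here is a toy sketch, not the Isaac Orbit or ORBIT-Surgical API, of why batched GPU simulation is fast: thousands of environment instances advance in a single tensor operation instead of a per-environment Python loop.

```python
# Toy illustration of batched GPU simulation; the dynamics are made up and
# this is not the ORBIT-Surgical or Isaac Orbit API.
import torch

num_envs = 4096            # thousands of environments step together
obs_dim, act_dim = 32, 8   # arbitrary sizes for illustration
device = "cuda" if torch.cuda.is_available() else "cpu"

obs = torch.zeros(num_envs, obs_dim, device=device)

for step in range(1000):
    # A real agent would compute actions with a policy network here.
    actions = torch.randn(num_envs, act_dim, device=device)

    # One batched tensor update advances all 4,096 environments at once;
    # looping over environments on the CPU is what makes traditional
    # simulators an order of magnitude slower.
    obs = obs + 0.01 * actions.mean(dim=1, keepdim=True)
```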

With the visual realism enabled by rendering in Omniverse, ORBIT-Surgical also allows researchers to generate high-fidelity synthetic data, which could help train AI models for perception tasks such as segmenting surgical tools in real-world videos captured in the operating room.

A proof of concept by the team showed that combining simulation and real data significantly improved the accuracy of an AI model to segment surgical needles from images — helping reduce the need for large, expensive real-world datasets for training such models.

Read the paper behind ORBIT-Surgical, and learn more about NVIDIA-authored papers at ICRA.


How Basecamp Research Helps Catalog Earth’s Biodiversity

Basecamp Research is on a mission to capture the vastness of life on Earth at an unprecedented scale. In this AI Podcast episode recorded live at the NVIDIA GTC global AI conference, Phil Lorenz, CTO at Basecamp Research, discusses with host Noah Kravitz how AI and biodiversity data can advance fields like medicine and environmental conservation. Lorenz explains Basecamp’s systematic collection of biodiversity data in partnership with nature parks worldwide, and its use of deep learning to analyze and apply that data for use cases such as protein structure prediction and gene editing. He also emphasizes the importance of ethical data governance and touches on technological advancements that will help drive the future of AI in biology.

Basecamp Research is a member of the NVIDIA Inception program for cutting-edge startups. 

Stay tuned for more episodes recorded live at GTC, and hear more from Lorenz in this GTC session.

Time Stamps

1:31: What is Basecamp Research?
3:08: How does the process of sequencing biodiversity work?
5:15: What is the collected biodiversity data used for?
7:56: Gene editing and how biodiversity data can enhance it
9:00: How the development of AI has affected Basecamp’s work
14:33: Basecamp’s breakthroughs
16:49: AI and machine learning-related challenges Basecamp has encountered
26:02: Ethical considerations in data collecting

You Might Also Like…

AI2’s Christopher Bretherton Discusses Using Machine Learning for Climate Modeling – Ep. 220

Can machine learning help predict extreme weather events and climate change? Christopher Bretherton, senior director of climate modeling at the Allen Institute for Artificial Intelligence, or AI2, explores the technology’s potential to enhance climate modeling.

Cardiac Clarity: Dr. Keith Channon Talks Revolutionizing Heart Health With AI – Ep. 212

Here’s some news to still beating hearts: AI is helping bring some clarity to cardiology. In this episode of NVIDIA’s AI Podcast, Dr. Keith Channon, cofounder and chief medical officer at the startup Caristo Diagnostics, speaks with host Noah Kravitz about its AI-powered solution for detecting inflammation in cardiac CT scans.

Matice Founder Jessica Whited on Harnessing Regenerative Species for Medical Breakthroughs – Ep. 198

Scientists at Matice Biosciences are using AI to study the regeneration of tissues in animals known as super-regenerators, such as salamanders and planarians. The research aims to develop new treatments that will help humans heal from injuries without scarring.

Bojan Tunguz, Johnny Israeli on How AI and Crowdsourcing Can Advance Vaccine Distribution – Ep. 195

Artificial intelligence is teaming up with crowdsourcing to improve the thermostability of mRNA vaccines, making distribution more accessible worldwide. In this episode of NVIDIA’s AI Podcast, host Noah Kravitz speaks with Bojan Tunguz, a physicist and senior system software engineer at NVIDIA, and Johnny Israeli, senior manager of AI and cloud software at NVIDIA, about AI’s potential in drug discovery.

Subscribe to the AI Podcast

Get the AI Podcast through iTunes, Google Podcasts, Google Play, Amazon Music, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better: Have a few minutes to spare? Fill out this listener survey.


Fire It Up: Mozilla Firefox Adds Support for AI-Powered NVIDIA RTX Video

Editor’s note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and which showcases new hardware, software, tools and accelerations for RTX PC users.

Mozilla Firefox, the popular open-source browser, is the latest partner to incorporate NVIDIA RTX Video, a technology that uses AI to improve video quality on Windows PCs and workstations. The browser’s latest release taps local NVIDIA RTX GPUs to make streaming and video better than ever.

Pixel Perfect

First announced at CES in January 2023, RTX Video is a collection of AI video enhancements that improve the quality of videos played on browsers through platforms like YouTube, Prime Video and Disney+. The technology makes videos streamed on NVIDIA GeForce RTX-powered PCs and RTX-powered workstations appear sharper and more detailed without requiring a higher-resolution source.

RTX Video is made up of two parts. RTX Video Super Resolution upscales low-resolution video for cleaner, crisper imagery. It works by analyzing the lower-resolution video and using deep learning to predict what the higher-resolution version should look like. The algorithm then combines this predicted image with a traditionally upscaled version to reduce or eliminate compression artifacts and sharpen the final output.
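Conceptually, and only as a sketch (NVIDIA has not published the algorithm in this form), the blend described above could be pictured like this, where `model` stands in for a trained super-resolution network and the blend factor is an assumption:

```python
# Conceptual sketch of blending a learned prediction with a traditional
# upscale; not NVIDIA's actual implementation.
import torch
import torch.nn.functional as F

def super_resolve(frame: torch.Tensor, model, blend: float = 0.5) -> torch.Tensor:
    """frame: (1, 3, H, W) low-resolution frame with values in [0, 1]."""
    # Traditional path: bicubic upscaling.
    upscaled = F.interpolate(frame, scale_factor=2, mode="bicubic", align_corners=False)

    # Learned path: `model` is a hypothetical 2x super-resolution network.
    predicted = model(frame)

    # Combine the two to sharpen detail while suppressing compression artifacts.
    return blend * predicted + (1.0 - blend) * upscaled
```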

RTX Video HDR goes one step further: when enabled, it analyzes standard dynamic range (SDR) video content through AI neural networks to add high-dynamic range (HDR) information, improving visibility, details and vibrancy.

Since 90% of online video is 1080p or lower and SDR, enabling RTX Video is like pressing a “remaster” button on most of the content users watch every day.

Pretty Foxy

Mozilla Firefox now supports RTX Video Super Resolution and HDR in its latest stable version (v126). It’s never been easier for users to access AI-enhanced upscaling, de-artifacting and HDR effects for online videos.

“Video is a core pillar of the modern web, and we are committed to delivering a great experience for our users,” said Bobby Holley, chief technology officer of Firefox at Mozilla. “Mozilla is integrating RTX Video into Firefox to improve video quality for our users with compatible RTX GPUs.”

Firefox joins Chromium-based browsers, including Google Chrome and Microsoft Edge, in supporting RTX Video. RTX Video Super Resolution is also supported in popular video players like VLC.

Enabling RTX Video is easy:

  1. Update to the latest GeForce RTX Game Ready Driver, NVIDIA Studio or NVIDIA RTX Enterprise Driver.
  2. Ensure Windows HDR features are enabled by navigating to System > Display > HDR.
  3. Open the NVIDIA Control Panel and navigate to Adjust Video Image Settings > RTX Video Enhancement.
  4. Turn on “Super Resolution” and “High Dynamic Range.”

Note that RTX Video HDR requires an NVIDIA GeForce RTX or RTX professional GPU connected to an HDR10-compatible monitor or TV.

For more information, check out the RTX Video FAQ.

Generative AI is transforming gaming, videoconferencing and interactive experiences of all kinds. Make sense of what’s new and what’s next by subscribing to the AI Decoded newsletter.
