Using natural language in Amazon Q Business: From searching and creating ServiceNow incidents and knowledge articles to generating insights

Many enterprise customers across various industries are looking to adopt generative AI to drive innovation, improve user productivity, and enhance customer experience. Generative AI–powered assistants such as Amazon Q Business can be configured to answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. Amazon Q Business understands natural language and allows users to receive immediate, permissions-aware responses from enterprise data sources with citations. This capability supports various use cases such as IT, HR, and help desk.

With custom plugins for Amazon Q Business, you can enhance the application environment to enable your users to use natural language to perform specific tasks related to third-party applications — such as Jira, Salesforce, and ServiceNow — directly from within their web experience chat.

Enterprises that have adopted ServiceNow can improve their operations and boost user productivity by using Amazon Q Business for various use cases, including incident and knowledge management. Users can search ServiceNow knowledge base (KB) articles and incidents in addition to being able to create, manage, and track incidents and KB articles, all from within their web experience chat.

In this post, we’ll demonstrate how to configure an Amazon Q Business application and add a custom plugin that gives users the ability to use a natural language interface provided by Amazon Q Business to query real-time data and take actions in ServiceNow. By the end of this hands-on session, you should be able to:

  • Create an Amazon Q Business application and integrate it with ServiceNow using a custom plugin.
  • Use natural language in your Amazon Q web experience chat to perform read and write actions in ServiceNow such as querying and creating incidents and KB articles in a secure and governed fashion.

Prerequisites

Before proceeding, make sure that you have the necessary AWS account permissions and services enabled, along with access to a ServiceNow environment with the required privileges for configuration.

AWS

  • An AWS account with permissions to create and manage an Amazon Q Business application, and an AWS IAM Identity Center instance (see Configuring the Amazon Q Business application later in this post).

ServiceNow

  • Obtain a ServiceNow Personal Developer Instance or use a clean ServiceNow developer environment. You will need an account that has admin privileges to perform the configuration steps in ServiceNow.

Solution overview

The following architecture diagram illustrates the workflow for Amazon Q Business web experience with enhanced capabilities to integrate it seamlessly with ServiceNow.

Solution Overview

The implementation includes the following steps:

  1. The solution begins with configuring Amazon Q Business using the AWS Management Console. This includes setting up the application environment, adding users to AWS IAM Identity Center, selecting the appropriate subscription tier, and configuring the web experience for users to interact with. The environment can optionally be configured to provide real-time data retrieval using a native retriever, which pulls information from indexed data sources, such as Amazon Simple Storage Service (Amazon S3), during interactions.
  2. The next step involves adjusting the global controls and response settings for the application environment guardrails to allow Amazon Q Business to use its large language model (LLM) knowledge to generate responses when it cannot find responses from your connected data sources.
  3. Integration with ServiceNow is achieved by setting up an OAuth Inbound application endpoint in ServiceNow, which authenticates and authorizes interactions between Amazon Q Business and ServiceNow. This involves creating an OAuth API endpoint in ServiceNow and using the web experience URL from Amazon Q Business as the callback URL. The setup makes sure that Amazon Q Business can securely perform actions in ServiceNow with the same scoped permissions as the user signing in to ServiceNow.
  4. The final step of the solution involves enhancing the application environment with a custom plugin for ServiceNow using APIs defined in an OpenAPI schema. The plugin allows Amazon Q Business to securely interact with ServiceNow’s REST APIs, enabling operations such as querying, creating, and updating records dynamically and in real time.

Configuring the Amazon Q Business application

To create an Amazon Q Business application, sign in to the Amazon Q Business console.
As a prerequisite to creating an Amazon Q Business application, follow the instructions in the Configuring an IAM Identity Center instance section of the documentation. Amazon Q Business integrates with IAM Identity Center to manage user access to your Amazon Q Business application. This is the recommended method for managing human access to AWS resources and the method used in this post.

Amazon Q Business also supports identity federation through IAM. When you use identity federation, you can manage users with your enterprise identity provider (IdP) and use IAM to authenticate users when they sign in to Amazon Q Business.

Create and configure the Amazon Q Business application:

  1. In the Amazon Q Business console, choose Applications from the navigation pane and then choose Create application.
  2. Enter the following information for your Amazon Q Business application:
    • Application name: Enter a name for quick identification, such as my-demo-application.
    • Service access: Select Create and use a new service-linked role (SLR). A service-linked role is a unique type of IAM role that is linked directly to Amazon Q Business. Service-linked roles are predefined by Amazon Q Business and include the permissions that the service requires to call other AWS services on your behalf.
    • Choose Create.
  3.  After creating your Amazon Q Business application environment, create and select the retriever and provision the index that will power your generative AI web experience. The retriever pulls data from the index in real time during a conversation. On the Select Retriever page:
    • Retrievers: Select Use native retriever.
    • Index provisioning: Select Starter, which is ideal for proof-of-concept or developer workloads. See Index types for more information.
    • Number of units: Enter 1. This indicates the number of capacity units that you want to provision for your index. Each unit holds up to 20,000 documents.
    • Choose Next.

Select Retriever

  4. After you select a retriever for your Amazon Q Business application environment, you can optionally connect other data sources to it. Because a data source isn’t required for this post, we won’t configure one. For more information about connecting data sources to an Amazon Q Business application, see Connecting data sources.
    • Choose Next.
  5. As an account admin, you can add users to your IAM Identity Center instance from the Amazon Q Business console. After you add users or groups to an application environment, you can then choose the Amazon Q Business tier for each user or group. On the Add groups and users page:
    • Choose Add groups and users.
    • In the Add new users dialog box that opens, enter the details of the user. The details you must enter for a single user include: Username, First name, Last name, email address, Confirm email address, and Display name.
    • Choose Next and then Add. The user is automatically added to an IAM Identity Center directory and an email invitation to join Identity Center is sent to the email address provided.
    • After adding a user or group, choose the Amazon Q Business subscription tier for each user or group. From the Current subscription dropdown menu, select Q Business Pro.
    • For the Web experience service access, select Create and use a new service role.
    • Choose Create application.

    Add groups and users

Upon successful completion, Amazon Q Business returns a web experience URL that you can share with the users you added to your application environment. The Web experience URL (in this case: https://xxxxxxxx.chat.qbusiness.us-east-1.on.aws/) will be used when creating an OAuth application endpoint in ServiceNow. Note that your web experience URL will be different from the one shown here.

Application Created

Enhancing an Amazon Q Business application with guardrails

By default, an Amazon Q Business application is configured to respond to user chat queries using only enterprise data. Because we didn’t configure a data source for the purpose of this post, you will use Admin controls and guardrails to allow Amazon Q to use its LLM world knowledge to generate responses when it cannot find responses from your connected data sources.

To update the global controls and response settings for your application environment:

  1. From the Amazon Q Business console, choose Applications in the navigation pane. Select the name of your application from the list of applications.
  2. From the navigation pane, choose Enhancements, and then choose Admin controls and guardrails.
  3. In Global Controls, choose Edit.
  4. In Response settings under Application guardrails, select Allow Amazon Q to fall back to LLM knowledge.

create guardrails

Configuring ServiceNow

To allow Amazon Q Business to connect to your ServiceNow instance, you need to create an OAuth inbound application endpoint. OAuth-based authentication validates the identity of the client that attempts to establish trust with the system by using an authentication protocol. For more information, see OAuth Inbound and Outbound authentication.

Create an OAuth application endpoint for external client applications to access the ServiceNow instance:

  1. In the ServiceNow console, navigate to All, then System OAuth, then Application Registry and then choose New. On the interceptor page, select Create an OAuth API endpoint for external clients and then fill in the form with details for Name and Redirect URL. The other fields are automatically generated by the ServiceNow OAuth server.
    • The Redirect URL is the callback URL that the authorization server redirects to. Enter the web experience URL of your Amazon Q Business application environment (the client requesting access to the resource), appended with oauth/callback.
    • For this example, the URL is: https://xxxxxxxx.chat.qbusiness.us-east-1.on.aws/oauth/callback
  2. For Auth Scope, set the value to useraccount. The scope API response parameter defines the amount of access granted by the access token, which means that the access token has the same rights as the user account that authorized the token. For example, if Abel Tuter authorizes an application by providing login credentials, then the resulting access token grants the token bearer the same access privileges as Abel Tuter.
  3. Choose Submit.

This creates an OAuth client application record and generates a client ID and client secret, which Amazon Q Business needs to access the restricted resources on the instance. You will need this authentication information (client ID and client secret) in the following custom plugin configuration process.

ServiceNow App Registry OAuth
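If you want to verify the new OAuth endpoint before wiring it into Amazon Q Business, the following Python sketch walks through the same authorization-code flow manually. It assumes the requests library is installed; the instance URL, client ID, client secret, and authorization code are placeholders that you need to replace. Amazon Q Business performs this exchange for you once the custom plugin is configured, so this check is purely optional.

import requests
from urllib.parse import urlencode

# Placeholders: your ServiceNow instance URL, the client ID and client secret
# generated by the Application Registry entry, and your web experience URL.
INSTANCE = "https://devxxxxxx.service-now.com"
CLIENT_ID = "your-client-id"
CLIENT_SECRET = "your-client-secret"
REDIRECT_URI = "https://xxxxxxxx.chat.qbusiness.us-east-1.on.aws/oauth/callback"

# Step 1: open the authorization URL in a browser and sign in to ServiceNow.
# (Amazon Q Business does this for the user when they choose Authorize in chat.)
params = {"response_type": "code", "client_id": CLIENT_ID, "redirect_uri": REDIRECT_URI}
print(f"{INSTANCE}/oauth_auth.do?{urlencode(params)}")

# Step 2: exchange the code appended to the callback URL for an access token
# scoped to the signed-in user (the useraccount scope configured earlier).
code = "paste-the-code-from-the-callback-url"
resp = requests.post(
    f"{INSTANCE}/oauth_token.do",
    data={
        "grant_type": "authorization_code",
        "code": code,
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "redirect_uri": REDIRECT_URI,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # access_token, refresh_token, expires_in, scope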

Enhancing the Amazon Q Business application environment with custom plugins for ServiceNow

To integrate with external applications, Amazon Q Business uses APIs, which are configured as part of the custom plugins.

Before creating a custom plugin, you need to create or edit an OpenAPI schema outlining the API operations that you want to enable for your custom plugin. Amazon Q Business uses the configured third-party OpenAPI specifications to dynamically determine which API operations to perform to fulfill a user request. The OpenAPI schema definition therefore has a significant impact on API selection accuracy and might require design optimizations. To maximize accuracy and improve efficiency with an Amazon Q Business custom plugin, follow the best practices for configuring OpenAPI schema definitions.

To configure a custom plugin, you must define at least one and a maximum of eight API operations that can be invoked. To define the API operations, create an OpenAPI schema in JSON or YAML format. You can create OpenAPI schema files and upload them to Amazon S3. Alternatively, you can use the OpenAPI text editor in the console, which will validate your schema.

For this post, a working sample of an OpenAPI schema for ServiceNow is provided in JSON format. Before using it, edit the template and replace YOUR_SERVICENOW_INSTANCE_URL in the servers and securitySchemes sections with the URL of your ServiceNow instance.

You can use the REST API Explorer to browse available APIs, API versions, and methods for each API. The explorer enables you to test REST API requests straight from the user interface. The Table API provides endpoints that allow you to perform create, read, update, and delete (CRUD) operations on existing tables. The calling user must have sufficient roles to access the data in the table specified in the request. For additional information on assigning roles, see Managing roles.
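As a quick sanity check before configuring the plugin, you can call the Table API directly with the same query parameters (sysparm_query, sysparm_fields, and sysparm_limit) that the plugin's GET operation uses. The following sketch assumes the requests library and uses basic authentication with placeholder credentials for a developer instance; it is for testing only and is not part of the plugin configuration.

import requests

# Placeholders: your instance URL and a ServiceNow user with access to the table.
INSTANCE = "https://devxxxxxx.service-now.com"
AUTH = ("admin", "your-password")  # basic auth is sufficient for a quick test

# Query the incident table the same way the custom plugin's GET operation does:
# an encoded query plus a trimmed field list and a row limit.
resp = requests.get(
    f"{INSTANCE}/api/now/table/incident",
    params={
        "sysparm_query": "short_descriptionLIKElogin",
        "sysparm_fields": "sys_id,number,short_description",
        "sysparm_limit": "5",
    },
    auth=AUTH,
    headers={"Accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()
for record in resp.json()["result"]:
    print(record["number"], "-", record["short_description"])

The sample OpenAPI schema referenced earlier follows.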

{
  "openapi": "3.0.1",
  "info": {
    "title": "Table API",
    "description": "Allows you to perform create, read, update and delete (CRUD) operations on existing tables",
    "version": "latest"
  },
  "externalDocs": {
    "url": "https://docs.servicenow.com/?context=CSHelp:REST-Table-API"
  },
  "servers": [
    {
      "url": "YOUR_SERVICENOW_INSTANCE_URL"
    }
  ],
  "paths": {
    "/api/now/table/{tableName}": {
      "get": {
        "description": "Retrieve records from a table",
        "parameters": [
          {
            "name": "tableName",
            "in": "path",
            "description": "Table Name",
            "required": true,
            "schema": {
              "type": "string"
            }
          },
          {
            "name": "sysparm_query",
            "in": "query",
            "description": "An encoded query string used to filter the results like Incidents Numbers or Knowledge Base IDs etc",
            "required": true,
            "schema": {
              "type": "string"
            }
          },
          {
            "name": "sysparm_fields",
            "in": "query",
            "description": "A comma-separated list of fields to return in the response",
            "required": false,
            "schema": {
              "type": "string"
            }
          },
          {
            "name": "sysparm_limit",
            "in": "query",
            "description": "The maximum number of results returned per page",
            "required": false,
            "schema": {
              "type": "string"
            }
          }
        ],
        "responses": {
          "200": {
            "description": "ok",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/incident"
                }
              }
            }
          }
        }
      },
      "post": {
        "description": "Create a record",
        "parameters": [
          {
            "name": "tableName",
            "in": "path",
            "description": "Table Name",
            "required": true,
            "schema": {
              "type": "string"
            }
          }
        ],
        "requestBody": {
          "content": {
            "application/json": {
              "schema": {
                "type": "object",
                "properties": {
                  "short_description": {
                    "type": "string",
                    "description": "Short Description"
                  },
                  "description": {
                    "type": "string",
                    "description": "Full Description for Incidents only"
                  },
                  "caller_id": {
                    "type": "string",
                    "description": "Caller Email"
                  },
                  "state": {
                    "type": "string",
                    "description": "State of the incident",
                    "enum": [
                      "new",
                      "in_progress",
                      "resolved",
                      "closed"
                    ]
                  },
                  "text": {
                    "type": "string",
                    "description": "Article Body Text for Knowledge Bases Only (KB)"
                  }
                },
                "required": [
                  "short_description",
                  "caller_id"
                ]
              }
            }
          },
          "required": true
        },
        "responses": {
          "200": {
            "description": "ok",
            "content": {
              "application/json": {}
            }
          }
        }
      }
    },
    "/api/now/table/{tableName}/{sys_id}": {
      "get": {
        "description": "Retrieve a record",
        "parameters": [
          {
            "name": "tableName",
            "in": "path",
            "description": "Table Name",
            "required": true,
            "schema": {
              "type": "string"
            }
          },
          {
            "name": "sys_id",
            "in": "path",
            "description": "Sys ID",
            "required": true,
            "schema": {
              "type": "string"
            }
          },
          {
            "name": "sysparm_fields",
            "in": "query",
            "description": "A comma-separated list of fields to return in the response",
            "required": false,
            "schema": {
              "type": "string"
            }
          }
        ],
        "responses": {
          "200": {
            "description": "ok",
            "content": {
              "application/json": {},
              "application/xml": {},
              "text/xml": {}
            }
          }
        }
      },
      "delete": {
        "description": "Delete a record",
        "parameters": [
          {
            "name": "tableName",
            "in": "path",
            "description": "Table Name",
            "required": true,
            "schema": {
              "type": "string"
            }
          },
          {
            "name": "sys_id",
            "in": "path",
            "description": "Sys ID",
            "required": true,
            "schema": {
              "type": "string"
            }
          }
        ],
        "responses": {
          "200": {
            "description": "ok",
            "content": {
              "application/json": {},
              "application/xml": {},
              "text/xml": {}
            }
          }
        }
      },
      "patch": {
        "description": "Update or modify a record",
        "parameters": [
          {
            "name": "tableName",
            "in": "path",
            "description": "Table Name",
            "required": true,
            "schema": {
              "type": "string"
            }
          },
          {
            "name": "sys_id",
            "in": "path",
            "description": "Sys ID",
            "required": true,
            "schema": {
              "type": "string"
            }
          }
        ],
        "requestBody": {
          "content": {
            "application/json": {
              "schema": {
                "type": "object",
                "properties": {
                  "short_description": {
                    "type": "string",
                    "description": "Short Description"
                  },
                  "description": {
                    "type": "string",
                    "description": "Full Description for Incidents only"
                  },
                  "caller_id": {
                    "type": "string",
                    "description": "Caller Email"
                  },
                  "state": {
                    "type": "string",
                    "description": "State of the incident",
                    "enum": [
                      "new",
                      "in_progress",
                      "resolved",
                      "closed"
                    ]
                  },
                  "text": {
                    "type": "string",
                    "description": "Article Body Text for Knowledge Bases Only (KB)"
                  }
                },
                "required": [
                  "short_description",
                  "caller_id"
                ]
              }
            }
          },
          "required": true
        },
        "responses": {
          "200": {
            "description": "ok",
            "content": {
              "application/json": {},
              "application/xml": {},
              "text/xml": {}
            }
          }
        }
      }
    }
  },
  "components": {
    "schemas": {
      "incident": {
        "type": "object",
        "properties": {
          "sys_id": {
            "type": "string",
            "description": "Unique identifier for the incident"
          },
          "number": {
            "type": "string",
            "description": "Incident number"
          },
          "short_description": {
            "type": "string",
            "description": "Brief description of the incident"
          }
        }
      }
    },
    "securitySchemes": {
      "oauth2": {
        "type": "oauth2",
        "flows": {
          "authorizationCode": {
            "authorizationUrl": "YOUR_SERVICENOW_INSTANCE_URL/oauth_auth.do",
            "tokenUrl": "YOUR_SERVICENOW_INSTANCE_URL/oauth_token.do",
            "scopes": {
            "useraccount": "Access equivalent to the user's account"
            }
          }
        }
      }
    }
  },
  "security": [
    {
      "oauth2": [
        "useraccount"
      ]
    }
  ]
}
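If you prefer to script the placeholder substitution instead of editing the file by hand, the following minimal sketch does it with the Python standard library. The file names are hypothetical; adjust them to wherever you saved the template.

import json
import pathlib

# Hypothetical file names - adjust to wherever you saved the template.
template_path = pathlib.Path("servicenow_openapi.json")
output_path = pathlib.Path("servicenow_openapi_filled.json")

instance_url = "https://devxxxxxx.service-now.com"

# Replace every occurrence of the placeholder with your instance URL,
# then re-parse to confirm the result is still valid JSON.
filled = template_path.read_text().replace("YOUR_SERVICENOW_INSTANCE_URL", instance_url)
json.loads(filled)  # raises ValueError if the substitution broke the document
output_path.write_text(filled)
print(f"Wrote {output_path}")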

The URL for the ServiceNow instance used in this post is https://devxxxxxx.service-now.com/. After updating the template with the URL for this specific instance, the relevant sections of the JSON should look like the following:

  "servers": [
    {
      "url": "https://devxxxxxx.service-now.com/"
    }
  ],
  ...
    "securitySchemes": {
      "oauth2": {
        "type": "oauth2",
        "flows": {
          "authorizationCode": {
            "authorizationUrl": "https://devxxxxxx.service-now.com/oauth_auth.do",
            "tokenUrl": "https://devxxxxxx.service-now.com/oauth_token.do",
            "scopes": {
              "useraccount": "Access equivalent to the user's account"
            }
          }
        }
      }
    }

To create a custom plugin for ServiceNow:

    1. Sign in to the Amazon Q Business console.
    2. Choose Applications in the navigation pane, and then select your application from the list of applications.
    3. In the navigation pane, choose Enhancements, and then choose Plugins.
    4. In Plugins, choose Add plugin.
    5. In Add plugins, choose Custom plugin.
      Create Custom Plugin
    6. In Custom plugin, enter the following information:
      • In Name and description, for Plugin name: Enter a name for your Amazon Q plugin.
      • In API schema, for API schema source, select Define with in-line OpenAPI schema editor.
      • Select JSON as the format for the schema.
      • Remove any sample schema that appears in the inline OpenAPI schema editor and replace it with the text from the provided sample JSON template, updated with your ServiceNow instance URL.

      Enter Custom Plugin Details

    7. In Authentication: Select Authentication required.
    8. For AWS Secrets Manager secret, choose Create and add a new secret. You need to store the ServiceNow OAuth credentials (client ID and client secret) in a Secrets Manager secret to connect your third-party application to Amazon Q Business; a boto3 sketch of creating an equivalent secret follows the screenshot below. In the window that opens, enter the following details:
      • Secret name: A name for your Secrets Manager secret.
      • Client ID: The Client ID from ServiceNow OAuth configuration in the previous section.
      • Client secret: The Client Secret from ServiceNow OAuth configuration in the previous section.
      • OAuth callback URL: The URL the user needs to be redirected to after authentication. This will be your web experience URL. For this example, it’s: https://xxxxxxxx.chat.qbusiness.us-east-1.on.aws/oauth/callback. Amazon Q Business will handle OAuth tokens in this URL.

Create AWS Secrets Manager secret
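If you would rather script the secret instead of creating it from the console dialog, the following boto3 sketch shows one way to do it. The secret name and the JSON key names (client_id, client_secret, redirect_uri) are illustrative assumptions; inspect a secret created from the console to confirm the exact format Amazon Q Business expects.

import json
import boto3

# Match the Region of your Amazon Q Business application.
secrets = boto3.client("secretsmanager", region_name="us-east-1")

# Illustrative key names - confirm against a console-created secret before
# relying on this format in automation.
secret_value = {
    "client_id": "your-servicenow-client-id",
    "client_secret": "your-servicenow-client-secret",
    "redirect_uri": "https://xxxxxxxx.chat.qbusiness.us-east-1.on.aws/oauth/callback",
}

response = secrets.create_secret(
    Name="qbusiness-servicenow-plugin-oauth",  # hypothetical secret name
    Description="OAuth client credentials for the ServiceNow custom plugin",
    SecretString=json.dumps(secret_value),
)
print(response["ARN"])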

    9. In Choose a method to authorize Amazon Q Business, select Create and add a new service role. The console will generate a service role name. To connect Amazon Q Business to third-party applications that require authentication, you need to give the Amazon Q Business role permissions to access your Secrets Manager secret. This enables the custom plugin to access the credentials needed to sign in to the third-party service.
    Custom Plugin Authentication
    10. Choose Add plugin to add your plugin.

Upon successful completion, the plugin will appear under Plugins with a Build status of Ready and a Plugin status of Active.
Custom Plugin Active

Using Amazon Q Business web experience chat to take actions in ServiceNow

Users can launch your Amazon Q Business web experience in two ways:

  • AWS access portal URL provided in an invitation email sent to the user to join AWS IAM Identity Center.
  • Web experience URL shared by the admin.

Navigate to the deployed web experience URL and sign in with your AWS IAM Identity Center credentials.
After signing in, choose the New conversation icon in the left-hand menu to start a conversation.

Example: Search KB articles in ServiceNow for a user issue and create an incident

The following chat conversation example illustrates a typical use case of Amazon Q Business integrated with custom plugins for ServiceNow. These features allow you to perform a wide range of tasks tailored to your organization’s needs.

In this example, we initiate a conversation in the web experience chat to search for KB articles related to "login issues" in ServiceNow by invoking a plugin action. After the user submits a prompt, Amazon Q Business queries ServiceNow through the appropriate API to retrieve the results and provides a response with related KB articles. We then ask Amazon Q Business for more details to see whether any of the KB articles directly addresses the user's issue. When no relevant KB articles are found, we ask Amazon Q Business to summarize the conversation and create a new incident in ServiceNow, making sure the issue is logged for resolution.

User prompt 1 – I am having issues logging in to the intranet and want to know if there are any ServiceNow KB articles on log-in issues. Perform the search on both Short Description and Text field using LIKE operator

Before submitting the preceding prompt, choose the vertical ellipsis to open Conversation settings, and then choose Use a Plugin to select the corresponding custom plugin for ServiceNow.
Web Experience Chat conversation with Amazon Q Business with Custom Plugin
If this is the first time a user is accessing the custom plugin or if their past sign-in has expired, the user will need to authenticate. After authenticating successfully, Amazon Q Business will perform the requested task.

Choose Authorize.
Amazon Q Business Authorization for ServiceNow Interaction

If the user isn’t already signed in to ServiceNow, they will be prompted to enter their credentials. For this example, the user signing in to ServiceNow is the admin user, and API actions performed in ServiceNow by Amazon Q Business on the user’s behalf have the same level of access as that user within ServiceNow.
ServiceNow Login

Choose Allow for Amazon Q Business to connect to ServiceNow and perform the requested task on your behalf.

Allow Access to Amazon Q Business

After verifying that the user is authorized, Amazon Q Business executes the request and responds with the information that it retrieved. We then retrieve additional details with the following prompt.

User prompt 2 – Can you list the KB number and short description in a tabular form?

Conversation with Amazon Q Business to search for KB articles in ServiceNow
Because no KB articles related to the user’s issue were found, we ask Amazon Q Business to summarize the conversation context and create an incident with the following prompt.

User prompt 3 – The error I get is "Unable to Login After System Upgrade". Summarize my issue and create an incident with detailed description and add a note that this needs to be resolved asap.

In response to your prompt for an action, Amazon Q displays a review form where you can modify or fill in the necessary information.

To complete the action, choose Submit.

Note: The caller_id value entered in the following example is a valid ServiceNow user.

Amazon Q Business Create Service Now Incident
Your web experience will display a success message if the action succeeds, or an error message if the action fails. In this case, the action succeeded and Amazon Q Business responded accordingly.

Amazon Q Business - Success message after incident Creation

The following screenshot shows that the incident was created successfully in ServiceNow.

Shows ServiceNow Incident Created from Amazon Q Business
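For reference, the incident that Amazon Q Business creates here maps to the POST /api/now/table/{tableName} operation defined in the OpenAPI schema. The following sketch shows the equivalent direct call with the same fields (short_description, description, caller_id); the instance URL, credentials, and caller email are placeholders.

import requests

INSTANCE = "https://devxxxxxx.service-now.com"
AUTH = ("admin", "your-password")  # or reuse the OAuth access token from earlier

payload = {
    "short_description": "Unable to Login After System Upgrade",
    "description": "User cannot sign in to the intranet after the system upgrade. Needs to be resolved asap.",
    "caller_id": "abel.tuter@example.com",  # must resolve to a valid ServiceNow user
}

# Create the incident through the Table API, as the plugin's POST operation does.
resp = requests.post(
    f"{INSTANCE}/api/now/table/incident",
    json=payload,
    auth=AUTH,
    headers={"Accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["result"]["number"])  # the new incident number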

Troubleshooting common errors

To have a seamless experience with third-party application integrations, it’s essential to thoroughly test the integration and to identify and troubleshoot any unexpected behavior.

A common error encountered in Amazon Q Business is API Response too large, which occurs when an API response size exceeds the current limit of 100 KB. While prompting techniques are essential for obtaining accurate and relevant answers, optimizing API responses to include only the necessary and relevant data is crucial for better response times and enhanced user experience.
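A quick way to check whether a given query stays under that limit is to measure the raw response size and compare a full payload against one trimmed with sysparm_fields and sysparm_limit. The following sketch assumes the requests library and placeholder credentials for a developer instance.

import requests

INSTANCE = "https://devxxxxxx.service-now.com"
AUTH = ("admin", "your-password")

def response_size(params):
    # Return the raw size in bytes of a Table API response for the given parameters.
    resp = requests.get(
        f"{INSTANCE}/api/now/table/incident",
        params=params,
        auth=AUTH,
        headers={"Accept": "application/json"},
        timeout=30,
    )
    resp.raise_for_status()
    return len(resp.content)

# Compare full records against a trimmed field list with a row limit.
full = response_size({"sysparm_query": "active=true"})
trimmed = response_size({
    "sysparm_query": "active=true",
    "sysparm_fields": "number,short_description",
    "sysparm_limit": "25",
})
print(f"full: {full} bytes, trimmed: {trimmed} bytes (plugin limit is 100 KB)")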

The REST API Explorer (shown in the following figure) in ServiceNow is a tool that allows developers and administrators to interact with and test the ServiceNow REST APIs directly from within the ServiceNow environment. It provides a user-friendly interface for making API requests, viewing responses, and understanding the available endpoints and data structures. Using this tool simplifies the process of testing and integrating with ServiceNow.
Rest API Explorer in ServiceNow

Clean up

To clean up AWS configurations, sign in to the Amazon Q Business console.

  1. From the Amazon Q Business console, in Applications, select the application that you want to delete.
  2. Choose Actions and select Delete.
  3. To confirm deletion, enter Delete.

This will take a few minutes to finish. When completed, the application and the configured custom plugin will be deleted.
Delete Amazon Q Business App

When you delete the Amazon Q Business application, the users created as part of the configuration are not automatically deleted from IAM Identity Center. Use the instructions in Delete users in IAM Identity Center to delete the users created for this post.

To clean up in ServiceNow, release the Personal Developer Instance provisioned for this post by following the instructions in the ServiceNow Documentation.

Conclusion

The integration of generative AI-powered assistants such as Amazon Q Business with enterprise systems such as ServiceNow offers significant benefits for organizations. By using natural language processing capabilities, enterprises can streamline operations, enhance user productivity, and deliver better customer experiences. The ability to query real-time data and create incidents and knowledge articles through a secure and governed chat interface transforms how users interact with enterprise data and applications. As demonstrated in this post, enhancing Amazon Q Business to integrate with ServiceNow using custom plugins empowers users to perform complex tasks effortlessly, driving efficiency across various business functions. Adopting this technology not only modernizes workflows, but also positions enterprises at the forefront of innovation.


About the Author

Siddhartha Angara is a Senior Solutions Architect at Amazon Web Services. He helps enterprise customers design and build well-architected solutions in the cloud, accelerate cloud adoption, and build Machine Learning and Generative AI applications. He enjoys playing the guitar, reading and family time!

Read More

NVIDIA Awards up to $60,000 Research Fellowships to PhD Students

For more than two decades, the NVIDIA Graduate Fellowship Program has supported graduate students doing outstanding work relevant to NVIDIA technologies. Today, the program announced the latest awards of up to $60,000 each to 10 Ph.D. students involved in research that spans all areas of computing innovation.

Selected from a highly competitive applicant pool, the awardees will participate in a summer internship preceding the fellowship year. Their work puts them at the forefront of accelerated computing — tackling projects in autonomous systems, computer architecture, computer graphics, deep learning, programming systems, robotics and security.

The NVIDIA Graduate Fellowship Program is open to applicants worldwide.

The 2025-2026 fellowship recipients are:

  • Anish Saxena, Georgia Institute of Technology — Rethinking data movement across the stack — spanning large language model architectures, system software and memory systems — to improve the efficiency of LLM training and inference.
  • Jiawei Yang, University of Southern California — Creating scalable, generalizable foundation models for autonomous systems through self-supervised learning, leveraging neural reconstruction to capture detailed environmental geometry and dynamic scene behaviors, and enhancing adaptability in robotics, digital twin technologies and autonomous driving.
  • Jiayi (Eris) Zhang, Stanford University — Developing intelligent algorithms, models and tools for enhancing user creativity and productivity in design, animation and simulation.
  • Ruisi Cai, University of Texas at Austin — Working on efficient training and inference for large foundation models as well as AI security and privacy.
  • Seul Lee, Korea Advanced Institute of Science and Technology — Developing generative models for molecules and exploration strategies in chemical space for drug discovery applications.
  • Sreyan Ghosh, University of Maryland, College Park — Advancing audio processing and reasoning by designing resource-efficient models and training techniques, improving audio representation learning and enhancing audio perception for AI systems.
  • Tairan He, Carnegie Mellon University — Researching the development of humanoid robots, with a focus on advancing whole-body loco-manipulation through large-scale simulation-to-real learning.
  • Xiaogeng Liu, University of Wisconsin–Madison — Developing robust and trustworthy AI systems, with an emphasis on evaluating and enhancing machine learning models to ensure consistent performance and resilience against diverse attacks and unforeseen inputs.
  • Yunze Man, University of Illinois Urbana-Champaign — Developing vision-centric reasoning models for multimodal and embodied AI agents, with a focus on object-centric perception systems in dynamic scenes, vision foundation models for open-world scene understanding and generation, and large multimodal models for embodied reasoning and robotics planning.
  • Zhiqiang Xie, Stanford University — Building infrastructures to enable more efficient, scalable and complex compound AI systems while enhancing the observability and reliability of such systems.

We also acknowledge the 2025-2026 fellowship finalists:

  • Bo Zhao, University of California, San Diego
  • Chenning Li, Massachusetts Institute of Technology
  • Dacheng Li, University of California, Berkeley
  • Jiankai Sun, Stanford University
  • Wenlong Huang, Stanford University

Read More

docTR joins PyTorch Ecosystem: From Pixels to Data, Building a Recognition Pipeline with PyTorch and docTR

docTR logo

We’re thrilled to announce that the docTR project has been integrated into the PyTorch ecosystem! This integration ensures that docTR aligns with PyTorch’s standards and practices, giving developers a reliable, community-backed solution for powerful OCR workflows.

For more information on what it means to be a PyTorch ecosystem project, see the PyTorch Ecosystem Tools page.

About docTR

docTR is an Apache 2.0 project developed and distributed by Mindee to help developers integrate OCR capabilities into applications with no prior knowledge required.

To quickly and efficiently extract text information, docTR uses a two-stage approach:

  • First, it performs text detection to localize words.
  • Then, it conducts text recognition to identify all characters in a word.

Detection and recognition are performed by state-of-the-art models written in PyTorch. To learn more about this approach, you can refer to the docTR documentation.

docTR enhances the user experience in PyTorch projects by providing high-performance OCR capabilities right out of the box. Its specially designed models require minimal to no fine-tuning for common use cases, allowing developers to quickly integrate advanced document analysis features.

Local installation

docTR requires Python >= 3.10 and supports Windows, macOS, and Linux. Please refer to the README for the dependencies needed for MacBooks with the M1 chip.

pip3 install -U pip
pip3 install "python-doctr[torch,viz]"

This will install docTR along with the latest version of PyTorch.

Note: docTR also provides Docker images for easy deployment, for example as part of a Kubernetes cluster.

Text recognition

Now, let’s try docTR’s OCR recognition on this sample:

OCR sample

The OCR recognition model expects an image with only one word on it and will output the predicted word with a confidence score. You can use the following snippet to test OCR capabilities from docTR:

from doctr.io import DocumentFile
from doctr.models import recognition_predictor

doc = DocumentFile.from_images("/path/to/image")

# Load the OCR model
# This will download pre-trained models hosted by Mindee
model = recognition_predictor(pretrained=True)

result = model(doc)
print(result)

Here, the most important line of code is model = recognition_predictor(pretrained=True). This will load a default text recognition model, crnn_vgg16_bn, but you can select other models through the arch parameter. You can check out the available architectures.

When run on the sample, the recognition predictor retrieves the following data: [('MAGAZINE', 0.9872216582298279)]
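For example, to try a different recognition architecture, pass its name through the arch parameter mentioned above. The snippet below uses crnn_mobilenet_v3_small, one of the recognition architectures listed in the docTR model zoo; swap in any other supported name.

from doctr.io import DocumentFile
from doctr.models import recognition_predictor

doc = DocumentFile.from_images("/path/to/image")

# Swap in a lighter recognition architecture via the arch parameter.
model = recognition_predictor(arch="crnn_mobilenet_v3_small", pretrained=True)
print(model(doc))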

Note: The DocumentFile object in docTR provides an easy way to manipulate PDFs or images.

Text detection

The last example was a crop on a single word. Now, what about an image with several words on it, like this one?

photo of magazines

A text detection model is used before the text recognition to output a segmentation map representing the location of the text. The text recognition is then applied to every detected patch.

Below is a snippet to run only the detection part:

from doctr.io import DocumentFile
from doctr.models import detection_predictor
from matplotlib import pyplot as plt
from doctr.utils.geometry import detach_scores
from doctr.utils.visualization import draw_boxes

doc = DocumentFile.from_images("path/to/my/file")
model = detection_predictor(pretrained=True)

result = model(doc)

draw_boxes(detach_scores([result[0]["words"]])[0][0], doc[0])
plt.axis('off')
plt.show()

Running it on the full sample yields the following:

photo of magazines

Similarly to the text recognition, detection_predictor will load a default model (fast_base here). You can also load another one by providing it through the arch parameter.

The full implementation

Now, let’s plug both components into the same pipeline.

Conveniently, docTR provides a wrapper that does exactly that for us:

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

doc = DocumentFile.from_images("/path/to/image")

model = ocr_predictor(pretrained=True, assume_straight_pages=False)

result = model(doc)
result.show()

photo of magazines

The last line should display a matplotlib window which shows the detected patches. Hovering the mouse over them will display their contents.

You can also do more with this output, such as reconstituting a synthetic document like so:

import matplotlib.pyplot as plt

synthetic_pages = result.synthesize()
plt.imshow(synthetic_pages[0])
plt.axis('off')
plt.show()

black text on white

The pipeline is highly customizable: you can modify the detection or recognition model behavior by passing arguments to ocr_predictor. Please refer to the documentation to learn more.
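For example, a sketch that picks the detection and recognition architectures explicitly (the defaults mentioned earlier in this post) and exports the result as a structured dictionary might look like this:

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

doc = DocumentFile.from_images("/path/to/image")

# Choose the detection and recognition architectures explicitly and keep the
# rotated-page handling used earlier in this post.
model = ocr_predictor(
    det_arch="fast_base",
    reco_arch="crnn_vgg16_bn",
    pretrained=True,
    assume_straight_pages=False,
)

result = model(doc)
json_output = result.export()  # nested dict: pages -> blocks -> lines -> words
print(list(json_output.keys()))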

Conclusion

We’re excited to welcome docTR into the PyTorch Ecosystem, where it seamlessly integrates with PyTorch pipelines to deliver state-of-the-art OCR capabilities right out of the box.

By empowering developers to quickly extract text from images or PDFs using familiar tooling, docTR simplifies complex document analysis tasks and enhances the overall PyTorch experience.

We invite you to explore the docTR GitHub repository, join the docTR community on Slack, and reach out at contact@mindee.com for inquiries or collaboration opportunities.

Together, we can continue to push the boundaries of document understanding and develop even more powerful, accessible tools for everyone in the PyTorch community.

Read More

Accelerating LLM Inference on NVIDIA GPUs with ReDrafter

Accelerating LLM inference is an important ML research problem, as auto-regressive token generation is computationally expensive and relatively slow, and improving inference efficiency can reduce latency for users. In addition to ongoing efforts to accelerate inference on Apple silicon, we have recently made significant progress in accelerating LLM inference for the NVIDIA GPUs widely used for production applications across the industry.
Earlier this year, we published and open sourced Recurrent Drafter (ReDrafter), a novel approach to speculative decoding that achieves state of the art…
Apple Machine Learning Research

NeurIPS 2024: The co-evolution of AI and systems with Lidong Zhou

Illustrated headshots of Lidong Zhou and Eliza Strickland

The Microsoft Research Podcast offers its audience a unique view into the technical advances being pursued at Microsoft through the insights and personal experiences of the people committed to those pursuits.

Just after his keynote at the 38th annual Conference on Neural Information Processing Systems (NeurIPS), Microsoft Corporate Vice President Lidong Zhou joins guest host Eliza Strickland of IEEE Spectrum at the conference to further explore the topic of his talk: the co-evolution of systems and AI. Zhou, who is also chief scientist of the Microsoft Asia-Pacific Research and Development Group and managing director of Microsoft Research Asia, discusses how rapidly advancing AI impacts the systems supporting it; AI as a tool for improving systems engineering itself; and how budding computer scientists can prepare for innovating in a world where AI and systems grow together.

Learn more: 

Verus: A Practical Foundation for Systems Verification
Publication, November 2024

SuperBench: Improving Cloud AI Infrastructure Reliability with Proactive Validation
Publication, July 2024

BitNet: Scaling 1-bit Transformers for Large Language Models
Publication, October 2023

Transcript

[MUSIC]

ELIZA STRICKLAND: Welcome to the Microsoft Research Podcast, where Microsoft’s leading researchers bring you to the cutting edge. This series of conversations showcases the technical advances being pursued at Microsoft through the insights and experiences of the people driving them.

I’m Eliza Strickland, a senior editor at IEEE Spectrum and your guest host for a special edition of the podcast.

[MUSIC FADES]

Joining me today in the Microsoft Booth at the 38th annual Conference on Neural Information Processing Systems, or NeurIPS, is Lidong Zhou. Lidong is a Microsoft corporate vice president, chief scientist of the Microsoft Asia-Pacific Research and Development Group, and managing director of Microsoft Research Asia. Earlier today, Lidong gave a keynote here at NeurIPS on the co-evolution of AI and systems engineering.

Lidong, welcome to the podcast.


LIDONG ZHOU: Thank you, Eliza. It’s such a pleasure to be here.

STRICKLAND: You said in your keynote that progress in AI is now outpacing progress in the systems supporting AI. Can you give me some concrete examples of where the current infrastructure is struggling to keep up?

ZHOU: Yeah. So actually, we have been working on supporting AI from the infrastructure perspective, and I can say, you know, there are at least three dimensions where it’s actually posing a lot of challenges. One dimension is that the scale of the AI systems that we have to support. You know, you heard about the scaling law in AI and, you know, demanding even higher scale every so often. And when we scale, as I mentioned in the talk this morning, every time you scale the system, you actually have to rethink how to design a system, develop a new methodology, revisit all the assumptions. And it becomes very challenging for the community to keep up. And the other dimension is if you look at AI systems, it’s actually a whole-stack kind of design. You have to understand not only the AI workloads, the model architecture, but also the software and also the underlying hardware. And you have to make sure they are all aligned to deliver the best performance. And the third dimension is the temporal dimension, where you really see accelerated growth and the pace of innovation in AI and not actually only in AI but also in the underlying hardware. And that puts a lot of pressure on how fast we innovate on the systems side because we really have to keep up in that dimension, as well. So all those three dimensions add up. It’s becoming a pretty challenging task for the whole systems community.

STRICKLAND: I like how in your talk you proposed a marriage between systems engineering and AI. What does this look like in practice, and how might it change the way we approach both fields?

ZHOU: Yeah, so I’m actually a big fan of systems community and AI community work together to tackle some of the most challenging problems. Of course, you know, we have been working on systems that support AI. But now increasingly, we’re seeing opportunities where AI can actually help developers to become more productive and develop systems that are better in many dimensions in terms of efficiency, in terms of reliability, in terms of trustworthiness. So I really want to see the two communities work together even more closely going forward. You know, I talk about, sort of, the three pillars, right—the efficiency; there’s trust; there’s also the infusion of the two (AI and systems engineering)—that are three ambitions that we are actually working on. And we see very encouraging early results that makes us believe that there’s much more to be achieved going forward with the two communities working together.

STRICKLAND: You mentioned the challenging of scaling. I think everyone at NeurIPS is talking about scaling. And you’ve highlighted efficiency as a key opportunity for improvement in AI. What kind of breakthroughs in systems engineering or new ideas in systems engineering could help AI achieve greater efficiencies?

ZHOU: Yeah, that’s another great question. I think there are a couple of aspects to efficiency. So this morning, I talked about some of the innovations in model architecture. So our researchers have been looking into BitNet, which is essentially try to use one bit or, actually, using a ternary representation for the weights in all those AI models rather than using FP16 and so on. And that potentially creates a lot of opportunities for efficiency and energy gains. But that cannot be done without rethinking about the software and even the hardware stack so that, you know, those innovations that you have in the model architecture can actually have the end-to-end benefits. And that’s, you know, one of the dimensions where we see the coinnovation of AI and underlying system to deliver some efficiency gains for AI models, for example. But there’s another dimension, which I think is also very important. With all the AI infrastructure that we build to support AI, there’s actually a huge room for improvement, as well. And this is where AI can actually be utilized to solve some of the very challenging systems problems, for optimization, for reliability, for trustworthiness. And I use some of the examples in my talk, but this is a very early stage. I think the potential is much larger going forward.

STRICKLAND: Yeah. It’s interesting to think about how GPUs and large language models are so intertwined at this point. You can’t really have one without the other. And you said in your talk you sort of see the need to decouple the architectures and the hardware. Is that right?

ZHOU: Yes. Yeah, so this is always, you know, like very system type of thinking where, you know, you really want to decouple some of the elements so that they can evolve and innovate independently. And this gives more opportunities, you know, larger design space, for each field. And what we are observing now, which is actually very typical in relatively mature fields, where we have GPUs that are dominating in the hardware land and all the model architecture has to be designed and, you know, proving very efficient on GPUs. And that limits the design space for model architecture. And similarly, you know, if you look at hardware, it’s very hard for hardware innovations to happen because now you have to show that those hardwares are actually great for all the models that have been actually optimized for GPUs. So I think, you know, from a systems perspective, it’s actually possible if you design the right abstraction between the AI and the hardware, it’s possible for this two domains to actually evolve separately and have a much larger design space, actually, to find the best solution for both.

STRICKLAND: And when you think about systems engineering, are there ways that AI can be used to optimize your own work?

ZHOU: Yes, I think there are. Two examples that I gave this morning, one is, you know, in systems there’s this what we call a holy grail of system research because we want to build trustworthy systems that people can depend on. And one of the approach is called verified systems. And this has been a very active research area in systems because there are a lot of advancements in formal methods in how we can infuse the formal method into building real systems. But it’s still very hard for the general system community because, you know, you really have to understand how formal methods works and so on. And so it’s still not within reach. You know, like when we build mission-critical systems, we want to be completely verified so, you know, you don’t have to do a lot of testing to show that there are no bugs. You’ll never be able to show there’s no bugs with testing. But if you …

STRICKLAND: Sorry, can I pause you for one moment? Could you define formal verification for our listeners, just in case they don’t know?

ZHOU: Yeah, that’s a good point. I think the easy way to think about this is formal verification, it uses mathematical logic to describe, say, a program and, you know, it can represent some properties in math, essentially, in logic. And then you can use a proof to show that the program has certain properties that you desire, and a simple form, like, a very preliminary form of formal (specification for) verification is, you know, just assertions in the program, right, where it, say, asserts A is not equal to zero. And that’s a very simple form of logic that must hold (or be proven to hold), and then, you know, the proof system is also much more complicated to talk about more advanced properties of programs, their correctness, and so on.

STRICKLAND: Mm-hm.

ZHOU: So I think that the opportunity that we’re seeing is that with the help of AI, I think we are on the verge of providing the capability of building verified systems, at least for some of the mission-critical pieces of systems. And that would be a very exciting area for systems and AI to tackle together. And I think we’re going to see a paradigm shift in systems where some pieces of system components will actually be implemented using AI. [What] is interesting is, you know, system is generally deterministic because, so, you know, when you look at the traditional computer system, you want to know that it’s actually acting as you expected, but AI, you know, it can be stochastic, right. And it might not always give you the same answer. But how you combine these two is another area where I see a lot of opportunities for breakthroughs.

STRICKLAND: Yeah, yeah. I wanted to back up in your career a little bit and talk about the concept of gray failures because you were really instrumental in defining this concept, which for people who don’t know, gray failures are subtle and partial failures in cloud-scale systems. They can be very difficult to detect and can lead to major problems. I wanted to see if you’re still thinking about gray failures in the context of your thinking about AI and systems. Are gray failures having an impact on AI today?

ZHOU: Yes, definitely. So when we were looking at cloud systems, we realized the … so in systems, we developed a lot of mechanisms for reliability. And when we look at the cloud systems, when they reach a certain scale, a lot of methodology we develop in systems for reliability actually no longer applies. One of the reasons is we have those gray failures. And then we moved to looking at AI infrastructure. The problem is actually even worse because what we realize is there’s a lot of built-in redundancy at every level, like in GPUs, memory, or all the communication channels. And because of those built-in redundancies, sometimes the system is experience failures, but they’re being masked because of the redundancies. And that makes it very hard for us to actually maintain the system, debug the system, or to troubleshooting. And for AI infrastructure, what we have developed is a very different approach using proactive validation rather than reactive repair. And this is actually a paper that we wrote recently in USENIX ATC that talks about how we approach reliability in AI infrastructure, where the same concept happens to apply in a new meaning.

STRICKLAND: Mm. I like that. Yeah. So tell me a little bit about your vision for where AI goes from here. You talked a little bit in your keynote about AI-infused systems. And what would that look like?

ZHOU: Yeah, so I think AI is going to transform almost everything, and that includes systems. That’s why I’m so happy to be here to learn more from the AI community. But I also believe that for every domain that AI is going to transform, you really need the domain expertise and, sort of, the combination of AI and that particular domain. And the same for systems. So when we look at what we call AI-infused systems, we really see the opportunity where there are a lot of hard system challenges can be addressed by AI. But we need to define the right interface between the system and the AI so that we can leverage the advantage of both, right. Like, AI is creative. It comes up with solutions that, you know, people might not think of, but it’s also a little bit random sometimes. It could, you know, give you wrong answers. But systems are very grounded and very deterministic. So we need to figure out what is the design paradigm that we need to develop so that we can get the best of both worlds.

STRICKLAND: Makes sense. In your talk you gave an example of OptiFlow. Could you tell our listeners a bit about that?

ZHOU: Yeah. This is a pretty interesting project that is actually done in Microsoft Research Asia jointly with the Azure team where we look at collective communication, which is a major part of AI infrastructure. And it turns out, you know, there’s a lot of room for optimization. It was initially done manually. So an expert had to take a look at the system and look at the different configurations and do all kinds of experiments, and, you know, it takes about two weeks to come up with a solution. This is why I say, you know, the productivity is becoming a bottleneck for our AI infrastructure because people are in the loop who have to develop solutions. And it turns out that this is a perfect problem for AI, where AI can actually come up with various solutions. It can actually develop good system insights based on the observations from the system. And so OptiFlow, what it does is it comes up with the, sort of, the algorithm or the schedule of communications for different collective communication primitives. And it turns out to be able to discover algorithms that’s much better than the default one or, you know, for different settings. And it’s giving us the benefits of the productivity; also, efficiency.

STRICKLAND: And you said that this is in production today, right?

ZHOU: Yes. It is in production.

STRICKLAND: That’s exciting. So thinking still to the future, how might the co-evolution of AI and systems change the skills needed for future computer scientists?

ZHOU: Yeah, that’s a very deep question. As I mentioned, I think being fluent in AI is very important. But I also believe that domain expertise is probably undervalued in many ways. And I see a lot of needs for this interdisciplinary kind of education where someone who not only understands AI and what AI technology can do but also understands a particular domain very well. And those are the people who will be able to figure out the future for that particular domain with the power of AI. And I think for students, certainly it’s no longer sufficient for you to be an expert in a very narrow domain. I think we see a lot of fields sort of merging together, and so you have to be an expert in multiple domains to see new opportunities for innovations.

STRICKLAND: So what advice would you give to a high school student who’s just starting out and thinks, ah, I want to get into AI?

ZHOU: Yeah, I mean certainly there’s a lot of excitement over AI, and it would be great for high school students to, actually, to have the firsthand experience. And I think it’s their world in the future. Because they probably can imagine a lot of things from scratch. I think they probably have the opportunity to disrupt a lot of the things that we take for granted today. So I think just use their imagination. And I don’t think we have really good advice for the young generation. It’s going to be their creativity and their imagination. And AI is definitely going to empower them to do something that’s going to be amazing.

STRICKLAND: Something that we probably can’t even imagine.

ZHOU: Right.

STRICKLAND: Yeah.

ZHOU: I think so.

STRICKLAND: I like that. So as we close, I’m hoping you can look ahead and talk about what excites you most about the potential of AI and systems working together, but also if you have any concerns, what concerns you most?

ZHOU: Yeah, I think in terms of AI systems, I’m certainly pretty excited about what we can do together, you know, with a combination of AI and systems. There are a lot of low-hanging fruit, and there are also a lot of potential grand challenges that we can actually take on. I mentioned a couple in this morning’s talk. And certainly, you know, we also want to look at the risks that could happen, especially when we have systems and AI start to evolve together. And this is also in an area where having some sort of trust foundation is very important so we can have some assurance of the kind of system or AI system that we are going to build. And this is actually fundamental in how we think about trust in systems. And I think that concept can be very useful for us to guard against unintended consequences or unintended issues.

[MUSIC]

STRICKLAND: Well, Lidong Zhou, thank you so much for joining us on the podcast. I really enjoyed the conversation.

ZHOU: It’s such a pleasure, Eliza.

STRICKLAND: And to our listeners, thanks for tuning in. If you want to learn more about research at Microsoft, you can check out the Microsoft Research website at Microsoft.com/research. Until next time.

[MUSIC FADES]

The post NeurIPS 2024: The co-evolution of AI and systems with Lidong Zhou appeared first on Microsoft Research.

Read More

Simplify multimodal generative AI with Amazon Bedrock Data Automation

Simplify multimodal generative AI with Amazon Bedrock Data Automation

Developers face significant challenges when using foundation models (FMs) to extract data from unstructured assets. This data extraction process requires carefully identifying models that meet the developer’s specific accuracy, cost, and feature requirements. Additionally, developers must invest considerable time optimizing price performance through fine-tuning and extensive prompt engineering. Managing multiple models, implementing safety guardrails, and adapting outputs to align with downstream system requirements can be difficult and time consuming.

Amazon Bedrock Data Automation in public preview helps address these and other challenges. This new capability from Amazon Bedrock offers a unified experience for developers of all skillsets to easily automate the extraction, transformation, and generation of relevant insights from documents, images, audio, and videos to build generative AI–powered applications. With Amazon Bedrock Data Automation, customers can fully utilize their data by extracting insights from their unstructured multimodal content in a format compatible with their applications. Amazon Bedrock Data Automation’s managed experience, ease of use, and customization capabilities help customers deliver business value faster, eliminating the need to spend time and effort orchestrating multiple models, engineering prompts, or stitching together outputs.

In this post, we demonstrate how to use Amazon Bedrock Data Automation in the AWS Management Console and the AWS SDK for Python (Boto3) for media analysis and intelligent document processing (IDP) workflows.

Amazon Bedrock Data Automation overview

You can use Amazon Bedrock Data Automation to generate standard outputs and custom outputs. Standard outputs are modality-specific default insights, such as video summaries that capture key moments, visual and audible toxic content, explanations of document charts, graph figure data, and more. Custom outputs use customer-defined blueprints that specify output requirements using natural language or a schema editor. The blueprint includes a list of fields to extract, data format for each field, and other instructions, such as data transformations and normalizations. This gives customers full control of the output, making it easy to integrate Amazon Bedrock Data Automation into existing applications.

Using Amazon Bedrock Data Automation, you can build powerful generative AI applications and automate use cases such as media analysis and IDP. Amazon Bedrock Data Automation is also integrated with Amazon Bedrock Knowledge Bases, making it easier for developers to generate meaningful information from their unstructured multimodal content to provide more relevant responses for Retrieval Augmented Generation (RAG).

Customers can get started with standard outputs for all four modalities (documents, images, videos, and audio) and with custom outputs for documents and images. Custom outputs for video and audio will be supported when the capability is generally available.

Amazon Bedrock Data Automation for images, audio, and video

To take a media analysis example, suppose that customers in the media and entertainment industry are looking to monetize long-form content, such as TV shows and movies, through contextual ad placement. To deliver the right ads at the right video moments, you need to derive meaningful insights from both the ads and the video content. Amazon Bedrock Data Automation enables your contextual ad placement application by generating these insights. For instance, you can extract valuable information such as video summaries, scene-level summaries, content moderation concepts, and scene classifications based on the Interactive Advertising Bureau (IAB) taxonomy.

To get started with deriving insights with Amazon Bedrock Data Automation, you can create a project where you can specify your output configuration using the AWS console, AWS Command Line Interface (AWS CLI) or API.

To create a project on the Amazon Bedrock console, follow these steps:

  1. Expand the Data Automation dropdown menu in the navigation pane and select Projects, as shown in the following screenshot.
  2. From the Projects console, create a new project and provide a project name, as shown in the following screenshot.
  3. From within the project, choose Edit, as shown in the following screenshot, to specify or modify an output configuration. Standard output is the default way of interacting with Amazon Bedrock Data Automation. It can be used with audio, documents, images, and videos, and you can have one standard output configuration per data type for each project.
  4. For customers who want to analyze images and videos for media analysis, standard output can be used to generate insights such as image summary, video scene summary, and scene classifications with IAB taxonomy. You can select the image summarization, video scene summarization, and IAB taxonomy checkboxes from the Standard output tab and then choose Save changes to finish configuring your project, as shown in the following screenshot.
  5. To test the standard output configuration using your media assets, choose Test, as shown in the following screenshot.

The next example uses the project to generate insights for a travel ad.

  1. Upload an image, then choose Generate results, as shown in the following screenshot, for Amazon Bedrock Data Automation to invoke an inference request.
  2. Amazon Bedrock Data Automation will process the uploaded file based on the project’s configuration, automatically detecting that the file is an image and then generating a summary and IAB categories for the travel ad.
  3. After you have generated insights for the ad image, you can generate video insights to determine the best video scene for effective ad placement. In the same project, upload a video file and choose Generate results, as shown in the following screenshot.

Amazon Bedrock Data Automation will detect that the file is a video and will generate insights for the video based on the standard output configuration specified in the project, as shown in the following screenshot.

These insights from Amazon Bedrock Data Automation can help you place relevant ads in your video content effectively, which can improve content monetization.

Intelligent document processing with Amazon Bedrock Data Automation

You can use Amazon Bedrock Data Automation to automate IDP workflows at scale, without needing to orchestrate complex document processing tasks such as classification, extraction, normalization, or validation.

To take a mortgage example, a lender wants to automate the processing of a mortgage lending packet to streamline their IDP pipeline and improve the accuracy of loan processing. Amazon Bedrock Data Automation simplifies the automation of complex IDP tasks such as document splitting, classification, data extraction, output format normalization, and data validation. Amazon Bedrock Data Automation also incorporates confidence scores and visual grounding of the output data to mitigate hallucinations and help improve result reliability.

For example, you can generate custom output by defining blueprints, which specify output requirements using natural language or a schema editor, to process multiple file types in a single, streamlined API. Blueprints can be created using the console or the API, and you can use a catalog blueprint or create a custom blueprint for documents and images.

For all modalities, this workflow consists of three main steps: creating a project, invoking the analysis, and retrieving the results.

The following solution walks you through a simplified mortgage lending process with Amazon Bedrock Data Automation using the AWS SDK for Python (Boto3), which is straightforward to integrate into an existing IDP workflow.

Prerequisites

Before you invoke the Amazon Bedrock API, make sure you have the following:

Create custom blueprint

In this example, you have the lending packet, as shown in the following image, which contains three documents: a pay stub, a W-2 form, and a driver’s license.

Amazon Bedrock Data Automation has sample blueprints for these three documents that define commonly extracted fields. However, you can also customize Amazon Bedrock Data Automation to extract specific fields from each document. For example, you can extract only the gross pay and net pay from the pay stub by creating a custom blueprint.

To create a custom blueprint using the API, call the CreateBlueprint operation with the Amazon Bedrock Data Automation client. The following example shows the gross pay and net pay being defined as properties passed to CreateBlueprint, to be extracted from the lending packet:

import json

import boto3

# Client for the Amazon Bedrock Data Automation build-time API
bedrock_data_automation_client = boto3.client('bedrock-data-automation')

bda_create_blueprint_response = bedrock_data_automation_client.create_blueprint(
    blueprintName='CUSTOM_PAYSLIP_BLUEPRINT',
    type='DOCUMENT',
    blueprintStage='LIVE',
    # The blueprint schema defines which fields to extract and how to extract them
    schema=json.dumps({
        '$schema': 'http://json-schema.org/draft-07/schema#',
        'description': 'default',
        'documentClass': 'default',
        'type': 'object',
        'properties': {
            'gross_pay_this_period': {
                'type': 'number',
                'inferenceType': 'extractive',
                'description': 'The gross pay for this pay period from the Earnings table'
            },
            'net_pay': {
                'type': 'number',
                'inferenceType': 'extractive',
                'description': 'The net pay for this pay period from the bottom of the document'
            }
        }
    }),
)

The CreateBlueprint response returns the blueprintARN for the pay stub’s custom blueprint:

'blueprintArn: arn:aws:bedrock:us-west-2:<AWS_ACCOUNT_ID>:blueprint/<BLUEPRINT_ID>'

Configure Amazon Bedrock Data Automation project

To begin processing files using blueprints with Amazon Bedrock Data Automation, you first need to create a data automation project. A project can apply multiple document blueprints, so a multiple-page document containing different file types can be processed within the same project, with each document type handled by its own custom extraction logic.

When using the API to create a project, you invoke the CreateDataAutomationProject operation. The following is an example of how you can configure custom output using the custom blueprint for the pay stub and the sample blueprints for the W-2 and driver’s license:

bda_bedrock_automation_create_project_response = bedrock_data_automation_client.create_data_automation_project(
    projectName='TEST_PROJECT',
    projectDescription='test BDA project',
    projectStage='LIVE',
    standardOutputConfiguration={
        'document': {
            'outputFormat': {
                'textFormat': {
                    'types': ['PLAIN_TEXT']
                },
                'additionalFileFormat': {
                    'state': 'ENABLED',
                }
            }
        },
    },
    customOutputConfiguration={
        'blueprints': [
          {
              'blueprintArn': 'arn:aws:bedrock:us-west-2:<AWS_ACCOUNT_ID>:blueprint/<BLUEPRINT_ID>'
          },
          {
              'blueprintArn': 'arn:aws:bedrock:us-west-2:aws:blueprint/bedrock-data-automation-public-w2-form'
          },
          {
              'blueprintArn': 'arn:aws:bedrock:us-west-2:aws:blueprint/bedrock-data-automation-public-us-driver-license'
          },
        ],
    },
    overrideConfiguration={
        'document': {
            'splitter': {
                'state': 'ENABLED'
            }
        }
    },
)

The CreateDataAutomationProject response returns the projectARN for the project:

'arn:aws:bedrock:us-west-2:<AWS_ACCOUNT_ID>:data-automation-project/<PROJECT_ID>'

To process different types of documents using multiple document blueprints in a single project, Amazon Bedrock Data Automation uses a splitter configuration, which must be enabled through the API. The following is the override configuration for the splitter, and you can refer to the Boto3 documentation for more information:

overrideConfiguration={
    'document': {
        'splitter': {
            'state': 'ENABLED' | 'DISABLED'
        }
    }
},

Upon creation, the API validates the input configuration and creates a new project, returning the projectARN, as shown in the following example.

'arn:aws:bedrock:us-west-2:<AWS_ACCOUNT_ID>:data-automation-project/<PROJECT_ID>'

Test the solution

Now that the blueprint and project setup is complete, the InvokeDataAutomationAsync operation from the Amazon Bedrock Data Automation runtime can be used to start processing files. This API call initiates the asynchronous processing of files in an S3 bucket, in this case the lending packet, using the configuration defined in the project by passing the project’s ARN:

# Client for the Amazon Bedrock Data Automation runtime API
bedrock_data_automation_runtime_client = boto3.client('bedrock-data-automation-runtime')

bda_invoke_data_automation_async_response = bedrock_data_automation_runtime_client.invoke_data_automation_async(
    inputConfiguration={'s3Uri': '<S3_URI>'},   # S3 location of the lending packet
    outputConfiguration={'s3Uri': '<S3_URI>'},  # S3 location where results are written
    dataAutomationConfiguration={
        'dataAutomationArn': 'arn:aws:bedrock:us-west-2:<AWS_ACCOUNT_ID>:data-automation-project/<PROJECT_ID>',
        'stage': 'LIVE'
    }
)

InvokeDataAutomationAsync returns the invocationARN:

'arn:aws:bedrock:us-west-2:<AWS_ACCOUNT_ID>:data-automation-invocation/<INVOCATION_ID>'

GetDataAutomationStatus can be used to view the status of the invocation, using the InvocationARN from the previous response:

bda_get_data_automation_status_response = bedrock_data_automation_runtime_client.get_data_automation_status(
    invocationArn='arn:aws:bedrock:us-west-2:<AWS_ACCOUNT_ID>:data-automation-invocation/<INVOCATION_ID>'
)

When the job is complete, view the results in the S3 bucket used in the outputConfiguration by navigating to the ~/JOB_ID/0/custom_output/ folder.
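
If you prefer to retrieve results programmatically rather than browse the bucket, the following minimal sketch polls GetDataAutomationStatus until the job finishes and then lists the generated output objects. The status values and response field names follow the Boto3 response shapes and should be verified against the documentation; the bucket name and prefix are placeholders.

import time

s3_client = boto3.client('s3')

invocation_arn = 'arn:aws:bedrock:us-west-2:<AWS_ACCOUNT_ID>:data-automation-invocation/<INVOCATION_ID>'

# Poll until the asynchronous job leaves the in-progress states
while True:
    status_response = bedrock_data_automation_runtime_client.get_data_automation_status(
        invocationArn=invocation_arn
    )
    status = status_response['status']
    if status not in ('Created', 'InProgress'):
        break
    time.sleep(10)

if status == 'Success':
    # List the result files written under the configured output prefix
    listing = s3_client.list_objects_v2(
        Bucket='<OUTPUT_BUCKET>',
        Prefix='<JOB_ID>/0/custom_output/'
    )
    for obj in listing.get('Contents', []):
        print(obj['Key'])
else:
    print(f'Job ended with status: {status}')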

From the following sample output, Amazon Bedrock Data Automation associated the pay stub file with the custom pay stub blueprint with a high level of confidence:

'matched_blueprint': {
    'arn': '<BLUEPRINT_ARN>', 'name': 'CUSTOM_PAYSLIP_BLUEPRINT', 'confidence': 0.99959725
}

Using the matched blueprint, Amazon Bedrock Data Automation was able to accurately extract each field defined in the blueprint:

'inference_result': {
    'net_pay': 291.9, 'gross_pay_this_period': 452.43
}

Additionally, Amazon Bedrock Data Automation returns confidence scores and bounding box information for each field:

'explainability_info': [{
    'net_pay': {'success': true, 'confidence': 0.96484375, 'geometry': [{'boundingBox': ...

This example demonstrates how customers can use Amazon Bedrock Data Automation to streamline and automate an IDP workflow. By automating complex document processing tasks such as data extraction, normalization, and validation, Amazon Bedrock Data Automation reduces operational complexity and improves processing efficiency, helping lenders handle higher loan processing volumes, minimize errors, and drive operational excellence.

Cleanup

When you’re finished evaluating this feature, delete the S3 bucket and any objects to avoid any further charges.

Summary

Customers can get started with Amazon Bedrock Data Automation, which is available in public preview in the US West (Oregon) AWS Region. Learn more about Amazon Bedrock Data Automation and how to automate the generation of accurate information from unstructured content for building generative AI–based applications.


About the authors

Ian Lodge is a Solutions Architect at AWS, helping ISV customers in solving their architectural, operational, and cost optimization challenges. Outside of work he enjoys spending time with his family, ice hockey and woodworking.

Alex Pieri is a Solutions Architect at AWS that works with retail customers to plan, build, and optimize their AWS cloud environments. He specializes in helping customers build enterprise-ready generative AI solutions on AWS.

Raj Pathak is a Principal Solutions Architect and Technical advisor to Fortune 50 and Mid-Sized FSI (Banking, Insurance, Capital Markets) customers across Canada and the United States. Raj specializes in Machine Learning with applications in Generative AI, Natural Language Processing, Intelligent Document Processing, and MLOps.

Read More

PromptWizard: The future of prompt optimization through feedback-driven self-evolving prompts

PromptWizard: The future of prompt optimization through feedback-driven self-evolving prompts


The challenge of effective prompting

AI is reshaping industries—from education to healthcare—thanks to advancements in large language models (LLMs). These models rely on prompts, carefully crafted inputs that guide them to produce relevant and meaningful outputs. While the impact of prompts is profound, creating prompts that can help with complex tasks is a time-intensive and expertise-heavy process, often involving months of trial and error. 

This challenge grows as new tasks arise and models evolve rapidly, making manual methods for prompt engineering increasingly unsustainable. The question then becomes: How can we make prompt optimization faster, more accessible, and more adaptable across diverse tasks? 

To address this challenge, we developed PromptWizard (PW), a research framework that automates and streamlines the process of prompt optimization. We are open sourcing the PromptWizard codebase to foster collaboration and innovation within the research and development community.

Introducing PromptWizard

PromptWizard (PW) is designed to automate and simplify prompt optimization. It combines iterative feedback from LLMs with efficient exploration and refinement techniques to create highly effective prompts within minutes.

PromptWizard optimizes both the instruction and the in-context learning examples. Central to PW is its self-evolving and self-adaptive mechanism, where the LLM iteratively generates, critiques, and refines prompts and examples in tandem. This process ensures continuous improvement through feedback and synthesis, achieving a holistic optimization tailored to the specific task at hand. By evolving both instructions and examples simultaneously, PW ensures significant gains in task performance. 

Three key insights behind PromptWizard:

  • Feedback-driven refinement: At its core, PW leverages an iterative feedback loop where the LLM generates, critiques, and refines its own prompts and examples. This continuous improvement mechanism ensures that each iteration is better than the last, leading to highly effective prompts and examples. 
  • Joint optimization and synthesis of diverse examples: PW generates synthetic examples that are not only robust and diverse but also task-aware. By optimizing prompts and examples together, it ensures they work in tandem to address specific task requirements effectively. 
  • Self-generated chain-of-thought (CoT) steps: Incorporating CoT reasoning improves the problem-solving capabilities of the model. By using selected few-shot examples, PW generates a detailed reasoning chain for each example, facilitating nuanced and step-by-step problem-solving approaches.
Figure 1. Overview of PromptWizard

How PromptWizard works

PromptWizard begins with a user input: a problem description, an initial prompt instruction, and a few training examples that serve as a foundation for the task at hand.

Its output is a refined, optimized set of prompt instructions paired with carefully curated in-context few-shot examples. These outputs are enriched with detailed reasoning chains, task intent, and an expert profile that bridges human-like reasoning with the AI’s responses. 

Stage 1: Refinement of prompt instruction

The first stage focuses on refining the task instructions of a prompt. PromptWizard generates multiple candidate instructions, evaluates them using feedback from the LLM, and iteratively synthesizes improved versions. This process balances exploration—trying diverse ideas—and exploitation—refining the most promising ones.

For example, if an initial instruction yields suboptimal results, PW incorporates feedback to identify its shortcomings and generates an improved version. Over three to five iterations, this iterative cycle ensures that the instruction converges to an optimal state. 
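
The PromptWizard repository defines its own APIs; the snippet below is only a conceptual sketch of this critique-and-synthesis loop, written against two caller-supplied callables (an LLM wrapper and a task-specific scorer), with all function and variable names invented for illustration.

from typing import Callable, List

def refine_instruction(
    llm: Callable[[str], str],       # sends a prompt to an LLM and returns its text output
    score: Callable[[str], float],   # evaluates an instruction against held-out task examples
    seed_instruction: str,
    num_variants: int = 4,
    num_iterations: int = 5,
) -> str:
    """Iteratively generate, critique, and synthesize prompt instructions."""
    best = seed_instruction
    for _ in range(num_iterations):
        # Exploration: ask the LLM for several mutated candidate instructions
        variants: List[str] = [
            llm(f"Rewrite this task instruction in a different style:\n{best}")
            for _ in range(num_variants)
        ]
        # Keep the strongest candidate according to the task scorer
        candidate = max(variants + [best], key=score)
        # Exploitation: critique the winner and synthesize an improved version
        critique = llm(f"List the weaknesses of this instruction:\n{candidate}")
        improved = llm(
            "Improve the instruction below by addressing the critique.\n"
            f"Instruction:\n{candidate}\nCritique:\n{critique}"
        )
        best = max([candidate, improved], key=score)
    return best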

Figure 2. Refinement of prompt instruction

Stage 2: Joint optimization of instructions and examples

The refined prompt obtained from Stage 1 is combined with carefully selected examples, and both are optimized together. Through the critique-and-synthesis mechanism, PromptWizard ensures alignment between the prompt and examples, simultaneously synthesizing new examples to enhance task performance.

This structured approach makes PromptWizard highly versatile, adapting to tasks as varied as solving math problems or generating creative content. 

Figure 3. Joint optimization of instructions and examples

Results

PromptWizard stands out for its feedback-driven refinement and systematic exploration, delivering exceptional results across a wide variety of tasks while maintaining computational efficiency. 

Comprehensive evaluation across tasks

PromptWizard was rigorously evaluated on over 45 tasks, spanning both general and domain-specific challenges. Benchmarked against state-of-the-art techniques—including Instinct, InstructZero, APE, PromptBreeder, EvoPrompt, DSPy, APO, and PromptAgent—PW consistently outperformed competitors in accuracy, efficiency, and adaptability. Please see detailed results in our paper.

  • Accuracy: PW consistently outperformed other methods, maintaining performance close to the best across all tasks. Figure 4 shows the performance profile curve that highlights PW’s reliability, demonstrating how frequently it achieves near-best accuracy compared to other approaches for BigBench Instruction Induction dataset (BBII).
  • Efficiency: Beyond accuracy, PW demonstrates its computational efficiency. Unlike many baseline methods that require extensive API calls and computational resources, PW achieves superior results with minimal overhead by striking an effective balance between exploration and exploitation. Table 1 demonstrates PW’s cost-effectiveness, with significantly reduced token usage for input and output while optimizing prompts effectively.
Figure 4. Performance profile curve on BBII dataset
Methods         API calls   Total tokens
Instinct        1730        115k
PromptBreeder   18600       1488k
EvoPrompt       5000        400k
PW              69          24k
Table 1. Cost analysis on BBII dataset

We have also conducted numerous experiments to highlight PromptWizard’s efficacy with limited training data and smaller LLMs.

Resilience with limited data

Real-world scenarios often lack abundant training data. PW excels in such conditions, requiring as few as five examples to produce effective prompts. Across five diverse datasets, PW demonstrated an average accuracy drop of only 5% when using five examples compared to 25 examples—highlighting its adaptability and efficiency (see Table 2). 

Datasets    5 Examples   25 Examples
MMLU        80.4         89.5
GSM8k       94           95.4
Ethos       86.4         89.4
PubMedQA    68           78.2
MedQA       80.4         82.9
Average     81.9         87
Table 2. PW’s performance with varying number of examples

Leveraging smaller models for optimization

PromptWizard also reduces computational costs by using smaller LLMs for prompt generation, reserving more powerful models for inference. For example, using Llama-70B for prompt generation resulted in negligible performance differences compared to GPT-4, while significantly lowering resource usage (see Table 3).

Dataset    Prompt Gen: Llama-70B   Prompt Gen: GPT-4
GSM8k      94.6                    95.4
Ethos      89.2                    89.4
Average    91.9                    92.4
Table 3. Performance with smaller LLMs for prompt generation

PromptWizard shows that effective prompts combine optimized instructions refined through iterative feedback, thoughtfully chosen in-context examples, and a modular design that incorporates expert knowledge and task-specific intent. This approach enables the framework to handle a broad range of tasks, from simple to highly complex, with exceptional efficiency and flexibility.

 Whether you are a researcher addressing cutting-edge challenges or an organization looking to streamline workflows, PromptWizard provides a practical, scalable, and impactful solution for enhancing model performance.

The post PromptWizard: The future of prompt optimization through feedback-driven self-evolving prompts appeared first on Microsoft Research.

Read More

How TUI uses Amazon Bedrock to scale content creation and enhance hotel descriptions in under 10 seconds

How TUI uses Amazon Bedrock to scale content creation and enhance hotel descriptions in under 10 seconds

TUI Group is one of the world’s leading global tourism services, providing 21 million customers with an unmatched holiday experience in 180 regions. TUI Group covers the end-to-end tourism chain with over 400 owned hotels, 16 cruise ships, 1,200 travel agencies, and 5 airlines covering all major holiday destinations around the globe. At TUI, crafting high-quality content is a crucial component of its promotional strategy.

The TUI content teams are tasked with producing high-quality content for its websites, including product details, hotel information, and travel guides, often using descriptions written by hotel and third-party partners. This content needs to adhere to TUI’s tone of voice, which is essential to communicating the brand’s distinct personality. But as its portfolio expands with more hotels and offerings, scaling content creation has proven challenging. This presents an opportunity to augment and automate the existing content creation process using generative AI.

In this post, we discuss how we used Amazon SageMaker and Amazon Bedrock to build a content generator that rewrites marketing content following specific brand and style guidelines. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Amazon SageMaker helps data scientists and machine learning (ML) engineers build FMs from scratch, evaluate and customize FMs with advanced techniques, and deploy FMs with fine-grain controls for generative AI use cases that have stringent requirements on accuracy, latency, and cost.

Through experimentation, we found that a two-phase approach worked best to make sure that the output aligned with TUI’s tone of voice requirements. The first phase was to fine-tune a smaller large language model (LLM) on a large corpus of data. The second phase used a different LLM for post-processing. Through fine-tuning, we generate content that mimics the TUI brand voice from static data, something that could not be captured through prompt engineering. Employing a second model with few-shot examples helped verify that the output adhered to specific formatting and grammatical rules. The latter uses a more dynamic dataset, which we can use to adjust the output quickly in the future for different brand requirements. Overall, this approach resulted in higher quality content and allowed TUI to improve content quality at a higher velocity.

Solution overview

The architecture consists of a few key components:

  • LLM models – We evaluated different approaches and found that a two-model solution performed the best. This consists of a fine-tuned Meta Llama model to generate a description for the given hotel and Anthropic’s Claude model to reformat its output. Fine-tuning and hosting the Meta Llama 2 model was done on Amazon SageMaker, and Anthropic’s Claude 2 was consumed from Amazon Bedrock through API calls.
  • Orchestration – We created a state machine using AWS Step Functions to make calls in a batch format to the two LLMs and fetch the search engine optimization (SEO) score for the generated content from a third-party API. If the SEO content score is above a defined threshold (80%), the generated content is stored in an Amazon DynamoDB table and can later be reviewed by the content team directly in the front-end UI. Through this process, we maintain and monitor content quality at scale.
  • Human in the loop feedback – We developed a custom React front-end application to gather feedback from the content team to facilitate continuous improvement and future model fine-tuning. You can use the feedback to fine-tune a base model on SageMaker using reinforcement learning from human feedback (RLHF) to improve performance.

The following diagram is the high-level architecture of the solution.

Architecture Diagram

Prompt engineering

We started by crafting prompts that include the most common issues identified by the TUI content team, including tone of voice, accuracy, length, and grammar. We provided the LLM with a few examples of curated content within the context window. Although the generated output followed the guidance, the writing style didn’t meet TUI’s tone of voice requirements.

 Example prompt:

You are an experienced British copywriter for TUI. TUI is a world-leading travel company. You are an expert in generating hotel descriptions, based on TUI’s tone of voice. TUI's tone of voice can be described as upbeat, enthusiastic, and promotional. Avoid all the words in the following list: {banned words}
Write at most 100 words.
Your hotel descriptions must follow TUI's tone of voice and apply SEO guidelines.
These are some good examples. You should mimic below.
{examples}.
Human: {input}
Assistant:
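
As a rough illustration (not TUI’s production code), a prompt like the one above can be sent to Anthropic’s Claude 2 on Amazon Bedrock with the InvokeModel API. The region, generation parameters, and the way the instructions are folded into the Human turn are assumptions.

import json
import boto3

bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')

def generate_description(instructions: str, source_text: str) -> str:
    # Claude 2's text-completions format expects alternating Human/Assistant turns
    body = {
        'prompt': f'\n\nHuman: {instructions}\n\n{source_text}\n\nAssistant:',
        'max_tokens_to_sample': 300,
        'temperature': 0.5,
    }
    response = bedrock_runtime.invoke_model(
        modelId='anthropic.claude-v2',
        body=json.dumps(body),
    )
    return json.loads(response['body'].read())['completion']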

Here is an example of the results we received.

Original text:
Boutique interiors and a great location combine at this small but perfectly formed hotel. It may be small but this 15-room hotel packs a punch with its interior design. The immaculate pool is set in an open atrium area. And there’s funky cocktail bar and lounge, too. The location is every bit as impressive as the styling. It’s about a 10-minute walk to the beach and less than half that to the centre of Thassos Town.

Generated text:
This petite hotel brings boutique style to Thassos Town. Its immaculate pool and funky cocktail bar create an oasis in the heart of the action. The beach is just a 10-minute walk away, so you can easily split your time between sunbathing and sightseeing. With only 15 rooms, the experience feels intimate yet worldly. The interiors marry modern design with traditional touches for a unique look. And the central location means popular attractions like the Archaeological Museum and shopping street are right on your doorstep.

To evaluate these results, we generated a dataset of 150 hotel descriptions using two different prompt versions and asked our business teams to review it. However, they flagged 98% of the output with significant issues. Firstly, Anthropic’s Claude 2 struggled to adhere to TUI’s tone of voice and branding guidelines, frequently employed abbreviations, and favored American English over British English. It also failed to follow the SEO guidelines provided and sometimes generated false information regarding hotel facilities and locations. The following image shows a list of these challenges and how the LLM handled them. Of the six challenges, the LLM met only one.

Main Challenges

Fine-tuning Llama 2 using PEFT on Amazon SageMaker JumpStart

These issues and poor feedback led us to conclude that prompt engineering alone would not adequately address the problem. As a result, we decided to pursue an alternative approach: fine-tuning a smaller large language model to rewrite the text in accordance with TUI’s tone of voice. We used a curated set of hotel descriptions written by TUI copywriters so that the model would have better alignment with our guidelines.

Using Amazon SageMaker JumpStart, we selected Meta Llama 2, one of the top open source LLMs available at the time, and chose the 13B parameter version to apply parameter-efficient fine-tuning (PEFT), specifically using quantized low-rank adaptation (QLoRA). This technique quantizes the pre-trained model to 4 bits and adds small low-rank adapters for fine-tuning. We fine-tuned the model on a single ml.g5.4xlarge instance in about 20 hours using a relatively small dataset of around 4,500 hotels. We also tested the Llama 2 7B and 70B models. We found that the 7B model didn’t perform well enough, and the 70B model had much higher costs without significant improvement.

We used common natural language processing (NLP) evaluation metrics, such as perplexity for evaluation and monitoring during training, and established daily feedback loops with the content team to refine the test set. The output from the fine-tuned Meta Llama 2 13B model effectively mimicked the TUI tone of voice and adhered to most SEO guidelines, even those not specified in the prompt.

For more information, refer to Fine-tune Llama 2 for text generation on Amazon SageMaker JumpStart.
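
The following is a minimal sketch of that fine-tuning setup with the SageMaker Python SDK’s JumpStart estimator. The S3 path and hyperparameter values are illustrative rather than TUI’s actual configuration, and the available hyperparameter names should be checked against the model’s documentation.

from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id='meta-textgeneration-llama-2-13b',
    environment={'accept_eula': 'true'},   # accept the Llama 2 end-user license agreement
    instance_type='ml.g5.4xlarge',         # single-GPU instance used for PEFT in this post
    instance_count=1,
)

# JumpStart applies LoRA-based parameter-efficient fine-tuning for this model;
# hyperparameter names below follow the JumpStart defaults for Llama 2
estimator.set_hyperparameters(epoch='3', max_input_length='1024')

# Train on the curated hotel descriptions stored as a JSON Lines dataset in S3
estimator.fit({'training': 's3://<YOUR_BUCKET>/tui-tone-of-voice-dataset/'})

# Deploy the fine-tuned model behind a real-time endpoint for the rewrite step
predictor = estimator.deploy()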

Original text:
The NH München City Süd hotel, formerly known as NH München am Ring, is located southeast of Munich. The location is ideal for reaching the highway towards Salzburg and provides good access to the airport and Exhibition Center. You can also get to the city center in 20 minutes by hopping on the S-Bahn at the Karl-Preis-Platz station. The peaceful residential location offers an excellent opportunity to relax in one of the 163 rooms. These are decorated in white with red accents. Rooms feature free Wi-Fi and electronic safes that are large enough for your laptop. If you need to work, there are spacious desks in every room and apartments with seperate living rooms are available for longer term guests. The suites also come with separate living rooms and foldaway beds are available upon request. The breakfast will get you ready for a busy day with a large spread of cold cuts, cheeses, fresh pastries, and fruit. These come accompanied with freshly brewed coffee, tea, and fresh fruit juices. For lunch and dinner, you will find many restaurant options in close proximity to the hotel. There is a sauna for relaxation and you can park your car in the garage.

Generated text:

Make yourself at home in Munich at the München City Süd. This former NH München am Ring sits just southeast of the Bavarian capital, putting both the highway to Salzburg and the airport within easy reach. The S-Bahn station Karl-Preis-Platz is a quick walk away too, so you can be in the city center in just 20 minutes.

The neighborhood here is peaceful, giving you the chance to properly relax during your stay. The hotel’s 163 rooms are decorated in calming white and red hues. Free Wi-Fi and laptop-sized safes come as standard, while spacious desks make working easy. For longer visits, apartments with separate living areas are available. The suites take things up a notch with their own living spaces and foldaway beds on request.

Fuel up each morning with a vast breakfast spread of cold cuts, cheeses, pastries and fruit. All washed down with coffee, tea and fresh juices. You’ll find many dining options close by for other meals. After a busy day, unwind in the sauna or park up in the garage.

The following image shows a list of the challenges and how the LLM handled them. Of the six challenges, the LLM met four.

Improvements

Integrating Anthropic’s Claude 2 for further improvement

To further improve the output, we introduced Anthropic’s Claude 2 using Amazon Bedrock as a final refinement step. This included converting American spelling to British spelling, writing numbers one through nine in words and larger numbers in digits, correcting typos and capitalization errors, minimizing banned words, incorporating essential TUI branding words, and adding missing hotel information. We also implemented a feedback mechanism in the UI to use data for ongoing fine-tuning in production. By using Anthropic Claude 2, we make sure that the final output applies the remaining formatting rules.
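
Putting the two phases together, the following sketch shows the shape of the end-to-end call chain: the fine-tuned Llama 2 13B SageMaker endpoint drafts the description and Anthropic’s Claude 2 on Amazon Bedrock applies the remaining formatting rules. The endpoint name, prompt wording, parameters, and response parsing are assumptions, and the exact response schema depends on the model container version.

import json
import boto3

sagemaker_runtime = boto3.client('sagemaker-runtime')
bedrock_runtime = boto3.client('bedrock-runtime')

def draft_with_finetuned_llama(source_text: str) -> str:
    payload = {
        'inputs': f"Rewrite this hotel description in TUI's tone of voice:\n{source_text}",
        'parameters': {'max_new_tokens': 300, 'temperature': 0.6},
    }
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName='<FINE_TUNED_LLAMA2_ENDPOINT>',   # hypothetical endpoint name
        ContentType='application/json',
        Body=json.dumps(payload),
        CustomAttributes='accept_eula=true',
    )
    result = json.loads(response['Body'].read())
    # Response keys vary by container version; adjust the parsing as needed
    first = result[0] if isinstance(result, list) else result
    return first.get('generation') or first.get('generated_text', '')

def refine_with_claude(draft: str) -> str:
    rules = (
        'Use British spelling, write numbers one to nine in words and larger numbers '
        'in digits, fix typos and capitalisation, and avoid the banned words.'
    )
    body = {
        'prompt': f'\n\nHuman: {rules}\n\nText:\n{draft}\n\nAssistant:',
        'max_tokens_to_sample': 400,
        'temperature': 0.2,
    }
    response = bedrock_runtime.invoke_model(modelId='anthropic.claude-v2', body=json.dumps(body))
    return json.loads(response['body'].read())['completion']

final_description = refine_with_claude(draft_with_finetuned_llama('<partner-written hotel description>'))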

Original text:

As its name suggests, Vienna Sporthotel has loads lined up for sports fans.

There are 10 tennis courts and four golf simulators here, plus, you can book a session with a coach to brush up on your skills. On top of that, there’s an outdoor pool, a well-stocked fitness room and a big sauna. To fuel up for all that exercise, you can grab a plate at the breakfast buffet.

Location-wise, this place is set up for sightseeing. Hop on a bus and you can reach the likes of The Hofburg, Belvedere Palace and Karlskirche in less than half an hour. You’re only a 10-minute stroll away from the nearest restaurants and cafes, too.

Generated text:

With a name like Vienna Sporthotel, you can expect this place to serve up plenty for fans of sport and activity. Brush up your skills on one of 10 tennis courts or in the four golf simulators, then recharge in the outdoor pool or well-equipped gym. There’s even a coach on hand to help perfect your technique.

When you’re ready to refuel, tuck into the breakfast buffet. Then get set for sightseeing – a bus ride whisks you to top Vienna attractions like The Hofburg, Belvedere Palace and Karlskirche in under 30 minutes. You’re also just a short stroll from local eateries and coffee shops.

The following image shows a list of the challenges and how the LLM handled them. The LLM met all six challenges.

Success in Target Outcomes

Key outcomes

The final architecture consists of a fine-tuned Meta Llama 2 13B model and Anthropic Claude 2, using the strengths of each model. In a blind test, these dynamically generated hotel descriptions were rated higher than those written by humans in 75% of a sample of 50 hotels. We also integrated a third-party API to calculate SEO scores for the generated content, and we observed up to 4% uplift in SEO scores for the generated content compared to human written descriptions. Most significantly, the content generation process is now five times faster, enhancing our team’s productivity without compromising quality or consistency. We can generate a vast number of hotel descriptions in just a few hours— a task that previously took months.

Takeaways

Moving forward, we plan to explore how this technology can address current inefficiencies and quality gaps, especially for hotels that our team hasn’t had the capacity to curate. We plan to expand this solution to more brands and regions within the TUI portfolio, including producing content in various languages and tailoring it to meet the specific needs of different audiences.

Throughout this project, we learned a few valuable lessons:

  • Few-shot prompting is cost-effective and sufficient when you have limited examples and specific guidelines for responses. Fine-tuning can help significantly improve model performance when you need to tailor content to match a brand’s tone of voice, but can be resource intensive and is based on static data sources that can get outdated.
  • Fine-tuning the Llama 70B model was much more expensive than Llama 13B and did not result in significant improvement.
  • Incorporating human feedback and maintaining a human-in-the-loop approach is essential for protecting brand integrity and continuously improving the solution. The collaboration between TUI engineering, content, and SEO teams was crucial to the success of this project.

Although Meta Llama 2 and Anthropic’s Claude 2 were the latest state-of-the-art models available at the time of our experiment, since then we have seen the launch of Meta Llama 3 and Anthropic’s Claude 3.5, which we expect can significantly improve the quality of our outputs. Amazon Bedrock also now supports fine-tuning for Meta Llama 2, Cohere Command Light, and Amazon Titan models, making it simpler and faster to test models without managing infrastructure.


About the Authors

Nikolaos Zavitsanos is a Data Scientist at TUI, specialized in developing customer-facing Generative AI applications using AWS services. With a strong background in Computer Science and Artificial Intelligence, he leverages advanced technologies to enhance user experiences and drive innovation. Outside of work, Nikolaos plays water polo and is competing at a national level. Connect with Nikolaos on Linkedin

Hin Yee Liu is a Senior Prototyping Engagement Manager at Amazon Web Services. She helps AWS customers to bring their big ideas to life and accelerate the adoption of emerging technologies. Hin Yee works closely with customer stakeholders to identify, shape and deliver impactful use cases leveraging Generative AI, AI/ML, Big Data, and Serverless technologies using agile methodologies. In her free time, she enjoys knitting, travelling and strength training. Connect with Hin Yee on LinkedIn.

Read More

AI in Your Own Words: NVIDIA Debuts NeMo Retriever Microservices for Multilingual Generative AI Fueled by Data

AI in Your Own Words: NVIDIA Debuts NeMo Retriever Microservices for Multilingual Generative AI Fueled by Data

In enterprise AI, understanding and working across multiple languages is no longer optional — it’s essential for meeting the needs of employees, customers and users worldwide.

Multilingual information retrieval — the ability to search, process and retrieve knowledge across languages — plays a key role in enabling AI to deliver more accurate and globally relevant outputs.

Enterprises can expand their generative AI efforts into accurate, multilingual systems using NVIDIA NeMo Retriever embedding and reranking NVIDIA NIM microservices, which are now available on the NVIDIA API catalog. These models can understand information across a wide range of languages and formats, such as documents, to deliver accurate, context-aware results at massive scale.

With NeMo Retriever, businesses can now:

  • Extract knowledge from large and diverse datasets for additional context to deliver more accurate responses.
  • Seamlessly connect generative AI to enterprise data in most major global languages to expand user audiences.
  • Deliver actionable intelligence at greater scale with 35x improved data storage efficiency through new techniques such as long context support and dynamic embedding sizing.

New NeMo Retriever microservices reduce storage volume needs by 35x, enabling enterprises to process more information at once and fit large knowledge bases on a single server. This makes AI solutions more accessible, cost-effective and easier to scale across organizations.

Leading NVIDIA partners like DataStax, Cohesity, Cloudera, Nutanix, SAP, VAST Data and WEKA are already adopting these microservices to help organizations across industries securely connect custom models to diverse and large data sources. By using retrieval-augmented generation (RAG) techniques, NeMo Retriever enables AI systems to access richer, more relevant information and effectively bridge linguistic and contextual divides.

Wikidata Speeds Data Processing From 30 Days to Under Three Days 

In partnership with DataStax, Wikimedia has implemented NeMo Retriever to vector-embed the content of Wikipedia, serving billions of users. Vector embedding — or “vectorizing” —  is a process that transforms data into a format that AI can process and understand to extract insights and drive intelligent decision-making.

Wikimedia used the NeMo Retriever embedding and reranking NIM microservices to vectorize over 10 million Wikidata entries into AI-ready formats in under three days, a process that used to take 30 days. That 10x speedup enables scalable, multilingual access to one of the world’s largest open-source knowledge graphs.

This groundbreaking project ensures real-time updates for hundreds of thousands of entries that are being edited daily by thousands of contributors, enhancing global accessibility for developers and users alike. With Astra DB’s serverless model and NVIDIA AI technologies, the DataStax offering delivers near-zero latency and exceptional scalability to support the dynamic demands of the Wikimedia community.

DataStax is using NVIDIA AI Blueprints and integrating the NVIDIA NeMo Customizer, Curator, Evaluator and Guardrails microservices into the LangFlow AI code builder to enable the developer ecosystem to optimize AI models and pipelines for their unique use cases and help enterprises scale their AI applications.

Language-Inclusive AI Drives Global Business Impact

NeMo Retriever helps global enterprises overcome linguistic and contextual barriers and unlock the potential of their data. By deploying robust AI solutions, businesses can achieve accurate, scalable and high-impact results.

NVIDIA’s platform and consulting partners play a critical role in ensuring enterprises can efficiently adopt and integrate generative AI capabilities, such as the new multilingual NeMo Retriever microservices. These partners help align AI solutions to an organization’s unique needs and resources, making generative AI more accessible and effective. They include:

  • Cloudera plans to expand the integration of NVIDIA AI in the Cloudera AI Inference Service. Currently embedded with NVIDIA NIM, Cloudera AI Inference will include NVIDIA NeMo Retriever to improve the speed and quality of insights for multilingual use cases.
  • Cohesity introduced the industry’s first generative AI-powered conversational search assistant that uses backup data to deliver insightful responses. It uses the NVIDIA NeMo Retriever reranking microservice to improve retrieval accuracy and significantly enhance the speed and quality of insights for various applications.
  • SAP is using the grounding capabilities of NeMo Retriever to add context to its Joule copilot Q&A feature and information retrieved from custom documents.
  • VAST Data is deploying NeMo Retriever microservices on the VAST Data InsightEngine with NVIDIA to make new data instantly available for analysis. This accelerates the identification of business insights by capturing and organizing real-time information for AI-powered decisions.
  • WEKA is integrating its WEKA AI RAG Reference Platform (WARRP) architecture with NVIDIA NIM and NeMo Retriever into its low-latency data platform to deliver scalable, multimodal AI solutions, processing hundreds of thousands of tokens per second.

Breaking Language Barriers With Multilingual Information Retrieval

Multilingual information retrieval is vital for enterprise AI to meet real-world demands. NeMo Retriever supports efficient and accurate text retrieval across multiple languages and cross-lingual datasets. It’s designed for enterprise use cases such as search, question-answering, summarization and recommendation systems.

Additionally, it addresses a significant challenge in enterprise AI — handling large volumes of long documents. With long-context support, the new microservices can process lengthy contracts or detailed medical records while maintaining accuracy and consistency over extended interactions.

These capabilities help enterprises use their data more effectively, providing precise, reliable results for employees, customers and users while optimizing resources for scalability. Advanced multilingual retrieval tools like NeMo Retriever can make AI systems more adaptable, accessible and impactful in a globalized world.

Availability

Developers can access the multilingual NeMo Retriever microservices, and other NIM microservices for information retrieval, through the NVIDIA API catalog, or a no-cost, 90-day NVIDIA AI Enterprise developer license.
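
As a quick way to try the microservices from the API catalog, the sketch below calls a NeMo Retriever embedding NIM through its OpenAI-compatible endpoint. The model ID and the NeMo Retriever-specific request fields are assumptions to verify against the model card, and NVIDIA_API_KEY comes from your API catalog account.

import os
from openai import OpenAI

client = OpenAI(
    base_url='https://integrate.api.nvidia.com/v1',
    api_key=os.environ['NVIDIA_API_KEY'],
)

response = client.embeddings.create(
    model='nvidia/llama-3.2-nv-embedqa-1b-v2',   # multilingual embedding NIM; confirm the exact ID in the catalog
    input=[
        '¿Cuál es la política de equipaje de mano?',
        'What is the carry-on baggage policy?',
    ],
    # NeMo Retriever embedding NIMs distinguish query and passage embeddings
    extra_body={'input_type': 'query', 'truncate': 'NONE'},
)

print(len(response.data[0].embedding))   # embedding dimensionality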

Learn more about the new NeMo Retriever microservices and how to use them to build efficient information retrieval systems.

Read More