Automate actions across enterprise applications using Amazon Q Business plugins

Amazon Q Business is a generative AI-powered assistant that enhances employee productivity by solving problems, generating content, and providing insights across enterprise data sources. Beyond searching indexed third-party services, employees need access to dynamic, near real-time data such as stock prices, vacation balances, and location tracking, which is made possible through Amazon Q Business plugins. Furthermore, Amazon Q Business plugins enable employees to take direct actions within multiple enterprise applications—such as upgrading service ticket priorities—through a single Amazon Q Business interface, eliminating the need to switch between different systems and saving valuable time.

In this post, we explore how Amazon Q Business plugins enable seamless integration with enterprise applications through both built-in and custom plugins. We dive into configuring built-in plugins such as Salesforce, creating custom plugins for specific business needs, and exploring real-world use cases that show how plugins can streamline employee workflows across multiple applications.

Plugins enable Amazon Q Business users to use natural language to access non-indexed data (for example, available calendar slots, stock prices, and PTO balance) and take actions (for example, book a meeting or submit PTO) using third-party services such as Jira, ServiceNow, Salesforce, Fidelity, Vanguard, ADP, Workday, and Google Calendar. This provides a more straightforward and quicker experience for users, who no longer need to use multiple applications to complete tasks.

Solution overview

The following figure illustrates a sample architecture using Amazon Q Business plugins.

Amazon Q Business can connect to enterprise applications using over 50 connectors and over 10 plugins. Administrators can use connectors to pre-index the content from enterprise sources into Amazon Q Business to be used by end-users, whereas plugins can be configured to retrieve information and perform actions in real time on enterprise applications. There are two types of plugins:

  • Built-in plugins – These are available by default in Amazon Q Business. Built-in plugins carry out specific actions in an enterprise application. At the time of writing, we support predefined operations on PagerDuty Advance, Jira Cloud, ServiceNow, Zendesk Suite, Microsoft Teams, Atlassian Confluence, Smartsheet, Salesforce, Microsoft Exchange, Asana, and Google Calendar.
  • Custom plugins – These are created by administrators to interact with specific third-party services and the API endpoints. Administrators have flexibility in defining the behavior and actions carried out by custom plugins.

In the following sections, we discuss the capabilities of built-in plugins and custom plugins, with examples to create each type of plugin.

Built-in plugins

Amazon Q Business supports more than 50 actions in applications, including:

  • PagerDuty Advance, ServiceNow, and Zendesk Suite for ticketing and incident management
  • Atlassian Confluence, Jira Cloud, and Smartsheet for project management
  • Salesforce for customer relationship management (CRM)
  • Microsoft Exchange and Teams for communication
  • Asana and Google Calendar for productivity

The following list provides a complete overview of the Amazon Q actions available for each application.

  • Ticketing and incident management
    • PagerDuty Advance: Get incidents, Similar incidents, Root cause incident, Find recent changes, Who is on-call, Status update on incident, Customer impact, Update incident
    • ServiceNow: Create incident, Read incident, Update incident, Delete incident, Read change request, Create change request, Update change request, Delete change request
    • Zendesk Suite: Search content, Get ticket, Create ticket, Update ticket
  • Project management
    • Atlassian Confluence: Search pages
    • Jira Cloud: Read issue, Create issue, Search issue, Change issue status, Delete issue, Read sprint, Move issue to sprint, Create sprint, Delete sprint
    • Smartsheet: Search sheets, Read sheet, List reports, Get report
  • Customer relationship management (CRM)
    • Salesforce: Get account list, Get case, Create case, Delete case, Update case, Get opportunities, Get specific opportunity, Create opportunity, Update opportunity, Delete opportunity, Fetch specific contact, List contacts
  • Communication
    • Microsoft Exchange: Get events from calendar, Get email
    • Microsoft Teams: Send private message, Send channel message (public or private)
  • Productivity
    • Asana: Create a task, Update a task
    • Google Calendar: Find events, List calendar

Built-in plugin example: Configure the Salesforce built-in plugin with Amazon Q Business

Salesforce is a CRM tool for managing customer interactions. If you’re a Salesforce user, you can activate the Amazon Q Business plugin for Salesforce to allow your users to perform the following actions from within their web experience chat:

  • Managing cases (create, delete, update, and get)
  • Retrieving account lists
  • Handling opportunities (create, update, delete, get, and fetch specific)
  • Fetching specific contacts

To set up this plugin, you need configuration details from your Salesforce instance to connect Amazon Q Business with Salesforce. For more information, see Prerequisites.

After carrying out the prerequisites in Salesforce and capturing configuration details, you need to configure them on the Amazon Q Business console.

To configure the plugin, complete the following steps:

  1. On the Amazon Q Business console, choose Applications in the navigation pane.
  2. Select your application and on the Actions menu, choose Plugins.
  3. Choose Add plugin.
  4. Under Add plugin, provide the following information:
    • Choose Salesforce as your plugin.
    • For Plugin name, enter a name for your Amazon Q plugin.
    • For Domain URL, enter your Salesforce domain URL. For example, https://yourInstance.my.salesforce.com/services/data/v60.0.
  5. Under OAuth 2.0 authentication, for AWS Secrets Manager secret, select Create and add a new secret or Use an existing one. (For this example, we create a new AWS Secrets Manager secret.)
  6. In the Create new AWS Secrets Manager secret pop-up, enter the following information:
    1. For Secret name, enter a name for your secret.
    2. For Client ID, enter the client ID generated when you created your OAuth 2.0 application in Salesforce.
    3. For Client secret, enter the client secret generated when you created your OAuth 2.0 application in Salesforce.
    4. For Redirect URL, enter the URL to which the user needs to be redirected after authentication. If your deployed web URL is <q-endpoint>, use <q-endpoint>/oauth/callback. Amazon Q Business will handle OAuth tokens in this URL. This callback URL needs to be allowlisted in your third-party application.
    5. Choose Create.
  7. For Access token URL, enter https://login.salesforce.com/services/oauth2/token (Salesforce OAuth applications).
  8. For Authorization URL, enter https://login.salesforce.com/services/oauth2/authorize (Salesforce OAuth applications).
  9. Under Service access, select Create and add a new service role or Use an existing service role. Make sure that your service role has the necessary permissions.
  10. Under Tags, you can add optional tags to track your plugin.
  11. Choose Add.

You have successfully added the Salesforce built-in plugin to be used by users. Example usage of this plugin is shown in the end-to-end use case later in this post.

Custom plugins

If an action isn’t available through built-in plugins, then you can build a custom plugin and add it to your Amazon Q Business plugins. With custom plugins, you can integrate Amazon Q with third-party applications for a variety of different use cases. After a custom plugin is enabled, users can use natural language to query data (such as stock prices or their vacation balance) and take actions (such as submitting vacation time or updating a record).

Creating and using custom plugins requires the following high-level steps:

  1. Configure authentication and network information for the third-party application to interact with Amazon Q Business.
  2. Create or edit an OpenAPI schema outlining the different API operations that you want to enable for your custom plugin. You can configure up to eight API operations per custom plugin.
  3. After the custom plugin is deployed, Amazon Q Business will dynamically determine the appropriate APIs to call to accomplish a user-requested task. To maximize accuracy, review the best practices for configuring OpenAPI schema definitions for custom plugins.

Custom plugin example: Configure the HR Time Off custom plugin with Amazon Q Business

The HR Time Off custom plugin is designed to help employees manage their time off requests through Amazon Q Business. An employee can use this custom plugin to perform the following actions directly from an Amazon Q Business web experience chat:

  • Check available time off balance
  • Submit time off requests

The following figure shows the architecture of this plugin.

This integration allows employees to manage their time off requests seamlessly in Amazon Q Business without having to switch between different applications, improving productivity and user experience.

For an AWS CloudFormation template and code samples to deploy an HR Leave Management System application along with the Amazon Q Business plugin, refer to the following GitHub repo.
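
To give a sense of what such a schema contains, the following is a minimal OpenAPI 3.0 sketch covering the two operations described above. The server URL, paths, and parameter names are illustrative placeholders only; refer to the GitHub repo for the actual schema used by the sample application.

openapi: 3.0.0
info:
  title: HR Time Off API
  version: 1.0.0
servers:
  - url: https://example.com/hr # placeholder endpoint
paths:
  /timeoff/balance:
    get:
      operationId: getTimeOffBalance
      description: Returns the available time off balance for the signed-in employee.
      responses:
        "200":
          description: The remaining time off balance in days.
  /timeoff/requests:
    post:
      operationId: submitTimeOffRequest
      description: Submits a time off request for the signed-in employee.
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [startDate, endDate]
              properties:
                startDate:
                  type: string
                  format: date
                endDate:
                  type: string
                  format: date
      responses:
        "200":
          description: Confirmation of the submitted request.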

To configure Amazon Q Business with the API details, complete the following steps:

  1. On the Amazon Q Business console, in the navigation pane, choose Applications.
  2. Select your application from the list of applications.
  3. Choose Enhancements, and then choose Plugins.
  4. Choose Add plugin.
  5. Under Add plugin, choose Custom plugin.
  6. Under Name and description, for Plugin name, enter a name for your Amazon Q plugin. The name can include hyphens (-) but not spaces and can have a maximum of 1,000 alphanumeric characters.
  7. Under API schema, for API schema source, select one of the following options:
    • Select Select from Amazon S3 to select an existing API schema from an Amazon Simple Storage Service (Amazon S3) bucket. Your API schema must have an API description, structure, and parameters for your custom plugin. Then, enter the Amazon S3 URL to your API schema.
    • Select Define with in-line OpenAPI schema editor to write a custom plugin API schema in the inline OpenAPI schema editor in the Amazon Q console. A sample schema appears that you can edit. Then, you can choose to do the following:
      • Select the format for the schema: JSON or YAML.
      • To import an existing schema from Amazon S3 to edit, choose Import schema, provide the Amazon S3 URL, and choose Import.
      • To restore the schema to the original sample schema, choose Reset and then confirm the message that appears by choosing Reset.

  8. Under Authentication, select either Authentication required or No authentication required.
  9. If no authentication is required, there is no further action needed. If authentication is required, choose Create and add a new secret or Use an existing one. (For this post, we create a new secret.)
  10. In the Create an AWS Secrets Manager secret pop-up, provide the following information:
    • For Secret name, enter a name for your Secrets Manager secret.
    • For Client ID, enter the client ID you copied from your third-party application.
    • For Client secret, enter the client secret you copied from your third-party application.
    • For OAuth callback URL, enter the URL to which the user needs to be redirected after authentication. If your deployed web URL is <q-endpoint>, use <q-endpoint>/oauth/callback. Amazon Q Business will handle OAuth tokens in this URL. This callback URL needs to be allowlisted in your third-party application.
    • Choose Create.
  11. Under Choose a method to authorize Amazon Q Business, select Create and add a new service role or Use an existing service role. Make sure that your service role has the necessary permissions. The console will generate a Service role name.
  12. Under Tags, you can add optional tags to track your plugin.
  13. Choose Add to add your plugin.

You have successfully added the HR Time Off custom plugin to be used by users. Example usage of this plugin is shown in the end-to-end use case later in this post.

End-to-end use cases using built-in and custom plugins

Sarah, a Customer Success Manager, demonstrates the seamless use of multiple applications through Amazon Q Business. She uses the Salesforce built-in plugin to check high-value opportunities and create cases, the ServiceNow built-in plugin to manage a ticket for an email synchronization issue on her laptop, and a custom HR plugin to check her PTO balance and submit time off requests.

Overview of the Amazon Q Business setup

To enable Sarah’s seamless experience across multiple applications, an Amazon Q Business administrator needs to implement a comprehensive configuration that combines both built-in and custom plugins. This enterprise-wide setup consists of:

  • UI integration
    1. Implement the Amazon Q Business chat interface
    2. Configure user interaction endpoints
  • Built-in plugin setup
    1. Integrate ServiceNow for IT service management and incident handling
    2. Configure Salesforce plugin for CRM operations and case handling
  • Custom plugin implementation
    1. Set up the HR Time Off plugin for employee leave management and PTO balance inquiries
    2. Configure endpoints and authentication mechanisms
  • Data source integration
    1. Configure an Amazon S3 connector for ingesting IT documentation
    2. Set up secure access to the enterprise knowledge base

This integrated setup, shown in the following figure, enables employees to interact with multiple enterprise systems through a single, conversational interface, significantly improving workflow efficiency and user experience.

The following screenshot shows all the plugins available to end users.

In the following sections, we explore the end-to-end user flow for this use case.

Salesforce integration (built-in plugin)

Sarah selects the Salesforce built-in plugin from the Amazon Q Business Chat UI and asks Amazon Q to provide details about high-value opportunities, as shown in the following screenshots.


During the first use of the Salesforce plugin, Amazon Q Business will authenticate the user through Salesforce’s login interface, as shown in the following screenshot. For users who have already authenticated through enterprise single sign-on (SSO) or directly using their Salesforce login, only an API access approval will be requested.

After authentication and API access approval by the user, the plugin returns the results in real time from Salesforce, as shown in the following screenshot.

Later, Sarah creates a new case in Salesforce to follow up with a high-value client, as shown in the following screenshot.

A case is created successfully in Salesforce, as shown in the following screenshot.

ServiceNow ticket management integration (enterprise indexed content and built-in plugin)

Sarah encounters an email synchronization issue on her laptop. She searches Amazon Q Business for guidance on troubleshooting the issue. Because Amazon Q Business has already indexed IT Helpdesk documents from Amazon S3, it returns troubleshooting steps, as shown in the following screenshot.


Sarah couldn’t resolve the issue after following the troubleshooting documentation. She chooses the ServiceNow plugin in the Chat UI and creates a ServiceNow ticket for further analysis, as shown in the following screenshot.

During the first usage of the ServiceNow plugin, Amazon Q Business will authenticate the user through ServiceNow’s login interface, as shown in the following screenshot.

For users who are already authenticated through enterprise SSO or directly using their ServiceNow login, only an API access approval is required, as shown in the following screenshot.

As shown in the following screenshot, an incident is successfully created in ServiceNow.

An incident is created successfully in ServiceNow, as shown below. This shows the creation capability of the built-in plugin.

She updates the ticket priority to high for faster resolution, as shown below. This shows the update capability of the built-in plugin.


The impact and urgency of the incident are updated to high in ServiceNow in real time, as shown in the following figure.

HR system integration (custom plugin)

Sarah needs to plan her upcoming vacation. She uses Amazon Q to check her available PTO balance through the HR custom plugin, as shown in the following screenshot. This demonstrates the real-time secure retrieval capability of custom plugins.

She submits a time off request directly through Amazon Q, as shown in the following screenshots.


Sarah’s experience demonstrates how Amazon Q Business plugins enable seamless real-time interaction across multiple enterprise applications—from managing Salesforce opportunities and ServiceNow tickets to submitting time off requests—all through a single conversational interface, eliminating application switching and improving productivity.

Clean up

To clean up, delete the Amazon Q application you created.

Conclusion

Amazon Q Business actions through plugins represent a significant advancement in streamlining enterprise workflows and enhancing employee productivity. As demonstrated in this post, these advancements can be seen across three key areas:

  • Unified interface
    • Provides employees with a single, conversational interface
    • Enables seamless interaction across multiple enterprise applications
    • Eliminates the need for constant application switching
  • Knowledge integration
    • Combines enterprise knowledge from Amazon Q Business connectors with actionable plugins
    • Enables employees to access documentation and take immediate action
  • Workflow enhancement
    • Simplifies complex tasks through natural language interaction
    • Reduces time spent switching between applications
    • Improves overall employee productivity

What enterprise workflows in your organization could benefit from streamlined automation through Amazon Q Business plugins? Whether it’s integrating existing enterprise applications through built-in plugins or creating custom plugins for your proprietary systems, Amazon Q Business provides the flexibility to enhance employee productivity across your organization. Try implementing plugins in your Amazon Q Business environment today, and share your feedback and use cases in the comments.


About the Authors

Abhishek Maligehalli Shivalingaiah is a Senior Generative AI Solutions Architect at AWS, specializing in Amazon Q Business. With a deep passion for using agentic AI frameworks to solve complex business challenges, he brings nearly a decade of expertise in developing data and AI solutions that deliver tangible value for enterprises. Beyond his professional endeavors, Abhishek is an artist who finds joy in creating portraits of family and friends, expressing his creativity through various artistic mediums.

Marcel Pividal is a Senior AI Services Solutions Architect in the World-Wide Specialist Organization, bringing over 22 years of expertise in transforming complex business challenges into innovative technological solutions. As a thought leader in generative AI implementation, he specializes in developing secure, compliant AI architectures for enterprise-scale deployments across multiple industries.

Sachi Sharma is a Senior Software Engineer at Amazon Q Business, specializing in generative and agentic AI. Beyond her professional pursuits, Sachi is an avid reader and coffee lover, and enjoys driving, particularly long, scenic drives.

Manjukumar Patil is a Software Engineer at Amazon Q Business with a passion for designing and scaling AI-driven distributed systems. In his free time, he loves hiking and exploring national parks.

James Gung is a Senior Applied Scientist at AWS whose research spans diverse topics related to conversational AI and agentive systems. Outside of work, he enjoys spending time with his family, traveling, playing violin, and bouldering.

Najih is a Senior Software Engineer at Amazon Q Business. He is passionate about designing and scaling AI-based distributed systems, and excels at bringing innovative solutions to complex challenges. Outside of work, he enjoys lifting and martial arts, particularly MMA.

Turn Down the Noise: CUDA-Q Enables Industry-First Quantum Computing Demo With Logical Qubits

Quantum computing has the potential to transform industries ranging from drug discovery to logistics, but a huge barrier standing between today’s quantum devices and useful applications is noise. These disturbances, introduced by environmental interactions and imperfect hardware, mean that today’s qubits can only perform hundreds of operations before quantum computations irretrievably deteriorate. 

Though seemingly inevitable, noise in quantum hardware can be tackled by so-called logical qubits – collections of tens, hundreds or even thousands of actual physical qubits that allow the correction of noise-induced errors. Logical qubits are the holy grail of quantum computing, and quantum hardware builder Infleqtion today published groundbreaking work that used the NVIDIA CUDA-Q platform to both design and demonstrate an experiment with two of them.  

These logical qubits were used to perform a small-scale demonstration of the so-called single-impurity Anderson model, a high-accuracy approach necessary for many important materials science applications. 

This constitutes the first time that a demonstration of a materials science quantum algorithm has been performed on logical qubits. The creation of just a single logical qubit is extremely challenging. Infleqtion was able to achieve such a feat thanks to accurate modeling of its quantum computer using CUDA-Q’s unique GPU-accelerated simulation capabilities.  

Having developed and tested its entire experiment within CUDA-Q’s simulators, with only trivial changes, Infleqtion could then use CUDA-Q to orchestrate the experiment using the actual physical qubits within its Sqale neutral atom quantum processor. 

This work sets the stage for quantum computing’s move toward large-scale, error-corrected systems.  

Many scaling challenges still stand between today’s quantum devices and large systems of logical qubits, which will only be solved by integrating quantum hardware with AI supercomputers to form accelerated quantum supercomputers.  

NVIDIA continues to work with partners like Infleqtion to enable this breakthrough research needed to make accelerated quantum supercomputing a reality. 

Learn more about NVIDIA’s quantum computing platforms. 

Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models

*Equal Contributors
Large language models (LLMs) are increasingly being adapted to achieve task-specificity for deployment in real-world decision systems. Several previous works have investigated the bias transfer hypothesis (BTH) by studying the effect of the fine-tuning adaptation strategy on model fairness, finding that fairness in pre-trained masked language models has limited effect on the fairness of models when adapted using fine-tuning. In this work, we expand the study of BTH to causal models under prompt adaptations, as prompting is an accessible and compute-efficient way to deploy…Apple Machine Learning Research

Accelerating ML experimentation with enhanced security: AWS PrivateLink support for Amazon SageMaker with MLflow

With access to a wide range of generative AI foundation models (FM) and the ability to build and train their own machine learning (ML) models in Amazon SageMaker, users want a seamless and secure way to experiment with and select the models that deliver the most value for their business. In the initial stages of an ML project, data scientists collaborate closely, sharing experimental results to address business challenges. However, keeping track of numerous experiments, their parameters, metrics, and results can be difficult, especially when working on complex projects simultaneously. MLflow, a popular open-source tool, helps data scientists organize, track, and analyze ML and generative AI experiments, making it easier to reproduce and compare results.

SageMaker is a comprehensive, fully managed ML service designed to provide data scientists and ML engineers with the tools they need to handle the entire ML workflow. Amazon SageMaker with MLflow is a capability in SageMaker that enables users to create, manage, analyze, and compare their ML experiments seamlessly. It simplifies the often complex and time-consuming tasks involved in setting up and managing an MLflow environment, allowing ML administrators to quickly establish secure and scalable MLflow environments on AWS. See Fully managed MLFlow on Amazon SageMaker for more details.

Enhanced security: AWS VPC and AWS PrivateLink

When working with SageMaker, you can decide the level of internet access to provide to your users. For example, you can give users access permission to download popular packages and customize the development environment. However, this can also introduce potential risks of unauthorized access to your data. To mitigate these risks, you can further restrict which traffic can access the internet by launching your ML environment in an Amazon Virtual Private Cloud (Amazon VPC). With an Amazon VPC, you can control the network access and internet connectivity of your SageMaker environment, or even remove direct internet access to add another layer of security. See Connect to SageMaker through a VPC interface endpoint to understand the implications of running SageMaker within a VPC and the differences when using network isolation.

SageMaker with MLflow now supports AWS PrivateLink, which enables you to transfer critical data from your VPC to MLflow Tracking Servers through a VPC endpoint. This capability enhances the protection of sensitive information by making sure that data sent to the MLflow Tracking Servers is transferred within the AWS network, avoiding exposure to the public internet. This capability is available in all AWS Regions where SageMaker is currently available, excluding China Regions and GovCloud (US) Regions. To learn more, see Connect to an MLflow tracking server through an Interface VPC Endpoint.

In this blogpost, we demonstrate a use case to set up a SageMaker environment in a private VPC (without internet access), while using MLflow capabilities to accelerate ML experimentation.

Solution overview

You can find the reference code for this sample in GitHub. The high-level steps are as follows:

  1. Deploy infrastructure with the AWS Cloud Development Kit (AWS CDK), including a VPC without internet access, VPC endpoints, an AWS CodeArtifact domain and repository, a SageMaker domain, an MLflow tracking server, and an S3 bucket with a sample notebook.
  2. Run ML experimentation with MLflow using the @remote decorator from the open-source SageMaker Python SDK.

The overall solution architecture is shown in the following figure.

solution architecture

For your reference, this blog post demonstrates a solution to create a VPC with no internet connection using an AWS CloudFormation template.

Prerequisites

You need an AWS account with an AWS Identity and Access Management (IAM) role with permissions to manage resources created as part of the solution. For details, see Creating an AWS account.

Deploy infrastructure with AWS CDK

The first step is to create the infrastructure using this CDK stack. You can follow the deployment instructions from the README.

Let’s first have a closer look at the CDK stack itself.

It defines multiple VPC endpoints, including the MLflow endpoint as shown in the following sample:

vpc.add_interface_endpoint(
    "mlflow-experiments",
    service=ec2.InterfaceVpcEndpointAwsService.SAGEMAKER_EXPERIMENTS,
    private_dns_enabled=True,
    subnets=ec2.SubnetSelection(subnets=subnets),
    security_groups=[studio_security_group]
)

We also restrict the SageMaker execution IAM role so that SageMaker MLflow can be used only when you're in the right VPC. Users outside the VPC could otherwise connect to SageMaker MLflow through the VPC endpoint, so the following inline policy allows SageMaker MLflow actions only when the request originates from your VPC.

studio_execution_role.attach_inline_policy(
    iam.Policy(self, "mlflow-policy",
        statements=[
            iam.PolicyStatement(
                effect=iam.Effect.ALLOW,
                actions=["sagemaker-mlflow:*"],
                resources=["*"],
                conditions={"StringEquals": {"aws:SourceVpc": vpc.vpc_id } }
            )
        ]
    )
)
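
You can further restrict the VPC endpoint for MLflow by attaching a VPC endpoint policy. The following is a minimal sketch of what that could look like in the same CDK stack, assuming the interface endpoint created earlier is assigned to a variable (called mlflow_endpoint here); adjust the allowed principals and actions to your own requirements.

# Sketch: attach an endpoint policy so that only principals from this account
# can call SageMaker MLflow actions through the MLflow VPC endpoint.
# mlflow_endpoint is assumed to hold the InterfaceVpcEndpoint returned by
# vpc.add_interface_endpoint(...) shown earlier.
mlflow_endpoint.add_to_policy(
    iam.PolicyStatement(
        effect=iam.Effect.ALLOW,
        principals=[iam.AccountPrincipal(self.account)],
        actions=["sagemaker-mlflow:*"],
        resources=["*"],
    )
)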

After successful deployment, you should be able to see the new VPC in the AWS Management Console for Amazon VPC without internet access, as shown in the following screenshot.

vpc

A CodeArtifact domain and a CodeArtifact repository with external connection to PyPI should also be created, as shown in the following figure, so that SageMaker can use it to download necessary packages without internet access. You can verify the creation of the domain and the repository by going to the CodeArtifact console. Choose “Repositories” under “Artifacts” from the navigation pane and you will see the repository “pip”.

CodeArtifact

ML experimentation with MLflow

Setup

After the CDK stack creation, a new SageMaker domain with a user profile should also be created. Launch Amazon SageMaker Studio and create a JupyterLab Space. In the JupyterLab Space, choose an instance type of ml.t3.medium, and select an image with SageMaker Distribution 2.1.0.

To check that the SageMaker environment has no internet connection, open the JupyterLab space and check the internet connection by running the curl command in a terminal.

no internet access
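
For example, a simple request such as the following should fail to connect or time out rather than succeed (the target URL is arbitrary):

curl --head --connect-timeout 5 https://aws.amazon.com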

SageMaker with MLflow now supports MLflow version 2.16.2 to accelerate generative AI and ML workflows from experimentation to production. An MLflow 2.16.2 tracking server is created along with the CDK stack.

You can find the MLflow tracking server Amazon Resource Name (ARN) either from the CDK output or from the SageMaker Studio UI by choosing the "MLflow" icon, as shown in the following figure. Choose the "copy" button next to "mlflow-server" to copy the MLflow tracking server ARN.

mlflow-tracking-server
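
With the ARN copied, you point the MLflow client at the tracking server by passing the ARN to mlflow.set_tracking_uri(), which is what the notebook does later inside the preprocessing and training functions. A minimal sketch follows; the ARN is a placeholder for your copied value, and it assumes the sagemaker-mlflow plugin from the requirements file is installed so that MLflow can authenticate against the SageMaker-managed tracking server.

import mlflow

# Placeholder ARN: replace with the value copied from the CDK output or the Studio UI.
tracking_server_arn = "arn:aws:sagemaker:<region>:<account-id>:mlflow-tracking-server/mlflow-server"

mlflow.set_tracking_uri(tracking_server_arn)
mlflow.set_experiment("connectivity-check")  # hypothetical experiment name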

As an example dataset to train the model, download the reference dataset from the public UC Irvine ML repository to your local PC, and name it predictive_maintenance_raw_data_header.csv.

Upload the reference dataset from your local PC to your JupyterLab Space as shown in the following figure.

JupyterLab

To test your private connectivity to the MLflow tracking server, you can download the sample notebook that was automatically uploaded during the creation of the stack to an S3 bucket in your AWS account. You can find the S3 bucket name in the CDK output, as shown in the following figure.

s3 bucket arn

From the JupyterLab app terminal, run the following command:

aws s3 cp --recursive <YOUR-BUCKET-URI> ./

You can now open the private-mlflow.ipynb notebook.

In the first cell, fetch credentials for the CodeArtifact PyPI repository so that SageMaker can use pip from the private AWS CodeArtifact repository. The credentials will expire in 12 hours. Make sure to log on again when they expire.

%%bash
AWS_ACCOUNT=$(aws sts get-caller-identity --output text --query 'Account')
aws codeartifact login --tool pip --repository pip --domain code-artifact-domain --domain-owner ${AWS_ACCOUNT} --region ${AWS_DEFAULT_REGION}

Experimentation

After setup, start the experimentation. The scenario uses the XGBoost algorithm to train a binary classification model. Both the data processing job and the model training job use the @remote decorator so that the jobs run in the SageMaker-associated private subnets and security group from your private VPC.

In this case, the @remote decorator looks up the parameter values from the SageMaker configuration file (config.yaml). These parameters are used for data processing and training jobs. We define the SageMaker-associated private subnets and security group in the configuration file. For the full list of supported configurations for the @remote decorator, see Configuration file in the SageMaker Developer Guide.

Note that we specify in PreExecutionCommands the aws codeartifact login command to point SageMaker to the private CodeArtifact repository. This is needed to make sure that the dependencies can be installed at runtime. Alternatively, you can pass a reference to a container in your Amazon ECR through ImageUri, which contains all installed dependencies.

We specify the security group and subnets information in VpcConfig.

config_yaml = f"""
SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      TelemetryOptOut: true
      RemoteFunction:
        # role arn is not required if in SageMaker Notebook instance or SageMaker Studio
        # Uncomment the following line and replace with the right execution role if in a local IDE
        # RoleArn: <replace the role arn here>
        # ImageUri: <replace with your image if you want to avoid installing dependencies at run time>
        S3RootUri: s3://{bucket_prefix}
        InstanceType: ml.m5.xlarge
        Dependencies: ./requirements.txt
        IncludeLocalWorkDir: true
        PreExecutionCommands:
        - "aws codeartifact login --tool pip --repository pip --domain code-artifact-domain --domain-owner {account_id} --region {region}"
        CustomFileFilter:
          IgnoreNamePatterns:
          - "data/*"
          - "models/*"
          - "*.ipynb"
          - "__pycache__"
        VpcConfig:
          SecurityGroupIds: 
          - {security_group_id}
          Subnets: 
          - {private_subnet_id_1}
          - {private_subnet_id_2}
"""

Here’s how you can set up an MLflow experiment for this example.

from time import gmtime, strftime

# Mlflow (replace these values with your own, if needed)
project_prefix = project_prefix
tracking_server_arn = mlflow_arn
experiment_name = f"{project_prefix}-sm-private-experiment"
run_name=f"run-{strftime('%d-%H-%M-%S', gmtime())}"

Data preprocessing

During data processing, we use the @remote decorator so that the parameters in config.yaml are applied to your preprocess function.

Note that MLflow tracking starts from the mlflow.start_run() API.

The mlflow.autolog() API can automatically log information such as metrics, parameters, and artifacts.

You can use the log_input() method to log a dataset to the MLflow artifact store.

@remote(keep_alive_period_in_seconds=3600, job_name_prefix=f"{project_prefix}-sm-private-preprocess")
def preprocess(df, df_source: str, experiment_name: str):
    
    mlflow.set_tracking_uri(tracking_server_arn)
    mlflow.set_experiment(experiment_name)    
    
    with mlflow.start_run(run_name=f"Preprocessing") as run:            
        mlflow.autolog()
        
        columns = ['Type', 'Air temperature [K]', 'Process temperature [K]', 'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]', 'Machine failure']
        cat_columns = ['Type']
        num_columns = ['Air temperature [K]', 'Process temperature [K]', 'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]']
        target_column = 'Machine failure'                    
        df = df[columns]

        mlflow.log_input(
            mlflow.data.from_pandas(df, df_source, targets=target_column),
            context="DataPreprocessing",
        )
        
        ...
        
        model_file_path="/opt/ml/model/sklearn_model.joblib"
        os.makedirs(os.path.dirname(model_file_path), exist_ok=True)
        joblib.dump(featurizer_model, model_file_path)

    return X_train, y_train, X_val, y_val, X_test, y_test, featurizer_model

Run the preprocessing job, then go to the MLflow UI (shown in the following figure) to see the tracked preprocessing job with the input dataset.

X_train, y_train, X_val, y_val, X_test, y_test, featurizer_model = preprocess(df=df, 
                                                                              df_source=input_data_path, 
                                                                              experiment_name=experiment_name)

You can open the MLflow UI from SageMaker Studio, as shown in the following figure. Choose "Experiments" from the navigation pane and select your experiment.

mlflow-UI

From the MLflow UI, you can see the processing job that just ran.

mlflow experiment

You can also see security details in the SageMaker Studio console in the corresponding training job as shown in the following figure.

training security

Model training

Similar to the data processing job, you can also use the @remote decorator with the training job.

Note that the log_metric() method sends your defined metrics to the MLflow tracking server.

@remote(keep_alive_period_in_seconds=3600, job_name_prefix=f"{project_prefix}-sm-private-train")
def train(X_train, y_train, X_val, y_val,
          eta=0.1, 
          max_depth=2, 
          gamma=0.0,
          min_child_weight=1,
          verbosity=0,
          objective='binary:logistic',
          eval_metric='auc',
          num_boost_round=5):     
    
    mlflow.set_tracking_uri(tracking_server_arn)
    mlflow.set_experiment(experiment_name)
    
    with mlflow.start_run(run_name=f"Training") as run:               
        mlflow.autolog()
             
        # Creating DMatrix(es)
        dtrain = xgboost.DMatrix(X_train, label=y_train)
        dval = xgboost.DMatrix(X_val, label=y_val)
        watchlist = [(dtrain, "train"), (dval, "validation")]
    
        print('')
        print (f'===Starting training with max_depth {max_depth}===')
        
        param_dist = {
            "max_depth": max_depth,
            "eta": eta,
            "gamma": gamma,
            "min_child_weight": min_child_weight,
            "verbosity": verbosity,
            "objective": objective,
            "eval_metric": eval_metric
        }        
    
        xgb = xgboost.train(
            params=param_dist,
            dtrain=dtrain,
            evals=watchlist,
            num_boost_round=num_boost_round)
    
        predictions = xgb.predict(dval)
    
        print ("Metrics for validation set")
        print('')
        print (pd.crosstab(index=y_val, columns=np.round(predictions),
                           rownames=['Actuals'], colnames=['Predictions'], margins=True))
        
        rounded_predict = np.round(predictions)
    
        val_accuracy = accuracy_score(y_val, rounded_predict)
        val_precision = precision_score(y_val, rounded_predict)
        val_recall = recall_score(y_val, rounded_predict)

        # Log additional metrics, next to the default ones logged automatically
        mlflow.log_metric("Accuracy Model A", val_accuracy * 100.0)
        mlflow.log_metric("Precision Model A", val_precision)
        mlflow.log_metric("Recall Model A", val_recall)
        
        from sklearn.metrics import roc_auc_score
    
        val_auc = roc_auc_score(y_val, predictions)
        
        mlflow.log_metric("Validation AUC A", val_auc)
    
        model_file_path="/opt/ml/model/xgboost_model.bin"
        os.makedirs(os.path.dirname(model_file_path), exist_ok=True)
        xgb.save_model(model_file_path)

    return xgb

Define hyperparameters and run the training job.

eta=0.3
max_depth=10

booster = train(X_train, y_train, X_val, y_val,
              eta=eta, 
              max_depth=max_depth)

In the MLflow UI, you can see the tracked metrics, as shown in the following figure. Under the "Experiments" tab, go to the "Training" run of your experiment; the metrics appear under the "Overview" tab.

mlflow training result

You can also view the metrics as graphs. Under the "Model metrics" tab, you can see the model performance metrics that were logged as part of the training job.

mlflow training metrics

With MLflow, you can log your dataset information alongside other key metrics, such as hyperparameters and model evaluation. Find more details in the blogpost LLM experimentation with MLFlow.

Clean up

To clean up, first delete all spaces and applications created within the SageMaker Studio domain. Then destroy the infrastructure you created by running the following command.

cdk destroy

Conclusion

SageMaker with MLflow allows ML practitioners to create, manage, analyze, and compare ML experiments on AWS. To enhance security, SageMaker with MLflow now supports AWS PrivateLink. All MLflow Tracking Server versions including 2.16.2 integrate seamlessly with this feature, enabling secure communication between your ML environments and AWS services without exposing data to the public internet.

For an extra layer of security, you can set up SageMaker Studio within your private VPC without Internet access and execute your ML experiments in this environment.

SageMaker with MLflow now supports MLflow 2.16.2. Setting up a fresh installation provides the best experience and full compatibility with the latest features.


About the Authors

Xiaoyu Xing is a Solutions Architect at AWS. She is driven by a profound passion for Artificial Intelligence (AI) and Machine Learning (ML). She strives to bridge the gap between these cutting-edge technologies and a broader audience, empowering individuals from diverse backgrounds to learn and leverage AI and ML with ease. She is helping customers to adopt AI and ML solutions on AWS in a secure and responsible way.

Paolo Di Francesco is a Senior Solutions Architect at Amazon Web Services (AWS). He holds a PhD in Telecommunications Engineering and has experience in software engineering. He is passionate about machine learning and is currently focusing on using his experience to help customers reach their goals on AWS, in particular in discussions around MLOps. Outside of work, he enjoys playing football and reading.

Tomer Shenhar is a Product Manager at AWS. He specializes in responsible AI, driven by a passion to develop ethically sound and transparent AI solutions.

Crowning Achievement: NVIDIA Research Model Enables Fast, Efficient Dynamic Scene Reconstruction

Content streaming and engagement are entering a new dimension with QUEEN, an AI model by NVIDIA Research and the University of Maryland that makes it possible to stream free-viewpoint video, which lets viewers experience a 3D scene from any angle.

QUEEN could be used to build immersive streaming applications that teach skills like cooking, put sports fans on the field to watch their favorite teams play from any angle, or bring an extra level of depth to video conferencing in the workplace. It could also be used in industrial environments to help teleoperate robots in a warehouse or a manufacturing plant.

The model will be presented at NeurIPS, the annual conference for AI research that begins Tuesday, Dec. 10, in Vancouver.

“To stream free-viewpoint videos in near real time, we must simultaneously reconstruct and compress the 3D scene,” said Shalini De Mello, director of research and a distinguished research scientist at NVIDIA. “QUEEN balances factors including compression rate, visual quality, encoding time and rendering time to create an optimized pipeline that sets a new standard for visual quality and streamability.”

Reduce, Reuse and Recycle for Efficient Streaming

Free-viewpoint videos are typically created using video footage captured from different camera angles, like a multicamera film studio setup, a set of security cameras in a warehouse or a system of videoconferencing cameras in an office.

Prior AI methods for generating free-viewpoint videos either took too much memory for livestreaming or sacrificed visual quality for smaller file sizes. QUEEN balances both to deliver high-quality visuals — even in dynamic scenes featuring sparks, flames or furry animals — that can be easily transmitted from a host server to a client’s device. It also renders visuals faster than previous methods, supporting streaming use cases.

In most real-world environments, many elements of a scene stay static. In a video, that means a large share of pixels don’t change from one frame to another. To save computation time, QUEEN tracks and reuses renders of these static regions — focusing instead on reconstructing the content that changes over time.

Using an NVIDIA Tensor Core GPU, the researchers evaluated QUEEN’s performance on several benchmarks and found the model outperformed state-of-the-art methods for online free-viewpoint video on a range of metrics. Given 2D videos of the same scene captured from different angles, it typically takes under five seconds of training time to render free-viewpoint videos at around 350 frames per second.

This combination of speed and visual quality can support media broadcasts of concerts and sports games by offering immersive virtual reality experiences or instant replays of key moments in a competition.

In warehouse settings, robot operators could use QUEEN to better gauge depth when maneuvering physical objects. And in a videoconferencing application — such as the 3D videoconferencing demo shown at SIGGRAPH and NVIDIA GTC — it could help presenters demonstrate tasks like cooking or origami while letting viewers pick the visual angle that best supports their learning.

The code for QUEEN will soon be released as open source and shared on the project page.

QUEEN is one of over 50 NVIDIA-authored NeurIPS posters and papers that feature groundbreaking AI research with potential applications in fields including simulation, robotics and healthcare.

Generative Adversarial Nets, the paper that first introduced GAN models, won the NeurIPS 2024 Test of Time Award. Cited more than 85,000 times, the paper was coauthored by Bing Xu, distinguished engineer at NVIDIA. Hear more from its lead author, Ian Goodfellow, research scientist at DeepMind, on the AI Podcast:

Learn more about NVIDIA Research at NeurIPS.

See the latest work from NVIDIA Research, which has hundreds of scientists and engineers worldwide, with teams focused on topics including AI, computer graphics, computer vision, self-driving cars and robotics.

Academic researchers working on large language models, simulation and modeling, edge AI and more can apply to the NVIDIA Academic Grant Program.

See notice regarding software product information.

vLLM Joins PyTorch Ecosystem: Easy, Fast, and Cheap LLM Serving for Everyone

vllm logo

We’re thrilled to announce that the vLLM project has become a PyTorch ecosystem project, and joined the PyTorch ecosystem family!

Running large language models (LLMs) is both resource-intensive and complex, especially as these models scale to hundreds of billions of parameters. That’s where vLLM comes in — a high-throughput, memory-efficient inference and serving engine designed for LLMs.

Originally built around the innovative PagedAttention algorithm, vLLM has grown into a comprehensive, state-of-the-art inference engine. A thriving community is also continuously adding new features and optimizations to vLLM, including pipeline parallelism, chunked prefill, speculative decoding, and disaggregated serving.

Since its release, vLLM has garnered significant attention, achieving over 31,000 GitHub stars—a testament to its popularity and thriving community. This milestone marks an exciting chapter for vLLM as we continue to empower developers and researchers with cutting-edge tools for efficient and scalable AI deployment. Welcome to the next era of LLM inference!

vLLM has always had a strong connection with the PyTorch project. It is deeply integrated into PyTorch, leveraging it as a unified interface to support a wide array of hardware backends. These include NVIDIA GPUs, AMD GPUs, Google Cloud TPUs, Intel GPUs, Intel CPUs, Intel Gaudi HPUs, and AWS Neuron, among others. This tight coupling with PyTorch ensures seamless compatibility and performance optimization across diverse hardware platforms.

Did you know you can experience the power of vLLM right from your phone? During this year’s Amazon Prime Day, vLLM played a crucial role in delivering lightning-fast responses to millions of users. Across three regions, over 80,000 Trainium and Inferentia chips powered an average of 3 million tokens per minute, all while maintaining a P99 latency of less than 1 second for the first response. That means when customers opened the Amazon app and chatted with Rufus, they were seamlessly interacting with vLLM in action!

vLLM also collaborates tightly with leading model vendors to ensure support for popular models. This includes tight integration with Meta LLAMA, Mistral, QWen, and DeepSeek models, plus many others. One particularly memorable milestone was the release of LLAMA 3.1 (405B). As the launching partner, vLLM was the first to enable running this very large model, showcasing vLLM’s capability to handle the most complex and resource-intensive language models.

To install vLLM, simply run:

pip install vllm

vLLM is designed for both researchers and production-grade serving.

To run vLLM as an OpenAI API compatible server, just use the Huggingface model ID:

vllm serve meta-llama/Llama-3.1-8B
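
Once the server is up, you can query it with any OpenAI-compatible client. For example, using the openai Python package (the server listens on port 8000 by default; adjust the base URL if you changed it, and the API key can be any placeholder string unless you configured one):

from openai import OpenAI

# Point the OpenAI client at the local vLLM server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="meta-llama/Llama-3.1-8B",
    prompt="The capital of France is",
    max_tokens=32,
)
print(completion.choices[0].text)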

To run vLLM as a simple function:

from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
   "Hello, my name is",
   "The president of the United States is",
   "The capital of France is",
   "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Create an LLM.
llm = LLM(model="meta-llama/Llama-3.1-8B")
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
   prompt = output.prompt
   generated_text = output.outputs[0].text
   print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Open-source innovation is part of vLLM’s DNA. Born out of a Berkeley academic project, it follows the legacy of other pioneering open-source initiatives such as BSD, which revolutionized operating systems in the 1980s. Other innovations from the same organization include Apache Spark and Ray, now the standard for big data and AI systems. In the Gen AI era, vLLM serves as a platform dedicated to democratizing AI inference.

The vLLM team remains steadfast in its mission to keep the project “of the community, by the community, and for the community.” Collaboration and inclusivity lie at the heart of everything we do.

If you have collaboration requests or inquiries, feel free to reach out at vllm-questions@lists.berkeley.edu. To join the active and growing vLLM community, explore our GitHub repository or connect with us on the vLLM Slack. Together, we can push the boundaries of AI innovation and make it accessible to all.

Momentum Approximation in Asynchronous Private Federated Learning

This paper was accepted for presentation at the International Workshop on Federated Foundation Models (FL@FM-NeurIPS’24), held in conjunction with NeurIPS 2024.
Asynchronous protocols have been shown to improve the scalability of federated learning (FL) with a massive number of clients. Meanwhile, momentum-based methods can achieve the best model quality in synchronous FL. However, naively applying momentum in asynchronous FL algorithms leads to slower convergence and degraded model performance. It is still unclear how to effectively combine these two techniques to achieve a win-win…Apple Machine Learning Research
Asynchronous protocols have been shown to improve the scalability of federated learning (FL) with a massive number of clients. Meanwhile, momentum-based methods can achieve the best model quality in synchronous FL. However, naively applying momentum in asynchronous FL algorithms leads to slower convergence and degraded model performance. It is still unclear how to effective combinie these two techniques together to achieve a win-win…Apple Machine Learning Research