Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection

Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training multimodal LLMs is challenging. To this end, we propose a Fusion Low Rank Adaptation (FLoRA) technique that efficiently adapts a pre-trained unimodal LLM to consume new, previously unseen modalities via low rank adaptation. For device-directed speech detection, using FLoRA, the multimodal LLM achieves 22% relative reduction in equal error rate (EER) over… (Apple Machine Learning Research)

Comparative Analysis of Personalized Voice Activity Detection Systems: Assessing Real-World Effectiveness

Voice activity detection (VAD) is a critical component in various applications such as speech recognition, speaker identification, and hands-free communication systems. With the increasing demand for personalized and context-aware technologies, the need for effective personalized VAD systems has become paramount. In this paper, we present a comparative analysis of Personalized Voice Activity Detection (PVAD) systems to assess their real-world effectiveness. We introduce a comprehensive approach to assess PVAD systems, incorporating various performance metrics such as frame-level and… (Apple Machine Learning Research)

Conformer-Based Speech Recognition on Extreme Edge-Computing Devices

This paper was accepted at the Industry Track at NAACL 2024.
With increasingly more powerful compute capabilities and resources in today’s devices, traditionally compute-intensive automatic speech recognition (ASR) has been moving from the cloud to devices to better protect user privacy. However, it is still challenging to implement on-device ASR on resource-constrained devices, such as smartphones, smart wearables, and other small home automation devices. In this paper, we propose a series of model architecture adaptions, neural network graph transformations, and numerical optimizations to… (Apple Machine Learning Research)

How Twilio used Amazon SageMaker MLOps pipelines with PrestoDB to enable frequent model retraining and optimized batch transform

This post is co-written with Shamik Ray, Srivyshnav K S, Jagmohan Dhiman and Soumya Kundu from Twilio.

Today’s leading companies trust Twilio’s Customer Engagement Platform (CEP) to build direct, personalized relationships with their customers everywhere in the world. Twilio enables companies to use communications and data to add intelligence and security to every step of the customer journey, from sales and marketing to growth and customer service, and many more engagement use cases in a flexible, programmatic way. Across 180 countries, millions of developers and hundreds of thousands of businesses use Twilio to create magical experiences for their customers. Being one of the largest AWS customers, Twilio engages with data and artificial intelligence and machine learning (AI/ML) services to run their daily workloads. This post outlines the steps AWS and Twilio took to migrate Twilio’s existing machine learning operations (MLOps), the implementation of training models, and running batch inferences to Amazon SageMaker.

ML models don’t operate in isolation. They must integrate into existing production systems and infrastructure to deliver value. This necessitates considering the entire ML lifecycle during design and development. With the right processes and tools, MLOps enables organizations to reliably and efficiently adopt ML across their teams for their specific use cases. SageMaker includes a suite of features for MLOps that includes Amazon SageMaker Pipelines and Amazon SageMaker Model Registry. Pipelines allow for straightforward creation and management of ML workflows while also offering storage and reuse capabilities for workflow steps. The model registry simplifies model deployment by centralizing model tracking.

This post focuses on how to achieve flexibility in using your data source of choice and integrate it seamlessly with Amazon SageMaker Processing jobs. With SageMaker Processing jobs, you can use a simplified, managed experience to run data preprocessing or postprocessing and model evaluation workloads on the SageMaker platform.

Twilio needed to implement an MLOps pipeline that queried data from PrestoDB. PrestoDB is an open source SQL query engine that is designed for fast analytic queries against data of any size from multiple sources.

In this post, we show you a step-by-step implementation of the following: a model training pipeline, a batch transform pipeline, and a real-time inference endpoint, all driven by data queried from PrestoDB.

Use case overview

Twilio trained a binary classification ML model using scikit-learn’s RandomForestClassifier to integrate into their MLOps pipeline. This model is used as part of a batch process that runs periodically for their daily workloads, making training and inference workflows repeatable to accelerate model development. The training data used for this pipeline is made available through PrestoDB and read into Pandas through the PrestoDB Python client.
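
For illustration, here is a minimal sketch of reading query results into pandas with the PrestoDB Python client (presto-python-client); the connection details are placeholders, because in the solution they come from the config file and Secrets Manager:

import pandas as pd
import prestodb

# connect to the PrestoDB instance (placeholder values)
conn = prestodb.dbapi.connect(
    host="<presto-host>",
    port=8080,
    user="<username>",
    catalog="tpch",
    schema="tiny",
)

# run a query and load the results into a pandas DataFrame
cur = conn.cursor()
cur.execute("SELECT orderkey, orderpriority FROM orders LIMIT 10")
rows = cur.fetchall()
df = pd.DataFrame(rows, columns=[col[0] for col in cur.description])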

The end goal was to convert the existing steps into two pipelines: a training pipeline and a batch transform pipeline that connected the data queried from PrestoDB to a SageMaker Processing job, and finally deploy the trained model to a SageMaker endpoint for real-time inference.

In this post, we use an open source dataset available through the TPCH connector that is packaged with PrestoDB to illustrate the end-to-end workflow that Twilio used. Twilio was able to use this solution to migrate their existing MLOps pipeline to SageMaker. All the code for this solution is available in the GitHub repo.

Solution overview

This solution is divided into three main steps:

  • Model training pipeline – In this step, we connect a SageMaker Processing job to fetch data from a PrestoDB instance, train and tune the ML model, evaluate it, and register it with the SageMaker model registry.
  • Batch transform pipeline – In this step, we run a preprocessing data step that reads data from a PrestoDB instance and runs batch inference on the registered ML model (from the model registry) that we approve as a part of this pipeline. This model is approved either programmatically or manually through the model registry.
  • Real-time inference – In this step, we deploy the latest approved model as a SageMaker endpoint for real-time inference.

All pipeline parameters used in this solution exist in a single config.yml file. This file includes the necessary AWS and PrestoDB credentials to connect to the PrestoDB instance, information on the training hyperparameters and SQL queries that are run at training, and inference steps to read data from PrestoDB. This solution is highly customizable for industry-specific use cases so that it can be used with minimal code changes through simple updates in the config file.
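
For example, the pipeline code can load this file once and look up nested values; the keys shown below are illustrative:

import yaml

# load all pipeline parameters from the single configuration file
with open("config.yml", "r") as f:
    config = yaml.safe_load(f)

# nested keys are then referenced throughout the pipeline code, for example:
preprocess_script = config['scripts']['preprocess_data']
transform_instance_type = config['transform_step']['instance_type']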

The following code shows an example of how a query is configured within the config.yml file. This query is used at the data processing step of the training pipeline to fetch data from the PrestoDB instance. Here, we predict whether an order is a high_value_order or a low_value_order based on the orderpriority as given from the TPC-H data. For more information on the TPC-H data, its database entities, relationships, and characteristics, refer to TPC Benchmark H. You can change the query for your use case within the config file and run the solution with no code changes.

SELECT
    o.orderkey,
    COUNT(l.linenumber) AS lineitem_count,
    SUM(l.quantity) AS total_quantity,
    AVG(l.discount) AS avg_discount,
    SUM(l.extendedprice) AS total_extended_price,
    SUM(l.tax) AS total_payable_tax,
    o.orderdate,
    o.orderpriority,
    CASE
        WHEN (o.orderpriority = '2-HIGH') THEN 1 
        ELSE 0
    END AS high_value_order
FROM
    orders o
JOIN
    lineitem l ON o.orderkey = l.orderkey
GROUP BY
    o.orderkey,
    o.orderdate,
    o.orderpriority
ORDER BY 
    RANDOM() 
LIMIT 5000

The main steps of this solution are described in detail in the following sections.

Data preparation and training

The data preparation and training pipeline includes the following steps:

  1. The training data is read from a PrestoDB instance, and any feature engineering needed is done as part of the SQL queries run in PrestoDB at retrieval time. The queries that are used to fetch data at training and batch inference steps are configured in the config file.
  2. We use the FrameworkProcessor with SageMaker Processing jobs to read data from PrestoDB using the Python PrestoDB client.
  3. For the training and tuning step, we use the SKLearn estimator from the SageMaker SDK and the RandomForestClassifier from scikit-learn to train the ML model. The HyperparameterTuner class is used for running automatic model tuning, which finds the best version of the model by running many training jobs on the dataset using the algorithm and the ranges of hyperparameters.
  4. The model evaluation step checks that the trained and tuned model has an accuracy level above a user-defined threshold and only then registers that model within the model registry. If the model accuracy doesn’t meet the threshold, the pipeline fails and the model is not registered with the model registry.
  5. The model training pipeline is then run with pipeline.start, which invokes and instantiates all the preceding steps.

Batch transform

The batch transform pipeline consists of the following steps:

  1. The pipeline implements a data preparation step that retrieves data from a PrestoDB instance (using a data preprocessing script) and stores the batch data in Amazon Simple Storage Service (Amazon S3).
  2. The latest model registered in the model registry from the training pipeline is approved.
  3. A Transformer instance is used to run a batch transform job that gets inferences on the entire dataset stored in Amazon S3 by the data preparation step and stores the output in Amazon S3.

SageMaker real-time inference

The SageMaker endpoint pipeline consists of the following steps:

  1. The latest approved model is retrieved from the model registry using the describe_model_package function from the SageMaker SDK.
  2. The latest approved model is deployed as a real-time SageMaker endpoint.
  3. The model is deployed on a ml.c5.xlarge instance with a minimum instance count of 1 and a maximum instance count of 3 (configurable by the user) with the automatic scaling policy set to ENABLED. This removes unnecessary instances so you don’t pay for provisioned instances that you aren’t using.

Prerequisites

To implement the solution provided in this post, you should have an AWS account, a SageMaker domain to access Amazon SageMaker Studio, and familiarity with SageMaker, Amazon S3, and PrestoDB.

The following prerequisites also need to be in place before running this code:

  • PrestoDB – We use the built-in datasets available in PrestoDB through the TPCH connector for this solution. Follow the instructions in the GitHub README.md to set up PrestoDB on an Amazon Elastic Compute Cloud (Amazon EC2) instance in your account. If you already have access to a PrestoDB instance, you can skip this step but note its connection details (see the presto section in the config file). When you have your PrestoDB credentials, fill out the presto section in the config file as follows (enter your host public IP, port, credentials, catalog and schema):
presto:
  host: <0.0.0.0>
  parameter: "0000"
  presto_credentials: <presto_credentials>
  catalog: <catalog>
  schema: <schema>
  • VPC network configurations – We also define the encryption, network isolation, and VPC configurations of the ML model and operations in the config file. For more information on network configurations and preferences, refer to Connect to SageMaker Within your VPC. If you’re using the default VPC and security groups, you can leave these configuration parameters empty (see the example in this configuration file). If not, in the aws section, specify the enable_network_isolation status, security_group_ids, and subnets based on your network isolation preferences:
network_config:
    enable_network_isolation: false
    security_group_ids: 
    - <security_group_id>
    subnets:
    - <subnet-1>
    - <subnet-2>
    - <subnet-3>
  • IAM role – Set up an AWS Identity and Access Management (IAM) role with appropriate permissions to allow SageMaker to access AWS Secrets Manager, Amazon S3, and other services within your AWS account. Until an AWS CloudFormation template that creates the role with the requisite IAM permissions is provided, use a SageMaker execution role with the AmazonSageMakerFullAccess AWS managed policy attached.
  • Secrets Manager secret – Set up a secret in Secrets Manager for the PrestoDB user name and password. Call the secret prestodb-credentials and add a username field and password field to it. For instructions, refer to Create and manage secrets with AWS Secrets Manager.
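
For example, a processing or inference script can fetch these credentials at runtime from Secrets Manager; the following is a minimal sketch (the Region is a placeholder):

import json
import boto3

def get_prestodb_credentials(secret_name: str, region: str) -> dict:
    """Fetch the PrestoDB user name and password stored in AWS Secrets Manager."""
    client = boto3.client("secretsmanager", region_name=region)
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])

creds = get_prestodb_credentials("prestodb-credentials", "us-east-1")
username, password = creds["username"], creds["password"]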

Deploy the solution

Complete the following steps to deploy the solution:

  1. Clone the GitHub repository in SageMaker Studio. For instructions, see Clone a Git Repository in SageMaker Studio Classic.
  2. Edit the config.yml file as follows:
    1. Edit the parameter values in the presto section. These parameters define the connectivity to PrestoDB.
    2. Edit the parameter values in the aws section. These parameters define the network connectivity, IAM role, bucket name, AWS Region, and other AWS Cloud-related parameters.
    3. Edit the parameter values in the sections corresponding to the pipeline steps (training_step, tuning_step, transform_step, and so on).
    4. Review all the parameters in these sections carefully and edit them as appropriate for your use case.

When the prerequisites are complete and the config.yml file is set up correctly, you’re ready to run the mlops-pipeline-prestodb solution. The following architecture diagram provides a visual representation of the steps that you implement.

The diagram shows the following three parts:

  • Part 1: Training – This pipeline includes the data preprocessing step, the training and tuning step, the model evaluation step, the condition step, and the register model step. The train, test, and validation datasets and evaluation report that are generated in this pipeline are sent to an S3 bucket.
  • Part 2: Batch transform – This pipeline includes the batch data preprocessing step, approving the latest model from the model registry, creating the model instance, and performing batch transformation on data that is stored and retrieved from an S3 bucket.
  • Part 3: SageMaker real-time inference – Finally, the latest approved model from the SageMaker model registry is deployed as a SageMaker real-time endpoint for inference.

Throughout all three parts, the PrestoDB server is hosted on an EC2 instance, with credentials stored in Secrets Manager.

Test the solution

In this section, we walk through the steps of running the solution.

Training pipeline

Complete the following steps to run the training pipeline (0_model_training_pipeline.ipynb):

  1. On the SageMaker Studio console, choose 0_model_training_pipeline.ipynb in the navigation pane.
  2. When the notebook is open, on the Run menu, choose Run All Cells to run the code in this notebook.

This notebook demonstrates how you can use SageMaker Pipelines to string together a sequence of data processing, model training, tuning, and evaluation steps to train a binary classification ML model using scikit-learn.

At the end of this run, navigate to pipelines in the navigation pane. Your pipeline structure on SageMaker Pipelines should look like the following figure.

The training pipeline consists of the following steps that are implemented through the notebook run:

  • Preprocess the data – In this step, we create a processing job for data preprocessing. For more information on processing jobs, see Process data. We use a preprocessing script to connect and query data from a PrestoDB instance using the user-specified SQL query in the config file. This step splits and sends data retrieved from PrestoDB as train, test, and validation files to an S3 bucket. The ML model is trained using the data in these files.
  • The sklearn_processor is used in the ProcessingStep to run the scikit-learn script that preprocesses data. The step is defined as follows:
# declare the scikit-learn processor
step_args = sklearn_processor.run(
    ## code refers to the data preprocessing script that queries data from the PrestoDB instance
    code=config['scripts']['preprocess_data'],
    source_dir=config['scripts']['source_dir'],
    outputs=outputs_preprocessor,
    arguments=[
        "--host", host_parameter,
        "--port", port_parameter,
        "--presto_credentials_key", presto_parameter,
        "--region", region_parameter,
        "--presto_catalog", presto_catalog_parameter,
        "--presto_schema", presto_schema_parameter,
        "--train_split", train_split.to_string(),
        "--test_split", test_split.to_string(),
    ],
)

step_preprocess_data = ProcessingStep(
    name=config['data_processing_step']['step_name'],
    step_args=step_args,
)

Here, we use config['scripts']['source_dir'], which points to the data preprocessing script that connects to the PrestoDB instance. Parameters used as arguments in step_args are configurable and fetched from the config file.
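
The sklearn_processor used above is a FrameworkProcessor; a minimal sketch of how it might be constructed follows (the framework version and the instance-related config keys are assumptions):

from sagemaker.processing import FrameworkProcessor
from sagemaker.sklearn.estimator import SKLearn

# processor that runs the preprocessing script inside the SageMaker scikit-learn container
sklearn_processor = FrameworkProcessor(
    estimator_cls=SKLearn,
    framework_version="1.2-1",
    role=role,
    instance_type=config['data_processing_step']['instance_type'],
    instance_count=config['data_processing_step']['instance_count'],
    sagemaker_session=pipeline_session,
)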

  • Train the model – In this step, we create a training job to train a model. For more information on training jobs, see Train a Model with Amazon SageMaker. Here, we use the Scikit Learn Estimator from the SageMaker SDK to handle the end-to-end training and deployment of custom Scikit-learn code. The RandomForestClassifier is used to train the ML model for our binary classification use case. The HyperparameterTuner class is used for running automatic model tuning to determine the set of hyperparameters that provide the best performance based on a user-defined metric threshold (for example, maximizing the AUC metric).

In the following code, the sklearn_estimator object is used with parameters that are configured in the config file and uses a training script to train the ML model. This step accesses the train, test, and validation files that were created as a part of the previous data preprocessing step.

# declare a tuning step to use the train and test data to tune the ML model using the `HyperparameterTuner` declared above
step_tuning = TuningStep(
    name=config['tuning_step']['step_name'],
    tuner=rf_tuner,
    inputs={
        "train": TrainingInput(
            s3_data=step_preprocess_data.properties.ProcessingOutputConfig.Outputs[
                "train"
            ].S3Output.S3Uri,
            content_type="text/csv",
        ),
        "test": TrainingInput(
            s3_data=step_preprocess_data.properties.ProcessingOutputConfig.Outputs[
                "test"
            ].S3Output.S3Uri,
            content_type="text/csv",
        ),
    },
)
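
The rf_tuner referenced above is a HyperparameterTuner built around the SKLearn estimator. The following is a minimal sketch of how it might be constructed; the config keys, hyperparameter ranges, and metric regex are illustrative assumptions, not the repository's exact values:

from sagemaker.sklearn.estimator import SKLearn
from sagemaker.tuner import HyperparameterTuner, IntegerParameter

# SKLearn estimator wrapping the training script (script path and settings are assumptions)
sklearn_estimator = SKLearn(
    entry_point=config['scripts']['train'],          # hypothetical config key for the training script
    framework_version="1.2-1",
    instance_type=config['training_step']['instance_type'],
    instance_count=config['training_step']['instance_count'],
    role=role,
    sagemaker_session=pipeline_session,
)

# tuner that searches the hyperparameter ranges and maximizes a validation AUC metric
rf_tuner = HyperparameterTuner(
    estimator=sklearn_estimator,
    objective_metric_name="validation:auc",
    objective_type="Maximize",
    hyperparameter_ranges={
        "n_estimators": IntegerParameter(50, 300),
        "max_depth": IntegerParameter(3, 12),
    },
    metric_definitions=[{"Name": "validation:auc", "Regex": "auc: ([0-9\\.]+)"}],  # assumes the training script logs "auc: <value>"
    max_jobs=config['tuning_step']['max_jobs'],
    max_parallel_jobs=config['tuning_step']['max_parallel_jobs'],
)
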
  • Evaluate the model – This step checks if the trained and tuned model has an accuracy level above a user-defined threshold, and only then registers the model with the model registry. If the model accuracy doesn’t meet the user-defined threshold, the pipeline fails and the model is not registered with the model registry. We use the ScriptProcessor with an evaluation script that a user creates to evaluate the trained model based on a metric of choice.

The evaluation step uses the evaluation script as a code entry. This script prepares the features and target values, and calculates the prediction probabilities using model.predict. At the end of the run, an evaluation report is sent to Amazon S3 that contains information on precision, recall, and accuracy metrics.

step_evaluate_model = ProcessingStep(
    name=config['evaluation_step']['step_name'],
    processor=evaluate_model_processor,
    inputs=[
        ProcessingInput(
            source=step_tuning.get_top_model_s3_uri(top_k=0, s3_bucket=bucket),
            destination="/opt/ml/processing/model",
            input_name="model.tar.gz" 
        ),
        ProcessingInput(
            source=step_preprocess_data.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri,
            destination="/opt/ml/processing/test",
            input_name="test.csv" 
        ),
    ],
    outputs=[
        ProcessingOutput(
            output_name="evaluation",
            source="/opt/ml/processing/evaluation",
            destination=Join(
                on="/",
                values=[
                    "s3://{}".format(bucket),
                    prefix,
                    ExecutionVariables.PIPELINE_EXECUTION_ID,
                    "evaluation",
                ]
            )
        )
    ],
    code=config['scripts']['evaluation'],
    property_files=[evaluation_report],
    job_arguments=[
        "--target", target_parameter,
        "--features", feature_parameter,
    ]
)

The following screenshot shows an example of an evaluation report.

  • Add conditions – After the model is evaluated, we can add conditions to the pipeline with a ConditionStep. This step registers the model only if the given user-defined metric threshold is met. In our solution, we only want to register the new model version with the model registry if the new model meets a specific accuracy condition of above 70%.
# Create a SageMaker Pipelines ConditionStep using the condition above.
# Specify the steps to run when the condition evaluates to True or False.
step_cond = ConditionStep(
    name=config['condition_step']['step_name'],
    conditions=[cond_gte],
    if_steps=[step_register_model],
    else_steps=[step_fail],  ## step_fail runs when the accuracy condition is not met
)

If the accuracy condition is not met, a step_fail step runs that sends an error message to the user, and the pipeline fails. In our run, the user-defined accuracy threshold is set to 0.7 in the config file and the accuracy calculated during the evaluation step (73.8%) exceeds it, so the condition evaluates to True and the model moves to the last step of the training pipeline.
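
The cond_gte condition referenced above compares the accuracy recorded in the evaluation report against the configured threshold. A minimal sketch, assuming the JSON path used by the evaluation script and the config key for the threshold:

from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.functions import JsonGet

# read the accuracy value out of the evaluation report property file produced by the evaluation step
cond_gte = ConditionGreaterThanOrEqualTo(
    left=JsonGet(
        step_name=step_evaluate_model.name,
        property_file=evaluation_report,
        json_path="binary_classification_metrics.accuracy.value",  # assumed report structure
    ),
    right=config['condition_step']['accuracy_threshold'],  # e.g., 0.7 in config.yml (key name assumed)
)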

  • Register the model – The RegisterModel step registers a sagemaker.model.Model or a sagemaker.pipeline.PipelineModel with the SageMaker model registry. When the trained model meets the model performance requirements, a new version of the model is registered with the SageMaker model registry.

The model is registered with the model registry with an approval status set to PendingManualApproval. This means the model can’t be deployed on a SageMaker endpoint unless its status in the registry is changed to Approved manually on the SageMaker console, programmatically, or through an AWS Lambda function.
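
A minimal sketch of how such a RegisterModel step might be declared is shown below; the config key, instance types, and the model_package_group_name variable are assumptions for illustration:

from sagemaker.workflow.step_collections import RegisterModel

step_register_model = RegisterModel(
    name=config['register_step']['step_name'],       # hypothetical config key
    estimator=sklearn_estimator,                      # estimator used in the training and tuning step
    model_data=step_tuning.get_top_model_s3_uri(top_k=0, s3_bucket=bucket),
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.c5.xlarge"],
    transform_instances=[config['transform_step']['instance_type']],
    model_package_group_name=model_package_group_name,  # assumed to be defined elsewhere in the notebook
    approval_status="PendingManualApproval",
)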

Now that the model is registered, you can get access to the registered model manually on the SageMaker Studio model registry console or programmatically in the next notebook, approve it, and run the batch transform pipeline.

Batch transform pipeline

Complete the following steps to run the batch transform pipeline (1_batch_transform_pipeline.ipynb):

  1. On the SageMaker Studio console, choose 1_batch_transform_pipeline.ipynb in the navigation pane.
  2. When the notebook is open, on the Run menu, choose Run All Cells to run the code in this notebook.

This notebook will run a batch transform pipeline using the model trained in the previous notebook.

At the end of the batch transform pipeline, your pipeline structure on SageMaker Pipelines should look like the following figure.

The batch transform pipeline consists of the following steps that are implemented through the notebook run:

  • Extract the latest approved model from the SageMaker model registry – In this step, we extract the latest model from the model registry and set the ModelApprovalStatus to Approved:
## updating the latest model package to approved status to use it for batch inference
model_package_update_response = sm.update_model_package(
    ModelPackageArn=latest_model_package_arn,
    ModelApprovalStatus="Approved",
)

Now we have extracted the latest model from the SageMaker model registry and programmatically approved it. You can also approve the model manually on the SageMaker model registry page in SageMaker Studio as shown in the following screenshot.
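
One way the notebook might look up latest_model_package_arn is to list the packages in the model package group sorted by creation time; the group name variable below is an assumption:

# look up the most recently created model package in the group (group name is an assumption)
list_response = sm.list_model_packages(
    ModelPackageGroupName=model_package_group_name,
    SortBy="CreationTime",
    SortOrder="Descending",
    MaxResults=1,
)
latest_model_package_arn = list_response["ModelPackageSummaryList"][0]["ModelPackageArn"]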

  • Read raw data for inference from PrestoDB and store it in an S3 bucket – After the latest model is approved, batch data is fetched from the PrestoDB instance and used for the batch transform step. In this step, we use a batch preprocessing script that queries data from PrestoDB and saves it in a batch directory within an S3 bucket. The query that is used to fetch batch data is configured by the user within the config file in the transform_step section:
# declare the batch step that is called later in pipeline execution
batch_data_prep = ProcessingStep(
    name=config['data_processing_step']['step_name'],
    step_args=step_args,
)

After the batch data is extracted into the S3 bucket, we create a model instance and point to the inference.py script, which contains code that runs as part of getting inference from the trained model:

# create the model image based on the model data and refer to the inference script as an entry point for batch inference
model = Model(
    image_uri=image_uri,
    entry_point=config['scripts']['batch_inference'],
    model_data=model_data_url,
    sagemaker_session=pipeline_session,
    role=role,
)
  • Create a batch transform step to perform inference on the batch data stored in Amazon S3 – Now that a model instance is created, create a Transformer instance with the appropriate model type, compute instance type, and desired output S3 URI. Specifically, pass in the ModelName from the CreateModelStep step_create_model properties. The CreateModelStep properties attribute matches the object model of the DescribeModel response object. Use a transform step for batch transformation to run inference on an entire dataset. For more information about batch transform, see Run Batch Transforms with Inference Pipelines.
  • A transform step requires a transformer and the data on which to run batch inference:
transformer = Transformer(
    model_name=step_create_model.properties.ModelName,
    instance_type=config['transform_step']['instance_type'],
    instance_count=config['transform_step']['instance_count'],
    strategy="MultiRecord",
    accept="text/csv",
    assemble_with="Line",
    output_path=f"s3://{bucket}",
    tags=config['transform_step']['tags'],
    env={
        'START_TIME_UTC': st.strftime('%Y-%m-%d %H:%M:%S'),
        'END_TIME_UTC': et.strftime('%Y-%m-%d %H:%M:%S'),
    },
)
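
The TransformStep that follows consumes a transform_input object. A minimal sketch of how it might be built from the batch preprocessing step's S3 output (the output name "batch" is an assumption; match it to the outputs declared in your ProcessingStep):

from sagemaker.inputs import TransformInput

# point the transformer at the CSV batch data written by the batch preprocessing step
transform_input = TransformInput(
    data=batch_data_prep.properties.ProcessingOutputConfig.Outputs["batch"].S3Output.S3Uri,
    content_type="text/csv",
    split_type="Line",
)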

Now that the transformer object is created, pass the transformer input (which contains the batch data from the batch preprocess step) into the TransformStep declaration. Store the output of this pipeline in an S3 bucket.

step_transform = TransformStep(
    name=config['transform_step']['step_name'],
    transformer=transformer,
    inputs=transform_input,
)

SageMaker real-time inference

Complete the following steps to run the real-time inference pipeline (2_realtime_inference.ipynb):

  1. On the SageMaker Studio console, choose 2_realtime_inference.ipynb in the navigation pane.
  2. When the notebook is open, on the Run menu, choose Run All Cells to run the code in this notebook.

This notebook extracts the latest approved model from the model registry and deploys it as a SageMaker endpoint for real-time inference. It does so by completing the following steps:

  • Extract the latest approved model from the SageMaker model registry – To deploy a real-time SageMaker endpoint, first fetch the image URI of your choice and extract the latest approved model from the model registry. After the latest approved model is extracted, we use a container list with the specified inference.py as the script for the deployed model to use at inference. This model creation and endpoint deployment are specific to the scikit-learn model configuration.
  • In the following code, we use the inference.py file specific to the scikit-learn model. We then create our endpoint configuration, setting our ManagedInstanceScaling to ENABLED with our desired MaxInstanceCount and MinInstanceCount for automatic scaling:
create_endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[{
        'InstanceType': instance_type,
        # the endpoint starts at the configured minimum instance count
        'InitialInstanceCount': min_instances,
        'InitialVariantWeight': 1,
        'ModelName': model_name,
        'VariantName': 'AllTraffic',
        # change your managed instance scaling configuration here
        "ManagedInstanceScaling": {
            "MaxInstanceCount": max_instances,
            "MinInstanceCount": min_instances,
            "Status": "ENABLED",
        },
    }],
)
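
The endpoint configuration above references model_name. A minimal sketch of how that model might be created from the latest approved package with inference.py as the entry point (the container details and environment variables are assumptions for the scikit-learn setup):

# create the SageMaker model that the endpoint configuration refers to by model_name
create_model_response = sm.create_model(
    ModelName=model_name,
    ExecutionRoleArn=role,
    Containers=[{
        "Image": image_uri,
        "ModelDataUrl": model_data_url,
        "Environment": {"SAGEMAKER_PROGRAM": "inference.py"},  # assumed way of pointing at the inference script
    }],
)
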
  • Run inference on the deployed real-time endpoint – After you have extracted the latest approved model, created the model from the desired image URI, and configured the endpoint configuration, you can deploy it as a real-time SageMaker endpoint:
import time

create_endpoint_response = sm.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name,
)

# wait for the endpoint to reach a terminal state (InService) using describe_endpoint
describe_endpoint_response = sm.describe_endpoint(EndpointName=endpoint_name)

while describe_endpoint_response["EndpointStatus"] == "Creating":
    time.sleep(30)  # poll periodically instead of busy-waiting
    describe_endpoint_response = sm.describe_endpoint(EndpointName=endpoint_name)

Upon deployment, you can view the endpoint in service on the SageMaker Endpoints page.

Now you can run inference against the data extracted from PrestoDB:

body_str = "total_extended_price,avg_discount,total_quantity\n1,2,3\n66.77,12,2"

response = smr.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=body_str.encode('utf-8') ,
    ContentType='text/csv',
)

response_str = response["Body"].read().decode()
response_str

Results

Here is an example of an inference request and response from the real-time endpoint using the preceding implementation:

Inference request format (view and change this example as you would like for your custom use case)

import json

body_str = """total_extended_price,avg_discount,total_quantity
32,40,334
"""
 
response = smr.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=body_str.encode('utf-8'),
    ContentType='text/csv',
)

response_str = response["Body"].read().decode()
data = json.loads(response_str)
print(json.dumps(data, indent=4))

Response from the real time endpoint

[
    {
        "total_extended_price": 32,
        "avg_discount": 40,
        "total_quantity": 334,
        "prediction": 0
    }
]

Clean up

To clean up the endpoint used in this solution to avoid extra charges, complete the following steps:

  1. On the SageMaker console, choose Endpoints in the navigation pane.
  2. Select the endpoint to delete.
  3. On the Actions menu, choose Delete.

Conclusion

In this post, we demonstrated an end-to-end MLOps solution on SageMaker. The process involved fetching data by connecting a SageMaker Processing job to a PrestoDB instance, followed by training, evaluating, and registering the model. We approved the latest registered model from the training pipeline and ran batch inference against it using batch data queried from PrestoDB and stored in Amazon S3. Lastly, we deployed the latest approved model as a real-time SageMaker endpoint to run inferences.

The rise of generative AI increases the demand for training, deploying, and running ML models, and consequently, the use of data. By integrating SageMaker Processing jobs with PrestoDB, you can seamlessly migrate your workloads to SageMaker pipelines without additional data preparation, storage, or accessibility burdens. You can build, train, evaluate, run batch inferences, and deploy models as real-time endpoints while using your existing data engineering pipelines with minimal or no code changes.

Explore SageMaker Pipelines and open source data querying engines like PrestoDB, and build a solution using the sample implementation provided.

Get started today by referring to the GitHub repository.

For more information and tutorials on SageMaker Pipelines, refer to the SageMaker Pipelines documentation.


About the Authors

Madhur Prashant is an AI and ML Solutions Architect at Amazon Web Services. He is passionate about the intersection of human thinking and generative AI. His interests lie in generative AI, specifically building solutions that are helpful and harmless, and most of all optimal for customers. Outside of work, he loves doing yoga, hiking, spending time with his twin, and playing the guitar.

Amit Arora is an AI and ML Specialist Architect at Amazon Web Services, helping enterprise customers use cloud-based machine learning services to rapidly scale their innovations. He is also an adjunct lecturer in the MS data science and analytics program at Georgetown University in Washington D.C.

Antara Raisa is an AI and ML Solutions Architect at Amazon Web Services supporting strategic customers based out of Dallas, Texas. She also has experience working with large enterprise partners at AWS, where she worked as a Partner Success Solutions Architect for digital-centered customers.

Johnny Chivers is a Senior Solutions Architect working within the Strategic Accounts team at AWS. With over 10 years of experience helping customers adopt new technologies, he guides them through architecting end-to-end solutions spanning infrastructure, big data, and AI.

Shamik Ray is a Senior Engineering Manager at Twilio, leading the Data Science and ML team. With 12 years of experience in software engineering and data science, he excels in overseeing complex machine learning projects and ensuring successful end-to-end execution and delivery.

Srivyshnav K S is a Senior Machine Learning Engineer at Twilio with over 5 years of experience. His expertise lies in leveraging statistical and machine learning techniques to develop advanced models for detecting patterns and anomalies. He is adept at building projects end-to-end.

Jagmohan Dhiman is a Senior Data Scientist with 7 years of experience in machine learning solutions. He has extensive expertise in building end-to-end solutions, encompassing data analysis, ML-based application development, architecture design, and MLOps pipelines for managing the model lifecycle.

Soumya Kundu is a Senior Data Engineer with almost 10 years of experience in Cloud and Big Data technologies. He specialises in AI/ML based large scale Data Processing systems and an avid IoT enthusiast in his spare time.

Accelerate deep learning training and simplify orchestration with AWS Trainium and AWS Batch

In large language model (LLM) training, effective orchestration and compute resource management poses a significant challenge. Automation of resource provisioning, scaling, and workflow management is vital for optimizing resource usage and streamlining complex workflows, thereby achieving efficient deep learning training processes. Simplified orchestration enables researchers and practitioners to focus more on model experimentation, hyperparameter tuning, and data analysis, rather than dealing with cumbersome infrastructure management tasks. Straightforward orchestration also accelerates innovation, shortens time-to-market for new models and applications, and ultimately enhances the overall efficiency and effectiveness of LLM research and development endeavors.

This post explores the seamless integration of AWS Trainium with AWS Batch, showcasing how the powerful machine learning (ML) acceleration capabilities of Trainium can be harnessed alongside the efficient orchestration functionalities offered by AWS Batch. Trainium provides massive scalability, enables effortless scaling of training jobs from small models to LLMs, and offers cost-effective access to computational power, making training LLMs affordable and accessible. AWS Batch is a managed service facilitating batch computing workloads on the AWS Cloud, handling tasks like infrastructure management and job scheduling, while enabling you to focus on application development and result analysis. AWS Batch provides comprehensive features, including managed batch computing, containerized workloads, custom compute environments, and prioritized job queues, along with seamless integration with other AWS services.

Solution overview

The following diagram illustrates the solution architecture.

The training process proceeds as follows:

  1. The user creates a Docker image configured to suit the demands of the underlying training task.
  2. The image is pushed to Amazon Elastic Container Registry (Amazon ECR) to make it ready for deployment.
  3. The user submits the training job to AWS Batch with the Docker image.

Let’s deep dive into this solution to see how you can integrate Trainium with AWS Batch. The following example demonstrates how to train the Llama 2-7B model using AWS Batch with Trainium.

Prerequisites

We recommend not running the following scripts on your local machine. Instead, clone the GitHub repository and run the provided scripts on an x86_64-based instance, preferably a c5.xlarge instance type with the Linux/Ubuntu operating system. For this post, we run the example on an Amazon Linux 2023 instance.

Install the following tools on the instance before getting started with the training on AWS Batch:

sudo yum install -y docker 
sudo yum install -y jq

Clone the repo

Clone the GitHub repo and navigate to the required directory:

git clone https://github.com/aws-neuron/aws-neuron-samples.git 
cd aws-neuron-samples/torch-neuronx/training/aws-batch/llama2

Update the configuration

First, update the config.txt file to specify values for the following variables:

REGION                          # your aws region 
SUBNET                          # your subnet in which the Trainium instances would be launched 
SG                              # your security group you want to associate with your instances 
ECR_REPO                        # your ECR repo where the docker container image will be pushed to 
INSTANCE_ROLE                   # Instance profile ARN for your IAM Instance Role 
DO_PRE_COMPILATION              # boolean value (true|false) indicating if you want to do neuron pre-compilation for your training job 
TOKENIZED_DATASET_URI           # s3 uri to store the tokenized dataset 
NEURON_COMPILE_CACHE_URI        # s3 uri to store the neuron compile caches 
CHECKPOINT_SAVE_URI             # s3 uri to store the checkpoints

After you provide these values, your config.txt file should look something like the following code:

REGION=us-east-1
SUBNET=subnet-012345abcd5689
SG=sg-012345abcd5689
ECR_REPO=1010101010.dkr.ecr.us-east-1.amazonaws.com/your-docker-repo
INSTANCE_ROLE=arn:aws:iam::1010101010:instance-profile/your-instance-role
DO_PRE_COMPILATION=true
TOKENIZED_DATASET_URI=s3://your/s3/location/to/store/tokenized/dataset/
NEURON_COMPILE_CACHE_URI=s3://your/s3/location/to/store/neuron-compile-cache/
CHECKPOINT_SAVE_URI=s3://your/s3/location/to/store/checkpoints/

Get the Llama tokenizer

To tokenize the dataset, you would need to get the tokenizer from Hugging Face. Follow the instructions to access the Llama tokenizer. (You need to acknowledge and accept the license terms.) After you’re granted access, you can download the tokenizer from Hugging Face. After a successful download, place the tokenizer.model file in the root directory (llama2).

Set up Llama training

Run the setup.sh script, which streamlines the prerequisite steps for initiating the AWS Batch training. This script downloads the necessary Python files for training the Llama 2-7B model. Additionally, it performs environment variable substitution within the provided templates and scripts designed to establish AWS Batch resources. When it runs, it makes sure your directory structure conforms to the following setup:

.
├── build
│ ├── compute_env.json
│ ├── job_def.json
│ ├── job_queue.json
│ └── launch_template.json
├── build_and_push_docker_image.sh
├── cleanup.sh
├── config.txt
├── create_resources.sh
├── data
│ ├── get_dataset.py
│ ├── config.json
│ └── tokenizer.model
├── docker
│ ├── Dockerfile
│ ├── llama2
│ │ ├── adamw_fp32_optim_params.py
│ │ ├── config.json
│ │ ├── llama_batch_training.sh
│ │ ├── modeling_llama_nxd.py
│ │ ├── requirements.txt
│ │ └── tp_zero1_llama2_7b_hf_pretrain.py
│ └── llama_batch_training.sh
├── download_and_tokenize_data.sh
├── images
│ └── aws-batch.png
├── README.md
├── scripts
│ ├── build_and_push_docker_image.sh
│ ├── cleanup.sh
│ ├── create_resources.sh
│ ├── download_and_tokenize_data.sh
│ └── submit_batch_job.sh
├── setup.sh
├── submit_batch_job.sh
└── templates
├── compute_env.json
├── job_def.json
├── job_queue.json
└── launch_template.json

Tokenize the dataset

Next, run the download_and_tokenize_data.sh script to complete the data preprocessing steps for Llama 2-7B training. In this instance, we use the wikicorpus dataset sourced from Hugging Face. After the dataset retrieval, the script performs tokenization and uploads the tokenized dataset to the predefined S3 location specified within the config.txt configuration file. The following screenshots show the preprocessing results.

Provision resources

Next, run the create_resources.sh script, which orchestrates the provisioning of the required resources for the training task. This includes creation of a placement group, launch template, compute environment, job queue, and job definition. The following screenshots illustrate this process.

Build and push the Docker image

Now you can run the script build_and_push_docker_image.sh, which constructs a Docker container image customized for your specific training task. This script uses a Deep Learning Container Image published by the Neuron team, which contains the required software stack, and then adds instructions for running the Llama 2-7B training on top of it. The training script uses the neuronx_distributed library with tensor parallelism along with the ZeRO-1 optimizer. Subsequently, the newly generated Docker container image is uploaded to your designated ECR repository as specified by the variable ECR_REPO in the configuration file config.txt.

If you want to modify any of the Llama training hyperparameters, make the required changes in ./docker/llama_batch_training.sh before running build_and_push_docker_image.sh.

The following screenshots illustrate the process for building and pushing the Docker image.

Submit the training job

Run the submit_batch_job.sh script to initiate the AWS Batch job and start the Llama2 model training, as shown in the following screenshots.

Upon batch job submission, an Amazon Elastic Container Service (Amazon ECS) cluster is dynamically provisioned. When it’s operational, you can navigate to the cluster to monitor all tasks actively running on the trn1.32xlarge instances launched through this job. By default, this example is configured to use 4 trn1.32xlarge instances. To customize this setting, you can modify the numNodes parameter in the submit_batch_job.sh script.

Logs and monitoring

After the job submission, you can use Amazon CloudWatch Logs for comprehensive monitoring, storage, and viewing of all logs generated by AWS Batch. Complete the following steps to access the logs:

  1. On the CloudWatch console, choose Log groups under Logs in the navigation pane.
  2. Choose /aws/batch/job to view the batch job logs.
  3. Look for log groups that match your AWS Batch job names or job definitions.
  4. Choose the job to view its details.

The following screenshot shows an example.
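
You can also pull the same logs programmatically; the following is a minimal boto3 sketch (the Region and the number of events are placeholders):

import boto3

logs = boto3.client("logs", region_name="us-east-1")

# fetch recent events from the AWS Batch job log group
response = logs.filter_log_events(
    logGroupName="/aws/batch/job",
    limit=50,
)
for event in response["events"]:
    print(event["timestamp"], event["message"])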

Checkpoints

Checkpoints generated during training will be stored in the predefined S3 location specified as CHECKPOINT_SAVE_URI in the config.txt file. By default, the checkpoint is saved when training is complete. However, you can adjust this behavior by opting to save the checkpoint after every N steps within the training loop. For detailed instructions on this customization, refer to Checkpointing.

Clean up

When you’re done, run the cleanup.sh script to remove the resources created while following this post. This script takes care of removing various components, such as the launch template, placement group, job definition, job queue, and compute environment. AWS Batch automatically handles the cleanup of the ECS stack and Trainium instances, so there’s no need to manually remove or stop them.

Conclusion

The seamless integration of Trainium with AWS Batch represents a significant advancement in the realm of ML training. By combining the unparalleled capabilities of Trainium with the powerful orchestration functionalities of AWS Batch, you stand to benefit in numerous ways. Firstly, you gain access to massive scalability, with the ability to effortlessly scale training jobs from small models to LLMs. With up to 16 Trainium chips per instance and the potential for distributed training across tens of thousands of accelerators, you can tackle even the most demanding training tasks with ease by virtue of Trainium instances. Additionally, it offers a cost-effective solution, helping you harness the power you need at an appealing price point. With the fully managed service offered by AWS Batch for computing workloads, you can offload operational complexities such as infrastructure provisioning and job scheduling, allowing you to focus your efforts on building applications and analyzing results. Ultimately, the integration of Trainium with AWS Batch empowers you to accelerate innovation, shorten time-to-market for new models and applications, and enhance the overall efficiency and effectiveness of your ML endeavors.

Now that you have learned about orchestrating Trainium using AWS Batch, we encourage you to try it out for your next deep learning training job. You can explore more tutorials that will help you gain hands-on experience with AWS Batch and Trainium, and enable you to manage your deep learning training workloads and resources for better performance and cost-efficiency. So why wait? Start exploring these tutorials today and take your deep learning training to the next level with Trainium and AWS Batch!


About the authors

Scott Perry is a Solutions Architect on the Annapurna ML accelerator team at AWS. Based in Canada, he helps customers deploy and optimize deep learning training and inference workloads using AWS Inferentia and AWS Trainium. His interests include large language models, deep reinforcement learning, IoT, and genomics.

Sadaf Rasool is a Machine Learning Engineer with Annapurna ML Accelerator team at AWS. As an enthusiastic and optimistic AI/ML professional, he holds firm to the belief that the ethical and responsible application of AI has the potential to enhance society in the years to come, fostering both economic growth and social well-being.

Microsoft at CVPR 2024: Innovations in computer vision and AI research

CVPR 2024 logo on a green and purple abstract background

Microsoft is proud to sponsor the 41st annual Conference on Computer Vision and Pattern Recognition (CVPR 2024), held from June 17 to June 21. This premier conference covers a broad spectrum of topics in the field, including 3D reconstruction and modeling, action and motion analysis, video and image processing, synthetic data generation, neural networks, and many more. This year, 63 papers from Microsoft have been accepted, with six selected for oral presentations. This post highlights these contributions.

The diversity of these research projects reflects the interdisciplinary approach that Microsoft research teams have taken, from techniques that precisely recreate 3D human figures and perspectives in augmented reality (AR) to combining advanced image segmentation with synthetic data to better replicate real-world scenarios. Other projects demonstrate how researchers are combining machine learning with natural language processing and structured data, developing models that not only visualize but also interact with their environments. Collectively, these projects aim to improve machine perception and enable more accurate and responsive interactions with the world. 


Oral presentations 

BIOCLIP: A Vision Foundation Model for the Tree of Life

Samuel Stevens, Jiaman Wu, Matthew J Thompson, Elizabeth G. Campolongo, Chan Hee Song, David Carlyn, Li Dong, W. Dahdul, Charles Stewart, Tanya Y. Berger-Wolf, Wei-Lun Chao, Yu Su 

The surge in images captured from diverse sources—from drones to smartphones—offers a rich source of biological data. To harness this potential, we introduce TreeOfLife-10M, the largest and most diverse ML-ready dataset of biology images, and BioCLIP, a foundation model intended for the biological sciences. BioCLIP, utilizing the TreeOfLife-10M’s vast array of organism images and structured knowledge, excels in fine-grained biological classification, outperforming existing models by significant margins and demonstrating strong generalizability. 

EgoGen: An Egocentric Synthetic Data Generator

Gen Li, Kaifeng Zhao, Siwei Zhang, Xiaozhong Lyu, Mihai Dusmanu, Yan Zhang, Marc Pollefeys 

A critical challenge in augmented reality (AR) is simulating realistic anatomical movements to guide cameras for authentic egocentric views. To overcome this, the authors developed EgoGen, a sophisticated synthetic data generator that not only improves training data accuracy for egocentric tasks but also refines the integration of motion and perception. It offers a practical solution for creating realistic egocentric training data, with the goal of serving as a useful tool for egocentric computer vision research. 

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan 

Florence-2 introduces a unified, prompt-based vision foundation model capable of handling a variety of tasks, from captioning to object detection and segmentation. Designed to interpret text prompts as task instructions, Florence-2 generates text outputs across a spectrum of vision and vision-language tasks. This model’s training utilizes the FLD-5B dataset, which includes 5.4 billion annotations on 126 million images, developed using an iterative strategy of automated image annotation and continual model refinement.

LISA: Reasoning Segmentation via Large Language Model

Xin Lai, Zhuotao Tian, Yukang Chen, Yanwei Li, Yuhui Yuan, Shu Liu, Jiaya Jia

This work introduces reasoning segmentation, a new segmentation task using complex query texts to generate segmentation masks. The authors also established a new benchmark, comprising over a thousand image-instruction-mask data samples, incorporating intricate reasoning and world knowledge for evaluation. Finally, the authors present Large Language Instructed Segmentation Assistant (LISA), a tool that combines the linguistic capabilities of large language models with the ability to produce segmentation masks. LISA effectively handles complex queries and shows robust zero-shot learning abilities, further enhanced by minimal fine-tuning.

MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild

Zeren Jiang, Chen Guo, Manuel Kaufmann, Tianjian Jiang, Julien Valentin, Otmar Hilliges, Jie Song

MultiPly is a new framework for reconstructing multiple people in 3D from single-camera videos in natural settings. This technique employs a layered neural representation for the entire scene, refined through layer-wise differentiable volume rendering. Enhanced by a hybrid instance segmentation that combines self-supervised 3D and promptable 2D techniques, it provides reliable segmentation even with close interactions. The process uses confidence-guided optimization to alternately refine human poses and shapes, achieving high-fidelity, consistent 3D models.

SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes

Alexandros Delitzas, Ayça Takmaz, Federico Tombari, Robert Sumner, Marc Pollefeys, Francis Engelmann 

Traditional 3D scene understanding methods are heavily focused on 3D semantic and instance segmentation, but the true challenge lies in interacting with functional interactive elements like handles, knobs, and buttons to achieve specific tasks. Enter SceneFun3D: a robust dataset featuring over 14,800 precise interaction annotations across 710 high-resolution real-world 3D indoor scenes. This dataset enriches scene comprehension with motion parameters and task-specific natural language descriptions, facilitating advanced research in functionality segmentation, task-driven affordance grounding, and 3D motion estimation.

Discover more about our work and contributions to CVPR 2024, including our full list of publications and sessions, on our conference webpage.

The post Microsoft at CVPR 2024: Innovations in computer vision and AI research appeared first on Microsoft Research.

Introducing AutoGen Studio: A low-code interface for building multi-agent workflows

White icons representing (from left to right) agents (multi), workflow, tasks, and coding on a blue to purple to pink gradient background.

Multi-agent approaches to AI applications, where multiple foundation model-based agents collaborate to solve problems, are emerging as a powerful paradigm for accomplishing increasingly complex tasks. In September 2023, we released AutoGen – a flexible and open-source Python-based framework for defining, configuring, and composing AI agents to drive multi-agent applications. Today, we are introducing AutoGen Studio (version 0.1.0) – a low-code interface for rapidly building, testing, and sharing multi-agent solutions. AutoGen Studio is built on AutoGen and inherits its features and functionalities, while providing a user-friendly and intuitive interface to create and customize agents, with little to no coding required.

During the nine months since it was released, AutoGen has been widely adopted by researchers, developers, and enthusiasts who have created a variety of novel and exciting applications – from market research to interactive educational tools to data analysis pipelines in the medical domain. With more than 290 community contributors on GitHub and 890,000 downloads of the Python package (as of May 2024), AutoGen continues to be a leading framework for building and researching multi-agent AI applications.

AutoGen Studio user interface: PDF Book Gen Session
A screenshot of the AutoGen Studio interface shows results when two agents are used to address the task, “Create a 4-page kids’ .pdf book with details and pictures about weather patterns in Seattle”.

AutoGen Studio is the next step forward in enabling developers to advance the multi-agent paradigm. We want to make multi-agent solutions responsibly available to diverse audiences – from academic researchers to professional developers across industries – who want to build multi-agent applications to solve real-world problems. Imagine having access to agents that can automate your vacation planning and grocery shopping, manage your personal finances, help you accomplish your learning goals, or perform any other task you care about. How would you build such agents? What capabilities would you give them? How would you make them work together? How would you ensure they are working as intended?

These questions motivated us to build AutoGen Studio. With AutoGen Studio, developers can rapidly build, test, deploy, and share agents and agent teams (workflows) with the community.

Note: AutoGen is primarily a developer tool to enable rapid prototyping and research. It is not a production-ready tool. Please see the GitHub repository and documentation for instructions on how to get started.

What can you do with AutoGen Studio right now?

We built AutoGen Studio with the following goals in mind:  

  • Lower the barrier to entry in building multi-agent applications  
  • Facilitate rapid prototyping and testing of multi-agent solutions
  • Cultivate expertise and community by allowing users to share and re-use this technology 

With AutoGen Studio’s early release (v 0.1.0), users can rapidly author agent workflows via a user interface, interactively test and debug agents, reuse artifacts, and deploy workflows.

The video above shows how users can create skills and models, attach them to agents, create agent workflows, and test and deploy those workflows in AutoGen Studio, all in a few clicks.

Rapidly author agent workflows

AutoGen Studio provides a “Build” section where users can choose from a library of pre-defined agents and compose them into teams (workflows) that can address tasks in minutes. Furthermore, users can customize agents and agent teams with foundation models, prompts, skills (Python functions that accomplish a specific task, e.g., fetching the weather from a weather provider), and workflows via a graphical user interface. Workflows may be sequential (where agents act in a predefined order) or autonomous chat (where the order in which agents act is determined by a large language model or custom logic, based on the state of the task).

AutoGen Studio user interface: agent configuration
In AutoGen Studio, agents can be configured via the user interface. Models and skills can be associated with agents, and agents can be composed into autonomous chat and sequential workflows.
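To make the notion of a skill more concrete, here is a minimal sketch of one. A skill is simply a Python function with a descriptive docstring that an agent can call while solving a task; the weather endpoint and response fields below are hypothetical placeholders rather than part of AutoGen Studio itself.

```python
# A minimal sketch of an AutoGen Studio "skill": an ordinary Python function
# with a descriptive docstring that an agent can call while solving a task.
# The weather endpoint and its response fields are hypothetical placeholders;
# substitute a real provider before attaching this skill to an agent.
import json
import urllib.parse
import urllib.request


def fetch_weather(city: str) -> str:
    """Return a short weather summary for `city` from a (hypothetical) provider."""
    url = "https://example-weather.invalid/api?" + urllib.parse.urlencode({"q": city})
    with urllib.request.urlopen(url, timeout=10) as response:
        payload = json.load(response)
    # Assumed response shape: {"condition": "...", "temp_c": ...}
    return f"{city}: {payload['condition']}, {payload['temp_c']} °C"
```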

Debug and test agents

AutoGen Studio allows developers to immediately test workflows on a variety of tasks and review the resulting artifacts (such as images, code, and documents). Developers can also review the “inner monologue” of agent workflows as they address tasks, and view profiling information such as run costs (number of turns and number of tokens) and agent actions (whether tools were called and the outcomes of code execution).

AutoGen Studio user interface: profile sample workflow
AutoGen Studio user interface: sample workflow
In AutoGen Studio, users can test workflows, see results, and view visualizations that profile agent actions (such as how often tools were used or code was executed).

Artifact reuse and deployment

Users can download the skills, agents, and workflow configurations they create, as well as share and reuse these artifacts. AutoGen Studio also offers a seamless process to export workflows and deploy them as application programming interfaces (APIs) that can be consumed in other applications.

Specifically, workflows can be exported as JavaScript Object Notation (JSON) files and loaded into any Python application, launched as an API endpoint from the command line, or wrapped into a Dockerfile that can be deployed on cloud services such as Azure Container Apps or Azure Web Apps.

AutoGen Studio user interface: export workflow
In AutoGen Studio, users can export agent workflows as a JSON configuration file and then reuse them in any Python application, launch them as an API from the command line, or deploy them on a cloud service such as Azure Container Apps or Azure Web Apps.
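As a rough illustration of the Python path, the sketch below follows the pattern shown in the AutoGen Studio documentation for loading an exported workflow and running it on a task. The WorkflowManager name and its arguments may differ across versions, and the file path and task message here are placeholders, so treat this as a starting point rather than a definitive recipe.

```python
# Illustrative sketch: reuse a workflow exported from AutoGen Studio inside a
# Python application. WorkflowManager and its arguments follow the pattern in
# the AutoGen Studio documentation and may change across versions; the
# "workflow.json" path and the task message are placeholders.
from autogenstudio import WorkflowManager

# Load the exported JSON workflow configuration.
workflow_manager = WorkflowManager(workflow="workflow.json")

# Run the multi-agent workflow on a task.
workflow_manager.run(message="Create a short market-research summary for smart thermostats.")
```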

What is the community creating with AutoGen Studio?

Over the last few months, we have shared an early version of AutoGen Studio, which has been downloaded more than 154,000 times on PyPI (January – May 2024). Our observations of early usage patterns (based on feedback from social platforms like GitHub discussions, Discord, and YouTube) suggest that AutoGen Studio is driving a new group of users who have basic technical capabilities (that is, they can install the tool) and are interested in rapidly testing out ideas but have limited programming skills.

We have seen these users prototype examples covering tasks like travel planning, PDF brochure generation, market research, structured data extraction, video generation, and visualization generation, among others. Importantly, these tasks are accomplished simply by defining agents, giving them access to large language models and skills, adding agents to a workflow, and running tasks with these workflows.

Users are exploring early use cases such as report/book generation, as seen in the screenshot above. Here, two agents are defined and given access to skills for generating images. The agents are then composed into a workflow where messages and actions are exchanged to solve the task of generating a PDF report.

Open research questions and next steps

Orchestrating teams of agents that can explore plans, reflect on actions, and collaborate offers opportunities to build tools that address challenging tasks. We believe that we are just scratching the surface of what may be possible with the multi-agent paradigm, and much is unknown about how best to harness foundation models, let alone foundation model-based agents and multi-agent solutions.

This leaves open many opportunities for further research.

For example, the sophisticated interplay between agents in multi-agent paradigms, particularly in increasingly complex and dynamic domains, highlights many opportunities for multi-agent evaluation and tooling. Open questions include:

  • How can we measure the performance, reliability, and reusability of agents across tasks?
  • How can we better understand the strengths and limitations of agents?
  • How can we explore alternative scenarios and outcomes?
  • How can we compare different agent architectures and collaboration protocols?

These questions require novel methods and metrics that can capture the multi-faceted aspects of multi-agent paradigms and provide actionable insights for developers and users.

As our understanding of the multi-agent paradigm matures, another opportunity is in distilling design patterns and best practices for building effective agent teams for different types of tasks. For instance:

  • What are the optimal number and composition of agents for a given problem?
  • What is the best way to distribute responsibilities and coordinate actions among agents?
  • What are the trade-offs between centralized and decentralized control, or between homogeneous and heterogeneous agents?
  • How can we leverage human oversight and feedback to improve agent reliability and safety?

These questions require systematic studies and empirical evaluations to discover the key dimensions and principles for designing multi-agent solutions.

Finally, as agents become more long-lived and ubiquitous in our digital world, an open challenge is in automating and optimizing the agent-creation process itself. For example:

  • How can we dynamically spawn agents based on the task requirements and available resources?
  • How can we tune agent parameters and workflow configurations to achieve the best performance?
  • How can we adapt agent teams to changing environments and user preferences?

Future design improvements

Naturally, we see AutoGen Studio as a potential vehicle to study many of these research questions – from improvements in the user experience of authoring workflows to a gallery of shareable artifacts to advanced tools for making sense of agent behaviors.

We are currently working on a new drag-and-drop experience in AutoGen Studio, designed to transform how users author multi-agent workflows. Our new visual canvas allows users to easily orchestrate and connect agents, providing an intuitive interface for defining collaboration dynamics.

AutoGen Studio user interface: visual workflow design
A new visual canvas interface for AutoGen allows users to easily orchestrate and connect agents, providing an intuitive interface for defining collaboration dynamics. Entities such as skills and models can be associated with agents via drag-and-drop interactions.

Visual workflow design: The heart of our enhanced user interface is a visual canvas where you can literally see your workflow come to life. Drag and drop different agents onto the canvas to build complex conversation patterns. This graphical approach not only simplifies the initial setup but also makes the process of modifying agents and workflows more intuitive.

A new visual canvas interface for AutoGen allows users to both visualize agent interactions and update properties of each agent in the same view pane.

Configurable agents, models, and skills: Customize each agent’s role and skills through simple, direct interactions on the canvas. Whether you’re adding new capabilities or tweaking existing ones, the process is straightforward and user-friendly.

AutoGen Studio user interface: dynamic prototyping and testing
The proposed visual canvas interface for AutoGen will explore updated visualization of agent internal monologues for improved debugging.

Dynamic prototyping and testing: Experimentation is key to perfecting agent workflows. With our new interface, you can prototype various agent configurations and immediately test them in a live environment. This real-time interaction allows you to chat with the workflow, observe all agent messages, and pinpoint areas for improvement on the fly.

AutoGen Studio community gallery
The new proposed design explores a gallery of curated workflows and entities (such as skills and agents) that can be reused.

Finally, we are developing a community gallery within AutoGen Studio where users can share, discover, and learn from one another. This gallery will allow you to publish your workflows, agents, and skills, fostering a collaborative environment where everyone can benefit from shared knowledge and innovations.

Note on responsible AI: Promoting safe and ethical multi-agent solutions

AutoGen Studio is designed to provide a low-code environment for rapidly prototyping and testing multi-agent workflows. Our goal is to responsibly advance research and practice in solving problems with multiple agents and to develop tools that contribute to human well-being. Along with AutoGen, AutoGen Studio is committed to implementing features that promote safe and reliable outcomes. For example, AutoGen Studio offers profiling tools to make sense of agent actions and safeguards such as support for Docker environments for code execution. This feature helps ensure that agents operate within controlled and secure environments, reducing the risk of unintended or harmful actions. For more information on our approach to responsible AI in AutoGen, please refer to the Transparency FAQs here: https://github.com/microsoft/autogen/blob/main/TRANSPARENCY_FAQS.md. Finally, AutoGen Studio is not production-ready; that is, it does not implement authentication and other security measures that are required for production deployments.
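As one concrete example of the Docker safeguard, the underlying AutoGen framework lets a code-executing agent run generated code inside a container rather than directly on the host. The sketch below uses the v0.2-style autogen API; exact argument names may vary across versions and should be checked against the AutoGen documentation.

```python
# Minimal sketch of Docker-backed code execution in the underlying AutoGen
# framework (v0.2-style API); argument names may differ across versions.
import autogen

# A user proxy agent that executes model-generated code inside a Docker
# container instead of directly on the host machine.
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config={
        "work_dir": "coding",  # working directory for generated code
        "use_docker": True,    # run generated code inside a Docker container
    },
)
```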

Acknowledgements 

We would like to thank members of the open-source software (OSS) community and the AI Frontiers organization at Microsoft for discussions and feedback along the way. Specifically, we would like to thank Piali Choudhury, Ahmed Awadallah, Robin Moeur, Jack Gerrits, Robert Barber, Grace Proebsting, Michel Pahud, and others for feedback and comments.

The post Introducing AutoGen Studio: A low-code interface for building multi-agent workflows appeared first on Microsoft Research.

Read More

Seamless in Seattle: NVIDIA Research Showcases Advancements in Visual Generative AI at CVPR

Seamless in Seattle: NVIDIA Research Showcases Advancements in Visual Generative AI at CVPR

NVIDIA researchers are at the forefront of the rapidly advancing field of visual generative AI, developing new techniques to create and interpret images, videos and 3D environments.

More than 50 of these projects will be showcased at the Computer Vision and Pattern Recognition (CVPR) conference, taking place June 17-21 in Seattle. Two of the papers — one on the training dynamics of diffusion models and another on high-definition maps for autonomous vehicles — are finalists for CVPR’s Best Paper Awards.

NVIDIA is also the winner of the CVPR Autonomous Grand Challenge’s End-to-End Driving at Scale track — a significant milestone that demonstrates the company’s use of generative AI for comprehensive self-driving models. The winning submission, which outperformed more than 450 entries worldwide, also received CVPR’s Innovation Award.

NVIDIA’s research at CVPR includes a text-to-image model that can be easily customized to depict a specific object or character, a new model for object pose estimation, a technique to edit neural radiance fields (NeRFs) and a visual language model that can understand memes. Additional papers introduce domain-specific innovations for industries including automotive, healthcare and robotics.

Collectively, the work introduces powerful AI models that could enable creators to more quickly bring their artistic visions to life, accelerate the training of autonomous robots for manufacturing, and support healthcare professionals by helping process radiology reports.

“Artificial intelligence, and generative AI in particular, represents a pivotal technological advancement,” said Jan Kautz, vice president of learning and perception research at NVIDIA. “At CVPR, NVIDIA Research is sharing how we’re pushing the boundaries of what’s possible — from powerful image generation models that could supercharge professional creators to autonomous driving software that could help enable next-generation self-driving cars.”

At CVPR, NVIDIA also announced NVIDIA Omniverse Cloud Sensor RTX, a set of microservices that enable physically accurate sensor simulation to accelerate the development of fully autonomous machines of every kind.

Forget Fine-Tuning: JeDi Simplifies Custom Image Generation

Creators harnessing diffusion models, the most popular method for generating images based on text prompts, often have a specific character or object in mind — they may, for example, be developing a storyboard around an animated mouse or brainstorming an ad campaign for a specific toy.

Prior research has enabled these creators to personalize the output of diffusion models to focus on a specific subject using fine-tuning — where a user trains the model on a custom dataset — but the process can be time-consuming and inaccessible for general users.

JeDi, a paper by researchers from Johns Hopkins University, Toyota Technological Institute at Chicago and NVIDIA, proposes a new technique that allows users to easily personalize the output of a diffusion model within a couple of seconds using reference images. The team found that the model achieves state-of-the-art quality, significantly outperforming existing fine-tuning-based and fine-tuning-free methods.

JeDi can also be combined with retrieval-augmented generation, or RAG, to generate visuals specific to a database, such as a brand’s product catalog.

 

New Foundation Model Perfects the Pose

NVIDIA researchers at CVPR are also presenting FoundationPose, a foundation model for object pose estimation and tracking that can be instantly applied to new objects during inference, without the need for fine-tuning.

The model, which set a new record on a popular benchmark for object pose estimation, uses either a small set of reference images or a 3D representation of an object to understand its shape. It can then identify and track how that object moves and rotates in 3D across a video, even in poor lighting conditions or complex scenes with visual obstructions.

FoundationPose could be used in industrial applications to help autonomous robots identify and track the objects they interact with. It could also be used in augmented reality applications where an AI model is used to overlay visuals on a live scene.

NeRFDeformer Transforms 3D Scenes With a Single Snapshot

A NeRF is an AI model that can render a 3D scene based on a series of 2D images taken from different positions in the environment. In fields like robotics, NeRFs can be used to generate immersive 3D renders of complex real-world scenes, such as a cluttered room or a construction site. However, to make any changes, developers would need to manually define how the scene has transformed — or remake the NeRF entirely.

Researchers from the University of Illinois Urbana-Champaign and NVIDIA have simplified the process with NeRFDeformer. The method, being presented at CVPR, can successfully transform an existing NeRF using a single RGB-D image, which is a combination of a normal photo and a depth map that captures how far each object in a scene is from the camera.

VILA Visual Language Model Gets the Picture

A CVPR research collaboration between NVIDIA and the Massachusetts Institute of Technology is advancing the state of the art for vision language models, which are generative AI models that can process videos, images and text.

The group developed VILA, a family of open-source visual language models that outperforms prior neural networks on key benchmarks that test how well AI models answer questions about images. VILA’s unique pretraining process unlocked new model capabilities, including enhanced world knowledge, stronger in-context learning and the ability to reason across multiple images.

VILA can understand memes and reason based on multiple images or video frames.

The VILA model family can be optimized for inference using the NVIDIA TensorRT-LLM open-source library and can be deployed on NVIDIA GPUs in data centers, workstations and even edge devices.

Read more about VILA on the NVIDIA Technical Blog and GitHub.

Generative AI Fuels Autonomous Driving, Smart City Research

A dozen of the NVIDIA-authored CVPR papers focus on autonomous vehicle research. Other AV-related highlights include:

Also at CVPR, NVIDIA contributed the largest-ever indoor synthetic dataset to the AI City Challenge, helping researchers and developers advance solutions for smart cities and industrial automation. The challenge’s datasets were generated using NVIDIA Omniverse, a platform of APIs, SDKs and services that enables developers to build Universal Scene Description (OpenUSD)-based applications and workflows.

NVIDIA Research has hundreds of scientists and engineers worldwide, with teams focused on topics including AI, computer graphics, computer vision, self-driving cars and robotics. Learn more about NVIDIA Research at CVPR.

Read More