Time series forecasting with Amazon SageMaker AutoML


Time series forecasting is a critical component in various industries for making informed decisions by predicting future values of time-dependent data. A time series is a sequence of data points recorded at regular time intervals, such as daily sales revenue, hourly temperature readings, or weekly stock market prices. These forecasts are pivotal for anticipating trends and future demands in areas such as product demand, financial markets, energy consumption, and many more.

However, creating accurate and reliable forecasts poses significant challenges because of factors such as seasonality, underlying trends, and external influences that can dramatically impact the data. Additionally, traditional forecasting models often require extensive domain knowledge and manual tuning, which can be time-consuming and complex.

In this post, we explore a comprehensive approach to time series forecasting using the Amazon SageMaker AutoMLV2 Software Development Kit (SDK). SageMaker AutoMLV2 is part of the SageMaker Autopilot suite, which automates the end-to-end machine learning workflow from data preparation to model deployment. Throughout this post, we use the term AutoML to refer to both the SageMaker Autopilot APIs and the Amazon SageMaker Canvas AutoML capabilities. We’ll walk through the data preparation process, explain the configuration of the time series forecasting model, detail the inference process, and highlight key aspects of the project. This methodology offers insights into effective strategies for forecasting future data points in a time series, using the power of machine learning without requiring deep expertise in model development. The code for this post can be found in the GitHub repo.

The following diagram depicts the basic AutoMLV2 APIs, all of which are relevant to this post. The diagram shows the workflow for building and deploying models using the AutoMLV2 API. In the training phase, CSV data is uploaded to Amazon S3, followed by the creation of an AutoML job, model creation, and checking for job completion. The deployment phase allows you to choose between real-time inference via an endpoint or batch inference using a scheduled transform job that stores results in S3.

Basic AutoMLV2 APIs

1. Data preparation

The foundation of any machine learning project is data preparation. For this project, we used a synthetic dataset containing time series data of product sales across various locations, focusing on attributes such as product code, location code, timestamp, unit sales, and promotional information. The dataset can be found in an Amazon-owned, public Amazon Simple Storage Service (Amazon S3) dataset.

When preparing your CSV file for input into a SageMaker AutoML time series forecasting model, you must ensure that it includes at least three essential columns (as described in the SageMaker AutoML V2 documentation):

  1. Item identifier attribute name: This column contains unique identifiers for each item or entity for which predictions are desired. Each identifier distinguishes the individual data series within the dataset. For example, if you’re forecasting sales for multiple products, each product would have a unique identifier.
  2. Target attribute name: This column represents the numerical values that you want to forecast. These could be sales figures, stock prices, energy usage amounts, and so on. It’s crucial that the data in this column is numeric because the forecasting models predict quantitative outcomes.
  3. Timestamp attribute name: This column indicates the specific times when the observations were recorded. The timestamp is essential for analyzing the data in a chronological context, which is fundamental to time series forecasting. The timestamps should be in a consistent and appropriate format that reflects the regularity of your data (for example, daily or hourly).

All other columns in the dataset are optional and can be used to include additional time-series related information or metadata about each item. Therefore, your CSV file should have columns named according to the preceding attributes (item identifier, target, and timestamp) as well as any other columns needed to support your use case. For instance, if your dataset is about forecasting product demand, your CSV might look something like this:

  • Product_ID (item identifier): Unique product identifiers.
  • Sales (target): Historical sales data to be forecasted.
  • Date (timestamp): The dates on which sales data was recorded.

The process of splitting the training and test data in this project uses a methodical and time-aware approach to ensure that the integrity of the time series data is maintained. Here’s a detailed overview of the process:

Ensuring timestamp integrity

The first step involves converting the timestamp column of the input dataset to a datetime format using pd.to_datetime. This conversion is crucial for sorting the data chronologically in subsequent steps and for ensuring that operations on the timestamp column are consistent and accurate.

Sorting the data

The sorted dataset is critical for time series forecasting, because it ensures that data is processed in the correct temporal order. The input_data DataFrame is sorted based on three columns: product_code, location_code, and timestamp. This multi-level sort guarantees that the data is organized first by product and location, and then chronologically within each product-location grouping. This organization is essential for the logical partitioning of data into training and test sets based on time.
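
The following is a minimal sketch of these two preparation steps, assuming the input CSV has been loaded into a pandas DataFrame named input_data with product_code, location_code, and timestamp columns (the file name is illustrative):

import pandas as pd

# Load the raw data (file name is illustrative)
input_data = pd.read_csv('product_sales.csv')

# Ensure the timestamp column is a proper datetime type
input_data['timestamp'] = pd.to_datetime(input_data['timestamp'])

# Sort by product, location, and then time so that each
# product-location series is in chronological order
input_data = input_data.sort_values(
    by=['product_code', 'location_code', 'timestamp']
).reset_index(drop=True)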

Splitting into training and test sets

The splitting mechanism is designed to handle each combination of product_code and location_code separately, respecting the unique temporal patterns of each product-location pair. For each group:

  • The initial test set is determined by selecting the last eight timestamps (yellow + green below). This subset represents the most recent data points that are candidates for testing the model’s forecasting ability.
  • The final test set is refined by removing the last four timestamps from the initial test set, resulting in a test dataset that includes the four timestamps immediately preceding the latest data (green below). This strategy ensures the test set is representative of the near-future periods the model is expected to predict, while also leaving out the most recent data to simulate a realistic forecasting scenario.
  • The training set comprises the remaining data points, excluding the last eight timestamps (blue below). This ensures the model is trained on historical data that precedes the test period, avoiding any data leakage and ensuring that the model learns from genuinely past observations.

This process is visualized in the following figure with an arbitrary value on the Y axis and the days of February on the X axis.

Time series data split
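
The following is a minimal sketch of this per-group split, continuing from the sorted input_data DataFrame above; the exact implementation in the accompanying notebook may differ slightly:

train_dfs, test_dfs = [], []

# Split each product-location series separately to avoid data leakage
for (product, location), group in input_data.groupby(['product_code', 'location_code']):
    group = group.sort_values('timestamp')

    # The last eight timestamps are candidates for testing (yellow + green)
    initial_test = group.tail(8)

    # Keep the four timestamps immediately preceding the latest four (green)
    final_test = initial_test.head(4)

    # Everything before the last eight timestamps is used for training (blue)
    train = group.iloc[:-8]

    train_dfs.append(train)
    test_dfs.append(final_test)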

The test dataset is used to evaluate the performance of the trained model and compute various loss metrics, such as mean absolute error (MAE) and root-mean-squared error (RMSE). These metrics quantify the model’s accuracy in forecasting the actual values in the test set, providing a clear indication of the model’s quality and its ability to make accurate predictions. The evaluation process is detailed in the “Inference: Batch, real-time, and asynchronous” section, where we discuss the comprehensive approach to model evaluation and conditional model registration based on the computed metrics.
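
Both metrics can be computed directly from the forecasts and the held-out actuals. The following is a minimal sketch using NumPy, where the array names are illustrative:

import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Example: compare the p50 forecast against the actual unit sales in the test set
# print(mae(actual_unit_sales, p50_forecast), rmse(actual_unit_sales, p50_forecast))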

Creating and saving the datasets

After the data for each product-location group is categorized into training and test sets, the subsets are aggregated into comprehensive training and test DataFrames using pd.concat. This aggregation step combines the individual DataFrames stored in train_dfs and test_dfs lists into two unified DataFrames:

  • train_df for training data
  • test_df for testing data

Finally, the DataFrames are saved to CSV files (train.csv for training data and test.csv for test data), making them accessible for model training and evaluation processes. This saving step not only facilitates a clear separation of data for modelling purposes but also enables reproducibility and sharing of the prepared datasets.
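
Continuing from the previous sketch, the aggregation and saving step looks like the following:

# Combine the per-group splits into unified training and test DataFrames
train_df = pd.concat(train_dfs, ignore_index=True)
test_df = pd.concat(test_dfs, ignore_index=True)

# Persist the splits for model training and later evaluation
train_df.to_csv('train.csv', index=False)
test_df.to_csv('test.csv', index=False)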

Summary

This data preparation strategy meticulously respects the chronological nature of time series data and ensures that the training and test sets are appropriately aligned with real-world forecasting scenarios. By splitting the data based on the last known timestamps and carefully excluding the most recent periods from the training set, the approach mimics the challenge of predicting future values based on past observations, thereby setting the stage for a robust evaluation of the forecasting model’s performance.

2. Training a model with AutoMLV2

SageMaker AutoMLV2 reduces the resources needed to train, tune, and deploy machine learning models by automating the heavy lifting involved in model development. It provides a straightforward way to create high-quality models tailored to your specific problem type, be it classification, regression, or forecasting, among others. In this section, we delve into the steps to train a time series forecasting model with AutoMLV2.

Step 1: Define the time series forecasting configuration

The first step involves defining the problem configuration. This configuration guides AutoMLV2 in understanding the nature of your problem and the type of solution it should seek, whether it involves classification, regression, time-series classification, computer vision, natural language processing, or fine-tuning of large language models. This versatility is crucial because it allows AutoMLV2 to adapt its approach based on the specific requirements and complexities of the task at hand. For time series forecasting, the configuration includes details such as the frequency of forecasts, the horizon over which predictions are needed, and any specific quantiles or probabilistic forecasts. Configuring the AutoMLV2 job for time series forecasting involves specifying parameters that would best use the historical sales data to predict future sales.

The AutoMLTimeSeriesForecastingConfig is a configuration object in the SageMaker AutoMLV2 SDK designed specifically for setting up time series forecasting tasks. Each argument provided to this configuration object tailors the AutoML job to the specifics of your time series data and the forecasting objectives.

time_series_config = AutoMLTimeSeriesForecastingConfig(
    forecast_frequency='W',
    forecast_horizon=4,
    item_identifier_attribute_name='product_code',
    target_attribute_name='unit_sales',
    timestamp_attribute_name='timestamp',
    ...
)

The following is a detailed explanation of each configuration argument used in your time series configuration:

  • forecast_frequency
    • Description: Specifies how often predictions should be made.
    • Value ‘W’: Indicates that forecasts are expected on a weekly basis. The model will be trained to understand and predict data as a sequence of weekly observations. Valid intervals are an integer followed by Y (year), M (month), W (week), D (day), H (hour), and min (minute). For example, 1D indicates every day and 15min indicates every 15 minutes. The value of a frequency must not overlap with the next larger frequency. For example, you must use a frequency of 1H instead of 60min.
  • forecast_horizon
    • Description: Defines the number of future time-steps the model should predict.
    • Value 4: The model will forecast four time-steps into the future. Given the weekly frequency, this means the model will predict the next four weeks of data from the last known data point.
  • forecast_quantiles
    • Description: Specifies the quantiles at which to generate probabilistic forecasts.
    • Values [p50,p60,p70,p80,p90]: These quantiles represent the 50th, 60th, 70th, 80th, and 90th percentiles of the forecast distribution, providing a range of possible outcomes and capturing forecast uncertainty. For instance, the p50 quantile (median) might be used as a central forecast, while the p90 quantile provides a higher-end forecast, where 90% of the actual data is expected to fall below the forecast, accounting for potential variability.
  • filling
    • Description: Defines how missing data should be handled before training; specifying filling strategies for different scenarios and columns.
    • Value filling_config: This should be a dictionary detailing how to fill missing values in your dataset, such as filling missing promotional data with zeros or specific columns with predefined values. This ensures the model has a complete dataset to learn from, improving its ability to make accurate forecasts.
  • item_identifier_attribute_name
    • Description: Specifies the column that uniquely identifies each time series in the dataset.
    • Value ’product_code’: This setting indicates that each unique product code represents a distinct time series. The model will treat data for each product code as a separate forecasting problem.
  • target_attribute_name
    • Description: The name of the column in your dataset that contains the values you want to predict.
    • Value unit_sales: Designates the unit_sales column as the target variable for forecasts, meaning the model will be trained to predict future sales figures.
  • timestamp_attribute_name
    • Description: The name of the column indicating the time point for each observation.
    • Value ‘timestamp’: Specifies that the timestamp column contains the temporal information necessary for modeling the time series.
  • grouping_attribute_names
    • Description: A list of column names that, in combination with the item identifier, can be used to create composite keys for forecasting.
    • Value [‘location_code’]: This setting means that forecasts will be generated for each combination of product_code and location_code. It allows the model to account for location-specific trends and patterns in sales data.

The configuration provided instructs the SageMaker AutoML to train a model capable of weekly sales forecasts for each product and location, accounting for uncertainty with quantile forecasts, handling missing data, and recognizing each product-location pair as a unique series. This detailed setup aims to optimize the forecasting model’s relevance and accuracy for your specific business context and data characteristics.
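
Putting these arguments together, a fuller version of the configuration might look like the following sketch. The filling dictionary shown here is purely illustrative (the column name and strategies are assumptions); refer to the SageMaker AutoML V2 documentation for the filling strategies supported for your columns:

from sagemaker.automl.automlv2 import AutoMLTimeSeriesForecastingConfig

# Illustrative filling strategy: treat missing promotion values as "no promotion"
filling_config = {
    'promotion_applied': {'middlefill': 'zero', 'backfill': 'zero'},
}

time_series_config = AutoMLTimeSeriesForecastingConfig(
    forecast_frequency='W',
    forecast_horizon=4,
    forecast_quantiles=['p50', 'p60', 'p70', 'p80', 'p90'],
    filling=filling_config,
    item_identifier_attribute_name='product_code',
    target_attribute_name='unit_sales',
    timestamp_attribute_name='timestamp',
    grouping_attribute_names=['location_code'],
)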

Step 2: Initialize the AutoMLV2 job

Next, initialize the AutoMLV2 job by specifying the problem configuration, the AWS role with permissions, the SageMaker session, a base job name for identification, and the output path where the model artifacts will be stored.

automl_sm_job = AutoMLV2(
    problem_config=time_series_config,
    role=role,
    sagemaker_session=sagemaker_session,
    base_job_name='time-series-forecasting-job',
    output_path=f's3://{bucket}/{prefix}/output'
)

Step 3: Fit the model

To start the training process, call the fit method on your AutoMLV2 job object. This method requires specifying the input data’s location in Amazon S3 and whether SageMaker should wait for the job to complete before proceeding further. During this step, AutoMLV2 will automatically pre-process your data, select algorithms, train multiple models, and tune them to find the best solution.

automl_sm_job.fit(
    inputs=[AutoMLDataChannel(s3_data_type='S3Prefix', s3_uri=train_uri, channel_type='training')],
    wait=True,
    logs=True
)

Note that model fitting may take several hours, depending on the size of your dataset and your compute budget. A larger compute budget allows for more powerful instance types, which can accelerate the training process. If you’re not running this code as part of the provided SageMaker notebook (which runs the code cells in the correct order), you will need to implement custom code that monitors the training status before retrieving and deploying the best model.
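
A minimal polling loop using the low-level boto3 client might look like the following (replace the job name with the one created by your fit call):

import time
import boto3

sm_client = boto3.client('sagemaker')
automl_job_name = 'your-auto-ml-job-name'  # replace with your actual AutoML job name

# Poll until the AutoML job reaches a terminal state
while True:
    response = sm_client.describe_auto_ml_job_v2(AutoMLJobName=automl_job_name)
    status = response['AutoMLJobStatus']
    print(f'AutoML job status: {status}')
    if status in ('Completed', 'Failed', 'Stopped'):
        break
    time.sleep(60)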

3. Deploying a model with AutoMLV2

Deploying a machine learning model into production is a critical step in your machine learning workflow, enabling your applications to make predictions from new data. SageMaker AutoMLV2 not only helps build and tune your models but also provides a seamless deployment experience. In this section, we’ll guide you through deploying your best model from an AutoMLV2 job as a fully managed endpoint in SageMaker.

Step 1: Identify the best model and extract name

After your AutoMLV2 job completes, the first step in the deployment process is to identify the best performing model, also known as the best candidate. This can be achieved by using the best_candidate method of your AutoML job object. You can either use this method immediately after fitting the AutoML job or specify the job name explicitly if you’re operating on a previously completed AutoML job.

# Option 1: Directly after fitting the AutoML job
best_candidate = automl_sm_job.best_candidate()

# Option 2: Specifying the job name directly
best_candidate = automl_sm_job.best_candidate(job_name='your-auto-ml-job-name')

best_candidate_name = best_candidate['CandidateName']

Step 2: Create a SageMaker model

Before deploying, create a SageMaker model from the best candidate. This model acts as a container for the artifacts and metadata necessary to serve predictions. Use the create_model method of the AutoML job object to complete this step.

endpoint_name = f"ep-{best_candidate_name}-automl-ts"

# Create a SageMaker model from the best candidate
automl_sm_model = automl_sm_job.create_model(name=best_candidate_name, candidate=best_candidate)

4. Inference: Batch, real-time, and asynchronous

For deploying the trained model, we explore batch, real-time, and asynchronous inference methods to cater to different use cases.

The following figure is a decision tree to help you decide what type of endpoint to use. The diagram outlines a decision-making process for selecting between batch, asynchronous, or real-time inference endpoints. Starting with the need for immediate responses, it guides you through considerations like the size of the payload and the computational complexity of the model. Depending on these factors, you can choose a faster option with lower computational requirements or a slower batch process for larger datasets.

Decision tree for selecting between batch, asynchronous, or real-time inference endpoints

Batch inference using SageMaker pipelines

  • Usage: Ideal for generating forecasts in bulk, such as monthly sales predictions across all products and locations.
  • Process: We used SageMaker’s batch transform feature to process a large dataset of historical sales data, outputting forecasts for the specified horizon.

The inference pipeline used for batch inference demonstrates a comprehensive approach to deploying, evaluating, and conditionally registering a machine learning model for time series forecasting using SageMaker. This pipeline is structured to ensure a seamless flow from data preprocessing, through model inference, to post-inference evaluation and conditional model registration. Here’s a detailed breakdown of its construction:

  • Batch transform step
    • Transformer initialization: A Transformer object is created, specifying the model to use for batch inference, the compute resources to allocate, and the output path for the results.
    • Transform step creation: This step invokes the transformer to perform batch inference on the specified input data. The step is configured to handle data in CSV format, a common choice for structured time series data.
  • Evaluation step
    • Processor setup: Initializes an SKLearn processor with the specified role, framework version, instance count, and type. This processor is used for the evaluation of the model’s performance.
    • Evaluation processing: Configures the processing step to use the SKLearn processor, taking the batch transform output and test data as inputs. The processing script (evaluation.py) is specified here, which will compute evaluation metrics based on the model’s predictions and the true labels.
    • Evaluation strategy: We adopted a comprehensive evaluation approach, using metrics like mean absolute error (MAE) and root-mean-squared error (RMSE) to quantify the model’s accuracy and adjusting the forecasting configuration based on these insights.
    • Outputs and property files: The evaluation step produces an output file (evaluation_metrics.json) that contains the computed metrics. This file is stored in Amazon S3 and registered as a property file for later access in the pipeline.
  • Conditional model registration
    • Model metrics setup: Defines the model metrics to be associated with the model package, including statistics and explainability reports sourced from specified Amazon S3 URIs.
    • Model registration: Prepares for model registration by specifying content types, inference and transform instance types, model package group name, approval status, and model metrics.
    • Conditional registration step: Implements a condition based on the evaluation metrics (for example, MAE). If the condition is met (for example, the MAE is at or below an acceptable threshold), the model is registered; otherwise, the pipeline concludes without model registration.
  • Pipeline creation and runtime
    • Pipeline definition: Assembles the pipeline by naming it and specifying the sequence of steps to run: batch transform, evaluation, and conditional registration.
    • Pipeline upserting and runtime: The pipeline.upsert method is called to create or update the pipeline based on the provided definition, and pipeline.start() runs the pipeline.

The following figure is an example of the SageMaker Pipeline directed acyclic graph (DAG).

SageMaker Pipeline directed acyclic graph (DAG) for this problem.

This pipeline effectively integrates several stages of the machine learning lifecycle into a cohesive workflow, showcasing how Amazon SageMaker can be used to automate the process of model deployment, evaluation, and conditional registration based on performance metrics. By encapsulating these steps within a single pipeline, the approach enhances efficiency, ensures consistency in model evaluation, and streamlines the model registration process—all while maintaining the flexibility to adapt to different models and evaluation criteria.
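
For reference, the following condensed sketch shows how such a pipeline can be wired together with the SageMaker Python SDK. Step names, the evaluation script, S3 locations, the model package group name, the test data URI (test_uri), and the MAE threshold are all illustrative assumptions; the complete version is in the GitHub repo for this post.

from sagemaker.transformer import Transformer
from sagemaker.inputs import TransformInput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import TransformStep, ProcessingStep
from sagemaker.workflow.properties import PropertyFile
from sagemaker.workflow.functions import JsonGet
from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.step_collections import RegisterModel
from sagemaker.workflow.pipeline import Pipeline

# Batch transform step: run inference on the test data with the trained model
transformer = Transformer(
    model_name=best_candidate_name,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    output_path=f's3://{bucket}/{prefix}/transform-output',
    accept='text/csv',
)
transform_step = TransformStep(
    name='BatchTransform',
    transformer=transformer,
    inputs=TransformInput(data=test_uri, content_type='text/csv', split_type='Line'),
)

# Evaluation step: compute MAE/RMSE in an SKLearn processing job (evaluation.py)
evaluation_report = PropertyFile(
    name='EvaluationReport', output_name='evaluation', path='evaluation_metrics.json'
)
sklearn_processor = SKLearnProcessor(
    framework_version='1.2-1', role=role, instance_type='ml.m5.xlarge', instance_count=1
)
evaluation_step = ProcessingStep(
    name='EvaluateForecasts',
    processor=sklearn_processor,
    inputs=[
        ProcessingInput(source=transformer.output_path, destination='/opt/ml/processing/predictions'),
        ProcessingInput(source=test_uri, destination='/opt/ml/processing/test'),
    ],
    outputs=[ProcessingOutput(output_name='evaluation', source='/opt/ml/processing/evaluation')],
    code='evaluation.py',
    property_files=[evaluation_report],
)

# Conditional registration: register the model only if the MAE is acceptable
register_step = RegisterModel(
    name='RegisterForecastModel',
    model=automl_sm_model,
    content_types=['text/csv'],
    response_types=['text/csv'],
    inference_instances=['ml.m5.xlarge'],
    transform_instances=['ml.m5.xlarge'],
    model_package_group_name='ts-forecasting-models',
    approval_status='PendingManualApproval',
)
condition_step = ConditionStep(
    name='CheckMAE',
    conditions=[
        ConditionLessThanOrEqualTo(
            left=JsonGet(
                step_name=evaluation_step.name,
                property_file=evaluation_report,
                json_path='mae',
            ),
            right=100.0,  # illustrative threshold
        )
    ],
    if_steps=[register_step],
    else_steps=[],
)

pipeline = Pipeline(
    name='ts-forecasting-batch-inference',
    steps=[transform_step, evaluation_step, condition_step],
)
pipeline.upsert(role_arn=role)
execution = pipeline.start()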

Inferencing with Amazon SageMaker Endpoint in (near) real-time

But what if you want to run inference in real-time or asynchronously? SageMaker real-time endpoint inference offers the capability to deliver immediate predictions from deployed machine learning models, crucial for scenarios demanding quick decision making. When an application sends a request to a SageMaker real-time endpoint, it processes the data in real time and returns the prediction almost immediately. This setup is optimal for use cases that require near-instant responses, such as personalized content delivery, immediate fraud detection, and live anomaly detection.

  • Usage: Suited for on-demand forecasts, such as predicting next week’s sales for a specific product at a particular location.
  • Process: We deployed the model as a SageMaker endpoint, allowing us to make real-time predictions by sending requests with the required input data.

Deployment involves specifying the number of instances and the instance type to serve predictions. This step creates an HTTPS endpoint that your applications can invoke to perform real-time predictions.

# Deploy the model to a SageMaker endpoint
predictor = automl_sm_model.deploy(initial_instance_count=1, endpoint_name=endpoint_name, instance_type='ml.m5.xlarge')

The deployment process is asynchronous, and SageMaker takes care of provisioning the necessary infrastructure, deploying your model, and ensuring the endpoint’s availability and scalability. After the model is deployed, your applications can start sending prediction requests to the endpoint URL provided by SageMaker.
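
After the endpoint is in service, it can be invoked with the SageMaker runtime client. The CSV payload below is an illustrative sketch of recent history for a single product-location series; the exact columns and format expected depend on how the model was trained, so adapt it to your dataset:

import boto3

runtime = boto3.client('sagemaker-runtime')

# Illustrative CSV payload with recent history for one product-location series
payload = (
    'product_code,location_code,timestamp,unit_sales\n'
    'P001,L001,2024-02-04,120\n'
    'P001,L001,2024-02-11,135\n'
)

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='text/csv',
    Body=payload,
)
print(response['Body'].read().decode('utf-8'))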

While real-time inference is suitable for many use cases, there are scenarios where a slightly relaxed latency requirement can be beneficial. SageMaker Asynchronous Inference provides a queue-based system that efficiently handles inference requests, scaling resources as needed to maintain performance. This approach is particularly useful for applications that require processing of larger datasets or complex models, where the immediate response is not as critical.

  • Usage: Examples include generating detailed reports from large datasets, performing complex calculations that require significant computational time, or processing high-resolution images or lengthy audio files. This flexibility makes it a complementary option to real-time inference, especially for businesses that face fluctuating demand and seek to maintain a balance between performance and cost.
  • Process: The process of using asynchronous inference is straightforward yet powerful. Users submit their inference requests to a queue, from which SageMaker processes them sequentially. This queue-based system allows SageMaker to efficiently manage and scale resources according to the current workload, ensuring that each inference request is handled as promptly as possible (see the deployment sketch after this list).
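
The following is a minimal sketch of deploying the same SageMaker model behind an asynchronous endpoint; the output path and endpoint name are illustrative:

from sagemaker.async_inference import AsyncInferenceConfig

# Asynchronous results are written to Amazon S3 rather than returned inline
async_config = AsyncInferenceConfig(
    output_path=f's3://{bucket}/{prefix}/async-output',
)

async_predictor = automl_sm_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge',
    endpoint_name=f'{endpoint_name}-async',
    async_inference_config=async_config,
)

Requests are then submitted by pointing the InvokeEndpointAsync API at a payload stored in Amazon S3, and the prediction is written to the configured output path when processing completes.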

Clean up

To avoid incurring unnecessary charges and to tidy up resources after completing the experiments or running the demos described in this post, follow these steps to delete all deployed resources:

  1. Delete the SageMaker endpoints: To delete any deployed real-time or asynchronous endpoints, use the SageMaker console or the AWS SDK (see the sketch after this list). This step is crucial because endpoints can accrue significant charges if left running.
  2. Delete the SageMaker Pipeline: If you have set up a SageMaker Pipeline, delete it to ensure that there are no residual executions that might incur costs.
  3. Delete S3 artifacts: Remove all artifacts stored in your S3 buckets that were used for training, storing model artifacts, or logging. Ensure you delete only the resources related to this project to avoid data loss.
  4. Clean up any additional resources: Depending on your specific implementation and additional setup modifications, there may be other resources to consider, such as roles or logs. Check your AWS Management Console for any resources that were created and delete them if they are no longer needed.
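
The following is a minimal cleanup sketch using boto3; the resource names match those used earlier in this post, so adjust them to your deployment (for example, if your endpoint configuration uses a different name than the endpoint):

import boto3

sm_client = boto3.client('sagemaker')

# Delete the real-time (or asynchronous) endpoint and its configuration
sm_client.delete_endpoint(EndpointName=endpoint_name)
sm_client.delete_endpoint_config(EndpointConfigName=endpoint_name)

# Delete the SageMaker model created from the best candidate
sm_client.delete_model(ModelName=best_candidate_name)

# Delete the batch inference pipeline (name is illustrative)
sm_client.delete_pipeline(PipelineName='ts-forecasting-batch-inference')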

Conclusion

This post illustrates the effectiveness of Amazon SageMaker AutoMLV2 for time series forecasting. By carefully preparing the data, thoughtfully configuring the model, and using both batch and real-time inference, we demonstrated a robust methodology for predicting future sales. This approach not only saves time and resources but also empowers businesses to make data-driven decisions with confidence.

If you’re inspired by the possibilities of time series forecasting and want to experiment further, consider exploring the SageMaker Canvas UI. SageMaker Canvas provides a user-friendly interface that simplifies the process of building and deploying machine learning models, even if you don’t have extensive coding experience.

Visit the SageMaker Canvas page to learn more about its capabilities and how it can help you streamline your forecasting projects. Begin your journey towards more intuitive and accessible machine learning solutions today!


About the Authors

Nick McCarthy is a Senior Machine Learning Engineer at AWS, based in London. He has worked with AWS clients across various industries including healthcare, finance, sports, telecoms and energy to accelerate their business outcomes through the use of AI/ML. Outside of work he loves to spend time travelling, trying new cuisines and reading about science and technology. Nick has a Bachelors degree in Astrophysics and a Masters degree in Machine Learning.

Davide Gallitelli is a Senior Specialist Solutions Architect for AI/ML in the EMEA region. He is based in Brussels and works closely with customers throughout Benelux. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML at university, and has fallen in love with it since then.

Read More

Automate user on-boarding for financial services with a digital assistant powered by Amazon Bedrock


In this post, we present a solution that harnesses the power of generative AI to streamline the user onboarding process for financial services through a digital assistant. Onboarding new customers in the banking industry is a crucial step in the customer journey, involving a series of activities designed to fulfill know your customer (KYC) requirements, conduct necessary verifications, and introduce them to the bank’s products or services. Traditionally, customer onboarding has been a tedious and heavily manual process. Our solution provides practical guidance on addressing this challenge by using a generative AI assistant on AWS.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Using Anthropic’s Claude 3.5 Sonnet on Amazon Bedrock, we build a digital assistant that automates document processing, identity verifications, and engages customers through conversational interactions. As a result, customers can be onboarded in a matter of minutes through secure, automated workflows. In this post we provide you a solution and the accompanying code that banks can use to dramatically enhance the customer experience and establish a strong customer relationship from the outset.

Challenges with traditional onboarding

The traditional onboarding process for banks faces challenges in the current digital landscape because many institutions don’t have fully automated account-opening systems. While customers in other sectors have access to intelligent assistants, those in banking often encounter legacy processes. As the financial services industry adapts to changing consumer expectations, there’s a need to address the demand for instant and 24/7 availability of services.

The challenges associated with the manual onboarding process include, but aren’t limited to, the following:

  • Time-consuming paperwork – New customers are asked to manually fill out extensive paperwork including account opening forms, disclosures, and so on. Reviewing physical documents also takes up valuable staff time. This lengthy paperwork process can result in slow onboarding and a poor customer experience.
  • Security risks – Paper documents and in-person ID verification lack security compared to digital processes because of their susceptibility to tampering, loss, and lack of traceability. For example, there’s a greater risk of identity theft and fraud with physical documents, because they can be altered or misplaced without leaving an audit trail.
  • Accessibility issues – Requiring in-person account opening at branches can create accessibility challenges for many customers, including senior citizens and disabled individuals.
  • Limited service hours – The account opening process is available only during branch operating hours, which limits the timeframe when customers can complete the onboarding process. This constraint impacts the flexibility for customers to initiate account opening at their preferred time.
  • High costs – Manual paperwork processing and in-person verification are labor-intensive tasks that require significant staff time and resources, leading to high operational costs.

AI-powered services enable automated, secure, and compliant processes for self-service account opening. Providing onboarding experiences aligned with current digital standards might offer a competitive edge for banks in the future.

Solution overview

The solution allows users to open bank accounts remotely through a conversational interface, eliminating the need to visit a physical branch. We created a digital assistant named Penny to guide users through the process, including uploading KYC documents and facilitating identity verification using document scanning and facial recognition. The approach uses Retrieval Augmented Generation (RAG), which combines text generation capabilities with database querying to provide contextually relevant responses to customer inquiries. Implementing digital onboarding reduces the accessibility barriers present in traditional manual account opening processes. The code for this solution is available in a GitHub repository.

The brain of our application is a custom LangChain Agent. When a user wants to open a new bank account, the agent will help them complete the onboarding process using preconfigured stages corresponding to each onboarding step. Each stage might use a LangChain tool, allowing for the automation and orchestration of onboarding. These tools call on AWS service APIs for the required functionality.
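
As an illustration of this pattern (not the exact code from the repository), a LangChain tool that wraps one of the backend API calls might be defined as follows; the endpoint URL and payload shape are assumptions:

import requests
from langchain.agents import Tool

API_BASE_URL = 'https://example.execute-api.us-east-1.amazonaws.com/prod'  # illustrative

def validate_email(email: str) -> str:
    """Check whether an account already exists for the given email address."""
    response = requests.post(f'{API_BASE_URL}/email-validation', json={'email': email})
    response.raise_for_status()
    return response.text  # the agent reads this observation to decide its next step

email_validation_tool = Tool(
    name='EmailValidation',
    func=validate_email,
    description="Validates a customer's email and checks for an existing account.",
)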

The following figure represents the high-level architecture of the proposed solution.

User on-boarding architecture diagram

The flow of the application is as follows:

  1. Users access the frontend website hosted within AWS Amplify. AWS Amplify is an end-to-end solution that enables frontend web developers to build and deploy secure, scalable full stack applications.
  2. The website invokes an Amazon CloudFront endpoint to interact with the digital assistant, Penny, which is containerized and deployed in AWS Fargate. Fargate is a serverless compute engine for containers that manages and scales your containers for you, compatible with Amazon Elastic Container Service (Amazon ECS).
  3. The digital assistant uses a custom LangChain agent to answer questions on the bank’s products and services and orchestrate the onboarding flow.
  4. If the user asks a general question related to the bank’s products or services, the agent will use a custom LangChain tool called ProductSearch. This tool uses Amazon Kendra linked with an Amazon Simple Storage Service (Amazon S3) data source that contains the bank’s data. Amazon Kendra is an intelligent enterprise search service powered by machine learning that enables companies to index and search content across their document stores.
  5. If the user indicates that they want to open a new account, the agent will prompt the user for their email. After the user responds, the application will invoke a custom LangChain tool called EmailValidation. This tool checks if there is an existing account in the bank’s Amazon DynamoDB database by calling an endpoint deployed in Amazon API Gateway.
  6. After the email validation, KYC information is gathered, such as first and last name. Then, the user is prompted for an identity document, which is uploaded to Amazon S3.
  7. The agent will invoke a custom LangChain tool called IDVerification. This tool checks if the user details entered during the session match the ID by calling an endpoint deployed in Amazon API Gateway. The details are verified by extracting the document text using Amazon Textract, a machine learning (ML) service that automatically extracts text, handwriting, layout elements, and data from scanned documents.
  8. After the ID verification, the user is asked for a selfie. The image is uploaded to Amazon S3. Then, the agent will invoke a custom LangChain tool called SelfieVerification. This tool checks if the uploaded selfie matches the face on the ID by calling an endpoint deployed in API Gateway. The face match is detected using Amazon Rekognition, which offers pre-trained and customizable computer vision (CV) capabilities to extract information and insights from your images and videos (a minimal sketch of this check follows the list).
  9. After the face verification is successful, the agent will use a custom LangChain tool called SaveData. This tool creates a new account in the bank’s DynamoDB database by calling an endpoint deployed in API Gateway.
  10. The user is notified that their new account has been created successfully, using Amazon Simple Email Service (Amazon SES).
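
As a minimal sketch of the face comparison in step 8 (the bucket, object keys, and threshold are illustrative, and the actual repository wraps this logic behind an API Gateway endpoint), the check with Amazon Rekognition might look like this:

import boto3

rekognition = boto3.client('rekognition')

def selfie_matches_id(bucket: str, id_key: str, selfie_key: str, threshold: float = 90.0) -> bool:
    """Return True if the selfie matches the face on the identity document."""
    response = rekognition.compare_faces(
        SourceImage={'S3Object': {'Bucket': bucket, 'Name': id_key}},
        TargetImage={'S3Object': {'Bucket': bucket, 'Name': selfie_key}},
        SimilarityThreshold=threshold,
    )
    # compare_faces returns only matches at or above the similarity threshold
    return len(response['FaceMatches']) > 0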

Prompt design for agent orchestration

Now, let’s take a look at how we give our digital assistant, Penny, the capability to handle onboarding for financial services. The key is the prompt engineering for the custom LangChain agent. This has been specified in PennyAgent.py. This prompt includes onboarding stages and relevant LangChain tools that the agent might need to complete the onboarding steps.

To begin, we provide the agent with a name, role and company.

AGENT_TOOLS_PROMPT = """
Never forget your name is {assistant_name}. You work as a {assistant_role}.
You work at company named {bank_name}

Next, we define the various stages of onboarding and specify the respective tools and expected responses. Having stages in a sequential and structured format while also providing awareness of all possible stages helps the agent determine the onboarding stage with accuracy.

<STAGES>

These are the stages:

Introduction or greeting:  When conversation history is empty, choose stage 1
Response: Start the conversation with a greeting. Say that you can help with {bank_name} related questions or open a bank account for them. Do this only during the start of the conversation.
Tool: 
    
General Banking Questions: Customer asks general questions about AnyBank
Response: Use ProductSearch tool to get the relevant information and answer the question like a banking assistant. Never assume anything.
Tool: ProductSearch
    
Account Open 1: Customer has requested to open an account.
Response: Customer has requested to open an account. Now, respond with a question asking for the customer's email address only to get them started with onboarding. We need the email address to start the process.
Tool:
    
Account Open 2: User provided their email.
Response: Take the email and validate it using a EmailValidation tool. If it is valid and there is no existing account with the email, ask for account type: either CHEQUING or SAVINGS. If it is invalid or there is an existing account with the email, the user must try again. 
Tool: EmailValidation
    
Account Open 3: User provided which account type to open.
Response: Ask the user for their first name
Tool: 

Account Open 4: User provided first name.
Response: Ask the user for their last name
Tool: 

Account Open 5: User provided last name.
Response: Ask the user to upload an identity document.
Tool:
    
Account Open 6: Penny asked for identity document and then System notified that a new file has been uploaded
Response: Take the identity file name and verify it using the IDVerification tool. If the verification is unsuccessful, ask the user to try again. 
Tool: IDVerification
    
Account Open 7: The ID document is valid. 
Response: Ask the user to upload their selfie to compare their face to the ID.
Tool:
    
Account Open 8: Penny asked user for their selfie and then "System notified that a file has been uploaded. "
Response: Take the "selfie" file name and verify it using the SelfieVerification tool. If there is no face match, ask the user to try again.
Tool: SelfieVerification: Use this tool to verify the user selfie and compare faces. 
    
Account Open 9: Face match verified
Response: Give the summary of the all the information you collected and ask user to confirm. 
Tool:
        
Account Open 10: Confirmation
Response: Save the user data for future reference using SaveData tool. Upon saving the data, let the user know that they will receive an email confirmation of the bank account opening.
Tool: SaveData

We append the tools, their descriptions, and their response formats to the prompt. When calling on a specific tool, the agent can generate input parameters as required. Access to all the tools helps the agent identify the best tool choice based on the conversation stage.

TOOLS:
------
Penny has access to the following tools:
{tools}

We include some guidelines that the agent needs to follow while generating outputs. By using emotion-based prompt engineering, we minimize hallucinations and deviation from expected outputs. These guidelines were chosen after extensive testing to minimize edge cases and help prevent common agent mistakes.

<GUIDELINES>

1. If you ever assume any user response without asking, it may cause significant consequences.
2. It is of high priority that you respond and use appropriate tools in their respective stages. If not, it may cause significant consequences.
3. It is of high priority that you never reveal the tools or tool names to the user. Only communicate the outcome.
4. It is critical that you never reveal any details provided by the System including file names. 
5. If ever the user deviates by asking general question during your account opening process, Retrieve the necessary information using 'ProductSearch' tool and answer the question. With confidence, ask user if they want to resume the account opening process and continue from where we left off. 

The agent uses the ReAct framework to make decisions about how to respond based on user input. ReAct provides the agent with a thinking structure, through which it selects the most appropriate tool for a given task. Such frameworks make LLM agents versatile and adaptable to different use cases.

Based on the stage descriptions and the tools available, if the LLM generates a response that requires access to an external tool, then the response of the LLM will include Thought, Decision, Action, Action Input, and Observation. The agent comes with a string matcher, which will detect Action and Action Input from the LLM’s response and trigger the respective tool. Based on the response from the tool, the LLM will decide whether to proceed with the Final Answer, and then the output will be returned by the agent.

FORMAT:
------

To use a tool, please always use the following format:
```
Thought: {input}
Decision: Do I need to use a tool? y
Action: what tool to use, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
```
When I am finished, I will have a response like this: 

Final Answer: [your response as a banking assistant]

Finally, we give the agent access to the conversation history to better decide what stage the conversation is currently in. In addition, we also give access to an agent scratchpad where it can store its thought processes to execute certain actions.

Be confident that you are a banking assistant and only respond with final answer.
Begin!

<Conversation history>
{conversation_history}

{agent_scratchpad}

Orchestrating intelligent digital assistants requires thoughtful prompt engineering to handle complex tasks. By structuring the conversation into stages, providing tooling, and setting guidelines, we enable the assistant to systematically complete the onboarding process. This approach allows assistants to scale across use cases while maintaining accuracy. With the right guardrails, assistants can deliver smooth, trustworthy customer experiences.

Prompt design is key to unlocking the versatility of LLMs for real-world automation. Amazon Bedrock Prompt Management can be used to streamline the creation, evaluation, versioning and testing of prompts. This will help developers and prompt engineers save time by applying the same prompt to different onboarding processes. When you create a prompt, you can select a different model for inference and adjust the variables to obtain the best-suited results for a variety of workflows.

The following sections explain how to deploy the solution in your AWS account.

Note: Running this workload would have an estimated hourly cost of $1.34 for the Oregon (us-west-2) AWS Region. Check the pricing details for each service to understand the costs you might be charged for different usage tiers and resource configurations.

Setup

To deploy the agent, visit the project GitHub repository, and use the following instructions:

  1. Ensure the prerequisites are completed as mentioned in the README.
  2. Deploy the solution including the agent, tools infrastructure, and demo application—in that order—based on the instructions in the README.
  3. After the deployment is successful, visit the outputted domain where the demo application is running. You can now begin testing the agent.

Testing the agent

Begin your exploration by accessing the Amplify endpoint where the demonstration is hosted. The demonstration incorporates an interactive chat interface, enabling you to engage in a conversational exchange with the digital assistant, Penny. Whenever you want to initiate a new instance of the agent, refresh the web page.

Let’s start talking to Penny:

  1. Enter Hi

Penny will respond with a friendly greeting.

  2. Enter What are the cutoff times to receive wire transfers on the same day?

Penny will use the ProductSearch tool to find the relevant information from the loaded product catalog. You can try asking other questions about the bank’s product or services including the AnyBank Travel Rewards Visa Infinite Card or New Vehicle Loans.

  3. Enter I would like to open a new bank account

Penny will recognize that the account opening flow needs to be initiated and will proceed with the first step, which is asking you for an email address.

Open bank account

  4. Enter the verified customer email you registered with the Amazon SES identity. For our demonstration, we will use anup@test.com (the SesCustomerEmail parameter used in the example command to set up the infrastructure)

Penny will take the email address and run the EmailValidation Tool. If there is an existing account with this email, it will ask you to retry. Otherwise, it will move on to the next step which is gathering your account type.

  5. Enter I want a savings account or indicate that you want a checking account.

Penny will record your account type and move on to the KYC questions.

  6. Enter Anup

Penny will record your first name and continue gathering the remaining KYC information.

KYC information

  7. Enter Ravi

It will record your last name and prompt you for an ID next. We used Ravi to match the ID document provided below.

  8. Download the picture ID. It’s also located at ./api/lambdas/test/passport.png

Sample passport

Upload it to the chat by selecting Choose File.

After uploading the image, you will receive a confirmation message on the chat stating We have received your document. Penny will use ID Verification to compare the name entered during the session to the document. After verification is complete, Penny will prompt you to upload a selfie.

  9. Upload the selfie located at ./api/lambdas/test/selfie.png to the chat by selecting Choose File.

Sample selfie

After the upload is complete, you will receive a confirmation message on the chat stating We have received your document. Penny will use Selfie Verification to compare the face on the ID to the selfie for a face match. After verification is complete, Penny will prompt you to confirm that you want to proceed.

ID verification

  10. Enter Yes I confirm

Confirmation email

Penny will use Create Account to complete the onboarding process and send an email confirmation. It will inform you of this update in the chat.

New account creation

Check the customer email you used. The email address specified as the SesCustomerEmail parameter (in this example: anup@test.com) during setup will receive a new email from the email address you set as the SesBankEmail parameter (in this example: owner@anybank.com).

  11. Go to the DynamoDB console, select Tables from the navigation pane, and select the table created by the AWS CloudFormation stack. This is the accounts table in the bank’s AWS account. From the Table page, choose Explore items. You will see a new account created with the details that you entered.

Account creation DynamoDB

Guardrails and security

Security is a critical part of any application and must be rigorously addressed when developing and deploying solutions, especially those that involve handling sensitive data or interacting with users. For a solution similar to the example in this post, several robust security measures should be implemented to maintain the confidentiality, integrity, and availability of the system.

  • Address the security of the service itself. One approach to mitigate potential biases, toxicity, or other undesirable outputs is to use Constitutional AI techniques, such as those provided by the LangChain library or Guardrails for Amazon Bedrock. By defining and enforcing a set of rules or constraints, the system can be trained to generate outputs that align with predefined ethical principles and values, thereby enhancing the trustworthiness and reliability of the service.
  • To maintain data protection and privacy, implementing a write-only database architecture is recommended. In this setup, the agent or service can write data to the database but is prohibited from reading or retrieving sensitive stored information. This measure effectively isolates sensitive user data, making sure that the agent would be unable to access or disclose confidential details even in the event of a compromise.
  • Prompt injection attacks, where malicious inputs are crafted to manipulate the system’s behavior, are a serious concern in conversational AI systems. To mitigate this risk, it’s crucial to implement robust input validation and sanitization mechanisms. This could include techniques like whitelisting permissible characters, filtering out potentially harmful patterns, and employing context-aware input processing.
  • Secure coding practices, such as input validation, output encoding, and proper error handling, should be rigorously followed throughout the development process. Regular security audits, penetration testing, and vulnerability assessments should be conducted to identify and address potential weaknesses in the system.
  • Amazon API Gateway, a fully managed service, securely handles API traffic, acting as a front door for applications running on AWS. It supports multiple security mechanisms, including AWS Identity and Access Management (IAM) for authentication and authorization, AWS WAF for web application protection, AWS Secrets Manager for securely storing and retrieving secrets, and integration with AWS CloudTrail for API activity logging. API Gateway also supports client-side SSL certificates, API keys, and resource policies for granular access control.
  • Communication between users, the solution, and its internal dependencies should be protected using TLS to encrypt data in transit.
  • Additionally, the data should be encrypted using data-at-rest encryption with AWS Key Management Service (AWS KMS) customer managed keys (CMK).

By implementing these robust security measures and fostering a culture of continuous security awareness and improvement, the solution can better protect against potential threats, safeguard user privacy, and maintain the integrity and reliability of the service.

Cleanup

Follow the cleanup instructions in the README of the GitHub repository to remove the environment from your account.

Conclusion

In this post, we presented an end-to-end solution that demonstrates how banks can transform user onboarding with an AI-powered digital assistant. By orchestrating workflows across AWS services, we enabled automated, secure account opening within minutes. The conversational interface delivers exceptional customer experiences while reducing operational costs.

This solution can be quickly deployed and enhanced using the features of Amazon Bedrock. Amazon Bedrock Agents streamlines workflows by executing multistep tasks and integrating with company systems and data sources. Amazon Bedrock Knowledge Bases provides contextual information from proprietary data sources, enhancing the accuracy and relevance of responses. Additionally, Amazon Bedrock Guardrails implements safeguards to enable responsible AI usage, filtering harmful content and protecting sensitive information. These can enable a robust and secure deployment of an AI-powered onboarding solution.

Key outcomes of this solution include:

  • Fully digital onboarding without paper forms or branch visits
  • Automated KYC verification using documents and facial recognition
  • Customers onboarded securely in minutes with email confirmations
  • Lower costs by reducing manual verification workloads
  • Personalized assistance for any product questions 24/7

Instant, secure, and scalable delivery has become the norm that customers demand. This AI assistant solution, powered by AWS, showcases the potential future of user onboarding for financial institutions. As consumer behaviors and expectations continue to be influenced by the latest digital experiences across industries, banks that invest in advanced technologies will gain a competitive edge over their rivals.

Ready to future-proof your banking experience? Visit Artificial Intelligence and Machine learning for Financial services with AWS.


About the authors

Anup Ravindranath is a Senior Solutions Architect at Amazon Web Services (AWS) based in Toronto, Canada working with Financial Services organizations. He helps customers to transform their businesses and innovate on cloud.

Arya Subramanyam is a Solutions Architect based in Toronto, Canada. She works with Enterprise Greenfield customers as well as Small & Medium businesses as a technical advisor, helping them solve business challenges with cloud solutions. Arya holds a Bachelor of Applied Science in Computer Engineering from the University of British Columbia, Vancouver. Her passion for Generative AI has led her to develop various solutions leveraging Large Language Models (LLMs) with a focus on prompt engineering and AI agents.

Venkata Satyanarayana Chivatam is a Solutions Architect at AWS. He specializes in Generative AI and Computer Vision, with a particular focus on driving adoption across industries such as healthcare and finance. At AWS, he helps ISV and SMB customers leverage cutting-edge AI technologies to unlock new possibilities and solve complex challenges. He is passionate about supporting businesses of all sizes in their AI journey.

Akshata Ramesh Rao is a Solutions Architect in Toronto, Canada. Akshata works with enterprise customers to accelerate innovation and advise them through technical challenges. She also loves working with SMB customers and helping them reach their business objectives quickly, safely, and cost-effectively with AWS services, frameworks, and best practices. Prior to joining AWS, Akshata worked as a DevOps engineer at Amazon and holds a master’s degree in computer science from the University of Ottawa.

Read More

US Healthcare System Deploys AI Agents, From Research to Rounds


The U.S. healthcare system is adopting digital health agents to harness AI across the board, from research laboratories to clinical settings.

The latest AI-accelerated tools — on display at the NVIDIA AI Summit taking place this week in Washington, D.C. — include NVIDIA NIM, a collection of cloud-native microservices that support AI model deployment and execution, and NVIDIA NIM Agent Blueprints, a catalog of pretrained, customizable workflows. 

These technologies are already in use in the public sector to advance the analysis of medical images, aid the search for new therapeutics and extract information from massive PDF databases containing text, tables and graphs. 

For example, researchers at the National Cancer Institute, part of the National Institutes of Health (NIH), are using several AI models built with NVIDIA MONAI for medical imaging — including the VISTA-3D NIM foundation model for segmenting and annotating 3D CT images. A team at NIH’s National Center for Advancing Translational Sciences (NCATS) is using the NIM Agent Blueprint for generative AI-based virtual screening to reduce the time and cost of developing novel drug molecules.

With NVIDIA NIM and NIM Agent Blueprints, medical researchers across the public sector can jump-start their adoption of state-of-the-art, optimized AI models to accelerate their work. The pretrained models are customizable based on an organization’s own data and can be continually refined based on user feedback.

NIM microservices and NIM Agent Blueprints are available at ai.nvidia.com and accessible through a wide variety of cloud service providers, global system integrators and technology solutions providers. 

Building With NIM Agent Blueprints

Dozens of NIM microservices and a growing set of NIM Agent Blueprints are available for developers to experience and download for free. They can be deployed in production with the NVIDIA AI Enterprise software platform.

  • The blueprint for generative virtual screening for drug discovery brings together three NIM microservices to help researchers search and optimize libraries of small molecules to identify promising candidates that bind to a target protein.
  • The multimodal PDF data extraction blueprint uses NVIDIA NeMo Retriever NIM microservices to extract insights from enterprise documents, helping developers build powerful AI agents and chatbots.
  • The digital human blueprint supports the creation of interactive, AI-powered avatars for customer service. These avatars have potential applications in telehealth and nonclinical aspects of patient care, such as scheduling appointments, filling out intake forms and managing prescriptions.

Two new NIM microservices for drug discovery are now available on ai.nvidia.com to help researchers understand how proteins bind to target molecules, a crucial step in drug design. By conducting more of this preclinical research digitally, scientists can narrow down their pool of drug candidates before testing in the lab — making the discovery process more efficient and less expensive. 

With the AlphaFold2-Multimer NIM microservice, researchers can accurately predict protein structures from their sequences in minutes, reducing the need for time-consuming tests in the lab. The RFdiffusion NIM microservice uses generative AI to design novel proteins that are promising drug candidates because they’re likely to bind with a target molecule.

NCATS Accelerates Drug Discovery Research

ASPIRE, a research laboratory at NCATS, is evaluating the NIM Agent Blueprint for virtual screening and is using RAPIDS, a suite of open-source software libraries for GPU-accelerated data science, to accelerate its drug discovery research. Using the cuGraph library for graph data analytics and cuDF library for accelerating data frames, the lab’s researchers can map chemical reactions across the vast unknown chemical space. 
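
The sketch below gives a rough sense of how a reaction network could be explored with those two RAPIDS libraries. It is a minimal illustration rather than the NCATS pipeline: the edge list, the column names and the choice of PageRank as a centrality measure are all assumptions.

```python
# Minimal RAPIDS sketch (illustrative only, not the NCATS pipeline):
# hold a hypothetical reactant -> product edge list in cuDF and rank
# compounds by centrality with cuGraph, all on the GPU.
import cudf
import cugraph

edges = cudf.DataFrame({
    "reactant": ["A", "A", "B", "C"],   # placeholder compound IDs
    "product":  ["B", "C", "D", "D"],
})

G = cugraph.Graph()
G.from_cudf_edgelist(edges, source="reactant", destination="product")

# PageRank as a simple stand-in for "how connected is this compound?"
scores = cugraph.pagerank(G)
print(scores.sort_values("pagerank", ascending=False).head())
```

On real data the same handful of calls scale to edge lists with millions of reactions, which is where the GPU acceleration pays off.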

The NCATS informatics team reported that with NVIDIA AI, processes that used to take hours on CPU-based infrastructure are now done in seconds.

Massive quantities of healthcare data — including research papers, radiology reports and patient records — are unstructured and locked in PDF documents, making it difficult for researchers to quickly search for information. 

The Genetic and Rare Diseases Information Center, also run by NCATS, is exploring using the PDF data extraction blueprint to develop generative AI tools that enhance the center’s ability to glean information from previously unsearchable databases. These tools will help answer questions from those affected by rare diseases.

“The center analyzes data sources spanning the National Library of Medicine, the Orphanet database and other institutes and centers within the NIH to answer patient questions,” said Sam Michael, chief information officer of NCATS. “AI-powered PDF data extraction can make it massively easier to extract valuable information from previously unsearchable databases.”  

Mi-NIM-al Effort, Maximum Benefit: Getting Started With NIM 

A growing number of startups, cloud service providers and global systems integrators include NVIDIA NIM microservices and NIM Agent Blueprints as part of their platforms and services, making it easy for federal healthcare researchers to get started.   

Abridge, an NVIDIA Inception startup and NVentures portfolio company, was recently awarded a contract from the U.S. Department of Veterans Affairs to help transcribe and summarize clinical appointments, reducing the burden on doctors to document each patient interaction.

The company uses NVIDIA TensorRT-LLM to accelerate AI inference and NVIDIA Triton Inference Server for deploying its audio-to-text and content summarization models at scale, some of the same technologies that power NIM microservices.

The NIM Agent Blueprint for virtual screening is now available through AWS HealthOmics, a purpose-built service that helps customers orchestrate biological data analyses. 

Amazon Web Services (AWS) is a partner of the NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability Initiative, aka STRIDES Initiative, which aims to modernize the biomedical research ecosystem by reducing economic and process barriers to accessing commercial cloud services. NVIDIA and AWS are collaborating to make NIM Agent Blueprints broadly accessible to the biomedical research community. 

ConcertAI, another NVIDIA Inception member, is an oncology AI technology company focused on research and clinical standard-of-care solutions. The company is integrating NIM microservices, NVIDIA CUDA-X microservices and the NVIDIA NeMo platform into its suite of AI solutions for large-scale clinical data processing, multi-agent models and clinical foundation models. 

NVIDIA NIM microservices are supporting ConcertAI’s high-performance, low-latency AI models through its CARA AI platform. Use cases include clinical trial design, optimization and patient matching — as well as solutions that can help boost the standard of care and augment clinical decision-making.

Global systems integrator Deloitte is bringing the NIM Agent Blueprint for virtual screening to its customers worldwide. With Deloitte Atlas AI, the company can help clients at federal health agencies easily use NIM to adopt and deploy the latest generative AI pipelines for drug discovery. 

Experience NVIDIA NIM microservices and NIM Agent Blueprints today.

NVIDIA AI Summit Highlights Healthcare Innovation

At the NVIDIA AI Summit in Washington, NVIDIA leaders, customers and partners are presenting over 50 sessions highlighting impactful work in the public sector. 

Register for a free virtual pass to hear how healthcare researchers are accelerating innovation with NVIDIA-powered AI.

Watch the AI Summit special address by Bob Pette, vice president of enterprise platforms at NVIDIA.

Accelerated Computing Key to Yale’s Quantum Research

A recently released joint research paper by Yale, Moderna and NVIDIA reviews how techniques from quantum machine learning (QML) may enhance drug discovery methods by better predicting molecular properties.

Ultimately, this could lead to the more efficient generation of new pharmaceutical therapies.

The review also emphasizes that the key tool for exploring these methods is GPU-accelerated simulation of quantum algorithms.

The study focuses on how future quantum neural networks can use quantum computing to enhance existing AI techniques.

Applied to the pharmaceutical industry, these advances offer researchers the ability to streamline complex tasks in drug discovery.

Researching how such quantum neural networks impact real-world use cases like drug discovery requires intensive, large-scale simulations of future noiseless quantum processing units (QPUs).

This is just one example of how, as quantum computing scales up, an increasing number of challenges are only approachable with GPU-accelerated supercomputing.

The review article explores how NVIDIA’s CUDA-Q quantum development platform provides a unique tool for running such multi-GPU accelerated simulations of QML workloads.

The study also highlights CUDA-Q’s ability to simulate multiple QPUs in parallel. This is a key ability for studying realistic large-scale devices, which, in this particular study, also allowed for the exploration of quantum machine learning tasks that batch training data.

Many of the QML techniques covered by the review — such as hybrid quantum convolution neural networks — also require CUDA-Q’s ability to write programs interweaving classical and quantum resources.
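
To make that hybrid picture concrete, here is a minimal CUDA-Q sketch of the kind of parameterized circuit that underlies a quantum neural network. It is an assumption-laden illustration rather than the paper’s model: the two-qubit ansatz, the observable and the parameter values are placeholders, and the `nvidia` target simply selects a GPU-accelerated simulator.

```python
# Minimal CUDA-Q sketch (not the paper's model): simulate a tiny
# parameterized circuit, the building block of a quantum neural network,
# on a GPU-accelerated backend. Ansatz, observable and angles are illustrative.
import cudaq
from cudaq import spin

cudaq.set_target("nvidia")  # GPU-accelerated statevector simulation

@cudaq.kernel
def ansatz(theta: list[float]):
    q = cudaq.qvector(2)
    ry(theta[0], q[0])
    ry(theta[1], q[1])
    x.ctrl(q[0], q[1])  # entangling gate

hamiltonian = spin.z(0) * spin.z(1)  # toy observable

# Expectation value for one set of trainable parameters; a QML training
# loop would adjust theta to minimize a loss built from such values.
result = cudaq.observe(ansatz, hamiltonian, [0.3, 0.7])
print(result.expectation())
```

Classical code drives the parameter updates while the quantum kernel is simulated on GPUs, which is the interweaving of resources the review describes.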

The increased reliance on GPU supercomputing demonstrated in this work is the latest example of NVIDIA’s growing involvement in developing useful quantum computers.

NVIDIA plans to further highlight its role in the future of quantum computing at the SC24 conference, Nov. 17-22 in Atlanta.

A Not-So-Secret Agent: NVIDIA Unveils NIM Blueprint for Cybersecurity

Artificial intelligence is transforming cybersecurity with new generative AI tools and capabilities that were once the stuff of science fiction. And like many of the heroes in science fiction, they’re arriving just in time.

AI-enhanced cybersecurity can detect and respond to potential threats in real time — often before human analysts even become aware of them. It can analyze vast amounts of data to identify patterns and anomalies that might indicate a breach. And AI agents can automate routine security tasks, freeing up human experts to focus on more complex challenges.

All of these capabilities start with software, so NVIDIA has introduced an NVIDIA NIM Agent Blueprint for container security that developers can adapt to meet their own application requirements.

The blueprint uses NVIDIA NIM microservices, the NVIDIA Morpheus cybersecurity AI framework, NVIDIA cuVS and NVIDIA RAPIDS accelerated data analytics to help accelerate analysis of common vulnerabilities and exposures (CVEs) at enterprise scale — from days to just seconds.

All of this is included in NVIDIA AI Enterprise, a cloud-native software platform for developing and deploying secure, supported production AI applications.

Deloitte Secures Software With NVIDIA AI

Deloitte is among the first to use the NVIDIA NIM Agent Blueprint for container security in its cybersecurity solutions. The blueprint supports agentic analysis of open-source software to help enterprises build secure AI, and it can help them enhance and simplify cybersecurity by improving efficiency and reducing the time needed to identify threats and potential adversarial activity.

“Cybersecurity has emerged as a critical pillar in protecting digital infrastructure in the U.S. and around the world,” said Mike Morris, managing director, Deloitte & Touche LLP. “By incorporating NVIDIA’s NIM Agent Blueprint into our cybersecurity solutions, we’re able to offer our clients improved speed and accuracy in identifying and mitigating potential security threats.”

Securing Software With Generative AI

Vulnerability detection and resolution is a top use case for generative AI in software delivery, according to IDC(1).

The NIM Agent Blueprint for container security includes everything an enterprise developer needs to build and deploy customized generative AI applications for rapid vulnerability analysis of software containers.

Software containers incorporate large numbers of packages and releases, some of which may be subject to security vulnerabilities. Traditionally, security analysts would need to review each of these packages to understand potential security exploits across any software deployment.

These manual processes are tedious, time-consuming and error-prone. They’re also difficult to automate effectively because of the complexity of aligning software packages, dependencies, configurations and the operating environment.

With generative AI, cybersecurity applications can rapidly digest and decipher information across a wide range of data sources, including natural language, to better understand the context in which potential vulnerabilities could be exploited.

Enterprises can then create cybersecurity AI agents that take action on this generative AI intelligence. The NIM Agent Blueprint for container security enables quick, automatic and actionable CVE risk analysis using large language models and retrieval-augmented generation for agentic AI applications. It helps developers and security teams protect software with AI, improving accuracy and efficiency while surfacing potential issues for human analysts to investigate.
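
As a rough illustration of the retrieval step behind such a workflow, the sketch below ranks a couple of made-up advisory snippets against a question about a container package using TF-IDF similarity; a production blueprint would use NIM embedding and LLM microservices instead. The advisory text, the question and the scoring choice are all assumptions.

```python
# Toy retrieval sketch (not the blueprint's implementation): pick the
# advisory most relevant to a question, then assemble a prompt that an
# LLM microservice would answer. Advisory text and the question are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

advisories = [
    "CVE-2024-0001 affects libfoo versions below 1.2.3; fixed in 1.2.4.",
    "CVE-2024-0002 in libbar is exploitable only when debug mode is enabled.",
]
question = "Is the libfoo 1.2.0 package in this container image exploitable?"

vectorizer = TfidfVectorizer().fit(advisories + [question])
doc_vecs = vectorizer.transform(advisories)
query_vec = vectorizer.transform([question])

best = cosine_similarity(query_vec, doc_vecs).argmax()
prompt = f"Context: {advisories[best]}\n\nQuestion: {question}\nAnswer:"
print(prompt)  # a real pipeline would send this prompt to an LLM endpoint
```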

Blueprints for Cybersecurity Success

The new NVIDIA NIM Agent Blueprint for container security includes the NVIDIA Morpheus cybersecurity AI framework to reduce the time and cost associated with identifying, capturing and acting on threats. This brings a new level of security to the data center, cloud and edge.

The GPU-accelerated, end-to-end AI framework enables developers to create optimized applications for filtering, processing and classifying large volumes of streaming cybersecurity data.

Built on NVIDIA RAPIDS software, Morpheus accelerates data processing workloads at enterprise scale. It uses the power of RAPIDS cuDF for fast and efficient data operations, ensuring downstream pipelines harness all available GPU cores for complex agentic AI tasks.
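
To give a feel for what GPU data-frame operations look like in this kind of pipeline, here is a minimal cuDF sketch that filters a batch of container-scan records. It is not Morpheus code: the input file, the column names and the severity threshold are hypothetical.

```python
# Minimal cuDF sketch (not Morpheus itself): filter and summarize a batch
# of container-scan findings on the GPU. File name, columns and the 9.0
# severity cutoff are hypothetical placeholders.
import cudf

scans = cudf.read_json("scan_results.jsonl", lines=True)

# Keep only findings that carry a CVE identifier
flagged = scans[scans["cve_id"].notnull()]

# Findings per package, most affected first
per_package = flagged.groupby("package").size().sort_values(ascending=False)
print(per_package.head())

# How many findings are critical by CVSS score?
critical = flagged[flagged["cvss_score"] >= 9.0]
print(f"{len(critical)} critical findings out of {len(flagged)}")
```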

Morpheus also extends human analysts’ capabilities by automating real-time analysis and responses, producing synthetic data to train AI models that identify risks accurately and to run what-if scenarios.

The NVIDIA NIM Agent Blueprint for container security is available now. Learn more in the NVIDIA AI Summit DC special address.

(1) Source: IDC, GenAI Awareness, Readiness, and Commitment: 2024 Outlook — GenAI Plans and Implications for External Services Providers, AI-Ready Infrastructure, AI Platforms, and GenAI Applications US52023824, April 2024

From Concept to Compliance, MITRE Digital Proving Ground Will Accelerate Validation of Autonomous Vehicles

The path to safe, widespread autonomous vehicles is going digital.

MITRE — a government-sponsored nonprofit research organization — today announced its partnership with Mcity at the University of Michigan to develop a virtual and physical autonomous vehicle (AV) validation platform for industry deployment.

As part of this collaboration, announced during the NVIDIA AI Summit in Washington, D.C., MITRE will use Mcity’s simulation tools and a digital twin of its Mcity Test Facility, a real-world AV test environment in its Digital Proving Ground (DPG). The joint platform will deliver physically based sensor simulation enabled by NVIDIA Omniverse Cloud Sensor RTX APIs.

By combining these simulation capabilities with the MITRE DPG reporting framework, developers will be able to perform exhaustive testing in a simulated world to safely validate AVs before real-world deployment.

The current regulatory environment for AVs is highly fragmented, posing significant challenges for widespread deployment. Today, companies navigate regulations at various levels — city, state and the federal government — without a clear path to large-scale deployment. MITRE and Mcity aim to address this ambiguity with comprehensive validation resources open to the entire industry.

Mcity currently operates a 32-acre mock city for automakers and researchers to test their technology. Mcity is also building a digital framework around its physical proving ground to provide developers with AV data and simulation tools.

Raising Safety Standards

One of the largest gaps in the regulatory framework is the absence of universally accepted safety standards that the industry and regulators can rely on.

The lack of common standards leaves regulators with limited tools to verify AV performance and safety in a repeatable manner, while companies struggle to demonstrate the maturity of their AV technology. The ability to do so is crucial in the wake of public road incidents, where AV developers need to demonstrate the reliability of their software in a way that is acceptable to both industry and regulators.

Efforts like the National Highway Traffic Safety Administration’s New Car Assessment Program (NCAP) have been instrumental in setting benchmarks for vehicle safety in traditional automotive development. However, NCAP is insufficient for AV evaluation, where measures of safety go beyond crash tests to the complexity of real-time decision-making in dynamic environments.

Additionally, traditional road testing presents inherent limitations, as it exposes vehicles to real-world conditions but lacks the scalability needed to prove safety across a wide variety of edge cases. It’s particularly difficult to test rare and dangerous scenarios on public roads without significant risk.

By providing both physical and digital resources to validate AVs, MITRE and Mcity will be able to offer a safe, universally accessible solution that addresses the complexity of verifying autonomy.

Physically Based Sensor Simulation

A core piece of this collaboration is sensor simulation, which models the physics and behavior of cameras, lidars, radars and ultrasonic sensors on a physical vehicle, as well as how these sensors interact with their surroundings.

Sensor simulation enables developers to train against and test rare and dangerous scenarios — such as extreme weather conditions, sudden pedestrian crossings or unpredictable driver behavior — safely in virtual settings.

In collaboration with regulators, AV companies can use sensor simulation to recreate a real-world event, analyze their system’s response and evaluate how their vehicle performed — accelerating the validation process.

Moreover, simulation tests are repeatable, meaning developers can track improvements or regressions in the AV stack over time. This means AV companies can provide quantitative evidence to regulators to show that their system is evolving and addressing safety concerns.

Bridging Industry and Regulators

MITRE and its ecosystem are actively developing the Digital Proving Ground platform to facilitate industry-wide standards and regulations.

The platform will be an open and accessible national resource for accelerating safe AV development and deployment, providing a trusted simulation test environment.

Mcity will contribute simulation infrastructure, a digital twin and the ability to seamlessly connect virtual and physical worlds with NVIDIA Omniverse, an open platform enabling system developers to build physical AI and robotic system simulation applications. By integrating this virtual proving ground into DPG, the collaboration will also accelerate the development and use of advanced digital engineering and simulation for AV safety assurance.

Mcity’s simulation tools will connect to Omniverse Cloud Sensor RTX APIs and render a Universal Scene Description (USD) model of Mcity’s physical proving ground. DPG will be able to access this environment, simulate the behavior of vehicles and pedestrians in a realistic test environment and use the DPG reporting framework to explain how the AV performed.

This testing will then be replicated on the physical Mcity proving ground to create a comprehensive feedback loop.
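
As a small illustration of what working with such a digital twin looks like at the file level, the snippet below opens a USD stage with the open-source `pxr` API and walks its prims. The file name is a placeholder, and nothing here is specific to Mcity’s actual asset or to the Omniverse Cloud Sensor RTX APIs.

```python
# Minimal OpenUSD sketch (not Mcity's asset): open a USD stage representing
# a test environment and list its contents. The path is a hypothetical
# placeholder.
from pxr import Usd, UsdGeom

stage = Usd.Stage.Open("mcity_proving_ground.usd")  # hypothetical path

# Walk every prim so roads, props and sensor rigs can be located by path
for prim in stage.Traverse():
    print(prim.GetPath(), prim.GetTypeName())

# The stage's linear units matter when placing simulated sensors
print("meters per unit:", UsdGeom.GetStageMetersPerUnit(stage))
```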

The Road Ahead

As developers, automakers and regulators continue to collaborate, the industry is moving closer to a future where AVs can operate safely and at scale. The establishment of a repeatable testbed for validating safety — across real and simulated environments — will be critical to gaining public trust and regulatory approval, bringing the promise of AVs closer to reality.

SETI Institute Researchers Engage in World’s First Real-Time AI Search for Fast Radio Bursts

This summer, scientists supercharged their tools in the hunt for signs of life beyond Earth.

Researchers at the SETI Institute became the first to apply AI to the real-time direct detection of faint radio signals from space. The advances they made in radio astronomy can be applied in any field that uses accelerated computing and AI.

“We’re on the cusp of a fundamentally different way of analyzing streaming astronomical data, and the kinds of things we’ll be able to discover with it will be quite amazing,” said Andrew Siemion, Bernard M. Oliver Chair for SETI at the SETI Institute, a group formed in 1984 that now includes more than 120 scientists.

The SETI Institute operates the Allen Telescope Array in Northern California. It’s a cutting-edge telescope used in the search for extraterrestrial intelligence (SETI) as well as for the study of intriguing transient astronomical events such as fast radio bursts.

Germinating AI

The seed of the latest project was planted more than a decade ago. Siemion attended a talk at the University of California, Berkeley, about an early version of machine learning, a classifier that analyzed radio signals like the ones his team gathered from deep space.

“I was really impressed, and realized the ways SETI researchers detected signals at the time were rather naive,” said Siemion, who earned his Ph.D. in astrophysics at Berkeley.

The researchers started connecting with radio experts in conferences outside the field of astronomy. There, they met Adam Thompson, who leads a group of developers at NVIDIA.

“We explained our challenges searching the extremely wide bandwidth of signals from space at high data rates,” Siemion said.

SETI Institute researchers had been using NVIDIA GPUs for years to accelerate the algorithms that separate signals from background noise. Now they thought there was potential to do more.

A Demo Leads to a Pilot

It took time — in part due to the coronavirus pandemic — but earlier this year, Thompson showed Siemion’s team a new product, NVIDIA Holoscan, a sensor processing platform for handling real-time data from scientific instruments.

Siemion’s team decided to build a trial application with Holoscan on the NVIDIA IGX edge computing platform that, if successful, could radically change the way the SETI Institute worked.

The institute collaborates with Breakthrough Listen, another SETI Institute research program, which is headquartered at the University of Oxford and uses dozens of radio telescopes to collect and store mountains of data that are later analyzed in separate processes using GPUs. Each telescope and analysis employs separate, custom-built programs.

“We wanted to create something that would really push our capabilities forward,” Siemion said. “We envisioned a streaming solution that in a more general way takes real-time data from telescopes and brings it directly into the GPUs to do AI inference on it.”

Pointing at the Stars

In a team effort, Luigi Cruz, a staff engineer at the SETI Institute, developed the real-time data reception and inference pipeline using the Holoscan SDK, while Peter Ma, a Breakthrough Listen collaborator, built and trained an AI model to detect fast radio bursts, one of many radio phenomena tracked by astronomers. Wael Farah, Allen Telescope Array project scientist, provided key contributions to the scientific aspects of the study.

They linked the combined real-time Holoscan pipeline, running on an NVIDIA IGX Orin platform, to 28 antennas pointed at the Crab Nebula. Over 15 hours, they gathered more than 90 billion data packets on signals across a spectrum of 5 GHz.

Their system captured and analyzed in real time nearly the full 100 Gbps of data from the experiment, twice the previous speed the astronomers had achieved. What’s more, they saw how the same code could be used with any telescope to detect all sorts of signals.
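
For a sense of how such a streaming pipeline is expressed, here is a minimal Holoscan SDK sketch in which one operator emits synthetic sample blocks and another applies a placeholder detection check. It is not the SETI Institute’s code: the operators, block size, iteration count and threshold are illustrative assumptions standing in for packet capture and AI inference.

```python
# Minimal Holoscan SDK sketch (not the SETI pipeline): a source operator
# emits synthetic sample blocks and a second operator applies a placeholder
# detection rule. Sizes, counts and the threshold are illustrative.
import numpy as np
from holoscan.conditions import CountCondition
from holoscan.core import Application, Operator, OperatorSpec

class PacketSourceOp(Operator):
    def setup(self, spec: OperatorSpec):
        spec.output("out")

    def compute(self, op_input, op_output, context):
        # Stand-in for a block of digitized telescope samples
        op_output.emit(np.random.randn(4096).astype(np.float32), "out")

class DetectOp(Operator):
    def setup(self, spec: OperatorSpec):
        spec.input("in")

    def compute(self, op_input, op_output, context):
        block = op_input.receive("in")
        # Placeholder for AI inference: flag blocks with unusually high peaks
        if float(np.abs(block).max()) > 4.0:
            print("candidate burst detected")

class RadioApp(Application):
    def compose(self):
        src = PacketSourceOp(self, CountCondition(self, 100), name="source")
        det = DetectOp(self, name="detector")
        self.add_flow(src, det)

if __name__ == "__main__":
    RadioApp().run()
```

In a real deployment the source would ingest network packets from the telescope array and the detector would run a trained model on the GPU, but the dataflow structure stays the same.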

‘It’s Like a Magic Wand’

The test was “fantastically successful,” said Siemion. “It’s hard to overstate the transformative potential of Holoscan for radio astronomy because it’s like we’ve been given a magic wand to get all our data from telescopes into accelerated computers that are ideally suited for AI.”

He called the direct memory access in NVIDIA GPUs “a game changer.”

Rather than throw away some of its data to enable more efficient processing — as it did in the past — institute researchers can keep and analyze all of it, fast.

“It’s a profound change in how radio astronomy is done,” he said. “Now we have a viable path to a very different way of using telescopes with smart AI software, and if we do that in a scalable way the opportunities for discovery will be legion.”

Scaling Up the Pilot

The team plans to scale up its pilot software and deploy it in all the radio telescopes it currently uses across a dozen sites. It also aims to share the capability in collaborations with astronomers worldwide.

“Our intent is to bring this to larger international observatories with thousands of users and uses,” Siemion said.

The partnerships extend to globally distributed arrays of telescopes now under construction that promise to increase by an order of magnitude the kinds of signals space researchers can detect.

Sharing the Technology Broadly

Collaboration has been a huge theme for Siemion since 2015, when he became principal investigator for Breakthrough Listen.

“We voraciously collaborate with anyone we can find,” he said in a video interview from the Netherlands, where he was meeting local astronomers.

Work with NVIDIA was just one part of efforts that involve companies and governments across technical and scientific disciplines.

“The engineering talent at NVIDIA is world class … I can’t say enough about Adam and the Holoscan team,” he said.

The software opens a big door to technical collaborations.

“Holoscan lets us tap into a developer community far larger than those in astronomy with complementary skills,” he said. “It will be exciting to see if, say, a cancer algorithm could be repurposed to look for a novel astronomical source and vice versa.”

It’s one more way NVIDIA and its customers are advancing AI for the benefit of all.

TSMC and NVIDIA Transform Semiconductor Manufacturing With Accelerated Computing

TSMC, the world leader in semiconductor manufacturing, is moving to production with NVIDIA’s computational lithography platform, called cuLitho, to accelerate manufacturing and push the limits of physics for the next generation of advanced semiconductor chips.

A critical step in the manufacture of computer chips, computational lithography is involved in the transfer of circuitry onto silicon. It requires complex computation — involving electromagnetic physics, photochemistry, computational geometry, iterative optimization and distributed computing. A typical foundry dedicates massive data centers for this computation, and yet this step has traditionally been a bottleneck in bringing new technology nodes and computer architectures to market.

Computational lithography is also the most compute-intensive workload in the entire semiconductor design and manufacturing process. It consumes tens of billions of hours per year on CPUs in the leading-edge foundries. A typical mask set for a chip can take 30 million or more hours of CPU compute time, necessitating large data centers within semiconductor foundries. With accelerated computing, 350 NVIDIA H100 Tensor Core GPU-based systems can now replace 40,000 CPU systems, accelerating production time, while reducing costs, space and power.

NVIDIA cuLitho brings accelerated computing to the field of computational lithography. Moving cuLitho to production is enabling TSMC to accelerate the development of next-generation chip technology, just as current production processes are nearing the limits of what physics makes possible.

“Our work with NVIDIA to integrate GPU-accelerated computing in the TSMC workflow has resulted in great leaps in performance, dramatic throughput improvement, shortened cycle time and reduced power requirements,” said Dr. C.C. Wei, CEO of TSMC, at the GTC conference earlier this year.

NVIDIA has also developed algorithms to apply generative AI to enhance the value of the cuLitho platform. A new generative AI workflow has been shown to deliver an additional 2x speedup on top of the accelerated processes enabled through cuLitho.

The application of generative AI enables creation of a near-perfect inverse mask or inverse solution to account for diffraction of light involved in computational lithography. The final mask is then derived by traditional and physically rigorous methods, speeding up the overall optical proximity correction process by 2x.

The use of optical proximity correction in semiconductor lithography is now three decades old. While the field has benefited from numerous contributions over this period, rarely has it seen a transformation quite as rapid as the one provided by the twin technologies of accelerated computing and AI. These together allow for the more accurate simulation of physics and the realization of mathematical techniques that were once prohibitively resource-intensive.

This enormous speedup of computational lithography accelerates the creation of every single mask in the fab, which speeds the total cycle time for developing a new technology node. More importantly, it makes possible new calculations that were previously impractical.

For example, while inverse lithography techniques have been described in the scientific literature for two decades, an accurate realization at full chip scale has been largely precluded because the computation takes too long. With cuLitho, that’s no longer the case. Leading-edge foundries will use it to ramp up inverse and curvilinear solutions that will help create the next generation of powerful semiconductors.

Pittsburgh Steels Itself for Innovation With Launch of NVIDIA AI Tech Community

Serving as a bridge for academia, industry and public-sector groups to partner on artificial intelligence innovation, NVIDIA is launching its inaugural AI Tech Community in Pittsburgh, Pennsylvania.

Collaborations with Carnegie Mellon University and the University of Pittsburgh, as well as startups, enterprises and organizations based in the “city of bridges,” are part of the new NVIDIA AI Tech Community initiative, announced today during the NVIDIA AI Summit in Washington, D.C.

The initiative aims to supercharge public-private partnerships across communities rich with potential for enabling technological transformation using AI.

Two NVIDIA joint technology centers will be established in Pittsburgh to tap into expertise in the region.

NVIDIA’s Joint Center with Carnegie Mellon University (CMU) for Robotics, Autonomy and AI will equip higher-education faculty, students and researchers with the latest technologies and boost innovation in the fields of AI and robotics.

NVIDIA’s Joint Center with the University of Pittsburgh for AI and Intelligent Systems will focus on computational opportunities across the health sciences, including applications of AI in clinical medicine and biomanufacturing.

CMU — the nation’s No. 1 AI university, according to U.S. News & World Report — has pioneered work in autonomous vehicles and natural language processing.

CMU’s Robotics Institute, the world’s largest university-affiliated robotics research group, brings a diverse group of more than a thousand faculty, staff, students, post-doctoral fellows and visitors together to solve humanity’s toughest challenges through robotics.

The University of Pittsburgh — designated as an R1 research university at the forefront of innovation — is ranked No. 6 among U.S. universities in research funding from the National Institutes of Health, topping more than $1 billion in research expenditures in fiscal year 2022 and ranking No. 14 among U.S. universities granted utility patents.

The university has a long history of learning-technology innovations that are interdisciplinary and conducted within research-practice partnerships. By prioritizing inclusivity and practical experience without technical barriers, Pitt is leading the way in democratizing AI education in healthcare and medicine.

By working with these universities, NVIDIA aims to accelerate the innovation, commercialization and operationalization of a technical community for physical AI, robotics, autonomous systems and AI across the nation — and the globe.

These centers will tap into NVIDIA’s full-stack AI platform and accelerated computing expertise to gear up tomorrow’s technology leaders for next-generation innovation.

Establishing the Centers for AI Development 

Generative AI and accelerated computing are transforming workflows across use cases. Three key AI platforms comprise the engine behind this transformation: NVIDIA DGX for AI training, NVIDIA Omniverse for simulation and NVIDIA Jetson for edge computing.

Through the new centers and public-sector-sponsored research opportunities, NVIDIA will provide CMU and Pitt with access to these and more of the company’s latest AI software and frameworks — such as NVIDIA Isaac Lab for robot learning, NVIDIA Isaac Sim for designing and testing robots, NVIDIA NeMo for custom generative AI and NVIDIA NIM microservices, available through the NVIDIA AI Enterprise software platform.

Advanced NVIDIA technological support can help accelerate the research groups’ workflows and enhance the scalability and resiliency of their AI applications.

In addition, the universities will have access to certain generative AI, data science and accelerated computing resources through the NVIDIA Deep Learning Institute, which provides training to meet diverse learning needs and upskill students and developers in AI.

“Pairing Carnegie Mellon University’s existing deep expertise and resources in AI and robotics with NVIDIA’s cutting-edge platform, software and tools has tremendous potential to power Pittsburgh’s already vibrant innovation ecosystem,” said Theresa Mayer, vice president for research at CMU. “This unique collaboration will accelerate innovation, commercialization and operationalization of robotics and autonomy, advancing the best impacts of AI on society.”

“Pitt has a long history and extraordinary research strengths in life sciences and learning sciences,” said Rob A. Rutenbar, senior vice chancellor for research at the University of Pittsburgh. “By focusing on computational and AI opportunities across these ‘meds and eds’ areas, we plan to leverage our collaboration with NVIDIA to explore new ways to connect these breakthroughs to improved health and education outcomes for everybody.”

Fostering Cross-Industry Collaboration

As part of the AI Tech Community initiative, NVIDIA is also increasing its engagement with Pittsburgh-based members of the NVIDIA Inception program for cutting-edge AI startups and the NVIDIA Connect program for software development companies and service providers.

For example, Inception member Lovelace AI is developing AI solutions using NVIDIA accelerated computing and CUDA to enhance the analysis of kinetic data, providing predictive analytics and actionable insights for national security customers.

Skild AI, a startup founded by two Carnegie Mellon professors, is developing a scalable robotics foundation model, called Skild Brain, that can easily adapt across hardware and tasks.

Skild AI is exploring NVIDIA Isaac Lab, a unified, modular framework for robot learning built on the NVIDIA Isaac Sim reference application for designing, simulating and training AI-based robots.

NVIDIA is also engaging with Pittsburgh’s broader robotics ecosystem through its collaborations with the Pittsburgh Robotics Network — which speeds the commercialization of robotics, AI and other advanced technologies — and technology accelerators like AlphaLab and the Robotics Factory at Innovation Works, which supports startups based in the city that are focused on AI, robotics and autonomy.

And through its Deep Learning Institute, which has trained more than 650,000 people, NVIDIA is committed to furthering AI workforce development worldwide.

Learn more about how NVIDIA is propelling the next era of computing in higher education and research, including at the NVIDIA AI Summit, running through Oct. 9. NVIDIA Vice President of Developer Programs Greg Estes will discuss scaling AI skills and economic growth through public-private collaboration.
