A New ERA of AI Factories: NVIDIA Unveils Enterprise Reference Architectures

As the world transitions from general-purpose to accelerated computing, finding a path to building data center infrastructure at scale is becoming more important than ever. Enterprises must navigate uncharted waters when designing and deploying infrastructure to support these new AI workloads.

Constant developments in model capabilities and software frameworks, along with the novelty of these workloads, mean best practices and standardized approaches are still in their infancy. This state of flux can make it difficult for enterprises to establish long-term strategies and invest in infrastructure with confidence.

To address these challenges, NVIDIA is unveiling Enterprise Reference Architectures (Enterprise RAs). These comprehensive blueprints help NVIDIA systems partners and joint customers build their own AI factories — high-performance, scalable and secure data centers for manufacturing intelligence.

Building AI Factories to Unlock Enterprise Growth

NVIDIA Enterprise RAs help organizations avoid pitfalls when designing AI factories by providing full-stack hardware and software recommendations, and detailed guidance on optimal server, cluster and network configurations for modern AI workloads.

Enterprise RAs can reduce the time and cost of deploying AI infrastructure solutions by providing a streamlined approach for building flexible and cost-effective accelerated infrastructure, while ensuring compatibility and interoperability.

Each Enterprise RA includes recommendations for:

  • Accelerated infrastructure based on an optimized NVIDIA-Certified server configuration, featuring the latest NVIDIA GPUs, CPUs and networking technologies, that’s been tested and validated to deliver performance at scale.
  • AI-optimized networking with the NVIDIA Spectrum-X AI Ethernet platform and NVIDIA BlueField-3 DPUs to deliver peak network performance, and guidance on optimal network configurations at multiple design points to address varying workload and scale requirements.
  • The NVIDIA AI Enterprise software platform for production AI, which includes NVIDIA NeMo and NVIDIA NIM microservices for easily building and deploying AI applications, and NVIDIA Base Command Manager Essentials for infrastructure provisioning, workload management and resource monitoring.

Businesses that deploy AI workloads on partner solutions based upon Enterprise RAs, which are informed by NVIDIA’s years of expertise in designing and building large-scale computing systems, will benefit from:

  • Accelerated time to market: By using NVIDIA’s structured approach and recommended designs, enterprises can deploy AI solutions faster, reducing the time to achieve business value.
  • Performance: Build upon tested and validated technologies with the confidence that AI workloads will run at peak performance.
  • Scalability and manageability: Develop AI infrastructure while incorporating design best practices that enable flexibility and scale and help ensure optimal network performance.
  • Security: Run workloads securely on AI infrastructure that’s engineered with zero trust in mind, supports confidential computing and is optimized for the latest cybersecurity AI innovations.
  • Reduced complexity: Accelerate deployment timelines, while avoiding design and planning pitfalls, through optimal server, cluster and network configurations for AI workloads.

Availability

Solutions based upon NVIDIA Enterprise RAs are available from NVIDIA’s global partners, including Dell Technologies, Hewlett Packard Enterprise, Lenovo and Supermicro.

Learn more about NVIDIA-Certified Systems and NVIDIA Enterprise Reference Architectures.

Automate document processing with Amazon Bedrock Prompt Flows (preview)

Enterprises in industries like manufacturing, finance, and healthcare are inundated with a constant flow of documents—from financial reports and contracts to patient records and supply chain documents. Historically, processing and extracting insights from these unstructured data sources has been a manual, time-consuming, and error-prone task. However, the rise of intelligent document processing (IDP), which uses the power of artificial intelligence and machine learning (AI/ML) to automate the extraction, classification, and analysis of data from various document types, is transforming the game. For manufacturers, this means streamlining processes like purchase order management, invoice processing, and supply chain documentation. Financial services firms can accelerate workflows around loan applications, account openings, and regulatory reporting. And in healthcare, IDP revolutionizes patient onboarding, claims processing, and medical record keeping.

By integrating IDP into their operations, organizations across these key industries experience transformative benefits: increased efficiency and productivity through the reduction of manual data entry, improved accuracy and compliance by reducing human errors, enhanced customer experiences due to faster document processing, greater scalability to handle growing volumes of documents, and lower operational costs associated with document management.

This post demonstrates how to build an IDP pipeline for automatically extracting and processing data from documents using Amazon Bedrock Prompt Flows, a fully managed service that enables you to build generative AI workflows using Amazon Bedrock and other services in an intuitive visual builder. Amazon Bedrock Prompt Flows allows you to quickly update your pipelines as your business changes, scaling your document processing workflows to help meet evolving demands.

Solution overview

To be scalable and cost-effective, this solution uses serverless technologies and managed services. In addition to Amazon Bedrock Prompt Flows, the solution uses the following services:

  • Amazon Textract – Automatically extracts printed text, handwriting, and data from any document.
  • Amazon Simple Storage Service (Amazon S3) – Object storage built to retrieve data from anywhere.
  • Amazon Simple Notification Service (Amazon SNS) – A highly available, durable, secure, and fully managed publish-subscribe (pub/sub) messaging service to decouple microservices, distributed systems, and serverless applications.
  • AWS Lambda – A compute service that runs code in response to triggers such as changes in data, changes in application state, or user actions. Because services such as Amazon S3 and Amazon SNS can directly trigger an AWS Lambda function, you can build a variety of real-time serverless data-processing systems.
  • Amazon DynamoDB – A serverless, NoSQL, fully managed database with single-digit millisecond performance at any scale.

Solution architecture

The solution proposed contains the following steps:

  1. Users upload a PDF for analysis to Amazon S3.
  2. The Amazon S3 upload triggers an AWS Lambda function execution.
  3. The function invokes Amazon Textract to extract text from the PDF in batch mode.
  4. Amazon Textract sends an SNS notification when the job is complete.
  5. An AWS Lambda function reads the Amazon Textract response and calls an Amazon Bedrock prompt flow to classify the document.
  6. Results of the classification are stored in Amazon S3 and sent to a destination AWS Lambda function.
  7. The destination AWS Lambda function calls an Amazon Bedrock prompt flow to extract and analyze data based on the document class provided.
  8. Results of the extraction and analysis are stored in Amazon S3.

This workflow is shown in the following diagram.

Architecture
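
To make steps 2 through 4 concrete, here is a minimal, hypothetical sketch of the triggering AWS Lambda function. The handler structure, environment variable names, and lack of error handling are illustrative assumptions rather than the exact code in the accompanying repository.

import os
import boto3

textract = boto3.client("textract")

def lambda_handler(event, context):
    # Step 2: the S3 upload event carries the bucket and object key.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    # Step 3: start an asynchronous (batch) Textract text detection job.
    # Step 4: Textract publishes a completion message to the SNS topic configured below.
    response = textract.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        NotificationChannel={
            "SNSTopicArn": os.environ["TEXTRACT_SNS_TOPIC_ARN"],  # assumed environment variable
            "RoleArn": os.environ["TEXTRACT_PUBLISH_ROLE_ARN"],   # assumed environment variable
        },
    )
    return {"JobId": response["JobId"]}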

In the following sections, we dive deep into how to build your IDP pipeline with Amazon Bedrock Prompt Flows.

Prerequisites

To complete the activities described in this post, ensure that you complete the following prerequisites in your local environment:

Implementation time and cost estimation

  • Time to complete: ~60 minutes
  • Cost to run 1,000 pages: under $25
  • Time to clean up: ~20 minutes
  • Learning level: Advanced (300)

Deploy the solution

To deploy the solution, follow these steps:

  1. Clone the GitHub repository
  2. Use the shell script to build and deploy the solution by running the following commands from your project root directory:
chmod +x deploy.sh
./deploy.sh
  3. This will trigger the AWS CloudFormation template in your AWS account.

Test the solution

Once the template is deployed successfully, follow these steps to test the solution:

  1. On the AWS CloudFormation console, select the stack that was deployed
  2. Select the Resources tab
  3. Locate the resources labeled SourceS3Bucket and DestinationS3Bucket, as shown in the following screenshot. Select the link to open the SourceS3Bucket in a new tab

CloudFormation S3 Resources

  4. Select Upload and then Add folder
  5. Under sample_files, select the folder customer123, then choose Upload

Alternatively, you can upload the folder using the following AWS CLI command from the root of the project:

aws s3 sync ./sample_files/customer123 s3://[SourceS3Bucket_NAME]/customer123

After a few minutes the uploaded files will be processed. To view the results, follow these steps:

  1. Open the DestinationS3Bucket
  2. Under customer123, you should see a folder for each processing job. Download and review the files locally using the console or with the following AWS CLI command
aws s3 sync s3://[DestinationS3Bucket_NAME]/customer123 ./result_files/customer123

Inside the folder for customer123 you will see several subfolders, as shown in the following diagram:

customer123
├── [Long Textract Job ID]
    ├── classify_response.txt
    ├── input_doc.txt
    └── FOR_REVIEW
        ├── pages_0.txt
        └── report.txt
├── [Long Textract Job ID]
    ├── classify_response.txt
    ├── input_doc.txt
    └── URLA_1003
        ├── pages_0.json
        ├── pages_0.txt
        └── report.txt
├── [Long Textract Job ID]
    ├── classify_response.txt
    ├── input_doc.txt
    └── BANK_STATEMENT
        ├── pages_0.json
        ├── pages_0.txt
        └── report.txt
└── [Long Textract Job ID]
    ├── classify_response.txt
    ├── input_doc.txt
    └── DRIVERS_LICENSE
        ├── pages_0.json
        ├── pages_0.txt
        └── report.txt

How it works

After the document text is extracted, it is sent to a classify prompt flow along with a list of classes, as shown in the following screenshot:

Classify Flow

The list of classes is generated in the AWS Lambda function by using the API to identify existing prompt flows that contain class definitions in their description. This approach allows us to expand the solution to new document types by adding a new prompt flow supporting the new document class, as shown in the following screenshot:

Prompt flows
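
The exact implementation lives in the repository's Lambda code; as a rough, hypothetical sketch of that lookup, the following uses the Bedrock Agents ListFlows API and assumes a description-based marker convention. The IDP_CLASS: marker and the parsing logic are illustrative assumptions, not the actual code.

import boto3

bedrock_agent = boto3.client("bedrock-agent")

def build_class_list():
    # Collect flows whose description carries an assumed "IDP_CLASS:" marker.
    classes = []
    next_token = None
    while True:
        kwargs = {"nextToken": next_token} if next_token else {}
        response = bedrock_agent.list_flows(**kwargs)
        for flow in response.get("flowSummaries", []):
            description = flow.get("description", "")
            if "IDP_CLASS:" in description:
                classes.append(
                    {
                        "class_name": description.split("IDP_CLASS:")[1].strip(),
                        "flow_id": flow["id"],
                    }
                )
        next_token = response.get("nextToken")
        if not next_token:
            break
    return classes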

For each document type, you can implement an extract and analyze flow that is appropriate to this document type. The following screenshot shows an example flow from the URLA_1003 flow. In this case, a prompt is used to convert the text to a standardized JSON format, and a second prompt then analyzes that JSON document to generate a report to the processing agent.

URLA Flow

Expand the solution using Amazon Bedrock Prompt Flows

To adapt to new use cases without changing the underlying code, use Amazon Bedrock Prompt Flows as described in the following steps.

Create a new prompt

From the files you downloaded, look for a folder named FOR_REVIEW. This folder contains documents that were processed and did not fit into an existing class. Open report.txt and review the suggested document class and proposed JSON template.

  1. In the navigation pane in Amazon Bedrock, open Prompt management and select Create prompt, as shown in the following screenshot:

Create Prompt

  2. Name the new prompt IDP_PAYSTUB_JSON and then choose Create
  3. In the Prompt box, enter the following text. Replace COPY YOUR JSON HERE with the JSON template from your txt file
Analyze the provided paystub
<PAYSTUB>
{{doc_text}}
</PAYSTUB>

Provide a structured JSON object containing the following information:

[COPY YOUR JSON HERE]

The following screenshot demonstrates this step.

Prompt Builder

  4. Choose Select model and choose Anthropic Claude 3 Sonnet
  5. Save your changes by choosing Save draft
  6. To test your prompt, open the pages_[n].txt file in the FOR_REVIEW folder and copy the content into the doc_text input box. Choose Run and the model should return a response

The following screenshot demonstrates this step.

Prompt test

  7. When you are satisfied with the results, choose Create Version. Note the version number because you will need it in the next section

Create a prompt flow

Now we will create a prompt flow using the prompt you created in the previous section.

  1. In the navigation menu, choose Prompt flows and then choose Create prompt flow, as shown in the following screenshot:

Create flow

  2. Name the new flow IDP_PAYSTUB
  3. Choose Create and use a new service role and then choose Save

Next, create the flow using the following steps. When you are done, the flow should resemble the following screenshot.

Paystub flow

  1. Configure the Flow input node:
    1. Choose the Flow input node and select the Configure tab
    2. Select Object as the Type. This means that flow invocation will expect to receive a JSON object.
  2. Add the S3 Retrieval node:
    1. In the Prompt flow builder navigation pane, select the Nodes tab
    2. Drag an S3 Retrieval node into your flow in the center pane
    3. In the Prompt flow builder pane, select the Configure tab
    4. Enter get_doc_text as the Node name
    5. Expand Inputs and set the input expression for objectKey to $.data.doc_text_s3key
    6. Drag a connection from the output of the Flow input node to the objectKey input of this node
  3. Add the Prompt node:
    1. Drag a Prompt node into your flow in the center pane
    2. In the Prompt flow builder pane, select the Configure tab
    3. Enter map_to_json as the Node name
    4. Choose Use a prompt from your Prompt Management
    5. Select IDP_PAYSTUB_JSON from the dropdown
    6. Choose the version you noted previously
    7. Drag a connection from the output of the get_doc_text node to the doc_text input of this node
  4. Add the S3 Storage node:
    1. In the Prompt flow builder navigation pane, select the Nodes tab
    2. Drag an S3 Storage node into your flow in the center pane
    3. In the Prompt flow builder pane, select the Configure tab
    4. Enter save_json as the Node name
    5. Expand Inputs and set the input expression for objectKey to $.data.JSON_s3key
    6. Drag a connection from the output of the Flow input node to the objectKey input of this node
    7. Drag a connection from the output of the map_to_json node to the content input of this node
  5. Configure the Flow output node:
    1. Drag a connection from the output of the save_json node to the input of this node
  6. Choose Save to save your flow. Your flow should now be prepared for testing
    1. To test your flow, in the Test prompt flow pane on the right, enter the following JSON object. Choose Run and the flow should return a model response
    2. When you are satisfied with the result, choose Save and exit
{
"doc_text_s3key": "[PATH TO YOUR TEXT FILE IN S3].txt",
"JSON_s3key": "[PATH TO YOUR TEXT FILE IN S3].json"
}

To get the path to your file, follow these steps:

  1. Navigate to FOR_REVIEW in S3 and choose the pages_[n].txt file
  2. Choose the Properties tab
  3. Copy the key path by selecting the copy icon to the left of the key value, as shown in the following screenshot. Be sure to replace .txt with .json in the second line of input as noted previously.

S3 object key

Publish a version and alias

  1. On the flow management screen, choose Publish version. A success banner appears at the top
  2. At the top of the screen, choose Create alias
  3. Enter latest for the Alias name
  4. Choose Use an existing version to associate this alias. From the dropdown menu, choose the version that you just published
  5. Select Create alias. A success banner appears at the top.
  6. Get the FlowId and AliasId to use in the step below
    1. Choose the Alias you just created
    2. From the ARN, copy the FlowId and AliasId

Prompt flow alias
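
If you want to exercise the published flow outside the pipeline, the following is a hedged sketch that invokes it with the FlowId and AliasId you just copied, using the Bedrock Agents runtime API. The placeholder identifiers and S3 keys must be replaced with your own values.

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_agent_runtime.invoke_flow(
    flowIdentifier="[Your flow ID]",
    flowAliasIdentifier="[Your flow Alias ID]",
    inputs=[
        {
            "nodeName": "FlowInputNode",
            "nodeOutputName": "document",
            "content": {
                "document": {
                    "doc_text_s3key": "[PATH TO YOUR TEXT FILE IN S3].txt",
                    "JSON_s3key": "[PATH TO YOUR TEXT FILE IN S3].json",
                }
            },
        }
    ],
)

# The call returns an event stream; the flow output arrives as a flowOutputEvent.
for event in response["responseStream"]:
    if "flowOutputEvent" in event:
        print(event["flowOutputEvent"]["content"]["document"])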

Add your new class to DynamoDB

  1. Open the AWS Management Console and navigate to the DynamoDB service.
  2. Select the table document-processing-bedrock-prompt-flows-IDP_CLASS_LIST
  3. Choose Actions then Create item
  4. Choose JSON view for entering the item data.
  5. Paste the following JSON into the editor:
{
    "class_name": {
        "S": "PAYSTUB"
    },
    "expected_inputs": {
        "S": "Should contain Gross Pay, Net Pay, Pay Date "
    },
    "flow_alias_id": {
        "S": "[Your flow Alias ID]"
    },
    "flow_id": {
        "S": "[Your flow ID]"
    },
    "flow_name": {
        "S": "[The name of your flow]"
    }
}
  6. Review the JSON to ensure all details are correct.
  7. Choose Create item to add the new class to your DynamoDB table.
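
As an alternative to the console steps above, the following boto3 sketch writes the same item; the flow ID, alias ID, and flow name remain placeholders for the values you copied earlier.

import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.put_item(
    TableName="document-processing-bedrock-prompt-flows-IDP_CLASS_LIST",
    Item={
        "class_name": {"S": "PAYSTUB"},
        "expected_inputs": {"S": "Should contain Gross Pay, Net Pay, Pay Date"},
        "flow_alias_id": {"S": "[Your flow Alias ID]"},
        "flow_id": {"S": "[Your flow ID]"},
        "flow_name": {"S": "[The name of your flow]"},
    },
)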

Test by repeating the upload of the test file

Use the console to repeat the upload of the paystub.jpeg file from your customer123 folder into Amazon S3. Alternatively, you can enter the following command into the command line:

aws s3 cp ./sample_files/customer123/paystub.jpeg s3://[INPUT_BUCKET_NAME]/customer123/

In a few minutes, check the report in the output location to see that you successfully added support for the new document type.

Clean up

Use these steps to delete the resources you created to avoid incurring charges on your AWS account:

  1. Empty the SourceS3Bucket and DestinationS3Bucket buckets including all versions
  2. Use the following shell script to delete the CloudFormation stack and test resources from your account:
chmod +x cleanup.sh
./cleanup.sh
  3. Return to the Expand the solution using Amazon Bedrock Prompt Flows section and follow these steps:
    1. In the Create a prompt flow section:
      1. Choose the flow IDP_PAYSTUB that you created and choose Delete
      2. Follow the instructions to permanently delete the flow
    2. In the Create a new prompt section:
      1. Choose the prompt IDP_PAYSTUB_JSON that you created and choose Delete
      2. Follow the instructions to permanently delete the prompt

Conclusion

This solution demonstrates how customers can use Amazon Bedrock Prompt Flows to deploy and expand a scalable, low-code IDP pipeline. By taking advantage of the flexibility of Amazon Bedrock Prompt Flows, organizations can rapidly implement and adapt their document processing workflows to help meet evolving business needs. The low-code nature of Amazon Bedrock Prompt Flows makes it possible for business users and developers alike to create, modify, and extend IDP pipelines without extensive programming knowledge. This significantly reduces the time and resources required to deploy new document processing capabilities or adjust existing ones.

By adopting this integrated IDP solution, businesses across industries can accelerate their digital transformation initiatives, improve operational efficiency, and enhance their ability to extract valuable insights from document-based processes, driving significant competitive advantages.

Review your current manual document processing processes and identify where Amazon Bedrock Prompt Flows can help you automate these workflows for your business.

For further exploration and learning, we recommend checking out the following resources:


About the Authors

Erik Cordsen is a Solutions Architect at AWS serving customers in Georgia. He is passionate about applying cloud technologies and ML to solve real life problems. When he is not designing cloud solutions, Erik enjoys travel, cooking, and cycling.

Vivek Mittal is a Solution Architect at Amazon Web Services. He is passionate about serverless and machine learning technologies. Vivek takes great joy in assisting customers with building innovative solutions on the AWS cloud.

Brijesh Pati is an Enterprise Solutions Architect at AWS. His primary focus is helping enterprise customers adopt cloud technologies for their workloads. He has a background in application development and enterprise architecture and has worked with customers from various industries such as sports, finance, energy, and professional services. His interests include serverless architectures and AI/ML.

Governing the ML lifecycle at scale: Centralized observability with Amazon SageMaker and Amazon CloudWatch

This post is part of an ongoing series on governing the machine learning (ML) lifecycle at scale. To start from the beginning, refer to Governing the ML lifecycle at scale, Part 1: A framework for architecting ML workloads using Amazon SageMaker.

A multi-account strategy is essential not only for improving governance but also for enhancing security and control over the resources that support your organization’s business. This approach enables various teams within your organization to experiment, innovate, and integrate more rapidly while keeping the production environment secure and available for your customers. However, because multiple teams might use your ML platform in the cloud, monitoring large ML workloads across a scaling multi-account environment presents challenges in setting up and monitoring telemetry data that is scattered across multiple accounts. In this post, we dive into setting up observability in a multi-account environment with Amazon SageMaker.

Amazon SageMaker Model Monitor allows you to automatically monitor ML models in production, and alerts you when data and model quality issues appear. SageMaker Model Monitor emits per-feature metrics to Amazon CloudWatch, which you can use to set up dashboards and alerts. You can use cross-account observability in CloudWatch to search, analyze, and correlate cross-account telemetry data stored in CloudWatch such as metrics, logs, and traces from one centralized account. You can now set up a central observability AWS account and connect your other accounts as sources. Then you can search, audit, and analyze logs across your applications to drill down into operational issues in a matter of seconds. You can discover and visualize operational and model metrics from many accounts in a single place and create alarms that evaluate metrics belonging to other accounts.

AWS CloudTrail is also essential for maintaining security and compliance in your AWS environment by providing a comprehensive log of all API calls and actions taken across your AWS account, enabling you to track changes, monitor user activities, and detect suspicious behavior. This post also dives into how you can centralize CloudTrail logging so that you have visibility into user activities within all of your SageMaker environments.

Solution overview

Customers often struggle with monitoring their ML workloads across multiple AWS accounts, because each account manages its own metrics, resulting in data silos and limited visibility. ML models across different accounts need real-time monitoring for performance and drift detection, with key metrics like accuracy, CPU utilization, and AUC scores tracked to maintain model reliability.

To solve this, we implement a solution that uses SageMaker Model Monitor and CloudWatch cross-account observability. This approach enables centralized monitoring and governance, allowing your ML team to gain comprehensive insights into logs and performance metrics across all accounts. With this unified view, your team can effectively monitor and manage their ML workloads, improving operational efficiency.

Implementing the solution consists of the following steps:

  1. Deploy the model and set up SageMaker Model Monitor.
  2. Enable CloudWatch cross-account observability.
  3. Consolidate metrics across source accounts and build unified dashboards.
  4. Configure centralized logging to API calls across multiple accounts using CloudTrail.

The following architecture diagram showcases the centralized observability solution in a multi-account setup. We deploy ML models across two AWS environments, production and test, which serve as our source accounts. We use SageMaker Model Monitor to assess these models’ performance. Additionally, we enhance centralized management and oversight by using cross-account observability in CloudWatch to aggregate metrics from the ML workloads in these source accounts into the observability account.

Deploy the model and set up SageMaker Model Monitor

We deploy an XGBoost classifier model, trained on publicly available banking marketing data, to identify potential customers likely to subscribe to term deposits. This model is deployed in both production and test source accounts, where its real-time performance is continually validated against baseline metrics using SageMaker Model Monitor to detect deviations in model performance. Additionally, we use CloudWatch to centralize and share the data and performance metrics of these ML workloads in the observability account, providing a comprehensive view across different accounts. You can find the full source code for this post in the accompanying GitHub repo.

The first step is to deploy the model to a SageMaker endpoint with data capture enabled:

from datetime import datetime

from sagemaker.model_monitor import DataCaptureConfig

endpoint_name = f"BankMarketingTarget-endpoint-{datetime.utcnow():%Y-%m-%d-%H%M}"
print("EndpointName =", endpoint_name)

data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,
    destination_s3_uri=s3_capture_upload_path,
)

model.deploy(
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
    endpoint_name=endpoint_name,
    data_capture_config=data_capture_config,
)

For real-time model performance evaluation, it’s essential to establish a baseline. This baseline is created by invoking the endpoint with validation data. We use SageMaker Model Monitor to perform baseline analysis, compute performance metrics, and propose quality constraints for effective real-time performance evaluation.

Next, we define the model quality monitoring object and run the model quality monitoring baseline job. The model monitor automatically generates baseline statistics and constraints based on the provided validation data. The monitoring job evaluates the model’s predictions against ground truth labels to make sure the model maintains its performance over time.

Banking_Quality_Monitor = ModelQualityMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=1800,
    sagemaker_session=session,
)
job = Banking_Quality_Monitor.suggest_baseline(
    job_name=baseline_job_name,
    baseline_dataset=baseline_dataset_uri,
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri=baseline_results_uri,
    problem_type="BinaryClassification",
    inference_attribute="prediction",
    probability_attribute="probability",
    ground_truth_attribute="label",
)
job.wait(logs=False)

In addition to the generated baseline, SageMaker Model Monitor requires two additional inputs: predictions from the deployed model endpoint and ground truth data provided by the model-consuming application. Because data capture is enabled on the endpoint, we first generate traffic to make sure prediction data is captured. When listing the data capture files stored, you should expect to see various files from different time periods, organized based on the hour in which the invocation occurred. When viewing the contents of a single file, you will notice the following details. The inferenceId attribute is set as part of the invoke_endpoint call. When ingesting ground truth labels and merging them with predictions for performance metrics, SageMaker Model Monitor uses inferenceId, which is included in captured data records. It’s used to merge these captured records with ground truth records, making sure the inferenceId in both datasets matches. If inferenceId is absent, it uses the eventId from captured data to correlate with the ground truth record.

{
  "captureData": {
    "endpointInput": {
      "observedContentType": "text/csv",
      "mode": "INPUT",
      "data": "162,1,0.1,25,1.4,94.465,-41.8,4.961,0.2,0.3,0.4,0.5,0.6,0.7,0.8,1.1,0.9,0.10,0.11,0.12,0.13,0.14,0.15,1.2,0.16,0.17,0.18,0.19,0.20,1.3",
      "encoding": "CSV"
    },
    "endpointOutput": {
      "observedContentType": "text/csv; charset=utf-8",
      "mode": "OUTPUT",
      "data": "0.000508524535689503",
      "encoding": "CSV"
    }
  },
  "eventMetadata": {
    "eventId": "527cfbb1-d945-4de8-8155-a570894493ca",
    "inferenceId": "0",
    "inferenceTime": "2024-08-18T20:25:54Z"
  },
  "eventVersion": "0"
}
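
For reference, the following is a minimal sketch of how prediction traffic with an explicit inference ID could be generated against the endpoint; the local CSV file name and record numbering are assumptions for illustration.

import boto3

sm_runtime = boto3.client("sagemaker-runtime")

with open("validation_data.csv") as f:  # assumed local file with one CSV record per line
    for i, row in enumerate(f):
        sm_runtime.invoke_endpoint(
            EndpointName=endpoint_name,   # from the deployment step above
            ContentType="text/csv",
            Body=row.rstrip("\n"),
            InferenceId=str(i),           # surfaces as inferenceId in the captured records
        )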

SageMaker Model Monitor ingests ground truth data collected periodically and merges it with prediction data to calculate performance metrics. This monitoring process uses baseline constraints from the initial setup to continuously assess the model’s performance. By enabling enable_cloudwatch_metrics=True, SageMaker Model Monitor uses CloudWatch to monitor the quality and performance of our ML models, thereby emitting these performance metrics to CloudWatch for comprehensive tracking.

from sagemaker.model_monitor import CronExpressionGenerator

response = Banking_Quality_Monitor.create_monitoring_schedule(
    monitor_schedule_name=Banking_monitor_schedule_name,
    endpoint_input=endpointInput,
    output_s3_uri=baseline_results_uri,
    problem_type="BinaryClassification",
    ground_truth_input=ground_truth_upload_path,
    constraints=job.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,
)

Each time the model quality monitoring job runs, it begins with a merge job that combines two datasets: the inference data captured at the endpoint and the ground truth data provided by the application. This is followed by a monitoring job that assesses the data for insights into model performance using the baseline setup.

Waiting for execution to finish......................................................!
groundtruth-merge-202408182100-7460007b77e6223a3f739740 job status: Completed
groundtruth-merge-202408182100-7460007b77e6223a3f739740 job exit message, if any: None
groundtruth-merge-202408182100-7460007b77e6223a3f739740 job failure reason, if any: None
Waiting for execution to finish......................................................!
model-quality-monitoring-202408182100-7460007b77e6223a3f739740 job status: Completed
model-quality-monitoring-202408182100-7460007b77e6223a3f739740 job exit message, if any: CompletedWithViolations: Job completed successfully with 8 violations.
model-quality-monitoring-202408182100-7460007b77e6223a3f739740 job failure reason, if any: None
Execution status is: CompletedWithViolations
{'MonitoringScheduleName': 'BankMarketingTarget-monitoring-schedule-2024-08-18-2029', 'ScheduledTime': datetime.datetime(2024, 8, 18, 21, 0, tzinfo=tzlocal()), 'CreationTime': datetime.datetime(2024, 8, 18, 21, 2, 21, 198000, tzinfo=tzlocal()), 'LastModifiedTime': datetime.datetime(2024, 8, 18, 21, 12, 53, 253000, tzinfo=tzlocal()), 'MonitoringExecutionStatus': 'CompletedWithViolations', 'ProcessingJobArn': 'arn:aws:sagemaker:us-west-2:730335512115:processing-job/model-quality-monitoring-202408182100-7460007b77e6223a3f739740', 'EndpointName': 'BankMarketingTarget-endpoint-2024-08-18-1958'}
====STOP====
No completed executions to inspect further. Please wait till an execution completes or investigate previously reported failures

Check for deviations from the baseline constraints to effectively set appropriate thresholds in your monitoring process. As you can see in the following screenshot, various metrics such as AUC, accuracy, recall, and F2 score are closely monitored, each subject to specific threshold checks like LessThanThreshold or GreaterThanThreshold. By actively monitoring these metrics, you can detect significant deviations and make informed decisions promptly, making sure your ML models perform optimally within established parameters.

Enable CloudWatch cross-account observability

With CloudWatch integrated into SageMaker Model Monitor to track the metrics of ML workloads running in the source accounts (production and test), the next step involves enabling CloudWatch cross-account observability. CloudWatch cross-account observability allows you to monitor and troubleshoot applications spanning multiple AWS accounts within an AWS Region. This feature enables seamless searching, visualization, and analysis of metrics, logs, traces, and Application Insights across linked accounts, eliminating account boundaries. You can use this feature to consolidate CloudWatch metrics from these source accounts into the observability account.

To achieve this centralized governance and monitoring, we establish two types of accounts:

  • Observability account – This central AWS account aggregates and interacts with ML workload metrics from the source accounts
  • Source accounts (production and test) – These individual AWS accounts share their ML workload metrics and logging resources with the central observability account, enabling centralized oversight and analysis

Configure the observability account

Complete the following steps to configure the observability account:

  1. On the CloudWatch console of the observability account, choose Settings in the navigation pane.
  2. In the Monitoring account configuration section, choose Configure.

  3. Select which telemetry data can be shared with the observability account.

  4. Under List source accounts, enter the source accounts that will share data with the observability account.

To link the source accounts, you can use account IDs, organization IDs, or organization paths. You can use an organization ID to include all accounts within the organization, or an organization path can target all accounts within a specific department or business unit. In this case, because we have two source accounts to link, we enter the account IDs of those two accounts.

  5. Choose Configure.

After the setup is complete, the message “Monitoring account enabled” appears in the CloudWatch settings.

Additionally, your source accounts are listed on the Configuration policy tab.

Link source accounts

Now that the observability account has been enabled with source accounts, you can link these source accounts within an AWS organization. You can choose from two methods:

  • For organizations using AWS CloudFormation, you can download a CloudFormation template and deploy it in a CloudFormation delegated administration account. This method facilitates the bulk addition of source accounts.
  • For linking individual accounts, two options are available:
    • Download a CloudFormation template that can be deployed directly within each source account.
    • Copy a provided URL, which simplifies the setup process using the AWS Management Console.

Complete the following steps to use the provided URL:

  1. Copy the URL and open it in a new browser window where you’re logged in as the source account.

  2. Configure the telemetry data you want to share. This can include logs, metrics, traces, Application Insights, or Internet Monitor.

During this process, you’ll notice that the Amazon Resource Name (ARN) of the observability account configuration is automatically filled in. This convenience is due to copying and pasting the URL provided in the earlier step. If, however, you choose not to use the URL, you can manually enter the ARN. Copy the ARN from the observability account settings and enter it into the designated field in the source account configuration page.

  3. Define the label that identifies your source accounts. This label is crucial for organizing and distinguishing your accounts within the monitoring system.
  4. Choose Link to finalize the connection between your source accounts and the observability account.

  5. Repeat these steps for both source accounts.

You should see those accounts listed on the Linked source accounts tab within the observability account CloudWatch settings configuration.
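
If you prefer to script the linking instead of using the console or CloudFormation, the following hedged sketch uses the CloudWatch Observability Access Manager (OAM) API; the sink name, label template, shared resource types, and account IDs are placeholders.

import json
import boto3

# In the observability (monitoring) account: create the sink and allow the
# source accounts to link to it.
oam_monitoring = boto3.client("oam")
sink = oam_monitoring.create_sink(Name="ml-observability-sink")
oam_monitoring.put_sink_policy(
    SinkIdentifier=sink["Arn"],
    Policy=json.dumps(
        {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Effect": "Allow",
                    "Principal": {"AWS": ["111111111111", "222222222222"]},  # source account IDs
                    "Action": ["oam:CreateLink", "oam:UpdateLink"],
                    "Resource": "*",
                }
            ],
        }
    ),
)

# In each source account (run with that account's credentials): link to the sink
# and share metrics and logs.
oam_source = boto3.client("oam")
oam_source.create_link(
    LabelTemplate="$AccountName",
    ResourceTypes=["AWS::CloudWatch::Metric", "AWS::Logs::LogGroup"],
    SinkIdentifier=sink["Arn"],
)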

Consolidate metrics across source accounts and build unified dashboards

In the observability account, you can access and monitor detailed metrics related to your ML workloads and endpoints deployed across the source accounts. This centralized view allows you to track a variety of metrics, including those from SageMaker endpoints and processing jobs, all within a single interface.

The following screenshot displays CloudWatch model metrics for endpoints in your source accounts. Because you linked the production and test source accounts using the label as the account name, CloudWatch categorizes metrics by account label, effectively distinguishing between the production and test environments. It organizes key details into columns, including account labels, metric names, endpoints, and performance metrics like accuracy and AUC, all captured by scheduled monitoring jobs. These metrics offer valuable insights into the performance of your models across these environments.

The observability account allows you to monitor key metrics of ML workloads and endpoints. The following screenshots display CPU utilization metrics associated with the BankMarketingTarget model and BankMarketing model endpoints you deployed in the source accounts. This view provides detailed insights into critical performance indicators, including:

  • CPU utilization
  • Memory utilization
  • Disk utilization

Furthermore, you can create dashboards that offer a consolidated view of key metrics related to your ML workloads running across the linked source accounts. These centralized dashboards are pivotal for overseeing the performance, reliability, and quality of your ML models on a large scale.

Let’s look at a consolidated view of the ML workload metrics running in our production and test source accounts. This dashboard provides us with immediate access to critical information:

  • AUC scores – Indicating model performance, giving insights into the trade-off between true positives and false positives
  • Accuracy rates – Showing prediction correctness, which helps in assessing the overall reliability of the model
  • F2 scores – Offering a balance between precision and recall, particularly valuable when false negatives are more critical to minimize
  • Total number of violations – Highlighting any breaches in predefined thresholds or constraints, making sure the model adheres to expected behavior
  • CPU usage levels – Helping you manage resource allocation by monitoring the processing power utilized by the ML workloads
  • Disk utilization percentages – Providing efficient storage management by keeping track of how much disk space is being consumed

The following screenshots show CloudWatch dashboards for the models deployed in our production and test source accounts. We track metrics for accuracy, AUC, CPU and disk utilization, and violation counts, providing insights into model performance and resource usage.

You can configure CloudWatch alarms to proactively monitor and receive notifications on critical ML workload metrics from your source accounts. The following screenshot shows an alarm configured to track the accuracy of our bank marketing prediction model in the production account. This alarm is set to trigger if the model’s accuracy falls below a specified threshold, so any significant degradation in performance is promptly detected and addressed. By using such alarms, you can maintain high standards of model performance and quickly respond to potential issues within your ML infrastructure.
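
The following is a hedged sketch of such an alarm created with boto3; the namespace, dimensions, and threshold reflect typical SageMaker model quality metrics and should be adjusted to match what you see in your CloudWatch console.

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="bank-marketing-accuracy-below-threshold",
    Namespace="aws/sagemaker/Endpoints/model-metrics",  # assumed model quality namespace
    MetricName="accuracy",
    Dimensions=[
        {"Name": "Endpoint", "Value": "BankMarketingTarget-endpoint-2024-08-18-1958"},
        {"Name": "MonitoringSchedule", "Value": "BankMarketingTarget-monitoring-schedule-2024-08-18-2029"},
    ],
    Statistic="Average",
    Period=3600,                       # matches the hourly monitoring schedule
    EvaluationPeriods=1,
    Threshold=0.9,                     # example threshold
    ComparisonOperator="LessThanThreshold",
    TreatMissingData="notBreaching",
)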

You can also create a comprehensive CloudWatch dashboard for monitoring various aspects of Amazon SageMaker Studio, including the number of domains, apps, and user profiles across different AWS accounts. The following screenshot illustrates a dashboard that centralizes key metrics from the production and test source accounts.

Configure centralized logging of API calls across multiple accounts with CloudTrail

If AWS Control Tower has been configured to automatically create an organization-wide trail, each account will send a copy of its CloudTrail event trail to a centralized Amazon Simple Storage Service (Amazon S3) bucket. This bucket is typically created in the log archive account and is configured with limited access, where it serves as a single source of truth for security personnel. If you want to set up a separate account to allow the ML admin team to have access, you can configure replication from the log archive account. You can create the destination bucket in the observability account.

After you create the bucket for replicated logs, you can configure Amazon S3 replication by defining the source and destination bucket, and attaching the required AWS Identity and Access Management (IAM) permissions. Then you update the destination bucket policy to allow replication.

Complete the following steps:

  1. Create an S3 bucket in the observability account.
  2. Log in to the log archive account.
  3. On the Amazon S3 console, open the Control Tower logs bucket, which will have the format aws-controltower-logs-{ACCOUNT-ID}-{REGION}.

You should see an existing key that corresponds to your organization ID. The trail logs are stored under /{ORG-ID}/AWSLogs/{ACCOUNT-ID}/CloudTrail/{REGION}/YYYY/MM/DD.

  4. On the Management tab, choose Create replication rule.
  5. For Replication rule name, enter a name, such as replicate-ml-workloads-to-observability.
  6. Under Source bucket, select Limit the scope of the rule using one or more filters, and enter a path that corresponds to the account you want to enable querying against.

  7. Select Specify a bucket in another account and enter the observability account ID and the bucket name.
  8. Select Change object ownership to destination bucket owner.
  9. For IAM role, choose Create new role.

After you set the cross-account replication, the logs being stored in the S3 bucket in the log archive account will be replicated in the observability account. You can now use Amazon Athena to query and analyze the data being stored in Amazon S3. If you don’t have Control Tower configured, you have to manually configure CloudTrail in each account to write to the S3 bucket in the centralized observability account for analysis. If your organization has more stringent security and compliance requirements, you can configure replication of just the SageMaker logs from the log archive account to the bucket in the observability account by integrating Amazon S3 Event Notifications with AWS Lambda functions.
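
For reference, the following hedged sketch shows how the replication rule described above could be created with the Amazon S3 API; bucket names, the prefix, the role ARN, and account IDs are placeholders.

import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="aws-controltower-logs-[ACCOUNT-ID]-[REGION]",  # log archive bucket
    ReplicationConfiguration={
        "Role": "arn:aws:iam::[LOG-ARCHIVE-ACCOUNT-ID]:role/[REPLICATION-ROLE]",
        "Rules": [
            {
                "ID": "replicate-ml-workloads-to-observability",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": "[ORG-ID]/AWSLogs/[ACCOUNT-ID]/CloudTrail/"},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::[OBSERVABILITY-LOGS-BUCKET]",
                    "Account": "[OBSERVABILITY-ACCOUNT-ID]",
                    "AccessControlTranslation": {"Owner": "Destination"},
                },
            }
        ],
    },
)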

The following is a sample query run against the logs stored in the observability account bucket and the associated result in Athena:

SELECT useridentity.arn, useridentity.sessioncontext.sourceidentity, requestparameters
FROM observability_replicated_logs
WHERE eventname = 'CreateEndpoint'
AND eventsource = 'sagemaker.amazonaws.com'

Conclusion

Centralized observability in a multi-account setup empowers organizations to manage ML workloads at scale. By integrating SageMaker Model Monitor with cross-account observability in CloudWatch, you can build a robust framework for real-time monitoring and governance across multiple environments.

This architecture not only provides continuous oversight of model performance, but also significantly enhances your ability to quickly identify and resolve potential issues, thereby improving governance and security throughout your ML ecosystem.

In this post, we outlined the essential steps for implementing centralized observability within your AWS environment, from setting up SageMaker Model Monitor to using cross-account features in CloudWatch. We also demonstrated centralizing CloudTrail logs by replicating them from the log archive account and querying them using Athena to get insights into user activity within SageMaker environments across the organization.

As you implement this solution, remember that achieving optimal observability is an ongoing process. Continually refining and expanding your monitoring capabilities is crucial to making sure your ML models remain reliable, efficient, and aligned with business objectives. As ML practices evolve, blending cutting-edge technology with sound governance principles is key. Run the code yourself using the following notebook or try out the observability module in the following workshop.


About the Authors

Abhishek Doppalapudi is a Solutions Architect at Amazon Web Services (AWS), where he assists startups in building and scaling their products using AWS services. Currently, he is focused on helping AWS customers adopt Generative AI solutions. In his free time, Abhishek enjoys playing soccer, watching Premier League matches, and reading.

Venu Kanamatareddy is a Startup Solutions Architect at AWS. He brings 16 years of extensive IT experience working with both Fortune 100 companies and startups. Currently, Venu is helping guide and assist Machine Learning and Artificial Intelligence-based startups to innovate, scale, and succeed.

Vivek Gangasani is a Senior GenAI Specialist Solutions Architect at AWS. He helps emerging GenAI companies build innovative solutions using AWS services and accelerated compute. Currently, he is focused on developing strategies for fine-tuning and optimizing the inference performance of Large Language Models. In his free time, Vivek enjoys hiking, watching movies and trying different cuisines.

Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 3 decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure, scalable, reliable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he rides his motorcycle and walks with his three-year-old sheep-a-doodle!

Computational Bottlenecks of Training Small-Scale Large Language Models

This paper was accepted at the Efficient Natural Language and Speech Processing (ENLSP) workshop at NeurIPS Workshop 2024.
While large language models (LLMs) dominate the AI landscape, small-scale large language models (SLMs) are gaining attention due to cost and efficiency demands from consumers. However, there is limited research on the training behavior and computational requirements of SLMs. In this study, we explore the computational bottlenecks of training SLMs (up to 2B parameters) by examining the effects of various hyperparameters and configurations, including GPU type, batch size…Apple Machine Learning Research

ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA Datasets with Large Language Models

The rapid evolution of Large Language Models (LLMs) and conversational assistants necessitates dynamic, scalable, and configurable conversational datasets for training and evaluation. These datasets must accommodate diverse user interaction modes, including text and voice, each presenting unique modeling challenges. Knowledge Graphs (KGs), with their structured and evolving nature, offer an ideal foundation for current and precise knowledge. Although human-curated KG-based conversational datasets exist, they struggle to keep pace with the rapidly changing user information needs. We present…Apple Machine Learning Research

What's new in TensorFlow 2.18

Posted by the TensorFlow team

TensorFlow 2.18 has been released! Highlights of this release (and 2.17) include NumPy 2.0, LiteRT repository, CUDA Update, Hermetic CUDA and more. For the full release notes, please click here.

Note: Release updates on the new multi-backend Keras will be published on keras.io, starting with Keras 3.0. For more information, please see https://keras.io/keras_3/.

TensorFlow Core

NumPy 2.0

TensorFlow 2.18 includes support for NumPy 2.0. While the majority of TensorFlow APIs will function seamlessly with NumPy 2.0, this may break some edge cases of usage, e.g., out-of-boundary conversion errors and numpy scalar representation errors. You can consult the following common solutions.

Note that NumPy’s type promotion rules have been changed (See NEP 50 for details). This may change the precision at which computations happen, leading either to type errors or to numerical changes to results. Please see the NumPy 2 migration guide.

We’ve updated some TensorFlow tensor APIs to maintain compatibility with NumPy 2.0 while preserving the out-of-boundary conversion behavior in NumPy 1.x.
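
As a quick, hedged illustration of the kind of out-of-boundary conversion change mentioned above (exact behavior depends on your NumPy version):

import numpy as np

print("NumPy version:", np.__version__)

try:
    value = np.uint8(300)            # 300 is out of range for uint8
    print("Converted to:", value)    # older NumPy releases wrapped the value
except OverflowError as err:
    print("NumPy 2.x raises:", err)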

LiteRT Repository

We’re making some changes to how LiteRT (formerly known as TFLite) is developed. Over the coming months, we’ll be gradually transitioning TFLite’s codebase to LiteRT. Once the migration is complete, we’ll start accepting contributions directly through the LiteRT repository. There will no longer be any binary TFLite releases and developers should switch to LiteRT for the latest updates.

Hermetic CUDA

If you build TensorFlow from source, Bazel will now download specific versions of CUDA, CUDNN and NCCL distributions, and then use those tools as dependencies in various Bazel targets. This enables more reproducible builds for Google ML projects and supported CUDA versions because the build no longer relies on the locally installed versions. More details are provided here.

CUDA Update

TensorFlow binary distributions now ship with dedicated CUDA kernels for GPUs with a compute capability of 8.9. This improves the performance on the popular Ada-generation GPUs like the NVIDIA RTX 40 series, L4 and L40.

To keep Python wheel sizes in check, we made the decision to no longer ship CUDA kernels for compute capability 5.0. That means the oldest NVIDIA GPU generation supported by the precompiled Python packages is now the Pascal generation (compute capability 6.0). For Maxwell support, we either recommend sticking with TensorFlow version 2.16, or compiling TensorFlow from source. The latter will be possible as long as the used CUDA version still supports Maxwell GPUs.

Import data from Google Cloud Platform BigQuery for no-code machine learning with Amazon SageMaker Canvas

In the modern, cloud-centric business landscape, data is often scattered across numerous clouds and on-site systems. This fragmentation can complicate efforts by organizations to consolidate and analyze data for their machine learning (ML) initiatives.

This post presents an architectural approach to extract data from different cloud environments, such as Google Cloud Platform (GCP) BigQuery, without the need for data movement. This minimizes the complexity and overhead associated with moving data between cloud environments, enabling organizations to access and utilize their disparate data assets for ML projects.

We highlight the process of using Amazon Athena Federated Query to extract data from GCP BigQuery, using Amazon SageMaker Data Wrangler to perform data preparation, and then using the prepared data to build ML models within Amazon SageMaker Canvas, a no-code ML interface.

SageMaker Canvas allows business analysts to access and import data from over 50 sources, prepare data using natural language and over 300 built-in transforms, build and train highly accurate models, generate predictions, and deploy models to production without requiring coding or extensive ML experience.

Solution overview

The solution outlines two main steps:

  • Set up Amazon Athena for federated queries from GCP BigQuery, which enables running live queries in GCP BigQuery directly from Athena
  • Import the data into SageMaker Canvas from BigQuery using Athena as an intermediary

After the data is imported into SageMaker Canvas, you can use the no-code interface to build ML models and generate predictions based on the imported data.

You can use SageMaker Canvas to build the initial data preparation routine and generate accurate predictions without writing code. However, as your ML needs evolve or require more advanced customization, you may want to transition from a no-code environment to a code-first approach. The integration between SageMaker Canvas and Amazon SageMaker Studio allows you to operationalize the data preparation routine for production-scale deployments. For more details, refer to Seamlessly transition between no-code and code-first machine learning with Amazon SageMaker Canvas and Amazon SageMaker Studio.

The overall architecture, as seen below, demonstrates how to use AWS services to seamlessly access and integrate data from a GCP BigQuery data warehouse into SageMaker Canvas for building and deploying ML models.

Solution Architecture Diagram

The workflow includes the following steps:

  1. Within the SageMaker Canvas interface, the user composes a SQL query to run against the GCP BigQuery data warehouse. SageMaker Canvas relays this query to Athena, which acts as an intermediary service, facilitating the communication between SageMaker Canvas and BigQuery.
  2. Athena uses the Athena Google BigQuery connector, which uses a pre-built AWS Lambda function to enable Athena federated query capabilities. This Lambda function retrieves the necessary BigQuery credentials (service account private key) from AWS Secrets Manager for authentication purposes.
  3. After authentication, the Lambda function uses the retrieved credentials to query BigQuery and obtain the desired result set. It parses this result set and sends it back to Athena.
  4. Athena returns the queried data from BigQuery to SageMaker Canvas, where you can use it for ML model training and development purposes within the no-code interface.

This solution offers the following benefits:

  • Seamless integration – SageMaker Canvas empowers you to integrate and use data from various sources, including cloud data warehouses like BigQuery, directly within its no-code ML environment. This integration eliminates the need for additional data movement or complex integrations, enabling you to focus on building and deploying ML models without the overhead of data engineering tasks.
  • Secure access – The use of Secrets Manager makes sure BigQuery credentials are securely stored and accessed, enhancing the overall security of the solution.
  • Scalability – The serverless nature of the Lambda function and the ability in Athena to handle large datasets make this solution scalable and able to accommodate growing data volumes. Additionally, you can use multiple queries to partition the data to source in parallel.

In the next sections, we dive deeper into the technical implementation details and walk through a step-by-step demonstration of this solution.

Dataset

The steps outlined in this post provide an example of how to import data into SageMaker Canvas for no-code ML. In this example, we demonstrate how to import data through Athena from GCP BigQuery.

For our dataset, we use a synthetic dataset from a telecommunications mobile phone carrier. This sample dataset contains 5,000 records, where each record uses 21 attributes to describe the customer profile. The Churn column in the dataset indicates whether the customer left service (true/false). This Churn attribute is the target variable that the ML model should aim to predict.

The following screenshot shows an example of the dataset on the BigQuery console.

Example Dataset in BigQuery Console

Prerequisites

Complete the following prerequisite steps:

  1. Create a service account in GCP and a service account key.
  2. Download the private key JSON file.
  3. Store the JSON file in Secrets Manager:
    1. On the Secrets Manager console, choose Secrets in the navigation pane, then choose Store a new secret.
    2. For Secret type, select Other type of secret.
    3. Copy the contents of the JSON file and enter it under Key/value pairs on the Plaintext tab.

AWS Secret Manager Setup

  4. If you don’t have a SageMaker domain already created, create it along with the user profile. For instructions, see Quick setup to Amazon SageMaker.
  5. Make sure the user profile has permission to invoke Athena by confirming that the AWS Identity and Access Management (IAM) role has glue:GetDatabase and athena:GetDataCatalog permission on the resource. See the following example:
    {
    "Version": "2012-10-17",
    "Statement": [
    {
    "Sid": "VisualEditor0",
    "Effect": "Allow",
    "Action": [
    "glue:GetDatabase",
    "athena:GetDataCatalog"
    ],
    "Resource": [
    "arn:aws:glue:*:<AWS account id>:catalog",
    "arn:aws:glue:*:<AWS account id>:database/*",
    "arn:aws:athena:*:<AWS account id>:datacatalog/*"
    ]
    }
    ]
    }
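If you manage the execution role programmatically rather than in the console, a sketch along the following lines attaches the preceding policy as an inline policy with boto3. The role name, policy name, and local file name are placeholders:

import boto3

iam = boto3.client("iam")

# Placeholder file containing the policy document shown above
with open("athena_federated_access_policy.json") as f:
    policy_document = f.read()

iam.put_role_policy(
    RoleName="<your-sagemaker-execution-role>",   # placeholder role name
    PolicyName="AthenaFederatedQueryAccess",      # placeholder policy name
    PolicyDocument=policy_document,
)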

Register the Athena data source connector

Complete the following steps to set up the Athena data source connector:

  1. On the Athena console, choose Data sources in the navigation pane.
  2. Choose Create data source.
  3. On the Choose a data source page, search for and select Google BigQuery, then choose Next.

Select BigQuery as Datasource on Amazon Athena

  4. On the Enter data source details page, provide the following information:
    1. For Data source name, enter a name.
    2. For Description, enter an optional description.
    3. For Lambda function, choose Create Lambda function to configure the connection.

Provide Data Source Details

  5. Under Application settings, enter the following details:
    1. For SpillBucket, enter the name of the bucket where the function can spill data.
    2. For GCPProjectID, enter the project ID within GCP.
    3. For LambdaFunctionName, enter the name of the Lambda function that you’re creating.
    4. For SecretNamePrefix, enter the secret name stored in Secrets Manager that contains GCP credentials.

Application settings for data source connector

  6. Choose Deploy.

You’re returned to the Enter data source details page.

  7. In the Connection details section, choose the refresh icon under Lambda function.
  8. Choose the Lambda function you just created. The ARN of the Lambda function is displayed.
  9. Optionally, for Tags, add key-value pairs to associate with this data source.

For more information about tags, see Tagging Athena resources.

Lambda function connection details

  10. Choose Next.
  11. On the Review and create page, review the data source details, then choose Create data source.

The Data source details section of the page for your data source shows information about your new connector. You can now use the connector in your Athena queries. For information about using data connectors in queries, see Running federated queries.

To query from Athena, launch the Athena SQL editor and choose the data source you created. You should be able to run live queries against the BigQuery database.

Athena Query Editor
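You can also run the same check programmatically. The following is a minimal sketch using boto3; the data source, database, and table names match the ones created in this walkthrough, while the results bucket is a placeholder you must replace:

import time
import boto3

athena = boto3.client("athena")

# "bigquery" is the federated data source and "athenabigquery" is the BigQuery dataset created above
query = 'SELECT * FROM "bigquery"."athenabigquery"."customer_churn" LIMIT 10'

execution = athena.start_query_execution(
    QueryString=query,
    ResultConfiguration={"OutputLocation": "s3://<your-athena-results-bucket>/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the federated query finishes
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"][:5]:
        print(row)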

Connect to SageMaker Canvas with Athena as a data source

To import data from Athena, complete the following steps:

  1. On the SageMaker Canvas console, choose Data Wrangler in the navigation pane.
  2. Choose Import data and prepare.
  3. Select Tabular as the dataset type.
  4. Choose Athena as the data source.

SageMaker Data Wrangler in SageMaker Canvas allows you to prepare, featurize, and analyze your data. You can integrate a SageMaker Data Wrangler data preparation flow into your ML workflows to simplify and streamline data preprocessing and feature engineering using little to no coding.

  5. Choose an Athena table in the left pane from AwsDataCatalog and drag and drop the table into the right pane.

SageMaker Data Wrangler Select Athena Table

  6. Choose Edit in SQL and enter the following SQL query:
SELECT
    state,
    account_length,
    area_code,
    phone,
    intl_plan,
    vmail_plan,
    vmail_message,
    day_mins,
    day_calls,
    day_charge,
    eve_mins,
    eve_calls,
    eve_charge,
    night_mins,
    night_calls,
    night_charge,
    intl_mins,
    intl_calls,
    intl_charge,
    custserv_calls,
    churn
FROM "bigquery"."athenabigquery"."customer_churn"
ORDER BY random()
LIMIT 50;

In the preceding query, bigquery is the data source name created in Athena, athenabigquery is the database name, and customer_churn is the table name.

  7. Choose Run SQL to preview the dataset. When you’re satisfied with the data, choose Import.

Run SQL to preview the dataset

When working with ML, it’s crucial to randomize or shuffle the dataset. This step is essential because you may have access to millions or billions of data points, but you don’t necessarily need to use the entire dataset for training the model. Instead, you can limit the data to a smaller subset specifically for training purposes. After you’ve shuffled and prepared the data, you can begin the iterative process of data preparation, feature evaluation, model training, and ultimately hosting the trained model.
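As a quick illustration of the same idea outside Canvas, the following sketch shuffles and subsamples a large extract with pandas. The file name and sample size are placeholders:

import pandas as pd

# Placeholder file name for a large exported dataset
df = pd.read_csv("customer_churn_full.csv")

# Shuffle and keep a smaller subset for model training
train_sample = df.sample(n=50_000, random_state=42)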

  8. You can process or export your data to a location that is suitable for your ML workflows. For example, you can export the transformed data as a SageMaker Canvas dataset and create an ML model from it.
  9. After you export your data, choose Create model to create an ML model from your data.

Create Model Option

The data is imported into SageMaker Canvas as a dataset from the specific table in Athena. You can now use this dataset to create a model.

Train a model

After your data is imported, it shows up on the Datasets page in SageMaker Canvas. At this stage, you can build a model. To do so, complete the following steps:

  1. Select your dataset and choose Create a model.

Create model from SageMaker Datasets menu option

  2. For Model name, enter your model name (for this post, my_first_model).

SageMaker Canvas enables you to create models for predictive analysis, image analysis, and text analysis.

  3. Because we want to categorize customers, select Predictive analysis for Problem type.
  4. Choose Create.

Create predictive analysis model

On the Build page, you can see statistics about your dataset, such as the percentage of missing values and mode of the data.

  5. For Target column, choose a column that you want to predict (for this post, churn).

SageMaker Canvas offers two types of models that can generate predictions. Quick build prioritizes speed over accuracy, providing a model in 2–15 minutes. Standard build prioritizes accuracy over speed, providing a model in 30 minutes–2 hours.

  6. For this example, choose Quick build.

Model quick build

After the model is trained, you can analyze the model accuracy.

The Overview tab shows us the column impact, or the estimated importance of each column in predicting the target column. In this example, the Night_calls column has the most significant impact in predicting if a customer will churn. This information can help the marketing team gain insights that lead to taking actions to reduce customer churn. For example, we can see that both low and high CustServ_Calls increase the likelihood of churn. The marketing team can take actions to help prevent customer churn based on these learnings. Examples include creating a detailed FAQ on websites to reduce customer service calls, and running education campaigns with customers on the FAQ that can keep engagement up.

Model outcome & results

Generate predictions

On the Predict tab, you can generate both batch predictions and single predictions. Complete the following steps to generate a batch prediction:

  1. Download the following sample inference dataset for generating predictions.
  2. To test batch predictions, choose Batch prediction.

SageMaker Canvas allows you to generate batch predictions either manually or automatically on a schedule. To learn how to automate batch predictions on a schedule, refer to Manage automations.

  3. For this post, choose Manual.
  4. Upload the file you downloaded.
  5. Choose Generate predictions.

After a few seconds, the prediction is complete, and you can choose View to see the prediction.

View generated predictions

Optionally, choose Download to download a CSV file containing the full output. SageMaker Canvas will return a prediction for each row of data and the probability of the prediction being correct.

Download CSV Output

Optionally, you can deploy your models to an endpoint to make predictions. For more information, refer to Deploy your models to an endpoint.

Clean up

To avoid future charges, log out of SageMaker Canvas.

Conclusion

In this post, we showcased a solution to extract the data from BigQuery using Athena federated queries and a sample dataset. We then used the extracted data to build an ML model using SageMaker Canvas to predict customers at risk of churning—without writing code. SageMaker Canvas enables business analysts to build and deploy ML models effortlessly through its no-code interface, democratizing ML across the organization. This enables you to harness the power of advanced analytics and ML to drive business insights and innovation, without the need for specialized technical skills.

For more information, see Query any data source with Amazon Athena’s new federated query and Import data from over 40 data sources for no-code machine learning with Amazon SageMaker Canvas. If you’re new to SageMaker Canvas, refer to Build, Share, Deploy: how business analysts and data scientists achieve faster time-to-market using no-code ML and Amazon SageMaker Canvas.


About the authors

Amit Gautam is an AWS senior solutions architect supporting enterprise customers in the UK on their cloud journeys, providing them with architectural advice and guidance that helps them achieve their business outcomes.

Sujata Singh is an AWS senior solutions architect supporting enterprise customers in the UK on their cloud journeys, providing them with architectural advice and guidance that helps them achieve their business outcomes.

Read More

Customized model monitoring for near real-time batch inference with Amazon SageMaker

Customized model monitoring for near real-time batch inference with Amazon SageMaker

Real-world applications vary in inference requirements for their artificial intelligence and machine learning (AI/ML) solutions to optimize performance and reduce costs. Examples include financial systems processing transaction data streams, recommendation engines processing user activity data, and computer vision models processing video frames. In these scenarios, customized model monitoring for near real-time batch inference with Amazon SageMaker is essential, making sure the quality of predictions is continuously monitored and any deviations are promptly detected.

In this post, we present a framework to customize the use of Amazon SageMaker Model Monitor for handling multi-payload inference requests for near real-time inference scenarios. SageMaker Model Monitor monitors the quality of SageMaker ML models in production. Early and proactive detection of deviations in model quality enables you to take corrective actions, such as retraining models, auditing upstream systems, or fixing quality issues without having to monitor models manually or build additional tooling. SageMaker Model Monitor provides monitoring capabilities for data quality, model quality, bias drift in a model’s predictions, and drift in feature attribution. SageMaker Model Monitor adapts well to common AI/ML use cases and provides advanced capabilities given edge case requirements such as monitoring custom metrics, handling ground truth data, or processing inference data capture.

You can deploy your ML model to SageMaker hosting services and get a SageMaker endpoint for real-time inference. Your client applications invoke this endpoint to get inferences from the model. To reduce the number of invocations and meet custom business objectives, AI/ML developers can customize inference code to send multiple inference records in one payload to the endpoint for near real-time model predictions. Rather than using a SageMaker Model Monitoring schedule with native configurations, a SageMaker Model Monitor Bring Your Own Container (BYOC) approach meets these custom requirements. Although this advanced BYOC topic can appear overwhelming to AI/ML developers, with the right framework, there is opportunity to accelerate SageMaker Model Monitor BYOC development for customized model monitoring requirements.

In this post, we provide a BYOC framework with SageMaker Model Monitor to enable customized payload handling (such as multi-payload requests) from SageMaker endpoint data capture, use ground truth data, and output custom business metrics for model quality.

Overview of solution

SageMaker Model Monitor provides a pre-built image based on Deequ (built on Apache Spark), which accelerates getting started with model monitoring. This pre-built image can become limiting when customization is required. For example, it expects one inference payload per invocation (request to a SageMaker endpoint). If you send multiple payloads in a single invocation to reduce the number of invocations, you need capabilities beyond the native SageMaker Model Monitor configuration.

A preprocessor script is a capability of SageMaker Model Monitor to preprocess SageMaker endpoint data capture before creating metrics for model quality. However, even with a preprocessor script, you still face a mismatch in the designed behavior of SageMaker Model Monitor, which expects one inference payload per request.

Given these requirements, we create the BYOC framework shown in the following diagram. In this example, we demonstrate setting up a SageMaker Model Monitor job for monitoring model quality.

The workflow includes the following steps:

  1.  Before and after training an AI/ML model, an AI/ML developer creates baseline and validation data that is used downstream for monitoring model quality. For example, users can save the accuracy score of a model, or create custom metrics, to validate model quality.
  2. An AI/ML developer creates a SageMaker endpoint including custom inference scripts. Data capture must be enabled for the SageMaker endpoint to save real-time inference data to Amazon Simple Storage Service (Amazon S3) and support downstream SageMaker Model Monitor.
  3. A user or application sends a request including multiple inference payloads. If you have a large volume of inference records, SageMaker batch transform may be a suitable option for your use case.
  4. The SageMaker endpoint (which includes the custom inference code to preprocess the multi-payload request) passes the inference data to the ML model, postprocesses the predictions, and sends a response to the user or application. The information pertaining to the request and response is stored in Amazon S3.
  5. Independent of calling the SageMaker endpoint, the user or application generates ground truth for the predictions returned by the SageMaker endpoint.
  6. A custom image (BYOC) is pushed to Amazon Elastic Container Registry (Amazon ECR) that contains code to perform the following actions:
    • Read input and output contracts required for SageMaker Model Monitor.
    • Read ground truth data.
    • Optionally, read any baseline constraint or validation data (such as accuracy score threshold).
    • Process data capture stored in Amazon S3 from the SageMaker endpoint.
    • Compare real-time data with ground truth and create model quality metrics.
    • Publish metrics to Amazon CloudWatch Logs and output a model quality report.
  7. The AI/ML developer creates a SageMaker Model Monitor schedule that references the custom image (BYOC) as its image URI.

This post uses code provided in the following GitHub repo to demonstrate the solution. The process includes the following steps:

  1. Train a multi-classification XGBoost model using the public forest coverage dataset.
  2. Create an inference script for the SageMaker endpoint for custom inference logic.
  3. Create a SageMaker endpoint with data capture enabled.
  4. Create a constraint file that contains metrics used to determine if model quality alerts should be generated.
  5. Create a custom Docker image for SageMaker Model Monitor by using the SageMaker Docker Build CLI and push it to Amazon ECR.
  6. Create a SageMaker Model Monitor schedule with the BYOC image.
  7. View the custom model quality report generated by the SageMaker Model Monitor job.

Prerequisites

To follow along with this walkthrough, make sure you have the following prerequisites:

Train the model

In the SageMaker Studio environment, launch a SageMaker training job to train a multi-classification model and output model artifacts to Amazon S3:


from sagemaker.xgboost.estimator import XGBoost
from sagemaker.estimator import Estimator

hyperparameters = {
    "max_depth": 5,
    "eta": 0.36,
    "gamma": 2.88,
    "min_child_weight": 9.89,
    "subsample": 0.77,
    "objective": "multi:softprob",
    "num_class": 7,
    "num_round": 50
}

xgb_estimator = XGBoost(
    entry_point="./src/train.py",
    hyperparameters=hyperparameters,
    role=role,
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    framework_version="1.5-1",
    output_path=f's3://{bucket}/{prefix_name}/models'
)

xgb_estimator.fit(
    {
        "train": train_data_path,
        "validation": validation_data_path
    },
    wait=True,
    logs=True
)

Create the inference code

Before you deploy the SageMaker endpoint, create an inference script (inference.py) that contains a function to preprocess the request with multiple payloads, invoke the model, and postprocess results.

For output_fn, a payload index is created for each inference record found in the request. This enables you to merge ground truth records with data capture within the SageMaker Model Monitor job.

See the following code:

def input_fn(input_data, content_type):
    """Take request data and de-serializes the data into an object for prediction.
        When an InvokeEndpoint operation is made against an Endpoint running SageMaker model server,
        the model server receives two pieces of information:
            - The request Content-Type, for example "application/json"
            - The request data, which is at most 5 MB (5 * 1024 * 1024 bytes) in size.
    Args:
        input_data (obj): the request data.
        content_type (str): the request Content-Type.
    Returns:
        (obj): data ready for prediction. For XGBoost, this defaults to DMatrix.
    """
    
    if content_type == "application/json":
        request_json = json.loads(input_data)
        prediction_df = pd.DataFrame.from_dict(request_json)
        return xgb.DMatrix(prediction_df)
    else:
        raise ValueError


def predict_fn(input_data, model):
    """A predict_fn for XGBooost Framework. Calls a model on data deserialized in input_fn.
    Args:
        input_data: input data (DMatrix) for prediction deserialized by input_fn
        model: XGBoost model loaded in memory by model_fn
    Returns: a prediction
    """
    output = model.predict(input_data, validate_features=True)
    return output


def output_fn(prediction, accept):
    """Function responsible to serialize the prediction for the response.
    Args:
        prediction (obj): prediction returned by predict_fn .
        accept (str): accept content-type expected by the client.
    Returns: JSON output
    """
    
    if accept == "application/json":
        prediction_labels = np.argmax(prediction, axis=1)
        prediction_scores = np.max(prediction, axis=1)
        output_returns = [
            {
                "payload_index": int(index), 
                "label": int(label), 
                "score": float(score)} for label, score, index in zip(
                prediction_labels, prediction_scores, range(len(prediction_labels))
            )
        ]
        return worker.Response(encoders.encode(output_returns, accept), mimetype=accept)
    
    else:
        raise ValueError

Deploy the SageMaker endpoint

Now that you have created the inference script, you can create the SageMaker endpoint:


from sagemaker.model_monitor import DataCaptureConfig

predictor = xgb_estimator.deploy(
    instance_type="ml.m5.large",
    initial_instance_count=1,
    wait=True,
    data_capture_config=DataCaptureConfig(
        enable_capture=True,
        sampling_percentage=100,
        destination_s3_uri=f"s3://{bucket}/{prefix_name}/model-monitor/data-capture"
    ),
    source_dir="./src",
    entry_point="inference.py"
)

Create constraints for model quality monitoring

In model quality monitoring, you need to compare your metric generated from ground truth and data capture with a pre-specified threshold. In this example, we use the accuracy value of the trained model on the test set as a threshold. If the newly computed accuracy metric (generated using ground truth and data capture) is lower than this threshold, a violation report will be generated and the metrics will be published to CloudWatch.

See the following code:

constraints_dict = {
    "accuracy":{
        "threshold": accuracy_value
    }
}
    

# Serializing json
json_object = json.dumps(constraints_dict, indent=4)
 
# Writing to sample.json
with open("constraints.json", "w") as outfile:
    outfile.write(json_object)

This constraints.json file is written to Amazon S3 and will be an input for the SageMaker Model Monitor processing job downstream.
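As a minimal sketch, the upload might look like the following, reusing the bucket and prefix_name variables from earlier steps; the key mirrors the baseline-data path referenced later when the monitoring schedule is created:

import boto3

s3 = boto3.client("s3")

# Upload constraints.json so the downstream monitoring job can read it
constraints_key = f"{prefix_name}/model-monitor/mqm/baseline-data/constraints.json"
s3.upload_file("constraints.json", bucket, constraints_key)
print(f"s3://{bucket}/{constraints_key}")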

Publish the BYOC image to Amazon ECR

Create a script named model_quality_monitoring.py to perform the following functions (a sketch of the data capture handling follows this list):

  • Read environment variables and any arguments passed to the SageMaker Model Monitor job
  • Read SageMaker endpoint data capture and constraint metadata configured with the SageMaker Model Monitor job
  • Read ground truth data from Amazon S3 using the AWS SDK for pandas
  • Create accuracy metrics with data capture and ground truth
  • Create metrics and violation reports given constraint violations
  • Publish metrics to CloudWatch if violations are present
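To illustrate the data capture handling referenced in the preceding list, the following is a minimal sketch, not the repository's exact implementation, of expanding one data-capture record that contains multiple payloads into one row per inference record. Field names follow the JSON Lines layout of SageMaker endpoint data capture; verify them against your own capture files:

import json

def expand_capture_line(line):
    """Expand one data-capture JSON Lines record into one row per inference payload."""
    record = json.loads(line)
    # The serialized response produced by output_fn (one object per payload)
    response = json.loads(record["captureData"]["endpointOutput"]["data"])
    # The original multi-payload request is also available if features are needed:
    # request = json.loads(record["captureData"]["endpointInput"]["data"])
    inference_id = record["eventMetadata"].get("inferenceId")

    rows = []
    for prediction in response:
        rows.append(
            {
                "InferenceId": inference_id,
                "payload_index": prediction["payload_index"],
                "predicted_label": prediction["label"],
                "score": prediction["score"],
            }
        )
    return rows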

This script serves as the entry point for the SageMaker Model Monitor job. With a custom image, the entry point script needs to be specified in the Docker image, as shown in the following code. This way, when the SageMaker Model Monitor job initiates, the specified script is run. The sm-mm-mqm-byoc:1.0 image URI is passed to the image_uri argument when you define the SageMaker Model Monitor job downstream.

FROM 683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-scikit-learn:1.2-1-cpu-py3

RUN python3 -m pip install awswrangler

ENV PYTHONUNBUFFERED=TRUE

ADD ./src/model_quality_monitoring.py /

ENTRYPOINT ["python3", "/model_quality_monitoring.py"]

The custom BYOC image is pushed to Amazon ECR using the SageMaker Docker Build CLI:

sm-docker build . --file ./docker/Dockerfile --repository sm-mm-mqm-byoc:1.0

Create a SageMaker Model Monitor schedule

Next, you use the Amazon SageMaker Python SDK to create a model monitoring schedule. You can define the BYOC ECR image created in the previous section as the image_uri parameter.

You can customize the environment variables and arguments passed to the SageMaker Processing job when SageMaker Model Monitor runs the model quality monitoring job. In this example, the ground truth Amazon S3 URI path is passed as an environment variable and is used within the SageMaker Processing job:


sm_mm_mqm = ModelMonitor(
    role=role, 
    image_uri=f"{account_id}.dkr.ecr.us-east-1.amazonaws.com/sm-mm-mqm-byoc:1.0", 
    instance_count=1, 
    instance_type='ml.m5.xlarge', 
    base_job_name="sm-mm-mqm-byoc",
    sagemaker_session=sess,
    env={
        "ground_truth_s3_uri_path": f"s3://{bucket}/{prefix_name}/model-monitor/mqm/ground_truth/{predictor.endpoint_name}"
    }
)

Before you create the schedule, specify the endpoint name, the Amazon S3 URI output location you want to send violation reports to, the statistics and constraints metadata files (if applicable), and any custom arguments you want to pass to your entry script within your BYOC SageMaker Processing job. In this example, the argument --create-violation-tests is passed, which creates a mock violation for demonstration purposes. SageMaker Model Monitor accepts the rest of the parameters and translates them into environment variables, which you can use within your custom monitoring job.

sm_mm_mqm.create_monitoring_schedule(
    endpoint_input=predictor.endpoint_name,
    output=MonitoringOutput(
        source="/opt/ml/processing/output",
        destination=f"s3://{bucket}/{prefix_name}/model-monitor/mqm/reports"
    ),
    statistics=f"s3://{bucket}/{prefix_name}/model-monitor/mqm/baseline-data/statistics.json",
    constraints=f"s3://{bucket}/{prefix_name}/model-monitor/mqm/baseline-data/constraints.json",
    monitor_schedule_name="sm-mm-byoc-batch-inf-schedule",
    schedule_cron_expression=CronExpressionGenerator().hourly(),
    arguments=[
        "--create-violation-tests"
    ]
)

Review the entry point script model_quality_monitoring.py to better understand how to use custom arguments and environment variables provided by the SageMaker Model Monitor job.
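The following is a minimal sketch of how such an entry point script can consume the custom argument and the environment variable defined in the schedule above. The standard variables that SageMaker Model Monitor injects, such as the data capture input and output locations, are shown with assumed names and defaults; confirm them in your own job's environment:

import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("--create-violation-tests", action="store_true")
args, _ = parser.parse_known_args()

# Environment variable set in the ModelMonitor definition above
ground_truth_s3_uri = os.environ["ground_truth_s3_uri_path"]

# Assumed names for variables injected by SageMaker Model Monitor
data_capture_input = os.environ.get("dataset_source", "/opt/ml/processing/input/endpoint")
output_path = os.environ.get("output_path", "/opt/ml/processing/output")

print(args.create_violation_tests, ground_truth_s3_uri, data_capture_input, output_path)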

Observe the SageMaker Model Monitor job output

Now that the SageMaker Model Monitor resource is created, the SageMaker endpoint is invoked.

In this example, a request is provided that includes a list of two payloads for which we want to collect predictions.
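A hypothetical construction of such a request body is shown below: two inference records serialized as one JSON object of column-to-values pairs, which matches the pandas conversion in input_fn. Only a few forest coverage features are shown as placeholders; a real request must include every feature the model was trained on.

import json

records = {
    "Elevation": [2596, 2804],  # record 0 and record 1
    "Aspect": [51, 139],
    "Slope": [3, 9],
    # ... remaining features used during training ...
}
test_records = json.dumps(records)

The serialized test_records body is then passed to invoke_endpoint: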

sm_runtime = boto3.client("sagemaker-runtime")

response = sm_runtime.invoke_endpoint(
    EndpointName=predictor.endpoint_name,
    ContentType="application/json",
    Accept="application/json",
    Body=test_records,
    InferenceId="0"
)

InferenceId is passed as an argument to the invoke_endpoint method. This ID is used downstream when merging the ground truth data to the real-time SageMaker endpoint data capture. In this example, we want to collect ground truth with the following structure.

InferenceId    payload_index    groundTruthLabel
0              0                1
0              1                0

This makes it simpler when merging the ground truth data with real-time data within the SageMaker Model Monitor custom job.
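As a minimal sketch, writing ground truth in that structure might look like the following, reusing the bucket, prefix_name, and predictor variables from earlier steps. The JSON Lines layout and file name are assumptions; the S3 prefix matches the ground_truth_s3_uri_path configured for the monitoring job:

import json
import boto3

s3 = boto3.client("s3")

ground_truth_records = [
    {"InferenceId": "0", "payload_index": 0, "groundTruthLabel": 1},
    {"InferenceId": "0", "payload_index": 1, "groundTruthLabel": 0},
]
body = "\n".join(json.dumps(r) for r in ground_truth_records)

s3.put_object(
    Bucket=bucket,
    Key=f"{prefix_name}/model-monitor/mqm/ground_truth/{predictor.endpoint_name}/ground_truth.jsonl",
    Body=body.encode("utf-8"),
)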

Because we set the SageMaker Model Monitor job to run on an hourly CRON schedule, we can view the results at the end of the hour. In SageMaker Studio Classic, by navigating to the SageMaker endpoint details page, you can choose the Monitoring job history tab to view status reports of the SageMaker Model Monitor job.


If an issue is found, you can choose the monitoring job name to review the report.

In this example, the custom model monitoring metric created in the BYOC flagged an accuracy score violation of -1 (this was done purposely for demonstration with the argument --create-violation-tests).

This gives you the ability to monitor model quality violations for your custom SageMaker Model Monitor job within the SageMaker Studio console. If you want to invoke CloudWatch alarms based on published CloudWatch metrics, you must create these CloudWatch metrics within your BYOC job. You can review how this is done within the model_quality_monitoring.py script. For automated alerts on model monitoring, it’s recommended to create an Amazon Simple Notification Service (Amazon SNS) topic that email user groups can subscribe to for notifications on a given CloudWatch metric alarm.
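For reference, publishing a custom metric from the BYOC job might look like the following minimal sketch. The namespace, metric name, and dimension are illustrative assumptions, and computed_accuracy and endpoint_name stand in for values produced inside the monitoring script:

import boto3

cloudwatch = boto3.client("cloudwatch")

computed_accuracy = 0.87        # placeholder: accuracy computed from data capture and ground truth
endpoint_name = "my-endpoint"   # placeholder: monitored endpoint name

cloudwatch.put_metric_data(
    Namespace="CustomModelMonitor",
    MetricData=[
        {
            "MetricName": "accuracy",
            "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
            "Value": computed_accuracy,
        }
    ],
)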

Clean up

To avoid incurring future charges, delete all resources related to the SageMaker Model Monitor schedule by completing the following steps:

  1. Delete data capture and any ground truth data:
    ! aws s3 rm s3://{bucket}/{prefix_name}/model-monitor/data-capture/{predictor.endpoint_name} --recursive
    ! aws s3 rm s3://{bucket}/{prefix_name}/model-monitor/mqm/ground_truth/{predictor.endpoint_name} --recursive

  2. Delete the monitoring schedule:
    sm_mm_mqm.delete_monitoring_schedule()

  3. Delete the SageMaker model and SageMaker endpoint:
    predictor.delete_model()
    predictor.delete_endpoint()

Conclusion

Custom business or technical requirements for a SageMaker endpoint frequently have an impact on downstream efforts in model monitoring. In this post, we provided a framework that enables you to customize SageMaker Model Monitor jobs (in this case, for monitoring model quality) to handle the use case of passing multiple inference payloads to a SageMaker endpoint.

Explore the provided GitHub repository to implement this customized model monitoring framework with SageMaker Model Monitor. You can use this framework as a starting point to monitor your custom metrics or handle other unique requirements for model quality monitoring in your AI/ML applications.


About the Authors

Joe King is a Sr. Data Scientist at AWS, bringing a breadth of data science, ML engineering, MLOps, and AI/ML architecting to help businesses create scalable solutions on AWS.

Ajay Raghunathan is a Machine Learning Engineer at AWS. His current work focuses on architecting and implementing ML solutions at scale. He is a technology enthusiast and a builder with a core area of interest in AI/ML, data analytics, serverless, and DevOps. Outside of work, he enjoys spending time with family, traveling, and playing football.

Raju Patil is a Sr. Data Scientist with AWS Professional Services. He architects, builds, and deploys AI/ML solutions to help AWS customers across different verticals overcome business challenges in a variety of AI/ML use cases.

Read More

Fintech Leaders Tap Generative AI for Safer, Faster, More Accurate Financial Services

Fintech Leaders Tap Generative AI for Safer, Faster, More Accurate Financial Services

An overwhelming 91% of financial services industry (FSI) companies are either assessing artificial intelligence or already have it in the bag as a tool that’s driving innovation, improving operational efficiency and enhancing customer experiences.

Generative AI — powered by NVIDIA NIM microservices and accelerated computing — can help organizations improve portfolio optimization, fraud detection, customer service and risk management.

Among the companies harnessing these technologies to boost financial services applications are Ntropy, Contextual AI and NayaOne — all members of the NVIDIA Inception program for cutting-edge startups.

And Silicon Valley-based startup Securiti, which offers a centralized, intelligent platform for the safe use of data and generative AI, is using NVIDIA NIM to build an AI-powered copilot for financial services.

At Money20/20, a leading fintech conference running this week in Las Vegas, the companies will demonstrate how their technologies can turn disparate, often complex FSI data into actionable insights and advanced innovation opportunities for banks, fintechs, payment providers and other organizations.

Ntropy Brings Order to Unstructured Financial Data

New York-based Ntropy is helping remove various states of entropy — disorder, randomness or uncertainty — from financial services workflows.

“Whenever money is moved from point A to point B, text is left in bank statements, PDF receipts and other forms of transaction history,” said Naré Vardanyan, cofounder and CEO of Ntropy. “Traditionally, that unstructured data has been very hard to clean up and use for financial applications.”

The company’s transaction enrichment application programming interface (API) standardizes financial data from across different sources and geographies, acting as a common language that can help financial services applications understand any transaction with humanlike accuracy in just milliseconds, at 10,000x lower cost than traditional methods.

It’s built on the Llama 3 NVIDIA NIM microservice and NVIDIA Triton Inference Server running on NVIDIA H100 Tensor Core GPUs. Using the Llama 3 NIM microservice, Ntropy achieved up to 20x better utilization and throughput for its large language models (LLMs) compared with running the native models.

Airbase, a leading procure-to-pay software platform provider, boosts transaction authorization processes using LLMs and the Ntropy data enricher.

At Money20/20, Ntropy will discuss how its API can be used to clean up customers’ merchant data, which boosts fraud detection by improving the accuracy of risk-detection models. This in turn reduces both false transaction declines and revenue loss.

Another demo will highlight how an automated loan agent taps into the Ntropy API to analyze information on a bank’s website and generate a relevant investment report to speed loan dispersal and decision-making processes for users.

Contextual AI Advances Retrieval-Augmented Generation for FSI

Contextual AI — based in Mountain View, California — offers a production-grade AI platform, powered by retrieval-augmented generation (RAG) and ideal for building enterprise AI applications in knowledge-intensive FSI use cases.

“RAG is the answer to delivering enterprise AI into production,” said Douwe Kiela, CEO and cofounder of Contextual AI. “Tapping into NVIDIA technologies and large language models, the Contextual AI RAG 2.0 platform can bring accurate, auditable AI to FSI enterprises looking to optimize operations and offer new generative AI-powered products.”

The Contextual AI platform integrates the entire RAG pipeline — including extraction, retrieval, reranking and generation — into a single optimized system that can be deployed in minutes, and further tuned and specialized based on customer needs, delivering much greater accuracy in context-dependent tasks.

HSBC plans to use Contextual AI to provide research insights and process guidance support through retrieving and synthesizing relevant market outlooks, financial news and operational documents. Other financial organizations are also harnessing Contextual AI’s pre-built applications, including for financial analysis, policy-compliance report generation, financial advice query resolution and more.

For example, a user could ask, “What’s our forecast for central bank rates by Q4 2025?” The Contextual AI platform would provide a brief explanation and an accurate answer grounded in factual documents, including citations to specific sections in the source.

Contextual AI uses NVIDIA Triton Inference Server and the open-source NVIDIA TensorRT-LLM library for accelerating and optimizing LLM inference performance.

NayaOne Provides Digital Sandbox for Financial Services Innovation

London-based NayaOne offers an AI sandbox that allows customers to securely test and validate AI applications prior to commercial deployment. Its technology platform gives financial institutions the ability to create synthetic data and access to a marketplace of hundreds of fintechs.

Customers can use the digital sandbox to benchmark applications for fairness, transparency, accuracy and other compliance measures and to better ensure top performance and successful integration.

“The demand for AI-driven solutions in financial services is accelerating, and our collaboration with NVIDIA allows institutions to harness the power of generative AI in a controlled, secure environment,” said Karan Jain, CEO of NayaOne. “We’re creating an ecosystem where financial institutions can prototype faster and more effectively, leading to real business transformation and growth initiatives.”

Using NVIDIA NIM microservices, NayaOne’s AI Sandbox lets customers explore and experiment with optimized AI models, and take them to deployment more easily. With NVIDIA accelerated computing, NayaOne achieves up to 10x faster processing for the large datasets used in its fraud detection models, at up to 40% lower infrastructure costs compared with running extensive CPU-based models.

The digital sandbox also uses the open-source NVIDIA RAPIDS set of data science and AI libraries to accelerate fraud detection and prevention capabilities in money movement applications. The company will demonstrate its digital sandbox at the NVIDIA AI Pavilion at Money20/20.

Securiti Improves Financial Planning With AI Copilot

Powering a broad range of generative AI applications — including safe enterprise AI copilots and LLM training and tuning — Securiti’s highly flexible Data+AI platform lets users build safe, end-to-end enterprise AI systems.

The company is now building an NVIDIA NIM-powered financial planning assistant. The copilot chatbot accesses diverse financial data while adhering to privacy and entitlement policies to provide context-aware responses to users’ finance-related questions.

“Banks struggle to provide personalized financial advice at scale while maintaining data security, privacy and compliance with regulations,” said Jack Berkowitz, chief data officer at Securiti. “With robust data protection and role-based access for secure, scalable support, Securiti helps build safe AI copilots that offer personalized financial advice tailored to individual goals.”

The chatbot retrieves data from a variety of sources, such as earnings transcripts, client profiles and account balances, and investment research documents. Securiti’s solution safely ingests and prepares it for use with high-performance, NVIDIA-powered LLMs, preserving controls such as access entitlements. Finally, it provides users with customized responses through a simple consumer interface.

Using the Llama 3 70B-Instruct NIM microservice, Securiti optimized the performance of the LLM, while ensuring the safe use of data. The company will demonstrate its generative AI solution at Money20/20.

NIM microservices and Triton Inference Server are available through the NVIDIA AI Enterprise software platform.

Learn more about AI for financial services by joining NVIDIA at Money20/20, running through Wednesday, Oct. 30. 

Explore a new NVIDIA AI workflow for fraud detection.

Read More

Bring Receipts: New NVIDIA AI Workflow Detects Fraudulent Credit Card Transactions

Bring Receipts: New NVIDIA AI Workflow Detects Fraudulent Credit Card Transactions

Financial losses from worldwide credit card transaction fraud are expected to reach $43 billion by 2026.

A new NVIDIA AI workflow for fraud detection running on Amazon Web Services (AWS) can help combat this burgeoning epidemic — using accelerated data processing and advanced algorithms to improve AI’s ability to detect and prevent credit card transaction fraud.

Launched this week at the Money20/20 fintech conference, the workflow enables financial institutions to identify subtle patterns and anomalies in transaction data based on user behavior to improve accuracy and reduce false positives compared with traditional methods.

Users can streamline the migration of their fraud detection workflows from traditional compute to accelerated compute using the NVIDIA AI Enterprise software platform and NVIDIA GPU instances.

Businesses embracing comprehensive machine learning tools and strategies can observe up to an estimated 40% improvement in fraud detection accuracy, boosting their ability to identify and stop fraudsters faster and mitigate harm.

As such, leading financial organizations like American Express and Capital One have been using AI to build proprietary solutions that mitigate fraud and enhance customer protection.

The new NVIDIA workflow accelerates data processing, model training and inference, and demonstrates how these components can be wrapped into a single, easy-to-use software offering, powered by NVIDIA AI.

Currently optimized for credit card transaction fraud, the workflow could be adapted for use cases such as new account fraud, account takeover and money laundering.

Accelerated Computing for Fraud Detection

As AI models expand in size, intricacy and diversity, it’s more important than ever for organizations across industries — including financial services — to harness cost- and energy-efficient computing power.

Traditional data science pipelines lack the necessary compute acceleration to handle the massive volumes of data required to effectively fight fraud amid rapidly growing losses across the industry. Leveraging NVIDIA RAPIDS Accelerator for Apache Spark could help payment companies reduce data processing times and save on their data processing costs.

To efficiently manage large-scale datasets and deliver real-time AI performance with complex AI models, financial institutions are turning to NVIDIA’s AI and accelerated computing platforms.

The use of gradient-boosted decision trees — a type of machine learning algorithm — tapping into libraries such as XGBoost, has long been the standard for fraud detection.

The new NVIDIA AI workflow for fraud detection enhances XGBoost using the NVIDIA RAPIDS suite of AI libraries with graph neural network (GNN) embeddings as additional features to help reduce false positives.

The GNN embeddings are fed into XGBoost to create and train a model that can then be orchestrated with the NVIDIA Morpheus Runtime Core library and NVIDIA Triton Inference Server for real-time inferencing.

The NVIDIA Morpheus framework securely inspects and classifies all incoming data, tagging it with patterns and flagging potentially suspicious activity. NVIDIA Triton Inference Server simplifies inference of all types of AI model deployments in production, while optimizing throughput, latency and utilization.

NVIDIA Morpheus, RAPIDS and Triton Inference Server are available through NVIDIA AI Enterprise.

Leading Financial Services Organizations Adopt AI

At a time when many large North American financial institutions are reporting that online and mobile fraud losses continue to increase, AI is helping to combat this trend.

American Express, which began using AI to fight fraud in 2010, leverages fraud detection algorithms to monitor all customer transactions globally in real time, generating fraud decisions in just milliseconds. Using a combination of advanced algorithms, one of which tapped into the NVIDIA AI platform, American Express enhanced model accuracy, advancing the company’s ability to better fight fraud.

European digital bank bunq uses generative AI and large language models to help detect fraud and money laundering. Its AI-powered transaction-monitoring system achieved nearly 100x faster model training speeds with NVIDIA accelerated computing.

BNY announced in March that it became the first major bank to deploy an NVIDIA DGX SuperPOD with DGX H100 systems, which will help build solutions that support fraud detection and other use cases.

And now, systems integrators, software vendors and cloud service providers can integrate the new NVIDIA AI workflow for fraud detection to boost their financial services applications and help keep customers’ money, identities and digital accounts safe.

Explore the fraud detection NVIDIA AI workflow and read this NVIDIA Technical Blog on supercharging fraud detection with GNNs.

Learn more about AI for fraud detection by visiting the NVIDIA AI Pavilion featuring AWS at Money20/20, running this week in Las Vegas.

Read More