Three Ways to Ride the Flywheel of Cybersecurity AI

Three Ways to Ride the Flywheel of Cybersecurity AI

The business transformations that generative AI brings come with risks that AI itself can help secure in a kind of flywheel of progress.

Companies who were quick to embrace the open internet more than 20 years ago were among the first to reap its benefits and become proficient in modern network security.

Enterprise AI is following a similar pattern today. Organizations pursuing its advances — especially with powerful generative AI capabilities — are applying those learnings to enhance their security.

For those just getting started on this journey, here are ways to address with AI three of the top security threats industry experts have identified for large language models (LLMs).

AI Guardrails Prevent Prompt Injections

Generative AI services are subject to attacks from malicious prompts designed to disrupt the LLM behind it or gain access to its data. As the report cited above notes, “Direct injections overwrite system prompts, while indirect ones manipulate inputs from external sources.”

The best antidote for prompt injections are AI guardrails, built into or placed around LLMs. Like the metal safety barriers and concrete curbs on the road, AI guardrails keep LLM applications on track and on topic.

The industry has delivered and continues to work on solutions in this area. For example, NVIDIA NeMo Guardrails software lets developers protect the trustworthiness, safety and security of generative AI services.

AI Detects and Protects Sensitive Data

The responses LLMs give to prompts can on occasion reveal sensitive information. With multifactor authentication and other best practices, credentials are becoming increasingly complex, widening the scope of what’s considered sensitive data.

To guard against disclosures, all sensitive information should be carefully removed or obscured from AI training data. Given the size of datasets used in training, it’s hard for humans — but easy for AI models — to ensure a data sanitation process is effective.

An AI model trained to detect and obfuscate sensitive information can help safeguard against revealing anything confidential that was inadvertently left in an LLM’s training data.

Using NVIDIA Morpheus, an AI framework for building cybersecurity applications, enterprises can create AI models and accelerated pipelines that find and protect sensitive information on their networks. Morpheus lets AI do what no human using traditional rule-based analytics can: track and analyze the massive data flows on an entire corporate network.

AI Can Help Reinforce Access Control

Finally, hackers may try to use LLMs to get access control over an organization’s assets. So, businesses need to prevent their generative AI services from exceeding their level of authority.

The best defense against this risk is using the best practices of security-by-design. Specifically, grant an LLM the least privileges and continuously evaluate those permissions, so it can only access the tools and data it needs to perform its intended functions. This simple, standard approach is probably all most users need in this case.

However, AI can also assist in providing access controls for LLMs. A separate inline model can be trained to detect privilege escalation by evaluating an LLM’s outputs.

Start the Journey to Cybersecurity AI

No one technique is a silver bullet; security continues to be about evolving measures and countermeasures. Those who do best on that journey make use of the latest tools and technologies.

To secure AI, organizations need to be familiar with it, and the best way to do that is by deploying it in meaningful use cases. NVIDIA and its partners can help with full-stack solutions in AI, cybersecurity and cybersecurity AI.

Looking ahead, AI and cybersecurity will be tightly linked in a kind of virtuous cycle, a flywheel of progress where each makes the other better. Ultimately, users will come to trust it as just another form of automation.

Learn more about NVIDIA’s cybersecurity AI platform and how it’s being put to use. And listen to cybersecurity talks from experts at the NVIDIA AI Summit in October.

Read More

19 New Games to Drop for GeForce NOW in September

19 New Games to Drop for GeForce NOW in September

Fall will be here soon, so leaf it to GeForce NOW to bring the games, with 19 joining the cloud in September.

Get started with the seven games available to stream this week, and a day one PC Game Pass title, Age of Mythology: Retold, from the creators of the award-winning Age of Empires franchise World’s Edge, Forgotten Empires and Xbox Game Studios.

The Open Beta for Call of Duty: Black Ops 6 runs Sept. 6-9, offering everyone a chance to experience game-changing innovations before the title officially launches on Oct. 25. Members can stream the Battle.net and Steam versions of the Open Beta instantly this week on GeForce NOW to jump right into the action.

Where Myths and Heroes Collide

Age of Mythology on GeForce NOW
A vast, mythical world to explore with friends? Say no more…

Age of Mythology: Retold revitalizes the classic real-time strategy game by merging its beloved elements with modern visuals.

Get immersed in a mythical universe, command legendary units and call upon the powers of various gods from the Atlantean, Greek, Egyptian and Norse pantheons. The single-player experience features a 50-mission campaign, including engaging battles and myth exploration in iconic locations like Troy and Midgard. Challenge friends in head-to-head matches or cooperate to take on advanced, AI-powered opponents.

Call upon the gods from the cloud with an Ultimate and Priority membership and stream the game across devices. Games update automatically in the cloud, so members can dive into the action without having to wait.

September Gets Better With New Games

The Casting of Frank Stone on GeForce NOW
Choose your fate.

Catch the storytelling prowess of Supermassive Games in The Casting of Frank Stone, available to stream this week for members. The shadow of Frank Stone looms over Cedar Hills, a town forever altered by his violent past. Delve into the mystery of Cedar Hills alongside an original cast of characters bound together on a twisted journey where nothing is quite as it seems. Every decision shapes the story and impacts the fate of the characters.

In addition, members can look for the following games this week:

  • The Casting of Frank Stone (New release on Steam, Sept. 3)
  • Age of Mythology (New release on Steam and Xbox, available on PC Game Pass, Sept.4 )
  • Sniper Ghost Warrior Contracts  (New release on Epic Games Store, early access Sept. 5)
  • Warhammer 40,000: Space Marine 2 (New release on Steam, early access Sept. 5)
  • Crime Scene Cleaner (Steam)
  • FINAL FANTASY XVI Demo (Epic Games Store)
  • Sins of a Solar Empire II (Steam)

Here’s what members can expect for the rest of September:

  • Frostpunk 2 (New release on Steam and Xbox available  on PC Game Pass, Sept. 17)
  • FINAL FANTASY XVI (New release on Steam and Epic Games Store, Sept. 17)
  • The Plucky Squire (New release on Steam, Sept. 17)
  • Tiny Glade (New release on Steam, Sept. 23)
  • Disney Epic Mickey: Rebrushed (New release on Steam, Sept. 24)
  • Greedfall II: The Dying World (New release on Steam, Sept. 24)
  • Mechabellum ( Steam)
  • Blacksmith Master (New release on Steam, Sept. 26)
  • Breachway (New release on Steam, Sept. 26)
  • REKA (New release on Steam)
  • Test Drive Unlimited Solar Crown (New release on Steam)
  • Rider’s Republic (New release on PC Game Pass, Sept. 11). To begin playing, members need to activate access, and can refer to the help article for instructions.

Additions to August

In addition to the 18 games announced last month, 48 more joined the GeForce NOW library:

  • Prince of Persia: The Lost Crown (Day zero release on Steam, Aug. 8)
  • FINAL FANTASY XVI Demo (New release on Steam, Aug. 19)
  • Black Myth: Wukong (New release on Steam and Epic Games Store, Aug. 20)
  • GIGANTIC: RAMPAGE EDITION (Available on Epic Games Store, free Aug. 22)
  • Skull and Bones (New release on Steam, Aug. 22)
  • Endzone 2 (New release on Steam, Aug. 26)
  • Age of Mythology: Retold (Advanced access on Steam, Xbox, available on PC Game Pass, Aug. 27)
  • Core Keeper (New release on Xbox, available on PC Game Pass, Aug. 27)
  • Alan Wake’s American Nightmare (Xbox, available on Microsoft Store)
  • Car Manufacture (Steam)
  • Cat Quest III (Steam)
  • Commandos 3 – HD Remaster (Xbox, available on Microsoft Store)
  • Cooking Simulator (Xbox, available on PC Game Pass)
  • Crown Trick (Xbox, available on Microsoft Store)
  • Darksiders Genesis (Xbox, available on Microsoft Store)
  • Desperados III (Xbox, available on Microsoft Store)
  • The Dungeon of Naheulbeuk: The Amulet of Chaos (Xbox, available on Microsoft Store)
  • Expeditions: Rome (Xbox, available on Microsoft Store)
  • The Flame in the Flood (Xbox, available on Microsoft Store)
  • FTL: Faster Than Light (Xbox, available on Microsoft Store)
  • Genesis Noir (Xbox, available on PC Game Pass)
  • House Flipper (Xbox, available on PC Game Pass)
  • Into the Breach (Xbox, available on Microsoft Store)
  • Iron Harvest (Xbox, available on Microsoft Store)
  • The Knight Witch (Xbox, available on Microsoft Store)
  • Lightyear Frontier (Xbox, available on PC Game Pass)
  • Medieval Dynasty (Xbox, available on PC Game Pass)
  • Metro Exodus Enhanced Edition (Xbox, available on Microsoft Store)
  • My Time at Portia (Xbox, available on PC Game Pass)
  • Night in the Woods (Xbox, available on Microsoft Store )
  • Offworld Trading Company (Xbox, available on PC Game Pass)
  • Orwell: Keeping an Eye on You (Xbox, available on Microsoft Store)
  • Outlast 2 (Xbox, available on Microsoft Store)
  • Project Winter (Xbox, available on Microsoft Store)
  • Psychonauts (Steam)
  • Psychonauts 2 (Steam and Xbox, available on PC Game Pass)
  • Shadow Tactics: Blades of the Shogun (Xbox, available on Microsoft Store)
  • Sid Meier’s Civilization VI (Steam, Epic Games Store and Xbox, available on the Microsoft store)
  • Sid Meier’s Civilization V (Steam)
  • Sid Meier’s Civilization IV (Steam)
  • Sid Meier’s Civilization: Beyond Earth (Steam)
  • Spirit of the North (Xbox, available on PC Game Pass)
  • SteamWorld Heist II (Steam, Xbox, available on Microsoft Store)
  • Visions of Mana Demo (Steam)
  • This War of Mine (Xbox, available on PC Game Pass)
  • We Were Here Too (Steam)
  • Wreckfest (Xbox, available on PC Game Pass)
  • Yoku’s Island Express (Xbox, available on Microsoft Store)

Breachway was originally included in the August games list, but the launch date was moved to September by the developer. Stay tuned to GFN Thursday for updates.

Starting in October, members will no longer see the option of launching “Epic Games Store” versions of games published by Ubisoft on GeForce NOW.  To play these supported games, members can select the “Ubisoft Connect” option on GeForce NOW and will need to connect their Ubisoft Connect and Epic game store accounts the first time they play the game. Check out more details.

What are you planning to play this weekend? Let us know on X or in the comments below.

Read More

Volvo Cars EX90 SUV Rolls Out, Built on NVIDIA Accelerated Computing and AI

Volvo Cars EX90 SUV Rolls Out, Built on NVIDIA Accelerated Computing and AI

Volvo Cars’ new, fully electric EX90 is making its way from the automaker’s assembly line in Charleston, South Carolina, to dealerships around the U.S.

To ensure its customers benefit from future improvements and advanced safety features and capabilities, the Volvo EX90 is built on the NVIDIA DRIVE Orin system-on-a-chip (SoC), capable of more than 250 trillion operations per second (TOPS).

Running NVIDIA DriveOS, the system delivers high-performance processing in a package that’s literally the size of a postage stamp. This core compute architecture handles all vehicle functions, ranging from enabling safety and driving assistance features to supporting the development of autonomous driving capabilities — all while delivering an excellent user experience.

The state-of-the-art SUV is an intelligent mobile device on wheels, equipped with the automaker’s most advanced sensor suite to date, including radar, lidar, cameras, ultrasonic sensors and more. NVIDIA DRIVE Orin enables real-time, redundant and advanced 360-degree surround-sensor data processing, supporting Volvo Cars’ unwavering commitment to safety.

DRIVE Thor Powering the Next Generation of Volvo Cars

Setting its sights on the future, Volvo Cars also announced plans to migrate to the next-generation NVIDIA DRIVE Thor SoC for its upcoming fleets.

Before the end of the decade, Volvo Cars will move to NVIDIA DRIVE Thor, which boasts 1,000 TOPS —  quadrupling the processing power of a single DRIVE Orin SoC, while improving energy efficiency sevenfold.

The next-generation DRIVE Thor autonomous vehicle processor incorporates the latest NVIDIA Blackwell GPU architecture, helping unlock a new realm of possibilities and capabilities both in and around the car. This advanced platform will facilitate the deployment of safe advanced driver-assistance system (ADAS) and self-driving features — and pave the way for a new era of in-vehicle experiences powered by generative AI.

Highlighting Volvo Cars’ leap to NVIDIA’s next-generation processor, Volvo Cars CEO Jim Rowan noted, “With NVIDIA DRIVE Thor in our future cars, our in-house developed software becomes more scalable across our product lineup, and it helps us to continue to improve the safety in our cars, deliver best-in-class customer experiences — and increase our margins.”

Zenseact Strategic Investment in NVIDIA Technology

Volvo Cars and its software subsidiary, Zenseact, are also investing in NVIDIA DGX systems for AI model training in the cloud, helping ensure that future fleets are equipped with the most advanced and well-tested AI-powered safety features.

Managing the massive amount of data needed to safely train the next generation of AI-enabled vehicles demands data-center-level compute and infrastructure.

NVIDIA DGX systems provide the computational performance essential for training AI models with unprecedented efficiency. Transportation companies use them to speed autonomous technology development in a cost-effective, enterprise-ready and easy-to-deploy way.

Volvo Cars and Zenseact’s AI training hub, based in the Nordics, will use the systems to help catalyze multiple facets of ADAS and autonomous driving software development. A key benefit is the optimization of the data annotation process — a traditionally time-consuming task involving the identification and labeling of objects for classification and recognition.

The cluster of DGX systems will also enable processing of the required data for safety assurance, delivering twice the performance and potentially halving time to market.

“The NVIDIA DGX AI supercomputer will supercharge our AI training capabilities, making this in-house AI training center one of the largest in the Nordics,” said Anders Bell, chief engineering and technology officer at Volvo Cars. “By leveraging NVIDIA technology and setting up the data center, we pave a quick path to high-performing AI, ultimately helping make our products safer and better.”

With NVIDIA technology as the AI brain inside the car and in the cloud, Volvo Cars and Zenseact can deliver safe vehicles that allow customers to drive with peace of mind, wherever the road may lead.

Read More

Deploy Amazon SageMaker pipelines using AWS Controllers for Kubernetes

Deploy Amazon SageMaker pipelines using AWS Controllers for Kubernetes

Kubernetes is a popular orchestration platform for managing containers. Its scalability and load-balancing capabilities make it ideal for handling the variable workloads typical of machine learning (ML) applications. DevOps engineers often use Kubernetes to manage and scale ML applications, but before an ML model is available, it must be trained and evaluated and, if the quality of the obtained model is satisfactory, uploaded to a model registry.

Amazon SageMaker provides capabilities to remove the undifferentiated heavy lifting of building and deploying ML models. SageMaker simplifies the process of managing dependencies, container images, auto scaling, and monitoring. Specifically for the model building stage, Amazon SageMaker Pipelines automates the process by managing the infrastructure and resources needed to process data, train models, and run evaluation tests.

A challenge for DevOps engineers is the additional complexity that comes from using Kubernetes to manage the deployment stage while resorting to other tools (such as the AWS SDK or AWS CloudFormation) to manage the model building pipeline. One alternative to simplify this process is to use AWS Controllers for Kubernetes (ACK) to manage and deploy a SageMaker training pipeline. ACK allows you to take advantage of managed model building pipelines without needing to define resources outside of the Kubernetes cluster.

In this post, we introduce an example to help DevOps engineers manage the entire ML lifecycle—including training and inference—using the same toolkit.

Solution overview

We consider a use case in which an ML engineer configures a SageMaker model building pipeline using a Jupyter notebook. This configuration takes the form of a Directed Acyclic Graph (DAG) represented as a JSON pipeline definition. The JSON document can be stored and versioned in an Amazon Simple Storage Service (Amazon S3) bucket. If encryption is required, it can be implemented using an AWS Key Management Service (AWS KMS) managed key for Amazon S3. A DevOps engineer with access to fetch this definition file from Amazon S3 can load the pipeline definition into an ACK service controller for SageMaker, which is running as part of an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. The DevOps engineer can then use the Kubernetes APIs provided by ACK to submit the pipeline definition and initiate one or more pipeline runs in SageMaker. This entire workflow is shown in the following solution diagram.

architecture

Prerequisites

To follow along, you should have the following prerequisites:

  • An EKS cluster where the ML pipeline will be created.
  • A user with access to an AWS Identity and Access Management (IAM) role that has IAM permissions (iam:CreateRole, iam:AttachRolePolicy, and iam:PutRolePolicy) to allow creating roles and attaching policies to roles.
  • The following command line tools on the local machine or cloud-based development environment used to access the Kubernetes cluster:

Install the SageMaker ACK service controller

The SageMaker ACK service controller makes it straightforward for DevOps engineers to use Kubernetes as their control plane to create and manage ML pipelines. To install the controller in your EKS cluster, complete the following steps:

  1. Configure IAM permissions to make sure the controller has access to the appropriate AWS resources.
  2. Install the controller using a SageMaker Helm Chart to make it available on the client machine.

The following tutorial provides step-by-step instructions with the required commands to install the ACK service controller for SageMaker.

Generate a pipeline JSON definition

In most companies, ML engineers are responsible for creating the ML pipeline in their organization. They often work with DevOps engineers to operate those pipelines. In SageMaker, ML engineers can use the SageMaker Python SDK to generate a pipeline definition in JSON format. A SageMaker pipeline definition must follow the provided schema, which includes base images, dependencies, steps, and instance types and sizes that are needed to fully define the pipeline. This definition then gets retrieved by the DevOps engineer for deploying and maintaining the infrastructure needed for the pipeline.

The following is a sample pipeline definition with one training step:

{
  "Version": "2020-12-01",
  "Steps": [
  {
    "Name": "AbaloneTrain",
    "Type": "Training",
    "Arguments": {
      "RoleArn": "<<YOUR_SAGEMAKER_ROLE_ARN>>",
      "HyperParameters": {
        "max_depth": "5",
        "gamma": "4",
        "eta": "0.2",
        "min_child_weight": "6",
        "objective": "multi:softmax",
        "num_class": "10",
        "num_round": "10"
     },
     "AlgorithmSpecification": {
     "TrainingImage": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.7-1",
     "TrainingInputMode": "File"
   },
   "OutputDataConfig": {
     "S3OutputPath": "s3://<<YOUR_BUCKET_NAME>>/sagemaker/"
   },
   "ResourceConfig": {
     "InstanceCount": 1,
     "InstanceType": "ml.m4.xlarge",
     "VolumeSizeInGB": 5
   },
   "StoppingCondition": {
     "MaxRuntimeInSeconds": 86400
   },
   "InputDataConfig": [
   {
     "ChannelName": "train",
     "DataSource": {
       "S3DataSource": {
         "S3DataType": "S3Prefix",
         "S3Uri": "s3://<<YOUR_BUCKET_NAME>>/sagemaker/xgboost/train/",
         "S3DataDistributionType": "
       }
     },
     "ContentType": "text/libsvm"
   },
   {
     "ChannelName": "validation",
     "DataSource": {
       "S3DataSource": {
         "S3DataType": "S3Prefix",
         "S3Uri": "s3://<<YOUR_BUCKET_NAME>>/sagemaker/xgboost/validation/",
         "S3DataDistributionType": "FullyReplicated"
       }
     },
     "ContentType": "text/libsvm"
   }]
  }
 }]
}

With SageMaker, ML model artifacts and other system artifacts are encrypted in transit and at rest. SageMaker encrypts these by default using the AWS managed key for Amazon S3. You can optionally specify a custom key using the KmsKeyId property of the OutputDataConfig argument. For more information on how SageMaker protects data, see Data Protection in Amazon SageMaker.

Furthermore, we recommend securing access to the pipeline artifacts, such as model outputs and training data, to a specific set of IAM roles created for data scientists and ML engineers. This can be achieved by attaching an appropriate bucket policy. For more information on best practices for securing data in Amazon S3, see Top 10 security best practices for securing data in Amazon S3.

Create and submit a pipeline YAML specification

In the Kubernetes world, objects are the persistent entities in the Kubernetes cluster used to represent the state of your cluster. When you create an object in Kubernetes, you must provide the object specification that describes its desired state, as well as some basic information about the object (such as a name). Then, using tools such as kubectl, you provide the information in a manifest file in YAML (or JSON) format to communicate with the Kubernetes API.

Refer to the following Kubernetes YAML specification for a SageMaker pipeline. DevOps engineers need to modify the .spec.pipelineDefinition key in the file and add the ML engineer-provided pipeline JSON definition. They then prepare and submit a separate pipeline execution YAML specification to run the pipeline in SageMaker. There are two ways to submit a pipeline YAML specification:

  • Pass the pipeline definition inline as a JSON object to the pipeline YAML specification.
  • Convert the JSON pipeline definition into String format using the command line utility jq. For example, you can use the following command to convert the pipeline definition to a JSON-encoded string:
jq -r tojson <pipeline-definition.json>

In this post, we use the first option and prepare the YAML specification (my-pipeline.yaml) as follows:

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: Pipeline
metadata:
  name: my-kubernetes-pipeline
spec:
  parallelismConfiguration:
  	maxParallelExecutionSteps: 2
  pipelineName: my-kubernetes-pipeline
  pipelineDefinition: |
  {
    "Version": "2020-12-01",
    "Steps": [
    {
      "Name": "AbaloneTrain",
      "Type": "Training",
      "Arguments": {
        "RoleArn": "<<YOUR_SAGEMAKER_ROLE_ARN>>",
        "HyperParameters": {
          "max_depth": "5",
          "gamma": "4",
          "eta": "0.2",
          "min_child_weight": "6",
          "objective": "multi:softmax",
          "num_class": "10",
          "num_round": "30"
        },
        "AlgorithmSpecification": {
          "TrainingImage": "683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-xgboost:1.7-1",
          "TrainingInputMode": "File"
        },
        "OutputDataConfig": {
          "S3OutputPath": "s3://<<YOUR_S3_BUCKET>>/sagemaker/"
        },
        "ResourceConfig": {
          "InstanceCount": 1,
          "InstanceType": "ml.m4.xlarge",
          "VolumeSizeInGB": 5
        },
        "StoppingCondition": {
          "MaxRuntimeInSeconds": 86400
        },
        "InputDataConfig": [
        {
          "ChannelName": "train",
          "DataSource": {
            "S3DataSource": {
              "S3DataType": "S3Prefix",
              "S3Uri": "s3://<<YOUR_S3_BUCKET>>/sagemaker/xgboost/train/",
              "S3DataDistributionType": "FullyReplicated"
            }
          },
          "ContentType": "text/libsvm"
        },
        {
          "ChannelName": "validation",
          "DataSource": {
            "S3DataSource": {
              "S3DataType": "S3Prefix",
              "S3Uri": "s3://<<YOUR_S3_BUCKET>>/sagemaker/xgboost/validation/",
              "S3DataDistributionType": "FullyReplicated"
            }
          },
          "ContentType": "text/libsvm"
        }
      ]
    }
  }
]}
pipelineDisplayName: my-kubernetes-pipeline
roleARN: <<YOUR_SAGEMAKER_ROLE_ARN>>

Submit the pipeline to SageMaker

To submit your prepared pipeline specification, apply the specification to your Kubernetes cluster as follows:

kubectl apply -f my-pipeline.yaml

Create and submit a pipeline execution YAML specification

Refer to the following Kubernetes YAML specification for a SageMaker pipeline. Prepare the pipeline execution YAML specification (pipeline-execution.yaml) as follows:

apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: PipelineExecution
metadata:
  name: my-kubernetes-pipeline-execution
spec:
  parallelismConfiguration:
  	maxParallelExecutionSteps: 2
  pipelineExecutionDescription: "My first pipeline execution via Amazon EKS cluster."
  pipelineName: my-kubernetes-pipeline

To start a run of the pipeline, use the following code:

kubectl apply -f pipeline-execution.yaml

Review and troubleshoot the pipeline run

To list all pipelines created using the ACK controller, use the following command:

kubectl get pipeline

To list all pipeline runs, use the following command:

kubectl get pipelineexecution

To get more details about the pipeline after it’s submitted, like checking the status, errors, or parameters of the pipeline, use the following command:

kubectl describe pipeline my-kubernetes-pipeline

To troubleshoot a pipeline run by reviewing more details about the run, use the following command:

kubectl describe pipelineexecution my-kubernetes-pipeline-execution

Clean up

Use the following command to delete any pipelines you created:

kubectl delete pipeline

Use the following command to cancel any pipeline runs you started:

kubectl delete pipelineexecution

Conclusion

In this post, we presented an example of how ML engineers familiar with Jupyter notebooks and SageMaker environments can efficiently work with DevOps engineers familiar with Kubernetes and related tools to design and maintain an ML pipeline with the right infrastructure for their organization. This enables DevOps engineers to manage all the steps of the ML lifecycle with the same set of tools and environment they are used to, which enables organizations to innovate faster and more efficiently.

Explore the GitHub repository for ACK and the SageMaker controller to start managing your ML operations with Kubernetes.


About the Authors

Pratik Yeole is a Senior Solutions Architect working with global customers, helping customers build value-driven solutions on AWS. He has expertise in MLOps and containers domains. Outside of work, he enjoys time with friends, family, music, and cricket.

Felipe Lopez is a Senior AI/ML Specialist Solutions Architect at AWS. Prior to joining AWS, Felipe worked with GE Digital and SLB, where he focused on modeling and optimization products for industrial applications.

Read More

Effectively manage foundation models for generative AI applications with Amazon SageMaker Model Registry

Effectively manage foundation models for generative AI applications with Amazon SageMaker Model Registry

Generative artificial intelligence (AI) foundation models (FMs) are gaining popularity with businesses due to their versatility and potential to address a variety of use cases. The true value of FMs is realized when they are adapted for domain specific data. Managing these models across the business and model lifecycle can introduce complexity. As FMs are adapted to different domains and data, operationalizing these pipelines becomes critical.

Amazon SageMaker, a fully managed service to build, train, and deploy machine learning (ML) models, has seen increased adoption to customize and deploy FMs that power generative AI applications. SageMaker provides rich features to build automated workflows for deploying models at scale. One of the key features that enables operational excellence around model management is the Model Registry. Model Registry helps catalog and manage model versions and facilitates collaboration and governance. When a model is trained and evaluated for performance, it can be stored in the Model Registry for model management.

Amazon SageMaker has released new features in Model Registry that make it easy to version and catalog FMs. Customers can use SageMaker to train or tune FMs, including Amazon SageMaker JumpStart and Amazon Bedrock models, and also manage these models within Model Registry. As customers begin to scale generative AI applications across various use cases such as fine-tuning for domain-specific tasks, the number of models can quickly grow. To keep track of models, versions, and associated metadata, SageMaker Model Registry can be used as an inventory of models.

In this post, we explore the new features of Model Registry that streamline FM management: you can now register unzipped model artifacts and pass an End User License Agreement (EULA) acceptance flag without needing users to intervene.

Overview

Model Registry has worked well for traditional models, which are smaller in size. For FMs, there were challenges because of their size and requirements for user intervention for EULA acceptance. With the new features in Model Registry, it’s become easier to register a fine-tuned FM within Model Registry, which then can be deployed for actual use.

A typical model development lifecycle is an iterative process. We conduct many experimentation cycles to achieve expected performance from the model. Once trained, these models can be registered in the Model Registry where they are cataloged as versions. The models can be organized in groups, the versions can be compared for their quality metrics, and models can have an associated approval status indicating if its deployable.

Once the model is manually approved, a continuous integration and continuous deployment (CI/CD) pipeline can be triggered to deploy these models to production. Optionally, Model Registry can be used as a repository of models that are approved for use by an enterprise. Various teams can then deploy these approved models from Model Registry and build applications around it.

An example workflow could follow these steps and is shown in the following diagram:

  1. Select a SageMaker JumpStart model and register it in Model Registry
  2. Alternatively, you can fine-tune a SageMaker JumpStart model
  3. Evaluate the model with SageMaker model evaluation. SageMaker allows for human evaluation if desired.
  4. Create a model group in the Model Registry. For each run, create a model version. Add your model group into one or more Model Registry Collections, which can be used to group registered models that are related to each other. For example, you could have a collection of large language models (LLMs) and another collection of diffusion models.
  5. Deploy the models as SageMaker Inference endpoints that can be consumed by generative AI applications.

Model Registry workflow for foundation modelsFigure 1: Model Registry workflow for foundation models

To better support generative AI applications, Model Registry released two new features: ModelDataSource, and source model URI. The following sections will explore these features and how to use them.

ModelDataSource speeds up deployment and provides access to EULA dependent models

Until now, model artifacts had to be stored along with the inference code when a model gets registered in Model Registry in a compressed format. This posed challenges for generative AI applications where FMs are of very large size with billions of parameters. The large size of FMs when stored as zipped models was causing increased latency with SageMaker endpoint startup time because decompressing these models at run time took very long. The model_data_source parameter can now accept the location of the unzipped model artifacts in Amazon Simple Storage Service (Amazon S3) making the registration process simple. This also eliminates the need for endpoints to unzip the model weights, leading to reduced latency during endpoint startup times.

Additionally, public JumpStart models and certain FMs from independent service providers, such as LLAMA2, require that their EULA must be accepted prior to using the models. Thus, when public models from SageMaker JumpStart were tuned, they could not be stored in the Model Registry because a user needed to accept the license agreement. Model Registry added a new feature: EULA acceptance flag support within the model_data_source parameter, allowing the registration of such models. Now customers can catalog, version, associate metadata such as training metrics, and more in Model Registry for a wider variety of FMs.

Register unzipped models stored in Amazon S3 using the AWS SDK.

model_data_source = {
               "S3DataSource": {
                      "S3Uri": "s3://bucket/model/prefix/", 
                      "S3DataType": "S3Prefix",          
                      "CompressionType": "None",            
                      "ModelAccessConfig": {                 
                           "AcceptEula": true
                       },
                 }
}
model = Model(       
               sagemaker_session=sagemaker_session,        
               image_uri=IMAGE_URI,      
               model_data=model_data_source
)
model.register()

Register models requiring a EULA.

from sagemaker.jumpstart.model importJumpStartModel
model_id = "meta-textgeneration-llama-2-7b"
my_model = JumpStartModel(model_id=model_id)
registered_model =my_model.register(accept_eula=True)
predictor = registered_model.deploy()

Source model URI provides simplified registration and proprietary model support

Model Registry now supports automatic population of inference specification files for some recognized model IDs, including select AWS Marketplace models, hosted models, or versioned model packages in Model Registry. Because of SourceModelURI’s support for automatic population, you can register proprietary JumpStart models from providers such as AI21 labs, Cohere, and LightOn without needing the inference specification file, allowing your organization to use a broader set of FMs in Model Registry.

Previously, to register a trained model in the SageMaker Model Registry, you had to provide the complete inference specification required for deployment, including an Amazon Elastic Container Registry (Amazon ECR) image and the trained model file. With the launch of source_uri support, SageMaker has made it easy for users to register any model by providing a source model URI, which is a free form field that stores model ID or location to a proprietary JumpStart and Bedrock model ID, S3 location, and MLflow model ID. Rather than having to supply the details required for deploying to SageMaker hosting at the time of registrations, you can add the artifacts later on. After registration, to deploy a model, you can package the model an inference specification and update Model Registry accordingly.

For example, you can register a model in Model Registry with a model Amazon Resource Name (ARN) SourceURI.

model_arn = "<arn of the model to be registered>"
registered_model_package = model.register(        
        model_package_group_name="model_group_name",
        source_uri=model_arn
)

Later, you can update the registered model with the inference specification, making it deployable on SageMaker.

model_package = sagemaker_session.sagemaker_client.create_model_package( 
        ModelPackageGroupName="model_group_name", 
        SourceUri="source_uri"
)
mp = ModelPackage(        
       role=get_execution_role(sagemaker_session),
       model_package_arn=model_package["ModelPackageArn"],
       sagemaker_session=sagemaker_session
)
mp.update_inference_specification(image_uris=["ecr_image_uri"])

Register an Amazon JumpStart proprietary FM.

from sagemaker.jumpstart.model import JumpStartModel
model_id = "ai21-contextual-answers"
my_model = JumpStartModel(
           model_id=model_id
)
model_package = my_model.register()

Conclusion

As organizations continue to adopt generative AI in different parts of their business, having robust model management and versioning becomes paramount. With Model Registry, you can achieve version control, tracking, collaboration, lifecycle management, and governance of FMs.

In this post, we explored how Model Registry can now more effectively support managing generative AI models across the model lifecycle, empowering you to better govern and adopt generative AI to achieve transformational outcomes.

To learn more about Model Registry, see Register and Deploy Models with Model Registry. To get started, visit the SageMaker console.


About the Authors

Chaitra Mathur serves as a Principal Solutions Architect at AWS, where her role involves advising clients on building robust, scalable, and secure solutions on AWS. With a keen interest in data and ML, she assists clients in leveraging AWS AI/ML and generative AI services to address their ML requirements effectively. Throughout her career, she has shared her expertise at numerous conferences and has authored several blog posts in the ML area.

Kait Healy is a Solutions Architect II at AWS. She specializes in working with startups and enterprise automotive customers, where she has experience building AI/ML solutions at scale to drive key business outcomes.

Saumitra Vikaram is a Senior Software Engineer at AWS. He is focused on AI/ML technology, ML model management, ML governance, and MLOps to improve overall organizational efficiency and productivity.

Siamak Nariman is a Senior Product Manager at AWS. He is focused on AI/ML technology, ML model management, and ML governance to improve overall organizational efficiency and productivity. He has extensive experience automating processes and deploying various technologies

Read More

Build an ecommerce product recommendation chatbot with Amazon Bedrock Agents

Build an ecommerce product recommendation chatbot with Amazon Bedrock Agents

Many ecommerce applications want to provide their users with a human-like chatbot that guides them to choose the best product as a gift for their loved ones or friends. To enhance the customer experience, the chatbot need to engage in a natural, conversational manner to understand the user’s preferences and requirements, such as the recipient’s gender, the occasion for the gift, and the desired product category. Based on the discussion with the user, the chatbot should be able to query the ecommerce product catalog, filter the results, and recommend the most suitable products.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Amazon Bedrock Agents is a feature that enables generative AI applications to run multistep tasks across company systems and data sources. In this post, we show you how to build an ecommerce product recommendation chatbot using Amazon Bedrock Agents and FMs available in Amazon Bedrock.

Solution overview

Traditional rule-based chatbots often struggle to handle the nuances and complexities of open-ended conversations, leading to frustrating experiences for users. Furthermore, manually coding all the possible conversation flows and product filtering logic is time-consuming and error-prone, especially as the product catalog grows.

To address this challenge, you need a solution that uses the latest advancements in generative AI to create a natural conversational experience. The solution should seamlessly integrate with your existing product catalog API and dynamically adapt the conversation flow based on the user’s responses, reducing the need for extensive coding.

With Amazon Bedrock Agents, you can build intelligent chatbots that can converse naturally with users, understand their preferences, and efficiently retrieve and recommend the most relevant products from the catalog. Amazon Bedrock Agents simplifies the process of building and deploying generative AI models, enabling businesses to create engaging and personalized conversational experiences without the need for extensive machine learning (ML) expertise.

For our use case, we create a recommender chatbot using Amazon Bedrock Agents that prompts users to describe who they want to buy the gift for and the relevant occasion. The agent queries the product information stored in an Amazon DynamoDB table, using an API implemented as an AWS Lambda function. The agent adapts the API inputs to filter products based on its discussion with the user, for example gender, occasion, and category. After obtaining the user’s gift preferences by asking clarifying questions, the agent responds with the most relevant products that are available in the DynamoDB table based on user preferences.

The following diagram illustrates the solution architecture.

ecommerce recommender chatbot architecture

As shown in the preceding diagram, the ecommerce application first uses the agent to drive the conversation with users and generate product recommendations. The agent uses an API backed by Lambda to get product information. Lastly, the Lambda function looks up product data from DynamoDB.

Prerequisites

You need to have an AWS account with a user or role that has at minimum the following AWS Identity and Access Management (IAM) policies and permissions:

  • AWS managed policies:
    • AmazonBedrockFullAccess
    • AWSMarketplaceManageSubscriptions
    • AWSLambda_ReadOnlyAccess
    • AmazonDynamoDBReadOnlyAccess
  • IAM actions:
    • iam:CreateRole
    • iam:CreatePolicy
    • iam:AttachRolePolicy

Deploy the solution resources with AWS CloudFormation

Before you create your agent, you need to set up the product database and API. We use an AWS CloudFormation template to create a DynamoDB table to store product information and a Lambda function to serve as the API for retrieving product details.

At the time of writing this post, you can use any of the following AWS Regions to deploy the solution: US East (N. Virginia), US West (Oregon), Asia Pacific (Mumbai, Sydney), Europe (Frankfurt, Paris), Canada (Central), or South America (São Paulo). Visit Supported regions and models for Amazon Bedrock Agents for updates.

To deploy the template, choose Launch Stack:

Launch Stack to create solution resources

This template creates a DynamoDB table named Products with the following attributes: product_name (partition key), category, gender, and occasion. It also defines a global secondary index (GSI) for each of these attributes to enable efficient querying.

Additionally, the template sets up a Lambda function named GetProductDetailsFunction that acts as an API for retrieving product details, This Lambda function accepts query parameters such as category, gender, and occasion. It constructs a filter expression based on the provided parameters and scans the DynamoDB table to retrieve matching products. If no parameters are provided, it retrieves all the products in the table and returns the first 100 products.

The template also creates another Lambda function called PopulateProductsTableFunction that generates sample data to store in the Products table. The CloudFormation template includes a custom resource that will run the PopulateProductsTableFunction function one time as part of the template deployment, to add 100 sample product entries in the products DynamoDB table, with various combinations of product names, descriptions, categories, genders, and occasions.

You can optionally update the sample product entries or replace it with your own product data. To do so, open the DynamoDB console, choose Explore items, and select the Products table. Choose Scan and choose Run to view and edit the current items or choose Create item to add a new item. If your data has different attributes than the sample product entries, you need to adjust the code of the Lambda function GetProductDetailsFunction, the OpenAPI schema, and the instructions for the agent that are used in the following section.

Create the agent

Now that you have the infrastructure in place, you can create the agent. The first step is to request model access.

  1. On the Amazon Bedrock console, choose Model access in the navigation pane.
  2. Choose Enable specific models.

Model Access Enable specific model

  1. Select the model you need access to (for this post, we select Claude 3 Sonnet).

edit model access page and select claude 3 sonnet

Wait for the model access status to change to Access granted.

Model access granted

Now you can create your agent. We use a CloudFormation template to create the agent and the action group that will invoke the Lambda function.

  1. To deploy the template, choose Launch Stack:

Launch Stack to create Agent

Now you can check the details of the agent that was created by the stack.

  1. On the Amazon Bedrock console, choose Agents under Builder tools in the navigation pane.
  2. Choose the agent product-recommendation-agent, then choose Edit in Agent Builder.
  3. The Instructions for the Agent section includes a set of instructions that guides the agent in how to communicate with the user and use the API. You can adjust the instructions based on different use cases and business scenarios as well as the available APIs.

The agent’s primary goal is to engage in a conversation with the user to gather information about the recipient’s gender, the occasion for the gift, and the desired category. Based on this information, the agent will query the Lambda function to retrieve and recommend suitable products.

Your next step is to check the action group that enables the agent to invoke the Lambda function.

  1. In the Action groups section, choose the Get-Product-Recommendations action group.

You can see the GetProductDetailsFunction Lambda function is selected in the Action group invocation section.

action group invocation details

In the Action group schema section, you can see the OpenAPI schema, which enables the agent to understand the description, inputs, outputs, and the actions of the API that it can use during the conversation with the user.

action group schema

Now you can use the Test Agent pane to have conversations with the chatbot.

Test the chatbot

The following screenshots show example conversations, with the chatbot recommending products after calling the API.

Agent Test sample for a gift for brother graduation

Agent Test sample for a gift for wife in valentine's day

In the sample conversation, the chatbot asks relevant questions to determine the gift recipient’s gender, the occasion, and the desired category. After it has gathered enough information, it queries the API and presents a list of recommended products matching the user’s preferences.

You can see the rationale for each response by choosing Show trace. The following screenshots show how the agent decided to use different API filters based on the discussion.

show trace and rationale

Another show trace and rationale

You can see in the rationale field how the agent made its decision for each interaction. This trace data can help you understand the reasons behind a recommendation. Logging this information can be beneficial for future refinements of your agent’s recommendations.

Clean up

Complete the following steps to clean up your resources:

  1. On the AWS CloudFormation console, delete the stack AgentStack.
  2. Then delete the stack Productstableandapi.

Conclusion

This post showed you how to use Amazon Bedrock Agents to create a conversational chatbot that can assist users in finding the perfect gift. The chatbot intelligently gathers user preferences, queries a backend API to retrieve relevant product details, and presents its recommendations to the user. This approach demonstrates the power of Agents for Amazon Bedrock in building engaging and context-aware conversational experiences.

We recommend you follow best practices while using Amazon Bedrock Agents. For instance, using AWS CloudFormation to create and configure the agent allows you to minimize human error and recreate the agent across different environments and Regions. Also, automating your agent testing using a set of golden questions and their expected answers enables you to test the quality of the instructions for the agent and compare the outputs of the different models on Amazon Bedrock in relation to your use case.

Visit Amazon Bedrock Agents to learn more about features and details.


About the Author

Mahmoud Salaheldin is a Senior Solutions Architect in AWS, working with customers in the Middle East, North Africa, and Turkey, where he helps enterprises, digital-centered businesses, and independent software vendors innovate new products that can enhance their customer experience and increase their business efficiency. He is a generative AI ambassador as well as a containers community member. He lives in Dubai, United Arab Emirates, and enjoys riding motorcycles and traveling.

Read More

How Thomson Reuters Labs achieved AI/ML innovation at pace with AWS MLOps services

How Thomson Reuters Labs achieved AI/ML innovation at pace with AWS MLOps services

This post is co-written by Danilo Tommasina and Andrei Voinov from Thomson Reuters.

Thomson Reuters (TR) is one of the world’s most trusted information organizations for businesses and professionals. TR provides companies with the intelligence, technology, and human expertise they need to find trusted answers, enabling them to make better decisions more quickly. TR’s customers span the financial, risk, legal, tax, accounting, and media markets.

Thomson Reuters Labs (TR Labs) is the dedicated applied research division within TR. TR Labs is focused on the research, development, and application of artificial intelligence (AI) and emerging trends in technologies that can be infused into existing TR products or new offerings. TR Labs works collaboratively with various product teams to experiment, prototype, test, and deliver AI-powered innovation in pursuit of smarter and more valuable tools for our customers. The TR Labs team includes over 150 applied scientists, machine learning specialists, and machine learning engineers.

In this post, we explore how TR Labs was able to develop an efficient, flexible, and powerful MLOps process by adopting a standardized MLOps framework that uses AWS SageMaker, SageMaker Experiments, SageMaker Model Registry, and SageMaker Pipelines. The goal being to accelerate how quickly teams can experiment and innovate using AI and machine learning (ML)—whether using natural language processing (NLP), generative AI, or other techniques. We discuss how this has helped decrease the time to market for fresh ideas and helped build a cost-efficient machine learning lifecycle. Lastly, we will go through the MLOps toolchain that TR Labs built to standardize the MLOps process for developers, scientists, and engineers.

The challenge

Machine learning operations (MLOps) is the intersection of people, for gaining business value from machine learning. An MLOps practice is essential for an organization with large teams of ML engineers and data scientists. Correctly using AI/ML tools to increase productivity directly influences efficiency and cost of development. TR Labs was founded in 1992 with a vision to be a world-leading AI/ML research and development practice, forming the core innovation team that works alongside the tax, legal and news divisions of TR to ensure that their offerings remain at the cutting edge of their markets.

The TR Labs team started off as a small team in its early days, with a team directive to spearhead ML innovation to help the company in various domains including but not limited to text summarization, document categorization, and various other NLP tasks. The team made remarkable progress from an early stage with AI/ML models being integrated into TR’s products and internal editorial systems to help with efficiency and productivity.

However, as the company grew, so did the team’s size and task complexity. The team had grown to over 100 people, and they were facing new challenges. Model development and training processes were becoming increasingly complex and challenging to manage. The team had different members working on different use cases, and therefore, models. Each researcher also had their own way of developing the models. This led to a situation where there was little standardization in the process for model development. Each researcher needed to configure all the underlying resources manually, and a large amount of boilerplate code was created in parallel by different teams. A significant portion of time was spent on tasks that could be performed more efficiently.

The TR Labs leadership recognized that the existing MLOps process wasn’t scalable and needed to be standardized. It lacked sufficient automation and assistance for those new to the platform. The idea was to take well architected practices for ML model development and operations and create a customized workflow specific to Labs that uses Amazon Web Services (AWS). The vision was to harmonize and simplify the model development process and accelerate the pace of innovation. They also aimed to set the path to quickly mature research and development solutions into an operational state that would support a high degree of automation for monitoring and retraining.

In this post, we will focus on the MLOps process parts involved in the research and model development phases.

The overview section will take you through the innovative solution that TR Labs created and how it helped lower the barrier to entry while increasing the adoption of AI/ML services for new ML users on AWS while decreasing time to market for new projects.

Solution overview

The existing ML workflow required a TR user to start from scratch every time they started a new project. Research teams would have to familiarize themselves with the TR Labs standards and deploy and configure the entire MLOps toolchain manually with little automation in place. Inconsistent practices within the research community meant extra work was needed to align with production grade deployments. Many research projects had to be refactored when handing code over to MLOps engineers, who often had to reverse engineer to achieve a similar level of functionality to make the code ready to deploy to production. The team had to create an environment where researchers and engineers worked on one shared codebase and use the same toolchain, reducing the friction between experimentation and production stages. A shared codebase is also a key element for long term maintenance—changes to the existing system should be integrated directly in the production level code and not reverse engineered and re-merged out of a research repository into the production codebase. This is an anti-pattern that leads to large costs and risks over time.

Regardless of the chosen model architecture, or even if the chosen model is a third-party provider for large language models (LLMs) without any fine tuning, a robust ML system requires validation on a relevant dataset. There are multiple testing methods, such as zero-shot learning, a machine learning technique that allows a model to classify objects from previously unseen classes, without receiving any specific training for those classes, with a transition to later introduce fine tuning to improve the model’s performance. How many iterations are necessary to obtain the expected initial quality and maintain or even improve the level over time depends on the use case and the model type being developed. However, when thinking about long-term systems, teams go through tens or even hundreds of repetitions. These repetitions will contain several recurring steps such as pre-processing, training, and post processing, which are similar, if not the same, no matter which approach is taken. Repeating the process manually without following a harmonized approach is also an anti-pattern.

This process inefficiency presented an opportunity to create a coherent set of MLOps tools that would enforce TR Labs standards for how to configure and deploy SageMaker services and expose these MLOps capabilities to a user by providing standard configuration and boilerplate code. The initiative was named TR MLTools and joined several MLOps libraries developed in TR Labs under one umbrella. Under this umbrella, the team provided a command line interface (CLI) tool that would support a standard project structure and deliver boilerplate code abstracting the underlying infrastructure deployment process and promoting a standardized TR ML workflow.

MLTools and MLTools CLI were designed to be flexible and extendable while incorporating a TR Labs-opinionated view on how to run MLOps in line with TR enterprise cloud platform standards.

MLTools CLI

MLTools CLI is a Python package and a command-line tool that promotes the standardization of TR Labs ML experiments workflow (ML model development, training, and so on) by providing code and configuration templates directly into the users’ code repository. At its core, MLTools CLI aims to connect all ML experiment-related entities (Python scripts, Jupyter notebooks, configuration files, data, pipeline definitions, and so on) and provide an easy way to bootstrap new experiments, conduct trials, and run user-defined scripts, testing them locally and remotely running them at scale as SageMaker jobs.

MLTools CLI is added as a development dependency to a new or existing Python project, where code for the planned ML experiments will be developed and tracked, for example in GitHub. As part of an initial configuration step, this source-code project is associated with specific AI Platform Machine Learning Workspaces. The users can then start using the MLTools CLI for running their ML experiments using SageMaker capabilities like Processing and Training jobs, Experiments, Pipelines, and so on.

Note: AI Platform Workspaces is an internal service, developed in TR, that provides secure access to Amazon Simple Storage Service (Amazon S3)-hosted data and AWS resources like SageMaker or SageMaker Studio Notebook instances for our ML researchers. You can find more information about the AI Platform Workspaces in this AWS blog: How Thomson Reuters built an AI platform using Amazon SageMaker to accelerate delivery of ML projects.

MLTools CLI acts effectively as a frontend or as a delivery channel for the set of capabilities (libraries, tools, and templates) that TR collectively refers to as MLTools. The following diagram shows a typical TR Labs ML experiments workflow, with a focus on the role of MLTools and MLTools CLI:

MLTools CLI offers various templates that can be generated using a command-line, including the following:

  • Separate directory structure for new ML experiments and experiment trials.
  • Script templates for launching SageMaker processing, training, and batch transform jobs.
  • Complete experiment pipeline template based on SageMaker Pipeline, with user scripts as steps.
  • Docker image templates for packaging user scripts. For example, for delivery to production.

MLTools CLI also provides the following features to support effective ML experiments:

  • User scripts can be run directly as SageMaker jobs without the need to build Docker images.
  • Each experiment runs in a sandboxed Poetry environment and can have its own code package and dependency tree.
  • The main, project-level code package is shared and can be used by all project experiments and user scripts code, allowing re-use of common code with no copy-paste.
  • Context-aware API resolves and loads experiment and trial metadata based on the current working directory.
  • Created AWS resources are automatically tagged with the experiment metadata.
  • Utilities to query these experiment-related AWS resources are available.

ML experiment workflow

After MLTools CLI is installed and configured on a laptop or notebook instance, a user can begin ML experimentation work. The first step is to create a new experiment using the MLTools CLI create-experiment command:

> mltools-cli create-experiment –experiment-name my-demo-experiment

An experiment template is generated in a sub-directory of the user’s project. The generated experiment folder has a standard structure, including the initial experiment’s configuration, a sandboxed Poetry package, and sample Jupyter notebooks to help quickly bootstrap new ML experiments:

experiments
└── my_demo_experiment
    ├── data
    ├── notebooks
    ├── scripts
    ├── src
    │   └── my_demo_experiment
    │       └── __init__.py
    ├── config.yaml
    ├── poetry.toml
    ├── pyproject.toml
    ├── README.md
    └── setup.sh

The user can then create script templates for the planned ML experiment steps:

> cd experiments/my_demo_experiments
> mltools-cli create-script –script-name preprocess --job-config PROCESS
> mltools-cli create-script –script-name train --job-config TRAIN
> mltools-cli create-script –script-name evaluate --job-config INFERENCE

Generated script templates are placed under the experiment directory:

experiments
└── my_demo_experiment
    ├── ...
    └── scripts
        ├── evaluate
        │   ├── evaluate.py
        │   ├── evaluate_job.py
        │   └── requirements.txt
        ├── preprocess
        │   ├── preprocess.py
        │   ├── preprocess_job.py
        │   └── requirements.txt
        └── train
            ├── train.py
            ├── train_job.py
            └── requirements.txt

Script names should be short and unique within their parent experiment, because they’re used to generate standardized AWS resource names. Script templates are supplemented by a job configuration for a specific type of job, as specified by the user. Templates and configurations for SageMaker processing, training, and batch transform jobs are currently supported by MLTools—these offerings will be expanded in the future. A requirements.txt file is also included where users can add any dependencies required by the script code to be automatically installed by SageMaker at runtime. The script’s parent experiment and project packages are added to the requirements.txt by default, so the user can import and run code from the whole project hierarchy.

The user would then proceed to add or adapt code in the generated script templates. Experiment scripts are ordinary Python scripts that contain common boilerplate code to give users a head start. They can be run locally while adapting and debugging the code. After the code is working, the same scripts can be launched directly as SageMaker jobs. The required SageMaker job configuration is defined separately in a <script_name>_job.py file, and job configuration details are largely abstracted from the notebook experiment code. As a result, an experiment script can be launched as a SageMaker job with a few lines of code:

Let’s explore the previous code snippet in detail.

First, the MLTools experiment context is loaded based on the current working directory using the load_experiment() factory method. The experiment context concept is a central point of the MLTools API. It provides access to the experiment’s user configuration, the experiment’s scripts, and the job configuration. All project experiments are also integrated with the project-linked AI Platform workspace and therefore have access to the resources and metadata of this workspace. For example, the experiments can access the workspace AWS Identity and Access Management (IAM) role, S3 bucket and its default Amazon Elastic Container Registry (Amazon ECR) repository.

From the experiment, a job context can be loaded, providing one of the experiment’s script names—load_job("train") in this instance. During this operation, the job configuration is loaded from the script’s <script_name>_job.py module. Also, if the script code depends on the experiment or the project packages, they’re automatically built (as Python wheels) and pre-packaged together with the script code, ready to be uploaded to S3.

Next, the training script is launched as a SageMaker training job. In the background, the MLTools factory code ensures that the respective SageMaker estimator or processor instances are created with the default configuration and conform to the rules and best practices accepted in TR. This includes naming conventions, virtual private cloud (VPC) and security configurations, and tagging. Note that SageMaker local mode is fully supported (set in the example by local=True) while its specific configuration details are abstracted from the code. Although the externalized job configuration provides all the defaults, these can be overwritten by the user. In the previous example, custom hyperparameters are provided.

SageMaker jobs that were launched as part of an experiment can be listed directly from the notebook using the experiment’s list_training_jobs() and list_processing_jobs() utilities. SageMaker ExperimentAnalytics data is also available for analysis and can be retrieved by calling the experiment’s experiment_analytics() method.

Integration with SageMaker Experiments

For every new MLTools experiment, a corresponding entity is automatically created in SageMaker Experiments. Experiment names in SageMaker are standardized and made unique by adding a prefix that includes the associated workspace ID and the root commit hash of the user repository. For any job launched from within an MLTools experiment context (that is by using job.run() as shown in the preceding code snippet), a SageMaker Experiments Run instance is created and the job is automatically launched within the SageMaker Experiments Run context. This means all MLTools job runs are automatically tracked in SageMaker Experiments, ensuring that all job run metadata is recorded. This also means that users can then browse their experiments and runs directly in the experiments browser in SageMaker Studio, create visualizations for analysis, and compare model metrics, among other tasks.

As shown in the following diagram, the MLTools experiment workflow is fully integrated with SageMaker Experiments:

Integration with SageMaker Pipelines

Some of the important factors that make ML experiments scalable are their reproducibility and their operationalization level. To support this, MLTools CLI provides users with a capability to add a template with boilerplate code to link the steps of their ML experiment into a deployable workflow (pipeline) that can be automated and delivers reproducible results. The MLTools experiment pipeline implementation is based on AWS SageMaker Pipelines. The same experiment scripts that might have been run and tested as standalone SageMaker jobs can naturally form the experiment pipeline steps.

MLTools currently offers the following standard experiment pipeline template:

We made a deliberate design decision to offer a simple, linear, single-model experiment pipeline template with well-defined standard steps. Oftentimes our project work on multi-model solutions involved an ensemble of ML models that might be ultimately trained on the same set of training data. In such cases, pipelines with more complex flows, or even integrated multi-model experiment pipelines, can be perceived as more efficient. Nevertheless, from a reproducibility and standardization standpoint, a decision to develop a customized experiment pipeline would need to be justified and is generally better suited for the later stages of ML operations where efficient model deployment might be a factor.

On the other hand, using the standard MLTools experiment pipeline template, users can create and start running their experiment pipelines in the early stages of their ML experiments. The underlying pipeline template implementation allows users to easily configure and deploy partial pipelines where only some of the defined steps are implemented. For example, a user can start with a pipeline that only has a single step implemented, such as a DataPreparation step, then add ModelTraining and ModelEvaluation steps and so on. This approach aligns well with the iterative nature of ML experiments and allows for gradually creating a complete experiment pipeline as the ML experiment itself matures.

As shown in the following diagram, MLTools allows users to deploy and run their complete experiment pipelines based on SageMaker Pipelines integrated with SageMaker Model Registry and SageMaker Studio.

Results and future improvements

TR Labs’s successful creation of the MLTools toolchain helps to standardize the MLOps framework throughout the organization and provides several benefits—the first of these is faster model development times. With a consistent process, team members can now work more efficiently by using project templates that deliver a modular setup, facilitating all phases of the ML development process. The structure delivers out-of-the-box integration with TR’s AWS-based AI Platform and the ability to switch between phases of the development including research and data analysis, running experiments at scale, and delivering end-to-end ML pipeline automation. This allows the team to focus on the critical aspects of model development while technicalities are handled and provisioned in advance.

The toolchain is designed to support a close collaboration between researchers and engineers who can work on different aspects of an ML delivery while sharing a codebase that follows software development best practices.

By following a standardized MLOps process, the TR Labs team can also quickly identify issues and model performance drifts more efficiently. It becomes easier to pinpoint where errors are occurring and how to fix them. This can help to reduce downtime and improve the overall efficiency of the development and maintenance processes. The standardized process also ensures that researchers working in model development are using the same environment as ML engineers. This leads to a more efficient transition from ideation and development to deploying the output as models in production and entering the maintenance phase.

Standardizing the MLOps platform has also led to cost savings through efficiencies. With a defined process, the team can reduce the time and resources required to develop and deploy models. This leads to cost savings in the long run, making the development, and particularly the long-term maintenance processes, more cost-effective.

A difficulty the team observed was in measuring how much the toolchain improved time to market and reduced costs. Thoroughly evaluating this would require a dedicated study where independent teams would work on the same use cases with and without the toolchain and comparing the results. However, there are subjective components and possibly different approaches that you can take to resolve this question. Such an approach would be very costly and still contain a high degree of imprecision.

The TR Labs team found an alternate solution for how to measure success. At a yearly interval we run an assessment with the userbase of the toolchain. The assessment covers a variety of aspects ranging over the entire AI/ML lifecycle. Toolchain users are asked to provide subjective assessments on how much of their development time is considered “wasted” on infrastructure issues, configuration issues, or manual tasks that are repetitive. Other questions cover the level of satisfaction with the current toolchain and the perceived improvement in productivity comparing current and past work without the toolchain or earlier versions of the toolchain. The resulting values are averaged over the entire userbase, which includes a mix of job roles ranging from engineers to data scientists to researchers.

The reduction of time spent on inefficiencies, the increase in perceived productivity, and user satisfaction can be used to compute the approximate monetary savings, improvement in code quality, and reduction in time to market. These combined factors contribute to user satisfaction and improvement in the retention of talent within the ML community at TR.

As a measure of success, the TR Labs team was able to achieve reductions in accumulated time spent on inefficiencies and found that this ranges between 3 to 5 days per month per person. Measuring the impact over a period of 12 months, TR has seen improvements of up to 40 percent in perceived productivity in several areas of the lifecycle and a measurable increase in user satisfaction. These numbers are based on what the users of the toolchain reported in the self-assessments.

Conclusion

A standardized MLOps framework can lead to the reduction of bugs, faster model development times, faster troubleshooting of issues, faster reaction to model performance drifts, and cost savings gained through a more efficient end-to-end machine learning process that facilitates experimentation and model creating at scale. By adopting a standardized MLOps framework that uses AWS SageMaker, SageMaker Experiments, SageMaker Model Registry, and SageMaker Pipelines, TR Labs was able to ensure that their machine learning models were developed and deployed efficiently and effectively. This has resulted in a faster time to market and accelerated business value through development.

To learn more about how AWS can help you with your AI/ML and MLOps journey, see What is Amazon SageMaker.


About the Authors

Andrei Voinov is a Lead Software Engineer at Thomson Reuters (TR). He is currently leading a team of engineers in TR Labs with the mandate to develop and support capabilities that help researchers and engineers in TR to efficiently transition ML projects from inception, through research, integration, and delivery into production. He brings over 25 years of experience with software engineering in various sectors and extended knowledge both in the cloud and ML spaces.

Danilo Tommasina is a Distinguished Engineer at Thomson Reuters (TR). With over 20 years of experience working in technology roles ranging from Software Engineer, over Director of Engineering and now as Distinguished Engineer. As a passionate generalist, proficient in multiple programming languages, cloud technologies, DevOps practices and with engineering knowledge in the ML space, he contributed to the scaling of TR Labs’ engineering organization. He is also a big fan of automation including but not limited to MLOps processes and Infrastructure as Code principles.

Simone Zucchet is a Manager of Solutions Architecture at AWS. With close to a decade of experience as a Cloud Architect, Simone enjoys working on innovative projects that help transform the way organizations approach business problems. He helps support large enterprise customers at AWS and is part of the Machine Learning TFC. Outside of his professional life, he enjoys working on cars and photography.

Jeremy Bartosiewicz is a Senior Solutions Architect at AWS. With over 15 years of experience working in technology in multiple roles. Coming from a consulting background, Jeremy enjoys working on a multitude of projects that help organizations grow using cloud solutions. He helps support large enterprise customers at AWS and is part of the Advertising and Machine Learning TFCs.

Read More

Manufacturing Intelligence: Deltia AI Delivers Assembly Line Gains With NVIDIA Metropolis and Jetson

Manufacturing Intelligence: Deltia AI Delivers Assembly Line Gains With NVIDIA Metropolis and Jetson

It all started at Berlin’s Merantix venture studio in 2022, when Silviu Homoceanu and Max Fischer agreed AI could play a big role in improving manufacturing. So the two started Deltia.ai, which runs NVIDIA Metropolis vision AI on NVIDIA Jetson AGX Orin modules to measure and help optimize assembly line processes.

Hailing from AI backgrounds, Homoceanu had previously led self-driving software at Volkswagen, while Fischer had founded a startup that helped digitize more than 40 factories.

Deltia, an NVIDIA Metropolis partner, estimates that today its software platform can provide as much as a 20% performance jump on production lines for its customers.

Customers using the Deltia platform include Viessman, a maker of heating pumps, and industrial electronics company ABB, among others. Viessman is running Deltia at 15 stations, and plans to add it to even more lines in the future. Once all lines are linked to Deltia, production managers say that they expect up to a 50% increase in overall productivity.

“We provide our users with a dashboard that is basically the Google Analytics of manufacturing,” said Homoceanu, Deltia’s CTO. “We install these sensors, and two weeks later they get the keys to this dashboard, and the magic happens in the background.”

Capturing Assembly Line Insights for Digital Transformations  

Once the cameras start gathering data on assembly lines, Deltia uses that information to train models on NVIDIA-accelerated computing that can monitor activities on the line. It then uses those models deployed on Jetson AGX Orin modules at the edge to gather operational insights.

These Jetson-based systems continuously monitor the camera streams and extract metadata. This metadata identifies the exact points in time when a product arrives at a specific station, when it is being worked on and when it leaves the station. This digital information is available to line managers and process improvement personnel via Deltia’s custom dashboard, helping to identify bottlenecks and accelerate line output.

“TensorRT helps us compress complex AI models to a level where we can serve, in an economical fashion, multiple stations with a single Jetson device,” said Homoceanu.

Tapping Into Jetson Orin for Edge AI-Based Customer Insights 

Beyond identifying quick optimizations, Deltia’s analytics help visualize production flows hour-by-hour. This means that Deltia can send rapid alerts when production slips away from predicted target ranges, and it can continuously track output, cycle times and other critical key performance indicators.

It also helps map how processes flow throughout a factory floor, and it suggests improvements for things like walking routes and shop-floor layouts. One of Deltia’s customers used the platform to identify that materials shelves were too far from workers, which caused unnecessarily long cycle times and limited production. Once the shelves were moved, production went up more than 30%.

Deltia’s applications extend beyond process improvements. The platform can be used to help monitor machine states at a granular level, assisting to predict when machine parts are worn out and recommend preemptive replacements, saving time and money down the line. The platform can also suggest optimizations for energy usage, saving on operational costs and reducing maintenance expenses.

“Our vision is to empower manufacturers with the tools to achieve unprecedented efficiency,” said Fischer, CEO of Deltia.ai. “Seeing our customers experience as much as a 30% increase in productivity with our vision models running on NVIDIA Jetson Orin validates the transformative potential of our technology.”

Deltia is a member of the NVIDIA Inception program for cutting-edge startups.

Learn more about NVIDIA Metropolis and NVIDIA Jetson.

Read More