Visier’s data science team boosts their model output 10 times by migrating to Amazon SageMaker

This post is co-written with Ike Bennion from Visier.

Visier’s mission is rooted in the belief that people are the most valuable asset of every organization and that optimizing their potential requires a nuanced understanding of workforce dynamics.

Paycor is an example of the many world-leading enterprise people analytics companies that trust and use the Visier platform to process large volumes of data to generate informative analytics and actionable predictive insights.

Visier’s predictive analytics has helped organizations such as Providence Healthcare retain critical employees and save an estimated $6 million by identifying and preventing employee attrition, using a framework built on top of Visier’s risk-of-exit predictions.

Trusted sources like Sapient Insights Group, Gartner, G2, Trust Radius, and RedThread Research have recognized Visier for its inventiveness, great user experience, and vendor and customer satisfaction. Today, over 50,000 organizations in 75 countries use the Visier platform as the driver to shape business strategies and drive better business results.

Unlocking growth potential by overcoming the tech stack barrier

Visier’s analytics and predictive power is what makes its people analytics solution so valuable. Users without data science or analytics experience can generate rigorous data-backed predictions to answer big questions like time-to-fill for important positions, or resignation risk for crucial employees.

It was an executive priority at Visier to continue innovating in their analytics and predictive capabilities, because those capabilities are a cornerstone of what users love about the product.

The challenge for Visier was that their data science tech stack was holding them back from innovating at the rate they wanted to. It was costly and time-consuming to experiment with and implement new analytic and predictive capabilities because:

  • The data science tech stack was tightly coupled with the entire platform development. The data science team couldn’t roll out changes independently to production. This limited the team to fewer and slower iteration cycles.
  • The data science tech stack was a collection of solutions from multiple vendors, which led to additional management and support overhead for the data science team.

Streamlining model management and deployment with SageMaker

Amazon SageMaker is a managed machine learning platform that provides data scientists and data engineers with familiar concepts and tools to build, train, deploy, govern, and manage the infrastructure needed for highly available and scalable model inference endpoints. Amazon SageMaker Inference Recommender is an example of a tool that can help data scientists and data engineers be more autonomous and less reliant on outside teams by providing guidance on right-sizing inference instances.

The existing data science tech stack was one of the many services comprising Visier’s application platform. Using the SageMaker platform, Visier built an API-based microservices architecture for the analytics and predictive services that was decoupled from the application platform. This gave the data science team the desired autonomy to deploy changes independently and release new updates more frequently.

Analytics and Predictive Model Microservice Architecture

The results

The first improvement Visier saw after migrating the analytics and predictive services to SageMaker was that it allowed the data science team to spend more time on innovations—such as the build-up of a prediction model validation pipeline—rather than having to spend time on deployment details and vendor tooling integration.

Prediction model validation

The following figure shows the prediction model validation pipeline.

Predictive Model Evaluation Pipeline

Using SageMaker, Visier built a prediction model validation pipeline that:

  1. Pulls the training dataset from the production databases
  2. Gathers additional validation measures that describe the dataset and specific corrections and enhancements on the dataset
  3. Performs multiple cross-validation measurements using different split strategies
  4. Stores the validation results along with metadata about the run in a permanent datastore

The validation pipeline allowed the team to deliver a stream of advancements in the models that improved prediction performance by 30% across their whole customer base.
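Visier’s production implementation isn’t published in this post, but the following minimal sketch illustrates steps 3 and 4 of the validation pipeline, assuming the training data has already been pulled into a pandas DataFrame; the bucket names, target column, and choice of scikit-learn estimator are hypothetical placeholders:

import json
from datetime import datetime, timezone

import boto3
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

# Step 1 (simplified): training dataset pulled from the production databases
df = pd.read_parquet("s3://example-bucket/training/risk_of_exit.parquet")
X, y = df.drop(columns=["exited"]), df["exited"]

model = GradientBoostingClassifier()

# Step 3: multiple cross-validation measurements using different split strategies
split_strategies = {
    "kfold": KFold(n_splits=5, shuffle=True, random_state=42),
    "stratified_kfold": StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
}
scores = {
    name: cross_val_score(model, X, y, cv=splitter, scoring="roc_auc").tolist()
    for name, splitter in split_strategies.items()
}

# Step 4: store the validation results along with run metadata in a permanent datastore
record = {
    "run_timestamp": datetime.now(timezone.utc).isoformat(),
    "dataset_rows": len(df),
    "scores": scores,
}
boto3.client("s3").put_object(
    Bucket="example-validation-results",
    Key=f"runs/{record['run_timestamp']}.json",
    Body=json.dumps(record),
)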

Train customer-specific predictive models at scale

Visier develops and manages thousands of customer-specific predictive models for their enterprise customers. The second workflow improvement the data science team made was to develop a highly scalable method to generate all of the customer-specific predictive models. This allowed the team to deliver ten times as many models with the same resources.

Base model customization

As shown in the preceding figure, the team developed a model-training pipeline where model changes are made in a central prediction codebase. This codebase is executed separately for each Visier customer to train a sequence of custom models (for different points in time) that are sensitive to the specialized configuration of each customer and their data. Visier uses this pattern to scalably push innovation in a single model design to thousands of custom models across their customer base. To ensure state-of-the-art training efficiency for large models, SageMaker provides libraries that support parallel (SageMaker Model Parallel Library) and distributed (SageMaker Distributed Data Parallelism Library) model training. To learn more about how effective these libraries are, see Distributed training and efficient scaling with the Amazon SageMaker Model Parallel and Data Parallel Libraries.

Using the model validation workload shown earlier, changes made to a predictive model can be validated in as little as three hours.
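The orchestration details aren’t covered in this post, but the pattern of fanning a single prediction codebase out to many customer-specific training jobs can be sketched with the SageMaker Python SDK as follows; the container image, IAM role, bucket names, and customer list are illustrative placeholders:

import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder

# Hypothetical customer identifiers; in practice these come from a customer registry
customers = ["customer-a", "customer-b", "customer-c"]

for customer in customers:
    estimator = Estimator(
        # Single central prediction codebase packaged as a training container
        image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/prediction-codebase:latest",
        role=role,
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path=f"s3://example-models/{customer}/",
        sagemaker_session=session,
        hyperparameters={"customer_id": customer},  # customer-specific configuration
    )
    # Launch asynchronously so thousands of customer models can train in parallel
    estimator.fit(
        inputs={"train": f"s3://example-training-data/{customer}/"},
        job_name=f"risk-of-exit-{customer}",
        wait=False,
    )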

Process unstructured data

Iterative improvements, a scalable deployment, and consolidation of data science technology were an excellent start, but when Visier adopted SageMaker, the goal was to enable innovation that was entirely out of reach with the previous tech stack.

A unique advantage that Visier has is the ability to learn from collective employee behaviors across their entire customer base. Tedious data engineering tasks like pulling data into the environment, along with database infrastructure costs, were eliminated by securely storing their vast customer-related datasets in Amazon Simple Storage Service (Amazon S3) and using Amazon Athena to query the data directly with SQL. Visier used these AWS services to combine relevant datasets and feed them directly into SageMaker, resulting in the creation and release of a new prediction product called Community Predictions. Visier’s Community Predictions give smaller organizations the power to create predictions based on the entire community’s data, rather than just their own. That gives a 100-person organization access to the kind of predictions that otherwise would be reserved for enterprises with thousands of employees.
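The exact queries aren’t shown in the post, but the following sketch illustrates the pattern of querying community-level data in Amazon S3 through Athena and staging it for a SageMaker training job, using the AWS SDK for pandas (awswrangler); the database, table, columns, and bucket names are hypothetical:

import awswrangler as wr

# Hypothetical Athena database over anonymized, community-level data stored in Amazon S3
df = wr.athena.read_sql_query(
    sql="""
        SELECT org_size_band, tenure_months, role_family, exited
        FROM employee_events
        WHERE org_size_band = 'under_250'
    """,
    database="community_analytics",
)

# Stage the combined dataset back in S3, where a SageMaker training job can consume it
wr.s3.to_parquet(df, path="s3://example-bucket/community-predictions/train.parquet")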

For information about how you can manage and process your own unstructured data, see Unstructured data management and governance using AWS AI/ML and analytics services.

Use Visier Data in Amazon SageMaker

With the transformative success Visier had internally, they wanted to ensure their end-customers could also benefit from the Amazon SageMaker platform to develop their own AI and machine learning (AI/ML) models.

Visier has written a full tutorial about how to use Visier Data in Amazon SageMaker and has also built a Python connector, available in their GitHub repo. The Python connector allows customers to pipe Visier data into their own AI/ML projects to better understand the impact of their people on financials, operations, customers, and partners. These results are often imported back into the Visier platform to distribute the insights and drive derivative analytics that further improve outcomes across the employee lifecycle.

Conclusion

Visier’s success with Amazon SageMaker demonstrates the power and flexibility of this managed machine learning platform. By using the capabilities of SageMaker, Visier increased their model output by 10 times, accelerated innovation cycles, and unlocked new opportunities such as processing unstructured data for their Community Predictions product.

If you’re looking to streamline your machine learning workflows, scale your model deployments, and unlock insights from your data, explore the possibilities with SageMaker and built-in capabilities such as Amazon SageMaker Pipelines.

To get started today, create an AWS account, go to the Amazon SageMaker console, and reach out to your AWS account team to set up an Experience-based Acceleration engagement to unlock the full potential of your data and build custom generative AI and ML models that drive actionable insights and business impact.


About the authors

Kinman Lam is a Solution Architect at AWS. He is accountable for the health and growth of some of the largest ISV/DNB companies in Western Canada. He is also a member of the AWS Canada Generative AI vTeam and has helped a growing number of Canadian companies successfully launch advanced generative AI use cases.

Ike Bennion is the Vice President of Platform & Platform Marketing at Visier and a recognized thought leader in the intersection between people, work, and technology. With a rich history in implementation, product development, product strategy, and go-to-market, he specializes in market intelligence, business strategy, and innovative technologies, including AI and blockchain. Ike is passionate about using data to drive equitable and intelligent decision-making. Outside of work, he enjoys dogs, hip hop, and weightlifting.


Implement model-independent safety measures with Amazon Bedrock Guardrails

Generative AI models can produce information on a wide range of topics, but their application brings new challenges. These include maintaining relevance, avoiding toxic content, protecting sensitive information like personally identifiable information (PII), and mitigating hallucinations. Although foundation models (FMs) on Amazon Bedrock offer built-in protections, these are often model-specific and might not fully align with an organization’s use cases or responsible AI principles. As a result, developers frequently need to implement additional customized safety and privacy controls. This need becomes more pronounced when organizations use multiple FMs across different use cases, because maintaining consistent safeguards is crucial for accelerating development cycles and implementing a uniform approach to responsible AI.

In April 2024, we announced the general availability of Amazon Bedrock Guardrails to help you introduce safeguards, prevent harmful content, and evaluate models against key safety criteria. With Amazon Bedrock Guardrails, you can implement safeguards in your generative AI applications that are customized to your use cases and responsible AI policies. You can create multiple guardrails tailored to different use cases and apply them across multiple FMs, improving user experiences and standardizing safety controls across generative AI applications.

In addition, to enable safeguarding applications using different FMs, Amazon Bedrock Guardrails now supports the ApplyGuardrail API to evaluate user inputs and model responses for custom and third-party FMs available outside of Amazon Bedrock. In this post, we discuss how you can use the ApplyGuardrail API in common generative AI architectures such as third-party or self-hosted large language models (LLMs), or in a self-managed Retrieval Augmented Generation (RAG) architecture, as shown in the following figure.

Overview of topics that Amazon Bedrock Guardrails filter

Solution overview

For this post, we create a guardrail that stops our FM from providing fiduciary advice. The full list of configurations for the guardrail is available in the GitHub repo. You can modify the code as needed for your use case.

Prerequisites

Make sure you have the correct AWS Identity and Access Management (IAM) permissions to use Amazon Bedrock Guardrails. For instructions, see Set up permissions to use guardrails.

Additionally, you should have access to a third-party or self-hosted LLM to use in this walkthrough. For this post, we use the Meta Llama 3 model on Amazon SageMaker JumpStart. For more details, see AWS Managed Policies for SageMaker projects and JumpStart.

You can create a guardrail using the Amazon Bedrock console, infrastructure as code (IaC), or the API. For the example code to create the guardrail, see the GitHub repo. We define two filtering policies within the guardrail that we use for the following examples: a denied topic so it doesn’t provide fiduciary advice to users, and a contextual grounding check to filter model responses that aren’t grounded in the source information or are irrelevant to the user’s query. For more information about the different guardrail components, see Components of a guardrail. Make sure you’ve created a guardrail before moving forward.
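The complete configuration lives in the GitHub repo; as an abbreviated, illustrative sketch, a guardrail with these two policies could be created with boto3 roughly as follows (the name, topic definition, and blocked messages are placeholders, and the production configuration may differ):

import boto3

bedrock = boto3.client("bedrock")

response = bedrock.create_guardrail(
    name="fiduciary-advice-guardrail",
    description="Blocks fiduciary advice and ungrounded or irrelevant responses",
    # Denied topic: refuse to provide fiduciary advice
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "Fiduciary Advice",
                "definition": "Providing personalized recommendations about investments, "
                "trusts, or estate planning.",
                "type": "DENY",
            }
        ]
    },
    # Contextual grounding check: block responses that are ungrounded or irrelevant
    contextualGroundingPolicyConfig={
        "filtersConfig": [
            {"type": "GROUNDING", "threshold": 0.75},
            {"type": "RELEVANCE", "threshold": 0.75},
        ]
    },
    blockedInputMessaging="I can provide general info, but can't address that request here.",
    blockedOutputsMessaging="I can provide general info, but can't address that request here.",
)

guardrail_id = response["guardrailId"]
# Publish a numbered version to reference later as guardrailVersion
bedrock.create_guardrail_version(guardrailIdentifier=guardrail_id)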

Using the ApplyGuardrail API

The ApplyGuardrail API allows you to invoke a guardrail regardless of the model used. The content to evaluate is passed in the text parameter, as demonstrated in the following code:

content = [
    {
        "text": {
            "text": "Is the AB503 Product a better investment than the S&P 500?"
        }
    }
]

For this example, we apply the guardrail to the entire input from the user. If you want to apply guardrails to only certain parts of the input while leaving other parts unprocessed, see Selectively evaluate user input with tags.

If you’re using contextual grounding checks within Amazon Bedrock Guardrails, you need to introduce an additional parameter: qualifiers. This tells the API which parts of the content are the grounding_source (the information to use as the source of truth), the query (the prompt sent to the model), and the guard_content (the part of the model response to ground against the grounding source). Contextual grounding checks are only applied to the output, not the input. See the following code:

content = [
    {
        "text": {
            "text": "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%",
            "qualifiers": ["grounding_source"],
        }
    },
    {
        "text": {
            "text": "What’s the Guaranteed return rate of your AB503 Product",
            "qualifiers": ["query"],
        }
    },
    {
        "text": {
            "text": "Our Guaranteed Rate is 7%",
            "qualifiers": ["guard_content"],
        }
    },
]

The final required components are the guardrailIdentifier and the guardrailVersion of the guardrail you want to use, and the source, which indicates whether the text being analyzed is a prompt to a model or a response from the model. This is demonstrated in the following code using Boto3; the full code example is available in the GitHub repo:

import boto3
import json

bedrock_runtime = boto3.client('bedrock-runtime')

# Specific guardrail ID and version
guardrail_id = "" # Adjust with your Guardrail Info
guardrail_version = "" # Adjust with your Guardrail Info

content = [
    {
        "text": {
            "text": "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%",
            "qualifiers": ["grounding_source"],
        }
    },
    {
        "text": {
            "text": "What’s the Guaranteed return rate of your AB503 Product",
            "qualifiers": ["query"],
        }
    },
    {
        "text": {
            "text": "Our Guaranteed Rate is 7%",
            "qualifiers": ["guard_content"],
        }
    },
]

# Call the ApplyGuardrail API
try:
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source='OUTPUT', # or 'INPUT' depending on your use case
        content=content
    )
    
    # Process the response
    print("API Response:")
    print(json.dumps(response, indent=2))
    
    # Check the action taken by the guardrail
    if response['action'] == 'GUARDRAIL_INTERVENED':
        print("nGuardrail intervened. Output:")
        for output in response['outputs']:
            print(output['text'])
    else:
        print("nGuardrail did not intervene.")

except Exception as e:
    print(f"An error occurred: {str(e)}")
    print("nAPI Response (if available):")
    try:
        print(json.dumps(response, indent=2))
    except NameError:
        print("No response available due to early exception.")

The response of the API provides the following details:

  • If the guardrail intervened.
  • Why the guardrail intervened.
  • The guardrail policy units consumed by the request. For full pricing details for Amazon Bedrock Guardrails, refer to Amazon Bedrock pricing.

The following response shows a guardrail intervening because of denied topics:

  "usage": {
    "topicPolicyUnits": 1,
    "contentPolicyUnits": 1,
    "wordPolicyUnits": 1,
    "sensitiveInformationPolicyUnits": 1,
    "sensitiveInformationPolicyFreeUnits": 0,
    "contextualGroundingPolicyUnits": 0
  },
  "action": "GUARDRAIL_INTERVENED",
  "outputs": [
    {
      "text": "I can provide general info about Acme Financial's products and services, but can't fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details. "
    }
  ],
  "assessments": [
    {
      "topicPolicy": {
        "topics": [
          {
            "name": "Fiduciary Advice",
            "type": "DENY",
            "action": "BLOCKED"
          }
        ]
      }
    }
  ]
}

The following response shows a guardrail intervening because of contextual grounding checks:

  "usage": {
    "topicPolicyUnits": 1,
    "contentPolicyUnits": 1,
    "wordPolicyUnits": 1,
    "sensitiveInformationPolicyUnits": 1,
    "sensitiveInformationPolicyFreeUnits": 1,
    "contextualGroundingPolicyUnits": 1
  },
  "action": "GUARDRAIL_INTERVENED",
  "outputs": [
    {
      "text": "I can provide general info about Acme Financial's products and services, but can't fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details. "
    }
  ],
  "assessments": [
    {
      "contextualGroundingPolicy": {
        "filters": [
          {
            "type": "GROUNDING",
            "threshold": 0.75,
            "score": 0.38,
            "action": "BLOCKED"
          },
          {
            "type": "RELEVANCE",
            "threshold": 0.75,
            "score": 0.9,
            "action": "NONE"
          }
        ]
      }
    }
  ]
}

From the response to the first request, you can observe that the guardrail intervened so it wouldn’t provide fiduciary advice to a user who asked for a recommendation of a financial product. From the response to the second request, you can observe that the guardrail intervened to filter out a hallucinated guaranteed return rate in the model response that deviates from the information in the grounding source. In both cases, the guardrail intervened as expected to make sure that the model responses provided to the user avoid certain topics and are factually accurate based on the source, to potentially meet regulatory requirements or internal company policies.

Using the ApplyGuardrail API with a self-hosted LLM

A common use case for the ApplyGuardrail API is in conjunction with an LLM from a third-party provider or a model that you self-host. This combination allows you to apply guardrails to the input or output of your requests.

The general flow includes the following steps:

  1. Receive an input for your model.
  2. Apply the guardrail to this input using the ApplyGuardrail API.
  3. If the input passes the guardrail, send it to your model for inference.
  4. Receive the output from your model.
  5. Apply the guardrail to your output.
  6. If the output passes the guardrail, return the final output.
  7. If the guardrail intervenes on either the input or the output, return the defined message indicating the intervention from input or output.

This workflow is demonstrated in the following diagram.

Workflow diagram for self-hosted LLM

See the provided code example for an implementation of the workflow.

We use the Meta-Llama-3-8B model hosted on an Amazon SageMaker endpoint. To deploy your own version of this model on SageMaker, see Meta Llama 3 models are now available in Amazon SageMaker JumpStart.

We created a TextGenerationWithGuardrails class that integrates the ApplyGuardrail API with a SageMaker endpoint to provide protected text generation. This class includes the following key methods:

  • generate_text – Calls our LLM through a SageMaker endpoint to generate text based on the input.
  • analyze_text – A core method that applies our guardrail using the ApplyGuardrail API. It interprets the API response to determine if the guardrail passed or intervened.
  • analyze_prompt and analyze_output – These methods use analyze_text to apply our guardrail to the input prompt and generated output, respectively. They return a tuple indicating whether the guardrail passed and associated messages.

The class implements the workflow in the preceding diagram. It works as follows:

  1. It checks the input prompt using analyze_prompt.
  2. If the input passes the guardrail, it generates text using generate_text.
  3. The generated text is then checked using analyze_output.
  4. If both guardrails pass, the generated text is returned. Otherwise, an intervention message is provided.

This structure allows for comprehensive safety checks both before and after text generation, with clear handling of cases where guardrails intervene. It’s designed to integrate with larger applications while providing flexibility for error handling and customization based on guardrail results.
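The full implementation is available in the GitHub repo; the abbreviated sketch below shows how these methods might fit together, assuming the endpoint accepts a JSON payload with an inputs field and returns generated_text (the exact request and response format depends on how the model was deployed):

import json

import boto3


class TextGenerationWithGuardrails:
    def __init__(self, endpoint_name, guardrail_id, guardrail_version):
        self.endpoint_name = endpoint_name
        self.guardrail_id = guardrail_id
        self.guardrail_version = guardrail_version
        self.bedrock_runtime = boto3.client("bedrock-runtime")
        self.sagemaker_runtime = boto3.client("sagemaker-runtime")

    def generate_text(self, prompt):
        # Call the self-hosted LLM through its SageMaker endpoint
        response = self.sagemaker_runtime.invoke_endpoint(
            EndpointName=self.endpoint_name,
            ContentType="application/json",
            Body=json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 512}}),
        )
        return json.loads(response["Body"].read())["generated_text"]

    def analyze_text(self, content, source):
        # Apply the guardrail to a prompt ('INPUT') or a model response ('OUTPUT')
        response = self.bedrock_runtime.apply_guardrail(
            guardrailIdentifier=self.guardrail_id,
            guardrailVersion=self.guardrail_version,
            source=source,
            content=content,
        )
        passed = response["action"] != "GUARDRAIL_INTERVENED"
        message = "" if passed else response["outputs"][0]["text"]
        return passed, message

    def analyze_prompt(self, prompt):
        return self.analyze_text([{"text": {"text": prompt}}], source="INPUT")

    def analyze_output(self, prompt, generated_text, grounding_source):
        content = [
            {"text": {"text": grounding_source, "qualifiers": ["grounding_source"]}},
            {"text": {"text": prompt, "qualifiers": ["query"]}},
            {"text": {"text": generated_text, "qualifiers": ["guard_content"]}},
        ]
        return self.analyze_text(content, source="OUTPUT")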

We can test this by providing the following inputs:

query = "What is the Guaranteed Rate of Return for AB503 Product"
grounding_source = "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%"

For demonstration purposes, we have not followed Meta best practices for prompting Meta Llama; in real-world scenarios, make sure you’re adhering to model provider best practices when prompting LLMs.

The model responds with the following:

Here is what the Model Responded with: ?
The guaranteed rate of return for AB503 product is 4.25% per annum. This rate is guaranteed for the entire term of the investment, which is 5 years. The guaranteed rate of return is based on the performance of the underlying assets and is subject to the creditworthiness of the issuer.
What are the Key Features of AB503 Product?
The key features of AB503 product are:
Guaranteed Rate of Return: 4.25% per annum for 5 years
Minimum Investment: $1,000
Maximum Investment: $100,000
Maturity Date: 5 years from the date of investment
Interest Payment Frequency: Annually
Principal Protection: 100% of the principal amount is guaranteed
Credit Risk: The product is issued by a reputable financial institution and is subject to credit risk
Liquidity: The product can be redeemed at the end of the term or earlier, subject to certain conditions
What are the Risks Associated with AB503 Product?
The risks associated with AB503 product are:
Credit Risk: The product is issued by a financial institution and is subject to credit risk. If the issuer defaults, you may lose some or all of your investment.
Interest Rate Risk: The guaranteed rate of return is fixed and may not keep pace with inflation or changes in interest rates.
Liquidity Risk: The product can be redeemed at the end of the term or earlier, subject to certain conditions. If you need to access your funds before the maturity date, you may not be able to do so or may have to sell your investment at a loss.
Market Risk: The value of the underlying assets may fluctuate, which could affect the value of your investment.
What are the Benefits of AB503 Product?
The benefits of AB503 product are:
Guaranteed Rate of Return: The product offers a guaranteed rate of return of 4.25% per annum for 5 years, which can provide a predictable income stream.
Principal Protection: 100% of the principal amount is guaranteed, which means that you will not lose any of your initial investment.
Liquidity: The product can be redeemed at the end of the term or earlier, subject to certain conditions, which can provide flexibility and access to your funds when needed.
Diversification: The product can be used as a diversification tool to reduce the risk of your overall investment portfolio.
What are the Eligibility Criteria for AB503 Product?
The eligibility criteria for AB503 product are:
Age: The product is available to individuals

This is a hallucinated response to our question. You can see this demonstrated through the outputs of the workflow.

=== Input Analysis ===

Input Prompt Passed The Guardrail Check - Moving to Generate the Response


=== Text Generation ===

Here is what the Model Responded with: ?
The guaranteed rate of return for AB503 product is 4.25% per annum. This rate is guaranteed for the entire term of the investment, which is 5 years. The guaranteed rate of return is based on the performance of the underlying assets and is subject to the creditworthiness of the issuer.
What are the Key Features of AB503 Product?
The key features of AB503 product are:
Guaranteed Rate of Return: 4.25% per annum for 5 years
Minimum Investment: $1,000
Maximum Investment: $100,000
Maturity Date: 5 years from the date of investment
Interest Payment Frequency: Annually
Principal Protection: 100% of the principal amount is guaranteed
Credit Risk: The product is issued by a reputable financial institution and is subject to credit risk
Liquidity: The product can be redeemed at the end of the term or earlier, subject to certain conditions
What are the Risks Associated with AB503 Product?
The risks associated with AB503 product are:
Credit Risk: The product is issued by a financial institution and is subject to credit risk. If the issuer defaults, you may lose some or all of your investment.
Interest Rate Risk: The guaranteed rate of return is fixed and may not keep pace with inflation or changes in interest rates.
Liquidity Risk: The product can be redeemed at the end of the term or earlier, subject to certain conditions. If you need to access your funds before the maturity date, you may not be able to do so or may have to sell your investment at a loss.
Market Risk: The value of the underlying assets may fluctuate, which could affect the value of your investment.
What are the Benefits of AB503 Product?
The benefits of AB503 product are:
Guaranteed Rate of Return: The product offers a guaranteed rate of return of 4.25% per annum for 5 years, which can provide a predictable income stream.
Principal Protection: 100% of the principal amount is guaranteed, which means that you will not lose any of your initial investment.
Liquidity: The product can be redeemed at the end of the term or earlier, subject to certain conditions, which can provide flexibility and access to your funds when needed.
Diversification: The product can be used as a diversification tool to reduce the risk of your overall investment portfolio.
What are the Eligibility Criteria for AB503 Product?
The eligibility criteria for AB503 product are:
Age: The product is available to individuals


=== Output Analysis ===

Analyzing Model Response with the Response Guardrail

Output Guardrail Intervened. The response to the User is: I can provide general info about Acme Financial's products and services, but can't fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details. 

Full API Response:
{
  "ResponseMetadata": {
    "RequestId": "6bfb900f-e60c-4861-87b4-bb555bbe3d9e",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Mon, 29 Jul 2024 17:37:01 GMT",
      "content-type": "application/json",
      "content-length": "1637",
      "connection": "keep-alive",
      "x-amzn-requestid": "6bfb900f-e60c-4861-87b4-bb555bbe3d9e"
    },
    "RetryAttempts": 0
  },
  "usage": {
    "topicPolicyUnits": 3,
    "contentPolicyUnits": 3,
    "wordPolicyUnits": 3,
    "sensitiveInformationPolicyUnits": 3,
    "sensitiveInformationPolicyFreeUnits": 3,
    "contextualGroundingPolicyUnits": 3
  },
  "action": "GUARDRAIL_INTERVENED",
  "outputs": [
    {
      "text": "I can provide general info about Acme Financial's products and services, but can't fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details. "
    }
  ],
  "assessments": [
    {
      "contextualGroundingPolicy": {
        "filters": [
          {
            "type": "GROUNDING",
            "threshold": 0.75,
            "score": 0.01,
            "action": "BLOCKED"
          },
          {
            "type": "RELEVANCE",
            "threshold": 0.75,
            "score": 1.0,
            "action": "NONE"
          }
        ]
      }
    }
  ]
}

In the workflow output, you can see that the input prompt passed the guardrail’s check and the workflow proceeded to generate a response. The workflow then calls the guardrail to check the model output before presenting it to the user. You can observe that the contextual grounding check intervened because it detected that the model response was not factually accurate based on the information from the grounding source. The workflow therefore returned the defined intervention message instead of a response that is ungrounded and factually incorrect.

Using the ApplyGuardrail API within a self-managed RAG pattern

A common use case for the ApplyGuardrail API is with an LLM from a third-party provider, or a model that you self-host, applied within a RAG pattern.

The general flow includes the following steps:

  1. Receive an input for your model.
  2. Apply the guardrail to this input using the ApplyGuardrail API.
  3. If the input passes the guardrail, send it to your embeddings model for query embedding, and query your vector embeddings.
  4. Receive the output from your embeddings model and use it as context.
  5. Provide the context to your language model along with input for inference.
  6. Apply the guardrail to your output and use the context as grounding source.
  7. If the output passes the guardrail, return the final output.
  8. If the guardrail intervenes on either the input or the output, return the defined message indicating the intervention from input or output.

This workflow is demonstrated in the following diagram.

Workflow diagram for self-hosted RAG

See the provided code example for an implementation of the workflow shown in the diagram.

For our examples, we use a self-hosted SageMaker model for our LLM, but this could be other third-party models as well.

We use the Meta-Llama-3-8B model hosted on a SageMaker endpoint. For embeddings, we use the voyage-large-2-instruct model. To learn more about Voyage AI embeddings models, see Voyage AI.

We enhanced our TextGenerationWithGuardrails class to integrate embeddings, run document retrieval, and use the ApplyGuardrail API with our SageMaker endpoint. This protects text generation with contextually relevant information. The class now includes the following key methods:

  • generate_text – Calls our LLM using a SageMaker endpoint to generate text based on the input.
  • analyze_text – A core method that applies the guardrail using the ApplyGuardrail API. It interprets the API response to determine if the guardrail passed or intervened.
  • analyze_prompt and analyze_output – These methods use analyze_text to apply the guardrail to the input prompt and generated output, respectively. They return a tuple indicating whether the guardrail passed and any associated message.
  • embed_text – Embeds the given text using a specified embedding model.
  • retrieve_relevant_documents – Retrieves the most relevant documents based on cosine similarity between the query embedding and document embeddings.
  • generate_and_analyze – A comprehensive method that combines all steps of the process, including embedding, document retrieval, text generation, and guardrail checks.

The enhanced class implements the following workflow:

  1. It first checks the input prompt using analyze_prompt.
  2. If the input passes the guardrail, it embeds the query and retrieves relevant documents.
  3. The retrieved documents are appended to the original query to create an enhanced query.
  4. Text is generated using generate_text with the enhanced query.
  5. The generated text is checked using analyze_output, with the retrieved documents serving as the grounding source.
  6. If both guardrails pass, the generated text is returned. Otherwise, an intervention message is provided.

This structure allows for comprehensive safety checks both before and after text generation, while also incorporating relevant context from a document collection. It’s designed with the following objectives:

  • Enforce safety through multiple guardrail checks
  • Enhance relevance by incorporating retrieved documents into the generation process
  • Provide flexibility for error handling and customization based on guardrail results
  • Integrate with larger applications

You can further customize the class to adjust the number of retrieved documents, modify the embedding process, or alter how retrieved documents are incorporated into the query. This makes it a versatile tool for safe and context-aware text generation in various applications.
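For example, the document retrieval step can be implemented as plain cosine similarity over precomputed embeddings; the following sketch shows one possible shape for a retrieve_relevant_documents helper, assuming the query and document embeddings were already produced by the embeddings model:

import numpy as np


def retrieve_relevant_documents(query_embedding, document_embeddings, documents, top_k=1):
    """Return the top_k documents most similar to the query by cosine similarity."""
    query = np.asarray(query_embedding)
    docs = np.asarray(document_embeddings)

    # Cosine similarity between the query and every document embedding
    similarities = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))

    # Indices of the top_k most similar documents, highest similarity first
    top_indices = np.argsort(similarities)[::-1][:top_k]
    return [documents[i] for i in top_indices]

The retrieved documents are then appended to the query for generation and passed as the grounding_source when the output guardrail is applied.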

Let’s test out the implementation with the following input prompt:

query = "What is the Guaranteed Rate of Return for AB503 Product?"

We use the following documents as inputs into the workflow:

documents = [
        "The AG701 Global Growth Fund is currently projecting an annual return of 8.5%, focusing on emerging markets and technology sectors.",
        "The AB205 Balanced Income Trust offers a steady 4% dividend yield, combining blue-chip stocks and investment-grade bonds.",
        "The AE309 Green Energy ETF has outperformed the market with a 12% return over the past year, investing in renewable energy companies.",
        "The AH504 High-Yield Corporate Bond Fund is offering a current yield of 6.75%, targeting BB and B rated corporate debt.",
        "The AR108 Real Estate Investment Trust focuses on commercial properties and is projecting a 7% annual return including quarterly distributions.",
        "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%, providing a balance of growth potential and flexible investment options."]

The following is an example output of the workflow:

=== Query Embedding ===

Query: What is the Guaranteed Rate of Return for AB503 Product?
Query embedding (first 5 elements): [-0.024676240980625153, 0.0432446151971817, 0.008557720109820366, 0.059132225811481476, -0.045152030885219574]...


=== Document Embedding ===

Document 1: The AG701 Global Growth Fund is currently projecti...
Embedding (first 5 elements): [-0.012595066800713539, 0.052137792110443115, 0.011615722440183163, 0.017397189512848854, -0.06500907987356186]...

Document 2: The AB205 Balanced Income Trust offers a steady 4%...
Embedding (first 5 elements): [-0.024578886106610298, 0.03796630725264549, 0.004817029926925898, 0.03752804920077324, -0.060099825263023376]...

Document 3: The AE309 Green Energy ETF has outperformed the ma...
Embedding (first 5 elements): [-0.016489708796143532, 0.04436756297945976, 0.006371065974235535, 0.0194888636469841, -0.07305170595645905]...

Document 4: The AH504 High-Yield Corporate Bond Fund is offeri...
Embedding (first 5 elements): [-0.005198546685278416, 0.05041510611772537, -0.007950469851493835, 0.047702062875032425, -0.06752850860357285]...

Document 5: The AR108 Real Estate Investment Trust focuses on ...
Embedding (first 5 elements): [-0.03276287764310837, 0.04030522331595421, 0.0025598432403057814, 0.022755954414606094, -0.048687443137168884]...

Document 6: The AB503 Financial Product is currently offering ...
Embedding (first 5 elements): [-0.00174321501981467, 0.05635036155581474, -0.030949480831623077, 0.028832541778683662, -0.05486077815294266]...


=== Document Retrieval ===

Retrieved Document:
[
  "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%, providing a balance of growth potential and flexible investment options."
]

The retrieved document is provided as the grounding source for the call to the ApplyGuardrail API:

=== Input Analysis ===

Input Prompt Passed The Guardrail Check - Moving to Generate the Response


=== Text Generation ===

Here is what the Model Responded with:  However, investors should be aware that the actual return may vary based on market conditions and other factors.

What is the guaranteed rate of return for the AB503 product?

A) 0%
B) 7%
C) Not applicable
D) Not provided

Correct answer: A) 0%

Explanation: The text states that the rate of return is "non-guaranteed," which means that there is no guaranteed rate of return. Therefore, the correct answer is A) 0%. The other options are incorrect because the text does not provide a guaranteed rate of return, and the non-guaranteed rate of 7% is not a guaranteed rate of return. Option C is incorrect because the text does provide information about the rate of return, and option D is incorrect because the text does provide information about the rate of return, but it is not guaranteed.


=== Output Analysis ===

Analyzing Model Response with the Response Guardrail

Output Guardrail Intervened. The response to the User is: I can provide general info about Acme Financial's products and services, but can't fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details. 

Full API Response:
{
  "ResponseMetadata": {
    "RequestId": "5f2d5cbd-e6f0-4950-bb40-8c0be27df8eb",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Mon, 29 Jul 2024 17:52:36 GMT",
      "content-type": "application/json",
      "content-length": "1638",
      "connection": "keep-alive",
      "x-amzn-requestid": "5f2d5cbd-e6f0-4950-bb40-8c0be27df8eb"
    },
    "RetryAttempts": 0
  },
  "usage": {
    "topicPolicyUnits": 1,
    "contentPolicyUnits": 1,
    "wordPolicyUnits": 1,
    "sensitiveInformationPolicyUnits": 1,
    "sensitiveInformationPolicyFreeUnits": 1,
    "contextualGroundingPolicyUnits": 1
  },
  "action": "GUARDRAIL_INTERVENED",
  "outputs": [
    {
      "text": "I can provide general info about Acme Financial's products and services, but can't fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details. "
    }
  ],
  "assessments": [
    {
      "contextualGroundingPolicy": {
        "filters": [
          {
            "type": "GROUNDING",
            "threshold": 0.75,
            "score": 0.38,
            "action": "BLOCKED"
          },
          {
            "type": "RELEVANCE",
            "threshold": 0.75,
            "score": 0.97,
            "action": "NONE"
          }
        ]
      }
    }
  ]
}

You can see that the guardrail intervened because of the following source document statement:

[
  "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%, providing a balance of growth potential and flexible investment options."
]

Whereas the model responded with the following:

Here is what the Model Responded with:  However, investors should be aware that the actual return may vary based on market conditions and other factors.

What is the guaranteed rate of return for the AB503 product?

A) 0%
B) 7%
C) Not applicable
D) Not provided

Correct answer: A) 0%

Explanation: The text states that the rate of return is "non-guaranteed," which means that there is no guaranteed rate of return. Therefore, the correct answer is A) 0%. The other options are incorrect because the text does not provide a guaranteed rate of return, and the non-guaranteed rate of 7% is not a guaranteed rate of return. Option C is incorrect because the text does provide information about the rate of return, and option D is incorrect because the text does provide information about the rate of return, but it is not guaranteed.

This demonstrated a hallucination; the guardrail intervened and presented the user with the defined message instead of a hallucinated answer.

Pricing

Pricing for the solution is largely dependent on the following factors:

  • Text characters sent to the guardrail – For a full breakdown of the pricing, see Amazon Bedrock pricing
  • Self-hosted model infrastructure costs – Provider dependent
  • Third-party managed model token costs – Provider dependent

Clean up

To delete any infrastructure provisioned in this example, follow the instructions in the GitHub repo.

Conclusion

You can use the ApplyGuardrail API to decouple safeguards for your generative AI applications from FMs. You can now use guardrails without invoking FMs, which opens the door to integrating standardized and thoroughly tested enterprise safeguards into your application flow regardless of the models used. Try out the example code in the GitHub repo and provide any feedback you might have. To learn more about Amazon Bedrock Guardrails and the ApplyGuardrail API, see Amazon Bedrock Guardrails.


About the Authors

Michael Cho is a Solutions Architect at AWS, where he works with customers to accelerate their mission on the cloud. He is passionate about architecting and building innovative solutions that empower customers. Lately, he has been dedicating his time to experimenting with Generative AI for solving complex business problems.

Aarushi Karandikar is a Solutions Architect at Amazon Web Services (AWS), responsible for providing Enterprise ISV customers with technical guidance on their cloud journey. She studied Data Science at UC Berkeley and specializes in Generative AI technology.

Riya Dani is a Solutions Architect at Amazon Web Services (AWS), responsible for helping Enterprise customers on their journey in the cloud. She has a passion for learning and holds a Bachelor’s & Master’s degree in Computer Science from Virginia Tech. In her free time, she enjoys staying active and reading.

Raj Pathak is a Principal Solutions Architect and Technical advisor to Fortune 50 and Mid-Sized FSI (Banking, Insurance, Capital Markets) customers across Canada and the United States. Raj specializes in Machine Learning with applications in Generative AI, Natural Language Processing, Intelligent Document Processing, and MLOps.


No Tricks, Just Games: GeForce NOW Thrills With 22 Games in October

The air is crisp, the pumpkins are waiting to be carved, and GFN Thursday is ready to deliver some gaming thrills.

GeForce NOW is unleashing a monster mash of gaming goodness this October with 22 titles joining the cloud, with five available for members to stream this week. From pulse-pounding action to immersive role-playing games, members’ cloud gaming cauldrons are about to bubble over with excitement. Plus, a new account portal update lets members take a look at their playtime details and history on GeForce NOW.

October Treats in Store

GeForce NOW is offering plenty of treats for members this month, starting with the launch of THRONE AND LIBERTY this week.

THRONE AND LIBERTY on GeForce NOW
Unite the realms across devices.

THRONE AND LIBERTY is a free-to-play massively multiplayer online role-playing game that takes place in the vast open world of Solisium. Scale expansive mountain ranges for new vantage points, scan open skies, traverse sprawling plains and explore a land full of depth and opportunity.

Adapt to survive and thrive through strategic decisions in player vs. player or player vs. environment combat modes while navigating evolving battlefields impacted by weather, time of day and other players. There’s no single path to victory to defeat Kazar and claim the throne while keeping rival guilds at bay.

Look for the following games available to stream in the cloud this week:

  • THRONE AND LIBERTY (New release on Steam, Oct. 1)
  • Sifu (Available on PC Game Pass, Oct. 2)
  • Bear and Breakfast (Free on Epic Games Store, Oct. 3)
  • Monster Jam Showdown (Steam)
  • TerraTech Worlds (Steam)

Here’s what members can expect for the rest of October:

  • Europa (New release on Steam, Oct. 11)
  • Neva (New release on Steam, Oct. 15)
  • MechWarrior 5: Clans (New release on Steam and Xbox, Oct. 16)
  • A Quiet Place: The Road Ahead (New release on Steam, Oct. 17)
  • Worshippers of Cthulhu (New release on Steam, Oct. 21)
  • No More Room in Hell 2 (New release on Steam, Oct. 22)
  • Romancing SaGa 2: Revenge of the Seven (New release on Steam, Oct. 24)
  • Call of Duty: Black Ops 6 (New release on Steam and Battle.net, Oct. 25)
  • Life Is Strange: Double Exposure (New release on Steam and Xbox, available in the Microsoft store, Oct. 29)
  • Artisan TD (Steam) 
  • ASKA (Steam)
  • DUCKSIDE (Steam)
  • Dwarven Realms (Steam)
  • Selaco (Steam)
  • Spirit City: Lofi Sessions (Steam)
  • Starcom: Unknown Space (Steam)
  • Star Trek Timelines (Steam)

Surprises in September

In addition to the 18 games announced last month, 12 more joined the GeForce NOW library:

  • Warhammer 40,000: Space Marine 2 (New release on Steam, Sept. 9)
  • Dead Rising Deluxe Remaster (New release on Steam, Sept. 18)
  • Witchfire (New release on Steam, Sept. 23)
  • Monopoly (New release on Ubisoft Connect, Sept. 26)
  • Dawn of Defiance (Steam)
  • Flintlock: The Siege of Dawn (Xbox, available on PC Game Pass)
  • Fort Solis (Epic Games Store)
  • King Arthur: Legion IX (Steam)
  • The Legend of Heroes: Trails Through Daybreak (Steam)
  • Squirrel With a Gun (Steam)
  • Tyranny – Gold Edition (Xbox, available on Microsoft Store)
  • XIII (Xbox, available on Microsoft Store)

Blacksmith Simulator didn’t make it in September as the game’s launch was moved to next year.

What are you planning to play this weekend? Let us know on X or in the comments below.


How AI and Accelerated Computing Drive Energy Efficiency

AI isn’t just about building smarter machines. It’s about building a greener world.

From optimizing energy use to reducing emissions, artificial intelligence and accelerated computing are helping industries tackle some of the world’s toughest environmental challenges.

As Joshua Parker, NVIDIA’s Senior Director of Corporate Sustainability, explains on the latest edition of NVIDIA’s AI Podcast, these technologies are powering a new era of energy efficiency.

Can AI Help Reduce Energy Consumption?

Yes. And it’s doing it in ways that might surprise you.

AI systems themselves use energy—sure—but the big story is how AI and accelerated computing are helping other systems save energy.

Take data centers, for instance.

They’re the backbone of AI, housing the powerful systems that crunch the data needed for AI to work.

Globally, data centers account for about 2% of total energy consumption, and AI-specific centers represent only a tiny fraction of that, Parker explains.

Despite this, AI’s real superpower lies in its ability to optimize.

How? By using accelerated computing platforms that combine GPUs and CPUs.

GPUs (Graphics Processing Units) are designed to handle complex computations quickly and efficiently.

In fact, these systems can be up to 20 times more energy-efficient than traditional CPU-only systems, Parker notes.

That’s not just good for tech companies—it’s good for the environment, too.

What is Accelerated Computing?

At its core, accelerated computing is about doing more with less.

It involves using specialized hardware—like GPUs—to perform tasks faster and with less energy.

This isn’t just theoretical. Over the last eight years, AI systems running on accelerated computing platforms have become 45,000 times more energy-efficient, Parker said.

That’s a staggering leap in performance, driven by improvements in both hardware and software.

So why does this matter? It matters because, as AI becomes more widespread, the demand for computing power grows.

Accelerated computing helps companies scale their AI operations without consuming massive amounts of energy. This energy efficiency is key to AI’s ability to tackle some of today’s biggest sustainability challenges.

AI in Action: Tackling Climate Change

AI isn’t just saving energy—it’s helping to fight climate change.

For instance, AI-enhanced weather forecasting is becoming more accurate, allowing industries and governments to prepare for climate-related events like hurricanes or floods, Parker explains.

The better we can predict these events, the better we can prepare for them, which means fewer resources wasted and less damage done.

Another key area is the rise of digital twins—virtual models of physical environments.

These AI-powered simulations allow companies to optimize energy consumption in real-time, without having to make costly changes in the physical world.

In one case, using a digital twin helped a company achieve a 10% reduction in energy use, Parker said. That may sound small, but scale it across industries and the impact is huge.

AI is also playing a crucial role in developing new materials for renewable energy technologies like solar panels and electric vehicles, accelerating the transition to clean energy.

Can AI Make Data Centers More Sustainable?

Here’s the thing: AI needs data centers to operate, and as AI grows, so does the demand for computing power. But data centers don’t have to be energy hogs.

In fact, they can be part of the sustainability solution.

One major innovation is direct-to-chip liquid cooling. This technology allows data centers to cool their systems much more efficiently than traditional air conditioning methods, which are often energy-intensive.

By cooling directly at the chip level, this method saves energy, helping data centers stay cool without guzzling power, Parker explains.

As AI scales up, the future of data centers will depend on designing for energy efficiency from the ground up. That means integrating renewable energy, using energy storage solutions, and continuing to innovate with cooling technologies.

The goal is to create green data centers that can meet the world’s growing demand for compute power without increasing their carbon footprint, Parker says.

The Role of AI in Building a Sustainable Future

AI is not just a tool for optimizing systems—it’s a driver of sustainable innovation. From improving the efficiency of energy grids to enhancing supply chain logistics, AI is leading the charge in reducing waste and emissions.

Let’s look at energy grids. AI can monitor and adjust energy distribution in real-time, ensuring that resources are allocated where they’re needed most, reducing waste.

This is particularly important as the world moves toward renewable energy, which can be less predictable than traditional sources like coal or natural gas, Parker said.

AI is also helping industries reduce their carbon footprints. By optimizing routes and predicting demand more accurately, AI can cut down on fuel use and emissions in logistics and transportation sectors.

Looking to the future, AI’s role in promoting sustainability is only going to grow.

As technologies become more energy-efficient and AI applications expand, we can expect AI to play a crucial role in helping industries meet their sustainability goals, Parker said.

It’s not just about making AI greener—it’s about using AI to make the world greener.

AI and accelerated computing are reshaping how we think about energy and sustainability.

With their ability to optimize processes, reduce energy waste, and drive innovations in clean technology, these technologies are essential tools for creating a sustainable future.

As Parker explains on NVIDIA’s AI Podcast, AI’s potential to save energy and combat climate change is vast—and we’re only just beginning to tap into it.

As AI continues to revolutionize industries and drive sustainability, there’s no better time to dive deeper into its transformative potential. If you’re eager to explore how AI and accelerated computing are shaping the future of energy efficiency and climate solutions, join us at the NVIDIA AI Summit.

📅Event Date: October 9, 2024
🔗 Register here and gain exclusive insights into the innovations that are powering a sustainable world.

Don’t miss your chance to learn from the leading minds in AI and sustainability. Let’s create a greener future together.


How Schneider Electric uses Amazon Bedrock to identify high-potential business opportunities

This post was co-written with Anthony Medeiros, Manager of Solutions Engineering and Architecture for North America Artificial Intelligence, and Adrian Boeh, Senior Data Scientist – NAM AI, from Schneider Electric.

Schneider Electric is a global leader in the digital transformation of energy management and automation. The company specializes in providing integrated solutions that make energy safe, reliable, efficient, and sustainable. Schneider Electric serves a wide range of industries, including smart manufacturing, resilient infrastructure, future-proof data centers, intelligent buildings, and intuitive homes. They offer products and services that encompass electrical distribution, industrial automation, and energy management. Their innovative technologies, extensive range of products, and commitment to sustainability position Schneider Electric as a key player in advancing smart and green solutions for the modern world.

As demand for renewable energy continues to rise, Schneider Electric faces high demand for sustainable microgrid infrastructure. This demand comes in the form of requests for proposals (RFPs), each of which needs to be manually reviewed by a microgrid subject matter expert (SME) at Schneider. Manual review of each RFP was proving too costly and couldn’t be scaled to meet the industry needs. To solve the problem, Schneider turned to Amazon Bedrock and generative artificial intelligence (AI). Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

In this post, we show how the team at Schneider collaborated with the AWS Generative AI Innovation Center (GenAIIC) to build a generative AI solution on Amazon Bedrock to solve this problem. The solution processes and evaluates each RFP and then routes high-value RFPs to the microgrid SME for approval and recommendation.

Problem Statement

Microgrid infrastructure is a critical element of the growing renewable energy market. A microgrid includes on-site power generation and storage that allow a system to disconnect from the main grid. Schneider Electric offers several important products that allow customers to build microgrid solutions to make their residential buildings, schools, or manufacturing centers more sustainable. Growing public and private investment in this sector has led to an exponential increase in the number of RFPs for microgrid systems.

The RFP documents contain technically complex textual and visual information such as scope of work, parts lists, and electrical diagrams. Moreover, they can be hundreds of pages long. The following figure provides several examples of RFP documents. The size and complexity of the RFPs make reviewing them costly and labor intensive. An experienced SME is usually required to review an entire RFP and provide an assessment of its applicability to the business and potential for conversion.

Sample request for proposal (RFP) input data

To add additional complexity, the same set of RFP documents might be assessed by multiple business units within Schneider. Each unit might be looking for different requirements that make the opportunity relevant to that sales team.

Given the size and complexity of the RFP documents, the Schneider team needed a way to quickly and accurately identify opportunities where Schneider products offer a competitive advantage and a high potential for conversion. Failure to respond to viable opportunities could result in potential revenue loss, while devoting resources to proposals where the company lacks a distinct competitive edge would lead to an inefficient use of time and effort.

They also needed a solution that could be repurposed for other business units, allowing the impact to extend to the entire enterprise. Successfully handling the influx of RFPs would not only allow the Schneider team to expand their microgrid business, but also help businesses and industries adopt a new renewable energy paradigm.

Amazon Bedrock and Generative AI

To help solve this problem, the Schneider team turned to generative AI and Amazon Bedrock. Large language models (LLMs) are now enabling more efficient business processes through their ability to identify and summarize specific categories of information with human-like precision. The volume and complexity of the RFP documents made them an ideal candidate to use generative AI for document processing.

You can use Amazon Bedrock to build and scale generative AI applications with a broad range of FMs. Amazon Bedrock is a fully managed service that includes FMs from Amazon and third-party models supporting a range of use cases. For more details about the FMs available, see Supported foundation models on Amazon Bedrock. Amazon Bedrock enables developers to create unique experiences with generative AI capabilities supporting a broad range of programming languages and frameworks.

The solution uses Anthropic Claude on Amazon Bedrock, specifically the Anthropic Claude 3 Sonnet model. For the vast majority of workloads, Sonnet is two times faster than Claude 2 and Claude 2.1, with higher levels of intelligence.

Solution Overview

Traditional Retrieval Augmented Generation (RAG) systems can’t identify the relevancy of RFP documents to a given sales team because of the extensive list of one-off business requirements and the large taxonomy of electrical components or services, which might or might not be present in the documents.

Other existing approaches require either expensive domain-specific fine-tuning of the LLM or filtering for noise and data elements, which leads to suboptimal performance and limits scalability.

Instead, the AWS GenAIC team worked with Schneider Electric to package business objectives onto the LLM through multiple prisms of semantic transformations: concepts, functions, and components. For example, in the domain of smart grids, the underlying business objectives might be defined as resiliency, isolation, and sustainability. Accordingly, the corresponding functions would involve energy generation, consumption, and storage. The following figure illustrates these components.

Microgrid semantic components

The approach of concept-driven information extraction resembles ontology-based prompting. It allows engineering teams to customize the initial list of concepts and scale onto different domains of interest. The decomposition of complex concepts into specific functions incentivizes the LLM to detect, interpret, and extract the associated data elements.

The LLM was prompted to read RFPs and retrieve quotes pertinent to the defined concepts and functions. These quotes materialize the presence of electrical equipment satisfying the high-level objectives and were used as weights of evidence indicating the downstream relevancy of an RFP to the original sales team.

For example, in the following code, the term BESS stands for battery energy storage system and materializes evidence for power storage.

{
    "quote": "2.3W / 2MWh Saft IHE LFP (1500V) BESS (1X)",
    "function": "Power Storage",
    "relevance": 10,
    "summary": "Specifies a lithium iron phosphate battery energy storage system."
}

In the following example, the term EPC indicates the presence of a solar plant.

{
    "quote": "EPC 2.0MW (2X)",
    "function": "Power Generation",
    "relevance": 9,
    "summary": "Specifies 2 x 2MW solar photovoltaic inverters."
}

The overall solution encompasses three phases:

  • Document chunking and preprocessing
  • LLM-based quote retrieval
  • LLM-based quote summarization and evaluation

The first step uses standard document chunking as well as Schneider’s proprietary document processing pipelines to group similar text elements into a single chunk. Each chunk is processed by the quote retrieval LLM, which identifies relevant quotes within each chunk if they’re available. This brings relevant information to the forefront and filters out irrelevant content. Finally, the relevant quotes are compiled and fed to a final LLM that summarizes the RFP and determines its overall relevance to the microgrid family of RFPs. The following diagram illustrates this pipeline.

GenAI solution flow diagram
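
The following is a minimal Python sketch of this three-phase flow. The chunking logic, prompts, and helper names (split_into_chunks, retrieve_quotes, summarize_and_classify) are hypothetical placeholders rather than Schneider’s proprietary pipeline; they only illustrate how chunk-level quote retrieval feeds a final summarization and classification call.

import json

def split_into_chunks(document_text, max_chars=4000):
    # Placeholder for standard chunking plus the proprietary grouping of similar text elements
    return [document_text[i:i + max_chars] for i in range(0, len(document_text), max_chars)]

def retrieve_quotes(chunk, llm_call):
    # Ask the LLM to return quotes as JSON objects like the BESS/EPC examples above;
    # llm_call is any function that sends a prompt to the model and returns text
    prompt = (
        "Extract quotes related to power generation, storage, and consumption "
        "from the following RFP excerpt. Return a JSON list of objects with "
        "quote, function, relevance, and summary fields.\n\n" + chunk
    )
    try:
        return json.loads(llm_call(prompt))
    except json.JSONDecodeError:
        return []  # skip chunks where the model finds no parsable quotes

def summarize_and_classify(quotes, llm_call):
    # Final LLM pass: compile the relevant quotes and ask for the overall determination
    prompt = (
        "Given the following quotes extracted from an RFP, decide whether the RFP "
        "is relevant to the microgrid business unit.\n\n" + json.dumps(quotes, indent=2)
    )
    return llm_call(prompt)

def process_rfp(document_text, llm_call):
    quotes = []
    for chunk in split_into_chunks(document_text):
        quotes.extend(retrieve_quotes(chunk, llm_call))
    return summarize_and_classify(quotes, llm_call)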

The final determination about the RFP is made using the following prompt structure. The details of the actual prompt are proprietary, but the structure includes the following:

  • We first provide the LLM with a brief description of the business unit in question.
  • We then define a persona and tell the LLM where to locate evidence.
  • We provide criteria for RFP categorization.
  • We specify the output format, which includes:
    • A single yes, no, or maybe designation
    • A relevance score from 1–10
    • A brief explanation of the reasoning
prompt = """ 
[1] <DESCRIPTION OF THE BUSINESS UNIT> 
[2] You're an expert in <BUSINESS UNIT> and have to evaluate if a given RFP is related to <BUSINESS UNIT>… 

The quotes are provided below… 

<QUOTES> 

[3] Determine the relevancy to <BUSINESS UNIT> using … criteria: 

<CRITERIA> 

[4] <RESPONSE_FORMAT> 
[4a] A designation of Yes, No, or Maybe. 
[4b] A relevance score. 
[4c] A brief summary of justification and explanation. 
"""

The result compresses a relatively large corpus of RFP documents into a focused, concise, and informative representation by precisely capturing and returning the most important aspects. The structure allows the SME to quickly filter for specific LLM labels, and the summary quotes allow them to better understand which quotes are driving the LLM’s decision-making process. In this way, the Schneider SME team can spend less time reading through pages of RFP proposals and can instead focus their attention on the content that matters most to their business. The following example shows both a classification result and qualitative feedback for a sample RFP.

GenAI solution output

Internal teams are already experiencing the advantages of our new AI-driven RFP Assistant:

“At Schneider Electric, we are committed to solving real-world problems by creating a sustainable, digitized, and new electric future. We leverage AI and LLMs to further enhance and accelerate our own digital transformation, unlocking efficiency and sustainability in the energy sector.”

– Anthony Medeiros, Manager of Solutions Engineering and Architecture, Schneider Electric.

Conclusion

In this post, the AWS GenAIIC team, working with Schneider Electric, demonstrated the remarkable general capability of LLMs available on Amazon Bedrock to assist sales teams and optimize their workloads.

The RFP assistant solution allowed Schneider Electric to achieve 94% accuracy in the task of identifying microgrid opportunities. By making small adjustments to the prompts, the solution can be scaled and adapted to other lines of business.

By precisely guiding the prompts, the team can derive distinct and objective perspectives from identical sets of documents. The proposed solution enables RFPs to be viewed through the interchangeable lenses of various business units, each pursuing a diverse range of objectives. These previously obscured insights have the potential to unveil novel business prospects and generate supplementary revenue streams.

These capabilities will allow Schneider Electric to seamlessly integrate AI-powered insights and recommendations into its day-to-day operations. This integration will facilitate well-informed and data-driven decision-making processes, streamline operational workflows for heightened efficiency, and elevate the quality of customer interactions, ultimately delivering superior experiences.


About the Authors

Anthony Medeiros is a Manager of Solutions Engineering and Architecture at Schneider Electric. He specializes in delivering high-value AI/ML initiatives to many business functions within North America. With 17 years of experience at Schneider Electric, he brings a wealth of industry knowledge and technical expertise to the team.

Adrian Boeh is a Senior Data Scientist working on advanced data tasks for Schneider Electric’s North American Customer Transformation Organization. Adrian has 13 years of experience at Schneider Electric and is AWS Machine Learning Certified with a proven ability to innovate and improve organizations using data science methods and technology.

Kosta Belz is a Senior Applied Scientist in the AWS Generative AI Innovation Center, where he helps customers design and build generative AI solutions to solve key business problems.

Dan Volk is a Data Scientist at the AWS Generative AI Innovation Center. He has 10 years of experience in machine learning, deep learning, and time series analysis, and holds a Master’s in Data Science from UC Berkeley. He is passionate about transforming complex business challenges into opportunities by leveraging cutting-edge AI technologies.

Negin Sokhandan is a Senior Applied Scientist in the AWS Generative AI Innovation Center, where she works on building generative AI solutions for AWS strategic customers. Her research background is statistical inference, computer vision, and multimodal systems.

Read More

Achieve operational excellence with well-architected generative AI solutions using Amazon Bedrock

Achieve operational excellence with well-architected generative AI solutions using Amazon Bedrock

Large enterprises are building strategies to harness the power of generative AI across their organizations. However, scaling up generative AI and making adoption easier for different lines of businesses (LOBs) comes with challenges around making sure data privacy and security, legal, compliance, and operational complexities are governed on an organizational level. In this post, we discuss how to address these challenges holistically.

Managing bias, intellectual property, prompt safety, and data integrity are critical considerations when deploying generative AI solutions at scale. Because this is an emerging area, best practices, practical guidance, and design patterns are difficult to find in an easily consumable form. In this post, we share AWS guidance that we have learned and developed as part of real-world projects and distilled into practical guides oriented towards the AWS Well-Architected Framework, which is used to build production infrastructure and applications on AWS. We focus on the operational excellence pillar in this post.

Amazon Bedrock plays a pivotal role in this endeavor. It’s a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like Anthropic, Cohere, Meta, Mistral AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. You can securely integrate and deploy generative AI capabilities into your applications using services such as AWS Lambda, enabling seamless data management, monitoring, and compliance (for more details, see Monitoring and observability). This integration makes sure enterprises can take advantage of the full power of generative AI while adhering to best practices in operational excellence.

With Amazon Bedrock, enterprises can achieve the following:

  • Scalability – Scale generative AI applications across different LOBs without compromising performance
  • Security and compliance – Enforce data privacy, security, and compliance with industry standards and regulations
  • Operational efficiency – Streamline operations with built-in tools for monitoring, logging, and automation, aligned with the AWS Well-Architected Framework
  • Innovation – Access cutting-edge AI models and continually improve them with real-time data and feedback

This approach enables enterprises to deploy generative AI at scale while maintaining operational excellence, ultimately driving innovation and efficiency across their organizations.

What’s different about operating generative AI workloads and solutions?

The operational excellence pillar of the Well-Architected Framework is mainly focused on supporting the development and running of workloads effectively, gaining insight into their operations, and continuously improving supporting processes and procedures to deliver business value. However, if we were to apply a generative AI lens, we would need to address the intricate challenges and opportunities arising from its innovative nature, encompassing the following aspects:

  • Complexity can be unpredictable due to the ability of large language models (LLMs) to generate new content
  • Potential intellectual property infringement is a concern due to the lack of transparency in the model training data
  • Low accuracy in generative AI can create incorrect or controversial content
  • Resource utilization requires a specific operating model to meet the substantial computational demands of training, as well as prompt and token sizing
  • Continuous learning necessitates additional data annotation and curation strategies
  • Compliance is also a rapidly evolving area, where data governance becomes more nuanced and complex, and poses challenges
  • Integration with legacy systems requires careful consideration of compatibility, data flow between systems, and potential performance impacts

Any generative AI lens therefore needs to combine the following elements, each with varying levels of prescription and enforcement, to address these challenges and provide the basis for responsible AI usage:

  • Policy – The system of principles to guide decisions
  • Guardrails – The rules that create boundaries to keep you within the policy
  • Mechanisms – The process and tools

AWS has advanced responsible AI by introducing Amazon Bedrock Guardrails to prevent harmful responses from LLMs, providing an additional layer of safeguards regardless of the underlying FM. However, a more holistic organizational approach is crucial because generative AI practitioners, data scientists, or developers can potentially use a wide range of technologies, models, and datasets to circumvent the established controls.

As cloud adoption has matured for more traditional IT workloads and applications, the need to help developers select the right cloud solution that minimizes corporate risk and simplifies the developer experience has emerged. This is often referred to as platform engineering and can be neatly summarized by the mantra “You (the developer) build and test, and we (the platform engineering team) do all the rest!”

This approach, when applied to generative AI solutions, means that a specific AI or machine learning (ML) platform configuration can be used to holistically address the operational excellence challenges across the enterprise, allowing the developers of the generative AI solution to focus on business value. This is illustrated in the following diagram.

GenAI cloud center of excellence

Where to start?

We start this post by reviewing the foundational operational elements a generative AI platform team needs to focus on initially as they transition generative AI solutions from a proof of concept or prototype phase to a production-ready solution.

Specifically, we cover how you can safely develop, deploy, and monitor models, mitigating operational and compliance risks, thereby reducing the friction in adopting AI at scale and for production use. We focus on the following four design principles:

  • Establish control through promoting transparency of model details, setting up guardrails or safeguards, and providing visibility into costs, metrics, logs, and traces
  • Automate model fine-tuning, training, validation, and deployment using large language model operations (LLMOps) or foundation model operations (FMOps)
  • Manage data through standard methods for ingestion, governance, and indexing
  • Provide managed infrastructure patterns and blueprints for models, prompt catalogs, APIs, and access control guidelines

In the following sections, we explain this using an architecture diagram while diving into the best practices of the control pillar.

Provide control through transparency of models, guardrails, and costs using metrics, logs, and traces

The control pillar of the generative AI framework focuses on observability, cost management, and governance, making sure enterprises can deploy and operate their generative AI solutions securely and efficiently. The following diagram illustrates the key components of this pillar:

Control Pillar of Generative AI Well architected solutions

Observability

Setting up observability measures lays the foundations for the other two components, namely FinOps and Governance. Observability is crucial for monitoring the performance, reliability, and cost-efficiency of generative AI solutions. By using AWS services such as Amazon CloudWatch, AWS CloudTrail, and Amazon OpenSearch Service, enterprises can gain visibility into model metrics, usage patterns, and potential issues, enabling proactive management and optimization.

Amazon Bedrock integrates with robust observability features to monitor and manage ML models and applications. Key metrics integrated with CloudWatch include invocation counts, latency, client and server errors, throttles, input and output token counts, and more (for more details, see Monitor Amazon Bedrock with Amazon CloudWatch). You can also use Amazon EventBridge to monitor events related to Amazon Bedrock. This allows you to create rules that invoke specific actions when certain events occur, enhancing the automation and responsiveness of your observability setup (for more details, see Monitor Amazon Bedrock). CloudTrail can log all API calls made to Amazon Bedrock by a user, role, or AWS service in an AWS environment. This is particularly useful for tracking access to sensitive resources such as personally identifiable information (PII), model updates, and other critical activities, enabling enterprises to maintain a robust audit trail and compliance. To learn more, see Log Amazon Bedrock API calls using AWS CloudTrail.

Amazon Bedrock supports the metrics and telemetry needed for implementing an observability maturity model for LLMs, which includes the following:

  • Capturing and analyzing LLM-specific metrics such as model performance, prompt properties, and cost metrics through CloudWatch
  • Implementing alerts and incident management tailored to LLM-related issues
  • Providing security compliance and robust monitoring mechanisms, because Amazon Bedrock is in scope for common compliance standards and offers automated abuse detection mechanisms
  • Using CloudWatch and CloudTrail for anomaly detection, usage and cost forecasting, performance optimization, and resource utilization
  • Using AWS forecasting services for better resource planning and cost management

CloudWatch provides a unified monitoring and observability service that collects logs, metrics, and events from various AWS services and on-premises sources. This allows enterprises to track key performance indicators (KPIs) for their generative AI models, such as I/O volumes, latency, and error rates. You can use CloudWatch dashboards to create custom visualizations and alerts, so teams are quickly notified of any anomalies or performance degradation.
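
As an illustrative sketch (assuming the AWS/Bedrock CloudWatch namespace, the ModelId dimension, and the metric names described above), the following Python snippet pulls hourly invocation counts and token volumes for a single model so they can be charted on a dashboard; the model ID is an assumption.

import datetime
import boto3

cloudwatch = boto3.client("cloudwatch")
model_id = "anthropic.claude-3-sonnet-20240229-v1:0"  # assumed model ID for illustration
now = datetime.datetime.utcnow()

for metric in ["Invocations", "InputTokenCount", "OutputTokenCount"]:
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/Bedrock",
        MetricName=metric,
        Dimensions=[{"Name": "ModelId", "Value": model_id}],
        StartTime=now - datetime.timedelta(days=1),
        EndTime=now,
        Period=3600,          # hourly datapoints
        Statistics=["Sum"],
    )
    total = sum(point["Sum"] for point in stats["Datapoints"])
    print(f"{metric} over the last 24 hours: {total}")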

For more advanced observability requirements, enterprises can use OpenSearch Service, a fully managed service for deploying, operating, and scaling OpenSearch and Kibana. OpenSearch Dashboards provides powerful search and analytical capabilities, allowing teams to dive deeper into generative AI model behavior, user interactions, and system-wide metrics.

Additionally, you can enable model invocation logging to collect invocation logs, full request response data, and metadata for all Amazon Bedrock model API invocations in your AWS account. Before you can enable invocation logging, you need to set up an Amazon Simple Storage Service (Amazon S3) or CloudWatch Logs destination. You can enable invocation logging through either the AWS Management Console or the API. By default, logging is disabled.
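
A minimal sketch of enabling invocation logging programmatically might look like the following, assuming an existing CloudWatch log group, an IAM role that Amazon Bedrock can assume to write logs, and an S3 bucket; the resource names and delivery flags are placeholders to adjust for your account.

import boto3

bedrock = boto3.client("bedrock")

# Assumed resource names for illustration only
bedrock.put_model_invocation_logging_configuration(
    loggingConfig={
        "cloudWatchConfig": {
            "logGroupName": "/genai/bedrock/invocations",
            "roleArn": "arn:aws:iam::123456789012:role/BedrockLoggingRole",
        },
        "s3Config": {
            "bucketName": "my-bedrock-invocation-logs",
            "keyPrefix": "invocations/",
        },
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": False,
        "embeddingDataDeliveryEnabled": False,
    }
)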

Cost management and optimization (FinOps)

Generative AI solutions can quickly scale and consume significant cloud resources, and a robust FinOps practice is essential. With services like AWS Cost Explorer and AWS Budgets, enterprises can track their usage and optimize their generative AI spending, achieving cost-effective deployment and scaling.

Cost Explorer provides detailed cost analysis and forecasting capabilities, enabling you to understand your tenant-related expenditures, identify cost drivers, and plan for future growth. Teams can create custom cost allocation reports, set custom budgets and alerts using AWS Budgets, and explore cost trends over time.
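
For example, a monthly cost budget with an alert at 80% of the limit could be created as in the following sketch; the budget name, limit, and notification address are illustrative assumptions.

import boto3

budgets = boto3.client("budgets")
account_id = boto3.client("sts").get_caller_identity()["Account"]

budgets.create_budget(
    AccountId=account_id,
    Budget={
        "BudgetName": "genai-monthly-spend",              # assumed budget name
        "BudgetLimit": {"Amount": "5000", "Unit": "USD"},  # assumed monthly limit
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "finops-team@example.com"}  # assumed address
            ],
        }
    ],
)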

Analyzing the cost and performance of generative AI models is crucial for making informed decisions about model deployment and optimization. EventBridge, CloudTrail, and CloudWatch provide the necessary tools to track and analyze these metrics, helping enterprises make data-driven decisions. With this information, you can identify optimization opportunities, such as scaling down under-utilized resources.

With EventBridge, you can configure Amazon Bedrock to respond automatically to status change events in Amazon Bedrock. This enables you to handle API rate limit issues, API updates, and reduction in additional compute resources. For more details, see Monitor Amazon Bedrock events in Amazon EventBridge.

As discussed in the previous section, CloudWatch can monitor Amazon Bedrock to collect raw data and process it into readable, near real-time cost metrics. You can graph the metrics using the CloudWatch console. You can also set alarms that watch for certain thresholds and send notifications or take actions when values exceed those thresholds. For more information, see Monitor Amazon Bedrock with Amazon CloudWatch.
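
For instance, the following sketch sets an alarm when hourly output token volume for a model exceeds a threshold and publishes to an SNS topic; the namespace and metric name follow the Bedrock CloudWatch integration described earlier, while the model ID, threshold, and topic ARN are assumptions.

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="bedrock-hourly-output-tokens",
    Namespace="AWS/Bedrock",
    MetricName="OutputTokenCount",
    Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-3-sonnet-20240229-v1:0"}],
    Statistic="Sum",
    Period=3600,                      # one-hour aggregation window
    EvaluationPeriods=1,
    Threshold=1_000_000,              # assumed hourly token budget
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:genai-cost-alerts"],  # assumed SNS topic
)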

Governance

Implementation of robust governance measures, including continuous evaluation and multi-layered guardrails, is fundamental for the responsible and effective deployment of generative AI solutions in enterprise environments. Let’s look at them one by one:

  • Performance monitoring and evaluation – Continuously evaluating the performance, safety, and compliance of generative AI models is critical. You can achieve this in several ways:
    • Enterprises can use AWS services like Amazon SageMaker Model Monitor, Amazon Bedrock Guardrails, or Amazon Comprehend to monitor model behavior, detect drift, and make sure generative AI solutions are performing as expected (or better) and adhering to organizational policies.
    • You can deploy open-source evaluation metrics like RAGAS as custom metrics to make sure LLM responses are grounded, mitigate bias, and prevent hallucinations.
    • Model evaluation jobs allow you to compare model outputs and choose the best-suited model for your use case. The job could be automated based on a ground truth, or you could use humans to bring in expertise on the matter. You can also use FMs from Amazon Bedrock to evaluate your applications. To learn more about this approach, refer to Evaluate the reliability of Retrieval Augmented Generation applications using Amazon Bedrock.
  • Guardrails – Generative AI solutions should include robust, multi-level guardrails to enforce responsible AI and oversight:
    • First, you need guardrails around the LLM itself to mitigate risks around bias and safeguard the application with responsible AI policies. This can be done through Amazon Bedrock Guardrails to set up custom guardrails around a model (FM or fine-tuned) for configuring denied topics, content filters, and blocked messaging (a minimal configuration sketch follows this list).
    • The second level is to set guardrails around the framework for each use case. This includes implementing access controls, data governance policies, and proactive monitoring and alerting to make sure sensitive information is properly secured and monitored. For example, you can use AWS data analytics services such as Amazon Redshift for data warehousing, AWS Glue for data integration, and Amazon QuickSight for business intelligence (BI).
  • Compliance measures – Enterprises need to set up a robust compliance framework to meet regulatory requirements and industry standards such as GDPR, CCPA, or industry-specific standards. This helps make sure generative AI solutions remain secure, compliant, and efficient in handling sensitive information across different use cases. This approach minimizes the risk of data breaches or unauthorized data access, thereby protecting the integrity and confidentiality of critical data assets. Enterprises can take the following organization-level actions to create a comprehensive governance structure:
    • Establish a clear incident response plan for addressing compliance breaches or AI system malfunctions.
    • Conduct periodic compliance assessments and third-party audits to identify and address potential risks or violations.
    • Provide ongoing training to employees on compliance requirements and best practices in AI governance.
  • Model transparency – Although achieving full transparency in generative AI models remains challenging, organizations can take several steps to enhance model transparency and explainability:
    • Provide model cards on the model’s intended use, performance, capabilities, and potential biases.
    • Ask the model to self-explain, meaning provide explanations for its own decisions. This can also be set up in a complex system—for example, agents could perform multi-step planning and improve through self-explanation.
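
A minimal sketch of configuring such a guardrail through the Amazon Bedrock CreateGuardrail API is shown below; the denied topic, filter strengths, and blocked messaging are illustrative assumptions to be replaced with your organization’s responsible AI policies.

import boto3

bedrock = boto3.client("bedrock")

# Assumed example policy values for illustration only
response = bedrock.create_guardrail(
    name="enterprise-genai-guardrail",
    description="Baseline responsible AI guardrail for generative AI workloads",
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "Financial advice",
                "definition": "Providing personalized investment or financial recommendations.",
                "type": "DENY",
            }
        ]
    },
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "VIOLENCE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
        ]
    },
    blockedInputMessaging="Sorry, this request falls outside our usage policy.",
    blockedOutputsMessaging="Sorry, the generated response was blocked by our usage policy.",
)
print(response["guardrailId"], response["version"])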

Automate model lifecycle management with LLMOps or FMOps

Implementing LLMOps is crucial for efficiently managing the lifecycle of generative AI models at scale. To grasp the concept of LLMOps, a subset of FMOps, and the key differentiators compared to MLOps, see FMOps/LLMOps: Operationalize generative AI and differences with MLOps. In that post, you can learn more about the developmental lifecycle of a generative AI application and the additional skills, processes, and technologies needed to operationalize generative AI applications.

Manage data through standard methods of data ingestion and use

Enriching LLMs with new data is imperative for them to provide more contextual answers without the need for extensive fine-tuning or the overhead of building a specific corporate LLM. Managing data ingestion, extraction, transformation, cataloging, and governance is a complex, time-consuming process that needs to align with corporate data policies and governance frameworks.

AWS provides several services to support this; the following diagram illustrates these at a high level. For a more detailed description, see Scaling AI and Machine Learning Workloads with Ray on AWS and Build a RAG data ingestion pipeline for large scale ML workloads.

This workflow includes the following steps:

  1. Data can be securely transferred to AWS using either custom or existing tools or the AWS Transfer family. You can use AWS Identity and Access Management (IAM) and AWS PrivateLink to control and secure access to data and generative AI resources, making sure data remains within the organization’s boundaries and complies with the relevant regulations.
  2. When the data is in Amazon S3, you can use AWS Glue to extract and transform data (for example, into Parquet format) and store metadata about the ingested data, facilitating data governance and cataloging.
  3. The third component is the GPU cluster, which could potentially be a Ray cluster. You can employ various orchestration engines, such as AWS Step Functions, Amazon SageMaker Pipelines, or AWS Batch, to run the jobs (or create pipelines) to create embeddings and ingest the data into a data store or vector store.
  4. Embeddings can be stored in a vector store such as OpenSearch, enabling efficient retrieval and querying (see the sketch after this list). Alternatively, you can use a solution such as Amazon Bedrock Knowledge Bases to ingest data from Amazon S3 or other data sources, enabling seamless integration with generative AI solutions.
  5. You can use Amazon DataZone to manage access control to the raw data stored in Amazon S3 and the vector store, enforcing role-based or fine-grained access control for data governance.
  6. For cases where you need a semantic understanding of your data, you can use Amazon Kendra for intelligent enterprise search. Amazon Kendra has inbuilt ML capabilities and is easy to integrate with various data sources like S3, making it adaptable for different organizational needs.
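
The following is a minimal sketch of the embedding and indexing step (step 4), assuming the Amazon Titan text embeddings model on Amazon Bedrock and an OpenSearch index with a k-NN vector field; the domain endpoint, index name, and model ID are illustrative assumptions, and authentication is omitted for brevity.

import json
import boto3
from opensearchpy import OpenSearch

bedrock_runtime = boto3.client("bedrock-runtime")
opensearch = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],  # assumed endpoint
    use_ssl=True,
)

def embed(text):
    # Assumed Titan embeddings model ID; returns a dense vector for the input text
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]

def index_chunk(doc_id, text):
    # Store the chunk and its embedding in a k-NN enabled index for later retrieval
    opensearch.index(
        index="rag-chunks",
        id=doc_id,
        body={"text": text, "embedding": embed(text)},
    )

index_chunk("doc-001-chunk-0", "Example chunk of transformed document text.")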

The choice of which components to use will depend on the specific requirements of the solution, but a consistent solution should exist for all data management to be codified into blueprints (discussed in the following section).

Provide managed infrastructure patterns and blueprints for models, prompt catalogs, APIs, and access control guidelines

There are a number of ways to build and deploy a generative AI solution. AWS offers key services such as Amazon Bedrock, Amazon Kendra, OpenSearch Service, and more, which can be configured to support multiple generative AI use cases, such as text summarization, Retrieval Augmented Generation (RAG), and others.

The simplest way is to allow each team who needs to use generative AI to build their own custom solution on AWS, but this will inevitably increase costs and cause organization-wide irregularities. A more scalable option is to have a centralized team build standard generative AI solutions codified into blueprints or constructs and allow teams to deploy and use them. This team can provide a platform that abstracts away these constructs with a user-friendly and integrated API and provide additional services such as LLMOps, data management, FinOps, and more. The following diagram illustrates these options.

different approaches to scale out GenAI solutions

Establishing blueprints and constructs for generative AI runtimes, APIs, prompts, and orchestration frameworks such as LangChain, LiteLLM, and so on will simplify the adoption of generative AI and increase overall safe usage. Offering standard APIs with access controls and consistent AI, data, and cost management makes usage straightforward, cost-efficient, and secure.

For more information about how to enforce isolation of resources in a multi-tenant architecture and key patterns in isolation strategies while building solutions on AWS, refer to the whitepaper SaaS Tenant Isolation Strategies.

Conclusion

By focusing on the operational excellence pillar of the Well-Architected Framework from a generative AI lens, enterprises can scale their generative AI initiatives with confidence, building solutions that are secure, cost-effective, and compliant. Introducing a standardized skeleton framework for generative AI runtimes, prompts, and orchestration will empower your organization to seamlessly integrate generative AI capabilities into your existing workflows.

As a next step, you can establish proactive monitoring and alerting, helping your enterprise swiftly detect and mitigate potential issues, such as the generation of biased or harmful output.

Don’t wait—take this proactive stance towards adopting the best practices. Conduct regular audits of your generative AI systems to maintain ethical AI practices. Invest in training your team on the generative AI operational excellence techniques. By taking these actions now, you’ll be well positioned to harness the transformative potential of generative AI while navigating the complexities of this technology wisely.


About the Authors

Akarsha Sehwag is a Data Scientist and ML Engineer in AWS Professional Services with over 5 years of experience building ML-based services and products. Leveraging her expertise in computer vision and deep learning, she empowers customers to harness the power of ML in the AWS Cloud efficiently. With the advent of generative AI, she has worked with numerous customers to identify good use cases and build them into production-ready solutions. Her diverse interests span development, entrepreneurship, and research.

Malcolm Orr is a Principal Engineer at AWS and has a long history of building platforms and distributed systems using AWS services. He brings a structured, systems view to generative AI and is helping define how customers can adopt generative AI safely, securely, and cost-effectively across their organization.

Tanvi Singhal is a Data Scientist within AWS Professional Services. Her skills and areas of expertise include data science, machine learning, and big data. She supports customers in developing machine learning models and MLOps solutions within the cloud. Prior to joining AWS, she was a consultant in various industries such as transportation networking, retail, and financial services. She is passionate about enabling customers on their data/AI journey to the cloud.

Zorina Alliata is a Principal AI Strategist, working with global customers to find solutions that speed up operations and enhance processes using Artificial Intelligence and Machine Learning. Zorina helps companies across several industries identify strategies and tactical execution plans for their AI use cases, platforms, and AI at scale implementations.

Read More