July 2023 – Page 8

Google DeepMind’s latest research at ICML 2023

Exploring AI safety, adaptability, and efficiency for the real worldRead More

Google DeepMind’s latest research at ICML 2023

Exploring AI safety, adaptability, and efficiency for the real worldRead More

Google DeepMind’s latest research at ICML 2023

Exploring AI safety, adaptability, and efficiency for the real worldRead More

Google DeepMind’s latest research at ICML 2023

Exploring AI safety, adaptability, and efficiency for the real worldRead More

Google DeepMind’s latest research at ICML 2023

Exploring AI safety, adaptability, and efficiency for the real worldRead More

Google DeepMind’s latest research at ICML 2023

Exploring AI safety, adaptability, and efficiency for the real worldRead More

Apple Natural Language Understanding Workshop 2023

Apple Machine Learning Research

Google DeepMind’s latest research at ICML 2023

Google DeepMind researchers are presenting more than 80 new papers at the 40th International Conference on Machine Learning (ICML 2023), taking place 23-29 July in Honolulu, Hawai’i.Read More

Use a generative AI foundation model for summarization and question answering using your own data

Large language models (LLMs) can be used to analyze complex documents and provide summaries and answers to questions. The post Domain-adaptation Fine-tuning of Foundation Models in Amazon SageMaker JumpStart on Financial data describes how to fine-tune an LLM using your own dataset. Once you have a solid LLM, you’ll want to expose that LLM to business users to process new documents, which could be hundreds of pages long. In this post, we demonstrate how to construct a real-time user interface to let business users process a PDF document of arbitrary length. Once the file is processed, you can summarize the document or ask questions about the content. The sample solution described in this post is available on GitHub.

Working with financial documents

Financial statements like quarterly earnings reports and annual reports to shareholders are often tens or hundreds of pages long. These documents contain a lot of boilerplate language like disclaimers and legal language. If you want to extract the key data points from one of these documents, you need both time and some familiarity with the boilerplate language so you can identify the interesting facts. And of course, you can’t ask an LLM questions about a document it has never seen.

LLMs used for summarization have a limit on the number of tokens (characters) passed into the model, and with some exceptions, these are typically no more than a few thousand tokens. That normally precludes the ability to summarize longer documents.

Our solution handles documents that exceed an LLM’s maximum token sequence length, and make that document available to the LLM for question answering.

Solution overview

Our design has three important pieces:

It has an interactive web application for business users to upload and process PDFs
It uses the langchain library to split a large PDF into more manageable chunks
It uses the retrieval augmented generation technique to let users ask questions about new data that the LLM hasn’t seen before

As shown in the following diagram, we use a front end implemented with React JavaScript hosted in an Amazon Simple Storage Service (Amazon S3) bucket fronted by Amazon CloudFront. The front-end application lets users upload PDF documents to Amazon S3. After the upload is complete, you can trigger a text extraction job powered by Amazon Textract. As part of the post-processing, an AWS Lambda function inserts special markers into the text indicating page boundaries. When that job is done, you can invoke an API that summarizes the text or answers questions about it.

Because some of these steps may take some time, the architecture uses a decoupled asynchronous approach. For example, the call to summarize a document invokes a Lambda function that posts a message to an Amazon Simple Queue Service (Amazon SQS) queue. Another Lambda function picks up that message and starts an Amazon Elastic Container Service (Amazon ECS) AWS Fargate task. The Fargate task calls the Amazon SageMaker inference endpoint. We use a Fargate task here because summarizing a very long PDF may take more time and memory than a Lambda function has available. When the summarization is done, the front-end application can pick up the results from an Amazon DynamoDB table.

For summarization, we use AI21’s Summarize model, one of the foundation models available through Amazon SageMaker JumpStart. Although this model handles documents of up to 10,000 words (approximately 40 pages), we use langchain’s text splitter to make sure that each summarization call to the LLM is no more than 10,000 words long. For text generation, we use Cohere’s Medium model, and we use GPT-J for embeddings, both via JumpStart.

Summarization processing

When handling larger documents, we need to define how to split the document into smaller pieces. When we get the text extraction results back from Amazon Textract, we insert markers for larger chunks of text (a configurable number of pages), individual pages, and line breaks. Langchain will split based on those markers and assemble smaller documents that are under the token limit. See the following code:

text_splitter = RecursiveCharacterTextSplitter(
      separators = ["<CHUNK>", "<PAGE>", "n"],
         chunk_size = int(chunk_size),
         chunk_overlap  = int(chunk_overlap))

 with open(local_path) as f:
     doc = f.read()
 texts = text_splitter.split_text(doc)
 print(f"Number of splits: {len(texts)}")


 llm = SageMakerLLM(endpoint_name = endpoint_name)

 responses = []
 for t in texts:
     r = llm(t)
     responses.append(r)
 summary = "n".join(responses)

The LLM in the summarization chain is a thin wrapper around our SageMaker endpoint:

class SageMakerLLM(LLM):

endpoint_name: str
    
@property
def _llm_type(self) -> str:
    return "summarize"
    
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
    response = ai21.Summarize.execute(
                      source=prompt,
                      sourceType="TEXT",
                      sm_endpoint=self.endpoint_name
    )
    return response.summary

Question answering

In the retrieval augmented generation method, we first split the document into smaller segments. We create embeddings for each segment and store them in the open-source Chroma vector database via langchain’s interface. We save the database in an Amazon Elastic File System (Amazon EFS) file system for later use. See the following code:

documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 500,
                                                chunk_overlap  = 0)
texts = text_splitter.split_documents(documents)
print(f"Number of splits: {len(texts)}")

embeddings = SMEndpointEmbeddings(
    endpoint_name=endpoint_name,
)
vectordb = Chroma.from_documents(texts, embeddings, 
    persist_directory=persist_directory)
vectordb.persist()

When the embeddings are ready, the user can ask a question. We search the vector database for the text chunks that most closely match the question:

embeddings = SMEndpointEmbeddings(
    endpoint_name=endpoint_embed
)
vectordb = Chroma(persist_directory=persist_directory, 
embedding_function=embeddings)
docs = vectordb.similarity_search_with_score(question)

We take the closest matching chunk and use it as context for the text generation model to answer the question:

cohere_client = Client(endpoint_name=endpoint_qa)
context = docs[high_score_idx][0].page_content.replace("n", "")
qa_prompt = f'Context={context}nQuestion={question}nAnswer='
response = cohere_client.generate(prompt=qa_prompt, 
                                  max_tokens=512, 
                                  temperature=0.25, 
                                  return_likelihoods='GENERATION')
answer = response.generations[0].text.strip().replace('n', '')

User experience

Although LLMs represent advanced data science, most of the use cases for LLMs ultimately involve interaction with non-technical users. Our example web application handles an interactive use case where business users can upload and process a new PDF document.

The following diagram shows the user interface. A user starts by uploading a PDF. After the document is stored in Amazon S3, the user is able to start the text extraction job. When that’s complete, the user can invoke the summarization task or ask questions. The user interface exposes some advanced options like the chunk size and chunk overlap, which would be useful for advanced users who are testing the application on new documents.

Next steps

LLMs provide significant new information retrieval capabilities. Business users need convenient access to those capabilities. There are two directions for future work to consider:

Take advantage of the powerful LLMs already available in Jumpstart foundation models. With just a few lines of code, our sample application could deploy and make use of advanced LLMs from AI21 and Cohere for text summarization and generation.
Make these capabilities accessible to non-technical users. A prerequisite to processing PDF documents is extracting text from the document, and summarization jobs may take several minutes to run. That calls for a simple user interface with asynchronous backend processing capabilities, which is easy to design using cloud-native services like Lambda and Fargate.

We also note that a PDF document is semi-structured information. Important cues like section headings are difficult to identify programmatically, because they rely on font sizes and other visual indicators. Identifying the underlying structure of information helps the LLM process the data more accurately, at least until such time that LLMs can handle input of unbounded length.

Conclusion

In this post, we showed how to build an interactive web application that lets business users upload and process PDF documents for summarization and question answering. We saw how to take advantage of Jumpstart foundation models to access advanced LLMs, and use text splitting and retrieval augmented generation techniques to process longer documents and make them available as information to the LLM.

At this point in time, there is no reason not to make these powerful capabilities available to your users. We encourage you to start using the Jumpstart foundation models today.

About the author

Randy DeFauw is a Senior Principal Solutions Architect at AWS. He holds an MSEE from the University of Michigan, where he worked on computer vision for autonomous vehicles. He also holds an MBA from Colorado State University. Randy has held a variety of positions in the technology space, ranging from software engineering to product management. In entered the Big Data space in 2013 and continues to explore that area. He is actively working on projects in the ML space and has presented at numerous conferences including Strata and GlueCon.

Integrate Amazon SageMaker Model Cards with the model registry

Amazon SageMaker Model Cards enable you to standardize how models are documented, thereby achieving visibility into the lifecycle of a model, from designing, building, training, and evaluation. Model cards are intended to be a single source of truth for business and technical metadata about the model that can reliably be used for auditing and documentation purposes. They provide a factsheet of the model that is important for model governance.

Until now, model cards were logically associated to a model in the Amazon SageMaker Model Registry using model name match. However, when solving a business problem through a machine learning (ML) model, as customers iterate on the problem, they create multiple versions of the model and they need to operationalize and govern multiple model versions. Therefore, they need the ability to associate a model card to a particular model version.

In this post, we discuss a new feature that supports integrating model cards with the model registry at the deployed model version level. We discuss the solution architecture and best practices for managing model card versions, and walk through how to set up, operationalize, and govern the model card integration with the model version in the model registry.

Solution overview

SageMaker model cards help you standardize documenting your models from a governance perspective, and the SageMaker model registry helps you deploy and operationalize ML models. The model registry supports a hierarchical structure for organizing and storing ML models with model metadata information.

When an organization solves a business problem using ML, such as a customer churn prediction, we recommend the following steps:

Create a model card for the business problem to be solved.
Create a model package group for the business problem to be solved.
Build, train, evaluate, and register the first version of the model package version (for example, Customer Churn V1).
Update the model card linking the model package version to the model card.
As you iterate on new model package version, clone the model card from the previous version and link to the new model package version (for example, Customer Churn V2).

The following figure illustrates how a SageMaker model card integrates with the model registry.

As illustrated in the preceding diagram, the integration of SageMaker model cards and the model registry allows you to associate a model card with a specific model version in the model registry. This enables you to establish a single source of truth for your registered model versions, with comprehensive and standardized documentation across all stages of the model’s journey on SageMaker, facilitating discoverability and promoting governance, compliance, and accountability throughout the model lifecycle.

Best practices for managing model cards

Operating in machine learning with governance is a critical requirement for many enterprise organizations today, notably in highly regulated industries. As part of those requirements, AWS provides several services that enable reliable operation of the ML environment.

SageMaker model cards document critical details about your ML models in a single place for streamlined governance and reporting. Model cards help you capture details such as the intended use and risk rating of a model, training details and metrics, evaluation results and observations, and additional call-outs such as considerations, recommendations, and custom information.

Model cards need to be managed and updated as part of your development process, throughout the ML lifecycle. They are an important part of continuous delivery and pipelines in ML. In the same way that a Well-Architected ML project implements continuous integration and continuous delivery (CI/CD) under the umbrella of MLOps, a continuous ML documentation process is a critical capability in a lot of regulated industries or for higher risk use cases. Model cards are part of the best practices for responsible and transparent ML development.

The following diagram shows how model cards should be part of a development lifecycle.

Consider the following best practices:

We recommend creating model cards early in your project lifecycle. In the first phase of the project, when you are working on identifying the business goal and framing the ML problem, you should initiate the creation of the model card. As you work through the different steps of business requirements and important performance metrics, you can create the model card in a draft status and determine the business details and intended uses.
As part of your model development lifecycle phase, you should use the model registry to catalog models for production, manage model versions, and associate metadata with a model. The model registry enables lineage tracking.
After you have iterated successfully and are ready to deploy your model to production, it’s time to update the model card. In the deployment lifecycle phase, you can update the model details of the model card. You should also update training details, evaluation details, ethical considerations, and caveats and recommendations.

Model cards have versions associated with them. A given model version is immutable across all attributes other than the model card status. If you make any other changes to the model card, such as evaluation metrics, description, or intended uses, SageMaker creates a new version of the model card to reflect the updated information. This is to ensure that a model card, once created, can’t be tampered with. Additionally, each unique model name can have only one associated model card and it can’t be changed after you create the model card.

ML models are dynamic and workflow automation components enable you to easily scale your ability to build, train, test, and deploy hundreds of models in production, iterate faster, reduce errors due to manual orchestration, and build repeatable mechanisms.

Therefore, the lifecycle of your model cards will look as described in the following diagram. Every time you update your model card through the model lifecycle, you automatically create a new version of the model card. Every time you iterate on a new model version, you create a new model card that can inherit some model card information of the previous model versions and follow the same lifecycle.

Pre-requisites

This post assumes that you already have models in your model registry. If you want to follow along, you can use the following SageMaker example on GitHub to populate your model registry: SageMaker Pipelines integration with Model Monitor and Clarify.

Integrate a model card with the model version in the model registry

In this example, we have the model-monitor-clarify-group package in our model registry.

In this package, two model versions are available.

For this example, we link Version 1 of the model to a new model card. In the model registry, you can see the details for Version 1.

We can now use the new feature in the SageMaker Python SDK. From the sagemaker.model_card ModelPackage module, you can select a specific model version from the model registry that you would like to link the model card to.

You can now create a new model card for the model version and specify the model_package_details parameter with the previous model package retrieved. You need to populate the model card with all the additional details necessary. For this post, we create a simple model card as an example.

You can then use that definition to create a model card using the SageMaker Python SDK.

When loading the model card again, you can see the associated model under "__model_package_details".

You also have the option to update an existing model card with the model_package as shown in the example code snippet below:

my_card = ModelCard.load(("<model_card_name>")
mp_details = ModelPackage.from_model_package_arn("<arn>")
my_card.model_package_details = mp_details
my_card.update()

Finally, when creating or updating a new model package version in an existing model package, if a model card already exists in that model package group, some information such as the business details and intended uses can be carried over to the new model card.

Clean up

Users are responsible for cleaning up resources if created using the notebook mentioned in the pre-requisites section. Please follow the instructions in the notebook to clean up resources.

Conclusion

In this post, we discussed how to integrate a SageMaker model card with a model version in the model registry. We shared the solution architecture with best practices for implementing a model card and showed how to set up and operationalize a model card to improve your model governance posture. We encourage you to try out this solution and share your feedback in the comments section.

About the Authors

Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 20 years of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure and scalable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he rides his motorcycle and walks with his 2-year-old sheep-a-doodle!

Natacha Fort is the Government Data Science Lead for Public Sector Australia and New Zealand, Principal SA at AWS. She helps organizations navigate their machine learning journey, supporting them from framing the machine learning problem to deploying into production, all the while making sure the best architecture practices are in place to ensure their success. Natacha focuses with organizations on MLOps and responsible AI.

Vedere AI

Monthly Archives: July 2023

Google DeepMind’s latest research at ICML 2023

Google DeepMind’s latest research at ICML 2023

Google DeepMind’s latest research at ICML 2023

Google DeepMind’s latest research at ICML 2023

Google DeepMind’s latest research at ICML 2023

Google DeepMind’s latest research at ICML 2023

Apple Natural Language Understanding Workshop 2023

Google DeepMind’s latest research at ICML 2023

Use a generative AI foundation model for summarization and question answering using your own data

Working with financial documents

Solution overview

Summarization processing

Question answering

User experience

Next steps

Conclusion

About the author

Integrate Amazon SageMaker Model Cards with the model registry

Solution overview

Best practices for managing model cards

Pre-requisites

Integrate a model card with the model version in the model registry

Clean up

Conclusion

About the Authors

Navigation

GenAI Vision Endless Possibilities

"I'm interested in things that change the world or that affect the future and wondrous, new technology where you see it, and you're like, 'Wow, how did that even happen? How is that possible?'" -- Elon Musk

Copyright © 2019-2025 Vedere AI. All Rights Reserved.