Intelligent video and audio Q&A with multilingual support using LLMs on Amazon SageMaker

Digital assets are vital visual representations of products, services, culture, and brand identity for businesses in an increasingly digital world. Digital assets, together with recorded user behavior, can facilitate customer engagement by offering interactive and personalized experiences, allowing companies to connect with their target audience on a deeper level. Efficiently discovering and searching for specific content within digital assets is crucial for businesses to optimize workflows, streamline collaboration, and deliver relevant content to the right audience. According to a study, by 2021, videos already made up 81% of all consumer internet traffic. This observation comes as no surprise because video and audio are powerful mediums that offer more immersive experiences and naturally engage target audiences on a higher emotional level.

As companies accumulate large volumes of digital assets, it becomes more challenging to organize and manage them effectively to maximize their value. Traditionally, companies attach metadata, such as keywords, titles, and descriptions, to these digital assets to facilitate search and retrieval of relevant content. But this requires a well-designed digital asset management system and additional efforts to store these assets in the first place. In reality, most of the digital assets lack informative metadata that enables efficient content search. Additionally, you often need to do an analysis of different segments of the whole file and discover the concepts that are covered there. This is time consuming and requires a lot of manual effort.

Generative AI, particularly in the realm of natural language processing and understanding (NLP and NLU), has revolutionized the way we comprehend and analyze text, enabling us to gain deeper insights efficiently and at scale. The advancements in large language models (LLMs) have led to richer representations of texts, which provide better search capabilities for digital assets. Retrieval Augmented Generation (RAG), built on top of LLMs and advanced prompt techniques, is a popular approach to provide more accurate answers based on information hidden in the enterprise digital asset store. By taking advantage of embedding models of LLMs, and powerful indexers and retrievers, RAG can comprehend and process spoken or written queries and quickly find the most relevant information in the knowledge base. Previous studies have shown how RAG can be applied to provide a Q&A solution connecting with an enterprise’s private domain knowledge. However, among all types of digital assets, video and audio assets are the most common and important.

The RAG-based video/audio question answering solution can potentially solve the business problem of locating training and reference materials that are in the form of non-text content. With limited tags or metadata associated with these assets, the solution lets users interact with the chatbot and get answers to their queries, which could be links to specific video training (“I need link to Amazon S3 data storage training”), links to documents (“I need link to learn about machine learning”), or questions that were covered in the videos (“Tell me how to create an S3 bucket”). The response from the chatbot directly answers the question and also includes links to the source videos with the specific timestamps of the contents that are most relevant to the user’s request.

In this post, we demonstrate how to use the power of RAG in building a Q&A solution for video and audio assets on Amazon SageMaker.

Solution overview

The following diagram illustrates the solution architecture.

The workflow mainly consists of the following stages:

  1. Convert video to text with a speech-to-text model, then align the text with the video and organize it into chunks. We store the data in Amazon Simple Storage Service (Amazon S3).
  2. Enable intelligent video search using a RAG approach with LLMs and LangChain. Users can get answers generated by LLMs and relevant sources with timestamps.
  3. Build a multi-functional chatbot using LLMs with SageMaker, where the two aforementioned solutions are wrapped and deployed.

For a detailed implementation, refer to the GitHub repo.

Prerequisites

You need an AWS account with an AWS Identity and Access Management (IAM) role with permissions to manage resources created as part of the solution. For details, refer to Create an AWS account.

If this is your first time working with Amazon SageMaker Studio, you first need to create a SageMaker domain. Additionally, you may need to request a service quota increase for the corresponding SageMaker processing and hosting instances. For preprocessing the video data, we use an ml.p3.2xlarge SageMaker processing instance. For hosting Falcon-40B, we use an ml.g5.12xlarge SageMaker hosting instance.

Convert video to text with a speech-to-text model and sentence embedding model

To be able to search through video or audio digital assets and provide contextual information from videos to LLMs, we need to convert all the media content to text and then follow the general approaches in NLP to process the text data. To make our solution more flexible to handle different scenarios, we provide the following options for this task:

  • Amazon Transcribe and Amazon Translate – If each video and audio file only contains one language, we highly recommend that you choose Amazon Transcribe, which is an AWS managed service to transcribe audio and video files. If you need to translate them into the same language, Amazon Translate is another AWS managed service, which supports multilingual translation.
  • Whisper – In real-world use cases, video data may include multiple languages, such as foreign language learning videos. Whisper is a multitasking speech recognition model that can perform multilingual speech recognition, speech translation, and language identification. You can use a Whisper model to detect and transcribe different languages on video data, and then translate all the different languages into one language. It’s important for most RAG solutions to run on the knowledge base with the same language. Even though OpenAI provides the Whisper API, for this post, we use the Whisper model from Hugging Face.

We run this task with an Amazon SageMaker Processing job on existing data. You can refer to data_preparation.ipynb for the details of how to run this task.

Convert video data to audio data

Because Amazon Transcribe can handle both video and audio data and the Whisper model can only accept audio data, to make both options work, we need to convert video data to audio data. In the following code, we use VideoFileClip from the library moviepy to run this job:

from moviepy.editor import VideoFileClip

video = VideoFileClip(video_path)
video.audio.write_audiofile(audio_path)

Transcribe audio data

When the audio data is ready, we can choose from our two transcribing options. You can choose the optimal option based on your own use case with the criteria we mentioned earlier.

Option 1: Amazon Transcribe and Amazon Translate

The first option is to use Amazon AI services, such as Amazon Transcribe and Amazon Translate, to get the transcriptions of the video and audio datasets. You can refer to the following GitHub example when choosing this option.

Option 2: Whisper

A Whisper model can handle audio data up to 30 seconds in duration. To handle large audio data, we adopt transformers.pipeline to run inference with Whisper. When searching relevant video clips or generating contents with RAG, timestamps for the relevant clips are the important references. Therefore, we turn return_timestamps on to get outputs with timestamps. By setting the parameter language in generate_kwargs, all the different languages in one video file are transcribed and translated into the same language. stride_length_s is the length of stride on the left and right of each chunk. With this parameter, we can make the Whisper model see more context when doing inference on each chunk, which will lead to a more accurate result. See the following code:

from transformers import pipeline
import torch

target_language = "en"
whisper_model = "whisper-large-v2"

device = "cuda:0" if torch.cuda.is_available() else "cpu"
pipe = pipeline(
   "automatic-speech-recognition",
   model=f"openai/{whisper_model}",
   device=device
)

generate_kwargs = {"task":"transcribe", "language":f"<|{target_language}|>"}
prediction = pipe(
   file_path,
   return_timestamps=True,
   chunk_length_s=30,
   stride_length_s=(5),
   generate_kwargs=generate_kwargs
)
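To make the effect of chunk_length_s and stride_length_s more concrete, the following minimal sketch shows how a long recording could be cut into overlapping windows. It only illustrates the idea and is not the transformers implementation; the function name chunk_windows and the 70-second recording are assumptions made for this example.

```python
def chunk_windows(total_s, chunk_length_s=30.0, stride_length_s=5.0):
    """Return (start, end) windows in seconds that cover a recording.

    Illustrative assumption: every window is chunk_length_s long and
    reserves stride_length_s of overlapping context on each side.
    """
    step = chunk_length_s - 2 * stride_length_s
    if step <= 0:
        raise ValueError("chunk_length_s must be larger than 2 * stride_length_s")
    windows = []
    start = 0.0
    while start < total_s:
        end = min(start + chunk_length_s, total_s)
        windows.append((start, end))
        if end >= total_s:
            break
        start += step
    return windows

windows = chunk_windows(70.0)
print(windows)  # [(0.0, 30.0), (20.0, 50.0), (40.0, 70.0)]
```

In this sketch, neighboring windows share 10 seconds of audio, so each window sees some context beyond the part it is mainly responsible for.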

The output of pipe is a dictionary with two items, text and chunks. text contains the entire transcribed result, and chunks is a list in which each item holds a timestamp and the corresponding transcribed text (see the following screenshot). We use the data in chunks for further processing.

As the preceding screenshot shows, many sentences have been cut off and split across different chunks. To make the chunks more meaningful, we need to combine the cut-off sentences and update their timestamps in the next step.
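As a rough illustration of this structure, the following sketch builds a dictionary shaped like the pipeline output and prints each chunk with a readable time range. The transcript text and the times are made up for this example, and format_seconds is a hypothetical helper, not part of the solution code.

```python
# Made-up data shaped like the pipeline output with return_timestamps=True.
prediction = {
    "text": " Welcome to the training. Today we create a bucket and upload a file.",
    "chunks": [
        {"timestamp": (0.0, 2.5), "text": " Welcome to the training."},
        {"timestamp": (2.5, 5.0), "text": " Today we create a bucket"},
        {"timestamp": (5.0, 7.2), "text": " and upload a file."},
    ],
}

def format_seconds(seconds):
    """Format a number of seconds as HH:MM:SS (hypothetical helper)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

for chunk in prediction["chunks"]:
    start, end = chunk["timestamp"]
    print(f"[{format_seconds(start)} - {format_seconds(end)}]{chunk['text']}")
```

Note that the second chunk ends in the middle of a sentence, which is the situation the next step addresses.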

Organize sentences

We use a very simple rule to combine sentences. When the chunk ends with a period (.), we don’t make any change; otherwise, we concatenate it with the next chunk. The following code snippet explains how we make this change:

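prev_chunk = None
new_chunks = []
for chunk in chunks:
    # Prepend any pending incomplete text and keep its start time
    if prev_chunk:
        chunk['text'] = prev_chunk['text'] + chunk['text']
        chunk['timestamp'] = (prev_chunk['timestamp'][0], chunk['timestamp'][1])

    if not chunk['text'].endswith('.'):
        # The sentence is still incomplete; carry it over to the next chunk
        prev_chunk = chunk
    else:
        new_chunks.append(chunk)
        prev_chunk = None

# Keep trailing text that never reached a period
if prev_chunk:
    new_chunks.append(prev_chunk)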

Compared to the original chunks produced by the audio-to-text converter, we now get complete sentences that were originally cut off.
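As a quick check of this rule, the following sketch wraps it in a function and runs it on three made-up chunks. The function name merge_incomplete_chunks and the sample data are assumptions for illustration; unlike the snippet above, this version builds new dictionaries instead of modifying the input.

```python
def merge_incomplete_chunks(chunks):
    """Join consecutive chunks until the combined text ends with a period."""
    merged = []
    pending = None
    for chunk in chunks:
        text = chunk["text"]
        start, end = chunk["timestamp"]
        if pending is not None:
            # Continue the unfinished sentence and keep its start time
            text = pending["text"] + text
            start = pending["timestamp"][0]
        current = {"text": text, "timestamp": (start, end)}
        if text.rstrip().endswith("."):
            merged.append(current)
            pending = None
        else:
            pending = current
    if pending is not None:
        # Trailing text without a final period is kept as its own chunk
        merged.append(pending)
    return merged

sample_chunks = [
    {"timestamp": (0.0, 2.5), "text": " Welcome to the training."},
    {"timestamp": (2.5, 5.0), "text": " Today we create a bucket"},
    {"timestamp": (5.0, 7.2), "text": " and upload a file."},
]
merged = merge_incomplete_chunks(sample_chunks)
for item in merged:
    print(item["timestamp"], item["text"])
```

With this input, the second and third chunks are joined into one sentence spanning 2.5 to 7.2 seconds.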

Chunk sentences

The text content in documents is normally organized by paragraph. Each paragraph focuses on the same topic. Chunking by paragraph may help embed texts into more meaningful vectors, which may improve retrieval accuracy.

Unlike the normal text content in documents, transcriptions from the transcription model are not paragraphed. Even though there are some pauses in the audio files, they can’t always be used to split sentences into paragraphs. LangChain provides the recursive chunking text splitter RecursiveCharacterTextSplitter, which tries to keep semantically related content in the same chunk. However, because we need to keep timestamps with chunks, we implement our own chunking process. Inspired by the post How to chunk text into paragraphs using python, we chunk sentences based on the similarity between adjacent sentences with a sentence embedding approach. The basic idea is to take the sentences with the lowest similarity to adjacent sentences as the split points. We use all-MiniLM-L6-v2 for sentence embedding. You can refer to the original post for an explanation of this approach. We have made some minor changes to the original source code; refer to our source code for the implementation. The core part of this process is as follows:

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Embed sentences
model_name = "all-MiniLM-L6-v2"
model = SentenceTransformer(model_name)
embeddings = model.encode(sentences_all)
# Create similarities matrix
similarities = cosine_similarity(embeddings)

# Apply our function. For long sentences, I recommend using 10 or more sentences
minmimas = activate_similarities(similarities, p_size=p_size, order=order)

# Collect the split points and start with an empty paragraph
split_points = [each for each in minmimas[0]]
text = ''

para_chunks = []
para_timestamp = []
start_timestamp = 0

for num, each in enumerate(sentences_all):
    current_timestamp = timestamps_all[num]
    
    if text == '' and (start_timestamp == current_timestamp[1]):
        start_timestamp = current_timestamp[0]
    
    if num in split_points:
        para_chunks.append(text)
        para_timestamp.append([start_timestamp, current_timestamp[1]])
        text = f'{each}. '
        start_timestamp = current_timestamp[1]
    else:
        text+=f'{each}. '

if len(text):
    para_chunks.append(text)
    para_timestamp.append([start_timestamp, timestamps_all[-1][1]])
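To illustrate the split-point idea on its own, the following sketch places a split wherever the similarity between two adjacent sentences is a local minimum. It is a simplified stand-in, not the activate_similarities function from the referenced post, and the two-dimensional toy vectors are assumptions that take the place of real sentence embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def find_split_points(embeddings):
    """Return sentence indices at which a new paragraph should start.

    sims[i] is the similarity between sentence i and sentence i + 1.
    A strict local minimum at sims[i] means sentence i + 1 starts a new topic.
    """
    sims = [cosine(embeddings[i], embeddings[i + 1])
            for i in range(len(embeddings) - 1)]
    splits = []
    for i in range(1, len(sims) - 1):
        if sims[i] < sims[i - 1] and sims[i] < sims[i + 1]:
            splits.append(i + 1)
    return splits

# Toy "embeddings": the first three sentences point one way, the last three another.
toy_embeddings = [
    (1.0, 0.0), (0.9, 0.1), (1.0, 0.1),
    (0.1, 1.0), (0.0, 1.0), (0.1, 0.9),
]
split_points = find_split_points(toy_embeddings)
print(split_points)  # [3]
```

The only sharp drop in similarity is between the third and fourth sentences, so the fourth sentence (index 3) becomes the start of a new paragraph.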

To evaluate the effectiveness of chunking with sentence embedding, we conducted qualitative comparisons between different chunking mechanisms. The assumption underlying such comparisons is that if the chunked texts are more semantically distinct from each other, less irrelevant contextual information will be retrieved for the Q&A, so the answer will be more accurate and precise. At the same time, because less contextual information is sent to the LLM, the cost of inference will also be lower, because charges increase with the number of tokens.

We reduced the high-dimensional embedding vectors to two dimensions with principal component analysis (PCA) and visualized the first two components. Compared to recursive chunking, the vectors representing chunks produced with sentence embedding are more scattered, meaning the chunks are more semantically separate. This means that when the vector of a query is close to the vector of one chunk, it is less likely to be close to other chunks. As a result, a retrieval task is less likely to have to choose among multiple semantically similar chunks.

When the chunking process is complete, we attach timestamps to the file name of each chunk, save it as a single file, and then upload it to an S3 bucket.
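The exact naming convention is defined in the solution’s source code. As a minimal sketch of the idea, the following example encodes the start and end times into a file name and recovers them later. The pattern video name, start, end and the helper names are hypothetical and chosen only for illustration.

```python
import re

def chunk_file_name(video_name, start_s, end_s):
    """Build a chunk file name such as 'intro_12.50_47.00.txt' (hypothetical scheme)."""
    return f"{video_name}_{start_s:.2f}_{end_s:.2f}.txt"

_PATTERN = re.compile(
    r"^(?P<name>.+)_(?P<start>\d+\.\d{2})_(?P<end>\d+\.\d{2})\.txt$"
)

def parse_chunk_file_name(file_name):
    """Recover (video_name, start_s, end_s) from a name built by chunk_file_name."""
    match = _PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"unexpected chunk file name: {file_name}")
    return match["name"], float(match["start"]), float(match["end"])

name = chunk_file_name("s3_training", 12.5, 47.0)
print(name)                         # s3_training_12.50_47.00.txt
print(parse_chunk_file_name(name))  # ('s3_training', 12.5, 47.0)
```

Keeping the timestamps in the file name means they travel with the document’s source metadata, so they remain available when the chunk is retrieved.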

Enable intelligent video search using a RAG-based approach with LangChain

There are typically four approaches to build a RAG solution for Q&A with LangChain:

  • Using the load_qa_chain functionality, which feeds all information to an LLM. This is not an ideal approach given the context window size and the volume of video and audio data.
  • Using the RetrievalQA tool, which requires a text splitter, text embedding model, and vector store to process texts and retrieve relevant information.
  • Using VectorstoreIndexCreator, which is a wrapper around all logic in the second approach. The text splitter, text embedding model, and vector store are configured together inside the function at one time.
  • Using the ConversationalRetrievalChain tool, which further adds memory of chat history to the QA solution.

For this post, we use the second approach to explicitly customize and choose the best engineering practices. In the following sections, we describe each step in detail.

To search for the relevant content based on the user input queries, we use semantic search, which can better understand the intent behind a query and perform meaningful retrieval. We first use a pre-trained embedding model to embed all the transcribed text into a vector space. At search time, the query is also embedded into the same vector space and the closest embeddings from the source corpus are found. You can deploy the pre-trained embedding model as shown in Question answering using Retrieval Augmented Generation with foundation models in Amazon SageMaker JumpStart to create the embeddings for semantic search. In our post, we adopt a similar approach to create an intelligent video search solution using a RAG-based approach with the open-source LangChain library. LangChain is an open-source framework for developing applications powered by language models. LangChain provides a generic interface for many different LLMs.
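The core of semantic search can be sketched in a few lines: embed the corpus and the query, then rank corpus entries by similarity to the query. In the following example, hand-made three-dimensional vectors stand in for real embeddings, and the texts, vectors, and function names are assumptions for illustration only.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k(query_vec, corpus, k=2):
    """Return the k (similarity, text) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Hand-made vectors stand in for the output of a real embedding model.
corpus = [
    ("How to create an S3 bucket", (0.9, 0.1, 0.0)),
    ("Fine-tuning a foundation model", (0.1, 0.9, 0.1)),
    ("Uploading objects to S3", (0.8, 0.0, 0.2)),
]
query_vec = (1.0, 0.1, 0.0)  # pretend embedding of a question about S3 buckets

results = top_k(query_vec, corpus)
for similarity, text in results:
    print(f"{similarity:.3f}  {text}")
```

In practice, a vector store performs this ranking efficiently over many embeddings rather than scanning a Python list.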

We first deploy the embedding model GPT-J 6B provided by Amazon SageMaker JumpStart and the language model Falcon-40B Instruct from Hugging Face to prepare for the solution. When the endpoints are ready, we follow steps similar to those described in Question answering using Retrieval Augmented Generation with foundation models in Amazon SageMaker JumpStart to create the LLM model and embedding model for LangChain.

The following code snippet shows how to create the LLM model using the langchain.llms.sagemaker_endpoint.SagemakerEndpoint class and transform the request and response payload for the LLM in the ContentHandler:

import json

from langchain.llms.sagemaker_endpoint import LLMContentHandler, SagemakerEndpoint

parameters = {
    "max_new_tokens": 500,
}

class ContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs={}) -> bytes:
        self.len_prompt = len(prompt)
        input_str = json.dumps({"inputs": prompt , "parameters": {**model_kwargs}})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        response_json = output.read()
        res = json.loads(response_json)
        print(res)
        ans = res[0]['generated_text'][self.len_prompt:]
        return ans 

content_handler = ContentHandler()

sm_llm = SagemakerEndpoint(
    endpoint_name=_MODEL_CONFIG_["huggingface-falcon-40b"]["endpoint_name"],
    region_name=aws_region,
    model_kwargs=parameters,
    content_handler=content_handler,
) 

When we use a SageMaker JumpStart embedding model, we need to customize the LangChain SageMaker endpoint embedding class and transform the model request and response to integrate with LangChain. Load the processed video transcripts using the LangChain document loader and create an index.

We use the DirectoryLoader class in LangChain to load the text documents:

from langchain.document_loaders import DirectoryLoader

loader = DirectoryLoader("./data/demo-video-sagemaker-doc/", glob="**/*.txt")
documents = loader.load()

Next, we use the embedding models to create the embeddings of the contents and store the embeddings in a FAISS vector store to create an index. We use this index to find relevant documents that are semantically similar to the input query. With the VectorstoreIndexCreator class, you can just write a few lines of code to achieve this task:

index_creator = VectorstoreIndexCreator(
    vectorstore_cls=FAISS,
    embedding=embeddings,
    text_splitter=CharacterTextSplitter(chunk_size=500, chunk_overlap=0),
)
index = index_creator.from_loaders([loader])

Now we can use the index to search for relevant context and pass it to the LLM model to generate an accurate response:

index.query(question=question, llm=sm_llm)

Build a multi-functional chatbot with SageMaker

With the deployed LLM on SageMaker, we can build a multi-functional smart chatbot to show how these models can help your business build advanced AI-powered applications. In this example, the chatbot uses Streamlit to build the UI and the LangChain framework to chain together different components around LLMs. With the help of the text-to-text and speech-to-text LLMs deployed on SageMaker, this smart chatbot accepts text and audio files as input, so users can chat with the uploaded files and build further applications on top of this. The following diagram shows the architecture of the chatbot.

When a user uploads a text file to the chatbot, the chatbot puts the content into the LangChain memory component and the user can chat with the uploaded document. This part is inspired by the following GitHub example that builds a document chatbot with SageMaker. We also add an option to allow users to upload audio files. Then the chatbot automatically invokes the speech-to-text model hosted on the SageMaker endpoint to extract the text content from the uploaded audio file and add the text content to the LangChain memory. Lastly, we allow the user to select the option to use the knowledge base when answering questions. This is the RAG capability shown in the preceding diagram. We have defined the SageMaker endpoints that are deployed in the notebooks provided in the previous sections. Note that you need to pass the actual endpoint names that are shown in your account when running the Streamlit app. You can find the endpoint names on the SageMaker console under Inference and Endpoints.

Falcon_endpoint_name = os.getenv("falcon_ep_name", default="falcon-40b-instruct-12xl")
whisper_endpoint_name = os.getenv('wp_ep_name', default="whisper-large-v2")
embedding_endpoint_name = os.getenv('embed_ep_name', default="huggingface-textembedding-gpt-j-6b")

When the knowledge base option is not selected, we use the conversation chain, where we add the memory component using the ConversationBufferMemory provided by LangChain, so the bot can remember the current conversation history:

def load_chain():
    memory = ConversationBufferMemory(return_messages=True)
    chain = ConversationChain(llm=llm, memory=memory)
    return chain

chatchain = load_chain()
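Conceptually, a buffer memory keeps every turn of the conversation and prepends it to the next prompt. The following sketch shows that idea in plain Python. It is not LangChain’s implementation, and the class name SimpleBufferMemory and the Human/AI prompt layout are assumptions made for this illustration.

```python
class SimpleBufferMemory:
    """Keep every turn of a conversation and render it as prompt text."""

    def __init__(self):
        self.turns = []

    def add_turn(self, user_message, ai_message):
        """Store one completed exchange."""
        self.turns.append((user_message, ai_message))

    def build_prompt(self, new_message):
        """Return the full history followed by the new user message."""
        history = "".join(f"Human: {u}\nAI: {a}\n" for u, a in self.turns)
        return f"{history}Human: {new_message}\nAI:"

memory = SimpleBufferMemory()
memory.add_turn("What is Amazon S3?", "An object storage service.")
prompt = memory.build_prompt("How do I create a bucket?")
print(prompt)
```

Because the whole history is resent on every turn, the prompt grows with the conversation, which is worth keeping in mind given the model’s context window.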

We use similar logic as shown in the earlier section for the RAG component and add the document retrieval function to the code. For demo purposes, we load the transcribed text stored in SageMaker Studio local storage as a document source. You can implement other RAG solutions using a data store of your choice, such as Amazon OpenSearch Service, Amazon RDS, Amazon Kendra, and more.

When users use the knowledge base for the question, the following code snippet retrieves the relevant contents from the database and provides additional context for the LLM to answer the question. We used the specific method provided by FAISS, similarity_search_with_score, when searching for relevant documents. This is because it can also provide the metadata and similarity score of the retrieved source file. The returned distance score is L2 distance. Therefore, a lower score is better. This gives us more options to provide more context for the users, such as providing the exact timestamps of the source videos that are relevant to the input query. When the RAG option is selected by the user from the UI, the chatbot uses the load_qa_chain function provided by LangChain to provide the answers based on the input prompt.

docs = docsearch.similarity_search_with_score(user_input)
contexts = []
source = []

for doc, score in docs:
    print(f"Content: {doc.page_content}, Metadata: {doc.metadata}, Score: {score}")
    # Lower distance means a closer match; keep only sufficiently close documents
    if score <= 0.9:
        contexts.append(doc)
        source.append(doc.metadata['source'].split('/')[-1])
print(f"\n INPUT CONTEXT:{contexts}")
prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.:\n\n{context}\n\nQuestion: {question}\nHelpful Answer:"""
                
PROMPT = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
chain = load_qa_chain(llm=llm, prompt=PROMPT)
result = chain({"input_documents": contexts, "question": user_input},
                return_only_outputs=True)["output_text"] 

if len(source) != 0:
    df = pd.DataFrame(source, columns=['knowledge source'])
    st.data_editor(df)
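The filtering step can be tried in isolation with the following sketch, which scores a few toy vectors against a query and keeps only the close ones. Plain Euclidean distance is used as a stand-in for the score returned by the vector store, and the file paths, vectors, and function names are illustrative assumptions.

```python
import math

def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def select_contexts(scored_docs, max_distance=0.9):
    """Keep documents whose distance is at most max_distance, closest first."""
    kept = [(doc, score) for doc, score in scored_docs if score <= max_distance]
    kept.sort(key=lambda pair: pair[1])
    return kept

query = (1.0, 0.0)
# (source path, vector) pairs; the paths and vectors are made up.
store = [
    ("data/s3_training_12.50_47.00.txt", (0.9, 0.2)),
    ("data/ml_intro_0.00_30.00.txt", (0.0, 1.0)),
    ("data/s3_training_47.00_80.00.txt", (0.5, 0.5)),
]
scored = [(path, l2_distance(query, vec)) for path, vec in store]
selected = select_contexts(scored)
sources = [path.split("/")[-1] for path, _ in selected]
print(sources)
```

The second document is about 1.41 away from the query and is dropped, while the other two fall under the 0.9 threshold. A suitable threshold depends on the embedding model and should be tuned on your own data.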

Run the chatbot app

Now we’re ready to run the Streamlit app. Open a terminal in SageMaker Studio and navigate to the cloned GitHub repository folder. You need to install the required Python packages that are specified in the requirements.txt file. Run pip install -r requirements.txt to prepare the Python dependencies.

Then run the following commands to set the environment variables to the names of the endpoints deployed in your account. When you run the chatbot.py file, it reads the endpoint names from these environment variables.

export falcon_ep_name=<the falcon endpoint name deployed in your account>
export wp_ep_name=<the whisper endpoint name deployed in your account>
export embed_ep_name=<the embedding endpoint name deployed in your account>
streamlit run app_chatbot/chatbot.py --server.port 6006 --server.maxUploadSize 6

To access the Streamlit UI, copy your SageMaker Studio URL and replace lab? with proxy/[PORT NUMBER]/. For this post, we specified the server port as 6006, so the URL should look like https://<domain ID>.studio.<region>.sagemaker.aws/jupyter/default/proxy/6006/.

Replace domain ID and region with the correct value in your account to access the UI.
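The URL rewrite can also be expressed as a small helper, shown in the following sketch. The function name lab_url_to_proxy_url and the domain ID d-example12345 are hypothetical; substitute the Studio URL from your own account.

```python
def lab_url_to_proxy_url(studio_url, port):
    """Replace the trailing 'lab?...' part of a Studio URL with 'proxy/<port>/'."""
    base, sep, _ = studio_url.rpartition("/lab")
    if not sep:
        raise ValueError("expected a Studio URL containing '/lab'")
    return f"{base}/proxy/{port}/"

# Hypothetical Studio URL; the domain ID and region are placeholders.
example_url = "https://d-example12345.studio.us-east-1.sagemaker.aws/jupyter/default/lab?"
proxy_url = lab_url_to_proxy_url(example_url, 6006)
print(proxy_url)
```

The port passed here has to match the value given to --server.port when starting Streamlit.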

Chat with your audio file

In the Conversation setup pane, choose Browse files to select local text or audio files to upload to the chatbot. If you select an audio file, it will automatically invoke the speech-to-text SageMaker endpoint to process the audio file and present the transcribed text to the console, as shown in the following screenshot. You can continue asking questions about the audio file and the chatbot will be able to remember the audio content and respond to your queries based on the audio content.

Use the knowledge base for the Q&A

When you want to answer questions that require specific domain knowledge or use the knowledge base, select Use knowledge base. This lets the chatbot retrieve relevant information from the knowledge base built earlier (the vector database) to add additional context to answer the question. For example, when we ask the question “what is the recommended way to first customize a foundation model?” to the chatbot without the knowledge base, the chatbot returns an answer similar to the following screenshot.

When we use the knowledge base to help answer this question, the chatbot returns a different response. In the demo video, we read the SageMaker document about how to customize a model in SageMaker JumpStart.

The output also provides the original video file name with the retrieved timestamp of the corresponding text. Users can go back to the original video file and locate the specific clips in the original videos.

This example chatbot demonstrates how businesses can use various types of digital assets to enhance their knowledge base and provide multi-functional assistance to their employees to improve productivity and efficiency. You can build the knowledge database from documents, audio and video datasets, and even image datasets to consolidate all the resources together. With SageMaker serving as an advanced ML platform, you can accelerate the path from project ideation to production with the breadth and depth of the SageMaker services that cover the whole ML lifecycle.

Clean up

To save costs, delete all the resources you deployed as part of the post. You can follow the provided notebook’s cleanup section to programmatically delete the resources, or you can delete any SageMaker endpoints you may have created via the SageMaker console.

Conclusion

The advent of generative AI models powered by LLMs has revolutionized the way businesses acquire and apply insights from information. Within this context, digital assets, including video and audio content, play a pivotal role as visual representations of products, services, and brand identity. Efficiently searching and discovering specific content within these assets is vital for optimizing workflows, enhancing collaboration, and delivering tailored experiences to the intended audience. With the power of generative AI models on SageMaker, businesses can unlock the full potential of their video and audio resources. The integration of generative AI models empowers enterprises to build efficient and intelligent search solutions, enabling users to access relevant and contextual information from their digital assets, and thereby maximizing their value and fostering business success in the digital landscape.

For more information on working with generative AI on AWS, refer to Announcing New Tools for Building with Generative AI on AWS.


About the authors

Gordon Wang is a Senior AI/ML Specialist TAM at AWS. He supports strategic customers with AI/ML best practices across many industries. He is passionate about computer vision, NLP, generative AI, and MLOps. In his spare time, he loves running and hiking.

Melanie Li is a Senior AI/ML Specialist TAM at AWS based in Sydney, Australia. She helps enterprise customers build solutions using state-of-the-art AI/ML tools on AWS and provides guidance on architecting and implementing ML solutions with best practices. In her spare time, she loves to explore nature and spend time with family and friends.

Guang Yang is a Senior Applied Scientist at the Amazon Generative AI Innovation Center, where he works with customers across various verticals and applies creative problem solving to generate value for customers with state-of-the-art generative AI solutions.

Harjyot Malik is a Senior Program Manager at AWS based in Sydney, Australia. He works with the APJC Enterprise Support teams and helps them build and deliver strategies. He collaborates with business teams, delving into complex problems to unearth innovative solutions that in return drive efficiencies for the business. In his spare time, he loves to travel and explore new places.

Best-in-Class is in Session: New NVIDIA Studio Laptops Supercharge Content, Gaming and Education

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

The start of a new school year is an ideal time for students to upgrade their content creation, gaming and educational capabilities by picking up an NVIDIA Studio laptop, powered by GeForce RTX 40 Series graphics cards.

Marmoset Toolbag — a fully fledged 3D art production tool — just released version 4.06, a free update for Toolbag 4 users. It extends support of the OpenUSD file format and adds NVIDIA Omniverse compatibility, NVIDIA DLSS capability, AI OptiX denoising and speedups in rendering and baking — all RTX-accelerated.

Finally, this week In the NVIDIA Studio, popular influencer JiffyVFX talks about his viral video series, Doors to Realities, which garnered over 1.2 million views.

Create, Game, Study

GeForce and NVIDIA Studio RTX 40 Series laptops use the power of AI to accelerate content creation, gaming and study apps.

GeForce RTX 40 Series laptops deliver the ultimate performance for creative projects. Creators can quickly render 3D models and edit up to 8K HDR RAW videos to bring ideas to life — fast. The NVIDIA Studio platform supercharges over 110 creative apps and includes exclusive access to tools like the NVIDIA Broadcast app for enhanced creation, collaboration and remote learning. NVIDIA Studio Drivers provide maximum stability and lightning-fast performance for the most popular creative apps.

Gamers equipped with GeForce RTX 40 Series GPUs have access to ray tracing-powered realistic and immersive graphics, a quantum leap in performance with the AI-powered DLSS 3, and the lowest latency and best system responsiveness for the ultimate competitive advantage.

For students in engineering, architecture, computer science and other STEM fields, Studio laptops can accelerate dozens of apps, aiding in the creation of smoother 3D design and modeling, faster AI training and simulation and more accurate machine learning models.

Fifth-generation Max-Q technologies use AI to optimize laptop performance, power and acoustics for peak efficiency. This enables thinner and lighter designs and a 70% improved battery life suited to serving student and creator needs alike. And DLSS is now optimized for laptops, giving creators incredible 3D rendering performance with DLSS 3 optical multi-frame generation and super resolution in Omniverse and D5 Render, as well as in hit games like Cyberpunk 2077.

Get the latest and greatest deals on Studio laptops today.

Marmoset Toolbag 4.06 Adds OpenUSD

Content exported from Toolbag to Universal Scene Description (known as OpenUSD) is now fully compatible with the Omniverse ecosystem.

The OpenUSD format delivers the notable advantages of preserving physically accurate material, mesh and lighting properties, even as content travels between 3D apps such as Blender, Marmoset and Unreal Engine.

Consistent mesh and lighting properties across 3D apps and NVIDIA Omniverse.

RTX GPU-accelerated OptiX denoising is also available for smooth, interactive ray tracing in the viewport. Artists can now navigate in full quality without visual artifacts and performance disruptions.

Toolbag also integrated NVIDIA DLSS, which renders the viewport at a reduced resolution and uses sophisticated AI-based technology to upscale images, improving performance while minimizing reductions in image quality. This works especially well with high-DPI displays and is a great tool for maintaining smooth performance while working on full resolution with more complex scenes.

More frames, better performance — that’s NVIDIA DLSS.

RTX GPU-accelerated rendering gives a 2x improvement in render times and a 4x improvement in baking. DirectX 12 migration delivers an additional 25% increase in rendering and baking performance.

Learn more about Toolbag 4.06. All new users receive a full-feature, 30-day free trial license.

Create in a Jiff

James Luke, a.k.a. JiffyVFX, never has to look far for inspiration.

Influencer JiffyVFX.

“Various films, shows, comic book characters, artists and musicians inspire me,” said Luke. “But I’d say the biggest inspirational factor is watching other artists’ works and their behind the scenes content — I’m forever watching tutorials and breakdown videos on YouTube of how other artists create their content.”

“NVIDIA GPUs are just top-of-the-line. The support, the performance, the quality. I’ve always used NVIDIA GPUs for my creative workstations over the years, and I will continue to use them for years to come.” — JiffyVFX

Luke’s Doors to Realities series began with a simple idea: put the planet Earth outside a bedroom window using the Video Copilot’s Orb plug-in for Adobe After Effects.

“I began seeing people post collages or montages of different aesthetics — things like cyberpunk, art deco, neo-noir, retro-futurism — wondering what it would look like to transport to one of those worlds through a portal or a door of some sort,” said Luke. “What would a first-person perspective of that look like?”

More recently, Luke created a Japan-themed Doors to Realities video aided by his ASUS Zenbook Pro 14 Studio laptop, powered by a GeForce RTX 4070 GPU.

Luke used 3D modeling to create the environment featured in the video. He previously used Unreal Engine for the first two installments of the series, but this time, he experimented with Kitbash 3D’s Cargo app for a one-click import into Blender. It was a game-changer — finding and uploading models into Blender was never so speedy and efficient.

Modeling aided by AI in Blender.

His GeForce graphics card enabled RTX-accelerated OptiX ray tracing in the viewport for interactive, photorealistic rendering. When the final renders were ready, Luke added the Doors to Realities template into an Adobe After Effects project with the clip already masked and ready to go. There, he added glow effects, one of more than 30 GPU-accelerated key effects, with keyframes set up from previous videos.

“The fact that I was able to work in rendered mode in real time on a device as small and portable as the Zenbook was mind blowing.” — JiffyVFX

“All I had to do was plop the render in, copy and paste the glow effects with their intensity keyframes and then hit render again,” said Luke.

Finally, Luke added music and background sounds like bird chirping, a river stream and the sound of leaves subtly blowing in the wind.

The Studio laptop not only enhances Luke’s content creation experience but also boosts the efficiency of his workflow. “High frame rates, DLSS 3 and ray tracing — what’s not to love?” said Luke.

He encourages other creators to push forward and continue experimenting.

“Never. Stop. Creating!” said Luke. “Never stop learning! The only way to improve is to keep pushing yourself and the limits of what you can do.”

Content creator JiffyVFX.

Check out JiffyVFX on TikTok.

Follow NVIDIA Studio on Instagram, Twitter and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter. 


Using GPT-4 for content moderation

We use GPT-4 for content policy development and content moderation decisions, enabling more consistent labeling, a faster feedback loop for policy refinement, and less involvement from human moderators. (OpenAI Blog)

Zero-shot and few-shot prompting for the BloomZ 176B foundation model with the simplified Amazon SageMaker JumpStart SDK


Amazon SageMaker JumpStart is a machine learning (ML) hub offering algorithms, models, and ML solutions. With SageMaker JumpStart, ML practitioners can choose from a growing list of best performing and publicly available foundation models (FMs) such as BLOOM, Llama 2, Falcon-40B, Stable Diffusion, OpenLLaMA, Flan-T5/UL2, or FMs from Cohere and LightOn.

In this post and accompanying notebook, we demonstrate how to deploy the BloomZ 176B foundation model using the SageMaker Python simplified SDK in Amazon SageMaker JumpStart as an endpoint and use it for various natural language processing (NLP) tasks. You can also access the foundation models through Amazon SageMaker Studio. The BloomZ 176B model, one of the largest publicly available models, is a state-of-the-art instruction-tuned model that can perform various in-context few-shot learning and zero-shot learning NLP tasks. Instruction tuning is a technique that involves fine-tuning a language model on a collection of NLP tasks using instructions. To learn more about instruction tuning, refer to Zero-shot prompting for the Flan-T5 foundation model in Amazon SageMaker JumpStart.

Zero-shot learning in NLP allows a pre-trained LLM to generate responses to tasks that it hasn’t been specifically trained for. In this technique, the model is provided with an input text and a prompt that describes the expected output from the model in natural language. Zero-shot learning is used in a variety of NLP tasks, such as the following:

  • Multilingual text and sentiment classification
  • Multilingual question answering
  • Code generation
  • Paragraph rephrasing
  • Summarization
  • Common sense reasoning and natural language inference
  • Question answering
  • Sentence and sentiment classification
  • Imaginary article generation based on a title
  • Summarizing a title based on an article

Few-shot learning involves training a model to perform new tasks by providing only a few examples. This is useful where limited labeled data is available for training. Few-shot learning is used in a variety of tasks, including the following:

  • Text summarization
  • Code generation
  • Named entity recognition
  • Question answering
  • Grammar and spelling correction
  • Product description and generalization
  • Sentence and sentiment classification
  • Chatbot and conversational AI
  • Tweet generation
  • Machine translation
  • Intent classification

About Bloom

The BigScience Large Open-science Open-access Multilingual (BLOOM) language model is a transformer-based large language model (LLM). BLOOM is an autoregressive LLM trained to continue text from a prompt on vast amounts of text data using industrial-scale computational resources. As such, it is able to output coherent text that is hardly distinguishable from text written by humans. BLOOM can also be instructed to perform text tasks it hasn’t been explicitly trained for by casting them as text generation tasks.

With its 176 billion parameters, BLOOM is able to generate text in 46 natural languages and 13 programming languages. For almost all of them, such as Spanish, French, and Arabic, BLOOM is the first language model with over 100 billion parameters ever created. Researchers can download, run, and study BLOOM to investigate the performance and behavior of recently developed LLMs down to their deepest internal operations.

Solution overview

In this post, we show how to use the state-of-the-art instruction-tuned BloomZ 176B model from Hugging Face for text generation. You can use the BloomZ 176B model with few-shot learning and zero-shot learning for many NLP tasks, without fine-tuning the model. There is no need to train a new model because models like BloomZ 176B have a significant number of parameters such that they can easily adapt to many contexts without being retrained. The BloomZ 176B model has been trained with a large amount of data, making it applicable for many general-purpose tasks.

The code for all the steps in this demo is available in the following notebook.

Instruction tuning

The size and complexity of LLMs have exploded in the last few years. LLMs have demonstrated remarkable capabilities in learning the semantics of natural language and producing human-like responses. Many recent LLMs are fine-tuned with a powerful technique called instruction tuning, which helps the model perform new tasks or generate responses to novel prompts without prompt-specific fine-tuning. An instruction-tuned model uses its understanding of related tasks or concepts to generate predictions to novel prompts. Because applying an instruction-tuned model to a new task doesn’t involve updating model weights, it avoids the time-consuming and computationally expensive process required to fine-tune a model for a new, previously unseen task.

Instruction tuning involves fine-tuning a language model on a collection of NLP tasks using instructions. In this technique, the model is trained to perform tasks by following textual instructions instead of specific datasets for each task. The model is fine-tuned with a set of input and output examples for each task, allowing the model to generalize to new tasks that it hasn’t been explicitly trained on as long as prompts are provided for the tasks. Instruction tuning helps improve the accuracy and effectiveness of models and is helpful in situations where large datasets aren’t available for specific tasks.

Prompt engineering for zero-shot and few-shot NLP tasks on BLOOM models

Prompt engineering deals with creating high-quality prompts to guide the model towards the desired responses. Prompts need to be designed based on the specific task and dataset being used. The goal here is to provide the model with necessary information to generate high-quality responses while minimizing noise. This could involve keywords, additional contexts, questions, and more.

A well-designed prompt can make the model more creative and generalized so that it can easily adapt to new tasks. Prompts can also help incorporate domain knowledge on specific tasks and improve interpretability. Prompt engineering can greatly improve the performance of zero-shot and few-shot learning models. Creating high-quality prompts requires careful consideration of the task at hand, as well as a deep understanding of the model’s strengths and limitations.
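As a minimal sketch of this idea, the following snippet composes a zero-shot prompt from an input text and a natural-language instruction. The function name build_zero_shot_prompt and the default separator are illustrative assumptions rather than part of the SageMaker SDK; the separator simply mirrors the style of the summarization prompt used later in this post.

```python
def build_zero_shot_prompt(context: str, instruction: str, separator: str = "\n\n===\n") -> str:
    """Join the input text and a natural-language instruction into one prompt.

    Hypothetical helper: the name and the default separator are assumptions
    chosen for illustration only.
    """
    return f"{context.strip()}{separator}{instruction.strip()}"


prompt = build_zero_shot_prompt(
    context=(
        "Amazon Comprehend uses natural language processing (NLP) "
        "to extract insights about the content of documents."
    ),
    instruction="Write a summary of the previous text in English:",
)
print(prompt)
```

Keeping the instruction separate from the input text makes it easy to try several phrasings of the same task against one input.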

In the following table, the BloomZ 176B model is used to generate responses for various zero-shot and few-shot NLP tasks. The first column shows the task, the second column contains the prompt provided to the model (where the template text is bold and the non-bold text is the example input), and the third column is the response from the model when queried against the prompt.

For all the following tasks, we set max_length to 600, top_k to 250, and top_p to 0.95, set num_return_sequences, and disable sampling.
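The following is a minimal sketch of how these settings could be collected and serialized into a JSON request body. The value of num_return_sequences is not stated above, so 1 is used purely as a placeholder. The text_inputs key, the do_sample flag name, and the build_payload helper are also assumptions for illustration; check the accompanying notebook for the exact payload format the endpoint expects.

```python
import json

# Values stated in the text, except where marked as assumptions.
generation_params = {
    "max_length": 600,
    "num_return_sequences": 1,  # assumption: the value is not given in the text
    "top_k": 250,
    "top_p": 0.95,
    "do_sample": False,  # "no sampling"; the flag name is an assumption
}


def build_payload(prompt: str, params: dict) -> bytes:
    """Merge a prompt with generation parameters and encode it as UTF-8 JSON.

    Hypothetical helper; "text_inputs" is an assumed key name.
    """
    body = {"text_inputs": prompt, **params}
    return json.dumps(body).encode("utf-8")


payload = build_payload("How to make a pasta?", generation_params)
print(payload.decode("utf-8"))
```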

Task Prompt Input Output
Zero-shot learning tasks
Multilingual text or sentiment classification (Chinese to English) 一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。Would you rate the previous review as positive, neutral or negative? positive
Multilingual question answering (English to Chinese) Explain to me in Traditional Chinese what is the backpropagation 梯度反向传播算法是一种机器学习算法,用于训练人工神经网络。该算法使用梯度下降来最小化损失函数。
Code generation Write a binary search tree with O(log(n)) computational complexity.
{code_start}
def binary_search(a, x):
    low = 0
    high = len(a) - 1
    while low <= high:
        mid = (low + high) // 2
        if a[mid] < x:
            low = mid + 1
        elif a[mid] > x:
            high = mid - 1
        else:
            return mid
    return -1

        else:
            return mid
    return low
Output is truncated
Paragraph rephrase Amazon Web Services (AWS) has announced nine major new updates for its cloud-based machine learning platform, SageMaker.
SageMaker aims to provide a machine learning service which can be used to build, train, and deploy ML models for virtually any use case.
During this year’s re:Invent conference, AWS made several announcements to further improve SageMaker’s capabilities.\n\nHow would you rephrase that briefly using English?,Amazon Web Services (AWS) has announced nine major new updates for its cloud-based machine learning platform, SageMaker.
SageMaker aims to provide a machine learning service which can be used to build, train, and deploy ML models for virtually any use case.
During this year’s re:Invent conference, AWS made several announcements to further improve SageMaker’s capabilities. \nThe above sentence is very complicated. Please provide me a simplified synonymous version consisting of multiple sentences:
Amazon SageMaker gets a major update Amazon Web Services (AWS) has announced nine major new updates for its cloud-based machine learning platform, SageMaker. SageMaker aims to provide a machine learning service that can be used to build, train and deploy ML models for virtually any use case
Summarization Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases.
You can access Amazon Comprehend document analysis capabilities using the Amazon Comprehend console or using the Amazon Comprehend APIs. You can run real-time analysis for small workloads or you can start asynchronous analysis jobs for large document sets. You can use the pre-trained models that Amazon Comprehend provides, or you can train your own custom models for classification and entity recognition.
All of the Amazon Comprehend features accept UTF-8 text documents as the input. In addition, custom classification and custom entity recognition accept image files, PDF files, and Word files as input.
Amazon Comprehend can examine and analyze documents in a variety of languages, depending on the specific feature. For more information, see Languages supported in Amazon Comprehend. Amazon Comprehend’s Dominant language capability can examine documents and determine the dominant language for a far wider selection of languages.\n\n===\nWrite a summary of the previous text in English:
Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases.
You can access Amazon Comprehend document analysis capabilities using the Amazon Comprehend console or using the Amazon Comprehend APIs. You can run real-time analysis for small workloads or you can start asynchronous analysis jobs for large document sets. You can use the pre-trained models that Amazon Comprehend provides, or you can train your own custom models for classification and entity recognition.
All of the Amazon Comprehend features accept UTF-8 text documents as the input. In addition, custom classification and custom entity recognition accept image files, PDF files, and Word files as input.
Amazon Comprehend can examine and analyze documents in a variety of languages, depending on the specific feature. For more information, see Languages supported in Amazon Comprehend. Amazon Comprehend’s Dominant language capability can examine documents and determine the dominant language for a far wider selection of languages.
Common sense reasoning and natural language inference Given that The world cup has kicked off in Los Angeles, United States. Does it follow that The world cup takes place in United States. Yes or no?,
The world cup has kicked off in Los Angeles, United States. Using only the above description and what you know about the world, is “The world cup takes place in United States” definitely correct? Yes or no?,
Yes
Question answering question: what are the key features of new Kindle?\n\nContext: The newest and most innovative Kindle yet lets you take notes on millions of books and documents, write lists and journals, and more.
For readers who have always wished they could write in their eBooks, Amazon’s new Kindle lets them do just that. The Kindle Scribe is the first Kindle for reading and writing and allows users to supplement their books and documents with notes, lists, and more.
Here’s everything you need to know about the Kindle Scribe, including frequently asked questions.
The Kindle Scribe makes it easy to read and write like you would on paper
The Kindle Scribe features a 10.2-inch, glare-free screen (the largest of all Kindle devices), crisp 300 ppi resolution, and 35 LED front lights that automatically adjust to your environment. Further personalize your experience with the adjustable warm light, font sizes, line spacing, and more.
It comes with your choice of the Basic Pen or the Premium Pen, which you use to write on the screen like you would on paper. They also attach magnetically to your Kindle and never need to be charged. The Premium Pen includes a dedicated eraser and a customizable shortcut button.
The Kindle Scribe has the most storage options of all Kindle devices: choose from 8 GB, 16 GB, or 32 GB to suit your level of reading and writing.
\n\nAnswer
take notes on millions of books and documents, write lists and journals, and more
Sentence and sentiment classification Review:\nThis moive is so great and once again dazzles and delights us\nIs this movie review sentence negative or positive?\nOPTIONS:\n-positive \n-negative Positive
Imaginary article generation based on title Title: “University has new facility coming up”\nGiven the above title of an imaginary article, imagine the article.\n”””

 The £1.2m project will see the creation of a new teaching and learning centre at the University of the Highlands and Islands (UHI) in Inverness.

It will be built on the site of the former Inverness College of Commerce.

UHI said the new facility would be used for teaching and learning, as well as student support and administration.

The university said it would also be used for conferences and events.

The new building will be built on the site of the former Inverness College of Commerce

The university said the new building would be built on the site of the former Inverness College of Commerce, which closed in 2013.

It said the new facility would be used for teaching and learning, as well as student support and administration.

It would also be used for conferences and events.

UHI said the new facility would be built on the site of the former Inverness College of Commerce

The university said the new building would be built on the site of the former Inverness College of Commerce, which closed in 2013.

It said the new facility would be used for teaching and learning, as well as student support and administration.

It would also be used for conferences and events.

UHI said the new facility would be built on the site of the former Inverness…

Summarizing a title based on the article ‘The newest and most innovative Kindle yet lets you take notes on millions of books and documents, write lists and journals, and more.
For readers who have always wished they could write in their eBooks, Amazon’s new Kindle lets them do just that. The Kindle Scribe is the first Kindle for reading and writing and allows users to supplement their books and documents with notes, lists, and more.
Here’s everything you need to know about the Kindle Scribe, including frequently asked questions.
The Kindle Scribe makes it easy to read and write like you would on paper
The Kindle Scribe features a 10.2-inch, glare-free screen (the largest of all Kindle devices), crisp 300 ppi resolution, and 35 LED front lights that automatically adjust to your environment. Further personalize your experience with the adjustable warm light, font sizes, line spacing, and more.
It comes with your choice of the Basic Pen or the Premium Pen, which you use to write on the screen like you would on paper. They also attach magnetically to your Kindle and never need to be charged. The Premium Pen includes a dedicated eraser and a customizable shortcut button.
The Kindle Scribe has the most storage options of all Kindle devices: choose from 8 GB, 16 GB, or 32 GB to suit your level of reading and writing. \n\n \n\nGive me a good title for the article above.
Amazon’s Kindle Scribe: Everything you need to know
Few-shot learning tasks
Summarization [Original]: Amazon scientists, in collaboration with researchers from the University of Sheffield, are making a large-scale fact extraction and verification dataset publicly available for the first time. The dataset, comprising more than 185,000 evidence-backed claims, is being made available to hopefully catalyze research and development that addresses the problems of fact extraction and verification in software applications or cloud-based services that perform automatic information extraction.
[Summary]: Amazon and University researchers make fact extraction and verification dataset publicly available.
###
[Original]: Prime members in the U.S. can get even more delivered to their door with a Prime membership. Members can now enjoy one year of Grubhub+ valued at $9.99 per month for free—at no added cost to their Prime membership. To activate this deal, visit amazon.com/grubhub. This new offer includes unlimited, $0 food delivery fees on orders over $12 as well as exclusive perks for Grubhub+ members and rewards like free food and order discounts. Plus, diners can “eat good while doing good” by opting into Grubhub’s Donate the Change program, a donation-matching initiative that raised more than $25 million in 2021 alone, benefiting more than 20 charitable organizations across the country.
[Summary]: Prime members in the U.S. can enjoy one year of Grubhub+ for free, with no food-delivery fees on eligible orders.
###
[Original]: Amazon scientists, in collaboration with researchers from the University of Sheffield, are making a large-scale fact extraction and verification dataset publicly available for the first time. The dataset, comprising more than 185,000 evidence-backed claims, is being made available to hopefully catalyze research and development that addresses the problems of fact extraction and verification in software applications or cloud-based services that perform automatic information extraction.
[Summary]:
[Summary]: Amazon and University researchers make fact extraction and verification dataset publicly available.
Code generation description: a orange button that says stop
code: <button style=color:white; background-color:orange;>Stop</button>
###
description: a blue box that contains yellow circles with red borders
code: <div style=background-color: blue; padding: 20px;><div style=background-color: yellow; border: 5px solid red; border-radius: 50%; padding: 20px; width: 100px; height: 100px;>
###

description: a Headline saying Welcome to AI
code:
code: <h1>Welcome to AI</h1>
Named entity recognition [Text]: Fred is a serial entrepreneur. Co-founder and CEO of Platform.sh, he previously co-founded Commerce Guys, a leading Drupal ecommerce provider. His mission is to guarantee that as we continue on an ambitious journey to profoundly transform how cloud computing is used and perceived, we keep our feet well on the ground continuing the rapid growth we have enjoyed up until now.
[Name]: Fred
[Position]: Co-founder and CEO
[Company]: Platform.sh
###
[Text]: Microsoft (the word being a portmanteau of “microcomputer software”) was founded by Bill Gates on April 4, 1975, to develop and sell BASIC interpreters for the Altair 8800. Steve Ballmer replaced Gates as CEO in 2000, and later envisioned a “devices and services” strategy.
[Name]:  Steve Ballmer
[Position]: CEO
[Company]: Microsoft
###
[Text]: Franck Riboud was born on 7 November 1955 in Lyon. He is the son of Antoine Riboud, the previous CEO, who transformed the former European glassmaker BSN Group into a leading player in the food industry. He is the CEO at Danone.
[Name]:  Franck Riboud
[Position]: CEO
[Company]: Danone
###
[Text]: David Melvin is an investment and financial services professional at CITIC CLSA with over 30 years’ experience in investment banking and private equity. He is currently a Senior Adviser of CITIC CLSA.
    [Name]: Fred
[Position]: Co-founder and CEO
[Company]: Platform.sh
Question answering Context: NLP Cloud was founded in 2021 when the team realized there was no easy way to reliably leverage Natural Language Processing in production.
Question: When was NLP Cloud founded?
Answer: 2021
###
Context: NLP Cloud developed their API by mid-2020 and they added many pre-trained open-source models since then.
Question: What did NLP Cloud develop?
Answer: API
###
Context: All plans can be stopped anytime. You only pay for the time you used the service. In case of a downgrade, you will get a discount on your next invoice.
Question: When can plans be stopped?
Answer: Anytime
###
Context: The main challenge with GPT-J is memory consumption. Using a GPU plan is recommended.
Question: Which plan is recommended for GPT-J?
Answer:”””
Answer: GPU plan
Grammar and spelling correction I love goin to the beach.
Correction: I love going to the beach.
###
Let me hav it!
Correction: Let me have it!
###
It have too many drawbacks.
Correction: It has too many drawbacks.
###

I do not wan to go
Correction:
    Correction: I don’t want to go
Product description and generalization Generate a product description out of keywords.
Keywords: shoes, women, $59
Sentence: Beautiful shoes for women at the price of $59.
###
Keywords: trousers, men, $69
Sentence: Modern trousers for men, for $69 only.
###
Keywords: gloves, winter, $19
Sentence: Amazingly hot gloves for cold winters, at $19.
###

Keywords: t-shirt, men, $39
Sentence:
    Sentence: Cool t-shirt for men, for $39 only.
Sentence and sentiment classification Message: When the spaceship landed on Mars, the whole humanity was excited
Topic: space
###
Message: I love playing tennis and golf. I’m practicing twice a week.
Topic: sport
###
Message: Managing a team of sales people is a tough but rewarding job.
Topic: business
###

Message: I am trying to cook chicken with tomatoes.
Topic:
    Topic: food
Chatbot and conversational AI This is a discussion between a [human] and a [robot].
The [robot] is very nice and empathetic.
[human]: Hello nice to meet you.
[robot]: Nice to meet you too.
###
[human]: How is it going today?
[robot]: Not so bad, thank you! How about you?
###
[human]: I am ok, but I am a bit sad…
[robot]: Oh? Why that?
###

[human]: I caught flu
[robot]:
    [robot]: Oh no, I am sorry to hear that!
Tweet generation keyword: markets
tweet: Take feedback from nature and markets, not from people
###
keyword: children
tweet: Maybe we die so we can come back as children.
###
keyword: startups
tweet: Startups should not worry about how to put out fires, they should worry about how to start them.
###

keyword: nlp
tweet:
    tweet: NLP is a tool, not a goal.
Machine translation Hugging Face a révolutionné le NLP.
Translation: Hugging Face revolutionized NLP.
###
Cela est incroyable!
Translation: This is unbelievable!
###
Désolé je ne peux pas.
Translation: Sorry but I cannot.
###
    NLP Cloud permet de deployer le NLP en production facilement.
Translation:
    Translation: NLP Cloud makes it easy to deploy NLP in production.
Intent classification I want to start coding tomorrow because it seems to be so fun!
Intent: start coding
###
Show me the last pictures you have please.
Intent: show pictures
###
Search all these files as fast as possible.
Intent: search files
###

Can you please teach me Chinese next week?
Intent:
    Intent: teach me chinese
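
The few-shot prompts in the preceding table share a common shape: labeled example pairs separated by ###, followed by a final input whose output label is left empty for the model to complete. The following is a minimal sketch of how such a prompt could be assembled. The build_few_shot_prompt helper and its parameter names are hypothetical and are not part of the SageMaker SDK.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    examples: Sequence[Tuple[str, str]],
    query: str,
    input_label: str = "Message",
    output_label: str = "Topic",
    separator: str = "###",
) -> str:
    """Assemble labeled examples and a final unanswered query into one prompt.

    Hypothetical helper for illustration only.
    """
    blocks = [
        f"{input_label}: {text}\n{output_label}: {label}"
        for text, label in examples
    ]
    # The last block leaves the output label empty so the model fills it in.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return f"\n{separator}\n".join(blocks)


examples = [
    ("When the spaceship landed on Mars, the whole humanity was excited", "space"),
    ("I love playing tennis and golf. I'm practicing twice a week.", "sport"),
    ("Managing a team of sales people is a tough but rewarding job.", "business"),
]
prompt = build_few_shot_prompt(examples, "I am trying to cook chicken with tomatoes.")
print(prompt)
```

Changing input_label and output_label adapts the same helper to the other few-shot rows, such as keyword and tweet pairs.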

Access the BloomZ 176B instruction-tuned model in SageMaker

SageMaker JumpStart provides two ways to get started using these instruction-tuned Bloom models: Amazon SageMaker Studio and the SageMaker SDK. The following sections illustrate what each of these options looks like and how to access them.

Access the model with the simplified SageMaker JumpStart SDK

The simplified SageMaker JumpStart SDK facilitates training and deploying built-in SageMaker JumpStart models with a couple lines of code. This gives you access to the entire library of SageMaker JumpStart models, including the latest foundation models and image generation models, without having to supply any inputs besides the model ID.

You can take advantage of the model-specific default values we provide to specify the configuration, such as the Docker image, ML instance type, model artifact location, and hyperparameters, among other fields. These attributes are only default values; you can override them and retain granular control over the AWS models you create. As a result of these changes, the effort to write Python workflows to deploy and train SageMaker JumpStart models has been reduced, enabling you to spend more time on the tasks that matter. This feature is available in all Regions where JumpStart is supported, and can be accessed with the SageMaker Python SDK version 2.154.0 or later.

You can programmatically deploy an endpoint through the SageMaker SDK. You will need to specify the model ID of your desired model in the SageMaker model hub and the instance type used for deployment. The model URI, which contains the inference script, and the URI of the Docker container are obtained through the SageMaker SDK. These URIs are provided by SageMaker JumpStart and can be used to initialize a SageMaker model object for deployment.

Deploy the model and query the endpoint

This notebook requires ipywidgets. Install ipywidgets and then use the execution role associated with the current notebook as the AWS account role with SageMaker access.

Choose the pre-trained model

We choose the bloomz-176b-fp16 pre-trained model:

model_id = "huggingface-textgeneration1-bloomz-176b-fp16"

The notebook in the following sections uses BloomZ 176B as an example. For a complete list of SageMaker pre-trained models, refer to Built-in Algorithms with pre-trained Model Table.

Retrieve artifacts and deploy an endpoint

With SageMaker, we can perform inference on the pre-trained model without fine-tuning it first on a new dataset. We start by retrieving the deploy_image_uri, deploy_source_uri, and model_uri for the pre-trained model. To host the pre-trained model, we create an instance of sagemaker.model.Model and deploy it. This may take a few minutes.

Now we can deploy the model using the simplified SageMaker JumpStart SDK with the following lines of code:

from sagemaker.jumpstart.model import JumpStartModel

# If no instance is available for this model ID, use the ID of a smaller model.
model = JumpStartModel(model_id=model_id)

# ml.p4de.24xlarge is used by default. You can add the instance_type
# keyword argument to change this setting.
predictor = model.deploy()

# Keep the endpoint name so the endpoint can be queried later.
endpoint_name = predictor.endpoint_name

We use SageMaker large model inference (LMI) containers to host the BloomZ 176B model. LMI is an AWS-built LLM software stack (container) that offers easy-to-use functions and performance gain on generative AI models. It’s embedded with model parallelism, compilation, quantization, and other stacks to speed up inference. For details, refer to Deploy BLOOM-176B and OPT-30B on Amazon SageMaker with large model inference Deep Learning Containers and DeepSpeed.

Note that deploying this model requires a p4de.24xlarge instance and the deployment usually takes about 1 hour. If you don’t have quota for that instance, request a quota increase on the AWS Service Quotas console.

Query the endpoint and parse the response using various parameters to control the generated text

The input to the endpoint is any string of text formatted as JSON and encoded in utf-8 format. The output of the endpoint is a JSON file with generated text.

In the following example, we provide some sample input text. You can input any text and the model predicts the next words in the sequence. Longer sequences of text can be generated by calling the model repeatedly. The following code shows how to invoke an endpoint with these arguments:

from sagemaker.predictor import retrieve_default

predictor = retrieve_default(model_id=model_id, model_version="*", endpoint_name=endpoint_name)
response = predictor.predict("How to make a pasta?")
print(response["generated_text"])

We get the following output:

['How to make a pasta? boil a pot of water and add salt. Add the pasta to the water and cook until al dente. Drain the pasta.']
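
The request and response formats described above can be sketched without calling a live endpoint. This is a minimal illustration, not the notebook's code: the helper names `build_payload` and `parse_generated_text` are hypothetical, and the response shape (a JSON object with a `generated_text` key) is assumed from the example output shown above.

```python
import json


def build_payload(prompt: str) -> bytes:
    """Serialize the prompt as a JSON string and encode it as UTF-8 bytes."""
    return json.dumps(prompt).encode("utf-8")


def parse_generated_text(body: bytes) -> list:
    """Decode a JSON response body and return the generated text as a list.

    Assumes the response is a JSON object with a "generated_text" key.
    """
    data = json.loads(body.decode("utf-8"))
    text = data["generated_text"]
    return text if isinstance(text, list) else [text]


# Example with a hand-written response body instead of a real endpoint call.
payload = build_payload("How to make a pasta?")
fake_body = json.dumps({"generated_text": ["boil a pot of water"]}).encode("utf-8")
print(payload)
print(parse_generated_text(fake_body))
```

In practice the predictor object usually handles serialization for you; the sketch only makes the wire format explicit.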

Access the model in SageMaker Studio

You can also access these models through the JumpStart landing page in Studio. This page lists available end-to-end ML solutions, pre-trained models, and example notebooks.

At the time of publishing the post, BloomZ 176B is only available in the us-east-2 Region.

You can choose the BloomZ 176B model card to view the notebook.

You can then import the notebook to run the notebook further.

Clean up

To avoid ongoing charges, delete the SageMaker inference endpoints. You can delete the endpoints via the SageMaker console or from the SageMaker Studio notebook using the following commands:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we gave an overview of the benefits of zero-shot and few-shot learning and described how prompt engineering can improve the performance of instruction-tuned models. We also showed how to easily deploy an instruction-tuned BloomZ 176B model from SageMaker JumpStart and provided examples to demonstrate how you can perform different NLP tasks using the deployed BloomZ 176B model endpoint in SageMaker.

We encourage you to deploy a BloomZ 176B model from SageMaker JumpStart and create your own prompts for NLP use cases.

To learn more about SageMaker JumpStart, check out the following:


About the Authors

Rajakumar Sampathkumar is a Principal Technical Account Manager at AWS, providing customers guidance on business-technology alignment and supporting the reinvention of their cloud operation models and processes. He is passionate about cloud and machine learning. Raj is also a machine learning specialist and works with AWS customers to design, deploy, and manage their AWS workloads and architectures.

Dr. Xin Huang is an Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers in ACL, ICDM, KDD conferences, and Royal Statistical Society: Series A journal.

Evan Kravitz is a software engineer at Amazon Web Services, working on SageMaker JumpStart. He enjoys cooking and going on runs in New York City.

Build production-ready generative AI applications for enterprise search using Haystack pipelines and Amazon SageMaker JumpStart with LLMs

This blog post is co-written with Tuana Çelik from deepset. 

Enterprise search is a critical component of organizational efficiency through document digitization and knowledge management. Enterprise search covers storing documents such as digital files, indexing the documents for search, and providing relevant results based on user queries. With the advent of large language models (LLMs), we can implement conversational experiences in providing the results to users. However, we need to ensure that the LLMs limit the responses to company data, thereby mitigating model hallucinations.

In this post, we showcase how to build an end-to-end generative AI application for enterprise search with Retrieval Augmented Generation (RAG) by using Haystack pipelines and the Falcon-40b-instruct model from Amazon SageMaker JumpStart and Amazon OpenSearch Service. The source code for the sample showcased in this post is available in the GitHub repository.

Solution overview

To restrict the generative AI application responses to company data only, we need to use a technique called Retrieval Augmented Generation (RAG). An application using the RAG approach retrieves information most relevant to the user’s request from the enterprise knowledge base or content, bundles it as context along with the user’s request as a prompt, and then sends it to the LLM to get a response. LLMs have limitations around the maximum word count for the input prompts, so choosing the right passages among thousands or millions of documents in the enterprise has a direct impact on the LLM’s accuracy.

The RAG technique has become increasingly important in enterprise search. In this post, we show a workflow that takes advantage of SageMaker JumpStart to deploy a Falcon-40b-instruct model and uses Haystack to design and run a retrieval augmented question answering pipeline. The final retrieval augmentation workflow covers the following high-level steps:

  1. The user query is used for a retriever component, which does a vector search, to retrieve the most relevant context from our database.
  2. This context is embedded into a prompt that is designed to instruct an LLM to generate an answer only from the provided context.
  3. The LLM generates a response to the original query by only considering the context embedded into the prompt it received.
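
The three steps above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the Haystack implementation: `retrieve` ranks documents by shared words as a stand-in for vector search, and `llm` is any callable you supply (here a stub), so all function names are hypothetical.

```python
def retrieve(query, documents, top_k=2):
    """Step 1: rank documents by word overlap with the query (stand-in for vector search)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, context_docs):
    """Step 2: embed the retrieved context into a prompt that restricts the answer to it."""
    context = "\n".join(context_docs)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say 'I don't know'.\n"
        f"Context: {context}\nQuestion: {query}\nAnswer: "
    )


def answer(query, documents, llm, top_k=2):
    """Step 3: send the prompt to the LLM and return its response."""
    return llm(build_prompt(query, retrieve(query, documents, top_k)))


docs = [
    "OpenSearch is a search suite",
    "Bananas are yellow",
    "Install the OpenSearch cli with pip",
]
# A stub LLM that echoes the prompt, so the assembled context is visible.
print(answer("How can I install the OpenSearch cli", docs, llm=lambda prompt: prompt))
```

Swapping the stub for a call to a deployed endpoint and the overlap ranking for an embedding-based retriever gives the shape of the workflow described in this post.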

SageMaker JumpStart

SageMaker JumpStart serves as a model hub encapsulating a broad array of deep learning models for text, vision, audio, and embedding use cases. With over 500 models, its model hub comprises both public and proprietary models from AWS’s partners such as AI21, Stability AI, Cohere, and LightOn. It also hosts foundation models solely developed by Amazon, such as AlexaTM. Some of the models offer capabilities for you to fine-tune them with your own data. SageMaker JumpStart also provides solution templates that set up infrastructure for common use cases, and executable example notebooks for machine learning (ML) with SageMaker.

Haystack

Haystack is an open-source framework by deepset that allows developers to orchestrate LLM applications made up of different components like models, vector DBs, file converters, and countless other modules. Haystack provides pipelines and Agents, two powerful structures for designing LLM applications for various use cases including search, question answering, and conversational AI. With a big focus on state-of-the-art retrieval methods and solid evaluation metrics, it provides you with everything you need to ship a reliable, trustworthy application. You can serialize pipelines to YAML files, expose them via a REST API, and scale them flexibly with your workloads, making it easy to move your application from a prototype stage to production.

Amazon OpenSearch

OpenSearch Service is a fully managed service that makes it simple to deploy, scale, and operate OpenSearch in the AWS Cloud. OpenSearch is a scalable, flexible, and extensible open-source software suite for search, analytics, security monitoring, and observability applications, licensed under the Apache 2.0 license.

In recent years, ML techniques have become increasingly popular to enhance search. Among them are the use of embedding models, a type of model that can encode a large body of data into an n-dimensional space where each entity is encoded into a vector, a data point in that space, and organized such that similar entities are closer together. A vector database provides efficient vector similarity search by providing specialized indexes like k-NN indexes.
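
To make the idea of vector similarity search concrete, here is a brute-force sketch that scores every stored vector against the query with cosine similarity. The vectors and the `knn` helper are made up for illustration; a vector database avoids this full scan by using specialized indexes such as k-NN indexes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn(query_vector, index, k=2):
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in index.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 2-dimensional "embeddings"; real embedding models produce hundreds of dimensions.
index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
print(knn([1.0, 0.0], index, k=2))
```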

With the vector database capabilities of OpenSearch Service, you can implement semantic search, RAG with LLMs, recommendation engines, and search rich media. In this post, we use RAG to enable us to complement generative LLMs with an external knowledge base that is typically built using a vector database hydrated with vector-encoded knowledge articles.

Application overview

The following diagram depicts the structure of the final application.

In this application, we use the Haystack Indexing Pipeline to manage uploaded documents and index documents and the Haystack Query Pipeline to perform knowledge retrieval from indexed documents.

The Haystack Indexing Pipeline includes the following high-level steps:

  1. Upload a document.
  2. Initialize DocumentStore and index documents.

We use OpenSearch as our DocumentStore and a Haystack indexing pipeline to preprocess and index our files to OpenSearch. Haystack FileConverters and PreProcessor allow you to clean and prepare your raw files to be in a shape and format that your natural language processing (NLP) pipeline and language model of choice can deal with. The indexing pipeline we’ve used here also uses sentence-transformers/all-MiniLM-L12-v2 to create embeddings for each document, which we use for efficient retrieval.

The Haystack Query Pipeline includes the following high-level steps:

  1. We send a query to the RAG pipeline.
  2. An EmbeddingRetriever component acts as a filter that retrieves the most relevant top_k documents from our indexed documents in OpenSearch. We use our choice of embedding model to embed both the query and the documents (at indexing) to achieve this.
  3. The retrieved documents are embedded into our prompt to the Falcon-40b-instruct model.
  4. The LLM returns with a response that is based on the retrieved documents.

For model deployment, we use SageMaker JumpStart, which simplifies deploying models through a simple push of a button. Although we’ve used and tested Falcon-40b-instruct for this example, you may use any Hugging Face model available on SageMaker.

The final solution is available on the haystack-sagemaker repository and uses the OpenSearch website and documentation (for OpenSearch 2.7) as our example data to perform retrieval augmented question answering on.

Prerequisites

Before you can use any AWS services, make sure you have signed up for and created an AWS account. Then create an administrative user and group. For instructions on both steps, refer to Set Up Amazon SageMaker Prerequisites.

To use Haystack, you need to install the farm-haystack package with the required dependencies. To do so, use the requirements.txt file in the GitHub repository by running pip install -r requirements.txt.

Index documents to OpenSearch

Haystack offers a number of connectors to databases, which are called DocumentStores. For this RAG workflow, we use the OpenSearchDocumentStore. The example repository includes an indexing pipeline and AWS CloudFormation template to set up an OpenSearchDocumentStore with documents crawled from the OpenSearch website and documentation pages.

Often, to get an NLP application working for production use cases, we have to think about data preparation and cleaning. Haystack indexing pipelines cover this: they allow you to design your own data preparation steps, which ultimately write your documents to the database of your choice.

An indexing pipeline may also include a step to create embeddings for your documents. This is highly important for the retrieval step. In our example, we use sentence-transformers/all-MiniLM-L12-v2 as our embedding model. This model is used to create embeddings for all our indexed documents, but also the user’s query at query time.

To index documents into the OpenSearchDocumentStore, we provide two options with detailed instructions in the README of the example repository. Here, we walk through the steps for indexing to an OpenSearch service deployed on AWS.

Start an OpenSearch service

Use the provided CloudFormation template to set up an OpenSearch service on AWS. By running the following command, you’ll have an empty OpenSearch service. You can then either choose to index the example data we’ve provided or use your own data, which you can clean and preprocess using the Haystack Indexing Pipeline. Note that this creates an instance that is open to the internet, which is not recommended for production use.

```bash
aws cloudformation create-stack --stack-name HaystackOpensearch --template-body file://cloudformation/opensearch-index.yaml --parameters ParameterKey=InstanceType,ParameterValue=r5.large.search ParameterKey=InstanceCount,ParameterValue=3 ParameterKey=OSPassword,ParameterValue=Password123!
```

Allow approximately 30 minutes for the stack launch to complete. You can check its progress on the AWS CloudFormation console by navigating to the Stacks page and looking for the stack named HaystackOpensearch.

Index documents into OpenSearch

Now that we have a running OpenSearch service, we can use the OpenSearchDocumentStore class to connect to it and write our documents to it.

To get the hostname for OpenSearch, run the following command:

```bash
aws cloudformation describe-stacks --stack-name HaystackOpensearch --query "Stacks[0].Outputs[?OutputKey=='OpenSearchEndpoint'].OutputValue" --output text
```

First, export the following:

```bash
export OPENSEARCH_HOST='your_opensearch_host'
export OPENSEARCH_PORT=443
export OPENSEARCH_USERNAME=admin
export OPENSEARCH_PASSWORD=Password123!
```

Then, you can use the opensearch_indexing_pipeline.py script to preprocess and index the provided demo data.

If you would like to use your own data, modify the indexing pipeline in opensearch_indexing_pipeline.py to include the FileConverter and PreProcessor setup steps you require.
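
As an illustration of how a script might pick up the variables exported above, the following sketch reads them into a connection configuration. The function name `load_opensearch_config` and the defaults are assumptions for this example; the actual script in the repository may read these values differently.

```python
import os


def load_opensearch_config(env=os.environ):
    """Build a connection config from the exported OPENSEARCH_* variables.

    The host and password are required; the port and username fall back to
    the values used in this walkthrough (an assumption for this sketch).
    """
    return {
        "host": env["OPENSEARCH_HOST"],
        "port": int(env.get("OPENSEARCH_PORT", "443")),
        "username": env.get("OPENSEARCH_USERNAME", "admin"),
        "password": env["OPENSEARCH_PASSWORD"],
    }


# Example with an explicit mapping instead of the real environment.
config = load_opensearch_config(
    {"OPENSEARCH_HOST": "your_opensearch_host", "OPENSEARCH_PASSWORD": "example"}
)
print(config["host"], config["port"], config["username"])
```

A missing required variable raises a `KeyError`, which surfaces configuration mistakes before any connection attempt.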

Implement the retrieval augmented question answering pipeline

Now that we have indexed data in OpenSearch, we can perform question answering on these documents. For this RAG pipeline, we use the Falcon-40b-instruct model that we’ve deployed on SageMaker JumpStart.

You also have the option of deploying the model programmatically from a Jupyter notebook. For instructions, refer to the GitHub repo.

  1. Search for the Falcon-40b-instruct model on SageMaker JumpStart.
  2. Deploy your model on SageMaker JumpStart, and take note of the endpoint name.
  3. Export the following values:
    ```bash
    export SAGEMAKER_MODEL_ENDPOINT=your_falcon_40b_instruct_endpoint
    export AWS_PROFILE_NAME=your_aws_profile
    export AWS_REGION_NAME=your_aws_region
    ```

  4. Run python rag_pipeline.py.

This will start a command line utility that waits for a user’s question. For example, let’s ask “How can I install the OpenSearch cli?”

This result is achieved because we have defined our prompt in the Haystack PromptTemplate to be the following:

```python
from haystack.nodes import AnswerParser, PromptTemplate

question_answering = PromptTemplate(prompt="Given the context please answer the question. If the answer is not contained within the context below, say 'I don't know'.\n"
"Context: {join(documents)};\n Question: {query};\n Answer: ", output_parser=AnswerParser(reference_pattern=r"Document\[(\d+)\]"))
```
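
To see what such a template produces, the following standalone sketch renders the same wording with plain string formatting and applies a reference pattern to a sample answer. It does not use Haystack: `render_prompt` and `referenced_documents` are hypothetical helpers, joining documents with a single space is an assumption about how the template's `join` behaves, and the pattern `Document\[(\d+)\]` is assumed to be the intended regular expression.

```python
import re

TEMPLATE = (
    "Given the context please answer the question. If the answer is not "
    "contained within the context below, say 'I don't know'.\n"
    "Context: {context};\n Question: {query};\n Answer: "
)

# Matches citations such as "Document[3]" and captures the document number.
REFERENCE_PATTERN = re.compile(r"Document\[(\d+)\]")


def render_prompt(query, documents):
    """Fill the template with the joined documents and the user query."""
    return TEMPLATE.format(context=" ".join(documents), query=query)


def referenced_documents(answer):
    """Return the document numbers cited in a generated answer."""
    return [int(number) for number in REFERENCE_PATTERN.findall(answer)]


print(render_prompt("How can I install the OpenSearch cli?", ["doc one", "doc two"]))
print(referenced_documents("See Document[2] and Document[10]."))
```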

Further customizations

You can make additional customizations to different elements in the solution, such as the following:

  • The data – We’ve provided the OpenSearch documentation and website data as example data. Remember to modify the opensearch_indexing_pipeline.py script to fit your needs if you choose to use your own data.
  • The model – In this example, we’ve used the Falcon-40b-instruct model. You are free to deploy and use any other Hugging Face model on SageMaker. Note that changing a model will likely mean you should adapt your prompt to something it’s designed to handle.
  • The prompt – For this post, we created our own PromptTemplate that instructs the model to answer questions based on the provided context and answer “I don’t know” if the context doesn’t include relevant information. You may change this prompt to experiment with different prompts with Falcon-40b-instruct. You can also simply pull some of our prompts from the PromptHub.
  • The embedding model – For the retrieval step, we use a lightweight embedding model: sentence-transformers/all-MiniLM-L12-v2. However, you may also change this to your needs. Remember to modify the expected embedding dimensions in your DocumentStore accordingly.
  • The number of retrieved documents – You may also choose to play around with the number of documents you ask the EmbeddingRetriever to retrieve for each query. In our setup, this is set to top_k=5. You may experiment with changing this figure to see if providing more context improves the accuracy of your results.

Production readiness

The proposed solution in this post can accelerate the time to value of the project development process. You can build a project that is easy to scale with the security and privacy environment on the AWS Cloud.

For security and privacy, OpenSearch Service provides data protection with identity and access management and cross-service confused deputy prevention. You may employ fine-grained user access control so that the user can only access the data they are authorized to access. Additionally, SageMaker provides configurable security settings for access control, data protection, and logging and monitoring. You can protect your data at rest and in transit with AWS Key Management Service (AWS KMS) keys. You can also track the log of SageMaker model deployment or endpoint access using Amazon CloudWatch. For more information, refer to Monitor Amazon SageMaker with Amazon CloudWatch.

For high scalability on OpenSearch Service, you can size your OpenSearch Service domains appropriately and employ operational best practices. You can also take advantage of auto scaling for your SageMaker endpoint, which automatically adjusts the endpoint when traffic increases or when resources are not being used.

Clean up

To save costs, delete all the resources you deployed as part of this post. If you launched the CloudFormation stack, you can delete it via the AWS CloudFormation console. Similarly, you can delete any SageMaker endpoints you may have created via the SageMaker console.

Conclusion

In this post, we showcased how to build an end-to-end generative AI application for enterprise search with RAG by using Haystack pipelines and the Falcon-40b-instruct model from SageMaker JumpStart and OpenSearch Service. The RAG approach is critical in enterprise search because it ensures that the responses generated are in-domain, thereby mitigating hallucinations. By using Haystack pipelines, we are able to orchestrate LLM applications made up of different components like models and vector databases. SageMaker JumpStart provides us with a one-click solution for deploying LLMs, and we used OpenSearch Service as the vector database for our indexed data. You can start experimenting and building RAG proofs of concept for your enterprise generative AI applications, using the steps outlined in this post and the source code available in the GitHub repository.


About the Authors

Tuana Celik is the Lead Developer Advocate at deepset, where she focuses on the open-source community for Haystack. She leads the developer relations function and regularly speaks at events about NLP and creates learning materials for the community.

Roy Allela is a Senior AI/ML Specialist Solutions Architect at AWS based in Munich, Germany. Roy helps AWS customers—from small startups to large enterprises—train and deploy large language models efficiently on AWS. Roy is passionate about computational optimization problems and improving the performance of AI workloads.

Mia Chang is an ML Specialist Solutions Architect for Amazon Web Services. She works with customers in EMEA and shares best practices for running AI/ML workloads on the cloud with her background in applied mathematics, computer science, and AI/ML. She focuses on NLP-specific workloads, and shares her experience as a conference speaker and a book author. In her free time, she enjoys hiking, board games, and brewing coffee.

Inaam Syed is a Startup Solutions Architect at AWS, with a strong focus on assisting B2B and SaaS startups in scaling and achieving growth. He possesses a deep passion for serverless architectures and AI/ML. In his leisure time, Inaam enjoys quality moments with his family and indulges in his love for biking and badminton.

David Tippett is the Senior Developer Advocate working on open-source OpenSearch at AWS. His work involves all areas of OpenSearch from search and relevance to observability and security analytics.

Improving the Quality of Neural TTS Using Long-form Content and Multi-speaker Multi-style Modeling

Neural text-to-speech (TTS) can provide quality close to natural speech if an adequate amount of high-quality speech material is available for training. However, acquiring speech data for TTS training is costly and time-consuming, especially if the goal is to generate different speaking styles. In this work, we show that we can transfer speaking style across speakers and improve the quality of synthetic speech by training a multi-speaker multi-style (MSMS) model with long-form recordings, in addition to regular TTS recordings. In particular, we show that 1) multi-speaker modeling improves the… (Apple Machine Learning Research)

Amazon Translate enhances its custom terminology to improve translation accuracy and fluency

Amazon Translate is a neural machine translation service that delivers fast, high-quality, affordable, and customizable language translation. When you translate from one language to another, you want your machine translation to be accurate, fluent, and most importantly contextual. Domain-specific and language-specific customizable terminology is a key requirement for many government and commercial organizations.

Custom terminology enables you to customize your translation output such that your domain and organization-specific vocabulary, such as brand names, character names, model names, and other unique content (named entities), are translated exactly the way you need. To use the custom terminology feature, you should create a terminology file (CSV or TMX file format) and specify the custom terminology as a parameter in an Amazon Translate real-time translation or asynchronous batch processing request. Refer to Customize Amazon Translate output to meet your domain and organization specific vocabulary to get started on custom terminology.

In this post, we explore key enhancements to custom terminology, which doesn’t just do a simple match and replace but adds context-sensitive match and replace, which preserves the sentence construct. This enhancement aims to create contextually appropriate versions of matching target terms to generate translations of higher quality and fluency.

Solution overview

We use the following custom terminology file to explore the enhanced custom terminology features. For instructions on creating a custom terminology, refer to Customize Amazon Translate output to meet your domain and organization specific vocabulary.

| en | fr | es |
| --- | --- | --- |
| tutor | éducateur | tutor |
| sheep | agneau | oveja |
| walking | promenant | para caminar |
| burger | sandwich | hamburguesa |
| action-specific | spécifique à l’action | especifico de acción |
| order | commande | commande |
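
As a rough sketch, the terminology above could be written out as a CSV file in which the first row holds the language codes and each following row holds one term and its translations. The helper name `terminology_csv` is hypothetical, and the layout reflects our understanding of the CSV terminology format; check the Amazon Translate documentation before relying on it.

```python
import csv
import io

# Terms copied from the custom terminology shown above (en, fr, es).
TERMS = [
    ("tutor", "éducateur", "tutor"),
    ("sheep", "agneau", "oveja"),
    ("walking", "promenant", "para caminar"),
    ("burger", "sandwich", "hamburguesa"),
    ("action-specific", "spécifique à l’action", "especifico de acción"),
    ("order", "commande", "commande"),
]


def terminology_csv(terms, languages=("en", "fr", "es")):
    """Return CSV text with a language-code header row followed by the terms."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(languages)
    writer.writerows(terms)
    return buffer.getvalue()


print(terminology_csv(TERMS))
```

Writing the returned text to a UTF-8 encoded file keeps the accented characters intact.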

Exploring the custom terminology feature

Let’s translate the sentence “she was a great tutor” with Amazon Translate. Complete the following steps:

  1. On the Amazon Translate console, choose Real-time translation in the navigation pane.
  2. Choose the Text tab.
  3. For Target language, choose French.
  4. Enter the text “she was a great tutor.”

As shown in the following screenshot, the translation in French is “elle était une excellente tutrice.”

  5. Under Additional settings, select Custom terminology and choose your custom terminology file.

The translation in French is changed to “elle était une excellente éducatrice.”

In the custom terminology file, we have specified the translation for “tutor” as “éducateur.” “Éducateur” is masculine in French, whereas “tutor” in English is gender neutral. Custom terminology did not perform a simple match and replace here; instead, it used the target word and applied the correct gender based on the context.

Now let’s test the feature with the source sentence “he has 10 sheep.” The translation in French is “il a 10 agneaux.” We provided custom terminology for “sheep” as “agneau.” “Agneau” in French means “baby sheep” and is singular. In this case, the target word is changed to inflect plural.

The source sentence “walking in the evening is precious to me” is translated to “me promener le soir est précieux pour moi.” The custom terminology target word “promenant” is changed to “promener” to inflect the correct verb tense.

The source sentence “I like burger” will be translated to “J’aime les sandwichs” to inflect the correct noun based on the context.

Now let’s test sentences with the target language as Spanish.

The source sentence “any action-specific parameters are listed in the topic for that action” is translated to “todos los parámetros especificos de acción aparecen en el tema de esa acción” to inflect the correct adjective.

The source sentence “in order for us to help you, please share your name” will be translated to “pour que nous puissions vous aider, veuillez partager votre nom.”

Some words may have entirely different meanings based on context. For example, the word “order” in English can be a sequence (as in the source sentence) or a command or instruction (as in “I order books”). It’s difficult to know which meaning is intended without explicit information. In this case, “order” should not be translated as “commande” because it means “command” or “instruct” in French.

Conclusion

The custom terminology feature in Amazon Translate can help you customize translations based on your domain or language constructs. Recent enhancements to the custom terminology feature create contextually appropriate versions of matching terms to generate translations of higher quality. This enhancement improves the translation accuracy and fluency. There is no change required for existing customers to use the enhanced feature.

For more information about Amazon Translate, visit Amazon Translate resources to find video resources and blog posts, and refer to Amazon Translate FAQs.


About the Authors

Sathya Balakrishnan is a Senior Consultant in the Professional Services team at AWS, specializing in data and ML solutions. He works with US federal financial clients. He is passionate about building pragmatic solutions to solve customers’ business problems. In his spare time, he enjoys watching movies and hiking with his family.

Sid Padgaonkar is the Senior Product Manager for Amazon Translate, AWS’s natural language processing service. On weekends, you will find him playing squash and exploring the food scene in the Pacific Northwest.

Zero-shot text classification with Amazon SageMaker JumpStart

Natural language processing (NLP) is the field in machine learning (ML) concerned with giving computers the ability to understand text and spoken words in the same way as human beings can. Recently, state-of-the-art architectures like the transformer architecture are used to achieve near-human performance on NLP downstream tasks like text summarization, text classification, entity recognition, and more.

Large language models (LLMs) are transformer-based models trained on a large amount of unlabeled text with hundreds of millions (BERT) to over a trillion parameters (MiCS), and whose size makes single-GPU training impractical. Due to their inherent complexity, training an LLM from scratch is a very challenging task that very few organizations can afford. A common practice for NLP downstream tasks is to take a pre-trained LLM and fine-tune it. For more information about fine-tuning, refer to Domain-adaptation Fine-tuning of Foundation Models in Amazon SageMaker JumpStart on Financial data and Fine-tune transformer language models for linguistic diversity with Hugging Face on Amazon SageMaker.

Zero-shot learning in NLP allows a pre-trained LLM to generate responses to tasks that it hasn’t been explicitly trained for (even without fine-tuning). Specifically speaking about text classification, zero-shot text classification is a task in natural language processing where an NLP model is used to classify text from unseen classes, in contrast to supervised classification, where NLP models can only classify text that belongs to classes in the training data.

We recently launched zero-shot classification model support in Amazon SageMaker JumpStart. SageMaker JumpStart is the ML hub of Amazon SageMaker that provides access to pre-trained foundation models (FMs), LLMs, built-in algorithms, and solution templates to help you quickly get started with ML. In this post, we show how you can perform zero-shot classification using pre-trained models in SageMaker JumpStart. You will learn how to use the SageMaker JumpStart UI and SageMaker Python SDK to deploy the solution and run inference using the available models.

Zero-shot learning

Zero-shot classification is a paradigm where a model can classify new, unseen examples that belong to classes that were not present in the training data. For example, a language model that has been trained to understand human language can be used to classify New Year’s resolutions tweets on multiple classes like career, health, and finance, without the language model being explicitly trained on the text classification task. This is in contrast to fine-tuning the model, since the latter implies re-training the model (through transfer learning) while zero-shot learning doesn’t require additional training.

The following diagram illustrates the differences between transfer learning (left) vs. zero-shot learning (right).

Yin et al. proposed a framework for creating zero-shot classifiers using natural language inference (NLI). The framework works by posing the sequence to be classified as an NLI premise and constructs a hypothesis from each candidate label. For example, if we want to evaluate whether a sequence belongs to the class politics, we could construct a hypothesis of “This text is about politics.” The probabilities for entailment and contradiction are then converted to label probabilities. As a quick review, NLI considers two sentences: a premise and a hypothesis. The task is to determine whether the hypothesis is true (entailment) or false (contradiction) given the premise. The following table provides some examples.

| Premise | Label | Hypothesis |
| --- | --- | --- |
| A man inspects the uniform of a figure in some East Asian country. | Contradiction | The man is sleeping. |
| An older and younger man smiling. | Neutral | Two men are smiling and laughing at the cats playing on the floor. |
| A soccer game with multiple males playing. | Entailment | Some men are playing a sport. |
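
The conversion from entailment and contradiction scores to label probabilities can be sketched as follows. This is an illustrative reading of the approach, not the code of any particular library: the logits are made up, the function names are hypothetical, and the two modes shown (one softmax across labels versus an independent entailment-vs-contradiction softmax per label) are a common way such classifiers are implemented.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of numbers."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def label_probabilities(nli_logits, multi_label=False):
    """Convert per-label (contradiction, entailment) logits to label probabilities.

    Each label's hypothesis (for example "This text is about politics.") is
    scored against the premise by an NLI model, giving the logits passed in here.
    """
    labels = list(nli_logits)
    if multi_label:
        # Each label is judged on its own: entailment vs. contradiction.
        return {label: softmax(list(nli_logits[label]))[1] for label in labels}
    # Labels compete: softmax over the entailment logits of all labels.
    entailment = [nli_logits[label][1] for label in labels]
    return dict(zip(labels, softmax(entailment)))


# Made-up logits for a premise about an election.
logits = {"politics": (-2.0, 3.0), "sports": (1.0, -1.0)}
print(label_probabilities(logits))
print(label_probabilities(logits, multi_label=True))
```

In the single-label mode the probabilities sum to 1 across labels; in the multi-label mode each probability is independent, so several labels can score high at once.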

Solution overview

In this post, we discuss the following:

  • How to deploy pre-trained zero-shot text classification models using the SageMaker JumpStart UI and run inference on the deployed model using short text data
  • How to use the SageMaker Python SDK to access the pre-trained zero-shot text classification models in SageMaker JumpStart and use the inference script to deploy the model to a SageMaker endpoint for a real-time text classification use case
  • How to use the SageMaker Python SDK to access pre-trained zero-shot text classification models and use SageMaker batch transform for a batch text classification use case

SageMaker JumpStart provides one-click fine-tuning and deployment for a wide variety of pre-trained models across popular ML tasks, as well as a selection of end-to-end solutions that solve common business problems. These features remove the heavy lifting from each step of the ML process, simplifying the development of high-quality models and reducing time to deployment. The JumpStart APIs allow you to programmatically deploy and fine-tune a vast selection of pre-trained models on your own datasets.

The JumpStart model hub provides access to a large number of NLP models that enable transfer learning and fine-tuning on custom datasets. As of this writing, the JumpStart model hub contains over 300 text models across a variety of popular model families, such as Stable Diffusion, Flan T5, Alexa TM, Bloom, and more.

Note that by following the steps in this section, you will deploy infrastructure to your AWS account that may incur costs.

Deploy a standalone zero-shot text classification model

In this section, we demonstrate how to deploy a zero-shot classification model using SageMaker JumpStart. You can access pre-trained models through the JumpStart landing page in Amazon SageMaker Studio. Complete the following steps:

  1. In SageMaker Studio, open the JumpStart landing page.
    Refer to Open and use JumpStart for more details on how to navigate to SageMaker JumpStart.
  2. In the Text Models carousel, locate the “Zero-Shot Text Classification” model card.
  3. Choose View model to access the facebook-bart-large-mnli model.
    Alternatively, you can search for the zero-shot classification model in the search bar and get to the model in SageMaker JumpStart.
  4. Specify a deployment configuration, SageMaker hosting instance type, endpoint name, Amazon Simple Storage Service (Amazon S3) bucket name, and other required parameters.
  5. Optionally, you can specify security configurations like AWS Identity and Access Management (IAM) role, VPC settings, and AWS Key Management Service (AWS KMS) encryption keys.
  6. Choose Deploy to create a SageMaker endpoint.

This step takes a couple of minutes to complete. When it’s complete, you can run inference against the SageMaker endpoint that hosts the zero-shot classification model.

In the following video, we show a walkthrough of the steps in this section.

Use JumpStart programmatically with the SageMaker SDK

In the SageMaker JumpStart section of SageMaker Studio, under Quick start solutions, you can find the solution templates. SageMaker JumpStart solution templates are one-click, end-to-end solutions for many common ML use cases. As of this writing, over 20 solutions are available for multiple use cases, such as demand forecasting, fraud detection, and personalized recommendations, to name a few.

The “Zero Shot Text Classification with Hugging Face” solution provides a way to classify text without the need to train a model for specific labels (zero-shot classification) by using a pre-trained text classifier. The default zero-shot classification model for this solution is the facebook-bart-large-mnli (BART) model. For this solution, we use the 2015 New Year’s Resolutions dataset to classify resolutions. A subset of the original dataset containing only the Resolution_Category (ground truth label) and the text columns is included in the solution’s assets.

New year's resolutions table

For synchronous (real-time) inference, the input data includes a text string, a list of desired categories for classification, and a flag that indicates whether the classification is multi-label. For asynchronous (batch) inference, we provide a list of text strings, the list of categories for each string, and the multi-label flag in a JSON lines formatted text file.

Zero-shot input example
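
As a rough illustration of the batch input, the following sketch serializes a few texts into JSON lines. It assumes that each line reuses the same shape as the real-time payload (inputs plus parameters); the helper name to_json_lines, the example texts, and the shortened label list are hypothetical, so check the solution notebook for the exact format it expects.

```python
import json

def to_json_lines(texts, candidate_labels, multi_label=False):
    """Serialize one request per line, reusing the real-time payload shape."""
    records = []
    for text in texts:
        records.append(json.dumps({
            "inputs": text,
            "parameters": {
                "candidate_labels": candidate_labels,
                "multi_label": multi_label,
            },
        }))
    # JSON lines: one JSON document per line, newline-terminated.
    return "\n".join(records) + "\n"

# Hypothetical example texts and a shortened label list.
texts = ["read more books this year", "save money for a house"]
categories = ["Personal Growth", "Finance", "Health"]

jsonl = to_json_lines(texts, categories)
print(jsonl)
```

The resulting string can be written to a local file and uploaded to Amazon S3 before starting a batch job.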

The result of the inference is a JSON object that looks something like the following screenshot.

Zero-shot output example

We have the original text in the sequence field, the labels used for the text classification in the labels field, and the probability assigned to each label (in the same order of appearance) in the field scores.
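
Because the labels and scores fields are parallel lists, a small helper can pair them up. The following is a sketch under the assumption that the response has the three fields described above; the function names, the threshold, and the sample values are hypothetical.

```python
def rank_labels(result):
    """Pair each label with its score, highest score first."""
    pairs = zip(result["labels"], result["scores"])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

def labels_above(result, threshold):
    """Keep only the labels whose score reaches the threshold."""
    return [label for label, score in rank_labels(result) if score >= threshold]

# Hypothetical response with made-up scores.
sample = {
    "sequence": "save money for a house",
    "labels": ["Health", "Finance", "Career"],
    "scores": [0.05, 0.80, 0.15],
}

ranked = rank_labels(sample)
best_label, best_score = ranked[0]
print(best_label, best_score)
print(labels_above(sample, 0.10))
```

A threshold like this is mostly useful for multi-label classification, where more than one category can apply to the same text.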

To deploy the Zero Shot Text Classification with Hugging Face solution, complete the following steps:

  1. On the SageMaker JumpStart landing page, choose Models, notebooks, solutions in the navigation pane.
  2. In the Solutions section, choose Explore All Solutions.
    Amazon SageMaker JumpStart landing page
  3. On the Solutions page, choose the Zero Shot Text Classification with Hugging Face model card.
  4. Review the deployment details and if you agree, choose Launch.
    Zero-shot text classification with hugging face

The deployment will provision a SageMaker real-time endpoint for real-time inference and an S3 bucket for storing the batch transformation results.

The following diagram illustrates the architecture of this method.

Zero-shot text classification solution architecture

Perform real-time inference using a zero-shot classification model

In this section, we review how to use the Python SDK to run zero-shot text classification (using any of the available models) in real time using a SageMaker endpoint.

  1. First, we configure the inference payload request to the model. This is model dependent, but for the BART model, the input is a JSON object with the following structure:
    {
        "inputs": ...,  # The text to be classified
        "parameters": {
            "candidate_labels": [...],  # A list of the labels we want to use for the text classification
            "multi_label": True | False
        }
    }

  2. Note that the BART model is not explicitly trained on the candidate_labels. We use the zero-shot classification technique to classify the text sequence into unseen classes. The following code is an example using text from the New Year’s resolutions dataset and the defined classes:
    classification_categories = ['Health', 'Humor', 'Personal Growth', 'Philanthropy', 'Leisure', 'Career', 'Finance', 'Education', 'Time Management']
    data_zero_shot = {
        "inputs": "#newyearsresolution :: read more books, no scrolling fb/checking email b4 breakfast, stay dedicated to pt/yoga to squash my achin' back!",
        "parameters": {
            "candidate_labels": classification_categories,
            "multi_label": False
        }
    }

  3. Next, you can invoke a SageMaker endpoint with the zero-shot payload. The SageMaker endpoint is deployed as part of the SageMaker JumpStart solution.
    import json
    import boto3

    runtime = boto3.client('sagemaker-runtime')  # SageMaker runtime client
    # Send the zero-shot payload defined in the previous step
    response = runtime.invoke_endpoint(EndpointName=sagemaker_endpoint_name,
        ContentType='application/json',
        Body=json.dumps(data_zero_shot))

    parsed_response = json.loads(response['Body'].read())

  4. The inference response object contains the original sequence, the labels sorted by score from max to min, and the scores per label:
    {'sequence': "#newyearsresolution :: read more books, no scrolling fb/checking email b4 breakfast, stay dedicated to pt/yoga to squash my achin' back!",
    'labels': ['Personal Growth',
    'Health',
    'Time Management',
    'Leisure',
    'Education',
    'Humor',
    'Career',
    'Philanthropy',
    'Finance'],
    'scores': [0.4198768436908722,
    0.2169460505247116,
    0.16591140627861023,
    0.09742163866758347,
    0.031757451593875885,
    0.027988269925117493,
    0.015974704176187515,
    0.015464971773326397,
    0.008658630773425102]}
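
The preceding steps can be wrapped in a single helper. The following sketch takes the runtime client as a parameter so that it can be exercised without an AWS account; the classify function, the StubRuntime class, and the endpoint name are hypothetical, and the stub returns canned uniform scores rather than real model output.

```python
import io
import json

def classify(runtime, endpoint_name, text, candidate_labels, multi_label=False):
    """Build the zero-shot payload, invoke the endpoint, and parse the reply."""
    payload = {
        "inputs": text,
        "parameters": {
            "candidate_labels": candidate_labels,
            "multi_label": multi_label,
        },
    }
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())

class StubRuntime:
    """Stand-in for a boto3 'sagemaker-runtime' client; returns a canned answer."""

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        request = json.loads(Body)
        labels = request["parameters"]["candidate_labels"]
        # Canned, uniform scores: this is not real model output.
        canned = {
            "sequence": request["inputs"],
            "labels": labels,
            "scores": [1.0 / len(labels)] * len(labels),
        }
        return {"Body": io.BytesIO(json.dumps(canned).encode("utf-8"))}

result = classify(
    StubRuntime(),
    "hypothetical-endpoint",
    "stay dedicated to yoga",
    ["Health", "Finance"],
)
print(result["labels"], result["scores"])
```

To call a deployed endpoint, pass boto3.client('sagemaker-runtime') and the real endpoint name in place of the stub.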

Run a SageMaker batch transform job using the Python SDK

This section describes how to run batch transform inference with the zero-shot classification facebook-bart-large-mnli model using the SageMaker Python SDK. Complete the following steps:

  1. Format the input data in JSON lines format and upload the file to Amazon S3.
    SageMaker batch transform will perform inference on the data points uploaded in the S3 file.
  2. Set up the model deployment artifacts with the following parameters:
    1. model_id – Use huggingface-zstc-facebook-bart-large-mnli.
    2. deploy_image_uri – Use the image_uris Python SDK function to get the pre-built SageMaker Docker image for the model_id. The function returns the Amazon Elastic Container Registry (Amazon ECR) URI.
    3. deploy_source_uri – Use the script_uris utility API to retrieve the S3 URI that contains scripts to run pre-trained model inference. We specify the script_scope as inference.
    4. model_uri – Use the model_uris utility API to get the model artifacts from Amazon S3 for the specified model_id.

      # Imports
      from sagemaker import image_uris, model_uris, script_uris

      # Set the model ID and version ("*" selects the latest version)
      model_id, model_version = (
          "huggingface-zstc-facebook-bart-large-mnli",
          "*",
      )

      # Retrieve the inference Docker container URI. This is the base Hugging Face container image for the default model above.
      deploy_image_uri = image_uris.retrieve(
          region=None,
          framework=None,  # Automatically inferred from model_id
          image_scope="inference",
          model_id=model_id,
          model_version=model_version,
          instance_type="ml.g4dn.xlarge",
      )

      # Retrieve the inference script URI. This includes all dependencies and scripts for model loading, inference handling, and more.
      deploy_source_uri = script_uris.retrieve(model_id=model_id, model_version=model_version, script_scope="inference")

      # Retrieve the model URI. This includes the pre-trained model and parameters.
      model_uri = model_uris.retrieve(model_id=model_id, model_version=model_version, model_scope="inference")

  3. Use HF_TASK to define the task for the Hugging Face transformers pipeline and HF_MODEL_ID to define the model used to classify the text:
    # Hub model configuration <https://huggingface.co/models>
    hub = {
        'HF_MODEL_ID': 'facebook/bart-large-mnli',  # The model_id from the Hugging Face Hub
        'HF_TASK': 'zero-shot-classification'  # The NLP task that you want to use for predictions
    }

    For a complete list of tasks, see Pipelines in the Hugging Face documentation.

  4. Create a Hugging Face model object to be deployed with the SageMaker batch transform job:
    from sagemaker.huggingface import HuggingFaceModel

    # Create HuggingFaceModel class
    # role is assumed to be defined earlier, for example with sagemaker.get_execution_role()
    huggingface_model_zero_shot = HuggingFaceModel(
        model_data=model_uri,  # S3 URI of the pre-trained model artifacts
        env=hub,  # configuration for loading model from Hub
        role=role,  # IAM role with permissions to create an endpoint
        transformers_version="4.17",  # Transformers version used
        pytorch_version="1.10",  # PyTorch version used
        py_version='py38',  # Python version used
    )

  5. Create a transformer to run a batch job:
    from sagemaker.s3 import s3_path_join

    # Create transformer to run a batch job
    batch_job = huggingface_model_zero_shot.transformer(
        instance_count=1,
        instance_type='ml.m5.xlarge',
        strategy='SingleRecord',
        assemble_with='Line',
        output_path=s3_path_join("s3://", sagemaker_config['S3Bucket'], "zero_shot_text_clf", "results"),  # save the output in the same S3 path as the input
    )

  6. Start a batch transform job and use S3 data as input:
    batch_job.transform(
        data=data_upload_path,
        content_type='application/json',
        split_type='Line',
        logs=False,
        wait=True
    )

You can monitor your batch processing job on the SageMaker console (choose Batch transform jobs under Inference in the navigation pane). When the job is complete, you can check the model prediction output in the S3 file specified in output_path.
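
After the output file is downloaded, it can be parsed line by line. Batch transform writes each output object with an .out suffix appended to the input file name. The following sketch assumes that, with line-based assembly, every line holds one JSON result with the same fields as the real-time response; the helper names and the sample text are hypothetical.

```python
import json

def parse_batch_output(text):
    """Parse line-assembled output: one JSON document per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def top_predictions(results):
    """Return (sequence, best label) pairs, assuming labels are sorted by score."""
    return [(r["sequence"], r["labels"][0]) for r in results]

# Hypothetical contents of a downloaded .out file (made-up scores).
sample_output = (
    '{"sequence": "read more books", "labels": ["Personal Growth", "Finance"], "scores": [0.9, 0.1]}\n'
    '{"sequence": "save money", "labels": ["Finance", "Personal Growth"], "scores": [0.8, 0.2]}\n'
)

results = parse_batch_output(sample_output)
for sequence, label in top_predictions(results):
    print(f"{sequence!r} -> {label}")
```

If the actual output uses a different layout, only parse_batch_output needs to change.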

For a list of all the available pre-trained models in SageMaker JumpStart, refer to Built-in Algorithms with pre-trained Model Table. Use the keyword “zstc” (short for zero-shot text classification) in the search bar to locate all the models capable of doing zero-shot text classification.

Clean up

After you’re done running the notebook, make sure to delete all resources created in the process to ensure that the costs incurred by the assets deployed in this guide are stopped. The code to clean up the deployed resources is provided in the notebooks associated with the zero-shot text classification solution and model.
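
As a rough idea of what endpoint cleanup involves, the following sketch deletes an endpoint together with its endpoint configuration and the models it references. It is not the cleanup code from the notebooks; the helper and the RecordingClient stub are hypothetical, and the stub only records calls so the example runs without touching any AWS resources. It does not cover the S3 bucket or other resources created by the solution.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint, its endpoint configuration, and the models it references."""
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [v["ModelName"] for v in config["ProductionVariants"]]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for name in model_names:
        sm_client.delete_model(ModelName=name)
    return config_name, model_names

class RecordingClient:
    """Stand-in for boto3.client('sagemaker'); records calls instead of deleting."""

    def __init__(self):
        self.calls = []

    def describe_endpoint(self, EndpointName):
        return {"EndpointConfigName": EndpointName + "-config"}

    def describe_endpoint_config(self, EndpointConfigName):
        return {"ProductionVariants": [{"ModelName": "hypothetical-model"}]}

    def delete_endpoint(self, EndpointName):
        self.calls.append(("delete_endpoint", EndpointName))

    def delete_endpoint_config(self, EndpointConfigName):
        self.calls.append(("delete_endpoint_config", EndpointConfigName))

    def delete_model(self, ModelName):
        self.calls.append(("delete_model", ModelName))

client = RecordingClient()
deleted = delete_endpoint_resources(client, "hypothetical-endpoint")
print(client.calls)
```

With a real account, boto3.client('sagemaker') would take the place of the stub.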

Default security configurations

The SageMaker JumpStart models are deployed using the following default security configurations:

To learn more about SageMaker security-related topics, check out Configure security in Amazon SageMaker.

Conclusion

In this post, we showed you how to deploy a zero-shot classification model using the SageMaker JumpStart UI and perform inference using the deployed endpoint. We used the SageMaker JumpStart New Year’s resolutions solution to show how you can use the SageMaker Python SDK to build an end-to-end solution and implement a zero-shot classification application. SageMaker JumpStart provides access to hundreds of pre-trained models and solutions for tasks like computer vision, natural language processing, recommendation systems, and more. Try out the solution on your own and let us know your thoughts.


About the authors

David Laredo is a Prototyping Architect at AWS Envision Engineering in LATAM, where he has helped develop multiple machine learning prototypes. Previously, he worked as a Machine Learning Engineer and has been doing machine learning for over 5 years. His areas of interest are NLP, time series, and end-to-end ML.

Vikram Elango is an AI/ML Specialist Solutions Architect at Amazon Web Services, based in Virginia, US. Vikram helps financial and insurance industry customers with design and thought leadership to build and deploy machine learning applications at scale. He is currently focused on natural language processing, responsible AI, inference optimization, and scaling ML across the enterprise. In his spare time, he enjoys traveling, hiking, cooking, and camping with his family.

Dr. Vivek Madan is an Applied Scientist with the Amazon SageMaker JumpStart team. He got his PhD from the University of Illinois at Urbana-Champaign and was a Post Doctoral Researcher at Georgia Tech. He is an active researcher in machine learning and algorithm design and has published papers in EMNLP, ICLR, COLT, FOCS, and SODA conferences.
