Fine-tune large language models with Amazon SageMaker Autopilot

Fine-tuning a foundation model (FM) involves exposing a pre-trained FM to task-specific data and adjusting its parameters. The fine-tuned model can then develop a deeper understanding of that particular domain and produce more accurate and relevant outputs for it.

In this post, we show how to use an Amazon SageMaker Autopilot training job with the AutoMLV2 SDK to fine-tune a Meta Llama2-7B model on question answering tasks. Specifically, we train the model on multiple-choice science exam questions covering physics, chemistry, and biology. This fine-tuning approach can be extended to other tasks, such as summarization or text generation, in domains like healthcare, education, or financial services.

AutoMLV2 supports the instruction-based fine-tuning of a selection of general-purpose FMs powered by Amazon SageMaker JumpStart. We use Amazon SageMaker Pipelines, which helps automate the different steps, including data preparation, fine-tuning, and creating the model. We use the open source library fmeval to evaluate the model and register it in the Amazon SageMaker Model Registry based on its performance.

Solution overview

The following architecture diagram shows the various steps involved to create an automated and scalable process to fine-tune large language models (LLMs) using AutoMLV2. The AutoMLV2 SDK simplifies the process of creating and managing AutoML jobs by providing high-level functions and abstractions, making it straightforward for developers who may not be familiar with AutoML concepts. The CreateAutoMLJobV2 API offers a low-level interface that allows for more control and customization. Using the SDK offers benefits like faster prototyping, better usability, and pre-built functions, and the API is better for advanced customizations.

[Image: solution architecture diagram]

To implement the solution, we use SageMaker Pipelines in Amazon SageMaker Studio to orchestrate the different steps. The solution consists of two pipelines: training and inference.

To create the training pipeline, you complete the following steps:

  1. Load and prepare the dataset.
  2. Create a SageMaker Autopilot CreateAutoMLJobV2 training job.
  3. Check the training job status.
  4. Deploy the best candidate model.

The following steps configure the inference pipeline:

  1. Preprocess data for evaluation.
  2. Evaluate the model using the fmeval library.
  3. Register the model if it meets the required performance.

To deploy the solution, refer to the GitHub repo, which provides step-by-step instructions for fine-tuning Meta Llama2-7B using SageMaker Autopilot and SageMaker Pipelines.

Prerequisites

For this walkthrough, complete the following prerequisite steps:

  1. Set up an AWS account.
  2. Create a SageMaker Studio environment.
  3. Create two AWS Identity and Access Management (IAM) roles: LambdaExecutionRole and SageMakerExecutionRole, with permissions as outlined in the SageMaker notebook. The managed policies should be scoped down further for improved security. For instructions, refer to Create a role to delegate permissions to an IAM user.
  4. On the SageMaker Studio console, upload the code from the GitHub repo.
  5. Open the SageMaker notebook (.ipynb file) and run the cells.

Training pipeline

The following training pipeline shows a streamlined way to automate the fine-tuning of a pre-trained LLM and the deployment of the model to a real-time inference endpoint.

[Image: training pipeline diagram]

Prepare the data

For this project, we used the SciQ dataset, which contains science exam questions about physics, chemistry, biology, and other subjects. SageMaker Autopilot supports instruction-based fine-tuning datasets formatted as CSV files (default) or as Parquet files.

When you prepare your CSV file, make sure that it contains exactly two columns:

  • The input column must be in string format and contains the prompt.
  • The output column must be in string format and contains the ground truth answer.
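Before uploading a file, it can help to check it against the two-column layout described above. The following is a minimal sketch using only the Python standard library; the function name validate_fine_tuning_csv and the error messages are illustrative, and the check assumes the columns are literally named input and output.

```python
import csv
import io

REQUIRED_COLUMNS = {"input", "output"}


def validate_fine_tuning_csv(handle):
    """Return a list of problems found in the CSV; an empty list means it looks valid."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    if len(header) != 2 or set(header) != REQUIRED_COLUMNS:
        return [f"expected columns {sorted(REQUIRED_COLUMNS)}, found {header}"]
    problems = []
    # Record 1 is the header, so data records start at 2.
    for record_number, row in enumerate(reader, start=2):
        if None in row:  # DictReader stores surplus fields under the key None
            problems.append(f"record {record_number}: too many fields")
        for column in header:
            value = row.get(column)
            if value is None or not value.strip():
                problems.append(f"record {record_number}: empty '{column}'")
    return problems


sample = io.StringIO("input,output\nWhat gas do plants absorb?,carbon dioxide\n")
print(validate_fine_tuning_csv(sample))
```

The same function accepts an open file handle, for example open("train.csv", newline="", encoding="utf-8"), where train.csv is a placeholder file name.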

In this project, we start by removing the irrelevant columns. Next, we combine the question and support columns to create a comprehensive prompt, which is then placed in the input column. SageMaker Autopilot sets a maximum limit on the number of rows in the dataset and the context length based on the type of model being used. We select 10,000 rows from the dataset.

Finally, we divide the data into training and validation sets:

# Imports required by this snippet
import pandas as pd
from datasets import Dataset, load_dataset

# Load and split the dataset. Change this to your own dataset.
dataset = load_dataset("allenai/sciq", split="train")
dataset = dataset.train_test_split(test_size=0.1, shuffle=True)
dataset_training_df = pd.DataFrame(dataset["train"])
dataset_validation_df = pd.DataFrame(dataset["test"])
# Keep 10,000 rows for training
dataset_training_df = dataset_training_df.sample(n=10000, random_state=42, ignore_index=True)

# Prepare the training dataset to fit the Autopilot job
fields = ["question", "correct_answer", "support"]
dataset_train_ist_df = dataset_training_df[fields].copy()
dataset_fine_tune_ist = Dataset.from_pandas(dataset_train_ist_df)
dataset_fine_tune_ist_cpy = dataset_train_ist_df.copy()
# Build the prompt from the question (instruction) and the support text (context)
dataset_fine_tune_ist_cpy["input"] = (
    "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n### Instruction:\n"
    + dataset_fine_tune_ist_cpy["question"]
    + "\n\n### Input:\n"
    + dataset_fine_tune_ist_cpy["support"]
)
# The ground truth answer goes in the output column
dataset_fine_tune_ist_cpy["output"] = dataset_fine_tune_ist_cpy["correct_answer"]
autopilot_fields = ["input", "output"]
dataset_fine_tune = Dataset.from_pandas(dataset_fine_tune_ist_cpy[autopilot_fields])
# train_dataset_s3_path is defined elsewhere in the notebook
dataset_fine_tune.to_csv(train_dataset_s3_path, index=False)

Create a CreateAutoMLJobV2 training job

AutoMLV2 makes it straightforward to train, optimize, and deploy machine learning (ML) models by automating the tasks involved in the ML development lifecycle. It provides a simple approach to create highly accurate models tailored to your specific problem type, whether it’s classification, regression, forecasting, or others. In this section, we go through the steps to train a model with AutoMLV2, using an LLM fine-tuning job as an example. For this project, we used the Meta Llama2-7B model. You can change the model by choosing from the supported LLMs for fine-tuning.

Define the text generation configuration

AutoMLV2 automates the entire ML process, from data preprocessing to model training and deployment. However, for AutoMLV2 to work effectively, it’s crucial to provide the right problem configuration. This configuration acts as a guide, helping SageMaker Autopilot understand the nature of your problem and select the most appropriate algorithm or approach. By specifying details such as the problem type (such as classification, regression, forecasting, or fine-tuning), you give AutoMLV2 the necessary information to tailor its solution to your specific requirements.

For a fine-tuning job, the configuration consists of determining the model to be used and its access configuration, in addition to the hyperparameters that optimize the model learning process. See the following code:

from sagemaker.automl.automlv2 import AutoMLTextGenerationConfig

text_generation_config = AutoMLTextGenerationConfig(
    base_model_name="Llama2-7B",
    accept_eula=True,
    text_generation_hyper_params={"epochCount": "3", "learningRate": "0.00001",
                                  "batchSize": "1", "learningRateWarmupSteps": "1"},
)

The definitions of each parameter used in text_generation_config are:

  • base_model_name – The name of the base model to fine-tune. SageMaker Autopilot supports fine-tuning a variety of LLMs. If no value is provided, the default model used is Falcon7BInstruct.
  • accept_eula – Controls access to the model. The value is set to True to accept the model end-user license agreement (EULA). This setting is necessary for models like Meta Llama2-7B, which require accepting the license terms before they can be used.
  • epochCount – The number of times the model goes through the entire training dataset. Its value should be a string containing an integer value within the range of 1–10. One epoch means the Meta Llama2-7B model has been exposed to the 10,000 samples and had a chance to learn from them. You can set it to 3, meaning the model will make three complete passes, or increase the number if the model doesn’t converge with just three epochs.
  • learningRate – The step size at which a model’s parameters are updated during training. Its value should be a string containing a floating-point value within the range of 0–1. A learning rate of 0.00001 or 0.00002 is a good standard when fine-tuning LLMs like Meta Llama2-7B.
  • batchSize – The number of data samples used in each iteration of training. Its value should be a string containing an integer value within the range of 1–64. Start with 1 to avoid an out-of-memory error.
  • learningRateWarmupSteps – The number of training steps during which the learning rate gradually increases before reaching its target or maximum value. Its value should be a string containing an integer value within the range of 0–250. Start with 1.

The configuration settings can be adjusted to align with your specific requirements and the chosen FM.
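Because every hyperparameter is passed as a string, a typo such as a decimal comma is only caught when the job is submitted. The following sketch checks the values locally against the ranges listed above. The function name and messages are illustrative, and treating the range endpoints as inclusive is an assumption.

```python
# Ranges taken from the parameter descriptions above; inclusive bounds are an assumption.
HYPERPARAMETER_RANGES = {
    "epochCount": (int, 1, 10),
    "learningRate": (float, 0.0, 1.0),
    "batchSize": (int, 1, 64),
    "learningRateWarmupSteps": (int, 0, 250),
}


def validate_hyper_params(params):
    """Return a list of problems; an empty list means every value parsed and is in range."""
    problems = []
    for name, raw in params.items():
        if name not in HYPERPARAMETER_RANGES:
            problems.append(f"{name}: unknown hyperparameter")
            continue
        caster, low, high = HYPERPARAMETER_RANGES[name]
        if not isinstance(raw, str):
            problems.append(f"{name}: value must be a string, got {type(raw).__name__}")
            continue
        try:
            value = caster(raw)
        except ValueError:
            problems.append(f"{name}: '{raw}' is not a valid {caster.__name__}")
            continue
        if not low <= value <= high:
            problems.append(f"{name}: {raw} is outside the range {low}-{high}")
    return problems


print(validate_hyper_params({"epochCount": "3", "learningRate": "0,00001"}))
```

In this example, the decimal comma in learningRate is reported as an invalid float, while epochCount passes.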

Start the AutoMLV2 job

Next, set up the AutoMLV2 job by providing the problem configuration details, the AWS role with the necessary permissions, a name for job identification, and the output path where the model artifacts will be saved. To initiate the training process in a pipeline step, we invoke the create_auto_ml_job_v2 method. In the following code snippet, create_auto_ml_job_v2 is called to create an AutoML job with specific inputs. The AutoMLJobInputDataConfig parameter takes a list containing one data channel, which specifies the S3 data type (in this case, S3Prefix) and the location of the training dataset in an S3 bucket (passed in as TrainDatasetS3Path). ChannelType is set to training, indicating that this dataset is used for training the model.

# event carries the inputs passed to this pipeline step
sagemaker_client.create_auto_ml_job_v2(
    AutoMLJobName=event["AutopilotJobName"],
    AutoMLJobInputDataConfig=[
        {
            "ChannelType": "training",
            "CompressionType": "None",
            "ContentType": "text/csv;header=present",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": event["TrainDatasetS3Path"],
                }
            },
        }
    ],
    DataSplitConfig={"ValidationFraction": 0.1},
    OutputDataConfig={"S3OutputPath": event["TrainingOutputS3Path"]},
    AutoMLProblemTypeConfig={
        "TextGenerationJobConfig": {
            "BaseModelName": event["BaseModelName"],
            "TextGenerationHyperParameters": {
                "epochCount": event["epochCount"],
                "learningRate": event["learningRate"],
                "batchSize": event["batchSize"],
                "learningRateWarmupSteps": event["learningRateWarmupSteps"],
            },
            "ModelAccessConfig": {"AcceptEula": True},
        }
    },
    RoleArn=event["AutopilotExecutionRoleArn"],
)

Check SageMaker Autopilot job status

This step tracks the status of the Autopilot training job. In the script check_autopilot_job_status.py, we repeatedly check the status of the training job until it’s complete.

The callback step sends a token to an Amazon Simple Queue Service (Amazon SQS) queue, which invokes the AWS Lambda function to check the training job status. If the job is complete, the Lambda function sends a success message back to the callback step and the pipeline continues with the next step.
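The status check described above might look like the following sketch. It assumes the callback token has already been extracted from the SQS message, and it takes the client as an argument so the logic can be tried with a stand-in. The function name, the output parameter name final_status, and the StubClient class are illustrative and are not taken from check_autopilot_job_status.py. The three client operations (describe_auto_ml_job_v2, send_pipeline_execution_step_success, and send_pipeline_execution_step_failure) are boto3 SageMaker client methods.

```python
def report_job_status(sagemaker_client, job_name, callback_token):
    """Check an AutoML job once and notify the callback step if the job has finished."""
    response = sagemaker_client.describe_auto_ml_job_v2(AutoMLJobName=job_name)
    status = response["AutoMLJobStatus"]
    if status == "Completed":
        sagemaker_client.send_pipeline_execution_step_success(
            CallbackToken=callback_token,
            OutputParameters=[{"Name": "final_status", "Value": status}],
        )
    elif status in ("Failed", "Stopped"):
        sagemaker_client.send_pipeline_execution_step_failure(
            CallbackToken=callback_token,
            FailureReason=response.get("FailureReason", f"AutoML job ended as {status}"),
        )
    # Any other status (for example InProgress) sends nothing, so the check can be retried.
    return status


class StubClient:
    """Stand-in for boto3.client("sagemaker") so the sketch runs without AWS access."""

    def __init__(self, status):
        self.status = status
        self.sent = []

    def describe_auto_ml_job_v2(self, AutoMLJobName):
        return {"AutoMLJobStatus": self.status}

    def send_pipeline_execution_step_success(self, **kwargs):
        self.sent.append("success")

    def send_pipeline_execution_step_failure(self, **kwargs):
        self.sent.append("failure")


stub = StubClient("Completed")
print(report_job_status(stub, "my-autopilot-job", "example-token"), stub.sent)
```

One possible design, not confirmed by the source, is to raise an exception from the Lambda function while the job is still in progress so that the SQS message is delivered again later.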

Deploy a model with AutoMLV2 using real-time inference

AutoMLV2 simplifies model deployment by automating the process from training to deployment. It takes care of the heavy lifting involved in selecting the best-performing model and preparing it for production use: it can directly create a SageMaker model from the best candidate and deploy it to a SageMaker endpoint with just a few lines of code.

In this section, we look at the code that deploys the best-performing model to a real-time SageMaker endpoint.

This pipeline step uses a Lambda step, which runs a serverless Lambda function. We use a Lambda step because the API call to create and deploy the SageMaker model is lightweight.

The first stage after the completion of the AutoMLV2 training process is to select the best candidate, making sure that the most accurate and efficient solution is chosen for deployment. We use the method describe_auto_ml_job_v2 to retrieve detailed information about a specific AutoMLV2 job. This method provides insights into the current status, configuration, and output of your AutoMLV2 job, allowing you to monitor its progress and access relevant information. See the following code:

autopilot_job = sagemaker_client.describe_auto_ml_job_v2(
    AutoMLJobName=event["autopilot_job_name"]
)
best_candidate = autopilot_job["BestCandidate"]

In SageMaker Autopilot, the best candidate model is selected based on minimizing cross-entropy loss, a default metric that measures the dissimilarity between predicted and actual word distributions during fine-tuning. Additionally, the model’s quality is evaluated using metrics like ROUGE scores (ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-L-Sum), which measure the similarity between machine-generated text and human-written reference text, along with perplexity, which assesses how well the model predicts the next word in a sequence. The model with the lowest cross-entropy and perplexity, combined with strong ROUGE scores, is considered the best candidate.
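The two ranking metrics mentioned above are closely related. When cross-entropy is the average per-token negative log-likelihood measured with the natural logarithm (an assumption here; the source doesn't state the exact definition Autopilot uses), perplexity is its exponential, so ranking by one gives the same order as ranking by the other. The candidate names and loss values in this sketch are made up for illustration.

```python
import math


def perplexity_from_cross_entropy(loss):
    """Perplexity is exp(loss) when loss is the mean per-token cross-entropy in nats."""
    return math.exp(loss)


# Hypothetical validation losses for two candidates
candidate_losses = {"candidate-a": 1.20, "candidate-b": 0.95}

best_by_loss = min(candidate_losses, key=candidate_losses.get)
perplexities = {name: perplexity_from_cross_entropy(loss) for name, loss in candidate_losses.items()}
best_by_perplexity = min(perplexities, key=perplexities.get)

print(best_by_loss, best_by_perplexity)
```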

With the best candidate model identified, you can create a SageMaker model object, encapsulating the trained model artifacts and necessary dependencies. For that, we use the create_model method of the SageMaker client:

best_candidate_name = best_candidate["CandidateName"]
# The first inference container holds the image URI and the model artifacts
container = best_candidate["InferenceContainers"][0]
response = sagemaker_client.create_model(
    ModelName=best_candidate_name,
    PrimaryContainer={
        "Image": container["Image"],
        "ModelDataUrl": container["ModelDataUrl"],
        "ImageConfig": {"RepositoryAccessMode": "Platform"},
        "Environment": {
            "HUGGINGFACE_HUB_CACHE": "/tmp",
            "TRANSFORMERS_CACHE": "/tmp",
            "HF_MODEL_ID": "/opt/ml/model",
        },
    },
    ExecutionRoleArn=event["AutopilotExecutionRoleArn"],
)

Next, we create a SageMaker endpoint configuration and deploy a SageMaker endpoint for real-time inference using the best candidate model. We use the instance type ml.g5.12xlarge to deploy the model. You may need to increase your quota to use this instance.

# model_name must match the ModelName passed to create_model
endpoint_name = f"ep-{model_name}-automl"
endpoint_config_name = f"{model_name}-endpoint-config"
endpoint_configuration = sagemaker_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "Variant-1",
            "ModelName": model_name,
            "InstanceType": "ml.g5.12xlarge",
            "InitialInstanceCount": 1,
        }
    ],
)
response = sagemaker_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name,
)
endpoint_arn = response["EndpointArn"]
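After the endpoint is in service, it can be queried with a prompt built from the same template used to create the training input column. The following sketch is illustrative only: the request shape (an inputs string plus a parameters object) and the max_new_tokens setting are assumptions about the serving container, not something the source confirms, and the function names are hypothetical. invoke_endpoint is the boto3 SageMaker Runtime operation.

```python
import json

# Same instruction template as the training data preparation step
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{question}\n\n### Input:\n{support}"
)


def build_payload(question, support, max_new_tokens=64):
    """Build a request body; the 'inputs'/'parameters' layout is an assumed format."""
    prompt = PROMPT_TEMPLATE.format(question=question, support=support)
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}


def ask_endpoint(runtime_client, endpoint_name, question, support):
    """Send one question to the endpoint using a boto3 'sagemaker-runtime' client."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(build_payload(question, support)),
    )
    return json.loads(response["Body"].read())


payload = build_payload(
    "What gas do plants absorb during photosynthesis?",
    "Plants take in carbon dioxide and release oxygen.",
)
print(payload["inputs"])
```

With real clients, ask_endpoint would be called with boto3.client("sagemaker-runtime") once the endpoint status is InService.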

Inference pipeline

The inference pipeline is used for batch inference. It demonstrates a way to deploy and evaluate an FM and register it in SageMaker Model Registry. The following diagram shows the workflow starting with a preprocess data step, through model inference, to post-inference evaluation and conditional model registration.

[Image: inference pipeline diagram]

Preprocess data for evaluation

The first crucial step in evaluating the performance of the fine-tuned LLM is to preprocess the data for evaluation. This preprocessing stage involves transforming the data into a format suitable for the evaluation process and verifying the compatibility with the chosen evaluation library.

In this particular case, we use a pipeline step to prepare the data for evaluation. The preprocessing script (preprocess_evaluation.py) creates a .jsonl (JSON Lines) file, which serves as the test dataset for the evaluation phase. The JSON Lines format is a convenient way to store structured data, where each line represents a single JSON object.

This test dataset is crucial for obtaining an unbiased evaluation of the model’s generalization capabilities and its ability to handle new, previously unseen inputs. After the evaluation_dataset.jsonl file is created, it’s saved in the appropriate path in an Amazon Simple Storage Service (Amazon S3) bucket.
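To make the JSON Lines format concrete, the following sketch writes a small evaluation file with one JSON object per line and reads it back. The field names model_input and target_output, the sample records, and the helper names are assumptions for illustration; preprocess_evaluation.py may use different ones. The file is written to a temporary local directory rather than to Amazon S3.

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dictionaries."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Hypothetical evaluation records
records = [
    {"model_input": "What gas do plants absorb?", "target_output": "carbon dioxide"},
    {"model_input": "What organelle produces most of a cell's ATP?", "target_output": "mitochondria"},
]
path = os.path.join(tempfile.mkdtemp(), "evaluation_dataset.jsonl")
write_jsonl(records, path)
print(read_jsonl(path) == records)
```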

Evaluate the model using the fmeval library

SageMaker Autopilot streamlines the entire ML workflow, automating steps from data preprocessing to model evaluation. After training multiple models, SageMaker Autopilot automatically ranks them based on selected performance metrics, such as cross-entropy loss for text generation tasks, and identifies the best-performing model.

However, when deeper, more granular insights are required, particularly during post-training evaluation with a testing dataset, we use fmeval, an open source library tailored for evaluating FMs. Fmeval provides enhanced flexibility and control, allowing for a comprehensive assessment of model performance using custom metrics tailored to the specific use case. This makes sure the model behaves as expected in real-world applications. Fmeval facilitates the evaluation of LLMs across a broad range of tasks, including open-ended text generation, summarization, question answering, and classification. Additionally, fmeval assesses models on metrics such as accuracy, toxicity, semantic robustness, and prompt stereotyping, helping identify the optimal model for diverse use cases while maintaining ethical and robust performance.

To start using the library, follow these steps:

  1. Create a ModelRunner that can perform invocation on your LLM. ModelRunner encapsulates the logic for invoking different types of LLMs, exposing a predict method to simplify interactions with LLMs within the eval algorithm code. For this project, we use SageMakerModelRunner from fmeval.
  2. In the Python script used by our pipeline, create a DataConfig object to use the evaluation_dataset created in the previous step.
  3. Next, use an evaluation algorithm with the custom dataset. For this project, we use the QAAccuracy algorithm, which measures how well the model performs in question answering tasks. The model is queried for a range of facts, and we evaluate the accuracy of its response by comparing the model’s output to target answers under different metrics:
    1. Exact match (EM) – Binary score. 1 if model output and target answer match exactly, 0 otherwise.
    2. Quasi-exact match – Binary score. Similar to exact match, but both model output and target answer are normalized first by removing articles and punctuation.
    3. Precision over words – The fraction of words in the prediction that are also found in the target answer. The text is normalized as before.
    4. Recall over words – The fraction of words in the target answer that are also found in the prediction.
    5. F1 over words – The harmonic mean of precision and recall over words (normalized).
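The five metrics listed above can be illustrated with a few lines of plain Python. This is a simplified sketch and not fmeval's implementation: the normalization (lowercasing, then removing punctuation and the articles a, an, and the) and the use of word sets for precision and recall are assumptions made here to keep the example short.

```python
import string

ARTICLES = {"a", "an", "the"}


def normalize(text):
    """Lowercase, strip punctuation, and drop articles; returns a list of words."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    return [word for word in text.split() if word not in ARTICLES]


def qa_scores(prediction, target):
    """Compute the five question answering accuracy scores for one prediction."""
    pred_words, target_words = normalize(prediction), normalize(target)
    pred_set, target_set = set(pred_words), set(target_words)
    common = pred_set & target_set
    precision = len(common) / len(pred_set) if pred_set else 0.0
    recall = len(common) / len(target_set) if target_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "exact_match": float(prediction == target),
        "quasi_exact_match": float(pred_words == target_words),
        "precision_over_words": precision,
        "recall_over_words": recall,
        "f1_over_words": f1,
    }


print(qa_scores("The mitochondria.", "mitochondria"))
print(qa_scores("carbon dioxide gas", "carbon dioxide"))
```

In the first call, the prediction fails the exact match but passes the quasi-exact match, because the article and the period are removed during normalization. In the second call, the extra word lowers precision while recall stays at 1.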

As an output, the evaluation step produces a file (evaluation_metrics.json) that contains the computed metrics. This file is stored in Amazon S3 and is registered as a property file for later access in the pipeline.

Register the model

Before registering the fine-tuned model, we introduce a quality control step by implementing a condition based on the evaluation metrics obtained from the previous step. Specifically, we focus on the F1 score metric, which measures the harmonic mean of precision and recall between the normalized response and reference.

To make sure that only high-performing models are registered and deployed, we set a predetermined threshold for the F1 score metric. If the model’s performance meets or exceeds this threshold, it is suitable for registration and deployment. However, if the model fails to meet the specified threshold, the pipeline concludes without registering the model, stopping the deployment of suboptimal models.
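The registration gate reduces to a single comparison against the threshold. The following sketch shows the rule in plain Python; the metric key f1_score, the flat JSON layout, and the threshold values are hypothetical, and in the pipeline itself this decision is made by the condition step rather than by a helper like this one.

```python
import json


def meets_threshold(metrics, threshold, metric_name="f1_score"):
    """Return True when the metric meets or exceeds the threshold."""
    return float(metrics[metric_name]) >= threshold


# Hypothetical content of an evaluation metrics file
evaluation_metrics = json.loads('{"f1_score": 0.82}')

for threshold in (0.80, 0.82, 0.90):
    decision = "register" if meets_threshold(evaluation_metrics, threshold) else "skip"
    print(threshold, decision)
```

A score equal to the threshold counts as passing, which matches the "meets or exceeds" wording above.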

Create and run the pipeline

After we define the pipeline steps, we combine them into a SageMaker pipeline. The steps are run sequentially. The pipeline runs the steps for an AutoML job, using SageMaker Autopilot for training, model evaluation, and model registration. See the following code:

from sagemaker.workflow.pipeline import Pipeline

pipeline = Pipeline(
    name="training-pipeline",
    parameters=[
        evaluation_dataset_s3_path,
        model_name,
        metrics_report_s3_path,
        output_s3_path,
        model_package_name,
        model_approval_status,
    ],
    steps=[
        step_preprocess_evaluation_data,
        step_evaluate_autopilot_model,
        step_condition,
        step_register_autopilot_model,
    ],
    sagemaker_session=sagemaker_session,
)
pipeline.upsert(role_arn=SAGEMAKER_EXECUTION_ROLE_ARN)
pipeline_execution = pipeline.start()
# Poll every 20 seconds, for at most 24 hours
pipeline_execution.wait(delay=20, max_attempts=24 * 60 * 3)

Clean up

To avoid unnecessary charges and maintain a clean environment after running the demos outlined in this post, it’s important to delete all deployed resources. Follow these steps to properly clean up:

  1. To delete deployed endpoints, use the SageMaker console or the AWS SDK. This step is essential because endpoints can accrue significant charges if left running.
  2. Delete both SageMaker pipelines created during this walkthrough. This will help prevent residual executions that might generate additional costs.
  3. Remove all artifacts stored in your S3 buckets that were used for training, storing model artifacts, or logging. Make sure you delete only the resources related to this project to help avoid data loss.
  4. Clean up any additional resources. Depending on your implementation and any additional configurations, there may be other resources to consider, such as IAM roles, Amazon CloudWatch logs, or other AWS services. Identify and delete any resources that are no longer needed.
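The first two cleanup steps can also be scripted with the AWS SDK. The following sketch uses the boto3 SageMaker operations delete_endpoint, delete_endpoint_config, delete_model, and delete_pipeline. The helper names, the resource names in the example, and the RecordingClient stand-in are illustrative; with a real account you would pass boto3.client("sagemaker") and your own resource names.

```python
import uuid


def delete_inference_resources(sagemaker_client, endpoint_name, endpoint_config_name, model_name):
    """Delete the endpoint first, then its configuration, then the model."""
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    sagemaker_client.delete_model(ModelName=model_name)


def delete_pipelines(sagemaker_client, pipeline_names):
    """Delete each pipeline; the request token makes the call idempotent."""
    for name in pipeline_names:
        sagemaker_client.delete_pipeline(
            PipelineName=name,
            ClientRequestToken=str(uuid.uuid4()),
        )


class RecordingClient:
    """Stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, operation):
        def record(**kwargs):
            self.calls.append((operation, kwargs))
            return {}
        return record


client = RecordingClient()
delete_inference_resources(client, "ep-example-automl", "example-endpoint-config", "example")
delete_pipelines(client, ["training-pipeline"])
print([operation for operation, _ in client.calls])
```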

Conclusion

In this post, we explored how AutoMLV2 streamlines the process of fine-tuning FMs by automating the heavy lifting involved in model development. We demonstrated an end-to-end solution that uses SageMaker Pipelines to orchestrate the steps of data preparation, model training, evaluation, and deployment. The fmeval library played a crucial role in assessing the fine-tuned LLM’s performance, enabling us to select the best-performing model based on relevant metrics. By seamlessly integrating with the SageMaker infrastructure, AutoMLV2 simplified the deployment process, allowing us to create a SageMaker endpoint for real-time inference with just a few lines of code.

Get started by accessing the code on the GitHub repo to train and deploy your own custom AutoML models.

For more information on SageMaker Pipelines and SageMaker Autopilot, refer to Amazon SageMaker Pipelines and SageMaker Autopilot, respectively.


About the Author

Hajer Mkacher is a Solutions Architect at AWS, specializing in the Healthcare and Life Sciences industries. With over a decade in software engineering, she leverages generative AI to create innovative solutions, acting as a trusted advisor to her customers. In her free time, Hajer enjoys painting or working on creative robotics projects with her family.

Efficiency Meets Personalization: How AI Agents Improve Customer Service

Editor’s note: This post is the first in the AI On blog series, which explores the latest techniques and real-world applications of agentic AI, chatbots and copilots. The series will also highlight the NVIDIA software and hardware powering advanced AI agents, which form the foundation of AI query engines that gather insights and perform tasks to transform everyday experiences and reshape industries.

Whether it’s getting a complex service claim resolved or having a simple purchase inquiry answered, customers expect timely, accurate responses to their requests.

AI agents can help organizations meet this need. And they can grow in scope and scale as businesses grow, helping keep customers from taking their business elsewhere.

AI agents can be used as virtual assistants, which use artificial intelligence and natural language processing to handle high volumes of customer service requests. By automating routine tasks, AI agents ease the workload on human agents, allowing them to focus on tasks requiring a more personal touch.

AI-powered customer service tools like chatbots have become table stakes across every industry looking to increase efficiency and keep buyers happy. According to a recent IDC study on conversational AI, 41% of organizations use AI-powered copilots for customer service and 60% have implemented them for IT help desks.

Now, many of those same industries are looking to adopt agentic AI, semi-autonomous tools that have the ability to perceive, reason and act on more complex problems.

How AI Agents Enhance Customer Service

A primary value of AI-powered systems is the time they free up by automating routine tasks. AI agents can perform specific tasks, or agentic operations, essentially becoming part of an organization’s workforce — working alongside humans who can focus on more complex customer issues.

AI agents can handle predictive tasks and problem-solve, can be trained to understand industry-specific terms and can pull relevant information from an organization’s knowledge bases, wherever that data resides.

With AI agents, companies can:

  • Boost efficiency: AI agents handle common questions and repetitive tasks, allowing support teams to prioritize more complicated cases. This is especially useful during high-demand periods.
  • Increase customer satisfaction: Faster, more personalized interactions result in happier and more loyal customers. Consistent and accurate support improves customer sentiment and experience.
  • Scale easily: Equipped to handle high volumes of customer support requests, AI agents scale effortlessly with growing businesses, reducing customer wait times and resolving issues faster.

AI Agents for Customer Service Across Industries

AI agents are transforming customer service across sectors, helping companies enhance customer conversations, achieve high resolution rates and improve human representative productivity.

For instance, ServiceNow recently introduced IT and customer service management AI agents to boost productivity by autonomously solving many employee and customer issues. Its agents can understand context, create step-by-step resolutions and get live agent approvals when needed.

To improve patient care and reduce preprocedure anxiety, The Ottawa Hospital is using AI agents that have consistent, accurate and continuous access to information. The agents have the potential to improve patient care and reduce administrative tasks for doctors and nurses.

The city of Amarillo, Texas, uses a multilingual digital assistant named Emma to provide its residents with 24/7 support. Emma brings more effective and efficient disbursement of important information to all residents, including the one-quarter who don’t speak English.

AI agents meet current customer service demands while preparing organizations for the future.

Key Steps for Designing AI Virtual Assistants for Customer Support

AI agents for customer service come in a wide range of designs, from simple text-based virtual assistants that resolve customer issues, to animated avatars that can provide a more human-like experience.

Digital human interfaces can add warmth and personality to the customer experience. These agents respond with spoken language and even animated avatars, enhancing service interactions with a touch of real-world flair. A digital human interface lets companies customize the assistant’s appearance and tone, aligning it with the brand’s identity.

There are three key building blocks to creating an effective AI agent for customer service:

  • Collect and organize customer data: AI agents need a solid base of customer data (such as profiles, past interactions, and transaction histories) to provide accurate, context-aware responses.
  • Use memory functions for personalization: Advanced AI systems remember past interactions, allowing agents to deliver personalized support that feels human.
  • Build an operations pipeline: Customer service teams should regularly review feedback and update the AI agent’s responses to ensure it’s always improving and aligned with business goals.

Powering AI Agents With NVIDIA NIM Microservices

NVIDIA NIM microservices power AI agents by enabling natural language processing, contextual retrieval and multilingual communication. This allows AI agents to deliver fast, personalized and accurate support tailored to diverse customer needs.

Key NVIDIA NIM microservices for customer service agents include:

  • NVIDIA NIM for Large Language Models: Microservices that bring advanced language models to applications and enable complex reasoning, so AI agents can understand complicated customer queries.
  • NVIDIA NeMo Retriever NIM: Embedding and reranking microservices that support retrieval-augmented generation pipelines. They allow virtual assistants to quickly access enterprise knowledge bases and boost retrieval performance by ranking relevant knowledge-base articles and improving context accuracy.
  • NVIDIA NIM for Digital Humans: Microservices that enable intelligent, interactive avatars to understand speech and respond in a natural way. NVIDIA Riva NIM microservices for text-to-speech, automatic speech recognition (ASR), and translation services enable AI agents to communicate naturally across languages. The recently released Riva NIM microservices for ASR enable additional multilingual enhancements. To build realistic avatars, Audio2Face NIM converts streamed audio to facial movements for real-time lip syncing. 2D and 3D Audio2Face NIM microservices support varying use cases.

Getting Started With AI Agents for Customer Service

NVIDIA AI Blueprints make it easy to start building and setting up virtual assistants by offering ready-made workflows and tools to accelerate deployment. Whether for a simple AI-powered chatbot or a fully animated digital human interface, the blueprints offer resources to create AI assistants that are scalable, aligned with an organization’s brand and deliver a responsive, efficient customer support experience.

Editor’s note: IDC figures are sourced to IDC, Market Analysis Perspective: Worldwide Conversational AI Tools and Technologies, 2024 US51619524, Sept 2024

Into the Omniverse: How Generative AI Fuels Personalized, Brand-Accurate Visuals With OpenUSD

Editor’s note: This post is part of Into the Omniverse, a blog series focused on how developers, 3D artists and enterprises can transform their workflows using the latest advances in OpenUSD and NVIDIA Omniverse.

3D product configurators are changing the way industries like retail and automotive engage with customers by offering interactive, customizable 3D visualizations of products.

Using physically accurate product digital twins, even non-3D artists can streamline content creation and generate stunning marketing visuals.

With the new NVIDIA Omniverse Blueprint for 3D conditioning for precise visual generative AI, developers can start using the NVIDIA Omniverse platform and Universal Scene Description (OpenUSD) to easily build personalized, on-brand and product-accurate marketing content at scale.

By integrating generative AI into product configurators, developers can optimize operations and reduce production costs. With repetitive tasks automated, teams can focus on the creative aspects of their jobs.

Developing Controllable Generative AI for Content Production

The new Omniverse Blueprint introduces a robust framework for integrating generative AI into 3D workflows to enable precise and controlled asset creation.

Example images created using the NVIDIA Omniverse Blueprint for 3D conditioning for precise visual generative AI.

Key highlights of the blueprint include:

  • Model conditioning to ensure that the AI-generated visuals adhere to specific brand requirements like colors and logos.
  • Multimodal approach that combines 3D and 2D techniques to offer developers complete control over final visual outputs while ensuring the product’s digital twin remains accurate.
  • Key components such as an on-brand hero asset, a simple and untextured 3D scene, and a customizable application built with the Omniverse Kit App Template.
  • OpenUSD integration to enhance development of 3D visuals with precise visual generative AI.
  • Integration of NVIDIA NIM microservices, such as the Edify 360 NIM, Edify 3D NIM, USD Code NIM and USD Search NIM, which makes the blueprint extensible and customizable. The microservices are available to preview on build.nvidia.com.

How Developers Are Building AI-Enabled Content Pipelines

Katana Studio developed a content creation tool with OpenUSD called COATcreate that empowers marketing teams to rapidly produce 3D content for automotive advertising. By using 3D data prepared by creative experts and vetted by product specialists in OpenUSD, even users with limited artistic experience can quickly create customized, high-fidelity, on-brand content for any region or use case without adding to production costs.

Global marketing leader WPP has built a generative AI content engine for brand advertising with OpenUSD. The Omniverse Blueprint for precise visual generative AI helped facilitate the integration of controllable generative AI in its content creation tools. Leading global brands like The Coca-Cola Company are already beginning to adopt tools from WPP to accelerate iteration on their creative campaigns at scale.

Watch the replay of a recent livestream with WPP for more on its generative AI- and OpenUSD-enabled workflow.

The NVIDIA creative team developed a reference workflow called CineBuilder on Omniverse that allows companies to use text prompts to generate ads personalized to consumers based on region, weather, time of day, lifestyle and aesthetic preferences.

Developers at independent software vendors and production services agencies are building content creation solutions infused with controllable generative AI and built on OpenUSD. Accenture Song, Collective World, Grip, Monks and WPP are among those adopting Omniverse Blueprints to accelerate development.

Read the tech blog on developing product configurators with OpenUSD and get started developing solutions using the DENZA N7 3D configurator and CineBuilder reference workflow.

Get Plugged Into the World of OpenUSD

Various resources are available to help developers get started building AI-enabled product configuration solutions:

For more on optimizing OpenUSD workflows, explore the new Learn OpenUSD training curriculum that includes free Deep Learning Institute courses for 3D practitioners and developers. For more resources on OpenUSD, explore the Alliance for OpenUSD forum and visit the AOUSD website.

Don’t miss the CES keynote delivered by NVIDIA founder and CEO Jensen Huang live in Las Vegas on Monday, Jan. 6, at 6:30 p.m. PT for more on the future of AI and graphics.

Stay up to date by subscribing to NVIDIA news, joining the community and following NVIDIA Omniverse on Instagram, LinkedIn, Medium and X.

First ‘Star Wars Outlaws’ Story Pack Hits GeForce NOW

Get ready to dive deeper into the criminal underworld of a galaxy far, far away as GeForce NOW brings the first major story pack for Star Wars Outlaws to the cloud this week.

The season of giving continues — GeForce NOW members can access a new free reward: a special in-game Star Wars Outlaws enhancement.

It’s all part of an exciting GFN Thursday, topped with five new games joining the more than 2,000 titles supported in the GeForce NOW library, including the launch of S.T.A.L.K.E.R. 2: Heart of Chornobyl and Xbox Game Studios fan favorites Fallout 3: Game of the Year Edition and The Elder Scrolls IV: Oblivion.

And make sure not to pass this opportunity up — gamers who want to take the Performance and Ultimate memberships for a spin can do so with 25% off Day Passes, now through Friday, Nov. 22. Day Passes give access to 24 continuous hours of powerful cloud gaming.

A New Saga Begins

The galaxy’s most electrifying escapade gets even more exciting with the new Wild Card story pack for Star Wars Outlaws.

This thrilling story pack invites scoundrels to join forces with the galaxy’s smoothest operator, Lando Calrissian, for a high-stakes Sabacc tournament that’ll keep players on the edge of their seats. As Kay Vess, gamers bluff, charm and blast their way through new challenges, exploring uncharted corners of the Star Wars galaxy. Meanwhile, a free update will scatter fresh Contract missions across the stars, offering members ample opportunities to build their reputations and line their pockets with credits.

To kick off this thrilling underworld adventure, GeForce NOW members are in for a special reward with the Forest Commando Character Pack.

Star Wars Outlaws Wild Card DLC on GeForce NOW
Time to get wild.

The pack gives Kay and Nix, her loyal companion, a complete set of gear that’s perfect for missions in lush forest worlds. Get equipped with tactical trousers, a Bantha leather belt loaded with attachments, a covert poncho to shield against jungle rain and a hood for Nix that’s great for concealment in thick forests.

Members of the GeForce NOW rewards program can check their email for instructions on how to claim the reward. Ultimate and Performance members can start redeeming style packages today. Don’t miss out — this offer is available through Saturday, Dec. 21, on a first-come, first-served basis.

Welcome to the Zone

STALKER 2 on GeForce NOW
Welcome to the zone.

S.T.A.L.K.E.R. 2: Heart of Chornobyl, the highly anticipated sequel in the cult-classic S.T.A.L.K.E.R. series, is a first-person-shooter survival-horror game set in the Chornobyl Exclusion Zone.

In the game — which blends postapocalyptic fiction with Ukrainian folklore and the eerie reality of the Chornobyl disaster — players can explore a vast open world filled with mutated creatures, anomalies and other stalkers while uncovering the zone’s secrets and battling for survival.

The title features advanced graphics and physics powered by Unreal Engine 5 for stunningly realistic and detailed environments. Players’ choices impact the game world and narrative, which comprises a nonlinear storyline with multiple possible endings.

Players will take on challenging survival mechanics to test their skills and decision-making abilities. Members can make their own epic story with a Performance membership for enhanced GeForce RTX-powered streaming at 1440p or an Ultimate membership for up to 4K 120 frames per second streaming, offering the crispest visuals and smoothest gameplay.

Adventures Await

Fallout 3 GOTY on GeForce NOW
Vault 101 has opened.

Members can emerge from Vault 101 into the irradiated ruins of Washington, D.C., in Fallout 3: Game of the Year Edition, which includes all five downloadable content packs released for Fallout 3. Experience the game that redefined the postapocalyptic genre with its morally ambiguous choices, memorable characters and the innovative V.A.T.S. combat system. Whether members are revisiting the Capital Wasteland, exploring the Mojave Desert or delving into the realm of Cyrodiil, these iconic titles have never looked or played better thanks to the power of GeForce NOW’s cloud streaming technology.

Members can look for the following games available to stream in the cloud this week:

  • Towers of Aghasba (New release on Steam, Nov. 19)
  • S.T.A.L.K.E.R. 2: Heart of Chornobyl (New release on Steam and Xbox, available on PC Game Pass, Nov. 20)
  • Star Wars Outlaws (New release on Steam, Nov. 21)
  • The Elder Scrolls IV: Oblivion Game of the Year Edition (Epic Games Store, Steam and Xbox, available on PC Game Pass)
  • Fallout 3: Game of the Year Edition (Epic Games Store, Steam and Xbox, available on PC Game Pass)

What are you planning to play this weekend? Let us know on X or in the comments below.

Rebellions Joins the PyTorch Foundation as a General Member

The PyTorch Foundation, a neutral home for the deep learning community to collaborate on the open source PyTorch framework and ecosystem, is announcing today that Rebellions has joined as a general member.

Rebellions is a South Korea-based semiconductor company specializing in the design and development of AI chips for data centers and edge devices. Their innovative hardware and software solutions aim to accelerate generative AI and machine learning workloads, focusing on high energy efficiency and performance. The company successfully launched and deployed its AI chip ‘ATOM’ targeting data centers in 2023 and is developing its next-generation AI accelerator ‘REBEL’.

“We’re thrilled to welcome Rebellions as a new general member of the PyTorch Foundation,” said Matt White, Executive Director of the PyTorch Foundation. “Rebellions brings a unique perspective to the PyTorch ecosystem with their focus on advancing the integration of NPU architectures for AI acceleration with PyTorch. Their expertise will play a vital role in ensuring PyTorch continues to evolve as a versatile framework, accommodating the diverse needs of modern AI workloads. We look forward to collaborating with Rebellions to drive innovation and strengthen the PyTorch ecosystem for developers worldwide.”

Rebellions has introduced native support for PyTorch 2.0 in their RBLN SDK. This integration includes compatibility with torch.compile, a pivotal feature of PyTorch 2.0 that enhances model performance. Through this development, Rebellions has empowered developers to seamlessly harness the full potential of their AI accelerator lineup within the PyTorch environment.

Rebellions is also deeply committed to advancing the PyTorch ecosystem through collaborative innovation starting in Korea. The company has established a Special Interest Group (SIG) focusing on PyTorch Core within the PyTorch Korea community and is actively working with volunteers recruited through MODULABS, an open research institute, to integrate native support for the deep learning framework into their Neural Processing Unit (NPU).

In addition, Rebellions is collaborating with academic institutions such as Yonsei University, Hanyang University and the University of Science & Technology (UST), and with national agencies such as the Electronics and Telecommunications Research Institute (ETRI), to offer undergraduate and graduate courses on PyTorch and enable them to leverage PyTorch as their research platform.

These initiatives highlight Rebellions’ dedication to optimizing the PyTorch experience for developers and researchers alike, while also fostering education and innovation in the field.

“By integrating our hardware innovations with PyTorch, we’re building Native NPU support to accelerate diverse AI workloads,” said Hong-seok Kim, the Chief Software Architect at Rebellions. “We’re excited to contribute to the PyTorch community by community-driven initiatives and partnerships, advancing NPU architecture support for next-generation AI solutions. Together with the PyTorch community, we aim to pioneer new possibilities in AI acceleration and empower developers worldwide with efficient computing solutions.”

To learn more about how your organization can be a part of the PyTorch Foundation, visit our website.

About Rebellions

Rebellions is a South Korea-based semiconductor company specializing in the design and development of AI chips for data centers and edge devices. Their innovative hardware and software solutions aim to accelerate generative AI and machine learning workloads, focusing on high energy efficiency and performance. The company successfully launched and deployed its AI chip ‘ATOM’ targeting data centers in 2023 and is developing its next-generation AI accelerator ‘REBEL’ incorporating a scalable chiplet architecture and high-bandwidth memory.

About PyTorch Foundation

The PyTorch Foundation is a neutral home for the deep learning community to collaborate on the open source PyTorch framework and ecosystem. The PyTorch Foundation is supported by its members and leading contributors to the PyTorch open source project. The Foundation leverages resources provided by members and contributors to enable community discussions and collaboration.

About The Linux Foundation

The Linux Foundation is the world’s leading home for collaboration on open source software, hardware, standards, and data. Linux Foundation projects are critical to the world’s infrastructure including Linux, Kubernetes, Node.js, ONAP, PyTorch, RISC-V, SPDX, OpenChain, and more. The Linux Foundation focuses on leveraging best practices and addressing the needs of contributors, users, and solution providers to create sustainable models for open collaboration. For more information, please visit us at linuxfoundation.org.

Memory-Retaining Finetuning via Distillation

This paper was accepted at the Fine-Tuning in Modern Machine Learning: Principles and Scalability (FITML) Workshop at NeurIPS 2024.

Large language models (LLMs) pretrained on large corpora of internet text possess much of the world’s knowledge. Following pretraining, one often needs to conduct continued pretraining on certain capabilities, such as math and coding, or “posttraining” (a.k.a., alignment) techniques to make the models follow users’ instructions and align them with human preferences. One challenge during these finetuning stages is that the model can lose the pretraining knowledge…

Apple Machine Learning Research

Multimodal Autoregressive Pre-Training of Large Vision Encoders

A dominant paradigm in large multimodal models is to pair a large language decoder with a vision encoder. While it is well-known how to pre-train and tune language decoders for multimodal tasks, it is less clear how the vision encoder should be pre-trained. A de facto standard is to pre-train the vision encoder with a discriminative objective, such as contrastive loss. This causes a mismatch between pre-training and the generative autoregressive downstream task. At the same time, following their success in the language domain, autoregressive image models have been shown…

Apple Machine Learning Research

Instance-Optimal Private Density Estimation in the Wasserstein Distance

Estimating the density of a distribution from samples is a fundamental problem in statistics. In many practical settings, the Wasserstein distance is an appropriate error metric for density estimation. For example, when estimating population densities in a geographic region, a small Wasserstein distance means that the estimate is able to capture roughly where the population mass is. In this work we study differentially private density estimation in the Wasserstein distance. We design and analyze instance-optimal algorithms for this problem that can adapt to easy instances.
For distributions…

Apple Machine Learning Research
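To make the Wasserstein error metric in the abstract above concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm and involves no differential privacy; it only computes the 1-Wasserstein distance between two equal-size one-dimensional samples, which reduces to the mean absolute difference between the sorted values. The function name `wasserstein_1d` and the sample positions are hypothetical, chosen purely for illustration.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size 1D empirical samples.

    Assumption of this sketch: both samples have the same number of points,
    so the optimal transport plan simply matches sorted values in order.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes non-empty samples of equal size")
    matched = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in matched) / len(xs)


# Hypothetical positions (e.g. kilometres along a road) of a small population.
true_positions = [0.0, 1.0, 2.0, 10.0]
# An estimate that places mass roughly where the population actually is.
close_estimate = [0.5, 1.0, 2.5, 10.0]
# An estimate that places all mass in the wrong area.
far_estimate = [5.0, 6.0, 7.0, 8.0]

close_error = wasserstein_1d(true_positions, close_estimate)
far_error = wasserstein_1d(true_positions, far_estimate)
print(close_error, far_error)
```

The estimate that keeps mass near the true locations yields the smaller distance, which is the sense in which a small Wasserstein distance captures "roughly where the population mass is."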