Build generative AI–powered Salesforce applications with Amazon Bedrock

This post is co-authored by Daryl Martis and Darvish Shadravan from Salesforce.

This is the fourth post in a series discussing the integration of Salesforce Data Cloud and Amazon SageMaker.

In Part 1 and Part 2, we show how Salesforce Data Cloud and Einstein Studio integration with SageMaker allows businesses to access their Salesforce data securely using SageMaker’s tools to build, train, and deploy models to endpoints hosted on SageMaker. SageMaker endpoints can be registered with Salesforce Data Cloud to activate predictions in Salesforce. In Part 3, we demonstrate how business analysts and citizen data scientists can create machine learning (ML) models, without code, in Amazon SageMaker Canvas and deploy trained models for integration with Salesforce Einstein Studio to create powerful business applications.

In this post, we show how native integrations between Salesforce and Amazon Web Services (AWS) enable you to Bring Your Own Large Language Models (BYO LLMs) from your AWS account to power generative artificial intelligence (AI) applications in Salesforce. Requests and responses between Salesforce and Amazon Bedrock pass through the Einstein Trust Layer, which promotes responsible AI use across Salesforce.

We demonstrate BYO LLM integration by using Anthropic’s Claude model on Amazon Bedrock to summarize a list of open service cases and opportunities on an account record page, as shown in the following figure.

Partner quote

“We continue to expand on our strong collaboration with AWS with our BYO LLM integration with Amazon Bedrock, empowering our customers with more model choices and allowing them to create AI-powered features and Copilots customized for their specific business needs. Our open and flexible AI environment, grounded with customer data, positions us well to be leaders in AI-driven solutions in the CRM space.”

–Kaushal Kurapati, Senior Vice President of Product for AI at Salesforce

Amazon Bedrock

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock, you can quickly experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that execute tasks using your enterprise systems and data sources. Since Amazon Bedrock is serverless, you don’t have to manage infrastructure, and you can securely integrate and deploy generative AI capabilities into your applications using the AWS services you are already familiar with.

Salesforce Data Cloud and Einstein Model Builder

Salesforce Data Cloud is a data platform that unifies your company’s data, giving every team a 360-degree view of the customer to drive automation and analytics, personalize engagement, and power trusted AI. Data Cloud creates a holistic customer view by turning volumes of disconnected data into a single, trusted model that’s simple to access and understand. With data harmonized within Salesforce Data Cloud, customers can put their data to work to build predictions and generative AI–powered business processes across sales, support, and marketing.

With Einstein Model Builder, customers can build their own models using Salesforce’s low-code model builder experience or integrate their own custom-built models into the Salesforce platform. Einstein Model Builder’s BYO LLM experience provides the capability to register custom generative AI models from external environments such as Amazon Bedrock and Salesforce Data Cloud.

Once custom Amazon Bedrock models are registered in Einstein Model Builder, they are connected through the Einstein Trust Layer, a robust set of features and guardrails that protect the privacy and security of data, improve the safety and accuracy of AI results, and promote the responsible use of AI across Salesforce. Registered models can then be used in Prompt Builder, a newly launched, low-code prompt engineering tool that allows Salesforce admins to build, test, and fine-tune trusted AI prompts that can be used across the Salesforce platform. These prompts can be integrated with Salesforce capabilities such as Flows, Invocable Actions, and Apex.

Solution overview

With the Salesforce Einstein Model Builder BYO LLM feature, you can invoke Amazon Bedrock models in your AWS account. At the time of this writing, Salesforce supports Anthropic Claude 3 models on Amazon Bedrock for BYO LLM. For this post, we use the Anthropic Claude 3 Sonnet model. To learn more about inference with Claude 3, refer to Anthropic Claude models in the Amazon Bedrock documentation.

For your implementation, you may use the model of your choice. Refer to Bring Your Own Large Language Model in Einstein 1 Studio for models supported with Salesforce Einstein Model Builder.
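For reference, the request that Einstein Studio ultimately sends on your behalf corresponds to a standard Bedrock Runtime InvokeModel call. The following is a minimal sketch of such a call in Python with boto3; the prompt is illustrative, and you should verify the current Claude 3 Sonnet model ID available in your Region.

```python
import json

import boto3

# Bedrock Runtime client in the Region where the model is enabled
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Claude 3 models on Amazon Bedrock use the Anthropic Messages request format
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "temperature": 0.2,
    "messages": [
        {
            "role": "user",
            "content": "Summarize the open service cases for this account: ...",
        }
    ],
}

response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # verify the model ID in your Region
    body=json.dumps(body),
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```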

The following image shows a high-level architecture of how you can integrate the LLM from your AWS account into the Salesforce Prompt Builder.

In this post, we show how to build generative AI–powered Salesforce applications with Amazon Bedrock. The following are the high-level steps involved:

  1. Grant Amazon Bedrock invoke model permission to an AWS Identity and Access Management (IAM) user
  2. Register the Amazon Bedrock model in Salesforce Einstein Model Builder
  3. Integrate the prompt template with the field in the Lightning App Builder

Prerequisites

Before deploying this solution, make sure you meet the following prerequisites:

  1. Have access to Salesforce Data Cloud and meet the requirements for using BYO LLM.
  2. Have Amazon Bedrock set up. If this is the first time you are accessing Anthropic Claude models on Amazon Bedrock, you need to request access. You need to have sufficient permissions to request access to models through the console. To request model access, sign in to the Amazon Bedrock console and select Model access at the bottom of the left navigation pane.

Solution walkthrough

To build generative AI–powered Salesforce applications with Amazon Bedrock, implement the following steps.

Grant Amazon Bedrock invoke model permission to an IAM user

Salesforce Einstein Studio requires an access key and a secret to access the Amazon Bedrock API. Follow the instructions to set up an IAM user and access keys. The IAM user must have Amazon Bedrock invoke model permission to access the model. Complete the following steps (a programmatic sketch of the resulting policy follows the list):

  1. On the IAM console, select Users in the navigation panel. On the right side of the console, choose Add permissions and Create inline policy.
  2. On the Specify permissions screen, in the Service dropdown menu, select Bedrock.
  3. Under Actions allowed, enter invoke in the search field. Under Read, select InvokeModel. Under Resources, select All. Choose Next.
  4. On the Review and create screen, under Policy name, enter BedrockInvokeModelPolicy. Choose Create policy.
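The following is a minimal sketch of an equivalent inline policy attached with the AWS SDK for Python (boto3); the user name is a placeholder, and in production you may want to scope the resource down to specific model ARNs rather than all resources.

```python
import json

import boto3

iam = boto3.client("iam")

# Inline policy granting only the Bedrock InvokeModel action
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel"],
            "Resource": "*",  # consider restricting this to specific model ARNs
        }
    ],
}

iam.put_user_policy(
    UserName="einstein-studio-byo-llm",  # placeholder user name
    PolicyName="BedrockInvokeModelPolicy",
    PolicyDocument=json.dumps(policy_document),
)
```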

Register Amazon Bedrock model in Einstein Model Builder

  1. On the Salesforce Data Cloud console, under the Einstein Studio tab, choose Add Foundation Model.
  2. Choose Connect to Amazon Bedrock.
  3. For Endpoint information, enter the endpoint name, your AWS account Access Key, and your Secret Key. Enter the Region and Model information. Choose Connect.
  4. Now, create the configuration for the model endpoint you created in the previous steps. Provide inference parameters such as temperature to control how deterministic the LLM's output is. Enter a sample prompt to verify the response.
  5. Next, you can save this new model configuration. Enter the name for the saved LLM model and choose Create Model.
  6. After the model creation is successful, choose Close and proceed to create the prompt template.
  7. Select the Model name to open the Model configuration.
  8. Select Create Prompt Template to launch the prompt builder.
  9. Select Field Generation as the prompt template type, enter a template name, set Object to Account, and set Object Field to PB Case and Oppty Summary. This associates the template with a custom field on the account record object that summarizes the cases.

For this demo, a rich text field named PB Case and Oppty Summary was created and added to the Salesforce Account page layout according to the Add a Field Generation Prompt Template to a Lightning Record Page instructions.

  1. Provide the prompt and input variables or objects for data grounding and select the model. Refer to Prompt Builder to learn more.

Integrate the prompt template with the field in the Lightning App Builder

  1. On the Salesforce console, use the search bar to find Lightning App Builder. Create a new page or edit an existing one to integrate the prompt template with the field, as shown in the following screenshot. Refer to Add a Field Generation Prompt Template to a Lightning Record Page for detailed instructions.
  2. Navigate to the Account page and click the PB Case and Oppty Summary field enabled for chat completion to launch the Einstein generative AI assistant and summarize the account case data.

Cleanup

Complete the following steps to clean up your resources.

  1. Delete the IAM user (a programmatic sketch follows this list)
  2. Delete the foundation model in Einstein Studio
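For reference, here is a minimal boto3 sketch of the IAM cleanup; the user name is a placeholder, and the policy name must match what you created earlier.

```python
import boto3

iam = boto3.client("iam")
user_name = "einstein-studio-byo-llm"  # placeholder: the IAM user created earlier

# Access keys and inline policies must be removed before the user can be deleted
for key in iam.list_access_keys(UserName=user_name)["AccessKeyMetadata"]:
    iam.delete_access_key(UserName=user_name, AccessKeyId=key["AccessKeyId"])

iam.delete_user_policy(UserName=user_name, PolicyName="BedrockInvokeModelPolicy")
iam.delete_user(UserName=user_name)
```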

Amazon Bedrock offers on-demand inference pricing, so there are no additional costs from a continued model subscription. To remove model access, refer to the steps in Remove model access.

Conclusion

In this post, we demonstrated how to use your own LLM in Amazon Bedrock to power Salesforce applications. We used summarization of open service cases on an account object as an example to showcase the implementation steps.

Amazon Bedrock is a fully managed service that makes high-performing FMs from leading AI companies and Amazon available for your use through a unified API. You can choose from a wide range of FMs to find the model that is best suited for your use case.

Salesforce Einstein Model Builder lets you register your Amazon Bedrock model and use it in Prompt Builder to create prompts grounded in your data. These prompts can then be integrated with Salesforce capabilities such as Flows and Invocable Actions and Apex. You can then build custom generative AI applications with Claude 3 that are grounded in the Salesforce user experience. Amazon Bedrock requests from Salesforce pass through the Einstein Trust Layer, which provides responsible AI use with features such as dynamic grounding, zero data retention, and toxicity detection while maintaining safety and security standards.

AWS and Salesforce are excited for our mutual customers to harness this integration and build generative AI–powered applications. To learn more and start building, refer to the following resources.


About the Authors

Daryl Martis is the Director of Product for Einstein Studio at Salesforce Data Cloud. He has over 10 years of experience in planning, building, launching, and managing world-class solutions for enterprise customers, including AI/ML and cloud solutions. He has previously worked in the financial services industry in New York City. Follow him on LinkedIn.

Darvish Shadravan is a Director of Product Management in the AI Cloud at Salesforce. He focuses on building AI/ML features for CRM, and is the product owner for the Bring Your Own LLM feature. You can connect with him on LinkedIn.

Rachna Chadha is a Principal Solutions Architect, AI/ML, in Strategic Accounts at AWS. Rachna is an optimist who believes that ethical and responsible use of AI can improve society in the future and bring economic and social prosperity. In her spare time, Rachna likes spending time with her family, hiking, and listening to music.

Ravi Bhattiprolu is a Sr. Partner Solutions Architect at AWS. Ravi works with strategic partners Salesforce and Tableau to deliver innovative and well-architected products and solutions that help joint customers realize their business objectives.

Ife Stewart is a Principal Solutions Architect in the Strategic ISV segment at AWS. She has been engaged with Salesforce Data Cloud over the last 2 years to help build integrated customer experiences across Salesforce and AWS. Ife has over 10 years of experience in technology. She is an advocate for diversity and inclusion in the technology field.

Mike Patterson is a Senior Customer Solutions Manager in the Strategic ISV segment at AWS. He has partnered with Salesforce Data Cloud to align business objectives with innovative AWS solutions to achieve impactful customer experiences. In Mike’s spare time, he enjoys spending time with his family, sports, and outdoor activities.

Dharmendra Kumar Rai (DK Rai) is a Sr. Data Architect, Data Lake & AI/ML, serving strategic customers. He works closely with customers to understand how AWS can help them solve problems, especially in the AI/ML and analytics space. DK has many years of experience in building data-intensive solutions across a range of industry verticals, including high-tech, FinTech, insurance, and consumer-facing applications.

Read More

CMU-MATH Team’s Innovative Approach Secures 2nd Place at the AIMO Prize

Recently, our CMU-MATH team proudly clinched 2nd place in the Artificial Intelligence Mathematical Olympiad (AIMO) out of 1,161 participating teams, earning a prize of $65,536! This prestigious competition aims to revolutionize AI in mathematical problem-solving, with the ultimate goal of building a publicly shared AI model capable of winning a gold medal in the International Mathematical Olympiad (IMO). Dive into our blog to discover the winning formula that set us apart in this significant contest.

Background: The AIMO competition

The Artificial Intelligence Mathematical Olympiad (AIMO) Prize, initiated by XTX Markets, is a pioneering competition designed to revolutionize AI’s role in mathematical problem-solving. It pushes the boundaries of AI by challenging systems to solve complex mathematical problems akin to those in the International Mathematical Olympiad (IMO). The advisory committee of AIMO includes Timothy Gowers and Terence Tao, both winners of the Fields Medal. Attracting attention from world-class mathematicians as well as machine learning researchers, the AIMO sets a new benchmark for excellence in the field.

AIMO has introduced a series of progress prizes. The first of these was a Kaggle competition, with the 50 test problems hidden from competitors. The problems are comparable in difficulty to the AMC12 and AIME exams used in the USA's IMO team pre-selection. The private leaderboard determined the final rankings, which then determined the distribution of $253,952 from the one-million-dollar prize pool among the top five teams. Each submitted solution was allocated either a P100 GPU or 2xT4 GPUs, with up to 9 hours to solve the 50 problems.

To give an idea of what the problems look like, AIMO provided a 10-problem training set open to the public. Here are two example problems from the set:

  • Let \(k, l > 0\) be parameters. The parabola \(y = kx^2 - 2kx + l\) intersects the line \(y = 4\) at two points \(A\) and \(B\). These points are a distance 6 apart. What is the sum of the squares of the distances from \(A\) and \(B\) to the origin?
  • Each of the three-digit numbers \(111\) to \(999\) is coloured blue or yellow in such a way that the sum of any two (not necessarily different) yellow numbers is equal to a blue number. What is the maximum possible number of yellow numbers there can be?

The first problem is about analytic geometry. It requires the model to understand geometric objects based on textual descriptions and perform symbolic computations using the distance formula and Vieta’s formulas. The second problem falls under extremal combinatorics, a topic beyond the scope of high school math. It’s notoriously challenging because there’s no general formula to apply; solving it requires creative thinking to exploit the problem’s structure. It’s non-trivial to master all these required capabilities even for humans, let alone language models.

In general, the problems in AIMO were significantly more challenging than those in GSM8K, a standard mathematical reasoning benchmark for LLMs, and about as difficult as the hardest problems in the challenging MATH dataset. The limited computational resources—P100 and T4 GPUs, both over five years old and much slower than more advanced hardware—posed an additional challenge. Thus, it was crucial to employ appropriate models and inference strategies to maximize accuracy within the constraints of limited memory and FLOPs.

Our winning formula

Unlike most teams, which relied on a single model for the competition, we utilized a dual-model approach. Specifically, we paired a policy model, designed to generate problem solutions in the form of computer code, with a reward model, which scored the outputs of the policy model. Our final answers were derived through a weighted majority voting system: we generated multiple solutions with the policy model, assigned a weight to each solution using the reward model, and chose the answer with the highest total weight. This strategy stemmed from our study on compute-optimal inference, which demonstrates that weighted majority voting with a reward model consistently outperforms naive majority voting given the same inference budget.
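The voting step itself is simple. Below is a minimal sketch, assuming each sampled solution has already been reduced to a final integer answer and scored by the reward model:

```python
from collections import defaultdict


def weighted_majority_vote(answers, scores):
    """Pick the answer whose candidate solutions have the highest total reward.

    answers: list of final answers (one per sampled solution)
    scores:  list of reward-model scores in [0, 1], aligned with `answers`
    """
    totals = defaultdict(float)
    for answer, score in zip(answers, scores):
        totals[answer] += score
    return max(totals, key=totals.get)


# One confident sample answering 42 outweighs two weak samples answering 7,
# even though naive majority voting would have picked 7.
print(weighted_majority_vote([42, 7, 7], [0.9, 0.2, 0.3]))  # -> 42
```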

Both models in our submission were fine-tuned from the DeepSeek-Math-7B-RL checkpoint. Below, we detail the fine-tuning process and inference strategies for each model.

Policy model: Program-aided problem solver based on self-refinement

The policy model served as the primary problem solver in our approach. We noted that LLMs can perform mathematical reasoning using both text and programs. Natural language excels at abstract reasoning but falls short in precise computation, symbolic manipulation, and algorithmic processing. Programs, on the other hand, are adept at rigorous operations and can leverage specialized tools such as equation solvers for complex calculations. To harness the benefits of both methods, we implemented the Program-Aided Language Models (PAL) approach, or more precisely the Tool-Integrated Reasoning (ToRA) approach, originally proposed by CMU and Microsoft, which combines natural language reasoning with program-based problem solving. The model first generates a rationale in text form, followed by a computer program that is executed to derive a numerical answer.

Figure 1: The tool-integrated reasoning format (from ToRA paper)

To train the model, we needed a suitable problem set (the competition's 10-problem training set is too small for fine-tuning) with “ground truth” solutions in ToRA format for supervised fine-tuning. Given the problem difficulty (comparable to the AMC12 and AIME exams) and the special format (integer answers only), we used a combination of AMC, AIME, and Odyssey-Math as our problem set, removing multiple-choice options and filtering out problems with non-integer answers. This resulted in a dataset of 2,600 problems. We prompted GPT-4o (and DeepSeek-Coder-V2) with few-shot examples to generate 64 solutions for each problem, retaining those that led to correct answers. Our final dataset contained 41,160 problem-solution pairs. We performed supervised fine-tuning on the open-sourced DeepSeek-Math-7B-RL model for 3 epochs with a learning rate of 2e-5.
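The correctness filter can be sketched as follows; `generate_solutions` and `execute_program` are hypothetical helpers standing in for the teacher-model sampling and the sandboxed program execution:

```python
def build_sft_dataset(problems, generate_solutions, execute_program, n_samples=64):
    """Keep only sampled solutions whose executed answer matches the ground truth.

    problems:           list of {"question": str, "answer": int} records
    generate_solutions: hypothetical helper that samples ToRA-style solutions
                        (rationale + program) from the teacher model
    execute_program:    hypothetical helper that runs the embedded program in a
                        sandbox and returns its final numeric answer (or None)
    """
    dataset = []
    for problem in problems:
        for solution in generate_solutions(problem["question"], n=n_samples):
            if execute_program(solution) == problem["answer"]:
                dataset.append(
                    {"question": problem["question"], "solution": solution}
                )
    return dataset
```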

During inference, we employed self-refinement (another widely adopted technique proposed by CMU!), providing feedback to the policy model on the execution results of the generated program (e.g., invalid output, execution failure) and allowing the model to refine the solution accordingly.
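A minimal sketch of this refinement loop, again with hypothetical `generate` and `run_code` helpers standing in for the policy model and the program executor (the `ok`/`error`/`answer` fields on the execution result are assumptions of this sketch):

```python
def solve_with_refinement(question, generate, run_code, max_rounds=3):
    """Iteratively feed execution feedback back to the policy model."""
    history = [{"role": "user", "content": question}]
    solution = None
    for _ in range(max_rounds):
        solution = generate(history)   # rationale + program from the policy model
        result = run_code(solution)    # executes the embedded program in a sandbox
        if result.ok:                  # valid numeric output: accept the solution
            return solution, result.answer
        # Otherwise, append the error so the model can refine its program
        history.append({"role": "assistant", "content": solution})
        history.append(
            {"role": "user",
             "content": f"Execution failed: {result.error}. Please fix the program."}
        )
    return solution, None
```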

Below we present our ablation study on the techniques we employed for the policy model. We used accuracy on a selected subset of the MATH test set as the evaluation metric. The combination of techniques leads to large performance gains compared with the naive baselines.

| Model | Output format | Inference strategy | Accuracy |
| --- | --- | --- | --- |
| DeepSeek RL 7b | Text-only | Greedy decoding | 54.02% |
| DeepSeek RL 7b | ToRA | Greedy decoding | 58.05% |
| DeepSeek RL 7b | ToRA | Greedy + Self-refine | 60.73% |
| DeepSeek RL 7b | ToRA | Maj@16 + Self-refine | 70.46% |
| DeepSeek RL 7b | ToRA | Maj@64 + Self-refine | 72.48% |
| Our finetuned model | ToRA | Maj@16 + Self-refine | 74.50% |
| Our finetuned model | ToRA | Maj@64 + Self-refine | 76.51% |

Table: Ablation study of our techniques on a selected MATH subset (in which the problems are similar to AIMO problems). Maj@\(n\) denotes majority voting over \(n\) sampled solutions.

Notably, the first-place team also used ToRA with self-refinement. However, they curated a much larger problem set of 60,000 problems and used GPT-4 to generate solutions in the ToRA format. Their dataset was more than 20x larger than ours. The cost to generate solutions was far beyond our budget as an academic team (over $100,000 based on our estimation). Our problem set was based purely on publicly available data, and we spent only ~$1,000 for solution generation.

Reward model: Solution scorer using label-balance training

While the policy model was a creative problem solver, it could sometimes hallucinate and produce incorrect solutions. On the publicly available 10-problem training set, our policy model correctly solved only two problems using standard majority voting with 32 sampled solutions. Interestingly, for another two problems, the model generated correct solutions that failed to be selected because wrong answers dominated the majority vote.

This observation highlighted the potential of the reward model. The reward model was a solution scorer that took the policy model’s output and generated a score between 0 and 1. Ideally, it assigned high scores to correct solutions and low scores to incorrect ones, aiding in the selection of correct answers during weighted majority voting.

The reward model was fine-tuned from a DeepSeek-Math-7B-RL model on a labeled dataset containing both correct and incorrect problem-solution pairs. We utilized the same problem set as for the policy model training and expanded it by incorporating problems from the MATH dataset with integer answers. Simple as it may sound, generating high-quality data and training a strong reward model was non-trivial. We considered the following two essential factors for the reward model training set:

  • Label balance: The dataset should contain both correct (positive examples) and incorrect solutions (negative examples) for each problem, with a balanced number of correct and incorrect solutions.
  • Diversity: The dataset should include diverse solutions for each problem, encompassing different correct approaches and various failure modes.

Sampling solutions from a single model cannot meet those factors. For example, while our fine-tuned policy model achieved very high accuracy on the problem set, it was unable to generate any incorrect solutions and lacked diversity amongst correct solutions. Conversely, sampling from a weaker model, such as DeepSeek-Math-7B-Base, rarely yielded correct solutions. To create a diverse set of models with varying capabilities, we employed two novel strategies:

  • Interpolate between strong and weak models. For MATH problems, we interpolated the model parameters of a strong model (DeepSeek-Math-7B-RL) and a weak model (DeepSeek-Math-7B-Base) to get models with different levels of capability. Denote by \(\mathbf{\theta}_{\mathrm{strong}}\) and \(\mathbf{\theta}_{\mathrm{weak}}\) the parameters of the strong and weak models. We considered interpolated models with parameters \(\mathbf{\theta}_{\alpha} = \alpha\,\mathbf{\theta}_{\mathrm{strong}} + (1-\alpha)\,\mathbf{\theta}_{\mathrm{weak}}\) and set \(\alpha \in \{0.3, 0.4, \ldots, 1.0\}\), obtaining 8 models (see the sketch after this list). These models exhibited different problem-solving accuracies on MATH. We sampled two solutions from each model for each problem, yielding diverse outputs with balanced correct and incorrect solutions. This technique was motivated by research on model parameter merging (e.g., model soups) and represents an interesting application of the idea: generating models with different levels of capability.
  • Leverage intermediate checkpoints. For the AMC, AIME, and Odyssey, recall that our policy model had been fine-tuned on those problems for 3 epochs. The final model and its intermediate checkpoints naturally provided us with multiple models exhibiting different levels of accuracy on these problems. We leveraged these intermediate checkpoints, sampling 12 solutions from each model trained for 0, 1, 2, and 3 epochs.
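The interpolation in the first strategy amounts to a weighted average of two state dictionaries. A sketch in Python, assuming the public Hugging Face checkpoint names and enough memory to hold the models:

```python
from transformers import AutoModelForCausalLM

# Public Hugging Face names for the strong (RL) and weak (base) checkpoints;
# adjust the identifiers (and dtype/device handling) for your environment.
strong = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-math-7b-rl")
weak = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-math-7b-base")


def interpolate(alpha: float):
    """Return a model with parameters alpha * strong + (1 - alpha) * weak."""
    merged = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-math-7b-rl")
    weak_state = weak.state_dict()
    merged_state = {
        name: alpha * tensor.data + (1.0 - alpha) * weak_state[name].data
        for name, tensor in strong.state_dict().items()
    }
    merged.load_state_dict(merged_state)
    return merged


# Eight interpolated models, alpha in {0.3, 0.4, ..., 1.0}
models = [interpolate(a / 10) for a in range(3, 11)]
```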

These strategies allowed us to obtain a diverse set of models almost for free, sampling varied correct and incorrect solutions. We further filtered the generated data by removing wrong solutions with non-integer answers, since it was trivial to determine during inference that those answers were incorrect. In addition, for each problem, we maintained equal numbers of correct and incorrect solutions to ensure label balance and avoid a biased reward model. The final dataset contains 7,000 unique problems and 37,880 labeled problem-solution pairs. We fine-tuned the DeepSeek-Math-7B-RL model on the curated dataset for 2 epochs with a learning rate of 2e-5.
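The label-balancing step can be sketched as follows, assuming each problem's sampled solutions have already been labeled as correct or incorrect:

```python
import random


def balance_labels(problem_solutions, seed=0):
    """Keep an equal number of correct and incorrect solutions per problem.

    problem_solutions: dict mapping a problem ID to a list of
    (solution_text, is_correct) pairs sampled from the model pool.
    """
    rng = random.Random(seed)
    balanced = {}
    for pid, solutions in problem_solutions.items():
        correct = [s for s, ok in solutions if ok]
        incorrect = [s for s, ok in solutions if not ok]
        k = min(len(correct), len(incorrect))
        if k == 0:
            continue  # skip problems that lack both labels
        balanced[pid] = (
            [(s, 1) for s in rng.sample(correct, k)]
            + [(s, 0) for s in rng.sample(incorrect, k)]
        )
    return balanced
```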

Figure 2: Weighted majority voting system based on the policy and reward models.

We validated the effectiveness of our reward model on the public training set. Notably, by pairing the policy model with the reward model and applying weighted majority voting, our method correctly solved 4 out of the 10 problems – while a single policy model could only solve 2 using standard majority voting.

Concluding remarks: Towards machine-based mathematical reasoning

With the models and techniques described above, our CMU-MATH team solved 22 out of 50 problems in the private test set, snagging second place and achieving the best performance among academic teams. This outcome marks a significant step towards the goal of machine-based mathematical reasoning.

However, we also note that the accuracy achieved by our models still trails that of proficient human competitors, who can easily solve over 95% of AIMO problems, indicating substantial room for improvement. A wide range of directions remains to be explored:

  • Advanced inference-time algorithms for mathematical reasoning. Our dual-model approach is a robust technique to enhance model reasoning at inference time. Recent research from our team suggests that more advanced inference-time algorithms, e.g., tree search methods, could even surpass weighted majority voting. Although computational constraints limited our ability to deploy this technique in the AIMO competition, future explorations on optimizing these inference-time algorithms can potentially lead to better mathematical reasoning approaches.
  • Integration of Automated Theorem Proving. Integrating automated theorem proving (ATP) tools, such as Lean, represents another promising frontier. ATP tools can provide rigorous logical frameworks and support for deeper mathematical analyses, potentially elevating the precision and reliability of problem-solving strategies employed by LLMs. The synergy between LLMs and ATP could lead to breakthroughs in complex problem-solving scenarios, where deep logical reasoning is essential.
  • Leveraging Larger, More Diverse Datasets. The competition reinforced a crucial lesson about the pivotal role of data in machine learning. Rich, diverse datasets, especially those comprising challenging mathematical problems, are vital for training more capable models. We advocate for the creation and release of larger datasets focused on mathematical reasoning, which would not only benefit our research but also the broader AI and mathematics communities.

Finally, we would like to thank Kaggle and XTX Markets for organizing this wonderful competition. We have open-sourced our code and datasets used in our solution to ensure reproducibility and facilitate future research. We invite the community to explore, utilize, and build upon our work, which is available in our GitHub repository. For further details about our results, please feel free to reach out to us!

Read More

Recipe for Magic: WPP and NVIDIA Omniverse Help The Coca-Cola Company Scale Generative AI Content That Pops With Brand Authenticity

When The Coca-Cola Company produces thirst-quenching marketing, the creative elements of campaigns aren’t just left to chance — there’s a recipe for the magic. Now, the beverage company, through its partnership with WPP Open X, is beginning to scale its global campaigns with generative AI from NVIDIA Omniverse and NVIDIA NIM microservices.

“With NVIDIA, we can personalize and customize Coke and meals imagery across 100-plus markets, delivering on hyperlocal relevance with speed and at global scale,” said Samir Bhutada, global vice president of StudioX Digital Transformation at The Coca-Cola Company.

Coca-Cola has been working with WPP to develop digital twin tools and roll out Prod X — a custom production studio experience created specifically for the beverage maker to use globally.

WPP announced today at SIGGRAPH that The Coca-Cola Company will be an early adopter for integrating the new NVIDIA NIM microservices for Universal Scene Description (aka OpenUSD) into its Prod X roadmap. OpenUSD is a 3D framework that enables interoperability between software tools and data types for building virtual worlds. NIM inference microservices provide models as optimized containers.

The USD Search NIM allows WPP to tap into a large archive of models to create on-brand assets, and the USD Code NIM can be used to assemble them into scenes.

These NIM microservices will enable Prod X users to create 3D advertising assets that contain culturally relevant elements on a global scale, using prompt engineering to quickly make adjustments to AI-generated images so that brands can better target their products at local markets.

Tapping Into NVIDIA NIM Microservices to Deploy Generative AI 

WPP said that the NVIDIA NIM microservices will have a lasting impact on the 3D engineering and art world.

The USD Search NIM can make WPP’s massive visual asset libraries quickly available via written prompts. The USD Code NIM allows developers to enter prompts and get Python code to create novel 3D worlds.

“The beauty of the solution is that it compresses multiple phases of the production process into a single interface and process,” said Perry Nightingale, senior vice president of creative AI at WPP, of the new NIM microservices. “It empowers artists to get more out of the technology and create better work.”

Redefining Content Production With Production Studio

WPP recently announced the release of Production Studio on WPP Open, the company’s intelligent marketing operating system powered by AI. Co-developed with its production company, Hogarth, Production Studio taps into the Omniverse development platform and OpenUSD for its generative AI-enabled product configurator workflows.

Production Studio can streamline and automate multilingual text, image and video creation, simplifying content creation for advertisers and marketers, and directly addresses the challenges advertisers continue to face in producing brand-compliant and product-accurate content at scale.

“Our groundbreaking research with NVIDIA Omniverse for the past few years, and the research and development associated with having built our own core USD pipeline and decades of experience in 3D workflows, is what made it possible for us to stand up a tailored experience like this for The Coca-Cola Company,” said Priti Mhatre, managing director for strategic consulting and AI at Hogarth.

SIGGRAPH attendees can hear more about WPP’s efforts by joining the company’s session on “Robotics, Generative AI, and OpenUSD: How WPP Is Building the Future of Creativity.”

NVIDIA founder and CEO Jensen Huang will also be featured at the event in fireside chats with Meta founder and CEO Mark Zuckerberg and WIRED Senior Writer Lauren Goode. Watch the talks and other sessions from NVIDIA at SIGGRAPH 2024 on demand.

Photo credit: WPP, The Coca-Cola Company

See notice regarding software product information.

Read More

Reality Reimagined: NVIDIA Introduces fVDB to Build Bigger Digital Models of the World

At SIGGRAPH, NVIDIA announced fVDB, a new deep-learning framework for generating AI-ready virtual representations of the real world.

fVDB is built on top of OpenVDB, the industry-standard library for simulating and rendering sparse volumetric data such as water, fire, smoke and clouds.

Generative physical AI, such as the autonomous vehicles and robots that inhabit the real world, needs to have “spatial intelligence” — the ability to understand and operate in 3D space.

Capturing the large scale and super-fine details of the world around us is essential. But converting reality into a virtual representation to train AI is hard.

Raw data for real-world environments can be collected through many different techniques, like neural radiance fields (NeRFs) and lidar. fVDB translates this data into massive, AI-ready environments rendered in real time.

Building on a decade of innovation in the OpenVDB standard, the introduction of fVDB at SIGGRAPH represents a significant leap forward in how industries can benefit from digital twins of the real world.

Reality-scale virtual environments are used for training autonomous agents. City-scale 3D models are captured by drones for climate science and disaster planning. Today, 3D generative AI is even used to plan urban spaces and smart cities.

fVDB enables industries to tap into spatial intelligence on a larger scale and with higher resolution than ever before, making physical AI even smarter.

The framework builds NVIDIA-accelerated AI operators on top of NanoVDB, a GPU-accelerated data structure for efficient 3D simulations. These operators include convolution, pooling, attention and meshing, all of which are designed for high-performance 3D deep learning applications.

AI operators allow businesses to build complex neural networks for spatial intelligence, like large-scale point cloud reconstruction and 3D generative modeling.

fVDB is the result of a long-running effort by NVIDIA’s research team and is already used to support NVIDIA Research, NVIDIA DRIVE and NVIDIA Omniverse projects that require high-fidelity models of large, complex real-world spaces.

Key Advantages of fVDB

  • Larger: 4x larger spatial scale than prior frameworks
  • Faster: 3.5x faster than prior frameworks
  • Interoperable: Businesses can fully tap into massive real-world datasets. fVDB reads VDB datasets into full-sized, AI-ready 3D environments rendered in real time for building physical AI with spatial intelligence.
  • More powerful: 10x more operators than prior frameworks. fVDB simplifies processes by combining functionalities that previously required multiple deep-learning libraries.

fVDB will soon be available as NVIDIA NIM inference microservices. A trio of the microservices will enable businesses to incorporate fVDB into OpenUSD workflows, generating AI-ready OpenUSD geometry in NVIDIA Omniverse, a development platform for industrial digitalization and generative physical AI applications. They are:

  • fVDB Mesh Generation NIM — Generates digital 3D environments of the real world
  • fVDB NeRF-XL NIM — Generates large-scale NeRFs in OpenUSD using Omniverse Cloud APIs
  • fVDB Physics Super-Res NIM — Performs super-resolution to generate an OpenUSD-based, high-resolution physics simulation

Over the past decade, OpenVDB, housed at the Academy Software Foundation, has earned multiple Academy Awards as a core technology used throughout the visual-effects industry. It has since grown beyond entertainment to industrial and scientific uses, like industrial design and robotics.

NVIDIA continues to enhance the open-source OpenVDB library. Four years ago, the company introduced NanoVDB, which added GPU support to OpenVDB. This delivered an order-of-magnitude speed-up, enabling faster performance and easier development, and opening the door to real-time simulation and rendering.

Two years ago, NVIDIA introduced NeuralVDB, which builds machine learning on top of NanoVDB to compress the memory footprint of VDB volumes up to 100x, allowing creators, developers and researchers to interact with extremely large and complex datasets.

fVDB builds AI operators on top of NanoVDB to unlock spatial intelligence at the scale of reality. Apply to the early-access program for the fVDB PyTorch extension. fVDB will also be available as part of the OpenVDB GitHub repository.

Dive deeper into fVDB in this technical blog and watch how accelerated computing and generative AI are transforming industries and creating new opportunities for innovation and growth in NVIDIA founder and CEO Jensen Huang’s two fireside chats at SIGGRAPH.

See notice regarding software product information.

Read More

NVIDIA Supercharges Digital Marketing With Greater Control Over Generative AI

The world’s brands and agencies are using generative AI to create advertising and marketing content, but it doesn’t always provide the desired outputs.

NVIDIA offers a comprehensive set of technologies — bringing together generative AI, NVIDIA NIM microservices, NVIDIA Omniverse and Universal Scene Description (OpenUSD) — to allow developers to build applications and workflows that enable brand-accurate, targeted and efficient advertising at scale.

Developers can use the USD Search NIM microservice to provide artists access to a vast archive of OpenUSD-based, brand-approved assets — such as products, props and environments — and when integrated with the USD Code NIM microservice, assembly of these scenes can be accelerated. Teams can also use the NVIDIA Edify-powered Shutterstock Generative 3D service to rapidly generate new 3D assets using AI.

The scenes, once constructed, can be rendered to a 2D image and used as input to direct an AI-powered image generator to create precise, brand-accurate visuals.

Global agencies, developers and production studios are tapping these technologies to revolutionize every aspect of the advertising process, from creative production and content supply chain to dynamic creative optimization.

WPP announced at SIGGRAPH its adoption of the technologies, naming The Coca-Cola Company the first brand to embrace generative AI with Omniverse and NVIDIA NIM microservices.

Agencies and Service Providers Increase Adoption of Omniverse

The NVIDIA Omniverse development platform has seen widespread adoption for its ability to build accurate digital twins of products. These virtual replicas allow brands and agencies to create ultra-photorealistic and physically accurate 3D product configurators, helping to increase personalization, customer engagement and loyalty, and average selling prices, and reducing return rates.

Digital twins can also serve many purposes and be updated to meet shifting consumer preferences with minimal time, cost and effort, helping flexibly scale content production.

Image courtesy of Monks, Hatch.

Global marketing and technology services company Monks developed Monks.Flow, an AI-centric professional managed service that uses the Omniverse platform to help brands virtually explore different customizable product designs and unlock scale and hyper-personalization across any customer journey.

“NVIDIA Omniverse and OpenUSD’s interoperability accelerates connectivity between marketing, technology and product development,” said Lewis Smithingham, executive vice president of strategic industries at Monks. “Combining Omniverse with Monks’ streamlined marketing and technology services, we infuse AI throughout the product development pipeline and help accelerate technological and creative possibilities for clients.”

Collective World, a creative and technology company, is an early adopter of real-time 3D, OpenUSD and NVIDIA Omniverse, using them to create high-quality digital campaigns for customers like Unilever and EE. The technologies allow Collective to develop digital twins, delivering consistent, high-quality product content at scale to streamline advertising and marketing campaigns.

Building on its use of NVIDIA technologies, Collective World announced at SIGGRAPH that it has joined the NVIDIA Partner Network.

Product digital twin configurator and content generation tool built by Collective on NVIDIA Omniverse.

INDG is using Omniverse to introduce new capabilities into Grip, its popular software tool. Grip uses OpenUSD and generative AI to streamline and enhance the creation process, delivering stunning, high-fidelity marketing content faster than ever.

“This integration helps bring significant efficiencies to every brand by delivering seamless interoperability and enabling real-time visualization,” said Frans Vriendsendorp, CEO of INDG. “Harnessing the potential of USD to eliminate the lock-in to proprietary formats, the combination of Grip and Omniverse is helping set new standards in the realm of digital content creation.”

Image generated with Grip, copyright Beiersdorf

To get started building applications and services using OpenUSD, Omniverse and NVIDIA AI, check out the product configurator developer resources and the generative AI workflow for content creation reference architecture, or submit a contact form to learn more or connect with NVIDIA’s ecosystem of service providers.

Watch NVIDIA founder and CEO Jensen Huang’s fireside chats, as well as other on-demand sessions from NVIDIA at SIGGRAPH.

Stay up to date by subscribing to our newsletter, and following NVIDIA Omniverse on Instagram, LinkedIn, Medium and X.

Read More

New NVIDIA Digital Human Technologies Enhance Customer Interactions Across Industries

Generative AI is unlocking new ways for enterprises to engage customers through digital human avatars.

At SIGGRAPH, NVIDIA previewed “James,” an interactive digital human that can connect with people using emotions, humor and more. James is based on a customer-service workflow using NVIDIA ACE, a reference design for creating custom, hyperrealistic, interactive avatars. Users will soon be able to talk with James in real time at ai.nvidia.com.

NVIDIA also showcased at the computer graphics conference the latest advancements to the NVIDIA Maxine AI platform, including Maxine 3D and Audio2Face-2D for an immersive telepresence experience.

Developers can use Maxine and NVIDIA ACE digital human technologies to make customer interactions with digital interfaces more engaging and natural. ACE technologies enable digital human development with AI models for speech and translation, vision, intelligence, lifelike animation and behavior, and realistic appearance.

Companies across industries are using Maxine and ACE to deliver immersive virtual customer experiences.

Meet James, a Digital Brand Ambassador

Built on top of NVIDIA NIM microservices, James is a virtual assistant that can provide contextually accurate responses.

Using retrieval-augmented generation (RAG), James can accurately tell users about the latest NVIDIA technologies. ACE allows developers to use their own data to create domain-specific avatars that can communicate relevant information to customers.

James is powered by the latest NVIDIA RTX rendering technologies for advanced, lifelike animations. His natural-sounding voice is powered by ElevenLabs. NVIDIA ACE lets developers customize animation, voice and language when building avatars tailored for different use cases.

NVIDIA Maxine Enhances Digital Humans in Telepresence

Maxine, a platform for deploying cutting-edge AI features that enhance the audio and video quality of digital humans, enables the use of real-time, photorealistic 2D and 3D avatars with video-conferencing devices.

Maxine 3D converts 2D video portrait inputs into 3D avatars, allowing the integration of highly realistic digital humans in video conferencing and other two-way communication applications. The technology will soon be available in early access.

Audio2Face-2D, currently in early access, animates static portraits based on audio input, creating dynamic, speaking digital humans from a single image. Try the technology at ai.nvidia.com.

Companies Embracing Digital Human Applications

HTC, Looking Glass, Reply and UneeQ are among the latest companies using NVIDIA ACE and Maxine across a broad range of use cases, including customer service agents, and telepresence experiences in entertainment, retail and hospitality.

At SIGGRAPH, digital human technology developer UneeQ is showcasing two new demos.

The first spotlights cloud-rendered digital humans powered by NVIDIA GPUs with local, in-browser computer vision for enhanced scalability and privacy, and animated using the Audio2Face-3D NVIDIA NIM microservice. UneeQ’s Synapse technology processes anonymized user data and feeds it to a large language model (LLM) for more accurate, responsive interactions.

The second demo runs on a single NVIDIA RTX GPU-powered laptop, featuring an advanced digital human powered by Gemma 7B LLM, RAG and the NVIDIA Audio2Face-3D NIM microservice.

Both demos showcase UneeQ’s NVIDIA-powered efforts to develop digital humans that can react to users’ facial expressions and actions, pushing the boundaries of realism in virtual customer service experiences.

HTC Viverse has integrated the Audio2Face-3D NVIDIA NIM microservice into its VIVERSE AI agent for dynamic facial animation and lip sync, allowing for more natural and immersive user interactions.

Hologram technology company Looking Glass’ Magic Mirror demo at SIGGRAPH uses a simple camera setup and Maxine’s advanced 3D AI capabilities to generate a real-time holographic feed of users’ faces on its newly launched, group-viewable Looking Glass 16-inch and 32-inch Spatial Displays.

Reply is unveiling an enhanced version of Futura, its cutting-edge digital human developed for Costa Crociere’s Costa Smeralda cruise ship. Powered by Audio2Face-3D NVIDIA NIM and Riva ASR NIM microservices, Futura’s speech-synthesis capabilities tap advanced technologies including GPT-4o, LlamaIndex for RAG and Microsoft Azure text-to-speech services.

Futura also incorporates Reply’s proprietary affective computing technology, alongside Hume AI and MorphCast, for comprehensive emotion recognition. Built using Unreal Engine 5.4.3 and MetaHuman Creator with NVIDIA ACE-powered facial animation, Futura supports six languages. The intelligent assistant can help plan personalized port visits, suggest tailored itineraries and facilitate tour bookings.

In addition, Futura refines recommendations based on guest feedback and uses a specially created knowledge base to provide informative city presentations, enhancing tourist itineraries. Futura aims to enhance customer service and offer immersive interactions in real-world scenarios, leading to streamlined operations and driving business growth.

Learn more about NVIDIA ACE and NVIDIA Maxine

Discover how accelerated computing and generative AI are transforming industries and creating new opportunities for innovation by watching NVIDIA founder and CEO Jensen Huang’s fireside chats at SIGGRAPH.

See notice regarding software product information.

Read More

Hugging Face Offers Developers Inference-as-a-Service Powered by NVIDIA NIM

One of the world’s largest AI communities — comprising 4 million developers on the Hugging Face platform — is gaining easy access to NVIDIA-accelerated inference on some of the most popular AI models.

New inference-as-a-service capabilities will enable developers to rapidly deploy leading large language models such as the Llama 3 family and Mistral AI models with optimization from NVIDIA NIM microservices running on NVIDIA DGX Cloud.

Announced today at the SIGGRAPH conference, the service will help developers quickly prototype with open-source AI models hosted on the Hugging Face Hub and deploy them in production. Enterprise Hub users can tap serverless inference for increased flexibility, minimal infrastructure overhead and optimized performance with NVIDIA NIM.

The inference service complements Train on DGX Cloud, an AI training service already available on Hugging Face.

Developers facing a growing number of open-source models can benefit from a hub where they can easily compare options. These training and inference tools give Hugging Face developers new ways to experiment with, test and deploy cutting-edge models on NVIDIA-accelerated infrastructure. They’re made easily accessible using the “Train” and “Deploy” drop-down menus on Hugging Face model cards, letting users get started with just a few clicks.

Get started with inference-as-a-service powered by NVIDIA NIM.
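As a rough sketch, serverless inference from the Hub can be called with the huggingface_hub client. The model ID below is illustrative, and the exact snippet for NIM-backed DGX Cloud endpoints is the one surfaced in the model card's Deploy menu:

```python
from huggingface_hub import InferenceClient

# Serverless inference through the Hugging Face Hub; assumes your token has
# access to the model and that the endpoint is routed to NVIDIA-accelerated
# infrastructure as described above.
client = InferenceClient("meta-llama/Meta-Llama-3-8B-Instruct")

response = client.chat_completion(
    messages=[{"role": "user", "content": "Explain what an NVIDIA NIM microservice is."}],
    max_tokens=200,
)
print(response.choices[0].message.content)
```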

Beyond a Token Gesture — NVIDIA NIM Brings Big Benefits

NVIDIA NIM is a collection of AI microservices — including NVIDIA AI foundation models and open-source community models — optimized for inference using industry-standard application programming interfaces, or APIs.

NIM offers users higher efficiency in processing tokens — the units of data used and generated by a language model. The optimized microservices also improve the efficiency of the underlying NVIDIA DGX Cloud infrastructure, which can increase the speed of critical AI applications.

This means developers see faster, more robust results from an AI model accessed as a NIM compared with other versions of the model. The 70-billion-parameter version of Llama 3, for example, delivers up to 5x higher throughput when accessed as a NIM compared with off-the-shelf deployment on NVIDIA H100 Tensor Core GPU-powered systems.

Near-Instant Access to DGX Cloud Provides Accessible AI Acceleration

The NVIDIA DGX Cloud platform is purpose-built for generative AI, offering developers easy access to reliable accelerated computing infrastructure that can help them bring production-ready applications to market faster.

The platform provides scalable GPU resources that support every step of AI development, from prototype to production, without requiring developers to make long-term AI infrastructure commitments.

Hugging Face inference-as-a-service on NVIDIA DGX Cloud powered by NIM microservices offers easy access to compute resources that are optimized for AI deployment, enabling users to experiment with the latest AI models in an enterprise-grade environment.

More on NVIDIA NIM at SIGGRAPH 

At SIGGRAPH, NVIDIA also introduced generative AI models and NIM microservices for the OpenUSD framework to accelerate developers’ abilities to build highly accurate virtual worlds for the next evolution of AI.

To experience more than 100 NVIDIA NIM microservices with applications across industries, visit ai.nvidia.com.

Read More

AI Gets Physical: New NVIDIA NIM Microservices Bring Generative AI to Digital Environments

Millions of people already use generative AI to assist in writing and learning. Now, the technology can also help them more effectively navigate the physical world.

NVIDIA announced at SIGGRAPH generative physical AI advancements including the NVIDIA Metropolis reference workflow for building interactive visual AI agents and new NVIDIA NIM microservices that will help developers train physical machines and improve how they handle complex tasks.

These include three fVDB NIM microservices that support NVIDIA’s new deep learning framework for 3D worlds, as well as the USD Code, USD Search and USD Validate NIM microservices for working with Universal Scene Description (aka OpenUSD).

The NVIDIA OpenUSD NIM microservices work together with the world’s first generative AI models for OpenUSD development — also developed by NVIDIA — to enable developers to incorporate generative AI copilots and agents into USD workflows and broaden the possibilities of 3D worlds.

NVIDIA NIM Microservices Transform Physical AI Landscapes

Physical AI uses advanced simulations and learning methods to help robots and other industrial automation more effectively perceive, reason and navigate their surroundings. The technology is transforming industries like manufacturing and healthcare, and advancing smart spaces with robots, factory and warehouse technologies, surgical AI agents and cars that can operate more autonomously and precisely.

NVIDIA offers a broad range of NIM microservices customized for specific models and industry domains. NVIDIA’s suite of NIM microservices tailored for physical AI supports capabilities for speech and translation, vision and intelligence, and realistic animation and behavior.

Turning Visual AI Agents Into Visionaries With NVIDIA NIM

Visual AI agents use computer vision capabilities to perceive and interact with the physical world and perform reasoning tasks.

Highly perceptive and interactive visual AI agents are powered by a new class of generative AI models called vision language models (VLMs), which bridge digital perception and real-world interaction in physical AI workloads to enable enhanced decision-making, accuracy, interactivity and performance. With VLMs, developers can build vision AI agents that can more effectively handle challenging tasks, even in complex environments.

Generative AI-powered visual AI agents are rapidly being deployed across hospitals, factories, warehouses, retail stores, airports, traffic intersections and more.

To help physical AI developers more easily build high-performing, custom visual AI agents, NVIDIA offers NIM microservices and reference workflows for physical AI. The NVIDIA Metropolis reference workflow provides a simple, structured approach for customizing, building and deploying visual AI agents, as detailed in the blog.

NVIDIA NIM Helps K2K Make Palermo More Efficient, Safe and Secure

City traffic managers in Palermo, Italy, deployed visual AI agents using NVIDIA NIM to uncover physical insights that help them better manage roadways.

K2K, an NVIDIA Metropolis partner, is leading the effort, integrating NVIDIA NIM microservices and VLMs into AI agents that analyze the city’s live traffic cameras in real time. City officials can ask the agents questions in natural language and receive fast, accurate insights on street activity and suggestions on how to improve the city’s operations, like adjusting traffic light timing.

Leading global electronics giants Foxconn and Pegatron have adopted physical AI, NIM microservices and Metropolis reference workflows to more efficiently design and run their massive manufacturing operations.

The companies are building virtual factories in simulation to save significant time and costs. They’re also running more thorough tests and refinements for their physical AI — including AI multi-camera and visual AI agents — in digital twins before real-world deployment, improving worker safety and leading to operational efficiencies.

Bridging the Simulation-to-Reality Gap With Synthetic Data Generation

Many AI-driven businesses are now adopting a “simulation-first” approach for generative physical AI projects involving real-world industrial automation.

Manufacturing, factory logistics and robotics companies need to manage intricate human-worker interactions, advanced facilities and expensive equipment. NVIDIA physical AI software, tools and platforms — including physical AI and VLM NIM microservices, reference workflows and fVDB — can help them streamline the highly complex engineering required to create digital representations or virtual environments that accurately mimic real-world conditions.

VLMs are seeing widespread adoption across industries because of their ability to generate highly realistic imagery. However, these models can be challenging to train because of the immense volume of data required to create an accurate physical AI model.

Synthetic data generated from digital twins using computer simulations offers a powerful alternative to real-world datasets, which can be expensive — and sometimes impossible — to acquire for model training, depending on the use case.

Tools like NVIDIA NIM microservices and Omniverse Replicator let developers build generative AI-enabled synthetic data pipelines to accelerate the creation of robust, diverse datasets for training physical AI. This enhances the adaptability and performance of models such as VLMs, enabling them to generalize more effectively across industries and use cases.
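As a rough illustration only, the sketch below follows the common Omniverse Replicator tutorial pattern for domain-randomized synthetic data generation. It must run inside an Omniverse environment with the Replicator extension enabled, and the scene contents, randomization ranges and writer settings are placeholder assumptions that may need adjusting for your Replicator version.

```python
# Rough sketch of a Replicator-style synthetic data script (runs inside Omniverse).
# Scene objects, ranges, frame counts and writer options are placeholders; API
# parameter names can differ between Replicator versions.
import omni.replicator.core as rep

with rep.new_layer():
    camera = rep.create.camera(position=(0, 0, 1000), look_at=(0, 0, 0))
    render_product = rep.create.render_product(camera, (1024, 1024))

    def randomize_boxes():
        # Create placeholder objects and randomize their poses each trigger.
        boxes = rep.create.cube(count=20, semantics=[("class", "box")])
        with boxes:
            rep.modify.pose(
                position=rep.distribution.uniform((-500, -500, 0), (500, 500, 0)),
                rotation=rep.distribution.uniform((0, 0, 0), (0, 0, 360)),
            )
        return boxes.node

    rep.randomizer.register(randomize_boxes)

    with rep.trigger.on_frame(num_frames=100):
        rep.randomizer.randomize_boxes()

    # Write RGB frames and 2D bounding box annotations for model training.
    writer = rep.WriterRegistry.get("BasicWriter")
    writer.initialize(output_dir="_out_synthetic", rgb=True, bounding_box_2d_tight=True)
    writer.attach([render_product])

rep.orchestrator.run()
```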

Availability

Developers can access state-of-the-art, open and NVIDIA-built foundation AI models and NIM microservices at ai.nvidia.com. The Metropolis NIM reference workflow is available in the GitHub repository, and Metropolis VIA microservices are available for download in developer preview.

OpenUSD NIM microservices are available in preview through the NVIDIA API catalog.

Watch how accelerated computing and generative AI are transforming industries and creating new opportunities for innovation and growth in NVIDIA founder and CEO Jensen Huang’s fireside chats at SIGGRAPH.

See notice regarding software product information.

Read More

For Your Edification: Shutterstock Releases Generative 3D, Getty Images Upgrades Service Powered by NVIDIA

Designers and artists have new and improved ways to boost their productivity with generative AI trained on licensed data.

Shutterstock, a leading platform for creative content, launched its Generative 3D service in commercial beta. It lets creators quickly prototype 3D assets and generate 360 HDRi backgrounds that light scenes, using just text or image prompts.

Getty Images, a premier visual content creator and marketplace, turbocharged its Generative AI by Getty Images service so it creates images twice as fast, improves output quality, brings advanced controls and enables fine-tuning.

The services are built with NVIDIA’s visual AI foundry using NVIDIA Edify, a multimodal generative AI architecture. The AI models are then optimized and packaged for maximum performance with NVIDIA NIM, a set of accelerated microservices for AI inference.

Edify enables service providers to train responsible generative models on their licensed data and scale them quickly with NVIDIA DGX Cloud, the cloud-first way to get the best of NVIDIA AI.

Generative AI Speeds 3D Modeling

Available now for enterprises in commercial beta, Shutterstock’s service lets designers and artists quickly create 3D objects that help them prototype or populate virtual environments. For example, tapping generative AI, they can quickly create the silverware and plates on a dining room table so they can focus on designing the characters around it.

The 3D assets the service generates are ready to edit using digital content creation tools and are available in a variety of popular file formats. Their clean geometry and layout give artists an advanced starting point for adding their own flair.

An example of a 3D mesh from Shutterstock Generative 3D.

The AI model first delivers a preview of a single asset in as little as 10 seconds. If users like it, the preview can be turned into a higher-quality 3D asset, complete with physically based rendering materials like concrete, wood or leather.

At this year’s SIGGRAPH computer graphics conference, designers will see just how fast they can make their ideas come to life.

Shutterstock will demo a workflow in Blender that lets artists generate objects directly within their 3D environment. In the Shutterstock booth at SIGGRAPH, HP will show 3D prints and physical prototypes of the kinds of assets attendees can design on the show floor using Generative 3D.

Shutterstock is also working with global marketing and communications services company WPP to bring ideas to life with Edify 3D generation for virtual production (see video below).

Explore Generative 3D by Shutterstock on the company’s website, or test-drive the application programming interface (API) at build.nvidia.com/.

Virtual Lighting Gets Real

Lighting a virtual scene with accurate reflections can be a complicated task. Creatives need to operate expensive 360-degree camera rigs and go on set to create backgrounds from scratch, or search vast libraries for something that approximates what they want.

With Shutterstock’s Generative 3D service, users can now simply describe the exact environment they need in text or with an image, and out comes a high-dynamic-range panoramic image, aka 360 HDRi, in brilliant 16K resolution. (See video below.)

Want that beautiful new sports car shown in a desert, a tropical beach or maybe on a winding mountain road? With generative AI, designers can shift gears fast.

Three companies plan to integrate Shutterstock’s 360 HDRi APIs directly into their workflows — WPP, CGI studio Katana and Dassault Systèmes, developer of the 3DEXCITE applications for creating high-end visualizations and 3D content for virtual worlds.

Examples from Generative AI by Getty Images.

Great Images Get a Custom Fit

Generative AI by Getty Images has upgraded to a more powerful Edify AI model with a portfolio of new features that let artists control image composition and style.

Want a red beach ball floating above that perfect shot of a coral reef in Fiji? Getty Images’ service can get it done in a snap.

The new model is twice as fast, boosts image quality and prompt accuracy, and lets users control camera settings like the depth of field or focal length of a shot. Users can generate four images in about six seconds and scale them up to 4K resolution.

An example of the camera controls in Generative AI by Getty Images.

In addition, the commercially safe foundation model now serves as the basis for a fine-tuning capability that lets companies customize the AI with their own data. That lets them generate images tailored to the creative style of their specific brands.

New controls in the service support the use of a sketch or depth map to guide the composition or structure of an image.

Creatives at Omnicom, a global leader in marketing and sales solutions, are using Getty Images’ service to streamline advertising workflows and safely create on-brand content. The collaboration with Getty Images is part of Omnicom’s strategy to infuse generative AI into every facet of its business, helping teams move from ideas to outcomes faster.

Generative AI by Getty Images is available through the Getty Images and iStock websites, and via an API.

For more about NVIDIA’s offerings, read about the AI foundry for visual generative AI built on NVIDIA DGX Cloud, and try it on ai.nvidia.com.

To get the big picture, listen to NVIDIA founder and CEO Jensen Huang in two fireside chats at SIGGRAPH.

See notice regarding software product information.

Read More

Transition your Amazon Forecast usage to Amazon SageMaker Canvas

Amazon Forecast is a fully managed service that uses statistical and machine learning (ML) algorithms to deliver highly accurate time series forecasts. Launched in August 2019, Forecast predates Amazon SageMaker Canvas, a popular low-code/no-code AWS tool for building, customizing, and deploying ML models, including time series forecasting models.

With SageMaker Canvas, you get faster model building, cost-effective predictions, advanced features such as a model leaderboard and algorithm selection, and enhanced transparency. You can either use the SageMaker Canvas UI, which provides a visual interface for building and deploying models without writing any code or requiring any ML expertise, or use its automated machine learning (AutoML) APIs for programmatic interaction.

In this post, we provide an overview of the benefits SageMaker Canvas offers and details on how Forecast users can transition their use cases to SageMaker Canvas.

Benefits of SageMaker Canvas

Forecast customers have been seeking greater transparency, lower costs, faster training, and enhanced controls for building time series ML models. In response to this feedback, we have made next-generation time series forecasting capabilities available in SageMaker Canvas, which already offers a robust platform for preparing data and building and deploying ML models. With the addition of forecasting, you can now access end-to-end ML capabilities for a broad set of model types—including regression, multi-class classification, computer vision (CV), natural language processing (NLP), and generative artificial intelligence (AI)—within the unified user-friendly platform of SageMaker Canvas.

SageMaker Canvas offers up to 50% faster model building performance and up to 45% quicker predictions on average for time series models compared to Forecast across various benchmark datasets. Generating predictions is significantly more cost-effective than in Forecast, because costs are based solely on the Amazon SageMaker compute resources used. SageMaker Canvas also provides excellent model transparency by offering direct access to trained models, which you can deploy to a location of your choice, along with numerous model insight reports, including access to validation data, model- and item-level performance metrics, and hyperparameters employed during training.

SageMaker Canvas includes the key capabilities found in Forecast, including the ability to train an ensemble of forecasting models using both statistical and neural network algorithms. It creates the best model for your dataset by generating base models for each algorithm, evaluating their performance, and then combining the top-performing models into an ensemble. This approach leverages the strengths of different models to produce more accurate and robust forecasts. You have the flexibility to select one or several algorithms for model creation, along with the capability to evaluate the impact of model features on prediction accuracy. SageMaker Canvas simplifies your data preparation with automated solutions for filling in missing values, making your forecasting efforts as seamless as possible. It facilitates an out-of-the-box integration of external information, such as country-specific holidays, through simple UI options or API configurations. You can also take advantage of its data flow feature to connect with external data providers’ APIs to import data, such as weather information. Furthermore, you can conduct what-if analyses directly in the SageMaker Canvas UI to explore how various scenarios might affect your outcomes.

We will continue to innovate and deliver cutting-edge, industry-leading forecasting capabilities through SageMaker Canvas by lowering latency, reducing training and prediction costs, and improving accuracy. This includes expanding the range of forecasting algorithms we support and incorporating new advanced algorithms to further enhance the model building and prediction experience.

Transitioning from Forecast to SageMaker Canvas

Today, we’re releasing a transition package comprising two resources to help you move from Forecast to SageMaker Canvas. The first is a workshop that provides hands-on experience with the SageMaker Canvas UI and APIs and walks through the transition. The second is a Jupyter notebook that shows how to transform your existing Forecast training datasets into the SageMaker Canvas format.

Before we learn how to build forecast models in SageMaker Canvas using your Forecast input datasets, let’s understand some key differences between Forecast and SageMaker Canvas:

  • Dataset types – Forecast uses multiple datasets: target time series, related time series (optional), and item metadata (optional). In contrast, SageMaker Canvas requires only one dataset, eliminating the need to manage multiple datasets.
  • Model invocation – SageMaker Canvas allows you to invoke the model for a single dataset or a batch of datasets using the UI as well as the APIs. Unlike Forecast, which requires you to first create a forecast and then query it, you simply use the UI or API to invoke the endpoint where the model is deployed to generate forecasts. The SageMaker Canvas UI also gives you the option to deploy the model for inference on SageMaker real-time endpoints. With just a few clicks, you can receive an HTTPS endpoint that can be invoked from within your application to generate forecasts.

In the following sections, we discuss the high-level steps for transforming your data, building a model, and deploying a model using SageMaker Canvas using either the UI or APIs.

Build and deploy a model using the SageMaker Canvas UI

We recommend reorganizing your data sources to directly create a single dataset for use with SageMaker Canvas. Refer to Time Series Forecasts in Amazon SageMaker Canvas for guidance on structuring your input dataset to build a forecasting model in SageMaker Canvas. However, if you prefer to continue using multiple datasets as you do in Forecast, you have the following options to merge them into a single dataset supported by SageMaker Canvas:

  • SageMaker Canvas UI – Use the SageMaker Canvas UI to join the target time series, related time series, and item metadata datasets into one dataset. The following screenshot shows an example dataflow created in SageMaker Canvas to merge the three datasets into one SageMaker Canvas dataset.
  • Python script – Use a Python script to merge the datasets; a minimal sketch follows this list. For sample code and hands-on experience in transforming multiple Forecast datasets into one dataset for SageMaker Canvas, refer to this workshop.
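Here is a minimal sketch of that script-based option, assuming three Forecast-style CSV exports with hypothetical column names (item_id, timestamp, demand); adjust the file names and join keys to match your actual schemas.

```python
# Minimal sketch: merge Forecast-style datasets into the single CSV SageMaker Canvas expects.
# File names and column names (item_id, timestamp, demand) are hypothetical placeholders.
import pandas as pd

target = pd.read_csv("target_time_series.csv")     # item_id, timestamp, demand
related = pd.read_csv("related_time_series.csv")   # item_id, timestamp, price, promo
metadata = pd.read_csv("item_metadata.csv")        # item_id, category, color

# Join the related time series on item and timestamp, then attach static item metadata.
merged = (
    target
    .merge(related, on=["item_id", "timestamp"], how="left")
    .merge(metadata, on="item_id", how="left")
)

merged.to_csv("canvas_dataset.csv", index=False)
print(merged.head())
```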

When the dataset is ready, use the SageMaker Canvas UI, available on the SageMaker console, to load it into the SageMaker Canvas application, which uses AutoML to train, build, and deploy the model for inference. The workshop shows how to merge your datasets and build the forecasting model.

After the model is built, there are multiple ways to generate and consume forecasts:

  • Make an in-app prediction – You can generate forecasts using the SageMaker Canvas UI and export them to Amazon QuickSight using built-in integration or download the prediction file to your local desktop. You can also access the generated predictions from the Amazon Simple Storage Service (Amazon S3) storage location where SageMaker Canvas is configured to store model artifacts, datasets, and other application data. Refer to Configure your Amazon S3 storage to learn more about the Amazon S3 storage location used by SageMaker Canvas.
  • Deploy the model to a SageMaker endpoint – You can deploy the model to SageMaker real-time endpoints directly from the SageMaker Canvas UI. These endpoints can be queried by developers in their applications with a few lines of code, as shown in the sketch after this list. You can update the code in your existing application to invoke the deployed model. Refer to the workshop for more details.
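For illustration, here is a minimal sketch of calling a deployed SageMaker real-time endpoint with boto3; the endpoint name and CSV payload are placeholders and must match the schema and content type expected by the model you deployed from SageMaker Canvas.

```python
# Minimal sketch: invoke a SageMaker real-time endpoint deployed from SageMaker Canvas.
# The endpoint name and payload rows are hypothetical placeholders.
import boto3

runtime = boto3.client("sagemaker-runtime")

# A small batch of rows in the same CSV layout used to train the model.
payload = "item_001,2024-07-01\nitem_002,2024-07-01"

response = runtime.invoke_endpoint(
    EndpointName="canvas-forecast-endpoint",  # placeholder endpoint name
    ContentType="text/csv",
    Body=payload,
)

print(response["Body"].read().decode("utf-8"))
```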

Build and deploy a model using the SageMaker Canvas (Autopilot) APIs

You can use the sample code provided in the notebook in the GitHub repo to process your datasets, including target time series data, related time series data, and item metadata, into the single dataset required by the SageMaker Canvas APIs.

Next, use the SageMaker AutoML API for time series forecasting to process the data, train the ML model, and deploy the model programmatically. Refer to the sample notebook in the GitHub repo for a detailed implementation of how to train a time series model and generate predictions with it.
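As a rough sketch of the programmatic path, the following starts a time series forecasting AutoML job with boto3; the S3 locations, column names, IAM role, and forecast settings are placeholders, and the sample notebook in the GitHub repo remains the authoritative reference.

```python
# Minimal sketch: launch a time series forecasting AutoML job via the SageMaker API.
# S3 paths, column names, role ARN, and forecast settings are hypothetical placeholders.
import boto3

sm = boto3.client("sagemaker")

sm.create_auto_ml_job_v2(
    AutoMLJobName="canvas-transition-forecast-job",
    AutoMLJobInputDataConfig=[
        {
            "ChannelType": "training",
            "ContentType": "text/csv;header=present",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://your-bucket/canvas_dataset.csv",
                }
            },
        }
    ],
    OutputDataConfig={"S3OutputPath": "s3://your-bucket/automl-output/"},
    AutoMLProblemTypeConfig={
        "TimeSeriesForecastingJobConfig": {
            "ForecastFrequency": "D",
            "ForecastHorizon": 14,
            "ForecastQuantiles": ["p10", "p50", "p90"],
            "TimeSeriesConfig": {
                "TargetAttributeName": "demand",
                "TimestampAttributeName": "timestamp",
                "ItemIdentifierAttributeName": "item_id",
            },
        }
    },
    RoleArn="arn:aws:iam::123456789012:role/YourSageMakerExecutionRole",
)
```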

Refer to the workshop for more hands-on experience.

Conclusion

In this post, we outlined steps to transition from Forecast and build time series ML models in SageMaker Canvas, and provided a data transformation notebook and prescriptive guidance through a workshop. After the transition, you can benefit from a more accessible UI, cost-effectiveness, and higher transparency of the underlying AutoML API in SageMaker Canvas, democratizing time series forecasting within your organization and saving time and resources on model training and deployment.

SageMaker Canvas can be accessed from the SageMaker console. Time series forecasting is available in all AWS Regions where SageMaker Canvas is available. For more information about AWS Region availability, see AWS Services by Region.

Resources

For more information, see the following resources:


About the Authors

Nirmal Kumar is Sr. Product Manager for the Amazon SageMaker service. Committed to broadening access to AI/ML, he steers the development of no-code and low-code ML solutions. Outside work, he enjoys travelling and reading non-fiction.

Dan Sinnreich is a Sr. Product Manager for Amazon SageMaker, focused on expanding no-code / low-code services. He is dedicated to making ML and generative AI more accessible and applying them to solve challenging problems. Outside of work, he can be found playing hockey, scuba diving, and reading science fiction.

Davide Gallitelli is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in Brussels and works closely with customers throughout Benelux. He has been a developer since he was very young, starting to code at the age of 7. He began learning AI/ML in his later years of university and has been in love with it ever since.

Biswanath Hore is a Solutions Architect at Amazon Web Services. He works with customers early in their AWS journey, helping them adopt cloud solutions to address their business needs. He is passionate about Machine Learning and, outside of work, loves spending time with his family.

Read More