SLMming Down Latency: How NVIDIA’s First On-Device Small Language Model Makes Digital Humans More Lifelike

Editor’s note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, software, tools and accelerations for RTX PC and workstation users.

At Gamescom this week, NVIDIA announced that NVIDIA ACE — a suite of technologies for bringing digital humans to life with generative AI — now includes the company’s first on-device small language model (SLM), powered locally by RTX AI.

The model, called Nemotron-4 4B Instruct, provides better role-play, retrieval-augmented generation and function-calling capabilities, so game characters can more intuitively comprehend player instructions, respond to gamers, and perform more accurate and relevant actions.

Available as an NVIDIA NIM microservice for cloud and on-device deployment by game developers, the model is optimized for low memory usage, offering faster response times and providing developers a way to take advantage of over 100 million GeForce RTX-powered PCs and laptops and NVIDIA RTX-powered workstations.

The SLM Advantage

An AI model’s accuracy and performance depend on the size and quality of the dataset used for training. Large language models are trained on vast amounts of data, but are typically general-purpose and contain excess information for most uses.

SLMs, on the other hand, focus on specific use cases. So even with less data, they’re capable of delivering more accurate responses, more quickly — critical elements for conversing naturally with digital humans.

Nemotron-4 4B was first distilled from the larger Nemotron-4 15B LLM. This process requires the smaller model, called a “student,” to mimic the outputs of the larger model, appropriately called a “teacher.” During this process, noncritical weights of the student model are pruned, or removed, to shrink its parameter count. Then, the SLM is quantized, which reduces the precision of the model’s weights.

With fewer parameters and less precision, Nemotron-4 4B has a lower memory footprint and faster time to first token — how quickly a response begins — than the larger Nemotron-4 LLM while still maintaining a high level of accuracy due to distillation. Its smaller memory footprint also means games and apps that integrate the NIM microservice can run locally on more of the GeForce RTX AI PCs and laptops and NVIDIA RTX AI workstations that consumers own today.
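To make the memory claim concrete, here’s a rough back-of-the-envelope calculation in Python. It is a sketch only: the figures cover weights alone (no activations, KV cache or runtime overhead), and the precisions compared are illustrative assumptions, not published NVIDIA numbers.

def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    # Memory needed to hold the weights alone, in gigabytes.
    return num_params * bits_per_weight / 8 / 1e9

for label, params in [("Nemotron-4 15B", 15e9), ("Nemotron-4 4B", 4e9)]:
    for bits in (16, 4):
        print(f"{label} at {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")

At 16-bit precision, the 4B model needs roughly 8GB for weights alone; quantizing to 4-bit brings that to about 2GB, which is what lets a model of this size fit alongside a running game on consumer GPUs.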

This new, optimized SLM is also purpose-built with instruction tuning, a technique for fine-tuning models on instructional prompts to better perform specific tasks. This can be seen in Mecha BREAK, a video game in which players can converse with a mechanic game character and instruct it to switch and customize mechs.

ACEs Up

ACE NIM microservices allow developers to deploy state-of-the-art generative AI models through the cloud or on RTX AI PCs and workstations to bring AI to their games and applications. With ACE NIM microservices, non-playable characters (NPCs) can dynamically interact and converse with players in the game in real time.

ACE consists of key AI models for speech-to-text, language, text-to-speech and facial animation. It’s also modular, allowing developers to choose the NIM microservice needed for each element in their particular process.

NVIDIA Riva automatic speech recognition (ASR) processes a user’s spoken language and uses AI to deliver a highly accurate transcription in real time. The technology builds fully customizable conversational AI pipelines using GPU-accelerated multilingual speech and translation microservices. Other supported ASRs include OpenAI’s Whisper, an open-source neural net that approaches human-level robustness and accuracy on English speech recognition.

Once the speech is transcribed to text, it goes into an LLM — such as Google’s Gemma, Meta’s Llama 3 or now NVIDIA Nemotron-4 4B — to generate a response to the user’s original voice input.

Next, another piece of Riva technology — text-to-speech — generates an audio response. ElevenLabs’ proprietary AI speech and voice technology is also supported and has been demoed as part of ACE.

Then, NVIDIA Audio2Face (A2F) generates facial expressions that can be synced to dialogue in many languages. With the microservice, digital avatars can display dynamic, realistic emotions streamed live or baked in during post-processing.

The AI network automatically animates face, eyes, mouth, tongue and head motions to match the selected emotional range and level of intensity. And A2F can automatically infer emotion directly from an audio clip.

Finally, the full character or digital human is animated in a renderer, like Unreal Engine or the NVIDIA Omniverse platform.
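Putting the pieces together, the pipeline described above can be sketched in a few lines of Python. Every function here is a hypothetical placeholder for a microservice call (Riva ASR or Whisper, an LLM or SLM, Riva TTS or ElevenLabs, Audio2Face), not a real NIM API:

def speech_to_text(audio: bytes) -> str:
    return "switch to the heavy mech"        # placeholder for Riva ASR / Whisper

def generate_reply(text: str) -> str:
    return "Heavy mech loadout equipped."    # placeholder for Nemotron-4 4B Instruct

def text_to_speech(text: str) -> bytes:
    return text.encode()                     # placeholder for Riva TTS / ElevenLabs

def animate_face(audio: bytes) -> dict:
    return {"blendshapes": []}               # placeholder for Audio2Face

def npc_turn(player_audio: bytes) -> dict:
    # One conversational turn through the four-stage pipeline.
    text = speech_to_text(player_audio)
    reply = generate_reply(text)
    reply_audio = text_to_speech(reply)
    return {"audio": reply_audio, "animation": animate_face(reply_audio)}

Because the stages are modular, a developer can swap any one of them — for example, ElevenLabs in place of Riva text-to-speech — without touching the rest.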

AI That’s NIMble

In addition to its modular support for various NVIDIA-powered and third-party AI models, ACE allows developers to run inference for each model in the cloud or locally on RTX AI PCs and workstations.

The NVIDIA AI Inference Manager software development kit allows for hybrid inference based on various needs such as experience, workload and costs. It streamlines AI model deployment and integration for PC application developers by preconfiguring the PC with the necessary AI models, engines and dependencies. Apps and games can then orchestrate inference seamlessly across a PC or workstation to the cloud.
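The routing decision at the heart of hybrid inference can be as simple as the following sketch. The function and thresholds are invented for illustration; this is not the AI Inference Manager SDK’s actual API:

def choose_backend(model_vram_gb: float, free_local_vram_gb: float,
                   latency_sensitive: bool) -> str:
    # Prefer on-device RTX inference when the model fits and latency matters.
    if model_vram_gb <= free_local_vram_gb and latency_sensitive:
        return "local"
    return "cloud"   # fall back to the cloud for larger models or batch work

print(choose_backend(model_vram_gb=2.0, free_local_vram_gb=8.0, latency_sensitive=True))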

ACE NIM microservices run locally on RTX AI PCs and workstations, as well as in the cloud. Current microservices running locally include Audio2Face, in the Covert Protocol tech demo, and the new Nemotron-4 4B Instruct and Whisper ASR in Mecha BREAK.

To Infinity and Beyond

Digital humans go far beyond NPCs in games. At last month’s SIGGRAPH conference, NVIDIA previewed “James,” an interactive digital human that can connect with people using emotions, humor and more. James is based on a customer-service workflow using ACE.


Changes in communication methods between humans and technology over the decades eventually led to the creation of digital humans. The future of the human-computer interface will have a friendly face and require no physical inputs.

Digital humans drive more engaging and natural interactions. According to Gartner, 80% of conversational offerings will embed generative AI by 2025, and 75% of customer-facing applications will have conversational AI with emotion. Digital humans will transform multiple industries and use cases beyond gaming, including customer service, healthcare, retail, telepresence and robotics.

Users can get a glimpse of this future now by interacting with James in real time at ai.nvidia.com.

Generative AI is transforming gaming, videoconferencing and interactive experiences of all kinds. Make sense of what’s new and what’s next by subscribing to the AI Decoded newsletter.

Read More

How Snowflake Is Unlocking the Value of Data With Large Language Models

Snowflake is using AI to help enterprises transform data into insights and applications. In this episode of NVIDIA’s AI Podcast, host Noah Kravitz and Baris Gultekin, head of AI at Snowflake, discuss how the company’s AI Data Cloud platform enables customers to access and manage data at scale. By separating the storage of data from compute, Snowflake has allowed organizations across the world to connect via cloud technology and work on a unified platform — eliminating data silos and streamlining collaborative workflows.

Time Stamps

1:45: What does Snowflake do?
3:18: Snowflake’s AI and data strategies — building a platform with natural language analysis
5:30: How to efficiently access large language models with Snowflake Cortex
11:49: Snowflake’s open-source LLM: Arctic
16:18: Gultekin’s journey in AI and data science
23:05: The AI industry in three to five years — real-world applications of Snowflake technology
  27:54: Gultekin’s advice for professionals interested in AI

You Might Also Like:

How Roblox Uses Generative AI to Enhance User Experiences – Ep. 227

Roblox is a colorful online platform reimagining the way people come together. Anupam Singh, vice president of AI and growth engineering at Roblox, discusses how the company uses generative AI to enhance virtual experiences and bolster inclusivity and user safety.

NVIDIA’s Jim Fan Delves Into Large Language Models and Their Industry Impact – Ep. 204

Most know Minecraft as the popular blocky sandbox game, but for Jim Fan, senior AI scientist at NVIDIA, Minecraft was the perfect place to test the decision-making agency of AI models. Fan discusses how he used large language models to research open-ended AI agents and create Voyager, an AI bot built with GPT-4 that can autonomously play Minecraft.

Media.Monks’ Lewis Smithingham on Enhancing Media and Marketing With AI – Ep. 222

Media.Monks’ platform Wormhole streamlines marketing and content creation workflows with AI-powered insights. Lewis Smithingham, senior vice president of innovation and special operations at Media.Monks, addresses AI’s potential in the entertainment and advertisement industries.

NVIDIA’s Annamalai Chockalingam on the Rise of LLMs – Ep. 206

LLMs are in the spotlight, capable of tasks like generation, summarization, translation, instruction and chatting. Annamalai Chockalingam, senior product manager of developer marketing at NVIDIA, discusses how a combination of these modalities and actions can build applications to solve any problem.

Subscribe to the AI Podcast

Get the AI Podcast through iTunes, Google Play, Amazon Music, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better: Have a few minutes to spare? Fill out this listener survey.

Read More

High-Tech Highways: India Uses NVIDIA Accelerated Computing to Ease Tollbooth Traffic

India is home to the globe’s second-largest road network, spanning nearly 4 million miles, and has over a thousand tollbooths, most of them run manually.

Traditional booths like these, wherever in the world they’re deployed, can contribute to massive traffic delays, long commute times and serious road congestion.

To help automate tollbooths across India, Calsoft, an Indian-American technology company, helped a client implement a broad range of NVIDIA technologies integrated with the country’s dominant payment system, the Unified Payments Interface (UPI).

Manual tollbooths demand more time and labor compared to automated ones. However, automating India’s toll systems faces an extra complication: the diverse range of license plates.

India’s non-standardized plates pose a significant challenge to the accuracy of automatic number plate recognition (ANPR) systems. Any implementation would need to address these plate variations, which include divergent color, sizing, font styles and placement upon vehicles, as well as many different languages.

The solution Calsoft helped build automatically reads passing vehicle plates and charges the associated driver’s UPI account. This approach reduces the need for manual toll collection and is a massive step toward addressing traffic in the region.
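Conceptually, the per-vehicle flow looks like the following sketch. The three helpers are hypothetical stand-ins for the detection, OCR and UPI-payment stages described above, not Calsoft’s actual implementation:

from typing import Optional

def detect_plate(frame: bytes) -> bytes:
    return frame                        # placeholder: plate-detector crop

def read_plate(region: bytes) -> Optional[str]:
    return "MH12AB1234"                 # placeholder: multilingual OCR

def charge_upi(plate: str, amount: float) -> bool:
    print(f"Charging INR {amount} to the account linked to {plate}")
    return True                         # placeholder: UPI debit request

def process_vehicle(frame: bytes, toll_amount: float) -> bool:
    plate = read_plate(detect_plate(frame))
    if plate is None:
        return False                    # unreadable plate: manual fallback
    return charge_upi(plate, toll_amount)

process_vehicle(b"camera-frame", 65.0)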

Automation in Action

As part of a pilot program, this solution has been deployed in several leading metropolitan cities. The solution provides about 95% accuracy in its ability to read plates through the use of an ANPR pipeline that detects and classifies the plates as they roll through tollbooths.

NVIDIA’s technology has been crucial in this effort, according to Vipin Shankar, senior vice president of technology at Calsoft. “Particularly challenging was night-time detection,” he said. “Another challenge was model accuracy improvement on pixel distortions due to environmental impacts like fog, heavy rains, reflections due to bright sunshine, dusty winds and more.”

The solution uses NVIDIA Metropolis to track and detect vehicles throughout the process. Metropolis is an application framework, a set of developer tools and a partner ecosystem that brings visual data and AI together to improve operational efficiency and safety across a range of industries.

Calsoft engineers used NVIDIA Triton Inference Server software to deploy and manage their AI models. The team also used the NVIDIA DeepStream software development kit to build a real-time streaming platform. This was key for processing and analyzing data streams efficiently, incorporating advanced capabilities such as real-time object detection and classification.

Calsoft uses NVIDIA hardware, including NVIDIA Jetson edge AI modules and NVIDIA A100 Tensor Core GPUs in its AI solutions. Calsoft’s tollbooth solution is also scalable, meaning it’s designed to accommodate future growth and expansion needs, and can better ensure sustained performance and adaptability as traffic conditions evolve.

Learn how NVIDIA Metropolis has helped other municipalities, like Raleigh, North Carolina, better manage traffic flow and enhance pedestrian safety. 

Read More

NVIDIA Showcases New AI Capabilities With ACE, RTX Games and More at Gamescom 2024

At Gamescom, the world’s biggest gaming expo, NVIDIA has once again pushed the boundaries of gaming technology to ensure that gamers have incredibly immersive experiences and can enjoy enhanced performance and visual fidelity.

The company’s announcements today include its first on-device small language model for digital human technologies, showcased in the Mecha BREAK game tech demo; a milestone celebration of 600 RTX games and applications, with 20 new RTX titles announced; and new games on GeForce NOW.

Alongside these, NVIDIA announced a collaboration with MediaTek that brings G-SYNC display technologies to more gamers.

Gamescom, held every year in Cologne, Germany, is where innovators from across the gaming community showcase their latest creations. In 2018, NVIDIA founder and CEO Jensen Huang introduced NVIDIA RTX at the event, bringing real-time ray tracing and AI to gaming and setting a new standard for graphics performance.

NVIDIA ACE: Advancing AI-Powered Game Characters

Leading NVIDIA’s announcements at Gamescom was NVIDIA ACE, a revolutionary suite of technologies for bringing digital humans to life with generative AI.

The first game to showcase ACE and digital human technologies is Amazing Seasun Games’ Mecha BREAK, a fast-paced mech combat game that demonstrates the potential of AI-powered game characters.

https://www.youtube.com/watch?v=d5z7oIXhVqg

The ACE suite also expanded with NVIDIA’s first on-device small language model (SLM) for digital human technologies, NVIDIA Nemotron-4 4B Instruct, improving conversation for game characters. This new on-device model provides better role-play, retrieval-augmented generation and function-calling capabilities, allowing game characters to more intuitively comprehend player instructions, respond to gamers and perform more accurate and relevant actions.

Perfect World Games is advancing its ACE and digital human tech demo, Legends, with new AI-powered vision capabilities, unlocking a new level of immersion and accessibility for PC games.

Celebrating 600 RTX Games and Apps With 20 New RTX Titles

NVIDIA RTX continues to revolutionize the ways people play and create with ray tracing, DLSS and AI-powered technologies. Today marks another RTX milestone: 600 RTX-enhanced games and applications are now available.

This week, NVIDIA announced 20 new RTX and DLSS titles to join this impressive roster, including high-profile games such as Indiana Jones and the Great Circle, Dune: Awakening and Dragon Age: The Veilguard.

Game Science’s much-anticipated Black Myth: Wukong also launches today, featuring full ray tracing and DLSS 3, delivering the ultimate RTX experience for GeForce RTX 40 Series gamers.

https://www.youtube.com/watch?v=97egUiMlLZM

Half-Life 2 RTX: An RTX Remix Project Unveils Remastered Nova Prospekt

Half-Life 2 RTX: An RTX Remix Project from Orbifold Studios is a community remaster of Valve’s classic game. Now boasting over 100 contributing artists, Orbifold Studios unveiled a remaster of one of Half-Life 2’s most iconic levels, Nova Prospekt.

Using NVIDIA RTX Remix, Orbifold Studios has remastered Nova Prospekt with full ray tracing, DLSS 3.5 with Ray Reconstruction and Reflex. The Nova Prospekt trailer also reveals remasters of Gordon’s revolver, shotgun and Overwatch Standard Issue Pulse Rifle, remasters of the Combine soldiers and Antlions, and the addition of new geometry and detail that uses the capabilities of modern PCs to increase realism.

https://www.youtube.com/watch?v=R0-F8sPprmA

NVIDIA and MediaTek Bring G-SYNC Display Technologies to More Gamers 

NVIDIA and MediaTek are collaborating to make the industry’s best gaming display technologies more accessible.

The companies’ collaboration integrates the full suite of NVIDIA G-SYNC technologies into the world’s most popular scalers, allowing for the creation of feature-rich G-SYNC monitors at a more affordable price.

A highlight of this collaboration is the introduction of G-SYNC Pulsar, a new technology that offers 4x the effective motion clarity alongside a smooth and tear-free variable refresh rate (VRR) experience. G-SYNC Pulsar will debut on newly announced monitors, including the ASUS ROG Swift 360Hz PG27AQNR, Acer Predator XB273U F5 and AOC AGON PRO AG276QSG2.

GeForce NOW Raises the Bar for Cloud Gaming

Each week, GFN Thursday brings top-tier PC games to GeForce NOW, streaming at peak performance from GeForce RTX SuperPODs in the cloud, along with new features and updates for members.

For Gamescom, GeForce NOW is adding the highly anticipated action role-playing game Black Myth: Wukong from Game Science, as well as a demo for the upcoming PC release of FINAL FANTASY XVI from Square Enix. A new update brings Xbox automatic sign-in, making it easy for members to quickly jump into their PC games across devices by linking their account just once.

These latest GeForce NOW updates — available today — raise the bar for cloud gaming and build on recent milestones, including added support for mods, new data centers in Japan and Poland, and 2,000 games available in the cloud. Check out GeForce NOW’s Gamescom blog for more details.

Star Wars Outlaws: GeForce RTX 40 Series Bundle 

In collaboration with Ubisoft, Massive Entertainment and Lucasfilm Games, NVIDIA is launching a new Star Wars Outlaws GeForce RTX 40 Series Bundle. Gamers will experience the first-ever open-world Star Wars game, set between the events of Star Wars: The Empire Strikes Back and Star Wars: Return of the Jedi, enhanced with NVIDIA DLSS 3.5, ray tracing and Reflex technologies. It’ll also be available in the cloud on GeForce NOW.

For all the news and details on NVIDIA’s latest Gamescom announcements, visit GeForce News.

Read More

Migrate Amazon SageMaker Data Wrangler flows to Amazon SageMaker Canvas for faster data preparation

Amazon SageMaker Data Wrangler provides a visual interface to streamline and accelerate data preparation for machine learning (ML), which is often the most time-consuming and tedious task in ML projects. Amazon SageMaker Canvas is a low-code no-code visual interface to build and deploy ML models without the need to write code. Based on customer feedback, we have integrated the advanced ML-specific data preparation capabilities of SageMaker Data Wrangler into SageMaker Canvas, providing users with an end-to-end, no-code workspace for preparing data, and building and deploying ML models.

By abstracting away much of the complexity of the ML workflow, SageMaker Canvas enables you to prepare data, then build or use a model to generate highly accurate business insights without writing code. Additionally, preparing data in SageMaker Canvas offers many enhancements, such as page loads up to 10 times faster, a natural language interface for data preparation, the ability to view the data size and shape at every step, and improved replace and reorder transforms to iterate on a data flow. Finally, you can create a model with one click in the same interface, or create a SageMaker Canvas dataset to fine-tune foundation models (FMs).

This post demonstrates how you can bring your existing SageMaker Data Wrangler flows—the instructions created when building data transformations—from SageMaker Studio Classic to SageMaker Canvas. We provide an example of moving files from SageMaker Studio Classic to Amazon Simple Storage Service (Amazon S3) as an intermediate step before importing them into SageMaker Canvas.

Solution overview

The high-level steps are as follows:

  1. Open a terminal in SageMaker Studio and copy the flow files to Amazon S3.
  2. Import the flow files into SageMaker Canvas from Amazon S3.

Prerequisites

In this example, we use a folder called data-wrangler-classic-flows as a staging folder for migrating flow files to Amazon S3. It is not necessary to create a migration folder, but in this example, the folder was created using the file system browser portion of SageMaker Studio Classic. After you create the folder, take care to move and consolidate relevant SageMaker Data Wrangler flow files together. In the following screenshot, three flow files necessary for migration have been moved into the folder data-wrangler-classic-flows, as seen in the left pane. One of these files, titanic.flow, is opened and visible in the right pane.

Copy flow files to Amazon S3

To copy the flow files to Amazon S3, complete the following steps:

  1. To open a new terminal in SageMaker Studio Classic, on the File menu, choose Terminal.
  2. With a new terminal open, you can supply the following commands to copy your flow files to the Amazon S3 location of your choosing (replacing NNNNNNNNNNNN with your AWS account number):
    cd data-wrangler-classic-flows
    target="s3://sagemaker-us-west-2-NNNNNNNNNNNN/data-wrangler-classic-flows/"
    aws s3 sync . $target --exclude "*" --include "*.flow"

The following screenshot shows an example of what the Amazon S3 sync process should look like. You will get a confirmation after all files are uploaded. You can adjust the preceding code to meet your unique input folder and Amazon S3 location needs. If you don’t want to create a folder, when you enter the terminal, simply skip the change directory (cd) command, and all flow files on your entire SageMaker Studio Classic file system will be copied to Amazon S3, regardless of origin folder.
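If you prefer the AWS SDK over the CLI, a minimal boto3 equivalent of the sync step might look like the following sketch; the bucket and folder names are the same placeholders as above:

import boto3
from pathlib import Path

s3 = boto3.client("s3")
bucket = "sagemaker-us-west-2-NNNNNNNNNNNN"   # replace with your bucket
prefix = "data-wrangler-classic-flows/"

for flow in Path("data-wrangler-classic-flows").glob("*.flow"):
    s3.upload_file(str(flow), bucket, prefix + flow.name)   # copy each .flow file
    print(f"Uploaded {flow.name} to s3://{bucket}/{prefix}")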

After you upload the files to Amazon S3, you can validate that they have been copied using the Amazon S3 console. In the following screenshot, we see the original three flow files, now in an S3 bucket.

Import Data Wrangler flow files into SageMaker Canvas

To import the flow files into SageMaker Canvas, complete the following steps:

  1. On the SageMaker Studio console, choose Data Wrangler in the navigation pane.
  2. Choose Import data flows.
  3. For Select a data source, choose Amazon S3.
  4. For Input S3 endpoint, enter the Amazon S3 location you used earlier to copy files from SageMaker Studio to Amazon S3, then choose Go. You can also navigate to the Amazon S3 location using the browser below.
  5. Select the flow files to import, then choose Import.

After you import the files, the SageMaker Data Wrangler page will refresh to show the newly imported files, as shown in the following screenshot.

Use SageMaker Canvas for data transformation with SageMaker Data Wrangler

Choose one of the flows (for this example, we choose titanic.flow) to launch the SageMaker Data Wrangler transformation.

Now you can add analyses and transformations to the data flow using a visual interface (Accelerate data preparation for ML in Amazon SageMaker Canvas) or natural language interface (Use natural language to explore and prepare data with a new capability of Amazon SageMaker Canvas).

When you’re happy with the data, choose the plus sign and choose Create model, or choose Export to export the dataset to build and use ML models.

Alternate migration method

This post has provided guidance on using Amazon S3 to migrate SageMaker Data Wrangler flow files from a SageMaker Studio Classic environment. Phase 3: (Optional) Migrate data from Studio Classic to Studio provides a second method that uses your local machine to transfer the flow files. Furthermore, you can download single flow files from the SageMaker Studio tree control to your local machine, then import them manually in SageMaker Canvas. Choose the method that suits your needs and use case.

Clean up

When you’re done, shut down any running SageMaker Data Wrangler applications in SageMaker Studio Classic. To save costs, you can also remove any flow files from the SageMaker Studio Classic file browser, which is an Amazon Elastic File System (Amazon EFS) volume. You can also delete any of the intermediate files in Amazon S3. After the flow files are imported into SageMaker Canvas, the files copied to Amazon S3 are no longer needed.

You can log out of SageMaker Canvas when you’re done, then relaunch it when you’re ready to use it again.

Conclusion

Migrating your existing SageMaker Data Wrangler flows to SageMaker Canvas is a straightforward process that allows you to use the advanced data preparations you’ve already developed while taking advantage of the end-to-end, low-code no-code ML workflow of SageMaker Canvas. By following the steps outlined in this post, you can seamlessly transition your data wrangling artifacts to the SageMaker Canvas environment, streamlining your ML projects and enabling business analysts and non-technical users to build and deploy models more efficiently.

Start exploring SageMaker Canvas today and experience the power of a unified platform for data preparation, model building, and deployment!


About the Authors

Charles Laughlin is a Principal AI Specialist at Amazon Web Services (AWS). Charles holds an MS in Supply Chain Management and a PhD in Data Science. Charles works in the Amazon SageMaker service team where he brings research and voice of the customer to inform the service roadmap. In his work, he collaborates daily with diverse AWS customers to help transform their businesses with cutting-edge AWS technologies and thought leadership.

Dan Sinnreich is a Sr. Product Manager for Amazon SageMaker, focused on expanding no-code / low-code services. He is dedicated to making ML and generative AI more accessible and applying them to solve challenging problems. Outside of work, he can be found playing hockey, scuba diving, and reading science fiction.

Huong Nguyen is a Sr. Product Manager at AWS. She is leading the ML data preparation for SageMaker Canvas and SageMaker Data Wrangler, with 15 years of experience building customer-centric and data-driven products.

Davide Gallitelli is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in Brussels and works closely with customers throughout Benelux. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML in his later years of university, and has been in love with it ever since.

Read More

Use IP-restricted presigned URLs to enhance security in Amazon SageMaker Ground Truth

Amazon SageMaker Ground Truth significantly reduces the cost and time required for labeling data by integrating human annotators with machine learning to automate the labeling process. You can use SageMaker Ground Truth to create labeling jobs, which are workflows where data objects (such as images, videos, or documents) need to be annotated by human workers. These labeling jobs are distributed among a workteam—a group of workers assigned to perform the annotations. To access the data objects they need to label, workers are provided with Amazon S3 presigned URLs.

A presigned URL is a temporary URL that grants time-limited access to an Amazon Simple Storage Service (Amazon S3) object. In the context of SageMaker Ground Truth, these presigned URLs are generated using the grant_read_access Liquid filter and embedded into the task templates. Workers can then use these URLs to directly access the necessary files, such as images or documents, in their web browsers for annotation purposes.
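Outside of task templates, the same kind of URL can be minted directly with boto3. This generic sketch shows the mechanism with placeholder bucket and key names; inside Ground Truth, the grant_read_access filter handles this for you:

import boto3

s3 = boto3.client("s3")
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-labeling-bucket", "Key": "images/task-001.jpg"},
    ExpiresIn=900,   # seconds the URL stays valid
)
print(url)   # temporary read access to the object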

While presigned URLs offer a convenient way to grant temporary access to S3 objects, sharing these URLs with people outside of the workteam can lead to unintended access of those objects. To mitigate this risk and enhance the security of SageMaker Ground Truth labeling tasks, we have introduced a new feature that adds an additional layer of security by restricting access to the presigned URLs to the worker’s IP address or virtual private cloud (VPC) endpoint from which they access the labeling task. In this blog post, we show you how to enable this feature, allowing you to enhance your data security as needed, and outline the success criteria for this feature, including the scenarios where it will be most beneficial.

Prerequisites

Before you get started configuring IP-restricted presigned URLs, the following resources can help you understand the background concepts:

  • Amazon S3 presigned URL: This documentation covers the use of Amazon S3 presigned URLs, which provide temporary access to objects. Understanding how presigned URLs work will be beneficial.
  • Use Amazon SageMaker Ground Truth to label data: This guide explains how to use SageMaker Ground Truth for data labeling tasks, including setting up workteams and workforces. Familiarity with these concepts will be helpful when configuring IP restrictions for your workteams.

Introducing IP-restricted presigned URLs

Working closely with our customers, we recognized the need for enhanced security posture and stricter access controls to presigned URLs. So, we introduced a new feature that uses AWS global condition context keys aws:SourceIp and aws:VpcSourceIp to allow customers to restrict presigned URL access to specific IP addresses or VPC endpoints. By incorporating AWS Identity and Access Management (IAM) policy constraints, you can now restrict presigned URLs to only be accessible from an IP address or VPC endpoint of your choice. This IP-based access control effectively locks down the presigned URL to the worker’s location, mitigating the risk of unauthorized access or unintended sharing.
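The effect is equivalent to attaching a condition like the following to the permission behind the URL. This is an illustrative policy shape only; the actual policy is generated and managed by the service:

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::my-labeling-bucket/*",
        "Condition": {
            # lock access to the worker's IP (example address)
            "IpAddress": {"aws:SourceIp": "203.0.113.42/32"}
        },
    }],
}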

Benefits of the new feature

This update brings several significant security benefits to SageMaker Ground Truth:

  • Enhanced data privacy: These IP restrictions make presigned URLs accessible only from customer-approved locations, such as corporate VPNs, workers’ home networks, or designated VPC endpoints. Although the presigned URLs are pre-authenticated, this feature adds an additional layer of security by verifying the access location and locking the URL to that location until the task is completed.
  • Reduced risk of unauthorized access: Enforcing IP-based access controls minimizes the risk of data being accessed from unauthorized locations and mitigates the risk of data sharing outside the worker’s approved access network. This is particularly important when dealing with sensitive or confidential data.
  • Flexible security options: You can apply these restrictions in either VPC or non-VPC settings, allowing you to tailor security measures to your organization’s specific needs.
  • Auditing and compliance: By locking down presigned URLs to specific IP addresses or VPC endpoints, you can more easily track and audit access to your organization’s data, helping achieve compliance with internal policies and external regulations.
  • Seamless integration: This new feature seamlessly integrates with existing SageMaker Ground Truth workflows, providing enhanced security without disrupting established labeling processes or requiring significant changes to existing infrastructure.

By introducing IP-restricted presigned URLs, SageMaker Ground Truth empowers you with greater control over data access, so sensitive information remains accessible only to authorized workers within approved locations.

Configuring IP-restricted presigned URLs for SageMaker Ground Truth

The new IP restriction feature for presigned URLs in SageMaker Ground Truth can be enabled through the SageMaker API or the AWS Command Line Interface (AWS CLI). Before we go into the configuration of this new feature, let’s look at how you can create and update workteams today using the AWS CLI. You can also perform these operations through the SageMaker API using the AWS SDK.

Here’s an example of creating a new workteam using the create-workteam command:

aws sagemaker create-workteam \
    --description "A team for image labeling tasks" \
    --workforce-name "default" \
    --workteam-name "MyWorkteam" \
    --member-definitions '{
        "CognitoMemberDefinition": {
            "ClientId": "exampleclientid",
            "UserGroup": "sagemaker-groundtruth-user-group",
            "UserPool": "us-west-2_examplepool"
        }
    }'

To update an existing workteam, you use the update-workteam command:

aws sagemaker update-workteam \
    --workteam-name "MyWorkteam" \
    --description "Updated description for image labeling tasks"

Note that these examples only show a subset of the available parameters for the create-workteam and update-workteam APIs. You can find detailed documentation and examples in the SageMaker Ground Truth Developer Guide.

Enabling IP restrictions for presigned URLs

With the new IP restriction feature, you can now configure IP-based access constraints specific to each workteam when creating a new workteam or modifying an existing one. Here’s how you can enable these restrictions:

  1. When creating or updating a workteam, you can specify a WorkerAccessConfiguration object, which defines access constraints for the workers in that workteam.
  2. Within the WorkerAccessConfiguration, you can include an S3Presign object, which allows you to set access configurations for the presigned URLs used by the workers. Currently, only IamPolicyConstraints can be added to the S3Presign object. SageMaker Ground Truth provides two Liquid filters that you can use in your custom worker task templates to generate presigned URLs:
    • grant_read_access: This filter generates a presigned URL for the specified S3 object, granting temporary read access. The command will look like:
      <!-- Using grant_read_access filter -->
      <img src="{{ s3://bucket-name/path/to/image.jpg | grant_read_access }}"/>

    • s3_presign: This new filter serves the same purpose as grant_read_access but makes it clear that the generated URL is subject to the S3Presign configuration defined for the workteam. The command will look like:
      <!-- Using s3_presign filter (equivalent) -->
      <img src="{{ s3://bucket-name/path/to/image.jpg | s3_presign }}"/>

  3. The S3Presign object supports IamPolicyConstraints, where you can enable or disable the SourceIp and VpcSourceIp keys:
    • SourceIp: When enabled, workers can access presigned URLs only from the specified IP addresses or ranges.
    • VpcSourceIp: When enabled, workers can access presigned URLs only from the specified VPC endpoints within your AWS account.

You can call the SageMaker ListWorkteams or DescribeWorkteam APIs to view workteams’ metadata, including the WorkerAccessConfiguration.
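For example, a quick boto3 check of a workteam’s configuration might look like the following sketch (the workteam name is a placeholder):

import boto3

sm = boto3.client("sagemaker")
team = sm.describe_workteam(WorkteamName="exampleworkteam")
print(team["Workteam"].get("WorkerAccessConfiguration"))   # shows S3Presign settings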

Let’s say you want to create or update a workteam so that presigned URLs will be restricted to the public IP address of the worker who originally accessed it.

Create workteam:

aws sagemaker create-workteam \
    --description "An example workteam with S3 presigned URLs restricted" \
    --workforce-name "default" \
    --workteam-name "exampleworkteam" \
    --member-definitions '{
        "CognitoMemberDefinition": {
            "ClientId": "exampleclientid",
            "UserGroup": "sagemaker-groundtruth-user-group",
            "UserPool": "us-west-2_examplepool"
        }
    }' \
    --worker-access-configuration '{
        "S3Presign": {
            "IamPolicyConstraints": {
                "SourceIp": "Enabled",
                "VpcSourceIp": "Disabled"
            }
        }
    }'

Update workteam:

aws sagemaker update-workteam \
    --workteam-name "existingworkteam" \
    --worker-access-configuration '{
        "S3Presign": {
            "IamPolicyConstraints": {
                "SourceIp": "Enabled",
                "VpcSourceIp": "Disabled"
            }
        }
    }'

Success criteria

While the IP-restricted presigned URLs feature provides enhanced security, there are scenarios where it might not be suitable. Understanding these limitations can help you make an informed decision about using the feature and verify that it aligns with your organization’s security needs and network configurations.

IP-restricted presigned URLs are effective in scenarios where there’s a consistent IP address used by the worker accessing SageMaker Ground Truth and the S3 object. For example, if a worker accesses labeling tasks from a stable public IP address, such as an office network with a fixed IP address, the IP restriction will provide access with enhanced security. Similarly, when a worker accesses both SageMaker Ground Truth and S3 objects through the same VPC endpoint, the IP restriction will verify that the presigned URL is only accessible from within this VPC. In both scenarios, the consistent IP address enables the IP-based access controls to function correctly, providing an additional layer of security.

Scenarios where IP-restricted presigned URLs aren’t effective

  • Asymmetric VPC endpoints: SageMaker Ground Truth is accessed through a public internet connection while Amazon S3 is accessed through a VPC endpoint, or vice versa. Example: a worker accesses SageMaker Ground Truth through the public internet but S3 through a VPC endpoint. Exit criteria: verify that both SageMaker Ground Truth and S3 are accessed either entirely through the public internet or entirely through the same VPC endpoint.

  • Network Address Translation (NAT) layers: NAT layers can alter the source IP address of requests, causing IP mismatches. Issues can arise from dynamically assigned IP addresses or asymmetric configurations, such as N-to-M IP translation (multiple internal IP addresses translated to multiple public IP addresses), a NAT gateway with multiple public IP addresses assigned to it (requests can appear to come from different IP addresses), or shared IP addresses (multiple users’ traffic routed through a single public IP address, making IP-based restrictions difficult to enforce). Exit criteria: verify that the NAT gateway is configured to preserve the source IP address, and validate the NAT configuration for consistency when accessing both SageMaker Ground Truth and S3 resources.

  • Use of VPNs: VPNs change the outgoing IP address, leading to potential access issues with IP-restricted presigned URLs. Example: if a worker uses a split-tunnel VPN that presents different IP addresses for requests to Ground Truth and S3, access might be denied. Exit criteria: disable the VPN, or use a full-tunnel VPN that provides a consistent IP address for all requests.

Interface endpoints aren’t supported by the grant_read_access feature because of their inability to resolve public DNS names. This limitation is orthogonal to the IP restrictions and should be considered when configuring your network setup for accessing S3 objects with presigned URLs. In such cases, use the S3 Gateway endpoint when accessing S3 to verify compatibility with the public DNS names generated by grant_read_access.

Using S3 access logs for debugging

To debug issues related to IP-restricted presigned URLs, S3 access logs can provide valuable insights. By enabling access logging for your S3 bucket, you can track every request made to your S3 objects, including the IP addresses from which the requests originate. This can help you identify:

  • Mismatches between expected and actual IP addresses
  • Dynamic IP addresses or VPNs causing access issues
  • Unauthorized access from unexpected locations

To debug using S3 access logs, follow these steps:

  1. Enable S3 access logging: Configure your bucket to deliver access logs to a separate S3 bucket, where you can analyze them directly or with a query service such as Amazon Athena.
  2. Review log files: Analyze the log files to identify patterns or anomalies in IP addresses, request timestamps, and error codes.
  3. Look for IP address changes: If you observe frequent changes in IP addresses within the logs, it might indicate that the worker’s IP address is dynamic or altered by a VPN or proxy.
  4. Check for NAT layer modifications: See if NAT layers are modifying the source IP address by checking the x-forwarded-for header in the log files.
  5. Verify authorized access: Confirm that requests are coming from approved and consistent IP addresses by checking the Remote IP field in the log files.

By following these steps and analyzing the S3 access logs, you can validate that the presigned URLs are accessed only from approved and consistent IP addresses.
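As a starting point, the following sketch pulls the remote IP out of S3 server access log lines and counts occurrences. The field position follows the documented S3 access log format, and the sample line is synthetic:

import re
from collections import Counter

# the remote IP is the field immediately after the bracketed timestamp
REMOTE_IP = re.compile(r"\[[^\]]+\]\s+(\S+)")

def remote_ips(log_lines):
    for line in log_lines:
        match = REMOTE_IP.search(line)
        if match:
            yield match.group(1)

sample = ['owner bucket [06/Aug/2024:00:00:38 +0000] 203.0.113.42 requester reqid REST.GET.OBJECT task.jpg ...']
print(Counter(remote_ips(sample)))   # flags unexpected or shifting IPs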

Conclusion

The introduction of IP-restricted presigned URLs in Amazon SageMaker Ground Truth significantly enhances the security of data accessed through the service. By allowing you to restrict access to specific IP addresses or VPC endpoints, this feature helps facilitate more fine-tuned control of presigned URLs. It provides organizations with added protection for their sensitive data, offering a valuable option for those with stringent security requirements. We encourage you to explore this new security feature to protect your organization’s data and enhance the overall security of your labeling workflows. To get started with SageMaker Ground Truth, visit Getting Started. To implement IP restrictions on presigned URLs as part of your workteam setup, refer to the CreateWorkteam and UpdateWorkteam API documentation. Follow the guidance provided in this blog to configure these security measures effectively. For more information or assistance, contact your AWS account team or visit the SageMaker community forums.


About the Authors

Sundar Raghavan is an AI/ML Specialist Solutions Architect at AWS, helping customers build scalable and cost-efficient AI/ML pipelines with Human in the Loop services. In his free time, Sundar loves traveling, sports and enjoying outdoor activities with his family.

Michael Borde is a lead software engineer at Amazon AI, where he has been for seven years. He previously studied mathematics and computer science at the University of Chicago. Michael is passionate about cloud computing, distributed systems design, and digital privacy & security. After work, you can often find Michael putzing around the local powerlifting gym in Capitol Hill.

Jacky Shum is a Software Engineer at AWS in the SageMaker Ground Truth team. He works to help AWS customers leverage machine learning applications, including prior work on ML-based fraud detection with Amazon Fraud Detector.

Rohith Kodukula is a Software Development Engineer on the SageMaker Ground Truth team. In his free time he enjoys staying active and reading up on anything that he finds mildly interesting (most things really).

Abhinay Sandeboina is an Engineering Manager at AWS Human In The Loop (HIL). He has been at AWS for over 2 years, and his teams are responsible for managing ML platform services. He has a decade of experience in software/ML engineering, building infrastructure platforms at scale. Prior to AWS, he worked in various engineering management roles at Zillow and Capital One.

Read More

Unlock the power of structured data for enterprises using natural language with Amazon Q Business

One of the most common applications of generative artificial intelligence (AI) and large language models (LLMs) in an enterprise environment is answering questions based on the enterprise’s knowledge corpus. Pre-trained foundation models (FMs) excel at natural language understanding (NLU) tasks, including summarization, text generation, and question answering across a wide range of topics. However, they often struggle to provide accurate answers without hallucinations and fall short when addressing questions about content that wasn’t included in their training data. Furthermore, FMs are trained with a point-in-time snapshot of data and have no inherent ability to access fresh data at inference time; therefore, they might provide responses that are incorrect or inadequate.

We face a fundamental challenge with enterprise data—overcoming the disconnect between natural language and structured data. Natural language is ambiguous and imprecise, whereas data adheres to rigid schemas. For example, SQL queries can be complex and unintuitive for non-technical users. Handling complex queries involving multiple tables, joins, and aggregations makes it difficult to interpret user intent and translate it into correct SQL operations. Domain-specific terminology further complicates the mapping process. Another challenge is accommodating the linguistic variations users employ to express the same requirement. Effectively managing synonyms, paraphrases, and alternative phrasings is important. The inherent ambiguity of natural language can also result in multiple interpretations of a single query, making it difficult to accurately understand the user’s precise intent.

To bridge this gap, you need advanced natural language processing (NLP) to map user queries to database schema, tables, and operations. In this architecture, Amazon Q Business acts as an intermediary, translating natural language into precise SQL queries. You can simply ask questions like “What were the sales for outdoor gear in Q3 2023?” Amazon Q Business analyzes intent, accesses data sources, and generates the SQL query. This simplifies data access for your non-technical users and streamlines workflows for professionals, allowing them to focus on higher-level tasks.

In this post, we discuss an architecture to query structured data using Amazon Q Business, and build out an application to query cost and usage data in Amazon Athena with Amazon Q Business. Amazon Q Business can create SQL queries to your data sources when provided with the database schema, additional metadata describing the columns and tables, and prompting instructions. You can extend this architecture to use additional data sources, query validation, and prompting techniques to cover a wider range of use cases.

Solution overview

The following figure represents the high-level architecture of the proposed solution. Steps 3 and 4 augment the AWS IAM Identity Center integration with Amazon Q Business for an authorization flow. In this architecture, we use Amazon Cognito for user authentication as well as a trusted token issuer to IAM Identity Center. You can also use your own identity provider as a trusted token issuer as long as it supports OpenID Connect (OIDC).

architecture diagram

The workflow includes the following steps:

  1. The user initiates the interaction with the Streamlit application, which is accessible through an Application Load Balancer, acting as the entry point.
  2. The application prompts the user to authenticate using their Amazon Cognito credentials, maintaining secure access.
  3. The application exchanges the token obtained from Amazon Cognito for an IAM Identity Center token, granting the necessary scope to interact with Amazon Q Business.
  4. Using the IAM Identity Center token, the application assumes an AWS Identity and Access Management (IAM) role and retrieves an AWS session from AWS Security Token Service (AWS STS), enabling authorized communication with Amazon Q Business.
  5. Based on the user’s natural language query, the application formulates relevant prompts and metadata, which are then submitted to the chat_sync API of Amazon Q Business. In response, Amazon Q Business provides an appropriate Athena query to run.
  6. The application runs the Athena query received from Amazon Q Business, and the resulting data is displayed on the web application’s UI.

Querying Amazon Q Business LLMs directly

As explained in the response settings for Amazon Q Business, there are different options to generate responses that allow you to either use your enterprise data, use LLMs directly, or fall back on the LLMs if the answer is not found in your enterprise data. Along with the global controls for response settings, you need to specify which chatMode you want to use based on your specific use case. If you want to bypass Retrieval Augmented Generation (RAG) and use plain text in the context window, you should use CREATOR_MODE. Alternatively, RAG is also bypassed when you upload files directly in the context window.

If you just use text in the context window and call Amazon Q Business APIs without switching to CREATOR_MODE, that may break your use case in the future if you add content to the index (RAG). In this use case, because we’re not indexing any data and we pass schemas as attachments in the API call to Amazon Q Business, RAG is automatically bypassed and the response is generated directly from the LLMs. Another reason to use attachments for this use case is that for the chatSync API, userMessage has a maximum length of 7,000 characters, which can be exceeded depending on how much text you place in the context window.
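A minimal sketch of this attachment-based call with boto3 follows. The application ID and file path are placeholders, and it assumes your identity-aware credentials are already in place as described in the architecture:

import boto3

qb = boto3.client("qbusiness")
with open("app/schemas/cur_schema.txt", "rb") as f:
    schema = f.read()

response = qb.chat_sync(
    applicationId="your-q-business-app-id",   # placeholder
    userMessage="Write a SQL query: total spend for ElasticSearch last year",
    attachments=[{"name": "cur_schema.txt", "data": schema}],   # bypasses RAG
)
print(response["systemMessage"])   # the generated SQL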

Data query workflow

Let’s look at the prompts, query generation, and Athena query in detail. We use Athena as the data store in this post. Users enter natural language questions into a web application built with Streamlit. Amazon Q Business converts the natural language questions to valid SQL for Athena using the prompting instructions, the database schema, and data dictionary that are provided as context to the LLM. The generated SQL is sent to Athena to run as a query, and the returned data is displayed to the user in the Streamlit application. The following diagram illustrates this workflow.

query workflow

These are the various components to this data flow, as numbered in the diagram:

  1. User intent
  2. Prompt builder
  3. SQL query generator
  4. Running the query
  5. Query results

In the following sections, we look at each component in more detail.

User intent

The user intent, or inquiry, is the starting point of the process. It can be in natural language, such as “What was the total spend for ElasticSearch last year?” The user’s input serves as the basis for the subsequent steps in the workflow.

Prompt builder

The prompt builder component plays a crucial role in bridging the gap between your natural language input and the structured data format required for SQL querying. It augments your question with relevant information from the table schema and data dictionary to provide context for the query generation process. This step involves the following sub-tasks:

  • Natural language processing – NLP techniques are employed to analyze and understand your questions. This includes steps like tokenization and dependency parsing to extract the intent and relevant entities from the natural language input.
  • Entity recognition – Named entity recognition (NER) is used to identify and classify relevant entities mentioned in your question, such as product names, dates, or region. This step helps map your input to the corresponding data elements in the database schema.
  • Intent mapping – The prompt builder maps your intent, extracted from the NLP analysis, to the appropriate data structures and operations required to fulfill the query. This mapping process uses the table schema and data dictionary to establish connections between your natural language questions and the database elements.

The output of the prompt builder is a structured representation of your question, augmented with the necessary context from the database schema and data dictionary. This structured representation serves as input for the next step, SQL query generation.

The following is an example prompt for “What was the total spend for ElasticSearch last year?”

You will not respond to gibberish, random character sequences, or prompts that do not make logical sense. 
If the input does not make sense or is outside the scope of the provided context, do not respond with SQL 
but respond with - I do not know about this. Please fix your input.
You are an expert SQL developer. Only return the sql query. Do not include any verbiage. 
You are required to return SQL queries based on the provided schema and the service mappings for common services and 
their synonyms. The table with the provided schema is the only source of data. Do not use joins. Assume product, 
service are synonyms for product_servicecode and price,cost,spend are synonyms for line_item_unblended_cost. Use the 
column names from the provided schema while creating queries. Do not use preceding zeroes for the column month when 
creating the query. Only use predicates when asked. For your reference, current date is June 01, 2024. write a sql 
query for this task - What was the total spend for ElasticSearch last year?

SQL query generation

Based on the prompt generated from the prompt builder and your original question, Amazon Q Business generates the corresponding SQL query. The SQL query is tailored to retrieve the relevant data and perform the desired analysis or calculations to accurately answer the user’s question. This step may involve techniques such as:

  • Mapping your intent and entities to SQL clauses (SELECT, FROM, WHERE, JOIN, and so on)
  • Handling complex queries involving aggregations, subqueries, or predicates
  • Incorporating domain-specific knowledge or business rules into the query generation process

Running the query

In this step, the generated SQL query is run against the chosen data store, which could be a relational database, data warehouse, NoSQL database, or an object store like Amazon Simple Storage Service (Amazon S3). The data store serves as the repository for the data required to answer the user’s question. Depending on the architecture and requirements, the data store query may involve additional components or processes, such as:

  • Query optimization and indexing strategies
  • Materialized views for complex queries
  • Real-time data ingestion and updates
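For reference, running the generated SQL against Athena with boto3 follows the standard start, poll, fetch pattern; the database name and results location here are placeholders:

import time
import boto3

athena = boto3.client("athena")
qid = athena.start_query_execution(
    QueryString="SELECT ...",   # SQL generated by Amazon Q Business
    QueryExecutionContext={"Database": "cur_database"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)["QueryExecutionId"]

state = "RUNNING"
while state in ("QUEUED", "RUNNING"):
    time.sleep(1)   # poll until the query finishes
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]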

Query results

The query engine runs the generated SQL query against the data store and returns the query results. These results contain the insights or answers to the original user question. The presentation of the query results can take various forms, depending on the requirements of the application or UI:

  • Tabular data – The results can be displayed as a table or spreadsheet, suitable for structured data analysis
  • Visualizations – The query results can be rendered as charts, graphs, or other visual representations, providing a more intuitive way to understand and explore the data
  • Natural language responses – In some cases, the query results can be translated back into natural language statements or summaries, making the insights more accessible to non-technical users

In the following sections, we walk through the steps to deploy the web application and test the solution.

Prerequisites

Complete the following prerequisite steps:

  1. Set up IAM Identity Center and add users that you intend to give access to in your Amazon Q Business application.
  2. Have an existing, working Amazon Q Business application and give access to the users created in the previous step to the application.
  3. Make AWS Cost and Usage Reports (AWS CUR) data available in Athena. If you already have CUR data, you can skip the following steps for CUR data setup. If not, you have a few options to set up CUR data:
    1. To set up sample CUR data, refer to the following lab and follow the instructions.
    2. You also need to set up an AWS Glue crawler to make the data available in Athena.
  4. If you already have an SSL certificate, you can skip this step; otherwise, generate a private certificate.
  5. Import the certificate into AWS Certificate Manager (ACM). For more details, refer to Importing a certificate.

Set up the application

Complete the following steps to set up the application:

  1. From your terminal, clone the GitHub repository:
git clone https://github.com/aws-samples/data-insights-with-amazon-q-business.git
  2. Go to the project directory:
cd data-insights-with-amazon-q-business
  3. Based on your CUR table, update the CUR schema under app/schemas/cur_schema.txt. Review the prompts under app/qb_config.py. The schema looks similar to the following code:
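A hypothetical excerpt, limited to columns that appear in this post’s queries:

cur_daily (
    product_servicecode string,
    line_item_unblended_cost double,
    year string,
    month string,
    ...
)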

  4. Review the data dictionary under app/schemas/service_mappings.csv. You can modify the mappings according to your dataset. A sample data dictionary for CUR might look like the following illustrative excerpt.
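The values here are illustrative, pairing friendly service names with CUR service codes:

service_name,product_servicecode
ElasticSearch,AmazonES
Simple Storage Service,AmazonS3
...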

  5. Zip up the code repository and upload it to an S3 bucket.
  6. Follow the steps in the GitHub repo to deploy the Streamlit application.

Access the web application

As part of the deployment steps, you launched an AWS CloudFormation stack. On the AWS CloudFormation console, navigate to the Outputs tab for the stack and find the URL to access the Streamlit application. When you open the URL in a browser, you’ll see a login screen like the following screenshot. Sign up to create a user in the Amazon Cognito user pool. After you’re validated, you can use the same credentials to log in to the web application.

Query your cost and usage data

Start with a simple query like “What was the total spend for ElasticSearch this year?” A relevant prompt is created and sent to Amazon Q Business, which responds with the corresponding SQL query. Notice the predicate where product_servicecode = 'AmazonES'. Amazon Q Business is able to formulate the query because it has the schema and the data dictionary in context: it understands that ElasticSearch is an AWS service represented by the product_servicecode column in the CUR data schema, with the corresponding value 'AmazonES'. Next, the query is run against Athena and you get the results back.

The sample dataset used in this post is from 2023. If you’re using the sample dataset, natural language queries that refer to the current year will not return results. Scope your queries to 2023 or mention the year explicitly in your question.

The following figure, “sample query run,” highlights the steps as explained in the data flow.

You can also try complex queries like “Give me a list of the top 3 products by total spend last year. For each of these products, what percentage of the overall spend is from this product?” Because the prompt builder has the schema and product (AWS services) information in its context, Amazon Q Business creates the corresponding query. In this case, you’ll see a query similar to the following:

SELECT
    product_servicecode,
    SUM(line_item_unblended_cost) AS total_spend,
    ROUND(
        SUM(line_item_unblended_cost) * 100.0 /
        (SELECT SUM(line_item_unblended_cost) FROM cur_daily WHERE year = '2023'),
        2
    ) AS percentage_of_total
FROM cur_daily
WHERE year = '2023'
GROUP BY product_servicecode
ORDER BY total_spend DESC
LIMIT 3;

When the query is run against Athena, you’ll see results corresponding to your data.

Along with the data, you can also see a summary and trend analysis of your data on the Description tab of your Streamlit app.

The prompts used in the application are open domain, and you’re free to update them in the code. For example, the following is a prompt used for a summary task:

You are an AI assistant. You are required to return a summary based on the provided data in attachment. Use at least 
100 words. The spend is in dollars. The unit of measurement is dollars. Give trend analysis too. Start your response 
with - Here is your summary..
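A summary task of this kind can be issued through the same ChatSync API, passing the query results as an attachment. The following is a sketch under the same placeholder assumptions as earlier:

import boto3

qbusiness = boto3.client("qbusiness")

summary_prompt = (
    "You are an AI assistant. You are required to return a summary based on "
    "the provided data in attachment. Use at least 100 words. The spend is in "
    "dollars. Give trend analysis too. Start your response with - Here is your summary.."
)

results_csv = b"product_servicecode,total_spend\nAmazonES,1234.56\n"  # example data

response = qbusiness.chat_sync(
    applicationId="your-q-business-application-id",  # placeholder
    userMessage=summary_prompt,
    attachments=[{"name": "results.csv", "data": results_csv}],
)
print(response["systemMessage"])  # the generated summary and trend analysis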

The following screenshot shows the results.

Feedback loop

You also have the option of capturing feedback for the generated queries with the thumbs up/down icon on the web application. Currently, the feedback is captured in a local file under /app/feedback. You can change this implementation to write to a database of your choice and have it serve as a query validation mechanism after your testing, to allow only validated queries to run.
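For example, a DynamoDB-backed version might look like the following sketch (the table name and item shape are assumptions):

import time
import boto3

# Hypothetical DynamoDB table keyed on 'query_id'.
table = boto3.resource("dynamodb").Table("query-feedback")

def record_feedback(query_id: str, question: str, sql: str, thumbs_up: bool) -> None:
    """Persist feedback so validated queries can later be allow-listed."""
    table.put_item(Item={
        "query_id": query_id,
        "question": question,
        "sql": sql,
        "thumbs_up": thumbs_up,
        "created_at": int(time.time()),
    })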

Clean up

To clean up your resources, delete the CloudFormation stack, Amazon Q Business application, and Athena tables.

Conclusion

In this post, we demonstrated how Amazon Q Business can effectively bridge the gap between users and data, enabling you to extract valuable insights from various data stores using natural language queries, without the need for extensive technical knowledge or SQL expertise. The natural language understanding capabilities of Amazon Q Business can accurately interpret user intent, extract relevant entities, and generate SQL to translate the user’s query into executable data operations. You can now empower a wider range of enterprise users to unlock the full value of your organization’s data assets. By democratizing data access and analysis using natural language queries, you can foster data-driven decision-making, drive innovation, and unlock new opportunities for growth and success.

In Part 2 of this series, we demonstrate how to integrate this architecture with LangChain using Amazon Q Business as a custom model. We also cover query validation and accuracy measurement.


About the Authors

Vishal Karlupia is a Senior Technical Account Manager/Lead at Amazon Web Services, Toronto. He specializes in generative AI applications and helps customers build and scale their AI/ML workloads on AWS. Outside of work, he enjoys being outdoors and keeping bonfires alive.

Srinivas Ganapathi is a Principal Technical Account Manager at Amazon Web Services. He is based in Toronto, Canada, and works with games customers to run efficient workloads on AWS.

Read More

At Gamescom 2024, GeForce NOW Brings ‘Black Myth: Wukong’ and ‘FINAL FANTASY XVI Demo’ to the Cloud


Each week, GeForce NOW elevates cloud gaming by bringing top PC games and new updates to the cloud.

Starting today, members can stream the highly anticipated action role-playing game (RPG) Black Myth: Wukong from Game Science, as well as a demo for the upcoming PC release of FINAL FANTASY XVI from Square Enix.

Experience these triple-A releases now — at peak performance, even on low-powered devices — along with Xbox automatic sign-in coming Aug. 22 on GFN Thursday. The feature will make it easy for members to dive into their favorite PC games.

These latest GeForce NOW updates — announced today at the annual Gamescom conference — build on recent milestones, including added support for mods, new data centers in Japan and Poland, and 2,000 games available in the cloud.

Get Your Game Face On

GeForce NOW recently celebrated 2,000 games in the cloud, thanks to strong collaborations with publishers that have brought their titles to the platform.

New games are added weekly from popular digital gaming stores Steam, Epic Games Store, Ubisoft Connect, Battle.net, and Xbox, including over 140 PC Game Pass titles. Powered by up to GeForce RTX 4080-class graphics, GeForce NOW is bringing even more top titles from celebrated publishers to the cloud.

Black Myth Wukong on GeForce NOW
No monkey business in the cloud — just high-performance gameplay.

Members can be among the first to play Black Myth: Wukong without waiting for downloads or worrying about system requirements.

Take on the role of Sun Wukong, the Monkey King, wielding the powers of a magical staff throughout a richly woven narrative filled with mythical creatures and formidable foes. Learn to harness Wukong’s unique abilities and combat styles, reminiscent of the Soulslike game genre. As the “Destined One,” take on various challenges to uncover the truth hidden beneath the veil of a glorious historical legend.

Witness the beauty of ancient China — from snow-draped mountains to intricate cave networks — in exquisite detail elevated by NVIDIA RTX technologies, including full ray tracing and DLSS 3. The GeForce NOW Ultimate membership brings this visual splendor to life at 4K resolution and 120 frames per second, even on underpowered devices, streaming from GeForce RTX 4080 SuperPODs in the cloud.

FINAL FANTASY XVI demo on GeForce NOW
Questing looks so good in the cloud.

The latest mainline numbered entry in the renowned RPG series from Square Enix, FINAL FANTASY XVI is coming soon to the cloud. Get a taste of the title’s high-octane action, breathtaking world and epic storytelling in its PC demo on GeForce NOW. Members can learn to master their Eikonic abilities while battling colossal foes and exploring the stunning realm of Valisthea.

Try the demo on GeForce NOW to jump right into an epic adventure without worrying about system specs.

GeForce NOW members can play these titles and more at high performance. Ultimate members can stream at up to 4K resolution and 120 fps with support for NVIDIA DLSS and NVIDIA Reflex technology, and experience the action even on low-powered devices. Keep an eye out on GFN Thursdays for the latest on game release dates in the cloud.

Set It and Forget It

Xbox SSO on GeForce NOW
Link-credible.

GeForce NOW makes gaming more convenient by letting members link their gaming accounts directly to the cloud service. Starting Aug. 22, such support extends to Xbox accounts — alongside existing support for Epic Games and Ubisoft automatic sign-in — enabling seamless access across devices.

After linking their Xbox profiles just once, members will be signed in automatically across their devices for all future GeForce NOW sessions — speeding access to their favorite PC games, from Fallout 76 to Starfield to Forza Horizon 5.

This new feature joins Xbox game library sync on GeForce NOW, which allows members to sync their supported Xbox Game Pass and Microsoft Store games to their cloud streaming library. Members will enjoy a more unified gaming experience with the ability to instantly play over 140 Xbox Game Pass titles across devices.

With a steady drumbeat of quality games from top publishers and new features to continuously improve the service, GeForce NOW is a gamer’s gateway to high-performance gaming from the cloud, enabling play on any device with real-time ray tracing and high resolutions. Check back every GFN Thursday for more news on upcoming game launches, new game releases, service updates and more.

Read More

NVIDIA Announces First Digital Human Technologies On-Device Small Language Model, Improving Conversation for Game Characters


NVIDIA’s first digital human technology small language model is being demonstrated in Mecha BREAK, a new multiplayer mech game developed by Amazing Seasun Games, to bring its characters to life and provide a more dynamic and immersive gameplay experience on GeForce RTX AI PCs.

The new on-device model, called Nemotron-4 4B Instruct, improves the conversation abilities of game characters, allowing them to more intuitively comprehend players and respond naturally.

NVIDIA has optimized ACE technology to run directly on GeForce RTX AI PCs and laptops. This greatly improves developers’ ability to deploy state-of-the-art digital human technology in next-generation games such as Mecha BREAK.

A Small Language Model Purpose-Built for Role-Playing

NVIDIA Nemotron-4 4B Instruct provides better role-play, retrieval-augmented generation and function-calling capabilities, allowing game characters to more intuitively comprehend player instructions, respond to gamers and perform more accurate and relevant actions.

The model is available as an NVIDIA NIM microservice, which provides a streamlined path for developing and deploying generative AI-powered applications. The NIM is optimized for low memory usage, offering faster response times and providing developers a way to take advantage of over 100 million GeForce RTX-powered PCs and laptops.

Nemotron-4 4B Instruct is part of NVIDIA ACE, a suite of digital human technologies that provide speech, intelligence and animation powered by generative AI. It’s available as a NIM for cloud and on-device deployment by game developers.

First Game Showcases ACE NIM Microservices

Mecha BREAK, developed by Amazing Seasun Games, a Kingsoft Corporation game subsidiary, is showcasing the NVIDIA Nemotron-4 4B Instruct NIM running on device in the first display of ACE-powered game interactions. The NVIDIA Audio2Face-3D NIM and Whisper, OpenAI’s automatic speech recognition model, provide facial animation and speech recognition running on-device. ElevenLabs powers the character’s voice through the cloud.

In this demo, shown first at Gamescom, one of the world’s biggest gaming expos, NVIDIA ACE and digital human technologies allow players to interact with a mechanic non-playable character (NPC) that can help them choose from a diverse range of mechanized robots, or mechs, to complement their playstyle or team needs, assist in appearance customization and give advice on how to best prepare their colossal war machine for battle.

“We’re excited to showcase the power and potential of ACE NIM microservices in Mecha BREAK, using Audio2Face and Nemotron-4 4B Instruct to dramatically enhance in-game immersion,” said Kris Kwok, CEO of Amazing Seasun Games.

Perfect World Games Explores Latest Digital Human Technologies

NVIDIA ACE and digital human technologies continue to expand their footprint in the gaming industry.

Global game publisher and developer Perfect World Games is advancing its NVIDIA ACE and digital human technology demo, Legends, with new AI-powered vision capabilities. Within the demo, the character Yun Ni can see gamers and identify people and objects in the real world through the computer’s camera, powered by GPT-4o, adding an augmented reality layer to the gameplay experience. These capabilities unlock a new level of immersion and accessibility for PC games.

Learn more about NVIDIA ACE and download the NIM to begin building game characters powered by generative AI. 

Read More

Level Up: NVIDIA, MediaTek to Bring G-SYNC Display Technologies to More Gamers


Picture this: NVIDIA and MediaTek are working together to make the industry’s best gaming display technologies more accessible to gamers globally.

The companies’ collaboration, announced today at the Gamescom gaming gathering in Cologne, Germany, integrates the full suite of NVIDIA G-SYNC technologies into the world’s most popular scalers.

Gamers can expect superior image quality, unmatched motion clarity, ultra-low latency, highly accurate colors, and more cutting-edge benefits on their displays.

G-SYNC Pulsar: The Star of New Display Technologies

A highlight of this collaboration is the introduction of G-SYNC Pulsar, a new technology that offers 4x the effective motion clarity alongside a smooth and tear-free variable refresh rate (VRR) gaming experience.

G-SYNC Pulsar will debut on newly announced monitors, including the ASUS ROG Swift 360Hz PG27AQNR, Acer Predator XB273U F5 and AOC AGON PRO AG276QSG2.

These monitors, expected later this year, feature 2560×1440 resolution, a 360Hz refresh rate and HDR support.

Integrating G-SYNC into MediaTek scalers eliminates the need for a separate G-SYNC module, streamlining the production process and reducing costs.

This allows for the creation of feature-rich G-SYNC monitors at a more affordable price. And by expanding the availability of these premium gaming products to a broader audience, more gamers will be able to enjoy the best in motion clarity, image quality and performance.

Read More