How BigBasket improved AI-enabled checkout at their physical stores using Amazon SageMaker

This post is co-written with Santosh Waddi and Nanda Kishore Thatikonda from BigBasket.

BigBasket is India’s largest online food and grocery store. They operate across multiple ecommerce channels, such as quick commerce, slotted delivery, and daily subscriptions. You can also buy from their physical stores and vending machines. They offer a large assortment of over 50,000 products across 1,000 brands, and operate in more than 500 cities and towns. BigBasket serves over 10 million customers.

In this post, we discuss how BigBasket used Amazon SageMaker to train their computer vision model for Fast-Moving Consumer Goods (FMCG) product identification, which helped them reduce training time by approximately 50% and save costs by 20%.

Customer challenges

Today, most supermarkets and physical stores in India rely on manual checkout at the counter. This approach has two issues:

  • It requires additional manpower, weight stickers, and repeated training for the in-store operational team as they scale.
  • In most stores, the checkout counter is different from the weighing counters, which adds to the friction in the customer purchase journey. Customers often lose the weight sticker and have to go back to the weighing counters to collect one again before proceeding with the checkout process.

Self-checkout process

BigBasket introduced an AI-powered checkout system in their physical stores that uses cameras to distinguish items uniquely. The following figure provides an overview of the checkout process.

Self-Checkout

The BigBasket team was running open source, in-house ML algorithms for computer vision object recognition to power AI-enabled checkout at their Fresho (physical) stores. They faced the following challenges in operating their existing setup:

  • With the continuous introduction of new products, the computer vision model needed to continuously incorporate new product information. The system needed to handle a large catalog of over 12,000 Stock Keeping Units (SKUs), with new SKUs being continually added at a rate of over 600 per month.
  • To keep pace with new products, a new model was produced each month using the latest training data. It was costly and time consuming to train the models frequently to adapt to new products.
  • BigBasket also wanted to reduce the training cycle time to improve time to market. Because the SKU count kept growing, training time increased linearly, which hurt their time to market given how frequently they needed to retrain.
  • Data augmentation for model training and manual management of the complete end-to-end training cycle added significant overhead. BigBasket ran this on a third-party platform, which incurred significant costs.

Solution overview

We recommended that BigBasket rearchitect their existing FMCG product detection and classification solution using SageMaker to address these challenges. Before moving to full-scale production, BigBasket tried a pilot on SageMaker to evaluate performance, cost, and convenience metrics.

Their objective was to fine-tune an existing computer vision machine learning (ML) model for SKU detection. We used a convolutional neural network (CNN) architecture with ResNet152 for image classification. The training dataset was estimated at around 300 images per SKU, resulting in over 4 million training images in total. For certain SKUs, we augmented data to encompass a broader range of environmental conditions.

The following diagram illustrates the solution architecture.

Architecture

The complete process can be summarized into the following high-level steps:

  1. Perform data cleansing, annotation, and augmentation.
  2. Store data in an Amazon Simple Storage Service (Amazon S3) bucket.
  3. Use SageMaker and Amazon FSx for Lustre for efficient data augmentation.
  4. Split data into train, validation, and test sets. We used FSx for Lustre and Amazon Relational Database Service (Amazon RDS) for fast parallel data access.
  5. Use a custom PyTorch Docker container including other open source libraries.
  6. Use SageMaker Distributed Data Parallelism (SMDDP) for accelerated distributed training.
  7. Log model training metrics.
  8. Copy the final model to an S3 bucket.

BigBasket used SageMaker notebooks to train their ML models and were able to easily port their existing PyTorch code and other open source dependencies to a SageMaker PyTorch container and run the pipeline seamlessly. This was the first benefit the BigBasket team saw, because hardly any code changes were needed to run it in a SageMaker environment.
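
As an illustration of this setup, the following sketch shows how a SageMaker PyTorch training job with SMDDP and an FSx for Lustre input channel could be launched with the SageMaker Python SDK. The entry point, IAM role, VPC settings, file system details, and hyperparameters are placeholders, not BigBasket's actual configuration.

from sagemaker.pytorch import PyTorch
from sagemaker.inputs import FileSystemInput

# All names, IDs, and hyperparameters below are illustrative placeholders
estimator = PyTorch(
    entry_point="train.py",                                 # assumed training script
    role="arn:aws:iam::111122223333:role/SageMakerRole",    # placeholder role
    framework_version="1.12.1",
    py_version="py38",
    instance_count=2,
    instance_type="ml.p4d.24xlarge",
    # Enable the SageMaker distributed data parallel (SMDDP) library
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
    # FSx for Lustre access requires the training job to run inside your VPC
    subnets=["subnet-0123456789abcdef0"],
    security_group_ids=["sg-0123456789abcdef0"],
    hyperparameters={"epochs": 10, "batch-size": 64},
)

# Mount the FSx for Lustre file system that holds the augmented training images
train_input = FileSystemInput(
    file_system_id="fs-0123456789abcdef0",
    file_system_type="FSxLustre",
    directory_path="/fsx/train",
    file_system_access_mode="ro",
)

estimator.fit({"train": train_input})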

The model network consists of a ResNet152 backbone followed by fully connected layers. We froze the low-level feature layers and retained the weights acquired through transfer learning from the ImageNet model. The model has 66 million parameters in total, of which 23 million are trainable. This transfer learning-based approach helped them use fewer training images, and also enabled faster convergence and reduced the total training time.
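
This transfer learning setup can be sketched in a few lines of PyTorch, assuming torchvision's ImageNet-pretrained ResNet152. The exact split between frozen and trainable layers and the head dimensions shown here are illustrative, not BigBasket's production values.

import torch.nn as nn
from torchvision import models

NUM_SKU_CLASSES = 12000  # illustrative; roughly the catalog size mentioned above

# Start from ImageNet weights (transfer learning); older torchvision versions
# use models.resnet152(pretrained=True) instead
model = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)

# Freeze the low-level feature layers; only the last block and head stay trainable
for name, param in model.named_parameters():
    if not name.startswith(("layer4", "fc")):
        param.requires_grad = False

# Replace the classification head with fully connected layers for SKU classes
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 1024),
    nn.ReLU(),
    nn.Linear(1024, NUM_SKU_CLASSES),
)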

Building and training the model within Amazon SageMaker Studio provided an integrated development environment (IDE) with everything needed to prepare, build, train, and tune models. Augmenting the training data using techniques like cropping, rotating, and flipping images helped improve the model training data and model accuracy.

Model training was accelerated by 50% through the use of the SMDDP library, which includes optimized communication algorithms designed specifically for AWS infrastructure. To improve data read/write performance during model training and data augmentation, we used FSx for Lustre for high-performance throughput.

Their starting training data size was over 1.5 TB. We used two Amazon Elastic Compute Cloud (Amazon EC2) p4d.24xlarge instances, each with 8 GPUs and 40 GB of GPU memory per GPU. For SageMaker distributed training, the instances need to be in the same AWS Region and Availability Zone, and the training data stored in an S3 bucket needs to be in the same Region. This architecture also allows BigBasket to change to other instance types or add more instances to the current architecture to cater to any significant data growth or achieve further reduction in training time.

How the SMDDP library helped reduce training time, cost, and complexity

In traditional distributed data training, the training framework assigns ranks to GPUs (workers) and creates a replica of your model on each GPU. During each training iteration, the global data batch is divided into pieces (batch shards) and a piece is distributed to each worker. Each worker then proceeds with the forward and backward pass defined in your training script on each GPU. Finally, model weights and gradients from the different model replicas are synced at the end of the iteration through a collective communication operation called AllReduce. After each worker and GPU has a synced replica of the model, the next iteration begins.

The SMDDP library is a collective communication library that improves the performance of this distributed data parallel training process. The SMDDP library reduces the communication overhead of the key collective communication operations such as AllReduce. Its implementation of AllReduce is designed for AWS infrastructure and can speed up training by overlapping the AllReduce operation with the backward pass. This approach achieves near-linear scaling efficiency and faster training speed by optimizing kernel operations between CPUs and GPUs.
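
With the SMDDP library, adopting this pattern in a PyTorch training script is mostly a matter of switching the process group backend. The sketch below shows the general shape; build_model() and the data loading are placeholders, not part of any real API.

import os
import torch
import torch.distributed as dist
import smdistributed.dataparallel.torch.torch_smddp  # registers the "smddp" backend
from torch.nn.parallel import DistributedDataParallel as DDP

# Initialize the process group with the SMDDP collective communication backend
dist.init_process_group(backend="smddp")

local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = build_model().to(local_rank)            # build_model() is a placeholder
model = DDP(model, device_ids=[local_rank])

# During loss.backward(), gradient AllReduce runs through SMDDP's optimized
# implementation, overlapping communication with the backward pass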

Note the following calculations:

  • The size of the global batch is (number of nodes in a cluster) * (number of GPUs per node) * (batch shard size); a short worked example follows this list
  • A batch shard (small batch) is the subset of the dataset assigned to each GPU (worker) per iteration
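
For instance, with the two p4d instances mentioned earlier (8 GPUs each) and an illustrative per-GPU batch shard of 32 images, the global batch works out as follows:

nodes = 2            # training instances in the cluster
gpus_per_node = 8    # GPUs (workers) per p4d instance
batch_shard = 32     # illustrative per-GPU mini-batch size

global_batch = nodes * gpus_per_node * batch_shard
print(global_batch)  # 512 images processed per training iteration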

BigBasket used the SMDDP library to reduce their overall training time. With FSx for Lustre, we improved data read/write throughput during model training and data augmentation. With data parallelism, BigBasket was able to achieve almost 50% faster and 20% cheaper training compared to other alternatives, delivering the best performance on AWS. SageMaker automatically shuts down the training pipeline post-completion. The project completed successfully with a 50% faster training time on AWS (4.5 days on AWS vs. 9 days on their legacy platform).

At the time of writing this post, BigBasket has been running the complete solution in production for more than 6 months, scaling the system to new cities and adding new stores every month.

“Our partnership with AWS on migration to distributed training using their SMDDP offering has been a great win. Not only did it cut down our training times by 50%, it was also 20% cheaper. In our entire partnership, AWS has set the bar on customer obsession and delivering results—working with us the whole way to realize promised benefits.”

– Keshav Kumar, Head of Engineering at BigBasket.

Conclusion

In this post, we discussed how BigBasket used SageMaker to train their computer vision model for FMCG product identification. The implementation of an AI-powered automated self-checkout system delivers an improved retail customer experience through innovation, while eliminating human errors in the checkout process. Accelerating new product onboarding by using SageMaker distributed training reduces SKU onboarding time and cost. Integrating FSx for Lustre enables fast parallel data access for efficient model retraining with hundreds of new SKUs monthly. Overall, this AI-based self-checkout solution provides an enhanced shopping experience devoid of frontend checkout errors. The automation and innovation have transformed their retail checkout and onboarding operations.

SageMaker provides end-to-end ML development, deployment, and monitoring capabilities such as a SageMaker Studio notebook environment for writing code, data acquisition, data tagging, model training, model tuning, deployment, monitoring, and much more. If your business is facing any of the challenges described in this post and wants to save time to market and improve cost, reach out to the AWS account team in your Region and get started with SageMaker.


About the Authors

Santosh Waddi is a Principal Engineer at BigBasket and brings over a decade of expertise in solving AI challenges. With a strong background in computer vision, data science, and deep learning, he holds a postgraduate degree from IIT Bombay. Santosh has authored notable IEEE publications and, as a seasoned tech blog author, he has also made significant contributions to the development of computer vision solutions during his tenure at Samsung.

Nanda Kishore Thatikonda is an Engineering Manager leading the Data Engineering and Analytics team at BigBasket. Nanda has built multiple applications for anomaly detection and has a patent filed in a similar space. He has worked on building enterprise-grade applications, data platforms in multiple organizations, and reporting platforms to streamline decisions backed by data. Nanda has over 18 years of experience working in Java/J2EE, Spring technologies, and big data frameworks using Hadoop and Apache Spark.

Sudhanshu Hate is a Principal AI & ML Specialist with AWS and works with clients to advise them on their MLOps and generative AI journey. In his previous role, he conceptualized, created, and led teams to build a ground-up, open source-based AI and gamification platform, and successfully commercialized it with over 100 clients. Sudhanshu holds a couple of patents; has written two books, several papers, and blogs; and has presented his point of view in various forums. He has been a thought leader and speaker, and has been in the industry for nearly 25 years. He has worked with Fortune 1000 clients across the globe and most recently works with digital native clients in India.

Ayush Kumar is a Solutions Architect at AWS. He works with a wide variety of AWS customers, helping them adopt the latest modern applications and innovate faster with cloud-native technologies. You’ll find him experimenting in the kitchen in his spare time.

Amazon SageMaker Feature Store now supports cross-account sharing, discovery, and access

Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference. For example, in an application that recommends a music playlist, features could include song ratings, listening duration, and listener demographics. Features are used repeatedly by multiple teams, and feature quality is critical to ensure a highly accurate model. Also, when features used to train models offline in batch are made available for real-time inference, it’s hard to keep the two feature stores synchronized. SageMaker Feature Store provides a secured and unified store to process, standardize, and use features at scale across the ML lifecycle.

SageMaker Feature Store now makes it effortless to share, discover, and access feature groups across AWS accounts. This new capability promotes collaboration and minimizes duplicate work for teams involved in ML model and application development, particularly in enterprise environments with multiple accounts spanning different business units or functions.

With this launch, account owners can grant access to select feature groups by other accounts using AWS Resource Access Manager (AWS RAM). After they’re granted access, users of those accounts can conveniently view all of their feature groups, including the shared ones, through Amazon SageMaker Studio or SDKs. This enables teams to discover and utilize features developed by other teams, fostering knowledge sharing and efficiency. Additionally, usage details of shared resources can be monitored with Amazon CloudWatch and AWS CloudTrail. For a deep dive, refer to Cross account feature group discoverability and access.

In this post, we discuss the why and how of a centralized feature store with cross-account access. We show how to set it up and run a sample demonstration, as well as the benefits you can get by using this new capability in your organization.

Who needs a cross-account feature store

Organizations need to securely share features across teams to build accurate ML models, while preventing unauthorized access to sensitive data. SageMaker Feature Store now allows granular sharing of features across accounts via AWS RAM, enabling collaborative model development with governance.

SageMaker Feature Store provides purpose-built storage and management for ML features used during training and inferencing. With cross-account support, you can now selectively share features stored in one AWS account with other accounts in your organization.

For example, the analytics team may curate features like customer profile, transaction history, and product catalogs in a central management account. These need to be securely accessed by ML developers in other departments like marketing, fraud detection, and so on to build models.

The following are key benefits of sharing ML features across accounts:

  • Consistent and reusable features – Centralized sharing of curated features improves model accuracy by providing consistent input data to train on. Teams can discover and directly consume features created by others instead of duplicating them in each account.
  • Feature group access control – You can grant access to only the specific feature groups required for an account’s use case. For example, the marketing team may only get access to the customer profile feature group needed for recommendation models.
  • Collaboration across teams – Shared features allow disparate teams like fraud, marketing, and sales to collaborate on building ML models using the same reliable data instead of creating siloed features.
  • Audit trail for compliance – Administrators can monitor feature usage by all accounts centrally using CloudTrail event logs. This provides an audit trail required for governance and compliance.

Delineating producers from consumers in cross-account feature stores

In the realm of machine learning, the feature store acts as a crucial bridge, connecting those who supply data with those who harness it. This dichotomy can be effectively managed using a cross-account setup for the feature store. Let’s demystify this using the following personas and a real-world analogy:

  • Data and ML engineers (owners and producers) – They lay the groundwork by feeding data into the feature store
  • Data scientists (consumers) – They extract and utilize this data to craft their models

Data engineers serve as architects sketching the initial blueprint. Their task is to construct and oversee efficient data pipelines. Drawing data from source systems, they mold raw data attributes into discernible features. Take “age” for instance. Although it merely represents the span between now and one’s birthdate, its interpretation might vary across an organization. Ensuring quality, uniformity, and consistency is paramount here. Their aim is to feed data into a centralized feature store, establishing it as the undisputed reference point.

ML engineers refine these foundational features, tailoring them for mature ML workflows. In the context of banking, they might deduce statistical insights from account balances, identifying trends and flow patterns. The hurdle they often face is redundancy. It’s common to see repetitive feature creation pipelines across diverse ML initiatives.

Imagine data scientists as gourmet chefs scouting a well-stocked pantry, seeking the best ingredients for their next culinary masterpiece. Their time should be invested in crafting innovative data recipes, not in reassembling the pantry. The hurdle at this juncture is discovering the right data. A user-friendly interface, equipped with efficient search tools and comprehensive feature descriptions, is indispensable.

In essence, a cross-account feature store setup meticulously segments the roles of data producers and consumers, ensuring efficiency, clarity, and innovation. Whether you’re laying the foundation or building atop it, knowing your role and tools is pivotal.

The following diagram shows two different data scientist teams, from two different AWS accounts, who share and use the same central feature store to select the best features needed to build their ML models. The central feature store is located in a different account managed by data engineers and ML engineers, where the data governance layer and data lake are usually situated.

Cross-account feature group controls

With SageMaker Feature Store, you can share feature group resources across accounts. The resource owner account shares resources with the resource consumer accounts. There are two distinct categories of permissions associated with sharing resources:

  • Discoverability permissions – Discoverability means being able to see feature group names and metadata. When you grant discoverability permission, all feature group entities in the account that you share from (resource owner account) become discoverable by the accounts that you are sharing with (resource consumer accounts). For example, if you make the resource owner account discoverable by the resource consumer account, then principals of the resource consumer account can see all feature groups contained in the resource owner account. This permission is granted to resource consumer accounts by using the SageMaker catalog resource type.
  • Access permissions – When you grant an access permission, you do so at the feature group resource level (not the account level). This gives you more granular control over granting access to data. The type of access permissions that can be granted are read-only, read/write, and admin. For example, you can select only certain feature groups from the resource owner account to be accessible by principals of the resource consumer account, depending on your business needs. This permission is granted to resource consumer accounts by using the feature group resource type and specifying feature group entities.

The following example diagram visualizes sharing the SageMaker catalog resource type granting the discoverability permission vs. sharing a feature group resource type entity with access permissions. The SageMaker catalog contains all of your feature group entities. When granted a discoverability permission, the resource consumer account can search and discover all feature group entities within the resource owner account. A feature group entity contains your ML data. When granted an access permission, the resource consumer account can access the feature group data, with access determined by the relevant access permission.

Solution overview

Complete the following steps to securely share features between accounts using SageMaker Feature Store:

  1. In the source (owner) account, ingest datasets and prepare normalized features. Organize related features into logical groups called feature groups.
  2. Create a resource share to grant cross-account access to specific feature groups. Define allowed actions like get and put, and restrict access only to authorized accounts.
  3. In the target (consumer) accounts, accept the AWS RAM invitation to access shared features. Review the access policy to understand permissions granted.

Developers in target accounts can now retrieve shared features using the SageMaker SDK, join with additional data, and use them to train ML models. The source account can monitor access to shared features by all accounts using CloudTrail event logs. Audit logs provide centralized visibility into feature usage.

With these steps, you can enable teams across your organization to securely use shared ML features for collaborative model development.

Prerequisites

We assume that you have already created feature groups and ingested the corresponding features inside your owner account. For more information about getting started, refer to Get started with Amazon SageMaker Feature Store.
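
If you don't have a feature group yet, the following sketch shows one way to create and populate one in the owner account with the SageMaker Python SDK; the feature group name, DataFrame contents, bucket, and role are placeholders.

import time
import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
role = "arn:aws:iam::111122223333:role/SageMakerRole"   # placeholder role

# Placeholder feature data; a real table would hold curated ML features
df = pd.DataFrame({
    "customer_id": ["c1", "c2"],
    "lifetime_value": [120.5, 87.0],
    "event_time": [time.time()] * 2,
})

fg = FeatureGroup(name="customer-profile", sagemaker_session=session)
fg.load_feature_definitions(data_frame=df)     # infer feature names and types

fg.create(
    s3_uri=f"s3://{session.default_bucket()}/feature-store",  # offline store location
    record_identifier_name="customer_id",
    event_time_feature_name="event_time",
    role_arn=role,
    enable_online_store=True,
)

# create() is asynchronous; wait until the feature group is ready before ingesting
while fg.describe().get("FeatureGroupStatus") == "Creating":
    time.sleep(5)

fg.ingest(data_frame=df, max_workers=2, wait=True)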

Grant discoverability permissions

First, we demonstrate how to share our SageMaker Feature Store catalog in the owner account. Complete the following steps:

  1. In the owner account of the SageMaker Feature Store catalog, open the AWS RAM console.
  2. Under Shared by me in the navigation pane, choose Resource shares.
  3. Choose Create resource share.
  4. Enter a resource share name and choose SageMaker Resource Catalogs as the resource type.
  5. Choose Next.
  6. For discoverability-only access, enter AWSRAMPermissionSageMakerCatalogResourceSearch for Managed permissions.
  7. Choose Next.
  8. Enter your consumer account ID and choose Add. You may add several consumer accounts.
  9. Choose Next and complete your resource share.

Now the shared SageMaker Feature Store catalog should show up on the Resource shares page.

You can achieve the same result by using the AWS Command Line Interface (AWS CLI) with the following command (provide your AWS Region, owner account ID, and consumer account ID):

aws ram create-resource-share \
  --name MyCatalogFG \
  --resource-arns arn:aws:sagemaker:REGION:OWNERACCOUNTID:sagemaker-catalog/DefaultFeatureGroupCatalog \
  --principals CONSACCOUNTID \
  --permission-arns arn:aws:ram::aws:permission/AWSRAMPermissionSageMakerCatalogResourceSearch

Accept the resource share invite

To accept the resource share invite, complete the following steps:

  1. In the target (consumer) account, open the AWS RAM console.
  2. Under Shared with me in the navigation pane, choose Resource shares.
  3. Choose the new pending resource share.
  4. Choose Accept resource share.

You can achieve the same result using the AWS CLI with the following command:

aws ram get-resource-share-invitations

From the output of the preceding command, retrieve the value of resourceShareInvitationArn and then accept the invitation with the following command:

aws ram accept-resource-share-invitation \
  --resource-share-invitation-arn RESOURCESHAREINVITATIONARN

The workflow is the same for sharing feature groups with another account via AWS RAM.

After you share some feature groups with the target account, you can inspect the SageMaker Feature Store, where you can observe that the new catalog is available.
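
From the consumer account, the shared catalog can also be queried programmatically with the SageMaker Search API. The sketch below assumes a recent boto3 version that supports the CrossAccountFilterOption parameter.

import boto3

sm = boto3.client("sagemaker")   # run this in the consumer account

# Search for feature groups, including those shared from other accounts
response = sm.search(
    Resource="FeatureGroup",
    CrossAccountFilterOption="CrossAccount",   # also accepts "SameAccount"
)

for result in response["Results"]:
    fg = result["FeatureGroup"]
    print(fg["FeatureGroupName"], fg["FeatureGroupArn"])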

Grant access permissions

With access permissions, we can grant permissions at the feature group resource level. Complete the following steps:

  1. In the owner account of the SageMaker Feature Store catalog, open the AWS RAM console.
  2. Under Shared by me in the navigation pane, choose Resource shares.
  3. Choose Create resource share.
  4. Enter a resource share name and choose SageMaker Feature Groups as the resource type.
  5. Select one or more feature groups to share.
  6. Choose Next.
  7. For read/write access, enter AWSRAMPermissionSageMakerFeatureGroupReadWrite for Managed permissions.
  8. Choose Next.
  9. Enter your consumer account ID and choose Add. You may add several consumer accounts.
  10. Choose Next and complete your resource share.

Now the shared catalog should show up on the Resource shares page.

You can achieve the same result by using the AWS CLI with the following command (provide your Region, owner account ID, consumer account ID, and feature group name):

aws ram create-resource-share \
  --name MyCatalogFG \
  --resource-arns arn:aws:sagemaker:REGION:OWNERACCOUNTID:feature-group/FEATUREGROUPNAME \
  --principals CONSACCOUNTID \
  --permission-arns arn:aws:ram::aws:permission/AWSRAMPermissionSageMakerFeatureGroupReadWrite

There are three types of access that you can grant to feature groups:

  • AWSRAMPermissionSageMakerFeatureGroupReadOnly – The read-only privilege allows resource consumer accounts to read records in the shared feature groups and view details and metadata
  • AWSRAMPermissionSageMakerFeatureGroupReadWrite – The read/write privilege allows resource consumer accounts to write records to, and delete records from, the shared feature groups, in addition to read permissions
  • AWSRAMPermissionSagemakerFeatureGroupAdmin – The admin privilege allows the resource consumer accounts to update the description and parameters of features within the shared feature groups and update the configuration of the shared feature groups, in addition to read/write permissions

Accept the resource share invite

To accept the resource share invite, complete the following steps:

  1. In the target (consumer) account, open the AWS RAM console.
  2. Under Shared with me in the navigation pane, choose Resource shares.
  3. Choose the new pending resource share.
  4. Choose Accept resource share.

The process of accepting the resource share using the AWS CLI is the same as for the previous discoverability section, with the get-resource-share-invitations and accept-resource-share-invitation commands.

Sample notebooks showcasing this new capability

Two notebooks were added to the SageMaker Feature Store Workshop GitHub repository in the folder 09-module-security/09-03-cross-account-access:

  • m9_03_nb1_cross-account-admin.ipynb – This needs to be launched on your admin or owner AWS account
  • m9_03_nb2_cross-account-consumer.ipynb – This needs to be launched on your consumer AWS account

The first script shows how to create the discoverability resource share for existing feature groups at the admin or owner account and share it with another consumer account programmatically using the AWS RAM API create_resource_share(). It also shows how to grant access permissions to existing feature groups at the owner account and share these with another consumer account using AWS RAM. You need to provide your consumer AWS account ID before running the notebook.

The second script accepts the AWS RAM invitations to discover and access cross-account feature groups shared from the owner account. It then shows how to discover cross-account feature groups that reside in the owner account and list them in the consumer account. You can also see how to access, in read/write mode, cross-account feature groups that reside in the owner account and perform the following operations from the consumer account: describe(), get_record(), ingest(), and delete_record().
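
For reference, the following sketch shows what those consumer-side calls look like with boto3. The feature group ARN and record values are placeholders, and each call succeeds only if the corresponding access permission was granted as described earlier.

import boto3

sm = boto3.client("sagemaker")
fs_runtime = boto3.client("sagemaker-featurestore-runtime")

# Placeholder ARN of a feature group shared from the owner account
SHARED_FG = "arn:aws:sagemaker:eu-central-1:111122223333:feature-group/customer-profile"

# describe: view metadata of the shared feature group
print(sm.describe_feature_group(FeatureGroupName=SHARED_FG)["FeatureGroupStatus"])

# get_record: read a single record by its identifier
record = fs_runtime.get_record(
    FeatureGroupName=SHARED_FG,
    RecordIdentifierValueAsString="c1",
)
print(record.get("Record"))

# put_record (used by ingest): write a record; needs read/write or admin access
fs_runtime.put_record(
    FeatureGroupName=SHARED_FG,
    Record=[
        {"FeatureName": "customer_id", "ValueAsString": "c3"},
        {"FeatureName": "lifetime_value", "ValueAsString": "42.0"},
        {"FeatureName": "event_time", "ValueAsString": "1700000000.0"},
    ],
)

# delete_record: remove a record from the online store
fs_runtime.delete_record(
    FeatureGroupName=SHARED_FG,
    RecordIdentifierValueAsString="c3",
    EventTime="1700000001.0",
)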

Conclusion

The SageMaker Feature Store cross-account capability offers several compelling benefits. Firstly, it facilitates seamless collaboration by enabling sharing of feature groups across multiple AWS accounts. This enhances data accessibility and utilization, allowing teams in different accounts to use shared features for their ML workflows.

Additionally, the cross-account capability enhances data governance and security. With controlled access and permissions through AWS RAM, organizations can maintain a centralized feature store while ensuring that each account has tailored access levels. This not only streamlines data management, but also strengthens security measures by limiting access to authorized users.

Furthermore, the ability to share feature groups across accounts simplifies the process of building and deploying ML models in a collaborative environment. It fosters a more integrated and efficient workflow, reducing redundancy in data storage and facilitating the creation of robust models with shared, high-quality features. Overall, the Feature Store’s cross-account capability optimizes collaboration, governance, and efficiency in ML development across diverse AWS accounts. Give it a try, and let us know what you think in the comments.


About the Authors

Ioan Catana is a Senior Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. He helps customers develop and scale their ML solutions in the AWS Cloud. Ioan has over 20 years of experience, mostly in software architecture design and cloud engineering.

Philipp Kaindl is a Senior Artificial Intelligence and Machine Learning Solutions Architect at AWS. With a background in data science and mechanical engineering, his focus is on empowering customers to create lasting business impact with the help of AI. Outside of work, Philipp enjoys tinkering with 3D printers, sailing, and hiking.

Dhaval Shah is a Senior Solutions Architect at AWS, specializing in machine learning. With a strong focus on digital native businesses, he empowers customers to use AWS and drive their business growth. As an ML enthusiast, Dhaval is driven by his passion for creating impactful solutions that bring positive change. In his leisure time, he indulges in his love for travel and cherishes quality moments with his family.

Mizanur Rahman is a Senior Software Engineer for Amazon SageMaker Feature Store with over 10 years of hands-on experience specializing in AI and ML. With a strong foundation in both theory and practical applications, he holds a Ph.D. in Fraud Detection using Machine Learning, reflecting his dedication to advancing the field. His expertise spans a broad spectrum, encompassing scalable architectures, distributed computing, big data analytics, micro services and cloud infrastructures for organizations.

Say What? Chat With RTX Brings Custom Chatbot to NVIDIA RTX AI PCs

Chatbots are used by millions of people around the world every day, powered by NVIDIA GPU-based cloud servers. Now, these groundbreaking tools are coming to Windows PCs powered by NVIDIA RTX for local, fast, custom generative AI.

Chat with RTX, now free to download, is a tech demo that lets users personalize a chatbot with their own content, accelerated by a local NVIDIA GeForce RTX 30 Series GPU or higher with at least 8GB of video random access memory, or VRAM.

Ask Me Anything

Chat with RTX uses retrieval-augmented generation (RAG), NVIDIA TensorRT-LLM software and NVIDIA RTX acceleration to bring generative AI capabilities to local, GeForce-powered Windows PCs. Users can quickly and easily connect local files on a PC as a dataset to an open-source large language model like Mistral or Llama 2, enabling queries for quick, contextually relevant answers.

Rather than searching through notes or saved content, users can simply type queries. For example, one could ask, “What was the restaurant my partner recommended while in Las Vegas?” and Chat with RTX will scan local files the user points it to and provide the answer with context.

The tool supports various file formats, including .txt, .pdf, .doc/.docx and .xml. Point the application at the folder containing these files, and the tool will load them into its library in just seconds.

Users can also include information from YouTube videos and playlists. Adding a video URL to Chat with RTX allows users to integrate this knowledge into their chatbot for contextual queries. For example, ask for travel recommendations based on content from favorite influencer videos, or get quick tutorials and how-tos based on top educational resources.

Chat with RTX can integrate knowledge from YouTube videos into queries.

Since Chat with RTX runs locally on Windows RTX PCs and workstations, the provided results are fast — and the user’s data stays on the device. Rather than relying on cloud-based LLM services, Chat with RTX lets users process sensitive data on a local PC without the need to share it with a third party or have an internet connection.

In addition to a GeForce RTX 30 Series GPU or higher with a minimum 8GB of VRAM, Chat with RTX requires Windows 10 or 11, and the latest NVIDIA GPU drivers.

Develop LLM-Based Applications With RTX

Chat with RTX shows the potential of accelerating LLMs with RTX GPUs. The app is built from the TensorRT-LLM RAG developer reference project, available on GitHub. Developers can use the reference project to develop and deploy their own RAG-based applications for RTX, accelerated by TensorRT-LLM. Learn more about building LLM-based applications.

Enter a generative AI-powered Windows app or plug-in to the NVIDIA Generative AI on NVIDIA RTX developer contest, running through Friday, Feb. 23, for a chance to win prizes such as a GeForce RTX 4090 GPU, a full, in-person conference pass to NVIDIA GTC and more.

Learn more about Chat with RTX.

Resource-constrained Stereo Singing Voice Cancellation

We study the problem of stereo singing voice cancellation, a subtask of music source separation, whose goal is to estimate an instrumental background from a stereo mix. We explore how to achieve performance similar to large state-of-the-art source separation networks starting from a small, efficient model for real-time speech separation. Such a model is useful when memory and compute are limited and singing voice processing has to run with limited look-ahead. In practice, this is realised by adapting an existing mono model to handle stereo input. Improvements in quality are obtained by tuning…

Apple Machine Learning Research

How Booking.com modernized its ML experimentation framework with Amazon SageMaker

This post is co-written with Kostia Kofman and Jenny Tokar from Booking.com.

As a global leader in the online travel industry, Booking.com is always seeking innovative ways to enhance its services and provide customers with tailored and seamless experiences. The Ranking team at Booking.com plays a pivotal role in ensuring that the search and recommendation algorithms are optimized to deliver the best results for their users.

Sharing in-house resources with other internal teams, the Ranking team machine learning (ML) scientists often encountered long wait times to access resources for model training and experimentation – challenging their ability to rapidly experiment and innovate. Recognizing the need for a modernized ML infrastructure, the Ranking team embarked on a journey to use the power of Amazon SageMaker to build, train, and deploy ML models at scale.

Booking.com collaborated with AWS Professional Services to build a solution to accelerate the time-to-market for improved ML models through the following improvements:

  • Reduced wait times for resources for training and experimentation
  • Integration of essential ML capabilities such as hyperparameter tuning
  • A reduced development cycle for ML models

Reduced wait times would mean that the team could quickly iterate and experiment with models, gaining insights at a much faster pace. Using on-demand SageMaker instances reduced wait times tenfold. Essential ML capabilities such as hyperparameter tuning and model explainability were lacking on premises. The team’s modernization journey introduced these features through Amazon SageMaker Automatic Model Tuning and Amazon SageMaker Clarify. Finally, the team’s aspiration was to receive immediate feedback on each change made in the code, reducing the feedback loop from minutes to an instant, and thereby reducing the development cycle for ML models.

In this post, we delve into the journey undertaken by the Ranking team at Booking.com as they harnessed the capabilities of SageMaker to modernize their ML experimentation framework. By doing so, they not only overcame their existing challenges, but also improved their search experience, ultimately benefiting millions of travelers worldwide.

Approach to modernization

The Ranking team consists of several ML scientists who each need to develop and test their own model offline. When a model is deemed successful according to the offline evaluation, it can be moved to production A/B testing. If it shows online improvement, it can be deployed to all the users.

The goal of this project was to create a user-friendly environment for ML scientists to easily run customizable Amazon SageMaker Model Building Pipelines to test their hypotheses without the need to code long and complicated modules.

One of the several challenges faced was adapting the existing on-premises pipeline solution for use on AWS. The solution involved two key components:

  • Modifying and extending existing code – The first part of our solution involved the modification and extension of our existing code to make it compatible with AWS infrastructure. This was crucial in ensuring a smooth transition from on-premises to cloud-based processing.
  • Client package development – A client package was developed that acts as a wrapper around SageMaker APIs and the previously existing code. This package combines the two, enabling ML scientists to easily configure and deploy ML pipelines without coding.

SageMaker pipeline configuration

Customizability is key to the model building pipeline, and it was achieved through config.ini, an extensive configuration file. This file serves as the control center for all inputs and behaviors of the pipeline.

Available configurations inside config.ini include:

  • Pipeline details – The practitioner can define the pipeline’s name, specify which steps should run, determine where outputs should be stored in Amazon Simple Storage Service (Amazon S3), and select which datasets to use
  • AWS account details – You can decide which Region the pipeline should run in and which role should be used
  • Step-specific configuration – For each step in the pipeline, you can specify details such as the number and type of instances to use, along with relevant parameters

The following code shows an example configuration file:

[BUILD]
pipeline_name = ranking-pipeline
steps = DATA_TRANSFORM, TRAIN, PREDICT, EVALUATE, EXPLAIN, REGISTER, UPLOAD
train_data_s3_path = s3://...
...
[AWS_ACCOUNT]
region = eu-central-1
...
[DATA_TRANSFORM_PARAMS]
input_data_s3_path = s3://...
compression_type = GZIP
....
[TRAIN_PARAMS]
instance_count = 3
instance_type = ml.g5.4xlarge
epochs = 1
enable_sagemaker_debugger = True
...
[PREDICT_PARAMS]
instance_count = 3
instance_type = ml.g5.4xlarge
...
[EVALUATE_PARAMS]
instance_type = ml.m5.8xlarge
batch_size = 2048
...
[EXPLAIN_PARAMS]
check_job_instance_type = ml.c5.xlarge
generate_baseline_with_clarify = False
....

config.ini is a version-controlled file managed by Git, representing the minimal configuration required for a successful training pipeline run. During development, local configuration files that are not version-controlled can be utilized. These local configuration files only need to contain settings relevant to a specific run, introducing flexibility without complexity. The pipeline creation client is designed to handle multiple configuration files, with the latest one taking precedence over previous settings.
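
This layered configuration behavior maps naturally onto Python's configparser, which applies files in order so that values in later files override earlier ones. A minimal sketch follows; the local file name is illustrative.

import configparser

config = configparser.ConfigParser()

# Later files take precedence: local overrides win over version-controlled defaults
read_files = config.read(["config.ini", "config.local.ini"])
print("Loaded:", read_files)

print(config.get("BUILD", "pipeline_name"))
print(config.getint("TRAIN_PARAMS", "instance_count"))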

SageMaker pipeline steps

The pipeline is divided into the following steps; a minimal sketch of how some of them can be wired together with the SageMaker SDK follows the list:

  • Train and test data preparation – Terabytes of raw data are copied to an S3 bucket and processed with AWS Glue Spark jobs, resulting in data that is structured and formatted for compatibility.
  • Train – The training step uses the TensorFlow estimator for SageMaker training jobs. Training occurs in a distributed manner using Horovod, and the resulting model artifact is stored in Amazon S3. For hyperparameter tuning, a hyperparameter optimization (HPO) job can be initiated, selecting the best model based on the objective metric.
  • Predict – In this step, a SageMaker Processing job uses the stored model artifact to make predictions. This process runs in parallel on available machines, and the prediction results are stored in Amazon S3.
  • Evaluate – A PySpark processing job evaluates the model using a custom Spark script. The evaluation report is then stored in Amazon S3.
  • Condition – After evaluation, a decision is made regarding the model’s quality. This decision is based on a condition metric defined in the configuration file. If the evaluation is positive, the model is registered as approved; otherwise, it’s registered as rejected. In both cases, the evaluation and explainability report, if generated, are recorded in the model registry.
  • Package model for inference – Using a processing job, if the evaluation results are positive, the model is packaged, stored in Amazon S3, and made ready for upload to the internal ML portal.
  • Explain – SageMaker Clarify generates an explainability report.
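
The following minimal sketch shows how a training step and a condition step could be wired into a SageMaker pipeline with the Python SDK. It is a simplified illustration, not Booking.com's client package: the entry point, role, S3 paths, and the parameter-based condition are placeholders (a real pipeline would typically read the metric from the evaluation report with JsonGet).

from sagemaker.tensorflow import TensorFlow
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.fail_step import FailStep
from sagemaker.workflow.parameters import ParameterFloat

# Placeholder estimator; the real pipeline reads these values from config.ini
estimator = TensorFlow(
    entry_point="train.py",
    role="arn:aws:iam::111122223333:role/SageMakerRole",
    instance_count=3,
    instance_type="ml.g5.4xlarge",
    framework_version="2.11",
    py_version="py39",
)

train_step = TrainingStep(
    name="TRAIN",
    estimator=estimator,
    inputs={"train": TrainingInput(s3_data="s3://example-bucket/train/")},
)

min_auc = ParameterFloat(name="MinAUC", default_value=0.8)

condition_step = ConditionStep(
    name="CONDITION",
    conditions=[ConditionGreaterThanOrEqualTo(left=min_auc, right=0.75)],
    if_steps=[],   # e.g. package and register the model as Approved
    else_steps=[FailStep(name="REJECT", error_message="Model below quality bar")],
)

pipeline = Pipeline(
    name="ranking-pipeline",
    parameters=[min_auc],
    steps=[train_step, condition_step],
)
# pipeline.upsert(role_arn=...) followed by pipeline.start() would run it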

Two distinct repositories are used. The first repository contains the definition and build code for the ML pipeline, and the second repository contains the code that runs inside each step, such as processing, training, prediction, and evaluation. This dual-repository approach allows for greater modularity, and enables science and engineering teams to iterate independently on ML code and ML pipeline components.

The following diagram illustrates the solution workflow.

Automatic model tuning

Training ML models requires an iterative approach of multiple training experiments to build a robust and performant final model for business use. The ML scientists have to select the appropriate model type, build the correct input datasets, and adjust the set of hyperparameters that control the model learning process during training.

The selection of appropriate values for hyperparameters for the model training process can significantly influence the final performance of the model. However, there is no unique or defined way to determine which values are appropriate for a specific use case. Most of the time, ML scientists will need to run multiple training jobs with slightly different sets of hyperparameters, observe the model training metrics, and then try to select more promising values for the next iteration. This process of tuning model performance is also known as hyperparameter optimization (HPO), and can at times require hundreds of experiments.

The Ranking team used to perform HPO manually in their on-premises environment because they could only launch a very limited number of training jobs in parallel. Therefore, they had to run HPO sequentially, test and select different combinations of hyperparameter values manually, and regularly monitor progress. This prolonged the model development and tuning process and limited the overall number of HPO experiments that could run in a feasible amount of time.

With the move to AWS, the Ranking team was able to use the automatic model tuning (AMT) feature of SageMaker. AMT enables Ranking ML scientists to automatically launch hundreds of training jobs within hyperparameter ranges of interest to find the best performing version of the final model according to the chosen metric. The Ranking team is now able to choose among four different automatic tuning strategies for their hyperparameter selection:

  • Grid search – AMT will expect all hyperparameters to be categorical values, and it will launch training jobs for each distinct categorical combination, exploring the entire hyperparameter space.
  • Random search – AMT will randomly select hyperparameter value combinations within the provided ranges. Because there is no dependency between different training jobs and parameter value selection, multiple parallel training jobs can be launched with this method, speeding up the optimal parameter selection process.
  • Bayesian optimization – AMT uses Bayesian optimization to estimate the best set of hyperparameter values, treating the search as a regression problem. It considers previously tested hyperparameter combinations and their impact on model training when selecting new parameter values, which makes the search smarter and requires fewer experiments, but training jobs are launched only sequentially so that each one can learn from the previous runs.
  • Hyperband – AMT will use intermediate and final results of the training jobs it’s running to dynamically reallocate resources towards training jobs with hyperparameter configurations that show more promising results while automatically stopping those that underperform.

AMT on SageMaker enabled the Ranking team to reduce the time spent on the hyperparameter tuning process for their model development by enabling them, for the first time, to run multiple parallel experiments, use automatic tuning strategies, and complete tens of training job runs within days, something that wasn’t feasible on premises.
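
A hedged sketch of how such an AMT job can be launched with the SageMaker Python SDK follows; the estimator, objective metric and its regex, and the hyperparameter names and ranges are illustrative placeholders.

from sagemaker.tensorflow import TensorFlow
from sagemaker.tuner import (
    HyperparameterTuner,
    ContinuousParameter,
    IntegerParameter,
)

# Placeholder estimator for the model being tuned
estimator = TensorFlow(
    entry_point="train.py",
    role="arn:aws:iam::111122223333:role/SageMakerRole",
    instance_count=1,
    instance_type="ml.g5.4xlarge",
    framework_version="2.11",
    py_version="py39",
)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:auc",
    metric_definitions=[{"Name": "validation:auc",
                         "Regex": "validation auc: ([0-9\\.]+)"}],
    hyperparameter_ranges={
        "learning-rate": ContinuousParameter(1e-5, 1e-2),
        "batch-size": IntegerParameter(64, 512),
    },
    strategy="Bayesian",          # other options: "Random", "Grid", "Hyperband"
    max_jobs=50,
    max_parallel_jobs=5,
)

tuner.fit({"train": "s3://example-bucket/train/"})   # placeholder S3 path
print(tuner.best_training_job())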

Model explainability with SageMaker Clarify

Model explainability enables ML practitioners to understand the nature and behavior of their ML models by providing valuable insights for feature engineering and selection decisions, which in turn improves the quality of the model predictions. The Ranking team wanted to evaluate their explainability insights in two ways: understand how feature inputs affect model outputs across their entire dataset (global interpretability), and also be able to discover input feature influence for a specific model prediction on a data point of interest (local interpretability). With this data, Ranking ML scientists can make informed decisions on how to further improve their model performance and account for the challenging prediction results that the model would occasionally provide.

SageMaker Clarify enables you to generate model explainability reports using Shapley Additive exPlanations (SHAP) when training your models on SageMaker, supporting both global and local model interpretability. In addition to model explainability reports, SageMaker Clarify supports running analyses for pre-training bias metrics, post-training bias metrics, and partial dependence plots. The job will be run as a SageMaker Processing job within the AWS account and it integrates directly with the SageMaker pipelines.

The global interpretability report will be automatically generated in the job output and displayed in the Amazon SageMaker Studio environment as part of the training experiment run. If this model is then registered in SageMaker model registry, the report will be additionally linked to the model artifact. Using both of these options, the Ranking team was able to easily track back different model versions and their behavioral changes.

To explore input feature impact on a single prediction (local interpretability values), the Ranking team enabled the parameter save_local_shap_values in the SageMaker Clarify jobs and was able to load them from the S3 bucket for further analyses in the Jupyter notebooks in SageMaker Studio.
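
A sketch of a SageMaker Clarify explainability job with local SHAP values enabled is shown below; the model name, S3 paths, baseline, and column names are placeholders, not the Ranking team's actual configuration.

from sagemaker import clarify

clarify_processor = clarify.SageMakerClarifyProcessor(
    role="arn:aws:iam::111122223333:role/SageMakerRole",   # placeholder role
    instance_count=1,
    instance_type="ml.c5.xlarge",
)

model_config = clarify.ModelConfig(
    model_name="ranking-model",            # placeholder SageMaker model name
    instance_type="ml.m5.xlarge",
    instance_count=1,
    accept_type="text/csv",
)

shap_config = clarify.SHAPConfig(
    baseline="s3://example-bucket/clarify/baseline.csv",   # placeholder baseline
    num_samples=100,
    agg_method="mean_abs",
    save_local_shap_values=True,    # keep per-record (local) SHAP values in S3
)

data_config = clarify.DataConfig(
    s3_data_input_path="s3://example-bucket/validation/",  # placeholder dataset
    s3_output_path="s3://example-bucket/clarify-output/",
    label="label",
    headers=["label", "feature_1", "feature_2"],           # placeholder columns
    dataset_type="text/csv",
)

clarify_processor.run_explainability(
    data_config=data_config,
    model_config=model_config,
    explainability_config=shap_config,
)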

The preceding images show an example of how a model explainability would look like for an arbitrary ML model.

Training optimization

The rise of deep learning (DL) has led to ML becoming increasingly reliant on computational power and vast amounts of data. ML practitioners commonly face the hurdle of efficiently using resources when training these complex models. When you run training on large compute clusters, various challenges arise in optimizing resource utilization, including issues like I/O bottlenecks, kernel launch delays, memory constraints, and underutilized resources. If the configuration of the training job is not fine-tuned for efficiency, these obstacles can result in suboptimal hardware usage, prolonged training durations, or even incomplete training runs. These factors increase project costs and delay timelines.

Profiling of CPU and GPU usage helps understand these inefficiencies, determine the hardware resource consumption (time and memory) of the various TensorFlow operations in your model, resolve performance bottlenecks, and, ultimately, make the model run faster.

The Ranking team used the framework profiling feature of Amazon SageMaker Debugger (now deprecated in favor of Amazon SageMaker Profiler) to optimize these training jobs. This allows you to track all activities on CPUs and GPUs, such as CPU and GPU utilizations, kernel runs on GPUs, kernel launches on CPUs, sync operations, memory operations across GPUs, latencies between kernel launches and corresponding runs, and data transfer between CPUs and GPUs.
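
For reference, framework profiling is attached to a training job through the estimator's profiler configuration, roughly as in the sketch below. The interval and step window are illustrative, and as noted above this Debugger feature is now deprecated in favor of SageMaker Profiler.

from sagemaker.debugger import ProfilerConfig, FrameworkProfile
from sagemaker.tensorflow import TensorFlow

# Profile a short window of training steps to limit overhead
profiler_config = ProfilerConfig(
    system_monitor_interval_millis=500,
    framework_profile_params=FrameworkProfile(start_step=5, num_steps=10),
)

estimator = TensorFlow(
    entry_point="train.py",                                # placeholder script
    role="arn:aws:iam::111122223333:role/SageMakerRole",   # placeholder role
    instance_count=1,
    instance_type="ml.g5.4xlarge",
    framework_version="2.11",
    py_version="py39",
    profiler_config=profiler_config,
)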

The Ranking team also used the TensorFlow Profiler feature of TensorBoard, which further helped profile the TensorFlow model training. SageMaker is now further integrated with TensorBoard and brings the visualization tools of TensorBoard to SageMaker, integrated with SageMaker training and domains. TensorBoard allows you to perform model debugging tasks using the TensorBoard visualization plugins.

With the help of these two tools, the Ranking team optimized their TensorFlow model, identified bottlenecks, and reduced the average training step time from 350 milliseconds to 140 milliseconds on CPU and from 170 milliseconds to 70 milliseconds on GPU, reductions of 60% and 59%, respectively.

Business outcomes

The migration efforts centered around enhancing availability, scalability, and elasticity, which collectively brought the ML environment towards a new level of operational excellence, exemplified by the increased model training frequency and decreased failures, optimized training times, and advanced ML capabilities.

Model training frequency and failures

The number of monthly model training jobs increased fivefold, leading to significantly more frequent model optimizations. Furthermore, the new ML environment led to a reduction in the failure rate of pipeline runs, dropping from approximately 50% to 20%. The failed job processing time decreased drastically, from over an hour on average to a negligible 5 seconds. This has strongly increased operational efficiency and decreased resource wastage.

Optimized training time

The migration brought with it efficiency increases through SageMaker-based GPU training. This shift decreased model training time to a fifth of its previous duration. Previously, the training processes for deep learning models consumed around 60 hours on CPU; this was streamlined to approximately 12 hours on GPU. This improvement not only saves time but also expedites the development cycle, enabling faster iterations and model improvements.

Advanced ML capabilities

Central to the migration’s success is the use of the SageMaker feature set, encompassing hyperparameter tuning and model explainability. Furthermore, the migration allowed for seamless experiment tracking using Amazon SageMaker Experiments, enabling more insightful and productive experimentation.

Most importantly, the new ML experimentation environment supported the successful development of a new model that is now in production. This model is deep learning rather than tree-based and has introduced noticeable improvements in online model performance.

Conclusion

This post provided an overview of the AWS Professional Services and Booking.com collaboration that resulted in the implementation of a scalable ML framework and successfully reduced the time-to-market of ML models of their Ranking team.

The Ranking team at Booking.com learned that migrating to the cloud and SageMaker has proved beneficial, and that adopting machine learning operations (MLOps) practices allows their ML engineers and scientists to focus on their craft and increase development velocity. The team is sharing the learnings and work done with the entire ML community at Booking.com, through talks and dedicated sessions with ML practitioners where they share the code and capabilities. We hope this post can serve as another way to share the knowledge.

AWS Professional Services is ready to help your team develop scalable and production-ready ML in AWS. For more information, see AWS Professional Services or reach out through your account manager to get in touch.


About the Authors

Laurens van der Maas is a Machine Learning Engineer at AWS Professional Services. He works closely with customers building their machine learning solutions on AWS, specializes in distributed training, experimentation and responsible AI, and is passionate about how machine learning is changing the world as we know it.

Daniel Zagyva is a Data Scientist at AWS Professional Services. He specializes in developing scalable, production-grade machine learning solutions for AWS customers. His experience extends across different areas, including natural language processing, generative AI and machine learning operations.

Kostia Kofman is a Senior Machine Learning Manager at Booking.com, leading the Search Ranking ML team, overseeing Booking.com’s most extensive ML system. With expertise in Personalization and Ranking, he thrives on leveraging cutting-edge technology to enhance customer experiences.

Jenny Tokar is a Senior Machine Learning Engineer at Booking.com’s Search Ranking team. She specializes in developing end-to-end ML pipelines characterized by efficiency, reliability, scalability, and innovation. Jenny’s expertise empowers her team to create cutting-edge ranking models that serve millions of users every day.

Aleksandra Dokic is a Senior Data Scientist at AWS Professional Services. She enjoys supporting customers to build innovative AI/ML solutions on AWS and she is excited about business transformations through the power of data.

Luba Protsiva is an Engagement Manager at AWS Professional Services. She specializes in delivering Data and GenAI/ML solutions that enable AWS customers to maximize their business value and accelerate speed of innovation.

NVIDIA CEO: Every Country Needs Sovereign AI

Every country needs to own the production of their own intelligence, NVIDIA founder and CEO Jensen Huang told attendees Monday at the World Governments Summit in Dubai.

Huang, who spoke as part of a fireside chat with the UAE’s Minister of AI, His Excellency Omar Al Olama, described sovereign AI — which emphasizes a country’s ownership over its data and the intelligence it produces — as an enormous opportunity for the world’s leaders.

“It codifies your culture, your society’s intelligence, your common sense, your history – you own your own data,” Huang told Al Olama during their conversation, a highlight of an event attended by more than 4,000 delegates from 150 countries.

“We completely subscribe to that vision,” Al Olama said. “That’s why the UAE is moving aggressively on creating large language models and mobilizing compute.”

Huang’s appearance in the UAE comes as the Gulf State is moving rapidly to transform itself from an energy powerhouse into a global information technology hub.

Dubai is the latest stop for Huang in a global tour that has included meetings with leaders in Canada, France, India, Japan, Malaysia, Singapore and Vietnam over the past six months.

The Middle East is poised to reap significant benefits from AI, with PwC projecting a $320 billion boost to the region’s economy by 2030.

At Monday’s summit, Huang urged leaders not to be “mystified” by AI, arguing that its unprecedented ability to take direction from ordinary humans makes it critical for countries to embrace the technology and infuse it with local languages and expertise.

In response to Al Olama’s question about how he might approach AI if he were the leader of a developing nation, Huang emphasized the importance of building infrastructure.

“It’s not that costly, it is also not that hard,” Huang said. “The first thing that I would do, of course, is I would codify the language, the data of your culture into your own large language model.”

And as AI and accelerated computing have developed, NVIDIA GPUs have become a platform for one innovation after another.

“NVIDIA GPU is the only platform that’s available to everybody on any platform,” Huang said. “This ubiquity has not only democratized AI but facilitated a wave of innovation that spans from cloud computing to autonomous systems and beyond.”

All of this promises to unleash new kinds of innovations that go beyond what’s traditionally been thought of as information technology.

Huang even countered advice offered by many visionaries over the years who urged young people to study computer science in order to compete in the information age. No longer.

“In fact, it’s almost exactly the opposite,” Huang said. “It is our job to create computing technologies that nobody has to program and that the programming language is human: everybody in the world is now a programmer — that is the miracle.”

In a move that further underscores the regional momentum behind AI, Moro Hub, a subsidiary of Digital DEWA (the digital arm of the Dubai Electricity and Water Authority) focused on providing cloud services, cybersecurity and smart city solutions, announced Monday that it has agreed to build a green data center with NVIDIA.

In addition to the fireside chat, the summit featured panels on smart mobility, sustainable development and more, showcasing the latest AI advancements. Later in the evening, Huang and Al Olama took the stage at the ‘Get Inspired’ ecosystem event, organized by the UAE’s AI Office, which drew 280 attendees, including developers, startups and others.

Read More

NVIDIA RTX 2000 Ada Generation GPU Brings Performance, Versatility for Next Era of AI-Accelerated Design and Visualization

Generative AI is driving change across industries — and to take advantage of its benefits, businesses must select the right hardware to power their workflows.

The new NVIDIA RTX 2000 Ada Generation GPU delivers the latest AI, graphics and compute technology to compact workstations, offering up to 1.5x the performance of the previous-generation RTX A2000 12GB in professional workflows.

From crafting stunning 3D environments to streamlining complex design reviews to refining industrial designs, the card’s capabilities pave the way for an AI-accelerated future, empowering professionals to achieve more without compromising on performance or capabilities.

Modern multi-application workflows involving AI-powered tools, multi-display setups and high-resolution content put significant demands on GPU memory. With 16GB of memory in the RTX 2000 Ada, professionals can tap the latest technologies and tools to work faster and better with their data.

Powered by NVIDIA RTX technology, the new GPU delivers impressive realism in graphics with NVIDIA DLSS, producing ultra-high-quality, photorealistic ray-traced images more than 3x faster than before. In addition, the RTX 2000 Ada enables an immersive experience for enterprise virtual-reality workflows, such as product design and engineering design reviews.

With its blend of performance, versatility and AI capabilities, the RTX 2000 Ada helps professionals across industries achieve efficiencies.

Architects and urban planners can use it to accelerate visualization workflows and structural analysis, enhancing design precision. Product designers and engineers using industrial PCs can iterate rapidly on product designs with fast, photorealistic rendering and AI-powered generative design. Content creators can edit high-resolution videos and images seamlessly, and use AI for realistic visual effects and content creation assistance.

And in vital embedded applications and edge computing, the RTX 2000 Ada can power real-time data processing for medical devices, optimize manufacturing processes with predictive maintenance and enable AI-driven intelligence in retail environments.

Expanding the Reach of NVIDIA RTX

Among the first to tap the power and performance of the RTX 2000 Ada are Dassault Systèmes for its SOLIDWORKS applications, Rob Wolkers Design and Engineering, and WSP.

“The new RTX 2000 Ada Generation GPU boasts impressive features compared to previous generations, with a compact design that offers exceptional performance and versatility,” said Mark Kauffman, assistant vice president and technical lead at WSP. “Its 16GB of RAM is a game-changer, enabling smooth loading of asset-heavy content, and its ability to run applications like Autodesk 3ds Max, Adobe After Effects and Unreal Engine, as well as support path tracing, expands my creative possibilities.”

“The new NVIDIA RTX 2000 Ada — with its higher-efficiency, next-generation architecture, low power consumption and large frame buffer — will benefit SOLIDWORKS users,” said Olivier Zegdoun, graphics applications research and development director for SOLIDWORKS at Dassault Systèmes. “It delivers excellent performance for designers and engineers to accelerate the development of innovative product experiences with full-model fidelity, even with larger datasets.”

“Today’s design and visualization workflows demand more advanced compute and horsepower,” said Rob Wolkers, owner and senior industrial design engineer at Rob Wolkers Design and Engineering. “Equipped with next-generation architecture and a large frame buffer, the RTX 2000 Ada Generation GPU improves productivity in my everyday industrial design and engineering workflows, allowing me to work with large datasets in full fidelity and generate renders with more lighting and reflection scenarios 3x faster.”

Elevating Workflows With Next-Generation RTX Technology 

The NVIDIA RTX 2000 Ada features the latest technologies in the NVIDIA Ada Lovelace GPU architecture, including:

  • Third-generation RT Cores: Up to 1.7x faster ray-tracing performance for high-fidelity, photorealistic rendering.
  • Fourth-generation Tensor Cores: Up to 1.8x AI throughput over the previous generation, with structured sparsity and FP8 precision to enable higher inference performance for AI-accelerated tools and applications.
  • CUDA cores: Up to 1.5x the FP32 throughput of the previous generation for significant performance improvements in graphics and compute workloads.
  • Power efficiency: Up to a 2x performance boost across professional graphics, rendering, AI and compute workloads, all within the same 70W of power as the previous generation.
  • Immersive workflows: Up to 3x performance for virtual-reality workflows over the previous generation.
  • 16GB of GPU memory: An expanded canvas enables users to tackle larger projects, along with support for error correction code memory to deliver greater computing accuracy and reliability for mission-critical applications.
  • DLSS 3: Delivers a breakthrough in AI-powered graphics, significantly boosting performance by generating additional high-quality frames.
  • AV1 encoder: Eighth-generation NVIDIA Encoder, aka NVENC, with AV1 support is 40% more efficient than H.264, enabling new possibilities for broadcasters, streamers and video callers.

NVIDIA RTX Enterprise Driver Delivers New Features, Adds Support for RTX 2000 Ada

The latest RTX Enterprise Driver, available now to download, includes a range of features that enhance graphics workflows, along with support for the RTX 2000 Ada.

The AI-based tone-mapping feature called Video TrueHDR converts standard dynamic range (SDR) content to high dynamic range (HDR), expanding the color range and brightness levels when viewing content in Chrome or Edge browsers. With Video Super Resolution and TrueHDR support added to the NVIDIA NGX software development kit, the video quality of low-resolution sources can be enhanced and SDR content can easily be converted to HDR.

Additional features in this release include:

  • TensorRT-LLM, an open-source library that optimizes and accelerates inference performance for the latest large language models on NVIDIA GPUs (a brief usage sketch follows this list).
  • Video quality improvements and enhanced coding efficiency for video codecs through bit-depth expansion techniques and a new low-delay B frame.
  • The ability to offload work from the CPU to the GPU with the NVIDIA execute-indirect extension API for quicker task completion.
  • The ability to display the GPU serial number in the NV Control Panel on desktops for easier registration with the NVIDIA AI Enterprise and NVIDIA Omniverse Enterprise platforms.
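
TensorRT-LLM is only name-checked in the driver notes above. As a rough illustration of what running a model through it looks like, here is a minimal sketch assuming the high-level LLM API available in recent TensorRT-LLM releases; the model name, prompt, and sampling settings are placeholder choices, not taken from this article.

from tensorrt_llm import LLM, SamplingParams

# Placeholder model: a small chat model that fits comfortably in 16GB of GPU memory.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Placeholder sampling settings for a short completion.
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["Summarize the benefits of AV1 encoding in one sentence."], sampling)
for output in outputs:
    print(output.outputs[0].text)

The small placeholder model reflects the 16GB frame buffer mentioned above; larger models would generally require quantization or more GPU memory.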

Availability

The NVIDIA RTX 2000 Ada is available now through global distribution partners such as Arrow Electronics, Ingram Micro, Leadtek, PNY, Ryoyo Electro and TD SYNNEX, and will be available from Dell Technologies, HP and Lenovo starting in April.

See the NVIDIA RTX 2000 Ada at Dassault Systèmes’ 3DEXPERIENCE World

Stop by the Dell, Lenovo and Z by HP booths at Dassault Systèmes’ 3DEXPERIENCE World, running Feb. 11-14 at the Kay Bailey Hutchison Convention Center in Dallas, to view live demos of Dassault Systèmes SOLIDWORKS applications powered by the NVIDIA RTX 2000 Ada.

Attend the Z by HP session on Tuesday, Feb. 13, where Wolkers will discuss the workflow used to design NEMO, the supercar of submarines.

Learn more about the NVIDIA RTX 2000 Ada Generation GPU.

Read More

Efficient ConvBN Blocks for Transfer Learning and Beyond

Convolution-BatchNorm (ConvBN) blocks are integral components in various computer vision tasks and other domains. A ConvBN block can operate in three modes: Train, Eval, and Deploy. While the Train mode is indispensable for training models from scratch, the Eval mode is suitable for transfer learning and beyond, and the Deploy mode is designed for the deployment of models. This paper focuses on the trade-off between stability and efficiency in ConvBN blocks: Deploy mode is efficient but suffers from training instability; Eval mode is widely used in transfer learning but lacks efficiency. To…

Apple Machine Learning Research
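
The trade-off described in this abstract is easiest to see in code. The following is a minimal, illustrative PyTorch sketch, not the paper's implementation; the fuse_conv_bn helper is a hypothetical name that shows only the standard BatchNorm-folding arithmetic behind a Deploy-style block.

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)
bn = nn.BatchNorm2d(8)

# Train mode: BN normalizes each batch with its own statistics and updates the running estimates.
conv.train()
bn.train()

# Eval mode: BN reuses the frozen running statistics, but Conv and BN still run as two
# separate operations (stable for transfer learning, less efficient).
conv.eval()
bn.eval()

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Deploy-style block: fold the frozen BN affine transform into the conv weights,
    so inference runs a single convolution."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding, bias=True)
    with torch.no_grad():
        scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)  # gamma / sqrt(var + eps)
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        conv_bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
        fused.bias.copy_((conv_bias - bn.running_mean) * scale + bn.bias)
    return fused

deploy_conv = fuse_conv_bn(conv, bn)  # a single efficient op for inference

The folded convolution computes the same function as Conv followed by frozen BN, which is why a Deploy-style block is efficient at inference time, while training through the folded weights corresponds to the instability the abstract describes.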