November 2023 – Page 14

Principal Financial Group uses AWS Post Call Analytics solution to extract omnichannel customer insights

An established financial services firm with over 140 years in business, Principal is a global investment management leader and serves more than 62 million customers around the world. Principal is conducting enterprise-scale near-real-time analytics to deliver a seamless and hyper-personalized omnichannel customer experience on their mission to make financial security accessible for all. They are processing data across channels, including recorded contact center interactions, emails, chat and other digital channels.

In this post, we demonstrate how data aggregated within the AWS CCI Post Call Analytics solution allowed Principal to gain visibility into their contact center interactions, better understand the customer journey, and improve the overall experience between contact channels while also maintaining data integrity and security.

Solution requirements

Principal provides investment services through Genesys Cloud CX, a cloud-based contact center that provides powerful, native integrations with AWS. Each year, Principal handles millions of calls and digital interactions. As a first step, they wanted to transcribe voice calls and analyze those interactions to determine primary call drivers, including issues, topics, sentiment, average handle time (AHT) breakdowns, and develop additional natural language processing (NLP)-based analytics.

In order analyze the calls properly, Principal had a few requirements:

Contact details: Understanding the customer journey requires understanding whether a speaker is an automated interactive voice response (IVR) system or a human agent and when a call transfer occurs between the two.
Content redaction: Each customer audio interaction is recorded as a stereo WAV file, but could potentially include sensitive information such as HIPAA-protected and personally identifiable information (PII).
Scalability: This architecture needed to immediately scale to thousands of calls per day and millions of calls per year. In addition, Principal needed an extensible analytics architecture that analyze other channels such as email threads and traditional voice of the customer (VoC) survey results.
Integrity is non-negotiable at Principal—it guides everything they do. In fact, doing what’s right is one of the core values at Principal. Therefore, when the Principal team started tackling this project, they knew that ensuring the highest standard of data security such as regulatory compliance, data privacy, and data quality would be a non-negotiable, key requirement. The team needed to utilize technology with a matching stance on data security, and the ability to build custom compliance and security controls to uphold strict requirements. Attention to this key requirement allows Principal to maintain a safe and secure customer experience.

Solution overview

After extensive research, the Principal team finalized AWS Contact Center Intelligence (CCI) solutions, which empower companies to improve customer experience and gain conversation insights by adding AI capabilities to third-party on-premises and cloud contact centers. The CCI Post-Call Analytics (PCA) solution is part of CCI solutions suite and fit many of the identified requirements. PCA has a Solutions Library Guidance reference architecture with an open-source example repository on GitHub. Working with their AWS account team, Principal detailed the PCA solution and its deployment, and set up custom training programs and immersion days to rapidly upskill the Principal teams. The example architecture (see the following diagram) and code base in the open-source repository allowed the Principal engineering teams to jumpstart their solution around unifying the customer journey, and merging telephony records and transcript records together.

PCA provides an entire architecture around ingesting audio files in a fully automated workflow with AWS Step Functions, which is initiated when an audio file is delivered to a configured Amazon Simple Storage Service (Amazon S3) bucket. After a few minutes, a transcript is produced with Amazon Transcribe Call Analytics and saved to another S3 bucket for processing by other business intelligence (BI) tools. PCA also offers a web-based user interface that allows customers to browse call transcripts. PCA’s security features ensure that any PII data was redacted from the transcript, as well as from the audio file itself. Additionally, all data within the S3 bucket can be encrypted with keys belonging to Principal.

Principal worked with AWS technical teams to modify the Step Functions workflow within PCA to further achieve their goals. Call details such as interaction timestamps, call queues, agent transfers, and participant speaking times are tracked by Genesys in a file called a Contact Trace Record (CTR). Combining accurate transcripts with Genesys CTR files, Principal could properly identify the speakers, categorize the calls into groups, analyze agent performance, identify upsell opportunities, and conduct additional machine learning (ML)-powered analytics.

The teams built a new data ingestion mechanism, allowing the CTR files to be jointly delivered with the audio file to an S3 bucket. Principal and AWS collaborated on a new AWS Lambda function that was added to the Step Functions workflow. This Lambda function identifies CTR records and provides an additional processing step that outputs an enhanced transcript containing additional metadata such as queue and agent ID information, IVR identification and tagging, and how many agents (and IVRs) the customer was transferred to, all aggregated from the CTR records. This extra information enables Principal to create a map of the customer interaction throughout the lifecycle of the conversation and focus on the critical speech segments, while excluding less relevant ones.

Additionally, this postprocessing step enabled Principal to further enrich transcripts with internal information such as agent and queue names and expand the analytics capabilities of PCA, including custom NLP-based ML models for topic and customer intent identification, deployed using Amazon SageMaker endpoints, and additional transcript augmentation using foundational generative AI models hosted on Amazon Bedrock.

PCA is open source on GitHub, which allows customers such as Principal to extend and maintain their own forks with customized, private business code. It also allows the community to submit code back to the main repository for others to use. Principal and AWS technical teams partnered to merge the Genesys CTR and postprocessing placeholder features into the main release of PCA. This partnership between Principal and AWS enabled speed-to-market for Principal, while ensuring that existing and incoming business requirements could be rapidly added. The contributions to the open-source project has accelerated other customers’ Genesys CTR workloads.

Answer business questions

Once PCA was in place, Principal analysts, data scientists, engineers, and business owners worked with AWS SMEs to build numerous Amazon QuickSight dashboards to display the data insights and begin answering business questions. QuickSight is a cloud-scale BI service that you can use to deliver easy-to-understand insights from multiple datasets, from AWS data, third-party data, software as a service (SaaS) data, and more. The use of this BI tool, with its native integrations to the existing data repositories made accessible by Amazon Athena, made the creation of visualizations to display the large-scale data relatively straightforward, and enabled self-service BI. Visualizations were quickly drafted to answer some key questions, including “What are our customers calling us about,” “What topics relate to the longest AHT/most transfers,” and “What topics and issues relate to the lowest customer sentiment scores?” By ingesting additional data related to Principal custom topic models, the team was able to expand their use of QuickSight to include topic and correlation comparisons, model validation capabilities, and comparisons of sentiment based on speaker, segment, call, and conversation. In addition, the use of QuickSight insights quickly allowed the Principal teams to implement anomaly detection and volume prediction, while Amazon QuickSight Q, an ML feature within QuickSight that uses NLP, enabled rapid natural language quantitative data analytics.

When the initial initiative for PCA was complete, Principal knew they needed to immediately dive deeper into the omnichannel customer experience. Together, Principal and AWS have built data ingestion pipelines for customer email interactions and additional metadata from their customer data platform, and built data aggregation and analytics mechanisms to combine omnichannel data into a single customer insight lens. Utilization of Athena views and QuickSight dashboards has continued to enable classic analytics, and the implementation of proof of concept graph databases via Amazon Neptune will help Principal extract insights into interaction topics and intent relationships within the omnichannel view when implemented at scale.

The Results

PCA helped accelerate time to market. Principal was able to deploy the existing open-source PCA app by themselves in 1 day. Then, Principal worked together with AWS and expanded the PCA offering with numerous features like the Genesys CTR integration over a period of 3 months. The development and deployment process was a joint, iterative process that allowed Principal to test and process production call volumes on newly built features. Since the initial engagement, AWS and Principal continue to work together, sharing business requirements, roadmaps, code, and bug fixes to expand PCA.

Since its initial development and deployment, Principal has processed over 1 million customer calls through the PCA framework. This resulted in over 63 million individual speech segments spoken by a customer, agent, or IVR. With this wealth of data, Principal has been able to conduct large-scale historical and near-real-time analytics to gain insights into the customer experience.

AWS CCI solutions are a game-changer for Principal. Principal’s existing suite of CCI tools, which includes Qualtrics for simple dashboarding and opportunity identification, was expanded with the addition of PCA. The addition of PCA to the suite of CCI tools enabled Principal to rapidly conduct deep analytics on their contact center interactions. With this data, Principal now can conduct advanced analytics to understand customer interactions and call drivers, including topics, intents, issues, action items, and outcomes. Even in a small-scale, controlled production environment, the PCA data lake has spawned numerous new use cases.

Roadmap

The data generated from PCA could be used to make critical business decisions regarding call routing based on insights around which topics are driving longer average handle time, longer holds, more transfers, and negative customer sentiment. Knowledge on when customer interactions with the IVR and automated voice assistants are misunderstood or misrouted will help Principal improve the self-service experience. Understanding why a customer called instead of using the website is critical to improving the customer journey and boosting customer happiness. Product managers responsible for enhancing web experiences have shared how excited they are to be able to use data from PCA to drive their prioritization of new enhancements and measure the impact of changes. Principal is also analyzing other potential use cases such as customer profile mapping, fraud detection, workforce management, the use of additional AI/ML and large language models (LLMs), and identifying new and emerging trends within their contact centers.

In the future, Principal plans to continue expanding postprocessing capabilities with additional data aggregation, analytics, and natural language generation (NLG) models for text summarization. Principal is currently integrating generative AI and foundational models (such as Amazon Titan) to their proprietary solutions. Principal plans to use AWS generative AI to enhance employee productivity, grow assets under management, deliver high-quality customer experiences, and deliver tools that allow customers to make investment and retirement decisions efficiently. Given the flexibility and extensibility of the open-source PCA framework, the teams at Principal have an extensive list of additional enhancements, analytics, and insights that could extend the existing framework.

“With AWS Post Call analytics solution, Principal can currently conduct large-scale historical analytics to understand where customer experiences can be improved, generate actionable insights, and prioritize where to act. Now, we are adding generative AI using Amazon Bedrock to help our business users make data-driven decisions with higher speed and accuracy, while reducing costs. We look forward to exploring the post call summarization feature in Amazon Transcribe Call Analytics in order to enable our agents to focus their time and resources engaging with customers, rather than manual after contact work.”

– says Miguel Sanchez Urresty, Director of Data & Analytics at Principal Financial Group.

Conclusion

The AWS CCI PCA solution is designed to improve customer experience, derive customer insights, and reduce operational costs by adding AI and ML to the contact center provider of your choice. To learn more about other CCI solutions, such as Live Call Analytics, refer to AWS Contact Center Intelligence (CCI) Solutions.

About Principal Financial Group

Principal Financial Group and affiliates, Des Moines IA is a financial company with 19,000 employees. In business for more than 140 years, we’re helping more than 62 million customers in various countries around the world as of December 31, 2022.

AWS and Amazon are not affiliates of any company of the Principal Financial Group Insurance products issued by Principal National Life Insurance Co (except in NY) and Principal Life Insurance Company. Plan administrative services offered by Principal Life. Principal Funds, Inc. is distributed by Principal Funds Distributor, Inc. Securities offered through Principal Securities, Inc., member SIPC and/or independent broker/dealers. Referenced companies are members of the Principal Financial Group, Des Moines, IA 50392. ©2023 Principal Financial Services, Inc.

This communication is intended to be educational in nature and is not intended to be taken as a recommendation. Insurance products and plan administrative services provided through Principal Life Insurance Company, a member of the Principal Financial Group, Des Moines, IA 50392

About the authors

Christopher Lott is a Senior Solutions Architect in the AWS AI Language Services team. He has 20 years of enterprise software development experience. Chris lives in Sacramento, California, and enjoys gardening, cooking, aerospace/general aviation, and traveling the world.

Dr. Nicki Susman is a Senior Data Scientist and the Technical Lead of the Principal Language AI Services team. She has extensive experience in data and analytics, application development, infrastructure engineering, and DevSecOps.

Foundational vision models and visual prompt engineering for autonomous driving applications

Prompt engineering has become an essential skill for anyone working with large language models (LLMs) to generate high-quality and relevant texts. Although text prompt engineering has been widely discussed, visual prompt engineering is an emerging field that requires attention. Visual prompts can include bounding boxes or masks that guide vision models in generating relevant and accurate outputs. In this post, we explore the basics of visual prompt engineering, its benefits, and how it can be used to solve a specific use case: image segmentation for autonomous driving.

In recent years, the field of computer vision has witnessed significant advancements in the area of image segmentation. One such breakthrough is the Segment Anything Model (SAM) by Meta AI, which has the potential to revolutionize object-level segmentation with zero-shot or few-shot training. In this post, we use the SAM model as an example foundation vision model and explore its application to the BDD100K dataset, a diverse autonomous driving dataset for heterogeneous multitask learning. By combining the strengths of SAM with the rich data provided by BDD100K, we showcase the potential of visual prompt engineering with different versions of SAM. Inspired by the LangChain framework for language models, we propose a visual chain to perform visual prompting by combining object detection models with SAM.

Although this post focuses on autonomous driving, the concepts discussed are applicable broadly to domains that have rich vision-based applications such as healthcare and life sciences, and media and entertainment. Let’s begin by learning a little more about what’s under the hood of a foundational vision model like SAM. We used Amazon SageMaker Studio on an ml.g5.16xlarge instance for this post.

Segment Anything Model (SAM)

Foundation models are large machine learning (ML) models trained on vast quantity of data and can be prompted or fine-tuned for task-specific use cases. Here, we explore the Segment Anything Model (SAM), which is a foundational model for vision, specifically image segmentation. It is pre-trained on a massive dataset of 11 million images and 1.1 billion masks, making it the largest segmentation dataset as of writing. This extensive dataset covers a wide range of objects and categories, providing SAM with a diverse and large-scale training data source.

The SAM model is trained to understand objects and can output segmentation masks for any object in images or video frames. The model allows for visual prompt engineering, enabling you to provide inputs such as text, points, bounding boxes, or masks to generate labels without altering the original image. SAM is available in three sizes: base (ViT-B, 91 million parameters), large (ViT-L, 308 million parameters), and huge (ViT-H, 636 million parameters), catering to different computational requirements and use cases.

The primary motivation behind SAM is to improve object-level segmentation with minimal training samples and epochs for any objects of interest. The power of SAM lies in its ability to adapt to new image distributions and tasks without prior knowledge, a feature known as zero-shot transfer. This adaptability is achieved through its training on the expansive SA-1B dataset, which has demonstrated impressive zero-shot performance, surpassing many prior fully supervised results.

As shown in the following architecture for SAM, the process of generating segmentation masks involves three steps:

An image encoder produces a one-time embedding for the image.
A prompt encoder converts any prompt into an embedding vector for the prompt.
The lightweight decoder combines the information from the image encoder and the prompt encoder to predict segmentation masks.

As an example, we can provide an input with an image and bounding box around an object of interest in that image (e.g. Silver car or driving lane) and SAM model would produce segmentation masks for that object.

Visual prompt engineering

Prompt engineering refers to structuring inputs to a model that makes the model understand the intent and produces desired outcome. With textual prompt engineering, you can structure the input text through modifications such as choice of words, formatting, ordering, and more to get the desired output. Visual prompt engineering assumes that the user is working in a visual modality (image or video), and provides inputs. The following is a non-exhaustive list of potential ways to provide input to the generative AI model in the visual domain:

Point – A singular (x, y) coordinate point in the image plane
Points – Multiple (x, y) coordinate points, not necessarily related to each other
Bounding box – A set of four values (x, y, w, h) that define a rectangular region in the image plane
Contour – A set of (x, y) coordinate points in the image plane that form a closed shape
Mask – An array the same size as the image with a partial mask of the object of interest

With the visual prompt engineering techniques in mind, let’s explore how this can be applied to the SAM pre-trained model. We have use the base version of the pre-trained model.

Zero-shot prompting with the pre-trained SAM model

To start with, let’s explore the zero-shot approach. The following is a sample image from the training dataset taken from a vehicle’s front camera.

We can get segmentation masks for all objects from the image without any explicit visual prompting by automatically generating masks with just an input image. In the following image, we see parts of the car, road, traffic sign, license plates, flyover, pillars, signs, and more are segmented.

However, this output is not immediately useful for the following reasons:

The cars are not segmented as a whole, but in parts. For most perception models, for example, we don’t really care about each of the tires having separate output masks. This is true when looking for other known objects of interest as well, such as road, vegetation, signs, and so on.
Parts of the image that are useful for downstream tasks like drivable area are split up, with no explanation. On the other hand, similar instances are identified separately, and we may be interested in grouping similar objects (panoptic vs. instance segmentation).

Visual prompt engineering with the pre-trained SAM model

Fortunately, SAM supports providing input prompts, and we can use points, point arrays, and bounding boxes as inputs. With these specific instructions, we expect SAM to do better with segmentations focusing on specific points or areas. This can be compared with the language prompt template
"What is a good name for a company that makes {product}?"
where the input along with this prompt template from the user is the {product}. {product} is an input slot. In visual prompting, the bounding boxes, points, or masks are the input slots.

The following image provides the original ground truth bounding box around vehicles, and the drivable area patch from BDD100K ground truth data. The image also shows an input point (a yellow X) at the center of the green bounding box that we will refer to in the next few sections.

Let’s try to generate a mask for the car on the left with the green bounding box as an input to SAM. As shown in the following example, the base model of SAM doesn’t really find anything. This is also seen in the low segmentation score. When we look at the segmentation masks more closely, we see that there are small regions returned as masks (pointed at using red arrows) that aren’t really usable for any downstream application.

Let’s try a combination of a bounding box and a point as the input visual prompt. The yellow cross in the preceding image is the center of the bounding box. Providing this point’s (x,y) coordinates as the prompt along with the bounding box constraint gives us the following mask and a slightly higher score. This is still not usable by any means.

Finally, with the base pre-trained model, we can provide just the input point as a prompt (without the bounding box). The following images show two of the top three masks we thought were interesting.

Mask 1 segments the full car, whereas Mask 3 segments out an area that holds the car’s number plate close to the yellow cross (input prompt). Mask 1 is still not a tight, clean mask around the car; this points to the quality of the model, which we can assume increases with model size.

We can try larger pre-trained models with the same input prompt. The following images show our results. When using the huge SAM pre-trained model, Mask 3 is the entire car, whereas Mask 1 and 2 can be used to extract the number plate.

The large version of the SAM model also provides similar outputs.

The process we went through here is similar to manual prompt engineering for text prompts that you may already be familiar with. Note that a recent improvement in the SAM model to segment anything in high quality provides much better object- and context-specific outputs. In our case, we find that zero-shot prompting with text and visual prompts (point, box, and point and box inputs) don’t improve results drastically as we saw above.

Prompt templates and visual chains

As we can see from the preceding zero-shot examples, SAM struggles to identify all the objects in the scene. This is a good example of where we can take advantage of prompt templates and visual chains. Visual chain is inspired by the chain concept in the popular LangChain framework for language applications. It helps chain the data sources and an LLM to produce the output. For example, we can use an API chain to call an API and invoke an LLM to answer the question based on the API response.

Inspired by LangChain, we propose a sequential visual chain that looks like the following figure. We use a tool (like a pre-trained object detection model) to get initial bounding boxes, calculate the point at the center of the bounding box, and use this to prompt the SAM model with the input image.

For example, the following image shows the segmentation masks as a result of running this chain.

Another example chain can involve a text input of the object the user is interested in identifying. To implement this, we built a pipeline using Grounding DINO, an object detection model to prompt SAM for segmentation.

Grounding DINO is a zero-shot object detection model that can perform object detection with text providing category names (such as “traffic lights” or “truck”) and expressions (such as “yellow truck”). It accepts pairs of text and image to perform the object detection. It’s based on a transformer architecture and enables cross modalities with text and image data. To learn more about Grounding DINO, refer to Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection. This generates bounding boxes and labels and can be processed further to generate center points, filter based on labels, thresholds, and more. This is used (boxes or points) as a prompt to SAM for segmentation, which outputs masks.

The following are some examples showing the input text, DINO output (bounding boxes), and the final SAM output (segmentation masks).

The following images show the output for “yellow truck.”

The following images show the output for “silver car.”

The following image shows the output for “driving lane.”

We can use this pipeline to build a visual chain. The following code snippet explains this concept:

pipeline = [object_predictor, segment_predictor]
image_chain = ImageChain.from_visual_pipeline(pipeline, image_store, verbose=True)
image_chain.run('All silver cars', image_id='5X3349')

Although this is a simple example, this concept can be extended to process feeds from cameras on vehicles to perform object tracking, personally identifiable information (PII) data redaction, and more. We can also get the bounding boxes from smaller models, or in some cases, using standard computer vision tools. It’s fairly straightforward to use a pre-trained model or a service like Amazon Rekognition to get initial (visual) labels for your prompt. At the time of writing this, there are over 70 models available on Amazon SageMaker Jumpstart for object detection, and Amazon Rekognition already identifies several useful categories of objects in images, including cars, pedestrians, and other vehicles.

Next, we look at some quantitative results related to performance of SAM models with a subset of BDD100K data.

Quantitative results

Our objective is to compare the performance of three pre-trained models when given the same visual prompting. In this case, we use the center point of the object location as the visual input. We compare the performance with respect to the object sizes (in proportion to image size)— small (area <0.11%), medium (0.11% < area < 1%), and large (area > 1%). The bounding box area thresholds are defined by the Common Objects in Context (COCO) evaluation metrics [Lin et al., 2014].

The evaluation is at the pixel level and we use the following evaluation metrics:

Precision = (number relevant and retrieved instances) / (total number of retrieved instances)
Recall = (number of relevant and retrieve instances) / (total number of relevant instances)
Instances here are each pixel within the bounding box of the object of interest

The following table reports the performance of three different versions of the SAM model (base, large, and huge). These versions have three different encoders: ViT-B (base), ViT-L (large), ViT-H (huge). The encoders have different parameter counts, where the base model has less parameters than large, and large is less than huge. Although increasing the number of parameters shows improved performance with larger objects, this is not so for smaller objects.

Fine-tuning SAM for your use case

In many cases, directly using a pre-trained SAM model may not be very useful. For example, let’s look at a typical scene in traffic—the following picture is the output from the SAM model with randomly sampled prompt points as input on the left, and the actual labels from the semantic segmentation task from BDD100K on the right. These are obviously very different.

Perception stacks in AVs can easily use the second image, but not the first. On the other hand, there some useful outputs from the first image that can be used, and that the model was not explicitly trained on, for example, lane markings, sidewalk segmentation, license plate masks, and so on. We can fine-tune the SAM model to improve the segmentation results. To perform this fine-tuning, we created a training dataset using an instance segmentation subset (500 images) from the BDD10K dataset. This is a very small subset of images, but our purpose is to prove that foundational vision models (much like LLMs) can perform well for your use case with a surprisingly small number of images. The following image shows the input image, output mask (in blue, with a red border for the car on the left), and possible prompts (bounding box in green and center point X in yellow).

We performed fine-tuning using the Hugging Face library on Amazon SageMaker Studio. We used the ml.g4dn.xlarge instance for the SAM base model tests, and the ml.g4dn.2xlarge for the SAM huge model tests. In our initial experiments, we observed that fine-tuning the base model with just bounding boxes was not successful. The fine-tuned and pre-trained models weren’t able to learn car-specific ground truth masks from the original datasets. Adding query points to the fine-tuning also didn’t improve the training.

Next, we can try fine-tuning the SAM huge model for 30 epochs, with a very small dataset (500 images). The original ground truth mask looks like the following image for the label type car.

As shown in the following images, the original pre-trained version of the huge model with a specific bounding box prompt (in green) gives no output, whereas the fine-tuned version gives an output (still not accurate but fine-tuning was cut off after 40 epochs, and with a very small training dataset of 500 images). The original, pre-trained huge model wasn’t able to predict masks for any of the images we tested. As an example downstream application, the fine-tuned model can be used in pre-labeling workflows such as the one described in Auto-labeling module for deep learning-based Advanced Driver Assistance Systems on AWS.

Conclusion

In this post, we discussed the foundational vision model known as the Segment Anything Model (SAM) and its architecture. We used the SAM model to discuss visual prompting and the various inputs to visual prompting engineering. We explored how different visual prompts perform and their limitations. We also described how visual chains increase performance over using just one prompt, similar to the LangChain API. Next, we provided a quantitative evaluation of three pre-trained models. Lastly, we discussed the fine-tuned SAM model and its results compared to the original base model. Fine-tuning of foundation models helps improve model performance for specific tasks like segmentation. It should be noted that SAM model due to its resource requirements, limits usage for real-time use-cases and inferencing at the edge in its current state. We hope with future iterations and improved techniques, would reduce compute requirements and improve latency.

It is our hope that this post encourages you to explore visual prompting for your use cases. Because this is still an emerging form of prompt engineering, there is much to discover in terms of visual prompts, visual chains, and performance of these tools. Amazon SageMaker is a fully managed ML platform that enables builders to explore large language and visual models and build generative AI applications. Start building the future with AWS today.

About the authors

Gopi Krishnamurthy is a Senior AI/ML Solutions Architect at Amazon Web Services based in New York City. He works with large Automotive customers as their trusted advisor to transform their Machine Learning workloads and migrate to the cloud. His core interests include deep learning and serverless technologies. Outside of work, he likes to spend time with his family and explore a wide range of music.

Shreyas Subramanian is a Principal AI/ML specialist Solutions Architect, and helps customers by using Machine Learning to solve their business challenges using the AWS platform. Shreyas has a background in large scale optimization and Machine Learning, and in use of Machine Learning and Reinforcement Learning for accelerating optimization tasks.

Sujitha Martin is an Applied Scientist in the Generative AI Innovation Center (GAIIC). Her expertise is in building machine learning solutions involving computer vision and natural language processing for various industry verticals. In particular, she has extensive experience working on human-centered situational awareness and knowledge infused learning for highly autonomous systems.

Francisco Calderon is a Data Scientist in the Generative AI Innovation Center (GAIIC). As a member of the GAIIC, he helps discover the art of the possible with AWS customers using Generative AI technologies. In his spare time, Francisco likes to play music and guitar, playing soccer with his daughters, and enjoying time with his family.

Ringing in the Future: NVIDIA and Amdocs Bring Custom Generative AI to Global Telco Industry

The telecommunications industry — the backbone of today’s interconnected world — is valued at a staggering $1.7 trillion globally, according to IDC.

It’s a massive operation, as telcos process hundreds of petabytes of data in their networks each day. That magnitude is only increasing, as the total amount of data transacted globally is forecast to grow to more than 180 zettabytes by 2025.

To meet this demand for data processing and analysis, telcos are turning to generative AI, which is improving efficiency and productivity across industries.

NVIDIA announced an AI foundry service — a collection of NVIDIA AI Foundation Models, NVIDIA NeMo framework and tools, and NVIDIA DGX Cloud AI supercomputing and services — that gives enterprises an end-to-end solution for creating and optimizing custom generative AI models.

Using the AI foundry service, Amdocs, a leading provider of software and services for communications and media providers, will optimize enterprise-grade large language models for the telco and media industries to efficiently deploy generative AI use cases across their businesses, from customer experiences to network operations and provisioning. The LLMs will run on NVIDIA accelerated computing as part of the Amdocs amAIz framework.

The collaboration builds on the previously announced Amdocs-Microsoft partnership, enabling service providers to adopt these applications in secure, trusted environments, including on premises and in the cloud.

Custom Models for Custom Results

While preliminary applications of generative AI used broad datasets, enterprises have become increasingly focused on developing custom models to perform specialized, industry-specific skills.

By training models on proprietary data, telcos can deliver tailored solutions that produce more accurate results for their use cases.

To simplify the development, tuning and deployment of such custom models, Amdocs is integrating the new NVIDIA AI foundry service.

Equipped with these new generative AI capabilities — including guardrail features — service providers can enhance performance, optimize resource utilization and flexibly scale to meet future needs.

Amdocs’ Global Telco Ecosystem Footprint

More than 350 of the world’s leading telecom and media companies across 90 countries take advantage of Amdocs services each day, including 27 of the world’s top 30 service providers, according to OMDIA.⁽¹⁾ Powering more than 1.7 billion daily digital journeys, Amdocs platforms impact more than 3 billion people around the world.

NVIDIA and Amdocs are exploring several generative AI use cases to simplify and improve operations by providing secure, cost-effective, and high-performance generative AI capabilities.

Initial use cases span customer care, including accelerating resolution of customer inquiries by drawing information from across company data.

And in network operations, the companies are exploring ways to generate solutions to address configuration, coverage or performance issues as they arise.

⁽¹⁾ Source: OMDIA 2022 revenue estimates, excludes China.

Stay up to date on the latest NVIDIA generative AI news and technologies and Microsoft Azure AI News.

In the Fast Lane: NVIDIA Announces Omniverse Cloud Services on Microsoft Azure to Accelerate Automotive Digitalization

Automotive companies are transforming every phase of their product lifecycle — evolving their primarily physical, manual processes into software-driven, AI-enhanced digital systems.

To help them save costs and reduce lead times, NVIDIA is announcing two new simulation engines on Omniverse Cloud: the virtual factory simulation engine and the autonomous vehicle (AV) simulation engine.

Omniverse Cloud, a platform-as-a-service for developing and deploying applications for industrial digitalization, is hosted on Microsoft Azure. This one-stop shop enables automakers worldwide to unify digitalization across their core product and business processes. It allows enterprises to achieve faster production and more efficient operations, improving time to market and enhancing sustainability initiatives.

For design, engineering and manufacturing teams, digitalization streamlines their work, converting once primarily manual industrial processes into efficient systems for concept and styling; AV development, testing and validation; and factory planning.

Virtual Factory Simulation Engine

The Omniverse Cloud virtual factory simulation engine is a collection of customizable developer applications and services that enable factory planning teams to connect large-scale industrial datasets while collaborating, navigating and reviewing them in real time.

Design teams working with 3D data can assemble virtual factories and share their work with thousands of planners who can view, annotate and update the full-fidelity factory dataset from lightweight devices. By simulating virtual factories on Omniverse Cloud, automakers can increase throughput and production quality while saving years of effort and millions of dollars that would result from making changes once construction is underway.

On Omniverse Cloud, teams can create interoperability between existing software applications such as Autodesk Factory Planning, which supports the entire lifecycle for building, mechanical, electrical, and plumbing and factory lines, as well as Siemens’ NX, Process Simulate and Teamcenter Visualization software and the JT file format. They can share knowledge and data in real time in live, virtual factory reviews across 2D devices or in extended reality.

T-Systems, a leading IT solutions provider for Europe’s largest automotive manufacturers, is building and deploying a custom virtual factory application that its customers can deploy in Omniverse Cloud.

SoftServe, an elite member of the NVIDIA Service Delivery Partner program, is also developing custom factory simulation and visualization solutions on this Omniverse Cloud engine, covering factory design, production planning and control.

AV Simulation Engine

The AV simulation engine is a service that delivers physically based sensor simulation, enabling AV and robotics developers to run autonomous systems in a closed-loop virtual environment.

The next generation of AV architectures will be built on large, unified AI models that combine layers of the vehicle stack, including perception, planning and control. Such new architectures call for an integrated approach to development.

With previous architectures, developers could train and test these layers independently, as they were governed by different models. For example, simulation could be used to develop a vehicle’s planning and control system, which only needs basic information about objects in a scene — such as the speed and distance of surrounding vehicles — while perception networks could be trained and tested on recorded sensor data.

However, using simulation to develop an advanced unified AV architecture requires sensor data as the input. For a simulator to be effective, it must be able to simulate vehicle sensors, such as cameras, radars and lidars, with high fidelity.

To address this challenge, NVIDIA is bringing state-of-the-art sensor simulation pipelines used in DRIVE Sim and Isaac Sim to Omniverse Cloud on Microsoft Azure.

Omniverse Cloud sensor simulation provides AV and robotics workflows with high-fidelity, physically based simulation for cameras, radars, lidars and other types of sensors. It can be connected to existing simulation applications, whether developed in-house or provided by a third party, via Omniverse Cloud application programming interfaces for integration into workflows.

Fast Track to Digitalization

The factory simulation engine is now available to customers via an Omniverse Cloud enterprise private offer through the Azure Marketplace, which provides access to NVIDIA OVX systems and fully managed Omniverse software, reference applications and workflows. The sensor simulation engine is coming soon.

Enterprises can now also deploy Omniverse Enterprise on new optimized Azure virtual machines.

Learn more on NVIDIA’s Microsoft Ignite showcase page.

New NVIDIA H100, H200 Tensor Core GPU Instances Coming to Microsoft Azure to Accelerate AI Workloads

As NVIDIA continues to collaborate with Microsoft to build state-of-the-art AI infrastructure, Microsoft is introducing additional H100-based virtual machines to Microsoft Azure to accelerate demanding AI workloads.

At its Ignite conference in Seattle today, Microsoft announced its new NC H100 v5 VM series for Azure, the industry’s first cloud instances featuring NVIDIA H100 NVL GPUs.

This offering brings together a pair of PCIe-based H100 GPUs connected via NVIDIA NVLink, with nearly 4 petaflops of AI compute and 188GB of faster HBM3 memory. The NVIDIA H100 NVL GPU can deliver up to 12x higher performance on GPT-3 175B over the previous generation and is ideal for inference and mainstream training workloads.

Additionally, Microsoft announced plans to add the NVIDIA H200 Tensor Core GPU to its Azure fleet next year to support larger model inferencing with no increase in latency. This new offering is purpose-built to accelerate the largest AI workloads, including LLMs and generative AI models.

The H200 GPU brings dramatic increases both in memory capacity and bandwidth using the latest-generation HBM3e memory. Compared to the H100, this new GPU will offer 141GB of HBM3e memory (1.8x more) and 4.8 TB/s of peak memory bandwidth (a 1.4x increase).

Cloud Computing Gets Confidential

Further expanding availability of NVIDIA-accelerated generative AI computing for Azure customers, Microsoft announced another NVIDIA-powered instance: the NCC H100 v5.

These Azure confidential VMs with NVIDIA H100 Tensor Core GPUs allow customers to protect the confidentiality and integrity of their data and applications in use, in memory, while accessing the unsurpassed acceleration of H100 GPUs. These GPU-enhanced confidential VMs will be coming soon to private preview.

To learn more about the new confidential VMs with NVIDIA H100 Tensor Core GPUs, and sign up for the preview, read the blog.

Learn more about NVIDIA-powered Azure instances on the GPU VM information page.

NVIDIA Fast-Tracks Custom Generative AI Model Development for Enterprises

Today’s landscape of free, open-source large language models (LLMs) is like an all-you-can-eat buffet for enterprises. This abundance can be overwhelming for developers building custom generative AI applications, as they need to navigate unique project and business requirements, including compatibility, security and the data used to train the models.

NVIDIA AI Foundation Models — a curated collection of enterprise-grade pretrained models — give developers a running start for bringing custom generative AI to their enterprise applications.

NVIDIA-Optimized Foundation Models Speed Up Innovation

NVIDIA AI Foundation Models can be experienced through a simple user interface or API, directly from a browser. Additionally, these models can be accessed from NVIDIA AI Foundation Endpoints to test model performance from within their enterprise applications.

Available models include leading community models such as Llama 2, Stable Diffusion XL and Mistral, which are formatted to help developers streamline customization with proprietary data. Additionally, models have been optimized with NVIDIA TensorRT-LLM to deliver the highest throughput and lowest latency and to run at scale on any NVIDIA GPU-accelerated stack. For instance, the Llama 2 model optimized with TensorRT-LLM runs nearly 2x faster on NVIDIA H100.

The new NVIDIA family of Nemotron-3 8B foundation models supports the creation of today’s most advanced enterprise chat and Q&A applications for a broad range of industries, including healthcare, telecommunications and financial services.

The models are a starting point for customers building secure, production-ready generative AI applications, are trained on responsibly sourced datasets and operate at comparable performance to much larger models. This makes them ideal for enterprise deployments.

Multilingual capabilities are a key differentiator of the Nemotron-3 8B models. Out of the box, the models are proficient in over 50 languages, including English, German, Russian, Spanish, French, Japanese, Chinese, Korean, Italian and Dutch.

Fast-Track Customization to Deployment

Enterprises leveraging generative AI across business functions need an AI foundry to customize models for their unique applications. NVIDIA’s AI foundry features three elements — NVIDIA AI Foundation Models, NVIDIA NeMo framework and tools, and NVIDIA DGX Cloud AI supercomputing services. Together, these provide an end-to-end enterprise offering for creating custom generative AI models.

Importantly, enterprises own their customized models and can deploy them virtually anywhere on accelerated computing with enterprise-grade security, stability and support using NVIDIA AI Enterprise software.

NVIDIA AI Foundation Models are freely available to experiment with now on the NVIDIA NGC catalog and Hugging Face, and are also hosted in the Microsoft Azure AI model catalog.

What Is Retrieval-Augmented Generation?

To understand the latest advance in generative AI, imagine a courtroom.

Judges hear and decide cases based on their general understanding of the law. Sometimes a case — like a malpractice suit or a labor dispute — requires special expertise, so judges send court clerks to a law library, looking for precedents and specific cases they can cite.

Like a good judge, large language models (LLMs) can respond to a wide variety of human queries. But to deliver authoritative answers that cite sources, the model needs an assistant to do some research.

The court clerk of AI is a process called retrieval-augmented generation, or RAG for short.

The Story of the Name

Patrick Lewis, lead author of the 2020 paper that coined the term, apologized for the unflattering acronym that now describes a growing family of methods across hundreds of papers and dozens of commercial services he believes represent the future of generative AI.

Picture of Patrick Lewis, lead author of RAG paper — Patrick Lewis

“We definitely would have put more thought into the name had we known our work would become so widespread,” Lewis said in an interview from Singapore, where he was sharing his ideas with a regional conference of database developers.

“We always planned to have a nicer sounding name, but when it came time to write the paper, no one had a better idea,” said Lewis, who now leads a RAG team at AI startup Cohere.

So, What Is Retrieval-Augmented Generation?

Retrieval-augmented generation is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources.

In other words, it fills a gap in how LLMs work. Under the hood, LLMs are neural networks, typically measured by how many parameters they contain. An LLM’s parameters essentially represent the general patterns of how humans use words to form sentences.

That deep understanding, sometimes called parameterized knowledge, makes LLMs useful in responding to general prompts at light speed. However, it does not serve users who want a deeper dive into a current or more specific topic.

Combining Internal, External Resources

Lewis and colleagues developed retrieval-augmented generation to link generative AI services to external resources, especially ones rich in the latest technical details.

The paper, with coauthors from the former Facebook AI Research (now Meta AI), University College London and New York University, called RAG “a general-purpose fine-tuning recipe” because it can be used by nearly any LLM to connect with practically any external resource.

Building User Trust

Retrieval-augmented generation gives models sources they can cite, like footnotes in a research paper, so users can check any claims. That builds trust.

What’s more, the technique can help models clear up ambiguity in a user query. It also reduces the possibility a model will make a wrong guess, a phenomenon sometimes called hallucination.

Another great advantage of RAG is it’s relatively easy. A blog by Lewis and three of the paper’s coauthors said developers can implement the process with as few as five lines of code.

That makes the method faster and less expensive than retraining a model with additional datasets. And it lets users hot-swap new sources on the fly.

How People Are Using Retrieval-Augmented Generation

With retrieval-augmented generation, users can essentially have conversations with data repositories, opening up new kinds of experiences. This means the applications for RAG could be multiple times the number of available datasets.

For example, a generative AI model supplemented with a medical index could be a great assistant for a doctor or nurse. Financial analysts would benefit from an assistant linked to market data.

In fact, almost any business can turn its technical or policy manuals, videos or logs into resources called knowledge bases that can enhance LLMs. These sources can enable use cases such as customer or field support, employee training and developer productivity.

The broad potential is why companies including AWS, IBM, Glean, Google, Microsoft, NVIDIA, Oracle and Pinecone are adopting RAG.

Getting Started With Retrieval-Augmented Generation

To help users get started, NVIDIA developed a reference architecture for retrieval-augmented generation. It includes a sample chatbot and the elements users need to create their own applications with this new method.

The workflow uses NVIDIA NeMo, a framework for developing and customizing generative AI models, as well as software like NVIDIA Triton Inference Server and NVIDIA TensorRT-LLM for running generative AI models in production.

The software components are all part of NVIDIA AI Enterprise, a software platform that accelerates development and deployment of production-ready AI with the security, support and stability businesses need.

Getting the best performance for RAG workflows requires massive amounts of memory and compute to move and process data. The NVIDIA GH200 Grace Hopper Superchip, with its 288GB of fast HBM3e memory and 8 petaflops of compute, is ideal — it can deliver a 150x speedup over using a CPU.

Once companies get familiar with RAG, they can combine a variety of off-the-shelf or custom LLMs with internal or external knowledge bases to create a wide range of assistants that help their employees and customers.

RAG doesn’t require a data center. LLMs are debuting on Windows PCs, thanks to NVIDIA software that enables all sorts of applications users can access even on their laptops.

Chart shows running RAG on a PC — An example application for RAG on a PC.

PCs equipped with NVIDIA RTX GPUs can now run some AI models locally. By using RAG on a PC, users can link to a private knowledge source – whether that be emails, notes or articles – to improve responses. The user can then feel confident that their data source, prompts and response all remain private and secure.

A recent blog provides an example of RAG accelerated by TensorRT-LLM for Windows to get better results fast.

The History of Retrieval-Augmented Generation

The roots of the technique go back at least to the early 1970s. That’s when researchers in information retrieval prototyped what they called question-answering systems, apps that use natural language processing (NLP) to access text, initially in narrow topics such as baseball.

The concepts behind this kind of text mining have remained fairly constant over the years. But the machine learning engines driving them have grown significantly, increasing their usefulness and popularity.

In the mid-1990s, the Ask Jeeves service, now Ask.com, popularized question answering with its mascot of a well-dressed valet. IBM’s Watson became a TV celebrity in 2011 when it handily beat two human champions on the Jeopardy! game show.

Today, LLMs are taking question-answering systems to a whole new level.

Insights From a London Lab

The seminal 2020 paper arrived as Lewis was pursuing a doctorate in NLP at University College London and working for Meta at a new London AI lab. The team was searching for ways to pack more knowledge into an LLM’s parameters and using a benchmark it developed to measure its progress.

Building on earlier methods and inspired by a paper from Google researchers, the group “had this compelling vision of a trained system that had a retrieval index in the middle of it, so it could learn and generate any text output you wanted,” Lewis recalled.

Picture of IBM Watson winning on "Jeopardy" TV show, popularizing a RAG-like AI service — The IBM Watson question-answering system became a celebrity when it won big on the TV game show Jeopardy!

When Lewis plugged into the work in progress a promising retrieval system from another Meta team, the first results were unexpectedly impressive.

“I showed my supervisor and he said, ‘Whoa, take the win. This sort of thing doesn’t happen very often,’ because these workflows can be hard to set up correctly the first time,” he said.

Lewis also credits major contributions from team members Ethan Perez and Douwe Kiela, then of New York University and Facebook AI Research, respectively.

When complete, the work, which ran on a cluster of NVIDIA GPUs, showed how to make generative AI models more authoritative and trustworthy. It’s since been cited by hundreds of papers that amplified and extended the concepts in what continues to be an active area of research.

How Retrieval-Augmented Generation Works

At a high level, here’s how an NVIDIA technical brief describes the RAG process.

When users ask an LLM a question, the AI model sends the query to another model that converts it into a numeric format so machines can read it. The numeric version of the query is sometimes called an embedding or a vector.

NVIDIA diagram of how RAG works with LLMs — Retrieval-augmented generation combines LLMs with embedding models and vector databases.

The embedding model then compares these numeric values to vectors in a machine-readable index of an available knowledge base. When it finds a match or multiple matches, it retrieves the related data, converts it to human-readable words and passes it back to the LLM.

Finally, the LLM combines the retrieved words and its own response to the query into a final answer it presents to the user, potentially citing sources the embedding model found.

Keeping Sources Current

In the background, the embedding model continuously creates and updates machine-readable indices, sometimes called vector databases, for new and updated knowledge bases as they become available.

Chart of a RAG process described by LangChain — Retrieval-augmented generation combines LLMs with embedding models and vector databases.

Many developers find LangChain, an open-source library, can be particularly useful in chaining together LLMs, embedding models and knowledge bases. NVIDIA uses LangChain in its reference architecture for retrieval-augmented generation.

The LangChain community provides its own description of a RAG process.

Looking forward, the future of generative AI lies in creatively chaining all sorts of LLMs and knowledge bases together to create new kinds of assistants that deliver authoritative results users can verify.

Get a hands on using retrieval-augmented generation with an AI chatbot in this NVIDIA LaunchPad lab.

Igniting the Future: TensorRT-LLM Release Accelerates AI Inference Performance, Adds Support for New Models Running on RTX-Powered Windows 11 PCs

Artificial intelligence on Windows 11 PCs marks a pivotal moment in tech history, revolutionizing experiences for gamers, creators, streamers, office workers, students and even casual PC users.

It offers unprecedented opportunities to enhance productivity for users of the more than 100 million Windows PCs and workstations that are powered by RTX GPUs. And NVIDIA RTX technology is making it even easier for developers to create AI applications to change the way people use computers.

New optimizations, models and resources announced at Microsoft Ignite will help developers deliver new end-user experiences, quicker.

An upcoming update to TensorRT-LLM — open-source software that increases AI inference performance — will add support for new large language models and make demanding AI workloads more accessible on desktops and laptops with RTX GPUs starting at 8GB of VRAM.

TensorRT-LLM for Windows will soon be compatible with OpenAI’s popular Chat API through a new wrapper. This will enable hundreds of developer projects and applications to run locally on a PC with RTX, instead of in the cloud — so users can keep private and proprietary data on Windows 11 PCs.

Custom generative AI requires time and energy to maintain projects. The process can become incredibly complex and time-consuming, especially when trying to collaborate and deploy across multiple environments and platforms.

AI Workbench is a unified, easy-to-use toolkit that allows developers to quickly create, test and customize pretrained generative AI models and LLMs on a PC or workstation. It provides developers a single platform to organize their AI projects and tune models to specific use cases.

This enables seamless collaboration and deployment for developers to create cost-effective, scalable generative AI models quickly. Join the early access list to be among the first to gain access to this growing initiative and to receive future updates.

To support AI developers, NVIDIA and Microsoft will release are releasing DirectML enhancements to accelerate one two of the most popular foundational AI models,: Llama 2 and Stable Diffusion. Developers now have more options for cross-vendor deployment, in addition to setting a new standard for performance.

Portable AI

Last month, NVIDIA announced TensorRT-LLM for Windows, a library for accelerating LLM inference.

The next TensorRT-LLM release, v0.6.0 coming later this month, will bring improved inference performance — up to 5x faster — and enable support for additional popular LLMs, including the new Mistral 7B and Nemotron-3 8B. Versions of these LLMs will run on any GeForce RTX 30 Series and 40 Series GPU with 8GB of RAM or more, making fast, accurate, local LLM capabilities accessible even in some of the most portable Windows devices.

TensorRT-LLM V0.6 Windows Perf Chart — *Up to 5X performance with the new TensorRT-LLM v0.6.0.*

The new release of TensorRT-LLM will be available for install on the /NVIDIA/TensorRT-LLM GitHub repo. New optimized models will be available on ngc.nvidia.com.

Conversing With Confidence

Developers and enthusiasts worldwide use OpenAI’s Chat API for a wide range of applications — from summarizing web content and drafting documents and emails to analyzing and visualizing data and creating presentations.

One challenge with such cloud-based AIs is that they require users to upload their input data, making them impractical for private or proprietary data or for working with large datasets.

To address this challenge, NVIDIA is soon enabling TensorRT-LLM for Windows to offer a similar API interface to OpenAI’s widely popular ChatAPI, through a new wrapper, offering a similar workflow to developers whether they are designing models and applications to run locally on a PC with RTX or in the cloud. By changing just one or two lines of code, hundreds of AI-powered developer projects and applications can now benefit from fast, local AI. Users can keep their data on their PCs and not worry about uploading datasets to the cloud.

Perhaps the best part is that many of these projects and applications are open source, making it easy for developers to leverage and extend their capabilities to fuel the adoption of generative AI on Windows, powered by RTX.

The wrapper will work with any LLM that’s been optimized for TensorRT-LLM (for example, Llama 2, Mistral and NV LLM) and is being released as a reference project on GitHub, alongside other developer resources for working with LLMs on RTX.

Model Acceleration

Developers can now leverage cutting-edge AI models and deploy with a cross-vendor API. As part of an ongoing commitment to empower developers, NVIDIA and Microsoft have been working together to accelerate Llama on RTX via the DirectML API.

Building on the announcements for the fastest inference performance for these models announced last month, this new option for cross-vendor deployment makes it easier than ever to bring AI capabilities to PC.

Developers and enthusiasts can experience the latest optimizations by downloading the latest ONNX runtime and following the installation instructions from Microsoft, and installing the latest driver from NVIDIA, which will be available on Nov. 21.

These new optimizations, models and resources will accelerate the development and deployment of AI features and applications to the 100 million RTX PCs worldwide, joining the more than 400 partners shipping AI-powered apps and games already accelerated by RTX GPUs.

As models become even more accessible and developers bring more generative AI-powered functionality to RTX-powered Windows PCs, RTX GPUs will be critical for enabling users to take advantage of this powerful technology.

Empowering the next generation for an AI-enabled world

Experience AI’s course and resources are expanding on a global scaleRead More

Empowering the next generation for an AI-enabled world

Experience AI’s course and resources are expanding on a global scaleRead More

Solution requirements

Solution overview

Answer business questions

The Results

Roadmap

Conclusion

About Principal Financial Group

About the authors

Segment Anything Model (SAM)

Visual prompt engineering

Zero-shot prompting with the pre-trained SAM model

Visual prompt engineering with the pre-trained SAM model

Prompt templates and visual chains

Quantitative results

Fine-tuning SAM for your use case

Conclusion

About the authors

Custom Models for Custom Results

Amdocs’ Global Telco Ecosystem Footprint

Virtual Factory Simulation Engine

AV Simulation Engine

Fast Track to Digitalization

Cloud Computing Gets Confidential

NVIDIA-Optimized Foundation Models Speed Up Innovation

Fast-Track Customization to Deployment

The Story of the Name

So, What Is Retrieval-Augmented Generation?

Combining Internal, External Resources

Building User Trust

How People Are Using Retrieval-Augmented Generation

Getting Started With Retrieval-Augmented Generation

The History of Retrieval-Augmented Generation

Insights From a London Lab

How Retrieval-Augmented Generation Works

Keeping Sources Current

Portable AI

Conversing With Confidence

Model Acceleration

Navigation

GenAI Vision Endless Possibilities

"I'm interested in things that change the world or that affect the future and wondrous, new technology where you see it, and you're like, 'Wow, how did that even happen? How is that possible?'" -- Elon Musk

Copyright © 2019-2025 Vedere AI. All Rights Reserved.