Enhance performance of generative language models with self-consistency prompting on Amazon Bedrock

Generative language models have proven remarkably skillful at solving logical and analytical natural language processing (NLP) tasks. Furthermore, the use of prompt engineering can notably enhance their performance. For example, chain-of-thought (CoT) is known to improve a model’s capacity for complex multi-step problems. To additionally boost accuracy on tasks that involve reasoning, a self-consistency prompting approach has been suggested, which replaces greedy with stochastic decoding during language generation.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models from leading AI companies and Amazon via a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. With the batch inference API, you can use Amazon Bedrock to run inference with foundation models in batches and get responses more efficiently. This post shows how to implement self-consistency prompting via batch inference on Amazon Bedrock to enhance model performance on arithmetic and multiple-choice reasoning tasks.

Overview of solution

Self-consistency prompting of language models relies on the generation of multiple responses that are aggregated into a final answer. In contrast to single-generation approaches like CoT, the self-consistency sample-and-marginalize procedure creates a range of model completions that lead to a more consistent solution. The generation of different responses for a given prompt is possible due to the use of a stochastic, rather than greedy, decoding strategy.

The following figure shows how self-consistency differs from greedy CoT in that it generates a diverse set of reasoning paths and aggregates them to produce the final answer.

Differences between self-consistency and CoT prompting.

Decoding strategies for text generation

Text generated by decoder-only language models unfolds word by word, with the subsequent token being predicted on the basis of the preceding context. For a given prompt, the model computes a probability distribution indicating the likelihood of each token to appear next in the sequence. Decoding involves translating these probability distributions into actual text. Text generation is mediated by a set of inference parameters that are often hyperparameters of the decoding method itself. One example is the temperature, which modulates the probability distribution of the next token and influences the randomness of the model’s output.

Greedy decoding is a deterministic decoding strategy that at each step selects the token with the highest probability. Although straightforward and efficient, the approach risks falling into repetitive patterns, because it disregards the broader probability space. Setting the temperature parameter to 0 at inference time essentially equates to implementing greedy decoding.

Sampling introduces stochasticity into the decoding process by randomly selecting each subsequent token based on the predicted probability distribution. This randomness results in greater output variability. Stochastic decoding proves more adept at capturing the diversity of potential outputs and often yields more imaginative responses. Higher temperature values introduce more fluctuations and increase the creativity of the model’s response.
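
To make the two strategies concrete, the following is a minimal sketch (using NumPy, with made-up logits for illustration) of how temperature reshapes the next-token distribution before greedy selection or sampling:

import numpy as np

def next_token(logits, temperature=1.0, rng=np.random.default_rng()):
    """Pick the next token id from raw logits.

    temperature == 0 -> greedy decoding (argmax);
    temperature > 0  -> temperature-scaled stochastic sampling.
    """
    if temperature == 0:
        return int(np.argmax(logits))
    scaled = np.asarray(logits) / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

logits = [2.0, 1.5, 0.3]  # hypothetical scores for a 3-token vocabulary
print(next_token(logits, temperature=0))    # always token 0
print(next_token(logits, temperature=0.7))  # usually 0, sometimes 1 or 2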

Prompting techniques: CoT and self-consistency

The reasoning ability of language models can be augmented via prompt engineering. In particular, CoT has been shown to elicit reasoning in complex NLP tasks. One way to implement a zero-shot CoT is via prompt augmentation with the instruction to “think step by step.” Another is to expose the model to exemplars of intermediate reasoning steps in few-shot prompting fashion. Both scenarios typically use greedy decoding. CoT leads to significant performance gains compared to simple instruction prompting on arithmetic, commonsense, and symbolic reasoning tasks.
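
As a minimal illustration, the zero-shot variant amounts to a one-line prompt augmentation (the question string here is just an example):

# Zero-shot CoT: augment the prompt with a step-by-step instruction
question = "A bakery sells 3 cakes for $12 each and 5 pies for $8 each. What is the total revenue?"
cot_prompt = f"Q: {question}\nA: Let's think step by step."

A few-shot CoT prompt instead prepends worked exemplars, as shown in the arithmetic reasoning section later in this post.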

Self-consistency prompting is based on the assumption that introducing diversity in the reasoning process can be beneficial to help models converge on the correct answer. The technique uses stochastic decoding to achieve this goal in three steps:

  1. Prompt the language model with CoT exemplars to elicit reasoning.
  2. Replace greedy decoding with a sampling strategy to generate a diverse set of reasoning paths.
  3. Aggregate the results to find the most consistent answer in the response set.

Self-consistency is shown to outperform CoT prompting on popular arithmetic and commonsense reasoning benchmarks. A limitation of the approach is its larger computational cost.

This post shows how self-consistency prompting enhances performance of generative language models on two NLP reasoning tasks: arithmetic problem-solving and multiple-choice domain-specific question answering. We demonstrate the approach using batch inference on Amazon Bedrock:

  • We access the Amazon Bedrock Python SDK in JupyterLab on an Amazon SageMaker notebook instance.
  • For arithmetic reasoning, we prompt Cohere Command on the GSM8K dataset of grade school math problems.
  • For multiple-choice reasoning, we prompt AI21 Labs Jurassic-2 Mid on a small sample of questions from the AWS Certified Solutions Architect – Associate exam.

Prerequisites

This walkthrough assumes the following prerequisites:

  • An AWS account and an Amazon SageMaker notebook instance to run the code in JupyterLab.
  • Access to the Cohere Command and AI21 Labs Jurassic-2 Mid models, enabled by following Manage model access on Amazon Bedrock.

The estimated cost to run the code shown in this post is $100, assuming you run self-consistency prompting once with 30 reasoning paths at a single temperature value.

Dataset to probe arithmetic reasoning capabilities

GSM8K is a dataset of human-assembled grade school math problems featuring a high linguistic diversity. Each problem takes 2–8 steps to solve and requires performing a sequence of elementary calculations with basic arithmetic operations. This data is commonly used to benchmark the multi-step arithmetic reasoning capabilities of generative language models. The GSM8K train set comprises 7,473 records. The following is an example:

{"question": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?", "answer": "Natalia sold 48/2 = <<48/2=24>>24 clips in May.nNatalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.n#### 72"}

Set up to run batch inference with Amazon Bedrock

Batch inference allows you to run multiple inference calls to Amazon Bedrock asynchronously and improve the performance of model inference on large datasets. The service is in preview as of this writing and only available through the API. Refer to Run batch inference to access batch inference APIs via custom SDKs.

After you have downloaded and unzipped the Python SDK in a SageMaker notebook instance, you can install it by running the following code in a Jupyter notebook cell:

# Install preview SDK packages
!pip install -q $(ls ./bedrock-python-sdk-reinvent/botocore-*.whl | head -1)
!pip install -q $(ls ./bedrock-python-sdk-reinvent/boto3-*.whl | head -1)

Format and upload input data to Amazon S3

Input data for batch inference needs to be prepared in JSONL format with recordId and modelInput keys. The latter should match the body field of the model to be invoked on Amazon Bedrock. In particular, some supported inference parameters for Cohere Command are temperature for randomness, max_tokens for output length, and num_generations to generate multiple responses, all of which are passed together with the prompt as modelInput:

data = [
    {
        "recordId": "1",
        "modelInput": {
            "prompt": prompt,
            "temperature": temperature,
            "max_tokens": max_tokens,
            "num_generations": n,
        },
    },
    ...,
]

See Inference parameters for foundation models for more details, including other model providers.

Our experiments on arithmetic reasoning are performed in the few-shot setting without customizing or fine-tuning Cohere Command. We use the same set of eight few-shot exemplars from the chain-of-thought (Table 20) and self-consistency (Table 17) papers. Prompts are created by concatenating the exemplars with each question from the GSM8K train set.
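
A minimal sketch of this prompt construction, reusing the gsm8k records loaded earlier and assuming the exemplars are kept as question-answer string pairs (variable names are ours; exemplar text abbreviated):

# Few-shot exemplars from the CoT and self-consistency papers (abbreviated)
exemplars = [
    ("There are 15 trees in the grove. ...", "There are 15 trees originally. ... The answer is 6."),
    # ... seven more exemplars ...
]

# Concatenate the exemplars, then append the GSM8K question to be answered
few_shot = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in exemplars)
prompts = [few_shot + f"Q: {row['question']}\nA:" for row in gsm8k]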

We set max_tokens to 512 and num_generations to 5, the maximum allowed by Cohere Command. For greedy decoding, we set temperature to 0 and for self-consistency, we run three experiments at temperatures 0.5, 0.7, and 1. Each setting yields different input data according to the respective temperature values. Data is formatted as JSONL and stored in Amazon S3.

import json
import uuid

import boto3

# Set up S3 client
session = boto3.Session()
s3 = session.client("s3")

# Create S3 bucket with unique name to store input/output data
suffix = str(uuid.uuid4())[:8]
bucket = f"bedrock-self-consistency-{suffix}"
s3.create_bucket(
    Bucket=bucket, CreateBucketConfiguration={"LocationConstraint": session.region_name}
)

# Process data and output to new lines as JSONL
input_key = f"gsm8k/T{temperature}/input.jsonl"
s3_data = ""
for row in data:
    s3_data += json.dumps(row) + "\n"
s3.put_object(Body=s3_data, Bucket=bucket, Key=input_key)

Create and run batch inference jobs in Amazon Bedrock

Batch inference job creation requires an Amazon Bedrock client. We specify the S3 input and output paths and give each invocation job a unique name:

# Create Bedrock client
bedrock = boto3.client("bedrock")

# Input and output config
input_config = {"s3InputDataConfig": {"s3Uri": f"s3://{bucket}/{input_key}"}}
output_key = f"gsm8k/T{temperature}/"  # S3 prefix for the job output
output_config = {"s3OutputDataConfig": {"s3Uri": f"s3://{bucket}/{output_key}"}}

# Create a unique job name
suffix = str(uuid.uuid4())[:8]
job_name = f"command-batch-T{temperature}-{suffix}"

Jobs are created by passing the IAM role, model ID, job name, and input/output configuration as parameters to the Amazon Bedrock API:

response = bedrock.create_model_invocation_job(
    roleArn=f"arn:aws:iam::{account_id}:role/BedrockBatchInferenceRole",
    modelId="cohere.command-text-v14",
    jobName=job_name,
    inputDataConfig=input_config,
    outputDataConfig=output_config,
)
job_arn = response["jobArn"]

Listing, monitoring, and stopping batch inference jobs are supported by their respective API calls. On creation, jobs appear first as Submitted, then as InProgress, and finally as Stopped, Failed, or Completed.

# Get job details
job_details = bedrock.get_model_invocation_job(jobIdentifier=job_arn)
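
Because batch jobs run asynchronously, you can poll the job status until it reaches a terminal state before fetching results (a minimal sketch):

import time

# Poll until the job reaches a terminal status
while True:
    status = bedrock.get_model_invocation_job(jobIdentifier=job_arn)["status"]
    if status in {"Completed", "Failed", "Stopped"}:
        break
    time.sleep(60)  # check once per minute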

If a job completes successfully, the generated content can be retrieved from Amazon S3 at its unique output location.

# Get the output file key
s3_prefix = f"s3://{bucket}/"
output_path = job_details["outputDataConfig"]["s3OutputDataConfig"]["s3Uri"].replace(
    s3_prefix, ""
)
output_folder = job_details["jobArn"].split("/")[1]
output_file = (
    f'{job_details["inputDataConfig"]["s3InputDataConfig"]["s3Uri"].split("/")[-1]}.out'
)
result_key = f"{output_path}{output_folder}/{output_file}"

# Get output data
obj = s3.get_object(Bucket=bucket, Key=result_key)
content = obj["Body"].read().decode("utf-8").strip().split("n")

# Show answer to the first question
print(json.loads(content[0])["modelOutput"]["generations"][0]["text"])

[Out]: 'Natalia sold 48 * 1/2 = 24 clips less in May. This means she sold 48 + 24 = 72 clips in April and May. The answer is 72.'

Self-consistency enhances model accuracy on arithmetic tasks

Self-consistency prompting of Cohere Command outperforms a greedy CoT baseline in terms of accuracy on the GSM8K dataset. For self-consistency, we sample 30 independent reasoning paths at three different temperatures, with topP and topK set to their default values. Final solutions are aggregated by choosing the most consistent occurrence via majority voting. In case of a tie, we randomly choose one of the majority responses. We compute accuracy and standard deviation values averaged over 100 runs.
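
The following is a minimal sketch of this aggregation, assuming completions end with the phrase "The answer is N." as in the few-shot exemplars (function names are ours):

import random
import re
from collections import Counter

def extract_answer(completion):
    # Pull the final number out of a "The answer is N." completion
    match = re.search(r"The answer is\s*\$?([-0-9.,]+)", completion)
    return match.group(1).rstrip(".").replace(",", "") if match else None

def majority_vote(completions):
    # Count parsed answers and return the most frequent one,
    # breaking ties uniformly at random
    votes = Counter(a for a in map(extract_answer, completions) if a is not None)
    if not votes:
        return None
    top = max(votes.values())
    return random.choice([a for a, n in votes.items() if n == top])

paths = [
    "Natalia sold 48 / 2 = 24 clips in May. 48 + 24 = 72. The answer is 72.",
    "She sold 24 clips in May, for a total of 72. The answer is 72.",
    "48 - 24 = 24. The answer is 24.",  # an inconsistent path
]
print(majority_vote(paths))  # '72'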

The following figure shows the accuracy on the GSM8K dataset from Cohere Command prompted with greedy CoT (blue) and self-consistency at temperature values 0.5 (yellow), 0.7 (green), and 1.0 (orange) as a function of the number of sampled reasoning paths.

Accuracy of Cohere Command using self-consistency vs CoT prompting.

The preceding figure shows that self-consistency enhances arithmetic accuracy over greedy CoT when the number of sampled paths is as low as three. Performance increases consistently with additional reasoning paths, confirming the importance of introducing diversity in the thought generation. Cohere Command solves the GSM8K question set with 51.7% accuracy when prompted with CoT vs. 68% with 30 self-consistent reasoning paths at T=1.0. All three surveyed temperature values yield similar results, with lower temperatures being comparatively more performant at fewer sampled paths.

Practical considerations on efficiency and cost

Self-consistency is limited by the increased response time and cost incurred when generating multiple outputs per prompt. As a practical illustration, batch inference for greedy generation with Cohere Command on 7,473 GSM8K records finished in less than 20 minutes. The job took 5.5 million tokens as input and generated 630,000 output tokens. At current Amazon Bedrock inference prices, the total cost incurred was around $9.50.
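
These figures are consistent with a back-of-the-envelope check, assuming the then-current on-demand prices of $0.0015 per 1,000 input tokens and $0.0020 per 1,000 output tokens for Cohere Command (check the Amazon Bedrock pricing page for current values):

# Approximate cost of the greedy batch inference job
input_tokens, output_tokens = 5_500_000, 630_000
cost = input_tokens / 1000 * 0.0015 + output_tokens / 1000 * 0.0020
print(f"${cost:.2f}")  # about $9.51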

For self-consistency with Cohere Command, we use inference parameter num_generations to create multiple completions per prompt. As of this writing, Amazon Bedrock allows a maximum of five generations and three concurrent Submitted batch inference jobs. Jobs proceed to the InProgress status sequentially, therefore sampling more than five paths requires multiple invocations.

The following figure shows the runtimes for Cohere Command on the GSM8K dataset. Total runtime is shown on the x axis and runtime per sampled reasoning path on the y axis. Greedy generation runs in the shortest time but incurs a higher time cost per sampled path.

Runtimes for Cohere Command

Greedy generation completes in less than 20 minutes for the full GSM8K set and samples a unique reasoning path. Self-consistency with five samples requires about 50% longer to complete and costs around $14.50, but produces five paths (over 500%) in that time. Total runtime and cost increase step-wise with every extra five sampled paths. A cost-benefit analysis suggests that 1–2 batch inference jobs with 5–10 sampled paths is the recommended setting for practical implementation of self-consistency. This achieves enhanced model performance while keeping cost and latency at bay.

Self-consistency enhances model performance beyond arithmetic reasoning

A crucial question to prove the suitability of self-consistency prompting is whether the method succeeds across further NLP tasks and language models. As an extension to an Amazon-related use case, we perform a small-sized analysis on sample questions from the AWS Solutions Architect Associate Certification. This is a multiple-choice exam on AWS technology and services that requires domain knowledge and the ability to reason and decide among several options.

We prepare a dataset from SAA-C01 and SAA-C03 sample exam questions. From the 20 available questions, we use the first 4 as few-shot exemplars and prompt the model to answer the remaining 16. This time, we run inference with the AI21 Labs Jurassic-2 Mid model and generate a maximum of 10 reasoning paths at temperature 0.7. Results show that self-consistency enhances performance: although greedy CoT produces 11 correct answers, self-consistency succeeds on 2 more.
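
The batch input records follow the same JSONL pattern as before, with modelInput now matching the AI21 Labs Jurassic-2 body fields (a sketch; note the camelCase parameter names, unlike Cohere's):

data = [
    {
        "recordId": "1",
        "modelInput": {
            "prompt": prompt,     # few-shot exemplars + exam question
            "temperature": 0.7,
            "maxTokens": 512,     # assumed output length, mirroring the Cohere setup
            "numResults": 10,     # number of sampled reasoning paths
        },
    },
    ...,
]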

The following table shows the accuracy results for 5 and 10 sampled paths averaged over 100 runs.

# sampled paths | Greedy decoding | Self-consistency (T = 0.7)
5               | 68.6            | 74.1 ± 0.7
10              | 68.6            | 78.9 ± 0.3

In the following, we present two exam questions that greedy CoT answers incorrectly while self-consistency succeeds, showing in each case the reasoning traces that led the model to produce correct or incorrect responses. Although not every sampled path generated by self-consistency is correct, the majority converges on the true answer as the number of sampled paths increases. We observe that 5–10 paths are typically enough to improve over the greedy results, with diminishing returns in terms of efficiency past those values.

Question 1

A web application allows customers to upload orders to an S3 bucket. The resulting Amazon S3 events trigger a Lambda function that inserts a message to an SQS queue. A single EC2 instance reads messages from the queue, processes them, and stores them in a DynamoDB table partitioned by unique order ID. Next month, traffic is expected to increase by a factor of 10, and a Solutions Architect is reviewing the architecture for possible scaling problems.

Which component is MOST likely to need re-architecting to be able to scale to accommodate the new traffic?

A. Lambda function
B. SQS queue
C. EC2 instance
D. DynamoDB table

Correct answer: C – A single EC2 instance will not scale and is a single point of failure in the architecture. A much better solution would be to have EC2 instances in an Auto Scaling group across 2 availability zones read messages from the queue. The other responses are all managed services that can be configured to scale or will scale automatically.

Greedy decoding (incorrect): Scaling the EC2 instance directly may not be possible, as it may not be possible to resize the instance. Scaling the Lambda function may not be sufficient, as Lambda functions have a 6 GB of memory allocation limit. Scaling the SQS queue may not be sufficient, as SQS has a 10,000 message limit. DynamoDB tables can be scaled by provisioning more capacity units. The correct answer is D.

Sampled path #1 (correct): Scaling the EC2 instance to 10 times its current size may not be feasible, either from a cost or performance perspective. It would require a larger instance, which can be expensive. A more cost-effective approach might be to use a swarm of smaller instances (e.g., Auto Scaling Group) to share the load. In this case, the EC2 instance would likely need to be re-architected. The correct answer is C.

Sampled path #2 (correct): Scaling a single EC2 instance is more difficult and problematic than scaling the other service components. Amazon SQS, Lambda, and DynamoDB can all scale automatically to cope with the increased load. The correct answer is C.

Question 2

An application running on AWS uses an Amazon Aurora Multi-AZ DB cluster deployment for its database. When evaluating performance metrics, a solutions architect discovered that the database reads are causing high I/O and adding latency to the write requests against the database.

What should the solutions architect do to separate the read requests from the write requests?

A. Enable read-through caching on the Aurora database.
B. Update the application to read from the Multi-AZ standby instance.
C. Create an Aurora replica and modify the application to use the appropriate endpoints.
D. Create a second Aurora database and link it to the primary database as a read replica.

Correct answer: C – Aurora Replicas provide a way to offload read traffic. Aurora Replicas share the same underlying storage as the main database, so lag time is generally very low. Aurora Replicas have their own endpoints, so the application will need to be configured to direct read traffic to the new endpoints.

Greedy decoding (incorrect): Modifying the application to read from the Multi-AZ standby instance (option B) is the best approach. The standby instance is asynchronously replicated from the primary instance, so read operations will not interfere with write operations. Enabling read-through caching (option A) or creating an Aurora replica (option C) will not solve the I/O contention issue. Creating a second Aurora database and linking it as a read replica (option D) will not solve the I/O contention issue either. The correct answer is B.

Sampled path #1 (correct): Modifying the application to use the appropriate endpoints is not a practical solution as it would require changes to the application code. The correct answer is C: Create an Aurora replica and modify the application to use the appropriate endpoints.

Sampled path #2 (correct): (C) Create an Aurora replica and modify the application to use the appropriate endpoints. By configuring an Aurora Replica, you can separate read traffic from write traffic. The Aurora Replicas use different endpoint URLs, allowing you to direct read traffic to the replica instead of the primary database. The replica can process read requests in parallel with write requests to the primary database, reducing I/O and latency.

Clean up

Running batch inference in Amazon Bedrock is subject to charges according to the Amazon Bedrock Pricing. When you complete the walkthrough, delete your SageMaker notebook instance and remove all data from your S3 buckets to avoid incurring future charges.

Considerations

Although the demonstrated solution shows improved performance of language models when prompted with self-consistency, it’s important to note that the walkthrough is not production-ready. Before you deploy to production, you should adapt this proof of concept to your own implementation, keeping in mind the following requirements:

  • Access restriction to APIs and databases to prevent unauthorized usage.
  • Adherence to AWS security best practices regarding IAM role access and security groups.
  • Validation and sanitization of user input to prevent prompt injection attacks.
  • Monitoring and logging of triggered processes to enable testing and auditing.

Conclusion

This post shows that self-consistency prompting enhances performance of generative language models in complex NLP tasks that require arithmetic and multiple-choice logical skills. Self-consistency uses temperature-based stochastic decoding to generate various reasoning paths. This increases the ability of the model to elicit diverse and useful thoughts to arrive at correct answers.

With Amazon Bedrock batch inference, the language model Cohere Command is prompted to generate self-consistent answers to a set of arithmetic problems. Accuracy improves from 51.7% with greedy decoding to 68% with self-consistency sampling 30 reasoning paths at T=1.0. Sampling five paths already enhances accuracy by 7.5 percentage points. The approach is transferable to other language models and reasoning tasks, as demonstrated by results of the AI21 Labs Jurassic-2 Mid model on an AWS Certification exam. In a small-sized question set, self-consistency with five sampled paths increases accuracy by 5 percentage points over greedy CoT.

We encourage you to implement self-consistency prompting for enhanced performance in your own applications with generative language models. Learn more about Cohere Command and AI21 Labs Jurassic models available on Amazon Bedrock. For more information about batch inference, refer to Run batch inference.

Acknowledgements

The author thanks technical reviewers Amin Tajgardoon and Patrick McSweeney for helpful feedback.


About the Author

Lucía Santamaría is a Sr. Applied Scientist at Amazon’s ML University, where she’s focused on raising the level of ML competency across the company through hands-on education. Lucía has a PhD in astrophysics and is passionate about democratizing access to tech knowledge and tools.

Read More

NVIDIA Celebrates Americas Partners Driving AI-Powered Transformation

NVIDIA recognized 14 partners in the Americas for their achievements in transforming businesses with AI, this week at GTC.

The winners of the NVIDIA Partner Network Americas Partner of the Year awards have helped customers across industries advance their operations with the software, systems and services needed to integrate AI into their businesses.

NPN awards categories span a multitude of competencies and industries, including service delivery, data center networking, public sector, healthcare and higher education.

Three new categories were created this year. One recognizes a partner driving AI-powered success in the financial services industry; one celebrates a partner exhibiting overall AI excellence; and another honors a partner’s dedication to advancing NVIDIA’s full-stack portfolio across a multitude of industries.

All awards reflect the spirit of the NPN ecosystem in driving business success through accelerated full-stack computing and software.

“NVIDIA’s work driving innovation in generative AI has helped partners empower their customers with cutting-edge technology — as well as reduced costs and fostered growth opportunities while solving intricate business challenges,” said Rob Enderle, president and principal analyst at the Enderle Group. “The recipients of the 2024 NPN awards embody a diverse array of AI expertise, offering rich knowledge to help customers deploy transformative solutions across industries.”

The 2024 NPN award winners for the Americas are:

  • AI Excellence Partner of the Year — Lambda received this award for its dedication to providing end-to-end AI solutions featuring NVIDIA accelerated computing and NVIDIA AI Enterprise across NVIDIA DGX and Lambda Cloud.
  • Enterprise Partner of the Year — World Wide Technology received this newly introduced award for its leadership, dedication and expertise in advancing the adoption of AI with NVIDIA’s portfolio of purpose-built systems, data center networking, software and accelerated computing solutions across machine learning, digital twins, NVIDIA Omniverse and visualization.
  • Canadian Partner of the Year — Converge Technology Solutions is recognized for its dedication and expertise in NVIDIA DGX systems and for its Canadian customer support services, leveraging training courses from NVIDIA, to further industry knowledge of the NVIDIA software stack.
  • Financial Services Partner of the Year — CDW received this newly introduced award for its ecosystem partnerships, strategic investments and targeted domain expertise serving financial customers seeking HPC solutions and customer experience solutions such as chatbots and agentless routing.
  • Global Consulting Partner of the Year — Deloitte is recognized for the fourth consecutive time for its embrace of generative AI and for leveraging the capabilities of NVIDIA DGX Cloud.
  • Healthcare Partner of the Year — Mark III is recognized for the second consecutive year for its utilization of the NVIDIA healthcare portfolio, which supports biopharma research, academic medical centers, research institutions, healthcare systems and life sciences organizations with NVIDIA infrastructure, software and cloud technologies.
  • Higher Education Partner of the Year — Cambridge Computer is recognized for the fourth consecutive year for its customer service and technical expertise, bringing NVIDIA AI solutions to the life sciences, education and research sectors.
  • Networking Partner of the Year — Converge Technology Solutions is recognized for its expertise in NVIDIA high-performance networking solutions to support state-of-the-art accelerated computing deployments.
  • Public Sector Partner of the Year — Sterling is recognized for its investment and achievements in developing a robust AI practice. This includes assembling a team of dedicated AI software engineers focused on the full-stack NVIDIA platform, establishing Sterling Labs — an AI briefing center near Washington, D.C. — and collaborating with NVIDIA to launch ARC, a 5G/6G platform targeted for next-gen wireless networks.
  • Rising Star Partner of the Year — International Computer Concepts is recognized for its growth in developing AI and machine learning solutions for cloud service providers and financial services customers to power machine learning training, real-time inference and other AI workloads.
  • Service Delivery Partner of the Year — Quantiphi is recognized for the third consecutive year for its commitment to driving adoption of NVIDIA software and hardware in the enterprise. Its AI Service Delivery team has demonstrated expertise in using LLMs, information retrieval, imaging and data analytics to solve complex business problems in the telecom, life sciences, retail and energy industries for its global customers.
  • Distribution Partner of the Year — TD SYNNEX is recognized for demonstrating its commitment to building its AI business on the NVIDIA AI platform, with year-over-year growth that underscores its operational excellence in distribution.
  • Software Partner of the Year — Insight is recognized for its leadership in NVIDIA AI Enterprise deployments, establishing cutting-edge innovation labs and certifications that cultivate expertise while seamlessly embedding generative AI into its operations.
  • Solution Integration Partner of the Year — EXXACT is recognized for its commitment and expertise in providing end-to-end NVIDIA AI and high performance computing solutions, including NVIDIA software and data center products across multiple industries.

Learn how to join NPN, or find your local NPN partner.

Read More

SCIN: A new resource for representative dermatology images

Health datasets play a crucial role in research and medical education, but it can be challenging to create a dataset that represents the real world. For example, dermatology conditions are diverse in their appearance and severity and manifest differently across skin tones. Yet, existing dermatology image datasets often lack representation of everyday conditions (like rashes, allergies and infections) and skew towards lighter skin tones. Furthermore, race and ethnicity information is frequently missing, hindering our ability to assess disparities or create solutions.

To address these limitations, we are releasing the Skin Condition Image Network (SCIN) dataset in collaboration with physicians at Stanford Medicine. We designed SCIN to reflect the broad range of concerns that people search for online, supplementing the types of conditions typically found in clinical datasets. It contains images across various skin tones and body parts, helping to ensure that future AI tools work effectively for all. We’ve made the SCIN dataset freely available as an open-access resource for researchers, educators, and developers, and have taken careful steps to protect contributor privacy.

Example set of images and metadata from the SCIN dataset.

Dataset composition

The SCIN dataset currently contains over 10,000 images of skin, nail, or hair conditions, directly contributed by individuals experiencing them. All contributions were made voluntarily with informed consent by individuals in the US, under an institutional-review board approved study. To provide context for retrospective dermatologist labeling, contributors were asked to take images both close-up and from slightly further away. They were given the option to self-report demographic information and tanning propensity (self-reported Fitzpatrick Skin Type, i.e., sFST), and to describe the texture, duration and symptoms related to their concern.

One to three dermatologists labeled each contribution with up to five dermatology conditions, along with a confidence score for each label. The SCIN dataset contains these individual labels, as well as an aggregated and weighted differential diagnosis derived from them that could be useful for model testing or training. These labels were assigned retrospectively and are not equivalent to a clinical diagnosis, but they allow us to compare the distribution of dermatology conditions in the SCIN dataset with existing datasets.

The SCIN dataset contains largely allergic, inflammatory and infectious conditions while datasets from clinical sources focus on benign and malignant neoplasms.

While many existing dermatology datasets focus on malignant and benign tumors and are intended to assist with skin cancer diagnosis, the SCIN dataset consists largely of common allergic, inflammatory, and infectious conditions. The majority of images in the SCIN dataset show early-stage concerns — more than half arose less than a week before the photo, and 30% arose less than a day before the image was taken. Conditions within this time window are seldom seen within the health system and therefore are underrepresented in existing dermatology datasets.

We also obtained dermatologist estimates of Fitzpatrick Skin Type (estimated FST or eFST) and layperson labeler estimates of Monk Skin Tone (eMST) for the images. This allowed comparison of the skin condition and skin type distributions to those in existing dermatology datasets. Although we did not selectively target any skin types or skin tones, the SCIN dataset has a balanced Fitzpatrick skin type distribution (with more of Types 3, 4, 5, and 6) compared to similar datasets from clinical sources.

Self-reported and dermatologist-estimated Fitzpatrick Skin Type distribution in the SCIN dataset compared with existing un-enriched dermatology datasets (Fitzpatrick17k, PH², SKINL2, and PAD-UFES-20).

The Fitzpatrick Skin Type scale was originally developed as a photo-typing scale to measure the response of skin types to UV radiation, and it is widely used in dermatology research. The Monk Skin Tone scale is a newer 10-shade scale that measures skin tone rather than skin phototype, capturing more nuanced differences between the darker skin tones. While neither scale was intended for retrospective estimation using images, the inclusion of these labels is intended to enable future research into skin type and tone representation in dermatology. For example, the SCIN dataset provides an initial benchmark for the distribution of these skin types and tones in the US population.

The SCIN dataset has a high representation of women and younger individuals, likely reflecting a combination of factors. These could include differences in skin condition incidence, propensity to seek health information online, and variations in willingness to contribute to research across demographics.

Crowdsourcing method

To create the SCIN dataset, we used a novel crowdsourcing method, which we describe in the accompanying research paper co-authored with investigators at Stanford Medicine. This approach empowers individuals to play an active role in healthcare research. It allows us to reach people at earlier stages of their health concerns, potentially before they seek formal care. Crucially, this method uses advertisements on web search result pages — the starting point for many people’s health journey — to connect with participants.

Our results demonstrate that crowdsourcing can yield a high-quality dataset with a low spam rate. Over 97.5% of contributions were genuine images of skin conditions. After performing further filtering steps to exclude images that were out of scope for the SCIN dataset and to remove duplicates, we were able to release nearly 90% of the contributions received over the 8-month study period. Most images were sharp and well-exposed. Approximately half of the contributions include self-reported demographics, and 80% contain self-reported information relating to the skin condition, such as texture, duration, or other symptoms. We found that dermatologists’ ability to retrospectively assign a differential diagnosis depended more on the availability of self-reported information than on image quality.

Dermatologist confidence in their labels (scale from 1-5) depended on the availability of self-reported demographic and symptom information.

While perfect image de-identification can never be guaranteed, protecting the privacy of individuals who contributed their images was a top priority when creating the SCIN dataset. Through informed consent, contributors were made aware of potential re-identification risks and advised to avoid uploading images with identifying features. Post-submission privacy protection measures included manual redaction or cropping to exclude potentially identifying areas, reverse image searches to exclude publicly available copies and metadata removal or aggregation. The SCIN Data Use License prohibits attempts to re-identify contributors.

We hope the SCIN dataset will be a helpful resource for those working to advance inclusive dermatology research, education, and AI tool development. By demonstrating an alternative to traditional dataset creation methods, SCIN paves the way for more representative datasets in areas where self-reported data or retrospective labeling is feasible.

Acknowledgements

We are grateful to all our co-authors Abbi Ward, Jimmy Li, Julie Wang, Sriram Lakshminarasimhan, Ashley Carrick, Bilson Campana, Jay Hartford, Pradeep Kumar S, Tiya Tiyasirisokchai, Sunny Virmani, Renee Wong, Yossi Matias, Greg S. Corrado, Dale R. Webster, Dawn Siegel (Stanford Medicine), Steven Lin (Stanford Medicine), Justin Ko (Stanford Medicine), Alan Karthikesalingam and Christopher Semturs. We also thank Yetunde Ibitoye, Sami Lachgar, Lisa Lehmann, Javier Perez, Margaret Ann Smith (Stanford Medicine), Rachelle Sico, Amit Talreja, Annisah Um’rani and Wayne Westerlind for their essential contributions to this work. Finally, we are grateful to Heather Cole-Lewis, Naama Hammel, Ivor Horn, Michael Howell, Yun Liu, and Eric Teasley for their insightful comments on the study design and manuscript.

Read More

NVIDIA, Huang Win Top Honors in Innovation, Engineering

NVIDIA today was named the world’s most innovative company by Fast Company magazine.

The accolade comes on the heels of company founder and CEO Jensen Huang being inducted into the U.S. National Academy of Engineering.

A team of several dozen journalists at Fast Company — a business media brand launched in 1995 by two Harvard Business Review editors — ranked NVIDIA the leader in its 2024 list of the world’s 50 most innovative companies.

“Putting AI to Work”

“Nvidia isn’t just in the business of providing ever-more-powerful computing hardware and letting everybody else figure out what to do with it,” Fast Company wrote in an article detailing its selection.

“Across an array of industries, the company’s technologies, platforms, and partnerships are doing much of the heavy lifting of putting AI to work,” the article continued, citing advances in automotive, drug discovery, gaming and retail announced in one recent week.

The article noted the central role of the CUDA compute platform. It also shared an eye-popping experience using NVIDIA Omniverse to interact with a digital twin of a Nissan Z sport coupe.

In a League With Giants

“Even for AI’s titans, building on what Nvidia has created — the more ambitiously, the better — is often how progress happens,” the article concluded.

Last year, OpenAI led the list for ChatGPT, the large language model that started a groundswell in generative AI. In 2021, Moderna and Pfizer-BioNTech topped the ranking for rapidly developing a Covid vaccine.

Fast Company bases its ranking on four criteria: innovation, impact, timeliness and relevance. Launched in 2008, the annual list recognizes organizations that have introduced groundbreaking products, fostered positive social impact and reshaped industries.

NVIDIA invented the GPU in 1999, redefining computer graphics and igniting the era of modern AI. NVIDIA is now driving the platform shift to accelerated computing and generative AI, transforming the world’s largest industries.

Fueling an AI Revolution

Last month, Huang was elected to the National Academy of Engineering (NAE) for contributions in “high-powered graphics processing units, fueling the artificial intelligence revolution.”

Academy membership honors those who have made outstanding contributions such as pioneering new fields of technology. Founded in 1964, the NAE provides a trusted source of engineering advice for creating a healthier, more secure and sustainable world.

“Jensen Huang’s induction into the National Academy of Engineering is a testament to his enduring contributions to our industry and world,” said Satya Nadella, chairman and CEO of Microsoft.

“His visionary leadership has forever transformed computing, from the broad adoption of advanced 3D graphics to today’s GPUs — and, more importantly, has driven foundational innovations across every sector, from gaming and productivity, to digital biology and healthcare. All of us at Microsoft congratulate Jensen on this distinction, and we are honored to partner with him and the entire NVIDIA team on defining this new era of AI,” he added.

“Jensen’s election is incredibly well deserved,” said John Hennessy, president emeritus of Stanford University and an NAE member since 1992.

“His election recognizes both his transformative technical contributions, as well as his incredible leadership of NVIDIA for almost 30 years. I have seen many NAE nominations over the past 30 years, Jensen’s was one of the best!”

Morris Chang, founder of Taiwan Semiconductor Manufacturing Co. and an NAE member since 2002, added his congratulations.

“Jensen is one of the most visionary engineers and charismatic business leaders I have had the pleasure to work with in the last three decades,” he said.

Huang is also a recipient of the Semiconductor Industry Association’s highest honor, the Robert N. Noyce Award, as well as honorary doctorate degrees from Taiwan’s National Chiao Tung University, National Taiwan University and Oregon State University.

Read More

Generation Sensation: New Generative AI and RTX Tools Boost Content Creation

Editor’s note: This post is part of our In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

Creators are getting a generative AI boost with tools announced at NVIDIA GTC, a global AI conference bringing together the brightest minds in AI content creation and accelerated computing.

Adobe Substance 3D Stager and Sampler via Adobe Firefly, the OBS 30.1 YouTube HDR Beta and NVIDIA Omniverse Audio2Face for iClone 8 will also receive sizable upgrades.

DLSS 3.5 with Ray Reconstruction is coming soon to the NVIDIA RTX Remix Open Beta, enabling modders to upgrade their projects with the power of AI. Sample this leap in high graphical fidelity with the new Portal With RTX update available on Steam with DLSS Ray Reconstruction, which provides enhanced ray-traced imagery. Learn more about the DLSS 3.5 update to Portal With RTX.

The March NVIDIA Studio Driver, optimizing the latest creative app updates, is available for download today.

A March of Creative App Upgrades

The Adobe Substance 3D Stager beta announced a new Generative Background feature — powered by Adobe Firefly — to create backdrops for rendered images. Stager’s Match Image tool uses machine learning to accurately place 3D models within the generated background, optimizing lighting and perspective for greater flexibility and realism.

Meanwhile, Substance 3D Sampler’s announced Text to Texture beta — also powered by Adobe Firefly — gives artists a new way to source texture imagery using only a description. All Text to Texture images are square and tileable with proper perspective, ready for material-creation workflows.

Learn more about both apps in the GTC session “Elevating 3D Concepts: GenAI-Infused Design.” Search the GTC session catalog and check out the “Content Creation / Rendering / Ray Tracing” and “Generative AI” topics for additional creator-focused sessions.

The recently launched OBS 30.1 beta will enable content creators to use Real-Time Messaging Protocol — an Adobe open-source protocol designed to stream audio and video by maintaining low-latency connections — to stream high-dynamic range, high-efficiency video coding content to YouTube. Download OBS Beta 30.1 on the OBS website to get started.

NVIDIA Omniverse Audio2Face for iClone 8 uses AI to produce expressive facial animations solely from audio input. In addition to generating natural lip-sync animations for multilingual dialogue, the latest standalone release supports multilingual lip-sync and singing animations, as well as full-spectrum editing with slider controls and a keyframe editor.

For more information on how RTX is powering premium AI capabilities and performance, check out the new AI Decoded blog series and sign up to receive updates weekly.

Follow NVIDIA Studio on Instagram, X and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter. 

Read More

“We Created a Processor for the Generative AI Era,” NVIDIA CEO Says

Generative AI promises to revolutionize every industry it touches — all that’s been needed is the technology to meet the challenge.

NVIDIA CEO Jensen Huang on Monday introduced that technology — the company’s new Blackwell computing platform — as he outlined the major advances that increased computing power can deliver for everything from software to services, robotics to medical technology, and more.

“Accelerated computing has reached the tipping point — general purpose computing has run out of steam,” Huang told more than 11,000 GTC attendees gathered in-person — and many tens of thousands more online — for his keynote address at Silicon Valley’s cavernous SAP Center arena.

“We need another way of doing computing — so that we can continue to scale so that we can continue to drive down the cost of computing, so that we can continue to consume more and more computing while being sustainable. Accelerated computing is a dramatic speedup over general purpose computing, in every single industry.”

Huang spoke in front of massive images on a 40-foot tall, 8k screen the size of a tennis court to a crowd packed with CEOs and developers, AI enthusiasts and entrepreneurs, who walked together 20 minutes to the arena from the San Jose Convention Center on a dazzling spring day.

Delivering a massive upgrade to the world’s AI infrastructure, Huang introduced the NVIDIA Blackwell platform to unleash real-time generative AI on trillion-parameter large language models.

Huang presented NVIDIA NIM — a reference to NVIDIA inference microservices — a new way of packaging and delivering software that connects developers with hundreds of millions of GPUs to deploy custom AI of all kinds.

And bringing AI into the physical world, Huang introduced Omniverse Cloud APIs to deliver advanced simulation capabilities.

Huang punctuated these major announcements with powerful demos, partnerships with some of the world’s largest enterprises, and more than a score of announcements detailing his vision.

GTC — which in 15 years has grown from the confines of a local hotel ballroom to the world’s most important AI conference — is returning to a physical event for the first time in five years.

This year’s event has over 900 sessions — including a panel discussion on transformers moderated by Huang with the eight pioneers who first developed the technology — more than 300 exhibits, and 20-plus technical workshops.

It’s an event that’s at the intersection of AI and just about everything. In a stunning opening act to the keynote, Refik Anadol, the world’s leading AI artist, showed a massive real-time AI data sculpture with wave-like swirls in greens, blues, yellows and reds, crashing, twisting and unraveling across the screen.

As he kicked off his talk, Huang explained that the rise of multi-modal AI — able to process diverse data types handled by different models — gives AI greater adaptability and power. By increasing their parameters, these models can handle more complex analyses.

But this also means a significant rise in the need for computing power. And as these collaborative, multi-modal systems become more intricate — with as many as a trillion parameters — the demand for advanced computing infrastructure intensifies.

“We need even larger models,” Huang said. “We’re going to train it with multimodality data, not just text on the internet, we’re going to train it on texts and images, graphs and charts, and just as we learned watching TV, there’s going to be a whole bunch of watching video.”

The Next Generation of Accelerated Computing

In short, Huang said “we need bigger GPUs.” The Blackwell platform is built to meet this challenge. Huang pulled a Blackwell chip out of his pocket and held it up side-by-side with a Hopper chip, which it dwarfed.

Named for David Harold Blackwell — a University of California, Berkeley mathematician specializing in game theory and statistics, and the first Black scholar inducted into the National Academy of Sciences — the new architecture succeeds the NVIDIA Hopper architecture, launched two years ago.

Blackwell delivers 2.5x its predecessor’s performance in FP8 for training, per chip, and 5x with FP4 for inference. It features a fifth-generation NVLink interconnect that’s twice as fast as Hopper and scales up to 576 GPUs.

And the NVIDIA GB200 Grace Blackwell Superchip connects two Blackwell NVIDIA B200 Tensor Core GPUs to the NVIDIA Grace CPU over a 900GB/s ultra-low-power NVLink chip-to-chip interconnect.

Huang held up a board with the system. “This computer is the first of its kind where this much computing fits into this small of a space,” Huang said. “Since this is memory coherent, they feel like it’s one big happy family working on one application together.”

For the highest AI performance, GB200-powered systems can be connected with the NVIDIA Quantum-X800 InfiniBand and Spectrum-X800 Ethernet platforms, also announced today, which deliver advanced networking at speeds up to 800Gb/s.

“The amount of energy we save, the amount of networking bandwidth we save, the amount of wasted time we save, will be tremendous,” Huang said. “The future is generative…which is why this is a brand new industry. The way we compute is fundamentally different. We created a processor for the generative AI era.”

To scale up Blackwell, NVIDIA built a new chip called NVLink Switch. Each one can connect four NVLinks at 1.8 terabytes per second and eliminate traffic by doing in-network reduction.

NVLink Switch and GB200 are key components of what Huang described as “one giant GPU,” the NVIDIA GB200 NVL72, a multi-node, liquid-cooled, rack-scale system that harnesses Blackwell to offer supercharged compute for trillion-parameter models, with 720 petaflops of AI training performance and 1.4 exaflops of AI inference performance in a single rack.

“There are only a couple, maybe three exaflop machines on the planet as we speak,” Huang said of the machine, which packs 600,000 parts and weighs 3,000 pounds. “And so this is an exaflop AI system in one single rack. Well let’s take a look at the back of it.”

Going even bigger, NVIDIA today also announced its next-generation AI supercomputer — the NVIDIA DGX SuperPOD powered by NVIDIA GB200 Grace Blackwell Superchips — for processing trillion-parameter models with constant uptime for superscale generative AI training and inference workloads.

Featuring a new, highly efficient, liquid-cooled rack-scale architecture, the new DGX SuperPOD is built with NVIDIA DGX GB200 systems and provides 11.5 exaflops of AI supercomputing at FP4 precision and 240 terabytes of fast memory — scaling to more with additional racks.

“In the future, data centers are going to be thought of…as AI factories,” Huang said. “Their goal in life is to generate revenues, in this case, intelligence.”

The industry has already embraced Blackwell.

The press release announcing Blackwell includes endorsements from Alphabet and Google CEO Sundar Pichai, Amazon CEO Andy Jassy, Dell CEO Michael Dell, Google DeepMind CEO Demis Hassabis, Meta CEO Mark Zuckerberg, Microsoft CEO Satya Nadella, OpenAI CEO Sam Altman, Oracle Chairman Larry Ellison, and Tesla and xAI CEO Elon Musk.

Blackwell is being adopted by every major global cloud services provider, pioneering AI companies, system and server vendors, and regional cloud service providers and telcos all around the world.

“The whole industry is gearing up for Blackwell,” which Huang said would be the most successful launch in the company’s history.

A New Way to Create Software

Generative AI changes the way applications are written, Huang said.

Rather than writing software, he explained, companies will assemble AI models, give them missions, give examples of work products, review plans and intermediate results.

These packages — NVIDIA NIMs, a reference to NVIDIA inference microservices — are built from NVIDIA’s accelerated computing libraries and generative AI models, Huang explained.

“How do we build software in the future? It is unlikely that you’ll write it from scratch or write a whole bunch of Python code or anything like that,” Huang said. “It is very likely that you assemble a team of AIs.”

The microservices support industry-standard APIs so they are easy to connect, work across NVIDIA’s large CUDA installed base, are re-optimized for new GPUs, and are constantly scanned for security vulnerabilities and exposures.

Huang said customers can use NIM microservices off-the-shelf, or NVIDIA can help build proprietary AI and co-pilots, teaching a model specialized skills only your company would know to create invaluable new services.

“The enterprise IT industry is sitting on a goldmine,” Huang said. “They have all these amazing tools (and data) that have been created over the years. If they could take that goldmine and turn it into copilots, these copilots can help us do things.”

Major tech players are already putting it to work. Huang detailed how NVIDIA is already helping Cohesity, NetApp, SAP, ServiceNow, and Snowflake build co-pilots and virtual assistants. And industries are stepping in, as well.

In telecoms, Huang announced the NVIDIA 6G research cloud, a generative AI and Omniverse-powered platform to advance the next communications era. It’s built with NVIDIA’s Sionna neural radio framework, NVIDIA Aerial CUDA-accelerated radio access network and the NVIDIA Aerial Omniverse Digital Twin for 6G.

In semiconductor design and manufacturing, Huang announced that, in collaboration with TSMC and Synopsys, NVIDIA is bringing its breakthrough computational lithography platform, cuLitho, to production. This platform will accelerate the most compute-intensive workload in semiconductor manufacturing by 40-60x.

Huang also announced the NVIDIA Earth Climate Digital Twin. The cloud platform — available now — enables interactive, high-resolution simulation to accelerate climate and weather prediction.

The greatest impact of AI will be in healthcare, Huang said, explaining that NVIDIA is already in imaging systems, in gene sequencing instruments and working with leading surgical robotics companies.

NVIDIA is launching a new type of biology software. NVIDIA today launched more than two dozen new microservices that allow healthcare enterprises worldwide to take advantage of the latest advances in generative AI from anywhere and on any cloud. They offer advanced imaging, natural language and speech recognition, and digital biology generation, prediction and simulation.

Omniverse Brings AI to the Physical World

The next wave of AI will be AI learning about the physical world, Huang said.

“We need a simulation engine that represents the world digitally for the robot so that the robot has a gym to go learn how to be a robot,” he said. “We call that virtual world Omniverse.”

That’s why NVIDIA today announced that NVIDIA Omniverse Cloud will be available as APIs, extending the reach of the world’s leading platform for creating industrial digital twin applications and workflows across the entire ecosystem of software makers.

The five new Omniverse Cloud application programming interfaces enable developers to easily integrate core Omniverse technologies directly into existing design and automation software applications for digital twins, or their simulation workflows for testing and validating autonomous machines like robots or self-driving vehicles.

To show how this works, Huang shared a demo of a robotic warehouse — using multi-camera perception and tracking — watching over workers and orchestrating robotic forklifts, which are driving autonomously with the full robotic stack running.

Huang also announced that NVIDIA is bringing Omniverse to Apple Vision Pro, with the new Omniverse Cloud APIs letting developers stream interactive industrial digital twins into the VR headsets.

Some of the world’s largest industrial software makers are embracing Omniverse Cloud APIs, including Ansys, Cadence, Dassault Systèmes for its 3DEXCITE brand, Hexagon, Microsoft, Rockwell Automation, Siemens and Trimble.

Robotics

Everything that moves will be robotic, Huang said, and the automotive industry will be a big part of that: NVIDIA computers are already in cars, trucks, delivery bots and robotaxis.

Huang announced that BYD, the world’s largest EV maker, has selected NVIDIA’s next-generation computer for its autonomous driving, building its next-generation EV fleets on DRIVE Thor.

To help robots better see their environment, Huang also announced the Isaac Perceptor software development kit, with state-of-the-art multi-camera visual odometry, 3D reconstruction, occupancy mapping and depth perception.

And to help make manipulators, or robotic arms, more adaptable, NVIDIA is announcing Isaac Manipulator — a state-of-the-art robotic arm perception, path planning and kinematic control library.

Finally, Huang announced Project GR00T, a general-purpose foundation model for humanoid robots, designed to further the company’s work driving breakthroughs in robotics and embodied AI.

Supporting that effort, Huang unveiled a new computer, Jetson Thor, for humanoid robots based on the NVIDIA Thor system-on-a-chip and significant upgrades to the NVIDIA Isaac robotics platform.

In his closing minutes, Huang brought on stage a pair of diminutive NVIDIA-powered robots from Disney Research.

“The soul of NVIDIA — the intersection of computer graphics, physics, artificial intelligence,” he said. “It all came to bear at this moment.”

All Aboard: NVIDIA Scores 23 World Records for Route Optimization

With nearly two dozen world records to its name, NVIDIA cuOpt now holds the top spot on 100% of the largest routing benchmarks of the last three years, meaning the route optimization engine lets industries hop on board for all kinds of cost-saving efficiencies.

Kawasaki Heavy Industries and SyncTwin are among the companies that are riding cuOpt for logistics improvements.

Today at GTC 2024, NVIDIA founder and CEO Jensen Huang announced that cuOpt is moving into general availability.

“With cuOpt, NVIDIA is reinventing logistics management and operations research. It is NVIDIA’s pre-quantum computer, driving transformational operational efficiencies for deliveries, service calls, warehouses and factories, and supply chains,” he said.

The NVIDIA cuOpt microservice, part of the NVIDIA AI Enterprise software platform, makes accelerated optimization for real-time dynamic rerouting, factory optimization and robotic simulations available to any organization.
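
To make the idea concrete, here’s a rough sketch of what handing the microservice a small vehicle routing problem could look like. The endpoint path and JSON schema below are illustrative assumptions, not the documented cuOpt API, and the cost matrix is toy-sized.

```python
# Rough sketch: submitting a small vehicle routing problem to a
# hypothetical cuOpt-style microservice. The endpoint path and JSON
# schema are assumptions for illustration, not the documented cuOpt API.
import requests

problem = {
    # Travel cost (e.g., minutes) between a depot (0) and three stops.
    "cost_matrix": [
        [0, 10, 15, 20],
        [10, 0, 35, 25],
        [15, 35, 0, 30],
        [20, 25, 30, 0],
    ],
    "fleet": {"vehicle_count": 2, "start_location": 0},
    "tasks": {"locations": [1, 2, 3]},  # stops that must each be visited once
}

resp = requests.post("http://localhost:5000/optimize-routes", json=problem, timeout=60)
resp.raise_for_status()
for route in resp.json()["routes"]:  # assumed response shape
    print(route)
```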

Companies can embed cuOpt into the advanced 3D tools, applications and USD-based workflows they develop with NVIDIA Omniverse, a software platform for developing and deploying advanced 3D applications and pipelines based on OpenUSD.

Implemented together, cuOpt, Omniverse and NVIDIA Metropolis for Factories can help optimize and create safe environments in logistics-heavy facilities that rely on complex automation, precise material flow and human-robot interaction, such as automotive factories, semiconductor fabs and warehouses.

cuOpt has been continuously tested against the best-known solutions on the most studied benchmarks for route optimization, with results up to 100x faster than CPU-based implementations. With 15 records from the Gehring & Homberger vehicle routing benchmark and eight from the Li & Lim pickup and delivery benchmark, cuOpt has demonstrated the world’s best accuracy with the fastest times.

AI promises to deliver logistics efficiencies spanning from transportation networks to manufacturing and much more.

Delivering Cost-Savings for Inspections With cuOpt

Kawasaki Heavy Industries is a manufacturing company that’s been building large machinery for more than a hundred years. The Japanese company partnered with Slalom and used cuOpt to create routing efficiencies for the development of its AI-driven Kawasaki Track Maintenance Platform.

Railroad track maintenance is getting an AI makeover worldwide. Traditionally, track inspection and maintenance are time-consuming and difficult to manage while keeping trains running on time. But track maintenance is critical for safety and transportation service. Railway companies are automating track inspections with AI and machine learning paired with digital cameras, lasers and gyrometric sensors.

Kawasaki is harnessing the edge computing of NVIDIA Jetson AGX Orin to run track inspections onboard trains with its Track Maintenance Platform. The platform enables customers to improve vision models with the data collected on tracks, advancing the inspection capability of the edge-based AI system.

The platform provides maintenance teams with data on track conditions that allows them to prioritize repairs, increasing the safety and reliability of operations.

According to Kawasaki, such an AI-driven system is estimated to save seven companies a combined $218 million a year by automating their track inspections.

Creating Manufacturing Efficiencies With cuOpt and Omniverse

A worldwide leader in automotive seating manufacturing has adopted SyncTwin’s digital twin capability, which is driven by Omniverse and cuOpt, to improve its operations with AI.

The global automotive seating manufacturer has a vast network of loading docks for the delivery of raw materials, and forklifts for unloading and transporting them to storage and handling areas to ensure a steady supply to production lines. SyncTwin’s connection to cuOpt delivers routing efficiencies that optimize all of these moving parts — from vehicles to robotic pallet jacks.

Because the SyncTwin solution was developed on top of Omniverse and USD, manufacturers can ensure that their various factory planning tools contribute to the creation of a rich digital twin environment. Plus, they eliminate tedious manual data collection and gain new insights from previously disconnected data.

Attend GTC to explore how cuOpt is achieving world-record accuracy and performance to solve complex problems. Learn more about cuOpt world records in our tech blog. Learn more about Omniverse.

All Eyes on AI: Automotive Tech on Full Display at GTC 2024

All eyes across the auto industry are on GTC — the global AI conference running in San Jose, Calif., and online through Thursday, March 21 — as the world’s top automakers and tech leaders converge to showcase the latest models, demo new technologies and dive into the remarkable innovations reshaping the sector.

Attendees will experience how generative AI and software-defined computing are advancing the automotive landscape and transforming the behind-the-wheel experience to become safer, smarter and more enjoyable.

Automakers Adopting NVIDIA DRIVE Thor

NVIDIA founder and CEO Jensen Huang kicked off GTC with a keynote address in which he revealed that NVIDIA DRIVE Thor, which combines advanced driver assistance technology and in-vehicle infotainment, now features the newly announced NVIDIA Blackwell GPU architecture for transformer and generative AI workloads.

Following the keynote, top EV makers shared how they will integrate DRIVE Thor into their vehicles. BYD, the world’s largest electric vehicle maker, is expanding its ongoing collaboration with NVIDIA and building its next-generation EV fleets on DRIVE Thor. Hyper, a premium luxury brand owned by GAC AION, is announcing it has selected DRIVE Thor for its new models, which will begin production in 2025. XPENG will use DRIVE Thor as the AI brain of its next-generation EV fleets. These EV makers join Li Auto and ZEEKR, which previously announced they’re building their future vehicle roadmaps on DRIVE Thor.

Additionally, makers of trucks, robotaxis and goods delivery vehicles are announcing support for DRIVE Thor. Nuro is choosing DRIVE Thor to power the Nuro Driver. Plus is announcing that future generations of its level 4 solution, SuperDrive, will run on DRIVE Thor. Waabi is leveraging DRIVE Thor to deliver the first generative AI-powered autonomous trucking solution to market. WeRide, in cooperation with tier 1 partner Lenovo Vehicle Computing, is creating level 4 autonomous driving solutions for commercial applications built on DRIVE Thor.

And DeepRoute.ai is unveiling its new smart driving architecture powered by NVIDIA DRIVE Thor, scheduled to launch next year.

Next-Generation Tech on the Show Floor

The GTC exhibit hall is buzzing with excitement as companies showcase the newest vehicle models and offer technology demonstrations.

Attendees have the opportunity to see firsthand the latest NVIDIA-powered vehicles on display, including Lucid Air, Mercedes-Benz Concept CLA Class, Nuro R3, Polestar 3, Volvo EX90, WeRide Robobus, and an Aurora truck. The Lucid Air is available for test drives during the week.

A wide array of companies are showcasing innovative automotive technology at GTC, including Foretellix, Luminar and MediaTek, which is launching its Dimensity Auto Cockpit chipsets at the show. The new solutions harness NVIDIA’s graphics and AI technologies to help deliver state-of-the-art in-vehicle user experiences, along with added safety and security capabilities.

Also Announced at GTC: Omniverse Cloud APIs, Generative AI

  • Omniverse Cloud APIs, announced today at NVIDIA GTC, are poised to accelerate the path to autonomy by enabling high-fidelity sensor simulation for AV development and validation. Adoption by developers and software vendors such as CARLA, MathWorks, MITRE, Foretellix and Voxel51 underscores the broad appeal of these APIs for autonomous vehicle work.
  • Generative AI developers including Cerence, Geely, Li Auto, NIO, SoundHound, Tata Consultancy Services and Wayve announced plans to transform the in-vehicle experience by using NVIDIA’s cloud-to-edge technology to help develop intelligent AI assistants, driver and passenger monitoring, scene understanding and more.

AI and Automotive Sessions Available Live and on Demand

Throughout the week, the world’s foremost experts on automotive technology will lead a broad array of sessions and panels at GTC.

On DRIVE Developer Day, taking place Thursday, March 21, NVIDIA’s engineering experts will highlight the latest DRIVE features and developments through a series of deep-dive sessions on how to build safe and robust self-driving systems.

See the full schedule of automotive programming at GTC and be sure to tune in.

Generative AI Developers Harness NVIDIA Technologies to Transform In-Vehicle Experiences

Cars of the future will be more than just modes of transportation; they’ll be intelligent companions, seamlessly blending technology and comfort to enhance driving experiences, and built for safety, inside and out.

NVIDIA GTC, running this week at the San Jose Convention Center, will spotlight the groundbreaking work NVIDIA and its partners are doing to bring the transformative power of generative AI, large language models and visual language models to the mobility sector.

At its booth, NVIDIA will showcase how it’s building automotive assistants to enhance driver safety, security and comfort through enhanced perception, understanding and generative capabilities powered by deep learning and transformer models.

Talking the Talk

LLMs, a form of generative AI, are largely built on a class of deep-learning architectures known as transformer models, which are neural networks adept at learning context and meaning.

Vision language models, or VLMs, are another derivative of generative AI, offering both image processing and language understanding capabilities. Unlike traditional LLMs, which primarily process and generate text-based data, VLMs can analyze images or videos and generate text about them.
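
To make the distinction concrete, here’s a minimal sketch of the VLM pattern (an image goes in, descriptive text comes out), using the open-source Hugging Face transformers library rather than any NVIDIA-specific stack; the BLIP captioning model and the image path are example choices only.

```python
# Minimal sketch of the VLM pattern: an image goes in, text comes out.
# Uses the open-source Hugging Face transformers library; the BLIP
# captioning model and the image path are example choices only.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

result = captioner("dashcam_frame.jpg")  # placeholder path to a local image
print(result[0]["generated_text"])       # e.g., "a car driving down a wet road"
```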

And retrieval-augmented generation, or RAG, allows manufacturers to access knowledge from a specific database or the web to assist drivers.
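
The core RAG loop is simple: retrieve the passages most relevant to a question from a trusted source, then fold them into the model’s prompt. Below is a deliberately toy-sized, dependency-free sketch; the manual snippets are invented, retrieval is naive keyword overlap rather than the embedding search a production system would use, and generate() stands in for whatever LLM endpoint a manufacturer deploys.

```python
# Toy sketch of retrieval-augmented generation (RAG) for an in-car
# assistant. Retrieval here is naive keyword overlap; a production
# system would use embeddings and a vector store. The manual snippets
# are invented, and generate() is a hypothetical LLM call.

MANUAL = [
    "Tire pressure should be checked monthly and kept at 36 psi.",
    "In this region, headlights must be on whenever wipers are in use.",
    "Adaptive cruise control is activated via the left stalk button.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many words they share with the query."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(retrieve(query, MANUAL))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("When do my headlights need to be on?"))
# answer = generate(build_prompt(...))  # hypothetical call to the deployed LLM
```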

Together, these technologies enable NVIDIA Avatar Cloud Engine, or ACE, and multimodal language models to work with the NVIDIA DRIVE platform, letting automotive manufacturers develop their own intelligent in-car assistants.

For example, an avatar configurator can allow designers to build unique, brand-inspired personas for their cars, complete with customized voices and emotional attributes. These AI-animated avatars can engage in natural dialogue, providing real-time assistance, recommendations and personalized interactions.

Furthermore, AI-powered surround visualization improves vehicle safety using 360-degree camera reconstruction, while the intelligent assistant sources external information, such as local driving laws, to inform decision-making.

Personalization is paramount, with AI assistants learning driver and passenger habits and adapting their behavior to suit occupants’ needs.

Generative AI for Automotive in Full Force at GTC 

Several NVIDIA partners at GTC are also showcasing their latest generative AI developments using NVIDIA’s edge-to-cloud technology:

  • Cerence’s CaLLM is an automotive-specific LLM that serves as the foundation for the company’s next-gen in-car computing platform, running on NVIDIA DRIVE. The platform, unveiled late last year, is the future of in-car interaction, with an automotive- and mobility-specific assistant that provides an integrated in-cabin experience. Cerence is collaborating with NVIDIA engineering teams for deeper integration of CaLLM with the NVIDIA AI Foundation Models. Through joint efforts, Cerence is harnessing NVIDIA DGX Cloud as the development platform, applying guardrails for enhanced performance, and leveraging NVIDIA AI Enterprise to optimize inference. NVIDIA and Cerence will continue to partner and pioneer this solution together with several automotive OEMs this year.
  • Wayve is helping usher in a new era of embodied AI for autonomy. Its next-generation AV2.0 approach is characterized by a large embodied AI foundation model that learns to drive in a self-supervised, end-to-end fashion, from sensor inputs to driving actions. The British startup has already unveiled its GAIA-1, a generative world model for AV development running on NVIDIA, alongside LINGO-1, a closed-loop driving commentator that uses natural language to enhance the learning and explainability of AI driving models.
  • Li Auto unveiled its multimodal cognitive model, Mind GPT, in June. Built on NVIDIA TensorRT-LLM, an open-source library, it serves as the basis for the electric vehicle maker’s AI assistant, Lixiang Tongxue, providing scene understanding, generation, knowledge retention and reasoning capabilities. Li Auto is currently developing DriveVLM to enhance autonomous driving capabilities, enabling the system to understand complex scenarios, particularly those that are challenging for traditional AV pipelines, such as unstructured roads, rare and unusual objects, and unexpected traffic events. The model is trained on NVIDIA GPUs and utilizes TensorRT-LLM and NVIDIA Triton Inference Server for data generation in the data center. With inference optimized by NVIDIA DRIVE and TensorRT-LLM, DriveVLM performs efficiently on embedded systems.
  • NIO launched its NOMI GPT, which offers a number of functional experiences, including NOMI Encyclopedia Q&A, Cabin Atmosphere Master and Vehicle Assistant. With the capabilities enabled by LLMs and an efficient computing platform powered by NVIDIA AI stacks, NOMI GPT is capable of basic speech recognition and command execution functions and can use deep learning to understand and process more complex sentences and instructions inside the car.
  • Geely is working with NVIDIA to provide intelligent cabin experiences, along with accelerated edge-to-cloud deployment. Specifically, Geely is applying generative AI and LLM technology to provide smarter, personalized and safer driving experiences, using natural language processing, dialogue systems and predictive analytics for intelligent navigation and voice assistants. When deploying LLMs into production, Geely uses NVIDIA TensorRT-LLM to achieve highly efficient inference. For more complex tasks or scenarios requiring massive data support, Geely plans to deploy large-scale models in the cloud.
  • Waabi is building AI for self-driving and will use the generative AI capabilities afforded by NVIDIA DRIVE Thor for its breakthrough autonomous trucking solutions, bringing safe and reliable autonomy to the trucking industry.
  • Lenovo is unveiling a new AI acceleration engine, dubbed UltraBoost, which will run on NVIDIA DRIVE, and features an AI model engine and AI compiler tool chains to facilitate the deployment of LLMs within vehicles.
  • SoundHound AI is using NVIDIA to run its in-vehicle voice interface — which combines both real-time and generative AI capabilities — even when a vehicle has no cloud connectivity. This solution also offers drivers access to SoundHound’s Vehicle Intelligence product, which instantly delivers settings, troubleshooting and other information directly from the car manual and other data sources via natural speech, as opposed to through a physical document.
  • Tata Consultancy Services (part of the Tata Group), through its AI-based technology and engineering innovation, has built an automotive GenAI suite powered by NVIDIA GPUs and software frameworks. It accelerates the design, development and validation of software-defined vehicles, leveraging various LLMs and VLMs for in-vehicle and cloud-based systems.
  • MediaTek is announcing four automotive systems-on-a-chip within its Dimensity Auto Cockpit portfolio, offering powerful AI-based in-cabin experiences for the next generation of intelligent vehicles that span from premium to entry level. To support deep learning capabilities, the Dimensity Auto Cockpit chipsets integrate NVIDIA’s next-gen GPU-accelerated AI computing and NVIDIA RTX-powered graphics to run LLMs in the car, allowing vehicles to support chatbots, rich content delivery to multiple displays, driver alertness detection and other AI-based safety and entertainment applications.

Check out the many automotive talks on generative AI and LLMs throughout the week of GTC.

Register today to attend GTC in person, or tune in virtually, to explore how generative AI is making transportation safer, smarter and more enjoyable.
