Introducing Fortuna: A library for uncertainty quantification

Proper estimation of predictive uncertainty is fundamental in applications that involve critical decisions. Uncertainty can be used to assess the reliability of model predictions, trigger human intervention, or decide whether a model can be safely deployed in the wild.

We introduce Fortuna, an open-source library for uncertainty quantification. Fortuna provides calibration methods, such as conformal prediction, that can be applied to any trained neural network to obtain calibrated uncertainty estimates. The library further supports a number of Bayesian inference methods that can be applied to deep neural networks written in Flax. The library makes it easy to run benchmarks and will enable practitioners to build robust and reliable AI solutions by taking advantage of advanced uncertainty quantification techniques.

The problem of overconfidence in deep learning

If you have ever looked at class probabilities returned by a trained deep neural network classifier, you might have observed that the probability of one class was much larger than the others. Something like this, for example:

p = [0.0001, 0.0002, …, 0.9991, 0.0003, …, 0.0001]

If this is the case for the majority of the predictions, your model might be overconfident. In order to evaluate the validity of the probabilities returned by the classifier, we may compare them with the actual accuracy achieved over a holdout data set. Indeed, it is natural to assume that the proportion of correctly classified data points should approximately match the estimated probability of the predicted class. This concept is known as calibration [Guo C. et al., 2017].

Unfortunately, many trained deep neural networks are miscalibrated, meaning that the estimated probability of the predicted class is much higher than the proportion of correctly classified input data points. In other words, the classifier is overconfident.

Being overconfident might be problematic in practice. A doctor may not order relevant additional tests, as a result of an overconfident healthy diagnosis produced by an AI. A self-driving car may decide not to brake because it confidently assessed that the object in front was not a person. A governor may decide to evacuate a town because the probability of an imminent natural disaster estimated by an AI is too high. In these and many other applications, calibrated uncertainty estimates are critical to assess the reliability of model predictions, fall back to a human decision-maker, or decide whether a model can be safely deployed.

Fortuna: A library for uncertainty quantification

There are many published techniques to either estimate or calibrate the uncertainty of predictions, e.g., Bayesian inference [Wilson A.G., 2020], temperature scaling [Guo C. et al., 2017], and conformal prediction [Angelopoulos A.N. et al., 2022] methods. However, existing tools and libraries for uncertainty quantification have a narrow scope and do not offer a breadth of techniques in a single place. This results in a significant overhead, hindering the adoption of uncertainty into production systems.

In order to fill this gap, we launch Fortuna, a library for uncertainty quantification that brings together prominent methods across the literature and makes them available to users with a standardized and intuitive interface.

As an example, suppose you have training, calibration, and test data loaders in tensorflow.Tensor format, namely train_data_loader, calib_data_loader and test_data_loader. Furthermore, you have a deep learning model written in Flax, namely model. Then you can use Fortuna to:

  1. fit a posterior distribution;
  2. calibrate the model outputs;
  3. make calibrated predictions;
  4. obtain uncertainty estimates;
  5. compute evaluation metrics.

The following code does all of this for you.

from fortuna.data import DataLoader
from fortuna.prob_model.classification import ProbClassifier
from fortuna.metric.classification import expected_calibration_error

# convert data loaders
train_data_loader = DataLoader.from_tensorflow_data_loader(train_data_loader)
calib_data_loader = DataLoader.from_tensorflow_data_loader(calib_data_loader)
test_data_loader = DataLoader.from_tensorflow_data_loader(test_data_loader)

# define and train a probabilistic model
prob_model = ProbClassifier(model=model)
train_status = prob_model.train(train_data_loader=train_data_loader, calib_data_loader=calib_data_loader)

# make predictions and estimate uncertainty
test_inputs_loader = test_data_loader.to_inputs_loader()
test_means = prob_model.predictive.mean(inputs_loader=test_inputs_loader)
test_modes = prob_model.predictive.mode(inputs_loader=test_inputs_loader, means=test_means)

# compute the expected calibration error
test_targets = test_data_loader.to_array_targets()
ece = expected_calibration_error(preds=test_modes, probs=test_means, targets=test_targets)

The code above makes use of several default choices, including SWAG [Maddox W.J. et al., 2019] as a posterior inference method, temperature scaling [Guo C. et al., 2017] to calibrate the model outputs, and a standard Gaussian prior distribution, as well as the configuration of the posterior fitting and calibration processes. You can easily configure all of these components, and you are highly encouraged to do so if you are looking for a specific configuration or if you want to compare several ones.

Usage modes

Fortuna offers three usage modes: 1/ Starting from uncertainty estimates, 2/ Starting from model outputs, and 3/ Starting from Flax models. Their pipelines are depicted in the following figure, each starting from one of the green panels. The code snippet above is an example of using Fortuna starting from Flax models, which allows training a model using Bayesian inference procedures. Alternatively, you can start either from model outputs or directly from your own uncertainty estimates. Both of these latter modes are framework independent and help you obtain calibrated uncertainty estimates starting from a trained model.

1/ Starting from uncertainty estimates

Starting from uncertainty estimates has minimal compatibility requirements, and it is the quickest level of interaction with the library. This usage mode offers conformal prediction methods for both classification and regression. These take uncertainty estimates in numpy.ndarray format and return rigorous sets of predictions that contain the true target with a user-specified probability. In one-dimensional regression tasks, conformal sets may be thought of as calibrated versions of confidence or credible intervals.

Keep in mind that if the uncertainty estimates you provide as inputs are inaccurate, conformal sets might be large and unusable. For this reason, if your application allows it, please consider the Starting from model outputs and Starting from Flax models usage modes detailed below.
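
For illustration, here is a minimal, framework-independent sketch of split conformal prediction for one-dimensional regression. It is not Fortuna's API; the prediction arrays below are random stand-ins for your own model's outputs, and in practice you would pass your uncertainty estimates to Fortuna's conformal methods instead.

import numpy as np

# Illustrative split conformal regression; not Fortuna's API.
# These arrays stand in for your own model's predictions on calibration and test data.
rng = np.random.default_rng(0)
calib_preds = rng.normal(size=500)
calib_targets = calib_preds + rng.normal(scale=0.3, size=500)
test_preds = rng.normal(size=100)

error = 0.05                                             # target miscoverage level
scores = np.abs(calib_targets - calib_preds)             # nonconformity scores
n = len(scores)
q_level = min(np.ceil((n + 1) * (1 - error)) / n, 1.0)   # finite-sample correction
q = np.quantile(scores, q_level)
lower, upper = test_preds - q, test_preds + q            # conformal intervals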

2/ Starting from model outputs

This mode assumes you have already trained a model in some framework and arrive at Fortuna with model outputs in numpy.ndarray format for each input data point. This usage mode allows you to calibrate your model outputs, estimate uncertainty, compute metrics, and obtain conformal sets.

Compared to the Starting from uncertainty estimates usage mode, Starting from model outputs provides better control, as it can make sure that uncertainty estimates have been appropriately calibrated. However, if the model was trained with classical methods, the resulting quantification of model (also known as epistemic) uncertainty may be poor. To mitigate this problem, please consider the Starting from Flax models usage mode.
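
To make the calibration idea concrete, below is a minimal sketch of temperature scaling applied directly to model outputs. It is not Fortuna's implementation: it assumes you have logits and labels for a calibration set (random stand-ins here) and fits a single temperature by minimizing the negative log-likelihood with SciPy.

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import log_softmax

# Illustrative temperature scaling on model outputs; not Fortuna's implementation.
# calib_logits and calib_labels stand in for your own calibration data.
rng = np.random.default_rng(0)
calib_logits = rng.normal(size=(1000, 10)) * 5.0
calib_labels = rng.integers(0, 10, size=1000)

def nll(temperature):
    # Negative log-likelihood of the calibration labels under rescaled logits.
    log_probs = log_softmax(calib_logits / temperature, axis=1)
    return -log_probs[np.arange(len(calib_labels)), calib_labels].mean()

result = minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded")
temperature = result.x
calibrated_probs = np.exp(log_softmax(calib_logits / temperature, axis=1))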

3/ Starting from Flax models

Starting from Flax models has higher compatibility requirements than the Starting from uncertainty estimates and Starting from model outputs usage modes, as it requires deep learning models written in Flax. However, it enables you to replace standard model training with scalable Bayesian inference procedures, which may significantly improve the quantification of predictive uncertainty.

Bayesian methods work by representing uncertainty over which solution is correct, given limited information, through uncertainty over model parameters. This type of uncertainty is called “epistemic” uncertainty. Because neural networks can represent many different solutions, corresponding to different settings of their parameters, Bayesian methods can be especially impactful in deep learning. We provide many scalable Bayesian inference procedures, which can often be used to provide uncertainty estimates, as well as improved accuracy and calibration, with essentially no training-time overhead.

Conclusion

We announced the general availability of Fortuna, a library for uncertainty quantification in deep learning. Fortuna brings together prominent methods across the literature, e.g., conformal methods, temperature scaling, and Bayesian inference, and makes them available to users with a standardized and intuitive interface. To get started with Fortuna, consult the documentation and the examples in its GitHub repository.

Try Fortuna out, and let us know what you think! You are encouraged to contribute to the library or share your suggestions: just create an issue or open a pull request. On our side, we will keep improving Fortuna, increasing its coverage of uncertainty quantification methods, and adding further examples that showcase its usefulness in several scenarios.


About the authors

Gianluca Detommaso is an Applied Scientist at AWS. He currently works on uncertainty quantification in deep learning. In his spare time, Gianluca likes practicing sports, eating great food, and learning new skills.

Alberto Gasparin is an Applied Scientist within Amazon Community Shopping since July 2021. His interests include natural language processing, information retrieval and uncertainty quantification. He is a food and wine enthusiast.

Michele Donini is a Sr Applied Scientist at AWS. He leads a team of scientists working on Responsible AI and his research interests are Algorithmic Fairness and Explainable Machine Learning.

Matthias Seeger is a Principal Applied Scientist at AWS.

Cedric Archambeau is a Principal Applied Scientist at AWS and Fellow of the European Lab for Learning and Intelligent Systems.

Andrew Gordon Wilson is an Associate Professor at the Courant Institute of Mathematical Sciences and Center for Data Science at New York University, and an Amazon Visiting Academic at AWS. He is particularly engaged in building methods for Bayesian and probabilistic deep learning, scalable Gaussian processes, Bayesian optimization, and physics-inspired machine learning.

Read More

Best practices for Amazon SageMaker Training Managed Warm Pools

Amazon SageMaker Training Managed Warm Pools gives you the flexibility to opt in to reuse and hold on to the underlying infrastructure for a user-defined period of time. This is done while still offloading the undifferentiated heavy lifting of managing compute instances to Amazon SageMaker Model Training. In this post, we outline the key benefits and pain points addressed by SageMaker Training Managed Warm Pools, as well as benchmarks and best practices.

Overview of SageMaker Training Managed Warm Pools

SageMaker Model Training is a fully managed capability that spins up instances for every job, trains a model, and then spins down the instances after the job completes. You’re only billed for the duration of the job, down to the second. This fully managed capability gives you the freedom to focus on your machine learning (ML) algorithm and not worry about undifferentiated heavy lifting like infrastructure management while training your models.

This mechanism necessitates a finite startup time for a training job. Although this startup time, also known as cold-start startup time, is fairly low, some of our most demanding customer use cases require even lower startup times, such as under 20 seconds. There are two prominent use cases that have these requirements:

  • The first is active ML experimentation by data scientists using the Amazon SageMaker training platform, especially while training large models, like GPT3, that require multiple iterations to get to a production-ready state.
  • The second is the programmatic launch of a large number (on the order of several hundreds or thousands) of consecutive jobs on the same kind of instances on a scheduled cadence, for example, parameter search or incremental training.

For such use cases, every second spent on overhead, like the startup time for a training job, has a cumulative effect on all these jobs.

With SageMaker Training Managed Warm Pools, data scientists and ML engineers have the ability to opt in to keep SageMaker training instances or multi-instance clusters warm for a prespecified and reconfigurable time (keep_alive_period_in_seconds) after each training job completes. So even though you incur a cold-start penalty for the first training job run on an instance or cluster, for all the subsequent training jobs, the instances are already up and running. As a result, these subsequent training jobs that start on an instance before the keep_alive_period_in_seconds expires don’t incur the cold-start startup time overhead. This can reduce training job startup times to roughly less than 20 seconds (P90).

Data scientists and ML engineers can use SageMaker Training Managed Warm Pools to keep single instances or multi-instance clusters warm between training runs for experimentation, or to run multiple jobs consecutively on the same single-instance or multi-instance cluster. You only pay for the duration of the training jobs and for the reconfigurable keep_alive_period_in_seconds that you specify for each instance.

In essence, with SageMaker Training Managed Warm Pools, you get a combination of SageMaker-managed instance utilization and the ability to opt in, provision capacity, and self-manage utilization for short intervals of time. These intervals are configured before a job, but if you need to reduce or increase the interval during the keep_alive_period_in_seconds window, you can do so. Increases can be made in increments of up to 60 minutes, with a maximum period of 7 days per instance or cluster.

To get started with warm pools, first request a warm pool quota limit increase, then specify the keep_alive_period_in_seconds parameter when starting a training job.
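
As a minimal sketch, the snippet below shows how the parameter can be set with the SageMaker Python SDK, assuming a recent SDK version with warm pool support; the image URI, role ARN, instance type, and S3 path are placeholders for your own values.

from sagemaker.estimator import Estimator

# Minimal sketch: opt in to a warm pool by setting keep_alive_period_in_seconds.
# The image URI, role ARN, and S3 path are placeholders.
estimator = Estimator(
    image_uri="<your-training-image-uri>",
    role="<your-sagemaker-execution-role-arn>",
    instance_count=1,
    instance_type="ml.c5.xlarge",
    keep_alive_period_in_seconds=1800,  # keep the instance warm for 30 minutes after the job
)
estimator.fit("s3://<your-bucket>/<training-data-prefix>/")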

Benchmarks

We performed benchmarking tests to measure job startup latency using a 1.34 GB TensorFlow image, 2 GB of data, and different training data input modes (Amazon FSx, Fast File Mode, File Mode). The tests were run across a variety of instance types from the m4, c4, m5, and c5 families in the us-east-2 Region. The startup latency was measured as the time from job creation to the start of the actual training job on the instances. The first jobs that started the cluster and created the warm pool had a startup latency of 2–3 minutes. This higher latency is due to the time taken to provision the infrastructure, download the image, and download the data. The subsequent jobs that utilized the warm pool cluster had a startup latency of approximately 20 seconds for Fast File Mode (FFM) or Amazon FSx, and 70 seconds for File Mode (FM). This delta is a result of FM requiring the entire dataset to be downloaded from Amazon S3 prior to the start of the job.

Your choice of training data input mode affects the startup time, even with warm pools. Guidance on which input mode to select is in the best practices section later in this post.

The following table summarizes the job startup latency P90 for different training data input modes.

Data Input Mode    First Job (seconds)    Warm Pool Jobs, second job onwards (seconds)
FSx                136                    19
Fast File Mode     143                    21
File Mode          176                    70

Best practices for using warm pools

In the following section, we share some best practices when using warm pools.

When should you use warm pools?

Warm pools are recommended in the following scenarios:

  • You are interactively experimenting and tuning your script over a series of short jobs.
  • You are running your own custom-made, large-scale hyperparameter optimization (for example, Syne Tune).
  • You have a batch process that runs a large number (on the order of several hundreds or thousands) of consecutive jobs on the same kind of instances on a daily or weekly cadence. For example, training an ML model per city.

Warm pools are not recommended when it’s unlikely that someone will reuse the warm pool before it expires. For example, a single lengthy job that runs via an automated ML pipeline.

Minimize warm pool training job startup latency

Training jobs that reuse a warm pool start faster than the first job that created the warm pool. This is due to keeping the ML instances running between jobs with a cached training container Docker image to skip pulling the container from Amazon Elastic Container Registry (Amazon ECR). However, even when reusing a warm pool, certain initialization steps occur for all jobs. Optimizing these steps can reduce your job startup time (both first and subsequent jobs). Consider the following:

  • Training data input mode can affect startup time – Managed training data input channels are recreated for each training job, contributing to job startup latency. So doing initial experiments over a smaller dataset will allow for faster startup time (and faster training time). For later stages of experimentation, when a large dataset is needed, consider using an input mode type that has minimal or fixed initialization time. For example, File Mode copies the entire dataset from Amazon Simple Storage Service (Amazon S3) to the training instance, which is time-consuming for large datasets (even with warm pools). Fast File Mode is better suited for lower startup latency because only S3 object metadata needs to be read from Amazon S3 before the workload can start. The Amazon FSx for Lustre or Amazon Elastic File System (Amazon EFS) file system input modes have a fixed initialization time regardless of the number of files in the file system, which is beneficial when working with a large dataset. (A minimal Fast File Mode configuration sketch follows this list.)
    For more information on how to choose an input channel, see Choose the best data source for your Amazon SageMaker training job.
  • Reduce runtime installation of packages – Any software installation that takes place during container startup, for example, Python’s pip or operating system apt-get, will increase training job latency. Minimizing this startup latency requires making a trade-off between the flexibility and simplicity of runtime installations vs. installation at container build time. If you use your own Docker container with SageMaker, refer to Adapting Your Own Docker Container to Work with SageMaker. If you rely on prebuilt SageMaker container images, you’ll need to extend a prebuilt container and explicitly manage these containers. Consider this if your runtime installs significantly increase startup latency.
  • Avoid updating your Docker image frequently – If you use your own Docker container with SageMaker, try to avoid updating it every job run. If the Docker image changes between the job submissions, the warm pool will be reused, but the startup process will need to re-pull the container image from Amazon ECR instead of reusing a cached container image. If the Docker image must be updated, confine the updates to the last Docker layer to take advantage of Docker layer caching. Ideally, you should remove the Dockerfile content that’s likely to change over iterations, like hyperparameters, dataset definitions, and the ML code itself. To iterate on ML code without having to rebuild Docker images with each change, you can adopt the framework container paradigm advocated in the SageMaker Training Toolkit. If you’d like to develop a framework container with your own code, refer to this Amazon SageMaker tutorial.
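
As referenced in the first item of this list, the following minimal sketch configures Fast File Mode through the SageMaker Python SDK; the S3 path is a placeholder, and the estimator is the one from the earlier warm pool sketch.

from sagemaker.inputs import TrainingInput

# Minimal sketch: stream training data with Fast File Mode to reduce startup latency.
# The S3 URI is a placeholder for your own dataset location.
train_input = TrainingInput(
    s3_data="s3://<your-bucket>/<training-data-prefix>/",
    input_mode="FastFile",  # read S3 object metadata up front, stream content on demand
)
estimator.fit({"training": train_input})  # reuses the estimator defined earlier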

Share warm pools between multiple users

When working with a large team of data scientists, you can share warm pools that have matching job criteria, such as the same AWS Identity and Access Management (IAM) role or container image.

Let’s look at an example timeline. User-1 starts a training job that completes and results in a new warm pool created. When user-2 starts a training job, the job will reuse the existing warm pool, resulting in a fast job startup. While user-2’s job is running with the warm pool in use, if another user starts a training job, then a second warm pool will be created.

This reuse behavior helps reduce costs by sharing warm pools between users that start similar jobs. If you want to avoid sharing warm pools between users, then users’ jobs must not have matching job criteria (for example, they must use a different IAM role).

Notify users on job completion

When using warm pools for experimentation, we recommend notifying users when their job is complete. This allows users to resume experimentation before the warm pool expires or stop the warm pool if it’s no longer needed. You can also automatically trigger notifications through Amazon EventBridge.

Further tools for fast experimentation and troubleshooting training jobs

With warm pools, you can start a job in less than 20 seconds. Some scenarios require real-time, hands-on interactive experimentation and troubleshooting. The open-source SageMaker SSH Helper library allows you to shell into a SageMaker training container and conduct remote development and debugging.

Conclusion

With SageMaker Training Managed Warm Pools, you can keep your model training hardware instances warm after every job for a specified period. This can reduce the startup latency for a model training job by up to 8x. SageMaker Training Managed Warm Pools are available in all public AWS Regions where SageMaker Model Training is available.

To get started, see Train Using SageMaker Managed Warm Pools.


About the authors

Dr. Romi Datta is a Senior Manager of Product Management in the Amazon SageMaker team responsible for training, processing and feature store. He has been in AWS for over 4 years, holding several product management leadership roles in SageMaker, S3 and IoT. Prior to AWS he worked in various product management, engineering and operational leadership roles at IBM, Texas Instruments and Nvidia. He has an M.S. and Ph.D. in Electrical and Computer Engineering from the University of Texas at Austin, and an MBA from the University of Chicago Booth School of Business.

Arun Nagarajan is a Principal Engineer with the Amazon SageMaker team focusing on the Training and MLOps areas. He has been with the SageMaker team since its launch year and has enjoyed contributing to different areas of SageMaker, including the real-time inference and Model Monitor products. He likes to explore the outdoors in the Pacific Northwest area and climb mountains.

Amy You is a Software Development Manager at AWS SageMaker. She focuses on bringing together a team of software engineers to build, maintain and develop new capabilities of the SageMaker Training platform that helps customers train their ML models more efficiently and easily. She has a passion for ML and AI technology, especially related to image and vision from her graduate studies. In her spare time, she loves working on music and art with her family.

Sifei Li is a Software Engineer in Amazon AI where she’s working on building Amazon Machine Learning Platforms and was part of the launch team for Amazon SageMaker. In her spare time, she likes playing music and reading.

Jenna Zhao is a Software Development Engineer at AWS SageMaker. She is passionate about ML/AI technology and has been focusing on building SageMaker Training platform that enables customers to quickly and easily train machine learning models. Outside of work, she enjoys traveling and spending time with her family.

Paras Mehra is a Senior Product Manager at AWS. He is focused on helping build Amazon SageMaker Training and Processing. In his spare time, Paras enjoys spending time with his family and road biking around the Bay Area. You can find him on LinkedIn.

Gili Nachum is a senior AI/ML Specialist Solutions Architect who works as part of the EMEA Amazon Machine Learning team. Gili is passionate about the challenges of training deep learning models, and how machine learning is changing the world as we know it. In his spare time, Gili enjoys playing table tennis.

Olivier Cruchant is a Machine Learning Specialist Solutions Architect at AWS, based in France. Olivier helps AWS customers – from small startups to large enterprises – develop and deploy production-grade machine learning applications. In his spare time, he enjoys reading research papers and exploring the wilderness with friends and family.

Emily Webber joined AWS just after SageMaker launched, and has been trying to tell the world about it ever since! Outside of building new ML experiences for customers, Emily enjoys meditating and studying Tibetan Buddhism.

Read More

Accelerating Text Generation with Confident Adaptive Language Modeling (CALM)

Language models (LMs) are the driving force behind many recent breakthroughs in natural language processing. Models like T5, LaMDA, GPT-3, and PaLM have demonstrated impressive performance on various language tasks. While multiple factors can contribute to improving the performance of LMs, some recent studies suggest that scaling up the model’s size is crucial for revealing emergent capabilities. In other words, some instances can be solved by small models, while others seem to benefit from increased scale.

Despite recent efforts that enabled the efficient training of LMs over large amounts of data, trained models can still be slow and costly for practical use. When generating text at inference time, most autoregressive LMs output content similar to how we speak and write (word after word), predicting each new word based on the preceding words. This process cannot be parallelized since LMs need to complete the prediction of one word before starting to compute the next one. Moreover, predicting each word requires significant computation given the model’s billions of parameters.

In “Confident Adaptive Language Modeling”, presented at NeurIPS 2022, we introduce a new method for accelerating the text generation of LMs by improving efficiency at inference time. Our method, named CALM, is motivated by the intuition that some next word predictions are easier than others. When writing a sentence, some continuations are trivial, while others might require more effort. Current LMs devote the same amount of compute power for all predictions. Instead, CALM dynamically distributes the computational effort across generation timesteps. By selectively allocating more computational resources only to harder predictions, CALM generates text faster while preserving output quality.

Confident Adaptive Language Modeling

When possible, CALM skips some compute effort for certain predictions. To demonstrate this, we use the popular encoder-decoder T5 architecture. The encoder reads the input text (e.g., a news article to summarize) and converts the text to dense representations. Then, the decoder outputs the summary by predicting it word by word. Both the encoder and decoder include a long sequence of Transformer layers. Each layer includes attention and feedforward modules with many matrix multiplications. These layers gradually modify the hidden representation that is ultimately used for predicting the next word.

Instead of waiting for all decoder layers to complete, CALM attempts to predict the next word earlier, after some intermediate layer. To decide whether to commit to a certain prediction or to postpone the prediction to a later layer, we measure the model’s confidence in its intermediate prediction. The rest of the computation is skipped only when the model is confident enough that the prediction won’t change. For quantifying what is “confident enough”, we calibrate a threshold that statistically satisfies arbitrary quality guarantees over the full output sequence.

Text generation with a regular language model (top) and with CALM (bottom). CALM attempts to make early predictions. Once confident enough (darker blue tones), it skips ahead and saves time.

Language Models with Early Exits

Enabling this early exit strategy for LMs requires minimal modifications to the training and inference processes. During training, we encourage the model to produce meaningful representations in intermediate layers. Instead of predicting only using the top layer, our learning loss function is a weighted average over the predictions of all layers, assigning higher weight to top layers. Our experiments demonstrate that this significantly improves the intermediate layer predictions while preserving the full model’s performance. In one model variant, we also include a small early-exit classifier trained to classify if the local intermediate layer prediction is consistent with the top layer. We train this classifier in a second quick step where we freeze the rest of the model.

Once the model is trained, we need a method to allow early-exiting. First, we define a local confidence measure for capturing the model’s confidence in its intermediate prediction. We explore three confidence measures (described in the results section below): (1) softmax response, taking the maximum predicted probability out of the softmax distribution; (2) state propagation, the cosine distance between the current hidden representation and the one from the previous layer; and (3) early-exit classifier, the output of a classifier specifically trained for predicting local consistency. We find the softmax response to be statistically strong while being simple and fast to compute. The other two alternatives are lighter in floating point operations (FLOPS).

Another challenge is that the self-attention of each layer depends on hidden-states from previous words. If we exit early for some word predictions, these hidden-states might be missing. Instead, we attend back to the hidden state of the last computed layer.

Finally, we set up the local confidence threshold for exiting early. In the next section, we describe our controlled process for finding good threshold values. As a first step, we simplify this infinite search space by building on a useful observation: mistakes that are made at the beginning of the generation process are more detrimental since they can affect all of the following outputs. Therefore, we start with a higher (more conservative) threshold, and gradually reduce it with time. We use a negative exponent with user-defined temperature to control this decay rate. We find this allows better control over the performance-efficiency tradeoff (the obtained speedup per quality level).
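
As an illustration only, the sketch below shows how a softmax-response confidence measure could be combined with a threshold that decays over generation timesteps. The decay schedule, floor value, and array shapes are our own illustrative choices, not the exact formulation used in the paper.

import numpy as np

# Illustrative early-exit decision at one decoding timestep; not the paper's exact code.
def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def decayed_threshold(t, base_threshold=0.9, temperature=4.0, floor=0.5):
    # Start conservative and relax the threshold as generation progresses (illustrative schedule).
    decay = np.exp(-t / temperature)
    return base_threshold * decay + floor * (1 - decay)

def early_exit_layer(logits_per_layer, timestep):
    threshold = decayed_threshold(timestep)
    for layer, logits in enumerate(logits_per_layer):
        confidence = softmax(logits).max()          # softmax response confidence measure
        if confidence >= threshold:
            return layer, int(np.argmax(logits))    # exit early with this layer's prediction
    return len(logits_per_layer) - 1, int(np.argmax(logits_per_layer[-1]))

# Toy example: 8 decoder layers, 32-token vocabulary, later layers produce sharper logits.
rng = np.random.default_rng(0)
fake_logits = [rng.normal(size=32) * (layer + 1) for layer in range(8)]
layer_used, token = early_exit_layer(fake_logits, timestep=3)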

Reliably Controlling the Quality of the Accelerated Model

Early exit decisions have to be local; they need to happen when predicting each word. In practice, however, the final output should be globally consistent or comparable to the original model. For example, if the original full model generated “the concert was wonderful and long”, one would accept CALM switching the order of the adjectives and outputting “the concert was long and wonderful”. However, at the local level, the word “wonderful” was replaced with “long”. Therefore, the two outputs are globally consistent, but include some local inconsistencies. We build on the Learn then Test (LTT) framework to connect local confidence-based decisions to globally consistent outputs.

In CALM, local per-timestep confidence thresholds for early exiting decisions are derived, via LTT calibration, from user-defined consistency constraints over the full output text. Red boxes indicate that CALM used most of the decoder’s layers for that specific prediction. Green boxes indicate that CALM saved time by using only a few Transformer layers. Full sentence shown in the last example of this post.

First, we define and formulate two types of consistency constraints from which to choose:

  1. Textual consistency: We bound the expected textual distance between the outputs of CALM and the outputs of the full model. This doesn’t require any labeled data.
  2. Risk consistency: We bound the expected increase in loss that we allow for CALM compared to the full model. This requires reference outputs against which to compare.

For each of these constraints, we can set the tolerance that we allow and calibrate the confidence threshold to allow early exits while reliably satisfying our defined constraint with an arbitrarily high probability.

CALM Saves Inference Time

We run experiments on three popular generation datasets: CNN/DM for summarization, WMT for machine translation, and SQuAD for question answering. We evaluate each of the three confidence measures (softmax response, state propagation and early-exit classifier) using an 8-layer encoder-decoder model. To evaluate global sequence-level performance, we use the standard Rouge-L, BLEU, and Token-F1 scores that measure distances against human-written references. We show that one can maintain full model performance while using only a third or half of the layers on average. CALM achieves this by dynamically distributing the compute effort across the prediction timesteps.

As an approximate upper bound, we also compute the predictions using a local oracle confidence measure, which enables exiting at the first layer that leads to the same prediction as the top one. On all three tasks, the oracle measure can preserve full model performance when using only 1.5 decoder layers on average. In contrast to CALM, a static baseline uses the same number of layers for all predictions, requiring 3 to 7 layers (depending on the dataset) to preserve its performance. This demonstrates why the dynamic allocation of compute effort is important. Only a small fraction of the predictions require most of the model’s complexity, while for others much less should suffice.

Performance per task against the average number of decoder layers used.

Finally, we also find that CALM enables practical speedups. When benchmarking on TPUs, we saved almost half of the compute time while maintaining the quality of the outputs.

Example of a generated news summary. The top cell presents the reference human-written summary. Below is the prediction of the full model (8 layers) followed by two different CALM output examples. The first CALM output is 2.9x faster and the second output is 3.6x faster than the full model, benchmarked on TPUs.

Conclusion

CALM allows faster text generation with LMs, without reducing the quality of the output text. This is achieved by dynamically modifying the amount of compute per generation timestep, allowing the model to exit the computational sequence early when confident enough.

As language models continue to grow in size, studying how to efficiently use them becomes crucial. CALM is orthogonal and can be combined with many efficiency related efforts, including model quantization, distillation, sparsity, effective partitioning, and distributed control flows.

Acknowledgements

It was an honor and privilege to work on this with Adam Fisch, Ionel Gog, Seungyeon Kim, Jai Gupta, Mostafa Dehghani, Dara Bahri, Vinh Q. Tran, Yi Tay, and Donald Metzler. We also thank Anselm Levskaya, Hyung Won Chung, Tao Wang, Paul Barham, Michael Isard, Orhan Firat, Carlos Riquelme, Aditya Menon, Zhifeng Chen, Sanjiv Kumar, and Jeff Dean for helpful discussions and feedback. Finally, we thank Tom Small for preparing the animation in this blog post.

Read More

How to evaluate the quality of the synthetic data – measuring from the perspective of fidelity, utility, and privacy

In an increasingly data-centric world, enterprises must focus on gathering both valuable physical information and generating the information that they need but can’t easily capture. Data access, regulation, and compliance are an increasing source of friction for innovation in analytics and artificial intelligence (AI).

For highly regulated sectors such as Financial Services, Healthcare, Life Sciences, Automotive, Robotics, and Manufacturing, the problem is even greater. It causes barriers to system design, data sharing (internal and external), monetization, analytics, and machine learning (ML).

Synthetic data is a tool that addresses many data challenges, particularly AI and analytics issues like privacy protection, regulatory compliance, accessibility, data scarcity, and bias. This also includes data sharing and time to data (and therefore time to market).

Synthetic data is algorithmically generated. It mirrors statistical properties and patterns from the source data. But importantly it contains no sensitive, private, or personal data points.

You ask questions of the synthetic data and get the same answers that you would from the real data.

In our earlier post, we demonstrated how to use adversarial networks like generative adversarial networks (GANs) to generate tabular datasets to enhance credit fraud model training.

For business stakeholders to adopt synthetic data for their ML and analytics projects, it’s imperative to not only make sure that the generated synthetic data will fit the purpose and the expected downstream applications, but also for them to be able to measure and demonstrate the quality of the generated data.

With increasing legal and ethical obligations in preserving privacy, one of synthetic data’s strengths is the ability to remove sensitive and original information during its synthesis. Therefore, in addition to quality, we need metrics to evaluate the risk of private information leaks, if any, and assess that the process of generation isn’t “memorizing” or copying any of the original data.

To achieve all of this, we can map the quality of synthetic data into dimensions, which help the users, stakeholders, and us to better understand the generated data.

The three dimensions of synthetic data quality evaluation

The synthetic data generated is measured against three key dimensions:

  1. Fidelity
  2. Utility
  3. Privacy

These are some of the questions about any generated synthetic data that should be answered by a synthetic data quality report:

  • How similar is this synthetic data as compared to the original training set?
  • How useful is this synthetic data for our downstream applications?
  • Has any information been leaked from the original training data into the synthetic data?
  • Has any data which is considered sensitive in the real world (from other data sets not used for training the model) been inadvertently synthesized by our model?

The metrics that translate each one of these dimensions for the end-users are somewhat flexible. After all, the data to be generated can vary in terms of distributions, size, and behaviors. They should also be easy to grasp and interpret.

Ultimately, the metrics must be completely data-driven and not require any prior knowledge or domain-specific information. However, if the user wants to apply specific rules and constraints applicable to a specific business domain, then they should be able to define them during the synthesis process to make sure that the domain-specific fidelity is met.

We look at each of these metrics in more detail in the following sections.

Metrics to understand fidelity

In any data science project, we must understand whether a certain sample population is relevant to the problem that we’re solving. Similarly, for the process of assessing the relevance of the synthetic data generated, we must evaluate it in terms of fidelity as compared to the original.

Visual representations of these metrics make them easier to comprehend. We could illustrate whether the cardinality and ratio of categories were respected, the correlations between the different variables were kept, and so on.

Visualizing the data not only helps to evaluate the quality of the synthetic data, but also fits in as one of the initial steps in the data science lifecycle for a better understanding of the data.

Let’s dive into some fidelity metrics in more detail.

Exploratory statistical comparisons

Within the exploratory statistical comparisons, the features of the original and synthetic datasets are explored using key statistical measures, such as the mean, median, standard deviation, distinct values, missing values, minima, maxima, quartile ranges for continuous features, and the number of records per category, missing values per category, and most occurring characters for categorical attributes.

This comparison should be conducted between the original hold-out dataset and the synthetic data. This evaluation would reveal if the datasets compared are statistically similar. If they aren’t, then we’ll have an understanding of which features and measures are different. You should consider retraining and regenerating the synthetic data with different parameters if a significant difference is noted.

This test acts as an initial screening to make sure that the synthetic data has reasonable fidelity to the original dataset and can therefore usefully undergo more rigorous testing.

Histogram similarity score

The histogram similarity score measures each feature’s marginal distributions of the synthetic and original datasets.

The similarity score is bounded between zero and one, with a score of one indicating that the synthetic data distributions perfectly overlap the distributions of the original data.

A score close to one would give the users the confidence that the holdout dataset and the synthetic dataset are statistically similar.
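
As a rough illustration, one simple way to compute such a score for a single numerical feature is the overlap of the two normalized histograms; exact definitions vary between tools, so treat this as a sketch rather than the canonical metric.

import numpy as np

# Illustrative histogram similarity for one numerical feature, bounded in [0, 1].
def histogram_similarity(real_col, synthetic_col, bins=50):
    lo = min(real_col.min(), synthetic_col.min())
    hi = max(real_col.max(), synthetic_col.max())
    real_hist, _ = np.histogram(real_col, bins=bins, range=(lo, hi))
    synth_hist, _ = np.histogram(synthetic_col, bins=bins, range=(lo, hi))
    real_hist = real_hist / real_hist.sum()
    synth_hist = synth_hist / synth_hist.sum()
    return np.minimum(real_hist, synth_hist).sum()  # 1.0 means the histograms fully overlap

rng = np.random.default_rng(0)
score = histogram_similarity(rng.normal(size=5000), rng.normal(loc=0.1, size=5000))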

Mutual information score

The mutual information score measures the mutual dependence of two features, numerical or categorical, indicating how much information can be obtained from one feature by observing another.

Mutual information can measure non-linear relationships, providing a more comprehensive understanding of the synthetic data quality as it lets us understand the extent of the variable’s relations preservation.

A score of one indicates that the mutual dependence between features has been perfectly captured in the synthetic data.
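
For a pair of categorical columns, a hedged sketch of this idea compares normalized mutual information computed on the real and synthetic data; the column names and toy data below are stand-ins for your own datasets.

import pandas as pd
from sklearn.metrics import normalized_mutual_info_score

# Illustrative check of how well pairwise mutual dependence is preserved for two
# categorical columns; column names and data are placeholders.
def mutual_info_preservation(real_df, synth_df, col_a, col_b):
    real_mi = normalized_mutual_info_score(real_df[col_a], real_df[col_b])
    synth_mi = normalized_mutual_info_score(synth_df[col_a], synth_df[col_b])
    # Closer to 1.0 means the dependence in the real data is reproduced in the synthetic data.
    return 1.0 - abs(real_mi - synth_mi)

real = pd.DataFrame({"a": ["x", "x", "y", "y"] * 50, "b": ["p", "p", "q", "q"] * 50})
synth = pd.DataFrame({"a": ["x", "y", "y", "x"] * 50, "b": ["p", "q", "q", "p"] * 50})
score = mutual_info_preservation(real, synth, "a", "b")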

Correlation score

The correlation score measures how well the correlations in the original dataset have been captured in the synthetic data.

Correlations between two or more columns are extremely important for ML applications, which help uncover relationships between features and the target variable and help create a well-trained model.

The correlation score is bounded between zero and one, with a score of one indicating that the correlations have been perfectly matched.

Unlike structured tabular data, which we commonly encounter in data problems, some types of structured data have a particular behavior where past observations have a probability of influencing the following observation. These are known as time-series or sequential data – for example, a dataset with hourly measurements of room temperature.

This behavior creates a requirement to define certain metrics that can specifically measure the quality of these time-series datasets.

Autocorrelation and partial autocorrelation score

Although similar to correlation, autocorrelation shows the relationship of a time series at its present value as it relates to its previous values. Removing the effects of the previous time lags yields partial autocorrelation. Therefore, the autocorrelation score measures how well the synthetic data has captured the significant autocorrelations, or partial correlations, from the original dataset.

Metrics to understand utility

Now we may have statistically realized that the synthetic data is similar to the original dataset. In addition, we must also assess how well the synthesized dataset fares on common data science problems when trained on several ML algorithms.

Using the following utility metrics, we aim to build confidence that the synthetic data can achieve downstream performance comparable to that of the original data.

Prediction score

Measuring the performance of synthetic data as compared to the original real data can be done through ML models. The downstream model score captures the quality of the synthetic data by comparing the performance of ML models trained on both the synthetic and original datasets and validated on withheld testing data from the original dataset. This provides a Train Synthetic Test Real (TSTR) score and a Train Real Test Real (TRTR) score respectively.

TSTR and TRTR scores, and the feature importance score (image by author)

The score incorporates a wide range of the most trusted ML algorithms for either regression or classification tasks. Using several classifiers and regressors makes sure that the score is more generalizable across most algorithms, so that the synthetic data may be considered useful in the future.

In the end, if the TSTR score and TRTR score are comparable, this indicates that the synthetic data has the quality to be used to train effective ML models for real-world applications.
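
A minimal sketch of the TSTR versus TRTR comparison is shown below; the arrays are random stand-ins for your own real, synthetic, and withheld test splits, and ROC AUC is just one possible choice of score.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Illustrative TSTR vs. TRTR comparison; the arrays stand in for your own data splits.
rng = np.random.default_rng(0)
X_real, y_real = rng.normal(size=(2000, 10)), rng.integers(0, 2, size=2000)
X_synth, y_synth = rng.normal(size=(2000, 10)), rng.integers(0, 2, size=2000)
X_test, y_test = rng.normal(size=(500, 10)), rng.integers(0, 2, size=500)  # withheld real data

tstr_model = RandomForestClassifier(random_state=0).fit(X_synth, y_synth)  # Train Synthetic
trtr_model = RandomForestClassifier(random_state=0).fit(X_real, y_real)    # Train Real

tstr = roc_auc_score(y_test, tstr_model.predict_proba(X_test)[:, 1])       # Test Real
trtr = roc_auc_score(y_test, trtr_model.predict_proba(X_test)[:, 1])       # Test Real
# Comparable TSTR and TRTR scores suggest the synthetic data is useful for downstream training.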

Feature importance score

Highly related to the prediction score, the feature importance (FI) score extends it by adding interpretability to the TSTR and TRTR scores.

The FI score compares the changes and stability of the feature importance order obtained with the prediction score. A synthetic dataset is considered to be of high utility if it yields the same order of feature importance as the original real data.

QScore

To make sure that a model trained on our newly generated data is going to produce the same answers to the same questions as a model trained using the original data, we use the QScore. This measures the downstream performance of the synthetic data by running many random aggregation-based queries on both the synthetic and original (and holdout) datasets.

The idea here is that both of these queries should return similar results.

A high QScore makes sure that downstream applications that rely on querying and aggregation operations can provide close to the same value as the original dataset.

Metrics to understand privacy

With privacy regulations already in place, it’s an ethical obligation and a legal requirement to make sure that sensitive information is protected.

Before this synthetic data can be shared freely and used for downstream applications, we must consider the privacy metrics that can help the stakeholder understand where the generated synthetic data stands as compared to the original data in terms of the extent of leaked information. Moreover, we must make critical decisions regarding how the synthetic data can be shared and used.

Exact match score

A direct and intuitive evaluation of privacy is to look for copies of the real data among the synthetic records. The exact match score counts the number of real records that can be found among the synthetic set.

The score should be zero, stating that no real information is present as-is in the synthetic data. This metric acts as a screening mechanism before we evaluate further privacy metrics.
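
A simple sketch of this screening step with pandas counts synthetic rows that appear verbatim in the real data; the toy frames below are only for illustration.

import pandas as pd

# Illustrative exact match count: how many synthetic rows appear verbatim in the real data.
def exact_match_count(real_df, synth_df):
    merged = synth_df.merge(real_df.drop_duplicates(), how="inner")
    return len(merged)  # should be zero for a privacy-preserving synthesizer

real = pd.DataFrame({"age": [25, 37, 52], "amount": [10.0, 250.0, 99.9]})
synth = pd.DataFrame({"age": [31, 37], "amount": [12.5, 250.0]})
matches = exact_match_count(real, synth)  # 1 exact copy found in this toy example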

Neighbors’ privacy score

Furthermore, the neighbors’ privacy score measures the ratio of synthetic records that might be too close in similarity to the real ones. This means that, although they aren’t direct copies, they are potential points of privacy leakage and a source of useful information for inference attacks.

The score is calculated by conducting a high-dimensional nearest-neighbors search on the synthetic data overlapped with the original data.
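
A hedged sketch of this check computes the distance from each synthetic record to its nearest real record and reports the fraction that fall under a chosen distance threshold; the threshold and the random arrays below are illustrative.

import numpy as np
from sklearn.neighbors import NearestNeighbors

# Illustrative neighbors' privacy check: the fraction of synthetic records whose nearest
# real record is closer than a chosen distance threshold.
def too_close_ratio(real_X, synth_X, threshold=0.1):
    nn = NearestNeighbors(n_neighbors=1).fit(real_X)
    distances, _ = nn.kneighbors(synth_X)
    return float((distances[:, 0] < threshold).mean())

rng = np.random.default_rng(0)
ratio = too_close_ratio(rng.normal(size=(1000, 5)), rng.normal(size=(200, 5)))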

Membership inference score

In the data science lifecycle, once a model has been trained, it no longer needs access to the training samples and can make predictions on unseen data. Similarly, in our case, once the synthesizer model is trained, samples of synthetic data can be generated without the need for the original data.

Through a type of attack called “membership inference attack”, attackers can attempt to reveal the data that was used to create the synthetic data, without having the access to the original data. This results in a compromise of privacy.

The membership inference score measures the likelihood of a membership inference attack being successful.

A low score suggests that an attacker could feasibly infer that a particular record was a member of the training dataset used to create the synthetic data. In other words, the attack can reveal details of an individual record, thereby compromising privacy.

A high membership inference score indicates that an attacker is unlikely to determine if a particular record was part of the original dataset used to create the synthetic data. This also means that no individual’s information was compromised through the synthetic data.

The holdout concept

An important best practice that we must follow is to make sure that the synthetic data is general enough and doesn’t overfit the original data on which it was trained. In typical data science flow, while building ML models such as a Random Forest classifier, we set aside test data, train the models using the training data, and evaluate the metrics on unseen test data.

Similarly, for synthetic data, we keep aside a sample of the original data – generally referred to as a holdout dataset or unseen withheld test data – and evaluate the generated synthetic data against this holdout dataset.

The holdout dataset is expected to be a representation of the original data, yet not seen when the synthetic data was generated. Therefore, it’s vital to have similar scores for all of the metrics when comparing the original to the holdout and the synthetic datasets.

When similar scores are obtained, we can establish that the synthetic data points aren’t a result of memorization of the original data points, while preserving the same fidelity and utility.

Final thoughts

The world is starting to understand the strategic importance of synthetic data. As data scientists and data generators, it’s our duty to build trust in the synthetic data that we generate and to make sure that it’s fit for purpose.

Synthetic data is evolving into a must-have in the data science development toolkit. MIT Technology Review has noted synthetic data as one of the breakthrough technologies of 2022, and Gartner claims that we can’t imagine building high-value AI models without synthetic data.

According to McKinsey, synthetic data minimizes costs and barriers that you would otherwise have when developing algorithms or getting access to data.

The generation of synthetic data is about knowing the downstream applications and understanding the trade-offs between the different dimensions for the quality of synthetic data.

Summary

As the user of the synthetic data, it’s essential to define the context of the use case for which every sample of synthetic data will be used in the future. Just as with real data, the quality of the synthetic data is dependent on the intended use case, as well as the parameters chosen for synthesis.

For example, keeping outliers in the synthetic data as in the original data is useful for a fraud detection use case. However, it’s not useful in a healthcare use case with privacy concerns, because outliers can be a source of information leakage.

Moreover, a tradeoff exists between fidelity, utility, and privacy. Data can’t be optimized for all three simultaneously. These metrics enable the stakeholders to prioritize what is essential for each use case and manage expectations from the generated synthetic data.

Ultimately, when we see the values of each metric and when they meet expectations, stakeholders can be confident in the solutions that they build using the synthetic data.

The use cases for structured synthetic data cover a wide gamut of applications, from test data for software development to creating synthetic control arms in clinical trials.

Reach out to explore these opportunities or build a PoC to demonstrate the value.


Faris Haddad is the Data & Insights Lead in the AABG Strategic Pursuits team. He helps enterprises successfully become data-driven.

Read More

Augment fraud transactions using synthetic data in Amazon SageMaker

Developing and training successful machine learning (ML) fraud models requires access to large amounts of high-quality data. Sourcing this data is challenging because available datasets are sometimes not large enough or sufficiently unbiased to usefully train the ML model and may require significant cost and time. Regulation and privacy requirements further prevent data use or sharing even within an enterprise organization. The process of authorizing the use of, and access to, sensitive data often delays or derails ML projects. Alternatively, we can tackle these challenges by generating and using synthetic data.

Synthetic data describes artificially created datasets that mimic the content and patterns in the original dataset in order to address regulatory risk and compliance, time, and costs of sourcing. Synthetic data generators use the real data to learn relevant features, correlations, and patterns in order to generate required amounts of synthetic data matching the statistical qualities of the originally ingested dataset.

Synthetic data has been in use in lab environments for over two decades; the market now has evidence of its utility, which is accelerating adoption in commercial and public sectors. Gartner predicts that by 2024, 60 percent of the data used for the development of ML and analytics solutions will be synthetically generated and that the use of synthetic data will continue to increase substantially.

The Financial Conduct Authority, a UK regulatory body, acknowledges that “Access to data is the catalyst for innovation, and synthetic financial data could play a role in supporting innovation and enabling new entrants to develop, test, and demonstrate the value of new solutions.”

Amazon SageMaker Ground Truth currently supports synthetic data generation of labeled synthetic image data. This blog post explores tabular synthetic data generation. Structured data, such as single and relational tables, and time series data are the types most often encountered in enterprise analytics.

This is a two-part blog post; we create synthetic data in part one and evaluate its quality in part two.

In this blog post, you will learn how to use the open-source library ydata-synthetic and Amazon SageMaker notebooks to synthesize tabular data for a fraud use case, where we do not have enough fraudulent transactions to train a high-accuracy fraud model. The general process of training a fraud model is covered in this post.

Overview of the solution

The aim of this tutorial is to synthesize the minority class of a highly imbalanced credit card fraud dataset using an optimized generative adversarial network (GAN) called WGAN-GP to learn patterns and statistical properties of original data and then create endless samples of synthetic data that resemble the original data. This process can also be used to enhance the original data by up-sampling rare events like fraud or to generate edge cases that are not present in the original.

We use a credit card fraud dataset published by ULB, which can be downloaded from Kaggle. Generating synthetic data for the minority class helps address problems related to imbalanced datasets, which can help in developing more accurate models.

We use AWS services, including Amazon SageMaker and Amazon S3, which incur costs to use cloud resources.

Set up the development environment

SageMaker provides a managed Jupyter notebook instance for model building, training, and deployment.

Prerequisites:

You must have an AWS account to run SageMaker. You can get started with SageMaker and try hands-on tutorials.

For instructions on setting up your Jupyter Notebook working environment, see Get Started with Amazon SageMaker Notebook Instances.

Step 1: Set up your Amazon SageMaker instance

  1. Sign in to the AWS console and search for “SageMaker.”
  2. Select Studio.
  3. Select Notebook instances on the left bar, and select Create notebook instance.
    SageMaker Landing page
  4. From the next page (as shown in the following image), select the configuration of the virtual machine (VM) according to your needs, and select Create notebook instance. Note that we used an ML-optimized VM with no GPU and 5 GB of storage: an ml.t3.medium instance running the Amazon Linux 2, Jupyter Lab 3 kernel.
  5. A notebook instance will be ready for you to use within a few minutes.
  6. Select Open JupyterLab to launch.
  7. Now that we have a JupyterLab with our required specifications, we will install the synthetic library.
pip install ydata-synthetic

Step 2: Download or extract the real dataset to create synthetic data

Download the reference data from Kaggle either manually, as we do here, or programmatically through Kaggle API if you have a Kaggle account. If you explore this dataset, you’ll notice that the “fraud” class contains much less data than the “not fraud” class.

If you use this data directly for machine learning predictions, the model might simply learn to always predict “not fraud”: because fraud cases are rare, such a model achieves high accuracy while missing the cases that matter. Since detecting fraud is our objective in this exercise, we will boost the fraud class numbers with synthetic data modeled on the real data.

Create a data folder in JupyterLab and upload the Kaggle data file into it. This will let you use the data within the notebook since SageMaker comes with storage that you would have specified when you instantiated the notebook.

This dataset is 144 MB.

You can then read the data using standard code via the pandas library:

import pandas as pd
data = pd.read_csv('./data/creditcard.csv')
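
To see how skewed the classes are, you can count the labels. The Kaggle dataset stores them in a column named Class, where 1 marks a fraudulent transaction:

# Count nonfraud (Class == 0) vs. fraud (Class == 1) transactions
print(data['Class'].value_counts())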

Fraud-detection data has certain characteristics, namely:

  • Large class imbalances (typically towards nonfraud data points).
  • Privacy-related concerns (owing to the presence of sensitive data).
  • A degree of dynamism, in that a malicious user is always trying to avoid detection by systems monitoring for fraudulent transactions.
  • The available data sets are very large and often unlabeled.

Now that you have inspected the dataset, let’s filter the minority class (the “fraud” class from the credit card dataset) and perform transformations as required. You can check out the data transformations from this notebook.
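
As a minimal sketch of this step (the exact transformations depend on the notebook you follow), you can filter the fraud rows with pandas and, purely for illustration, rescale the Time and Amount columns, which sit on a very different scale from the PCA features V1 to V28:

from sklearn.preprocessing import PowerTransformer

# Keep only the minority ("fraud") class and drop the label column
fraud = data[data['Class'] == 1].drop(columns=['Class'])

# Illustrative transformation: rescale 'Time' and 'Amount' so they are
# comparable to the PCA features V1-V28 (adjust to your own pipeline)
scaler = PowerTransformer()
fraud[['Time', 'Amount']] = scaler.fit_transform(fraud[['Time', 'Amount']])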

When this minority class dataset is synthesized and added back to the original dataset, it allows the generation of a larger synthesized dataset that addresses the imbalance in data. We can achieve greater prediction accuracy by training a fraud detection model using the new dataset.

Let’s synthesize the new fraud dataset.

Step 3: Train the synthesizers and create the model

Since you have the data readily available within SageMaker, it’s time to put our GAN-based synthesizer to work.

A generative adversarial network (GAN) has two parts:

  • The generator learns to generate plausible data. The generated instances become negative training examples for the discriminator.
  • The discriminator learns to distinguish the generator’s fake data from real data. The discriminator penalizes the generator for producing implausible results.

When training begins, the generator produces obviously fake data, and the discriminator quickly learns to tell that it’s fake. As training progresses, the generator gets closer to producing output that can fool the discriminator. Finally, if generator training goes well, the discriminator gets worse at telling the difference between real and fake. It starts to classify fake data as real, and its accuracy decreases.

Both the generator and the discriminator are neural networks. The generator output is connected directly to the discriminator input. Through backpropagation, the discriminator’s classification provides a signal that the generator uses to update its weights.
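
In code, the training step is handled by ydata-synthetic. The sketch below is illustrative only: class names, parameter fields, and method names (for example, fit versus train) differ between ydata-synthetic releases, so check the API of the version you installed. It reuses the filtered fraud DataFrame from the previous step, and the hyperparameter values are placeholders to tune for your own data.

from ydata_synthetic.synthesizers import ModelParameters, TrainParameters
from ydata_synthetic.synthesizers.regular import RegularSynthesizer

num_cols = list(fraud.columns)   # every column in this dataset is numerical
cat_cols = []                    # no categorical columns

# Placeholder hyperparameters; tune batch size, learning rate, noise
# dimension, layer width, and epochs for your own data
gan_args = ModelParameters(batch_size=128, lr=5e-4, noise_dim=32, layers_dim=128)
train_args = TrainParameters(epochs=300)

# 'wgangp' selects the WGAN-GP architecture; older releases expose a
# dedicated WGAN_GP class with a train() method instead
synthesizer = RegularSynthesizer(modelname='wgangp', model_parameters=gan_args)
synthesizer.fit(data=fraud, train_arguments=train_args, num_cols=num_cols, cat_cols=cat_cols)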

Step 4: Sample synthetic data from the synthesizer

Now that you have built and trained your model, it’s time to sample the required data by feeding noise to the model. This enables you to generate as much synthetic data as you want.

In this case, you generate a quantity of synthetic fraud data equal to the quantity of actual fraud data, because it makes it easier to compare samples of similar size in Step 5.

We sample rows containing fraudulent transactions from the trained synthesizer. The original Kaggle dataset contained 492 frauds out of 284,807 transactions, so we draw a sample of the same size:

# generate the same number of rows as the real fraud class
synthetic_fraud = synthesizer.sample(492)

We also have the option to up-sample rows containing fraudulent transactions in a process called data augmentation: by generating additional synthetic fraud rows and combining them with the nonsynthetic fraud data, we move the dataset toward a more balanced distribution of the “fraud” and “not fraud” classes.
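
A minimal sketch of that recombination, assuming the synthesizer returns a DataFrame with the original column names and reusing the illustrative scaler and fraud objects from the earlier snippets:

import pandas as pd

# Undo the illustrative scaling from earlier so the synthetic rows are on
# the same scale as the original data (skip this if you trained on raw values)
synthetic_fraud[['Time', 'Amount']] = scaler.inverse_transform(synthetic_fraud[['Time', 'Amount']])

# Label the synthetic rows as fraud and append them to the original dataset
synthetic_fraud['Class'] = 1
augmented = pd.concat([data, synthetic_fraud], ignore_index=True)
print(augmented['Class'].value_counts())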

Step 5: Compare and evaluate the synthetic data against the real data

Though this step is optional, you can qualitatively visualize and assess the generated synthetic data against the actual data using a scatter plot.

This helps us iterate on the model by tweaking parameters, changing the sample size, and applying other transformations to generate the most accurate synthetic data. What counts as accurate always depends on the purpose of the synthesis.

The image below depicts how similar the actual fraud and the synthetic fraud data points are across the training steps. This gives a good qualitative impression of the similarity between the synthetic and the actual data and of how it improves as we run more epochs (complete passes of the training dataset through the algorithm). Note that as we run more epochs, the synthetic data pattern gets closer to the original data.
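
As a minimal example of such a qualitative check, you can overlay two columns of the real and synthetic fraud data in a matplotlib scatter plot; the choice of V10 and V14 here is arbitrary:

import matplotlib.pyplot as plt

# Overlay two arbitrary features of the real and synthetic fraud rows
fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(fraud['V10'], fraud['V14'], alpha=0.4, label='real fraud')
ax.scatter(synthetic_fraud['V10'], synthetic_fraud['V14'], alpha=0.4, label='synthetic fraud')
ax.set_xlabel('V10')
ax.set_ylabel('V14')
ax.legend()
plt.show()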

Step 6: Clean up

Finally, stop your notebook instance when you’re done with the synthesis to avoid unexpected costs.

Conclusion

As machine learning algorithms and coding frameworks evolve rapidly, high-quality data at scale is often the scarcest resource in ML. Good-quality synthetic datasets can be used for a variety of tasks.

In this blog post, you learned the importance of synthesizing the dataset by using an open-source library that uses WGAN-GP. This is an active research area with thousands of papers on GANs published and many hundreds of named GANs available for you to experiment with. There are variants that are optimized for specific use cases like relational tables and time series data.

You can find all the code used for this article in this notebook, and of course, more tutorials like this are available from the SageMaker official documentation page.

In the second part of this two-part blog post series, we will do a deep dive into how to evaluate the quality of the synthetic data from a perspective of fidelity, utility, and privacy.


About the Author

Faris Haddad is the Data & Insights Lead in the AABG Strategic Pursuits team. He helps enterprises successfully become data-driven.

Read More

AI’s Highlight Reel: Top Five NVIDIA Videos of 2022

AI’s Highlight Reel: Top Five NVIDIA Videos of 2022

If AI had a highlight reel, the NVIDIA YouTube channel might just be it.

The channel showcases the latest breakthroughs in artificial intelligence, with demos, keynotes and other videos that help viewers see and believe the astonishing ways in which the technology is changing the world.

NVIDIA’s most popular videos of 2022 put spotlights on photorealistically animated data centers, digital twins for climate science, AI for healthcare and more.

And the latest GTC keynote address by NVIDIA founder and CEO Jensen Huang racked up 19 million views in just three months, making it the channel’s most-watched video of all time.

It all demonstrates the power of AI, its growth and applications.

But don’t just take our word for it — watch NVIDIA’s top five YouTube videos of the year:

Meet NVIDIA — the Engine of AI

While watching graphics cards dance and autonomous vehicles cruise, learn more about how NVIDIA’s body of work is fueling all things AI.

NVIDIA DGX A100 — Bringing AI to Every Industry

In a dazzling clip that unpacks NVIDIA DGX A100, the universal system for AI workloads, check out the many applications for the world’s first 5 petaFLOPS AI system.

A New Era of Digital Twins and Virtual Worlds With NVIDIA Omniverse

Watch stunning demos and hear about how the NVIDIA Omniverse platform enables real-time 3D simulation, design collaboration and the creation of virtual worlds.

Optimizing an Ultrarapid DNA Sequencing Technique for Critical Care Patients

A collaboration including NVIDIA led to a record-breaking AI technique where a whole genome was sequenced in just about seven hours.

Maximizing Wind Energy Production Using Wake Optimization

Dive into how Siemens Gamesa is using NVIDIA-powered, physics-informed, super-resolution AI models to simulate wind farms and boost energy production.

The post AI’s Highlight Reel: Top Five NVIDIA Videos of 2022 appeared first on NVIDIA Blog.

Read More

Accelerated Computing, AI and Digital Twins: A Recipe for US Manufacturing Leadership

Accelerated Computing, AI and Digital Twins: A Recipe for US Manufacturing Leadership

A national initiative in semiconductors provides a once-in-a-generation opportunity to energize manufacturing in the U.S.

The CHIPS and Science Act includes a $13 billion R&D investment in the chip industry. Done right, it’s a recipe for bringing advanced manufacturing techniques to every industry and cultivating a highly skilled workforce.

The semiconductor industry uses the most complex manufacturing processes and equipment in human history. To produce each chip inside a car or computer, hundreds of steps must be executed perfectly, most already automated with robotics.

The U.S. government asked industry where it should focus its efforts on improving this sector. In response, NVIDIA released a 12-page document with its best ideas.

Supercharged with accelerated computing and AI, a modern fab is also a guidepost for all other types of complex manufacturing — from making smartphones to shoes — flexibly and efficiently.

The World’s Most Expensive Factories

Semiconductors are made in factories called fabs. Building and outfitting a new one costs as much as $20 billion.

The latest factories rely heavily on computers that are built, programmed and operated by skilled workers armed with machine learning for the next generation of manufacturing processes.

For example, AI can find patterns no human can see, including tiny defects in a product on a fast-moving assembly line. The semiconductor industry needs this technology to create tomorrow’s increasingly large and complex chips. Other industries will be able to use it to make better products faster, too.

Efficiency Through Simulation

We can now create a digital copy of an entire factory. Using NVIDIA technologies, BMW is already building a digital twin of one of its automotive plants to bring new efficiencies to its business.

No one has built anything as complex as a digital twin of a chip fab yet, but that goal is now within reach.

A virtual fab would let specialists design and test new processes much more quickly and cheaply without stopping production in a physical plant. A simulation also can use AI to analyze data from sensors inside physical factories, finding new ways to route materials that reduce waste and speed operations.

Soon, any manufacturing plant with a digital twin will be more economically competitive than a plant without one.

Virtual Factories, Real Operators

Digital twins enable remote specialists to collaborate as if they were in the same room. They also take worker training to a new level.

Some of the most vital tools in a fab are the size of a shipping container and cost as much as $200 million each. Digital twins let workers train on these expensive systems before they’re even installed.

Once trained, workers can qualify, operate and service them without needing to set foot in the ultra-clean rooms where they’re installed. This kind of work represents the future of all manufacturing.

Factories designed with digital twins can also optimize energy efficiency and water consumption and maximize reuse, reducing their environmental impact.

Wanted: More Performance per Watt

Tomorrow’s factories will need more computing muscle than ever. To deliver it, we need investments in energy-efficient technologies at every level.

The circuits inside chips need to use and waste significantly less energy. The signals they send to nearby chips and across global networks must move faster while consuming less power.

Computers will need to tackle more data-intensive jobs while increasing productivity. To design and build these systems, we need research on new kinds of accelerator chips, accelerated systems and the software that will run on them.

NVIDIA and others have made great progress in green computing. Now we have an opportunity to take another big step forward.

A Broad Agenda and Partnerships

These are just some of the ways NVIDIA wants to help advance the U.S. semiconductor industry and by extension all manufacturers.

No company can do this work alone. Industry, academia and government must collaborate to get this right.

NVIDIA is at the center of a vibrant ecosystem of 3.5 million developers and more than 12,000 global startups registered in the NVIDIA Inception program.

The University of Florida provides a model for advancing AI and data science education across every field of study.

In 2020, it kicked off a plan to become one of the nation’s first AI universities. Today it’s infusing its entire curriculum with machine learning. At its heart, UF’s AI supercomputer is already advancing research in fields such as healthcare, agriculture and engineering.

It’s one more example of the transformative power of accelerated computing and AI. We look forward to the opportunity to take part in this grand adventure in U.S. manufacturing.

To learn more about NVIDIA’s ideas on the future of semiconductor manufacturing, including how AI is critical to advancing lithography, electronic design tools and cybersecurity processes, read the full document.

 

The post Accelerated Computing, AI and Digital Twins: A Recipe for US Manufacturing Leadership appeared first on NVIDIA Blog.

Read More

Safe Travels: NVIDIA DRIVE OS Receives Premier Safety Certification

Safe Travels: NVIDIA DRIVE OS Receives Premier Safety Certification

To make transportation safer, autonomous vehicles (AVs) must have processes and underlying systems that meet the highest standards.

NVIDIA DRIVE OS is the operating system for in-vehicle accelerated computing powered by the NVIDIA DRIVE platform. DRIVE OS 5.2 is now functional safety-certified by TÜV SÜD, one of the most experienced and rigorous assessment bodies in the automotive industry.

TÜV SÜD has determined that the software meets the International Organization for Standardization (ISO) 26262 ASIL B standard, which targets functional safety, or “the absence of unreasonable risk due to hazards caused by malfunctioning behavior of electrical or electronic systems.”

Based in Munich, Germany, TÜV SÜD assesses compliance to national and international standards for safety, durability and quality in cars, as well as for factories, buildings, bridges and other infrastructure.

Safety architecture, design and methodologies are pervasive throughout NVIDIA DRIVE solutions, from the data center to the car. NVIDIA has invested 15,000 engineering years in safety systems and processes.

A Strong Foundation

DRIVE OS is the foundation of the NVIDIA DRIVE SDK and is the first functionally safe operating system for complex in-vehicle accelerated computing platforms.

It includes NVIDIA CUDA libraries for efficient parallel computing, the NVIDIA TensorRT SDK for real-time AI inferencing, the NvMedia library for sensor input processing and other developer tools and modules for access to hardware engines.

NVIDIA is working across the industry to ensure the safe deployment of AVs. It participates in standardization and regulation bodies worldwide, including ISO, the Society of Automotive Engineers (SAE), the Institute of Electrical and Electronics Engineers (IEEE) and more.

Measuring Up

NVIDIA DRIVE is an open platform, meaning experts at top car companies can build upon this industrial-strength system.

TÜV SÜD, among the world’s most respected safety experts, measured DRIVE OS against industry safety standards, specifically ISO 26262, the definitive global standard for functional safety of road vehicles’ systems, hardware and software.

To meet that standard, software must detect failures during operation, as well as be developed in a process that handles potential systematic faults along the whole V-model — from safety-requirements definition to coding, analysis, verification and validation.

That is, the software must avoid failures whenever possible, but detect and respond to them if they cannot be avoided.

TÜV SÜD’s team determined DRIVE OS 5.2 complies with the testing criteria and is suitable for safety-related use in applications up to ASIL B.

Safety Across the Stack

Safety is NVIDIA’s first priority in AV development.

This certification builds on TÜV SÜD’s 2020 assessment of the NVIDIA DRIVE Xavier system-on-a-chip, which determined that it meets ISO 26262 random hardware integrity of ASIL C and a systematic capability of ASIL D for process — the strictest standard for functional safety.

These processes all contribute to our dedication to a comprehensive safety approach that extends from the SoC to the operating system, the application software and the cloud.

The post Safe Travels: NVIDIA DRIVE OS Receives Premier Safety Certification appeared first on NVIDIA Blog.

Read More