We’ve published our joint paper with Google Research in Nature Medicine, which proposes CoDoC (Complementarity-driven Deferral-to-Clinical Workflow), an AI system that learns when to rely on predictive AI tools or defer to a clinician for the most accurate interpretation of medical images.
Training Diffusion Models with Reinforcement Learning
Diffusion models have recently emerged as the de facto standard for generating complex, high-dimensional outputs. You may know them for their ability to produce stunning AI art and hyper-realistic synthetic images, but they have also found success in other applications such as drug design and continuous control. The key idea behind diffusion models is to iteratively transform random noise into a sample, such as an image or protein structure. This is typically motivated as a maximum likelihood estimation problem, where the model is trained to generate samples that match the training data as closely as possible.
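To make the iterative denoising idea concrete, the following is a minimal, illustrative DDPM-style sampling loop in Python. The `denoiser` callable and the linear noise schedule are placeholders for exposition, not the actual Stable Diffusion implementation.

```python
import numpy as np

def sample(denoiser, num_steps=50, shape=(64, 64, 3)):
    """Minimal reverse-diffusion sampling loop (illustrative only).

    `denoiser(x, t)` is assumed to predict the noise present in x at step t;
    the schedule below is a toy linear one, not the schedule a real model uses.
    """
    x = np.random.randn(*shape)               # start from pure Gaussian noise
    betas = np.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    for t in reversed(range(num_steps)):
        eps = denoiser(x, t)                   # predicted noise at this step
        # DDPM mean update: subtract the predicted noise contribution, rescale
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x += np.sqrt(betas[t]) * np.random.randn(*shape)  # re-inject noise
    return x
```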
However, most use cases of diffusion models are not directly concerned with matching the training data, but instead with a downstream objective. We don’t just want an image that looks like existing images, but one that has a specific type of appearance; we don’t just want a drug molecule that is physically plausible, but one that is as effective as possible. In this post, we show how diffusion models can be trained on these downstream objectives directly using reinforcement learning (RL). To do this, we finetune Stable Diffusion on a variety of objectives, including image compressibility, human-perceived aesthetic quality, and prompt-image alignment. The last of these objectives uses feedback from a large vision-language model to improve the model’s performance on unusual prompts, demonstrating how powerful AI models can be used to improve each other without any humans in the loop.
Symbol tuning improves in-context learning in language models
A key feature of human intelligence is that humans can learn to perform new tasks by reasoning using only a few examples. Scaling up language models has unlocked a range of new applications and paradigms in machine learning, including the ability to perform challenging reasoning tasks via in-context learning. Language models, however, are still sensitive to the way that prompts are given, indicating that they are not reasoning in a robust manner. For instance, language models often require heavy prompt engineering or phrasing tasks as instructions, and they exhibit unexpected behaviors such as performance on tasks being unaffected even when shown incorrect labels.
In “Symbol tuning improves in-context learning in language models”, we propose a simple fine-tuning procedure that we call symbol tuning, which can improve in-context learning by emphasizing input–label mappings. We experiment with symbol tuning across Flan-PaLM models and observe benefits across various settings.
- Symbol tuning boosts performance on unseen in-context learning tasks and is much more robust to underspecified prompts, such as those without instructions or without natural language labels.
- Symbol-tuned models are much stronger at algorithmic reasoning tasks.
- Finally, symbol-tuned models show large improvements in following flipped-labels presented in-context, meaning that they are more capable of using in-context information to override prior knowledge.
Motivation
Instruction tuning is a common fine-tuning method that has been shown to improve performance and allow models to better follow in-context examples. One shortcoming, however, is that models are not forced to learn to use the examples because the task is redundantly defined in the evaluation example via instructions and natural language labels. For example, on the left in the figure above, although the examples can help the model understand the task (sentiment analysis), they are not strictly necessary since the model could ignore the examples and just read the instruction that indicates what the task is.
In symbol tuning, the model is fine-tuned on examples where the instructions are removed and natural language labels are replaced with semantically-unrelated labels (e.g., “Foo,” “Bar,” etc.). In this setup, the task is unclear without looking at the in-context examples. For example, on the right in the figure above, multiple in-context examples would be needed to figure out the task. Because symbol tuning teaches the model to reason over the in-context examples, symbol-tuned models should have better performance on tasks that require reasoning between in-context examples and their labels.
Datasets and task types used for symbol tuning.
Symbol-tuning procedure
We selected 22 publicly available natural language processing (NLP) datasets for our symbol-tuning procedure. These tasks have been widely used in the past, and we chose only classification-type tasks since our method requires discrete labels. We then remap each label to a random label from a set of ~30K arbitrary labels selected from one of three categories: integers, character combinations, and words.
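As a rough illustration of this remapping step (the function name and the tiny symbol pool below are hypothetical stand-ins, not the paper's actual pipeline), a sketch might look like:

```python
import random

# Hypothetical label-remapping step for symbol tuning (names are illustrative).
# Natural language labels are replaced with arbitrary symbols, so the task can
# only be inferred from the in-context input-label pairs.
SYMBOL_POOL = ["foo", "bar", "17", "241", "xqz", "plumb"]  # stand-in for ~30K symbols

def symbol_tune_examples(examples, labels):
    """examples: list of (text, label) pairs; labels: the dataset's label set."""
    remap = {lab: sym for lab, sym in zip(labels, random.sample(SYMBOL_POOL, len(labels)))}
    return [(text, remap[lab]) for text, lab in examples]

# Example: a sentiment dataset whose labels "positive"/"negative"
# become arbitrary symbols such as "foo"/"bar" in the tuning prompt.
print(symbol_tune_examples([("Great movie!", "positive"), ("Terrible.", "negative")],
                           ["positive", "negative"]))
```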
For our experiments, we symbol tune Flan-PaLM, the instruction-tuned variants of PaLM. We use three different sizes of Flan-PaLM models: Flan-PaLM-8B, Flan-PaLM-62B, and Flan-PaLM-540B. We also tested Flan-cont-PaLM-62B (Flan-PaLM-62B at 1.3T tokens instead of 780B tokens), which we abbreviate as 62B-c.
We use a set of ∼300K arbitrary symbols from three categories (integers, character combinations, and words). ∼30K symbols are used during tuning and the rest are held out for evaluation.
Experimental setup
We want to evaluate a model’s ability to perform unseen tasks, so we cannot evaluate on tasks used in symbol tuning (22 datasets) or used during instruction tuning (1.8K tasks). Hence, we choose 11 NLP datasets that were not used during fine-tuning.
In-context learning
In the symbol-tuning procedure, models must learn to reason with in-context examples in order to successfully perform tasks because prompts are modified to ensure that tasks cannot simply be learned from relevant labels or instructions. Symbol-tuned models should perform better in settings where tasks are unclear and require reasoning between in-context examples and their labels. To explore these settings, we define four in-context learning settings that vary the amount of reasoning required between inputs and labels in order to learn the task (based on the availability of instructions and relevant labels).
Symbol tuning improves performance across all settings for models 62B and larger, with small improvements in settings with relevant natural language labels (+0.8% to +4.2%) and substantial improvements in settings without relevant natural language labels (+5.5% to +15.5%). Strikingly, when relevant labels are unavailable, symbol-tuned Flan-PaLM-8B outperforms Flan-PaLM-62B, and symbol-tuned Flan-PaLM-62B outperforms Flan-PaLM-540B. This performance difference suggests that symbol tuning can allow much smaller models to perform as well as large models on these tasks (effectively saving ∼10X inference compute).
Algorithmic reasoning
We also experiment on algorithmic reasoning tasks from BIG-Bench. There are two main groups of tasks: 1) List functions — identify a transformation function (e.g., remove the last element in a list) between input and output lists containing non-negative integers; and 2) simple Turing concepts — reason with binary strings to learn the concept that maps an input to an output (e.g., swapping 0s and 1s in a string).
On the list function and simple Turing concept tasks, symbol tuning results in an average performance improvement of 18.2% and 15.3%, respectively. Additionally, Flan-cont-PaLM-62B with symbol tuning outperforms Flan-PaLM-540B on the list function tasks on average, which is equivalent to a ∼10x reduction in inference compute. These improvements suggest that symbol tuning strengthens the model’s ability to learn in-context for unseen task types, as symbol tuning did not include any algorithmic data.
Symbol-tuned models achieve higher performance on list function tasks and simple Turing concept tasks. (A–E): categories of list function tasks. (F): simple Turing concepts task.
Flipped labels
In the flipped-label experiment, labels of in-context and evaluation examples are flipped, meaning that prior knowledge and input-label mappings disagree (e.g., sentences containing positive sentiment labeled as “negative sentiment”), thereby allowing us to study whether models can override prior knowledge. Previous work has shown that while pre-trained models (without instruction tuning) can, to some extent, follow flipped labels presented in-context, instruction tuning degraded this ability.
We see a similar trend across all model sizes — symbol-tuned models are much more capable of following flipped labels than instruction-tuned models. We found that after symbol tuning, Flan-PaLM-8B sees an average improvement across all datasets of 26.5%, Flan-PaLM-62B sees an improvement of 33.7%, and Flan-PaLM-540B sees an improvement of 34.0%. Additionally, symbol-tuned models achieve average performance similar to or better than that of pretraining-only models.
Symbol-tuned models are much better at following flipped labels presented in-context than instruction-tuned models are.
Conclusion
We presented symbol tuning, a new method of tuning models on tasks where natural language labels are remapped to arbitrary symbols. Symbol tuning is based on the intuition that when models cannot use instructions or relevant labels to determine a presented task, they must instead do so by learning from the in-context examples. We tuned four language models using our symbol-tuning procedure, utilizing a tuning mixture of 22 datasets and approximately 30K arbitrary symbols as labels.
We first showed that symbol tuning improves performance on unseen in-context learning tasks, especially when prompts do not contain instructions or relevant labels. We also found that symbol-tuned models were much better at algorithmic reasoning tasks, despite the lack of numerical or algorithmic data in the symbol-tuning procedure. Finally, in an in-context learning setting where inputs have flipped labels, symbol tuning (for some datasets) restores the ability to follow flipped labels that was lost during instruction tuning.
Future work
Through symbol tuning, we aim to increase the degree to which models can examine and learn from input–label mappings during in-context learning. We hope that our results encourage further work towards improving language models’ ability to reason over symbols presented in-context.
Acknowledgements
The authors of this post are now part of Google DeepMind. This work was conducted by Jerry Wei, Le Hou, Andrew Lampinen, Xiangning Chen, Da Huang, Yi Tay, Xinyun Chen, Yifeng Lu, Denny Zhou, Tengyu Ma, and Quoc V. Le. We would like to thank our colleagues at Google Research and Google DeepMind for their advice and helpful discussions.
Effectively solve distributed training convergence issues with Amazon SageMaker Hyperband Automatic Model Tuning
Recent years have shown amazing growth in deep learning neural networks (DNNs). This growth can be seen in more accurate models and even in new possibilities enabled by generative AI: large language models (LLMs) that synthesize natural language, text-to-image generators, and more. These increased capabilities of DNNs come at the cost of massive models that require significant computational resources to train. Distributed training addresses this problem with two techniques: data parallelism and model parallelism. Data parallelism scales the training process over multiple nodes and workers, and model parallelism splits a model and fits its parts over the designated infrastructure. Amazon SageMaker distributed training jobs enable you, with one click (or one API call), to set up a distributed compute cluster, train a model, save the result to Amazon Simple Storage Service (Amazon S3), and shut down the cluster when complete. Furthermore, SageMaker has continuously innovated in the distributed training space by launching features like heterogeneous clusters and distributed training libraries for data parallelism and model parallelism.
Efficient training in a distributed environment requires adjusting hyperparameters. A common example of good practice when training on multiple GPUs is to multiply the batch (or mini-batch) size by the number of GPUs in order to keep the same batch size per GPU. However, adjusting hyperparameters often impacts model convergence. Therefore, distributed training needs to balance three factors: distribution, hyperparameters, and model accuracy.
In this post, we explore the effect of distributed training on convergence and how to use Amazon SageMaker Automatic Model Tuning to fine-tune model hyperparameters for distributed training using data parallelism.
The source code mentioned in this post can be found on the GitHub repository (an m5.xlarge instance is recommended).
Scale out training from a single to distributed environment
Data parallelism is a way to scale the training process to multiple compute resources and achieve faster training times. With data parallelism, data is partitioned among the compute nodes, and each node computes the gradients based on its partition and updates the model. These updates can be done using one or multiple parameter servers in an asynchronous, one-to-many, or all-to-all fashion. Alternatively, an AllReduce algorithm can be used; for example, in the ring-allreduce algorithm, each node communicates with only two of its neighboring nodes, thereby reducing the overall data transfers. To learn more about parameter servers and ring-allreduce, see Launching TensorFlow distributed training easily with Horovod or Parameter Servers in Amazon SageMaker. With regard to data partitioning, if there are n compute nodes, then each node should get a subset of the data, approximately 1/n in size.
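The following framework-free Python sketch illustrates the synchronous data-parallel pattern described above: shard the data, compute local gradients, combine them (here a simple mean stands in for the all-reduce), and apply one shared update. The `grad_fn` callable is a placeholder for an actual model's gradient computation.

```python
import numpy as np

def sgd_step_data_parallel(grad_fn, weights, data, n_nodes=4, lr=0.1):
    """Illustrative synchronous data-parallel SGD step (not SageMaker code).

    grad_fn(weights, shard) -> gradient array of the same shape as weights.
    """
    shards = np.array_split(data, n_nodes)                 # each node gets ~1/n of the data
    grads = [grad_fn(weights, shard) for shard in shards]  # local gradients per node
    avg_grad = np.mean(grads, axis=0)                      # all-reduce step (averaging)
    return weights - lr * avg_grad                         # identical update on every node
```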
To demonstrate the effect of scaling out training on model convergence, we run two simple experiments:
- Train an image classification model using a DNN with fully connected layers and ReLU activation functions, implemented with the MXNet and Gluon frameworks. For training data, we used the MNIST dataset of handwritten digits. We used the source provided in the SageMaker example repository.
- Train a binary classification model using the SageMaker built-in XGBoost algorithm. We used the direct marketing dataset to predict bank customers who are likely to respond with a specific offer. The source code and steps to reproduce the experiment can be found on the GitHub repo.
Each model was trained twice: on a single instance and distributed over multiple instances. For the DNN distributed training, in order to fully utilize the distributed processors, we multiplied the mini-batch size by the number of instances (four). The following table summarizes the setup and results.
| Problem type | Image classification | | Binary classification | |
|---|---|---|---|---|
| Model | DNN | | XGBoost | |
| Instance | ml.c4.xlarge | | ml.m5.2xlarge | |
| Data set | MNIST (labeled images) | | Direct Marketing (tabular, numeric and vectorized categories) | |
| Validation metric | Accuracy | | AUC | |
| Epochs/Rounds | 20 | | 150 | |
| Number of instances | 1 | 4 | 1 | 3 |
| Distribution type | N/A | Parameter server | N/A | AllReduce |
| Training time (minutes) | 8 | 3 | 3 | 1 |
| Final validation score | 0.97 | 0.11 | 0.78 | 0.63 |
For both models, the training time was reduced almost linearly by the distribution factor. However, model convergence suffered a significant drop. This behavior is consistent for the two different models, the different compute instances, the different distribution methods, and different data types. So, why did distributing the training process affect model accuracy?
There are a number of theories that try to explain this effect:
- When tensor updates are big in size, traffic between workers and the parameter server can get congested. Therefore, asynchronous parameter servers will suffer significantly worse convergence due to delays in weights updates [1].
- Increasing batch size can lead to over-fitting and poor generalization, thereby reducing the validation accuracy [2].
- When asynchronously updating model parameters, some DNNs might not be using the most recent updated model weights; therefore, they will be calculating gradients based on weights that are a few iterations behind. This leads to weight staleness [3] and can be caused by a number of reasons.
- Some hyperparameters are model or optimizer specific. For example, the XGBoost official documentation says that the `exact` value for the `tree_method` hyperparameter doesn’t support distributed training, because XGBoost employs row-splitting data distribution whereas the `exact` tree method works on a sorted column format.
- Some researchers proposed that configuring a larger mini-batch may lead to gradients with less stochasticity. This can happen when the loss function contains local minima and saddle points and no change is made to the step size, leading to the optimization getting stuck in such local minima or saddle points [4].
Optimize for distributed training
Hyperparameter optimization (HPO) is the process of searching and selecting a set of hyperparameters that are optimal for a learning algorithm. SageMaker Automatic Model Tuning (AMT) provides HPO as a managed service by running multiple training jobs on the provided dataset. SageMaker AMT searches the ranges of hyperparameters that you specify and returns the best values, as measured by a metric that you choose. You can use SageMaker AMT with the built-in algorithms or use your custom algorithms and containers.
However, optimizing for distributed training differs from common HPO because instead of launching a single instance per training job, each job actually launches a cluster of instances. This means a greater impact on cost (especially if you consider costly GPU-accelerated instances, which are typical for DNNs). In addition to AMT limits, you could possibly hit SageMaker account limits for the number of concurrent training instances. Finally, launching clusters can introduce operational overhead due to longer starting times. SageMaker AMT has specific features to address these issues. Hyperband with early stopping ensures that well-performing hyperparameter configurations are fine-tuned and those that underperform are automatically stopped. This enables efficient use of training time and reduces unnecessary costs. SageMaker AMT also fully supports the use of Amazon EC2 Spot Instances, which can reduce the cost of training by up to 90% compared to On-Demand Instances. With regard to long start times, SageMaker AMT automatically reuses training instances within each tuning job, thereby reducing the average startup time of each training job by 20 times. Additionally, you should follow AMT best practices, such as choosing the relevant hyperparameters, their appropriate ranges and scales, and the best number of concurrent training jobs, and setting a random seed to reproduce results.
In the next section, we see these features in action as we configure, run, and analyze an AMT job using the XGBoost example we discussed earlier.
Configure, run, and analyze a tuning job
As mentioned earlier, the source code can be found on the GitHub repo. In Steps 1–5, we download and prepare the data, create the `xgb3` estimator (the distributed XGBoost estimator is set to use three instances), run the training jobs, and observe the results. In this section, we describe how to set up the tuning job for that estimator, assuming you already went through Steps 1–5.
A tuning job computes optimal hyperparameters for the training jobs it launches by using a metric to evaluate performance. You can configure your own metric, which SageMaker will parse based on a regex you configure and emit to `stdout`, or use the metrics of SageMaker built-in algorithms. In this example, we use the built-in XGBoost objective metric, so we don’t need to configure a regex. To optimize for model convergence, we optimize based on the validation AUC metric:
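(The original snippet is not reproduced here; the following is a minimal sketch of how this objective might be declared with the SageMaker Python SDK, with illustrative variable names.)

```python
# Built-in XGBoost metric emitted by the algorithm; no custom regex is needed.
objective_metric_name = "validation:auc"
objective_type = "Maximize"  # higher AUC is better
```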
We tune seven hyperparameters:
- num_round – Number of rounds for boosting during the training.
- eta – Step size shrinkage used in updates to prevent overfitting.
- alpha – L1 regularization term on weights.
- min_child_weight – Minimum sum of instance weight (hessian) needed in a child. If the tree partition step results in a leaf node with a sum of instance weight less than `min_child_weight`, the building process gives up further partitioning.
- max_depth – Maximum depth of a tree.
- colsample_bylevel – Subsample ratio of columns for each split, in each level. This subsampling takes place once for every new depth level reached in a tree.
- colsample_bytree – Subsample ratio of columns when constructing each tree. For every tree constructed, the subsampling occurs once.
To learn more about XGBoost hyperparameters, see XGBoost Hyperparameters. The following code shows the seven hyperparameters and their ranges:
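(The original code block was not captured here; the sketch below shows how such ranges might be declared with the SageMaker Python SDK. The specific range values are illustrative assumptions, not the values used in the post.)

```python
from sagemaker.tuner import ContinuousParameter, IntegerParameter

# Illustrative ranges only; tune these for your own dataset.
hyperparameter_ranges = {
    "num_round": IntegerParameter(50, 300),
    "eta": ContinuousParameter(0.01, 0.5),
    "alpha": ContinuousParameter(0, 2),
    "min_child_weight": ContinuousParameter(1, 10),
    "max_depth": IntegerParameter(3, 10),
    "colsample_bylevel": ContinuousParameter(0.1, 1.0),
    "colsample_bytree": ContinuousParameter(0.5, 1.0),
}
```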
Next, we provide the configuration for the Hyperband strategy and the tuner object configuration using the SageMaker SDK. `HyperbandStrategyConfig` can use two parameters: `max_resource` (optional), the maximum number of iterations to be used for a training job to achieve the objective, and `min_resource`, the minimum number of iterations to be used by a training job before stopping the training. We use `HyperbandStrategyConfig` to configure `StrategyConfig`, which is later used by the tuning job definition. See the following code:
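(The original snippet is not included here; this is an illustrative sketch using the SageMaker Python SDK, with assumed resource values.)

```python
from sagemaker.tuner import HyperbandStrategyConfig, StrategyConfig

# Illustrative resource bounds; Hyperband allocates iterations between them.
hyperband_strategy_config = HyperbandStrategyConfig(
    max_resource=30,  # upper bound on iterations a single training job may use
    min_resource=1,   # lower bound before a job can be stopped
)
strategy_config = StrategyConfig(hyperband_strategy_config=hyperband_strategy_config)
```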
Now we create a `HyperparameterTuner` object, to which we pass the following information:
- The XGBoost estimator, set to run with three instances
- The objective metric name and definition
- Our hyperparameter ranges
- Tuning resource configurations such as number of training jobs to run in total and how many training jobs can be run in parallel
- Hyperband settings (the strategy and configuration we configured in the last step)
- Early stopping (`early_stopping_type`) set to `Off`
Why do we set early stopping to `Off`? Training jobs can be stopped early when they are unlikely to improve the objective metric of the hyperparameter tuning job. This can help reduce compute time and avoid overfitting your model. However, Hyperband uses an advanced built-in mechanism to apply early stopping. Therefore, the parameter `early_stopping_type` must be set to `Off` when using the Hyperband internal early stopping feature. See the following code:
Finally, we start the automatic model tuning job by calling the fit method. If you want to launch the job in an asynchronous fashion, set `wait` to `False`. See the following code:
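(A sketch of the call; the channel names and the `train_input`/`validation_input` objects are placeholders for the inputs prepared in Steps 1–5.)

```python
# train_input and validation_input stand for the S3 training inputs created earlier.
tuner.fit(
    {"train": train_input, "validation": validation_input},
    wait=False,  # set to True to block until the tuning job finishes
)
```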
You can follow the job progress and summary on the SageMaker console. In the navigation pane, under Training, choose Hyperparameter tuning jobs, then choose the relevant tuning job. The following screenshot shows the tuning job with details on the training jobs’ status and performance.
When the tuning job is complete, we can review the results. In the notebook example, we show how to extract results using the SageMaker SDK. First, we examine how the tuning job improved model convergence. You can attach the `HyperparameterTuner` object using the job name and call the describe method. The method returns a dictionary containing tuning job metadata and results.
In the following code, we retrieve the value of the best-performing training job, as measured by our objective metric (validation AUC):
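(Illustrative sketch; the tuning job name is a placeholder.)

```python
from sagemaker.tuner import HyperparameterTuner

attached_tuner = HyperparameterTuner.attach("my-tuning-job-name")  # placeholder job name
desc = attached_tuner.describe()
best_auc = desc["BestTrainingJob"]["FinalHyperParameterTuningJobObjectiveMetric"]["Value"]
print(best_auc)
```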
The result is 0.78 in AUC on the validation set. That’s a significant improvement over the initial 0.63!
Next, let’s see how fast our training jobs ran. For that, we use the HyperparameterTuningJobAnalytics class in the SDK to fetch results about the tuning job, and read them into a Pandas DataFrame for analysis and visualization:
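(Sketch only; the tuning job name is again a placeholder.)

```python
import sagemaker

analytics = sagemaker.HyperparameterTuningJobAnalytics("my-tuning-job-name")  # placeholder
df = analytics.dataframe()  # one row per training job, with timing and objective values
df.head()
```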
Let’s see the average time a training job took with Hyperband strategy:
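(For example, assuming the DataFrame from the previous step.)

```python
# TrainingElapsedTimeSeconds is reported per training job in the analytics DataFrame.
avg_minutes = df["TrainingElapsedTimeSeconds"].mean() / 60
print(f"Average training job duration: {avg_minutes:.1f} minutes")
```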
The average training job took approximately 1 minute. This is consistent with the Hyperband strategy mechanism that stops underperforming training jobs early. In terms of cost, the tuning job charged us for a total of 30 minutes of training time. Without Hyperband early stopping, the total billable training duration was expected to be 90 minutes (30 jobs * 1 minute per job * 3 instances per job). That is a threefold cost saving! Finally, we see that the tuning job ran 30 training jobs and took a total of 12 minutes. That is almost 50% less than the expected time (30 jobs / 4 jobs in parallel * 3 minutes per job, or about 22 minutes).
Conclusion
In this post, we described some observed convergence issues when training models with distributed environments. We saw that SageMaker AMT using Hyperband addressed the main concerns that optimizing data parallel distributed training introduced: convergence (which improved by more than 10%), operational efficiency (the tuning job took 50% less time than a sequential, non-optimized job would have taken) and cost-efficiency (30 vs. the 90 billable minutes of training job time). The following table summarizes our results:
| Improvement metric | No tuning / naive model tuning implementation | SageMaker Hyperband automatic model tuning | Measured improvement |
|---|---|---|---|
| Model quality (measured by validation AUC) | 0.63 | 0.78 | 15% |
| Cost (measured by billable training minutes) | 90 | 30 | 66% |
| Operational efficiency (measured by total running time, minutes) | 24 | 12 | 50% |
In order to fine-tune with regards to scaling (cluster size), you can repeat the tuning job with multiple cluster configurations and compare the results to find the optimal hyperparameters that satisfy speed and model accuracy.
We included the steps to achieve this in the last section of the notebook.
References
[1] Lian, Xiangru, et al. “Asynchronous decentralized parallel stochastic gradient descent.” International Conference on Machine Learning. PMLR, 2018.

[2] Keskar, Nitish Shirish, et al. “On large-batch training for deep learning: Generalization gap and sharp minima.” arXiv preprint arXiv:1609.04836 (2016).

[3] Dai, Wei, et al. “Toward understanding the impact of staleness in distributed machine learning.” arXiv preprint arXiv:1810.03264 (2018).

[4] Dauphin, Yann N., et al. “Identifying and attacking the saddle point problem in high-dimensional non-convex optimization.” Advances in Neural Information Processing Systems 27 (2014).

About the Author
Uri Rosenberg is the AI & ML Specialist Technical Manager for Europe, Middle East, and Africa. Based out of Israel, Uri works to empower enterprise customers to design, build, and operate ML workloads at scale. In his spare time, he enjoys cycling, hiking, and complaining about data preparation.
AI-Fueled Productivity: Generative AI Opens New Era of Efficiency Across Industries
A watershed moment on Nov. 30, 2022, was mostly virtual, yet it shook the foundations of nearly every industry on the planet.
On that day, OpenAI released ChatGPT, the most advanced artificial intelligence chatbot ever developed. This set off demand for generative AI applications that help businesses become more efficient, from providing consumers with answers to their questions to accelerating the work of researchers as they seek scientific breakthroughs, and much, much more.
Businesses that previously dabbled in AI are now rushing to adopt and deploy the latest applications. Generative AI — the ability of algorithms to create new text, images, sounds, animations, 3D models and even computer code — is moving at warp speed, transforming the way people work and play.
By employing large language models (LLMs) to handle queries, the technology can dramatically reduce the time people devote to manual tasks like searching for and compiling information.
The stakes are high. AI could contribute more than $15 trillion to the global economy by 2030, according to PwC. And the impact of AI adoption could be greater than the inventions of the internet, mobile broadband and the smartphone — combined.
The engine driving generative AI is accelerated computing. It uses GPUs, DPUs and networking along with CPUs to accelerate applications across science, analytics, engineering, as well as consumer and enterprise use cases.
Early adopters across industries — from drug discovery, financial services, retail and telecommunications to energy, higher education and the public sector — are combining accelerated computing with generative AI to transform business operations, service offerings and productivity.
Generative AI for Drug Discovery
Today, radiologists use AI to detect abnormalities in medical images, doctors use it to scan electronic health records to uncover patient insights, and researchers use it to accelerate the discovery of novel drugs.
Traditional drug discovery is a resource-intensive process that can require the synthesis of over 5,000 chemical compounds and yields an average success rate of just 10%. And it takes more than a decade for most new drug candidates to reach the market.
Researchers are now using generative AI models to read a protein’s amino acid sequence and accurately predict the structure of target proteins in seconds, rather than weeks or months.
Using NVIDIA BioNeMo models, Amgen, a global leader in biotechnology, has slashed the time it takes to customize models for molecule screening and optimization from three months to just a few weeks. This type of trainable foundation model enables scientists to create variants for research into specific diseases, allowing them to develop targeted treatments for rare conditions.
Whether predicting protein structures or securely training algorithms on large real-world and synthetic datasets, generative AI and accelerated computing are opening new areas of research that can help mitigate the spread of disease, enable personalized medical treatments and boost patient survival rates.
Generative AI for Financial Services
According to a recent NVIDIA survey, the top AI use cases in the financial services industry are customer services and deep analytics, where natural language processing and LLMs are used to better respond to customer inquiries and uncover investment insights. Another common application is in recommender systems that power personalized banking experiences, marketing optimization and investment guidance.
Advanced AI applications have the potential to help the industry better prevent fraud and transform every aspect of banking, from portfolio planning and risk management to compliance and automation.
Eighty percent of business-relevant information is in an unstructured format — primarily text — which makes it a prime candidate for generative AI. Bloomberg News produces 5,000 stories a day related to the financial and investment community. These stories represent a vast trove of unstructured market data that can be used to make timely investment decisions.
NVIDIA, Deutsche Bank, Bloomberg and others are creating LLMs trained on domain-specific and proprietary data to power finance applications.
Financial Transformers, or “FinFormers,” can learn context and understand the meaning of unstructured financial data. They can power Q&A chatbots, summarize and translate financial texts, provide early warning signs of counterparty risk, quickly retrieve data and identify data-quality issues.
These generative AI tools rely on frameworks that can integrate proprietary data into model training and fine-tuning, integrate data curation to prevent bias and use guardrails to keep conversations finance-specific.
Expect fintech startups and large international banks to expand their use of LLMs and generative AI to develop sophisticated virtual assistants to serve internal and external stakeholders, create hyper-personalized customer content, automate document summarization to reduce manual work, and analyze terabytes of public and private data to generate investment insights.
Generative AI for Retail
With 60% of all shopping journeys starting online and consumers more connected and knowledgeable than ever, AI has become a vital tool to help retailers match shifting expectations and differentiate from a rising tide of competition.
Retailers are using AI to improve customer experiences, power dynamic pricing, create customer segmentation, design personalized recommendations and perform visual search.
Generative AI can support customers and employees at every step through the buyer journey.
AI models trained on specific brand and product data can generate robust product descriptions that improve search engine optimization rankings and help shoppers find the exact product they’re looking for. For example, generative AI can use metatags containing product attributes to generate more comprehensive product descriptions that include terms like “low sugar” or “gluten free.”
AI virtual assistants can check enterprise resource planning systems and generate customer service messages to inform shoppers about which items are available and when orders will ship, and even assist customers with order change requests.
Fashable, a member of NVIDIA Inception’s global network of technology startups, is using generative AI to create virtual clothing designs, eliminating the need for physical fabric during product development. With the models trained on both proprietary and market data, this reduces the environmental impact of fashion design and helps retailers design clothes according to current market trends and tastes.
Expect retailers to use AI to capture and retain customer attention, deliver superior shopping experiences, and drive revenue by matching shoppers with the right products at the right time.
Generative AI for Telecommunications
In an NVIDIA survey covering the telecommunications industry, 95% of respondents reported that they were engaged with AI, while two-thirds believed that AI would be important to their company’s future success.
Whether improving customer service, streamlining network operations and design, supporting field technicians or creating new monetization opportunities, generative AI has the potential to reinvent the telecom industry.
Telcos can train diagnostic AI models with proprietary data on network equipment and services, performance, ticket issues, site surveys and more. These models can accelerate troubleshooting of technical performance issues, recommend network designs, check network configurations for compliance, predict equipment failures, and identify and respond to security threats.
Generative AI applications on handheld devices can support field technicians by scanning equipment and generating virtual tutorials to guide them through repairs. Virtual guides can then be enhanced with augmented reality, enabling technicians to analyze equipment in a 3D immersive environment or call on a remote expert for support.
New revenue opportunities will also open for telcos. With large edge infrastructure and access to vast datasets, telcos around the world are now offering generative AI as a service to enterprise and government customers.
As generative AI advances, expect telecommunications providers to use the technology to optimize network performance, improve customer support, detect security intrusions and enhance maintenance operations.
Generative AI for Energy
In the energy industry, AI is powering predictive maintenance and asset optimization, smart grid management, renewable energy forecasting, grid security and more.
To meet growing data needs across aging infrastructure and new government compliance regulations, energy operators are looking to generative AI.
In the U.S., electric utility companies spend billions of dollars every year to inspect, maintain and upgrade power generation and transmission infrastructure.
Until recently, using vision AI to support inspection required algorithms to be trained on thousands of manually collected and tagged photos of grid assets, with training data constantly updated for new components. Now, generative AI can do the heavy lifting.
With a small set of image training data, algorithms can generate thousands of physically accurate images to train computer vision models that help field technicians identify grid equipment corrosion, breakage, obstructions and even detect wildfires. This type of proactive maintenance enhances grid reliability and resiliency by reducing downtime, while diminishing the need to dispatch teams to the field.
Generative AI can also reduce the need for manual research and analysis. According to McKinsey, employees spend up to 1.8 hours per day searching for information — nearly 20% of the work week. To increase productivity, energy companies can train LLMs on proprietary data, including meeting notes, SAP records, emails, field best practices and public data such as standard material data sheets.
With this type of knowledge repository connected to an AI chatbot, engineers and data scientists can get instant answers to highly technical questions. For example, a maintenance engineer troubleshooting pitch control issues on a turbine’s hydraulic system could ask a bot: “How should I adjust the hydraulic pressure or flow to rectify pitch control issues on a model turbine from company X?” A properly trained model would deliver specific instructions to the user, who wouldn’t have to look through a bulky manual to find answers.
With AI applications for new system design, customer service and automation, expect generative AI to enhance safety and energy efficiency, as well as reduce operational expenses in the energy industry.
Generative AI for Higher Education and Research
From intelligent tutoring systems to automated essay grading, AI has been employed in education for decades. As universities use AI to improve teacher and student experiences, they’re increasingly dedicating resources to build AI-focused research initiatives.
For example, researchers at the University of Florida have access to one of the world’s fastest supercomputers in academia. They’ve used it to develop GatorTron — a natural language processing model that enables computers to read and interpret medical language in clinical notes that are stored in electronic health records. With a model that understands medical context, AI developers can create numerous medical applications, such as speech-to-text apps that support doctors with automated medical charting.
In Europe, an industry-university collaboration involving the Technical University of Munich is demonstrating that LLMs trained on genomics data can generalize across a plethora of genomic tasks, unlike previous approaches that required specialized models. The genomics LLM is expected to help scientists understand the dynamics of how DNA is translated into RNA and proteins, unlocking new clinical applications that will benefit drug discovery and health.
To conduct this type of groundbreaking research and attract the most motivated students and qualified academic professionals, higher education institutes should consider a whole-university approach to pool budget, plan AI initiatives, and distribute AI resources and benefits across disciplines.
Generative AI for the Public Sector
Today, the biggest opportunity for AI in the public sector is helping public servants to perform their jobs more efficiently and save resources.
The U.S. federal government employs over 2 million civilian employees — two-thirds of whom work in professional and administrative jobs.
These administrative roles often involve time-consuming manual tasks, including drafting, editing and summarizing documents, updating databases, recording expenditures for auditing and compliance, and responding to citizen inquiries.
To control costs and bring greater efficiency to routine job functions, government agencies can use generative AI.
Generative AI’s ability to summarize documents has great potential to boost the productivity of policymakers and staffers, civil servants, procurement officers and contractors. Consider a 756-page report recently released by the National Security Commission on Artificial Intelligence. With reports and legislation often spanning hundreds of pages of dense academic or legal text, AI-powered summaries generated in seconds can quickly break down complex content into plain language, saving the human resources otherwise needed to complete the task.
AI virtual assistants and chatbots powered by LLMs can instantly deliver relevant information to people online, taking the burden off of overstretched staff who work phone banks at agencies like the Treasury Department, IRS and DMV.
With simple text inputs, AI content generation can help public servants create and distribute publications, email correspondence, reports, press releases and public service announcements.
The analytical capabilities of AI can also help process documents to speed the delivery of vital services provided by organizations like Medicare, Medicaid, Veterans Affairs, USPS and the State Department.
Generative AI could be a pivotal tool to help government bodies work within budget constraints, deliver government services more quickly and achieve positive public sentiment.
Generative AI – A Key Ingredient for Business Success
Across every field, organizations are transforming employee productivity, improving products and delivering higher-quality services with generative AI.
To put generative AI into practice, businesses need expansive amounts of data, deep AI expertise and sufficient compute power to deploy and maintain models quickly. Enterprises can fast-track adoption with the NeMo generative AI framework, part of NVIDIA AI Enterprise software, running on DGX Cloud. NVIDIA’s pretrained foundation models offer a simplified approach to building and running customized generative AI solutions for unique business use cases.
Learn more about powerful generative AI tools to help your business increase productivity, automate tasks, and unlock new opportunities for employees and customers.
Bringing code analysis tools to Jupyter notebooks
Based on a survey of thousands of machine learning practitioners, a new CodeGuru extension addresses common problems, such as code cell execution order, incorrect API calls, and security.
Full-Scale Gaming: ‘Dragon’s Dogma: Dark Arisen’ Comes to GeForce NOW
Arise, members! Capcom’s legendary role-playing game Dragon’s Dogma: Dark Arisen joins the GeForce NOW library today.
The RPG and THQ Nordic’s Jagged Alliance 3 are newly supported on GeForce NOW, playable on nearly any device.
From Dusk Till Pawn
Become the Arisen and take up the challenge in Capcom’s critically acclaimed RPG. Set in a huge open world, Dragon’s Dogma: Dark Arisen brings players on an epic adventure filled with challenging battles and action.
But there’s no need to go it alone: Adventure with up to three Pawns. These customizable AI companions fight independently, demonstrating prowess and ability they’ve developed based on traits learned from each player.
Players can share their Pawns online and reap rewards of treasures, tips and strategy hints for taking down terrifying enemies. Pawns can also be borrowed when specific skills are needed to complete various challenging quests.
Revisit Gransys or experience Dragon’s Dogma for the first time. Members can play the real Steam version of this RPG classic with support for stunning visuals and high-resolution graphics, even on devices like Macs, mobile devices and smart TVs. Priority members can adventure at up to 1080p 60 frames per second, or upgrade to an Ultimate membership for gameplay at up to 4K 120 fps, longer streaming sessions and RTX ON for supported games.
Game On
Another week means new games.
THQ Nordic’s tactical RPG Jagged Alliance 3 joins the cloud this week. Chaos reigns when the elected president of Grand Chien — a nation of rich natural resources and deep political divides — goes missing and a paramilitary force known as “The Legion” seizes control of the countryside. Recruit from a large cast of unique mercenaries and make choices to impact the country’s fate.
Members can look forward to the following this week:
On top of that, in collaboration with EE, the U.K.’s biggest and fastest mobile network, GeForce NOW launched new cloud gaming bundles featuring Priority and Ultimate memberships. To celebrate, check out how streamer Leah ‘Leahviathan’ Alexandra showcased GeForce NOW in action at the U.K.’s highest-altitude gaming den on the slopes of Ben Nevis, 1,500 feet above sea level in the clouds of the Scottish Highlands.
What are you planning to play this weekend? Let us know on Twitter or in the comments below.
Introducing NotebookLM
We’re rolling out NotebookLM, an experimental offering from Google Labs that helps summarize information, explain complex ideas and brainstorm new connections.