NVIDIA to Share New Details on Grace CPU, Hopper GPU, NVLink Switch, Jetson Orin Module at Hot Chips

In four talks over two days, senior NVIDIA engineers will describe innovations in accelerated computing for modern data centers and systems at the edge of the network.

Speaking at a virtual Hot Chips event, an annual gathering of processor and system architects, they’ll disclose performance numbers and other technical details for NVIDIA’s first server CPU, the Hopper GPU, the latest version of the NVSwitch interconnect chip and the NVIDIA Jetson Orin system on module (SoM).

The presentations provide fresh insights on how the NVIDIA platform will hit new levels of performance, efficiency, scale and security.

Specifically, the talks demonstrate a design philosophy of innovating across the full stack of chips, systems and software where GPUs, CPUs and DPUs act as peer processors. Together they create a platform that’s already running AI, data analytics and high performance computing jobs inside cloud service providers, supercomputing centers, corporate data centers and autonomous systems.

Inside NVIDIA’s First Server CPU

Data centers require flexible clusters of CPUs, GPUs and other accelerators sharing massive pools of memory to deliver the energy-efficient performance today’s workloads demand.

To meet that need, Jonathon Evans, a distinguished engineer and 15-year veteran at NVIDIA, will describe the NVIDIA NVLink-C2C. It connects CPUs and GPUs at 900 gigabytes per second with 5x the energy efficiency of the existing PCIe Gen 5 standard, thanks to data transfers that consume just 1.3 picojoules per bit.

NVLink-C2C connects two CPU chips to create the NVIDIA Grace CPU with 144 Arm Neoverse cores. It’s a processor built to solve the world’s largest computing problems.

For maximum efficiency, the Grace CPU uses LPDDR5X memory. It enables a terabyte per second of memory bandwidth while keeping power consumption for the entire complex to 500 watts.

One Link, Many Uses

NVLink-C2C also links Grace CPU and Hopper GPU chips as memory-sharing peers in the NVIDIA Grace Hopper Superchip, delivering maximum acceleration for performance-hungry jobs such as AI training.

Anyone can build custom chiplets using NVLink-C2C to coherently connect to NVIDIA GPUs, CPUs, DPUs and SoCs, expanding this new class of integrated products. The interconnect will support AMBA CHI and CXL protocols used by Arm and x86 processors, respectively.

First memory benchmarks for Grace and Grace Hopper.

To scale at the system level, the new NVIDIA NVSwitch connects multiple servers into one AI supercomputer. It uses NVLink interconnects running at 900 gigabytes per second, more than 7x the bandwidth of PCIe Gen 5.

NVSwitch lets users link 32 NVIDIA DGX H100 systems into an AI supercomputer that delivers an exaflop of peak AI performance.

Alexander Ishii and Ryan Wells, both veteran NVIDIA engineers, will describe how the switch lets users build systems with up to 256 GPUs to tackle demanding workloads like training AI models that have more than 1 trillion parameters.

The switch includes engines that speed data transfers using the NVIDIA Scalable Hierarchical Aggregation and Reduction Protocol. SHARP is an in-network computing capability that debuted on NVIDIA Quantum InfiniBand networks. It can double data throughput on communications-intensive AI applications.

NVSwitch systems enable exaflop-class AI supercomputers.

Jack Choquette, a senior distinguished engineer with 14 years at the company, will provide a detailed tour of the NVIDIA H100 Tensor Core GPU, aka Hopper.

In addition to using the new interconnects to scale to unprecedented heights, it packs many advanced features that boost the accelerator’s performance, efficiency and security.

Hopper’s new Transformer Engine and upgraded Tensor Cores deliver a 30x speedup compared to the prior generation on AI inference with the world’s largest neural network models. And it employs the world’s first HBM3 memory system to deliver a whopping 3 terabytes per second of memory bandwidth, NVIDIA’s biggest generational increase ever.

Choquette, one of the lead chip designers on the Nintendo 64 console early in his career, will also describe parallel computing techniques underlying some of Hopper’s other new features and advances.

Michael Ditty, an architecture manager with a 17-year tenure at the company, will provide new performance specs for NVIDIA Jetson AGX Orin, an engine for edge AI, robotics and advanced autonomous machines.

It integrates 12 Arm Cortex-A78AE cores and an NVIDIA Ampere architecture GPU to deliver up to 275 trillion operations per second on AI inference jobs. That’s up to 8x greater performance at 2.3x higher energy efficiency than the prior generation.

The latest production module packs up to 32 gigabytes of memory and is part of a compatible family that scales down to pocket-sized 5W Jetson Nano developer kits.

Performance benchmarks for NVIDIA Orin.

All the new chips support the NVIDIA software stack that accelerates more than 700 applications and is used by 2.5 million developers.

Based on the CUDA programming model, it includes dozens of NVIDIA SDKs for vertical markets like automotive (DRIVE) and healthcare (Clara), as well as technologies such as recommendation systems (Merlin) and conversational AI (Riva).

The NVIDIA AI platform is available from every major cloud service and system maker.


Meet the Omnivore: Startup in3D Turns Selfies Into Talking, Dancing Avatars With NVIDIA Omniverse

Editor’s note: This post is a part of our Meet the Omnivore series, which features individual creators and developers who use NVIDIA Omniverse to accelerate their 3D workflows and create virtual worlds.

Imagine taking a selfie and using it to get a moving, talking, customizable 3D avatar of yourself in just seconds.

A new extension for NVIDIA Omniverse, a design collaboration and world simulation platform, enables just that.

Created by developers at software startup in3D, the extension lets people instantly import 3D avatars of themselves into virtual environments using their smartphones. Omniverse Extensions are the core building blocks that let anyone create and extend functions of Omniverse Apps.

The in3D app can now bring people, in their digital forms, into Omniverse. It helps creators build engaging virtual worlds and use these avatars as heroes, actors or spectators in their stories. The app works on any phone with a camera, recreating a user’s full geometry and texture based on a video selfie.

The avatars can even be added into 3D worlds with animations and a customizable wardrobe.

In3D is a member of NVIDIA Inception, a free, global program that nurtures cutting-edge startups.

Simple and Scalable Avatar Creation

Creating a photorealistic 3D avatar has traditionally taken up to several months, with costs reaching up to tens of thousands of dollars. Photogrammetry, a standard approach to creating 3D references of humans from images, is extremely costly, requires a digital studio and lacks scalability.

With in3D, the process of creating 3D avatars is simple and scalable. The app understands the geometry, texture, depth and various vectors of a person via a mobile scan — and uses this information to replicate lifelike detail and create predictive animations for avatars.

Dmitry Ulyanov, CEO of in3D, which is based in Tel Aviv, Israel, said the app captures even small details with centimeter-grade accuracy and automatically fixes lighting. This allows for precise head geometry from a single selfie, as well as estimation of a user’s exact body shape.

For creators building 3D worlds, in3D software can save countless hours, increase productivity and result in substantial cost savings, Ulyanov said.

“Manually creating one avatar can take up to months,” he added. “With in3D’s scanning app and software development kit, a user can scan and upload 21,000 people with a single GPU and mobile phone in the same amount of time.”

Connecting to Omniverse

Ulyanov said that using in3D’s extension with NVIDIA Omniverse Avatar Cloud Engine (ACE) opens up many possibilities for avatar building, as users can easily customize imported avatars from in3D to engage and interact with their virtual worlds — in real time and at scale.

In3D uses Universal Scene Description (USD), an open-source, extensible file format, to seamlessly integrate its high-fidelity avatars into Omniverse. All avatar data is contained in a USD file, removing the need for complex shaders or embeddings. And bringing the avatars into Omniverse only requires a simple drag and drop.

Once imported into Omniverse via USD, the avatars can be used in apps like Omniverse Create and Audio2Face. Users have a complete toolset within Omniverse to support holistic content creation, whether animating avatars’ bodies with the retargeting tool or crafting their facial expressions with Audio2Face.
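As a rough illustration of how USD keeps this workflow simple, the standard USD Python API can reference an avatar file into a stage in a few lines. This is only a sketch; the file names and prim path below are hypothetical, not in3D's actual asset layout.

from pxr import Usd, UsdGeom

stage = Usd.Stage.CreateNew("scene.usda")          # a new working stage
UsdGeom.Xform.Define(stage, "/World")              # simple root transform
avatar = stage.DefinePrim("/World/Avatar")         # placeholder prim for the avatar
avatar.GetReferences().AddReference("avatar.usd")  # path to the exported avatar file (hypothetical)
stage.GetRootLayer().Save()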

To build the Omniverse Extension, in3D used Omniverse Kit and followed the development flow using Visual Studio Code (VS Code). Being able to put a breakpoint anywhere in the code made VS Code an easy-to-use, convenient, out-of-the-box solution for connecting in3D to Omniverse, Ulyanov said.

“The ability to centralize our SDK alongside other software for 3D developers is game changing,” he said. “With our Omniverse Extension now available, we’re looking to expand the base of developers who use our avatars.”

“Having the ability to upload our SDK and connect it with all the tools that 3D developers use has made in3D a tangible solution to deploy across all 3D development environments,” said Sergei Sherman, chief marketing officer at in3D. “This was something we wouldn’t have been able to achieve on our own in such a short amount of time.”

Join In on the Creation

Creators and developers across the world can download NVIDIA Omniverse for free, and enterprise teams can use the platform for their 3D projects.

Learn how to connect and create virtual worlds with Omniverse at NVIDIA GTC, the design and simulation conference for the era of AI and the metaverse, running online Sept. 19-22. Registration is free and offers access to dozens of sessions and special events.

Developers can use Omniverse Code to create their own Omniverse Extension for the inaugural #ExtendOmniverse contest by Friday, Sept. 9, at 5 p.m. PT, for a chance to win an NVIDIA RTX GPU. The winners will be announced in the NVIDIA Omniverse User Group at GTC.

Find additional documentation and tutorials in the Omniverse Resource Center, which details how developers like Ulyanov can build custom USD-based applications and extensions for the platform.

Follow NVIDIA Omniverse on Instagram, Medium, Twitter and YouTube for additional resources and inspiration. Check out the Omniverse forums, and join our Discord server and Twitch channel to chat with the community.


Run PyTorch Lightning and native PyTorch DDP on Amazon SageMaker Training, featuring Amazon Search

So much data, so little time. Machine learning (ML) experts, data scientists, engineers and enthusiasts have encountered this problem the world over. From natural language processing to computer vision, tabular to time series, and everything in-between, the age-old problem of optimizing for speed when running data against as many GPUs as you can get has inspired countless solutions. Today, we’re happy to announce features for PyTorch developers using native open-source frameworks, like PyTorch Lightning and PyTorch DDP, that will streamline their path to the cloud.

Amazon SageMaker is a fully-managed service for ML, and SageMaker model training is an optimized compute environment for high-performance training at scale. SageMaker model training offers a remote training experience with a seamless control plane to easily train and reproduce ML models at high performance and low cost. We’re thrilled to announce new features in the SageMaker training portfolio that make running PyTorch at scale even easier and more accessible:

  1. PyTorch Lightning can now be integrated with SageMaker’s distributed data parallel library with a one-line code change.
  2. SageMaker model training now supports native PyTorch Distributed Data Parallel with the NCCL backend, making it easier than ever for developers to migrate onto SageMaker.

In this post, we discuss these new features, and also learn how Amazon Search has run PyTorch Lightning with the optimized distributed training backend in SageMaker to speed up model training time.

Before diving into the Amazon Search case study, we’d like to give some background on SageMaker’s distributed data parallel library for those who aren’t familiar with it. In 2020, we developed and launched a custom cluster configuration for distributed gradient descent at scale that increases overall cluster efficiency, introduced on Amazon Science as Herring. Using the best of both parameter servers and ring-based topologies, SageMaker Distributed Data Parallel (SMDDP) is optimized for the Amazon Elastic Compute Cloud (Amazon EC2) network topology, including EFA. For larger cluster sizes, SMDDP is able to deliver 20–40% throughput improvements relative to Horovod (TensorFlow) and PyTorch Distributed Data Parallel. For smaller cluster sizes and supported models, we recommend the SageMaker Training Compiler, which is able to decrease overall job time by up to 50%.

Customer spotlight: PyTorch Lightning on SageMaker’s optimized backend with Amazon Search

Amazon Search is responsible for the search and discovery experience on Amazon.com. It powers the search experience for customers looking for products to buy on Amazon. At a high level, Amazon Search builds an index for all products sold on Amazon.com. When a customer enters a query, Amazon Search then uses a variety of ML techniques, including deep learning models, to match relevant and interesting products to the customer query. Then it ranks the products before showing the results to the customer.

Amazon Search scientists have used PyTorch Lightning as one of the main frameworks to train the deep learning models that power Search ranking due to its added usability features on top of PyTorch. SMDDP was not supported for deep learning models written in PyTorch Lightning before this new SageMaker launch. This prevented Amazon Search scientists who prefer using PyTorch Lightning from scaling their model training using data parallel techniques, significantly slowing down their training time and preventing them from testing new experiments that require more scalable training.

The team’s early benchmarking results show 7.3 times faster training time for a sample model when trained on eight nodes as compared to a single-node training baseline. The baseline model used in this benchmarking is a multi-layer perceptron neural network with seven dense fully connected layers and over 200 parameters. The following table summarizes the benchmarking results on ml.p3.16xlarge SageMaker training instances.

Number of Instances    Training Time (minutes)    Improvement
1                      99                         Baseline
2                      55                         1.8x
4                      27                         3.7x
8                      13.5                       7.3x

Next, we dive into the details of the new launches. If you like, you can step through our corresponding example notebook.

Run PyTorch Lightning with the SageMaker distributed training library

We are happy to announce that SageMaker Data Parallel now seamlessly integrates with PyTorch Lightning within SageMaker training.

PyTorch Lightning is an open-source framework that provides a simplification for writing custom models in PyTorch. In some ways similar to what Keras did for TensorFlow, or even arguably Hugging Face, PyTorch Lightning provides a high-level API with abstractions for much of the lower-level functionality of PyTorch itself. This includes defining the model, profiling, evaluation, pruning, model parallelism, hyperparameter configurations, transfer learning, and more.
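To make the abstraction concrete, here is a minimal, generic LightningModule sketch (a toy classifier for illustration, not code from the example notebooks):

import torch
import torch.nn.functional as F
import pytorch_lightning as pl

class ToyClassifier(pl.LightningModule):
    # A toy model: Lightning owns the training loop, device placement and distribution.
    def __init__(self, in_dim=32, n_classes=2):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(in_dim, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, n_classes),
        )

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)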

Previously, PyTorch Lightning developers were uncertain about how to seamlessly migrate their training code on to high-performance SageMaker GPU clusters. In addition, there was no way for them to take advantage of efficiency gains introduced by SageMaker Data Parallel.

For PyTorch Lightning, generally speaking, little to no code change is needed to run these APIs on SageMaker Training. In the example notebooks, we use the DDPStrategy and DDPPlugin methods.

There are three steps to use PyTorch Lightning with SageMaker Data Parallel as an optimized backend:

  1. Use a supported AWS Deep Learning Container (DLC) as your base image, or optionally create your own container and install the SageMaker Data Parallel backend yourself. Ensure that you have PyTorch Lightning included in your necessary packages, such as with a requirements.txt file.
  2. Make a few minor code changes to your training script that enable the optimized backend. These include:
    1. Import the SM DDP library:
      import smdistributed.dataparallel.torch.torch_smddp
      

    2. Set up the PyTorch Lightning environment for SageMaker:
      import os
      from pytorch_lightning.plugins.environments.lightning_environment import LightningEnvironment

      env = LightningEnvironment()
      env.world_size = lambda: int(os.environ["WORLD_SIZE"])
      env.global_rank = lambda: int(os.environ["RANK"])

    3. If you’re using a version of PyTorch Lightning older than 1.5.10, you’ll need to add a few more steps.
      1. First, add the environment variable:
        os.environ["PL_TORCH_DISTRIBUTED_BACKEND"] = "smddp"

      2. Second, ensure you use DDPPlugin, rather than DDPStrategy. If you’re using a more recent version, which you can easily set by placing the requirements.txt in the source_dir for your job, then this isn’t necessary. See the following code:
        ddp = DDPPlugin(parallel_devices=[torch.device("cuda", d) for d in range(num_gpus)], cluster_environment=env)

    4. Optionally, define your process group backend as "smddp" in the DDPStrategy object. However, if you’re using PyTorch Lightning with the PyTorch DDP backend, which is also supported, simply remove this `process_group_backend` parameter. See the following code:
      ddp = DDPStrategy(
        cluster_environment=env, 
        process_group_backend="smddp", 
        accelerator="gpu")

  3. Ensure that you specify a distribution method in the estimator, such as distribution={"smdistributed": {"dataparallel": {"enabled": True}}} if you’re using the SMDDP (Herring) backend, or distribution={"pytorchddp": {"enabled": True}}.
  • For a full list of options for the distribution parameter, see our documentation. A consolidated sketch showing how these pieces fit together follows this list.
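Putting the pieces above together, a training script might configure the Lightning Trainer roughly as follows. This is a sketch based on the snippets in the preceding steps, assuming a Lightning version that provides DDPStrategy (1.6 or later); the node-count arithmetic and epoch count are illustrative assumptions, and your own model and data loaders go in the final call.

import os
import torch
import pytorch_lightning as pl
from pytorch_lightning.strategies import DDPStrategy
from pytorch_lightning.plugins.environments import LightningEnvironment
import smdistributed.dataparallel.torch.torch_smddp  # registers the "smddp" process group backend

env = LightningEnvironment()
env.world_size = lambda: int(os.environ["WORLD_SIZE"])
env.global_rank = lambda: int(os.environ["RANK"])

num_gpus = torch.cuda.device_count()
ddp = DDPStrategy(
    cluster_environment=env,
    process_group_backend="smddp",  # drop this line to fall back to native PyTorch DDP
)

trainer = pl.Trainer(
    accelerator="gpu",
    devices=num_gpus,
    num_nodes=int(os.environ["WORLD_SIZE"]) // max(num_gpus, 1),  # assumes one process per GPU
    strategy=ddp,
    max_epochs=5,  # placeholder
)
# trainer.fit(model, train_dataloaders=train_loader)  # plug in your LightningModule and data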

Now you can launch your SageMaker training job! You can launch your training job via the Python SDK, Boto3, the SageMaker console, the AWS Command Line Interface (AWS CLI), and countless other methods. From an AWS perspective, this is a single API command: create-training-job. Whether you launch this command from your local terminal, an AWS Lambda function, an Amazon SageMaker Studio notebook, a KubeFlow pipeline, or any other compute environment is completely up to you.

Please note that the integration between PyTorch Lightning and SageMaker Data Parallel is currently supported for only newer versions of PyTorch, starting at 1.11. In addition, this release is only available in the AWS DLCs for SageMaker starting at PyTorch 1.12. Make sure you point to this image as your base. In us-east-1, this address is as follows:

ecr_image = '763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.12.0-gpu-py38-cu113-ubuntu20.04-sagemaker'

Then you can extend your Docker container using this as your base image, or you can pass this as a variable into the image_uri argument of the SageMaker training estimator.
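As a sketch of that second option, a SageMaker PyTorch estimator might look roughly like the following. The entry point script, instance settings, and S3 paths are placeholders, not values from the example notebook.

import sagemaker
from sagemaker.pytorch import PyTorch

ecr_image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.12.0-gpu-py38-cu113-ubuntu20.04-sagemaker"

estimator = PyTorch(
    base_job_name="lightning-smddp-demo",   # placeholder job name
    entry_point="train_lightning.py",       # hypothetical training script
    source_dir="code",                      # contains requirements.txt with pytorch-lightning
    role=sagemaker.get_execution_role(),    # assumes a SageMaker notebook/Studio environment
    image_uri=ecr_image,                    # the DLC shown above
    instance_count=2,
    instance_type="ml.p3.16xlarge",
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)
estimator.fit({"training": "s3://your-bucket/training-data"})  # placeholder S3 path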

As a result, you’ll be able to run your PyTorch Lightning code on SageMaker Training’s optimized GPUs, with the best performance available on AWS.

Run PyTorch Distributed Data Parallel on SageMaker

The biggest problem PyTorch Distributed Data Parallel (DDP) solves is deceptively simple: speed. A good distributed training framework should provide stability, reliability, and most importantly, excellent performance at scale. PyTorch DDP delivers on this through providing torch developers with APIs to replicate their models over multiple GPU devices, in both single-node and multi-node settings. The framework then manages sharding different objects from the training dataset to each model copy, averaging the gradients for each of the model copies to synchronize them at each step. This produces one model at the total completion of the full training run. The following diagram illustrates this process.
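To complement the diagram, the following is a minimal, generic sketch of a native PyTorch DDP training script, with a toy model and synthetic data, assuming one process per GPU is started by torchrun or a similar launcher that sets the usual rendezvous environment variables.

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    dist.init_process_group(backend="nccl")          # NCCL collectives for GPU training
    local_rank = int(os.environ["LOCAL_RANK"])       # set by the launcher
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(32, 2).cuda(local_rank)  # toy model
    model = DDP(model, device_ids=[local_rank])      # replicate the model and sync gradients

    dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 2, (1024,)))
    sampler = DistributedSampler(dataset)            # shards the dataset across ranks
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(2):
        sampler.set_epoch(epoch)                     # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            opt.zero_grad()
            loss_fn(model(x), y).backward()          # gradients are all-reduced here
            opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()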

PyTorch DDP is common in projects that use large datasets. The precise size of each dataset will vary widely, but a general guideline is to scale datasets, compute sizes, and model sizes in similar ratios. The optimal combination of these three, often discussed in terms of scaling laws, is very much up for debate and will vary based on the application. At AWS, based on working with multiple customers, we can clearly see benefits from data parallel strategies when an overall dataset size is at least a few tens of GBs. When the datasets get even larger, implementing some type of data parallel strategy is a critical technique to speed up the overall experiment and improve your time to value.

Previously, customers who were using PyTorch DDP for distributed training on premises or in other compute environments lacked a framework to easily migrate their projects onto SageMaker Training to take advantage of high-performance GPUs with a seamless control plane. Specifically, they needed to either migrate their data parallel framework to SMDDP, or develop and test the capabilities of PyTorch DDP on SageMaker Training manually. Today, SageMaker Training is happy to provide a seamless experience for customers onboarding their PyTorch DDP code.

To use this effectively, you don’t need to make any changes to your training scripts.

You can see this new parameter in the following code. In the distribution parameter, simply add pytorchddp and set enabled as true.

estimator = PyTorch(
    base_job_name="pytorch-dataparallel-mnist",
    source_dir="code",
    entry_point = "my_model.py",
    ... 
    # Enable distributed training with native PyTorch DDP on SageMaker
    distribution = {"pytorchddp": {"enabled": "true"}}
)

This new configuration is available starting with SageMaker Python SDK version 2.102.0 and PyTorch DLC version 1.11.

For PyTorch DDP developers who are familiar with the popular torchrun framework, it’s helpful to know that torchrun isn’t necessary in the SageMaker training environment, which already provides robust fault tolerance. However, to minimize code rewrites, you can bring another launcher script that runs this command as your entry point.

Now PyTorch developers can easily move their scripts onto SageMaker, ensuring their scripts and containers can run seamlessly across multiple compute environments.

This prepares them to, in the future, take advantage of SageMaker’s distributed training libraries, which provide AWS-optimized training topologies that deliver speedups of up to 40%. For PyTorch developers, this is a single line of code! For PyTorch DDP code, you can simply set the backend to smddp in the initialization (see Modify a PyTorch Training Script), as shown in the following code:

import smdistributed.dataparallel.torch.torch_smddp
import torch.distributed as dist
dist.init_process_group(backend='smddp')

As we saw above, you can also set the backend of DDPStrategy to smddp when using Lightning. This can lead to up to 40% overall speedups for large clusters! To learn more about distributed training on SageMaker see our on-demand webinar, supporting notebooks, relevant documentation, and papers.

Conclusion

In this post, we introduced two new features within the SageMaker Training family. These make it much easier for PyTorch developers to use their existing code on SageMaker, both PyTorch DDP and PyTorch Lightning.

We also showed how Amazon Search uses SageMaker Training for training their deep learning models, and in particular PyTorch Lightning with the SageMaker Data Parallel optimized collective library as a backend. Moving to distributed training overall helped Amazon Search achieve 7.3x faster train times.


About the authors

Emily Webber joined AWS just after SageMaker launched, and has been trying to tell the world about it ever since! Outside of building new ML experiences for customers, Emily enjoys meditating and studying Tibetan Buddhism.

Karan Dhiman is a Software Development Engineer at AWS, based in Toronto, Canada. He is very passionate about Machine Learning space and building solutions for accelerating distributed computing workloads.

Vishwa Karia is a Software Development Engineer at AWS Deep Engine. Her interests lie at the intersection of Machine Learning and Distributed Systems and she is also passionate about empowering women in tech and AI.

Eiman Elnahrawy is a Principal Software Engineer at Amazon Search leading the efforts on Machine Learning acceleration, scaling, and automation. Her expertise spans multiple areas, including Machine Learning, Distributed Systems, and Personalization.


OptFormer: Towards Universal Hyperparameter Optimization with Transformers

One of the most important aspects in machine learning is hyperparameter optimization, as finding the right hyperparameters for a machine learning task can make or break a model’s performance. Internally, we regularly use Google Vizier as the default platform for hyperparameter optimization. Throughout its deployment over the last 5 years, Google Vizier has been used more than 10 million times, over a vast class of applications, including machine learning applications in vision, reinforcement learning, and language, as well as scientific applications such as protein discovery and hardware acceleration. As Google Vizier is able to keep track of use patterns in its database, such data, usually consisting of optimization trajectories termed studies, contain very valuable prior information on realistic hyperparameter tuning objectives, and are thus highly attractive for developing better algorithms.

While there have been many previous methods for meta-learning over such data, such methods share one major common drawback: their meta-learning procedures depend heavily on numerical constraints such as the number of hyperparameters and their value ranges, and thus require all tasks to use the exact same total hyperparameter search space (i.e., tuning specifications). Additional textual information in the study, such as its description and parameter names, are also rarely used, yet can hold meaningful information about the type of task being optimized. Such a drawback becomes more exacerbated for larger datasets, which often contain significant amounts of such meaningful information.

Today in “Towards Learning Universal Hyperparameter Optimizers with Transformers”, we are excited to introduce the OptFormer, one of the first Transformer-based frameworks for hyperparameter tuning, learned from large-scale optimization data using flexible text-based representations. While numerous works have previously demonstrated the Transformer’s strong abilities across various domains, few have touched on its optimization-based capabilities, especially over text space. Our core findings demonstrate for the first time some intriguing algorithmic abilities of Transformers: 1) a single Transformer network is capable of imitating highly complex behaviors from multiple algorithms over long horizons; 2) the network is further capable of predicting objective values very accurately, in many cases surpassing Gaussian Processes, which are commonly used in algorithms such as Bayesian Optimization.

Approach: Representing Studies as Tokens
Rather than only using numerical data as common with previous methods, our novel approach instead utilizes concepts from natural language and represents all of the study data as a sequence of tokens, including textual information from initial metadata. In the animation below, this includes “CIFAR10”, “learning rate”, “optimizer type”, and “Accuracy”, which informs the OptFormer of an image classification task. The OptFormer then generates new hyperparameters to try on the task, predicts the task accuracy, and finally receives the true accuracy, which will be used to generate the next round’s hyperparameters. Using the T5X codebase, the OptFormer is trained in a typical encoder-decoder fashion using standard generative pretraining over a wide range of hyperparameter optimization objectives, including real world data collected by Google Vizier, as well as public hyperparameter (HPO-B) and blackbox optimization benchmarks (BBOB).

The OptFormer can perform hyperparameter optimization encoder-decoder style, using token-based representations. It initially observes text-based metadata (in the gray box) containing information such as the title, search space parameter names, and metrics to optimize, and repeatedly outputs parameter and objective value predictions.

Imitating Policies
As the OptFormer is trained over optimization trajectories by various algorithms, it may now accurately imitate such algorithms simultaneously. By providing a text-based prompt in the metadata for the designated algorithm (e.g. “Regularized Evolution”), the OptFormer will imitate the algorithm’s behavior.

Over an unseen test function, the OptFormer produces nearly identical optimization curves as the original algorithm. Mean and standard deviation error bars are shown.

Predicting Objective Values
In addition, the OptFormer may now predict the objective value being optimized (e.g. accuracy) and provide uncertainty estimates. We compared the OptFormer’s prediction with a standard Gaussian Process and found that the OptFormer was able to make significantly more accurate predictions. This can be seen below qualitatively, where the OptFormer’s calibration curve closely follows the ideal diagonal line in a goodness-of-fit test, and quantitatively through standard aggregate metrics such as log predictive density.

Left: Rosenblatt Goodness-of-Fit. Closer diagonal fit is better. Right: Log Predictive Density. Higher is better.

Combining Both: Model-based Optimization
We may now use the OptFormer’s function prediction capability to better guide our imitated policy, similar to techniques found in Bayesian Optimization. Using Thompson Sampling, we may rank our imitated policy’s suggestions and only select the best according to the function predictor. This produces an augmented policy capable of outperforming our industry-grade Bayesian Optimization algorithm in Google Vizier when optimizing classic synthetic benchmark objectives and tuning the learning rate hyperparameters of a standard CIFAR-10 training pipeline.

Left: Best-so-far optimization curve over a classic Rosenbrock function. Right: Best-so-far optimization curve over hyperparameters for training a ResNet-50 on CIFAR-10 via init2winit. Both cases use 10 seeds per curve, and error bars at 25th and 75th percentiles.

Conclusion
Throughout this work, we discovered some useful and previously unknown optimization capabilities of the Transformer. In the future, we hope to pave the way for a universal hyperparameter and blackbox optimization interface to use both numerical and textual data to facilitate optimization over complex search spaces, and integrate the OptFormer with the rest of the Transformer ecosystem (e.g. language, vision, code) by leveraging Google’s vast collection of offline AutoML data.

Acknowledgements
The following members of DeepMind and the Google Research Brain Team conducted this research: Yutian Chen, Xingyou Song, Chansoo Lee, Zi Wang, Qiuyi Zhang, David Dohan, Kazuya Kawakami, Greg Kochanski, Arnaud Doucet, Marc’aurelio Ranzato, Sagi Perel, and Nando de Freitas.

We would like to also thank Chris Dyer, Luke Metz, Kevin Murphy, Yannis Assael, Frank Hutter, and Esteban Real for providing valuable feedback, and further thank Sebastian Pineda Arango, Christof Angermueller, and Zachary Nado for technical discussions on benchmarks. In addition, we thank Daniel Golovin, Daiyi Peng, Yingjie Miao, Jack Parker-Holder, Jie Tan, Lucio Dery, and Aleksandra Faust for multiple useful conversations.

Finally, we thank Tom Small for designing the animation for this post.


Startup Digs Into Public Filings With GPU-Driven Machine Learning to Serve Up Alternative Financial Data Services

When Rachel Carpenter and Joseph French founded Intrinio a decade ago, the fintech revolution had only just begun. But they saw an opportunity to apply machine learning to vast amounts of financial filings to create an alternative data provider among the giants.

The startup, based in St. Petersburg, Fla., delivers financial data to hedge funds, proprietary trading shops, retail brokers, fintech developers and others. Intrinio runs machine learning on AWS instances of NVIDIA GPUs to parse mountains of publicly available financial data.

Carpenter and French realized early that such data was sold for a premium, and that machine learning offered a way to sort through free financial filings to deliver new products.

The company offers information on equities, options, estimates and ETFs — as well as environmental, social and governance data. Its most popular product is equities-fundamentals data.

Intrinio has taken an unbundling approach to traditional product offerings, creating à la carte data services now used in some 450 fintech applications.

“GPUs have helped us unlock data that is otherwise expensive and sourced manually,” said Carpenter, the company’s CEO. “We built a lot of technology with the idea that we wanted to unlock data for innovators in the financial services space.”

Intrinio is a member of NVIDIA Inception, a free, global program designed to support cutting-edge startups.

Partnering With Fintechs

With lower overhead enabled by GPU-driven machine learning for providing financial data, Intrinio has been able to deliver products at lower prices that appeal to startups.

“We have a much smaller and agile team, because a small team — in conjunction with NVIDIA GPUs, TensorFlow, PyTorch and everything else that we’re using — makes our work a lot more automated,” she said.

Its clients include fintech players like Robinhood, FTX, Domain Money, MarketBeat and Alpaca. Another, Aiera, transcribes earnings calls live with its own automated-speech-recognition models driven by NVIDIA GPUs, and relies on Intrinio for financial data.

“Our use of GPUs made our data packages affordable and easy to use for Aiera, so the company is integrating Intrinio financial data into its platform,” said Carpenter.

Aiera needed financial-data-cleansing services for consistent information on company earnings and more. Harnessing Intrinio’s application programming interface, Aiera can access normalized, split-second company financial data.

“GPUs are a critical component of Intrinio’s underlying technology — without them, we wouldn’t have been able to apply machine learning techniques to the cleansing and standardization of fundamental and financial statement data,” said Carpenter.

Servicing Equities, Options, ESG 

For equities pricing, Intrinio’s machine learning technology can sort out pricing discrepancies in milliseconds. This results in substantially higher data quality and reliability for users, according to Carpenter. With equity fundamentals, Intrinio automates several key processes, such as entity recognition. Intrinio uses machine learning to identify company names or other key information from unstructured text to ensure the correct categorization of data.

In other cases, Intrinio applies machine learning to reconcile line items from financial statements into standardized buckets so that, for example, you can compare revenue across companies cleanly.

The use of GPUs and machine learning in both of these cases results in higher quality data than a manually oriented approach. Using Intrinio has been shown to decrease the number of errors requiring corrections by 88% compared with manual sorting, according to the company.

For options, Intrinio takes the raw Options Price Reporting Authority (OPRA) feed and applies cutting-edge filtering, algorithms and server architecture to provide its options API.

ESG data is also an area of interest for investors right now. As retail investors are starting to be more conscious of the environment and institutions are feeling the pressure to invest responsibly, they want to see how companies stack up with this information.

As regulation around ESG disclosures solidifies, Intrinio says it will be able to use its automated XBRL-standardization technology to unlock these data sets for their users. XBRL is a standardized format of digital information exchange for business.

“On the retail side, app developers need to show this information to their users because people want to see it — making that data accessible is critical to the evolution of the financial industry,” said Carpenter.

Register free for GTC, running online Sept. 19-22, to attend sessions with NVIDIA and dozens of industry leaders. View the financial services agenda for the conference. 

Image credit: Luca Bravo from Unsplash


Visualize your Amazon Lookout for Metrics anomaly results with Amazon QuickSight

One of the challenges encountered by teams using Amazon Lookout for Metrics is quickly and efficiently connecting it to data visualization. The anomalies are presented individually on the Lookout for Metrics console, each with its own graph, making it difficult to view the set as a whole. An automated, integrated solution is needed for deeper analysis.

In this post, we use a Lookout for Metrics live detector built following the Getting Started section from the AWS Samples, Amazon Lookout for Metrics GitHub repo. After the detector is active and anomalies are generated from the dataset, we connect Lookout for Metrics to Amazon QuickSight. We create two datasets: one by joining the dimensions table with the anomaly table, and another by joining the anomaly table with the live data. We can then add these two datasets to a QuickSight analysis, where we can add charts in a single dashboard.

We can provide two types of data to the Lookout for Metrics detector: continuous and historical. The AWS Samples GitHub repo offers both, though we focus on the continuous live data. The detector monitors this live data to identify anomalies and writes the anomalies to Amazon Simple Storage Service (Amazon S3) as they’re generated. At the end of a specified interval, the detector analyzes the data. Over time, the detector learns to more accurately identify anomalies based on patterns it finds.

Lookout for Metrics uses machine learning (ML) to automatically detect and diagnose anomalies in business and operational data, such as a sudden dip in sales revenue or customer acquisition rates. The service is now generally available as of March 25, 2021. It automatically inspects and prepares data from a variety of sources to detect anomalies with greater speed and accuracy than traditional methods used for anomaly detection. You can also provide feedback on detected anomalies to tune the results and improve accuracy over time. Lookout for Metrics makes it easy to diagnose detected anomalies by grouping together anomalies related to the same event and sending an alert that includes a summary of the potential root cause. It also ranks anomalies in order of severity so you can prioritize your attention to what matters the most to your business.

QuickSight is a fully-managed, cloud-native business intelligence (BI) service that makes it easy to connect to your data to create and publish interactive dashboards. Additionally, you can use Amazon QuickSight to get instant answers through natural language queries.

You can access serverless, highly scalable QuickSight dashboards from any device, and seamlessly embed them into your applications, portals, and websites. The following screenshot is an example of what you can achieve by the end of this post.

Overview of solution

The solution is a combination of AWS services, primarily Lookout for Metrics, QuickSight, AWS Lambda, Amazon Athena, AWS Glue, and Amazon S3.

The following diagram illustrates the solution architecture. Lookout for Metrics detects and sends the anomalies to Lambda via an alert. The Lambda function generates the anomaly results as CSV files and saves them in Amazon S3. An AWS Glue crawler analyzes the metadata, and creates tables in Athena. QuickSight uses Athena to query the Amazon S3 data, allowing dashboards to be built to visualize both the anomaly results and the live data.

Solution Architecture

This solution expands on the resources created in the Getting Started section of the GitHub repo. For each step, we include options to create the resources either by using the AWS Management Console or by launching the provided AWS CloudFormation stack. If you have a customized Lookout for Metrics detector, you can use it and adapt the following notebook to achieve the same results.

The implementation steps are as follows:

  1. Create the Amazon SageMaker notebook instance (ALFMTestNotebook) and notebooks using the stack provided in the Initial Setup section from the GitHub repo.
  2. Open the notebook instance on the SageMaker console and navigate to the amazon-lookout-for-metrics-samples/getting_started folder.
  3. Create the S3 bucket and complete the data preparation using the first notebook (1.PrereqSetupData.ipynb). Open the notebook with the conda_python3 kernel, if prompted.

We skip the second notebook because it’s focused on backtesting data.

  4. If you’re walking through the example using the console, create the Lookout for Metrics live detector and its alert using the third notebook (3.GettingStartedWithLiveData.ipynb).

If you’re using the provided CloudFormation stacks, the third notebook isn’t required. The detector and its alert are created as part of the stack.

  5. After you create the Lookout for Metrics live detector, you need to activate it from the console.

This can take up to 2 hours to initialize the model and detect anomalies.

  6. Deploy a Lambda function, using Python with a Pandas library layer, and create an alert attached to the live detector to launch it.
  7. Use the combination of Athena and AWS Glue to discover and prepare the data for QuickSight.
  8. Create the QuickSight data source and datasets.
  9. Finally, create a QuickSight analysis for visualization, using the datasets.

The CloudFormation scripts are typically run as a set of nested stacks in a production environment. They’re provided individually in this post to facilitate a step-by-step walkthrough.

Prerequisites

To go through this walkthrough, you need an AWS account where the solution will be deployed. Make sure that all the resources you deploy are in the same Region. You need a running Lookout for Metrics detector built from notebooks 1 and 3 from the GitHub repo. If you don’t have a running Lookout for Metrics detector, you have two options:

  • Run notebooks 1 and 3, and continue from the step 1 of this post (creating the Lambda function and alert)
  • Run notebook 1 and then use the CloudFormation template to generate the Lookout for Metrics detector

Create the live detector using AWS CloudFormation

The L4MLiveDetector.yaml CloudFormation script creates the Lookout for Metrics anomaly detector with its source pointing to the live data in the specified S3 bucket. To create the detector, complete the following steps:

  1. Launch the stack from the following link:

  1. On the Create stack page, choose Next.
  2. On the Specify stack details page, provide the following information:
    1. A stack name. For example, L4MLiveDetector.
    2. The S3 bucket, <Account Number>-lookoutmetrics-lab.
    3. The Role ARN, arn:aws:iam::<Account Number>:role/L4MTestRole.
    4. An anomaly detection frequency. Choose PT1H (hourly).
  3. Choose Next.
  4. On the Configure stack options page, leave everything as is and choose Next.
  5. On the Review page, leave everything as is and choose Create stack.

Create the live detector SMS alert using AWS CloudFormation (Optional)

This step is optional. The alert is presented as an example, with no impact on the dataset creation. The L4MLiveDetectorAlert.yaml CloudFormation script creates the Lookout for Metrics anomaly detector alert with an SMS target.

  1. Launch the stack from the following link:

  1. On the Create stack page, choose Next.
  2. On the Specify stack details page, update the SMS phone number and enter a name for the stack (for example, L4MLiveDetectorAlert).
  3. Choose Next.
  4. On the Configure stack options page, leave everything as is and choose Next.
  5. On the Review page, select the acknowledgement check box, leave everything else as is, and choose Create stack.

Resource cleanup

Before proceeding with the next step, stop your SageMaker notebook instance to ensure no unnecessary costs are incurred. It is no longer needed.

Create the Lambda function and alert

In this section, we provide instructions on creating your Lambda function and alert via the console or AWS CloudFormation.

Create the function and alert with the console

You need a Lambda AWS Identity and Access Management (IAM) role following the least privilege best practice to access the bucket where you want the results to be saved.

    1. On the Lambda console, create a new function.
    2. Select Author from scratch.
    3. For Function name¸ enter a name.
    4. For Runtime, choose Python 3.8.
    5. For Execution role, select Use an existing role and specify the role you created.
    6. Choose Create function.
    1. Download the ZIP file containing the necessary code for the Lambda function.
    2. On the Lambda console, open the function.
    3. On the Code tab, choose Upload from, choose .zip file, and upload the file you downloaded.
    4. Choose Save.

Your file tree should remain the same after uploading the ZIP file.

  1. In the Layers section, choose Add layer.
  2. Select Specify an ARN.
  3. In the following GitHub repo, choose the CSV corresponding to the Region you’re working in and copy the ARN from the latest Pandas version.
  4. For Specify an ARN, enter the ARN you copied.
  5. Choose Add.

  1. To adapt the function to your environment, at the bottom of the lambda_function.py file, update the bucket name to the bucket where you want to save the anomaly results, and update the DataSet_ARN to the ARN of your anomaly detector.
  2. Choose Deploy to make the changes active.

You now need to connect the Lookout for Metrics detector to your function.

  1. On the Lookout for Metrics console, navigate to your detector and choose Add alert.
  2. Enter the alert name and your preferred severity threshold.
  3. From the channel list, choose Lambda.
  4. Choose the function you created and make sure you have the right role to trigger it.
  5. Choose Add alert.

Now you wait for your alert to trigger. The time varies depending on when the detector finds an anomaly.

When an anomaly is detected, Lookout for Metrics triggers the Lambda function. It receives the necessary information from Lookout for Metrics and checks if there is already a saved CSV file in Amazon S3 at the corresponding timestamp of the anomaly. If there isn’t a file, Lambda generates the file and adds the anomaly data. If the file already exists, Lambda updates the file with the extra data received. The function generates a separated CSV file for each different timestamp.
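Conceptually, the function's logic looks roughly like the following sketch. This is not the code from the downloadable ZIP file; the bucket name, the alert payload field names, and the extract_rows helper are illustrative placeholders.

import io
import boto3
import pandas as pd

s3 = boto3.client("s3")
BUCKET = "your-lookoutmetrics-bucket"  # placeholder: the bucket that holds anomalyResults/

def extract_rows(event):
    # Hypothetical helper: flatten the alert payload into one dict per anomalous time series.
    # The real field names come from the Lookout for Metrics alert payload and differ from this placeholder.
    return event.get("anomalousSeries", [])

def lambda_handler(event, context):
    rows = extract_rows(event)
    timestamp = str(event.get("timestamp", "unknown"))  # field name assumed for illustration
    key = f"anomalyResults/metricValue_AnomalyScore/{timestamp}.csv"
    new_df = pd.DataFrame(rows)

    try:
        # If a CSV already exists for this timestamp, append the new anomaly data to it
        existing = s3.get_object(Bucket=BUCKET, Key=key)
        new_df = pd.concat([pd.read_csv(existing["Body"]), new_df], ignore_index=True)
    except s3.exceptions.NoSuchKey:
        pass  # first anomaly seen for this timestamp; a new file will be created

    buf = io.StringIO()
    new_df.to_csv(buf, index=False)
    s3.put_object(Bucket=BUCKET, Key=key, Body=buf.getvalue())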

Create the function and alert using AWS CloudFormation

Similar to the console instructions, you download the ZIP file containing the necessary code for the Lambda function. However, in this case it needs to be uploaded to the S3 bucket in order for the AWS CloudFormation code to load it during function creation.

In the S3 bucket specified in the Lookout for Metrics detector creation, create a folder called lambda-code, and upload the ZIP file.

The Lambda function loads this as its code during creation.

The L4MLambdaFunction.yaml CloudFormation script creates the Lambda function and alert resources and uses the function code archive stored in the same S3 bucket.

  1. Launch the stack from the following link:

  1. On the Create stack page, choose Next.
  2. On the Specify stack details page, specify a stack name (for example, L4MLambdaFunction).
  3. In the following GitHub repo, open the CSV corresponding to the Region you’re working in and copy the ARN from the latest Pandas version.
  4. Enter the ARN as the Pandas Lambda layer ARN parameter.
  5. Choose Next.
  6. On the Configure stack options page, leave everything as is and choose Next.
  7. On the Review page, select the acknowledgement check box, leave everything else as is, and choose Create stack.

Activate the detector

Before proceeding to the next step, you need to activate the detector from the console.

  1. On the Lookout for Metrics console, choose Detectors in the navigation pane.
  2. Choose your newly created detector.
  3. Choose Activate, then choose Activate again to confirm.

Activation initializes the detector; it’s finished when the model has completed its learning cycle. This can take up to 2 hours.

Prepare the data for QuickSight

Before you complete this step, give the detector time to find anomalies. The Lambda function you created saves the anomaly results in the Lookout for Metrics bucket in the anomalyResults directory. We can now process this data to prepare it for QuickSight.

Create the AWS Glue crawler on the console

After some anomaly CSV files have been generated, we use an AWS Glue crawler to generate the metadata tables.

  1. On the AWS Glue console, choose Crawlers in the navigation pane.
  2. Choose Add crawler.

  1. Enter a name for the crawler (for example, L4MCrawler).
  2. Choose Next.
  3. For Crawler source type, select Data stores.
  4. For Repeat crawls of S3 data stores, select Crawl all folders.
  5. Choose Next.

  1. On the data store configuration page, for Crawl data in, select Specified path in my account.
  2. For Include path, enter the path of your dimensionContributions file (s3://YourBucketName/anomalyResults/dimensionContributions).
  3. Choose Next.
  4. Choose Yes to add another data store and repeat the instructions for metricValue_AnomalyScore (s3://YourBucketName/anomalyResults/metricValue_AnomalyScore).
  5. Repeat the instructions again for the live data to be analyzed by the Lookout for Metrics anomaly detector (this is the S3 dataset location from your Lookout for Metrics detector).

You should now have three data stores for the crawler to process.

Now you need to select the role to allow the crawler to go through the S3 locations of your data.

  1. For this post, select Create an IAM role and enter a name for the role.
  2. Choose Next.

  1. For Frequency, leave as Run on demand and choose Next.
  2. In the Configure the crawler’s output section, choose Add database.

This creates the Athena database where your metadata tables are located after the crawler is complete.

  1. Enter a name for your database and choose Create.
  2. Choose Next, then choose Finish.

  1. On the Crawlers page of the AWS Glue console, select the crawler you created and choose Run crawler.

You may need to wait a few minutes, depending on the size of the data. When it’s complete, the crawler’s status shows as Ready. To see the metadata tables, navigate to your database on the Databases page and choose Tables in the navigation pane.

In this example, the metadata table called live represents the S3 dataset from the Lookout for Metrics live detector. As a best practice, it’s recommended to encrypt your AWS Glue Data Catalog metadata.

Athena automatically recognizes the metadata tables, and QuickSight uses Athena to query the data and visualize the results.

Create the AWS Glue crawler using AWS CloudFormation

The L4MGlueCrawler.yaml CloudFormation script creates the AWS Glue crawler, its associated IAM role, and the output Athena database.

  1. Launch the stack from the following link:

  1. On the Create stack page, choose Next.
  2. On the Specify stack details page, enter a name for your stack (for example, L4MGlueCrawler), and choose Next.
  3. On the Configure stack options page, leave everything as is and choose Next.
  4. On the Review page, select the acknowledgement check box, leave everything else as is, and choose Create stack.

Run the AWS Glue crawler

After you create the crawler, you need to run it before moving to the next step. You can run it from the console or the AWS Command Line Interface (AWS CLI). To use the console, complete the following steps:

  1. On the AWS Glue console, choose Crawlers in the navigation pane.
  2. Select your crawler (L4MCrawler).
  3. Choose Run crawler.

When the crawler is complete, it shows the status Ready.

Create a QuickSight account

Before starting this next step, navigate to the QuickSight console and create an account if you don’t already have one. To make sure you have access to the corresponding services (Athena and S3 bucket), choose your account name on the top right, choose Manage QuickSight, and choose Security and Permissions, where you can add the necessary services. When setting up your Amazon S3 access, make sure to select Write permission for Athena Workgroup.

Now you’re ready to visualize your data in QuickSight.

Create the QuickSight datasets on the console

If this is your first time using Athena, you have to configure the output location of the queries. For instructions, refer to Steps 1–6 in Create a database. Then complete the following steps:

  1. On the QuickSight console, choose Datasets.
  2. Choose New dataset.
  3. Choose Athena as your source.
  4. Enter a name for your data source.
  5. Choose Create data source.

  1. For your database, specify the one you created earlier with the AWS Glue crawler.
  2. Specify the table that contains your live data (not the anomalies).
  3. Choose Edit/preview data.

You’re redirected to an interface similar to the following screenshot.

The next step is to add and combine the metricValue_AnomalyScore data with the live data.

  1. Choose Add data.
  2. Choose Add data source.
  3. Specify the database you created and the metricValue_AnomalyScore table.
  4. Choose Select.

You need now to configure the join of the two tables.

  1. Choose the link between the two tables.
  2. Leave the join type as Left, add the timestamp and each dimension you have as a join clause, and choose Apply.

In the following example, we use timestamp, platform, and marketplace as join clauses.
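Conceptually, this QuickSight join is equivalent to a left merge on the shared keys. The following pandas sketch illustrates the idea, assuming the two Athena tables were exported to CSV files with placeholder names.

import pandas as pd

# Placeholder CSV exports of the two Athena tables behind the QuickSight dataset
live = pd.read_csv("live.csv", parse_dates=["timestamp"])
anomalies = pd.read_csv("metricValue_AnomalyScore.csv", parse_dates=["timestamp"])

# Left join: keep every live data point, and attach anomaly scores where they exist
joined = live.merge(
    anomalies,
    how="left",
    on=["timestamp", "platform", "marketplace"],
    suffixes=("", "_anomaly"),  # avoids duplicate column names from overlapping fields
)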

On the right pane, you can remove the fields you’re not interested in keeping.

  1. Remove the timestamp from the metricValue_AnomalyScore table to not have a duplicated column.
  2. Change the timestamp data type (of the live data table) from string to date, and specify the correct format. In our case, it should be yyyy-MM-dd HH:mm:ss.

The following screenshot shows your view after you remove some fields and adjust the data type.

  1. Choose Save and visualize.
  2. Choose the pencil icon next to the dataset.
  3. Choose Add dataset and choose dimensioncontributions.

Create the QuickSight datasets using AWS CloudFormation

This step contains three CloudFormation stacks.

The first CloudFormation script, L4MQuickSightDataSource.yaml, creates the QuickSight Athena data source.

  1. Launch the stack from the following link:

  1. On the Create stack page, choose Next.
  2. On the Specify stack details page, enter your QuickSight user name, the QuickSight account Region (specified when creating the QuickSight account), and a stack name (for example, L4MQuickSightDataSource).
  3. Choose Next.
  4. On the Configure stack options page, leave everything as is and choose Next.
  5. On the Review page, leave everything as is and choose Create stack.

The second CloudFormation script, L4MQuickSightDataSet1.yaml, creates a QuickSight dataset that joins the dimensions table with the anomaly table.

  1. Launch the stack from the following link:

  2. On the Create stack page, choose Next.
  3. On the Specify stack details page, enter a stack name (for example, L4MQuickSightDataSet1).
  4. Choose Next.
  5. On the Configure stack options page, leave everything as is and choose Next.
  6. On the Review page, leave everything as is and choose Create stack.

The third CloudFormation script, L4MQuickSightDataSet2.yaml, creates the QuickSight dataset that joins the anomaly table with the live data table.

  1. Launch the stack from the following link:

  2. On the Create stack page, choose Next.
  3. On the Specify stack details page, enter a stack name (for example, L4MQuickSightDataSet2).
  4. Choose Next.
  5. On the Configure stack options page, leave everything as is and choose Next.
  6. On the Review page, leave everything as is and choose Create stack.
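Once all three stacks report CREATE_COMPLETE, you can optionally confirm that the datasets exist before building the analysis. The following is a minimal check with boto3, assuming your credentials can call the QuickSight API.

```python
import boto3

account_id = boto3.client("sts").get_caller_identity()["Account"]
quicksight = boto3.client("quicksight")

# Print the names of the QuickSight datasets in this account; the two created
# by the stacks should appear in the list.
for summary in quicksight.list_data_sets(AwsAccountId=account_id)["DataSetSummaries"]:
    print(summary["Name"])
```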

Create the QuickSight analysis for dashboard creation

This step can only be completed on the console. After you’ve created your QuickSight datasets, complete the following steps:

  1. On the QuickSight console, choose Analysis in the navigation pane.
  2. Choose New analysis.
  3. Choose the first dataset, L4MQuickSightDataSetWithLiveData.

  4. Choose Create analysis.

The QuickSight analysis is initially created with only the first dataset.

  1. To add the second dataset, choose the pencil icon next to Dataset and choose Add dataset.
  2. Choose the second dataset and choose Select.

You can then use either dataset to create charts by choosing it from the Dataset drop-down menu.

Dataset metrics

You have successfully created a QuickSight analysis from Lookout for Metrics inference results and the live data. Two datasets are in QuickSight for you to use: L4M_Visualization_dataset_with_liveData and L4M_Visualization_dataset_with_dimensionContribution.

The L4M_Visualization_dataset_with_liveData dataset includes the following metrics:

  • timestamp – The date and time of the live data passed to Lookout for Metrics
  • views – The value of the views metric
  • revenue – The value of the revenue metric
  • platform, marketplace, revenueAnomalyMetricValue, viewsAnomalyMetricValue, revenueGroupScore and viewsGroupScore – These metrics are part of both datasets

The L4M_Visualization_dataset_with_dimensionContribution dataset includes the following metrics:

  • timestamp – The date and time of when the anomaly was detected
  • metricName – The metric you’re monitoring (in this case, revenue or views)
  • dimensionName – The dimension within the metric
  • dimensionValue – The value of the dimension
  • valueContribution – The percentage contribution of dimensionValue to the anomaly when it was detected

The following screenshot shows these five metrics on the anomaly dashboard of the Lookout for Metrics detector.

The following metrics are part of both datasets:

  • platform – The platform where the anomaly happened
  • marketplace – The marketplace where the anomaly happened
  • revenueAnomalyMetricValue and viewsAnomalyMetricValue – The corresponding values of the metric when the anomaly was detected (in this situation, the metrics are revenue or views)
  • revenueGroupScore and viewsGroupScore – The severity scores for each metric for the detected anomaly

To better understand these last metrics, you can review the CSV files that the Lambda function creates in your S3 bucket under the anomalyResults/metricValue_AnomalyScore prefix.
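For example, the following is a minimal boto3 sketch to list and download those CSV files; the bucket name is a placeholder for the one used in your deployment.

```python
import boto3

s3 = boto3.client("s3")
bucket = "your-l4m-bucket"  # placeholder: the bucket the Lambda function writes to
prefix = "anomalyResults/metricValue_AnomalyScore/"

# List the CSV files the Lambda function wrote under the prefix.
objects = s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get("Contents", [])
for obj in objects:
    print(obj["Key"], obj["Size"])

# Download the first file for inspection.
if objects:
    s3.download_file(bucket, objects[0]["Key"], "metricValue_AnomalyScore_sample.csv")
```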

Next steps

The next step is to build the dashboards for the data you want to see. This post doesn’t cover how to create QuickSight charts; if you’re new to QuickSight, refer to Getting started with data analysis in Amazon QuickSight for an introduction. The following screenshots show examples of basic dashboards. For more information, check out the QuickSight workshops.

Conclusion

The anomalies are presented individually on the Lookout for Metrics console, each with its own graph, which makes it difficult to view them as a whole. An automated, integrated solution is needed for deeper analysis. In this post, we used a Lookout for Metrics detector to generate anomalies and connected the results to QuickSight to create visualizations. This solution lets us analyze anomalies in depth and view them all in a single dashboard.

As a next step, you could expand this solution by adding another dataset to combine anomalies from multiple detectors. You could also adapt the Lambda function, which contains the code that generates the datasets and the variable names used for the QuickSight dashboards; adjust that code to your use case by changing the datasets themselves or choosing variable names that make more sense to you.

If you have any feedback or questions, please leave them in the comments.


About the Authors

Benoît de Patoul is an AI/ML Specialist Solutions Architect at AWS. He helps customers by providing guidance and technical assistance to build AI/ML solutions on AWS.

Paul Troiano is a Senior Solutions Architect at AWS, based in Atlanta, GA. He helps customers by providing guidance on technology strategies and solutions on AWS. He is passionate about all things AI/ML and solution automation.

Read More

Boldly Go: Discover New Frontiers in AI-Powered Transportation at GTC

AI and the metaverse are revolutionizing every aspect of the way we live, work and play — including how we move.

Leaders in the automotive and technology industries will come together at NVIDIA GTC to discuss the newest breakthroughs driving intelligent vehicles, whether in the real world or in simulation.

The virtual conference, which runs from Sept. 19-22, will feature a slate of in-depth sessions on end-to-end software-defined vehicle development, as well as advances in robotics, healthcare, high performance computing and more. And it’s all free to attend.

Headlining GTC is NVIDIA founder and CEO Jensen Huang, who will present the latest in AI and NVIDIA Omniverse in the keynote address on Tuesday, Sept. 20, at 8 a.m. PT.

Conference attendees will have plenty of networking opportunities, and they can learn from NVIDIA experts and industry luminaries about AV development, from the cloud to the car.

Here’s a brief look at what to expect during GTC:

Meet the Trailblazers

Every stage of the automotive pipeline is being transformed by AI and metaverse technologies, from manufacturing and design, to autonomous driving, to the passenger experience.

Speakers from each of these areas will share how they’re harnessing AI innovations to accelerate software-defined transportation.

Automotive sessions include:

  • Michael Bell, senior vice president of Digital at Lucid Motors, walks through the development of the Lucid DreamDrive Pro advanced driver assistance system, and how the company continuously deploys new features for a cutting-edge driving experience.
  • Yuli Bai, head of AI Platform at NIO, outlines the AI infrastructure that the automaker is using to develop intelligent, software-defined vehicles running on the NVIDIA DRIVE Orin compute platform.
  • Apeksha Kumavat, chief engineer and co-founder at Gatik, explains how its autonomous commercial-delivery vehicles are helping the retail industry adapt to rapidly changing consumer demands.
  • Dennis Nobelius, chief operating officer at Polestar, describes how the performance electric vehicle maker is developing AI-powered features geared toward the human driver, while prioritizing long-term environmental sustainability.

Don’t miss additional sessions from BMW, Mercedes-Benz and Waabi covering manufacturing, AI research and more.

Get the Inside Track on DRIVE Development

Learn about the latest NVIDIA DRIVE technologies directly from the minds behind their creation.

NVIDIA DRIVE Developer Day consists of a series of deep-dive sessions on building safe and robust autonomous vehicles. Led by the NVIDIA engineering team, the talks will highlight the newest DRIVE features and discuss how to apply them to AV development.

Topics include:

  • NVIDIA DRIVE product roadmap
  • Intelligent in-vehicle infotainment
  • Data center development
  • Synthetic data generation for testing and validation

All of this virtual content is available to GTC attendees — register for free today to see the technologies shaping the intelligent future of transportation.

The post Boldly Go: Discover New Frontiers in AI-Powered Transportation at GTC appeared first on NVIDIA Blog.

Read More

Startup’s Vision AI Software Trains Itself — in One Hour — to Detect Manufacturing Defects in Real Time

Cameras have been deployed in factories for over a decade — so why, Franz Tschimben wondered, hasn’t automated visual inspection yet become the worldwide standard?

This question motivated Tschimben and his colleagues to found Covision Quality, an AI-based visual-inspection software startup that uses NVIDIA technology to transform end-of-line defect detection for the manufacturing industry.

“The simple answer is that these systems are hard to scale,” said Tschimben, the northern Italy-based company’s CEO. “Material defects, like burrs, holes or scratches, have varying geometric shapes and colors that make identifying them cumbersome. That meant quality-control specialists had to program inspection systems by hand to fine-tune their defect parameters.”

Covision’s software allows users to train AI models for visual inspection without needing to code. It quadruples accuracy for defect detection and reduces false-negative rates by up to 90% compared with traditional rule-based methods, according to Tschimben.

The software relies on unsupervised machine learning that’s trained on NVIDIA RTX A5000 GPUs. This technique allows the AI to teach itself, in just one hour and from hundreds of example images, what qualifies as a defect for a specific customer. It removes the extensive labeling of thousands of images that’s typically required for a supervised learning pipeline.

The startup is a member of NVIDIA Metropolis — a partner ecosystem centered on vision AI that includes a suite of GPU-accelerated software development kits, pretrained models and the TAO toolkit to supercharge a range of automation applications. Covision is also part of NVIDIA Inception, a free, global program that nurtures cutting-edge startups.

In June, Covision was chosen from hundreds of emerging companies as the winner of a startup award at Automate, a flagship conference on all things automation.

Reducing Pseudo-Scrap Rates

In manufacturing, the pseudo-scrap rate — or the frequency at which products are falsely identified as defective — is a key indicator of a visual-inspection system’s efficiency.

Covision’s software, which is hardware agnostic, reduces pseudo-scrap rates by up to 90%, according to Tschimben.

As an item passes through a production line, a camera captures an image of it. Covision’s real-time AI model then analyzes the image and sends the result to a simple user interface that displays image frames: green for good pieces and red for defective ones.

For GKN Powder Metallurgy, a global producer of 13 million metal parts each day, the above steps can occur in as little as 200 milliseconds per piece — enabled by Covision software and NVIDIA GPUs deployed at the production line.

Two to six cameras usually inspect one production line at a factory, Tschimben said. And one NVIDIA A5000 GPU on premises can process the images from four production lines in real time.

“NVIDIA GPUs are robust and reliable,” he added. “The TensorRT SDK and CUDA toolkit enable our developers to use the latest resources to build our platform, and the Metropolis program helps us with go-to-market strategy — NVIDIA is a one-stop solution for us.”

Plus, being an Inception member gives Covision access to free credits for NVIDIA Deep Learning Institute courses, which Tschimben said are “very helpful hands-on resources” for the company’s engineers to stay up to date on the latest NVIDIA tech.

Increasing Efficiency, Sustainability in Industrial Production

In addition to identifying defective pieces at production lines, Covision software offers a management panel that displays AI-based data analyses of improvements in a production site’s quality of outputs over time — and more.

“It can show, for example, which site out of a company’s many across the world is producing the best metal pieces with the highest production-line uptime, or which production line within a factory needs attention at a given moment,” Tschimben said.

This feature can help managers make high-level decisions to optimize factory efficiency, globally.

“There’s also a sustainability factor,” Tschimben said. “Companies want to reduce waste. Our software reduces production inefficiencies, increasing sustainability and making the work more streamlined.”

Reducing pseudo-scrap rates using Covision software means that companies can produce materials at higher efficiency and profitability levels, and ultimately waste less.

Covision software is deployed at production sites across the U.S. and Europe for customers including Alupress Group and Aluflexpack, in addition to GKN Powder Metallurgy.

Learn more about NVIDIA Metropolis and apply to join NVIDIA Inception.

Attend NVIDIA GTC, running online Sept. 19-22, to discover how vision AI and other groundbreaking technologies are shaping the world.

The post Startup’s Vision AI Software Trains Itself — in One Hour — to Detect Manufacturing Defects in Real Time appeared first on NVIDIA Blog.

Read More