NVIDIA AI Turbocharges Industrial Research, Scientific Discovery in the Cloud on Rescale HPC-as-a-Service Platform

Just like many businesses, the world of industrial scientific computing has a data problem.

Solving seemingly intractable challenges — from developing new energy sources and creating new modes of transportation, to addressing mission-critical issues such as driving operational efficiencies and improving customer support — requires massive amounts of high performance computing.

Instead of having to architect, engineer and build ever-more supercomputers, companies such as Electrolux, Denso, Samsung and Virgin Orbit are embracing the benefits of Rescale’s cloud platform, which makes it possible to scale their accelerated computing in an energy-efficient way and speed their innovation.

Addressing the industrial scientific community’s rising demand for AI in the cloud, NVIDIA founder and CEO Jensen Huang joined Rescale founder and CEO Joris Poort at the Rescale Big Compute virtual conference, where they announced that Rescale is adopting the NVIDIA AI software portfolio.

NVIDIA AI will bring new capabilities to Rescale’s HPC-as-a-service offerings, which include simulation and engineering software used by hundreds of customers across industries. NVIDIA is also accelerating the Rescale Compute Recommendation Engine announced today, which enables customers to identify the right infrastructure options to optimize cost and speed objectives.

“Fusing principled and data-driven methods, physics-ML AI models let us explore our design space at speeds and scales many orders of magnitude greater than ever before,” Huang said. “Rescale is at the intersection of these major trends. NVIDIA’s accelerated and AI computing platform perfectly complements Rescale to advance industrial scientific computing.”

“Engineers and scientists working on breakthrough innovations need integrated cloud platforms that put R&D software and accelerated computing at their fingertips,” said Poort. “We’ve helped customers speed discoveries and save costs with NVIDIA-accelerated HPC, and adding NVIDIA AI Enterprise to the Rescale platform will bring together the most advanced computing capabilities with the best of AI, and support an even broader range of AI-powered workflows R&D leaders can run on any cloud of their choice.”

Expanding HPC to New Horizons in the Cloud With NVIDIA AI

The companies announced that they are working to bring NVIDIA AI Enterprise to Rescale, broadening the cloud platform’s offerings to include NVIDIA-supported AI workflows and processing engines. Once it’s available, customers will be able to develop AI applications in any leading cloud, with support from NVIDIA.

The globally adopted software of the NVIDIA AI platform, NVIDIA AI Enterprise includes essential processing engines for each step of the AI workflow, from data processing and AI model training to simulation and large-scale deployment.

NVIDIA AI enables organizations to develop predictive models to complement and expand industrial HPC research and development with applications such as computer vision, route and supply chain optimization, robotics simulations and more.

The Rescale software catalog provides access to hundreds of NVIDIA-accelerated containerized applications and pretrained AI models on NVIDIA NGC, and allows customers to run simulations on demand and scale up or down as needed.

NVIDIA Modulus to Speed Physics-Based Machine Learning

Rescale now offers the NVIDIA Modulus framework for developing physics-ML neural network models that support a broad range of engineering use cases.

Modulus blends the power of physics with data to build high-fidelity models that enable near-real-time simulations. With just a few clicks on the Rescale platform, Modulus will allow customers to run their entire AI-driven simulation workflow, from data pre-processing and model training to inference and model deployment.

On-Prem to Cloud Workflow Orchestration Expands Flexibility

Rescale is additionally integrating the NVIDIA Base Command Platform AI developer workflow management software, which can orchestrate workloads across clouds to on-premises NVIDIA DGX systems.

Rescale’s HPC-as-a-service platform is accelerated by NVIDIA on leading cloud service provider platforms, including Amazon Web Services, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure. Rescale is a member of the NVIDIA Inception program.

To learn more, watch Huang and Poort discuss the news in the replay of the Big Compute keynote address.

NVIDIA Hopper, Ampere GPUs Sweep Benchmarks in AI Training

Two months after their debut sweeping MLPerf inference benchmarks, NVIDIA H100 Tensor Core GPUs set world records across enterprise AI workloads in the industry group’s latest tests of AI training.

Together, the results show H100 is the best choice for users who demand utmost performance when creating and deploying advanced AI models.

MLPerf is the industry standard for measuring AI performance. It’s backed by a broad group that includes Amazon, Arm, Baidu, Google, Harvard University, Intel, Meta, Microsoft, Stanford University and the University of Toronto.

In a related MLPerf benchmark also released today, NVIDIA A100 Tensor Core GPUs raised the bar they set last year in high performance computing (HPC).

[Chart: Hopper sweeps MLPerf for AI training — NVIDIA H100 GPUs were up to 6.7x faster than A100 GPUs when first submitted for MLPerf Training.]

H100 GPUs (aka Hopper) raised the bar in per-accelerator performance in MLPerf Training. They delivered up to 6.7x more performance than previous-generation GPUs when first submitted for MLPerf Training. By the same comparison, today’s A100 GPUs pack 2.5x more muscle, thanks to advances in software.

Due in part to its Transformer Engine, Hopper excelled in training the popular BERT model for natural language processing. It’s among the largest and most performance-hungry of the MLPerf AI models.

MLPerf gives users the confidence to make informed buying decisions because the benchmarks cover today’s most popular AI workloads — computer vision, natural language processing, recommendation systems, reinforcement learning and more. The tests are peer reviewed, so users can rely on their results.

A100 GPUs Hit New Peak in HPC

In the separate suite of MLPerf HPC benchmarks, A100 GPUs swept all tests of training AI models in demanding scientific workloads run on supercomputers. The results show the NVIDIA AI platform’s ability to scale to the world’s toughest technical challenges.

For example, A100 GPUs trained AI models in the CosmoFlow test for astrophysics 9x faster than the best results two years ago in the first round of MLPerf HPC. In that same workload, the A100 also delivered up to a whopping 66x more throughput per chip than an alternative offering.

The HPC benchmarks train models for work in astrophysics, weather forecasting and molecular dynamics. They are among many technical fields, like drug discovery, adopting AI to advance science.

[Chart: A100 leads in MLPerf HPC — in tests around the globe, A100 GPUs led in both speed and throughput of training.]

Supercomputer centers in Asia, Europe and the U.S. participated in the latest round of the MLPerf HPC tests. In its debut on the DeepCAM benchmarks, Dell Technologies showed strong results using NVIDIA A100 GPUs.

An Unparalleled Ecosystem

In the enterprise AI training benchmarks, a total of 11 companies, including the Microsoft Azure cloud service, made submissions using NVIDIA A100, A30 and A40 GPUs. System makers including ASUS, Dell Technologies, Fujitsu, GIGABYTE, Hewlett Packard Enterprise, Lenovo and Supermicro used a total of nine NVIDIA-Certified Systems for their submissions.

In the latest round, at least three companies joined NVIDIA in submitting results on all eight MLPerf training workloads. That versatility is important because real-world applications often require a suite of diverse AI models.

NVIDIA partners participate in MLPerf because they know it’s a valuable tool for customers evaluating AI platforms and vendors.

Under the Hood

The NVIDIA AI platform provides a full stack from chips to systems, software and services. That enables continuous performance improvements over time.

For example, submissions in the latest HPC tests applied a suite of software optimizations and techniques described in a technical article. Together they slashed runtime on one benchmark by 5x, to just 22 minutes from 101 minutes.

A second article describes how NVIDIA optimized its platform for the enterprise AI benchmarks. For example, we used NVIDIA DALI to efficiently load and pre-process data for a computer vision benchmark.

All the software used in the tests is available from the MLPerf repository, so anyone can get these world-class results. NVIDIA continuously folds these optimizations into containers available on NGC, a software hub for GPU applications.

Elastic Weight Consolidation Improves the Robustness of Self-Supervised Learning Methods under Transfer

This paper was accepted at the workshop “Self-Supervised Learning – Theory and Practice” at NeurIPS 2022.
Self-supervised representation learning (SSL) methods provide an effective label-free initial condition for fine-tuning downstream tasks. However, in numerous realistic scenarios, the downstream task might be biased with respect to the target label distribution. This in turn moves the learned fine-tuned model posterior away from the initial (label) bias-free self-supervised model posterior. In this work, we re-interpret SSL fine-tuning under the lens of Bayesian continual learning and…

Apple Machine Learning Research

Brain tumor segmentation at scale using AWS Inferentia

Medical imaging is an important tool for the diagnosis and localization of disease. Over the past decade, collections of medical images have grown rapidly, and open repositories such as The Cancer Imaging Archive and Imaging Data Commons have democratized access to this vast imaging data. Computational tools such as machine learning (ML) and artificial intelligence (AI) have emerged as an effective and viable option for rapid analysis of this imaging data. Many algorithms have been developed for different kinds of image analysis. These include classification, segmentation, and localization, to name a few. However, the development of the algorithm and training of the required ML model is only one piece of the larger ML/AI puzzle.

Cost-efficient and high-performance deployment of the model is also vital. Additionally, for a model to be of any use at scale, it must be deployed for inference in a reliable, scalable environment.

In this post, we discuss one possible approach of using native AWS technologies to deploy ML algorithms at scale for a medical imaging use case. We talk about segmenting a tumor from MRI brain scans and cover solution architecture, compute infrastructure, and results.

Solution overview

The solution proposed in this post is based on a U-net model trained with the popular Keras framework on a sample dataset from the Kaggle competition platform.

The trained U-net model is then processed via the AWS Neuron SDK so that it can be optimized to target Amazon EC2 Inf1 instances, featuring AWS Inferentia, the first AWS ML accelerator optimized for inference.

The solution uses a managed elastic architecture with fast storage to ensure that high throughput is maintained across each layer of the solution. The following diagram describes the overall architecture.

The proposed architecture centers on an elastic cluster of AWS Inferentia-powered containers running on Amazon Elastic Container Service (Amazon ECS), serving a U-net model optimized via the AWS Neuron SDK.

The inference nodes: AWS Inferentia

AWS offers various ways to deploy a deep learning model in the cloud. One option uses AWS Inferentia, which is a high-performance ML inference chip designed by AWS.

AWS Inferentia delivers up to 80% lower cost per inference and up to 2.3 times higher throughput than comparable current generation GPU-based Amazon Elastic Compute Cloud (Amazon EC2) instances. With Inf1 instances, you can run high-scale ML inference applications for a variety of medical imaging use cases. The AWS Neuron SDK optimizes models for deployment onto AWS Inferentia-powered instances.

AWS Neuron consists of a compiler, runtime, and profiling tools that help optimize the performance of workloads for AWS Inferentia.

With AWS Neuron, developers can deploy neural network models using popular frameworks like PyTorch or TensorFlow on AWS Inferentia-based EC2 Inf1 instances.

The workflow to deploy a trained deep learning model into an AWS Inferentia accelerated inference node consists of the following steps:

  1. Train a neural network model.
  2. Process the trained model via the AWS Neuron compiler to generate an AWS Inferentia-optimized trained neural model (see the sketch after this list).
  3. Use the AWS Neuron runtime to load the AWS Inferentia-optimized model to EC2 Inf1 instances and run inference requests.
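
To make step 2 concrete, the following is a minimal sketch of compiling a trained Keras U-net with the TensorFlow 2.x flavor of the AWS Neuron SDK (the tensorflow-neuron package). The model path, input shape, and output directory are placeholders, and the exact tfn.trace call may vary by SDK version.

import tensorflow as tf
import tensorflow.neuron as tfn  # assumes the tensorflow-neuron (TF 2.x) package is installed

# Load the trained Keras U-net (path is illustrative)
model = tf.keras.models.load_model("unet_brain_mri.h5")

# Trace the model with a representative input so Neuron can compile it for Inferentia
example_input = tf.zeros([1, 256, 256, 3], dtype=tf.float32)
model_neuron = tfn.trace(model, example_input)

# Save the Inferentia-optimized SavedModel for deployment on an Inf1 instance
model_neuron.save("unet_brain_mri_neuron")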

Inference at scale: An elastic architecture for AWS Inferentia

The architecture elasticity is determined by an AWS Lambda function and Amazon Simple Queue Service (Amazon SQS) queue that receives requests for segmentations initiated by simply uploading the volume that needs to be segmented into an Amazon Simple Storage Service (Amazon S3) bucket.

The AWS Inferentia ECS cluster gets fed from a highly performant Amazon FSx for Lustre file system, which accelerates compute workloads with shared storage that provides sub-millisecond latencies, up to hundreds of GB/s of throughput, and millions of IOPS.

The following diagram outlines the architecture that enables the AWS Inferentia cluster to be elastic and scale dynamically according to the number of inference requests submitted to the whole system.

In this architecture, an actor pushes an image volume to an S3 bucket. After the image volume is uploaded to the S3 bucket, a Lambda function gets triggered using the built-in Amazon S3 event notification.

This function places the image volume S3 key into a request queue implemented via Amazon SQS. At the same time, it instructs the AWS Inferentia ECS cluster to start a new task to process the uploaded image volume.
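
The post doesn’t include the function’s code, but a minimal boto3 sketch of this step could look like the following; the environment variable names and the task definition are placeholders, not values from the original solution.

import json
import os

import boto3

sqs = boto3.client("sqs")
ecs = boto3.client("ecs")

QUEUE_URL = os.environ["REQUEST_QUEUE_URL"]           # placeholder environment variables
CLUSTER = os.environ["INFERENTIA_ECS_CLUSTER"]
TASK_DEFINITION = os.environ["SEGMENTATION_TASK_DEF"]

def handler(event, context):
    # Triggered by the S3 event notification for the uploaded image volume
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Place the image volume S3 key into the request queue
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"bucket": bucket, "key": key}),
        )

        # Ask the ECS cluster to start a new task to process the uploaded volume
        ecs.run_task(cluster=CLUSTER, taskDefinition=TASK_DEFINITION, count=1)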

To complement this architecture, another Lambda function fetches the SQS queue depth and uses this value to modulate the size of the ECS cluster, adding or removing nodes according to the queue depth.
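
A sketch of that second function, under the same caveat that the resource names are placeholders, might read as follows; here the queue depth is mapped directly to the desired capacity of the Auto Scaling group that backs the ECS cluster, capped at an illustrative maximum.

import os

import boto3

sqs = boto3.client("sqs")
autoscaling = boto3.client("autoscaling")

QUEUE_URL = os.environ["REQUEST_QUEUE_URL"]        # placeholders, as before
ASG_NAME = os.environ["INFERENTIA_ASG_NAME"]       # Auto Scaling group backing the ECS cluster
MAX_NODES = 10

def handler(event, context):
    # Read the approximate queue depth
    attrs = sqs.get_queue_attributes(
        QueueUrl=QUEUE_URL, AttributeNames=["ApproximateNumberOfMessages"]
    )
    depth = int(attrs["Attributes"]["ApproximateNumberOfMessages"])

    # Add or remove Inf1 nodes according to the queue depth
    desired = min(max(depth, 1), MAX_NODES)
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=ASG_NAME, DesiredCapacity=desired, HonorCooldown=True
    )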

To ensure that the ECS cluster can be fed constantly with data, a highly performant FSx for Lustre file system is placed in front of the ECS cluster. Here, using the automated integration of FSx for Lustre with Amazon S3, the data uploaded into the S3 bucket landing zone is automatically made available in the FSx for Lustre file system and is ready to be consumed by the ECS cluster.

Inference results

The following sample images show the results of a brain tumor classification (multi-class segmentation) task done using the architecture described in this post.

The following figure shows the benchmark results of AWS Inferentia vs. NVIDIA Tesla V100-SXM2-16GB GPU.

Conclusion

Medical imaging is an important tool for the diagnosis and localization of disease. With the growing demand for diagnosis from various modalities, for example from emergency units, the need for automated tools to isolate and support radiologists and doctors in the diagnosis of various pathologies is becoming increasingly important.

In this post, we explored using EC2 Inf1 instance types with AWS Inferentia acceleration to build an elastic inference architecture that can support the ever-increasing inference demand while keeping costs under control.

To learn more about how AWS is accelerating innovation in healthcare, visit AWS for Health.


About the Author

Benedetto Carollo is the Senior Solution Architect for medical imaging and healthcare at Amazon Web Services in Europe, Middle East, and Africa. His work focuses on helping medical imaging and healthcare customers solve business problems by leveraging technology. Benedetto has over 15 years of experience in technology and medical imaging and has worked for companies like Canon Medical Research and Vital Images. Benedetto received his summa cum laude MSc in Software Engineering from the University of Palermo, Italy.

Serve multiple models with Amazon SageMaker and Triton Inference Server

Amazon SageMaker is a fully managed service for data science and machine learning (ML) workflows. It helps data scientists and developers prepare, build, train, and deploy high-quality ML models quickly by bringing together a broad set of capabilities purpose-built for ML.

In 2021, AWS announced the integration of NVIDIA Triton Inference Server in SageMaker. You can use NVIDIA Triton Inference Server to serve models for inference in SageMaker. By using an NVIDIA Triton container image, you can easily serve ML models and benefit from the performance optimizations, dynamic batching, and multi-framework support provided by NVIDIA Triton. Triton helps maximize the utilization of GPU and CPU, further lowering the cost of inference.

In some scenarios, users want to deploy multiple models. For example, an application for revising English composition typically includes several models, such as BERT for text classification and GECToR for grammar checking. A typical request may flow across multiple steps, such as data preprocessing, BERT, GECToR, and postprocessing, which run serially as an inference pipeline. If these models are hosted on different instances, the additional network latency between those instances increases the overall latency. For an application with uncertain traffic, deploying multiple models on different instances will inevitably lead to inefficient utilization of resources.

Consider another scenario, in which users develop multiple models with different versions, and each model uses a different training framework. A common practice is to use multiple containers, each of which deploys a model. But this will cause increased workload and costs for development, operation, and maintenance. In this post, we discuss how SageMaker and NVIDIA Triton Inference Server can solve this problem.

Solution overview

Let’s look at how SageMaker inference works. SageMaker invokes the hosting service by running a Docker container. The Docker container launches a RESTful inference server (such as Flask) to serve HTTP requests for inference. The inference server loads the model and listens on port 8080 to provide external service. The client application sends a POST request to the SageMaker endpoint; SageMaker passes the request to the container and returns the inference result from the container to the client.

In our architecture, we use NVIDIA Triton Inference Server, which provides concurrent runs of multiple models from different frameworks, and we use a Flask server to process client-side requests and dispatch them to the backend Triton server. When the Docker container launches, the Triton server and Flask server are started automatically. The Triton server loads multiple models and exposes ports 8000, 8001, and 8002 for HTTP, gRPC, and metrics, respectively. The Flask server listens on port 8080, parses the original request and payload, and then invokes the local Triton backend using the model name and version information. On the client side, the request includes the model name and model version in addition to the original payload, so that Flask can route the inference request to the correct model on the Triton server.

The following diagram illustrates this process.

Solution Architecture

A complete API call from the client is as follows:

  1. The client assembles the request and initiates the request to a SageMaker endpoint.
  2. The Flask server receives and parses the request, and gets the model name, version, and payload.
  3. The Flask server assembles the request again and routes it to the corresponding endpoint of the Triton server according to the model name and version (see the sketch after this list).
  4. The Triton server runs an inference request and sends responses to the Flask server.
  5. The Flask server receives the response message, assembles the message again, and returns it to the client.
  6. The client receives and parses the response, and continues to subsequent business procedures.
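
The following is a minimal sketch of how steps 2 and 3 might look inside the Flask server; it mirrors the request format used by the clients later in this post, but the module paths and client class names are assumptions rather than the exact contents of predictor.py.

import json

import flask

# Hypothetical client classes, one per model folder (resnet, yolov5, bert_base)
from resnet.client import Resnet
from yolov5.client import Yolov5
from bert_base.client import BertBase

app = flask.Flask(__name__)
CLIENTS = {"resnet": Resnet(), "yolov5": Yolov5(), "bert_base": BertBase()}

@app.route("/ping", methods=["GET"])
def ping():
    # SageMaker health check
    return flask.Response(response="\n", status=200, mimetype="application/json")

@app.route("/invocations", methods=["POST"])
def invocations():
    # Step 2: parse the request and get the model name and payload
    body = json.loads(flask.request.data.decode("utf-8"))
    model_name = body["modelname"]
    payload = body["payload"]

    # Step 3: route the request to the matching Triton-backed client;
    # each client is assumed to handle decoding of its own payload
    result = CLIENTS[model_name].inference(payload)
    return flask.Response(
        response=json.dumps({"modelname": model_name, "result": result}),
        status=200,
        mimetype="application/json",
    )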

In the following sections, we introduce the steps needed to prepare a model and build the TensorRT engine, prepare a Docker image, create a SageMaker endpoint, and verify the result.

Prepare models and build the engine

We demonstrate hosting three typical ML models in our solution: image classification (ResNet50), object detection (YOLOv5), and a natural language processing (NLP) model (BERT-base). NVIDIA Triton Inference Server supports multiple formats, including TensorFlow 1.x and 2.x, TensorFlow SavedModel, TensorFlow GraphDef, TensorRT, ONNX, OpenVINO, and PyTorch TorchScript.

The following table summarizes our model details.

Model Name   Model Size   Format
ResNet50     52M          TensorRT
YOLOv5       38M          TensorRT
BERT-base    133M         ONNX RT

NVIDIA provides detailed documentation describing how to generate the TensorRT engine. To achieve the best performance, the TensorRT engine must be built on the target device. This means the build and runtime environments require the same GPU compute capability. For example, a TensorRT engine built on a g4dn instance can’t be deployed on a g5 instance.

You can generate your own TensorRT engines according to your needs. For test purposes, we prepared sample codes and deployable models with the TensorRT engine. The source code is also available on GitHub.

Next, we use an Amazon Elastic Compute Cloud (Amazon EC2) G4dn instance to generate the TensorRT engine with the following steps. We use YOLOv5 as an example.

  1. Launch a g4dn.2xlarge EC2 instance with the Deep Learning AMI (Ubuntu 20.04) in the us-east-1 Region.
  2. Open a terminal window and use the ssh command to connect to the instance.
  3. Run the following commands one by one:
    nvidia-docker run --gpus all -it --rm -v `pwd`/workspace:/workspace nvcr.io/nvidia/pytorch:22.04-py3
    git clone -b v7.0.1 https://github.com/ultralytics/yolov5
    pip install seaborn
    pip install onnx-simplifier
    cd yolov5
    wget https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5s.pt
    python export.py --weights yolov5s.pt --include onnx --simplify --imgsz 640 640 --device 0
    onnxsim yolov5s.onnx yolov5s-sim.onnx
    
    trtexec --onnx=yolov5s-sim.onnx --saveEngine=model.plan --explicitBatch --workspace=1024*12
    

  4. Create a config.pbtxt file:
    name: "yolov5s"
    platform: "tensorrt_plan"
    input: [
        {
            name: "images"
            data_type: TYPE_FP32
            format: FORMAT_NONE
            dims: [1, 3, 640, 640 ]
        }
    ]
    output: [
        {
            name: "output",
            data_type: TYPE_FP32
            dims: [1,25200,85 ]
        }
    ]
    

  5. Create the following file structure and put the generated files in the appropriate location:
    mkdir yolov5s
    mkdir -p yolov5s/1
    cp config.pbtxt yolov5s
    cp model.plan yolov5s/1
    
    yolov5s
    ├── 1
    │   └── model.plan
    └── config.pbtxt

Test the TensorRT engine

Before we deploy to SageMaker, we start a Triton server to verify these three models are configured correctly. Use the following command to start a Triton server and load the models:

docker run --gpus all --rm -p8000:8000 -p8001:8001 -v<MODEL_ROOT_DIR>/model_repository:/models nvcr.io/nvidia/tritonserver:22.04-py3 tritonserver --model-repository=/models

If the startup log shows the models loaded and the server listening on its ports, the Triton server is started correctly.

Enter nvidia-smi in the terminal to see GPU memory usage.
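
You can also confirm readiness programmatically with the Triton Python client. This optional check isn’t part of the original walkthrough; it assumes the default port mapping from the docker run command above and the model names used in this post.

import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# True once the Triton server has finished loading the model repository
print(client.is_server_ready())

# Per-model readiness for the three models in this walkthrough
for model in ["resnet", "yolov5s", "bert_base"]:
    print(model, client.is_model_ready(model))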

Client implementation for inference

The file structure is as follows:

  • serve – The wrapper that starts the inference server. The Python script starts the NGINX, Flask, and Triton server.
  • predictor.py – The Flask implementation for /ping and /invocations endpoints, and dispatching requests.
  • wsgi.py – The startup shell for the individual server workers.
  • base.py – The abstract method definition that each client requires to implement their inference method.
  • client folder – One folder per client:
    • resnet
    • bert_base
    • yolov5
  • nginx.conf – The configuration for the NGINX primary server.

We define an abstract method to implement the inference interface, and each client implements this method:

from abc import ABC, abstractmethod
class Base(ABC):
    @abstractmethod
    def inference(self,img):
        pass

The Triton server exposes an HTTP endpoint on port 8000, a gRPC endpoint on port 8001, and a Prometheus metrics endpoint on port 8002. The following is a sample ResNet client with a gRPC call. You can implement the HTTP interface or gRPC interface according to your use case.

from base import Base
import numpy as np
import tritonclient.grpc as grpcclient
from PIL import Image
import cv2
class Resnet(Base):
    def image_transform_onnx(self, image, size: int) -> np.ndarray:
        '''Image transform helper for onnx runtime inference.'''
        img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  #OpenCV follows BGR convention and PIL follows RGB
        image = Image.fromarray(img)
        image = image.resize((size,size))

        # now our image is represented by 3 layers - Red, Green, Blue
        # each layer has 224 x 224 values representing pixel intensities
        image = np.array(image)

        # dummy input for the model at export - torch.randn(1, 3, 224, 224)
        image = image.transpose(2,0,1).astype(np.float32)

        # our image is currently represented by values ranging between 0-255
        # we need to convert these values to 0.0-1.0 - those are the values that are expected by our model
        image /= 255
        image = image[None, ...]
        return image

    def inference(self, img):
        INPUT_SHAPE = (224, 224)

        TRITON_IP = "localhost"
        TRITON_PORT = 8001
        MODEL_NAME = "resnet"
        INPUTS = []
        OUTPUTS = []
        INPUT_LAYER_NAME = "input"
        OUTPUT_LAYER_NAME = "output"

        INPUTS.append(grpcclient.InferInput(INPUT_LAYER_NAME, [1, 3, INPUT_SHAPE[0], INPUT_SHAPE[1]], "FP32"))
        OUTPUTS.append(grpcclient.InferRequestedOutput(OUTPUT_LAYER_NAME, class_count=3))
        TRITON_CLIENT = grpcclient.InferenceServerClient(url=f"{TRITON_IP}:{TRITON_PORT}")

        INPUTS[0].set_data_from_numpy(self.image_transform_onnx(img, 224))

        results = TRITON_CLIENT.infer(model_name=MODEL_NAME, inputs=INPUTS, outputs=OUTPUTS, headers={})
        output = np.squeeze(results.as_numpy(OUTPUT_LAYER_NAME))
        #print(output)
        lista = [x.decode('utf-8') for x in output.tolist()]
        return lista

In this architecture, the NGINX, Flask, and Triton servers should be started at the beginning. Edit the serve file and add a line to start the Triton server.
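
The serve script itself isn’t reproduced in this post. Conceptually, the added line is one more subprocess launch alongside NGINX and Gunicorn, sketched below with a model repository path that matches the Dockerfile in the next section.

import subprocess

# Added to the serve script: launch the Triton server alongside NGINX and Gunicorn.
# The model repository path matches where the Dockerfile copies the models.
triton = subprocess.Popen(
    ["tritonserver", "--model-repository=/opt/program/models"]
)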

Build a Docker image and push the image to Amazon ECR

The Dockerfile looks as follows:

FROM nvcr.io/nvidia/tritonserver:22.04-py3 

# Add arguments to achieve the version, python and url
ARG PYTHON=python3
ARG PYTHON_PIP=python3-pip
ARG PIP=pip3

ENV LANG=C.UTF-8

RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys A4B469963BF863CC \
 && apt-get update \
 && apt-get install -y nginx \
 && apt-get install -y libgl1-mesa-glx \
 && apt-get clean \
 && rm -rf /var/lib/apt/lists/*

RUN ${PIP} install -U --no-cache-dir \
    tritonclient[all] \
    torch \
    torchvision \
    pillow==9.1.1 \
    scipy==1.8.1 \
    transformers==4.20.1 \
    opencv-python==4.6.0.66 \
    flask \
    gunicorn \
 && ldconfig \
 && apt-get clean \
 && apt-get autoremove \
 && rm -rf /var/lib/apt/lists/* /tmp/* ~/* \
 && mkdir -p /opt/program/models/
                                                                               
COPY sm /opt/program
COPY model /opt/program/models
WORKDIR /opt/program

ENTRYPOINT ["python3", "serve"]

Install and configure the aws-cli client with the following code:

sudo apt install awscli
sudo apt install git-all
aws configure
# input AWS Access Key ID, AWS Secret Access Key, default Region name, and default output format

Run the following command to build the Docker image and push the image to Amazon Elastic Container Registry (Amazon ECR). Provide your Region and account ID.

aws ecr get-login-password --region <regionID> | docker login --username AWS --password-stdin <accountID>.dkr.ecr.<regionID>.amazonaws.com

docker build -t inference/mytriton .

docker tag inference/mytriton:latest <accountID>.dkr.ecr.<regionID>.amazonaws.com/inference/mytriton:latest

docker push <accountID>.dkr.ecr.<regionID>.amazonaws.com/inference/mytriton:latest

Create a SageMaker endpoint and test the endpoint

Now it’s time to verify the result. Launch a notebook instance with an ml.c5.xlarge instance from the SageMaker console, and create a notebook with the conda_python3 kernel. The following code snippet shows an example deployment of an inference endpoint. The source code is available in the GitHub repo.

import sagemaker as sage
from sagemaker import get_execution_role

role = get_execution_role()
sess = sage.Session()
account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
image = '{}.dkr.ecr.{}.amazonaws.com/inference/mytriton:latest'.format(account, region)
model = sess.create_model(
        name="mytriton", role=role, container_defs=image)
endpoint_cfg=sess.create_endpoint_config(
        name="MYTRITONCFG",
        model_name="mytriton",
        initial_instance_count=1,
        instance_type="ml.g4dn.xlarge"
    )
endpoint=sess.create_endpoint(
        endpoint_name="MyTritonEndpoint", config_name="MYTRITONCFG")

Wait about 3 minutes for the inference server to start, then verify the result.

The following code is the ResNet client request:

## resnet client
import base64
import json

import boto3
import cv2

runtime = boto3.Session().client('runtime.sagemaker')
img = cv2.imread('dog.jpg')
string_img = base64.b64encode(cv2.imencode('.jpg', img)[1]).decode()
payload = json.dumps({"modelname": "resnet","payload": {"img":string_img}})

endpoint="MyTritonEndpoint"
response = runtime.invoke_endpoint(EndpointName=endpoint,ContentType="application/json",Body=payload,Accept='application/json')

out=response['Body'].read()
res=eval(out)
print(res)

We get the following response:

{'modelname': 'resnet', 'result': ['11.250000:250:250:malamute, malemute, Alaskan malamute', '9.914062:249:249:Eskimo dog, husky', '9.906250:248:248:Saint Bernard, St Bernard']}

The following code is the YOLOv5 client request:

# yolov5 client
payload = json.dumps({"modelname": "yolov5","payload": {"img":string_img}})

endpoint="MyTritonEndpoint"
response = runtime.invoke_endpoint(EndpointName=endpoint,ContentType="application/json",Body=payload,Accept='application/json')

out=response['Body'].read()
res=eval(out)
print(str(out))

We get the following response:

b'{"modelname": "yolov5", "result": [[16, 0.9168673157691956, 111.92530059814453, 258.53240966796875, 262.0159606933594, 533.407958984375, 768, 576], [2, 0.6941519379615784, 392.20037841796875, 573.6005249023438, 142.55178833007812, 224.56454467773438, 768, 576], [1, 0.5813695788383484, 131.8942413330078, 473.7420654296875, 179.61459350585938, 427.0913391113281, 768, 576], [7, 0.5316226482391357, 392.82275390625, 572.4647216796875, 144.685546875, 223.052734375, 768, 576]]}'

The following code is the BERT client request:

# bert client
text="The world has [MASK] people."

payload = json.dumps({"modelname": "bert_base","payload": {"text":text}})

endpoint="MyTritonEndpoint"
response = runtime.invoke_endpoint(EndpointName=endpoint,ContentType="application/json",Body=payload,Accept='application/json')

out=response['Body'].read()
res=eval(out)
print(res)

We get the following response:

{'modelname': 'bert_base', 'result': [{'token': 'The world has many people.', 'score': 0.16609132289886475}, {'token': 'The world has no people.', 'score': 0.07334889471530914}, {'token': 'The world has few people.', 'score': 0.0617995485663414}, {'token': 'The world has two people.', 'score': 0.03924647718667984}, {'token': 'The world has its people.', 'score': 0.023465465754270554}]}

Here we see our architecture is working as expected.

Note that hosting an endpoint will incur some costs. Therefore, delete the endpoint after you complete the test:

sm_client = boto3.client('sagemaker')  # delete_endpoint is a control-plane call, not a runtime call
sm_client.delete_endpoint(EndpointName=endpoint)

Cost estimation

To estimate cost, assume that you have three models, but not all of them are long-running. You’re using one endpoint for each model, and the online time of each endpoint is different. Using ml.g4dn.xlarge as an example, the total cost is about $971.52/month. The following table lists the details.

Model Name   Endpoint Running/Day   Instance Type    Cost/Month (us-east-1)
ResNet       24 hours               ml.g4dn.xlarge   0.736 * 24 * 30 = $529.92
BERT         8 hours                ml.g4dn.xlarge   0.736 * 8 * 30 = $176.64
YOLOv5       12 hours               ml.g4dn.xlarge   0.736 * 12 * 30 = $264.96

The following table shows the cost for sharing one endpoint for three models using the preceding architecture. The total cost is about $676.8/month. From this result, we can conclude that you can save 30% in costs while also having 24/7 service from your endpoint.

Model Name             Endpoint Running/Day   Instance Type     Cost/Month (us-east-1)
ResNet, YOLOv5, BERT   24 hours               ml.g4dn.2xlarge   0.94 * 24 * 30 = $676.8

Summary

In this post, we introduced an improved architecture in which multiple models share one endpoint in SageMaker. Under some conditions, this solution can help you save costs and improve resource utilization. It is suitable for business scenarios with low concurrency and latency-insensitive requirements.

To learn more about SageMaker and AI/ML solutions, refer to Amazon SageMaker.


About the authors

Zheng Zhang is a Senior Specialist Solutions Architect at AWS. He focuses on helping customers accelerate model training, inference, and deployment for machine learning solutions. He also has rich experience in large-scale distributed training and designing AI/ML solutions.

Yinuo He is an AI/ML specialist at AWS. She has experience in designing and developing machine learning based products to provide better user experiences. She now works to help customers succeed in their ML journey.

Model Hosting Patterns in SageMaker: Best practices in testing and updating models on SageMaker

Amazon SageMaker is a fully managed service that provides developers and data scientists the ability to quickly build, train, and deploy machine learning (ML) models. With SageMaker, you can deploy your ML models on hosted endpoints and get inference results in real time. You can easily view the performance metrics for your endpoints in Amazon CloudWatch, automatically scale endpoints based on traffic, and update your models in production without losing any availability. SageMaker offers a wide variety of options to deploy ML models for inference in any of the following ways, depending on your use case:

  • For synchronous predictions that need to be served in the order of milliseconds, use SageMaker real-time inference
  • For workloads that have idle periods between traffic spurts and can tolerate cold starts, use Serverless Inference
  • For requests with large payload sizes up to 1 GB, long processing times (up to 15 minutes) and near-real-time latency requirements (seconds to minutes), use SageMaker Asynchronous Inference
  • To get predictions for an entire dataset, use SageMaker batch transform

Real-time inference is ideal for inference workloads where you have real time, interactive, low latency requirements. You deploy your model to SageMaker hosting services and get an endpoint that can be used for inference. These endpoints are backed by a fully managed infrastructure and support auto scaling. You can improve efficiency and cost by combining multiple models into a single endpoint using multi-model endpoints or multi-container endpoints.

There are certain use cases where you want to deploy multiple variants of the same model into production to gauge their performance, measure improvements, or run A/B tests. In such cases, SageMaker multi-variant endpoints are useful because they allow you to deploy multiple production variants of a model to the same SageMaker endpoint.

In this post, we discuss SageMaker multi-variant endpoints and best practices for optimization.

Comparing SageMaker real-time inference options

The following diagram gives a quick overview of the real-time inference options with SageMaker.

SageMaker real-time inference options

A single-model endpoint allows you to deploy one model on a container hosted on dedicated instances or serverless for low latency and high throughput. You can create a model and retrieve a SageMaker supported image for popular frameworks such as TensorFlow, PyTorch, Scikit-learn, and more. If you’re working with a custom framework for your model, you can also bring your own container that installs your dependencies.

SageMaker also supports more advanced options such as multi-model endpoints (MMEs) and multi-container endpoints (MCEs). MMEs are useful when you’re dealing with hundreds to tens of thousands of models and where you don’t need to deploy each model as an individual endpoint. MMEs allow you to host multiple models in a cost-effective, scalable manner within the same endpoint by using a shared serving container hosted on an instance. The underlying infrastructure (container and instance) remains the same, but the models are loaded and unloaded dynamically from a common S3 location, according to usage and the amount of memory available on the endpoint. Your application simply needs to include an API call with the target model to this endpoint to achieve low-latency, high-throughput inference. Instead of paying for a separate endpoint for every single model, you can host many models for the price of a single endpoint.

MCEs enable you to run up to 15 different ML containers on a single endpoint and invoke them independently. You can build these ML containers on different serving stacks (such as ML framework, model server, and algorithm), to be run on the same endpoint for cost savings. You can stitch the containers together in a serial inference pipeline or invoke the container independently. This can be ideal when you have several different ML models that have different traffic patterns and similar resource needs. Examples of when to utilize MCEs include, but are not limited to, the following:

  • Hosting models across different frameworks (such as TensorFlow, PyTorch, and Scikit-learn) that don’t have sufficient traffic to saturate the full capacity of an instance
  • Hosting models from the same framework with different ML algorithms (such as recommendations, forecasting, or classification) and handler functions
  • Comparisons of similar architectures running on different framework versions (such as TensorFlow 1.x vs. TensorFlow 2.x) for scenarios like A/B testing

SageMaker multi-variant endpoints (MVEs) allow you to test multiple models or model versions behind the same endpoint using production variants. Each production variant identifies an ML model and the resources deployed for hosting the model, such as the serving container and instance.

Overview of SageMaker multi-variant endpoints

In production ML workflows, data scientists and ML engineers refine models through a variety of methods, such as retraining based on data/model/concept drift, hyperparameter tuning, feature selection, framework selection, and more. Performing A/B testing between a new model and an old model with production traffic can be an effective final step in the validation process for a new model. In A/B testing, you test different variants of your models and compare how each variant performs relative to each other. You then choose the best-performing model to replace the previous model with a new version that delivers better performance than the previous version. By using production variants, you can test these ML models and different model versions behind the same endpoint. You can train these ML models using different datasets, different algorithms, and ML frameworks; deploy them to different instance types; or any combination of these options. The load balancer connected to the SageMaker endpoint provides the ability to distribute the invocation requests across multiple production variants. For example, you can distribute traffic between production variants by specifying the traffic distribution for each variant, or you can invoke a specific variant directly for each request.

You can also configure the auto scaling policy to automatically scale your variants in or out based on metrics such as requests per second.

The following diagram illustrates how MVE works in more detail.

SageMaker multi-variant endpoint

Deploying an MVE is also very straightforward. All you need to do is define model objects with the image and model data using the create_model construct from the SageMaker Python SDK, and define the endpoint configurations using production_variant constructs to create production variants, each with its own different model and resource requirements (instance type and counts). This enables you to also test models on different instance types. To deploy, use the endpoint_from_production_variant construct to create the endpoint.

During endpoint creation, SageMaker provisions the hosting instance specified in the endpoint settings and downloads the model and inference container specified by the production variant to the hosting instance. If a successful response is returned after starting the container and performing a health check with a ping, a message indicating that the endpoint creation is complete is sent to the user. See the following code:

sm_session.create_model(
	name=model_name,
	role=role,
	container_defs={'Image':  image_uri, 'ModelDataUrl': model_url}
	)

sm_session.create_model(
	name=model_name2,
	role=role,
	container_defs={'Image':  image_uri, 'ModelDataUrl': model_url2 }
	)

variant1 = production_variant(
	model_name=model_name,
	instance_type="ml.c5.4xlarge",
	initial_instance_count=1,
	variant_name="Variant1",
	initial_weight=1
	)

variant2 = production_variant(
	model_name=model_name2,
	instance_type="ml.m5.4xlarge",
	initial_instance_count=1,
	variant_name="Variant2",
	initial_weight=1
	)

sm_session.endpoint_from_production_variants(
	name=endpoint_name,
	production_variants=[variant1,  variant2]
	)

In the preceding example, we created two variants, each with its own different model (these could also have different instance types and counts). We set an initial_weight of 1 for both variants: this means 50% of our requests go to Variant1, and the remaining 50% to Variant2. The sum of weights across both variants is 2 and each variant has weight assignment of 1. This implies each variant receives 50% of the total traffic.

Invoking the endpoint is similar to the common SageMaker construct invoke_endpoint; you can call the endpoint directly with the data as a payload:

sm_runtime.invoke_endpoint(
	EndpointName=endpoint_name,
	ContentType="text/csv",
	Body=payload
	)

SageMaker emits metrics such as Latency and Invocations for each variant in CloudWatch. For a complete list of metrics that SageMaker emits, see Monitor Amazon SageMaker with Amazon CloudWatch. You can query CloudWatch to get the number of invocations per variant, to see how invocations are split across variants by default.
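
For example, the following boto3 sketch sums the Invocations metric for each variant over the last hour; the endpoint and variant names come from the snippets above, while the time window and period are arbitrary choices.

from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

def invocations(endpoint_name, variant_name):
    # Sum of Invocations for one production variant over the last hour
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/SageMaker",
        MetricName="Invocations",
        Dimensions=[
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        StartTime=datetime.utcnow() - timedelta(hours=1),
        EndTime=datetime.utcnow(),
        Period=300,
        Statistics=["Sum"],
    )
    return sum(point["Sum"] for point in stats["Datapoints"])

# endpoint_name as defined when creating the endpoint above
for variant in ["Variant1", "Variant2"]:
    print(variant, invocations(endpoint_name, variant))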

To invoke a specific version of the model, specify a variant as the TargetVariant in the call to invoke_endpoint:

sm_runtime.invoke_endpoint(
	EndpointName=endpoint_name,
	ContentType="text/csv",
	Body=payload,
	TargetVariant="Variant1"
	)

You can evaluate each production variant’s performance by reviewing metrics such as accuracy, precision, recall, F1 score, and receiver operating characteristic/area under the curve for each variant using Amazon SageMaker Model Monitor. You can then decide to increase traffic to the best model by updating the weights assigned to each variant by calling UpdateEndpointWeightsAndCapacities. This changes the traffic distribution to your production variants without requiring updates to your endpoint. So instead of 50% of the traffic from the initial setup, we shift 75% of the traffic to Variant2 by assigning new weights to each variant using UpdateEndpointWeightsAndCapacities. See the following code:

sm.update_endpoint_weights_and_capacities(
	EndpointName=endpoint_name,
	DesiredWeightsAndCapacities=[
	{
		"DesiredWeight": 25,
		"VariantName": variant1["VariantName"]
	},
	{
		"DesiredWeight": 75,
		"VariantName": variant2["VariantName"]
	}
] )

When you’re satisfied with a variant’s performance, you can route 100% of the traffic to that variant. For example, you can set the weight for Variant1 to 0 and the weight for Variant2 to 1. SageMaker then sends 100% of all inference requests to Variant2. You can then safely update your endpoint and delete Variant1 from your endpoint. You can also continue testing new models in production by adding new variants to your endpoint. You can also configure these endpoints to scale automatically based on the traffic the endpoints receive.

Advantages of multi-variant endpoints

SageMaker MVEs allow you to do the following:

  • Deploy and test multiple variants of a model using the same SageMaker endpoint. This is useful for testing variations of a model in production. For example, suppose that you’ve deployed a model into production. You can test a variation of the model by directing a small amount of traffic, say 5%, to the new model.
  • Evaluate model performance in production without interrupting traffic by monitoring operational metrics for each variant in CloudWatch.
  • Update models in production without losing any availability. You can modify an endpoint without taking models that are already deployed into production out of service. For example, you can add new model variants, update the ML compute instance configurations of existing model variants, or change the distribution of traffic among model variants. For more information, see UpdateEndpoint and UpdateEndpointWeightsAndCapacities.

Challenges when using multi-variant endpoints

SageMaker MVEs come with the following challenges:

  • Load testing effort – You need to put in a fair amount of effort and resources for testing and model matrix comparisons for each variant. For an A/B test to be considered successful, you need to perform a statistical analysis of the metrics gathered from the test to determine if there is a statistically significant result. It could become challenging to minimize exploring poor performing variants. You could potentially use the multi-armed bandit optimization technique to avoid sending traffic to experiments that aren’t working and optimize performance as you test. For load testing, you could also explore Amazon SageMaker Inference Recommender to conduct extensive benchmarks based on production requirements for latency and throughput, custom traffic patterns, and instances (up to 10) that you select.
  • Tight coupling between model variant and endpoint – It could become tricky depending on model deployment frequency, because the endpoint may end up in updating status for each production variant being updated. SageMaker also supports deployment guardrails, which you can use to easily switch from the current model in production to a new one in a controlled way. This option introduces canary and linear traffic shifting modes so that you can have granular control over the shifting of traffic from your current model to the new one during the course of the update. With built-in safeguards such as auto-rollbacks, you can catch issues early and automatically take corrective action before they cause significant production impact.

Best practices for multi-variant endpoints

When hosting models using SageMaker MVEs, consider the following:

  • SageMaker is great for testing new models because you can easily deploy them into an A/B testing environment and you pay for only what you use. You’re charged per instance-hour consumed for each instance while the endpoint is running. When you’re done with your tests and not using the endpoint or the variants extensively anymore, you should delete it to save cost. You can always recreate it when you need it again because the model is stored in Amazon Simple Storage Service (Amazon S3).
  • You should use the most optimal instance type and size to deploy models. SageMaker currently offers ML compute instances on various instance families. An endpoint instance is running all the time (while the instance is in service). Therefore, selecting the right type of instance can have a significant impact on the total cost and performance of ML models. Load testing is the best practice to determine the appropriate instance type and fleet size, with or without auto scaling for your live endpoint to avoid over-provisioning and paying extra for capacity you don’t need.
  • You can monitor model performance and resource utilization in CloudWatch. You can configure a ProductionVariant to use Application Auto Scaling. To specify the metrics and target values for a scaling policy, you configure a target-tracking scaling policy. You can use either a predefined metric or a custom metric. For more information about policy configuration syntax, see TargetTrackingScalingPolicyConfiguration. For information about configuring automatic scaling, see Automatically Scale Amazon SageMaker Models. To quickly define a target-tracking scaling policy for a variant, you can choose a specific CloudWatch metric and set threshold values. For example, use metric SageMakerVariantInvocationsPerInstance to monitor the average number of times per minute that each instance for a variant is invoked, or use metric CPUUtilization to monitor the sum of work handled by a CPU. The following example uses the SageMakerVariantInvocationsPerInstance predefined metric to adjust the number of variant instances so that each instance has an InvocationsPerInstance metric of 70 (a boto3 sketch of registering this policy appears after this list):
{
	"TargetValue": 70.0,
	"PredefinedMetricSpecification":
	{
		"PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
	}
}
  • Changing or deleting model artifacts or changing inference code after deploying a model produces unpredictable results. Before deploying models to production, it’s a good practice to check whether the model hosting in local mode is successful after sufficiently debugging the inference code snippets (like model_fn, input_fn, predict_fn, and output_fn) in the local development environment like a SageMaker notebook instance or local server. If you need to change or delete model artifacts or change inference code, modify the endpoint by providing a new endpoint configuration. After you provide the new endpoint configuration, you can change or delete the model artifacts corresponding to the old endpoint configuration.
  • You can use SageMaker batch transform to test production variants. Batch transform is ideal to get inferences from large datasets. You can create a separate transform job for each new model variant and use a validation dataset to test. For each transform job, specify a unique model name and location in Amazon S3 for the output file. To analyze the results, use inference pipeline logs and metrics.

Conclusion

SageMaker enables you to easily A/B test ML models in production by running multiple production variants on an endpoint. You can use SageMaker’s capabilities to test models that have been trained using different training datasets, hyperparameters, algorithms, or ML frameworks; how they perform on different instance types; or a combination of all of the above. You can provide the traffic distribution between the variants on an endpoint, and SageMaker splits the inference traffic to the variants based on the specified distribution. Alternately, if you want to test models for specific customer segments, you can specify the variant that should process an inference request by providing the TargetVariant header, and SageMaker will route the request to the variant that you specified. For more information about A/B testing, see Safely update models in production.


About the authors

Deepali Rajale is AI/ML Specialist Technical Account Manager at Amazon Web Services. She works with enterprise customers providing technical guidance on implementing machine learning solutions with best practices. In her spare time, she enjoys hiking, movies and hanging out with family and friends.

Dhawal Patel is a Principal Machine Learning Architect at AWS. He has worked with organizations ranging from large enterprises to mid-sized startups on problems related to distributed computing and artificial intelligence. He focuses on deep learning, including the NLP and computer vision domains. He helps customers achieve high-performance model inference on SageMaker.

Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing machine learning. He focuses on core challenges related to deploying complex ML applications, multi-tenant ML models, cost optimizations, and making deployment of deep learning models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch and spending time with his family.

New Volvo EX90 SUV Heralds AI Era for Swedish Automaker, Built on NVIDIA DRIVE

It’s a new age for safety.

Volvo Cars unveiled the Volvo EX90 SUV today in Stockholm, marking the beginning of a new era of electrification, technology and safety for the automaker. The flagship vehicle is redesigned from tip to tail — with a new powertrain, branding and software-defined AI compute — powered by the centralized NVIDIA DRIVE Orin platform.

The Volvo EX90 silhouette is in line with Volvo Cars’ design principle of form following function — and looks good at the same time.

Under the hood, it’s filled with cutting-edge technology for new advances in electrification, connectivity, core computing, safety and infotainment. The EX90 is the first Volvo car that is hardware-ready to deliver unsupervised autonomous driving.

These features come together to deliver an SUV that cements Volvo Cars in the next generation of software-defined vehicles.

“We used technology to reimagine the entire car,” said Volvo Cars CEO Jim Rowan. “The Volvo EX90 is the safest that Volvo has ever produced.”

Computer on Wheels

The Volvo EX90 looks smart and has the brains to back it up.

Volvo Cars’ proprietary software runs on NVIDIA DRIVE Orin to operate most of the core functions inside the car, including safety, infotainment and battery management. This intelligent architecture is designed to deliver a highly responsive and enjoyable experience for every passenger in the car.

The DRIVE Orin system-on-a-chip delivers 254 trillion operations per second — ample compute headroom for a software-defined architecture. It’s designed to handle the large number of applications and deep neural networks needed to achieve systematic safety standards such as ISO 26262 ASIL-D.

The Volvo EX90 isn’t just a new car. It’s a highly advanced computer on wheels, designed to improve over time as Volvo Cars adds more software features.

Just Getting Started

The Volvo EX90 is just the beginning of Volvo Cars’ plans for the software-defined future.

The automaker plans to launch a new EV every year through 2025, with the end goal of having a purely electric, software-defined lineup by 2030.

The new flagship SUV is available for preorder in select markets, launching the next phase in Volvo Cars’ leadership in premium design and safety.

HORN Free! Roaming Rhinos Could Be Guarded by AI Drones

Call it the ultimate example of a job that’s sometimes best done remotely. Wildlife researchers say rhinos are magnificent beasts, but they like to be left alone, especially when they’re with their young.

In the latest example of how researchers are using the latest technologies to track animals less invasively, a team of researchers has proposed harnessing high-flying AI-equipped drones to track the endangered black rhino through the wilds of Namibia.

In a paper published earlier this year in the journal PeerJ, the researchers show the potential of drone-based AI to identify animals in even the remotest areas and provide real-time updates on their status from the air.

While drones — and technology of just about every kind — have been harnessed to track African wildlife, the proposal promises to help gamekeepers move faster to protect rhinos and other megafauna from poachers.

AI Podcast host Noah Kravitz spoke to two of the authors of the paper.

Zoey Jewell is co-founder and president of wildtrack.org, a global network of biologists and conservationists dedicated to non-invasive wildlife monitoring techniques. And Alice Hua is a recent graduate of the School of Information at UC Berkeley in California, and an ML platform engineer at CrowdStrike.

And for more, read the full paper at https://peerj.com/articles/13779/.

You Might Also Like

Artem Cherkasov and Olexandr Isayev on Democratizing Drug Discovery With NVIDIA GPUs

It may seem intuitive that AI and deep learning can speed up workflows — including novel drug discovery, a typically yearslong and several-billion-dollar endeavor. However, there is a dearth of recent research reviewing how accelerated computing can impact the process. Professors Artem Cherkasov and Olexandr Isayev discuss how GPUs can help democratize drug discovery.

Lending a Helping Hand: Jules Anh Tuan Nguyen on Building a Neuroprosthetic

Is it possible to manipulate things with your mind? Possibly. University of Minnesota postdoctoral researcher Jules Anh Tuan Nguyen discusses allowing amputees to control their prosthetic limbs with their thoughts, using neural decoders and deep learning.

Wild Things: 3D Reconstructions of Endangered Species With NVIDIA’s Sifei Liu

Studying endangered species can be difficult, as they’re elusive, and the act of observing them can disrupt their lives. Sifei Liu, a senior research scientist at NVIDIA, discusses how scientists can avoid these pitfalls by studying AI-generated 3D representations of these endangered species.

Subscribe to the AI Podcast: Now Available on Amazon Music

You can now listen to the AI Podcast through Amazon Music.

Also get the AI Podcast through iTunes, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better: Have a few minutes to spare? Fill out our listener survey.

 
