MetNet-2: Deep Learning for 12-Hour Precipitation Forecasting
Posted by Nal Kalchbrenner and Lasse Espeholt, Google Research
Deep learning has successfully been applied to a wide range of important challenges, such as cancer prevention and increasing accessibility. The application of deep learning models to weather forecasts can be relevant to people on a day-to-day basis, from helping people plan their day to managing food production, transportation systems, or the energy grid. Weather forecasts typically rely on traditional physics-based techniques powered by the world’s largest supercomputers. Such methods are constrained by high computational requirements and are sensitive to approximations of the physical laws on which they are based.
Deep learning offers a new approach to computing forecasts. Rather than incorporating explicit physical laws, deep learning models learn to predict weather patterns directly from observed data and are able to compute predictions faster than physics-based techniques. These approaches also have the potential to increase the frequency, scope, and accuracy of the predicted forecasts.
Within weather forecasting, deep learning techniques have shown particular promise for nowcasting — i.e., predicting weather up to 2-6 hours ahead. Previous work has focused on using direct neural network models for weather data, extending neural forecasts from 0 to 8 hours with the MetNet architecture, generating continuations of radar data for up to 90 minutes ahead, and interpreting the weather information learned by these neural networks. Still, there is an opportunity for deep learning to extend improvements to longer-range forecasts.
To that end, in “Skillful Twelve Hour Precipitation Forecasts Using Large Context Neural Networks”, we push the forecasting boundaries of our neural precipitation model to 12 hour predictions while keeping a spatial resolution of 1 km and a time resolution of 2 minutes. By quadrupling the input context, adopting a richer weather input state, and extending the architecture to capture longer-range spatial dependencies, MetNet-2 substantially improves on the performance of its predecessor, MetNet. Compared to physics-based models, MetNet-2 outperforms the state-of-the-art HREF ensemble model for weather forecasts up to 12 hours ahead.
MetNet-2 Features and Architecture
Neural weather models like MetNet-2 map observations of the Earth to the probability of weather events, such as the likelihood of rain over a city in the afternoon, of wind gusts reaching 20 knots, or of a sunny day ahead. End-to-end deep learning has the potential to both streamline the forecasting pipeline and improve forecast quality by directly connecting a system’s inputs and outputs. With this in mind, MetNet-2 aims to minimize both the complexity and the total number of steps involved in creating a forecast.
The inputs to MetNet-2 include the radar and satellite images also used in MetNet. To capture a more comprehensive snapshot of the atmosphere with information such as temperature, humidity, and wind direction — critical for longer forecasts of up to 12 hours — MetNet-2 also uses the pre-processed starting state used in physical models as a proxy for this additional weather information. The radar-based measures of precipitation (MRMS) serve as the ground truth (i.e., what we are trying to predict) that we use in training to optimize MetNet-2’s parameters.
Example ground truth image: Instantaneous precipitation (mm/hr) based on radar (MRMS) capturing a 12-hour-long progression.
MetNet-2’s probabilistic forecasts can be viewed as averaging all possible future weather conditions weighted by how likely they are. Due to its probabilistic nature, MetNet-2 can be likened to physics-based ensemble models, which average some number of future weather conditions predicted by a variety of physics-based models. One notable difference between these two approaches is the duration of the core part of the computation: ensemble models take ~1 hour, whereas MetNet-2 takes ~1 second.
Steps in a MetNet-2 forecast and in a physics-based ensemble.
One of the main challenges that MetNet-2 must overcome to make 12-hour forecasts is capturing a sufficient amount of spatial context in the input images. For each additional forecast hour we include 64 km of context in every direction at the input. This results in an input context of 2048 km × 2048 km — four times the area used in MetNet. In order to process such a large context, MetNet-2 employs model parallelism, whereby the model is distributed across the 128 cores of a Cloud TPU v3-128. Due to the size of the input context, MetNet-2 replaces the attentional layers of MetNet with computationally more efficient convolutional layers. But standard convolutional layers have local receptive fields that may fail to capture large spatial contexts, so MetNet-2 uses dilated receptive fields, whose size doubles layer after layer, in order to connect points in the input that are far apart from one another.
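To make the effect of doubling dilation rates concrete, the short sketch below (illustrative only, not MetNet-2 code) computes how far the receptive field of a stack of 3x3 convolutions reaches with and without exponentially growing dilation:

# Illustration: receptive-field growth of stacked 3x3 convolutions when the
# dilation rate doubles at every layer, compared with undilated convolutions.
def receptive_field(num_layers: int, kernel: int = 3, double_dilation: bool = True) -> int:
    rf = 1
    for layer in range(num_layers):
        dilation = 2 ** layer if double_dilation else 1
        rf += (kernel - 1) * dilation  # each layer extends the field by (kernel - 1) * dilation
    return rf

for layers in (4, 8, 12):
    print(layers, receptive_field(layers, double_dilation=False), receptive_field(layers))
# 12 dilated layers already span 8191 pixels across, versus 25 pixels without dilation.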
Example of input spatial context and target area for MetNet-2.
Results
Because MetNet-2’s predictions are probabilistic, the model’s output is naturally compared with the output of similarly probabilistic ensemble or post-processing models. HREF is one such state-of-the-art ensemble model for precipitation in the United States, which aggregates ten predictions from five different models, twice a day. We evaluate the forecasts using established metrics, such as the Continuous Ranked Probability Score, which captures the magnitude of the probabilistic error of a model’s forecasts relative to the ground truth observations. Despite not performing any physics-based calculations, MetNet-2 is able to outperform HREF up to 12 hours into the future for both low and high levels of precipitation.
Continuous Ranked Probability Score (CRPS; lower is better) for MetNet-2 vs. HREF, aggregated over a large number of test patches randomly located in the Continental United States.
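For intuition about the metric itself, the following sketch (a generic illustration, not the evaluation code used in the paper) estimates CRPS for an ensemble forecast of a single scalar observation using the identity CRPS = E|X - y| - 0.5 * E|X - X'|:

import numpy as np

def crps_ensemble(members, observation: float) -> float:
    """Sample-based CRPS estimate for one scalar observation and an ensemble forecast."""
    members = np.asarray(members, dtype=float)
    term1 = np.abs(members - observation).mean()          # E|X - y|
    term2 = 0.5 * np.abs(members[:, None] - members[None, :]).mean()  # 0.5 * E|X - X'|
    return float(term1 - term2)

# Example: a 5-member precipitation forecast (mm/hr) against an observed 1.2 mm/hr
print(crps_ensemble([0.0, 0.5, 1.0, 1.5, 2.0], 1.2))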
Examples of Forecasts
The following figures provide a selection of forecasts from MetNet-2 compared with the physics-based ensemble HREF and the ground truth MRMS.
Comparison of 0.2 mm/hr precipitation on March 30, 2020 over Denver, Colorado. Left: Ground truth, source MRMS. Center: Probability map as predicted by MetNet-2. Right: Probability map as predicted by HREF. MetNet-2 is able to predict the onset of the storm (called convective initiation) earlier in the forecast than HREF, as well as the storm’s starting location, whereas HREF misses the initiation location but captures its growth phase well.
Interpreting What MetNet-2 Learns About Weather
Because MetNet-2 does not use hand-crafted physical equations, its performance inspires a natural question: What kind of physical relations about the weather does it learn from the data during training? Using advanced interpretability tools, we further trace the impact of various input features on MetNet-2’s performance at different forecast timelines. Perhaps the most surprising finding is that MetNet-2 appears to emulate the physics described by Quasi-Geostrophic Theory, which is used as an effective approximation of large-scale weather phenomena. MetNet-2 was able to pick up on changes in the atmospheric forces, at the scale of a typical high- or low-pressure system (i.e., the synoptic scale), that bring about favorable conditions for precipitation, a key tenet of the theory.
Conclusion
MetNet-2 represents a step toward enabling a new modeling paradigm for weather forecasting that does not rely on hand-coding the physics of weather phenomena, but rather embraces end-to-end learning from observations to weather targets and parallel forecasting on low-precision hardware. Yet many challenges remain on the path to fully achieving this goal, including incorporating more raw data about the atmosphere directly (rather than using the pre-processed starting state from physical models), broadening the set of weather phenomena, increasing the lead time horizon to days and weeks, and widening the geographic coverage beyond the United States.
Acknowledgements
Shreya Agrawal, Casper Sønderby, Manoj Kumar, Jonathan Heek, Carla Bromberg, Cenk Gazen, Jason Hickey, Aaron Bell, Marcin Andrychowicz, Amy McGovern, Rob Carver, Stephan Hoyer, Zack Ontiveros, Lak Lakshmanan, David McPeek, Ian Gonzalez, Claudio Martella, Samier Merchant, Fred Zyda, Daniel Furrer and Tom Small.
World’s Fastest Supercomputers Changing Fast
Modern computing workloads — including scientific simulations, visualization, data analytics, and machine learning — are pushing supercomputing centers, cloud providers and enterprises to rethink their computing architecture.
The processor or the network or the software optimizations alone can’t address the latest needs of researchers, engineers and data scientists. Instead, the data center is the new unit of computing, and organizations have to look at the full technology stack.
The latest rankings of the world’s most powerful systems show continued momentum for this full-stack approach in the latest generation of supercomputers.
NVIDIA technologies accelerate over 70 percent, or 355, of the systems on the TOP500 list released at the SC21 high performance computing conference this week, including over 90 percent of all new systems. That’s up from 342 systems, or 68 percent, of the machines on the TOP500 list released in June.
NVIDIA also continues to have a strong presence on the Green500 list of the most energy-efficient systems, powering 23 of the top 25 systems on the list, unchanged from June. On average, NVIDIA GPU-powered systems deliver 3.5x higher power efficiency than non-GPU systems on the list.
Highlighting the emergence of a new generation of cloud-native systems, Microsoft’s GPU-accelerated Azure supercomputer ranked 10th on the list, the first top 10 showing for a cloud-based system.
AI is revolutionizing scientific computing. The number of research papers leveraging HPC and machine learning has skyrocketed in recent years, growing from roughly 600 ML + HPC papers submitted in 2018 to nearly 5,000 in 2020.
The ongoing convergence of HPC and AI workloads is also underscored by new benchmarks such as HPL-AI and MLPerf HPC.
HPL-AI is an emerging benchmark of converged HPC and AI workloads that uses mixed-precision math — the basis of deep learning and many scientific and commercial jobs — while still delivering the full accuracy of double-precision math, which is the standard measuring stick for traditional HPC benchmarks.
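The idea can be sketched in a few lines of NumPy (a simplified illustration, with float32 standing in for the FP16/TF32 arithmetic used on GPUs, not the actual HPL-AI benchmark code):

import numpy as np

# Sketch of the mixed-precision idea: factor/solve in low precision, then use
# double-precision iterative refinement to recover FP64-level accuracy.
rng = np.random.default_rng(0)
n = 512
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
b = rng.standard_normal(n)

A32 = A.astype(np.float32)
x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)  # low-precision solve
for _ in range(3):                                  # refinement steps in double precision
    r = b - A @ x                                   # residual computed in FP64
    x += np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)

print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))  # relative residual shrinks toward FP64 accuracy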
And MLPerf HPC addresses a style of computing that speeds and augments simulations on supercomputers with AI, with the benchmark measuring performance on three key workloads for HPC centers: astrophysics (Cosmoflow), weather (Deepcam) and molecular dynamics (Opencatalyst).
NVIDIA addresses the full stack with GPU-accelerated processing, smart networking, GPU-optimized applications, and libraries that support the convergence of AI and HPC. This approach has supercharged workloads and enabled scientific breakthroughs.
Let’s look more closely at how NVIDIA is supercharging supercomputers.
Accelerated Computing
The combined power of the GPU’s parallel processing capabilities and over 2,500 GPU-optimized applications allows users to speed up their HPC jobs, in many cases from weeks to hours.
We’re constantly optimizing the CUDA-X libraries and the GPU-accelerated applications, so it’s not unusual for users to see an x-factor performance gain on the same GPU architecture.
As a result, the performance of the most widely used scientific applications — which we call the “golden suite” — has improved 16x over the past six years, with more advances on the way.
And to help users quickly take advantage of higher performance, we offer the latest versions of the AI and HPC software through containers from the NGC catalog. Users simply pull and run the application on their supercomputer, in the data center or the cloud.
Convergence of HPC and AI
The infusion of AI in HPC helps researchers speed up their simulations while achieving the accuracy they’d get with the traditional simulation approach.
That’s why an increasing number of researchers are taking advantage of AI to speed up their discoveries.
That includes four of the finalists for this year’s Gordon Bell prize, the most prestigious award in supercomputing. Organizations are racing to build exascale AI computers to support this new model, which combines HPC and AI.
That strength is underscored by relatively new benchmarks, such as HPL-AI and MLPerf HPC, highlighting the ongoing convergence of HPC and AI workloads.
To fuel this trend, last week NVIDIA announced a broad range of advanced new libraries and software development kits for HPC.
Graphs — a key data structure in modern data science — can now be projected into deep-neural network frameworks with Deep Graph Library, or DGL, a new Python package.
NVIDIA Modulus builds and trains physics-informed machine learning models that can learn and obey the laws of physics.
And NVIDIA introduced three new libraries:
- ReOpt – to increase operational efficiency for the $10 trillion logistics industry.
- cuQuantum – to accelerate quantum computing research.
- cuNumeric – to accelerate NumPy for scientists, data scientists, and machine learning and AI researchers in the Python community (see the sketch after this list).
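For cuNumeric, the advertised usage is a drop-in swap of the NumPy import; a minimal sketch, assuming the cunumeric package is installed in your environment:

# Hypothetical usage sketch: swap the NumPy import and keep the array code unchanged.
import cunumeric as np  # instead of: import numpy as np

x = np.random.rand(1_000_000)
y = np.random.rand(1_000_000)
print(np.dot(x, y))  # intended to run on available GPUs without further code changes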
Weaving it all together is NVIDIA Omniverse — the company’s virtual world simulation and collaboration platform for 3D workflows.
Omniverse is used to simulate digital twins of warehouses, plants and factories, of physical and biological systems, of the 5G edge, robots, self-driving cars and even avatars.
Using Omniverse, NVIDIA announced last week that it will build a supercomputer, called Earth-2, devoted to predicting climate change by creating a digital twin of the planet.
Cloud-Native Supercomputing
As supercomputers take on more workloads across data analytics, AI, simulation and visualization, CPUs are stretched to support a growing number of communication tasks needed to operate large and complex systems.
Data processing units alleviate this stress by offloading some of these processes.
As a fully integrated data-center-on-a-chip platform, NVIDIA BlueField DPUs can offload and manage data center infrastructure tasks instead of making the host processor do the work, enabling stronger security and more efficient orchestration of the supercomputer.
Combined with the NVIDIA Quantum InfiniBand platform, this architecture delivers optimal bare-metal performance while natively supporting multinode tenant isolation.
Thanks to a zero-trust approach, these new systems are also more secure.
BlueField DPUs isolate applications from infrastructure. NVIDIA DOCA 1.2 — the latest BlueField software platform — enables next-generation distributed firewalls and wider use of line-rate data encryption. And NVIDIA Morpheus, assuming an interloper is already inside the data center, uses deep learning-powered data science to detect intruder activities in real time.
And all of the trends outlined above will be accelerated by new networking technology.
NVIDIA Quantum-2, also announced last week, is a 400Gbps InfiniBand platform and consists of the Quantum-2 switch, the ConnectX-7 NIC, the BlueField-3 DPU, as well as new software for the new networking architecture.
NVIDIA Quantum-2 offers the benefits of bare-metal high performance and secure multi-tenancy, allowing the next generation of supercomputers to be secure, cloud-native and better utilized.
** Benchmark applications: Amber, Chroma, GROMACS, MILC, NAMD, PyTorch, Quantum Espresso; Random Forest FP32, TensorFlow, VASP | GPU node: dual-socket CPUs with 4x P100, V100, or A100 GPUs.
Announcing conversational AI partner solutions
We are excited to announce the availability of AWS conversational AI partner solutions, which enable enterprises to implement high-quality, highly effective chatbot, virtual assistant, and Interactive Voice Response (IVR) solutions through the domain expertise of AWS Partners and AWS AI and machine learning (ML) services.
The partners highlighted in this announcement include Cation Consulting, Deloitte, NLX, Quantiphi, ServisBOT, TensorIoT, and XAPP AI. These partners use Amazon Kendra, Amazon Lex, and Amazon Polly, as well as additional AWS AI/ML services, in their solutions.
Overview of conversational AI
The demand for conversational AI (CAI) interfaces continues to grow as users prefer to interact with businesses on digital channels. Organizations of all sizes are developing chatbots, voice assistants, and IVR solutions to increase user satisfaction, reduce operational costs, and streamline business processes. COVID-19 has further accelerated the adoption, due to social distancing rules and shelter-in-place orders.
Conversational AI interfaces, like chatbots, voice assistants, and IVR, are used broadly across a wide variety of industry segments and use cases. CAI solutions go beyond customer service to additional use cases across industries — for example, triaging medical symptoms, booking an appointment, transferring money, or signing up for a new account.
Building a high-quality, highly effective CAI interface can be challenging for some, given the free-form nature of communications—users can say or write whatever they like in many different ways. Implementing a CAI solution involves selecting use cases, defining Intents and sample utterances, designing conversational flows, integrating backend services, and testing, monitoring, and measuring in an iterative approach.
We are moving past the days of simple question and answer services. Chatbots have advanced beyond simple decision trees to incorporate sophisticated natural language understanding (NLU) that not only understands a user’s Intent, but enables the chatbot to respond appropriately in a way that satisfies the user. We are seeing further advancements in CAI to cover true multimodal experiences, wherein users can interact via voice and text simultaneously.
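As a concrete illustration of the building blocks these partners work with, the sketch below sends a free-form utterance to an Amazon Lex V2 bot through boto3 and reads back the intent the NLU resolved; the bot ID, alias ID, and session ID are placeholders, not values from any real deployment:

import boto3

# Placeholders: substitute the IDs of a bot you have built in Amazon Lex V2.
lex = boto3.client("lexv2-runtime")
response = lex.recognize_text(
    botId="EXAMPLEBOTID",
    botAliasId="EXAMPLEALIAS",
    localeId="en_US",
    sessionId="demo-user-1",
    text="I'd like to book an appointment for next Tuesday",
)

# The resolved intent and any slot values drive the next step in the conversation flow.
interpretation = response["interpretations"][0]
print(interpretation["intent"]["name"], interpretation["intent"].get("slots"))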
Fortunately, there is help on this journey. The seven AWS Partners for CAI listed in this post have expertise in launching highly effective chatbot and voice experiences, built on top of AWS AI/ML services.
AWS conversational AI partner solutions
With AWS conversational AI partner solutions, enterprises can use the expertise of AWS Partners, using AWS AI/ML services, to build high-quality, highly effective chatbot and voice experiences. This enables you to increase user satisfaction, reduce operational costs, and achieve business goals—all while speeding up the time to market.
The following highlights our initial AWS Partners for conversational AI.
Cation Consulting, the maker of the Parly.ai platform, uses natural language conversational AI to build multilingual, multi-channel solutions. Cation enables high-value customer interactions, at a lower cost, through enterprise chatbots and live chat with AI-powered agent-assist capabilities. Cation Consulting helped Ryanair build a chatbot that improves its customer support experience, helping customers find answers to their questions quickly and easily. They recently helped 123.ie implement a highly advanced chatbot for onboarding new accounts that incorporates image recognition and document processing to speed up the process by analyzing a customer’s driver’s license and previous policy documentation.
Deloitte, an AWS Premier Consulting Partner, is one of the leading service providers in the design, delivery, implementation, and scalability of high ROI conversational experiences across industries. Deloitte’s global CAI experts deliver solutions that are highly personalized, context-aware, and designed to serve users any time, any place, and on any device. Deloitte has implemented CAI solutions for Fortune 500 enterprises across the financial services, hospitality, telecommunications, pharmaceutical, healthcare, and utility spaces.
Conversations by NLX enables companies to transform customer contact into personalized customer self-service. The platform enables non-technical users to build and manage chat, voice, and multimodal conversational experiences, helping brands track and elevate customer self-service into a strategic asset. NLX customers include a global drink manufacturer, a leading international airline, and more.
Quantiphi uses in-house accelerators built on top of AWS AI/ML services to implement conversational AI solutions. Quantiphi utilizes its expertise in CAI to design and implement complex conversation flows, derive insights, and develop dashboards showcasing impactful key performance indicators. Quantiphi has implemented CAI solutions for a major healthcare provider, a national utilities firm, and more.
ServisBOT’s conversational AI platform enables businesses to build and manage self-service chatbots and voice assistants, faster and easier. ServisBOT provides tools for building and optimizing advanced solutions, including covering multi-bot environments, security, backend integrations, and analytics. The platform also offers low-code tooling, blueprints, and reusable components for business users. ServisBOT’s customers include a global financial services corporation, a global insurance firm, a government agency, and more.
TensorIoT delivers industry-leading conversational AI solutions, including Alexa apps, Amazon Lex chatbots, and voice applications on the edge and in the cloud. Whether users interact via phone, web, or SMS, TensorIoT has expertise developing end-to-end solutions. TensorIoT helped Citibot develop its CAI platform for government use cases. Additional customers include a global credit reporting company, a national music retailer, and more.
XAPP AI’s Optimal Conversation Studio empowers enterprises to create intelligent virtual assistants for voice and chat channels. OC Studio provides model, dialog and content management, regression testing, and human-in-the-loop workflows to enable continuous CAI transformation, and thereby a more efficient and effective conversational experience. XAPP AI’s Optimal Conversation Studio platform powers over 1,200 conversational AI solutions for over 100 customers across consumer packaged goods, automotive, insurance, media, retail, education, nonprofit, and government spaces.
Getting Started
Our AWS Partners for conversational AI help make AWS one of the best places for all your CAI workloads. Learn more about conversational AI use cases at AWS or explore our AWS conversational AI partners for more information. Contact your AWS account manager for additional information or questions, including how to become an AWS Partner.
About the Authors
Arte Merritt leads partnerships for Contact Center Intelligence and Conversational AI. He is a frequent author and speaker in the conversational AI space. He was the co-founder and CEO of the leading analytics platform for conversational interfaces, leading the company to 20,000 customers, 90B messages, and multiple acquisition offers. Previously he founded Motally, a mobile analytics platform he sold to Nokia. Arte has more than 20 years experience in big data analytics. Arte is an MIT alum.
Design a compelling record filtering method with Amazon SageMaker Model Monitor
As artificial intelligence (AI) and machine learning (ML) technologies continue to proliferate, using ML models plays a crucial role in converting the insights from data into actual business impacts. Operational ML means streamlining every step of the ML lifecycle and deploying the best models within the existing production system. And within that production system, the models may interact with various processes, such as testing, performance tuning of IT resources, and monitoring strategy and operations.
One common pitfall is a lack of model performance monitoring and proper model retraining and updating, which could adversely affect business. Nearly continuous model monitoring can provide information on how the model is performing in production. The monitoring outputs are used to identify the problems proactively and take corrective actions, such as model retraining and updating, to help stabilize the model in production. However, in a real-world production setting, multiple personas may interact with the model, including real users, engineers who are troubleshooting production issues, or bots conducting performance tests. When inference requests are made for testing purposes at the production endpoint, it may cause false positive detection of violations for the model monitor. To avoid this, we must filter out the test records from the calculation of model monitoring metrics.
Amazon SageMaker is a fully managed service that enables developers and data scientists to build, train, and deploy ML models quickly and easily at any scale. After you train an ML model, you can deploy it on SageMaker endpoints that are fully managed and serve inferences in real time with low latency. After you deploy your model, you can use Amazon SageMaker Model Monitor to monitor your ML model’s quality continuously in real time. You can also configure alerts to notify and initiate actions if any drift in model performance is observed. Early detection of these deviations enables you to take corrective actions, such as collecting new training data, retraining models, and auditing upstream systems without manually monitoring models or building additional tooling.
In this post, we present how to build a record filtering method based on sets of business criteria as part of the preprocessing step in Model Monitor. The goal is to ensure that only the actual production records are sent to Model Monitor for analysis, reflecting the actual usage of the production endpoint.
Solution overview
The following diagram illustrates the high-level workflow of record filtering using a preprocessor script with Model Monitor.
The workflow includes the following steps:
- The Model Artifact Amazon Simple Storage Service (Amazon S3) bucket contains model.tar.gz, the XGBoost churn prediction model pretrained on the publicly available dataset mentioned in Discovering Knowledge in Data by Daniel T. Larose. For more information about how this model artifact was trained offline, see the Customer Churn Prediction with XGBoost notebook example on GitHub.
- The model is deployed to an inference endpoint with data capture enabled.
- Different personas send model prediction request traffic to the endpoint.
- The Data Capture bucket stores capture data from requests and responses.
- The Validation Dataset bucket contains the validation dataset required to create a baseline from a validation dataset in Model Monitor.
- The Baselining bucket stores the output files for dataset statistics and constraints from Model Monitor’s baselining job.
- The Code bucket contains a custom preprocessor script for Model Monitor.
- Model Monitor data quality initializes a monitoring job.
- The Results bucket contains outputs of the monitoring job, including statistics, constraints, and a violations report.
Prerequisites
To implement this solution, you must have the following prerequisites:
- Python 3.7 or greater
- Amazon SageMaker Studio
- Amazon SageMaker Python 3 (Data Science) kernels
- AmazonSageMakerFullAccess policy (you can further restrict this to least privileges based on your use case)
Set up the environment
To set up your environment, complete the following steps:
- Launch Studio from the AWS Management Console.
If you haven’t created Studio in your account yet, you can manually create one by following Onboard to Amazon SageMaker Studio Using Quick Start. Alternatively, you can use an AWS CloudFormation template (see Creating Amazon SageMaker Studio domains and user profiles using AWS CloudFormation), which automates the creation of Studio in your account.
- On the File menu, choose Terminal to launch a new terminal within Studio.
- Clone the GitHub repo in the terminal:
cd ~ && git clone https://github.com/aws-samples/amazon-sagemaker-data-quality-monitor-custom-preprocessing.git
- Navigate to the directory amazon-sagemaker-data-quality-monitor-custom-preprocessing in Studio.
- Open Data_Quality_Custom_Preprocess_Churn.ipynb.
- Select the Data Science kernel and ml.t3.medium as the instance type to host the notebook to get started.
The rest of this post dives into a notebook with the various steps involved in designing and testing filtering records using a preprocessor with Model Monitor. We use a pretrained and deployed XGBoost churn prediction model. For detailed notebooks on other Model Monitor capabilities, see the model quality explainability notebook examples on GitHub. Beyond the steps discussed in this post, there are other steps necessary to import libraries and set up AWS Identity and Access Management (IAM) permissions. You can start with the README, which has a more detailed explanation of each step. Alternatively, you can go directly to the code and walk through with the notebook that you cloned in Studio.
Deploy the pretrained XGBoost model with script mode
First, we upload the pretrained model artifacts to Amazon S3 for deployment:
model_path = 'model'
model_filename = 'model.tar.gz'
model_upload_uri = f's3://{bucket}/{prefix}/{model_path}'
local_model_path = f"./model/{model_filename}"
print(f"model s3 location: {model_upload_uri} \n")

if is_upload_model:
    S3Uploader.upload(
        local_path=local_model_path,
        desired_s3_uri=model_upload_uri
    )
else:
    print("skip")
Because the model was trained offline using XGBoost, we use XGBoostModel from the SageMaker SDK to deploy the model. We provide the inference entry point in the source directory because we have a custom input parser for JSON requests. We also need to ensure that Flask Response is returned to match both input and output content types exactly. It is a necessary step for Model Monitor to work for the image running Gunicorn/Flask. The content type of output data captured by Model Monitor, which only works with CSV or JSON, is Base64 by default unless Response() explicitly converts it to a specific type. The following are the custom input_fn and output_fn. Currently, the implementation is for a single JSON record, but you can easily extend it to multiple records for batch processing.
# Imports these handlers rely on
import json

from flask import Response
from sagemaker_xgboost_container import encoder as xgb_encoders


def input_fn(request_body, request_content_type):
    if request_content_type == "text/libsvm":
        return xgb_encoders.libsvm_to_dmatrix(request_body)
    elif request_content_type == "text/csv":
        return xgb_encoders.csv_to_dmatrix(request_body.rstrip("\n"))
    elif request_content_type == "application/json":
        request = json.loads(request_body)
        feature = ",".join(request.values())
        return xgb_encoders.csv_to_dmatrix(feature.rstrip("\n"))
    else:
        raise ValueError("Content type {} is not supported.".format(request_content_type))


def output_fn(predictions, content_type):
    if content_type == "text/csv":
        result = ",".join(str(x) for x in predictions)
        return Response(result, mimetype=content_type)
    elif content_type == "application/json":
        result = json.dumps(predictions.tolist())
        return Response(result, mimetype=content_type)
    else:
        raise ValueError("Content type {} is not supported.".format(content_type))
To enable data capture for monitoring the model data quality, you can specify options such as enable_capture, sampling_percentage, and destination_s3_uri in the DataCaptureConfig object when deploying to an endpoint. For example, unless you expect your endpoint to have high traffic or require a down-sample, you can capture all incoming records by providing 100% in sampling percentage. More information on DataCaptureConfig can be found in the Model Monitor documentation. In the following code, we specify the SageMaker XGBoost model framework version and provide a path for the entry inference script that we reviewed previously:
if is_create_new_ep:
    ## Configure the data capture
    data_capture_config = DataCaptureConfig(
        enable_capture=True,
        sampling_percentage=100,
        destination_s3_uri=s3_capture_upload_path
    )
    current_endpoint_name = f'{ep_prefix}-{datetime.now():%Y-%m-%d-%H-%M}'
    print(f"Create an Endpoint: {current_endpoint_name}")
    xgb_inference_model = XGBoostModel(
        model_data=f'{model_upload_uri}/{model_filename}',
        role=role,
        entry_point="./src/inference.py",
        framework_version="1.2-1")
    predictor = xgb_inference_model.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.2xlarge",
        endpoint_name=current_endpoint_name,
        data_capture_config=data_capture_config,
        tags=tags,
        wait=True)
elif not current_endpoint_name:
    current_endpoint_name = all_demo_eps[0]
    print(f"Use existing endpoint: {current_endpoint_name}")
else:
    print(f"Use selected endpoint: {current_endpoint_name}")
After we confirm that the model has been deployed, we can move on to the next step to review the implementation of the filtering mechanism in the preprocessing script for Model Monitor.
Implement a filtering mechanism in the preprocessor script
As previously discussed, we want to exclude test inference records from downstream monitoring reports. You can implement a rule-based filtering mechanism by parsing metadata provided in CustomAttributes in a request header. The following code illustrates how to send custom attributes as key-value pairs using the Boto3 SageMaker Runtime client:
response = runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    Body=json.dumps(payload),
    CustomAttributes=json.dumps({
        "testIndicator": testIndicator,
        "applicationName": "DEMO",
        "transactionId": transactionId}))
We recommend using CustomAttributes to send the required metadata for simplicity. You can optionally choose to include metadata as part of inference records as long as your entry point inference reflects the change and extraction of input features in input records doesn’t break. Next, we review a provided preprocessor script that contains a filtering mechanism.
As illustrated in the following code, we extend the built-in mechanisms of Model Monitor by providing a custom preprocessor function. First, we extract testIndicator information from custom attributes and use this information to set the is_test variable to either True, when it’s a test record, or False otherwise. If we want to skip test records without breaking a monitor job, we can return [] to indicate that the object is an empty set of rows. Note that returning {} results in an error because it’s considered to be an object having an empty row, which SageMaker doesn’t expect.
Moreover, we convert the probability of the model output into an integer type for non-test records. This step ensures that the data type is consistent with that of the ground truth label in the validation dataset. We demonstrate in the following sections how this step can help you avoid false positive violations in monitoring. Model quality monitoring has its native way of handling the conversion, but this workaround is necessary for data quality monitoring.
Next, we insert the output as the first item into input features, ensuring that the columns’ number and order match exactly with the validation dataset. Although monitoring model output may seem unnecessary for data quality monitoring, we recommend not skipping this step because other types of monitoring may depend on that information to be provided. Finally, the function returns a key-value pair with zero-padded index numbers and corresponding output and input features. This is done to avoid any misalignment of input features caused by sorting of column names by Spark processing. Note that 20 is a magic number because 10**20 is large enough to cover numbers of feature columns in most cases.
Finally, SageMaker applies preprocessing for each row and aggregates the results on your behalf. If you have multiple inference records in a single inference request, like a mini-batch, you need to consider it in your code beyond the sample code we provide. At the time of writing this post, the preprocessing step in Model Monitor doesn’t publish any logs to Amazon CloudWatch, although this may change in the future. If you need to debug your custom preprocessing script, you may want to write and save your logs inside the container under the directory /opt/ml/processing/output/ so that you can access them later in your S3 bucket.
def preprocess_handler(inference_record):
    input_enc_type = inference_record.endpoint_input.encoding
    input_data = inference_record.endpoint_input.data.rstrip("\n")
    output_data = get_class_val(inference_record.endpoint_output.data.rstrip("\n"))
    event_metadata = inference_record.event_metadata
    custom_attribute = json.loads(event_metadata.custom_attribute[0]) if event_metadata.custom_attribute is not None else None
    is_test = eval_test_indicator(custom_attribute) if custom_attribute is not None else True

    if is_test:
        return []
    elif input_enc_type == "CSV":
        outputs = output_data + ',' + input_data
        return {str(i).zfill(20): d for i, d in enumerate(outputs.split(","))}
    elif input_enc_type == "JSON":
        outputs = {**{LABEL: output_data}, **json.loads(input_data)}
        write_to_file(str(outputs), "log")
        return {str(i).zfill(20): outputs[d] for i, d in enumerate(outputs)}
    else:
        raise ValueError(f"encoding type {input_enc_type} is not supported")
Now that we have reviewed how the preprocessing mechanism is implemented, we upload the script to the Amazon S3 location using the following code:
preprocessor_filename = 'preprocessor.py'
local_path_preprocessor = f"src/{preprocessor_filename}"
s3_record_preprocessor_uri = f's3://{bucket}/{prefix}/code'

if is_upload_preprocess_script:
    S3Uploader.upload(
        local_path=local_path_preprocessor,
        desired_s3_uri=s3_record_preprocessor_uri)
else:
    print("skip")
We can now move on to the next step: creating a monitor schedule.
Create a Model Monitor schedule (data quality only)
Continuous model monitoring involves scheduled analysis of incoming inference records and the creation of metrics relative to baseline metrics. The SageMaker SDK simplifies generating a set of baseline constraints and summary statistics as a reference. We upload the validation dataset that was used for offline training, with a column header and ground truth label, to Amazon S3 to serve as a suitable baseline dataset. Decisions around whether to include a ground truth label in the baseline dataset depend on your use case and preference, because a data quality monitor certainly works without ground truth label data. Note that if you exclude ground truth here, you need to exclude inferences from monitoring similarly.
validation_filename = 'validation-dataset-with-header.csv'
local_validation_data_path = f"data/{validation_filename}"
s3_validation_data_uri = f's3://{bucket}/{prefix}/baselining'

if is_upload_validation_data:
    S3Uploader.upload(
        local_path=local_validation_data_path,
        desired_s3_uri=s3_validation_data_uri
    )
else:
    print("skip")
After confirming that the baseline dataset is uploaded to Amazon S3, we create baseline constraints, statistics, and a Model Monitor schedule for the deployed endpoint in one step using a custom wrapper class, DemoDataQualityModelMonitor. Under the hood, the DefaultModelMonitor.suggest_baseline method initiates a processing job with a managed Model Monitor container with Apache Spark and the AWS Deequ library to generate the constraints and statistics as a baseline. After the baselining job is complete, the DefaultModelMonitor.create_monitoring_schedule method creates a monitor schedule.
demo_mon = DemoDataQualityModelMonitor(
    endpoint_name=current_endpoint_name,
    bucket=bucket,
    projectfolder_prefix=prefix,
    training_dataset_path=f'{s3_validation_data_uri}/{validation_filename}',
    record_preprocessor_script=f'{s3_record_preprocessor_uri}/{preprocessor_filename}',
    post_analytics_processor_script=None,
    kms_key=None,
    subnets=None,
    security_group_ids=None,
    role=role,
    tags=tags)

my_monitor = demo_mon.create_data_quality_monitor()
After monitor schedule creation is complete, we can move on to the final step, which is functional testing of the implemented filter with artificial payloads.
Test scenarios
We can test the following two scenarios to confirm that the filtering is working as expected. The first scheduled monitor run isn’t initialized until at least an hour after creating the schedule, so you can either wait or manually start a monitoring job using preprocessing. We use the latter approach for convenience. Fortunately, a utility tool already exists for this purpose and is available in this GitHub repo. We also provide a wrapper method, ArtificialTraffic.generate_artificial_traffic. You can pass column names and predefined static methods to populate bogus inputs and monotonically increase transactionId each time the endpoint is invoked.
First scenario
Our first test scenario includes the following steps:
- Send a record that we know won’t create any violations. To do this, you can use the generate_artificial_traffic method and set the config variable to an empty list. Also, set the testIndicator in custom attributes to 'false' to indicate that it’s not a test record. This is illustrated in the following code:
artificial_traffic = ArtificialTraffic(
    endpointName=current_endpoint_name
)

# Normal payload - it should not cause any violations
artificial_traffic.generate_artificial_traffic(
    applicationName="DEMO",
    testIndicator="false",
    payload=payload,
    size=1,
    config=[])
- Send another record that creates a violation. This time, we pass a set of dictionaries in the config variable to create bogus input features. We also set testIndicator to 'true' to skip this record in the analysis. The following code is provided:
sample_config = {'config': [
    {'source': 'Day Calls',
     'function_name': 'random_gaussian',
     'params': [100, 100]},
    {'source': 'Day Mins',
     'function_name': 'random_gaussian',
     'params': [100, 100]},
    {'source': 'Account Length',
     'function_name': 'random_int',
     'params': [0, 1000]},
    {'source': 'VMail Message',
     'function_name': 'random_int',
     'params': [0, 10000]},
    {'source': 'State_AK',
     'function_name': 'random_bit',
     'params': []}]}

## This would cause violations, but testIndicator is set to true, so the analysis is skipped and no violations are reported
artificial_traffic.generate_artificial_traffic(
    applicationName="DEMO",
    testIndicator="true",
    payload=payload,
    size=1,
    config=sample_config['config'])
- Manually start a monitor job using the run_model_monitor_job_processor method from the imported utility class and provide parameters such as the Amazon S3 locations for baseline files, data capture, and the preprocessor script:
run_model_monitor_job_processor(
    region,
    'ml.m5.xlarge',
    role,
    data_capture_path_scenario_1,
    s3_statistics_uri,
    s3_constraints_uri,
    s3_reports_path + '/scenario_1',
    preprocessor_path=s3_record_preprocessor_uri)
- In the Model Monitor outputs, confirm that constraint_violations.json shows an empty violations list and that dataset: item_count in statistics.json shows 1, instead of 2.
This confirms that Model Monitor has analyzed only the non-test record.
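If you prefer to check those outputs programmatically instead of browsing the S3 console, something like the following works; the report prefix is an assumption based on the output path passed to the manual job above:

import json

from sagemaker.s3 import S3Downloader

# Assumed location: the scenario_1 prefix passed to run_model_monitor_job_processor above.
violations = json.loads(
    S3Downloader.read_file(f"{s3_reports_path}/scenario_1/constraint_violations.json"))
statistics = json.loads(
    S3Downloader.read_file(f"{s3_reports_path}/scenario_1/statistics.json"))

print(violations["violations"])             # expected: [] for this scenario
print(statistics["dataset"]["item_count"])  # expected: 1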
Second scenario
For our second test, complete the following steps:
- Send N records that we know will create violations, such as data_type_check and baseline_drift_check. Set the testIndicator in custom attributes to "false". The following code illustrates this:
artificial_traffic.generate_artificial_traffic(
    applicationName="DEMO",
    testIndicator="false",
    payload=payload,
    size=1000,
    config=sample_config['config'])
- In the Model Monitor outputs, confirm that constraint_violations.json shows more than one violation item and that dataset: item_count in statistics.json shows more than 1,000. The extra item is a carry-over from the first test scenario.
This confirms that sending test records as inference records creates false positive violations if testIndicator isn’t set correctly.
Clean up
We can delete the Model Monitor schedule and endpoint we created earlier. You could wait until the first monitor schedule starts; the result should be similar to what we confirmed from testing. You could also experiment with other testing scenarios. When you’re done, run the following code to delete the monitoring schedule and endpoint:
my_monitor.delete_monitoring_schedule()
sm.delete_endpoint(EndpointName=current_endpoint_name)
Don’t forget to shut down resources by stopping running instances and apps to avoid incurring charges from SageMaker.
Conclusion
Model Monitor is a powerful tool that lets organizations quickly adopt continuous model monitoring and monitoring strategy for ML. This post discusses how you can use a preprocessing mechanism to design a filter for inference records based on sets of business criteria to ensure that your testing infrastructure doesn’t pollute production data. The notebook included in this post provides an example of a custom preprocessor script that you can extend for different use cases quickly.
To get started with Amazon SageMaker Model Monitor, check out the following resources:
- Visit the Amazon SageMaker service page to learn more.
- Please send us feedback, either on the AWS forum for Amazon SageMaker, or through your AWS support contacts.
- Find other Amazon SageMaker Model Monitor examples in our GitHub repository
About the Authors
Kenny Sato is a Data and Machine Learning Engineer at AWS Professional Services, guiding customers on architecting and implementing machine learning solutions. He received his master’s in Computer Engineering from Virginia Tech. In his spare time, you can find him in his backyard, or out somewhere playing with his lovely daughters.
Hemanth Boinpally is a Machine Learning Engineer at AWS Professional Services, guiding customers on building and architecting AI/ML solutions. He received his bachelor’s and master’s in Computer Science. In his spare time, you can find him listening to podcasts or playing sports.
David Nigenda is a Senior Software Development Engineer on the Amazon SageMaker team, currently working on improving production machine learning workflows, as well as launching new inference features. In his spare time, he tries to keep up with his kids.
3D Hand Pose with MediaPipe and TensorFlow.js
Posted by Valentin Bazarevsky, Ivan Grishchenko, Eduard Gabriel Bazavan, Andrei Zanfir, Mihai Zanfir, Jiuqiang Tang, Jason Mayes, Ahmed Sabie, Google
Today, we’re excited to share a new version of our model for hand pose detection, with improved accuracy for 2D, novel support for 3D, and the new ability to predict keypoints on both hands simultaneously. Support for multi-hand tracking was one of the most common requests from the developer community, and we’re pleased to support it in this release.
You can try a live demo of the new model here. This work improves on our previous model which predicted 21 keypoints, but could only detect a single hand at a time. In this article, we’ll describe the new model, and how you can get started.
The new hand pose detection model in action. |
How to use it
1. The first step is to import the library. You can either use the <script> tag in your html file or use NPM:
Through script tag:
<script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/hand-pose-detection">>/script>
<!-- Optional: Include below scripts if you want to use MediaPipe runtime. -->
<script src="https://cdn.jsdelivr.net/npm/@mediapipe/hands"> </script >
Through NPM:
yarn add @tensorflow-models/hand-pose-detection
# Run below commands if you want to use TF.js runtime.
yarn add @tensorflow/tfjs-core @tensorflow/tfjs-converter
yarn add @tensorflow/tfjs-backend-webgl
# Run below commands if you want to use MediaPipe runtime.
yarn add @mediapipe/hands
If installed through NPM, you need to import the libraries first:
import * as handPoseDetection from '@tensorflow-models/hand-pose-detection';
Next create an instance of the detector:
const model = handPoseDetection.SupportedModels.MediaPipeHands;
const detectorConfig = {
runtime: 'mediapipe', // or 'tfjs'
modelType: 'full'
};
detector = await handPoseDetection.createDetector(model, detectorConfig);
Choose a modelType that fits your application needs; there are two options to choose from: lite and full. From lite to full, the accuracy increases while the inference speed decreases.
2. Once you have a detector, you can pass in a video stream or static image to detect poses:
const video = document.getElementById('video');
const hands = await detector.estimateHands(video);
The output format is as follows: hands represents an array of detected hand predictions in the image frame. For each hand, the structure contains a prediction of the handedness (left or right) as well as a confidence score of this prediction. An array of 2D keypoints is also returned, where each keypoint contains x, y, and name. The x, y denote the horizontal and vertical position of the hand keypoint in the image pixel space, and name denotes the joint label. In addition to 2D keypoints, we also return 3D keypoints (x, y, z values) in a metric scale, with the origin at an auxiliary keypoint formed as the average of the first knuckles of the index, middle, ring, and pinky fingers.
[
  {
    score: 0.8,
    handedness: 'Right',
    keypoints: [
      {x: 105, y: 107, name: "wrist"},
      {x: 108, y: 160, name: "pinky_finger_tip"},
      ...
    ],
    keypoints3D: [
      {x: 0.00388, y: -0.0205, z: 0.0217, name: "wrist"},
      {x: -0.025138, y: -0.0255, z: -0.0051, name: "pinky_finger_tip"},
      ...
    ]
  }
]
You can refer to our README for more details about the API.
Model deep dive
The updated version of our hand pose detection API improves the quality for 2D keypoint prediction, handedness (classification output whether it is left or right hand), and minimizes the number of false positive detections. More details about the updated model can be found in our recent paper: On-device Real-time Hand Gesture Recognition.
Following our recently released BlazePose GHUM 3D in TensorFlow.js, we also added metric-scale 3D keypoint prediction to hand pose detection in this release, with the origin being represented by an auxiliary keypoint, formed as a mean of first knuckles for index, middle, ring and pinky fingers. Our 3D ground truth is based on a statistical 3D human body model called GHUM, which is built using a large corpus of human shapes and motions.
To obtain hand pose ground truth, we fitted the GHUM hand model to our existing 2D hand dataset and recovered real world 3D keypoint coordinates. The shape and the hand pose variables of the GHUM hand model were optimized such that the reconstructed model aligns with the image evidence. This includes 2D keypoint alignment, shape, and pose regularization terms as well as anthropometric joint angle limits and model self contact penalties.
Sample GHUM hand fittings for hand images with 2D keypoint annotations overlaid. The data was used to train and test a variety of poses leading to better results for more extreme poses. |
Model quality
In this new release, we substantially improved the quality of the models and evaluated them on a dataset of American Sign Language (ASL) gestures. As the evaluation metric for 2D screen coordinates, we used the Mean Average Precision (mAP) suggested by the COCO keypoint challenge methodology.
Hand model evaluation on the American Sign Language dataset.
For 3D evaluation we used Mean Absolute Error in Euclidean 3D metric space, with the average error measured in centimeters.
Model Name | 2D, mAP, % | 3D, mean 3D error, cm
HandPose GHUM Lite | 79.2 | 1.4
HandPose GHUM Full | 83.8 | 1.3
Previous TensorFlow.js HandPose | 66.5 | N/A
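The 3D metric in the table above is easy to reproduce for your own data; a minimal sketch, assuming predictions and ground truth arrive as arrays of shape (num_samples, num_keypoints, 3) in meters:

import numpy as np

def mean_3d_error_cm(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-keypoint Euclidean distance, converted from meters to centimeters."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean() * 100.0)

# Toy example: two samples, 21 keypoints each
pred = np.zeros((2, 21, 3))
gt = np.full((2, 21, 3), 0.01)     # every keypoint off by 1 cm along each axis
print(mean_3d_error_cm(pred, gt))  # ~1.73 cm (sqrt(3) * 1 cm)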
Browser performance
We benchmark the model across multiple devices. All the benchmarks are tested with two hands presented.
Runtime | MacBook Pro 15” 2019, Intel Core i9, AMD Radeon Pro Vega 20 (FPS) | iPhone 11 (FPS) | Pixel 5 (FPS) | Desktop, Intel i9-10900K, Nvidia GTX 1070 (FPS)
MediaPipe Runtime with WASM & GPU acceleration | 62 / 48 | 8 / 5 | 19 / 15 | 136 / 120
TensorFlow.js Runtime | 36 / 31 | 15 / 12 | 11 / 8 | 42 / 35
To see the model’s FPS on your device, try our demo. You can switch the model type and runtime live in the demo UI to see what works best for your device.
Cross platform availability
In addition to the JavaScript hand pose detection API, these updated hand models are also available in MediaPipe Hands as a ready-to-use Android Solution API and Python Solution API, with prebuilt packages in Android Maven Repository and Python PyPI respectively.
For instance, for Android developers the Maven package can be easily integrated into an Android Studio project by adding the following into the project’s Gradle dependencies:
dependencies {
implementation 'com.google.mediapipe:solution-core:latest.release'
implementation 'com.google.mediapipe:hands:latest.release'
}
The MediaPipe Android Solution is designed to handle different use scenarios such as processing live camera feeds, video files, as well as static images. It also comes with utilities to facilitate overlaying the output landmarks onto either CPU images (with Canvas) or GPU (using OpenGL). For instance, the following code snippet demonstrates how it can be used to process a live camera feed and render the output on screen in real-time:
// Creates MediaPipe Hands.
HandsOptions handsOptions =
    HandsOptions.builder()
        .setModelComplexity(1)
        .setMaxNumHands(2)
        .setRunOnGpu(true)
        .build();
Hands hands = new Hands(activity, handsOptions);

// Connects MediaPipe Hands to camera.
CameraInput cameraInput = new CameraInput(activity);
cameraInput.setNewFrameListener(textureFrame -> hands.send(textureFrame));

// Registers a result listener.
hands.setResultListener(
    handsResult -> {
      handsView.setRenderData(handsResult);
      handsView.requestRender();
    });

// Starts the camera to feed data to MediaPipe Hands.
handsView.post(this::startCamera);
To learn more about MediaPipe Android Solutions, please refer to our documentation and try them out with the example Android Studio project. Also visit MediaPipe Solutions for more cross-platform solutions.
Acknowledgements
We would like to acknowledge our colleagues who participated in or sponsored creating HandPose GHUM 3D and building the APIs: Cristian Sminchisescu, Michael Hays, Na Li, Ping Yu, George Sung, Jonathan Baccash, Esha Uboweja, David Tian, Kanstantsin Sokal, Gregory Karpiak, Tyler Mullen, Chuo-Ling Chang, Matthias Grundmann.
Siemens Energy Taps NVIDIA to Develop Industrial Digital Twin of Power Plant in Omniverse
Siemens Energy, a leading supplier of power plant technology in the trillion-dollar worldwide energy market, is relying on the NVIDIA Omniverse platform to create digital twins to support predictive maintenance of power plants.
In doing so, Siemens Energy joins a wave of companies across various industries that are using digital twins to enhance their operations. Among them, BMW Group, which has 31 factories around the world, is building multiple industrial digital twins of its operations; and Ericsson is adopting Omniverse to build digital twins of urban areas to help determine how to construct 5G networks.
Indeed, the worldwide market for digital twin platforms is forecast to reach $86 billion by 2028, according to Grand View Research.
“NVIDIA’s open platforms along with physics-infused neural networks bring great value to Siemens Energy,” said Stefan Lichtenberger, technical portfolio manager at Siemens Energy.
Siemens Energy builds and services combined cycle power plants, which include large gas turbines and steam turbines. Heat recovery steam generators (HRSGs) use the exhaust heat from the gas turbine to create steam used to drive the steam turbine. This raises the thermodynamic efficiency of the power plant to more than 60 percent, according to Siemens Energy.
At some sections of an HRSG, a steam and water mixture can cause corrosion that might impact the lifetime of the HRSG’s parts. Downtime for maintenance and repairs leads to lost revenue opportunities for utility companies.
Siemens Energy estimates that a 10 percent reduction in the industry’s average planned downtime of 5.5 days for HRSGs — required, among other things, to check pipe wall thickness loss due to corrosion — would save $1.7 billion a year.
Simulations for Industrial Applications
Siemens Energy is enlisting NVIDIA technology to develop a new workflow to reduce the frequency of planned shutdowns while maintaining safety. Real-time data — water inlet temperature, pressure, pH, gas turbine power and temperature — is preprocessed to compute pressure, temperature and velocity of both water and steam. The pressure, temperature and velocity are fed into a physics-ML model created with the NVIDIA Modulus framework to simulate precisely how steam and water flow through the pipes in real time.
The flow conditions in the pipes are then visualized with NVIDIA Omniverse, a virtual world simulation and collaboration platform for 3D workflows. Omniverse scales across multi-GPUs to help Siemens Energy understand and predict the aggregated effects of corrosion in real time.
Accelerating Digital Twin Development
Using NVIDIA software frameworks, running on NVIDIA A100 Tensor Core GPUs, Siemens Energy is simulating the corrosive effects of heat, water and other conditions on metal over time to fine-tune maintenance needs. Predicting maintenance more accurately with machine learning models can help reduce the frequency of maintenance checks without running the risk of failure. The scaled Modulus PINN model was run on AWS Elastic Kubernetes Service (EKS) backed by P4d EC2 instances with A100 GPUs.
Building a computational fluid dynamics model to estimate corrosion within the pipes of a single HRSG takes as long as eight weeks, and the process must be repeated across a portfolio of more than 600 units. A faster workflow using NVIDIA technologies can enable Siemens Energy to accelerate corrosion estimation from weeks to hours.
NVIDIA Omniverse provides a highly scalable platform that lets Siemens Energy replicate and deploy digital twins worldwide, accessing potentially thousands of NVIDIA GPUs as needed.
“NVIDIA’s work as the pioneer in accelerated computing, AI software platforms and simulation offers the scale and flexibility needed for industrial digital twins at Siemens Energy,” said Lichtenberger.
Learn more about Omniverse for virtual simulations and digital twins.
Gordon Bell Finalists Fight COVID, Advance Science With NVIDIA Technologies
Two simulations of a billion atoms, two fresh insights into how the SARS-CoV-2 virus works, and a new AI model to speed drug discovery.
Those are results from finalists for the Gordon Bell awards, often described as the Nobel Prize of high performance computing. The finalists used AI, accelerated computing or both to advance science with NVIDIA technologies.
A finalist for the special prize for COVID-19 research used AI to link multiple simulations, showing at a new level of clarity how the virus replicates inside a host.
The research, led by Arvind Ramanathan, a computational biologist at Argonne National Laboratory, provides a way to improve the resolution of traditional tools used to explore protein structures, which could yield fresh insights into ways to arrest the spread of the virus.
The team, drawn from a dozen organizations in the U.S. and the U.K., designed a workflow that ran across systems including Perlmutter, an NVIDIA A100-powered system, built by Hewlett Packard Enterprise, and Argonne’s NVIDIA DGX A100 systems.
“The capability to perform multisite data analysis and simulations for integrative biology will be invaluable for making use of large experimental data that are difficult to transfer,” the paper said.
As part of its work, the team developed a technique to speed molecular dynamics research using the popular NAMD program on GPUs. They also leveraged NVIDIA NVLink to speed data “far beyond what is currently possible with a conventional HPC network interconnect, or … PCIe transfers.”
A Billion Atoms in High Fidelity
Ivan Oleynik, a professor of physics at the University of South Florida, led a team named a finalist for the standard Gordon Bell award for their work producing the first highly accurate simulation of a billion atoms. The simulation broke a record set by last year’s Gordon Bell winner by a factor of 23.
“It’s a joy to uncover phenomena never seen before, it’s a really big achievement we’re proud of,” said Oleynik.
The simulation of carbon atoms under extreme temperature and pressure could open doors to new energy sources and help describe the makeup of distant planets. It’s especially stunning because the simulation has quantum-level accuracy, faithfully reflecting the forces among the atoms.
“It’s accuracy we could only achieve by applying machine learning techniques on a powerful GPU supercomputer — AI is creating a revolution in how science is done,” said Oleynik.
The team exercised 4,608 IBM Power AC922 servers and 27,900 NVIDIA GPUs on the U.S. Department of Energy’s Summit supercomputer, built by IBM, one of the world’s most powerful supercomputers. It demonstrated their code could scale with almost 100-percent efficiency to simulations of 20 billion atoms or more.
That code is available to any researcher who wants to push the boundaries of materials science.
Inside a Deadly Droplet
In another billion-atom simulation, a second finalist for the COVID-19 prize showed the Delta variant inside an airborne droplet. The work reveals the biological forces that spread COVID and other diseases, providing a first atomic-level look at aerosols.
The work has “far reaching … implications for viral binding in the deep lung, and for the study of other airborne pathogens,” according to the paper from a team led by last year’s winner of the special prize, researcher Rommie Amaro from the University of California San Diego.
“We demonstrate how AI coupled to HPC at multiple levels can result in significantly improved effective performance, enabling new ways to understand and interrogate complex biological systems,” Amaro said.
Researchers used NVIDIA GPUs on Summit, the Longhorn supercomputer built by Dell Technologies for the Texas Advanced Computing Center and commercial systems in Oracle Cloud Infrastructure (OCI).
“HPC and cloud resources can be used to significantly drive down time-to-solution for major scientific efforts as well as connect researchers and greatly enable complex collaborative interactions,” the team concluded.
The Language of Drug Discovery
Finalists for the COVID prize at Oak Ridge National Laboratory (ORNL) applied natural language processing (NLP) to the problem of screening chemical compounds for new drugs.
They used a dataset of 9.6 billion molecules, the largest applied to this task to date, to train a BERT NLP model that can speed the discovery of new drugs, and did so in just two hours. The previous best effort took four days to train a model on a dataset of 1.1 billion molecules.
The work exercised more than 24,000 NVIDIA GPUs on the Summit supercomputer to deliver a whopping 603 petaflops. Now that the training is done, the model can run on a single GPU to help researchers find chemical compounds that could inhibit COVID and other diseases.
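As a rough illustration of the approach, and not the ORNL team’s code, a BERT-style masked-language model can be pretrained on SMILES strings with the Hugging Face Transformers library; the tokenizer file, data file and model sizes below are placeholders.
# Illustrative sketch: pretraining a small BERT-style masked-language model on
# SMILES strings. File names and hyperparameters are placeholders, not ORNL's setup.
from datasets import load_dataset
from transformers import (BertConfig, BertForMaskedLM, DataCollatorForLanguageModeling,
                          PreTrainedTokenizerFast, Trainer, TrainingArguments)

tokenizer = PreTrainedTokenizerFast(
    tokenizer_file="smiles_tokenizer.json",   # assumed: a tokenizer trained on SMILES
    unk_token="[UNK]", pad_token="[PAD]", mask_token="[MASK]")

# Assumed input format: one SMILES string per line in molecules.smi.
dataset = load_dataset("text", data_files={"train": "molecules.smi"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

config = BertConfig(vocab_size=tokenizer.vocab_size, hidden_size=256,
                    num_hidden_layers=4, num_attention_heads=4)
model = BertForMaskedLM(config)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="smiles-bert", per_device_train_batch_size=64,
                         num_train_epochs=1, fp16=True)
Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator).train()
Once pretrained, a model like this can be used to embed or screen candidate molecules, consistent with the article’s note that the trained model runs on a single GPU.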
“We have collaborators here who want to apply the model to cancer signaling pathways,” said Jens Glaser, a computational scientist at ORNL.
“We’re just scratching the surface of training data sizes — we hope to use a trillion molecules soon,” said Andrew Blanchard, a research scientist who led the team.
Relying on a Full-Stack Solution
NVIDIA software libraries for AI and accelerated computing helped the team complete its work in what one observer called a surprisingly short time.
“We didn’t need to fully optimize our work for the GPU’s tensor cores because you don’t need specialized code, you can just use the standard stack,” said Glaser.
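In PyTorch, for example, enabling standard mixed precision is typically enough to route matrix math through Tensor Cores without any specialized kernel code; a minimal sketch follows, with a placeholder model and random data.
# Minimal sketch: standard mixed precision uses Tensor Cores with no custom kernels.
import torch

model = torch.nn.Linear(1024, 1024).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(64, 1024, device="cuda")
target = torch.randn(64, 1024, device="cuda")

opt.zero_grad()
with torch.cuda.amp.autocast():            # FP16 matmuls are eligible for Tensor Cores
    loss = torch.nn.functional.mse_loss(model(x), target)
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()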
Glaser summed up what many finalists felt: “Having a chance to be part of meaningful research with potential impact on people’s lives is something that’s very satisfying for a scientist.”
Tune in to our special address at SC21 either live on Monday, Nov. 15 at 3 pm PST or later on demand. NVIDIA’s Marc Hamilton will provide an overview of our latest news, innovations and technologies, followed by a live Q&A panel with NVIDIA experts.