We’ve trained and are open-sourcing a neural net called Whisper that approaches human-level robustness and accuracy on English speech recognition. (OpenAI Blog)
Configure a custom Amazon S3 query output location and data retention policy for Amazon Athena data sources in Amazon SageMaker Data Wrangler
Amazon SageMaker Data Wrangler reduces the time that it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes in Amazon SageMaker Studio, the first fully integrated development environment (IDE) for ML. With Data Wrangler, you can simplify the process of data preparation and feature engineering, and complete each step of the data preparation workflow, including data selection, cleansing, exploration, and visualization, from a single visual interface. You can import data from multiple data sources such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Snowflake, and 26 federated query data sources supported by Amazon Athena.
Starting today, when importing data from Athena data sources into Data Wrangler, you can configure the S3 query output location and data retention period to control where and how long Athena stores the intermediary data. In this post, we walk you through this new feature.
Solution overview
Athena is an interactive query service that makes it easy to browse the AWS Glue Data Catalog, and analyze data in Amazon S3 and 26 federated query data sources using standard SQL. When you use Athena to import data, you can use Data Wrangler’s default S3 location for the Athena query output, or specify an Athena workgroup to enforce a custom S3 location. Previously, you had to implement cleanup workflows to remove this intermediary data, or manually set up S3 lifecycle configuration to control storage cost and meet your organization’s data security requirements. This added significant operational overhead and didn’t scale.
Data Wrangler now supports custom S3 locations and data retention periods for your Athena query output. With this new feature, you can change the Athena query output location to a custom S3 bucket. You now have a default data retention policy of 5 days for the Athena query output, and you can change this to meet your organization’s data security requirements. Based on the retention period, the Athena query output in the S3 bucket gets cleaned up automatically. After you import the data, you can perform exploratory data analysis on this dataset and store the clean data back to Amazon S3.
The following diagram illustrates this architecture.
For our use case, we use a sample bank dataset to walk through the solution. The workflow consists of the following steps:
- Download the sample dataset and upload it to an S3 bucket.
- Set up an AWS Glue crawler to crawl the schema and store the metadata schema in the AWS Glue Data Catalog.
- Use Athena to access the Data Catalog to query data from the S3 bucket.
- Create a new Data Wrangler flow to connect to Athena.
- When creating the connection, set the retention TTL for the dataset.
- Use this connection in the workflow and store the clean data in another S3 bucket.
For simplicity, we assume that you have already set up the Athena environment (steps 1–3). We detail the subsequent steps in this post.
Prerequisites
To set up the Athena environment, refer to the User Guide for step-by-step instructions, and complete steps 1–3 as outlined in the previous section.
Import your data from Athena to Data Wrangler
To import your data, complete the following steps:
- On the Studio console, choose the Resources icon in the navigation pane.
- Choose Data Wrangler on the drop-down menu.
- Choose New flow.
- On the Import tab, choose Amazon Athena.
A detail page opens where you can connect to Athena and write a SQL query to import from the database.
- Enter a name for your connection.
- Expand Advanced configuration.
When connecting to Athena, Data Wrangler uses Amazon S3 to stage the queried data. By default, this data is staged at the S3 location s3://sagemaker-{region}-{account_id}/athena/ with a retention period of 5 days.
- For Amazon S3 location of query results, enter your S3 location.
- Select Data retention period and set the data retention period (for this post, 1 day).
If you deselect this option, the data will persist indefinitely.
Behind the scenes, Data Wrangler attaches an S3 lifecycle configuration policy to that S3 location to automatically clean up. See the following example policy:
You need s3:GetLifecycleConfiguration and s3:PutLifecycleConfiguration for your SageMaker execution role to correctly apply the lifecycle configuration policies. Without these permissions, you get error messages when you try to import the data.
The following error message is an example of missing the GetLifecycleConfiguration permission.
The following error message is an example of missing the PutLifecycleConfiguration permission.
- Optionally, for Workgroup, you can specify an Athena workgroup.
An Athena workgroup isolates users, teams, applications, or workloads into groups, each with its own permissions and configuration settings. When you specify a workgroup, Data Wrangler inherits the workgroup settings defined in Athena. For example, if a workgroup has an S3 location defined to store query results and has Override client side settings enabled, you can’t edit the S3 query result location.
By default, Data Wrangler also saves the Athena connection for you. This is displayed as a new Athena tile in the Import tab. You can always reopen that connection to query and bring different data into Data Wrangler.
- Deselect Save connection if you don’t want to save the connection.
- To configure the Athena connection, choose None for Sampling to import the entire dataset.
For large datasets, Data Wrangler allows you to import a subset of your data to build out your transformation workflow, and only process the entire dataset when you’re ready. This speeds up the iteration cycle and saves processing time and cost. To learn more about different data sampling options available, visit Amazon SageMaker Data Wrangler now supports random sampling and stratified sampling.
- For Data catalog, choose AwsDataCatalog.
- For Database, choose your database.
Data Wrangler displays the available tables. You can choose each table to check the schema and preview the data.
- Enter the following code in the query field:
- Choose Run to preview the data.
- If everything looks good, choose Import.
- Enter a dataset name and choose Add to import the data into your Data Wrangler workspace.
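The advanced configuration in the steps above comes down to two pieces: an S3 lifecycle rule that expires the staged query output after the retention period, and two permissions on the SageMaker execution role. The following is a minimal Python sketch of what each might look like, not the exact policy Data Wrangler writes. The bucket name, prefix, and rule ID are hypothetical placeholders.

```python
import json


def build_lifecycle_configuration(prefix: str, retention_days: int) -> dict:
    """Build an S3 lifecycle configuration that expires objects under a prefix."""
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    return {
        "Rules": [
            {
                # Hypothetical rule ID; any unique name works.
                "ID": f"expire-athena-output-{retention_days}d",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": retention_days},
            }
        ]
    }


def build_lifecycle_permissions(bucket: str) -> dict:
    """Build an IAM policy document granting the two lifecycle permissions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetLifecycleConfiguration",
                    "s3:PutLifecycleConfiguration",
                ],
                # Lifecycle actions apply to the bucket, not to individual objects.
                "Resource": f"arn:aws:s3:::{bucket}",
            }
        ],
    }


# Assumed values: a 1-day retention period (as in this post) and a made-up bucket.
lifecycle = build_lifecycle_configuration("athena/", 1)
policy = build_lifecycle_permissions("my-athena-results-bucket")
print(json.dumps(lifecycle, indent=2))
print(json.dumps(policy, indent=2))
```

The lifecycle dictionary has the shape that the boto3 S3 client's put_bucket_lifecycle_configuration call accepts as its LifecycleConfiguration argument, so you could use it to reproduce the cleanup behavior on a bucket you manage yourself.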
Analyze and process data with Data Wrangler
After you load the data into Data Wrangler, you can do exploratory data analysis (EDA) and prepare the data for machine learning.
- Choose the plus sign next to the bank-data dataset in the data flow, and choose Add analysis.
Data Wrangler provides built-in analyses, including a Data Quality and Insights Report, data correlation, a pre-training bias report, a summary of your dataset, and visualizations (such as histograms and scatter plots). Additionally, you can create your own custom visualization.
- For Analysis type, choose Data Quality and Insights Report.
This automatically generates visualizations, analyses to identify data quality issues, and recommendations for the right transformations required for your dataset.
- For Target column, choose Y.
- Because this is a classification problem statement, for Problem type, select Classification.
- Choose Create.
Data Wrangler creates a detailed report on your dataset. You can also download the report to your local machine.
- For data preparation, choose the plus sign next to the bank-data dataset in the data flow, and choose Add transform.
- Choose Add step to start building your transformations.
At the time of this writing, Data Wrangler provides over 300 built-in transformations. You can also write your own transformations using Pandas or PySpark.
You can now start building your transforms and analyses based on your business requirements.
Clean up
To avoid ongoing costs, delete the Data Wrangler resources using the steps below when you’re finished.
- Choose the Running Instances and Kernels icon.
- Under RUNNING APPS, choose the shutdown icon next to the sagemaker-data-wrangler-1.0 app.
- Choose Shut down all to confirm.
Conclusion
In this post, we provided an overview of customizing your S3 location and enabling S3 lifecycle configurations for importing data from Athena to Data Wrangler. With this feature, you can store intermediary data in a secured S3 location, and automatically remove the data copy after the retention period to reduce the risk of unauthorized access to data. We encourage you to try out this new feature. Happy building!
To learn more about Athena and SageMaker, visit the Athena User Guide and Amazon SageMaker Documentation.
About the authors
Meenakshisundaram Thandavarayan is a Senior AI/ML specialist with AWS. He helps hi-tech strategic accounts on their AI and ML journey. He is very passionate about data-driven AI.
Harish Rajagopalan is a Senior Solutions Architect at Amazon Web Services. Harish works with enterprise customers and helps them with their cloud journey.
James Wu is a Senior AI/ML Specialist Solution Architect at AWS, helping customers design and build AI/ML solutions. James’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in marketing & advertising industries.
Use RStudio on Amazon SageMaker to create regulatory submissions for the life sciences industry
Pharmaceutical companies seeking approval from regulatory agencies such as the US Food & Drug Administration (FDA) or Japanese Pharmaceuticals and Medical Devices Agency (PMDA) to sell their drugs on the market must submit evidence to prove that their drug is safe and effective for its intended use. A team of physicians, statisticians, chemists, pharmacologists, and other clinical scientists review the clinical trial submission data and proposed labeling. If the review establishes that there is sufficient statistical evidence to prove that the health benefits of the drug outweigh the risks, the drug is approved for sale.
The clinical trial submission package consists of tabulated data, analysis data, trial metadata, and statistical reports consisting of statistical tables, listings, and figures. In the case of the US FDA, the electronic common technical document (eCTD) is the standard format for submitting applications, amendments, supplements, and reports to the FDA’s Center for Biologics Evaluation and Research (CBER) and Center for Drug Evaluation and Research (CDER). For the FDA and Japanese PMDA, it’s a regulatory requirement to submit tabulated data in CDISC Study Data Tabulation Model (SDTM), analysis data in CDISC Analysis Data Model (ADaM), and trial metadata in CDISC Define-XML (based on Operational Data Model (ODM)).
In this post, we demonstrate how we can use RStudio on Amazon SageMaker to create such regulatory submission deliverables. This post describes the clinical trial submission process, how we can ingest clinical trial research data, tabulate and analyze the data, and then create statistical reports—summary tables, data listings, and figures (TLF). This method can enable pharmaceutical customers to seamlessly connect to clinical data stored in their AWS environment, process it using R, and help accelerate the clinical trial research process.
Drug development process
The drug development process can broadly be divided into five major steps, as illustrated in the following figure.
On average, it takes 10–15 years and approximately USD 1–3 billion for one drug, out of around 10,000 potential molecules, to receive approval. During the early phases of research (the drug discovery phase), promising drug candidates are identified, which move further to preclinical research. During the preclinical phase, researchers try to find out the toxicity of the drug by performing in vitro experiments in the lab and in vivo experiments on animals. After preclinical testing, drugs move on to the clinical trial research phase, where they must be tested on humans to ascertain their safety and efficacy. The researchers design clinical trials and detail the study plan in the clinical trial protocol. They define the different clinical research phases, from small Phase 1 studies to determine drug safety and dosage, to bigger Phase 2 trials to determine drug efficacy and side effects, to even bigger Phase 3 and 4 trials to determine drug efficacy and safety and to monitor adverse reactions. After successful human clinical trials, the drug sponsor files a New Drug Application (NDA) to market the drug. The regulatory agencies review all the data, work with the sponsor on prescription labeling information, and approve the drug. After the drug’s approval, the regulatory agencies review post-market safety reports to ensure the product remains safe.
In 1997, the Clinical Data Interchange Standards Consortium (CDISC), a global, non-profit organization comprising pharmaceutical companies, CROs, biotech, academic institutions, healthcare providers, and government agencies, was started as a volunteer group. CDISC has published data standards to streamline the flow of data from collection through submissions, and facilitated data interchange between partners and providers. CDISC has published the following standards:
- CDASH (Clinical Data Acquisition Standards Harmonization) – Standards for collected data
- SDTM (Study Data Tabulation Model) – Standards for submitting tabulated data
- ADaM (Analysis Data Model) – Standards for analysis data
- SEND (Standard for Exchange of Nonclinical Data) – Standards for nonclinical data
- PRM (Protocol Representation Model) – Standards for protocol
These standards can help trained reviewers analyze data more effectively and quickly using standard tools, thereby reducing drug approval times. It’s a regulatory requirement from the US FDA and Japanese PMDA to submit all tabulated data using the SDTM format.
R for clinical trial research submissions
SAS and R are two of the most widely used statistical analysis software packages in the pharmaceutical industry. When development of the SDTM standards was started by CDISC, SAS was in almost universal use in the pharmaceutical industry and at the FDA. However, R is gaining tremendous popularity nowadays because it’s open source, and new packages and libraries are continuously added. Students primarily use R during their academics and research, and they take this familiarity with R to their jobs. R also offers support for emerging technologies such as advanced deep learning integrations.
Cloud providers such as AWS have now become the platform of choice for pharmaceutical customers to host their infrastructure. AWS also provides managed services such as SageMaker, which makes it effortless to create, train, and deploy machine learning (ML) models in the cloud. SageMaker also allows access to the RStudio IDE from anywhere via a web browser. This post details how statistical programmers and biostatisticians can ingest their clinical data into the R environment, how R code can be run, and how results are stored. We provide snippets of code that allow clinical trial data scientists to ingest XPT files into the R environment, create R data frames for SDTM and ADaM, and finally create TLF that can be stored in an Amazon Simple Storage Service (Amazon S3) object storage bucket.
RStudio on SageMaker
On November 2, 2021, AWS in collaboration with RStudio PBC announced the general availability of RStudio on SageMaker, the industry’s first fully managed RStudio Workbench IDE in the cloud. You can now bring your current RStudio license to easily migrate your self-managed RStudio environments to SageMaker in just a few simple steps. To learn more about this exciting collaboration, check out Announcing RStudio on Amazon SageMaker.
Along with the RStudio Workbench, the RStudio suite for R developers also offers RStudio Connect and RStudio Package Manager. RStudio Connect is designed to allow data scientists to publish insights, dashboards, and web applications. It makes it easy to share ML and data science insights from data scientists’ complicated work and put it in the hands of decision-makers. RStudio Connect also makes hosting and managing content simple and scalable for wide consumption.
Solution overview
In the following sections, we discuss how we can import raw data from a remote repository or S3 bucket in RStudio on SageMaker. It’s also possible to connect to Amazon Relational Database Service (Amazon RDS) and data warehouses like Amazon Redshift (see Connecting R with Amazon Redshift) directly from RStudio; however, this is outside the scope of this post. After data has been ingested from a couple of different sources, we process it and create R data frames for a table. Then we convert the table data frame into an RTF file and store the results back in an S3 bucket. These outputs can then potentially be used for regulatory submission purposes, provided the R packages used in the post have been validated for use for regulatory submissions by the customer.
Set up RStudio on SageMaker
For instructions on setting up RStudio on SageMaker in your environment, refer to Get started with RStudio on SageMaker. Make sure that the execution role of RStudio on SageMaker has access to download and upload data to the S3 bucket in which data is stored. To learn more about how to manage R packages and publish your analysis using RStudio on SageMaker, refer to Announcing Fully Managed RStudio on SageMaker for Data Scientists.
Ingest data into RStudio
In this step, we ingest data from various sources to make it available for our R session. We import data in SAS XPT format; however, the process is similar if you want to ingest data in other formats. One of the advantages of using RStudio on SageMaker is that if the source data is stored in your AWS accounts, then SageMaker can natively access the data using AWS Identity and Access Management (IAM) roles.
Access data stored in a remote repository
In this step, we import ADaM data from the FDA’s GitHub repository. We create a local directory called data in the RStudio environment to store the data and download demographics data (dm.xpt) from the remote repository. In this context, the local directory refers to a directory created on your private Amazon EFS storage that is attached by default to your R session environment. See the following code:
When this step is complete, you can see dm.xpt being downloaded by navigating to Files, data, dm.xpt.
Access data stored in Amazon S3
In this step, we download data stored in an S3 bucket in our account. We have copied contents from the FDA’s GitHub repository to the S3 bucket named aws-sagemaker-rstudio for this example. See the following code:
When the step is complete, you can see pp.xpt being downloaded by navigating to Files, data, pp.xpt.
Process XPT data
Now that we have SAS XPT files available in the R environment, we need to convert them into R data frames and process them. We use the haven library to read XPT files. We merge CDISC SDTM datasets dm and pp to create the ADPP dataset. Then we create a summary statistic table using the ADPP data frame. The summary table is then exported in RTF format.
First, XPT files are read using the read_xpt function of the haven library. Then an analysis dataset is created using the sqldf function of the sqldf library. See the following code:
Then, an output data frame is created using functions from the Tplyr and dplyr libraries:
The output data frame is then stored as an RTF file in the output folder in the RStudio environment:
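The R listings for these steps are not reproduced here. As a language-neutral illustration of the merge and summary described above (not the post's R code), the following Python sketch uses the built-in sqlite3 module to run a SQL join of the kind sqldf accepts, followed by a per-arm summary. The join key USUBJID and the columns ARM, PPTESTCD, and PPSTRESN are standard SDTM variable names, but the rows are made up for illustration and the choice of summary statistics is an assumption.

```python
import sqlite3

# Made-up rows for illustration only; real values would come from dm.xpt and pp.xpt.
DM_ROWS = [("S001", "Placebo"), ("S002", "Active"), ("S003", "Active")]
PP_ROWS = [
    ("S001", "CMAX", 10.0),
    ("S002", "CMAX", 14.0),
    ("S003", "CMAX", 18.0),
]


def build_adpp_summary(dm_rows, pp_rows):
    """Join pp to dm on USUBJID and summarize results per arm and parameter."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE dm (USUBJID TEXT PRIMARY KEY, ARM TEXT)")
        conn.execute("CREATE TABLE pp (USUBJID TEXT, PPTESTCD TEXT, PPSTRESN REAL)")
        conn.executemany("INSERT INTO dm VALUES (?, ?)", dm_rows)
        conn.executemany("INSERT INTO pp VALUES (?, ?, ?)", pp_rows)
        query = """
            SELECT dm.ARM, pp.PPTESTCD, COUNT(*) AS n, AVG(pp.PPSTRESN) AS mean
            FROM pp
            JOIN dm ON pp.USUBJID = dm.USUBJID
            GROUP BY dm.ARM, pp.PPTESTCD
            ORDER BY dm.ARM, pp.PPTESTCD
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


summary = build_adpp_summary(DM_ROWS, PP_ROWS)
for arm, test_code, n, mean in summary:
    print(f"{arm:8s} {test_code:6s} n={n} mean={mean:.1f}")
```

Each output row corresponds to one cell group of a summary table: one treatment arm, one pharmacokinetic parameter, and its count and mean.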
Upload outputs to Amazon S3
After the output has been generated, we put the data back in an S3 bucket. We can achieve this by creating a SageMaker session again, if a session isn’t active already, and uploading the contents of the output folder to an S3 bucket using the session$upload_data function:
With these steps, we have ingested data, processed it, and uploaded the results to be made available for submission to regulatory authorities.
Clean up
To avoid incurring any unintended costs, you need to quit your current session. On the top right corner of the page, choose the power icon. This will automatically stop the underlying instance and therefore stop incurring any unintended compute costs.
Challenges
The post has outlined steps for ingesting raw data stored in an S3 bucket or from a remote repository. However, there are many other sources of raw data for a clinical trial, primarily eCRF (electronic case report forms) data stored in EDC (electronic data capture) systems such as Oracle Clinical, Medidata Rave, OpenClinica, or Snowflake; lab data; data from eCOA (clinical outcome assessment) and ePRO (electronic Patient-Reported Outcomes); real-world data from apps and medical devices; and electronic health records (EHRs) at the hospitals. Significant preprocessing is involved before this data can be made usable for regulatory submissions. Building connectors to various data sources and collecting them in a centralized data repository (CDR) or a clinical data lake, while maintaining proper access controls, poses significant challenges.
Another key challenge to overcome is that of regulatory compliance. The computer system used for creating regulatory submission outputs must be compliant with appropriate regulations, such as 21 CFR Part 11, HIPAA, GDPR, or any other GxP requirements or ICH guidelines. This translates to working in a validated and qualified environment with controls for access, security, backup, and auditability in place. This also means that any R packages that are used to create regulatory submission outputs must be validated before use.
Conclusion
In this post, we saw that some of the key deliverables for an eCTD submission were CDISC SDTM, ADaM datasets, and TLF. This post outlined the steps needed to create these regulatory submission deliverables by first ingesting data from a couple of sources into RStudio on SageMaker. We then saw how we can process the ingested data in XPT format; convert it into R data frames to create SDTM, ADaM, and TLF; and then finally upload the results to an S3 bucket.
We hope that with the broad ideas laid out in the post, statistical programmers and biostatisticians can easily visualize the end-to-end process of loading, processing, and analyzing clinical trial research data in RStudio on SageMaker and use the learnings to define a custom workflow suited to their regulatory submissions.
Can you think of any other applications of using RStudio to help researchers, statisticians, and R programmers to make their lives easier? We would love to hear about your ideas! And if you have any questions, please share them in the comments section.
Resources
For more information, visit the following links:
- The Drug Development Process
- Get started with RStudio on Amazon SageMaker
- Announcing Fully Managed RStudio on Amazon SageMaker for Data Scientists
- SageMaker SDK documentation
- Productionizing R workload with Amazon SageMaker
About the authors
Rohit Banga is a Global Clinical Development Industry Specialist based out of London, UK. He is a biostatistician by training and helps Healthcare and LifeScience customers deploy innovative clinical development solutions on AWS. He is passionate about how data science, AI/ML, and emerging technologies can be used to solve real business problems within the Healthcare and LifeScience industry. In his spare time, Rohit enjoys skiing, BBQing, and spending time with family and friends.
Georgios Schinas is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in London and works closely with customers in UK and Ireland. Georgios helps customers design and deploy machine learning applications in production on AWS with a particular interest in MLOps practices and enabling customers to perform machine learning at scale. In his spare time, he enjoys traveling, cooking and spending time with friends and family.
Churn prediction using Amazon SageMaker built-in tabular algorithms LightGBM, CatBoost, TabTransformer, and AutoGluon-Tabular
Amazon SageMaker provides a suite of built-in algorithms, pre-trained models, and pre-built solution templates to help data scientists and machine learning (ML) practitioners get started on training and deploying ML models quickly. These algorithms and models can be used for both supervised and unsupervised learning. They can process various types of input data, including tabular, image, and text.
Customer churn is a problem faced by a wide range of companies, from telecommunications to banking, where customers are typically lost to competitors. It’s in a company’s best interest to retain existing customers rather than acquire new customers because it usually costs significantly more to attract new customers. Mobile operators have historical records in which customers continued using the service or ultimately ended up churning. We can use this historical information of a mobile operator’s churn to train an ML model. After training this model, we can pass the profile information of an arbitrary customer (the same profile information that we used to train the model) to the model, and have it predict whether this customer is going to churn or not.
In this post, we train and deploy four recently released SageMaker algorithms—LightGBM, CatBoost, TabTransformer, and AutoGluon-Tabular—on a churn prediction dataset. We use SageMaker Automatic Model Tuning (a tool for hyperparameter optimization) to find the best hyperparameters for each model, and compare their performance on a holdout test dataset to select the optimal one.
You can also use this solution as a template to search over a collection of state-of-the-art tabular algorithms and use hyperparameter optimization to find the best overall model. You can easily replace the example dataset with your own to solve real business problems you’re interested in. If you want to jump straight into the SageMaker SDK code we go through in this post, you can refer to the following sample Jupyter notebook.
Benefits of SageMaker built-in algorithms
When selecting an algorithm for your particular type of problem and data, using a SageMaker built-in algorithm is the easiest option, because doing so comes with the following major benefits:
- Low coding – The built-in algorithms require little coding to start running experiments. The only inputs you need to provide are the data, hyperparameters, and compute resources. This allows you to run experiments more quickly, with less overhead for tracking results and code changes.
- Efficient and scalable algorithm implementations – The built-in algorithms come with parallelization across multiple compute instances and GPU support right out of the box for all applicable algorithms. If you have a lot of data with which to train your model, most built-in algorithms can easily scale to meet the demand. Even if you already have a pre-trained model, it may still be easier to use its counterpart in SageMaker and input the hyperparameters you already know rather than port it over and write a training script yourself.
- Transparency – You’re the owner of the resulting model artifacts. You can take that model and deploy it on SageMaker for several different inference patterns (check out all the available deployment types) and easy endpoint scaling and management, or you can deploy it wherever else you need it.
Data visualization and preprocessing
First, we gather our customer churn dataset. It’s a relatively small dataset with 5,000 records, where each record uses 21 attributes to describe the profile of a customer of an unknown US mobile operator. The attributes range from the US state where the customer resides, to the number of calls they placed to customer service, to the cost they are billed for daytime calls. We’re trying to predict whether the customer will churn or not, which is a binary classification problem. The following is a subset of those features, with the label as the last column.
The following are some insights for each column, specifically the summary statistics and histogram of selected features.
We then preprocess the data, split it into training, validation, and test sets, and upload the data to Amazon Simple Storage Service (Amazon S3).
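The split step can be sketched as follows. This is a minimal illustration in plain Python, not the notebook's code: the 80/10/10 ratio and the fixed seed are assumptions, because the post doesn't state the proportions it uses.

```python
import random


def split_dataset(rows, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle rows reproducibly and split them into train, validation, and test."""
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(rows)
    # A seeded generator keeps the split identical across runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


# Stand-in for the 5,000 customer records; each integer represents one record.
records = list(range(5000))
train, validation, test = split_dataset(records)
print(len(train), len(validation), len(test))
```

Keeping the test set untouched until the final comparison is what makes the benchmark between algorithms meaningful; the validation set is the one the tuning jobs optimize against.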
Automatic model tuning of tabular algorithms
Hyperparameters control how our underlying algorithms operate and influence the performance of the model. Those hyperparameters can be the number of layers, learning rate, weight decay rate, and dropout for neural network-based models, or the number of leaves, iterations, and maximum tree depth for tree ensemble models. To select the best model, we apply SageMaker automatic model tuning to each of the four trained SageMaker tabular algorithms. You need only select the hyperparameters to tune and a range for each parameter to explore. For more information about automatic model tuning, refer to Amazon SageMaker Automatic Model Tuning: Using Machine Learning for Machine Learning or Amazon SageMaker automatic model tuning: Scalable gradient-free optimization.
Let’s see how this works in practice.
LightGBM
We start by running automatic model tuning with LightGBM, and adapt that process to the other algorithms. As is explained in the post Amazon SageMaker JumpStart models and algorithms now available via API, the following artifacts are required to train a pre-built algorithm via the SageMaker SDK:
- Its framework-specific container image, containing all the required dependencies for training and inference
- The training and inference scripts for the selected model or algorithm
We first retrieve these artifacts, which depend on the model_id (lightgbm-classification-model in this case) and version:
We then get the default hyperparameters for LightGBM, set some of them to selected fixed values such as number of boosting rounds and evaluation metric on the validation data, and define the value ranges we want to search over for others. We use the SageMaker parameters ContinuousParameter and IntegerParameter for this:
Finally, we create a SageMaker Estimator, feed it into a HyperparameterTuner, and start the hyperparameter tuning job with tuner.fit():
The max_jobs parameter defines how many total jobs will be run in the automatic model tuning job, and max_parallel_jobs defines how many concurrent training jobs should be started. We also define the objective to “Maximize” the model’s AUC (area under the curve). To dive deeper into the available parameters exposed by HyperparameterTuner, refer to HyperparameterTuner.
Check out the sample notebook to see how we proceed to deploy and evaluate this model on the test set.
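The tuning setup described in this section might look roughly like the following sketch. It is not the notebook's code: the search ranges, the metric name, and the metric regex are illustrative assumptions, and build_tuner expects an already configured SageMaker Estimator that you supply. The SageMaker Python SDK is imported inside build_tuner, so the range checks run even where the SDK isn't installed.

```python
def validate_search_space(space):
    """Check that every (low, high) pair is a proper range; return sorted names."""
    for name, (low, high) in space.items():
        if not low < high:
            raise ValueError(f"{name}: lower bound {low} must be below {high}")
    return sorted(space)


# Illustrative ranges only; not the values used in the post.
CONTINUOUS_RANGES = {"learning_rate": (1e-4, 1.0), "feature_fraction": (0.1, 1.0)}
INTEGER_RANGES = {"num_leaves": (10, 50), "max_depth": (5, 30)}


def build_tuner(estimator, max_jobs=10, max_parallel_jobs=2):
    """Wrap an existing Estimator in a HyperparameterTuner that maximizes AUC."""
    # Requires the SageMaker Python SDK; imported lazily on purpose.
    from sagemaker.tuner import (
        ContinuousParameter,
        HyperparameterTuner,
        IntegerParameter,
    )

    ranges = {
        name: ContinuousParameter(low, high)
        for name, (low, high) in CONTINUOUS_RANGES.items()
    }
    ranges.update(
        {name: IntegerParameter(low, high) for name, (low, high) in INTEGER_RANGES.items()}
    )
    return HyperparameterTuner(
        estimator,
        objective_metric_name="auc",  # assumed metric name
        hyperparameter_ranges=ranges,
        # Assumed log format; adjust the regex to what the training job prints.
        metric_definitions=[{"Name": "auc", "Regex": "auc: ([0-9\\.]+)"}],
        objective_type="Maximize",
        max_jobs=max_jobs,  # total training jobs in the tuning job
        max_parallel_jobs=max_parallel_jobs,  # jobs running at the same time
    )


tunable_names = validate_search_space({**CONTINUOUS_RANGES, **INTEGER_RANGES})
print(tunable_names)
```

With a real estimator, you would call build_tuner(estimator).fit(...) with your training and validation channels. A higher max_parallel_jobs finishes sooner, but gives a sequential search strategy fewer completed results to learn from between rounds.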
CatBoost
The process for hyperparameter tuning on the CatBoost algorithm is the same as before, although we need to retrieve model artifacts under the ID catboost-classification-model and change the range selection of hyperparameters:
TabTransformer
The process for hyperparameter tuning on the TabTransformer model is the same as before, although we need to retrieve model artifacts under the ID pytorch-tabtransformerclassification-model and change the range selection of hyperparameters.
We also change the training instance_type to ml.p3.2xlarge. TabTransformer is a model recently derived from Amazon research, which brings the power of deep learning to tabular data using Transformer models. To train this model in an efficient manner, we need a GPU-backed instance. For more information, refer to Bringing the power of deep learning to data in tables.
AutoGluon-Tabular
In the case of AutoGluon, we don’t run hyperparameter tuning. This is by design, because AutoGluon focuses on ensembling multiple models with sane choices of hyperparameters and stacking them in multiple layers. This ends up being more performant than training one model with the perfect selection of hyperparameters and is also computationally cheaper. For details, check out AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data.
Therefore, we switch the model_id to autogluon-classification-ensemble, and only fix the evaluation metric hyperparameter to our desired AUC score.
Instead of calling tuner.fit(), we call estimator.fit() to start a single training job.
Benchmarking the trained models
After we deploy all four models, we send the full test set to each endpoint for prediction and calculate accuracy, F1, and AUC metrics for each (see code in the sample notebook). We present the results in the following table, with an important disclaimer: results and relative performance between these models will depend on the dataset you use for training. These results are representative, and even though the tendency for certain algorithms to perform better is based on relevant factors (for example, AutoGluon intelligently ensembles the predictions of both LightGBM and CatBoost models behind the scenes), the balance in performance might change given a different data distribution.
Metric | LightGBM with Automatic Model Tuning | CatBoost with Automatic Model Tuning | TabTransformer with Automatic Model Tuning | AutoGluon-Tabular |
Accuracy | 0.8977 | 0.9622 | 0.9511 | 0.98 |
F1 | 0.8986 | 0.9624 | 0.9517 | 0.98 |
AUC | 0.9629 | 0.9907 | 0.989 | 0.9979 |
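For readers who want to see what the three reported metrics measure, here is a minimal, dependency-free sketch for a binary task such as churn prediction. It is not the code from the sample notebook; the function names and the 0.5 decision threshold mentioned below are assumptions made for illustration, and in practice a library such as scikit-learn would typically be used instead.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. This pairwise form is O(P * N), which is
    fine for a small illustration but slow for large test sets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Accuracy and F1 operate on hard labels (for example, predicted probabilities thresholded at 0.5), whereas AUC is computed from the raw scores and is therefore independent of any single threshold.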
Conclusion
In this post, we trained four different SageMaker built-in algorithms to solve the customer churn prediction problem with low coding effort. We used SageMaker automatic model tuning to find the best hyperparameters to train these algorithms with, and compared their performance on a selected churn prediction dataset. You can use the related sample notebook as a template, replacing the dataset with your own to solve your desired tabular data-based problem.
Make sure to try these algorithms on SageMaker, and check out sample notebooks on how to use other built-in algorithms available on GitHub.
About the authors
Dr. Xin Huang is an Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers in ACL, ICDM, KDD conferences, and Royal Statistical Society: Series A journal.
João Moura is an AI/ML Specialist Solutions Architect at Amazon Web Services. He is mostly focused on NLP use-cases and helping customers optimize Deep Learning model training and deployment. He is also an active proponent of low-code ML solutions and ML-specialized hardware.
FindIt: Generalized Object Localization with Natural Language Queries
Natural language enables flexible descriptive queries about images. The interaction between text queries and images grounds linguistic meaning in the visual world, facilitating a better understanding of object relationships, human intentions towards objects, and interactions with the environment. The research community has studied object-level visual grounding through a range of tasks, including referring expression comprehension, text-based localization, and more broadly object detection, each of which require different skills in a model. For example, object detection seeks to find all objects from a predefined set of classes, which requires accurate localization and classification, while referring expression comprehension localizes an object from a referring text and often requires complex reasoning on prominent objects. At the intersection of the two is text-based localization, in which a simple category-based text query prompts the model to detect the objects of interest.
Due to their dissimilar task properties, referring expression comprehension, detection, and text-based localization are mostly studied through separate benchmarks with most models only dedicated to one task. As a result, existing models have not adequately synthesized information from the three tasks to achieve a more holistic visual and linguistic understanding. Referring expression comprehension models, for instance, are trained to predict one object per image, and often struggle to localize multiple objects, reject negative queries, or detect novel categories. In addition, detection models are unable to process text inputs, and text-based localization models often struggle to process complex queries that refer to one object instance, such as “Left half sandwich.” Lastly, none of the models can generalize sufficiently well beyond their training data and categories.
To address these limitations, we are presenting “FindIt: Generalized Localization with Natural Language Queries” at ECCV 2022. Here we propose a unified, general-purpose and multitask visual grounding model, called FindIt, that can flexibly answer different types of grounding and detection queries. Key to this architecture is a multi-level cross-modality fusion module that can perform complex reasoning for referring expression comprehension and simultaneously recognize small and challenging objects for text-based localization and detection. In addition, we discover that a standard object detector and detection losses are sufficient and surprisingly effective for all three tasks without the need for task-specific design and losses common in existing works. FindIt is simple, efficient, and outperforms alternative state-of-the-art models on the referring expression comprehension and text-based localization benchmarks, while being competitive on the detection benchmark.
FindIt is a unified model for referring expression comprehension (col. 1), text-based localization (col. 2), and the object detection task (col. 3). FindIt can respond accurately when tested on object types/classes not known during training, e.g. “Find the desk” (col. 4). Compared to existing baselines (MattNet and GPV), FindIt can perform these tasks well and in a single model.
Multi-level Image-Text Fusion
Different localization tasks are created with different semantic understanding objectives. For example, because the referring expression task primarily references prominent objects in the image rather than small, occluded or faraway objects, low resolution images generally suffice. In contrast, the detection task aims to detect objects with various sizes and occlusion levels in higher resolution images. Apart from these benchmarks, the general visual grounding problem is inherently multiscale, as natural queries can refer to objects of any size. This motivates the need for a multi-level image-text fusion model for efficient processing of higher resolution images over different localization tasks.
The premise of FindIt is to fuse the higher level semantic features using more expressive transformer layers, which can capture all-pair interactions between image and text. For the lower-level and higher-resolution features, we use a cheaper dot-product fusion to save computation and memory cost. We attach a detector head (e.g., Faster R-CNN) on top of the fused feature maps to predict the boxes and their classes.
FindIt accepts an image and a query text as inputs, and processes them separately in image/text backbones before applying the multi-level fusion. We feed the fused features to Faster R-CNN to predict the boxes referred to by the text. The feature fusion uses more expressive transformers at higher levels and cheaper dot-product at the lower levels.
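The cost trade-off between the two fusion styles can be illustrated with a toy, pure-Python sketch. This is not the FindIt implementation: the single-head attention without learned projections, the residual add, and the reading of the cheaper fusion as an element-wise product with a mean-pooled text vector are all simplifying assumptions, and every name below is hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(image_tokens: List[Vector], text_tokens: List[Vector]) -> List[Vector]:
    """Expensive fusion: every image token attends to every text token."""
    scale = math.sqrt(len(text_tokens[0]))
    fused = []
    for q in image_tokens:
        weights = softmax([dot(q, k) / scale for k in text_tokens])
        context = [
            sum(w * k[d] for w, k in zip(weights, text_tokens))
            for d in range(len(q))
        ]
        fused.append([qi + ci for qi, ci in zip(q, context)])  # residual add
    return fused


def product_fuse(image_tokens: List[Vector], text_tokens: List[Vector]) -> List[Vector]:
    """Cheap fusion: modulate each image token by one pooled text vector."""
    dim = len(text_tokens[0])
    pooled = [sum(t[d] for t in text_tokens) / len(text_tokens) for d in range(dim)]
    return [[qi * pi for qi, pi in zip(q, pooled)] for q in image_tokens]
```

With N image tokens, T text tokens, and feature size d, the attention path costs on the order of N·T·d operations, while the pooled product costs on the order of N·d, which is why the cheaper variant suits the high-resolution levels where N is largest.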
Multitask Learning
Apart from the multi-level fusion described above, we adapt the text-based localization and detection tasks to take the same inputs as the referring expression comprehension task. For the text-based localization task, we generate a set of queries over the categories present in the image. For any present category, the text query takes the form “Find the [object],” where [object] is the category name. The objects corresponding to that category are labeled as foreground and the other objects as background. Instead of using the aforementioned prompt, we use a static prompt for the detection task, such as “Find all the objects.” We found that the specific choice of prompts is not important for text-based localization and detection tasks.
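The query construction described above can be sketched in a few lines. The data layout (a list of category/box pairs per image) and the function names are hypothetical and chosen only for illustration; the actual training pipeline may represent annotations differently.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Annotation = Tuple[str, Box]             # (category name, bounding box)


def localization_queries(annotations: List[Annotation]) -> List[Dict]:
    """Build one "Find the [object]" example per category present in the image."""
    examples = []
    for category in sorted({name for name, _ in annotations}):
        examples.append({
            "query": f"Find the {category}",
            # Boxes of the queried category are the targets ...
            "foreground": [box for name, box in annotations if name == category],
            # ... and every other annotated object is treated as background.
            "background": [box for name, box in annotations if name != category],
        })
    return examples


def detection_query(annotations: List[Annotation]) -> Dict:
    """Build the single static-prompt example used for the detection task."""
    return {
        "query": "Find all the objects",
        "foreground": [box for _, box in annotations],
        "background": [],
    }
```

Because both helpers emit the same fields (a text query plus target boxes), examples from the two tasks can be mixed freely with referring expression data during training.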
After adaptation, all tasks in consideration share the same inputs and outputs — an image input, a text query, and a set of output bounding boxes and classes. We then combine the datasets and train on the mixture. Finally, we use the standard object detection losses for all tasks, which we found to be surprisingly simple and effective.
Evaluation
We apply FindIt to the popular RefCOCO benchmark for referring expression comprehension tasks. When only the COCO and RefCOCO datasets are available, FindIt outperforms the state-of-the-art model on all tasks. In the settings where external datasets are allowed, FindIt sets a new state of the art by using COCO and all RefCOCO splits together (no other datasets). On the challenging Google and UMD splits, FindIt outperforms the state of the art by a 10% margin. Taken together, these results demonstrate the benefits of multitask learning.
Comparison with the state of the art on the popular referring expression benchmark. FindIt is superior on both the COCO and unconstrained settings (additional training data allowed).
On the text-based localization benchmark, FindIt achieves 79.7%, higher than the GPV (73.0%) and Faster R-CNN (75.2%) baselines. Please refer to the paper for more quantitative evaluation.
We further observe that FindIt generalizes better to novel categories and super-categories in the text-based localization task compared to competitive single-task baselines on the popular COCO and Objects365 datasets, shown in the figure below.
Efficiency
We also benchmark the inference times on the referring expression comprehension task (see the table below). FindIt is efficient and comparable with existing one-stage approaches while achieving higher accuracy. For fair comparison, all running times are measured on one GTX 1080Ti GPU.
Model | Image Size | Backbone | Runtime (ms) |
MattNet | 1000 | R101 | 378 |
FAOA | 256 | DarkNet53 | 39 |
MCN | 416 | DarkNet53 | 56 |
TransVG | 640 | R50 | 62 |
FindIt (Ours) | 640 | R50 | 107 |
FindIt (Ours) | 384 | R50 | 57 |
Conclusion
We present FindIt, which unifies referring expression comprehension, text-based localization, and object detection tasks. We propose multi-scale cross-attention to unify the diverse localization requirements of these tasks. Without any task-specific design, FindIt surpasses the state of the art on referring expression and text-based localization, shows competitive performance on detection, and generalizes better to out-of-distribution data and novel classes. All of these are accomplished in a single, unified, and efficient model.
Acknowledgements
This work is conducted by Weicheng Kuo, Fred Bertsch, Wei Li, AJ Piergiovanni, Mohammad Saffar, and Anelia Angelova. We would like to thank Ashish Vaswani, Prajit Ramachandran, Niki Parmar, David Luan, Tsung-Yi Lin, and other colleagues at Google Research for their advice and helpful discussions. We would like to thank Tom Small for preparing the animation.
No Hang Ups With Hangul: KT Trains Smart Speakers, Customer Call Centers With NVIDIA AI
South Korea’s most popular AI voice assistant, GiGA Genie, converses with 8 million people each day.
The AI-powered speaker from telecom company KT can control TVs, offer real-time traffic updates and complete a slew of other home-assistance tasks based on voice commands. It has mastered its conversational skills in the highly complex Korean language thanks to large language models (LLMs) — machine learning algorithms that can recognize, understand, predict and generate human languages based on huge text datasets.
The company’s models are built using the NVIDIA DGX SuperPOD data center infrastructure platform and the NeMo Megatron framework for training and deploying LLMs with billions of parameters.
The Korean language, written in the Hangul alphabet, reliably shows up in lists of the world’s most challenging languages. It includes four types of compound verbs, and words are often composed of two or more roots.
KT — South Korea’s leading mobile operator with over 22 million subscribers — improved the smart speaker’s understanding of such words by developing LLMs with around 40 billion parameters. And through integration with Amazon Alexa, GiGA Genie can converse with users in English, too.
“With transformer-based models, we’ve achieved significant quality improvements for the GiGA Genie smart speaker, as well as our customer services platform AI Contact Center, or AICC,” said Hwijung Ryu, LLM development team lead at KT.
AICC is an all-in-one, cloud-based platform that offers AI voice agents and other customer service-related applications.
It can receive calls and provide requested information, or quickly connect customers to human agents for answers to more detailed inquiries. AICC manages more than 100,000 calls daily across Korea without human intervention, according to Ryu.
“LLMs enable GiGA Genie to gain better language understanding and generate more human-like sentences, and AICC to reduce consultation times by 15 seconds as it summarizes and classifies inquiry types more quickly,” he added.
Training Large Language Models
Developing LLMs can be an expensive, time-consuming process that requires deep technical expertise and full-stack technology investments.
The NVIDIA AI platform simplified and sped up this process for KT.
“We trained our LLM models more effectively with NVIDIA DGX SuperPOD’s powerful performance — as well as NeMo Megatron’s optimized algorithms and 3D parallelism techniques,” Ryu said. “NeMo Megatron is continuously adopting new features, which is the biggest advantage we think it offers in improving our model accuracy.”
3D parallelism — a distributed training method in which an extremely large-scale deep learning model is partitioned across multiple devices — was crucial for training KT’s LLMs. NeMo Megatron enabled the team to easily accomplish this task with the highest throughput, according to Ryu.
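The "3D" refers to combining data, tensor, and pipeline parallelism, so each device holds one slice of the model along each of the three axes. As a minimal sketch of the bookkeeping involved, the helper below maps a flat device rank to its coordinates in such a grid. The ordering (tensor index varies fastest, then data, then pipeline) and the name `rank_to_coord` are illustrative assumptions, not necessarily the layout NeMo Megatron uses.

```python
from typing import NamedTuple


class Coord(NamedTuple):
    pipeline: int  # which group of consecutive layers this device holds
    data: int      # which replica (shard of the training batch) it serves
    tensor: int    # which slice of each layer's weight matrices it holds


def rank_to_coord(rank: int, tensor_size: int, data_size: int, pipeline_size: int) -> Coord:
    """Map a flat device rank to (pipeline, data, tensor) grid coordinates."""
    world_size = tensor_size * data_size * pipeline_size
    if not 0 <= rank < world_size:
        raise ValueError(f"rank must be in [0, {world_size})")
    tensor = rank % tensor_size
    data = (rank // tensor_size) % data_size
    pipeline = rank // (tensor_size * data_size)
    return Coord(pipeline, data, tensor)


# Illustrative grid: 2-way tensor x 2-way data x 2-way pipeline = 8 devices.
for r in range(8):
    print(r, rank_to_coord(r, tensor_size=2, data_size=2, pipeline_size=2))
```

Keeping the tensor index fastest places the devices that exchange the most traffic on adjacent ranks, which is a common motivation for this kind of ordering.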
“We considered using other platforms, but it was difficult to find an alternative that provides full-stack environments — from the hardware level to the inference level,” he added. “NVIDIA also provides exceptional expertise from product, engineering teams and more, so we easily solved several technical issues.”
Using hyperparameter optimization tools in NeMo Megatron, KT trained its LLMs 2x faster than with other frameworks, Ryu said. These tools allow users to automatically find the best configurations for LLM training and inference, easing and speeding the development and deployment process.
KT is also planning to use the NVIDIA Triton Inference Server to provide an optimized real-time inference service, as well as NVIDIA Base Command Manager to easily monitor and manage hundreds of nodes in its AI cluster.
“Thanks to LLMs, KT can release competitive products faster than ever,” Ryu said. “We also believe that our technology can drive innovation from other companies, as it can be used to improve their value and create innovative products.”
KT plans to release more than 20 natural language understanding and natural language generation APIs for developers in November. The application programming interfaces can be used for tasks including document summarization and classification, emotion recognition, and filtering of potentially inappropriate content.
Learn more about breakthrough technologies for the era of AI and the metaverse at NVIDIA GTC, running online through Thursday, Sept. 22.
Watch NVIDIA founder and CEO Jensen Huang’s keynote address in replay below:
The post No Hang Ups With Hangul: KT Trains Smart Speakers, Customer Call Centers With NVIDIA AI appeared first on NVIDIA Blog.
New NVIDIA DGX System Software and Infrastructure Solutions Supercharge Enterprise AI
At GTC today, NVIDIA unveiled a number of updates to its DGX portfolio to power new breakthroughs in enterprise AI development.
NVIDIA DGX H100 systems are now available for order. These infrastructure building blocks support NVIDIA’s full-stack enterprise AI solutions.
With 32 petaflops of performance at FP8 precision, NVIDIA DGX H100 delivers a leap in efficiency for enterprise AI development. It offers 3x lower total cost of ownership and 3.5x more energy efficiency compared to the previous generation.
New NVIDIA Base Command software, which simplifies and speeds AI development, powers every DGX system — from single nodes to DGX SuperPODs.
Also unveiled was NVIDIA DGX BasePOD — the evolution of DGX POD — which makes enterprise data-center AI deployments simpler and faster for IT teams to acquire, deploy and manage.
Many of the world’s AI leaders are building technological breakthroughs — from self-driving cars to voice assistants — using NVIDIA DGX systems and software, and the pace of innovation is not slowing down.
New NVIDIA Base Command Features
NVIDIA Base Command provides enterprise-grade orchestration and cluster management, and it now features a full software stack for maximizing AI developer productivity, IT manageability and workload performance.
The workflow management features of Base Command now include support for on-premises DGX SuperPOD environments, enabling businesses to gain centralized control of AI development projects with simplified collaboration for project teams, and integrated monitoring and reporting dashboards.
Base Command works with the NVIDIA AI Enterprise software suite, which is now included with every DGX system. The NVIDIA AI software enables end-to-end AI development and deployment with supported AI and data science tools, optimized frameworks and pretrained models.
Additionally, it offers enterprise-workflow management and MLOps integrations with DGX-Ready Software providers Domino Data Lab, Run.ai, Weights & Biases and NVIDIA Inception member Rescale. It also includes libraries that optimize and accelerate compute, storage and network infrastructure — while ensuring maximized system uptime, security and reliability.
New DGX BasePOD Reference Architecture
DGX BasePOD provides a reference architecture for DGX systems that incorporates design best practices for integrating compute, networking, storage and software.
Customers are already using NVIDIA DGX POD to power the development of a broad range of enterprise applications. DGX BasePOD builds on the success of DGX POD with new industry solutions targeting the biggest AI opportunities, including natural language processing, healthcare and life sciences, and fraud detection.
Delivered as fully integrated, ready-to-deploy offerings through the NVIDIA Partner Network, DGX BasePOD solutions range in size, from two to hundreds of DGX systems, with certified high-performance storage from NVIDIA DGX storage technology partners including DDN, Dell, NetApp, Pure Storage, VAST Data and WEKA.
Leaders Power AI Breakthroughs With DGX Systems
Enterprises around the world choose NVIDIA DGX systems to power their most advanced AI workloads. Among the AI innovators developing mission-critical AI capabilities on DGX A100 systems:
- ML research and product lab Adept is building an AI teammate powered by a large language model prototyped on NVIDIA DGX Foundry, and then scaled with NVIDIA A100 GPUs and NVIDIA Megatron on Oracle Cloud Infrastructure.
- Hyundai Motor Group is using a 40-node DGX SuperPOD to explore hyperscale AI workloads.
- Telecom company KT is developing an LLM with around 40 billion parameters for a variety of Korean-language applications, including the GiGA Genie smart speaker, using the NVIDIA NeMo Megatron framework, NVIDIA DGX SuperPOD and NVIDIA Base Command software.
- The University of Wisconsin-Madison is quickly bringing AI to medical imaging devices using NVIDIA DGX systems with the Flywheel research platform and the NVIDIA Clara healthcare application framework. Using the NVIDIA Federated Learning Application Runtime Environment, or NVIDIA FLARE, in collaboration with other hospitals, the university is securely training AI models on DGX systems for medical imaging, annotation and classification.
Learn more about the AI breakthroughs powered by NVIDIA DGX systems by watching NVIDIA founder and CEO Jensen Huang’s GTC keynote in replay. And join the GTC session, “Designing Your AI Center of Excellence,” with Charlie Boyle, vice president of DGX systems at NVIDIA.
The post New NVIDIA DGX System Software and Infrastructure Solutions Supercharge Enterprise AI appeared first on NVIDIA Blog.
Keynote Wrap-Up: NVIDIA CEO Unveils Next-Gen RTX GPUs, AI Workflows in the Cloud
New cloud services to support AI workflows and the launch of a new generation of GeForce RTX GPUs featured today in NVIDIA CEO Jensen Huang’s GTC keynote, which was packed with new systems, silicon, and software.
“Computing is advancing at incredible speeds, the engine propelling this rocket is accelerated computing, and its fuel is AI,” Huang said during a virtual presentation as he kicked off NVIDIA GTC.
Again and again, Huang connected new technologies to new products to new opportunities – from harnessing AI to delight gamers with never-before-seen graphics to building virtual proving grounds where the world’s biggest companies can refine their products.
Driving the deluge of new ideas, new products and new applications: a singular vision of accelerated computing unlocking advances in AI, which, in turn will touch industries around the world.
Gamers and creators will get the first GPUs based on the new NVIDIA Ada Lovelace architecture.
Enterprises will get powerful new tools for high-performance computing applications with systems based on the Grace CPU and Grace Hopper Superchip. Those building the 3D internet will get new OVX servers powered by Ada Lovelace L40 data center GPUs. Researchers and computer scientists get new large language model capabilities with the NVIDIA NeMo LLM Service. And the auto industry gets Thor, a new brain with an astonishing 2,000 teraflops of performance.
Huang highlighted how NVIDIA’s technologies are being put to work by a sweep of major partners and customers across a breadth of industries.
To speed adoption, he announced Deloitte, the world’s largest professional services firm, is bringing new services built on NVIDIA AI and NVIDIA Omniverse to the world’s enterprises.
And he shared customer stories from telecoms giant Charter, as well as General Motors in the automotive industry, the German railway system’s Deutsche Bahn in transportation, The Broad Institute in medical research, and Lowe’s in retail.
NVIDIA GTC, which kicked off this week, has become one of the world’s most important AI gatherings, with 200+ speakers from companies such as Boeing, Deutsche Bank, Lowe’s, Polestar, Johnson & Johnson, Kroger, Mercedes-Benz, Siemens AG, T-Mobile and US Bank. More than 200,000 people have registered for the conference.
A ‘Quantum Leap’: GeForce RTX 40 Series GPUs
First out of the blocks at the keynote was the launch of next-generation GeForce RTX 40 Series GPUs powered by Ada, which Huang called a “quantum leap” that paves the way for creators of fully simulated worlds.
Huang gave his audience a taste of what that makes possible by offering up a look at Racer RTX, a fully interactive simulation that’s entirely ray traced, with all the action physically modeled.
Ada’s advancements include a new Streaming Multiprocessor, a new RT Core with twice the ray-triangle intersection throughput, and a new Tensor Core with the Hopper FP8 Transformer Engine and 1.4 petaflops of Tensor processor power.
Ada also introduces the latest version of NVIDIA DLSS technology, DLSS 3, which uses AI to generate new frames by comparing new frames with prior frames to understand how a scene is changing. The result: boosting game performance by up to 4x over brute force rendering.
DLSS 3 has received support from many of the world’s leading game developers, with more than 35 games and applications announcing support. “DLSS 3 is one of our greatest neural rendering inventions,” Huang said.
Together, Huang said, these innovations help deliver 4x more processing throughput with the new GeForce RTX 4090 versus its forerunner, the RTX 3090 Ti. “The new heavyweight champ” starts at $1,599 and will be available Oct. 12.
Additionally, the new GeForce RTX 4080 is launching in November with two configurations.
The GeForce RTX 4080 16GB, priced at $1,199, has 9,728 CUDA cores and 16GB of high-speed Micron GDDR6X memory. With DLSS 3, it’s twice as fast in today’s games as the GeForce RTX 3080 Ti, and more powerful than the GeForce RTX 3090 Ti at lower power.
The GeForce RTX 4080 12GB has 7,680 CUDA cores and 12GB of Micron GDDR6X memory, and with DLSS 3 is faster than the RTX 3090 Ti, the previous-generation flagship GPU. It’s priced at $899.
Huang also announced that NVIDIA Lightspeed Studios used Omniverse to reimagine Portal, one of the most celebrated games in history. With NVIDIA RTX Remix, an AI-assisted toolset, users can mod their favorite games, enabling them to up-res textures and assets, and give materials physically accurate properties.
Powering AI Advances, H100 GPU in Full Production
Once more tying systems and software to broad technology trends, Huang explained that large language models, or LLMs, and recommender systems are the two most important AI models today.
Recommenders “run the digital economy,” powering everything from e-commerce to entertainment to advertising, he said. “They’re the engines behind social media, digital advertising, e-commerce and search.”
And large language models based on the Transformer deep learning model first introduced in 2017 are now among the most vibrant areas for research in AI, and able to learn to understand human language without supervision or labeled datasets.
“A single pre-trained model can perform multiple tasks, like question answering, document summarization, text generation, translation and even software programming,” Huang said.
Delivering the computing muscle needed to power these enormous models, Huang said the NVIDIA H100 Tensor Core GPU, with Hopper’s next-generation Transformer Engine, is in full production, with systems shipping in the coming weeks.
“Hopper is in full production and coming soon to power the world’s AI factories,” Huang said.
Partners building systems include Atos, Cisco, Dell Technologies, Fujitsu, GIGABYTE, Hewlett Packard Enterprise, Lenovo and Supermicro. And Amazon Web Services, Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure will be among the first to deploy H100-based instances in the cloud starting next year.
And Grace Hopper, which combines NVIDIA’s Arm-based Grace data center CPU with Hopper GPUs, with its 7x increase in fast-memory capacity, will deliver a “giant leap” for recommender systems, Huang said. Systems incorporating Grace Hopper will be available in the first half of 2023.
Weaving Together the Metaverse, L40 Data Center GPUs in Full Production
The next evolution of the internet, called the metaverse, will be extended with 3D, Huang explained. Omniverse is NVIDIA’s platform for building and running metaverse applications.
Here, too, Huang explained how connecting and simulating these worlds will require powerful, flexible new computers. And NVIDIA OVX servers are built for scaling out metaverse applications.
NVIDIA’s 2nd-generation OVX systems will be powered by Ada Lovelace L40 data center GPUs, which are now in full production, Huang announced.
Thor for Autonomous Vehicles, Robotics, Medical Instruments and More
In today’s vehicles, active safety, parking, driver monitoring, camera mirrors, cluster and infotainment are driven by different computers. In the future, they’ll be delivered by software that improves over time, running on a centralized computer, Huang said.
To power this, Huang introduced DRIVE Thor, which combines the transformer engine of Hopper, the GPU of Ada, and the amazing CPU of Grace.
The new Thor superchip delivers 2,000 teraflops of performance, replacing Atlan on the DRIVE roadmap, and providing a seamless transition from DRIVE Orin, which has 254 TOPS of performance and is currently in production vehicles. Thor will be the processor for robotics, medical instruments, industrial automation and edge AI systems, Huang said.
3.5 Million Developers, 3,000 Accelerated Applications
Bringing NVIDIA’s systems and silicon, and the benefits of accelerated computing, to industries around the world, is a software ecosystem with more than 3.5 million developers creating some 3,000 accelerated apps using NVIDIA’s 550 software development kits, or SDKs, and AI models, Huang announced.
And it’s growing fast. Over the past 12 months, NVIDIA has updated more than 100 SDKs and introduced 25 new ones.
“New SDKs increase the capability and performance of systems our customers already own, while opening new markets for accelerated computing,” Huang said.
New Services for AI, Virtual Worlds
Large language models “are the most important AI models today,” Huang said. Based on the transformer architecture, these giant models can learn to understand meanings and languages without supervision or labeled datasets, unlocking remarkable new capabilities.
To make it easier for researchers to apply this “incredible” technology to their work, Huang announced the NeMo LLM Service, an NVIDIA-managed cloud service to adapt pretrained LLMs to perform specific tasks.
To accelerate the work of drug and bioscience researchers, Huang also announced BioNeMo LLM, a service to create LLMs that understand chemicals, proteins, DNA and RNA sequences.
Huang announced that NVIDIA is working with The Broad Institute, the world’s largest producer of human genomic information, to make NVIDIA Clara libraries, such as NVIDIA Parabricks, the Genome Analysis Toolkit, and BioNeMo, available on Broad’s Terra Cloud Platform.
Huang also detailed NVIDIA Omniverse Cloud, an infrastructure-as-a-service that connects Omniverse applications running in the cloud, on premises or on a device.
New Omniverse containers – Replicator for synthetic data generation, Farm for scaling render farms, and Isaac Sim for building and training AI robots – are now available for cloud deployment, Huang announced.
Omniverse is seeing wide adoption, and Huang shared several customer stories and demos:
- Lowe’s, which has nearly 2,000 retail outlets, is using Omniverse to design, build and operate digital twins of their stores;
- Charter, a $50 billion telecoms provider, and interactive data analytics provider HeavyAI, are using Omniverse to create digital twins of Charter’s 4G and 5G networks;
- GM is creating a digital twin of its Michigan Design Studio in Omniverse where designers, engineers and marketers can collaborate.
New Jetson Orin Nano for Robotics
Shifting from virtual worlds to machines that will move through their world, robotic computers “are the newest types of computers,” Huang said, describing NVIDIA’s second-generation processor for robotics, Orin, as a home run.
To bring Orin to more markets, he announced the Jetson Orin Nano, a tiny robotics computer that is 80x faster than the previous super-popular Jetson Nano.
Jetson Orin Nano runs the NVIDIA Isaac robotics stack and features the ROS 2 GPU-accelerated framework, and NVIDIA Isaac Sim, a robotics simulation platform, is available on the cloud.
And for robotics developers using AWS RoboMaker, Huang announced that containers for the NVIDIA Isaac platform for robotics development are in the AWS marketplace.
New Tools for Video, Image Services
Most of the world’s internet traffic is video, and user-generated video streams will be increasingly augmented by AI special effects and computer graphics, Huang explained.
“Avatars will do computer vision, speech AI, language understanding and computer graphics in real time and at cloud scale,” Huang said.
To make new innovations at the intersection of real-time graphics, AI and communications possible, Huang announced that NVIDIA has been building acceleration libraries like CV-CUDA, a cloud runtime engine called Unified Computing Framework (UCF), Omniverse Avatar Cloud Engine (ACE), and a sample application called Tokkio for customer service avatars.
Deloitte to Bring AI, Omniverse Services to Enterprises
And to speed the adoption of all these technologies, Deloitte, the world’s largest professional services firm, is bringing new services built on NVIDIA AI and NVIDIA Omniverse to the world’s enterprises, Huang announced.
He said that Deloitte’s professionals will help the world’s enterprises use NVIDIA application frameworks to build modern multi-cloud applications for customer service, cybersecurity, industrial automation, warehouse and retail automation and more.
Just Getting Started
Huang ended his keynote by recapping a talk that moved from outlining new technologies to product announcements and back — uniting scores of different parts into a singular vision.
“Today, we announced new chips, new advances to our platforms, and, for the very first time, new cloud services,” Huang said as he wrapped up. “These platforms propel new breakthroughs in AI, new applications of AI, and the next wave of AI for science and industry.”
The post Keynote Wrap-Up: NVIDIA CEO Unveils Next-Gen RTX GPUs, AI Workflows in the Cloud appeared first on NVIDIA Blog.
NVIDIA Omniverse ACE Enables Easier, Faster Deployment of Interactive Avatars
Meet Violet, an AI-powered customer service assistant ready to take your order.
Unveiled this week at GTC, Violet is a cloud-based avatar that represents the latest evolution in avatar development through NVIDIA Omniverse Avatar Cloud Engine (ACE), a suite of cloud-native AI microservices that make it easier to build and deploy intelligent virtual assistants and digital humans at scale.
To animate interactive avatars like Violet, developers need to ensure the 3D character can see, hear, understand and communicate with people. But bringing these avatars to life can be incredibly challenging, as traditional methods typically require expensive equipment, specific expertise and time-consuming workflows.
The Violet demo showcases how Omniverse ACE eases avatar development, delivering all the AI building blocks necessary to create, customize and deploy interactive avatars. Whether taking restaurant orders or answering questions about the universe, these AI assistants are easily customizable for virtually any industry, and can help organizations enhance existing workflows and unlock new business opportunities.
How Omniverse ACE Brings Violet to Life
The demo showcases Violet as a fully rigged avatar with basic animation. To create Violet, NVIDIA’s creative team used the company’s Unified Compute Framework, a fully accelerated framework that enables developers to combine optimized and accelerated microservices into real-time AI applications. UCF helped the team build a graph of microservices for Violet that were deployed in the cloud.
Omniverse ACE powers the backend of interactive avatars, essentially acting as Violet’s brain. Additionally, two reference applications are built on ACE: NVIDIA Tokkio and NVIDIA Maxine.
Violet was developed using the Tokkio application workflow, which enables interactive avatars to see, perceive, converse intelligently and provide recommendations to enhance customer service, both online and in places like restaurants and stores.
NVIDIA Maxine delivers a suite of GPU-accelerated AI software development kits and cloud-native microservices for deploying AI features to enhance real-time video communications. Maxine integrates the NVIDIA Riva SDK’s real-time automatic speech recognition and text-to-speech capabilities with real-time “live portrait” photo animation and eye contact features, which enable better communication and understanding.
Latest Microservices Expand Possibilities for Avatars
The demo with Violet highlights how developers of digital humans and virtual assistants can use Omniverse ACE to accelerate their avatar development workflows. Omniverse ACE also delivers microservices that enable developers to access the best NVIDIA AI technology, with no coding required.
Some of the latest microservices include:
- Animation AI: Omniverse Audio2Face simplifies animation of a 3D character to match any voice-over track, helping users animate characters for games, films or real-time digital assistants.
- Conversational AI: Includes the NVIDIA Riva SDK for speech AI and NVIDIA NeMo Megatron framework for natural language processing, allowing developers to quickly build and deploy cutting-edge applications that deliver high-accuracy, expressive voices and respond in real time.
AI Avatars Deliver New Transformations Across Industries
The AI avatars that ACE enables will enhance interactive experiences in industries such as gaming, entertainment, transportation and hospitality.
Leading professional-services company Deloitte has worked with NVIDIA to help enterprises deploy transformative applications. At GTC, Deloitte announced that new hybrid-cloud offerings for NVIDIA AI and NVIDIA Omniverse services and platforms, including Omniverse ACE, will be added to the existing Deloitte Center for AI Computing.
“Cloud-based AI models and services are opening up new ways for digital humans to make people feel more connected, and today’s interaction with Violet in the NVIDIA GTC keynote shows a glimpse into the future of AI-powered avatars,” said Vladimir Mastilović, vice president of digital humans technology at Epic Games. “We are delighted to see NVIDIA Omniverse ACE using MetaHumans in Unreal Engine 5 to make it even easier to deploy engaging high-fidelity 3D avatars.”
NVIDIA Omniverse ACE will be available to early-access partners starting later this year, along with the Tokkio reference application for simplified customer-service avatar implementation.
Learn more about Omniverse ACE by joining this session at GTC, and explore all the technologies that go into the creation and animation of realistic, interactive digital humans.
Customers can request the hands-on, web-based Tokkio demo.
Developers and partners can sign up to be notified when ACE is available.
And catch up on the latest announcements from the GTC keynote by NVIDIA founder and CEO Jensen Huang:
The post NVIDIA Omniverse ACE Enables Easier, Faster Deployment of Interactive Avatars appeared first on NVIDIA Blog.
New NVIDIA Maxine Cloud-Native Architecture Delivers Breakthrough Audio and Video Quality at Scale
The latest release of NVIDIA Maxine is paving the way for real-time audio and video communications. Whether for a video conference, a call made to a customer service center, or a live stream, Maxine enables clear communications to enhance virtual interactions.
NVIDIA Maxine is a suite of GPU-accelerated AI software development kits (SDKs) and cloud-native microservices for deploying optimized and accelerated AI features that enhance audio, video and augmented-reality (AR) effects in real time.
And with Maxine’s state-of-the-art models, end users don’t need expensive gear to improve audio and video. Using NVIDIA AI-based technology, these high-quality effects can be achieved with standard microphones and camera equipment.
At GTC, NVIDIA announced the re-architecture of Maxine for cloud-native microservices, with the early-access release of Maxine’s audio-effects microservice. Additionally, new Maxine SDK features were unveiled, including Speaker Focus and Face Expression Estimation, as well as the general availability of Eye Contact. NVIDIA Maxine now also includes enhanced versions of existing SDK features.
Maxine Goes Cloud Native
Maxine’s cloud-native microservices allow developers to build real-time AI applications. Microservices can be independently managed and deployed seamlessly in the cloud, accelerating development timelines.
The Audio Effects microservice, available in early access, contains four state-of-the-art audio features:
- Background Noise Removal: Removes several common background noises using AI models, while preserving the speaker’s natural voice.
- Room Echo Removal: Removes reverberations from audio using AI models, restoring clarity of a speaker’s voice.
- Audio Super Resolution: Improves audio quality by increasing the temporal resolution of the audio signal. It currently supports upsampling from 8 kHz to 16 kHz and from 16 kHz to 48 kHz.
- Acoustic Echo Cancellation: Cancels real-time acoustic device echo from the input-audio stream, eliminating mismatched acoustic pairs and double-talk. With AI-based technology, more effective cancellation is achieved than with traditional digital signal processing.
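To make the Audio Super Resolution sample rates concrete, here is a minimal sketch of what integer-factor upsampling means for a stream of samples: 8 kHz to 16 kHz doubles the number of samples per second, and 16 kHz to 48 kHz triples it. This is not Maxine's method. Maxine uses AI models, while the sketch below uses plain linear interpolation as a stand-in, and the function name `upsample_linear` is hypothetical.

```python
def upsample_linear(samples, src_hz, dst_hz):
    """Upsample by an integer factor using linear interpolation.

    Illustrative baseline only (an assumption for this sketch); it does not
    reproduce the AI-based Audio Super Resolution feature described above.
    """
    if src_hz <= 0 or dst_hz % src_hz != 0:
        raise ValueError("dst_hz must be a positive integer multiple of src_hz")
    factor = dst_hz // src_hz
    if not samples:
        return []

    out = []
    # Between each pair of neighbours, emit `factor` evenly spaced values.
    for a, b in zip(samples, samples[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    # Hold the final sample so the output has exactly len(samples) * factor
    # values, which preserves the clip duration at the new rate.
    out.extend([samples[-1]] * factor)
    return out


# The two rate pairs mentioned in the feature list.
doubled = upsample_linear([0.0, 1.0], 8_000, 16_000)
tripled = upsample_linear([0.0, 3.0], 16_000, 48_000)
```

Interpolation like this only adds samples between existing ones. It cannot restore high-frequency detail that was never captured at the lower rate, which is the gap a learned model is meant to address.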
Pexip, a leading provider of enterprise video conferencing and collaboration solutions, is using NVIDIA AI technologies to take virtual meetings to the next level with advanced features for the modern workforce.
“With Maxine’s move to cloud-native microservices, it will be even easier to combine NVIDIA’s advanced AI technologies with our own unique server-side architecture,” said Eddie Clifton, senior vice president of Strategic Alliances at Pexip. “This allows our teams at Pexip to deliver an enhanced experience for virtual meetings.”
Sign up for early access.
Explore Enhanced Features of SDKs
Maxine offers three GPU-accelerated SDKs that reinvent real-time communications with AI: audio, video and AR effects.
The audio effects SDK delivers multi-effect, low-latency, AI-based audio-quality enhancement algorithms. Speaker Focus, available in early access, is a new feature that separates the audio tracks of foreground and background speakers, making each voice more intelligible. Additionally, the Audio Super Resolution SDK feature has been updated with enhanced quality.
The video effects SDK creates AI-based video effects with standard webcam input. The Virtual Background feature, which segments a person’s profile and applies AI-powered background removal, replacement or blur, has been updated with enhanced temporal stability.
And the AR SDK provides AI-powered, real-time 3D face tracking and body pose estimation based on a standard web camera feed. Latest features include:
- Eye Contact: Simulates eye contact by estimating and aligning gaze with the camera.
- Face Expression Estimation: Tracks the face and infers what expression is presented by the subject.
The following AR features have been updated:
- Body Pose Estimation: Predicts and tracks 34 key points of the human body in 2D and 3D — now with support for multi-person tracking.
- Face Landmark Tracking: Recognizes facial features and contours using 126 key points. Tracks head pose and facial deformation due to head movement and expression — in three degrees of freedom in real time — now with Quality mode to achieve even higher-quality tracking.
- Face Mesh: Represents a human face with a 3D mesh with up to 3,000 vertices and six degrees of freedom — now includes 3D morphable models from the USC Institute for Creative Technologies.
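As a rough illustration of the sizes quoted above, the sketch below models one frame of tracking output as plain Python data classes. The class and field names are hypothetical and are not the Maxine AR SDK's API. Only the counts (34 body key points, 126 face landmarks, up to 3,000 mesh vertices) come from the feature list.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Counts taken from the feature descriptions above.
BODY_KEYPOINTS = 34
FACE_LANDMARKS = 126
MAX_MESH_VERTICES = 3000

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


@dataclass
class BodyPose:
    """Hypothetical container for one person's body key points."""
    points_2d: List[Point2D]
    points_3d: List[Point3D]

    def __post_init__(self):
        if len(self.points_2d) != BODY_KEYPOINTS or len(self.points_3d) != BODY_KEYPOINTS:
            raise ValueError(f"expected {BODY_KEYPOINTS} key points in 2D and 3D")


@dataclass
class FaceTrack:
    """Hypothetical container for face landmarks and an optional mesh."""
    landmarks: List[Point2D]
    mesh_vertices: List[Point3D] = field(default_factory=list)

    def __post_init__(self):
        if len(self.landmarks) != FACE_LANDMARKS:
            raise ValueError(f"expected {FACE_LANDMARKS} face landmarks")
        if len(self.mesh_vertices) > MAX_MESH_VERTICES:
            raise ValueError(f"mesh is limited to {MAX_MESH_VERTICES} vertices")


@dataclass
class Frame:
    """One video frame; multi-person tracking means a list of poses."""
    poses: List[BodyPose]
    faces: List[FaceTrack]


# Two tracked people in a single frame, with placeholder coordinates.
pose = BodyPose([(0.0, 0.0)] * BODY_KEYPOINTS, [(0.0, 0.0, 0.0)] * BODY_KEYPOINTS)
face = FaceTrack([(0.0, 0.0)] * FACE_LANDMARKS)
frame = Frame(poses=[pose, pose], faces=[face])
```

Validating the counts at construction time is a design choice for the sketch: it surfaces a mismatched array length immediately instead of at render time.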
Try out the Maxine SDKs. To directly experience Maxine’s effects, download the NVIDIA Broadcast App.
Experience State-of-the-Art Effects With the Power of AI
Maxine SDKs and microservices provide a suite of low-latency AI effects that can be integrated with existing customer infrastructures. Developers can tap into cutting-edge AI capabilities with Maxine, as the technology is built on the NVIDIA AI platform and has world-class pretrained models for users to create, customize and deploy premium audio- and video-quality features.
Maxine is also part of the NVIDIA Omniverse Avatar Cloud Engine, a collection of cloud-based AI models and services for developers to build, customize and deploy interactive avatars. Maxine’s customizable cloud-native microservices allow for independent deployment into AI-effects pipelines. Maxine can be deployed on premises, in the cloud or at the edge.
Learn more about NVIDIA Maxine and other technology breakthroughs by watching the GTC keynote by NVIDIA founder and CEO Jensen Huang:
The post New NVIDIA Maxine Cloud-Native Architecture Delivers Breakthrough Audio and Video Quality at Scale appeared first on NVIDIA Blog.