NVIDIA Debuts AI-Enhanced Real-Time Ray Tracing for Games and Apps With New DLSS 3.5

The latest advancements in AI for gaming are in the spotlight today at Gamescom, the world’s largest gaming conference, as NVIDIA introduced a host of technologies, starting with DLSS 3.5, the next step forward in its breakthrough AI neural rendering technology.

DLSS 3.5, NVIDIA’s latest innovation in AI-powered graphics, is an image quality upgrade incorporated into this fall’s hottest ray-traced titles, from Cyberpunk 2077: Phantom Liberty to Alan Wake 2 to Portal with RTX.

But NVIDIA didn’t stop there. DLSS is coming to more AAA blockbusters; emotion is being added to AI-powered non-playable characters (NPCs); Xbox Game Pass titles are coming to the GeForce NOW cloud-gaming service; and upgrades to GeForce NOW servers are underway.

DLSS 3.5 Introduces Ray Reconstruction

The biggest news? The introduction of Ray Reconstruction in DLSS 3.5, a groundbreaking feature that elevates ray-traced image quality for all GeForce RTX GPUs, outclassing traditional hand-tuned denoisers with an AI network trained by an NVIDIA supercomputer.

The result improves lighting effects like reflections, global illumination, and shadows to create a more immersive, realistic gaming experience.

Denoising is used in ray-traced computer graphics to fill in missing pixels and composite a final image more efficiently. NVIDIA DLSS 3.5 is trained on 5x more data than DLSS 3, so it can recognize different ray-traced effects and make smarter decisions about when to use temporal and spatial data.

DLSS, first released in February 2019, has received a number of major upgrades that improve both image quality and performance.

Ray Reconstruction is now included as part of DLSS 3.5, which offers a suite of AI rendering technologies powered by Tensor Cores on GeForce RTX GPUs for faster frame rates, better image quality, and improved responsiveness.

DLSS 3.5 includes Ray Reconstruction, Super Resolution, Deep Learning Anti-Aliasing and Frame Generation.

NVIDIA announced that upcoming blockbusters Alan Wake 2, Cyberpunk 2077: Phantom Liberty and Portal with RTX will use NVIDIA DLSS 3.5 this fall.

DLSS 3.5 also improves image quality in real-time 3D creator applications, allowing 3D creative professionals to showcase a higher-quality image without waiting minutes or hours for a final render.

Ray Reconstruction will be included in upcoming releases of D5 Render, Chaos Vantage and NVIDIA Omniverse, a development platform for connecting and building 3D tools and applications.

DLSS is now featured in over 330 games and apps. And this fall’s biggest blockbusters will launch with DLSS 3, including Call of Duty: Modern Warfare III, PAYDAY 3 and Fortnite.

NVIDIA Reflex Comes to More Games

In addition, Call of Duty: Modern Warfare III, PAYDAY 3, Alan Wake 2, Cyberpunk 2077: Phantom Liberty and more will launch with NVIDIA Reflex, which reduces system latency so gamers’ actions occur quicker, providing a competitive edge in multiplayer matches and making single-player titles more responsive and enjoyable.

And Reflex is already sharpening gamers’ competitive edge in the latest editions of wildly popular franchises, including Apex Legends Season 18 and Overwatch 2: Invasion.

NVIDIA Reflex is now used by over 50 million players each month. It’s available in 9 of the top 10 competitive shooters, including the Counter-Strike 2 beta, and is activated by 90% of GeForce gamers in over 70 supported titles.

Half-Life 2 RTX, An RTX Remix Community Project

Half-Life 2 RTX: An RTX Remix Project is an in-development community remaster of one of the highest-rated games of all time, Valve’s Half-Life 2. Being developed by four of Half-Life 2’s top mod teams using RTX Remix, Half-Life 2 RTX will feature full ray tracing, DLSS 3, Reflex and RTX IO.

AI-Powered NPCs Get More Emotion With NeMo SteerLM

Bringing more AI to gaming, NVIDIA Avatar Cloud Engine (ACE) introduces NeMo SteerLM. This new training technique enables developers to customize the personality of NPCs for more emotive, realistic, and memorable interactions.

ACE is a custom AI model foundry that aims to bring intelligence to NPCs through AI-powered natural language interactions.

New Games, New SuperPODs Come to GeForce NOW

GeForce NOW also gets new games as Ultimate members connect to more powerful servers.

NVIDIA announced GeForce RTX 4080 SuperPODs are now fully deployed throughout North America and Europe, bringing exclusive access to RTX 4080-class servers to Ultimate members.

Those servers will be kept plenty busy.

Coming soon, GeForce NOW members can stream AAA titles Alan Wake 2, Cyberpunk 2077: Phantom Liberty DLC, PAYDAY 3 and Party Animals at launch from the cloud gaming service.

As part of NVIDIA and Microsoft’s collaboration to bring more choice to gamers, Microsoft Store integration will be added to GeForce NOW in the coming days.

Members will soon be able to stream over ten supported Xbox PC Game Pass titles at GeForce RTX 4080 quality across devices of their choice. More games from Xbox’s PC Game Pass library will be added to GeForce NOW on a regular basis.

Head to the cloud and stream new titles joining later this week, including DOOM 2016 from Bethesda.

See DLSS 3.5 in Action at Gamescom

DLSS 3.5 is being demonstrated in NVIDIA’s booth (hall 2.1, booth A10) at Gamescom in Cologne, Germany, August 23-27.

Read More

Explain medical decisions in clinical settings using Amazon SageMaker Clarify

Explainability of machine learning (ML) models used in the medical domain is becoming increasingly important because models need to be explained from a number of perspectives in order to gain adoption. These perspectives include the medical, technological, and legal, and, most importantly, the patient’s. Models developed on text in the medical domain have become statistically accurate, yet clinicians are ethically required to evaluate areas of weakness related to these predictions in order to provide the best care for individual patients. Explainability of these predictions is required in order for clinicians to make the correct choices on a patient-by-patient basis.

In this post, we show how to improve model explainability in clinical settings using Amazon SageMaker Clarify.

Background

One specific application of ML algorithms in the medical domain, which uses large volumes of text, is clinical decision support systems (CDSSs) for triage. On a daily basis, patients are admitted to hospitals and admission notes are taken. After these notes are taken, the triage process is initiated, and ML models can assist clinicians with estimating clinical outcomes. This can help reduce operational overhead costs and provide optimal care for patients. Understanding why these decisions are suggested by the ML models is extremely important for decision-making related to individual patients.

The purpose of this post is to outline how you can deploy predictive models with Amazon SageMaker for the purposes of triage within hospital settings and use SageMaker Clarify to explain these predictions. The intent is to offer an accelerated path to adoption of predictive techniques within CDSSs for many healthcare organizations.

The notebook and code from this post are available on GitHub. To run it yourself, clone the GitHub repository and open the Jupyter notebook file.

Technical background

A large asset for any acute healthcare organization is its clinical notes. At the time of intake within a hospital, admission notes are taken. A number of recent studies have shown the predictability of key indicators such as diagnoses, procedures, length of stay, and in-hospital mortality. Predictions of these are now highly achievable from admission notes alone, through the use of natural language processing (NLP) algorithms [1].

Advances in NLP models, such as Bi-directional Encoder Representations from Transformers (BERT), have allowed for highly accurate predictions on a corpus of text, such as admission notes, that was previously difficult to get value from. Their predictions of clinical indicators are highly applicable for use in a CDSS.

Yet in order to use these predictions effectively, how these accurate BERT models arrive at their predictions still needs to be explained. There are several techniques for explaining the predictions of such models. One such technique is SHAP (SHapley Additive exPlanations), a model-agnostic technique for explaining the output of ML models.

What is SHAP

SHAP is a technique for explaining the output of ML models. It provides a way to break down the prediction of an ML model and understand how much each input feature contributes to the final prediction.

SHAP values are based on game theory, specifically the concept of Shapley values, which were originally proposed to allocate the payout of a cooperative game among its players [2]. In the context of ML, each feature in the input space is considered a player in a cooperative game, and the prediction of the model is the payout. SHAP values are calculated by examining the contribution of each feature to the model prediction for each possible combination of features. The average contribution of each feature across all possible feature combinations is then calculated, and this becomes the SHAP value for that feature.
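
To make the averaging over feature combinations concrete, here is a small, self-contained Python sketch (not from the original post) that computes exact Shapley values for a toy two-feature model by brute force. Real workloads use approximations such as Kernel SHAP, which is what SageMaker Clarify implements; the toy model and feature names below are hypothetical.

# Brute-force Shapley values for a toy model (illustration only; feature names are hypothetical)
from itertools import combinations
from math import factorial

def shapley_values(features, baseline, model):
    n = len(features)
    values = {}
    for f in features:
        others = [k for k in features if k != f]
        total = 0.0
        for r in range(len(others) + 1):
            for subset in combinations(others, r):
                # Features in the subset keep their real values; everything else is set to the baseline
                without = {k: (features[k] if k in subset else baseline[k]) for k in features}
                with_f = dict(without, **{f: features[f]})
                weight = factorial(r) * factorial(n - r - 1) / factorial(n)
                total += weight * (model(with_f) - model(without))
        values[f] = total
    return values

# Toy linear "model": each Shapley value equals the feature's weighted deviation from the baseline
model = lambda x: 2 * x["age"] + 5 * x["fever"]
print(shapley_values({"age": 72, "fever": 1}, {"age": 50, "fever": 0}, model))
# {'age': 44.0, 'fever': 5.0}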

SHAP allows predictions to be explained without requiring insight into the model’s inner workings. In addition, there are techniques to display these SHAP explanations on text, so that clinicians and patients alike have intuitive visibility into how algorithms arrive at their predictions.

With new additions to SageMaker Clarify, and the use of pre-trained models from Hugging Face that are easily deployed in SageMaker, model training and explainability can all be done in AWS.

For the purpose of an end-to-end example, we take the clinical outcome of in-hospital mortality, show how this process can be implemented easily in AWS using a pre-trained Hugging Face BERT model, and explain the predictions using SageMaker Clarify.

Choices of Hugging Face model

Hugging Face offers a variety of pre-trained BERT models that have been specialized for use on clinical notes. For this post, we use the bigbird-base-mimic-mortality model. This model is a fine-tuned version of Google’s BigBird model, specifically adapted for predicting mortality using MIMIC ICU admission notes. The model’s task is to determine the likelihood of a patient not surviving a particular ICU stay based on the admission notes. One of the significant advantages of using this BigBird model is its capability to process larger context lengths, which means we can input the complete admission notes without the need for truncation.
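
Purely as an illustration (this is not a step from the post), the model can be exercised locally with the Hugging Face transformers library before it is packaged for SageMaker. The Hub repository name, the sample note, and the label order below are assumptions:

# Hedged sketch: the exact Hugging Face Hub ID and label order are assumptions
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "bigbird-base-mimic-mortality"  # assumed Hub repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# BigBird's long context lets us tokenize full admission notes without truncation
admission_note = "Patient is a 25-year-old male with a chief complaint of acute chest pain. ..."
inputs = tokenizer(admission_note, return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)
print(probs)  # probability mass over the two classes (order of survival vs. mortality is assumed)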

Our steps involve deploying this fine-tuned model on SageMaker. We then incorporate this model into a setup that allows for real-time explanation of its predictions. To achieve this level of explainability, we use SageMaker Clarify.

Solution overview

SageMaker Clarify provides ML developers with purpose-built tools to gain greater insights into their ML training data and models. SageMaker Clarify explains both global and local predictions, including decisions made by computer vision (CV) and NLP models.

The following diagram shows the SageMaker architecture for hosting an endpoint that serves explainability requests. It includes interactions between an endpoint, the model container, and the SageMaker Clarify explainer.

In the sample code, we use a Jupyter notebook to showcase the functionality. However, in a real-world use case, electronic health records (EHRs) or other hospital care applications would directly invoke the SageMaker endpoint to get the same response. In the Jupyter notebook, we deploy a Hugging Face model container to a SageMaker endpoint. Then we use SageMaker Clarify to explain the results that we obtain from the deployed model.

Prerequisites

You need the following prerequisites:

Access the code from the GitHub repository and upload it to your notebook instance. You can also run the notebook in an Amazon SageMaker Studio environment, which is an integrated development environment (IDE) for ML development. We recommend using a Python 3 (Data Science) kernel on SageMaker Studio or a conda_python3 kernel on a SageMaker notebook instance.

Deploy the model with SageMaker Clarify enabled

As the first step, download the model from Hugging Face and upload it to an Amazon Simple Storage Service (Amazon S3) bucket. Then create a model object using the HuggingFaceModel class. This uses a prebuilt container to simplify the process of deploying Hugging Face models to SageMaker. You also use a custom inference script to do the predictions within the container. The following code shows how the inference script is passed as an argument to the HuggingFaceModel class:

from sagemaker.huggingface import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=model_path_s3,
    transformers_version='4.6.1',
    pytorch_version='1.7.1',
    py_version='py36',
    role=role,
    source_dir="./{}/code".format(model_id),
    entry_point="inference.py"
)
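
The post doesn’t reproduce inference.py itself; the following is a hedged sketch of what such a script could contain (the actual script in the repository may differ). The SageMaker Hugging Face inference toolkit calls model_fn and predict_fn when they are defined:

# Hypothetical inference.py sketch; the real script in the repository may differ
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def model_fn(model_dir):
    # Load the fine-tuned model and tokenizer from the model artifacts
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    model.eval()
    return model, tokenizer

def predict_fn(data, model_and_tokenizer):
    # data: the deserialized admission note text (exact deserialization depends on the script)
    model, tokenizer = model_and_tokenizer
    inputs = tokenizer(data, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(torch.argmax(logits, dim=-1))  # 0 = low predicted mortality risk, 1 = high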

Then you can define the instance type that you deploy this model on:

instance_type = "ml.g4dn.xlarge"
container_def = huggingface_model.prepare_container_def(instance_type=instance_type)
container_def

We then populate the ExecutionRoleArn, ModelName, and PrimaryContainer fields to create a model:

model_name = "hospital-triage-model"

sagemaker_client.create_model(
ExecutionRoleArn=role,
ModelName=model_name,
PrimaryContainer=container_def,
)
print(f"Model created: {model_name}")

Next, create an endpoint configuration by calling the create_endpoint_config API. Here, you supply the same model_name used in the create_model API call. The create_endpoint_config now supports the additional parameter ClarifyExplainerConfig to enable the SageMaker Clarify explainer. The SHAP baseline is mandatory; you can provide it either as inline baseline data (the ShapBaseline parameter) or via an S3 baseline file (the ShapBaselineUri parameter). For optional parameters, see the developer guide.

In the following code, we use a special token as the baseline:

baseline = [["<UNK>"]]
print(f"SHAP baseline: {baseline}")

The TextConfig is configured with sentence-level granularity (each sentence is a feature, and we need a few sentences per note for good visualization) and the language as English:

endpoint_config_name = "hospital-triage-model-ep-config"
csv_serializer = sagemaker.serializers.CSVSerializer()
json_deserializer = sagemaker.deserializers.JSONDeserializer()

sagemaker_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "MainVariant",
            "ModelName": model_name,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
        }
    ],
    ExplainerConfig={
        "ClarifyExplainerConfig": {
            "InferenceConfig": {"FeatureTypes": ["text"]},
            "ShapConfig": {
                "ShapBaselineConfig": {"ShapBaseline": csv_serializer.serialize(baseline)},
                "TextConfig": {"Granularity": "sentence", "Language": "en"},
            },
        }
    },
)

Finally, after you have the model and endpoint configuration ready, use the create_endpoint API to create your endpoint. The endpoint_name must be unique within a Region in your AWS account. The create_endpoint API returns an immediate response with the endpoint status in the Creating state, while the endpoint itself is provisioned asynchronously.

endpoint_name = "hospital-triage-prediction-endpoint"
sagemaker_client.create_endpoint(
EndpointName=endpoint_name,
EndpointConfigName=endpoint_config_name,
)
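
Endpoint creation typically takes several minutes. A minimal sketch (not from the original post) that blocks until the endpoint is in service before sending requests:

# Wait for the endpoint to transition from Creating to InService
waiter = sagemaker_client.get_waiter("endpoint_in_service")
waiter.wait(EndpointName=endpoint_name)

status = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
print(f"Endpoint status: {status}")  # expected: InService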

Explain the prediction

Now that you have deployed the endpoint with online explainability enabled, you can try some examples. You can invoke the real-time endpoint using the invoke_endpoint method by providing the serialized payload, which in this case is some sample admission notes:

response = sagemaker_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="text/csv",
    Accept="text/csv",
    Body=csv_serializer.serialize(sample_admission_note.iloc[:1, :].to_numpy()),
)

result = json_deserializer.deserialize(response["Body"], content_type=response["ContentType"])
pprint.pprint(result)

In the first scenario, let’s assume that the following medical admission note was taken by a healthcare worker:

“Patient is a 25-year-old male with a chief complaint of acute chest pain. Patient reports the pain began suddenly while at work and has been constant since. Patient rates the pain as 8/10 in severity. Patient denies any radiation of pain, shortness of breath, nausea, or vomiting. Patient reports no previous history of chest pain. Vital signs are as follows: blood pressure 140/90 mmHg. Heart rate 92 beats per minute. Respiratory rate 18 breaths per minute. Oxygen saturation 96% on room air. Physical examination reveals mild tenderness to palpation over the precordium and clear lung fields. EKG shows sinus tachycardia with no ST-elevations or depressions.”

The following screenshot shows the model results.

After this is forwarded to the SageMaker endpoint, the label was predicted as 0, which indicates that the risk of mortality is low. In other words, 0 implies that the admitted patient is in non-acute condition according to the model. However, we need the reasoning behind that prediction. For that, you can use the SHAP values as the response. The response includes the SHAP values corresponding to the phrases of the input note, which can be further color-coded as green or red based on how the SHAP values contribute to the prediction. In this case, we see more phrases in green, such as “Patient reports no previous history of chest pain” and “EKG shows sinus tachycardia with no ST-elevations or depressions,” as opposed to red, aligning with the mortality prediction of 0.
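
As a rough illustration of the color coding (not code from the post), assume the sentence-level attributions have already been parsed from the Clarify response into (sentence, SHAP value) pairs; the values below are made up for the example:

# Print sentences color-coded by the sign of their (hypothetical) SHAP values
GREEN, RED, RESET = "\033[92m", "\033[91m", "\033[0m"

def print_colored(attributions):
    for sentence, value in attributions:
        color = RED if value > 0 else GREEN  # sign convention depends on the explained label
        print(f"{color}{sentence} ({value:+.3f}){RESET}")

print_colored([
    ("Patient reports no previous history of chest pain.", -0.21),
    ("EKG shows sinus tachycardia with no ST-elevations or depressions.", -0.15),
    ("Patient rates the pain as 8/10 in severity.", 0.07),
])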

In the second scenario, let’s assume that the following medical admission note was taken by a healthcare worker:

“Patient is a 72-year-old female with a chief complaint of severe sepsis and septic shock. Patient reports a fever, chills, and weakness for the past 3 days, as well as decreased urine output and confusion. Patient has a history of chronic obstructive pulmonary disease (COPD) and a recent hospitalization for pneumonia. Vital signs are as follows: blood pressure 80/40 mmHg. Heart rate 130 beats per minute. Respiratory rate 30 breaths per minute. Oxygen saturation 82% on 4L of oxygen via nasal cannula. Physical examination reveals diffuse erythema and warmth over the lower extremities and positive findings for sepsis such as altered mental status, tachycardia, and tachypnea. Blood cultures were taken and antibiotic therapy was started with appropriate coverage.”

The following screenshot shows our results.

After this is forwarded to the SageMaker endpoint, the label was predicted as 1, which indicates that the risk of mortality is high. This implies that the admitted patient is in acute condition according to the model. However, we need the reasoning behind that prediction. Again, you can use the SHAP values in the response. The response includes the SHAP values corresponding to the phrases of the input note, which can be further color-coded. In this case, we see more phrases in red, such as “Patient reports a fever, chills, and weakness for the past 3 days, as well as decreased urine output and confusion” and “Patient is a 72-year-old female with a chief complaint of severe sepsis and septic shock,” as opposed to green, aligning with the mortality prediction of 1.

The clinical care team can use these explanations to assist in their decisions on the care process for each individual patient.

Clean up

To clean up the resources that have been created as part of this solution, run the following statements:

huggingface_model.delete_model()

predictor = sagemaker.Predictor(endpoint_name=endpoint_name)

predictor.delete_endpoint()

Conclusion

This post showed you how to use SageMaker Clarify to explain decisions in a healthcare use case based on the medical notes captured during various stages of the triage process. This solution can be integrated into existing clinical decision support systems to provide another data point to clinicians as they evaluate patients for admission into the ICU. To learn more about using AWS services in the healthcare industry, check out the AWS Machine Learning Blog.

References

[1] https://aclanthology.org/2021.eacl-main.75/

[2] https://arxiv.org/pdf/1705.07874.pdf


About the authors

Shamika Ariyawansa, serving as a Senior AI/ML Solutions Architect in the Global Healthcare and Life Sciences division at Amazon Web Services (AWS), has a keen focus on Generative AI. He assists customers in integrating Generative AI into their projects, emphasizing the importance of explainability within their AI-driven initiatives. Beyond his professional commitments, Shamika passionately pursues skiing and off-roading adventures.

Ted Spencer is an experienced Solutions Architect with extensive acute healthcare experience. He is passionate about applying machine learning to solve new use cases, and rounds out solutions with both the end consumer and their business/clinical context in mind. He lives in Toronto, Ontario, Canada, and enjoys traveling with his family and training for triathlons as time permits.

Ram Pathangi is a Solutions Architect at AWS supporting healthcare and life sciences customers in the San Francisco Bay Area. He has helped customers in finance, healthcare, life sciences, and hi-tech verticals run their business successfully on the AWS Cloud. He specializes in Databases, Analytics, and Machine Learning.

Read More

Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler

Amazon SageMaker Data Wrangler reduces the time it takes to collect and prepare data for machine learning (ML) from weeks to minutes. You can streamline the process of feature engineering and data preparation with SageMaker Data Wrangler and finish each stage of the data preparation workflow (including data selection, cleansing, exploration, visualization, and processing at scale) within a single visual interface. Data is frequently kept in data lakes that can be managed by AWS Lake Formation, giving you the ability to implement fine-grained access control using a straightforward grant or revoke procedure. SageMaker Data Wrangler supports fine-grained data access control with Lake Formation and Amazon Athena connections.

We are happy to announce that SageMaker Data Wrangler now supports using Lake Formation with Amazon EMR to provide this fine-grained data access control.

Data professionals such as data scientists want to use the power of Apache Spark, Hive, and Presto running on Amazon EMR for fast data preparation; however, the learning curve is steep. Our customers wanted the ability to connect to Amazon EMR to run ad hoc SQL queries on Hive or Presto to query data in the internal metastore or external metastore (such as the AWS Glue Data Catalog), and prepare data within a few clicks.

In this post, we show how to use Lake Formation as a central data governance capability and Amazon EMR as a big data query engine to enable access for SageMaker Data Wrangler. The capabilities of Lake Formation simplify securing and managing distributed data lakes across multiple accounts through a centralized approach, providing fine-grained access control.
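
To make the idea of fine-grained access concrete, here is a hedged boto3 sketch of a column-level Lake Formation grant. The database, table, columns, and account ID are hypothetical; only the role name matches the roles described later in this post.

# Hypothetical column-level grant: let the marketing role SELECT only non-sensitive customer columns
import boto3

lakeformation = boto3.client("lakeformation")
lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/Marketing-data-access-role"
    },
    Resource={
        "TableWithColumns": {
            "DatabaseName": "tpc",                 # assumed database name
            "Name": "dl_tpc_customer",             # assumed table name
            "ColumnNames": ["c_customer_id", "c_birth_country", "c_preferred_cust_flag"],
        }
    },
    Permissions=["SELECT"],
)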

Solution overview

We demonstrate this solution with an end-to-end use case using a sample dataset, the TPC data model. This data represents transaction data for products and includes information such as customer demographics, inventory, web sales, and promotions. To demonstrate fine-grained data access permissions, we consider the following two users:

  • David, a data scientist on the marketing team. He is tasked with building a model on customer segmentation, and is only permitted to access non-sensitive customer data.
  • Tina, a data scientist on the sales team. She is tasked with building the sales forecast model, and needs access to sales data for the particular region. She is also helping the product team with innovation, and therefore needs access to product data as well.

The architecture is implemented as follows:

  • Lake Formation manages the data lake, and the raw data is available in Amazon Simple Storage Service (Amazon S3) buckets
  • Amazon EMR is used to query the data from the data lake and perform data preparation using Spark
  • AWS Identity and Access Management (IAM) roles are used to manage data access using Lake Formation
  • SageMaker Data Wrangler is used as the single visual interface to interactively query and prepare the data

The following diagram illustrates this architecture. Account A is the data lake account that houses all the ML-ready data obtained through extract, transform, and load (ETL) processes. Account B is the data science account where a group of data scientists compile and run data transformations using SageMaker Data Wrangler. In order for SageMaker Data Wrangler in Account B to have access to the data tables in Account A’s data lake, we must grant the necessary Lake Formation permissions.

You can use the provided AWS CloudFormation stack to set up the architectural components for this solution.

Prerequisites

Before you get started, make sure you have the following prerequisites:

  • An AWS account
  • An IAM user with administrator access
  • An S3 bucket

Provision resources with AWS CloudFormation

We provide a CloudFormation template that deploys the services in the architecture for end-to-end testing and to facilitate repeated deployments. The outputs of this template are as follows:

  • An S3 bucket for the data lake.
  • An EMR cluster with EMR runtime roles enabled. For more details on using runtime roles with Amazon EMR, see Configure runtime roles for Amazon EMR steps. Associating runtime roles with EMR clusters is supported in Amazon EMR 6.9 and later. Make sure the following configuration is in place:
    • Create a security configuration in Amazon EMR.
    • The EMR runtime role’s trust policy should allow the EMR EC2 instance profile to assume the role.
    • The EMR EC2 instance profile role should be able to assume the EMR runtime roles.
    • The EMR cluster should be created with encryption in transit.
  • IAM roles for accessing the data in data lake, with fine-grained permissions:
    • Marketing-data-access-role
    • Sales-data-access-role
  • An Amazon SageMaker Studio domain and two user profiles. The SageMaker Studio execution roles for the users allow the users to assume their corresponding EMR runtime roles.
  • A lifecycle configuration to enable the selection of the role to use for the EMR connection.
  • A Lake Formation database populated with the TPC data.
  • Networking resources required for the setup, such as VPC, subnets, and security groups.

Create Amazon EMR encryption certificates for the data in transit

With Amazon EMR release version 4.8.0 or later, you have the option of specifying artifacts for encrypting data in transit using a security configuration. We manually create PEM certificates, include them in a .zip file, upload it to an S3 bucket, and then reference the .zip file in Amazon S3. You likely want to configure the private key PEM file to be a wildcard certificate that enables access to the VPC domain in which your cluster instances reside. For example, if your cluster resides in the us-east-1 Region, you could specify a common name in the certificate configuration that allows access to the cluster by specifying CN=*.ec2.internal in the certificate subject definition. If your cluster resides in us-west-2, you could specify CN=*.us-west-2.compute.internal.

Run the following commands using your system terminal. This will generate PEM certificates and collate them into a .zip file:

openssl req -x509 -newkey rsa:1024 -keyout privateKey.pem -out certificateChain.pem -days 365 -nodes -subj '/C=US/ST=Washington/L=Seattle/O=MyOrg/OU=MyDept/CN=*.us-east-2.compute.internal'

cp certificateChain.pem trustedCertificates.pem

zip -r -X my-certs.zip certificateChain.pem privateKey.pem trustedCertificates.pem

Upload my-certs.zip to an S3 bucket in the same Region where you intend to run this exercise. Copy the S3 URI for the uploaded file. You’ll need this while launching the CloudFormation template.

This example is a proof of concept demonstration only. Using self-signed certificates is not recommended and presents a potential security risk. For production systems, use a trusted certification authority (CA) to issue certificates.

Deploy the CloudFormation template

To deploy the solution, complete the following steps:

  1. Sign in to the AWS Management Console as an IAM user, preferably an admin user.
  2. Choose Launch Stack to launch the CloudFormation template:

  3. Choose Next.

  4. For Stack name, enter a name for the stack.
  5. For IdleTimeout, enter a value for the idle timeout for the EMR cluster (to avoid paying for the cluster when it’s not being used).
  6. For S3CertsZip, enter an S3 URI with the EMR encryption key.

For instructions to generate a key and .zip file specific to your Region, refer to Providing certificates for encrypting data in transit with Amazon EMR encryption. If you are deploying in US East (N. Virginia), remember to use CN=*.ec2.internal. For more information, refer to Create keys and certificates for data encryption. Make sure to upload the .zip file to an S3 bucket in the same Region as your CloudFormation stack deployment.

  7. On the review page, select the check box to acknowledge that AWS CloudFormation might create IAM resources.
  8. Choose Create stack.

Wait until the status of the stack changes from CREATE_IN_PROGRESS to CREATE_COMPLETE. The process usually takes 10–15 minutes.

After the stack is created, allow Amazon EMR to query Lake Formation by updating the External Data Filtering settings on Lake Formation. For instructions, refer to Getting started with Lake Formation. Specify Amazon EMR for Session tag values and enter your AWS account ID under AWS account IDs.

Test data access permissions

Now that the necessary infrastructure is in place, you can verify that the two SageMaker Studio users have the intended fine-grained data access. To review, David shouldn’t have access to any sensitive information about your customers, while Tina has access to information about sales. Let’s put each user type to the test.

Test David’s user profile

To test your data access with David’s user profile, complete the following steps:

  1. On the SageMaker console, choose Domains in the navigation pane.
  2. From the SageMaker Studio domain, launch SageMaker Studio from the user profile david-non-sensitive-customer.

  3. In your SageMaker Studio environment, create an Amazon SageMaker Data Wrangler flow, and choose Import & prepare data visually.

Alternatively, on the File menu, choose New, then choose Data Wrangler flow.

We discuss these steps to create a data flow in detail later in this post.

Test Tina’s user profile

Tina’s SageMaker Studio execution role allows her to access the Lake Formation database using two EMR execution roles. This is achieved by listing the role ARNs in a configuration file in Tina’s file directory. These roles can be set using SageMaker Studio lifecycle configurations to persist the roles across app restarts. To test Tina’s access, complete the following steps:

  1. On the SageMaker console, navigate to the SageMaker Studio domain.
  2. Launch SageMaker Studio from the user profile tina-sales-electronics.

It’s a good practice to close any previous SageMaker Studio sessions on your browser when switching user profiles. There can only be one active SageMaker Studio user session at a time.

  3. Create a Data Wrangler data flow.

In the following sections, we showcase creating a data flow within SageMaker Data Wrangler and connecting to Amazon EMR as the data source. David and Tina will have similar experiences with data preparation, except for access permissions, so they will see different tables.

Create a SageMaker Data Wrangler data flow

In this section, we cover connecting to the existing EMR cluster created through the CloudFormation template as a data source in SageMaker Data Wrangler. For demonstration purposes, we use David’s user profile.

To create your data flow, complete the following steps:

  1. On the SageMaker console, choose Domains in the navigation pane.
  2. Choose StudioDomain, which was created by running the CloudFormation template.
  3. Select a user profile (for this example, David’s) and launch SageMaker Studio.

  4. Choose Open Studio.
  5. In SageMaker Studio, create a new data flow and choose Import & prepare data visually.

Alternatively, on the File menu, choose New, then choose Data Wrangler flow.

Creating a new flow can take a few minutes. After the flow has been created, you see the Import data page.

  6. To add Amazon EMR as a data source in SageMaker Data Wrangler, on the Add data source menu, choose Amazon EMR.

You can browse all the EMR clusters that your SageMaker Studio execution role has permissions to see. You have two options to connect to a cluster: one is through the interactive UI, and the other is to first create a secret using AWS Secrets Manager with a JDBC URL, including EMR cluster information, and then provide the stored AWS secret ARN in the UI to connect to Presto or Hive. In this post, we use the first method.
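
For reference, a hedged sketch of the second option (not used in this walkthrough): store the connection details in Secrets Manager and paste the returned ARN into the Data Wrangler UI. The secret name and the JDBC URL below are placeholders; check the SageMaker Data Wrangler documentation for the exact format it expects.

# Hypothetical secret for the JDBC-based connection option
import boto3

secrets = boto3.client("secretsmanager")
response = secrets.create_secret(
    Name="emr-data-wrangler-connection",  # assumed secret name
    SecretString="jdbc:presto://<emr-primary-node-dns>:8889/hive",  # placeholder URL format
)
print(response["ARN"])  # provide this ARN in the Data Wrangler connection dialog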

  7. Select any of the clusters that you want to use, then choose Next.

  8. Select which endpoint you want to use.
  9. Enter a name to identify your connection, such as emr-iam-connection, then choose Next.

  10. Select IAM as your authentication type and choose Connect.

When you’re connected, you can interactively view a database tree and table preview or schema. You can also query, explore, and visualize data from Amazon EMR. For a preview, you see a limit of 100 records by default. After you provide a SQL statement in the query editor and choose Run, the query is run on the Amazon EMR Hive engine to preview the data. Choose Cancel query to cancel ongoing queries if they are taking an unusually long time.

  11. Let’s access data from the table that David doesn’t have permissions to.

The query will result in the error message “Unable to fetch table dl_tpc_web_sales. Insufficient Lake Formation permission(s) on dl_tpc_web_sales.”

The last step is to import the data. When you are ready with the queried data, you have the option to update the sampling settings for the data selection according to the sampling type (FirstK, Random, or Stratified) and the sampling size for importing data into Data Wrangler.

  12. Choose Import to import the data.

On the next page, you can add various transformations and essential analysis to the dataset.

  13. Navigate to the data flow and add more steps to the flow as needed for transformations and analysis.

You can run a data insight report to identify data quality issues and get recommendations to fix those issues. Let’s look at some example transforms.

  14. In the Data flow view, you should see that we are using Amazon EMR as a data source using the Hive connector.

  15. Choose the plus sign next to Data types and choose Add transform.

Let’s explore the data and apply a transformation. For example, the c_login column is empty and it will not add value as a feature. Let’s delete the column.

  16. In the All steps pane, choose Add step.
  17. Choose Manage columns.

  18. For Transform, choose Drop column.
  19. For Columns to drop, choose the c_login column.
  20. Choose Preview, then choose Add.

  21. Verify the step by expanding the Drop column section.

You can continue adding steps based on the different transformations required for your dataset. Let’s go back to our data flow. You can now see the Drop column block showing the transform we performed.

ML practitioners spend a lot of time crafting feature engineering code, applying it to their initial datasets, training models on the engineered datasets, and evaluating model accuracy. Given the experimental nature of this work, even the smallest project will lead to multiple iterations. The same feature engineering code is often run again and again, wasting time and compute resources on repeating the same operations. In large organizations, this can cause an even greater loss of productivity because different teams often run identical jobs or even write duplicate feature engineering code because they have no knowledge of prior work. To avoid the reprocessing of features, we can export our transformed features to Amazon SageMaker Feature Store. For more information, refer to New – Store, Discover, and Share Machine Learning Features with Amazon SageMaker Feature Store.

  22. Choose the plus sign next to Drop column.
  23. Choose Export to and SageMaker Feature Store (via Jupyter notebook).

You can easily export your generated features to SageMaker Feature Store by specifying it as the destination. You can save the features into an existing feature group or create a new one. For more information, refer to Easily create and store features in Amazon SageMaker without code.
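
A hedged sketch of what the generated notebook typically ends up doing: register the transformed features in a feature group. The feature group name, record identifier, and DataFrame are hypothetical, and the DataFrame is assumed to contain an event_time column.

# Hypothetical Feature Store export; names and columns are assumptions
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
feature_group = FeatureGroup(name="customer-features", sagemaker_session=session)

feature_group.load_feature_definitions(data_frame=transformed_df)  # pandas DataFrame from the flow
feature_group.create(
    s3_uri=f"s3://{session.default_bucket()}/feature-store",
    record_identifier_name="c_customer_sk",      # assumed identifier column
    event_time_feature_name="event_time",
    role_arn=sagemaker.get_execution_role(),
    enable_online_store=True,
)
feature_group.ingest(data_frame=transformed_df, max_workers=3, wait=True)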

We have now created features with SageMaker Data Wrangler and stored those features in SageMaker Feature Store. We showed an example workflow for feature engineering in the SageMaker Data Wrangler UI.

Clean up

If your work with SageMaker Data Wrangler is complete, delete the resources you created to avoid incurring additional fees.

  1. In SageMaker Studio, close all the tabs, then on the File menu, choose Shut Down.

  2. When prompted, choose Shutdown All.

Shutdown might take a few minutes based on the instance type. Make sure all the apps associated with each user profile were deleted. If they were not, manually delete the apps associated with each user profile created using the CloudFormation template.

  3. On the Amazon S3 console, empty any S3 buckets that were created from the CloudFormation template when provisioning clusters.

The buckets should have the same prefix as the CloudFormation launch stack name, as well as the cf-templates- prefix.

  4. On the Amazon EFS console, delete the SageMaker Studio file system.

You can confirm that you have the correct file system by choosing the file system ID and confirming the tag ManagedByAmazonSageMakerResource on the Tags tab.

  5. On the AWS CloudFormation console, select the stack you created and choose Delete.

You’ll receive an error message, which is expected. We’ll come back to this and clean it up in the subsequent steps.

  6. Identify the VPC that was created by the CloudFormation stack, named dw-emr-, and follow the prompts to delete the VPC.

  7. Return to the AWS CloudFormation console and retry the stack deletion for dw-emr-.

All the resources provisioned by the CloudFormation template described in this post have now been removed from your account.

Conclusion

In this post, we went over how to apply fine-grained access control with Lake Formation and access the data using Amazon EMR as a data source in SageMaker Data Wrangler, how to transform and analyze a dataset, and how to export the results from the data flow to SageMaker Feature Store via a Jupyter notebook. After visualizing our dataset using SageMaker Data Wrangler’s built-in analytical features, we further enhanced our data flow. The fact that we created a data preparation pipeline without writing a single line of code is significant.

To get started with SageMaker Data Wrangler, refer to Prepare ML Data with Amazon SageMaker Data Wrangler.


About the Authors

Ajjay Govindaram is a Senior Solutions Architect at AWS. He works with strategic customers who are using AI/ML to solve complex business problems. His experience lies in providing technical direction as well as design assistance for modest to large-scale AI/ML application deployments. His knowledge ranges from application architecture to big data, analytics, and machine learning. He enjoys listening to music while resting, experiencing the outdoors, and spending time with his loved ones.

Isha Dua is a Senior Solutions Architect based in the San Francisco Bay Area. She helps AWS enterprise customers grow by understanding their goals and challenges, and guides them on how they can architect their applications in a cloud-native manner while ensuring resilience and scalability. She’s passionate about machine learning technologies and environmental sustainability.

Parth Patel is a Senior Solutions Architect at AWS in the San Francisco Bay Area. Parth guides enterprise customers to accelerate their journey to the cloud and help them adopt and grow on the AWS Cloud successfully. He is passionate about machine learning technologies, environmental sustainability, and application modernization.

Read More

NVIDIA Chief Scientist Bill Dally to Keynote at Hot Chips

Bill Dally — one of the world’s foremost computer scientists and head of NVIDIA’s research efforts — will describe the forces driving accelerated computing and AI in his keynote address at Hot Chips, an annual gathering of leading processor and system architects.

Dally will detail advances in GPU silicon, systems and software that are delivering unprecedented performance gains for a wide range of applications. The talk will show how techniques such as mixed-precision computing, high-speed interconnects and sparsity can take the large language models driving generative AI forward to the next level.

“It’s a really exciting time to be a computer engineer,” said Dally in February, when he was inducted into the Silicon Valley Engineering Council’s Hall of Fame.

Dally’s keynote will kick off the third day of Hot Chips at 9 a.m. PT on Aug. 29.

Registration is available online to attend the event virtually. The live event at Stanford University, in Palo Alto, is already sold out.

In a career spanning nearly four decades, Dally has pioneered many of the fundamental technologies underlying today’s supercomputer and networking architectures. As head of NVIDIA Research, he leads a team of more than 300 around the globe who are inventing technologies for a wide variety of applications, including AI, HPC, graphics and networking.

Prior to joining NVIDIA in 2009 as chief scientist and senior vice president of research, he chaired Stanford University’s computer science department for some four years.

Dally is a member of the National Academy of Engineering and a fellow of the American Academy of Arts & Sciences, the Institute of Electrical and Electronics Engineers and the Association for Computing Machinery.

He’s written four textbooks, published more than 250 papers, and holds over 120 patents. He has received the IEEE Seymour Cray Award, the ACM Eckert-Mauchly Award and the ACM Maurice Wilkes Award.

More NVIDIA Talks at Hot Chips

In a separate Hot Chips talk, Kevin Deierling, vice president of networking at NVIDIA, will describe the flexibility of NVIDIA BlueField DPUs and NVIDIA Spectrum networking switches for allocating resources based on changing network traffic and user rules.

A new benchmark result for the NVIDIA Grace CPU Superchip will be part of a talk by Arm on leadership performance and power efficiency for next-generation cloud computing.

The event begins Sunday, Aug. 27, with a full day of tutorials, including talks from NVIDIA experts on AI inference and chip-to-chip interconnects.

Read More

Google at Interspeech 2023

This week, the 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023) is being held in Dublin, Ireland. It is one of the world’s most extensive conferences on the research and technology of spoken language understanding and processing. Experts in speech-related research fields gather to take part in oral presentations and poster sessions and to build collaborations across the globe.

We are excited to be a Platinum Sponsor of INTERSPEECH 2023, where we will be showcasing more than 20 research publications and supporting a number of workshops and special sessions. We welcome in-person attendees to drop by the Google Research booth to meet our researchers and participate in Q&As and demonstrations of some of our latest speech technologies, which help to improve accessibility and provide convenience in communication for billions of users. In addition, online attendees are encouraged to visit our virtual booth in Topia where you can get up-to-date information on research and opportunities at Google. Visit the @GoogleAI Twitter account to find out about Google booth activities (e.g., demos and Q&A sessions). You can also learn more about the Google research being presented at INTERSPEECH 2023 below (Google affiliations in bold).

Board and Organizing Committee

ISCA Board, Technical Committee Chair: Bhuvana Ramabhadran

Area Chairs include:
    Analysis of Speech and Audio Signals: Richard Rose

    Speech Synthesis and Spoken Language Generation: Rob Clark

    Special Areas: Tara Sainath

Satellite events

VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC-23)

Organizers include: Arsha Nagrani

ISCA Speech Synthesis Workshop (SSW12)

Speakers include: Rob Clark

Keynote talk – ISCA Medalist

Survey Talk

Speech Compression in the AI Era

Speaker: Jan Skoglund

Special session papers

Cascaded Encoders for Fine-Tuning ASR Models on Overlapped Speech
Richard Rose, Oscar Chang, Olivier Siohan

TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition
Hakan Erdogan, Scott Wisdom, Xuankai Chang*, Zalán Borsos, Marco Tagliasacchi, Neil Zeghidour, John R. Hershey

Papers

DeePMOS: Deep Posterior Mean-Opinion-Score of Speech
Xinyu Liang, Fredrik Cumlin, Christian Schüldt, Saikat Chatterjee

O-1: Self-Training with Oracle and 1-Best Hypothesis
Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Kartik Audhkhasi

Re-investigating the Efficient Transfer Learning of Speech Foundation Model Using Feature Fusion Methods
Zhouyuan Huo, Khe Chai Sim, Dongseong Hwang, Tsendsuren Munkhdalai, Tara N. Sainath, Pedro Moreno

MOS vs. AB: Evaluating Text-to-Speech Systems Reliably Using Clustered Standard Errors
Joshua Camp, Tom Kenter, Lev Finkelstein, Rob Clark

LanSER: Language-Model Supported Speech Emotion Recognition
Taesik Gong, Josh Belanich, Krishna Somandepalli, Arsha Nagrani, Brian Eoff, Brendan Jou

Modular Domain Adaptation for Conformer-Based Streaming ASR
Qiujia Li, Bo Li, Dongseong Hwang, Tara N. Sainath, Pedro M. Mengibar

On Training a Neural Residual Acoustic Echo Suppressor for Improved ASR
Sankaran Panchapagesan, Turaj Zakizadeh Shabestary, Arun Narayanan

MD3: The Multi-dialect Dataset of Dialogues
Jacob Eisenstein, Vinodkumar Prabhakaran, Clara Rivera, Dorottya Demszky, Devyani Sharma

Dual-Mode NAM: Effective Top-K Context Injection for End-to-End ASR
Zelin Wu, Tsendsuren Munkhdalai, Pat Rondon, Golan Pundak, Khe Chai Sim, Christopher Li

Using Text Injection to Improve Recognition of Personal Identifiers in Speech
Yochai Blau, Rohan Agrawal, Lior Madmony, Gary Wang, Andrew Rosenberg, Zhehuai Chen, Zorik Gekhman, Genady Beryozkin, Parisa Haghani, Bhuvana Ramabhadran

How to Estimate Model Transferability of Pre-trained Speech Models?
Zih-Ching Chen, Chao-Han Huck Yang*, Bo Li, Yu Zhang, Nanxin Chen, Shuo-yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath

Improving Joint Speech-Text Representations Without Alignment
Cal Peyser, Zhong Meng, Ke Hu, Rohit Prabhavalkar, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho

Text Injection for Capitalization and Turn-Taking Prediction in Speech Models
Shaan Bijwadia, Shuo-yiin Chang, Weiran Wang, Zhong Meng, Hao Zhang, Tara N. Sainath

Streaming Parrotron for On-Device Speech-to-Speech Conversion
Oleg Rybakov, Fadi Biadsy, Xia Zhang, Liyang Jiang, Phoenix Meadowlark, Shivani Agrawal

Semantic Segmentation with Bidirectional Language Models Improves Long-Form ASR
W. Ronny Huang, Hao Zhang, Shankar Kumar, Shuo-yiin Chang, Tara N. Sainath

Universal Automatic Phonetic Transcription into the International Phonetic Alphabet
Chihiro Taguchi, Yusuke Sakai, Parisa Haghani, David Chiang

Mixture-of-Expert Conformer for Streaming Multilingual ASR
Ke Hu, Bo Li, Tara N. Sainath, Yu Zhang, Francoise Beaufays

Real Time Spectrogram Inversion on Mobile Phone
Oleg Rybakov, Marco Tagliasacchi, Yunpeng Li, Liyang Jiang, Xia Zhang, Fadi Biadsy

2-Bit Conformer Quantization for Automatic Speech Recognition
Oleg Rybakov, Phoenix Meadowlark, Shaojin Ding, David Qiu, Jian Li, David Rim, Yanzhang He

LibriTTS-R: A Restored Multi-speaker Text-to-Speech Corpus
Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna

PronScribe: Highly Accurate Multimodal Phonemic Transcription from Speech and Text
Yang Yu, Matthew Perez*, Ankur Bapna, Fadi Haik, Siamak Tazari, Yu Zhang

Label Aware Speech Representation Learning for Language Identification
Shikhar Vashishth, Shikhar Bharadwaj, Sriram Ganapathy, Ankur Bapna, Min Ma, Wei Han, Vera Axelrod, Partha Talukdar


* Work done while at Google

Read More

Autonomous visual information seeking with large language models

There has been great progress towards adapting large language models (LLMs) to accommodate multimodal inputs for tasks including image captioning, visual question answering (VQA), and open vocabulary recognition. Despite such achievements, current state-of-the-art visual language models (VLMs) perform inadequately on visual information seeking datasets, such as Infoseek and OK-VQA, where external knowledge is required to answer the questions.

Examples of visual information seeking queries where external knowledge is required to answer the question. Images are taken from the OK-VQA dataset.

In “AVIS: Autonomous Visual Information Seeking with Large Language Models”, we introduce a novel method that achieves state-of-the-art results on visual information seeking tasks. Our method integrates LLMs with three types of tools: (i) computer vision tools for extracting visual information from images, (ii) a web search tool for retrieving open world knowledge and facts, and (iii) an image search tool to glean relevant information from metadata associated with visually similar images. AVIS employs an LLM-powered planner to choose tools and queries at each step. It also uses an LLM-powered reasoner to analyze tool outputs and extract key information. A working memory component retains information throughout the process.

An example of AVIS’s generated workflow for answering a challenging visual information seeking question. The input image is taken from the Infoseek dataset.

Comparison to previous work

Recent studies (e.g., Chameleon, ViperGPT and MM-ReAct) explored adding tools to LLMs for multimodal inputs. These systems follow a two-stage process: planning (breaking down questions into structured programs or instructions) and execution (using tools to gather information). Despite success in basic tasks, this approach often falters in complex real-world scenarios.

There has also been a surge of interest in applying LLMs as autonomous agents (e.g., WebGPT and ReAct). These agents interact with their environment, adapt based on real-time feedback, and achieve goals. However, these methods do not restrict the tools that can be invoked at each stage, leading to an immense search space. Consequently, even the most advanced LLMs today can fall into infinite loops or propagate errors. AVIS tackles this via guided LLM use, influenced by human decisions from a user study.

Informing LLM decision making with a user study

Many of the visual questions in datasets such as Infoseek and OK-VQA pose a challenge even for humans, often requiring the assistance of various tools and APIs. An example question from the OK-VQA dataset is shown below. We conducted a user study to understand human decision-making when using external tools.

We conducted a user study to understand human decision-making when using external tools. Image is taken from the OK-VQA dataset.

The users were equipped with an identical set of tools as our method, including PALI, PaLM, and web search. They received input images, questions, detected object crops, and buttons linked to image search results. These buttons offered diverse information about the detected object crops, such as knowledge graph entities, similar image captions, related product titles, and identical image captions.

We record user actions and outputs and use it as a guide for our system in two key ways. First, we construct a transition graph (shown below) by analyzing the sequence of decisions made by users. This graph defines distinct states and restricts the available set of actions at each state. For example, at the start state, the system can take only one of these three actions: PALI caption, PALI VQA, or object detection. Second, we use the examples of human decision-making to guide our planner and reasoner with relevant contextual instances to enhance the performance and effectiveness of our system.

AVIS transition graph.
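
A rough sketch (not the authors’ code) of how such a transition graph can restrict the planner’s action space; the start-state actions follow the example above, while the remaining edges and names are illustrative:

# Illustrative transition graph; only the START actions come from the text above
TRANSITIONS = {
    "START": ["PALI_CAPTION", "PALI_VQA", "OBJECT_DETECTION"],
    "OBJECT_DETECTION": ["IMAGE_SEARCH", "PALI_VQA"],   # illustrative edges
    "IMAGE_SEARCH": ["WEB_SEARCH", "LLM_REASONING"],
    "WEB_SEARCH": ["LLM_REASONING"],
}

def allowed_actions(state, already_taken):
    # Candidate next actions: valid from this state and not executed before
    return [a for a in TRANSITIONS.get(state, []) if a not in already_taken]

print(allowed_actions("START", already_taken={"PALI_CAPTION"}))
# ['PALI_VQA', 'OBJECT_DETECTION']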

General framework

Our approach employs a dynamic decision-making strategy designed to respond to visual information-seeking queries. Our system has three primary components. First, we have a planner to determine the subsequent action, including the appropriate API call and the query it needs to process. Second, we have a working memory that retains information about the results obtained from API executions. Last, we have a reasoner, whose role is to process the outputs from the API calls. It determines whether the obtained information is sufficient to produce the final response, or if additional data retrieval is required.

The planner undertakes a series of steps each time a decision is required regarding which tool to employ and what query to send to it. Based on the present state, the planner provides a range of potential subsequent actions. The potential action space may be so large that it makes the search space intractable. To address this issue, the planner refers to the transition graph to eliminate irrelevant actions. The planner also excludes the actions that have already been taken before and are stored in the working memory.

Next, the planner collects a set of relevant in-context examples that are assembled from the decisions previously made by humans during the user study. With these examples and the working memory that holds data collected from past tool interactions, the planner formulates a prompt. The prompt is then sent to the LLM, which returns a structured answer, determining the next tool to be activated and the query to be dispatched to it. This design allows the planner to be invoked multiple times throughout the process, thereby facilitating dynamic decision-making that gradually leads to answering the input query.

We employ a reasoner to analyze the output of the tool execution, extract the useful information and decide into which category the tool output falls: informative, uninformative, or final answer. Our method utilizes the LLM with appropriate prompting and in-context examples to perform the reasoning. If the reasoner concludes that it’s ready to provide an answer, it will output the final response, thus concluding the task. If it determines that the tool output is uninformative, it will revert back to the planner to select another action based on the current state. If it finds the tool output to be useful, it will modify the state and transfer control back to the planner to make a new decision at the new state.
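
Putting the pieces together, a hedged pseudocode sketch of the planner-reasoner loop described above; planner, execute_tool, reasoner, and next_state are hypothetical helpers standing in for the LLM prompts and tool calls:

# Hedged pseudocode of the AVIS control loop; helper functions are hypothetical
def avis_answer(question, image, max_steps=10):
    state, memory = "START", []
    for _ in range(max_steps):
        # Planner: prompt the LLM with allowed actions, in-context human examples, and memory
        tool, query = planner(state, question, image, memory)
        output = execute_tool(tool, query)

        # Reasoner: classify the tool output as informative, uninformative, or final answer
        verdict, extracted = reasoner(question, tool, output, memory)
        if verdict == "final answer":
            return extracted
        if verdict == "informative":
            memory.append((tool, query, extracted))
            state = next_state(state, tool)
        # If uninformative, stay in the same state and let the planner pick another action
    return None  # no answer within the step budget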

AVIS employs a dynamic decision-making strategy to respond to visual information-seeking queries.

Results

We evaluate AVIS on Infoseek and OK-VQA datasets. As shown below, even robust visual-language models, such as OFA and PaLI, fail to yield high accuracy when fine-tuned on Infoseek. Our approach (AVIS), without fine-tuning, achieves 50.7% accuracy on the unseen entity split of this dataset.

AVIS visual question answering results on Infoseek dataset. AVIS achieves higher accuracy in comparison to previous baselines based on PaLI, PaLM and OFA.

Our results on the OK-VQA dataset are shown below. AVIS with few-shot in-context examples achieves an accuracy of 60.2%, higher than most of the previous works. AVIS achieves lower but comparable accuracy in comparison to the PALI model fine-tuned on OK-VQA. This difference, compared to Infoseek where AVIS outperforms fine-tuned PALI, is due to the fact that most question-answer examples in OK-VQA rely on common sense knowledge rather than on fine-grained knowledge. Therefore, PaLI is able to encode such generic knowledge in the model parameters and doesn’t require external knowledge.

Visual question answering results on A-OKVQA. AVIS achieves higher accuracy in comparison to previous works that use few-shot or zero-shot learning, including Flamingo, PaLI and ViperGPT. AVIS also achieves higher accuracy than most of the previous works that are fine-tuned on OK-VQA dataset, including REVEAL, ReVIVE, KAT and KRISP, and achieves results that are close to the fine-tuned PaLI model.

Conclusion

We present a novel approach that equips LLMs with the ability to use a variety of tools for answering knowledge-intensive visual questions. Our methodology, anchored in human decision-making data collected from a user study, employs a structured framework that uses an LLM-powered planner to dynamically decide on tool selection and query formation. An LLM-powered reasoner is tasked with processing and extracting key information from the output of the selected tool. Our method iteratively employs the planner and reasoner to leverage different tools until all necessary information required to answer the visual question is amassed.

Acknowledgements

This research was conducted by Ziniu Hu, Ahmet Iscen, Chen Sun, Kai-Wei Chang, Yizhou Sun, David A. Ross, Cordelia Schmid and Alireza Fathi.

Read More

The TensorFlow Lite Plugin for Flutter is Officially Available

Posted by Paul Ruiz, Developer Relations Engineer

We’re excited to announce that the TensorFlow Lite plugin for Flutter has been officially migrated to the TensorFlow GitHub account and released!

Three years ago, Amish Garg, one of our talented Google Summer of Code contributors, wrote a widely used TensorFlow Lite plugin for Flutter. The plugin was so popular that we decided to migrate it to our official repo, making it easier to maintain directly by the Google team. We are grateful to Amish for his contributions to the TensorFlow Lite Flutter plugin.

Through the efforts of developers in the community, the plugin has been updated to the latest version of TensorFlow Lite, and a collection of new features and example apps have been added, such as object detection through a live camera feed.

Moving image of a live camera feed showing several objects on a work desk being detected

So what is TensorFlow Lite? TensorFlow Lite is a way to run TensorFlow models on devices locally, supporting mobile, embedded, web, and edge devices. TensorFlow Lite’s cross-platform support and on-device performance optimizations make it a great addition to the Flutter development toolbox. Our goal with this plugin is to make it easy to integrate TensorFlow Lite models into Flutter apps across mobile platforms, with desktop support currently in development through the efforts of our developer community. Find pre-trained TensorFlow Lite models on model repos like Kaggle Models or create your own custom TensorFlow Lite models.

Let’s take a look at how you could use the Flutter TensorFlow Lite plugin for image classification:

TensorFlow Lite Image Classification with Flutter

First you will need to install the plugin from pub.dev. Once the plugin is installed, you can load a TensorFlow Lite model into your Flutter app and define the input and output tensor shapes. If you’re using the MobileNet model, then the input tensor will be a 224 by 224 RGB image, and the output will be a list of confidence scores for the trained labels.

// Load model
Future<void> _loadModel() async {
  final options = InterpreterOptions();

  // Load model from assets
  interpreter = await Interpreter.fromAsset(modelPath, options: options);
  // Get tensor input shape [1, 224, 224, 3]
  inputTensor = interpreter.getInputTensors().first;
  // Get tensor output shape [1, 1001]
  outputTensor = interpreter.getOutputTensors().first;
}

To make things a bit more organized, you can also load in the labels for the 1000 items that MobileNet is trained for:

// Load labels from assets
Future<void> _loadLabels() async {
  final labelTxt = await rootBundle.loadString(labelsPath);
  labels = labelTxt.split('\n');
}

To keep things succinct, let’s go ahead and skip some of the pre-processing steps, though you can find them in the repo’s image classification example here.

When you’re ready to run inference, you can create a new input and output based on the tensor shapes that you defined earlier, then call run on the interpreter to get your final results.

// Run inference
Future<void> runInference(
  List<List<List<num>>> imageMatrix,
) async {
  // Tensor input [1, 224, 224, 3]
  final input = [imageMatrix];
  // Tensor output [1, 1001]
  final output = [List<int>.filled(1001, 0)];

  // Run inference
  interpreter.run(input, output);

  // Get first output tensor
  final result = output.first;
}

Now that you have your results, you can match them to your labels and use them in your app.

Moving image of a live camera feed showing several objects on a work desk being correctly identified in the app

What’s next?

To explore what else you can do with the Flutter TensorFlow Lite plugin, check out the official GitHub repository where you can find examples for text classification, super resolution, style transfer, and more!

Additionally, we are working on a new plugin specifically for MediaPipe Tasks, a low-code tool for easily performing common on-device machine learning tasks. This includes image classification and object detection, like you’ve just learned about, as well as audio classification, face landmark detection, and gesture recognition, alongside a whole lot more.

We look forward to all the exciting things you make, so be sure to share them with @googledevs, @TensorFlow, and your developer communities!

Read More