Neural network pruning with combinatorial optimization

Modern neural networks have achieved impressive performance across a variety of applications, such as language, mathematical reasoning, and vision. However, these networks often use large architectures that require substantial computational resources. This can make it impractical to serve such models to users, especially in resource-constrained environments like wearables and smartphones. A widely used approach to mitigating the inference costs of pre-trained networks is to prune them by removing some of their weights in a way that doesn’t significantly affect utility. In standard neural networks, each weight defines a connection between two neurons, so after weights are pruned, the input propagates through a smaller set of connections and thus requires less computation.

Original network vs. a pruned network.

Pruning methods can be applied at different stages of the network’s training process: post, during, or before training (i.e., immediately after weight initialization). In this post, we focus on the post-training setting: given a pre-trained network, how can we determine which weights should be pruned? One popular method is magnitude pruning, which removes weights with the smallest magnitude. While efficient, this method doesn’t directly consider the effect of removing weights on the network’s performance. Another popular paradigm is optimization-based pruning, which removes weights based on how much their removal impacts the loss function. Although conceptually appealing, most existing optimization-based approaches seem to face a serious tradeoff between performance and computational requirements. Methods that make crude approximations (e.g., assuming a diagonal Hessian matrix) can scale well, but have relatively low performance. On the other hand, while methods that make fewer approximations tend to perform better, they appear to be much less scalable.

In “Fast as CHITA: Neural Network Pruning with Combinatorial Optimization”, presented at ICML 2023, we describe how we developed an optimization-based approach for pruning pre-trained neural networks at scale. CHITA (which stands for “Combinatorial Hessian-free Iterative Thresholding Algorithm”) outperforms existing pruning methods in terms of scalability and performance tradeoffs, and it does so by leveraging advances from several fields, including high-dimensional statistics, combinatorial optimization, and neural network pruning. For example, CHITA can be 20x to 1000x faster than state-of-the-art methods for pruning ResNet and improves accuracy by over 10% in many settings.

Overview of contributions

CHITA has two notable technical improvements over popular methods:

  • Efficient use of second-order information: Pruning methods that use second-order information (i.e., relating to second derivatives) achieve the state of the art in many settings. In the literature, this information is typically used by computing the Hessian matrix or its inverse, an operation that is very difficult to scale because the Hessian size is quadratic with respect to the number of weights. Through careful reformulation, CHITA uses second-order information without having to compute or store the Hessian matrix explicitly, thus allowing for more scalability.
  • Combinatorial optimization: Popular optimization-based methods use a simple optimization technique that prunes weights in isolation, i.e., when deciding to prune a certain weight they don’t take into account whether other weights have been pruned. This could lead to pruning important weights because weights deemed unimportant in isolation may become important when other weights are pruned. CHITA avoids this issue by using a more advanced, combinatorial optimization algorithm that takes into account how pruning one weight impacts others.

In the sections below, we discuss CHITA’s pruning formulation and algorithms.

A computation-friendly pruning formulation

There are many possible pruning candidates, which are obtained by retaining only a subset of the weights from the original network. Let k be a user-specified parameter that denotes the number of weights to retain. Pruning can be naturally formulated as a best-subset selection (BSS) problem: among all possible pruning candidates (i.e., subsets of weights) with only k weights retained, the candidate that has the smallest loss is selected.
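In symbols (a sketch using our own notation, not taken verbatim from the paper), with \(w \in \mathbb{R}^p\) denoting the network weights and \(L\) the loss, the pruning BSS problem is

\[
\min_{w \in \mathbb{R}^p} \; L(w) \quad \text{subject to} \quad \|w\|_0 \le k,
\]

where \(\|w\|_0\) counts the nonzero entries of \(w\), i.e., the weights that are retained.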

Pruning as a BSS problem: among all possible pruning candidates with the same total number of weights, the best candidate is defined as the one with the least loss. This illustration shows four candidates, but this number is generally much larger.

Solving the pruning BSS problem on the original loss function is generally computationally intractable. Thus, similar to previous work, such as OBD and OBS, we approximate the loss with a quadratic function by using a second-order Taylor series, where the Hessian is estimated with the empirical Fisher information matrix. While gradients can be typically computed efficiently, computing and storing the Hessian matrix is prohibitively expensive due to its sheer size. In the literature, it is common to deal with this challenge by making restrictive assumptions on the Hessian (e.g., diagonal matrix) and also on the algorithm (e.g., pruning weights in isolation).
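Concretely (again a sketch in our notation), the quadratic approximation around the pre-trained weights \(\bar{w}\) is

\[
L(w) \;\approx\; L(\bar{w}) + \nabla L(\bar{w})^\top (w - \bar{w}) + \tfrac{1}{2}\,(w - \bar{w})^\top \hat{H}\,(w - \bar{w}),
\qquad
\hat{H} \;=\; \frac{1}{n} \sum_{i=1}^{n} \nabla \ell_i(\bar{w})\, \nabla \ell_i(\bar{w})^\top,
\]

where \(\ell_i\) is the loss on the \(i\)-th sample of a batch of size \(n\) and \(\hat{H}\) is the empirical Fisher estimate of the Hessian. Storing \(\hat{H}\) explicitly would require a \(p \times p\) matrix.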

CHITA uses an efficient reformulation of the pruning problem (BSS using the quadratic loss) that avoids explicitly computing the Hessian matrix, while still using all the information from this matrix. This is made possible by exploiting the low-rank structure of the empirical Fisher information matrix. This reformulation can be viewed as a sparse linear regression problem, where each regression coefficient corresponds to a certain weight in the neural network. After obtaining a solution to this regression problem, coefficients set to zero will correspond to weights that should be pruned. Our regression data matrix is (n x p), where n is the batch (sub-sample) size and p is the number of weights in the original network. Typically n << p, so storing and operating with this data matrix is much more scalable than common pruning approaches that operate with the (p x p) Hessian.
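One way to see the reformulation (a sketch that omits details such as how the gradient term and any regularization are handled): let \(A \in \mathbb{R}^{n \times p}\) be the matrix whose \(i\)-th row is \(\nabla \ell_i(\bar{w})^\top\), so that \(\hat{H} = \tfrac{1}{n} A^\top A\). The quadratic term then becomes a least-squares objective,

\[
\tfrac{1}{2}\,(w - \bar{w})^\top \hat{H}\,(w - \bar{w}) \;=\; \frac{1}{2n}\,\| A\bar{w} - A w \|_2^2,
\]

i.e., a linear regression with response \(y = A\bar{w}\) and design matrix \(A\). Only the n x p matrix \(A\) is ever stored, never the p x p Hessian, and the coefficients of \(w\) that end up at zero correspond to pruned weights.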

CHITA reformulates the quadratic loss approximation, which requires an expensive Hessian matrix, as a linear regression (LR) problem. The LR’s data matrix is linear in p, which makes the reformulation more scalable than the original quadratic approximation.

Scalable optimization algorithms

CHITA reduces pruning to a linear regression problem under the following sparsity constraint: at most k regression coefficients can be nonzero. To obtain a solution to this problem, we consider a modification of the well-known iterative hard thresholding (IHT) algorithm. IHT performs gradient descent and, after each update, applies the following post-processing step: all regression coefficients outside the top-k (i.e., all but the k coefficients with the largest magnitude) are set to zero. IHT typically delivers a good solution to the problem, and it does so by iteratively exploring different pruning candidates and jointly optimizing over the weights.
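To make the mechanics concrete, here is a minimal NumPy sketch of plain IHT for the sparsity-constrained least-squares problem above, with a constant learning rate and none of CHITA’s refinements (an illustration, not the paper’s implementation):

import numpy as np

def iht_least_squares(A, y, k, step=None, iters=200):
    """Iterative hard thresholding for: min_w 0.5 * ||y - A w||^2  s.t.  ||w||_0 <= k."""
    n, p = A.shape
    if step is None:
        # Conservative constant step size: 1 / (largest eigenvalue of A^T A).
        step = 1.0 / (np.linalg.norm(A, 2) ** 2)
    w = np.zeros(p)
    for _ in range(iters):
        grad = A.T @ (A @ w - y)      # gradient of the least-squares loss
        w = w - step * grad           # gradient step, jointly updating all coefficients
        # Hard thresholding: keep the k largest-magnitude coefficients, zero out the rest.
        keep = np.argpartition(np.abs(w), p - k)[p - k:]
        pruned = np.ones(p, dtype=bool)
        pruned[keep] = False
        w[pruned] = 0.0
    return w

In the pruning setting, y plays the role of A times the pre-trained weights from the reformulation above, and the support of the returned w indicates which weights are retained.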

Due to the scale of the problem, standard IHT with constant learning rate can suffer from very slow convergence. For faster convergence, we developed a new line-search method that exploits the problem structure to find a suitable learning rate, i.e., one that leads to a sufficiently large decrease in the loss. We also employed several computational schemes to improve CHITA’s efficiency and the quality of the second-order approximation, leading to an improved version that we call CHITA++.
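As an illustration of why the quadratic structure helps (a generic fact about least squares, not CHITA’s exact line-search rule, which also has to account for the thresholding step): for \(f(w) = \tfrac{1}{2}\|y - Aw\|_2^2\), the loss along any search direction \(d\) is a one-dimensional quadratic in the step size \(\eta\), so the minimizing step has the closed form

\[
\eta^\star \;=\; \arg\min_{\eta}\, f(w + \eta d) \;=\; \frac{d^\top A^\top (y - A w)}{\|A d\|_2^2},
\]

which lets candidate learning rates be evaluated cheaply without additional passes over the data.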

Experiments

We compare CHITA’s run time and accuracy with several state-of-the-art pruning methods using different architectures, including ResNet and MobileNet.

Run time: CHITA is much more scalable than comparable methods that perform joint optimization (as opposed to pruning weights in isolation). For example, CHITA’s speed-up can reach over 1000x when pruning ResNet.

Post-pruning accuracy: Below, we compare the performance of CHITA and CHITA++ with magnitude pruning (MP), Woodfisher (WF), and Combinatorial Brain Surgeon (CBS), for pruning 70% of the model weights. Overall, we see good improvements from CHITA and CHITA++.

Post-pruning accuracy of various methods on ResNet20. Results are reported for pruning 70% of the model weights.
Post-pruning accuracy of various methods on MobileNet. Results are reported for pruning 70% of the model weights.

Next, we report results for pruning a larger network: ResNet50 (on this network, some of the methods listed in the ResNet20 figure couldn’t scale). Here we compare with magnitude pruning and M-FAC. The figure below shows that CHITA achieves better test accuracy for a wide range of sparsity levels.

Test accuracy of pruned networks, obtained using different methods.

Conclusion, limitations, and future work

We presented CHITA, an optimization-based approach for pruning pre-trained neural networks. CHITA offers scalability and competitive performance by efficiently using second-order information and drawing on ideas from combinatorial optimization and high-dimensional statistics.

CHITA is designed for unstructured pruning in which any weight can be removed. In theory, unstructured pruning can significantly reduce computational requirements. However, realizing these reductions in practice requires special software (and possibly hardware) that support sparse computations. In contrast, structured pruning, which removes whole structures like neurons, may offer improvements that are easier to attain on general-purpose software and hardware. It would be interesting to extend CHITA to structured pruning.

Acknowledgements

This work is part of a research collaboration between Google and MIT. Thanks to Rahul Mazumder, Natalia Ponomareva, Wenyu Chen, Xiang Meng, Zhe Zhao, and Sergei Vassilvitskii for their help in preparing this post and the paper. Also thanks to John Guilyard for creating the graphics in this post.

Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift

Amazon Redshift is the most popular cloud data warehouse that is used by tens of thousands of customers to analyze exabytes of data every day. Many practitioners extend these Redshift datasets at scale for machine learning (ML) using Amazon SageMaker, a fully managed ML service. They need to develop features offline, either in code or in a low-code/no-code way, store feature data derived from Amazon Redshift, and make this happen at scale in a production environment.

In this post, we show you three options to prepare Redshift source data at scale in SageMaker, including loading data from Amazon Redshift, performing feature engineering, and ingesting features into Amazon SageMaker Feature Store:

  • Option A: Use SageMaker Studio with a serverless AWS Glue interactive session
  • Option B: Use a SageMaker Processing job with Spark
  • Option C: Use SageMaker Data Wrangler

If you’re an AWS Glue user and would like to do the process interactively, consider option A. If you’re familiar with SageMaker and writing Spark code, option B could be your choice. If you want to do the process in a low-code/no-code way, you can follow option C.

Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale.

SageMaker Studio is the first fully integrated development environment (IDE) for ML. It provides a single web-based visual interface where you can perform all ML development steps, including preparing data and building, training, and deploying models.

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, ML, and application development. AWS Glue enables you to seamlessly collect, transform, cleanse, and prepare data for storage in your data lakes and data pipelines using a variety of capabilities, including built-in transforms.

Solution overview

The following diagram illustrates the solution architecture for each option.

Prerequisites

To continue with the examples in this post, you need to create the required AWS resources. To do this, we provide an AWS CloudFormation template to create a stack that contains the resources. When you create the stack, AWS creates a number of resources in your account:

  • A SageMaker domain, which includes an associated Amazon Elastic File System (Amazon EFS) volume
  • A list of authorized users and a variety of security, application, policy, and Amazon Virtual Private Cloud (Amazon VPC) configurations
  • A Redshift cluster
  • A Redshift secret
  • An AWS Glue connection for Amazon Redshift
  • An AWS Lambda function to set up required resources, execution roles and policies

Make sure that you don’t already have two SageMaker Studio domains in the Region where you’re running the CloudFormation template, because two is the maximum number of domains allowed in each supported Region.

Deploy the CloudFormation template

Complete the following steps to deploy the CloudFormation template:

  1. Save the CloudFormation template sm-redshift-demo-vpc-cfn-v1.yaml locally.
  2. On the AWS CloudFormation console, choose Create stack.
  3. For Prepare template, select Template is ready.
  4. For Template source, select Upload a template file.
  5. Choose Choose File and navigate to the location on your computer where the CloudFormation template was downloaded and choose the file.
  6. Enter a stack name, such as Demo-Redshift.
  7. On the Configure stack options page, leave everything as default and choose Next.
  8. On the Review page, select I acknowledge that AWS CloudFormation might create IAM resources with custom names and choose Create stack.

You should see a new CloudFormation stack with the name Demo-Redshift being created. Wait for the status of the stack to be CREATE_COMPLETE (approximately 7 minutes) before moving on. You can navigate to the stack’s Resources tab to check what AWS resources were created.

Launch SageMaker Studio

Complete the following steps to launch your SageMaker Studio domain:

  1. On the SageMaker console, choose Domains in the navigation pane.
  2. Choose the domain you created as part of the CloudFormation stack (SageMakerDemoDomain).
  3. Choose Launch, and then choose Studio.

This page can take 1–2 minutes to load when you access SageMaker Studio for the first time, after which you’ll be redirected to a Home tab.

Download the GitHub repository

Complete the following steps to download the GitHub repo:

  1. In the SageMaker notebook, on the File menu, choose New and Terminal.
  2. In the terminal, enter the following command:
git clone https://github.com/aws-samples/amazon-sagemaker-featurestore-redshift-integration.git

You can now see the amazon-sagemaker-featurestore-redshift-integration folder in the navigation pane of SageMaker Studio.

Set up batch ingestion with the Spark connector

Complete the following steps to set up batch ingestion:

  1. In SageMaker Studio, open the notebook 1-uploadJar.ipynb under amazon-sagemaker-featurestore-redshift-integration.
  2. If you are prompted to choose a kernel, choose Data Science as the image and Python 3 as the kernel, then choose Select.
  3. For the following notebooks, choose the same image and kernel, except for the AWS Glue Interactive Sessions notebook (4a).
  4. Run the cells by pressing Shift+Enter in each of the cells.

While the code runs, an asterisk (*) appears between the square brackets. When the code has finished running, the asterisk is replaced with numbers. The same behavior applies to all the other notebooks.

Set up the schema and load data to Amazon Redshift

The next step is to set up the schema and load data from Amazon Simple Storage Service (Amazon S3) to Amazon Redshift. To do so, run the notebook 2-loadredshiftdata.ipynb.

Create feature stores in SageMaker Feature Store

To create your feature stores, run the notebook 3-createFeatureStore.ipynb.

Perform feature engineering and ingest features into SageMaker Feature Store

In this section, we present the steps for all three options to perform feature engineering and ingest processed features into SageMaker Feature Store.

Option A: Use SageMaker Studio with a serverless AWS Glue interactive session

Complete the following steps for option A:

  1. In SageMaker Studio, open the notebook 4a-glue-int-session.ipynb.
  2. If you are prompted to choose a kernel, choose SparkAnalytics 2.0 as the image and Glue Python [PySpark and Ray] as the kernel, then choose Select.

The environment preparation process may take some time to complete.

Option B: Use a SageMaker Processing job with Spark

In this option, we use a SageMaker Processing job with a Spark script to load the original dataset from Amazon Redshift, perform feature engineering, and ingest the data into SageMaker Feature Store. To do so, open the notebook 4b-processing-rs-to-fs.ipynb in your SageMaker Studio environment.

Here we use RedshiftDatasetDefinition to retrieve the dataset from the Redshift cluster. RedshiftDatasetDefinition is one type of processing job input, and it provides a simple interface for practitioners to configure Redshift connection-related parameters such as identifier, database, table, query string, and more. You can easily establish your Redshift connection using RedshiftDatasetDefinition without maintaining a persistent connection. We also use the SageMaker Feature Store Spark connector library in the processing job to connect to SageMaker Feature Store in a distributed environment. With this Spark connector, you can easily ingest data to the feature group’s online and offline store from a Spark DataFrame. The connector also contains the functionality to automatically load feature definitions to help with creating feature groups. Overall, this solution offers you a native Spark way to implement an end-to-end data pipeline from Amazon Redshift to SageMaker: you can perform any feature engineering in a Spark context and ingest the final features into SageMaker Feature Store in just one Spark project.

To use the SageMaker Feature Store Spark connector, we extend a pre-built SageMaker Spark container with sagemaker-feature-store-pyspark installed. In the Spark script, use the system executable command to run pip install, which installs this library in your local environment, and get the local path of the JAR file dependency. In the processing job API, pass this path to the submit_jars parameter so that it is available on the nodes of the Spark cluster that the processing job creates.
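A minimal sketch of this setup, assuming the pip package name mentioned above and the classpath_jars helper exposed by the connector (the script path, role, and instance settings are placeholders; verify the package name and Spark version for your environment):

import subprocess
import sys

# Install the Feature Store Spark connector into the local environment.
# Depending on your Spark version, a suffixed package (e.g., sagemaker-feature-store-pyspark-3.1) may be required.
subprocess.check_call([sys.executable, "-m", "pip", "install", "sagemaker-feature-store-pyspark"])

import feature_store_pyspark
from sagemaker.spark.processing import PySparkProcessor

# The connector exposes the local paths of its JAR dependencies.
jar_paths = feature_store_pyspark.classpath_jars()

spark_processor = PySparkProcessor(
    base_job_name="rs-to-feature-store",
    framework_version="3.1",              # align with the connector's Spark version
    role=sagemaker_role_arn,              # assumed to be defined earlier in the notebook
    instance_count=2,
    instance_type="ml.m5.xlarge",
)

# Pass the JARs to the nodes of the Spark cluster that the processing job creates.
spark_processor.run(
    submit_app="./scripts/feature-ingestion.py",   # hypothetical Spark script path
    submit_jars=jar_paths,
    inputs=[rdd_input],                            # the RedshiftDatasetDefinition input defined later in this section
)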

In the Spark script for the processing job, we first read the original dataset files from Amazon S3, which temporarily stores the unloaded dataset from Amazon Redshift as a medium. Then we perform feature engineering in a Spark way and use feature_store_pyspark to ingest data into the offline feature store.
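Inside the Spark script, the ingestion step might look roughly like the following (the FeatureStoreManager class and ingest_data call come from the feature_store_pyspark connector; the input path matches the DatasetDefinition shown below, and the feature group ARN is a placeholder):

from pyspark.sql import SparkSession
from feature_store_pyspark.FeatureStoreManager import FeatureStoreManager

spark = SparkSession.builder.getOrCreate()

# Read the dataset that Redshift unloaded to Amazon S3 and that the processing job
# made available under the configured local input path.
df = spark.read.parquet("/opt/ml/processing/input/rdd")

# ... feature engineering in Spark (type casting, encoding, aggregations) goes here ...

# Ingest the transformed DataFrame into the feature group's offline store.
feature_store_manager = FeatureStoreManager()
feature_store_manager.ingest_data(
    input_data_frame=df,
    feature_group_arn="arn:aws:sagemaker:<region>:<account-id>:feature-group/<name>",  # placeholder
    target_stores=["OfflineStore"],
)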

For the processing job, we provide a ProcessingInput with a redshift_dataset_definition. Here we build a structure according to the interface, providing Redshift connection-related configurations. You can use query_string to filter your dataset by SQL and unload it to Amazon S3. See the following code:

from sagemaker.processing import ProcessingInput
from sagemaker.dataset_definition.inputs import (
    DatasetDefinition,
    RedshiftDatasetDefinition,
)

rdd_input = ProcessingInput(
    input_name="redshift_dataset_definition",
    app_managed=True,
    dataset_definition=DatasetDefinition(
        # Local path inside the processing container where the unloaded data is made available
        local_path="/opt/ml/processing/input/rdd",
        data_distribution_type="FullyReplicated",
        input_mode="File",
        redshift_dataset_definition=RedshiftDatasetDefinition(
            cluster_id=_cluster_id,
            database=_dbname,
            db_user=_username,
            query_string=_query_string,
            cluster_role_arn=_redshift_role_arn,
            # Amazon S3 location where Redshift unloads the query result
            output_s3_uri=_s3_rdd_output,
            output_format="PARQUET",
        ),
    ),
)

Each processing job, including the ones for the USER, PLACE, and RATING datasets, takes approximately 6–7 minutes to complete.

For more details about SageMaker Processing jobs, refer to Process data.

For a SageMaker-native solution for feature processing from Amazon Redshift, you can also use Feature Processing in SageMaker Feature Store, which manages the underlying infrastructure, including provisioning the compute environments and creating and maintaining SageMaker pipelines to load and ingest data. You can focus solely on your feature processor definitions, which include the transformation functions, the Amazon Redshift source, and the SageMaker Feature Store sink. The scheduling, job management, and other production workloads are managed by SageMaker. Feature Processor pipelines are SageMaker pipelines, so the standard monitoring mechanisms and integrations are available.

Option C: Use SageMaker Data Wrangler

SageMaker Data Wrangler allows you to import data from various data sources including Amazon Redshift for a low-code/no-code way to prepare, transform, and featurize your data. After you finish data preparation, you can use SageMaker Data Wrangler to export features to SageMaker Feature Store.

Some AWS Identity and Access Management (IAM) settings are required so that SageMaker Data Wrangler can connect to Amazon Redshift. First, create an IAM role (for example, redshift-s3-dw-connect) that includes an Amazon S3 access policy. For this post, we attached the AmazonS3FullAccess policy to the IAM role. If you need to restrict access to a specific S3 bucket, you can define that in the Amazon S3 access policy. We attached the IAM role to the Redshift cluster that we created earlier. Next, create a policy for SageMaker to access Amazon Redshift by getting its cluster credentials, and attach the policy to the SageMaker IAM role. The policy looks like the following code:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "redshift:getclustercredentials",
            "Effect": "Allow",
            "Resource": [
                "*"
            ]
        }
    ]
}

After this setup, SageMaker Data Wrangler allows you to query Amazon Redshift and output the results into an S3 bucket. For instructions to connect to a Redshift cluster and query and import data from Amazon Redshift to SageMaker Data Wrangler, refer to Import data from Amazon Redshift.

SageMaker Data Wrangler offers a selection of over 300 pre-built data transformations for common use cases such as deleting duplicate rows, imputing missing data, one-hot encoding, and handling time series data. You can also add custom transformations in pandas or PySpark. In our example, we applied some transformations such as drop column, data type enforcement, and ordinal encoding to the data.

When your data flow is complete, you can export it to SageMaker Feature Store. At this point, you need to create a feature group: give the feature group a name, select both online and offline storage, provide the name of an S3 bucket to use for the offline store, and provide a role that has SageMaker Feature Store access. Finally, you can create a job, which creates a SageMaker Processing job that runs the SageMaker Data Wrangler flow to ingest features from the Redshift data source to your feature group.

The following shows an end-to-end data flow for the PLACE feature engineering scenario.

Use SageMaker Feature Store for model training and prediction

To use SageMaker Feature store for model training and prediction, open the notebook 5-classification-using-feature-groups.ipynb.

After the Redshift data is transformed into features and ingested into SageMaker Feature Store, the features are available for search and discovery across teams of data scientists responsible for many independent ML models and use cases. These teams can use the features for modeling without having to rebuild or rerun feature engineering pipelines. Feature groups are managed and scaled independently, and can be reused and joined together regardless of the upstream data source.

The next step is to build ML models using features selected from one or multiple feature groups. You decide which feature groups to use for your models. There are two options to create an ML dataset from feature groups, both utilizing the SageMaker Python SDK:

  • Use the SageMaker Feature Store DatasetBuilder API – The SageMaker Feature Store DatasetBuilder API allows data scientists to create ML datasets from one or more feature groups in the offline store. You can use the API to create a dataset from a single feature group or multiple feature groups, and output it as a CSV file or a pandas DataFrame. See the following example code:
from sagemaker.feature_store.dataset_builder import DatasetBuilder

fact_rating_dataset = DatasetBuilder(
    sagemaker_session = sagemaker_session, 
    base = fact_rating_feature_group,
    output_path = f"s3://{s3_bucket_name}/{prefix}",
    record_identifier_feature_name = 'ratingid',
    event_time_identifier_feature_name = 'timestamp', 
).to_dataframe()[0]
  • Run SQL queries using the athena_query function in the FeatureGroup API – Another option is to use the auto-built AWS Glue Data Catalog with the FeatureGroup API. The FeatureGroup API includes an athena_query function that creates an AthenaQuery instance to run user-defined SQL query strings. You then run the Athena query and organize the query result into a pandas DataFrame. This option allows you to specify more complicated SQL queries to extract information from a feature group. See the following example code:
dim_user_query = dim_user_feature_group.athena_query()
dim_user_table = dim_user_query.table_name

dim_user_query_string = (
    'SELECT * FROM "'
    + dim_user_table
    + '"'
)

dim_user_query.run(
    query_string = dim_user_query_string,
    output_location = f"s3://{s3_bucket_name}/{prefix}",
)

dim_user_query.wait()
dim_user_dataset = dim_user_query.as_dataframe()

Next, we can merge the queried data from different feature groups into our final dataset for model training and testing. For this post, we use batch transform for model inference. Batch transform allows you to get model inferences on a bulk of data in Amazon S3, and the inference results are stored in Amazon S3 as well. For details on model training and inference, refer to the notebook 5-classification-using-feature-groups.ipynb.
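As a rough sketch of the batch transform step (the model name and S3 paths are placeholders; the notebook contains the actual settings):

from sagemaker.transformer import Transformer

transformer = Transformer(
    model_name="rating-classifier",                        # hypothetical model created during training
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{s3_bucket_name}/{prefix}/batch-output",
    sagemaker_session=sagemaker_session,
)

# Run inference over the test dataset stored in Amazon S3; results land in output_path.
transformer.transform(
    data=f"s3://{s3_bucket_name}/{prefix}/test-data.csv",  # placeholder input location
    content_type="text/csv",
    split_type="Line",
)
transformer.wait()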

Run a join query on prediction results in Amazon Redshift

Lastly, we query the inference results and join them with the original user profiles in Amazon Redshift. To do this, we use Amazon Redshift Spectrum to join the batch prediction results in Amazon S3 with the original Redshift data. For details, refer to the notebook 6-read-results-in-redshift.ipynb.

Clean up

In this section, we provide the steps to clean up the resources created as part of this post to avoid ongoing charges.

Shut down SageMaker Apps

Complete the following steps to shut down your resources:

  1. In SageMaker Studio, on the File menu, choose Shut Down.
  2. In the Shutdown confirmation dialog, choose Shutdown All to proceed.
  3. After you get the “Server stopped” message, you can close this tab.

Delete the apps

Complete the following steps to delete your apps:

  1. On the SageMaker console, in the navigation pane, choose Domains.
  2. On the Domains page, choose SageMakerDemoDomain.
  3. On the domain details page, under User profiles, choose the user sagemakerdemouser.
  4. In the Apps section, in the Action column, choose Delete app for any active apps.
  5. Ensure that the Status column says Deleted for all the apps.

Delete the EFS storage volume associated with your SageMaker domain

Locate your EFS volume on the SageMaker console and delete it. For instructions, refer to Manage Your Amazon EFS Storage Volume in SageMaker Studio.

Delete default S3 buckets for SageMaker

Delete the default S3 buckets (sagemaker-<region-code>-<acct-id>) for SageMaker if you are not using SageMaker in that Region.

Delete the CloudFormation stack

Delete the CloudFormation stack in your AWS account to clean up all related resources.

Conclusion

In this post, we demonstrated an end-to-end data and ML flow from a Redshift data warehouse to SageMaker. You can easily use AWS native integration of purpose-built engines to go through the data journey seamlessly. Check out the AWS Blog for more practices about building ML features from a modern data warehouse.


About the Authors

Akhilesh Dube, a Senior Analytics Solutions Architect at AWS, possesses more than two decades of expertise in working with databases and analytics products. His primary role involves collaborating with enterprise clients to design robust data analytics solutions while offering comprehensive technical guidance on a wide range of AWS Analytics and AI/ML services.

Ren Guo is a Senior Data Specialist Solutions Architect in the domains of generative AI, analytics, and traditional AI/ML at AWS, Greater China Region.

Sherry Ding is a Senior AI/ML Specialist Solutions Architect. She has extensive experience in machine learning with a PhD degree in Computer Science. She mainly works with Public Sector customers on various AI/ML-related business challenges, helping them accelerate their machine learning journey on the AWS Cloud. When not helping customers, she enjoys outdoor activities.

Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build AI/ML solutions. Mark’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Mark holds six AWS Certifications, including the ML Specialty Certification. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services.

Collaborators: Project InnerEye with Javier Alvarez and Raj Jena

Black-and-white photos of Microsoft Health Futures Senior Director Javier Alvarez and Dr. Raj Jena, a radiation oncologist at Addenbrooke’s hospital.

Episode 145 | August 17, 2023 

Transforming research ideas into meaningful impact is no small feat. It often requires the knowledge and experience of individuals from across disciplines and institutions. Collaborators, a new Microsoft Research Podcast series, explores the relationships—both expected and unexpected—behind the projects, products, and services being pursued and delivered by researchers at Microsoft and the diverse range of people they’re teaming up with.

In this episode, Dr. Gretchen Huizinga talks with Microsoft Health Futures Senior Director Javier Alvarez and Dr. Raj Jena, a radiation oncologist at Addenbrooke’s hospital, part of Cambridge University Hospitals in the United Kingdom, about Project InnerEye, a Microsoft Research effort that applies machine learning to medical image analysis. The pair shares how a 10-plus-year collaborative journey—and a combination of research and good software engineering—has resulted in the hospital’s creation of an AI system that is helping to decrease the time cancer patients have to wait to begin treatment. Alvarez and Jena chart the path of their collaboration in AI-assisted medical imaging, from Microsoft Research’s initiation of Project InnerEye and its decision to make the resulting research tools available in open source to Addenbrooke’s subsequent testing and validation of these tools to meet the regulatory requirements for use in a clinical setting. They also discuss supporting clinician productivity—and ultimately patient outcomes—and the important role patients play in incorporating AI into healthcare.

Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

JAVIER ALVAREZ: On the third iteration, we actually moved to deep learning, and we started using GPUs in the cloud.

RAJ JENA: I’m really interested in this part of the story, the “final mile” story, where you actually take something and instead of just topping out at saying, “Hey, we did something. Let’s write a paper” — which we did do! — you actually stick with it and get it all the way through to clinical impact.

ALVAREZ: So we started training models with 30 million parameters. And this was a huge breakthrough. So we started to get really good feedback from Raj and his colleagues at Addenbrooke’s. Uh, yeah, it was a great experience.

JENA: In 2016, some changes came to the team. Javi joined, and we were so excited because he was a software engineer, where before we had been researchers talking to researchers, and it was the ability to know that really good software engineering was going to be able to take something we built as research and make it good enough to plumb in the hospital as Javi described. That was a real exciting moment.

[TEASER ENDS]


GRETCHEN HUIZINGA: You’re listening to Collaborators, a Microsoft Research Podcast showcasing the range of expertise that goes into transforming mind-blowing ideas into world-changing technologies. I’m Dr. Gretchen Huizinga.

[MUSIC ENDS]

I’m excited to be talking today with Javier Alvarez and Dr. Raj Jena. Javier is a Senior Director of Biomedical Imaging at Microsoft Health Futures in Cambridge, UK, and part of Project InnerEye, a machine learning technology designed to democratize AI for medical image analysis across the spectrum from research to practice. Raj is a radiation oncologist at Addenbrooke’s hospital, which is part of the Cambridge University Hospitals system, and he was also a collaborator with Project InnerEye during the research phase. Javier and Raj, welcome to the podcast. Now, before we peer into InnerEye, let’s get to know you a little bit better! Javier, I’ll start with you. Give us a brief overview of your training and expertise and then tell us about Microsoft Health Futures and your role there.

JAVIER ALVAREZ: Thank you for having me here. I’m Javier, and I lead the biomedical imaging team at Microsoft Health Futures. We are responsible for research, incubations, and moonshots that drive real-world impact across healthcare and life sciences inside MSR. Uh, yeah, my team is very diverse. We focus on end-to-end solutions. We collaborate with people like Raj, mostly clinicians, and we work on high-quality research, and we hope others can build on top of our work. We try to integrate our AI as a “friendly colleague.” And yeah, I have been in Microsoft for 10 years. My background is in computer science and engineering, and I have been always working on research and innovation projects, uh, focusing on high-risk/high-reward projects. And yeah, my first job at Microsoft was actually working on the first telemetry pipeline for Microsoft on, on the Azure cloud. And we helped several products like Skype, Xbox, Office, and Bing to get better insights into their data. And yeah, after that I joined Antonio Criminisi and Raj in 2016 to work on InnerEye. So yeah, I’m super, super excited to be here to share more about our work.

HUIZINGA: Well, Raj, our audience is a super smart one, but probably not all that well-versed on radiation therapy and neuro-oncology. So tell us about your work as a cancer doctor and a researcher, as well. What’s your background, and how would you define your role — or roles, plural — at Cambridge University Hospitals?

JENA: Thanks for the opportunity to join this discussion and to fly the flag for radiation oncology. It’s a really useful and very modern anti-cancer therapy. Half the people diagnosed with cancer who are cured will end up having radiation therapy as part of their treatment pathway. So I’m passionate about making radiation therapy as safe, as smart and accurate, and with as few side effects as possible. And I do that both in the context of my clinical work but also research work, where I focus mainly on sort of the analysis of images. We use an awful lot of imaging in radiation therapy to really target the radiation therapy. And it’s in that context, really, that I kind of started, you know, with this collaboration over 10 years ago now.

HUIZINGA: Wow. What would you say your “split” is? I mean, as a doctor or a researcher, how do you balance your time?

JENA: Some people would say I have the dream job because I do half and half. Half clinical work and half research work. And I really like that because it means that I can anchor myself in the clinic. I don’t lose track of why we’re trying to do these things. We’re trying to bring benefit to patients, to my patients. But it also means I’ve got the time to then explore on the research side and work with the best and brightest people, including, you know, many of the guys I’ve met at Microsoft Research.

HUIZINGA: Right. You know, as a side note, I just finished a book called The Butchering Art about Joseph Lister, who was both a surgeon, in the Victorian era, and also a researcher and sort of discovering this idea of germ theory and so on with Louis Pasteur, etc. So I’m, I’m ensconced in this idea of research and practice being so tightly woven together. So that’s really awesome. Well, before we get into specifics on the collaboration, Project InnerEye warrants a little bit of explication itself. From what you’ve described, I’d call it a “machine learning meets radiation therapy” love story, and it’s a match made in heaven, or at least the cloud. So what’s the catalyst for InnerEye, and how have the research findings changed the game? Raj, why don’t you talk about it from the medical angle?

JENA: Sure. So, um, as with many things, it started by chance. I went to a talk given by Antonio Criminisi, who Javi mentioned. He was the person that kind of established the InnerEye group at Microsoft Research back in 2011, I think. And he was talking about the way that his team, that did computer vision at the time, were using algorithms that had been developed to detect the human pose so that actually you could play video games without a controller. So this was technology that we all know and love in terms of systems like Kinect and the Xbox. You know, I had one of those! But I went to listen because Antonio wanted to apply it to medical imaging. So in the same way that they were using algorithms to mark out where the body was or where the hands were, could we also mark out tissues and structures within the body? So I said to him, after the end of this, you need to come and see what we do in radiation therapy because this really matters. And to his credit, he did! A couple of weeks later, he came to the department, and he went into a room where dozens of my colleagues were sitting in front of computers, working as fast and accurately as they could, to manually mock up all this normal anatomy on CT scans so we could get our patients onto radiotherapy as quickly as possible. And that was the light bulb moment where he realized, yeah, we need to make this better; we need to make this faster and use, initially, algorithms that came from computer vision, but now, you know, we’ve moved slowly over to things now that we would consider to be sort of machine learning and AI algorithms.

HUIZINGA: Right. Well, I should note that I’ve interviewed Antonio on this show, um, a few years back. And so if listeners want to go back to the archives and find the episode with Antonio Criminisi, that was a great one. So what you just described is sort of a “I can do this, but I can’t do it very fast” scenario. So let’s go into the geek side. Um, Javier, talk about the technical aspects of InnerEye and what it brought to the game. How has the research evolved? Where did it start, from your perspective, and where has it come in the cloud era?

ALVAREZ: Sure, yeah. I would be happy to geek out a bit! Um, so one of the biggest challenges that we faced in radiotherapy was working with CT scans. So CT scans are 3D images that contain around 20 million 3D pixels. We usually call them voxels. And we need to classify each of them as background, different organs, or tumor. And this actually requires a lot of compute and memory. So when we started in 2016, actually we started using very simple models called decision forests, and these can be trained on CPUs. So it was really easy to train them, but one of the problems with decision forests is that you actually have to do the feature extraction manually. So we had to code all that, and it’s a bit of a limitation of this approach. So in the second iteration, we started connecting the hospital to the cloud, and that gave us access to more compute, and we started introducing what we call the InnerEye-Gateway. So this actually helped to automatically route de-identified CT scans to the cloud and run the computation there. And we managed to integrate the model seamlessly into the workflow. So clinicians, when they go to open their CT scan, they already have the segmentation ready to be used on their favorite planning tool. They can review it and refine it. And then on the third iteration, we actually moved to deep learning, and we started using GPUs in the cloud. And this actually helped us create bigger models with more capacity to learn these complex tasks. So we started training models with 30 million parameters. And this was a huge breakthrough. So we started to get really good feedback from Raj and his colleagues at Addenbrooke’s. Uh, yeah, it was a great experience. We had to iterate many times and go to the hospital down the road here in Cambridge. And yeah, it wasn’t a straight path. We had to learn a lot about the radiotherapy workflow, and yeah, we actually learned that it’s actually very hard to deploy AI.

HUIZINGA: Yeah. Every time we do a podcast, um, listeners can’t see the other person shaking their head, but Raj has been shaking his head the whole time Javier’s talking. Talk a little bit, Raj, about that marriage of workflow and machine learning. How did it change your world?

JENA: Yeah, I mean, I think I’m really interested in this part of the story, the “final mile” story, where you actually take something and instead of just topping out at saying, “Hey, we did something. Let’s write a paper” — which we did do! — you actually stick with it and get it all the way through to clinical impact. And actually, you know, from my point of view, in 2016, some changes came to the team. Javi joined, and we were so excited because he was a software engineer, where before we had been researchers talking to researchers. And it was the ability to know that really good software engineering was going to be able to take something we built as research and make it good enough to plumb in the hospital as Javi described. That was a real exciting moment. And then the second exciting moment that followed from that was the first time our clinicians saw the output from that third iteration that Javi mentioned, the deep learning model, and you looked at their reactions because they’re thinking, I couldn’t immediately tell this was done by AI.

HUIZINGA: Wow!

JENA: And that was the moment I will never forget. Because they were very kind to us. They evaluated the models at the beginning, when the output wasn’t good enough and they said, hey, this is interesting, but, you know, we’re not really going to use it. It’s not really going to save us time. And they stuck with us, you know, the clinician part of the team stuck with the researcher part of the team, and we kept going. And it was that moment really when everything came together and we thought, yeah, we’re onto something. That was … that was huge.

HUIZINGA: Yeah. It sounds like you’re talking about how you met, but I’m not sure if that’s the whole story. So let’s talk about the meet-up and how the two of you, specifically as collaborators, started working together. I always like to call this “how I met your mother,” but I’m interested to hear each side of the story because there’s always an “aha moment” on what my work could contribute to this and how theirs could contribute to mine – the kind of co-learning scenario? So, Raj, go a little further in describing how Javi and you got together, and then we’ll see if Javier can confirm or deny the story! [LAUGHS]

JENA: Yeah. So as, as I mentioned … so I had already been working with Antonio purely as research for a little while, and Antonio was tremendously excited because he said the team was going to expand, and Javier was one of the first hires that we actually had to join the team. And I remember Antonio coming in and said, “We’ve just interviewed and appointed this guy. You wait till you … you wait till you meet him,” kind of thing. And then Javi joined us. From my point of view, I am a doctor that likes to code, so I like seeing code come to action, and I know the joy that that brings. And there was this amazing time, shortly after Javi first joined us, where I would come and meet the team about once a week and we would say, hey, you know, maybe we should do this and maybe this would be the way to solve this particular problem, or we need to design a tool so we can visualize the imaging and the machine learning parts of our workflow together and work on them together. And I come back next week, and the thing was practically built! And, you know, to me, that was just the amazing thing … is what you realized is that where before we had been struggling along with just researchers trying to do their best — you know, we know the maths but not how to build things — all of a sudden, Javi comes along and just the rate and the pace at which stuff move forwards, it was incredible! So yeah, that’s my side of the story.

HUIZINGA: I love it. Um, in fact, a doctor that likes to code … I’m wondering if Javier is a computer scientist that likes to … I don’t even know how to fill in the blank on your end … radiotherapy? Dabble in operation? Javier, what’s your side of the story?

ALVAREZ: Yeah, I think for me, it was really amazing to work with Raj because he was telling us about all the physics about radiotherapy, and this was super exciting. We went on multiple trips to Addenbrooke’s to see the radiotherapy department. So actually, yeah, for me, I, I … that was my first project on healthcare, so I had to learn a lot. So yeah, it was super useful to work with Raj, learning about the workflow in radiotherapy, how the data moves, as well. It was super useful. I think actually we met here with Antonio during lunch in the lab. Uhh, yeah…

HUIZINGA: During lunch in the lab … ! [LAUGHS] It would be a good time now for me to just clarify that Addenbrooke’s is the old name of the hospital that’s part of … um, Raj, explain that!

JENA: That’s right. So we’re now called Cambridge University Hospitals to reflect the fact that we’re a big biomedical campus and we actually have multiple hospitals: Addenbrooke’s, the Rosie, uh, Papworth Hospital … but affectionately, people who have lived in Cambridge still call it Addenbrooke’s.

HUIZINGA: That’s good. We can call it both. Javier, as we’re recording this podcast, some big things are going on in the UK. Um, it’s the 75th anniversary of the National Health Service, or NHS, and you guys recently got an award from that organization. You’ve written a JAMA paper and even the prime minister posted something on LinkedIn about your work, which is pretty cool! Tell us about some of the accolades associated with InnerEye right now, from where it started — you know, as a twinkle in someone’s eye — to where it is now, what kind of attention it’s getting. What’s the buzz?

ALVAREZ: Yeah, absolutely. Yeah, maybe I’ll talk about the JAMA paper, and I will let Raj talk about the NHS part, because I think this has been mostly his work.

HUIZINGA: Perfect.

ALVAREZ: So yeah, I think when we started getting really good results with our models in Addenbrooke’s and sharing it with the clinicians, we thought that yeah, we wanted to run a bigger study on evaluating the models for prostate and head and neck. Uh, so we ran a study that was published in JAMA, and here we asked the question of, OK, are these models actually acceptable and accurate enough for radiotherapy planning? And can we actually reduce the time in the workflow? So we, we actually got around eight datasets from all around the world, very diverse datasets from radiotherapy planning, and we set aside a couple of them for external validation. So we didn’t use those for training. And then we used the, the rest of them for training the model. And we actually show in the paper that the model generalizes to the external datasets, so it’s quite robust, using different protocols in radiotherapy. And we also did some interobserver variability study to check that the variability of the AI model is similar to the variability that we observed between different clinicians. And, yeah, as part of the paper, we actually open-sourced all the code. This is how Addenbrooke’s actually started to think about deploying the models clinically. Uh, yeah, in fact this work was recognized with this NHS AI Award and now with the NHS anniversary, but, yeah, I’ll let Raj talk about this part in the hospital.

HUIZINGA: Well, before we go to Raj, I want you to just clarify, because I think this is super interesting. You’ve got the paper and you’ve got practice. And what’s fascinating … I’ll say it again—I just finished the book—but what Joseph Lister did was practice and show how his theories and his work made a difference in his patients’ lives. But what you’re talking about, as you mentioned, Javier, is background, organ, tumor …

ALVAREZ: Yeah.

HUIZINGA: So those three things have to be differentiated in the radiologist’s workflow to say, I’m not going to shoot for the background or the organ; I want to get the tumor. And what you’re saying, Javier, is that this tool was able to do sort of human-level identification?

ALVAREZ: Yeah. Yeah, exactly. Yeah. This is what we, we showed in the JAMA paper. Yeah.

HUIZINGA: Well, Raj, talk about it from the medical angle. Um, what’s the buzz from your end?

JENA: Sure. Yeah. So, so InnerEye is a toolkit, and it was great to see it being used for all sorts of things, but in radiation therapy, we’re using that toolkit specifically to mark out the healthy organs that need to be shielded from radiation. At the moment, we’re not using InnerEye to try and mark out the tumor itself because tumors change a lot from person to person. And so what our design was, was to build something that very much assists rather than replacing the oncologist so that when the oncologist sits down to do this task, about 90 percent of the time is spent marking out all of the healthy organs and 10 percent of the time on the tumor. Actually, we’d love it to be the other way around. And that’s what this tool does. It means that when the oncologist sits down, all of the healthy organs that sit around the tumor that need to be shielded as much as possible from the radiation, that’s already done. So the oncologist goes through … they have to review it, obviously, and check each one is accurate. And in our real-world testing, we found out that about two times out of three, the tool does a good enough job that its output can be used directly without changing anything, which is really good.

HUIZINGA: Wow.

JENA: That means they can then focus on contouring the tumor, and it means the overall time taken to complete this task can be about two and a half times faster. Now, when you think, for the complex tumors that we deal with, that can take up to two hours, that’s a lot of time saving and that’s time given back to the oncologist to spend in front of the patient, basically. So from our point of view, Javi mentioned this, uh, NHS award—it was this AI award that we were given by our national healthcare service—and what that was charged to do was to pick up the baton, once Microsoft had turned InnerEye to an open-source tool, because to turn that open-source tool into a potential medical device that could be used in the cloud for real clinical care, needs a whole other level of sort of checks and evaluations. And that’s what we did, basically, in our team. We worked together with the team in our hospital that builds things as medical devices. Usually, in our hospital, that team builds what we call prosthetics. So things that you would put into a patient or onto a patient when they’ve been injured or something like that. They’d never done it for a software device. But it was great because we had some really strong starting points. First of all, we knew that the actual InnerEye code was fantastic, and secondly, we knew from the JAMA paper that the initial evaluations, in terms of how useful these things were, stood up very well. So that, together with our own clinical evaluations of having the tool plumbed in and seeing it being used, meant that we kind of already knew that this was going to be possible, that we were likely to succeed in this task.

HUIZINGA: Hmmm. Go back a little bit, Raj. You’ve mentioned that tumors change from patient to patient, so it’s not always the same. Do they also change over time?

JENA: Yes. Hopefully, they shrink after radiation therapy and the treatments that, that we give! And so yes, I mean, it’s a big part of what these sorts of tools will continue to be explored in the future is actually tracking how tumors change over time, and that’s a big area. But, you know, we chose to pick on something that was achievable, that wasn’t too risky, and that would already achieve real utility, you know, in, in a hospital. So we already did that with even what it does in terms of marking out the healthy organs. The tumor stuff will come, I’m sure, in time. But we already proved that you could use these tools and build them to be useful.

HUIZINGA: Right. Javier, you mentioned earlier that one of the mandates of the lab is high-risk/high-reward research. This seems to have super high reward, but it’s about now that I ask what could possibly go wrong to everybody that comes on the show. [LAUGHS] Some people hate it. Some have worried that AI will take jobs away from doctors, and I’m sure there’s other worries, as well. What thought have you given to potential consequences, intended and unintended, as you move forward with this work, and what strategies are you employing to mitigate them? Let’s hear from the technologist first, and then we’ll hear from the doctor.

ALVAREZ: Yeah, absolutely. I believe, uh, AI safety should be our top priority in any of our AI products in healthcare. And yeah, it is super important to consider the intended and unintended consequences of deploying these models into the clinical workflow. One of the top-of-mind concerns for the public is that AI might take jobs away from doctors, but actually, we need more doctors. So one out of five jobs in oncology are not being filled in the UK, and the way we are thinking about deploying these AI models is to augment the clinicians. So we want to help them be more productive and deliver better patient outcomes. So the models are working alongside the doctor. And in the case of InnerEye, we are delivering more accurate and faster segmentation. Other concerns could be biases in the models, and to mitigate this, we usually work with clinicians like Raj to build diverse and good datasets that are representative of the population. As always, we make sure the clinician has the ultimate decision and they approve the work of the AI model.

HUIZINGA: Raj, what’s your take on the “what could possibly go wrong” question?

JENA: Yeah, it’s an interesting one. You know, we’ve identified 500 risks, and we’ve gone through each and every one of them and made sure either that the software means that it can’t happen or we mitigate it, basically. Actually, though, the biggest thing that you can do to mitigate risk is talk to patients. And as part of this award, we got to do two really interesting consultations with patients, because then you understand the patient’s perspective. And two things, very briefly, that I took home from that: the first is, is that patients say, yeah, OK, this isn’t what I thought of when I think about AI. I understand that you’ve used incredibly advanced machine learning tools, but actually, this is a very simple task, and the risk is relevant to the task rather than the technology. So that was a useful thing. And the second thing is that they said, it’s all about who’s in control. I understand how this system works to assist an oncologist, and the oncologist retains ultimate control, and that is a huge thing in terms of enhancing trust. So I think as you move from these types of systems to systems where actually you start to push the envelope even further, it’s really important to take patients with you because they keep you grounded, and they will give you really good insights as to what those real risks are.

HUIZINGA: Right.

JENA: The other thing is, is that everyone knows, just like any job, you know, there are the bits that excite you and reward you. And then there are the bits that are kind of dull and tedious. And, you know, Eric Topol has this famous phrase that he said, you know, which is that good AI should give clinicians the gift of time, and that’s what you really want … is, is that you want the AI to allow you to spend more of the time that interests you, excites you, fascinates you, motivates you. And I think, you know, from my point of view, I’m a great believer that that’s what AI will do. It will actually, you know … doctors are very adaptive. They’ll learn to use new tools, whether it’s a robot from a surgeon’s point of view or a new AI algorithm, but they’ll use it in the best way possible to actually kind of still allow them to achieve that patient-centric care.

HUIZINGA: Well, that’s a lovely segue into the next question I had for you anyway, which is what could possibly go right. And you, Raj, referred to the triple benefit of InnerEye. Go a little deeper into who this research helps and why and how.

JENA: I think it’s a really important illustration of how you can democratize AI. A lot of AI research work stays as research work, and people don’t really understand how these tools … they hear a lot about it, and they read a lot about it, but they don’t understand how it’s actually going to make a difference for them in the clinic. And I think that’s why, you know, stories like InnerEye are particularly meaningful. We’re not talking about building an AI that lets us understand something that the human couldn’t understand before. So it’s not earth shattering in that sense. And yet, even despite that simplicity, so many of my colleagues, they get it. They go, OK, you know, we really understand you’ve actually built something, and you’ve put it here into the clinic. And I think, you know, from my point of view, that’s the real value. There are other value propositions relating to the fact that it was open-source that lends itself to democratization and sharing and also because it runs in the cloud and that basically you don’t need a hospital that’s already got a quarter million-pound computer and only those hospitals with the latest kit can actually use it. So it means that it is just as easy to deploy in a small hospital as it is in a big hospital. So for me, those are the key messages, I think.

HUIZINGA: Javier, Raj just alluded to the open-source nature of this tool or toolkit. I want you to drill in a little more on that story. Um, I understand this lives on GitHub. How did that decision come about, and why do you believe this will benefit people in the future?

ALVAREZ: Yes. So the decision to make the code open-source came from the desire to democratize the access to these AI models. So we wanted to make sure everyone would be able to build on top of our research. And that was the way that we found to give access to Addenbrooke’s to create their own medical devices. We thought that also having open-source code allows us to be more transparent with our research and to gain trust on the technology. It also helps us, as well, to get help from the community on building this project. So we had people helping us to fix bugs and to make sure, uh, the algorithms are not biased. As part of the open-source, we made available three big components. One is the InnerEye-Gateway that routes the images to the AI models in the cloud and de-identifies the data. We also made available the InnerEye inference code that basically is an API that the InnerEye-Gateway uses to run the models. And also all the training code to be able to reproduce our work. Uh, yeah, we are super excited to see how people will use the open source in the future. We also have some startups that are using our code and trying to build products with it.

HUIZINGA: Go a little further, Javier, because this is interesting. Obviously, radiation therapy is one application of InnerEye, but I imagine it could be useful for other medical applications or other … actually, probably anything that you need to identify something, you know, the signal in the noise.

ALVAREZ: Yeah, um, segmentation in medical imaging is super important, so it allows you to actually strike measurements from the images. So, yeah, it can be useful, as well, in some radiology scenarios like clinical trials where you want to track tumors over time. And also in surgery where you want to plan surgery, so you need to understand how vessels are feeding into the tumor. So, yeah, segmentation is super important, and I think the components that we have could be useful for many different scenarios in medical imaging.

HUIZINGA: Well, Raj, I always like to know where the project is on the spectrum from lab to life, and as I understand it, after the InnerEye team completed the research and made the code open source, Addenbrooke’s took the regulatory baton for medical device approval in the UK, but it’s still not over. So continuing with that analogy: if this were a relay race and the idea was the first leg, who else is running, where are you in the race, and who brings it across the finish line?

JENA: Yeah, that’s a really good analogy. I, I might use that one in the future. So, uh, there are other commercial organizations that have systems that will perform this work. They are quite expensive, actually, to buy into if you want to buy them outright. There are some where, a bit like ours, you can scale it so that you pay as each patient’s data is processed. They also are quite expensive for some emerging, uh, healthcare markets, and by emerging healthcare markets, I include my own in the, in the NHS. To our knowledge, we are the only cloud-based, open-source medical imaging device that we’re actually trying to build within the NHS. So that is truly unique. And in terms of where we are on that journey to take the, you know, the InnerEye open source all the way through to a medical device that actually, you know, you can buy off the shelf and have all of the associated support and, you know, technical specifications that you need to use in practice, we’re at this point where the hospital has basically finished all of that work. The hospital has been incredibly supportive of this entire research for the last 10 years, but it can’t act as a manufacturer. It’s quite difficult to do that. So we’ll then partner with a manufacturer, actually a company that’s a friend to us in the hospital and to the InnerEye team, too, and they will be responsible for basically taking all of the work that we’ve done to prepare the medical device certification documents and then actually going through that device certification and bringing it to the market. So it’s very exciting, you know, to be literally at that final stage of the, of the story.

HUIZINGA: Right. Ready to run across the finish line. I like to end each podcast with a little vision-casting, and I’ve been shocked at how profoundly healthcare has advanced in just the last hundred and fifty years. So I won’t ask you to project a hundred and fifty years out, but if InnerEye is a truly game-changing technology, what does healthcare, and especially oncology, look like in the future, and how has your work disrupted the field and made the world a better place? Javier, why don’t you talk about it from the technical aspect, and then maybe Raj can bring the show home from the medical aspect.

ALVAREZ: Sure. Yeah. One exciting, uh, development on the horizon is the use of GPT-4 in radiology or maybe even in radiotherapy. We are also working on multimodal learning now and trying to expand the work that we have done with InnerEye to radiology, where there is a much bigger opportunity. Uh, with multimodal learning, we are trying to integrate multiple sources of data like medical images, text, audio, and also different types of modalities because we want to make sure we can use CT scans, MRI, x-rays … and yeah, this requires developing new types of models, and these models need to be able to generalize to many different tasks because we have a huge need for AI in healthcare, and the current way of, uh, building these models is we develop one model for every use case, and this is not scalable. So we need more general-purpose models that can be specialized really quickly to different needs. And I think the other thing that excites me is actually … maybe this is quite far away, but how do we create a digital copy of the human body for every person on the planet and we create some sort of digital twin that we can actually use to run simulations? And I think medical imaging is going to be a big, important part of this. And we can use that digital twin to run interventions and figure out how can we treat that patient, what is happening with that patient, so, yeah, I think it’s super exciting, the potential of AI in healthcare, but of course we need to make sure we look at the risks, as well, of using AI. But yeah, there are many positive opportunities.

HUIZINGA: Right. I’m just shaking my head and my jaw is dropped: my digital twin in the future! [LAUGHS] Raj?

JENA: I mean, I think it’s a tremendously exciting time, and we live in an exponential age where things are coming and new innovations are coming at a faster and faster rate. I think what we have to do is to really, as doctors, learn from history and adapt to make sure that we stay able to retrain and reconfigure ourselves, and reconfigure medicine, to keep up to speed with the digital technologies. You know, just to give an example to what you were talking about with Joseph Lister; it’s fascinating. You know, I always think about, you know, Semmelweis and a similar story. So he was an Austrian obstetrician who, for the audience, a hundred and fifty years ago worked out that actually if you wash your hands after delivering a baby from a mother, the mother was less likely to get a fever and less likely to die. He was 29 when he worked that out, and yet it took nearly 20 years for him to convince the medical community basically because they felt threatened. And, you know, that was the key thing. They just, you know, there wasn’t that level of understanding of, you know, that we need to think and adapt and incorporate new ideas and new thinking. And we will be challenged, you know, severely, I think, in the years to come, with new technologies. I’ve just come back from a conference talking about foundation models and GPT in medical imaging and, um, you know, there was a huge amount of excitement. One really interesting point that I heard is that these models were built on all of the images, mainly generated by cameras, on the internet and social media sphere, and if you add up all of the medical imaging that’s ever been done, it’s only about 1 percent of that image data. So it’s always going to be hard. And of course, we can’t always access all of that information, you know, for patient confidentiality and, you know, numerous factors. So it may take a little while before we have these amazing, generalizable AI models in medicine, but I’m sure they’ll come, and I think the biggest thing that we can do is to be ready for them. And the way I believe that you do that is in little steps, is to start bringing very simple, explainable, transparent AI into your workplace—of which, you know, InnerEye is a really good example—so that, you know, you can look inside the box, start to ask questions, and understand how it works because then, when the next AI comes along, or maybe the AI after that, that integrates more data than the human mind can hold together to make a decision, then you need to be comfortable with your ability to query that, to interrogate that, and make it safe, you know, for your patients. Because at the end of the day, for thousands of years, doctors have evaluated things. And yeah, I think, I think those things won’t change, you know, but we just … we’ve got to up our game, you know, so I’ve got to be as good as Javi is in kind of understanding how these things, how these things work. So …

HUIZINGA: Well, I love my job because I learn something new every show. And this one has been a humdinger, as they say. Thank you so much for taking time to educate us on InnerEye today.

ALVAREZ: Thank you.

JENA: Thanks. It’s been a pleasure.



The Proof Is in the Cloud: GeForce NOW Announces Ultimate KovaaK’s Challenge Results

The Proof Is in the Cloud: GeForce NOW Announces Ultimate KovaaK’s Challenge Results

The verdict is in: A GeForce NOW Ultimate membership raises the bar on gaming. Members have been tackling the Ultimate KovaaK’s challenge head-on and seeing for themselves how the power of Ultimate improves their gaming with 240 frames per second streaming.

The popular training title that helps gamers improve their aim fully launches in the cloud this week alongside a limited-time discount on Steam. KovaaK’s leads over 20 new games joining the GeForce NOW library this week.

Gamers Take Their Best Shot at QuakeCon

Leaderboard for Ultimate KovaaK's challenge
Ultimate leads the way.

Droves of PC gaming fans converged at the GeForce NOW Lounge at QuakeCon over the weekend to take on the Ultimate KovaaK’s challenge. Attendees were among the first to play a custom GeForce NOW KovaaK’s demo — first on a free membership and then with 240 fps streaming on an Ultimate membership.

And it was clear just how much streaming from a GeForce RTX 4080 gaming rig changes the game. Over 58,000 sessions have been completed since the start of the challenge, and participants immediately saw their gaming scores improve by 1.6x just from playing on an Ultimate membership.

Ultimate KovaaK's challenge on GeForce NOW
QuakeCon attendees aiming for the clouds.

Attendees played for top placement on the QuakeCon leaderboard to win both bragging rights and some ultimate prizes. The top three slots on the leaderboard on each of the three days of the show, as well as the top overall slots, were dominated by those using an Ultimate membership. Here’s what a few of them had to say about Ultimate:

“This [Ultimate Tier] is a lot smoother — the responsiveness is great.” – David G.

“… there is so much clarity [with the Ultimate tier]” – Gordan M.

Raisy, a professional Quake champion player and second on the QuakeCon leaderboard, also weighed in on the Ultimate tier: “The smoother the gameplay, the better the experience.”

And Garrett “KovaaK” Krutilla, the co-founder and director of FPS design from The Meta, the developer of KovaaK’s, said: “The Ultimate membership provides a perfect place to train up on KovaaK’s, with access to powerful GeForce RTX 4080 servers for 240 fps streaming and ultra-low latency from NVIDIA Reflex. The scores that top players are getting on Ultimate prove that NVIDIA has made cloud gaming completely viable for competitive gamers in the FPS space.”

Members can still play the challenge at home. Each week, the leaderboard will reset for members to compete for the top three slots to win a six-month GeForce NOW Ultimate membership and a $100 Steam gift card. At the end of the challenge on Thursday, Sept. 21, the top three overall scorers will win:

First place: ASUS ROG Swift 240Hz monitor

Second place: ASUS Chromebook Vibe CX34 Flip

Third place: ASUS ROG Azoth and ROG Gladius III keyboard + mouse bundle

Upgrade to Ultimate today for the best performance in the cloud, up to eight-hour gaming sessions and exclusive access to RTX 4080 servers.

Aiming for the Clouds

KovaaK's on GeForce NOW
Your own personal aim trainer.

Members can now experience KovaaK’s in its entirety, available to stream from the cloud.

Dominate every first- and third-person shooter game by training with KovaaK’s. Trusted by top pros, streamers and other gamers, the incredibly latency-sensitive aim trainer features over 175,000 player-created scenarios and shareable playlists, infinite customization options and cloned game physics. Members can even share their stats and achievements on KovaaKs.com.

The cherry on top: Members can level up with KovaaK’s at a 30% discount until Thursday, Aug. 21. Grab it today and train for more competitive gaming in the cloud. Pair it with an Ultimate membership for a 240 fps advantage and get used to being called a human aimbot.

The Cloud Is Buzzing

A new week brings more games to buzz about.

The Texas Chain Saw Massacre on GeForce NOW
The buzzzz is back!

Experience the mad and macabre in The Texas Chain Saw Massacre from Sumo Digital and Gun Interactive. Take on the role of a notorious Slaughter family member or one of their victims in this third-person, asymmetrical horror experience based on the iconic 1974 film. Victims must use their wits and stealth to stay out of the family’s reach while Slaughter family players must track down and stop their guests from escaping. It launches day and date in the cloud this week.

Wayfinder on GeForce NOW
GeForce NOW + Wayfinder = Stronger together.

Wayfinder is a new online action role-playing game from Airship Syndicate and Digital Extremes. Harness the power of a Wayfinder to control the chaos overrunning the world of Evenor. Wield a variety of unique abilities, from ebbing arcane magic and lethal melee to mystical tech. Wayfinders are stronger together, so grab a couple of buddies and stream together.

And Genshin Impact’s new Version 4.0 is now available to stream. Say hello to the long-awaited Nation of Hydro region Fontaine — a whole new area for travelers to explore — as well as new characters, weapons, artifacts and more. Play it from the cloud without worrying about system specs or hard drive space.

Members can look forward to the 22 new games joining this week:

This week’s Game On giveaway with SteelSeries includes Genshin Impact in-game rewards and three-day Priority membership codes. Check the giveaway page for details on how to enter.

Where are you on the leaderboard for the Ultimate KovaaK’s challenge? Let us know your answer on Twitter or in the comments below.


Unlocking efficiency: Harnessing the power of Selective Execution in Amazon SageMaker Pipelines

Unlocking efficiency: Harnessing the power of Selective Execution in Amazon SageMaker Pipelines

MLOps is a key discipline that often oversees the path to productionizing machine learning (ML) models. It’s natural to focus on a single model that you want to train and deploy. However, in reality, you’ll likely work with dozens or even hundreds of models, and the process may involve multiple complex steps. Therefore, it’s important to have the infrastructure in place to track, train, deploy, and monitor models with varying complexities at scale. This is where MLOps tooling comes in. MLOps tooling helps you repeatably and reliably build and simplify these processes into a workflow that is tailored for ML.

Amazon SageMaker Pipelines, a feature of Amazon SageMaker, is a purpose-built workflow orchestration service for ML that helps you automate end-to-end ML workflows at scale. It simplifies the development and maintenance of ML models by providing a centralized platform to orchestrate tasks such as data preparation, model training, tuning and validation. SageMaker Pipelines can help you streamline workflow management, accelerate experimentation and retrain models more easily.

In this post, we spotlight an exciting new feature of SageMaker Pipelines known as Selective Execution. This new feature empowers you to selectively run specific portions of your ML workflow, resulting in significant time and compute resource savings by limiting the run to pipeline steps in scope and eliminating the need to run steps out of scope. Furthermore, we explore various use cases where the advantages of utilizing Selective Execution become evident, further solidifying its value proposition.

Solution overview

SageMaker Pipelines continues to innovate its developer experience with the release of Selective Execution. ML builders now have the ability to choose specific steps to run within a pipeline, eliminating the need to rerun the entire pipeline. This feature enables you to rerun specific sections of the pipeline while modifying the runtime parameters associated with the selected steps.

It’s important to note that the selected steps may rely on the results of non-selected steps. In such cases, the outputs of these non-selected steps are reused from a reference run of the current pipeline version. This means that the reference run must have already completed. The default reference run is the latest run of the current pipeline version, but you can also choose to use a different run of the current pipeline version as a reference.

The overall status of the reference run must be Succeeded, Failed, or Stopped; it cannot still be running when Selective Execution attempts to reuse its outputs. When using Selective Execution, you can choose any number of steps to run, as long as they form a contiguous portion of the pipeline.

The following diagram illustrates the pipeline behavior with a full run.

The following diagram illustrates the pipeline behavior using Selective Execution.

In the following sections, we show how to use Selective Execution for various scenarios, including complex workflows in pipeline Direct Acyclic Graphs (DAGs).

Prerequisites

To start experimenting with Selective Execution, we need to first set up the following components of your SageMaker environment:

  • SageMaker Python SDK – Ensure that you have an updated SageMaker Python SDK installed in your Python environment. You can run the following command from your notebook or terminal to install or upgrade the SageMaker Python SDK to version 2.162.0 or higher: python3 -m pip install "sagemaker>=2.162.0" or pip3 install "sagemaker>=2.162.0".
  • Access to SageMaker Studio (optional) – Amazon SageMaker Studio can be helpful for visualizing pipeline runs and interacting with preexisting pipeline ARNs. If you don’t have access to SageMaker Studio or are using on-demand notebooks or other IDEs, you can still follow this post and interact with your pipeline ARNs using the Python SDK.

The sample code for a full end-to-end walkthrough is available in the GitHub repo.

Setup

With the sagemaker>=2.162.0 Python SDK, we introduced the SelectiveExecutionConfig class as part of the sagemaker.workflow.selective_execution_config module. The Selective Execution feature relies on a pipeline ARN that has been previously marked as Succeeded, Failed or Stopped. The following code snippet demonstrates how to import the SelectiveExecutionConfig class, retrieve the reference pipeline ARN, and gather associated pipeline steps and runtime parameters governing the pipeline run:

import boto3
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.selective_execution_config import SelectiveExecutionConfig


sm_client = boto3.client('sagemaker')
# reference the name of your sample pipeline 
pipeline_name = "AbalonePipeline"
# filter for previous success pipeline execution arns
pipeline_executions = [_exec
    for _exec in Pipeline(name=pipeline_name).list_executions()['PipelineExecutionSummaries'] 
    if _exec['PipelineExecutionStatus'] == "Succeeded"
]
# get the last successful execution
latest_pipeline_arn = pipeline_executions[0]['PipelineExecutionArn']
print(latest_pipeline_arn)
>>> arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/x62pbar3gs6h

# list all steps of your sample pipeline
execution_steps = sm_client.list_pipeline_execution_steps(
    PipelineExecutionArn=latest_pipeline_arn
)['PipelineExecutionSteps']
print(execution_steps)
>>> 
[{'StepName': 'Abalone-Preprocess',
  'StartTime': datetime.datetime(2023, 6, 27, 4, 41, 30, 519000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2023, 6, 27, 4, 41, 30, 986000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'AttemptCount': 0,
  'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-east-1:123123123123:processing-job/pipelines-fvsmu7m7ki3q-Abalone-Preprocess-d68CecvHLU'}},
  'SelectiveExecutionResult': {'SourcePipelineExecutionArn': 'arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/ksm2mjwut6oz'}},
 {'StepName': 'Abalone-Train',
  'StartTime': datetime.datetime(2023, 6, 27, 4, 41, 31, 320000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2023, 6, 27, 4, 43, 58, 224000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'AttemptCount': 0,
  'Metadata': {'TrainingJob': {'Arn': 'arn:aws:sagemaker:us-east-1:123123123123:training-job/pipelines-x62pbar3gs6h-Abalone-Train-PKhAc1Q6lx'}}},
 {'StepName': 'Abalone-Evaluate',
  'StartTime': datetime.datetime(2023, 6, 27, 4, 43, 59, 40000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2023, 6, 27, 4, 57, 43, 76000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'AttemptCount': 0,
  'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-east-1:123123123123:processing-job/pipelines-x62pbar3gs6h-Abalone-Evaluate-vmkZDKDwhk'}}},
 {'StepName': 'Abalone-MSECheck',
  'StartTime': datetime.datetime(2023, 6, 27, 4, 57, 43, 821000, tzinfo=tzlocal()),
  'EndTime': datetime.datetime(2023, 6, 27, 4, 57, 44, 124000, tzinfo=tzlocal()),
  'StepStatus': 'Succeeded',
  'AttemptCount': 0,
  'Metadata': {'Condition': {'Outcome': 'True'}}}]

# list all configureable pipeline parameters 
# params can be altered during selective execution
parameters = sm_client.list_pipeline_parameters_for_execution(
    PipelineExecutionArn=latest_pipeline_arn
)['PipelineParameters']
print(parameters)
>>> 
[{'Name': 'XGBNumRounds', 'Value': '120'},
 {'Name': 'XGBSubSample', 'Value': '0.9'},
 {'Name': 'XGBGamma', 'Value': '2'},
 {'Name': 'TrainingInstanceCount', 'Value': '1'},
 {'Name': 'XGBMinChildWeight', 'Value': '4'},
 {'Name': 'XGBETA', 'Value': '0.25'},
 {'Name': 'ApprovalStatus', 'Value': 'PendingManualApproval'},
 {'Name': 'ProcessingInstanceCount', 'Value': '1'},
 {'Name': 'ProcessingInstanceType', 'Value': 'ml.t3.medium'},
 {'Name': 'MseThreshold', 'Value': '6'},
 {'Name': 'ModelPath',
  'Value': 's3://sagemaker-us-east-1-123123123123/Abalone/models/'},
 {'Name': 'XGBMaxDepth', 'Value': '12'},
 {'Name': 'TrainingInstanceType', 'Value': 'ml.c5.xlarge'},
 {'Name': 'InputData',
  'Value': 's3://sagemaker-us-east-1-123123123123/sample-dataset/abalone/abalone.csv'}]

Use cases

In this section, we present a few scenarios where Selective Execution can potentially save time and resources. We use a typical pipeline flow, which includes steps such as data extraction, training, evaluation, model registration and deployment, as a reference to demonstrate the advantages of Selective Execution.

SageMaker Pipelines allows you to define runtime parameters for your pipeline run using pipeline parameters. When a new run is triggered, it typically runs the entire pipeline from start to finish. However, if step caching is enabled, SageMaker Pipelines will attempt to find a previous run of the current pipeline step with the same attribute values. If a match is found, SageMaker Pipelines will use the outputs from the previous run instead of recomputing the step. Note that even with step caching enabled, SageMaker Pipelines will still run the entire workflow to the end by default.
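
For context, step caching is configured per step through the CacheConfig class. The following minimal sketch assumes a training step named Abalone-Train whose estimator and inputs are defined elsewhere; it illustrates the caching mechanism rather than the Selective Execution workflow itself:

from sagemaker.workflow.steps import CacheConfig, TrainingStep

# cache step results for 30 days (ISO 8601 duration); a rerun with identical
# step arguments reuses the cached outputs instead of recomputing the step
cache_config = CacheConfig(enable_caching=True, expire_after="P30D")

train_step = TrainingStep(
    name="Abalone-Train",
    estimator=xgb_estimator,   # assumed to be defined elsewhere
    inputs=train_inputs,       # assumed to be defined elsewhere
    cache_config=cache_config,
)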

With the release of the Selective Execution feature, you can now rerun an entire pipeline workflow or selectively run a subset of steps using a prior pipeline ARN. This can be done even without step caching enabled. The following use cases illustrate the various ways you can use Selective Execution.

Use case 1: Run a single step

Data scientists often focus on the training stage of an MLOps pipeline and don’t want to worry about the preprocessing or deployment steps. Selective Execution allows data scientists to focus on just the training step and modify training parameters or hyperparameters on the fly to improve the model. This can save time and reduce cost because compute resources are only utilized for running user-selected pipeline steps. See the following code:

# select a reference pipeline arn and subset step to execute
selective_execution_config = SelectiveExecutionConfig(
    source_pipeline_execution_arn="arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/9e3ljoql7s0n",
    selected_steps=["Abalone-Train"]
)

# start execution of pipeline subset
select_execution = pipeline.start(
    selective_execution_config=selective_execution_config,
    parameters={
        "XGBNumRounds": 120,
        "XGBSubSample": 0.9,
        "XGBGamma": 2,
        "XGBMinChildWeight": 4,
        "XGBETA": 0.25,
        "XGBMaxDepth": 12
    }
)

The following figures illustrate the pipeline with one step in process and then complete.

Use case 2: Run multiple contiguous pipeline steps

Continuing with the previous use case, a data scientist wants to train a new model and evaluate its performance against a golden test dataset. This evaluation is crucial to ensure that the model meets rigorous guidelines for user acceptance testing (UAT) or production deployment. However, the data scientist doesn’t want to run the entire pipeline workflow or deploy the model. They can use Selective Execution to focus solely on the training and evaluation steps, saving time and resources while still getting the validation results they need:

# select a reference pipeline arn and subset step to execute
selective_execution_config = SelectiveExecutionConfig(
    source_pipeline_execution_arn="arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/9e3ljoql7s0n",
    selected_steps=["Abalone-Train", "Abalone-Evaluate"]
)

# start execution of pipeline subset
select_execution = pipeline.start(
    selective_execution_config=selective_execution_config,
    parameters={
        "ProcessingInstanceType": "ml.t3.medium",
        "XGBNumRounds": 120,
        "XGBSubSample": 0.9,
        "XGBGamma": 2,
        "XGBMinChildWeight": 4,
        "XGBETA": 0.25,
        "XGBMaxDepth": 12
    }
)

Use case 3: Update and rerun failed pipeline steps

You can use Selective Execution to rerun failed steps within a pipeline or resume the run of a pipeline from a failed step onwards. This can be useful for troubleshooting and debugging failed steps because it allows developers to focus on the specific issues that need to be addressed. This can lead to more efficient problem-solving and faster iteration times. The following example illustrates how you can choose to rerun just the failed step of a pipeline.

# select a previously failed pipeline arn
selective_execution_config = SelectiveExecutionConfig(
    source_pipeline_execution_arn="arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/fvsmu7m7ki3q",
    selected_steps=["Abalone-Evaluate"]
)

# start execution of failed pipeline subset
select_execution = pipeline.start(
    selective_execution_config=selective_execution_config
)

Alternatively, a data scientist can resume a pipeline from a failed step to the end of the workflow by specifying the failed step and all the steps that follow it in the SelectiveExecutionConfig.
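
As a hedged sketch of that scenario, the following reuses the failed pipeline ARN from the previous snippet and selects the failed evaluation step together with the condition step that follows it in the Abalone pipeline listed earlier:

# resume from the failed step and run the steps downstream of it
selective_execution_config = SelectiveExecutionConfig(
    source_pipeline_execution_arn="arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/fvsmu7m7ki3q",
    selected_steps=["Abalone-Evaluate", "Abalone-MSECheck"]
)

# start execution of the pipeline subset from the failed step onwards
select_execution = pipeline.start(
    selective_execution_config=selective_execution_config
)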

Use case 4: Pipeline coverage

In some pipelines, certain branches are less frequently run than others. For example, there might be a branch that only runs when a specific condition fails. It’s important to test these branches thoroughly to ensure that they work as expected when a failure does occur. By testing these less frequently run branches, developers can verify that their pipeline is robust and that error-handling mechanisms effectively maintain the desired workflow and produce reliable results.

selective_execution_config = SelectiveExecutionConfig(
    source_pipeline_execution_arn="arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/9e3ljoql7s0n",
    selected_steps=["Abalone-Train", "Abalone-Evaluate", "Abalone-MSECheck", "Abalone-FailNotify"]
)
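
To exercise the failure branch, the selective run can be started with an overridden runtime parameter. The sketch below assumes the MSE check compares the model’s error against the MseThreshold parameter, so tightening the threshold forces the condition to fail and routes execution through the Abalone-FailNotify step:

# start the selective run, tightening MseThreshold so the condition step
# fails and the less frequently run failure branch is exercised
select_execution = pipeline.start(
    selective_execution_config=selective_execution_config,
    parameters={"MseThreshold": 0}
)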

Conclusion

In this post, we discussed the Selective Execution feature of SageMaker Pipelines, which empowers you to selectively run specific steps of your ML workflows. This capability leads to significant time and computational resource savings. We provided some sample code in the GitHub repo that demonstrates how to use Selective Execution and presented various scenarios where it can be advantageous for users. If you would like to learn more about Selective Execution, refer to our Developer Guide and API Reference Guide.

To explore the available steps within the SageMaker Pipelines workflow in more detail, refer to Amazon SageMaker Model Building Pipeline and SageMaker Workflows. Additionally, you can find more examples showcasing different use cases and implementation approaches using SageMaker Pipelines in the AWS SageMaker Examples GitHub repository. These resources can further enhance your understanding and help you take advantage of the full potential of SageMaker Pipelines and Selective Execution in your current and future ML projects.


About the Authors

Pranav Murthy is an AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes. In his free time, he enjoys playing chess and traveling.

Akhil Numarsu is a Sr.Product Manager-Technical focused on helping teams accelerate ML outcomes through efficient tools and services in the cloud. He enjoys playing Table Tennis and is a sports fan.

Nishant Krishnamoorthy is a Sr. Software Development Engineer with Amazon Stores. He holds a masters degree in Computer Science and currently focuses on accelerating ML Adoption in different orgs within Amazon by building and operationalizing ML solutions on SageMaker.


Train self-supervised vision transformers on overhead imagery with Amazon SageMaker

Train self-supervised vision transformers on overhead imagery with Amazon SageMaker

This is a guest blog post co-written with Ben Veasey, Jeremy Anderson, Jordan Knight, and June Li from Travelers.

Satellite and aerial images provide insight into a wide range of problems, including precision agriculture, insurance risk assessment, urban development, and disaster response. Training machine learning (ML) models to interpret this data, however, is bottlenecked by costly and time-consuming human annotation efforts. One way to overcome this challenge is through self-supervised learning (SSL). By training on large amounts of unlabeled image data, self-supervised models learn image representations that can be transferred to downstream tasks, such as image classification or segmentation. This approach produces image representations that generalize well to unseen data and reduces the amount of labeled data required to build performant downstream models.

In this post, we demonstrate how to train self-supervised vision transformers on overhead imagery using Amazon SageMaker. Travelers collaborated with the Amazon Machine Learning Solutions Lab (now known as the Generative AI Innovation Center) to develop this framework to support and enhance aerial imagery model use cases. Our solution is based on the DINO algorithm and uses the SageMaker distributed data parallel library (SMDDP) to split the data over multiple GPU instances. When pre-training is complete, the DINO image representations can be transferred to a variety of downstream tasks. This initiative led to improved model performances within the Travelers Data & Analytics space.

Overview of solution

The two-step process for pre-training vision transformers and transferring them to supervised downstream tasks is shown in the following diagram.

In the following sections, we provide a walkthrough of the solution using satellite images from the BigEarthNet-S2 dataset. We build on the code provided in the DINO repository.

Prerequisites

Before getting started, you need access to a SageMaker notebook instance and an Amazon Simple Storage Service (Amazon S3) bucket.

Prepare the BigEarthNet-S2 dataset

BigEarthNet-S2 is a benchmark archive that contains 590,325 multispectral images collected by the Sentinel-2 satellite. The images document the land cover, or physical surface features, of ten European countries between June 2017 and May 2018. The types of land cover in each image, such as pastures or forests, are annotated according to 19 labels. The following are a few example RGB images and their labels.

The first step in our workflow is to prepare the BigEarthNet-S2 dataset for DINO training and evaluation. We start by downloading the dataset from the terminal of our SageMaker notebook instance:

wget https://bigearth.net/downloads/BigEarthNet-S2-v1.0.tar.gz
tar -xvf BigEarthNet-S2-v1.0.tar.gz

The dataset has a size of about 109 GB. Each image is stored in its own folder and contains 12 spectral channels. Three bands with 60m spatial resolution (60-meter pixel height/width) are designed to identify aerosols (B01), water vapor (B09), and clouds (B10). Six bands with 20m spatial resolution are used to identify vegetation (B05, B06, B07, B8A) and distinguish between snow, ice, and clouds (B11, B12). Three bands with 10m spatial resolution help capture visible and near-infrared light (B02, B03, B04, B08). Additionally, each folder contains a JSON file with the image metadata. A detailed description of the data is provided in the BigEarthNet Guide.
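
As a quick sanity check, an individual band can be opened with rasterio. The patch name below matches the example that appears in the S3 bucket listing later in this section; for the 10m bands, the printed shape should be a 120 x 120 pixel array:

import rasterio

# inspect one 10m Sentinel-2 band from an example patch folder
band_path = "BigEarthNet-v1.0/S2A_MSIL2A_20170613T101031_0_45/S2A_MSIL2A_20170613T101031_0_45_B04.tif"
with rasterio.open(band_path) as src:
    band = src.read(1)          # 2D array of reflectance values
    print(band.shape, src.res)  # expected: (120, 120) at 10m resolution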

To perform statistical analyses of the data and load images during DINO training, we process the individual metadata files into a common geopandas Parquet file. This can be done using the BigEarthNet Common and the BigEarthNet GDF Builder helper packages:

python -m bigearthnet_gdf_builder.builder build-recommended-s2-parquet BigEarthNet-v1.0/

The resulting metadata file contains the recommended image set, which excludes 71,042 images that are fully covered by seasonal snow, clouds, and cloud shadows. It also contains information on the acquisition date, location, land cover, and train, validation, and test split for each image.
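
Before moving on, it is worth sanity-checking the consolidated metadata. The minimal sketch below assumes the column names that our dataset class relies on later (name, original_split, and new_labels) and prints the number of images in each split:

import pandas as pd

# load the consolidated BigEarthNet-S2 metadata produced above
metadata = pd.read_parquet("final_ben_s2.parquet")

# count images in the train, validation, and test splits
print(metadata["original_split"].value_counts())

# inspect one record: patch name and its land cover labels
print(metadata[["name", "new_labels"]].head(1))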

We store the BigEarthNet-S2 images and metadata file in an S3 bucket. Because we use true color images during DINO training, we only upload the red (B04), green (B03), and blue (B02) bands:

aws s3 cp final_ben_s2.parquet s3://bigearthnet-s2-dataset/metadata/
aws s3 cp BigEarthNet-v1.0/ s3://bigearthnet-s2-dataset/data_rgb/ \
    --recursive \
    --exclude "*" \
    --include "*_B02.tif" \
    --include "*_B03.tif" \
    --include "*_B04.tif"

The dataset is approximately 48 GB in size and has the following structure:

bigearthnet-s2-dataset/                                    Amazon S3 bucket
├── metadata/
│ └── final_ben_s2.parquet 
└── data_rgb/
  ├── S2A_MSIL2A_20170613T101031_0_45/
  │ └── S2A_MSIL2A_20170613T101031_0_45_B02.tif            Blue channel
  │ └── S2A_MSIL2A_20170613T101031_0_45_B03.tif            Green channel
  │ └── S2A_MSIL2A_20170613T101031_0_45_B04.tif            Red channel

Train DINO models with SageMaker

Now that our dataset has been uploaded to Amazon S3, we move on to training DINO models on BigEarthNet-S2. As shown in the following figure, the DINO algorithm passes different global and local crops of an input image to student and teacher networks. The student network is taught to match the output of the teacher network by minimizing the cross-entropy loss. The teacher weights are updated as an exponential moving average (EMA) of the student weights.
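
To make the EMA coupling concrete, here is a minimal PyTorch sketch of the teacher update. It is schematic rather than the exact code from the DINO repository, and the momentum value is illustrative:

import torch

@torch.no_grad()
def update_teacher(student, teacher, momentum=0.996):
    """Schematic EMA update: the teacher weights slowly track the student weights."""
    for param_s, param_t in zip(student.parameters(), teacher.parameters()):
        param_t.data.mul_(momentum).add_((1.0 - momentum) * param_s.data)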

We make two modifications to the original DINO code. First, we create a custom PyTorch dataset class to load the BigEarthNet-S2 images. The code was initially written to process ImageNet data and expects images to be stored by class. BigEarthNet-S2, however, is a multi-label dataset where each image resides in its own subfolder. Our dataset class loads each image using the file path stored in the metadata:

import os

import numpy as np
import pandas as pd
import rasterio
import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils
 
OPTICAL_MAX_VALUE = 2000

LAND_COVER_LABELS = [
    "Urban fabric",
    "Industrial or commercial units",
    "Arable land",
    "Permanent crops",
    "Pastures",
    "Complex cultivation patterns",
    "Land principally occupied by agriculture, with significant areas of natural vegetation",
    "Agro-forestry areas",
    "Broad-leaved forest",
    "Coniferous forest",
    "Mixed forest",
    "Natural grassland and sparsely vegetated areas",
    "Moors, heathland and sclerophyllous vegetation",
    "Transitional woodland, shrub",
    "Beaches, dunes, sands",
    "Inland wetlands",
    "Coastal wetlands",
    "Inland waters",
    "Marine waters",
]
 
class BigEarthNetDataset(Dataset):
    """
    PyTorch dataset class that loads the BigEarthNet-S2 images from a metadata file.

    Args:
        metadata_file: path to metadata file
        data_dir: directory where BigEarthNet-S2 data is located
        split: train, validation, or test split
        transform: transformations applied to the input image
    """
    def __init__(self, metadata_file, data_dir, split="train", transform=None):
        # image file paths from metadata
        metadata = pd.read_parquet(metadata_file)
        self.metadata_split = metadata[metadata["original_split"] == split]
        self.data_dir = data_dir
        self.patch_names = self.metadata_split["name"].tolist()

        # one-hot-encode land cover labels
        multiclass_labels = self.metadata_split.new_labels.tolist()
        self.labels = self.get_multi_onehot_labels(multiclass_labels)

        # transforms
        self.transform = transform

    def __len__(self):
        """Return length of dataset."""
        return len(self.metadata_split)
 
    def __getitem__(self, index):
        """Returns the image and label for a given index."""
        patch_name = self.patch_names[index]
        file_path = os.path.join(self.data_dir, patch_name)

        # generate RGB image
        r_channel = rasterio.open(os.path.join(file_path, patch_name + "_B04.tif")).read(1)
        g_channel = rasterio.open(os.path.join(file_path, patch_name + "_B03.tif")).read(1)
        b_channel = rasterio.open(os.path.join(file_path, patch_name + "_B02.tif")).read(1)
 
        image = np.stack([r_channel, g_channel, b_channel], axis=2)
        image = image / OPTICAL_MAX_VALUE * 255
        image = np.clip(image, 0, 225).astype(np.uint8)
    
        # apply image transforms
        image = Image.fromarray(image, mode="RGB")
        if self.transform is not None:
            image = self.transform(image)
 
        # load label
        label = self.labels[index]
 
        return image, label
  
    def get_multi_onehot_labels(self, multiclass_labels):
        """Convert BEN-19 labels to one-hot encoded vector."""
        targets = torch.zeros([len(multiclass_labels), len(LAND_COVER_LABELS)])
        for index, img_labels in enumerate(multiclass_labels):
            for label in img_labels:
                index_hot = LAND_COVER_LABELS.index(label)
                targets[index, index_hot] = 1.
        return targets

This dataset class is called in main_dino.py during training. Although the code includes a function to one-hot encode the land cover labels, these labels are not used by the DINO algorithm.
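
For reference, the following sketch shows how the dataset class might be instantiated on its own. The local paths and the resize transform are illustrative assumptions; during SageMaker training, the paths instead point to the mounted input channels:

from torch.utils.data import DataLoader
from torchvision import transforms

# illustrative local paths; on a SageMaker training instance these become
# /opt/ml/input/data/metadata and /opt/ml/input/data/train
dataset = BigEarthNetDataset(
    metadata_file="metadata/final_ben_s2.parquet",
    data_dir="data_rgb",
    split="train",
    transform=transforms.Compose([transforms.Resize(224), transforms.ToTensor()]),
)

loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)
images, labels = next(iter(loader))
print(images.shape, labels.shape)  # e.g., (32, 3, 224, 224) and (32, 19)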

The second change we make to the DINO code is to add support for SMDDP. We add the following code to the init_distributed_mode function in the util.py file:

def init_distributed_mode(args):
    # requires `import json`, `import os`, and `import torch.distributed as dist`
    # at the top of util.py
    sm_framework_params = json.loads(os.environ.get('SM_FRAMEWORK_PARAMS', '{}'))
    if sm_framework_params.get('sagemaker_distributed_dataparallel_enabled', False):
        # launch training with SMDDP
        dist.init_process_group(backend='smddp')
        args.world_size = dist.get_world_size()
        args.gpu = int(os.environ['LOCAL_RANK'])

With these adjustments, we are ready to train DINO models on BigEarthNet-S2 using SageMaker. To train on multiple GPUs or instances, we create a SageMaker PyTorch Estimator that ingests the DINO training script, the image and metadata file paths, and the training hyperparameters:

import time
from sagemaker.pytorch import PyTorch

# output bucket where final model artifacts are uploaded 
DINO_OUTPUT_BUCKET = 'dino-models'

# paths on training instance  
sm_metadata_path = '/opt/ml/input/data/metadata'              
sm_data_path = '/opt/ml/input/data/train'                     
sm_output_path = '/opt/ml/output/data'                        
sm_checkpoint_path = '/opt/ml/checkpoints'                

# training job name
dino_base_job_name = f'dino-model-{int(time.time())}'

# create SageMaker Estimator
estimator = PyTorch(
    base_job_name=dino_base_job_name,
    source_dir='path/to/aerial_featurizer',
    entry_point='main_dino.py',
    role=role,
    framework_version="1.12",
    py_version="py38",
    instance_count=1,
    instance_type="ml.p3.16xlarge",    
    distribution = {'smdistributed':{'dataparallel':{'enabled': True}}},        
    volume_size=100,
    sagemaker_session=sagemaker_session,
    hyperparameters = {
        # hyperparameters passed to entry point script
        'arch': 'vit_small',
        'patch_size': 16,
        'metadata_dir': sm_metadata_path,
        'data_dir': sm_data_path,
        'output_dir': sm_output_path,
        'checkpoint_dir': sm_checkpoint_path,
        'epochs': 100,
        'saveckp_freq': 20,
    },
    max_run=24*60*60,               
    checkpoint_local_path = sm_checkpoint_path,
    checkpoint_s3_uri=f's3://{DINO_OUTPUT_BUCKET}/checkpoints/{dino_base_job_name}', 
    debugger_hook_config=False,                           
)

This code specifies that we will train a small vision transformer model (21 million parameters) with a patch size of 16 for 100 epochs. It is best practice to create a new checkpoint_s3_uri for each training job in order to reduce the initial data download time. Because SMDDP is only supported on the largest multi-GPU instances, we must train on an ml.p3.16xlarge, ml.p3dn.24xlarge, or ml.p4d.24xlarge instance. To train on smaller instance types without SMDDP, remove the distribution and debugger_hook_config arguments from the estimator.

After we have created the SageMaker PyTorch Estimator, we launch the training job by calling the fit method. We specify the input training data using the Amazon S3 URIs for the BigEarthNet-S2 metadata and images:

# call fit to begin training
estimator.fit(
    inputs={
        'metadata': 's3://bigearthnet-s2-dataset/metadata/',
        'train': 's3://bigearthnet-s2-dataset/data_rgb/',
    },
    wait=False
)

SageMaker spins up the instance, copies the training script and dependencies, and begins DINO training. We can monitor the progress of the training job from our Jupyter notebook using the following commands:

# monitor training
training_job_name = estimator.latest_training_job.name 
attached_estimator = PyTorch.attach(training_job_name)
attached_estimator.logs()

We can also monitor instance metrics and view log files on the SageMaker console under Training jobs. In the following figures, we plot the GPU utilization and loss function for a DINO model trained on an ml.p3.16xlarge instance with a batch size of 128.

During training, the GPU utilization is 83% of the ml.p3.16xlarge capacity (8 NVIDIA Tesla V100 GPUs) and the VRAM usage is 85%. The loss function steadily decreases with each epoch, indicating that the outputs of the student and teacher networks are becoming more similar. In total, training takes about 11 hours.

Transfer learning to downstream tasks

Our trained DINO model can be transferred to downstream tasks like image classification or segmentation. In this section, we use the pre-trained DINO features to predict the land cover classes for images in the BigEarthNet-S2 dataset. As depicted in the following diagram, we train a multi-label linear classifier on top of frozen DINO features. In this example, the input image is associated with arable land and pasture land covers.

Most of the code for the linear classifier is already in place in the original DINO repository. We make a few adjustments for our specific task. As before, we use the custom BigEarthNet dataset to load images during training and evaluation. The labels for the images are one-hot encoded as 19-dimensional binary vectors. We use binary cross-entropy as the loss function and average precision to evaluate the performance of the model.
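
The following minimal sketch illustrates the classification head and loss. The 384-dimensional embedding corresponds to ViT-S/16, random tensors stand in for the frozen DINO features, and the exact head used in eval_linear.py may differ:

import torch
import torch.nn as nn

embed_dim = 384    # output dimension of ViT-S/16
num_labels = 19    # BigEarthNet-19 land cover classes

# multi-label linear classifier trained on top of frozen DINO features
linear_head = nn.Linear(embed_dim, num_labels)
criterion = nn.BCEWithLogitsLoss()

features = torch.randn(8, embed_dim)                    # stand-in for frozen DINO features
targets = torch.randint(0, 2, (8, num_labels)).float()  # one-hot land cover labels

loss = criterion(linear_head(features), targets)
loss.backward()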

To train the classifier, we create a SageMaker PyTorch Estimator that runs the training script, eval_linear.py. The training hyperparameters include the details of the DINO model architecture and the file path for the model checkpoint:

# output bucket where final model artifacts are uploaded 
CLASSIFIER_OUTPUT_BUCKET = 'land-cover-classification'

# DINO checkpoint name 
checkpoint = 'checkpoint.pth'

# paths on training instance  
sm_dino_path = f'/opt/ml/input/data/dino_checkpoint'          
sm_dino_checkpoint = f'{sm_dino_path}/{checkpoint}'           

# training job name
classifier_base_job_name = f'linear-classifier-{int(time.time())}'

# create Estimator 
estimator = PyTorch(
    base_job_name=classifier_base_job_name,
    source_dir='path/to/aerial_featurizer',
    entry_point = 'eval_linear.py',
    role=role,
    framework_version='1.12',
    py_version='py38',
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    sagemaker_session=sagemaker_session,
    hyperparameters = {
    # hyperparameters passed to entry point script
        'arch': 'vit_small',
        'pretrained_weights': sm_dino_checkpoint,
        'epochs': 50,
        'data_dir': sm_data_path,
        'metadata_dir': sm_metadata_path,
        'output_dir': sm_checkpoint_path,
        'num_labels': 19,
    },
    max_run=1*60*60, 
    checkpoint_local_path = sm_checkpoint_path,
    checkpoint_s3_uri=f's3://{CLASSIFIER_OUTPUT_BUCKET}/checkpoints/{classifier_base_job_name}',
)

We start the training job using the fit method, supplying the Amazon S3 locations of the BigEarthNet-S2 metadata and training images and the DINO model checkpoint:

# call fit to begin training
estimator.fit(
    inputs={
    'metadata': 's3://bigearthnet-s2-dataset/metadata/',
    'dataset': 's3://bigearthnet-s2-dataset/data_rgb/',
    'dino_checkpoint': f's3://bigearthnet-s2-dataset/dino-models/checkpoints/{dino_base_job_name}',
    },
    wait=False
)

When training is complete, we can perform inference on the BigEarthNet-S2 test set using SageMaker batch transform or SageMaker Processing. In the following table, we compare the average precision of the linear model on test set images using two different DINO image representations. The first model, ViT-S/16 (ImageNet), is the small vision transformer checkpoint included in the DINO repository that was pre-trained using front-facing images in the ImageNet dataset. The second model, ViT-S/16 (BigEarthNet-S2), is the model we produced by pre-training on overhead imagery.

Model                         Average precision
ViT-S/16 (ImageNet)           0.685
ViT-S/16 (BigEarthNet-S2)     0.732

We find that the DINO model pre-trained on BigEarthNet-S2 transfers better to the land cover classification task than the DINO model pre-trained on ImageNet, resulting in a 6.7% increase in the average precision.
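
For reference, an average precision like the one reported above can be computed with scikit-learn. In this sketch, random arrays stand in for the true labels and the predicted scores of the linear classifier, and the exact averaging used by the evaluation script may differ:

import numpy as np
from sklearn.metrics import average_precision_score

# y_true: (num_images, 19) one-hot land cover labels
# y_score: (num_images, 19) predicted scores from the linear classifier
y_true = np.random.randint(0, 2, size=(100, 19))
y_score = np.random.rand(100, 19)

# macro-averaged average precision over the 19 labels
print(average_precision_score(y_true, y_score, average="macro"))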

Clean up

After completing DINO training and transfer learning, we can clean up our resources to avoid incurring charges. We stop or delete our notebook instance and remove any unwanted data or model artifacts from Amazon S3.

Conclusion

This post demonstrated how to train DINO models on overhead imagery using SageMaker. We used SageMaker PyTorch Estimators and SMDDP in order to generate representations of BigEarthNet-S2 images without the need for explicit labels. We then transferred the DINO features to a downstream image classification task, which involved predicting the land cover class of BigEarthNet-S2 images. For this task, pre-training on satellite imagery yielded a 6.7% increase in average precision relative to pre-training on ImageNet.

You can use this solution as a template for training DINO models on large-scale, unlabeled aerial and satellite imagery datasets. To learn more about DINO and building models on SageMaker, check out the following resources:


About the Authors

Ben Veasey is a Senior Associate Data Scientist at Travelers, working within the AI & Automation Accelerator team. With a deep understanding of innovative AI technologies, including computer vision, natural language processing, and generative AI, Ben is dedicated to accelerating the adoption of these technologies to optimize business processes and drive efficiency at Travelers.

Jeremy Anderson is a Director & Data Scientist at Travelers on the AI & Automation Accelerator team. He is interested in solving business problems with the latest AI and deep learning techniques including large language models, foundational imagery models, and generative AI. Prior to Travelers, Jeremy earned a PhD in Molecular Biophysics from the Johns Hopkins University and also studied evolutionary biochemistry. Outside of work you can find him running, woodworking, or rewilding his yard.

Jordan Knight is a Senior Data Scientist working for Travelers in the Business Insurance Analytics & Research Department. His passion is for solving challenging real-world computer vision problems and exploring new state-of-the-art methods to do so. He has a particular interest in the social impact of ML models and how we can continue to improve modeling processes to develop ML solutions that are equitable for all. Jordan graduated from MIT with a Master’s in Business Analytics. In his free time you can find him either rock climbing, hiking, or continuing to develop his somewhat rudimentary cooking skills.

June Li is a data scientist at Travelers’s Business Insurance’s Artificial Intelligence team, where she leads and coordinates work in the AI imagery portfolio. She is passionate about implementing innovative AI solutions that bring substantial value to the business partners and stakeholders. Her work has been integral in transforming complex business challenges into opportunities by leveraging cutting-edge AI technologies.

Sourav Bhabesh is a Senior Applied Scientist at the AWS Titan Labs, where he builds Foundational Model (FM) capabilities and features. His specialty is Natural Language Processing (NLP) and is passionate about deep learning. Outside of work he enjoys reading books and traveling.

Laura Kulowski is an Applied Scientist at Amazon’s Generative AI Innovation Center, where she works closely with customers to build generative AI solutions. In her free time, Laura enjoys exploring new places by bike.

Andrew Ang is a Sr. Machine Learning Engineer at AWS. In addition to helping customers build AI/ML solutions, he enjoys water sports, squash and watching travel & food vlogs.

Mehdi Noori is an Applied Science Manager at the Generative AI Innovation Center. With a passion for bridging technology and innovation, he assists AWS customers in unlocking the potential of generative AI, turning potential challenges into opportunities for rapid experimentation and innovation by focusing on scalable, measurable, and impactful uses of advanced AI technologies, and streamlining the path to production.


Research Focus: Week of August 14, 2023

Research Focus: Week of August 14, 2023

Microsoft Research Focus 22 | Week of August 14, 2023

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

NEW RESEARCH

HyWay: Enabling Mingling in the Hybrid World

As remote work has grown in recent years, videoconferencing tools like Teams help support structured meetings with a scheduled time, a specific agenda, and a set of invitees. For unstructured interactions, like hallway conversations or water cooler chats, newer “spatial” tools such as Gather and SpatialChat arose. But these are confined to users in virtual-only settings.

Many organizations and events now offer a mix of in-person and remote attendance, or “hybrid” work. This creates a new challenge for remote workers or conference goers who want to stay visible to, and mingle with, their colleagues attending in person. Existing tools fall short either in not supporting unstructured interactions, or in not supporting hybrid settings, or both.

In a recent paper: HyWay: Enabling Mingling in the Hybrid World, researchers from Microsoft present a system to support informal interactions among physical and virtual participants. HyWay lets remote users see and hear, and be seen and heard by, in-person users using large displays placed in hallways or “physical zones,” with the ability to move between the zones using a map-based interface. In-person users, who aren’t tethered to a device or app, can simply walk from one zone to another.

The paper includes user survey findings from multiple deployments.


NEW RESEARCH

Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples

Relational tables, where each row corresponds to an entity and each column corresponds to an attribute, are the standard tables in relational databases. However, a survey of real spreadsheet-tables and web-tables shows that over 30% of tables “in the wild” do not conform to the relational standard. This means complex table-restructuring transformations are needed before these tables can be queried using SQL-based analytics tools. Unfortunately, the required transformations are non-trivial to program, creating a substantial pain point for technical and non-technical users alike, as evidenced by the large number of questions on Stack Overflow and the Excel/Power BI/Tableau forums.

In a new paper: Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples, researchers from Microsoft present a system that can automatically synthesize pipelines with multi-step transformations (in Python or other languages). This system transforms non-relational tables into standard relational forms for downstream analytics, obviating the need for users to manually program transformations.

The research includes an extensive benchmark for this new task, compiled by collecting 244 real test cases from publicly available spreadsheets and online forums. The accompanying evaluation suggests that Auto-Tables can successfully synthesize transformations for over 70% of test cases at interactive speeds, without requiring any input from users, making this an effective tool for both technical and non-technical users to prepare data for analytics.
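To make the kind of transformation concrete, here is a tiny pandas sketch (an illustration only, not the Auto-Tables system; the table and column names are invented) that relationalizes one common non-relational layout, a cross-tab table, by unpivoting it:

```python
import pandas as pd

# A cross-tab spreadsheet table: months appear as columns rather than as values.
wide = pd.DataFrame({
    "store": ["Seattle", "Boston"],
    "Jan": [120, 95],
    "Feb": [130, 101],
})

# melt() unpivots the month columns into (store, month, sales) rows,
# producing a relational table that SQL-based analytics tools can query directly.
tidy = wide.melt(id_vars="store", var_name="month", value_name="sales")
print(tidy)
```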


NEW RESEARCH

Learning to Retrieve In-Context Examples for Large Language Models

In-context learning is an emerging paradigm that allows large language models (LLMs) to perform tasks with few-shot examples, without requiring any updates to the model parameters. However, the effectiveness of in-context learning is heavily reliant on the quality of the selected examples.

In a new paper: Learning to Retrieve In-Context Examples for Large Language Models, researchers from Microsoft propose a novel framework to iteratively train dense retrievers that can identify high-quality in-context examples for LLMs. This framework initially trains a reward model based on LLM feedback to evaluate the quality of candidate examples, followed by knowledge distillation to train a bi-encoder-based dense retriever. Experiments on a suite of 30 tasks demonstrate that the framework significantly enhances in-context learning performance. The research also demonstrates the generalization ability of the framework to unseen tasks during training. An in-depth analysis reveals that the model improves performance by retrieving examples with similar patterns, and the gains are consistent across LLMs of varying sizes.
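As a rough illustration of the retrieval step, the sketch below embeds candidate few-shot examples with an off-the-shelf bi-encoder and prepends the most similar ones to the prompt. This is a simplified stand-in, not the paper's iteratively trained retriever, and the model name and examples are placeholders.

```python
from sentence_transformers import SentenceTransformer, util

# Placeholder bi-encoder; the paper distills its own retriever from LLM feedback.
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

candidates = [
    "Review: 'Great acting, weak plot.' Sentiment: mixed",
    "Review: 'A masterpiece.' Sentiment: positive",
    "Review: 'Two hours I will never get back.' Sentiment: negative",
]
query = "Review: 'Loved every minute of it.' Sentiment:"

# Rank candidate examples by embedding similarity to the test input
# and build a few-shot prompt from the top matches.
scores = util.cos_sim(encoder.encode(query), encoder.encode(candidates))[0]
top = [candidates[int(i)] for i in scores.argsort(descending=True)[:2]]
prompt = "\n".join(top + [query])
print(prompt)
```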


NEW RESEARCH

End-to-End Word-Level Pronunciation Assessment with MASK Pre-training

The Computer-Aided Pronunciation Training (CAPT) system is a powerful tool designed to help people improve their language skills by using advanced AI technologies. Pronunciation assessment is a major challenge in CAPT, especially at the word and phoneme level. To obtain word- and phoneme-level scores, current methods usually rely on alignment components to extract acoustic features for each word or phoneme, which limits assessment performance to the accuracy of the alignments.

To address this problem, a new paper from researchers at Microsoft: End-to-End Word-Level Pronunciation Assessment with MASK Pre-training, proposes a simple yet effective method called Masked pre-training for Pronunciation Assessment (MPA). By incorporating a mask-predict strategy, MPA allows the model to train in an end-to-end manner, eliminating the problem of misalignment in word-level assessment. Furthermore, the researchers designed two evaluation strategies to enable the model to conduct assessments in both unsupervised and supervised settings. Experimental results on the SpeechOcean762 dataset demonstrate that MPA achieves better performance than previous methods, without any explicit alignment. MPA still has some limitations, such as requiring more inference time and reference text; the researchers expect to address these in future work.


Read More

How Thomson Reuters developed Open Arena, an enterprise-grade large language model playground, in under 6 weeks

How Thomson Reuters developed Open Arena, an enterprise-grade large language model playground, in under 6 weeks

This post is cowritten by Shirsha Ray Chaudhuri, Harpreet Singh Baath, Rashmi B Pawar, and Palvika Bansal from Thomson Reuters.

Thomson Reuters (TR), a global content and technology-driven company, has been using artificial intelligence (AI) and machine learning (ML) in its professional information products for decades. Thomson Reuters Labs, the company’s dedicated innovation team, has been integral to its pioneering work in AI and natural language processing (NLP). A key milestone was the launch of Westlaw Is Natural (WIN) in 1992. This technology was one of the first of its kind, using NLP for more efficient and natural legal research. Fast forward to 2023, and Thomson Reuters continues to define the future of professionals through rapid innovation, creative solutions, and powerful technology.

The introduction of generative AI provides another opportunity for Thomson Reuters to work with customers and once again advance how they do their work, helping professionals draw insights and automate workflows, enabling them to focus their time where it matters most. While Thomson Reuters pushes the boundaries of what generative AI and other technologies could do for the modern professional, how is it using the power of this technology for its own teams?

Thomson Reuters is highly focused on driving awareness and understanding of AI among colleagues in every team and every business area. Starting from foundational principles, such as what AI is and how ML works, it is delivering a rolling program of company-wide AI awareness sessions, including webinars, training materials, and panel discussions. During these sessions, ideas started to surface as colleagues considered how AI-enabled tools could help with their day-to-day tasks as well as serve their customers.

In this post, we discuss how Thomson Reuters Labs created Open Arena, Thomson Reuters’s enterprise-wide large language model (LLM) playground that was developed in collaboration with AWS. The original concept came out of an AI/ML Hackathon supported by Simone Zucchet (AWS Solutions Architect) and Tim Precious (AWS Account Manager) and was developed into production using AWS services in under 6 weeks with support from AWS. AWS-managed services such as AWS Lambda, Amazon DynamoDB, and Amazon SageMaker, as well as the pre-built Hugging Face Deep Learning Containers (DLCs), contributed to the pace of innovation. Open Arena has helped unlock company-wide experimentation with generative AI in a safe and controlled environment.

Diving deeper, Open Arena is a web-based playground that allows users to experiment with a growing set of tools enabled with LLMs. This provides non-programmatic access for Thomson Reuters employees who don’t have a background in coding but want to explore the art of the possible with generative AI at TR. Open Arena supports a range of experiences, such as getting quick answers from specific sets of corpora (for example, for customer support agents), getting quick answers from websites, summarizing and verifying points in a document, and much more. The capabilities of Open Arena continue to grow as the experiences of employees across Thomson Reuters spur new ideas and as new trends emerge in the field of generative AI. This is all facilitated by the modular serverless AWS architecture that underpins the solution.

Envisioning the Open Arena

Thomson Reuters’s objective was clear: to build a safe, secure, user-friendly platform—an “open arena”—as an enterprise-wide playground. Here, internal teams could not only explore and test the various LLMs developed in-house and those from the open-source community, such as models available through the AWS and Hugging Face partnership, but also discover unique use cases by merging the capabilities of LLMs with Thomson Reuters’s extensive company data. This kind of platform would enhance the ability of teams to generate innovative solutions, improving the products and services that Thomson Reuters could offer its clients.

The envisioned Open Arena platform would serve the diverse teams within Thomson Reuters globally, providing them with a playground to freely interact with LLMs. The ability to have this interaction in a controlled environment would allow teams to uncover new applications and methodologies that might not have been apparent in a less direct engagement with these complex models.

Building the Open Arena

Building the Open Arena was a multi-faceted process. We aimed to harness the capabilities of AWS’s serverless and ML services to craft a solution that would seamlessly enable Thomson Reuters employees to experiment with the latest LLMs. We saw the potential of these services not only to provide scalability and manageability but also to ensure cost-effectiveness.

Solution overview

From creating a robust environment for model deployment and fine-tuning to ensuring meticulous data management and providing a seamless user experience, TR needed each aspect to integrate with several AWS services. Open Arena’s architecture was designed to be comprehensive yet intuitive, balancing complexity with ease of use. The following diagram illustrates this architecture.

SageMaker served as the backbone, facilitating model deployment as SageMaker endpoints and providing a robust environment for fine-tuning the models. We capitalized on the Hugging Face on SageMaker DLC offered by AWS to enhance our deployment process. In addition, we used the SageMaker Hugging Face Inference Toolkit and the Accelerate library to accelerate the inference process and effectively handle the demands of running complex and resource-intensive models. These comprehensive tools were instrumental in ensuring the fast and seamless deployment of our LLMs. Lambda functions, triggered by Amazon API Gateway, managed the APIs, ensuring meticulous preprocessing and postprocessing of the data.
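The snippet below is a minimal sketch of this deployment pattern using the SageMaker Python SDK with a Hugging Face DLC. The model ID, container versions, and instance type are illustrative assumptions, not TR's actual configuration.

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # assumes a SageMaker execution environment

# Wrap a Hugging Face Hub model with a pre-built Deep Learning Container.
model = HuggingFaceModel(
    role=role,
    env={"HF_MODEL_ID": "google/flan-t5-xl", "HF_TASK": "text2text-generation"},
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)

# Deploy the model as a real-time SageMaker endpoint.
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")

# The endpoint can then be invoked, for example from a Lambda-backed API.
print(predictor.predict({"inputs": "Summarize: Open Arena is an internal LLM playground."}))
```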

In our quest to deliver a seamless user experience, we adopted a secure API Gateway to connect the front end hosted in Amazon Simple Storage Service (Amazon S3) to the Lambda backend. We deployed the front end as a static site on an S3 bucket, ensuring user authentication with the help of Amazon CloudFront and our company’s single sign-on mechanism.

Open Arena has been designed to integrate seamlessly with multiple LLMs through REST APIs. This ensured that the platform was flexible enough to react and integrate quickly as new state-of-the-art models were developed and released in the fast-paced generative AI space. From its inception, Open Arena was architected to provide a safe and secure enterprise AI/ML playground, so Thomson Reuters employees can experiment with any state-of-the-art LLM as quickly as they are released. Using Hugging Face models on SageMaker allowed the team to fine-tune models in a secure environment because all data is encrypted and doesn’t leave the virtual private cloud (VPC), ensuring that data remains private and confidential.

DynamoDB, our chosen NoSQL database service, efficiently stored and managed a wide variety of data, including user queries, responses, response times, and user data. To streamline the development and deployment process, we employed AWS CodeBuild and AWS CodePipeline for continuous integration and continuous delivery (CI/CD). Monitoring the infrastructure and ensuring its optimal functioning was made possible with Amazon CloudWatch, which provided custom dashboards and comprehensive logging capabilities.
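As a rough sketch of what logging an interaction to DynamoDB could look like (the table name and attribute schema here are hypothetical, not Open Arena's actual design):

```python
import time
import uuid
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("open-arena-interactions")  # hypothetical table name

def log_interaction(user_id: str, query: str, response: str, latency_ms: int) -> None:
    """Persist one user interaction so usage and response times can be analyzed later."""
    table.put_item(
        Item={
            "interaction_id": str(uuid.uuid4()),
            "user_id": user_id,
            "query": query,
            "response": response,
            "latency_ms": latency_ms,
            "timestamp": int(time.time()),
        }
    )
```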

Model development and integration

The heart of Open Arena is its diverse assortment of LLMs, which comprise both open-source and in-house developed models. These models have been fine-tuned to provide responses following specific user prompts.

We have experimented with different LLMs for different use cases in Open Arena, including Flan-T5-XL, Open Assistant, MPT, Falcon, and Flan-T5-XL fine-tuned on available open-source datasets using parameter-efficient fine-tuning (PEFT). We used the bitsandbytes integration from Hugging Face to experiment with various quantization techniques, which allowed us to optimize our LLMs for better performance and efficiency and paved the way for even greater innovation (a minimal fine-tuning sketch follows the list below). While selecting a model as a backend for these use cases, we considered different aspects, such as how the models perform on NLP tasks relevant to Thomson Reuters. Furthermore, we needed to consider engineering aspects, such as the following:

  • Increased efficiency when building applications with LLMs – Quickly integrating and deploying state-of-the-art LLMs into our applications and workloads that run on AWS, using familiar controls and integrations with the depth and breadth of AWS
  • Secure customization – Ensuring that all data used to fine-tune LLMs remains encrypted and does not leave the VPC
  • Flexibility – The ability to choose from a wide selection of AWS native and open-source LLMs to find the right model for our varied use cases
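
The following is a minimal sketch of that style of fine-tuning setup, assuming the Hugging Face transformers, peft, and bitsandbytes libraries; the hyperparameters and target modules are illustrative, not the values used at TR.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

model_id = "google/flan-t5-xl"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the base model in 8-bit via the bitsandbytes integration to reduce GPU memory.
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach small LoRA adapters so only a fraction of the weights are trained (PEFT).
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q", "v"],  # T5 attention projection layers
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of parameters are trainable
```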

We have also been asking questions such as: Is the higher cost of larger models justified by significant performance gains? Can these models handle long documents?

The following diagram illustrates our model architecture.

We have been evaluating these models on the preceding aspects on open-source legal datasets and Thomson Reuters internal datasets to assess them for specific use cases.

For content-based use cases (experiences that call for answers from a specific corpus), we have a retrieval augmented generation (RAG) pipeline in place, which fetches the content most relevant to the query. In this pipeline, documents are split into chunks, and embeddings of the chunks are created and stored in OpenSearch. To find the best-matching documents or chunks, we use a retrieve/re-rank approach based on bi-encoder and cross-encoder models. The retrieved best match is then passed as input to the LLM along with the query to generate the response.
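The following is a minimal retrieve-and-re-rank sketch using sentence-transformers. In Open Arena the chunk embeddings live in OpenSearch; here they are kept in memory purely for illustration, and the model names, splitter, and sample data are placeholders.

```python
from sentence_transformers import SentenceTransformer, CrossEncoder, util

bi_encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

documents = ["<full text of an internal document>"]   # placeholder corpus
query = "What does the indemnity clause cover?"        # placeholder question

# 1. Split documents into fixed-size chunks (a naive splitter, for illustration only).
def chunk(text, size=200):
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

corpus_chunks = [c for doc in documents for c in chunk(doc)]

# 2. Bi-encoder retrieval: embed chunks and query, take the nearest candidates.
chunk_embeddings = bi_encoder.encode(corpus_chunks, convert_to_tensor=True)
query_embedding = bi_encoder.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_embedding, chunk_embeddings, top_k=20)[0]

# 3. Cross-encoder re-ranking: score each (query, chunk) pair and keep the best match.
pairs = [(query, corpus_chunks[h["corpus_id"]]) for h in hits]
scores = cross_encoder.predict(pairs)
best_chunk = corpus_chunks[hits[int(scores.argmax())]["corpus_id"]]

# 4. The best chunk is passed to the LLM together with the query as grounded context.
prompt = f"Answer using only this context:\n{best_chunk}\n\nQuestion: {query}"
```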

The integration of Thomson Reuters’s internal content with the LLM experience has been instrumental in enabling users to extract more relevant and insightful results from these models. More importantly, it sparked ideas across teams about the possibilities of adopting AI-enabled solutions in their business workflows.

Open Arena tiles: Facilitating user interaction

Open Arena adopts a user-friendly interface, designed with pre-set enabling tiles for each experience, as shown in the following screenshot. These tiles serve as pre-set interactions that cater to the specific requirements of the users.

For instance, the Experiment with Open Source LLM tile opens a chat-like interaction channel with open-source LLMs.

The Ask your Document tile allows users to upload documents and ask the LLMs specific questions related to the content. The Experiment with Summarization tile enables users to distill large volumes of text into concise summaries, as shown in the following screenshot.

These tiles simplify the user consumption of AI-enabled work solutions and the navigation process within the platform, igniting creativity and fostering the discovery of innovative use cases.

The impact of the Open Arena

The launch of the Open Arena marked a significant milestone in Thomson Reuters’s journey towards fostering a culture of innovation and collaboration. The platform’s success was undeniable, with its benefits becoming rapidly evident across the company.

The Open Arena’s intuitive, chat-based design required no significant technical knowledge, making it accessible to different teams and different job roles across the globe. This ease of use boosted engagement levels, encouraging more users to explore the platform and unveiling innovative use cases.

In under a month, the Open Arena catered to over 1,000 monthly internal users from TR’s global footprint, averaging an interaction time of 5 minutes per user. With a goal to foster internal TR LLM experimentation and crowdsource creation of LLM use cases, Open Arena’s launch led to an influx of new use cases, effectively harnessing the power of LLMs combined with Thomson Reuters’s vast data resources.

Here’s what some of our users had to say about the Open Arena:

“Open Arena gives employees from all parts of the company a chance to experiment with LLMs in a practical, hands-on way. It’s one thing to read about AI tools, and another to use them yourself. This platform turbo-charges our AI learning efforts across Thomson Reuters.”

– Abby Pinto, Talent Development Solutions Lead, People Function

“OA (Open Arena) has enabled me to experiment with tricky news translation problems for the German Language Service of Reuters that conventional translation software can’t handle, and to do so in a safe environment where I can use our actual stories without fear of data leaks. The team behind OA has been incredibly responsive to suggestions for new features, which is the sort of service you can only dream of with other software.”

– Scot W. Stevenson, Senior Breaking News Correspondent for the German Language Service, Berlin, Germany

“When I used Open Arena, I got the idea to build a similar interface for our teams of customer support agents. This playground helped us reimagine the possibilities with GenAI.”

– Marcel Batista, Services Manager, Operations Customer Service & Support

“Open Arena powered by AWS serverless services, Amazon SageMaker, and Hugging Face helped us to quickly expose cutting-edge LLMs and generative AI tooling to our colleagues, which helped drive enterprise-wide innovation.”

– Shirsha Ray Chaudhuri, Director, Research Engineering, Thomson Reuters Labs

On a broader scale, the introduction of the Open Arena had a profound impact on the company. It not only increased AI awareness among employees but also stimulated a spirit of innovation and collaboration. The platform brought teams together to explore, experiment, and generate ideas, fostering an environment where groundbreaking concepts could be turned into reality.

Furthermore, the Open Arena has had a positive influence on Thomson Reuters AI services and products. The platform has served as a sandbox for AI, allowing teams to identify and refine AI applications before incorporating them into our offerings. Consequently, this has accelerated the development and enhancement of Thomson Reuters AI services, providing customers with solutions that are ever evolving and at the forefront of technological advancement.

Conclusion

In the fast-paced world of AI, it is crucial to continue advancing, and Thomson Reuters is committed to doing just that. The team behind the Open Arena is constantly working to add more features and enhance the platform’s capabilities, using AWS services like Amazon Bedrock and Amazon SageMaker JumpStart, ensuring that it remains a valuable resource for our teams. As we move forward, we aim to keep pace with the rapidly evolving landscape of generative AI and LLMs, and AWS provides the services TR needs to do so.

In addition to the ongoing development of the Open Arena platform, we are actively working on productionizing the multitude of use cases generated by the platform. This will allow us to provide our customers with even more advanced and efficient AI solutions, tailored to their specific needs. Furthermore, we will continue to foster a culture of innovation and collaboration, enabling our teams to explore new ideas and applications for AI technology.

As we embark on this exciting journey, we are confident that the Open Arena will play a pivotal role in driving innovation and collaboration across Thomson Reuters. By staying at the forefront of AI advancements, we will ensure that our products and services continue to evolve and meet the ever-changing demands of our customers.


About the Authors

Shirsha Ray Chaudhuri (Director, Research Engineering) heads the ML Engineering team in Bangalore for Thomson Reuters Labs, where she leads the development and deployment of well-architected solutions in AWS and other cloud platforms for ML projects that drive efficiency and value for AI-driven features in Thomson Reuters products, platforms, and business systems. She works with communities on AI for good and societal impact projects, and in the tech for D&I space. She loves to network with people who are using AI and modern tech to build a world that is more inclusive, more digital, and better for all.

Harpreet Singh Baath is a Senior Cloud and DevOps Engineer at Thomson Reuters Labs, where he helps research engineers and scientists develop machine learning solutions on cloud platforms. With over 6 years of experience, Harpreet’s expertise spans across cloud architectures, automation, containerization, enabling DevOps practices, and cost optimization. He is passionate about efficiency and cost-effectiveness, ensuring that cloud resources are utilized optimally.

Rashmi B Pawar is a Machine Learning Engineer at Thomson Reuters. She possesses considerable experience in productionizing models, establishing inference, and creating training pipelines tailored for various machine learning applications. Furthermore, she has significant expertise in incorporating machine learning workflows into existing systems and products.

Palvika Bansal is an Associate Applied Research Scientist at Thomson Reuters. She has worked on projects across diverse sectors to solve business problems for customers using AI/ML. She is highly passionate about her work and enthusiastic about taking on new challenges. Outside of work, she enjoys traveling, cooking, and reading.

Simone Zucchet is a Senior Solutions Architect at AWS. With close to a decade’s experience as a Cloud Architect, Simone enjoys working on innovative projects that help transform the way organizations approach business problems. He helps support large enterprise customers at AWS and is part of the Machine Learning TFC. Outside of his professional life, he enjoys working on cars and photography.

Heiko Hotz is a Senior Solutions Architect for AI & Machine Learning with a special focus on natural language processing, large language models, and generative AI. Prior to this role, he was the Head of Data Science for Amazon’s EU Customer Service. Heiko helps our customers be successful in their AI/ML journey on AWS and has worked with organizations in many industries, including insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. In his spare time, Heiko travels as much as possible.

João Moura is an AI/ML Specialist Solutions Architect at AWS, based in Spain. He helps customers with deep learning model training and inference optimization, and more broadly building large-scale ML platforms on AWS. He is also an active proponent of ML-specialized hardware and low-code ML solutions.

Georgios Schinas is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in London and works closely with customers in the UK and Ireland. Georgios helps customers design and deploy machine learning applications in production on AWS, with a particular interest in MLOps practices and enabling customers to perform machine learning at scale. In his spare time, he enjoys traveling, cooking, and spending time with friends and family.

Read More

Replit CEO Amjad Masad on Empowering the Next Billion Software Creators

Replit aims to empower the next billion software creators.

In this week’s episode of NVIDIA’s AI Podcast, host Noah Kravitz dives into a conversation with Replit CEO Amjad Masad. Masad says the San Francisco-based maker of a software development platform, which came up as a member of NVIDIA’s Inception program for startups, wants to bridge the gap between ideas and software, a task simplified by advances in generative AI.

“Replit is fundamentally about reducing the friction between an idea and a software product,” Masad said.

The company’s Ghostwriter coding AI has two main features: a code completion model and a chat model. These features not only make suggestions as users type their code, but also provide intelligent explanations of what a piece of code is doing, tracing dependencies and context. The model can even flag errors and offer solutions — like a full collaborator in a Google Docs for code.

The company is also developing “make me an app” functionality. This tool allows users to provide high-level instructions to an Artificial Developer Intelligence, which then builds, tests and iterates the requested software.

The aim is to make software creation accessible to all, even those with no coding experience. While this feature is still under development, Masad said the company plans to improve it over the next year, potentially having it ready for developers in the next six to eight months.

Going forward, Masad envisions a future where AI functions as a collaborator, able to conduct high-level tasks and even manage resources. “We’re entering a period where software is going to feel more alive,” Masad said. “And so I think computing is becoming more humane, more accessible, more exciting, more natural.”

You Might Also Like

Jules Anh Tuan Nguyen Explains How AI Lets Amputee Control Prosthetic Hand, Video Games
A postdoctoral researcher at the University of Minnesota discusses his efforts to allow amputees to control their prosthetic limb — right down to the finger motions — with their minds.

Overjet’s Dr. Wardah Inam on Bringing AI to Dentistry
Overjet, a member of NVIDIA Inception, is moving fast to bring AI to dentists’ offices. Dr. Wardah Inam, CEO of the company, discusses using AI to improve patient care.

Immunai CTO and Co-Founder Luis Voloch on Using Deep Learning to Develop New Drugs
Luis Voloch, co-founder and chief technology officer of Immunai, talks about tackling the challenges of the immune system with a machine learning and data science mindset.

Subscribe to the AI Podcast: Now Available on Amazon Music

In addition to Amazon Music, get the AI Podcast through iTunes, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better. Have a few minutes to spare? Fill out this listener survey.

Read More