Accelerating Ukraine Intelligence Analysis with Computer Vision on Synthetic Aperture Radar Imagery


Figure 1: Airmass measurements (clouds) over Ukraine from February 18, 2022 – March 01, 2022 from the SEVIRI instrument. Data accessed via the EUMETSAT Viewer.

Satellite imagery is a critical source of information during the current invasion of Ukraine. Military strategists, journalists, and researchers use this imagery to make decisions, unveil violations of international agreements, and inform the public of the stark realities of war. With Ukraine experiencing heavy cloud cover and attacks often occurring at night, many forms of satellite imagery cannot see the ground. Synthetic Aperture Radar (SAR) imagery penetrates cloud cover, but requires special training to interpret. Automating this tedious task would enable real-time insights, but current computer vision methods developed on typical RGB imagery do not properly account for the phenomenology of SAR. This leads to suboptimal performance on this critical modality. Improving the access to and availability of SAR-specific methods, codebases, datasets, and pretrained models will benefit intelligence agencies, researchers, and journalists alike during this critical time for Ukraine.

In this post, we present a baseline method and pretrained models that enable the interchangeable use of RGB and SAR for downstream classification, semantic segmentation, and change detection pipelines.


Enable conversational chatbots for telephony using Amazon Lex and the Amazon Chime SDK

Conversational AI can deliver powerful, automated, interactive experiences through voice and text. Amazon Lex is a service that combines automatic speech recognition and natural language understanding technologies, so you can build these sophisticated conversational experiences. A common application of conversational AI is found in contact centers: self-service virtual agents. We’re excited to announce that you can now use Amazon Chime SDK Public Switched Telephone Network (PSTN) audio to enable conversational self-service applications to reduce call resolution times and automate informational responses.

The Amazon Chime SDK is a set of real-time communications components that developers can use to add audio, messaging, video, and screen-sharing to their web and mobile applications. Amazon Chime SDK PSTN audio integration with Amazon Lex enables builders to develop conversational interfaces for calls to or from the public telephone network. You can now build AI-powered self-service applications such as conversational interactive voice response systems (IVRs), virtual agents, and other telephony applications that use Session Initiation Protocol (SIP) for voice communications.

In addition, we have launched several new features. Amazon Voice Focus for PSTN provides deep learning-based noise suppression to reduce unwanted noise on calls. You can also now use machine learning (ML)-driven text-to-speech in your application through our native integration to Amazon Polly. All features are now directly integrated with Amazon Chime SDK PSTN audio.

In this post, we teach you how to build a conversational IVR system for a fictitious travel service that accepts reservations over the phone using Amazon Lex.

Solution overview

Amazon Chime SDK PSTN audio makes it easy for developers to build customized telephony applications using the agility and operational simplicity of serverless AWS Lambda functions.

For this solution, we use the following components:

  • Amazon Chime SDK PSTN audio
  • AWS Lambda
  • Amazon Lex
  • Amazon Polly

Amazon Lex natively integrates with Amazon Polly to provide text-to-speech capabilities. In this post, we also enable Amazon Voice Focus to reduce background noise on phone calls. In a previous post, we showed how to integrate with Amazon Lex v1 using the API interface. That is no longer required. The heavy lifting of working with Amazon Lex and Amazon Polly is now replaced by a few simple function calls.
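
For example, playing a text-to-speech prompt from your PSTN audio Lambda function comes down to returning a Speak action to the SIP media application. The following sketch follows the action format shown later in this post; the prompt text, voice, and call ID here are illustrative placeholders rather than code from the workshop:

// Illustrative sketch (not from the workshop code): returning an Amazon Polly
// Speak action from a PSTN audio Lambda function. The CallId comes from the
// incoming event; the Engine, VoiceId, and Text values are placeholders.
const speakAction = {
  Type: "Speak",
  Parameters: {
    Text: "Thanks for calling. Connecting you to our virtual agent now.",
    CallId: "<call-id-from-the-event>",
    Engine: "neural",
    VoiceId: "Joanna",
  },
};

// A PSTN audio Lambda function responds with a list of actions to run.
const speakResponse = {
  SchemaVersion: "1.0",
  Actions: [speakAction],
};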

The following diagram illustrates the high-level design of the Amazon Chime SDK Amazon Lex chatbot system.

To help you learn to build using the Amazon Chime SDK PSTN audio service, we have published a repository of source code and documentation explaining how that source code works. The source code is in a workshop format, with each example program building upon the previous lesson. The final lesson shows how to build a complete Amazon Lex-driven chatbot over the phone, and that is the lesson we focus on in this post.

As part of this solution, you create the following resources:

  • SIP media application – A managed object that specifies a Lambda function to invoke.
  • SIP rule – A managed object that specifies a phone number to trigger on and which SIP media application managed object to use to invoke a Lambda function.
  • Phone number – An Amazon Chime SDK PSTN phone number provisioned for receiving phone calls.
  • Lambda function – A function written in TypeScript that is integrated with the PSTN audio service. It receives invocations from the SIP media application and sends actions back that instruct the SIP media application to perform Amazon Polly and Amazon Lex tasks.

The demo code is deployed in two parts. The Amazon Lex chatbot example is one of a series of workshop examples that teach how to use Amazon Chime SDK PSTN audio. For this post, you complete the following high-level steps to deploy the chatbot:

  1. Configure the Amazon Lex chatbot.
  2. Clone the code from the GitHub repository.
  3. Deploy the common resources for the workshop (including a phone number).
  4. Deploy the Lambda function that connects Amazon Lex to the phone number.

We go through each step in detail.

Prerequisites

You must have the following prerequisites:

  • Node.js v12+ and npm installed
  • The AWS Command Line Interface (AWS CLI) installed
  • Node Version Manager (nvm) installed
  • The typescript and aws-sdk Node modules installed (via npm)
  • AWS credentials configured for the account and Region that you use for this demo
  • Permissions to create Amazon Chime SIP media applications and phone numbers (make sure your service quota in us-east-1 or us-west-2 for phone numbers, voice connectors, SIP media applications, and SIP rules hasn’t been reached)
  • Deployment must be done in us-east-1 or us-west-2 to align with PSTN audio resources

For detailed installation instructions, including a script that can automate the installation and an AWS Cloud Development Kit (AWS CDK) project to easily create an Amazon Elastic Compute Cloud (Amazon EC2) development environment, see the workshop instructions.

Configure the Amazon Lex chatbot

You can build a complete conversational voice bot using Amazon Lex. In this example, you use the Amazon Lex console to build a bot. We skip the steps where you build the Lambda function for Amazon Lex. The focus here is how to connect Amazon Chime SDK PSTN audio to Amazon Lex. For instructions on building custom Amazon Lex bots, refer to Amazon Lex: How It Works. In this example, we use the prebuilt BookTrip example.

Create a bot

To create your chatbot, complete the following steps:

  1. Sign in to the Amazon Lex console in the same Region that you deployed the Amazon Chime SDK resources in.

This must be in either us-east-1 or us-west-2, depending on where you deployed the Amazon Chime SDK resources using AWS CDK.

  2. In the navigation pane, choose Bots.
  3. Choose Create bot.
  4. Select Start with an example.

  5. For Bot name, enter a name (for example, BookTrip).
  6. For Description, enter an optional description.
  7. Under IAM permissions, select Create a role with basic Amazon Lex permissions.
  8. Under Children’s Online Privacy Protection Act, select No.

This example doesn’t need that protection, but for your own bot creation you should select this option accordingly.

  9. Under Idle session timeout, set Session timeout to 1 minute.
  10. You can skip the Advanced settings section.
  11. Choose Next.

  12. For Select Language, choose your preferred language (for this post, we choose English (US)).
  13. For Voice interaction, choose the voice you want to use.
  14. You can enter a voice sample and choose Play to test the phrase and confirm the voice is to your liking.
  15. Leave other settings at their default.
  16. Choose Done.

  17. In the Fulfillment section, enter the following text for On successful fulfillment:
Thank you!  We'll see you on {CheckInDate}.
  18. Under Closing responses, enter the following text for Message:

Goodbye!

  19. Choose Save intent.
  20. Choose Build.

The build process takes a few moments to complete. When it’s finished, you can test the bot on the Amazon Lex console.

Create a version

You have now built the bot. Next, we create a version.

  1. Navigate to the Versions page of your bot (under the bot name in the navigation pane).
  2. Choose Create version.
  3. Accept all the default values and choose Create.

Your new version is now listed on the Versions page.

Create an alias

Next, we create an alias.

  1. In the navigation pane, choose Aliases.
  2. Choose Create alias.
  3. For Alias name, enter a name (for example, production).
  4. Under Associate with a version, choose Version 1 on the drop-down menu.

If you had more than one version of the bot, you could choose the appropriate version here.

  5. Choose Create.

The alias is now listed on the Aliases page.

  6. On the Aliases page, choose the alias you just created.
  7. Under Resource-based policy, choose Edit.
  8. Add the following policy, which allows the Amazon Chime SDK PSTN audio service to invoke Amazon Lex on your behalf:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SMALexAccess",
      "Effect": "Allow",
      "Principal": {
        "Service": "voiceconnector.chime.amazonaws.com"
      },
      "Action": "lex:StartConversation",
      "Resource": "<Resource-ARN-for-the-Alias>",
      "Condition": {
        "StringEquals": {
          "AWS:SourceAccount": "<account-num>"
        },
        "ArnEquals": {
          "AWS:SourceArn": "arn:aws:voiceconnector:<region>:<account-num>:*"
        }
      }
    }
  ]
}

In the preceding code, provide the resource ARN (located directly above the text box), which is the ARN for the bot alias. Also provide your account number and specify the Region you’re deploying into (us-east-1 or us-west-2); the AWS:SourceArn value is the ARN of the PSTN audio control plane in your account.

  9. Choose Save to store the policy.
  10. Choose Copy next to the resource ARN to use in a later step.

Congratulations! You have configured an Amazon Lex bot!

In a real chatbot application, you would almost certainly implement a Lambda function to process the intents. This demo program focuses on explaining how to connect to Amazon Chime SDK PSTN audio, so we don’t go into that level of detail. For more information, refer to Add the Lambda Function as a Code Hook.
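
For orientation only, the sketch below shows the general shape of a Lex V2 code hook that closes out an intent. This demo intentionally omits such a function, and the intent handling and message text here are hypothetical placeholders:

// Hypothetical sketch of a Lex V2 fulfillment code hook (not part of this demo).
// The handler receives the session state, performs the business logic for the
// intent, and returns an updated dialog state plus a message for the caller.
export const bookTripCodeHook = async (event: any) => {
  const intent = event.sessionState.intent; // e.g., BookHotel with its filled slots

  // ...reservation logic (database writes, backend API calls) would go here...

  return {
    sessionState: {
      dialogAction: { type: "Close" },
      intent: { ...intent, state: "Fulfilled" },
    },
    messages: [
      { contentType: "PlainText", content: "Your reservation is confirmed." },
    ],
  };
};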

Clone the GitHub repository

You can get the code for the entire workshop by cloning the repository:

git clone https://github.com/aws-samples/amazon-chime-sdk-pstn-audio-workshop
cd amazon-chime-sdk-pstn-audio-workshop

Deploy the common resources for the workshop

This workshop uses the AWS CDK to automate the deployment of all needed resources (except the Amazon Lex bot, which you already created). To deploy, run the following code from your terminal:

cdk bootstrap
yarn deploy

The AWS CDK deploys the resources. We do the bootstrap step to make sure that AWS CDK is properly initialized in the Region you’re deploying into. Note that these examples use AWS CDK version 2.

The repository has a series of lessons that are designed to explain how to develop PSTN audio applications. We recommend reviewing these documents and the first few sample programs to understand the basics. You can then review the Lambda sample program folder. Lastly, follow the steps to configure and then deploy your code. In the terminal, enter the following command:

cd lambdas/call-lex-bot

Configure your Lambda function to use the Amazon Lex bot ARN

Open the src/index.ts source code file for the Lambda function and edit the variable botAlias near the top of the file (provide the ARN you copied earlier):

const botAlias = "<Resource-ARN-for-the-Alias>";

You can now deploy the bot with yarn deploy and swap the new Lambda function into PSTN audio with yarn swap. You can also note the welcome text in the startBotConversationAction object:

const startBotConversationAction = {
  Type: "StartBotConversation",
  Parameters: {
    BotAliasArn: "none",
    LocaleId: "en_US",
    Configuration: {
      SessionState: {
        DialogAction: {
          Type: "ElicitIntent"
        }
      },
      WelcomeMessages: [
        {
          ContentType: "PlainText",
          Content: "Welcome to AWS Chime SDK Voice Service. Please say what you would like to do.  For example: I'd like to book a room, or, I'd like to rent a car."
        },
      ]
    }
  }
}

Amazon Lex starts the bot and uses Amazon Polly to read that text. This gives the caller a greeting, and tells them what they should do next.

How it works

The following example adds more actions to what we learned in the Call and Bridge Call lesson. The NEW_INBOUND_CALL event arrives and is processed the same way. We enable Amazon Voice Focus (which enhances the ability of Amazon Lex to understand words) and then immediately hand the incoming call off to the bot with a StartBotConversation action. An example of that action looks like the following object:

{
    "SchemaVersion": "1.0",
    "Actions": [
        {
            "Type": "Pause",
            "Parameters": {
                "DurationInMilliseconds": "1000"
            }
        },
        {
            "Type": "VoiceFocus",
            "Parameters": {
                "Enable": true,
                "CallId": "2947dfba-0748-46fc-abc5-a2c21c7569eb"
            }
        },
        {
            "Type": "StartBotConversation",
            "Parameters": {
                "BotAliasArn": "arn:aws:lex:us-east-1:<account-num>:bot-alias/RQXM74UXC7/ZYXLOINIJL",
                "LocaleId": "en_US",
                "Configuration": {
                    "SessionState": {
                        "DialogAction": {
                            "Type": "ElicitIntent"
                        }
                    },
                    "WelcomeMessages": [
                        {
                            "ContentType": "PlainText",
                            "Content": "Welcome to AWS Chime SDK Voice Service. Please say what you would like to do.  For example: I'd like to order flowers."
                        }
                    ]
                }
            }
        }
    ]
}

When the bot conversation completes, your Lambda function receives an ACTION_SUCCESSFUL event that includes the data collected by the Amazon Lex bot. Your function can use that data if needed; however, a common practice when building Amazon Lex applications is to process the data in the Lambda function associated with the Amazon Lex bot. Examples of the event and the returned action are provided in the workshop documentation for this lesson.
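
Putting the pieces together, a PSTN audio handler for this flow might look roughly like the following sketch. The action objects mirror the JSON examples above; the call ID lookup and the final Hangup action are assumptions included only to illustrate the branching:

// Rough sketch of the Lambda handler flow described above (not the exact
// workshop code). It returns actions to the SIP media application based on
// the invocation event type.
export const handler = async (event: any) => {
  // The call ID is available on the inbound event; the exact field path here
  // is an assumption for illustration.
  const callId = event?.CallDetails?.Participants?.[0]?.CallId;

  switch (event.InvocationEventType) {
    case "NEW_INBOUND_CALL":
      // Pause briefly, enable Amazon Voice Focus, then hand the call to the bot.
      return {
        SchemaVersion: "1.0",
        Actions: [
          { Type: "Pause", Parameters: { DurationInMilliseconds: "1000" } },
          { Type: "VoiceFocus", Parameters: { Enable: true, CallId: callId } },
          startBotConversationAction, // defined earlier, with BotAliasArn filled in
        ],
      };

    case "ACTION_SUCCESSFUL":
      // The data collected by the Amazon Lex bot arrives in this event; after
      // using (or ignoring) it, end the call.
      return {
        SchemaVersion: "1.0",
        Actions: [{ Type: "Hangup", Parameters: { SipResponseCode: "0" } }],
      };

    default:
      return { SchemaVersion: "1.0", Actions: [] };
  }
};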

Sequence diagram

The following diagram shows the sequence of calls made between PSTN audio and the Lambda function:

For a more detailed explanation of the operation, refer to the workshop documentation.

Clean up

To clean up the resources used in this demo and avoid incurring further charges, complete the following steps:

  1. In the terminal, enter the following code:
yarn destroy
  2. Return to the workshop folder (cd ../../) and enter the following code:
yarn destroy

The AWS CloudFormation stack created by the AWS CDK is destroyed, removing all the allocated resources.

Conclusion

In this post, you learned how to build a conversational interactive voice response (IVR) system using Amazon Lex and Amazon Chime SDK PSTN audio. You can use these techniques to build your own system to reduce customer call resolution times and automate informational responses on your customers’ calls.

For more information, see the project GitHub repository and Using the Amazon Chime SDK PSTN Audio service.


About the Author

Greg Herlein has led software teams for over 25 years at large and small companies, including several startups. He is currently the Principal Evangelist for the Amazon Chime SDK service where he is passionate about how to help customers build advanced communications software.


Hopped Up: NVIDIA CEO, AI Leaders to Discuss Next Wave of AI at GTC

NVIDIA’s GTC conference is packed with smart people and programming.

The virtual gathering — which takes place from March 21-24 — sits at the intersection of some of the fastest-moving technologies of our time.

It features a lineup of speakers from every corner of industry, academia and research who are ready to paint a high-definition portrait of how they’re putting the latest technology to work.

A Can’t-Miss Keynote

GTC starts with a keynote from NVIDIA founder and CEO Jensen Huang.

Each GTC, Huang introduces powerful new ways to accelerate computing of all kinds, and tells a story that puts the latest advances in perspective.

Expect Huang to introduce new technologies, products and collaborations with some of the world’s leading companies.

The keynote will be live-streamed Tuesday, March 22, starting at 8 a.m. Pacific, and available on-demand afterward. Conference registration isn’t required to watch.

Leaders From Trillion-Dollar Industries

Huang is joined by people at the cutting edge of fields in industry, research and academia who can get you oriented on how accelerated computing is remaking the world.

The event features 900 sessions representing a broad spectrum of organizations, including Amazon, Bloomberg, DeepMind, Epic Games, Google Brain, Mercedes-Benz, Microsoft, NASA, NFL, Pfizer, Visa, VMware, Walt Disney, Zoom and many more.

This GTC will focus on accelerated computing, deep learning, data science, digital twins, networking, quantum computing and computing in the data center, cloud and edge.

In addition to participants from NVIDIA, GTC will feature prominent technology experts including:

  • Andrew Ng, founder of DeepLearning.AI, founder and CEO of Landing AI
  • Bjorn Stevens, managing director and director of The Atmosphere in the Earth System, Max Planck Institute for Meteorology
  • Chelsea Finn, assistant professor of computer science, Stanford University
  • Hao Yang, vice president of AI Research, Visa
  • Jack Jin, lead machine learning Infra engineer, Zoom
  • Joe Ucuzoglu, CEO, Deloitte U.S.
  • Lidia Fonseca, chief digital and technology officer, Pfizer
  • Magnus Östberg, chief software officer, Mercedes-Benz AG
  • Marc Petit, general manager for Unreal Engine, Epic Games
  • Markus Gross, vice president of Research, Walt Disney Studios
  • Mark Russinovich, CTO and Technical Fellow, Microsoft Azure
  • Natalya Tatarchuk, director of global graphics, Unity
  • Peter Stone, executive director, Sony AI, and professor of computer science, University of Texas, Austin
  • Stefan Sicklinger, head of BigLoop and Advanced Systems, CARIAD/VW Group
  • Yu Liu, director of AI, Meta
  • Zoe Lofgren, member of Congress, U.S. House of Representatives

Spotlight on Startups

NVIDIA Inception, a global program with over 9,000 members that nurtures cutting-edge startups, will host tracks aimed at helping emerging companies build and grow their businesses and gain industry knowledge.

Sessions designed for venture capital firms include: “Emerging Venture Themes for 2022 – Omniverse + Metaverse” and “Emerging Venture Themes for 2022 – Quantum Computing.”

Learning and Development 

GTC also offers excellent opportunities for new and experienced developers to get training in some of the hottest areas in technology.

It starts with Learning Day on Monday, March 21, and continues all week. There will be sessions in four languages across multiple time zones from NVIDIA subject-matter experts and through NVIDIA’s Deep Learning Institute and the NVIDIA Academy.

Students and early-career professionals can participate in introductory deep learning and robotics courses. These include sessions like “The Right Formula for AI Success: Insights from AI High Performer,” “Deep Learning Demystified” and the “5 Steps for Starting a Career in AI” panel.

More experienced developers can enroll in DLI courses. Participants can dig as deeply as they like, even after the conference ends, and earn DLI certificates demonstrating subject-matter competency.

And through the end of March, new members to NVIDIA’s Developer Program can access an additional free GTC DLI course when they sign up.

Developed for IT professionals, NVIDIA Academy will host certified training programs on the data center, InfiniBand, IT infrastructure and networking. The program includes instructor-led training sessions followed by self-paced coursework and proctored certification tests.

Supporting AI Ecosystem for All

As part of NVIDIA’s commitment to making AI accessible for all developer communities and emerging markets, numerous sessions will showcase how developers and startups in emerging economies are building and scaling AI and data science.

Sessions for emerging markets include “Look to Africa to Advance Artificial Intelligence” and “Democratizing AI in Emerging Markets Through the United AI Alliance.”

NVIDIA also provides free credits for DLI courses to minority-serving institutions, from community colleges to historically Black colleges and universities.

Visit the GTC site to register for free.



Build a traceable, custom, multi-format document parsing pipeline with Amazon Textract

Organizational forms serve as a primary business tool across industries, from financial services to healthcare and more. Consider, for example, tax filing forms in the tax management industry, where new forms come out each year with largely the same information. AWS customers across sectors need to process and store information in forms as part of their daily business practice. These forms often serve as a primary means for information to flow into an organization where technological means of data capture are impractical.

In addition to using forms to capture information, over the years of offering Amazon Textract, we have observed that AWS customers frequently version their organizational forms based on structural changes made, fields added or changed, or other considerations such as a change of year or version of the form.

When the structure or content of a form changes, it frequently causes challenges for traditional OCR systems or impacts the downstream tools used to capture information, even when you need to capture the same information year over year and aggregate the data for use regardless of the format of the document.

To solve this problem, in this post we demonstrate how you can build and deploy an event-driven, serverless, multi-format document parsing pipeline with Amazon Textract.

Solution overview

The following diagram illustrates our solution architecture:

First, the solution offers pipeline ingest using Amazon Simple Storage Service (Amazon S3), Amazon S3 Event Notifications, and an Amazon Simple Queue Service (Amazon SQS) queue so that processing begins when a form lands in the target Amazon S3 partition. An event on Amazon EventBridge is created and sent to an AWS Lambda target that triggers an Amazon Textract job.

You can use serverless AWS services such as Lambda and AWS Step Functions to create asynchronous service integrations between AWS AI services and AWS Analytics and Database services for warehousing, analytics, and AI and machine learning (ML). In this post, we demonstrate how to use Step Functions to asynchronously control and maintain the state of requests to Amazon Textract asynchronous APIs. This is achieved by using a state machine for managing calls and responses. We use Lambda within the state machine to merge the paginated API response data from Amazon Textract into a single JSON object containing semi-structured text data extracted using OCR.
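
As a rough sketch of the Amazon Textract calls involved (not the solution’s actual source, which the CloudFormation stack deploys for you), starting the asynchronous job and merging its paginated results could look like this in a TypeScript Lambda function:

import {
  TextractClient,
  StartDocumentAnalysisCommand,
  GetDocumentAnalysisCommand,
} from "@aws-sdk/client-textract";

const textract = new TextractClient({});

// Start an asynchronous analysis job for a form that landed in the input bucket.
// The bucket and key would come from the S3 event delivered via SQS/EventBridge.
export async function startFormAnalysis(bucket: string, key: string): Promise<string> {
  const { JobId } = await textract.send(
    new StartDocumentAnalysisCommand({
      DocumentLocation: { S3Object: { Bucket: bucket, Name: key } },
      FeatureTypes: ["FORMS"], // extract key-value pairs
    })
  );
  return JobId!;
}

// Once the job has completed (after a Step Functions wait/poll loop or an SNS
// notification), merge the paginated responses into a single list of Blocks
// that can be written to S3 as one JSON object for downstream queries.
export async function collectBlocks(jobId: string) {
  const blocks: unknown[] = [];
  let nextToken: string | undefined;
  do {
    const page = await textract.send(
      new GetDocumentAnalysisCommand({ JobId: jobId, NextToken: nextToken })
    );
    blocks.push(...(page.Blocks ?? []));
    nextToken = page.NextToken;
  } while (nextToken);
  return blocks;
}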

Then we filter across different forms using a standardized approach, aggregating this OCR data into a common structured format with Amazon Athena SQL queries and a JSON SerDe for the Amazon Textract output.

You can trace the steps taken through this pipeline by using serverless Step Functions to track the processing state and retain the output of each state. Customers in some industries prefer this approach when working with data because retaining the results of all predictions from services such as Amazon Textract promotes the long-term explainability of your pipeline results.

Finally, you can query the extracted data in Athena tables.

In the following sections, we walk you through setting up the pipeline using AWS CloudFormation, testing the pipeline, and adding new form versions. This pipeline provides a maintainable solution because every component (ingest, text extraction, text processing) is independent and isolated.

Define default input parameters for CloudFormation stacks

To define the input parameters for the CloudFormation stacks, open default.properties under the params folder and enter the following code:

- set the default value for parameter 'pInputBucketName' for Input S3 bucket 
- set the default value for parameter 'pOutputBucketName' for Output S3 bucket 
- set the default value for parameter 'pInputQueueName' for Ingest SQS (a.k.a job scheduler)
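
For example, assuming the standard key=value .properties layout, a populated file might look like the following, with placeholder values you would replace with your own resource names:

pInputBucketName=<your-input-bucket-name>
pOutputBucketName=<your-output-bucket-name>
pInputQueueName=<your-ingest-queue-name>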

Deploy the solution

To deploy your pipeline, complete the following steps:

  1. Choose Launch Stack:
  2. Choose Next.
  3. Specify the stack details as shown in the following screenshot and choose Next.
  4. In the Configure stack options section, add optional tags, permissions, and other advanced settings.
  5. Choose Next.
  6. Review the stack details and select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  7. Choose Create stack.

This initiates stack deployment in your AWS account.

After the stack is deployed successfully, you can start testing the pipeline as described in the next section.

Test the pipeline

After a successful deployment, complete the following steps to test your pipeline:

  1. Download the sample files onto your computer.
  2. Create an /uploads folder (partition) under the newly created input S3 bucket.
  3. Create the separate folders (partitions) like jobapplications under /uploads.
  4. Upload the first version of the job application from the sample docs folder to the /uploads/jobapplications partition.

When the pipeline is complete, you can find the extracted key-value for this version of the document in /OuputS3/03-textract-parsed-output/jobapplications on the Amazon S3 console.

You can also find it in the Athena table (applications_data_table) on the Database menu (jobapplicationsdatabase).

  5. Upload the second version of the job application from the sample docs folder to the /uploads/jobapplications partition.

When the pipeline is complete, you can find the extracted key-value for this version in /OuputS3/03-textract-parsed-output/jobapplications on the Amazon S3 console.

You can also find it in the Athena table (applications_data_table) on the Database menu (jobapplicationsdatabase).

You’re done! You’ve successfully deployed your pipeline.

Add new form versions

Updating the solution for a new form version is straightforward: for each new version, you only need to update and test the queries in the processing stack.

After you make the updates, you can redeploy the updated pipeline using AWS CloudFormation APIs and process new documents, arriving at the same standard data points for your schema with minimal disruption and development effort needed to make changes to your pipeline. This flexibility, which is achieved by decoupling the parsing and extraction behavior and using the JSON SerDe functionality in Athena, makes this pipeline a maintainable solution for any number of form versions that your organization needs to process to gather information.

As you run the ingest solution, data from incoming forms is automatically populated to Athena with information about the files and inputs associated to them. When the data in your forms moves from unstructured to structured data, it’s ready to use for downstream applications such as analytics, ML modeling, and more.

Clean up

To avoid incurring ongoing charges, delete the resources you created as part of this solution when you’re done.

  1. On the Amazon S3 console, manually delete the buckets you created as part of the CloudFormation stack.
  2. On the AWS CloudFormation console, choose Stacks in the navigation pane.
  3. Select the main stack and choose Delete.

This automatically deletes the nested stacks.

Conclusion

In this post, we demonstrated how customers seeking to trace and customize their document processing can build and deploy an event-driven, serverless, multi-format document parsing pipeline with Amazon Textract. This pipeline provides a maintainable solution because every component (ingest, text extraction, text processing) is independent and isolated, allowing organizations to operationalize their solutions to address diverse processing needs.

Try the solution today and leave your feedback in the comments section.


About the Authors

Emily Soward is a Data Scientist with AWS Professional Services. She holds a Master of Science with Distinction in Artificial Intelligence from the University of Edinburgh in Scotland, United Kingdom with emphasis on Natural Language Processing (NLP). Emily has served in applied scientific and engineering roles focused on AI-enabled product research and development, operational excellence, and governance for AI workloads running at organizations in the public and private sector. She contributes to customer guidance as an AWS Senior Speaker and recently, as an author for AWS Well-Architected in the Machine Learning Lens.

Sandeep Singh is a Data Scientist with AWS Professional Services. He holds a Master of Science in Information Systems with concentration in AI and Data Science from San Diego State University (SDSU), California. He is a full stack Data Scientist with a strong computer science background and Trusted adviser with specialization in AI Systems and Control design. He is passionate about helping customers to get their high impact projects in the right direction, advising and guiding them in their Cloud journey, and building state-of-the-art AI/ML enabled solutions.


Offline Optimization for Architecting Hardware Accelerators

Advances in machine learning (ML) often come with advances in hardware and computing systems. For example, the growth of ML-based approaches in solving various problems in vision and language has led to the development of application-specific hardware accelerators (e.g., Google TPUs and Edge TPUs). While promising, standard procedures for designing accelerators customized towards a target application require manual effort to devise a reasonably accurate simulator of hardware, followed by performing many time-intensive simulations to optimize the desired objective (e.g., optimizing for low power usage or latency when running a particular application). This involves identifying the right balance between total amount of compute and memory resources and communication bandwidth under various design constraints, such as the requirement to meet an upper bound on chip area usage and peak power. However, designs intended to meet these constraints often turn out to be infeasible. To address these challenges, we ask: “Is it possible to train an expressive deep neural network model on large amounts of existing accelerator data and then use the learned model to architect future generations of specialized accelerators, eliminating the need for computationally expensive hardware simulations?”

In “Data-Driven Offline Optimization for Architecting Hardware Accelerators”, accepted at ICLR 2022, we introduce PRIME, an approach focused on architecting accelerators based on data-driven optimization that only utilizes existing logged data (e.g., data leftover from traditional accelerator design efforts), consisting of accelerator designs and their corresponding performance metrics (e.g., latency, power, etc.) to architect hardware accelerators without any further hardware simulation. This alleviates the need to run time-consuming simulations and enables reuse of data from past experiments, even when the set of target applications changes (e.g., an ML model for vision, language, or other objective), and even for unseen applications related to those in the training set, in a zero-shot fashion. PRIME can be trained on data from prior simulations, a database of actually fabricated accelerators, and also a database of infeasible or failed accelerator designs1. This approach for architecting accelerators — tailored towards both single- and multi-application settings — improves performance upon state-of-the-art simulation-driven methods by about 1.2x-1.5x, while considerably reducing the required total simulation time by 93% and 99%, respectively. PRIME also architects effective accelerators for unseen applications in a zero-shot setting, outperforming simulation-based methods by 1.26x.

PRIME uses logged accelerator data, consisting of both feasible and infeasible accelerators, to train a conservative model, which is used to design accelerators while meeting design constraints. PRIME architects accelerators with up to 1.5x smaller latency, while reducing the required hardware simulation time by up to 99%.

The PRIME Approach for Architecting Accelerators
Perhaps the simplest possible way to use a database of previously designed accelerators for hardware design is to use supervised machine learning to train a prediction model that can predict the performance objective for a given accelerator as input. Then, one could potentially design new accelerators by optimizing the performance output of this learned model with respect to the input accelerator design. Such an approach is known as model-based optimization. However, this simple approach has a key limitation: it assumes that the prediction model can accurately predict the cost for every accelerator that we might encounter during optimization! It is well established that most prediction models trained via supervised learning misclassify adversarial examples that “fool” the learned model into predicting incorrect values. Similarly, it has been shown that even optimizing the output of a supervised model finds adversarial examples that look promising under the learned model2, but perform terribly under the ground truth objective.

To address this limitation, PRIME learns a robust prediction model that is not prone to being fooled by adversarial examples (which we describe shortly) that would otherwise be found during optimization. One can then simply optimize this model using any standard optimizer to architect accelerators. More importantly, unlike prior methods, PRIME can also utilize existing databases of infeasible accelerators to learn what not to design. This is done by augmenting the supervised training of the learned model with additional loss terms that specifically penalize the value of the learned model on the infeasible accelerator designs and adversarial examples during training. This approach resembles a form of adversarial training.
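
Schematically (in our notation, not necessarily the exact objective from the paper), the conservative training loss combines a standard supervised fit on the logged feasible designs with a term that pushes down the model's predicted value on infeasible and adversarially mined designs:

$$\mathcal{L}(\theta) \;=\; \underbrace{\mathbb{E}_{(x,\,y)\sim\mathcal{D}_{\mathrm{feasible}}}\!\left[\big(f_\theta(x)-y\big)^2\right]}_{\text{fit the logged data}} \;+\; \alpha\,\underbrace{\mathbb{E}_{x^{-}\sim\,\mathcal{D}_{\mathrm{infeasible}}\,\cup\,\mathrm{Adv}(f_\theta)}\!\left[f_\theta(x^{-})\right]}_{\text{penalize value on negatives}}$$

Here $f_\theta$ is trained so that higher values mean more desirable designs, $\mathrm{Adv}(f_\theta)$ denotes adversarial designs mined against the current model, and $\alpha$ trades off the two terms; the final accelerator is then obtained by optimizing the input design against the trained model under the chip-level constraints.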

In principle, one of the central benefits of a data-driven approach is that it should enable learning highly expressive and generalist models of the optimization objective that generalize over target applications, while also potentially being effective for new unseen applications for which a designer has never attempted to optimize accelerators. To train PRIME so that it generalizes to unseen applications, we modify the learned model to be conditioned on a context vector that identifies a given neural net application we wish to accelerate (as we discuss in our experiments below, we choose to use high-level features of the target application, such as the number of feed-forward layers, number of convolutional layers, and total parameters, to serve as the context), and train a single, large model on accelerator data for all applications designers have seen so far. As we will discuss below in our results, this contextual modification of PRIME enables it to optimize accelerators both for multiple, simultaneous applications and for new unseen applications in a zero-shot fashion.

Does PRIME Outperform Custom-Engineered Accelerators?
We evaluate PRIME on a variety of actual accelerator design tasks. We start by comparing the optimized accelerator design architected by PRIME targeted towards nine applications to the manually optimized EdgeTPU design. EdgeTPU accelerators are primarily optimized towards running applications in image classification, particularly MobileNetV2, MobileNetV3 and MobileNetEdge. Our goal is to check if PRIME can design an accelerator that attains a lower latency than a baseline EdgeTPU accelerator3, while also constraining the chip area to be under 27 mm2 (the default for the EdgeTPU accelerator). Shown below, we find that PRIME improves latency over EdgeTPU by 2.69x (up to 11.84x in t-RNN Enc), while also reducing the chip area usage by 1.50x (up to 2.28x in MobileNetV3), even though it was never trained to reduce chip area! Even on the MobileNet image-classification models, for which the custom-engineered EdgeTPU accelerator was optimized, PRIME improves latency by 1.85x.

Comparing latencies (lower is better) of accelerator designs suggested by PRIME and EdgeTPU for single-model specialization.
The chip area (lower is better) reduction compared to a baseline EdgeTPU design for single-model specialization.

Designing Accelerators for New and Multiple Applications, Zero-Shot
We now study how PRIME can use logged accelerator data to design accelerators for (1) multiple applications, where we optimize PRIME to design a single accelerator that works well across multiple applications simultaneously, and in a (2) zero-shot setting, where PRIME must generate an accelerator for new unseen application(s) without training on any data from such applications. In both settings, we train the contextual version of PRIME, conditioned on context vectors identifying the target applications and then optimize the learned model to obtain the final accelerator. We find that PRIME outperforms the best simulator-driven approach in both settings, even when very limited data is provided for training for a given application but many applications are available. Specifically in the zero-shot setting, PRIME outperforms the best simulator-driven method we compared to, attaining a reduction of 1.26x in latency. Further, the difference in performance increases as the number of training applications increases.

The average latency (lower is better) of test applications under zero-shot setting compared to a state-of-the-art simulator-driven approach. The text on top of each bar shows the set of training applications.

Closely Analyzing an Accelerator Designed by PRIME
To provide more insight into the hardware architecture, we examine the best accelerator designed by PRIME and compare it to the best accelerator found by the simulator-driven approach. We consider the setting where we need to jointly optimize the accelerator for all nine applications (MobileNetEdge, MobileNetV2, MobileNetV3, M4, M5, M64, t-RNN Dec, t-RNN Enc, and U-Net) under a chip area constraint of 100 mm2. We find that PRIME improves latency by 1.35x over the simulator-driven approach.

Per application latency (lower is better) for the best accelerator design suggested by PRIME and state-of-the-art simulator-driven approach for a multi-task accelerator design. PRIME reduces the average latency across all nine applications by 1.35x over the simulator-driven method.

As shown above, while the latency of the accelerator designed by PRIME for MobileNetEdge, MobileNetV2, MobileNetV3, M4, t-RNN Dec, and t-RNN Enc are better, the accelerator found by the simulation-driven approach yields a lower latency in M5, M6, and U-Net. By closely inspecting the accelerator configurations, we find that PRIME trades compute (64 cores for PRIME vs. 128 cores for the simulator-driven approach) for larger Processing Element (PE) memory size (2,097,152 bytes vs. 1,048,576 bytes). These results show that PRIME favors PE memory size to accommodate the larger memory requirements in t-RNN Dec and t-RNN Enc, where large reductions in latency were possible. Under a fixed area budget, favoring larger on-chip memory comes at the expense of lower compute power in the accelerator. This reduction in the accelerator’s compute power leads to higher latency for the models with large numbers of compute operations, namely M5, M6, and U-Net.

Conclusion
The efficacy of PRIME highlights the potential for utilizing the logged offline data in an accelerator design pipeline. A likely avenue for future work is to scale this approach across an array of applications, where we expect to see larger gains because simulator-driven approaches would need to solve a complex optimization problem, akin to searching for a needle in a haystack, whereas PRIME can benefit from generalization of the surrogate model. On the other hand, we would also note that PRIME outperforms the prior simulator-driven methods we utilize, which makes it a promising candidate to be used within a simulator-driven method. More generally, training a strong offline optimization algorithm on offline datasets of low-performing designs can be a highly effective ingredient for, at the very least, kickstarting hardware design, rather than throwing out prior data. Finally, given the generality of PRIME, we hope to use it for hardware-software co-design, which exhibits a large search space but plenty of opportunity for generalization. We have also released both the code for training PRIME and the dataset of accelerators.

Acknowledgments
We thank our co-authors Sergey Levine, Kevin Swersky, and Milad Hashemi for their advice, thoughts and suggestions. We thank James Laudon, Cliff Young, Ravi Narayanaswami, Berkin Akin, Sheng-Chun Kao, Samira Khan, Suvinay Subramanian, Stella Aslibekyan, Christof Angermueller, and Olga Wichrowska for their help and support, and Sergey Levine for feedback on this blog post. In addition, we would like to extend our gratitude to the members of “Learn to Design Accelerators”, “EdgeTPU”, and the Vizier team for providing invaluable feedback and suggestions. We would also like to thank Tom Small for the animated figure used in this post.


1The infeasible accelerator designs stem from build errors in silicon or compilation/mapping failures. 
2This is akin to adversarial examples in supervised learning – these examples are close to the data points observed in the training dataset, but are misclassified by the classifier. 
3The performance metrics for the baseline EdgeTPU accelerator are extracted from an industry-based hardware simulator tuned to match the performance of the actual hardware. 
4These are proprietary object-detection models, and we refer to them as M4 (indicating Model 4), M5, and M6 in the paper. 


3 Questions: How the MIT mini cheetah learns to run

It’s been roughly 23 years since one of the first robotic animals trotted on the scene, defying classical notions of our cuddly four-legged friends. Since then, a barrage of the walking, dancing, and door-opening machines have commanded their presence, a sleek mixture of batteries, sensors, metal, and motors. Missing from the list of cardio activities was one both loved and loathed by humans (depending on whom you ask), and which proved slightly trickier for the bots: learning to run. 

Researchers from MIT’s Improbable AI Lab, part of the Computer Science and Artificial Intelligence Laboratory (CSAIL) and directed by MIT Assistant Professor Pulkit Agrawal, as well as the Institute of AI and Fundamental Interactions (IAIFI) have been working on fast-paced strides for a robotic mini cheetah — and their model-free reinforcement learning system broke the record for the fastest run recorded. Here, MIT PhD student Gabriel Margolis and IAIFI postdoc Ge Yang discuss just how fast the cheetah can run. 

Q: We’ve seen videos of robots running before. Why is running harder than walking?  

A: Achieving fast running requires pushing the hardware to its limits, for example by operating near the maximum torque output of motors. In such conditions, the robot dynamics are hard to analytically model. The robot needs to respond quickly to changes in the environment, such as the moment it encounters ice while running on grass. If the robot is walking, it is moving slowly and the presence of snow is not typically an issue. Imagine if you were walking slowly, but carefully: you can traverse almost any terrain. Today’s robots face an analogous problem. The problem is that moving on all terrains as if you were walking on ice is very inefficient, but is common among today’s robots. Humans run fast on grass and slow down on ice — we adapt. Giving robots a similar capability to adapt requires quick identification of terrain changes and quickly adapting to prevent the robot from falling over. In summary, because it’s impractical to build analytical (human-designed) models of all possible terrains in advance, and the robot’s dynamics become more complex at high-velocities, high-speed running is more challenging than walking.

Q: Previous agile running controllers for the MIT Cheetah 3 and mini cheetah, as well as for Boston Dynamics’ robots, are “analytically designed,” relying on human engineers to analyze the physics of locomotion, formulate efficient abstractions, and implement a specialized hierarchy of controllers to make the robot balance and run. You use a “learn-by-experience model” for running instead of programming it. Why? 

A: Programming how a robot should act in every possible situation is simply very hard. The process is tedious, because if a robot were to fail on a particular terrain, a human engineer would need to identify the cause of failure and manually adapt the robot controller, and this process can require substantial human time. Learning by trial and error removes the need for a human to specify precisely how the robot should behave in every situation. This would work if: (1) the robot can experience an extremely wide range of terrains; and (2) the robot can automatically improve its behavior with experience. 

Thanks to modern simulation tools, our robot can accumulate 100 days’ worth of experience on diverse terrains in just three hours of actual time. We developed an approach by which the robot’s behavior improves from simulated experience, and our approach critically also enables successful deployment of those learned behaviors in the real world. The intuition behind why the robot’s running skills work well in the real world is: Of all the environments it sees in this simulator, some will teach the robot skills that are useful in the real world. When operating in the real world, our controller identifies and executes the relevant skills in real-time.  

Q: Can this approach be scaled beyond the mini cheetah? What excites you about its future applications?  

A: At the heart of artificial intelligence research is the trade-off between what the human needs to build in (nature) and what the machine can learn on its own (nurture). The traditional paradigm in robotics is that humans tell the robot both what task to do and how to do it. The problem is that such a framework is not scalable, because it would take immense human engineering effort to manually program a robot with the skills to operate in many diverse environments. A more practical way to build a robot with many diverse skills is to tell the robot what to do and let it figure out the how. Our system is an example of this. In our lab, we’ve begun to apply this paradigm to other robotic systems, including hands that can pick up and manipulate many different objects.

This work is supported by the DARPA Machine Common Sense Program, Naver Labs, MIT Biomimetic Robotics Lab, and the NSF AI Institute of AI and Fundamental Interactions. The research was conducted at the Improbable AI Lab.


Handheld surgical robot can help stem fatal blood loss

After a traumatic accident, there is a small window of time when medical professionals can apply lifesaving treatment to victims with severe internal bleeding. Delivering this type of care is complex, and key interventions require inserting a needle and catheter into a central blood vessel, through which fluids, medications, or other aids can be given. First responders, such as ambulance emergency medical technicians, are not trained to perform this procedure, so treatment can only be given after the victim is transported to a hospital. In some instances, by the time the victim arrives to receive care, it may already be too late.

A team of researchers at MIT Lincoln Laboratory, led by Laura Brattain and Brian Telfer from the Human Health and Performance Systems Group, together with physicians from the Center for Ultrasound Research and Translation (CURT) at Massachusetts General Hospital, led by Anthony Samir, have developed a solution to this problem. The Artificial Intelligence–Guided Ultrasound Intervention Device (AI-GUIDE) is a handheld platform technology that has the potential to help personnel with simple training to quickly install a catheter into a common femoral vessel, enabling rapid treatment at the point of injury.

“Simplistically, it’s like a highly intelligent stud-finder married to a precision nail gun,” says Matt Johnson, a research team member from the laboratory’s Human Health and Performance Systems Group.

AI-GUIDE is a platform device made of custom-built algorithms and integrated robotics that could pair with most commercial portable ultrasound devices. To operate AI-GUIDE, a user first places it on the patient’s body, near where the thigh meets the abdomen. A simple targeting display guides the user to the correct location and then instructs them to pull a trigger, which precisely inserts the needle into the vessel. The device verifies that the needle has penetrated the blood vessel, and then prompts the user to advance an integrated guidewire, a thin wire inserted into the body to guide a larger instrument, such as a catheter, into a vessel. The user then manually advances a catheter. Once the catheter is securely in the blood vessel, the device withdraws the needle and the user can remove the device.

With the catheter safely inside the vessel, responders can then deliver fluid, medicine, or other interventions.

As easy as pressing a button

The Lincoln Laboratory team developed the AI in the device by leveraging technology used for real-time object detection in images.

“Using transfer learning, we trained the algorithms on a large dataset of ultrasound scans acquired by our clinical collaborators at MGH,” says Lars Gjesteby, a member of the laboratory’s research team. “The images contain key landmarks of the vascular anatomy, including the common femoral artery and vein.”

These algorithms interpret the visual data coming in from the ultrasound that is paired with AI-GUIDE and then indicate the correct blood vessel location to the user on the display.

“The beauty of the on-device display is that the user never needs to interpret, or even see, the ultrasound imagery,” says Mohit Joshi, the team member who designed the display. “They are simply directed to move the device until a rectangle, representing the target vessel, is in the center of the screen.”

For the user, the device may seem as easy to use as pressing a button to advance a needle, but to ensure rapid and reliable success, a lot is happening behind the scenes. For example, when a patient has lost a large volume of blood and becomes hypotensive, veins that would typically be round and full of blood become flat. When the needle tip reaches the center of the vein, the wall of the vein is likely to “tent” inward, rather than being punctured by the needle. As a result, though the needle was injected to the proper location, it fails to enter the vessel.

To ensure that the needle reliably punctures the vessel, the team engineered the device to be able to check its own work.

“When AI-GUIDE injects the needle toward the center of the vessel, it searches for the presence of blood by creating suction,” says Josh Werblin, the program’s mechanical engineer. “Optics in the device’s handle trigger when blood is present, indicating that the insertion was successful.” This technique is part of why AI-GUIDE has shown very high injection success rates, even in hypotensive scenarios where veins are likely to tent.

Recently, the team published a paper in the journal Biosensors that reports on AI-GUIDE’s needle insertion success rates. Users with medical experience ranging from zero to greater than 15 years tested AI-GUIDE on an artificial model of human tissue and blood vessels and one expert user tested it on a series of live, sedated pigs. The team reported that after only two minutes of verbal training, all users of the device on the artificial human tissue were successful in placing a needle, with all but one completing the task in less than one minute. The expert user was also successful in quickly placing both the needle and the integrated guidewire and catheter in about a minute. The needle insertion speed and accuracy were comparable to that of experienced clinicians operating in hospital environments on human patients. 

Theodore Pierce, a radiologist and collaborator from MGH, says AI-GUIDE’s design, which makes it stable and easy to use, directly translates to low training requirements and effective performance. “AI-GUIDE has the potential to be faster, more precise, safer, and require less training than current manual image-guided needle placement procedures,” he says. “The modular design also permits easy adaptation to a variety of clinical scenarios beyond vascular access, including minimally invasive surgery, image-guided biopsy, and imaging-directed cancer therapy.”

In 2021, the team received an R&D 100 Award for AI-GUIDE, recognizing it among the year’s most innovative new technologies available for license or on the market. 

What’s next?

Right now, the team is continuing to test the device and work on fully automating every step of its operation. In particular, they want to automate the guidewire and catheter insertion steps to further reduce risk of user error or potential for infection.

“Retraction of the needle after catheter placement reduces the chance of an inadvertent needle injury, a serious complication in practice which can result in the transmission of diseases such as HIV and hepatitis,” says Pierce. “We hope that a reduction in manual manipulation of procedural components, resulting from complete needle, guidewire, and catheter integration, will reduce the risk of central line infection.”

AI-GUIDE was built and tested within Lincoln Laboratory's new Virtual Integration Technology Lab (VITL), which was established to bring a medical device prototyping capability to the laboratory.

“Our vision is to rapidly prototype intelligent medical devices that integrate AI, sensing — particularly portable ultrasound — and miniature robotics to address critical unmet needs for both military and civilian care,” says Laura Brattain, who is the AI-GUIDE project co-lead and also holds a visiting scientist position at MGH. “In working closely with our clinical collaborators, we aim to develop capabilities that can be quickly translated to the clinical setting. We expect that VITL’s role will continue to grow.”

AutonomUS, a startup company founded by AI-GUIDE’s MGH co-inventors, recently secured an option for the intellectual property rights for the device. AutonomUS is actively seeking investors and strategic partners.

“We see the AI-GUIDE platform technology becoming ubiquitous throughout the health-care system,” says Johnson, “enabling faster and more accurate treatment by users with a broad range of expertise, for both pre-hospital emergency interventions and routine image-guided procedures.”

This work was supported by the U.S. Army Combat Casualty Care Research Program and Joint Program Committee – 6. Nancy DeLosa, Forrest Kuhlmann, Jay Gupta, Brian Telfer, David Maurer, Wes Hill, Andres Chamorro, and Allison Cheng provided technical contributions, and Arinc Ozturk, Xiaohong Wang, and Qian Li provided guidance on clinical use.


Meet 3 women who test Google products for fairness

One of the most interesting parts of working at Google is learning what other people do here — it’s not uncommon to come across a job title you’ve never heard of. For example: ProFair Program Manager, or ProFair Analyst.

These roles are part of our Responsible Innovation team, which focuses on making sure our tech supports Google's AI Principles. One way the team does this is by conducting proactive algorithmic product fairness — or ProFair — testing. This means bringing social and cultural perspectives to the testing process to assess whether an AI or ML application, dataset, or technique might create or reinforce unfair bias. Three women who work on ProFair testing are Anne Peckham, N'Mah Y., and Cherish M., and today we're asking them: What's your job?

The job: ProFair Responsible Innovation

Anne is a program manager, and N'Mah and Cherish are analysts.

So…what do you do?

Anne Peckham, a program manager working on ProFair for Responsible Innovation, says she primarily helps others get things done. "I organize projects, figure out strategies, identify what needs to get done, provide documentation, keep track of learnings…and do it again for each project."

N'Mah is a ProFair analyst. "I lead ProFair training across Google, coordinate an ethics fellowship program for Googlers, and design and conduct fairness tests for products before launch."

Cherish, also an analyst, does this as well. "I help product teams understand how to improve products ahead of launch. I drive our company-wide program in teaching Googlers how to test products, too." Cherish says a big part of her role is making sure that when product teams are building something, they think of everyone who will use it — referencing the Google AI Principle of "avoid[ing] creating or reinforcing unfair bias." "Far ahead of launch time, I look for ways a proposed AI application, ML model or dataset might not function optimally for a user due to unfair bias, so we can help fix it proactively."

All three enjoy the variety that comes with this work. “I love how collaborative my role is,” Anne says. “I get to work on many types of projects and with lots of different teams — including the Responsible AI research group.” N’Mah also enjoys seeing the products she’s supported make a difference in the world once they’ve actually launched.

“This role forces me to think outside the box, which I enjoy, and I’m able to advocate for users who may not be in the room,” Cherish says. “This job is very cerebral in nature. And I love collaborating with others to build these products for good.”

How did you choose this job?

None of the three Googlers knew ProFair was an option when they were first considering their careers. "For a while, I wanted to be a librarian, but coming out of college, I'd been interested in doing political science research or program operations," Anne says. "I had an entry-level job as a program assistant where I was making lists and helping others move goals forward, and that skill transferred to different sectors."

"I wanted to be a lawyer, but ended up studying Middle East Studies and Spanish," says N'Mah. "I focused on cross-cultural experiences, and that's ultimately what drew me to this work." That background ended up aiding her, she says, because it helps her understand how products impact people from different cultural backgrounds. Cherish also wanted to be a lawyer, and was interested in technology and ethics. "I was always interested in serving others," she says. "But I had no idea this sort of career even existed! The teams and roles we work in were developed within the past few years."

What would you tell someone who wants your job?

"Today, there are more straightforward paths toward this work. Thankfully, people who are currently in school have networks to leverage to learn more about this work," Cherish says. Still, she says, "there is no linear path." Someone who wants to do this kind of work should be interested in technological innovation, but also focused on doing so with social benefit top of mind.

Anne agrees with Cherish: "There is no single path to this kind of work, but I've noticed people who choose this career are curious and passionate about whatever it is they are working on. I love program management, but others are passionate about building testing infrastructure, or achieving the most social benefit. You see them bring that enthusiasm to their teams." Anne mentions that she didn't think there was "room" for her in this field, which is something to consider for those interested in similar careers: the point of product fairness work is that all perspectives and backgrounds are included, not just people with MBAs and computer science degrees. "Ultimately, technology shouldn't be built for homogenous audiences," Cherish says, and the people who work in this field should be just as diverse.

N’Mah says you shouldn’t feel pigeon-holed by your academic or career background; different experiences, personal and professional, are needed here. “There are a variety of backgrounds you can come from to work in this space — that’s what makes the team great,” she says. “If you’re interested in cross-cultural connections, or socially beneficial technical solutions, this could be an area of interest.” And if you’re someone who’s aware of their own unconscious biases, you might be naturally inclined toward a career in product fairness.

Bonus question: For Women’s History Month, who are some of your women role models?

"I have a strong group of female friends from high school who I've kept in touch with over the years," Anne says. "We've all pursued different paths and have various strengths in our careers, but when we meet up, I love hearing what they're passionate about and what they're working on." N'Mah says Harriet Tubman has always been a symbol to her of what's possible in this country. "She persevered during a challenging moment in history and has done so much to push America forward socially." Cherish looks up to Maya Angelou. "She had such an incredibly poignant impact on society through her activism and her literature."
