Language models like Gopher can “hallucinate” facts that appear plausible but are actually fake. Those who are familiar with this problem know to do their own fact-checking, rather than trusting what language models say. Those who are not may end up believing something that isn’t true. This paper describes GopherCite, a model which aims to address the problem of language model hallucination. GopherCite attempts to back up all of its factual claims with evidence from the web.
Multimodal Bottleneck Transformer (MBT): A New Model for Modality Fusion
People interact with the world through multiple sensory streams (e.g., we see objects, hear sounds, read words, feel textures and taste flavors), combining information and forming associations between senses. As real-world data consists of various signals that co-occur, such as video frames and audio tracks, web images and their captions and instructional videos and speech transcripts, it is natural to apply a similar logic when building and designing multimodal machine learning (ML) models.
Effective multimodal models have wide applications, such as multilingual image retrieval, future action prediction, and vision-language navigation, and are important for several reasons: robustness, which is the ability to perform even when one or more modalities are missing or corrupted, and complementarity between modalities, which is the idea that some information may be present only in one modality (e.g., the audio stream) and not in the other (e.g., the video frames). While the dominant paradigm for multimodal fusion, called late fusion, uses separate models to encode each modality and then simply combines their output representations at the final step, how to effectively and efficiently combine information from different modalities remains understudied.
In “Attention Bottlenecks for Multimodal Fusion”, published at NeurIPS 2021, we introduce a novel transformer-based model for multimodal fusion in video called Multimodal Bottleneck Transformer (MBT). Our model restricts cross-modal attention flow between latent units in two ways: (1) through tight fusion bottlenecks that force the model to collect and condense the most relevant inputs in each modality (sharing only necessary information with other modalities), and (2) by limiting cross-modal attention to later layers of the model, allowing early layers to specialize in information from individual modalities. We demonstrate that this approach achieves state-of-the-art results on video classification tasks, with a 50% reduction in FLOPs compared to a vanilla multimodal transformer model. We have also released our code as a tool for researchers to leverage as they expand on multimodal fusion work.
A Vanilla Multimodal Transformer Model
Transformer models consistently obtain state-of-the-art results in ML tasks, including video (ViViT) and audio classification (AST). Both ViViT and AST are built on the Vision Transformer (ViT); in contrast to standard convolutional approaches that process images pixel-by-pixel, ViT treats an image as a sequence of patch tokens (i.e., tokens from a smaller part, or patch, of an image that is made up of multiple pixels). These models then perform self-attention operations across all pairs of patch tokens. However, using transformers for multimodal fusion is challenging because of their high computational cost, with complexity scaling quadratically with input sequence length.
Because transformers effectively process variable-length sequences, the simplest way to extend a unimodal transformer, such as ViT, to the multimodal case is to feed the model a sequence of both visual and auditory tokens, with minimal changes to the transformer architecture. We call this a vanilla multimodal transformer model, which allows free attention flow (called vanilla cross-attention) between different spatial and temporal regions in an image, and across frequency and time in audio inputs, represented by spectrograms. However, while easy to implement by concatenating audio and video input tokens, applying vanilla cross-attention at all layers of the transformer model is unnecessary: audio and visual inputs contain dense, fine-grained information, much of which is redundant for the task, so attending over every cross-modal token pair at every layer increases complexity without a clear benefit.
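As a rough illustration of this baseline, the following NumPy sketch concatenates hypothetical video and audio token sequences and runs a single self-attention operation over the joint sequence. This is not the released model code: the token counts, embedding size, and single-head attention are illustrative simplifications, and residual connections, layer norm, and MLP blocks are omitted.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all tokens in x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

d = 64                                    # hypothetical embedding size
rng = np.random.default_rng(0)
video_tokens = rng.normal(size=(196, d))  # e.g., 14x14 image patches
audio_tokens = rng.normal(size=(98, d))   # e.g., spectrogram patches

# Vanilla multimodal transformer layer: concatenate both modalities and let
# every token attend to every other token (within-modality and cross-modal
# pairs alike), so the cost scales with (196 + 98)^2 token pairs.
tokens = np.concatenate([video_tokens, audio_tokens], axis=0)
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
fused = self_attention(tokens, w_q, w_k, w_v)
print(fused.shape)  # (294, 64)
```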
Restricting Attention Flow
The issue of growing complexity for long sequences in multimodal models can be mitigated by reducing the attention flow. We restrict attention flow using two methods, specifying the fusion layer and adding attention bottlenecks.
- Fusion layer (early, mid or late fusion): In multimodal models, the layer where cross-modal interactions are introduced is called the fusion layer. The two extreme versions are early fusion (where all layers in the transformer are cross-modal) and late fusion (where all layers are unimodal and no cross-modal information is exchanged in the transformer encoder). Specifying a fusion layer in between leads to mid fusion. This technique builds on a common paradigm in multimodal learning, which is to restrict cross-modal flow to later layers of the network, allowing early layers to specialize in learning and extracting unimodal patterns.
- Attention bottlenecks: We also introduce a small set of latent units that form an attention bottleneck (shown below in purple), which force the model, within a given layer, to collate and condense information from each modality before sharing it with the other, while still allowing free attention flow within a modality. We demonstrate that this bottlenecked version (MBT) outperforms or matches its unrestricted counterpart with lower computational cost.
The different attention configurations in our model. Unlike late fusion (top left), where no cross-modal information is exchanged in the transformer encoder, we investigate two pathways for the exchange of cross-modal information. Early and mid fusion (top middle, top right) is done via standard pairwise self-attention across all hidden units in a layer. For mid fusion, cross-modal attention is applied only to later layers in the model. Bottleneck fusion (bottom left) restricts attention flow within a layer through tight latent units called attention bottlenecks. Bottleneck mid fusion (bottom right) applies both forms of restriction in conjunction for optimal performance.
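To make the bottleneck idea concrete, here is a conceptual NumPy sketch of a single bottleneck fusion layer: each modality attends only to its own tokens plus a small set of shared bottleneck tokens, so cross-modal information can flow only through the bottleneck. This is not the authors' implementation; multi-head attention, residuals, layer norm, and MLPs are omitted, all sizes are made up, and averaging the two modalities' bottleneck updates is one simple choice we assume here.

```python
import numpy as np

def attend(x, context, w_q, w_k, w_v):
    """Tokens in x attend over the tokens in context (single attention head)."""
    q, k, v = x @ w_q, context @ w_k, context @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def bottleneck_fusion_layer(video, audio, bottleneck, params):
    """One fusion layer: each modality attends to [its own tokens; bottleneck],
    and the bottleneck updates produced by the two modalities are averaged."""
    outputs, new_bottlenecks = {}, []
    for name, tokens in (("video", video), ("audio", audio)):
        joint = np.concatenate([tokens, bottleneck], axis=0)
        out = attend(joint, joint, *params[name])
        outputs[name] = out[: len(tokens)]          # updated modality tokens
        new_bottlenecks.append(out[len(tokens):])   # this modality's bottleneck update
    return outputs["video"], outputs["audio"], sum(new_bottlenecks) / 2

d, n_bottleneck = 64, 4
rng = np.random.default_rng(0)
params = {m: tuple(rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
          for m in ("video", "audio")}
video = rng.normal(size=(196, d))
audio = rng.normal(size=(98, d))
bottleneck = rng.normal(size=(n_bottleneck, d))

video, audio, bottleneck = bottleneck_fusion_layer(video, audio, bottleneck, params)
print(video.shape, audio.shape, bottleneck.shape)  # (196, 64) (98, 64) (4, 64)
```

Only the four bottleneck tokens carry information between the modalities, which is what keeps the cross-modal cost small.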
Bottlenecks and Computation Cost
We apply MBT to the task of sound classification using the AudioSet dataset and investigate its performance for two approaches: (1) vanilla cross-attention, and (2) bottleneck fusion. For both approaches, mid fusion (shown by the middle values of the x-axis below) outperforms both early fusion (fusion layer = 0) and late fusion (fusion layer = 12). This suggests that the model benefits from restricting cross-modal connections to later layers, allowing earlier layers to specialize in learning unimodal features, while still benefiting from multiple layers of cross-modal information flow. We find that adding attention bottlenecks (bottleneck fusion) outperforms or matches vanilla cross-attention for all fusion layers, with more prominent improvements at lower fusion layers.
We compare the amount of computation, measured in GFLOPs, for both vanilla cross-attention and bottleneck fusion. Using a small number of attention bottlenecks (four bottleneck tokens used in our experiments) adds negligible extra computation over a late fusion model, with computation remaining largely constant with varying fusion layers. This is in contrast to vanilla cross-attention, which has a non-negligible computational cost for every layer it is applied to. We note that for early fusion, bottleneck fusion outperforms vanilla cross-attention by over 2 mean average precision points (mAP) on audiovisual sound classification, with less than half the computational cost.
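As a rough back-of-the-envelope check on why the bottleneck keeps cost nearly flat, counting attended token pairs (proportional to attention FLOPs, with illustrative token counts) is enough:

```python
n_video, n_audio, n_bottleneck = 196, 98, 4

# Attended token pairs per layer, proportional to attention compute.
vanilla_cross = (n_video + n_audio) ** 2                                       # 86436
unimodal      = n_video ** 2 + n_audio ** 2                                    # 48020
bottlenecked  = (n_video + n_bottleneck) ** 2 + (n_audio + n_bottleneck) ** 2  # 50404

print(vanilla_cross, unimodal, bottlenecked)
```

With only four bottleneck tokens, a bottlenecked fusion layer costs only about 5% more than two purely unimodal layers, whereas a vanilla cross-attention layer attends over roughly 1.8 times as many pairs.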
Results on Sound Classification and Action Recognition
MBT outperforms previous research on popular video classification tasks — sound classification (AudioSet and VGGSound) and action recognition (Kinetics and Epic-Kitchens). For multiple datasets, late fusion and MBT with mid fusion (both fusing audio and vision) outperform the best single modality baseline, and MBT with mid fusion outperforms late fusion.
Across multiple datasets, fusing audio and vision outperforms the best single-modality baseline, and MBT with mid fusion outperforms late fusion. For each dataset we report the widely used primary metric, i.e., AudioSet: mAP; Epic-Kitchens: Top-1 action accuracy; VGGSound, Moments-in-Time, and Kinetics: Top-1 classification accuracy.
Visualization of Attention Heatmaps
To understand the behavior of MBT, we visualize the attention computed by our network following the attention rollout technique. We compute heat maps of the attention from the output classification tokens to the image input space for a vanilla cross-attention model and MBT on the AudioSet test set. For each video clip, we show the original middle frame on the left with the ground truth labels overlayed at the bottom. We demonstrate that the attention is particularly focused on regions in the images that contain motion and create sound, e.g., the fingertips on the piano, the sewing machine, and the face of the dog. The fusion bottlenecks in MBT further force the attention to be localized to smaller regions of the images, e.g., the mouth of the dog in the top left and the woman singing in the middle right. This provides some evidence that the tight bottlenecks force MBT to focus only on the image patches that are relevant for an audio classification task and that benefit from mid fusion with audio.
Summary
We introduce MBT, a new transformer-based architecture for multimodal fusion, and explore various fusion approaches using cross-attention between bottleneck tokens. We demonstrate that restricting cross-modal attention via a small set of fusion bottlenecks achieves state-of-the-art results on a number of video classification benchmarks while also reducing computational costs compared to vanilla cross-attention models.
Acknowledgements
This research was conducted by Arsha Nagrani, Anurag Arnab, Shan Yang, Aren Jansen, Cordelia Schmid and Chen Sun. The blog post was written by Arsha Nagrani, Anurag Arnab and Chen Sun. Animations were created by Tom Small.
Unravel the knowledge in Slack workspaces with intelligent search using the Amazon Kendra Slack connector
Organizations use messaging platforms like Slack to bring the right people together to securely communicate with each other and collaborate to get work done. A Slack workspace captures invaluable organizational knowledge in the form of the information that flows through it as the users collaborate. However, making this knowledge easily and securely available to users is challenging due to the fragmented structure of Slack workspaces. Additionally, the conversational nature of Slack communication renders a traditional keyword-based approach to search ineffective.
You can now use the Amazon Kendra Slack connector to index Slack messages and documents, and search this content using intelligent search in Amazon Kendra, powered by machine learning (ML).
This post shows how to configure the Amazon Kendra Slack connector and take advantage of the service’s intelligent search capabilities. We use an example of an illustrative Slack workspace used by members to discuss technical topics related to AWS.
Solution overview
Slack workspaces include public channels, where any workspace user can participate, and private channels, where only the users who are members of those channels can communicate with each other. Furthermore, individuals can communicate directly with one another in one-on-one conversations and ad hoc groups. This communication takes the form of messages and threads of replies, with optional document attachments. Slack workspaces of active organizations are dynamic, with their content and collaboration evolving continuously.
In our solution, we configure a Slack workspace as a data source for an Amazon Kendra search index using the Amazon Kendra Slack connector. Based on the configuration, when the data source is synchronized, the connector either crawls and indexes all the content in the workspace created since a specific date, or, in change log mode, optionally uses a look back parameter. The look back parameter lets you crawl data from a specified number of days before the last time you synced your data source. The connector also collects and ingests Access Control List (ACL) information for each indexed message and document. When access control or user context filtering is enabled, the search results of a query made by a user include only those documents that the user is authorized to read.
Prerequisites
To try out the Amazon Kendra connector for Slack using this post as a reference, you need the following:
- An AWS account with privileges to create AWS Identity and Access Management (IAM) roles and policies. For more information, see Overview of access management: Permissions and policies.
- Basic knowledge of AWS and working knowledge of Slack workspace administration.
- Admin access to a Slack workspace.
Configure your Slack workspace
The following screenshot shows our example Slack workspace:
The workspace has five users as members: Workspace Admin, Generic User, DB Solutions Architect, ML Solutions Architect, and Solutions Architect. There are three public channels, #general, #random, and #test-slack-workspace, which any member can access. Regarding the secure channels, #databases has Workspace Admin and DB Solutions Architect as members, #machine-learning has Workspace Admin and ML Solutions Architect as members, and the #security and #well-architected secure channels have Solutions Architect, DB Solutions Architect, ML Solutions Architect, and Workspace Admin as members. The connector-test app is configured in the Slack workspace in order to create a user OAuth token to be used in configuring the Amazon Kendra connector for Slack.

The following screenshot shows the configuration details of the connector-test app OAuth tokens for the Slack workspace. We use the user OAuth token in configuring the Amazon Kendra connector for Slack.

In the User Token Scopes section, we configure the connector-test app for the Slack workspace.
Configure the data source using the Amazon Kendra connector for Slack
To add a data source to your Amazon Kendra index using the Slack connector, you can use an existing Amazon Kendra index, or create a new Amazon Kendra index. Then complete the steps below. For more information on this topic, please refer to the section on Amazon Kendra connector for Slack in the Amazon Kendra Developer Guide.
- On the Amazon Kendra console, open the index and choose Data sources in the navigation pane.
- Under Slack, choose Add data source.
- Choose Add connector.
- In the Specify data source details section, enter the details of your data source and choose Next.
- In the Define access and security section, for Slack workspace team ID, enter the ID for your workspace.
- Under Authentication, you can either choose Create to add a new secret using the user OAuth token created for the connector-app, or use an existing AWS Secrets Manager secret that has the user OAuth token for the workspace that you want the connector to access.
- For IAM role, you can choose Create a new role or choose an existing IAM role configured with appropriate IAM policies to access the Secrets Manager secret, Amazon Kendra index, and data source.
- Choose Next.
- In the Configure sync settings section, provide information regarding your sync scope and run schedule.
- Choose Next.
- In the Set field mappings section, you can optionally configure the field mappings, or how the Slack field names are mapped to Amazon Kendra attributes or facets.
- Choose Next.
- Review your settings and confirm to add the data source.
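If you prefer scripting over the console, the same data source can also be created with the AWS SDK for Python (boto3). The sketch below mirrors the steps above; the index ID, role ARN, secret ARN, and team ID are placeholders, and the field names inside SlackConfiguration are approximations that should be checked against the current Amazon Kendra API reference.

```python
import boto3

kendra = boto3.client("kendra")

# Sketch only: IDs and ARNs are placeholders, and the SlackConfiguration
# field names are approximate; verify them against the Kendra API reference.
response = kendra.create_data_source(
    IndexId="<your-kendra-index-id>",
    Name="slack-workspace-data-source",
    Type="SLACK",
    RoleArn="arn:aws:iam::111122223333:role/<kendra-slack-data-source-role>",
    Configuration={
        "SlackConfiguration": {
            "TeamId": "<slack-workspace-team-id>",
            "SecretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:<slack-user-oauth-token>",
            "SlackEntityList": ["PUBLIC_CHANNEL", "PRIVATE_CHANNEL", "DIRECT_MESSAGE", "GROUP_MESSAGE"],
            "SinceCrawlDate": "2022-01-01",
            "UseChangeLog": True,   # incremental syncs based on the Slack change log
            "LookBackPeriod": 7,    # days to look back on each incremental sync
        }
    },
    Schedule="cron(0 0 * * ? *)",   # optional daily sync schedule
)
print(response["Id"])
```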
When the data source sync is complete, the User access control tab for the Amazon Kendra index is enabled. Note that in order to use the ACLs for the Slack connector, it’s not necessary to enable User-group lookup through AWS SSO integration, though it is enabled in the following screenshot.
While making search queries, we want to interact with facets such as the channels of the workspace, category of the document, and the authors.
- Choose Facet definition in the navigation pane.
- Select the checkbox in the Facetable column for the facets _authors, _category, sl_doc_channel_name, and sl_msg_channel_name.
Search with Amazon Kendra
Now we’re ready to make a few queries on the Amazon Kendra search console by choosing Search indexed content in the navigation pane.
In the first query, the user name is set to the email address of Generic User. The following screenshot shows the query response. Note that we get an answer from the aws-overview.pdf document posted in the #general channel, followed by a few results from relevant documents or messages. The facets show the categories of the results to be MESSAGE and FILE. The sl_doc_channel_name facet includes the information that the document is from the #general channel, the sl_msg_channel_name facet includes the information that there are results from all the open channels (namely #random, #general, and #test-slack-workspace), and the authors facet includes the names of the authors of the messages.
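The same user-scoped queries can also be issued programmatically. The boto3 sketch below is illustrative (the index ID and email address are placeholders): it passes the user's email as the UserContext so that Amazon Kendra applies the ingested ACLs, and it requests the facets configured above.

```python
import boto3

kendra = boto3.client("kendra")

# Illustrative only: the index ID and user email are placeholders.
response = kendra.query(
    IndexId="<your-kendra-index-id>",
    QueryText="What is AWS Lambda?",
    UserContext={"UserId": "generic.user@example.com"},  # results are ACL-filtered
    Facets=[
        {"DocumentAttributeKey": "_category"},
        {"DocumentAttributeKey": "sl_doc_channel_name"},
        {"DocumentAttributeKey": "sl_msg_channel_name"},
    ],
)

for item in response["ResultItems"]:
    print(item["Type"], item["DocumentTitle"]["Text"])

for facet in response.get("FacetResults", []):
    pairs = [(v["DocumentAttributeValue"], v["Count"])
             for v in facet["DocumentAttributeValueCountPairs"]]
    print(facet["DocumentAttributeKey"], pairs)
```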
Now let’s set the user name to be the email address corresponding to the user Solutions Architect. The following screenshot shows the query response. In addition to the public channels, the results also include secure channels #security and #well-architected.
In the next query, we set the user name to be the email address of the ML Solutions Architect. In this case, the results contain the category of THREAD_REPLY in addition to MESSAGE and FILE. Also, ML Solutions Architect can access the secure channel of #machine-learning.

Now for the same query, to review what people have replied to the question, select the THREAD_REPLY category on the left to refine the results. The response now contains only those results that are of the THREAD_REPLY category.
The results in the response include the URL of the Slack message. When you choose the suggested answer result in the response, the URL prompts for Slack workspace credentials and then opens the thread reply being referenced.
Clean up
To avoid incurring future costs, clean up the resources you created as part of this solution. If you created a new Amazon Kendra index while testing this solution, delete it. If you only added a new data source using the Amazon Kendra connector for Slack, delete that data source.
Conclusion
Using the Amazon Kendra Slack connector, organizations can securely make the invaluable information trapped in their Slack workspaces available to their users through intelligent search powered by Amazon Kendra. Additionally, the connector provides facets for Slack workspace attributes such as channels, authors, and categories, so users can interactively refine the search results based on what they’re looking for.
To learn more about the Amazon Kendra connector for Slack, please refer to the section on Amazon Kendra connector for Slack in the Amazon Kendra Developer Guide.
For more information on how you can create, modify, or delete metadata and content when ingesting your data from the Slack workspace, refer to Customizing document metadata during the ingestion process and Enrich your content and metadata to enhance your search experience with custom document enrichment in Amazon Kendra.
About the Author
Abhinav Jawadekar is a Senior Partner Solutions Architect at Amazon Web Services. Abhinav works with AWS Partners to help them in their cloud journey.
Securely search unstructured data on Windows file systems with the Amazon Kendra connector for Amazon FSx for Windows File Server
Critical information can be scattered across multiple data sources in your organization, including sources such as Windows file systems stored on Amazon FSx for Windows File Server. You can now use the Amazon Kendra connector for FSx for Windows File Server to index documents (HTML, PDF, MS Word, MS PowerPoint, and plain text) stored in your Windows file system on FSx for Windows File Server and search for information across this content using intelligent search in Amazon Kendra.
Organizations store unstructured data in files on shared Windows file systems and secure it by using Windows Access Control Lists (ACLs) to ensure that users can read, write, and create files as per their access permissions configured in the enterprise Active Directory (AD) domain. Finding specific information from this data not only requires searching through the files, but also ensuring that the user is authorized to access it. The Amazon Kendra connector for FSx for Windows File Server indexes the files stored on FSx for Windows File Server and ingests the ACLs in the Amazon Kendra index, so that the response of a search query made by a user includes results only from those documents that the user is authorized to read.
This post takes the example of a set of documents stored securely on a file system using ACLs on FSx for Windows File Server. These documents are ingested in an Amazon Kendra index by configuring and synchronizing this file system as a data source of the index using the connector for FSx for Windows File Server. Then we demonstrate that when a user makes a search query, the Amazon Kendra index uses the ACLs based on the user name and groups the user belongs to, and returns only those documents the user is authorized to access. We also include details of the configuration and screenshots at every stage so you can use this as a reference when configuring the Amazon Kendra connector for FSx for Windows File Server in your setup.
Prerequisites
To try out the Amazon Kendra connector for FSx for Windows File Server, you need the following:
- An AWS account with privileges to create AWS Identity and Access Management (IAM) roles and policies. For more information, see Overview of access management: Permissions and policies.
- Basic knowledge of AWS and working knowledge of Windows ACLs and Microsoft AD domain administration.
- Admin access to a file system on FSx for Windows File Server, with admin access to the AD domain to which it belongs. Alternatively, you can deploy this using the Quick Start for FSx for Windows File Server.
- The AWS_Whitepapers.zip dataset, which we use to try out the functionality. For updated versions, refer to AWS Whitepapers & Guides. Alternatively, you can use your own documents.
Solution architecture
The following diagram illustrates the solution architecture:
The documents in this example are stored on a file system (3 in the diagram) on FSx for Windows File Server (4). The files are set up with ACLs based on the user and group configurations in the AD domain created using AWS Directory Service (1) to which FSx for Windows File Server belongs. This file system on FSx for Windows File Server is configured as a data source for Amazon Kendra (5). AWS Single Sign On (AWS SSO) is enabled with the AD as the identity source, and the Amazon Kendra index is set up to use AWS SSO (2) for user name and group lookup for the user context of the search queries from the customer search solution deployments (6). The FSx for Windows File Server file system, AWS Managed Microsoft AD server, the Amazon Virtual Private Cloud (Amazon VPC) and subnets configured in this example are created using the Quick Start for FSx for Windows File Server.
FSx for Windows File Server configuration
The following screenshot shows the file system on FSx for Windows File Server configured as a part of an AWS Managed Microsoft AD domain that is used in our example, as seen on the Amazon FSx console.
AWS Managed Microsoft AD configuration
The AD to which FSx for Windows File Server belongs is configured as an AWS Managed Microsoft AD, as seen in the following screenshot of the Directory Service console.
Users, groups and ACL configuration for sample dataset
For this post, we used a dataset consisting of a few publicly available AWS whitepapers and stored them in directories based on their categories (Best_Practices, Databases, General, Machine_Learning, Security, and Well_Architected) on a file system on FSx for Windows File Server. The following screenshot shows the folders as seen from a Windows bastion host that is part of the AD domain to which the file system belongs.
Users and groups are configured in the AD domain as follows:
- kadmin – group_kadmin
- patricia – group_sa, group_kauthenticated
- james – group_db_sa, group_kauthenticated
- john – group_ml_sa, group_kauthenticated
- mary, julie, tom – group_kauthenticated
The following screenshot shows users and groups configured in the AWS Managed Microsoft AD domain as seen from the Windows bastion host.
The ACLs for the files in each directory are set up based on the user and group configurations in the AD domain to which FSx for Windows File Server belongs:
- All authenticated users (group_kauthenticated) – Can access the documents in Best_Practices and General directories
- Solutions Architects (group_sa) – Can access the documents in Best_Practices, General, Security, and Well_Architected directories
- Database subject matter expert Solutions Architects (group_db_sa) – Can access the documents in Best_Practices, General, Security, Well_Architected, and Databases directories
- Machine learning subject matter expert Solutions Architects (group_ml_sa) – Can access Best_Practices, General, Security, Well_Architected, and Machine_Learning directories
- Admin (group_kadmin) – Can access the documents in any of the six directories
The following screenshot shows the ACL configurations for each of the directories of our sample data, as seen from the Windows bastion host.
AWS Single Sign-On configuration
AWS SSO is configured with the AD domain as the identity source. The following screenshot shows the settings on the AWS SSO console.
The groups are synchronized in AWS SSO from the AD, as shown in the following screenshot.
The following screenshot shows the members of the group_kauthenticated group synchronized from the AD.
Data source configuration using Amazon Kendra connector for FSx for Windows File Server
We configure a data source using the Amazon Kendra connector for FSx for Windows File Server in an Amazon Kendra index on the Amazon Kendra console. You can create a new Amazon Kendra index or use an existing one and add a new data source.
When you add a data source for an Amazon Kendra index, choose the FSx for Windows File Server connector by choosing Add connector under Amazon FSx.
The steps to add a data source name and resource tags are similar to adding any other data source, as shown in the following screenshot.
The specific file system on Amazon FSx and the type of the file system (FSx for Windows File Server in this case) are configured in the Source section. The authentication credentials of a user with admin privileges to the file system are configured using an AWS Secrets Manager secret.
The VPC and security group settings of the data source configuration include the details of the VPC, subnets, and security group of Amazon FSx and the AD server. In the following screenshot, we also create a new IAM role for the data source.
The next step in data source configuration involves mapping the Amazon FSx connector fields to the Amazon Kendra facets or field names. In the following screenshot, we leave the configuration unchanged. The step after this involves reviewing the configuration and confirming that the data source should be created.
After you configure the file system on FSx for Windows File Server (where the example data is stored) as a data source, you configure Custom Document Enrichment (CDE) basic operations for this data source so that the Amazon Kendra index field _category is set based on the directory in which a document is stored. The data source sync is started after the CDE configuration, so that the _category attributes for the documents get configured during the ingestion workflow.
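If you prefer to set up the CDE basic operations programmatically rather than on the console, a boto3 call along the following lines attaches one inline rule per directory to the data source. This is a sketch: the IDs are placeholders, only two of the six categories are shown, and the exact shape of CustomDocumentEnrichmentConfiguration should be verified against the Amazon Kendra API reference.

```python
import boto3

kendra = boto3.client("kendra")

def category_rule(directory, category):
    """Inline CDE rule: if the document's source URI contains the directory
    name, set the _category attribute to the given value."""
    return {
        "Condition": {
            "ConditionDocumentAttributeKey": "_source_uri",
            "Operator": "Contains",
            "ConditionOnValue": {"StringValue": directory},
        },
        "Target": {
            "TargetDocumentAttributeKey": "_category",
            "TargetDocumentAttributeValue": {"StringValue": category},
        },
    }

# Placeholders: substitute your own data source and index IDs, and add rules
# for the remaining directories.
kendra.update_data_source(
    Id="<fsx-data-source-id>",
    IndexId="<your-kendra-index-id>",
    CustomDocumentEnrichmentConfiguration={
        "InlineConfigurations": [
            category_rule("Machine_Learning", "Machine_Learning"),
            category_rule("Databases", "Databases"),
        ]
    },
)
```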
As shown in the following screenshot, the Amazon Kendra index user access control settings are configured for user and group lookup through AWS SSO integration. JSON token-based user access control is enabled to search based on user and group names from the Amazon Kendra Search console.
In the facet definition for the Amazon Kendra index, make sure that the facetable and displayable boxes are checked for _category. This allows you to use the _category values set by the CDE basic operations as facets while searching.
Search with Amazon Kendra
After the data source sync is complete, we can start searching from the Amazon Kendra Search console by choosing Search indexed content in the navigation pane on the Amazon Kendra console. Because we’re using AWS whitepapers as the dataset ingested in the Amazon Kendra index, we use “What’s DynamoDB?” as the search query. Only authenticated users are authorized to access the files on the file system on FSx for Windows File Server; therefore, when we use this search query without setting any user name or group, we don’t get any results.
Now let’s set the user name to mary@kendra-01.com. The user mary belongs to group_kauthenticated, and therefore is authorized to access the documents in the Best_Practices and General directories. In the following screenshot, the search response includes documents with the facet category set to Best Practices and General. The CDE basic operations set the facet category depending on the directory names contained in the source_uri. This confirms that the ACLs ingested in Amazon Kendra by the connector for FSx for Windows File Server are being enforced in the search results based on the user name.
Now we change the user name to patricia@kendra-01.com. The user patricia belongs to group_sa, with access to the Security and Well_Architected directories, in addition to the Best_Practices and General directories. The search response includes results from these additional directories.
Now we can observe how the results in the search response change as we change the user name to james@kendra-01.com, john@kendra-01.com, and kadmin@kendra-01.com in the following screenshots.
Clean up
If you deployed any AWS infrastructure to experiment with the Amazon Kendra connector for FSx for Windows File Server, clean up the infrastructure as follows:
- If you used the Quick Start for FSx for Windows File Server, delete the AWS CloudFormation stack you created so that it deletes all the resources it created.
- If you created a new Amazon Kendra index, delete it.
- If you only added a new data source using the connector, delete that data source.
- Delete the AWS SSO configuration.
Conclusion
The Amazon Kendra connector for FSx for Windows File Server enables secure and intelligent search of information scattered across unstructured content. The data is securely stored on file systems on FSx for Windows File Server with ACLs and shared with users based on their Microsoft AD domain credentials.
For more information on the Amazon Kendra connector for FSx for Windows File Server, refer to Getting started with an Amazon FSx data source (console) and Using an Amazon FSx data source.
For information on Custom Document Enrichment, refer to Customizing document metadata during the ingestion process and Enrich your content and metadata to enhance your search experience with custom document enrichment in Amazon Kendra.
About the Author
Abhinav Jawadekar is a Senior Partner Solutions Architect at Amazon Web Services. Abhinav works with AWS Partners to help them in their cloud journey.
New GPT-3 Capabilities: Edit & Insert
We’ve released new versions of GPT-3 and Codex which can edit or insert content into existing text, rather than just completing existing text. These new capabilities make it practical to use the OpenAI API to revise existing content, such as rewriting a paragraph of text or refactoring code. This unlocks new use cases and improves existing ones; for example, insertion is already being piloted in GitHub Copilot with promising early results.
[Animated Playground demo, truncated during extraction: starting from the prompt "def" followed by the call fib(10), the model completes a recursive Fibonacci function, rewrites it with memoization, translates it to JavaScript, condenses the implementation, and finally inserts a JSDoc comment ("Recursive Fibonacci function with memoization") above the function.]
GPT-3 and Codex have traditionally added text to the end of existing content, based on the text that came before. Whether working with text or code, writing is more than just appending—it’s an iterative process where existing text is revised. GPT-3 and Codex can now edit text, changing what’s currently there or adding text to the middle of content.
Inserting into the middle of text
The new insert capability adds contextually relevant text in the middle of existing content. Providing future context to the model can improve the quality of completions for applications such as writing long-form text, transitioning between paragraphs, following an outline, or guiding the model toward an ending.
[Animated Playground demo, collapsed to its key frames.]

Today’s the big day!___
Section 2. Moving to San Francisco

Completion generated before the future section header is provided:

Today’s the big day! Congratulations to all our high school graduates! We hope you enjoy your graduation ceremonies and wish you the best of luck in your future endeavors.

Completion once the header “Section 2. Moving to San Francisco” is in place:

Today’s the big day! I’m graduating from high school. I’m so excited. I’m going to move to San Francisco and start a new life. I can’t wait to get there.

Changing the header to “Section 2. Moving to Istanbul” steers the completion accordingly:

Today’s the big day! I’m graduating from high school. I’m so excited to be moving on to the next chapter of my life. I’m going to Istanbul!

And with “Section 2. Moving to the farm”:

Today’s the big day! My high school graduation! I’m so excited! I can’t wait to leave the city and move to the farm. I want to live a simple, quiet life. I want to get up in the morning, milk the cows, feed the horses, then sit and watch the sunrise. I want to spend the rest of my days traveling the world, reading and writing. I’m so tired of this city life.
In the example above, the goal is to fill in text between two section headers of an outline. Without the context of future sections, the model generates a completion that isn’t relevant to the second section. When the context of future sections is accounted for, the model generates a completion that ties the two sections together.
[Animated Playground demo, collapsed: the prompt leaves a helper function to be filled in.]

def get_files(path: str, size: int):
    def___
    for (dirp, _, files) in os.walk(path):
        yield from prune(dirp, files)

[Final completion, with the docstring and import also inserted:]

import os

def get_files(path: str, size: int):
    """Yields files in the path tree of min size"""
    def prune(dirp, files):
        for file in files:
            file = os.path.join(dirp, file)
            if os.path.getsize(file) > size:
                yield file
    for (dirp, _, files) in os.walk(path):
        yield from prune(dirp, files)
Insert is particularly useful for writing code. In fact, Codex was our original motivation for developing this capability, since in software development we typically add code to the middle of an existing file where code is present before and after the completion. In the example above, the model successfully completes the missing function prune, while connecting to code already written. We also add a docstring and missing imports, which is not possible without knowing the code that comes after. In GitHub Copilot, Insert is currently being piloted with early promising results.
The insert capability is available in the API today in beta, as part of the completions endpoint and via a new interface in Playground. The capability can be used with the latest versions of GPT-3 and Codex, text-davinci-002 and code-davinci-002. Pricing is the same as previous versions of Davinci.
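For reference, here is a minimal sketch of calling the insert capability through the completions endpoint with the openai Python library as it existed when this capability launched; the API key, prompt/suffix split, and sampling settings are illustrative.

```python
import openai  # the openai-python interface available at the time of this post

openai.api_key = "sk-..."  # placeholder

# Insert text between a prompt and a suffix: the model conditions on the text
# that comes both before and after the insertion point.
response = openai.Completion.create(
    model="text-davinci-002",
    prompt="Today's the big day! ",
    suffix="\n\nSection 2. Moving to San Francisco\n",
    max_tokens=64,
    temperature=0.7,
)
print(response["choices"][0]["text"])
```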
Editing existing text
A meaningful part of writing text and code is spent editing existing content. We’ve released a new endpoint in beta called edits that changes existing text via an instruction, instead of completing it.
[Animated Playground demo, collapsed to its three stages.]

The poem added with the first edit:

That’s pretty good at writing replies
When it’s asked a question
It gives its suggestion
This is a poem it made that rhymes

After the second edit, changing the poem to first person:

I am pretty good at writing replies
When I am asked a question
I give my suggestion
This is a poem it made that rhymes

After the third edit, transforming the poem into a letter:

I am very nice AI
I am pretty good at writing replies
When I am asked a question
I give my suggestion
This is a poem I made that rhymes

Kind regards,
GPT-3
Editing works by specifying existing text as a prompt and an instruction on how to modify it. The edits endpoint can be used to change the tone or structure of text, or make targeted changes like fixing spelling. We’ve also observed edits to work well on empty prompts, thus enabling text generation similar to the completions endpoint. In the example above, we use edits to (1) add a poem, (2) change the poem to be in first-person, (3) transform the poem into a letter, with the appropriate salutation and signature.
[Animated Playground demo, collapsed to its stages.]

The starting point, a plain-text list:

The three US cities with the worst traffic are:
1. Boston, MA (164 hours)
2. Washington, DC (155 hours)
3. Chicago, IL (138 hours)

As JSON, with a rank field and state abbreviations:

[
    {"rank": 1, "city": "Boston", "state": "MA", "hours": 164},
    {"rank": 2, "city": "Washington DC", "state": "DC", "hours": 155},
    {"rank": 3, "city": "Chicago", "state": "IL", "hours": 138}
]

After the edit that removes the rank field and spells out the state names:

[
    {"city": "Boston", "state": "Massachusetts", "hours": 164},
    {"city": "Washington DC", "state": "District of Columbia", "hours": 155},
    {"city": "Chicago", "state": "Illinois", "hours": 138}
]

After the edit that converts the JSON into YAML returned from a function:

def get_yaml():
    return """
    - city: Boston
      state: Massachusetts
      hours: 164
    - city: Washington DC
      state: District of Columbia
      hours: 155
    - city: Chicago
      state: Illinois
      hours: 138
    """
The edits endpoint is particularly useful for writing code. It works well for tasks like refactoring, adding documentation, translating between programming languages, and changing coding style. The example above starts with JSON input listing the US cities with the worst traffic, ranked by hours of congestion. With our first edit, Codex removes the rank field from the JSON and changes the state abbreviations into full names. The second edit converts the JSON into YAML returned from a function.
Editing is available as a specialized endpoint in the API and through a new interface in Playground. It is supported by the models text-davinci-edit-001 and code-davinci-edit-001. The edits endpoint is currently free to use and publicly available as a beta.
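For reference, a minimal sketch of calling the edits endpoint with the openai Python library of the time; the API key is a placeholder and the instruction wording is illustrative.

```python
import openai  # the openai-python interface available at the time of this post

openai.api_key = "sk-..."  # placeholder

json_input = """[
    {"rank": 1, "city": "Boston", "state": "MA", "hours": 164},
    {"rank": 2, "city": "Washington DC", "state": "DC", "hours": 155},
    {"rank": 3, "city": "Chicago", "state": "IL", "hours": 138}
]"""

# Edit existing content by supplying it as input plus a natural-language
# instruction, instead of asking the model to complete it.
response = openai.Edit.create(
    model="code-davinci-edit-001",
    input=json_input,
    instruction="Remove the rank field and spell out the state names.",
)
print(response["choices"][0]["text"])
```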
OpenAI
Registration opens for Amazon re:MARS event
In-person event featuring some of the brightest leaders in science, academia, and business is planned for June 21-24 in Las Vegas.
Automate email responses using Amazon Comprehend custom classification and entity detection
In this post, we demonstrate how to create an automated email response solution using Amazon Comprehend.
Organizations spend lots of resources, effort, and money on running their customer care operations to answer customer questions and provide solutions. Your customers may ask questions via various channels, such as email, chat, or phone, and deploying a workforce to answer those queries can be resource intensive, time-consuming, and even unproductive if the answers to those questions are repetitive.
During the COVID-19 pandemic, many organizations couldn’t adequately support their customers due to the shutdown of customer care and agent facilities, and customer queries were piling up. Some organizations struggled to reply to queries promptly, which can cause a poor customer experience. This in turn can result in customer dissatisfaction, and can impact an organization’s reputation and revenue in the long term.
Although your organization might have the data assets for customer queries and answers, you may still struggle to implement an automated process to reply to your customers. Challenges might include unstructured data, different languages, and a lack of expertise in artificial intelligence (AI) and machine learning (ML) technologies.
You can overcome such challenges by using Amazon Comprehend to automate email responses to customer queries. With our solution, you can identify the intent of customer emails and send an automated response if the intent matches your existing knowledge base. If the intent doesn’t have a match, the email goes to the support team for a manual response. The following are some common customer intents when contacting customer care:
- Transaction status (for example, status of a money transfer)
- Password reset
- Promo code or discount
- Hours of operation
- Find an agent location
- Report fraud
- Unlock account
- Close account
Amazon Comprehend can help you perform classification and entity detection on emails for any of the intents above. For this solution, we show how to classify customer emails for the first three intents. You can also use Amazon Comprehend to detect key information from emails, so you can automate your business processes. For example, you can use Amazon Comprehend to automate the reply to a customer request with specific information related to that query.
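A minimal boto3 sketch of this classification and entity detection flow follows; the endpoint ARNs and email text are placeholders for resources you deploy yourself, and the 80% threshold mirrors the one used later in this post.

```python
import boto3

comprehend = boto3.client("comprehend")

email_body = "Hi team, could you tell me the status of my money transfer MTN0000123456?"

# Classify the intent of the email with a custom classification endpoint
# (the endpoint ARNs below are placeholders for endpoints you have deployed).
classes = comprehend.classify_document(
    Text=email_body,
    EndpointArn="arn:aws:comprehend:us-east-1:111122223333:document-classifier-endpoint/email-intent",
)["Classes"]
top = max(classes, key=lambda c: c["Score"])
print(top["Name"], top["Score"])  # e.g., MONEYTRANSFER 0.97

# For money transfer questions, extract the transfer ID with a custom entity
# recognizer endpoint so the status can be looked up automatically.
if top["Name"] == "MONEYTRANSFER" and top["Score"] >= 0.8:
    entities = comprehend.detect_entities(
        Text=email_body,
        EndpointArn="arn:aws:comprehend:us-east-1:111122223333:entity-recognizer-endpoint/transfer-ids",
    )["Entities"]
    print([e["Text"] for e in entities])
```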
Solution overview
To build our customer email response flow, we use the following services:
- Amazon Comprehend
- AWS Lambda
- Amazon Simple Email Service (Amazon SES)
- Amazon Simple Notification Service (Amazon SNS)
- Amazon WorkMail
The following architecture diagram highlights the end-to-end solution:
The solution workflow includes the following steps:
- A customer sends an email to the customer support email created in WorkMail.
- WorkMail invokes a Lambda function upon receiving the email.
- The function sends the email content to a custom classification model endpoint.
- The custom classification endpoint returns a classified value and a confidence level (we require over 80%, but you can configure this threshold as needed).
- If the classification value is MONEYTRANSFER, the Lambda function calls the entity detection endpoint to find the money transfer ID.
- If the money transfer ID is returned, the function returns the money transfer status randomly (in a real-world scenario, you can call the database via API to fetch the actual transfer status).
- Based on the classified value returned, a predefined email template in Amazon SES is chosen, and a reply email is sent to the customer.
- If the confidence level is less than 80%, a classified value is not returned, or entity detection doesn’t find the money transfer ID, the customer email is pushed to an SNS topic. You can subscribe to Amazon SNS to push the message to your ticketing system.
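Inside the Lambda function, the reply and fallback steps above (choosing an Amazon SES template or publishing to the SNS topic) might look roughly like the following sketch; the template names, topic ARN, and addresses are placeholders, with the support address matching the example domain used in this post.

```python
import json
import boto3

ses = boto3.client("ses")
sns = boto3.client("sns")

def respond(classification, confidence, customer_email, email_body, transfer_status=None):
    """Send a templated reply for confident classifications; otherwise hand the
    email off to the human workflow via SNS. Names and ARNs are placeholders."""
    templates = {
        "MONEYTRANSFER": "MoneyTransferStatusTemplate",
        "PASSRESET": "PasswordResetTemplate",
        "PROMOCODE": "PromoCodeTemplate",
    }
    if confidence >= 0.8 and classification in templates:
        ses.send_templated_email(
            Source="support@mydomain.com",
            Destination={"ToAddresses": [customer_email]},
            Template=templates[classification],
            TemplateData=json.dumps({"transfer_status": transfer_status or "N/A"}),
        )
    else:
        sns.publish(
            TopicArn="arn:aws:sns:us-east-1:111122223333:human-workflow-topic",
            Subject="Customer email needs a manual response",
            Message=email_body,
        )
```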
Prerequisites
Refer to the README.md file in the GitHub repo to make sure you meet the prerequisites to deploy this solution.
Deploy the solution
Solution deployment consists of the following high-level steps:
- Complete manual configurations using the AWS Management Console.
- Run scripts in an Amazon SageMaker notebook instance using the provided notebook file.
- Deploy the solution using the AWS Cloud Development Kit (AWS CDK).
For full instructions, refer to the README.md file in the GitHub repo.
Test the solution
To test the solution, send an email from your personal email to the support email created as part of the AWS CDK deployment (for this post, we use support@mydomain.com). We use the following three intents in our sample data for custom classification training:
- MONEYTRANSFER – The customer wants to know the status of a money transfer
- PASSRESET – The customer has a login, account locked, or password request
- PROMOCODE – The customer wants to know about a discount or promo code available for a money transfer
The following screenshot shows a sample customer email:
If the customer email is not classified or confidence levels are below 80%, the content of the email is forwarded to an SNS topic. Whoever is subscribed to the topic receives the email content as a message. We subscribed to this SNS topic with the email that we passed with the human_workflow_email parameter during the deployment.
Clean up
To avoid incurring ongoing costs, delete the resources you created as part of this solution when you’re done.
Conclusion
In this post, you learned how to configure an automated email response system using Amazon Comprehend custom classification and entity detection along with other AWS services. This solution can provide the following benefits:
- Improved email response time
- Improved customer satisfaction
- Cost savings regarding time and resources
- Ability to focus on key customer issues
You can also expand this solution to other areas in your business and to other industries.
With the current architecture, the emails that are classified with a low confidence score are routed to a human loop for manual verification and response. You can use the inputs from the manual review process to further improve the Amazon Comprehend model and increase the automated classification rate. Amazon Augmented AI (Amazon A2I) provides built-in human review workflows for common ML use cases, such as NLP-based entity recognition in documents. This allows you to easily review predictions from Amazon Comprehend.
As we get more data for every intent, we will retrain and deploy the custom classification model and update the email response flow accordingly in the GitHub repo.
About the Author
Godwin Sahayaraj Vincent is an Enterprise Solutions Architect at AWS who is passionate about Machine Learning and providing guidance to customers to design, deploy and manage their AWS workloads and architectures. In his spare time, he loves to play cricket with his friends and tennis with his three kids.
Shamika Ariyawansa is an AI/ML Specialist Solutions Architect on the Global Healthcare and Life Sciences team at Amazon Web Services. He works with customers to advance their ML journey with a combination of AWS ML offerings and his ML domain knowledge. He is based out of Denver, Colorado. In his spare time, he enjoys off-roading adventures in the Colorado mountains and competing in machine learning competitions.
Real-world robotic-manipulation system
Amazon Research Award recipient Russ Tedrake is teaching robots to manipulate a wide variety of objects in unfamiliar and constantly changing contexts.
When it comes to AI, can we ditch the datasets?
Huge amounts of data are needed to train machine-learning models to perform image classification tasks, such as identifying damage in satellite photos following a natural disaster. However, these data are not always easy to come by. Datasets may cost millions of dollars to generate, if usable data exist in the first place, and even the best datasets often contain biases that negatively impact a model’s performance.
To circumvent some of the problems presented by datasets, MIT researchers developed a method for training a machine learning model that, rather than using a dataset, uses a special type of machine-learning model to generate extremely realistic synthetic data that can train another model for downstream vision tasks.
Their results show that a contrastive representation learning model trained using only these synthetic data is able to learn visual representations that rival or even outperform those learned from real data.
This special machine-learning model, known as a generative model, requires far less memory to store or share than a dataset. Using synthetic data also has the potential to sidestep some concerns around privacy and usage rights that limit how some real data can be distributed. A generative model could also be edited to remove certain attributes, like race or gender, which could address some biases that exist in traditional datasets.
“We knew that this method should eventually work; we just needed to wait for these generative models to get better and better. But we were especially pleased when we showed that this method sometimes does even better than the real thing,” says Ali Jahanian, a research scientist in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and lead author of the paper.
Jahanian wrote the paper with CSAIL grad students Xavier Puig and Yonglong Tian, and senior author Phillip Isola, an assistant professor in the Department of Electrical Engineering and Computer Science. The research will be presented at the International Conference on Learning Representations.
Generating synthetic data
Once a generative model has been trained on real data, it can generate synthetic data that are so realistic they are nearly indistinguishable from the real thing. The training process involves showing the generative model millions of images that contain objects in a particular class (like cars or cats), and then it learns what a car or cat looks like so it can generate similar objects.
Essentially by flipping a switch, researchers can use a pretrained generative model to output a steady stream of unique, realistic images that are based on those in the model’s training dataset, Jahanian says.
But generative models are even more useful because they learn how to transform the underlying data on which they are trained, he says. If the model is trained on images of cars, it can “imagine” how a car would look in different situations — situations it did not see during training — and then output images that show the car in unique poses, colors, or sizes.
Having multiple views of the same image is important for a technique called contrastive learning, where a machine-learning model is shown many unlabeled images to learn which pairs are similar or different.
The researchers connected a pretrained generative model to a contrastive learning model in a way that allowed the two models to work together automatically. The contrastive learner could tell the generative model to produce different views of an object, and then learn to identify that object from multiple angles, Jahanian explains.
“This was like connecting two building blocks. Because the generative model can give us different views of the same thing, it can help the contrastive method to learn better representations,” he says.
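As a rough illustration of this idea (not the authors' implementation), the sketch below produces two views of the same content by decoding nearby latent codes with a pretrained generator and applies a standard InfoNCE-style contrastive loss; the generator and encoder objects, the latent dimension, and the noise scale are placeholder assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Contrastive loss where matching rows of z1 and z2 are the positive pairs."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature       # (B, B) cosine-similarity matrix
    targets = torch.arange(z1.size(0))       # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

def training_step(generator, encoder, batch_size=32, latent_dim=128, sigma=0.2):
    """One step: the pretrained generator supplies two views of the same latent content."""
    z = torch.randn(batch_size, latent_dim)
    view1 = generator(z)                                  # e.g. a pretrained GAN decoder
    view2 = generator(z + sigma * torch.randn_like(z))    # a nearby "imagined" view
    return info_nce(encoder(view1), encoder(view2))
```

Perturbing the latent code is only one way to obtain multiple views; the key point is that the pretrained generator, rather than a dataset of real images, supplies the positive pairs.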
Even better than the real thing
The researchers compared their method to several other image classification models that were trained using real data and found that their method performed as well, and sometimes better, than the other models.
One advantage of using a generative model is that it can, in theory, create an infinite number of samples. So, the researchers also studied how the number of samples influenced the model’s performance. They found that, in some instances, generating larger numbers of unique samples led to additional improvements.
“The cool thing about these generative models is that someone else trained them for you. You can find them in online repositories, so everyone can use them. And you don’t need to intervene in the model to get good representations,” Jahanian says.
But he cautions that there are some limitations to using generative models. In some cases, these models can reveal source data, which can pose privacy risks, and they could amplify biases in the datasets they are trained on if they aren’t properly audited.
He and his collaborators plan to address those limitations in future work. Another area they want to explore is using this technique to generate corner cases that could improve machine learning models. Corner cases often can't be learned from real data. For instance, if researchers are training a computer vision model for a self-driving car, real data wouldn't contain examples of a dog and its owner running down a highway, so the model would never learn what to do in this situation. Generating that corner-case data synthetically could improve the performance of machine learning models in some high-stakes situations.
The researchers also want to continue improving generative models so they can compose images that are even more sophisticated, he says.
This research was supported, in part, by the MIT-IBM Watson AI Lab, the United States Air Force Research Laboratory, and the United States Air Force Artificial Intelligence Accelerator.
Computer vision using synthetic datasets with Amazon Rekognition Custom Labels and Dassault Systèmes 3DEXCITE
This is a post co-written with Bernard Paques, CTO of Storm Reply, and Karl Herkt, Senior Strategist at Dassault Systèmes 3DExcite.
While computer vision can be crucial to industrial maintenance, manufacturing, logistics, and consumer applications, its adoption is limited by the manual creation of training datasets. Labeled pictures in an industrial context are mostly created by hand, which limits recognition capabilities, doesn't scale, and incurs labor costs and delays in realizing business value. This works against the business agility provided by rapid iterations in product design, product engineering, and product configuration. The process doesn't scale for complex products such as cars, airplanes, or modern buildings, because in those scenarios every labeling project is unique (it relates to a unique product). As a result, computer vision technology can't easily be applied to large-scale unique projects without a significant data preparation effort, which sometimes limits use case delivery.
In this post, we present a novel approach where highly specialized computer vision systems are created from design and CAD files. We start with the creation of visually correct digital twins and the generation of synthetic labeled images. Then we push these images to Amazon Rekognition Custom Labels to train a custom object detection model. By using existing intellectual property with software, we’re making computer vision affordable and relevant to a variety of industrial contexts.
The customization of recognition systems helps drive business outcomes
Specialized computer vision systems that are produced from digital twins have specific merits, which can be illustrated in the following use cases:
- Traceability for unique products – Airbus, Boeing, and other aircraft makers assign unique Manufacturer Serial Numbers (MSNs) to every aircraft they produce. This is managed throughout the entire production process, in order to generate airworthiness documentation and get permits to fly. A digital twin (a virtual 3D model representing a physical product) can be derived from the configuration of each MSN, and generates a distributed computer vision system that tracks the progress of this MSN across industrial facilities. Custom recognition automates the transparency given to airlines, and replaces most checkpoints performed manually by airlines. Automated quality assurance on unique products can apply to aircrafts, cars, buildings, and even craft productions.
- Contextualized augmented reality – Professional-grade computer vision systems can be scoped to a limited landscape, but with higher discrimination capabilities. For example, in industrial maintenance, finding a screwdriver in a picture is useless; you need to identify the screwdriver model or even its serial number. In such bounded contexts, custom recognition systems outperform generic recognition systems because they're more relevant in their findings. Custom recognition systems enable precise feedback loops via dedicated augmented reality delivered in HMIs or on mobile devices.
- End-to-end quality control – With system engineering, you can create digital twins of partial constructs, and generate computer vision systems that adapt to the various phases of manufacturing and production processes. Visual controls can be intertwined with manufacturing workstations, enabling end-to-end inspection and early detection of defects. Custom recognition for end-to-end inspection effectively prevents the cascading of defects to assembly lines. Reducing the rejection rate and maximizing the production output is the ultimate goal.
- Flexible quality inspection – Modern quality inspection has to adapt to design variations and flexible manufacturing. Variations in design come from feedback loops on product usage and product maintenance. Flexible manufacturing is a key capability for a make-to-order strategy, and aligns with the lean manufacturing principle of cost-optimization. By integrating design variations and configuration options in digital twins, custom recognition enables the dynamic adaptation of computer vision systems to the production plans and design variations.
Enhance computer vision with Dassault Systèmes 3DEXCITE powered by Amazon Rekognition
Within Dassault Systèmes, a company with deep expertise in digital twins that is also the second-largest European software publisher, the 3DEXCITE team is exploring a different path. As explained by Karl Herkt, “What if a neural model trained from synthetic images could recognize a physical product?” 3DEXCITE has solved this problem by combining its technology with the AWS infrastructure, proving the feasibility of this unconventional approach. The approach is also known as cross-domain object detection, where the detection model learns from labeled images in the source domain (synthetic images) and makes predictions on the unlabeled target domain (physical components).
Dassault Systèmes 3DEXCITE and the AWS Prototyping team have joined forces to build a demonstrator system that recognizes parts of an industrial gearbox. This prototype was built in 3 weeks, and the trained model achieved a 98% F1 score. The recognition model was trained entirely from a software pipeline that doesn't include any pictures of a real part. From design and CAD files of an industrial gearbox, 3DEXCITE created visually correct digital twins and generated thousands of synthetic labeled images from them. Then they used Rekognition Custom Labels to train a highly specialized neural model from these images and provided a related recognition API. They built a website that enables recognition of a physical gearbox part from any webcam.
Amazon Rekognition is an AI service that uses deep learning technology to allow you to extract meaningful metadata from images and videos—including identifying objects, people, text, scenes, activities, and potentially inappropriate content—with no machine learning (ML) expertise required. Amazon Rekognition also provides highly accurate facial analysis and facial search capabilities that you can use to detect, analyze, and compare faces for a wide variety of user verification, people counting, and safety use cases. Lastly, with Rekognition Custom Labels, you can use your own data to build object detection and image classification models.
The combination of Dassault Systèmes technology for the generation of synthetic labeled images with Rekognition Custom Labels for computer vision provides a scalable workflow for recognition systems. Ease of use is a significant positive factor here because adding Rekognition Custom Labels to the overall software pipeline isn’t difficult—it’s as simple as integrating an API into a workflow. No need to be an ML scientist; simply send captured frames to AWS and receive a result that you can enter into a database or display in a web browser.
This further underscores the dramatic improvement over manual creation of training datasets. You can achieve better results faster and with greater accuracy, without the need for costly, unnecessary work hours. With so many potential use cases, the combination of Dassault Systèmes and Rekognition Custom Labels has the potential to provide today’s businesses with significant and immediate ROI.
Solution overview
The first step in this solution is to render the images that make up the training dataset. This is done by the 3DEXCITE platform. We can generate the labeling data programmatically by using scripts. Amazon SageMaker Ground Truth provides an annotation tool to easily label images and videos for classification and object detection tasks. To train a model in Amazon Rekognition, the labeling file needs to comply with the Ground Truth format. These labels are in JSON, and include information such as image size, bounding box coordinates, and class IDs.
Then we upload the synthetic images and the manifest to Amazon Simple Storage Service (Amazon S3), where Rekognition Custom Labels can import them as components of the training dataset.
To let Rekognition Custom Labels test the model against a set of real component images, we provide a set of pictures of the real gearbox parts taken with a camera and upload them to Amazon S3 to use as the testing dataset.
Finally, Rekognition Custom Labels trains the best object detection model using the synthetic training dataset and the testing dataset composed of pictures of real objects, and creates an endpoint with the model that we can use to run object recognition in our application.
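For readers who prefer to script this step rather than use the console, the following is a minimal sketch of kicking off training through the Rekognition Custom Labels API with boto3; the project name, bucket, manifest keys, and region are hypothetical.

```python
import boto3

rekognition = boto3.client("rekognition", region_name="eu-west-1")

# Hypothetical names; replace with your own project, bucket, and manifest keys.
project = rekognition.create_project(ProjectName="gearbox-parts-detection")

rekognition.create_project_version(
    ProjectArn=project["ProjectArn"],
    VersionName="synthetic-v1",
    OutputConfig={"S3Bucket": "gearbox-demo-bucket", "S3KeyPrefix": "training-output/"},
    TrainingData={"Assets": [{"GroundTruthManifest": {"S3Object": {
        "Bucket": "gearbox-demo-bucket", "Name": "train/manifest.json"}}}]},
    TestingData={"Assets": [{"GroundTruthManifest": {"S3Object": {
        "Bucket": "gearbox-demo-bucket", "Name": "test/manifest.json"}}}]},
)
# Training runs asynchronously; describe_project_versions reports its status.
```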
The following diagram illustrates our solution workflow:
Create synthetic images
The synthetic images are generated from the 3Dexperience platform, which is a product of Dassault Systèmes. This platform allows you to create and render photorealistic images based on the object’s CAD (computer-aided design) file. We can generate thousands of variants in a few hours by changing image transformation configurations on the platform.
In this prototype, we selected the following five visually distinct gearbox parts for object detection. They include a gear housing, gear ratio, bearing cover, flange, and worm gear.
We used the following data augmentation methods to increase image diversity and make the synthetic data more photorealistic, which helps reduce the model's generalization error.
- Zoom in/out – This method randomly zooms in or out on the object in images.
- Rotation – This method rotates the object in images, as if a virtual camera were taking pictures of the object from random angles across 360 degrees.
- Improve the look and feel of the material – We identified that for some gear parts the look of the material is less realistic in the initial rendering. We added a metallic effect to improve the synthetic images.
- Use different lighting settings – In this prototype, we simulated two lighting conditions:
  - Warehouse – A realistic light distribution; shadows and reflections are possible.
  - Studio – A homogeneous light is placed all around the object. This is not realistic, but there are no shadows or reflections.
- Use realistic positions of how the object is viewed in real life – In real life, some objects, such as a flange and bearing cover, are generally placed flat on a surface, and the model detects them based on their top and bottom faces. Therefore, we removed the training images that show the thin edge of the parts (the edge position) and increased the number of images of objects in a flat position.
- Add multiple objects in one image – In real-life scenarios, multiple gear parts could all appear in one view, so we prepared images that contain multiple gear parts.
On the 3Dexperience platform, we can also apply different backgrounds to the images, which can help increase image diversity further. Due to time limitations, we didn't implement this in the prototype.
Import the synthetic training dataset
In ML, labeled data means training data that is annotated to show the target, which is the answer you want your ML model to predict. The labeled data consumed by Rekognition Custom Labels must comply with the Ground Truth manifest file requirements. A manifest file is made of one or more JSON lines; each line contains the information for a single image. For synthetic training data, the labeling information can be generated programmatically based on the CAD file and the image transformation configurations we mentioned earlier, which saves significant manual labeling effort. For more information about the requirements for labeling file formats, refer to Create a manifest file and Object localization in manifest files. The following is an example of how such image labeling can be generated:
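This is a minimal sketch rather than the project's actual pipeline; the bucket path, class map, and job name are hypothetical, and the authoritative field definitions are in the documentation linked above.

```python
import json
from datetime import datetime, timezone

def manifest_line(image_uri, width, height, boxes, class_map, job_name="synthetic-labeling"):
    """Build one Ground Truth-style JSON line for an object detection image.

    boxes: list of dicts with class_id, left, top, width, height (in pixels).
    """
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    return json.dumps({
        "source-ref": image_uri,
        "bounding-box": {
            "image_size": [{"width": width, "height": height, "depth": 3}],
            "annotations": boxes,
        },
        "bounding-box-metadata": {
            "objects": [{"confidence": 1} for _ in boxes],
            "class-map": class_map,
            "type": "groundtruth/object-detection",
            "human-annotated": "yes",
            "creation-date": timestamp,
            "job-name": job_name,
        },
    })

# One labeled synthetic image with a single gear housing instance (illustrative values).
print(manifest_line(
    "s3://gearbox-demo-bucket/train/img_00001.png", 1920, 1080,
    boxes=[{"class_id": 0, "left": 512, "top": 300, "width": 240, "height": 180}],
    class_map={"0": "gear_housing"},
))
```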
After the manifest file is prepared, we upload it to an S3 bucket, and then create a training dataset in Rekognition Custom Labels by selecting the option Import images labeled by Amazon SageMaker Ground Truth.
After the manifest file is imported, we can view the labeling information visually on the Amazon Rekognition console. This helps us confirm that the manifest file was generated and imported correctly. More specifically, the bounding boxes should align with the objects in the images, and the objects' class IDs should be assigned correctly.
Create the testing dataset
The test images are captured in real life with a phone or camera from different angles and under different lighting conditions, because we want to validate the accuracy of the model, which was trained on synthetic data, against real-life scenarios. You can upload these test images to an S3 bucket and then import them as a dataset in Rekognition Custom Labels, or you can upload them directly to the dataset from your local machine.
Rekognition Custom Labels provides a built-in image annotation capability, which offers an experience similar to Ground Truth. You can start the labeling work when the test data is imported. For an object detection use case, the bounding boxes should be drawn tightly around the objects of interest, which helps the model learn precisely which regions and pixels belong to the target objects. In addition, you should label every instance of the target objects in all images, even those that are partially out of view or occluded by other objects; otherwise, the model will produce more false negatives.
Create the cross-domain object detection model
Rekognition Custom Labels is a fully managed service; you just need to provide the training and test datasets. It trains a set of models and chooses the best-performing one based on the data provided. In this prototype, we prepared the synthetic training datasets iteratively by experimenting with different combinations of the image augmentation methods mentioned earlier. One model is created for each training dataset in Rekognition Custom Labels, which allows us to compare the datasets and find the optimal one for this specific use case: the dataset that needs the fewest training images while maintaining good image diversity and yielding the best model accuracy. After 15 iterations, we achieved a 98% F1 score using around 10,000 synthetic training images, which is about 2,000 images per object on average.
Results of model inference
The following image shows the Amazon Rekognition model being used in a real-time inference application. All components are detected correctly with high confidence.
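As an illustration of how such an application can call the trained model, the following is a minimal sketch using the Rekognition Custom Labels inference API; the project version ARN, file name, and confidence threshold are hypothetical.

```python
import boto3

rekognition = boto3.client("rekognition", region_name="eu-west-1")

# Hypothetical ARN; the real one is returned when the project version is trained.
MODEL_ARN = (
    "arn:aws:rekognition:eu-west-1:123456789012:project/gearbox-parts-detection/"
    "version/synthetic-v1/1234567890123"
)

# The model must be running before it can serve requests; starting it can take
# several minutes, so in practice wait until its status is RUNNING.
rekognition.start_project_version(ProjectVersionArn=MODEL_ARN, MinInferenceUnits=1)

with open("webcam_frame.jpg", "rb") as image_file:
    response = rekognition.detect_custom_labels(
        ProjectVersionArn=MODEL_ARN,
        Image={"Bytes": image_file.read()},
        MinConfidence=80,
    )

for label in response["CustomLabels"]:
    box = label["Geometry"]["BoundingBox"]  # normalized Left/Top/Width/Height
    print(f'{label["Name"]}: {label["Confidence"]:.1f}%', box)
```

Stopping the model with stop_project_version when it isn't needed avoids paying for idle inference units.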
Conclusion
In this post, we demonstrated how to train a computer vision model on purely synthetic images, and how the model can still reliably recognize real-world objects. This saves significant manual effort collecting and labeling the training data. With this exploration, Dassault Systèmes is expanding the business value of the 3D product models created by designers and engineers, because you can now use CAD, CAE, and PLM data in recognition systems for images in the physical world.
For more information about Rekognition Custom Labels key features and use cases, see Amazon Rekognition Custom Labels. If your images aren’t labeled natively with Ground Truth, which was the case for this project, see Creating a manifest file to convert your labeling data to the format that Rekognition Custom Labels can consume.
About the Authors
Woody Borraccino is currently a Senior Machine Learning Specialist Solutions Architect at AWS. Based in Milan, Italy, Woody worked in software development before joining AWS in 2015, where he grew his passion for Computer Vision and Spatial Computing (AR/VR/XR) technologies. His passion is now focused on metaverse innovation. Follow him on LinkedIn.
Ying Hou, PhD, is a Machine Learning Prototyping Architect at AWS. Her main areas of interests are Deep Learning, Computer Vision, NLP and time series data prediction. In her spare time, she enjoys reading novels and hiking in national parks in the UK.
Bernard Paques is currently CTO of Storm Reply, focused on industrial solutions deployed on AWS. Based in Paris, France, Bernard previously worked as a Principal Solutions Architect and as a Principal Consultant at AWS. His contributions to enterprise modernization cover AWS for Industrial and the AWS CDK, and now extend into Green IT and voice-based systems. Follow him on Twitter.
Karl Herkt is currently Senior Strategist at Dassault Systèmes 3DExcite. Based in Munich, Germany, he creates innovative implementations of computer vision that deliver tangible results. Follow him on LinkedIn.