Achieving 1.85x higher performance for deep learning based object detection with an AWS Neuron compiled YOLOv4 model on AWS Inferentia

In this post, we show you how to deploy a TensorFlow-based YOLOv4 model, built with Keras and optimized for inference, on AWS Inferentia based Amazon EC2 Inf1 instances. You will set up a benchmarking environment to evaluate throughput and precision, comparing Inf1 with comparable Amazon EC2 G4 GPU-based instances. Deploying YOLOv4 on AWS Inferentia provides the highest throughput, lowest latency with minimal latency jitter, and the lowest cost per image.

The following charts show a 2-hour run in which Inf1 provides higher throughput and lower latency. The Inf1 instances achieved up to 1.85 times higher throughput and 37% lower cost per image when compared to the most optimized Amazon EC2 G4 GPU-based instances.

In addition, the following graph shows that P90 inference latency is 60% lower on Inf1, with significantly lower variance compared to the G4 instances.

When you use the AWS Neuron data type auto-casting feature, there is no measurable degradation in accuracy. The compiler automatically converts the pipeline to mixed precision with BF16 data types for increased performance. The model reaches 48.7% mean average precision (mAP), thanks to the state-of-the-art YOLOv4 model implementation.

About AWS Inferentia and AWS Neuron SDK

AWS Inferentia chips are custom built by AWS to provide high inference performance at the lowest cost of inference in the cloud. They offer seamless features such as automatic conversion of trained FP32 models to bfloat16, and a flexible machine learning (ML) compute architecture that supports a wide range of model types, from image recognition and object detection to natural language processing (NLP) and modern recommender models.

AWS Neuron is a software development kit (SDK) consisting of a compiler, runtime, and profiling tools that optimize the ML inference performance of the Inferentia chips. Neuron is natively integrated with popular ML frameworks such as TensorFlow and PyTorch, and comes pre-installed in the AWS Deep Learning AMIs. Therefore, deploying deep learning models on AWS Inferentia is done in the same familiar environment used on other platforms, and your applications benefit from the boost in performance and lower cost.

Since its launch, the Neuron SDK has seen dramatic improvement in the breadth of models that deliver high performance at a fraction of the cost. This includes NLP models like the popular BERT, image classification models (ResNet, VGG), and object detection models (OpenPose and SSD). The latest Neuron release (1.8.0) provides optimizations that improve performance of YOLO v3 and v4, VGG16, SSD300, and BERT. It also improves operational deployments of large-scale inference applications, with a session management agent incorporated into all supported ML frameworks and a new Neuron tool that allows you to easily scale monitoring of large fleets of inference applications.

You Only Look Once

Object detection stands out as a computer vision (CV) task that has seen large accuracy improvements (average precision at 50 IoU > 70) due to deep learning model architectures. An object detection model tries to localize and classify objects in an image, allowing for applications ranging from real-time inspection of manufacturing defects to medical imaging and tracking your favorite player and the ball in a soccer match.

Addressing the real-time inference challenges of such computer vision tasks is key for deploying these models at scale.

YOLO is part of the deep learning (DL) single-stage object detection model family, which includes models such as Single-Shot Detector (SSD) and RetinaNet. These models are usually built from stacking a backbone, neck, and head neural network that together perform detection and classification tasks. The main predictions are bounding boxes for identified objects and associated classes.

The backbone network takes care of extracting features from the input image, while the head is trained on the supervised task to predict the edges of the bounding box and classify its contents. The addition of a neck neural network allows the head network to process features from intermediate stages of the backbone. The whole pipeline processes the images only once, hence the name You Only Look Once (YOLO).

On the other hand, models with two-stage detectors further process features from the earlier convolutional layers to obtain region proposals prior to generating object class predictions. In this way, the network focuses on detecting and classifying objects in regions of high object probability.

The following diagram illustrates this architecture (from YOLOv4: Optimal Speed and Accuracy of Object Detection, arXiv:2004.10934v1).

Single-stage models allow for multiple predictions of the same object in a single image. These predictions are disambiguated later by a process called non-max suppression (NMS), which keeps only the highest-probability bounding box and label for each object. It's a less computationally costly workflow than the two-stage approach.
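
To make the idea concrete, the following is a minimal sketch of a greedy NMS pass over candidate boxes; the (x1, y1, x2, y2) box format and the IoU threshold are illustrative assumptions, not the exact implementation used inside YOLOv4.

import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep only the highest-scoring box among heavily overlapping candidates."""
    order = np.argsort(scores)[::-1]  # process boxes from highest to lowest score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        # intersection of the kept box with the remaining boxes (x1, y1, x2, y2 format)
        x1 = np.maximum(boxes[best, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[best, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[best, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[best, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
        area_best = (boxes[best, 2] - boxes[best, 0]) * (boxes[best, 3] - boxes[best, 1])
        area_rest = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_best + area_rest - inter)
        # drop boxes that overlap the kept box above the IoU threshold
        order = order[1:][iou <= iou_threshold]
    return keep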

Models like YOLO are all about performance. The latest incarnation, version 4, aims at pushing the prediction accuracy further. The research paper YOLOv4: Optimal Speed and Accuracy of Object Detection shows how real-time inference can be achieved above the human perception threshold of around 30 frames per second (FPS). In this post, you explore ways to push the performance of this model even further and use AWS Inferentia as a cost-effective hardware accelerator for real-time object detection.

Prerequisites

For this walkthrough, you need an AWS account with access to the AWS Management Console and the ability to create Amazon Elastic Compute Cloud (Amazon EC2) instances with public-facing IP.

Working knowledge of AWS Deep Learning AMIs and Jupyter notebooks with Conda environments is beneficial, but not required.

Building a YOLOv4 predictor from a pre-trained model

To start building the model, set up an inf1.2xlarge EC2 instance in AWS, with 8 vCPU cores and 16 GB of memory. The Inf1 family allows you to optimize the ratio between CPU and Inferentia devices by choosing between instance sizes such as inf1.xlarge and inf1.2xlarge. We found that for YOLOv4, the optimal CPU-to-accelerator balance is achieved with inf1.2xlarge: moving up to this second instance size improves throughput for a lower cost per image. Use the AWS Deep Learning AMI (Ubuntu 18.04) version 34.0 (ami-06a25ee8966373068) in the US East (N. Virginia) Region. This AMI comes pre-packaged with the Neuron SDK and the required Neuron runtime for AWS Inferentia. For more information about running AWS Deep Learning AMIs on EC2 instances, see Launching and Configuring a DLAMI.

Next you can connect to the instance through SSH, activate the aws_neuron_tensorflow_p36 Conda environment, and update the Neuron compiler to the latest release. The compilation script depends on requirements listed in the YOLOv4 tutorial posted on the Neuron GitHub repo. Install them by running the following code in the terminal:

pip install neuron-cc tensorflow-neuron requests pillow matplotlib pycocotools==2.0.1 torch~=1.5.0 --force --extra-index-url=https://pip.repos.neuron.amazonaws.com

You can also run the following steps directly from the provided Jupyter notebook. If doing so, skip to the Running a performance benchmark on Inferentia section to explore the performance benefits of running YOLOv4 on AWS Inferentia.

The benchmark of the models requires an object detection validation dataset. Start by downloading the COCO 2017 validation dataset. COCO (Common Objects in Context) is a large-scale object detection, segmentation, and captioning dataset, with over 300,000 images and 1.5 million object instances. The 2017 version of COCO contains 5,000 images for validation.

To download the dataset, enter the following code on the terminal:

curl -LO http://images.cocodataset.org/zips/val2017.zip
curl -LO http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip -q val2017.zip
unzip annotations_trainval2017.zip

When the download is complete, you should see a val2017 and an annotations folder available in your working directory. At this stage, you’re ready to build and compile the model.

The GitHub repo contains the script yolo_v4_coco_saved_model.py for downloading the pretrained weights of a PyTorch implementation of YOLOv4, and the model definition for YOLOv4 using TensorFlow 1.15 and Keras. The code was adapted from an earlier implementation and converts the PyTorch checkpoint to a Keras h5 saved model. This implementation of YOLOv4 is optimized to run on AWS Inferentia. For more information about optimizations, see Working with YOLO v4 using AWS Neuron SDK.

To download, convert, and save your Keras model to the yolo_v4_coco_saved_model folder, enter the following code:

python3 yolo_v4_coco_saved_model.py ./yolo_v4_coco_saved_model

To instantiate a new predictor from the saved model, use tf.contrib.predictor.from_saved_model('./yolo_v4_coco_saved_model') in your inference script.

The following code implements a single batch predictor and image annotation script, so you can test the saved model:

import json
import tensorflow as tf
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.patches as patches

yolo_pred_cpu = tf.contrib.predictor.from_saved_model('./yolo_v4_coco_saved_model')
image_path = './val2017/000000581781.jpg'
with open(image_path, 'rb') as f:
    feeds = {'image': [f.read()]}

results = yolo_pred_cpu(feeds)

# load annotations to decode classification result
with open('./annotations/instances_val2017.json') as f:
    annotate_json = json.load(f)
label_info = {idx+1: cat['name'] for idx, cat in enumerate(annotate_json['categories'])}

# draw picture and bounding boxes
fig, ax = plt.subplots(figsize=(10, 10))
ax.imshow(Image.open(image_path).convert('RGB'))

wanted = results['scores'][0] > 0.1

for xyxy, label_no_bg in zip(results['boxes'][0][wanted], results['classes'][0][wanted]):
    xywh = xyxy[0], xyxy[1], xyxy[2] - xyxy[0], xyxy[3] - xyxy[1]
    rect = patches.Rectangle((xywh[0], xywh[1]), xywh[2], xywh[3], linewidth=1, edgecolor='g', facecolor='none')
    ax.add_patch(rect)
    rx, ry = rect.get_xy()
    rx = rx + rect.get_width() / 2.0
    ax.annotate(label_info[label_no_bg + 1], (rx, ry), color='w', backgroundcolor='g', fontsize=10,
                ha='center', va='center', bbox=dict(boxstyle='square,pad=0.01', fc='g', ec='none', alpha=0.5))
plt.show()

The performance in this setup isn’t optimal because you ran YOLO only on CPU. Despite the native parallelization from TensorFlow, the eight cores aren’t enough to bring the inference time close to real time. For that, you use AWS Inferentia.

Compiling YOLOv4 to run on AWS Inferentia

The compilation of YOLOv4 uses the TensorFlow-Neuron API tfn.saved_model.compile, working directly with the saved model directory created earlier. To further reduce the Neuron runtime overhead, two extra arguments are added to the compiler call: no_fuse_ops and minimum_segment_size.

The first argument, no_fuse_ops, partitions the graph prior to casting the FP16 tensors running in the sub-graph back to FP32, as defined in the model script. This allows operations that run more efficiently on CPU to be skipped while the Neuron compiler runs its automatic smart partitioning. The argument minimum_segment_size sets the minimum number of operations a sub-graph must contain to be compiled for Inferentia, forcing trivially compilable sections to run on CPU instead. For more information, see Reference: TensorFlow-Neuron Compilation API.

To compile the model, enter the following code:

import shutil
import tensorflow as tf
import tensorflow.neuron as tfn


def no_fuse_condition(op):
    return any(op.name.startswith(pat) for pat in ['reshape', 'lambda_1/Cast', 'lambda_2/Cast', 'lambda_3/Cast'])

with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(sess, ['serve'], './yolo_v4_coco_saved_model')
    no_fuse_ops = [op.name for op in sess.graph.get_operations() if no_fuse_condition(op)]

shutil.rmtree('./yolo_v4_coco_saved_model_neuron', ignore_errors=True)

result = tfn.saved_model.compile(
                './yolo_v4_coco_saved_model', './yolo_v4_coco_saved_model_neuron',
                # we partition the graph before casting from float16 to float32, to help reduce the output tensor size by 1/2
                no_fuse_ops=no_fuse_ops,
                # to enforce trivial compilable subgraphs to run on CPU
                minimum_segment_size=100,
                batch_size=1,
                dynamic_batch_size=True,
)

print(result)

On an inf1.2xlarge, the compilation takes only a few minutes and outputs the ratio of graph operations that run on the AWS Inferentia chip. For our model, it's approximately 79%. As mentioned earlier, to optimize the compiled model for performance, the target of the compilation shouldn't be to maximize operations on the AWS Inferentia chip, but to balance the use of the available CPUs for efficient combined hardware utilization.

AWS Inferentia is designed to reach peak throughput at small—usually single-digit—batch sizes. When optimizing a specific model for throughput, explore compiling the model with different values of the batch_size argument and test what batch size yields the maximum throughput for your model. In the case of our YOLOv4 model, the best batch size is 1.
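
One way to run that exploration is to compile the model once per candidate batch size and benchmark each artifact. The sketch below reuses the no_fuse_ops list computed earlier; measure_throughput is a hypothetical placeholder for your own benchmark (for example, the evaluate function shown later in this post).

import tensorflow.neuron as tfn

# Sketch: compile one saved model per candidate batch size, then benchmark each.
# measure_throughput is a hypothetical helper you would implement yourself.
for batch_size in [1, 2, 4, 8]:
    output_dir = './yolo_v4_coco_saved_model_neuron_bs{}'.format(batch_size)
    tfn.saved_model.compile(
        './yolo_v4_coco_saved_model', output_dir,
        no_fuse_ops=no_fuse_ops,
        minimum_segment_size=100,
        batch_size=batch_size,
        dynamic_batch_size=True,
    )
    fps = measure_throughput(output_dir, batch_size)
    print('batch_size={}: {:.1f} FPS'.format(batch_size, fps))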

Replace the model path in the predictor instantiation with tf.contrib.predictor.from_saved_model('./yolo_v4_coco_saved_model_neuron') for a comparison with the previous CPU-only inference. You get similar detection accuracy at a fraction of the inference time, approximately 40 milliseconds.
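
As a quick sanity check, you can time a single warm prediction against the compiled model; a minimal sketch:

import time
import tensorflow as tf

yolo_pred_inf1 = tf.contrib.predictor.from_saved_model('./yolo_v4_coco_saved_model_neuron')

with open('./val2017/000000581781.jpg', 'rb') as f:
    feeds = {'image': [f.read()]}

yolo_pred_inf1(feeds)  # warm-up call to load the model onto the Inferentia chip
start = time.time()
yolo_pred_inf1(feeds)
print('Single-image inference time: {:.1f} ms'.format((time.time() - start) * 1000))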

Setting up a benchmarking pipeline

To set up a performance measuring pipeline, create a multi-threaded loop running inference on all the COCO images downloaded. The code available in the notebook adapts the original implementation of the eval function. The following adapted version implements a ThreadPoolExecutor to send four parallel prediction calls at a time:

import time
from concurrent import futures

import numpy as np

def evaluate(yolo_predictor, images, eval_pre_path, anno_file, eval_batch_size, _clsid2catid):
    batch_im_id_list, batch_im_name_list, batch_img_bytes_list = get_image_as_bytes(images, eval_pre_path)

    # warm up
    yolo_predictor({'image': np.array(batch_img_bytes_list[0], dtype=object)})

    with futures.ThreadPoolExecutor(4) as exe:
        fut_im_list = []
        fut_list = []
        start_time = time.time()
        for batch_im_id, batch_im_name, batch_img_bytes in zip(batch_im_id_list, batch_im_name_list, batch_img_bytes_list):
            if len(batch_img_bytes) != eval_batch_size:
                continue
            fut = exe.submit(yolo_predictor, {'image': np.array(batch_img_bytes, dtype=object)})
            fut_im_list.append((batch_im_id, batch_im_name))
            fut_list.append(fut)
        bbox_list = []
        count = 0
        for (batch_im_id, batch_im_name), fut in zip(fut_im_list, fut_list):
            results = fut.result()
            bbox_list.extend(analyze_bbox(results, batch_im_id, _clsid2catid))
            for _ in batch_im_id:
                count += 1
                if count % 100 == 0:
                    print('Test iter {}'.format(count))
        
        print('==================== Performance Measurement ====================')
        print('Finished inference on {} images in {} seconds'.format(len(images), time.time() - start_time))
        print('=================================================================')
    
    # start evaluation
    box_ap_stats = bbox_eval(anno_file, bbox_list)
    return box_ap_stats

Additional helper functions are used to calculate average precision scores of the deployed model.
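
For reference, a helper like bbox_eval typically wraps the pycocotools COCOeval API; the following sketch shows the general shape of such a function, not necessarily the exact code used in the notebook.

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

def bbox_eval(anno_file, bbox_list):
    # bbox_list holds detections in COCO results format:
    # {'image_id': ..., 'category_id': ..., 'bbox': [x, y, w, h], 'score': ...}
    coco_gt = COCO(anno_file)
    coco_dt = coco_gt.loadRes(bbox_list)
    coco_eval = COCOeval(coco_gt, coco_dt, 'bbox')
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()   # prints the AP/AR table shown in the next section
    return coco_eval.stats  # stats[0] is mAP at IoU=0.50:0.95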

Running a performance benchmark on Inferentia

To run the COCO evaluation and benchmark the time to infer over the 5,000 images, run the evaluate function as shown in the following code:

val_coco_root = './val2017'
val_annotate = './annotations/instances_val2017.json'
clsid2catid = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 13, 12: 14, 13: 15, 14: 16,
               15: 17, 16: 18, 17: 19, 18: 20, 19: 21, 20: 22, 21: 23, 22: 24, 23: 25, 24: 27, 25: 28, 26: 31,
               27: 32, 28: 33, 29: 34, 30: 35, 31: 36, 32: 37, 33: 38, 34: 39, 35: 40, 36: 41, 37: 42, 38: 43,
               39: 44, 40: 46, 41: 47, 42: 48, 43: 49, 44: 50, 45: 51, 46: 52, 47: 53, 48: 54, 49: 55, 50: 56,
               51: 57, 52: 58, 53: 59, 54: 60, 55: 61, 56: 62, 57: 63, 58: 64, 59: 65, 60: 67, 61: 70, 62: 72,
               63: 73, 64: 74, 65: 75, 66: 76, 67: 77, 68: 78, 69: 79, 70: 80, 71: 81, 72: 82, 73: 84, 74: 85,
               75: 86, 76: 87, 77: 88, 78: 89, 79: 90}
eval_batch_size = 8

with open(val_annotate, 'r', encoding='utf-8') as f2:
    for line in f2:
        line = line.strip()
        dataset = json.loads(line)
        images = dataset['images']

box_ap = evaluate(yolo_pred, images, val_coco_root, val_annotate, eval_batch_size, clsid2catid)

When the evaluation is complete, you see logs on the screen like the following:

…

Test iter 4500
Test iter 4600
Test iter 4700
Test iter 4800
Test iter 4900
==================== Performance Measurement ====================
Finished inference on 5000 images in 47.50522780418396 seconds
=================================================================

…

Accumulating evaluation results...
DONE (t=6.78s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.487
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.741
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.531
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.330
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.546
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.604
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.357
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.573
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.601
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.430
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.657
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.744

At 5,000 images processed in 47 seconds, this deployment achieves 106 FPS, 3.5 times faster than the real-time threshold of 30 FPS. The research paper YOLOv4: Optimal Speed and Accuracy of Object Detection lists the results for batch one performance over the same COCO 2017 dataset running on an NVIDIA Volta GPU, such as the V100. The largest frame rate obtained was 96 FPS, at 41.2% mAP. Our model architecture and deployment achieve a higher mAP, 48.7%, at a higher frame rate.

To have a direct comparison between AWS Inferentia, NVIDIA Volta, and Turing architectures, we replicated the same experiment in two GPU based instances, g4dn.xlarge and p3.2xlarge, by running the exact same model prior to compilation, with no further GPU optimization. This time we achieved 39 FPS and 111 FPS for the g4dn.xlarge and p3.2xlarge, respectively.

A YOLO model deployed in production usually doesn’t see a defined batch of 5,000 images at a time. To measure production like performance, we set up a prediction-only multi-threaded pipeline that runs inference for extended periods.

For a total time of 2 hours, we continually ran 8 parallel prediction calls with a batch of 4 images on each, totaling 32 images at a time. To maximize GPU throughput and try to decrease the performance gap between the Inf1 and G4 instances, we use the TensorFlow XLA compiler. This setup mimics a live endpoint behavior running at maximum throughput.
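
The following sketch outlines that prediction-only loop: eight worker threads, each repeatedly sending a batch of four images, with the total count used to compute average FPS. It is a simplified illustration of our setup rather than the exact benchmark script.

import itertools
import time
from concurrent import futures

import numpy as np

def sustained_load(predictor, batches_of_4, duration_s=2 * 60 * 60, num_workers=8):
    """Continually run batch-4 predictions from parallel workers and report average FPS."""
    start = time.time()
    deadline = start + duration_s

    def worker():
        done = 0
        for batch in itertools.cycle(batches_of_4):  # each batch holds 4 image byte strings
            if time.time() > deadline:
                return done
            predictor({'image': np.array(batch, dtype=object)})
            done += 1

    with futures.ThreadPoolExecutor(num_workers) as exe:
        counts = [exe.submit(worker) for _ in range(num_workers)]
        total_batches = sum(fut.result() for fut in counts)

    elapsed = time.time() - start
    print('Processed {} images in {:.0f} s ({:.1f} FPS)'.format(
        total_batches * 4, elapsed, total_batches * 4 / elapsed))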

GPU thermal throttling

In contrast to AWS Inferentia chips, GPU throughput is inversely proportional to GPU temperature. GPU temperature can vary on endpoints running for extended periods at high throughput, which leads to FPS and latency fluctuations. This effect is known as thermal throttling. Some production systems define a throughput limit below the maximum achievable to avoid performance swings over time. The following graph shows the average FPS over 30-second increments for the duration of the test. We observed up to 12% variation of the FPS rolling average on the GPU instance. On AWS Inferentia, this variation is below 3% for a substantially larger FPS average.

During the 2-hour period, we ran inference on over 856,000 images on the inf1.2xlarge instance. On the g4dn.xlarge, the maximum number of inferences achieved was 486,000. That amounts to 76% more images processed over the same amount of time using AWS Inferentia! Latency averages for batch 4 inference are also 60% lower for AWS Inferentia.

Using the total throughput collected during our 2-hour test, we calculated that the price of running 1 million inferences is $1.362 on an inf1.2xlarge in the us-east-1 Region. For the g4dn.xlarge, the price is $2.163, a 37% price reduction for the YOLOv4 object detection pipeline on AWS Inferentia.
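
The cost figures follow directly from the measured throughput and each instance's hourly on-demand price. The sketch below reproduces the arithmetic; the hourly prices are approximate us-east-1 on-demand rates at the time of writing and should be verified against current pricing.

def price_per_million(images_in_2_hours, hourly_price_usd):
    hours_needed = 1000000.0 / images_in_2_hours * 2  # scale the 2-hour run to 1M images
    return hours_needed * hourly_price_usd

# Approximate on-demand prices (USD/hour) in us-east-1 at the time of writing.
print(price_per_million(856000, 0.584))  # inf1.2xlarge -> ~1.36
print(price_per_million(486000, 0.526))  # g4dn.xlarge  -> ~2.16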

Safely shutting down and cleaning up

On the Amazon EC2 console, choose the instances used to perform the benchmark, and choose Terminate from the Actions drop-down menu. Terminating the instance discards data stored only in the instance's home volume. You can persist the compiled model in an Amazon Simple Storage Service (Amazon S3) bucket, so it can be reused later. If you've made changes to the code inside the instances, remember to persist those as well.

Conclusion

In this post, you walked through the steps of optimizing a TensorFlow YOLOv4 model to run on AWS Inferentia. You explored AWS Neuron optimizations that yield better model performance with improved average precision, in a much more cost-effective way. In production, the Neuron-compiled model is up to 37% less expensive in the long run, with little fluctuation in throughput and latency, when compared to the most optimized GPU instance.

Some of the steps described in this post also apply to other ML model types and frameworks. For more information, see the AWS Neuron SDK GitHub repo.

Learn more about the AWS Inferentia chip and the Amazon EC2 Inf1 instances to get started with running your own custom ML pipelines on AWS Inferentia using the Neuron SDK.


About the Authors

Fabio Nonato de Paula is a Principal Solutions Architect for Autonomous Computing in AWS. He works with large-scale deployments of ML and AI for autonomous and intelligent systems. Fabio is passionate about democratizing access to accelerated computing and distributed ML. Outside of work, you can find Fabio riding his motorcycle on the hills of Livermore valley or reading ComiXology.


Haichen Li is a software development engineer in the AWS Neuron SDK team. He works on integrating machine learning frameworks with the AWS Neuron compiler and runtime systems, as well as developing deep learning models that benefit particularly from the Inferentia hardware.


Samuel Jacob is a senior software engineer in the AWS Neuron team. He works on AWS Neuron runtime to enable high performance inference data paths between AWS Neuron SDK and AWS Inferentia hardware. He also works on tools to analyze and improve AWS Neuron SDK performance. Outside of work, you can catch him playing video games or tinkering with small boards such as RaspberryPi.


Collaborating with AI to create Bach-like compositions in AWS DeepComposer

AWS DeepComposer provides a creative and hands-on experience for learning generative AI and machine learning (ML). We recently launched the Edit melody feature, which allows you to add, remove, or edit specific notes, giving you full control of the pitch, length, and timing for each note. In this post, you can learn to use the Edit melody feature to collaborate with the autoregressive convolutional neural network (AR-CNN) algorithm and create interesting Bach-style compositions.

Through human-AI collaboration, we can surpass what humans and AI systems can create independently. For example, you can seek inspiration from AI to create art or music outside your area of expertise, or offload the more routine tasks, like creating variations on a melody, and focus on the more interesting and creative tasks. Alternatively, you can assist the AI by correcting mistakes or removing artifacts it creates. You can also influence the output generated by the AI system by controlling the various training and inference parameters.

You can co-create music in the AWS DeepComposer Music Studio by collaborating with the AI (AR-CNN) model using the Edit melody feature. The AR-CNN Bach model modifies a melody note by note to guide the track towards sounding more Bach-like. You can modify four advanced parameters when you perform inference to influence how the input melody is modified:

  • Maximum notes to add – Changes the maximum number of notes added to your original melody
  • Maximum notes to remove – Changes the maximum number of notes removed from your original melody
  • Sampling iterations – Changes the exact number of times you add or remove a note based on note-likelihood distributions inferred by the model
  • Creative risk – Allows the AI model to deviate from creating Bach-like harmonies

The values you choose directly impact the composition created by the model by nudging the model in one way or another. For more information about these parameters, see AWS DeepComposer Learning Capsule on using the AR-CNN model.

Although the advanced parameters allow you to guide the output the AR-CNN model creates, they don’t provide note-level control over the music produced. For example, the AR-CNN model allows you to control the number of notes to add or remove during inference, but you don’t have control over the exact notes the model adds or removes.

The Edit melody feature bridges this gap by providing an interactive view of the generated melody so you can add missing notes, remove out-of-tune notes, or even change a note’s pitch and length. This granular level of editing facilitates better human-AI collaboration. It enables you to correct mistakes the model makes and harmonize the output to your liking, giving you more ownership of the creation process.

For this post, we explore the use case of co-creating Bach-like background music to match the following video.

Collaborating with AI using the AWS DeepComposer Music Studio

To start composing your melody, complete the following steps:

  1. Open the AWS DeepComposer Music Studio console.
  2. Choose an Input melody.

You can record a custom melody, import a melody, or choose a sample melody on the console. For this post, we experimented with two melodies: the New World sample melody and a custom melody we created using the MIDI keyboard.

New World melody:

Custom melody:

  3. Choose the Autoregressive generative AI technique.
  4. Choose the Autoregressive CNN Bach model.

There are several considerations when choosing the advanced parameters. First, we wanted the original input melody to be recognizable. After some iterating, we found that setting the Maximum notes to add to 60 and Maximum notes to remove to 40 created a desirable outcome. For Creative risk, we wanted the model to create something interesting and adventurous. At the same time, we realized that a very high Creative risk value would deviate too much from the Bach style, so we took a moderate approach and chose a Creative risk of 2.

  5. You can repeat these steps a few times to iteratively create music.

Editing your input melody

After the AR-CNN model has generated a composition to your satisfaction, you can use the Edit melody feature to modify the melody and try to match the video’s transitions as much as possible.

  1. Choose the right arrow to open the input melody section.
  2. Choose Edit melody.
  3. On the Edit melody page, edit your track in any of the following ways:
    • Choose a cell (double-click) to add or remove a note at that pitch or time.
    • Drag a cell up or down to change a note’s pitch.
    • Drag the edge of a cell left or right to change a note’s length.
  4. When finished, choose Apply changes.

We drew inspiration from the AI-generated notes in different ways. For the New World melody, we noticed the model added short and bouncy notes (the circles with solid lines in the following screenshot), which made the composition sound similar to an American folk song. To match that style, we added a few notes in the second half of the composition (the dotted-lined circles).

For our custom melody, we noticed the model changed the chords slightly earlier than expected (see the following screenshot). This created lingering and overlapping sounds that we liked for the mountain road scenes.

On the other hand, we noticed the AI model needed our help to remove some notes that sounded out of place. After we listened to the track a few times, we decided to change some pitches manually to nudge the track towards something that sounded a bit more harmonious.

Generating accompaniments using the GAN generative AI technique

After using the AR-CNN Bach model to explore options for our melody track, we decided to try using a different generative AI model (GAN) to create musical accompaniments.

  1. Under Model parameters, for Generative AI technique, choose Generative adversarial network.
  2. Feed the edited compositions to the GAN model to generate accompaniments.

We chose the MuseGAN generative algorithm and the Symphony model because we wanted to create accompaniments to match the serene and somber setting in the video.

  3. You can optionally export your compositions into a music-editing tool of your choice to change the instrument set and perform post-processing.

Let’s watch the videos containing our AI-inspired creations in the background.

The first video uses the New World melody.

The following video uses our custom melody.

Conclusion

In this post, we demonstrated how to use the Edit melody feature in the AWS DeepComposer Music Studio to collaborate with generative AI models and create interesting Bach-style compositions. You can modify a melody to your liking by adding, removing, and editing specific notes. This gives you full control of the pitch, length, and timing for each note to produce an original melody.


About the Authors

Rahul Suresh is an Engineering Manager with the AWS AI org, where he works on AI-based products that make machine learning accessible to all developers. Prior to joining AWS, Rahul was a Senior Software Developer at Amazon Devices and helped launch highly successful smart home products. Rahul is passionate about building machine learning systems at scale and is always looking to get these advanced technologies into the hands of customers. In addition to his professional career, Rahul is an avid reader and a history buff.


Enoch Chen is a Senior Technical Program Manager for AWS AI Devices. He is a big fan of machine learning and loves to explore innovative AI applications. Recently he helped bring DeepComposer to thousands of developers. Outside of work, Enoch enjoys playing piano and listening to classical music.


Carlos Daccarett is a Front-End Engineer at AWS. He loves bringing design mocks to life. In his spare time, he enjoys hiking, golfing, and snowboarding.


Dylan Jackson is a Senior ML Engineer and AI Researcher at AWS. He works to build experiences which facilitate the exploration of AI/ML, making new and exciting techniques accessible to all developers. Before AWS, Dylan was a Senior Software Developer at Goodreads where he leveraged both a full-stack engineering and machine learning skillset to protect millions of readers from spam, high-volume robotic traffic, and scaling bottlenecks. Dylan is passionate about exploring both the theoretical underpinnings and the real-world impact of AI/ML systems. In addition to his professional career, he enjoys reading, cooking, and working on small crafts projects.


Evaluating an automatic speech recognition service

Over the past few years, many automatic speech recognition (ASR) services have entered the market, offering a variety of different features. When deciding whether to use a service, you may want to evaluate its performance and compare it to another service. This evaluation process often analyzes a service along multiple vectors such as feature coverage, customization options, security, performance and latency, and integration with other cloud services.

Depending on your needs, you’ll want to check for features such as speaker labeling, content filtering, and automatic language identification. Basic transcription accuracy is often a key consideration during these service evaluations. In this post, we show how to measure the basic transcription accuracy of an ASR service in six easy steps, provide best practices, and discuss common mistakes to avoid.

Illustration showing a table of contents: The evaluation basics, six steps for performing an evaluation, and best practices and common mistakes to avoid.

The evaluation basics

Defining your use case and performance metric

Before starting an ASR performance evaluation, you first need to consider your transcription use case and decide how to measure a good or bad performance. Literal transcription accuracy is often critical. For example, how many word errors are in the transcripts? This question is especially important if you pay annotators to review the transcripts and manually correct the ASR errors, and you want to minimize how much of the transcript needs to be re-typed.

The most common metric for speech recognition accuracy is called word error rate (WER), which is recommended by the US National Institute of Standards and Technology for evaluating the performance of ASR systems. WER is the proportion of transcription errors that the ASR system makes relative to the number of words that were actually said. The lower the WER, the more accurate the system. Consider this example:

Reference transcript (what the speaker said): well they went to the store to get sugar

Hypothesis transcript (what the ASR service transcribed): they went to this tour kept shook or

In this example, the ASR service doesn’t appear to be accurate, but how many errors did it make? To quantify WER, there are three categories of errors:

  • Substitutions – When the system transcribes one word in place of another. Transcribing the fifth word as this instead of the is an example of a substitution error.
  • Deletions – When the system misses a word entirely. In the example, the system deleted the first word well.
  • Insertions – When the system adds a word into the transcript that the speaker didn’t say, such as or inserted at the end of the example.

Of course, counting errors in terms of substitutions, deletions, and insertions isn’t always straightforward. If the speaker says “to get sugar” and the system transcribes kept shook or, one person might count that as a deletion (to), two substitutions (kept instead of get and shook instead of sugar), and an insertion (or). A second person might count that as three substitutions (kept instead of to, shook instead of get, and or instead of sugar). Which is the correct approach?

WER gives the system the benefit of the doubt, and counts the minimum number of possible errors. In this example, the minimum number of errors is six. The following aligned text shows how to count errors to minimize the total number of substitutions, deletions, and insertions:

REF: WELL they went to THE  STORE TO   GET   SUGAR
HYP: **** they went to THIS TOUR  KEPT SHOOK OR
     D                 S    S     S    S     S

Many ASR evaluation tools use this format. The first line shows the reference transcript, labeled REF, and the second line shows the hypothesis transcript, labeled HYP. The words in each transcript are aligned, with errors shown in uppercase. If a word was deleted from the reference or inserted into the hypothesis, asterisks are shown in place of the word that was deleted or inserted. The last line shows D for the word that was deleted by the ASR service, and S for words that were substituted.

Don’t worry if these aren’t the actual errors that the system made. With the standard WER metric, the goal is to find the minimum number of words that you need to correct. For example, the ASR service probably didn’t really confuse “get” and “shook,” which sound nothing alike. The system probably misheard “sugar” as “shook or,” which do sound very similar. If you take that into account (and there are variants of WER that do), you might end up counting seven or eight word errors. However, for the simple case here, all that matters is counting how many words you need to correct without needing to identify the exact mistakes that the ASR service made.

You might recognize this as the Levenshtein edit distance between the reference and the hypothesis. WER is defined as the normalized Levenshtein edit distance:

WER = (substitutions + deletions + insertions) / number of words in the reference

In other words, it's the minimum number of words that need to be corrected to change the hypothesis transcript into the reference transcript, divided by the number of words that the speaker originally said. Our example would have the following WER calculation:

WER = (5 substitutions + 1 deletion + 0 insertions) / 9 reference words = 6 / 9 ≈ 0.67
WER is often multiplied by 100, so the WER in this example might be reported as 0.67, 67%, or 67. This means the service made errors for 67% of the reference words. Not great! The best achievable WER score is 0, which means that every word is transcribed correctly with no inserted words. On the other hand, there is no worst WER score—it can even go above 1 (above 100%) if the system made a lot of insertion errors. In that case, the system is actually making more errors than there are words in the reference—not only does it get all the words wrong, but it also manages to add new wrong words to the transcript.

For other performance metrics besides WER, see the section Adapting the performance metric to your use case later in this post.

Normalizing and preprocessing your transcripts

When calculating WER and many other metrics, keep in mind that the problem of text normalization can drastically affect the calculation. Consider this example:

Reference: They will tell you again: our ballpark estimate is $450.

ASR hypothesis: They’ll tell you again our ball park estimate is four hundred fifty dollars.

The following code shows how most tools would count the word errors if you just leave the transcripts as-is:

REF: THEY WILL    tell you AGAIN: our **** BALLPARK estimate is **** ******* ***** $450.   
HYP: **** THEY'LL tell you AGAIN  our BALL PARK     estimate is FOUR HUNDRED FIFTY DOLLARS.
     D    S                S          I    S                    I    I       I     S

The word error rate would therefore be:

WER = (4 substitutions + 1 deletion + 4 insertions) / 10 reference words = 9 / 10 = 0.9
According to this calculation, there were errors for 90% of the reference words. That doesn’t seem right. The ASR hypothesis is basically correct, with only small differences:

  • The words they will are contracted to they’ll
  • The colon after again is omitted
  • The term ballpark is spelled as a single compound word in the reference, but as two words in the hypothesis
  • $450 is spelled with numerals and a currency symbol in the reference, but the ASR system spells it using the alphabet as four hundred fifty dollars

The problem is that you can write down the original spoken words in more than one way. The reference transcript spells them one way and the ASR service spells them in a different way. Depending on your use case, you may or may not want to count these written differences as errors that are equivalent to missing a word entirely.

If you don’t want to count these kinds of differences as errors, you should normalize both the reference and the hypothesis transcripts before you calculate WER. Normalizing involves changes such as:

  • Lowercasing all words
  • Removing punctuation (except apostrophes)
  • Contracting words that can be contracted
  • Expanding written abbreviations to their full forms (such as Dr. to doctor)
  • Spelling all compound words with spaces (such as blackboard to black board or part-time to part time)
  • Converting numerals to words (or vice-versa)

If there are other differences that you don't want to count as errors, you might consider additional normalizations. For example, some languages have multiple spellings for some words (such as favorite and favourite) or optional diacritics (such as naïve vs. naive), and you may want to convert these to a single spelling before calculating WER. We also recommend removing filled pauses like uh and um, which are irrelevant for most uses of ASR and therefore shouldn't be included in the WER calculation.
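
As an illustration, a simple normalization pass for English transcripts might look like the following sketch; the contraction, compound, and filler lists are deliberately tiny examples that you would extend for your own data.

import re

# Minimal example mappings; extend these lists for your own data.
CONTRACTIONS = {"they will": "they'll", "do not": "don't"}
COMPOUNDS = {"ballpark": "ball park", "part-time": "part time"}
FILLERS = {"uh", "um"}

def normalize(text):
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)               # drop punctuation except apostrophes
    for full, contracted in CONTRACTIONS.items():
        text = text.replace(full, contracted)            # contract words that can be contracted
    for compound, spaced in COMPOUNDS.items():
        text = text.replace(compound, spaced)             # spell compound words with spaces
    words = [w for w in text.split() if w not in FILLERS]  # remove filled pauses
    return " ".join(words)

print(normalize("They'll tell you again: our ballpark estimate, um, is huge."))
# they'll tell you again our ball park estimate is huge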

A second, related issue is that WER by definition counts the number of whole word errors. Many tools define words as strings separated by spaces for this calculation, but not all writing systems use spaces to separate words. In this case, you may need to tokenize the text before calculating WER. Alternatively, for writing systems where a single character often represents a word (such as Chinese), you can calculate a character error rate instead of a word error rate, using the same procedure.

Six steps for performing an ASR evaluation

To evaluate an ASR service using WER, complete the following steps:

  1. Choose a small sample of recorded speech.
  2. Transcribe it carefully by hand to create reference transcripts.
  3. Run the audio sample through the ASR service.
  4. Create normalized ASR hypothesis transcripts.
  5. Calculate WER using an open-source tool.
  6. Make an assessment using the resulting measurement.

Choosing a test sample

Choosing a good sample of speech to evaluate is critical, and you should do this before you create any ASR transcripts in order to avoid biasing the results. You should think about the sample in terms of utterances. An utterance is a short, uninterrupted stretch of speech that one speaker produces without any silent pauses. The following are three example utterances:

An utterance is sometimes one complete sentence, but people don’t always talk in complete sentences—they hesitate, start over, or jump between multiple thoughts within the same utterance. Utterances are often only one or two words long and are rarely more than 50 words. For the test sample, we recommend selecting utterances that are 25–50 words long. However, this is flexible and can be adjusted if your audio contains mostly short utterances, or if short utterances are especially important for your application.

Your test sample should include at least 800 spoken utterances. Ideally, each utterance should be spoken by a different person, unless you plan to transcribe speech from only a few individuals. Choose utterances from representative portions of your audio. For example, if there is typically background traffic noise in half of your audio, then half of the utterances in your test sample should include traffic noise as well. If you need to extract utterances from long audio files, you can use a tool like Audacity.

Creating reference transcripts

The next step is to create reference transcripts by listening to each utterance in your test sample and writing down what the speaker said word for word. Creating these reference transcripts by hand can be time-consuming, but it's necessary for performing the evaluation. Write the transcript for each utterance on its own line in a plain text file named reference.txt, as shown below.

hi i'm calling about a refrigerator i bought from you the ice maker stopped working and it's still under warranty so i wanted to see if someone could come look at it
no i checked everywhere the mailbox the package room i asked my neighbor who sometimes gets my packages but it hasn't shown up yet
i tried to update my address on the on your web site but it just says error code 402 disabled account id after i filled out the form

The reference transcripts are extremely literal, including when the speaker hesitates and restarts in the third utterance (on the on your). If the transcripts are in English, write them using all lowercase with no punctuation except for apostrophes, and in general be sure to pay attention to the text normalization issues that we discussed earlier. In this example, besides lowercasing and removing punctuation from the text, compound words have been normalized by spelling them as two words (ice maker, web site), the initialism I.D. has been spelled as a single lowercase word id, and the number 402 is spelled using numerals rather than the alphabet. By applying these same strategies to both the reference and the hypothesis transcripts, you can ensure that different spelling choices aren’t counted as word errors.

Running the sample through the ASR service

Now you’re ready to run the test sample through the ASR service. For instructions on doing this on the Amazon Transcribe console, see Create an Audio Transcript. If you’re running a large number of individual audio files, you may prefer to use the Amazon Transcribe developer API.

Creating ASR hypothesis transcripts

Take the hypothesis transcripts generated by the ASR service and paste them into a plain text file with one utterance per line. The order of the utterances must correspond exactly to the order in the reference transcript file that you created: if line 3 of your reference transcripts file has the reference for the utterance pat went to the store, then line 3 of your hypothesis transcripts file should have the ASR output for that same utterance.

The following is the ASR output for the three utterances:

Hi I'm calling about a refrigerator I bought from you The ice maker stopped working and it's still in the warranty so I wanted to see if someone could come look at it
No I checked everywhere in the mailbox The package room I asked my neighbor who sometimes gets my packages but it hasn't shown up yet
I tried to update my address on the on your website but it just says error code 40 to Disabled Accounts idea after I filled out the form

These transcripts aren’t ready to use yet—you need to normalize them first using the same normalization conventions that you used for the reference transcripts. First, lowercase the text and remove punctuation except apostrophes, because differences in case or punctuation aren’t considered as errors for this evaluation. The word website should be normalized to web site to match the reference transcript. The number is already spelled with numerals, and it looks like the initialism I.D. was transcribed incorrectly, so no need to do anything there.

After the ASR outputs have been normalized, the final hypothesis transcripts look like the following:

hi i'm calling about a refrigerator i bought from you the ice maker stopped working and it's still in the warranty so i wanted to see if someone could come look at it
no i checked everywhere in the mailbox the package room i asked my neighbor who sometimes gets my packages but it hasn't shown up yet
i tried to update my address on the on your web site but it just says error code 40 to disabled accounts idea after i filled out the form

Save these transcripts to a plain text file named hypothesis.txt.

Calculating WER

Now you’re ready to calculate WER by comparing the reference and hypothesis transcripts. This post uses the open-source asr-evaluation evaluation tool to calculate WER, but other tools such as SCTK or JiWER are also available.

Install the asr-evaluation tool (if you’re using it) with pip install asr-evaluation, which makes the wer script available on the command line. Use the following command to compare the reference and hypothesis text files that you created:

wer -i reference.txt hypothesis.txt

The script prints something like the following:

REF: hi i'm calling about a refrigerator i bought from you the ice maker stopped working and it's still ** UNDER warranty so i wanted to see if someone could come look at it
HYP: hi i'm calling about a refrigerator i bought from you the ice maker stopped working and it's still IN THE   warranty so i wanted to see if someone could come look at it
SENTENCE 1
Correct          =  96.9%   31   (    32)
Errors           =   6.2%    2   (    32)
REF: no i checked everywhere ** the mailbox the package room i asked my neighbor who sometimes gets my packages but it hasn't shown up yet
HYP: no i checked everywhere IN the mailbox the package room i asked my neighbor who sometimes gets my packages but it hasn't shown up yet
SENTENCE 2
Correct          = 100.0%   24   (    24)
Errors           =   4.2%    1   (    24)
REF: i tried to update my address on the on your web site but it just says error code ** 402 disabled ACCOUNT  ID   after i filled out the form
HYP: i tried to update my address on the on your web site but it just says error code 40 TO  disabled ACCOUNTS IDEA after i filled out the form
SENTENCE 3
Correct          =  89.3%   25   (    28)
Errors           =  14.3%    4   (    28)
Sentence count: 3
WER:     8.333% (         7 /         84)
WRR:    95.238% (        80 /         84)
SER:   100.000% (         3 /          3)

If you want to calculate WER manually instead of using a tool, you can do so by calculating the Levenshtein edit distance between the reference and hypothesis transcript pairs divided by the total number of words in the reference transcripts. When you’re calculating the Levenshtein edit distance between the reference and hypothesis, be sure to calculate word-level edits, rather than character-level edits, unless you’re evaluating a written language where every character is a word.
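
If you do calculate it yourself, a word-level edit distance takes only a few lines of Python. The following sketch computes corpus-level WER for pairs of already-normalized reference and hypothesis transcripts.

def word_edit_distance(ref_words, hyp_words):
    """Minimum number of word substitutions, deletions, and insertions."""
    d = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        prev = d[:]
        d[0] = i
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            d[j] = min(prev[j] + 1,         # deletion
                       d[j - 1] + 1,        # insertion
                       prev[j - 1] + cost)  # substitution or match
    return d[-1]

def corpus_wer(pairs):
    errors = sum(word_edit_distance(ref.split(), hyp.split()) for ref, hyp in pairs)
    ref_words = sum(len(ref.split()) for ref, hyp in pairs)
    return errors / ref_words

pairs = [("well they went to the store to get sugar",
          "they went to this tour kept shook or")]
print(corpus_wer(pairs))  # prints 0.666... (6 errors / 9 reference words)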

In the evaluation output above, you can see the alignment between each reference transcript REF and hypothesis transcript HYP. Errors are printed in uppercase, or using asterisks if a word was deleted or inserted. This output is useful if you want to re-count the number of errors and recalculate WER manually to exclude certain types of words and errors from your calculation. It’s also useful to verify that the WER tool is counting errors correctly.

At the end of the output, you can see the overall WER: 8.333%. Before you go further, skim through the transcript alignments that the wer script printed out. Check whether the references correspond to the correct hypotheses. Do the error alignments look reasonable? Are there any text normalization differences that are being counted as errors that shouldn’t be?

Making an assessment

What should the WER be if you want good transcripts? The lower the WER, the more accurate the system. However, the WER threshold that determines whether an ASR system is suitable for your application ultimately depends on your needs, budget, and resources. You’re now equipped to make an objective assessment using the best practices we shared, but only you can decide what error rate is acceptable.

You may want to compare two ASR services to determine if one is significantly better than the other. If so, you should repeat the previous three steps for each service, using exactly the same test sample. Then, count how many utterances have a lower WER for the first service compared to the second service. If you’re using asr-evaluation, the WER for each individual utterance is shown as the percentage of Errors below each utterance.

If one service has a lower WER than the other for at least 429 of the 800 test utterances, you can conclude that this service provides better transcriptions of your audio. 429 represents a conventional threshold for statistical significance when using a sign test for this particular sample size. If your sample doesn’t have exactly 800 utterances, you can manually calculate the sign test to decide if one service has a significantly lower WER than the other. This test assumes that you followed good practices and chose a representative sample of utterances.
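
If you need to compute the test yourself, the following sketch uses the binomial test from scipy (version 1.7 or later provides scipy.stats.binomtest); ties, where both services have the same WER for an utterance, are excluded.

from scipy.stats import binomtest

def service_a_is_better(wer_a, wer_b, alpha=0.05):
    """Two-sided sign test over per-utterance WER lists for services A and B."""
    wins_a = sum(a < b for a, b in zip(wer_a, wer_b))
    wins_b = sum(b < a for a, b in zip(wer_a, wer_b))
    n = wins_a + wins_b  # ties are excluded from the test
    result = binomtest(wins_a, n, p=0.5)
    return wins_a > wins_b and result.pvalue < alpha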

Adapting the performance metric to your use case

Although this post uses the standard WER metric, the most important consideration when evaluating ASR services is to choose a performance metric that reflects your use case. WER is a great metric if the hypothesis transcripts will be corrected, and you want to minimize the number of words to correct. If this isn’t your goal, you should carefully consider other metrics.

For example, if your use case is keyword extraction and your goal is to see how often a specific set of target keywords occur in your audio, you might prefer to evaluate ASR transcripts using metrics such as precision, recall, or F1 score for your keyword list, rather than WER.
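
For example, a keyword-oriented evaluation might count how many target keyword occurrences in the reference were recovered by the ASR output, rather than aligning whole transcripts; a minimal sketch:

from collections import Counter

def keyword_scores(reference, hypothesis, keywords):
    """Precision, recall, and F1 for keyword occurrences in one transcript pair."""
    ref_counts = Counter(w for w in reference.split() if w in keywords)
    hyp_counts = Counter(w for w in hypothesis.split() if w in keywords)
    true_positives = sum(min(ref_counts[k], hyp_counts[k]) for k in keywords)
    precision = true_positives / max(sum(hyp_counts.values()), 1)
    recall = true_positives / max(sum(ref_counts.values()), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return precision, recall, f1

keywords = {'refrigerator', 'warranty', 'mailbox'}
print(keyword_scores('the refrigerator is under warranty',
                     'the refrigerator is in the warranty', keywords))
# (1.0, 1.0, 1.0): both keywords were recovered despite the other word errors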

If you’re creating automatic captions that won’t be corrected, you might prefer to evaluate ASR systems in terms of how useful the captions are to viewers, rather than the minimum number of word errors. With this in mind, you can roughly divide English words into two categories:

  • Content words – Verbs like “run”, “write”, and “find”; nouns like “cloud”, “building”, and “idea”; and modifiers like “tall”, “careful”, and “quickly”
  • Function words – Pronouns like “it” and “they”; determiners like “the” and “this”; conjunctions like “and”, “but”, and “or”; prepositions like “of”, “in”, and “over”; and several other kinds of words

For creating uncorrected captions and extracting keywords, it’s more important to transcribe content words correctly than function words. For these use cases, we recommend ignoring function words and any errors that don’t involve content words in your calculation of WER. There is no definite list of function words, but this file provides one possible list for North American English.

Common mistakes to avoid

If you’re comparing two ASR services, it’s important to evaluate the ASR hypothesis transcript produced by each service using a true reference transcript that you create by hand, rather than comparing the two ASR transcripts to each other. Comparing ASR transcripts to each other lets you see how different the systems are, but won’t give you any sense of which service is more accurate.

We emphasized the importance of text normalization for calculating WER. When you’re comparing two different ASR services, the services may offer different features, such as true-casing, punctuation, and number normalization. Therefore, the ASR output for two systems may be different even if both systems correctly recognized exactly the same words. This needs to be accounted for in your WER calculation, so you may need to apply different text normalization rules for each service to compare them fairly.

Avoid informally eyeballing ASR transcripts to evaluate their quality. Your evaluation should be tailored to your needs, such as minimizing the number of corrections, maximizing caption usability, or counting keywords. An informal visual evaluation is sensitive to features that stand out from the text, like capitalization, punctuation, proper names, and numerals. However, if these features are less important than word accuracy for your use case—such as if the transcripts will be used for automatic keyword extraction and never seen by actual people—then an informal visual evaluation won’t help you make the best decision.

Useful resources

The following tools and open-source software mentioned in this post may be useful: asr-evaluation, SCTK, JiWER, and Audacity.

Conclusion

This post discusses a few of the key elements needed to evaluate the performance aspect of an ASR service in terms of word accuracy. However, word accuracy is only one of the many dimensions that you need to evaluate when choosing a particular ASR service. It's critical that you include other parameters such as the ASR service's total feature set, ease of use, existing integrations, privacy and security, customization options, scalability implications, customer service, and pricing.


About the Authors

Scott Seyfarth is a Data Scientist at AWS AI. He works on improving the Amazon Transcribe and Transcribe Medical services. Scott is also a phonetician and a linguist who has done research on Armenian, Javanese, and American English.


Paul Zhao is a Product Manager at AWS AI. He manages Amazon Transcribe and Amazon Transcribe Medical. In his past life, Paul was a serial entrepreneur, having launched and operated two startups with successful exits.


Simplify data management with new APIs in Amazon Personalize

Amazon Personalize now makes it easier to manage your growing item and user catalogs with new APIs to incrementally add items and users in your datasets to create personalized recommendations. With the new putItems and putUsers APIs, you can simplify the process of managing your datasets. You no longer need to upload an entire dataset containing historical records and new records just to include new records in your recommendations. Providing new records to Amazon Personalize when they become available reduces your latency for incorporating new information, ensuring your recommendations remain relevant to your users and item catalog.

Based on over 20 years of personalization experience at Amazon.com, Amazon Personalize enables you to improve customer engagement by powering personalized product and content recommendations and targeted marketing promotions. Amazon Personalize uses machine learning (ML) to create higher-quality recommendations for your websites and applications. You can get started without any prior ML experience and use simple APIs to easily build sophisticated personalization capabilities in just a few clicks. Amazon Personalize processes and examines your data, identifies what is meaningful, and trains and optimizes a personalization model that is customized for your data. All your data is encrypted to be private and secure, and is only used to create recommendations for your users.

This post walks you through the process of incrementally modifying your items and users datasets in Amazon Personalize.

Adding new items and users to your datasets

For this use case, we create a dataset group with an Interactions dataset, an Items dataset (item metadata), and a Users dataset using the Amazon Personalize CLI. For instructions on creating a dataset group, see Getting Started (CLI). A minimal Boto3 sketch of the equivalent schema and dataset creation calls follows the schemas below.

  1. Create an Interactions dataset using the following schema and import data using the interactions-100k.csv data file:
{
	"type": "record",
	"name": "Interactions",
	"namespace": "com.amazonaws.personalize.schema",
	"fields": [
		{
			"name": "USER_ID",
			"type": "string"
		},
		{
			"name": "ITEM_ID",
			"type": "string"
		},
		{
			"name": "EVENT_TYPE",
			"type": [
				"string"
			]
		},
		{
			"name": "EVENT_VALUE",
			"type": [
				"null",
				"float"
			]
		},
		{
			"name": "TIMESTAMP",
			"type": "long"
		}
	]
}
  2. Create an Items dataset using the following schema and import data using the csv data file:
{
	"type": "record",
	"name": "Items",
	"namespace": "com.amazonaws.personalize.schema",
	"fields": [
		{
			"name": "ITEM_ID",
			"type": "string"
		},
		{
			"name": "GENRE",
			"type": [
				"null",
				"string"
			],
			"categorical": true
		}
	],
	"version": "1.0"
}
  3. Create a Users dataset using the following schema and import data using the csv data file:
{
	"type": "record",
	"name": "Users",
	"namespace": "com.amazonaws.personalize.schema",
	"fields": [
		{
			"name": "USER_ID",
			"type": "string"
		},
		{
			"name": "AGE",
			"type": "int"
		},
		{
			"name": "GENDER",
			"type": "string"
		}
	],
	"version": "1.0"
}
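
If you script the same setup with the AWS SDK for Python (Boto3) instead of the CLI, the schema and dataset creation calls for the Users dataset might look like the following sketch; the schema name, dataset name, and dataset group ARN are placeholders to replace with your own.

import json
import boto3

# Placeholder names and ARN for illustration only; substitute your own values.
personalize = boto3.client("personalize")

users_schema = {
    "type": "record",
    "name": "Users",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "AGE", "type": "int"},
        {"name": "GENDER", "type": "string"},
    ],
    "version": "1.0",
}

# Register the Avro schema with Amazon Personalize
schema_response = personalize.create_schema(
    name="crud-test-users-schema",
    schema=json.dumps(users_schema),
)

# Attach the schema to a Users dataset in an existing dataset group
dataset_response = personalize.create_dataset(
    name="crud-test-users",
    schemaArn=schema_response["schemaArn"],
    datasetGroupArn="arn:aws:personalize:region:acctID:dataset-group/crud-test",
    datasetType="Users",
)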

Now that you have created your datasets, you can add data to them in two different ways:

  • Using bulk import for the item and user datasets from Amazon Simple Storage Service (Amazon S3); for more information, see Preparing and Importing Data. A minimal sketch of a bulk import job follows this list.
  • Using the new putUsers and putItems APIs. You can incrementally add up to 10 records per call to the Users dataset using the putUsers API and to the Items dataset using the putItems API.
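
As a minimal sketch of the bulk import path with Boto3, the call might look like the following; the job name, dataset ARN, S3 location, and IAM role ARN are placeholders.

import boto3

personalize = boto3.client("personalize")

# Placeholder job name, dataset ARN, S3 location, and IAM role ARN for illustration only
import_job = personalize.create_dataset_import_job(
    jobName="crud-test-items-import",
    datasetArn="arn:aws:personalize:region:acctID:dataset/crud-test/ITEMS",
    dataSource={"dataLocation": "s3://your-bucket/items.csv"},
    roleArn="arn:aws:iam::acctID:role/PersonalizeS3AccessRole",
)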

For the putUsers call, the Users dataset required schema field (USER_ID) is mapped to the camel case userId. For the putItems call, the Items dataset required schema field (ITEM_ID) is mapped to the camel case itemId.

The following code adds two new users to the Users dataset via the putUsers API:

import boto3

# Incremental record ingestion uses the Amazon Personalize Events client
personalize_events = boto3.client("personalize-events")

personalize_events.put_users(
    datasetArn="arn:aws:personalize:region:acctID:dataset/crud-test/USERS",
    users=[
        {
            'userId': "489",
            'properties': '{"AGE": "29", "GENDER": "F"}'
        },
        {
            'userId': "650",
            'properties': '{"AGE": "65", "GENDER": "F"}'
        }
    ]
)

The following code adds a new item to the Items dataset via the putItems API:

personalize_events.put_items(
    datasetArn="arn:aws:personalize:region:acctID:dataset/crud-test/ITEMS",
    items=[
        {
            'itemId': "432",
            'properties': '{"GENRE": "Action"}'
        }
    ]
)

An HTTP/1.1 200 response is returned for successful record creation. In cases where your new item or user doesn’t match your dataset’s defined schema, you receive an InvalidInputException detailing the total number of records in your request that don’t match the schema.
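
As an illustration, one way to surface that error in Python is to catch botocore’s ClientError and inspect the error code; the record below (a non-string GENRE value) is a hypothetical example of a schema mismatch.

import boto3
from botocore.exceptions import ClientError

personalize_events = boto3.client("personalize-events")

try:
    personalize_events.put_items(
        datasetArn="arn:aws:personalize:region:acctID:dataset/crud-test/ITEMS",
        # Hypothetical record with a non-string GENRE value to illustrate a schema mismatch
        items=[{'itemId': "433", 'properties': '{"GENRE": 42}'}]
    )
except ClientError as e:
    if e.response["Error"]["Code"] == "InvalidInputException":
        # The error message reports how many records in the request failed validation
        print("Schema mismatch:", e.response["Error"]["Message"])
    else:
        raise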

For new records created (incrementally or via bulk upload) with the same userId or itemId as a record that already exists in the Users or Items dataset, the most recently created record (ingested by Amazon Personalize) is used in new solutions or solution versions.

Additionally, records added using putUsers or putItems are persisted until your dataset is deleted, so be sure to delete your dataset in the dataset group before importing a refreshed dataset. Amazon Personalize doesn’t replace your catalog or user data management systems.
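
If you do need to replace a dataset wholesale, the deletion step before re-importing refreshed data might look like the following sketch; the dataset ARN is a placeholder.

import boto3

personalize = boto3.client("personalize")

# Deleting the dataset also removes incrementally added records;
# recreate the dataset and run a fresh import job afterwards.
personalize.delete_dataset(
    datasetArn="arn:aws:personalize:region:acctID:dataset/crud-test/ITEMS"
)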

Incorporating the newly added users and items in recommendations and filters

Now that you’ve added new items and new users to your datasets, incorporating this information into your Amazon Personalize solutions makes sure that recommendations remain timely and relevant for your users. When not using the aws-user-personalization recipe, solution re-training is needed to include these new items in your personalized recommendations.

If you have exploration enabled in an Amazon Personalize recipe, your new items are included in recommendations as soon as your next campaign update is complete. New events generated by your users’ interactions with these items are incorporated when you train a new solution or solution version in this dataset group.

Any filters you created in the dataset group are updated with your new item and user data within 15 minutes from the last dataset import job completion or the last incremental record. This update allows your campaigns to use your most recent data when filtering recommendations for your users.

Summary

Amazon Personalize allows you to easily manage your growing item and user catalogs so your personalized product and content recommendations keep pace with your business and your customers. For more information about optimizing your user experience with Amazon Personalize, see What Is Amazon Personalize?


About the Authors

Matt Chwastek is a Senior Product Manager for Amazon Personalize. He focuses on delivering products that make it easier to build and use machine learning solutions. In his spare time, he enjoys reading and photography.

Gaurav Singh Chauhan is a Software Engineer for Amazon Personalize and works on architecting software systems and big data pipelines that serve customers at scale. Gaurav has a B.Tech in Computer Science from IIT Bombay, India. Outside of work, he likes all things outdoors and is an avid runner. In his spare time, he likes reading about and exploring new technologies. He tweets on startups, technology, and India at @bazingaurav.

Read More

Announcing the winner of the AWS DeepComposer Chartbusters The Sounds of Science challenge

Announcing the winner of the AWS DeepComposer Chartbusters The Sounds of Science challenge

We’re excited to announce the top 10 compositions and the winner of the AWS DeepComposer Chartbusters The Sounds of Science challenge. AWS DeepComposer provides a creative and hands-on experience for learning generative AI and machine learning (ML). Chartbusters is a global monthly challenge where you can use AWS DeepComposer to create original compositions and compete to top the charts and win prizes. To participate in The Sounds of Science, developers composed background music for a video clip using the Autoregressive CNN (AR-CNN) algorithm and edited notes with the newly launched Edit melody feature to better match the provided video.

Top 10 compositions

The high-quality submissions made it challenging for our judges to select the chart-toppers. Our panel of experts—Kesha Williams, Sally Revell, and Prachi Kumar—selected the top 10 ranked compositions by evaluating the quality of the music, creativity, and how well the music matched the video clip.

The winner of The Sounds of Science is… (cue drum roll) Sungin Lee! You can listen to his winning composition and the top 10 compositions on SoundCloud or on the AWS DeepComposer console. The top 10 compositions for the Sounds of Science challenge are:

Sungin will receive an AWS DeepComposer Chartbusters gold record and will tell his story in an upcoming post, right here on the AWS ML blog.

Congratulations, Sungin Lee!

It’s time to move on to the next Chartbusters challenge, Track or Treat, which is Halloween-themed. The challenge launches today and is open until October 23rd, 2020.


About the Author

Maryam Rezapoor is a Senior Product Manager with AWS AI Ecosystem team. As a former biomedical researcher and entrepreneur, she finds her passion in working backward from customers’ needs to create new impactful solutions. Outside of work, she enjoys hiking, photography, and gardening.

Read More

Join AWS and NVIDIA at GTC, October 5–9

Join AWS and NVIDIA at GTC, October 5–9

Starting Monday, October 5, 2020, the NVIDIA GPU Technology Conference (GTC) is offering online sessions for you to learn AWS best practices to accomplish your machine learning (ML), virtual workstations, high performance computing (HPC), and internet of things (IoT) goals faster and more easily.

Amazon Elastic Compute Cloud (Amazon EC2) instances powered by NVIDIA GPUs deliver the scalable performance needed for fast ML training, cost-effective ML inference, flexible remote virtual workstations, and powerful HPC computations. At the edge, you can use AWS IoT Greengrass and SageMaker Neo to extend a wide range of AWS Cloud services and ML inference to NVIDIA-based edge devices so the devices can act locally on the data they generate.

AWS is a Global Diamond Sponsor of the conference.

Available sessions

The following sessions are available from AWS:

A Developer’s Guide to Choosing the Right GPUs for Deep Learning (Scheduled session IDs: A22318, A22319, A22320, and A22321)

  • As a deep learning developer or data scientist, you can choose from multiple GPU EC2 instance types based on your training and deployment requirements. You can access instances with different GPU memory sizes, NVIDIA GPU architectures, capabilities (precisions, Tensor Cores, NVLink), GPUs per instance, number of vCPUs, system memory, and network bandwidth. We’ll share some guidance on how you can choose the right GPU instance on AWS for your deep learning projects. You’ll get all the information you need to make an informed choice of GPU instance for your training and inference workloads.
  • Speaker: Shashank Prasanna, Senior Developer Advocate, AI/ML, Amazon Web Services

Virtual Workstations on AWS for Digital Content Creation (On-Demand session IDs: A22276, A22311, A22312, and A22314)

  • Virtual workstations on AWS enable studios, departments, and freelancers to take on bigger projects, work from anywhere, and pay only for what they need. Running on Amazon EC2 G4 instances, virtual workstations employ the power of NVIDIA T4 Tensor Core GPUs and Quadro technology, the visual computing platform trusted by creative and technical professionals. Virtual workstations have become essential to creative professionals seeking cloud solutions that enable remote teams to work more efficiently, and keep creative productions moving forward. Join this session to learn more about how virtual workstations on AWS work, who is using them today, and how to get started.
  • Speaker: Haley Kannall, CG Supervisor, Amazon Web Services

Empower DeepStream Applications with AWS Data Services (On-Demand session IDs: A22279, A22315, A22316, and A22317)

  • We’ll discuss how we can optimize edge video inferencing performance by leveraging AWS infrastructure and NVIDIA DeepStream. We’ll emphasize three major features at the edge: (1) massively deploying trained models to NVIDIA Jetson devices using AWS IoT Greengrass, (2) local communication and control between AWS IoT Greengrass engines and DeepStream applications through MQTT messaging, and (3) uploading inferencing results to the cloud for further analytics.
  • Speaker: Yuxin Yang, IoT Architect, Amazon Web Services

GPU-Powered Edge Computing Applications Enabled by AWS Wavelength (On-Demand session IDs: A22374, A22375, A22376, and A22377)

  • In this presentation, we provide an overview of AWS Wavelength, how it integrates with the Mobile Edge carrier network and improves the performance of Mobile Edge applications. Wavelength Zones are AWS infrastructure deployments that embed AWS compute and storage services within telecommunications providers’ datacenters at the edge of the 5G network, so application traffic can reach application servers running in Wavelength Zones without leaving the mobile providers’ network. Customers with edge data processing needs such as image and video recognition, inference, data aggregation, and responsive analytics can use Wavelength to perform low-latency operations and processing right where their data is generated, reducing the need to move large amounts of data to be processed in centralized locations. We deep dive into these Mobile Edge applications running at the AWS Wavelength Zones using Amazon EC2 G4 instances powered by NVIDIA T4 Tensor Core GPUs.
  • Speaker: Sebastian Dreisch, Head of Wavelength GTM, Amazon Web Services

Next Generation Cloud Platform for Autonomous Vehicle (AV) Development (Scheduled session ID: A21517)

  • Development of autonomous driving systems presents a massive computational challenge, including processing petabytes of sensor data, which impacts time to market, scale, and cost throughout the development cycle. Training, testing, validating, and deploying self-driving systems requires large-scale compute and storage infrastructure to support the end-to-end workflow. AWS offers a highly scalable and reliable solution for AV development including the latest generation GPUs from NVIDIA. By attending this webinar, you will learn about AWS AV solution architectures for data ingest, data management, simulation, and distributed model training, as well as strategies for cost optimization. NVIDIA will share new details about the next generation NVIDIA Ampere (A100) architecture. Attendees will walk away with an understanding of how AWS and NVIDIA can help streamline AV development and reduce IT costs and time-to-market.
  • Speakers: Shyam Kumar, Principal HPC Business Development Manager, Amazon Web Services, and Norm Marks, Global Senior Director, Automotive Industry, NVIDIA

Embracing Volatility: Using ML to Become More Efficient Amid Epic Uncertainty (Scheduled session ID: A22219)

  • We’re all used to change. In business, change is often predictable—different seasons, large-scale events, and new releases all drive fluctuations we’re used to. But right now, there’s nothing normal about the changes you’re facing. The only constant is uncertainty. And uncertainty is expensive. In the absence of an omniscient crystal ball, the next best thing is cloud and ML. This presentation is going to cover how to deal with the unexpected. Whether it’s rapidly changing traffic, shifting data sources, or model drift, we’ll cover how you can better manage spikes and dips of all sizes and improve predictions with AI to maximize your efficiencies today.
  • Speaker: Allie Miller, US Head of ML Business Development for Startups and Venture Capital at AWS, Amazon Web Services

Accelerating Data Science with NVIDIA RAPIDS (Scheduled session ID: A22042)

  • Data science workflows have become increasingly computationally intensive in recent years, and GPUs have stepped up to address this challenge. With the RAPIDS suite of open-source software libraries and APIs, data scientists can run end-to-end data science and analytics pipelines entirely on GPUs, allowing organizations to deliver results faster than ever. The AWS Cloud lets you access a large number of powerful NVIDIA GPUs with Amazon EC2 P3 based on V100 GPUs, Amazon EC2 G4 based on T4 GPUs, and upcoming A100-based GPU instances. We’ll go through the end-to-end process of running RAPIDS on AWS. We’ll start by running RAPIDS libraries on a single GPU instance. Next, we’ll see how you can run large-scale hyperparameter search experiments with RAPIDS and Amazon SageMaker. Finally, we’ll run RAPIDS distributed ML using Dask clusters on Amazon EKS and Amazon ECS.
  • Speaker: Shashank Prasanna, Senior Developer Advocate, AI/ML, Amazon Web Services

Interactive Scientific Visualization on AWS with NVIDIA IndeX SDK (On-Demand session ID: A21610)

  • Scientific visualization is critical to understanding complex phenomena modeled using HPC simulations. However, it has been challenging to do this effectively due to the inability to visualize large data volumes (> 1 PB) and lack of collaborative workflow solutions. NVIDIA IndeX on AWS, a 3D volumetric interactive visualization toolkit, addresses these problems by providing a scalable scientific visualization solution. NVIDIA IndeX allows you to make real-time modifications and navigate to the most pertinent parts of the data to gather better insights faster. IndeX leverages GPU clusters for scalable, real-time visualization and computing of multi-valued volumetric data together with embedded geometry data. We’ll demonstrate 3D volume rendering at scale on AWS using IndeX.
  • Speakers: Karthik Raman, Senior Solutions Architect, HPC, Amazon Web Services, and Dragos Tatulea, Software Engineer, NVIDIA

Conclusion

You can visit AWS and NVIDIA to learn more or apply for a free trial to use NVIDIA GPU-based Amazon EC2 P3 instances powered by NVIDIA V100 Tensor Core GPUs and Amazon EC2 G4 instances powered by NVIDIA T4 Tensor Core GPUs. Learn more about GTC on the GTC 2020 website. We look forward to seeing you there!


About the Author

Geoff Murase is a Senior Product Marketing Manager for AWS EC2 accelerated computing instances, helping customers meet their compute needs by providing access to hardware-based compute accelerators such as Graphics Processing Units (GPUs) or Field Programmable Gate Arrays (FPGAs). In his spare time, he enjoys playing basketball and biking with his family.

Read More