Deepset achieves a 3.9x speedup and 12.8x cost reduction for training NLP models by working with AWS and NVIDIA

This is a guest post from deepset (creators of the open source frameworks FARM and Haystack), and was contributed to by authors from NVIDIA and AWS. 

At deepset, we’re building the next-level search engine for business documents. Our core product, Haystack, is an open-source framework that enables developers to utilize the latest NLP models for semantic search and question answering at scale. Our software as a service (SaaS) platform, Haystack Hub, is used by developers from various industries, including finance, legal, and automotive, to find answers in all kinds of text documents. You can use these answers to improve the search experience, cover the long tail of chatbot queries, extract structured data from documents, or automate invoicing processes.

Pretrained language models like BERT, RoBERTa, and ELECTRA form the core of this latest type of semantic search and many other NLP applications. Although plenty of English models are available, models for other languages and for industry-specific domains (such as finance or automotive) are usually scarce, which often complicates industrial applications. Therefore, we regularly train language models for languages not covered by existing models (such as German BERT and German ELECTRA), models for special domains (such as finance and aerospace), or even models for client-specific jargon.

Challenge

Pretraining language models from scratch typically involves two major challenges: cost and development effort.

Training a language model is an extremely compute-intensive task and requires multiple GPUs running for multiple days. To give you a rough idea, training the original RoBERTa model took about 1 day on 1024 NVIDIA V100 GPUs.

Computation costs aren’t the only thing that can stress your budget. A considerable amount of manual development is required to create the training data and vocabulary, configure hyperparameters, start and monitor training jobs, and run periodical evaluation of different model checkpoints. In our first training runs, we also found several bugs only after multiple hours of training, resulting in a slow development cycle. In summary, language model training can be a painful job for a developer and easily consumes multiple days of work.

Solution

In a collaborative effort, AWS, NVIDIA, and deepset were able to complete training 3.9 times faster while lowering cost by 12.8 times and reducing developer effort from days to hours. We optimized the GPU utilization during training via PyTorch’s DistributedDataParallel (DDP) and enabled larger batch sizes by switching to Automatic Mixed Precision (AMP). Furthermore, we introduced a StreamingDataSilo that allows us to load the training data lazily from disk and to do the preprocessing on the fly, leading to a lower memory footprint and no initial preprocessing time. Last but not least, we integrated the training with Amazon SageMaker to reduce manual development effort and benefit from around a 70% cost reduction by using Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances.  

In this post, we explore each of these technologies and their impact on improving BERT training performance. 

DistributedDataParallel

DistributedDataParallel (DDP) implements distributed data parallelism in PyTorch. This is a key module that’s essential for running training jobs at scale, on multiple machines or on multiple GPUs in a single machine. DDP parallelizes a given network module by splitting the input across specified devices (GPUs). The input is split in the batch dimension. The network module is replicated on each device, and each such replica handles a slice of the input. The gradients from each device are averaged during a backward pass. DDP is used in conjunction with the torch.distributed framework, which handles all communication and synchronization in distributed training jobs with PyTorch.

Before DDP was introduced, torch.nn.DataParallel (DP) was the standard module for doing single-machine multi-GPU training in PyTorch. The system in DP works as follows:

  1. The entire batch of input is loaded on the main thread.
  2. The batch is split and scattered across all the GPUs in the network.
  3. Each GPU runs the forward pass on its batch split in a separate thread.
  4. The network outputs are gathered on the master GPU, the loss value is computed, and this loss value is then scattered across the other GPUs.
  5. Each GPU uses the loss value to run a backward pass and compute the gradients.
  6. The gradients are reduced on the master GPU and the model parameters on the master GPU are updated. This completes one iteration of training.
  7. To ensure all GPUs have the latest model parameters, the parameters are broadcast from the master GPU to the other GPUs at the start of the next iteration.

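In code, DP is a one-line wrapper around the model. The following minimal sketch (a dummy model and random inputs as stand-ins) shows the pattern:

import torch
import torch.nn as nn

# DataParallel replicates the module onto all visible GPUs; the input batch is
# scattered along dimension 0 and the outputs are gathered on the master GPU.
model = nn.DataParallel(nn.Linear(1024, 1024).cuda())

inputs = torch.randn(32, 1024).cuda()  # the full batch is loaded on the master GPU first
outputs = model(inputs)                # each GPU processes a slice of the batch
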
This design of DP has several inefficiencies:

  • Uneven GPU utilization – Only the master GPU handles loss calculation, gradient reduction, and parameter updates, which leads to higher memory consumption on that GPU compared to the rest
  • Unnecessary broadcast at the beginning of each iteration – Because the model parameters are only updated in the master GPU, they need to be broadcast to the other GPUs before every iteration
  • Unnecessary gathering step – Model outputs are gathered on the master GPU, even though only the loss value is needed there
  • Redundant data copies – Data is first copied to the master GPU, which is then split and copied over to the other GPUs
  • Multithreading overhead – Performance overhead caused by the Global Interpreter Lock (GIL) of the Python interpreter

DDP eliminates all of these inefficiencies of DP. DDP uses a multiprocessing architecture, unlike the multithreaded one in DP. This means each GPU has its own dedicated process, which runs independently, and there is no master GPU anymore. Each process starts by loading its own split of data from the disk. Then the forward pass and loss computation are run independently on each GPU. This eliminates the need for gathering network outputs. During the backward pass, the gradients are AllReduced across the GPUs. Averaging the gradients with AllReduce ensures that the gradients in each GPU are identical. As a result, the model updates in each GPU are identical as well, which eliminates the need for parameter broadcast at the start of the next iteration.
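
The following is a minimal sketch of this pattern, not our exact training code; the dummy model, random data, address, port, and hyperparameters are placeholders:

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank, world_size):
    # One process per GPU; the NCCL backend performs the gradient AllReduce
    dist.init_process_group("nccl", init_method="tcp://127.0.0.1:23456",
                            rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    model = DDP(torch.nn.Linear(1024, 1024).cuda(rank), device_ids=[rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    inputs = torch.randn(32, 1024).cuda(rank)  # each process loads its own data split
    loss = model(inputs).pow(2).mean()
    loss.backward()   # gradients are AllReduced across all processes here
    optimizer.step()  # identical updates on every GPU, so no broadcast is needed
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(train, args=(world_size,), nprocs=world_size)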

Results with DDP

With DP, the GPU memory utilization was skewed towards the master GPU, which consumed 15 GB, whereas the other GPUs consumed 6 GB each. With DDP, the memory consumption was split equally across the 4 GPUs, at about 9 GB each. This also allowed us to increase the per-GPU batch size, which further contributed to the speedup in throughput. We reduced the number of gradient accumulation steps to keep the effective batch size constant and ensure that there was no impact on convergence.
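
The accumulation logic itself is a small loop. The following toy sketch (dummy model and batches) shows how gradients are accumulated so that the effective batch size, batch size times accumulation steps, stays constant:

import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
batches = [torch.randn(8, 10) for _ in range(16)]  # stand-in for real data

accumulation_steps = 4  # effective batch size = 8 * 4 = 32 in this toy example
optimizer.zero_grad()
for step, batch in enumerate(batches):
    loss = model(batch).pow(2).mean() / accumulation_steps  # scale so gradients average
    loss.backward()  # gradients accumulate in the .grad buffers
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()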

The following table shows the results on a p3.8xlarge instance with 4 NVIDIA V100 GPUs. With DDP, the training time was reduced from 616 hours to 347 hours.

Run | Batch Size | Accumulation Steps | Effective Batch Size | Throughput (Effective Batches per Hour) | Total Estimated Training Time (Hours)
BERT Training with DP | 105 | 9 | 945 | 811 | 616
BERT Training with DDP | 240 | 4 | 960 | 1415 | 347

The following screenshots show the GPU performance profiles captured by NVIDIA Nsight Systems. The first screenshot shows the profile while running with DP. Light blue bars in the box marked "GPU Utilization" show when the GPUs are busy; gaps between the blue areas show where GPU utilization drops to zero. The red blocks in between are CPU-to-GPU memory copy operations. The additional blue block between the memory copies is the aggregation operation, which is computed on a single GPU.

The following screenshot shows high GPU utilization with DDP, which effectively eliminates these inefficiencies.

Training with Automatic Mixed Precision

Automatic Mixed Precision (AMP) speeds up deep learning training with minimal impact on final accuracy. Traditionally, 32-bit floating point (FP32) variables are used in deep learning training. You can improve training speed with 16-bit floating point (FP16) variables because they require less storage and memory bandwidth. However, training entirely in lower precision can decrease the accuracy of the results. Mixed precision training is a balanced approach that achieves the computational speedup of lower precision training while maintaining accuracy close to FP32 training. Training with mixed precision provides an additional speedup by using NVIDIA Tensor Cores, specialized hardware available on NVIDIA GPUs for accelerated computation.

To maintain accuracy, training with mixed precision involves the following steps:

  1. Port the model to use the FP16 datatype where appropriate.
  2. Handle specific functions or operations that must be done in FP32 to maintain accuracy.
  3. Add loss scaling to preserve small gradient values.

The AMP feature handles all these steps for deep learning training. As of this writing, all popular deep learning frameworks like PyTorch, TensorFlow, and Apache MXNet support AMP.

AMP in PyTorch was initially supported via the NVIDIA APEX library; AMP support has since been moved into PyTorch core with the 1.6 release.
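
With native AMP in PyTorch 1.6 and later, the training loop changes only slightly. A minimal sketch with a dummy model:

import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()  # loss scaling preserves small gradient values

for _ in range(10):
    inputs = torch.randn(32, 1024).cuda()
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # ops run in FP16 where safe, FP32 where needed
        loss = model(inputs).pow(2).mean()
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # unscales gradients and skips the step if they overflowed
    scaler.update()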

The APEX library provides four optimization levels for different usage patterns. Optimization levels O1 and O2 are both mixed precision modes with slight differences: O1 is the recommended choice for typical use cases, whereas O2 more aggressively converts most layers into FP16 mode. The O0 and O3 opt levels are the pure FP32 and FP16 modes, intended for reference only.
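
With APEX, enabling an opt level is a one-line change to the model and optimizer, plus a loss-scaling context around the backward pass. A minimal sketch (dummy model, assuming APEX is installed):

import torch
from apex import amp  # NVIDIA APEX library

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# O1 casts per operation; O2 casts the model weights to FP16 and keeps
# FP32 master weights inside the optimizer.
model, optimizer = amp.initialize(model, optimizer, opt_level="O2")

inputs = torch.randn(32, 1024).cuda()
loss = model(inputs).pow(2).mean()
with amp.scale_loss(loss, optimizer) as scaled_loss:  # applies loss scaling
    scaled_loss.backward()
optimizer.step()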

The following table shows the impact of applying AMP along with DDP. Without AMP, a batch size of up to 240 could be run on the GPU. With AMP, the larger batch size of 320 could be supported, reducing the total training time from 347 hours to 223 hours.

Run | Batch Size | Accumulation Steps | Effective Batch Size | Throughput (Effective Batches per Hour) | Total Estimated Training Time (Hours)
BERT Training with DDP | 240 | 4 | 960 | 1415 | 347
BERT Training with DDP & AMP O1 | 304 | 3 | 912 | 2025 | 243
BERT Training with DDP & AMP O2 | 320 | 3 | 960 | 2210 | 223

As mentioned earlier, AMP O2 converts more layers into FP16 mode, so we can run DDP with AMP O2 at a larger batch size and get better throughput compared to DDP with AMP O1. When selecting between these two opt levels, you should validate the prediction results to make sure AMP O2 meets your accuracy requirements.

The following screenshot shows the GPU performance profile after applying DDP but running the deep learning training with FP32 variables. In this profile, we added custom NVTX markers, which show the time taken for each epoch, each step, and the forward and backward passes.

The following screenshot shows the profile after enabling AMP with opt level O2. The time to run a forward and backward pass was reduced significantly, even though we increased the batch size when using AMP.

Earlier, we mentioned that AMP utilizes Tensor Cores available on NVIDIA GPU hardware for significant speedup for deep learning training. GPU performance profiles show when operations are utilizing Tensor Cores. The following screenshot shows a sample GPU kernel that is run in FP32 mode. The GPU operation is marked here as the volta_sgemm kernel.

The following screenshot shows similar operations run in FP16 mode, which utilizes Tensor Cores. Kernels running with Tensor Cores are marked as volta_fp16_s884gemm.

Data pipeline

The datasets used for training language models typically contain 10–200 GB of raw text data. Loading the whole dataset into RAM can be challenging. Furthermore, the typical pipeline of first preprocessing all the data and then pulling batches during training isn’t optimal, because the up-front preprocessing can take multiple hours, during which the GPUs on the server sit idle.

Therefore, we introduced a StreamingDataSilo, which loads data lazily from disk just in time when it’s needed in the training loop. The whole preprocessing happens on the fly. Our implementation builds upon PyTorch’s IterableDataset and DistributedSampler, but requires some custom parts to ensure enough preprocessed batches are always in the queue for our trainer, so that the GPU never has to wait for the next batch to be ready. For implementation steps, see the GitHub repo; a simplified sketch of the idea follows the table below. Together with an increased number of workers to fill the queue, we ended up with another 28% throughput improvement, as shown in the following table.

Run | Batch Size | Accumulation Steps | Effective Batch Size | Throughput (Effective Batches per Hour) | Total Estimated Training Time (Hours)
DDP with 8 workers | 320 | 3 | 960 | 2210 | 223
DDP with 16 workers | 320 | 3 | 960 | 3077 | 160
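
The following is a simplified sketch of the streaming idea described above, using a plain IterableDataset. The file path and tokenizer are placeholders, and FARM’s actual StreamingDataSilo adds batching, distributed sharding, and a prefetch queue on top:

import torch
from torch.utils.data import DataLoader, IterableDataset

def whitespace_tokenize(line):
    return line.split()  # stand-in for real preprocessing

class StreamingTextDataset(IterableDataset):
    # Reads raw text lazily from disk and preprocesses it on the fly
    def __init__(self, path, tokenize):
        self.path = path
        self.tokenize = tokenize

    def __iter__(self):
        info = torch.utils.data.get_worker_info()
        with open(self.path) as f:
            for i, line in enumerate(f):  # nothing is held in RAM up front
                # shard lines across DataLoader workers to avoid duplicates
                if info is None or i % info.num_workers == info.id:
                    yield self.tokenize(line)

dataset = StreamingTextDataset("corpus.txt", tokenize=whitespace_tokenize)
# more workers keep a queue of preprocessed examples ready for the GPU
loader = DataLoader(dataset, batch_size=None, num_workers=4)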

A second tricky case that we had to handle was related to the unknown number of batches in our dataset and the distributed training via DDP. If the batches in your dataset can’t be evenly distributed across the number of workers, some workers don’t get any batches in the last step of the epoch while others do. This mismatch can crash your whole training run or result in a deadlock (for a related PyTorch issue, see [RFC] Join-based API to support uneven inputs in DDP). We handled this by adding a small synchronization step where all workers communicate whether they still have data left. For implementation details, see the GitHub repo.
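
A simplified sketch of such a synchronization step (not FARM’s exact implementation) could look like the following, assuming the process group is already initialized:

import torch
import torch.distributed as dist

def fetch_next_batch(batch_iter, device):
    # Each rank contributes a 0/1 flag; an AllReduce with MIN tells every
    # rank whether all of them still have data left.
    try:
        batch = next(batch_iter)
        has_data = torch.tensor([1.0], device=device)
    except StopIteration:
        batch = None
        has_data = torch.tensor([0.0], device=device)

    dist.all_reduce(has_data, op=dist.ReduceOp.MIN)  # 0 if any rank ran out
    if has_data.item() == 0:
        return None  # all ranks stop together, avoiding a deadlock
    return batch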

Spot Instances

Besides reducing the total training time, using EC2 Spot Instances is another compelling way to reduce training costs. This is straightforward to configure in SageMaker: just set the parameter EnableManagedSpotTraining to True when launching your training job. SageMaker launches your training job and saves checkpoints periodically. When your Spot Instance is interrupted, SageMaker takes care of spinning up a new instance and loading the latest checkpoint to continue training from there.
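
In the SageMaker Python SDK, the same setting is exposed as use_spot_instances. A minimal sketch, where the image URI, role, and S3 paths are placeholders:

from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<training-image-uri>",  # placeholder
    role="<execution-role-arn>",       # placeholder
    instance_count=1,
    instance_type="ml.p3.8xlarge",
    use_spot_instances=True,           # managed Spot training
    max_run=24 * 3600,                 # max training time in seconds
    max_wait=36 * 3600,                # max total time, including waits for Spot capacity
    checkpoint_s3_uri="s3://<bucket>/checkpoints/",  # placeholder
)
estimator.fit({"train": "s3://<bucket>/train/"})     # placeholder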

In your code, you need to make sure to save regular checkpoints containing the states of all the objects that are relevant for your training session. This includes not only your model weights, but also the states of your optimizer, the data loader, and all random number generators, so that a resumed run replicates the results of an uninterrupted run without Spot Instances. For implementation details, see the GitHub repo.
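
A sketch of what such a checkpoint can capture in PyTorch (simplified relative to FARM’s implementation; the data loader state is omitted here):

import random
import numpy as np
import torch

def save_checkpoint(path, model, optimizer, epoch, step):
    # Persist everything needed to resume exactly where training stopped,
    # including RNG states so results match an uninterrupted run
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "epoch": epoch,
        "step": step,
        "torch_rng": torch.get_rng_state(),
        "cuda_rng": torch.cuda.get_rng_state_all(),
        "numpy_rng": np.random.get_state(),
        "python_rng": random.getstate(),
    }, path)

def load_checkpoint(path, model, optimizer):
    state = torch.load(path, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    torch.set_rng_state(state["torch_rng"])
    torch.cuda.set_rng_state_all(state["cuda_rng"])
    np.random.set_state(state["numpy_rng"])
    random.setstate(state["python_rng"])
    return state["epoch"], state["step"]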

In our test runs, we achieved around 70% cost savings in comparison to regular On-Demand Instances.

Conclusion

Language models have become the backbone of modern NLP. Although using existing public models works well in many cases, many domains with special language can benefit from training a new model from scratch. Having a fast, simple, and cheap training pipeline is essential for these big training jobs. In addition, the increased efficiency of training jobs reduces our energy usage and lowers our carbon footprint. By tackling different areas of FARM’s training pipeline, we were able to significantly optimize the resource utilization. In the end, we achieved a 3.9 times speedup in training time, a 12.8 times reduction in training cost, and a reduction in developer effort from days to hours.

If you’re interested in training your own BERT model, you can look at the open-source code in FARM or try our free SageMaker algorithm on the AWS Marketplace.


About the Authors

Abhinav Sharma is a Software Engineer at AWS Deep Learning. He works on bringing state-of-the-art deep learning research to customers, building products that help customers use deep learning engines. Outside of work, he enjoys playing tennis, noodling on his guitar and watching thriller movies.

Malte Pietsch is Co-Founder & CTO at deepset, where he builds the next-level enterprise search engine fueled by open source and NLP. He holds a M.Sc. with honors from TU Munich and conducted research at Carnegie Mellon University. He is an open-source lover, likes reading papers before breakfast, and is obsessed with automating the boring parts of our work.

Khaled ElGalaind is the engineering manager for AWS Deep Engine Benchmarking, focusing on performance improvements for AWS machine learning customers. Khaled is passionate about democratizing deep learning. Outside of work, he enjoys volunteering with the Boy Scouts, BBQ, and hiking in Yosemite.

Jiahong Liu is a Solution Architect on the NVIDIA Cloud Service Provider team, where he helps customers adopt ML and AI solutions that make better use of NVIDIA GPUs to solve their business challenges.

Anish Mohan is a Machine Learning Architect at NVIDIA and the technical lead for ML and DL engagements with key NVIDIA customers in the greater Seattle region. Before NVIDIA, he was at Microsoft’s AI Division, working to develop and deploy AI and ML algorithms and solutions.

Read More

How to deliver natural conversational experiences using Amazon Lex Streaming APIs

Natural conversations often include pauses and interruptions. During customer service calls, a caller may ask to pause the conversation or hold the line while they look up the necessary information before continuing to answer a question. For example, callers often need time to retrieve credit card details when making bill payments. Interruptions are also common. Callers may interrupt a human agent with an answer before the agent finishes asking the entire question (for example, “What’s the CVV code for your credit card? It is the three-digit code in the top right corner…”). Just like conversing with human agents, a caller interacting with a bot may interrupt or instruct the bot to hold the line. Previously, you had to orchestrate such dialog on Amazon Lex by managing client attributes and writing code via an AWS Lambda function. Implementing a hold pattern required code to keep track of the previous intent so that the bot could continue the conversation. The orchestration of these conversations was complex to build and maintain, and impacted the time to market for conversational interfaces. Moreover, the user experience was disjointed because the properties of prompts, such as the ability to interrupt, were defined in the session attributes on the client.

Amazon Lex’s new streaming conversation APIs allow you to deliver sophisticated natural conversations across different communication channels. You can now easily configure pauses, interruptions and dialog constructs while building a bot with the Wait and Continue and Interrupt features. This simplifies the overall design and implementation of the conversation and makes it easier to manage. By using these features, the bot builder can quickly enhance the conversational capability of virtual agents or IVR systems.

In the new Wait and Continue feature, the ability to put the conversation into a waiting state is surfaced during slot elicitation. You can configure the slot to respond with a “Wait” message such as “Sure, let me know when you’re ready” when a caller asks for more time to retrieve information. You can also configure the bot to continue the conversation with a “Continue” response based on defined cues such as “I’m ready for the policy ID. Go ahead.” Optionally, you can set a “Still waiting” prompt to play messages like “I’m still here” or “Let me know if you need more time.” You can set how frequently these messages play and configure a maximum wait time for user input. If the caller doesn’t provide any input within the maximum wait duration, Amazon Lex resumes the dialog by prompting for the slot. The following screenshot shows the wait and continue configuration options on the Amazon Lex console.

The Interrupt feature enables callers to barge in while a prompt is being played by the bot. A caller may interrupt the bot and answer a question before the prompt is completed. This capability is surfaced at the prompt level and provided as a default setting. On the Amazon Lex console, navigate to the Advanced Settings and, under Slot prompts, enable the setting to allow users to interrupt the prompt.

After configuring these features, you can initiate a streaming interaction with the Lex bot by using the StartConversation API. The streaming capability enables you to capture user input, manage state transitions, handle events, and deliver a response as part of a conversation. The input can be one of three types: audio, text, or DTMF, whereas the response can be either audio or text. The dialog progresses by eliciting an intent, populating any slots, confirming the intent, and finally closing the intent. Streaming allows intents to be defined based on different conversation states, such as InProgress, Waiting, Confirmed, Denied, Fulfilled, ReadyForFulfillment, or Failed. For a detailed list of the different dialog and intent states, see the documentation for Amazon Lex intents.

The StartConversation API uses HTTP/2 for bidirectional communication between the client and Amazon Lex. Clients are expected to maintain a single long-running stream throughout the conversation and exchange required information with the server (and vice versa) via different events. To learn more about streaming for Amazon Lex bots, refer to the Lex streaming documentation.

Amazon Lex Streaming APIs in action

Let’s put all of this together in a bot. I first use the Amazon Lex console to configure a bot and then use the following sample telephony application to interact with the bot. For the purposes of this post, we use a simple bot to complete a bill payment with a credit card. Because callers often need time to retrieve their credit card details, we enable the wait and continue construct when eliciting the credit card number. We also set up a service-side interrupt to manage interruption.

The following is a sample conversation with our bot:

Caller:    I’d like to make a payment on my account
Bot:        Sure. Can we start off with your account ID?
Caller:    Yes the account ID is 1234
Bot:        Great. The balance due is $100. How much would you like to pay today?
Caller:    $100
Bot:        Ok. Can you share the 16-digit number on your credit card?
Caller:    Can you wait a minute?
Bot:        Sure

—————- a few seconds later —————-

Bot:        Let me know when ready. No rush.

—————- a few seconds later —————-

Caller:    Ok I am back
Bot:        Ok please go ahead
Caller:    The credit card number is 1234 5678 1234 5678
Bot:        Got it. Let me repeat that for you 1234 5678 1234 5678. Is that correct?
Caller:    Yes
Bot:        Ok. What’s the CVV code? You can find it in the top right on the signature panel on the back of the card.
Caller:    123
Bot:        Awesome. I will go ahead and process the $100 payment on your account using the card ending in 5678.
Caller:    Ok
Bot:        The payment went through. Your confirmation code is 1234.

The first step is to build an Amazon Lex bot with intents to process payment and get balance on the account. The ProcessPayment intent elicits the information required to process the payment, such as the payment amount, credit card number, CVV code, and expiration date. The GetBalanceAmount intent provides the balance on the account. The FallbackIntent is triggered when the user input can’t be processed by either of the two configured intents.

Deploying the sample bot

To create the sample bot, complete the following steps. This creates an Amazon Lex bot called PaymentsBot.

  1. On the Amazon Lex console, choose Create Bot.
  2. In the Bot configuration section, give the bot the name PaymentsBot.
  3. Specify AWS Identity and Access Management (IAM) permissions and COPPA flag.
  4. Choose Next.
  5. Under Languages, choose English (US).
  6. Choose Done.
  7. Add the ProcessPayment and GetBalanceAmount intents to your bot.
  8. For the ProcessPayment intent, add the following slots:
    1. PaymentAmount slot using the built-in AMAZON.Number slot type
    2. CreditCardNumber slot using the built-in AMAZON.AlphaNumeric slot type
    3. CVV slot using the built-in AMAZON.Number slot type
    4. ExpirationDate slot using the built-in AMAZON.Date slot type
  9. Configure slot elicitation prompts for each slot.
  10. Configure a closing response for the ProcessPayment intent.
  11. Similarly, add and configure slots and prompts for GetBalanceAmount intents.
  12. Choose Build to test your bot.

For more information about creating a bot, see the Lex V2 documentation.

Configuring Wait and Continue

  1. Choose the ProcessPayment intent and navigate to the CreditCardNumber slot.
  2. Choose Advanced Settings to open the slot editor.
  3. Enable Wait and Continue for the slot.
  4. Provide the Wait, Still Waiting, and Continue responses.
  5. Save the intent and choose Build.

The bot is now configured to support the Wait and Continue dialog construct. Now let’s configure the client code. You can use a telephony application to interact with your Lex bot. You can download the code for setting up a telephony IVR interface via Twilio at the GitHub project. The link contains information to set up a telephony interface as well as client application code to communicate between the telephony interface and Amazon Lex.

Now, let us review the client-side setup that uses the bot configuration we just enabled on the Amazon Lex console. The client application uses the Java SDK to capture payment information. At the beginning, you use the ConfigurationEvent to set up the conversation parameters. Then you start sending input events (AudioInputEvent, TextInputEvent, or DTMFInputEvent) to send user input to the bot, depending on the input type. When sending audio data, you need to send multiple AudioInputEvent events, with each event containing a slice of the data.

The service first responds with a TranscriptEvent containing the transcription, then sends an IntentResultEvent to surface the intent classification results. Subsequently, Amazon Lex sends a response event (TextResponseEvent or AudioResponseEvent) that contains the response to play back to the caller. If the caller requests the bot to hold the line, the intent is moved to the Waiting state and Amazon Lex sends another set of TranscriptEvent, IntentResultEvent, and response events. When the caller requests to continue the conversation, the intent is set to the InProgress state and the service sends another set of these events. While the dialog is in the Waiting state, Amazon Lex responds with an IntentResultEvent and a response event for every “Still waiting” message (there is no transcript event for server-initiated responses). If the caller interrupts the bot prompt at any time, Amazon Lex returns a PlaybackInterruptionEvent.

Let’s walk through the main elements of the client code:

  1. Create the Amazon Lex client:
    AwsCredentialsProvider awsCredentialsProvider = StaticCredentialsProvider
            .create(AwsBasicCredentials.create(accessKey, secretKey));
    
    LexRuntimeV2AsyncClient lexRuntimeServiceClient = LexRuntimeV2AsyncClient.builder()
            .region(region)
            .credentialsProvider(awsCredentialsProvider)
            .build();

  2. Create a handler to publish data to the server:
    EventsPublisher eventsPublisher = new EventsPublisher();
    

  3. Create a handler to process bot responses:
    public class BotResponseHandler implements StartConversationResponseHandler {
    
        private static final Logger LOG = Logger.getLogger(BotResponseHandler.class);
    
    
        @Override
        public void responseReceived(StartConversationResponse startConversationResponse) {
            LOG.info("successfully established the connection with server. request id:" + startConversationResponse.responseMetadata().requestId()); // would have 2XX, request id.
        }
    
        @Override
        public void onEventStream(SdkPublisher<StartConversationResponseEventStream> sdkPublisher) {
    
            sdkPublisher.subscribe(event -> {
                if (event instanceof PlaybackInterruptionEvent) {
                    handle((PlaybackInterruptionEvent) event);
                } else if (event instanceof TranscriptEvent) {
                    handle((TranscriptEvent) event);
                } else if (event instanceof IntentResultEvent) {
                    handle((IntentResultEvent) event);
                } else if (event instanceof TextResponseEvent) {
                    handle((TextResponseEvent) event);
                } else if (event instanceof AudioResponseEvent) {
                    handle((AudioResponseEvent) event);
                }
            });
        }
    
        @Override
        public void exceptionOccurred(Throwable throwable) {
            LOG.error(throwable);
            System.err.println("got an exception:" + throwable);
        }
    
        @Override
        public void complete() {
            LOG.info("on complete");
        }
    
        private void handle(PlaybackInterruptionEvent event) {
            LOG.info("Got a PlaybackInterruptionEvent: " + event);
    
            LOG.info("Done with a  PlaybackInterruptionEvent: " + event);
        }
    
        private void handle(TranscriptEvent event) {
            LOG.info("Got a TranscriptEvent: " + event);
        }
    
    
        private void handle(IntentResultEvent event) {
            LOG.info("Got an IntentResultEvent: " + event);
    
        }
    
        private void handle(TextResponseEvent event) {
            LOG.info("Got a TextResponseEvent: " + event);
    
        }
    
        private void handle(AudioResponseEvent event) {//synthesize speech
            LOG.info("Got an AudioResponseEvent: " + event);
        }
    
    }
    

  4. Initiate the connection with the bot:
    StartConversationRequest.Builder startConversationRequestBuilder = StartConversationRequest.builder()
            .botId(botId)
            .botAliasId(botAliasId)
            .localeId(localeId);
    
    // configure the conversation mode with bot (defaults to audio)
    startConversationRequestBuilder = startConversationRequestBuilder.conversationMode(ConversationMode.AUDIO);
    
    // assign a unique identifier for the conversation
    startConversationRequestBuilder = startConversationRequestBuilder.sessionId(sessionId);
    
    // build the initial request
    StartConversationRequest startConversationRequest = startConversationRequestBuilder.build();
    
    CompletableFuture<Void> conversation = lexRuntimeServiceClient.startConversation(
            startConversationRequest,
            eventsPublisher,
            botResponseHandler);

  5. Establish the configurable parameters via ConfigurationEvent:
    public void configureConversation() {
        String eventId = "ConfigurationEvent-" + eventIdGenerator.incrementAndGet();
    
        ConfigurationEvent configurationEvent = StartConversationRequestEventStream
                .configurationEventBuilder()
                .eventId(eventId)
                .clientTimestampMillis(System.currentTimeMillis())
                .responseContentType(RESPONSE_TYPE)
                .build();
    
        eventWriter.writeConfigurationEvent(configurationEvent);
        LOG.info("sending a ConfigurationEvent to server:" + configurationEvent);
    }

  6. Send audio data to the server:
    public void writeAudioEvent(ByteBuffer byteBuffer) {
        String eventId = "AudioInputEvent-" + eventIdGenerator.incrementAndGet();
    
        AudioInputEvent audioInputEvent = StartConversationRequestEventStream
                .audioInputEventBuilder()
                .eventId(eventId)
                .clientTimestampMillis(System.currentTimeMillis())
                .audioChunk(SdkBytes.fromByteBuffer(byteBuffer))
                .contentType(AUDIO_CONTENT_TYPE)
                .build();
    
        eventWriter.writeAudioInputEvent(audioInputEvent);
    }

  7. Manage interruptions on the client side:
    private void handle(PlaybackInterruptionEvent event) {
        LOG.info("Got a PlaybackInterruptionEvent: " + event);
    
        callOperator.pausePlayback();
    
        LOG.info("Done with a  PlaybackInterruptionEvent: " + event);
    }

  8. Disconnect the connection:
    public void disconnect() {
    
        String eventId = "DisconnectionEvent-" + eventIdGenerator.incrementAndGet();
    
        DisconnectionEvent disconnectionEvent = StartConversationRequestEventStream
                .disconnectionEventBuilder()
                .eventId(eventId)
                .clientTimestampMillis(System.currentTimeMillis())
                .build();
    
        eventWriter.writeDisconnectEvent(disconnectionEvent);
    
        LOG.info("sending a DisconnectionEvent to server:" + disconnectionEvent);
    }

You can now deploy the bot on your desktop to test it out.

Things to know

The following are a couple of important things to keep in mind when you’re using the Amazon Lex V2 Console and APIs:

  • Regions and languages – The Streaming APIs are available in all existing Regions and support all current languages.
  • Interoperability with Lex V1 console – Streaming APIs are only available in the Lex V2 console and APIs.
  • Integration with Amazon Connect – As of this writing, Lex V2 APIs are not supported on Amazon Connect. We plan to provide this integration as part of our near-term roadmap.
  • Pricing – Please see the details on the Lex pricing page.

Try it out

The Amazon Lex streaming API is available now, and you can start using it today. Give it a try: design a bot, launch it, and let us know what you think! To learn more, see the Lex streaming API documentation.


About the Authors

Esther Lee is a Product Manager for AWS Language AI Services. She is passionate about the intersection of technology and education. Out of the office, Esther enjoys long walks along the beach, dinners with friends and friendly rounds of Mahjong.

Swapandeep Singh is an engineer with Amazon Lex team. He works on making interactions with bot smoother and more human-like. Outside of work, he likes to travel and learn about different cultures.

Read More

Model serving in Java with AWS Elastic Beanstalk made easy with Deep Java Library

Deploying your machine learning (ML) models to run on a REST endpoint has never been easier. Using AWS Elastic Beanstalk and Amazon Elastic Compute Cloud (Amazon EC2) to host your endpoint and Deep Java Library (DJL) to load your deep learning models for inference makes the model deployment process extremely easy to set up. Setting up a model on Elastic Beanstalk is great if you require fast response times on all your inference calls. In this post, we cover deploying a model on Elastic Beanstalk using DJL and sending an image through a post call to get inference results on what the image contains.

About DJL

DJL is a deep learning framework written in Java that supports training and inference. DJL is built on top of modern deep learning engines (such as TensorFlow, PyTorch, and MXNet). You can easily use DJL to train your model or deploy your favorite models from a variety of engines without any additional conversion. It contains a powerful model zoo design that allows you to manage trained models and load them in a single line. The built-in model zoo currently supports more than 70 pre-trained and ready-to-use models from GluonCV, HuggingFace, TorchHub, and Keras.

Benefits

The primary benefit of hosting your model using Elastic Beanstalk and DJL is that it’s very easy to set up and provides consistent sub-second responses to a post request. With DJL, you don’t need to download any other libraries or worry about importing dependencies for your chosen deep learning framework. Using Elastic Beanstalk has two advantages:

  • No cold startup – Compared to an AWS Lambda solution, the EC2 instance is running all the time, so any call to your endpoint runs instantly and there isn’t any overhead when starting up new containers.
  • Scalable – Compared to a server-based solution, you can allow Elastic Beanstalk to scale horizontally.

Configurations

You need to have the following Gradle dependencies set up to run our PyTorch model:

plugins {
    id 'org.springframework.boot' version '2.3.0.RELEASE'
    id 'io.spring.dependency-management' version '1.0.9.RELEASE'
    id 'java'
}

dependencies {
    implementation platform("ai.djl:bom:0.8.0")
    implementation "ai.djl.pytorch:pytorch-model-zoo"
    implementation "ai.djl.pytorch:pytorch-native-auto"
    
    implementation "org.springframework.boot:spring-boot-starter"
    implementation "org.springframework.boot:spring-boot-starter-web"
}

The code

We first create a RESTful endpoint using Java Spring Boot and have it accept an image request. We decode the image and turn it into an Image object to pass into our model. The model is autowired by the Spring framework by calling the model() method. For simplicity, we create the predictor object on each request and pass our image to it for inference (you can optimize this by using an object pool). When inference is complete, we return the results to the requester. See the following code:

    @Autowired ZooModel<Image, Classifications> model;

    /**
     * This method is the REST endpoint where the user can post their images
     * to run inference against a model of their choice using DJL.
     *
     * @param input the request body containing the image
     * @return returns the top 3 probable items from the model output
     * @throws IOException if failed read HTTP request
     */
    @PostMapping(value = "/doodle")
    public String handleRequest(InputStream input) throws IOException {
        Image img = ImageFactory.getInstance().fromInputStream(input);
        try (Predictor<Image, Classifications> predictor = model.newPredictor()) {
            Classifications classifications = predictor.predict(img);
            return GSON.toJson(classifications.topK(3)) + System.lineSeparator();
        } catch (RuntimeException | TranslateException e) {
            logger.error("", e);
            Map<String, String> error = new ConcurrentHashMap<>();
            error.put("status", "Invoke failed: " + e.toString());
            return GSON.toJson(error) + System.lineSeparator();
        }
    }

    @Bean
    public ZooModel<Image, Classifications> model() throws ModelException, IOException {
        Translator<Image, Classifications> translator =
                ImageClassificationTranslator.builder()
                        .optFlag(Image.Flag.GRAYSCALE)
                        .setPipeline(new Pipeline(new ToTensor()))
                        .optApplySoftmax(true)
                        .build();
        Criteria<Image, Classifications> criteria = Criteria.builder()
                .setTypes(Image.class, Classifications.class)
                .optModelUrls(MODEL_URL)
                .optTranslator(translator)
                .build();
        return ModelZoo.loadModel(criteria);
    }
    

A full copy of the code is available on the GitHub repo.

Building your JAR file

Go into the beanstalk-model-serving directory and enter the following code:

cd beanstalk-model-serving
./gradlew build

This creates a JAR file at build/libs/beanstalk-model-serving-0.0.1-SNAPSHOT.jar.

Deploying to Elastic Beanstalk

To deploy this model, complete the following steps:

  1. On the Elastic Beanstalk console, create a new environment.
  2. For our use case, we name the environment DJL-Demo.
  3. For Platform, select Managed platform.
  4. For Platform settings, choose Java 8 and the appropriate branch and version.

  5. When selecting your application code, choose Choose file and upload the beanstalk-model-serving-0.0.1-SNAPSHOT.jar that was created in your build.
  6. Choose Create environment.

After Elastic Beanstalk creates the environment, we need to update the Software and Capacity boxes in our configuration, located on the Configuration overview page.

  1. For the Software configuration, we add an additional setting in the Environment Properties section with the name SERVER_PORT and value 5000.
  2. For the Capacity configuration, we change the instance type to t2.small to give our endpoint a little more compute and memory.
  3. Choose Apply configuration and wait for your endpoint to update.

Calling your endpoint

Now we can call our Elastic Beanstalk endpoint with our image of a smiley face.

See the following code:

curl -X POST -T smiley.png <endpoint>.elasticbeanstalk.com/inference

We get the following response:

[
  {
    "className": "smiley_face",
    "probability": 0.9874626994132996
  },
  {
    "className": "face",
    "probability": 0.004804758355021477
  },
  {
    "className": "mouth",
    "probability": 0.0015588520327582955
  }
]

The output predicts that a smiley face is the most probable item in our image. Success!

Limitations

If your model isn’t called often and there isn’t a requirement for fast inference, we recommend deploying your models on a serverless service such as Lambda. However, this adds overhead due to the cold startup nature of the service. Hosting your models through Elastic Beanstalk may be slightly more expensive because the EC2 instance runs 24 hours a day, so you pay for the service even when you’re not using it. However, if you expect a lot of inference requests a month, we have found the cost of model serving on Lambda is equal to the cost of Elastic Beanstalk using a t3.small when there are about 2.57 million inference requests to the endpoint.

Conclusion

In this post, we demonstrated how to start deploying and serving your deep learning models using Elastic Beanstalk and DJL. You just need to set up your endpoint with Java Spring, build your JAR file, upload that file to Elastic Beanstalk, update some configurations, and it’s deployed!

We also discussed some of the pros and cons of this deployment process, namely that it’s ideal if you need fast inference calls, but the cost is higher when compared to hosting it on a serverless endpoint with lower utilization.

This demo is available in full in the DJL demo GitHub repo. You can also find other examples of serving models with DJL across different JVM tools like Spark and AWS products like Lambda. Whatever your requirements, there is an option for you.

Follow our GitHub, demo repository, Slack channel, and Twitter for more documentation and examples of DJL!

About the Author

Frank Liu is a Software Engineer for AWS Deep Learning. He focuses on building innovative deep learning tools for software engineers and scientists. In his spare time, he enjoys hiking with friends and family.

Read More

Building your own brand detection and visibility using Amazon SageMaker Ground Truth and Amazon Rekognition Custom Labels – Part 1: End-to-end solution

According to Gartner, 58% of marketing leaders believe brand is a critical driver of buyer behavior for prospects, and 65% believe it’s a critical driver of buyer behavior for existing customers. Companies spend huge amounts of money on advertising to raise brand visibility and awareness. In fact, per Gartner, CMOs spend over 21% of their marketing budgets on advertising. Brands have to continuously maintain and improve their image, understand their presence on the web and in media content, and measure their marketing efforts. All of these are top priorities for every marketer. However, calculating the ROI of such advertising can be augmented with artificial intelligence (AI)-powered tools to deliver more accurate results.

Nowadays brand owners are naturally more than a little interested in finding out how effectively their outlays are working for them. However, it’s difficult to assess quantitatively just how good the brand exposure is in a given campaign or event. The current approach to computing such statistics has involved manually annotating broadcast material, which is time-consuming and expensive.

In this post, we show you how to mitigate these challenges by using Amazon Rekognition Custom Labels to train a custom computer vision model to detect brand logos without requiring machine learning (ML) expertise, and Amazon SageMaker Ground Truth to quickly build a training dataset from unlabeled video samples that can be used for training.

For this use case, we want to build a company brand detection and brand visibility application that allows you to submit a video sample for a given marketing event to evaluate how long your logo was displayed in the entire video and where in the video frame the logo was detected.

Solution overview

Amazon Rekognition Custom Labels is an automated ML (AutoML) feature that enables you to train custom ML models for image analysis without requiring ML expertise. Upload a small dataset of labeled images specific to your business use case, and Amazon Rekognition Custom Labels takes care of the heavy lifting of inspecting the data, selecting an ML algorithm, training a model, and calculating performance metrics.

No ML expertise is required to build your own model. The ease of use and intuitive setup of Amazon Rekognition Custom Labels allows any user to bring their own dataset for their use case, label them into separate folders, and launch the Amazon Rekognition Custom Labels training and validation.

The solution is built on a serverless architecture, which means you don’t have to provision your own servers. You pay for what you use. As demand grows or decreases, the compute power adapts accordingly.

This solution demonstrates an end-to-end workflow from preparing a training dataset using Ground Truth to training a model using Amazon Rekognition Custom Labels to identify and detect brand logos in video files. The solution has three main components: data labeling, model training, and running inference.

Data labeling

Three types of labels are available with Ground Truth:

  • Amazon Mechanical Turk – An option to engage a team of global, on-demand workers
  • Vendors – Third-party data labeling services listed on AWS Marketplace
  • Private labelers – Your own teams of private labelers to label the brand logo impressions frame by frame from the video

For this post, we use the private workforce option.

Training the model using Amazon Rekognition Custom Labels

After the labeling job is complete, we train our brand logo detection model using these labeled images. The solution in this post creates an Amazon Rekognition Custom Labels project and custom model. Amazon Rekognition Custom Labels automatically inspects the labeled data provided, selects the right ML algorithms and techniques, trains a model, and provides model performance metrics.

Running inference

When our model is trained, Amazon Rekognition Custom Labels provides an inference endpoint. We can then upload and analyze images or video files using the inference endpoint. The web user interface presents a bar chart that shows the distribution of detected custom labels per minute in the analyzed video.

Architecture overview

The solution uses a serverless architecture. The following architectural diagram illustrates an overview of the solution.

The solution is composed of two AWS Step Functions state machines:

  • Training – Manages extracting image frames from uploaded videos, creating and waiting on a Ground Truth labeling job, and creating and training an Amazon Rekognition Custom Labels model. You can then use the model to run brand logo detection analysis.
  • Analysis – Handles analyzing video or image files. It manages extracting image frames from the video files, starting the custom label model, running the inference, and shutting down the custom label model.

The solution provides a built-in mechanism to manage your custom label model runtime to ensure the model is shut down to keep your cost at a minimum.

The web application communicates with the backend state machines using an Amazon API Gateway RESTful endpoint. The endpoint is protected with a valid AWS Identity and Access Management (IAM) credential. The authentication to the web application is done through an Amazon Cognito user pool, where an authenticated user is issued a secure, time-bounded, temporary credential that can then be used to access “scoped” resources such as uploading video and image files to an Amazon Simple Storage Service (Amazon S3) bucket, invoking the API Gateway RESTful API endpoint to create a new training project, or running inference with the Amazon Rekognition Custom Labels model you trained and built. We use Amazon CloudFront to host the static contents residing on an S3 bucket (web), which is protected through an origin access identity (OAI).

Prerequisites

For this walkthrough, you should have an AWS account with appropriate IAM permissions to launch the provided AWS CloudFormation template.

Deploying the solution

You can deploy the solution using a CloudFormation template with AWS Lambda-backed custom resources. To deploy the solution, choose the CloudFormation template for your Region and follow the instructions:

The template is available in the following Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland).
  1. Sign in to the AWS Management Console with your IAM user name and password.
  2. On the Create stack page, choose Next.

  3. On the Specify stack details page, for FFmpeg Component, choose AGREE AND INSTALL.
  4. For Email, enter a valid email address to use for administrative purposes.
  5. For Price Class, choose Use Only U.S., Canada and Europe [PriceClass_100].
  6. Choose Next.

  7. On the Review stack page, under Capabilities, select both check boxes.
  8. Choose Create stack.

The stack creation takes roughly 25 minutes to complete; we use AWS CodeBuild to dynamically build the FFmpeg component, and the Amazon CloudFront distribution takes about 15 minutes to propagate to the edge locations.

After the stack is created, you should receive an invitation email from no-reply@verificationmail.com. The email contains a CloudFront URL link to access the demo portal, your login username, and a temporary password.

Choose the URL link to open the web portal with Mozilla Firefox or Google Chrome. After you enter your user name and temporary credentials, you’re prompted to create a new password.

Solution walkthrough

In this section, we walk you through the following high-level steps:

  1. Setting up a labeling team.
  2. Creating and training your model.
  3. Completing a video object detection labeling job.

Setting up a labeling team

The first time you log in to the web portal, you’re prompted to create your labeling workforce. The labeling workforce defines members of your labelers who are given labeling tasks to work on when you start a new training project. Choose Yes to configure your labeling team members.

You can also navigate to the Labeling Team tab to manage members from your labeling team at any time.

Follow the instructions to add an email and choose Confirm and add members. See the following animation to walk you through the steps.

The newly added member receives two email notifications. The first email contains the credential for the labeler to access the labeling portal. It’s important to note that the labeler is only given access to consume a labeling job created by Ground Truth. They don’t have access to any AWS resources other than working on the labeling job.

A second email “AWS Notification – Subscription Confirmation” contains instructions to confirm your subscription to an Amazon Simple Notification Service (Amazon SNS) topic so the labeler gets notified whenever a new labeling task is ready to consume.

Creating and training your first model

Let’s start to train our first model to identify logos for AWS and AWS DeepRacer. For this post, we use the video file AWS DeepRacer TV – Ep 1 Amsterdam.

  1. On the navigation menu, choose Training.
  2. Choose Option 1 to train a model to identify the logos with bounding boxes.
  3. For Project name, enter DemoProject.
  4. Choose Add label.
  5. Add the labels AWS and DeepRacer.
  6. Drag and drop a video file to the drop area.

You can drop multiple video or JPEG or PNG image files.

  7. Choose Create project.

The following GIF animation illustrates the process.

At this point, a labeling job is soon created by Ground Truth and the labeler receives an email notification when the job is ready to consume.

Completing the video object detection labeling job

Ground Truth recently launched a new set of pre-built templates that help label video files. For our post, we use the video object detection task template. For more information, see New – Label Videos with Amazon SageMaker Ground Truth.

The training workflow is currently paused, waiting for the labelers to work on the labeling job.

  1. After the labeler receives an email notification that a job is ready for them, they can log in to the labeling portal and start the job by choosing Start working.
  2. For Label category, choose a label.
  3. Draw bounding boxes around the AWS or AWS DeepRacer logos.

You can use the Predict next button to predict the bounding box in subsequent frames.

The following GIF animation demonstrates the labeling flow.

After the labeler completes the job, the backend training workflow resumes and collects the labeled images from the Ground Truth labeling job and starts the model training by creating an Amazon Rekognition Custom Labels project. The time to train a model varies from an hour to a few hours depending on the complexity of the objects (labels) and the size of your training dataset. Amazon Rekognition Custom Labels automatically splits the dataset 80/20 to create the training dataset and test dataset, respectively.

Running inference to detect brand logos

After the model is trained, let’s upload a video file and run predictions with the model we trained.

  1. On the navigation menu, choose Analysis.
  2. Choose Start new analysis.
  3. Specify the following:
    1. Project name – The project we created in Amazon Rekognition Custom Labels.
    2. Project version – The specific version of the trained model.
    3. Inference units – Your desired inference units, so you can dial the inference endpoint up or down. For example, if you require higher transactions per second (TPS), use more inference units.
  4. Drag and drop image files (JPEG, PNG) or video files (MP4, MOV) to the drop area.
  5. When the upload is complete, choose Done and wait for the analysis process to finish.

The analysis workflow starts and waits for the trained Amazon Rekognition Custom Labels model, runs the inference frame by frame, and shuts the model down when it’s no longer in use.

The following GIF animation demonstrates the analysis flow.

Viewing prediction results

The solution provides an overall statistic of the detected brands distributed across the video. The following screenshot shows that the AWS DeepRacer logo is detected about 25% overall and is detected approximately 60% in the 00:01:00–00:02:00 timespan. In contrast, the AWS logo is detected at a much lower rate. For this post, we only used one video to train the model, which had relatively few AWS logos. We can improve the accuracy by retraining the model with more video files.

You can expand the shot element view to see how the brand logos are detected frame by frame.

If you choose a frame to view, it shows the logo with a confidence score. The images that are grayed out are those in which no logo is detected. The following image shows that the AWS DeepRacer logo is detected at frame #10237 with a confidence score of 82%.

Another image shows that the AWS logo is detected with a confidence score of 60%.

Cleaning up

To delete the demo solution, simply delete the CloudFormation stack that you deployed earlier. However, deleting the CloudFormation stack doesn’t remove the following resources, which you must clean up manually to avoid potential recurring costs:

  • S3 buckets (web, source, and logs)
  • Amazon Rekognition Custom Labels project (trained model)

Conclusion

This post demonstrated how to use Amazon Rekognition Custom Labels to detect brand logos in images and videos. No ML expertise is required to build your own model. The ease of use and intuitive setup of Amazon Rekognition Custom Labels allows you to bring your own dataset, organize it into separate folders by label, and launch the Amazon Rekognition Custom Labels training and validation. We created the required infrastructure, demonstrated installing and running the UI, and discussed the security and cost of the infrastructure.

In the second post in this series, we deep dive into data labeling from a video file using Ground Truth to prepare the data for the training phase. We also explain the technical details of how we use Amazon Rekognition Custom Labels to train the model. In the third post in this series, we dive into the inference phase and show you the rich set of statistics for your brand visibility in a given video file.

For more information about the code sample in this post, see the GitHub repo.


About the Authors

Ken Shek is a Global Vertical Solutions Architect, Media and Entertainment, in the EMEA region. He helps media customers design, develop, and deploy workloads onto the AWS Cloud using AWS best practices. Ken graduated from the University of California, Berkeley, and received his master’s degree in Computer Science from Northwestern Polytechnical University.

Amit Mukherjee is a Sr. Partner Solutions Architect with a focus on data analytics and AI/ML. He works with AWS partners and customers to provide them with architectural guidance for building highly secure, scalable data analytics platforms and adopting machine learning at a large scale.

Sameer Goel is a Solutions Architect in Seattle who drives customers’ success by building prototypes for cutting-edge initiatives. Prior to joining AWS, Sameer graduated with a master’s degree from NEU Boston, with a concentration in data science. He enjoys building and experimenting with creative projects and applications.

Read More

Model serving made easier with Deep Java Library and AWS Lambda

Developing and deploying a deep learning model involves many steps: gathering and cleansing data, designing the model, fine-tuning model parameters, evaluating the results, and going through it again until a desirable result is achieved. Then comes the final step: deploying the model.

AWS Lambda is one of the most cost-effective services that lets you run code without provisioning or managing servers. It offers many advantages when working with serverless infrastructure. When you break down the logic of your deep learning service into a single Lambda function for a single request, things become much simpler and easier to scale. You can forget all about the resource handling needed for the parallel requests coming into your model. If your usage is sparse and tolerant of higher latency, Lambda is a great choice among the various solutions.

Now, let’s say you’ve decided to use Lambda to deploy your model. You go through the process, but the various setup steps required to run your models can become confusing or complex. Namely, you face issues with the Lambda size limits and with managing the model dependencies within it.

Deep Java Library (DJL) is a deep learning framework designed to make your life easier. DJL uses various deep learning backends (such as Apache MXNet, PyTorch, and TensorFlow) for your use case and is easy to set up and integrate within your Java application! Thanks to its excellent dependency management design, DJL makes it extremely simple to create a project that you can deploy on Lambda. DJL helps alleviate some of the problems we mentioned by downloading the prepackaged framework dependencies so you don’t have to package them yourself, and loads your models from a specified location such as Amazon Simple Storage Service (Amazon S3) so you don’t need to figure out how to push your models to Lambda.

This post covers how to get your models running on Lambda with DJL in 5 minutes.

About DJL

Deep Java Library (DJL) is a deep learning framework written in Java, supporting both training and inference. DJL is built on top of modern deep learning engines (such as TensorFlow, PyTorch, and MXNet). You can easily use DJL to train your model or deploy your favorite models from a variety of engines without any additional conversion. It contains a powerful model zoo design that allows you to manage trained models and load them in a single line. The built-in model zoo currently supports more than 70 pre-trained and ready-to-use models from GluonCV, HuggingFace, TorchHub, and Keras.

Prerequisites

You need the following items to proceed:

In this post, we follow along with the steps from the following GitHub repo.

Building and deploying on AWS

First, we need to ensure we’re in the correct code directory. Then we create an S3 bucket for storage, an AWS CloudFormation stack, and the Lambda function with the following code:

cd lambda-model-serving
./gradlew deploy

This creates the following:

  • An S3 bucket with the name stored in bucket-name.txt
  • A CloudFormation stack named djl-lambda and a template file named out.yml
  • A Lambda function named DJL-Lambda

Now we have our model deployed on a serverless API. The next section invokes the Lambda function.

Invoking the Lambda function

We can invoke the Lambda function with the following code:

aws lambda invoke --function-name DJL-Lambda --payload '{"inputImageUrl":"https://djl-ai.s3.amazonaws.com/resources/images/kitten.jpg"}' build/output.json

The output is stored into build/output.json:

cat build/output.json
[
  {
    "className": "n02123045 tabby, tabby cat",
    "probability": 0.48384541273117065
  },
  {
    "className": "n02123159 tiger cat",
    "probability": 0.20599405467510223
  },
  {
    "className": "n02124075 Egyptian cat",
    "probability": 0.18810519576072693
  },
  {
    "className": "n02123394 Persian cat",
    "probability": 0.06411759555339813
  },
  {
    "className": "n02127052 lynx, catamount",
    "probability": 0.01021555159240961
  }
]

Cleaning up

Use the cleanup scripts to clean up the resources and tear down the services created in your AWS account:

./cleanup.sh

Cost analysis

What happens if we try to set this up on an Amazon Elastic Compute Cloud (Amazon EC2) instance and compare the cost to Lambda? An EC2 instance needs to run continuously to receive requests at any time, which means you’re paying for the additional time when it’s not in use. If we use a cheap t3.micro instance with 2 vCPUs and 1 GB of memory (knowing that some of this memory is used by the operating system and for other tasks), the cost comes out to $7.48 a month, or about 1.6 million requests to Lambda. When using a more powerful instance such as a t3.small with 2 vCPUs and 4 GB of memory, the cost comes out to $29.95 a month, or about 2.57 million requests to Lambda.
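
To make that comparison concrete, here’s the kind of back-of-the-envelope math behind those numbers. The memory size and per-request duration are our assumptions (not measured values), and the prices are the 2021 us-east-1 Lambda rates, so check current pricing before relying on this:

# Break-even sketch: how many Lambda requests equal one month of t3.micro.
# Assumptions: 1 GB memory, ~270 ms per request, $0.20 per 1M requests,
# $0.0000166667 per GB-second (2021 us-east-1 prices).
REQUEST_PRICE = 0.20 / 1_000_000            # USD per request
GB_SECOND_PRICE = 0.0000166667              # USD per GB-second
cost_per_request = REQUEST_PRICE + 1.0 * 0.27 * GB_SECOND_PRICE

t3_micro_monthly = 7.48                     # always-on t3.micro, USD per month
print(t3_micro_monthly / cost_per_request)  # ~1.6 million requests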

There are pros and cons to using either Lambda or Amazon EC2 for hosting, and it comes down to requirements and cost. Lambda is the ideal choice if your usage is sparse and your requirements tolerate higher latency: the first call after a period of inactivity is slow because of Lambda’s cold start (about 5 seconds), but at low request volumes Lambda is still cheaper than Amazon EC2. Subsequent requests are faster, but if the function sits idle for 30–45 minutes, it goes back to cold-start mode.

Amazon EC2, on the other hand, is better if you require low latency calls all the time or are making more requests than what it costs in Lambda (shown in the following chart).

Minimal package size

DJL automatically downloads the deep learning framework at runtime, allowing for a smaller package size. We use the following dependency:

runtimeOnly "ai.djl.mxnet:mxnet-native-auto:1.7.0-backport"

This auto-detection dependency results in a .zip file less than 3 MB. The downloaded MXNet native library file is stored in a /tmp folder that takes up about 155 MB of space. We can further reduce this to 50 MB if we use a custom build of MXNet without MKL support.

The MXNet native library is stored in an S3 bucket, and the framework download latency is negligible when compared to the Lambda startup time.

Model loading

The DJL model zoo offers many easy options to deploy models:

  • Bundling the model in a .zip file
  • Loading models from a custom model zoo
  • Loading models from an S3 bucket (supports Amazon SageMaker trained model .tar.gz format)

We use the MXNet model zoo to load the model. By default, because we didn’t specify any model, it uses the resnet-18 model, but you can change this by passing in an artifactId parameter in the request:

aws lambda invoke --function-name DJL-Lambda --payload '{"artifactId": "squeezenet", "inputImageUrl":"https://djl-ai.s3.amazonaws.com/resources/images/kitten.jpg"}' build/output.json

Limitations

There are certain limitations when using serverless APIs, specifically in AWS Lambda:

  • GPU instances are not yet available, as of this writing
  • Lambda has a 512 MB limit for the /tmp folder
  • If the endpoint isn’t frequently used, cold startup can be slow

As mentioned earlier, this way of hosting your models on Lambda is ideal when requests are sparse and your requirements allow for higher latency calls due to the Lambda cold startup. If you require low latency for all requests, we recommend using AWS Elastic Beanstalk with EC2 instances.

Conclusion

In this post, we demonstrated how to easily launch serverless APIs using DJL. To do so, we just need to run the gradle deployment command, which creates the S3 bucket, CloudFormation stack, and Lambda function. This creates an endpoint to accept parameters to run your own deep learning models.

Deploying your models with DJL on Lambda is a great and cost-effective method when usage is sparse and your application can tolerate the higher latency of cold starts. Using DJL allows your team to focus more on designing, building, and improving your ML models, while keeping costs low and keeping the deployment process easy and scalable.

For more information on DJL and its other features, see Deep Java Library.

Follow our GitHub repo, demo repository, Slack channel, and Twitter for more documentation and examples of DJL!


About the Author

Frank Liu is a Software Engineer for AWS Deep Learning. He focuses on building innovative deep learning tools for software engineers and scientists. In his spare time, he enjoys hiking with friends and family.

Read More

Multi-account model deployment with Amazon SageMaker Pipelines

Amazon SageMaker Pipelines is the first purpose-built CI/CD service for machine learning (ML). It helps you build, automate, manage, and scale end-to-end ML workflows and apply DevOps best practices of CI/CD to ML (also known as MLOps).

Creating multiple accounts to organize all the resources of your organization is a good DevOps practice. A multi-account strategy is important not only to improve governance but also to increase security and control of the resources that support your organization’s business. This strategy allows many different teams inside your organization to experiment, innovate, and integrate faster, while keeping the production environment safe and available for your customers.

Pipelines makes it easy to apply the same strategy to deploying ML models. Imagine a use case in which you have three different AWS accounts, one for each environment: data science, staging, and production. The data scientist has the freedom to run experiments and train and optimize different models any time in their own account. When a model is good enough to be deployed in production, the data scientist just needs to flip the model approval status to Approved. After that, an automated process deploys the model on the staging account. Here you can automate testing of the model with unit tests or integration tests or test the model manually. After a manual or automated approval, the model is deployed to the production account, which is a more tightly controlled environment used to serve inferences on real-world data. With Pipelines, you can implement a ready-to-use multi-account environment.

In this post, you learn how to use Pipelines to implement your own multi-account ML pipeline. First, you learn how to configure your environment and prepare it to use a predefined template as a SageMaker project for training and deploying a model in two different accounts: staging and production. Then, you see in detail how this custom template was created and how to create and customize templates for your own SageMaker projects.

Preparing the environment

In this section, you configure three different AWS accounts and use SageMaker Studio to create a project that integrates a CI/CD pipeline with the ML pipeline created by a data scientist. The following diagram shows the reference architecture of the environment that is created by the SageMaker custom project and how AWS Organizations integrates the different accounts.

The diagram contains three different accounts, managed by Organizations. Also, three different user roles (which may be the same person) operate this environment:

  • ML engineer – Responsible for provisioning the SageMaker Studio project that creates the CI/CD pipeline, model registry, and other resources
  • Data scientist – Responsible for creating the ML pipeline that ends with a trained model registered to the model group (also referred to as model package group)
  • Approver – Responsible for testing the model deployed to the staging account and approving the production deployment

It’s possible to run a similar solution without Organizations, if you prefer (although not recommended). But you need to prepare the permissions and the trust relationship between your accounts manually and modify the template to remove the Organizations dependency. Also, if you’re an enterprise with multiple AWS accounts and teams, it’s highly recommended that you use AWS Control Tower for provisioning the accounts and Organizations. AWS Control Tower provides the easiest way to set up and govern a new and secure multi-account AWS environment. For this post, we only discuss implementing the solution with Organizations.

But before you move on, you need to complete the following steps, which are detailed in the next sections:

  1. Create an AWS account to be used by the data scientists (data science account).
  2. Create and configure a SageMaker Studio domain in the data science account.
  3. Create two additional accounts for production and staging.
  4. Create an organizational structure using Organizations, then invite and integrate the additional accounts.
  5. Configure the permissions required to run the pipelines and deploy models on external accounts.
  6. Import the SageMaker project template for deploying models in multiple accounts and make it available for SageMaker Studio.

Configuring SageMaker Studio in your account

Pipelines provides built-in support for MLOps templates to make it easy for you to use CI/CD for your ML projects. These MLOps templates are defined as AWS CloudFormation templates and published via AWS Service Catalog. These are made available to data scientists via SageMaker Studio, an IDE for ML. To configure Studio in your account, complete the following steps:

  1. Prepare your SageMaker Studio domain.
  2. Enable SageMaker project templates and SageMaker JumpStart for this account and Studio users.

If you have an existing domain, you can simply edit the settings for the domain or individual users to enable this option. Enabling this option creates two different AWS Identity and Access Management (IAM) roles in your AWS account:

  • AmazonSageMakerServiceCatalogProductsLaunchRole – Used by SageMaker to run the project templates and create the required infrastructure resources
  • AmazonSageMakerServiceCatalogProductsUseRole – Used by the CI/CD pipeline to run a job and deploy the models on the target accounts

If you created your SageMaker Studio domain before re:Invent 2020, it’s recommended that you refresh your environment by saving all the work in progress. On the File menu, choose Shutdown, and confirm your choice.

  3. Create and prepare two other AWS accounts for staging and production, if you don’t have them yet.

Configuring Organizations

You need to add the data science account and the two additional accounts to a structure in Organizations. Organizations helps you to centrally manage and govern your environment as you grow and scale your AWS resources. It’s free and benefits your governance strategy.

Each account must be added to a different organizational unit (OU).

  1. On the Organizations console, create a structure of OUs like the following:
  • Root
    • multi-account-deployment (OU)
      • 111111111111 (data science account—SageMaker Studio)
      • production (OU)
        • 222222222222 (AWS account)
      • staging (OU)
        • 333333333333 (AWS account)

After configuring the organization, each account owner receives an invite. The owners need to accept the invites; otherwise, the accounts aren’t included in the organization.

  2. Now you need to enable trusted access with AWS Organizations (“Enable all features” and “Enable trusted access in the StackSets”).

This process allows your data science account to provision resources in the target accounts. If you don’t do that, the deployment process fails. Also, this feature set is the preferred way to work with Organizations, and it includes consolidated billing features.

  3. Next, on the Organizations console, choose Organize accounts.
  4. Choose staging.
  5. Note down the OU ID.
  6. Repeat this process for the production OU.

Configuring the permissions

You need to create a SageMaker execution role in each additional account. These roles are assumed by AmazonSageMakerServiceCatalogProductsUseRole in the data science account to deploy the endpoints in the target accounts and test them.
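
The CloudFormation template you run in the following steps creates these roles for you. Conceptually, the cross-account trust it establishes looks roughly like this Boto3 sketch (the role name is illustrative, and the template scopes permissions more carefully than the broad managed policy used here):

import boto3
import json

iam = boto3.client("iam")  # run in the staging or production account

# Allow the data science account's Service Catalog role to assume this role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::<<data_science_account_id>>:role/"
                             "service-role/AmazonSageMakerServiceCatalogProductsUseRole"},
        "Action": "sts:AssumeRole",
    }],
}
iam.create_role(
    RoleName="sagemaker-role-staging",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# The role also needs permissions to create SageMaker endpoints.
iam.attach_role_policy(
    RoleName="sagemaker-role-staging",
    PolicyArn="arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
)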

  1. Sign in to the AWS Management Console with the staging account.
  2. Run the following CloudFormation template.

This template creates a new SageMaker role for you.

  3. Provide the following parameters:
    1. SageMakerRoleSuffix – A short string (maximum 10 lowercase alphanumeric characters, with no spaces) that is added to the role name after the following prefix: sagemaker-role-. The final role name is sagemaker-role-<<sagemaker_role_suffix>>.
    2. PipelineExecutionRoleArn – The ARN of the role from the data science account that assumes the SageMaker role you’re creating. To find the ARN, sign in to the console with the data science account. On the IAM console, choose Roles and search for AmazonSageMakerServiceCatalogProductsUseRole. Choose this role and copy the ARN (arn:aws:iam::<<data_science_account_id>>:role/service-role/AmazonSageMakerServiceCatalogProductsUseRole).
  4. After creating this role in the staging account, repeat this process for the production account.

In the data science account, you now configure the policy of the Amazon Simple Storage Service (Amazon S3) bucket used to store the trained model. For this post, we use the default SageMaker bucket of the current Region. It has the following name format: sagemaker-<<region>>-<<aws_account_id>>.

  5. On the Amazon S3 console, search for this bucket, providing the Region you’re using and the ID of the data science account.

If you don’t find it, create a new bucket following this name format.
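
If you prefer to resolve (and create) this bucket programmatically, the SageMaker Python SDK provides a helper:

import sagemaker

# Returns sagemaker-<<region>>-<<aws_account_id>>, creating the bucket
# if it doesn't exist yet.
print(sagemaker.Session().default_bucket())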

  6. On the Permissions tab, add the following policy:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": [
                        "arn:aws:iam::<<staging_account_id>>:root",
                        "arn:aws:iam::<<production_account_id>>:root"
                    ]
                },
                "Action": [
                    "s3:GetObject",
                    "s3:ListBucket"
                ],
                "Resource": [
                    "arn:aws:s3:::sagemaker-<<region>>-<<aws_account_id>>",
                    "arn:aws:s3:::sagemaker-<<region>>-<<aws_account_id>>/*"
                ]
            }
        ]
    }

  7. Save your settings.

The target accounts now have permission to read the trained model during deployment.

The next step is to add new permissions to the roles AmazonSageMakerServiceCatalogProductsUseRole and AmazonSageMakerServiceCatalogProductsLaunchRole.

  8. In the data science account, on the IAM console, choose Roles.
  9. Find the AmazonSageMakerServiceCatalogProductsUseRole role and choose it.
  10. Add a new policy and enter the following JSON code.
  11. Save your changes.
  12. Now, find the AmazonSageMakerServiceCatalogProductsLaunchRole role, choose it, and add a new policy with the following content:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::aws-ml-blog/artifacts/sagemaker-pipeline-blog-resources/*"
            }
        ]
    }

  13. Save your changes.

That’s it! Your environment is almost ready. You only need one more step and you can start training and deploying models in different accounts.

Importing the custom SageMaker Studio project template

In this step, you import your custom project template.

  1. Sign in to the console with the data science account.
  2. On the AWS Service Catalog console, under Administration, choose Portfolios.
  3. Choose Create a new portfolio.
  4. Name the portfolio SageMaker Organization Templates.
  5. Download the following template to your computer.
  6. Choose the new portfolio.
  7. Choose Upload a new product.
  8. For Product name, enter Multi Account Deployment.
  9. For Description, enter Multi account deployment project.
  10. For Owner, enter your name.
  11. Under Version details, for Method, choose Use a template file.
  12. Choose Upload a template.
  13. Upload the template you downloaded.
  14. For Version title, choose 1.0.

The remaining parameters are optional.

  15. Choose Review.
  16. Review your settings and choose Create product.
  17. Choose Refresh to list the new product.
  18. Choose the product you just created.
  19. On the Tags tab, add the following tag to the product:
    1. Key – sagemaker:studio-visibility
    2. Value – True

Back in the portfolio details, you see something similar to the following screenshot (with different IDs).

  20. On the Constraints tab, choose Create constraint.
  21. For Product, choose Multi Account Deployment (the product you just created).
  22. For Constraint type, choose Launch.
  23. Under Launch Constraint, for Method, choose Select IAM role.
  24. Choose AmazonSageMakerServiceCatalogProductsLaunchRole.
  25. Choose Create.
  26. On the Groups, roles, and users tab, choose Add groups, roles, users.
  27. On the Roles tab, select the role you used when configuring your SageMaker Studio domain.
  28. Choose Add access.

If you don’t remember which role you selected, in your data science account, go to the SageMaker console and choose Amazon SageMaker Studio. In the Studio Summary section, locate the Execution role attribute; this is the role name to look for in the previous step.

You’re done! Now it’s time to create a project using this template.

Creating your project

In the previous sections, you prepared the multi-account environment. The next step is to create a project using your new template.

  1. Sign in to the console with the data science account.
  2. On the SageMaker console, open SageMaker Studio with your user.
  3. Choose the Components and registries icon on the left sidebar.
  4. On the drop-down menu, choose Projects.
  5. Choose Create project.

On the Create project page, SageMaker templates is chosen by default. This option lists the built-in templates. However, you want to use the template you prepared for the multi-account deployment.

  6. Choose Organization templates.
  7. Choose Multi Account Deployment.
  8. Choose Select project template.

If you can’t see the template, make sure you completed all the steps correctly in the previous section.

  9. In the Project details section, for Name, enter iris-multi-01.

The project name must have 15 characters or fewer.

  10. In the Project template parameters, use the names of the roles you created in each target account (staging and production) and provide the following properties:
    1. SageMakerExecutionRoleStagingName
    2. SageMakerExecutionRoleProdName
  11. Retrieve the OU IDs you created earlier for the staging and production OUs and provide the following properties:
    1. OrganizationalUnitStagingId
    2. OrganizationalUnitProdId
  12. Choose Create project.

Provisioning all the resources takes a few minutes, after which the project is listed in the Projects section. When you choose the project, a tab opens with the project’s metadata. The Model groups tab shows a model group with the same name as your project. It was also created during the project provisioning.

The environment is now ready for the data scientist to start training the model.

Training a model

Now that your project is ready, it’s time to train a model.

  1. Download the example notebook to use for this walkthrough.
  2. Choose the Folder icon to change the work area to file management.
  3. Choose the Create folder icon.
  4. Enter a name for the folder.
  5. Choose the folder name.
  6. Choose the Upload file icon.
  7. Choose the Jupyter notebook you downloaded and upload it to the new directory.
  8. Choose the notebook to open a new tab.

You’re prompted to choose a kernel.

  9. Choose Python3 (Data Science).
  10. Choose Select.

  11. In the second cell of the notebook, replace the project_name variable with the name you gave your project (for this post, iris-multi-01).

You can now run the Jupyter notebook. This notebook creates a very simple pipeline with only two steps: train and register model. It uses the iris dataset and the XGBoost built-in container as the algorithm.

  12. Run the whole notebook.

The process takes some time after you run the cell containing the following code:

start_response = pipeline.start(parameters={
    "TrainingInstanceCount": "1"
})

This starts the training job, which takes approximately 3 minutes to complete. After the training is finished, the next cell of the Jupyter notebook gets the latest version of the model in the model registry and marks it as Approved. Alternatively, you can approve a model from the SageMaker Studio UI. On the Model groups tab, choose the model group and desired version. Choose Update status and Approve before saving.
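
The following Boto3 sketch is roughly equivalent to what that notebook cell does (we assume here that the model group carries the project’s name, as shown on the Model groups tab):

import boto3

sm = boto3.client("sagemaker")

# Get the latest model package registered under the project's model group.
latest = sm.list_model_packages(
    ModelPackageGroupName="iris-multi-01",
    SortBy="CreationTime",
    SortOrder="Descending",
    MaxResults=1,
)["ModelPackageSummaryList"][0]

# Flip the approval status; this is what triggers the deployment pipeline.
sm.update_model_package(
    ModelPackageArn=latest["ModelPackageArn"],
    ModelApprovalStatus="Approved",
)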

This is the end of the data scientist’s job but the beginning of running the CI/CD pipeline.

Amazon EventBridge monitors the model registry. The listener starts a new deployment job with the provisioned AWS CodePipeline workflow (created when you launched the SageMaker Studio project).

  1. On the CodePipeline console, choose the pipeline starting with the prefix sagemaker-, followed by the name of your project.

Shortly after you approve your model, the deployment pipeline starts running. Wait for the pipeline to reach the DeployStaging stage, which can take approximately 10 minutes to complete. After deploying the first endpoint in the staging account, the pipeline tests it and then moves to the next step, ApproveDeployment, where it waits for manual approval.

  2. Choose Review.
  3. Enter an approval reason in the text box.
  4. Choose Approve.

The model is now deployed in the production account.

You can also monitor the pipeline on the AWS CloudFormation console, to see the stacks and stack sets the pipeline creates to deploy endpoints in the target accounts. To see the deployed endpoints for each account, sign in to the SageMaker console as either the staging account or production account and choose Endpoints on the navigation pane.

Cleaning up

To clean up all the resources you provisioned in this example, complete the following steps:

  1. Sign in to the console with your main account.
  2. On the AWS CloudFormation console, choose StackSets and delete the following items (endpoints):
    1. Prod: sagemaker-<<sagemaker-project-name>>-<<project-id>>-deploy-prod
    2. Staging: sagemaker-<<sagemaker-project-name>>-<<project-id>>-deploy-staging
  3. In your laptop or workstation terminal, use the AWS Command Line Interface (AWS CLI) and enter the following code to delete your project:
    aws sagemaker delete-project --project-name iris-multi-01

Make sure you’re using the latest version of the AWS CLI.

Building and customizing a template for your own SageMaker project

SageMaker projects and SageMaker MLOps project templates are powerful features that you can use to automatically create and configure the whole infrastructure required to train, optimize, evaluate, and deploy ML models. A SageMaker project is an AWS Service Catalog provisioned product that enables you to easily create an end-to-end ML solution. For more information, see the AWS Service Catalog Administrator Guide.

A product is a CloudFormation template managed by AWS Service Catalog. For more information about templates and their requirements, see AWS CloudFormation template formats.

ML engineers can design multiple environments and express all the details of this setup as a CloudFormation template, using the concept of infrastructure as code (IaC). You can also integrate these different environments and tasks using a CI/CD pipeline. SageMaker projects provide an easy, secure, and straightforward way of wrapping the infrastructure complexity in the format of a simple project, which can be launched many times by other ML engineers and data scientists.

The following diagram illustrates the main steps you need to complete in order to create and publish your custom SageMaker project template.

We described these steps in more detail in the sections Importing the custom SageMaker Studio Project template and Creating your project.

As an ML engineer, you can design and create a new CloudFormation template for the project, prepare an AWS Service Catalog portfolio, and add a new product to it.

Both data scientists and ML engineers can use SageMaker Studio to create a new project with the custom template. SageMaker invokes AWS Service Catalog and starts provisioning the infrastructure described in the CloudFormation template.

As a data scientist, you can now start training the model. After you register it in the model registry, the CI/CD pipeline runs automatically and deploys the model on the target accounts.

If you look at the CloudFormation template from this post in a text editor, you can see that it implements the architecture we outline in this post.

The following code is a snippet of the template:

Description: Toolchain template which provides the resources needed to represent infrastructure as code.
  This template specifically creates a CI/CD pipeline to deploy a given inference image and pretrained Model to two stages in CD -- staging and production.
Parameters:
  SageMakerProjectName:
    Type: String
  SageMakerProjectId:
    Type: String
…
<<other parameters>>
…
Resources:
  MlOpsArtifactsBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain
    Properties:
      BucketName: …
…
  ModelDeployCodeCommitRepository:
    Type: AWS::CodeCommit::Repository
    Properties:
      RepositoryName: …
      RepositoryDescription: …
      Code:
        S3:
          Bucket: …
          Key: …
…
  ModelDeployBuildProject:
    Type: AWS::CodeBuild::Project
…
  ModelDeployPipeline:
    Type: AWS::CodePipeline::Pipeline
…

The template has two key sections: Parameters (input parameters of the template) and Resources. SageMaker project templates require that you add two input parameters to your template: SageMakerProjectName and SageMakerProjectId. These parameters are used internally by SageMaker Studio. You can add other parameters if needed.

In the Resources section of the snippet, you can see that it creates the following:

  • A new S3 bucket used by the CI/CD pipeline to store the intermediary artifacts passed from one stage to another.
  • An AWS CodeCommit repository to store the artifacts used during the deployment and testing stages.
  • An AWS CodeBuild project to get the artifacts, and validate and configure them for the project. In the multi-account template, this project also creates a new model registry, used by the CI/CD pipeline to deploy new models.
  • A CodePipeline workflow that orchestrates all the steps of the CI/CD pipelines.

Each time you register a new model to the model registry or push a new artifact to the CodeCommit repo, this CodePipeline workflow starts. These events are captured by an EventBridge rule, provisioned by the same template; a sketch of such a rule follows the list below. The CI/CD pipeline contains the following stages:

  • Source – Reads the artifacts from the CodeCommit repository and shares with the other steps.
  • Build – Runs the CodeBuild project to do the following:
    • Verify if a model registry is already created, and create one if needed.
    • Prepare a new CloudFormation template that is used by the next two deployment stages.
  • DeployStaging – Contains the following components:
    • DeployResourcesStaging – Gets the CloudFormation template prepared in the Build step and deploys a new stack. This stack deploys a new SageMaker endpoint in the target account.
    • TestStaging – Invokes a second CodeBuild project that runs a custom Python script that tests the deployed endpoint.
    • ApproveDeployment – A manual approval step. If approved, it moves to the next stage to deploy an endpoint in production, or ends the workflow if not approved.
  • DeployProd – Similar to DeployStaging, it uses the same CloudFormation template but with different input parameters. It deploys a new SageMaker endpoint in the production account. 
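
The following Boto3 sketch illustrates the kind of EventBridge rule the template provisions to start the pipeline when a model package changes state; the rule name, pipeline ARN, and role ARN are placeholders:

import boto3
import json

events = boto3.client("events")

# Match model package state changes for this project's model group.
events.put_rule(
    Name="sagemaker-model-package-state-change",
    EventPattern=json.dumps({
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
        "detail": {"ModelPackageGroupName": ["iris-multi-01"]},
    }),
)

# Point the rule at the project's CodePipeline workflow.
events.put_targets(
    Rule="sagemaker-model-package-state-change",
    Targets=[{
        "Id": "model-deploy-pipeline",
        "Arn": "arn:aws:codepipeline:<<region>>:<<account_id>>:<<pipeline_name>>",
        "RoleArn": "arn:aws:iam::<<account_id>>:role/<<events_invoke_pipeline_role>>",
    }],
)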

You can start a new training process and register your model to the model registry associated with the SageMaker project. Use the Jupyter notebook provided in this post and customize your own ML pipeline to prepare your dataset and train, optimize, and test your models before deploying them. For more information about these features, see Automate MLOps with SageMaker Projects. For more Pipelines examples, see the GitHub repo.

Conclusions and next steps

In this post, you saw how to prepare your own environment to train and deploy ML models in multiple AWS accounts by using SageMaker Pipelines.

With SageMaker projects, the governance and security of your environment can be significantly improved if you start managing your ML projects as a library of SageMaker project templates.

As a next step, try to modify the SageMaker project template and customize it to address your organization’s needs. Add as many steps as you want and keep in mind that you can capture the CI/CD events and notify users or call other services to build comprehensive solutions.


About the Author

Samir Araújo is an AI/ML Solutions Architect at AWS. He helps customers solve their business challenges by creating AI/ML solutions using the AWS platform. He has been working on several AI/ML projects related to computer vision, natural language processing, forecasting, ML at the edge, and more. He likes playing with hardware and automation projects in his free time, and he has a particular interest in robotics.

Read More

Redacting PII from application log output with Amazon Comprehend

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning (ML) to find insights and relationships in text. The service can extract people, places, sentiments, and topics in unstructured data. You can now use Amazon Comprehend ML capabilities to detect and redact personally identifiable information (PII) in application logs, customer emails, support tickets, and more. No ML experience required. Redacting PII entities helps you protect privacy and comply with local laws and regulations.

Use case: Applications printing PII data in log output

Some applications print PII data in their log output inadvertently. In some cases, this may be due to developers forgetting to remove debug statements before deploying the application in production, and in other cases it may be due to legacy applications that are handed down and are difficult to update. PII can also get printed in stack traces. It’s generally a mistake to have PII present in such logs. Correlation IDs and primary keys are better identifiers than PII when debugging applications.

PII in application logs can quickly propagate to downstream systems, compounding security concerns. For example, it may get submitted to search and analytics systems where it’s searchable and viewable by everyone. It may also be stored in object storage such as Amazon Simple Storage Service (Amazon S3) for analytics purposes. With the PII detection API of Amazon Comprehend, you can remove PII from application log output before such a log statement even gets printed.

In this post, I take the use case of a Java application that is generating log output with PII. The initial log output goes through filter-like processing that redacts PII before the log statement is output by the application. You can take a similar approach for other programming languages.

The application can be repackaged by changing its log format file, such as log4j.xml, and adding one Java class from this sample project, or adding this Java class as a dependency in the form of a .jar file.

The sample application is available in the following GitHub repo.

PII entity types

The following are some of the entity types Amazon Comprehend detects.

  • EMAIL – An email address, such as marymajor@email.com.
  • NAME – An individual’s name. This entity type does not include titles, such as Mr., Mrs., Miss, or Dr. Amazon Comprehend does not apply this entity type to names that are part of organizations or addresses. For example, Amazon Comprehend recognizes the “John Doe Organization” as an organization, and it recognizes “Jane Doe Street” as an address.
  • PHONE – A phone number. This entity type also includes fax and pager numbers.
  • SSN – A Social Security Number (SSN) is a 9-digit number that is issued to US citizens, permanent residents, and temporary working residents. Amazon Comprehend also recognizes Social Security Numbers when only the last 4 digits are present.

For the full list, see Detect Personally Identifiable Information (PII).

The API response from Amazon Comprehend includes the entity type, its begin offset, end offset, and a confidence score. For this post, we use all of them.

Application overview

Our example application is a simple one that simulates opening a bank account for a user. In its current form, the log output looks like the following code. We can see this by making requests to the payment endpoint:

curl localhost:8080/payment
2020-09-29T10:29:04,115 INFO [http-nio-8080-exec-1] c.e.l.c.PaymentController: Processing user User(name=Terina, ssn=626031641, email=mel.swift@Taylor.com, description=Ea minima omnis autem illo.)
2020-09-29T10:29:04,711 INFO [http-nio-8080-exec-2] c.e.l.c.PaymentController: User Napoleon, SSN 366435036, opened an account
2020-09-29T10:29:05,253 INFO [http-nio-8080-exec-4] c.e.l.c.PaymentController: User Cristen, SSN 197961488, opened an account
2020-09-29T10:29:05,673 INFO [http-nio-8080-exec-5] c.e.l.c.PaymentController: Processing user User(name=Giuseppe, ssn=713425581, email=elijah.dach@Shawnna.com, description=Impedit asperiores in magnam exercitationem.)

The output prints Name, SSN, and Email. This PII data is being generated by the java-faker library, which is a Java port of the well-known Ruby gem. See the following code:

        <dependency>
            <groupId>com.github.javafaker</groupId>
            <artifactId>javafaker</artifactId>
            <version>1.0.2</version>
        </dependency>

Log4j 2

Log4j 2 is a common Java library used for logging. Appenders in Log4j are responsible for delivering log events to their destinations, which can be console, file, and more. Log4j also has a RewriteAppender that lets you rewrite the log message before it is output. RewriteAppender works in conjunction with a RewritePolicy that provides the implementation for changing the log output.

The sample application uses the following log4j.xml file for log configuration:

<?xml version="1.0" encoding="UTF-8"?>
<!-- status="trace" -->
<Configuration packages="com.example.logging">
    <Appenders>
        <Console name="Console" target="SYSTEM_OUT">
            <PatternLayout
                    pattern="%style{%d{ISO8601}}{black} %highlight{%-5level }[%style{%t}{bright,blue}] %style{%C{1.}}{bright,yellow}: %msg%n%throwable" />
        </Console>
        <Rewrite name="Rewrite">
            <SensitiveDataPolicy
                    maskMode="MASK"
                    mask="*"
                    minScore="0.9"
                    entitiesToReplace="SSN,EMAIL"
            />
            <AppenderRef ref="Console" />
        </Rewrite>
    </Appenders>

    <Loggers>
        <!-- LOG everything at INFO level -->
        <Root level="info">
            <AppenderRef ref="Console" />
        </Root>
        <!-- LOG "com.example*" at DEBUG level -->
        <Logger name="com.example" level="debug" additivity="false">
            <AppenderRef ref="Rewrite" />
        </Logger>
    </Loggers>

</Configuration>

SensitiveDataPolicy

The Log4j RewritePolicy we created for this project is named SensitiveDataPolicy. It uses four parameters:

  • maskMode – This parameter has two modes:
    • REPLACE – The policy replaces discovered entities with their type names. For example, in case of social security numbers, the replaced string is [SSN].
    • MASK – The policy replaces the discovered entity with a string consisting of the character provided as a mask parameter.
  • mask – The character used to replace the discovered entity. Only relevant if maskMode is MASK.
  • minScore – The minimum confidence score acceptable to us.
  • entitiesToReplace – A comma-separated list of entity type names that we want to replace. For example, we’re choosing to replace social security number and email, so the string value we provide is SSN,EMAIL. Amazon Comprehend also detects NAME in our application, but it’s printed as is.

Choosing redaction vs. masking is a matter of preference. Redaction is usually preferred when the context needs to be preserved, such as in natural text, whereas masking is best for maintaining text length as well as structured data such as formatted files or key-value pairs.

Detecting PII is as simple as making an API call to Amazon Comprehend using the AWS SDK and providing the text to analyze:

        DetectPiiEntitiesRequest piiEntitiesRequest =
                DetectPiiEntitiesRequest.builder()
                        .languageCode("en")
                        .text(msg.getFormattedMessage())
                        .build();

        DetectPiiEntitiesResponse piiEntitiesResponse = comprehendClient.detectPiiEntities(piiEntitiesRequest);
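
The entities in the response then drive the replacement. The sample project implements this in Java; for illustration, the same offset-based masking logic can be sketched in Python with Boto3 (the function name and defaults are ours, mirroring the policy’s MASK mode):

import boto3

comprehend = boto3.client("comprehend")

def mask_pii(text, entities_to_replace=("SSN", "EMAIL"), min_score=0.9, mask="*"):
    response = comprehend.detect_pii_entities(Text=text, LanguageCode="en")
    # Replace from the end of the string so earlier offsets stay valid.
    for entity in sorted(response["Entities"],
                         key=lambda e: e["BeginOffset"], reverse=True):
        if entity["Type"] in entities_to_replace and entity["Score"] >= min_score:
            begin, end = entity["BeginOffset"], entity["EndOffset"]
            text = text[:begin] + mask * (end - begin) + text[end:]
    return text

print(mask_pii("User Napoleon, SSN 366435036, opened an account"))
# User Napoleon, SSN *********, opened an account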

Asynchronous logging

Because our policy makes synchronous calls to Amazon Comprehend for PII detection, we want this processing to happen asynchronously, outside of the customer request loop, to avoid introducing latency. For instructions, see Asynchronous Loggers for Low-Latency Logging. We add the Disruptor library to our classpath by adding it to pom.xml:

        <dependency>
            <groupId>com.lmax</groupId>
            <artifactId>disruptor</artifactId>
            <version>3.4.2</version>
        </dependency>

We also need to set a system property. After we package our application with mvn package, we can run it as in the following code:

java -Dlog4j2.contextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -jar target/comprehend-logging.jar

Updated log output

The log output from this application now looks like the following. We can see that SSN and Email are being suppressed.

2020-09-29T12:52:30,423 INFO  [http-nio-8080-exec-6] ?: User Willa, SSN *********, opened an account
2020-09-29T12:52:30,824 INFO  [http-nio-8080-exec-8] ?: User Vania, SSN *********, opened an account
2020-09-29T12:52:31,245 INFO  [http-nio-8080-exec-9] ?: Processing user User(name=Laronda, ssn=*********, email=******************************, description=Doloremque culpa iure dolore omnis.)
2020-09-29T12:52:31,637 INFO  [http-nio-8080-exec-1] ?: Processing user User(name=Tommye, ssn=*********, email=*************************, description=Corporis sed tempore.)

Conclusion

We learned how to use Amazon Comprehend to redact sensitive data natively within next-generation applications. For information about applying it as a postprocessing technique for logs in storage, see Detecting and redacting PII using Amazon Comprehend. The API lets you have complete control over the entities that are important for your use case and lets you either mask or redact the information.

For more information about Amazon Comprehend availability and quotas, see Amazon Comprehend endpoints and quotas.


About the Author

Pradeep Singh is a Solutions Architect at Amazon Web Services. He helps AWS customers take advantage of AWS services to design scalable and secure applications. His expertise spans Application Architecture, Containers, Analytics and Machine Learning.

Read More