Create summaries of recordings using generative AI with Amazon Bedrock and Amazon Transcribe

Meeting notes are a crucial part of collaboration, yet they often fall through the cracks. Between leading discussions, listening closely, and typing notes, it’s easy for key information to slip away unrecorded. Even when notes are captured, they can be disorganized or illegible, rendering them useless.

In this post, we explore how to use Amazon Transcribe and Amazon Bedrock to automatically generate clean, concise summaries of video or audio recordings. Whether it’s an internal team meeting, conference session, or earnings call, this approach can help you distill hours of content down to salient points.

We walk through a solution to transcribe a project team meeting and summarize the key takeaways with Amazon Bedrock. We also discuss how you can customize this solution for other common scenarios like course lectures, interviews, and sales calls. Read on to simplify and automate your note-taking process.

Solution overview

By combining Amazon Transcribe and Amazon Bedrock, you can save time, capture insights, and enhance collaboration. Amazon Transcribe is an automatic speech recognition (ASR) service that makes it straightforward to add speech-to-text capability to applications. It uses advanced deep learning technologies to accurately transcribe audio into text. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon with a single API, along with a broad set of capabilities you need to build generative AI applications. With Amazon Bedrock, you can easily experiment with a variety of top FMs, and privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG).

The solution presented in this post is orchestrated using an AWS Step Functions state machine that is triggered when you upload a recording to the designated Amazon Simple Storage Service (Amazon S3) bucket. Step Functions lets you create serverless workflows to orchestrate and connect components across AWS services. It handles the underlying complexity so you can focus on application logic. It’s useful for coordinating tasks, distributed processing, ETL (extract, transform, and load), and business process automation.

The following diagram illustrates the high-level solution architecture.

The solution workflow includes the following steps:

  1. A user stores a recording in the S3 asset bucket.
  2. This action triggers the Step Functions transcription and summarization state machine.
  3. As part of the state machine, an AWS Lambda function is triggered, which transcribes the recording using Amazon Transcribe and stores the transcription in the asset bucket.
  4. A second Lambda function retrieves the transcription and generates a summary using the Anthropic Claude model in Amazon Bedrock.
  5. Lastly, a final Lambda function uses Amazon Simple Notification Service (Amazon SNS) to send a summary of the recording to the recipient.

This solution is supported in Regions where Anthropic Claude on Amazon Bedrock is available.
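To make step 3 of the workflow concrete, the following is a minimal sketch of how a transcription Lambda function might start an Amazon Transcribe job with the AWS SDK for Python (Boto3). It is an illustration only, not the solution's actual function: the job-naming scheme and the transcripts/ output prefix are assumptions, and the deployed code adds error handling and state machine integration.

import boto3

transcribe = boto3.client("transcribe")

def start_transcription(bucket: str, key: str) -> str:
    """Start an Amazon Transcribe job for a recording stored in S3 (illustrative sketch)."""
    job_name = key.split("/")[-1].replace(".", "-")  # assumed naming scheme
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
        IdentifyLanguage=True,                    # let Transcribe detect the spoken language
        OutputBucketName=bucket,                  # write the transcript back to the asset bucket
        OutputKey=f"transcripts/{job_name}.json",
    )
    return job_name

The state machine can then poll get_transcription_job until the job status is COMPLETED before handing the transcript to the summarization step.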

The state machine orchestrates the steps to perform the specific tasks. The following diagram illustrates the detailed process.

Prerequisites

Amazon Bedrock users need to request access to models before they are available for use. This is a one-time action. For this solution, you’ll need to enable access to the Anthropic Claude (not Anthropic Claude Instant) model in Amazon Bedrock. For more information, refer to Model access.

Deploy solution resources

The solution is deployed using an AWS CloudFormation template, found on the GitHub repo, to automatically provision the necessary resources in your AWS account. The template requires the following parameters:

  • Email address used to send summary – The summary will be sent to this address. You must acknowledge the initial Amazon SNS confirmation email before receiving additional notifications.
  • Summary instructions – These are the instructions given to the Amazon Bedrock model to generate the summary.

Run the solution

After you deploy the solution using AWS CloudFormation, complete the following steps:

  1. Acknowledge the Amazon SNS email confirmation that you should receive a few moments after creating the CloudFormation stack.
  2. On the AWS CloudFormation console, navigate to the stack you just created.
  3. On the stack’s Outputs tab, look for the value associated with AssetBucketName; it will look something like summary-generator-assetbucket-xxxxxxxxxxxxx.
  4. On the Amazon S3 console, navigate to your asset bucket.

This is where you’ll upload your recordings. Valid file formats are MP3, MP4, WAV, FLAC, AMR, OGG, and WebM.

  5. Upload your recording to the recordings folder.

Uploading recordings will automatically trigger the Step Functions state machine. For this example, we use a sample team meeting recording in the sample-recording directory of the GitHub repository.

  6. On the Step Functions console, navigate to the summary-generator state machine.
  7. Choose the name of the state machine run with the status Running.

Here, you can watch the progress of the state machine as it processes the recording.

  8. After it reaches its Success state, you should receive an emailed summary of the recording.

Alternatively, you can navigate to the asset bucket on the Amazon S3 console and view the transcript in the transcripts folder.

Review the summary

You will get the recording summary emailed to the address you provided when you created the CloudFormation stack. If you don’t receive the email within a few moments, make sure that you acknowledged the Amazon SNS confirmation email sent after you created the stack, then upload the recording again to trigger the summary process.

This solution includes a mock team meeting recording that you can use to test the solution. The summary will look similar to the following example; because of the nature of generative AI, your exact output will differ, but the content should be close.

Here are the key points from the standup:

  • Joe finished reviewing the current state for task EDU1 and created a new task to develop the future state. That new task is in the backlog to be prioritized. He’s now starting EDU2 but is blocked on resource selection.
  • Rob created a tagging strategy for SLG1 based on best practices, but may need to coordinate with other teams who have created their own strategies, to align on a uniform approach. A new task was created to coordinate tagging strategies.
  • Rob has made progress debugging for SLG2 but may need additional help. This task will be moved to Sprint 2 to allow time to get extra resources.

Next Steps:

  • Joe to continue working on EDU2 as able until resource selection is decided
  • New task to be prioritized to coordinate tagging strategies across teams
  • SLG2 moved to Sprint 2
  • Standups moving to Mondays starting next week

Expand the solution

Now that you have a working solution, here are some ideas for customizing it to your specific use cases:

  • Try altering the process to fit your available source content and desired outputs:
    • For situations where transcripts are available, create an alternate Step Functions workflow to ingest existing text-based or PDF-based transcriptions.
    • Instead of using Amazon SNS to notify recipients via email, you can use it to send the output to a different endpoint, such as a team collaboration site, or to the team’s chat channel.
  • Try changing the summary instructions CloudFormation stack parameter provided to Amazon Bedrock to produce outputs specific to your use case (this is the generative AI prompt):
    • When summarizing a company’s earnings call, you could have the model focus on potential promising opportunities, areas of concern, and things that you should continue to monitor.
    • If you are using this to summarize a course lecture, the model could identify upcoming assignments, summarize key concepts, list facts, and filter out any small talk from the recording.
  • For the same recording, create different summaries for different audiences:
    • Engineers’ summaries focus on design decisions, technical challenges, and upcoming deliverables.
    • Project managers’ summaries focus on timelines, costs, deliverables, and action items.
    • Project sponsors get a brief update on project status and escalations.
    • For longer recordings, try generating summaries for different levels of interest and time commitment. For example, create a single sentence, single paragraph, single page, or in-depth summary. In addition to the prompt, you may want to adjust the max_tokens_to_sample parameter to accommodate different content lengths (see the sketch after this list).
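To show how the prompt and the max_tokens_to_sample parameter mentioned above fit together, here is a hedged sketch of a call to Anthropic Claude through the Amazon Bedrock Runtime. The model ID, instruction text, and token limit are examples to adapt, not the values used by the deployed Lambda function.

import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def summarize(transcript: str, instructions: str, max_tokens: int = 2000) -> str:
    """Summarize a transcript with Anthropic Claude (illustrative sketch using the Text Completions API)."""
    prompt = f"\n\nHuman: {instructions}\n\n<transcript>{transcript}</transcript>\n\nAssistant:"
    response = bedrock.invoke_model(
        modelId="anthropic.claude-v2",            # example model ID; use the model you enabled
        body=json.dumps({
            "prompt": prompt,
            "max_tokens_to_sample": max_tokens,   # raise for in-depth summaries, lower for one-liners
            "temperature": 0.2,
        }),
    )
    return json.loads(response["body"].read())["completion"]

Passing a different instructions string per audience (engineers, project managers, sponsors) is enough to produce the audience-specific summaries described above.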

Clean up

To clean up the solution, delete the CloudFormation stack that you created earlier. Note that deleting the stack will not delete the asset bucket. If you no longer need the recordings or transcripts, you can delete this bucket separately. Amazon Transcribe will automatically delete transcription jobs after 90 days, but you can delete these manually before then.

Conclusion

In this post, we explored how to use Amazon Transcribe and Amazon Bedrock to automatically generate clean, concise summaries of video or audio recordings. We encourage you to continue evaluating Amazon Bedrock, Amazon Transcribe, and other AWS AI services, like Amazon Textract, Amazon Translate, and Amazon Rekognition, to see how they can help meet your business objectives.


About the Authors

Rob Barnes is a principal consultant for AWS Professional Services. He works with our customers to address security and compliance requirements at scale in complex, multi-account AWS environments through automation.

Jason Stehle is a Senior Solutions Architect at AWS, based in the New England area. He works with customers to align AWS capabilities with their greatest business challenges. Outside of work, he spends his time building things and watching comic book movies with his family.

Fine-tune Llama 2 using QLoRA and Deploy it on Amazon SageMaker with AWS Inferentia2

In this post, we showcase fine-tuning a Llama 2 model using a Parameter-Efficient Fine-Tuning (PEFT) method and deploying the fine-tuned model on AWS Inferentia2. We use the AWS Neuron software development kit (SDK) to access the AWS Inferentia2 device and benefit from its high performance. We then use a large model inference container powered by Deep Java Library (DJLServing) as our model serving solution.

Solution overview

Efficient fine-tuning of Llama 2 using QLoRA

The Llama 2 family of large language models (LLMs) is a collection of pre-trained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Llama 2 was pre-trained on 2 trillion tokens of data from publicly available sources. AWS customers sometimes choose to fine-tune Llama 2 models using their own data to achieve better performance for downstream tasks. However, due to the Llama 2 model’s large number of parameters, full fine-tuning could be prohibitively expensive and time consuming. The Parameter-Efficient Fine-Tuning (PEFT) approach addresses this problem by fine-tuning only a small number of extra model parameters while freezing most parameters of the pre-trained model. For more information on PEFT, see this post. In this post, we use QLoRA to fine-tune a Llama 2 7B model.

Deploy a fine-tuned model on Inf2 using Amazon SageMaker

AWS Inferentia2 is a purpose-built machine learning (ML) accelerator designed for inference workloads that delivers high performance at up to 40% lower cost for generative AI and LLM workloads compared to other inference-optimized instances on AWS. In this post, we use an Amazon Elastic Compute Cloud (Amazon EC2) Inf2 instance featuring AWS Inferentia2, the second-generation Inferentia accelerator, with each accelerator containing two NeuronCores-v2. Each NeuronCore-v2 is an independent, heterogeneous compute unit with four main engines: Tensor, Vector, Scalar, and GPSIMD. It includes on-chip, software-managed SRAM to maximize data locality. Several posts on Inf2 have already been published, so refer to this post and our documentation for more information on Inf2.

To deploy models on Inf2, we need the AWS Neuron SDK as the software layer running on top of the Inf2 hardware. AWS Neuron is the SDK used to run deep learning workloads on AWS Inferentia and AWS Trainium based instances. It enables the end-to-end ML development lifecycle to build new models, train and optimize them, and deploy them for production. AWS Neuron includes a deep learning compiler, runtime, and tools that are natively integrated with popular frameworks like TensorFlow and PyTorch. In this post, we use transformers-neuronx, which is part of the AWS Neuron SDK for transformer decoder inference workflows. It supports a range of popular models, including Llama 2.

To deploy models on Amazon SageMaker, we usually use a container that contains the required libraries, such as the Neuron SDK and transformers-neuronx, as well as the model serving component. Amazon SageMaker maintains deep learning containers (DLCs) with popular open source libraries for hosting large models. In this post, we use the Large Model Inference (LMI) container for Neuron. This container has everything you need to deploy your Llama 2 model on Inf2. For resources to get started with LMI on Amazon SageMaker, refer to our existing posts (blog 1, blog 2, blog 3) on this topic. In short, you can run the container without writing any additional code. You can use the default handler for a seamless user experience and pass in one of the supported model names and any load-time configurable parameters. This compiles and serves an LLM on an Inf2 instance. For example, to deploy OpenAssistant/llama2-13b-orca-8k-3319, you can provide the following configuration (as a serving.properties file). In serving.properties, we specify the model ID, the batch size, and the tensor parallel degree, and that is it. For the full list of configurable parameters, refer to All DJL configuration options.

# Engine to use: MXNet, PyTorch, TensorFlow, ONNX, PaddlePaddle, DeepSpeed, etc.
engine=Python
# Default handler for model serving
option.entryPoint=djl_python.transformers_neuronx
# The Hugging Face ID of a model or the S3 URL of the model artifacts
option.model_id=meta-llama/Llama-2-7b-chat-hf
# The dynamic batch size; default is 1
option.batch_size=4
# The number of tensor parallel partitions performed on the model
option.tensor_parallel_degree=2
# The input sequence length
option.n_positions=512
# Enable iteration-level batching using one of "auto", "scheduler", "lmi-dist"
option.rolling_batch=auto
# The data type to which you plan to cast the model by default
option.dtype=fp16
# Worker model loading timeout
option.model_loading_timeout=1500

Alternatively, you can write your own model handler file as shown in this example, but that requires implementing the model loading and inference methods to serve as a bridge between the model and the DJLServing APIs.
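For orientation, a custom handler is a Python module that exposes a handle function DJLServing calls for each request. The following is only a rough sketch of that shape under the djl_python Input/Output pattern; treat the helper method names as assumptions and see the linked example for a working implementation. The model loading and generation logic are elided.

# model.py -- sketch of a custom DJLServing Python handler (helper method names are assumptions)
from djl_python import Input, Output

model = None  # loaded lazily on the first invocation

def load_model(properties: dict):
    # Compile or load the model for Inferentia2 here, for example with transformers-neuronx.
    ...

def handle(inputs: Input) -> Output:
    global model
    if model is None:
        model = load_model(inputs.get_properties())
    if inputs.is_empty():
        return None  # warm-up request issued by the serving container
    payload = inputs.get_as_json()      # for example {"inputs": "...", "parameters": {...}}
    generated_text = "..."              # run inference with `model` and the payload here
    return Output().add_as_json({"generated_text": generated_text})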

Prerequisites

The following list outlines the prerequisites for deploying the model described in this blog post. You can complete them either from the AWS Management Console or using the latest version of the AWS Command Line Interface (AWS CLI).

Walkthrough

In the following sections, we walk through the code in two parts:

  1. Fine-tune a Llama2-7b model and upload the model artifacts to a specified Amazon S3 bucket location.
  2. Deploy the model to an Inferentia2 instance using a DJL serving container hosted in Amazon SageMaker.

The complete code samples with instructions can be found in this GitHub repository.

Part 1: Fine-tune a Llama2-7b model using PEFT

We use the method recently introduced in the paper QLoRA: Efficient Finetuning of Quantized LLMs by Tim Dettmers et al. QLoRA is a technique that reduces the memory footprint of large language models during fine-tuning without sacrificing performance.

Note: The fine-tuning of the llama2-7b model shown in the following was tested on an Amazon SageMaker Studio notebook with the PyTorch 2.0 GPU Optimized kernel using an ml.g5.2xlarge instance type. As a best practice, we recommend using an Amazon SageMaker Studio Integrated Development Environment (IDE) launched in your own Amazon Virtual Private Cloud (Amazon VPC). This allows you to control, monitor, and inspect network traffic within and outside your VPC using standard AWS networking and security capabilities. For more information, see Securing Amazon SageMaker Studio connectivity using a private VPC.

Quantize the base model

We first load a quantized model with 4-bit quantization using the Hugging Face Transformers library as follows:

# Imports assumed by this snippet
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# The base pretrained model for fine-tuning
model_name = "NousResearch/Llama-2-7b-chat-hf"

# The instruction dataset to use
dataset_name = "mlabonne/guanaco-llama2-1k"

# Activate 4-bit precision base model loading
use_4bit = True
bnb_4bit_compute_dtype = "float16"
bnb_4bit_quant_type = "nf4"
use_nested_quant = False

compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

# Load base model and tokenizer
device_map = {"": 0}  # place the whole model on the first GPU (assumed single-GPU setup)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map,
)
model.config.pretraining_tp = 1

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

Load training dataset

Next, we load the dataset to feed to the model for the fine-tuning step, as follows:

from datasets import load_dataset

# Load dataset (you can process it here)
dataset = load_dataset(dataset_name, split="train")

Attach an adapter layer

Here we attach a small, trainable adapter layer, configured as a LoraConfig defined in Hugging Face’s peft library.

from peft import LoraConfig

# Identify the linear layers to apply LoRA to (find_all_linear_names is a helper in the sample repo)
modules = find_all_linear_names(model)

# LoRA attention dimension (rank)
lora_r = 64

# Alpha parameter for LoRA scaling
lora_alpha = 16

# Dropout probability for LoRA layers
lora_dropout = 0.1

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=modules,
)

Train a model

Using the LoRA configuration shown above, we fine-tune the Llama 2 model with our training hyperparameters. The following code snippet trains the model:

from transformers import TrainingArguments
from trl import SFTTrainer

# Set training parameters (hyperparameters elided; see the sample notebook for the full set)
training_arguments = TrainingArguments(...)

# max_seq_length, packing, and new_model are defined earlier in the sample notebook
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,  # LoRA config
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=packing,
)

# Train model
trainer.train()

# Save the trained adapter weights
trainer.model.save_pretrained(new_model)

Merge model weights

The fine-tuning run above produces a new model containing the trained LoRA adapter weights. In the following code snippet, we merge the adapter with the base model so that we can use the fine-tuned model for inference.

from peft import PeftModel

# Reload the base model in FP16 and merge it with the LoRA adapter weights
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

save_dir = "merged_model"
model.save_pretrained(save_dir, safe_serialization=True, max_shard_size="2GB")

# Reload the tokenizer to save it alongside the merged model
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
tokenizer.save_pretrained(save_dir)

Upload model weights to Amazon S3

In the final step of part 1, we save the merged model weights to a specified Amazon S3 location. The model weights will be used by a model serving container in Amazon SageMaker to host the model on an Inferentia2 instance.

model_data_s3_location = "s3://<bucket_name>/<prefix>/"
!cd {save_dir} && aws s3 cp --recursive . {model_data_s3_location}

Part 2: Host QLoRA model for inference with AWS Inf2 using SageMaker LMI Container

In this section, we’ll walk through the steps of deploying a QLoRA fine-tuned model into an Amazon SageMaker hosting environment. We’ll use a DJL serving container from SageMaker DLC, which integrates with the transformers-neuronx library to host this model. The setup facilitates the loading of models onto AWS Inferentia2 accelerators, parallelizes the model across multiple NeuronCores, and enables serving via HTTP endpoints.

Prepare model artifacts

DJL supports many deep learning optimization libraries, including DeepSpeed, FasterTransformer, and more. For model-specific configurations, we provide a serving.properties file with key parameters, such as tensor_parallel_degree and model_id, to define the model loading options. The model_id could be a Hugging Face model ID or an Amazon S3 path where the model weights are stored. In our example, we provide the Amazon S3 location of our fine-tuned model. The following code snippet shows the properties used for model serving:

%%writefile serving.properties
engine=Python
option.entryPoint=djl_python.transformers_neuronx
option.model_id=<model data s3 location>
option.batch_size=4
option.neuron_optimize_level=2
option.tensor_parallel_degree=8
option.n_positions=512
option.rolling_batch=auto
option.dtype=fp16
option.model_loading_timeout=1500

Refer to this documentation for more information about the configurable options available via serving.properties. Note that we use option.n_positions=512 in this post for faster AWS Neuron compilation. If you want to try a larger input token length, we recommend pre-compiling the model ahead of time (see AOT Pre-Compile Model on EC2). Otherwise, you might run into a timeout error if compilation takes too long.

After the serving.properties file is defined, we’ll package the file into a tar.gz format, as follows:

%%sh
mkdir mymodel
mv serving.properties mymodel/
tar czvf mymodel.tar.gz mymodel/
rm -rf mymodel

Then, we’ll upload the tar.gz to an Amazon S3 bucket location:

s3_code_prefix = "large-model-lmi/code"
bucket = sess.default_bucket()  # bucket to house artifacts
code_artifact = sess.upload_data("mymodel.tar.gz", bucket, s3_code_prefix)
print(f"S3 Code or Model tar ball uploaded to --- > {code_artifact}")

Create an Amazon SageMaker model endpoint

To use an Inf2 instance for serving, we use an Amazon SageMaker LMI container with DJL NeuronX support. Refer to this post for more information about using a DJL NeuronX container for inference. The following code shows how to deploy a model using the Amazon SageMaker Python SDK:

import sagemaker
from sagemaker import image_uris, serializers
from sagemaker.model import Model

# Retrieve the DJL-neuronx Docker image URI
image_uri = image_uris.retrieve(
    framework="djl-neuronx",
    region=sess.boto_session.region_name,
    version="0.24.0",
)

# Define the Inf2 instance type to use for serving
instance_type = "ml.inf2.48xlarge"

endpoint_name = sagemaker.utils.name_from_base("lmi-model")

# Create the model object from the packaged serving artifacts
# (role is your SageMaker execution role; code_artifact is the S3 URI from the previous step)
model = Model(image_uri=image_uri, model_data=code_artifact, role=role)

# Deploy the model for inference
model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=1500,
    volume_size=256,
    endpoint_name=endpoint_name,
)

# Requests and responses are in JSON format, so we specify a JSON serializer
predictor = sagemaker.Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=sess,
    serializer=serializers.JSONSerializer(),
)

Test model endpoint

After the model is deployed successfully, we can validate the endpoint by sending a sample request to the predictor:

prompt="What is machine learning?"
input_data = f"<s>[INST] <<SYS>>nAs a data scientistn<</SYS>>n{prompt} [/INST]"

response = predictor.predict(
{"inputs": input_data, "parameters": {"max_new_tokens":300, "do_sample":"True"}}
)

print(json.loads(response)['generated_text'])

The sample output is shown as follows:

In the context of data analysis, Machine Learning (ML) refers to a statistical technique capable of extracting predictive power from a dataset with an increasing complexity and accuracy by iteratively narrowing down the scope of a statistic.

Machine Learning is not a new statistical technique, but rather a combination of existing techniques. Furthermore, it has not been designed to be used with a specific dataset or to produce a specific outcome. Rather, it was designed to be flexible enough to adapt to any dataset and to make predictions about any outcome.

Clean up

If you decide that you no longer want to keep the SageMaker endpoint running, you can delete it using the AWS SDK for Python (Boto3), the AWS CLI, or the Amazon SageMaker console. Additionally, you can shut down any Amazon SageMaker Studio resources that are no longer required.

Conclusion

In this post, we showed you how to fine-tune a Llama2-7b model using a LoRA adapter with 4-bit quantization on a single GPU instance. We then deployed the model to an Inf2 instance hosted in Amazon SageMaker using a DJL serving container. Finally, we validated the Amazon SageMaker model endpoint with a text generation prediction using the SageMaker Python SDK. Go ahead and give it a try; we’d love to hear your feedback. Stay tuned for updates on more capabilities and new innovations with AWS Inferentia.

For more examples about AWS Neuron, see aws-neuron-samples.


About the Authors

Wei Teh is a Senior AI/ML Specialist Solutions Architect at AWS. He is passionate about helping customers advance their AWS journey, focusing on Amazon Machine Learning services and machine learning-based solutions. Outside of work, he enjoys outdoor activities like camping, fishing, and hiking with his family.

Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his Ph.D. in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently he helps customers in the financial service and insurance industry build machine learning solutions on AWS. In his spare time, he likes reading and teaching.

Build an end-to-end MLOps pipeline using Amazon SageMaker Pipelines, GitHub, and GitHub Actions

Machine learning (ML) models do not operate in isolation. To deliver value, they must integrate into existing production systems and infrastructure, which necessitates considering the entire ML lifecycle during design and development. ML operations, known as MLOps, focus on streamlining, automating, and monitoring ML models throughout their lifecycle. Building a robust MLOps pipeline demands cross-functional collaboration. Data scientists, ML engineers, IT staff, and DevOps teams must work together to operationalize models from research to deployment and maintenance. With the right processes and tools, MLOps enables organizations to reliably and efficiently adopt ML across their teams.

Although the requirements of continuous integration and continuous delivery (CI/CD) pipelines can be unique and reflect each organization’s needs, scaling MLOps practices across teams can be simplified by using managed orchestrations and tools that can accelerate the development process and remove the undifferentiated heavy lifting.

Amazon SageMaker MLOps is a suite of features that includes Amazon SageMaker Projects (CI/CD), Amazon SageMaker Pipelines and Amazon SageMaker Model Registry.

SageMaker Pipelines allows for straightforward creation and management of ML workflows, while also offering storage and reuse capabilities for workflow steps. The SageMaker Model Registry centralizes model tracking, simplifying model deployment. SageMaker Projects introduces CI/CD practices to ML, including environment parity, version control, testing, and automation. This allows for a quick establishment of CI/CD in your ML environment, facilitating effective scalability throughout your enterprise.
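To give a feel for what a pipeline definition looks like in code, here is a minimal, hedged sketch using the SageMaker Python SDK. The estimator, role ARN, bucket paths, and step names are placeholders; the pipeline created by this post's project template includes additional data preparation, evaluation, and registration steps.

import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import TrainingStep

session = sagemaker.Session()
role = "arn:aws:iam::111122223333:role/ExampleSageMakerExecutionRole"  # placeholder role

# Placeholder estimator; the project template defines its own image, code, and hyperparameters.
estimator = Estimator(
    image_uri=sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1"),
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{session.default_bucket()}/model-artifacts",
)

train_step = TrainingStep(
    name="TrainModel",
    estimator=estimator,
    inputs={"train": TrainingInput(f"s3://{session.default_bucket()}/data/train")},
)

pipeline = Pipeline(name="example-build-pipeline", steps=[train_step], sagemaker_session=session)
pipeline.upsert(role_arn=role)  # create or update the pipeline definition
pipeline.start()                # run the pipeline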

The built-in project templates provided by Amazon SageMaker include integration with some third-party tools, such as Jenkins for orchestration and GitHub for source control, and several utilize AWS native CI/CD tools such as AWS CodeCommit, AWS CodePipeline, and AWS CodeBuild. In many scenarios, however, customers want to integrate SageMaker Pipelines with other existing CI/CD tools and therefore create their own custom project templates.

In this post, we show you a step-by-step implementation to achieve the following:

  • Create a custom SageMaker MLOps project template that integrates with GitHub and GitHub Actions
  • Make your custom project templates available in Amazon SageMaker Studio for your data science team with one-click provisioning

Solution overview

In this post, we construct the following architecture. We create an automated model build pipeline that includes steps for data preparation, model training, model evaluation, and registration of the trained model in the SageMaker Model Registry. The resulting trained ML model is then deployed from the SageMaker Model Registry to staging and production environments upon manual approval.

Solution Overview

Let’s delve into the elements of this architecture to understand the complete configuration.

GitHub and GitHub Actions

GitHub is a web-based platform that provides version control and source code management using Git. It enables teams to collaborate on software development projects, track changes, and manage code repositories. GitHub serves as a centralized location to store, version, and manage your ML code base. This ensures that your ML code base and pipelines are versioned, documented, and accessible by team members.

GitHub Actions is a powerful automation tool within the GitHub ecosystem. It allows you to create custom workflows that automate your software development lifecycle processes, such as building, testing, and deploying code. You can create event-driven workflows triggered by specific events, like when code is pushed to a repository or a pull request is created. When implementing MLOps, you can use GitHub Actions to automate various stages of the ML pipeline, such as:

  • Data validation and preprocessing
  • Model training and evaluation
  • Model deployment and monitoring
  • CI/CD for ML models

With GitHub Actions, you can streamline your ML workflows and ensure that your models are consistently built, tested, and deployed, leading to more efficient and reliable ML deployments.

In the following sections, we start by setting up the prerequisites relating to some of the components that we use as part of this architecture:

  • AWS CloudFormation – AWS CloudFormation initiates the model deployment and establishes the SageMaker endpoints after the model deployment pipeline is activated by the approval of the trained model.
  • AWS CodeStar connection – We use AWS CodeStar to establish a link with the GitHub repository and utilize it as code repo integration with AWS resources, like SageMaker Studio.
  • Amazon EventBridge – Amazon EventBridge keeps track of all modifications to the model registry. It also maintains a rule that prompts the Lambda function to deploy the model pipeline when the status of the model package version changes from PendingManualApproval to Approved within the model registry.
  • AWS Lambda – We use an AWS Lambda function to initiate the model deployment workflow in GitHub Actions after a new model is registered in the model registry.
  • Amazon SageMaker – We configure the following SageMaker components:
    • Pipeline – This component consists of a directed acyclic graph (DAG) that helps us build the automated ML workflow for the stages of data preparation, model training, and model evaluation. The model registry maintains records of model versions, their associated artifacts, lineage, and metadata. A model package group is established that houses all related model versions. The model registry is also responsible for managing the approval status of the model version for subsequent deployment.
    • Endpoint – This component sets up two HTTPS real-time endpoints for inference. The hosting configuration can be adjusted, for instance, for batch transform or asynchronous inference. The staging endpoint is generated when the model deployment pipeline is activated by the approval of the trained model from the SageMaker Model Registry. This endpoint is utilized to validate the deployed model by ensuring it provides predictions that satisfy our accuracy standards. When the model is prepared for production deployment, a production endpoint is deployed by a manual approval stage in the GitHub Actions workflow.
    • Code repository – This creates a Git repository as a resource in your SageMaker account. Using the existing data from the GitHub code repository that you input during the creation of your SageMaker project, an association with the same repository is established in SageMaker when you initiate the project. This essentially forms a link with a GitHub repository in SageMaker, enabling interactive actions (pull/push) with your repository.
    • Model registry – This monitors the various versions of the model and the corresponding artifacts, which includes lineage and metadata. A collection known as a model package group is created, housing related versions of the model. Moreover, the model registry oversees the approval status of the model version, ensuring its readiness for subsequent deployment.
  • AWS Secrets Manager – To securely preserve your GitHub personal access token, it’s necessary to establish a secret in AWS Secrets Manager and house your access token within it.
  • AWS Service Catalog – We use the AWS Service Catalog for the implementation of SageMaker projects, which include components like a SageMaker code repository, Lambda function, EventBridge rule, artifact S3 bucket, etc., all implemented via CloudFormation. This allows your organization to use project templates repeatedly, allocate projects to each user, and streamline operations.
  • Amazon S3 – We use an Amazon Simple Storage Service (Amazon S3) bucket to keep the model artifacts produced by the pipeline.

Prerequisites

You should have the following prerequisites:

You must also complete additional setup steps before implementing the solution.

Set up an AWS CodeStar connection

If you don’t already have an AWS CodeStar connection to your GitHub account, refer to Create a connection to GitHub for instructions to create one. Your AWS CodeStar connection ARN will look like this:

arn:aws:codestar-connections:us-west-2:account_id:connection/aEXAMPLE-8aad-4d5d-8878-dfcab0bc441f

In this example, aEXAMPLE-8aad-4d5d-8878-dfcab0bc441f is the unique ID for this connection. We use this ID when we create our SageMaker project later in this example.

Set up secret access keys for your GitHub token

To securely store your GitHub personal access token, you need to create a secret in Secrets Manager. If you don’t have a personal access token for GitHub, refer to Managing your personal access tokens for instructions to create one.

You can create either a classic or fine-grained access token. However, make sure that the token has access to the repository’s contents and actions (workflows, runs, and artifacts).

Complete the following steps to store your token in Secrets Manager:

  1. On the Secrets Manager console, choose Store a new secret.
  2. Select Other type of secret for Choose secret type.
  3. Provide a name for your secret in the Key field and add your personal access token to the corresponding Value field.
  4. Choose Next, enter a name for your secret, and choose Next again.
  5. Choose Store to save your secret.

By storing your GitHub personal access token in Secrets Manager, you can securely access it within your MLOps pipeline while ensuring its confidentiality.
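If you prefer to script this step, the same secret can be created with Boto3, as in the following sketch. The secret name is an example, and depending on how the project template reads the secret, you may need to store the token as a key-value JSON document instead of a plain string.

import getpass
import boto3

secretsmanager = boto3.client("secretsmanager")

# Prompt for the token locally so it never lands in your shell history or source code.
github_token = getpass.getpass("GitHub personal access token: ")

secretsmanager.create_secret(
    Name="github-pat-mlops-example",  # example name; use your own naming convention
    SecretString=github_token,
)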

Create an IAM user for GitHub Actions

To allow GitHub Actions to deploy SageMaker endpoints in your AWS environment, you need to create an AWS Identity and Access Management (IAM) user and grant it the necessary permissions. For instructions, refer to Creating an IAM user in your AWS account. Use the iam/GithubActionsMLOpsExecutionPolicy.json file (provided in the code sample) to provide sufficient permissions for this user to deploy your endpoints.

After you create the IAM user, generate an access key. You will use this key, which consists of both an access key ID and a secret access key, in the subsequent step when configuring your GitHub secrets.

Set up your GitHub account

The following are the steps to prepare your GitHub account to run this example.

Clone the GitHub repository

You can reuse an existing GitHub repo for this example. However, it’s easier if you create a new repository. This repository is going to contain all the source code for both SageMaker pipeline builds and deployments.

Copy the contents of the seed code directory into the root of your GitHub repository. For instance, the .github directory should be under the root of your GitHub repository.

Create a GitHub secret containing your IAM user access key

In this step, we store the access key details of the newly created user in our GitHub secret.

  1. On the GitHub website, navigate to your repository and choose Settings.
  2. In the security section, select Secrets and Variables and choose Actions.
  3. Choose New Repository Secret.
  4. For Name, enter AWS_ACCESS_KEY_ID
  5. For Secret, enter the access key ID associated with the IAM user you created earlier.
  6. Choose Add Secret.
  7. Repeat the same procedure for AWS_SECRET_ACCESS_KEY

Configure your GitHub environments

To create a manual approval step in our deployment pipelines, we use a GitHub environment. Complete the following steps:

  1. Navigate to the Settings, Environments menu of your GitHub repository and create a new environment called production.
  2. For Environment protection rules, select Required reviewers.
  3. Add the desired GitHub user names as reviewers. For this example, you can choose your own user name.

Note that the environment feature is not available in some types of GitHub plans. For more information, refer to Using environments for deployment.

Deploy the Lambda function

In the following steps, we compress lambda_function.py into a .zip file, which is then uploaded to an S3 bucket.

The relevant code sample for this can be found in the following GitHub repo. Specifically, the lambda_function.py is located in the lambda_functions/lambda_github_workflow_trigger directory.

It’s recommended to create a fork of the code sample and clone that instead. This will give you the freedom to modify the code and experiment with different aspects of the sample.

  1. After you obtain a copy of the code, navigate to the appropriate directory and use the zip command to compress lambda_function.py. Both Windows and MacOS users can use their native file management system, File Explorer or Finder, respectively, to generate a .zip file.
cd lambda_functions/lambda_github_workflow_trigger
zip lambda-github-workflow-trigger.zip lambda_function.py
  2. Upload the lambda-github-workflow-trigger.zip to an S3 bucket.

This bucket will later be accessed by Service Catalog. You can choose any bucket that you have access to, as long as Service Catalog is able to retrieve data from it in subsequent steps.

From this step onwards, we require the AWS CLI v2 to be installed and configured. An alternative would be to utilize AWS CloudShell, which comes with all necessary tools pre-installed, eliminating the need for any additional configurations.

  3. To upload the file to the S3 bucket, use the following command:
aws s3 cp lambda-github-workflow-trigger.zip s3://your-bucket/

Now we construct a Lambda layer for the dependencies related to the lambda_function we just uploaded.

  4. Set up a Python virtual environment and get the dependencies installed:
mkdir lambda_layer
cd lambda_layer
python3 -m venv .env
source .env/bin/activate
pip install pygithub
deactivate
  5. Generate the .zip file with the following commands:
mv .env/lib/python3.9/site-packages/ python
zip -r layer.zip python
  6. Publish the layer to AWS:
aws lambda publish-layer-version --layer-name python39-github-arm64 \
  --description "Python3.9 pygithub" \
  --license-info "MIT" \
  --zip-file fileb://layer.zip \
  --compatible-runtimes python3.9 \
  --compatible-architectures "arm64"

With this layer published, all your Lambda functions can now reference it to meet their dependencies. For a more detailed understanding of Lambda layers, refer to Working with Lambda layers.
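For context on how the layer is used, the following is an illustrative sketch, not the repository's actual lambda_function.py, of a Lambda function that reads the GitHub token from Secrets Manager and dispatches the deployment workflow with PyGithub. The secret name, repository, branch, and workflow input are assumptions; the real function derives them from its configuration.

import boto3
from github import Github

secretsmanager = boto3.client("secretsmanager")

def lambda_handler(event, context):
    # Assumed names for illustration only.
    token = secretsmanager.get_secret_value(SecretId="github-pat-mlops-example")["SecretString"]
    repo = Github(token).get_repo("your-org/your-mlops-repo")
    workflow = repo.get_workflow("deploy.yml")

    # Kick off the GitHub Actions deployment workflow on the default branch.
    dispatched = workflow.create_dispatch(
        ref="main",
        inputs={"model_package_arn": event.get("model_package_arn", "")},
    )
    return {"workflow_dispatched": dispatched}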

Create a custom project template in SageMaker

After completing all the preceding steps, we have all the CI/CD pipeline resources and components. Next, we demonstrate how to make these resources available as a custom project within SageMaker Studio, accessible via one-click deployment.

As discussed earlier, when the SageMaker-provided templates don’t meet your needs (for example, you want more complex orchestration in CodePipeline with multiple stages or custom approval steps, or you want to integrate with a third-party tool such as GitHub and GitHub Actions, as demonstrated in this post), you can create your own templates. We recommend starting with the SageMaker-provided templates to understand how to organize your code and resources and build on top of them. For more details, refer to Create Custom Project Templates.

Note that you can also automate this step and instead use CloudFormation to deploy the Service Catalog portfolio and product via code. In this post, however, for a greater learning experience, we show you the console deployment.

At this stage, we use the provided CloudFormation template to create a Service Catalog portfolio that helps us create custom projects in SageMaker.

You can create a new domain or reuse your SageMaker domain for the following steps. If you don’t have a domain, refer to Onboard to Amazon SageMaker Domain using Quick setup for setup instructions.

After you enable administrator access to the SageMaker templates, complete the following steps:

  1. On the Service Catalog console, under Administration in the navigation pane, choose Portfolios.
  2. Choose Create a new portfolio.
  3. Name the portfolio “SageMaker Organization Templates”.
  4. Download the template.yml file to your computer.

This CloudFormation template provisions all the CI/CD resources we need as configuration and infrastructure as code. You can study the template in more detail to see what resources are deployed as part of it. This template has been customized to integrate with GitHub and GitHub Actions.

  5. In the template.yml file, change the S3Bucket value to your bucket where you have uploaded the Lambda .zip file:
GitHubWorkflowTriggerLambda:
  ...
  Code:
    S3Bucket: <your-bucket>
    S3Key: lambda-github-workflow-trigger.zip
  ...
  6. Choose the new portfolio.
  7. Choose Upload a new product.
  8. For Product name, enter a name for your template. We use the name build-deploy-github.
  9. For Description, enter a description.
  10. For Owner, enter your name.
  11. Under Version details, for Method, choose Use a template file.
  12. Choose Upload a template.
  13. Upload the template you downloaded.
  14. For Version title, choose 1.0.
  15. Choose Review.
  16. Review your settings and choose Create product.
  17. Choose Refresh to list the new product.
  18. Choose the product you just created.
  19. On the Tags tab, add the following tag to the product:
    • Key = sagemaker:studio-visibility
    • Value = true

Back in the portfolio details, you should see something similar to the following screenshot (with different IDs).

Service Catalog Portfolio

  20. On the Constraints tab, choose Create constraint.
  21. For Product, choose build-deploy-github (the product you just created).
  22. For Constraint type, choose Launch.
  23. Under Launch constraint, for Method, choose Select IAM role.
  24. Choose AmazonSageMakerServiceCatalogProductsLaunchRole.
  25. Choose Create.
  26. On the Groups, roles, and users tab, choose Add groups, roles, users.
  27. On the Roles tab, select the role you used when configuring your SageMaker Studio domain. This is where the SageMaker domain role can be found.

Service Catalog Launch Constraint

  28. Choose Add access.

Deploy the project from SageMaker Studio

In the previous sections, you prepared the custom MLOps project environment. Now, let’s create a project using this template:

  1. On the SageMaker console, navigate to the domain in which you want to create this project.
  2. On the Launch menu, choose Studio.

You’ll be redirected to the SageMaker Studio environment.

  3. In SageMaker Studio, in the navigation pane under Deployments, choose Projects.
  4. Choose Create project.
  5. At the top of the list of templates, choose Organization templates.

If you have gone through all the previous steps successfully, you should be able to see a new custom project template named Build-Deploy-GitHub.

  6. Select that template and choose Select Project Template.
  7. Enter an optional description.
  8. For GitHub Repository Owner Name, enter the owner of your GitHub repository. For example, if your repository is at https://github.com/pooyavahidi/my-repo, the owner would be pooyavahidi.
  9. For GitHub Repository Name, enter the name of the repository into which you copied the seed code. It would be just the name of the repo. For example, in https://github.com/pooyavahidi/my-repo, the repo is my-repo.
  10. For Codestar connection unique ID, enter the unique ID of the AWS CodeStar connection that you created.
  11. For Name of the secret in the Secrets Manager which stores GitHub token, enter the name of the secret in Secrets Manager where you created and stored the GitHub token.
  12. For GitHub workflow file for deployment, enter the name of the GitHub workflow file (at .github/workflows/deploy.yml) where you have the deployment instructions. For this example, you can keep it as default, which is deploy.yml.
  13. Choose Create project.

SageMaker Studio Project

  14. After creating your project, make sure you update the AWS_REGION and SAGEMAKER_PROJECT_NAME environment variables in your GitHub workflow files accordingly. Workflow files are in your GitHub repo (copied from the seed code), inside the .github/workflows directory. Make sure you update both build.yml and deploy.yml files.
...
env:
  AWS_REGION: <region>   
  SAGEMAKER_PROJECT_NAME: <your project name>
...

Now your environment is ready to go! You can run the pipelines directly, make changes, and push those changes to your GitHub repository to trigger the automated build pipeline and see how all the steps of build and deploy are automated.

Clean up

To clean up the resources, complete the following steps:

  • Delete the CloudFormation stacks used for the SageMaker project and SageMaker endpoints.
  • Delete the SageMaker domain.
  • Delete the Service Catalog resources.
  • Delete the AWS CodeStar connection link with the GitHub repository.
  • Delete the IAM user that you created for GitHub Actions.
  • Delete the secret in Secrets Manager that stores the GitHub personal access details.

Summary

In this post, we walked through the process of using a custom SageMaker MLOps project template to automatically construct and organize a CI/CD pipeline. This pipeline effectively integrates your existing CI/CD mechanisms with SageMaker capabilities for data manipulation, model training, model approval, and model deployment. In our scenario, we focused on integrating GitHub Actions with SageMaker projects and pipelines. For a comprehensive understanding of the implementation details, visit the GitHub repository. Feel free to experiment with this and don’t hesitate to leave any queries you might have in the comments section.


About the Authors

Dr. Romina Sharifpour is a Senior Machine Learning and Artificial Intelligence Solutions Architect at Amazon Web Services (AWS). She has spent over 10 years leading the design and implementation of innovative end-to-end solutions enabled by advancements in ML and AI. Romina’s areas of interest are natural language processing, large language models, and MLOps.

Pooya Vahidi is a Senior Solutions Architect at AWS, passionate about computer science, artificial intelligence, and cloud computing. As an AI professional, he is an active member of the AWS AI/ML Area-of-Depth team. With a background spanning over two decades of expertise in leading the architecture and engineering of large-scale solutions, he helps customers on their transformative journeys through cloud and AI/ML technologies.

Create a web UI to interact with LLMs using Amazon SageMaker JumpStart

The launch of ChatGPT and rise in popularity of generative AI have captured the imagination of customers who are curious about how they can use this technology to create new products and services on AWS, such as enterprise chatbots, which are more conversational. This post shows you how you can create a web UI, which we call Chat Studio, to start a conversation and interact with foundation models available in Amazon SageMaker JumpStart such as Llama 2, Stable Diffusion, and other models available on Amazon SageMaker. After you deploy this solution, users can get started quickly and experience the capabilities of multiple foundation models in conversational AI through a web interface.

Chat Studio can also optionally invoke the Stable Diffusion model endpoint to return a collage of relevant images and videos if the user requests media to be displayed. This feature can help enhance the user experience with the use of media as accompanying assets to the response. This is just one example of how you can enrich Chat Studio with additional integrations to meet your goals.

The following screenshots show examples of what a user query and response look like.

Chat Studio query interface

Chat Studio response interface

Large language models

Generative AI chatbots such as ChatGPT are powered by large language models (LLMs), which are based on a deep learning neural network that can be trained on large quantities of unlabeled text. The use of LLMs allows for a better conversational experience that closely resembles interactions with real humans, fostering a sense of connection and improved user satisfaction.

SageMaker foundation models

In 2021, the Stanford Institute for Human-Centered Artificial Intelligence termed some LLMs as foundation models. Foundation models are pre-trained on a large and broad set of general data and are meant to serve as the foundation for further optimizations in a wide range of use cases, from generating digital art to multilingual text classification. These foundation models are popular with customers because training a new model from scratch takes time and can be expensive. SageMaker JumpStart provides access to hundreds of foundation models maintained by third-party open source and proprietary providers.

Solution overview

This post walks through a low-code workflow for deploying pre-trained and custom LLMs through SageMaker, and creating a web UI to interface with the models deployed. We cover the following steps:

  1. Deploy SageMaker foundation models.
  2. Deploy AWS Lambda and AWS Identity and Access Management (IAM) permissions using AWS CloudFormation.
  3. Set up and run the user interface.
  4. Optionally, add other SageMaker foundation models. This step extends Chat Studio’s capability to interact with additional foundation models.
  5. Optionally, deploy the application using AWS Amplify. This step deploys Chat Studio to the web.

Refer to the following diagram for an overview of the solution architecture.

Chat Studio Solution Architecture

Prerequisites

To walk through the solution, you must have the following prerequisites:

  • An AWS account with sufficient IAM user privileges.
  • npm installed in your local environment. For instructions on how to install npm, refer to Downloading and installing Node.js and npm.
  • A service quota of 1 for the corresponding SageMaker endpoints. For Llama 2 13b Chat, we use an ml.g5.48xlarge instance and for Stable Diffusion 2.1, we use an ml.p3.2xlarge instance.

To request a service quota increase, on the AWS Service Quotas console, navigate to AWS services, SageMaker, and request a quota increase to a value of 1 for ml.g5.48xlarge for endpoint usage and ml.p3.2xlarge for endpoint usage.

The service quota request may take a few hours to be approved, depending on the instance type availability.

Deploy SageMaker foundation models

SageMaker is a fully managed machine learning (ML) service that helps developers quickly build and train ML models. Complete the following steps to deploy the Llama 2 13b Chat and Stable Diffusion 2.1 foundation models using Amazon SageMaker Studio (a programmatic alternative is sketched after this procedure):

  1. Create a SageMaker domain. For instructions, refer to Onboard to Amazon SageMaker Domain using Quick setup.

A domain sets up all the storage and allows you to add users to access SageMaker.

  2. On the SageMaker console, choose Studio in the navigation pane, then choose Open Studio.
  3. Upon launching Studio, under SageMaker JumpStart in the navigation pane, choose Models, notebooks, solutions.
    SageMaker JumpStart Console
  4. In the search bar, search for Llama 2 13b Chat.
  5. Under Deployment Configuration, for SageMaker hosting instance, choose ml.g5.48xlarge and for Endpoint name, enter meta-textgeneration-llama-2-13b-f.
  6. Choose Deploy.

SageMaker JumpStart Deployment Configuration

After the deployment succeeds, you should be able to see the In Service status.

Llama Model Status

  7. On the Models, notebooks, solutions page, search for Stable Diffusion 2.1.
  8. Under Deployment Configuration, for SageMaker hosting instance, choose ml.p3.2xlarge and for Endpoint name, enter jumpstart-dft-stable-diffusion-v2-1-base.
  9. Choose Deploy.

SageMaker JumpStart Deployment Configuration

After the deployment succeeds, you should be able to see the In Service status.

Stable Diffusion Model Status
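If you prefer to deploy the same two models programmatically instead of through the Studio UI, the following sketch uses the JumpStart classes in the SageMaker Python SDK. The model IDs shown are assumptions to verify against the JumpStart model cards, and deploying Llama 2 requires accepting the model EULA; keep the endpoint names identical to those above so Chat Studio can find them.

from sagemaker.jumpstart.model import JumpStartModel

# Llama 2 13b Chat (gated model; accept_eula must be set to True to deploy)
llama = JumpStartModel(model_id="meta-textgeneration-llama-2-13b-f")
llama.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.48xlarge",
    endpoint_name="meta-textgeneration-llama-2-13b-f",
    accept_eula=True,
)

# Stable Diffusion 2.1 base
sd = JumpStartModel(model_id="model-txt2img-stabilityai-stable-diffusion-v2-1-base")
sd.deploy(
    initial_instance_count=1,
    instance_type="ml.p3.2xlarge",
    endpoint_name="jumpstart-dft-stable-diffusion-v2-1-base",
)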

Deploy Lambda and IAM permissions using AWS CloudFormation

This section describes how to launch a CloudFormation stack that deploys a Lambda function to process your user requests and call the SageMaker endpoints that you deployed, along with all the necessary IAM permissions. Complete the following steps:

  1. Navigate to the GitHub repository and download the CloudFormation template (lambda.cfn.yaml) to your local machine.
  2. On the CloudFormation console, choose the Create stack drop-down menu and choose With new resources (standard).
  3. On the Specify template page, select Upload a template file and Choose file.
  4. Choose the lambda.cfn.yaml file that you downloaded, then choose Next.
  5. On the Specify stack details page, enter a stack name and the API key that you obtained in the prerequisites, then choose Next.
  6. On the Configure stack options page, choose Next.
  7. Review and acknowledge the changes and choose Submit.

Set up the web UI

This section describes the steps to run the web UI (created using Cloudscape Design System) on your local machine:

  1. On the IAM console, navigate to the user functionUrl.
  2. On the Security Credentials tab, choose Create access key.
  3. On the Access key best practices & alternatives page, select Command Line Interface (CLI) and choose Next.
  4. On the Set description tag page, choose Create access key.
  5. Copy the access key and secret access key.
  6. Choose Done.
  7. Navigate to the GitHub repository and download the react-llm-chat-studio code.
  8. Launch the folder in your preferred IDE and open a terminal.
  9. Navigate to src/configs/aws.json and input the access key and secret access key you obtained.
  10. Enter the following commands in the terminal:
    npm install
    
    npm start

  11. Open http://localhost:3000 in your browser and start interacting with your models!

To use Chat Studio, choose a foundation model from the drop-down menu and enter your query in the text box. To get AI-generated images along with the response, add the phrase “with images” to the end of your query.

Add other SageMaker foundation models

You can further extend the capability of this solution to include additional SageMaker foundation models. Because every model expects different input and output formats when invoking its SageMaker endpoint, you will need to write some transformation code in the callSageMakerEndpoints Lambda function to interface with the model.

This section describes the general steps and code changes required to implement an additional model of your choice. Note that basic knowledge of the Python language is required for Steps 6–8.

  1. In SageMaker Studio, deploy the SageMaker foundation model of your choice.
  2. Choose SageMaker JumpStart and Launch JumpStart assets.
  3. Choose your newly deployed model endpoint and choose Open Notebook.
  4. On the notebook console, find the payload parameters.

These are the fields that the new model expects when invoking its SageMaker endpoint. The following screenshot shows an example.

SageMaker Endpoint Configuration

  5. On the Lambda console, navigate to callSageMakerEndpoints.
  6. Add a custom input handler for your new model.

In the following screenshot, we show how the input was transformed for Falcon 40B Instruct BF16 and GPT NeoXT Chat Base 20B FP16. You can insert your custom parameter logic as indicated to add the input transformation for your new model, with reference to the payload parameters that you copied; an illustrative sketch follows the screenshot.

Lambda Code Snippet
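As an illustration only (the real handler in callSageMakerEndpoints may be structured differently), a custom input handler for a hypothetical new model could look like the following. The field names "inputs" and "parameters", the model name, and the endpoint name are assumptions standing in for the payload parameters you copied in Step 4.

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

def build_payload(model_name: str, prompt: str) -> bytes:
    # Transform the Chat Studio prompt into the JSON shape the new model expects
    if model_name == "my-new-model":  # hypothetical model name
        payload = {
            "inputs": prompt,
            "parameters": {"max_new_tokens": 256, "temperature": 0.7},
        }
        return json.dumps(payload).encode("utf-8")
    raise ValueError(f"No input handler registered for {model_name}")

# Example invocation against a hypothetical endpoint
response = runtime.invoke_endpoint(
    EndpointName="my-new-model-endpoint",
    ContentType="application/json",
    Body=build_payload("my-new-model", "Hello!"),
)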

  7. Return to the notebook console and locate query_endpoint.

This function gives you an idea of how to transform the output of the models to extract the final text response; an illustrative sketch follows the screenshot.

SageMaker Endpoint Configuration
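Again purely as an illustration, the matching output handler for the same hypothetical model might extract the generated text like this. The response shape (a list with a generated_text field) is an assumption; mirror whatever parsing the notebook’s query_endpoint function performs for your model.

import json

def parse_response(model_name: str, response_body: bytes) -> str:
    result = json.loads(response_body)
    if model_name == "my-new-model":        # hypothetical model name
        return result[0]["generated_text"]  # assumed response shape
    raise ValueError(f"No output handler registered for {model_name}")

# Example: text = parse_response("my-new-model", response["Body"].read())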

  8. With reference to the code in query_endpoint, add a custom output handler for your new model.
    Lambda Code
  9. Choose Deploy.
  10. Open your IDE, launch the react-llm-chat-studio code, and navigate to src/configs/models.json.
  11. Add your model name and model endpoint, and enter the payload parameters from Step 4 under payload using the following format:
    "add_model_name": {
    "endpoint_name": "add_model_endpoint",
    "payload": {
    "add_payload_parameters_here"
    }
    },

  12. Refresh your browser to start interacting with your new model!

Deploy the application using Amplify

AWS Amplify is a complete solution for quickly and efficiently deploying your application. This section describes the steps to deploy Chat Studio to an Amazon CloudFront distribution using Amplify if you want to share your application with other users.

  1. Navigate to the react-llm-chat-studio code folder you created earlier.
  2. Enter the following commands in the terminal and follow the setup instructions:
    npm install -g @aws-amplify/cli
    
    amplify configure

  3. Initialize a new Amplify project by using the following command. Provide a project name, accept the default configurations, and choose AWS access keys when prompted to select the authentication method.
    amplify init

  4. Host the Amplify project by using the following command. Choose Amazon CloudFront and S3 when prompted to select the plugin mode.
    amplify hosting add

  5. Finally, build and deploy the project with the following command:
    amplify publish

  6. After the deployment succeeds, open the URL provided in your browser and start interacting with your models!

Clean up

To avoid incurring future charges, complete the following steps:

  1. Delete the CloudFormation stack. For instructions, refer to Deleting a stack on the AWS CloudFormation console.
  2. Delete the SageMaker JumpStart endpoint. For instructions, refer to Delete Endpoints and Resources.
  3. Delete the SageMaker domain. For instructions, refer to Delete an Amazon SageMaker Domain.

Conclusion

In this post, we explained how to create a web UI for interfacing with LLMs deployed on AWS.

With this solution, you can interact with your LLM and hold a conversation in a user-friendly manner to test or ask the LLM questions, and get a collage of images and videos if required.

You can extend this solution in various ways, such as to integrate additional foundation models, integrate with Amazon Kendra to enable ML-powered intelligent search for understanding enterprise content, and more!

We invite you to experiment with different pre-trained LLMs available on AWS, or build on top of or even create your own LLMs in SageMaker. Let us know your questions and findings in the comments, and have fun!


About the authors

Jarrett Yeo Shan Wei is an Associate Cloud Architect in AWS Professional Services covering the Public Sector across ASEAN and is an advocate for helping customers modernize and migrate into the cloud. He has attained five AWS certifications, and has also published a research paper on gradient boosting machine ensembles in the 8th International Conference on AI. In his free time, Jarrett focuses on and contributes to the generative AI scene at AWS.

Tammy Lim Lee Xin is an Associate Cloud Architect at AWS. She uses technology to help customers deliver their desired outcomes in their cloud adoption journey and is passionate about AI/ML. Outside of work she loves travelling, hiking, and spending time with family and friends.

Read More

Frugality meets Accuracy: Cost-efficient training of GPT NeoX and Pythia models with AWS Trainium

Frugality meets Accuracy: Cost-efficient training of GPT NeoX and Pythia models with AWS Trainium

Large language models (LLMs) have become a topic of daily conversation. Their rapid adoption is evident in the time required to reach 100 million users, which has shrunk from 4.5 years for Facebook to an all-time low of just 2 months for ChatGPT. A generative pre-trained transformer (GPT) uses causal autoregressive updates to make predictions. These model architectures have demonstrated stupendous performance on a variety of tasks such as speech recognition, text generation, and question answering. Several recent models such as NeoX, Falcon, and Llama use the GPT architecture as a backbone. Training LLMs requires a colossal amount of compute time, which costs millions of dollars. In this post, we summarize the training procedure of GPT NeoX on AWS Trainium, a purpose-built machine learning (ML) accelerator optimized for deep learning training. We outline how we cost-effectively (3.2 M tokens/$) trained such models with AWS Trainium without losing any model quality.

Solution overview

GPT NeoX and Pythia models

GPT NeoX and Pythia are open-source causal language models from Eleuther-AI, with approximately 20 billion parameters in NeoX and 6.9 billion in Pythia. Both are decoder models following a similar architectural design to GPT-3. However, they also include several additions that are widely adopted in recent models such as Llama. In particular, they use rotary positional embedding (RoPE) with partial rotation across the head dimensions. The original models (NeoX and Pythia 6.9B) were trained on the openly available Pile dataset with deduplication, using the Megatron and DeepSpeed backends.

We demonstrate the pre-training and fine-tuning of these models on AWS Trainium-based Trn1 instances using the Neuron NeMo library. To establish the proof of concept and enable quick reproduction, we use a smaller Wikipedia dataset subset tokenized with the GPT2 byte-pair encoding (BPE) tokenizer.

Walkthrough

Download the pre-tokenized Wikipedia dataset as shown:

export DATA_DIR=~/examples_datasets/gpt2

mkdir -p ${DATA_DIR} && cd ${DATA_DIR}

wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-vocab.json
wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-merges.txt
aws s3 cp s3://neuron-s3/training_datasets/gpt/wikipedia/my-gpt2_text_document.bin . --no-sign-request
aws s3 cp s3://neuron-s3/training_datasets/gpt/wikipedia/my-gpt2_text_document.idx . --no-sign-request
aws s3 cp s3://neuron-s3/training_datasets/gpt/wikipedia/license.txt . --no-sign-request

Both NeoX 20B and Pythia 6.9B use RoPE with partial rotation, for example, rotating 25% of the head dimensions and keeping the rest unrotated. To efficiently implement the partial rotation on the AWS Trainium accelerator, instead of concatenating the rotating and non-rotating dimensions, we append zero frequencies for the non-rotating dimensions and then rotate the complete set of head dimensions. This simple trick helped us improve the throughput (sequences processed per second) on AWS Trainium.
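The following minimal NumPy sketch (not the Neuron NeMo implementation) illustrates why appending zero frequencies works: dimensions driven by a zero frequency see cos = 1 and sin = 0, so rotating the complete set of head dimensions leaves them unchanged, while the genuinely rotating dimensions are still rotated.

import numpy as np

def rotate_half(x):
    # Split the last dimension into two halves and rotate the pair: (x1, x2) -> (-x2, x1)
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate([-x2, x1], axis=-1)

def apply_rope(x, positions, inv_freq):
    # Standard rotary embedding: angle = position * frequency, duplicated across both halves
    angles = np.einsum("s,f->sf", positions, inv_freq)   # [seq, dim/2]
    angles = np.concatenate([angles, angles], axis=-1)   # [seq, dim]
    return x * np.cos(angles) + rotate_half(x) * np.sin(angles)

head_dim, rot_dim = 8, 2  # rotate 25% of the head dimensions
base = 10000.0
freqs = 1.0 / base ** (np.arange(0, rot_dim, 2) / rot_dim)                   # rotating dims
freqs_padded = np.concatenate([freqs, np.zeros((head_dim - rot_dim) // 2)])  # zeros appended

x = np.random.randn(4, head_dim)  # [seq, head_dim]
out = apply_rope(x, np.arange(4, dtype=float), freqs_padded)

# Zero-frequency dimensions pass through untouched (here: all dims except 0 and 4)
assert np.allclose(out[:, [1, 2, 3, 5, 6, 7]], x[:, [1, 2, 3, 5, 6, 7]])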

Training steps

To run the training, we use a SLURM-managed multi-node Amazon Elastic Compute Cloud (Amazon EC2) Trn1 cluster, with each node being a trn1.32xl instance. Each trn1.32xl has 16 accelerators with two workers per accelerator. After downloading the latest Neuron NeMo package, use the provided neox and pythia pre-training and fine-tuning scripts with optimized hyperparameters and execute the following for a four-node training run.

  1. Compile: Pre-compile the model with three train iterations to generate and save the graphs:
    sbatch --nodes 4 compile.slurm ./neoX_20B_slurm.sh

  2. Run: Execute the training by loading the cached graphs from the first step:
    sbatch --nodes 4 run.slurm ./neoX_20B_slurm.sh

  3. Monitor results
    tensorboard --logdir=nemo_experiments/megatron_neox

Follow the same steps to run the Pythia 6.9B model, replacing neoX_20B_slurm.sh with pythia_6.9B_slurm.sh.

Pre-training and fine-tuning experiments

We demonstrate the pre-training of the GPT NeoX and Pythia models on AWS Trainium using the Neuron NeMo library for 10k iterations, and also show fine-tuning of these models for 1k steps. For pre-training, we use the GPT2 BPE tokenizer inside NeMo and follow the same config as used in the original model. Fine-tuning on AWS Trainium requires changing a few parameters (such as the vocab size division factor), which are provided in the fine-tuning scripts to accommodate Megatron versus NeMo differences and GPU versus AWS Trainium changes. The multi-node distributed training throughput with a varying number of nodes is shown in Table 1.

| Model | Tensor Parallel | Pipeline Parallel | Number of instances | Cost ($/hour) | Sequence length | Global batch size | Throughput (seq/sec) | Cost-throughput ratio (tokens/$) |
|---|---|---|---|---|---|---|---|---|
| Pythia 6.9B | 8 | 1 | 1 | 7.59 | 2048 | 256 | 10.4 | 10,102,387 |
| Pythia 6.9B | 8 | 1 | 4 | 30.36 | 2048 | 256 | 35.8 | 8,693,881 |
| NeoX 20B | 8 | 4 | 4 | 30.36 | 2048 | 16384 | 13.60 | 3,302,704 |
| NeoX 20B | 8 | 4 | 8 | 60.72 | 2048 | 16384 | 26.80 | 3,254,134 |
| NeoX 20B | 8 | 4 | 16 | 121.44 | 2048 | 16384 | 54.30 | 3,296,632 |
| NeoX 20B | 8 | 4 | 32 | 242.88 | 2048 | 16384 | 107.50 | 3,263,241 |
| NeoX 20B | 8 | 4 | 64 | 485.76 | 2048 | 16384 | 212.00 | 3,217,708 |

Table 1. Comparing mean throughput of GPT NeoX and Pythia models for training up to 500 steps with a changing number of nodes. The pricing of trn1.32xl is based on the 3-year reserved effective per-hour rate.

Next, we also evaluate the loss trajectory of the model training on AWS Trainium and compare it with the corresponding run on a P4d (Nvidia A100 GPU) cluster. Along with the training loss, we also compare a useful indicator, the gradient norm, which is the 2-norm of the model gradients computed at each training iteration to monitor training progress. The training results are shown in Figures 1 and 2, and the fine-tuning of NeoX 20B in Figure 3.


Figure 1. Training loss averaged across all workers (left) and gradient norm (right) at each training step. NeoX 20B is trained on 4 nodes with a small wiki dataset on GPU and Trainium with the same training hyperparameters (global batch size=256). The GPU run uses BF16 and default mixed precision, while AWS Trainium uses full BF16 with stochastic rounding. The loss and gradient norm trajectories match for GPU and AWS Trainium.


Figure 2. Training loss averaged across all workers (left) and gradient norm (right) at each training step. Similar to GPT NeoX in Figure 1, Pythia 6.9B is trained on 4 nodes with a small wiki dataset on GPU and Trainium with the same training hyperparameters (global batch size=256). The loss and gradient norm trajectories match for GPU and Trainium.


Figure 3. Fine-tuning of the GPT NeoX 20B model on GPU and AWS Trainium, with training loss averaged across all workers (left) and gradient norm (right). A small wiki dataset is used for the fine-tuning demonstration. The loss and gradient norm trajectories match for GPU and AWS Trainium.

In this post, we showed cost-efficient training of LLMs on AWS deep learning hardware. We trained the GPT NeoX 20B and Pythia 6.9B models on AWS Trn1 with the Neuron NeMo library. The cost-normalized throughput for the 20-billion-parameter model with AWS Trainium is around 3.2M tokens per dollar spent. Along with cost-efficient training on AWS Trainium, we obtain similar model accuracy, which is evident from the training step loss and gradient norm trajectories. We also fine-tuned the available checkpoints for the NeoX 20B model on AWS Trainium. For additional information on distributed training with NeMo Megatron on AWS Trainium, see AWS Neuron Reference for NeMo Megatron. A good resource to get started with fine-tuning Llama models is Llama2 fine-tuning. To get started with managed AWS Trainium on Amazon SageMaker, see Train your ML Models with AWS Trainium and Amazon SageMaker.


About the Authors

Gaurav Gupta is currently an Applied Scientist at Amazon Web Services (AWS) AI labs. Dr. Gupta completed his PhD from USC Viterbi. His research interests span the domain of sequential data modeling, learning partial differential equations, information theory for machine learning, fractional dynamical models, and complex networks. He is currently working on applied and mathematical problems on LLMs training behavior, vision models with PDEs, information-theoretic multi-modality models. Dr. Gupta has publications in top journals/conferences such as Neurips, ICLR, ICML, Nature, IEEE Control Society, ACM cyber-physical society.

Ben Snyder is an applied scientist with AWS Deep Learning. His research interests include foundational models, reinforcement learning, and asynchronous optimization. Outside of work, he enjoys cycling and backcountry camping.

Amith (R) Mamidala is a senior machine learning application engineer at AWS Annapurna Labs. Dr. Mamidala completed his PhD at the Ohio State University in high-performance computing and communication. During his tenure at IBM Research, Dr. Mamidala contributed to the BlueGene class of computers, which often led the Top500 ranking of the most powerful and power-efficient supercomputers. The project was awarded the 2009 National Medal of Technology and Innovation. After a brief stint as an AI engineer at a financial hedge fund, Dr. Mamidala joined Annapurna Labs, focusing on large language model training.

Jun (Luke) Huan is a principal scientist at AWS AI Labs. Dr. Huan works on AI and Data Science. He has published more than 180 peer-reviewed papers in leading conferences and journals. He was a recipient of the NSF Faculty Early Career Development Award in 2009. Before joining AWS, he worked at Baidu research as a distinguished scientist and the head of Baidu Big Data Laboratory. He founded StylingAI Inc., an AI start-up, and worked as the CEO and Chief Scientist in 2019-2021. Before joining industry, he was the Charles E. and Mary Jane Spahr Professor in the EECS Department at the University of Kansas.

Shruti Koparkar is a Senior Product Marketing Manager at AWS. She helps customers explore, evaluate, and adopt Amazon EC2 accelerated computing infrastructure for their machine learning needs.

Read More

Vodafone advances its machine learning skills with AWS DeepRacer and Accenture

Vodafone advances its machine learning skills with AWS DeepRacer and Accenture

Vodafone is transitioning from a telecommunications company (telco) to a technology company (TechCo) by 2025, with objectives of innovating faster, reducing costs, improving security, and simplifying operations. Thousands of engineers are being onboarded to contribute to this transition. By 2025, Vodafone plans to have 50% of its global workforce actively involved in software development, with an objective to deliver 60% of digital services in-house. This new workforce requires rapid reskilling and understanding of disruptive services such as artificial intelligence (AI) and machine learning (ML) to drive meaningful outcomes.

To help achieve this ambitious transition, Vodafone has partnered with Accenture and AWS to build a cloud platform that helps its engineers work in flexible, creative, and agile ways by providing them with a curated set of managed, security- and DevOps-oriented AWS services and application workloads. To learn more, check out Redefining Vodafone’s customer experience with AWS and the following talk at AWS re:Invent 2022.

Vodafone Digital engineering (VDE) invited Accenture and AWS to co-host an exclusive event at their annual DigiFest, a week-long event celebrating the scale of their global VDE teams, championing reusable apps and collaborative idea generation. As one of the main events of the DigiFest, AWS and Accenture conceptualized a company-wide AWS DeepRacer challenge where engineers can build and train their models to become better versed in using ML with AWS.

In this post, we share how Vodafone is advancing its ML skills using AWS DeepRacer and Accenture.

Why is machine learning important to Vodafone?

Machine learning is one of the fastest growing domains in technology and telecommunications, owing to the benefits of improved productivity and forecasting across key domains in telecommunications such as channels, CRM, billing, order management, service assurance, network management, and more.

Vodafone has already adopted ML in the proactive detection and correction of network anomalies to improve customer satisfaction. Their AI and ML capabilities in digital self-care, via a chatbot, have been helping their customer care team focus on cases that need deeper attention. Because they use AWS for providing digital services packaged as telco as a service, incorporating AI and ML components is crucial to maintain a competitive edge in delivering state-of-the-art services to customers.

Why AWS DeepRacer?

AWS DeepRacer is an interesting and fun way to get started with reinforcement learning (RL). RL is an advanced ML technique that takes a very different approach to training models than other ML methods. Its super power is that it learns very complex behaviors without requiring any labeled training data, and can make short-term decisions while optimizing for a longer-term goal. The AWS DeepRacer Challenge provided an opportunity for Vodafone’s engineers to engage in a friendly competition, develop an ML mindset, and share insights on how to succeed in a private virtual racing event.

Racing with AWS DeepRacer

The event played out in three stages, starting with a workshop on AWS DeepRacer to cover the basics of reinforcement learning, which was attended by over 225 Vodafone engineers. They learned how to fine-tune an AWS DeepRacer model by creating a reward function, exploring the action space, systematically tuning hyperparameters, examining the training job progress, evaluating the model, and testing the model on a virtual AWS DeepRacer vehicle and virtual track.

In the next stage, a league race was organized where 130 racers could view the race videos of the best model submission of every participant on a live leaderboard. This helped them understand how a high-performing model behaves after it’s trained. They quickly learned that overtraining (training a model for too long) leads to overfitting, which in turn leads to underperformance in a new environment. They also experimented with different styles of reward functions, such as follow the center line (a minimal example follows this paragraph), an excessive steering penalty, a slowness penalty, and progress rewards.
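For readers unfamiliar with the format, the following is a minimal reward function in the style of the well-known follow-the-center-line example from the AWS DeepRacer documentation; the functions participants actually wrote are not shown in this post.

def reward_function(params):
    # Reward staying close to the center line of the track
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # Markers at increasing distances from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely off track

    return float(reward)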

The event culminated with a grand finale, a showdown of 11 racers who tuned their models one final time to compete in a live race with commentary. All 11 racers completed a full lap with their models. Eight racers had a lap time of less than 15 seconds, with the winner coming in with an incredible lap time of 11.194 seconds on the tricky Toronto Turnpike virtual race track.

Summary

The goal of the AWS DeepRacer Challenge was to build awareness and excitement about ML on AWS for a global cloud engineering audience with varying technology skills and competencies. The tournament drew more than 585 registrations across the globe, with over 400 models submitted and over 600 hours of training and evaluation.

Vodafone was able to help a broad range of builders get hands-on with ML through the AWS DeepRacer Challenge. With over 47% of participants being AWS and ML beginners, the event reaffirmed how effective AWS DeepRacer can be at introducing ML on AWS in a safe and engaging environment.

“Having the Digital Engineering team attend events like DigiFest and participate in challenges like AWS DeepRacer is a huge part of our vision of building a world-class software engineering team in Vodafone. As we take on the complex challenge of transforming a telecommunications company into a technology company, growing our skillset becomes a top priority and our partnership with Accenture and AWS has provided the team with not just this, but multiple opportunities to learn and develop. I am excited for more of this to come!”

Ben Connolly, Vodafone Global Director of Cloud Engineering


About the Author

Ramakrishna Natarajan is a Senior Partner Solutions Architect at Amazon Web Services. He is based out of London and helps AWS Partners find optimal solutions on AWS for their customers. He specialises in Telecommunications OSS/BSS and has a keen interest in evolving domains such as AI/ML, Data Analytics, Security and Modernisation. He enjoys playing squash, going on long hikes and learning new languages.

Read More

Getir end-to-end workforce management: Amazon Forecast and AWS Step Functions

Getir end-to-end workforce management: Amazon Forecast and AWS Step Functions

This is a guest post co-authored by Nafi Ahmet Turgut, Mehmet İkbal Özmen, Hasan Burak Yel, Fatma Nur Dumlupınar Keşir, Mutlu Polatcan and Emre Uzel from Getir.

Getir is the pioneer of ultrafast grocery delivery. The technology company has revolutionized last-mile delivery with its grocery in-minutes delivery proposition. Getir was founded in 2015 and operates in Turkey, the UK, the Netherlands, Germany, and the United States. Today, Getir is a conglomerate incorporating nine verticals under the same brand.

In this post, we describe the end-to-end workforce management system that begins with location-specific demand forecast, followed by courier workforce planning and shift assignment using Amazon Forecast and AWS Step Functions.

In the past, operational teams engaged in manual workforce management practices, which resulted in a significant waste of time and effort. However, with the implementation of our comprehensive end-to-end workforce management project, they are now able to efficiently generate the necessary courier plans for warehouses through a simplified, one-click process accessible via a web interface. Before the initiation of this project, business teams relied on more intuitive methods for demand forecasting, which required improvement in terms of precision.

Amazon Forecast is a fully managed service that uses machine learning (ML) algorithms to deliver highly accurate time series forecasts. In this post, we describe how we reduced modelling time by 70% by doing the feature engineering and modelling using Amazon Forecast. We achieved a 90% reduction in elapsed time when running scheduling algorithms for all warehouses using AWS Step Functions, which is a fully managed service that makes it easier to coordinate the components of distributed applications and microservices using visual workflows. This solution also achieved a prediction accuracy of around 90% across Turkey and several European countries.

Solution overview

The End-to-End Workforce Management Project (E2E Project) is a large-scale project that can be described in three parts:

1. Calculating courier requirements

The first step is to estimate hourly demand for each warehouse, as explained in the Algorithm selection section. These predictions, produced with Amazon Forecast, help determine when and how many couriers each warehouse needs.

Based on the throughput ratio of the couriers in each warehouse, the number of couriers required is calculated in hourly intervals (a toy illustration follows this paragraph). These calculations help determine feasible courier counts while considering legal working hours, which involves mathematical modeling.
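As a toy illustration of this calculation (the numbers and formula are placeholders, not Getir’s actual ratios), the hourly courier requirement can be derived by dividing the forecast order count by a courier’s hourly throughput and rounding up:

import math

forecast_orders_per_hour = 53    # example value from the demand forecast
orders_per_courier_per_hour = 6  # example courier throughput ratio

couriers_needed = math.ceil(forecast_orders_per_hour / orders_per_courier_per_hour)
print(couriers_needed)  # 9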

2. Solving the shift assignment problem

Once we have the courier requirements and know the other constraints of the couriers and warehouses, we can solve the shift assignment problem. The problem is modelled with decision variables that determine which couriers are assigned and what shift schedules are created, minimizing the surplus and shortage that may cause missed orders. This is typically a mixed-integer programming (MIP) problem; a simplified sketch follows.
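The following is a deliberately simplified sketch of such a model using the open-source PuLP library, not Getir’s production formulation: binary variables assign couriers to candidate shifts, and slack variables capture the hourly surplus and shortage that the objective minimizes. All names, shifts, and requirements are placeholders.

import pulp

hours = list(range(8, 18))                               # planning horizon (hours of day)
couriers = ["c1", "c2", "c3"]
shifts = {"early": range(8, 13), "late": range(13, 18)}  # candidate shift patterns
required = {h: 2 for h in hours}                         # couriers needed per hour (from the forecast)

prob = pulp.LpProblem("shift_assignment", pulp.LpMinimize)

# x[(c, s)] = 1 if courier c works shift s
x = pulp.LpVariable.dicts("assign", [(c, s) for c in couriers for s in shifts], cat="Binary")
# Slack variables capture hourly over- and under-coverage
surplus = pulp.LpVariable.dicts("surplus", hours, lowBound=0)
shortage = pulp.LpVariable.dicts("shortage", hours, lowBound=0)

# Objective: minimize total surplus and shortage across the day
prob += pulp.lpSum(surplus[h] + shortage[h] for h in hours)

# Coverage balance per hour: staffed couriers plus shortage minus surplus equals the requirement
for h in hours:
    staffed = pulp.lpSum(x[(c, s)] for c in couriers for s in shifts if h in shifts[s])
    prob += staffed + shortage[h] - surplus[h] == required[h]

# Each courier works at most one shift (a stand-in for legal working-hour constraints)
for c in couriers:
    prob += pulp.lpSum(x[(c, s)] for s in shifts) <= 1

prob.solve()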

3. Utilizing AWS Step Functions

We use AWS Step Functions to coordinate and manage workflows, taking advantage of its capability to run jobs in parallel. Each warehouse’s shift assignment process is defined as a separate workflow. Step Functions automatically initiates and monitors these workflows and simplifies error handling.

Since this process requires extensive data and complex computations, services like AWS Step Functions offer a significant advantage in organizing and optimizing tasks. It allows for better control and efficient resource management.

In the solution architecture, we also take advantage of other AWS services by integrating them into the Step Functions workflows.

The following diagrams show AWS Step Functions workflows and architecture of the shifting tool:

Figure 1 AWS Step Functions workflows

Figure 2 Shifting tool architecture

Algorithm selection

Forecasting locational demand constitutes the initial phase in the E2E project. The overarching goal of E2E is to determine the number of couriers to allocate to a specific warehouse, commencing with a forecast of the demand for that warehouse.

This forecasting component is pivotal within the E2E framework, as subsequent phases rely on these forecasting outcomes. Thus, any prediction inaccuracies can detrimentally impact the entire project’s efficacy.

The objective of the locational demand forecast phase is to generate country-specific predictions for every warehouse, segmented hourly, over the forthcoming two weeks. Initially, daily forecasts for each country are produced by ML models; these daily predictions are then broken down into hourly segments. Historic transactional demand data, location-based weather information, holiday dates, promotions, and marketing campaign data are the features used in the model, as shown in the following figure.

Figure 3 The architecture of location-specific forecasting

The team initially explored traditional forecasting techniques such as open-source SARIMA (Seasonal Auto-Regressive Integrated Moving Average), ARIMAX (Auto-Regressive Integrated Moving Average using exogenous variables), and Exponential Smoothing.

ARIMA (Auto-Regressive Integrated Moving Average) is a time series forecasting method that combines autoregressive (AR) and moving average (MA) components along with differencing to make the time series stationary.

SARIMA extends ARIMA by incorporating additional parameters to account for seasonality in the time series. It includes seasonal auto-regressive and seasonal moving average terms to capture repeating patterns over specific intervals, making it suitable for time series with a seasonal component.

ARIMAX builds upon ARIMA by introducing exogenous variables, which are external factors that can influence the time series. These additional variables are considered in the model to improve forecasting accuracy by accounting for external influences beyond the historical values of the time series.

Exponential Smoothing is another time series forecasting method that, unlike ARIMA, is based on weighted averages of past observations. It is particularly effective for capturing trends and seasonality in data. The method assigns exponentially decreasing weights to past observations, with more recent observations receiving higher weights.
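For illustration, fitting one of these traditional models (SARIMAX, which covers both the seasonal and exogenous-variable cases) with the open-source statsmodels library might look like the following sketch; the hourly series and the temperature feature are synthetic placeholders, not Getir’s data.

import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic hourly demand with a daily seasonal pattern and one exogenous feature
rng = pd.date_range("2023-01-01", periods=24 * 28, freq="H")
temperature = 15 + 5 * np.sin(np.arange(len(rng)) * 2 * np.pi / 24)
demand = (50 + 10 * np.sin(np.arange(len(rng)) * 2 * np.pi / 24)
          + 0.5 * temperature + np.random.normal(0, 2, len(rng)))
y = pd.Series(demand, index=rng)
X = pd.DataFrame({"temperature": temperature}, index=rng)

# SARIMAX = ARIMA + seasonal terms (period of 24 hours) + exogenous regressors
model = SARIMAX(y, exog=X, order=(1, 0, 1), seasonal_order=(1, 0, 1, 24))
result = model.fit(disp=False)

# Forecast the next 24 hours, supplying future values of the exogenous feature
future_index = pd.date_range(rng[-1] + pd.Timedelta(hours=1), periods=24, freq="H")
future_X = pd.DataFrame({"temperature": temperature[:24]}, index=future_index)
forecast = result.forecast(steps=24, exog=future_X)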

Amazon Forecast was eventually selected for the algorithmic modeling segment. The vast array of models and the sophisticated feature engineering capabilities offered by Amazon Forecast proved more advantageous and optimized our resource utilization.

Six algorithms available in Forecast were tested: Convolutional Neural Network – Quantile Regression (CNN-QR), DeepAR+, Prophet, Non-Parametric Time Series (NPTS), Autoregressive Integrated Moving Average (ARIMA), and Exponential Smoothing (ETS). Upon analysis of the forecast results, we determined that CNN-QR surpassed the others in efficacy. CNN-QR is a proprietary ML algorithm developed by Amazon for forecasting scalar (one-dimensional) time series using causal Convolutional Neural Networks (CNNs). Given the availability of diverse data sources at this juncture, employing the CNN-QR algorithm facilitated the integration of various features, operating within a supervised learning framework. This distinction separated it from univariate time-series forecasting models and markedly enhanced performance.

Utilizing Forecast proved effective due to the simplicity of providing the requisite data and specifying the forecast duration. Subsequently, Forecast employs the CNN-QR algorithm to generate predictions. This tool significantly expedited the process for our team, particularly in algorithmic modeling. Furthermore, utilizing Amazon Simple Storage Service (Amazon S3) buckets for input data repositories and Amazon Redshift for storing outcomes has facilitated centralized management of the entire procedure.
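To make the workflow concrete, the following is a minimal boto3 sketch of creating a CNN-QR predictor, assuming a Forecast dataset group containing the demand and related feature datasets already exists; the names, ARN placeholder, and horizon are illustrative, not Getir’s configuration.

import boto3

forecast = boto3.client("forecast")

response = forecast.create_predictor(
    PredictorName="warehouse_demand_cnn_qr",
    AlgorithmArn="arn:aws:forecast:::algorithm/CNN-QR",
    ForecastHorizon=14,  # days of daily forecasts, later broken down into hours
    PerformAutoML=False,
    InputDataConfig={"DatasetGroupArn": "<dataset-group-arn>"},
    FeaturizationConfig={"ForecastFrequency": "D"},
)
predictor_arn = response["PredictorArn"]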

Conclusion

In this post, we showed how Getir’s E2E project combines Amazon Forecast and AWS Step Functions to streamline complex processes effectively. We achieved an impressive prediction accuracy of around 90% across countries in Europe and Turkey, and using Forecast reduced modeling time by 70% thanks to its efficient handling of feature engineering and modeling.

Using AWS Step Functions has led to practical advantages, notably reducing scheduling time by 90% for all warehouses. Also, by considering field requirements, we improved compliance rates by 3%, helping allocate the workforce more efficiently. This, in turn, highlights the project’s success in optimizing operations and service delivery.

To access further details on commencing your journey with Forecast, please refer to the available Amazon Forecast resources. Additionally, for insights on constructing automated workflows and crafting machine learning pipelines, you can explore AWS Step Functions for comprehensive guidance.


About the Authors

Nafi Ahmet Turgut finished his master’s degree in electrical & Electronics Engineering and worked as graduate research scientist. His focus was building machine learning algorithms to simulate nervous network anomalies. He joined Getir in 2019 and currently works as a Senior Data Science & Analytics Manager. His team is responsible for designing, implementing, and maintaining end-to-end machine learning algorithms and data-driven solutions for Getir.

Mehmet İkbal Özmen received his Master’s Degree in Economics and worked as Graduate Research Assistant. His research area was mainly economic time series models, Markov simulations, and recession forecasting. He then joined Getir in 2019 and currently works as Data Science & Analytics Manager. His team is responsible for optimization and forecast algorithms to solve the complex problems experienced by the operation and supply chain businesses.

Hasan Burak Yel received his Bachelor’s Degree in Electrical & Electronics Engineering at Boğaziçi University. He worked at Turkcell, mainly focused on time series forecasting, data visualization, and network automation. He joined Getir in 2021 and currently works as a Data Science & Analytics Manager with the responsibility of Search, Recommendation, and Growth domains.

Fatma Nur Dumlupınar Keşir received her Bachelor’s Degree from Industrial Engineering Department at Boğaziçi University. She worked as a researcher at TUBITAK, focusing on time series forecasting & visualization. She then joined Getir in 2022 as a data scientist and has worked on Recommendation Engine projects, Mathematical Programming for Workforce Planning.

Emre Uzel received his Master’s Degree in Data Science from Koç University. He worked as a data science consultant at Eczacıbaşı Bilişim where he mainly focused on recommendation engine algorithms. He joined Getir in 2022 as a Data Scientist and started working on time-series forecasting and mathematical optimization projects.

Mutlu Polatcan is a Staff Data Engineer at Getir, specializing in designing and building cloud-native data platforms. He loves combining open-source projects with cloud services.

Esra Kayabalı is a Senior Solutions Architect at AWS, specializing in the analytics domain including data warehousing, data lakes, big data analytics, batch and real-time data streaming and data integration. She has 12 years of software development and architecture experience. She is passionate about learning and teaching cloud technologies.

Read More