Create a web UI to interact with LLMs using Amazon SageMaker JumpStart

The launch of ChatGPT and rise in popularity of generative AI have captured the imagination of customers who are curious about how they can use this technology to create new products and services on AWS, such as enterprise chatbots, which are more conversational. This post shows you how you can create a web UI, which we call Chat Studio, to start a conversation and interact with foundation models available in Amazon SageMaker JumpStart such as Llama 2, Stable Diffusion, and other models available on Amazon SageMaker. After you deploy this solution, users can get started quickly and experience the capabilities of multiple foundation models in conversational AI though a web interface.

Chat Studio can also optionally invoke the Stable Diffusion model endpoint to return a collage of relevant images and videos if the user requests for media to be displayed. This feature can help enhance the user experience with the use of media as accompanying assets to the response. This is just one example of how you can enrich Chat Studio with additional integrations to meet your goals.

The following screenshots show examples of what a user query and response look like.

Large language models

Generative AI chatbots such as ChatGPT are powered by large language models (LLMs), which are based on a deep learning neural network that can be trained on large quantities of unlabeled text. The use of LLMs allows for a better conversational experience that closely resembles interactions with real humans, fostering a sense of connection and improved user satisfaction.

SageMaker foundation models

In 2021, the Stanford Institute for Human-Centered Artificial Intelligence termed some LLMs as foundation models. Foundation models are pre-trained on a large and broad set of general data and are meant to serve as the foundation for further optimizations in a wide range of use cases, from generating digital art to multilingual text classification. These foundation models are popular with customers because training a new model from scratch takes time and can be expensive. SageMaker JumpStart provides access to hundreds of foundation models maintained from third-party open source and proprietary providers.

Solution overview

This post walks through a low-code workflow for deploying pre-trained and custom LLMs through SageMaker, and creating a web UI to interface with the models deployed. We cover the following steps:

Deploy SageMaker foundation models.
Deploy AWS Lambda and AWS Identity and Access Management (IAM) permissions using AWS CloudFormation.
Set up and run the user interface.
Optionally, add other SageMaker foundation models. This step extends Chat Studio’s capability to interact with additional foundation models.
Optionally, deploy the application using AWS Amplify. This step deploys Chat Studio to the web.

Refer to the following diagram for an overview of the solution architecture.

Prerequisites

To walk through the solution, you must have the following prerequisites:

An AWS account with sufficient IAM user privileges.
npm installed in your local environment. For instructions on how to install npm, refer to Downloading and installing Node.js and npm.
A service quota of 1 for the corresponding SageMaker endpoints. For Llama 2 13b Chat, we use an ml.g5.48xlarge instance and for Stable Diffusion 2.1, we use an ml.p3.2xlarge instance.

To request a service quota increase, on the AWS Service Quotas console, navigate to AWS services, SageMaker, and request for a service quota raise to a value of 1 for ml.g5.48xlarge for endpoint usage and ml.p3.2xlarge for endpoint usage.

The service quota request may take a few hours to be approved, depending on the instance type availability.

Deploy SageMaker foundation models

SageMaker is a fully managed machine learning (ML) service for developers to quickly build and train ML models with ease. Complete the following steps to deploy the Llama 2 13b Chat and Stable Diffusion 2.1 foundation models using Amazon SageMaker Studio:

Create a SageMaker domain. For instructions, refer to Onboard to Amazon SageMaker Domain using Quick setup.

A domain sets up all the storage and allows you to add users to access SageMaker.

On the SageMaker console, choose Studio in the navigation pane, then choose Open Studio.
Upon launching Studio, under SageMaker JumpStart in the navigation pane, choose Models, notebooks, solutions.
In the search bar, search for Llama 2 13b Chat.
Under Deployment Configuration, for SageMaker hosting instance, choose ml.g5.48xlarge and for Endpoint name, enter meta-textgeneration-llama-2-13b-f.
Choose Deploy.

After the deployment succeeds, you should be able to see the In Service status.

On the Models, notebooks, solutions page, search for Stable Diffusion 2.1.
Under Deployment Configuration, for SageMaker hosting instance, choose ml.p3.2xlarge and for Endpoint name, enter jumpstart-dft-stable-diffusion-v2-1-base.
Choose Deploy.

After the deployment succeeds, you should be able to see the In Service status.

Deploy Lambda and IAM permissions using AWS CloudFormation

This section describes how you can launch a CloudFormation stack that deploys a Lambda function that processes your user request and calls the SageMaker endpoint that you deployed, and deploys all the necessary IAM permissions. Complete the following steps:

Navigate to the GitHub repository and download the CloudFormation template (lambda.cfn.yaml) to your local machine.
On the CloudFormation console, choose the Create stack drop-down menu and choose With new resources (standard).
On the Specify template page, select Upload a template file and Choose file.
Choose the lambda.cfn.yaml file that you downloaded, then choose Next.
On the Specify stack details page, enter a stack name and the API key that you obtained in the prerequisites, then choose Next.
On the Configure stack options page, choose Next.
Review and acknowledge the changes and choose Submit.

Set up the web UI

This section describes the steps to run the web UI (created using Cloudscape Design System) on your local machine:

On the IAM console, navigate to the user functionUrl.
On the Security Credentials tab, choose Create access key.
On the Access key best practices & alternatives page, select Command Line Interface (CLI) and choose Next.
On the Set description tag page, choose Create access key.
Copy the access key and secret access key.
Choose Done.
Navigate to the GitHub repository and download the react-llm-chat-studio code.
Launch the folder in your preferred IDE and open a terminal.
Navigate to src/configs/aws.json and input the access key and secret access key you obtained.
Enter the following commands in the terminal:
```
npm install

npm start
```
Open http://localhost:3000 in your browser and start interacting with your models!

To use Chat Studio, choose a foundational model on the drop-down menu and enter your query in the text box. To get AI-generated images along with the response, add the phrase “with images” to the end of your query.

Add other SageMaker foundation models

You can further extend the capability of this solution to include additional SageMaker foundation models. Because every model expects different input and output formats when invoking its SageMaker endpoint, you will need to write some transformation code in the callSageMakerEndpoints Lambda function to interface with the model.

This section describes the general steps and code changes required to implement an additional model of your choice. Note that basic knowledge of Python language is required for Steps 6–8.

In SageMaker Studio, deploy the SageMaker foundation model of your choice.
Choose SageMaker JumpStart and Launch JumpStart assets.
Choose your newly deployed model endpoint and choose Open Notebook.
On the notebook console, find the payload parameters.

These are the fields that the new model expects when invoking its SageMaker endpoint. The following screenshot shows an example.

On the Lambda console, navigate to callSageMakerEndpoints.
Add a custom input handler for your new model.

In the following screenshot, we transformed the input for Falcon 40B Instruct BF16 and GPT NeoXT Chat Base 20B FP16. You can insert your custom parameter logic as indicated to add the input transformation logic with reference to the payload parameters that you copied.

Return to the notebook console and locate query_endpoint.

This function gives you an idea how to transform the output of the models to extract the final text response.

With reference to the code in query_endpoint, add a custom output handler for your new model.
Choose Deploy.
Open your IDE, launch the react-llm-chat-studio code, and navigate to src/configs/models.json.
Add your model name and model endpoint, and enter the payload parameters from Step 4 under payload using the following format:
```
"add_model_name": {
"endpoint_name": "add_model_enpoint",
"payload": {
"add_payload_paramters_here"
}
},
```
Refresh your browser to start interacting with your new model!

Deploy the application using Amplify

Amplify is a complete solution that allows you to quickly and efficiently deploy your application. This section describes the steps to deploy Chat Studio to an Amazon CloudFront distribution using Amplify if you wish to share your application with other users.

Navigate to the react-llm-chat-studio code folder you created earlier.
Enter the following commands in the terminal and follow the setup instructions:
```
npm install -g @aws-amplify/cli

amplify configure
```
Initialize a new Amplify project by using the following command. Provide a project name, accept the default configurations, and choose AWS access keys when prompted to select the authentication method.
```
amplify init
```
Host the Amplify project by using the following command. Choose Amazon CloudFront and S3 when prompted to select the plugin mode.
```
amplify hosting add
```
Finally, build and deploy the project with the following command:
```
amplify publish
```
After the deployment succeeds, open the URL provided in your browser and start interacting with your models!

Clean up

To avoid incurring future charges, complete the following steps:

Delete the CloudFormation stack. For instructions, refer to Deleting a stack on the AWS CloudFormation console.
Delete the SageMaker JumpStart endpoint. For instructions, refer to Delete Endpoints and Resources.
Delete the SageMaker domain. For instructions, refer to Delete an Amazon SageMaker Domain.

Conclusion

In this post, we explained how to create a web UI for interfacing with LLMs deployed on AWS.

With this solution, you can interact with your LLM and hold a conversation in a user-friendly manner to test or ask the LLM questions, and get a collage of images and videos if required.

You can extend this solution in various ways, such as to integrate additional foundation models, integrate with Amazon Kendra to enable ML-powered intelligent search for understanding enterprise content, and more!

We invite you to experiment with different pre-trained LLMs available on AWS, or build on top of or even create your own LLMs in SageMaker. Let us know your questions and findings in the comments, and have fun!

About the authors

Jarrett Yeo Shan Wei is an Associate Cloud Architect in AWS Professional Services covering the Public Sector across ASEAN and is an advocate for helping customers modernize and migrate into the cloud. He has attained five AWS certifications, and has also published a research paper on gradient boosting machine ensembles in the 8th International Conference on AI. In his free time, Jarrett focuses on and contributes to the generative AI scene at AWS.

Tammy Lim Lee Xin is an Associate Cloud Architect at AWS. She uses technology to help customers deliver their desired outcomes in their cloud adoption journey and is passionate about AI/ML. Outside of work she loves travelling, hiking, and spending time with family and friends.

Vedere AI