At this year’s ACL, Amazon researchers won an outstanding-paper award for showing that knowledge distillation using contrastive decoding in the teacher model and counterfactual reasoning in the student model improves the consistency of “chain of thought” reasoning.
Enel automates large-scale power grid asset management and anomaly detection using Amazon SageMaker
This is a guest post by Mario Namtao Shianti Larcher, Head of Computer Vision at Enel.
Enel, which started as Italy’s national entity for electricity, is today a multinational company present in 32 countries and the leading private network operator in the world, with 74 million users. It is also recognized as the leading renewables player, with 55.4 GW of installed capacity. In recent years, the company has invested heavily in the machine learning (ML) sector, developing strong in-house know-how that has enabled it to realize very ambitious projects such as automatic monitoring of its 2.3 million kilometers of distribution network.
Every year, Enel inspects its electricity distribution network with helicopters, cars, or other means; takes millions of photographs; and reconstructs a 3D image of its network as a point cloud, obtained using LiDAR technology.
Examination of this data is critical for monitoring the state of the power grid, identifying infrastructure anomalies, and updating databases of installed assets, and it allows granular control of the infrastructure down to the material and status of the smallest insulator installed on a given pole. Given the amount of data (more than 40 million images each year just in Italy), the number of items to be identified, and their specificity, a completely manual analysis is very costly, both in terms of time and money, and error prone. Fortunately, thanks to enormous advances in the world of computer vision and deep learning and the maturity and democratization of these technologies, it’s possible to automate this expensive process partially or even completely.
Of course, the task remains very challenging, and, like all modern AI applications, it requires computing power and the ability to handle large volumes of data efficiently.
Enel built its own ML platform (internally called the ML factory) based on Amazon SageMaker. The platform is established as the standard solution for building and training models at Enel for different use cases, across different digital hubs (business units), with tens of ML projects being developed on Amazon SageMaker Training, Amazon SageMaker Processing, and other AWS services like AWS Step Functions.
Enel collects imagery and data from two different sources:
- Aerial network inspections:
- LiDAR point clouds – They have the advantage of being an extremely accurate and geo-localized 3D reconstruction of the infrastructure, and therefore are very useful for calculating distances or taking measurements with an accuracy not obtainable from 2D image analysis.
- High-resolution images – These images of the infrastructure are taken within seconds of each other. This makes it possible to detect elements and anomalies that are too small to be identified in the point cloud.
- Satellite images – Although these can be more affordable than a power line inspection (some are available for free or for a fee), their resolution and quality are often not on par with images taken directly by Enel. The characteristics of these images make them useful for certain tasks, like evaluating forest density and macro-categories or finding buildings.
In this post, we discuss the details of how Enel uses these three sources, and share how Enel automates its large-scale power grid asset management and anomaly detection process using SageMaker.
Analyzing high-resolution photographs to identify assets and anomalies
As with other unstructured data collected during inspections, the photographs taken are stored on Amazon Simple Storage Service (Amazon S3). Some of these are manually labeled with the goal of training different deep learning models for different computer vision tasks.
Conceptually, the processing and inference pipeline involves a hierarchical approach with multiple steps: first, the regions of interest in the image are identified, then the regions are cropped, assets are identified within them, and finally the assets are classified according to their material or the presence of anomalies on them. Because the same pole often appears in more than one image, it’s also necessary to be able to group its images to avoid duplicates, an operation called reidentification.
For all these tasks, Enel uses the PyTorch framework and the latest architectures for image classification and object detection, such as EfficientNet and EfficientDet, as well as other architectures for the semantic segmentation of certain anomalies, such as oil leaks on transformers. For the reidentification task, when it can’t be done geometrically because camera parameters are missing, Enel uses SimCLR-based self-supervised methods or Transformer-based architectures. It would be impossible to train all these models without access to a large number of instances equipped with high-performance GPUs, so all the models were trained in parallel using Amazon SageMaker Training jobs with GPU-accelerated ML instances. Inference has the same structure and is orchestrated by a Step Functions state machine that governs several SageMaker processing and training jobs that, despite the name, are as usable in inference as in training.
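As an illustration, launching one of these trainings as a SageMaker job could look roughly like the following. This is a minimal sketch: the script name, hyperparameters, S3 paths, and instance type are assumptions for illustration, not Enel’s actual configuration.
import sagemaker
from sagemaker.pytorch import PyTorch

role = sagemaker.get_execution_role()

# One detector training job; several of these can run in parallel,
# one per model (ROI detector, asset detector, classifiers, and so on).
estimator = PyTorch(
    entry_point="train_detector.py",      # hypothetical training script
    source_dir="src",
    role=role,
    framework_version="1.13",
    py_version="py39",
    instance_type="ml.g4dn.12xlarge",     # GPU-accelerated training instance
    instance_count=1,
    hyperparameters={"epochs": 50, "batch_size": 16},
)

estimator.fit({
    "train": "s3://example-bucket/labels/train",
    "validation": "s3://example-bucket/labels/val",
})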
The following is a high-level architecture of the ML pipeline with its main steps.
This diagram shows the simplified architecture of the ODIN image inference pipeline, which extracts and analyzes ROIs (such as electricity posts) from dataset images. The pipeline further drills down on ROIs, extracting and analyzing electrical elements (transformers, insulators, and so on). After the components (ROIs and elements) are finalized, the reidentification process begins: images and poles in the network map are matched based on 3D metadata. This allows the clustering of ROIs referring to the same pole. After that, anomalies get finalized and reports are generated.
Extracting precise measurements using LiDAR point clouds
High-resolution photographs are very useful, but because they’re 2D, it’s impossible to extract precise measurements from them. LiDAR point clouds come to the rescue here: they are 3D, and each point in the cloud has a position with an associated error of only a few centimeters.
However, in many cases, a raw point cloud is not useful, because you can’t do much with it if you don’t know whether a set of points represents a tree, a power line, or a house. For this reason, Enel uses KPConv, a semantic point cloud segmentation algorithm, to assign a class to each point. After the cloud is classified, it’s possible, for example, to figure out whether vegetation is too close to the power line or to measure the tilt of poles. Due to the flexibility of SageMaker services, the pipeline of this solution is not much different from the one already described, the only difference being that in this case it is necessary to use GPU instances for inference as well.
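A GPU-backed SageMaker Processing job for this kind of batch inference could be sketched as follows; the container image, script name, and S3 paths are illustrative placeholders.
import sagemaker
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

role = sagemaker.get_execution_role()

point_cloud_processor = ScriptProcessor(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/kpconv-inference:latest",  # custom image with PyTorch and KPConv
    command=["python3"],
    role=role,
    instance_type="ml.g4dn.xlarge",   # GPU instance, needed here for inference as well
    instance_count=1,
)

point_cloud_processor.run(
    code="segment_point_cloud.py",    # hypothetical segmentation script
    inputs=[ProcessingInput(source="s3://example-bucket/lidar/raw/", destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output", destination="s3://example-bucket/lidar/segmented/")],
)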
The following are some examples of point cloud images.
Looking at the power grid from space: Mapping vegetation to prevent service disruptions
Inspecting the power grid with helicopters and other means is generally very expensive and can’t be done too frequently. On the other hand, a system that monitors vegetation trends over short time intervals is extremely useful for optimizing one of the most expensive processes of an energy distributor: tree pruning. This is why Enel also included the analysis of satellite images in its solution, using a multitask approach to identify where vegetation is present, its density, and the type of plants divided into macro classes.
For this use case, after experimenting with different resolutions, Enel concluded that the free Sentinel-2 images provided by the Copernicus program had the best cost-benefit ratio. In addition to vegetation, Enel also uses satellite imagery to identify buildings, which is useful information for understanding whether there are discrepancies between their presence and where Enel delivers power, and therefore any irregular connections or problems in its databases. For the latter use case, the resolution of Sentinel-2, where one pixel represents an area of 10 by 10 meters, is not sufficient, so paid-for images with a resolution of 50 centimeters per pixel are purchased. This solution also doesn’t differ much from the previous ones in terms of services used and flow.
The following is an aerial picture with identification of assets (pole and insulators).
Angela Italiano, Director of Data Science at ENEL Grid, says,
“At Enel, we use computer vision models to inspect our electricity distribution network by reconstructing 3D images of our network using tens of millions of high-quality images and LiDAR point clouds. The training of these ML models requires access to a large number of instances equipped with high-performance GPUs and the ability to handle large volumes of data efficiently. With Amazon SageMaker, we can quickly train all of our models in parallel without needing to manage the infrastructure as Amazon SageMaker training scales the compute resources up and down as needed. Using Amazon SageMaker, we are able to build 3D images of our systems, monitor for anomalies, and serve over 60 million customers efficiently.”
Conclusion
In this post, we saw how a top player in the energy world like Enel used computer vision models and SageMaker training and processing jobs to solve one of the main problems of those who have to manage an infrastructure of this colossal size, keep track of installed assets, and identify anomalies and sources of danger for a power line such as vegetation too close to it.
Learn more about the related features of SageMaker.
About the Authors
Mario Namtao Shianti Larcher is the Head of Computer Vision at Enel. He has a background in mathematics and statistics, as well as profound expertise in machine learning and computer vision, and he leads a team of over ten professionals. Mario’s role entails implementing advanced solutions that effectively utilize the power of AI and computer vision to leverage Enel’s extensive data resources. In addition to his professional endeavors, he nurtures a personal passion for both traditional and AI-generated art.
Cristian Gavazzeni is a Senior Solution Architect at Amazon Web Services. He has more than 20 years of experience as a pre-sales consultant focusing on Data Management, Infrastructure and Security. During his spare time he likes playing golf with friends and travelling abroad with only fly and drive bookings.
Giuseppe Angelo Porcelli is a Principal Machine Learning Specialist Solutions Architect for Amazon Web Services. With several years of software engineering and an ML background, he works with customers of any size to deeply understand their business and technical needs and design AI and machine learning solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. He has worked on projects in different domains, including MLOps, computer vision, and NLP, involving a broad set of AWS services. In his free time, Giuseppe enjoys playing football.
Efficiently train, tune, and deploy custom ensembles using Amazon SageMaker
Artificial intelligence (AI) has become an important and popular topic in the technology community. As AI has evolved, we have seen different types of machine learning (ML) models emerge. One approach, known as ensemble modeling, has been rapidly gaining traction among data scientists and practitioners. In this post, we discuss what ensemble models are and why their usage can be beneficial. We then provide an example of how you can train, optimize, and deploy your custom ensembles using Amazon SageMaker.
Ensemble learning refers to the use of multiple learning models and algorithms to gain more accurate predictions than any single, individual learning algorithm. They have been proven to be efficient in diverse applications and learning settings such as cybersecurity [1] and fraud detection, remote sensing, predicting best next steps in financial decision-making, medical diagnosis, and even computer vision and natural language processing (NLP) tasks. We tend to categorize ensembles by the techniques used to train them, their composition, and the way they merge the different predictions into a single inference. These categories include:
- Boosting – Sequentially training multiple weak learners, where examples that previous learners in the sequence predicted incorrectly are given a higher weight when fed to the next learner, thereby creating a stronger learner. Examples include AdaBoost, Gradient Boosting, and XGBoost.
- Bagging – Uses multiple models to reduce the variance of a single model. Examples include Random Forest and Extra Trees.
- Stacking (blending) – Often uses heterogeneous models, where the predictions of each individual estimator are stacked together and used as input to a final estimator that handles the prediction. This final estimator’s training process often uses cross-validation.
There are multiple methods of combining the predictions into the single one that the model finally produces: for example, using a meta-estimator such as a linear learner, a voting method that uses multiple models to make a prediction based on majority voting for classification tasks, or ensemble averaging for regression.
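For example, scikit-learn exposes some of these combination strategies directly. The following is a small illustrative sketch of ensemble averaging for regression, separate from the custom ensemble built later in this post.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)

# Ensemble averaging: the final prediction is the mean of each estimator's prediction
ensemble = VotingRegressor(
    estimators=[
        ("lr", LinearRegression()),
        ("rf", RandomForestRegressor(n_estimators=100)),
        ("gb", GradientBoostingRegressor()),
    ]
)
ensemble.fit(X, y)
predictions = ensemble.predict(X)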
Although several libraries and frameworks provide implementations of ensemble models, such as XGBoost, CatBoost, or scikit-learn’s random forest, in this post we focus on bringing your own models and using them as a stacking ensemble. However, instead of using dedicated resources for each model (dedicated training and tuning jobs and hosting endpoints per model), we train, tune, and deploy a custom ensemble (multiple models) using a single SageMaker training job and a single tuning job, and deploy to a single endpoint, thereby reducing possible cost and operational overhead.
BYOE: Bring your own ensemble
There are several ways to train and deploy heterogeneous ensemble models with SageMaker: you can train each model in a separate training job and optimize each model separately using Amazon SageMaker Automatic Model Tuning. When hosting these models, SageMaker provides various cost-effective ways to host multiple models on the same tenant infrastructure. Detailed deployment patterns for these kinds of settings can be found in Model hosting patterns in Amazon SageMaker, Part 1: Common design patterns for building ML applications on Amazon SageMaker. These patterns include using multiple endpoints (one for each trained model), a single multi-model endpoint, or even a single multi-container endpoint where the containers can be invoked individually or chained in a pipeline. All these solutions include a meta-estimator (for example, in an AWS Lambda function) that invokes each model and implements the blending or voting function.
However, running multiple training jobs might introduce operational and cost overhead, especially if your ensemble requires training on the same data. Similarly, hosting different models on separate endpoints or containers and combining their prediction results for better accuracy requires multiple invocations, and therefore introduces additional management, cost, and monitoring efforts. For example, SageMaker supports ensemble ML models using Triton Inference Server, but this solution requires the models or model ensembles to be supported by a Triton backend. Moreover, additional effort is required from the customer to set up the Triton server, as well as additional learning to understand how different Triton backends work. Therefore, customers prefer a more straightforward way to implement solutions where they only need to send the invocation once to the endpoint and have the flexibility to control how the results are aggregated to generate the final output.
Solution overview
To address these concerns, we walk through an example of ensemble training using a single training job, optimizing the model’s hyperparameters and deploying it using a single container to a serverless endpoint. We use two models for our ensemble stack: CatBoost and XGBoost (both of which are themselves boosting ensembles). For our data, we use the diabetes dataset [2] from the scikit-learn library: it consists of 10 features (age, sex, body mass index, blood pressure, and six blood serum measurements), and our model predicts the disease progression 1 year after the baseline features were collected (a regression model).
The full code repository can be found on GitHub.
Train multiple models in a single SageMaker job
For training our models, we use SageMaker training jobs in Script mode. With Script mode, you can write custom training (and later inference) code while using SageMaker framework containers. Framework containers enable you to use ready-made environments managed by AWS that include all necessary configuration and modules. To demonstrate how you can customize a framework container, as an example, we use the pre-built SKLearn container, which doesn’t include the XGBoost and CatBoost packages. There are two options to add these packages: either extend the built-in container to install CatBoost and XGBoost (and then deploy it as a custom container), or use the SageMaker training job Script mode feature, which allows you to provide a requirements.txt file when creating the training estimator. The SageMaker training job installs the listed libraries in the requirements.txt file at runtime. This way, you don’t need to manage your own Docker image repository, and it provides more flexibility for running training scripts that need additional Python packages.
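For this example, the requirements.txt file only needs to list the two missing packages; the entries below are illustrative (pin versions as needed):
# Extra packages installed by the SageMaker training job at startup
catboost
xgboost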
The following code block shows the code we use to start the training. The entry_point parameter points to our training script. We also use two of the SageMaker SDK’s compelling features:
- First, we specify the local path to our source directory and dependencies in the source_dir and dependencies parameters, respectively. The SDK will compress and upload those directories to Amazon Simple Storage Service (Amazon S3), and SageMaker will make them available on the training instance under the working directory /opt/ml/code.
- Second, we use the SDK SKLearn estimator object with our preferred Python and framework version, so that SageMaker will pull the corresponding container. We have also defined a custom training metric validation:rmse, which will be emitted in the training logs and captured by SageMaker. Later, we use this metric as the objective metric in the tuning job.
from sagemaker.sklearn.estimator import SKLearn

hyperparameters = {"num_round": 6, "max_depth": 5}

estimator_parameters = {
    "entry_point": "multi_model_hpo.py",
    "source_dir": "code",
    "dependencies": ["my_custom_library"],
    "instance_type": training_instance_type,
    "instance_count": 1,
    "hyperparameters": hyperparameters,
    "role": role,
    "base_job_name": "xgboost-model",
    "framework_version": "1.0-1",
    "keep_alive_period_in_seconds": 60,
    # Custom metric parsed from the training logs; used later as the tuning objective
    "metric_definitions": [
        {"Name": "validation:rmse", "Regex": "validation-rmse:(.*?);"}
    ],
}

estimator = SKLearn(**estimator_parameters)
Next, we write our training script (multi_model_hpo.py). Our script follows a simple flow: capture hyperparameters with which the job was configured and train the CatBoost model and XGBoost model. We also implement a k-fold cross validation function. See the following code:
if __name__ == "__main__":
    parser = argparse.ArgumentParser()

    # Sagemaker specific arguments. Defaults are set in the environment variables.
    parser.add_argument("--output-data-dir", type=str, default=os.environ["SM_OUTPUT_DATA_DIR"])
    parser.add_argument("--model-dir", type=str, default=os.environ["SM_MODEL_DIR"])
    parser.add_argument("--train", type=str, default=os.environ["SM_CHANNEL_TRAIN"])
    parser.add_argument("--validation", type=str, default=os.environ["SM_CHANNEL_VALIDATION"])
    .
    .
    .
    """
    Train catboost
    """
    K = args.k_fold
    catboost_hyperparameters = {
        "max_depth": args.max_depth,
        "eta": args.eta,
    }
    rmse_list, model_catboost = cross_validation_catboost(train_df, K, catboost_hyperparameters)
    .
    .
    .
    """
    Train the XGBoost model
    """
    hyperparameters = {
        "max_depth": args.max_depth,
        "eta": args.eta,
        "objective": args.objective,
        "num_round": args.num_round,
    }
    rmse_list, model_xgb = cross_validation(train_df, K, hyperparameters)
After the models are trained, we calculate the mean of both the CatBoost and XGBoost predictions. The result, pred_mean, is our ensemble’s final prediction. Then, we determine the mean_squared_error against the validation set. val_rmse is used for the evaluation of the whole ensemble during training. Notice that we also print the RMSE value in a pattern that fits the regex we used in the metric_definitions. Later, SageMaker Automatic Model Tuning will use that to capture the objective metric. See the following code:
pred_mean = np.mean(np.array([pred_catboost, pred_xgb]), axis=0)
val_rmse = mean_squared_error(y_validation, pred_mean, squared=False)
print(f"Final evaluation result: validation-rmse:{val_rmse}")
Finally, our script saves both model artifacts to the output folder located at /opt/ml/model.
When a training job is complete, SageMaker packages and copies the content of the /opt/ml/model directory as a single object in compressed TAR format to the S3 location that you specified in the job configuration. In our case, SageMaker bundles the two models in a TAR file and uploads it to Amazon S3 at the end of the training job. See the following code:
model_file_name = 'catboost-regressor-model.dump'
# Save CatBoost model
path = os.path.join(args.model_dir, model_file_name)
print('saving model file to {}'.format(path))
model.save_model(path)
.
.
.
# Save XGBoost model
model_location = args.model_dir + "/xgboost-model"
pickle.dump(model, open(model_location, "wb"))
logging.info("Stored trained model at {}".format(model_location))
In summary, notice that in this procedure we downloaded the data once and trained two models within a single training job.
Automatic ensemble model tuning
Because we’re building a collection of ML models, exploring all of the possible hyperparameter permutations is impractical. SageMaker offers Automatic Model Tuning (AMT), which looks for the best model hyperparameters by focusing on the most promising combinations of values within ranges that you specify (it’s up to you to define the right ranges to explore). SageMaker supports multiple optimization methods for you to choose from.
We start by defining the two parts of the optimization process: the objective metric and the hyperparameters we want to tune. In our example, we use the validation RMSE as the target metric, and we tune eta and max_depth (for other hyperparameters, refer to XGBoost Hyperparameters and CatBoost hyperparameters):
from sagemaker.tuner import (
IntegerParameter,
ContinuousParameter,
HyperparameterTuner,
)
hyperparameter_ranges = {
"eta": ContinuousParameter(0.2, 0.3),
"max_depth": IntegerParameter(3, 4)
}
metric_definitions = [{"Name": "validation:rmse", "Regex": "validation-rmse:([0-9\.]+)"}]
objective_metric_name = "validation:rmse"
We also need to ensure in the training script that our hyperparameters are not hardcoded and are pulled from the SageMaker runtime arguments:
catboost_hyperparameters = {
"max_depth": args.max_depth,
"eta": args.eta,
}
SageMaker also writes the hyperparameters to a JSON file, which can be read from /opt/ml/input/config/hyperparameters.json on the training instance.
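For instance, a script could inspect that file directly. The following is a minimal sketch; the training script in this post relies on argparse instead.
import json

# Hyperparameters passed to the training job, as written by SageMaker
with open("/opt/ml/input/config/hyperparameters.json") as f:
    hyperparameters = json.load(f)  # note: all values arrive as strings

print(hyperparameters.get("max_depth"))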
Like CatBoost, we also capture the hyperparameters for the XGBoost model (notice that objective and num_round aren’t tuned):
hyperparameters = {
    "max_depth": args.max_depth,
    "eta": args.eta,
    "objective": args.objective,
    "num_round": args.num_round,
}
Finally, we launch the hyperparameter tuning job using these configurations:
tuner = HyperparameterTuner(
estimator,
objective_metric_name,
hyperparameter_ranges,
max_jobs=4,
max_parallel_jobs=2,
objective_type='Minimize'
)
tuner.fit({"train": train_location, "validation": validation_location}, include_cls_metadata=False)
When the job is complete, you can retrieve the values for the best training job (with minimal RMSE):
job_name=tuner.latest_tuning_job.name
attached_tuner = HyperparameterTuner.attach(job_name)
attached_tuner.describe()["BestTrainingJob"]
For more information on AMT, refer to Perform Automatic Model Tuning with SageMaker.
Deployment
To deploy our custom ensemble, we need to provide a script to handle the inference request and configure SageMaker hosting. In this example, we used a single file that includes both the training and inference code (multi_model_hpo.py). SageMaker uses the code under if __name__ == "__main__" for the training, and the functions model_fn, input_fn, and predict_fn when deploying and serving the model.
Inference script
As with training, we use the SageMaker SKLearn framework container with our own inference script. The script will implement three methods required by SageMaker.
First, the model_fn method reads our saved model artifact files and loads them into memory. In our case, the method returns our ensemble as all_model, which is a Python list, but you can also use a dictionary with model names as keys.
def model_fn(model_dir):
    catboost_model = CatBoostRegressor()
    catboost_model.load_model(os.path.join(model_dir, model_file_name))

    model_file = "xgboost-model"
    model = pickle.load(open(os.path.join(model_dir, model_file), "rb"))

    all_model = [catboost_model, model]
    return all_model
Second, the input_fn method deserializes the request input data to be passed to our inference handler. For more information about input handlers, refer to Adapting Your Own Inference Container.
def input_fn(input_data, content_type):
    dtype = None
    payload = StringIO(input_data)
    return np.genfromtxt(payload, dtype=dtype, delimiter=",")
Third, the predict_fn method is responsible for getting predictions from the models. The method takes the model and the data returned from input_fn as parameters and returns the final prediction. In our example, we get the CatBoost result from the first member of the model list (model[0]) and the XGBoost result from the second member (model[1]), and we use a blending function that returns the mean of both predictions:
def predict_fn(input_data, model):
    # CatBoost prediction (first member of the ensemble list)
    predictions_catb = model[0].predict(input_data)

    # XGBoost prediction (second member); the Booster expects a DMatrix
    dtest = xgb.DMatrix(input_data)
    predictions_xgb = model[1].predict(
        dtest,
        ntree_limit=getattr(model[1], "best_ntree_limit", 0),
        validate_features=False,
    )

    # Blend by averaging the two predictions
    return np.mean(np.array([predictions_catb, predictions_xgb]), axis=0)
Now that we have our trained models and inference script, we can configure the environment to deploy our ensemble.
SageMaker Serverless Inference
Although there are many hosting options in SageMaker, in this example, we use a serverless endpoint. Serverless endpoints automatically launch compute resources and scale them in and out depending on traffic. This takes away the undifferentiated heavy lifting of managing servers. This option is ideal for workloads that have idle periods between traffic spurts and can tolerate cold starts.
Configuring the serverless endpoint is straightforward because we don’t need to choose instance types or manage scaling policies. We only need to provide two parameters: memory size and maximum concurrency. The serverless endpoint automatically assigns compute resources proportional to the memory you select. If you choose a larger memory size, your container has access to more vCPUs. You should always choose your endpoint’s memory size according to your model size. The second parameter we need to provide is maximum concurrency. For a single endpoint, this parameter can be set up to 200 (as of this writing, the limit for total number of serverless endpoints in a Region is 50). You should note that the maximum concurrency for an individual endpoint prevents that endpoint from taking up all the invocations allowed for your account, because any endpoint invocations beyond the maximum are throttled (for more information about the total concurrency for all serverless endpoints per Region, refer to Amazon SageMaker endpoints and quotas).
from sagemaker.serverless.serverless_inference_config import ServerlessInferenceConfig
serverless_config = ServerlessInferenceConfig(
memory_size_in_mb=6144,
max_concurrency=1,
)
Now that we configured the endpoint, we can finally deploy the model that was selected in our hyperparameter optimization job:
estimator=attached_tuner.best_estimator()
predictor = estimator.deploy(serverless_inference_config=serverless_config)
Clean up
Even though serverless endpoints have zero cost when not being used, when you have finished running this example, you should make sure to delete the endpoint:
predictor.delete_endpoint()
Conclusion
In this post, we covered one approach to train, optimize, and deploy a custom ensemble. We detailed the process of using a single training job to train multiple models, how to use automatic model tuning to optimize the ensemble hyperparameters, and how to deploy a single serverless endpoint that blends the inferences from multiple models.
Using this method addresses potential cost and operational issues. The cost of a training job is based on the resources you use for the duration of usage. By downloading the data only once for training the two models, we cut the job’s data download phase and the volume that stores the data roughly in half, thereby reducing the training job’s overall cost. Furthermore, the AMT job ran four training jobs, each with the aforementioned reduced time and storage, so those savings are multiplied by four. With regard to model deployment on a serverless endpoint, because you also pay for the amount of data processed, invoking the endpoint only once for two models means you pay half of the I/O data charges.
Although this post only showed the benefits with two models, you can use this method to train, tune, and deploy numerous ensemble models to see an even greater effect.
References
[1] Raj Kumar, P. Arun; Selvakumar, S. (2011). “Distributed denial of service attack detection using an ensemble of neural classifier”. Computer Communications. 34 (11): 1328–1341. doi:10.1016/j.comcom.2011.01.012.
[2] Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani (2004). “Least Angle Regression.” Annals of Statistics (with discussion), 407–499. (https://web.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf)
About the Authors
Melanie Li, PhD, is a Senior AI/ML Specialist TAM at AWS based in Sydney, Australia. She helps enterprise customers to build solutions leveraging the state-of-the-art AI/ML tools on AWS and provides guidance on architecting and implementing machine learning solutions with best practices. In her spare time, she loves to explore nature outdoors and spend time with family and friends.
Uri Rosenberg is the AI & ML Specialist Technical Manager for Europe, Middle East, and Africa. Based out of Israel, Uri works to empower enterprise customers to design, build, and operate ML workloads at scale. In his spare time, he enjoys cycling, hiking, and minimizing RMSEs.
Use a generative AI foundation model for summarization and question answering using your own data
Large language models (LLMs) can be used to analyze complex documents and provide summaries and answers to questions. The post Domain-adaptation Fine-tuning of Foundation Models in Amazon SageMaker JumpStart on Financial data describes how to fine-tune an LLM using your own dataset. Once you have a solid LLM, you’ll want to expose that LLM to business users to process new documents, which could be hundreds of pages long. In this post, we demonstrate how to construct a real-time user interface to let business users process a PDF document of arbitrary length. Once the file is processed, you can summarize the document or ask questions about the content. The sample solution described in this post is available on GitHub.
Working with financial documents
Financial statements like quarterly earnings reports and annual reports to shareholders are often tens or hundreds of pages long. These documents contain a lot of boilerplate language like disclaimers and legal language. If you want to extract the key data points from one of these documents, you need both time and some familiarity with the boilerplate language so you can identify the interesting facts. And of course, you can’t ask an LLM questions about a document it has never seen.
LLMs used for summarization have a limit on the number of tokens (pieces of words) passed into the model, and with some exceptions, this limit is typically no more than a few thousand tokens. That normally precludes the ability to summarize longer documents.
Our solution handles documents that exceed an LLM’s maximum token sequence length and makes those documents available to the LLM for question answering.
Solution overview
Our design has three important pieces:
- It has an interactive web application for business users to upload and process PDFs
- It uses the langchain library to split a large PDF into more manageable chunks
- It uses the retrieval augmented generation technique to let users ask questions about new data that the LLM hasn’t seen before
As shown in the following diagram, we use a front end implemented with React JavaScript hosted in an Amazon Simple Storage Service (Amazon S3) bucket fronted by Amazon CloudFront. The front-end application lets users upload PDF documents to Amazon S3. After the upload is complete, you can trigger a text extraction job powered by Amazon Textract. As part of the post-processing, an AWS Lambda function inserts special markers into the text indicating page boundaries. When that job is done, you can invoke an API that summarizes the text or answers questions about it.
Because some of these steps may take some time, the architecture uses a decoupled asynchronous approach. For example, the call to summarize a document invokes a Lambda function that posts a message to an Amazon Simple Queue Service (Amazon SQS) queue. Another Lambda function picks up that message and starts an Amazon Elastic Container Service (Amazon ECS) AWS Fargate task. The Fargate task calls the Amazon SageMaker inference endpoint. We use a Fargate task here because summarizing a very long PDF may take more time and memory than a Lambda function has available. When the summarization is done, the front-end application can pick up the results from an Amazon DynamoDB table.
For summarization, we use AI21’s Summarize model, one of the foundation models available through Amazon SageMaker JumpStart. Although this model handles documents of up to 10,000 words (approximately 40 pages), we use langchain’s text splitter to make sure that each summarization call to the LLM is no more than 10,000 words long. For text generation, we use Cohere’s Medium model, and we use GPT-J for embeddings, both via JumpStart.
Summarization processing
When handling larger documents, we need to define how to split the document into smaller pieces. When we get the text extraction results back from Amazon Textract, we insert markers for larger chunks of text (a configurable number of pages), individual pages, and line breaks. Langchain splits based on those markers and assembles smaller documents that are under the token limit.
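A minimal sketch of this splitting step with langchain follows; the marker strings and chunk settings are assumptions, not the exact values used in the sample repository.
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split on the custom markers inserted during Textract post-processing,
# falling back to page markers and line breaks.
splitter = RecursiveCharacterTextSplitter(
    separators=["<CHUNK>", "<PAGE>", "\n"],  # hypothetical marker strings
    chunk_size=10000,                        # characters per chunk; tune to stay under the model's limit
    chunk_overlap=200,
)
docs = splitter.create_documents([extracted_text])  # extracted_text: full Textract output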
The LLM in the summarization chain is a thin wrapper around our SageMaker endpoint.
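A sketch of such a wrapper using langchain’s SagemakerEndpoint class is shown below; the endpoint name and the request/response keys assumed for the AI21 Summarize payload may differ in your deployment.
import json
from langchain.llms.sagemaker_endpoint import SagemakerEndpoint, LLMContentHandler

class SummarizeContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
        # Assumed payload shape for the AI21 Summarize endpoint
        return json.dumps({"source": prompt, "sourceType": "TEXT"}).encode("utf-8")

    def transform_output(self, output) -> str:
        return json.loads(output.read().decode("utf-8"))["summary"]

summarize_llm = SagemakerEndpoint(
    endpoint_name="ai21-summarize",  # hypothetical endpoint name
    region_name="us-east-1",
    content_handler=SummarizeContentHandler(),
)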
Question answering
In the retrieval augmented generation method, we first split the document into smaller segments. We create embeddings for each segment and store them in the open-source Chroma vector database via langchain’s interface. We save the database in an Amazon Elastic File System (Amazon EFS) file system for later use.
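A sketch of building and persisting the vector store follows; the embedding model shown is a stand-in (the sample solution uses a GPT-J embedding endpoint via JumpStart), and the EFS mount path is hypothetical.
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

embeddings = HuggingFaceEmbeddings()  # stand-in; swap for a wrapper around the GPT-J endpoint

vectordb = Chroma.from_documents(
    documents=docs,                       # the chunks produced by the text splitter
    embedding=embeddings,
    persist_directory="/mnt/efs/chroma",  # hypothetical EFS mount path
)
vectordb.persist()  # write the index to EFS for later use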
When the embeddings are ready, the user can ask a question. We search the vector database for the text chunks that most closely match the question.
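With the langchain Chroma wrapper, this lookup can be as simple as the following (the value of k is illustrative):
# Retrieve the chunks whose embeddings are closest to the question
matches = vectordb.similarity_search(question, k=3)
context = "\n\n".join(doc.page_content for doc in matches)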
We take the closest matching chunk and use it as context for the text generation model to answer the question.
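A sketch of this final step, assuming a generation_llm wrapper around the Cohere Medium endpoint built the same way as the summarization wrapper above; the prompt template is illustrative.
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}\n"
    "Answer:"
)
answer = generation_llm(prompt)  # generation_llm: a SagemakerEndpoint wrapper (assumed)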
User experience
Although LLMs represent advanced data science, most of the use cases for LLMs ultimately involve interaction with non-technical users. Our example web application handles an interactive use case where business users can upload and process a new PDF document.
The following diagram shows the user interface. A user starts by uploading a PDF. After the document is stored in Amazon S3, the user is able to start the text extraction job. When that’s complete, the user can invoke the summarization task or ask questions. The user interface exposes some advanced options like the chunk size and chunk overlap, which would be useful for advanced users who are testing the application on new documents.
Next steps
LLMs provide significant new information retrieval capabilities. Business users need convenient access to those capabilities. There are two directions for future work to consider:
- Take advantage of the powerful LLMs already available in JumpStart foundation models. With just a few lines of code, our sample application could deploy and make use of advanced LLMs from AI21 and Cohere for text summarization and generation.
- Make these capabilities accessible to non-technical users. A prerequisite to processing PDF documents is extracting text from the document, and summarization jobs may take several minutes to run. That calls for a simple user interface with asynchronous backend processing capabilities, which is easy to design using cloud-native services like Lambda and Fargate.
We also note that a PDF document is semi-structured information. Important cues like section headings are difficult to identify programmatically, because they rely on font sizes and other visual indicators. Identifying the underlying structure of information helps the LLM process the data more accurately, at least until such time that LLMs can handle input of unbounded length.
Conclusion
In this post, we showed how to build an interactive web application that lets business users upload and process PDF documents for summarization and question answering. We saw how to take advantage of JumpStart foundation models to access advanced LLMs, and use text splitting and retrieval augmented generation techniques to process longer documents and make them available as information to the LLM.
At this point in time, there is no reason not to make these powerful capabilities available to your users. We encourage you to start using the JumpStart foundation models today.
About the author
Randy DeFauw is a Senior Principal Solutions Architect at AWS. He holds an MSEE from the University of Michigan, where he worked on computer vision for autonomous vehicles. He also holds an MBA from Colorado State University. Randy has held a variety of positions in the technology space, ranging from software engineering to product management. He entered the big data space in 2013 and continues to explore that area. He is actively working on projects in the ML space and has presented at numerous conferences, including Strata and GlueCon.
Integrate Amazon SageMaker Model Cards with the model registry
Amazon SageMaker Model Cards enable you to standardize how models are documented, thereby giving you visibility into the lifecycle of a model, from design and building through training and evaluation. Model cards are intended to be a single source of truth for business and technical metadata about the model that can reliably be used for auditing and documentation purposes. They provide a factsheet of the model that is important for model governance.
Until now, model cards were logically associated to a model in the Amazon SageMaker Model Registry using model name match. However, when solving a business problem through a machine learning (ML) model, as customers iterate on the problem, they create multiple versions of the model and they need to operationalize and govern multiple model versions. Therefore, they need the ability to associate a model card to a particular model version.
In this post, we discuss a new feature that supports integrating model cards with the model registry at the deployed model version level. We discuss the solution architecture and best practices for managing model card versions, and walk through how to set up, operationalize, and govern the model card integration with the model version in the model registry.
Solution overview
SageMaker model cards help you standardize documenting your models from a governance perspective, and the SageMaker model registry helps you deploy and operationalize ML models. The model registry supports a hierarchical structure for organizing and storing ML models with model metadata information.
When an organization solves a business problem using ML, such as a customer churn prediction, we recommend the following steps:
- Create a model card for the business problem to be solved.
- Create a model package group for the business problem to be solved.
- Build, train, evaluate, and register the first model package version (for example, Customer Churn V1).
- Update the model card, linking the model package version to the model card.
- As you iterate on a new model package version, clone the model card from the previous version and link it to the new model package version (for example, Customer Churn V2).
The following figure illustrates how a SageMaker model card integrates with the model registry.
As illustrated in the preceding diagram, the integration of SageMaker model cards and the model registry allows you to associate a model card with a specific model version in the model registry. This enables you to establish a single source of truth for your registered model versions, with comprehensive and standardized documentation across all stages of the model’s journey on SageMaker, facilitating discoverability and promoting governance, compliance, and accountability throughout the model lifecycle.
Best practices for managing model cards
Operating in machine learning with governance is a critical requirement for many enterprise organizations today, notably in highly regulated industries. As part of those requirements, AWS provides several services that enable reliable operation of the ML environment.
SageMaker model cards document critical details about your ML models in a single place for streamlined governance and reporting. Model cards help you capture details such as the intended use and risk rating of a model, training details and metrics, evaluation results and observations, and additional call-outs such as considerations, recommendations, and custom information.
Model cards need to be managed and updated as part of your development process, throughout the ML lifecycle. They are an important part of continuous delivery and pipelines in ML. In the same way that a Well-Architected ML project implements continuous integration and continuous delivery (CI/CD) under the umbrella of MLOps, a continuous ML documentation process is a critical capability in a lot of regulated industries or for higher risk use cases. Model cards are part of the best practices for responsible and transparent ML development.
The following diagram shows how model cards should be part of a development lifecycle.
Consider the following best practices:
- We recommend creating model cards early in your project lifecycle. In the first phase of the project, when you are working on identifying the business goal and framing the ML problem, you should initiate the creation of the model card. As you work through the different steps of business requirements and important performance metrics, you can create the model card in a draft status and determine the business details and intended uses.
- As part of your model development lifecycle phase, you should use the model registry to catalog models for production, manage model versions, and associate metadata with a model. The model registry enables lineage tracking.
- After you have iterated successfully and are ready to deploy your model to production, it’s time to update the model card. In the deployment lifecycle phase, you can update the model details of the model card. You should also update training details, evaluation details, ethical considerations, and caveats and recommendations.
Model cards have versions associated with them. A given model card version is immutable across all attributes other than the model card status. If you make any other changes to the model card, such as evaluation metrics, description, or intended uses, SageMaker creates a new version of the model card to reflect the updated information. This is to ensure that a model card, once created, can’t be tampered with. Additionally, each unique model name can have only one associated model card, and this association can’t be changed after you create the model card.
ML models are dynamic and workflow automation components enable you to easily scale your ability to build, train, test, and deploy hundreds of models in production, iterate faster, reduce errors due to manual orchestration, and build repeatable mechanisms.
Therefore, the lifecycle of your model cards will look as described in the following diagram. Every time you update your model card through the model lifecycle, you automatically create a new version of the model card. Every time you iterate on a new model version, you create a new model card that can inherit some model card information of the previous model versions and follow the same lifecycle.
Prerequisites
This post assumes that you already have models in your model registry. If you want to follow along, you can use the following SageMaker example on GitHub to populate your model registry: SageMaker Pipelines integration with Model Monitor and Clarify.
Integrate a model card with the model version in the model registry
In this example, we have the model-monitor-clarify-group package in our model registry.
In this package, two model versions are available.
For this example, we link Version 1 of the model to a new model card. In the model registry, you can see the details for Version 1.
We can now use the new feature in the SageMaker Python SDK. Using ModelPackage from the sagemaker.model_card module, you can select the specific model version from the model registry that you would like to link the model card to.
You can now create a new model card for the model version and specify the model_package_details parameter with the model package retrieved previously. You need to populate the model card with all the additional details necessary. For this post, we create a simple model card as an example.
You can then use that definition to create a model card using the SageMaker Python SDK.
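The following is a minimal sketch of what that could look like. The card name, the way the model package version is retrieved, and the field values are assumptions; refer to the SageMaker Python SDK documentation for the exact signatures.
from sagemaker.session import Session
from sagemaker.model_card import ModelCard, ModelCardStatusEnum, ModelPackage

sagemaker_session = Session()

# Select version 1 of the registered model (the ARN is a placeholder)
model_package = ModelPackage.from_model_package_arn(
    model_package_arn="arn:aws:sagemaker:<region>:<account>:model-package/model-monitor-clarify-group/1",
    sagemaker_session=sagemaker_session,
)

# Create a simple model card linked to that model package version
card = ModelCard(
    name="model-monitor-clarify-model-card",
    status=ModelCardStatusEnum.DRAFT,
    model_package_details=model_package,
    sagemaker_session=sagemaker_session,
)
card.create()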
When loading the model card again, you can see the associated model under "__model_package_details".
You also have the option to update an existing model card with the model_package.
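A rough sketch of that update flow is shown below; the method and parameter names are assumptions based on the SDK’s model card support.
# Load the existing card and point it at a different model package version
existing_card = ModelCard.load(
    name="model-monitor-clarify-model-card",
    sagemaker_session=sagemaker_session,
)
existing_card.model_package_details = model_package  # for example, version 2 retrieved as in the earlier sketch
existing_card.update()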
Finally, when creating or updating a new model package version in an existing model package, if a model card already exists in that model package group, some information such as the business details and intended uses can be carried over to the new model card.
Clean up
If you created resources using the notebook mentioned in the prerequisites section, follow the instructions in the notebook to clean them up.
Conclusion
In this post, we discussed how to integrate a SageMaker model card with a model version in the model registry. We shared the solution architecture with best practices for implementing a model card and showed how to set up and operationalize a model card to improve your model governance posture. We encourage you to try out this solution and share your feedback in the comments section.
About the Authors
Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 20 years of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure and scalable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he rides his motorcycle and walks with his 2-year-old sheep-a-doodle!
Natacha Fort is the Government Data Science Lead for Public Sector Australia and New Zealand, Principal SA at AWS. She helps organizations navigate their machine learning journey, supporting them from framing the machine learning problem to deploying into production, all the while making sure the best architecture practices are in place to ensure their success. Natacha focuses with organizations on MLOps and responsible AI.
Predicting congestion in fleets of robots
Predicting the delays caused when robots’ paths intersect can improve task assignment and path planning in warehouses.
Enhance Amazon Lex with conversational FAQ features using LLMs
Amazon Lex is a service that allows you to quickly and easily build conversational bots (“chatbots”), virtual agents, and interactive voice response (IVR) systems for applications such as Amazon Connect.
Artificial intelligence (AI) and machine learning (ML) have been a focus for Amazon for over 20 years, and many of the capabilities that customers use with Amazon are driven by ML. Today, large language models (LLMs) are transforming the way developers and enterprises solve historically complex challenges related to natural language understanding (NLU). We announced Amazon Bedrock recently, which democratizes foundation model access for developers to easily build and scale generative AI-based applications, using familiar AWS tools and capabilities. One of the challenges enterprises face is to incorporate their business knowledge into LLMs to deliver accurate and relevant responses. When leveraged effectively, enterprise knowledge bases can be used to deliver tailored self-service and assisted-service experiences, by delivering information that helps customers solve problems independently and/or augmenting an agent’s knowledge. Today, a bot developer can improve self-service experiences without utilizing LLMs in a couple of ways. First, by creating intents, sample utterances, and responses, thereby covering all anticipated user questions within an Amazon Lex bot. Second, developers can also integrate bots with search solutions, which can index documents stored across a wide range of repositories and find the most relevant document to answer their customer’s question. These methods are effective, but they require developer resources, which can make getting started difficult.
One of the benefits offered by LLMs is the ability to create relevant and compelling conversational self-service experiences. They do so by leveraging enterprise knowledge base(s) and delivering more accurate and contextual responses. This blog post introduces a powerful solution for augmenting Amazon Lex with LLM-based FAQ features using Retrieval Augmented Generation (RAG). We will review how the RAG approach augments Amazon Lex FAQ responses using your company data sources. We will also demonstrate Amazon Lex integration with LlamaIndex, which is an open-source data framework that provides knowledge source and format flexibility to the bot developer. As a bot developer gains confidence using LlamaIndex to explore LLM integration, they can scale the Amazon Lex capability further. They can also use enterprise search services such as Amazon Kendra, which is natively integrated with Amazon Lex.
In this solution, we showcase the practical application of an Amazon Lex chatbot with LLM-based RAG enhancement. We use the Zappos customer support use case as an example to demonstrate the effectiveness of this solution, which takes the user through an enhanced FAQ experience (with LLM), rather than directing them to fallback (default, without LLM).
Solution overview
RAG combines the strengths of traditional retrieval-based and generative AI based approaches to Q&A systems. This methodology harnesses the power of large language models, such as Amazon Titan or open-source models (for example, Falcon), to perform generative tasks in retrieval systems. It also takes into account the semantic context from stored documents more effectively and efficiently.
RAG starts with an initial retrieval step to retrieve relevant documents from a collection based on the user’s query. It then employs a language model to generate a response by considering both the retrieved documents and the original query. By integrating RAG into Amazon Lex, we can provide accurate and comprehensive answers to user queries, resulting in a more engaging and satisfying user experience.
The RAG approach requires document ingestion so that embeddings can be created to enable LLM-based search. The following diagram shows how the ingestion process creates the embeddings that are then used by the chatbot during fallback to answer the customer’s question.
With this solution architecture, you should choose the most suitable LLM for your use case. It also provides an inference endpoint choice between Amazon Bedrock (in limited preview) and models hosted on Amazon SageMaker JumpStart, offering additional LLM flexibility.
The document is uploaded to an Amazon Simple Storage Service (Amazon S3) bucket. The S3 bucket has an event listener attached that invokes an AWS Lambda function on changes to the bucket. The event listener ingests the new document and places the embeddings in another S3 bucket. The embeddings are then used by the RAG implementation in the Amazon Lex bot during the fallback intent to answer the customer’s question. The next diagram shows the architecture of how an FAQ bot within Lex can be enhanced with LLMs and RAG.
Let’s explore how we can integrate RAG based on LlamaIndex into an Amazon Lex bot. We provide code examples and an AWS Cloud Development Kit (AWS CDK) import to assist you in setting up the integration. You can find the code examples in our GitHub repository. The following sections provide a step-by-step guide to help you set up the environment and deploy the necessary resources.
How RAG works with Amazon Lex
The flow of RAG involves an iterative process where the retriever component retrieves relevant passages, the question and passages help construct the prompt, and the generation component produces a response. This combination of retrieval and generation techniques allows the RAG model to take advantage of the strengths of both approaches, providing accurate and contextually appropriate answers to user questions. The workflow provides the following capabilities:
- Retriever engine – The RAG model begins with a retriever component responsible for retrieving relevant documents from a large corpus. This component typically uses an information retrieval technique like TF-IDF or BM25 to rank and select documents that are likely to contain the answer to a given question. The retriever scans the document corpus and retrieves a set of relevant passages.
- Prompt helper – After the retriever has identified the relevant passages, the RAG model moves to prompt creation. The prompt is a combination of the question and the retrieved passages, serving as additional context for the prompt, which is used as input to the generator component. To create the prompt, the model typically augments the question with the selected passages in a specific format.
- Response generation – The prompt, consisting of the question and relevant passages, is fed into the generation component of the RAG model. The generation component is usually a language model capable of reasoning through the prompt to generate a coherent and relevant response.
- Final response – Finally, the RAG model selects the highest-ranked answer as the output and presents it as the response to the original question. The selected answer can be further postprocessed or formatted as necessary before being returned to the user. In addition, the solution enables filtering of the generated response if the retrieval results yield a low confidence score, implying that the question likely falls out of distribution (OOD).
LlamaIndex: An open-source data framework for LLM-based applications
In this post, we demonstrate the RAG solution based on LlamaIndex. LlamaIndex is an open-source data framework specifically designed to facilitate LLM-based applications. It offers a robust and scalable solution for managing document collection in different formats. With LlamaIndex, bot developers are empowered to effortlessly integrate LLM-based QA (question answering) capabilities into their applications, eliminating the complexities associated with managing solutions catered to large-scale document collections. Furthermore, this approach proves to be cost-effective for smaller-sized document repositories.
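As a quick illustration, the following is a minimal sketch of building and querying an index with LlamaIndex. It assumes a pre-0.10 llama_index API and local FAQ documents; in this solution, the index is built inside a Lambda function and the LLM and embeddings are served from SageMaker endpoints.
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Load the FAQ documents (the path is illustrative)
documents = SimpleDirectoryReader("./faq_docs").load_data()

# Build a vector index over the documents
index = VectorStoreIndex.from_documents(documents)

# Ask a question against the index
query_engine = index.as_query_engine()
response = query_engine.query("What is your return policy?")
print(response)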
Prerequisites
You should have the following prerequisites:
- An AWS account
- An AWS Identity and Access Management (IAM) user and role with permissions to access the following:
- Amazon Lex
- Lambda
- Amazon SageMaker
- An S3 bucket
- The AWS CDK installed
Set up your development environment
The main third-party package requirements are llama_index and the SageMaker Python SDK (sagemaker). Follow the commands in our GitHub repository’s README to set up your environment properly.
Deploy the required resources
This step involves creating an Amazon Lex bot, S3 buckets, and a SageMaker endpoint. Additionally, you need to Dockerize the code in the Docker image directory and push the images to Amazon Elastic Container Registry (Amazon ECR) so that they can run in Lambda. Follow the commands in our GitHub repository’s README to deploy the services.
During this step, we demonstrate LLM hosting via SageMaker Deep Learning Containers. Adjust the settings according to your computation needs:
- Model – To find a model that meets your requirements, you can explore resources like the Hugging Face model hub. It offers a variety of models such as Falcon 7B or Flan-T5-XXL. Additionally, you can find detailed information about various officially supported model architectures, helping you make an informed decision. For more information about different model types, refer to optimized architectures.
- Model inference endpoint – Define the path of the model (for example, Falcon 7B), choose your instance type (for example, g5.4xlarge), and use quantization (for example, int-8 quantization). Note: This solution gives you the flexibility to choose another model inference endpoint. You can also use Amazon Bedrock, which provides access to other LLMs such as Amazon Titan. A hosting sketch follows this list.
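For reference, the following is one way to host such a model with the Hugging Face LLM Deep Learning Container on SageMaker. It’s a sketch under assumptions: the model ID (tiiuae/falcon-7b-instruct), instance type, and quantization setting are examples to adjust, not necessarily what the repository deploys.

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# Hugging Face TGI (text-generation-inference) container for LLM hosting
image_uri = get_huggingface_llm_image_uri("huggingface")

llm_model = HuggingFaceModel(
    image_uri=image_uri,
    role=role,
    env={
        "HF_MODEL_ID": "tiiuae/falcon-7b-instruct",  # example model
        "SM_NUM_GPUS": "1",
        "HF_MODEL_QUANTIZE": "bitsandbytes",         # example int-8 quantization setting
    },
)

llm_endpoint = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.4xlarge",  # example instance type
)

# Quick smoke test of the hosted LLM
print(llm_endpoint.predict({"inputs": "What is Amazon Lex?"}))
```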
Set up your document index via LlamaIndex
To set up your document index, first upload your document data. We assume that you have the source of your FAQ content, such as a PDF or text file.
After the document data is uploaded, the LlamaIndex system will automatically initiate the process of creating the document index. This task is performed by a Lambda function, which generates the index and saves it to an S3 bucket.
To enable efficient retrieval of relevant information, configure the document retriever using the LlamaIndex Retriever Query Engine. This engine offers several customization options, such as the following (a configuration sketch follows the list):
- Embedding models – You can choose your embedding model, such as a Hugging Face embedding model.
- Confidence cutoff – Specify a confidence cutoff threshold to determine the quality of retrieval results. If the confidence score falls below this threshold, you can choose to provide out-of-scope responses, indicating that the query is beyond the scope of the indexed documents.
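A configuration along these lines would set both options. It’s a sketch: import paths assume a recent llama_index.core release and the llama-index-embeddings-huggingface package, the embedding model name, document directory, and cutoff value are examples, and the LLM configuration (for instance, pointing LlamaIndex at the SageMaker endpoint) is omitted for brevity.

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.postprocessor import SimilarityPostprocessor
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Embedding model: a Hugging Face sentence-embedding model (example choice)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Build the index over the FAQ documents
documents = SimpleDirectoryReader("./faq_docs").load_data()
index = VectorStoreIndex.from_documents(documents)

# Confidence cutoff: drop retrieved nodes whose similarity score falls below the threshold
query_engine = index.as_query_engine(
    similarity_top_k=3,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)

response = query_engine.query("What is your return policy?")
# If nothing survives the cutoff, return an out-of-scope answer instead
answer = str(response) if response.source_nodes else "That question is out of scope for our FAQs."
```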
Test the integration
Define your bot definition with a fallback intent and use the Amazon Lex console to test your FAQ requests. For more details, refer to the GitHub repository. The following screenshot shows an example conversation with the bot.
Tips to boost your bot efficiency
The following tips could potentially further improve the efficiency of your bot:
- Index storage – Store your index in an S3 bucket or a service with vector database capabilities such as Amazon OpenSearch Service. By using cloud-based storage solutions, you can enhance the accessibility and scalability of your index, leading to faster retrieval times and improved overall performance. Also, refer to this blog post for an Amazon Lex bot that uses an Amazon Kendra search solution.
- Retrieval optimization – Experiment with different sizes of embedding models for the retriever. The choice of embedding model can significantly impact the input requirements of your LLM. Finding the optimal balance between model size and retrieval performance can result in improved efficiency and faster response times.
- Prompt engineering – Experiment with different prompt formats, lengths, and styles to optimize the performance and quality of your bot’s answers.
- LLM model selection – Select the most suitable LLM model for your specific use case. Consider factors such as model size, language capabilities, and compatibility with your application requirements. Choosing the right LLM model ensures optimal performance and efficient utilization of system resources.
Contact center conversations can span from self-service to a live human interaction. For use cases involving human-to-human interactions over Amazon Connect, you can use Wisdom to search and find content across multiple repositories, such as frequently asked questions (FAQs), wikis, articles, and step-by-step instructions for handling different customer issues.
Clean up
To avoid incurring future expenses, delete all the resources that were deployed as part of this exercise. We have provided a script to shut down the SageMaker endpoint gracefully; usage details are in the README. Additionally, to remove all the other resources, you can run cdk destroy in the same directory as the other cdk commands to deprovision all the resources in your stack.
Summary
This post discussed the following steps to enhance Amazon Lex with LLM-based QA features using the RAG strategy and LlamaIndex:
- Install the necessary dependencies, including LlamaIndex libraries
- Set up model hosting via Amazon SageMaker or Amazon Bedrock (in limited preview)
- Configure LlamaIndex by creating an index and populating it with relevant documents
- Integrate RAG into Amazon Lex by modifying the configuration and configuring RAG to use LlamaIndex for document retrieval
- Test the integration by engaging in conversations with the chatbot and observing its retrieval and generation of accurate responses
By following these steps, you can seamlessly incorporate powerful LLM-based QA capabilities and efficient document indexing into your Amazon Lex chatbot, resulting in more accurate, comprehensive, and contextually aware interactions with users. As a follow up, we also invite you to review our next blog post, which explores enhancing the Amazon Lex FAQ experience using URL ingestion and LLMs.
About the authors
Max Henkel-Wallace is a Software Development Engineer at AWS Lex. He enjoys leveraging technology to maximize customer success. Outside of work he is passionate about cooking, spending time with friends, and backpacking.
Song Feng is a Senior Applied Scientist at AWS AI Labs, specializing in Natural Language Processing and Artificial Intelligence. Her research explores various aspects of these fields including document-grounded dialogue modeling, reasoning for task-oriented dialogues, and interactive text generation using multimodal data.
Saket Saurabh is an engineer on the AWS Lex team. He works on improving the Lex developer experience to help developers build more human-like chatbots. Outside of work, he enjoys traveling, discovering diverse cuisines, and learning about different cultures.
Enhance Amazon Lex with LLMs and improve the FAQ experience using URL ingestion
In today’s digital world, most consumers prefer to find answers to their customer service questions on their own rather than take the time to reach out to businesses or service providers. This blog post explores an innovative solution to build a question and answer chatbot in Amazon Lex that uses existing FAQs from your website. This AI-powered tool can provide quick, accurate responses to real-world inquiries, allowing the customer to quickly and easily solve common problems independently.
Single URL ingestion
Many enterprises have a published set of answers to FAQs available for their customers on their website. In this case, we want to offer customers a chatbot that can answer their questions from our published FAQs. In the blog post titled Enhance Amazon Lex with conversational FAQ features using LLMs, we demonstrated how you can use a combination of Amazon Lex and LlamaIndex to build a chatbot powered by your existing knowledge sources, such as PDF or Word documents. To support an FAQ bot based on a website of FAQs, we need an ingestion process that can crawl the website and create embeddings that LlamaIndex can use to answer customer questions. In this case, we build on the bot created in the previous blog post, which queries those embeddings with a user’s utterance and returns the answer from the website FAQs.
The following diagram shows how the ingestion process and the Amazon Lex bot work together for our solution.
In the solution workflow, the website with FAQs is ingested via AWS Lambda. This Lambda function crawls the website and stores the resulting text in an Amazon Simple Storage Service (Amazon S3) bucket. The S3 bucket then triggers a Lambda function that uses LlamaIndex to create embeddings that are stored in Amazon S3. When a question from an end-user arrives, such as “What is your return policy?”, the Amazon Lex bot uses its Lambda function to query the embeddings using a RAG-based approach with LlamaIndex. For more information about this approach and the prerequisites, refer to the blog post Enhance Amazon Lex with conversational FAQ features using LLMs.
After the prerequisites from the aforementioned blog post are complete, the first step is to ingest the FAQs into a document repository that can be vectorized and indexed by LlamaIndex. The following code shows how to accomplish this:
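The repository defines its own EZWebLoader; the sketch below shows the general idea rather than the exact implementation. It fetches the page with requests, extracts the visible text with BeautifulSoup, wraps it in a LlamaIndex Document, and builds a vector index. The FAQ URL and persist directory are example values.

```python
import requests
from bs4 import BeautifulSoup
from llama_index.core import Document, VectorStoreIndex

class EZWebLoader:
    """Loads the visible text of a web page into LlamaIndex documents (simplified sketch)."""

    def load(self, url: str) -> list[Document]:
        html = requests.get(url, timeout=30).text
        text = BeautifulSoup(html, "html.parser").get_text(separator="\n")
        return [Document(text=text, metadata={"source": url})]

# Ingest the Zappos FAQ page and build the index used by the bot
loader = EZWebLoader()
documents = loader.load("https://www.zappos.com/general-questions")  # example FAQ URL
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir="/tmp/index")
```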
In the preceding example, we take a predefined FAQ website URL from Zappos and ingest it using the EZWebLoader class. With this class, we navigate to the URL and load all the questions on the page into an index. We can now ask a question like “Does Zappos have gift cards?” and get the answers directly from our FAQs on the website. The following screenshot shows the Amazon Lex bot test console answering that question from the FAQs.
We were able to achieve this because we had crawled the URL in the first step and created embeddings that LlamaIndex could use to search for the answer to our question. Our bot’s Lambda function shows how this search is run whenever the fallback intent is returned:
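The handler looks roughly like the following. It’s a sketch of an Amazon Lex V2 fallback handler, not the repository’s exact code: the persisted-index location is a placeholder, and the LLM and embedding configuration for LlamaIndex are assumed to be set up elsewhere in the function’s initialization.

```python
from llama_index.core import StorageContext, load_index_from_storage

# Load the persisted index once per Lambda container (avoids re-loading on every request)
storage_context = StorageContext.from_defaults(persist_dir="/tmp/index")
query_engine = load_index_from_storage(storage_context).as_query_engine()

def handler(event, context):
    # The user's utterance that triggered the FallbackIntent
    utterance = event["inputTranscript"]
    answer = str(query_engine.query(utterance))

    # Close the intent and return the generated answer to Amazon Lex (V2 response format)
    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": {"name": event["sessionState"]["intent"]["name"], "state": "Fulfilled"},
        },
        "messages": [{"contentType": "PlainText", "content": answer}],
    }
```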
This solution works well when a single webpage has all the answers. However, most FAQ sites are not built on a single page. For instance, in our Zappos example, if we ask the question “Do you have a price matching policy?”, then we get a less-than-satisfactory answer, as shown in the following screenshot.
In the preceding interaction, the price-matching policy answer isn’t helpful for our user. This answer is short because the FAQ referenced is a link to a specific page about the price matching policy and our web crawl was only for the single page. Achieving better answers will mean crawling these links as well. The next section shows how to get answers to questions that require two or more levels of page depth.
N-level crawling
When we crawl a web page for FAQ knowledge, the information we want can be contained in linked pages. For example, in our Zappos example, we ask the question “Do you have a price matching policy?” and the answer is “Yes please visit <link> to learn more.” If someone asks “What is your price matching policy?” then we want to give a complete answer with the policy. Achieving this means we need to traverse links to get the actual information for our end-user. During the ingestion process, we can use our web loader to find the anchor links to other HTML pages and then traverse them. The following code change to our web crawler allows us to find links in the pages we crawl. It also includes some additional logic to avoid circular crawling and to allow filtering by a URL prefix.
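A crawler with those two additions might look like this sketch. The implementation is illustrative rather than the repository’s EZWebLoader code, and the starting URL, depth, and prefix are example values.

```python
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup
from llama_index.core import Document, VectorStoreIndex

def crawl(url: str, depth: int, prefix: str, visited=None) -> list[Document]:
    """Recursively crawl pages up to `depth` levels, following only links under `prefix`."""
    visited = set() if visited is None else visited
    if depth < 0 or url in visited:
        return []  # stop at the maximum depth and avoid circular crawling
    visited.add(url)

    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    documents = [Document(text=soup.get_text(separator="\n"), metadata={"source": url})]

    # Follow anchor links that match the prefix filter
    for anchor in soup.find_all("a", href=True):
        link = urljoin(url, anchor["href"])
        if link.startswith(prefix):
            documents.extend(crawl(link, depth - 1, prefix, visited))
    return documents

# Two-level crawl, restricted to the Zappos customer service pages
documents = crawl(
    "https://www.zappos.com/general-questions",  # example starting page
    depth=1,                                     # the start page plus one level of links
    prefix="https://www.zappos.com/c",
)
index = VectorStoreIndex.from_documents(documents)
```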
In the preceding code, we introduce the ability to crawl N levels deep, and we give a prefix that allows us to restrict crawling to only pages that begin with a certain URL pattern. In our Zappos example, the customer service pages are all rooted at zappos.com/c, so we include that as a prefix to limit our crawls to a smaller and more relevant subset. The code shows how we can ingest up to two levels deep. Our bot’s Lambda logic remains the same because nothing has changed except the crawler ingests more documents.
We now have all the documents indexed and we can ask a more detailed question. In the following screenshot, our bot provides the correct answer to the question “Do you have a price matching policy?”
We now have a complete answer to our question about price matching. Instead of simply being told “Yes see our policy,” it gives us the details from the second-level crawl.
Clean up
To avoid incurring future expenses, delete all the resources that were deployed as part of this exercise. We have provided a script to shut down the SageMaker endpoint gracefully; usage details are in the README. Additionally, to remove all the other resources, you can run cdk destroy in the same directory as the other cdk commands to deprovision all the resources in your stack.
Conclusion
The ability to ingest a set of FAQs into a chatbot enables your customers to find the answers to their questions with straightforward, natural language queries. By combining the built-in support in Amazon Lex for fallback handling with a RAG solution such as LlamaIndex, we can provide a quick path for our customers to get satisfying, curated, and approved answers to FAQs. By applying N-level crawling in our solution, we can allow for answers that span multiple FAQ links and provide deeper answers to our customers’ queries. By following these steps, you can seamlessly incorporate powerful LLM-based Q&A capabilities and efficient URL ingestion into your Amazon Lex chatbot. This results in more accurate, comprehensive, and contextually aware interactions with users.
About the authors
Max Henkel-Wallace is a Software Development Engineer at AWS Lex. He enjoys leveraging technology to maximize customer success. Outside of work he is passionate about cooking, spending time with friends, and backpacking.
Song Feng is a Senior Applied Scientist at AWS AI Labs, specializing in Natural Language Processing and Artificial Intelligence. Her research explores various aspects of these fields including document-grounded dialogue modeling, reasoning for task-oriented dialogues, and interactive text generation using multimodal data.
John Baker is a Principal SDE at AWS where he works on Natural Language Processing, Large Language Models and other ML/AI related projects. He has been with Amazon for 9+ years and has worked across AWS, Alexa and Amazon.com. In his spare time, John enjoys skiing and other outdoor activities throughout the Pacific Northwest.
Geopipe uses AI to create a digital twin of Earth
With help from the Alexa Fund, the company is making it easier to virtually reconstruct reality.
Build an email spam detector using Amazon SageMaker
Spam emails, also known as junk mail, are sent to a large number of users at once and often contain scams, phishing content, or cryptic messages. Spam emails are sometimes sent manually by a human, but most often they are sent using a bot. Examples of spam emails include fake ads, chain emails, and impersonation attempts. There is a risk that a particularly well-disguised spam email may land in your inbox, which can be dangerous if clicked on. It’s important to take extra precautions to protect your device and sensitive information.
As technology improves, detecting spam emails becomes more challenging because of their changing nature. Spam is quite different from other types of security threats. It may at first appear like an annoying message rather than a threat, but it has an immediate effect. Also, spammers often adopt new techniques. Organizations that provide email services want to minimize spam as much as possible to avoid any damage to their end customers.
In this post, we show how straightforward it is to build an email spam detector using Amazon SageMaker. The built-in BlazingText algorithm offers optimized implementations of Word2vec and text classification algorithms. Word2vec is useful for various natural language processing (NLP) tasks, such as sentiment analysis, named entity recognition, and machine translation. Text classification is essential for applications like web searches, information retrieval, ranking, and document classification.
Solution overview
This post demonstrates how you can set up an email spam detector and filter spam emails using SageMaker. Let’s see how a spam detector typically works, as shown in the following diagram.
Emails are sent through a spam detector. An email is sent to the spam folder if the spam detector detects it as spam. Otherwise, it’s sent to the customer’s inbox.
We walk you through the following steps to set up our spam detector model:
- Download the sample dataset from the GitHub repo.
- Load the data in an Amazon SageMaker Studio notebook.
- Prepare the data for the model.
- Train, deploy, and test the model.
Prerequisites
Before diving into this use case, complete the following prerequisites:
- Set up an AWS account.
- Set up a SageMaker domain.
- Create an Amazon Simple Storage Service (Amazon S3) bucket. For instructions, see Create your first S3 bucket.
Download the dataset
Download the email_dataset.csv from GitHub and upload the file to the S3 bucket.
The BlazingText algorithm expects a single preprocessed text file with space-separated tokens. Each line in the file should contain a single sentence. If you need to train on multiple text files, concatenate them into one file and upload the file in the respective channel.
Load the data in SageMaker Studio
To perform the data load, complete the following steps:
- Download the spam_detector.ipynb file from GitHub and upload it to SageMaker Studio.
- In your Studio notebook, open the spam_detector.ipynb notebook.
- If you are prompted to choose a kernel, choose the Python 3 (Data Science 3.0) kernel and choose Select. If not, verify that the right kernel has been automatically selected.
- Import the required Python libraries and set the role and the S3 bucket. Specify the S3 bucket and prefix where you uploaded email_dataset.csv (see the sketch after this list).
- Run the data load step in the notebook.
- Check whether the dataset is balanced based on the Category labels.
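As a minimal sketch of these steps (the bucket name and prefix are placeholders, and the column name follows the dataset’s Category header):

```python
import io
import boto3
import pandas as pd
import sagemaker

session = sagemaker.Session()
role = sagemaker.get_execution_role()

bucket = "my-spam-detector-bucket"   # replace with your S3 bucket
prefix = "spam-detector"             # replace with your prefix

# Load the dataset that was uploaded to Amazon S3
obj = boto3.client("s3").get_object(Bucket=bucket, Key=f"{prefix}/email_dataset.csv")
df = pd.read_csv(io.BytesIO(obj["Body"].read()))

# Check whether the dataset is balanced across the Category labels
print(df["Category"].value_counts())
```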
We can see our dataset is balanced.
Prepare the data
The BlazingText algorithm in supervised mode expects each line to start with the label prefixed by the string __label__, followed by the space-separated tokens of the message. For example, a spam message becomes a line that begins with __label__1 followed by its tokenized text. For details, check Training and Validation Data Format for the BlazingText Algorithm.
You now run the data preparation step in the notebook.
- First, you need to convert the Category column to an integer. The following cell replaces the SPAM value with 1 and the HAM value with 0.
- The next cell adds the __label__ prefix to each Category value and tokenizes the Message column.
- The next step is to split the dataset into train and validation datasets and upload the files to the S3 bucket. A sketch of these steps follows this list.
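The following sketch continues from the earlier one (reusing df, bucket, and prefix). The tokenizer choice (NLTK), the 80/20 split, and the output file names are assumptions rather than the notebook’s exact settings.

```python
import boto3
import nltk
import pandas as pd
from sklearn.model_selection import train_test_split

nltk.download("punkt")

# 1. Convert the Category column to an integer label: SPAM -> 1, HAM -> 0
df["Category"] = df["Category"].str.upper().map({"SPAM": 1, "HAM": 0})

# 2. Add the __label__ prefix and tokenize the Message column
#    Example output line: "__label__1 click on below link provide your details ..."
df["line"] = (
    "__label__" + df["Category"].astype(str) + " "
    + df["Message"].apply(lambda m: " ".join(nltk.word_tokenize(str(m).lower())))
)

# 3. Split into train and validation sets and upload both files to the S3 bucket
train, validation = train_test_split(df["line"], test_size=0.2, random_state=42)
train.to_csv("train.txt", index=False, header=False)
validation.to_csv("validation.txt", index=False, header=False)

s3 = boto3.client("s3")
s3.upload_file("train.txt", bucket, f"{prefix}/train/train.txt")
s3.upload_file("validation.txt", bucket, f"{prefix}/validation/validation.txt")
```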
Train the model
To train the model, complete the following steps in the notebook:
- Set up the BlazingText estimator and create an estimator instance passing the container image.
- Set the learning mode hyperparameter to supervised.
BlazingText has both unsupervised and supervised learning modes. Our use case is text classification, which is supervised learning.
- Create the train and validation data channels.
- Start training the model.
- Get the accuracy on the train and validation datasets (a sketch of these steps follows the list).
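A sketch of these steps with the SageMaker Python SDK follows. The instance type and hyperparameter values are illustrative, and bucket and prefix carry over from the earlier sketches; accuracy on both channels is reported in the training job logs.

```python
import sagemaker
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = sagemaker.get_execution_role()
region = session.boto_region_name

# BlazingText container image for the current Region
image_uri = sagemaker.image_uris.retrieve("blazingtext", region)

bt_model = sagemaker.estimator.Estimator(
    image_uri,
    role,
    instance_count=1,
    instance_type="ml.m5.xlarge",          # example instance type
    output_path=f"s3://{bucket}/{prefix}/output",
    sagemaker_session=session,
)

# Supervised mode = text classification
bt_model.set_hyperparameters(mode="supervised", epochs=10, min_count=2, vector_dim=10)

# Train and validation data channels
train_channel = TrainingInput(f"s3://{bucket}/{prefix}/train", content_type="text/plain")
validation_channel = TrainingInput(f"s3://{bucket}/{prefix}/validation", content_type="text/plain")

# Start training; metrics for both channels appear in the training logs
bt_model.fit({"train": train_channel, "validation": validation_channel})
```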
Deploy the model
In this step, we deploy the trained model as an endpoint. Choose the instance type that fits your inference workload.
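Continuing from the training sketch, a deployment might look like the following; the instance type is an example.

```python
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

# Deploy the trained BlazingText model as a real-time inference endpoint
text_classifier = bt_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",      # example; choose the instance type that fits your workload
    serializer=JSONSerializer(),       # send JSON payloads
    deserializer=JSONDeserializer(),   # parse JSON responses
)
```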
Test the model
Let’s provide an example of three email messages that we want to get predictions for:
- Click on below link, provide your details and win this award
- Best summer deal here
- See you in the office on Friday.
Tokenize the email messages and specify the payload to use when calling the endpoint.
Now we can predict the email classification for each email. Call the predict method of the text classifier, passing the tokenized sentence instances (payload) into the data argument.
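A sketch of the inference call, continuing from the deployment sketch: BlazingText expects a JSON payload with an instances list of tokenized sentences, and the printed output shape shown in the comment is indicative only.

```python
import nltk

emails = [
    "Click on below link, provide your details and win this award",
    "Best summer deal here",
    "See you in the office on Friday.",
]

# Tokenize each message the same way as the training data
tokenized = [" ".join(nltk.word_tokenize(email.lower())) for email in emails]

# BlazingText inference payload; k=1 returns the top predicted label per sentence
payload = {"instances": tokenized, "configuration": {"k": 1}}

predictions = text_classifier.predict(data=payload)
print(predictions)  # e.g., [{"label": ["__label__1"], "prob": [0.9...]}, ...]
```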
Clean up
Finally, you can delete the endpoint to avoid any unexpected costs.
Also, delete the data files from the S3 bucket.
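For example, continuing from the earlier sketches (bucket and prefix as defined before):

```python
import boto3

# Delete the real-time endpoint to stop incurring charges
text_classifier.delete_endpoint()

# Remove the dataset and prepared files from the S3 bucket
s3 = boto3.resource("s3")
s3.Bucket(bucket).objects.filter(Prefix=prefix).delete()
```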
Conclusion
In this post, we walked you through the steps to create an email spam detector using the SageMaker BlazingText algorithm. With the BlazingText algorithm, you can scale to large datasets. BlazingText is used for textual analysis and text classification problems, and has both unsupervised and supervised learning modes. You can use the algorithm for use cases like customer sentiment analysis and text classification.
To learn more about the BlazingText algorithm, check out BlazingText algorithm.
About the Author
Dhiraj Thakur is a Solutions Architect with Amazon Web Services. He works with AWS customers and partners to provide guidance on enterprise cloud adoption, migration, and strategy. He is passionate about technology and enjoys building and experimenting in the analytics and AI/ML space.