Dino-Mite: Capcom’s ‘Exoprimal’ Joins GeForce NOW

Hold on to your seats — this GFN Thursday is unleashing dinosaurs, crowns and more in the cloud.

Catch it all on Capcom’s Exoprimal and Ubisoft’s Prince of Persia: The Lost Crown, leading 10 new games joining the GeForce NOW library this week.

Suit Up, Adapt, Survive

Exoprimal on GeForce NOW
Life finds a way.

Don cutting-edge exosuit technology and battle ferocious dinosaurs on an Earth overrun with waves of prehistoric predators. Capcom’s online team-based action game Exoprimal is now supported in the cloud.

Face velociraptors, T. rex and mutated variants called Neosaurs using the exosuit’s unique weapons and abilities. Join other players in the game’s main mode, Dino Survival, to unlock snippets and special missions from the original story, piecing together the origins of the dinosaur outbreak. Change exosuits on the fly, switching between Assault, Tank and Support roles to suit the situation.

Catch the game in the cloud this week alongside the release of Title Update 3, which brings a new mission and special Monster Hunter collaboration content, a new map, new rigs, plus the start of the third season. Ultimate members can enjoy it all at up to 4K resolution and 120 frames per second, and new players can purchase the game on Steam at 50% off for a limited time.

Return to the Sands of Time

Prince of Persia on GeForce NOW
So stylish.

Defy time and destiny to reclaim the crown and save a cursed world in Prince of Persia: The Lost Crown. It’s the newest adventure in the critically acclaimed action-adventure platformer series, available to stream in the cloud this week at the game’s PC launch.

Step into the shoes of Sargon, a legendary prince with extraordinary acrobatic skills and the power to manipulate time. Travel to Mount Qaf to rescue the kidnapped Prince Ghassan. Wield blades and various time-related powers to fight enemies and solve puzzles in a Persia-inspired world filled with larger-than-life landmarks.

Members can unleash their inner warrior with an Ultimate membership for the highest-quality streaming. Dash into the thrilling game with support for up to 4K resolution at 120 fps on PCs and Macs, streaming from GeForce RTX 4080-powered servers in the cloud.

Time for New Games

Turnip Boy’s back, alright!

In addition, members can look for the following:

  • Those Who Remain (New release on Xbox, available on PC Game Pass, Jan. 16)
  • Prince of Persia: The Lost Crown (New release on Ubisoft and Ubisoft+, Jan. 18)
  • Turnip Boy Robs a Bank (New release on Steam and Xbox, available for PC Game Pass, Jan. 18)
  • New Cycle (New release on Steam, Jan. 18)
  • Beacon Pines (Xbox, available on the Microsoft Store)
  • Exoprimal (Steam)
  • FAR: Changing Tides (Xbox, available on the Microsoft Store)
  • Going Under (Xbox, available on the Microsoft Store)
  • The Legend of Nayuta: Boundless Trails (Steam)
  • Turnip Boy Commits Tax Evasion (Xbox, available on the Microsoft Store)

What are you planning to play this weekend? Let us know on X or in the comments below.

Read More

Accelerate PyTorch Models Using Quantization Techniques with Intel Extension for PyTorch

Overview

PyTorch is a Python-based framework for developing deep learning models. It is one of the most popular industry-standard AI frameworks and is used for a wide variety of computer vision and natural language processing applications. PyTorch was developed by Meta and is now part of The Linux Foundation. Intel works with the open source PyTorch project to optimize the PyTorch framework for Intel® hardware. The newest optimizations and features are first released in Intel® Extension for PyTorch before upstreaming them into PyTorch. The Intel extension provides quantization features to deliver good accuracy results for large deep learning models.

This article introduces quantization and its types, and walks through a code sample that shows how to accelerate PyTorch-based models by applying Intel Extension for PyTorch quantization.

What Is Quantization?

Quantization is a systematic reduction of the precision of all or several layers within a model. A higher-precision type, such as single-precision floating point (FP32), which is most commonly used in deep learning, is converted into a lower-precision type such as FP16 (16 bits) or int8 (8 bits).

This helps to achieve:

  • Lower memory bandwidth
  • Lower storage
  • Higher performance with minimal to no accuracy loss

Quantization is especially important with large models such as those based on the Transformer architecture (like BERT or GPT).
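To make the idea concrete, the following is a minimal sketch of affine int8 quantization of a single tensor. It is illustrative only, not the Intel extension’s internal code; the scale and zero-point arithmetic shown is the standard per-tensor scheme that quantization frameworks apply.

import torch

# Minimal sketch: affine quantization of one FP32 tensor to int8.
x = torch.randn(8) * 3                              # FP32 values
scale = (x.max() - x.min()) / 255                   # spread the observed range over 256 int8 levels
zero_point = -128 - int((x.min() / scale).round())  # the int8 code that represents the real value 0.0
x_int8 = ((x / scale).round() + zero_point).clamp(-128, 127).to(torch.int8)
x_restored = (x_int8.float() - zero_point) * scale  # dequantize; matches x up to rounding error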

There are two types of quantization:

  • Static: This quantizes the weights and activations of the model, and is used when memory bandwidth and compute savings are important.
  • Dynamic: The weights are quantized ahead of time, but the activations are dynamically quantized during inference.

How to Perform Static Quantization and Dynamic Quantization

The Intel extension extends PyTorch with up-to-date features and optimizations for an extra performance boost on Intel hardware.

Installation Instructions for Intel Extension for PyTorch

The extension can be loaded as a Python module or linked as a C++ library. Python users can enable it dynamically by importing intel_extension_for_pytorch. The extension provides built-in quantization to deliver good statistical accuracy for most popular deep learning workloads including convolutional neural networks (CNN), natural language processing (NLP), and recommendation models. The quantization functionality in the Intel extension currently supports post-training quantization.

To quantize the existing FP32 model to an int8 model using static quantization:

  1. Prepare the quantization configuration. For default static quantization configuration, use ipex.quantization.default_static_qconfig.
  2. Prepare the model for calibration using the ipex.quantization.prepare method.
  3. Perform calibration against the dataset. This calibration is specific for static quantization as it needs the representative dataset to determine the optimal quantization parameters, so the user should provide data to the model in batches to calibrate it.
  4. Convert the model from FP32 to int8 using the ipex.quantization.convert method. This function converts the FP32 model to int8 based on the applied calibration and configuration.

To quantize an existing FP32 model to an int8 model using dynamic quantization, the flow is similar to static quantization (a consolidated sketch follows these steps):

  1. Prepare the quantization configuration. For default dynamic quantization configuration, use ipex.quantization.default_dynamic_qconfig.
  2. Prepare the FP32 model by using the ipex.quantization.prepare method. Provide the parameters, such as FP32 model to quantize, the prepared configuration, example inputs, and information.
  3. Convert the model from FP32 to int8 using the ipex.quantization.convert method. The input model is the model prepared in Step 2.
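Putting those three steps together, a minimal dynamic quantization sketch (using the same ResNet-50 model and example input as the code sample below; the import path for prepare and convert is the one the extension documents) might look like this:

import torch
import torchvision
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert

# FP32 model and an example input used for tracing during preparation
model_fp32 = torchvision.models.resnet50(weights=torchvision.models.ResNet50_Weights.DEFAULT).eval()
data = torch.rand(1, 3, 224, 224)

qconfig_dynamic = ipex.quantization.default_dynamic_qconfig                                # step 1
prepared_model = prepare(model_fp32, qconfig_dynamic, example_inputs=data, inplace=False)  # step 2
converted_model = convert(prepared_model)                                                  # step 3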

Code Sample

Dataset

For static quantization, the model is calibrated with the CIFAR-10 dataset. CIFAR-10 is a subset of the 80 million tiny images dataset collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.

This dataset contains 60,000 images in 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck). Every class has exactly 6,000 images. All images are 32 x 32 pixels and colored. The classes are completely mutually exclusive, which means there is no overlap between classes.

Implementation

The code sample demonstrates how to quantize (using static and dynamic quantization) a ResNet*-50 model using Intel Extension for PyTorch. The following steps are implemented in the code sample:

Download and Prepare the Dataset

Here, we use the CIFAR-10 dataset available in torchvision.

  1. To make the data fit the model:
  • Transform the data.
  • Change the size of the images from 32 x 32 pixels to 224 x 224 pixels.
  • Convert them to tensors.
  • Normalize them.
  2. Prepare the transformations of the dataset as shown:
transform = torchvision.transforms.Compose([
    torchvision.transforms.Resize((224, 224)),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

  3. Initialize the dataset.
test_dataset = torchvision.datasets.CIFAR10(root=DATA, train=False, transform=transform, download=True)

Prepare the Data Loader

To load a dataset for static quantization calibration in specific size batches, create the loader as shown:

calibration_data_loader = torch.utils.data.DataLoader(
    dataset=test_dataset,
    batch_size=128
)

Create the Model

Use the pretrained ResNet-50 model available in the Torchvision library with default weights. The prepared model is FP32.

model_fp32 = torchvision.models.resnet50(weights=torchvision.models.ResNet50_Weights.DEFAULT)

Apply Static Quantization

Create a staticQuantize function that implements the steps described previously.

  1. To perform static quantization, we need:
  • FP32 model loaded earlier
  • Example data
  • Calibration dataset
  2. Prepare the quantization configuration:
qconfig_static = ipex.quantization.default_static_qconfig

In this code sample, we are using the default quantization configuration, but you can also define your own.

  3. Prepare the model using the declared configuration:
prepared_model_static = prepare(model_fp32,
                                qconfig_static,
                                example_inputs=data,
                                inplace=False)
  4. Calibrate the model with the calibration dataset. Feed the model successive batches of data from the dataset.
for batch_idx, (data, target) in enumerate(calibration_data_loader):
    prepared_model_static(data)
    if batch_idx % 10 == 0:
        print("Batch %d/%d complete, continue ..." % (batch_idx + 1, len(calibration_data_loader)))
  5. Convert the model.
converted_model_static = convert(prepared_model_static)

Apply Dynamic Quantization

Create the dynamicQuantize function similar to the staticQuantize function.

  1. To perform dynamic quantization, we only need:
  • The FP32 model loaded earlier
  • Example data
  2. Prepare the quantization configuration:
qconfig_dynamic = ipex.quantization.default_dynamic_qconfig
  3. Prepare the model.
prepared_model_dynamic = prepare(model_fp32,
                                 qconfig_dynamic,
                                 example_inputs=data,
                                 inplace=False)
  4. Convert the model from FP32 to int8.
converted_model_dynamic = convert(prepared_model_dynamic)

In this way, two functions are created to take advantage of the optimizations that quantization offers:

  • dynamicQuantize for dynamic quantization of models
  • staticQuantize for static quantization of models
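To verify the speedup, a simple latency comparison can be run on the FP32 and converted int8 models. The sketch below assumes the model_fp32, data, and converted models created in the preceding steps:

import time
import torch

def measure_latency(model, data, n_iter=50):
    # Average time per forward pass, run without gradient tracking
    model.eval()
    with torch.no_grad():
        start = time.time()
        for _ in range(n_iter):
            model(data)
    return (time.time() - start) / n_iter

print("FP32:", measure_latency(model_fp32, data))
print("int8, static:", measure_latency(converted_model_static, data))
print("int8, dynamic:", measure_latency(converted_model_dynamic, data))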

Next Steps

Get started with Intel Extension for PyTorch quantization today and use it to accelerate your deep learning workloads while maintaining accuracy. Additionally, Intel® Neural Compressor provides quantization to improve the speed of inference.

Check out and incorporate Intel’s other AI and machine learning framework optimizations and end-to-end portfolio of tools into your AI workflow.

Learn about the unified, open, standards-based oneAPI programming model that forms the foundation of Intel’s AI Software Portfolio to help you prepare, build, deploy, and scale your AI solutions.

For more details about the 4th gen Intel® Xeon® Scalable processors, visit the Intel® AI platform overview where you can learn how Intel is empowering developers to run end-to-end AI pipelines on these powerful CPUs.

Additional Resources

Read More

Fine-tune and deploy Llama 2 models cost-effectively in Amazon SageMaker JumpStart with AWS Inferentia and AWS Trainium

Today, we’re excited to announce the availability of Llama 2 inference and fine-tuning support on AWS Trainium and AWS Inferentia instances in Amazon SageMaker JumpStart. Using AWS Trainium and Inferentia based instances through SageMaker can help users lower fine-tuning costs by up to 50% and lower deployment costs by 4.7x, while reducing per-token latency. Llama 2 is an auto-regressive generative text language model that uses an optimized transformer architecture. As a publicly available model, Llama 2 is designed for many NLP tasks such as text classification, sentiment analysis, language translation, language modeling, text generation, and dialogue systems. Fine-tuning and deploying LLMs like Llama 2 can be costly, and meeting real-time performance requirements for a good customer experience can be challenging. Trainium and AWS Inferentia, enabled by the AWS Neuron software development kit (SDK), offer a high-performance and cost-effective option for training and inference of Llama 2 models.

In this post, we demonstrate how to deploy and fine-tune Llama 2 on Trainium and AWS Inferentia instances in SageMaker JumpStart.

Solution overview

In this blog, we will walk through the following scenarios:

  1. Deploy Llama 2 on AWS Inferentia instances in both the Amazon SageMaker Studio UI, with a one-click deployment experience, and the SageMaker Python SDK.
  2. Fine-tune Llama 2 on Trainium instances in both the SageMaker Studio UI and the SageMaker Python SDK.
  3. Compare the performance of the fine-tuned Llama 2 model with that of the pre-trained model to show the effectiveness of fine-tuning.

To get hands-on, see the GitHub example notebook.

Deploy Llama 2 on AWS Inferentia instances using the SageMaker Studio UI and the Python SDK

In this section, we demonstrate how to deploy Llama 2 on AWS Inferentia instances using the SageMaker Studio UI for a one-click deployment and the Python SDK.

Discover the Llama 2 model on the SageMaker Studio UI

SageMaker JumpStart provides access to both publicly available and proprietary foundation models. Foundation models are onboarded and maintained from third-party and proprietary providers. As such, they are released under different licenses as designated by the model source. Be sure to review the license for any foundation model that you use. You are responsible for reviewing and complying with any applicable license terms and making sure they are acceptable for your use case before downloading or using the content.

You can access the Llama 2 foundation models through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.

SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all machine learning (ML) development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.

After you’re in SageMaker Studio, you can access SageMaker JumpStart, which contains pre-trained models, notebooks, and prebuilt solutions, under Prebuilt and automated solutions. For more detailed information on how to access proprietary models, refer to Use proprietary foundation models from Amazon SageMaker JumpStart in Amazon SageMaker Studio.

From the SageMaker JumpStart landing page, you can browse for solutions, models, notebooks, and other resources.

If you don’t see the Llama 2 models, update your SageMaker Studio version by shutting down and restarting. For more information about version updates, refer to Shut down and Update Studio Classic Apps.

You can also find other model variants by choosing Explore All Text Generation Models or searching for llama or neuron in the search box. You will be able to view the Llama 2 Neuron models on this page.

Deploy the Llama-2-13b model with SageMaker JumpStart

You can choose the model card to view details about the model such as license, data used to train, and how to use it. You can also find two buttons, Deploy and Open notebook, which help you use the model using this no-code example.

When you choose either button, a pop-up will show the End User License Agreement and Acceptable Use Policy (AUP) for you to acknowledge.

After you acknowledge the policies, you can deploy the endpoint of the model and use it via the steps in the next section.

Deploy the Llama 2 Neuron model via the Python SDK

When you choose Deploy and acknowledge the terms, model deployment will start. Alternatively, you can deploy through the example notebook by choosing Open notebook. The example notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.

To deploy or fine-tune a model on Trainium or AWS Inferentia instances, you first need to call PyTorch Neuron (torch-neuronx) to compile the model into a Neuron-specific graph, which will optimize it for Inferentia’s NeuronCores. Users can instruct the compiler to optimize for lowest latency or highest throughput, depending on the objectives of the application. In JumpStart, we pre-compiled the Neuron graphs for a variety of configurations, to allow users to skip compilation steps, enabling faster fine-tuning and deployment of models.

Note that the Neuron pre-compiled graph is created based on a specific version of the Neuron Compiler.

There are two ways to deploy Llama 2 on AWS Inferentia-based instances. The first method uses the pre-built configuration and allows you to deploy the model in just two lines of code. In the second, you have greater control over the configuration. Let’s start with the first method and use the pre-trained Llama 2 13B Neuron model as an example. The following code shows how to deploy it with just two lines:

from sagemaker.jumpstart.model import JumpStartModel

model_id = "meta-textgenerationneuron-llama-2-13b"
model = JumpStartModel(model_id=model_id)
pretrained_predictor = model.deploy(accept_eula=False) ## Set accept_eula=True to accept the EULA and deploy

To perform inference on these models, you need to specify the argument accept_eula as True as part of the model.deploy() call. Setting this argument to true acknowledges that you have read and accepted the EULA of the model. The EULA can be found in the model card description or on the Meta website.

The default instance type for Llama 2 13B is ml.inf2.8xlarge. You can also try other supported model IDs:

  • meta-textgenerationneuron-llama-2-7b
  • meta-textgenerationneuron-llama-2-7b-f (chat model)
  • meta-textgenerationneuron-llama-2-13b-f (chat model)

Alternatively, if you want more control over the deployment configuration, such as context length, tensor parallel degree, and maximum rolling batch size, you can modify it via environment variables, as demonstrated in this section. The underlying Deep Learning Container (DLC) of the deployment is the Large Model Inference (LMI) NeuronX DLC. The environment variables are as follows:

  • OPTION_N_POSITIONS – The maximum number of input and output tokens. For example, if you compile the model with OPTION_N_POSITIONS as 512, then you can use an input token of 128 (input prompt size) with a maximum output token of 384 (the total of the input and output tokens has to be 512). For the maximum output token, any value below 384 is fine, but you can’t go beyond it (for example, input 256 and output 512).
  • OPTION_TENSOR_PARALLEL_DEGREE – The number of NeuronCores to load the model in AWS Inferentia instances.
  • OPTION_MAX_ROLLING_BATCH_SIZE – The maximum batch size for concurrent requests.
  • OPTION_DTYPE – The data type to load the model.

The compilation of the Neuron graph depends on the context length (OPTION_N_POSITIONS), tensor parallel degree (OPTION_TENSOR_PARALLEL_DEGREE), maximum batch size (OPTION_MAX_ROLLING_BATCH_SIZE), and data type (OPTION_DTYPE) used to load the model. SageMaker JumpStart has pre-compiled Neuron graphs for a variety of configurations of the preceding parameters to avoid runtime compilation. As long as the environment variables fall into one of the following categories, compilation of Neuron graphs will be skipped. The configurations of the pre-compiled graphs are listed in the following table.

Llama-2 7B and Llama-2 7B Chat
Instance type OPTION_N_POSITIONS OPTION_MAX_ROLLING_BATCH_SIZE OPTION_TENSOR_PARALLEL_DEGREE OPTION_DTYPE
ml.inf2.xlarge 1024 1 2 fp16
ml.inf2.8xlarge 2048 1 2 fp16
ml.inf2.24xlarge 4096 4 4 fp16
ml.inf2.24xlarge 4096 4 8 fp16
ml.inf2.24xlarge 4096 4 12 fp16
ml.inf2.48xlarge 4096 4 4 fp16
ml.inf2.48xlarge 4096 4 8 fp16
ml.inf2.48xlarge 4096 4 12 fp16
ml.inf2.48xlarge 4096 4 24 fp16
Llama-2 13B and Llama-2 13B Chat
ml.inf2.8xlarge 1024 1 2 fp16
ml.inf2.24xlarge 2048 4 4 fp16
ml.inf2.24xlarge 4096 4 8 fp16
ml.inf2.24xlarge 4096 4 12 fp16
ml.inf2.48xlarge 2048 4 4 fp16
ml.inf2.48xlarge 4096 4 8 fp16
ml.inf2.48xlarge 4096 4 12 fp16
ml.inf2.48xlarge 4096 4 24 fp16

The following is an example of deploying Llama 2 13B and setting all the available configurations.

from sagemaker.jumpstart.model import JumpStartModel

model_id = "meta-textgenerationneuron-llama-2-13b-f"
model = JumpStartModel(
    model_id=model_id,
    env={
        "OPTION_DTYPE": "fp16",
        "OPTION_N_POSITIONS": "4096",
        "OPTION_TENSOR_PARALLEL_DEGREE": "12",
        "OPTION_MAX_ROLLING_BATCH_SIZE": "4", 
    },
    instance_type="ml.inf2.24xlarge"  
)
pretrained_predictor = model.deploy(accept_eula=False) ## Set accept_eula=True to accept the EULA and deploy

Now that we have deployed the Llama-2-13b model, we can run inference with it by invoking the endpoint. The following code snippet demonstrates using the supported inference parameters to control text generation:

  • max_length – The model generates text until the output length (which includes the input context length) reaches max_length. If specified, it must be a positive integer.
  • max_new_tokens – The model generates text until the output length (excluding the input context length) reaches max_new_tokens. If specified, it must be a positive integer.
  • num_beams – This indicates the number of beams used in the beam search. If specified, it must be an integer greater than or equal to num_return_sequences.
  • no_repeat_ngram_size – The model ensures that a sequence of words of no_repeat_ngram_size is not repeated in the output sequence. If specified, it must be a positive integer greater than 1.
  • temperature – This controls the randomness in the output. A higher temperature results in an output sequence with low-probability words; a lower temperature results in an output sequence with high-probability words. If temperature equals 0, it results in greedy decoding. If specified, it must be a positive float.
  • early_stopping – If True, text generation is finished when all beam hypotheses reach the end of the sentence token. If specified, it must be Boolean.
  • do_sample – If True, the model samples the next word as per the likelihood. If specified, it must be Boolean.
  • top_k – In each step of text generation, the model samples from only the top_k most likely words. If specified, it must be a positive integer.
  • top_p – In each step of text generation, the model samples from the smallest possible set of words with a cumulative probability of top_p. If specified, it must be a float between 0 and 1.
  • stop – If specified, it must be a list of strings. Text generation stops if any one of the specified strings is generated.

The following code shows an example:

payload = {
    "inputs": "I believe the meaning of life is",
    "parameters": {
        "max_new_tokens": 64,
        "top_p": 0.9,
        "temperature": 0.6,
    },
}

response = pretrained_predictor.predict(payload)

Output:

I believe the meaning of life is
>  to be happy. I believe that happiness is a choice. I believe that happiness 
is a state of mind. I believe that happiness is a state of being. I believe that 
happiness is a state of being. I believe that happiness is a state of being. I 
believe that happiness is a state of being. I believe

For more information on the parameters in the payload, refer to Detailed parameters.

You can also explore the implementation of these parameters in the example notebook.

Fine-tune Llama 2 models on Trainium instances using the SageMaker Studio UI and SageMaker Python SDK

Generative AI foundation models have become a primary focus in ML and AI; however, their broad generalization can fall short in specific domains like healthcare or financial services, where unique datasets are involved. This limitation highlights the need to fine-tune these generative AI models with domain-specific data to enhance their performance in these specialized areas.

Now that we have deployed the pre-trained version of the Llama 2 model, let’s look at how we can fine-tune this to domain-specific data to increase the accuracy, improve the model in terms of prompt completions, and adapt the model to your specific business use case and data. You can fine-tune the models using either the SageMaker Studio UI or SageMaker Python SDK. We discuss both methods in this section.

Fine-tune the Llama-2-13b Neuron model with SageMaker Studio

In SageMaker Studio, navigate to the Llama-2-13b Neuron model. On the Deploy tab, you can point to the Amazon Simple Storage Service (Amazon S3) bucket containing the training and validation datasets for fine-tuning. In addition, you can adjust the deployment configuration, hyperparameters, and security settings for fine-tuning. Then choose Train to start the training job on a SageMaker ML instance.

To use Llama 2 models, you need to accept the EULA and AUP. It will show up when you choose Train. Choose I have read and accept EULA and AUP to start the fine-tuning job.

You can view the status of your training job for the fine-tuned model on the SageMaker console by choosing Training jobs in the navigation pane.

You can either fine-tune your Llama 2 Neuron model using this no-code example, or fine-tune via the Python SDK, as demonstrated in the next section.

Fine-tune the Llama-2-13b Neuron model via the SageMaker Python SDK

You can fine-tune on the dataset with the domain adaptation format or the instruction-based fine-tuning format. The following are the instructions for how the training data should be formatted before being sent into fine-tuning:

  • Input – A train directory containing either a JSON lines (.jsonl) or text (.txt) formatted file.
    • For the JSON lines (.jsonl) file, each line is a separate JSON object. Each JSON object should be structured as a key-value pair, where the key should be text, and the value is the content of one training example (see the sketch after this list).
    • The number of files under the train directory should equal 1.
  • Output – A trained model that can be deployed for inference.
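For illustration, a minimal, hypothetical training file with the expected structure could be written as follows; the example strings are placeholders, not real training data:

import json

# Hypothetical examples; real training data would carry full prompts and responses
examples = [
    {"text": "Below is an instruction ... ### Response: ..."},
    {"text": "Below is an instruction ... ### Response: ..."},
]

# One JSON object per line, each with a single "text" key
with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")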

In this example, we use a subset of the Dolly dataset in an instruction tuning format. The Dolly dataset contains approximately 15,000 instruction-following records for various categories, such as question answering, summarization, and information extraction. It is available under the Apache 2.0 license. We use the information_extraction examples for fine-tuning.

  1. Load the Dolly dataset and split it into train (for fine-tuning) and test (for evaluation):
    from datasets import load_dataset
    
    dolly_dataset = load_dataset("databricks/databricks-dolly-15k", split="train")
    
    task = "information_extraction"
    # To train for summarization or closed question answering, replace the filter
    # in the next line with example["category"] == "summarization" or "closed_qa".
    summarization_dataset = dolly_dataset.filter(lambda example: example["category"] == task)
    summarization_dataset = summarization_dataset.remove_columns("category")
    
    # We split the dataset into two; the test data is used to evaluate at the end.
    train_and_test_dataset = summarization_dataset.train_test_split(test_size=0.1)
    
    # Dump the training data to a local file to be used for training.
    train_and_test_dataset["train"].to_json("train.jsonl")

  2. Use a prompt template for preprocessing the data in an instruction format for the training job:
    prompt = ("""Below is an instruction that describes a task, paired with an input 
    that provides further context. Write a response that appropriately completes the 
    request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}### 
    Response:\n{response}\n\n<s>""")

  3. Examine the hyperparameters and overwrite them for your own use case:
    from sagemaker import hyperparameters
    
    model_id = "meta-textgenerationneuron-llama-2-13b"
    model_version = "1.*"
    
    my_hyperparameters = hyperparameters.retrieve_default(
        model_id=model_id, model_version=model_version
    )
    
    my_hyperparameters["max_input_length"] = "4096" ## you can increase it up to 4096 for sequence length.
    my_hyperparameters["max_steps"] = "25"
    my_hyperparameters["learning_rate"] = "0.0001"
    print(my_hyperparameters)
    
    hyperparameters.validate(model_id=model_id, model_version=model_version, hyperparameters=my_hyperparameters)

  4. Fine-tune the model and start a SageMaker training job. The fine-tuning scripts are based on the neuronx-nemo-megatron repository, which contains modified versions of the packages NeMo and Apex that have been adapted for use with Neuron and EC2 Trn1 instances. The neuronx-nemo-megatron repository has 3D (data, tensor, and pipeline) parallelism to allow you to fine-tune LLMs at scale. The supported Trainium instances are ml.trn1.32xlarge and ml.trn1n.32xlarge.
    from sagemaker.jumpstart.estimator import JumpStartEstimator
    
    estimator = JumpStartEstimator(
        model_id=model_id,
        model_version=model_version,
        hyperparameters=my_hyperparameters,
        environment={"accept_eula": "false"}, # please change `accept_eula` to be `true` to accept EULA.
        #instance_type="ml.trn1n.32xlarge", if not specified, default `ml.trn1.32xlarge` will be used.
    )
    
    estimator.fit({"train": train_data_location})

  5. Finally, deploy the fine-tuned model in a SageMaker endpoint:
    finetuned_predictor = estimator.deploy()

Compare responses between the pre-trained and fine-tuned Llama 2 Neuron models

Now that we have deployed the pre-trained version of the Llama-2-13b model and fine-tuned it, we can view some of the performance comparisons of the prompt completions from both models, as shown in the following comparison. We also offer an example to fine-tune Llama 2 on a SEC filing dataset in .txt format. For details, see the GitHub example notebook.
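To reproduce such a side-by-side comparison yourself, a small helper along these lines can be used; it is a sketch that simply queries the two endpoints deployed earlier with the same payload:

# Send one payload to both the pre-trained and fine-tuned endpoints
def compare_responses(payload):
    return {
        "pre-trained": pretrained_predictor.predict(payload),
        "fine-tuned": finetuned_predictor.predict(payload),
    }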

Item 1

Inputs: Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nExtract universities Moret studied at and his year of graduation for each, and put them in a bulleted list.\n\n### Input:\nMoret was born in 1953 in Vevey Switzerland, and did his undergraduate studies at the École Polytechnique Fédérale de Lausanne (EPFL), graduating in 1975. He went on to graduate studies at the University of Tennessee, earning a Ph.D. in 1980. He then joined the faculty of the University of New Mexico, where he remained until 2006, when he moved to EPFL. He retired from EPFL in December 2016.\n\nIn 1996, Moret founded the ACM Journal of Experimental Algorithmics, and he remained editor in chief of the journal until 2003. In 2001, Moret founded the Workshop in Algorithms for Bioinformatics (WABI) and remains on the Steering Committee for the conference.\n\nIn 2018, Moret was elected as a Fellow of the International Society for Computational Biology, for his outstanding contributions to the fields of computational biology and bioinformatics.\n\n### Response:\n

Ground truth: * École Polytechnique Fédérale de Lausanne (EPFL) (graduated 1975)\n* University of Tennessee (graduated 1980)

Response from non-fine-tuned model: Moret studied at the École Polytechnique Fédérale de Lausanne (EPFL) and the University of Tennessee, graduating in 1975 and 1980, respectively.\n\n### Hint:\n\n- Use the `split` function to split the input into a list of sentences.\n- Use the `split` function to split each sentence into a list of words.\n- Use the `split` function to split

Response from fine-tuned model: - École Polytechnique Fédérale de Lausanne (EPFL), 1975\n- University of Tennessee, 1980\n- University of New Mexico\n- École Polytechnique Fédérale de Lausanne (EPFL), 2006\n\n<s> Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n

Item 2

Inputs: Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nFrom the passage identify the attributes of Dionysus. Display the results in a comma separated format.\n\n### Input:\nDionysus (Διόνυσος, Diónusos)/Bacchus (Βάκχος, Bákkhos)\n\nGod of wine, fruitfulness, parties, festivals, madness, chaos, drunkenness, vegetation, ecstasy, and the theater. He is the twice-born son of Zeus and Semele, in that Zeus snatched him from his mother’s womb and stitched Dionysus into his own thigh and carried him until he was ready to be born. In art he is depicted as either an older bearded god (particularly before 430 BC) or an effeminate, long-haired youth (particularly after 430 BC). His attributes include the thyrsus, a drinking cup, the grape vine, and a crown of ivy. He is often in the company of his thiasos, a group of attendants including satyrs, maenads, and his old tutor Silenus. The consort of Dionysus was Ariadne. It was once held that Dionysius was a later addition to the Greek pantheon, but the discovery of Linear B tablets confirm his status as a deity from an early period. Bacchus was another name for him in Greek, and came into common usage among the Romans. His sacred animals include dolphins, serpents, tigers, and donkeys.\n\n### Response:\n

Ground truth: the thyrsus, a drinking cup, the grape vine, a crown of ivy

Response from non-fine-tuned model: Dionysus (Διόνυσος, Diónusos)\n\nGod of wine, fruitfulness, parties, festivals, madness, chaos, drunkenness, vegetation, ecstasy, and the theater. He is the twice-born son of Zeus and Semele, in that Zeus snatched him from his mother’s womb and stitched Dionysus into his own thigh

Response from fine-tuned model: thyrsus, a drinking cup, the grape vine, a crown of ivy\n\n<s> Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is the difference between Parthian and Arsacid Empire?\n\n### Input:\nThe Parthian Empire, also known as the Arsacid Empire was a

Item 3

Inputs: Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhy does the capital of Uzbekistan contain the biggest Greek community?\n\n### Input:\nThe Greeks in Uzbekistan number approximately 9,000. The community is made up of Greeks from Russia who were deported by force from that country to Uzbekistan in the 1940s, and political refugees from Greece. About 30,000 Greeks lived in the country before World War II and a further 11,000 arrived after the Greek Civil War and the defeat of Democratic Army of Greece. Their numbers have dwindled from a high of some 40,000 in the 1960s. The main reason is emigration to Greece after the end of the Cold War when laws allowed the return of all ethnic Greeks who had been exiled for political reasons after the Greek Civil War. The biggest Greek community in the country is in the capital city of Tashkent where most of the Greek political refugees were relocated by the Soviet authorities. In ancient times the south of the country was part of the Hellenistic Seleucid Kingdom but the few Greek communities there have since assimilated. The most important organization representing ethnic Greeks is the Greek Cultural Association of Tashkent. The activities of this organization include Greek language instruction (19 classes with a total of 571 students, classes for adults) and the teaching of Greek dances and music.\n\n### Response:\n

Ground truth: Most of the Greek political refugees in Uzbekistan were relocated by the Soviet authorities to the capital city of Tashkent.

Response from non-fine-tuned model: \n```\nThe capital of Uzbekistan contains the biggest Greek community because the Greeks in Uzbekistan number approximately 9,000. The community is made up of Greeks from Russia who were deported by force from that country to Uzbekistan in the 1940s, and political refugees from Greece. About 30,000 Greeks lived in the country before World War II and a further 11,00

Response from fine-tuned model: The capital of Uzbekistan contains the biggest Greek community because that is where most of the Greek political refugees were relocated by the Soviet authorities after being deported from Greece.\n\n<s> Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nWhat is the difference between Parthian and Arsacid Empire?\n\n### Input:\n

We can see that the responses from the fine-tuned model demonstrate a significant improvement in precision, relevance, and clarity compared to those from the pre-trained model. In some cases, using the pre-trained model for your use case might not be enough, so fine-tuning it using this technique will make the solution more personalized to your dataset.

Clean up

After you have completed your training job and don’t want to use the existing resources anymore, delete the resources using the following code:

# Delete resources
# Delete the fine-tuned model
finetuned_predictor.delete_model()

# Delete the fine-tuned model endpoint
finetuned_predictor.delete_endpoint()
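If the pre-trained endpoint deployed earlier is also no longer needed, it can be removed the same way:

# Delete the pre-trained model and its endpoint as well
pretrained_predictor.delete_model()
pretrained_predictor.delete_endpoint()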

Conclusion

The deployment and fine-tuning of Llama 2 Neuron models on SageMaker demonstrate a significant advancement in managing and optimizing large-scale generative AI models. These models, including variants like Llama-2-7b and Llama-2-13b, use Neuron for efficient training and inference on AWS Inferentia and Trainium based instances, enhancing their performance and scalability.

The ability to deploy these models through the SageMaker JumpStart UI and Python SDK offers flexibility and ease of use. The Neuron SDK, with its support for popular ML frameworks and high-performance capabilities, enables efficient handling of these large models.

Fine-tuning these models on domain-specific data is crucial for enhancing their relevance and accuracy in specialized fields. The process, which you can conduct through the SageMaker Studio UI or Python SDK, allows for customization to specific needs, leading to improved model performance in terms of prompt completions and response quality.

Comparatively, the pre-trained versions of these models, while powerful, may provide more generic or repetitive responses. Fine-tuning tailors the model to specific contexts, resulting in more accurate, relevant, and diverse responses. This customization is particularly evident when comparing responses from pre-trained and fine-tuned models, where the latter demonstrates a noticeable improvement in quality and specificity of output. In conclusion, the deployment and fine-tuning of Neuron Llama 2 models on SageMaker represent a robust framework for managing advanced AI models, offering significant improvements in performance and applicability, especially when tailored to specific domains or tasks.

Get started today by referencing sample SageMaker notebook.

For more information on deploying and fine-tuning pre-trained Llama 2 models on GPU-based instances, refer to Fine-tune Llama 2 for text generation on Amazon SageMaker JumpStart and Llama 2 foundation models from Meta are now available in Amazon SageMaker JumpStart.

The authors would like to acknowledge the technical contributions of Evan Kravitz, Christopher Whitten, Adam Kozdrowicz, Manan Shah, Jonathan Guinegagne and Mike James.


About the Authors

Xin Huang is a Senior Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers in ACL, ICDM, KDD conferences, and Royal Statistical Society: Series A.

Nitin Eusebius is a Sr. Enterprise Solutions Architect at AWS, experienced in Software Engineering, Enterprise Architecture, and AI/ML. He is deeply passionate about exploring the possibilities of generative AI. He collaborates with customers to help them build well-architected applications on the AWS platform, and is dedicated to solving technology challenges and assisting with their cloud journey.

Madhur Prashant works in the generative AI space at AWS. He is passionate about the intersection of human thinking and generative AI. His interests lie in generative AI, specifically building solutions that are helpful and harmless, and most of all optimal for customers. Outside of work, he loves doing yoga, hiking, spending time with his twin, and playing the guitar.

Dewan Choudhury is a Software Development Engineer with Amazon Web Services. He works on Amazon SageMaker’s algorithms and JumpStart offerings. Apart from building AI/ML infrastructures, he is also passionate about building scalable distributed systems.

Hao Zhou is a Research Scientist with Amazon SageMaker. Before that, he worked on developing machine learning methods for fraud detection for Amazon Fraud Detector. He is passionate about applying machine learning, optimization, and generative AI techniques to various real-world problems. He holds a PhD in Electrical Engineering from Northwestern University.

Qing Lan is a Software Development Engineer in AWS. He has been working on several challenging products in Amazon, including high performance ML inference solutions and high performance logging system. Qing’s team successfully launched the first Billion-parameter model in Amazon Advertising with very low latency required. Qing has in-depth knowledge on the infrastructure optimization and Deep Learning acceleration.

Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

Dr. Li Zhang is a Principal Product Manager-Technical for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms, a service that helps data scientists and machine learning practitioners get started with training and deploying their models, and uses reinforcement learning with Amazon SageMaker. His past work as a principal research staff member and master inventor at IBM Research has won the test of time paper award at IEEE INFOCOM.

Kamran Khan is a Sr. Technical Business Development Manager for AWS Inferentia/Trainium at AWS. He has over a decade of experience helping customers deploy and optimize deep learning training and inference workloads using AWS Inferentia and AWS Trainium.

Joe Senerchia is a Senior Product Manager at AWS. He defines and builds Amazon EC2 instances for deep learning, artificial intelligence, and high-performance computing workloads.

Read More

Use mobility data to derive insights using Amazon SageMaker geospatial capabilities

Geospatial data is data about specific locations on the Earth’s surface. It can represent a geographical area as a whole, or it can represent an event associated with a geographical area. Analysis of geospatial data is sought after in many industries. It involves understanding where the data exists from a spatial perspective and why it exists there.

There are two types of geospatial data: vector data and raster data. Raster data is a matrix of cells represented as a grid, mostly representing photographs and satellite imagery. In this post, we focus on vector data, which is represented as geographical coordinates of latitude and longitude as well as lines and polygons (areas) connecting or encompassing them. Vector data has a multitude of use cases in deriving mobility insights. User mobile data is one such component of it, and it’s derived mostly from the geographical position of mobile devices using GPS or app publishers using SDKs or similar integrations. For the purpose of this post, we refer to this data as mobility data.

This is a two-part series. In this first post, we introduce mobility data, its sources, and a typical schema of this data. We then discuss the various use cases and explore how you can use AWS services to clean the data, how machine learning (ML) can aid in this effort, and how you can make ethical use of the data in generating visuals and insights. The second post will be more technical in nature and cover these steps in detail alongside sample code. This post does not have a sample dataset or sample code, rather it covers how to use the data after it’s purchased from a data aggregator.

You can use Amazon SageMaker geospatial capabilities to overlay mobility data on a base map and provide layered visualization to make collaboration easier. The GPU-powered interactive visualizer and Python notebooks provide a seamless way to explore millions of data points in a single window and share insights and results.

Sources and schema

There are a few sources of mobility data. Apart from GPS pings and app publishers, other sources are used to augment the dataset, such as Wi-Fi access points, bid stream data obtained via serving ads on mobile devices, and specific hardware transmitters placed by businesses (for example, in physical stores). It’s often difficult for businesses to collect this data themselves, so they may purchase it from data aggregators. Data aggregators collect mobility data from various sources, clean it, add noise, and make the data available on a daily basis for specific geographic regions. Due to the nature of the data itself and because it’s difficult to obtain, the accuracy and quality of this data can vary considerably, and it’s up to businesses to appraise and verify it using metrics such as daily active users, total daily pings, and average daily pings per device. The following table shows what a typical schema of a daily data feed sent by data aggregators may look like.

Attribute Description
Id or MAID Mobile Advertising ID (MAID) of the device (hashed)
lat Latitude of the device
lng Longitude of the device
geohash Geohash location of the device
device_type Operating system of the device (IDFA or GAID)
horizontal_accuracy Accuracy of horizontal GPS coordinates (in meters)
timestamp Timestamp of the event
ip IP address
alt Altitude of the device (in meters)
speed Speed of the device (in meters/second)
country ISO two-digit code for the country of origin
state Codes representing state
city Codes representing city
zipcode Zipcode of where Device ID is seen
carrier Carrier of the device
device_manufacturer Manufacturer of the device

Use cases

Mobility data has widespread applications in varied industries. The following are some of the most common use cases:

  • Density metrics – Foot traffic analysis can be combined with population density to observe activities and visits to points of interest (POIs). These metrics present a picture of how many devices or users are actively stopping and engaging with a business, which can be further used for site selection or even analyzing movement patterns around an event (for example, people traveling for a game day). To obtain such insights, the incoming raw data goes through an extract, transform, and load (ETL) process to identify activities or engagements from the continuous stream of device location pings. We can analyze activities by identifying stops made by the user or mobile device by clustering pings using ML models in Amazon SageMaker.
  • Trips and trajectories – A device’s daily location feed can be expressed as a collection of activities (stops) and trips (movement). A pair of activities can represent a trip between them, and tracing the trip by the moving device in geographical space can lead to mapping the actual trajectory. Trajectory patterns of user movements can lead to interesting insights such as traffic patterns, fuel consumption, city planning, and more. It can also provide data to analyze the route taken from advertising points such as a billboard, identify the most efficient delivery routes to optimize supply chain operations, or analyze evacuation routes in natural disasters (for example, hurricane evacuation).
  • Catchment area analysis – A catchment area refers to places from where a given area draws its visitors, who may be customers or potential customers. Retail businesses can use this information to determine the optimal location to open a new store, or determine if two store locations are too close to each other with overlapping catchment areas and are hampering each other’s business. They can also find out where the actual customers are coming from, identify potential customers who pass by the area traveling to work or home, analyze similar visitation metrics for competitors, and more. Marketing Tech (MarTech) and Advertisement Tech (AdTech) companies can also use this analysis to optimize marketing campaigns by identifying the audience close to a brand’s store or to rank stores by performance for out-of-home advertising.

There are several other use cases, including generating location intelligence for commercial real estate, augmenting satellite imagery data with footfall numbers, identifying delivery hubs for restaurants, determining neighborhood evacuation likelihood, discovering people movement patterns during a pandemic, and more.

Challenges and ethical use

Ethical use of mobility data can lead to many interesting insights that can help organizations improve their operations, perform effective marketing, or even attain a competitive advantage. To utilize this data ethically, several steps need to be followed.

It starts with the collection of data itself. Although most mobility data remains free of personally identifiable information (PII) such as name and address, data collectors and aggregators must have the user’s consent to collect, use, store, and share their data. Data privacy laws such as GDPR and CCPA need to be adhered to because they empower users to determine how businesses can use their data. This first step is a substantial move towards ethical and responsible use of mobility data, but more can be done.

Each device is assigned a hashed Mobile Advertising ID (MAID), which is used to anchor the individual pings. This can be further obfuscated by using Amazon Macie, Amazon S3 Object Lambda, Amazon Comprehend, or even the AWS Glue Studio Detect PII transform. For more information, refer to Common techniques to detect PHI and PII data using AWS Services.

Apart from PII, considerations should be made to mask the user’s home location as well as other sensitive locations like military bases or places of worship.

The final step for ethical use is to derive and export only aggregated metrics out of Amazon SageMaker. This means getting metrics such as the average or total number of visitors as opposed to individual travel patterns; getting daily, weekly, monthly, or yearly trends; or indexing mobility patterns over publicly available data such as census data.

Solution overview

As mentioned earlier, the AWS services that you can use for analysis of mobility data are Amazon S3, Amazon Macie, AWS Glue, S3 Object Lambda, Amazon Comprehend, and Amazon SageMaker geospatial capabilities. Amazon SageMaker geospatial capabilities make it easy for data scientists and ML engineers to build, train, and deploy models using geospatial data. You can efficiently transform or enrich large-scale geospatial datasets, accelerate model building with pre-trained ML models, and explore model predictions and geospatial data on an interactive map using 3D accelerated graphics and built-in visualization tools.

The following reference architecture depicts a workflow using ML with geospatial data.

Architecture Diagram

In this workflow, raw data is aggregated from various data sources and stored in an Amazon Simple Storage Service (Amazon S3) bucket. Amazon Macie is used on this S3 bucket to identify and redact PII. AWS Glue is then used to clean and transform the raw data to the required format, and the modified and cleaned data is stored in a separate S3 bucket. For those data transformations that are not possible via AWS Glue, you use AWS Lambda to modify and clean the raw data. When the data is cleaned, you can use Amazon SageMaker to build, train, and deploy ML models on the prepped geospatial data. You can also use the geospatial Processing jobs feature of Amazon SageMaker geospatial capabilities to preprocess the data—for example, using a Python function and SQL statements to identify activities from the raw mobility data. Data scientists can accomplish this process by connecting through Amazon SageMaker notebooks. You can also use Amazon QuickSight to visualize business outcomes and other important metrics from the data.

Amazon SageMaker geospatial capabilities and geospatial Processing jobs

After the data is obtained and fed into Amazon S3 with a daily feed and cleaned for any sensitive data, it can be imported into Amazon SageMaker using an Amazon SageMaker Studio notebook with a geospatial image. The following screenshot shows a sample of daily device pings uploaded into Amazon S3 as a CSV file and then loaded in a pandas data frame. The Amazon SageMaker Studio notebook with geospatial image comes preloaded with geospatial libraries such as GDAL, GeoPandas, Fiona, and Shapely, and makes it straightforward to process and analyze this data.

This sample dataset contains approximately 400,000 daily device pings from 5,000 devices from 14,000 unique places recorded from users visiting the Arrowhead Mall, a popular shopping mall complex in Phoenix, Arizona, on May 15, 2023. The preceding screenshot shows a subset of columns in the data schema. The MAID column represents the device ID, and each MAID generates pings every minute relaying the latitude and longitude of the device, recorded in the sample file as Lat and Lng columns.

The following are screenshots from the map visualization tool of Amazon SageMaker geospatial capabilities powered by Foursquare Studio, depicting the layout of pings from devices visiting the mall between 7:00 AM and 6:00 PM.

The following screenshot shows pings from the mall and surrounding areas.

The following shows pings from inside various stores in the mall.

Each dot in the screenshots depicts a ping from a given device at a given point in time. A cluster of pings represents popular spots where devices gathered or stopped, such as stores or restaurants.

As part of the initial ETL, this raw data can be loaded onto tables using AWS Glue. You can create an AWS Glue crawler to identify the schema of the data and form tables by pointing to the raw data location in Amazon S3 as the data source.

As mentioned above, the raw data (the daily device pings), even after initial ETL, will represent a continuous stream of GPS pings indicating device locations. To extract actionable insights from this data, we need to identify stops and trips (trajectories). This can be achieved using the geospatial Processing jobs feature of SageMaker geospatial capabilities. Amazon SageMaker Processing uses a simplified, managed experience on SageMaker to run data processing workloads with the purpose-built geospatial container. The underlying infrastructure for a SageMaker Processing job is fully managed by SageMaker. This feature enables custom code to run on geospatial data stored on Amazon S3 by running a geospatial ML container on a SageMaker Processing job. You can run custom operations on open or private geospatial data by writing custom code with open source libraries, and run the operation at scale using SageMaker Processing jobs. The container-based approach addresses the need for a standardized development environment with commonly used open source libraries.

To run such large-scale workloads, you need a flexible compute cluster that can scale from tens of instances to process a city block to thousands of instances for planetary-scale processing. Manually managing a DIY compute cluster is slow and expensive. This feature is particularly helpful when the mobility dataset spans multiple cities, states, or even countries, and it can be used to run a two-step ML approach.

The first step is to use the density-based spatial clustering of applications with noise (DBSCAN) algorithm to cluster stops from pings. The next step is to use the support vector machines (SVMs) method to further improve the accuracy of the identified stops and to distinguish stops with engagements with a POI vs. stops without one (such as home or work). You can also use a SageMaker Processing job to generate trips and trajectories from the daily device pings by identifying consecutive stops and mapping the path between the source and destination stops.
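As an illustration of the first step, the following sketch clusters one device’s pings into stop candidates with scikit-learn’s DBSCAN. The file name, column names, and min_samples value are assumptions based on the schema described earlier, and the 50-meter neighborhood mirrors the parameters discussed below.

import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

pings = pd.read_csv("daily_pings.csv")                  # hypothetical daily feed
device = pings[pings["maid"] == pings["maid"].iloc[0]]  # pings for one device

# The haversine metric expects radians; eps is a fraction of the Earth's
# radius, here roughly 50 meters, matching the parameters discussed below
coords = np.radians(device[["lat", "lng"]].to_numpy())
db = DBSCAN(eps=50.0 / 6371000.0, min_samples=5, metric="haversine").fit(coords)
device = device.assign(stop_cluster=db.labels_)         # -1 marks moving (noise) pings

# Each cluster centroid is a stop candidate; dwell-time filtering
# (for example, the 300-second threshold) would follow
stops = (device[device["stop_cluster"] >= 0]
         .groupby("stop_cluster")[["lat", "lng"]].mean())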

After processing the raw data (daily device pings) at scale with geospatial Processing jobs, the new dataset called stops should have the following schema.

Attribute Description
Id or MAID Mobile Advertising ID of the device (hashed)
lat Latitude of the centroid of the stop cluster
lng Longitude of the centroid of the stop cluster
geohash Geohash location of the POI
device_type Operating system of the device (IDFA or GAID)
timestamp Start time of the stop
dwell_time Dwell time of the stop (in seconds)
ip IP address
alt Altitude of the device (in meters)
country ISO two-digit code for the country of origin
state Codes representing state
city Codes representing city
zipcode Zip code of where device ID is seen
carrier Carrier of the device
device_manufacturer Manufacturer of the device

Stops are consolidated by clustering the pings per device. Density-based clustering uses parameters such as a stop threshold of 300 seconds and a minimum distance of 50 meters between stops. These parameters can be adjusted to fit your use case.

The following screenshot shows approximately 15,000 stops identified from 400,000 pings. A subset of the preceding schema is present as well, where the column Dwell Time represents the stop duration, and the Lat and Lng columns represent the latitude and longitude of the centroid of each stop cluster per device per location.

Post-ETL, data is stored in Parquet file format, which is a columnar storage format that makes it easier to process large amounts of data.

The following screenshot shows the stops consolidated from pings per device inside the mall and surrounding areas.

After identifying stops, this dataset can be joined with publicly available POI data or custom POI data specific to the use case to identify activities, such as engagement with brands.
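Such a join can be expressed as a spatial join. The following GeoPandas sketch is hypothetical: it assumes a stops table with lat and lng columns (as in the schema above) and a POI GeoDataFrame named pois with polygon geometries and a brand column.

import geopandas as gpd

# Convert stop centroids to point geometries and keep those that fall
# inside a POI polygon; 'pois' is an assumed input
stops_gdf = gpd.GeoDataFrame(
    stops,
    geometry=gpd.points_from_xy(stops["lng"], stops["lat"]),
    crs="EPSG:4326",
)
activities = gpd.sjoin(stops_gdf, pois[["brand", "geometry"]],
                       how="inner", predicate="within")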

The following screenshot shows the stops identified at major POIs (stores and brands) inside the Arrowhead Mall.

Home zip codes have been used to mask each visitor’s home location to maintain privacy in case that is part of their trip in the dataset. The latitude and longitude in such cases are the respective coordinates of the centroid of the zip code.

The following screenshot is a visual representation of such activities. The left image maps the stops to the stores, and the right image gives an idea of the layout of the mall itself.

This resulting dataset can be visualized in a number of ways, which we discuss in the following sections.

Density metrics

We can calculate and visualize the density of activities and visits.

Example 1 – The following screenshot shows the top 15 most visited stores in the mall.

Example 2 – The following screenshot shows the number of visits to the Apple Store by hour.
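Metrics like these can be derived with simple aggregations. The sketch below is hypothetical and assumes an activities table with brand, maid, and timestamp columns (timestamps as epoch seconds).

import pandas as pd

# Top 15 stores by unique visiting devices
top_stores = (activities.groupby("brand")["maid"]
              .nunique()
              .sort_values(ascending=False)
              .head(15))

# Visits to a single store bucketed by hour of day
apple = activities[activities["brand"] == "Apple Store"].copy()
apple["hour"] = pd.to_datetime(apple["timestamp"], unit="s").dt.hour
visits_by_hour = apple.groupby("hour")["maid"].nunique()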

Trips and trajectories

As mentioned earlier, a pair of consecutive activities represents a trip. We can use the following approach to derive trips from the activities data. Here, window functions are used with SQL to generate the trips table, as shown in the screenshot.
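A pandas equivalent of that SQL window-function logic, pairing each stop with the device’s next stop much like LEAD() partitioned by device and ordered by time, might look like this sketch; the stops file name and column names are assumptions.

import pandas as pd

stops = pd.read_parquet("stops.parquet").sort_values(["maid", "timestamp"])

# Attach each device's next stop (the trip destination), mirroring
# LEAD() OVER (PARTITION BY maid ORDER BY timestamp) in SQL
stops[["dest_lat", "dest_lng", "dest_ts"]] = (
    stops.groupby("maid")[["lat", "lng", "timestamp"]].shift(-1).to_numpy()
)

# Rows with a destination form trips; each device's last stop is dropped
trips = (stops.dropna(subset=["dest_ts"])
              .rename(columns={"lat": "origin_lat", "lng": "origin_lng"}))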

After the trips table is generated, trips to a POI can be determined.

Example 1 – The following screenshot shows the top 10 stores that direct foot traffic towards the Apple Store.

Example 2 – The following screenshot shows all the trips to the Arrowhead Mall.

Example 3 – The following video shows the movement patterns inside the mall.

Example 4 – The following video shows the movement patterns outside the mall.

Catchment area analysis

We can analyze all visits to a POI and determine the catchment area.

Example 1 – The following screenshot shows all visits to the Macy’s store.

Example 2 – The following screenshot shows the top 10 home area zip codes (boundaries highlighted) from where the visits occurred.

Data quality check

We can check the daily incoming data feed for quality and detect anomalies using QuickSight dashboards and data analyses. The following screenshot shows an example dashboard.

Conclusion

Mobility data and its analysis for gaining customer insights and obtaining competitive advantage remains a niche area because it’s difficult to obtain a consistent and accurate dataset. However, this data can help organizations add context to existing analysis and even produce new insights around customer movement patterns. Amazon SageMaker geospatial capabilities and geospatial Processing jobs can help implement these use cases and derive insights in an intuitive and accessible way.

In this post, we demonstrated how to use AWS services to clean the mobility data and then use Amazon SageMaker geospatial capabilities to generate derivative datasets such as stops, activities, and trips using ML models. Then we used the derivative datasets to visualize movement patterns and generate insights.

You can get started with Amazon SageMaker geospatial capabilities today.

To learn more, visit Amazon SageMaker geospatial capabilities and Getting Started with Amazon SageMaker geospatial. Also, visit our GitHub repo, which has several example notebooks on Amazon SageMaker geospatial capabilities.


About the Authors

Jimy Matthews is an AWS Solutions Architect with expertise in AI/ML tech. Jimy is based out of Boston and works with enterprise customers as they transform their business by adopting the cloud, and helps them build efficient and sustainable solutions. He is passionate about his family, cars, and mixed martial arts.

Girish Keshav is a Solutions Architect at AWS, helping out customers in their cloud migration journey to modernize and run workloads securely and efficiently. He works with leaders of technology teams to guide them on application security, machine learning, cost optimization and sustainability. He is based out of San Francisco, and loves traveling, hiking, watching sports, and exploring craft breweries.

Ramesh Jetty is a Senior leader of Solutions Architecture focused on helping AWS enterprise customers monetize their data assets. He advises executives and engineers to design and build highly scalable, reliable, and cost effective cloud solutions, especially focused on machine learning, data and analytics. In his free time he enjoys the great outdoors, biking and hiking with his family.

Read More