A Quantum Boost: cuQuantum With PennyLane Lets Simulations Ride Supercomputers

Ten miles in from Long Island’s Atlantic coast, Shinjae Yoo is revving his engine.

The computational scientist and machine learning group lead at the U.S. Department of Energy’s Brookhaven National Laboratory is one of many researchers gearing up to run quantum computing simulations on a supercomputer for the first time, thanks to new software.

Yoo’s engine, the Perlmutter supercomputer at the National Energy Research Scientific Computing Center (NERSC), is using the latest version of PennyLane, a quantum programming framework from Toronto-based Xanadu. The open-source software, which builds on the NVIDIA cuQuantum software development kit, lets simulations run on high-performance clusters of NVIDIA GPUs.

The performance is key because researchers like Yoo need to process ocean-size datasets. He’ll run his programs across as many as 256 NVIDIA A100 Tensor Core GPUs on Perlmutter to simulate about three dozen qubits — the powerful calculators quantum computers use.

That’s about twice the number of qubits most researchers can model these days.
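To see why about three dozen qubits can call for hundreds of GPUs, a back-of-the-envelope sketch helps. It assumes a full state-vector simulation that stores 2^n double-precision complex amplitudes (16 bytes each) and a hypothetical 40 GB of memory per GPU. Neither figure is stated in the article, so treat the numbers as illustrative only.

```python
def state_vector_bytes(num_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Memory for a full state vector: 2**n complex amplitudes."""
    return (2 ** num_qubits) * bytes_per_amplitude

GIB = 2 ** 30
for n in (18, 30, 36, 40):
    print(f"{n} qubits: {state_vector_bytes(n) / GIB:,.3f} GiB")

# Assumed per-GPU memory (hypothetical figure for this sketch).
ASSUMED_GPU_BYTES = 40 * 10**9
total_gpu_bytes = 256 * ASSUMED_GPU_BYTES
fits_36 = state_vector_bytes(36) <= total_gpu_bytes
print("36 qubits fit in the assumed 256-GPU memory:", fits_36)
```

Under these assumptions every extra qubit doubles the memory needed, which is why a few more qubits can require many more nodes.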

Powerful, Yet Easy to Use

The so-called multi-node version of PennyLane, used in tandem with the NVIDIA cuQuantum SDK, simplifies the complex job of accelerating massive simulations of quantum systems.

“This opens the door to letting even my interns run some of the largest simulations — that’s why I’m so excited,” said Yoo, whose team has six projects using PennyLane in the pipeline.

Brookhaven’s Shinjae Yoo prepares to scale up his quantum work on the Perlmutter supercomputer.

His work aims to advance high-energy physics and machine learning. Other researchers use quantum simulations to take chemistry and materials science to new levels.

Quantum computing is alive in corporate R&D centers, too.

For example, Xanadu is helping companies like Rolls-Royce develop quantum algorithms to design state-of-the-art jet engines for sustainable aviation and Volkswagen Group invent more powerful batteries for electric cars.

Four More Projects on Perlmutter

Meanwhile, at NERSC, at least four other projects are in the works this year using multi-node PennyLane, according to Katherine Klymko, who leads the quantum computing program there. They include efforts from NASA Ames and the University of Alabama.

“Researchers in my field of chemistry want to study molecular complexes too large for classical computers to handle,” she said. “Tools like PennyLane let them extend what they can currently do classically to prepare for eventually running algorithms on large-scale quantum computers.”

Blending AI, Quantum Concepts

PennyLane is the product of a novel idea. It adapts popular deep learning techniques like backpropagation and tools like PyTorch to programming quantum computers.

Xanadu designed the code to run across as many types of quantum computers as possible, so the software got traction in the quantum community soon after its introduction in a 2018 paper.

“There was engagement with our content, making cutting-edge research accessible, and people got excited,” recalled Josh Izaac, director of product at Xanadu and a quantum physicist who was an author of the paper and a developer of PennyLane.

Calls for More Qubits

A common comment on the PennyLane forum these days is, “I want more qubits,” said Lee J. O’Riordan, a senior quantum software developer at Xanadu, responsible for PennyLane’s performance.

“When we started work in 2022 with cuQuantum on a single GPU, we got 10x speedups pretty much across the board … we hope to scale by the end of the year to 1,000 nodes — that’s 4,000 GPUs — and that could mean simulating more than 40 qubits,” O’Riordan said.

Scientists are still formulating the questions they’ll address with that performance — the kind of problem they like to have.

Companies designing quantum computers will use the boost to test ideas for building better systems. Their work feeds a virtuous circle, enabling new software features in PennyLane that, in turn, enable more system performance.

Scaling Well With GPUs

O’Riordan saw early on that GPUs were the best vehicle for scaling PennyLane’s performance. Last year, he co-authored a paper on a method for splitting a quantum program across more than 100 GPUs to simulate more than 60 qubits, split into many 30-qubit sub-circuits.

Picture of Lee J. O’Riordan, PennyLane developer at Xanadu
Lee J. O’Riordan

“We wanted to extend our work to even larger workloads, so when we heard NVIDIA was adding multi-node capability to cuQuantum, we wanted to support it as soon as possible,” he said.

Within four months, multi-node PennyLane was born.

“For a big, distributed GPU project, that was a great turnaround time. Everyone working on cuQuantum helped make the integration as easy as possible,” O’Riordan said.

The team is still collecting data, but so far on “sample-based workloads, we see almost linear scaling,” he said.

Or, as NVIDIA founder and CEO Jensen Huang might say, “The more you buy, the more you save.”

Read More

One Small Step for Artists, One Giant Leap for Creative-Kind

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks and demonstrates how NVIDIA Studio technology improves creative workflows. 

When it comes to converting 2D concepts into 3D masterpieces, self-taught visual development artist Alex Treviño has confidence in the potential of all aspiring creators.

“You may think it’s a complicated process, but trust me, it’s easier than you think,” he said.

The featured content creator of this week’s In the NVIDIA Studio installment, Treviño is the founder of AENDOM, a project with the mission of creating artwork rooted in storytelling elements and sharing creative processes to educate and inspire the next generation of artists.

 

From this initiative, the Lunar Rover collection was born.

Shooting for the Moon

The story behind the Lunar Rover collection comes from an exploration of grief and was inspired by the work of artist Mattias Adolfsson.

However, Treviño wanted to translate Adolfsson’s detailed and playful caricature style into his own 3D design.

Treviño’s inspiration, credit Mattias Adolfsson.

Treviño started gathering reference imagery and creating mood boards with the standalone program PureRef, which allowed him to play with different perspectives and styles while in the conceptual phase.

“I wanted the character to explore a desolate landscape where it is clear that, despite loneliness and abandonment, he continues to explore in allusion to the emotions of grief,” Treviño said.

Advanced sculpting in Blender.

He then shaped and sculpted models in his preferred 3D app, Blender. Using RTX-accelerated OptiX ray tracing in the Blender Cycles viewport, powered by his GeForce RTX 3080 Ti GPU-equipped PC, Treviño unlocked interactive, photorealistic modeling with smooth movement.

“NVIDIA GPUs have a wide range of support and powerful performance, which ensures that I can rely on my GPU to work correctly and render images faster and with higher quality,” said Treviño.

Next, Treviño applied UV mapping to his models, which allowed him to texture them in Adobe Substance 3D Painter to create realistic, detailed textures.

UV mapping in Blender.

RTX-accelerated light and ambient occlusion baking optimized assets in mere moments.

Textures created in Adobe Substance 3D Painter.

“My GeForce RTX GPU’s capabilities were essential while texturing,” Treviño said. “Movement without lag and the ability to make speedy material changes effortlessly were especially helpful while swapping looks.”

Treviño moved to Adobe Illustrator to create alphas — color components that represent degrees of transparency or opacity of colors — as well as masks and patterns.

“GPU acceleration and AI-enabled features are essential tools, as they allow me to work more efficiently and produce higher-quality results,” said Treviño.

He returned to Blender, taking advantage of RTX-accelerated OptiX ray tracing in Blender Cycles for the fastest final-frame render.

Finally, Treviño imported the project into Adobe Photoshop for postproduction work, including adjusting color grading, sharpness, noise and chromatic aberration, and using look-up tables for retouching — just a few of the 30+ GPU-accelerated features at his disposal.

Stunning details in post-production and color correction thanks to Adobe Photoshop.

The end result achieved Treviño’s goal of creating a desolate landscape and alluding to the emotions of grief.

Beautiful yet desolate.

For a more detailed look at Treviño’s creative process, check out his five-part tutorial series, Creating 3D Lunar Rover w/ Alex Treviño, live on the NVIDIA Studio YouTube channel.

https://www.youtube.com/playlist?list=PL4w6jm6S2lzvy-mfeIHJiAmqN6ARz-DJt

Discover exclusive step-by-step tutorials from industry-leading artists, inspiring community showcases and more, powered by NVIDIA Studio hardware and software.

Lunar Lessons Learned

Treviño has three monumental pieces of advice for aspiring artists:

  1. Learn the basics of the entire pipeline process. Learn about modeling, texturing, rendering, post-production, marketing and promotion. Expertise across the board isn’t required, but a general understanding of each step is.
  2. Don’t be afraid to experiment. The best way to learn is by doing. Try new things and experiment with different techniques. Mistakes will lead to growth and evolved artistry.
  3. Find a community of like-minded artists. Connect in multiple communities to learn from others, share work and get valuable feedback.
3D visual development artist Alex Treviño.

Check out Treviño’s portfolio on Instagram.

Follow NVIDIA Studio on Instagram, Twitter and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter. 

Read More

PyTorch project timeline

One Year of PyTorch Foundation

It’s been one year since we announced the formation of the PyTorch Foundation! 🎉

In its inaugural year, the PyTorch Foundation made a significant impact by launching PyTorch 2.0, growing contributors and adding new member companies. We’re grateful to our founding members for their support to move the foundation forward.

A few milestones in the past year include:

💻 Over 600,000 repositories on GitHub
✅ 60% of AI implementations choosing PyTorch
📈 More than 20% year over year growth in new repositories
🤝 Over 12,000 commits since last year

And a look at what the foundation has been up to this past year:

PyTorch project timeline

We look forward to growing our community for the years to come through supporting our contributors, democratizing the AI field, and creating new innovations.

We invite you to join us at this year’s PyTorch Conference on October 16-17 in San Francisco. Conference registration is filling up quickly, so take advantage of your chance to be part of this exciting event.

Join us to stay informed about the latest announcements and have the opportunity to connect with both the founding members and new additions to the PyTorch community.

With thanks and gratitude,
The PyTorch Foundation Team

Read More

Gender Bias in LLMs

Large Language Models (LLMs) have made substantial progress in the past several months, shattering state-of-the-art benchmarks in many domains. This paper investigates LLMs’ behavior with respect to gender stereotypes, a known stumbling block for prior models. We propose a simple paradigm to test the presence of gender bias, building on but differing from WinoBias, a commonly used gender bias dataset which is likely to be included in the training data of current LLMs. We test four recently published LLMs and demonstrate that they express biased assumptions about men and women, specifically…
Apple Machine Learning Research

Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement

We propose Dataset Reinforcement, a strategy to improve a dataset once such that the accuracy of any model architecture trained on the reinforced dataset is improved at no additional training cost for users. We propose a Dataset Reinforcement strategy based on data augmentation and knowledge distillation. Our generic strategy is designed based on extensive analysis across CNN- and transformer-based models and performing large-scale study of distillation with state-of-the-art models with various data augmentations. We create a reinforced version of the ImageNet training dataset, called…
Apple Machine Learning Research

Self-Supervised Object Goal Navigation with In-Situ Finetuning

A household robot should be able to navigate to target locations without requiring users to first annotate everything in their home. Current approaches to this object navigation challenge do not test on real robots and rely on expensive semantically labeled 3D meshes. In this work, our aim is an agent that builds self-supervised models of the world via exploration, the same as a child might. We propose an end-to-end self-supervised embodied agent that leverages exploration to train a semantic segmentation model of 3D objects, and uses those representations to learn an object navigation policy…
Apple Machine Learning Research

Pre-processing temporal data made easier with TensorFlow Decision Forests and Temporian

Posted by Google: Mathieu Guillame-Bert, Richard Stotz, Robert Crowe, Luiz GUStavo Martins (Gus), Ashley Oldacre, Kris Tonthat, Glenn Cameron, and Tryolabs: Ian Spektor, Braulio Rios, Guillermo Etchebarne, Diego Marvid, Lucas Micol, Gonzalo Marín, Alan Descoins, Agustina Pizarro, Lucía Aguilar, Martin Alcala Rubi

Temporal data is omnipresent in applied machine learning applications. Data often changes over time or is only available or valuable at a certain point in time. For example, market prices and weather conditions change constantly. Temporal data is also often highly discriminative in decision-making tasks. For example, the rate of change and interval between two consecutive heartbeats provides valuable insights into a person’s physical health, and temporal patterns of network logs are used to detect configuration issues and intrusions. Hence, it is essential to incorporate temporal data and temporal information in ML applications.

INFO:  Temporian is a new open-source Python library for preprocessing and feature engineering temporal data for machine learning applications. It is developed in collaboration between Google and Tryolabs. Check the sister blog post for more details.

This blog post demonstrates how to train a forecasting model on transactional data. Specifically, we will show how to forecast the total weekly sales from individual sales records. For the modeling part, we will use TensorFlow Decision Forests as they are well suited to handle temporal data. To feed the transaction data to our model, and to compute temporal specific features, we will use Temporian, a newly released library designed for ingesting and aggregating transactional data from multiple non-synchronized sources.

Time series are the most commonly used representation for temporal data. They consist of uniformly sampled values, which can be useful for representing aggregate signals. However, time series are sometimes not sufficient to represent the richness of available data. Instead, multivariate time series can represent multiple signals together, while time sequences or event sets can represent non-uniformly sampled measurements. Multi-index time sequences can be used to represent relations between different time sequences. In this blog post, we will use the multivariate multi-index time sequence, also known as event sets. Don’t worry, they’re not as complex as they sound.

Examples of temporal data include:

  • Weather and other environmental data for weather forecasting, soil profile forecasting and crop yield optimization, temperature tracking, and climate change characterization.

  • Sensory data for quality monitoring, and predictive maintenance.

  • Health data for early treatment, personalized medicine, and epidemic detection.

  • Retail customer data for sales forecasting, sales optimization, and targeted advertising.

  • Banking customer data for fraud detection and loan risk analysis.

  • Economic and financial data for risk analysis, budgetary analysis, stock market analysis, and yield projections.

A simple example

Let’s start with a simple example. We have collected sales records from a fictitious online shop. Each time a client makes a purchase, we record the following information: time of the purchase, client id, product purchased, and price of the product.

The dataset is stored in a single CSV file, with one transaction per line:

$ head -n 5 sales.csv
timestamp,client,product,price
2010-10-05 11:09:56,c64,p35,405.35
2010-09-27 15:00:49,c87,p29,605.35
2010-09-09 12:58:33,c97,p10,108.99
2010-09-06 12:43:45,c60,p85,443.35

Looking at data is crucial to understand the data and spot potential issues. Our first task is to load the sales data into an EventSet and plot it.

INFO: A Temporian EventSet is a general-purpose container for temporal data. It can represent multivariate time series, time sequences, and indexed data.

# Import Temporian
import temporian as tp

# Load the csv dataset
sales = tp.from_csv("/tmp/sales.csv")

# Print details about the EventSet
sales

This code snippet loads and prints the data:

We can also plot the data:
# Plot "price" feature of the EventSet
sales["price"].plot()

We have shown how to load and visualize temporal data in just a few lines of code. However, the resulting plot is very busy, as it shows all transactions for all clients in the same view.

A common operation on temporal data is to calculate the moving sum. Let’s calculate and plot the sum of sales for each transaction in the previous seven days. The moving sum can be computed using the moving_sum operator.

weekly_sales = sales["price"].moving_sum(tp.duration.days(7))

weekly_sales.plot()

BONUS: To make the plots interactive, you can add the interactive=True argument to the plot function. 
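To make the moving sum concrete, here is a minimal pure-Python sketch of the same idea applied to the four sample rows shown earlier. It assumes the window at time t covers timestamps in (t - 7 days, t]; the helper name trailing_sum is hypothetical and is not part of Temporian.

```python
from datetime import datetime, timedelta

# The four sample rows from sales.csv shown earlier (timestamp, price).
rows = [
    ("2010-10-05 11:09:56", 405.35),
    ("2010-09-27 15:00:49", 605.35),
    ("2010-09-09 12:58:33", 108.99),
    ("2010-09-06 12:43:45", 443.35),
]

def trailing_sum(events, window):
    """For each event time t, sum the values whose timestamp is in (t - window, t]."""
    events = sorted(events)
    out = []
    for t, _ in events:
        total = sum(v for ts, v in events if t - window < ts <= t)
        out.append((t, round(total, 2)))
    return out

events = [(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), price) for ts, price in rows]
weekly = trailing_sum(events, timedelta(days=7))
for t, total in weekly:
    print(t, total)
```

Only the purchase on 2010-09-09 has another purchase within the preceding seven days, so it is the only row whose moving sum differs from its own price.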

Sales per product

In the previous step, we computed the overall moving sum of sales for the entire shop. However, what if we wanted to calculate the rolling sum of sales for each product or client separately?

For this task, we can use an index.

# Index the data by "product"
sales_per_product = sales.add_index("product")

# Compute the moving sum for each product
weekly_sales_per_product = sales_per_product["price"].moving_sum(
        tp.duration.days(7)
)

# Plot the results
weekly_sales_per_product.plot()

NOTE: Many operators, such as moving_sum, are applied independently to each index value.
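The effect of an index can be sketched in plain Python as a group-by followed by the same trailing sum inside each group. The toy records, the integer day numbers and the helper moving_sum_per_index are all made up for illustration; this is not how Temporian is implemented.

```python
from collections import defaultdict

# Toy (day, product, price) records; day is an integer day number.
sales = [
    (1, "p1", 10.0),
    (2, "p2", 5.0),
    (3, "p1", 20.0),
    (9, "p1", 7.0),
    (9, "p2", 1.0),
]

def moving_sum_per_index(records, window_days):
    """Group records by product, then compute a trailing sum within each group."""
    groups = defaultdict(list)
    for day, product, price in records:
        groups[product].append((day, price))
    result = {}
    for product, events in groups.items():
        events.sort()
        result[product] = [
            (day, sum(p for d, p in events if day - window_days < d <= day))
            for day, _ in events
        ]
    return result

per_product = moving_sum_per_index(sales, window_days=7)
print(per_product)
```

Sales of one product never leak into the sum of another, which is the behavior the note above describes.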

Aggregate transactions into time series

Our dataset contains individual client transactions. To use this data with a machine learning model, it is often useful to aggregate it into time series, where the data is sampled uniformly over time. For example, we could aggregate the sales weekly, or calculate the total sales in the last week for each day.

However, it is important to note that aggregating transaction data into time series can result in some data loss. For example, the individual transaction timestamps and values would be lost. This is because the aggregated time series would only represent the total sales for each time period.

Let’s compute the total sales in the last week for each day for each product individually.

# The data is sampled daily
daily_sampling = sales_per_product.tick(tp.duration.days(1))

weekly_sales_daily = sales_per_product["price"].moving_sum(
    tp.duration.days(7),
    sampling=daily_sampling, # The new bit
)

weekly_sales_daily.plot()

NOTE: The current plot is a continuous line, while the previous plots have markers. This is because Temporian uses continuous lines by default when the data is uniformly sampled, and markers otherwise.
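The difference between computing the moving sum at event times and at a daily sampling can be sketched as follows. The helpers daily_ticks and moving_sum_at are hypothetical, days are plain integers, and the assumption that ticks run from the first to the last event day is made here only for illustration.

```python
def daily_ticks(event_days):
    """One sampling point per day, from the first to the last event day (assumed alignment)."""
    return list(range(min(event_days), max(event_days) + 1))

def moving_sum_at(events, window_days, sampling):
    """Trailing sum evaluated at the sampling times instead of at the event times."""
    return [
        (s, sum(v for d, v in events if s - window_days < d <= s))
        for s in sampling
    ]

# Irregular (day, price) events for a single product.
events = [(1, 10.0), (1, 2.0), (4, 5.0), (6, 1.0)]
ticks = daily_ticks([d for d, _ in events])
series = moving_sum_at(events, window_days=3, sampling=ticks)
print(series)
```

The output has exactly one value per day, including days without any purchase, which is what makes it a uniformly sampled time series.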

After the data preparation stage is finished, the data can be exported to a Pandas DataFrame as a final step.

tp.to_pandas(weekly_sales_daily)

Train a forecasting model with TensorFlow

A key application of Temporian is to clean data and perform feature engineering for machine learning models. It is well suited for forecasting, anomaly detection, fraud detection, and other tasks where data comes continuously.

In this example, we show how to train a TensorFlow model to predict the next day’s sales using past sales for each product individually. We will feed the model various levels of aggregations of sales as well as calendar information.

Let’s first augment our dataset and convert it to a dataset compatible with a tabular ML model.

sales_per_product = sales.add_index("product")

# Create one example per day
daily_sampling = sales_per_product.tick(tp.duration.days(1))

# Compute moving sums with various window lengths.
# Machine learning models are able to select the ones that matter.
features = []
for w in [3, 7, 14, 28]:
    features.append(
        sales_per_product["price"]
        .moving_sum(tp.duration.days(w), sampling=daily_sampling)
        .rename(f"moving_sum_{w}")
    )

# Calendar information such as the day of the week is
# very informative of human activities.
features.append(daily_sampling.calendar_day_of_week())

# The label is the daily sales shifted / leaked one day in the future.
label = (
    sales_per_product["price"]
    .leak(tp.duration.days(1))
    .moving_sum(
        tp.duration.days(1),
        sampling=daily_sampling,
    )
    .rename("label")
)

# Collect the features and labels together.
dataset = tp.glue(*features, label)

dataset
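The label construction is the least obvious step, so here is a small pure-Python sketch of the idea. It assumes that leaking shifts every timestamp earlier by the given duration, so that a one-day trailing sum at day s then covers the sales of day s + 1. The helpers leak and moving_sum_at are hypothetical stand-ins, and days are plain integers.

```python
def leak(events, days):
    """Shift every timestamp earlier by `days`, making future values visible now."""
    return [(d - days, v) for d, v in events]

def moving_sum_at(events, window_days, sampling):
    """Trailing sum over (s - window_days, s] for each sampling day s."""
    return [
        sum(v for d, v in events if s - window_days < d <= s)
        for s in sampling
    ]

# Toy (day, price) events for one product.
events = [(1, 10.0), (2, 4.0), (2, 6.0), (4, 3.0)]
sampling = [1, 2, 3]

today = moving_sum_at(events, 1, sampling)            # sales on the same day
label = moving_sum_at(leak(events, 1), 1, sampling)   # sales on the following day
print(today, label)
```

On day 1 the label equals the total sold on day 2, which is the value the model is trained to predict from features that only look backward.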

We can then convert the dataset from EventSet to TensorFlow Dataset format, and train a Random Forest.

import tensorflow_decision_forests as tfdf

def extract_label(example):
    example.pop("timestamp")  # Don't use the timestamps as a feature
    label = example.pop("label")
    return example, label

tf_dataset = tp.to_tensorflow_dataset(dataset).map(extract_label).batch(100)

model = tfdf.keras.RandomForestModel(task=tfdf.keras.Task.REGRESSION,verbose=2)
model.fit(tf_dataset)

And that’s it, we have a model trained to forecast sales. We now can look at the variable importance of the model to understand what features matter the most.

model.summary()

In the summary, we can find the INV_MEAN_MIN_DEPTH variable importance:

Type: "RANDOM_FOREST"
Task: REGRESSION
...
Variable Importance: INV_MEAN_MIN_DEPTH:
1. "moving_sum_28" 0.342231 ################
2. "product" 0.294546 ############
3. "calendar_day_of_week" 0.254641 ##########
4. "moving_sum_14" 0.197038 ######
5. "moving_sum_7" 0.124693 #
6. "moving_sum_3" 0.098542

We see that moving_sum_28 is the feature with the highest importance (0.342231). This indicates that the sum of sales in the last 28 days is very important to the model. To further improve our model, we should probably add more temporal aggregation features. The product feature also matters a lot.

And to get an idea of the model itself, we can plot one of the trees of the Random Forest.

tfdf.model_plotter.plot_model_in_colab(model, tree_idx=0, max_depth=2)

More on temporal data preprocessing

We demonstrated some simple data preprocessing. If you want to see other examples of temporal data preprocessing on different data domains, check the Temporian tutorials. Notably:

  • Heart rate analysis ❤️ detects individual heartbeats and derives heart rate related features on raw ECG signals from Physionet.
  • M5 Competition 🛒 predicts retail sales in the M5 Makridakis Forecasting competition.
  • Loan outcomes prediction 🏦 prepares relational SQL data to predict outcomes for finished loans.
  • Detecting payment card fraud 💳 detects fraudulent payment card transactions in real time.
  • Supervised and unsupervised anomaly detection 🔎 performs data analysis and feature engineering to detect anomalies in a group of servers’ resource usage metrics.

Next Steps

We demonstrated how to handle temporal data such as transactions in TensorFlow using the Temporian library. Now you can try it too!

To learn more about model training with TensorFlow Decision Forests:

Read More

Falcon 180B foundation model from TII is now available via Amazon SageMaker JumpStart

Today, we are excited to announce that the Falcon 180B foundation model developed by Technology Innovation Institute (TII) is available for customers through Amazon SageMaker JumpStart to deploy with one-click for running inference. With a 180-billion-parameter size and trained on a massive 3.5-trillion-token dataset, Falcon 180B is the largest and one of the most performant models with openly accessible weights. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML. In this post, we walk through how to discover and deploy the Falcon 180B model via SageMaker JumpStart.

What is Falcon 180B

Falcon 180B is a model released by TII that follows previous releases in the Falcon family. It’s a scaled-up version of Falcon 40B, and it uses multi-query attention for better scalability. It’s an auto-regressive language model that uses an optimized transformer architecture. It was trained on 3.5 trillion tokens of data, primarily consisting of web data from RefinedWeb (approximately 85%). The model has two versions: 180B and 180B-Chat. 180B is a raw, pre-trained model, which should be further fine-tuned for most use cases. 180B-Chat is better suited to taking generic instructions. The Chat model has been fine-tuned on chat and instructions datasets together with several large-scale conversational datasets.

The model is made available under the Falcon-180B TII License and Acceptable Use Policy.

Falcon 180B was trained by TII on Amazon SageMaker, on a cluster of approximately 4K A100 GPUs. It used a custom distributed training codebase named Gigatron, which uses 3D parallelism with ZeRO, and custom, high-performance Triton kernels. The distributed training architecture used Amazon Simple Storage Service (Amazon S3) as the sole unified service for data loading and checkpoint writing and reading, which particularly contributed to the workload reliability and operational simplicity.

What is SageMaker JumpStart

With SageMaker JumpStart, ML practitioners can choose from a growing list of best-performing foundation models. ML practitioners can deploy foundation models to dedicated SageMaker instances within a network isolated environment, and customize models using Amazon SageMaker for model training and deployment.

You can now discover and deploy Falcon 180B with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and MLOps controls with SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your VPC controls, helping ensure data security. Falcon 180B is discoverable and can be deployed in Regions where the requisite instances are available. At present, ml.p4de instances are available in US East (N. Virginia) and US West (Oregon).

Discover models

You can access the foundation models through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.

SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.

In SageMaker Studio, you can access SageMaker JumpStart, which contains pre-trained models, notebooks, and prebuilt solutions, under Prebuilt and automated solutions.

From the SageMaker JumpStart landing page, you can browse for solutions, models, notebooks, and other resources. You can find Falcon 180B in the Foundation Models: Text Generation carousel.

You can also find other model variants by choosing Explore all Text Generation Models or searching for Falcon.

You can choose the model card to view details about the model such as license, data used to train, and how to use. You will also find two buttons, Deploy and Open Notebook, which will help you use the model (the following screenshot shows the Deploy option).

Deploy models

When you choose Deploy, the model deployment will start. Alternatively, you can deploy through the example notebook that shows up by choosing Open Notebook. The example notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.

To deploy using a notebook, we start by selecting an appropriate model, specified by the model_id. You can deploy any of the selected models on SageMaker with the following code:

from sagemaker.jumpstart.model import JumpStartModel

my_model = JumpStartModel(model_id="huggingface-llm-falcon-180b-chat-bf16")
predictor = my_model.deploy()

This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. To learn more, refer to the API documentation. After it’s deployed, you can run inference against the deployed endpoint through a SageMaker predictor. See the following code:

payload = {
    "inputs": "User: Hello!\nFalcon: ",
    "parameters": {"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.6}
}
response = predictor.predict(payload)
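For multi-turn conversations, the prompt string can be assembled programmatically. The sketch below assumes the "User: ...\nFalcon: " layout shown in the payload above; build_chat_prompt and build_payload are hypothetical helpers, not part of the SageMaker SDK, and the code only builds the request body without calling an endpoint.

```python
import json

def build_chat_prompt(turns, model_label="Falcon"):
    """Join (speaker, text) turns and end with the model label so the model replies next."""
    lines = [f"{speaker}: {text}" for speaker, text in turns]
    lines.append(f"{model_label}: ")
    return "\n".join(lines)

def build_payload(turns, **parameters):
    """Wrap the prompt and optional generation parameters in the payload layout."""
    return {"inputs": build_chat_prompt(turns), "parameters": parameters}

payload = build_payload(
    [("User", "Hello!")],
    max_new_tokens=256, top_p=0.9, temperature=0.6,
)
body = json.dumps(payload)
print(body)
```

The resulting dictionary has the same shape as the payload above and could be passed to predictor.predict in its place.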

Inference parameters control the text generation process at the endpoint. The max_new_tokens parameter limits the number of tokens the model generates. Note that this is not the same as the number of words, because the model’s vocabulary differs from the English vocabulary and a token may not be a whole English word. Temperature controls the randomness of the output: higher temperatures produce more creative output but also more hallucinations. All the inference parameters are optional.
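To build intuition for temperature and top_p, the following sketch applies the textbook definitions to a toy four-token vocabulary. It is not the endpoint's actual sampling code, and the logits are made up for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature before normalizing; higher values flatten the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for a four-token vocabulary
cool = softmax_with_temperature(logits, 0.6)
warm = softmax_with_temperature(logits, 1.5)
nucleus = top_p_filter(cool, 0.9)
print(cool, warm, nucleus)
```

A lower temperature concentrates probability on the top token, and top_p then discards the unlikely tail before a token is drawn.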

This 180B parameter model is 335GB and requires even more GPU memory to sufficiently perform inference in 16-bit precision. Currently, JumpStart only supports this model on ml.p4de.24xlarge instances. It is possible to deploy an 8-bit quantized model on a ml.p4d.24xlarge instance by providing the env={"HF_MODEL_QUANTIZE": "bitsandbytes"} keyword argument to the JumpStartModel constructor and specifying instance_type="ml.p4d.24xlarge" to the deploy method. However, please note that per-token latency is approximately 5x slower for this quantized configuration.
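The memory figures can be sanity-checked with simple arithmetic. The sketch assumes exactly 180 billion parameters, counts only the raw weights, and assumes aggregate GPU memory of 8 x 80 GB for ml.p4de.24xlarge and 8 x 40 GB for ml.p4d.24xlarge; those instance figures are assumptions not stated in this post.

```python
GIB = 2 ** 30

def weight_gib(num_params: float, bits_per_param: int) -> float:
    """Size of the raw weights alone, ignoring activations and the KV cache."""
    return num_params * bits_per_param / 8 / GIB

PARAMS = 180e9
bf16_gib = weight_gib(PARAMS, 16)
int8_gib = weight_gib(PARAMS, 8)

# Assumed aggregate GPU memory: 8 x 80 GB (p4de) and 8 x 40 GB (p4d).
P4DE_GIB = 8 * 80e9 / GIB
P4D_GIB = 8 * 40e9 / GIB

print(f"16-bit weights: {bf16_gib:.0f} GiB, 8-bit weights: {int8_gib:.0f} GiB")
print(f"p4de total: {P4DE_GIB:.0f} GiB, p4d total: {P4D_GIB:.0f} GiB")
```

Under these assumptions the 16-bit weights come to about 335 GiB, which is consistent with the size quoted above and exceeds the smaller instance's total GPU memory, while the 8-bit weights fit.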

The following table lists all the Falcon models available in SageMaker JumpStart along with the model IDs, default instance types, maximum number of total tokens (sum of the number of input tokens and number of generated tokens) supported, and the typical response latency per token for each of these models.

Model Name          | Model ID                                 | Default Instance Type | Max Total Tokens | Latency per Token*
Falcon 7B           | huggingface-llm-falcon-7b-bf16           | ml.g5.2xlarge         | 2048             | 34 ms
Falcon 7B Instruct  | huggingface-llm-falcon-7b-instruct-bf16  | ml.g5.2xlarge         | 2048             | 34 ms
Falcon 40B          | huggingface-llm-falcon-40b-bf16          | ml.g5.12xlarge        | 2048             | 57 ms
Falcon 40B Instruct | huggingface-llm-falcon-40b-instruct-bf16 | ml.g5.12xlarge        | 2048             | 57 ms
Falcon 180B         | huggingface-llm-falcon-180b-bf16         | ml.p4de.24xlarge      | 2048             | 45 ms
Falcon 180B Chat    | huggingface-llm-falcon-180b-chat-bf16    | ml.p4de.24xlarge      | 2048             | 45 ms

*per-token latency is provided for the median response time of the example prompts provided in this blog; this value will vary based on length of input and output sequences.

Inference and example prompts for Falcon 180B

Falcon models can be used for text completion for any piece of text. Through text generation, you can perform a variety of tasks, such as answering questions, language translation, sentiment analysis, and many more. The endpoint accepts the following input payload schema:

payload = {
    "inputs": str,
    (optional)"parameters": {
        "do_sample": bool,
        "max_new_tokens": int,
        "repetition_penalty": float,
        "return_full_text": bool,
        "stop": List[str],
        "seed": int,
        "temperature": float,
        "top_k": int,
        "top_p": float,
        "truncate": int,
        "typical_p": float,
        "best_of": int,
        "watermark": bool,
        "details": bool,
        "decoder_input_details": bool,
    }
}

You can explore the definition of these client parameters and their default values within the text-generation-inference repository.
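A light client-side check against this schema can catch misspelled parameter names before a request is sent. The sketch below copies the names and types from the schema above; validate_parameters is a hypothetical helper, and the endpoint itself remains the authority on what it accepts.

```python
from typing import Any, Dict, List

# Parameter names and types copied from the payload schema above.
PARAMETER_TYPES: Dict[str, type] = {
    "do_sample": bool,
    "max_new_tokens": int,
    "repetition_penalty": float,
    "return_full_text": bool,
    "stop": list,  # List[str]
    "seed": int,
    "temperature": float,
    "top_k": int,
    "top_p": float,
    "truncate": int,
    "typical_p": float,
    "best_of": int,
    "watermark": bool,
    "details": bool,
    "decoder_input_details": bool,
}


def validate_parameters(parameters: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems: List[str] = []
    for name, value in parameters.items():
        expected = PARAMETER_TYPES.get(name)
        if expected is None:
            problems.append(f"unknown parameter: {name}")
        elif expected is list:
            if not isinstance(value, list) or not all(isinstance(s, str) for s in value):
                problems.append(f"{name} must be a list of strings")
        elif expected is bool:
            if not isinstance(value, bool):
                problems.append(f"{name} must be a bool")
        elif isinstance(value, bool):
            # bool is a subclass of int in Python, so reject it for numeric fields.
            problems.append(f"{name} must be a number, not a bool")
        elif expected is float:
            if not isinstance(value, (int, float)):
                problems.append(f"{name} must be a float")
        elif not isinstance(value, int):
            problems.append(f"{name} must be an int")
    return problems
```

For example, validate_parameters({"max_tokens": 10}) reports the unknown name, which would otherwise be easy to confuse with max_new_tokens.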

The following are some example prompts and the text generated by the model. All outputs here are generated with inference parameters {"max_new_tokens": 768, "stop": ["<|endoftext|>", "###"]}.

Input
Building a website can be done in 10 simple steps:

Output
- Choose a domain name
- Register your domain name
- Choose a web host
- Choose a website builder
- Create your website
- Add content to your website
- Add pages to your website
- Add images to your website
- Add videos to your website
- Add social media to your website
1. Choose a domain name
A domain name is the address of your website. It is what people will type into their browser to find your website. For example, the domain name of this website is.
When choosing a domain name, you should choose something that is easy to remember and that represents your business or website. You should also make sure that the domain name is available. You can check if a domain name is available by using a domain name search tool.
2. Register your domain name
Once you have chosen a domain name, you need to register it. This can be done through a domain name registrar. A domain name registrar is a company that manages the registration of domain names.
When registering your domain name, you will need to provide your contact information and choose a payment method. You will also need to choose a domain name extension. The most common domain name extensions are .com, .net, and .org.
3. Choose a web host
A web host is a company that provides space on a server for your website. When choosing a web host, you should choose one that is reliable and that offers good customer support. You should also make sure that the web host offers the features that you need for your website.
4. Choose a website builder
A website builder is a tool that allows you to create a website without having to learn how to code. There are many different website builders available, so you should choose one that is easy to use and that offers the features that you need.
5. Create your website
Once you have chosen a website builder, you can start creating your website. You will need to choose a template for your website and then add content to your website.
6. Add content to your website
The content of your website is what will make your website unique. You should add content that is relevant to your business or website. You can add text, images, videos, and other types of content to your website.
7. Add pages to your website
A website is made up of pages. You can add as many pages as you want to your website. Each page should have a different purpose. For example, you might have a page for your home page, a page for your about page, and a page for your contact page.
8. Add images to your website
Images can make your website more visually appealing. You can add images to your website by uploading them to your website or by using a image hosting service.
9. Add videos to your website
Videos can also make your website more visually appealing. You can add videos to your website by uploading them to your website or by using a video hosting service.
10. Add social media to your website
Social media can help you promote your website. You can add social media buttons to your website so that people can share your website on social media. You can also add social media feeds to your website so that people can see your latest social media posts.

You may notice this pretrained model generates long text sequences that are not necessarily ideal for dialog use cases. Before we show how the fine-tuned chat model performs for a larger set of dialog-based prompts, the next two examples illustrate how to use Falcon models with few-shot in-context learning, where we provide training samples to the model within the prompt. Note that “few-shot learning” does not adjust model weights; we only perform inference on the deployed model, providing a few examples within the input context to help guide model output.

Input
Tweet: "I hate it when my phone battery dies."
Sentiment: Negative
###
Tweet: "My day has been :+1:"
Sentiment: Positive
###
Tweet: "This is the link to the article"
Sentiment: Neutral
###
Tweet: "This new music video was incredible"
Sentiment:

Output
 Positive
###

Input
Translate English to French:
sea otter => loutre de mer
###
peppermint => menthe poivrée
###
plush girafe => girafe peluche
###
cheese =>

Output
 fromage
###
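
The few-shot prompts above follow a regular pattern: labeled examples separated by "###", followed by an unlabeled query. A minimal sketch of assembling the sentiment prompt programmatically is shown below; build_few_shot_prompt is a hypothetical helper, and the "Tweet"/"Sentiment" labels simply mirror the example in this post.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str]],
    query: str,
    separator: str = "###",
) -> str:
    """Join labeled examples and a final unlabeled query into one prompt."""
    blocks = [f'Tweet: "{tweet}"\nSentiment: {label}' for tweet, label in examples]
    # The final block leaves the label empty for the model to complete.
    blocks.append(f'Tweet: "{query}"\nSentiment:')
    return f"\n{separator}\n".join(blocks)


prompt = build_few_shot_prompt(
    examples=[
        ("I hate it when my phone battery dies.", "Negative"),
        ("My day has been :+1:", "Positive"),
        ("This is the link to the article", "Neutral"),
    ],
    query="This new music video was incredible",
)
```

Because "###" is also listed as a stop sequence in the inference parameters, generation ends after the model emits a single label rather than continuing with invented examples.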

Inference and example prompts for Falcon 180B-Chat

With Falcon 180B-Chat models, optimized for dialogue use cases, the input to the chat model endpoints may contain previous history between the chat assistant and the user. You can ask questions contextual to the conversation that has happened so far. You can also provide the system configuration, such as personas, which define the chat assistant’s behavior. Input payload to the endpoint is the same as the Falcon 180B model except the inputs string value should use the following format:

System: <An optional system prompt>
User: <Optional user message history>
Falcon: <Optional chat history Falcon response>
User: <Current user message>
Falcon:
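
As a sketch, the format above can be produced from a structured conversation with a small helper. The function format_chat_prompt and its argument names are hypothetical; only the "System:", "User:", and "Falcon:" prefixes come from the format described in this post.

```python
from typing import List, Optional, Tuple


def format_chat_prompt(
    user_message: str,
    history: Optional[List[Tuple[str, str]]] = None,
    system: Optional[str] = None,
) -> str:
    """Format a conversation for the Falcon 180B-Chat endpoint.

    history is a list of (user message, Falcon response) pairs, oldest first.
    """
    lines: List[str] = []
    if system:
        lines.append(f"System: {system}")
    for user_turn, falcon_turn in history or []:
        lines.append(f"User: {user_turn}")
        lines.append(f"Falcon: {falcon_turn}")
    lines.append(f"User: {user_message}")
    # The prompt ends with the assistant prefix so the model writes the reply.
    lines.append("Falcon:")
    return "\n".join(lines)


prompt = format_chat_prompt(
    "What is so great about #1?",
    history=[("I am going to Paris, what should I see?", "1. Eiffel Tower ...")],
    system="You are a helpful trip planner.",
)
```

Keeping the history as pairs makes it straightforward to append each new exchange and resend the whole conversation on the next turn.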

The following are some example prompts and the text generated by the model. All outputs are generated with inference parameters {"max_new_tokens":256, "stop": ["\nUser:", "<|endoftext|>", " User:", "###"]}.

Input
System: You are a helpful assistant
User: What is Amazon SageMaker JumpStart?
Falcon: 

Output 
Amazon SageMaker JumpStart is a capability of Amazon SageMaker that provides a set of solutions to common machine learning problems. These solutions include pre-built machine learning models, algorithms, and pre-configured model training and deployment settings. JumpStart also provides sample notebooks that demonstrate how to use these solutions for various use cases. The goal of JumpStart is to help users get started with machine learning quickly and easily, without the need for extensive machine learning expertise.
User:

Input
User: Simply put, the theory of relativity states that
Falcon: 

Output 
1. The laws of physics are the same for all observers in uniform motion relative to one another.
2. The speed of light in a vacuum is the same for all observers, regardless of their motion or the motion of the light source.
3. Time and space are relative, meaning that they can vary depending on the observer's frame of reference.
4. Mass and energy are interchangeable, as expressed by the famous equation E=mc².
User:

Input
User: Building a website can be done in 10 simple steps:
Falcon: 

Output
1. Determine the purpose and goals of your website.
2. Choose a domain name and register it.
3. Select a web hosting provider and sign up for a plan.
4. Choose a website builder or content management system (CMS) to use.
5. Select a theme or template for your website.
6. Customize the design and layout of your website.
7. Add content to your website, including text, images, and videos.
8. Set up navigation menus and links to make your website easy to navigate.
9. Test your website to ensure it is functioning properly and is mobile-friendly.
10. Launch your website and promote it to attract visitors.
User:

Input
System: Always answer with a Haiku.
User: I am deploying a large language model on Amazon SageMaker JumpStart. What should I do?
Falcon: 

Output
"JumpStart" your dreams,
Deploy with ease, scale with glee,
SageMaker awaits.

Input
System: Generate a table summarizing the options outlined in this email.
User: Team, we have to make a decision on the product plan for Goocrux v2. The first option will take us 3 weeks of development, and will achieve performance of 80ms, but means that the UI has to change. The second option will take us 4 weeks of development, and will achieve performance of 60ms, without any UI change. The third option will take the longest -- 6 weeks of development, will achieve performance of 30ms, without any UI change. A markdown table with four columns: (1) Option ID, (2) Development time, (3) Performance, (4) UI Change.
Falcon: 

Output
| Option ID | Development time | Performance | UI Change |
| --- | --- | --- | --- |
| 1 | 3 weeks | 80ms | Yes |
| 2 | 4 weeks | 60ms | No |
| 3 | 6 weeks | 30ms | No |

Input
System: You always translate English to French.
User: sea otter => loutre de mer
peppermint => menthe poivrée
plush girafe => girafe peluche
cheese =>
Falcon: 

Output
=> fromage

Input
System: Classify the user prompt as Negative, Positive, or Neutral sentiment. Use a short response.
User: Tweet: "This new music video was awesome" 
Falcon: 

Output
"Positive"

Input
User: What is the recipe of mayonnaise?
Falcon: 

Output
1. In a mixing bowl, whisk together 1 egg yolk, 1 tablespoon of Dijon mustard, and 1 tablespoon of lemon juice until well combined.
2. While whisking continuously, slowly drizzle in 1 cup of vegetable oil until the mixture thickens and emulsifies.
3. Season with salt and pepper to taste.
4. Store in an airtight container in the refrigerator for up to one week.
User:

In the following example, the user has had a conversation with the assistant about tourist sites in Paris. Next, the user is inquiring about the first option recommended by the chat assistant.

Input
System: You are a helpful trip planner.
User: I am going to Paris, what should I see?
Falcon: 1. Eiffel Tower: No trip to Paris is complete without a visit to the iconic Eiffel Tower.
2. Louvre Museum: Home to some of the world's most famous art, including the Mona Lisa.
3. Notre-Dame Cathedral: A stunning example of Gothic architecture and a must-see for history buffs.
4. Champs-Élysées: One of the most famous avenues in the world, lined with shops, cafes, and theaters.
5. Montmartre: A charming neighborhood known for its bohemian vibe, street art, and stunning views of the city.
6. Palace of Versailles: A short train ride from Paris, this opulent palace was once the seat of French royalty.
7. Musée d'Orsay: Housed in a former train station, this museum boasts an impressive collection of Impressionist and Post-Impressionist art.
8. Sainte-Chapelle: A hidden gem in the heart of Paris, this medieval chapel is famous for its stunning stained glass windows.
9. Luxembourg Gardens: A beautiful park in the heart of the city, perfect for a picnic or a leisurely stroll.
User: What is so great about #1?
Falcon: 

Output
#1 is the Eiffel Tower, which is one of the most famous landmarks in the world. It is a symbol of Paris and a must-see attraction for many visitors. The tower offers stunning views of the city from its observation decks, and it is especially beautiful when lit up at night. Additionally, the Eiffel Tower has a rich history and is an engineering marvel, making it a fascinating destination for anyone interested in architecture or history.
User:
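
Several of the outputs above end with the stop sequence that halted generation (for example, a trailing "User:"). If that is undesirable in an application, it can be trimmed client-side. The sketch below assumes the endpoint returns the matched stop sequence at the end of the generated text, as the sample outputs suggest; strip_stop_sequence is a hypothetical helper.

```python
from typing import Sequence

# Stop sequences used for the chat examples in this post.
CHAT_STOP = ["\nUser:", "<|endoftext|>", " User:", "###"]


def strip_stop_sequence(text: str, stop: Sequence[str] = CHAT_STOP) -> str:
    """Remove one trailing stop sequence, if present, plus trailing whitespace."""
    cleaned = text.rstrip()
    # Check longer sequences first so the most specific match is removed.
    for sequence in sorted(stop, key=len, reverse=True):
        token = sequence.strip()
        if token and cleaned.endswith(token):
            cleaned = cleaned[: -len(token)].rstrip()
            break
    return cleaned
```

Leading whitespace is left untouched, so an output such as " fromage\n###" becomes " fromage".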

Clean up

After you’re done running the notebook, make sure to delete all resources that you created in the process so your billing is stopped. Use the following code:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we showed you how to get started with Falcon 180B in SageMaker Studio and deploy the model for inference. Because foundation models are pre-trained, they can help lower training and infrastructure costs and enable customization for your use case. Visit SageMaker JumpStart in SageMaker Studio now to get started.

About the Authors

Dr. Kyle Ulrich is an Applied Scientist with the Amazon SageMaker JumpStart team. His research interests include scalable machine learning algorithms, computer vision, time series, Bayesian non-parametrics, and Gaussian processes. His PhD is from Duke University and he has published papers in NeurIPS, Cell, and Neuron.

Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker JumpStart and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

Olivier Cruchant is a Principal Machine Learning Specialist Solutions Architect at AWS, based in France. Olivier helps AWS customers – from small startups to large enterprises – develop and deploy production-grade machine learning applications. In his spare time, he enjoys reading research papers and exploring the wilderness with friends and family.

Karl Albertsen leads Amazon SageMaker’s foundation model hub, algorithms, and partnerships teams.

Amazon SageMaker Domain in VPC only mode to support SageMaker Studio with auto shutdown Lifecycle Configuration and SageMaker Canvas with Terraform

Amazon SageMaker Domain supports SageMaker machine learning (ML) environments, including SageMaker Studio and SageMaker Canvas. SageMaker Studio is a fully integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models, improving data science team productivity by up to 10x. SageMaker Canvas expands access to machine learning by providing business analysts with a visual interface that allows them to generate accurate ML predictions on their own—without requiring any ML experience or having to write a single line of code.

HashiCorp Terraform is an infrastructure as code (IaC) tool that lets you organize your infrastructure in reusable code modules. AWS customers rely on IaC to design, develop, and manage their cloud infrastructure, such as SageMaker Domains. IaC ensures that customer infrastructure and services are consistent, scalable, and reproducible while following best practices in the area of development operations (DevOps). Using Terraform, you can develop and manage your SageMaker Domain and its supporting infrastructure in a consistent and repeatable manner.

In this post, we demonstrate the Terraform implementation to deploy a SageMaker Domain and the Amazon Virtual Private Cloud (Amazon VPC) it associates with. The solution will use Terraform to create:

  • A VPC with subnets, security groups, as well as VPC endpoints to support VPC only mode for the SageMaker Domain.
  • A SageMaker Domain in VPC only mode with a user profile.
  • An AWS Key Management Service (AWS KMS) key to encrypt the SageMaker Studio’s Amazon Elastic File System (Amazon EFS) volume.
  • A Lifecycle Configuration attached to the SageMaker Domain to automatically shut down idle Studio notebook instances.
  • A SageMaker Domain execution role and IAM policies to enable SageMaker Studio and Canvas functionalities.

The solution described in this post is available at this GitHub repo.

Solution overview

The following image shows SageMaker Domain in VPC only mode.

[Figure: SageMaker Domain in VPC only mode]

By launching SageMaker Domain in your VPC, you can control the data flow from your SageMaker Studio and Canvas environments. This allows you to restrict internet access, monitor and inspect traffic using standard AWS networking and security capabilities, and connect to other AWS resources through VPC endpoints.

VPC requirements to use VPC only mode

Creating a SageMaker Domain in VPC only mode requires a VPC with the following configurations:

  1. At least two private subnets, each in a different Availability Zone, to ensure high availability.
  2. Ensure your subnets have enough IP addresses. We recommend between two and four IP addresses per user. The total IP address capacity for a Studio domain is the sum of available IP addresses for each subnet provided when the domain is created.
  3. Set up one or more security groups with inbound and outbound rules that together allow the following traffic:
    • NFS traffic over TCP on port 2049 between the domain and the Amazon EFS volume.
    • TCP traffic within the security group. This is required for connectivity between the JupyterServer app and the KernelGateway apps. You must allow access to at least ports in the range 8192–65535.
  4. Create a gateway endpoint for Amazon Simple Storage Service (Amazon S3). SageMaker Studio needs to access Amazon S3 from your VPC using Gateway VPC endpoints. After you create the gateway endpoint, you need to add it as a target in your route table for traffic destined from your VPC to Amazon S3.
  5. Create interface VPC endpoints (AWS PrivateLink) to allow Studio to access the following services with the corresponding service names. You must also associate a security group for your VPC with these endpoints to allow all inbound traffic from port 443:
    • SageMaker API: com.amazonaws.region.sagemaker.api. This is required to communicate with the SageMaker API.
    • SageMaker runtime: com.amazonaws.region.sagemaker.runtime. This is required to run Studio notebooks and to train and host models.
    • SageMaker Feature Store: com.amazonaws.region.sagemaker.featurestore-runtime. This is required to use SageMaker Feature Store.
    • SageMaker Projects: com.amazonaws.region.servicecatalog. This is required to use SageMaker Projects.
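
To make the IP address sizing in requirement 2 concrete, the sketch below estimates how many users a set of subnets can support. It relies on the fact that AWS reserves five addresses in every subnet (the first four and the last one). The function names are hypothetical, and the estimate is an upper bound: addresses consumed by VPC endpoints or other resources in the same subnets are not accounted for.

```python
import ipaddress
from typing import Iterable

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Number of addresses AWS makes available in a subnet with this CIDR block."""
    network = ipaddress.ip_network(cidr)
    return max(network.num_addresses - AWS_RESERVED_PER_SUBNET, 0)


def supported_users(cidrs: Iterable[str], ips_per_user: int = 4) -> int:
    """Upper bound on users, given the recommended two to four IPs per user."""
    total = sum(usable_ips(cidr) for cidr in cidrs)
    return total // ips_per_user


# Two /24 private subnets in different Availability Zones (example CIDRs).
subnets = ["10.0.1.0/24", "10.0.2.0/24"]
conservative = supported_users(subnets, ips_per_user=4)
optimistic = supported_users(subnets, ips_per_user=2)
```

With two /24 subnets this gives 502 usable addresses in total, so between 125 and 251 users depending on whether four or two addresses are budgeted per user.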

Additional VPC endpoints to use SageMaker Canvas

In addition to the previously mentioned VPC endpoints, to use SageMaker Canvas, you need to also create the following interface VPC endpoints:

  • Amazon Forecast and Amazon Forecast Query: com.amazonaws.region.forecast and com.amazonaws.region.forecastquery. These are required to use Amazon Forecast.
  • Amazon Rekognition: com.amazonaws.region.rekognition. This is required to use Amazon Rekognition.
  • Amazon Textract: com.amazonaws.region.textract. This is required to use Amazon Textract.
  • Amazon Comprehend: com.amazonaws.region.comprehend. This is required to use Amazon Comprehend.
  • AWS Security Token Service (AWS STS): com.amazonaws.region.sts. This is required because SageMaker Canvas uses AWS STS to connect to data sources.
  • Amazon Athena and AWS Glue: com.amazonaws.region.athena and com.amazonaws.region.glue. These are required to connect to AWS Glue Data Catalog through Amazon Athena.
  • Amazon Redshift: com.amazonaws.region.redshift-data. This is required to connect to the Amazon Redshift data source.

To view all VPC endpoints for each service you can use with SageMaker Canvas, please go to Configure Amazon SageMaker Canvas in a VPC without internet access.
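
The service names in the two lists above all follow the pattern com.amazonaws.region.service, with region replaced by the AWS Region of the VPC. As a sketch, the full set of interface endpoint names for a given Region can be generated as follows; the function name is hypothetical, and the Amazon S3 gateway endpoint is deliberately left out because it is a different endpoint type.

```python
from typing import List

# Service suffixes taken from the Studio requirements list.
STUDIO_SERVICES = [
    "sagemaker.api",
    "sagemaker.runtime",
    "sagemaker.featurestore-runtime",
    "servicecatalog",
]

# Additional suffixes taken from the SageMaker Canvas list.
CANVAS_SERVICES = [
    "forecast",
    "forecastquery",
    "rekognition",
    "textract",
    "comprehend",
    "sts",
    "athena",
    "glue",
    "redshift-data",
]


def interface_endpoint_names(region: str, include_canvas: bool = True) -> List[str]:
    """Return the interface VPC endpoint service names for one Region."""
    suffixes = STUDIO_SERVICES + (CANVAS_SERVICES if include_canvas else [])
    return [f"com.amazonaws.{region}.{suffix}" for suffix in suffixes]


names = interface_endpoint_names("us-east-1")
```

A list like this is convenient for checking a deployed VPC against the requirements, or for driving a loop that creates one endpoint per service.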

AWS KMS encryption for SageMaker Studio’s EFS volume

The first time a user on your team onboards to SageMaker Studio, SageMaker creates an EFS volume for the team. A home directory is created in the volume for each user who onboards to Studio as part of your team. Notebook files and data files are stored in these directories.

You can encrypt your SageMaker Studio’s EFS volume with a KMS key so your home directories’ data are encrypted at rest. This Terraform solution creates a KMS key and uses it to encrypt SageMaker Studio’s EFS volume.

SageMaker Domain Lifecycle Configuration to automatically shut down idle Studio notebooks

[Figure: SageMaker Studio auto-shutdown Lifecycle Configuration]

Lifecycle Configurations are shell scripts triggered by Amazon SageMaker Studio lifecycle events, such as starting a new Studio notebook. You can use Lifecycle Configurations to automate customization for your Studio environment.

This Terraform solution creates a SageMaker Lifecycle Configuration to detect and stop idle resources that incur costs within Studio using an auto-shutdown Jupyter extension. Under the hood, the following resources are created or configured to achieve the desired result:

  1. Create an S3 bucket and upload the latest version of the auto-shutdown extension sagemaker_studio_autoshutdown-0.1.5.tar.gz. Later, the auto-shutdown script will run the s3 cp command to download the extension file from the S3 bucket on Jupyter Server start-ups. Please refer to the following GitHub repos for more information regarding the auto-shutdown extension and auto-shutdown script.
  2. Create an aws_sagemaker_studio_lifecycle_config resource “auto_shutdown”. This resource will encode the autoshutdown-script.sh with Base64 and create a Lifecycle Configuration for the SageMaker Domain.
  3. For SageMaker Domain default user settings, specify the Lifecycle Configuration ARN and set it as default.
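
The Base64 encoding step in item 2 is handled by Terraform in this solution, but it helps to see what it amounts to. The Python sketch below encodes a script body the same way; the script text is a placeholder for illustration, not the contents of the actual autoshutdown-script.sh, and encode_lifecycle_script is a hypothetical name.

```python
import base64


def encode_lifecycle_script(script_text: str) -> str:
    """Base64-encode a shell script, as Lifecycle Configuration content requires."""
    return base64.b64encode(script_text.encode("utf-8")).decode("ascii")


# Placeholder script body; the real script lives in the solution's repository.
SAMPLE_SCRIPT = "#!/bin/bash\nset -eux\necho 'install the auto-shutdown extension here'\n"

encoded = encode_lifecycle_script(SAMPLE_SCRIPT)

# Decoding returns the original script unchanged.
decoded = base64.b64decode(encoded).decode("utf-8")
```

The encoded string contains only ASCII characters, which is why the script can be passed safely as a single configuration value.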

SageMaker execution role IAM permissions

As a managed service, SageMaker performs operations on your behalf on the AWS hardware that is managed by SageMaker. SageMaker can perform only operations that the user permits.

A SageMaker user can grant these permissions with an IAM role (referred to as an execution role). When you create a SageMaker Studio domain, SageMaker allows you to create the execution role by default. You can restrict access to user profiles by changing the SageMaker user profile role. This Terraform solution attaches the following IAM policies to the SageMaker execution role:

  • SageMaker managed AmazonSageMakerFullAccess policy. This policy grants the execution role full access to use SageMaker Studio.
  • A customer managed IAM policy to access the KMS key used to encrypt the SageMaker Studio’s EFS volume.
  • SageMaker managed AmazonSageMakerCanvasFullAccess and AmazonSageMakerCanvasAIServicesAccess policies. These policies grant the execution role full access to use SageMaker Canvas.
  • In order to enable time series analysis in SageMaker Canvas, you also need to add the IAM trust policy for Amazon Forecast.
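
For the last item, a trust policy lets a service assume the execution role. The sketch below shows what such a policy document could look like with both SageMaker and Amazon Forecast as trusted service principals. This is an illustrative assumption based on the standard IAM trust policy structure; the policy in the solution's repository may be organized differently.

```python
import json
from typing import Any, Dict


def forecast_trust_policy() -> Dict[str, Any]:
    """Trust policy allowing SageMaker and Amazon Forecast to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": [
                        "sagemaker.amazonaws.com",
                        "forecast.amazonaws.com",
                    ]
                },
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Serialize to the JSON form that IAM expects for a role's trust relationship.
policy_json = json.dumps(forecast_trust_policy(), indent=2)
```

Note that a trust policy controls who may assume the role, whereas the policies in the list above control what the role may do once assumed.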

Solution walkthrough

In this blog post, we demonstrate how to deploy the Terraform solution. Before you deploy, make sure you meet the following prerequisites:

Prerequisites

  • An AWS account
  • An IAM user with administrative access

Deployment steps

To give users following this guide a unified deployment experience, we demonstrate the deployment process with AWS CloudShell. Using CloudShell, a browser-based shell, you can quickly run scripts with the AWS Command Line Interface (AWS CLI), experiment with service APIs using the AWS CLI, and use other tools to increase your productivity.

To deploy the Terraform solution, complete the following steps:

CloudShell launch settings

  • Sign in to the AWS Management Console and select the CloudShell service.
  • In the navigation bar, in the Region selector, choose US East (N. Virginia).

Your browser will open the CloudShell terminal.

Install Terraform

The next steps should be executed in a CloudShell terminal.

Check this HashiCorp guide for up-to-date instructions to install Terraform for Amazon Linux:

  • Install yum-config-manager to manage your repositories.
sudo yum install -y yum-utils
  • Use yum-config-manager to add the official HashiCorp Linux repository.
sudo yum-config-manager --add-repo https://rpm.releases.hashicorp.com/AmazonLinux/hashicorp.repo
  • Install Terraform from the new repository.
sudo yum -y install terraform
  • Verify that the installation worked by listing Terraform’s available subcommands.
terraform -help

Expected output:

Usage: terraform [-version] [-help] <command> [args]

The available commands for execution are listed below.
The most common, useful commands are shown first, followed by
less common or more advanced commands. If you're just getting
started with Terraform, stick with the common commands. For the
other commands, please read the help and docs before usage.
…

Clone the code repo

Perform the following steps in a CloudShell terminal.

  • Clone the repo and navigate to the sagemaker-domain-vpconly-canvas-with-terraform folder:
git clone https://github.com/aws-samples/sagemaker-domain-vpconly-canvas-with-terraform.git

cd sagemaker-domain-vpconly-canvas-with-terraform
  • Download the auto-shutdown extension and place it in the assets/auto_shutdown_template folder:
wget https://github.com/aws-samples/sagemaker-studio-auto-shutdown-extension/raw/main/sagemaker_studio_autoshutdown-0.1.5.tar.gz -P assets/auto_shutdown_template

Deploy the Terraform solution

In the CloudShell terminal, run the following Terraform commands:

terraform init

You should see a success message like:

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work...

Now you can run:

terraform plan

After you are satisfied with the resources the plan outlines to be created, you can run:

terraform apply

Enter “yes” when prompted to confirm the deployment.

If successfully deployed, you should see an output that looks like:

Apply complete! Resources: X added, 0 changed, 0 destroyed.
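
If the deployment is scripted rather than run by hand, the summary line can be checked programmatically. The sketch below parses a line of the form shown above; parse_apply_summary is a hypothetical helper, and it assumes the summary wording matches this post, which may differ between Terraform versions.

```python
import re
from typing import Dict, Optional

# Matches the summary line format shown in this post.
_SUMMARY = re.compile(
    r"Apply complete! Resources:.*?(\d+) added, (\d+) changed, (\d+) destroyed\."
)


def parse_apply_summary(output: str) -> Optional[Dict[str, int]]:
    """Extract resource counts from terraform apply output, or None if absent."""
    match = _SUMMARY.search(output)
    if match is None:
        return None
    added, changed, destroyed = (int(group) for group in match.groups())
    return {"added": added, "changed": changed, "destroyed": destroyed}


summary = parse_apply_summary(
    "Apply complete! Resources: 12 added, 0 changed, 0 destroyed."
)
```

A None result means the summary line was not found, which a wrapper script could treat as a failed or interrupted apply.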

Accessing SageMaker Studio and Canvas

We now have a Studio domain associated with our VPC and a user profile in this domain.

[Figure: SageMaker Domain and user profile]

To use the SageMaker Studio console, on the Studio Control Panel, locate your user name (it should be defaultuser) and choose Open Studio.

We made it! Now you can use your browser to connect to the SageMaker Studio environment. After a few minutes, Studio finishes creating your environment, and you’re greeted with the launcher screen.

[Figure: SageMaker Studio landing page]

To use the SageMaker Canvas console, on the Canvas Control Panel, locate your user name (it should be defaultuser) and choose Open Canvas.

Now you can use your browser to connect to the SageMaker Canvas environment. After a few minutes, Canvas finishes creating your environment, and you’re greeted with the launcher screen.

[Figure: SageMaker Canvas landing page]

Feel free to explore the full functionality SageMaker Studio and Canvas have to offer! Please refer to the Conclusion section for additional workshops and tutorials you can use to learn more about SageMaker.

Clean up

Run the following command to clean up your resources:

terraform destroy

Tip: If you set the Amazon EFS retention policy to “Retain” (the default), “terraform destroy” will run into issues because Terraform tries to delete the subnets and VPC while the EFS volume and its associated security groups (created by SageMaker) still exist. To fix this, first delete the EFS volume manually, and then delete the subnets and VPC manually in the AWS console.

Conclusion

The solution in this post lets you use Terraform to create a SageMaker Domain that supports ML environments, including SageMaker Studio and SageMaker Canvas. SageMaker Studio provides a fully managed IDE that removes the heavy lifting in the ML process. With SageMaker Canvas, business users can easily explore and build ML models to make accurate predictions without writing any code. With the ability to launch Studio and Canvas inside a VPC and the use of a KMS key to encrypt the EFS volume, customers can use SageMaker ML environments with enhanced security. The auto-shutdown Lifecycle Configuration helps customers save costs on idle Studio notebook instances.

Go test this solution and let us know what you think. For more information about how to use SageMaker Studio and SageMaker Canvas, see the following:


About the Author

Chen Yang is a Machine Learning Engineer at Amazon Web Services. She is part of the AWS Professional Services team, and has been focusing on building secure machine learning environments for customers. In her spare time, she enjoys running and hiking in the Pacific Northwest.
