Just like computer vision a few years ago, the decades-old field of natural language processing (NLP) is experiencing a fascinating renaissance. Not a month goes by without a new breakthrough! Indeed, thanks to the scalability and cost-efficiency of cloud-based infrastructure, researchers are finally able to train complex deep learning models on very large text datasets, in order to solve business problems such as question answering, sentence comparison, or text summarization.
In this respect, the Transformer deep learning architecture has proven very successful, and has spawned several state-of-the-art model families:
- Bidirectional Encoder Representations from Transformers (BERT): 340 million parameters [1]
- Text-To-Text Transfer Transformer (T5): over 10 billion parameters [2]
- Generative Pre-Training (GPT): over 175 billion parameters [3]
As amazing as these models are, training and optimizing them remains a challenging endeavor that requires a significant amount of time, resources, and skills, all the more so when different languages are involved. Unfortunately, this complexity prevents most organizations from using these models effectively, if at all. Instead, wouldn’t it be great if we could just start from pre-trained versions and put them to work immediately?
This is the exact challenge that Hugging Face is tackling. Founded in 2016, this startup based in New York and Paris makes it easy to add state-of-the-art Transformer models to your applications. Thanks to their popular transformers, tokenizers, and datasets libraries, you can download and predict with over 7,000 pre-trained models in 164 languages. What do I mean by ‘popular’? Well, with over 42,000 stars on GitHub and 1 million downloads per month, the transformers library has become the de facto place for developers and data scientists to find NLP models.
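Getting a prediction really does take just a few lines of code. Here’s a minimal sketch (independent of SageMaker) that downloads a pre-trained model with the transformers pipeline API and classifies a sentence locally; the model name and input text are arbitrary examples, not part of the integration described below.

from transformers import pipeline

# Download an example pre-trained model and run a local prediction;
# the model name below is just an illustrative choice.
classifier = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')
print(classifier('Hugging Face models are a joy to work with!'))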
At AWS, we’re also working hard on democratizing machine learning in order to put it in the hands of every developer, data scientist and expert practitioner. In particular, tens of thousands of customers now use Amazon SageMaker, our fully managed service for machine learning. Thanks to its managed infrastructure and its advanced machine learning capabilities, customers can build and run their machine learning workloads quicker than ever at any scale. As NLP adoption grows, so does the adoption of Hugging Face models, and customers have asked us for a simpler way to train and optimize them on AWS.
Working with Hugging Face Models on Amazon SageMaker
Today, we’re happy to announce that you can now work with Hugging Face models on Amazon SageMaker. Thanks to the new HuggingFace estimator in the SageMaker SDK, you can easily train, fine-tune, and optimize Hugging Face models built with TensorFlow and PyTorch. This should be extremely useful for customers interested in customizing Hugging Face models to increase accuracy on domain-specific language: financial services, life sciences, media and entertainment, and so on.
Here’s a code snippet fine-tuning the DistilBERT model for a single epoch.
from sagemaker.huggingface import HuggingFace

# Configure the training job: script, framework versions, instance, and hyperparameters
huggingface_estimator = HuggingFace(
    entry_point='train.py',
    pytorch_version='1.6.0',
    transformers_version='4.4',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    hyperparameters={
        'epochs': 1,
        'train_batch_size': 32,
        'model_name': 'distilbert-base-uncased'
    }
)

# Launch the training job on the 'train' and 'test' channels
huggingface_estimator.fit({'train': training_input_path, 'test': test_input_path})
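In this snippet, training_input_path and test_input_path are simply S3 URIs pointing to your processed datasets. Here’s one possible way to create them with the SageMaker SDK; the local paths and key prefixes below are hypothetical examples.

import sagemaker

# Upload locally prepared datasets to the default SageMaker bucket;
# local paths and key prefixes are hypothetical examples.
sess = sagemaker.Session()
training_input_path = sess.upload_data('./data/train', key_prefix='distilbert/train')
test_input_path = sess.upload_data('./data/test', key_prefix='distilbert/test')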
As usual on SageMaker, the train.py script uses Script Mode to retrieve hyperparameters as command line arguments. Then, thanks to the transformers library API, it downloads the appropriate Hugging Face model, configures the training job, and runs it with the Trainer API. Here’s a code snippet showing these steps.
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments
...
# Download the pre-trained model specified in the hyperparameters
model = AutoModelForSequenceClassification.from_pretrained(args.model_name)

# Configure the training run
training_args = TrainingArguments(
    output_dir=args.model_dir,
    num_train_epochs=args.epochs,
    per_device_train_batch_size=args.train_batch_size,
    per_device_eval_batch_size=args.eval_batch_size,
    warmup_steps=args.warmup_steps,
    evaluation_strategy="epoch",
    logging_dir=f"{args.output_data_dir}/logs",
    learning_rate=float(args.learning_rate)
)

# Run training and evaluation with the Trainer API
trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_dataset,
    eval_dataset=test_dataset
)
trainer.train()
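The snippet above elides the boilerplate at the top of train.py. As a rough sketch of what it could look like (argument names mirror the hyperparameters above, but the defaults and the metric choice are assumptions, not prescribed by the integration): Script Mode delivers the estimator’s hyperparameters as command line arguments and SageMaker paths as environment variables, and compute_metrics is a user-defined function.

import argparse
import os
import numpy as np

# Script Mode passes the estimator's hyperparameters as command line arguments,
# and exposes SageMaker paths through environment variables.
parser = argparse.ArgumentParser()
parser.add_argument('--epochs', type=int, default=1)
parser.add_argument('--train_batch_size', type=int, default=32)
parser.add_argument('--eval_batch_size', type=int, default=64)
parser.add_argument('--warmup_steps', type=int, default=500)
parser.add_argument('--learning_rate', type=str, default='5e-5')
parser.add_argument('--model_name', type=str)
parser.add_argument('--model_dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
parser.add_argument('--output_data_dir', type=str, default=os.environ.get('SM_OUTPUT_DATA_DIR'))
# The 'train' and 'test' channels passed to fit() show up at these locations
parser.add_argument('--training_dir', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
parser.add_argument('--test_dir', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
args, _ = parser.parse_known_args()

# Example metric function: plain accuracy
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {'accuracy': float((predictions == labels).mean())}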
As you can see, this integration makes it easier and quicker to train advanced NLP models, even if you don’t have a lot of machine learning expertise.
Customers are already using Hugging Face models on Amazon SageMaker. For example, Quantum Health is on a mission to make healthcare navigation smarter, simpler, and more cost-effective for everybody. Says Jorge Grisman, NLP Data Scientist at Quantum Health: “we use Hugging Face and Amazon SageMaker a lot for many NLP use cases such as text classification, text summarization, and Q&A with the goal of helping our agents and members. For some use cases, we just use the Hugging Face models directly and for others we fine tune them on SageMaker. We are excited about the integration of Hugging Face Transformers into Amazon SageMaker to make use of the distributed libraries during training to shorten the training time for our larger datasets”.
Says Victor Peinado, ML Software Engineering Manager at Kustomer: “Kustomer is a customer service CRM platform for managing high support volume effortlessly. In our business, we use machine learning models to help customers contextualize conversations, remove time-consuming tasks, and deflect repetitive questions. We use Hugging Face and Amazon SageMaker extensively, and we are excited about the integration of Hugging Face Transformers into SageMaker since it will simplify the way we fine tune machine learning models for text classification and semantic search”.
Training Hugging Face Models at Scale on Amazon SageMaker
As mentioned earlier, NLP datasets can be huge, which may lead to very long training times. In order to help you speed up your training jobs and make the most of your AWS infrastructure, we’ve worked with Hugging Face to add the SageMaker Data Parallelism Library to the transformers library (details are available in the Trainer API documentation).
Adding a single parameter to your HuggingFace estimator is all it takes to enable data parallelism, letting your Trainer-based code use it automatically.
huggingface_estimator = HuggingFace(
    ...,  # same arguments as before
    # Enable the SageMaker Data Parallelism Library
    distribution={'smdistributed': {'dataparallel': {'enabled': True}}}
)
That’s it. In fact, the Hugging Face team used this capability to speed up their experimentation process by over four times!
Getting Started
You can start using Hugging Face models on Amazon SageMaker today, in all AWS Regions where SageMaker is available. Sample notebooks are available on GitHub. In order to enjoy automatic data parallelism, please make sure to use version 4.3.0 of the transformers library (or newer) in your training script.
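One easy way to double-check which version actually ends up in your training container (just a suggestion, not a requirement of the integration) is to log it at the start of your training script:

import transformers

# Confirm the transformers version supports automatic data parallelism (4.3.0 or newer)
print(f'transformers version: {transformers.__version__}')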
Please give it a try, and let us know what you think. As always, we’re looking forward to your feedback. You can send it through your usual AWS Support contacts, or post it on the AWS Forum for SageMaker.
[1] “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova.
[2] “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer”, Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
[3] “Improving Language Understanding by Generative Pre-Training”, Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever.
About the Author
Julien is the Artificial Intelligence & Machine Learning Evangelist for EMEA. He focuses on helping developers and enterprises bring their ideas to life. In his spare time, he reads the works of J.R.R. Tolkien again and again.