What Algorithms can Transformers Learn? A Study in Length Generalization

This paper was accepted at the MATH workshop at NeurIPS 2023.
Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity. This raises the question of if and when Transformer models can learn the true algorithm for solving a task. We study the scope of Transformers’ abilities in the specific setting of length generalization on algorithmic tasks. Here, we propose a unifying framework to understand when and how Transformers can exhibit strong length generalization on a given task. Specifically, we…
Apple Machine Learning Research

Increasing Coverage and Precision of Textual Information in Multilingual Knowledge Graphs

Recent work in Natural Language Processing and Computer Vision has been using textual information – e.g., entity names and descriptions – available in knowledge graphs to ground neural models to high-quality structured data. However, when it comes to non-English languages, the quantity and quality of textual information are comparatively scarce. To address this issue, we introduce the novel task of automatic Knowledge Graph Enhancement (KGE) and perform a thorough investigation on bridging the gap in both the quantity and quality of textual information between English and non-English…
Apple Machine Learning Research

SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding

This paper was accepted at the UniReps Workshop at NeurIPS 2023.
The landscape of publicly available vision foundation models (VFMs), such as CLIP and Segment Anything Model (SAM), is expanding rapidly. VFMs are endowed with distinct capabilities stemming from their pre-training objectives. For instance, CLIP excels in semantic understanding, while SAM specializes in spatial understanding for segmentation. In this work, we introduce a simple recipe to efficiently merge VFMs into a unified model that absorbs their expertise. Our method integrates techniques of multi-task learning, continual…
Apple Machine Learning Research

Swap Agnostic Learning, or Characterizing Omniprediction via Multicalibration

A recent line of work shows that notions of multigroup fairness imply surprisingly strong notions of omniprediction: loss minimization guarantees that apply not just for a specific loss function, but for any loss belonging to a large family of losses. While prior work has derived various notions of omniprediction from multigroup fairness guarantees of varying strength, it was unknown whether the connection goes in both directions. In this work, we answer this question in the affirmative, establishing equivalences between notions of multicalibration and omniprediction. The new definitions that…
Apple Machine Learning Research

Federated Learning for Speech Recognition: Revisiting Current Trends Towards Large-Scale ASR

This paper was accepted at the Federated Learning in the Age of Foundation Models workshop at NeurIPS 2023.
While automatic speech recognition (ASR) has witnessed remarkable achievements in recent years, it has not garnered a widespread focus within the federated learning (FL) and differential privacy (DP) communities. Meanwhile, ASR is also a well suited benchmark for FL and DP as there is (i) a natural data split across users by using speaker information; (ii) heterogeneous data across speakers close to practical settings; (iii) interplay between acoustic and language modeling; (iv) and it…
Apple Machine Learning Research

Schedule Amazon SageMaker notebook jobs and manage multi-step notebook workflows using APIs

Amazon SageMaker Studio provides a fully managed solution for data scientists to interactively build, train, and deploy machine learning (ML) models. Amazon SageMaker notebook jobs allow data scientists to run their notebooks on demand or on a schedule with a few clicks in SageMaker Studio. With this launch, you can programmatically run notebooks as jobs using APIs provided by Amazon SageMaker Pipelines, the ML workflow orchestration feature of Amazon SageMaker. Furthermore, you can create a multi-step ML workflow with multiple dependent notebooks using these APIs.

SageMaker Pipelines is a native workflow orchestration tool for building ML pipelines that take advantage of direct SageMaker integration. Each SageMaker pipeline is composed of steps, which correspond to individual tasks such as processing, training, or data processing using Amazon EMR. SageMaker notebook jobs are now available as a built-in step type in SageMaker pipelines. You can use this notebook job step to easily run notebooks as jobs with just a few lines of code using the Amazon SageMaker Python SDK. Additionally, you can stitch multiple dependent notebooks together to create a workflow in the form of a directed acyclic graph (DAG). You can then run these notebook jobs or DAGs, and manage and visualize them using SageMaker Studio.

Data scientists currently use SageMaker Studio to interactively develop their Jupyter notebooks and then use SageMaker notebook jobs to run these notebooks as scheduled jobs. These jobs can be run immediately or on a recurring time schedule without the need for data workers to refactor code as Python modules. Some common use cases for doing this include:

  • Running long-running notebooks in the background
  • Regularly running model inference to generate reports
  • Scaling up from preparing small sample datasets to working with petabyte-scale big data
  • Retraining and deploying models on some cadence
  • Scheduling jobs for model quality or data drift monitoring
  • Exploring the parameter space for better models

Although this functionality makes it straightforward for data workers to automate standalone notebooks, ML workflows are often comprised of several notebooks, each performing a specific task with complex dependencies. For instance, a notebook that monitors for model data drift should have a pre-step that allows extract, transform, and load (ETL) and processing of new data and a post-step of model refresh and training in case a significant drift is noticed. Furthermore, data scientists might want to trigger this entire workflow on a recurring schedule to update the model based on new data. To enable you to easily automate your notebooks and create such complex workflows, SageMaker notebook jobs are now available as a step in SageMaker Pipelines. In this post, we show how you can solve the following use cases with a few lines of code:

  • Programmatically run a standalone notebook immediately or on a recurring schedule
  • Create multi-step workflows of notebooks as DAGs for continuous integration and continuous delivery (CI/CD) purposes that can be managed via the SageMaker Studio UI

Solution overview

The following diagram illustrates our solution architecture. You can use the SageMaker Python SDK to run a single notebook job or a workflow. This feature creates a SageMaker training job to run the notebook.

In the following sections, we walk through a sample ML use case and showcase the steps to create a workflow of notebook jobs, passing parameters between different notebook steps, scheduling your workflow, and monitoring it via SageMaker Studio.

For our ML problem in this example, we are building a sentiment analysis model, which is a type of text classification task. The most common applications of sentiment analysis include social media monitoring, customer support management, and analyzing customer feedback. The dataset being used in this example is the Stanford Sentiment Treebank (SST2) dataset, which consists of movie reviews along with an integer (0 or 1) that indicates the positive or negative sentiment of the review.

The following is an example of a data.csv file corresponding to the SST2 dataset, and shows values in its first two columns. Note that the file shouldn’t have any header.

Column 1 Column 2
0 hide new secretions from the parental units
0 contains no wit , only labored gags
1 that loves its characters and communicates something rather beautiful about human nature
0 remains utterly satisfied to remain the same throughout
0 on the worst revenge-of-the-nerds clichés the filmmakers could dredge up
0 that ‘s far too tragic to merit such superficial treatment
1 demonstrates that the director of such hollywood blockbusters as patriot games can still turn out a small , personal film with an emotional wallop .

In this ML example, we must perform several tasks:

  1. Perform feature engineering to prepare this dataset in a format our model can understand.
  2. Post-feature engineering, run a training step that uses Transformers.
  3. Set up batch inference with the fine-tuned model to help predict the sentiment for new reviews that come in.
  4. Set up a data monitoring step so that we can regularly monitor our new data for any drift in quality that might require us to retrain the model weights.

With this launch of a notebook job as a step in SageMaker pipelines, we can orchestrate this workflow, which consists of three distinct steps. Each step of the workflow is developed in a separate notebook; the notebooks are then converted into independent notebook job steps and connected as a pipeline:

  • Preprocessing – Download the public SST2 dataset from Amazon Simple Storage Service (Amazon S3) and create a CSV file for the notebook in Step 2 to run. The SST2 dataset is a text classification dataset with two labels (0 and 1) and a column of text to categorize.
  • Training – Take the shaped CSV file and run fine-tuning with BERT for text classification utilizing Transformers libraries. We use a test data preparation notebook as part of this step, which is a dependency for the fine-tuning and batch inference step. When fine-tuning is complete, this notebook is run using run magic and prepares a test dataset for sample inference with the fine-tuned model.
  • Transform and monitor – Perform batch inference and set up data quality with model monitoring to have a baseline dataset suggestion.

Run the notebooks

The sample code for this solution is available on GitHub.

Creating a SageMaker notebook job step is similar to creating other SageMaker Pipeline steps. In this notebook example, we use the SageMaker Python SDK to orchestrate the workflow. To create a notebook step in SageMaker Pipelines, you can define the following parameters:

  • Input notebook – The name of the notebook that this notebook step will be orchestrating. Here you can pass in the local path to the input notebook. Optionally, if this notebook has other notebooks it’s running, you can pass these in the AdditionalDependencies parameter for the notebook job step.
  • Image URI – The Docker image behind the notebook job step. This can be the predefined images that SageMaker already provides or a custom image that you have defined and pushed to Amazon Elastic Container Registry (Amazon ECR). Refer to the considerations section at the end of this post for supported images.
  • Kernel name – The name of the kernel that you are using on SageMaker Studio. This kernel spec is registered in the image that you have provided.
  • Instance type (optional) – The Amazon Elastic Compute Cloud (Amazon EC2) instance type behind the notebook job that you have defined and will be running.
  • Parameters (optional) – Parameters you can pass in that will be accessible for your notebook. These can be defined in key-value pairs. Additionally, these parameters can be modified between various notebook job runs or pipeline runs.
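
As a rough illustration of how the settings above fit together, the following sketch groups them in a plain Python dataclass. NotebookStepConfig, with_parameters, and all of the example values are hypothetical names introduced here; this is not a class from the SageMaker Python SDK. It only shows which settings are required, which are optional, and how key-value parameters might be changed between runs.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class NotebookStepConfig:
    """Hypothetical container for the settings listed above (not an SDK class)."""

    input_notebook: str                   # local path to the notebook to run
    image_uri: str                        # Docker image behind the step
    kernel_name: str                      # kernel registered in that image
    instance_type: Optional[str] = None   # optional EC2 instance type
    parameters: Dict[str, str] = field(default_factory=dict)
    additional_dependencies: List[str] = field(default_factory=list)

    def with_parameters(self, **overrides: str) -> "NotebookStepConfig":
        """Return a copy whose parameters are updated for another run."""
        return replace(self, parameters={**self.parameters, **overrides})


# Placeholder values for illustration only.
base = NotebookStepConfig(
    input_notebook="preprocess.ipynb",
    image_uri="example-image-uri",
    kernel_name="python3",
    parameters={"default_s3_bucket": "s3://example-bucket/artifacts"},
)
rerun = base.with_parameters(default_s3_bucket="s3://example-bucket/run-2")
print(rerun.parameters)
```

Returning a modified copy instead of mutating the original keeps each run's parameters separate, which mirrors the idea that parameters can differ between notebook job runs or pipeline runs.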

Our example has a total of five notebooks:

  • nb-job-pipeline.ipynb – This is our main notebook where we define our pipeline and workflow.
  • preprocess.ipynb – This notebook is the first step in our workflow and contains the code that will pull the public AWS dataset and create a CSV file out of it.
  • training.ipynb – This notebook is the second step in our workflow and contains code to take the CSV from the previous step and conduct local training and fine-tuning. This step also has a dependency from the prepare-test-set.ipynb notebook to pull down a test dataset for sample inference with the fine-tuned model.
  • prepare-test-set.ipynb – This notebook creates a test dataset that our training notebook will use in the second pipeline step and use for sample inference with the fine-tuned model.
  • transform-monitor.ipynb – This notebook is the third step in our workflow and takes the base BERT model and runs a SageMaker batch transform job, while also setting up data quality with model monitoring.

Next, we walk through the main notebook nb-job-pipeline.ipynb, which combines all the sub-notebooks into a pipeline and runs the end-to-end workflow. Note that although the following example only runs the notebook one time, you can also schedule the pipeline to run the notebook repeatedly. Refer to SageMaker documentation for detailed instructions.

For our first notebook job step, we pass in a parameter with a default S3 bucket. We can use this bucket to dump any artifacts we want available for our other pipeline steps. For the first notebook (preprocess.ipynb), we pull down the AWS public SST2 train dataset and create a training CSV file out of it that we push to this S3 bucket. See the following code:

import pandas as pd

# Parameters
# default_s3_bucket is supplied through the notebook job step's parameters
print(default_s3_bucket)

!aws s3 cp s3://sagemaker-sample-files/datasets/text/SST2/sst2.train sst2.train

# will read just the first 500 lines for quicker execution
with open('sst2.train', 'r') as f:
    lines = f.readlines()[:500] 

data = []
for line in lines:
    label, text = line.strip().split(' ', 1)
    data.append((int(label), text))

df = pd.DataFrame(data, columns=['label', 'text'])
df.to_csv("train.csv", index=False) #create csv file with smaller dataset
!aws s3 cp "train.csv" {default_s3_bucket}
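
The parsing logic in this notebook can also be sketched with only the Python standard library, which makes it easy to check without downloading the dataset. The function names parse_sst2_lines and rows_to_csv and the sample lines are assumptions made for this illustration; the split on the first space mirrors the loop above.

```python
import csv
import io


def parse_sst2_lines(lines, limit=500):
    """Split each 'label text' line once on the first space."""
    rows = []
    for line in lines[:limit]:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        label, text = line.split(" ", 1)
        rows.append((int(label), text))
    return rows


def rows_to_csv(rows):
    """Serialize (label, text) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["label", "text"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample lines in the same 'label text' layout.
sample = [
    "0 contains no wit , only labored gags\n",
    "\n",
    "1 a small , personal film\n",
]
rows = parse_sst2_lines(sample)
print(rows_to_csv(rows))
```

Because the review text itself contains commas, the csv module quotes those fields when writing, so the file can be read back without the text being split into extra columns.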

We can then convert this notebook into a NotebookJobStep with the following code in our main notebook:

from sagemaker.workflow.notebook_job_step import NotebookJobStep

# provide S3 bucket to dump artifacts in
nb_job_params = {"default_s3_bucket": notebook_artifacts}

preprocess_nb_step = NotebookJobStep(
    name=preprocess_step_name,
    description=preprocess_description,
    notebook_job_name=preprocess_job_name,
    image_uri=image_uri,
    kernel_name=kernel_name,
    display_name=display_name,
    role=role,
    input_notebook=preprocess_notebook,
    instance_type="ml.m5.4xlarge",
    parameters=nb_job_params,
)

Now that we have a sample CSV file, we can start training our model in our training notebook. Our training notebook takes in the same parameter with the S3 bucket and pulls down the training dataset from that location. Then we perform fine-tuning by using the Transformers trainer object with the following code snippet:

from transformers import TrainingArguments, Trainer
training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=small_eval_dataset,
    compute_metrics=compute_metrics,
)

trainer.train()
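
The snippet above passes a compute_metrics function that is not shown in this post. The following is one possible minimal version, written in plain Python as an assumption and not taken from the sample notebook: it expects a (logits, labels) pair, picks the highest-scoring class per row, and reports accuracy.

```python
def compute_metrics(eval_pred):
    """Return accuracy given (logits, labels).

    Assumes each row of logits holds one score per class and that labels
    are integer class ids; this is an illustrative stand-in.
    """
    logits, labels = eval_pred
    # Index of the largest score in each row is the predicted class.
    predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(pred == int(label)) for pred, label in zip(predictions, labels))
    return {"accuracy": correct / len(predictions)}


# Hypothetical logits for four reviews and their true labels.
example_logits = [[0.1, 0.9], [2.0, -1.0], [0.3, 0.4], [1.0, 0.0]]
example_labels = [1, 0, 0, 0]
print(compute_metrics((example_logits, example_labels)))
```

In practice this kind of function is often written with NumPy and a metrics library; the plain Python form is used here only to keep the example self-contained.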

After fine-tuning, we want to run some batch inference to see how the model is performing. This is done using a separate notebook (prepare-test-set.ipynb) in the same local path that creates a test dataset to perform inference on using our trained model. We can run the additional notebook in our training notebook with the following magic cell:

%run 'prepare-test-set.ipynb'

We define this extra notebook dependency in the AdditionalDependencies parameter in our second notebook job step:

train_nb_step = NotebookJobStep(
    name=training_step_name,
    description=training_description,
    notebook_job_name=training_job_name,
    input_notebook=training_notebook,
    additional_dependencies=[test_data_prep_notebook],
    image_uri=image_uri,
    kernel_name=kernel_name,
    display_name=display_name,
    instance_type="ml.m5.12xlarge",
    role=role,
    parameters=nb_job_params,
)

We must also specify that the training notebook job step (Step 2) depends on the Preprocess notebook job step (Step 1) by using the add_depends_on API call as follows:

train_nb_step.add_depends_on([preprocess_nb_step])

Our last step takes the BERT model and runs a SageMaker batch transform, while also setting up data capture and data quality monitoring via SageMaker Model Monitor. Note that this is different from using the built-in Transform or Capture steps via Pipelines. Our notebook for this step runs those same APIs, but is tracked as a notebook job step. This step depends on the training notebook job step that we previously defined, so we also capture that with the add_depends_on call.

batch_monitor_step = NotebookJobStep(
    name=batch_monitor_step_name,
    description=batch_monitor_description,
    notebook_job_name=batch_monitor_job_name,
    input_notebook=batch_monitor_notebook,
    image_uri=image_uri,
    kernel_name=kernel_name,
    display_name=display_name,
    instance_type="ml.m5.12xlarge",
    role=role,
    parameters=nb_job_params,
)
batch_monitor_step.add_depends_on([train_nb_step])
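
The dependencies declared with add_depends_on form a small DAG. The sketch below models the same three-step chain with Python's standard graphlib module to show how a run order follows from the declared dependencies. The step names are placeholders, and this is not how SageMaker Pipelines resolves the graph internally.

```python
from graphlib import CycleError, TopologicalSorter


def resolve_run_order(dependencies):
    """Return step names in an order that respects every dependency."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        raise ValueError(f"workflow is not a DAG: {err}") from err


# Each key depends on the steps in its value set (hypothetical names).
workflow = {
    "preprocess": set(),
    "training": {"preprocess"},
    "transform-monitor": {"training"},
}
run_order = resolve_run_order(workflow)
print(run_order)  # ['preprocess', 'training', 'transform-monitor']
```

A dependency cycle, such as two steps that each depend on the other, has no valid order, which is why the workflow must be acyclic.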

After the various steps of our workflow have been defined, we can create and run the end-to-end pipeline:

from sagemaker.workflow.pipeline import Pipeline

# create pipeline
pipeline = Pipeline(
    name=pipeline_name,
    steps=[preprocess_nb_step, train_nb_step, batch_monitor_step],
)

# execute pipeline
pipeline.create(session.get_execution_role())
execution = pipeline.start(parameters={})
execution.wait(delay=30, max_attempts=60)
execution_steps = execution.list_steps()
print(execution_steps)
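
The wait call above takes a delay and a maximum number of attempts; with delay=30 and max_attempts=60, that would bound the wait at roughly 30 minutes if each attempt is one status check. The following standalone sketch shows that polling pattern in general terms. The function wait_for_completion, the status names, and the fake status source are assumptions for illustration and do not reproduce the SDK's implementation.

```python
import time

# Assumed terminal status names for this illustration.
TERMINAL_STATES = {"Succeeded", "Failed", "Stopped"}


def wait_for_completion(get_status, delay=30, max_attempts=60, sleep=time.sleep):
    """Poll get_status() until a terminal state or until attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status = get_status()
        if status in TERMINAL_STATES:
            return status, attempt
        if attempt < max_attempts:
            sleep(delay)  # pause before the next check
    raise TimeoutError(f"no terminal status after {max_attempts} attempts")


# Demo with a fake status source and a no-op sleep so it returns instantly.
fake_statuses = iter(["Executing", "Executing", "Succeeded"])
result = wait_for_completion(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(result)  # ('Succeeded', 3)
```

Passing the sleep function as an argument makes the loop easy to test without real delays.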

Monitor the pipeline runs

You can track and monitor the notebook step runs via the SageMaker Pipelines DAG, as seen in the following screenshot.

You can also optionally monitor the individual notebook runs on the notebook job dashboard and toggle the output files that have been created via the SageMaker Studio UI. When using this functionality outside of SageMaker Studio, you can define the users who can track the run status on the notebook job dashboard by using tags. For more details about tags to include, see View your notebook jobs and download outputs in the Studio UI dashboard.

For this example, we output the resulting notebook jobs to a directory called outputs in your local path with your pipeline run code. As shown in the following screenshot, here you can see the output of your input notebook and also any parameters you defined for that step.

Clean up

If you followed along with our example, be sure to delete the created pipeline, the notebook jobs, and the S3 data downloaded by the sample notebooks.

Considerations

The following are some important considerations for this feature:

Conclusion

With this launch, data workers can now programmatically run their notebooks with a few lines of code using the SageMaker Python SDK. Additionally, you can create complex multi-step workflows using your notebooks, significantly reducing the time needed to move from a notebook to a CI/CD pipeline. After creating the pipeline, you can use SageMaker Studio to view and run DAGs for your pipelines and manage and compare the runs. Whether you’re scheduling end-to-end ML workflows or only parts of them, we encourage you to try notebook-based workflows.


About the authors

Anchit Gupta is a Senior Product Manager for Amazon SageMaker Studio. She focuses on enabling interactive data science and data engineering workflows from within the SageMaker Studio IDE. In her spare time, she enjoys cooking, playing board/card games, and reading.

Ram Vegiraju is an ML Architect with the SageMaker Service team. He focuses on helping customers build and optimize their AI/ML solutions on Amazon SageMaker. In his spare time, he loves traveling and writing.

Edward Sun is a Senior SDE working for SageMaker Studio at Amazon Web Services. He is focused on building interactive ML solutions and simplifying the customer experience of integrating SageMaker Studio with popular technologies in the data engineering and ML ecosystem. In his spare time, Edward is a big fan of camping, hiking, and fishing, and enjoys spending time with his family.

Announcing new tools and capabilities to enable responsible AI innovation

The rapid growth of generative AI brings promising new innovation, and at the same time raises new challenges. These challenges include some that were common before generative AI, such as bias and explainability, and new ones unique to foundation models (FMs), including hallucination and toxicity. At AWS, we are committed to developing generative AI responsibly, taking a people-centric approach that prioritizes education, science, and our customers, to integrate responsible AI across the end-to-end AI lifecycle.

Over the past year, we have introduced new capabilities in our generative AI applications and models such as built-in security scanning in Amazon CodeWhisperer, training to detect and block harmful content in Amazon Titan, and data privacy protections in Amazon Bedrock. Our investment in safe, transparent, and responsible generative AI includes collaboration with the global community and policymakers as we encouraged and supported both the White House Voluntary AI commitments and AI Safety Summit in the UK. And we continue to work hand-in-hand with customers to operationalize responsible AI with purpose-built tools like Amazon SageMaker Clarify, ML Governance with Amazon SageMaker, and more.

Introducing new responsible AI innovation

As generative AI scales to new industries, organizations, and use cases, this growth must be accompanied by a sustained investment in responsible FM development. Customers want their FMs to be built with safety, fairness, and security in mind, so that they can in turn deploy AI responsibly. At AWS re:Invent this year, we are excited to announce new capabilities to foster responsible generative AI innovation, with new built-in tools, customer protections, resources to enhance transparency, and tools to combat disinformation. We aim to provide customers the information they need to evaluate FMs against key responsible AI considerations, like toxicity and robustness, and introduce guardrails to apply safeguards based on customer use cases and responsible AI policies. At the same time, our customers want to be better informed about the safety, fairness, security, and other properties of AI services and FMs as they use them within their own organizations. We are excited to announce more resources to help customers better understand our AWS AI services and deliver the transparency they are asking for.

Implementing safeguards: Guardrails for Amazon Bedrock

Safety is a priority when it comes to introducing generative AI at scale. Organizations want to promote safe interactions between their customers and generative AI applications that avoid harmful or offensive language and align with company policies. The easiest way to do that is to put consistent safeguards in place across the whole organization so everyone can innovate safely. Yesterday we announced the preview of Guardrails for Amazon Bedrock—a new capability that makes it easy to implement application-specific safeguards based on customer use cases and responsible AI policies.

Guardrails drive consistency in how FMs on Amazon Bedrock respond to undesirable and harmful content within applications. Customers can apply guardrails to large language models on Amazon Bedrock as well as to fine-tuned models and in combination with Agents for Amazon Bedrock. Guardrails lets you specify topics to be avoided, and the service automatically detects and prevents queries and responses that fall into restricted categories. Customers can also configure content filter thresholds across categories including hate speech, insults, sexualized language, and violence to filter out harmful content to the desired level. For example, an online banking application can be set up to avoid providing investment advice and limit inappropriate content (such as hate speech, insults, and violence). In the near future, customers will also be able to redact personally identifiable information (PII) in user inputs and FMs’ responses, set profanity filters, and provide a list of custom words to block in interactions between users and FMs, improving compliance and further protecting users. With Guardrails, you can innovate faster with generative AI while maintaining protections and safeguards consistent with company policies.
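
To make the idea of per-category thresholds and blocked words concrete, here is a toy sketch. It is purely illustrative: check_message, the category names, the scores, and the thresholds are all hypothetical, and it does not represent the Guardrails for Amazon Bedrock API or how the service is implemented. It assumes some upstream classifier has already produced a score between 0 and 1 for each category.

```python
def check_message(text, scores, thresholds, blocked_words=()):
    """Return (allowed, reasons) for one message.

    scores: per-category scores from a hypothetical classifier.
    thresholds: per-category limits; a score at or above the limit blocks.
    """
    reasons = []
    for category, limit in thresholds.items():
        if scores.get(category, 0.0) >= limit:
            reasons.append(f"{category} score at or above {limit}")
    lowered = text.lower()
    for word in blocked_words:
        if word.lower() in lowered:
            reasons.append(f"blocked word: {word}")
    return (not reasons), reasons


# Hypothetical policy for a banking assistant.
policy = {"insults": 0.5, "violence": 0.5}
verdict = check_message(
    "Which stock should I buy?",
    scores={"insults": 0.1, "violence": 0.05},
    thresholds=policy,
    blocked_words=["stock"],
)
print(verdict)  # (False, ['blocked word: stock'])
```

Lowering a threshold makes the filter stricter for that category, which is the sense in which content can be filtered "to the desired level."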

Identifying the best FM for a specific use case: Model Evaluation in Amazon Bedrock

Today, organizations have a wide range of FM options to power their generative AI applications. To strike the right balance of accuracy and performance for their use case, organizations must efficiently compare models and find the best option based on key responsible AI and quality metrics that are important to them. To evaluate models, organizations must first spend days identifying benchmarks, setting up evaluation tools, and running assessments, all of which requires deep expertise in data science. Furthermore, these tests are not useful for evaluating subjective criteria (e.g., brand voice, relevance, and style) that require judgment through tedious, time-intensive, human-review workflows. The time, expertise, and resources required for these evaluations, for every new use case, make it difficult for organizations to evaluate models against responsible AI dimensions and make an informed choice about which model will provide the most accurate, safe experience for their customers.

Now available in preview, Model Evaluation on Amazon Bedrock helps customers evaluate, compare, and select the best FMs for their specific use case based on custom metrics, such as accuracy and safety, using either automatic or human evaluations. In the Amazon Bedrock console, customers choose the FMs they want to compare for a given task, such as question-answering or content summarization. For automatic evaluations, customers select predefined evaluation criteria (e.g., accuracy, robustness, and toxicity) and upload their own testing dataset or select from built-in, publicly available datasets. For subjective criteria or nuanced content requiring judgment, customers can easily set up human-based evaluation workflows with just a few clicks. These workflows leverage a customer’s in-house workteam, or use a managed workforce provided by AWS, to evaluate model responses. During human-based evaluations, customers define use case-specific metrics (e.g., relevance, style, and brand voice). Once customers finish the setup process, Amazon Bedrock runs evaluations and generates a report, so customers can easily understand how the model performed across key safety and accuracy criteria and select the best model for their use case.

This ability to evaluate models is not limited to Amazon Bedrock. Customers can also use model evaluation in Amazon SageMaker Clarify to easily evaluate, compare, and select the best FM option across key quality and responsibility metrics, such as accuracy, robustness, and toxicity, across all FMs.

Combating disinformation: Watermarking in Amazon Titan

Today, we announced Amazon Titan Image Generator in preview, which empowers customers to rapidly produce and enhance high-quality images at scale. We considered responsible AI during each stage of the model development process, including training data selection, building filtering capabilities to detect and remove inappropriate user inputs and model outputs, and improving demographic diversity of our model outputs. All Amazon Titan-generated images contain an invisible watermark by default, which is designed to help reduce the spread of disinformation by providing a discreet mechanism to identify AI-generated images. AWS is among the first model providers to widely release built-in invisible watermarks that are integrated into image outputs and are designed to be resistant to alterations.

Building trust: Standing behind our models and applications with indemnification

Building customer trust is core to AWS. We have been on a journey with our customers since our inception, and with the growth of generative AI, we remain committed to building innovative technology together. To enable customers to harness the power of our generative AI, they need to know they are protected. AWS offers copyright indemnity coverage for outputs of the following Amazon generative AI services: Amazon Titan Text Express, Amazon Titan Text Lite, Amazon Titan Embeddings, Amazon Titan Multimodal Embeddings, Amazon CodeWhisperer Professional, AWS HealthScribe, Amazon Lex, and Amazon Personalize. This means that customers who use the models responsibly are protected from third-party claims alleging copyright infringement by the outputs generated by those services (see Section 50.10 of the Service Terms). In addition, our standard IP indemnity for use of the services protects customers from third-party claims alleging IP infringement by the services and the data used to train them. To put it another way, if you use an Amazon generative AI service listed above and someone sues you for IP infringement, AWS will defend that lawsuit, which includes covering any judgment against you or settlement costs.

We stand behind our generative AI services and work to continually improve them. As AWS launches new services and generative AI continues to evolve, AWS will continue to relentlessly focus on earning and maintaining customer trust.

Enhancing transparency: AWS AI Service Card for Amazon Titan Text

We introduced AWS AI Service Cards at re:Invent 2022 as a transparency resource to help customers better understand our AWS AI services. AI Service Cards are a form of responsible AI documentation that provide customers with a single place to find information on the intended use cases and limitations, responsible AI design choices, and deployment and performance optimization best practices for our AI services. They are part of a comprehensive development process we undertake to build our services in a responsible way that addresses fairness, explainability, veracity and robustness, governance, transparency, privacy and security, safety, and controllability.

At re:Invent this year we are announcing a new AI Service Card for Amazon Titan Text to increase transparency in foundation models. We are also launching four new AI Service Cards including: Amazon Comprehend Detect PII, Amazon Transcribe Toxicity Detection, Amazon Rekognition Face Liveness, and AWS HealthScribe. You can explore each of these cards on the AWS website. As generative AI continues to grow and evolve, transparency on how technology is developed, tested, and used will be a vital component to earn the trust of organizations and their customers alike. At AWS, we are committed to continuing to bring transparency resources like AI Service Cards to the broader community—and to iterate and gather feedback on the best ways forward.

Investing in responsible AI across the entire generative AI lifecycle

We are excited about the new innovations announced at re:Invent this week that give our customers more tools, resources, and built-in protections to build and use generative AI safely. From model evaluation to guardrails to watermarking, customers can now bring generative AI to their organization faster, while mitigating risk. New protections for customers like IP indemnity coverage and new resources to enhance transparency like additional AI Service Cards are also key examples of our commitment to build trust across technology companies, policymakers, community groups, scientists, and more. We continue to make meaningful investments in responsible AI across the lifecycle of a foundation model to help our customers scale AI in a safe, secure, and responsible way.


About the Authors

Peter Hallinan leads initiatives in the science and practice of Responsible AI at AWS AI, alongside a team of responsible AI experts. He has deep expertise in AI (PhD, Harvard) and entrepreneurship (Blindsight, sold to Amazon). His volunteer activities have included serving as a consulting professor at the Stanford University School of Medicine, and as the president of the American Chamber of Commerce in Madagascar. When possible, he’s off in the mountains with his children: skiing, climbing, hiking, and rafting.

Vasi Philomin is currently the VP of Generative AI at AWS. He leads generative AI efforts including Amazon Bedrock, Amazon Titan, and Amazon CodeWhisperer.

Half-precision Inference Doubles On-Device Inference Performance

Posted by Marat Dukhan and Frank Barchard, Software Engineers

CPUs deliver the widest reach for ML inference and remain the default target for TensorFlow Lite. Consequently, improving CPU inference performance is a top priority, and we are excited to announce that we doubled floating-point inference performance in TensorFlow Lite’s XNNPack backend by enabling half-precision inference on ARM CPUs. This means that more AI-powered features may be deployed to older and lower-tier devices.

Traditionally, TensorFlow Lite supported two kinds of numerical computations in machine learning models: a) floating-point using the IEEE 754 single-precision (32-bit) format and b) quantized using low-precision integers. While single-precision floating-point numbers provide maximum flexibility and ease of use, they come at the cost of 4X overhead in storage and memory and exhibit a performance overhead compared to 8-bit integer computations. In contrast, half-precision (FP16) floating-point numbers offer an interesting alternative that balances ease of use and performance: the processor needs to transfer half as many bytes, and each vector operation produces twice as many elements. By virtue of this property, FP16 inference paves the way for a 2X speedup for floating-point models compared to the traditional FP32 way.
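The byte and vector-lane arithmetic behind this comparison can be checked with a short sketch. It assumes a 128-bit SIMD register (the width of an ARM NEON register) and a hypothetical model with one million parameters; the helper names are illustrative and are not part of TensorFlow Lite or XNNPack.

```python
import struct

# Size in bytes of one element in each numeric format, taken from Python's
# struct module: "f" is IEEE 754 single, "e" is IEEE 754 half, "b" is int8.
BYTES_PER_ELEMENT = {
    "fp32": struct.calcsize("<f"),
    "fp16": struct.calcsize("<e"),
    "int8": struct.calcsize("<b"),
}

# Assumption: a 128-bit SIMD register, as on ARM NEON.
VECTOR_BITS = 128


def lanes_per_vector(fmt: str) -> int:
    """Number of elements of the given format that fit in one vector register."""
    return VECTOR_BITS // (8 * BYTES_PER_ELEMENT[fmt])


def weight_bytes(num_params: int, fmt: str) -> int:
    """Bytes needed to store num_params weights in the given format."""
    return num_params * BYTES_PER_ELEMENT[fmt]


if __name__ == "__main__":
    num_params = 1_000_000  # hypothetical model size, for illustration only
    for fmt in ("fp32", "fp16", "int8"):
        print(fmt, weight_bytes(num_params, fmt), "bytes,",
              lanes_per_vector(fmt), "lanes per vector")
```

Halving the element size halves the bytes moved and doubles the lanes per vector operation, which is where the theoretical 2X ceiling for FP16 over FP32 comes from.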

For a long time FP16 inference on CPUs primarily remained a research topic, as the lack of hardware support for FP16 computations limited production use-cases. However, around 2017 new mobile chipsets started to include support for native FP16 computations, and by now most mobile phones, both high-end and low-end, include it. Building upon this broad availability, we are pleased to announce the general availability of half-precision inference in TensorFlow Lite and XNNPack.

Performance Improvements

Half-precision inference has already been battle-tested in production across Google Assistant, Google Meet, YouTube, and ML Kit, and demonstrated close to 2X speedups across a wide range of neural network architectures and mobile devices. Below, we present benchmarks on nine public models covering common computer vision tasks:

  1. MobileNet v2 image classification [download]
  2. MobileNet v3-Small image classification [download]
  3. DeepLab v3 segmentation [download]
  4. BlazeFace face detection [download]
  5. SSDLite 2D object detection [download]
  6. Objectron 3D object detection [download]
  7. Face Mesh landmarks [download]
  8. MediaPipe Hands landmarks [download]
  9. KNIFT local feature descriptor [download]

These models were benchmarked on 5 popular mobile devices, including recent and older devices (Pixel 3a, Pixel 5a, Pixel 7, Galaxy M12 and Galaxy S22). The average speedup is shown below.

Graph of Average speedup for fp16 vs fp32
Single-threaded inference speedup with half-precision (FP16) inference compared to single-precision (FP32) across 5 mobile devices. Higher numbers are better.

The same models were also benchmarked on three laptop computers (MacBook Air M1, Surface Pro X and Surface Pro 9).

Single-threaded inference speedup with half-precision (FP16) inference compared to single-precision (FP32) across 3 laptop computers. Higher numbers are better.

Currently, the FP16-capable hardware supported in XNNPack is limited to ARM and ARM64 devices with the ARMv8.2 FP16 arithmetic extension, which includes Android phones starting with Pixel 3, Galaxy S9 (Snapdragon SoC), Galaxy S10 (Exynos SoC), iOS devices with A11 or newer SoCs, all Apple Silicon Macs, and Windows ARM64 laptops based on the Snapdragon 850 SoC or newer.

How Can I Use It?

To benefit from the half-precision inference in XNNPack, the user must provide a floating-point (FP32) model with FP16 weights and special “reduced_precision_support” metadata to indicate model compatibility with FP16 inference. The metadata can be added during model conversion using the _experimental_supported_accumulation_type attribute of the tf.lite.TargetSpec object:

...
converter.target_spec.supported_types = [tf.float16]
converter.target_spec._experimental_supported_accumulation_type = tf.dtypes.float16

When a compatible model is delegated to XNNPack on hardware with native support for FP16 computations, XNNPack will transparently replace FP32 operators with their FP16 equivalents, and insert additional operators to convert model inputs from FP32 to FP16 and convert model outputs back from FP16 to FP32. If the hardware is not capable of FP16 arithmetic, XNNPack will perform model inference with FP32 calculations. Therefore, a single model can be transparently deployed on both recent and legacy devices.
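The decision described above can be illustrated with a toy sketch. This is not XNNPack's implementation or API: the function name, the string labels, and the list-of-operators representation are all hypothetical, chosen only to show the fallback rule and where the conversion operators are inserted.

```python
from typing import List


def plan_inference(ops: List[str],
                   model_allows_fp16: bool,
                   hardware_has_fp16: bool) -> List[str]:
    """Return a toy execution plan for a linear chain of operators.

    model_allows_fp16: the model carries the FP16 compatibility metadata.
    hardware_has_fp16: the CPU supports native FP16 arithmetic.
    """
    if not (model_allows_fp16 and hardware_has_fp16):
        # Fallback: run every operator in FP32, no conversions needed.
        return [f"{op}[fp32]" for op in ops]
    # FP16 path: convert inputs down, run FP16 operators, convert outputs up.
    return (["convert[fp32->fp16]"]
            + [f"{op}[fp16]" for op in ops]
            + ["convert[fp16->fp32]"])


if __name__ == "__main__":
    chain = ["conv2d", "relu", "fully_connected"]  # hypothetical operator chain
    print(plan_inference(chain, model_allows_fp16=True, hardware_has_fp16=True))
    print(plan_inference(chain, model_allows_fp16=True, hardware_has_fp16=False))
```

Because the model's inputs and outputs stay FP32 in both plans, application code that feeds and reads tensors does not need to change between devices.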

Additionally, the XNNPack delegate provides an option to force FP16 inference regardless of the model metadata. This option is intended for development workflows, and in particular for testing end-to-end accuracy of the model when FP16 inference is used. In addition to devices with native FP16 arithmetic support, forced FP16 inference is supported on x86/x86-64 devices with the AVX2 extension in emulation mode: all elementary floating-point operations are computed in FP32, then converted to FP16 and back to FP32. Note that such simulation is slow and not a bit-exact equivalent of native FP16 inference, but it simulates the effects of the restricted mantissa precision and exponent range of native FP16 arithmetic. To force FP16 inference, either build TensorFlow Lite with the --define xnnpack_force_float_precision=fp16 Bazel option, or apply the XNNPack delegate explicitly and add the TFLITE_XNNPACK_DELEGATE_FLAG_FORCE_FP16 flag to the TfLiteXNNPackDelegateOptions.flags bitmask passed into the TfLiteXNNPackDelegateCreate call:

TfLiteXNNPackDelegateOptions xnnpack_options =
    TfLiteXNNPackDelegateOptionsDefault();
...
xnnpack_options.flags |= TFLITE_XNNPACK_DELEGATE_FLAG_FORCE_FP16;
TfLiteDelegate* xnnpack_delegate =
    TfLiteXNNPackDelegateCreate(&xnnpack_options);
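The emulation mode mentioned above (compute in FP32, round to FP16, convert back) can be imitated in pure Python to get a feel for the precision and range effects. This is a minimal sketch using the standard struct module's half-precision format; it is not XNNPack code, the helper names are hypothetical, and mapping out-of-range values to infinity is an assumption made here to mimic IEEE 754 overflow.

```python
import math
import operator
import struct
from typing import Callable


def _round_trip(x: float, fmt: str) -> float:
    """Round x to the given struct float format and return it as a Python float."""
    try:
        return struct.unpack(fmt, struct.pack(fmt, x))[0]
    except OverflowError:
        # struct refuses values beyond the format's range; treat that as
        # overflow to infinity (an assumption of this sketch).
        return math.copysign(math.inf, x)


def round_to_fp32(x: float) -> float:
    return _round_trip(x, "<f")


def round_to_fp16(x: float) -> float:
    return _round_trip(x, "<e")


def emulated_fp16_op(op: Callable[[float, float], float],
                     a: float, b: float) -> float:
    """Compute op in (approximately) FP32, then round the result to FP16."""
    fp32_result = round_to_fp32(op(round_to_fp32(a), round_to_fp32(b)))
    return round_to_fp16(fp32_result)


if __name__ == "__main__":
    # Restricted mantissa: 0.1 is stored with far fewer significant bits.
    print(round_to_fp16(0.1))
    # Restricted mantissa: 2049 is not representable in FP16.
    print(emulated_fp16_op(operator.add, 2048.0, 1.0))
    # Restricted exponent range: 90000 exceeds the largest finite FP16 value.
    print(emulated_fp16_op(operator.mul, 300.0, 300.0))
```

As with the emulation mode itself, this only approximates native FP16 behavior; it is useful for spotting values in a model that would lose precision or overflow, not for reproducing results bit for bit.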

XNNPack provides full feature parity between FP32 and FP16 operators: all operators that are supported for FP32 inference are also supported for FP16 inference, and vice versa. In particular, sparse inference operators are supported for FP16 inference on ARM processors. Therefore, users can combine the performance benefits of sparse and FP16 inference in the same model.

Future Work

In addition to most ARM and ARM64 processors, the most recent Intel processors, code-named Sapphire Rapids, support native FP16 arithmetic via the AVX512-FP16 instruction set, and the recently announced AVX10 instruction set promises to make this capability widely available on the x86 platform. We plan to optimize XNNPack for these instruction sets in a future release.

Acknowledgements

We would like to thank Alan Kelly, Zhi An Ng, Artsiom Ablavatski, Sachin Joglekar, T.J. Alumbaugh, Andrei Kulik, Jared Duke, and Matthias Grundmann for contributions towards half-precision inference in TensorFlow Lite and XNNPack.

Introducing the AWS Generative AI Innovation Center’s Custom Model Program for Anthropic Claude

Since launching in June 2023, the AWS Generative AI Innovation Center team of strategists, data scientists, machine learning (ML) engineers, and solutions architects have worked with hundreds of customers worldwide, and helped them ideate, prioritize, and build bespoke solutions that harness the power of generative AI. Customers worked closely with us to prioritize use cases, select the right foundation models (FMs), incorporate responsible AI principles, develop proofs of concept, optimize solutions, and launch them at scale. Today, we are excited to announce the AWS Generative AI Innovation Center Custom Model Program for Anthropic Claude. Starting in Q1 2024, customers can engage with researchers and ML scientists from the Generative AI Innovation Center to fine-tune Anthropic Claude models securely with their own proprietary data.

For most use cases, customers can use the high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon, all available in Amazon Bedrock via a single API. Techniques such as prompt engineering, few-shot learning, and RAG can also help customize model responses for your business context and specific tasks without the need for further training. However, some applications will benefit from deeper customization through model fine-tuning. Fine-tuning refers to taking a general-purpose FM and adapting it to improve performance on specific tasks or domains using a relatively small but high-quality labeled dataset. Fine-tuning typically results in better performance on specific tasks compared to the base FM. This additional task-specific training helps the model get better at the applications you care about. The resulting models are also unique to the fine-tuning data used, enabling enterprises to develop differentiated solutions based on their private company data sources.

Fine-tuning, aligning, and optimizing Anthropic Claude models for complex tasks and domains requires deep AI expertise. Starting in Q1 2024, customers can engage with a team of experts from the AWS Generative AI Innovation Center and fine-tune Claude models with their proprietary data sources. Our experts will help you scope requirements for model customization, define evaluation criteria, and work with your proprietary data for fine-tuning. We will collaborate with the Anthropic science team and align the fine-tuned models to meet your needs. You can privately access the fine-tuned models directly through Amazon Bedrock, enabling the same API integrations you use today without the need to manage deployments or infrastructure.

To learn more about the program, contact your AWS account team.


About the authors

Sri Elaprolu currently serves as the Head of AWS Generative AI Innovation Center. He leads a large team of machine learning scientists, engineers, and strategists that work with global enterprises and public sector organizations to address challenging problems and opportunities using generative AI. Previously, he led science teams that helped hundreds of AWS customers, including the NFL, Cerner, NASA, and the U.S. Dept. of Defense, leverage AWS AI/ML to drive business and mission outcomes.
