Moonwalk: Advancing Gait-Based User Recognition on Wearable Devices with Metric Learning

Personal devices have adopted diverse authentication methods, including biometric recognition and passcodes. In contrast, headphones have limited input mechanisms, depending solely on the authentication of connected devices. We present Moonwalk, a novel method for passive user recognition utilizing the built-in headphone accelerometer. Our approach centers on gait recognition, enabling users to establish their identity simply by walking for a brief interval, despite the sensor’s placement away from the feet. We employ self-supervised metric learning to train a model that… (Apple Machine Learning Research)

Vision-Based Hand Gesture Customization from a Single Demonstration

Hand gesture recognition is becoming a more prevalent mode of human-computer interaction, especially as cameras proliferate across everyday devices. Despite continued progress in this field, gesture customization is often underexplored. Customization is crucial since it enables users to define and demonstrate gestures that are more natural, memorable, and accessible. However, customization requires efficient usage of user-provided data. We introduce a method that enables users to easily design bespoke gestures with a monocular camera from one demonstration. We employ transformers and… (Apple Machine Learning Research)

Merge Vision Foundation Models via Multi-Task Distillation

As the repository of publicly available pre-trained vision foundation models (VFMs) — such as CLIP, DINOv2, and SAM — grows, users face challenges in storage, memory, and computational efficiency when deploying multiple models concurrently. To address these concerns, we introduce a unique approach that merges the capabilities of multiple VFMs into a single efficient multi-task model. Our method, termed “joint distillation,” seamlessly integrates teacher-student learning with self-distillation, operating with just unlabeled image data and drastically cutting down on computational requirements… (Apple Machine Learning Research)

Health-specific embedding tools for dermatology and pathology

There’s a worldwide shortage of access to medical imaging expert interpretation across specialties including radiology, dermatology and pathology. Machine learning (ML) technology can help ease this burden by powering tools that enable doctors to interpret these images more accurately and efficiently. However, the development and implementation of such ML tools are often limited by the availability of high-quality data, ML expertise, and computational resources.

One way to catalyze the use of ML for medical imaging is via domain-specific models that utilize deep learning (DL) to capture the information in medical images as compressed numerical vectors (called embeddings). These embeddings represent a type of pre-learned understanding of the important features in an image. Identifying patterns in the embeddings reduces the amount of data, expertise, and compute needed to train performant models as compared to working with high-dimensional data, such as images, directly. Indeed, these embeddings can be used to perform a variety of downstream tasks within the specialized domain (see animated graphic below). This framework of leveraging pre-learned understanding to solve related tasks is similar to that of a seasoned guitar player quickly learning a new song by ear. Because the guitar player has already built up a foundation of skill and understanding, they can quickly pick up the patterns and groove of a new song.

Path Foundation is used to convert a small dataset of (image, label) pairs into (embedding, label) pairs. These pairs can then be used to train a task-specific classifier using a linear probe (i.e., a lightweight linear classifier), as represented in this graphic, or other types of models using the embeddings as input.

Once the linear probe is trained, it can be used to make predictions on embeddings from new images. These predictions can be compared to ground truth information in order to evaluate the linear probe’s performance.
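As a rough illustration of this workflow (not the exact tooling used in this work), the short Python sketch below trains a linear probe with scikit-learn on precomputed embeddings and evaluates it with AUROC; the .npy file names are placeholders for embeddings and labels exported from one of the embedding tools.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Each row is one image embedding produced by an embedding tool; labels are
# binary (e.g., tumor vs. no tumor). File names are placeholders.
train_X = np.load("train_embeddings.npy")   # shape: (n_train, embedding_dim)
train_y = np.load("train_labels.npy")       # shape: (n_train,)
test_X = np.load("test_embeddings.npy")
test_y = np.load("test_labels.npy")

# The "linear probe": a lightweight linear classifier on frozen embeddings.
probe = LogisticRegression(max_iter=1000)
probe.fit(train_X, train_y)

# Compare predictions on new embeddings against ground truth, e.g., with AUROC.
scores = probe.predict_proba(test_X)[:, 1]
print("AUROC:", roc_auc_score(test_y, scores))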

In order to make this type of embedding model available and drive further development of ML tools in medical imaging, we are excited to release two domain-specific tools for research use: Derm Foundation and Path Foundation. This follows on the strong response we’ve already received from researchers using the CXR Foundation embedding tool for chest radiographs and represents a portion of our expanding research offerings across multiple medical-specialized modalities. These embedding tools take an image as input and produce a numerical vector (the embedding) that is specialized to the domains of dermatology and digital pathology images, respectively. By running a dataset of chest X-ray, dermatology, or pathology images through the respective embedding tool, researchers can obtain embeddings for their own images, and use these embeddings to quickly develop new models for their applications.

Path Foundation

In “Domain-specific optimization and diverse evaluation of self-supervised models for histopathology”, we showed that self-supervised learning (SSL) models for pathology images outperform traditional pre-training approaches and enable efficient training of classifiers for downstream tasks. This effort focused on hematoxylin and eosin (H&E) stained slides, the principal tissue stain in diagnostic pathology that enables pathologists to visualize cellular features under a microscope. The performance of linear classifiers trained using the output of the SSL models matched that of prior DL models trained on orders of magnitude more labeled data.

Due to substantial differences between digital pathology images and “natural image” photos, this work involved several pathology-specific optimizations during model training. One key element is that whole-slide images (WSIs) in pathology can be 100,000 pixels across (thousands of times larger than typical smartphone photos) and are analyzed by experts at multiple magnifications (zoom levels). As such, the WSIs are typically broken down into smaller tiles or patches for computer vision and DL applications. The resulting images are information dense with cells or tissue structures distributed throughout the frame instead of having distinct semantic objects or foreground vs. background variations, thus creating unique challenges for robust SSL and feature extraction. Additionally, physical (e.g., cutting) and chemical (e.g., fixing and staining) processes used to prepare the samples can influence image appearance dramatically.

Taking these important aspects into consideration, pathology-specific SSL optimizations included helping the model learn stain-agnostic features, generalizing the model to patches from multiple magnifications, augmenting the data to mimic scanning and image post-processing, and custom data balancing to improve input heterogeneity for SSL training. These approaches were extensively evaluated on a broad benchmark spanning 17 different tissue types across 12 different tasks.

Path Foundation, which uses the vision transformer (ViT-S/16) architecture, was selected as the best-performing model from the optimization and evaluation process described above (and illustrated in the figure below). This model thus provides an important balance between performance and model size, enabling valuable and scalable use in generating embeddings over the many individual image patches of large pathology WSIs.

SSL training with pathology-specific optimizations for Path Foundation.

The value of domain-specific image representations can also be seen in the figure below, which shows the linear probing performance improvement of Path Foundation (as measured by AUROC) compared to traditional pre-training on natural images (ImageNet-21k). This includes evaluation for tasks such as metastatic breast cancer detection in lymph nodes, prostate cancer grading, and breast cancer grading, among others.

Path Foundation embeddings significantly outperform traditional ImageNet embeddings as evaluated by linear probing across multiple evaluation tasks in histopathology.

Derm Foundation

Derm Foundation is an embedding tool derived from our research in applying DL to interpret images of dermatology conditions, and it incorporates our recent work on improving generalization to new datasets. Due to its dermatology-specific pre-training, it has a latent understanding of features present in images of skin conditions and can be used to quickly develop models to classify skin conditions. The model underlying the API is a BiT ResNet-101×3 trained in two stages. The first pre-training stage uses contrastive learning, similar to ConVIRT, to train on a large number of image-text pairs from the internet. In the second stage, the image component of this pre-trained model is then fine-tuned for condition classification using clinical datasets, such as those from teledermatology services.

Unlike histopathology images, dermatology images more closely resemble the real-world images used to train many of today’s computer vision models. However, for specialized dermatology tasks, creating a high-quality model may still require a large dataset. With Derm Foundation, researchers can use their own smaller dataset to retrieve domain-specific embeddings, and use those to build smaller models (e.g., linear classifiers or other small non-linear models) that enable them to validate their research or product ideas. To evaluate this approach, we trained models on a downstream task using teledermatology data. Model training involved varying dataset sizes (12.5%, 25%, 50%, 100%) to compare embedding-based linear classifiers against fine-tuning.

The modeling variants considered were:

  • A linear classifier on frozen embeddings from BiT-M (a standard pre-trained image model)
  • Fine-tuned version of BiT-M with an extra dense layer for the downstream task
  • A linear classifier on frozen embeddings from the Derm Foundation API
  • Fine-tuned version of the model underlying the Derm Foundation API with an extra layer for the downstream task

We found that models built on top of the Derm Foundation embeddings for dermatology-related tasks achieved significantly higher quality than those built on embeddings from, or fine-tuned from, BiT-M. This advantage was most pronounced for smaller training dataset sizes.
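To make this comparison concrete, the following hypothetical Python sketch runs the dataset-size sweep for the two linear-probe variants (the fine-tuning variants are omitted); the random arrays stand in for real BiT-M and Derm Foundation embeddings, so the printed numbers are meaningless and only the shape of the experiment is shown.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_train, n_test, dim = 2000, 500, 512  # placeholder sizes
train_y = rng.integers(0, 2, n_train)
test_y = rng.integers(0, 2, n_test)
embeddings = {  # placeholder embedding matrices
    "BiT-M": (rng.normal(size=(n_train, dim)), rng.normal(size=(n_test, dim))),
    "Derm Foundation": (rng.normal(size=(n_train, dim)), rng.normal(size=(n_test, dim))),
}

for fraction in (0.125, 0.25, 0.5, 1.0):
    idx = rng.choice(n_train, size=int(fraction * n_train), replace=False)
    for name, (train_X, test_X) in embeddings.items():
        probe = LogisticRegression(max_iter=1000).fit(train_X[idx], train_y[idx])
        auroc = roc_auc_score(test_y, probe.predict_proba(test_X)[:, 1])
        print(f"{name:>16} @ {fraction:.1%} of training data: AUROC = {auroc:.3f}")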

These results demonstrate that the Derm Foundation tool can serve as a useful starting point to accelerate skin-related modeling tasks. We aim to enable other researchers to build on the underlying features and representations of dermatology that the model has learned.

However, there are limitations with this analysis. We’re still exploring how well these embeddings generalize across task types, patient populations, and image settings. Downstream models built using Derm Foundation still require careful evaluation to understand their expected performance in the intended setting.

Access Path and Derm Foundation

We envision that the Derm Foundation and Path Foundation embedding tools will enable a range of use cases, including efficient development of models for diagnostic tasks, quality assurance and pre-analytical workflow improvements, image indexing and curation, and biomarker discovery and validation. We are releasing both tools to the research community so they can explore the utility of the embeddings for their own dermatology and pathology data.

To get access, please review and agree to each tool’s terms of service via its Google Form.

After gaining access to each tool, you can use the API to retrieve embeddings from dermatology images or digital pathology images stored in Google Cloud. Approved users who are just curious to see the model and embeddings in action can use the provided example Colab notebooks to train models using public data for classifying six common skin conditions or identifying tumors in histopathology patches. We look forward to seeing the range of use-cases these tools can unlock.

Acknowledgements

We would like to thank the many collaborators who helped make this work possible, including Yun Liu, Can Kirmizi, Fereshteh Mahvar, Bram Sterling, Arman Tajback, Kenneth Philbrik, Arnav Agharwal, Aurora Cheung, Andrew Sellergren, Boris Babenko, Basil Mustafa, Jan Freyberg, Terry Spitz, Yuan Liu, Pinal Bavishi, Ayush Jain, Amit Talreja, Rajeev Rikhye, Abbi Ward, Jeremy Lai, Faruk Ahmed, Supriya Vijay, Tiam Jaroensri, Jessica Loo, Saurabh Vyawahare, Saloni Agarwal, Ellery Wulczyn, Jonathan Krause, Fayaz Jamil, Tom Small, Annisah Um’rani, Lauren Winer, Sami Lachgar, Yossi Matias, Greg Corrado, and Dale Webster.

Read More

LLMs Land on Laptops: NVIDIA, HP CEOs Celebrate AI PCs

2024 will be the year generative AI gets personal, the CEOs of NVIDIA and HP said today in a fireside chat, unveiling new laptops that can build, test and run large language models.

“This is a renaissance of the personal computer,” said NVIDIA founder and CEO Jensen Huang at HP Amplify, a gathering in Las Vegas of about 1,500 resellers and distributors. “The work of creators, designers and data scientists is going to be revolutionized by these new workstations.”

“AI is the biggest thing to come to the PC in decades,” said HP’s Enrique Lores, in the runup to the announcement of what his company billed as “the industry’s largest portfolio of AI PCs and workstations.”

Greater Speed and Security

Compared to running their AI work in the cloud, the new systems will provide increased speed and security while reducing costs and energy, Lores said in a keynote at the event.

New HP ZBooks provide a portfolio of mobile AI workstations powered by a full range of NVIDIA RTX Ada Generation GPUs.

Entry-level systems with the NVIDIA RTX 500 Ada Generation Laptop GPU let users run generative AI apps and tools wherever they go.

High-end models pack the RTX 5000 to deliver up to 682 TOPS, so they can create and run LLMs locally, using retrieval-augmented generation (RAG) to connect to their content for results that are both personalized and private.

Access to Accelerated Software

The new workstations can tap into NVIDIA’s full-stack AI platform, including software that speeds the data science at the foundation of generative AI.

The systems’ Z by HP AI Studio platform — developed in collaboration with NVIDIA — links to NVIDIA NGC, a catalog of GPU-accelerated software for AI and data science. NGC includes NVIDIA NeMo, a framework to build, customize and deploy generative AI models.

In addition, HP and NVIDIA announced that NVIDIA CUDA-X libraries will be integrated with the systems to turbocharge the data preparation and processing that’s fundamental for generative AI.

Speedups for Data Scientists

The libraries include NVIDIA RAPIDS cuDF, which accelerates pandas, software used by nearly 10 million data scientists.

“It used to take them hours and sometimes days to process data that now they can do in minutes,” Huang said.

“This pandas library is insanely complex,” he added, noting NVIDIA engineers worked for more than five years on reformulating the code so it can be accelerated with GPUs.
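For readers who want to try this on their own data, RAPIDS cuDF ships a pandas accelerator mode that is enabled before importing pandas; the sketch below assumes the cudf package is installed and a compatible NVIDIA GPU is available, and the CSV path and column names are placeholders.

# Enable the RAPIDS cuDF accelerator before importing pandas; existing pandas
# code then runs on the GPU where supported and falls back to the CPU otherwise.
import cudf.pandas
cudf.pandas.install()

import pandas as pd  # unchanged pandas code from here on

df = pd.read_csv("transactions.csv")  # placeholder dataset
summary = df.groupby("customer_id")["amount"].agg(["count", "sum", "mean"])
print(summary.head())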

Entering a New Era

In tandem with the new systems, HP announced a partner training program developed in collaboration with NVIDIA. It will equip computer vendors to advise customers on the right AI products and solutions to meet their needs.

Such programs pave the way for an industry that’s entering an era where AI lets software write software.

“We’ve reinvented the computer. We’ve reinvented how software is written, and now we have to reinvent how software is used,” said Huang. “Large language models, connected into other LLMs, will help solve application problems — that’s the future.”

Read More

Automate the process to change image backgrounds using Amazon Bedrock and AWS Step Functions

Many customers, including those in creative advertising, media and entertainment, ecommerce, and fashion, often need to change the background in a large number of images. Typically, this involves manually editing each image with photo software. This can take a lot of effort, especially for large batches of images. However, Amazon Bedrock and AWS Step Functions make it straightforward to automate this process at scale.

Amazon Bedrock offers the generative AI foundation model Amazon Titan Image Generator G1, which can automatically change the background of an image using a technique called outpainting. Step Functions allows you to create an automated workflow that seamlessly connects with Amazon Bedrock and other AWS services. Together, Amazon Bedrock and Step Functions streamline the entire process of automatically changing backgrounds across multiple images.

This post introduces a solution that simplifies the process of changing backgrounds in multiple images. By harnessing the capabilities of generative AI with Amazon Bedrock and the Titan Image Generator G1 model, combined with Step Functions, this solution efficiently generates images with the desired background. This post provides insight into the inner workings of the solution and helps you understand the design choices made, so you can build your own custom solution.

See the GitHub repository for detailed instructions on deploying this solution.

Solution overview

Let’s look at how the solution works at a high level before diving deeper into specific elements and the AWS services used. The following diagram provides a simplified view of the solution architecture and highlights the key elements.

Solution Architecture

The workflow consists of the following steps:

  1. A user uploads multiple images into an Amazon Simple Storage Service (Amazon S3) bucket via a Streamlit web application.
  2. The Streamlit web application calls an Amazon API Gateway REST API endpoint integrated with the Amazon Rekognition DetectLabels API, which detects labels for each image.
  3. Upon submission, the Streamlit web application updates an Amazon DynamoDB table with image details.
  4. The DynamoDB update triggers an AWS Lambda function, which starts a Step Functions workflow.
  5. The Step Functions workflow runs the following steps for each image:
    5.1 Constructs a request payload for the Amazon Bedrock InvokeModel API.
    5.2 Invokes the Amazon Bedrock InvokeModel API action.
    5.3 Parses an image from the response and saves it to an S3 location.
    5.4 Updates the image status in a DynamoDB table.
  6. The Step Functions workflow invokes a Lambda function to generate a status report.
  7. The workflow sends an email using Amazon Simple Notification Service (Amazon SNS).

As shown in the following screenshot, the Streamlit web application allows you to upload images and enter text prompts to specify desired backgrounds, negative prompts, and outpainting mode for image generation. You can also view and remove unwanted labels associated with each uploaded image that you don’t want to keep in the final generated images.

Streamlit Web Application

In this example, the prompt for the background is “London city background.” The automation process generates new images based on the original uploaded images with London as the background.

Generated Images

Streamlit web application and image uploads

A Streamlit web application serves as the frontend for this solution. To protect the application from unauthorized access, it integrates with an Amazon Cognito user pool. API Gateway uses an Amazon Cognito authorizer to authenticate requests. The web application completes the following steps:

  1. For each selected image, it retrieves labels via Amazon Rekognition using an API Gateway REST API endpoint.
  2. Upon submission, the application uploads images to an S3 bucket.
  3. The application updates a DynamoDB table with relevant parameters, image names, and associated labels for each image using another API Gateway REST API endpoint.

Image processing workflow

When the DynamoDB table is updated, DynamoDB Streams triggers a Lambda function to start a new Step Functions workflow. The following is a sample request for the workflow:

{
  "Id": "621fa85a-38bb-4d98-a656-93bbbcf5477f",
  "S3Bucket": "<Image Bucket>",
  "InputS3Prefix": "image-files/<year>/<month>/<day>/<timestamp>",
  "OutputS3Prefix": "generated-image-files/<year>/<month>/<day>/<timestamp>",
  "StatusS3Prefix": "status-report-files/<year>/<month>/<day>/<timestamp>",
  "Prompt": "london city background",
  "NegativePrompt": "low quality, low resolution",
  "Mode": "PRECISE",
  "Images": [
    {
      "ImageName": "bus.png",
      "Labels": "Bus, Person"
    },
    {
      "ImageName": "cop.png",
      "Labels": "Person, Adult, Male, Man, Helmet, Jacket"
    },
    {
      "ImageName": "iguana-2.png",
      "Labels": "Lizard”
    },
    {
      "ImageName": "dog.png",
      "Labels": "Dog"
    }
  ]
}
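As a rough sketch (not the code from the solution's repository), the Lambda function that reacts to the DynamoDB stream and starts the Step Functions execution could look like the following; the STATE_MACHINE_ARN environment variable and the attribute names are assumptions for illustration.

import json
import os

import boto3

sfn = boto3.client("stepfunctions")

def lambda_handler(event, context):
    # Triggered by DynamoDB Streams; start one workflow per inserted item.
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        item = record["dynamodb"]["NewImage"]
        # Deserialize the DynamoDB-typed attributes into the workflow input
        # (only a few fields shown; the full payload mirrors the sample above).
        workflow_input = {
            "Id": item["Id"]["S"],
            "S3Bucket": item["S3Bucket"]["S"],
            "Prompt": item["Prompt"]["S"],
        }
        sfn.start_execution(
            stateMachineArn=os.environ["STATE_MACHINE_ARN"],
            name=workflow_input["Id"],
            input=json.dumps(workflow_input),
        )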

The Step Functions workflow subsequently performs the following three steps:

  1. Replace the background for all images.
  2. Generate a status report.
  3. Send an email via Amazon SNS.

The following screenshot illustrates the Step Functions workflow.

AWS Step Functions Workflow

Let’s look at each step in more detail.

Replace background for all images

Step Functions uses a Distributed Map to process each image in parallel child workflows. The Distributed Map allows high-concurrency processing, and each child workflow has its own run history, separate from that of the parent workflow.

Step Functions uses the optimized InvokeModel API action for Amazon Bedrock. This API accepts requests and responses of up to 25 MB. However, Step Functions has a 256 KB limit on state payload input and output, so to support larger images, the solution uses an S3 bucket from which the InvokeModel API reads its input and to which it writes its result. The following is the configuration for the InvokeModel API for Amazon Bedrock integration:

{
    "ModelId": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-image-generator-v1",
    "ContentType": "application/json",
    "Input": {
        "S3Uri": "s3://<Image Bucket>/image-files/<year>/<month>/<day>/<timestamp>/<Image name>.json"
    },
    "Output": {
        "S3Uri": "s3://<Image Bucket>/generated-image-files/<year>/<month>/<day>/<timestamp>/<Image name>.json"
    }
}

The Input S3Uri parameter specifies the source location to retrieve the input data. The Output S3Uri parameter specifies the destination to write the API response.

A Lambda function saves the request payload as a JSON file in the specified Input S3Uri location. The InvokeModel API uses this input payload to generate images with the specified background:

{
    "taskType": "OUTPAINTING",
    "outPaintingParams": {
        "text": "london city background",
        "negativeText": "low quality, low resolution",
        "image": "<base64-encoded string>",
        "maskPrompt": "Bus",
        "maskImage": "<base64-encoded string>",
        "outPaintingMode": "DEFAULT | PRECISE"
    },
    "imageGenerationConfig": {
        "numberOfImages": 1,
        "quality": "premium",
        "height": 1024,
        "width": 1024,
        "cfgScale": 8.0
    }
}

The Titan Image Generator G1 model supports the following parameters for image generation:

  • taskType – Specifies the type of task; OUTPAINTING is used to replace the background of the image.
  • text – A text prompt to define the background.
  • negativeText – A text prompt to define what not to include in the image.
  • maskPrompt – A text prompt that defines the mask. It corresponds to labels that you want to retain in the final generated images.
  • maskImage – The JPEG or PNG image encoded in base64.
  • outPaintingMode – Specifies whether to allow modification of the pixels inside the mask or not. DEFAULT allows modification of the image inside the mask in order to keep it consistent with the reconstructed background. PRECISE prevents modification of the image inside the mask.
  • numberOfImages – The number of images to generate.
  • quality – The quality of the generated images: standard or premium.
  • cfgScale – Specifies how strongly the generated image should adhere to the prompt.
  • height – The height of the image in pixels.
  • width – The width of the image in pixels.

The Amazon Bedrock InvokeModel API generates a response with an encoded image in the Output S3Uri location. Another Lambda function parses the image from the response, decodes it from base64, and saves the image file in the following location: s3://<Image Bucket>/generated-image-files/<year>/<month>/<day>/<timestamp>/.
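A simplified sketch of that parsing step is shown below (not the solution's actual Lambda code); the event fields carrying the bucket and response key are assumptions, and the Titan Image Generator response is assumed to return the generated image as a base64 string in its images list.

import base64
import json

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # The workflow state is assumed to pass the bucket and the key of the
    # InvokeModel response JSON written to the Output S3Uri location.
    bucket = event["S3Bucket"]
    response_key = event["OutputS3Key"]  # e.g., generated-image-files/.../bus.png.json
    body = s3.get_object(Bucket=bucket, Key=response_key)["Body"].read()
    response = json.loads(body)

    # Decode the base64 image and write the PNG next to the response JSON.
    image_bytes = base64.b64decode(response["images"][0])
    image_key = response_key.removesuffix(".json")
    s3.put_object(Bucket=bucket, Key=image_key, Body=image_bytes,
                  ContentType="image/png")
    return {"ImageName": image_key.rsplit("/", 1)[-1], "Status": "Succeeded"}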

Finally, a child workflow updates a DynamoDB table with image generation status, marking it as either Succeeded or Failed, and including details such as ImageName, Cause, Error, and Status.

Generate a status report

After the image generation process, a Lambda function retrieves the status details from DynamoDB. It dynamically compiles these details into a comprehensive status report in JSON format and saves it as a JSON file in the following location: s3://<Image Bucket>/status-report-files/<year>/<month>/<day>/<timestamp>/. The ITOps team can integrate this report with their existing notification system to track whether image processing completed successfully. For business users, you can expand this further to generate a report in CSV format.
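A hypothetical sketch of such a report-generating Lambda follows; the table name, key schema, and event fields are illustrative assumptions rather than the solution's actual definitions.

import json

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Hypothetical table keyed by the workflow Id, with one item per image.
    table = dynamodb.Table("ImageGenerationStatus")
    items = table.query(KeyConditionExpression=Key("Id").eq(event["Id"]))["Items"]

    report = {
        "Id": event["Id"],
        "Images": [{"ImageName": i["ImageName"], "Status": i["Status"]} for i in items],
    }
    key = f"{event['StatusS3Prefix']}/status-report.json"
    s3.put_object(Bucket=event["S3Bucket"], Key=key,
                  Body=json.dumps(report, indent=2))
    return {"StatusReportKey": key}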

Send an email via Amazon SNS

Step Functions invokes an Amazon SNS API action to send an email. The email contains details including the S3 location of the status report and the final image files. The following is a sample notification email.

Notification Email

Conclusion

In this post, we provided an overview of a sample solution demonstrating the automation of changing image backgrounds at scale using Amazon Bedrock and Step Functions. We also explained each element of the solution in detail. By using the Step Functions optimized integration with Amazon Bedrock, Distributed Map, and the Titan Image Generator G1 model, the solution efficiently replaces the backgrounds of images in parallel, enhancing productivity and scalability.

To deploy the solution, refer to the instructions in the GitHub repository.

About the Author

Chetan Makvana is a Senior Solutions Architect with Amazon Web Services. He works with AWS partners and customers to provide them with architectural guidance for building scalable architectures and implementing strategies to drive adoption of AWS services. He is a technology enthusiast and a builder with a core area of interest in generative AI, serverless, and DevOps. Outside of work, he enjoys watching shows, traveling, and music.

Read More

Social learning: Collaborative learning with large language models

Large language models (LLMs) have significantly improved the state of the art for solving tasks specified using natural language, often reaching performance close to that of people. As these models increasingly enable assistive agents, it could be beneficial for them to learn effectively from each other, much like people do in social settings, which would allow LLM-based agents to improve each other’s performance.

In their work on human learning processes, Bandura and Walters described the concept of social learning in 1977, outlining different models of observational learning used by people. One common method of learning from others is through verbal instruction (e.g., from a teacher) that describes how to engage in a particular behavior. Alternatively, learning can happen through a live model, by mimicking a live example of the behavior.

Given the success of LLMs mimicking human communication, in our paper “Social Learning: Towards Collaborative Learning with Large Language Models”, we investigate whether LLMs are able to learn from each other using social learning. To this end, we outline a framework for social learning in which LLMs share knowledge with each other in a privacy-aware manner using natural language. We evaluate the effectiveness of our framework on various datasets, and propose quantitative methods that measure privacy in this setting. In contrast to previous approaches to collaborative learning, such as common federated learning approaches that often rely on gradients, in our framework, agents teach each other purely using natural language.

Social learning for LLMs

To extend social learning to language models, we consider the scenario where a student LLM should learn to solve a task from multiple teacher entities that already know that task. In our paper, we evaluate the student’s performance on a variety of tasks, such as spam detection in short text messages (SMS), solving grade school math problems, and answering questions based on a given text.

A visualization of the social learning process: A teacher model provides instructions or few-shot examples to a student model without sharing its private data.

Language models have shown a remarkable capacity to perform tasks given only a handful of examples, a process called few-shot learning. With this in mind, we provide human-labeled examples of a task, enabling the teacher model to teach it to a student. One of the main use cases of social learning arises when these examples cannot be directly shared with the student, for example, due to privacy concerns.

To illustrate this, let’s look at a hypothetical example for a spam detection task. A teacher model is located on devices where some users volunteer to mark incoming messages they receive as either “spam” or “not spam”. This is useful data that could help train a student model to differentiate between spam and not spam, but sharing personal messages with other users is a breach of privacy and should be avoided. To prevent this, a social learning process can transfer the knowledge from the teacher model to the student so it learns what spam messages look like without needing to share the user’s personal text messages.

We investigate the effectiveness of this social learning approach by analogy with the established human social learning theory that we discussed above. In these experiments, we use PaLM 2-S models for both the teacher and the student.

A systems view of social learning: At training time, multiple teachers teach the student. At inference time, the student is using what it learned from the teachers.

Synthetic examples

As a counterpart to the live teaching model described for traditional social learning, we propose a learning method where the teachers generate new synthetic examples for the task and share them with the student. This is motivated by the idea that one can create a new example that is sufficiently different from the original one, but is just as educational. Indeed, we observe that our generated examples are sufficiently different from the real ones to preserve privacy while still enabling performance comparable to that achieved using the original examples.

The 8 generated examples perform as well as the original data for several tasks (see our paper).

We evaluate the efficacy of learning through synthetic examples on our task suite. Especially when the number of examples is high enough, e.g., n = 16, we observe no statistically significant difference between sharing original data and teaching with synthesized data via social learning for the majority of tasks, indicating that the privacy improvement does not have to come at the cost of model quality.

Generating 16 instead of just 8 examples further reduces the performance gap relative to the original examples.

The one exception is spam detection, for which teaching with synthesized data yields lower accuracy. This may be because the training procedure of current models makes them biased to only generate non-spam examples. In the paper, we additionally look into aggregation methods for selecting good subsets of examples to use.
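As a conceptual sketch of the examples-sharing protocol described above (not the paper's implementation), the teacher prompts an LLM to synthesize new labeled examples from its private data, and the student then uses only those synthetic examples as few-shot context; generate() below is a hypothetical placeholder for any text-completion call, such as a PaLM 2-S request.

def generate(prompt: str) -> str:
    """Hypothetical placeholder for an LLM text-completion call."""
    raise NotImplementedError

def teacher_synthesize_examples(private_examples, k=8):
    # The teacher sees its private labeled data but shares only generated examples.
    prompt = (
        "Here are labeled SMS spam-detection examples:\n"
        + "\n".join(f"Message: {text}\nLabel: {label}" for text, label in private_examples)
        + f"\n\nWrite {k} new, different examples in the same format."
    )
    return generate(prompt)

def student_classify(synthetic_examples: str, message: str) -> str:
    # The student never sees the private data, only the synthetic examples.
    prompt = f"{synthetic_examples}\n\nMessage: {message}\nLabel:"
    return generate(prompt).strip()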

Synthetic instruction

Given the success of language models in following instructions, the verbal instruction model can also be naturally adapted to language models by having the teachers generate an instruction for the task. Our experiments show that providing such a generated instruction effectively improves performance over zero-shot prompting, reaching accuracies comparable to few-shot prompting with original examples. However, we did find that the teacher model may fail on certain tasks to provide a good instruction, for example due to a complicated formatting requirement of the output.

For Lambada, GSM8k, and Random Insertion, providing synthetic examples performs better than providing generated instructions, whereas for the other tasks, generated instructions achieve higher accuracy. This observation suggests that the choice of the teaching model depends on the task at hand, similar to how the most effective method for teaching people varies by task.

Depending on the task, generating instructions can work better than generating new examples.

Memorization of the private examples

We want teachers in social learning to teach the student without revealing specifics from the original data. To quantify how prone this process is to leaking information, we used Secret Sharer, a popular method for quantifying to what extent a model memorizes its training data, and adapted it to the social learning setting. We picked this method since it had previously been used for evaluating memorization in federated learning.

To apply the Secret Sharer method to social learning, we design “canary” data points such that we can concretely measure how much the training process memorized them. These data points are included in the datasets used by teachers to generate new examples. After the social learning process completes, we can then measure how much more confident the student is in the secret data points the teacher used, compared to similar ones that were not shared even with the teachers.

In our analysis, discussed in detail in the paper, we use canary examples that include names and codes. Our results show that the student is only slightly more confident in the canaries the teacher used. In contrast, when the original data points are directly shared with the student, the confidence in the included canaries is much higher than in the held-out set. This supports the conclusion that the teacher does indeed use its data to teach without simply copying it over.
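A minimal sketch of this kind of comparison is shown below; student_log_likelihood() is a hypothetical scoring call, and the paper's actual analysis uses the Secret Sharer exposure methodology rather than this simple gap statistic.

from statistics import mean

def student_log_likelihood(text: str) -> float:
    """Hypothetical placeholder for scoring a string under the student model."""
    raise NotImplementedError

def confidence_gap(used_canaries, held_out_canaries):
    # Canaries the teachers' data contained vs. canaries that were never shared.
    used = mean(student_log_likelihood(c) for c in used_canaries)
    held_out = mean(student_log_likelihood(c) for c in held_out_canaries)
    # A gap near zero suggests the teacher taught without copying its data;
    # a large positive gap indicates memorization of the shared examples.
    return used - held_out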

Conclusion and next steps

We introduced a framework for social learning that allows language models with access to private data to transfer knowledge through textual communication while maintaining the privacy of that data. In this framework, we identified sharing examples and sharing instructions as basic models and evaluated them on multiple tasks. Furthermore, we adapted the Secret Sharer metric to our framework, proposing a metric for measuring data leakage.

As next steps, we are looking for ways of improving the teaching process, for example by adding feedback loops and iteration. Furthermore, we want to investigate using social learning for modalities other than text.

Acknowledgements

We would like to acknowledge and thank Matt Sharifi, Sian Gooding, Lukas Zilka, and Blaise Aguera y Arcas, who are all co-authors on the paper. Furthermore, we would like to thank Victor Cărbune, Zachary Garrett, Tautvydas Misiunas, Sofia Neata and John Platt for their feedback, which greatly improved the paper. We’d also like to thank Tom Small for creating the animated figure.

Read More