Challenges in Multi-objective Optimization for Automatic Wireless Network Planning

Economics, combinatorics, physics, and signal processing conspire to make it difficult to design, build, and operate high-quality, cost-effective wireless networks. The radio transceivers that communicate with our mobile phones, the equipment that supports them (such as power and wired networking), and the physical space they occupy are all expensive, so it’s important to be judicious in choosing sites for new transceivers. Even when the set of available sites is limited, there are exponentially many possible networks that can be built. For example, given only 50 sites, there are 2^50 (over a million billion) possibilities!

Further complicating things, for every location where service is needed, one must know which transceiver provides the strongest signal and how strong it is. However, the physical characteristics of radio propagation in an environment containing buildings, hills, foliage, and other clutter are incredibly complex, so accurate predictions require sophisticated, computationally-intensive models. Building all possible sites would yield the best coverage and capacity, but even if this were not prohibitively expensive, it would create unacceptable interference among nearby transceivers. Balancing these trade-offs is a core mathematical difficulty.

The goal of wireless network planning is to decide where to place new transceivers to maximize coverage and capacity while minimizing cost and interference. Building an automatic network planning system (a.k.a., auto-planner) that quickly solves national-scale problems at fine-grained resolution without compromising solution quality has been among the most important and difficult open challenges in telecom research for decades.

To address these issues, we are piloting network planning tools built using detailed geometric models derived from high-resolution geographic data, that feed into radio propagation models powered by distributed computing. This system provides fast, high-accuracy predictions of signal strength. Our optimization algorithms then intelligently sift through the exponential space of possible networks to output a small menu of candidate networks that each achieve different desirable trade-offs among cost, coverage, and interference, while ensuring enough capacity to meet demand.

Example auto-planning project in Charlotte, NC. Blue dots denote selected candidate sites. The heat map indicates signal strength from the propagation engine. The inset emphasizes the non-isotropic path loss in downtown.


Radio Propagation
The propagation of radio waves near Earth’s surface is complicated. Like ripples in a pond, they decay with distance traveled, but they can also penetrate, bounce off, or bend around obstacles, further weakening the signal. Computing radio wave attenuation across a real-world landscape (called path loss) is a hybrid process combining traditional physics-based calculations with learned corrections accounting for obstruction, diffraction, reflection, and scattering of the signal by clutter (e.g., trees and buildings).
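To make the hybrid idea concrete, the sketch below combines a standard free-space path loss baseline with a learned correction term for clutter. It is purely illustrative: the feature set and the correction model are placeholders for whatever the real engine uses.

import math

def free_space_path_loss_db(distance_km: float, freq_mhz: float) -> float:
    # Friis free-space path loss in dB (distance in km, frequency in MHz).
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz) + 32.44

def predicted_path_loss_db(distance_km, freq_mhz, clutter_features, correction_model):
    # Hybrid estimate: physics baseline plus a learned correction (extra loss in dB)
    # predicted from clutter features, e.g., building and tree depths along the path.
    baseline = free_space_path_loss_db(distance_km, freq_mhz)
    correction = correction_model.predict([clutter_features])[0]
    return baseline + correction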

We have developed a radio propagation modeling engine that leverages the same high-res geodata that powers Google Earth, Maps and Street View to map the 3D distribution of vegetation and buildings. While accounting for signal origin, frequency, broadcast strength, etc., we train signal correction models using extensive real-world measurements, which account for diverse propagation environments — from flat to hilly terrain and from dense urban to sparse rural areas.

While such hybrid approaches are common, using detailed geodata enables accurate path loss predictions below one-meter resolution. Our propagation engine provides fast point-to-point path loss calculations and scales massively via distributed computation. For instance, computing coverage for 25,000 transceivers scattered across the continental United States can be done at 4 meter resolution in only 1.5 hours, using 1000 CPU cores.

Photorealistic 3D model in Google Earth (top-left) and corresponding clutter height model (top-right). Path profile through buildings and trees from a source to destination in the clutter model (bottom). Gray denotes buildings and green denotes trees.

Auto-Planning Inputs
Once accurate coverage estimates are available, we can use them to optimize network planning, for example, deciding where to place hundreds of new sites to maximize network quality. The auto-planning solver addresses large-scale combinatorial optimization problems such as these, using a fast, robust, scalable approach.

Formally, an auto-planning input instance contains a set of demand points (usually a square grid) where service is to be provided, a set of candidate transceiver sites, and predicted signal strengths from candidate sites to demand points (supplied by the propagation model). Each demand point includes a demand quantity (e.g., estimated from the population of wireless users), and each site includes a cost and capacity. Signal strengths below some threshold are omitted. Finally, the input may include an overall cost budget.
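For concreteness, an input instance of this kind can be held in a handful of simple records. The following sketch uses our own names, not the production schema:

from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class DemandPoint:
    point_id: int
    demand: float                    # e.g., estimated from the population of wireless users

@dataclass
class CandidateSite:
    site_id: int
    cost: float
    capacity: float

@dataclass
class PlanningInstance:
    demand_points: List[DemandPoint]
    sites: List[CandidateSite]
    # Predicted signal strength for each (site_id, point_id); pairs below threshold are omitted.
    signal_strength: Dict[Tuple[int, int], float]
    budget: Optional[float] = None   # optional overall cost budget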

Data Summarization for Large Instances
Auto-planning inputs can be huge, not just because of the number of candidate sites (tens of thousands) and demand points (billions), but also because the input includes predicted signal strengths from every candidate site to every nearby demand point. Simple downsampling is insufficient because population density may vary widely over a given region. Therefore, we apply methods like priority sampling to shrink the data. This technique produces a low-variance, unbiased estimate of the original data, preserving an accurate view of network traffic and interference statistics while shrinking the input enough that a city-sized instance fits into memory on one machine.
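Priority sampling itself is compact. In the illustrative sketch below, each item with weight w gets the priority w / u for a uniform u drawn from (0, 1]; the k largest priorities are kept, and each survivor receives the weight estimate max(w, τ), where τ is the (k+1)-st largest priority, which makes subset-sum estimates over the sample unbiased.

import random

def priority_sample(weighted_items, k, rng=None):
    # Priority sampling: keep k items and return (item, weight_estimate) pairs whose
    # sums over any fixed subset are unbiased estimates of the true subset sums.
    rng = rng or random.Random()
    items = list(weighted_items)
    if len(items) <= k:
        return items
    prioritized = sorted(
        ((w / (1.0 - rng.random()), item, w) for item, w in items),  # u drawn from (0, 1]
        key=lambda t: t[0], reverse=True)
    tau = prioritized[k][0]                          # (k+1)-st largest priority
    return [(item, max(w, tau)) for _, item, w in prioritized[:k]]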

Multi-objective Optimization via Local Search
Combinatorial optimization remains a difficult task, so we created a domain-specific local search algorithm to optimize network quality. The local search algorithmic paradigm is widely applied to address computationally-hard optimization problems. Such algorithms move from one solution to another through a search space of candidate solutions by applying small local changes, stopping at a time limit or when the solution is locally optimal. To evaluate the quality of a candidate network, we combine the different objective functions into a single one, as described in the following section.

The number of local steps to reach a local optimum, number of candidate moves we evaluate per step, and time to evaluate each candidate can all be large when dealing with realistic networks. To achieve a high-quality algorithm that finishes within hours (rather than days), we must address each of these components. Fast candidate evaluation benefits greatly from dynamic data structures that maintain the mapping between each demand point and the site in the candidate solution that provides the strongest signal to it. We update this “strongest-signal” map efficiently as the candidate solution evolves during local search. The following observations help limit both the number of steps to convergence and evaluations per step.
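A bare-bones version of that data structure might look like the following sketch (ours, not the production code): it keeps, for every demand point, the strongest selected site, and updates incrementally as sites are added to or removed from the candidate solution.

from collections import defaultdict

class StrongestSignalMap:
    def __init__(self, signal_strength):
        # signal_strength: {(site, point): strength}; pairs below threshold are absent.
        self.selected = set()
        self.best = {}                                    # point -> (site, strength)
        self.points_for_site = defaultdict(list)
        self.sites_for_point = defaultdict(list)
        for (site, point), s in signal_strength.items():
            self.points_for_site[site].append((point, s))
            self.sites_for_point[point].append((site, s))

    def add_site(self, site):
        self.selected.add(site)
        for point, s in self.points_for_site[site]:
            if point not in self.best or s > self.best[point][1]:
                self.best[point] = (site, s)

    def remove_site(self, site):
        self.selected.discard(site)
        for point, _ in self.points_for_site[site]:
            if point in self.best and self.best[point][0] == site:
                candidates = [(st, s) for st, s in self.sites_for_point[point]
                              if st in self.selected]
                if candidates:
                    self.best[point] = max(candidates, key=lambda t: t[1])
                else:
                    del self.best[point]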

Bipartite graph representing candidate sites (left) and demand points (right). Selected sites are circled in red, and each demand point is assigned to its strongest available connection. The topmost demand point has no service because the only site that can reach it was not selected. The third and fourth demand points from the top may have high interference if the signal strengths attached to their gray edges are only slightly lower than the ones on their red edges. The bottommost site has high congestion because many demand points get their service from that site, possibly exceeding its capacity.

Selecting two nearby sites is usually not ideal because they interfere. Our algorithm explicitly forbids such pairs of sites, thereby steering the search toward better solutions while greatly reducing the number of moves considered per step. We identify pairs of forbidden sites based on the demand points they cover, as measured by the weighted Jaccard index. This captures their functional proximity much better than simple geographic distance does, especially in urban or hilly areas where radio propagation is highly non-isotropic.
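Concretely, the weighted Jaccard index of two sites can be computed from the weight of demand each one covers; pairs whose coverage overlaps too much are then forbidden from being selected together. A minimal sketch (the 0.5 threshold is illustrative):

def weighted_jaccard(coverage_a, coverage_b):
    # coverage_*: {demand_point: nonnegative covered weight}. Returns a value in [0, 1];
    # 1.0 means the two sites are functionally interchangeable.
    points = set(coverage_a) | set(coverage_b)
    num = sum(min(coverage_a.get(p, 0.0), coverage_b.get(p, 0.0)) for p in points)
    den = sum(max(coverage_a.get(p, 0.0), coverage_b.get(p, 0.0)) for p in points)
    return num / den if den > 0 else 0.0

def forbidden_pairs(coverages, threshold=0.5):
    sites = list(coverages)
    return {(a, b) for i, a in enumerate(sites) for b in sites[i + 1:]
            if weighted_jaccard(coverages[a], coverages[b]) >= threshold}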

Breaking the local search into epochs also helps. The first epoch mostly adds sites to increase the coverage area while avoiding forbidden pairs. As we approach the cost budget, we begin a second epoch that includes swap moves between forbidden pairs to fine-tune the interference. This restriction limits the number of candidate moves per step, while focusing on those that improve interference with less change to coverage.

Three candidate local search moves. Red circles indicate selected sites and the orange edge indicates a forbidden pair.

Outputting a Diverse Set of Good Solutions
As mentioned before, auto-planning must balance three competing objectives: maximizing coverage, while minimizing interference and capacity violations, subject to a cost budget. There is no single correct tradeoff, so our algorithm delegates the final decision to the user by providing a small menu of candidate networks with different emphases. We apply a multiplier to each objective and optimize the sum. Raising the multiplier for a component guides the algorithm to emphasize it. We perform grid search over multipliers and budgets, generating a large number of solutions, filter out any that are worse than another solution along all four components (including cost), and finally select a small subset that represent different tradeoffs.
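The filtering step is a standard Pareto (dominance) filter over the four components. The sketch below uses our own field names and assumes coverage is maximized while cost, interference, and capacity violation are minimized.

def pareto_filter(solutions):
    # Each solution is a dict with 'cost', 'coverage', 'interference', 'capacity_violation'.
    def dominates(a, b):
        no_worse = (a['cost'] <= b['cost'] and a['coverage'] >= b['coverage'] and
                    a['interference'] <= b['interference'] and
                    a['capacity_violation'] <= b['capacity_violation'])
        better = (a['cost'] < b['cost'] or a['coverage'] > b['coverage'] or
                  a['interference'] < b['interference'] or
                  a['capacity_violation'] < b['capacity_violation'])
        return no_worse and better

    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]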

Menu of candidate solutions, one per row, displaying metrics. Clicking on a solution turns the selected sites pink and displays a plot of the interference distribution across covered area and demand. Sites not selected are blue.

Conclusion
We described our efforts to address the most vexing challenges facing telecom network operators. Using combinatorial optimization in concert with geospatial and radio propagation modeling, we built a scalable auto-planner for wireless telecommunication networks. We are actively exploring how to expand these capabilities to best meet the needs of our customers. Stay tuned!

For questions and other inquiries, please reach out to wireless-network-interest@google.com.

Acknowledgements
These technological advances were enabled by the tireless work of our collaborators: Aaron Archer, Serge Barbosa Da Torre, Imad Fattouch, Danny Liberty, Pishoy Maksy, Zifei Tong, and Mat Varghese. Special thanks to Corinna Cortes, Mazin Gilbert, Rob Katcher, Michael Purdy, Bea Sebastian, Dave Vadasz, Josh Williams, and Aaron Yonas, along with Serge and especially Aaron Archer for their assistance with this blog post.

Read More

Moderate, classify, and process documents using Amazon Rekognition and Amazon Textract

Many companies are overwhelmed by the volume of documents they have to process, organize, and classify to serve their customers better. Examples include loan applications, tax filings, and billing documents. Such documents typically arrive as images, are often multi-page, and are frequently of low quality. To be more competitive and cost-efficient, and to stay secure and compliant at the same time, these companies must evolve their document processing capabilities to reduce processing times and improve classification accuracy in an automated and scalable way. These companies face the following challenges in processing documents:

  • Performing moderation on the documents to detect inappropriate, unwanted, or offensive content
  • Manual document classification, which is adopted by smaller companies, is time-consuming, error-prone, and expensive
  • OCR techniques with rules-based systems aren’t intelligent enough and can’t adapt to changes in document format
  • Companies that adopt machine learning (ML) approaches often don’t have resources to scale their model to handle spikes in incoming document volume

This post tackles these challenges and provides an architecture that efficiently solves these problems. We show how you can use Amazon Rekognition and Amazon Textract to optimize and reduce human effort in processing documents. Amazon Rekognition identifies moderation labels in your documents and classifies them using Amazon Rekognition Custom Labels. Amazon Textract extracts text from your documents.

In this post, we cover building two ML pipelines (training and inference) to process documents without the need for any manual effort or custom code. The high-level steps in the inference pipeline include:

  1. Perform moderation on uploaded documents using Amazon Rekognition.
  2. Classify documents into different categories such as W-2s, invoices, bank statements, and pay stubs using Rekognition Custom Labels.
  3. Extract text from documents such as printed text, handwriting, forms, and tables using Amazon Textract.

Solution overview

This solution uses the following AI services, serverless technologies, and managed services to implement a scalable and cost-effective architecture:

  • Amazon DynamoDB – A key-value and document database that delivers single-digit millisecond performance at any scale.
  • Amazon EventBridge – A serverless event bus to build event-driven applications at scale using events generated from your applications, integrated software as a service (SaaS) applications, and AWS services.
  • AWS Lambda – A serverless compute service that lets you run code in response to triggers such as changes in data, shifts in system state, or user actions.
  • Amazon Rekognition – Uses ML to identify objects, people, text, scenes, and activities in images and videos, as well as detect any inappropriate content.
  • Amazon Rekognition Custom Labels – Uses AutoML for computer vision and transfer learning to help you train custom models to identify the objects and scenes in images that are specific to your business needs.
  • Amazon Simple Storage Service (Amazon S3) – Serves as an object store for your documents and allows for central management with fine-tuned access controls.
  • AWS Step Functions – A serverless function orchestrator that makes it easy to sequence Lambda functions and multiple services into business-critical applications.
  • Amazon Textract – Uses ML to extract text and data from scanned documents in PDF, JPEG, or PNG formats.

The following diagram illustrates the architecture of the inference pipeline.

Inference Pipeline Architecture

Our workflow includes the following steps:

  1. User uploads documents into the input S3 bucket.
  2. The upload triggers an Amazon S3 Event Notification, which delivers real-time events directly to EventBridge. Amazon S3 events that match the “object created” filter defined for an EventBridge rule start the Step Functions workflow.
  3. The Step Functions workflow triggers a series of Lambda functions, which perform the following tasks (a minimal sketch of the underlying API calls appears after this list):
    1. The first function performs preprocessing tasks and makes API calls to Amazon Rekognition:
      • If the incoming documents are in image format (such as JPG or PNG), the function calls the Amazon Rekognition API and provides the documents as S3 objects. However, if a document is in PDF format, the function streams the image bytes when calling the Amazon Rekognition API.
      • If a document contains multiple pages, the function splits the document into individual pages and saves them in an intermediate folder in the output S3 bucket before processing them individually.
      • When the preprocessing tasks are complete, the function makes an API call to Amazon Rekognition to detect inappropriate, unwanted, or offensive content, and makes another API call to the trained Rekognition Custom Labels model to classify documents.
    2. The second function makes an API call to Amazon Textract to initiate a job for extracting text from the input document and storing it in the output S3 bucket.
    3. The third function stores document metadata such as the moderation label, document classification, classification confidence, Amazon Textract job ID, and file path in a DynamoDB table.
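The following condensed sketch shows the shape of those three functions’ SDK calls for a single page. It is illustrative only: the bucket, key, table name, and model ARN are placeholders, and error handling and the PDF byte-streaming path are omitted.

import boto3

rekognition = boto3.client("rekognition")
textract = boto3.client("textract")
table = boto3.resource("dynamodb").Table("document-processing-table")  # placeholder name

def process_page(bucket, key, model_arn):
    s3_object = {"S3Object": {"Bucket": bucket, "Name": key}}

    # Step 3a: content moderation and document classification
    moderation = rekognition.detect_moderation_labels(Image=s3_object)
    classification = rekognition.detect_custom_labels(
        Image=s3_object, ProjectVersionArn=model_arn, MaxResults=1)

    # Step 3b: start asynchronous text extraction with Amazon Textract
    textract_job = textract.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}})

    # Step 3c: store document metadata in DynamoDB
    top_label = (classification["CustomLabels"] or [{}])[0]
    table.put_item(Item={
        "FilePath": f"s3://{bucket}/{key}",
        "ModerationLabels": [label["Name"] for label in moderation["ModerationLabels"]],
        "Classification": top_label.get("Name", "UNKNOWN"),
        "ClassificationConfidence": str(top_label.get("Confidence", 0)),
        "TextractJobId": textract_job["JobId"],
    })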

You can adjust the workflow to fit your requirements. For example, you can add a natural language processing (NLP) capability to this workflow using Amazon Comprehend to gain insights into the extracted text.

Training pipeline

Before we deploy this architecture, we train a custom model to classify documents into different categories using Rekognition Custom Labels. In the training pipeline, we label the documents using Amazon SageMaker Ground Truth. We then use the labeled documents to train a model with Rekognition Custom Labels. In this example, we use an Amazon SageMaker notebook to perform these steps, but you can also annotate images using the Rekognition Custom Labels console. For instructions, refer to Labeling images.
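If you would rather script the training step than use the console, the notebook’s core calls look roughly like the sketch below. The project name, bucket, and manifest paths are placeholders; the manifests are the ones produced by SageMaker Ground Truth.

import boto3

rekognition = boto3.client("rekognition")

# Create a project and start training a model version from labeled manifests.
project_arn = rekognition.create_project(ProjectName="document-classification")["ProjectArn"]

version = rekognition.create_project_version(
    ProjectArn=project_arn,
    VersionName="v1",
    OutputConfig={"S3Bucket": "my-training-bucket", "S3KeyPrefix": "output/"},
    TrainingData={"Assets": [{"GroundTruthManifest": {"S3Object": {
        "Bucket": "my-training-bucket", "Name": "manifests/train.manifest"}}}]},
    TestingData={"Assets": [{"GroundTruthManifest": {"S3Object": {
        "Bucket": "my-training-bucket", "Name": "manifests/test.manifest"}}}]},
)

# Wait for training to finish; the returned ARN is the ProjectVersionArn
# that the inference pipeline asks for later.
rekognition.get_waiter("project_version_training_completed").wait(
    ProjectArn=project_arn, VersionNames=["v1"])
print(version["ProjectVersionArn"])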

Training Pipeline Architecture

Dataset

To train the model, we use the following public datasets containing W2s and invoices:

You can use another dataset relevant for your industry.

The following table summarizes the dataset splits between training and testing.

Class       Training set    Test set
Invoices    352             75
W-2s        86              16
Total       438             91

Deploy the training pipeline with AWS CloudFormation

You deploy an AWS CloudFormation template to provision the necessary AWS Identity and Access Management (IAM) roles and components of the training pipeline, including a SageMaker notebook instance.

  1. Launch the following CloudFormation template in the US East (N. Virginia) Region:
  2. For Stack name, enter a name, such as document-processing-training-pipeline.
  3. Choose Next.
    Training CFN Stack
  4. In the Capabilities and transforms section, select the check box to acknowledge that AWS CloudFormation might create IAM resources.
  5. Choose Create stack.
    Training CFN Stack

The stack details page should show the status of the stack as CREATE_IN_PROGRESS. It can take up to 5 minutes for the status to change to CREATE_COMPLETE. When it’s complete, you can view the outputs on the Outputs tab.

  1. After the stack is launched successfully, open the SageMaker console and choose Notebook instances in the navigation pane.
  2. Look for an instance with the DocProcessingNotebookInstance- prefix and wait until its status is InService.
  3. Under Actions, choose Open Jupyter.
    Open Jupyter Notebook

Run the example notebook

To run your notebook, complete the following steps:

  1. Choose the Rekognition_Custom_Labels example notebook.
    Choose Notebook Jupyter
  2. Choose Run to run the cells in the example notebook in order.
    Run cells Jupyter

The notebook demonstrates the entire lifecycle of preparing training and test images, labeling them, creating manifest files, training a model, and running the trained model with Rekognition Custom Labels. Alternatively, you can train and run the model using the Rekognition Custom Labels console. For instructions, refer to Training a model (Console).

The notebook is self-explanatory; you can follow the steps to complete training the model.

  1. Make a note of the ProjectVersionArn to provide for the inference pipeline in a later step.

For SageMaker notebook instances, you’re charged for the instance type you choose, based on the duration of use. If you’re finished training the model, stop the notebook instance to avoid the cost of idle resources.

Deploy the inference pipeline with AWS CloudFormation

To deploy the inference pipeline, complete the following steps:

  1. Launch the following CloudFormation template in the US East (N. Virginia) Region:
  2. For Stack name, enter a name, such as document-processing-inference-pipeline.
  3. For DynamoDBTableName, enter a unique DynamoDB table name; for example, document-processing-table.
  4. For InputBucketName, enter a unique name for the S3 bucket the stack creates; for example, document-processing-input-bucket.

The input documents are uploaded to this bucket before they’re processed. Use only lowercase characters and no spaces when you create the name of the input bucket. Furthermore, this operation creates a new S3 bucket, so don’t use the name of an existing bucket. For more information, see Rules for Bucket Naming.

  1. For OutputBucketName, enter a unique name for your output bucket; for example, document-processing-output-bucket.

This bucket stores the output documents after they’re processed. It also stores pages of multi-page PDF input documents after they’re split by Lambda function. Follow the same naming rules as your input bucket.

  1. For RekognitionCustomLabelModelARN, enter the ProjectVersionArn value you noted from the Jupyter notebook.
  2. Choose Next.
    Inference CFN Stack
  3. On the Configure stack options page, set any additional parameters for the stack, including tags.
  4. Choose Next.
  5. In the Capabilities and transforms section, select the check box to acknowledge that AWS CloudFormation might create IAM resources.
  6. Choose Create stack.

The stack details page should show the status of the stack as CREATE_IN_PROGRESS. It can take up to 5 minutes for the status to change to CREATE_COMPLETE. When it’s complete, you can view the outputs on the Outputs tab.

Process a document through the pipeline

We’ve deployed both training and inference pipelines, and are now ready to use the solution and process a document.

  1. On the Amazon S3 console, open the input bucket.
  2. Upload a sample document into the S3 folder.

This starts the workflow. The process populates the DynamoDB table with document classification and moderation labels. The output from Amazon Textract is delivered to the output S3 bucket in the TextractOutput folder.

We submitted a few different sample documents to the workflow and received the following information populated in the DynamoDB table.

Metadata storage in DynamoDB

If you don’t see items in the DynamoDB table or documents uploaded in the output S3 bucket, check the Amazon CloudWatch Logs for the corresponding Lambda function and look for potential errors that caused the failure.

Clean up

Complete the following steps to clean up resources deployed for this solution:

  1. On the CloudFormation console, choose Stacks.
  2. Select the stacks deployed for this solution.
  3. Choose Delete.

These steps don’t delete the S3 buckets, the DynamoDB table, or the trained Rekognition Custom Labels model. You continue to incur charges until these resources are deleted. If you no longer need them, delete them directly via their respective service consoles.

Conclusion

In this post, we presented a scalable, secure, and automated approach to moderate, classify, and process documents. Companies across multiple industries can use this solution to improve their business and serve their customers better. It allows for faster document processing and higher accuracy, and reduces the complexity of data extraction. It also provides better security and compliance with personal data legislation by reducing the human workforce involved in processing incoming documents.

For more information, see the Amazon Rekognition Custom Labels guide, the Amazon Rekognition developer guide, and the Amazon Textract developer guide. If you’re new to Amazon Rekognition Custom Labels, try it out using our Free Tier, which lasts 3 months and includes 10 free training hours per month and 4 free inference hours per month. The Amazon Rekognition free tier includes processing 5,000 images per month for 12 months. The Amazon Textract free tier also lasts 3 months and includes 1,000 pages per month for the Detect Document Text API.


About the Authors

Jay Rao is a Principal Solutions Architect at AWS. He enjoys providing technical and strategic guidance to customers and helping them design and implement solutions on AWS.

Uchenna Egbe is an Associate Solutions Architect at AWS. He spends his free time researching herbs, teas, superfoods, and how he can incorporate them into his daily diet.

Read More

Urban Jungle: AI-Generated Endangered Species Mix With Times Square’s Nightlife

Bengal tigers, red pandas and mountain gorillas are among the world’s most familiar endangered species, but tens of thousands of others — like the Karpathos frog, the Perote deer mouse or the Mekong giant catfish — are largely unknown.

Typically perceived as lacking star quality, these species are now roaming massive billboards in one of the world’s busiest destinations. An AI-powered initiative is spotlighting lesser-known endangered creatures on Times Square billboards this month, nightly in the few minutes before midnight across nearly 100 screens.

The project, dubbed Critically Extant, uses AI to illustrate the limited public data available on critically endangered flora and fauna. It’s the first deep learning art display in the Times Square Arts program’s decade-long history.

“A neural network can only create images based on what it’s seen in training data, and there’s very little information online about some of these critically endangered species,” said artist Sofia Crespo, who created the work with support from Meta Open Arts, using NVIDIA GPUs for AI training and inference. “This project is ultimately about representation — for us to recognize that we are biased towards some species versus others.”

Artwork courtesy of Sofia Crespo

These biases in representation have implications on the effort and funding given to save different species. Research has shown that a small subset of endangered species that are considered charismatic, cute or marketable receive more funding than they need, while most others receive little to no support.

When endangered species of any size — such as insects, fungi or plants — are left without conservation resources, they’re more vulnerable to extinction, contributing to a severe loss of biodiversity that makes ecosystems and food webs less resilient.

Intentionally Imperfect Portraits

The AI model, created by Crespo and collaborator Feileacan McCormick, was trained on a paired dataset of nearly 3 million nature images and text describing around 10,000 species. But this still wasn’t enough data to create true-to-life portraits of the less popular endangered species.

So the deep learning model, a generative adversarial network, does the best it can, guessing the features of a given endangered species based on related species. Due to the limited source data, many of the AI-generated creatures have a different color or body shape than their real-life counterparts — and that’s the point.

“Part of the project was relying on the open-source data that’s available right now,” said Crespo. “If that’s all the data we have, and species go extinct, what kind of knowledge and imagination do we have about the world that was lost?”

Critically Extant features more than 30 species, including amphibians, birds, fish, flowering plants, fungi and insects. After feeding species names to the generative AI model, Crespo animated and processed the synthetic images further to create the final moving portraits.

 

The AI model behind this project was trained using a cluster of NVIDIA Tensor Core GPUs. Crespo used a desktop NVIDIA RTX A6000 GPU for what she called “lightning-quick” inference.

AI in the Public Square

Critically Extant’s Times Square display premiered on May 1 and will be shown nightly through the end of the month.

Image by Michael Hull/Times Square Arts

The three-minute display features all 30+ specimens in a randomized arrangement that shifts every 30 seconds or so. Crespo said that using the NVIDIA RTX A6000 GPU was essential to generate the high-resolution images needed to span dozens of digital billboards.

Crespo and McCormick, who run an ecology and AI-focused studio, also enhanced the art display with an AI-generated soundtrack trained on a diverse range of animal sounds.

“The idea is to show diversity with many creatures, and overwhelm the audience with creatures that look very different from one another,” Crespo said.

The project began as an exhibition on Instagram, with the goal of adding representation of critically endangered species to social media conversations. At Times Square, the work will reach an audience of hundreds of thousands more.

“Crespo’s work brings the natural world directly into the center of the very urban environment at odds with these at-risk species, and nods to the human changes that will be required to save them,” reads the Times Square Arts post.

Crespo and McCormick have showcased their work at NVIDIA GTC, most recently an AI-generated fragment of coral reef titled Beneath the Neural Waves.

Learn more about AI artwork by Crespo and McCormick on the NVIDIA AI Art Gallery, and catch Critically Extant in Times Square through May 31.

Times Square images courtesy of Times Square Arts, photographed by Michael Hull. Artwork by Sofia Crespo. 

The post Urban Jungle: AI-Generated Endangered Species Mix With Times Square’s Nightlife appeared first on NVIDIA Blog.

Read More

AI and Machine Learning @ I/O Recap

Posted by TensorFlow Team

Google I/O 2022 was a major milestone in the evolution of AI and Machine Learning for developers. We’re really excited about the potential for developers using our technologies and Machine Learning to build intelligent solutions, and we believe that 2022 is the year when AI and ML become part of every developer’s toolbox.

At the I/O keynotes we showed our fully open source ecosystem that takes you from end to end with Machine Learning. There are developer tools for managing data, training your models, and deploying them to a variety of surfaces from global scale cloud all the way down to tiny microcontrollers…and of course ways to monitor and maintain your systems with MLOps. All of this comes with a common set of accelerated hardware for training and inference, along with open source tooling for responsible AI end-to-end.

You can get a tour through this ecosystem in the keynote “AI and Machine Learning updates for Developers”.

Responsible AI review processes: From a developer’s point of view

We can all agree that responsible and ethical AI is important, but when you want to build responsibly, you need tooling. We could, and will, create a whole video series about these tools, but the great content to watch right now is the talk on the Responsible AI review process. Googlers who worked on projects like the Covid-19 public forecasts or the Celebrity Recognition APIs will take you step-by-step through their thought process and how the tools lined up to help them build more responsibly and thoughtfully. You’ll also learn about some of the new releases in Responsible AI tools, such as the Counterfactual Logit Pairing library.

Adding machine learning to your developer toolbox

If you’re just getting started on your journey and you want ML to be a part of your toolbox, you probably have a million questions. Follow a developer’s journey through the best offerings, from a turnkey API that can solve basic problems fast, to custom models that can be tuned and deployed.

TensorFlow.js: From prototype to production, what’s new in 2022?

If you’re a web developer, there’s a whole bunch of new updates, from the announcement of a new set of courses that take you from first principles through a deep dive of what’s possible, to lots of new models available to web devs. These include a selfie depth estimation model that can be used for cool things like a 3D effect in your pictures without needing any kind of extra sensor. You’ll also see 3D pose estimation that runs at a high FPS to deliver real-time results, letting you do things like have a fully animated character follow your body motion. All in the browser!

Deploy a custom ML model to mobile

If you want to build better mobile apps with AI and Machine Learning, you probably need to understand the ins and outs of getting models to execute on Android or iOS devices, including shrinking them and optimizing them to be power friendly. Supercharge your model with new releases from the TensorFlow Lite team that let you quantize, debug, and accelerate your model on CPU or delegated GPUs, and a whole lot more.

Further on the edge with Coral Dev Board Micro

Speaking of acceleration, this year at I/O we introduced the Coral Dev Board Micro. This is a new microcontroller-class device with an on-board Edge TPU that’s powerful enough to run multiple models in tandem. The Coral team has also updated their catalog of pre-trained models, which now includes over 40 models available for you to use on embedded systems out of the box!

Tips and tricks for distributed large model training

On the other side of the spectrum, if you want to train large models, you’ll need to understand how to shard training and data across multiple processors or cores. We’ve released lots of new guidance and updates for model and data parallelism. You can learn all about them in this talk, including lessons learned from Google researchers in building language models.

Easier data preprocessing with Keras

Of course, not all data is big data, and if you’re not building giant models, you still need to be able to manage your data. Often this is where devs write the most code for ML, so we want to highlight some ways of making this easier, in particular with Keras. Keras’s new preprocessing layers not only make vectorization and augmentation much easier, but also allow for precomputation, making your training more efficient by reducing idle time. Learn about data preprocessing from the creator of Keras!
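As a small taste of what those layers look like in practice (a minimal sketch, not code from the talk): adapt() precomputes the vocabulary once, and the layer then runs as part of the model or inside a tf.data pipeline.

import tensorflow as tf

vectorize = tf.keras.layers.TextVectorization(max_tokens=10_000, output_sequence_length=32)
vectorize.adapt(["the quick brown fox", "jumps over the lazy dog"])  # precompute the vocabulary

model = tf.keras.Sequential([
    vectorize,                                   # raw strings go straight into the model
    tf.keras.layers.Embedding(10_000, 16),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
print(model(tf.constant(["the lazy fox"])))      # shape (1, 1)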

An introduction to MLOps with TFX

Finally, let’s not forget MLOps and TFX, the open source, end-to-end pipeline management tool. Check out the talk from Robert Crowe, who will help you understand everything from why you need MLOps to managing your process through managing change. You’ll see the component model in TFX and get an introduction to the new TFX-Addons community that’s focused on building new ones. Check it all out in this talk!

I/O wasn’t just about new releases and talks! If you are inspired by any of what you saw, we also have workshops and learning paths you can dig into to learn in more detail.

Full playlist to all AI/ML talks and workshops.

That’s it for this roundup of AI and ML at Google I/O 2022. We hope you’ve enjoyed it, and we’d love to hear your feedback when you explore the content. Please drop by the TensorFlow Forum and let us know what you think!

Read More

GFN Thursday Gets Groovy As ‘Evil Dead: The Game’ Marks 1,300 Games on GeForce NOW

Good. Bad. You’re the Guy With the Gun this GFN Thursday.

Get ready for some horrifyingly good fun with Evil Dead: The Game streaming on GeForce NOW tomorrow at release. It’s the 1,300th game in the GeForce NOW library, and it arrives on Friday the 13th.

And it’s part of eight total games joining the GeForce NOW library this week.

Hail to the King, Baby

Step into the shoes of Ash Williams and friends from the iconic Evil Dead franchise in Evil Dead: The Game (Epic Games Store), streaming on GeForce NOW at release tomorrow.

Work together in a game loaded with over-the-top co-op and PvP action across nearly all your devices. Grab your boomsticks, chainsaws and cleavers to fight against the armies of darkness, even on a Mac. Or take control of the Kandarian Demon to hunt the heroes by possessing Deadites, the environment and even the survivors themselves, all from a mobile phone.

For RTX 3080 members, the horror comes to life with realistic visuals and a physics-based gore system, enhanced by NVIDIA DLSS – the groundbreaking AI rendering technology that increases graphics performance by boosting frame rates and generating beautiful, sharp images.

Plus, everything is better in 4K. Whether you’re tearing a Deadite in two with Ash’s chainsaw hand or flying through the map as the Kandarian Demon, RTX 3080 members playing from the PC and Mac apps can bring the bloodshed in all its glory, streaming at up to 4K resolution and 60 frames per second.

There’s No Time Like Playtime

Brigandine The Legend of Runersia on GeForce NOW
Become a ruler, command knights and monsters, and outplay your enemies in Brigandine The Legend of Runersia.

Not a spooky fan? That’s okay. There’s fun for everyone with eight new games streaming this week:

  • Achilles: Legends Untold (New release on Steam)
  • Brigandine The Legend of Runersia (New release on Steam)
  • Neptunia x SENRAN KAGURA: Ninja Wars (New release on Steam)
  • Songs of Conquest (New release on Steam and Epic Games Store)
  • Cepheus Protocol Anthology (New release on Steam, May 13)
  • Evil Dead: The Game (New release on Epic Games Store, May 13)
  • Pogostuck: Rage With Your Friends (Steam)
  • Yet Another Zombie Defense HD (Steam)

With the armies of darkness upon us this weekend, we’ve got a question for you. Let us know how your chances are looking on Twitter or in the comments below.

The post GFN Thursday Gets Groovy As ‘Evil Dead: The Game’ Marks 1,300 Games on GeForce NOW appeared first on NVIDIA Blog.

Read More

Ambient Clinical Intelligence: Generating Medical Reports with PyTorch

Introduction

Complete and accurate clinical documentation is an essential tool for tracking patient care. It allows for treatment plans to be shared among care teams to aid in continuity of care and ensures a transparent and effective process for reimbursement.

Physicians are responsible for documenting patient care. Traditional clinical documentation methods have resulted in a sub-par patient-provider experience, less time interacting with patients, and decreased work-life balance. A significant amount of physicians’ time is spent in front of the computer doing administrative tasks. As a result, patients are less satisfied with the overall experience, and physicians, who prepare for years studying medicine, cannot practice at the top of their license and are burned out. For every hour of direct clinical face time with patients, physicians spend nearly two additional hours on EHR and desk work within the clinic day. Outside office hours, physicians spend another 1 to 2 hours of personal time each night doing additional computer and other clerical work.

Physician burnout is one of the primary causes for increased medical errors, malpractice suits, turnover, and decreased access to care. Burnout leads to an increase in healthcare costs and a decrease in overall patient satisfaction. Burnout costs the United States $4.6 billion a year.

What can we do to bring back trust, joy, and humanity to the delivery of healthcare? A significant portion of the administrative work consists of entering patient data into Electronic Health Records (EHRs) and creating clinical documentation. Clinical documentation is created from information already in the EHR as well as from the patient-provider encounter conversation.

This article will showcase how the Nuance Dragon Ambient eXperience (DAX), an AI-powered, voice-enabled, ambient clinical intelligence solution, automatically documents patient encounters accurately and efficiently at the point of care and the technologies that enable it.

Nuance DAX enhances the quality of care and patient experience, increases provider efficiency and satisfaction, and improves financial outcomes. It can be used in office and telehealth settings in all ambulatory specialties, including primary and urgent care.

Natural Language Processing

Natural Language Processing (NLP) is one of the most challenging fields in Artificial Intelligence (AI). It comprehends a set of algorithms that allow computers to understand or generate the language used by humans. These algorithms can process and analyze vast amounts of natural language data from different sources (either sound or text) to build models that can understand, classify, or even generate natural language as humans would. Like other fields in AI, NLP has significantly progressed thanks to the advent of Deep Learning (DL), which has resulted in models that can obtain results on par with humans in some tasks.

These advanced NLP techniques are being applied in healthcare. During a typical patient-provider encounter, a conversation ensues where the doctor constructs, through questions and answers, a chronological description of the development of the patient’s presenting illness or symptoms. A physician examines the patient and makes clinical decisions to establish a diagnosis and determine a treatment plan. This conversation, and data in the EHR, provide the required information for physicians to generate the clinical documentation, referred to as medical reports.

Two main NLP components play a role in automating the creation of clinical documentation. The first component, Automatic Speech Recognition (ASR), is used to translate speech into text. It takes the audio recording of the encounter and generates a conversation transcription (cf. Figure 2). The second component, Automatic Text Summarization, helps generate summaries from large text documents. This component is responsible for understanding and capturing the nuances and most essential aspects from the transcribed conversation into a final report in narrative form (cf. Figure 3), structured form, or a combination of both.

We will focus on this second component, Automatic Text Summarization, which is a difficult task with many challenges:

  • Its performance is tied to the ASR quality from multiple speakers (noisy input).
  • The input is conversational in nature and contains layman’s terms.
  • Protected Health Information (PHI) regulations limit medical data access.
  • The information for one output sentence is potentially spread across multiple conversation turns.
  • There is no explicit sentence alignment between input and output.
  • Various medical specialties, encounter types, and EHR systems constitute a broad and complex output space.
  • Physicians have different styles of conducting encounters and have their preferences for medical reports; there is no standard.
  • Standard summarization metrics might differ from human judgment of quality.

Figure 2: Transcript of a patient-doctor conversation

Figure 3: Excerpt of an AI-generated medical report. HPI stands for History of present illness.

Text Summarization with PyTorch and Fairseq

PyTorch is an open-source machine learning framework developed by Facebook that helps researchers prototype Deep Learning models. The Fairseq toolkit is built on top of PyTorch and focuses on sequence generation tasks, such as Neural Machine Translation (NMT) or Text Summarization. Fairseq features an active community that is continuously providing reference implementations of state-of-the-art models. It contains many built-in components (model architectures, modules, loss functions, and optimizers) and is easily extendable with plugins.

Text summarization constitutes a significant challenge in NLP. We need models capable of generating a short version of a document while retaining the key points and avoiding uninformative content. These challenges can be addressed with different approaches: (1) abstractive text summarization, which trains models to generate a summary in narrative form; (2) extractive methods, where models are trained to select the most important parts of the input text; or (3) a combination of the two, where the essential parts of the input are selected and then summarized in an abstractive fashion. Hence, summarization can be accomplished via a single end-to-end network or as a pipeline of extractive and abstractive components. To that end, Fairseq provides all the necessary tools to be successful in our endeavor. It features end-to-end models such as the classical Transformer, different types of Language Models, and pre-trained versions that enable researchers to focus on what matters most: building state-of-the-art models that generate valuable reports.

However, we are not just summarizing the transcribed conversation; we generate high-quality medical reports, which have many considerations.

  • Every section of a medical report is different in terms of content, structure, fluency, etc.
  • All medical facts mentioned in the conversation should be present in the report, for example, a particular treatment or dosage.
  • In the healthcare domain, the vocabulary is extensive, and models need to deal with medical terminology.
  • Patient-doctor conversations are usually much longer than the final report.

All these challenges require our researchers to run a battery of extensive experiments. Thanks to the flexibility of PyTorch and Fairseq, their productivity has greatly increased. Further, the ecosystem offers an easy path from ideation, implementation, experimentation, and final roll-out to production. Using multiple GPUs or CPUs is as simple as providing an additional argument to the tools, and because of the tight Python integration, PyTorch code can be easily debugged.

In our continuous effort to contribute to the open-source community, features have been developed at Nuance and pushed to the Fairseq GitHub repository. These try to overcome some of the challenges mentioned, such as facilitating the copying of rare or unseen words from the input to the summary, speeding up training by improving Tensor Core utilization, and ensuring TorchScript compatibility of different Transformer configurations. In the following, we show an example of how to train a Transformer model with a Pointer Generator mechanism (Transformer-PG), which can copy words from the input.

How to build a Transformer model with a Pointer Generator mechanism

In this step-by-step guide, it is assumed the user has already installed PyTorch and Fairseq.

1. Create a vocabulary and extend it with source position markers:

These markers will allow the model to point to any word in the input sequence.

vocab_size=<vocab_size>
position_markers=512
export LC_ALL=C
cat train.src train.tgt |
  tr -s '[:space:]' '\n' |
  sort |
  uniq -c |
  sort -k1,1bnr -k2 |
  head -n "$((vocab_size - 4))" |
  awk '{ print $2 " " $1 }' > dict.pg.txt
python3 -c "[print('<unk-{}> 0'.format(n)) for n in range($position_markers)]" >> dict.pg.txt

This will create a file “dict.pg.txt” that contains the <vocab_size> most frequent words followed by 512 position markers named from “<unk-0>” to “<unk-511>”.

In case we have an input like

src = "Hello, I'm The Dogtor"

it could happen that our model has been trained without the word “Dogtor” in its vocabulary. Therefore, when we feed this sequence into the model, it should be converted to:

src = "Hello, I'm The <unk-3>"

Now, “<unk-3>” is part of our vocabulary and could be predicted by the model (this is where the pointer-generator comes in). In such a case, we will only need to post-process the output to replace “<unk-3>” by the word at input position 3.
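That post-processing step amounts to a simple token substitution. A minimal sketch of the idea (the example scripts bundled with Fairseq handle more details):

import re

def replace_position_markers(summary, source):
    # Replace each "<unk-N>" in the generated summary with the N-th source token.
    src_tokens = source.split()

    def substitute(match):
        position = int(match.group(1))
        return src_tokens[position] if position < len(src_tokens) else match.group(0)

    return re.sub(r"<unk-(\d+)>", substitute, summary)

print(replace_position_markers("Hello, I'm The <unk-3>", "Hello, I'm The Dogtor"))
# Output: Hello, I'm The Dogtor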

2. Preprocess the text data to replace unknown words with their positional markers:

We can use the scripts from https://github.com/pytorch/fairseq/tree/master/examples/pointer_generator.

# Considering we have our data in:
# train_src = /path/to/train.src
# train_tgt = /path/to/train.tgt
# valid_src = /path/to/valid.src
# valid_tgt = /path/to/valid.tgt
./preprocess.py --source /path/to/train.src \
                --target /path/to/train.tgt \
                --vocab <(cut -d' ' -f1 dict.pg.txt) \
                --source-out /path/to/train.pg.src \
                --target-out /path/to/train.pg.tgt

./preprocess.py --source /path/to/valid.src \
                --target /path/to/valid.tgt \
                --vocab <(cut -d' ' -f1 dict.pg.txt) \
                --source-out /path/to/valid.pg.src \
                --target-out /path/to/valid.pg.tgt

./preprocess.py --source /path/to/test.src \
                --vocab <(cut -d' ' -f1 dict.pg.txt) \
                --source-out /path/to/test.pg.src

3. Now let’s binarize the data, so that it can be processed faster:

fairseq-preprocess --task "translation" \
                   --source-lang "pg.src" \
                   --target-lang "pg.tgt" \
                   --trainpref /path/to/train \
                   --validpref /path/to/valid \
                   --srcdict dict.pg.txt \
                   --cpu \
                   --joined-dictionary \
                   --destdir <data_dir>

You might notice the type of task is “translation”. This is because there is no “summarization” task available; we could understand it as a kind of NMT task where the input and output languages are shared and the output (summary) is shorter than the input.

4. Now we can train the model:

# The last three arguments configure the Pointer Generator mechanism.
fairseq-train <data_dir> \
              --save-dir <model_dir> \
              --task "translation" \
              --source-lang "src" \
              --target-lang "tgt" \
              --arch "transformer_pointer_generator" \
              --max-source-positions 512 \
              --max-target-positions 128 \
              --truncate-source \
              --max-tokens 2048 \
              --required-batch-size-multiple 1 \
              --required-seq-len-multiple 8 \
              --share-all-embeddings \
              --dropout 0.1 \
              --criterion "cross_entropy" \
              --optimizer adam \
              --adam-betas '(0.9, 0.98)' \
              --adam-eps 1e-9 \
              --update-freq 4 \
              --lr 0.004 \
              --alignment-layer -1 \
              --alignment-heads 1 \
              --source-position-markers 512

This configuration makes use of features Nuance has contributed back to Fairseq:

  • Transformer with a Pointer Generator mechanism to facilitate copying of words from the input.
  • Sequence length padded to a multiple of 8 to better use tensor cores and reduce training time.

5. Now let’s take a look at how to generate a summary with our new medical report generation system:

import torch
from examples.pointer_generator.pointer_generator_src.transformer_pg import TransformerPointerGeneratorModel

# Patient-Doctor conversation
input = "[doctor] Lisa Simpson, thirty six year old female, presents to the clinic today because " \
        "she has severe right wrist pain"

# Load the model
model = TransformerPointerGeneratorModel.from_pretrained(data_name_or_path=<data_dir>,
                                                         model_name_or_path=<model_dir>,
                                                         checkpoint_file="checkpoint_best.pt")

result = model.translate([input], beam=2)

print(result[0])
# Output: Ms. <unk-2> is a 36-year-old female who presents to the clinic today for evaluation of her right wrist.

6. Alternatively, we can use fairseq-interactive and a postprocessing tool to substitute positional unknown tokens with the corresponding words from the input:

fairseq-interactive <data_dir> \
              --batch-size <batch_size> \
              --task translation \
              --source-lang src \
              --target-lang tgt \
              --path <model_dir>/checkpoint_last.pt \
              --input /path/to/test.pg.src \
              --buffer-size 20 \
              --max-len-a 0 \
              --max-len-b 128 \
              --beam 2 \
              --skip-invalid-size-inputs-valid-test | tee generate.out

grep "^H-" generate.out | cut -f 3- > generate.hyp

./postprocess.py \
        --source <(awk 'NF<512' /path/to/test.pg.src) \
        --target generate.hyp \
        --target-out generate.hyp.processed

Now we have the final set of reports in “generate.hyp.processed”, with “<unk-N>” replaced by the original word from the input sequence.

Model Deployment

PyTorch offers great flexibility in modeling and a rich surrounding ecosystem. However, while several recent articles have suggested that the use of PyTorch in research and academia may be close to surpassing TensorFlow, there seems to be an overall sense of TensorFlow being the preferred platform for deployment to production. Is this still the case in 2021? Teams looking to serve their PyTorch models in production have a few options.

Before describing our journey, let’s take a brief detour and define the term model.

Models as computation graphs

A few years back, it was still common for machine learning toolkits to support only particular classes of models of a rather fixed and rigid structure, with only a few degrees of freedom (like the kernel of a support vector machine or the number of hidden layers of a neural network). Inspired by foundational work in Theano, toolkits like Microsoft’s CNTK or Google’s TensorFlow were among the first to popularize a more flexible view on models, as computation graphs with associated parameters that can be estimated from data. This view blurred the boundaries between popular types of models (such as DNNs or SVMs), as it became easy to blend the characteristics of each into your type of graph. Still, such a graph had to be defined upfront before estimating its parameters, and it was pretty static. This made it easy to save models to a self-contained bundle, like a TensorFlow SavedModel (such a bundle simply contains the structure of the graph, as well as the concrete values of the estimated parameters). However, debugging such models can be difficult because the statements in the Python code that build the graph are logically separate from the lines that execute it. Researchers also long for easier ways of expressing dynamic behavior, such as the computation steps of the forward pass of a model being conditionally dependent on its input data (or its previous output).

Most recently, the above limitations have led to a second revolution spearheaded by PyTorch and TensorFlow 2. The computation graph is no longer defined explicitly. Instead, it will be populated implicitly as the Python code executes operations on tensor arguments. An essential technique that powers this development is automatic differentiation. As the computation graph is being built implicitly while executing the steps of the forward pass, all the necessary data will be tracked for later computation of the gradient concerning the model parameters. This allows for great flexibility in training a model, but it raises an important question. If the computation happening inside a model is only implicitly defined through our Python code’s steps as it executes concrete data, what is it that we want to save as a model? The answer – at least initially – was the Python code with all its dependencies, along with the estimated parameters. This is undesirable for practical reasons. For instance, there is a danger that the team working on model deployment does not exactly reproduce the Python code dependencies used during training, leading to subtly divergent behavior. The solution typically consists of combining two techniques, scripting and tracing, that is, extra annotations in your Python code and execution of your code on exemplary input data, allowing PyTorch to define and save the graph that should be executed during later inference on new, unseen data. This requires some discipline by whoever creates the model code (arguably voiding some of the original flexibility of eager execution), but it results in a self-contained model bundle in TorchScript format. The solution in TensorFlow 2 is remarkably similar.
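To make the scripting and tracing idea concrete, here is a toy example, unrelated to our report-generation models, that exports a self-contained TorchScript bundle:

import torch
import torch.nn as nn

class ThresholdedMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 4)

    def forward(self, x):
        y = self.linear(x)
        # Data-dependent control flow: tracing would bake in one branch,
        # so this module needs scripting to preserve the condition.
        if y.sum() > 0:
            return torch.relu(y)
        return y

scripted = torch.jit.script(ThresholdedMLP())                   # scripting keeps control flow
traced = torch.jit.trace(nn.Linear(8, 4), torch.randn(1, 8))    # tracing records ops on example input

scripted.save("thresholded_mlp.pt")                             # self-contained bundle
reloaded = torch.jit.load("thresholded_mlp.pt")                 # no Python model code needed
print(reloaded(torch.randn(2, 8)).shape)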

Serving our report generation models

Our journey in deploying the report generation models reflects the above discussion. We started out serving our models by deploying the model code and its dependencies along with the parameter checkpoints in a custom Docker image exposing a gRPC service interface. However, we soon noticed that it became error-prone to replicate the exact code and environment used by the modeling team while estimating the parameters. Moreover, this approach prevented us from leveraging high-performance model serving frameworks like NVIDIA’s Triton, which is written in C++ and requires self-contained models that can be used without a Python interpreter. At this stage, we were facing a choice between attempting to export our PyTorch models to ONNX or TorchScript format. ONNX is an open specification for representing machine learning models that increasingly finds adoption. It is powered by a high-performance runtime developed by Microsoft (ONNX Runtime). Working closely with the ONNX team at Microsoft, we discovered that some operations that our models require were not yet supported in ONNX. Consequently, we turned our attention to TorchScript, the mechanism more native to PyTorch. Through a combination of tracing and scripting, annotating our code where needed, we succeeded and obtained self-contained TorchScript models that Triton could serve. This improved our deployment path considerably. We no longer had to worry about the code dependencies and now had the option of using Triton for high-performance model serving on NVIDIA GPUs.

A maturing ecosystem

Is it all roses? No, it has been a rockier journey than we expected. We encountered what seems to be a memory leak in the MKL libraries used by PyTorch while serving the PyTorch code directly. We encountered deadlocks when trying to load multiple models from multiple threads. We had difficulties exporting our models to ONNX and TorchScript formats. Models would not work out of the box on hardware with multiple GPUs: they always accessed the particular GPU device on which they had been exported. We encountered excessive memory usage in the Triton inference server while serving TorchScript models, which turned out to be caused by automatic differentiation accidentally being enabled during the forward pass. However, the ecosystem keeps improving, and there is a helpful and vibrant open-source community eager to work with us to mitigate such issues. Finally, for those of us who require enterprise-level support, Microsoft now offers Premier Support for use of PyTorch on Azure.
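
To illustrate that last point, one common safeguard (a sketch of the general idea, not necessarily the exact fix we applied) is to ensure the exported forward pass never builds an autograd graph, for example by wrapping the model before scripting it:

```python
import torch

class InferenceWrapper(torch.nn.Module):
    """Wraps a trained model so its forward pass never tracks gradients."""

    def __init__(self, model: torch.nn.Module):
        super().__init__()
        self.model = model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # torch.no_grad() is supported in TorchScript, so the guard survives export.
        with torch.no_grad():
            return self.model(x)

# Hypothetical usage: wrap, script, and save the model for serving.
# scripted = torch.jit.script(InferenceWrapper(trained_model).eval())
# scripted.save("report_generator.pt")
```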

Where to go from here? For those who require the flexibility of serving PyTorch code directly, without the extra step of exporting self-contained models, it is worth pointing out that the TorchServe project now provides a way of bundling the code together with parameter checkpoints into a single servable archive, greatly reducing the risk of code and parameters drifting apart. To us, however, exporting models to TorchScript has proven beneficial. It provides a clear interface between modeling and deployment teams, and TorchScript further reduces latency when serving models on GPU via its just-in-time compilation engine.

Scaling at large and the future

Finally, efficient deployment to the cloud is about more than just computing the response of a single model instance efficiently. Flexibility is needed in managing, versioning and updating models. High-level scalability must be achieved via techniques such as load balancing, horizontal scaling and vertical scaling. If many models are involved, scale-to-zero quickly becomes a requirement, as it is unacceptable to pay for serving models that do not answer any requests. Providing such extra functionality on top of a low-level inference server like Triton is the job of an orchestration framework. To that end, after gaining some initial experience with KubeFlow, we decided to turn our attention to Azure ML, which provides similar functionality but integrates more deeply with the Azure platform, on which large parts of our technology stack already rely. This part of our journey has just begun.
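
To give a rough sketch of what that orchestration layer looks like in practice (using the Azure ML Python SDK; the workspace configuration, file names, cluster name, and resource sizes below are hypothetical placeholders), registering a TorchScript model and deploying it behind an autoscaling endpoint might look like this:

```python
from azureml.core import Environment, Model, Workspace
from azureml.core.compute import AksCompute
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AksWebservice

# Assumes an Azure ML workspace config file is available locally.
ws = Workspace.from_config()

# Register the exported TorchScript bundle under a versioned model name.
model = Model.register(
    workspace=ws,
    model_path="models/report_generator.pt",  # hypothetical path
    model_name="report-generator",
)

# Describe how to run the model: a scoring script plus a conda environment.
env = Environment.from_conda_specification("serving-env", "environment.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# Deploy to an existing AKS cluster with autoscaling enabled.
aks_target = AksCompute(ws, "inference-cluster")  # hypothetical cluster name
deployment_config = AksWebservice.deploy_configuration(
    cpu_cores=2, memory_gb=8, autoscale_enabled=True
)
service = Model.deploy(
    ws,
    "report-generator-svc",
    [model],
    inference_config,
    deployment_config,
    deployment_target=aks_target,
)
service.wait_for_deployment(show_output=True)
```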

Conclusion

Academia has long recognized that we are “standing on the shoulders of giants.” As Artificial Intelligence matures from a scientific discipline into a technology, the same spirit of collaboration that originally fueled its scientific foundation has carried over into the world of software engineering. Open-source enthusiasts join technology companies worldwide to build open software ecosystems that allow for new angles on solving some of the most pressing challenges of modern society. In this article, we’ve taken a look at Nuance’s Dragon Ambient eXperience, an AI-powered, voice-enabled solution that automatically documents patient care, reducing healthcare providers’ administrative burdens. Nuance DAX improves the patient-provider experience, reduces physician burnout, and improves financial outcomes. It brings back trust, joy, and humanity to the delivery of healthcare. Fairseq and PyTorch have proven to be an incredible platform for powering this AI technology, and in turn, Nuance has contributed back some of its innovations in this space. For further reading, we invite you to take a look at our recent ACL publication and the Nuance “What’s Next” blog.

Technique protects privacy when making online recommendations

Algorithms recommend products while we shop online or suggest songs we might like as we listen to music on streaming apps.

These algorithms work by using personal information like our past purchases and browsing history to generate tailored recommendations. The sensitive nature of such data makes preserving privacy extremely important, but existing methods for solving this problem rely on heavy cryptographic tools requiring enormous amounts of computation and bandwidth.

MIT researchers may have a better solution. They developed a privacy-preserving protocol that is so efficient it can run on a smartphone over a very slow network. Their technique safeguards personal data while ensuring recommendation results are accurate.

In addition to user privacy, their protocol minimizes the unauthorized transfer of information from the database, known as leakage, even if a malicious agent tries to trick a database into revealing secret information.

The new protocol could be especially useful in situations where data leaks could violate user privacy laws, like when a health care provider uses a patient’s medical history to search a database for other patients who had similar symptoms or when a company serves targeted advertisements to users under European privacy regulations.

“This is a really hard problem. We relied on a whole string of cryptographic and algorithmic tricks to arrive at our protocol,” says Sacha Servan-Schreiber, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and lead author of the paper that presents this new protocol.

Servan-Schreiber wrote the paper with fellow CSAIL graduate student Simon Langowski and their advisor and senior author Srinivas Devadas, the Edwin Sibley Webster Professor of Electrical Engineering. The research will be presented at the IEEE Symposium on Security and Privacy.

The data next door

The technique at the heart of algorithmic recommendation engines is known as a nearest neighbor search, which involves finding the data point in a database that is closest to a query point. Data points that are mapped nearby share similar attributes and are called neighbors.

These searches involve a server that is linked with an online database which contains concise representations of data point attributes. In the case of a music streaming service, those attributes, known as feature vectors, could be the genre or popularity of different songs.

To find a song recommendation, the client (user) sends a query to the server that contains a certain feature vector, like a genre of music the user likes or a compressed history of their listening habits. The server then provides the ID of a feature vector in the database that is closest to the client’s query, without revealing the actual vector. In the case of music streaming, that ID would likely be a song title. The client learns the recommended song title without learning the feature vector associated with it.
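
Setting privacy aside for a moment, the server-side computation is a plain nearest neighbor lookup over feature vectors. Here is a minimal sketch, with made-up song IDs and feature values, of what the protocol ultimately has to compute:

```python
import numpy as np

# Hypothetical database: each row is a feature vector, with a song ID per row.
song_ids = ["song_a", "song_b", "song_c"]
features = np.array([
    [0.9, 0.1, 0.0],   # e.g., mostly "rock"
    [0.1, 0.8, 0.1],   # e.g., mostly "jazz"
    [0.0, 0.2, 0.8],   # e.g., mostly "classical"
])

def nearest_neighbor_id(query: np.ndarray) -> str:
    """Return the ID of the database vector closest to the query (Euclidean)."""
    distances = np.linalg.norm(features - query, axis=1)
    return song_ids[int(np.argmin(distances))]

print(nearest_neighbor_id(np.array([0.2, 0.7, 0.1])))  # -> "song_b"
```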

“The server has to be able to do this computation without seeing the numbers it is doing the computation on. It can’t actually see the features, but still needs to give you the closest thing in the database,” says Langowski.

To achieve this, the researchers created a protocol that relies on two separate servers that access the same database. Using two servers makes the process more efficient and enables the use of a cryptographic technique known as private information retrieval. This technique allows a client to query a database without revealing what it is searching for, Servan-Schreiber explains.
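
To give a flavor of the two-server idea, here is the classic textbook two-server construction (shown only to illustrate the concept; it is not the protocol from the paper): the client splits its query index into two random-looking selection vectors, each server XORs together the rows its vector picks out, and the client XORs the two answers to recover exactly one row while neither server on its own learns which one.

```python
import secrets
import numpy as np

# Hypothetical replicated database: n rows of b bytes, held by both servers.
n, b = 8, 4
database = np.frombuffer(secrets.token_bytes(n * b), dtype=np.uint8).reshape(n, b)

def server_answer(db: np.ndarray, selection: np.ndarray) -> np.ndarray:
    """Each server XORs together the rows its selection vector picks out."""
    answer = np.zeros(db.shape[1], dtype=np.uint8)
    for row, selected in zip(db, selection):
        if selected:
            answer ^= row
    return answer

# The client wants row i without revealing i to either server.
i = 5
share_a = np.array([secrets.randbelow(2) for _ in range(n)], dtype=np.uint8)
share_b = share_a.copy()
share_b[i] ^= 1                               # the shares differ only at position i

answer_a = server_answer(database, share_a)   # computed by server A
answer_b = server_answer(database, share_b)   # computed by server B

recovered = answer_a ^ answer_b               # XOR cancels every row except row i
assert np.array_equal(recovered, database[i])
```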

Overcoming security challenges

But while private information retrieval is secure on the client side, it doesn’t provide database privacy on its own. The database returns a set of candidate vectors (possible nearest neighbors), which the client typically winnows down by brute force. Doing so, however, can reveal a lot about the database to the client. The additional privacy challenge is to prevent the client from learning those extra vectors.

The researchers employed a tuning technique that eliminates many of the extra vectors in the first place, and then used a different trick, which they call oblivious masking, to hide any additional data points except for the actual nearest neighbor. This efficiently preserves database privacy, so the client won’t learn anything about the feature vectors in the database.  

Once they designed this protocol, they tested it with a nonprivate implementation on four real-world datasets to determine how to tune the algorithm to maximize accuracy. Then, they used their protocol to conduct private nearest neighbor search queries on those datasets.

Their technique requires a few seconds of server processing time per query and less than 10 megabytes of communication between the client and servers, even with databases that contained more than 10 million items. By contrast, other secure methods can require gigabytes of communication or hours of computation time. With each query, their method achieved greater than 95 percent accuracy (meaning that nearly every time it found the actual approximate nearest neighbor to the query point). 

The techniques they used to enable database privacy will thwart a malicious client even if it sends false queries to try to trick the server into leaking information.

“A malicious client won’t learn much more information than an honest client following protocol. And it protects against malicious servers, too. If one deviates from protocol, you might not get the right result, but they will never learn what the client’s query was,” Langowski says.

In the future, the researchers plan to adjust the protocol so it can preserve privacy using only one server. This could enable it to be applied in more real-world situations, since it would not require the use of two noncolluding entities (which don’t share information with each other) to manage the database.  

“Nearest neighbor search undergirds many critical machine-learning driven applications, from providing users with content recommendations to classifying medical conditions. However, it typically requires sharing a lot of data with a central system to aggregate and enable the search,” says Bayan Bruss, head of applied machine-learning research at Capital One, who was not involved with this work. “This research provides a key step towards ensuring that the user receives the benefits from nearest neighbor search while having confidence that the central system will not use their data for other purposes.”

A Generalist Agent

Inspired by progress in large-scale language modelling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens.