Reducing Bias and Improving Safety in DALL·E 2

Today, we are implementing a new technique so that DALL·E generates images of people that more accurately reflect the diversity of the world’s population. This technique is applied at the system level when DALL·E is given a prompt describing a person that does not specify race or gender, like “firefighter.”

Based on our internal evaluation, users were 12× more likely to say that DALL·E images included people of diverse backgrounds after the technique was applied. We plan to improve this technique over time as we gather more data and feedback.


A photo of a CEO

In April, we started previewing the DALL·E 2 research to a limited number of people, which has allowed us to better understand the system’s capabilities and limitations and improve our safety systems.

During this preview phase, early users have flagged sensitive and biased images, which has helped inform and evaluate this new mitigation.

We are continuing to research how AI systems, like DALL·E, might reflect biases in their training data and different ways we can address them.

During the research preview we have taken other steps to improve our safety systems, including:

  • Minimizing the risk of DALL·E being misused to create deceptive content by rejecting image uploads containing realistic faces and attempts to create the likeness of public figures, including celebrities and prominent political figures.
  • Making our content filters more accurate so that they are more effective at blocking prompts and image uploads that violate our content policy while still allowing creative expression.
  • Refining automated and human monitoring systems to guard against misuse.

These improvements have helped us gain confidence in the ability to invite more users to experience DALL·E.

Expanding access is an important part of deploying AI systems responsibly because it allows us to learn more about real-world use and continue to iterate on our safety systems.



Living on the Edge: New Features for NVIDIA Fleet Command Deliver All-in-One Edge AI Management, Maintenance for Enterprises

NVIDIA Fleet Command — a cloud service for deploying, managing and scaling AI applications at the edge — today introduced new features that enhance the seamless management of edge AI deployments around the world.

With the scale of edge AI deployments, organizations can have up to thousands of independent edge locations that must be managed by IT teams — sometimes in far-flung locations like oil rigs, weather gauges, distributed retail stores or industrial facilities.

NVIDIA Fleet Command offers a simple, managed platform for container orchestration that makes it easy to provision and deploy AI applications and systems at thousands of distributed environments, all from a single cloud-based console.

But deployment is just the first step in managing AI applications at the edge. Optimizing these applications is a continuous process that involves applying patches, deploying new applications and rebooting edge systems.

To make these workflows seamless in a managed environment, Fleet Command now offers advanced remote management, multi-instance GPU provisioning and additional integrations with tools from industry collaborators.

Advanced Remote Management 

IT administrators now can access systems and applications with sophisticated security features. Remote management on Fleet Command offers access controls and timed sessions, eliminating vulnerabilities that come with traditional VPN connections. Administrators can securely monitor activity and troubleshoot issues at remote edge locations from the comfort of their offices.

Edge environments are extremely dynamic — which means administrators responsible for edge AI deployments need to be highly nimble to keep up with rapid changes and minimize deployment downtime. This makes remote management a critical feature for every edge AI deployment.

Check out a complete walkthrough of the new remote management features and how they can be used to help administrators maintain and optimize even the largest edge deployments.

Multi-Instance GPU Provisioning 

Multi-Instance GPU, or MIG, partitions an NVIDIA GPU into several independent instances. MIG is now available on Fleet Command, letting administrators easily assign applications to each instance from the Fleet Command user interface. By allowing organizations to run multiple AI applications on the same GPU, MIG lets organizations right-size their deployments and get the most out of their edge infrastructure.

Learn more about how administrators can use MIG in Fleet Command to better optimize edge resources to scale new workloads with ease.

Working Together to Expand AI

New Fleet Command collaborations are also helping enterprises create a seamless workflow, from development to deployment at the edge.

Domino Data Lab provides an enterprise MLOps platform that allows data scientists to collaboratively develop, deploy and monitor AI models at scale using their preferred tools, languages and infrastructure. The Domino platform’s integration with Fleet Command gives data science and IT teams a single system of record and consistent workflow with which to manage models deployed to edge locations.

Milestone Systems, a leading provider of video management systems and NVIDIA Metropolis elite partner, created AI Bridge, an application programming interface gateway that makes it easy to give AI applications access to consolidated video feeds from dozens of camera streams. Now integrated with Fleet Command, Milestone AI Bridge can be easily deployed to any edge location.

IronYun, an NVIDIA Metropolis elite partner and top-tier member of the NVIDIA Partner Network, with its Vaidio AI platform applies advanced AI, evolved over multiple generations, to security, safety and operational applications worldwide. Vaidio is an open platform that works with any IP camera and integrates out of the box with dozens of market-leading video management systems. Vaidio can be deployed on premises, in the cloud, at the edge and in hybrid environments. Vaidio scales from one to thousands of cameras. Fleet Command makes it easier to deploy Vaidio AI at the edge and simplifies management at scale.

With these new features and expanded collaborations, Fleet Command ensures that the day-to-day process of maintaining, monitoring and optimizing edge deployments is straightforward and painless.

Test Drive Fleet Command

To try these features on Fleet Command, check out NVIDIA LaunchPad for free.

LaunchPad provides immediate, short-term access to a Fleet Command instance to easily deploy and monitor real applications on real servers using hands-on labs that walk users through the entire process — from infrastructure provisioning and optimization to application deployment for use cases like deploying vision AI at the edge of a network.

Confidential Containers: Verifiably secure computation in the cloud

For many organizations, trusting their data to the cloud requires having a complete understanding of and control over the environment in which that data resides and how it’s being processed. Microsoft understands this, and we are committed to building a trustworthy cloud—one in which security, privacy, and transparency are built into its core. A key part of this vision is confidential computing—a set of hardware and software capabilities that give data owners visibility into the data environment and verifiable security protection of their data in use. 

The Confidential Computing team at Microsoft Research is collaborating with hardware developers to create trusted execution environments (TEEs), where data stays encrypted not just when stored (encryption at rest) and in transit, but also during use. This work underpins the Azure confidential cloud platform, where users can upload encrypted code and data and get encrypted results back with strong privacy. 

At Microsoft Build 2022, the company announced serverless confidential containers with lift-and-shift support, the next step in the evolution of confidential computing. This service builds on the Confidential Containers work conducted at Microsoft Research. Confidential Containers offers a verifiably secure container environment in Azure where users can confirm that the software performing computations on their data is exactly the software they expect to be running, that it will do what they want it to do with their data, and that they can trust the results it returns. Confidential Containers enables users to take existing container workloads, and with a small amount of configuration, use them in a confidential environment.

Smaller trusted computing base 

Confidential Containers decreases the size of the trusted computing base (TCB)—the totality of elements in a computing environment that must be trusted not to violate the confidentiality of computation. The TCB can include software, hardware, and human administrators, among other things. By removing elements from the TCB, the components that can be compromised are reduced, decreasing the attack surface. Confidential Containers removes Microsoft administrators from the TCB, minimizing it as much as possible while still enabling customers to run existing workloads without modifying them.

This reduced TCB provides an option for organizations that currently run computations on their data on premises because they are concerned about the security of their data in the cloud. Even though setting up a computation environment in the cloud offers flexibility, data can be exposed to anyone who operates the servers on which the system runs. With Confidential Containers, the individuals who can access the data can be tightly controlled. This can be a single designated employee of the organization that owns the data or the business partner that is processing the data. It is never a Microsoft employee or another third party. 

Encrypted, policy-constrained computing environment 

A secure hardware environment enables data protection in use. Confidential Containers runs on AMD processors backed by AMD Secure Encrypted Virtualization-Secure Nested Paging (SEV-SNP), which provides a TEE. This hardware-enforced security boundary provides a shield so that nothing outside the encrypted memory space can read the data.

Users of Confidential Containers create a policy defining precisely what can run in the confidential container environment and how. The AMD SEV-SNP hardware produces an attestation report, which provides a succinct representation of everything in the confidential environment, including information about the code that will be enforcing the policy. Users can request this attestation report any time before providing the container with a key to unlock the encrypted dataset for processing. 
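
To make the flow concrete, here is a purely illustrative Python sketch of the verify-then-release idea; it stands in for the hardware-signed attestation report with a plain hash and does not call any Azure or AMD SEV-SNP API:

import hashlib
import hmac
import os
from typing import Optional

def measure(policy_document: bytes) -> bytes:
    # The TEE reports a hash ("measurement") of the code and policy it runs;
    # in the real system this value is signed by the AMD SEV-SNP hardware.
    return hashlib.sha256(policy_document).digest()

def release_key_if_trusted(reported: bytes, expected: bytes, dataset_key: bytes) -> Optional[bytes]:
    # Hand over the decryption key only if the reported environment matches
    # exactly what the data owner approved in advance.
    if hmac.compare_digest(reported, expected):
        return dataset_key
    return None

# The data owner approves a policy ahead of time and records its measurement.
approved_policy = b'{"allowed_images": ["myregistry/analytics:1.2"]}'
expected = measure(approved_policy)

# The confidential container presents its measurement in an attestation report;
# only after verification is the key to the encrypted dataset released.
key = release_key_if_trusted(measure(approved_policy), expected, os.urandom(32))
assert key is not None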

Sensitive data handling in the cloud 

Before the development of HTTPS, businesses could not securely run a storefront on the public web because communication over the internet was not secure. In the same way, individuals and organizations today cannot securely run containerized computation over sensitive data in the public cloud. Confidential Containers addresses this need. 

This is a game-changer for organizations that must comply with local and international regulations on how sensitive data is handled. For example, healthcare organizations that store encrypted patient information in the cloud are required by HIPAA regulations to download that data to perform computations on premises. This multistep process entails decrypting the data once it has been downloaded to an organization’s servers, performing the required computations, and then re-encrypting the data before re-uploading it to the cloud. It also requires ensuring that the on-premises environment contains the security architecture necessary to comply with HIPAA and other regulations. 

Because Confidential Containers provides advanced security safeguards for data in use in Azure, organizations no longer need to perform these time-consuming steps. This also means they no longer need to maintain servers on premises. Moreover, Azure users can define even stricter policies for their container environment in the cloud than they have in place in their on-premises environment.

Secure multiparty computations 

Another benefit of Confidential Containers is that they enable secure multiparty computations. A single organization can securely process multiple datasets that contain sensitive information, or multiple organizations with datasets that must remain secure can share those datasets with the assurance that their data will not leak. Organizations can perform computations on multiple datasets, such as for training a machine learning model, and gain better results than they would from computations on a single dataset, all without knowing what is in those datasets. 

Easy deployment and lift-and-shift of Linux containers 

Creating a confidential container is straightforward for Azure users who are currently using or getting ready to use containers, requiring a small amount of configuration to move existing workloads. Linux users can easily lift-and-shift their Linux containers to Confidential Containers on Azure. 

Unlimited potential with Confidential Containers 

We believe that in the future, all computing in the cloud will be confidential, and we’re excited to share Confidential Containers—a technology that plays a role in making this happen. The capabilities it provides will have implications that we have yet to imagine. We’re particularly excited by the potential of multiparty computations. The ability to perform computations in a protected environment on multiple datasets brings limitless possibilities, unlocking great value for Azure users. 

Confidential Containers is currently available for limited preview and will be available for public preview later this year. Sign up for the Confidential Containers preview. 

CORSAIR Integrates NVIDIA Broadcast’s Audio, Video AI Features in iCUE and Elgato Software This Week ‘In the NVIDIA Studio’

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology accelerates creative workflows. 

Technology company CORSAIR and streaming sensation BigCheeseKIT step In the NVIDIA Studio this week.

A leader in high-performance gear and systems for gamers, content creators and PC enthusiasts, CORSAIR has integrated NVIDIA Broadcast technologies into its hardware and iCUE software. Similar AI enhancements have also been added to Elgato’s audio and video software, Wave Link and Camera Hub.

Powerful Broadcast AI audio and video effects transform content creation stations into home studios.

Creators and gamers with GeForce RTX GPUs can benefit from NVIDIA Broadcast’s AI enhancements to CORSAIR and Elgato microphones and cameras, elevating their live streams, voice chats and video conference calls.

Plus, entertainer Jakeem Johnson, better known by his Twitch name BigCheeseKIT, demonstrates how a GeForce RTX 3080 GPU elevates his thrilling streams with AI-powered benefits.

Advanced AI Audio

The NVIDIA Broadcast integration enables AI-powered noise removal and room echo removal audio features in CORSAIR iCUE and Elgato Wave Link that unlock new levels of clarity and sharpness for an exceptional audio experience.

 

Noise removal in Elgato Wave Link is built as a virtual studio technology (VST) plugin, enabling users to apply the effect per audio channel, and is supported in compatible creative apps such as Adobe Premiere Pro, Audition and Blackmagic DaVinci Resolve.

Running on Tensor Cores on GeForce RTX GPUs, the new features use AI to identify users’ voices, separating them from other ambient sounds. This results in noise cancellation that dramatically improves audio and video call quality. Background noises from fans, chatter, pets and more disappear, leaving the speaker’s voice crystal clear.

Broadcast also cancels room echoes, providing dampened, studio-quality acoustics in a wide range of environments without the need to sound-proof walls or ceilings.

CORSAIR’s integration uses a new version of these effects that can also separate out body sounds. This upgrade enables popular capabilities, like muting the friend who forgets to turn on push-to-talk during a video call while they chew their lunch.

AI audio effects are ready to be integrated into nearly the entire lineup of CORSAIR headsets.

These Broadcast features are available on nearly the entire lineup of CORSAIR headsets. Users seeking a premium audio experience should consider headsets like the VOID RGB ELITE WIRELESS with 7.1 surround sound, HS80 RGB WIRELESS with spatial audio or the VIRTUOSO RGB WIRELESS SE.

The Elgato Wave XLR unlocks AI audio effects.

For Elgato creators, noise removal can now be enabled in the Wave Link app. This makes AI-enhanced audio possible for Wave mic users, plus XLR microphones thanks to the Elgato Wave XLR.

Unrestr(AI)ned Video Effects

NVIDIA Broadcast’s video technologies integrated into the Elgato Camera Hub include the virtual background feature.

The ‘background replacement’ AI video feature.

AI-enhanced filters powered by GeForce RTX GPUs offer better edge detection to produce a high-quality visual — much like those produced by a DSLR camera — using just a webcam. Supported effects include background blur and replacing the background with a video or still image, eliminating the need for a greenscreen.

 

Background blur and background replacement are now available in Elgato Camera Hub. Creators can apply AI video effects with Facecam, or their studio camera using Cam Link 4K.

Set Up for Streaming Success

Accessing these NVIDIA Broadcast technologies is fast and simple.

If an eligible CORSAIR headset or the ST100 headset stand is recognized by iCUE, it will automatically prompt installation of the NVIDIA Broadcast Audio Effects.

Elgato Camera Hub now features a new Effects tab. Once selected, users will be prompted to download and install Broadcast Video Effects. For Elgato Wave Link, creators will first need to install the Broadcast Audio Effects, followed by the new noise removal VST.

After installation, Broadcast options will appear within iCUE, Wave Link and Camera Hub.

Check out the installation instructions and FAQ.

Broadcast features require GeForce RTX GPUs that can be found in the latest NVIDIA Studio laptops and desktops. These purpose-built systems feature vivid color displays, along with blazing-fast memory and storage to boost streams and all creative work.

Pick up an NVIDIA Studio system today to turn streams into dreams.

Stream Like a Boss In the NVIDIA Studio

If there’s one thing BigCheeseKIT encapsulates, it’s energy.

BigCheeseKIT enjoyed early success as a Golden Joystick award nominee, serving as an ambassador for Twitch and Norton Gaming. He said that the highlight of his career, undoubtedly, was joining T-Pain’s exclusive gaming label, Nappy Boy Gaming.

A natural entertainer, BigCheeseKIT’s presence, gaming knowledge and authenticity dazzle his 60,000+ subscribers. Powered by his GeForce RTX 3080 GPU and live-streaming optimizations from NVIDIA Studio, such as better performance in OBS Studio, BigCheeseKIT has the resources and know-how to host professional streams.

“It’s like having my own television channel, and I’m the host or entertainer,” said the artist.

BigCheeseKIT streams exclusively with OBS Studio, benefitting massively from the dedicated GPU-based NVIDIA Studio encoder (NVENC), which enables seamless streaming with maximum performance.

“Using NVENC with my live streams makes my quality 20x better,” said BigCheeseKIT. “I can definitely see the difference.”

“Quality and consistency,” BigCheeseKIT noted. “NVIDIA hasn’t failed me.”

OBS Studio’s advanced GPU-accelerated encoding also unlocks higher video quality for streaming and recorded videos. Once he started using it, BigCheeseKIT’s system immediately became built to broadcast.

For on-demand videos, BigCheeseKIT prefers to edit using VEGAS Pro. MAGIX’s professional video editing software takes advantage of GPU-accelerated video effects while using NVENC for faster encoding. Overall, the artist said that his creative workflow — charged by his GPU — became faster and easier, saving valuable time.

For aspiring streamers, BigCheeseKIT offered these words of wisdom: “Stream like everyone is watching. Be yourself, have fun and don’t let negativity get to you.”

Nappy Boy Gaming’s newest member: BigCheeseKIT.

Head over to BigCheeseKIT’s Twitch channel to subscribe, learn more and check out his videos.

NVIDIA Broadcast and the SDKs behind it — which enable third-party integrations like the ones described above — are part of the NVIDIA Studio tools that include AI-powered software and NVIDIA Studio Drivers.

Follow NVIDIA Studio on Instagram, Twitter and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by signing up for the NVIDIA Studio newsletter.

Perceiver AR: general-purpose, long-context autoregressive generation

We develop Perceiver AR, an autoregressive, modality-agnostic architecture which uses cross-attention to map long-range inputs to a small number of latents while also maintaining end-to-end causal masking. Perceiver AR can directly attend to over a hundred thousand tokens, enabling practical long-context density estimation without the need for hand-crafted sparsity patterns or memory mechanisms.
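
As a rough illustration of the core mechanism, the toy numpy sketch below shows a small set of latent queries cross-attending over a much longer input under a causal mask; it deliberately omits the learned projections, multi-head attention and self-attention stack of the actual model:

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Toy dimensions: a long input is summarized by a handful of latents.
n_inputs, n_latents, d = 1024, 16, 32
inputs = np.random.randn(n_inputs, d)   # long input sequence
latents = inputs[-n_latents:]           # queries taken from the final positions

# Cross-attention: latent queries attend over the full input, so the cost grows
# with n_inputs * n_latents rather than n_inputs ** 2.
scores = latents @ inputs.T / np.sqrt(d)              # (n_latents, n_inputs)

# Causal mask: each latent (aligned with one of the final input positions)
# may only attend to positions at or before its own.
positions = np.arange(n_inputs)
mask = positions[None, :] > positions[-n_latents:][:, None]
scores = np.where(mask, -1e9, scores)

summary = softmax(scores) @ inputs                    # (n_latents, d)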

Build a news-based real-time alert system with Twitter, Amazon SageMaker, and Hugging Face

Today, social media is a huge source of news. Users rely on platforms like Facebook and Twitter to consume news. For certain industries such as insurance companies, first responders, law enforcement, and government agencies, being able to quickly process news about relevant events can help them take action while those events are still unfolding.

It’s not uncommon for organizations trying to extract value from text data to look for a solution that doesn’t involve the training of a complex NLP (natural language processing) model. For those organizations, using a pre-trained NLP model is more practical. Furthermore, if the chosen model doesn’t satisfy their success metrics, organizations want to be able to easily pick another model and reassess.

At present, it’s easier than ever to extract information from text data thanks to the following:

  • The rise of state-of-the-art, general-purpose NLP architectures such as transformers
  • The ability that developers and data scientists have to quickly build, train, and deploy machine learning (ML) models at scale on the cloud with services like Amazon SageMaker
  • The availability of thousands of pre-trained NLP models in hundreds of languages and with support for multiple frameworks provided by the community in platforms like Hugging Face Hub

In this post, we show you how to build a real-time alert system that consumes news from Twitter and classifies the tweets using a pre-trained model from the Hugging Face Hub. You can use this solution for zero-shot classification, meaning you can classify tweets into virtually any set of categories, and deploy the model with SageMaker for real-time inference.

Alternatively, if you’re looking to gain insights into your customers’ conversations and deepen brand awareness by analyzing social media interactions, we encourage you to check out the AI-Driven Social Media Dashboard. The solution uses Amazon Comprehend, a fully managed NLP service that uncovers valuable insights and connections in text without requiring machine learning experience.

Zero-shot learning

The fields of NLP and natural language understanding (NLU) have rapidly evolved to address use cases involving text classification, question answering, summarization, text generation, and more. This evolution has been possible, in part, thanks to the rise of state-of-the-art, general-purpose architectures such as transformers, but also thanks to the availability of more and better-quality text corpora for training such models.

The transformer architecture is a complex neural network that requires domain expertise and a huge amount of data in order to be trained from scratch. A common practice is to take a pre-trained state-of-the-art transformer like BERT, RoBERTa, T5, GPT-2, or DistilBERT and fine-tune (transfer learning) the model to a specific use case.

Nevertheless, even performing transfer learning on a pre-trained NLP model can often be a challenging task, requiring large amounts of labeled text data and a team of experts to curate the data. This complexity prevents most organizations from using these models effectively, but zero-shot learning helps ML practitioners and organizations overcome this shortcoming.

Zero-shot learning is a specific ML task in which a classifier learns on one set of labels during training, and then during inference is evaluated on a different set of labels that the classifier has never seen before. In NLP, you can use a zero-shot sequence classifier trained on a natural language inference (NLI) task to classify text without any fine-tuning. In this post, we use the popular NLI BART model bart-large-mnli to classify tweets. This is a large pre-trained model (1.6 GB), available on the Hugging Face model hub.
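
As a point of reference, the following is a minimal local sketch of zero-shot classification with this model using the Hugging Face transformers pipeline; the tweet text and labels here are illustrative, and the solution in this post hosts the same model behind a SageMaker endpoint instead:

from transformers import pipeline

# Downloads facebook/bart-large-mnli (about 1.6 GB) on first use.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

tweet = "Amazon SageMaker now supports serverless inference endpoints."
labels = ["security", "database", "compute", "storage", "machine learning"]

# With multi-label scoring disabled (the default), the scores across labels sum to 1.
result = classifier(tweet, candidate_labels=labels)
print(result["labels"][0], result["scores"][0])  # top category and its confidence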

Hugging Face is an AI company that manages an open-source platform (Hugging Face Hub) with thousands of pre-trained NLP models (transformers) in more than 100 different languages and with support for different frameworks such as TensorFlow and PyTorch. The transformers library helps developers and data scientists get started in complex NLP and NLU tasks such as classification, information extraction, question answering, summarization, translation, and text generation.

AWS and Hugging Face have been collaborating to simplify and accelerate the adoption of NLP models. A set of Deep Learning Containers (DLCs) for training and inference in PyTorch or TensorFlow, and Hugging Face estimators and predictors for the SageMaker Python SDK are now available. These capabilities help developers with all levels of expertise get started with NLP easily.

Overview of solution

We provide a working solution that fetches tweets in real time from selected Twitter accounts. For the demonstration of our solution, we use three accounts, Amazon Web Services (@awscloud), AWS Security (@AWSSecurityInfo), and Amazon Science (@AmazonScience), and classify their content into one of the following categories: security, database, compute, storage, and machine learning. If the model returns a category with a confidence score greater than 40%, a notification is sent.

In the following example, the model classified a tweet from Amazon Web Services in the machine learning category, with a confidence score of 97%, generating an alert.
Outline of the solution
The solution relies on a Hugging Face pre-trained transformer model (from the Hugging Face Hub) to classify tweets based on a set of labels that are provided at inference time—the model doesn’t need to be trained. The following screenshots show more examples and how they were classified.
Some relevant examples
We encourage you to try the solution for yourself. Simply download the source code from the GitHub repository and follow the deployment instructions in the README file.

Solution architecture

The solution keeps an open connection to Twitter’s endpoint and, when a new tweet arrives, sends a message to a queue. A consumer reads messages from the queue, calls the classification endpoint, and, depending on the results, notifies the end user.

The following is the architecture diagram of the solution.
Scope of the solution
The solution workflow consists of the following components:

  1. The solution relies on Twitter’s Stream API to get tweets that match the configured rules (tweets from the accounts of interest) in real time. To do so, an application running inside a container keeps an open connection to Twitter’s endpoint. Refer to Twitter API for more details.
  2. The container runs on Amazon Elastic Container Service (Amazon ECS), a fully managed container orchestration service that makes it easy for you to deploy, manage, and scale containerized applications. A single task runs on a serverless infrastructure managed by AWS Fargate.
  3. The Twitter Bearer token is securely stored in AWS Systems Manager Parameter Store, a capability of AWS Systems Manager that provides secure, hierarchical storage for configuration data and secrets. The container image is hosted on Amazon Elastic Container Registry (Amazon ECR), a fully managed container registry offering high-performance hosting.
  4. Whenever a new tweet arrives, the container application puts the tweet into an Amazon Simple Queue Service (Amazon SQS) queue. Amazon SQS is a fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications.
  5. The logic of the solution resides in an AWS Lambda function. Lambda is a serverless, event-driven compute service. The function consumes new tweets from the queue and classifies them by calling an endpoint.
  6. The endpoint relies on a Hugging Face model and is hosted on SageMaker. The endpoint runs the inference and outputs the class of the tweet.
  7. Depending on the classification, the function generates a notification through Amazon Simple Notification Service (Amazon SNS), a fully managed messaging service. You can subscribe to the SNS topic, and multiple destinations can receive that notification (see Amazon SNS event destinations). For instance, you can deliver the notification to inboxes as email messages (see Email notifications).
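
As a minimal sketch of step 7 (the topic ARN, message format, and function name are illustrative, not taken from the repository), the Lambda function can publish the alert with boto3:

import boto3

sns = boto3.client("sns")

def notify(tweet_text, category, score, topic_arn):
    # Publish an alert to the SNS topic; every subscribed destination
    # (for example, an email inbox) receives the notification.
    sns.publish(
        TopicArn=topic_arn,
        Subject=f"New tweet classified as {category}",
        Message=f"Category: {category} (confidence {score:.0%})\n\n{tweet_text}",
    )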

Deploy Hugging Face models with SageMaker

You can select any of the over 10,000 publicly available models from the Hugging Face Model Hub and deploy them with SageMaker by using Hugging Face Inference DLCs.

When using AWS CloudFormation, you select one of the publicly available Hugging Face Inference Containers and configure the model and the task. This solution uses the facebook/bart-large-mnli model and the zero-shot-classification task, but you can choose any of the models under Zero-Shot Classification on the Hugging Face Model Hub. You configure those by setting the HF_MODEL_ID and HF_TASK environment variables in your CloudFormation template, as in the following code:

SageMakerModel:
  Type: AWS::SageMaker::Model
  Properties:
    ExecutionRoleArn: !GetAtt SageMakerModelRole.Arn
    PrimaryContainer:
      Image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.7-transformers4.6-cpu-py36-ubuntu18.04
      Environment:
        HF_MODEL_ID: facebook/bart-large-mnli
        HF_TASK: zero-shot-classification
        SAGEMAKER_CONTAINER_LOG_LEVEL: 20
        SAGEMAKER_REGION: us-east-1

Alternatively, if you’re not using AWS CloudFormation, you can achieve the same results with a few lines of code. Refer to Deploy models to Amazon SageMaker for more details.
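
For illustration, the following sketch shows how that could look with the SageMaker Python SDK; the role ARN, instance type, and container versions are assumptions that you would adjust to your account and to the inference DLC you choose:

from sagemaker.huggingface import HuggingFaceModel

# Same model and task that the CloudFormation template configures.
huggingface_model = HuggingFaceModel(
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # assumed execution role
    env={
        "HF_MODEL_ID": "facebook/bart-large-mnli",
        "HF_TASK": "zero-shot-classification",
    },
    transformers_version="4.6",  # assumed versions matching the inference container above
    pytorch_version="1.7",
    py_version="py36",
)

# Deploy a real-time endpoint; choose an instance type that fits your latency needs.
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)
print(predictor.endpoint_name)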

To classify the content, you just call the SageMaker endpoint. The following is a Python code snippet:

import json
import os

import boto3

# Runtime client used to call the SageMaker endpoint.
sagemaker = boto3.client('sagemaker-runtime')

endpoint_name = os.environ['ENDPOINT_NAME']
labels = os.environ['LABELS']  # assumed variable holding the candidate labels (for example, a comma-separated list)

data = {
    'inputs': tweet,
    'parameters': {
        'candidate_labels': labels,
        'multi_class': False
    }
}

response = sagemaker.invoke_endpoint(EndpointName=endpoint_name,
                                     ContentType='application/json',
                                     Body=json.dumps(data))

response_body = json.loads(response['Body'].read())

Note the False value for the multi_class parameter, which indicates that the probabilities for all the classes sum to 1.

Solution improvements

You can enhance the solution proposed here by storing the tweets and the model results. Amazon Simple Storage Service (Amazon S3), an object storage service, is one option. You can write tweets, results, and other metadata as JSON objects into an S3 bucket. You can then perform ad hoc queries against that content using Amazon Athena, an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.
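
As an illustrative sketch (the bucket name, key layout, and helper function are assumptions), each processed tweet and its classification could be written to Amazon S3 like this:

import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")

def store_result(tweet, classification, bucket="news-alerts-archive"):
    # Persist the tweet, the model output, and a timestamp as a JSON object
    # so the history can later be queried ad hoc with Amazon Athena.
    record = {
        "tweet": tweet,
        "classification": classification,
        "processed_at": datetime.now(timezone.utc).isoformat(),
    }
    s3.put_object(Bucket=bucket,
                  Key=f"tweets/{tweet['id']}.json",
                  Body=json.dumps(record))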

You can use the history not only to extract insights but also to train a custom model. You can use Hugging Face support to train a model with your own data with SageMaker. Learn more on Run training on Amazon SageMaker.

Real-world use cases

Customers are already experimenting with Hugging Face models on SageMaker. Seguros Bolívar, a Colombian financial and insurance company founded in 1939, is an example.

“We developed a threat notification solution for customers and insurance brokers. We use Hugging Face pre-trained NLP models to classify tweets from relevant accounts to generate notifications for our customers in near-real time as a prevention strategy to help mitigate claims. A claim occurs because customers are not aware of the level of risk they are exposed to. The solution allows us to generate awareness in our customers, turning risk into something measurable in concrete situations.”

– Julian Rico, Chief of Research and Knowledge at Seguros Bolívar.

Seguros Bolívar worked with AWS to re-architect their solution; it now relies on SageMaker and resembles the one described in this post.

Conclusion

Zero-shot classification is ideal when you have little data to train a custom text classifier or when you can’t afford to train a custom NLP model. For specialized use cases, when text is based on specific words or terms, it’s better to go with a supervised classification model based on a custom training set.

In this post, we showed you how to build a news classifier using a Hugging Face zero-shot model on AWS. We used Twitter as our news source, but you can choose a news source that is more suitable to your specific needs. Furthermore, you can easily change the model; simply specify your chosen model in the CloudFormation template.

For the source code, refer to the GitHub repository. It includes the full setup instructions. You can clone, change, deploy, and run it yourself. You can also use it as a starting point and customize the categories and the alert logic or build another solution for a similar use case.

Please give it a try, and let us know what you think. As always, we’re looking forward to your feedback. You can send it to your usual AWS Support contacts, or in the AWS Forum for SageMaker.


About the authors

David Laredo is a Prototyping Architect at AWS Envision Engineering in LATAM, where he has helped develop multiple machine learning prototypes. Previously, he worked as a machine learning engineer and has been doing machine learning for over 5 years. His areas of interest are NLP, time series, and end-to-end ML.

Rafael Werneck is a Senior Prototyping Architect at AWS Envision Engineering, based in Brazil. Previously, he worked as a Software Development Engineer on Amazon.com.br and Amazon RDS Performance Insights.

Vikram Elango is an AI/ML Specialist Solutions Architect at Amazon Web Services, based in Virginia, USA. Vikram helps financial and insurance industry customers with design and thought leadership to build and deploy machine learning applications at scale. He is currently focused on natural language processing, responsible AI, inference optimization, and scaling ML across the enterprise. In his spare time, he enjoys traveling, hiking, cooking, and camping with his family.
