Scaling Rufus, the Amazon generative AI-powered conversational shopping assistant with over 80,000 AWS Inferentia and AWS Trainium chips, for Prime Day

Amazon Rufus is a shopping assistant experience powered by generative AI. It generates answers using relevant information from across Amazon and the web, letting customers shop alongside a generative AI-powered expert that knows Amazon’s selection inside and out and brings it together with information from across the web to help them make better, more informed purchase decisions.

To meet the needs of Amazon customers at scale, Rufus required a low-cost, performant, and highly available inference infrastructure. The solution needed to serve multi-billion-parameter large language models (LLMs) with low latency across the world to its expansive customer base. Low latency makes sure users have a positive experience chatting with Rufus and can start getting responses in less than a second. To achieve this, the Rufus team is using multiple AWS services along with AWS AI chips, AWS Trainium and AWS Inferentia.

Inferentia and Trainium are purpose-built chips developed by AWS that accelerate deep learning workloads with high performance and lower overall costs. With these chips, Rufus achieved costs 4.5 times lower than other evaluated solutions while maintaining low latency for its customers. In this post, we dive into the Rufus inference deployment using AWS chips and how this enabled one of the most demanding events of the year, Amazon Prime Day.

Solution overview

At its core, Rufus is powered by an LLM trained on Amazon’s product catalog and information from across the web. LLM deployment can be challenging, requiring you to balance factors such as model size, model accuracy, and inference performance. Larger models generally have better knowledge and reasoning capabilities but come at a higher cost due to more demanding compute requirements and increased latency. Rufus needed to be deployed and scaled to meet the tremendous demand of peak events like Amazon Prime Day. Considerations for this scale included how well it needed to perform, its environmental impact, and the cost of hosting the solution. To meet these challenges, Rufus used a combination of AWS solutions: Inferentia2 and Trainium, Amazon Elastic Container Service (Amazon ECS), and Application Load Balancer (ALB). In addition, the Rufus team partnered with NVIDIA to power the solution using NVIDIA’s Triton Inference Server, providing capabilities to host the model using AWS chips.

Rufus inference is a Retrieval Augmented Generation (RAG) system with responses enhanced by retrieving additional information such as product information from Amazon search results. These results are based on the customer query, making sure the LLM generates reliable, high-quality, and precise responses.

To make sure Rufus was best positioned for Prime Day, the Rufus team built a heterogeneous inference system using multiple AWS Regions powered by Inferentia2 and Trainium. Building a system across multiple Regions allowed Rufus to benefit in two key areas. First, it provided additional capacity that could be used during times of high demand, and second, it improved the overall resiliency of the system.

The Rufus team was also able to use both Inf2 and Trn1 instance types. Because Inf2 and Trn1 instance types use the same AWS Neuron SDK, the Rufus team was able to use both instances to serve the same Rufus model. The only configuration setting to adjust was the tensor parallelism degree (24 for Inf2, 32 for Trn1). Using Trn1 instances also led to an additional 20% latency reduction and throughput improvement compared to Inf2.
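To illustrate how small that configuration difference is, the following is a minimal sketch of setting the tensor parallelism degree when loading a model with vLLM. The model identifier, the Neuron device setting, and the sequence limit are assumptions for illustration, not the Rufus production configuration.

# Hypothetical sketch: serve the same model on Inf2 or Trn1 by changing only
# the tensor parallelism degree. Model name and limits are placeholders.
from vllm import LLM, SamplingParams

TP_DEGREE = {"inf2": 24, "trn1": 32}  # degrees cited in this post

def build_engine(instance_family: str) -> LLM:
    return LLM(
        model="example-org/shopping-assistant-llm",  # placeholder model ID
        device="neuron",              # assumes a Neuron-enabled vLLM build
        tensor_parallel_size=TP_DEGREE[instance_family],
        max_num_seqs=32,              # illustrative concurrent sequence limit
    )

engine = build_engine("trn1")
outputs = engine.generate(
    ["What should I look for in trail running shoes?"],
    SamplingParams(max_tokens=256),
)

Because the Neuron SDK abstracts the hardware differences, the same engine definition can serve on either instance family once the parallelism degree matches the available NeuronCores.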

The following diagram illustrates the solution architecture.

To support real-time traffic routing across multiple Regions, Rufus built a novel traffic orchestrator. Amazon CloudWatch supported the underlying monitoring, helping the team adjust the traffic ratio across the different Regions in less than 15 minutes based on the traffic pattern changes. By using this type of orchestration, the Rufus team had the ability to direct requests to other Regions when needed, with a small trade-off of latency to the first token. Due to Rufus’s streaming architecture and the performant AWS network between Regions, the perceived latency was minimal for end-users.

These choices allowed Rufus to scale to over 80,000 Trainium and Inferentia chips across three Regions, serving an average of 3 million tokens a minute while maintaining a P99 latency of less than 1 second to the first response for Prime Day customers. In addition, by using these purpose-built chips, Rufus achieved 54% better performance per watt than other evaluated solutions, which helped the Rufus team meet energy efficiency goals.

Optimizing inference performance and host utilization

Within each Region, the Rufus inference system used Amazon ECS, which managed the underlying Inferentia- and Trainium-powered instances. Because Amazon ECS managed the underlying infrastructure, the Rufus team only needed to bring its container and configuration by defining an ECS task. Within each container, an NVIDIA Triton Inference Server with a Python backend runs vLLM with the Neuron SDK. vLLM is a memory-efficient inference and serving engine that is optimized for high throughput. The Neuron SDK makes it straightforward for teams to adopt AWS chips and supports many different libraries and frameworks such as PyTorch Lightning.
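As a rough sketch of how a Python backend model plugs into Triton, the following skeleton shows the structure such a model file (model.py) follows. The tensor names and the placeholder generation step are assumptions; the actual Rufus backend wraps vLLM with the Neuron SDK.

# Minimal Triton Python backend skeleton; an inference engine such as vLLM
# with the Neuron SDK would be created in initialize() and called in execute().
import json
import numpy as np
import triton_python_backend_utils as pb_utils

class TritonPythonModel:
    def initialize(self, args):
        # Parse the Triton model configuration; an engine would be built here.
        self.model_config = json.loads(args["model_config"])

    def execute(self, requests):
        responses = []
        for request in requests:
            # "prompt" and "generated_text" are assumed tensor names.
            prompt = pb_utils.get_input_tensor_by_name(request, "prompt")
            text = prompt.as_numpy()[0].decode("utf-8")
            generated = f"echo: {text}"  # placeholder for the engine call
            out = pb_utils.Tensor(
                "generated_text",
                np.array([generated.encode("utf-8")], dtype=object),
            )
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses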

The Neuron SDK provides a straightforward LLM inference solution on Trainium and Inferentia hardware with optimized performance, supporting a wide range of transformer-based LLM architectures. To reduce latency, Rufus has collaborated with the AWS Annapurna team to develop various optimizations such as INT8 (weight-only) quantization, continuous batching with vLLM, and resource, compute, and memory bandwidth optimizations in the Neuron compiler and runtime. These optimizations are currently deployed in Rufus production and are available to use in the Neuron SDK 2.18 and onward.

To reduce the overall time customers wait before they start seeing a response from Rufus, the team also developed an inference streaming architecture. With the high compute and memory load needed for LLM inference, the total time to finish generating the full response for a customer query can be multiple seconds. With a streaming architecture, Rufus is able to return tokens right after they’re generated. This optimization allows the customer to start consuming the response in less than 1 second. In addition, multiple services work together using gRPC connections to intelligently aggregate and enhance the streaming response in real time for customers.
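The following minimal sketch illustrates the streaming pattern: tokens are yielded to the caller as soon as they are produced instead of buffering the full response. The token source and the service wiring are hypothetical; in production, this would be the server side of a gRPC streaming call.

# Illustrative token streaming: flush each token as soon as it is generated.
from typing import Iterator

def generate_tokens(prompt: str) -> Iterator[str]:
    # Stand-in for an LLM engine that produces tokens incrementally.
    for token in ["Trail", " running", " shoes", " should", " fit", " snugly", "."]:
        yield token

def stream_response(prompt: str) -> Iterator[str]:
    # Downstream services aggregate and enhance this stream in real time.
    for token in generate_tokens(prompt):
        yield token  # the client can render each chunk immediately

for chunk in stream_response("What should I look for in trail running shoes?"):
    print(chunk, end="", flush=True)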

As shown in the following figure, images and links are embedded in the response, which allow customers to engage and continue exploring with Rufus.

Scaling up

Although we have to maintain low latency for the best customer experience, it’s also crucial to scale the service throughput by achieving high hardware resource utilization. High hardware utilization makes sure accelerators don’t sit idle and needlessly increase costs. To optimize the inference system throughput, the team improved both single-host throughput as well as load balancing efficiency.

Load balancing for LLM inference is tricky due to the following challenges. First, a single host can only handle a limited number of concurrent requests. Second, the end-to-end latency to complete one request can vary, spanning many seconds depending on the LLM response length.

To address the challenges, the team optimized throughput by considering both single-host throughput and throughput across many hosts using load balancing.

The team used the least outstanding requests (LOR) routing algorithm from ALB, increasing throughput by five times in comparison to an earlier baseline measurement. This allows each host to have enough time to process in-flight requests and stream back responses using a gRPC connection, without getting overwhelmed by multiple requests received at the same time. Rufus also collaborated with AWS and vLLM teams to improve single-host concurrency using vLLM integration with the Neuron SDK and NVIDIA Triton Inference Server.
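For reference, the LOR algorithm is controlled by a target group attribute on the ALB. The following sketch enables it with the AWS SDK for Python (boto3); the target group ARN is a placeholder.

# Switch an ALB target group from round robin to least outstanding requests.
import boto3

elbv2 = boto3.client("elbv2")
elbv2.modify_target_group_attributes(
    TargetGroupArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/inference-fleet/abc1234567890",
    Attributes=[
        {"Key": "load_balancing.algorithm.type", "Value": "least_outstanding_requests"},
    ],
)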

Figure 1. ECS tasks scale horizontally hosting the Triton Inference Server and dependencies

With this integration, Rufus was able to benefit from a critical optimization: continuous batching. Continuous batching allows a single host to greatly increase throughput. In addition, continuous batching provides unique capabilities in comparison to other batch techniques, such as static batching. For example, when using static batching, the time to first token (TTFT) increases linearly with the number of requests in one batch. Continuous batching prioritizes the prefill stage for LLM inference, keeping TTFT under control even with more requests running at the same time. This helped Rufus provide a pleasant experience with low latency when generating the first response, and improve the single-host throughput to keep serving costs under control.

Conclusion

In this post, we discussed how Rufus is able to reliably deploy and serve its multi-billion-parameter LLM using the Neuron SDK with Inferentia2 and Trainium chips and AWS services. Rufus continues to evolve with advancements in generative AI and customer feedback, and we encourage you to use Inferentia and Trainium.

Learn more about how we are innovating with generative AI across Amazon.


About the author

James Park is a Solutions Architect at Amazon Web Services. He works with Amazon.com to design, build, and deploy technology solutions on AWS, and has a particular interest in AI and machine learning. In his spare time, he enjoys seeking out new cultures, new experiences, and staying up to date with the latest technology trends.

RJ is an Engineer within Amazon. He builds and optimizes distributed systems for training and works on optimizing adopted systems to reduce latency for ML inference. Outside work, he is exploring using generative AI for building food recipes.

Yang Zhou is a software engineer working on building and optimizing machine learning systems. His recent focus is enhancing the performance and cost efficiency of generative AI inference. Beyond work, he enjoys traveling and has recently discovered a passion for running long distances.

Adam (Hongshen) Zhao is a Software Development Manager at Amazon Stores Foundational AI. In his current role, Adam is leading the Rufus Inference team to build generative AI inference optimization solutions and inference systems at scale for fast inference at low cost. Outside work, he enjoys traveling with his wife and creating art.

Faqin Zhong is a software engineer at Amazon Stores Foundational AI, working on Large Language Model (LLM) inference infrastructure and optimizations. Passionate about Generative AI technology, Faqin collaborates with leading teams to drive innovations, making LLMs more accessible and impactful, ultimately enhancing customer experiences across diverse applications. Outside of work she enjoys cardio exercise and baking with her son.

Nicolas Trown is an engineer in Amazon Stores Foundational AI. His recent focus is lending his systems expertise across Rufus to aid the Rufus Inference team and ensure efficient utilization across the Rufus experience. Outside of work he enjoys spending time with his wife and taking day trips to the nearby coast, Napa, and Sonoma areas.

Bing Yin is a director of science at Amazon Stores Foundational AI. He leads the effort to build LLMs that are specialized for shopping use cases and optimized for inference at Amazon scale. Outside of work, he enjoys running marathon races.

Read More

Exploring alternatives and seamlessly migrating data from Amazon Lookout for Vision

Amazon Lookout for Vision, the AWS service designed to create customized artificial intelligence and machine learning (AI/ML) computer vision models for automated quality inspection, will be discontinued on October 31, 2025. New customers will not be able to access the service effective October 10, 2024, but existing customers will be able to use the service as normal until October 31, 2025. AWS will continue to support the service with security updates, bug fixes, and availability enhancements, but we do not plan to introduce new features for this service.

This post discusses some alternatives to Lookout for Vision and how you can export your data from Lookout for Vision to migrate to an alternate solution.

Alternatives to Lookout for Vision

If you’re interested in an alternative to Lookout for Vision, AWS has options for both buyers and builders.

For an out-of-the-box solution, the AWS Partner Network offers solutions from multiple partners. You can browse solutions on the Computer Vision for Quality Insights page in the AWS Solutions Library. These partner solutions include options for software, software as a service (SaaS) applications, managed solutions or custom implementations based on your needs. This approach provides a solution that addresses your use case without requiring you to have expertise in imaging, computer vision, AI, or application development. This typically provides the fastest time to value by taking advantage of the specialized expertise of the AWS Partners. The Solutions Library also has additional guidance to help you build solutions faster.

If you prefer to build your own solution, AWS offers AI tools and services to help you develop an AI-based computer vision inspection solution. Amazon SageMaker provides a set of tools to build, train, and deploy ML models for your use case with fully managed infrastructure, tools, and workflows. In addition to SageMaker enabling you to build your own models, Amazon SageMaker JumpStart offers built-in computer vision algorithms and pre-trained defect detection models that can be fine-tuned to your specific use case. This approach provides you the tools to accelerate your AI development while providing complete flexibility to build a solution that meets your exact requirements and integrates with your existing hardware and software infrastructure. This typically provides the lowest operating costs for a solution.

AWS also offers Amazon Bedrock, a fully managed service that offers a choice of high-performing generative AI foundation models (FMs), including models that can help build a defect detection model running in the cloud. This approach enables you to build a custom solution while using the power of generative AI to handle the custom computer vision model creation and some of the code generation to speed development, eliminating the need for full AI computer vision expertise. Amazon Bedrock provides the ability to analyze images for defects, compare performance of different models, and generate code for custom applications. This alternative is useful for use cases that don’t require low latency processing, providing faster time to value and lower development costs.

Migrating data from Lookout for Vision

To move existing data from Lookout for Vision to use in an alternative implementation, the Lookout for Vision SDK provides the capability to export a dataset from the service to an Amazon Simple Storage Service (Amazon S3) bucket. This procedure exports the training dataset, including manifest and dataset images, for a project to a destination Amazon S3 location that you specify. With the exported dataset and manifest file, you can use the same data that you used to create a Lookout for Vision model to create a model using SageMaker or Amazon Bedrock, or provide it to a partner to incorporate into their customizations for your use case.
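As a hedged illustration of that export using the AWS SDK for Python (boto3), the following sketch writes the training dataset manifest for a project to an S3 bucket. The project, bucket, and key names are placeholders, and the documented export procedure, which also copies the dataset images, remains the authoritative reference.

# Sketch: export a Lookout for Vision training dataset manifest to Amazon S3.
import boto3

lookout = boto3.client("lookoutvision")
s3 = boto3.client("s3")

entries, next_token = [], None
while True:
    kwargs = {"ProjectName": "my-defect-project", "DatasetType": "train"}
    if next_token:
        kwargs["NextToken"] = next_token
    page = lookout.list_dataset_entries(**kwargs)
    entries.extend(page["DatasetEntries"])  # each entry is a JSON Lines record
    next_token = page.get("NextToken")
    if not next_token:
        break

s3.put_object(
    Bucket="my-export-bucket",
    Key="lookout-for-vision/train/manifest.json",
    Body="\n".join(entries).encode("utf-8"),
)
print(f"Exported {len(entries)} dataset entries")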

Summary

Although Lookout for Vision is planned to shut down on October 31, 2025, AWS offers a powerful set of AI/ML services and solutions in the form of SageMaker tools to build custom models and generative AI with Amazon Bedrock to do customized inspection and generate code, in addition to a range of offerings from partners in the AWS Partner Network. Export tools enable you to effortlessly move your data from Lookout for Vision to an alternate solution if you so choose. You should explore these options to determine what works best for your specific needs.

For more details, refer to the following resources:


About the Author

Tim Westman is the Product Manager and Go-to-Market Lead for Edge Machine Learning, AWS. Tim leads the Product Management and Business Development for the Edge Machine Learning business at Amazon Web Services. In this role, he works with customers to help build computer vision solutions at the edge to solve complex operational challenges. Tim has more than 30 years of experience in sales, business development and product management roles for leading hardware and software companies, with the last 8 years specializing in AI and computer vision for IoT applications.

Read More

AI’ll Be by Your Side: Mental Health Startup Enhances Therapist-Client Connections

Half of the world’s population will experience a mental health disorder — but the median number of mental health workers per 100,000 people is just 13, according to the World Health Organization.

To help tackle this disparity — which can vary by over 40x between high-income and low-income countries — a Madrid-based startup is offering therapists AI tools to improve the delivery of mental health services.

Therapyside, a member of the NVIDIA Inception program for cutting-edge startups, is bolstering its online therapy platform using NVIDIA NIM inference microservices. These AI microservices serve as virtual assistants and notetakers, letting therapists focus on connecting with their clients.

“In a therapy setting, having a strong alliance between counselor and client is everything,” said Alessandro De Sario, founder and CEO of Therapyside. “When a therapist can focus on the session without worrying about note-taking, they can reach that level of trust and connection much quicker.”

For the therapists and clients who have opted in to test these AI tools, a speech recognition model transcribes their conversations. A large language model summarizes the session into clinical notes, saving time for therapists so they can speak with more clients and work more efficiently. Another model powers a virtual assistant, dubbed Maia, that can answer therapists’ questions using retrieval-augmented generation, aka RAG.

Therapyside aims to add features over time, such as support for additional languages and an offline version that can transcribe and summarize in-person therapy sessions.

“We’ve just opened the door,” said De Sario. “We want to make the tool much more powerful so it can handle administrative tasks like calendar management and patient follow-up, or remind therapists of topics they should cover in a given session.”

AI’s in Session: Enhancing Therapist-Client Relationships

Therapyside, founded in 2017, works with around 1,000 licensed therapists in Europe offering counseling in English, Italian and Spanish. More than 500,000 therapy sessions have been completed through its virtual platform to date.

The company’s AI tools are currently available through a beta program. Therapists who choose to participate can invite their clients to opt in to the AI features.

“It’s incredibly helpful to have a personalized summary with a transcription that highlights the most important points from each session I have with my patients,” said Alejandro A., one of the therapists participating in the beta program. “I’ve been pleasantly surprised by its ability to identify the most significant areas to focus on with each patient.”

A speech recognition AI model can capture live transcriptions of sessions.

The therapists testing the tool rated the transcriptions and summaries as highly accurate, helping them focus on listening without worrying about note-taking.

“The recaps allow me to be fully present with the clients in my sessions,” said Maaria A., another therapist participating in the beta program.

During sessions, clients share details about their life experiences that are captured in the AI-powered transcriptions and summaries. Therapyside’s RAG-based Maia connects to these resources to help therapists quickly recall minutiae like the name of a client’s sibling, or track how a client’s main challenges have evolved over time. This information can help therapists pose more personalized questions and provide better support.

“Maia is a valuable tool to have when you’re feeling a little stuck,” said Maaria A. “I have clients all over the world, so Maia helps remind me where they live. And if I ask Maia to suggest exercises clients could do to boost their self-esteem, it helps me find resources I can send to them, which helps save time.”

Maia can answer therapists’ questions based on session transcripts and summaries.

Take Note: AI Microservices Enable Easy Deployment

Therapyside’s AI pipeline runs on NVIDIA GPUs in a secure cloud environment and is built with NVIDIA NIM, a set of easy-to-use microservices designed to speed up AI deployment.

For transcription, the pipeline uses NVIDIA Riva NIM microservices, which include NVIDIA Parakeet, a record-setting family of models, to deliver highly accurate automatic speech recognition.

Flowchart illustrating Therapyside’s AI pipeline

Once the transcript is complete, the text is processed by a NIM microservice for Meta’s Llama 3.1 family of open-source AI models to generate a summary that’s added to the client’s clinical history.

The Maia virtual assistant, which also uses a Llama 3.1 NIM microservice, accesses these clinical records using a RAG pipeline powered by NVIDIA NeMo Retriever NIM microservices. RAG techniques enable organizations to connect AI models to their private datasets to deliver contextually accurate responses.
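Because NIM microservices expose an OpenAI-compatible API, the summarization step described above could be invoked roughly as follows. The endpoint URL, model name, and prompt are illustrative assumptions rather than Therapyside’s actual implementation.

# Sketch: summarize a session transcript with a self-hosted Llama 3.1 NIM
# microservice through its OpenAI-compatible endpoint (URL is a placeholder).
from openai import OpenAI

client = OpenAI(base_url="http://nim-llama.internal:8000/v1", api_key="not-used")

transcript = "Therapist: How was your week? Client: Better. I tried the breathing exercise..."

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[
        {"role": "system", "content": "Summarize this therapy session into concise clinical notes."},
        {"role": "user", "content": transcript},
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)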

Therapyside plans to further customize Maia with capabilities that support specific therapeutic methods, such as cognitive behavioral therapy and psychodynamic therapy. The team is also integrating NVIDIA NeMo Guardrails to further enhance the tools’ safety and security.

Kimberly Powell, vice president of healthcare at NVIDIA, will discuss Therapyside and other healthcare innovators in a keynote address at HLTH, a conference taking place October 20-23 in Las Vegas.

Learn more about NVIDIA Inception and get started with NVIDIA NIM microservices at ai.nvidia.com.

Read More

The Next Chapter Awaits: Dive Into ‘Diablo IV’s’ Latest Adventure ‘Vessel of Hatred’ on GeForce NOW

Prepare for a devilishly good time this GFN Thursday as the critically acclaimed Diablo IV: Vessel of Hatred downloadable content (DLC) joins the cloud, one of six new games available this week.

GeForce NOW also extends its game-library sync feature to Battle.net accounts, so members can seamlessly bring their favorite Blizzard games into their cloud-streaming libraries.

Hell’s Bells and Whistles

Get ready to rage. New DLC for the hit title Diablo IV: Vessel of Hatred is available to stream at launch this week, with thrilling content and gameplay for GeForce NOW members to experience.

Diablo IV Vessel of Hatred DLC on GeForce NOW
Hate is in the air.

Diablo IV: Vessel of Hatred DLC is the highly anticipated expansion of the latest installment in Blizzard’s iconic action role-playing game series. It introduces players to the lush and dangerous jungles of Nahantu. Teeming with both beauty and dangers, this new environment offers a fresh backdrop for action-packed battles against the demonic forces of Hell. A new playable class, the Spiritborn, offers unique gameplay mechanics tied to four guardian spirits: the eagle, gorilla, jaguar and centipede.

The DLC extends the main Diablo IV story and includes new features such as recruitable Mercenaries, a Player vs. Everyone co-op endgame activity, Party Finder to help members team up and take down challenges together, and more. Vessel of Hatred arrives alongside major updates including revamped leveling, a new difficulty system and Paragon adjustments that will continue to enhance the world of Diablo IV.

Ultimate members can experience the wrath at up to 4K resolution and 120 frames per second with support for NVIDIA DLSS and ray-tracing technologies. And members can jump right into the latest DLC without having to wait around for updates. Hell never looked so good, even on low-powered devices.

Let That Sync In

Battle.net game sync on GeForce NOW
Connection junction.

With game syncing for Blizzard’s Battle.net game library coming to GeForce NOW this week, members can connect their digital game store accounts so that all of their supported games are part of their streaming libraries.

Members can now easily find and stream popular titles such as StarCraft II, Overwatch 2, Call of Duty HQ and Hearthstone from their cloud gaming libraries, enhancing the games’ accessibility across a variety of devices.

Battle.net joins other digital storefronts that already have game sync support, including Steam, Epic Games Store, Xbox and Ubisoft Connect. This allows members to consolidate their gaming experiences in one place.

Plus, GeForce NOW members can play high-quality titles without the need for high-end hardware, streaming from GeForce RTX-powered servers in the cloud. Whether battling demons in Sanctuary or engaging in epic firefights, GeForce NOW members get a seamless gaming experience anytime, anywhere.

Hot and New

Europa on GeForce NOW
Soar through serenity and uncover destiny, all from the cloud.

Europa is a peaceful game of adventure, exploration and meditation from Future Friends Games, ready for members to stream at launch this week. On the moon Europa, a lush terraformed paradise in Jupiter’s shadow, an android named Zee sets out in search of answers. Run, glide and fly across the landscape, solve mysteries in the ruins of a fallen utopia, and discover the story of the last human alive.

Members can look for the following games available to stream in the cloud this week:

  • Empyrion – Galactic Survival (New release on Epic Games Store, Oct. 10)
  • Europa (New release on Steam, Oct. 11)
  • Dwarven Realms (Steam)
  • Star Trek Timelines (Steam)
  • Star Trucker (Steam)
  • Starcom: Unknown Space (Steam)

What are you planning to play this weekend? Let us know on X or in the comments below.

Read More

Unlock the knowledge in your Slack workspace with Slack connector for Amazon Q Business

Amazon Q Business is a fully managed, generative AI-powered assistant that you can configure to answer questions, provide summaries, generate content, and complete tasks based on your enterprise data. Amazon Q Business offers over 40 built-in connectors to popular enterprise applications and document repositories, including Amazon Simple Storage Service (Amazon S3), Salesforce, Google Drive, Microsoft 365, ServiceNow, Gmail, Slack, Atlassian, and Zendesk and can help you create your generative AI solution with minimal configuration.

Nearly 100,000 organizations use Slack to bring the right people together to securely collaborate with each other. A Slack workspace captures invaluable organizational knowledge in the form of the information that flows through it as users communicate. Hence, it is valuable to make this knowledge quickly and securely available to those users.

In this post, we demonstrate how to set up the Slack connector for Amazon Q Business to sync communications from both public and private channels, reflective of user permissions. We also guide you through the configurations needed on your Slack workspace. Additionally, you will learn how to configure the Amazon Q Business application and enable user authentication through AWS IAM Identity Center, which is the recommended service for managing a workforce’s access to AWS applications.

Data source overview

Amazon Q Business uses large language models (LLMs) to build a unified solution that connects multiple data sources. Typically, you’d need to use a natural language processing (NLP) technique called Retrieval Augmented Generation (RAG) for this. With RAG, generative AI enhances its responses by incorporating relevant information retrieved from a curated dataset. Amazon Q Business has a built-in managed RAG capability designed to reduce the undifferentiated heavy lifting involved in creating these systems. Typical of a RAG model, Amazon Q Business has two components: A retrieval component that retrieves relevant documents for the user query and a generation component that takes the query and the retrieved documents and then generates an answer to the query using an LLM.

A Slack workspace has multiple elements. It has public channels where workspace users can participate and private channels where only channel members can communicate with each other. Individuals can also directly communicate with each other in one-on-one conversations and in user groups. This communication is in the form of messages and threads of replies, with optional document attachments. Slack workspaces of active organizations are highly dynamic, with the content and collaboration evolving and growing in volume continuously.

The preceding figure shows the process flow of the solution. When you connect Amazon Q Business to a data source (in this case, Slack), what Amazon Q considers and crawls as a document varies by connector. For the Amazon Q Business Slack connector, each message, message attachment, and channel post is considered a single document. However, Slack conversation threads, which help you create organized discussions around specific messages, are also ingested as a single document, regardless of the number of participants or messages they contain.

Amazon Q Business crawls access control list (ACL) information attached to a document (user and group information) from your Slack instance. This information can be used to filter chat responses to the user’s document access level. The Slack connector supports token-based authentication. This could be a Slack bot user OAuth token or Slack user OAuth token. See the Slack connector overview to get the list of entities that are extracted, supported filters, sync modes, and file types.

User IDs (_user_id) exist in Slack on messages and channels where access permissions are set. They are mapped from user emails to the corresponding IDs in Slack.

To connect your data source connector to Amazon Q Business, you must give Amazon Q Business an IAM role that has the following permissions:

  • Permission to access the BatchPutDocument and BatchDeleteDocument operations to ingest documents.
  • Permission to access the User Store API operations to ingest user and group access control information from documents.
  • Permission to access your AWS Secrets Manager secret to authenticate your data source connector instance.
  • (Optional) If you’re using Amazon Virtual Private Cloud (Amazon VPC), permission to access your Amazon VPC.

Solution overview

In this solution, we show you how to create a Slack workspace with users who perform various roles within the organization. We then show you how to configure this workspace to define the set of scopes required by the Amazon Q Business Slack connector to index the user communication. This is followed by the configuration of the Amazon Q Business application and a Slack data source. Based on the configuration, when the data source is synchronized, the connector crawls and indexes content from the workspace based on a specified crawl start date. The connector also collects and ingests ACL information for each indexed message and document. Thus, the search results of a query made by a user include results only from those documents that the user is authorized to read.

Prerequisites

To build the Amazon Q Business connector for Slack, you need the following:

In Slack:

  • Create a Slack bot user OAuth token or Slack user OAuth token. You can choose either token to connect Amazon Q Business to your Slack data source. See the Slack documentation on access tokens for more information.
  • Note your Slack workspace team ID from your Slack workspace main page URL. For example, https://app.slack.com/client/T0123456789/... where T0123456789 is the team ID.
  • Add the OAuth scopes and read permissions.

In your AWS account:

  • Create an AWS Identity and Access Management (IAM) role for your data source and, if using the Amazon Q Business API, note the ARN of the IAM role.
  • Store your Slack authentication credentials in an AWS Secrets Manager secret and, if using the Amazon Q Business API, note the ARN of the secret (a sketch of creating this secret follows this list).
  • Enable and configure an IAM Identity Center instance. Amazon Q Business integrates with IAM Identity Center as a gateway to manage user access to your Amazon Q Business application. We recommend enabling and pre-configuring an Identity Center instance before you begin to create your Amazon Q Business application. Identity Center is the recommended AWS service for managing human user access to AWS resources. Amazon Q Business supports both organization and account level Identity Center instances. See Setting up for Amazon Q Business for more information.
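As referenced above, the following is a minimal sketch of storing the Slack token in Secrets Manager with boto3. The secret name is a placeholder, and the exact JSON key expected by the Slack connector should be confirmed in the Amazon Q Business documentation.

# Sketch: store the Slack OAuth token in AWS Secrets Manager for the connector.
import json
import boto3

secrets = boto3.client("secretsmanager")
response = secrets.create_secret(
    Name="QBusiness-Slack-connector-secret",  # placeholder name
    SecretString=json.dumps({"slackToken": "xoxp-your-user-oauth-token"}),
)
print("Secret ARN:", response["ARN"])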

Configure your Slack workspace

You will create one user for each of the following roles: Administrator, Data scientist, Database administrator, Solutions architect and Generic.

User name Role
arnav_desai Admin
jane_doe Data Scientist
pat_candella DB Admin
mary_major Solutions Architect
john_stiles Generic User

To showcase the ACL propagation, you will create three public channels, #general, #customerwork, and #random, that any member, including the Generic user, can access. You will also create one private channel, #anydepartment-project-private, that can be accessed only by the users arnav_desai, john_stiles, mary_major, and pat_candella.

To create a Slack app:

  1. Navigate to the Slack API Your Apps page and choose Create New App.
  2. Select From scratch. In the next screen, select the workspace to develop your app, and then choose Create an App.
  3. Give the Slack app a name and select a workspace to develop your app in. Then choose Create App.
  4. After you’ve created your app, select it and navigate to Features and choose OAuth & Permissions.
  5. Scroll down to Scopes > User Token Scopes and set the OAuth scope based on the user token scopes in Prerequisites for connecting Amazon Q Business to Slack.

Note: You can configure two types of scopes in a Slack workspace:

  1. Bot token scope: The bot token can crawl only the messages in channels to which the bot has been explicitly added. It is used to grant restricted access to specific messages only.
  2. User token scope: The user token acts as a representative of a Slack user, so only the data shared with that member is accessible to it.

For this example, so you can search on the conversations between users, you will use the user token scope.

  1. After the OAuth scope for the user token has been set up as described in the Slack prerequisites, scroll up to the section OAuth Tokens for your Workspace, choose Install to Workspace, and then choose Allow.
  2. This will generate a user OAuth token. Copy this token to use when configuring the Amazon Q Business Slack connector.

Configure the data source using the Amazon Q Business Slack connector

In this section, you will create an Amazon Q Business application using the console.

To create an Amazon Q Business application

  1. In the AWS Management Console for Amazon Q Business, choose Create Application.
  2. Enter an Application Name, such as my-slack-workspace. Leave the Service access as the default value, and select AWS IAM Identity Center for Access Management. Enter a new Tag value as required and choose Create to create the Amazon Q Business application.
  3. Leave the default option of Use Native retriever selected for Retrievers, leave Enterprise as the Index provisioning, and leave the default value of 1 as the Number of units. Each index unit in Amazon Q Business supports 20,000 documents or 200 MB of extracted text (whichever comes first). Choose Next.
  4. Scroll down the list of available connectors and select Slack and then choose Next.

    1. Enter a Data source name and a Description to identify your data source and then enter the Slack workspace team ID to connect with Amazon Q Business.
    2. In the Authentication section, select Create and add a new secret.
    3. On the dialog box that appears, enter a Secret name followed by the User OAuth Slack token that was copied from the Slack workspace.
    4. For the IAM role, select Create a new service role (Recommended).
    5. In Sync scope, choose the following:
      • For select type of content to crawl, select All channels.
      • Select an appropriate date for Select crawl start date.
      • Leave the default value selected for Maximum file size as 50.
      • You can include specific Messages, such as bot messages or archived messages to sync.
      • Additionally, you can include up to 100 patterns to include or exclude filenames, types, or file paths to sync.

    6. For Sync mode, leave Full sync selected and for the Sync run schedule, select Run on demand.
    7. Leave the field mapping as is and choose Add data source.
    8. On the next page, choose Next.
  5. Add the five users you created earlier, who are a part of IAM Identity Center and the Slack workspace to the Amazon Q Business application. To add users to Identity Center, follow the instructions in Add users to your Identity Center directory. When done, choose Add groups and users and choose Assign.
  6. When a user is added, each user is assigned the default Q Business Pro subscription. For more information on different pricing tiers, see the Amazon Q Business pricing page.
  7. Choose Create application to finish creating the Amazon Q Business application.
  8. After the application and the data source are created, select the data source and then choose Sync now to start syncing documents from your data source.
  9. The sync process ingests documents from your Slack workspace according to your selections in the Slack connector configuration in Amazon Q Business. The following screenshot shows the results of a successful sync, indicated by the status of Completed.

Search with Amazon Q Business

Now, you’re ready to make a few queries in Amazon Q Business.

To search using Amazon Q Business:

  1. Navigate to the Web experience settings tab and click on the Deployed URL.
  2. For this demonstration, sign in as pat_candella who has the role of DB Admin.
  3. Enter the password for pat_candella and choose Sign in
  4. Upon successful sign-in, you will be signed in to Amazon Q Business.
  5. In the Slack workspace, there is a public channel, the #customerwork channel that all users are members of. The #customerwork Slack channel is being used to communicate about an upcoming customer engagement, as shown in the following figure.
  6. Post the first question to Amazon Q Business.
I am currently using Apache Kafka. Can you list high level steps involved in migration to Amazon MSK?

Note that the response includes citations that refer to the conversation as well as the content of the PDF that was attached to the conversation.

Security and privacy options with Slack data connector

Next, you will create a private channel called #anydepartment-project-private with four out of the five users—arnav_desai, john_stiles, mary_major and pat_candella—and verify that the messages exchanged in a private channel are not available to non-members like jane_doe. Note that after you create a new private channel, you need to manually re-run the sync on the data source.

The following screenshot shows the private Slack channel with four of the five users and the Slack conversation.

Testing security and privacy options with Slack data connector

  1. While signed in as pat_candella, who is part of the private #anydepartment-project-private channel, execute the following query:
    What is Amazon Kendra and which API do I use to query a Kendra index?

  2. Now, sign in as jane_doe, who is not a member of the #anydepartment-project-private channel and execute the same query.
  3. Amazon Q Business prevents jane_doe from getting insights from information within the private channels that they aren’t part of, based on the synced ACL information.

Indexing aggregated Slack threads

Slack organizes conversations into threads, which can involve multiple users and messages. The Amazon Q Business Slack connector treats each thread as a single document, regardless of the number of participants or messages it contains. This approach allows Amazon Q Business to ingest entire conversation threads as individual units, maximizing the amount of data that can be processed within a single index unit. As a result, you can efficiently incorporate more comprehensive conversational context into your Amazon Q Business system.

The figure that follows shows a conversation between pat_candella and jane_doe that includes six messages in a thread. The Slack connector aggregates this message thread as a single document, thus maximizing the use of an index unit.

Because the conversation thread is aggregated as a single document within the Amazon Q Business index, you can ask questions that pertain to a single conversation thread as shown in the following figure.

Troubleshooting the sync process

  • Why isn’t Amazon Q Business answering any of my questions?

If you aren’t getting answers to your questions from Amazon Q Business, verify the following:

  • Permissions – Document ACLs indexed by Amazon Q Business may not allow you to query certain data entities as demonstrated in our example. If this is the case, please reach out to your Slack workspace administrator to make sure that your user has access to required documents and repeat the sync process.
  • Data connector sync – A failed data source sync may prevent the documents from being indexed, meaning that Amazon Q Business would be unable to answer questions about the documents that failed to sync. Please refer to the official documentation to troubleshoot data source connectors.
  • I’m receiving access errors on Amazon Q Business application. What causes this?

See Troubleshooting Amazon Q Business identity and access to diagnose and fix common issues that you might encounter when working with Amazon Q and IAM.

  • How can I sync documents without ACLs?

Amazon Q Business supports crawling ACLs for document security by default. Turning off ACLs and identity crawling is no longer supported. If you want to index documents without ACLs, ensure that the documents are marked as public in your data source. Please refer to the official documentation, How the Amazon Q Business connector crawls Slack ACLs.

  • My connector is unable to sync. How can I monitor data source sync progress?

Amazon Q Business provides visibility into the data sync operations. Learn more about this feature in the AWS Machine Learning blog.

Additionally, as the sync process runs, you can monitor progress or debug failures by monitoring the Amazon CloudWatch logs that can be accessed from the Details section of the Sync run history.

A sample query to determine which documents or messages were indexed from a specific Slack channel (C12AB34578) with a log stream of SYNC_RUN_HISTORY_REPORT/xxxxxxxxxxxxxxxxxxxxxxxx would look like the following:

fields LogLevel, DocumentId, DocumentTitle, CrawlAction, ConnectorDocumentStatus.Status as ConnectorDocumentStatus, ErrorMsg, CrawlStatus.Status as CrawlStatus, SyncStatus.Status as SyncStatus, IndexStatus.Status as IndexStatus, SourceUri, Acl, Metadata, HashedDocumentId, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/xxxxxxxxxxxxxxxxxxxxxxxx' and Metadata like /"stringValue":"C12AB34578"/
| sort @timestamp desc
| limit 10000

Choosing Run query displays the list of messages as the Amazon Q Business Index sync runs, as shown in the following figure.
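If you prefer to run the same report programmatically, the following sketch uses the CloudWatch Logs Insights API through boto3. The log group name is a placeholder and the query is abbreviated.

# Sketch: run the sync report query with CloudWatch Logs Insights via boto3.
import time
import boto3

logs = boto3.client("logs")
query = """fields DocumentId, DocumentTitle, CrawlStatus.Status as CrawlStatus, SyncStatus.Status as SyncStatus, Acl, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/xxxxxxxxxxxxxxxxxxxxxxxx' and Metadata like /"stringValue":"C12AB34578"/
| sort @timestamp desc
| limit 100"""

start = logs.start_query(
    logGroupName="/aws/qbusiness/my-slack-application",  # placeholder
    startTime=int(time.time()) - 24 * 3600,
    endTime=int(time.time()),
    queryString=query,
)

while True:
    result = logs.get_query_results(queryId=start["queryId"])
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(2)

for row in result.get("results", []):
    print({field["field"]: field["value"] for field in row})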

Cleanup

To delete an Amazon Q Business application, you can use the console or the DeleteApplication API operation.

To delete an Amazon Q Business application using the console

  1. Sign in to the Amazon Q Business console.
  2. Select the Amazon Q Business application that you want to delete.
  3. Choose Delete.
  4. In the dialog box that opens, enter Delete to confirm deletion, and then choose Delete.
  5. You are returned to the service console while your application is deleted. When the deletion process is complete, the console displays a message confirming successful deletion.

To delete the IAM Identity Center instance, see Delete your IAM Identity Center instance.

Conclusion

This blog post provides a step-by-step guide on setting up the Slack connector for Amazon Q Business, enabling you to seamlessly integrate data from your Slack workspace. Moreover, we highlighted the importance of data privacy and security, demonstrating how the connector adheres to the ACLs within your Slack workspace. This feature helps ensure that private channel conversations remain confidential and inaccessible to individuals who aren’t members of those channels. By following these steps and understanding the built-in security measures, you can use the power of Amazon Q Business while maintaining the integrity and privacy of your Slack workspace.

To learn more about the Amazon Q Business connector for Slack, see Connecting Slack to Amazon Q Business. You can automate all the showcased console operations through the Amazon Q Business APIs, the AWS CLI, and other applicable AWS SDKs.

If you want to converse with Amazon Q Business using Slack direct messages (DMs) to ask questions and get answers based on company data, or to get help creating new content such as email drafts, summarizing attached files, and performing tasks, see Deploy a Slack gateway for Amazon Q, your business expert, to learn how to bring Amazon Q to users in Slack.


About the Authors

Akshara Shah is a Senior Solutions Architect at Amazon Web Services. She provides strategic technical guidance to help customers design and build cloud solutions. She is currently focused on machine learning and AI technologies.

Roshan Thomas is a Senior Solutions Architect at Amazon Web Services. He is based in Melbourne, Australia and works closely with enterprise customers to accelerate their journey in the cloud. He is passionate about technology and helping customers architect and build solutions on AWS.

Read More

AI Summit: US Energy Secretary Highlights AI’s Role in Science, Energy and Security

AI can help solve some of the world’s biggest challenges — whether climate change, cancer or national security — U.S. Secretary of Energy Jennifer Granholm emphasized today during her remarks at the AI for Science, Energy and Security session at the NVIDIA AI Summit, in Washington, D.C.

Granholm went on to highlight the pivotal role AI is playing in tackling major national challenges, from energy innovation to bolstering national security.

“We need to use AI for both offense and defense — offense to solve these big problems and defense to make sure the bad guys are not using AI for nefarious purposes,” she said.

Granholm, who calls the Department of Energy “America’s Solutions Department,” highlighted the agency’s focus on solving the world’s biggest problems.

“Yes, climate change, obviously, but a whole slew of other problems, too … quantum computing and all sorts of next-generation technologies,” she said, pointing out that AI is a driving force behind many of these advances.

“AI can really help to solve some of those huge problems — whether climate change, cancer or national security,” she said. “The possibilities of AI for good are awesome, awesome.”

Following Granholm’s 15-minute address, a panel of experts from government, academia and industry took the stage to further discuss how AI accelerates advancements in scientific discovery, national security and energy innovation.

“AI is going to be transformative to our mission space.… We’re going to see these big step changes in capabilities,” said Helena Fu, director of the Office of Critical and Emerging Technologies at the Department of Energy, underscoring AI’s potential in safeguarding critical infrastructure and addressing cyber threats.

During her remarks, Granholm also stressed that AI’s increasing energy demands must be met responsibly.

“We are going to see about a 15% increase in power demand on our electric grid as a result of the data centers that we want to be located in the United States,” she explained.

However, the DOE is taking steps to meet this demand with clean energy.

“This year, in 2024, the United States will have added 30 Hoover Dams’ worth of clean power to our electric grid,” Granholm announced, emphasizing that the clean energy revolution is well underway.

AI’s Impact on Scientific Discovery and National Security

The discussion then shifted to how AI is revolutionizing scientific research and national security.

Tanya Das, director of the Energy Program at the Bipartisan Policy Center, pointed out that “AI can accelerate every stage of the innovation pipeline in the energy sector … starting from scientific discovery at the very beginning … going through to deployment and permitting.”

Das also highlighted the growing interest in Congress to support AI innovations, adding, “Congress is paying attention to this issue, and, I think, very motivated to take action on updating what the national vision is for artificial intelligence.”

Fu reiterated the department’s comprehensive approach, stating, “We cross from open science through national security, and we do this at scale.… Whether they be around energy security, resilience, climate change or the national security challenges that we’re seeing every day emerging.”

She also touched on the DOE’s future goals: “Our scientific systems will need access to AI systems,” Fu said, emphasizing the need to bridge both scientific reasoning and the new kinds of models we’ll need to develop for AI.

Collaboration Across Sectors: Government, Academia and Industry

Karthik Duraisamy, director of the Michigan Institute for Computational Discovery and Engineering at the University of Michigan, highlighted the power of collaboration in advancing scientific research through AI.

“Think about the scientific endeavor as 5% creativity and innovation and 95% intense labor. AI amplifies that 5% by a bit, and then significantly accelerates the 95% part,” Duraisamy explained. “That is going to completely transform science.”

Duraisamy further elaborated on the role AI could play as a persistent collaborator, envisioning a future where AI can work alongside scientists over weeks, months and years, generating new ideas and following through on complex projects.

“Instead of replacing graduate students, I think graduate students can be smarter than the professors on day one,” he said, emphasizing the potential for AI to support long-term research and innovation.

Learn more about how this week’s AI Summit highlights the ways AI is shaping the future across industries and how NVIDIA’s solutions are laying the groundwork for continued innovation.

Read More

Transitioning off Amazon Lookout for Metrics 

Amazon Lookout for Metrics is a fully managed service that uses machine learning (ML) to detect anomalies in virtually any time-series business or operational metrics—such as revenue performance, purchase transactions, and customer acquisition and retention rates—with no ML experience required. The service, which was launched in March 2021, predates several popular AWS offerings that have anomaly detection, such as Amazon OpenSearch, Amazon CloudWatch, AWS Glue Data Quality, Amazon Redshift ML, and Amazon QuickSight.

After careful consideration, we have made the decision to end support for Amazon Lookout for Metrics, effective October 10, 2025. In addition, as of today, new customer sign-ups are no longer available. Existing customers will be able to use the service as usual until October 10, 2025, when we will end support for Amazon Lookout for Metrics.

In this post, we provide an overview of the alternate AWS services that offer anomaly detection capabilities for customers to consider transitioning their workloads to.

AWS services with anomaly detection capabilities

We recommend customers use Amazon OpenSearch, Amazon CloudWatch, Amazon Redshift ML, Amazon QuickSight, or AWS Glue Data Quality services for their anomaly detection use cases as an alternative to Amazon Lookout for Metrics. These AWS services offer generally available, ML-powered anomaly detection capabilities that can be used out of the box without requiring any ML expertise. Following is a brief overview of each service.

Using Amazon OpenSearch for anomaly detection

Amazon OpenSearch Service features a highly performant, integrated anomaly detection engine that enables the real-time identification of anomalies in streaming data as well as in historical data. You can pair anomaly detection with built-in alerting in OpenSearch to send notifications when there is an anomaly. To start using OpenSearch for anomaly detection, you first must index your data into OpenSearch; from there, you can enable anomaly detection in OpenSearch Dashboards. To learn more, see the documentation.
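As an illustration, a detector can also be created programmatically through the OpenSearch anomaly detection plugin's REST API. In the following sketch, the domain endpoint, credentials, index, and field names are placeholders, and fine-grained access control with basic authentication is assumed.

# Sketch: create an anomaly detector via the OpenSearch anomaly detection API.
import requests
from requests.auth import HTTPBasicAuth

endpoint = "https://my-domain.us-east-1.es.amazonaws.com"  # placeholder domain
detector = {
    "name": "revenue-anomaly-detector",
    "description": "Detect anomalies in hourly revenue",
    "time_field": "timestamp",
    "indices": ["revenue-metrics-*"],
    "feature_attributes": [{
        "feature_name": "sum_revenue",
        "feature_enabled": True,
        "aggregation_query": {"sum_revenue": {"sum": {"field": "revenue"}}},
    }],
    "detection_interval": {"period": {"interval": 60, "unit": "Minutes"}},
}

resp = requests.post(
    f"{endpoint}/_plugins/_anomaly_detection/detectors",
    json=detector,
    auth=HTTPBasicAuth("admin", "your-password"),  # placeholder credentials
)
print(resp.json())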

Using Amazon CloudWatch for anomaly detection

Amazon CloudWatch supports creating anomaly detectors on specific Amazon CloudWatch Log Groups by applying statistical and ML algorithms to CloudWatch metrics. Anomaly detection alarms can be created based on a metric’s expected value. These types of alarms don’t have a static threshold for determining alarm state. Instead, they compare the metric’s value to the expected value based on the anomaly detection model. To start using CloudWatch anomaly detection, you first must ingest data into CloudWatch and then enable anomaly detection on the log group.
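For metric-based anomaly detection, an alarm on a metric's expected value can be created with the ANOMALY_DETECTION_BAND metric math expression, as in the following boto3 sketch. The namespace, metric, and alarm names are placeholders.

# Sketch: create a CloudWatch anomaly detection alarm on a custom metric.
import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_alarm(
    AlarmName="orders-per-minute-anomaly",
    ComparisonOperator="GreaterThanUpperThreshold",
    EvaluationPeriods=2,
    ThresholdMetricId="ad1",
    Metrics=[
        {
            "Id": "m1",
            "MetricStat": {
                "Metric": {"Namespace": "MyApp", "MetricName": "OrdersPerMinute"},
                "Period": 300,
                "Stat": "Sum",
            },
            "ReturnData": True,
        },
        {
            "Id": "ad1",
            # Band of 2 standard deviations around the expected value
            "Expression": "ANOMALY_DETECTION_BAND(m1, 2)",
            "ReturnData": True,
        },
    ],
)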

Using Amazon Redshift ML for anomaly detection

Amazon Redshift ML makes it easy to create, train, and apply machine learning models using familiar SQL commands in Amazon Redshift data warehouses. Anomaly detection can be done on your analytics data through Redshift ML by using the included XGBoost model type, local models, or remote models with Amazon SageMaker. With Redshift ML, you don’t have to be a machine learning expert and you pay only for the training cost of the SageMaker models. There are no additional costs to using Redshift ML for anomaly detection. To learn more, see the documentation.

Using Amazon QuickSight for anomaly detection

Amazon QuickSight is a fast, cloud-powered, business intelligence service that delivers insights to everyone in the organization. As a fully managed service, QuickSight lets customers create and publish interactive dashboards that include ML insights. QuickSight supports a highly performant, integrated anomaly detection engine that uses proven Amazon technology to continuously run ML-powered anomaly detection across millions of metrics to discover hidden trends and outliers in customers’ data. This tool allows customers to get deep insights that are often buried in the aggregates and not scalable with manual analysis. With ML-powered anomaly detection, customers can find outliers in their data without the need for manual analysis, custom development, or ML domain expertise. To learn more, see the documentation.

Using AWS Glue Data Quality for anomaly detection

Data engineers and analysts can use AWS Glue Data Quality to measure and monitor their data. AWS Glue Data Quality uses a rule-based approach that works well for known data patterns and offers ML-based recommendations to help you get started. You can review the recommendations and augment rules from over 25 included data quality rules. To capture unanticipated, less obvious data patterns, you can enable anomaly detection. To use this feature, you can write rules or analyzers and then turn on anomaly detection in AWS Glue ETL. AWS Glue Data Quality collects statistics for columns specified in rules and analyzers, applies ML algorithms to detect anomalies, and generates visual observations explaining the detected issues. Customers can use recommended rules to capture the anomalous patterns and provide feedback to tune the ML model for more accurate detection. To learn more, see the blog post, watch the introductory video, or see the documentation.

Using Amazon SageMaker Canvas for anomaly detection (a beta feature)

The Amazon SageMaker Canvas team plans to provide support for anomaly detection use cases in Amazon SageMaker Canvas. We’ve created an AWS CloudFormation template-based solution to give customers early access to the underlying anomaly detection feature. Customers can use the CloudFormation template to bring up an application stack that receives time-series data from an Amazon Managed Streaming for Apache Kafka (Amazon MSK) streaming source and performs near-real-time anomaly detection in the streaming data. To learn more about the beta offering, see Anomaly detection in streaming time series data with online learning using Amazon Managed Service for Apache Flink.

Frequently asked questions

  1. What is the cutoff point for current customers?

We created an allow list of account IDs that have used Amazon Lookout for Metrics in the last 30 days and have active Amazon Lookout for Metrics resources, including detectors, within the service. If you are an existing customer and are having difficulties using the service, please reach out to us via AWS Customer Support for help.

  2. How will access change before the sunset date?

Current customers can continue to do everything they could previously. The only change is that accounts not on the allow list cannot create new resources in Amazon Lookout for Metrics.

  3. What happens to my Amazon Lookout for Metrics resources after the sunset date?

After October 10, 2025, all references to Amazon Lookout for Metrics models and resources will be deleted from Amazon Lookout for Metrics. You will not be able to discover or access Amazon Lookout for Metrics from your AWS Management Console, and applications that call the Amazon Lookout for Metrics API will no longer work.

  4. Will I be billed for Amazon Lookout for Metrics resources remaining in my account after October 10, 2025?

Resources created internally by Amazon Lookout for Metrics will be deleted after October 10, 2025. Customers are responsible for deleting the input data sources they created themselves, such as Amazon Simple Storage Service (Amazon S3) buckets, Amazon Redshift clusters, and so on.

  5. How do I delete my Amazon Lookout for Metrics resources?
  6. How can I export anomalies data before deleting the resources?

Anomalies data for each measure can be downloaded for a particular detector by using the Amazon Lookout for Metrics APIs. Exporting Anomalies explains how to connect to a detector, query for anomalies, and download them in a format suitable for later use.
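As a rough sketch of that export flow, the following boto3 snippet pages through the anomaly group summaries for a detector, optionally pulls the time series that contributed to each group, and writes everything to a JSON file. The detector ARN, measure name, and output path are illustrative assumptions.

# Minimal sketch: export anomaly group summaries (and affected time series)
# for one detector before deleting it. ARN, metric name, and file name are
# illustrative assumptions, not values from this post.
import json
import boto3

lookout = boto3.client("lookoutmetrics")
detector_arn = "arn:aws:lookoutmetrics:us-east-1:123456789012:AnomalyDetector:my-detector"

anomaly_groups, token = [], None
while True:
    kwargs = {
        "AnomalyDetectorArn": detector_arn,
        "SensitivityThreshold": 50,  # only return groups at or above this score
        "MaxResults": 100,
    }
    if token:
        kwargs["NextToken"] = token
    page = lookout.list_anomaly_group_summaries(**kwargs)
    anomaly_groups.extend(page["AnomalyGroupSummaryList"])
    token = page.get("NextToken")
    if not token:
        break

# Optionally pull the time series that contributed to each anomaly group.
for group in anomaly_groups:
    series = lookout.list_anomaly_group_time_series(
        AnomalyDetectorArn=detector_arn,
        AnomalyGroupId=group["AnomalyGroupId"],
        MetricName="revenue",  # assumed measure name
        MaxResults=100,
    )
    group["TimeSeries"] = series.get("TimeSeriesList", [])

with open("lookout_anomalies_export.json", "w") as f:
    json.dump(anomaly_groups, f, indent=2, default=str)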

Conclusion

In this blog post, we outlined methods for creating anomaly detectors using alternatives such as Amazon OpenSearch Service, Amazon CloudWatch, Amazon Redshift ML, Amazon QuickSight, AWS Glue Data Quality, and a CloudFormation template-based solution for streaming data.

About the Author

Nirmal Kumar is Sr. Product Manager for the Amazon SageMaker service. Committed to broadening access to AI/ML, he steers the development of no-code and low-code ML solutions. Outside work, he enjoys travelling and reading non-fiction.

Read More

Research Focus: Week of October 7, 2024

Research Focus: Week of October 7, 2024

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

Research Focus | October 7, 2024

Securely Training Decision Trees Efficiently

In a recent paper, Securely Training Decision Trees Efficiently, which will appear at ACM CCS 2024, researchers from Microsoft significantly reduce the communication complexity of secure decision tree training. Decision trees are an important class of supervised learning algorithms. In this approach, a classification or regression tree is built based on a set of features or attributes present in the training dataset. As with many learning algorithms, the accuracy of decision trees can be greatly improved with larger volumes of data. However, this can be a challenge, since data may come from multiple independent sources and raise data privacy concerns. In this case, a privacy-enhancing technology, such as secure multi-party computation (MPC), can help protect the underlying training data.

When the number of elements in the dataset is N, the number of attributes is m, and the height of the tree to be built is h, the researchers construct a protocol with communication complexity O(mN log N + hmN + hN log N), thereby achieving an improvement of approximately min(h, m, log N) over the previous state of the art. The essential ingredient is an improved protocol for regrouping sorted private elements into additional groups (according to a flag vector) while maintaining their relative ordering. Implementing this protocol in the MP-SPDZ framework shows that it requires 10× less communication and is 9× faster than existing approaches.
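To see where that factor comes from, suppose the previous state of the art communicates on the order of hmN log N (an assumption consistent with the stated improvement rather than a figure from this summary). Dividing that cost by each term of the new bound gives

\[
\frac{hmN\log N}{mN\log N} = h, \qquad
\frac{hmN\log N}{hmN} = \log N, \qquad
\frac{hmN\log N}{hN\log N} = m,
\]

so whichever term dominates the new protocol, the saving over the assumed prior cost is approximately min(h, m, log N).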


Multi-label audio classification with a noisy zero-shot teacher

Improving the real-world accuracy of audio content detection (ACD) is an important problem for streaming platforms, operating systems, and playback devices. ACD is similar to audio tagging, i.e., labeling the sounds present in a given audio segment of several seconds or longer. However, ACD may use only a small number of higher-level labels or super-classes, e.g., speech, music, traffic, machines, animals, where each label can cover a multitude of specific sounds.

In a recent paper: Multi-label audio classification with a noisy zero-shot teacher, researchers from Microsoft propose a novel training scheme that uses self-label correction and data augmentation to deal with noisy labels and improve real-world accuracy on a polyphonic audio content detection task. The augmentation method reduces label noise by mixing multiple audio clips and joining their labels, while remaining compatible with multiple active labels. The researchers show that performance can be improved by a self-label correction method using the same pretrained model. They also show that it is feasible to use a strong zero-shot model such as CLAP to generate labels for unlabeled data and to improve the results using the proposed training and label enhancement methods. The resulting model performs similarly to CLAP while providing an efficient, mobile-device-friendly architecture that can be quickly adapted to unlabeled sound classes.


Tabularis Revilio: Converting Text to Tables

Tables are commonly used to store and present data. When tables are copied from documents and applications without proper tabular support, such as PDF documents, web pages, or images, they often end up as free-form text. Users then depend on manual effort or programming ability to parse this free-form text back into structured tables.

In a recent paper: Tabularis Revilio: Converting Text to Tables, researchers from Microsoft present a novel neurosymbolic system for reconstructing tables when their column boundaries have been lost. Revilio addresses this task by detecting headers, generating an initial table sketch using a large language model (LLM), and using that sketch as a guiding representation during an enumerate-and-test strategy that evaluates syntactic and semantic table structures. Revilio was evaluated on a diverse set of datasets, demonstrating significant improvements over existing table parsing methods. It outperforms traditional techniques in both accuracy and scalability, handling large tables with over 100,000 rows. The researchers’ experiments on publicly available datasets show an increase in reconstruction accuracy of 5.8–11.3% over both neural and symbolic state-of-the-art baselines.


Confidential Container Groups: Implementing Confidential Computing on Azure Container Instances

Container-based technologies empower cloud tenants to develop highly portable software and deploy services in the cloud at a rapid pace. Cloud privacy, meanwhile, is important because many container deployments operate on privacy-sensitive data, yet it is challenging to protect given the increasing frequency and sophistication of attacks. State-of-the-art confidential container designs leverage process-based trusted execution environments (TEEs), but they face security and compatibility issues that limit their practical deployment.

In a recent article in Communications of the ACM: Confidential Container Groups: Implementing Confidential Computing on Azure Container Instances, researchers from Microsoft and external colleagues present the Parma architecture, which provides lift-and-shift deployment of unmodified containers while offering strong security protection against a powerful attacker who controls the untrusted host and hypervisor. Parma leverages VM-level isolation to execute a container group within a unique VM-based TEE. Besides container integrity and user data confidentiality and integrity, Parma also offers container attestation and execution integrity based on an attested execution policy. This policy, which is specified by the customer, delimits the actions that the cloud service provider is allowed to take on their behalf when managing the container group.

The result is that customers receive the security protections of TEEs for their container workloads with minimal performance cost. To learn more, check out Confidential Containers on Azure Container Instances, which is based on Microsoft’s Parma architecture.


AI for Business Transformation with Peter Lee and Vijay Mital

Generative AI is changing how businesses operate and how stakeholders talk to each other. The building blocks for large-scale AI transformation are now in place, but we are only beginning to imagine how it will unfold. Learn what Microsoft research leaders discovered from some early AI innovation in healthcare, and how businesses can prepare for what’s ahead.

In this new three-part video series, Microsoft Research President Peter Lee and Corporate Vice President Vijay Mital discuss how Microsoft is helping businesses navigate this transformation, along with the critical role of data and how emerging multimodal AI models could turbocharge business innovation.



Read More