SK Telecom improves telco-specific Q&A by fine-tuning Anthropic’s Claude models in Amazon Bedrock

SK Telecom (SKT), South Korea’s leading telecommunications company serving 30 million customers, is at the forefront of AI innovation. In line with its AI Pyramid Strategy, which aims to unlock AI’s potential for anyone, anywhere, anytime, SKT has collaborated with the AWS Generative AI Innovation Center (GenAIIC) Custom Model Program to explore domain-trained models using Amazon Bedrock for telco-specific use cases.

This collaboration aligns with SKT’s vision of using AI expertise and strategic partnerships to develop innovative AI-based products and services. One such initiative focused on developing a custom solution for grounded question answering (Q&A) based on reference documents.

Retrieval Augmented Generation (RAG) is a popular technique for Q&A tasks, offering improved factual accuracy and knowledge grounding. However, RAG faces challenges for telco use cases: generated responses may not match the preferred tone, style, and manner, and irrelevant retrieved documents can lead to inaccurate answers. To address this, SKT and AWS GenAIIC aimed to use model customization to improve Anthropic Claude models on Amazon Bedrock in three key areas:

  • Providing concise and informative answers
  • Correctly referencing links from retrieved documents
  • Answering in a tone and style consistent with SKT and similar to ground truth answers

Additionally, the team explored boosting smaller model performance using synthetic data generated by bigger large language models (LLMs) for knowledge distillation and scenarios with limited labeled training data.

Amazon Bedrock is a fully managed service that offers a variety of LLMs and foundation models (FMs), along with capabilities such as Amazon Bedrock Knowledge Bases, Amazon Bedrock Agents, and Amazon Bedrock Guardrails, that can expedite many generative AI use cases. Amazon Bedrock is the only fully managed service that lets you fine-tune Claude models, and it offers an intuitive and secure way to do so. A fine-tuned Claude model can be deployed using Amazon Bedrock and can seamlessly use other Amazon Bedrock capabilities, for example, Amazon Bedrock Knowledge Bases for telco domain-specific RAG or Amazon Bedrock Agents for agentic usage.

In this post, we share how SKT customized Anthropic’s Claude models in Amazon Bedrock for telco-specific Q&A over SKT’s technical telecommunication documents.

Solution overview

The team explored combinations of prompt optimization, customization (fine-tuning), and data augmentation with synthetic data. This multifaceted approach aimed to maximize the benefits of each technique for the grounded Q&A generation task.

In the following sections, we explore these methods in more detail.

Anthropic’s Claude customization with prompt optimization

Fine-tuning, which is available through Amazon Bedrock for various FMs, including Anthropic’s Claude, allows adaptation of pre-trained language models for specific use cases. It’s particularly effective for tailoring response style and format adherence.
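
As a minimal sketch of starting such a customization job through the Amazon Bedrock API with boto3 (the bucket names, job name, role ARN, base model identifier, and hyperparameter values below are placeholder assumptions, and the available hyperparameters depend on the chosen base model):

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Start a fine-tuning (customization) job on a Claude base model.
# All names, ARNs, and S3 locations here are placeholders.
response = bedrock.create_model_customization_job(
    jobName="telco-qa-finetune-job",
    customModelName="telco-qa-claude",
    roleArn="arn:aws:iam::123456789012:role/BedrockFineTuneRole",
    baseModelIdentifier="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
    customizationType="FINE_TUNING",
    trainingDataConfig={"s3Uri": "s3://my-bucket/train.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/output/"},
    hyperParameters={"epochCount": "2", "learningRateMultiplier": "1.0"},  # placeholder values
)
print(response["jobArn"])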

The team first optimized the system prompt, implementing standardized guidelines for answer formatting and document citation based on Anthropic model prompting best practices. Key focus areas included:

  • Clear presentation of system commands
  • Consistent use of code block formatting
  • Context-based tailored responses

This prompt engineering, combined with fine-tuning, yielded substantial improvements:

  • Over 50% increase in ROUGE-3 score
  • Over 25% improvement in ROUGE-L score
  • Over 4% increase in embedding similarity score
  • Significant progress in accurate reference citation

The iterative enhancement process demonstrated cumulative benefits, with prompt updates alone showing 35–40 percent improvements in key metrics, and the final customized model achieving 50–60 percent gains in some metrics.

This progression clearly illustrates the cumulative benefits of model customization through RAG, prompt engineering, and fine-tuning, resulting in a model that significantly outperformed both the baseline and the prompt-updated versions in terms of ROUGE scores and citation accuracy. The ROUGE score measures the similarity between ground truth answers and generated results by computing n-gram word overlaps. The following table summarizes these improvements; a short ROUGE computation sketch follows the table.

LLM | Prompt update | Fine-tuning | ROUGE-3 | ROUGE-L | Citation accuracy
Anthropic’s Claude 3 Sonnet | | | baseline | baseline | baseline
Anthropic’s Claude 3 Sonnet | ✅ | | +38.30% | +13.4% | +52.94%
Anthropic’s Claude 3 Sonnet | ✅ | ✅ | +58.1% | +26.8% | +70.59%

(The last three columns show relative improvement over the baseline.)
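
For reference, ROUGE-N can be computed in a few lines of Python. The sketch below is a simplified illustration, not the exact evaluation code used in this work; it counts overlapping n-grams between a generated answer and a ground truth answer and reports an F1-style score:

from collections import Counter

def rouge_n(reference: str, candidate: str, n: int = 3) -> float:
    """Simplified ROUGE-N F1: n-gram overlap between reference and candidate."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    overlap = sum((ref & cand).values())
    if not ref or not cand or overlap == 0:
        return 0.0
    recall = overlap / sum(ref.values())
    precision = overlap / sum(cand.values())
    return 2 * precision * recall / (precision + recall)

print(rouge_n("the plan includes unlimited data", "the plan includes unlimited 5G data", n=3))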

Synthetic data for fine-tuning

To address the challenge of limited high-quality labeled training data, the team explored synthetic data generation techniques. This approach also facilitates knowledge distillation from larger LLMs to smaller, more targeted models, offering benefits such as lower latency and cost.

The team conducted controlled experiments using:

  • A baseline set of 500 ground truth samples
  • An augmented set with 500 original plus 1,500 synthetic samples
  • A larger original set of 2,000 samples

Synthetic data was generated using Anthropic’s Claude 3 Sonnet, creating new question-answer pairs over the same retrieved documents used in the ground truth examples.
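
The following is a minimal sketch of this kind of synthetic data generation with the Amazon Bedrock Converse API; the model ID, prompt wording, and example document are illustrative assumptions, not SKT’s actual pipeline:

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def generate_synthetic_qa(document: str) -> str:
    """Ask Claude to write a new question-answer pair grounded in a retrieved document."""
    prompt = (
        "You are preparing training data for a telco Q&A assistant.\n"
        "Based only on the document below, write one new question a customer might ask "
        "and a concise answer that cites the document.\n\n"
        f"<document>\n{document}\n</document>"
    )
    response = bedrock_runtime.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 512, "temperature": 0.7},
    )
    return response["output"]["message"]["content"][0]["text"]

print(generate_synthetic_qa("5G home internet plans include a Wi-Fi 6 router at no extra cost..."))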

The results were evaluated using both LLM-based comparison and human preference evaluation. Human evaluators blindly ranked model outputs, with scores assigned based on preference (Best: 4, Second: 3, Third: 2, Worst: 1). The following table shows the results of the human preference evaluation scores.

Rank | Model | Cumulative score (best possible: 160)
1 | Fine-tuned with 2,000 original samples | 114
2 | Fine-tuned with 500 original and 1,500 synthetic samples | 112
3 | Fine-tuned with 500 original samples | 85
4 | No fine-tuning (baseline) | 84

Some key findings include:

  • Small training sets (500 samples) showed minimal improvement over baseline
  • Larger training sets (2,000 samples) scored considerably higher
  • Synthetically augmented data performed similarly to equivalent-sized original data

Although having a large volume of domain-specific training data is always ideal, many businesses have limited available datasets. In such scenarios, synthetic data can play a crucial role in place of original data. This demonstrates the potential of synthetic data for model customization.

Conclusion

SK Telecom’s collaboration with AWS GenAIIC showcases the company’s commitment to developing innovative AI solutions for telco challenges. By using Amazon Bedrock to customize Anthropic’s Claude models, SKT has achieved significant performance improvements for telco-specific, Korean language use cases without the need to build models from scratch. The proof of concept demonstrated significant improvements:

  • ~58% increase in ROUGE-3 score
  • ~27% increase in ROUGE-L score
  • Substantial improvement in returning correct reference links

This approach, combined with synthetic data generation techniques, aligns with SKT’s AI Pyramid Strategy, enabling faster testing and development of new approaches. As SKT continues to focus on key areas such as personal AI assistants, AI healthcare, and AI data centers, this collaboration with AWS represents a significant step in their AI evolution and long-term competitiveness in the global AI landscape.

For those interested in working with AWS on similar projects, visit Generative AI Innovation Center.


About the Authors

Sungmin Hong is a Senior Applied Scientist at the AWS Generative AI Innovation Center, where he helps expedite a variety of use cases for AWS customers. Before joining Amazon, Sungmin was a postdoctoral research fellow at Harvard Medical School. He holds a Ph.D. in Computer Science from New York University. Outside of work, Sungmin enjoys hiking, reading, and cooking.

Sujeong Cha is a Deep Learning Architect at the AWS Generative AI Innovation Center, where she specializes in model customization and optimization. She has extensive hands-on experience in solving customers’ business use cases by utilizing generative AI as well as traditional AI/ML solutions. Sujeong holds a M.S. degree in Data Science from New York University.

Arijit Ghosh Chowdhury is a Scientist with the AWS Generative AI Innovation Center, where he works on model customization and optimization. In his role, he works on applied research in fine-tuning and model evaluations to enable GenAI for various industries. He has a Master’s degree in Computer Science from the University of Illinois at Urbana Champaign, where his research focused on question answering, search and domain adaptation.

Yiyue Qian is an Applied Scientist II at the AWS Generative AI Innovation Center, where she supports providing generative AI solutions to AWS customers. In this role, she collaborates with a team of experts to develop innovative AI-driven models for AWS customers across various industries. Yiyue holds a Ph.D. in Computer Science from the University of Notre Dame, where her research focused on advanced machine learning and deep learning techniques.

Wei-Chih Chen is a Machine Learning Engineer at the AWS Generative AI Innovation Center, where he works on model customization and optimization for LLMs. He also builds tools to help his team tackle various aspects of the LLM development life cycle, including fine-tuning, benchmarking, and load testing, accelerating the adoption of diverse use cases for AWS customers. He holds an M.S. degree in Computer Science from UC Davis.

Hannah Marlowe is a Senior Manager of Model Customization at the AWS Generative AI Innovation Center. Her team specializes in helping customers develop differentiating generative AI solutions using their unique and proprietary data to achieve key business outcomes. She holds a Ph.D. in Physics from the University of Iowa, with a focus on astronomical X-ray analysis and instrumentation development. Outside of work, she can be found hiking, mountain biking, and skiing around the mountains in Colorado.

Seunghyeon Jeong (Steve) is a team leader of the Platform Application team at SKT. He is responsible for commercializing the Global Intelligence Platform (GIP), which provides AI models and tools. His team is expanding the delivery of models and features to make it easier for internal teams to apply AI, contributing to SKT’s AI Transformation. Before entering the AI space, he was a Product Manager, developing and operating various mobile services such as mobile wallet, fashion streaming, and unified login services for the US and Korea.

Sunwoo Lee (Lois) is the team leader of the Data Construction and Evaluation Team within SK Telecom’s Global AI Tech division. She oversees the design and construction of training data for language models, the model performance evaluation process, and its application to services. Her career has focused on NLP within IT, which is a great fit with her background in Linguistics and Korean language education. Alongside her world-class team, she continues to explore and solve fascinating problems such as how to optimize the design of data for language model training, which tasks and methods to implement for validating language model performance, and the best design of AI-human conversations.

Eric Davis is the vice president of the AI Tech Collaboration Group at SKT. Eric oversees tech collaborations with worldwide tech partners to customize large language models (LLMs) for the telecommunications domain. His teams are responsible for designing and building the datasets to tune LLMs, as well as benchmarking LLMs in general and for the telecommunications domain. Eric holds a Master of Science degree in Computer Science from Carnegie Mellon from the Language Technologies Institute and a Bachelor of Arts in Linguistics and Psychology from the University of California, Los Angeles.

Scaling Rufus, the Amazon generative AI-powered conversational shopping assistant with over 80,000 AWS Inferentia and AWS Trainium chips, for Prime Day

Amazon Rufus is a shopping assistant experience powered by generative AI. It generates answers using relevant information from across Amazon and the web to help Amazon customers make better, more informed shopping decisions. With Rufus, customers can shop alongside a generative AI-powered expert that knows Amazon’s selection inside and out and brings it all together with information from across the web.

To meet the needs of Amazon customers at scale, Rufus required a low-cost, performant, and highly available infrastructure for inference. The solution needed the capability to serve multi-billion parameter large language models (LLMs) with low latency across the world to service its expansive customer base. Low latency makes sure users have a positive experience chatting with Rufus and can start getting responses in less than a second. To achieve this, the Rufus team is using multiple AWS services and AWS AI chips, AWS Trainium and AWS Inferentia.

Inferentia and Trainium are purpose-built chips developed by AWS that accelerate deep learning workloads with high performance and lower overall costs. With these chips, Rufus reduced its costs by 4.5 times compared with other evaluated solutions while maintaining low latency for its customers. In this post, we dive into the Rufus inference deployment using AWS chips and how this enabled one of the most demanding events of the year—Amazon Prime Day.

Solution overview

At its core, Rufus is powered by an LLM trained on Amazon’s product catalog and information from across the web. LLM deployment can be challenging, requiring you to balance factors such as model size, model accuracy, and inference performance. Larger models generally have better knowledge and reasoning capabilities but come at a higher cost due to more demanding compute requirements and increasing latency. Rufus would need to be deployed and scaled to meet the tremendous demand of peak events like Amazon Prime Day. Considerations for this scale include how well it needs to perform, its environmental impact, and the cost of hosting the solution. To meet these challenges, Rufus used a combination of AWS solutions: Inferentia2 and Trainium, Amazon Elastic Container Service (Amazon ECS), and Application Load Balancer (ALB). In addition, the Rufus team partnered with NVIDIA to power the solution using NVIDIA’s Triton Inference Server, providing capabilities to host the model using AWS chips.

Rufus inference is a Retrieval Augmented Generation (RAG) system with responses enhanced by retrieving additional information such as product information from Amazon search results. These results are based on the customer query, making sure the LLM generates reliable, high-quality, and precise responses.

To make sure Rufus was best positioned for Prime Day, the Rufus team built a heterogeneous inference system using multiple AWS Regions powered by Inferentia2 and Trainium. Building a system across multiple Regions allowed Rufus to benefit in two key areas. First, it provided additional capacity that could be used during times of high demand, and second, it improved the overall resiliency of the system.

The Rufus team was also able to use both Inf2 and Trn1 instance types. Because Inf2 and Trn1 instance types use the same AWS Neuron SDK, the Rufus team was able to use both instances to serve the same Rufus model. The only configuration setting to adjust was the tensor parallelism degree (24 for Inf2, 32 for Trn1). Using Trn1 instances also led to an additional 20% latency reduction and throughput improvement compared to Inf2.
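
As an illustrative sketch (the model name and engine arguments are assumptions, not Rufus’s production configuration, and device="neuron" requires a vLLM build with Neuron support), serving the same model on both instance families only requires changing the tensor parallelism degree:

from vllm import LLM, SamplingParams

# Same model artifact on both instance families; only the tensor parallelism
# degree changes (24 NeuronCores on Inf2, 32 on Trn1).
TP_DEGREE = {"inf2": 24, "trn1": 32}

llm = LLM(
    model="example-org/example-multi-billion-parameter-llm",  # placeholder model
    tensor_parallel_size=TP_DEGREE["trn1"],
    max_num_seqs=8,           # concurrent sequences handled by continuous batching
    max_model_len=4096,
    device="neuron",          # route execution through the AWS Neuron SDK backend
)

outputs = llm.generate(
    ["What accessories are compatible with this laptop?"],
    SamplingParams(temperature=0.7, max_tokens=256),
)
print(outputs[0].outputs[0].text)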

The following diagram illustrates the solution architecture.

To support real-time traffic routing across multiple Regions, Rufus built a novel traffic orchestrator. Amazon CloudWatch supported the underlying monitoring, helping the team adjust the traffic ratio across the different Regions in less than 15 minutes based on the traffic pattern changes. By using this type of orchestration, the Rufus team had the ability to direct requests to other Regions when needed, with a small trade-off of latency to the first token. Due to Rufus’s streaming architecture and the performant AWS network between Regions, the perceived latency was minimal for end-users.

These choices allowed Rufus to scale up to over 80,000 Trainium and Inferentia chips across three Regions, serving an average of 3 million tokens a minute while maintaining a P99 latency of less than 1 second to the first response for Prime Day customers. In addition, by using these purpose-built chips, Rufus achieved 54% better performance per watt than other evaluated solutions, which helped the Rufus team meet energy efficiency goals.

Optimizing inference performance and host utilization

Within each Region, the Rufus inference system used Amazon ECS, which managed the underlying Inferentia and Trainium powered instances. Because Amazon ECS manages the underlying infrastructure, the Rufus team only needed to bring their container and configuration by defining an ECS task. Within each container, an NVIDIA Triton Inference Server with a Python backend runs vLLM with the Neuron SDK. vLLM is a memory-efficient inference and serving engine that is optimized for high throughput. The Neuron SDK makes it straightforward for teams to adopt AWS chips and supports many different libraries and frameworks such as PyTorch Lightning.

The Neuron SDK provides a straightforward LLM inference solution on Trainium and Inferentia hardware with optimized performance supporting a wide range of transformer-based LLM architectures. To reduce latency, Rufus collaborated with the AWS Annapurna team to develop various optimizations, such as INT8 (weight-only) quantization, continuous batching with vLLM, and resource, compute, and memory bandwidth improvements in the Neuron compiler and runtime. These optimizations are currently deployed in Rufus production and are available to use in the Neuron SDK 2.18 and onward.

To reduce overall waiting time for customers to start seeing a response from Rufus, the team also developed an inference streaming architecture. With the high compute and memory load needed for LLM inference, the total time it takes to finish generating the full response for a customer query can take multiple seconds. With a streaming architecture, Rufus is able to return the tokens right after they’re generated. This optimization allows the customer to start consuming the response in less than 1 second. In addition, multiple services work together using gRPC connections to intelligently aggregate and enhance the streaming response in real time for customers.
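
Conceptually, the difference is between returning the whole answer at once and yielding tokens as they are produced. The sketch below is generic Python illustrating the streaming pattern, not Rufus’s production gRPC code; the token list and delays are stand-ins:

import time

def generate_tokens(prompt: str):
    """Stand-in for an LLM that produces one token at a time."""
    for token in ["Noise", "-cancelling", " headphones", " are", " popular", "."]:
        time.sleep(0.2)   # simulated per-token generation latency
        yield token

def stream_response(prompt: str):
    # Flush each token to the client as soon as it is generated, so the
    # customer starts reading well before the full answer is finished.
    for token in generate_tokens(prompt):
        print(token, end="", flush=True)
    print()

stream_response("Recommend headphones for flights")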

As shown in the following figure, images and links are embedded in the response, which allow customers to engage and continue exploring with Rufus.

Scaling up

Although we have to maintain low latency for the best customer experience, it’s also crucial to scale the service throughput by achieving high hardware resource utilization. High hardware utilization makes sure accelerators don’t sit idle and needlessly increase costs. To optimize the inference system throughput, the team improved both single-host throughput as well as load balancing efficiency.

Load balancing for LLM inference is tricky due to the following challenges. First, a single host can only handle a limited number of concurrent requests. Second, the end-to-end latency to complete one request can vary, spanning many seconds depending on the LLM response length.

To address the challenges, the team optimized throughput by considering both single-host throughput and throughput across many hosts using load balancing.

The team used the least outstanding requests (LOR) routing algorithm from ALB, increasing throughput by five times in comparison to an earlier baseline measurement. This allows each host to have enough time to process in-flight requests and stream back responses using a gRPC connection, without getting overwhelmed by multiple requests received at the same time. Rufus also collaborated with the AWS and vLLM teams to improve single-host concurrency using vLLM integration with the Neuron SDK and NVIDIA Triton Inference Server.

Figure 1. ECS tasks scale horizontally hosting the Triton Inference Server and dependencies

With this integration, Rufus was able to benefit from a critical optimization: continuous batching. Continuous batching allows a single host to greatly increase throughput. In addition, continuous batching provides unique capabilities in comparison to other batch techniques, such as static batching. For example, when using static batching, the time to first token (TTFT) increases linearly with the number of requests in one batch. Continuous batching prioritizes the prefill stage for LLM inference, keeping TTFT under control even with more requests running at the same time. This helped Rufus provide a pleasant experience with low latency when generating the first response, and improve the single-host throughput to keep serving costs under control.

Conclusion

In this post, we discussed how Rufus is able to reliably deploy and serve its multi-billion-parameter LLM using the Neuron SDK with Inferentia2 and Trainium chips and AWS services. Rufus continues to evolve with advancements in generative AI and customer feedback and we encourage you to use Inferentia and Trainium.

Learn more about how we are innovating with generative AI across Amazon.


About the author

James Park is a Solutions Architect at Amazon Web Services. He works with Amazon.com to design, build, and deploy technology solutions on AWS, and has a particular interest in AI and machine learning. In his spare time, he enjoys seeking out new cultures, new experiences, and staying up to date with the latest technology trends.

RJ is an Engineer within Amazon. He builds and optimizes distributed systems for training and works on optimizing systems to reduce latency for ML inference. Outside work, he is exploring using generative AI for building food recipes.

Yang Zhou is a software engineer working on building and optimizing machine learning systems. His recent focus is enhancing the performance and cost efficiency of generative AI inference. Beyond work, he enjoys traveling and has recently discovered a passion for running long distances.

Adam (Hongshen) Zhao is a Software Development Manager at Amazon Stores Foundational AI. In his current role, Adam is leading Rufus Inference team to build GenAI inference optimization solutions and inference system at scale for fast inference at low cost. Outside work, he enjoys traveling with his wife and art creations.

Faqin Zhong is a software engineer at Amazon Stores Foundational AI, working on Large Language Model (LLM) inference infrastructure and optimizations. Passionate about Generative AI technology, Faqin collaborates with leading teams to drive innovations, making LLMs more accessible and impactful, ultimately enhancing customer experiences across diverse applications. Outside of work she enjoys cardio exercise and baking with her son.

Nicolas Trown is an engineer in Amazon Stores Foundational AI. His recent focus is lending his systems expertise across Rufus to aid the Rufus Inference team and improve efficient utilization across the Rufus experience. Outside of work he enjoys spending time with his wife and taking day trips to the nearby coast, Napa, and Sonoma areas.

Bing Yin is a director of science at Amazon Stores Foundational AI. He leads the effort to build LLMs that are specialized for shopping use cases and optimized for inference at Amazon scale. Outside of work, he enjoys running marathon races.

Exploring alternatives and seamlessly migrating data from Amazon Lookout for Vision

Amazon Lookout for Vision, the AWS service designed to create customized artificial intelligence and machine learning (AI/ML) computer vision models for automated quality inspection, will be discontinued on October 31, 2025. New customers will not be able to access the service effective October 10, 2024, but existing customers will be able to use the service as normal until October 31, 2025. AWS will continue to support the service with security updates, bug fixes, and availability enhancements, but we do not plan to introduce new features for this service.

This post discusses some alternatives to Lookout for Vision and how you can export your data from Lookout for Vision to migrate to an alternate solution.

Alternatives to Lookout for Vision

If you’re interested in an alternative to Lookout for Vision, AWS has options for both buyers and builders.

For an out-of-the-box solution, the AWS Partner Network offers solutions from multiple partners. You can browse solutions on the Computer Vision for Quality Insights page in the AWS Solutions Library. These partner solutions include options for software, software as a service (SaaS) applications, managed solutions or custom implementations based on your needs. This approach provides a solution that addresses your use case without requiring you to have expertise in imaging, computer vision, AI, or application development. This typically provides the fastest time to value by taking advantage of the specialized expertise of the AWS Partners. The Solutions Library also has additional guidance to help you build solutions faster.

If you prefer to build your own solution, AWS offers AI tools and services to help you develop an AI-based computer vision inspection solution. Amazon SageMaker provides a set of tools to build, train, and deploy ML models for your use case with fully managed infrastructure, tools, and workflows. In addition to SageMaker enabling you to build your own models, Amazon SageMaker JumpStart offers built-in computer vision algorithms and pre-trained defect detection models that can be fine-tuned to your specific use case. This approach provides you the tools to accelerate your AI development while providing complete flexibility to build a solution that meets your exact requirements and integrates with your existing hardware and software infrastructure. This typically provides the lowest operating costs for a solution.

AWS also offers Amazon Bedrock, a fully managed service that offers a choice of high-performing generative AI foundation models (FMs), including models that can help build a defect detection model running in the cloud. This approach enables you to build a custom solution while using the power of generative AI to handle the custom computer vision model creation and some of the code generation to speed development, eliminating the need for full AI computer vision expertise. Amazon Bedrock provides the ability to analyze images for defects, compare performance of different models, and generate code for custom applications. This alternative is useful for use cases that don’t require low latency processing, providing faster time to value and lower development costs.

Migrating data from Lookout for Vision

To move existing data from Lookout for Vision to use in an alternative implementation, the Lookout for Vision SDK provides the capability to export a dataset from the service to an Amazon Simple Storage Service (Amazon S3) bucket. This procedure exports the training dataset, including manifest and dataset images, for a project to a destination Amazon S3 location that you specify. With the exported dataset and manifest file, you can use the same data that you used to create a Lookout for Vision model to create a model using SageMaker or Amazon Bedrock, or provide it to a partner to incorporate into their customizations for your use case.
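
A simplified sketch of such an export using boto3 is shown below; it lists the training dataset entries for a project and writes the manifest to a destination S3 location. The project, bucket, and prefix names are placeholders, the dataset images referenced by the manifest would still need to be copied separately, and AWS publishes a more complete export script in the Lookout for Vision documentation:

import boto3

lookoutvision = boto3.client("lookoutvision")
s3 = boto3.client("s3")

PROJECT, DATASET_TYPE = "my-defect-project", "train"            # placeholders
DEST_BUCKET, DEST_PREFIX = "my-export-bucket", "lookoutvision-export/"

# Collect every JSON line of the training dataset manifest, page by page.
entries, token = [], None
while True:
    kwargs = {"ProjectName": PROJECT, "DatasetType": DATASET_TYPE}
    if token:
        kwargs["NextToken"] = token
    page = lookoutvision.list_dataset_entries(**kwargs)
    entries.extend(page["DatasetEntries"])
    token = page.get("NextToken")
    if not token:
        break

# Write the manifest to the destination S3 location for reuse with
# SageMaker, Amazon Bedrock, or a partner solution.
s3.put_object(
    Bucket=DEST_BUCKET,
    Key=f"{DEST_PREFIX}{DATASET_TYPE}.manifest",
    Body="\n".join(entries).encode("utf-8"),
)
print(f"Exported {len(entries)} dataset entries")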

Summary

Although Lookout for Vision is planned to shut down on October 31, 2025, AWS offers a powerful set of AI/ML services and solutions in the form of SageMaker tools to build custom models and generative AI with Amazon Bedrock to do customized inspection and generate code, in addition to a range of offerings from partners in the AWS Partner Network. Export tools enable you to effortlessly move your data from Lookout for Vision to an alternate solution if you so choose. You should explore these options to determine what works best for your specific needs.

For more details, refer to the following resources:


About the Author

Tim Westman is the Product Manager and Go-to-Market Lead for Edge Machine Learning at AWS, where he leads product management and business development for the Edge Machine Learning business. In this role, he works with customers to help build computer vision solutions at the edge to solve complex operational challenges. Tim has more than 30 years of experience in sales, business development, and product management roles for leading hardware and software companies, with the last 8 years specializing in AI and computer vision for IoT applications.

AI’ll Be by Your Side: Mental Health Startup Enhances Therapist-Client Connections

Half of the world’s population will experience a mental health disorder — but the median number of mental health workers per 100,000 people is just 13, according to the World Health Organization.

To help tackle this disparity — which can vary by over 40x between high-income and low-income countries — a Madrid-based startup is offering therapists AI tools to improve the delivery of mental health services.

Therapyside, a member of the NVIDIA Inception program for cutting-edge startups, is bolstering its online therapy platform using NVIDIA NIM inference microservices. These AI microservices serve as virtual assistants and notetakers, letting therapists focus on connecting with their clients.

“In a therapy setting, having a strong alliance between counselor and client is everything,” said Alessandro De Sario, founder and CEO of Therapyside. “When a therapist can focus on the session without worrying about note-taking, they can reach that level of trust and connection much quicker.”

For the therapists and clients who have opted in to test these AI tools, a speech recognition model transcribes their conversations. A large language model summarizes the session into clinical notes, saving time for therapists so they can speak with more clients and work more efficiently. Another model powers a virtual assistant, dubbed Maia, that can answer therapists’ questions using retrieval-augmented generation, aka RAG.

Therapyside aims to add features over time, such as support for additional languages and an offline version that can transcribe and summarize in-person therapy sessions.

“We’ve just opened the door,” said De Sario. “We want to make the tool much more powerful so it can handle administrative tasks like calendar management and patient follow-up, or remind therapists of topics they should cover in a given session.”

AI’s in Session: Enhancing Therapist-Client Relationships

Therapyside, founded in 2017, works with around 1,000 licensed therapists in Europe offering counseling in English, Italian and Spanish. More than 500,000 therapy sessions have been completed through its virtual platform to date.

The company’s AI tools are currently available through a beta program. Therapists who choose to participate can invite their clients to opt in to the AI features.

“It’s incredibly helpful to have a personalized summary with a transcription that highlights the most important points from each session I have with my patients,” said Alejandro A., one of the therapists participating in the beta program. “I’ve been pleasantly surprised by its ability to identify the most significant areas to focus on with each patient.”

Screen capture of Therapyside session with transcription running live
A speech recognition AI model can capture live transcriptions of sessions.

The therapists testing the tool rated the transcriptions and summaries as highly accurate, helping them focus on listening without worrying about note-taking.

“The recaps allow me to be fully present with the clients in my sessions,” said Maaria A., another therapist participating in the beta program.

During sessions, clients share details about their life experiences that are captured in the AI-powered transcriptions and summaries. Therapyside’s RAG-based Maia connects to these resources to help therapists quickly recall minutiae like the name of a client’s sibling, or track how a client’s main challenges have evolved over time. This information can help therapists pose more personalized questions and provide better support.

“Maia is a valuable tool to have when you’re feeling a little stuck,” said Maaria A. “I have clients all over the world, so Maia helps remind me where they live. And if I ask Maia to suggest exercises clients could do to boost their self-esteem, it helps me find resources I can send to them, which helps save time.”

Screen capture of a therapist Q&A with the Maia virtual assistant
Maia can answer therapists’ questions based on session transcripts and summaries.

Take Note: AI Microservices Enable Easy Deployment

Therapyside’s AI pipeline runs on NVIDIA GPUs in a secure cloud environment and is built with NVIDIA NIM, a set of easy-to-use microservices designed to speed up AI deployment.

For transcription, the pipeline uses NVIDIA Riva NIM microservices, which include NVIDIA Parakeet, a record-setting family of models, to deliver highly accurate automatic speech recognition.

Flowchart illustrating Therapyside’s AI pipeline.

Once the transcript is complete, the text is processed by a NIM microservice for Meta’s Llama 3.1 family of open-source AI models to generate a summary that’s added to the client’s clinical history.
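
NIM microservices expose an OpenAI-compatible API, so a summarization step like this can be sketched roughly as follows. The endpoint URL, model name, and prompt are illustrative assumptions, not Therapyside’s production code:

from openai import OpenAI

# A Llama 3.1 NIM microservice running in a private cloud environment;
# the base_url and model name below are placeholders.
client = OpenAI(base_url="http://llama-nim.internal:8000/v1", api_key="not-used")

def summarize_session(transcript: str) -> str:
    """Turn a session transcript into concise clinical-style notes."""
    completion = client.chat.completions.create(
        model="meta/llama-3.1-8b-instruct",
        messages=[
            {"role": "system", "content": "Summarize this therapy session transcript "
                                          "into concise clinical notes."},
            {"role": "user", "content": transcript},
        ],
        temperature=0.2,
    )
    return completion.choices[0].message.content

notes = summarize_session("Therapist: How has your week been? Client: ...")
print(notes)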

The Maia virtual assistant, which also uses a Llama 3.1 NIM microservice, accesses these clinical records using a RAG pipeline powered by NVIDIA NeMo Retriever NIM microservices. RAG techniques enable organizations to connect AI models to their private datasets to deliver contextually accurate responses.

Therapyside plans to further customize Maia with capabilities that support specific therapeutic methods, such as cognitive behavioral therapy and psychodynamic therapy. The team is also integrating NVIDIA NeMo Guardrails to further enhance the tools’ safety and security.

Kimberly Powell, vice president of healthcare at NVIDIA, will discuss Therapyside and other healthcare innovators in a keynote address at HLTH, a conference taking place October 20-23 in Las Vegas.

Learn more about NVIDIA Inception and get started with NVIDIA NIM microservices at ai.nvidia.com.

The Next Chapter Awaits: Dive Into ‘Diablo IV’s’ Latest Adventure ‘Vessel of Hatred’ on GeForce NOW

Prepare for a devilishly good time this GFN Thursday as the critically acclaimed Diablo IV: Vessel of Hatred downloadable content (DLC) joins the cloud, one of six new games available this week.

GeForce NOW also extends its game-library sync feature to Battle.net accounts, so members can seamlessly bring their favorite Blizzard games into their cloud-streaming libraries.

Hell’s Bells and Whistles

Get ready to rage. New DLC for the hit title Diablo IV: Vessel of Hatred is available to stream at launch this week, with thrilling content and gameplay for GeForce NOW members to experience.

Diablo IV Vessel of Hatred DLC on GeForce NOW
Hate is in the air.

Diablo IV: Vessel of Hatred DLC is the highly anticipated expansion of the latest installment in Blizzard’s iconic action role-playing game series. It introduces players to the lush and dangerous jungles of Nahantu. Teeming with both beauty and dangers, this new environment offers a fresh backdrop for action-packed battles against the demonic forces of Hell. A new playable class, the Spiritborn, offers unique gameplay mechanics tied to four guardian spirits: the eagle, gorilla, jaguar and centipede.

The DLC extends the main Diablo IV story and includes new features such as recruitable Mercenaries, a Player vs. Everyone co-op endgame activity, Party Finder to help members team up and take down challenges together, and more. Vessel of Hatred arrives alongside major updates including revamped leveling, a new difficulty system and Paragon adjustments that will continue to enhance the world of Diablo IV.

Ultimate members can experience the wrath at up to 4K resolution and 120 frames per second with support for NVIDIA DLSS and ray-tracing technologies. And members can jump right into the latest DLC without having to wait around for updates. Hell never looked so good, even on low-powered devices.

Let That Sync In

Battle.net game sync on GeForce NOW
Connection junction.

With game syncing for Blizzard’s Battle.net game library coming to GeForce NOW this week, members can connect their digital game store accounts so that all of their supported games are part of their streaming libraries.

Members can now easily find and stream popular titles such as StarCraft II, Overwatch 2, Call of Duty HQ and Hearthstone from their cloud gaming libraries, enhancing the games’ accessibility across a variety of devices.

Battle.net joins other digital storefronts that already have game sync support, including Steam, Epic Games Store, Xbox and Ubisoft Connect. This allows members to consolidate their gaming experiences in one place.

Plus, GeForce NOW members can play high-quality titles without the need for high-end hardware, streaming from GeForce RTX-powered servers in the cloud. Whether battling demons in Sanctuary or engaging in epic firefights, GeForce NOW members get a seamless gaming experience anytime, anywhere.

Hot and New

Europa on GeForce NOW
Soar through serenity and uncover destiny, all from the cloud.

Europa is a peaceful game of adventure, exploration and meditation from Future Friends Games, ready for members to stream at launch this week. On the moon Europa, a lush terraformed paradise in Jupiter’s shadow, an android named Zee sets out in search of answers. Run, glide and fly across the landscape, solve mysteries in the ruins of a fallen utopia, and discover the story of the last human alive.

Members can look for the following games available to stream in the cloud this week:

  • Empyrion – Galactic Survival (New release on Epic Games Store, Oct. 10)
  • Europa (New release on Steam, Oct. 11)
  • Dwarven Realms (Steam)
  • Star Trek Timelines (Steam)
  • Star Trucker (Steam)
  • Starcom: Unknown Space (Steam)

What are you planning to play this weekend? Let us know on X or in the comments below.

Unlock the knowledge in your Slack workspace with Slack connector for Amazon Q Business

Amazon Q Business is a fully managed, generative AI-powered assistant that you can configure to answer questions, provide summaries, generate content, and complete tasks based on your enterprise data. Amazon Q Business offers over 40 built-in connectors to popular enterprise applications and document repositories, including Amazon Simple Storage Service (Amazon S3), Salesforce, Google Drive, Microsoft 365, ServiceNow, Gmail, Slack, Atlassian, and Zendesk, and can help you create your generative AI solution with minimal configuration.

Nearly 100,000 organizations use Slack to bring the right people together to collaborate securely with each other. A Slack workspace captures invaluable organizational knowledge in the form of the information that flows through it as users communicate on it. Hence, it is valuable to make this knowledge quickly and securely available to users.

In this post, we will demonstrate how to set up the Slack connector for Amazon Q Business to sync communications from both public and private channels, reflecting user permissions. We will also guide you through the configurations needed in your Slack workspace. Additionally, you will learn how to configure the Amazon Q Business application and enable user authentication through AWS IAM Identity Center, the recommended service for managing a workforce’s access to AWS applications.

Data source overview

Amazon Q Business uses large language models (LLMs) to build a unified solution that connects multiple data sources. Typically, you’d need to use a natural language processing (NLP) technique called Retrieval Augmented Generation (RAG) for this. With RAG, generative AI enhances its responses by incorporating relevant information retrieved from a curated dataset. Amazon Q Business has a built-in managed RAG capability designed to reduce the undifferentiated heavy lifting involved in creating these systems. Typical of a RAG model, Amazon Q Business has two components: A retrieval component that retrieves relevant documents for the user query and a generation component that takes the query and the retrieved documents and then generates an answer to the query using an LLM.

A Slack workspace has multiple elements. It has public channels where workspace users can participate and private channels where only channel members can communicate with each other. Individuals can also directly communicate with each other in one-on-one conversations and in user groups. This communication is in the form of messages and threads of replies, with optional document attachments. Slack workspaces of active organizations are highly dynamic, with the content and collaboration evolving and growing in volume continuously.

The preceding figure shows the process flow of the solution. When you connect Amazon Q Business to a data source (in this case, Slack), what Amazon Q considers and crawls as a document varies by connector. For the Amazon Q Business Slack connector, each message, message attachment, and channel post is considered a single document. Slack conversation threads, which help you create organized discussions around specific messages, are also ingested as a single document each, regardless of the number of participants or messages they contain.

Amazon Q Business crawls access control list (ACL) information attached to a document (user and group information) from your Slack instance. This information can be used to filter chat responses to the user’s document access level. The Slack connector supports token-based authentication. This could be a Slack bot user OAuth token or Slack user OAuth token. See the Slack connector overview to get the list of entities that are extracted, supported filters, sync modes, and file types.

User IDs (_user_id) exist in Slack on messages and channels that have access permissions set. They are mapped from user emails to the IDs in Slack.

To connect your data source connector to Amazon Q Business, you must give Amazon Q Business an IAM role that has the following permissions (a sample role-creation sketch follows the list):

  • Permission to access the BatchPutDocument and BatchDeleteDocument operations to ingest documents.
  • Permission to access the User Store API operations to ingest user and group access control information from documents.
  • Permission to access your AWS Secrets Manager secret to authenticate your data source connector instance.
  • (Optional) If you’re using Amazon Virtual Private Cloud (Amazon VPC), permission to access your Amazon VPC.
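
As a rough sketch of creating such a role with boto3 (the role name, account ID, Region, application ID, and secret ARN are placeholders, and the User Store API permissions for ACL ingestion are omitted for brevity; consult the Amazon Q Business documentation for the authoritative policy):

import json
import boto3

iam = boto3.client("iam")
ACCOUNT_ID, APP_ID = "123456789012", "example-qbusiness-app-id"  # placeholders

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "qbusiness.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {  # Ingest and delete documents in the index
            "Effect": "Allow",
            "Action": ["qbusiness:BatchPutDocument", "qbusiness:BatchDeleteDocument"],
            "Resource": f"arn:aws:qbusiness:us-east-1:{ACCOUNT_ID}:application/{APP_ID}/*",
        },
        {  # Read the Slack OAuth token stored in Secrets Manager
            "Effect": "Allow",
            "Action": ["secretsmanager:GetSecretValue"],
            "Resource": f"arn:aws:secretsmanager:us-east-1:{ACCOUNT_ID}:secret:slack-token-*",
        },
        # Additional User Store API permissions for ACL ingestion are needed as well;
        # see the Amazon Q Business documentation.
    ],
}

role = iam.create_role(RoleName="QBusinessSlackDataSourceRole",
                       AssumeRolePolicyDocument=json.dumps(trust_policy))
iam.put_role_policy(RoleName="QBusinessSlackDataSourceRole",
                    PolicyName="slack-connector-permissions",
                    PolicyDocument=json.dumps(permissions_policy))
print(role["Role"]["Arn"])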

Solution overview

In this solution, we will show you how to create a Slack workspace with users who perform various roles within the organization. We will then show you how to configure this workspace to define the set of scopes required by the Amazon Q Business Slack connector to index the user communication. This is followed by the configuration of the Amazon Q Business application and a Slack data source. Based on the configuration, when the data source is synchronized, the connector crawls and indexes the content from the workspace that was created on or before a specific date. The connector also collects and ingests ACL information for each indexed message and document. Thus, the search results of a query made by a user include results only from those documents that the user is authorized to read.

Prerequisites

To build the Amazon Q Business connector for Slack, you need the following:

In Slack:

  • Create a Slack bot user OAuth token or Slack user OAuth token. You can choose either token to connect Amazon Q Business to your Slack data source. See the Slack documentation on access tokens for more information.
  • Note your Slack workspace team ID from your Slack workspace main page URL. For example, https://app.slack.com/client/T0123456789/... where T0123456789 is the team ID.
  • Add the OAuth scopes and read permissions.

In your AWS account:

  • Create an AWS Identity and Access Management (IAM) role for your data source and, if using the Amazon Q Business API, note the ARN of the IAM role.
  • Store your Slack authentication credentials in an AWS Secrets Manager secret and, if using the Amazon Q Business API, note the ARN of the secret.
  • Enable and configure an IAM Identity Center instance. Amazon Q Business integrates with IAM Identity Center as a gateway to manage user access to your Amazon Q Business application. We recommend enabling and pre-configuring an Identity Center instance before you begin to create your Amazon Q Business application. Identity Center is the recommended AWS service for managing human user access to AWS resources. Amazon Q Business supports both organization and account level Identity Center instances. See Setting up for Amazon Q Business for more information.

Configure your Slack workspace

You will create one user for each of the following roles: Administrator, Data scientist, Database administrator, Solutions architect and Generic.

User name | Role
arnav_desai | Admin
jane_doe | Data Scientist
pat_candella | DB Admin
mary_major | Solutions Architect
john_stiles | Generic User

To showcase the ACL propagation, you will create three public channels, #general, #customerwork, and #random, that any member can access, including the Generic user. You will also create one private channel, #anydepartment-project-private, that can be accessed only by the users arnav_desai, john_stiles, mary_major, and pat_candella.

To create a Slack app:

  1. Navigate to the Slack API Your Apps page and choose Create New App.
  2. Select From scratch.
  3. Give the Slack app a name, select the workspace to develop your app in, and then choose Create App.
  4. After you’ve created your app, select it, navigate to Features, and choose OAuth & Permissions.
  5. Scroll down to Scopes > User Token Scopes and set the OAuth scope based on the user token scopes in Prerequisites for connecting Amazon Q Business to Slack.

Note: You can configure two types of scopes in a Slack workspace:

  1. Bot token scope: The bot token crawls only the messages to which the bot has been explicitly added. It is employed to grant restricted access to specific messages only.
  2. User token scope: The user token acts on behalf of a Slack user and can access only the data shared with that user.

For this example, you will use the user token scope so you can search the conversations between users.

  1. After the OAuth scope for the user token has been set up as described in the Slack prerequisites, scroll up to the OAuth Tokens for your Workspace section, choose Install to Workspace, and then choose Allow.
  2. This will generate a user OAuth token. Copy this token to use when configuring the Amazon Q Business Slack connector.

Configure the data source using the Amazon Q Business Slack connector

In this section, you will create an Amazon Q Business application using the console.

To create an Amazon Q Business application

  1. In the AWS Management Console for Amazon Q Business, choose Create Application.
  2. Enter an Application Name, such as my-slack-workspace. Leave the Service access as the default value and select AWS IAM Identity Center for Access Management. Enter a new Tag value as required and choose Create to create the Amazon Q Business application.
  3. Leave the default option of Use Native retriever selected for Retrievers, leave Enterprise as the Index provisioning, and leave the default value of 1 as the Number of units. Each unit in an Amazon Q Business index is 20,000 documents or 200 MB of extracted text (whichever comes first). Choose Next.
  4. Scroll down the list of available connectors and select Slack and then choose Next.

    1. Enter a Data source name and a Description to identify your data source and then enter the Slack workspace team ID to connect with Amazon Q Business.
    2. In the Authentication section, select Create and add a new secret.
    3. On the dialog box that appears, enter a Secret name followed by the User OAuth Slack token that was copied from the Slack workspace.
    4. For the IAM role, select Create a new service role (Recommended).
    5. In Sync scope, choose the following:
      • For select type of content to crawl, select All channels.
      • Select an appropriate date for Select crawl start date.
      • Leave the default value selected for Maximum file size as 50.
      • You can include specific Messages, such as bot messages or archived messages to sync.
      • Additionally, you can include up to 100 patterns to include or exclude filenames, types, or file paths to sync.

    6. For Sync mode, leave Full sync selected and for the Sync run schedule, select Run on demand.
    7. Leave the field mapping as is and choose Add data source.
    8. On the next page, choose Next.
  5. Add the five users you created earlier, who are a part of IAM Identity Center and the Slack workspace, to the Amazon Q Business application. To add users to Identity Center, follow the instructions in Add users to your Identity Center directory. When done, choose Add groups and users and choose Assign.
  6. When a user is added, each user is assigned the default Amazon Q Business Pro subscription. For more information on different pricing tiers, see the Amazon Q Business pricing page.
  7. Choose Create application to finish creating the Amazon Q Business application.
  8. After the application and the data source are created, select the data source and then choose Sync now to start syncing documents from your data source.
  9. The sync process ingests the documents from your Slack workspace to your selections in the Slack connector configuration in Amazon Q Business. The following screenshot shows the results of a successful sync, indicated by the status of Completed.

Search with Amazon Q Business

Now, you’re ready to make a few queries in Amazon Q Business.

To search using Amazon Q Business:

  1. Navigate to the Web experience settings tab and click on the Deployed URL.
  2. For this demonstration, sign in as pat_candella who has the role of DB Admin.
  3. Enter the password for pat_candella and choose Sign in.
  4. Upon successful sign-in, the Amazon Q Business web experience opens.
  5. In the Slack workspace, there is a public channel, #customerwork, that all users are members of. The #customerwork Slack channel is being used to communicate about an upcoming customer engagement, as shown in the following figure.
  6. Post the first question to Amazon Q Business.
I am currently using Apache Kafka. Can you list high level steps involved in migration to Amazon MSK?

Note that the response includes citations that refer to the conversation as well as the content of the PDF that was attached to the conversation.

Security and privacy options with Slack data connector

Next, you will create a private channel called #anydepartment-project-private with four out of the five users—arnav_desai, john_stiles, mary_major and pat_candella—and verify that the messages exchanged in a private channel are not available to non-members like jane_doe. Note that after you create a new private channel, you need to manually re-run the sync on the data source.

The following screenshot shows the private Slack channel with four of the five users and the Slack conversation.

Testing security and privacy options with Slack data connector

  1. While signed in as pat_candella, who is part of the private #anydepartment-project-private channel, execute the following query:
    What is Amazon Kendra and which API do I use to query a Kendra index?

  2. Now, sign in as jane_doe, who is not a member of the #anydepartment-project-private channel and execute the same query.
  3. Amazon Q Business prevents jane_doe from getting insights from information within the private channels that they aren’t part of, based on the synced ACL information.

Indexing aggregated Slack threads

Slack organizes conversations into threads, which can involve multiple users and messages. The Amazon Q Business Slack connector treats each thread as a single document, regardless of the number of participants or messages it contains. This approach allows Amazon Q Business to ingest entire conversation threads as individual units, maximizing the amount of data that can be processed within a single index unit. As a result, you can efficiently incorporate more comprehensive conversational context into your Amazon Q Business system.
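
To illustrate the idea, the sketch below uses the slack_sdk library to fetch a whole reply thread and flatten it into one document; it is a conceptual example rather than the connector’s internal implementation, and the environment variable, channel ID, and thread timestamp are placeholders:

import os
from slack_sdk import WebClient

client = WebClient(token=os.environ["SLACK_USER_TOKEN"])

def thread_as_document(channel_id: str, thread_ts: str) -> dict:
    """Fetch every message in a thread and flatten it into a single document."""
    replies = client.conversations_replies(channel=channel_id, ts=thread_ts)
    body = "\n".join(
        f"{msg.get('user', 'unknown')}: {msg.get('text', '')}"
        for msg in replies["messages"]
    )
    return {"id": f"{channel_id}-{thread_ts}", "title": "Slack thread", "body": body}

doc = thread_as_document("C12AB34578", "1728577200.000100")  # placeholder IDs
print(doc["body"])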

The figure that follows shows a conversation between pat_candella and jane_doe that includes six messages in a thread. The Slack connector aggregates this message thread as a single document, thus maximizing the use of an index unit.

Because the conversation thread is aggregated as a single document within the Amazon Q Business index, you can ask questions that pertain to a single conversation thread as shown in the following figure.

Troubleshooting the sync process

  • Why isn’t Amazon Q Business answering any of my questions?

If you aren’t getting answers to your questions from Amazon Q Business, verify the following:

  • Permissions – Document ACLs indexed by Amazon Q Business may not allow you to query certain data entities as demonstrated in our example. If this is the case, please reach out to your Slack workspace administrator to make sure that your user has access to required documents and repeat the sync process.
  • Data connector sync – A failed data source sync may prevent the documents from being indexed, meaning that Amazon Q Business would be unable to answer questions about the documents that failed to sync. Please refer to the official documentation to troubleshoot data source connectors.
  • I’m receiving access errors on Amazon Q Business application. What causes this?

See Troubleshooting Amazon Q Business identity and access to diagnose and fix common issues that you might encounter when working with Amazon Q and IAM.

  • How can I sync documents without ACLs?

Amazon Q Business supports crawling ACLs for document security by default. Turning off ACLs and identity crawling is no longer supported. If you want to index documents without ACLs, ensure that the documents are marked as public in your data source. Please refer to the official documentation, How the Amazon Q Business connector for Slack crawls ACLs.

  • My connector is unable to sync. How can I monitor data source sync progress?

Amazon Q Business provides visibility into the data sync operations. Learn more about this feature in the AWS Machine Learning blog.

Additionally, as the sync process runs, you can monitor progress or debug failures by monitoring the Amazon CloudWatch logs that can be accessed from the Details section of the Sync run history.

A sample query to determine which documents or messages were indexed from a specific Slack channel, C12AB34578, with a logStream of SYNC_RUN_HISTORY_REPORT/xxxxxxxxxxxxxxxxxxxxxxxx would look like the following:

fields LogLevel, DocumentId, DocumentTitle, CrawlAction, ConnectorDocumentStatus.Status as ConnectorDocumentStatus, ErrorMsg, CrawlStatus.Status as CrawlStatus, SyncStatus.Status as SyncStatus, IndexStatus.Status as IndexStatus, SourceUri, Acl, Metadata, HashedDocumentId, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/xxxxxxxxxxxxxxxxxxxxxxxx' and Metadata like /"stringValue":"C12AB34578"/
| sort @timestamp desc
| limit 10000

Choosing Run query displays the list of messages as the Amazon Q Business Index sync runs, as shown in the following figure.
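If you prefer to run this query programmatically rather than from the CloudWatch console, the following Python sketch uses the CloudWatch Logs Insights API through boto3. The log group name and time window are placeholders (copy the actual log group from the Details section of the Sync run history), and the channel ID is taken from the example above.

import time
import boto3

logs = boto3.client("logs")

QUERY = r"""
fields LogLevel, DocumentId, DocumentTitle, CrawlAction,
       ConnectorDocumentStatus.Status as ConnectorDocumentStatus, ErrorMsg,
       CrawlStatus.Status as CrawlStatus, SyncStatus.Status as SyncStatus,
       IndexStatus.Status as IndexStatus, SourceUri, Acl, Metadata,
       HashedDocumentId, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/' and Metadata like /"stringValue":"C12AB34578"/
| sort @timestamp desc
| limit 10000
"""

# Placeholder log group; use the log group shown in your sync run details.
response = logs.start_query(
    logGroupName="/aws/qbusiness/your-application-id",
    startTime=int(time.time()) - 24 * 3600,  # last 24 hours
    endTime=int(time.time()),
    queryString=QUERY,
)

# Poll until the query finishes, then print each indexed document.
while True:
    results = logs.get_query_results(queryId=response["queryId"])
    if results["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(2)

for row in results.get("results", []):
    print({field["field"]: field["value"] for field in row})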

Cleanup

To delete an Amazon Q Business application, you can use the console or the DeleteApplication API operation.

To delete an Amazon Q Business application using the console

  1. Sign in to the Amazon Q Business console.
  2. Select the Amazon Q Business application that you want to delete, and then choose Actions.
  3. Choose Delete.
  4. In the dialog box that opens, enter Delete to confirm deletion, and then choose Delete.
  5. You are returned to the service console while your application is deleted. When the deletion process is complete, the console displays a message confirming successful deletion.
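If you prefer the DeleteApplication API operation mentioned earlier, a minimal boto3 sketch follows; the application ID is a placeholder.

import boto3

qbusiness = boto3.client("qbusiness")

# Placeholder application ID; list_applications() returns the IDs in your account.
qbusiness.delete_application(applicationId="your-application-id")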

To delete the IAM Identity Center instance, see Delete your IAM Identity Center instance.

Conclusion

This blog post provides a step-by-step guide on setting up the Slack connector for Amazon Q Business, enabling you to seamlessly integrate data from your Slack workspace. Moreover, we highlighted the importance of data privacy and security, demonstrating how the connector adheres to the ACLs within your Slack workspace. This feature helps ensure that private channel conversations remain confidential and inaccessible to individuals who aren’t members of those channels. By following these steps and understanding the built-in security measures, you can use the power of Amazon Q Business while maintaining the integrity and privacy of your Slack workspace.

To learn more about the Amazon Q Business connector for Slack, see Connecting Slack to Amazon Q Business. You can automate all of the showcased console operations through the Amazon Q Business APIs, the AWS CLI, and the applicable AWS SDKs.

If you choose to converse with Amazon Q Business through Slack direct messages (DMs), for example to ask questions and get answers based on company data, get help creating new content such as email drafts, summarize attached files, or perform tasks, see Deploy a Slack gateway for Amazon Q, your business expert, for information about how to bring Amazon Q Business to users in Slack.


About the Authors

Akshara Shah is a Senior Solutions Architect at Amazon Web Services. She provides strategic technical guidance to help customers design and build cloud solutions. She is currently focused on machine learning and AI technologies.

Roshan Thomas is a Senior Solutions Architect at Amazon Web Services. He is based in Melbourne, Australia and works closely with enterprise customers to accelerate their journey in the cloud. He is passionate about technology and helping customers architect and build solutions on AWS.

Read More

AI Summit: US Energy Secretary Highlights AI’s Role in Science, Energy and Security

AI Summit: US Energy Secretary Highlights AI’s Role in Science, Energy and Security

AI can help solve some of the world’s biggest challenges — whether climate change, cancer or national security — U.S. Secretary of Energy Jennifer Granholm emphasized today during her remarks at the AI for Science, Energy and Security session at the NVIDIA AI Summit, in Washington, D.C.

Granholm went on to highlight the pivotal role AI is playing in tackling major national challenges, from energy innovation to bolstering national security.

“We need to use AI for both offense and defense — offense to solve these big problems and defense to make sure the bad guys are not using AI for nefarious purposes,” she said.

Granholm, who calls the Department of Energy “America’s Solutions Department,” highlighted the agency’s focus on solving the world’s biggest problems.

“Yes, climate change, obviously, but a whole slew of other problems, too … quantum computing and all sorts of next-generation technologies,” she said, pointing out that AI is a driving force behind many of these advances.

“AI can really help to solve some of those huge problems — whether climate change, cancer or national security,” she said. “The possibilities of AI for good are awesome, awesome.”

Following Granholm’s 15-minute address, a panel of experts from government, academia and industry took the stage to further discuss how AI accelerates advancements in scientific discovery, national security and energy innovation.

“AI is going to be transformative to our mission space.… We’re going to see these big step changes in capabilities,” said Helena Fu, director of the Office of Critical and Emerging Technologies at the Department of Energy, underscoring AI’s potential in safeguarding critical infrastructure and addressing cyber threats.

During her remarks, Granholm also stressed that AI’s increasing energy demands must be met responsibly.

“We are going to see about a 15% increase in power demand on our electric grid as a result of the data centers that we want to be located in the United States,” she explained.

However, the DOE is taking steps to meet this demand with clean energy.

“This year, in 2024, the United States will have added 30 Hoover Dams’ worth of clean power to our electric grid,” Granholm announced, emphasizing that the clean energy revolution is well underway.

AI’s Impact on Scientific Discovery and National Security

The discussion then shifted to how AI is revolutionizing scientific research and national security.

Tanya Das, director of the Energy Program at the Bipartisan Policy Center, pointed out that “AI can accelerate every stage of the innovation pipeline in the energy sector … starting from scientific discovery at the very beginning … going through to deployment and permitting.”

Das also highlighted the growing interest in Congress to support AI innovations, adding, “Congress is paying attention to this issue, and, I think, very motivated to take action on updating what the national vision is for artificial intelligence.”

Fu reiterated the department’s comprehensive approach, stating, “We cross from open science through national security, and we do this at scale.… Whether they be around energy security, resilience, climate change or the national security challenges that we’re seeing every day emerging.”

She also touched on the DOE’s future goals: “Our scientific systems will need access to AI systems,” Fu said, emphasizing the need to bridge scientific reasoning with the new kinds of models that will need to be developed for AI.

Collaboration Across Sectors: Government, Academia and Industry

Karthik Duraisamy, director of the Michigan Institute for Computational Discovery and Engineering at the University of Michigan, highlighted the power of collaboration in advancing scientific research through AI.

“Think about the scientific endeavor as 5% creativity and innovation and 95% intense labor. AI amplifies that 5% by a bit, and then significantly accelerates the 95% part,” Duraisamy explained. “That is going to completely transform science.”

Duraisamy further elaborated on the role AI could play as a persistent collaborator, envisioning a future where AI can work alongside scientists over weeks, months and years, generating new ideas and following through on complex projects.

“Instead of replacing graduate students, I think graduate students can be smarter than the professors on day one,” he said, emphasizing the potential for AI to support long-term research and innovation.

Learn more about how this week’s AI Summit highlights the ways AI is shaping the future across industries and how NVIDIA’s solutions are laying the groundwork for continued innovation.


Read More

Transitioning off Amazon Lookout for Metrics 

Transitioning off Amazon Lookout for Metrics 

Amazon Lookout for Metrics is a fully managed service that uses machine learning (ML) to detect anomalies in virtually any time-series business or operational metrics—such as revenue performance, purchase transactions, and customer acquisition and retention rates—with no ML experience required. The service, which was launched in March 2021, predates the anomaly detection capabilities now available in several popular AWS offerings, such as Amazon OpenSearch Service, Amazon CloudWatch, AWS Glue Data Quality, Amazon Redshift ML, and Amazon QuickSight.

After careful consideration, we have made the decision to end support for Amazon Lookout for Metrics, effective October 10, 2025. In addition, as of today, new customer sign-ups are no longer available. Existing customers will be able to use the service as usual until October 10, 2025, when we will end support for Amazon Lookout for Metrics.

In this post, we provide an overview of the alternate AWS services that offer anomaly detection capabilities for customers to consider transitioning their workloads to.

AWS services with anomaly detection capabilities

We recommend that customers use Amazon OpenSearch Service, Amazon CloudWatch, Amazon Redshift ML, Amazon QuickSight, or AWS Glue Data Quality for their anomaly detection use cases as an alternative to Amazon Lookout for Metrics. These AWS services offer generally available, ML-powered anomaly detection capabilities that can be used out of the box without requiring any ML expertise. Following is a brief overview of each service.

Using Amazon OpenSearch for anomaly detection

Amazon OpenSearch Service features a highly performant, integrated anomaly detection engine that enables the real-time identification of anomalies in streaming data as well as in historical data. You can pair anomaly detection with built-in alerting in OpenSearch to send notifications when there is an anomaly. To start using OpenSearch for anomaly detection, you first must index your data into OpenSearch; from there, you can enable anomaly detection in OpenSearch Dashboards. To learn more, see the documentation.
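As an illustration, the following Python sketch creates a detector through the OpenSearch anomaly detection plugin’s REST API. The domain endpoint, credentials, index name, field names, and feature definition are all placeholders; adjust them for your own data.

import requests
from requests.auth import HTTPBasicAuth

# Placeholder domain endpoint and credentials.
HOST = "https://your-opensearch-domain:9200"
AUTH = HTTPBasicAuth("admin", "your-password")

detector = {
    "name": "revenue-anomaly-detector",
    "description": "Detects anomalies in hourly revenue",
    "time_field": "timestamp",
    "indices": ["revenue-metrics"],  # placeholder index
    "feature_attributes": [
        {
            "feature_name": "total_revenue",
            "feature_enabled": True,
            "aggregation_query": {"total_revenue": {"sum": {"field": "revenue"}}},
        }
    ],
    "detection_interval": {"period": {"interval": 60, "unit": "Minutes"}},
}

# Create the detector via the anomaly detection plugin API.
resp = requests.post(
    f"{HOST}/_plugins/_anomaly_detection/detectors",
    json=detector,
    auth=AUTH,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())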

Using Amazon CloudWatch for anomaly detection

Amazon CloudWatch supports creating anomaly detectors on specific Amazon CloudWatch log groups and metrics by applying statistical and ML algorithms. Anomaly detection alarms can be created based on a metric’s expected value. These types of alarms don’t have a static threshold for determining alarm state. Instead, they compare the metric’s value to the expected value based on the anomaly detection model. To start using CloudWatch anomaly detection, you first must ingest data into CloudWatch and then enable anomaly detection on the log group or metric.
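For metric-based anomaly detection, the following boto3 sketch trains an anomaly detection model on a metric and creates an alarm on the resulting expected-value band. The namespace, metric, and alarm names are examples only.

import boto3

cloudwatch = boto3.client("cloudwatch")

# Train an anomaly detection model on an example metric (placeholder namespace/metric).
cloudwatch.put_anomaly_detector(
    Namespace="AWS/Lambda",
    MetricName="Invocations",
    Stat="Sum",
)

# Alarm when the metric leaves the expected band (band width of 2 standard deviations).
cloudwatch.put_metric_alarm(
    AlarmName="invocations-anomaly-alarm",
    ComparisonOperator="LessThanLowerOrGreaterThanUpperThreshold",
    EvaluationPeriods=3,
    ThresholdMetricId="ad1",
    Metrics=[
        {
            "Id": "m1",
            "MetricStat": {
                "Metric": {"Namespace": "AWS/Lambda", "MetricName": "Invocations"},
                "Period": 300,
                "Stat": "Sum",
            },
            "ReturnData": True,
        },
        {
            "Id": "ad1",
            "Expression": "ANOMALY_DETECTION_BAND(m1, 2)",
            "Label": "Expected range",
            "ReturnData": True,
        },
    ],
)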

Using Amazon Redshift ML for anomaly detection

Amazon Redshift ML makes it easy to create, train, and apply machine learning models using familiar SQL commands in Amazon Redshift data warehouses. Anomaly detection can be done on your analytics data through Redshift ML by using the included XGBoost model type, local models, or remote models with Amazon SageMaker. With Redshift ML, you don’t have to be a machine learning expert, and you pay only for the training cost of the SageMaker models. There are no additional costs for using Redshift ML for anomaly detection. To learn more, see the documentation.
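As a hedged illustration of this SQL-driven workflow, the sketch below uses the Redshift Data API to create an XGBoost model on a labeled example table and then score new rows with the generated function. The cluster, database, schema, table, column, and bucket names are placeholders, and a labeled anomaly column is assumed to exist in the training data.

import boto3

redshift_data = boto3.client("redshift-data")

# Placeholder training statement: an XGBoost classifier over a labeled metrics table.
create_model_sql = """
CREATE MODEL demo.metric_anomaly_model
FROM (SELECT hour_of_day, order_count, revenue, is_anomaly FROM demo.labeled_metrics)
TARGET is_anomaly
FUNCTION predict_metric_anomaly
IAM_ROLE default
AUTO OFF
MODEL_TYPE XGBOOST
OBJECTIVE 'binary:logistic'
PREPROCESSORS 'none'
HYPERPARAMETERS DEFAULT EXCEPT (NUM_ROUND '100')
SETTINGS (S3_BUCKET 'your-redshift-ml-bucket');
"""

redshift_data.execute_statement(
    ClusterIdentifier="your-cluster",  # placeholder cluster, database, and user
    Database="dev",
    DbUser="awsuser",
    Sql=create_model_sql,
)

# After training completes, the generated function scores new rows in plain SQL.
score_sql = """
SELECT hour_of_day, order_count, revenue,
       predict_metric_anomaly(hour_of_day, order_count, revenue) AS is_anomaly
FROM demo.incoming_metrics;
"""
redshift_data.execute_statement(
    ClusterIdentifier="your-cluster", Database="dev", DbUser="awsuser", Sql=score_sql
)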

Using Amazon QuickSight for anomaly detection

Amazon QuickSight is a fast, cloud-powered business intelligence service that delivers insights to everyone in the organization. As a fully managed service, QuickSight lets customers create and publish interactive dashboards that include ML insights. QuickSight supports a highly performant, integrated anomaly detection engine that uses proven Amazon technology to continuously run ML-powered anomaly detection across millions of metrics to discover hidden trends and outliers in customers’ data. This capability surfaces deep insights that are often buried in aggregates and that aren’t practical to find through manual analysis. With ML-powered anomaly detection, customers can find outliers in their data without the need for manual analysis, custom development, or ML domain expertise. To learn more, see the documentation.

Using AWS Glue Data Quality for anomaly detection

Data engineers and analysts can use AWS Glue Data Quality to measure and monitor their data. AWS Glue Data Quality uses a rule-based approach that works well for known data patterns and offers ML-based recommendations to help you get started. You can review the recommendations and augment them with rules chosen from more than 25 included data quality rule types. To capture unanticipated, less obvious data patterns, you can enable anomaly detection. To use this feature, you can write rules or analyzers and then turn on anomaly detection in AWS Glue ETL. AWS Glue Data Quality collects statistics for columns specified in rules and analyzers, applies ML algorithms to detect anomalies, and generates visual observations explaining the detected issues. Customers can use recommended rules to capture the anomalous patterns and provide feedback to tune the ML model for more accurate detection. To learn more, see the blog post, watch the introductory video, or see the documentation.

Using Amazon SageMaker Canvas for anomaly detection (a beta feature)

The Amazon SageMaker Canvas team plans to provide support for anomaly detection use cases in Amazon SageMaker Canvas. We’ve created an AWS CloudFormation template-based solution to give customers early access to the underlying anomaly detection feature. Customers can use the CloudFormation template to bring up an application stack that receives time-series data from an Amazon Managed Streaming for Apache Kafka (Amazon MSK) streaming source and performs near-real-time anomaly detection in the streaming data. To learn more about the beta offering, see Anomaly detection in streaming time series data with online learning using Amazon Managed Service for Apache Flink.

Frequently asked questions

  1. What is the cutoff point for current customers?

We created an allow list of account IDs that have used Amazon Lookout for Metrics in the last 30 days and have active Amazon Lookout for Metrics resources, including detectors, within the service. If you are an existing customer and are having difficulties using the service, please reach out to us via AWS Customer Support for help.

  2. How will access change before the sunset date?

Current customers can continue to do everything they could previously. The only change is that customers who haven’t used the service before cannot create any new resources in Amazon Lookout for Metrics.

  3. What happens to my Amazon Lookout for Metrics resources after the sunset date?

After October 10, 2025, all references to Amazon Lookout for Metrics models and resources will be deleted from Amazon Lookout for Metrics. You will not be able to discover or access Amazon Lookout for Metrics from your AWS Management Console, and applications that call the Amazon Lookout for Metrics API will no longer work.

  4. Will I be billed for Amazon Lookout for Metrics resources remaining in my account after October 10, 2025?

Resources created by Amazon Lookout for Metrics internally will be deleted after October 10, 2025. Customers will be responsible for deleting the input data sources created by them, such as Amazon Simple Storage Service (Amazon S3) buckets, Amazon Redshift clusters, and so on.

  5. How do I delete my Amazon Lookout for Metrics resources?

You can delete detectors and their associated alerts from the Amazon Lookout for Metrics console or by using the DeleteAnomalyDetector and DeleteAlert API operations.

  6. How can I export anomalies data before deleting the resources?

Anomalies data for each measure can be downloaded for a particular detector by using the Amazon Lookout for Metrics APIs. Exporting Anomalies explains how to connect to a detector, query for anomalies, and download them in a format for later use.
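The following boto3 sketch shows one way to pull anomaly group summaries from a detector before the end-of-support date; the detector ARN is a placeholder and the sensitivity threshold is an example value.

import json
import boto3

lookout = boto3.client("lookoutmetrics")

# Placeholder detector ARN; ListAnomalyDetectors returns the ARNs in your account.
DETECTOR_ARN = "arn:aws:lookoutmetrics:us-east-1:111122223333:AnomalyDetector:example"

anomaly_groups = []
next_token = None
while True:
    kwargs = {"AnomalyDetectorArn": DETECTOR_ARN, "SensitivityThreshold": 50, "MaxResults": 100}
    if next_token:
        kwargs["NextToken"] = next_token
    page = lookout.list_anomaly_group_summaries(**kwargs)
    anomaly_groups.extend(page.get("AnomalyGroupSummaryList", []))
    next_token = page.get("NextToken")
    if not next_token:
        break

# Save the summaries locally so they remain available after the service is retired.
with open("anomaly_groups.json", "w") as f:
    json.dump(anomaly_groups, f, default=str, indent=2)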

Conclusion

In this blog post, we outlined methods to create anomaly detectors using alternatives such as Amazon OpenSearch Service, Amazon CloudWatch, and a CloudFormation template-based solution.


About the Author

Nirmal Kumar is Sr. Product Manager for the Amazon SageMaker service. Committed to broadening access to AI/ML, he steers the development of no-code and low-code ML solutions. Outside work, he enjoys travelling and reading non-fiction.

Read More