Building Generative AI and ML solutions faster with AI apps from AWS partners using Amazon SageMaker


Organizations of every size and across every industry are looking to use generative AI to fundamentally transform the business landscape with reimagined customer experiences, increased employee productivity, new levels of creativity, and optimized business processes. A recent study by Telecom Advisory Services, a globally recognized research and consulting firm that specializes in economic impact studies, shows that cloud-enabled AI will add more than $1 trillion to global GDP from 2024 to 2030.

Organizations are looking to accelerate the process of building new AI solutions. They use fully managed services such as Amazon SageMaker AI to build, train, and deploy generative AI models. Oftentimes, they also want to integrate their choice of purpose-built AI development tools to build their models on SageMaker AI.

However, the process of identifying appropriate applications is complex and demanding, requiring significant effort to make sure that the selected application meets an organization’s specific business needs. Deploying, upgrading, managing, and scaling the selected application also demands considerable time and effort. To adhere to rigorous security and compliance protocols, organizations also need their data to stay within the confines of their security boundaries without the need to store it in a software as a service (SaaS) provider-owned infrastructure.

This increases the time it takes for customers to go from data to insights. Our customers want a simple and secure way to find the best applications, integrate them into their machine learning (ML) and generative AI development environment, and manage and scale their AI projects.

Introducing Amazon SageMaker partner AI apps

Today, we’re excited to announce that AI apps from AWS Partners are now available in SageMaker. You can now find, deploy, and use these AI apps privately and securely, all without leaving SageMaker AI, so you can develop performant AI models faster.

Industry-leading app providers

The first group of partners and applications—shown in the following figure—that we’re including are Comet and its model experiment tracking application, Deepchecks and its large language model (LLM) quality and evaluation application, Fiddler and its model observability application, and Lakera and its AI security application.

Managed and secure

These applications are fully managed by SageMaker AI, so customers don’t have to worry about provisioning, scaling, and maintaining the underlying infrastructure. SageMaker AI makes sure that sensitive data stays completely within each customer’s SageMaker environment and will never be shared with a third party.

Available in SageMaker AI and SageMaker Unified Studio (preview)

Data scientists and ML engineers can access these applications from Amazon SageMaker AI (formerly known as Amazon SageMaker) and from SageMaker Unified Studio. This capability gives them seamless access to the tools they require, enhancing their productivity and accelerating the development and deployment of AI products. It also empowers them to do more with their models by collaborating seamlessly with colleagues in data and analytics teams.

Seamless workflow integration

Direct integration with SageMaker AI provides a smooth user experience, from model building and deployment to ongoing production monitoring, all within your SageMaker development environment. For example, a data scientist can run experiments in their SageMaker Studio or SageMaker Unified Studio Jupyter notebook and then use the Comet ML app for visualizing and comparing those experiments.
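
For illustration, the following is a minimal sketch of what that experiment-logging step might look like with the comet_ml Python SDK inside a notebook. The project name and metric values are placeholders, and the partner app deployment typically handles authentication for you rather than requiring an explicit API key.

from comet_ml import Experiment

# Placeholder project name; with the SageMaker partner app, credentials
# are typically provisioned for you rather than passed explicitly
experiment = Experiment(project_name="churn-model-experiments")

# Log hyperparameters and per-epoch metrics from a training run
experiment.log_parameter("learning_rate", 3e-4)
for epoch in range(3):
    experiment.log_metric("val_accuracy", 0.80 + 0.05 * epoch, epoch=epoch)

experiment.end()  # the runs then appear in the Comet UI for comparison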

Streamlined access

Use AWS credits to access partner apps without navigating lengthy procurement or approval processes, accelerating adoption and scaling of AI observability.

Application deep dive

The integration of these AI apps within SageMaker Studio enables you to build AI models and solutions without leaving your SageMaker development environment. Let’s take a look at the initial group of apps launched at re:Invent 2024.

Comet

Comet provides an end-to-end model evaluation solution for AI developers with best-in-class tooling for experiment tracking and model production monitoring. Comet has been trusted by enterprise customers and academic teams since 2017. Within SageMaker Studio, Notebooks and Pipelines, data scientists, ML engineers, and AI researchers can use Comet’s robust tracking and monitoring capabilities to oversee model lifecycles from training through production, bringing transparency and reproducibility to ML workflows.

You can access the Comet UI directly from SageMaker Studio and SageMaker Unified Studio without the need to provide additional credentials. The app infrastructure is deployed, managed, and supported by AWS, providing a holistic experience and seamless integration. This means each Comet deployment through SageMaker AI is securely isolated and provisioned automatically. You can seamlessly integrate Comet’s advanced tools without altering your existing SageMaker AI workflows. To learn more, visit Comet.

Deepchecks

Deepchecks specializes in LLM evaluation. Their validation capabilities include automatic scoring, version comparison, and auto-calculated metrics for properties such as relevance, coverage, and grounded-in-context. These capabilities enable organizations to rigorously test, monitor, and improve their LLM applications while maintaining complete data sovereignty.

Deepchecks’s state-of-the-art automatic scoring capabilities for LLM applications, paired with the infrastructure and purpose-built tools provided by SageMaker AI for each step of the ML and FM lifecycle, make it possible for AI teams to improve their models’ quality and compliance.

Starting today, organizations using AWS can immediately work with Deepchecks’s LLM evaluation tools in their environment, minimizing security and privacy concerns because data remains fully contained within their AWS environments. This integration also removes the overhead of onboarding a third-party vendor, because legal and procurement aspects are streamlined by AWS. To learn more, visit Deepchecks.

Fiddler AI

The Fiddler AI Observability solution allows data science, engineering, and line-of-business teams to validate, monitor, analyze, and improve ML models deployed on SageMaker AI.

With Fiddler’s advanced capabilities, users can track model performance, monitor for data drift and integrity, and receive alerts for immediate diagnostics and root cause analysis. This proactive approach allows teams to quickly resolve issues, continuously improving model reliability and performance. To learn more, visit Fiddler.

Lakera

Lakera partners with enterprises and high-growth technology companies to unlock their generative AI transformation. Lakera’s application Lakera Guard provides real-time visibility, protection, and control for generative AI applications. By protecting sensitive data, mitigating prompt attacks, and creating guardrails, Lakera Guard makes sure that your generative AI always interacts as expected.

Starting today, you can set up a dedicated instance of Lakera Guard within SageMaker AI that ensures data privacy and delivers low-latency performance, with the flexibility to scale alongside your generative AI application’s evolving needs. To learn more, visit Lakera.

See how customers are using partner apps

“The AI/ML team at NatWest Group leverages SageMaker and Comet to rapidly develop customer solutions, from swift fraud detection to in-depth analysis of customer interactions. With Comet now a SageMaker partner app, we streamline our tech and enhance our developers’ workflow, improving experiment tracking and model monitoring. This leads to better results and experiences for our customers.”
– Greig Cowan, Head of AI and Data Science, NatWest Group.

“Amazon SageMaker plays a pivotal role in the development and operation of Ping Identity’s homegrown AI and ML infrastructure. The SageMaker partner AI apps capability will enable us to deliver faster, more effective ML-powered functionality to our customers as a private, fully managed service, supporting our strict security and privacy requirements while reducing operational overhead.”
– Ran Wasserman, Principal Architect, Ping Identity.

Start building with AI apps from AWS partners

Amazon SageMaker AI provides access to a highly curated selection of apps from industry-leading providers that are designed and certified to run natively and privately on SageMaker AI. Data scientists and developers can quickly find, deploy, and use these applications within SageMaker AI and SageMaker Unified Studio to accelerate their ML and generative AI model building journey.

You can access all available SageMaker partner AI apps directly from SageMaker AI and SageMaker Unified Studio. Click through to view a specific app’s functionality, licensing terms, and estimated costs for deployment. After subscribing, you can configure the infrastructure that your app will run on by selecting a deployment tier and additional configuration parameters. After the app finishes the provisioning process, you will be able to assign access to your users, who will find the app ready to use in their SageMaker Studio and SageMaker Unified Studio environments.


About the authors

Gwen Chen is a Senior Generative AI Product Marketing Manager at AWS. She started working on AI products in 2018. Gwen has launched an NLP-powered app building product, MLOps, generative AI-powered assistants for data integration and model building, and inference capabilities. Gwen graduated from a dual master’s degree program in science and business at Duke University and UNC Kenan-Flagler. Gwen likes listening to podcasts, skiing, and dancing.

Naufal Mir is a Senior Generative AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy, and migrate ML workloads to SageMaker. He previously worked at financial services institutes developing and operating systems at scale. He enjoys ultra-endurance running and cycling.

Kunal Jha is a Senior Product Manager at AWS. He is focused on building Amazon SageMaker Studio as the IDE of choice for all ML development steps. In his spare time, Kunal enjoys skiing, scuba diving and exploring the Pacific Northwest. You can find him on LinkedIn.

Eric Peña is a Senior Technical Product Manager in the AWS Artificial Intelligence Platforms team, working on Amazon SageMaker Interactive Machine Learning. He currently focuses on IDE integrations on SageMaker Studio. He holds an MBA degree from MIT Sloan and outside of work enjoys playing basketball and football.

Arkaprava De is a manager leading the SageMaker Studio Apps team at AWS. He has been at Amazon for over 9 years and is currently working on improving the Amazon SageMaker Studio IDE experience. You can find him on LinkedIn.

Zuoyuan Huang is a Software Development Manager at AWS. He has been at Amazon for over 5 years, and has been focusing on building SageMaker Studio apps and IDE experience. You can find him on LinkedIn.


MarS: A unified financial market simulation engine in the era of generative foundation models



Introduction

Generative foundation models have transformed various domains, creating new paradigms for content generation. Integrating these models with domain-specific data enables industry-specific applications. Microsoft Research has used this approach to develop the large market model (LMM) and the Financial Market Simulation Engine (MarS) for the financial domain. These innovations have the potential to empower financial researchers to customize generative models for diverse scenarios, establishing a new paradigm for applying generative models to downstream tasks in financial markets. This integration may provide enhanced efficiency, more accurate insights, and significant advancements in the financial domain. 

Applying generative models to financial markets

In recent years, generative foundation models have achieved notable success in fields like natural language processing and media generation. Their rise has sparked a new wave of research and industrial adoption, reshaping production processes across industries. These models excel due to three essential elements: a large volume of high-quality training data; effective tokenization and serialization of core information (such as semantic information in text); and an auto-regressive training approach that models data comprehensively, enabling implicit reasoning. 

Building on years of AI applications across industries, Microsoft researchers recognized that combining generative models with domain-specific data could lead to impactful solutions, particularly in finance. The financial market is a prime example, notably for its vast amount of order data, which are characterized by three key features: 

  • Fine granularity: Orders, as the atomic data of the financial market, provide a comprehensive and detailed representation of the real market. Combined with matching rules, order data can be used to reproduce the entire market operation process.
  • Large scale: Electronic trading has resulted in the accumulation of massive trade-order data across global exchanges.
  • Well-structured: The structured nature of order data makes it ideal for tokenization and sequential modeling.

These characteristics position order flow data as a critical foundation for generative modeling in financial markets. To this end, Microsoft Research developed LMM and MarS, which financial researchers can use to customize generative models for various applications, fostering a new paradigm of generative solutions for downstream tasks in finance. This has the potential to advance efficiency and insight generation in the financial industry.

Figure 1: Illustration of stock market and orders

Tokenization of order flow information

Order flow data is vital for generative models in finance, reflecting real-time interactions among market participants. It offers two types of value: 

  • Fine-grained market feedback: Each order, especially large ones, may influence others’ decisions, providing a micro-level view of pricing behavior. 
  • Macroscopic market dynamics: Collective interactions shape trading dynamics over time, capturing the evolution and resolution of conflicts between market forces. 

Researchers at Microsoft developed LMM by modeling both individual orders and entire order sets over time. This two-tiered approach captures both fine-grained feedback and macro-level dynamics of competition. Figure 2 shows the tokenization techniques for these models, enabling high-fidelity simulations of complex market dynamics. 

Figure 2: Tokenization for individual orders (top) and batch orders (bottom)
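
To make the idea concrete, the following is a minimal illustrative sketch, not the actual LMM implementation, of how an order’s type, price, volume, and interval fields (as in Figure 2) could be discretized into tokens for auto-regressive modeling. All bucket sizes and encodings here are hypothetical.

from dataclasses import dataclass

@dataclass
class Order:
    side: int          # 0 = buy, 1 = sell
    price_tick: int    # price offset from a reference price, in ticks
    volume: int        # order size
    interval_ms: int   # time since the previous order, in milliseconds

# Hypothetical vocabulary: each field gets its own contiguous token range,
# so one order becomes a fixed-length sequence of four tokens
PRICE_BUCKETS, VOLUME_BUCKETS, INTERVAL_BUCKETS = 64, 32, 16

def bucket(value, num_buckets):
    """Clamp a non-negative value into a fixed number of discrete buckets."""
    return max(0, min(value, num_buckets - 1))

def tokenize_order(order):
    """Map one order to tokens: [side, price, volume, interval]."""
    side_tok = order.side
    price_tok = 2 + bucket(order.price_tick, PRICE_BUCKETS)
    volume_tok = 2 + PRICE_BUCKETS + bucket(order.volume, VOLUME_BUCKETS)
    interval_tok = (2 + PRICE_BUCKETS + VOLUME_BUCKETS
                    + bucket(order.interval_ms, INTERVAL_BUCKETS))
    return [side_tok, price_tok, volume_tok, interval_tok]

# A buy order 3 ticks below the reference, 10 lots, 250 ms after the last order
print(tokenize_order(Order(side=0, price_tick=3, volume=10, interval_ms=250)))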

Scaling law of the large market model: Unlocking the potential of financial data

The effectiveness of generative models improves significantly with larger training datasets and model parameters. Researchers at Microsoft used two tokenization strategies to design models based on the Transformer architecture, testing them across varying data scales. Figure 3 illustrates the scaling behavior of both the order and order batch models, highlighting insights from historical trading data. This integration enhances the model’s ability to generate order flows with a deep understanding of market intricacies, enabling more accurate time-series modeling. 

Figure 3: Scaling curves of the order model (2M to 1.02B parameters) and order-batch model (150M to 3B parameters); validation loss decreases as the number of training tokens increases


MarS based on LMM

A customizable generative model for financial scenarios

Generative models, once trained, can be easily adapted for a range of downstream tasks, often outperforming traditional models tailored for specific scenarios. Building on the development of LMM, researchers further analyzed the needs of various financial scenarios and designed MarS as a versatile financial market simulation engine. MarS not only serves as a general-purpose simulation tool but also introduces a novel framework for applying generative models across diverse financial tasks, from market prediction and risk assessment to trading strategy optimization. 

Figure 4: Framework of MarS

Constructing a unified paradigm for prediction and detection tasks 

Traditional financial prediction solutions often require the development of specialized algorithms, which must be frequently adjusted, consuming time and resources. LMM’s capacity to model financial markets in depth allows for periodic updates based on the latest data. MarS creates a virtual exchange to match order flows generated by LMM, simulating trades and deriving simulated market trajectories (see the top right of Figure 4). This approach can effectively address common prediction and detection tasks in financial scenarios, introducing innovative solutions within the generative model framework. 

Applications in prediction tasks

Prediction tasks, vital in finance, involve estimating future market metrics. Traditional models require modifications with any change in prediction targets. MarS addresses this by continuously generating future order flows from recent data, which are matched in a virtual exchange, allowing for the simulation of potential future market trajectories. This provides a significant advancement in forecasting capabilities.

Figure 5 demonstrates the use of MarS in forecasting stock-price movements, where it significantly outperforms traditional benchmark algorithms. Taking the Order Model (1.02B) for instance, its accuracy exceeds that of DeepLOB by approximately 13.5% (0.662/0.583 − 1) at a 1-minute horizon, widening to 22.4% (0.579/0.473 − 1) at a 5-minute horizon. This widening performance gap suggests that the Order Model maintains its predictive accuracy more effectively over longer horizons, highlighting its superior generalization capability compared to the baseline, especially as the prediction task becomes more challenging over extended timeframes. This provides an attractive solution for prediction tasks in financial markets, while also highlighting LMM’s capability in modeling stock market dynamics.

Figure 5: Predicting stock price trends with MarS

Applications in detection tasks

For regulators, detecting systemic risks or market abuse is critical for market stability. LMM models typical market patterns, enabling the identification of anomalies by comparing real market trajectories with those generated by MarS. Figure 6 shows the differences in the spread distribution (i.e., the difference between the best buy and sell prices, which reflects asset liquidity) between simulated and real market trajectories during a confirmed malicious market manipulation incident. This comparison can uncover subtle deviations indicative of unusual activities, offering regulators effective tools for monitoring market integrity.

Figure 6: Spread correlation between simulated and real market during market manipulation (distribution similarity: 0.870 pre-manipulation, 0.835 during the manipulation period, 0.873 post-manipulation)

Defining new FinTech scenarios 

Generative models can create tailored content from simple descriptions. In MarS, a mechanism generates specific order flows from natural language descriptions of market conditions. To address extreme conditions, researchers developed a control signal system using a hierarchical diffusion model to generate high-fidelity signals during rare events, such as stock market crashes and circuit breakers. This capability transforms broad descriptions into precise order flow controls. 

By integrating controlled order generation with real-time feedback, MarS creates a unified framework for prediction and detection tasks, redefining financial research, applications, and market understanding. Key applications include “What If” analyses and training environments for reinforcement learning algorithms in realistic market conditions. 

“What If” analysis for financial research

The question “What would happen if different sizes of trading orders were executed under different market conditions?” is crucial for understanding market behavior. Traditional methods, relying on real orders, experience, and assumptions, are costly and slow. Generative models provide a breakthrough solution.

Figure 7 illustrates how MarS can simulate market impact: the top left shows how buy orders affect asset price trajectories, while the top right presents market impact curves of different strategies, matching traditional patterns. Researchers also used MarS to generate large-scale simulated data, constructing a market impact model using ordinary differential equations (ODEs). The bottom left of Figure 7 shows the derived impact formula, and the bottom right demonstrates its interpretability. These advancements highlight MarS’s potential to enhance “What If” research through deep market modeling.

Figure 7: Sample research results for market impact of orders using MarS

Training environments for reinforcement learning in financial markets

Reinforcement learning (RL) algorithms require controlled environments for testing and optimization. Financial market behaviors often manifest through order flow changes, impacting the market. If the simulation cannot reflect these impacts accurately, an RL algorithm may fail in real-world scenarios.

MarS provides high-fidelity generation and real-time feedback, creating a comprehensive environment for RL in finance. Figure 8 shows the training process of trading agents, highlighting significant improvements in performance over time and demonstrating MarS’s effectiveness as an RL training ground. 

Figure 8: Performance of reinforcement learning trading agents trained in MarS. During training, the agent’s performance improved significantly, showcasing MarS’s ability to aid in developing robust reinforcement learning algorithms for real market conditions.

Disclaimer: The research mentioned in this article, conducted by Microsoft Research, focuses on scientific exploration, aiming to advance knowledge and provide theoretical and technological support for research and applications in the financial field. All studies adhere to Microsoft’s responsible AI guidelines, ensuring principles such as fairness, inclusiveness, reliability and safety, transparency, privacy, and accountability are maintained. The technologies and methods discussed are still under research and development, not yet forming any commercial products or services, nor constituting any financial solutions. Readers are advised to consult certified financial professionals before making any financial decisions. 



Query structured data from Amazon Q Business using Amazon QuickSight integration


Amazon Q Business is a generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. Although generative AI is fueling transformative innovations, enterprises may still experience sharply divided data silos when it comes to enterprise knowledge, in particular between unstructured content (such as PDFs, Word documents, and HTML pages), and structured data (real-time data and reports stored in databases or data lakes). Both categories of data are typically queried and accessed using separate tools, from in-product browse and search functionality for unstructured data, to business intelligence (BI) tools like Amazon QuickSight for structured content.

Amazon Q Business offers an effective solution for quickly building conversational applications over unstructured content, with over 40 data connectors to popular content and storage management systems such as Confluence, SharePoint, and Amazon Simple Storage Service (Amazon S3), to aggregate enterprise knowledge. Customers are also looking for a unified conversational experience across all their knowledge repositories, regardless of how the content is stored and organized.

On December 3, 2024, Amazon Q Business announced the launch of its integration with QuickSight, allowing you to quickly connect your structured sources to your Amazon Q Business applications, creating a unified conversational experience for your end-users. The QuickSight integration offers an extensive set of over 20 structured data source connectors, including Amazon Redshift, PostgreSQL, MySQL, and Oracle, enabling you to quickly expand the conversational scope of your Amazon Q Business assistants to cover a wider range of knowledge sources. For the end-users, answers are returned in real time from your structured sources, combined with other relevant information found in unstructured repositories. Amazon Q Business uses the analytics and advanced visualization engine in QuickSight to generate accurate and simple-to-understand answers from structured sources.

In this post, we show you how to configure the QuickSight connection from Amazon Q Business and then ask questions to get real-time data and visualizations from QuickSight for structured data in addition to unstructured content.

Solution overview

The QuickSight feature in Amazon Q Business is available on the Amazon Q Business console as well as through Amazon Q Business APIs. This feature is implemented as a plugin within Amazon Q Business. After it’s enabled, this plugin behaves differently from other Amazon Q Business plugins: it queries QuickSight automatically for every user prompt, looking for relevant answers.

For AWS accounts that aren’t subscribed to QuickSight already, the Amazon Q Business admin completes the following steps:

  1. Create a QuickSight account.
  2. Connect your database in QuickSight to create a dataset.
  3. Create a topic in QuickSight, which makes the dataset searchable from your Amazon Q Business application.

When the feature is activated, Amazon Q Business will use your unstructured data sources configured in Amazon Q Business, as well as your structured content available using QuickSight, to generate a rich answer that includes narrative and visualizations. Depending on the question and data in QuickSight, Amazon Q Business may generate one or more visualizations as a response.

Prerequisites

You should have the following prerequisites:

  • An AWS account where you can follow the instructions in this post.
  • AWS IAM Identity Center set up to be used with Amazon Q Business. For more information, see Configure Amazon Q Business with AWS IAM Identity Center trusted identity propagation.
  • At least one Amazon Q Business Pro user that has admin permissions to set up and configure Amazon Q Business. For pricing information, see Amazon Q Business pricing.
  • An IAM Identity Center group that will be assigned the QuickSight Admin Pro role, for users who will manage and configure QuickSight.
  • If a QuickSight account exists, then it needs to be in the same AWS account and AWS Region as Amazon Q Business, and configured with IAM Identity Center.
  • A database that is installed and can be reached from QuickSight to load structured data (or you could create a dataset by uploading a CSV or XLS file). The database also needs credentials to create tables and insert data.
  • Sample structured data to load into the database (along with insert statements).

Create an Amazon Q Business application

To use this feature, you need to have an Amazon Q Business application. If you don’t have an existing application, follow the steps in Discover insights from Amazon S3 with Amazon Q S3 connector to create an application along with an Amazon S3 data source. Upload your unstructured documents to Amazon S3 and sync the data source.

Create and configure a new QuickSight account

You can skip this section if you already have an existing QuickSight account. To create a QuickSight account, complete the following steps:

  1. On the Amazon Q Business console, navigate to your application.
  2. Choose Amazon QuickSight in the navigation pane.

  3. Choose Create QuickSight account.

  4. Under QuickSight account information, enter your account name and an email for account notifications.
  5. Under Assign QuickSight Admin Pro roles, choose the IAM Identity Center group you created as a prerequisite.
  6. Choose Next.

  7. Under Service access, select Create and use a new service role.
  8. Choose Authorize.

This will create a QuickSight account, assign the IAM Identity Center group as QuickSight Admin Pro, and authorize Amazon Q Business to access QuickSight.

You will see a dashboard with details for QuickSight. Currently, it will show zero datasets and topics.

  9. Choose Go to QuickSight.

You can now proceed to the next section to prepare your data.

Configure an existing QuickSight account

You can skip this section if you followed the previous steps and created a new QuickSight account.

If your current QuickSight account is not on IAM Identity Center, consider using a different AWS account without a QuickSight subscription to test this feature. From that account, create an Amazon Q Business application on IAM Identity Center and go through the QuickSight integration setup steps on the Amazon Q Business console, which will create the QuickSight account for you in IAM Identity Center. Remember to delete that new QuickSight account and Amazon Q Business application after your testing is done to avoid further billing.

Complete the following steps to set up the QuickSight connector from Amazon Q Business for an existing QuickSight account:

  1. On the Amazon Q Business console, navigate to your application.
  2. Choose Amazon QuickSight in the navigation pane.

  3. Choose Authorize QuickSight answers.

  4. Under Assign QuickSight Admin Pro roles, choose the IAM Identity Center group you created as a prerequisite.
  5. Under Service access, select Create and use a new service role.
  6. Choose Save.

You will see a dashboard with details for QuickSight. If you already have a dataset and topics, they will show up here.

You’re now ready to add a dataset and topics in the next section.

Add data in QuickSight

In this section, we create an Amazon Redshift data source. You can instead create a data source from the database of your choice, use files in Amazon S3, or directly upload CSV files. Refer to Creating a dataset from a database for more details.

To configure your data, complete the following steps:

  1. Create a new dataset with Amazon Redshift as a data source.

Configuring this connection offers multiple choices; choose the one that best fits your needs.

  2. Create a topic from the dataset. For more information, see Creating a topic.

  3. Optionally, create dashboards from the topic. If created, Amazon Q Business can use them.

Ask queries to Amazon Q Business

To start chatting with Amazon Q Business, complete the following steps:

  1. On the Amazon Q Business console, navigate to your application.
  2. Choose Amazon QuickSight in the navigation pane.

You should see the datasets and topics populated with values.

  3. Choose the link under Deployed URL.

We loaded AWS Cost and Usage Reports for a specific AWS account into QuickSight using Amazon Redshift. We also ingested AWS service documentation into Amazon Q Business as unstructured data through an Amazon S3 data source. We will ask questions related to our AWS costs and show how Amazon Q Business answers questions from both structured and unstructured data.
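
The following screenshots use the web experience, but you can send the same prompts programmatically. Below is a minimal sketch using the Amazon Q Business ChatSync API via boto3; the application ID and prompt are placeholders, and depending on your identity configuration you may need to supply identity-aware credentials.

import boto3

qbusiness = boto3.client("qbusiness", region_name="us-east-1")

# Placeholder application ID; copy yours from the Amazon Q Business console
response = qbusiness.chat_sync(
    applicationId="your-application-id",
    userMessage="Which AWS service had the highest cost last month?",
)

print(response["systemMessage"])  # the generated answer text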

The following screenshot shows an example question that returns a response from only unstructured data.

The following screenshot shows an example question that returns a response from only structured data.

The following screenshot shows an example question that returns a response from both structured and unstructured data.

The following screenshot shows an example question that returns multiple visualizations from both structured and unstructured data.

Clean up

If you no longer want to use this Amazon Q Business feature, delete the resources you created to avoid future charges:

  1. Delete the Amazon Q Business application:
    1. On the Amazon Q Business console, choose Applications in the navigation pane.
    2. Select your application and on the Actions menu, choose Delete.
    3. Enter delete to confirm and choose Delete.

The process can take up to 15 minutes to complete.

  2. Delete the S3 bucket:
    1. Empty your S3 bucket.
    2. Delete the bucket.
  3. Delete the QuickSight account:
    1. On the Amazon QuickSight console, choose Manage Amazon QuickSight.
    2. Choose Account settings, then choose Manage.
    3. Delete the account.
  4. Delete your IAM Identity Center instance.

Conclusion

In this post, we showed how to include answers from your structured sources in your Amazon Q Business applications, using the QuickSight integration. This creates a unified conversational experience for your end-users that saves them time, helps them make better decisions through more complete answers, and improves their productivity.

At AWS re:Invent 2024, we also announced a similar unified experience enabling access to insights from unstructured data sources in Amazon Q in QuickSight powered by Amazon Q Business.

To learn about the new capabilities Amazon Q in QuickSight provides, see QuickSight Plugin.

To learn more about Amazon Q Business, refer to the Amazon Q Business User Guide.

To learn more about configuring a QuickSight dataset, see Manage your Amazon QuickSight datasets more efficiently with the new user interface.

QuickSight also offers querying unstructured data. For more details, refer to Integrate unstructured data into Amazon QuickSight using Amazon Q Business.


About the authors

Jiten Dedhia is a Sr. AI/ML Solutions Architect with over 20 years of experience in the software industry. He has helped Fortune 500 companies with their AI/ML and generative AI needs.

Jean-Pierre Dodel is a Principal Product Manager for Amazon Q Business, responsible for delivering key strategic product capabilities including structured data support in Q Business, RAG, and overall product accuracy optimizations. He brings extensive AI/ML and enterprise search experience to the team, with over 7 years of product leadership at AWS.


Elevate customer experience by using the Amazon Q Business custom plugin for New Relic AI


Digital experience interruptions can harm customer satisfaction and business performance across industries. Application failures, slow load times, and service unavailability can lead to user frustration, decreased engagement, and revenue loss. The risk and impact of outages increase during peak usage periods, which vary by industry—from ecommerce sales events to financial quarter-ends or major product launches. According to New Relic’s 2024 Observability Forecast, businesses face a median annual downtime of 77 hours from high-impact outages. These outages can cost up to $1.9 million per hour.

New Relic is addressing these challenges by creating the New Relic AI custom plugin for Amazon Q Business. This custom plugin creates a unified solution that combines New Relic AI’s observability insights and recommendations with Amazon Q Business’s Retrieval Augmented Generation (RAG) capabilities, in a natural language interface for ease of use.

The custom plugin streamlines incident response, enhances decision-making, and reduces cognitive load from managing multiple tools and complex datasets. It empowers team members to interpret and act quickly on observability data, improving system reliability and customer experience. By using AI and New Relic’s comprehensive observability data, companies can help prevent issues, minimize incidents, reduce downtime, and maintain high-quality digital experiences.

This post explores the use case, how this custom plugin works, how it can be enabled, and how it can help elevate customers’ digital experiences.

The challenge: Resolving application problems before they impact customers

New Relic’s 2024 Observability Forecast highlights three key operational challenges:

  • Tool and context switching – Engineers use multiple monitoring tools, support desks, and documentation systems. 45% of support engineers, application engineers, and SREs use five different monitoring tools on average. This fragmentation can cause missed SLAs and SLOs, confusion during critical incidents, and increased negative fiscal impact. Tool switching slows decision-making during outages or ecommerce disruptions.
  • Knowledge accessibility – Scattered, hard-to-access knowledge, including runbooks and post-incident reports, hinders effective incident response. This can cause slow escalations, uncertain decisions, longer disruptions, and higher operational costs from redundant engineer involvement.
  • Complexity in data interpretation – Team members may struggle to interpret monitoring and observability data due to complex applications with numerous services and cloud infrastructure entities, and unclear symptom-problem relationships. This complexity hinders quick, accurate data analysis and informed decision-making during critical incidents.

The custom plugin for Amazon Q Business addresses these challenges with a unified, natural language interface for critical insights. It uses AI to research and translate findings into clear recommendations, providing quick access to indexed runbooks and post-incident reports. This custom plugin streamlines incident response, enhances decision-making, and reduces effort in managing multiple tools and complex datasets.

Solution overview

The New Relic custom plugin for Amazon Q Business centralizes critical information and actions in one interface, streamlining your workflow. It allows you to inquire about specific services, hosts, or system components directly. For instance, you can investigate a sudden spike in web service response times or a slow database. New Relic AI responds by analyzing current performance data and comparing it to historical trends and best practices. It then delivers detailed insights and actionable recommendations based on up-to-date production environment information.

The following diagram illustrates the workflow.

Scope of solution

When a user asks a question in the Amazon Q interface, such as “Show me problems with the checkout process,” Amazon Q queries the RAG ingested with the customers’ runbooks. Runbooks are troubleshooting guides maintained by operational teams to minimize application interruptions. Amazon Q gains contextual information, including the specific service names and infrastructure information related to the checkout service, and uses the custom plugin to communicate with New Relic AI. New Relic AI initiates a deep dive analysis of monitoring data since the checkout service problems began.

New Relic AI conducts a comprehensive analysis of the checkout service. It examines service performance metrics, forecasts of key indicators like error rates, error patterns and anomalies, security alerts, and overall system status and health. The analysis results in a summarized alert intelligence report that identifies and explains root causes of checkout service issues. This report provides clear, actionable recommendations and includes real-time application performance insights. It also offers direct links to detailed New Relic interfaces. Users can access this comprehensive summary without leaving the Amazon Q interface.

The custom plugin presents information and insights directly within the Amazon Q Business interface, eliminating the need to switch between the New Relic and Amazon Q interfaces, and enabling faster problem resolution.

Potential impacts

The New Relic Intelligent Observability platform provides comprehensive incident response and application and infrastructure performance monitoring capabilities for SREs, application engineers, support engineers, and DevOps professionals. Organizations using New Relic report significant improvements in their operations, achieving a 65% reduction in incidents, 10 times more deployments, and 50% faster release times while maintaining 99.99% uptime. When you combine New Relic insights with Amazon Q Business, you can further reduce incidents, deploy higher-quality code more frequently, and create more reliable experiences for your customers:

  • Detect and resolve incidents faster – With this custom plugin, you can reduce undetected incidents and resolve issues more quickly. Incidents often occur when teams miss early warning signs or can’t connect symptoms to underlying problems, leading to extended service disruptions. Although New Relic collects and generates data that can identify these warning signs, teams working in separate tools might not have access to these critical insights. For instance, support specialists might not have direct access to monitoring dashboards, making it challenging to identify emerging issues. The custom plugin consolidates these monitoring insights, helping you more effectively identify and understand related issues.
  • Simplify incident management – The custom plugin enhances support engineers’ and incident responders’ efficiency by streamlining their workflow. The custom plugin allows you to manage incidents without switching between New Relic AI and Amazon Q during critical moments. The integrated interface removes context switching, enabling both technical and non-technical users to access vital monitoring data quickly within the Amazon Q interface. This comprehensive approach speeds up troubleshooting, minimizes downtime, and boosts overall system reliability.
  • Build reliability across teams – The custom plugin makes application and infrastructure performance monitoring insights accessible to team members beyond traditional observability users. It translates complex production telemetry data into clear, actionable insights for product managers, customer service specialists, and executives. By providing a unified interface for querying and resolving issues, it empowers your entire team to maintain and improve digital services, regardless of their technical expertise. For example, when a customer service specialist receives user complaints, they can quickly investigate application performance issues without navigating complex monitoring tools or interpreting alert conditions. This unified view enables everyone supporting your enterprise software to understand and act on insights about application health and performance. The result is a more collaborative approach across multiple enterprise teams, leading to more reliable system maintenance and excellent customer experiences.

Conclusion

The New Relic AI custom plugin represents a step forward in digital experience management. By addressing key challenges such as tool fragmentation, knowledge accessibility, and data complexity, this solution empowers teams to deliver superior digital experiences. This collaboration between AWS and New Relic opens up possibilities for building more robust digital infrastructures, advancing innovation in customer-facing technologies, and setting new benchmarks in proactive IT problem-solving.

To learn more about improving your operational efficiency with AI-powered observability, refer to the Amazon Q Business User Guide and explore New Relic AI capabilities. To get started on training, enroll for free Amazon Q training from AWS Training and Certification.

About New Relic

New Relic is a leading cloud-based observability platform that helps businesses optimize the performance and reliability of their digital systems. New Relic processes 3 EB of data annually. Over 5 billion data points are ingested and 2.4 trillion queries are executed every minute across 75,000 active customers. The platform serves over 333 billion web requests each day. The median platform response time is 60 milliseconds.


About the authors

 Meena Menon is a Sr. Customer Solutions Manager at AWS.

Sean Falconer is a Sr. Solutions Architect at AWS.

Nava Ajay Kanth Kota is a Senior Partner Solutions Architect at AWS. He is currently part of the AWS Partner Network (APN) team that works closely with ISV Storage Partners. Prior to AWS, his experience includes running Storage, Backup, and Hybrid Cloud teams, and his responsibilities included creating Managed Services offerings in these areas.

David Girling is a Senior AI/ML Solutions Architect with over 20 years of experience in designing, leading, and developing enterprise systems. David is part of a specialist team that focuses on helping customers learn, innovate, and utilize these highly capable services with their data for their use cases.

Camden Swita is Head of AI and ML Innovation at New Relic specializing in developing compound AI systems, agentic frameworks, and generative user experiences for complex data retrieval, analysis, and actioning.


Amazon SageMaker launches the updated inference optimization toolkit for generative AI


Today, Amazon SageMaker is excited to announce updates to the inference optimization toolkit, providing new functionality and enhancements to help you optimize generative AI models even faster. These updates build on the capabilities introduced in the original launch of the inference optimization toolkit (to learn more, see Achieve up to ~2x higher throughput while reducing costs by ~50% for generative AI inference on Amazon SageMaker with the new inference optimization toolkit – Part 1).

The following are the key additions to the inference optimization toolkit:

  • Speculative decoding support for Meta Llama 3.1 models – The toolkit now supports speculative decoding for the latest Meta Llama 3.1 70B and 405B (FP8) text models, allowing you to accelerate the inference process.
  • Support for FP8 quantization – The toolkit has been updated to enable FP8 (8-bit floating point) quantization, helping you further optimize model size and inference latency for GPUs. FP8 offers several advantages over FP32 (32-bit floating point) for deep learning model inference, including reduced memory usage, faster computation, lower power consumption, and broader applicability because FP8 quantization can be applied to key model components like the KV cache, attention, and MLP linear layers.
  • Compilation support for TensorRT-LLM – You can now use the toolkit’s compilation capabilities to integrate your generative AI models with NVIDIA’s TensorRT-LLM, delivering enhanced performance by optimizing the model with ahead-of-time compilation. You reduce the model’s deployment time and auto scaling latency because the model weights don’t require just-in-time compilation when the model deploys to a new instance.

These updates build on the toolkit’s existing capabilities, allowing you to reduce the time it takes to optimize generative AI models from months to hours, and achieve best-in-class performance for your use case. Simply choose from the available optimization techniques, apply them to your models, validate the improvements, and deploy the models in just a few clicks through SageMaker.

In this post, we discuss these new features of the toolkit in more detail.

Speculative decoding

Speculative decoding is an inference technique that aims to speed up the decoding process of large language models (LLMs) for latency-critical applications, without compromising the quality of the generated text. The key idea is to use a smaller, less powerful, but faster language model called the draft model to generate candidate tokens. These candidate tokens are then validated by the larger, more powerful, but slower target model. At each iteration, the draft model generates multiple candidate tokens. The target model verifies the tokens, and if it finds a particular token unacceptable, it rejects it and regenerates that token itself. This allows the larger target model to focus on verification, which is faster than auto-regressive token generation. The smaller draft model can quickly generate all the tokens and send them in batches to the target model for parallel evaluation, significantly speeding up the final response generation.
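
The following is a minimal sketch of the accept/reject loop at the heart of greedy speculative decoding, with stand-in draft_model and target_model callables. It shows the control flow only; it is not the SageMaker implementation, and a real system verifies all draft positions in one batched forward pass of the target model.

def speculative_decode(prompt_tokens, draft_model, target_model,
                       num_draft_tokens=4, max_new_tokens=64):
    """Greedy speculative decoding sketch.

    draft_model(tokens, n) -> list of n candidate next tokens (fast model)
    target_model(tokens)   -> single next token (slow, authoritative model)
    """
    tokens = list(prompt_tokens)
    while len(tokens) - len(prompt_tokens) < max_new_tokens:
        candidates = draft_model(tokens, num_draft_tokens)
        accepted = []
        for tok in candidates:
            if target_model(tokens + accepted) == tok:
                # Target agrees with the draft: accept the cheap token
                accepted.append(tok)
            else:
                # First mismatch: keep the target's own token, drop the rest
                accepted.append(target_model(tokens + accepted))
                break
        tokens.extend(accepted)
    return tokens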

With the updated SageMaker inference toolkit, you get out-of-the-box support for speculative decoding that has been tested for performance at scale on various popular open source LLMs. The toolkit provides a pre-built draft model, eliminating the need to invest time and resources in building your own draft model from scratch. Alternatively, you can also use your own custom draft model, providing flexibility to accommodate your specific requirements. To showcase the benefits of speculative decoding, let’s look at the throughput (tokens per second) for a Meta Llama 3.1 70B Instruct model deployed on an ml.p4d.24xlarge instance using the Meta Llama 3.2 1B Instruct draft model.
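
In the SageMaker Python SDK, speculative decoding is enabled through the same optimize() entry point shown later in this post for quantization and compilation. The following sketch assumes an existing ModelBuilder instance for the target model; the exact speculative_decoding_config keys should be verified against the example notebooks referenced later in this post.

# Sketch: select the SageMaker-provided draft model for speculative decoding.
# Assumes model_builder is a sagemaker.serve ModelBuilder for the target
# model (for example, Meta Llama 3.1 70B Instruct).
optimized_model = model_builder.optimize(
    instance_type="ml.p4d.24xlarge",
    accept_eula=True,
    speculative_decoding_config={
        # "sagemaker" uses the pre-built draft model; to bring your own
        # draft model, see the notebooks referenced at the end of this post
        "ModelProvider": "sagemaker",
    },
)
predictor = optimized_model.deploy()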

Speculative decoding price

Given the increase in throughput realized with speculative decoding, we can also look at the blended price difference with and without speculative decoding. Here we calculate the blended price assuming a 3:1 ratio of input to output tokens. The blended price is defined as follows; a short worked example follows the list:

  • Total throughput (tokens per second) = NumberOfOutputTokensPerRequest / (ClientLatency / 1,000) × concurrency
  • Blended price ($ per 1 million tokens) = (1 − discount rate) × (instance price per hour) ÷ ((total token throughput per second) × 60 × 60 ÷ 10^6) ÷ 4
  • Discount rate: assuming a 26% Savings Plan
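
As a short worked example, the following applies these formulas with purely illustrative inputs; the latency, concurrency, and instance price below are placeholders, not measured results.

# Illustrative numbers only; substitute your own measurements and pricing
output_tokens_per_request = 250
client_latency_ms = 4_000        # end-to-end latency per request
concurrency = 8
instance_price_per_hour = 37.69  # placeholder hourly instance price
discount_rate = 0.26             # assumed Savings Plan discount

total_throughput = output_tokens_per_request / (client_latency_ms / 1_000) * concurrency
blended_price = ((1 - discount_rate) * instance_price_per_hour
                 / (total_throughput * 60 * 60 / 10**6) / 4)
print(f"{total_throughput:.0f} tokens/s, ${blended_price:.2f} per 1M blended tokens")
# -> 500 tokens/s, $3.87 per 1M blended tokens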


Quantization

Quantization is one of the most popular model compression methods to accelerate model inference. From a technical perspective, quantization has several benefits:

  • It reduces model size, which makes it suitable for deploying using fewer GPUs with lower total device memory available.
  • It reduces memory bandwidth pressure by using fewer-bit data types.
  • It offers increased space for the KV cache, enabling larger batch sizes and sequence lengths.
  • It significantly speeds up matrix multiplication (GEMM) operations on the NVIDIA architecture, for example, up to twofold for FP8 compared to the FP16/BF16 data type in microbenchmarks.

With this launch, the SageMaker inference optimization toolkit now supports FP8 and SmoothQuant (TensorRT-LLM only) quantization. SmoothQuant is a post-training quantization (PTQ) technique for LLMs that reduces memory and speeds up inference without sacrificing accuracy. It migrates quantization difficulty from activations to weights, which are easier to quantize. It does this by introducing a hyperparameter to calculate a per-channel scale that balances the quantization difficulty of activations and weights.
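
The following sketch illustrates the core SmoothQuant idea on a single linear layer: a per-channel scale s_j = max|X_j|^α / max|W_j|^(1−α) (with migration strength α) divides the activations and multiplies the weights, leaving the layer’s output unchanged while flattening activation outliers. Shapes and values are illustrative; this is not the toolkit’s implementation.

import numpy as np

def smoothquant_scales(activations, weights, alpha=0.5):
    """Per-input-channel scales s_j = max|X_j|^alpha / max|W_j|^(1-alpha).

    activations: (num_tokens, in_features) calibration activations X
    weights:     (in_features, out_features) layer weights W
    """
    act_max = np.abs(activations).max(axis=0)  # per input channel
    w_max = np.abs(weights).max(axis=1)        # per input channel
    return (act_max ** alpha) / (w_max ** (1 - alpha))

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 16))
X[:, 3] *= 50                     # channel 3 has activation outliers
W = rng.normal(size=(16, 32))

s = smoothquant_scales(X, W)
X_smooth, W_smooth = X / s, W * s[:, None]

# The transform is mathematically equivalent, but X_smooth has far smaller
# outliers, so it quantizes with less error than X
assert np.allclose(X @ W, X_smooth @ W_smooth)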

The current generation of instances like p5 and g6 provide support for FP8 using specialized tensor cores. FP8 represents floating point numbers in 8 bits instead of the usual 16. At the time of writing, vLLM and TRT-LLM support quantizing the KV cache, attention, and linear layers for text-only LLMs. This reduces memory footprint, increases throughput, and lowers latency. Whereas both weights and activations can be quantized for p5 and g6 instances (W8A8), only weights can be quantized for p4d and g5 instances (W8A16). Though FP8 quantization has minimal impact on accuracy, you should always evaluate the quantized model on your data and for your use case. You can evaluate the quantized model through Amazon SageMaker Clarify. For more details, see Understand options for evaluating large language models with SageMaker Clarify.
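
As a rough back-of-the-envelope check on why the FP8 model fits on a smaller instance (weights only; the KV cache, activations, and framework overhead add to the real footprint):

params = 70e9  # Meta Llama 3.1 70B

print(f"FP16/BF16 weights: {params * 2 / 1e9:.0f} GB")  # 2 bytes/param -> ~140 GB
print(f"FP8 weights:       {params * 1 / 1e9:.0f} GB")  # 1 byte/param  -> ~70 GB

# ml.g5.12xlarge offers 4 x 24 GB = 96 GB of total GPU memory, enough for
# the FP8 weights sharded with tensor parallel degree 4, plus the KV cache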

The following graph compares the throughput of a FP8 quantized Meta Llama 3.1 70B Instruct model against a non-quantized Meta Llama 3.1 70B Instruct model on an ml.p4d.24xlarge instance.

Quantized vs base model throughput

The quantized model has a smaller memory footprint, so it can be deployed to a smaller (and cheaper) instance type. In this post, we deployed the quantized model on an ml.g5.12xlarge instance.

The following graph shows the price difference per million tokens between the FP8-quantized model deployed on g5.12xlarge and the non-quantized version deployed on p4d.24xlarge.

Quantized model price

Our analysis shows a clear price-performance edge for the FP8 quantized model over the non-quantized approach. However, quantization has an impact on model accuracy, so we strongly recommend testing the quantized version of the model on your datasets.

The following is the SageMaker Python SDK code snippet for quantization. You just need to provide the quantization_config attribute in the optimize() function:

quantized_instance_type = "ml.g5.12xlarge"

output_path = f"s3://{artifacts_bucket_name}/llama-3-1-70b-fp8/"

# model_builder is the ModelBuilder instance created earlier for the
# Meta Llama 3.1 70B Instruct model
optimized_model = model_builder.optimize(
    instance_type=quantized_instance_type,
    accept_eula=True,
    quantization_config={
        "OverrideEnvironment": {
            "OPTION_QUANTIZE": "fp8",  # enable FP8 quantization
            "OPTION_TENSOR_PARALLEL_DEGREE": "4",  # shard across 4 GPUs
        },
    },
    output_path=output_path,  # S3 location for the optimized artifacts
)

Refer to the following code example to learn more about how to enable FP8 quantization and speculative decoding using the optimization toolkit for a pre-trained Amazon SageMaker JumpStart model. If you want to deploy a fine-tuned model with SageMaker JumpStart using speculative decoding, refer to the following notebook.

Compilation

Compilation optimizes the model to extract the best available performance on the chosen hardware type, without any loss in accuracy. For compilation, the SageMaker inference optimization toolkit provides efficient loading and caching of optimized models to reduce model loading and auto scaling time by up to 40–60% for Meta Llama 3 8B and 70B.

Model compilation enables running LLMs on accelerated hardware, such as GPUs, while simultaneously optimizing the model’s computational graph for optimal performance on the target hardware. When using the Large Model Inference (LMI) Deep Learning Container (DLC) with the TensorRT-LLM framework, the compiler is invoked from within the framework and creates compiled artifacts. These compiled artifacts are unique to a combination of input shapes, model precision, tensor parallel degree, and other framework- or compiler-level configurations. Although the compilation process avoids overhead during inference and enables optimized inference, the compilation step itself can take considerable time.

To avoid re-compiling every time a model is deployed onto a GPU with the TensorRT-LLM framework, SageMaker introduces the following features:

  • A cache of pre-compiled artifacts – This includes popular models like Meta Llama 3.1. When using an optimized model with the compilation config, SageMaker automatically uses these cached artifacts when the configurations match.
  • Ahead-of-time compilation – The inference optimization toolkit enables you to compile your models with the desired configurations before deploying them on SageMaker.

The following graph illustrates the improvement in model loading time when using pre-compiled artifacts with the SageMaker LMI DLC. The models were compiled with a sequence length of 4096 and a batch size of 16, with Meta Llama 3.1 8B deployed on an ml.g5.12xlarge instance (tensor parallel degree = 4) and Meta Llama 3.1 70B Instruct on an ml.p4d.24xlarge instance (tensor parallel degree = 8). As the graph shows, the bigger the model, the bigger the benefit of using a pre-compiled model (16% improvement for Meta Llama 3.1 8B and 43% improvement for Meta Llama 3.1 70B).

Load times

Compilation using the SageMaker Python SDK

With the SageMaker Python SDK, you configure compilation by setting environment variables in the compilation_config argument of the optimize() function. For more details on compilation_config, refer to the TensorRT-LLM ahead-of-time compilation of models tutorial.

optimized_model = model_builder.optimize(
    instance_type=gpu_instance_type,
    accept_eula=True,
    compilation_config={
        "OverrideEnvironment": {
            "OPTION_ROLLING_BATCH": "trtllm",       # use the TensorRT-LLM backend
            "OPTION_MAX_INPUT_LEN": "4096",         # input sequence length baked into the engine
            "OPTION_MAX_OUTPUT_LEN": "4096",        # output sequence length baked into the engine
            "OPTION_MAX_ROLLING_BATCH_SIZE": "16",  # maximum batch size baked into the engine
            "OPTION_TENSOR_PARALLEL_DEGREE": "8",   # shard across 8 GPUs (e.g., ml.p4d.24xlarge)
        }
    },
    output_path=f"s3://{artifacts_bucket_name}/trtllm/",  # S3 location for compiled artifacts
)
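After the job completes, deploying the returned model reuses the compiled TensorRT-LLM artifacts from output_path as long as the deployment configuration matches the one used at compilation time, so the endpoint avoids just-in-time compilation at startup and during scale-out.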

Refer to the following notebook for more information on how to enable TensorRT-LLM compilation using the optimization toolkit for a pre-trained SageMaker JumpStart model.

Amazon SageMaker Studio UI experience

In this section, we walk through the Amazon SageMaker Studio UI experience for running an inference optimization job. In this case, we use the Meta Llama 3.1 70B Instruct model; for the optimization options, we quantize the model using INT4-AWQ and then use the SageMaker JumpStart suggested draft model, Meta Llama 3.2 1B Instruct, for speculative decoding.

First, we search for the Meta Llama 3.1 70B Instruct model in the SageMaker JumpStart model hub and choose Optimize on the model card.

Studio-Optimize

The Create inference optimization job page provides options for choosing the type of optimization. In this case, we combine the benefits of INT4-AWQ quantization and speculative decoding.

Studio Optimization Options

Choosing optimization options in Studio

For the draft model, you can use the SageMaker recommended draft model, choose one of the SageMaker JumpStart models, or bring your own draft model.

Draft model options in Studio

For this scenario, we choose the SageMaker recommended Meta Llama 3.2 1B Instruct model as the draft model and start the optimization job.

Optimization job details

When the optimization job is complete, you have an option to evaluate performance or deploy the model onto a SageMaker endpoint for inference.

Inference Optimization Job deployment

Optimized Model Deployment
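The same combination chosen in the Studio UI can also be expressed with the SageMaker Python SDK. The following is a hedged sketch; the exact configuration keys and whether both optimizations are combined in a single optimize() call are documented in the Part 2 user guide referenced in the conclusion, so treat these values as illustrative:

optimized_model = model_builder.optimize(
    instance_type="ml.p4d.24xlarge",
    accept_eula=True,
    quantization_config={
        "OverrideEnvironment": {
            "OPTION_QUANTIZE": "awq",  # INT4-AWQ weight quantization
        },
    },
    speculative_decoding_config={
        "ModelProvider": "SAGEMAKER",  # use the SageMaker recommended draft model
    },
    output_path=f"s3://{artifacts_bucket_name}/llama-3-1-70b-awq/",
)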

Pricing

For compilation and quantization jobs, SageMaker automatically chooses the right instance type for the optimization job, so you don’t have to spend time and effort selecting one. You are charged based on the optimization instance used. To learn more, see Amazon SageMaker pricing. For speculative decoding, there is no additional optimization cost; the SageMaker inference optimization toolkit packages the right container and parameters for the deployment on your behalf.

Conclusion

To get started with the inference optimization toolkit, refer to Achieve up to 2x higher throughput while reducing cost by up to 50% for GenAI inference on SageMaker with new inference optimization toolkit: user guide – Part 2. That post walks through how to use the inference optimization toolkit with SageMaker JumpStart and the SageMaker Python SDK. You can use the inference optimization toolkit with supported models on SageMaker JumpStart. For the full list of supported models, refer to Inference optimization for Amazon SageMaker models.


About the Authors

Marc Karp is an ML Architect with the Amazon SageMaker Service team. He focuses on helping customers design, deploy, and manage ML workloads at scale. In his spare time, he enjoys traveling and exploring new places.

Dmitry Soldatkin is a Senior AI/ML Solutions Architect at Amazon Web Services (AWS), helping customers design and build AI/ML solutions. Dmitry’s work covers a wide range of ML use cases, with a primary interest in Generative AI, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, utilities, and telecommunications. He has a passion for continuous innovation and using data to drive business outcomes.

Raghu Ramesha is a Senior ML Solutions Architect with the Amazon SageMaker Service team. He focuses on helping customers build, deploy, and migrate ML production workloads to SageMaker at scale. He specializes in machine learning, AI, and computer vision domains, and holds a master’s degree in Computer Science from UT Dallas. In his free time, he enjoys traveling and photography.

Rishabh Ray Chaudhury is a Senior Product Manager with Amazon SageMaker, focusing on Machine Learning inference. He is passionate about innovating and building new experiences for Machine Learning customers on AWS to help scale their workloads. In his spare time, he enjoys traveling and cooking. You can find him on LinkedIn.

Lokeshwaran Ravi is a Senior Deep Learning Compiler Engineer at AWS, specializing in ML optimization, model acceleration, and AI security. He focuses on enhancing efficiency, reducing costs, and building secure ecosystems to democratize AI technologies, making cutting-edge ML accessible and impactful across industries.
