AutoGen v0.4: Reimagining the foundation of agentic AI for scale, extensibility, and robustness

Over the past year, our work on AutoGen has highlighted the transformative potential of agentic AI and multi-agent applications. Today, we are excited to announce AutoGen v0.4, a significant milestone informed by insights from our community of users and developers. This update represents a complete redesign of the AutoGen library, developed to improve code quality, robustness, generality, and scalability in agentic workflows. 

The initial release of AutoGen generated widespread interest in agentic technologies. At the same time, users struggled with architectural constraints, an inefficient API compounded by rapid growth, and limited debugging and intervention functionality. Feedback highlighted the need for stronger observability and control, more flexible multi-agent collaboration patterns, and reusable components. AutoGen v0.4 addresses these issues with its asynchronous, event-driven architecture.

This update makes AutoGen more robust and extensible, enabling a broader range of agentic scenarios. The new framework includes the following features, inspired by feedback from both within and outside Microsoft.  

  • Asynchronous messaging: Agents communicate through asynchronous messages, supporting both event-driven and request/response interaction patterns (a minimal request/response sketch follows this list). 
  • Modular and extensible: Users can easily customize systems with pluggable components, including custom agents, tools, memory, and models. They can also build proactive and long-running agents using event-driven patterns. 
  • Observability and debugging: Built-in metric tracking, message tracing, and debugging tools provide monitoring and control over agent interactions and workflows, with support for OpenTelemetry for industry-standard observability. 
  • Scalable and distributed: Users can design complex, distributed agent networks that operate seamlessly across organizational boundaries. 
  • Built-in and community extensions: The extensions module enhances the framework’s functionality with advanced model clients, agents, multi-agent teams, and tools for agentic workflows. Community support allows open-source developers to manage their own extensions. 
  • Cross-language support: This update enables interoperability between agents built in different programming languages, with current support for Python and .NET and additional languages in development. 
  • Full type support: Interfaces enforce type checks at build time, improving robustness and maintaining code quality.
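
As a minimal illustration of the asynchronous request/response pattern mentioned above, the following sketch targets the v0.4 core layer. The package and class names (autogen_core, RoutedAgent, SingleThreadedAgentRuntime) match the v0.4 release, but treat the exact signatures as assumptions rather than a definitive recipe:

import asyncio
from dataclasses import dataclass

from autogen_core import (
    AgentId,
    MessageContext,
    RoutedAgent,
    SingleThreadedAgentRuntime,
    message_handler,
)

@dataclass
class Greeting:
    content: str

class Greeter(RoutedAgent):
    def __init__(self) -> None:
        super().__init__("A greeter agent")

    @message_handler  # routes Greeting messages to this coroutine
    async def on_greeting(self, message: Greeting, ctx: MessageContext) -> Greeting:
        return Greeting(content=f"Hello, {message.content}!")

async def main() -> None:
    runtime = SingleThreadedAgentRuntime()
    await Greeter.register(runtime, "greeter", lambda: Greeter())
    runtime.start()
    # Request/response: send a message to the agent and await its reply.
    reply = await runtime.send_message(Greeting("world"), AgentId("greeter", "default"))
    print(reply.content)
    await runtime.stop_when_idle()

asyncio.run(main())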


New AutoGen framework

As shown in Figure 1, the AutoGen framework features a layered architecture with clearly defined responsibilities across the framework, developer tools, and applications. The framework comprises three layers: core, agent chat, and first-party extensions.  

  • Core: The foundational building blocks for an event-driven agentic system.
  • AgentChat: A task-driven, high-level API built on the core layer, featuring group chat, code execution, pre-built agents, and more. This layer is most similar to AutoGen v0.2, making it the easiest API to migrate to.
  • Extensions: Implementations of core interfaces and third-party integrations, such as the Azure code executor and OpenAI model client.
Figure 1. The v0.4 update introduces a cohesive AutoGen ecosystem that includes the framework, developer tools, and applications. The framework’s layered architecture clearly defines each layer’s functionality. It supports both first-party and third-party applications and extensions.

Developer tools

In addition to the framework, AutoGen v0.4 includes upgraded programming tools and applications designed to support developers in building and experimenting with AutoGen. 

AutoGen Bench: Enables developers to benchmark their agents by measuring and comparing performance across tasks and environments. 

AutoGen Studio: Rebuilt on the v0.4 AgentChat API, this low-code interface enables rapid prototyping of AI agents. It introduces several new capabilities: 

  • Real-time agent updates: View agent action streams in real time with asynchronous, event-driven messages.  
  • Mid-execution control: Pause conversations, redirect agent actions, and adjust team composition. Then seamlessly resume tasks. 
  • Interactive feedback through the UI: Add a UserProxyAgent to enable user input and guidance during team runs in real time. 
  • Message flow visualization: Understand agent communication through an intuitive visual interface that maps message paths and dependencies. 
  • Drag-and-drop team builder: Design agent teams visually using an interface for dragging components into place and configuring their relationships and properties. 
  • Third-party component galleries: Import and use custom agents, tools, and workflows from external galleries to extend functionality. 

Magentic-One: A new generalist multi-agent application for solving open-ended web and file-based tasks across various domains. This tool marks a significant step toward creating agents capable of completing tasks commonly encountered in both work and personal contexts.

Migrating to AutoGen v0.4

We implemented several measures to facilitate a smooth upgrade from the previous v0.2 API, addressing core differences in the underlying architecture. 

First, the AgentChat API maintains the same level of abstraction as v0.2, making it easy to migrate existing code to v0.4. For example, AgentChat offers an AssistantAgent and a UserProxyAgent with similar behaviors to those in v0.2. It also provides a team interface with implementations like RoundRobinGroupChat and SelectorGroupChat, which cover all the capabilities of the GroupChat class in v0.2. Additionally, v0.4 introduces many new functionalities, such as streaming messages, improved observability, saving and restoring task progress, and resuming paused actions where they left off.  
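
For illustration, here is a minimal sketch of a two-agent round-robin team on the v0.4 AgentChat API. The package names (autogen_agentchat, autogen_ext) and classes match the v0.4 release; the task and termination phrase are placeholders, so treat this as a sketch rather than a definitive recipe:

import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.conditions import TextMentionTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main() -> None:
    model_client = OpenAIChatCompletionClient(model="gpt-4o")
    writer = AssistantAgent("writer", model_client=model_client)
    critic = AssistantAgent(
        "critic",
        model_client=model_client,
        system_message="Review the draft and reply APPROVE when it is ready.",
    )
    # Agents take turns until the termination condition fires.
    team = RoundRobinGroupChat(
        [writer, critic],
        termination_condition=TextMentionTermination("APPROVE"),
    )
    result = await team.run(task="Draft a one-paragraph summary of AutoGen v0.4.")
    print(result.messages[-1].content)

asyncio.run(main())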

For detailed guidance, refer to the migration guide.

Looking forward

This new release sets the stage for a robust ecosystem and a strong foundation to drive advances in agentic AI applications and research. Our roadmap includes releasing .NET support, introducing built-in, well-designed applications and extensions for challenging domains, and fostering a community-driven ecosystem. We remain committed to the responsible development of AutoGen and its evolving capabilities. 

We encourage you to engage with us on AutoGen’s Discord server and share feedback on the official AutoGen repository via GitHub Issues. Stay up to date with frequent AutoGen updates via X. 

Acknowledgments

We would like to thank the many individuals whose ideas and insights helped formalize the concepts introduced in this release, including Rajan Chari, Ece Kamar, John Langford, Ching-An Chen, Bob West, Paul Minero, Safoora Yousefi, Will Epperson, Grace Proebsting, Enhao Zhang, and Andrew Ng. 


Read More

GenAI Acceleration for PyTorch 2.5 on Intel® Xeon® Processors

This blog is the fifth in a series focused on accelerating generative AI models with pure, native PyTorch. We demonstrate the GenAI acceleration of GPTFast, Segment Anything Fast, and Diffusion Fast on Intel® Xeon® Processors.

First, we revisit GPTFast, a remarkable work that speeds up text generation in under 1000 lines of native PyTorch code. Initially, GPTFast supported only the CUDA backend. We will show you how to run GPTFast on CPU and achieve additional performance speedup with weight-only quantization (WOQ).

In Segment Anything Fast, we have incorporated support for the CPU backend and will demonstrate performance acceleration by leveraging the increased power of the CPU with BFloat16, torch.compile, and scaled_dot_product_attention (SDPA) with a block-wise attention mask. The speedup ratio against FP32 can reach 2.91x in vit_b and 3.95x in vit_h.

Finally, Diffusion Fast now supports the CPU backend and leverages the increased power of the CPU with BFloat16, torch.compile, and SDPA. We also optimize the layout propagation rules for convolution, cat, and permute in Inductor CPU to improve performance. The speedup ratio against FP32 can reach 3.91x in Stable Diffusion XL (SDXL).

Optimization strategies to boost performance on PyTorch CPU

GPTFast

Over the past year, generative AI has achieved great success across various language tasks and become increasingly popular. However, generative models face high inference costs due to the memory bandwidth bottlenecks in the auto-regressive decoding process. To address these issues, the PyTorch team published GPTFast, which accelerates text generation with only pure, native PyTorch. This project speeds up LLM inference almost 10x over the baseline in under 1000 lines of native PyTorch code. Initially, GPTFast supported only the CUDA backend and garnered approximately 5,000 stars in about four months. Inspired by Llama.cpp, the Intel team provided CPU backend support starting with the PyTorch 2.4 release, further enhancing the project’s availability in GPU-free environments. The following are optimization strategies used to boost performance on PyTorch CPU:

  • Torch.compile

    torch.compile is a PyTorch function introduced in PyTorch 2.0 that aims to solve the problem of accurate graph capturing in PyTorch and ultimately enable software engineers to run their PyTorch programs faster.

  • Weight-only Quantization

    Weight-only quantization (WOQ) is a trade-off between performance and accuracy: the bottleneck of the auto-regressive decoding phase in text generation is the memory bandwidth of loading weights, and WOQ generally preserves accuracy better than traditional quantization approaches such as W8A8. GPTFast supports two types of WOQ: W8A16 and W4A16. Specifically, activations are stored in BFloat16, and model weights can be quantized to int8 or int4, as shown in Figure 1. (A toy sketch of the W8A16 idea appears at the end of this section.)

Figure 1. Weight-only Quantization Pattern. Source: Mingfei Ma, Intel

  • Weight Prepacking & Micro Kernel Design

    To maximize throughput, GPTFast allows model weights to be prepacked into hardware-specific layouts on int4 using internal PyTorch ATen APIs. Inspired by Llama.cpp, we prepacked the model weights from [N, K] to [N/kNTileSize, K, kNTileSize/2], with kNTileSize set to 64 on avx512. First, the model weights are blocked along the N dimension, then the two innermost dimensions are transposed. To minimize de-quantization overhead in kernel computation, we shuffle the 64 data elements on the same row in an interleaved pattern, packing Lane2 & Lane0 together and Lane3 & Lane1 together, as illustrated in Figure 2.

Figure 2. Weight Prepacking on Int4. Source: Mingfei Ma, Intel

During the generation phase, the torch.nn.Linear module will be lowered to be computed with high-performance kernels inside PyTorch ATen, where the quantized weights will be de-quantized first and then accumulated with fused multiply-add (FMA) at the register level, as shown in Figure 3.

Figure 3. Micro Kernel Design. Source: Mingfei Ma, Intel
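
As a concrete illustration of the W8A16 scheme described above, here is a toy sketch in plain PyTorch: int8 weights are de-quantized on the fly against BFloat16 activations. It demonstrates the memory-bandwidth trade-off only; the module and helper names are ours, and this is not the prepacked ATen kernel path the post describes:

import torch

def quantize_weight_int8(w: torch.Tensor):
    # Per-output-channel symmetric quantization: w is approximated as w_int8 * scale
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    w_int8 = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return w_int8, scale

class WOQLinear(torch.nn.Module):
    """W8A16: int8 weights, BFloat16 activations, de-quantized per call."""

    def __init__(self, linear: torch.nn.Linear):
        super().__init__()
        w_int8, scale = quantize_weight_int8(linear.weight.data)
        self.register_buffer("w_int8", w_int8)
        self.register_buffer("scale", scale.to(torch.bfloat16))
        self.bias = (
            None if linear.bias is None
            else torch.nn.Parameter(linear.bias.data.to(torch.bfloat16))
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Loading int8 weights halves the memory traffic versus BF16 weights;
        # the production kernels fuse this de-quant with the FMA accumulation.
        w = self.w_int8.to(torch.bfloat16) * self.scale
        return torch.nn.functional.linear(x, w, self.bias)

layer = WOQLinear(torch.nn.Linear(4096, 4096))
layer = torch.compile(layer)  # let the compiler fuse the de-quant elementwise ops
y = layer(torch.randn(1, 4096, dtype=torch.bfloat16))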

Segment Anything Fast

Segment Anything Fast offers a simple and efficient PyTorch native acceleration for the Segment Anything Model (SAM), which is a zero-shot vision model for generating promptable image masks. The following are optimization strategies used to boost performance on PyTorch CPU:

  • BFloat16

    Bfloat16 is a commonly used half-precision type. By using less precision per parameter and activation, we can save significant time and memory in computation.

  • Torch.compile

    torch.compile is a PyTorch function introduced in PyTorch 2.0 that aims to solve the problem of accurate graph capturing in PyTorch and ultimately enable developers to run their PyTorch programs faster.

  • Scaled Dot Product Attention (SDPA)

    Scaled Dot-Product Attention (SDPA) is a crucial mechanism in transformer models. PyTorch offers a fused implementation that significantly outperforms a naive approach. For Segment Anything Fast, we convert the attention mask from bfloat16 to float32 in a block-wise manner. This method not only reduces peak memory usage, making it ideal for systems with limited memory resources, but also enhances performance. (A minimal fused-versus-naive comparison follows this list.)
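
To make the SDPA comparison concrete, here is a minimal sketch contrasting the fused kernel with a naive implementation. The shapes are arbitrary, and the block-wise float32 mask conversion described above is omitted for brevity:

import torch
import torch.nn.functional as F

# (batch, heads, sequence, head_dim) tensors in BFloat16, as used on CPU here
q = torch.randn(1, 12, 1024, 64, dtype=torch.bfloat16)
k = torch.randn(1, 12, 1024, 64, dtype=torch.bfloat16)
v = torch.randn(1, 12, 1024, 64, dtype=torch.bfloat16)

# Fused path: dispatches to an optimized kernel when one is available,
# avoiding a materialized (seq x seq) attention matrix in the fast backends.
out = F.scaled_dot_product_attention(q, k, v)

# Naive reference: explicit score matrix, softmax, then weighted sum.
scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
ref = scores.softmax(dim=-1) @ v

# Differences should be small BF16 rounding noise.
print((out.float() - ref.float()).abs().max())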

Diffusion Fast

Diffusion Fast offers a simple and efficient PyTorch native acceleration for text-to-image diffusion models. The following are optimization strategies used to boost performance on PyTorch CPU:

  • BFloat16

    Bfloat16 is a commonly used half-precision type. By using less precision per parameter and activation, we can save significant time and memory in computation.

  • Torch.compile

    torch.compile is a PyTorch function introduced in PyTorch 2.0 that aims to solve the problem of accurate graph capturing in PyTorch and ultimately enable software engineers to run their PyTorch programs faster.

  • Scaled Dot Product Attention (SDPA)

    SDPA is a key mechanism used in transformer models. PyTorch provides a fused implementation that shows large performance benefits over a naive implementation.

Model Usage on Native PyTorch CPU

GPTFast

To launch WOQ in GPTFast, first quantize the model weights. For example, to quantize with int4 and a group size of 32:

python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int4 --groupsize 32

Then run generation by passing the int4 checkpoint to generate.py:

python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_int4.g32.pth --compile --device $DEVICE

To use the CPU backend in GPTFast, simply switch the DEVICE variable from cuda to cpu.

Segment Anything Fast

cd experiments

export SEGMENT_ANYTHING_FAST_USE_FLASH_4=0

python run_experiments.py 16 vit_b <pytorch_github> <segment-anything_github> <path_to_experiments_data> --run-experiments --num-workers 32 --device cpu

python run_experiments.py 16 vit_h <pytorch_github> <segment-anything_github> <path_to_experiments_data> --run-experiments --num-workers 32 --device cpu

Diffusion Fast

python run_benchmark.py --compile_unet --compile_vae --device=cpu

Performance Evaluation

GPTFast

We ran the llama-2-7b-chat model based on the test branch and the above hardware configuration on PyTorch. After applying the following steps, we saw a 3.8x boost compared to the baseline in eager mode:

  • Use torch.compile to automatically fuse elementwise operators.
  • Reduce memory footprint with WOQ-int8.
  • Further reduce memory footprint with WOQ-int4.
  • Use AVX512, which enables faster de-quantization in the micro kernels.

Figure 4. GPTFast Performance speedup in Llama2-7b-chat

Segment Anything Fast

We ran Segment Anything Fast on the above hardware configuration on PyTorch and achieved a performance speedup with BFloat16, torch.compile, and SDPA compared with FP32, as shown in Figure 5. The speedup ratio against FP32 can reach 2.91x in vit_b and 3.95x in vit_h.

Figure 5. Segment Anything Fast Performance speedup in vit_b/vit_h

Diffusion Fast

We ran Diffusion Fast on the above hardware configuration on PyTorch and achieved a performance speedup with BFloat16, torch.compile, and SDPA compared with FP32, as shown in Figure 6. The speedup ratio against FP32 can reach 3.91x in Stable Diffusion XL (SDXL).

Figure 6. Diffusion Fast Performance speedup in Stable Diffusion XL

Conclusion and Future Work

In this blog, we introduced software optimizations for weight-only quantization, torch.compile, and SDPA, demonstrating how we can accelerate text generation with native PyTorch on CPU. Further improvements are expected with the support of the AMX-BF16 instruction set and the optimization of dynamic int8 quantization using torchao on CPU. We will continue to extend our software optimization efforts to a broader scope.

Acknowledgments

The results presented in this blog are a joint effort between Meta and the Intel PyTorch Team. Special thanks to Michael Gschwind from Meta who spent precious time providing substantial assistance. Together we took one more step on the path to improve the PyTorch CPU ecosystem.

Related Blogs

Part 1: How to accelerate Segment Anything over 8x with Segment Anything Fast.

Part 2: How to accelerate Llama-7B by almost 10x with help of GPTFast.

Part 3: How to accelerate text-to-image diffusion models up to 3x with Diffusion Fast.

Part 4: How to speed up FAIR’s Seamless M4T-v2 model by 2.7x.

Product and Performance Information

Figure 4: Intel Xeon Scalable Processors: Measurement on 4th Gen Intel Xeon Scalable processor using: 2x Intel(R) Xeon(R) Platinum 8480+, 56cores, HT On, Turbo On, NUMA 2, Integrated Accelerators Available [used]: DLB 2 [0], DSA 2 [0], IAA 2 [0], QAT 2 [0], Total Memory 512GB (16x32GB DDR5 4800 MT/s [4800 MT/s]), BIOS 3B07.TEL2P1, microcode 0x2b000590, Samsung SSD 970 EVO Plus 2TB, CentOS Stream 9, 5.14.0-437.el9.x86_64, run single socket (1 instances in total with: 56 cores per instance, Batch Size 1 per instance), Models run with PyTorch 2.5 wheel. Test by Intel on 10/15/24.

Figure 5: Intel Xeon Scalable Processors: Measurement on 4th Gen Intel Xeon Scalable processor using: 2x Intel(R) Xeon(R) Platinum 8480+, 56cores, HT On, Turbo On, NUMA 2, Integrated Accelerators Available [used]: DLB 2 [0], DSA 2 [0], IAA 2 [0], QAT 2 [0], Total Memory 512GB (16x32GB DDR5 4800 MT/s [4800 MT/s]), BIOS 3B07.TEL2P1, microcode 0x2b000590, Samsung SSD 970 EVO Plus 2TB, CentOS Stream 9, 5.14.0-437.el9.x86_64, run single socket (1 instances in total with: 56 cores per instance, Batch Size 16 per instance), Models run with PyTorch 2.5 wheel. Test by Intel on 10/15/24.

Figure 6: Intel Xeon Scalable Processors: Measurement on 4th Gen Intel Xeon Scalable processor using: 2x Intel(R) Xeon(R) Platinum 8480+, 56cores, HT On, Turbo On, NUMA 2, Integrated Accelerators Available [used]: DLB 2 [0], DSA 2 [0], IAA 2 [0], QAT 2 [0], Total Memory 512GB (16x32GB DDR5 4800 MT/s [4800 MT/s]), BIOS 3B07.TEL2P1, microcode 0x2b000590, Samsung SSD 970 EVO Plus 2TB, CentOS Stream 9, 5.14.0-437.el9.x86_64, run single socket (1 instances in total with: 56 cores per instance, Batch Size 1 per instance), Models run with PyTorch 2.5 wheel. Test by Intel on 10/15/24.

Notices and Disclaimers

Performance varies by use, configuration and other factors. Learn more on the Performance Index site. Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates.  See backup for configuration details.  No product or component can be absolutely secure. Your costs and results may vary. Intel technologies may require enabled hardware, software or service activation.

Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

AI disclaimer:

AI features may require software purchase, subscription or enablement by a software or platform provider, or may have specific configuration or compatibility requirements. Details at www.intel.com/AIPC. Results may vary.

Read More

Controlling Language and Diffusion Models by Transporting Activations

The increasing capabilities of large generative models and their ever more widespread deployment have raised concerns about their reliability, safety, and potential misuse. To address these issues, recent works have proposed to control model generation by steering model activations in order to effectively induce or prevent the emergence of concepts or behaviours in the generated output. In this paper we introduce Activation Transport (AcT), a general framework to steer activations guided by optimal transport theory that generalizes many previous activation-steering works. AcT is…Apple Machine Learning Research

How BQA streamlines education quality reporting using Amazon Bedrock

Given the value of data today, organizations across various industries are working with vast amounts of data across multiple formats. Manually reviewing and processing this information can be a challenging and time-consuming task, with a margin for potential errors. This is where intelligent document processing (IDP), coupled with the power of generative AI, emerges as a game-changing solution.

Enhancing the capabilities of IDP is the integration of generative AI, which harnesses large language models (LLMs) and generative techniques to understand and generate human-like text. This integration allows organizations to not only extract data from documents, but to also interpret, summarize, and generate insights from the extracted information, enabling more intelligent and automated document processing workflows.

The Education and Training Quality Authority (BQA) plays a critical role in improving the quality of education and training services in the Kingdom of Bahrain. BQA reviews the performance of all education and training institutions, including schools, universities, and vocational institutes, thereby promoting the professional advancement of the nation’s human capital.

BQA oversees a comprehensive quality assurance process, which includes setting performance standards and conducting objective reviews of education and training institutions. The process involves the collection and analysis of extensive documentation, including self-evaluation reports (SERs), supporting evidence, and various media formats from the institutions being reviewed.

The collaboration between BQA and AWS was facilitated through the Cloud Innovation Center (CIC) program, a joint initiative by AWS, Tamkeen, and leading universities in Bahrain, including Bahrain Polytechnic and University of Bahrain. The CIC program aims to foster innovation within the public sector by providing a collaborative environment where government entities can work closely with AWS consultants and university students to develop cutting-edge solutions using the latest cloud technologies.

As part of the CIC program, BQA has built a proof of concept solution, harnessing the power of AWS services and generative AI capabilities. The primary purpose of this proof of concept was to test and validate the proposed technologies, demonstrating their viability and potential for streamlining BQA’s reporting and data management processes.

In this post, we explore how BQA used the power of Amazon Bedrock, Amazon SageMaker JumpStart, and other AWS services to streamline the overall reporting workflow.

The challenge: Streamlining self-assessment reporting

BQA has traditionally provided education and training institutions with a template for the SER as part of the review process. Institutions are required to submit a review portfolio containing the completed SER and supporting material as evidence, which sometimes did not adhere fully to the established reporting standards.

The existing process had some challenges:

  • Inaccurate or incomplete submissions – Institutions might provide incomplete or inaccurate information in the submitted reports and supporting evidence, leading to gaps in the data required for a comprehensive review.
  • Missing or insufficient supporting evidence – The supporting material provided as evidence by institutions frequently did not substantiate the claims made in their reports, which challenged the evaluation process.
  • Time-consuming and resource-intensive – The process required dedicating significant time and resources to review the submissions manually and follow up with institutions to request additional information if needed to rectify the submissions, resulting in slowing down the overall review process.

These challenges highlighted the need for a more streamlined and efficient approach to the submission and review process.

Solution overview

The proposed solution uses Amazon Bedrock and the Amazon Titan Text Express model to enable IDP functionalities. The architecture seamlessly integrates multiple AWS services with Amazon Bedrock, allowing for efficient data extraction and comparison.

Amazon Bedrock is a fully managed service that provides access to high-performing foundation models (FMs) from leading AI startups and Amazon through a unified API. It offers a wide range of FMs, allowing you to choose the model that best suits your specific use case.

The following diagram illustrates the solution architecture.

The solution consists of the following steps:

  1. Relevant documents are uploaded and stored in an Amazon Simple Storage Service (Amazon S3) bucket.
  2. An event notification is sent to an Amazon Simple Queue Service (Amazon SQS) queue to align each file for further processing. Amazon SQS serves as a buffer, enabling the different components to send and receive messages in a reliable manner without being directly coupled, enhancing scalability and fault tolerance of the system.
  3. The text extraction AWS Lambda function is invoked by the SQS queue, processing each queued file and using Amazon Textract to extract text from the documents (a minimal sketch of this function follows the list).
  4. The extracted text data is placed into another SQS queue for the next processing step.
  5. The text summarization Lambda function is invoked by this new queue containing the extracted text. This function sends a request to SageMaker JumpStart, where a Meta Llama text generation model is deployed to summarize the content based on the provided prompt.
  6. In parallel, the InvokeSageMaker Lambda function is invoked to perform comparisons and assessments. It compares the extracted text against the BQA standards that the model was trained on, evaluating the text for compliance, quality, and other relevant metrics.
  7. The summarized data and assessment results are stored in an Amazon DynamoDB table.
  8. Upon request, the InvokeBedrock Lambda function invokes Amazon Bedrock to generate AI summaries and comments. The function constructs a detailed prompt designed to guide the Amazon Titan Text Express model in evaluating the university’s submission.
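
As a rough illustration of step 3, the following is a minimal sketch of the text-extraction Lambda handler. The S3 event layout assumes standard S3-to-SQS notifications, the synchronous Textract call is a simplification (multi-page PDFs would use the asynchronous start_document_text_detection API), and the destination queue URL is hypothetical:

import json

import boto3

textract = boto3.client("textract")

def handler(event, context):
    """Text-extraction Lambda (step 3): invoked by the SQS queue."""
    for record in event["Records"]:
        s3_event = json.loads(record["body"])  # S3 notification delivered via SQS
        for rec in s3_event.get("Records", []):
            bucket = rec["s3"]["bucket"]["name"]
            key = rec["s3"]["object"]["key"]
            # Synchronous API shown for brevity; multi-page PDFs require
            # start_document_text_detection instead.
            resp = textract.detect_document_text(
                Document={"S3Object": {"Bucket": bucket, "Name": key}}
            )
            text = "\n".join(
                block["Text"] for block in resp["Blocks"]
                if block["BlockType"] == "LINE"
            )
            # Step 4: place the extracted text on the next SQS queue for
            # the summarization function (queue URL is a placeholder).
            boto3.client("sqs").send_message(
                QueueUrl="https://sqs.example/extracted-text-queue",
                MessageBody=json.dumps({"key": key, "text": text}),
            )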

Prompt engineering using Amazon Bedrock

To take advantage of the power of Amazon Bedrock and make sure the generated output adhered to the desired structure and formatting requirements, a carefully crafted prompt was developed according to the following guidelines:

  • Evidence submission – Present the evidence submitted by the institution under the relevant indicator, providing the model with the necessary context for evaluation
  • Evaluation criteria – Outline the specific criteria the evidence should be assessed against
  • Evaluation instructions – Instruct the model as follows:
    • Indicate N/A if the evidence is irrelevant to the indicator
    • Evaluate the university’s self-assessment based on the criteria
    • Assign a score from 1–5 for each comment, citing evidence directly from the content
  • Response format – Specify the response as bullet points, focusing on relevant analysis and evidence, with a word limit of 100 words

To use this prompt template, you can create a custom Lambda function with your project. The function should handle the retrieval of the required data, such as the indicator name, the university’s submitted evidence, and the rubric criteria. Within the function, include the prompt template and dynamically populate the placeholders (${indicatorName}, ${JSON.stringify(allContent)}, and ${JSON.stringify(c.comment)}) with the retrieved data.

The Amazon Titan Text Express model will then generate the evaluation response based on the provided prompt instructions, adhering to the specified format and guidelines. You can process and analyze the model’s response within your function, extracting the compliance score, relevant analysis, and evidence.

The following is an example prompt template:

for (const c of comments) {
        const prompt = `
        Below is the evidence submitted by the university under the indicator "${indicatorName}":
        ${JSON.stringify(allContent)}
    
         Analyze and evaluate the university's evidence based on the provided rubric criteria:
        ${JSON.stringify(c.comment)}

        - If the evidence does not relate to the indicator, indicate that it is not applicable (N/A) without any additional commentary.
        
       Choose one of the below compliance scores based on the evidence submitted:
       1. Non-compliant: The comment does not meet the criteria or standards.
       2. Compliant with recommendation: The comment meets the criteria but includes a suggestion or recommendation for improvement.
       3. Compliant: The comment meets the criteria or standards.

        AT THE END OF THE RESPONSE THERE SHOULD BE A SCORE: [SCORE: COMPLIANT OR NON-COMPLIANT OR COMPLIANT WITH RECOMMENDATION]
        Write your response in concise bullet points, focusing strictly on relevant analysis and evidence.
        **LIMIT YOUR RESPONSE TO 100 WORDS ONLY.**

        `;

        logger.info(`Prompt for comment ${c.commentId}: ${prompt}`);

        const body = JSON.stringify({
          inputText: prompt,
          textGenerationConfig: {
            maxTokenCount: 4096,
            stopSequences: [],
            temperature: 0,
            topP: 0.1,
          },
        });

        // The original snippet ends here; the constructed `body` is then
        // passed to the Bedrock InvokeModel API (see the sketch below).
}
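
The loop above only constructs the request body. For completeness, a minimal Python sketch of the corresponding model invocation might look as follows; the model ID and response fields match the documented Amazon Titan Text Express interface, but the surrounding wiring is an assumption:

import json

import boto3

bedrock = boto3.client("bedrock-runtime")

# Prompt built exactly as in the loop above (placeholder text here).
prompt = "Below is the evidence submitted by the university ..."

body = json.dumps({
    "inputText": prompt,
    "textGenerationConfig": {
        "maxTokenCount": 4096,
        "stopSequences": [],
        "temperature": 0,
        "topP": 0.1,
    },
})

response = bedrock.invoke_model(
    modelId="amazon.titan-text-express-v1",  # Titan Text Express model ID
    body=body,
)
result = json.loads(response["body"].read())
print(result["results"][0]["outputText"])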

The following screenshot shows an example of a response generated by Amazon Bedrock.

Results

The implementation of Amazon Bedrock provided institutions with transformative benefits. By automating and streamlining the collection and analysis of extensive documentation, including SERs, supporting evidence, and various media formats, institutions can achieve greater accuracy and consistency in their reporting processes and improve their readiness for the review process. This not only reduces the time and cost associated with manual data processing, but also improves compliance with quality expectations, thereby enhancing the credibility and quality of the institutions.

For BQA, the implementation helped achieve a strategic objective focused on streamlining its reporting processes, delivering significant improvements across a range of critical metrics and substantially enhancing the overall efficiency and effectiveness of its operations.

Key success metrics anticipated include:

  • Faster turnaround times for generating 70% accurate and standards-compliant self-evaluation reports, leading to improved overall efficiency.
  • Reduced risk of errors or non-compliance in the reporting process, enforcing adherence to established guidelines.
  • Ability to summarize lengthy submissions into concise bullet points, allowing BQA reviewers to quickly analyze and comprehend the most pertinent information, reducing evidence analysis time by 30%.
  • More accurate compliance feedback functionality, empowering reviewers to effectively evaluate submissions against established standards and guidelines, while achieving 30% reduced operational costs through process optimizations.
  • Enhanced transparency and communication through seamless interactions, enabling users to request additional documents or clarifications with ease.
  • Real-time feedback, allowing institutions to make necessary adjustments promptly. This is particularly useful to maintain submission accuracy and completeness.
  • Enhanced decision-making by providing insights on the data. This helps universities identify areas for improvement and make data-driven decisions to enhance their processes and operations.

The following screenshot shows an example of generating new evaluations using Amazon Bedrock.

Conclusion

This post outlined the implementation of Amazon Bedrock at the Education and Training Quality Authority (BQA), demonstrating the transformative potential of generative AI in revolutionizing the quality assurance processes in the education and training sectors. For those interested in exploring the technical details further, the full code for this implementation is available in the following GitHub repo. If you are interested in conducting a similar proof of concept with us, submit your challenge idea to the Bahrain Polytechnic or University of Bahrain CIC website.


About the Author

Maram AlSaegh is a Cloud Infrastructure Architect at Amazon Web Services (AWS), where she supports AWS customers in accelerating their journey to cloud. Currently, she is focused on developing innovative solutions that leverage generative AI and machine learning (ML) for public sector entities.

Read More

Boosting team innovation, productivity, and knowledge sharing with Amazon Q Business – Web experience

Amazon Q Business can increase productivity across diverse teams, including developers, architects, site reliability engineers (SREs), and product managers. Amazon Q Business as a web experience makes AWS best practices readily accessible, providing cloud-centered recommendations quickly and making it straightforward to access AWS service functions, limits, and implementations. These elements are brought together in a web integration that serves various job roles and personas exactly when they need it.

As enterprises continue to grow their applications, environments, and infrastructure, it has become difficult to keep pace with technology trends, best practices, and programming standards. Enterprises provide their developers, engineers, and architects with a range of knowledge bases and documents, such as usage guides, wikis, and tools. But these resources tend to become siloed over time and inaccessible across teams, resulting in reduced knowledge, duplication of work, and reduced productivity.

MuleSoft from Salesforce provides the Anypoint platform that gives IT the tools to automate everything. This includes integrating data and systems, automating workflows and processes, and creating incredible digital experiences, all on a single, user-friendly platform.

This post shows how MuleSoft introduced a generative AI-powered assistant using Amazon Q Business to enhance their internal Cloud Central dashboard. This individualized portal shows assets owned, costs and usage, and well-architected recommendations to over 100 engineers. For more on MuleSoft’s journey to cloud computing, refer to Why a Cloud Operating Model?

Developers, engineers, FinOps, and architects can get the right answer at the right time when they’re ready to troubleshoot, address an issue, have an inquiry, or want to understand AWS best practices and cloud-centered deployments.

This post covers how to integrate Amazon Q Business into your enterprise setup.

Solution overview

The Amazon Q Business web experience provides seamless access to information, step-by-step instructions, troubleshooting, and prescriptive guidance so teams can deploy well-architected applications or cloud-centered infrastructure. Team members can chat directly or upload documents and receive summarization, analysis, or answers to a calculation. Amazon Q Business uses supported connectors such as Confluence, Amazon Relational Database Service (Amazon RDS), and web crawlers. The following diagram shows the reference architecture for various personas, including developers, support engineers, DevOps, and FinOps to connect with internal databases and the web using Amazon Q Business.

In this reference architecture, you can see how various user personas, spanning across teams and business units, use the Amazon Q Business web experience as an access point for information, step-by-step instructions, troubleshooting, or prescriptive guidance for deploying a well-architected application or cloud-centered infrastructure. The web experience allows team members to chat directly with an AI assistant or upload documents and receive summarization, analysis, or answers to a calculation.

Use cases for Amazon Q Business

Small, medium, and large enterprises, depending on their mode of operation, type of business, and level of investment in IT, will have varying approaches and policies on providing access to information. Amazon Q Business is part of the AWS suite of generative AI services; it provides a web-based utility to set up, manage, and interact with Amazon Q. It can answer questions, provide summaries, generate content, and complete tasks using the data and expertise found in your enterprise systems. You can connect internal and external datasets without compromising security to seamlessly incorporate your specific standard operating procedures, guidelines, playbooks, and reference links. With Amazon Q, MuleSoft’s engineering teams were able to address their AWS-specific inquiries (such as support ticket escalation, operational guidance, and AWS Well-Architected best practices) at scale.

The Amazon Q Business web experience allows business users across various job titles and functions to interact with Amazon Q through the web browser. With the web experience, teams can access the same information and receive similar recommendations based on their prompt or inquiry, level of experience, and knowledge, ranging from beginner to advanced.

The following demos are examples of what the Amazon Q Business web experience looks like. Amazon Q Business securely connects to over 40 commonly used business tools, such as wikis, intranets, Atlassian, Gmail, Microsoft Exchange, Salesforce, ServiceNow, Slack, and Amazon Simple Storage Service (Amazon S3). Point Amazon Q Business at your enterprise data, and it will search your data, summarize it logically, analyze trends, and engage in dialogue with end users about the data. This helps users access their data no matter where it resides in their organization.

Amazon Q Business supports prompting and response for prescriptive guidance. Taking Amazon Elastic Block Store (Amazon EBS) volume optimization as an example, it provided detailed migration steps from gp2 to gp3. This is a well-known use case asked about by several MuleSoft teams.

Through the web experience, you can effortlessly perform document uploads and prompts for summary, calculation, or recommendations based on your document. You have the flexibility to upload .pdf, .xls, .xlsx, or .csv files directly into the chat interface. You can also assume a persona such as FinOps or DevOps and get personalized recommendations or responses.

MuleSoft engineers used the Amazon Q Business web summarization feature to better understand Split Cost Allocation Data (SCAD) for Amazon Elastic Kubernetes Service (Amazon EKS). They uploaded the SCAD PDF documents to Amazon Q and got straightforward summaries. This helped them understand their customer’s use of MuleSoft Anypoint platform running on Amazon EKS.

Amazon Q helped analyze IPv4 costs by processing an uploaded Excel file. As the video shows, it calculated expenses for elastic IPs and outbound data transfers, supporting a proposed network estimate.

Amazon Q Business also demonstrated its ability to provide tailored advice by responding to a specific user scenario. As the video shows, a user took on the role of a FinOps professional and asked Amazon Q to recommend AWS tools for cost optimization. Amazon Q then offered personalized suggestions based on this FinOps persona perspective.

Prerequisites

To get started with your Amazon Q Business web experience, you need the following prerequisites:

Create an Amazon Q Business web experience

Complete the following steps to create your web experience:

The web experience can be used by a variety of business users or personas to yield accurate and repeatable recommendations for level 100, 200, and 300 inquiries. Amazon Q supports a variety of data sources and data connectors to personalize your user experience. You can also further enrich your dataset with knowledge bases within Amazon Q. With Amazon Q Business set up with your own datasets and sources, teams and business units within your enterprise can index from the same information on common topics such as cost optimization, modernization, and operational excellence while maintaining their own unique area of expertise, responsibility, and job function.

Clean up

After trying the Amazon Q Business web experience, remember to remove any resources you created to avoid unnecessary charges. Complete the following steps:

  1. Delete the web experience:
    • On the Amazon Q Business console, navigate to the Web experiences section within your application.
    • Select the web experience you want to remove.
    • On the Actions menu, choose Delete.
    • Confirm the deletion by following the prompts.
  2. If you granted specific users access to the web experience, revoke their permissions. This might involve updating AWS Identity and Access Management (IAM) policies or removing users from specific groups in IAM Identity Center.
  3. If you set up any custom configurations for the web experience, such as specific data source filters or custom prompts, make sure to remove these.
  4. If you integrated the web experience with other tools or services, remove those integrations.
  5. Check for and delete any Amazon CloudWatch alarms or logs specifically set up for monitoring this web experience.

After deletion, review your AWS billing to make sure that charges related to the web experience have stopped.

Deleting a web experience is irreversible. Make sure you have any necessary backups or exports of important data before proceeding with the deletion. Also, keep in mind that deleting a web experience doesn’t automatically delete the entire Amazon Q Business application or its associated data sources. If you want to remove everything, follow the Amazon Q Business application clean-up procedure for the entire application.

Conclusion

Amazon Q Business web experience is your gateway to a powerful generative AI assistant. Want to take it further? Integrate Amazon Q with Slack for an even more interactive experience.

Every organization has unique needs when it comes to AI. That’s where Amazon Q shines. It adapts to your business needs, user applications, and end-user personas. The best part? You don’t need to do the heavy lifting. No complex infrastructure setup. No need for teams of data scientists. Amazon Q connects to your data and makes sense of it with just a click. It’s AI power made simple, giving you the intelligence you need without the hassle.

To learn more about the power of a generative AI assistant in your workplace, see Amazon Q Business.


About the Authors

Rueben Jimenez is an AWS Sr Solutions Architect who designs and implements complex data analytics, machine learning, generative AI, and cloud infrastructure solutions.

Sona Rajamani is a Sr. Manager Solutions Architect at AWS.  She lives in the San Francisco Bay Area and helps customers architect and optimize applications on AWS. In her spare time, she enjoys traveling and hiking.

Erick Joaquin is a Sr Customer Solutions Manager for Strategic Accounts at AWS. As a member of the account team, he is focused on evolving his customers’ maturity in the cloud to achieve operational efficiency at scale.

Read More

NVIDIA and IQVIA Build Domain-Expert Agentic AI for Healthcare and Life Sciences

NVIDIA and IQVIA Build Domain-Expert Agentic AI for Healthcare and Life Sciences

IQVIA, the world’s leading provider of clinical research services, commercial insights and healthcare intelligence, is working with NVIDIA to build custom foundation models and agentic AI workflows that can accelerate research, clinical development and access to new treatments.

AI applications trained on the organization’s vast healthcare-specific information and guided by its deep domain expertise will help the industry boost the efficiency of clinical trials and optimize planning for the launch of therapies and medical devices — ultimately improving patient outcomes.

Operating in over 100 countries, IQVIA has built the largest global healthcare network and is uniquely connected to the ecosystem with the most comprehensive and granular set of information, analytics and technologies in the industry.

Announced today at the J.P. Morgan Conference in San Francisco, IQVIA’s collection of models, AI agents and reference workflows will be developed with the NVIDIA AI Foundry platform for building custom models, allowing IQVIA’s thousands of pharmaceutical, biotech and medical device customers to benefit from NVIDIA’s agentic AI capabilities and IQVIA’s technologies, life sciences information and expertise.

Enabling Industry Applications in Clinical Trials

The healthcare and life sciences industry generates more information than any other industry in the world, making up 30% of the world’s data volume.

IQVIA plans to use its unparalleled information assets, analytics and domain expertise — known as IQVIA Connected Intelligence — with the NVIDIA AI Foundry service to build language and multimodal foundational models that will power a collection of customized IQVIA AI agents.

These agents are anticipated to be available in predefined workflows, or blueprints, that would accomplish a specific task. This partnership aims to accelerate the innovation cycle of IQVIA Healthcare-grade AI. IQVIA has been leading in the responsible use of AI, ensuring that its AI-powered capabilities are grounded in privacy, regulatory compliance and patient safety. IQVIA Healthcare-grade AI represents the company’s commitment to these principles.

One key opportunity area is in clinical development, when clinical trials are conducted for new drugs. The overall process takes about 11 years, on average, and each trial has a multitude of workflows that could be supported by AI agents. For example, just starting a clinical trial involves site selection, participant recruitment, regulatory submissions and tight communication between study sites and their sponsors.

NVIDIA AI Foundry Streamlines Custom Model Development

To streamline the development of these AI agents, IQVIA is using tools within NVIDIA AI Foundry and the NVIDIA AI Enterprise software platform, including NVIDIA NIM microservices, especially the Llama Nemotron and Cosmos Nemotron model families; NVIDIA AI Blueprint reference workflows; the NVIDIA NeMo platform for developing custom generative AI; and dedicated capacity on NVIDIA DGX Cloud.

The NVIDIA AI Blueprint for multimodal PDF data extraction can help IQVIA unlock the immense amount of healthcare text, graphs, charts and tables stored in PDF files, bringing previously inaccessible information to train AI models and agents for domain-specific and even customer-specific applications. NVIDIA RAPIDS data science libraries then accelerate the construction of knowledge graphs.

Additional AI agents could automate complex, time-consuming tasks, like document generation and patient recruitment, allowing healthcare professionals to focus on strategic decision-making and human interaction.

Learn more about NVIDIA technologies and their impact on healthcare and life sciences.

Read More

NVIDIA Statement on the Biden Administration’s Misguided ‘AI Diffusion’ Rule

NVIDIA Statement on the Biden Administration’s Misguided ‘AI Diffusion’ Rule

For decades, leadership in computing and software ecosystems has been a cornerstone of American strength and influence worldwide. The federal government has wisely refrained from dictating the design, marketing and sale of mainstream computers and software — key drivers of innovation and economic growth.

The first Trump Administration laid the foundation for America’s current strength and success in AI, fostering an environment where U.S. industry could compete and win on merit without compromising national security. As a result, mainstream AI has become an integral part of every new application, driving economic growth, promoting U.S. interests and ensuring American leadership in cutting-edge technology.

Today, companies, startups and universities around the world are tapping mainstream AI to advance healthcare, agriculture, manufacturing, education and countless other fields, driving economic growth and unlocking the potential of nations. Built on American technology, the adoption of AI around the world fuels growth and opportunity for industries at home and abroad.

That global progress is now in jeopardy. The Biden Administration now seeks to restrict access to mainstream computing applications with its unprecedented and misguided “AI Diffusion” rule, which threatens to derail innovation and economic growth worldwide.

In its last days in office, the Biden Administration seeks to undermine America’s leadership with a 200+ page regulatory morass, drafted in secret and without proper legislative review. This sweeping overreach would impose bureaucratic control over how America’s leading semiconductors, computers, systems and even software are designed and marketed globally. And by attempting to rig market outcomes and stifle competition — the lifeblood of innovation — the Biden Administration’s new rule threatens to squander America’s hard-won technological advantage.

While cloaked in the guise of an “anti-China” measure, these rules would do nothing to enhance U.S. security.  The new rules would control technology worldwide, including technology that is already widely available in mainstream gaming PCs and consumer hardware. Rather than mitigate any threat, the new Biden rules would only weaken America’s global competitiveness, undermining the innovation that has kept the U.S. ahead.

Although the rule is not enforceable for 120 days, it is already undercutting U.S. interests. As the first Trump Administration demonstrated, America wins through innovation, competition and by sharing our technologies with the world — not by retreating behind a wall of government overreach. We look forward to a return to policies that strengthen American leadership, bolster our economy and preserve our competitive edge in AI and beyond.

Ned Finkle is vice president of government affairs at NVIDIA.

Read More

KG-TRICK: Unifying Textual and Relational Information Completion of Knowledge for Multilingual Knowledge Graphs

Multilingual knowledge graphs (KGs) provide high-quality relational and textual information for various NLP applications, but they are often incomplete, especially in non-English languages. Previous research has shown that combining information from KGs in different languages aids either Knowledge Graph Completion (KGC), the task of predicting missing relations between entities, or Knowledge Graph Enhancement (KGE), the task of predicting missing textual information for entities. Although previous efforts have considered KGC and KGE as independent tasks, we hypothesize that they are…Apple Machine Learning Research