How SonicJobs Uses AI Agents to Connect the Internet, Starting with Jobs

Companies in the US spend $15bn annually on talent acquisition. The most important metric in recruitment advertising is the conversion from the paid click on the job platform to the application the employer receives. Industry-wide, apply conversion is just 5%. Redirection of the candidate from the job platform to the company site is the biggest cause of abandonment; this step has a 70% bounce rate. In this episode of NVIDIA’s AI Podcast, host Noah Kravitz speaks with Mikhil Raja, Cofounder and CEO of SonicJobs, about how they have built AI Agents to enable candidates to complete applications directly on job platforms, without redirection, boosting completion rates to 26% from 5%. Raja delves deep into SonicJobs’ cutting-edge technology, which merges traditional AI with large language models (LLMs) to understand and interact with job application web flows. He also emphasizes the importance of fine-tuning foundational models to achieve more impactful and scalable innovations.

SonicJobs is a member of the NVIDIA Inception program for startups.

Time Stamps

1:19: Why applying for a job remains a Web 1.0 experience — and how SonicJobs’ AI Agents are changing this

6:06: Explanation of SonicJobs’ technology and the benefits to users and companies

9:03: The evolution of AI Agents from AutoGPT to Verticalized B2B solutions

11:33: How SonicJobs realized the approach it should take with Agentic AI

15:18: Scaling SonicJobs’ AI Agent and the adaptive learning flywheel

17:45: Raja discusses the need for accuracy including fine-tuning foundational models

20:45: Framework for how SonicJobs’ Verticalized AI Agent solution can be applied to further Verticals

23:23: Advice Raja would give to a company that’s currently trying to hire

You Might Also Like…

How Georgia Tech’s AI Makerspace Is Preparing the Future Workforce for AI – Ep. 229

AI is set to transform the workforce, and the Georgia Institute of Technology’s new AI Makerspace is helping tens of thousands of students get ahead of the curve. Arijit Raychowdhury, a professor and Steve W. Chaddick school chair of electrical engineering at Georgia Tech’s college of engineering, discusses the school’s supercomputer hub, which provides students with the computing resources to reinforce their coursework and gain hands-on experience with AI.

How Roblox Uses Generative AI to Enhance User Experiences – Ep. 227

Roblox is a colorful online platform that aims to reimagine how people come together — now that vision is being augmented by generative AI. Anupam Singh, vice president of AI and growth engineering at Roblox, speaks about how the company uses the technology to enhance virtual experiences with features such as automated chat filters and real-time text translation.

Performance AI: Insights from Arthur’s Adam Wenchel – Ep. 221

Adam Wenchel, cofounder and CEO of Arthur, explains how the company enhances the performance of AI systems across various metrics like accuracy, explainability, and fairness. He also shares insights into the challenges and opportunities of deploying generative AI.

Media.Monks’ Lewis Smithingham on Enhancing Media and Marketing With AI – Ep. 222

Meet Media.Monks’ Wormhole, an alien-like, conversational robot with a quirky personality and the ability to offer keen marketing expertise. Lewis Smithingham, senior vice president of innovation and special ops at Media.Monks, a global marketing and advertising company, discusses the creation of Wormhole and AI’s potential to enhance media and entertainment.

Subscribe to the AI Podcast

Get the AI Podcast through iTunes, Google Play, Amazon Music, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better: Have a few minutes to spare? Fill out this listener survey.

Challenges and Efforts in PyTorch Multi-Device Integration: Compatibility, Portability, and Integration Efficiencies

Introduction

As the demand for diverse hardware accelerators grows, the need for a robust and adaptable deep learning framework becomes increasingly critical. While working through this integration, several challenges have surfaced in the PyTorch ecosystem, potentially affecting various hardware vendors. This blog aims to highlight these issues and propose solutions to enhance PyTorch’s adaptability, portability, and resilience across different hardware platforms.

Improve Users’ Code Portability via Accelerator Autoloading

Currently, users face additional work when running their code on different accelerators. One such task is manually importing modules for out-of-tree devices. This requires users to not only understand the different usage patterns between accelerators but also make their code aware of these differences. If you have projects originally running on GPU/CPU and want to migrate to other accelerators, this can lead to significant work and potential frustration.

Examples of extra import:

# Case 1: Use HPU
import torch
import torchvision.models as models
import habana_frameworks.torch # <-- extra import
model = models.resnet50().eval().to("hpu")
input = torch.rand(128, 3, 224, 224).to("hpu")
output = model(input)

# Case 2: Use torch_npu
import torch
import torch_npu # <-- extra import
print(torch.ones(1, 2, device='npu'))

As a high-level machine learning framework, PyTorch’s ability to shield users from device differences is a competitive feature. Accelerator Autoloading allows users to continue using the familiar PyTorch device programming model without explicitly loading or importing device-specific extensions.

How does it work?

Accelerator Autoloading uses Python’s plugin architecture to load device extensions automatically via entry points in the PyTorch package.

Python entry points provide a standardized way for Python packages to expose and discover components or plugins within an application. Through a definition in the accelerator package’s setup.py, PyTorch can automatically initialize accelerator modules when import torch is called, which gives users a consistent experience across different backend devices.

From the device’s perspective, the accelerator package only needs to declare the following in its setup.py (using torch_npu as an example):

# setup.py (excerpt)
entry_points={
    'torch.backends': ['torch_npu = torch_npu:_autoload'],
}

When import torch is invoked, the accelerator module will be loaded automatically. This provides users with a consistent programming experience across out-of-tree devices, eliminating the need to be aware of differences between CUDA, HPU, and NPU.

# Case 1: Use HPU 
import torch 
import torchvision.models as models 
model = models.resnet50().eval().to("hpu") 
input = torch.rand(128, 3, 224, 224).to("hpu") 
output = model(input) 

# Case 2: Use torch_npu 
import torch 
print(torch.ones(1, 2, device='npu'))
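The discovery step behind autoloading can be sketched with the standard library's importlib.metadata. This is a minimal illustration of the entry-point pattern, not PyTorch's actual loader: the function names discover_backends and autoload are hypothetical, and the stand-in entry point targets os:getcwd only so that the example runs without any accelerator package installed.

```python
from importlib.metadata import EntryPoint, entry_points


def discover_backends(group="torch.backends"):
    """Return the entry points that installed packages registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):           # Python 3.10+ selectable API
        return list(eps.select(group=group))
    return list(eps.get(group, []))      # Python 3.8/3.9 dict-style API


def autoload(eps):
    """Load each entry point, call the hook it names, and return the loaded names."""
    loaded = []
    for ep in eps:
        hook = ep.load()   # imports the module and resolves the named attribute
        hook()             # for torch_npu this would be torch_npu._autoload
        loaded.append(ep.name)
    return loaded


# Stand-in entry point (hypothetical); a real one comes from a package's setup.py.
demo = EntryPoint("demo_backend", "os:getcwd", "torch.backends")
print(autoload([demo]))
```

In a real environment, autoload(discover_backends()) would run the hook of every installed package that registered itself under the torch.backends group.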

Device Integration Optimization

What is PrivateUse1?

In PyTorch, the dispatcher is a crucial component of the framework’s backend that manages how operations are routed to the appropriate device-specific implementation. Dispatch keys are an integral part of this system, serving as identifiers that represent various execution contexts—such as the device (CPU, CUDA, XPU), layout (dense, sparse), and autograd functionality. These keys ensure that operations are directed to the correct implementation.

PrivateUse1 is a customizable device dispatch key (similar to CUDA, CPU, XPU, etc.) reserved for out-of-tree devices. It provides developers with a way to extend PyTorch’s functionality without modifying the core framework, allowing for the integration of new devices, hardware accelerators, or other specialized computing environments.

Why do we need PrivateUse1?

Internally, dispatch keys are represented as bit masks in which each bit indicates whether a certain key is active. This bit mask representation is efficient for quick lookup and combination of keys, but it inherently limits the number of distinct keys (typically to 64 or fewer).
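To make the limit concrete, here is a small sketch of a key set stored as a 64-bit mask. The Key enum, its numbering, and the helper names are assumptions made for illustration; they do not mirror PyTorch's internal layout.

```python
from enum import IntEnum

MAX_KEYS = 64  # one bit per key in a 64-bit mask


class Key(IntEnum):
    """Hypothetical key numbering, for illustration only."""
    CPU = 0
    CUDA = 1
    XPU = 2
    PRIVATEUSE1 = 3


def add_key(mask, key):
    """Return `mask` with the bit for `key` set."""
    if not 0 <= int(key) < MAX_KEYS:
        raise ValueError(f"key index {int(key)} does not fit in {MAX_KEYS} bits")
    return mask | (1 << int(key))


def has_key(mask, key):
    """Check whether the bit for `key` is set."""
    return bool((mask >> int(key)) & 1)


def union(a, b):
    """Combine two key sets with a single bitwise OR."""
    return a | b


mask = union(add_key(0, Key.CPU), add_key(0, Key.PRIVATEUSE1))
print(bin(mask), has_key(mask, Key.PRIVATEUSE1), has_key(mask, Key.CUDA))
```

Lookups and unions are single bitwise operations, which is why the representation is fast. A 65th key has no bit to occupy, which is the constraint that makes a shared, reusable key such as PrivateUse1 attractive.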

The current implementation of BackendComponent dispatch keys in PyTorch has encountered a critical bottleneck, which restricts the addition of new backends and, as a result, limits the expansion of the PyTorch ecosystem.

[Figure: bit diagram]

In response to this challenge, a series of optimizations have been applied to the PrivateUse1 mechanism to enhance its capacity.

  • PrivateUse1 integration mechanism

    Initially reserved as fallback options, PrivateUse1, along with PrivateUse2 and PrivateUse3, were designed to be activated only when existing key resources became scarce.

    PrivateUse1 is now being developed to match the robustness and versatility of established keys like CUDA and CPU. Achieving this required a deep integration across critical PyTorch modules. This integration wasn’t just a simple switch—it involved significant updates to core components such as AMP (Automatic Mixed Precision), Autograd, Distributed Training, Checkpointing, DataLoader, Optimization, and Quantization, etc.

[Figure: flow diagram]

The activation of PrivateUse1 was a massive collaborative effort, culminating in over 100 pull requests aimed at turning it from a placeholder into a fully operational dispatch key.

  • PrivateUse1 UT/CI Quality Assurance

    While unit tests are essential for ensuring quality during the development of the PrivateUse1 mechanism, they are not sufficient on their own to prevent new pull requests from inadvertently affecting existing functionality or compatibility of out-of-tree devices.

    To mitigate this risk, the community has added the pytorch_openreg module to the test suite. This module leverages a CPU backend to simulate interactions with accelerators, creating a controlled environment for rigorous testing. Once implemented, this will enable automatic execution of device-generic test cases whenever relevant code is updated, allowing us to quickly detect and address any potential issues affecting the PrivateUse1 integration mechanism.

  • Comprehensive Documentation

    By providing comprehensive and easy-to-understand documentation, we aim to lower the barrier to entry for developers and encourage wider adoption of the PrivateUse1 mechanism in the PyTorch ecosystem. This documentation includes:

    • Step-by-step guides for integrating new backends using PrivateUse1
    • Clear explanations of PrivateUse1’s functionality and benefits
    • Code examples and best practices for efficient implementation

These enhancements aim to improve the robustness and reliability of the PrivateUse1 mechanism, facilitating better integration of new backends and expanding the capabilities of PyTorch.

Compatibility Between Upstream and Downstream

Device-Generic Unit Tests

Most unit tests in PyTorch focus on CPU and CUDA devices, which limits participation from users with other hardware. To address this, there is a plan to modify PyTorch’s unit testing framework to better support non-CUDA devices. This plan includes removing existing device restrictions, implementing dynamic data type loading, and generalizing decorators to accommodate a broader range of devices. Additionally, we aim to enforce the use of universal device code and expand distributed testing to support non-NCCL backends.

Through these improvements, we hope to significantly increase test coverage and pass rates for non-CUDA devices, integrating them into PyTorch’s continuous integration process. Initial changes have already been implemented, paving the way for new hardware support and creating a reference template for other devices.
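The idea of generalizing decorators so that one test body runs on every device can be sketched with the standard unittest module. This is an illustration of the pattern only, not PyTorch's test framework: the instantiate_for_devices decorator, the generic_ naming convention, and the device list are all hypothetical.

```python
import unittest


def instantiate_for_devices(devices):
    """Class decorator: turn each `generic_*` method into one test per device."""
    def decorate(cls):
        for attr in [a for a in dir(cls) if a.startswith("generic_")]:
            body = getattr(cls, attr)
            for dev in devices:
                # Bind body and dev as defaults so each test keeps its own values.
                def test(self, _body=body, _dev=dev):
                    _body(self, _dev)
                setattr(cls, f"test_{attr[len('generic_'):]}_{dev}", test)
        return cls
    return decorate


@instantiate_for_devices(["cpu", "npu"])  # hypothetical device list
class DeviceStringTests(unittest.TestCase):
    def generic_device_string(self, device):
        # Device-agnostic body: no hard-coded "cuda" anywhere.
        self.assertEqual(f"{device}:0".split(":")[0], device)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(DeviceStringTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun, result.wasSuccessful())
```

Adding a new backend then means extending the device list rather than copying and editing each test.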

Ensuring Robust Device Integration through Automated Testing

To uphold the high standards of quality assurance in PyTorch, an independent build repository and daily continuous integration (CI) workflows have been established, focusing on smoke and integration testing.

The pytorch-integration-tests repository automates the testing of PyTorch’s device-specific functionalities, ensuring that they operate correctly and efficiently across a variety of hardware platforms (NPUs and other specialized devices). In this repository, we are trying to build a fully automated system that continuously validates PyTorch’s compatibility with different hardware backends.

  • Automated Integration Tests: Run automated tests across different devices using GitHub Actions. This automation ensures that every change in the codebase is thoroughly tested against multiple hardware platforms, catching potential issues early in the development process.
  • Reusable Workflows: Workflows in this repository are modular and reusable, which streamlines the testing process. Developers can easily adapt these workflows to new devices or testing scenarios, making the system both flexible and scalable as PyTorch evolves.
  • Awareness of Out-of-Tree Devices: The repository displays the existence and behavior of all out-of-tree devices, keeping the community informed. This approach minimizes the risk of accidentally breaking downstream functionalities and provides fast feedback on changes.

Efforts to enhance multi-device integration are pivotal for PyTorch’s adaptability in the evolving deep learning landscape. These initiatives not only benefit current users but also lower entry barriers for new hardware vendors and developers, fostering innovation in AI and machine learning. As PyTorch continues to evolve, its commitment to flexibility, robustness, and inclusivity positions it as a leading framework capable of meeting the diverse needs of the deep learning community.

Build RAG-based generative AI applications in AWS using Amazon FSx for NetApp ONTAP with Amazon Bedrock

The post is co-written with Michael Shaul and Sasha Korman from NetApp.

Generative artificial intelligence (AI) applications are commonly built using a technique called Retrieval Augmented Generation (RAG) that provides foundation models (FMs) access to additional data they didn’t have during training. This data is used to enrich the generative AI prompt to deliver more context-specific and accurate responses without continuously retraining the FM, while also improving transparency and minimizing hallucinations.

In this post, we demonstrate a solution using Amazon FSx for NetApp ONTAP with Amazon Bedrock to provide a RAG experience for your generative AI applications on AWS by bringing company-specific, unstructured user file data to Amazon Bedrock in a straightforward, fast, and secure way.

Our solution uses an FSx for ONTAP file system as the source of unstructured data and continuously populates an Amazon OpenSearch Serverless vector database with the user’s existing files and folders and associated metadata. This enables a RAG scenario with Amazon Bedrock by enriching the generative AI prompt using Amazon Bedrock APIs with your company-specific data retrieved from the OpenSearch Serverless vector database.

When developing generative AI applications such as a Q&A chatbot using RAG, customers are also concerned about keeping their data secure and preventing end-users from querying information from unauthorized data sources. Our solution also uses FSx for ONTAP to allow users to extend their current data security and access mechanisms to augment model responses from Amazon Bedrock. We use FSx for ONTAP as the source of associated metadata, specifically the user’s security access control list (ACL) configurations attached to their files and folders and populate that metadata into OpenSearch Serverless. By combining access control operations with file events that notify the RAG application of new and changed data on the file system, our solution demonstrates how FSx for ONTAP enables Amazon Bedrock to only use embeddings from authorized files for the specific users that connect to our generative AI application.

AWS serverless services make it straightforward to focus on building generative AI applications by providing automatic scaling, built-in high availability, and a pay-for-use billing model. Event-driven compute with AWS Lambda is a good fit for compute-intensive, on-demand tasks such as document embedding and flexible large language model (LLM) orchestration, and Amazon API Gateway provides an API interface that allows for pluggable frontends and event-driven invocation of the LLMs. Our solution also demonstrates how to build a scalable, automated, API-driven serverless application layer on top of Amazon Bedrock and FSx for ONTAP using API Gateway and Lambda.

Solution overview

The solution provisions an FSx for ONTAP Multi-AZ file system with a storage virtual machine (SVM) joined to an AWS Managed Microsoft AD domain. An OpenSearch Serverless vector search collection provides a scalable and high-performance similarity search capability. We use an Amazon Elastic Compute Cloud (Amazon EC2) Windows server as an SMB/CIFS client to the FSx for ONTAP volume and configure data sharing and ACLs for the SMB shares in the volume. We use this data and ACLs to test permissions-based access to the embeddings in a RAG scenario with Amazon Bedrock.

The embeddings container component of our solution is deployed on an EC2 Linux server and mounted as an NFS client on the FSx for ONTAP volume. It periodically migrates existing files and folders along with their security ACL configurations to OpenSearch Serverless. It populates an index in the OpenSearch Serverless vector search collection with company-specific data (and associated metadata and ACLs) from the NFS share on the FSx for ONTAP file system.

The solution implements a RAG Retrieval Lambda function that allows RAG with Amazon Bedrock by enriching the generative AI prompt using Amazon Bedrock APIs with your company-specific data and associated metadata (including ACLs) retrieved from the OpenSearch Serverless index that was populated by the embeddings container component. The RAG Retrieval Lambda function stores conversation history for the user interaction in an Amazon DynamoDB table.

End-users interact with the solution by submitting a natural language prompt either through a chatbot application or directly through the API Gateway interface. The chatbot application container is built using Streamlit and fronted by an AWS Application Load Balancer (ALB). When a user submits a natural language prompt to the chatbot UI using the ALB, the chatbot container interacts with the API Gateway interface that then invokes the RAG Retrieval Lambda function to fetch the response for the user. The user can also directly submit prompt requests to API Gateway and obtain a response. We demonstrate permissions-based access to the RAG documents by explicitly retrieving the SID of a user and then using that SID in the chatbot or API Gateway request, where the RAG Retrieval Lambda function then matches the SID to the Windows ACLs configured for the document. As an additional authentication step in a production environment, you may want to also authenticate the user against an identity provider and then match the user against the permissions configured for the documents.
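The permissions check described above can be sketched as a filter applied before similarity ranking. This in-memory version is an assumption about the logic, not the solution's actual Lambda code (which queries OpenSearch Serverless): the retrieve function, the record layout, and the example admin SID are hypothetical, while S-1-1-0 is the well-known Windows SID for Everyone and NA is the value the solution uses to indicate Everyone access.

```python
import math

EVERYONE_SID = "S-1-1-0"  # well-known Windows SID for the Everyone group


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(index, query_vec, user_sid="NA", k=2):
    """Return the top-k chunks the caller may read, ranked by cosine similarity."""
    allowed = {EVERYONE_SID} if user_sid == "NA" else {EVERYONE_SID, user_sid}
    visible = [d for d in index if allowed & set(d["acl"])]
    visible.sort(key=lambda d: cosine(d["vector"], query_vec), reverse=True)
    return [d["text"] for d in visible[:k]]


ADMIN_SID = "S-1-5-21-1111111111-2222222222-3333333333-1112"  # made-up example
index = [
    {"text": "FSx for ONTAP guide chunk", "vector": [1.0, 0.0], "acl": [EVERYONE_SID]},
    {"text": "Bedrock guide chunk", "vector": [0.0, 1.0], "acl": [ADMIN_SID]},
]
print(retrieve(index, [0.0, 1.0]))             # Everyone access
print(retrieve(index, [0.0, 1.0], ADMIN_SID))  # admin access
```

Filtering before ranking means an unauthorized chunk never reaches the prompt, regardless of how similar it is to the query.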

We start by configuring data sharing and ACLs with FSx for ONTAP, and these are then periodically scanned by the embeddings container. The embeddings container splits the documents into chunks and uses the Amazon Titan Embeddings model to create vector embeddings from these chunks. It then stores these vector embeddings with associated metadata in our vector database by populating an index in a vector collection in OpenSearch Serverless. The following diagram illustrates the end-to-end flow.

[Figure: end-to-end embedding flow for the FSx for ONTAP and Amazon Bedrock integration]

The following architecture diagram illustrates the various components of our solution.

[Figure: overall architecture diagram describing all the components of the solution]
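The chunk-and-embed step performed by the embeddings container can be sketched as follows. This is a simplified illustration under stated assumptions: the chunk size, the overlap, the record fields, and the toy embed callable are all hypothetical, and the real solution calls the Amazon Titan Embeddings model instead of the stand-in used here.

```python
def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def build_records(path, text, acl, embed, chunk_size=200, overlap=50):
    """Pair every chunk with its vector and the file's ACL metadata."""
    chunks = split_into_chunks(text, chunk_size, overlap)
    return [
        {"path": path, "chunk_id": i, "text": c, "acl": list(acl), "vector": embed(c)}
        for i, c in enumerate(chunks)
    ]


def toy_embed(chunk):
    """Stand-in for an embeddings model call; returns a one-dimensional vector."""
    return [float(len(chunk))]


records = build_records("guide.txt", "abcdefghij", ["S-1-1-0"], toy_embed,
                        chunk_size=4, overlap=1)
print([r["text"] for r in records])
```

Carrying the ACL on every chunk record is what lets the retrieval side filter by SID without going back to the file system.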

Prerequisites

Complete the following prerequisite steps:

  1. Make sure you have model access in Amazon Bedrock. In this solution, we use Anthropic Claude v3 Sonnet on Amazon Bedrock.
  2. Install the AWS Command Line Interface (AWS CLI).
  3. Install Docker.
  4. Install Terraform.

Deploy the solution

The solution is available for download on this GitHub repo. Cloning the repository and using the Terraform template will provision all the components with their required configurations.

  1. Clone the repository for this solution:
    sudo yum install -y unzip
    git clone https://github.com/aws-samples/genai-bedrock-fsxontap.git
    cd genai-bedrock-fsxontap/terraform

  2. From the terraform folder, deploy the entire solution using Terraform:
    terraform init
    terraform apply -auto-approve

This process can take 15–20 minutes to complete. When finished, the output of the terraform commands should look like the following:

api-invoke-url = "https://9ng1jjn8qi.execute-api.<region>.amazonaws.com/prod"
fsx-management-ip = toset([
  "198.19.255.230",
])
fsx-secret-id = "arn:aws:secretsmanager:<region>:<account-id>:secret:AmazonBedrock-FSx-NetAPP-ONTAP-a2fZEdIt-0fBcS9"
fsx-svm-smb-dns-name = "BRSVM.BEDROCK-01.COM"
lb-dns-name = "chat-load-balancer-2040177936.<region>.elb.amazonaws.com"
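Later steps refer back to these outputs by name, so it can be convenient to read them programmatically. The sketch below parses the JSON that `terraform output -json` prints, where each output is an object with a value field. The helper name parse_terraform_outputs and the sample values are hypothetical placeholders, not values from a real deployment.

```python
import json


def parse_terraform_outputs(raw_json):
    """Map each output name to its value from `terraform output -json` text."""
    return {name: entry["value"] for name, entry in json.loads(raw_json).items()}


# Placeholder sample shaped like `terraform output -json`; values are made up.
sample = json.dumps({
    "api-invoke-url": {"sensitive": False, "type": "string",
                       "value": "https://example.invalid/prod"},
    "lb-dns-name": {"sensitive": False, "type": "string",
                    "value": "chat-load-balancer.example.invalid"},
})

outputs = parse_terraform_outputs(sample)
print(outputs["api-invoke-url"])
print(outputs["lb-dns-name"])
```

To use real values, capture the command's standard output from the terraform folder and pass that text to the helper in place of the sample.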

Load data and set permissions

To test the solution, we will use the EC2 Windows server (ad_host) mounted as an SMB/CIFS client to the FSx for ONTAP volume to share sample data and set user permissions that will then be used to populate the OpenSearch Serverless index by the solution’s embedding container component. Perform the following steps to mount your FSx for ONTAP SVM data volume as a network drive, upload data to this shared network drive, and set permissions based on Windows ACLs:

  1. Obtain the ad_host instance DNS from the output of your Terraform template.
  2. Navigate to AWS Systems Manager Fleet Manager on your AWS console, locate the ad_host instance, and follow the instructions to log in with Remote Desktop. Use the domain admin user bedrock-01\Admin and obtain the password from AWS Secrets Manager. You can find the password using the Secrets Manager fsx-secret-id secret ID from the output of your Terraform template.
  3. To mount an FSx for ONTAP data volume as a network drive, under This PC, choose (right-click) Network and then choose Map Network drive.
  4. Choose the drive letter and use the FSx for ONTAP share path for the mount
    (\\<svm>.<domain>\c$\<volume-name>):
    [Figure: map network drive]
  5. Upload the Amazon Bedrock User Guide to the shared network drive and set permissions to the admin user only (make sure that you disable inheritance under Advanced):
    [Figure: upload the Amazon Bedrock User Guide]
  6. Upload the Amazon FSx for ONTAP User Guide to the shared drive and make sure permissions are set to Everyone:
    [Figure: upload the Amazon FSx for ONTAP User Guide]
  7. On the ad_host server, open the command prompt and enter the following command to obtain the SID for the admin user:
    wmic useraccount where name='Admin' get sid

Test permissions using the chatbot

To test permissions using the chatbot, obtain the lb-dns-name URL from the output of your Terraform template and access it through your web browser:

[Figure: test with chatbot and enter prompt]

For the prompt query, ask any general question on the FSx for ONTAP user guide that is available for access to everyone. In our scenario, we asked “How can I create an FSx for ONTAP file system,” and the model replied with detailed steps and source attribution in the chat window to create an FSx for ONTAP file system using the AWS Management Console, AWS CLI, or FSx API:

[Figure: test with chatbot and enter prompt related to the bedrock guide]

Now, let’s ask a question about the Amazon Bedrock user guide that is available for admin access only. In our scenario, we asked “How do I use foundation models with Amazon Bedrock,” and the model replied that it doesn’t have enough information to provide a detailed answer to the question.

Use the admin SID on the user (SID) filter search in the chat UI and ask the same question in the prompt. This time, the model should reply with steps detailing how to use FMs with Amazon Bedrock and provide the source attribution used by the model for the response:

Test permissions using API Gateway

You can also query the model directly using API Gateway. Obtain the api-invoke-url parameter from the output of your Terraform template. First, invoke API Gateway with Everyone access for a query related to the FSx for ONTAP user guide by setting the value of the metadata parameter to NA to indicate Everyone access:

curl -v '<api-invoke-url>/bedrock_rag_retreival' -X POST -H 'content-type: application/json' -d '{"session_id": "1","prompt": "What is an FSxN ONTAP filesystem?", "bedrock_model_id": "anthropic.claude-3-sonnet-20240229-v1:0", "model_kwargs": {"temperature": 1.0, "top_p": 1.0, "top_k": 500}, "metadata": "NA", "memory_window": 10}'

Then invoke API Gateway with admin access for a query related to the Amazon Bedrock user guide by setting the value of the metadata parameter to the admin SID you obtained earlier (the SID shown here is an example):

curl -v '<api-invoke-url>/bedrock_rag_retreival' -X POST -H 'content-type: application/json' -d '{"session_id": "1","prompt": "what is bedrock?", "bedrock_model_id": "anthropic.claude-3-sonnet-20240229-v1:0", "model_kwargs": {"temperature": 1.0, "top_p": 1.0, "top_k": 500}, "metadata": "S-1-5-21-4037439088-1296877785-2872080499-1112", "memory_window": 10}'
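The same request can be assembled from Python with the standard library. This is a minimal sketch that mirrors the fields of the curl commands; the helper name build_rag_request and the example.invalid URL are placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_rag_request(api_invoke_url, prompt, sid="NA", session_id="1"):
    """Build the POST request; `sid` is "NA" for Everyone access or a user SID."""
    payload = {
        "session_id": session_id,
        "prompt": prompt,
        "bedrock_model_id": "anthropic.claude-3-sonnet-20240229-v1:0",
        "model_kwargs": {"temperature": 1.0, "top_p": 1.0, "top_k": 500},
        "metadata": sid,
        "memory_window": 10,
    }
    return urllib.request.Request(
        url=f"{api_invoke_url.rstrip('/')}/bedrock_rag_retreival",
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


req = build_rag_request("https://example.invalid/prod",
                        "What is an FSxN ONTAP filesystem?")
print(req.full_url, req.get_method())
# To send it, pass req to urllib.request.urlopen with your real api-invoke-url.
```

Switching between Everyone and admin access is then a matter of passing a different sid argument.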

Cleanup

To avoid recurring charges, clean up your account after trying the solution. From the terraform folder, delete the Terraform template for the solution:

terraform apply --destroy

Conclusion

In this post, we demonstrated a solution that uses FSx for ONTAP with Amazon Bedrock and uses FSx for ONTAP support for file ownership and ACLs to provide permissions-based access in a RAG scenario for generative AI applications. Our solution enables you to build generative AI applications with Amazon Bedrock where you can enrich the generative AI prompt in Amazon Bedrock with your company-specific, unstructured user file data from an FSx for ONTAP file system. This solution enables you to deliver more relevant, context-specific, and accurate responses while also making sure only authorized users have access to that data. Finally, the solution demonstrates the use of AWS serverless services with FSx for ONTAP and Amazon Bedrock that enable automatic scaling, event-driven compute, and API interfaces for your generative AI applications on AWS.

For more information about how to get started building with Amazon Bedrock and FSx for ONTAP, refer to the following resources:


About the authors

Kanishk Mahajan is Principal, Solutions Architecture at AWS. He leads cloud transformation and solution architecture for ISV customers and partners at AWS. Kanishk specializes in containers, cloud operations, migrations and modernizations, AI/ML, resilience, and security and compliance. He is a Technical Field Community (TFC) member in each of those domains at AWS.

Michael Shaul is a Principal Architect at NetApp’s office of the CTO. He has over 20 years of experience building data management systems, applications, and infrastructure solutions. He has a unique in-depth perspective on cloud technologies, builder, and AI solutions.

Sasha Korman is a tech visionary leader of dynamic development and QA teams across Israel and India. With 14 years at NetApp that began as a programmer, his hands-on experience and leadership have been pivotal in steering complex projects to success, with a focus on innovation, scalability, and reliability.

Support for AWS DeepComposer ending soon

AWS DeepComposer was first introduced during AWS re:Invent 2019 as a fun way for developers to compose music by using generative AI. AWS DeepComposer was the world’s first machine learning (ML)-enabled keyboard for developers to get hands-on—literally—with a musical keyboard and the latest ML techniques to compose their own music.

After careful consideration, we have made the decision to end support for AWS DeepComposer, effective September 17, 2025. With your help and feedback, our portfolio of products and services has grown to include new tools for developers to get hands-on with AI and ML. Amazon PartyRock, for example, is a generative AI playground for intuitive, code-free help in building web applications.

If you have data stored on the AWS DeepComposer console, you will be able to use AWS DeepComposer as normal until September 17, 2025, when support for the service will end. After this date, you will no longer be able to use AWS DeepComposer through the AWS Management Console, manage AWS DeepComposer devices, or access any compositions or models you have created. Until then, you can continue to work on your compositions or models and export those you would like to keep by using the step-by-step guide in the AWS DeepComposer FAQs.

If you have additional questions, please read our FAQs or contact us.


About the author

Kanchan Jagannathan is a Sr. Program Manager in the AWS AI Devices team, where he helps launch AWS devices into sales channels and also oversees the Service Availability Change process for the team. He was a Program Manager for FC automation deployment and launches before joining AWS. Outside of work, he has bravely begun camping with his 5-year-old and 1-year-old kids and enjoys the moments he gets to be with them.

Preserve access and explore alternatives for Amazon Lookout for Equipment

Amazon Lookout for Equipment, the AWS machine learning (ML) service designed for industrial equipment predictive maintenance, will no longer be open to new customers effective October 17, 2024. Existing customers will be able to use the service (both using the AWS Management Console and API) as normal and AWS will continue to invest in security, availability, and performance improvements for Lookout for Equipment, but we do not plan to introduce new features for this service.

This post discusses how you can maintain access to Lookout for Equipment after it is closed to new customers and some alternatives to Lookout for Equipment.

Maintaining access to Lookout for Equipment

You’re considered an existing customer if you use the service, either through cloud training or cloud inferencing, any time in the 30 days prior to October 17, 2024 (September 17, 2024, through October 16, 2024). To maintain access to the service after October 17, 2024, you should complete one of the following tasks from the account for which you intend to maintain access:

  • On the Lookout for Equipment console, start a new project and successfully complete a model training
  • On the Lookout for Equipment console, open an existing project, schedule an inference for a given model, and run at least one inference
  • Use Lookout for Equipment API calls CreateInferenceScheduler and StartInferenceScheduler (and StopInferenceScheduler when done)

For any questions or support needed, contact your assigned AWS Account Manager or Solutions Architect, or create a case from the AWS console.

Alternatives to Lookout for Equipment

If you’re interested in an alternative to Lookout for Equipment, AWS has options for both buyers and builders.

For an out-of-the-box solution, the AWS Partner Network offers solutions from multiple partners. You can browse solutions on the Asset Maintenance and Reliability page in the AWS Solutions Library. This approach provides a solution that addresses your use case without requiring you to have expertise in predictive maintenance, and typically provides the fastest time to value by using the specialized expertise of the AWS Partners.

If you prefer to build your own solution, AWS offers AI/ML tools and services to help you develop an AI-based predictive maintenance solution. Amazon SageMaker provides a set of tools to enable you to build, train, infer, and deploy ML models for your use case with fully managed infrastructure, tools, and workflows.

Summary

Although new customers will no longer have access to Lookout for Equipment after October 17, 2024, AWS offers a powerful set of AI/ML services and solutions in the form of SageMaker tools to build custom models, and also offers a range of solutions from partners through the AWS Partner Network. You should explore these options to determine what works best for your specific needs.


About the author

Stuart Gillen is a Sr. Product Manager for Lookout for Equipment at AWS, where he applies his industrial and AI background to applications in predictive maintenance and condition monitoring. Stuart has held a variety of roles in engineering management, business development, product management, and consulting. Most of his career has focused on industrial applications, specifically reliability practices, maintenance systems, and manufacturing.

Eureka: Evaluating and understanding progress in AI

A summary of insights extracted by using the Eureka framework, shown via two radar charts for multimodal (left) and language (right) capabilities respectively. The radar charts show the best and worst performance observed for each capability.

In the fast-paced progress of AI, the question of how to evaluate and understand capabilities of state-of-the-art models is timelier than ever. New and capable models are being released frequently, and each release promises the next big leap in frontiers of intelligence. Yet, as researchers and developers, often we ask ourselves: Are these models all comparable, if not the same, in terms of capabilities? There are, of course, strong reasons to believe they are, given that many score similarly in standard benchmarks. In addition, rankings in the numerous leaderboards do not offer a consistent and detailed explanation of why a model is ranked slightly better than others. However, if some models are fundamentally different, what are their strengths and weaknesses? More importantly, are there capabilities that are essential for making AI useful in the real world but still universally challenging for most models? Answering such questions helps us understand where we are on the frontier of AI, and what capability improvements are needed to meet the expectations that humanity and science have for safe and responsible deployments of AI models. 

The prevalence of these models is dependent on our ability to mature the science of in-depth AI evaluation and measurement. In our latest open-source release and technical report EUREKA: Evaluating and Understanding Large Foundation Models, we start answering these questions by running an in-depth measurement analysis across 12 state-of-the-art proprietary and open-weights models. Behind this analysis stands Eureka, an open-source framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. The framework currently supports both language and multimodal (text and image) data and enables developers to define custom pipelines for data processing, inference, and evaluation, with the possibility to inherit from existing pipelines and minimize development work. Eureka and all our evaluation pipelines are available as open source to foster transparent and reproducible evaluation practices. We hope to collaborate with the open-source community to share and expand current measurements for new capabilities and models. 

Focus on challenging and non-saturated capabilities

Eureka tests models across a rich collection of fundamental language and multimodal capabilities that are challenging for even the most advanced models, but are often overlooked by standard benchmarks commonly reported in model releases. In practice, this also means that our analysis intentionally does not pivot on oversaturated benchmarks. As unconventional as this may sound, it is motivated by two reasons. First, measurement on saturated benchmarks, for which most models perform over 95%, leaves very little space for failure analysis and model comparison. Second, even though saturation may be rooted in genuine model improvements, concerns about memorization and overfitting to labeling errors lower the credibility of measurements, especially in the very high accuracy regime. 

Beyond single-score measurements and universal rankings

Even though rankings and leaderboards remain the quickest way to compare models, they rarely uncover important conditions of failure. Due to overreliance on single-score aggregations of performance, the more nuanced comparative findings are hidden behind small differences between model scores aggregated across many capabilities and experimental conditions.

As we show in our study, the chase after these rankings has created surprising dynamics that do not necessarily lead to identical models, but to models that use different complementary skills to achieve comparable overall scores in important leaderboards. Imagine you are a triathlon athlete aiming to achieve an elite performance, which historically takes around two hours. Despite your ambition to hit this top-tier mark, you face constraints with limited time and resources for training and preparation. In practice, athletes often focus their best resources on excelling in certain disciplines while aiming for a satisfactory performance in others. They prioritize based on what they believe is most achievable given their time and experience.

We observe similar phenomena in the set of 12 models we study. Even if two models may score very closely for the same capability, disaggregating that performance across disciplines and input conditions shows that each model has its own complementary strengths. Identifying, measuring, and understanding these strengths for a single model is needed for planning targeted improvements. Repeating this process for a large set of models, as we do in Eureka, is needed for identifying the hypothetical frontier, guiding research and development, and creating a model that combines and delivers capabilities that build on the strengths observed in existing models. 
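The disaggregation idea above can be illustrated with a minimal sketch. The model names, conditions, and scores below are hypothetical and are not taken from the Eureka report; the point is only that two models can share an aggregate score while leading on different conditions.

```python
from statistics import mean

# Hypothetical per-condition accuracies (in percent) for two hypothetical models.
scores = {
    "model_a": {"short_context": 90, "long_context": 60, "with_image": 75},
    "model_b": {"short_context": 70, "long_context": 80, "with_image": 75},
}


def aggregate(per_condition):
    """Single-score view: unweighted mean across conditions."""
    return mean(per_condition.values())


def leaders_by_condition(scores, margin=5):
    """For each condition, name the model that leads by more than `margin` points."""
    leaders = {}
    conditions = next(iter(scores.values())).keys()
    for cond in conditions:
        ranked = sorted(scores, key=lambda m: scores[m][cond], reverse=True)
        best, runner_up = ranked[0], ranked[1]
        if scores[best][cond] - scores[runner_up][cond] > margin:
            leaders[cond] = best
    return leaders


for model, per_condition in scores.items():
    print(model, aggregate(per_condition))
print(leaders_by_condition(scores))
```

Both hypothetical models average 75, yet one leads on short context and the other on long context, which a single-score ranking would not show.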

Measuring consistency: non-determinism and backward compatibility

When people work with collaborators or when they choose tools to assist them in everyday tasks, predictability and consistency are key to a successful collaboration. Similarly, humans and application developers expect their AI assistants and models to be consistent over time for similar inputs and interactions. In our analysis, we study this under-explored angle of model performance, by focusing on two key aspects: the determinism of answer outcomes for identical examples and prompts, and the backward compatibility of model answers at the example level after a model has been updated with a new version. Lack of consistency in either of these domains would lead to breaking trust with users and application developers. 

The analysis shows surprising results and opens new considerations for improvement. For example, we observe that very few large foundation models are fully deterministic. For most of them, there are visible variations in the output, and most importantly in accuracy, when they are asked the same question several times with generation temperature set to zero, a control that tells models to minimize randomness in generations. In addition, when comparing new model releases with earlier models from the same family, a significant amount of regression at the example level can be observed after the update, even though the overall accuracy may increase. In practice, this type of inconsistency can be frustrating for application developers who rely on prewritten examples and prompts propagated to a foundation model. 

Eureka Insights

Figure 1 is a high-level illustration of the current state of AI for Eureka-Bench, highlighting the best and the worst performances across various capabilities. These results reveal a nuanced picture of different models’ strengths, showing that no single model excels in all tasks. However, Claude 3.5 Sonnet, GPT-4o 2024-05-13, and Llama 3.1 405B consistently outperform others in several key areas.

Figure 1 – Performance of the best and worst models for multimodal (left) and language (right) datasets in Eureka-Bench. The red frontier shows the performance of the worst model, indicating the area that is already solved for the set of capabilities. The green frontier shows the performance of the best model, indicating the best-known result with current technology. The blue horizon between the best model and the maximum performance shows the room for improvement for mastering the capability. The best performance sets indicated in the green border include all models that perform within 2% of the best observed result. 
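The frontiers described in the caption can be sketched as follows. This is an illustration only: the scores are hypothetical, and it assumes that "within 2%" means within 2 absolute percentage points on a 0 to 100 scale, which the caption does not specify.

```python
def best_performance_set(scores, tolerance=2.0):
    """Models whose score is within `tolerance` points of the best observed score."""
    best = max(scores.values())
    return sorted(m for m, s in scores.items() if best - s <= tolerance)


def frontier_summary(scores, max_score=100.0):
    """Worst score (already solved), best score (best known), and remaining headroom."""
    worst, best = min(scores.values()), max(scores.values())
    return {"solved": worst, "best_known": best, "headroom": max_score - best}


# Hypothetical scores for one capability.
capability_scores = {"model_a": 81.0, "model_b": 79.5, "model_c": 64.0}
print(best_performance_set(capability_scores))
print(frontier_summary(capability_scores))
```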

Multimodal capabilities

Evaluation in Eureka reveals that state-of-the-art models are still fairly limited in their multimodal abilities, specifically when it comes to detailed image understanding (for example, localization of objects, geometric and spatial reasoning, and navigation), which is most needed in truly multimodal scenarios that require physical awareness, visual grounding, and localization. 

  1. State-of-the-art multimodal models struggle with geometric reasoning. 
    Models perform worse in reasoning about height than about depth. Claude 3.5 Sonnet and Gemini 1.5 Pro are the best performing models for this task, with Claude 3.5 Sonnet being the most accurate model for depth ordering, and Gemini 1.5 Pro the most accurate for height ordering. 
  2. Multimodal capabilities lag language capabilities. 
    On tasks that can be described either as multimodal or as language-only, the performance of most tested models is higher for the language-only condition. GPT-4o 2024-05-13 is the only model that consistently achieves better results when presented with both vision and language information, showing therefore that it can better fuse the two data modalities.
  3. Complementary performance across models for fundamental multimodal skills.
    Claude 3.5 Sonnet, GPT-4o 2024-05-13, and GPT-4 Turbo 2024-04-09 have comparable performance in multimodal question answering (MMMU). In tasks like object recognition and visual prompting, the performance of Claude 3.5 Sonnet is better than or comparable to that of GPT-4o 2024-05-13, but Gemini 1.5 Pro outperforms them both. Finally, in tasks like object detection and spatial reasoning, GPT-4o 2024-05-13 is the most accurate model. 

Language

The evaluation through Eureka shows that there have been important advances from state-of-the-art models in the language capabilities of instruction following, long context question answering, information retrieval, and safety. The analysis also discovers major differences and gaps between models related to robustness to context length, factuality and grounding for information retrieval, and refusal behavior. 

  1. Faster improvements in instruction following across all model families. 
    Instruction following is the ability to follow guidance expressed in user prompts regarding specifications related to format, style, and structure of the generated content. Among the studied language capabilities, instruction following is where most models are improving fastest, potentially due to strong investments in instruction tuning processes, with most models now having an instruction following rate higher than 75%. 
  2. All models’ performance in question answering drops with longer context. 
    Contrary to “needle-in-a-haystack” experiments, testing state-of-the-art models on tasks that involve reasoning over long context shows significant decline in performance as context size grows. Amongst all models, GPT-4o 2024-05-13 and Llama 3.1 405B have the lowest drop in performance for longer context.
  3. Major gaps in factuality and grounding for information retrieval from parametric knowledge or input context. 
    Models exhibit query fact precision rates below 55%, fact recall rates below 25%, and rates of irrelevant and fabricated information above 20%. Llama 3.1 405B, GPT-4o 2024-05-13, and Claude 3.5 Sonnet are the top performers in this area across different conditions.
  4. High refusal rates. Lower accuracy in detecting toxic content vs. neutral content for most models. 
    While several models have high accuracy rates for toxicity detection, others (Gemini 1.5 Pro, Claude 3.5 Sonnet, Claude 3 Opus, and Llama 3.1 405B) exhibit low accuracy in classifying toxic content and a high refusal rate to classify toxic or neutral context, both of which make toxic content difficult to detect. During the safe language generation evaluation, models like GPT-4 1106 Preview and Mistral Large 2407 have the highest toxicity rates. GPT-4o 2024-05-13 is the only model that has both a high toxicity detection accuracy and a low toxicity score for safe language generation. 
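For readers unfamiliar with the fact precision and recall figures in item 3 above, here is a minimal sketch using the standard set-based definitions. It assumes facts can be represented as normalized strings and compared exactly. The metric definitions in the Eureka report may differ, and the example facts are placeholders.

```python
def fact_precision_recall(generated, reference):
    """Set-based precision and recall of generated facts against reference facts."""
    generated, reference = set(generated), set(reference)
    supported = generated & reference
    # Precision: share of generated facts that are supported by the reference.
    precision = len(supported) / len(generated) if generated else 0.0
    # Recall: share of reference facts that the model actually produced.
    recall = len(supported) / len(reference) if reference else 0.0
    return precision, recall


# Placeholder facts: 2 of 4 generated facts are supported; 2 of 8 reference facts are covered.
generated = {"f1", "f2", "x1", "x2"}
reference = {"f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8"}
print(fact_precision_recall(generated, reference))
```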

Non-determinism

Several models have highly non-deterministic output for identical runs. Gemini 1.5 Pro, GPT-4 1106 Preview, GPT-4 Vision Preview, and GPT-4 Turbo 2024-04-09 show high non-determinism of outcomes. These results raise important questions regarding the stability of user and developer experiences when repeatedly inferencing with identical queries using the same prompt templates. Llama 3 70B, Llama 3.1 70B, and Mistral Large 2407 are almost perfectly deterministic. 
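One simple way to quantify this kind of non-determinism is to repeat each prompt several times and check how often the outputs agree. The sketch below is an assumption about how such a check could look, not the procedure used in the Eureka report; the prompt ids and outputs are hypothetical.

```python
from collections import Counter


def run_agreement(outputs):
    """Fraction of repeated runs that match the most common output for one prompt."""
    counts = Counter(outputs)
    return counts.most_common(1)[0][1] / len(outputs)


def nondeterminism_report(runs_by_prompt):
    """Summarize agreement across prompts; maps prompt id -> list of repeated outputs."""
    n = len(runs_by_prompt)
    fully_consistent = sum(1 for outs in runs_by_prompt.values() if len(set(outs)) == 1)
    return {
        "consistent_prompt_rate": fully_consistent / n,
        "mean_agreement": sum(run_agreement(o) for o in runs_by_prompt.values()) / n,
    }


# Hypothetical outputs from four identical runs per prompt.
runs = {"q1": ["A", "A", "A", "A"], "q2": ["B", "C", "B", "B"]}
print(nondeterminism_report(runs))
```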

Backward compatibility

Backward incompatibility for shifts within the same model family is prevalent across all state-of-the-art models. This is reflected in high regression rates for individual examples and at a subcategory level. This type of regression can break trust with users and application developers during model updates. Regression varies per task and metric, but we observe several cases where it is higher than 10% across three model families (Claude, GPT, Llama), and regression rates can sometimes dominate progress rates for whole subcategories of data. 
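Example-level regression can be made concrete with a short sketch. It assumes that the regression rate is the share of previously correct examples that the updated model gets wrong, and that the progress rate is the share of previously incorrect examples that it now gets right. These definitions and the example data are assumptions for illustration and may not match the report exactly.

```python
def update_rates(old_correct, new_correct):
    """Compare per-example correctness (example id -> bool) before and after an update."""
    shared = sorted(old_correct.keys() & new_correct.keys())
    if not shared:
        raise ValueError("no shared examples to compare")
    was_right = [e for e in shared if old_correct[e]]
    was_wrong = [e for e in shared if not old_correct[e]]
    regressed = sum(1 for e in was_right if not new_correct[e])
    progressed = sum(1 for e in was_wrong if new_correct[e])
    return {
        "regression_rate": regressed / len(was_right) if was_right else 0.0,
        "progress_rate": progressed / len(was_wrong) if was_wrong else 0.0,
        "old_accuracy": len(was_right) / len(shared),
        "new_accuracy": sum(1 for e in shared if new_correct[e]) / len(shared),
    }


# Hypothetical results on eight examples: accuracy rises, yet one solved example regresses.
old = {f"ex{i}": i < 4 for i in range(8)}
new = {f"ex{i}": i in (1, 2, 3, 4, 5, 6) for i in range(8)}
print(update_rates(old, new))
```

In this hypothetical case overall accuracy rises from 0.5 to 0.75 even though a quarter of the previously correct examples regress, which is the pattern the paragraph above describes.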

Conclusion

The complementary results extracted from this study highlight opportunities for improving current models across various areas, aiming to match the performance of the best model for each individual capability in this challenge set. However, several tasks in the challenge set remain difficult even for the most capable models. It is crucial to discuss and explore whether these gaps can be addressed with current technologies, architectures, and data synthesis protocols.

Finally, Eureka and the set of associated benchmarks are only the initial snapshot of an effort that aims at reliably measuring progress in AI. Our team is excited about further collaborations with the open-source community and research, with the goal of sharing and extending current measurements for new capabilities and models. 

The post Eureka: Evaluating and understanding progress in AI appeared first on Microsoft Research.

Upgrade Livestreams With Twitch Enhanced Broadcasting and the NVIDIA Encoder

At TwitchCon, a global convention for the Twitch livestreaming platform, livestreamers and content creators this week can experience the latest technologies for accelerating creative workflows and improving video quality.

That includes the beta release of Twitch Enhanced Broadcasting support for HEVC when using the NVIDIA encoder.

Content creators can also use the NVIDIA Broadcast app, eighth-generation NVIDIA NVENC and RTX-powered optimizations in streaming and video editing apps to enhance their productions.

Plus, the September NVIDIA Studio Driver, designed to optimize creative apps, is now ready for download. Studio Drivers undergo extensive testing to ensure seamless compatibility while enhancing features, automating processes and accelerating workflows.

Twitch Enhanced Broadcasting With HEVC

The tradeoff between higher-resolution video quality and reliable streaming is a common issue livestreamers struggle with.

Higher-quality video provides more enjoyable viewing experiences but can cause streams to buffer for viewers with lower bandwidth or older devices. Streaming lower-bitrate video allows more people to watch content seamlessly but introduces artifacts that can interfere with viewing quality.

To address this issue, NVIDIA and Twitch collaborated to develop Twitch Enhanced Broadcasting. The feature adds the capability to send multiple streams — different versions of encoded video with different resolutions or bitrates — directly from NVIDIA GeForce RTX-equipped PCs or NVIDIA RTX workstations to deliver the highest-quality video a viewer’s internet connection can handle.
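The idea of delivering the highest-quality version a viewer's connection can handle can be sketched as a simple selection over a set of encoded versions. The ladder, bitrates, and headroom factor below are hypothetical, and the source does not describe how Twitch actually selects a stream, so treat this as a generic adaptive-bitrate illustration.

```python
def pick_rendition(ladder, bandwidth_kbps, headroom=0.8):
    """Pick the highest-bitrate version that fits within a share of the viewer's bandwidth."""
    budget = bandwidth_kbps * headroom  # leave some margin for network fluctuation
    fitting = [r for r in ladder if r["bitrate_kbps"] <= budget]
    if not fitting:
        # Nothing fits: fall back to the lowest-bitrate version.
        return min(ladder, key=lambda r: r["bitrate_kbps"])
    return max(fitting, key=lambda r: r["bitrate_kbps"])


# Hypothetical set of versions sent by the streamer's PC.
ladder = [
    {"name": "1080p", "bitrate_kbps": 6000},
    {"name": "720p", "bitrate_kbps": 3000},
    {"name": "480p", "bitrate_kbps": 1500},
]
print(pick_rendition(ladder, bandwidth_kbps=5000)["name"])
```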

Twitch supports HEVC (H.265) in the Enhanced Broadcasting closed beta. With the NVIDIA encoder, Twitch streamers get 25% improved efficiency and quality over H.264.

This means that video will look as if it were being streamed with 25% more bitrate — in higher quality and with reduced artifacts or encoding errors. The feature is ideal for streaming fast-paced gameplay, enabling cleaner, sharper video with minimal lag.
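As a rough worked example, the 25% figure can be read as a simple multiplier on bitrate. This linear reading is an assumption made for illustration; perceived quality does not scale linearly with bitrate in practice, and the bitrates used are hypothetical.

```python
def h264_equivalent_kbps(hevc_kbps, gain=0.25):
    """H.264 bitrate that would look comparable to an HEVC stream at `hevc_kbps`."""
    return hevc_kbps * (1 + gain)


def hevc_kbps_for_same_quality(h264_kbps, gain=0.25):
    """HEVC bitrate that would look comparable to an H.264 stream at `h264_kbps`."""
    return h264_kbps / (1 + gain)


print(h264_equivalent_kbps(6000))        # 7500.0
print(hevc_kbps_for_same_quality(6000))  # 4800.0
```

Under this reading, a hypothetical 6,000 kbps HEVC stream looks like a 7,500 kbps H.264 stream, or equivalently the same quality needs about 20% less bitrate.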

Because all stream versions are generated with a dedicated hardware encoder on GeForce RTX GPUs, the rest of the system’s GPU and CPU are free to focus on running games more smoothly to maximize performance.

Learn how to get started on twitch.com.

AI-Enhanced Microphones and Webcams

Streaming is easier than ever with NVIDIA technologies.

For starters, PC performance and video quality are exceptionally high thanks to NVIDIA’s dedicated encoder. And NVIDIA GPUs include Tensor Cores that efficiently run AI.

Livestreamers can use AI to enhance their hardware peripherals and devices, which is especially helpful for those who haven’t had the time or resources to assemble extensive audio and video setups.

NVIDIA Broadcast transforms any home office or dorm room into a home studio — without the need to purchase specialized equipment. Its AI-powered features include Noise and Echo Removal for microphones, and Virtual Background, Auto Frame, Video Noise Removal and Eye Contact for cameras.

Livestreamers can download the Broadcast app or access its effects across popular creative apps, including Corsair iCUE, Elgato Camera Hub, OBS, Streamlabs, VTube Studio and Wave Link.

Spotlight the Highlights

GeForce RTX GPUs make it lightning-fast to edit and enhance video footage on the most popular video editing apps, from Adobe Premiere Pro to CapCut Pro.

Streamers can use AI-powered, RTX-accelerated features like Enhance Speech to remove noise and improve the quality of dialogue clips; Auto Reframe to automatically size social media videos; and Scene Edit Detection to break up long videos, like B-roll stringouts, into individual clips.

NVIDIA encoders help turbocharge the export process. For those looking for extreme performance, the GeForce RTX 4070 Ti GPU and up come equipped with dual encoders that can be used in parallel to halve export times on apps like CapCut, the most widely used video editing app on TikTok.

Clearer, Sharper Viewing Experiences With RTX Video

NVIDIA RTX Video — available exclusively for NVIDIA and GeForce RTX GPU owners — can turn any online and native video into pristine 4K high dynamic range (HDR) content with two technologies: Video Super Resolution and Video HDR.

RTX Video Super Resolution de-artifacts and upscales streamed video to remove errors that occur during encoding or transport, then runs an AI super-resolution effect. The result is cleaner, sharper video that’s ideal for streaming on platforms like YouTube and Twitch.

Many users have HDR displays, but there isn’t much HDR content online. RTX Video HDR addresses this by turning any standard dynamic range (SDR) video into HDR10 quality that delivers a wider range of brights and darks and makes visuals more vibrant and colorful. This feature is especially helpful when watching dark-lit scenes in video games.

RTX Video HDR requires an RTX GPU connected to an HDR10-compatible monitor or TV. For more information, see the RTX Video FAQ.

Check out TwitchCon, taking place in San Diego and online from Sept. 20-22, for the latest streaming updates. 

New AI Innovation Hub in Tunisia Drives Technological Advancement Across Africa

A new AI innovation hub for developers across Tunisia launched today in Novation City, a technology park that’s designed to cultivate a vibrant innovation ecosystem in mechatronics (an industry encompassing IT, mechanics and electronics) and to foster synergy between education, research and industry in the North African country.

Built in collaboration with the NVIDIA Deep Learning Institute (DLI), the hub offers the training, technologies and business networks needed to help drive AI adoption across the continent.

The hub’s launch is part of NVIDIA’s efforts to train 100,000 developers across Africa through the DLI over the next three years — a goal that’s about a quarter complete today.

Located in Sousse — a coastal city in central Tunisia that’s surrounded by universities, startups and other organizations with a strong focus on STEM and AI — the hub includes complimentary access to NVIDIA DLI courses on topics such as generative AI, accelerated computing and data science.

Through its AI, industry 4.0 and smart transport centers of excellence, Novation City provides cutting-edge resources and access to NVIDIA DGX infrastructure for AI startups and researchers. Novation City also hosts organized activities to drive ecosystem growth, such as hackathons and specialized training sessions. It’s all brought together in this new environment conducive to AI learning, experimentation and deployment.

“Novation City has launched several key AI initiatives to strengthen the ecosystem, with NVIDIA’s support being instrumental in empowering AI startups and advancing AI skills,” said Anas Rochdi, chief innovation officer at Novation City. “This year, we deployed Tunisia’s first NVIDIA DGX system and launched major academic initiatives in collaboration with the NVIDIA Deep Learning Institute, aiming to train more than 1,000 developers in one year.”

Novation City also runs several startup accelerator programs, and more than 10 participating companies are members of NVIDIA Inception, a free program that nurtures cutting-edge startups.

Tunisia Drives Innovation in STEM and AI Education

Tunisia’s education system has traditionally emphasized STEM, especially mathematics and the sciences, according to Wei Xiao, who leads NVIDIA’s developer relations team for startups, enterprises and universities in the Middle East and Africa.

“Tunisia has a rich history of valuing knowledge and scholarship, dating back to ancient Carthage and the Islamic Golden Age,” Xiao said. “And the nation’s curriculum is rigorous, creating a solid foundation for advanced studies in STEM fields.”

This has made the country — well-situated to serve as a gateway between Europe, Africa and the Middle East (EMEA) — a thriving ecosystem for innovation, entrepreneurship and research. Novation City and NVIDIA aim to bolster the ecosystem even further through the new AI innovation hub.

Fostering collaboration between academia, industry and government, the initiative is funded by French and German development agencies, the Tunisian government, the World Bank and the European Union, as well as several enterprises.

And the hub comes at a time when Tunisia has adopted a national strategy for AI and digitalization — which includes promoting AI education and research — as part of a broader vision to position the nation as a digital leader in Africa. For example, the University of Tunis this month launched the nation’s first public institute specializing in AI.

Partners Foster Sovereign AI in Tunisia and Beyond

A key part of sovereign AI is a nation’s ability to produce artificial intelligence using its own workforce — along with its own infrastructure, data and business networks. Free DLI training offered through Tunisia’s AI innovation hub is poised to enable just that, helping upskill the next generation of African AI experts.

Plus, Novation City already offers a wide range of facilities designed to support technological and scientific advancement.

In February, Novation City deployed an NVIDIA DGX system, among the first in Africa, that has empowered about 30 startups across the continent in climate AI, transportation, manufacturing, agtech and other industries to develop accelerated computing-based solutions.

In addition, ESPRIT University — a specialized university in Tunisia with more than 10,000 engineering students — boasts nine NVIDIA DLI ambassadors who are delivering training to students and contributing to the broader tech ecosystem across the country. This makes ESPRIT University one of the most active DLI organizations across EMEA.

Since 2018, ESPRIT has been tapping into DLI to advance AI education. The university has also acquired an NVIDIA DGX system to support research and product development.

NVIDIA has planned similar AI education initiatives in Kenya and Nigeria to further upskill and enhance African technology ecosystems.

Learn more about the NVIDIA Deep Learning Institute.
