Brave New World: Leo AI and Ollama Bring RTX-Accelerated Local LLMs to Brave Browser Users

Editor’s note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, software, tools and accelerations for GeForce RTX PC and NVIDIA RTX workstation users.

From games and content creation apps to software development and productivity tools, AI is increasingly being integrated into applications to enhance user experiences and boost efficiency.

Those efficiency boosts extend to everyday tasks, like web browsing. Brave, a privacy-focused web browser, recently launched a smart AI assistant called Leo AI that, in addition to providing search results, helps users summarize articles and videos, surface insights from documents, answer questions and more.

The technology behind Brave and other AI-powered tools is a combination of hardware, libraries and ecosystem software that’s optimized for the unique needs of AI.

Why Software Matters

NVIDIA GPUs power the world’s AI, whether running in the data center or on a local PC. They contain Tensor Cores, which are specifically designed to accelerate AI applications like Leo AI through massively parallel number crunching — rapidly processing the huge number of calculations needed for AI simultaneously, rather than doing them one at a time.

But great hardware only matters if applications can make efficient use of it. The software running on top of GPUs is just as critical for delivering the fastest, most responsive AI experience.

The first layer is the AI inference library, which acts like a translator that takes requests for common AI tasks and converts them to specific instructions for the hardware to run. Popular inference libraries include NVIDIA TensorRT, Microsoft’s DirectML and the one used by Brave and Leo AI via Ollama, called llama.cpp.

Llama.cpp is an open-source library and framework. Through CUDA — the NVIDIA software application programming interface that enables developers to optimize for GeForce RTX and NVIDIA RTX GPUs — it provides Tensor Core acceleration for hundreds of models, including popular large language models (LLMs) like Gemma, Llama 3, Mistral and Phi.
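
For developers who want to experiment with this layer directly, here is a minimal sketch (not from the Brave or Ollama codebases) that loads a GGUF model with the llama-cpp-python bindings and offloads its layers to the GPU; it assumes a CUDA-enabled build of the library and uses a placeholder model path.

# Minimal sketch: running a local GGUF model with llama.cpp's Python bindings.
# Assumes a CUDA-enabled build of llama-cpp-python and a locally downloaded
# model file (the path below is a placeholder).
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder GGUF checkpoint
    n_gpu_layers=-1,  # offload all layers to the GPU (CUDA/Tensor Core acceleration)
    n_ctx=4096,       # context window size
)

out = llm.create_completion(
    "Summarize the benefits of running an LLM locally in one sentence.",
    max_tokens=64,
)
print(out["choices"][0]["text"])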

On top of the inference library, applications often use a local inference server to simplify integration. The inference server handles tasks like downloading and configuring specific AI models so that the application doesn’t have to.

Ollama is an open-source project that sits on top of llama.cpp and provides access to the library’s features. It supports an ecosystem of applications that deliver local AI capabilities. Across the entire technology stack, NVIDIA works to optimize tools like Ollama for NVIDIA hardware to deliver faster, more responsive AI experiences on RTX.
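
To illustrate what that looks like from an application's perspective, the following sketch sends a prompt to a locally running Ollama server over its default HTTP endpoint. It assumes Ollama is installed and running and that a model named llama3 has already been pulled; the prompt is a placeholder.

import requests

# Minimal sketch: an application querying a local Ollama server.
# Assumes Ollama is running on its default port and the "llama3" model
# has already been downloaded (for example with `ollama pull llama3`).
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Summarize this article in three bullet points: ...",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=120,
)
print(resp.json()["response"])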

Local vs. Cloud

Brave’s Leo AI can run in the cloud or locally on a PC through Ollama.

There are many benefits to processing inference using a local model. By not sending prompts to an outside server for processing, the experience is private and always available. For instance, Brave users can get help with their finances or medical questions without sending anything to the cloud. Running locally also eliminates the need to pay for unrestricted cloud access. With Ollama, users can take advantage of a wider variety of open-source models than most hosted services, which often support only one or two varieties of the same AI model.

Users can also interact with models that have different specializations, such as bilingual models, compact-sized models, code generation models and more.

RTX enables a fast, responsive experience when running AI locally. Using the Llama 3 8B model with llama.cpp, users can expect responses up to 149 tokens per second — or approximately 110 words per second. When using Brave with Leo AI and Ollama, this means snappier responses to questions, requests for content summaries and more.

NVIDIA internal throughput performance measurements on NVIDIA GeForce RTX GPUs, featuring a Llama 3 8B model with an input sequence length of 100 tokens, generating 100 tokens.

Get Started With Brave With Leo AI and Ollama

Installing Ollama is easy — download the installer from the project’s website and let it run in the background. From a command prompt, users can download and install a wide variety of supported models, then interact with the local model from the command line.

For simple instructions on how to add local LLM support via Ollama, read the company’s blog. Once configured to point to Ollama, Leo AI will use the locally hosted LLM for prompts and queries. Users can also switch between cloud and local models at any time.

Brave with Leo AI running on Ollama and accelerated by RTX is a great way to get more out of your browsing experience. You can even summarize and ask questions about AI Decoded blogs!

Developers can learn more about how to use Ollama and llama.cpp in the NVIDIA Technical Blog.

Generative AI is transforming gaming, videoconferencing and interactive experiences of all kinds. Make sense of what’s new and what’s next by subscribing to the AI Decoded newsletter.

Read More

PyTorch Conference 2024 Recap: On Fire 🔥

The 2024 PyTorch Conference in San Francisco gathered nearly 1,500 AI researchers, developers, and enthusiasts. Over two days, the event featured engaging discussions, insightful keynotes, and hands-on sessions focused on artificial intelligence (AI) and advancements in PyTorch, the leading open-source machine learning framework. Attendees delved into the future of generative AI, Large Language Models (LLMs), and the crucial role open-source technology plays in driving AI innovation. Here’s a recap of the key themes, highlights, and major takeaways from this year’s conference.

Key Themes of the PyTorch Conference 2024

Three core themes emerged throughout the conference:

  1. Generative AI and LLMs: Many sessions focused on how PyTorch continues to evolve as a primary framework for Large Language Models and Generative AI applications. From scaling these models to optimizing their performance on various hardware platforms, the conference showcased the ongoing advancements and challenges in LLM architecture.
  2. Democratizing AI Through Open Source: One of the recurring themes was the importance of open source tools and communities in shaping the future of AI. PyTorch is committed to inclusivity, ease of use, and accessibility to developers of all levels, with a focus on bringing AI to an even larger global audience.
  3. Distributed and Edge Computing: Distributed computing and edge deployment appeared in many discussions, highlighting how PyTorch is being used to drive AI to the edge. The focus on edge accelerators, scalable training, and inference showcased how PyTorch enables the deployment of powerful models across diverse environments, from the cloud to on-device applications.

Watch the Sessions from PyTorch Conference

The PyTorch Conference featured keynote sessions from top AI leaders and interesting lightning talks. You can view all of the conference sessions on our YouTube channel.

PyTorch Conference Startup Showcase

New this year, the Startup Showcase was an exciting addition to the PyTorch Conference. Featuring early-stage founders pitching their AI startups to a panel of top venture capitalists, this event showcased the next generation of AI-driven innovation. The finalists for the inaugural PyTorch Conference Startup Showcase included Remix Inc., Cartesia, OpenBabylon, Remyx AI, A2 Labs, Inc., QuicSnap, Iso AI, CTGT, and Creao.ai, representing some of the most innovative AI/ML startups in the industry. Attendees got a front-row seat to see cutting-edge AI startups in action, while top VCs from the AI industry evaluated the pitches.

Congratulations to the PyTorch Conference Startup Showcase winner, CTGT! Deep learning can be opaque and biased, which limits its potential in crucial areas like healthcare and finance. CTGT is changing the game by enhancing data lineage in LLMs and cutting hallucinations. They’re empowering companies to create customized models using 500x less compute.

View the Startup Showcase

Mini-Summits

The DL Compiler Mini-Summit offered attendees a deep dive into the advances in deep learning (DL) compilers that are transforming AI workloads.

View the DL Compiler Mini-Summit

The Fine-Tuning Mini-Summit brought together a thriving community of researchers, developers, practitioners and hobbyists focused on topics ranging from memory efficiency, parameter-efficient fine-tuning and quantization to performance at scale and reproducible evaluations.

View the Fine-Tuning Mini-Summit

Major Takeaways from the PyTorch Conference 2024

  1. LLMs are Here to Stay: LLMs were a focal point of the event, reaffirming their pivotal role in the future of AI. As these models continue to scale, PyTorch remains the preferred framework for developing, training, and deploying them across various platforms and industries.
  2. Open Source Drives Innovation: A key takeaway from the conference was that open-source tools like PyTorch are vital for democratizing AI. This community-driven approach accelerates innovation, enabling researchers and developers globally to collaborate and contribute to faster advancements and more accessible AI technologies.
  3. Ethics and Sustainability Matter: The focus on ethical AI development was a significant takeaway. Talks on the inclusivity of computer vision models, the environmental impacts of AI infrastructure, and the need for transparent, unbiased AI models highlighted the growing importance of ethical considerations in the future of AI.
  4. PyTorch Expands Beyond the Cloud: With several sessions dedicated to edge AI and distributed computing, the conference showcased how PyTorch is expanding beyond cloud-based applications into edge devices and diverse computing environments. This shift is crucial as AI advances into areas like autonomous vehicles, mobile applications, and IoT devices.

Thank You to Our Sponsors

We would like to thank each of the sponsors that made the PyTorch Conference 2024 possible. These include:

Diamond Sponsors:

  • AMD
  • Cloud Native Computing Foundation
  • IBM
  • Intel – PyTorch
  • Lightning.ai
  • Meta – PyTorch

Platinum Sponsors:

  • Arm
  • Google
  • Lambda Labs
  • Nvidia

Silver Sponsors:

  • Anyscale – PyTorch
  • Baseten
  • Chainguard
  • Databricks
  • Fal
  • FuriosaAi
  • HPE
  • Jane Street
  • Microsoft – PyTorch
  • MinIO
  • Outerbounds
  • Together.AI

Bronze Sponsors:

  • d-Matrix
  • MemVerge
  • Perforated AI
  • Quansight
  • Rotational Labs
  • ScaleGenAI

Special Event Sponsors:

  • PyTorch Flare Party: Hugging Face
  • Startup Showcase: Mayfield
  • Diversity Scholarship: AWS
  • Women and Non-Binary in PyTorch Lunch: Google
  • Happy Hour Reception: Lightning.AI

Thank you for your continued support in advancing the PyTorch ecosystem and helping to shape the future of AI!

Save the Date

See you next year for the PyTorch Conference in San Francisco at the Palace of Fine Arts from October 22-23, 2025.

Read More

AWS recognized as a first-time Leader in the 2024 Gartner Magic Quadrant for Data Science and Machine Learning Platforms

Over the last 18 months, AWS has released into general availability more than twice as many machine learning (ML) and generative artificial intelligence (AI) features as the other major cloud providers combined. This accelerated innovation is enabling organizations of all sizes, from disruptive AI startups like Hugging Face, AI21 Labs, and Articul8 AI to industry leaders such as NASDAQ and United Airlines, to unlock the transformative potential of generative AI. By providing a secure, high-performance, and scalable set of data science and machine learning services and capabilities, AWS empowers businesses to drive innovation through the power of AI.

At the heart of this innovation are Amazon Bedrock and Amazon SageMaker, both of which were mentioned in the recent Gartner Data Science and Machine Learning (DSML) Magic Quadrant evaluation. These services play a pivotal role in addressing diverse customer needs across the generative AI journey.

Amazon SageMaker, the foundational service for ML and generative AI model development, provides the fine-tuning and flexibility that makes it simple for data scientists and machine learning engineers to build, train, and deploy machine learning and foundation models (FMs) at scale. For application developers, Amazon Bedrock is the simplest way to build and scale generative AI applications with FMs for a wide variety of use cases. Whether leveraging the best FMs out there or importing custom models from SageMaker, Bedrock equips development teams with the tools they need to accelerate innovation.

We believe the continued innovation in both services and our positioning as a Leader in the 2024 Gartner Data Science and Machine Learning (DSML) Magic Quadrant reflect our commitment to meeting evolving customer needs, particularly in data science and ML. In our opinion, this recognition, coupled with our recent recognition in the Cloud AI Developer Services (CAIDS) Magic Quadrant, solidifies AWS as a provider of innovative AI solutions that drive business value and competitive advantage.

Review the Gartner Magic Quadrant and Methodology

For Gartner, the DSML Magic Quadrant research methodology provides a graphical competitive positioning of four types of technology providers in fast-growing markets: Leaders, Visionaries, Niche Players and Challengers. As companion research, Gartner Critical Capabilities notes provide deeper insight into the capability and suitability of providers’ IT products and services based on specific or customized use cases.

The following figure highlights where AWS lands in the DSML Magic Quadrant.

Access a complimentary copy of the full report to see why Gartner positioned AWS as a Leader, and dive deep into the strengths and cautions of AWS.

Further detail on Amazon Bedrock and Amazon SageMaker

Amazon Bedrock provides a straightforward way to build and scale applications with large language models (LLMs) and foundation models (FMs), empowering you to build generative AI applications with security and privacy. With Amazon Bedrock, you can experiment with and evaluate high performing FMs for your use case, import custom models, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that run tasks using your enterprise systems and data sources. Tens of thousands of customers across multiple industries are deploying new generative AI experiences for diverse use cases.

Amazon SageMaker is a fully managed service that brings together a broad set of tools to enable high-performance, low-cost ML for any use case. You can access a wide range of ML tools, fully managed and scalable infrastructure, repeatable and responsible ML workflows, and the power of human feedback across the ML lifecycle, including sophisticated tools such as Amazon SageMaker Canvas and Amazon SageMaker Data Wrangler that make it straightforward to work with data.

In addition, Amazon SageMaker helps data scientists and ML engineers build FMs from scratch, evaluate and customize FMs with advanced techniques, and deploy FMs with fine-grained controls for generative AI use cases that have stringent requirements on accuracy, latency, and cost. Hundreds of thousands of customers from Perplexity to Thomson Reuters to Workday use SageMaker to build, train, and deploy ML models, including LLMs and other FMs.

Gartner does not endorse any vendor, product or service depicted in its research publications and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from AWS.

GARTNER is a registered trademark and service mark of Gartner and Magic Quadrant is a registered trademark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and are used herein with permission. All rights reserved.


About the author

Susanne Seitinger leads AI and ML product marketing at Amazon Web Services (AWS), including the introduction of critical generative AI services like Amazon Bedrock as well as coordinating generative AI marketing activities across AWS. Prior to AWS, Susanne was the director of public sector marketing at Verizon Business Group, and previously drove public sector marketing in the United States for Signify, after holding various positions in R&D, innovation, and segment management and marketing. She holds a BA from Princeton University, as well as a master’s in city planning and a PhD from MIT.

Read More

Build a serverless voice-based contextual chatbot for people with disabilities using Amazon Bedrock

At Amazon and AWS, we are always finding innovative ways to build inclusive technology. With voice assistants like Amazon Alexa, we are enabling more people to ask questions and get answers on the spot without having to type. Whether you’re a person with a motor disability, juggling multiple tasks, or simply away from your computer, getting search results without typing is a valuable feature. With modern voice assistants, you can now ask your questions conversationally and get verbal answers instantly.

In this post, we discuss voice-guided applications. Specifically, we focus on chatbots. Chatbots are no longer a niche technology. They are now ubiquitous on customer service websites, providing around-the-clock automated assistance. Although AI chatbots have been around for years, recent advances in large language models (LLMs) and generative AI have enabled more natural conversations. Chatbots are proving useful across industries, handling both general and industry-specific questions. Voice-based assistants like Alexa demonstrate how we are entering an era of conversational interfaces. Typing questions already feels cumbersome to many who prefer the simplicity and ease of speaking with their devices.

We explore how to build a fully serverless, voice-based contextual chatbot tailored for individuals who need it. We also provide a sample chatbot application. The application is available in the accompanying GitHub repository. We create an intelligent conversational assistant that can understand and respond to voice inputs in a contextually relevant manner. The AI assistant is powered by Amazon Bedrock. This chatbot is designed to assist users with various tasks, provide information, and offer personalized support based on their unique requirements. For our LLM, we use Anthropic Claude on Amazon Bedrock.

We demonstrate the process of integrating Anthropic Claude’s advanced natural language processing capabilities with the serverless architecture of Amazon Bedrock, enabling the deployment of a highly scalable and cost-effective solution. Additionally, we discuss techniques for enhancing the chatbot’s accessibility and usability for people with motor disabilities. The aim of this post is to provide a comprehensive understanding of how to build a voice-based, contextual chatbot that uses the latest advancements in AI and serverless computing.

We hope that this solution can help people with certain mobility disabilities. A limited level of interaction is still needed, because the user must indicate when to start and stop talking. In our sample application, we address this with a dedicated Talk button that runs the transcription process while it is pressed.

For people with significant motor disabilities, the same operation can be implemented with a dedicated physical button that can be pressed by a single finger or another body part. Alternatively, a special keyword can be said to indicate the beginning of the command. This approach is used when you communicate with Alexa. The user always starts the conversation with “Alexa.”

Solution overview

The following diagram illustrates the architecture of the solution.

Architecture of serverless components of the solution

To deploy this architecture, we need managed compute that can host the web application, authentication mechanisms, and relevant permissions. We discuss this later in the post.

All the services that we use are serverless and fully managed by AWS. You don’t need to provision the compute resources. You only consume the services through their API. All the calls to the services are made directly from the client application.

The application is a simple React application that we create using the Vite build tool. We use the AWS SDK for JavaScript to call the services. The solution uses the following major services:

  • Amazon Polly is a service that turns text into lifelike speech.
  • Amazon Transcribe is an AWS AI service that makes it straightforward to convert speech to text.
  • Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) along with a broad set of capabilities that you need to build generative AI applications.
  • Amazon Cognito is an identity service for web and mobile apps. It’s a user directory, an authentication server, and an authorization service for OAuth 2.0 access tokens and AWS credentials.

To consume AWS services, the user needs to obtain temporary credentials from AWS Identity and Access Management (IAM). This is possible due to the Amazon Cognito identity pool, which acts as a mediator between your application user and IAM services. The identity pool holds the information about the IAM roles with all permissions necessary to run the solution.

Amazon Polly and Amazon Transcribe don’t require additional setup from the client aside from what we have described. However, Amazon Bedrock requires named user authentication. This means that having an Amazon Cognito identity pool is not enough—you also need an Amazon Cognito user pool, which allows you to define users and bind them to the identity pool. To better understand how Amazon Cognito allows external applications to invoke AWS services, refer to Secure API Access with Amazon Cognito Federated Identities, Amazon Cognito User Pools, and Amazon API Gateway.
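
The sample application performs this credential exchange with the AWS SDK for JavaScript; the following Python (boto3) sketch shows the same flow purely for illustration, with placeholder pool IDs, region and token.

import boto3

# Illustrative sketch (Python/boto3; the sample app uses the AWS SDK for JavaScript):
# exchange a Cognito user pool sign-in token for temporary AWS credentials
# through the Cognito identity pool. The IDs and token below are placeholders.
REGION = "us-east-1"
USER_POOL_ID = "us-east-1_EXAMPLE"
IDENTITY_POOL_ID = "us-east-1:00000000-0000-0000-0000-000000000000"
ID_TOKEN = "<ID token returned after the user signs in>"

identity = boto3.client("cognito-identity", region_name=REGION)
provider = f"cognito-idp.{REGION}.amazonaws.com/{USER_POOL_ID}"

identity_id = identity.get_id(
    IdentityPoolId=IDENTITY_POOL_ID,
    Logins={provider: ID_TOKEN},
)["IdentityId"]

creds = identity.get_credentials_for_identity(
    IdentityId=identity_id,
    Logins={provider: ID_TOKEN},
)["Credentials"]

# The temporary credentials carry the IAM role attached to the identity pool,
# so the client can call only the actions granted in the prerequisites policy.
print(creds["AccessKeyId"], creds["Expiration"])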

The heavy lifting of provisioning the Amazon Cognito user pool and identity pool, including generating the sign-in UI for the React application, is done by AWS Amplify. Amplify consists of a set of tools (open source framework, visual development environment, console) and services (web application and static website hosting) to accelerate the development of mobile and web applications on AWS. We cover the steps of setting up Amplify in the next sections.

Prerequisites

Before you begin, complete the following prerequisites:

  1. Make sure you have the following installed:
  2. Create an IAM role to use in the Amazon Cognito identity pool. Use the principle of least privilege to provide only the minimum set of permissions needed to run the application.
    • To invoke Amazon Bedrock, use the following code:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "*"
          }
        ]
      }

    • To invoke Amazon Polly, use the following code:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": "polly:SynthesizeSpeech",
            "Resource": "*"
          }
        ]
      }

    • To invoke Amazon Transcribe, use the following code:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "VisualEditor3",
            "Effect": "Allow",
            "Action": "transcribe:StartStreamTranscriptionWebSocket",
            "Resource": "*"
          }
        ]
      }

The full policy JSON should look as follows:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "bedrock:InvokeModel",
      "Resource": "*"
    },
    {
      "Sid": "VisualEditor2",
      "Effect": "Allow",
      "Action": "polly:SynthesizeSpeech",
      "Resource": "*"
    },
    {
      "Sid": "VisualEditor3",
      "Effect": "Allow",
      "Action": "transcribe:StartStreamTranscriptionWebSocket",
      "Resource": "*"
    }
  ]
}
  3. Run the following command to clone the GitHub repository:
    git clone https://github.com/aws-samples/serverless-conversational-chatbot.git

  4. To use Amplify, refer to Set up Amplify CLI to complete the initial setup.
  5. To be consistent with the values that you use later in the instructions, call your AWS profile amplify when you see the following prompt.
    Creation of the AWS profile "amplify"
  6. Create the role amplifyconsole-backend-role with the AdministratorAccess-Amplify managed policy, which allows Amplify to create the necessary resources.
    IAM Role with "AdministratorAccess-Amplify" policy
  7. For this post, we use the Anthropic Claude 3 Haiku LLM. To enable the LLM in Amazon Bedrock, refer to Access Amazon Bedrock foundation models.

Deploy the solution

There are two options to deploy the solution:

  • Use Amplify to deploy the application automatically
  • Deploy the application manually

We provide the steps for both options in this section.

Deploy the application automatically using Amplify

Amplify can deploy the application automatically if it’s stored in GitHub, Bitbucket, GitLab, or AWS CodeCommit. Upload the application that you downloaded earlier to your preferred repository (from the aforementioned options). For instructions, see Getting started with deploying an app to Amplify Hosting.

You can now continue to the next section of this post to set up IAM permissions.

Deploy the application manually

If you don’t have access to one of the storage options that we mentioned, you can deploy the application manually. This can also be useful if you want to modify the application to better fit your use case.

We tested the deployment on AWS Cloud9, a cloud integrated development environment (IDE) for writing, running, and debugging code, with Ubuntu Server 22.04 and Amazon Linux 2023.

We use the Visual Studio Code IDE and run all the following commands directly in the terminal window inside the IDE, but you can also run the commands in the terminal of your choice.

  1. From the directory where you checked out the application on GitHub, run the following command:
    cd serverless-conversational-chatbot

  2. Run the following commands:
    npm i
    
    amplify init

  3. Follow the prompts as shown in the following screenshot.
    • For authentication, choose the AWS profile amplify that you created as part of the prerequisite steps.
      Initial AWS Amplify setup in React application: 1. Do you want to use an existing environment? No 2. Enter a name for the environment: sampleenv 3. Select the authentication method you want to use: AWS Profile 4. Please choose the profile you want to use: amplify
    • Two new files will appear in the project under the src folder:
      • amplifyconfiguration.json
      • aws-exports.js

      New objects created by AWS Amplify: 1. aws-exports.js 2. amplifyconfiguration.json

  4. Next, run the following command:
    amplify configure project

Then select “Project Information”

Project Configuration of AWS Amplify in React Applications

  5. Enter the following information:
    Which setting do you want to configure? Project information
    
    Enter a name for the project: servrlsconvchat
    
    Choose your default editor: Visual Studio Code
    
    Choose the type of app that you're building: javascript
    
    What javascript framework are you using: react
    
    Source Directory Path: src
    
    Distribution Directory Path: dist
    
    Build Command: npm run-script build
    
    Start Command: npm run-script start

You can use an existing Amazon Cognito identity pool and user pool or create new objects.

  6. For our application, run the following command:
    amplify add auth

If you get the following message, you can ignore it:

Auth has already been added to this project. To update run amplify update auth
  7. Choose Default configuration.
    Selecting "default configuration" when adding authentication objects
  8. Accept all options proposed by the prompt.
  9. Run the following command:
    amplify add hosting

  10. Choose your hosting option.

You have two options for hosting the application: it can be hosted with the Amplify console, or on Amazon Simple Storage Service (Amazon S3) and then exposed through Amazon CloudFront.

Hosting with the Amplify console differs from CloudFront and Amazon S3. The Amplify console is a managed service providing continuous integration and delivery (CI/CD) and SSL certificates, prioritizing swift deployment of serverless web applications and backend APIs. In contrast, CloudFront and Amazon S3 offer greater flexibility and customization options, particularly for hosting static websites and assets with features like caching and distribution. CloudFront and Amazon S3 are preferable for intricate, high-traffic web applications with specific performance and security needs.

For this post, we use the Amplify console. To learn more about deployment with Amazon S3 and Amazon CloudFront, refer to the documentation.
Selecting the deployment option for the React application on the Amplify Console. Selected option: Hosting with Amplify Console

Now you’re ready to publish the application. There is an option to publish the application to GitHub to support CI/CD pipelines. Amplify has built-in integration with GitHub and can redeploy the application automatically when you push the changes. For simplicity, we use manual deployment.

  11. Choose Manual deployment.
    Selecting "Manual Deployment" when publishing the project
  12. Run the following command:
    amplify publish

After the application is published, you will see the following output. Note down this URL to use in a later step.
Result of the Deployment of the React Application on the Amplify Console. The URL that the user should use to enter the Amplify application

  13. Log in to the Amplify console, navigate to the servrlsconvchat application, and choose General under App settings in the navigation pane.
    Service Role attachment to the deployed application. First step: select the deployed application, then select the “General” option
  14. Edit the app settings and enter amplifyconsole-backend-role for Service role (you created this role in the prerequisites section).
    Service Role attachment to the deployed application. Second step. Setting “amplifyconsole-backend-role” in the “Service role” field

Now you can proceed to the next section to set up IAM permissions.

Configure IAM permissions

As part of the publishing process you completed, you provisioned a new identity pool. You can view this on the Amazon Cognito console, along with a new user pool. The names will be different from those presented in this post.

As we explained earlier, you need to attach policies to this role to allow interaction with Amazon Bedrock, Amazon Polly, and Amazon Transcribe. To set up IAM permissions, complete the following steps:

  1. On the Amazon Cognito console, choose Identity pools in the navigation pane.
  2. Navigate to your identity pool.
  3. On the User access tab, choose the link for the authenticated role.
    Identifying the IAM authentication role in the Cognito identity pool. Select the “Identity pools” option in the console, select the “User access” tab, then click the link under “Authentication role”
  4. Attach the policies that you defined in the prerequisites section.
    IAM policies attached to the Cognito identity pool authenticated role. Textual data presented in the “Prerequisites” section, item 2.

Amazon Bedrock can only be used with a named user, so we create a sample user in the Amazon Cognito user pool that was provisioned as part of the application publishing process.

  5. On the user pool details page, on the Users tab, choose Create user.
    User Creation in the Cognito User Pool. Select relevant user pool in “User pools” section. Select “Users” tab. Click on “Create user” button
  6. Provide your user information.
    Sample user definition in the Cognito User Pool. Enter email address and temporary password.

You’re now ready to run the application.

Use the sample serverless application

To access the application, navigate to the URL you saved from the output at the end of the application publishing process. Sign in to the application with the user you created in the previous step. You might be asked to change the password the first time you sign in.
Application Login Page. Enter user name and password

Use the Talk button and hold it while you’re asking the question. (We use this approach for the simplicity of demonstrating the abilities of the tool. For people with motor disabilities, we propose using a dedicated button that can be operated with different body parts, or a special keyword to initiate the conversation.)

When you release the button, the application sends your voice to Amazon Transcribe and returns the transcription text. This text is used as an input for an Amazon Bedrock LLM. For this example, we use Anthropic Claude 3 Haiku, but you can modify the code and use another model.
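
As a rough illustration of that call (the sample app makes it with the AWS SDK for JavaScript), the following Python (boto3) sketch sends a single transcribed question to Claude 3 Haiku through the Bedrock runtime API; the region and prompt are placeholders.

import json
import boto3

# Illustrative sketch: invoking Anthropic Claude 3 Haiku on Amazon Bedrock
# with the text returned by Amazon Transcribe. Region and prompt are placeholders.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

transcript = "What is the most famous tower in Paris?"  # text from Amazon Transcribe

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": transcript}]}
        ],
    }),
)

answer = json.loads(response["body"].read())["content"][0]["text"]
print(answer)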

The response from Amazon Bedrock is displayed as text and is also spoken by Amazon Polly.
Instructions on how to invoke the "Talk" operation using the "Talk" button
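
The text-to-speech step can be sketched the same way: the snippet below (again Python with boto3, for illustration only) asks Amazon Polly to synthesize an answer as an MP3 file. The voice and output format are assumptions; any voice Polly supports will work.

import boto3

# Illustrative sketch: converting the model's text answer to speech with Amazon Polly.
# The voice and output format below are assumptions.
polly = boto3.client("polly", region_name="us-east-1")

result = polly.synthesize_speech(
    Text="The most famous tower in Paris is the Eiffel Tower.",
    OutputFormat="mp3",
    VoiceId="Joanna",
)

with open("answer.mp3", "wb") as f:
    f.write(result["AudioStream"].read())  # audio played back to the user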

The conversation history is also stored. This means that you can ask follow-up questions, and the context of the conversation is preserved. For example, we asked, “What is the most famous tower there?” without specifying the location, and our chatbot was able to understand that the context of the question is Paris based on our previous question.
Demonstration of context preservation during conversation: a continued question-and-answer exchange with the chatbot

We store the conversation history inside a JavaScript variable, which means that if you refresh the page, the context will be lost. We discuss how to preserve the conversation context in a persistent way later in this post.
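
The following Python sketch mirrors that approach for illustration: the history is a plain in-memory list that is resent with every request, which is what lets a follow-up like "What is the most famous tower there?" resolve to Paris, and also why the context disappears when the process (or the page) restarts.

import json
import boto3

# Illustrative sketch: keeping conversation context in an in-memory list and
# resending it with every request, mirroring the JavaScript variable in the
# sample app. The history is lost when the process restarts.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"
history = []  # alternating user/assistant turns

def chat(user_text):
    history.append({"role": "user", "content": [{"type": "text", "text": user_text}]})
    raw = bedrock.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": history,  # prior turns supply the context
        }),
    )
    reply = json.loads(raw["body"].read())["content"][0]["text"]
    history.append({"role": "assistant", "content": [{"type": "text", "text": reply}]})
    return reply

chat("Tell me about Paris.")
chat("What is the most famous tower there?")  # "there" resolves to Paris via the history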

You can tell that transcription is in progress while you press and hold the Talk button: the button changes color and a microphone icon appears.

"Talk" operation indicator. “Talk” button changes color to orche

Clean up

To clean up your resources, run the following command from the same directory where you ran the Amplify commands:

amplify delete

Result of the "Cleanup" operation after running “amplify delete” command

This command removes the Amplify settings from the React application, Amplify resources, and all Amazon Cognito objects, including the IAM role and Amazon Cognito user pool’s user.

Conclusion

In this post, we presented how to create a fully serverless voice-based contextual chatbot using Amazon Bedrock with Anthropic Claude.

This serves as a starting point for a serverless and cost-effective solution. For example, you could extend the solution to keep persistent conversational memory for your chats using a service such as Amazon DynamoDB. If you want to use a Retrieval Augmented Generation (RAG) approach, you can use Amazon Bedrock Knowledge Bases to securely connect FMs in Amazon Bedrock to your company data.
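
As one possible direction (not part of the sample application), the sketch below stores each conversation turn in a hypothetical DynamoDB table named ChatHistory with a session_id partition key and a numeric turn sort key, so the context can be reloaded after a page refresh; the table name, keys and region are assumptions.

import time
import boto3
from boto3.dynamodb.conditions import Key

# Illustrative sketch: persisting conversation turns in a hypothetical DynamoDB
# table ("ChatHistory", partition key "session_id", sort key "turn") so context
# survives page reloads. Table name, keys, and region are assumptions.
table = boto3.resource("dynamodb", region_name="us-east-1").Table("ChatHistory")

def save_turn(session_id, role, text):
    table.put_item(Item={
        "session_id": session_id,
        "turn": int(time.time() * 1000),  # millisecond timestamp as sort key
        "role": role,
        "text": text,
    })

def load_history(session_id):
    items = table.query(
        KeyConditionExpression=Key("session_id").eq(session_id)
    )["Items"]
    return [
        {"role": item["role"], "content": [{"type": "text", "text": item["text"]}]}
        for item in items
    ]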

Another approach is to customize the model you use in Amazon Bedrock with your own data using fine-tuning or continued pre-training to build applications that are specific to your domain, organization, and use case. With custom models, you can create unique user experiences that reflect your company’s style, voice, and services.

For additional resources, refer to the following:


About the Author

Michael Shapira is a Senior Solution Architect covering general topics in AWS and part of the AWS Machine Learning community. He has 16 years’ experience in Software Development. He finds it fascinating to work with cloud technologies and help others on their cloud journey.

Eitan Sela is a Machine Learning Specialist Solutions Architect with Amazon Web Services. He works with AWS customers to provide guidance and technical assistance, helping them build and operate machine learning solutions on AWS. In his spare time, Eitan enjoys jogging and reading the latest machine learning articles.

Read More

Maintain access and consider alternatives for Amazon Monitron

Amazon Monitron, the Amazon Web Services (AWS) machine learning (ML) service for industrial equipment condition monitoring, will no longer be available to new customers effective October 31, 2024. Existing customers of Amazon Monitron will be able to purchase devices and use the service as normal. We will continue to sell devices until July 2025 and will honor the 5-year device warranty, including service support. AWS continues to invest in security, availability, and performance improvements for Amazon Monitron, but we do not plan to introduce new features to Amazon Monitron.

This post discusses how customers can maintain access to Amazon Monitron after it is closed to new customers and what some alternatives are to Amazon Monitron.

Maintaining access to Amazon Monitron

Customers will be considered existing customers if they have commissioned an Amazon Monitron sensor through a project at any time in the 30 days prior to October 31, 2024. To maintain access to the service after October 31, 2024, customers should create a project and commission at least one sensor.

For any questions or support needed, contact your assigned account manager or solutions architect, or create a case from the AWS Management Console.

Ordering Amazon Monitron hardware

For existing Amazon Business customers, we will allowlist your account for the existing Amazon Monitron devices. For existing Amazon.com retail customers, the Amazon Monitron team will provide specific ordering instructions upon individual request.

Alternatives to Amazon Monitron

For customers interested in an alternative for their condition monitoring needs, we recommend exploring alternative solutions provided by our AWS Partners: Tactical Edge, IndustrAI, and Factory AI.

Summary

While new customers will no longer have access to Amazon Monitron after October 31, 2024, AWS offers a range of partner solutions through the AWS Partner Network finder. Customers should explore these options to understand what works best for their specific needs.

More details can be found in the following resources at AWS Partner Network.


About the author

Stuart Gillen is a Sr. Product Manager for Monitron, at AWS. Stuart has held a variety of roles in engineering management, business development, product management, and consulting. Most of his career has been focused on industrial applications specifically in reliability practices, maintenance systems, and manufacturing.

Read More

Data Formulator: Exploring how AI can help analysts create rich data visualizations

Transforming raw data into meaningful visuals, such as charts, is key to uncovering hidden trends and valuable insights, but even with advances in AI-powered tools, this process remains complex. Integrating AI into the iterative nature of the data visualization process is particularly challenging, as data analysts often struggle to describe complicated tasks in a single text prompt while lacking the direct control of traditional tools. This highlights the need for smarter, more intuitive solutions that combine AI’s precision with the flexibility of hands-on methods.

To address this, we’re excited to release Data Formulator as an open-source research project. This update builds on last year’s release by combining user interface (UI) interactions for designing charts with natural language input for refining details. Unlike the previous version, which required users to choose between the two methods, this unified approach allows them to iteratively solve complex tasks with less effort.

  • Download Data Formulator: transform data and create rich visualizations iteratively with AI.

Figure 1. Data Formulator’s UI

Creating and refining charts with the Concept Encoding Shelf and data threads

With Data Formulator, data analysts can now create charts from scratch or select from existing designs through data threads. The UI features a pane called the “Concept Encoding Shelf,” where users can build their chart by dragging various data fields into it and defining them or by creating new ones. A large language model (LLM) on the backend processes this input, generating the necessary code to produce the visual and updating the data threads for future use. This process is illustrated in Figure 2.

Figure 2. To create a new chart, users can select a previously created chart from the data threads and then use a combination of UI elements and language to describe their intent.

Data threads enable users to review and modify charts they created previously. This iterative process streamlines the editing and refinement process, as the LLM adapts past code to new contexts. Without this feature, users would need to provide more detailed prompts to recreate designs from scratch. This iterative mechanism also allows users to continue updating their charts until they’re satisfied.

Figure 3: Data Formulator’s data threads support complex navigation, quick editing, and the rerunning of previous instructions. 

Data Formulator’s framework

Data Formulator’s architecture separates data transformation from chart configuration, improving both the user experience and AI performance. Upon receiving user specifications, the system follows a three-step process: (1) it generates a Vega-Lite script, which defines how data is visualized; (2) it instructs the AI to handle data transformation; and (3) it creates the chart using the converted data, as illustrated in Figure 4.

Figure 4: Behind the scenes, Data Formulator compiles a Vega-Lite script from the Concept Encoding Shelf (1), prompts the LLM to generate the necessary code for preparation (2), and, upon creating new data, creates the chart (3).
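
To make step (1) concrete, here is the kind of Vega-Lite specification Data Formulator might compile from the Concept Encoding Shelf, written as a plain Python dict for illustration. The field names follow the renewable-energy example in the figures, and the data values would come from the AI-generated transformation in step (2).

import json

# Illustrative sketch: a Vega-Lite specification of the kind compiled in step (1).
# Field names follow the renewable-energy example in the figures; the "values"
# list would be filled with the transformed rows produced in step (2).
vega_lite_spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": []},
    "mark": "line",
    "encoding": {
        "x": {"field": "Year", "type": "ordinal"},
        "y": {"field": "Renewable Percentage", "type": "quantitative"},
        "color": {"field": "Entity", "type": "nominal"},
    },
}

print(json.dumps(vega_lite_spec, indent=2))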

Implications and looking forward

Refining how users interact with AI-powered tools is essential for improving how they communicate their requirements, paving the way for more efficient and effective collaboration. By integrating UI elements and natural language input, we designed Data Formulator to let users define their visualization needs with precision, leading to better results and reducing the need for multiple clarifications.

While Data Formulator addresses some challenges in data transformation and visualization authoring, others remain. For example, how can AI assist in cleaning unstructured data without losing critical information? And how can it help users define clear data analysis goals when starting with ambiguous or undefined objectives? We’re actively investigating these research questions and invite you to contribute by building on the Data Formulator codebase.

Learn more about our research efforts on human-AI interaction by exploring how we design dynamic UI widgets for visualization editing. You can also view a demo of the Data Formulator project on GitHub Codespace.

Acknowledgements

We’d like to thank Bongshin Lee, John Thompson, and Gonzalo Ramos for their feedback and contributions to this project. 

Read More

NVIDIA AI Summit DC: Industry Leaders Gather to Showcase AI’s Real-World Impact

Washington, D.C., is where possibility has always met policy, and AI presents unparalleled opportunities for tackling global challenges.

NVIDIA’s AI Summit in Washington, set for October 7-9, will gather industry leaders to explore how AI addresses some of society’s most significant challenges.

Held at the Ronald Reagan Building and JW Marriott in the heart of the nation’s capital, the event will focus on the potential of AI to drive breakthroughs in healthcare, cybersecurity, manufacturing and more.

Attendees will hear from industry leaders in 50 sessions, live demos and hands-on workshops covering such topics as generative AI, remote sensing, cybersecurity, robotics and industrial digitalization.

Key Speakers and Sessions

Throughout the conference, speakers will touch on sustainability, economic development and AI for good.

A highlight of the event is the special address by Bob Pette, vice president of enterprise platforms at NVIDIA, on October 8.

Pette will explain how NVIDIA’s accelerated computing platform enables advancements in sensor processing, autonomous systems, and digital twins. These AI applications offer wide-reaching benefits across industries.

Following Pette’s keynote, Greg Estes, vice president of corporate marketing and developer programs at NVIDIA, will discuss how the company’s AI platform is empowering millions of developers worldwide.

Estes will provide insights into NVIDIA’s workforce development programs, which are designed to prepare the next generation of AI talent through hands-on training and certifications.

He’ll spotlight NVIDIA’s extensive training initiatives, including those offered at the AI Summit and throughout the year via the NVIDIA Deep Learning Institute, emphasizing how these programs are equipping individuals with the critical skills needed in the AI-driven economy.

Estes will also share examples of successful collaborations with federal and state governments, as well as educational institutions, that are helping to expand AI education and workforce development efforts.

In addition, Estes will highlight opportunities for organizations to partner with NVIDIA in broadening AI training and reskilling initiatives, ensuring that more professionals can contribute to and benefit from the rapid advancements in AI technology.

Other notable speakers include Lisa Einstein, chief AI scientist at the Cybersecurity and Infrastructure Security Agency (CISA), who will offer an executive perspective in her session, “Navigating the Future of Cyber Operations with AI.”

This session will provide critical insights into how AI is transforming the landscape of cyber operations and securing national infrastructure.

Additionally, Sheri Bachstein, CEO of The Weather Company, will focus on how AI-driven tools are addressing environmental challenges like climate monitoring, while Helena Fu, director at the U.S. Department of Energy, will speak to the role of AI in bolstering national security and advancing sustainable technologies.

Breakthroughs and Demos

With more than 60 sessions planned, the summit will explore critical topics such as generative AI, sustainable computing and AI policy.

Key sessions include Kari Briski, vice president of generative AI software product management at NVIDIA, discussing the impact of NVIDIA’s generative AI platform across industries, and Rev Lebaredian, vice president of Omniverse and simulation technology, covering the future of physical AI, robotics and autonomy.

Renee Wegrzyn, director of ARPA-H, and Rory Kelleher, who leads global business development for healthcare and life sciences at NVIDIA, will delve into AI-enabled healthcare, while Tanya Das, from the Bipartisan Policy Center, will examine how AI can drive scientific discovery, economic growth and national security.

Live demos will showcase groundbreaking innovations such as NVIDIA’s Earth-2, a climate forecasting tool, alongside advancements in quantum computing and robotics. A panel of NVIDIA experts, including Nikki Pope and Leon Derczynski, will address the tools ensuring safe and responsible AI deployment.

Hands-on technical workshops will offer attendees opportunities to earn certifications in data science, generative AI and other essential skills for the future workforce.

These sessions will provide participants with the tools needed to help Americans thrive in an AI-driven economy, enhancing productivity and creating new career opportunities.

Networking and Industry Partnerships

The summit will feature over 95 sponsors, including Microsoft, Dell and Lockheed Martin, showcasing how AI is transforming industries.

Attendees will be able to engage with these partners in the expo hall and explore how AI solutions are being implemented to drive positive change in the public and private sectors.

Whether attending in person or virtually, the NVIDIA AI Summit will provide insights into how AI is contributing to solutions for today’s most significant challenges.

Read More