Yotta CEO Sunil Gupta on Supercharging India’s Fast-Growing AI Market

Yotta CEO Sunil Gupta on Supercharging India’s Fast-Growing AI Market

India’s AI market is expected to be massive. Yotta Data Services is setting its sights on supercharging it. In this episode of NVIDIA’s AI Podcast, Sunil Gupta, cofounder, managing director and CEO of Yotta Data Services, speaks with host Noah Kravitz about the company’s Shakti Cloud offering, which provides scalable GPU services for enterprises of all sizes. Yotta is the first Indian cloud services provider in the NVIDIA Partner Network, and its Shakti Cloud is India’s fastest AI supercomputing infrastructure, with 16 exaflops of compute capacity supported by over 16,000 NVIDIA H100 Tensor Core GPUs. Tune in to hear Gupta’s insights on India’s potential as a major AI market and how to balance data center growth with sustainability and energy efficiency.

Stay tuned for more AI Podcast episodes recorded live from GTC.

Time Stamps

1:18: Background on Yotta
2:58: What is Shakti Cloud?
6:44: What does Shakti Cloud mean for India’s tech sector?
10:36: The self-service, scalable capabilities of Shakti Cloud
19:48: Balancing data center growth with sustainability
24:35: Yotta’s work with NVIDIA
27:48: What’s next for Yotta?

You Might Also Like…

Performance AI: Insights From Arthur’s Adam Wenchel – Ep. 221

In this episode of the NVIDIA AI Podcast, recorded live at the GTC 2024, host Noah Kravitz sits down with Adam Wenchel, co-founder and CEO of Arthur, about the company’s technology, which enhances the performance of AI systems across various metrics like accuracy, explainability and fairness.

How the Ohio Supercomputing Center Drives the Future of Computing – Ep. 213

NASCAR races are all about speed, but even the fastest cars need to factor in safety, especially as rules and tracks change. The Ohio Supercomputer Center is ready to help. In this episode of NVIDIA’s AI Podcast, host Noah Kravitz speaks with Alan Chalker, the director of strategic programs at the OSC, about all things supercomputing.

Replit CEO Amjad Masad on Empowering the Next Billion Software Creators – Ep. 201

Replit aims to empower the next billion software creators. In this week’s episode of NVIDIA’s AI Podcast, host Noah Kraviz dives into a conversation with Replit CEO Amjad Masad about bridging the gap between ideas and software, a task simplified by advances in generative AI.

Rendered.ai CEO Nathan Kundtz on Using AI to Build Better AI – Ep. 177

Data is the fuel that makes artificial intelligence run. Training machine learning and AI systems requires data, but compiling quality real-world data for AI and ML can be difficult and expensive. That’s where synthetic data comes in. In this episode of NVIDIA’s AI Podcast, Nathan Kundtz, founder and CEO of Rendered.ai, speaks with host Noah Kravtiz about a platform as a service for creating synthetic data to train AI models.

Subscribe to the AI Podcast

Get the AI Podcast through iTunes, Google Play, Amazon Music, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better: Have a few minutes to spare? Fill out this listener survey.

Read More

Streamline custom model creation and deployment for Amazon Bedrock with Provisioned Throughput using Terraform

Streamline custom model creation and deployment for Amazon Bedrock with Provisioned Throughput using Terraform

As customers seek to incorporate their corpus of knowledge into their generative artificial intelligence (AI) applications, or to build domain-specific models, their data science teams often want to conduct A/B testing and have repeatable experiments. In this post, we discuss a solution that uses infrastructure as code (IaC) to define the process of retrieving and formatting data for model customization and initiating the model customization. This enables you to version and iterate as needed.

With Amazon Bedrock, you can privately and securely customize foundation models (FMs) with your own data to build applications that are specific to your domain, organization, and use case. With custom models, you can create unique user experiences that reflect your company’s style, voice, and services.

Amazon Bedrock supports two methods of model customization:

  • Fine-tuning allows you to increase model accuracy by providing your own task-specific labeled training dataset and further specialize your FMs.
  • Continued pre-training allows you to train models using your own unlabeled data in a secure and managed environment and supports customer-managed keys. Continued pre-training helps models become more domain-specific by accumulating more robust knowledge and adaptability—beyond their original training.

In this post, we provide guidance on how to create an Amazon Bedrock custom model using HashiCorp Terraform that allows you to automate the process, including preparing datasets used for customization.

Terraform is an IaC tool that allows you to manage AWS resources, software as a service (SaaS) resources, datasets, and more, using declarative configuration. Terraform provides the benefits of automation, versioning, and repeatability.

Solution overview

We use Terraform to download a public dataset from the Hugging Face Hub, convert it to JSONL format, and upload it to an Amazon Simple Storage Service (Amazon S3) bucket with a versioned prefix. We then create an Amazon Bedrock custom model using fine-tuning, and create a second model using continued pre-training. Lastly, we configure Provisioned Throughput for our new models so we can test and deploy the custom models for wider usage.

The following diagram illustrates the solution architecture.

Diagram depicting Amazon Bedrock Custom Model creation process using Terraform.

The workflow includes the following steps:

  1. The user runs the terraform apply The Terraform local-exec provisioner is used to run a Python script that downloads the public dataset DialogSum from the Hugging Face Hub. This is then used to create a fine-tuning training JSONL file.
  2. An S3 bucket stores training, validation, and output data. The generated JSONL file is uploaded to the S3 bucket.
  3. The FM defined in the Terraform configuration is used as the source for the custom model training job.
  4. The custom model training job uses the fine-tuning training data stored in the S3 bucket to enrich the FM. Amazon Bedrock is able to access the data in the S3 bucket (including output data) due to the AWS Identity and Access Management (IAM) role defined in the Terraform configuration, which grants access to the S3 bucket.
  5. When the custom model training job is complete, the new custom model is available for use.

The high-level steps to implement this solution are as follows:

  1. Create and initialize a Terraform project.
  2. Create data sources for context lookup.
  3. Create an S3 bucket to store training, validation, and output data.
  4. Create an IAM service role that allows Amazon Bedrock to run a model customization job, access your training and validation data, and write your output data to your S3 bucket.
  5. Configure your local Python virtual environment.
  6. Download the DialogSum public dataset and convert it to JSONL.
  7. Upload the converted dataset to Amazon S3.
  8. Create an Amazon Bedrock custom model using fine-tuning.
  9. Configure custom model Provisioned Throughput for your models.

Prerequisites

This solution requires the following prerequisites:

Create and initialize a Terraform project

Complete the following steps to create a new Terraform project and initialize it. You can work in a local folder of your choosing.

  1. In your preferred terminal, create a new folder named bedrockcm and change to that folder:
    1. If on Windows, use the following code:
      md bedrockcm
      cd bedrockcm

    2. If on Mac or Linux, use the following code:
      mkdir bedrockcm
      cd bedrockcm

Now you can work in a text editor and enter in code.

  1. In your preferred text editor, add a new file with the following Terraform code:
terraform {
  required_version = ">= 1.0.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.35.0"
    }
  }
}
  1. Save the file in the root of the bedrockcm folder and name it main.tf.
  2. In your terminal, run the following command to initialize the Terraform working directory:
terraform init

The output will contain a successful message like the following:

“Terraform has been successfully initialized”

  1. In your terminal, validate the syntax for your Terraform files:
terraform validate

Create data sources for context lookup

The next step is to add configurations that define data sources that look up information about the context Terraform is currently operating in. These data sources are used when defining the IAM role and policies and when creating the S3 bucket. More information can be found in the Terraform documentation for aws_caller_identity, aws_partition, and aws_region.

  1. In your text editor, add the following Terraform code to your main.tf file:
# Data sources to query the current context Terraform is operating in
data "aws_caller_identity" "current" {}
data "aws_partition" "current" {}
data "aws_region" "current" {}
  1. Save the file.

Create an S3 bucket

In this step, you use Terraform to create an S3 bucket to use during model customization and associated outputs. S3 bucket names are globally unique, so you use the Terraform data source aws_caller_identity, which allows you to look up the current AWS account ID, and use string interpolation to include the account ID in the bucket name. Complete the following steps:

  1. Add the following Terraform code to your main.tf file:
# Create a S3 bucket
resource "aws_s3_bucket" "model_training" {
  bucket = "model-training-${data.aws_caller_identity.current.account_id}"
}
  1. Save the file.

Create an IAM service role for Amazon Bedrock

Now you create the service role that Amazon Bedrock will assume to operate the model customization jobs.

You first create a policy document, assume_role_policy, which defines the trust relationship for the IAM role. The policy allows the bedrock.amazonaws.com service to assume this role. You use global condition context keys for cross-service confused deputy prevention. There are also two conditions you specify: the source account must match the current account, and the source ARN must be an Amazon Bedrock model customization job operating from the current partition, AWS Region, and current account.

Complete the following steps:

  1. Add the following Terraform code to your main.tf file:
# Create a policy document to allow Bedrock to assume the role
data "aws_iam_policy_document" "assume_role_policy" {
  statement {
    actions = ["sts:AssumeRole"]
    effect  = "Allow"
    principals {
      type        = "Service"
      identifiers = ["bedrock.amazonaws.com"]
    }
    condition {
      test     = "StringEquals"
      variable = "aws:SourceAccount"
      values   = [data.aws_caller_identity.current.account_id]
    }
    condition {
      test     = "ArnEquals"
      variable = "aws:SourceArn"
      values   = ["arn:${data.aws_partition.current.partition}:bedrock:${data.aws_region.current.name}:${data.aws_caller_identity.current.account_id}:model-customization-job/*"]
    }
  }
}

The second policy document, bedrock_custom_policy, defines permissions for accessing the S3 bucket you created for model training, validation, and output. The policy allows the actions GetObject, PutObject, and ListBucket on the resources specified, which are the ARN of the model_training S3 bucket and all of the buckets contents. You will then create an aws_iam_policy resource, which creates the policy in AWS.

  1. Add the following Terraform code to your main.tf file:
# Create a policy document to allow Bedrock to access the S3 bucket
data "aws_iam_policy_document" "bedrock_custom_policy" {
  statement {
    sid       = "AllowS3Access"
    actions   = ["s3:GetObject", "s3:PutObject", "s3:ListBucket"]
    resources = [aws_s3_bucket.model_training.arn, "${aws_s3_bucket.model_training.arn}/*"]
  }
}

resource "aws_iam_policy" "bedrock_custom_policy" {
  name_prefix = "BedrockCM-"
  description = "Policy for Bedrock Custom Models customization jobs"
  policy      = data.aws_iam_policy_document.bedrock_custom_policy.json
}

Finally, the aws_iam_role resource, bedrock_custom_role, creates an IAM role with a name prefix of BedrockCM- and a description. The role uses assume_role_policy as its trust policy and bedrock_custom_policy as a managed policy to allow the actions specified.

  1. Add the following Terraform code to your main.tf file:
# Create a role for Bedrock to assume
resource "aws_iam_role" "bedrock_custom_role" {
  name_prefix = "BedrockCM-"
  description = "Role for Bedrock Custom Models customization jobs"

  assume_role_policy  = data.aws_iam_policy_document.assume_role_policy.json
  managed_policy_arns = [aws_iam_policy.bedrock_custom_policy.arn]
}
  1. Save the file.

Configure your local Python virtual environment

Python supports creating lightweight virtual environments, each with their own independent set of Python packages installed. You create and activate a virtual environment, and then install the datasets package.

  1. In your terminal, in the root of the bedrockcm folder, run the following command to create a virtual environment:
python3 -m venv venv
  1. Activate the virtual environment:
    1. If on Windows, use the following command:
      venvScriptsactivate

    2. If on Mac or Linux, use the following command:
      source venv/bin/activate

Now you install the datasets package via pip.

  1. In your terminal, run the following command to install the datasets package:
pip3 install datasets

Download the public dataset

You now use Terraform’s local-exec provisioner to invoke a local Python script that will download the public dataset DialogSum from the Hugging Face Hub. The dataset is already divided into training, validation, and testing splits. This example uses just the training split.

You prepare the data for training by removing the id and topic columns, renaming the dialogue and summary columns, and truncating the dataset to 10,000 records. You then save the dataset in JSONL format. You could also use your own internal private datasets; we use a public dataset for example purposes.

You first create the local Python script named dialogsum-dataset-finetune.py, which is used to download the dataset and save it to disk.

  1. In your text editor, add a new file with the following Python code:
import pandas as pd
from datasets import load_dataset

# Load the dataset from the huggingface hub
dataset = load_dataset("knkarthick/dialogsum")

# Convert the dataset to a pandas DataFrame
dft = dataset['train'].to_pandas()

# Drop the columns that are not required for fine-tuning
dft = dft.drop(columns=['id', 'topic'])

# Rename the columns to prompt and completion as required for fine-tuning.
# Ref: https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-prereq.html#model-customization-prepare
dft = dft.rename(columns={"dialogue": "prompt", "summary": "completion"})

# Limit the number of rows to 10,000 for fine-tuning
dft = dft.sample(10000,
    random_state=42)

# Save DataFrame as a JSONL file, with each line as a JSON object
dft.to_json('dialogsum-train-finetune.jsonl', orient='records', lines=True)
  1. Save the file in the root of the bedrockcm folder and name it dialogsum-dataset-finetune.py.

Next, you edit the main.tf file you have been working in and add the terraform_data resource type, uses a local provisioner to invoke your Python script.

  1. In your text editor, edit the main.tf file and add the following Terraform code:
resource "terraform_data" "training_data_fine_tune_v1" {
  input = "dialogsum-train-finetune.jsonl"

  provisioner "local-exec" {
    command = "python dialogsum-dataset-finetune.py"
  }
}

Upload the converted dataset to Amazon S3

Terraform provides the aws_s3_object resource type, which allows you to create and manage objects in S3 buckets. In this step, you reference the S3 bucket you created earlier and the terraform_data resource’s output attribute. This output attribute is how you instruct the Terraform resource graph that these resources need to be created with a dependency order.

  1. In your text editor, edit the main.tf file and add the following Terraform code:
resource "aws_s3_object" "v1_training_fine_tune" {
  bucket = aws_s3_bucket.model_training.id
  key    = "training_data_v1/${terraform_data.training_data_fine_tune_v1.output}"
  source = terraform_data.training_data_fine_tune_v1.output
}

Create an Amazon Bedrock custom model using fine-tuning

Amazon Bedrock has multiple FMs that support customization with fine-tuning. To see a list of the models available, use the following AWS Command Line Interface (AWS CLI) command:

  1. In your terminal, run the following command to list the FMs that support customization by fine-tuning:
aws bedrock list-foundation-models --by-customization-type FINE_TUNING

You use the Cohere Command-Light FM for this model customization. You add a Terraform data source to query the foundation model ARN using the model name. You then create the Terraform resource definition for aws_bedrock_custom_model, which creates a model customization job, and immediately returns.

The time it takes for model customization is non-deterministic, and is based on the input parameters, model used, and other factors.

  1. In your text editor, edit the main.tf file and add the following Terraform code:
data "aws_bedrock_foundation_model" "cohere_command_light_text_v14" {
  model_id = "cohere.command-light-text-v14:7:4k"
}

resource "aws_bedrock_custom_model" "cm_cohere_v1" {
  custom_model_name     = "cm_cohere_v001"
  job_name              = "cm.command-light-text-v14.v001"
  base_model_identifier = data.aws_bedrock_foundation_model.cohere_command_light_text_v14.model_arn
  role_arn              = aws_iam_role.bedrock_custom_role.arn
  customization_type    = "FINE_TUNING"

  hyperparameters = {
    "epochCount"             = "1"
    "batchSize"              = "8"
    "learningRate"           = "0.00001"
    "earlyStoppingPatience"  = "6"
    "earlyStoppingThreshold" = "0.01"
    "evalPercentage"         = "20.0"
  }

  output_data_config {
    s3_uri = "s3://${aws_s3_bucket.model_training.id}/output_data_v1/"
  }

  training_data_config {
    s3_uri = "s3://${aws_s3_bucket.model_training.id}/training_data_v1/${terraform_data.training_data_fine_tune_v1.output}"
  }
}
  1. Save the file.

Now you use Terraform to create the data sources and resources defined in your main.tf file, which will start a model customization job.

  1. In your terminal, run the following command to validate the syntax for your Terraform files:
terraform validate
  1. Run the following command to apply the configuration you created. Before creating the resources, Terraform will describe all the resources that will be created so you can verify your configuration:
terraform apply

Terraform will generate a plan and ask you to approve the actions, which will look similar to the following code:

...

Plan: 6 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value:
  1. Enter yes to approve the changes.

Terraform will now apply your configuration. This process runs for a few minutes. At this time, your custom model is not yet ready for use; it will be in a Training state. Wait for training to finish before continuing. You can review the status on the Amazon Bedrock console on the Custom models page.

Screenshot of Amazon Bedrock Console training a custom model

When the process is complete, you receive a message like the following:

Apply complete! Resources: 6 added, 0 changed, 0 destroyed.

You can also view the status on the Amazon Bedrock console.

Screenshot of Amazon Bedrock Console displaying a custom model training job in 'completed' status.

You have now created an Amazon Bedrock custom model using fine-tuning.

Configure custom model Provisioned Throughput

Amazon Bedrock allows you to run inference on custom models by purchasing Provisioned Throughput. This guarantees a consistent level of throughput in exchange for a term commitment. You specify the number of model units needed to meet your application’s performance needs. For evaluating custom models initially, you can purchase Provisioned Throughput hourly (on-demand) with no long-term commitment. With no commitment, a quota of one model unit is available per Provisioned Throughput.

You create a new resource for Provisioned Throughput, associate one of your custom models, and provide a name. You omit the commitment_duration attribute to use on-demand.

  1. In your text editor, edit the main.tf file and add the following Terraform code:
resource "aws_bedrock_provisioned_model_throughput" "cm_cohere_provisioned_v1" {
  provisioned_model_name = "${aws_bedrock_custom_model.cm_cohere_v1.custom_model_name}-provisioned"
  model_arn              = aws_bedrock_custom_model.cm_cohere_v1.custom_model_arn
  model_units            = 1 
}
  1. Save the file.

Now you use Terraform to create the resources defined in your main.tf file.

  1. In your terminal, run the following command to re-initialize the Terraform working directory:
terraform init

The output will contain a successful message like the following:

“Terraform has been successfully initialized”
  1. Validate the syntax for your Terraform files:
terraform validate
  1. Run the following command to apply the configuration you created:
terraform apply

Best practices and considerations

Note the following best practices when using this solution:

  • Data and model versioning – You can version your datasets and models by using version identifiers in your S3 bucket prefixes. This allows you to compare model efficacy and outputs. You could even operate a new model in a shadow deployment so that your team can evaluate the output relative to your models being used in production.
  • Data privacy and network security – With Amazon Bedrock, you are in control of your data, and all your inputs and customizations remain private to your AWS account. Your data, such as prompts, completions, custom models, and data used for fine-tuning or continued pre-training, is not used for service improvement and is never shared with third-party model providers. Your data remains in the Region where the API call is processed. All data is encrypted in transit and at rest. You can use AWS PrivateLink to create a private connection between your VPC and Amazon Bedrock.
  • Billing – Amazon Bedrock charges for model customization, storage, and inference. Model customization is charged per tokens processed. This is the number of tokens in the training dataset multiplied by the number of training epochs. An epoch is one full pass through the training data during customization. Model storage is charged per month, per model. Inference is charged hourly per model unit using Provisioned Throughput. For detailed pricing information, see Amazon Bedrock Pricing.
  • Custom models and Provisioned Throughput – Amazon Bedrock allows you to run inference on custom models by purchasing Provisioned Throughput. This guarantees a consistent level of throughput in exchange for a term commitment. You specify the number of model units needed to meet your application’s performance needs. For evaluating custom models initially, you can purchase Provisioned Throughput hourly with no long-term commitment. With no commitment, a quota of one model unit is available per Provisioned Throughput. You can create up to two Provisioned Throughputs per account.
  • Availability – Fine-tuning support on Meta Llama 2, Cohere Command Light, and Amazon Titan Text FMs is available today in Regions US East (N. Virginia) and US West (Oregon). Continued pre-training is available today in public preview in Regions US East (N. Virginia) and US West (Oregon). To learn more, visit the Amazon Bedrock Developer Experience and check out Custom models.

Clean up

When you no longer need the resources created as part of this post, clean up those resources to save associated costs. You can clean up the AWS resources created in this post using Terraform with the terraform destroy command.

First, you need to modify the configuration of the S3 bucket in the main.tf file to enable force destroy so the contents of the bucket will be deleted, so the bucket itself can be deleted. This will remove all of the sample data contained in the S3 bucket as well as the bucket itself. Make sure there is no data you want to retain in the bucket before proceeding.

  1. Modify the declaration of your S3 bucket to set the force_destroy attribute of the S3 bucket:
# Create a S3 bucket
resource "aws_s3_bucket" "model_training" {
  bucket = "model-training-${data.aws_caller_identity.current.account_id}"
  force_destroy = true
}
  1. Run the terraform apply command to update the S3 bucket with this new configuration:
terraform apply
  1. Run the terraform destroy command to delete all resources created as part of this post:
terraform destroy

Conclusion

In this post, we demonstrated how to create Amazon Bedrock custom models using Terraform. We introduced GitOps to manage model configuration and data associated with your custom models.

We recommend testing the code and examples in your development environment, and making appropriate changes as required to use them in production. Consider your model consumption requirements when defining your Provisioned Throughput.

We welcome your feedback! If you have questions or suggestions, leave them in the comments section.


About the Authors

Josh Famestad is a Solutions Architect at AWS helping public sector customers accelerate growth, add agility, and reduce risk with cloud-based solutions.

Kevon Mayers is a Solutions Architect at AWS. Kevon is a Core Contributor for Terraform and has led multiple Terraform initiatives within AWS. Prior to joining AWS, he was working as a DevOps engineer and developer, and before that was working with the GRAMMYs/The Recording Academy as a studio manager, music producer, and audio engineer.

Tyler Lynch is a Principal Solution Architect at AWS. Tyler leads Terraform provider engineering at AWS and is a Core Contributor for Terraform.

Read More

Boost productivity with video conferencing transcripts and summaries with the Amazon Chime SDK Meeting Summarizer solution

Boost productivity with video conferencing transcripts and summaries with the Amazon Chime SDK Meeting Summarizer solution

Businesses today heavily rely on video conferencing platforms for effective communication, collaboration, and decision-making. However, despite the convenience these platforms offer, there are persistent challenges in seamlessly integrating them into existing workflows. One of the major pain points is the lack of comprehensive tools to automate the process of joining meetings, recording discussions, and extracting actionable insights from them. This gap results in inefficiencies, missed opportunities, and limited productivity, hindering the seamless flow of information and decision-making processes within organizations.

To address this challenge, we’ve developed the Amazon Chime SDK Meeting Summarizer application deployed with the Amazon Cloud Development Kit (AWS CDK). This application uses an Amazon Chime SDK SIP media application, Amazon Transcribe, and Amazon Bedrock to seamlessly join meetings, record meeting audio, and process recordings for transcription and summarization. By integrating these services programmatically through the AWS CDK, we aim to streamline the meeting workflow, empower users with actionable insights, and drive better decision-making outcomes. Our solution currently integrates with popular platforms such as Amazon Chime, Zoom, Cisco Webex, Microsoft Teams, and Google Meet.

In addition to deploying the solution, we’ll also teach you the intricacies of prompt engineering in this post. We guide you through addressing parsing and information extraction challenges, including speaker diarization, call scheduling, summarization, and transcript cleaning. Through detailed instructions and structured approaches tailored to each use case, we illustrate the effectiveness of Amazon Bedrock, powered by Anthropic Claude models.

Solution overview

The following infrastructure diagram provides an overview of the AWS services that are used to create this meeting summarization bot. The core services used in this solution are:

  • An Amazon Chime SDK SIP Media Application is used to dial into the meeting and record meeting audio
  • Amazon Transcribe is used to perform speech-to-text processing of the recorded audio, including speaker diarization
  • Anthropic Claude models in Amazon Bedrock are used to identify names, improve the quality of the transcript, and provide a detailed summary of the meeting

For a detailed explanation of the solution, refer to the Amazon Chime Meeting Summarizer documentation.

Prerequisites

Before diving into the project setup, make sure you have the following requirements in place:

  • Yarn – Yarn must be installed on your machine.
  • AWS account – You’ll need an active AWS account.
  • Enable Claude Anthropic models – These models should be enabled in your AWS account. For more information, see Model access.
  • Enable Amazon Titan – Amazon Titan should be activated in your AWS account. For more information, see Amazon Titan Models.

Refer to our GitHub repository for a step-by-step guide on deploying this solution.

Access the meeting with Amazon Chime SDK

To capture the audio of a meeting, an Amazon Chime SDK SIP media application will dial into the meeting using the meeting provider’s dial-in number. The Amazon Chime SDK SIP media application (SMA) is a programable telephony service that will make a phone call over the public switched telephone network (PSTN) and capture the audio. SMA uses a request/response model with an AWS Lambda function to process actions. In this demo, an outbound call is made using the CreateSipMediaApplicationCall API. This will cause the Lambda function to be invoked with a NEW_OUTBOUND_CALL event.

Because most dial-in mechanisms for meetings require a PIN or other identification to be made, the SMA will use the SendDigits action to send these dual tone multi-frequency (DTMF) digits to the meeting provider. When the application has joined the meeting, it will introduce itself using the Speak action and then record the audio using the RecordAudio action. This audio will be saved in MP3 format and saved to an Amazon Simple Storage Service (Amazon S3) bucket.

Speaker diarization with Amazon Transcribe

Because the SMA is joined to the meeting as a participant, the audio will be a single channel of all the participants. To process this audio file, Amazon Transcribe will be used with the ShowSpeakerLabels setting:

const response = await transcribeClient.send(
    new StartTranscriptionJobCommand({
        TranscriptionJobName: jobName,
        IdentifyLanguage: true,
        MediaFormat: 'wav',
        Media: {
            MediaFileUri: audioSource,
        },
        Settings: {
            ShowSpeakerLabels: true,
            MaxSpeakerLabels: 10,
        },
        OutputBucketName: `${BUCKET}`,
        OutputKey: `${PREFIX_TRANSCRIBE_S3}/`,
    }),
);

With speaker diarization, Amazon Transcribe will distinguish different speakers in the transcription output. The JSON file that is produced will include the transcripts and items along with speaker labels grouped by speaker with start and end timestamps. With this information provided by Amazon Transcribe, a turn-by-turn transcription can be generated by parsing through the JSON. The result will be a more readable transcription. See the following example:

spk_0: Hey Court , how’s it going ?
spk_1: Hey Adam, it’s going good . How are you
spk_0: doing ? Well , uh hey , thanks for uh joining me today on this call . I’m excited to talk to you about uh architecting on Aws .
spk_1: Awesome . Yeah , thank you for inviting me . So ,
spk_0: uh can you tell me a little bit about uh the servers you’re currently using on premises ?
spk_1: Yeah . So for our servers , we currently have four web servers running with four gigabyte of RA M and two CP US and we’re currently running Linux as our operating system .
spk_0: Ok . And what are you using for your database ?
spk_1: Oh , yeah , for a database , we currently have a 200 gigabyte database running my as to will and I can’t remember the version . But um the thing about our database is sometimes it lags . So we’re really looking for a faster option for
spk_0: that . So , um when you’re , you’re noticing lags with reads or with rights to the database
spk_1: with , with reads .
spk_0: Yeah . Ok . Have you heard of uh read replicas ?
spk_1: I have not .
spk_0: Ok . That could be a potential uh solution to your problem . Um If you don’t mind , I’ll go ahead and just uh send you some information on that later for you and your team to review .
spk_1: Oh , yeah , thank you , Adam . Yeah . Anything that could help . Yeah , be really helpful .
spk_0: Ok , last question before I let you go . Um what are you doing uh to improve the security of your on premises ? Uh data ?
spk_1: Yeah , so , so number one , we have been experiencing some esto injection attacks . We do have a Palo Alto firewall , but we’re not able to fully achieve layer server protection . So we do need a better option for that .
spk_0: Have you ex have you experienced uh my sequel attacks in the past or sequel injections ?
spk_1: Yes.
spk_0: Ok , great . Uh All right . And then are you guys bound by any compliance standards like PC I DS S or um you know GDR ? Uh what’s another one ? Any of those C just uh
spk_1: we are bound by fate , moderate complaints . So .
spk_0: Ok . Well , you have to transition to fed ramp high at any time .
spk_1: Uh Not in the near future . No .
spk_0: Ok . All right , Court . Well , thanks for that uh context . I will be reaching out to you again here soon with a follow up email and uh yeah , I’m looking forward to talking to you again uh next week .
spk_1: Ok . Sounds good . Thank you , Adam for your help .
spk_0: All right . Ok , bye .
spk_1: All right . Take care .

Here, speakers have been identified based on the order they spoke. Next, we show you how to further enhance this transcription by identifying speakers using their names, rather than spk_0, spk_1, and so on.

Use Anthropic Claude and Amazon Bedrock to enhance the solution

This application uses a large language model (LLM) to complete the following tasks:

  • Speaker name identification
  • Transcript cleaning
  • Call summarization
  • Meeting invitation parsing

Prompt engineering for speaker name identification

The first task is to enhance the transcription by assigning names to speaker labels. These names are extracted from the transcript itself when a person introduces themselves and then are returned as output in JSON format by the LLM.

Special instructions are provided for cases where only one speaker is identified to provide consistency in the response structure. By following these instructions, the LLM will process the meeting transcripts and accurately extract the names of the speakers without additional words or spacing in the response. If no names are identified by the LLM, we prompted the model to return an Unknown tag.

In this demo, the prompts are designed using Anthropic Claude Sonnet as the LLM. You may need to tune the prompts to modify the solution to use another available model on Amazon Bedrock.

Human: You are a meeting transcript names extractor. Go over the transcript and extract the names from it. Use the following instructions in the <instructions></instructions> xml tags
<transcript> ${transcript} </transcript>
<instructions>
– Extract the names like this example – spk_0: “name1”, spk_1: “name2”.
– Only extract the names like the example above and do not add any other words to your response
– Your response should only have a list of “speakers” and their associated name separated by a “:” surrounded by {}
– if there is only one speaker identified then surround your answer with {}
– the format should look like this {“spk_0” : “Name”, “spk_1: “Name2”, etc.}, no unnecessary spacing should be added
</instructions>

Assistant: Should I add anything else in my answer?

Human: Only return a JSON formatted response with the Name and the speaker label associated to it. Do not add any other words to your answer. Do NOT EVER add any introductory sentences in your answer. Only give the names of the speakers actively speaking in the meeting. Only give the names of the speakers actively speaking in the meeting in the format shown above.

Assistant:

After the speakers are identified and returned in JSON format, we can replace the generic speaker labels with the name attributed speaker labels in the transcript. The result will be a more enhanced transcription:

Adam: Hey Court , how’s it going ?
Court: Hey Adam , it’s going good . How are you
Adam: doing ? Well , uh hey , thanks for uh joining me today on this call . I’m excited to talk to you about uh architecting on Aws .
Court: Awesome . Yeah , thank you for inviting me . So ,
Adam: uh can you tell me a little bit about uh the servers you’re currently using on premises ?
Court: Yeah . So for our servers , we currently have four web servers running with four gigabyte of RA M and two CP US and we’re currently running Linux as our operating system .

… transcript continues….

But what if a speaker can’t be identified because they never introduced themselves. In that case, we want to let the LLM leave them as unknown, rather than try to force or a hallucinate a label.

We can add the following to the instructions:

If no name is found for a speaker, use UNKNOWN_X where X is the speaker label number

The following transcript has three speakers, but only two identified speakers. The LLM must label a speaker as UNKNOWN rather than forcing a name or other response on the speaker.

spk_0: Yeah .
spk_1: Uh Thank you for joining us Court . I am your account executive here at Aws . Uh Joining us on the call is Adam , one of our solutions architect at Aws . Adam would like to introduce yourself .
spk_2: Hey , everybody . High Court . Good to meet you . Uh As your dedicated solutions architect here at Aws . Uh my job is to help uh you at every step of your cloud journey . Uh My , my role involves helping you understand architecting on Aws , including best practices for the cloud . Uh So with that , I’d love it if you could just take a minute to introduce yourself and maybe tell me a little about what you’re currently running on premises .
spk_0: Yeah , great . It’s uh great to meet you , Adam . Um My name is Court . I am the V P of engineering here at any company . And uh yeah , really excited to hear what you can do for us .
spk_1: Thanks , work . Well , we , we talked a little bit of last time about your goals for migrating to Aws . I invited Adam to , to join us today to get a better understanding of your current infrastructure and other technical requirements .
spk_2: Yeah . So co could you tell me a little bit about what you’re currently running on premises ?
spk_1: Sure . Yeah , we’re , uh ,
spk_0: we’re running a three tier web app on premise .

When we give Claude Sonnet the option to not force a name , we see results like this:

{“spk_0”: “Court”, “spk_1”: “UNKNOWN_1”, “spk_2”: “Adam”}

Prompt engineering for cleaning the transcript

Now that the transcript has been diarized with speaker attributed names, we can use Amazon Bedrock to clean the transcript. Cleaning tasks include eliminating distracting filler words, identifying and rectifying misattributed homophones, and addressing any diarization errors stemming from subpar audio quality. For guidance on accomplishing these tasks using Anthropic Claude Sonnet models, see the provided prompt:

Human: You are a transcript editor, please follow the <instructions> tags.
<transcript> ${transcript} </transcript>
<instructions>
– The <transcript> contains a speaker diarized transcript
– Go over the transcript and remove all filler words. For example “um, uh, er, well, like, you know, okay, so, actually, basically, honestly, anyway, literally, right, I mean.”
– Fix any errors in transcription that may be caused by homophones based on the context of the sentence. For example, “one instead of won” or “high instead of hi”
– In addition, please fix the transcript in cases where diarization is improperly performed. For example, in some cases you will see that sentences are split between two speakers. In this case infer who the actual speaker is and attribute it to them.

– Please review the following example of this,

Input Example
Court: Adam you are saying the wrong thing. What
Adam: um do you mean, Court?

Output:
Court: Adam you are saying the wrong thing.
Adam: What do you mean, Court?

– In your response, return the entire cleaned transcript, including all of the filler word removal and the improved diarization. Only return the transcript, do not include any leading or trailing sentences. You are not summarizing. You are cleaning the transcript. Do not include any xml tags <>
</instructions>
Assistant:

Transcript Processing

After the initial transcript is passed into the LLM, it returns a polished transcript, free from errors. The following are excerpts from the transcript:

Speaker Identification

Input:

spk_0: Hey Court , how’s it going ?
spk_1: Hey Adam, it’s going good . How are you
spk_0: doing ? Well , uh hey , thanks for uh joining me today on this call . I’m excited to talk to you about uh architecting on Aws .

Output:

Adam: Hey Court, how’s it going?
Court: Hey Adam, it’s going good. How are you?
Adam: Thanks for joining me today. I’m excited to talk to you about architecting on AWS

Homophone Replacement

Input:

Adam: Have you ex have you experienced uh my sequel attacks in the past or sequel injections ?
Court: Yes .

Output:

Adam: Have you experienced SQL injections in the past?
Court: Yes.

Filler Word Removal

Input:

Adam: Ok , great . Uh All right . And then are you guys bound by any compliance standards like PC I DS S or um you know GDR ? Uh what’s another one ? Any of those C just uh
Court: we are bound by fate , moderate complaints . So .
Adam: Ok . Well , you have to transition to fed ramp high at any time

Output:

Adam: Ok, great. And then are you guys bound by any compliance standards like PCI DSS or GDPR? What’s another one? Any of those? CJIS?
Court: We are bound by FedRAMP moderate compliance.
Adam: Ok. Will you have to transition to FedRAMP high at any time?

Prompt engineering for summarization

Now that the transcript has been created by Amazon Transcribe, diarized, and enhanced with Amazon Bedrock, the transcript can be summarized with Amazon Bedrock and an LLM.  A simple version might look like the following:

Human:
You are a transcript summarizing bot. You will go over the transcript below and provide a summary of the transcript.
Transcript: ${transcript}

Assistant:

Although this will work, it can be improved with additional instructions. You can use XML tags to provide structure and clarity to the task requirements:

Human: You are a transcript summarizing bot. You will go over the transcript below and provide a summary of the content within the <instructions> tags.
<transcript> ${transcript} </transcript>

<instructions>
– Go over the conversation that was had in the transcript.
– Create a summary based on what occurred in the meeting.
– Highlight specific action items that came up in the meeting, including follow-up tasks for each person.
</instructions>

Assistant:

Because meetings often involve action items and date/time information, instructions are added to explicitly request the LLM to include this information. Because the LLM knows the speaker names, each person is assigned specific action items for them if any are found. To prevent hallucinations, an instruction is included that allows the LLM to fail gracefully.

Human: You are a transcript summarizing bot. You will go over the transcript below and provide a summary of the content within the <instructions> tags.

<transcript> ${transcript} </transcript>

<instructions>
– Go over the conversation that was had in the transcript.
– Create a summary based on what occurred in the meeting.
– Highlight specific action items that came up in the meeting, including follow-up tasks for each person.
– If relevant, focus on what specific AWS services were mentioned during the conversation.
– If there’s sufficient context, infer the speaker’s role and mention it in the summary. For instance, “Bob, the customer/designer/sales rep/…”
– Include important date/time, and indicate urgency if necessary (e.g., deadline/ETAs for outcomes and next steps)
</instructions>

Assistant: Should I add anything else in my answer?

Human: If there is not enough context to generate a proper summary, then just return a string that says “Meeting not long enough to generate a transcript.

Assistant:

Alternatively, we invite you to explore Amazon Transcribe Call Analytics generative call summarization for an out-of-the-box solution that integrates directly with Amazon Transcribe.

Prompt engineering for meeting invitation parsing

Included in this demo is a React-based UI that will start the process of the SMA joining the meeting. Because this demo supports multiple meeting types, the invitation must be parsed. Rather than parse this with a complex regular expression (regex), it will be processed with an LLM. This prompt will first identify the meeting type: Amazon Chime, Zoom, Google Meet, Microsoft Teams, or Cisco Webex. Based on the meeting type, the LLM will extract the meeting ID and dial-in information. Simply copy/paste the meeting invitation to the UI, and the invitation will be processed by the LLM to determine how to call the meeting provider. This can be done for a meeting that is currently happening or scheduled for a future meeting. See the following example:

Human: You are an information extracting bot. Go over the meeting invitation below and determine what the meeting id and meeting type are <instructions></instructions> xml tags

<meeting_invitation>
${meetingInvitation}
</meeting_invitation>

<instructions>
1. Identify Meeting Type:
Determine if the meeting invitation is for Chime, Zoom, Google, Microsoft Teams, or Webex meetings.

2. Chime, Zoom, and Webex
– Find the meetingID
– Remove all spaces from the meeting ID (e.g., #### ## #### -> ##########).

3. If Google – Instructions Extract Meeting ID and Dial in
– For Google only, the meeting invitation will call a meetingID a ‘pin’, so treat it as a meetingID
– Remove all spaces from the meeting ID (e.g., #### ## #### -> ##########).
– Extract Google and Microsoft Dial-In Number (if applicable):
– If the meeting is a Google meeting, extract the unique dial-in number.
– Locate the dial-in number following the text “to join by phone dial.”
– Format the extracted Google dial-in number as (+1 ###-###-####), removing dashes and spaces. For example +1 111-111-1111 would become +11111111111)

4. If Microsoft Teams – Instructions if meeting type is Microsoft Teams.
– Pay attention to these instructions carefully
– The meetingId we want to store in the generated response is the ‘Phone Conference ID’ : ### ### ###
– in the meeting invitation, there are two IDs a ‘Meeting ID’ (### ### ### ##) and a ‘Phone Conference ID’ (### ### ###), ignore the ‘Meeting ID’ use the ‘Phone Conference ID’
– The meetingId we want to store in the generated response is the ‘Phone Conference ID’ : ### ### ###
– The meetingID that we want is referenced as the ‘Phone Conference ID’ store that one as the meeting ID.
– Find the phone number, extract it and store it as the dialIn number (format (+1 ###-###-####), removing dashes and spaces. For example +1 111-111-1111 would become +11111111111)

5. meetingType rules
– The only valid responses for meetingType are ‘Chime’, ‘Webex’, ‘Zoom’, ‘Google’, ‘Teams’

6. Generate Response:
– Create a response object with the following format:
{
meetingId: “meeting id goes here with spaces removed”,
meetingType: “meeting type goes here (options: ‘Chime’, ‘Webex’, ‘Zoom’, ‘Google’, ‘Teams’)”,
dialIn: “Insert Google/Microsoft Teams Dial-In number with no dashes or spaces, or N/A if not a Google/Microsoft Teams Meeting”
}

Meeting ID Formats:
Zoom: ### #### ####
Webex: #### ### ####
Chime: #### ## ####
Google: ### ### ####
Teams: ### ### ###

Ensure that the program does not create fake phone numbers and only includes the Microsoft or Google dial-in number if the meeting type is Google or Teams.

</instructions>

Assistant: Should I add anything else in my answer?

Human: Only return a JSON formatted response with the meetingid and meeting type associated to it. Do not add any other words to your answer. Do not add any introductory sentences in your answer.

Assistant:

With this information extracted from the invitation, a call is placed to the meeting provider so that the SMA can join the meeting as a participant.

Clean up

If you deployed this sample solution, clean up your resources by destroying the AWS CDK application from the AWS Command Line Interface (AWS CLI). This can be done using the following command:

yarn cdk destroy

Conclusion

In this post, we showed how to enhance Amazon Transcribe with an LLM using Amazon Bedrock by extracting information that would otherwise be difficult for a regex to extract. We also used this method to extract information from a meeting invitation sent from an unknown source. Finally, we showed how to use an LLM to provide a summarization of the meeting using detailed instructions to produce action items and include date/time information in the response.

We invite you to deploy this demo into your own account. We’d love to hear from you. Let us know what you think in the issues forum of the Amazon Chime SDK Meeting Summarizer GitHub repository. Alternatively, we invite you to explore other methods for meeting transcription and summarization, such as Amazon Live Meeting Assistant, which uses a browser extension to collect call audio.


About the authors

Adam Neumiller is a Solutions Architect for AWS. He is focused on helping public sector customers drive cloud adoption through the use of infrastructure as code. Outside of work, he enjoys spending time with his family and exploring the great outdoors.

Court Schuett is a Principal Specialist SA – GenAI focused on third party models and how they can be used to help solve customer problems.  When he’s not coding, he spends his time exploring parks, traveling with his family, and listening to music.

Christopher Lott is a Principal Solutions Architect in the AWS AI Language Services team. He has 20 years of enterprise software development experience. Chris lives in Sacramento, California, and enjoys gardening, cooking, aerospace/general aviation, and traveling the world.

Hang Su is a Senior Applied Scientist at AWS AI. He has been leading AWS Transcribe Contact Lens Science team. His interest lies in call-center analytics, LLM-based abstractive summarization, and general conversational AI.

Jason Cai is an Applied Scientist at AWS AI. He has made contributions to AWS Bedrock, Contact Lens, Lex and Transcribe. His interests include LLM agents, dialogue summarization, LLM prediction refinement, and knowledge graph.

Edgar Costa Filho is a Senior Cloud Infrastructure Architect with a focus on Foundations and Containers, including expertise in integrating Amazon EKS with open-source tooling like Crossplane, Terraform, and GitOps. In his role, Edgar is dedicated to assisting customers in achieving their business objectives by implementing best practices in cloud infrastructure design and management.

Read More

SAP and NVIDIA Create AI for ‘The Most Valuable Language,’ CEOs Unveil at Sapphire Orlando

SAP and NVIDIA Create AI for ‘The Most Valuable Language,’ CEOs Unveil at Sapphire Orlando

German enterprise cloud leader SAP is harnessing generative AI and industrial digital twins in the development of next-generation enterprise applications for its customers.

At SAP’s Sapphire event today, in Orlando, Florida, NVIDIA and the enterprise software company unveiled generative AI efforts featuring NVIDIA AI Enterprise software to offer two new capabilities for Joule, SAP’s generative AI copilot — SAP Consulting capabilities and ABAP Developer capabilities.

To help streamline the selling and buying process of complex products, SAP is also integrating NVIDIA Omniverse Cloud application programming interfaces, or APIs, into its Intelligent Product Recommendation solution. This will enable salespeople to visualize 3D product digital twins directly in SAP Intelligent Product Recommendation. NVIDIA Omniverse, based on OpenUSD, lets users simulate how configurable equipment might physically fit into and operate in the real world, saving time and costs while improving efficiency and safety.

These new advances come after the companies announced AI-focused collaborations at GTC in March.

“One of the most amazing breakthroughs in large language models and generative AI is that we’ve managed to learn the representation of almost any language,” said NVIDIA founder and CEO Jensen Huang, speaking by video link at the SAP Sapphire conference. “In our company, the most valuable language is CUDA. At SAP, the most valuable language is ABAP.”

“We’ve created some amazing generative APIs that understand the language of SAP, that understand the language of supply chains, that understand the language of ERP systems, so that we have a specialist, an agent and AI to help us to do all the things that we want to do to manage our business,” said SAP CEO Christian Klein.

NVIDIA AI Helps to Accelerate Joule, SAP’s Generative AI Copilot  

Joule, which is embedded across SAP’s portfolio of cloud solutions and applications, allows users to work faster while gaining smarter insights grounded in business data. At SAP Sapphire, the company announced a new consulting capability for Joule leveraging NVIDIA NeMo Retriever microservices.

NVIDIA NeMo Retriever connects AI applications with proprietary data to drive retrieval-augmented generation, or RAG. This brings domain expertise and knowledge of the business to LLMs so that AI copilots and coding assistants can give more accurate and relevant responses.

With the new AI-powered SAP Consulting capabilities — connected to 200,000 pages of SAP learning content, help, product documentation, and community content — tens of thousands of SAP consultants can be more productive, while accessing the latest product and process details when working with end customers. Joule’s SAP Consulting capabilities are already being previewed by internal SAP consultants as well as leading system integrators.

NVIDIA-powered AI is also coming to assist the more than 5 million developers using SAP’s ABAP programming language. SAP’s custom model was trained on over 250 million lines of proprietary ABAP code using NVIDIA HGX H100 systems and will be deployed using NVIDIA NIM inference microservices for optimal runtime performance. Joule’s ABAP Developer capabilities can help developers generate, complete and explain code as well as create software unit tests, which allows them to spend more time building and deploying features rather than manually writing lines of code.

As announced at GTC 2024, this NVIDIA AI Enterprise software, including NeMo and NIM, is expected to be accessible in the generative AI hub in SAP AI Core for developers and customers to leverage.

Omniverse to Help Optimize Sales Processes for Complex Products

Using NVIDIA AI and Omniverse, SAP is building the future of connectivity and is reimagining key business processes, including the sales process for manufacturers.

SAP Intelligent Product Recommendation software utilizes generative AI to analyze requirements described in natural language and generate recommendations for the most suitable configurable products. With Omniverse Cloud APIs integrated with the application, users can then interact with physically accurate models of complex products in a digital twin of the environment where they will be installed. This lets sales teams generate quotes more quickly and helps sales representatives recommend the best solutions for their customers’ needs.

In addition, SAP Intelligent Product Recommendation can use NVIDIA AI to analyze text from multiple sources, including emails and notes captured in customer relationship management software or from product specifications documents. This lets the software generate  recommendations based on key, custom parameters, such as power requirements, lead times and expected carbon footprint.

Watch the replay of the Sapphire Orlando keynote here.

Read More

NVIDIA and Cisco Weave Fabric for Generative AI

NVIDIA and Cisco Weave Fabric for Generative AI

Building and deploying AI applications at scale requires a new class of computing infrastructure — one that can handle the massive amounts of data, compute power and networking bandwidth needed by generative AI models.

To better ensure these models perform optimally and efficiently, NVIDIA is teaming with Cisco to enable enterprise generative AI infrastructure.

Cisco’s new Nexus HyperFabric AI cluster solution, developed in collaboration with NVIDIA, provides a path for enterprises to operationalize generative AI. Cisco HyperFabric is an enterprise-ready, end-to-end infrastructure solution to scale generative AI workloads. It combines NVIDIA accelerated computing and AI software with Cisco AI-native networking and a robust VAST Data Platform.

“Enterprise applications are transforming into generative AI applications, significantly increasing data processing requirements and overall infrastructure complexity,” said Kevin Wollenweber, senior vice president and general manager of data center and provider connectivity at Cisco. “Together, Cisco and NVIDIA are advancing HyperFabric to advance generative AI for the world’s enterprises so they can use their data and domain expertise to transform productivity and insight.”

Powering a Full-Stack AI Fabric

Foundational to the solution are NVIDIA Tensor Core GPUs, which provide the accelerated computing needed to process massive datasets. The solution utilizes NVIDIA AI Enterprise, a cloud-native software platform that acts as the operating system for enterprise AI. NVIDIA AI Enterprise streamlines the development and deployment of production-grade AI copilots and other generative AI applications, ensuring optimized performance, security and application programming interface stability.

Included with NVIDIA AI Enterprise, NVIDIA NIM inference microservices accelerate the deployment of foundation models while ensuring data security. NIM microservices are designed to bridge the gap between complex AI development and enterprise operational needs. As organizations across various industries embark on their AI journeys, the combination of NVIDIA NIM and the Cisco Nexus HyperFabric AI cluster supports the entire process, from ideation to development and deployment of production-scale AI applications.

The Cisco Nexus HyperFabric AI cluster solution integrates NVIDIA Tensor Core GPUs and NVIDIA BlueField-3 SuperNICs and DPUs to enhance system performance and security. The SuperNICs offer advanced network capabilities, ensuring seamless, high-speed connectivity across the infrastructure. BlueField-3 DPUs offload, accelerate and isolate the infrastructure services, creating a more efficient AI solution.

BlueField-3 DPUs can also run security services like the Cisco Hypershield solution. It enables an AI-native, hyperdistributed security architecture, where security shifts closer to the workloads needing protection. Cisco Hypershield is another notable area of collaboration between the companies, focusing on creating AI-powered security solutions.

Join NVIDIA at Cisco Live

Learn more about how Cisco and NVIDIA power generative AI at Cisco Live — running through June 6 in Las Vegas — where the companies will showcase NVIDIA AI technologies at the Cisco AI Hub and share best practices for enterprises to get started with AI.

Attend these sessions to discover how to accelerate generative AI with NVIDIA, Cisco and other ecosystem partners:

  • Keynote Deep Dive: “Harness a Bold New Era: Transform Data Center and Service Provider Connectivity” with NVIDIA’s Kevin Deierling and Cisco’s Jonathan Davidson, Kevin Wollenweber, Jeremy Foster and Bill Gartner — Wednesday, June 5, from 1-2 p.m. PT
  • AI Hub Theater Presentation: “Accelerate, Deploy Generative AI Anywhere With NVIDIA Inference Microservices” with Marty Jain, vice president of sales and business development at NVIDIA — Tuesday, June 4, from 2:15-2:45 p.m. PT
  • WWT AI Hub Booth: Thought leadership interview with NVIDIA’s Jain and WWT Vice President of Cloud, Infrastructure and AI Solutions Neil Anderson — Wednesday, June 5, from 10-11 a.m. PT
  • NetApp Theater: “Accelerating Gen AI With NVIDIA Inference Microservices on FlexPod” with Sicong Ji, strategic platforms and solutions lead at NVIDIA — Wednesday, June 5, from 1:30-1:40 p.m. PT
  • Pure Storage Theater: “Accelerating Gen AI With NVIDIA Inference Microservices on FlashStack” with Joslyn Shakur, sales alliance manager at NVIDIA — Wednesday, June 5, from 2-2:10 p.m. PT ​

Sign up for generative AI news to stay up to date on the latest breakthroughs, developments and technologies.

Read More

Ready, Set, Contribute: PyTorch Docathon Kickoff H1 2024

The PyTorch Docathon is now live! This event is dedicated to enhancing the quality of the PyTorch documentation with the invaluable assistance of our community. Our hope with this Docathon is to simplify the process for new users to get started with PyTorch, guide them in effectively utilizing its features, and ultimately expedite the transition from research to production in machine learning.

JOIN THE KICK-OFF EVENT
on June 4th at 10 AM PT

Event Details

  • June 4: Kick-off – join a 30-minutes livestream kick off event on Discord on June 4th at 10 AM PT here. If you can’t join the kick-off event, watch our welcome video on YouTube
  • June 4-June 16: Submissions and Feedback
  • June 17-18: Final Reviews
  • June 20: Winner Announcements

How to Contribute

Review the Docathon H1 2024 issue in the pytorch/pytorch or pytorch/tutorials repo that contain all the necessary information on participating in the Docathon and highlights the specific issues to work on. Remember to sign the CLA in your first PR and adhere to the Code of Conduct guidelines.

Read the Code of Conduct

Take a moment to review the PyTorch code of conduct found here. This document outlines the expectations for behavior and communication within our team, and it is important that everyone is aware of and adheres to these guidelines.

Join our Discord

This channel serves as the main communication hub during the Docathon. You can join it using by using this link:

JOIN DISCORD SERVER

When you first join the server, you will have limited access. To gain full access to our Discord PyTorch Docathon Channel:

  1. Enter the server and navigate to the #self-roles channel.
  2. In the #self-roles channel, click on the ‘Join Docathon’ button in the relevant post to assign yourself the docathon role.
  3. After assigning the role, you will see the ‘PyTorch Docathon H1 2024 Section’ in the left-hand menu for discussions.
  4. To help prevent spam we are asking that you change your server username to your GitHub username or the email username you registered with.

Explore the GitHub Issues

All the Docathon issues are posted on GitHub. You can find them by the docathon-h1-2024 label in the following participating repositories:

The issues are categorized into three levels of difficulty: easy, medium, and advanced. If this is your first time contributing to PyTorch, we recommend starting with an issue at the easy level.

Prizes for Winners

We will have a leaderboard throughout the duration of the Docathon. The more you contribute, the higher you’ll get on the board! Our top three winners will get free admission to PyTorch Conference 2024.

Thank you to our Partners

This year, we’re thrilled to work with the PyTorch Teams at Meta, Google and Snowflake to help us put on a successful event. We’ll also be at Snowflake Dev Day on June 6 where you can hear from Meta’s Matthias Reso, and check out our PyTorch booth.

Happy contributing!

Read More

Entity Disambiguation via Fusion Entity Decoding

Entity disambiguation (ED), which links the mentions of ambiguous entities to their referent entities in a knowledge base, serves as a core component in entity linking (EL). Existing generative approaches demonstrate improved accuracy compared to classification approaches under the standardized ZELDA benchmark. Nevertheless, generative approaches suffer from the need for large-scale pre-training and inefficient generation. Most importantly, entity descriptions, which could contain crucial information to distinguish similar entities from each other, are often overlooked. We propose an…Apple Machine Learning Research

AGRaME: Any Granularity Ranking with Multi-Vector Embeddings

Ranking is a fundamental and popular problem in search. However, existing ranking algorithms usually restrict the granularity of ranking to full passages or require a specific dense index for each desired level of granularity. Such lack of flexibility in granularity negatively affects many applications that can benefit from more granular ranking, such as sentence-level ranking for open-domain question-answering, or proposition-level ranking for attribution. In this work, we introduce the idea of any-granularity ranking which leverages multi-vector approaches to rank at varying levels of…Apple Machine Learning Research