Moonwalk: Advancing Gait-Based User Recognition on Wearable Devices with Metric Learning

Personal devices have adopted diverse authentication methods, including biometric recognition and passcodes. In contrast, headphones have limited input mechanisms, depending solely on the authentication of connected devices. We present Moonwalk, a novel method for passive user recognition utilizing the built-in headphone accelerometer. Our approach centers on gait recognition, enabling users to establish their identity simply by walking for a brief interval, despite the sensor’s placement away from the feet. We employ self-supervised metric learning to train a model that… (Apple Machine Learning Research)

Vision-Based Hand Gesture Customization from a Single Demonstration

Hand gesture recognition is becoming a more prevalent mode of human-computer interaction, especially as cameras proliferate across everyday devices. Despite continued progress in this field, gesture customization is often underexplored. Customization is crucial since it enables users to define and demonstrate gestures that are more natural, memorable, and accessible. However, customization requires efficient usage of user-provided data. We introduce a method that enables users to easily design bespoke gestures with a monocular camera from one demonstration. We employ transformers and… (Apple Machine Learning Research)

Merge Vision Foundation Models via Multi-Task Distillation

As the repository of publicly available pre-trained vision foundation models (VFMs) — such as CLIP, DINOv2, and SAM — grows, users face challenges in storage, memory, and computational efficiency when deploying multiple models concurrently. To address these concerns, we introduce a unique approach that merges the capabilities of multiple VFMs into a single efficient multi-task model. Our method, termed “joint distillation,” seamlessly integrates teacher-student learning with self-distillation, operating with just unlabeled image data and drastically cutting down on computational requirements… (Apple Machine Learning Research)

Health-specific embedding tools for dermatology and pathology

There’s a worldwide shortage of access to medical imaging expert interpretation across specialties including radiology, dermatology and pathology. Machine learning (ML) technology can help ease this burden by powering tools that enable doctors to interpret these images more accurately and efficiently. However, the development and implementation of such ML tools are often limited by the availability of high-quality data, ML expertise, and computational resources.

One way to catalyze the use of ML for medical imaging is via domain-specific models that utilize deep learning (DL) to capture the information in medical images as compressed numerical vectors (called embeddings). These embeddings represent a type of pre-learned understanding of the important features in an image. Identifying patterns in the embeddings reduces the amount of data, expertise, and compute needed to train performant models as compared to working with high-dimensional data, such as images, directly. Indeed, these embeddings can be used to perform a variety of downstream tasks within the specialized domain (see animated graphic below). This framework of leveraging pre-learned understanding to solve related tasks is similar to that of a seasoned guitar player quickly learning a new song by ear. Because the guitar player has already built up a foundation of skill and understanding, they can quickly pick up the patterns and groove of a new song.

Path Foundation is used to convert a small dataset of (image, label) pairs into (embedding, label) pairs. These pairs can then be used to train a task-specific classifier using a linear probe (i.e., a lightweight linear classifier), as represented in this graphic, or other types of models using the embeddings as input.

Once the linear probe is trained, it can be used to make predictions on embeddings from new images. These predictions can be compared to ground truth information in order to evaluate the linear probe’s performance.
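As a rough illustration of this workflow (not the exact tooling used in this work), the short Python sketch below trains a linear probe with scikit-learn on precomputed embeddings and evaluates it with AUROC; the .npy file names are placeholders for embeddings and labels exported from one of the embedding tools.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Each row is one image embedding produced by an embedding tool; labels are
# binary (e.g., tumor vs. no tumor). File names are placeholders.
train_X = np.load("train_embeddings.npy")   # shape: (n_train, embedding_dim)
train_y = np.load("train_labels.npy")       # shape: (n_train,)
test_X = np.load("test_embeddings.npy")
test_y = np.load("test_labels.npy")

# The "linear probe": a lightweight linear classifier on frozen embeddings.
probe = LogisticRegression(max_iter=1000)
probe.fit(train_X, train_y)

# Compare predictions on new embeddings against ground truth, e.g., with AUROC.
scores = probe.predict_proba(test_X)[:, 1]
print("AUROC:", roc_auc_score(test_y, scores))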

In order to make this type of embedding model available and drive further development of ML tools in medical imaging, we are excited to release two domain-specific tools for research use: Derm Foundation and Path Foundation. This follows on the strong response we’ve already received from researchers using the CXR Foundation embedding tool for chest radiographs and represents a portion of our expanding research offerings across multiple medical-specialized modalities. These embedding tools take an image as input and produce a numerical vector (the embedding) that is specialized to the domains of dermatology and digital pathology images, respectively. By running a dataset of chest X-ray, dermatology, or pathology images through the respective embedding tool, researchers can obtain embeddings for their own images, and use these embeddings to quickly develop new models for their applications.

Path Foundation

In “Domain-specific optimization and diverse evaluation of self-supervised models for histopathology”, we showed that self-supervised learning (SSL) models for pathology images outperform traditional pre-training approaches and enable efficient training of classifiers for downstream tasks. This effort focused on hematoxylin and eosin (H&E) stained slides, the principal tissue stain in diagnostic pathology that enables pathologists to visualize cellular features under a microscope. The performance of linear classifiers trained using the output of the SSL models matched that of prior DL models trained on orders of magnitude more labeled data.

Due to substantial differences between digital pathology images and “natural image” photos, this work involved several pathology-specific optimizations during model training. One key element is that whole-slide images (WSIs) in pathology can be 100,000 pixels across (thousands of times larger than typical smartphone photos) and are analyzed by experts at multiple magnifications (zoom levels). As such, the WSIs are typically broken down into smaller tiles or patches for computer vision and DL applications. The resulting images are information dense with cells or tissue structures distributed throughout the frame instead of having distinct semantic objects or foreground vs. background variations, thus creating unique challenges for robust SSL and feature extraction. Additionally, physical (e.g., cutting) and chemical (e.g., fixing and staining) processes used to prepare the samples can influence image appearance dramatically.

Taking these important aspects into consideration, pathology-specific SSL optimizations included helping the model learn stain-agnostic features, generalizing the model to patches from multiple magnifications, augmenting the data to mimic scanning and image post-processing, and custom data balancing to improve input heterogeneity for SSL training. These approaches were extensively evaluated on a broad benchmark spanning 17 different tissue types across 12 different tasks.

Path Foundation, which uses the vision transformer (ViT-S/16) architecture, was selected as the best-performing model from the optimization and evaluation process described above (and illustrated in the figure below). This model thus provides an important balance between performance and model size, enabling valuable and scalable use in generating embeddings over the many individual image patches of large pathology WSIs.

SSL training with pathology-specific optimizations for Path Foundation.

The value of domain-specific image representations can also be seen in the figure below, which shows the linear probing performance improvement of Path Foundation (as measured by AUROC) compared to traditional pre-training on natural images (ImageNet-21k). This includes evaluation for tasks such as metastatic breast cancer detection in lymph nodes, prostate cancer grading, and breast cancer grading, among others.

Path Foundation embeddings significantly outperform traditional ImageNet embeddings as evaluated by linear probing across multiple evaluation tasks in histopathology.

Derm Foundation

Derm Foundation is an embedding tool derived from our research in applying DL to interpret images of dermatology conditions, and it incorporates our recent work on improving generalization to new datasets. Due to its dermatology-specific pre-training, it has a latent understanding of features present in images of skin conditions and can be used to quickly develop models to classify skin conditions. The model underlying the API is a BiT ResNet-101×3 trained in two stages. The first pre-training stage uses contrastive learning, similar to ConVIRT, to train on a large number of image-text pairs from the internet. In the second stage, the image component of this pre-trained model is then fine-tuned for condition classification using clinical datasets, such as those from teledermatology services.

Unlike histopathology images, dermatology images more closely resemble the real-world images used to train many of today’s computer vision models. However, for specialized dermatology tasks, creating a high-quality model may still require a large dataset. With Derm Foundation, researchers can use their own smaller dataset to retrieve domain-specific embeddings, and use those to build smaller models (e.g., linear classifiers or other small non-linear models) that enable them to validate their research or product ideas. To evaluate this approach, we trained models on a downstream task using teledermatology data. Model training involved varying dataset sizes (12.5%, 25%, 50%, 100%) to compare embedding-based linear classifiers against fine-tuning.

The modeling variants considered were:

  • A linear classifier on frozen embeddings from BiT-M (a standard pre-trained image model)
  • Fine-tuned version of BiT-M with an extra dense layer for the downstream task
  • A linear classifier on frozen embeddings from the Derm Foundation API
  • Fine-tuned version of the model underlying the Derm Foundation API with an extra layer for the downstream task

We found that models built on top of the Derm Foundation embeddings for dermatology-related tasks achieved significantly higher quality than those built on embeddings from, or fine-tuned from, BiT-M. This advantage was most pronounced for smaller training dataset sizes.
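To make this comparison concrete, the following hypothetical Python sketch runs the dataset-size sweep for the two linear-probe variants (the fine-tuning variants are omitted); the random arrays stand in for real BiT-M and Derm Foundation embeddings, so the printed numbers are meaningless and only the shape of the experiment is shown.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_train, n_test, dim = 2000, 500, 512  # placeholder sizes
train_y = rng.integers(0, 2, n_train)
test_y = rng.integers(0, 2, n_test)
embeddings = {  # placeholder embedding matrices
    "BiT-M": (rng.normal(size=(n_train, dim)), rng.normal(size=(n_test, dim))),
    "Derm Foundation": (rng.normal(size=(n_train, dim)), rng.normal(size=(n_test, dim))),
}

for fraction in (0.125, 0.25, 0.5, 1.0):
    idx = rng.choice(n_train, size=int(fraction * n_train), replace=False)
    for name, (train_X, test_X) in embeddings.items():
        probe = LogisticRegression(max_iter=1000).fit(train_X[idx], train_y[idx])
        auroc = roc_auc_score(test_y, probe.predict_proba(test_X)[:, 1])
        print(f"{name:>16} @ {fraction:.1%} of training data: AUROC = {auroc:.3f}")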

These results demonstrate that the Derm Foundation tool can serve as a useful starting point to accelerate skin-related modeling tasks. We aim to enable other researchers to build on the underlying features and representations of dermatology that the model has learned.

However, there are limitations with this analysis. We’re still exploring how well these embeddings generalize across task types, patient populations, and image settings. Downstream models built using Derm Foundation still require careful evaluation to understand their expected performance in the intended setting.

Access Path and Derm Foundation

We envision that the Derm Foundation and Path Foundation embedding tools will enable a range of use cases, including efficient development of models for diagnostic tasks, quality assurance and pre-analytical workflow improvements, image indexing and curation, and biomarker discovery and validation. We are releasing both tools to the research community so they can explore the utility of the embeddings for their own dermatology and pathology data.

To get access, please review and agree to each tool’s terms of service via its Google Form.

After gaining access to each tool, you can use the API to retrieve embeddings from dermatology images or digital pathology images stored in Google Cloud. Approved users who are just curious to see the model and embeddings in action can use the provided example Colab notebooks to train models using public data for classifying six common skin conditions or identifying tumors in histopathology patches. We look forward to seeing the range of use-cases these tools can unlock.

Acknowledgements

We would like to thank the many collaborators who helped make this work possible, including Yun Liu, Can Kirmizi, Fereshteh Mahvar, Bram Sterling, Arman Tajback, Kenneth Philbrik, Arnav Agharwal, Aurora Cheung, Andrew Sellergren, Boris Babenko, Basil Mustafa, Jan Freyberg, Terry Spitz, Yuan Liu, Pinal Bavishi, Ayush Jain, Amit Talreja, Rajeev Rikhye, Abbi Ward, Jeremy Lai, Faruk Ahmed, Supriya Vijay, Tiam Jaroensri, Jessica Loo, Saurabh Vyawahare, Saloni Agarwal, Ellery Wulczyn, Jonathan Krause, Fayaz Jamil, Tom Small, Annisah Um’rani, Lauren Winer, Sami Lachgar, Yossi Matias, Greg Corrado, and Dale Webster.

Read More

LLMs Land on Laptops: NVIDIA, HP CEOs Celebrate AI PCs

2024 will be the year generative AI gets personal, the CEOs of NVIDIA and HP said today in a fireside chat, unveiling new laptops that can build, test and run large language models.

“This is a renaissance of the personal computer,” said NVIDIA founder and CEO Jensen Huang at HP Amplify, a gathering in Las Vegas of about 1,500 resellers and distributors. “The work of creators, designers and data scientists is going to be revolutionized by these new workstations.”

“AI is the biggest thing to come to the PC in decades,” said HP’s Enrique Lores, in the runup to the announcement of what his company billed as “the industry’s largest portfolio of AI PCs and workstations.”

Greater Speed and Security

Compared to running their AI work in the cloud, the new systems will provide increased speed and security while reducing costs and energy, Lores said in a keynote at the event.

New HP ZBooks provide a portfolio of mobile AI workstations powered by a full range of NVIDIA RTX Ada Generation GPUs.

Entry-level systems with the NVIDIA RTX 500 Ada Generation Laptop GPU let users run generative AI apps and tools wherever they go.

High-end models pack the RTX 5000 to deliver up to 682 TOPS, so they can create and run LLMs locally, using retrieval-augmented generation (RAG) to connect to their content for results that are both personalized and private.

Access to Accelerated Software

The new workstations can tap into NVIDIA’s full-stack AI platform, including software that speeds the data science at the foundation of generative AI.

The systems’ Z by HP AI Studio platform — developed in collaboration with NVIDIA — links to NVIDIA NGC, a catalog of GPU-accelerated software for AI and data science. NGC includes NVIDIA NeMo, a framework to build, customize and deploy generative AI models.

In addition, HP and NVIDIA announced that NVIDIA CUDA-X libraries will be integrated with the systems to turbocharge the data preparation and processing that’s fundamental for generative AI.

Speedups for Data Scientists

The libraries include NVIDIA RAPIDS cuDF, which accelerates pandas, software used by nearly 10 million data scientists.

“It used to take them hours and sometimes days to process data that now they can do in minutes,” Huang said.

“This pandas library is insanely complex,” he added, noting NVIDIA engineers worked for more than five years on reformulating the code so it can be accelerated with GPUs.
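For readers who want to try this on their own data, RAPIDS cuDF ships a pandas accelerator mode that is enabled before importing pandas; the sketch below assumes the cudf package is installed and a compatible NVIDIA GPU is available, and the CSV path and column names are placeholders.

# Enable the RAPIDS cuDF accelerator before importing pandas; existing pandas
# code then runs on the GPU where supported and falls back to the CPU otherwise.
import cudf.pandas
cudf.pandas.install()

import pandas as pd  # unchanged pandas code from here on

df = pd.read_csv("transactions.csv")  # placeholder dataset
summary = df.groupby("customer_id")["amount"].agg(["count", "sum", "mean"])
print(summary.head())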

Entering a New Era

In tandem with the new systems, HP announced a partner training program developed in collaboration with NVIDIA. It will equip computer vendors to advise customers on the right AI products and solutions to meet their needs.

Such programs pave the way for an industry that’s entering an era where AI lets software write software.

“We’ve reinvented the computer. We’ve reinvented how software is written, and now we have to reinvent how software is used,” said Huang. “Large language models, connected into other LLMs, will help solve application problems — that’s the future.”

Read More

Automate the process to change image backgrounds using Amazon Bedrock and AWS Step Functions

Many customers, including those in creative advertising, media and entertainment, ecommerce, and fashion, often need to change the background in a large number of images. Typically, this involves manually editing each image with photo software. This can take a lot of effort, especially for large batches of images. However, Amazon Bedrock and AWS Step Functions make it straightforward to automate this process at scale.

Amazon Bedrock offers the generative AI foundation model Amazon Titan Image Generator G1, which can automatically change the background of an image using a technique called outpainting. Step Functions allows you to create an automated workflow that seamlessly connects with Amazon Bedrock and other AWS services. Together, Amazon Bedrock and Step Functions streamline the entire process of automatically changing backgrounds across multiple images.

This post introduces a solution that simplifies the process of changing backgrounds in multiple images. By harnessing the capabilities of generative AI with Amazon Bedrock and the Titan Image Generator G1 model, combined with Step Functions, this solution efficiently generates images with the desired background. This post provides insight into the inner workings of the solution and helps you understand the design choices made, so you can build your own custom solution.

See the GitHub repository for detailed instructions on deploying this solution.

Solution overview

Let’s look at how the solution works at a high level before diving deeper into specific elements and the AWS services used. The following diagram provides a simplified view of the solution architecture and highlights the key elements.

Solution Architecture

The workflow consists of the following steps:

  1. A user uploads multiple images into an Amazon Simple Storage Service (Amazon S3) bucket via a Streamlit web application.
  2. The Streamlit web application calls an Amazon API Gateway REST API endpoint integrated with the Amazon Rekognition DetectLabels API, which detects labels for each image.
  3. Upon submission, the Streamlit web application updates an Amazon DynamoDB table with image details.
  4. The DynamoDB update triggers an AWS Lambda function, which starts a Step Functions workflow.
  5. The Step Functions workflow runs the following steps for each image:
    5.1 Constructs a request payload for the Amazon Bedrock InvokeModel API.
    5.2 Invokes the Amazon Bedrock InvokeModel API action.
    5.3 Parses an image from the response and saves it to an S3 location.
    5.4 Updates the image status in a DynamoDB table.
  6. The Step Functions workflow invokes a Lambda function to generate a status report.
  7. The workflow sends an email using Amazon Simple Notification Service (Amazon SNS).

As shown in the following screenshot, the Streamlit web application allows you to upload images and enter text prompts to specify desired backgrounds, negative prompts, and outpainting mode for image generation. You can also view and remove unwanted labels associated with each uploaded image that you don’t want to keep in the final generated images.

Streamlit Web Application

In this example, the prompt for the background is “London city background.” The automation process generates new images based on the original uploaded images with London as the background.

Generated Images

Streamlit web application and image uploads

A Streamlit web application serves as the frontend for this solution. To protect the application from unauthorized access, it integrates with an Amazon Cognito user pool. API Gateway uses an Amazon Cognito authorizer to authenticate requests. The web application completes the following steps:

  1. For each selected image, it retrieves labels via Amazon Rekognition using an API Gateway REST API endpoint.
  2. Upon submission, the application uploads images to an S3 bucket.
  3. The application updates a DynamoDB table with relevant parameters, image names, and associated labels for each image using another API Gateway REST API endpoint.

Image processing workflow

When the DynamoDB table is updated, DynamoDB Streams triggers a Lambda function to start a new Step Functions workflow. The following is a sample request for the workflow:

{
  "Id": "621fa85a-38bb-4d98-a656-93bbbcf5477f",
  "S3Bucket": "<Image Bucket>",
  "InputS3Prefix": "image-files/<year>/<month>/<day>/<timestamp>",
  "OutputS3Prefix": "generated-image-files/<year>/<month>/<day>/<timestamp>",
  "StatusS3Prefix": "status-report-files/<year>/<month>/<day>/<timestamp>",
  "Prompt": "london city background",
  "NegativePrompt": "low quality, low resolution",
  "Mode": "PRECISE",
  "Images": [
    {
      "ImageName": "bus.png",
      "Labels": "Bus, Person"
    },
    {
      "ImageName": "cop.png",
      "Labels": "Person, Adult, Male, Man, Helmet, Jacket"
    },
    {
      "ImageName": "iguana-2.png",
      "Labels": "Lizard”
    },
    {
      "ImageName": "dog.png",
      "Labels": "Dog"
    }
  ]
}
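As a rough sketch (not the code from the solution's repository), the Lambda function that reacts to the DynamoDB stream and starts the Step Functions execution could look like the following; the STATE_MACHINE_ARN environment variable and the attribute names are assumptions for illustration.

import json
import os

import boto3

sfn = boto3.client("stepfunctions")

def lambda_handler(event, context):
    # Triggered by DynamoDB Streams; start one workflow per inserted item.
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        item = record["dynamodb"]["NewImage"]
        # Deserialize the DynamoDB-typed attributes into the workflow input
        # (only a few fields shown; the full payload mirrors the sample above).
        workflow_input = {
            "Id": item["Id"]["S"],
            "S3Bucket": item["S3Bucket"]["S"],
            "Prompt": item["Prompt"]["S"],
        }
        sfn.start_execution(
            stateMachineArn=os.environ["STATE_MACHINE_ARN"],
            name=workflow_input["Id"],
            input=json.dumps(workflow_input),
        )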

The Step Functions workflow subsequently performs the following three steps:

  1. Replace the background for all images.
  2. Generate a status report.
  3. Send an email via Amazon SNS.

The following screenshot illustrates the Step Functions workflow.

AWS Step Functions Workflow

Let’s look at each step in more detail.

Replace background for all images

Step Functions uses a Distributed Map to process each image in parallel child workflows. The Distributed Map allows high-concurrency processing, and each child workflow has its own run history, separate from that of the parent workflow.

Step Functions uses the optimized InvokeModel API action for Amazon Bedrock. This API accepts requests and responses of up to 25 MB. However, Step Functions has a 256 KB limit on state payload input and output, so to support larger images, the solution uses an S3 bucket from which the InvokeModel API reads its input and to which it writes its result. The following is the configuration for the InvokeModel API for Amazon Bedrock integration:

{
    "ModelId": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-image-generator-v1",
    "ContentType": "application/json",
    "Input": {
        "S3Uri": "s3://<Image Bucket>/image-files/<year>/<month>/<day>/<timestamp>/<Image name>.json"
    },
    "Output": {
        "S3Uri": "s3://<Image Bucket>/generated-image-files/<year>/<month>/<day>/<timestamp>/<Image name>.json"
    }
}

The Input S3Uri parameter specifies the source location to retrieve the input data. The Output S3Uri parameter specifies the destination to write the API response.

A Lambda function saves the request payload as a JSON file in the specified Input S3Uri location. The InvokeModel API uses this input payload to generate images with the specified background:

{
    "taskType": "OUTPAINTING",
    "outPaintingParams": {
        "text": "london city background",
        "negativeText": "low quality, low resolution",
        "image": "<base64-encoded string>",
        "maskPrompt": "Bus",
        "maskImage": "<base64-encoded string>",
        "outPaintingMode": "DEFAULT | PRECISE"
    },
    "imageGenerationConfig": {
        "numberOfImages": 1,
        "quality": "premium",
        "height": 1024,
        "width": 1024,
        "cfgScale": 8.0
    }
}

The Titan Image Generator G1 model supports the following parameters for image generation:

  • taskType – Specifies the type of task; OUTPAINTING is used to replace the background of the image.
  • text – A text prompt to define the background.
  • negativeText – A text prompt to define what not to include in the image.
  • maskPrompt – A text prompt that defines the mask. It corresponds to labels that you want to retain in the final generated images.
  • maskImage – The JPEG or PNG image encoded in base64.
  • outPaintingMode – Specifies whether to allow modification of the pixels inside the mask or not. DEFAULT allows modification of the image inside the mask in order to keep it consistent with the reconstructed background. PRECISE prevents modification of the image inside the mask.
  • numberOfImages – The number of images to generate.
  • quality – The quality of the generated images: standard or premium.
  • cfgScale – Specifies how strongly the generated image should adhere to the prompt.
  • height – The height of the image in pixels.
  • width – The width of the image in pixels.

The Amazon Bedrock InvokeModel API generates a response with an encoded image in the Output S3Uri location. Another Lambda function parses the image from the response, decodes it from base64, and saves the image file in the following location: s3://<Image Bucket>/generated-image-files/<year>/<month>/<day>/<timestamp>/.
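A simplified sketch of that parsing step is shown below (not the solution's actual Lambda code); the event fields carrying the bucket and response key are assumptions, and the Titan Image Generator response is assumed to return the generated image as a base64 string in its images list.

import base64
import json

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # The workflow state is assumed to pass the bucket and the key of the
    # InvokeModel response JSON written to the Output S3Uri location.
    bucket = event["S3Bucket"]
    response_key = event["OutputS3Key"]  # e.g., generated-image-files/.../bus.png.json
    body = s3.get_object(Bucket=bucket, Key=response_key)["Body"].read()
    response = json.loads(body)

    # Decode the base64 image and write the PNG next to the response JSON.
    image_bytes = base64.b64decode(response["images"][0])
    image_key = response_key.removesuffix(".json")
    s3.put_object(Bucket=bucket, Key=image_key, Body=image_bytes,
                  ContentType="image/png")
    return {"ImageName": image_key.rsplit("/", 1)[-1], "Status": "Succeeded"}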

Finally, a child workflow updates a DynamoDB table with image generation status, marking it as either Succeeded or Failed, and including details such as ImageName, Cause, Error, and Status.

Generate a status report

After the image generation process, a Lambda function retrieves the status details from DynamoDB. It dynamically compiles these details into a comprehensive status report in JSON format and saves it as a JSON file in the following location: s3://<Image Bucket>/status-report-files/<year>/<month>/<day>/<timestamp>/. The ITOps team can integrate this report with their existing notification system to track whether image processing completed successfully. For business users, you can expand this further to generate a report in CSV format.
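A hypothetical sketch of such a report-generating Lambda follows; the table name, key schema, and event fields are illustrative assumptions rather than the solution's actual definitions.

import json

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Hypothetical table keyed by the workflow Id, with one item per image.
    table = dynamodb.Table("ImageGenerationStatus")
    items = table.query(KeyConditionExpression=Key("Id").eq(event["Id"]))["Items"]

    report = {
        "Id": event["Id"],
        "Images": [{"ImageName": i["ImageName"], "Status": i["Status"]} for i in items],
    }
    key = f"{event['StatusS3Prefix']}/status-report.json"
    s3.put_object(Bucket=event["S3Bucket"], Key=key,
                  Body=json.dumps(report, indent=2))
    return {"StatusReportKey": key}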

Send an email via Amazon SNS

Step Functions invokes an Amazon SNS API action to send an email. The email contains details including the S3 location of the status report and the final image files. The following is a sample notification email.

Notification Email

Conclusion

In this post, we provided an overview of a sample solution demonstrating the automation of changing image backgrounds at scale using Amazon Bedrock and Step Functions. We also explained each element of the solution in detail. By using the Step Functions optimized integration with Amazon Bedrock, Distributed Map, and the Titan Image Generator G1 model, the solution efficiently replaces the backgrounds of images in parallel, enhancing productivity and scalability.

To deploy the solution, refer to the instructions in the GitHub repository.

About the Author

Chetan Makvana is a Senior Solutions Architect with Amazon Web Services. He works with AWS partners and customers to provide them with architectural guidance for building scalable architectures and implementing strategies to drive adoption of AWS services. He is a technology enthusiast and a builder with a core area of interest in generative AI, serverless, and DevOps. Outside of work, he enjoys watching shows, traveling, and music.

Read More

Social learning: Collaborative learning with large language models

Large language models (LLMs) have significantly improved the state of the art for solving tasks specified using natural language, often reaching performance close to that of people. As these models increasingly enable assistive agents, it could be beneficial for them to learn effectively from each other, much like people do in social settings, which would allow LLM-based agents to improve each other’s performance.

In their work on human learning processes, Bandura and Walters described the concept of social learning in 1977, outlining different models of observational learning used by people. One common method of learning from others is through verbal instruction (e.g., from a teacher) that describes how to engage in a particular behavior. Alternatively, learning can happen through a live model, by mimicking a live example of the behavior.

Given the success of LLMs mimicking human communication, in our paper “Social Learning: Towards Collaborative Learning with Large Language Models”, we investigate whether LLMs are able to learn from each other using social learning. To this end, we outline a framework for social learning in which LLMs share knowledge with each other in a privacy-aware manner using natural language. We evaluate the effectiveness of our framework on various datasets, and propose quantitative methods that measure privacy in this setting. In contrast to previous approaches to collaborative learning, such as common federated learning approaches that often rely on gradients, in our framework, agents teach each other purely using natural language.

Social learning for LLMs

To extend social learning to language models, we consider the scenario where a student LLM should learn to solve a task from multiple teacher entities that already know that task. In our paper, we evaluate the student’s performance on a variety of tasks, such as spam detection in short text messages (SMS), solving grade school math problems, and answering questions based on a given text.

A visualization of the social learning process: A teacher model provides instructions or few-shot examples to a student model without sharing its private data.

Language models have shown a remarkable capacity to perform tasks given only a handful of examples, a process called few-shot learning. With this in mind, we provide human-labeled examples of a task, enabling the teacher model to teach it to a student. One of the main use cases of social learning arises when these examples cannot be directly shared with the student, for example, due to privacy concerns.

To illustrate this, let’s look at a hypothetical example for a spam detection task. A teacher model is located on devices where some users volunteer to mark incoming messages they receive as either “spam” or “not spam”. This is useful data that could help train a student model to differentiate between spam and not spam, but sharing personal messages with other users is a breach of privacy and should be avoided. To prevent this, a social learning process can transfer the knowledge from the teacher model to the student so it learns what spam messages look like without needing to share the user’s personal text messages.

We investigate the effectiveness of this social learning approach by analogy with the established human social learning theory that we discussed above. In these experiments, we use PaLM 2-S models for both the teacher and the student.

A systems view of social learning: At training time, multiple teachers teach the student. At inference time, the student is using what it learned from the teachers.

Synthetic examples

As a counterpart to the live teaching model described for traditional social learning, we propose a learning method where the teachers generate new synthetic examples for the task and share them with the student. This is motivated by the idea that one can create a new example that is sufficiently different from the original one, but is just as educational. Indeed, we observe that our generated examples are sufficiently different from the real ones to preserve privacy while still enabling performance comparable to that achieved using the original examples.

The 8 generated examples perform as well as the original data for several tasks (see our paper).

We evaluate the efficacy of learning through synthetic examples on our task suite. Especially when the number of examples is high enough, e.g., n = 16, we observe no statistically significant difference between sharing original data and teaching with synthesized data via social learning for the majority of tasks, indicating that the privacy improvement does not have to come at the cost of model quality.

Generating 16 instead of just 8 examples further reduces the performance gap relative to the original examples.

The one exception is spam detection, for which teaching with synthesized data yields lower accuracy. This may be because the training procedure of current models makes them biased to only generate non-spam examples. In the paper, we additionally look into aggregation methods for selecting good subsets of examples to use.
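As a conceptual sketch of the examples-sharing protocol described above (not the paper's implementation), the teacher prompts an LLM to synthesize new labeled examples from its private data, and the student then uses only those synthetic examples as few-shot context; generate() below is a hypothetical placeholder for any text-completion call, such as a PaLM 2-S request.

def generate(prompt: str) -> str:
    """Hypothetical placeholder for an LLM text-completion call."""
    raise NotImplementedError

def teacher_synthesize_examples(private_examples, k=8):
    # The teacher sees its private labeled data but shares only generated examples.
    prompt = (
        "Here are labeled SMS spam-detection examples:\n"
        + "\n".join(f"Message: {text}\nLabel: {label}" for text, label in private_examples)
        + f"\n\nWrite {k} new, different examples in the same format."
    )
    return generate(prompt)

def student_classify(synthetic_examples: str, message: str) -> str:
    # The student never sees the private data, only the synthetic examples.
    prompt = f"{synthetic_examples}\n\nMessage: {message}\nLabel:"
    return generate(prompt).strip()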

Synthetic instruction

Given the success of language models in following instructions, the verbal instruction model can also be naturally adapted to language models by having the teachers generate an instruction for the task. Our experiments show that providing such a generated instruction effectively improves performance over zero-shot prompting, reaching accuracies comparable to few-shot prompting with original examples. However, we did find that the teacher model may fail on certain tasks to provide a good instruction, for example due to a complicated formatting requirement of the output.

For Lambada, GSM8k, and Random Insertion, providing synthetic examples performs better than providing generated instructions, whereas for the other tasks, generated instructions achieve higher accuracy. This observation suggests that the choice of the teaching model depends on the task at hand, similar to how the most effective method for teaching people varies by task.

Depending on the task, generating instructions can work better than generating new examples.

Memorization of the private examples

We want teachers in social learning to teach the student without revealing specifics from the original data. To quantify how prone this process is to leaking information, we used Secret Sharer, a popular method for quantifying to what extent a model memorizes its training data, and adapted it to the social learning setting. We picked this method since it had previously been used for evaluating memorization in federated learning.

To apply the Secret Sharer method to social learning, we design “canary” data points such that we can concretely measure how much the training process memorized them. These data points are included in the datasets used by teachers to generate new examples. After the social learning process completes, we can then measure how much more confident the student is in the secret data points the teacher used, compared to similar ones that were not shared even with the teachers.

In our analysis, discussed in detail in the paper, we use canary examples that include names and codes. Our results show that the student is only slightly more confident in the canaries the teacher used. In contrast, when the original data points are directly shared with the student, the confidence in the included canaries is much higher than in the held-out set. This supports the conclusion that the teacher does indeed use its data to teach without simply copying it over.
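A minimal sketch of this kind of comparison is shown below; student_log_likelihood() is a hypothetical scoring call, and the paper's actual analysis uses the Secret Sharer exposure methodology rather than this simple gap statistic.

from statistics import mean

def student_log_likelihood(text: str) -> float:
    """Hypothetical placeholder for scoring a string under the student model."""
    raise NotImplementedError

def confidence_gap(used_canaries, held_out_canaries):
    # Canaries the teachers' data contained vs. canaries that were never shared.
    used = mean(student_log_likelihood(c) for c in used_canaries)
    held_out = mean(student_log_likelihood(c) for c in held_out_canaries)
    # A gap near zero suggests the teacher taught without copying its data;
    # a large positive gap indicates memorization of the shared examples.
    return used - held_out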

Conclusion and next steps

We introduced a framework for social learning that allows language models with access to private data to transfer knowledge through textual communication while maintaining the privacy of that data. In this framework, we identified sharing examples and sharing instructions as basic models and evaluated them on multiple tasks. Furthermore, we adapted the Secret Sharer metric to our framework, proposing a metric for measuring data leakage.

As next steps, we are looking for ways of improving the teaching process, for example by adding feedback loops and iteration. Furthermore, we want to investigate using social learning for modalities other than text.

Acknowledgements

We would like to acknowledge and thank Matt Sharifi, Sian Gooding, Lukas Zilka, and Blaise Aguera y Arcas, who are all co-authors on the paper. Furthermore, we would like to thank Victor Cărbune, Zachary Garrett, Tautvydas Misiunas, Sofia Neata and John Platt for their feedback, which greatly improved the paper. We’d also like to thank Tom Small for creating the animated figure.

Read More