Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models

*=Equal Contributors
This paper was accepted at the Efficient Natural Language and Speech Processing workshop at NeurIPS 2023.
Interactions with virtual assistants often begin with a predefined trigger phrase followed by the user command. To make interactions with the assistant more natural, we explore whether it is feasible to drop the requirement that users must begin each command with a trigger phrase. We address this task by combining the decoder signals of an automatic speech recognition (ASR) system with acoustic and lexical representations as input features to a large language model… (Apple Machine Learning Research)

Visual AI Takes Flight at Canada’s Largest, Busiest Airport

Toronto Pearson International Airport, in Ontario, Canada, is the country’s largest and busiest airport, serving some 50 million passengers each year.

To enhance traveler experiences, the airport in June deployed the Zensors AI platform, which uses anonymized footage from existing security cameras to generate spatial data that helps optimize operations in real time.

A member of the NVIDIA Metropolis vision AI partner ecosystem, Zensors helped the Toronto Pearson operations team significantly reduce wait times in customs lines, decreasing the average time it took passengers to go through the arrivals process from an estimated 30 minutes during peak periods in 2022 to just under six minutes last summer.

“Zensors is making visual AI easy for all to use,” said Anuraag Jain, the company’s cofounder and head of product and technology.

Scaling multimodal, transformer-based AI isn’t easy for most organizations, Jain added, so airports have often defaulted to traditional, less effective solutions based on hardware sensors, lidar or 3D stereo cameras, or looked to improve their operations by renovating or building new terminals instead — which can be multibillion-dollar projects.

“We provide a platform that allows airports to instead think more like software companies, deploying quicker, cheaper and more accurate solutions using their existing cameras and the latest AI technologies,” Jain said.

Speeding Airport Operations

To meet the growing travel demands, Toronto Pearson needed a way to improve its operations in a matter of weeks, rather than the months or years it would normally take to upgrade or build new terminal infrastructure.

The Zensors AI platform — deployed to monitor 20+ customs lines in two of the airport’s terminals — delivered such a solution. It converts video feeds from the airport’s existing camera systems into structured data.

Using anonymized footage, the platform counts how many travelers are in a line, identifies congested areas and predicts passenger wait times, among other tasks — and it alerts staff in real time to speed operations.

The platform also offers analytical reports that enable operations teams to assess performance, plan more effectively and redeploy staff for optimal efficiency.

In addition to providing airport operators with data-driven insights, live wait-time statistics from Zensors AI are published on Toronto Pearson’s online dashboard, as well as on electronic displays in the terminals. This lets passengers easily access accurate information about how long customs or security processes will take, which increases overall customer satisfaction and reduces anxiety about making connecting flights.

“The analyses we get from the Zensors platform are proving to be very accurate,” said Zeljko Cakic, director of airport IT planning and development at the Greater Toronto Airport Authority, Toronto Pearson’s managing company. “Our goal is to improve overall customer experience and reduce wait times, and the data gathered through the Zensors platform is one of the key contributors for decision-making to drive these results.”

Accurate AI Powered by NVIDIA

Zensors AI — built with vision transformer models — delivers insights with about 96% accuracy when checked against manual human validation. It’s all powered by NVIDIA technology.

“The Zensors model development and inference run-time stack is effectively the NVIDIA AI stack,” Jain said.

The company uses NVIDIA GPUs and the CUDA parallel computing platform to train its AI models, along with the cuDNN accelerated library of primitives for deep neural networks and the NVIDIA DALI library for decoding and augmenting images and videos.

With checkpoints at Toronto Pearson open 24/7, Zensors AI inference runs around the clock on NVIDIA Triton Inference Server, open-source software available through the NVIDIA AI Enterprise platform.

The company estimates that using NVIDIA Triton to optimize its inference run-time decreased its monthly cloud GPU spending by more than 20%. In this way, NVIDIA technology enables Zensors to provide a high-availability, production-grade, fully managed service for Toronto Pearson and other clients, Jain said.

“Today, lots of companies and organizations want to adopt AI, but the hard part is figuring out how to go about it,” he added. “Being a part of NVIDIA Metropolis gives us the best tools and enables more visibility for potential end users of Zensors technology, which ultimately lets users deploy AI with ease.”

Zensors is also a member of NVIDIA Inception, a free program that nurtures cutting-edge startups.

Visual AI for the Future of Transportation

Among many other customers who use Zensors AI is Ireland’s Cork Airport, which uses the platform to optimize its operations from curb to gate. In June, Zensors AI was deployed across the airport in just 20 days and, in less than four months, the platform helped save about 90 hours of congestion time through proactive curbside traffic management.

“Aviation is just one part of mobility,” Jain said. “We’re expanding to rail, bus and multimodal transit — and we believe Zensors will provide the layer of intelligence to eventually bring AI to all types of brick-and-mortar operators.”

Looking forward, the company is working to incorporate generative AI and large language models into the question-answering capabilities of its platform in a safe, reliable way.

Learn more about the NVIDIA Metropolis platform and how it’s used to build smarter, safer travel hubs, including at Bengaluru Airport, one of India’s busiest airports.

Read More

Mitigate hallucinations through Retrieval Augmented Generation using Pinecone vector database & Llama-2 from Amazon SageMaker JumpStart

Despite the seemingly unstoppable adoption of LLMs across industries, they are one component of a broader technology ecosystem that is powering the new AI wave. Many conversational AI use cases require LLMs like Llama 2, Flan T5, and Bloom to respond to user queries. These models rely on parametric knowledge to answer questions. The model learns this knowledge during training and encodes it into the model parameters. In order to update this knowledge, we must retrain the LLM, which takes a lot of time and money.

Fortunately, we can also use source knowledge to inform our LLMs. Source knowledge is information fed into the LLM through an input prompt. One popular approach to providing source knowledge is Retrieval Augmented Generation (RAG). Using RAG, we retrieve relevant information from an external data source and feed that information into the LLM.

In this blog post, we’ll explore how to deploy LLMs such as Llama 2 using Amazon SageMaker JumpStart and keep them up to date with relevant information through Retrieval Augmented Generation (RAG) using the Pinecone vector database, in order to reduce AI hallucination.

Retrieval Augmented Generation (RAG) in Amazon SageMaker

Pinecone will handle the retrieval component of RAG, but you need two more critical components: somewhere to run the LLM inference and somewhere to run the embedding model.

Amazon SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all machine learning (ML) development steps. It provides SageMaker JumpStart, a model hub where users can locate, preview, and launch a particular model in their own SageMaker account. JumpStart offers pretrained, publicly available and proprietary models for a wide range of problem types, including foundation models.

Amazon SageMaker Studio provides the ideal environment for developing RAG-enabled LLM pipelines. First, using the AWS console, go to Amazon SageMaker, create a SageMaker Studio domain, and open a Studio notebook.

Prerequisites

Complete the following prerequisite steps:

  1. Set up Amazon SageMaker Studio.
  2. Onboard to an Amazon SageMaker Domain.
  3. Sign up for a free-tier Pinecone Vector Database.
  4. Prerequisite libraries: SageMaker Python SDK, Pinecone Client

Solution Walkthrough

Using a SageMaker Studio notebook, we first need to install the prerequisite libraries:

!pip install -qU sagemaker pinecone-client==2.2.1 ipywidgets==7.0.0 

Deploying an LLM

In this post, we discuss two approaches to deploying an LLM. The first is through the HuggingFaceModel object. You can use this when deploying LLMs (and embedding models) directly from the Hugging Face model hub.

For example, you can create a deployable config for the google/flan-t5-xl model as shown in the following code:

import sagemaker
from sagemaker.huggingface import (
    HuggingFaceModel,
    get_huggingface_llm_image_uri
)

role = sagemaker.get_execution_role()

hub_config = {
    'HF_MODEL_ID': 'google/flan-t5-xl',  # model_id from hf.co/models
    'HF_TASK': 'text-generation'  # NLP task you want to use for predictions
}

# retrieve the llm image uri
llm_image = get_huggingface_llm_image_uri("huggingface", version="0.8.2")

huggingface_model = HuggingFaceModel(
    env=hub_config,
    role=role,  # IAM role with permissions to create an endpoint
    image_uri=llm_image
)

When deploying models directly from Hugging Face, initialize the model configuration (the HuggingFaceModel above) with the following:

  • An env config tells us which model we want to use and for what task.
  • Our SageMaker execution role gives us permissions to deploy our model.
  • An image_uri is an image config specifically for deploying LLMs from Hugging Face.

Alternatively, SageMaker has a set of models directly compatible with a simpler JumpStartModel object. Many popular LLMs like Llama 2 are supported by this object, which can be initialized as shown in the following code:

import sagemaker 
from sagemaker.jumpstart.model import JumpStartModel 

role = sagemaker.get_execution_role() 

my_model = JumpStartModel(model_id = "meta-textgeneration-llama-2-7b-f")

For either version of the model, deploy it as shown in the following code:

predictor = my_model.deploy(
    initial_instance_count=1, instance_type="ml.g5.4xlarge", endpoint_name="llama-2-generator")

Querying the pre-trained LLM

With our initialized LLM endpoint, you can begin querying. The format of our queries may vary (particularly between conversational and non-conversational LLMs), but the process is generally the same. For the Hugging Face model, do the following:

# https://aws.amazon.com/blogs/machine-learning/llama-2-foundation-models-from-meta-are-now-available-in-amazon-sagemaker-jumpstart/

prompt = """Answer the following QUESTION based on the CONTEXT
given. If you do not know the answer and the CONTEXT doesn't
contain the answer truthfully say "I don't know

ANSWER:

"""

payload = {
    "inputs":  
      [
        [
         {"role": "system", "content": prompt},
         {"role": "user", "content": question},
        ]   
      ],
   "parameters":{"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6, "return_full_text": False}
}

out = predictor.predict(payload, custom_attributes='accept_eula=true')
out[0]['generation']['content']

You can find the solution in the GitHub repository.

The generated answer we’re receiving here doesn’t make much sense — it is a hallucination.

Providing Additional Context to LLM

Llama 2 attempts to answer our question based solely on internal parametric knowledge. Clearly, the model parameters do not store knowledge of which instances we can use with Managed Spot Training in SageMaker.

To answer this question correctly, we must use source knowledge. That is, we give additional information to the LLM via the prompt. Let’s add that information directly as additional context for the model.

context = """Managed Spot Training can be used with all instances
supported in Amazon SageMaker. Managed Spot Training is supported
in all AWS Regions where Amazon SageMaker is currently available."""

prompt_template = """Answer the following QUESTION based on the CONTEXT
given. If you do not know the answer and the CONTEXT doesn't
contain the answer truthfully say "I don't know".

CONTEXT:
{context}

ANSWER:
"""

text_input = prompt_template.replace("{context}", context).replace("{question}", question)

payload = {
    "inputs":  
      [
        [
         {"role": "system", "content": text_input},
         {"role": "user", "content": question},
        ]   
      ],
   "parameters":{"max_new_tokens": 64, "top_p": 0.9, "temperature": 0.6, "return_full_text": False}
}

out = predictor.predict(payload, custom_attributes='accept_eula=true')
generated_text = out[0]['generation']['content']
print(f"[Input]: {question}n[Output]: {generated_text}")

[Input]: Which instances can I use with Managed Spot Training in SageMaker?

[Output]:  Based on the given context, you can use Managed Spot Training with all instances supported in Amazon SageMaker. Therefore, the answer is:

All instances supported in Amazon SageMaker.

We now see the correct answer to the question; that was easy! However, a user is unlikely to insert contexts into their prompts manually; if they already knew which context to provide, they would likely already know the answer.

Rather than manually inserting a single context, automatically identify relevant information from a more extensive database of information. For that, you will need Retrieval Augmented Generation.

Retrieval Augmented Generation

With Retrieval Augmented Generation, you can encode a database of information into a vector space where proximity between vectors represents their relevance or semantic similarity. With this vector space as a knowledge base, you can take a new user query, encode it into the same vector space, and retrieve the most relevant previously indexed records.

After retrieving these relevant records, select a few of them and include them in the LLM prompt as additional context, providing the LLM with highly relevant source knowledge. This is a two-step process where:

  • Indexing populates the vector index with information from a dataset.
  • Retrieval happens during a query and is where we retrieve relevant information from the vector index.

Both steps require an embedding model to translate our human-readable plain text into semantic vector space. Use the highly efficient MiniLM sentence transformer from Hugging Face as shown in the following code. This model is not an LLM and therefore is not initialized in the same way as our Llama 2 model.

hub_config = {
    "HF_MODEL_ID": "sentence-transformers/all-MiniLM-L6-v2",  # model_id from hf.co/models
    "HF_TASK": "feature-extraction",
}

huggingface_model = HuggingFaceModel(
    env=hub_config,
    role=role,
    transformers_version="4.6",  # transformers version used
    pytorch_version="1.7",  # pytorch version used
    py_version="py36",  # python version of the DLC
)

In the hub_config, specify the model ID as shown in the preceding code, but for the task, use feature-extraction because we are generating vector embeddings rather than text like our LLM. Following this, initialize the model config with HuggingFaceModel as before, but this time without the LLM image and with some version parameters.

encoder = huggingface_model.deploy(
    initial_instance_count=1, instance_type="ml.t2.large", endpoint_name="minilm-embedding"
)

You can deploy the model again with deploy, using the smaller (CPU-only) ml.t2.large instance. The MiniLM model is tiny, so it does not require a lot of memory and doesn’t need a GPU, because it can quickly create embeddings even on a CPU. If preferred, you can run the model on a GPU for faster encoding.
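If a GPU is preferred for faster batch encoding, the same model can be deployed to a GPU-backed instance instead. This is a minimal sketch; the instance type and endpoint name here are illustrative assumptions, not part of the original walkthrough:

encoder = huggingface_model.deploy(
    initial_instance_count=1, instance_type="ml.g4dn.xlarge", endpoint_name="minilm-embedding-gpu"
)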

To create embeddings, use the predict method and pass a list of contexts to encode via the inputs key as shown:

out = encoder.predict({"inputs": ["some text here", "some more text goes here too"]})

Two input contexts are passed, returning two context vector embeddings as shown:

len(out)

2

The embedding dimensionality of the MiniLM model is 384 which means each vector embedding MiniLM outputs should have a dimensionality of 384. However, looking at the length of our embeddings, you will see the following:

len(out[0]), len(out[1])

(8, 8)

The two lists contain eight items each because MiniLM first processes text in a tokenization step, which transforms our human-readable plain text into a list of model-readable token IDs. The output features of the model are therefore token-level embeddings. Inspecting one of these embeddings shows the expected dimensionality of 384:

len(out[0][0])

384

Transform these token-level embeddings into document-level embeddings by using the mean values across each vector dimension, as shown in the following illustration.

Mean pooling operation to get a single 384-dimensional vector.

import numpy as np

embeddings = np.mean(np.array(out), axis=1)
embeddings.shape

(2, 384)

This leaves us with two 384-dimensional vector embeddings, one for each input text. To make our lives easier, wrap the encoding process into a single function as shown in the following code:

from typing import List

def embed_docs(docs: List[str]) -> List[List[float]]:
    out = encoder.predict({"inputs": docs})
    embeddings = np.mean(np.array(out), axis=1)
    return embeddings.tolist()

Downloading the Dataset

Download the Amazon SageMaker FAQs as the knowledge base; the data contains both question and answer columns.

When performing the search, look for Answers only, so you can drop the Question column. See notebook for details.
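The indexing step later in this post refers to a df_knowledge DataFrame holding the FAQ data. The following is a minimal sketch of how it might be prepared, assuming the FAQs have been saved locally as a two-column CSV named Amazon_SageMaker_FAQs.csv (both the filename and layout are assumptions; the exact loading code is in the accompanying notebook):

import pandas as pd

# Illustrative filename and column layout
df_knowledge = pd.read_csv("Amazon_SageMaker_FAQs.csv", header=None, names=["Question", "Answer"])

# Only the answers are indexed, so the Question column can be dropped
df_knowledge.drop(columns=["Question"], inplace=True)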

Our dataset and the embedding pipeline are ready. Now all we need is somewhere to store those embeddings.

Indexing

The Pinecone vector database stores vector embeddings and searches them efficiently at scale. To create a database, you will need a free API key from Pinecone.

import pinecone
import os

# add Pinecone API key from app.pinecone.io
api_key = os.environ.get("PINECONE_API_KEY") or "YOUR_API_KEY"
# set Pinecone environment - find next to API key in console
env = os.environ.get("PINECONE_ENVIRONMENT") or "YOUR_ENV"

pinecone.init(api_key=api_key, environment=env)

After you have connected to the Pinecone vector database, create a single vector index (similar to a table in traditional DBs). Name the index retrieval-augmentation-aws and align the index dimension and metric parameters with those required by the embedding model (MiniLM in this case).

import time

index_name = "retrieval-augmentation-aws"

if index_name in pinecone.list_indexes():
    pinecone.delete_index(index_name)

pinecone.create_index(name=index_name, dimension=embeddings.shape[1], metric="cosine")
# wait for index to finish initialization
while not pinecone.describe_index(index_name).status["ready"]:
    time.sleep(1)

To begin inserting data, run the following:

from tqdm.auto import tqdm

batch_size = 2  # can increase but needs larger instance size otherwise instance runs out of memory
vector_limit = 1000

answers = df_knowledge[:vector_limit]
index = pinecone.Index(index_name)

for i in tqdm(range(0, len(answers), batch_size)):
    # find end of batch
    i_end = min(i + batch_size, len(answers))
    # create IDs batch
    ids = [str(x) for x in range(i, i_end)]
    # create metadata batch
    metadatas = [{"text": text} for text in answers["Answer"][i:i_end]]
    # create embeddings
    texts = answers["Answer"][i:i_end].tolist()
    embeddings = embed_docs(texts)
    # create records list for upsert
    records = zip(ids, embeddings, metadatas)
    # upsert to Pinecone
    index.upsert(vectors=records)

You can begin querying the index with the question from earlier in this post.

# extract the embedding for the question
query_vec = embed_docs([question])[0]

# query pinecone
res = index.query(query_vec, top_k=1, include_metadata=True)

# show the results
res
{'matches': [{'id': '90',
'metadata': {'text': 'Managed Spot Training can be used with all '
'instances supported in Amazon '
'SageMaker.rn'},
'score': 0.881181657,
'values': []}],
'namespace': ''}

The above output shows that we’re returning relevant contexts to help answer our question. Because we set top_k = 1, index.query returned the top result alongside the metadata, which reads Managed Spot Training can be used with all instances supported in Amazon SageMaker.

Augmenting the Prompt

Use the retrieved contexts to augment the prompt, and decide on a maximum amount of context to feed into the LLM. Use a 1,000-character limit to iteratively add each returned context to the prompt until you exceed that limit, as sketched in the helper functions below.


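The helper functions construct_context and create_payload used in the next code block appeared as screen captures in the original post. The following is a minimal sketch under the assumptions above: construct_context applies the 1,000-character limit, and create_payload mirrors the chat-style payload format used earlier in this post (the max_new_tokens value is chosen for illustration). The exact implementations are in the accompanying notebook.

from typing import List

max_context_len = 1000  # character budget for retrieved context

def construct_context(contexts: List[str]) -> str:
    # Add retrieved passages until the character limit would be exceeded
    chosen, total_len = [], 0
    for text in contexts:
        text = text.strip()
        total_len += len(text) + 2
        if total_len > max_context_len:
            break
        chosen.append(text)
    return "\n".join(chosen)

def create_payload(question: str, context_str: str) -> dict:
    # Same chat-style payload format used for the Llama 2 endpoint above
    prompt_template = """Answer the following QUESTION based on the CONTEXT
given. If you do not know the answer and the CONTEXT doesn't
contain the answer truthfully say "I don't know".

CONTEXT:
{context}

ANSWER:
"""
    text_input = prompt_template.replace("{context}", context_str)
    return {
        "inputs": [
            [
                {"role": "system", "content": text_input},
                {"role": "user", "content": question},
            ]
        ],
        "parameters": {"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.6, "return_full_text": False},
    }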
Feed the context_str into the LLM prompt as shown in the following code:

payload = create_payload(question, context_str)
out = predictor.predict(payload, custom_attributes='accept_eula=true')
generated_text = out[0]['generation']['content']
print(f"[Input]: {question}n[Output]: {generated_text}")
[Input]: Which instances can I use with Managed Spot Training in SageMaker?

[Output]:  Based on the context provided, you can use Managed Spot Training with all instances supported in Amazon SageMaker. Therefore, the answer is:


All instances supported in Amazon SageMaker.

The logic works, so wrap it up into a single function to keep things clean.

def rag_query(question: str) -> str:
    # create query vec
    query_vec = embed_docs([question])[0]
    # query pinecone
    res = index.query(query_vec, top_k=5, include_metadata=True)
    # get contexts
    contexts = [match.metadata["text"] for match in res.matches]
    # build the multiple contexts string
    context_str = construct_context(contexts=contexts)
    # create our retrieval augmented prompt
    payload = create_payload(question, context_str)
    # make prediction
    out = predictor.predict(payload, custom_attributes='accept_eula=true')
    return out[0]["generation"]["content"]

You can now ask questions like those shown in the following:

rag_query("Does SageMaker support spot instances?")

' Yes, Amazon SageMaker supports spot instances for managed spot training. According to the provided context, Managed Spot Training can be used with all instances supported in Amazon SageMaker, and Managed Spot Training is supported in all AWS Regions where Amazon SageMaker is currently available.\n\nTherefore, the answer to your question is:\n\nYes, SageMaker supports spot instances in all regions where Amazon SageMaker is available.'

Clean up

To stop incurring unwanted charges, delete the models and endpoints.

encoder.delete_model()

encoder.delete_endpoint()
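The Llama 2 endpoint deployed earlier also continues to incur charges while it runs; it can be removed the same way:

predictor.delete_model()

predictor.delete_endpoint()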

Conclusion

In this post, we introduced you to RAG with open-access LLMs on SageMaker. We also showed how to deploy Amazon SageMaker JumpStart models with Llama 2, Hugging Face LLMs with Flan T5, and embedding models with MiniLM.

We implemented a complete end-to-end RAG pipeline using our open-access models and a Pinecone vector index. Using this, we showed how to minimize hallucinations, keep LLM knowledge up to date, and ultimately enhance the user experience and trust in our systems.

To run this example on your own, clone this GitHub repository and walk through the previous steps using the Question Answering notebook on GitHub.


About the authors

Vedant Jain is a Sr. AI/ML Specialist, working on strategic Generative AI initiatives. Prior to joining AWS, Vedant held ML/Data Science Specialty positions at various companies such as Databricks, Hortonworks (now Cloudera) and JP Morgan Chase. Outside of his work, Vedant is passionate about making music, rock climbing, using science to lead a meaningful life and exploring cuisines from around the world.

James Briggs is a Staff Developer Advocate at Pinecone, specializing in vector search and AI/ML. He guides developers and businesses in developing their own GenAI solutions through online education. Prior to Pinecone, James worked on AI at organizations ranging from small tech startups to established finance corporations. Outside of work, James has a passion for traveling and embracing new adventures, ranging from surfing and scuba to Muay Thai and BJJ.

Xin Huang is a Senior Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the areas of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers at ACL, ICDM, and KDD conferences, and in the Royal Statistical Society: Series A.

Read More

Abstracts: December 6, 2023

Microsoft Research Podcast - Abstracts

Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.

In this episode, Xing Xie, a Senior Principal Research Manager at Microsoft Research, joins host Gretchen Huizinga to discuss “Evaluating General-Purpose AI with Psychometrics.” As AI capabilities move from task specific to more general purpose, the paper explores psychometrics, a subfield of psychology, as an alternative to traditional methods for evaluating model performance and for supporting consistent and reliable systems.

Transcript

[MUSIC PLAYS]

GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.

[MUSIC FADES]

Today I’m talking to Dr. Xing Xie, a Senior Principal Research Manager at Microsoft Research. Dr. Xie is coauthor of a vision paper on large language models called “Evaluating General-Purpose AI with Psychometrics,” and you can find a preprint of this paper now on arXiv. Xing Xie, thank you for joining us on Abstracts!

XING XIE: Yes, thank you. It’s my pleasure to be here. 

HUIZINGA: So in a couple sentences, tell us what issue or problem your research addresses and why people should care about it. 


XIE: Yeah, in a sense, actually, we are exploring the potential of psychometrics to revolutionize how we evaluate general-purpose AI. Because AI is advancing at a very rapid pace, traditional evaluation methods face significant challenges, especially when it comes to predicting a model’s performance in unfamiliar scenarios. And this method also lacks a robust mechanism to assess their own quality. Additionally, we, in this paper, we delve into the complexity of directly applying psychometrics to this domain and underscore several promising directions for future research. We believe that this research is of great importance. As AI continues to be integrated into novel application scenarios, it could have significant implications for both individuals and society at large. It’s crucial that we ensure their performance is both consistent and reliable.

HUIZINGA: OK, so I’m going to drill in a little bit in case there’s people in our audience that don’t understand what psychometrics is. Could you explain that a little bit for the audience? 

XIE: Yeah, psychometrics could be considered as a subdomain of psychology. Basically, psychology just studies everything about humans, but psychometrics is specifically developed to study how we can better evaluate, we could also call this general-purpose intelligence, but it’s human intelligence. So there are, actually, a lot of methodologies and approaches in how we develop this kind of test and what tasks we need to carry out. The previous AI is designed for specific tasks like machine translation, like summarization. But now I think people are already aware of many progress in big models, in large language models. AI, actually, currently can be considered as some kind of solving general-purpose tasks. Sometimes we call it few-shot learning, or sometimes we call it like zero-shot learning. We don’t need to train a model before we bring new tasks to them. So this brings a question in how we evaluate this kind of general-purpose AI, because traditionally, we evaluate AI usually using some specific benchmark, specific dataset, and specific tasks. This seems to be unsuitable to this new general-purpose AI. 

HUIZINGA: So how does your approach build on and/or differ from what’s been done previously in this field? 

XIE: Yeah, we actually see a lot of efforts have been investigated into evaluating the performance of these new large language models. But we see a significant portion of these evaluations are task specific. They’re still task specific. And also, frankly speaking, they are easily affected by changes. That means even slight alterations to a test could lead to substantial drops in performance. So our methodology differs from these approaches in that rather than solely testing how AI performs on those predetermined tasks, we actually are evaluating those latent constructs because we believe that pinpointing these latent constructs is very important.

HUIZINGA: Yeah. 

XIE: It’s important in forecasting AI’s performance in evolving and unfamiliar contexts. We can use an example like game design. With humans, even if an individual has never worked on game design—it’s just a whole new task for her—we might still confidently infer their potential if we know they possess the essential latent constructs, or abilities, which are important for game design. For example, creativity, critical thinking, and communication. 

HUIZINGA: So this is a vision paper and you’re making a case for using psychometrics as opposed to regular traditional benchmarks for assessing AI. So would you say there was a methodology involved in this as a research paper, and if so, how did you conduct the research for this? What was the overview of it? 

XIE: As you said, this is a vision paper. So instead of describing a specific methodology, we are collaborating with several experienced psychometrics researchers. Collectively, we explore the feasibility of integrating psychometrics into AI evaluation and discerning which concepts are viable and which are not. In February this year, we hosted a workshop on this topic. Over the past months, we have engaged in, in numerous discussions, and the outcome of these discussions is articulated in this paper. And additionally, actually, we are also in the middle of drafting another paper; that paper will apply insights from this paper to devise a rigorous methodology for assessing the latent capability of the most cutting-edge language models. 

HUIZINGA: When you do a regular research paper, you have findings. And when you did this paper and you workshopped it, what did you come away with in terms of the possibilities for what you might do on assessing AI with psychometrics? What were your major findings? 

XIE: Yeah, our major findings can be divided into two areas. First, we underscore the significant potential of psychometrics. This includes exploring how these metrics can be utilized to enhance predictive accuracy and guarantee test quality. Second, we also draw attention to the new challenges that arise when directly applying these principles to AI. For instance, test results could be misinterpreted, as assumptions verified for human tests might not necessarily apply to AI. Furthermore, capabilities that are essential for humans may not hold the same importance for AI.

HUIZINGA: Hmm …  

XIE: Another notable challenge is the lack of a consistent and defined population of AI, especially considering their rapid evolution. But this population is essential for traditional psychometrics, and we need to have a population of humans for verifying either the reliability or the validity of a test. But for AI, this becomes a challenge. 

HUIZINGA: Based on those findings, how do you think your work is significant in terms of real-world impact at this point? 

XIE: We believe that our approach will signal the start of a new era in the evaluation of general-purpose AI, shifting from earlier, task-specific methodologies to a more rigorous scientific method. Fundamentally, there’s an urgent demand to establish a dedicated research domain focusing solely on AI evaluation. We believe psychometrics will be at the heart of this domain. Given AI’s expanding role in society and its growing significance as an indispensable assistant, this evolution will be crucial. I think one missing part of current AI evaluation is how we can make sure the test, the benchmark, or these evaluation methods of AI themselves, is scientific. Actually, previously, I used the example of game design. Suppose in the future, I think there are a lot of people discussing language model agents, AI agents … they could be used to not only write in code but also develop software by collaborating among different agents. Then what kind of capabilities, or we call them latent constructs, of these AI models they should have before they make success in game design or any other software development. For example, like creativity, critical thinking, communication. Because this could be important when there are multiple AI models—they communicate with each other, they check the result of the output of other models. 

HUIZINGA: Are there other areas that you could say, hey, this would be a relevant application of having AI evaluated with psychometrics instead of the regular benchmarks because of the generality of intelligence?

XIE: We are mostly interested in maybe doing research, because a lot of researchers have started to leverage AI for their own research. For example, not only for writing papers, not only for generating some ideas, but maybe they could use AI models for more tasks in the whole pipeline of research. So this may require AI to have some underlying capabilities, like, as we have said, like critical thinking—how AI should define the new ideas and how they check whether these ideas are feasible and how they propose creative solutions and how they work together on research. This could be another domain. 

HUIZINGA: So if there was one thing that you want our listeners to take away from this work, what would it be? 

XIE: Yeah, I think the one takeaway I want to say is we should be aware of the vital importance of AI evaluation. We are still far from achieving a truly scientific standard, so we need to still work hard to get that done. 

HUIZINGA: Finally, what unanswered questions or unsolved problems remain in this area? What’s next on your research agenda that you’re working on? 

XIE: Yeah, actually, there are a lot of unanswered questions as highlighted at the later part of this paper. Ultimately, our goal is to adapt psychometric theories and the techniques to fit AI contexts. So we have discussed with our collaborators in both AI and psychometrics … some examples would be, how can we develop guidelines, extended theories, and techniques to ensure a rigorous evaluation that prevents misinterpretation? And how can we best evaluate assistant AI and the dynamics of AI-human teaming? This actually is particularly proposed by one of our collaborators in the psychometrics domain. And how do we evaluate the value of general-purpose AI and ensure their alignment with human objectives? And then how can we employ semiautomatic methods to develop psychometric tests, theories, and techniques with the help of general-purpose AI? That means we use AI to solve these problems by themselves. This is also important because, you know, psychometrics or psychology have developed for hundreds, or maybe thousands, of years to come to all the techniques today. But can we shorten that period? Can we leverage AI to speed up this development? 

HUIZINGA: Would you say there’s wide agreement in the AI community that this is a necessary direction to head?

XIE: This is only starting. I think there are several papers discussing how we can apply some part of psychology or some part of psychometrics to AI. But there is no systematic discussion or thinking along this line. So I, I don’t think there is agreement, but there’s already initial thoughts and initial perspectives showing in the academic community. 

[MUSIC PLAYS]

HUIZINGA: Well, Xing Xie, thanks for joining us today, and to our listeners, thank you for tuning in. If you’re interested in learning more about this paper, you can find a link at aka.ms/abstracts (opens in new tab), or you can find a preprint of the paper on arXiv. See you next time on Abstracts!

The post Abstracts: December 6, 2023 appeared first on Microsoft Research.

Read More

Microsoft at ESEC/FSE 2023: AI techniques for a streamlined coding workflow

These research papers were presented at the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (opens in new tab) (ESEC/FSE 2023), a premier conference in the field of software engineering.

The practice of software development inevitably involves the challenge of handling bugs and various coding irregularities. These issues can become pronounced when developers engage in the common practice of copying and pasting code snippets from the web or other peer projects. While this approach might offer a quick solution, it can introduce a host of potential complications, including compilation issues, bugs, and even security vulnerabilities into the developer’s codebase.

To address this, researchers at Microsoft have been working to advance different aspects of the software development lifecycle, from code adaptation to automated bug detection and repair. At ESEC/FSE 2023 (opens in new tab), we introduced two techniques aimed at enhancing coding efficiency. AdaptivePaste utilizes a learning-based approach to adapt and refine pasted code snippets in an integrated development environment (IDE). InferFix is an end-to-end program repair framework designed to automate bug detection and resolution. This blog outlines these technologies.

AdaptivePaste: Intelligent copy-paste in IDE

A widespread practice among developers involves adapting pasted code snippets to specific use cases. However, current code analysis and completion techniques, such as masked language modeling and CodeT5, do not achieve an acceptable level of accuracy in identifying and adapting variable identifiers within these snippets to align them with the surrounding code. In the paper, “AdaptivePaste: Intelligent Copy-Paste in IDE,” we propose a learning-based approach to source code adaptation, aiming to capture meaningful representations of variable usage patterns. First, we introduce a specialized dataflow-aware de-obfuscation pretraining objective for pasted code snippet adaptation. Next, we introduce a transformer-based model in two variants: a traditional unidecoder and a parallel-decoder model with tied weights.

Figure 1. AdaptivePaste architecture. For a program with a pasted code snippet, AdaptivePaste extracts and prioritizes the syntax hierarchies most relevant to the learning task, analyzes the data flow, and anonymizes variable identifiers in the pasted code snippet. The resulting program serves as input to the neural model, and the output is serialized as a sequence of tokens.

The unidecoder follows a standard autoregressive decoder formulation, mapping each variable in the pasted snippet to a unique symbol in the context or declaring a new variable. The parallel decoder duplicates the decoder for each anonymized symbol in the anonymized pasted snippet, predicting names independently and factorizing the output distribution per symbol. This enables selective code snippet adaptation by surfacing model predictions above a specified threshold and outputting “holes” where uncertainty exists.
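To make the selective-adaptation idea concrete, the following is a minimal, hypothetical sketch (not the paper’s implementation): a per-symbol identifier prediction is surfaced only when the model’s confidence clears a threshold, and a placeholder hole is emitted otherwise.

from typing import Dict, List, Tuple

def select_adaptations(
    predictions: Dict[str, List[Tuple[str, float]]],  # anonymized symbol -> ranked (name, probability)
    threshold: float = 0.8,
) -> Dict[str, str]:
    """Keep high-confidence identifier predictions; leave uncertain ones as holes."""
    adapted = {}
    for symbol, ranked in predictions.items():
        best_name, best_prob = ranked[0]
        adapted[symbol] = best_name if best_prob >= threshold else "<HOLE>"
    return adapted

# Hypothetical per-symbol outputs from a parallel decoder:
example = {"VAR_0": [("session", 0.93), ("conn", 0.04)], "VAR_1": [("cfg", 0.41), ("config", 0.38)]}
print(select_adaptations(example))  # {'VAR_0': 'session', 'VAR_1': '<HOLE>'}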

To establish a dataflow-aware de-obfuscation pretraining objective for pasted code snippet adaptation, we assigned mask symbols to variable identifiers at the granularity of whole code tokens. The pre-existing code context was unanonymized, allowing the model to attend to existing identifier names defined in scope.

Our evaluation of AdaptivePaste showed promising results. It successfully adapted Python source code snippets with 67.8 percent exact match accuracy. When we analyzed the impact of confidence thresholds on model predictions, we observed that the parallel decoder transformer model improves precision to 85.9 percent in a selective code adaptation setting.

InferFix: End-to-end program repair with LLMs

Addressing software defects accounts for a significant portion of development costs. To tackle this, the paper, “InferFix: End-to-End Program Repair with LLMs over Retrieval-Augmented Prompts,” introduces a program repair framework that combines the capabilities of a state-of-the-art static analyzer called Infer, a semantic retriever model called Retriever, and a transformer-based model called Generator to address crucial security and performance bugs in Java and C#.

The Infer static analyzer is used to reliably detect, classify, and locate critical bugs within complex systems through formal verification. The Retriever uses a transformer encoder model to search for semantically equivalent bugs and corresponding fixes in large datasets of known bugs. It’s trained using a contrastive learning objective to excel at finding relevant examples of the same bug type.

The Generator employs a 12-billion-parameter Codex model, fine-tuned on supervised bug-fix data. To enhance its performance, the prompts provided to the Generator are augmented with bug type annotations, bug contextual information, and semantically similar fixes retrieved from an external nonparametric memory by the Retriever. The Generator then produces a candidate fix for the bug.

Figure 2. The InferFix workflow. An error-prone code modification is detected by the Infer static analyzer, which is used to craft a prompt with bug type annotation, location information, relevant syntax hierarchies, and similar fixes identified by the Retriever. The large language model (LLM) Generator then proposes a candidate fix to the developer.

To test InferFix, we curated a dataset called InferredBugs (opens in new tab), which is rich in metadata and comprises bugs identified through executing the Infer static analyzer on thousands of Java and C# repositories. The results are noteworthy. InferFix outperforms strong LLM baselines, achieving a top-1 accuracy of 65.6 percent in C# and an impressive 76.8 percent in Java on the InferredBugs dataset.

Looking ahead

With AdaptivePaste and InferFix, we hope to significantly streamline the coding process, minimizing errors and enhancing efficiency. This includes reducing the introduction of bugs when code snippets are added and providing automated bug detection, classification, and patch validation. We believe that these tools hold promise for an enhanced software development workflow, leading to reduced costs and an overall boost in project efficiency.

Looking ahead, the rapid advancement of LLMs like GPT-3.5 and GPT-4 has sparked our interest in exploring ways to harness their potential in bug management through prompt engineering and other methods. Our goal is to empower developers by streamlining the bug detection and repair process, facilitating a more robust and efficient development environment.

The post Microsoft at ESEC/FSE 2023: AI techniques for a streamlined coding workflow appeared first on Microsoft Research.

Read More

17 Predictions for 2024: From RAG to Riches to Beatlemania and National Treasures

Move over, Merriam-Webster: Enterprises this year found plenty of candidates to add for word of the year. “Generative AI” and “generative pretrained transformer” were followed by terms such as “large language models” and “retrieval-augmented generation” (RAG) as whole industries turned their attention to transformative new technologies.

Generative AI started the year as a blip on the radar but ended with a splash. Many companies are sprinting to harness its ability to ingest text, voice and video to churn out new content that can revolutionize productivity, innovation and creativity.

Enterprises are riding the trend. Generative AI tools like OpenAI’s ChatGPT, further trained with corporate data, could add the equivalent of $2.6 trillion to $4.4 trillion annually across 63 business use cases, according to McKinsey & Company.

Yet managing massive amounts of internal data often has been cited as the biggest obstacle to scaling AI. Some NVIDIA experts in AI predict that 2024 will be all about phoning a friend — creating partnerships and collaborations with cloud service providers, data storage and analytical companies, and others with the know-how to handle, fine-tune and deploy big data efficiently.

Large language models are at the center of it all. NVIDIA experts say advancements in LLM research will increasingly be applied in business and enterprise applications. AI capabilities like RAG, autonomous intelligent agents and multimodal interactions will become more accessible and more easily deployed via virtually any platform.

Hear from NVIDIA experts on what to expect in the year ahead:

MANUVIR DAS
Vice President of Enterprise Computing

One size doesn’t fit all: Customization is coming to enterprises. Companies won’t have one or two generative AI applications — many will have hundreds of customized applications using proprietary data that is suited to various parts of their business.

Once running in production, these custom LLMs will feature RAG capabilities to connect data sources to generative AI models for more accurate, informed responses. Leading companies like Amdocs, Dropbox, Genentech, SAP, ServiceNow and Snowflake are already building new generative AI services using RAG and LLMs.

Open-source software leads the charge: Thanks to open-source pretrained models, generative AI applications that solve specific domain challenges will become part of businesses’ operational strategies.

Once companies combine these headstart models with private or real-time data, they can begin to see accelerated productivity and cost benefits across the organization. AI computing and software are set to become more accessible on virtually any platform, from cloud-based computing and AI model foundry services to the data center, edge and desktop.

Off-the-shelf AI and microservices: Generative AI has spurred the adoption of application programming interface (API) endpoints, which make it easier for developers to build complex applications.

In 2024, software development kits and APIs will level up as developers customize off-the-shelf AI models using AI microservices such as RAG as a service. This will help enterprises harness the full potential of AI-driven productivity with intelligent assistants and summarization tools that can access up-to-date business information.

Developers will be able to embed these API endpoints directly into their applications without having to worry about maintaining the necessary infrastructure to support the models and frameworks. End users can in turn experience more intuitive, responsive and tailored applications that adapt to their needs.

IAN BUCK
Vice President of Hyperscale and HPC

National treasure: AI is set to become the new space race, with every country looking to create its own center of excellence for driving significant advances in research and science and improving GDP.

With just a few hundred nodes of accelerated computing, countries will be able to quickly build highly efficient, massively performant, exascale AI supercomputers. Government-funded generative AI centers of excellence will boost countries’ economic growth by creating new jobs and building stronger university programs to create the next generation of scientists, researchers and engineers.

Quantum leaps and bounds: Enterprise leaders will launch quantum computing research initiatives based on two key drivers: the ability to use traditional AI supercomputers to simulate quantum processors and the availability of an open, unified development platform for hybrid-classical quantum computing. This enables developers to use standard programming languages instead of needing custom, specialized knowledge to build quantum algorithms.

Once considered an obscure niche in computer science, quantum computing exploration will become more mainstream as enterprises join academia and national labs in pursuing rapid advances in materials science, pharmaceutical research, subatomic physics and logistics.

KARI BRISKI
Vice President of AI Software

From RAG to riches: Expect to hear a lot more about retrieval-augmented generation as enterprises embrace these AI frameworks in 2024.

As companies train LLMs to build generative AI applications and services, RAG is widely seen as an answer to the inaccuracies or nonsensical replies that sometimes occur when the models don’t have access to enough accurate, relevant information for a given use case.

Using semantic retrieval, enterprises will take open-source foundation models, ingest their own data so that a user query can retrieve the relevant data from the index and then pass it to the model at run time.

The upshot is that enterprises can use fewer resources to achieve more accurate generative AI applications in sectors such as healthcare, finance, retail and manufacturing. End users should expect to see more sophisticated, context-sensitive and multimodal chatbots and personalized content recommendation systems that allow them to talk to their data naturally and intuitively.

Multimodality makes its mark: Text-based generative AI is set to become a thing of the past. Even as generative AI remains in its infancy, expect to see many industries embrace multimodal LLMs that allow consumers to use a combination of text, speech and images to deliver more contextually relevant responses to a query about tables, charts or schematics.

Companies such as Meta and OpenAI will look to push the boundaries of multimodal generative AI by adding greater support for the senses, which will lead to advancements in the physical sciences, biological sciences and society at large. Enterprises will be able to understand their data not just in text format but also in PDFs, graphs, charts, slides and more.

NIKKI POPE
Head of AI and Legal Ethics

Target lock on AI safety: Collaboration among leading AI organizations will accelerate the research and development of robust, safe AI systems. Expect to see emerging standardized safety protocols and best practices that will be adopted across industries, ensuring a consistent and high level of safety across generative AI models.

Companies will heighten their focus on transparency and interpretability in AI systems — and use new tools and methodologies to shed light on the decision-making processes of complex AI models. As the generative AI ecosystem rallies around safety, anticipate AI technologies becoming more reliable, trustworthy and aligned with human values.

RICHARD KERRIS
Vice President of Developer Relations, Head of Media and Entertainment

The democratization of development: Virtually anyone, anywhere will soon be set to become a developer. Traditionally, one had to know and be proficient at using a specific development language to develop applications or services. As computing infrastructure becomes increasingly trained on the languages of software development, anyone will be able to prompt the machine to create applications, services, device support and more.

While companies will continue to hire developers to build and train AI models and other professional applications, expect to see significantly broader opportunities for anyone with the right skill set to build custom products and services. They’ll be helped by text inputs or voice prompts, making interactions with computers as simple as verbally instructing them.

“Now and Then” in film and song: Just as the “new” AI-augmented song by the Fab Four spurred a fresh round of Beatlemania, the dawn of the first feature-length generative AI movie will send shockwaves through the film industry.

Take a filmmaker who shoots using a 35mm film camera. The same content can soon be transformed into a 70mm production using generative AI, reducing the significant costs involved in film production in the IMAX format and allowing a broader set of directors to participate.

Creators will transform beautiful images and videos into new types and forms of entertainment by prompting a computer with text, images or videos. Some professionals worry their craft will be replaced, but those issues will fade as generative AI gets better at being trained on specific tasks. This, in turn, will free up hands to tackle other tasks and provide new tools with artist-friendly interfaces.

KIMBERLY POWELL
Vice President of Healthcare 

AI surgical assistants: The day has come when surgeons can use voice to augment what they see and understand inside and outside the surgical suite.

Combining instruments, imaging, robotics and real-time patient data with AI will lead to better surgeon training, more personalization during surgery and better safety with real-time feedback and guidance even during remote surgery. This will help close the gap on the 150 million surgeries that are needed yet do not occur, particularly in low- and middle-income countries.

Generative AI drug discovery factories: A new drug discovery process is emerging, where generative AI molecule generation, property prediction and complex modeling will drive an intelligent lab-in-the-loop flywheel, shortening the time to discover and improving the quality of clinically viable drug candidates.

These AI drug discovery factories employ massive healthcare datasets using whole genomes, atomic-resolution instruments and robotic lab automation capable of running 24/7. For the first time, computers can learn patterns and relationships within enormous and complex datasets and generate, predict and model complex biological relationships that were only previously discoverable through time-consuming experimental observation and human synthesis.

CHARLIE BOYLE
Vice President of DGX Platforms

Enterprises lift bespoke LLMs into the cloud: One thing enterprises learned from 2023 is that building LLMs from scratch isn’t easy. Companies taking this route are often daunted by the need to invest in new infrastructure and technology and they experience difficulty in figuring out how and when to prioritize other company initiatives.

Cloud service providers, colocation providers and other businesses that handle and process data for other businesses will help enterprises with full-stack AI supercomputing and software. This will make customizing pretrained models and deploying them easier for companies across industries.

Fishing for LLM gold in enterprise data lakes: There’s no shortage of statistics on how much information the average enterprise stores — it can be anywhere in the high hundreds of petabytes for large corporations. Yet many companies report that they’re mining less than half that information for actionable insights.

In 2024, businesses will begin using generative AI to make use of that untamed data by putting it to work building and customizing LLMs. With AI-powered supercomputing, businesses will begin mining their unstructured data — including chats, videos and code — to expand their generative AI development into training multimodal models. This leap beyond the ability to mine tables and other structured data will let companies deliver more specific answers to questions and find new opportunities. That includes helping detect anomalies on health scans, uncovering emerging trends in retail and making business operations safer.

AZITA MARTIN
Vice President of Retail, Consumer-Packaged Goods and Quick-Service Restaurants 

Generative AI shopping advisors: Retailers grapple with the dual demands of connecting customers to the products they desire while delivering elevated, human-like, omnichannel shopping experiences that align with their individual needs and preferences.

To meet these goals, retailers are gearing up to introduce cutting-edge, generative AI-powered shopping advisors, which will undergo meticulous training on the retailers’ distinct brand, products and customer data to ensure a brand-appropriate, guided, personalized shopping journey that mimics the nuanced expertise of a human assistant. This innovative approach will help set brands apart and increase customer loyalty by providing personalized help.

Setting up for safety: Retailers across the globe are facing a mounting challenge as organized retail crime grows increasingly sophisticated and coordinated. The National Retail Federation reported that retailers are experiencing a staggering 26.5% surge in such incidents since the post-pandemic uptick in retail theft.

To enhance the safety and security of in-store experiences for both customers and employees, retailers will begin using computer vision and physical security information management software to collect and correlate events from disparate security systems. This will enable AI to detect weapons and unusual behavior like the large-scale grabbing of items from shelves. It will also help retailers proactively thwart criminal activities and maintain a safer shopping environment.

REV LEBAREDIAN
Vice President of Omniverse and Simulation Technology

Industrial digitalization meets generative AI: The fusion of industrial digitalization with generative AI is poised to catalyze industrial transformation. Generative AI will make it easier to turn aspects of the physical world — such as geometry, light, physics, matter and behavior — into digital data. Democratizing the digitalization of the physical world will accelerate industrial enterprises, enabling them to design, optimize, manufacture and sell products more efficiently. It also enables them to more easily create virtual training grounds and synthetic data to train a new generation of AIs that will interact and operate within the physical world, such as autonomous robots and self-driving cars.

3D interoperability takes off: From the drawing board to the factory floor, data for the first time will be interoperable.

The world’s most influential software and practitioner companies from the manufacturing, product design, retail, e-commerce and robotics industries are committing to the newly established Alliance for OpenUSD. OpenUSD, the universal language between 3D tools and data, will break down data silos, enabling industrial enterprises to collaborate across data lakes, tool systems and specialized teams more easily and quickly than ever to accelerate the digitalization of previously cumbersome, manual industrial processes.

XINZHOU WU
Vice President and General Manager of Automotive

Modernizing the vehicle production lifecycle: The automotive industry will further embrace generative AI to deliver physically accurate, photorealistic renderings that show exactly how a vehicle will look inside and out — while speeding design reviews, saving costs and improving efficiencies.

More automakers will embrace this technology within their smart factories, connecting design and engineering tools to build digital twins of production facilities. This will reduce costs and streamline operations without the need to shut down factory lines.

Generative AI will make consumer research and purchasing more interactive. From car configurators and 3D visualizations to augmented reality demonstrations and virtual test drives, consumers will be able to have a more engaging and enjoyable shopping experience.

Safety is no accident: Beyond the automotive product lifecycle, generative AI will also enable breakthroughs in autonomous vehicle (AV) development, including turning recorded sensor data into fully interactive 3D simulations. These digital twin environments, as well as synthetic data generation, will be used to safely develop, test and validate AVs at scale virtually before they’re deployed in the real world.

Generative AI foundational models will also support a vehicle’s AI systems to enable new personalized user experiences, capabilities and safety features inside and outside the car.

The behind-the-wheel experience is set to become safer, smarter and more enjoyable.

BOB PETTE
Vice President of Enterprise Platforms

Building anew with generative AI: Generative AI will allow organizations to design cars by simply speaking to a large language model or create cities from scratch using new techniques and design principles.

The architecture, engineering, construction and operations (AECO) industry is building the future using generative AI as its guidepost. Hundreds of generative AI startups and customers in AECO and manufacturing will focus on creating solutions for virtually any use case, including design optimization, market intelligence, construction management and physics prediction. AI will accelerate a manufacturing evolution that promises increased efficiency, reduced waste and entirely new approaches to production and sustainability.

Developers and enterprises are focusing in particular on point cloud data analysis, which uses lidar to generate representations of built and natural environments with precise details. This could lead to high-fidelity insights and analysis through generative AI-accelerated workflows.

GILAD SHAINER
Vice President of Networking 

AI influx ignites connectivity demand: A renewed focus on networking efficiency and performance will take off as enterprises seek the necessary network bandwidth for accelerated computing using GPUs and GPU-based systems.

Trillion-parameter LLMs will expose the need for faster transmission speeds and higher coverage. Enterprises that want to quickly roll out generative AI applications will need to invest in accelerated networking technology or choose a cloud service provider that does. The key to optimal connectivity is baking it into full-stack systems coupled with next-generation hardware and software.

The defining element of data center design: Enterprises will learn that not all data centers need to be alike. Determining the purpose of a data center is the first step toward choosing the appropriate networking to use within it. Traditional data centers are limited in terms of bandwidth, while those capable of running large AI workloads require thousands of GPUs working with deterministic, low tail latency.

What the network is capable of when under a full load at scale is the best determinant of performance. The future of enterprise data center connectivity requires separate management (aka north-south) and AI (aka east-west) networks, where the AI network includes in-network computing specifically designed for high performance computing, AI and hyperscale cloud infrastructures.

DAVID REBER JR.
Chief Security Officer

Clarity in adapting the security model to AI: The pivot from app-centric to data-centric security is in full swing. Data is the fundamental supply chain for LLMs and the future of generative AI. Enterprises are just now seeing the problem unfold at scale. Companies will need to reevaluate people, processes and technologies to redefine the secure development lifecycle (SDLC). The industry at large will redefine its approach to trust and clarify what transparency means.

A new generation of cyber tools will be born. The SDLC of AI will be defined with new market leaders of tools and expectations to address the transition from the command line interface to the human language interface. The need will be especially important as more enterprises shift toward using open-source LLMs like Meta’s Llama 2 to accelerate generative AI output.

Scaling security with AI: Applications of AI to the cybersecurity deficit will detect never-before-seen threats. Currently, a fraction of global data is used for cyber defense. Meanwhile, attackers continue to take advantage of every misconfiguration.

Experimentation will help enterprises realize the potential of AI in identifying emergent threats and risks. Cyber copilots will help enterprise users navigate phishing and configuration. For the technology to be effective, companies will need to tackle privacy issues inherent in the intersection of work and personal life to enable collective defense in data-centric environments.

Along with democratizing access to technology, AI will also enable a new generation of cyber defenders as threats continue to grow. As soon as companies gain clarity on each threat, AI will be used to generate massive amounts of data that train downstream detectors to detect and defend against these threats.

RONNIE VASISHTA
Senior Vice President of Telecoms

Running to or from RAN: Expect to see a major reassessment of investment cases for 5G.

After five years of 5G, network coverage and capacity have boomed — but revenue growth is sluggish and costs for largely proprietary and inflexible infrastructure have risen. Meanwhile, utilization of 5G RAN is stuck below 40%.

The new year will be about aggressively pursuing new revenue sources on existing spectrum to uncover new monetizable applications. Telecoms also will rethink the capex structure, focusing more on a flexible, high-utilization infrastructure built on general-purpose components. And expect to see a holistic reduction of operating expenses as companies leverage AI tools to increase performance, improve efficiency and eliminate costs. The outcome of these initiatives will determine how much carriers will invest in 6G technology.

From chatbots to network management: Telcos are already using generative AI for chatbots and virtual assistants to improve customer service and support. In the new year they’ll double down, ramping up their use of generative AI for operational improvements in areas such as network planning and optimization, fault and fraud detection, predictive analytics and maintenance, cybersecurity operations and energy optimization.

Given how pervasive and strategic generative AI is becoming, building a new type of AI factory infrastructure to support its growth also will become a key imperative. More and more telcos will build AI factories for internal use, as well as deploy these factories as a platform as a service for developers. That same infrastructure will be able to support RAN as an additional tenant.

MALCOLM DEMAYO
Vice President of Financial Services 

AI-first financial services: With AI advancements growing exponentially, financial services firms will bring the compute power to the data, rather than the other way around.

Firms will undergo a strategic shift toward a highly scalable, hybrid combination of on-premises infrastructure and cloud-based computing, driven by the need to mitigate concentration risk and maintain agility amid rapid technological advancements. Firms that handle their most mission-critical workloads, including AI-powered customer service assistants, fraud detection, risk management and more, will lead.

MARC SPIELER
Senior Director of Energy

Physics-ML for faster simulation: Energy companies will increasingly turn to physics-informed machine learning (physics-ML) to accelerate simulations, optimize industrial processes and enhance decision-making.

Physics-ML integrates traditional physics-based models with advanced machine learning algorithms, offering a powerful tool for the rapid, accurate simulation of complex physical phenomena. For instance, in energy exploration and production, physics-ML can quickly model subsurface geologies to aid in identification of potential exploration sites and assessment of operational and environmental risks.

In renewable energy sectors, such as wind and solar, physics-ML will play a crucial role in predictive maintenance, enabling energy companies to foresee equipment failures and schedule maintenance proactively to reduce downtimes and costs. As computational power and data availability continue to grow, physics-ML is poised to transform how energy companies approach simulation and modeling tasks, leading to more efficient and sustainable energy production.

LLMs — the fix for better operational outcomes: Coupled with physics-ML, LLMs will analyze extensive historical data and real-time sensor inputs from energy equipment to predict potential failures and maintenance needs before they occur. This proactive approach will reduce unexpected downtime and extend the lifespan of turbines, generators, solar panels and other critical infrastructure. LLMs will also help optimize maintenance schedules and resource allocation, ensuring that repairs and inspections are carried out efficiently. Ultimately, LLM use in predictive maintenance will save costs for energy companies and contribute to a more stable energy supply for consumers.

DEEPU TALLA
Vice President of Embedded and Edge Computing

The rise of robotics programmers: LLMs will lead to rapid improvements for robotics engineers. Generative AI will develop code for robots and create new simulations to test and train them.

LLMs will accelerate simulation development by automatically building 3D scenes, constructing environments and generating assets from inputs. The resulting simulation assets will be critical for workflows like synthetic data generation, robot skills training and robotics application testing.

In addition to helping robotics engineers, transformer AI models, the engines behind LLMs, will make robots themselves smarter so that they better understand complex environments and more effectively execute a breadth of skills within them.

For the robotics industry to scale, robots have to become more generalizable — that is, they need to acquire skills more quickly or bring them to new environments. Generative AI models — trained and tested in simulation — will be a key enabler in the drive toward more powerful, flexible and easier-to-use robots.

Read More

Techniques for automatic summarization of documents using language models

Summarization is the technique of condensing sizable information into a compact and meaningful form, and it stands as a cornerstone of efficient communication in our information-rich age. Condensing long texts into brief summaries saves time, improves clarity by presenting information concisely and coherently, and supports informed decision-making when managing large volumes of content.

Summarization methods have a broad range of applications serving various purposes, such as:

  • News aggregation – Summarizing news articles into a newsletter for the media industry
  • Legal document summarization – Helping legal professionals extract key legal information from lengthy documents like terms, conditions, and contracts
  • Academic research – Summarization annotates, indexes, condenses, and simplifies important information from academic papers
  • Content curation for blogs and websites – You can create engaging and original content summaries for readers, especially in marketing
  • Financial reports and market analysis – You can extract financial insights from reports and create executive summaries for investor presentations in the finance industry

With the advancements in natural language processing (NLP), language models, and generative AI, summarizing texts of varying lengths has become more accessible. Tools like LangChain, combined with a large language model (LLM) powered by Amazon Bedrock or Amazon SageMaker JumpStart, simplify the implementation process.

This post delves into the following summarization techniques:

  • Extractive summarization using the BERT extractive summarizer
  • Abstractive summarization using specialized summarization models and LLMs
  • Two multi-level summarization techniques:
    • Extractive-abstractive summarization using the extractive-abstractive content summarization strategy (EACSS)
    • Abstractive-abstractive summarization using Map Reduce and Map ReRank

[Figure: Text summarization techniques]

The complete code sample is found in the GitHub repo. You can launch this solution in Amazon SageMaker Studio.


Types of summarizations

There are several techniques to summarize text, which are broadly categorized into two main approaches: extractive and abstractive summarization. Furthermore, multi-level summarization methodologies incorporate a series of steps, combining both extractive and abstractive techniques. These multi-level approaches are advantageous when a text’s token count exceeds an LLM’s input limit, enabling an understanding of complex narratives.

Extractive summarization

Extractive summarization is a technique used in NLP and text analysis to create a summary by extracting key sentences. Instead of generating new sentences or content as in abstractive summarization, extractive summarization relies on identifying and pulling out the most relevant and informative portions of the original text to create a condensed version.

Extractive summarization, although advantageous in preserving the original content and ensuring high readability by directly pulling important sentences from the source text, has limitations. It lacks creativity, is unable to generate novel sentences, and may overlook nuanced details, potentially missing important information. Moreover, it may produce lengthy summaries, sometimes overwhelming readers with excessive and unwanted information. There are many extractive summarization techniques, such as TextRank and LexRank. In this post, we focus on the BERT extractive summarizer.

BERT extractive summarizer

The BERT extractive summarizer is a type of extractive summarization model that uses the BERT language model to extract the most important sentences from a text. BERT is a pre-trained language model that can be fine-tuned for a variety of tasks, including text summarization. It works by first embedding the sentences in the text using BERT. This produces a vector representation for each sentence that captures its meaning and context. The model then uses a clustering algorithm to group the sentences into clusters. The sentences that are closest to the center of each cluster are selected to form the summary.

Compared with LLMs, the advantage of the BERT extractive summarizer is it’s relatively straightforward to train and deploy the model and it’s more explainable. The disadvantage is the summarization isn’t creative and doesn’t generate sentences. It only selects sentences from the original text. This limits its ability to summarize complex or nuanced texts.
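
To make the approach concrete, the following is a minimal sketch of extractive summarization, assuming the open-source bert-extractive-summarizer package rather than the exact code in the GitHub repo; the ratio parameter is an illustrative choice.

```python
# Minimal extractive summarization sketch, assuming the open-source
# bert-extractive-summarizer package (pip install bert-extractive-summarizer).
# This is illustrative and not the exact code from the post's GitHub repo.
from summarizer import Summarizer

text = (
    "Long article text goes here. The model embeds each sentence with BERT, "
    "clusters the embeddings, and returns the sentences closest to each "
    "cluster centroid as the summary."
)

model = Summarizer()              # loads a pre-trained BERT model under the hood
summary = model(text, ratio=0.3)  # keep roughly 30% of the sentences
print(summary)
```

Because the output is a subset of the original sentences, every line of the summary can be traced back to the source text, which is what makes this method comparatively explainable.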

Abstractive summarization

Abstractive summarization is a technique used in NLP and text analysis to create a summary that goes beyond mere extraction of sentences or phrases from the source text. Instead of selecting and reorganizing existing content, abstractive summarization generates new sentences or phrases that capture the core meaning and main ideas of the original text in a more condensed and coherent form. This approach requires the model to understand the content of the text and express it in a way that is not necessarily present in the source material.

Specialized summarization models

These pre-trained natural language models, such as BART and PEGASUS, are specifically tailored for text summarization tasks. They employ encoder-decoder architectures and are smaller in parameters compared to their counterparts. This reduced size allows for ease of fine-tuning and deployment on smaller instances. However, it’s important to note that these summarization models also come with smaller input and output token sizes. Unlike their more general-purpose counterparts, these models are exclusively designed for summarization tasks. As a result, the input required for these models is solely the text that needs to be summarized.
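
As a hedged illustration of using a specialized summarization model, the following sketch calls the Hugging Face transformers summarization pipeline with a BART checkpoint; the checkpoint name is an assumption for illustration, and the input must stay within the model’s token limit.

```python
# Sketch of abstractive summarization with a specialized model (BART) via the
# Hugging Face transformers pipeline. The checkpoint name is an illustrative
# assumption; any summarization checkpoint with a compatible tokenizer works.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = "Text to summarize goes here. It must fit within the model's input token limit."

result = summarizer(article, max_length=130, min_length=30, do_sample=False)
print(result[0]["summary_text"])
```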

Large language models

A large language model refers to any model that undergoes training on extensive and diverse datasets, typically through self-supervised learning at a large scale, and is capable of being fine-tuned to suit a wide array of specific downstream tasks. These models are larger in parameter size and perform better in tasks. Notably, they feature substantially larger input token sizes, some going up to 100,000, such as Anthropic’s Claude. To use one of these models, AWS offers the fully managed service Amazon Bedrock. If you need more control of the model development lifecycle, you can deploy LLMs through SageMaker.
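
As a hedged sketch of invoking an LLM for summarization on Amazon Bedrock with boto3, the example below uses the Anthropic Claude text-completions request format that was current when this post was written; the model ID and payload schema are assumptions that may differ for other models or newer API versions.

```python
# Hedged sketch of calling an LLM on Amazon Bedrock for summarization.
# The model ID and request body schema are assumptions (Anthropic Claude
# text-completions format) and may differ for other models or API versions.
import json
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

prompt = (
    "\n\nHuman: Summarize the following text in three sentences:\n"
    "<input text>"
    "\n\nAssistant:"
)

response = bedrock.invoke_model(
    modelId="anthropic.claude-v2",  # assumed model ID for illustration
    body=json.dumps({"prompt": prompt, "max_tokens_to_sample": 300, "temperature": 0.2}),
)
print(json.loads(response["body"].read())["completion"])
```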

Given their versatile nature, these models require specific task instructions provided through input text, a practice referred to as prompt engineering. This creative process yields varying outcomes based on the model type and input text. The effectiveness of both the model’s performance and the prompt’s quality significantly influence the final quality of the model’s outputs. The following are some tips when engineering prompts for summarization:

  • Include the text to summarize – Input the text that needs to be summarized. This serves as the source material for the summary.
  • Define the task – Clearly state that the objective is text summarization. For example, “Summarize the following text: [input text].”
  • Provide context – Offer a brief introduction or context for the given text that needs to be summarized. This helps the model understand the content and context. For example, “You are given the following article about Artificial Intelligence and its role in Healthcare: [input text].”
  • Prompt for the summary – Prompt the model to generate a summary of the provided text. Be clear about the desired length or format of the summary. For example, “Please generate a concise summary of the given article on Artificial Intelligence and its role in Healthcare: [input text].”
  • Set constraints or length guidelines – Optionally, guide the length of the summary by specifying a desired word count, sentence count, or character limit. For example, “Please generate a summary that is no longer than 50 words: [input text].”

Effective prompt engineering is critical for ensuring that the generated summaries are accurate, relevant, and aligned with the intended summarization task. Refine the prompt through experimentation and iteration to get the best summarization results. After you have established effective prompts, you can reuse them with prompt templates.
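
The following is a minimal sketch of such a reusable template using LangChain’s PromptTemplate; the variable names are illustrative assumptions.

```python
# Minimal sketch of a reusable summarization prompt template with LangChain.
# Variable names are illustrative assumptions, not the post's exact code.
from langchain.prompts import PromptTemplate

summary_prompt = PromptTemplate(
    input_variables=["context", "text", "word_limit"],
    template=(
        "You are given the following {context}:\n{text}\n\n"
        "Please generate a concise summary that is no longer than {word_limit} words."
    ),
)

prompt = summary_prompt.format(
    context="article about Artificial Intelligence and its role in Healthcare",
    text="<input text>",
    word_limit=50,
)
# `prompt` can now be sent to an LLM on Amazon Bedrock or a SageMaker endpoint.
```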

Multi-level summarization

Extractive and abstractive summarizations are useful for shorter texts. However, when the input text exceeds the model’s maximum token limit, multi-level summarization becomes necessary. Multi-level summarization involves a combination of various summarization techniques, such as extractive and abstractive methods, to effectively condense longer texts by applying multiple layers of summarization processes. In this section, we discuss two multi-level summarization techniques: extractive-abstractive summarization and abstractive-abstractive summarization.

Extractive-abstractive summarization

Extractive-abstractive summarization works by first generating an extractive summary of the text. Then it uses an abstractive summarization system to refine the extractive summary, making it more concise and informative. This enhances accuracy by providing more informative summaries compared to extractive methods alone.

Extractive-abstractive content summarization strategy

The EACSS technique combines the strengths of two powerful techniques: the BERT extractive summarizer for the extractive phase and LLMs for the abstractive phase, as illustrated in the following diagram.

[Figure: Extractive-abstractive text summarization]

EACSS offers several advantages, including the preservation of crucial information, enhanced readability, and adaptability. However, implementing EACSS is computationally expensive and complex. There’s a risk of potential information loss, and the quality of the summarization heavily depends on the performance of the underlying models, making careful model selection and tuning essential for achieving optimal results. Implementation includes the following steps:

  1. The first step is to break down the large document, such as a book, into smaller sections, or chunks. These chunks are defined as sentences, paragraphs, or even chapters, depending on the granularity desired for the summary.
  2. For the extractive phase, we employ the BERT extractive summarizer. This component works by embedding the sentences within each chunk and then employing a clustering algorithm to identify sentences that are closest to the cluster’s centroids. This extractive step helps in preserving the most important and relevant content from each chunk.
  3. Having generated extractive summaries for each chunk, we move on to the abstractive summarization phase. Here, we utilize LLMs known for their ability to generate coherent and contextually relevant summaries. These models take the extracted summaries as input and produce abstractive summaries that capture the essence of the original document while ensuring readability and coherence.

By combining extractive and abstractive summarization techniques, this approach offers an efficient and comprehensive way to summarize lengthy documents such as books. It ensures that important information is extracted while allowing for the generation of concise and human-readable summaries, making it a valuable tool for various applications in the domain of document summarization.
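
As a hedged end-to-end sketch of this flow, the snippet below runs the extractive pass with the bert-extractive-summarizer package on each chunk and then a single abstractive pass with an LLM; summarize_with_llm is a hypothetical helper standing in for a call to Amazon Bedrock or a SageMaker endpoint (see the earlier Bedrock sketch).

```python
# Hedged sketch of the EACSS flow: extractive pass per chunk, then one
# abstractive pass over the concatenated extracts. `summarize_with_llm` is a
# hypothetical stand-in for a Bedrock or SageMaker endpoint call.
from summarizer import Summarizer


def summarize_with_llm(text: str) -> str:
    """Hypothetical helper that sends `text` to an LLM and returns its summary."""
    raise NotImplementedError("Wire this to Amazon Bedrock or a SageMaker endpoint.")


def eacss_summarize(chunks: list[str], ratio: float = 0.2) -> str:
    extractor = Summarizer()
    # Extractive phase: keep the sentences closest to each cluster centroid.
    extracts = [extractor(chunk, ratio=ratio) for chunk in chunks]
    # Abstractive phase: let the LLM rewrite the concatenated extracts.
    return summarize_with_llm("\n".join(extracts))
```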

Abstractive-abstractive summarization

Abstractive-abstractive summarization is an approach where abstractive methods are used for both extracting and generating summaries. It offers notable advantages, including enhanced readability, coherence, and the flexibility to adjust summary length and detail. It excels in language generation, allowing for paraphrasing and avoiding redundancy. However, there are drawbacks. For example, it’s computationally expensive and resource intensive, and its quality heavily depends on the effectiveness of the underlying models, which, if not well-trained or versatile, may impact the quality of the generated summaries. Selection of models is crucial to mitigate these challenges and ensure high-quality abstractive summaries. For abstractive-abstractive summarization, we discuss two strategies: Map Reduce and Map ReRank.

Map Reduce using LangChain

This two-step process comprises a Map step and a Reduce step, as illustrated in the following diagram. This technique enables you to summarize an input that is longer than the model’s input token limit.

[Figure: Abstractive text summarization with Map Reduce]

The process consists of three main steps:

  1. The corpus is split into smaller chunks that fit into the LLM’s token limit.
  2. A Map step individually applies an LLM chain to each chunk to extract its important information, and the output is used as a new passage. Depending on the size and structure of the corpus, this could take the form of overarching themes or short summaries.
  3. The Reduce step combines the output passages from the Map step (or a previous Reduce step) so that they fit the token limit and feeds them into the LLM. This process is repeated until the final output is a single passage.

The advantage of this technique is that it’s highly scalable and parallelizable. The processing of each chunk is independent of the others, so it can take advantage of distributed systems or serverless services to lower compute time.
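
A hedged sketch of this strategy with LangChain’s load_summarize_chain follows; the Bedrock model ID is an illustrative assumption, and the import paths reflect the LangChain releases available when this post was written.

```python
# Sketch of Map Reduce summarization with LangChain. The Bedrock model ID is an
# illustrative assumption; import paths reflect LangChain releases circa this post.
from langchain.llms import Bedrock
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.docstore.document import Document
from langchain.chains.summarize import load_summarize_chain

llm = Bedrock(model_id="anthropic.claude-v2")  # assumed model ID

long_text = "A document longer than the model's token limit goes here."
splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=200)
docs = [Document(page_content=chunk) for chunk in splitter.split_text(long_text)]

# Map: summarize each chunk independently; Reduce: combine the partial summaries.
chain = load_summarize_chain(llm, chain_type="map_reduce")
print(chain.run(docs))
```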

Map ReRank using LangChain

This chain runs an initial prompt on each document that not only tries to complete a task but also gives a score for how certain it is in its answer. The highest scoring response is returned.

This technique is very similar to Map Reduce but with the advantage of requiring fewer overall calls, streamlining the summarization process. However, its limitation lies in its inability to merge information across multiple documents. This restriction makes it most effective in scenarios where a single, straightforward answer is expected from a singular document, making it less suitable for more complex or multifaceted information retrieval tasks that involve multiple sources. Careful consideration of the context and the nature of the data is essential to determine the appropriateness of this method for specific summarization needs.
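
In LangChain, the map_rerank chain type is exposed through the question-answering chain; as a hedged adaptation for summarization, the sketch below frames the summary request as the question, which is an illustrative use of that chain rather than the post’s exact code.

```python
# Hedged sketch of Map ReRank via LangChain's QA chain, adapted for summarization
# by framing the summary request as the "question". Model ID is an assumption.
from langchain.llms import Bedrock
from langchain.docstore.document import Document
from langchain.chains.question_answering import load_qa_chain

llm = Bedrock(model_id="anthropic.claude-v2")  # assumed model ID
docs = [Document(page_content=chunk) for chunk in ["<chunk 1>", "<chunk 2>"]]

chain = load_qa_chain(llm, chain_type="map_rerank")
result = chain.run(
    input_documents=docs,
    question="Summarize the key points of this document in three sentences.",
)
print(result)  # the answer from the highest-scoring chunk is returned
```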

Cohere ReRank uses a semantic-based reranking system that contextualizes the meaning of a user’s query beyond keyword relevance. It’s used with vector store systems as well as keyword-based search engines, giving it flexibility.

Comparing summarization techniques

Each summarization technique has its own unique advantages and disadvantages:

  • Extractive summarization preserves the original content and ensures high readability but lacks creativity and may produce lengthy summaries.
  • Abstractive summarization, while offering creativity and generating concise, fluent summaries, comes with the risk of unintentional content modification, challenges in language accuracy, and resource-intensive development.
  • Extractive-abstractive multi-level summarization effectively summarizes large documents and provides better flexibility in fine-tuning the extractive part of the models. However, it’s expensive, time consuming, and lacks parallelization, making parameter tuning challenging.
  • Abstractive-abstractive multi-level summarization also effectively summarizes large documents and excels in enhanced readability and coherence. However, it’s computationally expensive and resource intensive, relying heavily on the effectiveness of underlying models.

Careful model selection is crucial to mitigate challenges and ensure high-quality abstractive summaries in this approach. The following table summarizes the capabilities for each type of summarization.

Aspect | Extractive Summarization | Abstractive Summarization | Multi-level Summarization
Generate creative and engaging summaries | No | Yes | Yes
Preserve original content | Yes | No | No
Balance information preservation and creativity | No | Yes | Yes
Suitable for short, objective text (input text length smaller than the model's maximum tokens) | Yes | Yes | No
Effective for longer, complex documents such as books (input text length greater than the model's maximum tokens) | No | No | Yes
Combines extraction and content generation | No | No | Yes

Multi-level summarization techniques are suitable for long and complex documents where the input text length exceeds the token limit of the model. The following table compares these techniques.

Technique | Advantages | Disadvantages
EACSS (extractive-abstractive) | Preserves crucial information; provides the ability to fine-tune the extractive part of the models. | Computationally expensive; potential information loss; lacks parallelization.
Map Reduce (abstractive-abstractive) | Scalable and parallelizable, with less compute time; the best technique for generating creative and concise summaries. | Memory-intensive process.
Map ReRank (abstractive-abstractive) | Streamlined summarization with semantic-based ranking. | Limited information merging.

Tips when summarizing text

Consider the following best practices when summarizing text:

  • Be aware of the total token size – Be prepared to split the text if it exceeds the model’s token limits or employ multiple levels of summarization when using LLMs.
  • Be aware of the types and number of data sources – Combining information from multiple sources may require transformations, clear organization, and integration strategies. LangChain’s Stuff documents chain integrates with a wide variety of data sources and document types, simplifying the process of combining text from different documents and data sources.
  • Be aware of model specialization – Some models may excel at certain types of content but struggle with others. There may be fine-tuned models that are better suited for your domain of text.
  • Use multi-level summarization for large bodies of text – For texts that exceed the token limits, consider a multi-level summarization approach. Start with a high-level summary to capture the main ideas and then progressively summarize subsections or chapters for more detailed insights.
  • Summarize text by topics – This approach helps maintain a logical flow and reduce information loss, and it prioritizes the retention of crucial information. If you’re using LLMs, craft clear and specific prompts that guide the model to summarize a particular topic instead of the whole body of text.

Conclusion

Summarization stands as a vital tool in our information-rich era, enabling the efficient distillation of extensive information into concise and meaningful forms. It plays a pivotal role in various domains, offering numerous advantages. Summarization saves time by swiftly conveying essential content from lengthy documents, aids decision-making by extracting critical information, and enhances comprehension in education and content curation.

This post provided a comprehensive overview of various summarization techniques, including extractive, abstractive, and multi-level approaches. With tools like LangChain and language models, you can harness the power of summarization to streamline communication, improve decision-making, and unlock the full potential of vast information repositories. The comparison tables in this post can help you identify the most suitable summarization techniques for your projects. Additionally, the tips shared in the post serve as valuable guidelines to avoid repetitive errors when experimenting with LLMs for text summarization. This practical advice empowers you to apply the knowledge gained, ensuring successful and efficient summarization in your projects.

About the authors

Nick Biso is a Machine Learning Engineer at AWS Professional Services. He solves complex organizational and technical challenges using data science and engineering. In addition, he builds and deploys AI/ML models on the AWS Cloud. His passion extends to his proclivity for travel and diverse cultural experiences.

Suhas chowdary Jonnalagadda is a Data Scientist at AWS Global Services. He is passionate about helping enterprise customers solve their most complex problems with the power of AI/ML. He has helped customers in transforming their business solutions across diverse industries, including finance, healthcare, banking, ecommerce, media, advertising, and marketing.

Tabby Ward is a Principal Cloud Architect/Strategic Technical Advisor with extensive experience migrating customers and modernizing their application workload and services to AWS. With over 25 years of experience developing and architecting software, she is recognized for her deep-dive ability as well as skillfully earning the trust of customers and partners to design architectures and solutions across multiple tech stacks and cloud providers.

Shyam Desai is a Cloud Engineer for big data and machine learning services at AWS. He supports enterprise-level big data applications and customers using a combination of software engineering expertise with data science. He has extensive knowledge in computer vision and imaging applications for artificial intelligence, as well as biomedical and bioinformatic applications.

Read More