Build a classification pipeline with Amazon Comprehend custom classification (Part I)

“Data locked away in text, audio, social media, and other unstructured sources can be a competitive advantage for firms that figure out how to use it.”

Only 18% of organizations in a 2019 Deloitte survey reported being able to take advantage of unstructured data. Yet the majority of data, between 80% and 90%, is unstructured. That is a big untapped resource with the potential to give businesses a competitive edge if they can figure out how to use it. It can be difficult to find insights from this data, particularly if efforts are needed to classify, tag, or label it. Amazon Comprehend custom classification can be useful in this situation. Amazon Comprehend is a natural-language processing (NLP) service that uses machine learning to uncover valuable insights and connections in text.

Document categorization or classification has significant benefits across business domains –

  • Improved search and retrieval – By categorizing documents into relevant topics or categories, it makes it much easier for users to search and retrieve the documents they need. They can search within specific categories to narrow down results.
  • Knowledge management – Categorizing documents in a systematic way helps to organize an organization’s knowledge base. It makes it easier to locate relevant information and see connections between related content.
  • Streamlined workflows – Automatic document sorting can help streamline many business processes like processing invoices, customer support, or regulatory compliance. Documents can be automatically routed to the right people or workflows.
  • Cost and time savings – Manual document categorization is tedious, time-consuming, and expensive. AI techniques can take over this mundane task and categorize thousands of documents in a short time at a much lower cost.
  • Insight generation – Analyzing trends in document categories can provide useful business insights. For example, an increase in customer complaints in a product category could signify some issues that need to be addressed.
  • Governance and policy enforcement – Setting up document categorization rules helps to ensure that documents are classified correctly according to an organization’s policies and governance standards. This allows for better monitoring and auditing.
  • Personalized experiences – In contexts like website content, document categorization allows for tailored content to be shown to users based on their interests and preferences as determined from their browsing behavior. This can increase user engagement.

The complexity of developing a bespoke classification machine learning model depends on factors such as data quality, algorithm choice, scalability, and domain knowledge, to mention a few. It’s essential to start with a clear problem definition and clean, relevant data, and to gradually work through the different stages of model development. With Amazon Comprehend custom classification, however, businesses can create their own machine learning models to automatically classify text documents into categories or tags that meet business-specific requirements and map to their technology and document categories. Because manual tagging or categorization is no longer necessary, this can save businesses significant time, money, and labor. We have made this process simple by automating the whole training pipeline.

In the first part of this multi-part blog post, you will learn how to create a scalable training pipeline and prepare training data for Amazon Comprehend custom classification models. We introduce a custom classifier training pipeline that can be deployed in your AWS account with a few clicks. We use the BBC news dataset and train a classifier to identify the class (e.g., politics, sports) that a document belongs to. The pipeline enables your organization to respond rapidly to changes and train new models without having to start from scratch each time. You can easily scale up and train multiple models based on your demand.

Prerequisites

  • An active AWS account (Click here to create a new AWS account)
  • Access to Amazon Comprehend, Amazon S3, AWS Lambda, AWS Step Functions, Amazon SNS, and AWS CloudFormation
  • Training data (semi-structured or text), prepared as described in the following section
  • Basic knowledge of Python and machine learning in general

Prepare training data

This solution can take input in either text format (e.g., CSV) or semi-structured format (e.g., PDF).

Text input

Amazon Comprehend custom classification supports two modes: multi-class and multi-label.

In multi-class mode, each document can have one and only one class assigned to it. The training data should be prepared as a two-column CSV file, with each line containing a single class and the text of a document that demonstrates the class.

CLASS, Text of document 1
CLASS, Text of document 2
...

Example for BBC news dataset:

Business, Europe blames US over weak dollar...
Tech, Cabs collect mountain of mobiles...
...

In multi-label mode, each document has at least one class assigned to it, but can have more. The training data should be prepared as a two-column CSV file, with each line containing one or more classes and the text of the training document. More than one class is indicated by using a delimiter between each class.

CLASS, Text of document 1
CLASS|CLASS|CLASS, Text of document 2
...

No header should be included in the CSV file for either training mode.
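
As an illustration, here is a minimal sketch of writing training data in this format with pandas. The DataFrame below is a hypothetical stand-in for your labeled dataset (for example, the BBC news dataset), and the column names are assumptions for this example.

import pandas as pd

# Hypothetical labeled data: one document per row, with a class and the document text.
df = pd.DataFrame(
    {
        "category": ["Business", "Tech"],
        "text": ["Europe blames US over weak dollar...", "Cabs collect mountain of mobiles..."],
    }
)

# Amazon Comprehend custom classification expects a two-column CSV with no header row.
df[["category", "text"]].to_csv("train.csv", header=False, index=False)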

Semi-structured input

Starting in 2023, Amazon Comprehend supports training models using semi-structured documents. The training data for semi-structured input consists of a set of labeled documents, which can be pre-identified documents from a document repository that you already have access to. The following is an example of the annotations CSV data required for training (sample data):

CLASS, document1.pdf, 1
CLASS, document1.pdf, 2
...

The annotations CSV file contains three columns: the first column contains the label for the document, the second column is the document name (i.e., file name), and the last column is the page number of the document that you want to include in the training dataset. In most cases, if the annotations CSV file is located in the same folder as all the other documents, you just need to specify the document name in the second column. However, if the CSV file is located in a different location, you need to specify the path to that location in the second column, such as path/to/prefix/document1.pdf.

For details on how to prepare your training data, refer to the Amazon Comprehend documentation.

Solution overview

  1. The Amazon Comprehend training pipeline starts when training data (a .csv file for text input, or an annotations .csv file for semi-structured input) is uploaded to a dedicated Amazon Simple Storage Service (Amazon S3) bucket.
  2. An AWS Lambda function is invoked by an Amazon S3 trigger: every time an object is uploaded to the specified Amazon S3 location, the function retrieves the source bucket name and the key name of the uploaded object and passes them to the training AWS Step Functions workflow.
  3. In the training Step Functions workflow, after receiving the training data bucket name and object key name as input parameters, a custom model training workflow kicks off as a series of AWS Lambda functions, as described below (a minimal sketch of the underlying API calls follows this list):
    1. StartComprehendTraining: This AWS Lambda function defines a ComprehendClassifier object depending on the type of input files (i.e., text or semi-structured) and then kicks off an Amazon Comprehend custom classification training task by calling the create_document_classifier Application Programming Interface (API), which returns a training job Amazon Resource Name (ARN). Subsequently, this function checks the status of the training job by invoking the describe_document_classifier API. Finally, it returns the training job ARN and job status as output to the next stage of the training workflow.
    2. GetTrainingJobStatus: This AWS Lambda function checks the status of the training job every 15 minutes, by calling the describe_document_classifier API, until the training job status changes to Complete or Failed.
    3. GenerateMultiClass or GenerateMultiLabel: If you select yes for the performance report when launching the stack, one of these two AWS Lambda functions runs an analysis of your Amazon Comprehend model outputs, generates a per-class performance analysis, and saves it to Amazon S3.
    4. GenerateMultiClass: This AWS Lambda function is called if your input is MultiClass and you select yes for the performance report.
    5. GenerateMultiLabel: This AWS Lambda function is called if your input is MultiLabel and you select yes for the performance report.
  4. Once the training completes successfully, the solution generates the following outputs:
    1. Custom Classification Model: A trained model ARN will be available in your account for future inference work.
    2. Confusion Matrix [Optional]: A confusion matrix (confusion_matrix.json) will be available in the user-defined output Amazon S3 path, depending on the user selection.
    3. Amazon Simple Notification Service notification [Optional]: A notification email about the training job status will be sent to subscribers, depending on the initial user selection.
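
For reference, the following is a minimal sketch of the create_document_classifier and describe_document_classifier calls that the StartComprehendTraining and GetTrainingJobStatus functions rely on. The classifier name, bucket names, and IAM role ARN are placeholders, and the actual Lambda code in the solution also handles semi-structured inputs and the multi-label mode.

import boto3

comprehend = boto3.client("comprehend")

# Placeholders: replace with your training data location, output location, and IAM role.
response = comprehend.create_document_classifier(
    DocumentClassifierName="news-classifier",
    DataAccessRoleArn="arn:aws:iam::111122223333:role/ComprehendDataAccessRole",
    InputDataConfig={
        "DataFormat": "COMPREHEND_CSV",
        "S3Uri": "s3://your-classifier-input-bucket/training-data/train.csv",
    },
    OutputDataConfig={"S3Uri": "s3://your-classifier-output-bucket/"},
    LanguageCode="en",
    Mode="MULTI_CLASS",
)
classifier_arn = response["DocumentClassifierArn"]

# The pipeline polls this API until the training job reaches a terminal status.
status = comprehend.describe_document_classifier(
    DocumentClassifierArn=classifier_arn
)["DocumentClassifierProperties"]["Status"]
print(classifier_arn, status)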

Walkthrough

Launching the solution

To deploy your pipeline, complete the following steps:

  1. Choose Launch Stack button:

  2. Choose Next

  3. Specify the pipeline details with the options fitting your use case:

Information for each stack detail:

  • Stack name (Required) – the name you specified for this AWS CloudFormation stack. The name must be unique in the Region in which you’re creating it.
  • Q01ClassifierInputBucketName (Required) – The Amazon S3 bucket name to store your input data. It should be a globally unique name; the AWS CloudFormation stack creates the bucket for you when it’s launched.
  • Q02ClassifierOutputBucketName (Required) – The Amazon S3 bucket name to store outputs from Amazon Comprehend and the pipeline. It should also be a globally unique name.
  • Q03InputFormat – A dropdown selection; choose text (if your training data is in CSV files) or semi-structure (if your training data is semi-structured, e.g., PDF files) based on your input format.
  • Q04Language – A dropdown selection for choosing the language of the documents from the supported list. Note that currently only English is supported if your input format is semi-structured.
  • Q05MultiClass – A dropdown selection, select yes if your input is MultiClass mode. Otherwise, select no.
  • Q06LabelDelimiter – Only required if your Q05MultiClass answer is no. This delimiter is used in your training data to separate each class.
  • Q07ValidationDataset – A dropdown selection, change the answer to yes if you want to test the performance of trained classifier with your own test data.
  • Q08S3ValidationPath – Only required if your Q07ValidationDataset answer is yes.
  • Q09PerformanceReport – A dropdown selection; select yes if you want to generate the class-level performance report after model training. The report will be saved in the output bucket you specified in Q02ClassifierOutputBucketName.
  • Q10EmailNotification – A dropdown selection. Select yes if you want to receive a notification after the model is trained.
  • Q11EmailID – Enter a valid email address for receiving the performance report notification. Note that you have to confirm the subscription from your email after the AWS CloudFormation stack is launched before you can receive a notification when training is complete.
  4. In the Configure stack options section, add optional tags, permissions, and other advanced settings.

  5. Choose Next
  6. Review the stack details and select I acknowledge that AWS CloudFormation might create AWS IAM resources.

  7. Choose Submit. This initiates pipeline deployment in your AWS account.
  8. After the stack is deployed successfully, you can start using the pipeline. Create a /training-data folder under your specified Amazon S3 input location. Note: Amazon S3 automatically applies server-side encryption (SSE-S3) to each new object unless you specify a different encryption option. Refer to Data protection in Amazon S3 for more details on data protection and encryption in Amazon S3.

  9. Upload your training data to the folder. (If the training data is semi-structured, upload all the PDF files before uploading the .csv label information.) A minimal upload sketch follows.
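
As a minimal sketch, the upload can also be done with boto3; the bucket and file names below are placeholders for the values you chose when launching the stack.

import boto3

s3 = boto3.client("s3")

# Placeholder: use the bucket you entered for Q01ClassifierInputBucketName.
input_bucket = "your-classifier-input-bucket"

# For text input, upload the training CSV file.
s3.upload_file("train.csv", input_bucket, "training-data/train.csv")

# For semi-structured input, upload the PDF files first, then the annotations CSV.
# s3.upload_file("document1.pdf", input_bucket, "training-data/document1.pdf")
# s3.upload_file("annotations.csv", input_bucket, "training-data/annotations.csv")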

You’re done! You’ve successfully deployed your pipeline, and you can check the pipeline status in the deployed Step Functions state machine. (When it completes, you will have a trained model in your Amazon Comprehend custom classification panel.)

If you choose the model and its version in the Amazon Comprehend console, you can see more details about the model you just trained. These include the Mode you selected, which corresponds to the option Q05MultiClass, the number of labels, and the number of training and test documents in your training data. You can also check the overall performance; however, if you want to check detailed performance for each class, refer to the performance report generated by the deployed pipeline.

Service quotas

Your AWS account has default quotas for Amazon Comprehend and, if your inputs are in semi-structured format, Amazon Textract. To view the service quotas, refer to the quotas documentation for Amazon Comprehend and Amazon Textract.

Clean up

To avoid incurring ongoing charges, delete the resources you created as part of this solution when you’re done.

  1. On the Amazon S3 console, manually delete the contents inside buckets you created for input and output data.
  2. On the AWS CloudFormation console, choose Stacks in the navigation pane.
  3. Select the main stack and choose Delete.

This automatically deletes the deployed stack.

  4. Your trained Amazon Comprehend custom classification model will remain in your account. If you don’t need it anymore, delete the created model on the Amazon Comprehend console.

Conclusion

In this post, we introduced the concept of a scalable training pipeline for Amazon Comprehend custom classification models and provided an automated solution for efficiently training new models. The provided AWS CloudFormation template makes it possible for you to create your own text classification models effortlessly and scale to meet demand. The solution adopts the recently announced Euclid feature and accepts inputs in text or semi-structured format.

Now, we encourage you, our readers, to test these tools. You can find more details about training data preparation and learn about the custom classifier metrics. Try it out and see firsthand how it can streamline your model training process and enhance efficiency. Please share your feedback with us!


About the Authors

Sandeep Singh is a Senior Data Scientist with AWS Professional Services. He is passionate about helping customers innovate and achieve their business objectives by developing state-of-the-art AI/ML powered solutions. He is currently focused on Generative AI, LLMs, prompt engineering, and scaling Machine Learning across enterprises. He brings recent AI advancements to create value for customers.

Yanyan Zhang is a Senior Data Scientist in the Energy Delivery team with AWS Professional Services. She is passionate about helping customers solve real problems with AI/ML knowledge. Recently, her focus has been on exploring the potential of Generative AI and LLM. Outside of work, she loves traveling, working out and exploring new things.

Wrick Talukdar is a Senior Architect with the Amazon Comprehend Service team. He works with AWS customers to help them adopt machine learning on a large scale. Outside of work, he enjoys reading and photography.

Read More

Fine-tune Falcon 7B and other LLMs on Amazon SageMaker with @remote decorator

Today, generative AI models cover a variety of tasks, from text summarization and Q&A to image and video generation. To improve the quality of output, approaches like n-shot learning, prompt engineering, Retrieval Augmented Generation (RAG), and fine-tuning are used. Fine-tuning allows you to adjust these generative AI models to achieve improved performance on your domain-specific tasks.

With Amazon SageMaker, you can now run a SageMaker training job simply by annotating your Python code with the @remote decorator. The SageMaker Python SDK automatically translates your existing workspace environment, and any associated data processing code and datasets, into a SageMaker training job that runs on the training platform. This has the advantage of letting you write code in a more natural, object-oriented way, while still using SageMaker capabilities to run training jobs on a remote cluster with minimal changes.

In this post, we showcase how to fine-tune a Falcon-7B Foundation Model (FM) using the @remote decorator from the SageMaker Python SDK. It also uses Hugging Face’s parameter-efficient fine-tuning (PEFT) library and quantization techniques through bitsandbytes to support fine-tuning. The code presented in this blog can also be used to fine-tune other FMs, such as Llama-2 13B.

The full-precision representation of this model might have challenges fitting into memory on a single or even several graphics processing units (GPUs), or may even need a bigger instance. Hence, in order to fine-tune this model without increasing cost, we use the technique known as Quantized LLMs with Low-Rank Adapters (QLoRA). QLoRA is an efficient fine-tuning approach that reduces memory usage of LLMs while maintaining very good performance.

Advantages of using @remote decorator

Before going further, let’s understand how the @remote decorator improves developer productivity while working with SageMaker (a toy example follows the list below):

  • The @remote decorator triggers a training job directly from native Python code, without the explicit invocation of SageMaker Estimators and SageMaker input channels.
  • It provides a low barrier to entry for developers training models on SageMaker.
  • There is no need to switch integrated development environments (IDEs). Continue writing code in your IDE of choice and invoke SageMaker training jobs.
  • There is no need to learn about containers. Continue providing dependencies in a requirements.txt and supply it to the remote decorator.
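
As a toy illustration (not taken from the notebook used in this post), the following function runs as a SageMaker training job simply because it is annotated with @remote; the instance type and requirements file are placeholder values.

from sagemaker.remote_function import remote

# Placeholder settings: these can also come from a config.yaml file, as shown later in this post.
@remote(instance_type="ml.m5.xlarge", dependencies="./requirements.txt")
def divide(x, y):
    # This body executes remotely on a SageMaker training instance.
    return x / y

# The call blocks until the training job finishes and returns the result locally.
print(divide(10, 2))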

Prerequisites

An AWS account is needed with an AWS Identity and Access Management (AWS IAM) role that has permissions to manage resources created as part of the solution. For details, refer to Creating an AWS account.

In this post, we use Amazon SageMaker Studio with the Data Science 3.0 image and a ml.t3.medium fast launch instance. However, you can use any integrated development environment (IDE) of your choice. You just need to set up your AWS Command Line Interface (AWS CLI) credentials correctly. For more information, refer to Configure the AWS CLI.

For fine-tuning Falcon-7B, an ml.g5.12xlarge instance is used in this post. Please ensure sufficient capacity for this instance type in your AWS account.

You need to clone this GitHub repository to replicate the solution demonstrated in this post.

Solution overview

  1. Install prerequisites for fine-tuning the Falcon-7B model
  2. Set up remote decorator configurations
  3. Preprocess the dataset containing AWS services FAQs
  4. Fine-tune Falcon-7B on AWS services FAQs
  5. Test the fine-tuned model on sample questions related to AWS services

1. Install prerequisites for fine-tuning the Falcon-7B model

Launch the notebook falcon-7b-qlora-remote-decorator_qa.ipynb in SageMaker Studio by selecting the Image as Data Science and the Kernel as Python 3. Install all the required libraries mentioned in requirements.txt. A few of the libraries need to be installed on the notebook instance itself. Perform the other operations needed for dataset processing and for triggering a SageMaker training job.

%pip install -r requirements.txt

%pip install -q -U transformers==4.31.0
%pip install -q -U datasets==2.13.1
%pip install -q -U peft==0.4.0
%pip install -q -U accelerate==0.21.0
%pip install -q -U bitsandbytes==0.40.2
%pip install -q -U boto3
%pip install -q -U sagemaker==2.154.0
%pip install -q -U scikit-learn

2. Set up remote decorator configurations

Create a configuration file where all the configurations related to the Amazon SageMaker training job are specified. This file is read by the @remote decorator while running the training job. It contains settings like dependencies, training image, instance type, and the execution role to be used for the training job. For a detailed reference of all the settings supported by the config file, check out Configuring and using defaults with the SageMaker Python SDK.

SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        Dependencies: ./requirements.txt
        ImageUri: '{aws_account_id}.dkr.ecr.{region}.amazonaws.com/huggingface-pytorch-training:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04'
        InstanceType: ml.g5.12xlarge
        RoleArn: arn:aws:iam::111122223333:role/ExampleSageMakerRole

It’s not mandatory to use the config.yaml file in order to work with the @remote decorator. It is just a cleaner way to supply all configurations to the @remote decorator. This keeps SageMaker- and AWS-related parameters outside of the code, with a one-time effort to set up the config file that is then shared across team members. All the configurations could also be supplied directly in the decorator arguments, but that reduces readability and maintainability of changes in the long run. Also, the configuration file can be created by an administrator and shared with all the users in an environment.

3. Preprocess the dataset containing AWS services FAQs

The next step is to load and preprocess the dataset to make it ready for the training job. First, let us have a look at the dataset:

It shows FAQs for one of the AWS services. In addition to QLoRA, bitsandbytes is used to quantize the frozen LLM to 4-bit precision and attach LoRA adapters to it.

Create a prompt template to convert each FAQ sample to a prompt format:

from random import randint

# custom instruct prompt start
prompt_template = f"{{question}}\n---\nAnswer:\n{{answer}}{{eos_token}}"

# template dataset to add prompt to each sample
def template_dataset(sample):
    sample["text"] = prompt_template.format(question=sample["question"],
                                            answer=sample["answers"],
                                            eos_token=tokenizer.eos_token)
    return sample

The next step is to convert the inputs (text) to token IDs. This is done by a Hugging Face Transformers tokenizer.

from transformers import AutoTokenizer

model_id = "tiiuae/falcon-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Set the Falcon tokenizer
tokenizer.pad_token = tokenizer.eos_token

Now simply use the template_dataset function to convert all the FAQs to the prompt format and set up the train and test datasets.
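
For example, assuming the FAQ data has been loaded into a Hugging Face Dataset with question and answers columns (the CSV file name below is a placeholder), the conversion and split might look like the following:

from datasets import load_dataset

# Placeholder file name: load the FAQ data with "question" and "answers" columns.
dataset = load_dataset("csv", data_files="aws_faqs.csv", split="train")

# Apply the prompt template to every sample, keeping only the generated "text" column.
dataset = dataset.map(template_dataset, remove_columns=["question", "answers"])

# Split into the train and test datasets used by the training function below.
split = dataset.train_test_split(test_size=0.1, seed=42)
train_dataset, test_dataset = split["train"], split["test"]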

4. Fine-tune Falcon-7B on AWS services FAQs

Now you can prepare the training script, define the training function train_fn, and put the @remote decorator on the function.

The training function does the following:

  • Tokenizes and chunks the dataset
  • Sets up BitsAndBytesConfig, which specifies that the model should be loaded in 4-bit but computation should be done in bfloat16
  • Loads the model
  • Finds the target modules and updates the necessary matrices by using the utility method find_all_linear_names
  • Creates LoRA configurations that specify the rank of the update matrices (lora_r), the scaling factor (lora_alpha), the modules to apply the LoRA update matrices to (target_modules), the dropout probability for LoRA layers (lora_dropout), the task_type, and so on
  • Starts the training and evaluation

import bitsandbytes as bnb

def find_all_linear_names(hf_model):
    lora_module_names = set()
    for name, module in hf_model.named_modules():
        if isinstance(module, bnb.nn.Linear4bit):
            names = name.split(".")
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])

    if "lm_head" in lora_module_names:
        lora_module_names.remove("lm_head")
    return list(lora_module_names)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from sagemaker.remote_function import remote
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import transformers

# Start training
@remote(volume_size=50)
def train_fn(
        model_name,
        train_ds,
        test_ds,
        lora_r=8,
        lora_alpha=32,
        lora_dropout=0.05,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        learning_rate=2e-4,
        num_train_epochs=1
):
    # tokenize and chunk dataset
    lm_train_dataset = train_ds.map(
        lambda sample: tokenizer(sample["text"]), batched=True, batch_size=24, remove_columns=list(train_ds.features)
    )


    lm_test_dataset = test_ds.map(
        lambda sample: tokenizer(sample["text"]), batched=True, remove_columns=list(test_ds.features)
    )

    # Print total number of samples
    print(f"Total number of train samples: {len(lm_train_dataset)}")

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    )
    # Falcon requires you to allow remote code execution. This is because the model uses a new architecture that is not part of transformers yet.
    # The code is provided by the model authors in the repo.
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        trust_remote_code=True,
        quantization_config=bnb_config,
        device_map="auto")

    model.gradient_checkpointing_enable()
    model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

    # get lora target modules
    modules = find_all_linear_names(model)
    print(f"Found {len(modules)} modules to quantize: {modules}")

    config = LoraConfig(
        r=lora_r,
        lora_alpha=lora_alpha,
        target_modules=modules,
        lora_dropout=lora_dropout,
        bias="none",
        task_type="CAUSAL_LM"
    )

    model = get_peft_model(model, config)
    model.print_trainable_parameters()

    trainer = transformers.Trainer(
        model=model,
        train_dataset=lm_train_dataset,
        eval_dataset=lm_test_dataset,
        args=transformers.TrainingArguments(
            per_device_train_batch_size=per_device_train_batch_size,
            per_device_eval_batch_size=per_device_eval_batch_size,
            logging_steps=2,
            num_train_epochs=num_train_epochs,
            learning_rate=learning_rate,
            bf16=True,
            save_strategy="no",
            output_dir="outputs"
        ),
        data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    model.config.use_cache = False

    trainer.train()
    trainer.evaluate()

    model.save_pretrained("/opt/ml/model")

Then invoke train_fn():

train_fn(model_id, train_dataset, test_dataset)

The tuning job runs on the Amazon SageMaker training cluster. Wait for the tuning job to finish.

5. Test the fine-tuned model on sample questions related to AWS services

Now, it’s time to run some tests on the model. First, let us load the model:

from peft import PeftModel, PeftConfig
import torch
from transformers import AutoModelForCausalLM

device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'

config = PeftConfig.from_pretrained("./model")
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, trust_remote_code=True)
model = PeftModel.from_pretrained(model, "./model")
model.to(device)

Now load a sample question from the training dataset to see the original answer, and then ask the same question of the tuned model to compare the answers.

Here is a sample question from the training set and the original answer:

Now, the same question is asked of the tuned Falcon-7B model:
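
The following is a minimal sketch of querying the tuned model; the question is a placeholder, the prompt mirrors the template used during training, and the tokenizer loaded earlier in the notebook is assumed to still be available.

# Placeholder question: use any question from the test set for a fair comparison.
question = "What is Amazon SageMaker?"
prompt = f"{question}\n---\nAnswer:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )

# Strip the prompt so that only the generated answer is printed.
print(tokenizer.decode(output_ids[0], skip_special_tokens=True)[len(prompt):])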

This concludes the implementation of fine-tuning Falcon-7B on an AWS services FAQ dataset using the @remote decorator from the Amazon SageMaker Python SDK.

Cleaning up

Complete the following steps to clean up your resources:

  • Shut down the Amazon SageMaker Studio instances to avoid incurring additional costs.
  • Clean up your Amazon Elastic File System (Amazon EFS) directory by clearing the Hugging Face cache directory:
    rm -R ~/.cache/huggingface/hub

Conclusion

In this post, we showed you how to effectively use the @remote decorator to fine-tune the Falcon-7B model using QLoRA and Hugging Face PEFT with bitsandbytes, without applying significant changes to the training notebook, and how to use Amazon SageMaker capabilities to run training jobs on a remote cluster.

All the code shown as part of this post to fine-tune Falcon-7B is available in the GitHub repository. The repository also contains a notebook showing how to fine-tune Llama-13B.

As a next step, we encourage you to check out the @remote decorator functionality and Python SDK API and use it in your choice of environment and IDE. Additional examples are available in the amazon-sagemaker-examples repository to get you started quickly. You can also check out the following posts:


About the Authors

Bruno Pistone is an AI/ML Specialist Solutions Architect for AWS based in Milan. He works with large customers helping them to deeply understand their technical needs and design AI and Machine Learning solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. His expertise include: Machine Learning end to end, Machine Learning Industrialization, and Generative AI. He enjoys spending time with his friends and exploring new places, as well as travelling to new destinations.

Vikesh Pandey is a Machine Learning Specialist Solutions Architect at AWS, helping customers from financial industries design and build solutions on generative AI and ML. Outside of work, Vikesh enjoys trying out different cuisines and playing outdoor sports.

Read More

Simplify access to internal information using Retrieval Augmented Generation and LangChain Agents

This post takes you through the most common challenges that customers face when searching internal documents, and gives you concrete guidance on how AWS services can be used to create a generative AI conversational bot that makes internal information more useful.

Unstructured data accounts for 80% of all the data found within organizations, consisting of repositories of manuals, PDFs, FAQs, emails, and other documents that grow daily. Businesses today rely on continuously growing repositories of internal information, and problems arise when the amount of unstructured data becomes unmanageable. Often, users find themselves reading and checking many different internal sources to find the answers they need.

Internal question and answer forums can help users get highly specific answers but also require longer wait times. In the case of company-specific internal FAQs, long wait times result in lower employee productivity. Question and answer forums are difficult to scale because they rely on manually written answers. With generative AI, there is currently a paradigm shift in how users search and find information. The next logical step is to use generative AI to condense large documents into smaller, bite-sized pieces of information for easier user consumption. Instead of spending a long time reading text or waiting for answers, users can generate summaries in real time based on multiple existing repositories of internal information.

Solution overview

The solution allows customers to retrieve curated responses to questions asked about internal documents by using a transformer model to generate answers to questions about data that it has not been trained on, a technique known as zero-shot prompting. By adopting this solution, customers can gain the following benefits:

  • Find accurate answers to questions based on existing sources of internal documents
  • Reduce the time users spend searching for answers by using Large Language Models (LLMs) to provide near-immediate answers to complex queries using documents with the most updated information
  • Search previously answered questions through a centralized dashboard
  • Reduce stress caused by spending time manually reading information to look for answers

Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation (RAG) reduces some of the shortcomings of LLM based queries by finding the answers from your knowledge base and using the LLM to summarize the documents into concise responses. Please read this post to learn how to implement the RAG approach with Amazon Kendra. The following risks and limitations are associated with LLM based queries that a RAG approach with Amazon Kendra addresses:

  • Hallucinations and traceability – LLMs are trained on large datasets and generate responses based on probabilities. This can lead to inaccurate answers, which are known as hallucinations.
  • Multiple data silos – In order to reference data from multiple sources within your response, one needs to set up a connector ecosystem to aggregate the data. Accessing multiple repositories is manual and time-consuming.
  • Security – Security and privacy are critical considerations when deploying conversational bots powered by RAG and LLMs. Despite using Amazon Comprehend to filter out personal data that may be provided through user queries, there remains a possibility of unintentionally surfacing personal or sensitive information, depending on the ingested data. This means that controlling access to the chatbot is crucial to prevent unintended access to sensitive information.
  • Data relevance – LLMs are trained on data up to a certain date, which means information is often not current. The cost associated with training models on recent data is high. To ensure accurate and up-to-date responses, organizations bear the responsibility of regularly updating and enriching the content of the indexed documents.
  • Cost – The cost associated with deploying this solution should be a consideration for businesses. Businesses need to carefully assess their budget and performance requirements when implementing this solution. Running LLMs can require substantial computational resources, which may increase operational costs. These costs can become a limitation for applications that need to operate at a large scale. However, one of the benefits of the AWS Cloud is the flexibility to only pay for what you use. AWS offers a simple, consistent, pay-as-you-go pricing model, so you are charged only for the resources you consume.

Usage of Amazon SageMaker JumpStart

For transformer-based language models, organizations can benefit from using Amazon SageMaker JumpStart, which offers a collection of pre-built machine learning models. Amazon SageMaker JumpStart offers a wide range of text generation and question-answering (Q&A) foundational models that can be easily deployed and utilized. This solution integrates a FLAN T5-XL Amazon SageMaker JumpStart model, but there are different aspects to keep in mind when choosing a foundation model.

Integrating security in our workflow

Following the best practices of the Security Pillar of the Well-Architected Framework, Amazon Cognito is used for authentication. Amazon Cognito user pools can be integrated with third-party identity providers that support several frameworks used for access control, including Open Authorization (OAuth), OpenID Connect (OIDC), and Security Assertion Markup Language (SAML). Identifying users and their actions allows the solution to maintain traceability. The solution also uses the Amazon Comprehend personally identifiable information (PII) detection feature to automatically identify and redact PII. Redacted PII includes addresses, social security numbers, email addresses, and other sensitive information. This design ensures that any PII provided by the user through the input query is redacted. The PII is not stored, used by Amazon Kendra, or fed to the LLM.
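
To make this concrete, the following is a minimal sketch (not the exact Lambda code used in the solution) of redacting PII from a user query with the Amazon Comprehend detect_pii_entities API; the sample query is hypothetical.

import boto3

comprehend = boto3.client("comprehend")

def redact_pii(text: str, language_code: str = "en") -> str:
    # Detect PII entities and replace each span with its entity type, e.g. [EMAIL].
    entities = comprehend.detect_pii_entities(Text=text, LanguageCode=language_code)["Entities"]
    # Redact from the end of the string so earlier offsets remain valid.
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        text = text[: entity["BeginOffset"]] + f"[{entity['Type']}]" + text[entity["EndOffset"]:]
    return text

print(redact_pii("My email is jane.doe@example.com. How do I submit an expense report?"))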

Solution Walkthrough

The following steps describe the question answering over documents workflow:

  1. Users send a query through a web interface.
  2. Amazon Cognito is used for authentication, ensuring secure access to the web application.
  3. The web application front-end is hosted on AWS Amplify.
  4. Amazon API Gateway hosts a REST API with various endpoints to handle user requests that are authenticated using Amazon Cognito.
  5. PII redaction with Amazon Comprehend:
    • User Query Processing: When a user submits a query or input, it is first passed through Amazon Comprehend. The service analyzes the text and identifies any PII entities present within the query.
    • PII Extraction: Amazon Comprehend extracts the detected PII entities from the user query.
  6. Relevant Information Retrieval with Amazon Kendra:
    • Amazon Kendra is used to manage an index of documents that contains the information used to generate answers to the user’s queries.
    • The LangChain QA retrieval module is used to build a conversation chain that has relevant information about the user’s queries.
  7. Integration with Amazon SageMaker JumpStart:
    • The AWS Lambda function uses the LangChain library and connects to the Amazon SageMaker JumpStart endpoint with a context-stuffed query. The Amazon SageMaker JumpStart endpoint serves as the interface to the LLM used for inference (a minimal sketch of this retrieval and inference chain follows the list below).
  8. Storing responses and returning it to the user:
    • The response from the LLM is stored in Amazon DynamoDB along with the user’s query, the timestamp, a unique identifier, and other arbitrary identifiers for the item such as question category. Storing the question and answer as discrete items allows the AWS Lambda function to easily recreate a user’s conversation history based on the time when questions were asked.
    • Finally, the response is sent back to the user via an HTTPS response through the Amazon API Gateway REST API integration response.

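The following is a minimal sketch of the retrieval and inference steps (steps 6 and 7) using LangChain. Class and parameter names can vary across LangChain versions, and the Kendra index ID, endpoint name, and request/response payload keys are placeholders that depend on your deployment and the JumpStart model you choose.

import json

from langchain.chains import RetrievalQA
from langchain.llms import SagemakerEndpoint
from langchain.llms.sagemaker_endpoint import LLMContentHandler
from langchain.retrievers import AmazonKendraRetriever

class ContentHandler(LLMContentHandler):
    # Translates between LangChain prompts and the SageMaker JumpStart endpoint payload.
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs: dict) -> bytes:
        return json.dumps({"text_inputs": prompt, **model_kwargs}).encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        return json.loads(output.read().decode("utf-8"))["generated_texts"][0]

# Placeholders: your Amazon Kendra index ID and SageMaker JumpStart endpoint name.
retriever = AmazonKendraRetriever(index_id="your-kendra-index-id", region_name="us-east-1")
llm = SagemakerEndpoint(
    endpoint_name="your-flan-t5-xl-endpoint",
    region_name="us-east-1",
    content_handler=ContentHandler(),
    model_kwargs={"max_length": 300, "temperature": 0.1},
)

# "Stuff" the retrieved passages into the prompt as context for the LLM.
qa_chain = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)
print(qa_chain.run("What is our policy for reimbursing remote work equipment?"))
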
The following steps describe the AWS Lambda functions and their flow through the process:

  1. Check and redact any PII / Sensitive info
  2. LangChain QA Retrieval Chain
    • Search and retrieve relevant info
  3. Context Stuffing & Prompt Engineering
    • LangChain
  4. Inference with LLM
  5. Return response & Save it

Use cases

There are many business use cases where customers can use this workflow. The following section explains how the workflow can be used in different industries and verticals.

Employee Assistance

Well-designed corporate training can improve employee satisfaction and reduce the time required for onboarding new employees. As organizations grow and complexity increases, employees find it difficult to understand the many sources of internal documents. Internal documents in this context include company guidelines, policies, and standard operating procedures. For this scenario, an employee has a question about how to proceed with and edit an internal issue ticket. The employee can access and use the generative artificial intelligence (AI) conversational bot to ask about and execute the next steps for a specific ticket.

Specific use case: Automate issue resolution for employees based on corporate guidelines.

The following steps describe the AWS Lambda functions and their flow through the process:

  1. LangChain agent to identify the intent
  2. Send notification based on employee request
  3. Modify ticket status

In this architecture diagram, corporate training videos can be ingested through Amazon Transcribe to collect a log of these video scripts. Additionally, corporate training content stored in various sources (i.e., Confluence, Microsoft SharePoint, Google Drive, Jira, etc.) can be used to create indexes through Amazon Kendra connectors. Read this article to learn more about the collection of native connectors you can use as data sources in Amazon Kendra. The Amazon Kendra crawler is then able to use both the corporate training video scripts and the documentation stored in these other sources to help the conversational bot answer questions specific to company corporate training guidelines. The LangChain agent verifies permissions, modifies ticket status, and notifies the correct individuals using Amazon Simple Notification Service (Amazon SNS).

Customer Support Teams

Quickly resolving customer queries improves the customer experience and encourages brand loyalty. A loyal customer base helps drive sales, which contributes to the bottom line and increases customer engagement. Customer support teams spend lots of energy referencing many internal documents and customer relationship management software to answer customer queries about products and services. Internal documents in this context can include generic customer support call scripts, playbooks, escalation guidelines, and business information. The generative AI conversational bot helps with cost optimization because it handles queries on behalf of the customer support team.

Specific use case: Handling an oil change request based on service history and customer service plan purchased.

In this architecture diagram, the customer is routed to either the generative AI conversational bot or the Amazon Connect contact center. This decision can be based on the level of support needed or the availability of customer support agents. The LangChain agent identifies the customer’s intent and verifies identity. The LangChain agent also checks the service history and purchased support plan.

The following steps describe the AWS Lambda functions and their flow through the process:

  1. LangChain agent identifies the intent
  2. Retrieve Customer Information
  3. Check customer service history and warranty information
  4. Book appointment, provide more information, or route to contact center
  5. Send email confirmation

Amazon Connect is used to collect the voice and chat logs, and Amazon Comprehend is used to remove personally identifiable information (PII) from these logs. The Amazon Kendra crawler is then able to use the redacted voice and chat logs, customer call scripts, and customer service support plan policies to create the index. Once a decision is made, the generative AI conversational bot decides whether to book an appointment, provide more information, or route the customer to the contact center for further assistance. For cost optimization, the LangChain agent can also generate answers using fewer tokens and a less expensive large language model for lower priority customer queries.

Financial Services

Financial services companies rely on timely use of information to stay competitive and comply with financial regulations. Using a generative AI conversational bot, financial analysts and advisors can interact with textual information in a conversational manner and reduce the time and effort it takes to make better informed decisions. Outside of investment and market research, a generative AI conversational bot can also augment human capabilities by handling tasks that would traditionally require more human effort and time. For example, a financial institution specializing in personal loans can increase the rate at which loans are processed while providing better transparency to customers.

Specific use case: Use customer financial history and previous loan applications to decide and explain loan decision.

The following steps describe the AWS Lambda functions and their flow through the process:

  1. LangChain agent to identify the intent
  2. Check customer financial and credit score history
  3. Check internal customer relationship management system
  4. Check standard loan policies and suggest decision for employee qualifying the loan
  5. Send notification to customer

This architecture incorporates customer financial data stored in a database and data stored in a customer relationship management (CRM) tool. These data points are used to inform a decision based on the company’s internal loan policies. The customer is able to ask clarifying questions to understand what loans they qualify for and the terms of the loans they can accept. If the generative AI conversational bot is unable to approve a loan application, the user can still ask questions about improving credit scores or alternative financing options.

Government

Generative AI conversational bots can greatly benefit government institutions by improving communication, efficiency, and decision-making processes. They can also provide instant access to internal knowledge bases, helping government employees quickly retrieve information, policies, and procedures (i.e., eligibility criteria, application processes, and citizen services and support). One solution is an interactive system that allows taxpayers and tax professionals to easily find tax-related details and benefits. It can be used to understand user questions, summarize tax documents, and provide clear answers through interactive conversations.

Users can ask questions such as:

  • How does inheritance tax work and what are the tax thresholds?
  • Can you explain the concept of income tax?
  • What are the tax implications when selling a second property?

Additionally, users can have the convenience of submitting tax forms to a system, which can help verify the correctness of the information provided.

This architecture illustrates how users can upload completed tax forms to the solution and use it for interactive verification and guidance on how to accurately complete the necessary information.

Healthcare

Healthcare businesses have the opportunity to automate the use of large amounts of internal patient information, while also addressing common questions regarding use cases such as treatment options, insurance claims, clinical trials, and pharmaceutical research. Using a generative AI conversational bot enables quick and accurate generation of answers about health information from the provided knowledge base. For example, some healthcare professionals spend a lot of time filling in forms to file insurance claims.

In similar settings, clinical trial administrators and researchers need to find information about treatment options. A generative AI conversational bot can use the pre-built connectors in Amazon Kendra to retrieve the most relevant information from the millions of documents published through ongoing research conducted by pharmaceutical companies and universities.

Specific use case: Reduce the errors and time needed to fill out and send insurance forms.

In this architecture diagram, a healthcare professional is able to use the generative AI conversational bot to figure out what forms need to be filled out for the insurance. The LangChain agent is then able to retrieve the right forms and add the needed information for a patient as well as giving responses for descriptive parts of the forms based on insurance policies and previous forms. The healthcare professional can edit the responses given by the LLM before approving and having the form delivered to the insurance portal.

The following steps describe the AWS Lambda functions and their flow through the process:

  1. LangChain agent to identify the intent
  2. Retrieve the patient information needed
  3. Fill out the insurance form based on the patient information and form guideline
  4. Submit the form to the insurance portal after user approval

AWS HealthLake is used to securely store the health data, including previous insurance forms and patient information, and Amazon Comprehend is used to remove personally identifiable information (PII) from the previous insurance forms. The Amazon Kendra crawler is then able to use the set of insurance forms and guidelines to create the index. Once the forms are filled out by the generative AI and reviewed by the medical professional, they can be sent to the insurance portal.

Cost estimate

The cost of deploying the base solution as a proof of concept is shown in the following list. Because the base solution is considered a proof of concept, Amazon Kendra Developer Edition was used as a low-cost option, since the workload would not be in production. Our assumption for Amazon Kendra Developer Edition was 730 active hours for the month.

For Amazon SageMaker, we made an assumption that the customer would be using the ml.g4dn.2xlarge instance for real-time inference, with a single inference endpoint per instance. You can find more information on Amazon SageMaker pricing and available inference instance types here.

The following list shows each service, the resources consumed, and the estimated cost per month in USD:

  • AWS Amplify – 150 build minutes, 1 GB of data served, 500,000 requests – 15.71
  • Amazon API Gateway – 1M REST API calls – 3.50
  • AWS Lambda – 1 million requests, 5 seconds duration per request, 2 GB memory allocated – 160.23
  • Amazon DynamoDB – 1 million reads, 1 million writes, 100 GB storage – 26.38
  • Amazon SageMaker – Real-time inference with ml.g4dn.2xlarge – 676.80
  • Amazon Kendra – Developer Edition with 730 hours/month, 10,000 documents scanned, 5,000 queries/day – 821.25
  • Total cost – 1,703.87

*  Amazon Cognito has a free tier of 50,000 Monthly Active Users who use Cognito User Pools or 50 Monthly Active Users who use SAML 2.0 identity providers

Clean Up

To save costs, delete all the resources you deployed as part of the tutorial. You can delete any SageMaker endpoints you may have created via the SageMaker console. Remember, deleting an Amazon Kendra index doesn’t remove the original documents from your storage.

Conclusion

In this post, we showed you how to simplify access to internal information by summarizing from multiple repositories in real time. After the recent developments in commercially available LLMs, the possibilities of generative AI have become more apparent. In this post, we showcased ways to use AWS services to create a serverless chatbot that uses generative AI to answer questions. This approach incorporates an authentication layer and Amazon Comprehend’s PII detection to filter out any sensitive information provided in the user’s query. Whether it be individuals in healthcare understanding the nuances of filing insurance claims or HR understanding specific company-wide regulations, there are multiple industries and verticals that can benefit from this approach. An Amazon SageMaker JumpStart foundation model is the engine behind the chatbot, while a context stuffing approach using the RAG technique is used to ensure that the responses more accurately reference internal documents.

To learn more about working with generative AI on AWS, refer to Announcing New Tools for Building with Generative AI on AWS. For more in-depth guidance on using the RAG technique with AWS services, refer to Quickly build high-accuracy Generative AI applications on enterprise data using Amazon Kendra, LangChain, and large language models. Since the approach in this blog is LLM agnostic, any LLM can be used for inference. In our next post, we’ll outline ways to implement this solution using Amazon Bedrock and the Amazon Titan LLM.


About the Authors

Abhishek Maligehalli Shivalingaiah is a Senior AI Services Solution Architect at AWS. He is passionate about building applications using Generative AI, Amazon Kendra and NLP. He has around 10 years of experience in building Data & AI solutions to create value for customers and enterprises. He has even built a (personal) chatbot for fun to answer questions about his career and professional journey. Outside of work he enjoys making portraits of family & friends, and loves creating artworks.

Medha Aiyah is an Associate Solutions Architect at AWS, based in Austin, Texas. She recently graduated from the University of Texas at Dallas in December 2022 with her Master of Science in Computer Science with a specialization in Intelligent Systems focusing on AI/ML. She is interested in learning more about AI/ML and using AWS services to discover solutions customers can benefit from.

Hugo Tse is an Associate Solutions Architect at AWS based in Seattle, Washington. He holds a Master’s degree in Information Technology from Arizona State University and a bachelor’s degree in Economics from the University of Chicago. He is a member of the Information Systems Audit and Control Association (ISACA) and International Information System Security Certification Consortium (ISC)2. He enjoys helping customers benefit from technology.

Ayman Ishimwe is an Associate Solutions Architect at AWS based in Seattle, Washington. He holds a Master’s degree in Software Engineering and IT from Oakland University. He has prior experience in software development, specifically in building microservices for distributed web applications. He is passionate about helping customers build robust and scalable solutions on AWS cloud services following best practices.

Shervin Suresh is an Associate Solutions Architect at AWS based in Austin, Texas. He graduated with a Master’s in Software Engineering with a concentration in Cloud Computing and Virtualization and a Bachelor’s in Computer Engineering from San Jose State University. He is passionate about leveraging technology to help improve the lives of people from all backgrounds.

Read More

AI Frontiers: The future of scale with Ahmed Awadallah and Ashley Llorens

MSR Podcast | AI Frontiers | Ahmed Awadallah

Episode 149 | Sept. 14, 2023

Powerful large-scale AI models like GPT-4 are showing dramatic improvements in reasoning, problem-solving, and language capabilities. This marks a phase change for artificial intelligence—and a signal of accelerating progress to come.  

In this Microsoft Research Podcast series, AI scientist and engineer Ashley Llorens hosts conversations with his collaborators and colleagues about what these models—and the models that will come next—mean for our approach to creating, understanding, and deploying AI, its applications in areas such as healthcare and education, and its potential to benefit humanity.

This episode features Senior Principal Research Manager Ahmed H. Awadallah, whose work improving the efficiency of large-scale AI models and efforts to help move advancements in the space from research to practice have put him at the forefront of this new era of AI. Awadallah discusses the shift in dynamics between model size and amount—and quality—of data when it comes to model training; the recently published paper “Orca: Progressive Learning from Complex Explanation Traces of GPT-4,” which further explores the use of large-scale AI models to improve the performance of smaller, less powerful ones; and the need for better evaluation strategies, particularly as we move into a future in which Awadallah hopes to see gains in these models’ ability to continually learn.

Transcript

[MUSIC PLAYS]

ASHLEY LLORENS: I’m Ashley Llorens with Microsoft Research. I’ve spent the last 20 years working in AI and machine learning, but I’ve never felt more inspired to work in the field than right now. The release of GPT-4 was a watershed moment in the pursuit of artificial intelligence, and yet progress continues to accelerate. The latest large-scale AI models and the systems they power are continuing to exhibit improvements in reasoning, problem-solving, and translation across languages and domains. In this podcast series, I’m sharing conversations with fellow researchers about the latest developments in large-scale AI models, the work we’re doing to understand their capabilities and limitations, and ultimately how innovations like these can have the greatest benefit for humanity. Welcome to AI Frontiers.

Today, I’ll speak with Ahmed Awadallah. Ahmed is a Senior Principal Researcher at Microsoft Research in Redmond. Much of his work focuses on machine learning, helping to create foundation models that excel at key tasks while using less compute and energy. His work has been at the leading edge of recent progress in AI and gives him a unique perspective on where it will go next.


[MUSIC FADES] 

All right, Ahmed, let’s dive right in. Among other things, I find that people are hungry to understand the drivers of the progress we’re seeing in AI. Over these last few years when people like you or I have tried to explain this, we’ve often pointed to some measure of scale. You know, I know many times as I’ve given talks in AI, I’ve shown plots that feature some kind of up-and-to-the-right trend in scale over time—the increasing size of the AI models we’re training, the increasing size of the datasets we’re using to train them on, or even the corresponding increase in the overall compute budget. But when you double-click into this general notion of scale related to large AI models, what gets exposed is really a rapidly evolving frontier of experimental science. So, Ahmed, I’m going to start with a big question and then we can kind of decompose it from there. As someone at the forefront of all of this, how has your understanding of what’s driving progress in AI changed over this last year?

AHMED AWADALLAH: Thanks, Ashley. That’s a very good question. And the short answer is it’s changed a lot. I think I have never been learning as much as I have been throughout my career. Things are moving really, really fast. The progress is amazing to witness, and we’re just learning more and more every day. To your point, for quite some time, we were thinking of scale as the main driver of progress, and scale is clearly very important and necessary. But over the last year, we have been also seeing many different things. Maybe the most prominent one is the importance of data being used for training these models. And that’s not very separate from scale, because when we think about scale, what really matters is how much compute we are spending in training these models. And you can choose to spend that compute in making the model bigger or in training it on more and more data, training it for longer. And it has been over the past few years a lot of iterations in trying to understand that. But it has been very clear over the last year that we were, in a sense, underestimating the value of data in different ways: number one, in having more data but even more important, the quality of the data, having cleaner data, having more representative data, and also the distribution or the mixing of the data that we are using. Like, for example, one of the very interesting things we have witnessed maybe over the last year to year and a half is that a lot of the language models are being trained on text and code. And surprisingly, the training on code is actually helping the model a lot—not just in coding tasks but in normal other tasks that do not really involve coding. More importantly, I think one of the big shifts last year in particular—it has been happening for quite some time but we have been seeing a lot of value for it last year—is that there are now like two stages of training these models: the pretraining stage, where you are actually training the language model in an autoregressive manner to predict the next word. And that just makes it a very good language model. But then the post-training stage with the instruction tuning and RLHF (reinforcement learning from human feedback) and reward models, using a very different form of data; this is not self-supervised, freely available data on the internet anymore. This is human-generated, human-curated, maybe a mixture of model- and human-curated data that’s trying to get the model to be better at very specific elements like being more helpful or being harmless. 

LLORENS: There’s so much to unpack even in that, in that short answer. So let’s, let’s dig in to some of these core concepts here. You, you teed up this notion of ways to spend compute, you know, ways to spend a compute budget. And one of the things you said was, you know, one of the things we can do is make the model bigger. And I think to really illustrate this concept, we need to, we need to dig in to what that means. One, one concept that gets obfuscated there a little bit is the architecture of the model. So what does it mean to make the model bigger? Maybe you can tell us something about, you know, how to think about parameters in the model and how important is architecture in that, in that conversation.

AWADALLAH: So most of the progress, especially in language and other domains, as well, have been using the transformer model. And the transformer model have been actually very robust to change over the years. I don’t … I think a lot … I’ve asked a lot of experts over the years whether they had expected the transformer model to be still around five, six years later, and most of them thought we would have something very different. But it has been very robust and very universal, and, yes, there have been improvements and changes, but the core idea has still been the same. And with dense transformer models, the size of the model tends to be around the number of layers that you have in the model and then the number of parameters that you have in each layer, which is basically the depths and the widths of the model. And we have been seeing very steady exponential increase in that. It’s very, it’s very interesting to think that just like five years ago when BERT came up, the large model was like 300-something million parameters and the smaller one was 100 million parameters. And we consider these to be really large ones. Now that’s a very, very small scale. So things have been moving and moving really fast in making these models bigger. But over the time, there started to be an understanding being developed of how big should the model be. If I were to invest a certain amount of compute, what should I do with that in terms of the model size and especially on how it relates to the data side? And, perhaps, one of the most significant efforts there was the OpenAI scaling laws, which came up in 2020, late 2020, I think. And it was basically saying that if you are … if you have 10x more compute to spend, then you should dedicate maybe five of that … 5x of that to making the model bigger—more layers, more width—and maybe 2x to making the data bigger. And that translated to … for, like say, GPT-3-like model being trained on almost 300 billion tokens, and for quite some time, the 300 billion tokens was stuck, like it became the standard, and a lot of people were using that. But then fast-forward less than two years later came the second iteration of the scaling laws, the Chinchilla paper, where the, the recommendation was slightly different. It was like we were not paying enough attention to the size of the data. Actually, you should now think of the data and the size as equally … and the size of the model … as equally important. So if you were to invest in X more, you should just split them evenly between bigger models and more data. And that was quite a change, and it actually got all the people to pay more attention to the data. But then fast-forward one more year, in 2023—and maybe pioneered mostly with the Llama work from Meta and then many, many others followed suit—we started finding out that we don’t have to operate at this optimal point. We can actually push for more data and the model will continue to improve. And that’s interesting because when you are thinking about the training versus the deployment or the inference parts of the life cycle of the model, they are actually very different. When you are training the model, you would like the model to learn to generalize as best as possible. When you are actually using the model, the size of the model becomes a huge difference. I actually recall an interesting quote from a 2015 paper by Geoff Hinton and others. That’s the paper that introduced the idea of distillation for neural networks. 
Distillation was there before from the work of, of Rich Caruana, our colleague here at Microsoft, and others. But in 2015, there was this paper specifically discussing distilling models for neural network models, and one of the motivating sentences at the very beginning of the paper was basically talking about insects and how insects would have different forms throughout their life cycles. At the beginning of their life, they are optimized for extracting energy and nutrients from the environment, and then later on, in their adult form, they have very different forms as optimized for flying and traveling and reproduction and so on and so forth. So that, that analogy is very interesting here because like you can think about the same not just in the context of distillation, as this paper was describing, but just for pretraining the models in general. Yes, the optimal point might have been to equally split your compute between the data and the size, but actually going more towards having more and more data actually is beneficial. As long as the model is getting better, it will give you a lot more benefit because you have a smaller model to use during the inference time. And we would see that with the latest iteration of the Llama models, we are now seeing models as small as 7 billion parameters being trained on 1 to 2 trillion tokens of data, which was unheard before.

LLORENS: Let’s talk a bit more about evaluating performance. Of course, the neural scaling laws that you referenced earlier really predict how the performance of a model on the task of next word prediction will improve with the size of the model or the size of the data. But of course, that’s not what we really care about. What we’re really after is better performance on any number of downstream tasks like reasoning, document summarization, or even writing fiction. How do we predict and measure performance in that broader sense? 

AWADALLAH: Yeah, that’s a very good question. And that’s another area where our understanding of evaluating generative models in general has been challenged quite a bit over the last year in particular. And I think one of the areas that I would recommend to spend a lot of time working on right now is figuring out a better strategy around evaluating generative language models. We … this field has been very benchmark driven for many, many years, and we have been seeing a lot of very well-established benchmarks that have been helping the community in general make a lot of progress. We have seen leaderboards like GLUE and SuperGLUE, and many, many others play a very important role in the development of pretrained models. But over the last year, there has been a lot of changes. One is that these benchmarks are being saturated really, really quickly. There was … this paper that I was reading a few, reading a few months back talking about how we went from times where benchmarks like Switchboard and MNIST for speech and image processing lasted for 10 to 20 years before they get saturated to times where things like SQuAD and GLUE and SuperGLUE are getting saturated in a year or two to now where many of the benchmarks just get like maybe two or three submissions and that’s it. It gets saturated very quickly after that. BIG-Bench is a prime example of that, where it was like a collaborative effort, over 400 people coming together from many different institutions designing, a benchmark to challenge language models. And then came GPT-4, and we’re seeing that it’s doing really, really, really well, even in like zero-shot and, and, and few-shot settings, where the tasks are completely new to the models. So the model out of the box is basically solving a lot of the benchmarks that we have. That’s an artifact of the significant progress that we have been seeing and the speed of that progress, but it’s actually making that, that answer to that question even harder. But there’s another thing that’s making it even harder is that the benchmarks are giving us a much more limited view of the actual capabilities of these models compared to what they can actually do, especially models like GPT-4. The, the breadth of capabilities of the model is beyond what we had benchmarks to measure it with. And we have seen once it was released, then once people started interacting with it, there are so many experiences and so many efforts just thinking about what can we do with that model. Now we figured out that it can do this new task; it can do that new task. I can use it in this way that I didn’t think about before. So that expansion in the surface of capabilities of the model is making the question of evaluating them even, even harder and, and moving forward, I think this would be one of the most interesting areas to really spend time on.

LLORENS: Why don’t we talk a bit about a paper that you recently published with some Microsoft Research colleagues called “Orca: Progressive Learning from Complex Explanation Traces of GPT-4.” And there’s a couple of, of concepts that we’ve been talking about that I want to pull through to, to a discussion around, around this work. One is the idea of quality of data. And so it would be great to hear, you know, some of the intuitions around … yeah, what, what drove you to focus on data quality versus, you know, number of parameters or number of tokens? And then we can also come back to this notion of benchmarks, because to publish, you have to pick some benchmarks, right? [LAUGHS] So, so first, why don’t we talk about the intuitions behind this paper and what you did there, and then I’d love to understand how you thought through the process of picking benchmarks to evaluate these models.

AWADALLAH: Yeah, so, so in this paper, we were basically thinking about like … there has been a lot of work actually on thinking about how do we have a very powerful model and use it to improve a less powerful model. This is not a new concept. It has been there forever, and I mentioned the Hinton et al. paper on distillation, one of the pioneer papers applying that to neural networks. And over time, this field actually continued getting better and better. And the way the large, more powerful models were used just continued evolving. So people were using the logits generated by the model and then maybe looking at intermediate layers and their output, maybe looking at attention maps and trying to map that between the models and coming up with more and more complex ways of distilling information from the powerful model to improve a less powerful model. But with models like GPT-4, we were thinking that GPT-4 is so good that you can actually start thinking about different ways of having a model teaching another model. And in that particular case, the idea was, can we actually have the powerful model explain in step by step how to do the task, and can we actually have a smaller model learn from that? And how far can this actually help the smaller one? A big part of this has to do with the data quality but also with the teacher model quality. You wouldn’t be able to … and this gets us into the whole notion of synthesized data and the role of synthesized data can play in making models better. Models like GPT-4, the level of capability where you could actually generate a lot of synthetic data at a very high quality comparable in some cases to what you’d get from a human, better in some cases than what you could get from a human. And even more than that, when you are working with a model like GPT-4, there has been a lot of work over the last few months demonstrating that you can even get the model to be a lot better by having the model reflect on what it’s doing, having the model critique what it’s doing and try to come up with even corrections and improvements to its own generation. And once you have this going, you see that you can actually create very high-quality synthetic data in so many ways, mostly because of the quality of the model but also because of like these different ways of generating the data on top of the model. And then it was really an experiment of how far can another model learn from these models. And by the way—and there is … we’re seeing some work like that, as well—it doesn’t even have to be a different model. It can be the same model improving itself. It can be the same model giving feedback to itself. That coincided with actually us having, having … we have been spending a lot of time thinking about this idea of learning from feedback or like continual improvement. How can we take a language model and continue to improve it based on interaction, based on feedback? So we started connecting these two concepts and basically thinking of it like the powerful model is just giving feedback to our much less powerful model and trying to help it improve across certain dimensions. And that’s where that line of work started. And what we were finding out is that you can actually have the more powerful model teach a smaller model. It would have definitely much narrower capabilities than the bigger model because like by virtue of this training cycle, you are just focused on teaching it particular concepts. You cannot teach it everything that the larger model can do. 
But also because this is another example of this like post-training step, like this model has already been pretrained language model and it’s always limited by the basic capabilities that it has. So, yes, the large language model can teach it a little bit more, but it will always be limited by that.

LLORENS: Now you mentioned … you’ve sketched out now the idea of using a powerful general-purpose model through some process of distillation to train a, a smaller, more special, more specialized model. And in the paper, you, you and your colleagues offer a number of case studies. So can you, can you pick one? Give, give us, you know, give us an example of a specialized domain and the way that you utilize GPT-4 to accomplish this training and what the performance outcome was. 

AWADALLAH: Yeah, actually, when we were working on this paper, the team was thinking that what capability should we try to focus on to, to demonstrate that the small model can improve from, from the guidance of the much more powerful model. And we were thinking it would be very cool if we can demonstrate that the small model can get better at reasoning, because reasoning has been one of the capabilities that have been clearly emerging with larger and larger models, and models like GPT-4 demonstrate the level of reasoning that we have never seen with any of our systems before. So we were thinking can we … can, can GPT-4 help actually get the smaller model to be better at reasoning. And that had a lot of implications on the selection of what datasets to use for, for creating the synthetic data. In this particular paper, by the way, we’re not, we’re not using GPT-4 to answer the questions. We already have the questions and the answers. We are just asking GPT-4 to explain it in step by step. This is similar to what we have been seeing with chain-of-thought reasoning, chain-of-thought prompting, and other different prompting techniques showing that if you actually push the language model to go step by step, it can actually do a lot better. So we are basically saying, can we have these explanations and step-by-step traces and have them help the smaller language model learn to reason a little bit better. And because of that, actually—and this goes back to your earlier questions about benchmarks—in this particular paper, we chose two main benchmarks. There were more than two, but like the two main benchmarks where BIG-Bench Hard and AGIEval. BIG-Bench Hard is a 23 subset of BIG-Bench that we were just talking about earlier, and a lot of the tasks are very heavy on reasoning. AGIEval is a set of questions that are SAT-, LSAT-, GRE-, and GMAT-type of questions. They are also very heavy on reasoning. The benchmarks were selected to highlight the reasoning improvement and the reasoning capability of the model. And we had, we had a bunch of use cases there, and you would see one of the common themes there is that there is actually … even before the use cases, if you look at the, the results, the reasoning ability as measured by these two benchmarks at least of the base model significantly improved. Still far behind the teacher. The teacher is much, much more powerful and there’s no real comparison, but still the fact that collecting synthetic data from a model like GPT-4 explaining reasoning steps could help a much smaller model get better at reasoning and get better by that magnitude was a very interesting finding. We were, we were quite a bit surprised, actually, by the results. We thought that it will improve the model reasoning abilities, but it actually improved it beyond what we expected. And again, this goes back to like imagine if we were … if we wanted to do that without a model like GPT-4, that would entail having humans generate explanations for a very large number of tasks and make sure that these explanations remain faithful and align with the answers of the question. It would have been a very hard task, and the type of annotator that you would like to recruit in order to do that, it would have been … even made it harder and slower. But having, having the capabilities of a model like GPT-4 is really what made it possible to do that.

LLORENS: You’ve, you’ve outlined now, you know, your experiments around using GPT-4 to train a smaller model, but earlier, you also alluded to a pretty compelling idea that maybe even a large, powerful model could, I guess, self-improve by generate, you know, performing a generation, critiquing itself, and then somehow guiding, you know, the parameter weights in a way that, that was informed by the critique. Is that, was that part of these experiments, or what … or, or is that … does that work? [LAUGHS] Have, have we … do we have experimental evidence of that?  

AWADALLAH: Yeah, I think, I think that’s a very good question. That was really how we started. That was really what we were aiming and still trying to do. The value … we started off by asking that question: can we actually have a model self-improve, self-improve itself? From an experimental perspective, it was much easier to have a powerful model help a smaller model improve. But self-improvement is really what we, what got us excited about this direction from the beginning. There has been evidence from other work showing up over the last short period actually showing that this is actually a very promising direction, too. For example, one of the very interesting findings about these powerful models—I think that the term frontier models is being used to refer to them now—is that they have a very good ability at critiquing and verifying output. And sometimes that’s even better than their ability at solving the task. So you can basically go to GPT-4 and ask it to solve a coding question. Write a Python function to do something. And then you can go again to GPT-4 and ask it to look back at that code and see if there are any bugs in there. And surprisingly, it would identify bugs in its own generation with a very high quality. And then you can go back to GPT-4 again and ask it to improve its own generation and fix the bugs. And it does that. So we actually have a couple of experiments with that. One of them in a toolkit called LIDA that one of my colleagues here, Victor [Dibia], has been working on for some time. LIDA is a tool for visualizations, and you basically go there and submit a query. The query would be, say, create a graph that shows the trends of stocks over the last year. And it’ll actually go to the data basically, engineer Python code. The Python code, when compiled and executed, would generate a visualization. But then we were finding out that we don’t have to stop there. We can actually ask GPT-4 again to go back to that visualization and critique it, and it doesn’t have to be open critique. We can define the dimensions that we would like to improve on and ask GPT-4 to critique and provide feedback across these dimensions. Like it could be the readability of the chart. It could be, is the type of chart the best fit for the data? And surprisingly it does that quite well. And then that opens the door to so many interesting experiences where you can, after coming up with the initial answer, you can actually suggest some of these improvements to a human. Or maybe if you are confident enough, you just go ahead and apply them even without involving the human in the loop and you actually get a lot better. There was another experiment like that where another colleague of mine has been working on a library called AutoGen, which basically helps with these iterative loops on top of language models, as well as figuring out values of hyperparameters and so on and so forth. And the experiments were very similar. There was a notion there of like having a separate agent that the team refers to as a user proxy agent, and that agent basically has a criteria of what the user is trying to do. And it keeps asking GPT-4 to critique the output and improve the output up until this criteria is met. And we see that we get much, much better value with using GPT-4 this way. That cycle is expensive, though, because you have to iterate and go back multiple times. 
The whole idea of self-improvement is basically, can we literally distill that cycle into the model itself again so that as the model is being used and being asked to maybe critique and provide feedback or maybe also getting some critique and feedback from the human user, can we use that data to continue to improve the model itself?

LLORENS: It is pretty fascinating that these models can be better at evaluating a candidate solution to a task than generating a novel solution to the task. On the other hand, maybe it’s not so surprising. One of the things that’s hard about or one of the things that can be challenging is this idea of, you know, prompt engineering, by which I’m trying to specify a task for the, for the model to solve or for the AI system to solve. But if you think about it, the best I can do at specifying the task is to actually try my best to complete the task. I’ve now specified the task to the greatest extent that I possibly can. So the machine kind of has my best task specification. With that, that information, now it becomes a kind of maybe even in some cases a superhuman evaluator. It’s doing better than I can at evaluating my own work. So that’s kind of an interesting twist there. Back, you know, back to the Orca paper, one of the things that you wouldn’t have seen … you know, earlier in the talk, you, you harkened back to say a decade ago, when benchmarks lasted a long, a longer time, one of the things that we would not necessarily have seen in a paper from that era, you know, say the CNN era of AI, is, is, a, is a safety evaluation, you know, for a specialized object recognition model. But in the Orca paper, we do have a safety evaluation. Can you, you talk a little bit about the thought process behind the particular evaluations that you did conduct and, and why these are necessary in the first place in this era of AI? 

AWADALLAH: Yeah, I think in this era of AI, this is one of the most important parts of the development cycle of any LLM—large or small. And as we were just describing, we are discovering abilities of these models as we go. So just as there will be a lot of emerging capabilities that are surprising and useful and interesting, this would also open the door to a lot of misuse. And safety evaluation is at least … is the least we can do in order to make sure that we understand how, how can this model be used and what are some of the possible harms or the possible misuses that can come from using these models? So I think, I think this is, this is now definitely should be a standard for any work on language models. And here we are not, we’re not really training a language model from scratch. This is more of like a post-training or a fine-tuning of an existing language model. But even for, for, for research like that, I think safety evaluation should be a critical component of that. And, yes, we did some, and we, we, we actually have a couple of paragraphs in the paper where we say we need to do a lot more, and we are doing a lot more of that right now. I think … what we did in the paper that … we focused on only two dimensions: truthfulness and toxicity. And we were basically trying to make sure that we are trying to see the additional fine-tuning and training that we do, is it improving the model across these dimensions or is it not? And the good news that it was actually improving it in both dimensions, at least with the benchmarks that we have tried. I, I think it was interesting that actually on the, on the toxicity aspect in particular, we found that this particular type of post-training is actually improving the base model in terms of its tendency to generate toxic or biased content. But I think a big part of that is that we, we’re using Azure APIs in part of the data cleaning and data processing, and Azure has invested a lot of time and effort in making sure that we have a lot of tools and classifiers for identifying unsafe content, so the training data, the post-training data, benefited from that, which ended up helping the model, as well. But to your point, I think this is a critical component that should go into any work related to pretraining or post-training or even fine-tuning in many cases. And we did some in the paper, but I think, I think there’s a lot more to be done there. 

LLORENS: Can you talk a little bit more about post-training as distinct from pretraining? How that, how that process has evolved, and, and where you see it going from here?

AWADALLAH: I, I, I see a ton of potential and, and opportunity there actually. And pretraining is the traditional language model training as we have always done it. Surprisingly, actually, if you go back to … like I, I was … in, in one of the talks, I was showing like a 20-years-ago paper by Bengio et al. doing the language model training with neural networks, and we’re still training neural networks the same way, autoregressive next word prediction. Very different architecture, a lot of detail that goes into the training process, but we are still training them as a language model to predict the next word. In a big departure from that—and it started with the InstructGPT paper and then a lot of other work had followed—there was this introduction of other steps of the language model training process. The first step is instruction tuning, which is showing the model prompts and responses and asking it to … and training the model on these prompts and responses. Often these responses are originated by a human. So you are not just training the model to learn the language model criteria only anymore, you are actually training it to respond to a way the human would want it to respond. And this was very interesting because you could see that the language models are really very good text-completion engines. And at some time actually, a lot of folks were working on framing the task such that it looks like this text completion. So if you are doing classification, you would basically list your input and then ask a question where the completion of that question would be the class that you are looking for. But then the community started figuring out that you can actually introduce this additional step of instruction tuning, where now out of all the possible ways of completing a sentence like if I’m asking a question, maybe listing other similar questions is a very good way of completion. Maybe repeating that question with more details is another way of completion, or answering the question is a third way of completion, and all of them could be highly probable. The instruction tuning is basically teaching the model the way to respond, and a big part of that has to do with safety, as well, because you could demonstrate how we want the model to be helpful, how we want the model to be harmless, in this instruction-tuning step. But the instruction tuning step is only showing the model what to do. It’s not showing it what not to do. And this is where the RLHF step came in, the reinforcement learning from human feedback. What’s happening really is that instead of showing the model a single answer, we’re showing them a little more than one answer. And we are basically showing them only a preference. We’re basically telling the model Answer A is better than Answer B. It could be better for many reasons. We are just encoding our criteria of better into these annotations, and we are training a reward model first that basically it’s job is, given any response, would assign a scalar value to it on how good it is. And then we are doing the RLHF training loop, where the reward model is used to update the original model such that it learns what are better responses or not or worse responses and tries to align more with the better responses. The post-training is, as a concept, is very related and, and sometimes referred to also as alignment, because the way post-training has been mostly used is to align the model to human values, whether this be being helpful or being harmless. 

LLORENS: Ahmed, as we, as we wrap up here, typically, I would ask something like, you know, what’s next for your research, and maybe you can tell us a little bit about what’s next for your research. [LAUGHS] But, but before you do that, I’d love to understand what, what key limitation you see in the current era of AI that you would … would be on your wish list, right, as something that maybe you and your team or maybe the broader field has accomplished in the next five years. What, what new capabilities would be on your wish list for AI over the next five years? 

AWADALLAH: Yeah, given, given the progress, I would say even much shorter than five years. 

LLORENS: Five months. [LAUGHS]

AWADALLAH: But I would say … actually the answer to the two questions are, are very similar. Actually, I think where we are with these models right now is much better than many people anticipated, and we are able to solve problems that we didn’t think we could solve before. One of the key capabilities that I would like to see getting better over the next, few months to a few years—hopefully more toward few months—is the ability of the model to continue to learn. This like continual learning loop where the model is learning as it interacts with the humans. The model is reflecting on past experiences and getting better as we use it, and maybe also getting better in an adaptive way. Like we sometimes use this term adaptive alignment, where we are basically saying we want the model actually to continue to align and continue to align in the way it behaves across multiple dimensions. Like maybe the model will get more personal as I use it, and it will start acting more and, and behaving more in a way I want it to be. Or maybe I am developing a particular application, and for that application, I want the model to be a lot more creative or I want the model to be a lot more grounded. We can do some of that with prompting right now, but I think having more progress along this notion of continual learning, lifelong learning … this has been a heavily studied subject in machine learning in general and has been the holy grail of machine learning for many, many, many years. Having a model that’s able to continue to learn, continue to adapt, gets better every time you use it, so just when I use it today and I interact with it and it could learn about my preferences, and next time along, I don’t have to state these preferences again. Or maybe when it makes a mistake and I provide a feedback, next time along, it already knows that it had made that mistake and it already gives me a better solution.  

LLORENS: That should have been the last question. But I think I have one more. That is, how will we know that the models are getting better at that, right? That’s a metric that’s sort of driven by interaction versus, you know, static evaluation. So how do you, how do you measure progress in adaptive alignment that way?

AWADALLAH: I think, I think that’s a very interesting point. And this actually ties this back with two concepts that we brought up earlier: the evaluation side and the safety side. Because from the evaluation perspective, I do think we need to move beyond static benchmark evaluation to a more dynamic human-in-the-loop evaluation, and there’s already been attempts and progress at that just over the past few months, and there is still a lot more to do there. The evaluation criteria will not also be universal. Like there will be a lot … like a lot of people talk about the, let’s say, fabrications—the models making up information, facts. Well, if I am using the model to help me write fictional stories, like this becomes a feature; it’s not a bug. But if I’m using the model to ask questions, especially in the high-stakes scenario, it becomes a very big problem. So having a way of evaluating these models that are dynamic, that are human-in-the-loop, that are adaptive, that aligns with objectives of how we are using the models will be a very important research area, and that ties back to the safety angles, as well, because if I … if we are barely … we’re, we’re …  everybody is working really hard to try to understand the safety of the models after the models are being trained and they are fixed. But what if the models continue to improve? What if it’s continuing to learn? What if it’s learning things from me that are different than what it’s learning from you? Then that notion of alignment and safety and evaluation of that becomes also a very open and interesting question.  

LLORENS: Well, look, I love the ambition there, Ahmed, and thanks for a fascinating discussion. 

AWADALLAH: Thank you so much, Ashley.



Microsoft at ACM SIGCOMM 2023: Innovating the future of networking


Modern applications heavily rely on robust network infrastructure, requiring continuous innovation. In this evolving landscape, Microsoft is at the forefront, spearheading innovation efforts in networking and strengthening the foundational network infrastructure that underpins the cloud ecosystem. By investing in and enhancing this critical infrastructure, Microsoft not only ensures the resilience and scalability of cloud services but also lays the groundwork for the sophisticated and transformative applications that will continue to define the technological landscape.

ACM SIGCOMM (opens in new tab), the premier annual conference of the Association for Computing Machinery’s special interest group on data communication (opens in new tab) (SIGCOMM), is dedicated to the study of communication and computer networks. Microsoft was proud to be a Gold Sponsor of this year’s conference, publishing 10 papers and participating in the organizing committee. Dave Maltz (opens in new tab), technical fellow and corporate vice president of Azure Networking, served as one of the program committee chairs, helping to oversee the conference’s technical program. Additionally, we are proud to acknowledge the significant achievement of one of our youngest researchers, Siva Kakarla (opens in new tab), recognized as the ACM SIGCOMM Dissertation Award (opens in new tab) runner up for his thesis, “Formal Methods for a Robust Domain Name System (opens in new tab).”  

Microsoft also had a booth showcasing some of our latest technologies, including hollow-core fiber-based connectivity, SONiC on smart switches, container networking, technologies for L3/L4-based DDoS protection, and technologies that we are building to extend the cloud into space—for both Earth observation and satellite communication.



Paper highlights 

The papers Microsoft published at SIGCOMM 2023 span a wide spectrum of networking domains, ranging from 5G and wide area networks (WAN) to enterprise networks. They also explore various aspects of networking, including traffic engineering, network offload strategies, and specialized network designs tailored for applications like gaming, video conferencing, and financial services.   

Here are some of the highlights:

Switchboard: Efficient Resource Management for Conferencing Services 

Efficient resource management is crucial for conferencing services, such as Microsoft Teams, to balance user experience and cost-effectiveness. This involves optimizing the allocation of media processing servers, responsible for handling media streams during calls. Rahul Bothra, Rohan Gandhi, Ranjita Bhagwan, Venkat Padmanabhan, Rui Liang, Steve Carlson, Vinayaka Kamath, Sreangsu Acharyya, Ken Sueda, Somesh Chaturmohta, and Harsha Sharm introduce Switchboard, a significant advancement in resource management controllers. Switchboard is peak-aware, recognizing that resource costs vary with peak usage times and across time zones, allowing servers to serve calls during peak times and act as backups during off-peak hours. Additionally, it enhances efficiency by coordinating network and compute provisioning and application-aware resource allocation. Evaluation using Microsoft Teams data demonstrates that Switchboard reduces provisioning costs by up to 51 percent while maintaining or improving latency compared to existing solutions.

Resilient Baseband Processing in Virtualized RANs with Slingshot 

In the realm of cellular networks, virtualized radio access networks (vRANs) are gaining prominence, replacing traditional specialized hardware with software on commodity servers. However, current vRAN setups lack resilience, making it challenging to implement failover mechanisms and upgrades without prolonged service interruptions. Nikita Lazarev, Tao Ji, Anuj Kalia, Daehyeok Kim, Ilias Marinos, Francis Y. Yan, Christina Delimitrou, Zhiru Zhang, and Aditya Akella propose Slingshot, an innovative system designed to seamlessly introduce resilience to the most critical layer of vRANs, the physical layer (PHY). Slingshot accomplishes this by employing novel techniques for real-time workload migration, incorporating fast RAN protocol middleboxes, and implementing real-time RAN failure detection. A key breakthrough in Slingshot’s design is its approach to treat transient disruptions from resilience events as akin to regular wireless signal impairments, using the inherent resilience of cellular networks to these occurrences. Experiments conducted on a cutting-edge 5G vRAN testbed demonstrate Slingshot’s capability to manage PHY failover without interrupting video conferencing and causing under 110 microseconds of disruption to a TCP connection. Furthermore, it enables seamless zero-downtime upgrades in vRAN deployments.

DBO: Response Time Fairness for Cloud-Hosted Financial Exchanges 

When hosting financial exchanges in cloud environments, ensuring equal and predictable latency for all market participants is critical, especially in tasks like high-speed trading. Existing cloud deployments often struggle to maintain such fairness due to factors like congestion and varying network paths. In this paper, Prateesh Goyal, Eashan Gupta, Ilias Marinos, Chenxingyu Zhao, Radhika Mittal, and myself (Ranveer Chandra), tackle the issue arising from the lack of determinism in cloud networks, showing that achieving predictable or bounded latency isn’t a necessity to ensure fairness. Inspired by the concept of logical clocks in distributed systems, the paper introduces Delivery Based Ordering (DBO) as a novel approach to rectifying latency discrepancies among participants, helping ensure fairness. The evaluation of DBO, conducted both in a hardware testbed and a public cloud environment, demonstrates its feasibility in achieving guaranteed fairness and sustaining sub-100 microsecond latency, even at high transaction rates.

For the complete list of accepted publications by Microsoft researchers, please see the publications list on Microsoft at SIGCOMM 2023.

Photo: A group of researchers attending SIGCOMM 2023.

Learn about opportunities

Microsoft welcomes talented individuals across various roles at Microsoft Research, Azure Networking, and other departments. Whether you’re a networking partner or researcher, we welcome your collaboration and exploration to advance computer networking and invite you to be part of the team crafting cutting-edge solutions for industry challenges. Review our open positions at the Microsoft Research website.



Shout at the Devil: Capcom’s ‘Devil May Cry 5’ Joins GeForce NOW


GFN Thursday is downright demonic, as Devil May Cry 5 comes to GeForce NOW.

Capcom’s action-packed third-person brawler leads 15 titles joining the GeForce NOW library this week, including Gears Tactics and The Crew Motorfest.

It’s also the last week to take on the Ultimate KovaaK’s Challenge. Get on the leaderboard today for a chance to win a 240Hz gaming monitor, a gaming Chromebook, GeForce NOW memberships or other prizes. The challenge ends on Thursday, Sept. 21.

The Devil Returns

Devil May Cry 5 on GeForce NOW
Jackpot!

Devil May Cry 5 is the next title from Capcom’s catalog to come to GeForce NOW. Members can stream all of its high-octane, stylish action at GeForce RTX quality to nearly any device, thanks to the power of GeForce NOW cloud gaming servers.

The threat of demonic power has returned to menace the world once again. Take on hordes of enemies as Nero, V or the legendary Dante with the ramped-up sword-and-gun gameplay that the series is known for. Battle epic bosses in adrenaline-fueled fights across the overrun Red Grave City — all to the beat of a truly killer soundtrack.

Take the action on the go thanks to the power of the cloud. GeForce NOW Priority members can take the fight with them across nearly any device at up to 1080p and 60 frames per second.

Kickin’ It Into High Gear

Gears Tactics on GeForce NOW
A squad of survivors is all it takes to stop the Locust threat.

Rise up and fight, members. Gears Tactics is the next PC Game Pass title to arrive in the cloud.

Gears Tactics is a fast-paced, turn-based strategy game from one of the most acclaimed video game franchises — Gears of War. Set a dozen years before the first Gears of War game, the Gears Tactics story opens as cities on the planet Sera begin falling to the monstrous threat rising from underground: the Locust Horde. With the government in disarray, a squad of survivors emerges as humanity’s last hope. Play as the defiant soldier Gabe Diaz to recruit, develop and command squads on a desperate mission to hunt down the relentless and powerful leader of the Locust army, Ukkon, the group’s monster-making mastermind.

Fight for survival and outsmart the enemy with the sharpness of 4K resolution streaming from the cloud with a GeForce NOW Ultimate membership.

Hit the Road, Jack

The Crew Motorfest on GeForce NOW
The best way to see Hawaii is by car, at 100 mph.

The Crew Motorfest also comes to GeForce NOW this week. The latest entry in Ubisoft’s racing franchise drops drivers into the open roads of Oahu, Hawaii. Get behind the wheel of 600+ iconic vehicles from the past, present and future, including sleek sports cars, rugged off-road vehicles and high-performance racing machines. Race alone or with friends through the bustling city of Honolulu, test off-roading skills on the ashy slopes of a volcano or kick back on the sunny beaches behind the wheel of a buggy.

Members can take a test drive from Sept. 14-17 with a five-hour free trial. Explore the vibrant Hawaiian open world, participate in thrilling driving activities and collect prestigious cars, with all progress carrying over to the full game purchase.

Take the pole position with a GeForce NOW Ultimate membership to stream The Crew Motorfest and more than 1,600 other titles at the highest frame rates. Upgrade today.

A New Challenge

Gunbrella on GeForce NOW
Rain, rain, go away. The umbrella is also a gun today.

With GeForce NOW, there’s always something new to play. Here’s what’s hitting the playlist this week:

  • Tavernacle! (New release on Steam, Sept. 11)
  • Gunbrella (New release on Steam, Sept. 13)
  • The Crew Motorfest (New release on Ubisoft Connect, Sept. 14)
  • Amnesia: The Bunker (Xbox, available on PC Game Pass)
  • Descenders (Xbox, available on PC Game Pass)
  • Devil May Cry 5 (Steam)
  • Gears Tactics (Steam and Xbox, available on PC Game Pass)
  • Last Call BBS (Xbox)
  • The Matchless Kungfu (Steam)
  • Mega City Police (Steam)
  • Opus Magnum (Xbox)
  • Remnant II (Epic Games Store)
  • Space Hulk: Deathwing – Enhanced Edition (Xbox)
  • Superhot (Xbox)
  • Vampyr (Xbox)

What are you planning to play this weekend? Let us know on Twitter or in the comments below.


Visualize an Amazon Comprehend analysis with a word cloud in Amazon QuickSight


Searching for insights in a repository of free-form text documents can be like finding a needle in a haystack. A traditional approach might be to use word counting or other basic analysis to parse documents, but with the power of Amazon AI and machine learning (ML) tools, we can gather deeper understanding of the content.

Amazon Comprehend is a fully managed service that uses natural language processing (NLP) to extract insights about the content of documents. Amazon Comprehend develops insights by recognizing the entities, key phrases, sentiment, themes, and custom elements in a document. Amazon Comprehend can create new insights based on understanding the document structure and entity relationships. For example, with Amazon Comprehend, you can scan an entire document repository for key phrases.

Amazon Comprehend lets non-ML experts easily perform tasks that would normally take hours. It eliminates much of the time needed to clean, build, and train your own model. For building deeper custom models in NLP or any other domain, Amazon SageMaker enables you to build, train, and deploy models in a more conventional ML workflow.

In this post, we use Amazon Comprehend and other AWS services to analyze and extract new insights from a repository of documents. Then, we use Amazon QuickSight to generate a simple yet powerful word cloud visual to easily spot themes or trends.

Overview of solution

The following diagram illustrates the solution architecture.

To begin, we gather the data to be analyzed and load it into an Amazon Simple Storage Service (Amazon S3) bucket in an AWS account. In this example, we use plain text files. Amazon Comprehend then analyzes the data and produces JSON-formatted output, which needs to be transformed into a database format using AWS Glue. We verify the data and use Amazon Athena to extract formatted tables for a QuickSight word cloud analysis. For more information about visualizations, refer to Visualizing data in Amazon QuickSight.

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account with access to Amazon Comprehend, AWS Glue, Amazon Athena, and Amazon QuickSight
  • An S3 bucket with an input folder for the source documents and an output folder for the Amazon Comprehend results

Upload data to an S3 bucket

Upload your data to an S3 bucket. For this post, we use UTF-8 formatted text of the US Constitution as the input file. Then you’re ready to analyze the data and create visualizations.
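If you prefer to script this step, a minimal sketch with boto3 might look like the following; the bucket name, object key, and local file name are placeholder values for illustration, not values from this walkthrough.

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and object names -- replace with your own.
BUCKET = "comprehend-wordcloud-demo"
INPUT_KEY = "input/us-constitution.txt"

# Upload the UTF-8 text file that Amazon Comprehend will analyze.
s3.upload_file("us-constitution.txt", BUCKET, INPUT_KEY)
print(f"Uploaded to s3://{BUCKET}/{INPUT_KEY}")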

Analyze data using Amazon Comprehend

Many types of text-based and image content can be processed using Amazon Comprehend. In addition to text files, you can use Amazon Comprehend one-step classification and entity recognition with image files, PDF files, and Microsoft Word files as input; these input types are not discussed in this post.

To analyze your data, complete the following steps:

  1. On the Amazon Comprehend console, choose Analysis jobs in the navigation pane.
  2. Choose Create analysis job.
  3. Enter a name for your job.
  4. For Analysis type, choose Key phrases.
  5. For Language, choose English.
  6. For Input data location, specify the folder you created as a prerequisite.
  7. For Output data location, specify the folder you created as a prerequisite.
  8. Choose Create an IAM role.
  9. Enter a suffix for the role name.
  10. Choose Create job.

The job will run and the status will be displayed on the Analysis jobs page.
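If you would rather start the job programmatically, the following boto3 sketch starts an equivalent key phrases detection job; the S3 paths, IAM role ARN, and job name are placeholder values, and the options mirror the console choices above.

import boto3

comprehend = boto3.client("comprehend")

# Placeholder S3 locations and IAM role -- substitute your own values.
response = comprehend.start_key_phrases_detection_job(
    JobName="us-constitution-key-phrases",
    LanguageCode="en",
    InputDataConfig={
        "S3Uri": "s3://comprehend-wordcloud-demo/input/",
        "InputFormat": "ONE_DOC_PER_FILE",
    },
    OutputDataConfig={"S3Uri": "s3://comprehend-wordcloud-demo/output/"},
    DataAccessRoleArn="arn:aws:iam::123456789012:role/ComprehendDataAccessRole",
)
print(response["JobId"], response["JobStatus"])

You can poll the job with describe_key_phrases_detection_job(JobId=...) until the status changes to COMPLETED.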

Wait for the analysis job to complete. Amazon Comprehend will create a file and place it in the output data folder you provided. The file is in .gz or GZIP format.

This file needs to be downloaded and converted to an uncompressed format. You can download an object from the data folder or S3 bucket using the Amazon S3 console.

  1. On the Amazon S3 console, select the object and choose Download. If you want to download the object to a specific folder, choose Download on the Actions menu.
  2. After you download the file to your local computer, open the zipped file and save it as an uncompressed file.

The uncompressed file must be uploaded to the output folder before the AWS Glue crawler can process it. For this example, we upload the uncompressed file into the same output folder that we use in later steps.

  1. On the Amazon S3 console, navigate to your S3 bucket and choose Upload.
  2. Choose Add files.
  3. Choose the uncompressed files from your local computer.
  4. Choose Upload.

After you upload the file, delete the original zipped file.

  1. On the Amazon S3 console, navigate to the bucket and select the original zipped file, then choose Delete.
  2. Confirm that you want to permanently delete the file by entering the file name in the text box.
  3. Choose Delete objects.

This will leave one file remaining in the output folder: the uncompressed file.
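The download, decompress, re-upload, and delete cycle can also be scripted. The sketch below assumes the output object is a single GZIP file, as described above; the bucket and key names are placeholders.

import gzip
import boto3

s3 = boto3.client("s3")

# Placeholder names -- adjust to your bucket and the key Amazon Comprehend wrote.
BUCKET = "comprehend-wordcloud-demo"
COMPRESSED_KEY = "output/results/output.gz"
UNCOMPRESSED_KEY = "output/results/output.json"

# Download the compressed results, decompress them in memory, and re-upload.
body = s3.get_object(Bucket=BUCKET, Key=COMPRESSED_KEY)["Body"].read()
s3.put_object(Bucket=BUCKET, Key=UNCOMPRESSED_KEY, Body=gzip.decompress(body))

# Remove the original compressed object so only the uncompressed file remains.
s3.delete_object(Bucket=BUCKET, Key=COMPRESSED_KEY)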

Convert JSON data to table format using AWS Glue

In this step, you prepare the Amazon Comprehend output to be used as input into Athena. The Amazon Comprehend output is in JSON format. You can use AWS Glue to convert JSON into a database structure to ultimately be read by QuickSight.

  1. On the AWS Glue console, choose Crawlers in the navigation pane.
  2. Choose Create crawler.
  3. Enter a name for your crawler.
  4. Choose Next.
  5. For Is your data already mapped to Glue tables, select Not yet.
  6. Add a data source.
  7. For S3 path, enter the location of the Amazon Comprehend output data folder.

Be sure to add the trailing / to the path name. AWS Glue will search the folder path for all files.

  8. Select Crawl all sub-folders.
  9. Choose Add an S3 data source.

  10. Create a new AWS Identity and Access Management (IAM) role for the crawler.
  11. Enter a name for the IAM role.
  12. Choose Update chosen IAM role to be sure the new role is assigned to the crawler.
  13. Choose Next to enter the output (database) information.
  14. Choose Add database.
  15. Enter a database name.
  16. Choose Next.
  17. Choose Create crawler.
  18. Choose Run crawler to run the crawler.

You can monitor the crawler status on the AWS Glue console.
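As a rough equivalent of the console steps above, a boto3 sketch for creating and running the crawler could look like this; the crawler name, IAM role, database name, and S3 path are placeholders.

import boto3

glue = boto3.client("glue")

# Placeholder names -- replace with your own crawler, role, database, and path.
glue.create_crawler(
    Name="comprehend-output-crawler",
    Role="AWSGlueServiceRole-comprehend-demo",
    DatabaseName="comprehend_wordcloud_db",
    Targets={"S3Targets": [{"Path": "s3://comprehend-wordcloud-demo/output/"}]},
)
glue.start_crawler(Name="comprehend-output-crawler")

# Check progress; the crawler returns to the READY state when it finishes.
state = glue.get_crawler(Name="comprehend-output-crawler")["Crawler"]["State"]
print("Crawler state:", state)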

Use Athena to prepare tables for QuickSight

Athena will extract data from the database tables the AWS Glue crawler created to provide a format that QuickSight will use to create the word cloud.

  1. On the Athena console, choose Query editor in the navigation pane.
  2. For Data source, choose AwsDataCatalog.
  3. For Database, choose the database the crawler created.

To create a table compatible with QuickSight, the data must be unnested from the arrays.

  1. The first step is to create an intermediate table that holds the relevant Amazon Comprehend key phrase data, unnested from the output array:
CREATE TABLE temp AS
SELECT keyphrases, nested
FROM output
CROSS JOIN UNNEST(output.keyphrases) AS t (nested);
  2. The following statement limits the results to phrases of at least three words and groups them by frequency:
CREATE TABLE tableforquicksight AS
SELECT COUNT(*) AS count, nested.text
FROM temp
WHERE nested.Score > .9
  AND length(nested.text) - length(replace(nested.text, ' ', '')) + 1 > 2
GROUP BY nested.text
ORDER BY count DESC;
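These statements can also be submitted through the Athena API. The sketch below runs the second statement with boto3, assuming the database name created by the crawler and a query results location; both are placeholder values.

import boto3

athena = boto3.client("athena")

QUERY = """
CREATE TABLE tableforquicksight AS
SELECT COUNT(*) AS count, nested.text
FROM temp
WHERE nested.Score > .9
  AND length(nested.text) - length(replace(nested.text, ' ', '')) + 1 > 2
GROUP BY nested.text
ORDER BY count DESC
"""

# Placeholder database and results location -- replace with your own values.
response = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "comprehend_wordcloud_db"},
    ResultConfiguration={"OutputLocation": "s3://comprehend-wordcloud-demo/athena-results/"},
)
print("Query execution ID:", response["QueryExecutionId"])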

Use QuickSight to visualize output

Finally, you can create the visual output from the analysis.

  1. On the QuickSight console, choose New analysis.
  2. Choose New dataset.
  3. For Create a dataset, choose From new data sources.
  4. Choose Athena as the data source.
  5. Enter a name for the data source and choose Create data source.

  6. Choose Visualize.
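The data source registration can also be scripted. As a minimal sketch, the following call registers an Athena data source with the QuickSight API; the account ID, data source ID, and workgroup are placeholder values, and creating the dataset and visual is still simplest in the console as described here.

import boto3

quicksight = boto3.client("quicksight")

# Placeholder account ID, data source ID, and Athena workgroup.
response = quicksight.create_data_source(
    AwsAccountId="123456789012",
    DataSourceId="comprehend-wordcloud-athena",
    Name="comprehend-wordcloud-athena",
    Type="ATHENA",
    DataSourceParameters={"AthenaParameters": {"WorkGroup": "primary"}},
)
print(response["CreationStatus"])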

Make sure QuickSight has access to the S3 buckets where the Athena tables are stored.

  1. On the QuickSight console, choose the user profile icon and choose Manage QuickSight.

  2. Choose Security & permissions.
  3. Look for the section QuickSight access to AWS services.

By configuring access to AWS services, QuickSight can access the data in those services. Access by users and groups can be controlled through the options.

  4. Verify that Amazon S3 is granted access.

Now you can create the word cloud.

  1. Choose the word cloud under Visual types.
  2. Drag text to Group by and count to Size.


Choose the options menu (three dots) in the visualization to access the edit options. For example, you might want to hide the term “other” from the display. You can also edit items such as the title and subtitle for your visual. To download the word cloud as a PDF, choose Download on the QuickSight toolbar.

Clean up

To avoid incurring ongoing charges, delete any unused data, resources, and processes provisioned for this walkthrough on their respective service consoles.

Conclusion

Amazon Comprehend uses NLP to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. You can use Amazon Comprehend to create new products based on understanding the structure of documents. For example, with Amazon Comprehend, you can scan an entire document repository for key phrases.

This post described the steps to build a word cloud that visualizes the text analysis output from Amazon Comprehend, using AWS Glue, Athena, and QuickSight.

Let’s stay in touch via the comments section!


About the Authors

Kris Gedman is the US East sales leader for Retail & CPG at Amazon Web Services. When not working, he enjoys spending time with his friends and family, especially summers on Cape Cod. Kris is a temporarily retired Ninja Warrior but he loves watching and coaching his two sons for now.

Clark Lefavour is a Solutions Architect leader at Amazon Web Services, supporting enterprise customers in the East region. Clark is based in New England and enjoys spending time architecting recipes in the kitchen.


Research Focus: Week of September 11, 2023


Microsoft Research Focus 24 | Week of September 11, 2023

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

NEW RESEARCH

PolySem: Efficient Polyglot Analytics on Semantic Data

Data scientists and data engineers spend a large portion of their time trying to understand, clean and transform their data before they can even start performing meaningful analysis. Most database vendors provide business intelligence (BI) tools as an efficient and user-friendly platform for customers to perform data cleaning, preparation and linking tasks to obtain actionable semantic data. However, customers are increasingly interested in querying semantic data through various modalities including SQL, imperative programming languages such as Python, and natural language queries. Today, customers are limited to using either the visual interfaces provided by these tools or languages that are specific to the particular tool.

In a new paper: PolySem: Efficient Polyglot Analytics on Semantic Data, researchers from Microsoft propose techniques to enable the execution of user queries expressed in different modalities on semantic datasets without having to export data out of the BI system. Their techniques include automatic translation of user queries into a language-agnostic representation of data processing operations, and subsequently into the specific query language that is amenable to execution on the BI engine. Evaluation results on BI and decision support benchmarks suggest significant improvements in query performance compared to other popular data processing engines.

Microsoft Research Podcast

AI Frontiers: The future of causal reasoning with Emre Kiciman and Amit Sharma

Emre Kiciman and Amit Sharma discuss their paper “Causal Reasoning and Large Language Models: Opening a New Frontier for Causality” and how it examines the causal capabilities of large language models (LLMs) and their implications.


NEW RESOURCE

Generative retrieval for conversational question answering

The growth of conversational agents, including voice assistants and chatbots, has led to a shift towards dialogue-based interfaces for information-seeking activities. This has spurred the development of conversational question answering (QA) systems. Effective passage retrieval, which excludes irrelevant data from scanned documents, is crucial but challenging for such systems due to the ambiguity of questions. Current methods rely on the dual-encoder architecture to embed contextualized vectors of questions in conversations. However, this architecture is limited by its embedding bottleneck and the dot-product operation.

To alleviate these limitations, researchers from Microsoft propose generative retrieval for conversational QA (GCoQA). GCoQA assigns distinctive identifiers for passages and retrieves passages by generating their identifiers token-by-token via the encoder–decoder architecture. In this generative way, GCoQA eliminates the need for a vector-style index and could attend to crucial tokens of the conversation context at every decoding step. Experiments on three public datasets containing about twenty million passages show GCoQA achieves relative improvements of +13.6% in passage retrieval and +42.9% in document retrieval. GCoQA also reduces memory usage and improves inference speed.


NEW RESOURCE

BatteryML: An open-source tool for machine learning on battery degradation

In recent years, lithium-ion batteries have become the cornerstone of energy storage solutions, owing to their high energy density, long cycle life, and relatively low self-discharge. They have found widespread applications across various industries, including electric vehicles, consumer electronics, and renewable energy systems. Despite these advantages, lithium-ion batteries face challenges related to capacity degradation and performance optimization, which have become critical areas of focus in battery research.

Capacity degradation is a complex process influenced by various factors such as temperature, charge-discharge rate, and state of charge. Understanding and mitigating these factors is crucial for enhancing the performance and longevity of lithium-ion batteries. This has led to the development of advanced battery management systems and the application of machine learning techniques to improve prediction accuracy and optimize battery performance.

To address these challenges, researchers from Microsoft have released BatteryML, a comprehensive open-source tool designed specifically for machine learning researchers, battery scientists, and materials researchers with an interest in battery performance prediction and analysis. BatteryML aims to address the challenges of capacity degradation by leveraging machine learning methods to improve various aspects of battery performance, such as capacity fade modeling, state of health prediction, and state of charge estimation.

