Empowering everyone with GenAI to rapidly build, customize, and deploy apps securely: Highlights from the AWS New York Summit

Imagine this—all employees relying on generative artificial intelligence (AI) to get their work done faster, every task becoming less mundane and more innovative, and every application providing a more useful, personal, and engaging experience. To realize this future, organizations need more than a single, powerful large language model (LLM) or chat assistant. They need a full range of capabilities to build and scale generative AI applications that are tailored to their business and use case—including apps with built-in generative AI, tools to rapidly experiment and build their own generative AI apps, a cost-effective and performant infrastructure, and security controls and guardrails. That’s why we are investing in a comprehensive generative AI stack. At the top layer, which includes generative AI-powered applications, we have Amazon Q, the most capable generative AI-powered assistant. The middle layer has Amazon Bedrock, which provides tools to easily and rapidly build, deploy, and scale generative AI applications leveraging LLMs and other foundation models (FMs). And at the bottom, there’s our resilient, cost-effective infrastructure layer, which includes chips purpose-built for AI, as well as Amazon SageMaker to build and run FMs. All of these services are secure by design, and we keep adding features that are critical to deploying generative AI applications tailored to your business. During the last 18 months, we’ve launched more than twice as many machine learning (ML) and generative AI features into general availability as the other major cloud providers combined. That’s another reason why hundreds of thousands of customers are now using our AI services.

Today at the AWS New York Summit, we announced a wide range of capabilities for customers to tailor generative AI to their needs and realize the benefits of generative AI faster. We’re enabling anyone to build generative AI applications with Amazon Q Apps by writing a simple natural language prompt—in seconds. We’re making it easier to leverage your data, supercharge agents, and quickly, securely, and responsibly deploy generative AI into production with new features in Amazon Bedrock. And we announced new partnerships with innovators like Scale AI to help you customize your applications quickly and easily.

Generative AI-powered apps transform business as usual

Generative AI democratizes information, gives more people the ability to create and innovate, and provides access to productivity-enhancing assistance that was never available before. That’s why we’re building generative AI-powered applications for everyone.

Amazon Q, which includes Amazon Q Developer and Amazon Q Business, is the most capable generative AI-powered assistant for software development and for helping employees make better decisions—faster—leveraging their company’s data. Not only does Amazon Q generate the industry’s most accurate coding suggestions, it can also autonomously perform multistep tasks like upgrading Java applications and generating and implementing new features. Amazon Q is available where developers need it: in the AWS Management Console and in popular integrated development environments, including IntelliJ IDEA, Visual Studio, VS Code, and Amazon SageMaker Studio. You can securely customize Amazon Q Developer with your internal code base to get more relevant and useful recommendations for in-line coding and save even more time. For instance, National Australia Bank has seen acceptance rates increase to 60%, up from 50%, and Amazon Prime developers have already seen a 30% increase in acceptance rates. Amazon Q can also help employees do more with the vast troves of data and information contained in their company’s documents, systems, and applications by answering questions, providing summaries, generating business intelligence (BI) dashboards and reports, and even generating applications that automate key tasks. We’re super excited about the productivity gains customers and partners have seen, with early signals that Amazon Q could help their employees become over 80% more productive at their jobs.

To enable all employees to create their own generative AI applications to automate tasks, today we announced the general availability of Amazon Q Apps, a feature of Amazon Q Business. With Amazon Q Apps, employees can go from conversation to generative AI-powered app based on their company data in seconds. Users simply describe the application they want in a prompt, and Amazon Q instantly generates it. Amazon Q also gives employees the option to generate an app from an existing conversation with a single click. During preview, we saw users generate applications for diverse tasks, including summarizing feedback, creating onboarding plans, writing copy, drafting memos, and many more. For instance, Druva, a data security provider, created an Amazon Q App to support their request for proposal (RFP) process by summarizing the required information almost instantly, reducing RFP response times by up to 25%.

In addition to Amazon Q Apps, which makes it easy for any employee to automate their individual tasks, today we announced AWS App Studio (preview), a generative AI-powered service that enables technical professionals such as IT project managers, data engineers, and enterprise architects to use natural language to create, deploy, and manage enterprise applications across an organization. With App Studio, a user simply describes the application they want, what they want it to do, and the data sources they want to integrate with, and App Studio builds in minutes an application that would have taken a professional developer days to build from scratch. App Studio’s generative AI-powered assistant eliminates the learning curve of typical low-code tools, accelerating the application creation process and simplifying common tasks like designing the UI, building workflows, and testing the application. Each application can be immediately scaled to thousands of users and is secure and fully managed by AWS, eliminating the need for any operational expertise.

New features and capabilities supercharge Amazon Bedrock—speeding development of generative AI apps

Amazon Bedrock is the fastest and easiest way to build and scale secure generative AI applications with the broadest selection of leading LLMs and FMs as well as easy-to-use capabilities for developers. Tens of thousands of customers are already using Amazon Bedrock, and it’s one of AWS’s fastest-growing services of the last decade. For example, Ferrari is rapidly introducing new experiences for customers, dealers, and internal teams to run faster simulations, create new knowledge bases that assist dealers and technical users, enhance the racing fan experience, and create hyper-personalized vehicle recommendations in seconds from the millions of options Ferrari offers.

Since the start of 2024, we have announced the general availability of more features and capabilities in Amazon Bedrock than comparable services from other leading cloud providers to help customers get generative AI apps from proof of concept to production faster. This includes support for new industry-leading models from Anthropic, Meta, Mistral, and more, as well as the recent addition of Anthropic Claude 3.5 Sonnet, their most advanced model to date, which was made available immediately for Amazon Bedrock customers. Thousands of customers have already used Anthropic’s Claude 3.5 since its release.

Today, we announced some major new Amazon Bedrock innovations that enable you to:

Customize generative AI applications with your data. You can customize generative AI applications with your data to make them specific to your use case, your organization, and your industry:

  • Fine-tune Anthropic’s Claude 3 Haiku in Amazon Bedrock – With Amazon Bedrock, you can privately and securely fine-tune Amazon Titan, Cohere Command and Command Lite, and Meta Llama 2 models by providing labeled data in Amazon Simple Storage Service (Amazon S3) to specialize the model for your business and use case. Starting today, Amazon Bedrock is also the only fully managed service that provides you with the ability to fine-tune Anthropic’s Claude 3 Haiku (in preview). Read more in the News Blog.
  • Leverage even more data sources for Retrieval Augmented Generation (RAG) – With RAG, you can provide a model with new knowledge or up-to-date info from multiple sources, including document repositories, databases, and APIs. For example, the model might use RAG to retrieve search results from Amazon OpenSearch Service or documents from Amazon S3. Knowledge Bases for Amazon Bedrock fully manages this experience by connecting to your private data sources, including Amazon Aurora, Amazon OpenSearch Serverless, MongoDB, Pinecone, and Redis Enterprise Cloud. Today, we’ve expanded the list to include connectors for Salesforce, Confluence, and SharePoint (in preview), so organizations can leverage more business data to customize models for their specific needs. More knowledge base updates can be found in the News Blog, and a minimal example of querying a knowledge base appears after this list.
  • Get the fastest vector search available – To further enhance your RAG workflows, we’ve added vector search to some of our most popular data services, including OpenSearch Service and OpenSearch Serverless, Aurora, Amazon Relational Database Service (Amazon RDS), and more. Customers can co-locate vector data with operational data, reducing the overhead of managing another database. Today, we’re also excited to announce the general availability of vector search for Amazon MemoryDB. Amazon MemoryDB delivers the fastest vector search performance at the highest recall rates among popular vector databases on AWS, making it a great fit for use cases that require single-digit millisecond latency. For example, Amazon Advertising, IBISWorld, Mediaset, and other organizations are using it to deliver real-time semantic search, and Broadridge Financial is running RAG while delivering the same real-time response rates that their customers are accustomed to. You can use MemoryDB vector search standalone today, and soon, you’ll be able to access it through Knowledge Bases for Amazon Bedrock. Read more about MemoryDB in the News Blog.
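
Once a knowledge base is connected to these data sources, your application can query it with a single call. The following is a minimal sketch using the RetrieveAndGenerate API, the managed RAG entry point for Knowledge Bases for Amazon Bedrock; the knowledge base ID and the example question are placeholder assumptions.

import boto3

# Runtime client for Agents and Knowledge Bases for Amazon Bedrock
agent_runtime = boto3.client("bedrock-agent-runtime")

response = agent_runtime.retrieve_and_generate(
    input={"text": "What is our refund policy for enterprise customers?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "<YOUR_KB_ID>",  # placeholder: the knowledge base connected to your sources
            "modelArn": "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)

# The generated answer, grounded in the retrieved documents
print(response["output"]["text"])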

Create more advanced, personalized customer experiences. With Agents for Amazon Bedrock, applications can take action, executing multistep tasks using company systems and data sources, making generative AI applications substantially more useful. Today, we’re adding key capabilities to Agents for Amazon Bedrock. Previously, agents were limited to taking action based on information from within a single session. Now agents can retain memory across multiple interactions to remember where you last left off and provide better recommendations based on prior interactions. For instance, in a flight booking application, a developer can create an agent that can remember the last time you traveled or that you opt for a vegetarian meal. Agents can also now interpret code to tackle complex data-driven use cases, such as data analysis, data visualization, text processing, solving equations, and optimization problems. For instance, an application user can ask to analyze the historical real estate prices across various zip codes to identify investment opportunities. Check out the News Blogs for more on these capabilities.
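
As a rough illustration of the new memory retention capability, the sketch below creates an agent with session memory enabled. The agent name, role ARN, instruction, and retention period are placeholder assumptions, and the field names should be verified against the current bedrock-agent API reference.

import boto3

bedrock_agent = boto3.client("bedrock-agent")

agent = bedrock_agent.create_agent(
    agentName="flight-booking-agent",  # placeholder name
    foundationModel="anthropic.claude-3-sonnet-20240229-v1:0",
    agentResourceRoleArn="arn:aws:iam::<ACCOUNT_ID>:role/<AGENT_ROLE>",  # placeholder role
    instruction="Help customers search for and book flights.",
    # Retain a summary of each session so the agent can pick up where the user left off
    memoryConfiguration={
        "enabledMemoryTypes": ["SESSION_SUMMARY"],
        "storageDays": 30,  # assumed retention period
    },
)
print(agent["agent"]["agentId"])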

De-risk generative AI with Guardrails for Amazon Bedrock. Customers are concerned about hallucinations, where LLMs generate incorrect responses by conflating multiple pieces of information, providing incorrect information, or inventing new information. These results can misinform employees and customers and harm brands, limiting the usefulness of generative AI. Today, we’re adding contextual grounding checks in Guardrails for Amazon Bedrock to detect hallucinations in model responses for RAG and summarization applications. Contextual grounding checks add to the industry-leading safety protection in Guardrails for Amazon Bedrock by making sure the LLM response is based on the right enterprise source data and by evaluating the LLM response to confirm that it’s relevant to the user’s query or instruction. Contextual grounding checks can detect and filter over 75% of hallucinated responses for RAG and summarization workloads. Read more about our commitments to responsible AI on the AWS Machine Learning Blog.
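
To make this concrete, here is a minimal sketch of enabling contextual grounding checks when creating a guardrail. The guardrail name, messaging strings, and thresholds are placeholder assumptions for illustration; consult the Guardrails for Amazon Bedrock documentation for the full set of policy options.

import boto3

bedrock = boto3.client("bedrock")

guardrail = bedrock.create_guardrail(
    name="rag-grounding-guardrail",  # placeholder name
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide that response.",
    # Contextual grounding checks: filter responses that are not grounded in the
    # retrieved source data (GROUNDING) or not relevant to the query (RELEVANCE)
    contextualGroundingPolicyConfig={
        "filtersConfig": [
            {"type": "GROUNDING", "threshold": 0.75},   # assumed threshold
            {"type": "RELEVANCE", "threshold": 0.75},   # assumed threshold
        ]
    },
)
print(guardrail["guardrailId"])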

We’re excited to see how our customers leverage these ever-expanding capabilities of Amazon Bedrock to customize their generative AI applications for vertical industries and business functions. For example, Deloitte is using Amazon Bedrock’s advanced customization capabilities to build their C-Suite AI™ solution, designed specifically for CFOs. It leverages Deloitte’s proprietary data and industry depth across the finance function. C-Suite AI provides customized AI models tailored to the needs of CFOs, with applications that span critical finance areas, generative analytics for data-driven insights, contract intelligence, and investor relations support.

New partners and trainings help customers along the AI journey

Our extensive partner network helps our customers along the journey to realizing the potential of generative AI. For example, BrainBox AI—which worked with our generative AI competency partner, Caylent—developed its AI assistant ARIA on AWS to help reduce energy costs and emissions in buildings. We have been building out our partner network and training offerings to help customers move quickly from experiment to broad usage. Our AWS Generative AI Competency Partner Program is designed to identify, validate, and promote AWS Partners with demonstrated AWS technical expertise and proven customer success. Today 19 new partners joined the program, giving customers access to 60 Generative AI Competency Partners across the globe. New partners include C3.ai, Cognizant, IBM, and LG CNS, and we have significantly expanded customer offerings into Korea, Greater China, LATAM, and Saudi Arabia.

We’re also announcing a new partnership with Scale AI, our first model customization and evaluation partner. Through this collaboration, enterprise and public sector organizations can use Scale GenAI Platform and Scale Donovan to evaluate their generative AI applications and further customize, configure, and fine tune models to ensure trust and high performance in production, all built on Amazon Bedrock. Scale AI upholds the highest standards of privacy and regulatory compliance working with some of the most stringent government customers, such as the US Department of Defense. Customers can access Scale AI through an engagement with the AWS Generative AI Innovation Center, a program offered by AWS that pairs you with AWS science and strategy experts, or through the AWS Marketplace.

To help upskill your workforce, we’re making available a new interactive online learning experience, AWS SimuLearn, which pairs generative AI-powered simulations with hands-on training to help people learn how to translate business problems into technical solutions. This is part of our broader commitment to provide free cloud computing skills training to 29 million people worldwide by 2025. Today, we announced that we surpassed this milestone, more than a year ahead of schedule.

We’re giving customers tools that put the power of generative AI into all employees’ hands, providing more ways to create personalized and relevant generative AI-powered applications, and working on the tough problems like reducing hallucinations so more companies can gain benefits from generative AI. We’re energized by the progress our customers have already made in making generative AI a reality for their organizations and will continue to innovate on their behalf. To watch the New York Summit keynote for an in-depth look at these announcements, visit our AWS New York Summit page or learn more about our generative AI services.


About the author

Swami Sivasubramanian is Vice President of Data and Machine Learning at AWS. In this role, Swami oversees all AWS Database, Analytics, and AI & Machine Learning services. His team’s mission is to help organizations put their data to work with a complete, end-to-end data solution to store, access, analyze, visualize, and predict.

A progress update on our commitment to safe, responsible generative AI

Responsible AI is a longstanding commitment at Amazon. From the outset, we have prioritized responsible AI innovation by embedding safety, fairness, robustness, security, and privacy into our development processes and educating our employees. We strive to make our customers’ lives better while also establishing and implementing the necessary safeguards to help protect them. Our practical approach to transform responsible AI from theory into practice, coupled with tools and expertise, enables AWS customers to implement responsible AI practices effectively within their organizations. To date, we have developed over 70 internal and external offerings, tools, and mechanisms that support responsible AI, published or funded over 500 research papers, studies, and scientific blogs on responsible AI, and delivered tens of thousands of hours of responsible AI training to our Amazon employees. Amazon also continues to expand its portfolio of free responsible AI training courses for people of all ages, backgrounds, and levels of experience.

Today, we are sharing a progress update on our responsible AI efforts, including the introduction of new tools, partnerships, and testing that improve the safety, security, and transparency of our AI services and models.

Launched new tools and capabilities to build and scale generative AI safely, supported by adversarial style testing (i.e., red teaming)

In April 2024, we announced the general availability of Guardrails for Amazon Bedrock and Model Evaluation in Amazon Bedrock to make it easier to introduce safeguards, prevent harmful content, and evaluate models against key safety and accuracy criteria. Guardrails is the only solution offered by a major cloud provider that enables customers to build and customize safety and privacy protections for their generative AI applications in a single solution. It helps customers block up to 85% of harmful content on top of the native protection from FMs on Amazon Bedrock.

In May, we published a new AI Service Card for Amazon Titan Text Premier to further support our investments in responsible, transparent generative AI. AI Service Cards are a form of responsible AI documentation that provide customers with a single place to find information on the intended use cases and limitations, responsible AI design choices, and deployment and performance optimization best practices for our AI services and models. We’ve created more than 10 AI Service Cards thus far to deliver transparency for our customers as part of our comprehensive development process that addresses fairness, explainability, veracity and robustness, governance, transparency, privacy and security, safety, and controllability.

AI systems can also have performance flaws and vulnerabilities that can increase risk around security threats or harmful content. At Amazon, we test our AI systems and models, such as Amazon Titan, using a variety of techniques, including manual red-teaming. Red-teaming engages human testers to probe an AI system for flaws in an adversarial style, and complements our other testing techniques, which include automated benchmarking against publicly available and proprietary datasets, human evaluation of completions against proprietary datasets, and more. For example, we have developed proprietary evaluation datasets of challenging prompts that we use to assess development progress on Titan Text. We test against multiple use cases, prompts, and data sets because it is unlikely that a single evaluation dataset can provide an absolute picture of performance. Altogether, Titan Text has gone through multiple iterations of red-teaming on issues including safety, security, privacy, veracity, and fairness.

Introduced watermarking to enable users to determine if visual content is AI-generated

A common use case for generative AI is the creation of digital content, like images, videos, and audio, but to help prevent disinformation, users need to be able to identify AI-generated content. Techniques such as watermarking can be used to confirm whether content comes from a particular AI model or provider. To help reduce the spread of disinformation, all images generated by Amazon Titan Image Generator have an invisible watermark by default. It is designed to be tamper-resistant, helping increase transparency around AI-generated content and combat disinformation. We also introduced a new API (preview) in Amazon Bedrock that checks for the existence of this watermark and helps you confirm whether an image was generated by Titan Image Generator.

Promoted collaboration among companies and governments regarding trust and safety risks

Collaboration among companies, governments, researchers, and the AI community is critical to foster the development of AI that is safe, responsible, and trustworthy. In February 2024, Amazon joined the U.S. Artificial Intelligence Safety Institute Consortium, established by the National Institute of Standards and Technology (NIST). Amazon is collaborating with NIST to establish a new measurement science to enable the identification of scalable, interoperable measurements and methodologies to promote development of trustworthy AI. We are also contributing $5 million in AWS compute credits to the Institute for the development of tools and methodologies to evaluate the safety of foundation models. Also in February, Amazon joined the “Tech Accord to Combat Deceptive Use of AI in 2024 Elections” at the Munich Security Conference. This is an important part of our collective work to advance safeguards against deceptive activity and protect the integrity of elections.

We continue to find new ways to engage in and encourage information-sharing among companies and governments as the technology continues to evolve. This includes our work with Thorn and All Tech is Human to safely design our generative AI services to reduce the risk that they will be misused for child exploitation. We’re also a member of the Frontier Model Forum to advance the science, standards, and best practices in the development of frontier AI models.

Used AI as a force for good to address society’s greatest challenges and supported initiatives that foster education

At Amazon, we are committed to promoting the safe and responsible development of AI as a force for good. We continue to see examples across industries where generative AI is helping to address climate change and improve healthcare. BrainBox AI, a pioneer in commercial building technology, launched the world’s first generative AI-powered virtual building assistant on AWS to deliver insights to facility managers and building operators that will help optimize energy usage and reduce carbon emissions. Gilead, an American biopharmaceutical company, is accelerating life-saving drug development with AWS generative AI, using AI-driven protocol analysis of both internal and real-world datasets to assess a clinical study’s feasibility and optimize site selection.

As we navigate the transformative potential of these technologies, we believe that education is the foundation for realizing their benefits while mitigating risks. That’s why we offer education on potential risks surrounding generative AI systems. Amazon employees have spent tens of thousands of training hours since July 2023, covering a range of critical topics like risk assessments, as well as deep dives into complex considerations surrounding fairness, privacy, and model explainability. As part of Amazon’s “AI Ready” initiative to provide free AI skills training to 2 million people globally by 2025, we’ve launched new free training courses about safe and responsible AI use on our digital learning centers. The courses include “Introduction to Responsible AI” for new-to-cloud learners on AWS Educate and courses like “Responsible AI Practices” and “Security, Compliance, and Governance for AI Solutions” on AWS Skill Builder.

Delivering groundbreaking innovation with trust at the forefront

As an AI pioneer, Amazon continues to foster the safe, responsible, and trustworthy development of AI technology. We are dedicated to driving innovation on behalf of our customers while also establishing and implementing the necessary safeguards. We’re also committed to working with companies, governments, academia, and researchers alike to deliver groundbreaking generative AI innovation with trust at the forefront.


About the author

Vasi Philomin is VP of Generative AI at AWS. He leads generative AI efforts, including Amazon Bedrock and Amazon Titan.

Fine-tune Anthropic’s Claude 3 Haiku in Amazon Bedrock to boost model accuracy and quality

Frontier large language models (LLMs) like Anthropic Claude on Amazon Bedrock are trained on vast amounts of data, allowing Anthropic Claude to understand and generate human-like text. Fine-tuning Anthropic Claude 3 Haiku on proprietary datasets can provide optimal performance on specific domains or tasks. As a deep level of customization, fine-tuning is a key differentiating factor because it uses your own unique data.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) along with a broad set of capabilities to build generative artificial intelligence (AI) applications, simplifying development with security, privacy, and responsible AI. With Amazon Bedrock custom models, you can customize FMs securely with your data. According to Anthropic, Claude 3 Haiku is the fastest and most cost-effective model on the market for its intelligence category. You can now fine-tune Anthropic Claude 3 Haiku in Amazon Bedrock in a preview capacity in the US West (Oregon) AWS Region. Amazon Bedrock is the only fully managed service that provides you with the ability to fine-tune Anthropic Claude models.

This post introduces the workflow of fine-tuning Anthropic Claude 3 Haiku in Amazon Bedrock. We first introduce the general concept of fine-tuning and then focus on the important steps in fine-tuning the model, including setting up permissions, preparing the data, starting the fine-tuning jobs, and evaluating and deploying the fine-tuned models.

Solution overview

Fine-tuning is a technique in natural language processing (NLP) where a pre-trained language model is customized for a specific task. During fine-tuning, the weights of the pre-trained Anthropic Claude 3 Haiku model will get updated to enhance its performance on a specific target task. Fine-tuning allows the model to adapt its knowledge to the task-specific data distribution and vocabulary. Hyperparameters like learning rate and batch size need to be tuned for optimal fine-tuning.

Fine-tuning Anthropic Claude 3 Haiku in Amazon Bedrock offers significant advantages for enterprises. This process enhances task-specific model performance, allowing the model to handle custom use cases with task-specific performance metrics that meet or surpass more powerful models like Anthropic Claude 3 Sonnet or Anthropic Claude 3 Opus. As a result, businesses can achieve improved performance with reduced costs and latency. Essentially, fine-tuning Anthropic Claude 3 Haiku provides you with a versatile tool to customize Anthropic Claude, enabling you to meet specific performance and latency goals efficiently.

You can benefit from fine-tuning Anthropic Claude 3 Haiku in different use cases, using your own data. The following use cases are well-suited for fine-tuning the Anthropic Claude 3 Haiku model:

  • Classification – For example, when you have 10,000 labeled examples and want Anthropic Claude to do really well at this task
  • Structured outputs – For example, when you need Anthropic Claude’s response to always conform to a given structure
  • Industry knowledge – For example, when you need to teach Anthropic Claude how to answer questions about your company or industry
  • Tools and APIs – For example, when you need to teach Anthropic Claude how to use your APIs really well

In the following sections, we go through the steps of fine-tuning and deploying Anthropic Claude 3 Haiku in Amazon Bedrock using the Amazon Bedrock console and the Amazon Bedrock API.

Prerequisites

To use this feature, make sure you have satisfied the following requirements:

  • An active AWS account.
  • Anthropic Claude 3 Haiku enabled in Amazon Bedrock. You can confirm it’s enabled on the Model access page of the Amazon Bedrock console.
  • Access to the preview of Anthropic Claude 3 Haiku fine-tuning in Amazon Bedrock. To request access, contact your AWS account team or submit a support ticket using the AWS Management Console. When creating the support ticket, choose Bedrock for Service and Models for Category.
  • The required training dataset (and optional validation dataset) prepared and stored in Amazon Simple Storage Service (Amazon S3).

To create a model customization job using Amazon Bedrock, you need to create an AWS Identity and Access Management (IAM) role with the appropriate permissions (for more details, see Create a service role for model customization).

The following code is the trust relationship, which allows Amazon Bedrock to assume the IAM role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "bedrock.amazonaws.com"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "account-id"
                },
                "ArnEquals": {
                    "aws:SourceArn": "arn:aws:bedrock:us-west-2:account-id:model-customization-job/*"
                }
            }
        }
    ] 
}
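
For reference, the following is a minimal boto3 sketch of creating such a role. The role name and the S3 permissions policy are placeholder assumptions (the exact permissions you need are listed in Create a service role for model customization); the trust policy is the one shown above with your account ID filled in.

import json
import boto3

iam = boto3.client("iam")
account_id = boto3.client("sts").get_caller_identity()["Account"]

# Trust relationship from above, with the account ID filled in
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "bedrock.amazonaws.com"},
        "Action": "sts:AssumeRole",
        "Condition": {
            "StringEquals": {"aws:SourceAccount": account_id},
            "ArnEquals": {"aws:SourceArn": f"arn:aws:bedrock:us-west-2:{account_id}:model-customization-job/*"},
        },
    }],
}

role = iam.create_role(
    RoleName="BedrockHaikuFineTuningRole",  # placeholder role name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Inline policy granting access to the S3 bucket that holds training data and outputs
s3_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::<YOUR_S3_BUCKET_NAME>",
            "arn:aws:s3:::<YOUR_S3_BUCKET_NAME>/*",
        ],
    }],
}
iam.put_role_policy(
    RoleName="BedrockHaikuFineTuningRole",
    PolicyName="FineTuningS3Access",
    PolicyDocument=json.dumps(s3_policy),
)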

Prepare the data

To fine-tune the Anthropic Claude 3 Haiku model, the training data must be in JSON Lines (JSONL) format, where each line represents a single training record. Specifically, the training data format aligns with the MessageAPI:

{"system": string, "messages": [{"role": "user", "content": string}, {"role": "assistant", "content": string}]}
{"system": string, "messages": [{"role": "user", "content": string}, {"role": "assistant", "content": string}]}
{"system": string, "messages": [{"role": "user", "content": string}, {"role": "assistant", "content": string}]}

The following is an example from a text summarization use case used as one-line input for fine-tuning Anthropic Claude 3 Haiku in Amazon Bedrock. In JSONL format, each record is one text line.

{
"system": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.",
"messages": [
{"role": "user", "content": "instruction:nnSummarize the news article provided below.nninput:nSupermarket customers in France can add airline tickets to their shopping lists thanks to a unique promotion by a budget airline. ... Based at the airport, new airline launched in 2007 and is a low-cost subsidiary of the airline."},
{"role": "assistant", "content": "New airline has included voucher codes with the branded products ... to pay a booking fee and checked baggage fees ."}
]
}

You can invoke the fine-tuned model using the same MessageAPI format, providing consistency. In each line, the "system" message is optional information, which is a way of providing context and instructions to the model, such as specifying a particular goal or role, often known as a system prompt. The "user" content corresponds to the user’s instruction, and the "assistant" content is the desired response that the fine-tuned model should provide. Fine-tuning Anthropic Claude 3 Haiku in Amazon Bedrock supports both single-turn and multi-turn conversations. If you want to use multi-turn conversations, the data format for each line is as follows:

{"system": string, "messages": [{"role": "user", "content": string}, {"role": "assistant", "content": string}, {"role": "user", "content": string}, {"role": "assistant", "content": string}]}

The last line’s "assistant" role represents the desired output from the fine-tuned model, and the previous chat history serves as the prompt input. For both single-turn and multi-turn conversation data, the total length of each record (including system, user, and assistant content) should not exceed 32,000 tokens.
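
Before uploading your data, a quick validation pass can catch malformed records early. The following is a minimal sketch, assuming the format described above; exact token counting would require the model’s tokenizer, so the length check here is only a rough character-based heuristic.

import json

def validate_jsonl(path, max_tokens=32000, chars_per_token=4):
    """Roughly validate fine-tuning data; chars_per_token is a heuristic assumption."""
    with open(path) as f:
        for i, line in enumerate(f, start=1):
            record = json.loads(line)  # raises if the line is not valid JSON
            roles = [m["role"] for m in record["messages"]]
            # Turns must alternate user/assistant and end with the desired assistant output
            assert roles == ["user", "assistant"] * (len(roles) // 2), f"line {i}: bad role order"
            total_chars = len(record.get("system", "")) + sum(
                len(m["content"]) for m in record["messages"]
            )
            assert total_chars / chars_per_token <= max_tokens, f"line {i}: may exceed {max_tokens} tokens"

validate_jsonl("Haiku-FT-Data.jsonl")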

In addition to your training data, you can prepare validation and test datasets. Although it’s optional, a validation dataset is recommended because it allows you to monitor the model’s performance during training. This dataset enables features like early stopping and helps improve model performance and convergence. Separately, a test dataset is used to evaluate the final model’s performance after training is complete. Both additional datasets follow a similar format to your training data, but serve distinct purposes in the fine-tuning process.

If you’re already using Amazon Bedrock to fine-tune Amazon Titan, Meta Llama, or Cohere models, the training data should follow this format:

{"prompt": "<prompt1>", "completion": "<expected generated text>"}
{"prompt": "<prompt2>", "completion": "<expected generated text>"}
{"prompt": "<prompt3>", "completion": "<expected generated text>"}

For data in this format, you can use the following Python code to convert to the required format for fine-tuning:

import json

# Define the system string, leave it empty if not needed
system_string = ""

# Input file path
input_file = "Orig-FT-Data.jsonl"

# Output file path
output_file = "Haiku-FT-Data.jsonl"

with open(input_file, "r") as f_in, open(output_file, "w") as f_out:
    for line in f_in:
        data = json.loads(line)
        prompt = data["prompt"]
        completion = data["completion"]

        new_data = {}
        if system_string:
            new_data["system"] = system_string
        new_data["messages"] = [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion}
        ]

        f_out.write(json.dumps(new_data) + "\n")

print("Conversion completed!")

To optimize the fine-tuning performance, the quality of training data is more important than the size of the dataset. We recommend starting with a small but high-quality training dataset (50–100 rows of data is a reasonable start) to fine-tune the model and evaluate its performance. Based on the evaluation results, you can then iterate and refine the training data. Generally, as the size of the high-quality training data increases, you can expect to achieve better performance from the fine-tuned model. However, it’s essential to maintain a focus on data quality, because a large but low-quality dataset may not yield the desired improvements in the fine-tuned model performance.

Currently, the requirements for the number of records in training and validation data for fine-tuning Anthropic Claude 3 Haiku align with the customization limits set by Amazon Bedrock for fine-tuning other models. Specifically, the training data should not exceed 10,000 records, and the validation data should not exceed 1,000 records. These limits provide efficient resource utilization while allowing for model optimization and evaluation within a reasonable data scale.

Fine-tune the model

Fine-tuning Anthropic Claude 3 Haiku in Amazon Bedrock allows you to configure various hyperparameters that can significantly impact the fine-tuning process and the resulting model’s performance. The following table summarizes the supported hyperparameters.

Name | Description | Type | Default | Value Range
epochCount | The maximum number of iterations through the entire training dataset (equivalent to epochs). | integer | 2 | 1–10
batchSize | The number of samples processed before updating model parameters. | integer | 32 | 4–256
learningRateMultiplier | The multiplier that influences the learning rate at which model parameters are updated after each batch. | float | 1 | 0.1–2
earlyStoppingThreshold | The minimum improvement in validation loss required to prevent premature stopping of the training process. | float | 0.001 | 0–0.1
earlyStoppingPatience | The tolerance for stagnation in the validation loss metric before stopping the training process. | integer | 2 | 1–10

The learningRateMultiplier parameter is a factor that adjusts the base learning rate set by the model itself; the actual learning rate applied during training is the model’s base learning rate scaled by this multiplier. Typically, you should increase the batchSize when the training dataset size increases, and you may need to perform hyperparameter optimization (HPO) to find the optimal settings. Early stopping is a technique used to prevent overfitting; it stops the training process when the validation loss stops improving. The validation loss is computed at the end of each epoch. If the validation loss has not decreased by more than earlyStoppingThreshold for earlyStoppingPatience consecutive epochs, the training process is stopped.

For example, the following table shows example validation losses for each epoch during a training process.

Epoch | Validation Loss
1 | 0.9
2 | 0.8
3 | 0.7
4 | 0.66
5 | 0.64
6 | 0.65
7 | 0.65

The following table illustrates the behavior of early stopping during the training, based on different configurations of earlyStoppingThreshold and earlyStoppingPatience.

Scenario | earlyStoppingThreshold | earlyStoppingPatience | Training Stopped | Best Checkpoint
1 | 0 | 2 | Epoch 7 | Epoch 5 (val loss 0.64)
2 | 0.05 | 1 | Epoch 4 | Epoch 4 (val loss 0.66)
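
To make the interaction between these two hyperparameters concrete, the following is a minimal sketch that reproduces the two scenarios above. It assumes validation loss is compared against the best loss seen so far, which matches the behavior in the tables, but the actual service implementation may differ in its details.

def simulate_early_stopping(val_losses, threshold, patience):
    """Return (stop_epoch, best_epoch, best_loss) for a list of per-epoch losses."""
    best_loss, best_epoch = float("inf"), None
    stagnation = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if best_loss - loss > threshold:
            stagnation = 0          # enough improvement: reset the patience counter
        else:
            stagnation += 1         # improvement below threshold counts as stagnation
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch  # best checkpoint tracks the minimum loss
        if stagnation >= patience:
            return epoch, best_epoch, best_loss  # patience exhausted: stop training
    return len(val_losses), best_epoch, best_loss

losses = [0.9, 0.8, 0.7, 0.66, 0.64, 0.65, 0.65]
print(simulate_early_stopping(losses, threshold=0, patience=2))     # (7, 5, 0.64)
print(simulate_early_stopping(losses, threshold=0.05, patience=1))  # (4, 4, 0.66)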

Choosing the right hyperparameter values is crucial for achieving optimal fine-tuning performance. You may need to experiment with different settings or use techniques like HPO to find the best configuration for your specific use case and dataset.

Run the fine-tuning job on the Amazon Bedrock console

Make sure you have access to the preview of Anthropic Claude 3 Haiku fine-tuning in Amazon Bedrock, as discussed in the prerequisites. After you’re granted access, complete the following steps:

  1. On the Amazon Bedrock console, choose Foundation models in the navigation pane.
  2. Choose Custom models.
  3. In the Models section, on the Customize model menu, choose Create Fine-tuning job.
  4. For Category, choose Anthropic.
  5. For Models available for fine-tuning, choose Claude 3 Haiku.
  6. Choose Apply.
  7. For Fine-tuned model name, enter a name for the model.
  8. Select Model encryption to add a KMS key.
  9. Optionally, expand the Tags section to add tags for tracking.
  10. For Job name, enter a name for the training job.

Before you start a fine-tuning job, create an S3 bucket in the same Region as your Amazon Bedrock service (for example, us-west-2), as mentioned in the prerequisites. At the time of writing, fine-tuning for Anthropic Claude 3 Haiku in Amazon Bedrock is available in preview in the US West (Oregon) Region. Within this S3 bucket, set up separate folders for your training data, validation data, and fine-tuning artifacts. Upload your training and validation datasets to their respective folders.

  11. Under Input data, specify the S3 locations for both your training and validation datasets.

This setup ensures proper data access and Regional compatibility for your fine-tuning process.

Next, you configure the hyperparameters for your fine-tuning job.

  12. Set the number of epochs, batch size, and learning rate multiplier.
  13. If you’ve included a validation dataset, you can enable early stopping.

This feature allows you to set an early stopping threshold and patience value. Early stopping helps prevent overfitting by halting the training process when the model’s performance on the validation set stops improving.

  14. Under Output data, for S3 location, enter the S3 path for the bucket storing fine-tuning metrics.
  15. Under Service access, select a method to authorize Amazon Bedrock. You can select Use an existing service role if you have an access role with fine-grained IAM policies, or select Create and use a new service role.
  16. After you have added all the required configurations for fine-tuning Anthropic Claude 3 Haiku, choose Create Fine-tuning job.

When the fine-tuning job starts, you can see the status of the training job (Training or Complete) under Jobs.

As the fine-tuning job progresses, you can find more information about the training job, including job creation time, job duration, input data, and hyperparameters used for the fine-tuning job. Under Output data, you can navigate to the fine-tuning folder in the S3 bucket, where you can find the training and validation metrics that were computed as part of the fine-tuning job.

Run the fine-tuning job using the Amazon Bedrock API

Make sure to request access to the preview of Anthropic Claude 3 Haiku fine-tuning in Amazon Bedrock, as discussed in the prerequisites.

To start a fine-tuning job for Anthropic Claude 3 Haiku using the Amazon Bedrock API, complete the following steps:

  1. Create an Amazon Bedrock client and set the base model ID for the Anthropic Claude 3 Haiku model:
    import boto3
    bedrock = boto3.client(service_name="bedrock")
    base_model_id = "anthropic.claude-3-haiku-20240307-v1:0:200k"

  2. Generate a unique job name and custom model name, typically using a timestamp:
    from datetime import datetime
    ts = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
    customization_job_name = f"model-finetune-job-{ts}"
    custom_model_name = f"finetuned-model-{ts}"

  3. Specify the IAM role ARN that has the necessary permissions to access the required resources for the fine-tuning job, as discussed in the prerequisites:
    customization_role = "arn:aws:iam::<YOUR_AWS_ACCOUNT_ID>:role/<YOUR_IAM_ROLE_NAME>"

  4. Set the customization type to FINE_TUNING and define the hyperparameters for fine-tuning the model, as discussed in the previous section:
    customization_type = "FINE_TUNING"
    hyper_parameters = {
        "epochCount": "5",
        "batchSize": "32",
        "learningRateMultiplier": "0.05",
        "earlyStoppingThreshold": "0.001",
        "earlyStoppingPatience": "2"
    }

  5. Configure the S3 bucket and prefix where the fine-tuned model and output data will be stored, and provide the S3 data paths for your training and validation datasets (the validation dataset is optional):
    s3_bucket_name = "<YOUR_S3_BUCKET_NAME>"
    s3_bucket_config = f"s3://{s3_bucket_name}/outputs/output-{custom_model_name}"
    s3_train_uri = "s3://<YOUR_S3_BUCKET_NAME>/<YOUR_TRAINING_DATA_PREFIX>"
    s3_validation_uri = "s3://<YOUR_S3_BUCKET_NAME>/<YOUR_VALIDATION_DATA_PREFIX>"
    # Output location referenced by outputDataConfig in the next step
    output_data_config = {"s3Uri": s3_bucket_config}
    training_data_config = {"s3Uri": s3_train_uri}
    validation_data_config = {
        "validators": [{
            "s3Uri": s3_validation_uri
        }]
    }

  6. With these configurations in place, you can create the fine-tuning job using the create_model_customization_job method from the Amazon Bedrock client, passing in the required parameters:
    training_job_response = bedrock.create_model_customization_job(
        customizationType=customization_type,
        jobName=customization_job_name,
        customModelName=custom_model_name,
        roleArn=customization_role,
        baseModelIdentifier=base_model_id,
        hyperParameters=hyper_parameters,
        trainingDataConfig=training_data_config,
        validationDataConfig=validation_data_config,
        outputDataConfig=output_data_config
    )

The create_model_customization_job method returns a response containing information about the created fine-tuning job. You can monitor the job’s progress and retrieve the fine-tuned model when the job is complete, either through the Amazon Bedrock API or the Amazon Bedrock console.
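
For example, a simple polling loop over the job status might look like the following sketch, which uses the get_model_customization_job API with the job name created earlier:

import time

while True:
    job = bedrock.get_model_customization_job(jobIdentifier=customization_job_name)
    status = job["status"]  # for example: InProgress, Completed, Failed, or Stopped
    print(f"Job status: {status}")
    if status in ("Completed", "Failed", "Stopped"):
        break
    time.sleep(60)  # check once per minute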

Deploy and evaluate the fine-tuned model

After successfully fine-tuning the model, you can evaluate the fine-tuning metrics recorded during the process. These metrics are stored in the specified S3 bucket for evaluation purposes. For the training data, step-wise training metrics are recorded with columns including step_number, epoch_number, and training_loss.

If you provided a validation dataset, additional validation metrics are stored in a separate file, including step_number, epoch_number, and corresponding validation_loss.
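
As a rough illustration, you could load these metrics with pandas for inspection or plotting. The file names and prefixes below are assumptions based on the output configuration used earlier, so verify the actual object keys written to your bucket (reading from s3:// paths also requires the s3fs package).

import pandas as pd

# Assumed output location and file names; check the actual keys in your S3 bucket
output_prefix = f"s3://{s3_bucket_name}/outputs/output-{custom_model_name}"

train_metrics = pd.read_csv(f"{output_prefix}/training_artifacts/step_wise_training_metrics.csv")
val_metrics = pd.read_csv(f"{output_prefix}/validation_artifacts/validation_metrics.csv")

# Plot training and validation loss against training steps
ax = train_metrics.plot(x="step_number", y="training_loss", label="training loss")
val_metrics.plot(x="step_number", y="validation_loss", label="validation loss", ax=ax)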

When you’re satisfied with the fine-tuning metrics, you can purchase Provisioned Throughput to deploy your fine-tuned model, which allows you to take advantage of the improved performance and specialized capabilities of the fine-tuned model in your applications. Provisioned Throughput refers to the number and rate of inputs and outputs that a model processes and returns. To use a fine-tuned model, you must purchase Provisioned Throughput, which is billed hourly. The pricing for Provisioned Throughput depends on the following factors:

  • The base model the fine-tuned model was customized from.
  • The number of Model Units (MUs) specified for the Provisioned Throughput. MU is a unit that specifies the throughput capacity for a given model; each MU defines the number of input tokens it can process and output tokens it can generate across all requests within 1 minute.
  • The commitment duration, which can be no commitment, 1 month, or 6 months. Longer commitments offer more discounted hourly rates.

After Provisioned Throughput is set up, you can use the MessageAPI to invoke the fine-tuned model, similar to how the base model is invoked. This provides a seamless transition and maintains compatibility with existing applications or workflows.

It’s crucial to evaluate the performance of the fine-tuned model to make sure it meets the desired criteria and outperforms the base model on your specific tasks. You can conduct various evaluations, including comparing the fine-tuned model with the base model, or even evaluating performance against more advanced models, like Anthropic Claude 3 Sonnet.

Deploy the fine-tuned model using the Amazon Bedrock console

To deploy the fine-tuned model using the Amazon Bedrock console, complete the following steps:

  1. On the Amazon Bedrock console, choose Custom models in the navigation pane.
  2. Select the fine-tuned model and choose Purchase Provisioned Throughput.
  3. For Provisioned Throughput name, enter a name.
  4. Choose the model you want to deploy.
  5. For Commitment term, choose your level of commitment (for this post, we choose No commitment).
  6. Choose Purchase Provisioned Throughput.

After the fine-tuned model has been deployed using Provisioned Throughput, you can see the model status as In service when you go to the Provisioned Throughput page on the Amazon Bedrock console.

You can use the fine-tuned model deployed using Provisioned Throughput for task-specific use cases. In the Amazon Bedrock playground, you can find the fine-tuned model under Custom models and use it for inference.

Deploy the fine-tuned model using the Amazon Bedrock API

To deploy the fine-tuned model using the Amazon Bedrock API, complete the following steps:

  1. Retrieve the fine-tuned model ID from the job’s output, and create a Provisioned Throughput model instance with the desired model units:
    import boto3
    bedrock = boto3.client(service_name='bedrock')
    
    custom_model_id = training_job_response["customModelId"]
    provisioned_model_id = bedrock.create_provisioned_model_throughput(
        modelUnits=1,
        provisionedModelName='finetuned-haiku-model',
        modelId=custom_model_id
    )['provisionedModelArn']

  2. When the Provisioned Throughput model is ready, you can call the invoke_model function from the Amazon Bedrock runtime client to generate text using the fine-tuned model:
    import json
    bedrock_runtime = boto3.client(service_name='bedrock-runtime')
    
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,
        "messages": [{"role": "user", "content": <YOUR_INPUT_PROMPT_STRING>}],
        "temperature": 0.1,
        "top_p": 0.9,
        "system": <YOUR_SYSTEM_PROMPT_STRING>
    })

    fine_tuned_response = bedrock_runtime.invoke_model(body=body, modelId=provisioned_model_id)
    fine_tuned_response_body = json.loads(fine_tuned_response.get('body').read())
    print("Fine tuned model response:", fine_tuned_response_body['content'][0]['text'] + '\n')

By following these steps, you can deploy and use your fine-tuned Anthropic Claude 3 Haiku model through the Amazon Bedrock API, allowing you to generate customized Anthropic Claude 3 Haiku models tailored to your specific requirements.

Conclusion

Fine-tuning Anthropic Claude 3 Haiku in Amazon Bedrock empowers enterprises to optimize this LLM for their specific needs. By combining Amazon Bedrock with Anthropic Claude 3 Haiku’s speed and cost-effectiveness, you can efficiently customize the model while maintaining robust security. This process enhances the model’s accuracy and tailors its outputs to unique business requirements, driving significant improvements in efficiency and effectiveness.

Fine-tuning Anthropic Claude 3 Haiku in Amazon Bedrock is now available in preview in the US West (Oregon) Region. To request access to the preview, contact your AWS account team or submit a support ticket.


About the Authors

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Sovik Kumar Nath is an AI/ML and Generative AI Senior Solutions Architect with AWS. He has extensive experience designing end-to-end machine learning and business analytics solutions in finance, operations, marketing, healthcare, supply chain management, and IoT. He has double master’s degrees from the University of South Florida and University of Fribourg, Switzerland, and a bachelor’s degree from the Indian Institute of Technology, Kharagpur. Outside of work, Sovik enjoys traveling, taking ferry rides, and going on adventures.

Carrie Wu is an Applied Scientist at Amazon Web Services, working on fine-tuning large language models for alignment to custom tasks and responsible AI. She graduated from Stanford University with a PhD in Management Science and Engineering. Outside of work, she loves reading, traveling, aerial yoga, ice skating, and spending time with her dog.

Fang Liu is a Principal Machine Learning Engineer at Amazon Web Services, where he has extensive experience in building AI/ML products using cutting-edge technologies. He has worked on notable projects such as Amazon Transcribe and Amazon Bedrock. Fang Liu holds a master’s degree in computer science from Tsinghua University.

Unified Database: Laying the foundation for large language model vertical applications


Large language models (LLMs) have become a valuable technology in areas such as content creation, language comprehension, and intelligent dialogue, or interactions between people and computer systems. However, these models generate responses based on patterns and rules observed in fixed training data, which can potentially lead them to produce erroneous and even fictitious information. The models can also struggle with real-time knowledge updates. One technique known as retrieval augmented generation (RAG) can organically combine fresh external information with LLMs, putting relevant and precise knowledge into context to help guide the answer generation process, enhancing their performance and reliability.
 
One of the core components of RAG, the vector database, significantly differs from traditional relational databases in its storage and query mechanisms. This presents a challenge to the unified management of increasingly diverse and multimodal knowledge bases. Researchers from the Systems and Networking Group at Microsoft Research Asia believe that a unified database capable of managing rich attributes and modalities of external knowledge will support widespread application and improved reliability of LLMs.
 
“As the capabilities of large models continue to improve, various types of data, such as text, images, and videos, can be encoded into high-dimensional vectors using machine learning technology. Detailed attributes of knowledge, such as the type of images, user preferences, etc., can be converted into different data features. It becomes difficult to achieve efficient and accurate query results among these mixed types of information. Therefore, a unified database is needed to effectively manage the data, providing a more solid knowledge foundation for LLMs,” said Qi Chen, a principal researcher at the Microsoft Research Asia lab in Vancouver, Canada.

VBase query system: Providing a unified foundation for vector index and scalar index scanning

Vector databases and scalar databases have different index scan patterns. Therefore, the lack of a unified foundation is the first problem to be solved in building a unified database.
 
Scalar indexes are based on numerical order, and the scanning of these indexes follows a strictly increasing or decreasing order. This is the primary reason why relational databases can efficiently execute queries. For example, when searching for clothes priced between 100 and 200 Canadian dollars on a shopping platform, the system starts scanning from the price of C$100, and the query stops once the price exceeds C$200. Clearly, this monotonicity-based scalar query is highly efficient.
 
In contrast, vector indexes are built on proximity in high-dimensional space, and index traversal cannot follow a strict order, so they lack monotonicity. Vector indexes provide only approximate spatial navigation for queries, estimating the nearest subspace. To achieve early termination, the vector index scanning process relies on the TopK algorithm to impose a temporary order. Although this order can be used to terminate execution in advance, the method is inefficient.

Figure 1. Query execution on a scalar database (left: ordered index scan) and a vector database (right: index scan with no strict order)

For example, suppose a person has a picture of a garment and wants to find similar items on a shopping platform that are priced below C$200. The traditional method is to first conduct a similarity query to get a large number of candidates, and then filter based on price. For instance, to find the top 10 most similar and appropriately priced results, one can first set the search range to 1,000 candidates, and then filter one by one according to the price condition until 10 results appear that meet the requirements. If the results are insufficient, the search range can be expanded to 2,000 or 3,000 until the requirements are met. 

This method is designed to convert the retrieval results of vector data into a temporary scalar index that follows strict monotonicity and then perform scalar queries. 

The problem with this method is that it cannot guarantee that the K results returned will meet the final filtering query requirements. To ensure that the filtering results meet the requirements, either the TopK search must be widened up front to return a larger K, or the TopK query must be repeated when the returned results are insufficient. Both methods lead to suboptimal query performance.
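
A minimal sketch of this widen-and-filter strategy, assuming a hypothetical index.topk API and a price predicate, illustrates why it is wasteful: each widening repeats work already done, and no K is guaranteed to be large enough.

def filtered_topk(index, query_vec, predicate, k, initial_k=1000, max_k=16000):
    """Traditional post-filtering: widen the TopK search until enough candidates pass."""
    search_k = initial_k
    results = []
    while search_k <= max_k:
        candidates = index.topk(query_vec, search_k)   # hypothetical vector-index API
        results = [c for c in candidates if predicate(c)][:k]
        if len(results) == k:
            return results
        search_k *= 2   # not enough matches: repeat the query with a larger K
    return results      # may still contain fewer than k results

# Example usage: top 10 similar garments priced below C$200
# results = filtered_topk(index, query_vec, lambda item: item.price < 200, k=10)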

By analyzing a large number of vector indices, researchers have found that vector index queries do not require strict monotonicity for early termination. The traversal of vector indices exhibits a kind of relaxed monotonicity. The traversal of scalar indices is a special case of this relaxed monotonicity.

Based on this discovery, researchers have developed the VBase unified database system. This system provides a unified foundation for efficient scanning of vector indices and scalar indices, making the scanning of various indices follow the same interface and early termination conditions. This innovation not only improves the performance of vector databases in executing complex queries by 10 to 1,000 times, but also improves the accuracy of queries.

VBase makes it possible to build a unified database capable of executing various complex relational vector and scalar mixed queries. Currently, based on the VBase system, an open-source database platform has successfully built its own multimodal vector database.

SPFresh: First vector index that supports real-time in-place incremental update 

The RAG technology based on vector database retrieval significantly improves the accuracy of the generation results of LLMs. However, this improvement requires real-time updates to the data in the vector database. For vectors with hundreds to thousands of dimensions, updating is not easy – it can take days to reconstruct the vector index. 

Scalar databases typically use a B-tree or B+ tree index, which can complete updates by directly inserting information after finding the specified location through binary search. However, updating a vector database is much more complicated. 

Take the currently popular fine-grained graph-based vector index and coarse-grained cluster-based vector index as examples. When inserting or deleting vectors in the fine-grained graph vector index, it is necessary to perform large-scale graph scanning to find the appropriate neighbors for update, which requires a lot of computational resources.

An insufficient update, meanwhile, can weaken performance and accuracy. Updating a coarse-grained cluster index is cheaper, since inserting or deleting a vector only modifies the nearest partition. But as partition updates accumulate, the data distribution can become unbalanced, degrading index quality and hurting query latency and accuracy.

Existing vector index update methods rely on periodic global rebuilding, which is slow and resource intensive. Although performance and accuracy are immediately improved after rebuilding, they gradually decline between rebuilds. In addition, the cost of global rebuilding is very high, requiring more than 10 times the resources of traditional indexing, possibly exceeding the cost of index search services. 

To solve these problems, researchers from Microsoft and their collaborators have proposed SPFresh, which is the first vector index that supports real-time, in-place, incremental updating of unified databases. The core of SPFresh is LIRE – a lightweight incremental rebalancing protocol used to dynamically split or merge vector partitions and reallocate vectors in partitions to adapt to changes in data distribution. LIRE achieves low-resource vector updates by reallocating vectors only at nearby partitions. 
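
The following sketch illustrates the split-and-reassign idea behind LIRE (our illustration, not the SPFresh implementation): an overgrown partition is split in place with a tiny 2-means, and only vectors in partitions near the split point are considered for reassignment.

    import numpy as np

    def two_means(data, iters=5):
        # Tiny 2-means used to split an overgrown partition into two halves.
        c = data[np.random.choice(len(data), 2, replace=False)]
        for _ in range(iters):
            assign = np.linalg.norm(data[:, None] - c[None], axis=2).argmin(1)
            c = np.array([data[assign == j].mean(0) if (assign == j).any() else c[j]
                          for j in (0, 1)])
        return c, assign

    def insert_vector(partitions, centroids, vec, max_size=1024):
        # Route the new vector to its nearest partition (a cheap, local update).
        pid = int(np.linalg.norm(centroids - vec, axis=1).argmin())
        partitions[pid].append(vec)
        if len(partitions[pid]) > max_size:
            data = np.asarray(partitions[pid])
            (c0, c1), assign = two_means(data)
            partitions[pid] = list(data[assign == 0])
            partitions.append(list(data[assign == 1]))
            centroids[pid] = c0
            centroids = np.vstack([centroids, c1])
            # LIRE-style locality: only partitions near the split point are
            # checked for reassignment, instead of rebuilding the whole index.
            near = np.linalg.norm(centroids - c0, axis=1).argsort()[:3]
            for q in near:
                keep = []
                for v in partitions[q]:
                    best = int(np.linalg.norm(centroids - v, axis=1).argmin())
                    (keep if best == q else partitions[best]).append(v)
                partitions[q] = keep
        return partitions, centroids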

Figure 2. Partition splitting requires reallocating vector data to adapt to changes in data distribution

Compared to existing periodic index rebuilding methods, SPFresh can greatly reduce the resources required for index rebuilding, and can always maintain a stable high recall rate, low latency, and high query throughput, effectively adapting to dynamic changes in data distribution in a timely manner.

OneSparse: A unified system for sparse and dense multi-index vector search

Vector databases are widely used in fields such as natural language processing, information retrieval, and recommendation systems, providing efficient solutions for handling unstructured data. However, vector data can be encoded in various ways, and sparse and dense vectors each have their own advantages for different types of tasks. For example, sparse vectors are suitable for keyword matching, while dense vectors are better at capturing semantic information. In practical applications, multi-index mixed queries are therefore widely used: on mixed data sets, combining sparse and dense features to find similar items has been shown to improve the accuracy of query results.

However, due to the special traversal manner of vector indexes, the intersection between multiple vector indexes cannot be directly pushed down, making it difficult to combine search results from multiple indexes.

To overcome this challenge, researchers from Microsoft and their collaborators have introduced OneSparse, a unified index system catering to both sparse and dense vectors. OneSparse enables the execution of multi-index mixed queries and dynamically generates the optimal merge plan, facilitating rapid intersection and union operations within a single index during index traversal.

OneSparse unifies sparse indexes and dense indexes into a single inverted index and rearranges all posting lists according to the document ID, ensuring efficient execution, even when performing complex queries for both semantic and keyword matching. The technology has been successfully applied in Microsoft Bing’s web search and sponsored search. 
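
The benefit of keeping every posting list sorted by document ID can be seen in a classic merge-join intersection (a simplified sketch; OneSparse itself merges on-disk inverted and SPANN posting lists):

    def intersect_posting_lists(lists):
        """Merge-join intersection of doc-ID-sorted posting lists.

        Each list holds (doc_id, score) pairs sorted by doc_id, e.g. one
        list per query term plus one per nearby dense-vector cluster."""
        iters = [iter(pl) for pl in lists]
        heads = [next(it, None) for it in iters]
        results = []
        while all(h is not None for h in heads):
            top = max(h[0] for h in heads)
            if all(h[0] == top for h in heads):
                # Document present in every list: combine its partial scores.
                results.append((top, sum(h[1] for h in heads)))
                heads = [next(it, None) for it in iters]
            else:
                # Advance every list that lags behind the current frontier.
                for j, h in enumerate(heads):
                    if h is not None and h[0] < top:
                        heads[j] = next(iters[j], None)
        return results

    sparse = [(1, 0.5), (4, 0.2), (9, 0.7)]
    dense  = [(4, 0.9), (9, 0.1), (12, 0.3)]
    print(intersect_posting_lists([sparse, dense]))  # [(4, 1.1), (9, 0.8)] approx.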

Figure 3. OneSparse index overview. For sparse data, OneSparse maintains one dimension of the sparse vectors (i.e., term) per inverted posting list, which allows fast lookup of all relevant documents for a word in a query. The values stored in an inverted posting list are pairs of ID and a single-dimensional feature (e.g., term frequency). For dense vectors, OneSparse clusters them into several posting lists by SPANN. In addition, it builds an SPTAG in-memory ANN index on cluster centroids to quickly navigate to the nearest SPANN posting lists. The values stored in a SPANN posting list are pairs of ID and the dense vector in that cluster. All inverted posting lists and SPANN posting lists are saved on disk.

Unified databases accelerate the development of LLMs and hardware innovation 

As early as 2018, Microsoft Research Asia began in-depth research on vector data systems. “At that time, we realized that vectorization would become the cornerstone of deep learning applications,” Qi Chen said. “Therefore, we developed SPTAG and SPANN technologies one after another, successfully solving the generalization and scalability problems of vector indexing, and applied them to Microsoft Bing search, achieving the world’s largest vector semantic search system.”
 
Researchers at Microsoft Research Asia continue to explore vector database technology. Based on relaxed monotonicity and the lightweight update method of the LIRE protocol, they have built a unified database system, MSVBASE, which has been open-sourced on GitHub. The MSVBASE system can be used for semantic analysis of multimodal data, providing developers with powerful tools for researching the RAG mechanism and designing more complex RAG retrieval queries. With it, RAG will not only be able to perform TopK-based vector queries but also make use of richer high-dimensional vector data and attributes for retrieval, achieving more accurate query results.
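
Since MSVBASE builds on PostgreSQL, hybrid queries can be written in SQL. The sketch below is purely illustrative: the table, columns, distance operator, and vector literal format are assumptions, not the project’s documented interface.

    import psycopg2  # assumes a reachable MSVBASE/PostgreSQL instance

    conn = psycopg2.connect("dbname=shop user=demo")
    cur = conn.cursor()

    # Placeholder embedding passed as a textual vector literal (format assumed).
    query_embedding = "[" + ",".join(["0.1"] * 512) + "]"

    # Hypothetical hybrid query: one plan combines the vector-similarity
    # ordering with the scalar price predicate, instead of TopK-then-filter.
    cur.execute(
        """
        SELECT id, price
        FROM garments
        WHERE price < 200
        ORDER BY image_embedding <-> %s   -- distance operator name assumed
        LIMIT 10;
        """,
        (query_embedding,),
    )
    print(cur.fetchall())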

In the current age of extensive knowledge expansion, unified databases offer better knowledge transfer between multimodal data types. They provide substantial corpus support for large models and are poised to drive innovation in underlying hardware, laying the foundation for data-enhanced AI in the future.

The post Unified Database: Laying the foundation for large language model vertical applications appeared first on Microsoft Research.

Read More

Paige Cofounder Thomas Fuchs’ Diagnosis on Improving Cancer Patient Outcomes With AI

Paige Cofounder Thomas Fuchs’ Diagnosis on Improving Cancer Patient Outcomes With AI

Improved cancer diagnostics — and improved patient outcomes — could be among the changes generative AI will bring to the healthcare industry, thanks to Paige, the first company with an FDA-approved tool for cancer diagnosis. In this episode of NVIDIA’s AI Podcast, host Noah Kravitz speaks with Paige cofounder and Chief Scientific Officer Thomas Fuchs. He’s also dean of artificial intelligence and human health at the Icahn School of Medicine at Mount Sinai.

Tune in to hear Fuchs on machine learning and AI applications and how technology brings better precision and care to the medical industry.

Time Stamps

1:03: Background on Paige and computational pathology
7:28: How AI models use visual pattern recognition to accelerate cancer detection
11:27: Paige’s results using AI in cancer imaging and pathology
15:16: Challenges in cancer detection
17:38: Thomas Fuchs’ background in engineering at JPL and NASA
24:10: AI’s future in the medical industry

You Might Also Like:

Dotlumen CEO Cornel Amariei on Assistive Technology for the Visually Impaired – Ep. 217

NVIDIA Inception program member Dotlumen is building AI glasses to help people with visual impairments navigate the world. CEO and founder Cornel Amariei discusses the processes of developing assistive technology and its potential for enhancing accessibility.

Personalized Health: Viome’s Guru Banavar Discusses Startup’s AI-Driven Approach – Ep. 216

Viome CTO Guru Banavar discusses innovations in AI and genomics and how technology has advanced personalized health and wellness. Viome aims to tackle the root causes of chronic diseases by analyzing microbiomes and gene expression, transforming biological data into practical recommendations for a holistic approach to wellness.

Cardiac Clarity: Dr. Keith Channon Talks Revolutionizing Heart Health With AI – Ep. 212

Caristo Diagnostics has developed an AI-powered solution for detecting coronary inflammation in cardiac CT scans. Dr. Keith Channon, cofounder and chief medical officer, discusses how Caristo uses AI to improve treatment plans and risk predictions by providing patient-specific readouts.

Subscribe to the AI Podcast

Get the AI Podcast through iTunes, Amazon Music, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Read More

Mission NIMpossible: Decoding the Microservices That Accelerate Generative AI

Mission NIMpossible: Decoding the Microservices That Accelerate Generative AI

In the rapidly evolving world of artificial intelligence, generative AI is captivating imaginations and transforming industries. Behind the scenes, an unsung hero is making it all possible: microservices architecture.

The Building Blocks of Modern AI Applications

Microservices have emerged as a powerful architecture, fundamentally changing how people design, build and deploy software.

A microservices architecture breaks down an application into a collection of loosely coupled, independently deployable services. Each service is responsible for a specific capability and communicates with other services through well-defined application programming interfaces, or APIs. This modular approach stands in stark contrast to traditional all-in-one architectures, in which all functionality is bundled into a single, tightly integrated application.

By decoupling services, teams can work on different components simultaneously, accelerating development processes and allowing updates to be rolled out independently without affecting the entire application. Developers can focus on building and improving specific services, leading to better code quality and faster problem resolution. Such specialization allows developers to become experts in their particular domain.

Services can be scaled independently based on demand, optimizing resource utilization and improving overall system performance. In addition, different services can use different technologies, allowing developers to choose the best tools for each specific task.

A Perfect Match: Microservices and Generative AI

The microservices architecture is particularly well-suited for developing generative AI applications due to its scalability, enhanced modularity and flexibility.

AI models, especially large language models, require significant computational resources. Microservices allow for efficient scaling of these resource-intensive components without affecting the entire system.

Generative AI applications often involve multiple steps, such as data preprocessing, model inference and post-processing. Microservices enable each step to be developed, optimized and scaled independently. Plus, as AI models and techniques evolve rapidly, a microservices architecture allows for easier integration of new models as well as the replacement of existing ones without disrupting the entire application.
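
As a concrete illustration of this decoupling (our sketch, not an NVIDIA reference design), each pipeline step can live in its own small service; here a FastAPI front end handles preprocessing and post-processing while delegating inference to a separate service over HTTP:

    # pipeline_service.py: one independently deployable pipeline step.
    import requests
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()
    INFERENCE_URL = "http://inference:8001/generate"  # assumed service address

    class PipelineRequest(BaseModel):
        text: str

    @app.post("/pipeline")
    def run_pipeline(req: PipelineRequest):
        cleaned = req.text.strip().lower()                 # preprocessing step
        resp = requests.post(INFERENCE_URL, json={"prompt": cleaned}, timeout=30)
        answer = resp.json()["output"]                     # remote inference step
        return {"output": answer.capitalize()}             # post-processing step

Because the inference service sits behind a well-defined API, it can be scaled out or swapped for a newer model without touching the preprocessing code.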

NVIDIA NIM: Simplifying Generative AI Deployment

As the demand for AI-powered applications grows, developers face challenges in efficiently deploying and managing AI models.

NVIDIA NIM inference microservices provide models as optimized containers to deploy in the cloud, data centers, workstations, desktops and laptops. Each NIM container includes the pretrained AI models and all the necessary runtime components, making it simple to integrate AI capabilities into applications.

NIM offers a game-changing approach for application developers looking to incorporate AI functionality by providing simplified integration, production-readiness and flexibility. Developers can focus on building their applications without worrying about the complexities of data preparation, model training or customization, as NIM inference microservices are optimized for performance, come with runtime optimizations and support industry-standard APIs.

AI at Your Fingertips: NVIDIA NIM on Workstations and PCs

Building enterprise generative AI applications comes with many challenges. While cloud-hosted model APIs can help developers get started, issues related to data privacy, security, model response latency, accuracy, API costs and scaling often hinder the path to production.

Workstations with NIM provide developers with secure access to a broad range of models and performance-optimized inference microservices.

By avoiding the latency, cost and compliance concerns associated with cloud-hosted APIs as well as the complexities of model deployment, developers can focus on application development. This accelerates the delivery of production-ready generative AI applications — enabling seamless, automatic scale out with performance optimization in data centers and the cloud.

The recently announced general availability of the Meta Llama 3 8B model as a NIM, which can run locally on RTX systems, brings state-of-the-art language model capabilities to individual developers, enabling local testing and experimentation without the need for cloud resources. With NIM running locally, developers can create sophisticated retrieval-augmented generation (RAG) projects right on their workstations.
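
Because NIM containers expose industry-standard APIs, a locally running Llama 3 8B NIM can typically be called with an OpenAI-compatible client. The port and model identifier below are assumed defaults, so check your container’s documentation:

    from openai import OpenAI

    # A NIM container typically serves an OpenAI-compatible API; the base
    # URL and model name here are assumptions based on common NIM defaults.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

    completion = client.chat.completions.create(
        model="meta/llama3-8b-instruct",
        messages=[{"role": "user", "content": "Summarize what a NIM is."}],
        max_tokens=128,
    )
    print(completion.choices[0].message.content)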

Local RAG refers to implementing RAG systems entirely on local hardware, without relying on cloud-based services or external APIs.

Developers can use the Llama 3 8B NIM on workstations with one or more NVIDIA RTX 6000 Ada Generation GPUs or on NVIDIA RTX systems to build end-to-end RAG systems entirely on local hardware. This setup allows developers to tap the full power of Llama 3 8B, ensuring high performance and low latency.

By running the entire RAG pipeline locally, developers can maintain complete control over their data, ensuring privacy and security. This approach is particularly helpful for developers building applications that require real-time responses and high accuracy, such as customer-support chatbots, personalized content-generation tools and interactive virtual assistants.
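
A minimal local RAG loop might look like the sketch below, where embed() and generate() are stand-ins for a locally hosted embedding model and the Llama 3 NIM rather than specific NVIDIA APIs:

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def local_rag_answer(question, documents, embed, generate, top_k=3):
        # embed: text -> np.ndarray; generate: prompt -> str. Both run
        # locally, so no document or question ever leaves the workstation.
        doc_vecs = [embed(d) for d in documents]
        q_vec = embed(question)
        scores = [cosine(v, q_vec) for v in doc_vecs]
        top = [documents[i] for i in np.argsort(scores)[::-1][:top_k]]
        prompt = ("Answer the question using only this context:\n"
                  + "\n".join(top)
                  + f"\n\nQuestion: {question}")
        return generate(prompt)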

Hybrid RAG combines local and cloud-based resources to optimize performance and flexibility in AI applications. With NVIDIA AI Workbench, developers can get started with the hybrid-RAG Workbench Project — an example application that can be used to run vector databases and embedding models locally while performing inference using NIM in the cloud or data center, offering a flexible approach to resource allocation.

This hybrid setup allows developers to balance the computational load between local and cloud resources, optimizing performance and cost. For example, the vector database and embedding models can be hosted on local workstations to ensure fast data retrieval and processing, while the more computationally intensive inference tasks can be offloaded to powerful cloud-based NIM inference microservices. This flexibility enables developers to scale their applications seamlessly, accommodating varying workloads and ensuring consistent performance.

NVIDIA ACE NIM inference microservices bring digital humans, AI non-playable characters (NPCs) and interactive avatars for customer service to life with generative AI, running on RTX PCs and workstations.

ACE NIM inference microservices for speech — including Riva automatic speech recognition, text-to-speech and neural machine translation — allow accurate transcription, translation and realistic voices.

The NVIDIA Nemotron small language model is a NIM for intelligence that includes INT4 quantization for minimal memory usage and supports roleplay and RAG use cases.

And ACE NIM inference microservices for appearance include Audio2Face and Omniverse RTX for lifelike animation with ultrarealistic visuals. These provide more immersive and engaging gaming characters, as well as more satisfying experiences for users interacting with virtual customer-service agents.

Dive Into NIM

As AI progresses, the ability to rapidly deploy and scale its capabilities will become increasingly crucial.

NVIDIA NIM microservices provide the foundation for this new era of AI application development, enabling breakthrough innovations. Whether building the next generation of AI-powered games, developing advanced natural language processing applications or creating intelligent automation systems, users can access these powerful development tools at their fingertips.

Ways to get started:

  • Experience and interact with NVIDIA NIM microservices on ai.nvidia.com.
  • Join the NVIDIA Developer Program and get free access to NIM for testing and prototyping AI-powered applications.
  • Buy an NVIDIA AI Enterprise license with a free 90-day evaluation period for production deployment and use NVIDIA NIM to self-host AI models in the cloud or in data centers.

Generative AI is transforming gaming, videoconferencing and interactive experiences of all kinds. Make sense of what’s new and what’s next by subscribing to the AI Decoded newsletter.

Read More

Empowering NGOs with generative AI in the fight against human trafficking

Empowering NGOs with generative AI in the fight against human trafficking

Human trafficking and labor exploitation are ancient problems that have evolved with each major leap in technology, from the agricultural revolution to the information age. But what if the right combination of people, data, and technology could help to tackle these problems on an unprecedented scale? With the emergence of generative AI models, which can create rich text and media from natural language prompts and real-world understanding, we are seeing new opportunities to advance the work of organizations that are leading this fight on the front lines.

Presentation of generative AI tools and opportunities at Issara Global Forum, Bangkok, November 2023.

One effort to combat trafficking is the Tech Against Trafficking accelerator program, in which tech companies collaborate with anti-trafficking organizations and global experts to help eradicate trafficking with technology. In the latest accelerator, Microsoft worked with Issara Institute and Polaris to explore how generative AI could help NGOs drive the ethical transformation of global supply chains. By aiming to reduce all forms and levels of worker exploitation, including but not limited to the most serious cases of human trafficking, these organizations aim to make systematic labor exploitation impossible to hide. 

The main issue to contend with, however, is that it is all too easy for such practices to remain hidden, even across datasets that contain evidence of their existence. Many NGOs lack the resources to “connect the dots” at the necessary scale, and time spent on data work is often at the expense of direct assistance activities. Through the accelerator, we developed several first-of-their-kind workflows for real-world data tasks – automating the creation of rich intelligence reports and helping to motivate collective, evidence-based action. We are pleased to announce that we have now combined these workflows into a single system – Intelligence Toolkit – and published the code to GitHub for use by the broader community.

Building on multi-stakeholder engagements 

Since Microsoft co-founded Tech Against Trafficking (TAT) in 2018, we have worked with a range of UN agencies and NGOs to understand the challenges facing the anti-trafficking community, as well as opportunities for new research technologies to drive evidence-based action at scale. For example, our collaboration with IOM (UN Migration) in the 2019 TAT accelerator program resulted in new tools for private data release, as well as new open datasets for the community. However, while growing the shared evidence base enables better decision making and policy development, it is not sufficient. NGOs and other anti-trafficking organizations need time and resources to analyze such datasets, discover relevant insights, and write the intelligence reports that drive real-world action. 

For the 2023-2024 TAT accelerator program, we worked with Issara and Polaris to understand the potential for generative AI to support such analysis within their own organizations and geographies of concern (South and Southeast Asia for Issara; Mexico and the U.S. for the Polaris Nonechka project). Using a combination of open and internal datasets, we developed and refined a series of proof-of-concept interfaces before sharing them for stakeholder feedback at the annual TAT Summit, Issara Global Forum, and NetHope Global Summit events. We learned many lessons through this process, helping to shape what community-oriented tool we should build, how to build it, and when it should be used: 

  • What: Use to automate analysis and reporting under expert supervision. For NGO staff members who need to divide their time between frontline assistance and data work, any tool that increases the efficiency and quality of data work can create more time for more effective assistance. 
  • How: Use an appropriate combination of statistical and generative methods. Generative AI excels at translating data summaries into narrative reports, but statistical methods are also important for identifying all the potential insights (e.g., patterns, clusters, networks) worth reporting.
  • When: Use for individual-level case data and entity data. Worker voice data (e.g., employer grievances) creates the need to both protect the privacy of workers and connect data across employers in ways that reveal aggregate risk. Neither is well supported by existing data tools.

Developing Intelligence Toolkit as a gateway to generative AI 

For the various intelligence-generating activities shared with us by Issara and Polaris, as well as prior accelerator participants, we developed interactive workflows supported by different combinations of statistical methods and generative AI. Each was developed as a lightweight, no-code user interface that supports the end-to-end process of data upload, preparation, analysis, and export. Our Intelligence Toolkit application combines six of these workflows with the most relevance to the broader community. Following the recent TAT showcase event that shared how this application was being used internally at both Issara and Polaris, we are pleased to announce the general availability of this software on GitHub.

The six workflows currently supported are:

Data Synthesis generates differentially private datasets and summaries from case records

Our approach to private data release using synthetic data was first developed in the 2019 TAT accelerator program with IOM (UN Migration), and IOM recently used our existing open source tools to release the largest individual-level dataset on victims of trafficking that is both publicly available and protected by differential privacy. The synthetic datasets we generate retain the structure and statistics of the original sensitive datasets, but individual records do not represent actual people and the presence of any individual in the sensitive dataset is obscured by calibrated noise injected into the synthesis process. 

Because other workflows require access to individual-level case data, we chose to integrate a streamlined approach to synthetic data generation in Intelligence Toolkit. This workflow was used by both Issara and Polaris to translate worker voice datasets into a form that could be shared with the community, as well as used in other workflows to guarantee that the resulting reports preserve privacy by design. 
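
The flavor of the underlying privacy mechanism can be conveyed with calibrated noise on attribute counts; this toy sketch of a differentially private release is our illustration, not the toolkit’s actual synthesizer:

    import numpy as np

    def noisy_counts(records, attribute, epsilon=1.0):
        """Release per-value counts of one attribute with Laplace noise.

        Each record contributes to exactly one count, so the sensitivity
        is 1 and a Laplace scale of 1/epsilon gives epsilon-differential
        privacy for this single release (a toy illustration only)."""
        counts = {}
        for r in records:
            counts[r[attribute]] = counts.get(r[attribute], 0) + 1
        rng = np.random.default_rng(0)
        return {v: max(0, round(c + rng.laplace(0, 1.0 / epsilon)))
                for v, c in counts.items()}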

Attribute Patterns generates reports on attribute patterns detected in streams of case records 

Our approach to detecting patterns of attributes in timestamped case records was first developed in the 2021 TAT accelerator program with Unseen UK, becoming one of our key tools for discovering insights in real-world data. This approach takes the common activity of “drilling down” into data dashboards by progressively selecting data values of interest and inverts it, generating all combinations of record attributes in each time period that appear interesting from a statistical perspective. It is vastly more efficient for domain experts to review lists of such patterns than to manually search for them one at a time. 

Over the last year, we have collaborated with researchers at Johns Hopkins University and the University of Delaware to redesign this approach using Graph Fusion Encoder Embedding. Unlike previous iterations, the Intelligence Toolkit workflow does not end with a list of attribute patterns. Instead, the analyst is invited to use generative AI to create reports that describe the pattern in narrative form, including what it represents, how it has varied over time, what other attributes co-occur with the pattern, what competing hypotheses could potentially explain the pattern, and what possible actions could be taken in response. In this and all subsequent workflows, users can edit the AI system prompts in ways that tailor reports to their specific needs. In the latest TAT accelerator, Issara used this workflow to discover and describe patterns of worker-reported grievances over time. 

Attribute Patterns workflow at the “Generate AI pattern reports” stage, using Issara worker voice data. The selected attribute pattern shows a peak in the first half of 2020 for Burmese males experiencing working conditions issues in Thailand. The AI-generated pattern report explains this pattern in narrative form, and the editable prompt text allows the user to customize the nature of such pattern reports.

Group Narratives generates reports by defining and comparing groups of case records

This workflow aims to mimic the kinds of group-level comparisons that often lend structure to data narratives. For example, Polaris was interested in the different routes taken by H-2A visa workers from their place of origin to their place of work, the different kinds of grievances they reported, and how this varied by worker age. H-2A workers are frequently reported as potential victims of labor trafficking to the National Human Trafficking Hotline. This analysis was achieved by specifying a prefilter (H-2A visa), group definition (source-destination), comparison attributes (workload issues, conditions issues, etc.), and comparison window (age band). Given the resulting table of counts, ranks, and deltas, the user is then able to generate AI reports for specific groups, reports comparing the top N groups, and so on. 

Group Narratives workflow at the “Generate AI group reports” stage, using Polaris worker voice data collected in the Nonechka project. The selected top three routes from worker origin to work site all connect regions of Mexico to sites in North Carolina and reveal a range of reported issues linked to conditions, workload, treatment, payment, and control. The AI-generated group report describes these routes in narrative form, and the editable prompt text allows the user to customize the nature of such group reports.

Record Matching generates reports on record matches detected across entity datasets 

While previous workflows are independent of the identities of data subjects, in some cases such identities are the very focus of analysis. This often occurs not for case data linked to people, but for entity data linked to organizations. In the TAT accelerator, for example, Issara presented the problem of having two product databases describing many of the same employers, but without any links between common entities or any notion of a canonical identity. Connecting these two databases was critical for providing a comprehensive picture of each employer. The problem is also a general one; it arises whenever organizations seek to combine internal and external data sources on the same real-world entities (e.g., supplier companies). 

Our solution was to create a record matching workflow based on the text embedding capabilities of large language models (LLMs). In addition to generating text, LLMs can also map arbitrary chunks of text into points in vector space, where similar vector positions represent similar semantics for the associated text chunks. Given the text embeddings of entity records taken from different databases, the tool is therefore able to identify groups of sufficiently similar entities so as to suggest a real-world match. Generative AI is then used to evaluate and prioritize these matches for human review and potential record linking. 
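
A sketch of this embedding-based matching step, with embed() standing in for an LLM text-embedding call returning unit-length vectors:

    import numpy as np

    def candidate_matches(records_a, records_b, embed, threshold=0.9):
        """Pair up entity records whose embedded text is nearly identical.

        records_*: lists of strings such as "ACME Ltd, 12 Main St, Bangkok".
        embed: hypothetical text-embedding function returning unit vectors."""
        va = np.array([embed(r) for r in records_a])
        vb = np.array([embed(r) for r in records_b])
        sims = va @ vb.T  # cosine similarity, since vectors are unit length
        pairs = []
        for i, j in zip(*np.where(sims >= threshold)):
            pairs.append((records_a[i], records_b[j], float(sims[i, j])))
        # Pairs above the threshold go on to LLM evaluation and human review.
        return sorted(pairs, key=lambda p: -p[2])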

Risk Networks generates reports on risk exposure for networks of related entities 

Our risk networks workflow builds on our earlier work tackling corruption in the public procurement process, providing a streamlined interface for inferring entity relationships from shared attributes and then propagating red flag risks throughout the resulting networks. As in the record matching workflow, text embeddings are used to identify fuzzy matches between similar entity names and contact details that have different spellings or formats. Since LLMs tend to struggle with graph reasoning problems, the workflow computes and converts to text all shortest paths from flagged entities to the target entity of the network. These path descriptions then provide context for the LLM to reason about the potential for relationship-mediated risk exposure among entities with different degrees of relatedness and similarity. In the TAT accelerator, Polaris used this workflow together with open-source intelligence to analyze risk patterns within networks of employers recruiting temporary agricultural workers via the H-2A visa program. 
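
Turning graph structure into LLM-readable context can be sketched with networkx (an illustration of the described approach, not the toolkit’s code):

    import networkx as nx

    def risk_context(graph, target, flagged):
        """Describe shortest paths from red-flagged entities to the target."""
        lines = []
        for entity in flagged:
            try:
                path = nx.shortest_path(graph, entity, target)
            except nx.NetworkXNoPath:
                continue
            hops = " -> ".join(path)
            lines.append(f"Flagged entity {entity} reaches {target} via: {hops}")
        return "\n".join(lines)  # fed to the LLM as grounding for risk reasoning

    # Example: a shared phone number links a flagged recruiter to an employer.
    G = nx.Graph([("RecruiterX", "phone:555-0100"), ("phone:555-0100", "FarmCo")])
    print(risk_context(G, "FarmCo", ["RecruiterX"]))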

Question Answering generates reports from an entity-rich document collection

Question answering is one of the leading use cases for generative AI, given the ability of LLMs to perform in-context learning over a set of input texts. For situations where the size of data to be queried exceeds the context window of the LLM, retrieval-augmented generation (RAG) can enable embedding-based matching of query text against input texts, before using the retrieved texts to help the LLM generate a grounded response. A major limitation of standard RAG, however, is that there is no guarantee that the retrieved texts provide a sufficiently comprehensive grounding to answer user questions, especially if the questions ask for summaries rather than facts. Our recent work using LLM-derived knowledge graphs as a RAG index aims to provide such grounding, but requires an extensive indexing process before any questions can be answered. 

For Intelligence Toolkit, we therefore developed a new RAG approach for lightweight yet comprehensive question answering over collections of existing reports, targeted at NGOs wanting to leverage both their own report collections and those of other organizations (e.g., see collections of public reports from Issara, Polaris, Unseen, and IOM). In this approach, text chunks that match the user’s question are first mined for question-answer pairs, before the question is augmented with any partial answers and embedded again alongside both unmined text chunks and the mined questions and answers. This process repeats until sufficient question-answer pairs have been extracted and matched against the augmented question, providing both an independent FAQ and grounding for the LLM answer to the original user question. 
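
In outline, the loop might look like the sketch below, where embed(), mine_qa(), and llm() are stand-ins for the embedding model, the Q/A-mining LLM call, and the answering LLM call:

    import numpy as np

    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def answer_question(question, chunks, embed, mine_qa, llm, rounds=3, top_k=8):
        """chunks: unmined report texts. mine_qa: chunk -> [(q, a)] via an LLM.
        llm: prompt -> str. embed: text -> vector. All four are stand-ins."""
        qa_pairs, query = [], question
        for _ in range(rounds):
            if not chunks:
                break
            q_vec = embed(query)
            ranked = sorted(chunks, key=lambda c: cos(embed(c), q_vec), reverse=True)
            for chunk in ranked[:top_k]:
                qa_pairs += mine_qa(chunk)   # mine Q/A pairs from matched text
                chunks.remove(chunk)
            # Augment the question with partial answers, re-embed, and repeat.
            query = question + " " + " ".join(a for _, a in qa_pairs)
        context = "\n".join(f"Q: {q}\nA: {a}" for q, a in qa_pairs)
        # The mined pairs double as an independent FAQ artifact.
        return llm(f"Answer '{question}' using only:\n{context}"), qa_pairs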

Question Answering workflow at the “Generate AI answer reports” stage, using a compilation of PDF reports published independently by Issara and Polaris. The user query “In what ways do Issara and Polaris take a similar approach?” was answered by an AI-generated report titled “Comparative Analysis of Issara and Polaris Approaches to Combatting Modern Slavery”. The editable prompt text allows the user to customize the nature of such answer reports.

Continuing the fight against all kinds of societal threats

Intelligence Toolkit is our latest example of a human rights technology developed with global experts in the anti-trafficking community, yet applicable to a broad class of problems impacting societal resilience as a whole. As we work with TAT to help NGOs and other organizations use Intelligence Toolkit for their own data challenges, we hope to identify opportunities to refine and expand our initial workflows.

Across multiple stakeholder events, we have helped to raise awareness of generative AI and the real risks that misuse could pose to vulnerable populations. At the same time, generative AI has unprecedented potential to drive insight discovery, communication, and collective action across entire communities, in ways that are essential for tackling societal problems at scale. With Intelligence Toolkit, we have taken our first steps towards understanding how generative AI can be shaped into the tools that society most urgently needs. 

The post Empowering NGOs with generative AI in the fight against human trafficking appeared first on Microsoft Research.

Read More

Learn how to develop Android applications with ExecuTorch and Llama models

This blog is courtesy of the PyTorch team at Arm. More details can be found here.

Arm’s compute platform is delivering GenAI applications on phones, laptops, and servers. Cost, privacy, performance, security, and energy efficiency are just some of the reasons developers are investigating on-device AI.

A new Learning Path explaining how to leverage the capabilities of large language models (LLMs) on Android using ExecuTorch and XNNPACK is now available.

Here’s a summary of what you’ll learn:

  • Development Environment setup

    The Learning Path begins by guiding you through setting up your development environment, ensuring you have all the necessary tools installed, including Android Studio, the Android NDK, Java JDK, and Python.

  • ExecuTorch and XNNPACK

    You’ll learn about the core technologies: ExecuTorch, a framework for deploying PyTorch models to edge devices, and XNNPACK, a high-performance library for executing neural networks on Arm-based platforms.

  • Llama models

    The Learning Path explores Llama, a family of powerful LLMs, focusing specifically on the 8B Llama 3 model. You’ll learn about quantization techniques, which are essential for optimizing model size and performance on mobile devices.

  • Prepare Llama models for ExecuTorch

    You’ll be guided through the process of downloading, exporting, and evaluating Llama models, ensuring they are ready for deployment using ExecuTorch (a sketch of the export flow appears after this list).

  • Check model performance on Android

    The Learning Path walks you through cross-compiling the Llama runner binary for Android, allowing you to test your model’s performance on your phone.

  • Build and run an Android Chat App

    Finally, you’ll learn how to build a native Android chat app using the LlamaDemo application from the ExecuTorch repository. This hands-on experience allows you to put your knowledge into practice and create a real-world application.
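
As a taste of the export step mentioned above, a PyTorch module can be lowered to an ExecuTorch .pte program in a few lines (a sketch of the general export flow; exact module paths and options may vary across ExecuTorch versions):

    import torch
    from executorch.exir import to_edge

    class Tiny(torch.nn.Module):
        def forward(self, x):
            return torch.nn.functional.relu(x)

    # Capture the model, lower it to the Edge dialect, then serialize a .pte
    # file that the ExecuTorch runtime (e.g. the Android Llama runner) loads.
    exported = torch.export.export(Tiny(), (torch.randn(1, 8),))
    program = to_edge(exported).to_executorch()
    with open("tiny.pte", "wb") as f:
        f.write(program.buffer)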

Explore this Learning Path if you want to learn how to leverage the power of LLMs on your Android phone, and gain expertise in tools for on-device machine learning.

Dig into the excitement of building Android chat apps and understand more about how they work on the Arm Developer Hub.

Read More

Accurate Knowledge Distillation via N-best Reranking

We propose utilizing n-best reranking to enhance Sequence-Level Knowledge Distillation (Kim and Rush, 2016), where we extract pseudo-labels for the student model’s training data from the top n-best hypotheses and leverage a diverse set of models with different inductive biases, objective functions, or architectures, including some publicly available large language models, to pick the highest-quality hypotheses as labels. The effectiveness of our proposal is validated through experiments on the WMT’21 German ↔ English and Chinese ↔ English translation tasks. Our results demonstrate that utilizing…

Apple Machine Learning Research
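
In essence, the proposed reranking scores each n-best hypothesis with several models and keeps the best one as the distillation label; a minimal sketch with hypothetical scorer functions:

    def rerank_label(source, hypotheses, scorers, weights=None):
        """Pick the best pseudo-label from an n-best list.

        hypotheses: candidate translations from the teacher's beam search.
        scorers: hypothetical functions (source, hyp) -> score, e.g. models
        with different architectures or objectives, including LLMs."""
        weights = weights or [1.0] * len(scorers)

        def combined(hyp):
            return sum(w * s(source, hyp) for w, s in zip(weights, scorers))

        return max(hypotheses, key=combined)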