Research Focus: Week of February 5, 2024
EVENT RECAP
Microsoft Research Forum series kicks off with focus on the promise and challenges of AI
With a look back at the incredible changes of 2023 and a look ahead at the tangible advances to come, the inaugural Microsoft Research Forum explored bold new ideas and important issues in the era of general AI. Leaders from Microsoft Research, including the AI Frontiers team and the AI4Science lab, discussed the importance of openness and collaboration to enable successful and responsible AI research.
Peter Lee, CVP, Microsoft Research and Incubations, led off the discussion, followed by a panel exploring some of the biggest potential AI breakthroughs, along with the challenges to overcome. These include:
- Building AI systems that become helpers in the physical world
- Uncovering the building blocks of human reasoning
- Making AI technology smaller and less costly, to improve performance and availability
- Helping AI learn from the people who use it, rather than simply answering questions
In the “lightning round,” Microsoft researchers explored current work to improve pretrained large language models, understand and evaluate foundation models, facilitate breakthroughs in molecular science, augment human decision making, and improve visual storytelling.
To learn more, check out this recap and browse the on-demand session replays. Be sure to register for upcoming episodes.
NEW RESEARCH
The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
Transformer-based large language models (LLMs) have become a fixture in machine learning. Correspondingly, significant resources are allocated towards research to further advance this technology, typically resulting in models of increasing size that are trained on increasing amounts of data.
In a recent paper, The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction, researchers from Microsoft demonstrate a surprising result: that it is possible to significantly improve LLM performance by selectively removing higher-order components of their constituent weight matrices. As covered in a Microsoft Research Forum lightning talk, this simple intervention—LAyer-SElective Rank reduction (LASER)—can be done on a model after training has been completed, and requires no additional parameters or data. In extensive experiments, the researchers demonstrate the generality of this finding across language models and datasets. They provide in-depth analyses offering insights into both when LASER is effective and the mechanism by which it operates.
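At its core, LASER replaces a weight matrix with a low-rank approximation that discards the higher-order (smallest singular value) components. A minimal NumPy sketch of that operation, assuming a generic weight matrix (the paper applies this selectively to particular layers and chooses ranks empirically; the function and parameter names here are illustrative):

```python
import numpy as np

def rank_reduce(weight: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Replace a weight matrix with its low-rank approximation,
    discarding the higher-order (smallest singular value) components."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    k = max(1, int(len(s) * keep_fraction))  # number of ranks to keep
    return (u[:, :k] * s[:k]) @ vt[:k, :]

# Example: reduce a random stand-in "layer weight" to 10% of its rank.
rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512))
w_reduced = rank_reduce(w, keep_fraction=0.1)
```

The intervention requires no gradient updates: the truncated matrix simply replaces the original in the trained model.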
NEW RESEARCH
Cache-Efficient Top-k Aggregation over High Cardinality Large Datasets
Business intelligence tools make it easy to analyze large amounts of data. In these tools, top-k aggregation queries are used to summarize and identify important groups of data. These queries are usually processed by computing exact aggregates for all groups and then selecting the groups with the top-k aggregate values. However, this can be inefficient for high-cardinality large datasets, where intermediate results may not fit within the local cache of multi-core processors, leading to excessive data movement.
Researchers from Microsoft, in their recent paper, Cache-Efficient Top-k Aggregation over High Cardinality Large Datasets, introduce a new cache-conscious algorithm to address this. The algorithm efficiently computes exact top-k aggregates without fully processing all groups. Aggregation over large datasets requires multiple passes of data partitioning and repartitioning, presenting a significant opportunity to reduce partitioning overhead for top-k aggregation queries. The algorithm leverages skewness in the data distribution to select a small set of candidate groups for early aggregation. This helps eliminate many non-candidate group partitions through efficient partitioning techniques and coarse-grained statistics, without computing their exact aggregates. Empirical evaluation using both real-world and synthetic datasets demonstrates that the algorithm achieves a median speedup of over 3x for monotonic aggregation functions and 1.4x for non-monotonic functions, compared to existing cache-conscious aggregation methods, across standard k value ranges (1 to 100).
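For context, the conventional baseline the paper improves on (aggregate every group, then select the top k) can be sketched as follows; the cache-conscious candidate-pruning algorithm itself is considerably more involved, and the helper name here is illustrative:

```python
import heapq
from collections import defaultdict

def topk_group_sums(rows, k):
    """Baseline top-k aggregation: compute an exact sum for every
    group, then keep the k largest using a bounded min-heap."""
    sums = defaultdict(float)
    for key, value in rows:
        sums[key] += value
    heap = []  # min-heap of (sum, key), at most k entries
    for key, total in sums.items():
        if len(heap) < k:
            heapq.heappush(heap, (total, key))
        elif total > heap[0][0]:
            heapq.heapreplace(heap, (total, key))
    return sorted(heap, reverse=True)

rows = [("a", 5), ("b", 1), ("a", 3), ("c", 4), ("b", 2)]
print(topk_group_sums(rows, 2))  # [(8.0, 'a'), (4.0, 'c')]
```

The inefficiency the paper targets is visible here: the `sums` table grows with the number of distinct groups, so for high-cardinality data it overflows the processor cache even though only k results are ultimately needed.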
AWARDS
Six Microsoft researchers named 2023 ACM Fellows
The Association for Computing Machinery’s (ACM) annual fellows award recognizes people who make transformative contributions to computing science and technology. For 2023, the global organization named six researchers from Microsoft among its 68 award recipients.
Jianfeng Gao – VP and Distinguished Scientist
For contributions to machine learning for web search, natural language processing, and conversational systems
Sumit Gulwani – Partner Research Manager
For contributions to AI-assisted programming for developers, data scientists, end users, and students
Nicole Immorlica – Senior Principal Researcher
For contributions to economics and computation including market design, auctions, and social networks
Stefan Saroiu – Senior Principal Researcher
For contributions to memory security and trusted computing
Manik Varma – VP and Distinguished Scientist
For contributions to machine learning and its applications
Xing Xie – Senior Principal Research Manager
For contributions to spatial data mining and recommendation systems
Beyond ‘Data-Driven’: How Energy-Efficient Computing for AI Is Propelling Innovation and Savings Across Industries
With advances in computing, sophisticated AI models and machine learning are having a profound impact on business and society. Industries can use AI to quickly analyze vast bodies of data, allowing them to derive meaningful insights, make predictions and automate processes for greater efficiency.
In the public sector, government agencies are achieving superior disaster preparedness. Biomedical researchers are bringing novel drugs to market faster. Telecommunications providers are building more energy-efficient networks. Manufacturers are trimming emissions from product design, development and manufacturing processes. Hollywood studios are creating impressive visual effects at a fraction of the cost and time. Robots are being deployed on important missions to help preserve the Earth. And investment advisors are running more trade scenarios to optimize portfolios.
Eighty-two percent of companies surveyed are already using or exploring AI, and 84% report that they’re increasing investments in data and AI initiatives. Any organization that delays AI implementation risks missing out on new efficiency gains and becoming obsolete.
However, AI workloads are computationally demanding, and legacy computing systems are ill-equipped for the development and deployment of AI. CPU-based compute requires linear growth in power input to meet the increased processing needs of AI and data-heavy workloads. If data centers are using carbon-based energy, it’s impossible for enterprises to innovate using AI while controlling greenhouse gas emissions and meeting sustainability commitments. Plus, many countries are introducing tougher regulations to enforce data center carbon reporting.
Accelerated computing — the use of GPUs and special hardware, software and parallel computing techniques — has exponentially improved the performance and energy efficiency of data centers.
Below, read more on how industries are using energy-efficient computing to scale AI, improve products and services, and reduce emissions and operational costs.
The Public Sector Drives Research, Delivers Improved Citizen Services
Data is playing an increasingly important role in government services, including for public health and disease surveillance, scientific research, social security administration, and extreme-weather monitoring and management. These operations require platforms and systems that can handle large volumes of data, provide real-time data access, and ensure data quality and accuracy.
But many government agencies rely on legacy systems that are difficult to maintain, don’t efficiently integrate with modern technologies and consume excessive energy. To handle increasingly demanding workloads while sticking to sustainability goals, government agencies and public organizations must adopt more efficient computing solutions.
The U.S. Department of Energy is making inroads in this endeavor. The department runs the National Energy Research Scientific Computing Center for open science. NERSC develops simulations, data analytics and machine learning solutions to accelerate scientific discovery through computation. Seeking new computing efficiencies, the center measured results across four of its key high performance computing and AI applications. It clocked how fast the applications ran, as well as how much energy they consumed using CPU-only versus GPU-accelerated nodes on Perlmutter, one of the world’s largest supercomputers.
At performance parity, a GPU-accelerated cluster consumes 588 fewer megawatt-hours per month, representing a 5x improvement in energy efficiency. By running the same workload on GPUs rather than CPU-only instances, researchers could save millions of dollars per month. These gains mean that the 8,000+ researchers using NERSC computing infrastructure can perform more experiments on important use cases, like studying subatomic interactions to uncover new green energy sources, developing 3D maps of the universe and bolstering a broad range of innovations in materials science and quantum physics.
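The two quoted figures are mutually consistent. A quick back-of-envelope check (the absolute monthly consumptions below are inferred from the quoted savings and ratio, not reported by NERSC):

```python
# If the GPU cluster is 5x more energy efficient and saves 588 MWh/month
# at performance parity, the implied absolute consumptions follow directly.
savings_mwh = 588        # monthly savings from GPU acceleration
efficiency_gain = 5      # CPU-only uses 5x the energy for the same work
gpu_mwh = savings_mwh / (efficiency_gain - 1)   # implied GPU consumption
cpu_mwh = gpu_mwh * efficiency_gain             # implied CPU-only consumption
assert cpu_mwh - gpu_mwh == savings_mwh
print(gpu_mwh, cpu_mwh)
```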
Governments help protect citizens from adverse weather events, such as hurricanes, floods, blizzards and heat waves. With GPU deployments, climate models, like the IFS model from the European Centre for Medium-Range Weather Forecasts, can run up to 24x faster while reducing annual energy usage by up to 127 gigawatt hours compared to CPU-only systems. As extreme-weather events occur with greater frequency and, often, with little warning, meteorology centers can use accelerated computing to generate more accurate, timely forecasts that improve readiness and response.
By adopting more efficient computing systems, governments can save costs while equipping researchers with the tools they need for scientific discoveries to improve climate modeling and forecasting, as well as deliver superior services in public health, disaster relief and more.
Drug Discovery Researchers Conduct Virtual Screenings, Generate New Proteins at Light Speed
Drug development has always been a time-consuming process that involves innumerable calculations and thousands of experiments to screen new compounds. To develop novel medications, the binding properties of small molecules must be tested against protein targets, a cumbersome task required for up to billions of compounds — which translates to billions of CPU hours and hundreds of millions of dollars each year.
Highly accurate AI models can now predict protein structures, generate small molecules, predict protein-ligand binding and perform virtual screening.
Researchers at Oak Ridge National Laboratory (ORNL) and Scripps Research have shown that screening a dataset of billions of compounds against a protein, which has traditionally taken years, can now be completed in just hours with accelerated computing. By running AutoDock, a molecular-modeling simulation software, on a supercomputer with more than 27,000 NVIDIA GPUs, ORNL screened more than 25,000 molecules per second and evaluated the docking of 1 billion compounds in less than 12 hours. This is a speedup of more than 50x compared with running AutoDock on CPUs.
Iambic, an AI platform for drug discovery, has developed an approach combining quantum chemistry and AI that calculates quantum-accurate molecular-binding energies and forces at a fraction of the computational expense of traditional methods. These energies and forces can power molecular-dynamics simulations at unprecedented speed and accuracy. With its OrbNet model, Iambic uses a graph transformer to power quantum-mechanical operators that represent chemical structures. The company is using the technology to identify drug molecules that could deactivate proteins linked to certain cancer types.
As the number of new drug approvals declines and research and development and computing costs rise, optimizing drug discovery with accelerated computing can help control energy expenditures while creating a far-reaching impact on medical research, treatments and patient outcomes.
Telcos Scale Network Capacity
To connect their subscribers, telecommunications companies send data across sprawling networks of cell towers, fiber-optic cables and wireless signals. In the U.S., AT&T’s network connects more than 100 million users from the Aleutian Islands in Alaska to the Florida Keys, processing 500 petabytes of data per day. As telcos add compute-intensive workloads like AI and user plane function (UPF) to process and route data over 5G networks, power consumption costs are skyrocketing.
AT&T processes trillions of data rows to support field technician dispatch operations, generate performance reports and power mobile connectivity. To process data faster, AT&T tested the NVIDIA RAPIDS Accelerator for Apache Spark. By spreading work across nodes in a cluster, the software processed 2.8 trillion rows of information — a month’s worth of mobile data — in just five hours. That’s 3.3x faster at 60% lower cost than any prior test.
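For reference, enabling the RAPIDS Accelerator on an existing Spark job is a configuration change rather than a code rewrite: the plugin intercepts SQL and DataFrame operations and runs supported ones on GPUs. A representative invocation (the jar version, resource amounts and job name are illustrative, not AT&T's actual settings):

```shell
# Representative spark-submit flags to enable the RAPIDS Accelerator.
# Jar version and GPU resource fractions are illustrative.
spark-submit \
  --jars rapids-4-spark_2.12-24.02.0.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.task.resource.gpu.amount=0.25 \
  my_etl_job.py
```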
Other telcos are saving energy by offloading networking and security tasks to SmartNICs and data processing units (DPUs) to reduce server power consumption. Ericsson, a leading telecommunications equipment manufacturer, tested a 5G UPF on servers with and without network offload to an NVIDIA ConnectX-6 Dx NIC. At maximum network traffic, the network offloading provided 23% power savings. The study also found that CPU micro-sleeps and frequency scaling — allowing CPUs to sleep and slow their clock frequencies during low workload levels — saved more than 10% of power per CPU.
Hardware-accelerated networking offloads like these allow telco operators to increase network capacity without a proportional increase in energy consumption, ensuring that networks can scale to handle increased demand and conserve energy during times of low use. By adopting energy-efficient accelerated computing, telco operators can reduce their carbon footprint, improve scalability and lower operational costs.
Manufacturing and Product Design Teams Achieve Faster, Cleaner Simulations
Many industries rely on computational fluid dynamics during design and engineering processes to model fluid flows, combustion, heat transfer and aeroacoustics. The aerospace and automotive industries use CFD to model vehicle aerodynamics, and the energy and environmental industries use it to optimize fluid-particle refining systems and model reactions, wind-farm air flow and hydro-plant water flow.
Traditional CFD methods are compute-intensive, using nearly 25 billion CPU core hours annually, and consume massive amounts of energy. This is a major obstacle for industrial companies looking to reduce carbon emissions and achieve net zero. Parallel computing with GPUs is making a difference.
Ansys, an engineering simulation company, is speeding up CFD physics models with GPUs to help customers drastically reduce emissions while improving the aerodynamics of vehicles. To measure computing efficiency, the company ran the benchmark DrivAer model, used for optimizing vehicle geometry, on different CPU and GPU configurations using its Fluent fluid-simulation software. Results showed that a single GPU achieved more than 5x greater performance than a cluster with 80 CPU cores. With eight GPUs, the simulation experienced more than a 30x speedup. And a server with six GPUs reduced power consumption 4x compared with a high performance computing CPU cluster delivering the same performance.
CPFD offers GPU parallelization for Barracuda Virtual Reactor, a physics-based engineering software package capable of predicting fluid, particulate-solid, thermal and chemically reacting behavior in fluidized bed reactors and other fluid-particle systems.
Using CPFD’s Barracuda software, green energy supplier ThermoChem Recovery International (TRI) developed technology that converts municipal solid waste and woody biomass into jet fuel. Since its partnership with CPFD began 14 years ago, TRI has benefited from 1,500x model speedups as CPFD moved its code from CPU hardware to full GPU parallelization. With these exponential speedups, models that would’ve previously taken years to run can now be completed in a day or less, saving millions of dollars in data center infrastructure and energy costs.
With GPU parallelization and energy-efficient architectures, industrial design processes that rely on CFD can benefit from dramatically faster simulations while achieving significant energy savings.
Media and Entertainment Boost Rendering
Rendering visual effects (VFX) and stylized animations consumes nearly 10 billion CPU core hours per year in the media and entertainment industry. A single animated film can require over 50,000 CPU cores working for more than 300 million hours. Supporting this workload demands extensive data center space, climate control and computing capacity, all of which result in substantial expenditures and a sizable carbon footprint.
Accelerated computing offers a more energy-efficient way to produce VFX and animation, enabling studios to iterate faster and compress production times.
Studios like Wylie Co., known for visuals in the Oscar-winning film Dune and in HBO and Netflix features, are adopting GPU-powered rendering to improve performance and save energy. After migrating to GPU rendering, Wylie Co. realized a 24x performance boost over CPUs.
Image Engine, a VFX company involved in creating Marvel Entertainment movies and Star Wars-based television shows, observed a 25x performance improvement by using GPUs for rendering.
GPUs can increase performance up to 46x while reducing energy consumption by 10x and capital expenses by 6x. With accelerated computing, the media and entertainment industry has the potential to save a staggering $900 million in hardware acquisition costs worldwide and conserve 215 gigawatt hours of energy that would have been consumed by CPU-based render farms. Such a shift would lead to substantial cost savings and significant reductions in the industry’s environmental impact.
Robotics Developers Extend Battery Life for Important Missions
With edge AI and supercomputing now available using compact modules, demand for robots is surging for use in factory logistics, sales showrooms, urban delivery services and even ocean exploration. Mobile robot shipments are expected to climb from 549,000 units last year to 3 million by 2030, with revenue forecast to jump from more than $24 billion to $111 billion in the same period, according to ABI Research.
Most robots are battery-operated and rely on an array of lidar sensors and cameras for navigation. Robots communicate with edge servers or clouds for mission dispatch and require high throughput due to diverse sets of camera sensors as well as low latency for real-time decision-making. These factors necessitate energy-efficient onboard computing.
Accelerated edge computing can be optimized to decode images, process video and analyze lidar data to enable robot navigation of unstructured environments. This allows developers to build and deploy more energy-efficient machines that can remain in service for longer without needing to charge.
The Woods Hole Oceanographic Institution Autonomous Robotics and Perception Laboratory (WARPLab) and MIT are using the NVIDIA Jetson Orin platform for energy-efficient edge AI and robotics to power an autonomous underwater vehicle to study coral reefs.
The AUV, named CUREE, for Curious Underwater Robot for Ecosystem Exploration, gathers visual, audio and other environmental data to help understand the human impact on reefs and sea life. With 25% of the vehicle’s power needed for data collection, energy efficiency is a must. With Jetson Orin, CUREE constructs 3D models of reefs, tracks marine organisms and plant life, and autonomously navigates and gathers data. The AUV’s onboard energy-efficient computing also powers convolutional neural networks that enhance underwater vision by reducing backscatter and correcting colors. This enables CUREE to transmit clear images to scientists, facilitating fish detection and reef analysis.
Driverless smart tractors with energy-efficient edge computing are now available to help farmers with automation and data analysis. The Founder Series MK-V tractors, designed by NVIDIA Inception member Monarch Tractor, combine electrification, automation and data analysis to help farmers reduce their carbon footprint, improve field safety and streamline farming operations. Using onboard AI video analytics, the tractor can traverse rows of crops, enabling it to navigate even in remote areas without connectivity or GPS.
The MK-V tractor produces zero emissions and is estimated to save farmers $2,600 annually compared to diesel tractors. The tractor’s AI data analysis advises farmers on how to reduce the use of expensive, harmful herbicides that deplete the soil. Decreasing the volume of chemicals is a win all around, empowering farmers to protect the quality of soil, reduce herbicide expenditures and deliver more naturally cultivated produce to consumers.
As energy-efficient edge computing becomes more accessible to enable AI, expect to see growing use cases for mobile robots that can navigate complex environments, make split-second decisions, interact with humans and safely perform difficult tasks with precision.
Financial Services Use Data to Inform Investment Decisions
Financial services is an incredibly data-intensive industry. Bankers and asset managers pursuing the best results for investors rely on AI algorithms to churn through terabytes of unstructured data from economic indicators, earnings reports, news articles, and disparate environmental, social and governance metrics to generate market insights that inform investments. Plus, financial services companies must comb through network data and transactions to prevent fraud and protect accounts.
NVIDIA and Dell Technologies are optimizing computing for financial workloads to achieve higher throughput, speed and capacity with greater energy efficiency. The Strategic Technology Analysis Center, an organization dedicated to technology discovery and assessment in the finance industry, recently tested the STAC-A2 benchmark tests on several computing stacks comprising CPU-only infrastructure and GPU-based infrastructure. The STAC-A2 benchmark is designed by quants and technologists to measure the performance, scalability, quality and resource efficiency of technology stacks running market-risk analysis for derivatives.
When testing the STAC-A2 options pricing benchmark, the Dell PowerEdge server with NVIDIA GPUs performed 16x faster and 3x more energy efficiently than a CPU-only system for the same workload. This enables investment advisors to integrate larger bodies of data into derivatives risk-analysis calculations, enabling more data-driven decisions without increasing computing time or energy requirements.
PayPal, which was looking to deploy a new fraud-detection system to operate 24/7, worldwide and in real time to protect customer transactions, realized CPU-only servers couldn’t meet such computing requirements. Using NVIDIA GPUs for inference, PayPal improved real-time fraud detection by 10% and lowered server energy consumption by nearly 8x.
With accelerated computing, financial services organizations can run more iterations of investment scenarios, improve risk assessments and make more informed decisions for better investment results. Accelerated computing is the foundation for improving data throughput, reducing latency and optimizing energy usage to lower operating costs and achieve emissions goals.
An AI Future With Energy-Efficient Computing
With energy-efficient computing, enterprises can reduce data center costs and their carbon footprint while scaling AI initiatives and data workloads to stay competitive.
The NVIDIA accelerated computing platform offers a comprehensive suite of energy-efficient hardware and software to help enterprises use AI to drive innovation and efficiency without the need for equivalent growth in energy consumption.
With more than 100 frameworks, pretrained models and development tools optimized for GPUs, NVIDIA AI Enterprise accelerates the entire AI journey, from data preparation and model training to inference and scalable deployment. By getting their AI into production faster, businesses can significantly reduce overall power consumption.
With the NVIDIA RAPIDS Accelerator for Apache Spark, which is included with NVIDIA AI Enterprise, data analytics workloads can be completed 6x faster, translating to 5x savings on infrastructure and 6x less power used for the same amount of work. For a typical enterprise, this means 10 gigawatt hours less energy consumed compared with running jobs without GPU acceleration.
NVIDIA BlueField DPUs bring greater energy efficiency to data centers by offloading and accelerating data processing, networking and security tasks from the main CPU infrastructure. By maximizing performance per watt, they can help enterprises slash server power consumption by up to 30%, saving millions in data center costs.
As businesses shift to a new paradigm of AI-driven results, energy-efficient accelerated computing is helping organizations deliver on the promise of AI while controlling costs, maintaining sustainable practices and ensuring they can keep up with the pace of innovation.
Learn how accelerated computing can help organizations achieve both AI goals and carbon-footprint objectives.
Accenture creates a regulatory document authoring solution using AWS generative AI services
This post is co-written with Ilan Geller, Shuyu Yang and Richa Gupta from Accenture.
Bringing innovative new pharmaceutical drugs to market is a long and stringent process. Companies face complex regulations and extensive approval requirements from governing bodies like the US Food and Drug Administration (FDA). A key part of the submission process is authoring regulatory documents like the Common Technical Document (CTD), a comprehensive, standard-format document for submitting applications, amendments, supplements, and reports to the FDA. This document contains over 100 highly detailed technical reports created during the process of drug research and testing. Manually creating CTDs is incredibly labor-intensive, requiring up to 100,000 hours per year for a typical large pharma company. The tedious process of compiling hundreds of documents is also prone to errors.
Accenture built a regulatory document authoring solution using automated generative AI that enables researchers and testers to produce CTDs efficiently. By extracting key data from testing reports, the system uses Amazon SageMaker JumpStart and other AWS AI services to generate CTDs in the proper format. This revolutionary approach compresses the time and effort spent on CTD authoring. Users can quickly review and adjust the computer-generated reports before submission.
Because of the sensitive nature of the data and effort involved, pharmaceutical companies need a higher level of control, security, and auditability. This solution relies on the AWS Well-Architected principles and guidelines to enable the control, security, and auditability requirements. The user-friendly system also employs encryption for security.
By harnessing AWS generative AI, Accenture aims to transform efficiency for regulated industries like pharmaceuticals. Automating the frustrating CTD document process accelerates new product approvals so innovative treatments can get to patients faster. AI delivers a major leap forward.
This post provides an overview of an end-to-end generative AI solution developed by Accenture for regulatory document authoring using SageMaker JumpStart and other AWS services.
Solution overview
Accenture built an AI-based solution that automatically generates a CTD document in the required format, along with the flexibility for users to review and edit the generated content. The preliminary value is estimated at a 40–45% reduction in authoring time.
This generative AI-based solution extracts information from the technical reports produced as part of the testing process and delivers the detailed dossier in a common format required by the central governing bodies. Users then review and edit the documents where necessary and submit them to the central governing bodies. This solution uses the SageMaker JumpStart AI21 Jurassic Jumbo Instruct and AI21 Summarize models to extract and create the documents.
The following diagram illustrates the solution architecture.
The workflow consists of the following steps:
- A user accesses the regulatory document authoring tool from their computer browser.
- A React application hosted on AWS Amplify is accessed from the user’s computer, with Amazon Route 53 providing DNS.
- The React application uses the Amplify authentication library to detect whether the user is authenticated.
- Amazon Cognito provides a local user pool or can be federated with the user’s active directory.
- The application uses the Amplify libraries for Amazon Simple Storage Service (Amazon S3) and uploads documents provided by users to Amazon S3.
- The application writes the job details (app-generated job ID and Amazon S3 source file location) to an Amazon Simple Queue Service (Amazon SQS) queue. It captures the message ID returned by Amazon SQS. Amazon SQS enables a fault-tolerant decoupled architecture. Even if there are some backend errors while processing a job, having a job record inside Amazon SQS will ensure successful retries.
- Using the job ID and message ID returned by the previous request, the client connects to the WebSocket API and sends the job ID and message ID to the WebSocket connection.
- The WebSocket triggers an AWS Lambda function, which creates a record in Amazon DynamoDB. The record is a key-value mapping of the job ID (WebSocket) with the connection ID and message ID.
- Another Lambda function is triggered when a new message arrives in the SQS queue. The Lambda function reads the job ID and invokes an AWS Step Functions workflow for processing data files.
- The Step Functions state machine invokes a Lambda function to process the source documents. The function code invokes Amazon Textract to analyze the documents. The response data is stored in DynamoDB. Based on specific requirements with processing data, it can also be stored in Amazon S3 or Amazon DocumentDB (with MongoDB compatibility).
- A Lambda function invokes the Amazon Textract API DetectDocument to parse tabular data from source documents and stores extracted data into DynamoDB.
- A Lambda function processes the data based on mapping rules stored in a DynamoDB table.
- A Lambda function invokes the prompt libraries and a series of actions using generative AI with a large language model hosted through Amazon SageMaker for data summarization.
- The document writer Lambda function writes a consolidated document in an S3 processed folder.
- The job callback Lambda function retrieves the callback connection details from the DynamoDB table, passing the job ID. Then the Lambda function makes a callback to the WebSocket endpoint and provides the processed document link from Amazon S3.
- A Lambda function deletes the message from the SQS queue so that it’s not reprocessed.
- A document generator web module converts the JSON data into a Microsoft Word document, saves it, and renders the processed document on the web browser.
- The user can view, edit, and save the documents back to the S3 bucket from the web module. This supports any reviews and corrections that may be needed.
The solution also uses SageMaker notebooks (labeled T in the preceding architecture) to perform domain adaptation, fine-tune the models, and deploy the SageMaker endpoints.
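The queue-based job submission in the workflow above (writing job details to Amazon SQS and capturing the returned message ID) can be sketched in Python. The bucket, key, and field names below are illustrative, and the actual boto3 call is shown only as a comment:

```python
import json
import uuid

def build_job_message(bucket: str, key: str) -> tuple:
    """Build the app-generated job ID and the SQS message body
    describing the Amazon S3 source file location."""
    job_id = str(uuid.uuid4())  # app-generated job ID
    body = json.dumps({
        "jobId": job_id,
        "sourceLocation": f"s3://{bucket}/{key}",
    })
    return job_id, body

# In the real application the body would be sent with boto3, e.g.:
#   resp = sqs.send_message(QueueUrl=queue_url, MessageBody=body)
#   message_id = resp["MessageId"]  # passed to the WebSocket step later
job_id, body = build_job_message("my-source-bucket", "ctd/section-2.7.docx")
```

Keeping the job record in SQS, as the architecture does, means a failed processing attempt can simply be retried from the queue.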
Conclusion
In this post, we showcased how Accenture is using AWS generative AI services to implement an end-to-end approach towards a regulatory document authoring solution. In early testing, this solution has demonstrated a 60–65% reduction in the time required for authoring CTDs. We identified the gaps in traditional regulatory authoring platforms, augmented them with generative AI for faster response times, and are continuously improving the system while engaging with users across the globe. Reach out to the Accenture Center of Excellence team to dive deeper into the solution and deploy it for your clients.
This joint program focused on generative AI will help increase the time-to-value for joint customers of Accenture and AWS. The effort builds on the 15-year strategic relationship between the companies and uses the same proven mechanisms and accelerators built by the Accenture AWS Business Group (AABG).
Connect with the AABG team at accentureaws@amazon.com to drive business outcomes by transforming to an intelligent data enterprise on AWS.
For further information about generative AI on AWS using Amazon Bedrock or SageMaker, refer to Generative AI on AWS: Technology and Get started with generative AI on AWS using Amazon SageMaker JumpStart.
You can also sign up for the AWS generative AI newsletter, which includes educational resources, blogs, and service updates.
About the Authors
Ilan Geller is a Managing Director in the Data and AI practice at Accenture. He is the Global AWS Partner Lead for Data and AI and the Center for Advanced AI. His roles at Accenture have primarily been focused on the design, development, and delivery of complex data, AI/ML, and most recently Generative AI solutions.
Shuyu Yang is the Generative AI and Large Language Model Delivery Lead and also leads the Accenture AI Center of Excellence (CoE) teams (AWS DevOps professional).
Richa Gupta is a Technology Architect at Accenture, leading various AI projects. She has 18+ years of experience architecting scalable AI and generative AI solutions. Her expertise spans AI architecture, cloud solutions, and generative AI. She plays an instrumental role in various presales activities.
Shikhar Kwatra is an AI/ML Specialist Solutions Architect at Amazon Web Services, working with a leading Global System Integrator. He has earned the title of one of the Youngest Indian Master Inventors with over 500 patents in the AI/ML and IoT domains. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and supports the GSI partner in building strategic industry solutions on AWS. Shikhar enjoys playing guitar, composing music, and practicing mindfulness in his spare time.
Sachin Thakkar is a Senior Solutions Architect at Amazon Web Services, working with a leading Global System Integrator (GSI). He brings over 23 years of experience as an IT Architect and as Technology Consultant for large institutions. His focus area is on Data, Analytics and Generative AI. Sachin provides architectural guidance and supports the GSI partner in building strategic industry solutions on AWS.
Graph neural networks in TensorFlow
Posted by Dustin Zelle – Software Engineer, Research and Arno Eigenwillig – Software Engineer, CoreML
This article is also shared on the Google Research Blog
Objects and their relationships are ubiquitous in the world around us, and relationships can be as important to understanding an object as its own attributes viewed in isolation — for example: transportation networks, production networks, knowledge graphs, or social networks. Discrete mathematics and computer science have a long history of formalizing such networks as graphs, consisting of nodes connected by edges in various irregular ways. Yet most machine learning (ML) algorithms allow only for regular and uniform relations between input objects, such as a grid of pixels, a sequence of words, or no relation at all.
Graph neural networks, or GNNs for short, have emerged as a powerful technique to leverage both the graph’s connectivity (as in the older algorithms DeepWalk and Node2Vec) and the input features on the various nodes and edges. GNNs can make predictions for graphs as a whole (Does this molecule react in a certain way?), for individual nodes (What’s the topic of this document, given its citations?) or for potential edges (Is this product likely to be purchased together with that product?). Apart from making predictions about graphs, GNNs are a powerful tool used to bridge the chasm to more typical neural network use cases. They encode a graph’s discrete, relational information in a continuous way so that it can be included naturally in another deep learning system.
We are excited to announce the release of TensorFlow GNN 1.0 (TF-GNN), a production-tested library for building GNNs at large scale. It supports both modeling and training in TensorFlow as well as the extraction of input graphs from huge data stores. TF-GNN is built from the ground up for heterogeneous graphs, where types of objects and relations are represented by distinct sets of nodes and edges. Real-world objects and their relations occur in distinct types, and TF-GNN’s heterogeneous focus makes it natural to represent them.
Inside TensorFlow, such graphs are represented by objects of type tfgnn.GraphTensor. This is a composite tensor type (a collection of tensors in one Python class) accepted as a first-class citizen in tf.data.Dataset, tf.function, etc. It stores both the graph structure and its features attached to nodes, edges, and the graph as a whole. Trainable transformations of GraphTensors can be defined as Layers objects in the high-level Keras API, or directly using the tfgnn.GraphTensor primitive.
GNNs: Making predictions for an object in context
For illustration, let’s look at one typical application of TF-GNN: predicting a property of a certain type of node in a graph defined by cross-referencing tables of a huge database. For example, consider a citation database of Computer Science (CS) arXiv papers with one-to-many cites and many-to-one cited relationships, where we would like to predict the subject area of each paper.
Like most neural networks, a GNN is trained on a dataset of many labeled examples (~millions), but each training step consists only of a much smaller batch of training examples (say, hundreds). To scale to millions, the GNN gets trained on a stream of reasonably small subgraphs from the underlying graph. Each subgraph contains enough of the original data to compute the GNN result for the labeled node at its center and train the model. This process — typically referred to as subgraph sampling — is extremely consequential for GNN training. Most existing tooling accomplishes sampling in a batch way, producing static subgraphs for training. TF-GNN provides tooling to improve on this by sampling dynamically and interactively.
Pictured, the process of subgraph sampling where small, tractable subgraphs are sampled from a larger graph to create input examples for GNN training.
TF-GNN 1.0 debuts a flexible Python API to configure dynamic or batch subgraph sampling at all relevant scales: interactively in a Colab notebook (like this one), for efficient sampling of a small dataset stored in the main memory of a single training host, or distributed by Apache Beam for huge datasets stored on a network filesystem (up to hundreds of millions of nodes and billions of edges). For details, please refer to our user guides for in-memory and beam-based sampling, respectively.
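As a toy illustration of the subgraph sampling idea (not TF-GNN's actual sampling API), the following pure-Python sketch samples a bounded neighborhood around a root node; the graph, fanout, and hop count are made up:

```python
import random

def sample_subgraph(adj, root, hops=2, fanout=3, seed=0):
    """Sample a small subgraph around `root`: up to `fanout` random
    neighbors per node, expanded for `hops` rounds."""
    rng = random.Random(seed)
    nodes, frontier = {root}, [root]
    edges = []
    for _ in range(hops):
        next_frontier = []
        for u in frontier:
            nbrs = adj.get(u, [])
            for v in rng.sample(nbrs, min(fanout, len(nbrs))):
                edges.append((u, v))
                if v not in nodes:
                    nodes.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return nodes, edges

# A toy citation graph: paper -> papers it cites.
adj = {0: [1, 2, 3, 4], 1: [2, 5], 2: [6], 3: [], 4: [5]}
nodes, edges = sample_subgraph(adj, root=0, hops=2, fanout=2)
```

Each sampled subgraph contains enough context to compute the GNN result for the labeled root node, which is what lets training scale to graphs with millions of nodes.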
On those same sampled subgraphs, the GNN’s task is to compute a hidden (or latent) state at the root node; the hidden state aggregates and encodes the relevant information of the root node’s neighborhood. One classical approach is message-passing neural networks. In each round of message passing, nodes receive messages from their neighbors along incoming edges and update their own hidden state from them. After n rounds, the hidden state of the root node reflects the aggregate information from all nodes within n edges (pictured below for n = 2). The messages and the new hidden states are computed by hidden layers of the neural network. In a heterogeneous graph, it often makes sense to use separately trained hidden layers for the different types of nodes and edges.
Pictured, a simple message-passing neural network where, at each step, the node state is propagated from outer to inner nodes where it is pooled to compute new node states. Once the root node is reached, a final prediction can be made.
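The message-passing loop described above can be sketched in pure Python. The mean aggregation and the fixed 0.5/0.5 mixing below stand in for the trained hidden layers that compute messages and state updates:

```python
def message_passing(adj, h, rounds=2):
    """Toy message passing: each round, every node averages the states
    of its in-neighbors and mixes the result into its own state."""
    for _ in range(rounds):
        new_h = {}
        for v, state in h.items():
            incoming = [h[u] for u, targets in adj.items() if v in targets]
            if incoming:
                msg = sum(incoming) / len(incoming)
                new_h[v] = 0.5 * state + 0.5 * msg  # stand-in for a learned update
            else:
                new_h[v] = state
        h = new_h
    return h

# Path graph 0 -> 1 -> 2: after two rounds, node 2's state reflects node 0,
# which is two edges away.
adj = {0: [1], 1: [2], 2: []}
h = {0: 1.0, 1: 0.0, 2: 0.0}
out = message_passing(adj, h, rounds=2)
```

After n rounds, information has flowed from every node within n edges of the root, matching the n = 2 picture above.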
The training setup is completed by placing an output layer on top of the GNN’s hidden state for the labeled nodes, computing the loss (to measure the prediction error), and updating model weights by backpropagation, as usual in any neural network training.
Beyond supervised training (i.e., minimizing a loss defined by labels), GNNs can also be trained in an unsupervised way (i.e., without labels). This lets us compute a continuous representation (or embedding) of the discrete graph structure of nodes and their features. These representations are then typically utilized in other ML systems. In this way, the discrete, relational information encoded by a graph can be included in more typical neural network use cases. TF-GNN supports a fine-grained specification of unsupervised objectives for heterogeneous graphs.
Building GNN architectures
The TF-GNN library supports building and training GNNs at various levels of abstraction.
At the highest level, users can take any of the predefined models bundled with the library that are expressed in Keras layers. Besides a small collection of models from the research literature, TF-GNN comes with a highly configurable model template that provides a curated selection of modeling choices that we have found to provide strong baselines on many of our in-house problems. The templates implement GNN layers; users need only initialize the Keras layers.
import tensorflow as tf
import tensorflow_gnn as tfgnn
from tensorflow_gnn.models import mt_albis

def model_fn(graph_tensor_spec: tfgnn.GraphTensorSpec):
  """Builds a GNN as a Keras model."""
  graph = inputs = tf.keras.Input(type_spec=graph_tensor_spec)

  # Encode input features (callback omitted for brevity).
  graph = tfgnn.keras.layers.MapFeatures(
      node_sets_fn=set_initial_node_states)(graph)

  # For each round of message passing...
  for _ in range(2):
    # ... create and apply a Keras layer.
    graph = mt_albis.MtAlbisGraphUpdate(
        units=128, message_dim=64,
        attention_type="none", simple_conv_reduce_type="mean",
        normalization_type="layer", next_state_type="residual",
        state_dropout_rate=0.2, l2_regularization=1e-5,
    )(graph)

  return tf.keras.Model(inputs, graph)
At the lowest level, users can write a GNN model from scratch in terms of primitives for passing data around the graph, such as broadcasting data from a node to all its outgoing edges or pooling data into a node from all its incoming edges (e.g., computing the sum of incoming messages). TF-GNN’s graph data model treats nodes, edges and whole input graphs equally when it comes to features or hidden states, making it straightforward to express not only node-centric models like the MPNN discussed above but also more general forms of GraphNets. This can, but need not, be done with Keras as a modeling framework on the top of core TensorFlow. For more details, and intermediate levels of modeling, see the TF-GNN user guide and model collection.
Training orchestration
While advanced users are free to do custom model training, the TF-GNN Runner also provides a succinct way to orchestrate the training of Keras models in the common cases. A simple invocation may look like this:
from tensorflow_gnn import runner

runner.run(
    task=runner.RootNodeBinaryClassification("papers", ...),
    model_fn=model_fn,
    trainer=runner.KerasTrainer(
        tf.distribute.MirroredStrategy(), model_dir="/tmp/model"),
    optimizer_fn=tf.keras.optimizers.Adam,
    epochs=10,
    global_batch_size=128,
    train_ds_provider=runner.TFRecordDatasetProvider("/tmp/train*"),
    valid_ds_provider=runner.TFRecordDatasetProvider("/tmp/validation*"),
    gtspec=...,
)
The Runner provides ready-to-use solutions for ML pains like distributed training and tfgnn.GraphTensor padding for fixed shapes on Cloud TPUs. Beyond training on a single task (as shown above), it supports joint training on multiple (two or more) tasks in concert. For example, unsupervised tasks can be mixed with supervised ones to inform a final continuous representation (or embedding) with application-specific inductive biases. Callers need only substitute the task argument with a mapping of tasks:
from tensorflow_gnn import runner
from tensorflow_gnn.models import contrastive_losses
runner.run(
    task={
        "classification": runner.RootNodeBinaryClassification("papers", ...),
        "dgi": contrastive_losses.DeepGraphInfomaxTask("papers"),
    },
    ...
)
Additionally, the TF-GNN Runner includes an implementation of integrated gradients for use in model attribution. The integrated gradients output is a GraphTensor with the same connectivity as the observed GraphTensor, but with its features replaced by gradient values, where larger values contribute more than smaller values to the GNN's prediction. Users can inspect gradient values to see which features their GNN uses the most.
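Integrated gradients itself is library-agnostic: attributions are computed by integrating the gradient along a straight path from a baseline input to the observed input. The following pure-Python sketch approximates this for a toy two-feature function (the TF-GNN Runner computes the analogous quantities over GraphTensor features):

```python
def integrated_gradients(f, grad_f, x, baseline, steps=1000):
    """Approximate integrated gradients along the straight path from
    `baseline` to `x` with a midpoint Riemann sum."""
    n = len(x)
    ig = [0.0] * n
    for k in range(steps):
        t = (k + 0.5) / steps  # midpoint of each sub-interval
        point = [baseline[i] + t * (x[i] - baseline[i]) for i in range(n)]
        g = grad_f(point)
        for i in range(n):
            ig[i] += (x[i] - baseline[i]) * g[i] / steps
    return ig

# Toy model: f(x) = 3*x0 + x1**2, with gradient (3, 2*x1).
f = lambda x: 3 * x[0] + x[1] ** 2
grad_f = lambda x: [3.0, 2 * x[1]]
ig = integrated_gradients(f, grad_f, x=[1.0, 2.0], baseline=[0.0, 0.0])
```

A useful sanity check is the completeness property: the attributions sum to f(x) - f(baseline), so features with larger attribution values contributed more to the prediction.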
Conclusion
In short, we hope TF-GNN will be useful to advance the application of GNNs in TensorFlow at scale and fuel further innovation in the field. If you’re curious to find out more, please try our Colab demo with the popular OGBN-MAG benchmark (in your browser, no installation required), browse the rest of our user guides and Colabs, or take a look at our paper.
Acknowledgements
The TF-GNN release 1.0 was developed by a collaboration between Google Research (Sami Abu-El-Haija, Neslihan Bulut, Bahar Fatemi, Johannes Gasteiger, Pedro Gonnet, Jonathan Halcrow, Liangze Jiang, Silvio Lattanzi, Brandon Mayer, Vahab Mirrokni, Bryan Perozzi, Anton Tsitsulin, Dustin Zelle), Google Core ML (Arno Eigenwillig, Oleksandr Ferludin, Parth Kothari, Mihir Paradkar, Jan Pfeifer, Rachael Tamakloe), and Google DeepMind (Alvaro Sanchez-Gonzalez and Lisa Wang).
Integrate QnABot on AWS with ServiceNow
Do your employees wait for hours on the telephone to open an IT ticket? Do they wait for an agent to triage an issue, which sometimes only requires restarting the computer? Providing excellent IT support is crucial for any organization, but legacy systems have relied heavily on human agents being available to intake reports and triage issues. Conversational AI (or chatbots) can help triage some of these common IT problems and create a ticket for the tasks when human assistance is needed. Chatbots quickly resolve common business issues, improve employee experiences, and free up agents’ time to handle more complex problems.
QnABot on AWS is an open source solution built using AWS native services like Amazon Lex, Amazon OpenSearch Service, AWS Lambda, Amazon Transcribe, and Amazon Polly. QnABot version 5.4+ is also enhanced with generative AI capabilities.
According to Gartner Magic Quadrant 2023, ServiceNow is one of the leading IT Service Management (ITSM) providers on the market. ServiceNow’s Incident Management uses workflows to identify, track, and resolve high‑impact IT service incidents.
In this post, we demonstrate how to integrate the QnABot on AWS chatbot solution with ServiceNow. With this integration, users can chat with QnABot to triage their IT service issues and open an incident ticket in ServiceNow in real time by providing details to QnABot.
Watch the following video to see how users can ask questions to an IT service desk chatbot and get answers. For most frequently asked questions, chatbot answers can help resolve the issue. When a user determines that the answers provided are not useful, they can request the creation of a ticket in ServiceNow.
Solution overview
QnABot on AWS is a multi-channel, multi-language chatbot that responds to your customer’s questions, answers, and feedback. QnABot on AWS is a complete solution and can be deployed as part of your IT Service Desk ticketing workflow. Its distributed architecture allows for integrations with other systems like ServiceNow. If you wish to build your own chatbot using Amazon Lex or add only Amazon Lex as part of your application, refer to Integrate ServiceNow with Amazon Lex chatbot for ticket processing.
The following diagram illustrates the solution architecture.
The workflow includes the following steps:
- A QnABot administrator can configure the questions using the Content Designer UI delivered by Amazon API Gateway and Amazon Simple Storage Service (Amazon S3).
- The Content Designer Lambda function saves the input in OpenSearch Service in a question’s bank index.
- When QnABot users ask questions prompting ServiceNow integration, Amazon Lex fetches the questions and requests the user to provide a description of the issue. When the description is provided, it invokes a Lambda function.
- The Lambda function fetches secrets from AWS Secrets Manager, where environment variables are stored, and makes an HTTP call to create a ticket in ServiceNow. The ticket number is then returned to the user.
When building a diagnostic workflow, you may require inputs to different questions before you can create a ticket in ServiceNow. You can use response bots and the document chaining capabilities of QnABot to achieve this capability.
Response bots are bots created to elicit a response from users and store them as part of session variables or as part of slot values. You can use built-in response bots or create a custom response bot. Response chatbot names must start with the letters “QNA.”
This solution provides a set of built-in response bots. Refer to Configuring the chatbot to ask the questions and use response bots for implementation details.
You can use document chaining to elicit the response and invoke Lambda functions. The chaining rule is a JavaScript programming expression used to test the value of the session attribute set to elicit a response and either route to another bot or invoke Lambda functions. You can identify the next question in the document by identifying the question ID (QID) specified in the Document Chaining:Chaining Rule field as ‘QID::‘ followed by the QID value of the document. For example, a rule that evaluates to “QID::Admin001” will chain to item Admin.001.
When using a chaining rule for Lambda, the function name must start with the letters “QNA,” and is specified in the Document Chaining:Chaining Rule field as ‘Lambda::FunctionNameorARN’. All chaining rules must be enclosed in single quotes.
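The two chaining-rule target forms can be illustrated with a toy Python parser (purely illustrative; QnABot itself evaluates chaining rules as JavaScript expressions):

```python
def route_chaining_target(target):
    """Classify a QnABot chaining-rule target: 'QID::<id>' chains to
    another document, 'Lambda::<name-or-arn>' invokes a Lambda function."""
    if target.startswith("QID::"):
        return ("question", target[len("QID::"):])
    if target.startswith("Lambda::"):
        return ("lambda", target[len("Lambda::"):])
    raise ValueError(f"Unrecognized chaining target: {target}")

# For example, 'QID::Admin001' chains to item Admin.001's question document.
kind, value = route_chaining_target("QID::Admin001")
```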
Deploy the QnABot solution
Complete the following steps to deploy the solution:
- Choose Launch Solution on the QnABot implementation guide to deploy the latest QnABot template via AWS CloudFormation.
- Provide a name for the bot.
- Provide an email address where you will receive instructions to reset your password.
- Make sure that EnableCognitoLogin is set to true.
- For all other parameters, accept the defaults (see the implementation guide for parameter definitions), and launch the QnABot stack.
This post uses a static webpage hosted on Amazon CloudFront, and the QnABot chatbot is embedded in the page using the Amazon Lex web UI sample plugin. We also provide instructions for testing this solution using the QnABot client page.
Create a ServiceNow account
This section walks through the steps to create a ServiceNow account and ServiceNow developer instance:
- First, sign up for a ServiceNow account.
- Go to your email and confirm this email address for your ServiceNow ID.
- As part of the verification, you’ll be asked to provide the six-digit verification code sent to your email.
- You can skip the page that asks you to set up two-factor authentication. You’re redirected to the landing page with the ServiceNow Developer program.
- In the Getting Started steps, choose Yes, I need a developer-oriented IDE.
- Choose Start Building to set up an instance.
When the build is complete, which may take a few seconds to a few minutes, you will be provided with the instance URL, user name, and password details. Save this information to use in later steps.
- Log in to the site using the following URL (provide your instance): https://devXXXXXX.service-now.com/now/nav/ui/classic/params/target/change_request_list.do
Be sure to stay logged in to the ServiceNow developer instance throughout the process.
If you are logged out, use your email and password to log back in; this wakes up the instance and prevents hibernation.
- Choose All in the navigation bar, then choose Incidents.
- Select All to remove all of the filters.
All incidents will be shown on this page.
Create users in ServiceNow and an Amazon Cognito pool
You can create an incident using the userId of the chatbot user. To do so, we need to confirm that the userId of the chatbot user exists in ServiceNow. First, we create the ServiceNow user, then we create a user with the same ID in an Amazon Cognito user pool. Amazon Cognito is an AWS service to authenticate clients and provide temporary AWS credentials.
- Create a ServiceNow user. Be sure to include a first name, last name, and email.
Note down the user ID of the newly created user. You will need this when creating an Amazon Cognito user in a user pool.
- On the Amazon Cognito console, choose User pools in the navigation pane.
If you have deployed the Amazon Lex web UI plugin, you will see two user pool names; if you did not, you’ll see only one user pool name.
- Select the user pool that has your QnABot name and create a new user. Use the same userId as that of the ServiceNow user.
- If you are using the Amazon Lex web UI, create a user in the appropriate Amazon Cognito user pool by following the preceding steps.
Note that the userId you created will be used for the QnABot client and Amazon Lex Web UI client.
Create a Lambda function for invoking ServiceNow
In this step, you create a Lambda function that invokes the ServiceNow API to create a ticket.
- On the Lambda console, choose Functions in the navigation pane.
- Choose Create function.
- Select Author from scratch.
- For Function name, enter a name, such as qna-ChatBotLambda. (Remember that QnABot requires the prefix qna- in the name.)
- For Runtime, choose Node.js 18.x.
Creating this Lambda function also creates a new execution role. If you want to use an existing role instead, you can change the default AWS Identity and Access Management (IAM) execution role by selecting Use existing role.
- Choose Create function.
- After you create the function, use the inline editor to edit the code for index.js.
- Right-click on index.js and rename it to index.mjs.
- Enter the following code, which is sample code for the function that you’re using as the compute layer for our logic:
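The sample code is not reproduced here, so the following is a minimal, hypothetical sketch of what such an index.mjs handler could look like. It assumes the SERVICENOW_HOST and SERVICENOW_USERNAME environment variables and the Secrets Manager secret servicenow/password configured later in this post, and it uses the fetch API built into the Node.js 18 runtime to call the ServiceNow Incident API. Field names such as event.res.session.short_description are illustrative of how QnABot passes session data, not the exact code from the solution.

```javascript
// Hypothetical sketch of index.mjs; not the exact code shipped with the solution.
const secret_name = "servicenow/password";

// Build the JSON body for the ServiceNow Incident API.
function buildIncidentPayload(callerId, shortDescription) {
  return {
    caller_id: callerId,
    short_description: shortDescription,
    description: `Ticket created by QnABot for ${callerId}`,
  };
}

async function getServiceNowPassword() {
  // Imported lazily so the module also loads outside the Lambda runtime.
  const { SecretsManagerClient, GetSecretValueCommand } =
    await import("@aws-sdk/client-secrets-manager");
  const client = new SecretsManagerClient({});
  const resp = await client.send(
    new GetSecretValueCommand({ SecretId: secret_name })
  );
  return JSON.parse(resp.SecretString).password;
}

// Lambda hook entry point (in index.mjs this would be: export const handler = ...).
const handler = async (event) => {
  const host = process.env.SERVICENOW_HOST;
  const user = process.env.SERVICENOW_USERNAME;
  const password = await getServiceNowPassword();

  // QnABot passes the elicited response in the session attributes (illustrative paths).
  const callerId = event.req?._userInfo?.UserName ?? "unknown";
  const shortDescription =
    event.res?.session?.short_description ?? "Ticket created by QnABot";

  // Create the incident via the ServiceNow Incident (Table) API.
  const resp = await fetch(`https://${host}/api/now/table/incident`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization:
        "Basic " + Buffer.from(`${user}:${password}`).toString("base64"),
    },
    body: JSON.stringify(buildIncidentPayload(callerId, shortDescription)),
  });
  const body = await resp.json();

  // Return the ticket number to the user in the bot's answer.
  event.res.message = `Your ticket number is ${body.result?.number}`;
  return event;
};
```

The real solution code may structure the request and session handling differently; treat this only as a guide to the moving parts (environment variables, secret lookup, REST call, and returning the ticket number).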
This function uses the ServiceNow Incident API. For more information, refer to Create an incident.
- Choose Deploy to deploy this code to the $LATEST version of the Lambda function.
- On the Configuration tab, in the Environment variables section, add the following:
  - Add SERVICENOW_HOST with the value devXXXXXX.service-now.com.
  - Add SERVICENOW_USERNAME with the value admin.
- Copy the Lambda function ARN. You will need it at a later stage.
The next step is to store your ServiceNow user name and password in Secrets Manager.
- On the Secrets Manager console, create a new secret.
- Select Other type of secret.
- Add your key-value pairs as shown and choose Next.
- For Secret name, enter a descriptive name (for this post, servicenow/password). If you choose a different name, update the value of the constant secret_name in the Lambda function code.
- Choose Next.
- Leave Configure rotation on default and choose Next.
- Review the secret information and choose Store.
- Copy the ARN of the newly created secret.
Now let’s give Lambda permissions to Secrets Manager.
- On the Lambda function page, go to the Configuration tab and navigate to the Permissions section.
- Choose the execution role name to open the IAM page for the role.
- In the following inline policy, provide the ARN of the secret you created earlier:
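The policy itself is not reproduced here; a minimal inline policy granting read access to the secret could look like the following (the statement ID is arbitrary, and the Resource value is a placeholder for the secret ARN you copied):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowGetServiceNowSecret",
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:servicenow/password-XXXXXX"
    }
  ]
}
```

Scoping the policy to the single secret ARN, rather than all secrets, follows the principle of least privilege.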
Configure QnABot configurations
In this section, we first create some knowledge questions using the Questions feature of QnABot. We then create a response bot that elicits a response from a user when they ask for help. This bot uses document chaining to call another bot, and triggers Lambda to create a ServiceNow ticket.
For more information about using QnABot with generative AI, refer to Deploy generative AI self-service question answering using the QnABot on AWS solution powered by Amazon Lex with Amazon Kendra, and Amazon Bedrock.
Create knowledge question 1
Create a knowledge question for installing software:
- On the AWS CloudFormation console, navigate to the QnABot stack.
- On the Outputs tab, open the link for ContentDesignerURL.
- Log in to the QnABot Content Designer using admin credentials.
- Choose Add to add a new question.
- Select qna.
- For Item ID, enter software.001.
- Under Questions/Utterances, enter the following:
- Under Answer, enter the following answer:
- Expand the Advanced section and enter the same text in Markdown Answer.
- Leave the rest as default, and choose Create to save the question.
Create knowledge question 2
Now you create the second knowledge question.
- Choose Add to add a new question.
- Select qna.
- For Item ID, enter knowledge.001.
- Under Questions/Utterances, enter Want to learn more about Amazon Lex.
- Under Answer, enter the following answer:
- Expand the Advanced section and enter the same answer under Markdown Answer.
- Leave the rest as default, and choose Create to save the question.
Create knowledge question 3
Complete the following steps to add another knowledge question:
- Choose Add to add a new question.
- Select qna.
- For Item ID, enter password.reset.
- Under Questions/Utterances, enter I need to reset my password.
- Under Answer, enter the following answer:
- Expand the Advanced section and enter the same text for Markdown Answer.
- Choose Create to save the question.
Create a response bot
Complete the following steps to create the first response bot, which elicits a response:
- Choose Add to add a new question.
- Select qna.
- For Item ID, enter ElicitResponse.001.
- Under Questions/Utterances, enter Please create a ticket.
- Under Answer, enter the following answer:
- Expand the Advanced section and navigate to the Elicit Response section.
- For Elicit Response: ResponseBot Hook, enter QNAFreeText.
- For Elicit Response: Response Session Attribute Namespace, enter short_description.
This creates a slot named short_description that captures the response, which serves as the description for the incident. The slot uses the built-in QNAFreeText slot type, which captures free-form text.
- For Document Chaining: Chaining Rule, enter 'QID::item.002'. This value must be in single quotes. Remember this chaining rule; you will use it when creating your document chain.
- Leave the rest as default.
- Choose Create to save the question.
Create a document chain
Now we create a document chain in QnABot that will trigger the Lambda function to create a ticket and respond with a ticket number. Document chaining allows you to chain two bots based on the rule you configured. Complete the following steps:
- Choose Add to add a new question.
- Select qna.
- For Item ID, enter item.002. This should match the QID value given in the document chaining rule earlier.
- Under Questions/Utterances, enter servicenow integration.
- Under Answer, enter the following answer:
- In the Advanced section, add the Lambda function ARN for Lambda Hook.
- Choose Create to save the question.
Test the QnABot
To test the QnABot default client, complete the following steps:
- Choose the options menu in the Content Designer and choose QnABot Client.
The QnABot client will open in a new browser tab.
- Log in using the newly created user credentials to begin the test.
If you plan to use the Amazon Lex Web UI on a static page, follow these instructions.
- Choose the chat icon at the bottom of the page to start the chat.
- To log in, choose Login on the menu.
You will be routed to the login page.
- Provide the userId created earlier.
- For first-time logins, you will be prompted to reset your password.
- Now we can test the chatbot with example use cases. For our first use case, we want to learn about Amazon Lex and enter the question “I want to learn about Amazon Lex, can you give me some information about it?” QnABot provides a video and some links to resources.
- In our next example, we need to install software on our laptop and ask “Can you give me instructions to install software?” QnABot understands that the user is requesting help installing software and provides answers from the knowledge bank. You can follow those instructions to install the software you need.
- While installing the software, what if your account gets locked due to multiple failed login attempts? To request a password reset, you can ask “I need to reset my password.”
- You might need additional assistance resetting the password and want to create a ticket. In this case, enter “Please create a ticket.” QnABot asks for a description of the problem; you can enter “reset password.” QnABot creates a ticket with the description provided and includes the ticket number in its response.
- You can verify the incident ticket was created on the ServiceNow console under Incidents. If the ticket is not shown on the first page, search for the ticket number using the search toolbar.
Clean up
To avoid incurring future charges, delete the resources you created. For instructions to uninstall the QnABot solution plugin, refer to Uninstall the solution.
Conclusion
Integrating QnABot on AWS with ServiceNow provides an end-to-end solution for automated customer support. With QnABot’s conversational AI capabilities to understand customer questions and ServiceNow’s robust incident management features, companies can streamline ticket creation and resolution. You can also extend this solution to show a list of tickets created by the user. For more information about incorporating these techniques into your bots, see QnABot on AWS.
About the Authors
Sujatha Dantuluri is a Senior Solutions Architect in the US federal civilian team at AWS. She has over 20 years of experience supporting commercial and federal government. She works closely with customers in building and architecting mission-critical solutions. She has also contributed to IEEE standards.
Maia Haile is a Solutions Architect at Amazon Web Services based in the Washington, D.C. area. In that role, she helps public sector customers achieve their mission objectives with well-architected solutions on AWS. She has 5 years of experience spanning nonprofit healthcare, media and entertainment, and retail. Her passion is using AI and ML to help public sector customers achieve their business and technical goals.
Deploy large language models for a healthtech use case on Amazon SageMaker
In 2021, the pharmaceutical industry generated $550 billion in US revenue. Pharmaceutical companies sell a variety of different, often novel, drugs on the market, where sometimes unintended but serious adverse events can occur.
These events can be reported anywhere, from hospitals or at home, and must be responsibly and efficiently monitored. Traditional manual processing of adverse events is made challenging by the increasing volume of health data and rising costs. Overall, pharmacovigilance activities were projected to cost the healthcare industry $384 billion by 2022. To support overarching pharmacovigilance activities, our pharmaceutical customers want to use the power of machine learning (ML) to automate adverse event detection from various data sources, such as social media feeds, phone calls, emails, and handwritten notes, and trigger appropriate actions.
In this post, we show how to develop an ML-driven solution using Amazon SageMaker for detecting adverse events using the publicly available Adverse Drug Reaction Dataset on Hugging Face. In this solution, we fine-tune a variety of models on Hugging Face that were pre-trained on medical data and use the BioBERT model, which was pre-trained on the Pubmed dataset and performs the best out of those tried.
We implemented the solution using the AWS Cloud Development Kit (AWS CDK). However, we don’t cover the specifics of building the solution in this post. For more information on the implementation of this solution, refer to Build a system for catching adverse events in real-time using Amazon SageMaker and Amazon QuickSight.
This post delves into several key areas, providing a comprehensive exploration of the following topics:
- The data challenges encountered by AWS Professional Services
- The landscape and application of large language models (LLMs):
- Transformers, BERT, and GPT
- Hugging Face
- The fine-tuned LLM solution and its components:
- Data preparation
- Model training
Data challenge
Data skew is often a problem in classification tasks. Ideally, you would like a balanced dataset, and this use case is no exception.
We address this skew with generative AI models (Falcon-7B and Falcon-40B), which were prompted to generate event samples based on five examples from the training set, increasing both the semantic diversity and the sample size of labeled adverse events. The Falcon models are advantageous here because, unlike some LLMs on Hugging Face, Falcon publishes the dataset it was trained on, so you can verify that none of your test set examples are contained in the Falcon training set and avoid data contamination.
The other data challenge for healthcare customers is HIPAA compliance. Encryption at rest and in transit has to be incorporated into the solution to meet this requirement.
Transformers, BERT, and GPT
The transformer architecture is a neural network architecture used for natural language processing (NLP) tasks. It was first introduced in the paper “Attention Is All You Need” by Vaswani et al. (2017) and is based on the attention mechanism, which allows the model to learn long-range dependencies between words. Transformers, as laid out in the original paper, consist of two main components: the encoder and the decoder. The encoder takes the input sequence and produces a sequence of hidden states; the decoder then takes these hidden states as input and produces the output sequence. The attention mechanism, used in both the encoder and the decoder, allows the model to attend to specific words in the input sequence when generating the output sequence. This ability to capture long-range dependencies is essential for many NLP tasks, such as machine translation and text summarization.
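For reference, the scaled dot-product attention at the heart of this mechanism, as defined in the original paper, can be written as:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
```

Here Q, K, and V are the query, key, and value matrices and d_k is the key dimension; the softmax weights determine how strongly each input position contributes to each output position, which is what lets the model relate distant words.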
One of the more popular and useful of the transformer architectures, Bidirectional Encoder Representations from Transformers (BERT), is a language representation model that was introduced in 2018. BERT is trained on sequences where some of the words in a sentence are masked, and it has to fill in those words taking into account both the words before and after the masked words. BERT can be fine-tuned for a variety of NLP tasks, including question answering, natural language inference, and sentiment analysis.
The other popular transformer architecture that has taken the world by storm is Generative Pre-trained Transformer (GPT). The first GPT model was introduced in 2018 by OpenAI. It works by being trained to strictly predict the next word in a sequence, only aware of the context before the word. GPT models are trained on a massive dataset of text and code, and they can be fine-tuned for a range of NLP tasks, including text generation, question answering, and summarization.
In general, BERT is better at tasks that require deeper understanding of the context of words, whereas GPT is better suited for tasks that require generating text.
Hugging Face
Hugging Face is an artificial intelligence company that specializes in NLP. It provides a platform with tools and resources that enable developers to build, train, and deploy ML models focused on NLP tasks. One of the key offerings of Hugging Face is its library, Transformers, which includes pre-trained models that can be fine-tuned for various language tasks such as text classification, translation, summarization, and question answering.
Hugging Face integrates seamlessly with SageMaker, which is a fully managed service that enables developers and data scientists to build, train, and deploy ML models at scale. This synergy benefits users by providing a robust and scalable infrastructure to handle NLP tasks with the state-of-the-art models that Hugging Face offers, combined with the powerful and flexible ML services from AWS. You can also access Hugging Face models directly from Amazon SageMaker JumpStart, making it convenient to start with pre-built solutions.
Solution overview
We used the Hugging Face Transformers library to fine-tune transformer models on SageMaker for the task of adverse event classification. The training job is built using the SageMaker PyTorch estimator. SageMaker JumpStart also has some complementary integrations with Hugging Face that make this straightforward to implement. In this section, we describe the major steps involved in data preparation and model training.
Data preparation
We used the Adverse Drug Reaction Data (ade_corpus_v2) within the Hugging Face dataset with an 80/20 training/test split. The required data structure for our model training and inference has two columns:
- One column for text content as model input data.
- Another column for the label class. We have two possible classes for a text: Not_AE and Adverse_Event.
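As an illustration (with made-up example rows, not the actual corpus), the two-column structure and an 80/20 train/test split can be sketched as:

```python
import random

def split_rows(rows, test_frac=0.2, seed=42):
    """Shuffle and split rows into train and test lists (default 80/20)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]

# Two columns: text (model input) and label (Not_AE or Adverse_Event).
rows = [
    {"text": "Patient developed a severe rash after starting drug X.",
     "label": "Adverse_Event"},
    {"text": "Drug X is commonly prescribed for hypertension.",
     "label": "Not_AE"},
] * 50  # repeated rows, purely to illustrate the split

train, test = split_rows(rows)
```

In practice you would load ade_corpus_v2 through the Hugging Face datasets library rather than build rows by hand; the sketch only shows the shape of the data the training job expects.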
Model training and experimentation
To efficiently explore the space of possible Hugging Face models to fine-tune on our combined data of adverse events, we constructed a SageMaker hyperparameter optimization (HPO) job and passed in different Hugging Face models as a hyperparameter, along with other important hyperparameters such as training batch size, sequence length, and learning rate. The training jobs used an ml.p3dn.24xlarge instance and took an average of 30 minutes per job with that instance type. Training metrics were captured through the Amazon SageMaker Experiments tool, and each training job ran through 10 epochs.
We specify the following in our code:
- Training batch size – Number of samples that are processed together before the model weights are updated
- Sequence length – Maximum length of the input sequence that BERT can process
- Learning rate – How quickly the model updates its weights during training
- Models – Hugging Face pretrained models
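A sketch of such a search space (the specific values are illustrative, not the exact ranges used in our experiments) might look like the following; in a SageMaker HPO job these would map to CategoricalParameter, IntegerParameter, and ContinuousParameter objects passed to a HyperparameterTuner:

```python
# Illustrative HPO search space; the model choice itself is a categorical hyperparameter.
search_space = {
    "model_name": [                      # categorical: Hugging Face model IDs
        "bert-base-uncased",
        "monologg/biobert_v1.1_pubmed",
    ],
    "train_batch_size": [16, 32, 64],    # samples processed per weight update
    "max_seq_length": [128, 256, 512],   # longest token sequence BERT will process
    "learning_rate": (1e-5, 5e-5),       # continuous range for the optimizer step size
}
```

Treating the model ID as just another hyperparameter lets a single tuning job compare architectures and training settings in one sweep.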
Results
The model that performed best in our use case was monologg/biobert_v1.1_pubmed, hosted on Hugging Face: a BERT model pre-trained on a PubMed corpus of 19,717 scientific publications. Pre-training on this dataset gives the model extra expertise in identifying context around medically related scientific terms. This boosts the model’s performance on the adverse event detection task because it has been pre-trained on the medically specific language that appears often in our dataset.
The following table summarizes our evaluation metrics.
| Model | Precision | Recall | F1 |
| --- | --- | --- | --- |
| Base BERT | 0.87 | 0.95 | 0.91 |
| BioBERT | 0.89 | 0.95 | 0.92 |
| BioBERT with HPO | 0.89 | 0.96 | 0.929 |
| BioBERT with HPO and synthetically generated adverse events | 0.90 | 0.96 | 0.933 |
Although these are relatively small and incremental improvements over the base BERT model, this nevertheless demonstrates some viable strategies to improve model performance through these methods. Synthetic data generation with Falcon seems to hold a lot of promise and potential for performance improvements, especially as these generative AI models get better over time.
Clean up
To avoid incurring future charges, delete the resources you created, such as the model and model endpoints, with the following code:
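The snippet itself is not included here; a minimal boto3-based sketch (the resource names are placeholders for whatever you deployed) could look like:

```python
def cleanup_sagemaker_resources(endpoint_name: str, model_name: str) -> None:
    """Delete the SageMaker endpoint, its endpoint config, and the model."""
    import boto3  # imported inside the function so the sketch loads without AWS installed

    sm = boto3.client("sagemaker")
    sm.delete_endpoint(EndpointName=endpoint_name)
    sm.delete_endpoint_config(EndpointConfigName=endpoint_name)
    sm.delete_model(ModelName=model_name)

# Example (replace with the names you used when deploying):
# cleanup_sagemaker_resources("adverse-event-endpoint", "biobert-adverse-event-model")
```

If you deployed through the SageMaker Python SDK, calling delete_model() and delete_endpoint() on the predictor object achieves the same result.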
Conclusion
Many pharmaceutical companies today would like to automate the process of identifying adverse events from their customer interactions in a systematic way in order to help improve customer safety and outcomes. As we showed in this post, the fine-tuned LLM BioBERT with synthetically generated adverse events added to the data classifies the adverse events with high F1 scores and can be used to build a HIPAA-compliant solution for our customers.
As always, AWS welcomes your feedback. Please leave your thoughts and questions in the comments section.
About the authors
Zack Peterson is a data scientist in AWS Professional Services. He has been hands on delivering machine learning solutions to customers for many years and has a master’s degree in Economics.
Dr. Adewale Akinfaderin is a senior data scientist in Healthcare and Life Sciences at AWS. His expertise is in reproducible and end-to-end AI/ML methods, practical implementations, and helping global healthcare customers formulate and develop scalable solutions to interdisciplinary problems. He has two graduate degrees in Physics and a doctorate degree in Engineering.
Ekta Walia Bhullar, PhD, is a senior AI/ML consultant with the AWS Healthcare and Life Sciences (HCLS) Professional Services business unit. She has extensive experience in the application of AI/ML within the healthcare domain, especially in radiology. Outside of work, when not discussing AI in radiology, she likes to run and hike.
Han Man is a Senior Data Science & Machine Learning Manager with AWS Professional Services based in San Diego, CA. He has a PhD in Engineering from Northwestern University and has several years of experience as a management consultant advising clients in manufacturing, financial services, and energy. Today, he is passionately working with key customers from a variety of industry verticals to develop and implement ML and generative AI solutions on AWS.
Twitch Streamer Mr_Vudoo Supercharges Gaming, Entertaining and Video Editing With RTX This Week ‘In the NVIDIA Studio’
Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.
Mr_Vudoo is a digital renaissance man — a livestreamer, video editor, gamer and entertainer skilled in producing an array of content for his audience.
This week’s featured artist In the NVIDIA Studio recently acquired a new GeForce RTX 4080 SUPER graphics card, which helps creators like him take their content to the next level. (Read more about the 4080 SUPER below.)
There’s no better place for creative types to connect with others and explore what’s next in AI and accelerated computing than GTC, which is back in-person, from March 18-21 in San Jose.
From the keynote by NVIDIA founder and CEO Jensen Huang to hundreds of sessions, exhibits and networking events, GTC delivers something for every technical level and interest area, including sessions on how to power content creation using OpenUSD and generative AI. GTC registration is open for virtual or in-person attendance.
In other NVIDIA Studio news, Topaz Labs, a company that delivers AI-powered photo and video enhancement software, recently adopted NVIDIA TensorRT acceleration for its new Remove Tool. It uses AI to replace unwanted objects in an image with a context-aware background. The tool expedites photo editing workflows, delivering 2.4x faster processing on a GeForce RTX 4090 GPU compared with an Apple MacBook M3 Max.
Mr_Vudoo Taps RTX for a Livestreaming Upgrade
Mr_Vudoo’s Twitch channel is known for its variety of unique content. In his Trading Card Games series, Mr_Vudoo opens trading card packs and competes at one of the highest levels live on stream. In his Gameplay series, he goes back to his original love for gaming, streaming both multiplayer online games and third-person shooters. And in his In Real Life series, he breaks the norm of traditional streaming by bringing his viewers outside to share his everyday life experiences.
It takes significant computing power to bring these series to life. Mr_Vudoo’s GeForce RTX 4080 SUPER features the eighth-generation NVIDIA NVENC — an independent component for encoding video that frees up the system to run games or tackle other compute-intensive tasks. Using it, Mr_Vudoo can achieve a more seamless streaming experience.
“It’s a revelation to stream with high-quality settings and get almost no performance loss in games,” he said.
Mr_Vudoo can also join in the new Twitch Enhanced Broadcasting beta, powered by GeForce RTX GPUs and NVENC, to broadcast up to three resolutions simultaneously at up to 1080p. This eliminates the need to trade off resolution for stream reliability.
In the coming months, Enhanced Broadcasting beta testers will be able to experiment with higher input bit rates, up to 4K resolutions, support for up to five concurrent streams and new codecs.
Mr_Vudoo uses a number of AI-enabled features in the video and photo editing part of his creative workflow. With RTX acceleration, he can add a video camera effect with a timed zoom and a film strip transition in real time without having to render the entire project.
“Adding multiple effects on a clip without affecting the preview of the video is a massive time-saver,” he said.
DaVinci Resolve has several RTX-accelerated AI features that can boost content creation, offering tools to smooth slow-motion effects or provide seamless video super resolution. These features are available to all RTX GPU owners.
“GeForce RTX graphics cards are the best GPUs for video editors to use, as they can render tasks much faster, allowing us to become more efficient with our work.” — Mr_Vudoo
Mr_Vudoo can quickly export files with the RTX 4080 SUPER’s dual encoders, which work in tandem to slash export times nearly in half.
In post-production, Mr_Vudoo uses Adobe Photoshop’s AI-powered subject selection tool to quickly isolate objects in an image, instead of having to manually crop them out, speeding his work.
Mr_Vudoo also taps the free NVIDIA Broadcast app to boost his productivity.
“I’ve utilized the video noise removal and background replacement the most,” he said. “The eye contact feature was very interesting and quite honestly took me by surprise at how well it worked.”
AI has become an irreplaceable part of Mr_Vudoo’s content creation process, helping him quickly and effortlessly produce his best work. Catch him on Twitch.
RTX 4080 SUPER Brings Super Performance
GeForce RTX 4080 SUPER graphics cards are changing the content creation game.
Generative AI apps like Adobe Photoshop can take advantage of the GPU’s Tensor Cores to speed productivity and creative workflows. With the 4080 SUPER, 3D apps like Blender can run up to 70% faster than on previous-generation graphics cards. And video editing apps like Blackmagic Design’s DaVinci Resolve can accelerate AI effects over 30% faster than with the GeForce RTX 3080 Ti.
For gamers, the RTX 4080 SUPER enables greater immersion, with fully ray-traced visuals and the ability to run all settings at max. It delivers twice the speed of the RTX 3080 Ti, up to 836 trillion operations per second, in the most graphically intensive games with DLSS Frame Generation.
Since its release, GeForce RTX 4080 SUPER Series GPUs have been put to the test in creating, gaming and other AI-powered tasks. Here’s what some reviewers had to say:
- “Jumping over to creative professionals and content creators, the 4080 Super also provides nice performance gains over the standard GeForce RTX 4080 in applications like Blender, Maya, After Effects, DaVinci Resolve and more. This means users can take full advantage of what the NVIDIA 4080 SUPER offers in much more than just gaming and can push the software they use for streaming, video creation, audio and 3D creation to get the most out of their PC.” – CG Magazine
- “Features like NVIDIA Broadcast and Reflex hold deep practical appeal; RTX Video Super Resolution uses AI to make ugly videos beautiful. And NVIDIA maintains a strong lead in most creative and machine learning/AI workloads if you like to put your GPU to work when you’re not playing — witness the dual AV1 encoders in the 4080 SUPER” — PC World
- “Blender can make use of the RT cores on NVIDIA’s GPUs through the OptiX ray tracing rendering engine, and as a result, performance is much higher than any competing GPU in a similar class. The GeForce RTX 4080 SUPER notches another victory over its namesake, and dang the RTX 4090 is a beast.” – Hot Hardware
- “In terms of creative performance, the RTX 4080 SUPER walks away the winner against the RX 7900 XTX, even if you don’t factor in the fact that Blender Benchmark 4.0.0 workloads wouldn’t even run on the RX 7900 XTX (though the RX 7900 XT was able to run them, just not nearly as well).” – Tech Radar
- “And when you throw in RTX technologies like DLSS into the mix, NVIDIA’s superior AV1 encoding quality, content-creator-friendly features, and performance, plus generative AI capabilities – and there’s a lot more to the story here than pure 4K gaming expertise.” — TweakTown
Discover what RTX 4080 SUPER Series graphics cards and systems are available.
PyTorch 2 paper and tutorial @ ASPLOS 2024
The PyTorch team is excited to share that our paper on PyTorch 2 has been accepted for presentation at the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), scheduled to take place from April 27 to May 1, 2024, in San Diego, CA, USA.
The paper delves into the implementation of torch.compile and highlights the key technologies driving it, including TorchDynamo (graph capture), TorchInductor (backend compiler), and Dynamic Shape support.
During the ASPLOS conference, we’ll be conducting a tutorial on Saturday, April 27, focusing on the inner workings of PyTorch 2 and how systems researchers can leverage and build upon it. Stay tuned for more details as the event approaches – we look forward to your participation!
A preview of the paper is attached below:
Title: PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation. Full Paper PDF
Abstract
This paper introduces two extensions to the popular PyTorch machine learning framework, TorchDynamo and TorchInductor, which implement the torch.compile feature released in PyTorch 2. TorchDynamo is a Python-level just-in-time (JIT) compiler that enables graph compilation in PyTorch programs without sacrificing the flexibility of Python. It achieves this by dynamically modifying Python bytecode before execution and extracting sequences of PyTorch operations into an FX graph, which is then JIT compiled using one of many extensible backends. TorchInductor is the default compiler backend for TorchDynamo, which translates PyTorch programs into OpenAI’s Triton for GPUs and C++ for CPUs. Results show that TorchDynamo is able to capture graphs more robustly than prior approaches while adding minimal overhead, and TorchInductor is able to provide a 2.27x inference and 1.41x training geometric mean speedup on an NVIDIA A100 GPU across 180+ real-world models, which outperforms six other compilers. These extensions provide a new way to apply optimizations through compilers in eager mode frameworks like PyTorch.
Authors
Jason Ansel (Meta); Edward Yang (Meta); Horace He (Meta); Natalia Gimelshein (OpenAI); Animesh Jain (Meta); Michael Voznesensky (Meta); Bin Bao (Meta); David Berard (Meta); Geeta Chauhan (Meta); Anjali Chourdia (Meta); Will Constable (Meta); Alban Desmaison (Meta); Zachary DeVito (Meta); Elias Ellison (Meta); Will Feng (Meta); Jiong Gong (Intel); Michael Gschwind (Meta); Brian Hirsh (Meta); Sherlock Huang (Meta); Laurent Kirsch (Meta); Michael Lazos (Meta); Yanbo Liang (Meta); Jason Liang (Meta); Yinghai Lu (Meta); CK Luk (Meta); Bert Maher (Meta); Yunjie Pan (University of Michigan); Christian Puhrsch (Meta); Matthias Reso (Meta); Mark Saroufim (Meta); Helen Suk (Meta); Michael Suo (Meta); Phil Tillet (OpenAI); Eikan Wang (Intel); Xiaodong Wang (Meta); William Wen (Meta); Shunting Zhang (Meta); Xu Zhao (Meta); Keren Zhou (OpenAI & George Mason University); Richard Zou (Meta); Ajit Mathews (Meta); Gregory Chanan (Meta); Peng Wu (Meta); Soumith Chintala (Meta)