NVIDIA Releases cuPyNumeric, Enabling Scientists to Harness GPU Acceleration at Cluster Scale

Whether they’re looking at nanoscale electron behaviors or starry galaxies colliding millions of light years away, many scientists share a common challenge — they must comb through petabytes of data to extract insights that can advance their fields.

With the NVIDIA cuPyNumeric accelerated computing library, researchers can now take their data-crunching Python code and effortlessly run it on CPU-based laptops and GPU-accelerated workstations, cloud servers or massive supercomputers. The faster they can work through their data, the quicker they can make decisions about promising data points, trends worth investigating and adjustments to their experiments.

To make the leap to accelerated computing, researchers don’t need expertise in computer science. They can simply write code using the familiar NumPy interface or apply cuPyNumeric to existing code, following best practices for performance and scalability.

Once cuPyNumeric is applied, they can run their code on one or thousands of GPUs with zero code changes.
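
As a rough illustration of that drop-in workflow, the sketch below swaps the NumPy import for cuPyNumeric in an otherwise ordinary array computation. This is a minimal sketch, not an official example: the import name follows the library’s published naming (earlier releases used cunumeric), the array sizes and statistics are made up, and operations the library has not yet implemented are expected to fall back to standard NumPy.

```python
# Minimal sketch of the drop-in pattern: the only change from a plain NumPy
# script is the import line. Sizes and the computation are illustrative.
import cupynumeric as np   # instead of: import numpy as np

data = np.random.rand(10_000, 1_000)               # synthetic measurements
centered = data - data.mean(axis=0)                # center each column
cov = centered.T @ centered / (data.shape[0] - 1)  # covariance estimate
print(float(cov.max()), float(cov.mean()))         # quick summary statistics
```

The same script can then be launched unchanged on larger machines through the library’s launcher (for example, an invocation along the lines of `legate --gpus 8 script.py`), though the exact command depends on the installation.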

The latest version of cuPyNumeric, now available on Conda and GitHub, offers support for the NVIDIA GH200 Grace Hopper Superchip, automatic resource configuration at run time and improved memory scaling. It also supports HDF5, a popular file format in the scientific community that helps efficiently manage large, complex data.
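
For teams whose data already lives in HDF5, the hedged sketch below reads a dataset with the standard h5py package and hands it to the same cuPyNumeric workflow. The file name, dataset name and background-subtraction step are hypothetical, and cuPyNumeric’s own HDF5 support may offer a more direct ingestion path than this generic read.

```python
import h5py
import cupynumeric as np  # drop-in NumPy replacement, as above

# Hypothetical detector file; "frames" is assumed to be a 3D dataset.
with h5py.File("experiment_run_042.h5", "r") as f:
    frames = np.asarray(f["frames"][...])  # load the full dataset

background = frames.mean(axis=0)           # average frame as background
cleaned = frames - background              # simple background subtraction
print(cleaned.shape, float(cleaned.std()))
```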

Researchers at the SLAC National Accelerator Laboratory, Los Alamos National Laboratory, Australian National University, UMass Boston, the Center for Turbulence Research at Stanford University and the National Payments Corporation of India are among those who have integrated cuPyNumeric to achieve significant improvements in their data analysis workflows.

Less Is More: Limitless GPU Scalability Without Code Changes

Python is the most common programming language for data science, machine learning and numerical computing, used by millions of researchers in scientific fields including astronomy, drug discovery, materials science and nuclear physics. Tens of thousands of packages on GitHub depend on the NumPy math and matrix library, which had over 300 million downloads last month. All of these applications could benefit from accelerated computing with cuPyNumeric.

Many of these scientists build programs that use NumPy and run on a single CPU-only node — limiting the throughput of their algorithms to crunch through increasingly large datasets collected by instruments like electron microscopes, particle colliders and radio telescopes.

cuPyNumeric helps researchers keep pace with the growing size and complexity of their datasets by providing a drop-in replacement for NumPy that can scale to thousands of GPUs. cuPyNumeric doesn’t require code changes when scaling from a single GPU to a whole supercomputer. This makes it easy for researchers to run their analyses on accelerated computing systems of any size.

Solving the Big Data Problem, Accelerating Scientific Discovery

Researchers at SLAC National Accelerator Laboratory, a U.S. Department of Energy lab operated by Stanford University, have found that cuPyNumeric helps them speed up X-ray experiments conducted at the Linac Coherent Light Source.

A SLAC team focused on materials science discovery for semiconductors found that cuPyNumeric accelerated its data analysis application by 6x, decreasing run time from minutes to seconds. This speedup allows the team to run important analyses in parallel when conducting experiments at this highly specialized facility.

By using experiment hours more efficiently, the team anticipates it will be able to discover new material properties, share results and publish work more quickly.

Other institutions using cuPyNumeric include: 

  • Australian National University, where researchers used cuPyNumeric to scale the Levenberg-Marquardt optimization algorithm to run on multi-GPU systems at the country’s National Computational Infrastructure. While the algorithm can be used for many applications, the researchers’ initial target is large-scale climate and weather models.
  • Los Alamos National Laboratory, where researchers are applying cuPyNumeric to accelerate data science, computational science and machine learning algorithms. cuPyNumeric will provide them with additional tools to effectively use the recently launched Venado supercomputer, which features over 2,500 NVIDIA GH200 Grace Hopper Superchips.
  • Stanford University’s Center for Turbulence Research, where researchers are developing Python-based computational fluid dynamics solvers that can run at scale on large accelerated computing clusters using cuPyNumeric. These solvers can seamlessly integrate large collections of fluid simulations with popular machine learning libraries like PyTorch, enabling complex applications including online training and reinforcement learning.
  • UMass Boston, where a research team is accelerating linear algebra calculations to analyze microscopy videos and determine the energy dissipated by active materials. The team used cuPyNumeric to decompose a matrix of 16 million rows and 4,000 columns.
  • National Payments Corporation of India, the organization behind a real-time digital payment system used by around 250 million Indians daily and expanding globally. NPCI uses complex matrix calculations to track transaction paths between payers and payees. With current methods, it takes about 5 hours to process data for a one-week transaction window on CPU systems. A trial showed that applying cuPyNumeric to accelerate the calculations on multi-node NVIDIA DGX systems could speed up matrix multiplication by 50x, enabling NPCI to process larger transaction windows in less than an hour and detect suspected money laundering in near real time. (A hedged sketch of this kind of large-scale matrix computation follows this list.)
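
The sketch below gives a rough sense of how such a computation can be expressed: a large matrix product written in plain NumPy syntax, which the cuPyNumeric runtime can partition across whatever GPUs are available. The matrix sizes, the transaction-graph interpretation and the thresholding rule are illustrative assumptions, not NPCI’s actual workload.

```python
import cupynumeric as np  # same drop-in import as in the earlier sketches

# Illustrative only: a dense matrix of payer-to-payee transaction weights,
# squared to surface two-hop payment paths between accounts.
n_accounts = 20_000
transactions = np.random.rand(n_accounts, n_accounts)

two_hop = transactions @ transactions      # the runtime partitions this matmul
threshold = two_hop.mean() + 3 * two_hop.std()
suspicious = two_hop > threshold           # crude flag for unusually heavy paths
print(suspicious.sum(), "account pairs flagged for review")
```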

To learn more about cuPyNumeric, see a live demo in the NVIDIA booth at the Supercomputing 2024 conference in Atlanta, join the theater talk in the expo hall and participate in the cuPyNumeric workshop.   

Watch the NVIDIA special address at SC24.

Read More

How InsuranceDekho transformed insurance agent interactions using Amazon Bedrock and generative AI

This post is co-authored with Nishant Gupta from InsuranceDekho.

The insurance industry is complex and overwhelming, with numerous options that can be hard for consumers to understand. This complexity hinders customers from making informed decisions. As a result, customers face challenges in selecting the right insurance coverage, while insurance aggregators and agents struggle to provide clear and accurate information.

InsuranceDekho is a leading InsurTech service that offers a wide range of insurance products from over 49 insurance companies in India. The service operates through a vast network of 150,000 point of sale person (POSP) agents and direct-to-customer channels. InsuranceDekho uses cutting-edge technology to simplify the insurance purchase process for all users. The company’s mission is to make insurance transparent, accessible, and hassle-free for all customers through tech-driven solutions.

In this post, we explain how InsuranceDekho harnessed the power of generative AI using Amazon Bedrock and Anthropic’s Claude to provide responses to customer queries on policy coverages, exclusions, and more. This lets our customer care agents and POSPs confidently help customers understand policies while providing sales and after-sales services, without reaching out to insurance subject matter experts (SMEs) or memorizing complex plans. The use of this solution has improved sales, cross-selling, and the overall customer service experience.

Amazon Bedrock provided the flexibility to explore various leading LLM models using a single API, reducing the undifferentiated heavy lifting associated with hosting third-party models. Leveraging this, InsuranceDekho developed the industry’s first Health Pro Genie with the most efficient engine. It facilitates the insurance agents to choose the right plan for the end customer from the pool of over 125 health plans from 21 different health insurers available on the InsuranceDekho platform.

– Ish Babbar, Co-Founder and CTO, InsuranceDekho

The challenge

InsuranceDekho faced a significant challenge in responding to customer queries on insurance products in a timely manner. For a given lead, the insurance advisors, particularly those who are new to insurance, would often reach out to SMEs to inquire about policy or product-specific queries. The added step of SME consultation resulted in a process slowdown, requiring advisors to await expert input before responding to customers, introducing delays of a few minutes. Additionally, although SMEs can provide valuable guidance and expertise, their involvement introduces additional costs.

This delay not only affects the customer’s experience but also results in lost prospects, because potential customers may decide not to purchase and instead explore competing services if those offer better clarity. The existing process was inefficient, and InsuranceDekho needed a solution to empower its agents to respond to customer queries confidently and efficiently, without requiring excessive memorization.

The following figure depicts a common scenario where an SME receives multiple calls from insurance advisors, resulting in delays for the customers. Because an SME can handle only one call at a time, the advisors are left waiting for a response. This further prolongs the time it takes for customers to get clarity on the insurance product and decide which product they want to purchase.

Solution overview

To overcome the limitations of relying on SMEs, a generative AI-based chat assistant was developed to autonomously resolve agent queries with accuracy. One of the key considerations while designing the chat assistant was to avoid responses from the default large language model (LLM) trained on generic data and only use the insurance policy documents. To generate such high-quality responses, we decided to go with the Retrieval Augmented Generation (RAG) approach using Amazon Bedrock and Anthropic’s Claude Haiku.

Amazon Bedrock

We conducted a thorough evaluation of several generative AI model providers and selected Amazon Bedrock as our primary provider for our foundation model (FM) needs. The key reasons that influenced this decision are listed below, followed by a minimal invocation sketch:

  • Managed service – Amazon Bedrock is a fully managed, serverless offering that provides a choice of industry-leading FMs without the need to provision infrastructure, procure GPUs around the clock, or configure ML frameworks. As a result, it significantly reduces development and deployment overhead and total cost of ownership, while enhancing efficiency and accelerating innovation in disruptive technologies like generative AI.
  • Continuous model enhancements – Amazon Bedrock provides access to a vast and continuously expanding set of FMs through a single API. The continuous additions and updates to its model portfolio facilitate access to the latest advancements and improvements in AI technology, enabling us to evaluate upcoming LLMs and optimize output quality, latency, and cost by selecting the most suitable LLM for each specific task or application. We experienced this flexibility firsthand when we seamlessly transitioned from Anthropic’s Claude Instant to Anthropic’s Claude Haiku with the advent of Anthropic’s Claude 3, without requiring code changes.
  • Performance – Amazon Bedrock provides high-performance, low-latency, and scalable inference through on-demand and provisioned throughput options, depending on the requirements.
  • Secure model access – Secure, private model access using AWS PrivateLink enables controlled data transfer for inference without traversing the public internet, maintaining data privacy and helping adhere to compliance requirements.
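
To make the single-API point concrete, here is a minimal, hedged sketch of invoking Anthropic’s Claude 3 Haiku through the Amazon Bedrock runtime with boto3. The Region, model ID, prompt wording and token limit are illustrative and should be checked against your own account and Region; InsuranceDekho’s production prompts and parameters are not published in this post.

```python
import json
import boto3

# Bedrock runtime client; Region and model ID are placeholders to verify.
bedrock = boto3.client("bedrock-runtime", region_name="ap-south-1")
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def ask_claude(question: str, context: str = "") -> str:
    """Single-turn call; optionally grounds the question in retrieved policy text."""
    prompt = question if not context else (
        f"Answer using only this policy context:\n{context}\n\nQuestion: {question}"
    )
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user",
                      "content": [{"type": "text", "text": prompt}]}],
    }
    response = bedrock.invoke_model(modelId=MODEL_ID, body=json.dumps(body))
    payload = json.loads(response["body"].read())
    return payload["content"][0]["text"]

print(ask_claude("Is ambulance cover included?", context="<retrieved policy text>"))
```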

Retrieval Augmented Generation

RAG is a process in which LLMs access external documents or knowledge bases, promoting accurate and relevant responses. By referencing authoritative sources beyond their training data, RAG helps LLMs generate high-quality responses and overcome common pitfalls such as outdated or misleading information. RAG can be applied to various applications, including improving customer service, enhancing research capabilities, and streamlining business processes.

Solution building blocks

To begin designing the solution, we identified the key components needed, including the generative AI service, LLMs, vector databases, and caching engines. In this section, we delve into the key building blocks used in the solution, highlighting their importance in achieving optimal accuracy, cost-effectiveness, and performance:

  • LLMs – After a thorough evaluation of various LLMs and benchmarking, we chose Anthropic’s Claude Haiku for its exceptional performance. The benchmarking results demonstrated unparalleled speed and affordability in its category. Additionally, it delivered rapid and accurate responses while handling straightforward queries or complex requests, making it an ideal choice for our use case.
  • Embedding model – An embedding model is a type of machine learning (ML) model that maps discrete objects, such as words, phrases, or entities, into dense vector representations in a continuous embedding space. These vector representations, called embeddings, capture the semantic and syntactic relationships between the objects, allowing the model to reason about their similarities and differences. For our use case, we used a third-party embedding model.
  • Vector database – For the vector database, we chose Amazon OpenSearch Service because of its scalability, high-performance search capabilities, and cost-effectiveness. Additionally, OpenSearch Service’s flexible data model and integration capabilities make it an ideal choice for our use case.
  • Caching – To enhance the performance, efficiency, and cost-effectiveness of our chat assistant, we used Redis on Amazon ElastiCache to cache frequently accessed responses. This approach enables the chat assistant to rapidly retrieve cached responses, minimizing latency and computational load and resulting in a significantly improved user experience and reduced cost. (A simplified caching sketch follows this list.)
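
As a deliberately simplified illustration of the caching layer, the sketch below caches answers in Redis keyed by a normalized hash of the query. The post describes semantic (embedding-based) matching of similar queries, which this exact-match version does not attempt; the endpoint and TTL are placeholders.

```python
import hashlib
import redis

# ElastiCache for Redis endpoint is a placeholder.
cache = redis.Redis(host="my-elasticache-endpoint", port=6379, decode_responses=True)
TTL_SECONDS = 24 * 3600  # keep cached answers for a day (arbitrary choice)

def cache_key(query: str) -> str:
    # Normalize whitespace and case, then hash. The production system matches
    # queries semantically via embeddings; hashing is a simplification here.
    normalized = " ".join(query.lower().split())
    return "qa:" + hashlib.sha256(normalized.encode()).hexdigest()

def get_cached_answer(query: str):
    return cache.get(cache_key(query))  # returns None on a cache miss

def store_answer(query: str, answer: str) -> None:
    cache.setex(cache_key(query), TTL_SECONDS, answer)
```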

Implementation details

The following diagram illustrates the workflow of the current solution. Overall, it can be divided into two parts: the ingestion workflow and the response generation workflow.

Ingestion workflow

The ingestion workflow serves as the foundation that fuels the entire response generation workflow by keeping the knowledge base up to date with the latest information. This process is crucial for making sure that the system can provide accurate and relevant responses based on the most recent insurance policy documents. The ingestion workflow involves three key components: policy documents, embedding model, and OpenSearch Service as a vector database.

  1. The policy documents contain the insurance policy information that needs to be ingested into the knowledge base.
  2. These documents are processed by the embedding model, which converts the textual content into high-dimensional vector representations, capturing the semantic meaning of the text. After the embedding model generates the vector representations of the policy documents, these embeddings are stored in OpenSearch Service. This ingestion workflow enables the chat assistant to provide responses based on the latest policy information available. (A condensed code sketch of these steps follows.)
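
The sketch below shows one way these ingestion steps could look with the opensearch-py client: chunks of policy text are embedded and written into a k-NN index. The `embed()` function is a placeholder for the unnamed third-party embedding model, and the endpoint, index name, vector dimension and index settings are illustrative assumptions rather than InsuranceDekho’s actual configuration.

```python
from opensearchpy import OpenSearch, helpers

# Endpoint and authentication are placeholders (auth is omitted for brevity).
client = OpenSearch(hosts=[{"host": "my-opensearch-domain", "port": 443}], use_ssl=True)

INDEX = "policy-chunks"
DIM = 1536  # must match the embedding model's output size (assumed here)

# k-NN index holding the vector plus the raw text used later for prompting.
client.indices.create(index=INDEX, body={
    "settings": {"index": {"knn": True}},
    "mappings": {"properties": {
        "embedding": {"type": "knn_vector", "dimension": DIM,
                      "method": {"name": "hnsw", "space_type": "cosinesimil",
                                 "engine": "nmslib"}},
        "text": {"type": "text"},
        "policy_id": {"type": "keyword"},
    }},
})

def embed(text: str) -> list[float]:
    """Placeholder for the third-party embedding model used in the post."""
    raise NotImplementedError

def ingest(chunks: list[dict]) -> None:
    actions = ({"_index": INDEX,
                "_source": {"embedding": embed(chunk["text"]),
                            "text": chunk["text"],
                            "policy_id": chunk["policy_id"]}}
               for chunk in chunks)
    helpers.bulk(client, actions)  # batch-write embeddings and text
```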

Response generation workflow

The response generation workflow is the core of our chat assistant solution. Insurance advisors use it to provide comprehensive responses to customers’ queries regarding policy coverage, exclusions, and other related topics.

  1. To initiate this workflow, our chatbot serves as the entry point, facilitating seamless interaction between the insurance advisors and the underlying response generation system.
  2. This solution incorporates a caching mechanism that uses semantic search to check if a query has been recently processed and answered. If a match is found in the cache (Redis), the chat assistant retrieves and returns the corresponding response, bypassing the full response generation workflow for redundant queries, thereby enhancing system performance.
  3. If no match is found in the cache, the query goes to the intent classifier powered by Anthropic’s Claude Haiku. It analyzes the query to understand the user’s intent and classify it accordingly. This enables dynamic prompting and tailored processing based on the query type. For generic or common queries, the intent classifier can provide the final response independently, bypassing the full RAG workflow, thereby optimizing efficiency and response times.
  4. For queries requiring the full RAG workflow, the intent classifier passes the query to the retrieval step, where a semantic search is performed on a vector database containing insurance policy documents to find the most relevant information, that is, the context based on the query.
  5. After the retrieval step, the retrieved context is integrated with the query and prompt, and this augmented information is fed into the generation process. This augmentation grounds the generation in the retrieved policy content rather than the model’s generic training data.
  6. In the final generation step, the actual response to the query is produced based on the external knowledge base of policy documents. (A condensed sketch of this end-to-end flow follows these steps.)
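
Tying these steps together, here is a condensed, hedged sketch of the response path that reuses the helpers from the earlier sketches (`ask_claude`, `embed`, `get_cached_answer`, `store_answer`, and the OpenSearch `client`/`INDEX`). The intent-classification prompt and the GENERIC/POLICY_SPECIFIC labels are invented for illustration and are not InsuranceDekho’s actual prompts.

```python
def retrieve_context(query: str, k: int = 3) -> str:
    """Semantic search over the policy index built in the ingestion sketch."""
    hits = client.search(index=INDEX, body={
        "size": k,
        "query": {"knn": {"embedding": {"vector": embed(query), "k": k}}},
    })["hits"]["hits"]
    return "\n\n".join(hit["_source"]["text"] for hit in hits)

def answer_query(query: str) -> str:
    # 1-2. Cache check (simplified stand-in for semantic caching).
    cached = get_cached_answer(query)
    if cached:
        return cached

    # 3. Intent classification with Claude Haiku (illustrative prompt).
    intent = ask_claude(
        "Classify this insurance query as GENERIC or POLICY_SPECIFIC, "
        f"and answer directly if GENERIC: {query}"
    )

    # 4-6. Retrieval plus augmented generation for policy-specific queries.
    if "POLICY_SPECIFIC" in intent:
        answer = ask_claude(query, context=retrieve_context(query))
    else:
        answer = intent  # generic queries are answered by the classifier call

    store_answer(query, answer)
    return answer
```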

Results

The implementation of the generative AI-powered RAG chat assistant solution has yielded impressive results for InsuranceDekho. By using this solution, insurance advisors can now confidently and efficiently address customer queries autonomously, without the constant need for SME involvement. It has also significantly reduced response times: InsuranceDekho has seen an 80% decrease in the time taken to respond to customer queries about plan features, inclusions, and exclusions.

InsuranceDekho’s adoption of this generative AI-powered solution has streamlined the customer service process, making sure that customers receive precise and trustworthy responses to their inquiries in a timely manner.

Conclusion

In this post, we discussed how InsuranceDekho harnessed the power of generative AI to equip its insurance advisors with the tools to efficiently respond to customer queries regarding various insurance policies. By implementing a RAG-based chat assistant using Amazon Bedrock and OpenSearch Service, InsuranceDekho empowered its insurance advisors to deliver exceptional service. This solution minimized the reliance on SMEs and significantly reduced response times so advisors could address customer inquiries promptly and accurately.


About the Authors

Vishal Gupta is a Senior Solutions Architect at AWS India, based in Delhi. In his current role at AWS, he works with digital native business customers and enables them to design, architect, and innovate highly scalable, resilient, and cost-effective cloud architectures. An avid blogger and speaker, Vishal loves to share his knowledge with the tech community. Outside of work, he enjoys traveling to new destinations and spending time with his family.

Nishant Gupta is working as Vice President, Engineering at InsuranceDekho with 14 years of experience. He is passionate about building highly scalable, reliable, and cost-optimized solutions that can handle massive amounts of data efficiently.

Read More

Hopper Scales New Heights, Accelerating AI and HPC Applications for Mainstream Enterprise Servers

Since its introduction, the NVIDIA Hopper architecture has transformed the AI and high-performance computing (HPC) landscape, helping enterprises, researchers and developers tackle the world’s most complex challenges with higher performance and greater energy efficiency.

During the Supercomputing 2024 conference, NVIDIA announced the availability of the NVIDIA H200 NVL PCIe GPU — the latest addition to the Hopper family. H200 NVL is ideal for organizations with data centers looking for lower-power, air-cooled enterprise rack designs with flexible configurations to deliver acceleration for every AI and HPC workload, regardless of size.

According to a recent survey, roughly 70% of enterprise racks are 20kW and below and use air cooling. This makes PCIe GPUs essential, as they provide granularity of node deployment, whether using one, two, four or eight GPUs, enabling data centers to pack more computing power into smaller spaces. Companies can then use their existing racks and select the number of GPUs that best suits their needs.

Enterprises can use H200 NVL to accelerate AI and HPC applications, while also improving energy efficiency through reduced power consumption. With a 1.5x memory increase and 1.2x bandwidth increase over NVIDIA H100 NVL, companies can use H200 NVL to fine-tune large language models (LLMs) within a few hours and deliver up to 1.7x faster inference performance. For HPC workloads, performance is boosted up to 1.3x over H100 NVL and 2.5x over the NVIDIA Ampere architecture generation.

Complementing the raw power of the H200 NVL is NVIDIA NVLink technology. The latest generation of NVLink provides GPU-to-GPU communication 7x faster than fifth-generation PCIe — delivering higher performance to meet the needs of HPC, large language model inference and fine-tuning. 

The NVIDIA H200 NVL is paired with powerful software tools that enable enterprises to accelerate applications from AI to HPC. It comes with a five-year subscription for NVIDIA AI Enterprise, a cloud-native software platform for the development and deployment of production AI. NVIDIA AI Enterprise includes NVIDIA NIM microservices for the secure, reliable deployment of high-performance AI model inference. 

Companies Tapping Into Power of H200 NVL

With H200 NVL, NVIDIA provides enterprises with a full-stack platform to develop and deploy their AI and HPC workloads. 

Customers are seeing significant impact for multiple AI and HPC use cases across industries, such as visual AI agents and chatbots for customer service, trading algorithms for finance, medical imaging for improved anomaly detection in healthcare, pattern recognition for manufacturing, and seismic imaging for federal science organizations. 

Dropbox is harnessing NVIDIA accelerated computing for its services and infrastructure.

“Dropbox handles large amounts of content, requiring advanced AI and machine learning capabilities,” said Ali Zafar, VP of Infrastructure at Dropbox. “We’re exploring H200 NVL to continually improve our services and bring more value to our customers.”

The University of New Mexico has been using NVIDIA accelerated computing in various research and academic applications. 

“As a public research university, our commitment to AI enables the university to be on the forefront of scientific and technological advancements,” said Prof. Patrick Bridges, director of the UNM Center for Advanced Research Computing. “As we shift to H200 NVL, we’ll be able to accelerate a variety of applications, including data science initiatives, bioinformatics and genomics research, physics and astronomy simulations, climate modeling and more.”

H200 NVL Available Across Ecosystem

Dell Technologies, Hewlett Packard Enterprise, Lenovo and Supermicro are expected to deliver a wide range of configurations supporting H200 NVL. 

Additionally, H200 NVL will be available in platforms from Aivres, ASRock Rack, ASUS, GIGABYTE, Ingrasys, Inventec, MSI, Pegatron, QCT, Wistron and Wiwynn.

Some systems are based on the NVIDIA MGX modular architecture, which enables computer makers to quickly and cost-effectively build a vast array of data center infrastructure designs.

Platforms with H200 NVL will be available from NVIDIA’s global systems partners beginning in December. To complement availability from leading global partners, NVIDIA is also developing an Enterprise Reference Architecture for H200 NVL systems. 

The reference architecture will incorporate NVIDIA’s expertise and design principles, so partners and customers can design and deploy high-performance AI infrastructure based on H200 NVL at scale. This includes full-stack hardware and software recommendations, with detailed guidance on optimal server, cluster and network configurations. Networking is optimized for the highest performance with the NVIDIA Spectrum-X Ethernet platform.

NVIDIA technologies will be showcased on the showroom floor at SC24, taking place at the Georgia World Congress Center through Nov. 22. To learn more, watch NVIDIA’s special address.

See notice regarding software product information.

Read More

Foxconn Expands Blackwell Testing and Production With New Factories in U.S., Mexico and Taiwan

To meet demand for Blackwell, now in full production, Foxconn, the world’s largest electronics manufacturer, is using NVIDIA Omniverse. The platform for developing industrial AI simulation applications is helping bring facilities in the U.S., Mexico and Taiwan online faster than ever.

Foxconn uses NVIDIA Omniverse to virtually integrate their facility and equipment layouts, NVIDIA Isaac Sim for autonomous robot testing and simulation, and NVIDIA Metropolis for vision AI.

Omniverse enables industrial developers to maximize efficiency by testing and optimizing in a digital twin before committing to costly change orders in the physical world. Foxconn expects its Mexico facility alone to deliver significant cost savings and a reduction in kilowatt-hour usage of more than 30% annually.

World’s Largest Electronics Maker Plans With Omniverse and AI

To meet demands at Foxconn, factory planners are building physical AI-powered robotic factories with Omniverse and NVIDIA AI.

The company has built digital twins with Omniverse that allow its teams to virtually integrate facility and equipment information from leading industry applications, such as Siemens Teamcenter X and Autodesk Revit. Floor plan layouts are optimized first in the digital twin, and planners can locate optimal camera positions that help measure and identify ways to streamline operations with Metropolis visual AI agents.

In the construction process, the Foxconn teams use the Omniverse digital twin as the source of truth to communicate and validate the accurate layout and placement of equipment.

Virtual integration on Omniverse offers significant advantages, potentially saving factory planners millions by reducing costly change orders in real-world operations.

Delivering Robotics for Manufacturing With Omniverse Digital Twin

Once the digital twin of the factory is built, it becomes a virtual gym for Foxconn’s fleets of autonomous robots, including industrial manipulators and autonomous mobile robots. Foxconn’s robot developers can simulate, test and validate their AI robot models in NVIDIA Isaac Sim before deploying them to real-world robots.

Using Omniverse, Foxconn can simulate robot AIs before deploying to NVIDIA Jetson-driven autonomous mobile robots.

On assembly lines, they can simulate with Isaac Manipulator libraries and AI models for automated optical inspection, object identification, defect detection and trajectory planning.

Omniverse also enables Foxconn’s facility planners to test and optimize intelligent camera placement before installation in the physical world, ensuring complete coverage of the factory floor to support worker safety and provide the foundation for visual AI agent frameworks.

Creating Efficiencies While Building Resilient Supply Chains

Using NVIDIA Omniverse and AI, Foxconn plans to replicate its precision production lines across the world. This will enable it to quickly deploy high-quality production facilities that meet unified standards, increasing the company’s competitive edge and adaptability in the market.

Foxconn’s ability to rapidly replicate will accelerate its global deployments and enhance its resilience in the supply chain in the face of disruptions, as it can quickly adjust production strategies and reallocate resources to ensure continuity and stability to meet changing demands.

Foxconn’s Mexico facility will begin production early next year and the Taiwan location will begin production in December.

Learn more about Blackwell and Omniverse.

Read More

From Algorithms to Atoms: NVIDIA ALCHEMI NIM Catalyzes Sustainable Materials Research for EV Batteries, Solar Panels and More

More than 96% of all manufactured goods — ranging from everyday products, like laundry detergent and food packaging, to advanced industrial components, such as semiconductors, batteries and solar panels — rely on chemicals that cannot be replaced with alternative materials.

With AI and the latest technological advancements, researchers and developers are studying ways to create novel materials that could address the world’s toughest challenges, such as energy storage and environmental remediation.

Announced today at the Supercomputing 2024 conference in Atlanta, the NVIDIA ALCHEMI NIM microservice accelerates such research by optimizing AI inference for chemical simulations that could lead to more efficient and sustainable materials to support the renewable energy transition.

It’s one of the many ways NVIDIA is supporting researchers, developers and enterprises to boost energy and resource efficiency in their workflows, including to meet requirements aligned with the global Net Zero Initiative.

NVIDIA ALCHEMI for Material and Chemical Simulations

Exploring the universe of potential materials, using the nearly infinite combinations of chemicals — each with unique characteristics — can be extremely complex and time consuming. Novel materials are typically discovered through laborious, trial-and-error synthesis and testing in a traditional lab.

Many of today’s plastics, for example, are still based on material discoveries made in the mid-1900s.

More recently, AI has emerged as a promising accelerant for chemicals and materials innovation.

With the new ALCHEMI NIM microservice, researchers can test chemical compounds and material stability in simulation, in a virtual AI lab, which reduces costs, energy consumption and time to discovery.

For example, running MACE-MP-0, a pretrained foundation model for materials chemistry, on an NVIDIA H100 Tensor Core GPU, the new NIM microservice speeds evaluations of a potential composition’s simulated long-term stability by 100x. That figure combines a 25x speedup from using the NVIDIA Warp Python framework for high-performance simulation with a further 4x speedup from in-flight batching. All in all, evaluating 16 million structures would have taken months — with the NIM microservice, it can be done in just hours.

By letting scientists examine more structures in less time, the NIM microservice can boost research on materials for use with solar and electric batteries, for example, to bolster the renewable energy transition.

NVIDIA also plans to release NIM microservices that can be used to simulate the manufacturability of novel materials — to determine how they might be brought from test tubes into the real world in the form of batteries, solar panels, fertilizers, pesticides and other essential products that can contribute to a healthier, greener planet.

SES AI, a leading developer of lithium-metal batteries, is using the NVIDIA ALCHEMI NIM microservice with the AIMNet2 model to accelerate the identification of electrolyte materials used for electric vehicles.

“SES AI is dedicated to advancing lithium battery technology through AI-accelerated material discovery, using our Molecular Universe Project to explore and identify promising candidates for lithium metal electrolyte discovery,” said Qichao Hu, CEO of SES AI. “Using the ALCHEMI NIM microservice with AIMNet2 could drastically improve our ability to map molecular properties, reducing time and costs significantly and accelerating innovation.”

SES AI recently mapped 100,000 molecules in half a day, with the potential to achieve this in under an hour using ALCHEMI. This signals how the microservice is poised to have a transformative impact on material screening efficiency.

Looking ahead, SES AI aims to map the properties of up to 10 billion molecules within the next couple of years, pushing the boundaries of AI-driven, high-throughput discovery.

The new microservice will soon be available for researchers to test for free through the NVIDIA NGC catalog, where they can sign up to be notified of ALCHEMI’s launch. It will also be downloadable from build.nvidia.com, and the production-grade NIM microservice will be offered through the NVIDIA AI Enterprise software platform.

Learn more about the NVIDIA ALCHEMI NIM microservice, and hear the latest on how AI and supercomputing are supercharging researchers and developers’ workflows by joining NVIDIA at SC24, running through Friday, Nov. 22.

See notice regarding software product information.

Read More

Introducing Yasuyuki Matsushita: Tackling societal challenges with AI at Microsoft Research Asia – Tokyo 

Earlier this year, Microsoft Research announced its newest lab in Tokyo, Japan. Today, we are celebrating its grand opening, reinforcing Microsoft Research’s commitment to AI research across the Asia-Pacific region. This new lab will focus on embodied AI, well-being and neuroscience, societal AI, and industry innovation—all areas that align with Japan’s socioeconomic priorities. This initiative will enhance collaboration with local academic and industrial partners, contributing to global innovation and talent development. 

We recently spoke with Yasuyuki Matsushita, head of the newly established Tokyo lab. Matsushita, who worked at Microsoft Research Asia from 2003 to 2015, served as a professor at Osaka University for the past decade, before returning in October. He reflects on his journey, the evolution of technology, and the opportunities ahead for Microsoft Research Asia – Tokyo.

Yasuyuki Matsushita, Senior Principal Research Manager, Microsoft Research Asia – Tokyo

Why return to Microsoft Research Asia?

Question: We are excited to have you leading the new lab in Tokyo. You worked at Microsoft Research Asia in Beijing from 2003 to 2015 before transitioning to academia. What motivated you to return after nearly a decade? 

Yasuyuki Matsushita: Microsoft Research Asia has always been an exceptional place for conducting cutting-edge research, especially in the AI era. Earlier this year, I learned about Microsoft Research Asia’s expansion, including the establishment of a new lab in Tokyo. This presented an exciting opportunity to make a meaningful impact both locally and globally, sparking my motivation to return. Additionally, Microsoft is at the forefront of AI advancements, making this an ideal moment to re-engage. I’m confident that my work can contribute meaningfully to this dynamic field. The pace of AI development today is unmatched, making this an exhilarating time to be involved. 

What has changed over the decade? 

Question: Now that you’ve been back for a few weeks, from your perspective, what has changed at Microsoft Research Asia, and what has remained the same since you were last here? 

Yasuyuki Matsushita: The most immediate change I’ve noticed is the array of employee tools and resources, which have evolved significantly over the past decade. I’m still familiarizing myself with these new systems, designed to optimize efficiency and collaboration. Over the past ten years, Microsoft has played a key role in driving digital transformation for other companies, and it has also transformed internally. 

Beyond these changes, much of what made Microsoft Research Asia unique remains the same. The culture and people continue to foster an environment of innovation and collaboration. The organization still attracts exceptional talent, and the spirit of research is as vibrant as ever. One of its greatest strengths is its open, collaborative approach. It has maintained long-standing partnerships with universities and research institutions, which encourage cross-regional, cross-cultural, and interdisciplinary exchanges. This synergy stimulates innovation and supports industry development. The commitment to excellence remains at the heart of Microsoft Research Asia’s identity. 

Plans for the Microsoft Research Asia – Tokyo lab 

Question: With Microsoft Research Asia expanding regionally to places like Tokyo, Vancouver, Singapore, and Hong Kong, what are your plans as the head of the Tokyo lab, and how do you see it contributing to the region’s innovation ecosystem?

Yasuyuki Matsushita: My primary goal is to align the Tokyo lab’s growth with Microsoft Research Asia’s mission to advance science and technology for the benefit of humanity. The research efforts we’re focusing on in this lab aim to address pressing societal issues while advancing AI technologies to benefit society as a whole. 

For instance, Japan’s aging population presents unique challenges that require efficient societal solutions—an issue faced by many nations today. Through our research, we aim to generate insights that can be applied globally to proactively address and mitigate such challenges. 

Japan also has a strong legacy of scientific research in fields like electronics, materials science, and robotics. Its advanced industrial base, featuring renowned companies across the automotive, electronics, and machinery sectors, provides rich application scenarios for our research outcomes. Additionally, Japan’s robust education system supplies an intellectual foundation crucial for our in-depth research. 

We’re dedicated to maintaining open research practices. By publishing our findings and open-sourcing our tools, we ensure our work benefits the broader industry and enriches the global knowledge pool. Our goal is to share insights that drive societal progress and innovation worldwide.

Cultivating the next generation 

Question: Talent is at the heart of Microsoft Research’s mission and culture. What kind of talent is Microsoft Research Asia – Tokyo looking for? In what ways can the Tokyo lab enhance its efforts to cultivate the next generation of tech innovators for the region? 

Yasuyuki Matsushita: One of the key advantages of being part of Microsoft is the close connection we have to real-world applications. This bridge between research and practice allows our work to have a direct societal impact, ensuring that innovative technology results in meaningful and beneficial outcomes. 

When recruiting new talent, we seek bright, self-driven individuals with an innate curiosity and a passion for solving societal challenges. The most vital trait we look for is a deep desire to understand the “why” behind complex problems. While technical expertise is essential, a commitment to addressing social issues fuels creativity and drives meaningful progress. This blend of curiosity and purpose sparks innovation and propels us forward at Microsoft Research Asia. 

At the Tokyo lab, a core part of our vision is cultivating the next wave of tech innovators. We plan to build on the legacy of successful talent programs that Microsoft Research Asia has championed throughout the region, like joint research initiatives, visiting scholar programs, and internship opportunities. These provide early career professionals and students with invaluable hands-on experiences, equipping them with essential research skills and deepening their understanding of complex technological challenges. 

We’re committed to creating a nurturing environment where talent can thrive, collaborate, and contribute to the global tech landscape. By combining innovation with real-world impact, we aim to inspire the next generation to push boundaries and advance society.

Rapid evolution in computer vision 

Question: In today’s world, everything is moving toward digitization and intelligence. Ten years ago, your research focused on photometry and video analysis. Can you share some key outcomes from that period and explain how you see emerging technologies like AI influencing the field of computer vision? 

Yasuyuki Matsushita: Back then, my research centered on computer vision, specifically on photometry for 3D reconstruction and video analysis aimed at enhancing video quality. One of the standout projects during that period was the development of a gigapixel camera capable of capturing high-resolution 3D information. This camera played a crucial role in the Dunhuang Mogao Grottoes project, which sought to digitally preserve the cultural heritage of Dunhuang’s murals and Buddha shrines with unprecedented accuracy.  

Another notable project was the development of video stabilization technology, which was integrated into Windows 7 as part of Media Foundation. This technology improved video quality by compensating for unwanted camera movements, delivering smoother and more professional-looking output. The creation of real-time algorithms capable of processing and stabilizing video was groundbreaking at that time. 

Since then, the introduction of deep learning, large datasets, and sophisticated neural network architectures has propelled computer vision to new heights. Tasks that were once considered difficult, such as object detection, recognition, and segmentation, are now standard with modern AI techniques. Current research continues to push the boundaries by exploring innovative network architectures, new learning strategies, and enhanced datasets. A particularly exciting trend is the use of AI in real-world interactive scenarios, leading to the emergence of embodied AI, which is a major focus of my current work.

Understanding embodied AI beyond robotics 

Question: Your current research interests include embodied AI, which is also one of the key areas at Microsoft Research Asia – Tokyo. What exactly is embodied AI, and how does it differ from robotics? 

Yasuyuki Matsushita: Embodied AI goes beyond traditional robotics. While robots are typically machines equipped with actuators designed to execute specific tasks, embodied AI focuses on developing intelligent systems that can perform complex tasks while understanding and interacting within physical and virtual environments. Robotics and AI have developed independently, but embodied AI is the convergence of these two fields, integrating AI with physical agents that can perceive, act, and learn in dynamic real-world environments. 

This field is inherently interdisciplinary, involving aspects such as robotic control, reinforcement learning, spatial awareness, human-robot interaction, reasoning, and more. For instance, embodied AI includes the ability to infer cause and effect, such as understanding that an unsupported laptop will fall due to gravity. These types of interactions and interpretations stem from engaging with and understanding the physical world, making embodied AI an exciting and multifaceted area of research. 

Given the complexity of embodied AI, no single organization can cover all aspects of its development alone. We look forward to collaborating with local industry and academic institutions in Japan, leveraging their expertise alongside our strengths in AI to advance the field. 

Advice for aspiring researchers in computer vision and AI 

Question: You’ve had an extensive career spanning academia and industry. From your experience as both an educator and a researcher, what advice would you give to young people interested in pursuing research in computer vision and AI? 

Yasuyuki Matsushita: For students interested in computer vision and AI, a strong foundation in mathematics and computer science is essential, even as specific research topics and technologies evolve. A deep understanding of fundamental mathematical concepts, such as gradients, Jacobians, and vector spaces, is indispensable. Mastery of these principles will be beneficial regardless of changes in programming languages or development platforms. 

Maintaining a mindset of continuous learning is equally important, as the field is constantly evolving. For example, deep learning was not as prominent a decade ago but is now central to the field. At Microsoft, we emphasize the importance of a growth mindset—being adaptable, open to new technologies, and willing to pivot with industry advancements. Early career professionals should cultivate the ability to quickly acquire new skills while building on their foundational knowledge. This adaptability is key to long-term success in research and development.

Read More

BiomedParse: A foundation model for smarter, all-in-one biomedical image analysis

In cancer diagnosis or advanced treatments like immunotherapy, every detail in a medical image counts. Radiologists and pathologists rely on these images to track tumors, understand their boundaries, and analyze how they interact with surrounding cells. This work demands pinpoint accuracy across several tasks—identifying whether a tumor is present, locating it precisely, and mapping its contours on complex CT scans or pathology slides. 

Yet, these crucial steps—object recognition, detection, and segmentation—are often tackled separately, which can limit the depth of analysis. Current tools like MedSAM and SAM focus on segmentation only, thus missing the opportunity to blend these insights holistically and relegating object recognition to an afterthought. 

In this blog, we introduce BiomedParse, a new approach to holistic image analysis that treats objects as first-class citizens. By unifying object recognition, detection, and segmentation into a single framework, BiomedParse allows users to specify what they’re looking for through a simple, natural-language prompt. The result is a more cohesive, intelligent way of analyzing medical images that supports faster, more integrated clinical insights. 

While biomedical segmentation datasets abound, there are relatively few prior works on object detection and recognition in biomedicine, let alone datasets covering all three tasks. To pretrain BiomedParse, we created the first such dataset by harnessing OpenAI’s GPT-4 for data synthesis from standard segmentation datasets.

BiomedParse is a single foundation model that can accurately segment biomedical objects across nine modalities, as seen in Figure 1, outperforming prior best methods while requiring orders of magnitude fewer user operations, as it doesn’t require an object-specific bounding box. By learning semantic representation for individual object types, BiomedParse’s superiority is particularly pronounced in the most challenging cases with irregularly shaped objects. Through joint pretraining of object recognition, detection, and segmentation, BiomedParse opens new possibilities for holistic image analysis and image-based discovery in biomedicine.  

Figure 1. Overview of BiomedParse and BiomedParseData. a, The GPT-4-constructed ontology showing a hierarchy of object types used to unify semantic concepts across datasets, with bar plots showing the number of images containing each object type. b, Bar plot showing the number of image–mask–description triples for each modality in BiomedParseData. CT is an abbreviation for computed tomography, MRI for magnetic resonance imaging and OCT for optical coherence tomography. c, Flowchart of BiomedParse. BiomedParse takes an image and a text prompt as input and outputs the segmentation masks for the objects specified in the prompt. Image-specific manual interaction such as bounding boxes or clicks is not required in our framework. To facilitate semantic learning for the image encoder, BiomedParse also incorporates a learning objective to classify the meta-object type. For online inference, GPT-4 is used to resolve the text prompt into object types using the object ontology, which also uses the meta-object type output from BiomedParse to narrow down candidate semantic labels. d, Uniform Manifold Approximation and Projection (UMAP) plots contrasting the text embeddings for different cell types derived from the BiomedParse text encoder (left) and PubMedBERT (right). e, UMAP plots contrasting the image embeddings for different cell types derived from the BiomedParse image encoder (left) and Focal (right).

Image parsing: a unifying framework for holistic image analysis 

Back in 2005, researchers first introduced the concept of “image parsing”—a unified approach to image analysis that jointly conducts object recognition, detection, and segmentation. Built on Bayesian networks, this early model offered a glimpse into a future of joint learning and reasoning in image analysis, though it was limited in scope and application. Fast forward to today, cutting-edge advances in generative AI have breathed new life into this vision. With our model, BiomedParse, we have created a foundation for biomedical image parsing that leverages interdependencies across the three subtasks, thus addressing key limitations in traditional methods. BiomedParse enables users to simply input a natural-language description of an object, which the model uses to predict both the object label and its segmentation mask, thus eliminating the need for a bounding box (Figure 1c). In other words, this joint learning approach lets users segment objects based on text alone.

Harnessing GPT-4 for large-scale data synthesis from existing datasets 

We created the first dataset for biomedical imaging parsing by harnessing GPT-4 for large-scale data synthesis from 45 existing biomedical segmentation datasets (Figure 1a and 1b). The key insight is to leverage readily available natural-language descriptions already in these datasets and use GPT-4 to organize this often messy, unstructured text with established biomedical object taxonomies.  

Specifically, we use GPT-4 to help create a unifying biomedical object taxonomy for image analysis and harmonize natural language descriptions from existing datasets with this taxonomy. We further leverage GPT-4 to synthesize additional variations of object descriptions to facilitate more robust text prompting.  

This enables us to construct BiomedParseData, a biomedical image analysis dataset comprising over 6 million sets of images, segmentation masks, and text descriptions drawn from more than 1 million images. The dataset covers 64 major biomedical object types and 82 fine-grained subtypes, and spans nine imaging modalities.

Figure 2. Comparison on large-scale biomedical image segmentation datasets. a, Box plot comparing the Dice score between our method and competing methods on 102,855 test instances (image–mask–label triples) across nine modalities. MedSAM and SAM require a bounding box as input; we consider two settings: an oracle bounding box (the minimum bounding box covering the gold mask) and bounding boxes generated from the text prompt by Grounding DINO, a state-of-the-art text-based grounding model. Each modality category contains multiple object types, and each object type was aggregated as the instance median shown in the plot. n denotes the number of test instances in the corresponding modality. b, Nine examples comparing the segmentation results of BiomedParse and the ground truth, using just the text prompt at the top. c, Box plot comparing the Dice score between our method and competing methods on a cell segmentation test set with n=42 images. BiomedParse requires only a single user operation (the text prompt ‘Glandular structure in colon pathology’); by contrast, to get competitive results, MedSAM and SAM require 430 operations (one bounding box per individual cell). d, Five examples contrasting the segmentation results of BiomedParse and MedSAM, along with the text prompts used by BiomedParse and the bounding boxes used by MedSAM. e, Comparison between BiomedParse and MedSAM on a benign tumor image (top) and a malignant tumor image (bottom); the improvement of BiomedParse over MedSAM is even more pronounced on abnormal cells with irregular shapes. f, Box plot comparing the two-sided K–S test P values between valid and invalid text prompts. BiomedParse learns to reject invalid text prompts describing object types not present in the image (small P value). We evaluated a total of 4,887 invalid prompts and 22,355 valid prompts. g, Precision and recall of our method on detecting invalid text prompts across different K–S test P value cutoffs. h,i, Scatter plots comparing the area under the receiver operating characteristic curve (AUROC) (h) and F1 (i) between BiomedParse and Grounding DINO on detecting invalid descriptions.

State-of-the-art performance across 64 major object types in 9 modalities

We evaluated BiomedParse on a large held-out test set with 102,855 image-mask-label sets across 64 major object types in nine modalities. BiomedParse outperformed prior best methods such as MedSAM and SAM, even when oracle per-object bounding boxes were provided. In the more realistic setting where MedSAM and SAM used a state-of-the-art object detector (Grounding DINO) to propose bounding boxes, BiomedParse outperformed them by a wide margin, between 75 and 85 absolute points in Dice score (Figure 2a). BiomedParse also outperforms a variety of other prominent methods such as SegVol, Swin UNETR, nnU-Net, DeepLab V3+, and UniverSeg.
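
For reference, the Dice score used throughout these comparisons measures the overlap between a predicted segmentation mask and the ground-truth mask. A minimal NumPy sketch of the metric is shown below; the toy masks are arbitrary and are included only to make the formula concrete.

```python
import numpy as np

def dice_score(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice coefficient for binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    return 1.0 if total == 0 else 2.0 * intersection / total

# Toy 4x4 masks with partial overlap.
a = np.zeros((4, 4), dtype=bool); a[1:3, 1:3] = True
b = np.zeros((4, 4), dtype=bool); b[1:3, 2:4] = True
print(dice_score(a, b))  # 0.5: two shared pixels out of four in each mask
```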

Figure 3. Evaluation on detecting irregular-shaped objects. a, Attention maps of text prompts for irregular-shaped objects, suggesting that BiomedParse learns a rather faithful representation of their typical shapes. US, ultrasound. b–d, Scatter plots comparing the improvement in Dice score of BiomedParse over MedSAM with shape regularity in terms of convex ratio (b), box ratio (c) and inverse rotational inertia (d). A smaller number on the x axis means higher irregularity on average; each dot represents an object type. e, Six examples contrasting BiomedParse and MedSAM on detecting irregular-shaped objects, ordered from the least irregular (left) to the most irregular (right). f,g, Comparison between BiomedParseData and the benchmark dataset used by MedSAM in terms of convex ratio (f) and box ratio (g); BiomedParseData is a more faithful representation of real-world challenges in terms of irregular-shaped objects. h, Box plots comparing BiomedParse and competing approaches on BiomedParseData and the benchmark dataset used by MedSAM. BiomedParse shows a larger improvement on BiomedParseData, which contains more diverse images and more irregular-shaped objects. The numbers of object types are n=50 for the MedSAM benchmark and n=112 for BiomedParseData.

Recognizing and segmenting irregular and complex objects

Biomedical objects often have complex and irregular shapes, which present significant challenges for segmentation, even with oracle bounding box. By joint learning with object recognition and detection, BiomedParse learns to model object-specific shapes, and its superiority is particularly pronounced for the most challenging cases (Figure 3). Encompassing a large collection of diverse object types in nine modalities, BiomedParseData also provides a much more realistic representation of object complexity in biomedicine.  

Figure 4. Evaluation on object recognition. a, Six examples showing the results of object recognition by our method; object recognition identifies and segments all objects in an image without requiring any user-provided input prompt. b–d, Scatter plots comparing the F1 (b), precision (c) and recall (d) scores between BiomedParse and Grounding DINO on identifying objects present in the image. e, Comparison between BiomedParse and Grounding DINO on object identification in terms of median F1 score across different numbers of objects in the image. f, Box plot comparing BiomedParse and MedSAM/SAM (using bounding boxes generated by Grounding DINO) on end-to-end object recognition (including segmentation) across modalities. g, Comparison between BiomedParse and MedSAM/SAM (using bounding boxes generated by Grounding DINO) on end-to-end object recognition (including segmentation) in relation to the number of distinct objects in the image.

Promising step toward scaling holistic biomedical image analysis

By operating through a simple text prompt, BiomedParse requires substantially less user effort than prior best methods that typically require object-specific bounding boxes, especially when an image contains a large number of objects (Figure 2c). By modeling object recognition threshold, BiomedParse can detect invalid prompt and reject segmentation requests when an object is absent from the image. BiomedParse can be used to recognize and segment all known objects in an image in one fell swoop (Figure 4). By scaling holistic image analysis, BiomedParse can potentially be applied to key precision health applications such as early detection, prognosis, treatment decision support, and progression monitoring.  

Going forward, there are numerous growth opportunities. BiomedParse can be extended to handle more modalities and object types. It can be integrated into advanced multimodal frameworks such as LLaVA-Med to facilitate conversational image analysis by “talking to the data.” To facilitate research in biomedical image analysis, we have made BiomedParse open source with an Apache 2.0 license. We’ve also made it available on Azure AI for direct deployment and real-time inference. For more information, check out our demo.

BiomedParse is a joint work with Providence and the University of Washington’s Paul G. Allen School of Computer Science & Engineering, and brings collaboration from multiple teams within Microsoft*. It reflects Microsoft’s larger commitment to advancing multimodal generative AI for precision health, with other exciting progress such as GigaPath, BiomedCLIP, LLaVA-Rad, BiomedJourney, MAIRA, Rad-DINO, Virchow.

* Within Microsoft, it is a wonderful collaboration among Health Futures, MSR Deep Learning, and Nuance.

Paper co-authors: Theodore Zhao, Yu Gu, Jianwei Yang, Naoto Usuyama, Ho Hin Lee, Sid Kiblawi, Tristan Naumann, Jianfeng Gao, Angela Crabtree, Jacob Abel, Christine Moung-Wen, Brian Piening, Carlo Bifulco, Mu Wei, Hoifung Poon, Sheng Wang
