Denmark Launches Leading Sovereign AI Supercomputer to Solve Scientific Challenges With Social Impact

NVIDIA founder and CEO Jensen Huang joined the king of Denmark to launch the country’s largest sovereign AI supercomputer, aimed at breakthroughs in quantum computing, clean energy, biotechnology and other areas serving Danish society and the world.

Denmark’s first AI supercomputer, named Gefion after a goddess in Danish mythology, is an NVIDIA DGX SuperPOD driven by 1,528 NVIDIA H100 Tensor Core GPUs and interconnected using NVIDIA Quantum-2 InfiniBand networking.

Gefion is operated by the Danish Center for AI Innovation (DCAI), a company established with funding from the Novo Nordisk Foundation, the world’s wealthiest charitable foundation, and the Export and Investment Fund of Denmark. The new AI supercomputer was symbolically turned on by King Frederik X of Denmark, Huang and Nadia Carlsten, CEO of DCAI, at an event in Copenhagen.

Huang sat down with Carlsten, a quantum computing industry leader, to discuss the public-private initiative to build one of the world’s fastest AI supercomputers in collaboration with NVIDIA.

The Gefion AI supercomputer comes to Copenhagen to serve industry, startups and academia.

“Gefion is going to be a factory of intelligence. This is a new industry that never existed before. It sits on top of the IT industry. We’re inventing something fundamentally new,” Huang said.

The launch of Gefion is an important milestone for Denmark in establishing its own sovereign AI. Sovereign AI can be achieved when a nation has the capacity to produce artificial intelligence with its own data, workforce, infrastructure and business networks. Having a supercomputer on national soil provides a foundation for countries to use their own infrastructure as they build AI models and applications that reflect their unique culture and language.

“What country can afford not to have this infrastructure, just as every country realizes you have communications, transportation, healthcare, fundamental infrastructures — the fundamental infrastructure of any country surely must be the manufacturer of intelligence,” said Huang. “For Denmark to be one of the handful of countries in the world that has now initiated on this vision is really incredible.”

The new supercomputer is expected to address global challenges with insights into infectious disease, climate change and food security. Gefion is now being prepared for users, and a pilot phase will begin to bring in projects that seek to use AI to accelerate progress, including in such areas as quantum computing, drug discovery and energy efficiency.

“The era of computer-aided drug discovery must be within this decade. I’m hoping that what the computer did to the technology industry, it will do for digital biology,” Huang said.

Supporting Next Generation of Breakthroughs With Gefion

The Danish Meteorological Institute (DMI) is taking part in the pilot and aims to deliver faster, more accurate weather forecasts. It expects to cut forecast times from hours to minutes while greatly reducing the energy footprint of forecasting compared with traditional methods.

Researchers from the University of Copenhagen are tapping into Gefion to carry out a large-scale distributed simulation of quantum computer circuits. Gefion enables the simulated system to grow from 36 to 40 entangled qubits, bringing it close to what’s known as “quantum supremacy,” the point at which a quantum computer outperforms a traditional computer while using fewer resources.
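To see why each additional qubit is costly, note that a full statevector simulation stores 2^n complex amplitudes, so memory doubles with every qubit added. The back-of-the-envelope arithmetic below (an illustrative sketch, not a figure from the article) shows the jump from 36 to 40 qubits:

```python
# Memory cost of a full statevector simulation: an n-qubit state
# needs 2**n complex amplitudes; at complex128 precision each
# amplitude takes 16 bytes, so memory doubles per added qubit.

def statevector_bytes(n_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Bytes needed to hold the full state of n_qubits."""
    return (2 ** n_qubits) * bytes_per_amplitude

for n in (36, 40):
    tib = statevector_bytes(n) / 2**40  # convert bytes to tebibytes
    print(f"{n} qubits -> {tib:.0f} TiB")
```

Going from 36 to 40 qubits multiplies the state memory sixteenfold (roughly 1 TiB to 16 TiB at double precision), which is why the simulation must be distributed across many GPUs.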

The University of Copenhagen, the Technical University of Denmark, Novo Nordisk and Novonesis are working together on a multi-modal genomic foundation model for discoveries in disease mutation analysis and vaccine design. Their model will be used to improve signal detection and the functional understanding of genomes, made possible by the capability to train LLMs on Gefion.

Startup Go Autonomous is seeking training time on Gefion to develop an AI model that understands and uses multimodal input spanning text, layout and images. Another startup, Teton, is building an AI Care Companion with large-scale video pretraining on Gefion.

Addressing Global Challenges With Leading Supercomputer

The Gefion supercomputer and ongoing collaborations with NVIDIA will position Denmark, with its renowned research community, to pursue the world’s leading scientific challenges with enormous social impact as well as large-scale projects across industries.

With Gefion, researchers will be able to work with industry experts at NVIDIA to co-develop solutions to complex problems, including research in pharmaceuticals and biotechnology and protein design using the NVIDIA BioNeMo platform.

Scientists will also be collaborating with NVIDIA on fault-tolerant quantum computing using NVIDIA CUDA-Q, the open-source hybrid quantum computing platform.

ExecuTorch Beta: On-Device AI and LLMs, Stability, and Acceleration with Partners

TLDR

  • ExecuTorch has achieved Beta status with the release of v0.4, providing stable APIs and runtime, as well as extensive kernel coverage.
  • ExecuTorch is the recommended on-device inference engine for Llama 3.2 1B/3B models, offering enhanced performance and memory efficiency for both original and quantized models.
  • Adoption and ecosystem growth for ExecuTorch have increased significantly; as next steps, the focus is now on improving reliability, performance, and coverage for non-CPU backends.

Current On-Device AI Market

The on-device AI market has been rapidly expanding, and is revolutionizing the way we interact with technology. It is unlocking new experiences, enabling personalization, and reducing latency. Traditionally, computer vision and speech recognition have been the primary use cases for on-device AI, particularly in IoT, industrial applications, and mobile devices. However, the emergence of Large Language Models (LLMs) has made Generative AI the fastest growing sector in AI, subsequently highlighting the importance of on-device Generative AI. IDC forecasts that by 2028, close to 1 billion GenAI-capable smartphones will be shipped worldwide.

LLMs are not only getting smaller but more powerful. This has led to the creation of a new class of applications that leverage multiple models for intelligent agents and streamlined workflows. The community is rapidly adopting and contributing to these new models, with quantized versions being created within hours of model release. Several leading technology companies are investing heavily in small LLMs, even deploying Low-Rank Adaptation (LoRA) at scale on-device to transform user experiences.
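A rough illustration of why LoRA is attractive on-device: instead of updating a full d × k weight matrix, LoRA trains two low-rank factors B (d × r) and A (r × k), so only r·(d + k) parameters change. The layer dimensions below are hypothetical, chosen only to show the scale of the savings:

```python
# LoRA parameter-count comparison: a full update to a d x k weight
# matrix trains d*k parameters, while a rank-r LoRA update W + B @ A
# trains only r*(d + k). Layer sizes here are illustrative assumptions.

def full_update_params(d: int, k: int) -> int:
    """Parameters in a dense d x k weight update."""
    return d * k

def lora_params(d: int, k: int, r: int) -> int:
    """Parameters in the low-rank factors B (d x r) and A (r x k)."""
    return r * (d + k)

d, k, r = 4096, 4096, 16  # hypothetical LLM projection layer, rank 16
full = full_update_params(d, k)
lora = lora_params(d, k, r)
print(f"full: {full:,}  lora: {lora:,}  savings: {full / lora:.0f}x")
```

At these (assumed) dimensions, the adapter is 128x smaller than a full update, which is what makes shipping many task-specific LoRA adapters to a phone practical.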

However, this rapid progress comes at a cost. The fragmentation of our on-device AI landscape creates complexity and inefficiency when going from model authoring to edge deployment. This is where PyTorch’s ExecuTorch comes in – our Beta announcement marks an important milestone in addressing these challenges and empowering developers to create innovative, AI-powered applications.

What’s New Today

It’s been exactly one year since we first open sourced ExecuTorch, six months since Alpha release, and today, we’re excited to announce three main developments:

1. Beta. ExecuTorch has reached Beta status starting from v0.4! It is now widely adopted and used in production environments across Meta. Through this adoption process we’ve identified and addressed feature gaps, improved stability, and expanded kernel and accelerator coverage. These improvements make us confident to promote ExecuTorch from Alpha to Beta status, and we are happy to welcome the community to adopt it in their own production settings. Here are three concrete enhancements:

  1. Developers can write application code and include the latest ExecuTorch as a dependency, updating when needed with a clean API contract. This is possible due to our API stabilization efforts, as well as our explicit API lifecycle and backwards compatibility policy.
  2. Running ExecuTorch on CPUs reached the necessary performance, portability and coverage. In particular, we have implemented more than 85% of all core ATen operators as part of our portable CPU kernels library, to ensure that running a model on ExecuTorch just works in most cases, making missing ops the exception rather than the norm. Moreover, we integrated and extensively tested our XNNPACK delegate for high performance on a wide range of CPU architectures. It is used in a number of production cases today.
  3. In addition to the low-level ExecuTorch components for greater portability, we built extensions and higher-level abstractions to support more common use-cases such as developer tooling to support on-device debugging and profiling, and Module.h extension to simplify deployment for mobile devices.

2. On-Device Large-Language Models (LLMs). There has been a growing interest in the community to deploy Large Language Models (LLMs) on edge devices, as it offers improved privacy and offline capabilities. However, these models are quite large, pushing the limits of what is possible. Fortunately, ExecuTorch can support these models, and we’ve enhanced the overall framework with numerous optimizations.

  • ExecuTorch is the recommended framework for running the latest Llama models on-device with excellent performance today. The Llama 3.2 1B/3B models are well suited for mobile deployment, especially the official quantized 1B/3B releases from Meta, which strike a strong balance between performance, accuracy, and size. When deploying the Llama 3.2 1B/3B quantized models on an Android OnePlus 12 device, decode latency improved by 2.5x and prefill latency by 4.2x on average, while model size decreased by 56% and memory usage by 41% on average (we also verified similar relative performance on a Samsung S24+ for 1B and 3B, and a Samsung S22 for 1B). For the quantized Llama 3.2 1B model, for example, ExecuTorch achieves 50.2 tokens/s for decode and 260 tokens/s for prefill on the OnePlus 12, using the latest CPU kernels from the XNNPACK and Kleidi libraries. These quantized models allow developers to integrate LLMs into memory- and power-constrained devices while still maintaining quality and safety.
  • One of the value propositions of ExecuTorch is seamless use of hardware accelerators on mobile devices. ExecuTorch has also demonstrated Llama running with even greater performance on the Apple MPS backend, the Qualcomm AI Accelerator, and the MediaTek AI Accelerator.
  • There has been growing community and industry interest in multimodal LLMs that go beyond text, evidenced by Meta’s Llama 3.2 11B/90B vision models and open-source models like Llava. So far we have enabled the Llava 1.5 7B model on phones via ExecuTorch, making many optimizations, notably reducing runtime memory from 11GB all the way down to 5GB.
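
To put the quoted throughput numbers in concrete terms, the arithmetic below converts the reported quantized Llama 3.2 1B figures on the OnePlus 12 into per-token latency, and back-derives the implied pre-quantization baseline from the average improvement factors given in the text (the baselines are implied by those ratios, not separately measured).

```python
# Reported quantized Llama 3.2 1B throughput on the OnePlus 12 (from the text).
decode_tps = 50.2    # tokens/s during decode
prefill_tps = 260.0  # tokens/s during prefill

# Per-token decode latency: 1000 ms / (tokens per second).
decode_ms_per_token = 1000.0 / decode_tps  # ~19.9 ms per generated token

# Quoted average improvements for the quantized models.
decode_speedup, prefill_speedup = 2.5, 4.2
baseline_decode_tps = decode_tps / decode_speedup     # implied ~20.1 tokens/s
baseline_prefill_tps = prefill_tps / prefill_speedup  # implied ~61.9 tokens/s

print(round(decode_ms_per_token, 1),
      round(baseline_decode_tps, 1),
      round(baseline_prefill_tps, 1))
```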

3. Ecosystem and Community Adoption
Now that ExecuTorch is in Beta, it is mature enough to be used in production. It is being increasingly used at Meta across various product surfaces. For instance, ExecuTorch already powers various ML inference use cases across Meta’s Ray-Ban Meta Smart Glasses and Quest 3 VR headsets as well as Instagram and WhatsApp.

We also partnered with Hugging Face to provide native ExecuTorch support for models exported using torch.export. This collaboration ensures exported artifacts can be directly lowered and run efficiently on various mobile and edge devices. Models like gemma-2b and phi3-mini are already supported, and support for more foundational models is in progress.

With stable APIs and Gen AI support, we’re excited to build and grow ExecuTorch with the community. The on-device AI community is growing rapidly and finding ways to adopt ExecuTorch across various fields. For instance, ExecuTorch is being used in a mobile app built by Digica to streamline inventory management in hospitals. As another example, Software Mansion developed an app, EraserAI, to remove unwanted objects from a photo with EfficientSAM running on-device with ExecuTorch via Core ML delegate.

Towards General Availability (GA):
Since the original release of ExecuTorch alpha, we’ve seen growing interest within the community in using ExecuTorch in various production environments. To that end, we have made great progress toward more stable, mature APIs and have invested significantly in community support, adoption, and contribution. As we get closer to GA, we are investing our efforts in the following areas:

  • Non-CPU backends: Bringing non-CPU backends to even greater robustness, coverage and performance is our next goal. From day one of our original launch, we have partnered with Apple (for Core ML and MPS), Arm (for EthosU NPU) and Qualcomm (for Hexagon NPU) on accelerator integration with ExecuTorch, and we’ve since then expanded our partnership to MediaTek (NPU) and Cadence (XTensa DSP). We’re also building Vulkan GPU integration in-house. In terms of feature coverage, we’ve successfully implemented the core functionalities with our partners, ensured seamless integration with our developer tooling, and showcased successful LLM integration with many of the accelerators. Our next big step is to thoroughly validate the performance and reliability of the system in real-world, production use-cases. This stage will help us fine-tune the experience and ensure the stability needed for smooth operations.

  • Benchmarking infra: As part of our ongoing testing efforts, we’ve developed a benchmarking infrastructure along with a public dashboard to showcase our progress toward on-device model inference benchmarking. This allows us to transparently track and display model coverage across various backends, giving our community real-time insights into how we’re advancing towards our goals.

We’re excited to share these developments with you and look forward to continued improvements in collaboration with our partners and the community! We welcome community contribution to help us make ExecuTorch the clear choice for deploying AI and LLM models on-device. We invite you to start using ExecuTorch in your on-device projects, or even better consider contributing to it. You can also report any issues on our GitHub page.

Read More

PyTorch 2.5 Release Blog

We are excited to announce the release of PyTorch® 2.5 (release note)! This release features a new CuDNN backend for SDPA, enabling speedups by default for users of SDPA on H100s or newer GPUs. As well, regional compilation of torch.compile offers a way to reduce the cold start up time for torch.compile by allowing users to compile a repeated nn.Module (e.g. a transformer layer in LLM) without recompilations. Finally, TorchInductor CPP backend offers solid performance speedup with numerous enhancements like FP16 support, CPP wrapper, AOT-Inductor mode, and max-autotune mode.

This release is composed of 4095 commits from 504 contributors since PyTorch 2.4. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try these out and report any issues as we improve 2.5. More information about how to get started with the PyTorch 2-series can be found at our Getting Started page.

As well, please check out our new ecosystem projects releases with TorchRec and TorchFix.

Beta:
  • CuDNN backend for SDPA
  • torch.compile regional compilation without recompilations
  • TorchDynamo added support for exception handling & MutableMapping types
  • TorchInductor CPU backend optimization

Prototype:
  • FlexAttention
  • Compiled Autograd
  • Flight Recorder
  • Max-autotune Support on CPU with GEMM Template
  • TorchInductor on Windows
  • FP16 support on CPU path for both eager mode and TorchInductor CPP backend
  • Autoload Device Extension
  • Enhanced Intel GPU support

*To see a full list of public feature submissions click here.

BETA FEATURES

[Beta] CuDNN backend for SDPA

The cuDNN “Fused Flash Attention” backend was landed for torch.nn.functional.scaled_dot_product_attention. On NVIDIA H100 GPUs this can provide up to 75% speed-up over FlashAttentionV2. This speedup is enabled by default for all users of SDPA on H100 or newer GPUs.
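
A minimal sketch of pinning SDPA to the new cuDNN backend. On machines without a CUDA GPU it falls back to calling SDPA normally and letting PyTorch pick the best available backend, so the example runs anywhere; the tensor shapes are illustrative.

```python
# Restrict scaled_dot_product_attention to the cuDNN backend where available.
import torch
import torch.nn.functional as F

q = k = v = torch.randn(1, 4, 16, 8)  # (batch, heads, seq_len, head_dim)

if torch.cuda.is_available():
    from torch.nn.attention import sdpa_kernel, SDPBackend

    q, k, v = (t.cuda().half() for t in (q, k, v))
    # Only the cuDNN fused-flash-attention backend may be selected here.
    with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
        out = F.scaled_dot_product_attention(q, k, v)
else:
    # CPU path: PyTorch dispatches to the math backend automatically.
    out = F.scaled_dot_product_attention(q, k, v)

print(out.shape)  # same shape as q: (1, 4, 16, 8)
```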

[Beta] torch.compile regional compilation without recompilations

Regional compilation without recompilations is enabled via torch._dynamo.config.inline_inbuilt_nn_modules, which defaults to True in 2.5+. This option allows users to compile a repeated nn.Module (e.g. a transformer layer in an LLM) without recompilations. Compared to compiling the full model, this option can yield smaller compilation latencies at the cost of 1%-5% performance degradation.
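
The idea can be sketched as follows: compile each repeated block instead of the whole model, relying on inline_inbuilt_nn_modules so identical blocks share one compiled graph. The model is a hypothetical stand-in, and backend="eager" is used only to keep the sketch free of a C++ toolchain dependency; in practice you would use the default Inductor backend.

```python
# Sketch of regional compilation: compile the repeated block, not the model.
import torch


class Block(torch.nn.Module):  # stands in for a transformer layer
    def __init__(self, dim):
        super().__init__()
        self.ff = torch.nn.Linear(dim, dim)

    def forward(self, x):
        return x + torch.relu(self.ff(x))


class Model(torch.nn.Module):
    def __init__(self, dim=16, n_layers=4):
        super().__init__()
        self.layers = torch.nn.ModuleList(Block(dim) for _ in range(n_layers))
        # Compile each repeated block; with inline_inbuilt_nn_modules=True
        # (the 2.5+ default), identical blocks reuse one compiled graph
        # instead of recompiling per instance.
        for i, layer in enumerate(self.layers):
            self.layers[i] = torch.compile(layer, backend="eager")

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


out = Model()(torch.randn(2, 16))
print(out.shape)  # torch.Size([2, 16])
```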

See the tutorial for more information.

[Beta] TorchInductor CPU backend optimization

This feature advances Inductor’s CPU backend optimization, including CPP backend code generation and FX fusions with customized CPU kernels. The Inductor CPU backend supports vectorization of common data types and all Inductor IR operations, along with static and symbolic shapes. It is compatible with both Linux and Windows and supports the default Python wrapper, the CPP wrapper, and AOT-Inductor mode.

Additionally, it extends the max-autotune mode of the GEMM template (prototyped in 2.5), offering further performance gains. The backend supports various FX fusions, lowering to customized kernels such as oneDNN for Linear/Conv operations and SDPA. The Inductor CPU backend consistently achieves performance speedups across three benchmark suites—TorchBench, Hugging Face, and TIMM—outperforming eager mode in 97.5% of the 193 models tested.

PROTOTYPE FEATURES

[Prototype] FlexAttention

We’ve introduced a flexible API that enables implementing various attention mechanisms such as Sliding Window, Causal Mask, and PrefixLM with just a few lines of idiomatic PyTorch code. This API leverages torch.compile to generate a fused FlashAttention kernel, which eliminates extra memory allocation and achieves performance comparable to handwritten implementations. Additionally, we automatically generate the backwards pass using PyTorch’s autograd machinery. Furthermore, our API can take advantage of sparsity in the attention mask, resulting in significant improvements over standard attention implementations.
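
The mask patterns FlexAttention targets can be illustrated directly as boolean masks fed to scaled_dot_product_attention. FlexAttention itself expresses them as small score_mod/mask functions and fuses them into one generated kernel; this sketch only demonstrates the mask semantics (causal, then causal plus a sliding window), not the fused API, and the shapes are illustrative.

```python
# Build causal and sliding-window attention masks as plain tensor ops.
import torch
import torch.nn.functional as F

seq_len, window = 8, 3
idx = torch.arange(seq_len)
q_idx, kv_idx = idx[:, None], idx[None, :]

causal = q_idx >= kv_idx                                # attend only to the past
sliding_window = causal & (q_idx - kv_idx < window)     # plus a local window

q = k = v = torch.randn(1, 2, seq_len, 4)  # (batch, heads, seq, head_dim)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=sliding_window)
print(out.shape)  # torch.Size([1, 2, 8, 4])
```

Because the sliding-window mask zeroes out most of each row, a kernel that exploits this sparsity (as FlexAttention does) can skip entire blocks of the score matrix.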

For more information and examples, please refer to the official blog post and Attention Gym.

[Prototype] Compiled Autograd

Compiled Autograd is an extension to the PT2 stack allowing the capture of the entire backward pass. Unlike the backward graph traced by AOT dispatcher, Compiled Autograd tracing is deferred until backward execution time, which makes it impervious to forward pass graph breaks, and allows it to record backward hooks into the graph.

Please refer to the tutorial for more information.

[Prototype] Flight Recorder

Flight recorder is a new debugging tool that helps debug stuck jobs. The tool works by continuously capturing information about collectives as they run. Upon detecting a stuck job, the information can be used to quickly identify misbehaving ranks/machines along with code stack traces.
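
As a hedged sketch of how a job might opt in, Flight Recorder is driven by environment variables set before launching the distributed job; the variable names below follow the flight-recorder tutorial, but check the docs for your PyTorch version before relying on them.

```shell
# Enable Flight Recorder before launching the distributed job.
export TORCH_NCCL_TRACE_BUFFER_SIZE=2000  # ring-buffer capacity; 0 disables
export TORCH_NCCL_DUMP_ON_TIMEOUT=true    # dump traces when a watchdog timeout fires
echo "$TORCH_NCCL_TRACE_BUFFER_SIZE"
```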

For more information please refer to the following tutorial.

[Prototype] Max-autotune Support on CPU with GEMM Template

Max-autotune mode for the Inductor CPU backend in torch.compile profiles multiple implementations of operations at compile time and selects the best-performing one. This is particularly beneficial for GEMM-related operations, using a C++ template-based GEMM implementation as an alternative to the ATen-based approach with oneDNN and MKL libraries. We support FP32, BF16, FP16, and INT8 with epilogue fusions for x86 CPUs. We’ve seen up to 7% geomean speedup on the dynamo benchmark suites and up to 20% boost in next-token latency for LLM inference.

For more information please refer to the tutorial.

[Prototype] TorchInductor CPU on Windows

The Inductor CPU backend in torch.compile now works on Windows. We currently support MSVC (cl), clang (clang-cl), and the Intel compiler (icx-cl) for Windows Inductor.

See the tutorial for more details.

[Prototype] FP16 support on CPU path for both eager mode and TorchInductor CPP backend

Float16 is a commonly used reduced-precision floating point type for improving performance in neural network inference and training. As of this release, float16 is supported on the CPU path for both eager mode and TorchInductor.

[Prototype] Autoload Device Extension

PyTorch now supports autoloading for out-of-tree device extensions, streamlining integration by eliminating the need for manual imports. This feature, enabled through the torch.backends entrypoint, simplifies usage by ensuring seamless extension loading, while allowing users to disable it via an environment variable if needed.
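
A hedged sketch of how an out-of-tree backend package might register itself for autoloading via the torch.backends entry point named above; the package name torch_foo and the _autoload hook are hypothetical, and the exact entry-point shape should be confirmed against the tutorial.

```toml
# pyproject.toml fragment for a hypothetical out-of-tree backend "torch_foo":
# the "torch.backends" entry-point group is what PyTorch scans at import time.
[project.entry-points."torch.backends"]
torch_foo = "torch_foo:_autoload"
```

With this in place, `import torch` alone would trigger the backend’s `_autoload` hook, with the environment-variable opt-out available when autoloading is not wanted.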

See the tutorial for more information.

[Prototype] Enhanced Intel GPU support

Enhanced Intel GPU support is now available for both Intel® Data Center GPU Max Series and Intel® Client GPUs (Intel® Core™ Ultra processors with built-in Intel® Arc™ graphics, and Intel® Arc™ Graphics for dGPU parts), making it easier to accelerate machine learning workflows on Intel GPUs in the PyTorch 2.5 release. We also enabled initial support for PyTorch on Windows for Intel® Client GPUs in this release.

  • Expanded the PyTorch hardware backend support matrix to include both Intel Data Center and Client GPUs.
  • Implemented SYCL* kernels to enhance coverage and execution of ATen operators on Intel GPUs, boosting performance in PyTorch eager mode.
  • Enhanced the Intel GPU backend of torch.compile to improve inference and training performance for a wide range of deep learning workloads.

These features are available through PyTorch preview and nightly binary PIP wheels. For more information regarding Intel GPU support, please refer to documentation.

Read More

Towards Data-Centric RLHF: Simple Metrics for Preference Dataset Comparison

The goal of aligning language models to human preferences requires data that reveal these preferences. Ideally, time and money can be spent carefully collecting and tailoring bespoke preference data to each downstream application. However, in practice, a select few publicly available preference datasets are often used to train reward models for reinforcement learning from human feedback (RLHF). While new preference datasets are being introduced with increasing frequency, there are currently no existing efforts to measure and compare these datasets. In this paper, we systematically study… (Apple Machine Learning Research)

CtrlSynth: Controllable Image-Text Synthesis for Data-Efficient Multimodal Learning

Pretraining robust vision or multimodal foundation models (e.g., CLIP) relies on large-scale datasets that may be noisy, potentially misaligned, and have long-tail distributions. Previous works have shown promising results in augmenting datasets by generating synthetic samples. However, they only support domain-specific ad hoc use cases (e.g., either image or text only, but not both), and are limited in data diversity due to a lack of fine-grained control over the synthesis process. In this paper, we design a controllable image-text synthesis pipeline, CtrlSynth, for data-efficient and robust… (Apple Machine Learning Research)