HadaCore: Tensor Core Accelerated Hadamard Transform Kernel

IBM: Krish Agarwal, Rishi Astra, Adnan Hoque, Mudhakar Srivatsa, Raghu Ganti
Meta: Less Wright, Sijia Chen

Quantization is a method for improving model inference speeds by compressing model weights and performing (faster) computation in lower precision data types. However, quantization can result in accuracy loss due to the presence of outliers. Recent works like QuaRot, SpinQuant, and FlashAttention-3 introduce methods to increase the numerical accuracy of INT4, INT8 and FP8 quantization in LLMs. These methods rely on Hadamard Transforms. In this blog, we present HadaCore, a Hadamard Transform CUDA kernel that achieves state-of-the-art performance on NVIDIA A100 and H100 GPUs. Our kernel achieves speedups of 1.1–1.4x and 1.0–1.3x, with a peak gain of 3.5x and 3.6x respectively, over Dao AI Lab’s Fast Hadamard Transform Kernel. We leverage a hardware-aware work decomposition that benefits from Tensor Core acceleration while maintaining quantization error reduction.

Figure 1: Speedup of HadaCore vs Dao AI Hadamard CUDA kernel. A peak gain of 3.46x on the A100 is achieved using a size-128 Hadamard rotation on 8.4M elements.

The HadaCore Kernel is publicly available.

Background

QuaRot and SpinQuant both propose methods to increase the numerical accuracy of INT4 and INT8 quantization in LLMs. Both methods rotate model activations since rotations are statistically likely to reduce the magnitude of outliers, as it “distributes” extreme values among other (less extreme) dimensions, and rotation is also an easily invertible operation using the inverse of the rotation matrix. These methods can also improve FP8 inference accuracy, such as in FlashAttention-3.

Figure 2. Transformer block showing online (red) and offline rotations (blue) in QuaRot

Applying these rotation matrices introduces model runtime overhead due to the online operations shown in Figure 2. These rotations can be applied through matrix multiplication, but the added overhead would diminish the benefits from quantization. Therefore, QuaRot and SpinQuant opt to use Walsh-Hadamard matrices, a special type of rotation matrix that can be applied faster than matrix multiplication using the Fast Walsh-Hadamard Transform algorithm. HadaCore is an optimized implementation of this algorithm for NVIDIA GPUs that support Tensor Cores.

Tensor Core Accelerated Hadamard Transform

HadaCore leverages NVIDIA Tensor Cores, which are specialized compute units on NVIDIA GPUs optimized for matrix multiplication. To achieve this, our kernel performs a hardware-aware work decomposition of the Fast Walsh-Hadamard algorithm. This work decomposition ensures that we can utilize the MMA PTX instructions that execute on the Tensor Core chip. HadaCore applies a 16×16 Hadamard transform to chunks of the input data. The computation can then be offloaded to the FP16 Tensor Core with usage of the mma.m16n8k16 instruction. The warp-level parallelism for HadaCore is shown below.

Figure 3: HadaCore Parallelization, 1×256 vectors (rows) being rotated by a size 256 Hadamard.

We process fragments of 256 elements in parallel using warp-level Tensor Core operations to achieve up to a 256-size Hadamard transform. For further sizes, we shuffle data between warps and repeat.
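To make the decomposition concrete, here is a minimal NumPy sketch (for illustration only, not the kernel itself) showing that a size-256 Hadamard transform factors into two 16×16 Hadamard multiplications, the kind of small matrix products that map onto Tensor Core MMA instructions:

import numpy as np
from scipy.linalg import hadamard

x = np.random.randn(256).astype(np.float32)
H16 = hadamard(16).astype(np.float32)

# Reference: the full (unnormalized) 256-point transform as one matrix-vector product.
ref = hadamard(256).astype(np.float32) @ x

# Decomposed: for Sylvester-ordered Hadamard matrices, H_256 = kron(H_16, H_16),
# so the vector can be viewed as a 16x16 tile with H_16 applied on both sides.
X = x.reshape(16, 16)
out = (H16 @ X @ H16).reshape(-1)

assert np.allclose(out, ref)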

Microbenchmarks

We benchmark HadaCore against the Dao AI Lab Hadamard Kernel on both NVIDIA H100 and A100 GPUs across varying Hadamard and input tensor sizes.

Figure 4: HadaCore Kernel Speedup on NVIDIA A100 over Dao AI Lab Fast Hadamard Kernel

Color coded Speedup Table for NVIDIA A100, Green = Speedup over Baseline

Figure 5: HadaCore Kernel Speedup on NVIDIA H100 over Dao AI Lab Fast Hadamard Kernel

Color coded Speedup Table for NVIDIA H100, Green = Speedup over Baseline

We showcase our speedup as the input tensor size (labeled element count in our charts) increases. Element count is the number of elements in the target matrix we are rotating. For example, in multi-head attention:

The queries (Q), keys (K) and values (V) tensors are 4D tensors of size:

(batch_size, seq_len, n_heads, head_dim)

A Hadamard matrix of size head_dim is applied to these activation tensors, so we refer to this as using a Hadamard size of head_dim with an element count of:

batch_size*seq_len*n_heads*head_dim.
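For example, the Llama-2 70b prefill entry in the table below follows directly from this product:

# element count for Llama-2 70b prefill at 4096 tokens (Hadamard size = head_dim = 128)
batch_size, seq_len, n_heads, head_dim = 1, 4096, 64, 128
element_count = batch_size * seq_len * n_heads * head_dim  # 33,554,432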

Common element counts for query rotations in an attention block:

Model | Prefill | Decoding
Llama-2 70b | 33,554,432 elements, 128 Hadamard size (1 batch * 64 heads * 4096 tokens * 128 dimensional embeddings per head per token) | 8,192 elements, 128 Hadamard size (1 batch * 64 heads * 1 token * 128 dimensional embeddings per head per token)
Llama-3 8b | 33,554,432 elements, 128 Hadamard size (1 batch * 32 heads * 8192 tokens * 128 dimensional embeddings per head per token) | 4,096 elements, 128 Hadamard size (1 batch * 32 heads * 1 token * 128 dimensional embeddings per head per token)

HadaCore achieves 1.1–1.4x speedup on A100 and 1.0–1.3x speedup on H100 over Dao AI Lab’s Fast Hadamard kernel, with a peak gain of 3.5x and 3.6x, respectively. For smaller sizes on H100, HadaCore’s gain decreases. For future work, we plan to incorporate usage of Hopper specific features like TMA and WGMMA for improved H100 performance.

MMLU Benchmarks

We evaluated MMLU scores on a Llama 3.1-8B inference workload where the FlashAttention computation was performed in FP8. Newer generation NVIDIA Hopper GPUs come equipped with FP8 Tensor Cores that deliver substantial compute gain over FP16.

Our results show the benefit of using HadaCore for accuracy preservation when combined with optimizations such as FP8 FlashAttention.

Format | Method | Llama3.1-8B Avg. 5-Shot MMLU Accuracy
Q, K, V: FP16; FlashAttention: FP16 | N/A | 65.38
Q, K, V: FP16; FlashAttention: FP8 | No Hadamard | 64.40
Q, K, V: FP8; FlashAttention: FP8 | HadaCore | 65.09
Q, K, V: FP8; FlashAttention: FP8 | Dao AI Fast Hadamard Kernel | 65.45

Table 1: MMLU scores for Llama3.1 8B with FP16 baseline and FP8 attention using Hadamard transforms, comparing an implementation with explicit Hadamard matrix multiplications vs. HadaCore (higher is better)

From the above MMLU scores, we note that for Llama3.1-8B inference with FP8 attention, HadaCore reduces the quantization error introduced by computing attention in lower precision.

Conclusion

We showcased the speedups achieved by moving the Fast Walsh-Hadamard algorithm into a CUDA kernel that leverages Tensor Core acceleration, achieving a peak speedup of 3.5x and 3.6x over the Dao AI Fast-Hadamard kernel on NVIDIA A100 and H100, respectively.

Further, we showed on the MMLU benchmark that rotating with HadaCore maintains similar quantization error reduction to the Fast-Hadamard kernel, while providing computational acceleration.

Future Work

We plan to implement a Triton version of our kernel and experiment with more advanced techniques such as kernel fusion to support fused Hadamard transform and quantization. Further, we plan to extend our kernel to support BF16 Tensor Core compute.

Read More

Supercharging Training using float8 and FSDP2

IBM: Tuan Hoang Trong, Alexei Karve, Yan Koyfman, Linsong Chu, Divya Kumari, Shweta Salaria, Robert Walkup, Praneet Adusumilli, Nirmit Desai, Raghu Ganti, Seetharami Seelam
Meta: Less Wright, Wei Feng, Vasiliy Kuznetsov, Driss Guesseous

In this blog, we will demonstrate how we achieve up to 50% throughput speedup over FSDP1 bf16 training while maintaining loss and evaluation benchmark parity. We achieve this speedup by leveraging FSDP2, DTensor, and torch.compile with torchao's float8 via linear layer updates (compute), and float8 all_gathers for weight communication. We showcase these improvements across a spectrum of Meta Llama model architecture sizes, ranging from a small 1.8B model all the way to a 405B model, making training faster than ever.

We demonstrate these improvements using the Meta Llama3 architecture, and then perform model quality studies at two scales: 100B tokens at 8B model size, and 50B tokens at 70B model size, which provide an exact comparison of float8 and bf16 training loss curves. We demonstrate that the loss curves result in identical loss convergence across these model training runs compared to the bf16 counterpart. Further, we train a 3B model to 1T tokens using the FineWeb-edu dataset and run standard evaluation benchmarks to ensure that the model quality is intact and comparable to a bf16 run.

At IBM Research, we plan to adopt these capabilities for our data ablations to improve the number of experiments we can perform in a given GPU budget. Longer term, we will follow up with a larger scale model run to demonstrate the end-to-end feasibility of float8 training.

What is Float8?

The float8 format for training models was introduced by NVIDIA, ARM, and Intel in a 2022 paper which demonstrated the feasibility of training using lower precision float8, without sacrificing model quality. With the introduction of newer GPUs like the NVIDIA Hopper series, FP8 training became feasible with the potential of more than 2x improvement in training throughput due to native float8 tensor core support. There are a few challenges to realize this promise:
(i) Enable the core model operations like matmul and attention in float8,
(ii) Enable float8 training in a distributed framework, and
(iii) Enable weight communication between GPUs in float8.
While the float8 matmul was enabled by NVIDIA libraries, the latter two were provided in recent updates to FSDP2 and torchao.

In this blog, we are using torchtitan as the entry point for training, IBM’s deterministic data loader, the float8 linear layer implementation from torchao, and the float8 all gather from the latest PyTorch nightlies in conjunction with FSDP2. For this training, we are using the float8 per tensor (tensorwise) scaling granularity rather than rowwise. We leverage torch.compile to ensure that we get maximum performance gains. We are computing attention in bf16 using SDPA and are currently working on moving this to float8 as well.
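As a rough sketch of the compute piece, the snippet below swaps a model's linear layers for float8 training variants and compiles the result. It assumes torchao's convert_to_float8_training API, a recent PyTorch nightly, and an FP8-capable GPU (e.g., H100); exact names may differ across torchao versions:

import torch
import torch.nn as nn
from torchao.float8 import convert_to_float8_training  # assumed torchao API

model = nn.Sequential(nn.Linear(4096, 4096), nn.Linear(4096, 4096)).cuda().to(torch.bfloat16)

# Swap eligible nn.Linear modules for float8 variants (tensorwise scaling by default).
convert_to_float8_training(model)

# torch.compile fuses the per-tensor scaling/casting so the fp8 matmul gains are realized.
model = torch.compile(model)

out = model(torch.randn(16, 4096, device="cuda", dtype=torch.bfloat16))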

Experiments

We perform various experiments to demonstrate the benefits of float8 training. The first is to ensure that model quality is not sacrificed. To verify this, we train an 8B model and 70B model for a few thousand steps and compare the loss curves between both the float8 and bf16 training run. Our experiments are performed on three different H100 clusters with 128, 256, and 512 H100 GPU configurations in very different environments to demonstrate reproducibility. The first cluster is customized on Grand Teton in Meta with 400Gbps custom interconnect, the second is an IBM research cluster with 3.2Tbps Infiniband interconnect, and the third is an IBM Cloud cluster with 3.2Tbps RoCE interconnect for GPU-to-GPU communication.

First, we plot the loss curve comparisons for both these models in the below figures to demonstrate loss parity for a few thousand steps.

Figure 1: (a) 8B model loss parity for 2k steps, (b) 70B loss parity for 1k steps

We observe that across these different models and in different environments, we obtain loss parity for the small scale of tokens. Next, we characterize the throughput gains for four different model sizes ranging from 1.8B to 405B. We explored the best batch size and activation checkpointing schemes for both the float8 and bf16 training runs to determine the tokens/sec/GPU (wps) metric and report the performance gain. For the 405B model, we leveraged DTensor for tensor parallel training with FSDP2. We use a sequence length of 8K for all our measurements.

Model size | wps (bf16) | wps (float8) | Percent gain
1.8B | 29K | 35K | 18%
8B | 8K | 10K | 28%
70B | 956 | 1430 | 50%
405B (TP4) | 149 | 227 | 52%

Table 1: Performance gains over bf16 (both bf16 and float8 use torch.compile)

We observe from Table 1 that the gains for larger models (70B and 405B) reach up to 50%, while the smaller models see gains between roughly 20 and 30%. In further experiments, we observed that the addition of float8 all_gather enables a boost of ~5% beyond the float8 compute itself, which is in line with the observations in this blog.

Second, to demonstrate the effectiveness of an FP8 model, we trained a 3B model following the Llama3 architecture for 1T tokens using the FineWeb-edu dataset from Hugging Face. We performed evaluations using the lm-eval-harness framework and present a small portion of these results in the below table. We observe that the bf16 performance is marginally better than the float8 scores (about one percent). While some scores are significantly better with bf16 (e.g., MMLU is 3 pts higher), we expect these gaps to vanish when the right hyperparameters are chosen and across larger scale training runs (e.g., the bf16 run had half the batch size, and it is well known that smaller batch size runs can improve evaluation scores).

Benchmark | Score (float8) | Score (bf16)
MMLU (5-shot) | 0.26 | 0.29
ARC-e | 0.73 | 0.73
ARC-c | 0.43 | 0.46
Hellaswag | 0.65 | 0.67
sciq | 0.89 | 0.88
OpenBook QA | 0.43 | 0.43
PIQA | 0.76 | 0.76
Winogrande | 0.60 | 0.65
Average | 0.59 | 0.60

Table 2: Benchmark scores for float8 trained model running in FP16 for eval (at 1T tokens of FineWeb pre-training).

Finally, we scale our experiments to 512 H100 GPUs on the IBM Cloud cluster. We were able to recreate the results and speedups that we observed even at 512 GPU scale. We summarize these results only for the large models in the below table (70B and 405B).

Model size | wps (bf16) | wps (float8) | Percent gain
70B | 960 | 1448 | 51%
405B (TP4) | 152 | 217 | 43%

Table 3: Performance gains over bf16 (both bf16 and float8 use torch.compile) for 512 GPU scale

Future work

We are also working on evaluating other forms of parallelism, such as Context Parallelism. We plan to evaluate all of these features to demonstrate their composability and the ability to make choices for training large scale models.

Acknowledgements

We thank Davis Wertheimer from IBM Research for enabling the data loader for torchtitan runs, which allowed us to replay data in the same order across multiple runs. We also thank IBM Cloud for providing us with early test access to the H100 cluster.

Read More

Rebellions Joins the PyTorch Foundation as a General Member

Rebellions logo

The PyTorch Foundation, a neutral home for the deep learning community to collaborate on the open source PyTorch framework and ecosystem, is announcing today that Rebellions has joined as a general member.

Rebellions is a South Korea-based semiconductor company specializing in the design and development of AI chips for data centers and edge devices. Their innovative hardware and software solutions aim to accelerate generative AI and machine learning workloads, focusing on high energy efficiency and performance. The company successfully launched and deployed its AI chip ‘ATOM’ targeting data centers in 2023 and is developing its next-generation AI accelerator ‘REBEL’.

“We’re thrilled to welcome Rebellions as a new general member of the PyTorch Foundation,” said Matt White, Executive Director of the PyTorch Foundation. “Rebellions brings a unique perspective to the PyTorch ecosystem with their focus on advancing the integration of NPU architectures for AI acceleration with PyTorch. Their expertise will play a vital role in ensuring PyTorch continues to evolve as a versatile framework, accommodating the diverse needs of modern AI workloads. We look forward to collaborating with Rebellions to drive innovation and strengthen the PyTorch ecosystem for developers worldwide.”

Rebellions has introduced native support for PyTorch 2.0 in their RBLN SDK. This integration includes compatibility with torch.compile, a pivotal feature of PyTorch 2.0 that enhances model performance. Through this development, Rebellions has empowered developers to seamlessly harness the full potential of their AI accelerator lineup within the environment.

Rebellions is also deeply committed to advancing the PyTorch ecosystem through collaborative innovation starting in Korea. The company has established a Special Interest Group (SIG) focusing on PyTorch Core within the PyTorch Korea community and is actively working with volunteers recruited through MODULABS, an open research institute, to integrate native support for the deep learning framework into their Neural Processing Unit (NPU).

In addition, Rebellions is collaborating with academic institutions, such as Yonsei University, Hanyang University, University of Science & Technology (UST), and national agencies, such as the Electronics and Telecommunications Research Institute (ETRI), to offer undergraduate and graduate courses on PyTorch and enable them to leverage PyTorch as their research platform.

These initiatives highlight Rebellions’ dedication to optimizing the PyTorch experience for developers and researchers alike, while also fostering education and innovation in the field.

“By integrating our hardware innovations with PyTorch, we’re building Native NPU support to accelerate diverse AI workloads.” said Hong-seok Kim, the Chief Software Architect at Rebellions. “We’re excited to contribute to the PyTorch community by community-driven initiatives and partnerships, advancing NPU architecture support for next-generation AI solutions. Together with the PyTorch community, we aim to pioneer new possibilities in AI acceleration and empower developers worldwide with efficient computing solutions.”

To learn more about how your organization can be a part of the PyTorch Foundation, visit our website.

About Rebellions

Rebellions is a South Korea-based semiconductor company specializing in the design and development of AI chips for data centers and edge devices. Their innovative hardware and software solutions aim to accelerate generative AI and machine learning workloads, focusing on high energy efficiency and performance. The company successfully launched and deployed its AI chip ‘ATOM’ targeting data centers in 2023 and is developing its next-generation AI accelerator ‘REBEL’ incorporating a scalable chiplet architecture and high-bandwidth memory.

About PyTorch Foundation

The PyTorch Foundation is a neutral home for the deep learning community to collaborate on the open source PyTorch framework and ecosystem. The PyTorch Foundation is supported by its members and leading contributors to the PyTorch open source project. The Foundation leverages resources provided by members and contributors to enable community discussions and collaboration.

About The Linux Foundation

The Linux Foundation is the world’s leading home for collaboration on open source software, hardware, standards, and data. Linux Foundation projects are critical to the world’s infrastructure including Linux, Kubernetes, Node.js, ONAP, PyTorch, RISC-V, SPDX, OpenChain, and more. The Linux Foundation focuses on leveraging best practices and addressing the needs of contributors, users, and solution providers to create sustainable models for open collaboration. For more information, please visit us at linuxfoundation.org.

Read More

Distilling Llama3.1 8B into 1B in torchtune

In this blog, we present a case study on distilling a Llama 3.1 8B model into Llama 3.2 1B using torchtune’s knowledge distillation recipe. We demonstrate how knowledge distillation (KD) can be used in post-training to improve instruction-following task performance and showcase how users can leverage the recipe.

What is Knowledge Distillation?

Knowledge Distillation is a widely used compression technique that transfers knowledge from a larger (teacher) model to a smaller (student) model. Larger models have more parameters and capacity for knowledge; however, this larger capacity is also more computationally expensive to deploy. Knowledge distillation can be used to compress the knowledge of a larger model into a smaller one. The idea is that the performance of a smaller model can be improved by learning from the larger model's outputs.

How does Knowledge Distillation work?

Knowledge is transferred from the teacher to student model by training on a transfer set where the student is trained to imitate the token-level probability distributions of the teacher. The assumption is that the teacher model distribution is similar to the transfer dataset. The diagram below is a simplified representation of how KD works.

Figure 1: Simplified representation of knowledge transfer from teacher to student model

As knowledge distillation for LLMs is an active area of research, there are papers, such as MiniLLM, DistiLLM, AKL, and Generalized KD, investigating different loss approaches. In this case study, we focus on the standard cross-entropy (CE) loss with the forward Kullback-Leibler (KL) divergence loss as the baseline. Forward KL divergence aims to minimize the difference by forcing the student’s distribution to align with all of the teacher’s distributions.
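For reference, below is a minimal PyTorch sketch of this forward-KL distillation term; torchtune's ForwardKLLoss may differ in details such as chunking and normalization:

import torch
import torch.nn.functional as F

def forward_kl_loss(student_logits, teacher_logits, labels, ignore_index=-100):
    # KL(teacher || student) per token, dropping the teacher-entropy term
    # (constant w.r.t. the student), averaged over non-padding tokens.
    teacher_prob = F.softmax(teacher_logits, dim=-1)
    student_logprob = F.log_softmax(student_logits, dim=-1)
    per_token = -(teacher_prob * student_logprob).sum(dim=-1)
    mask = (labels != ignore_index).float()
    return (per_token * mask).sum() / mask.sum().clamp(min=1.0)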

Why is Knowledge Distillation useful?

The idea of knowledge distillation is that a smaller model can achieve better performance using a teacher model’s outputs as an additional signal than it could training from scratch or with supervised fine-tuning. For instance, Llama 3.2 lightweight 1B and 3B text models incorporated logits from Llama 3.1 8B and 70B to recover performance after pruning. In addition, for fine-tuning on instruction-following tasks, research in LLM distillation demonstrates that knowledge distillation methods can outperform supervised fine-tuning (SFT) alone.

Model | Method | DollyEval (GPT-4 Eval) | Self-Inst (GPT-4 Eval) | S-NI (Rouge-L)
Llama 7B | SFT | 73.0 | 69.2 | 32.4
Llama 7B | KD | 73.7 | 70.5 | 33.7
Llama 7B | MiniLLM | 76.4 | 73.1 | 35.5
Llama 1.1B | SFT | 22.1 | - | 27.8
Llama 1.1B | KD | 22.2 | - | 28.1
Llama 1.1B | AKL | 24.4 | - | 31.4
OpenLlama 3B | SFT | 47.3 | 41.7 | 29.3
OpenLlama 3B | KD | 44.9 | 42.1 | 27.9
OpenLlama 3B | SeqKD | 48.1 | 46.0 | 29.1
OpenLlama 3B | DistiLLM | 59.9 | 53.3 | 37.6

Table 1: Comparison of knowledge distillation approaches to supervised fine-tuning

Below is a simplified example of how knowledge distillation differs from supervised fine-tuning.

Supervised fine-tuning:

model = llama3_2_1b()
ce_loss = CrossEntropyLoss()

tokens, labels = batch["tokens"], batch["labels"]
logits = model(tokens, ...)

loss = ce_loss(logits, labels)
loss.backward()

Knowledge distillation:

model = llama3_2_1b()
teacher_model = llama3_1_8b()
ce_loss = CrossEntropyLoss()
kd_loss = ForwardKLLoss()

tokens, labels = batch["tokens"], batch["labels"]
logits = model(tokens, ...)
teacher_logits = teacher_model(tokens, ...)
loss = ce_loss(logits, labels) + kd_loss(logits, teacher_logits, labels)
loss.backward()

KD recipe in torchtune

With torchtune, we can easily apply knowledge distillation to Llama3, as well as other LLM model families, using torchtune’s KD recipe. The objective for this recipe is to fine-tune Llama3.2-1B on the Alpaca instruction-following dataset by distilling from Llama3.1-8B. This recipe focuses on post-training and assumes the teacher and student models have already been pre-trained.

First, we have to download the model weights. To be consistent with other torchtune fine-tuning configs, we will use the instruction tuned models of Llama3.1-8B as teacher and Llama3.2-1B as student.

tune download meta-llama/Meta-Llama-3.1-8B-Instruct --output-dir /tmp/Meta-Llama-3.1-8B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf_token <HF_TOKEN>

tune download meta-llama/Llama-3.2-1B-Instruct --output-dir /tmp/Llama-3.2-1B-Instruct --ignore-patterns "original/consolidated.00.pth" --hf_token <HF_TOKEN>

In order for the teacher model distribution to be similar to the Alpaca dataset, we will fine-tune the teacher model using LoRA. Based on our experiments, shown in the next section, we’ve found that KD performs better when the teacher model is already fine-tuned on the target dataset.

tune run lora_finetune_single_device --config llama3_1/8B_lora_single_device

Finally, we can run the following command to distill the fine-tuned 8B model into the 1B model on a single GPU. For this case study, we used a single A100 80GB GPU. We also have a distributed recipe for running on multiple devices.

tune run knowledge_distillation_single_device --config llama3_2/knowledge_distillation_single_device

Ablation studies

In this section, we demonstrate how changing configurations and hyperparameters can affect performance. By default, our configuration uses the LoRA fine-tuned 8B teacher model, downloaded 1B student model, learning rate of 3e-4 and KD loss ratio of 0.5. For this case study, we fine-tuned on the alpaca_cleaned_dataset and evaluated the models on truthfulqa_mc2, hellaswag and commonsense_qa tasks through the EleutherAI LM evaluation harness. Let’s take a look at the effects of:

  1. Using a fine-tuned teacher model
  2. Using a fine-tuned student model
  3. Hyperparameter tuning of KD loss ratio and learning rate

Using a fine-tuned teacher model

The default settings in the config use the fine-tuned teacher model. Now, let's take a look at the effects of not fine-tuning the teacher model first.

Looking at the losses, using the baseline 8B as teacher results in a higher loss than using the fine-tuned teacher model. The KD loss also remains relatively constant, suggesting that the teacher model should have the same distribution as the transfer dataset.

Figure 2: (left to right) KD loss from forward KL divergence, class loss from cross entropy, total loss: even combination of KD and class loss.

In our benchmarks, we can see that supervised fine-tuning of the 1B model achieves better accuracy than the baseline 1B model. By using the fine-tuned 8B teacher model, we see comparable results for truthfulqa and improvement for hellaswag and commonsense. When using the baseline 8B as a teacher, we see improvement across all metrics, but lower than the other configurations.

Model | TruthfulQA (mc2) | hellaswag (acc) | hellaswag (acc_norm) | commonsense (acc)
Baseline Llama 3.1 8B | 0.5401 | 0.5911 | 0.7915 | 0.7707
Fine-tuned Llama 3.1 8B using LoRA | 0.5475 | 0.6031 | 0.7951 | 0.7789
Baseline Llama 3.2 1B | 0.4384 | 0.4517 | 0.6064 | 0.5536
Fine-tuned Llama 3.2 1B using LoRA | 0.4492 | 0.4595 | 0.6132 | 0.5528
KD using baseline 8B as teacher | 0.444 | 0.4576 | 0.6123 | 0.5561
KD using fine-tuned 8B as teacher | 0.4481 | 0.4603 | 0.6157 | 0.5569

Table 2: Comparison between using baseline and fine-tuned 8B as teacher model

Using a fine-tuned student model

For these experiments, we look at the effects of KD when the student model is already fine-tuned. We analyze the effects using different combinations of baseline and fine-tuned 8B and 1B models.

Based on the loss graphs, using a fine-tuned teacher model results in a lower loss irrespective of whether the student model is fine-tuned or not. It’s also interesting to note that the class loss starts to increase when using a fine-tuned student model.

Figure 3: Comparing losses of different teacher and student model initializations

Using the fine-tuned student model boosts accuracy even further for truthfulqa, but the accuracy drops for hellaswag and commonsense. Using a fine-tuned teacher model and baseline student model achieved the best results on hellaswag and commonsense dataset. Based on these findings, the best configuration will change depending on which evaluation dataset and metric you are optimizing for.

Model | TruthfulQA (mc2) | hellaswag (acc) | hellaswag (acc_norm) | commonsense (acc)
Baseline Llama 3.1 8B | 0.5401 | 0.5911 | 0.7915 | 0.7707
Fine-tuned Llama 3.1 8B using LoRA | 0.5475 | 0.6031 | 0.7951 | 0.7789
Baseline Llama 3.2 1B | 0.4384 | 0.4517 | 0.6064 | 0.5536
Fine-tuned Llama 3.2 1B using LoRA | 0.4492 | 0.4595 | 0.6132 | 0.5528
KD using baseline 8B and baseline 1B | 0.444 | 0.4576 | 0.6123 | 0.5561
KD using baseline 8B and fine-tuned 1B | 0.4508 | 0.448 | 0.6004 | 0.5274
KD using fine-tuned 8B and baseline 1B | 0.4481 | 0.4603 | 0.6157 | 0.5569
KD using fine-tuned 8B and fine-tuned 1B | 0.4713 | 0.4512 | 0.599 | 0.5233

Table 3: Comparison using baseline and fine-tuned teacher and student models

Hyperparameter tuning: learning rate

By default, the recipe has a learning rate of 3e-4. For these experiments, we changed the learning rate from as high as 1e-3 to as low as 1e-5.

Based on the loss graphs, all learning rates result in similar losses except for 1e-5, which has a higher KD and class loss.

Figure 4: Comparing losses of different learning rates

Based on our benchmarks, the optimal learning rate changes depending on which metric and tasks you are optimizing for.

Model | learning rate | TruthfulQA (mc2) | hellaswag (acc) | hellaswag (acc_norm) | commonsense (acc)
Baseline Llama 3.1 8B | - | 0.5401 | 0.5911 | 0.7915 | 0.7707
Fine-tuned Llama 3.1 8B using LoRA | - | 0.5475 | 0.6031 | 0.7951 | 0.7789
Baseline Llama 3.2 1B | - | 0.4384 | 0.4517 | 0.6064 | 0.5536
Fine-tuned Llama 3.2 1B using LoRA | - | 0.4492 | 0.4595 | 0.6132 | 0.5528
KD using fine-tuned 8B and baseline 1B | 3e-4 | 0.4481 | 0.4603 | 0.6157 | 0.5569
KD using fine-tuned 8B and baseline 1B | 1e-3 | 0.4453 | 0.4535 | 0.6071 | 0.5258
KD using fine-tuned 8B and baseline 1B | 1e-4 | 0.4489 | 0.4606 | 0.6156 | 0.5586
KD using fine-tuned 8B and baseline 1B | 1e-5 | 0.4547 | 0.4548 | 0.6114 | 0.5487

Table 4: Effects of tuning learning rate

Hyperparameter tuning: KD ratio

By default, the KD ratio is set to 0.5, which gives even weighting to both the class and KD loss. In these experiments, we look at the effects of different KD ratios, where 0 only uses the class loss and 1 only uses the KD loss.
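For reference, a weighted combination like this could be expressed as follows (a hypothetical illustration; torchtune's exact weighting may differ):

kd_ratio = 0.5  # 0 = pure cross-entropy loss, 1 = pure KD loss
loss = (1 - kd_ratio) * ce_loss(logits, labels) + kd_ratio * kd_loss(logits, teacher_logits, labels)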

Overall, the benchmark results show that for these tasks and metrics, higher KD ratios perform slightly better.

Model | kd_ratio (lr=3e-4) | TruthfulQA (mc2) | hellaswag (acc) | hellaswag (acc_norm) | commonsense (acc)
Baseline Llama 3.1 8B | - | 0.5401 | 0.5911 | 0.7915 | 0.7707
Fine-tuned Llama 3.1 8B using LoRA | - | 0.5475 | 0.6031 | 0.7951 | 0.7789
Baseline Llama 3.2 1B | - | 0.4384 | 0.4517 | 0.6064 | 0.5536
Fine-tuned Llama 3.2 1B using LoRA | - | 0.4492 | 0.4595 | 0.6132 | 0.5528
KD using fine-tuned 8B and baseline 1B | 0.25 | 0.4485 | 0.4595 | 0.6155 | 0.5602
KD using fine-tuned 8B and baseline 1B | 0.5 | 0.4481 | 0.4603 | 0.6157 | 0.5569
KD using fine-tuned 8B and baseline 1B | 0.75 | 0.4543 | 0.463 | 0.6189 | 0.5643
KD using fine-tuned 8B and baseline 1B | 1.0 | 0.4537 | 0.4641 | 0.6177 | 0.5717

Table 5: Effects of tuning KD ratio

Looking Ahead

In this blog, we presented a study on how to distill LLMs through torchtune using the forward KL divergence loss on Llama 3.1 8B and Llama 3.2 1B logits. There are many directions for future exploration to further improve performance and offer more flexibility in distillation methods.

  • Expand KD loss offerings. The KD recipe uses the forward KL divergence loss. However, aligning the student distribution to the whole teacher distribution may not be effective, as mentioned above. There are multiple papers, such as MiniLLM, DistiLLM, and Generalized KD, that introduce new KD losses and policies to address this limitation and have been shown to outperform the standard use of cross entropy with forward KL divergence loss. For instance, MiniLLM uses reverse KL divergence to prevent the student from over-estimating low-probability regions of the teacher. DistiLLM introduces a skewed KL loss and an adaptive training policy.
  • Enable cross-tokenizer distillation. The current recipe requires the teacher and student model to use the same tokenizer, which limits the ability to distill across different LLM families. There has been research on cross-tokenizer approaches (e.g. Universal Logit Distillation) that we could explore.
  • Expand distillation to multimodal LLMs and encoder models. A natural extension of the KD recipe is to expand to multimodal LLMs. Similar to deploying more efficient LLMs, there’s also a need to deploy smaller and more efficient multimodal LLMs. In addition, there has been work in demonstrating LLMs as encoder models (e.g. LLM2Vec). Distillation from LLMs as encoders to smaller encoder models may also be a promising direction to explore.

Read More

Deep Dive on Cutlass Ping-Pong GEMM Kernel

Figure 1. FP8 GEMM Throughput Comparison CUTLASS vs Triton

Summary

In this post, we provide an overview, with relevant FP8 inference kernel benchmarking, of the CUTLASS Ping-Pong GEMM kernel.

Ping-Pong is one of the fastest matmul (GEMM) kernel architectures available for the Hopper GPU architecture. Ping-Pong is a member of the Warp Group Specialized Persistent Kernels family, which includes both Cooperative and Ping-Pong variants. Relative to previous GPUs, Hopper’s substantial tensor core compute capability requires deep asynchronous software pipelining in order to achieve peak performance.

The Ping-Pong and Cooperative kernels exemplify this paradigm, as the key design patterns are persistent kernels to amortize launch and prologue overhead, and ‘async everything’ with specialized warp groups with two consumers and one producer, to create a highly overlapped processing pipeline that is able to continuously supply data to the tensor cores.

When the H100 (Hopper) GPU was launched, Nvidia billed it as the first truly asynchronous GPU. That statement highlights the need for H100 specific kernel architectures to also be asynchronous in order to fully maximize computational/GEMM throughput.

The pingpong GEMM, introduced in CUTLASS 3.x, exemplifies this by moving all aspects of the kernel to a ‘fully asynchronous’ processing paradigm. In this blog, we’ll showcase the core features of the ping-pong kernel design as well as showcase its performance on inference workloads vs cublas and triton split-k kernels.

Ping-Pong Kernel Design

Ping-Pong (or technically ‘sm90_gemm_tma_warpspecialized_pingpong’) operates with an asynchronous pipeline, leveraging warp specialization. Instead of the more classical homogeneous kernels, “warp groups” take on specialized roles. Note that a warp group consists of 4 warps of 32 threads each, or 128 total threads.

On earlier architectures, latency was usually hidden by running multiple thread blocks per SM. However, with Hopper, the Tensor Core throughput is so high that it necessitates moving to deeper pipelines. These deeper pipelines then hinder running multiple thread blocks per SM. Thus, persistent thread blocks now issue collective main loops across multiple tiles and multiple warp groups. Thread block clusters are allocated based on the total SM count.

For Ping-Pong, each warp group takes on a specialized role of either Data producer or Data consumer.

The producer warp group focuses on producing data movement to fill the shared memory buffers (via TMA). Two other warp groups are dedicated consumers that process the math (MMA) portion with tensor cores, and then do any follow up work and write their results back to global memory (epilogue).

Producer warp groups work with TMA (Tensor Memory Accelerator), and are deliberately kept as lightweight as possible. In fact, in Ping-Pong, they deliberately reduce their register resources to improve occupancy. Producers reduce their max register count to 40, while consumers increase their max register count to 232, an effect we can see in the cutlass source and corresponding SASS:

source code

Unique to Ping-Pong, each consumer works on separate C output tiles. (For reference, the cooperative kernel is largely equivalent to Ping-Pong, but both consumer groups work on the same C output tile). Further, the two consumer warp groups then split their work between the main loop MMA and epilogue.

This is shown in the below image:

Figure 2: An overview of the Ping-Pong Kernel pipeline. Time moves left to right.

By having two consumers, it means that one can be using the tensor cores for MMA while the other performs the epilogue, and then vice-versa. This maximizes the ‘continuous usage’ of the tensor cores on each SM, and is a key part of the reason for the max throughput. The tensor cores can be continuously fed data to realize their (near) maximum compute capability. (See the bottom section of the Fig 2 illustration above).

Similar to how Producer threads stay focused only on data movements, MMA threads only issue MMA instructions in order to achieve peak issue rate. MMA threads must issue multiple MMA instructions and keep these in flight against TMA wait barriers.

An excerpt of the kernel code is shown below to cement the specialization aspects:

// Two types of warp group 'roles'
enum class WarpGroupRole {
  Producer = 0,
  Consumer0 = 1,
  Consumer1 = 2
};

// warp group role assignment
auto warp_group_role = WarpGroupRole(canonical_warp_group_idx());

Data Movement with Producers and Tensor Memory Accelerator

The producer warps focus exclusively on data movement – specifically they are kept as lightweight as possible and in fact give up some of their register space to the consumer warps (keeping only 40 registers, while consumers will get 232). Their main task is issuing TMA (tensor memory accelerator) commands to move data from Global memory to shared memory as soon as a shared memory buffer is signaled as being empty.

To expand on TMA, or Tensor Memory Accelerator: TMA is a hardware unit introduced with the H100 that asynchronously handles the transfer of memory from HBM (global memory) to shared memory. By having a dedicated hardware unit for memory movement, worker threads are freed to engage in other work rather than computing and managing data movement. TMA not only handles the movement of the data itself, but also calculates the required destination memory addresses, can apply transforms (reductions, etc.) to the data, and can handle layout transformations to deliver data to shared memory in a 'swizzled' pattern so that it's ready for use without any bank conflicts. Finally, it can also multicast the same data if needed to other SMs that are members of the same thread cluster. Once the data has been delivered, TMA will then signal the consumer of interest that the data is ready.

CUTLASS Asynchronous Pipeline Class

This signaling between producers and consumers is coordinated via the new Asynchronous Pipeline Class which Cutlass describes as follows:

“Implementing a persistent GEMM algorithm calls for managing dozens of different kinds of asynchronously executing operations that synchronize using multiple barriers organized as a circular list.

This complexity is too much for human programmers to manage by hand.

As a result, we have developed [Cutlass Pipeline Async Class]…”

Barriers and synchronization within the Ping-Pong async pipeline

Producers must ‘acquire’ a given smem buffer via ‘producer_acquire’. At the start, a pipeline is empty meaning that producer threads can immediately acquire the barrier and begin moving data.

PipelineState mainloop_pipe_producer_state = cutlass::make_producer_start_state<MainloopPipeline>();

Once the data movement is complete, producers issue the ‘producer_commit’ method to signal the consumer threads that data is ready.
However, for Ping-Pong, this is actually a noop instruction since TMA-based producers' barriers are automatically updated by the TMA when writes are completed.

consumer_wait – wait for data from producer threads (blocking).

consumer_release – signal waiting producer threads that they are finished consuming data from a given smem buffer. In other words, allow producers to go to work refilling this with new data.

From there, synchronization will begin in earnest where the producers will wait via the blocking producer acquire until they can acquire a lock, at which point their data movement work will repeat. This continues until the work is finished.

To provide a pseudo-code overview:

// producer
while (work_tile_info.is_valid_tile()) {

	collective_mainloop.dma()  // fetch data with TMA
	scheduler.advance_to_next_work()
	work_tile_info = scheduler.get_current_work()

}

// Consumer 1, Consumer 2
while (work_tile_info.is_valid_tile()) {

	collective_mainloop.mma()
	scheduler.advance_to_next_work()
	work_tile_info = scheduler.get_current_work()

}

And a visual birds-eye view putting it all together with the underlying hardware:

Figure 3: An overview of the full async pipeline for Ping-Pong

Step-by-Step Breakdown of Ping-Pong Computation Loop

Finally, a more detailed logical breakout of the Ping-Pong processing loop:

A – Producer (DMA) warp group acquires a lock on a shared memory buffer.

B – This allows it to kick off a TMA cp_async.bulk request to the TMA unit (via a single thread).

C – TMA computes the actual shared memory addressing required, and moves the data to shared memory. As part of this, swizzling is performed in order to layout the data in smem for the fastest (no bank conflict) access.

C1 – Potentially, data can also be multicast to other SMs, and/or it may need to wait for data from other TMA multicasts to complete the loading. (Thread block clusters now share shared memory across multiple SMs!)

D – At this point, the barrier is updated to signal the arrival of the data to smem.

E – The relevant consumer warp group now gets to work by issuing multiple wgmma.mma_async commands, which read the data from smem into the Tensor Cores as part of its wgmma.mma_async matmul operation.

F – the MMA accumulator values are written to register memory as the tiles are completed.

G – the consumer warp group releases the barrier on the shared memory.

H – the producer warp groups go to work issuing the next tma instruction to refill the now free smem buffer.

I – The consumer warp group simultaneously applies any epilogue actions to the accumulator, and then moves data from register to a different smem buffer.

J – The consumer warp issues a cp_async command to move data from smem to global memory.

The cycle repeats until the work is completed. Hopefully this provides you with a working understanding of the core concepts that power Ping-Pong’s impressive performance.

Microbenchmarks

To showcase some of Ping-Pong’s performance, below are some comparison charts related to our work on designing fast inference kernels.

First a general benchmarking of the three fastest kernels so far (lower is better):

Figure 4, above: Benchmark timings of FP8 GEMMs, lower is better (faster)

And translating that into a relative speedup chart of Ping-Pong vs cuBLAS and Triton:

Figure 5, above: Relative speedup of Ping-Pong vs the two closest kernels.
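For reference, per-kernel timings like these are typically gathered with a CUDA-event harness; below is a minimal, hypothetical sketch (not the exact harness used for these charts):

import torch

def bench_gemm(fn, warmup=10, iters=100):
    # Time a CUDA callable with CUDA events; returns milliseconds per call.
    for _ in range(warmup):
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# Example: a bf16 matmul baseline; swap in the kernel under test for comparisons.
a = torch.randn(4096, 4096, device="cuda", dtype=torch.bfloat16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.bfloat16)
print(f"{bench_gemm(lambda: a @ b):.3f} ms/iter")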

The full source code for the Ping-Pong kernel is here (619 lines of deeply templated Cutlass code, or to paraphrase the famous turtle meme – "it's templates…all the way down!"):

https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/kernel/sm90_gemm_tma_warpspecialized_pingpong.hpp

In addition, we have implemented PingPong as a CPP extension to make it easy to integrate into PyTorch here (along with a simple test script showing its usage):

https://github.com/pytorch-labs/applied-ai/tree/main/kernels/cuda/cutlass_gemm

Future Work

Data movement is usually the biggest impediment to top performance for any kernel, so a solid understanding of TMA (Tensor Memory Accelerator) on Hopper is vital. We previously published work on TMA usage in Triton. Once features like warp specialization are enabled in Triton, we plan to do another deep dive on how Triton kernels like FP8 GEMM and FlashAttention can leverage kernel designs like Ping-Pong for acceleration on Hopper GPUs.

Read More

Deploying LLMs with TorchServe + vLLM

The vLLM engine is currently one of the top-performing ways to execute large language models (LLMs). It provides the vllm serve command as an easy option to deploy a model on a single machine. While this is convenient, serving these LLMs in production and at scale requires some advanced features.

flow diagram

TorchServe offers these essential production features (like custom metrics and model versioning) and through its flexible custom handler design, makes it very easy to integrate features such as retrieval-augmented generation (RAG) or safeguards like Llama Guard. It is therefore natural to pair the vLLM engine with TorchServe to create a full-fledged LLM serving solution for production.

Before going into the specifics of the integration, we will demonstrate the deployment of a Llama-3.1-70B-Instruct model using TorchServe’s vLLM docker image.

Quickly getting started with Llama 3.1 on TorchServe + vLLM

To get started, we need to build the new TS LLM Docker container image by checking out the TorchServe repository and executing the following command from the main folder:

docker build --pull . -f docker/Dockerfile.vllm -t ts/vllm

The container uses our new LLM launcher script ts.llm_launcher which takes a Hugging Face model URI or local folder and spins up a local TorchServe instance with the vLLM engine running in the backend. To serve a model locally, you can create an instance of the container with the following command:

#export token=<HUGGINGFACE_HUB_TOKEN>
docker run --rm -ti --shm-size 10g --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/vllm --model_id meta-llama/Meta-Llama-3.1-70B-Instruct --disable_token_auth

You can test the endpoint locally with this curl command:

curl -X POST -d '{"model":"meta-llama/Meta-Llama-3.1-70B-Instruct", "prompt":"Hello, my name is", "max_tokens": 200}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model/1.0/v1/completions"
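The same request can also be issued from Python, for example with the requests library (a small convenience sketch, equivalent to the curl call above):

import requests

response = requests.post(
    "http://localhost:8080/predictions/model/1.0/v1/completions",
    json={
        "model": "meta-llama/Meta-Llama-3.1-70B-Instruct",
        "prompt": "Hello, my name is",
        "max_tokens": 200,
    },
)
print(response.json())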

The container stores the model weights in the local folder "data" which gets mounted as /data inside the container. To serve your custom local weights, simply copy them into data and point model_id to /data/<your weights>.

Internally, the container uses our new ts.llm_launcher script to launch TorchServe and deploy the model. The launcher simplifies the deployment of an LLM with TorchServe into a single command line and can also be used outside the container as an efficient tool for experimentation and testing. To use the launcher outside the docker, follow the TorchServe installation steps and then execute the following command to spin up an 8B Llama model:

# after installing TorchServe and vLLM run
python -m ts.llm_launcher --model_id meta-llama/Meta-Llama-3.1-8B-Instruct  --disable_token_auth

If multiple GPUs are available the launcher will automatically claim all visible devices and apply tensor parallelism (see CUDA_VISIBLE_DEVICES to specify which GPUs to use).

While this is very convenient, it's important to note that it does not encompass all the functionality provided by TorchServe. For those looking to leverage more advanced features, a model archive needs to be created. While this process is a bit more involved than issuing a single command, it brings the advantages of custom handlers and versioning. While the former allows you to implement RAG inside the preprocessing step, the latter lets you test different versions of a handler and model before deploying on a larger scale.

Before we provide the detailed steps to create and deploy a model archive, let’s dive into the details of the vLLM engine integration.

TorchServe’s vLLM Engine Integration

As a state-of-the-art serving framework, vLLM offers a plethora of advanced features, including PagedAttention, continuous batching, rapid model execution through CUDA graphs, and support for various quantization methods such as GPTQ, AWQ, INT4, INT8, and FP8. It also provides integration for important parameter-efficient adapter methods like LoRA and access to a wide range of model architectures including Llama and Mistral. vLLM is maintained by the vLLM team and a thriving open-source community.

To facilitate quick deployment, it offers a serving mode based on FastAPI to serve LLMs over HTTP. For a tighter, more flexible integration the project also provides the vllm.LLMEngine which offers interfaces to process requests on a continuous basis. We leveraged the asynchronous variant for the integration into TorchServe.

TorchServe is an easy-to-use, open-source solution for serving PyTorch models in production. As a production-tested serving solution, TorchServe offers numerous benefits and features beneficial for deploying PyTorch models at scale. By combining it with the inference performance of the vLLM engine these benefits can now also be used to deploy LLMs at scale.

TorchServe highlights and integrations

To maximize hardware utilization it is generally a good practice to batch requests from multiple users together. Historically, TorchServe only offered a synchronized mode to collect requests from various users. In this mode, TorchServe waits for a predefined amount of time (e.g., batch_delay=200ms) or until enough requests (e.g., batch_size=8) have arrived. When one of these events is triggered, the batched data gets forwarded to the backend where the model is applied to the batch, and the model output is returned to the users through the frontend. This works especially well for traditional vision models where outputs for each request usually finish at the same time.

For generative use cases, particularly text generation, the assumption that requests are ready simultaneously is no longer valid, as responses will have varying lengths. Although TorchServe supports continuous batching (the ability to add and remove requests dynamically), this mode only accommodates a static maximum batch size. With the introduction of PagedAttention, even this assumption of a maximum batch size becomes more flexible, as vLLM can combine requests of different lengths in a highly adaptable manner to optimize memory utilization.

To achieve optimal memory utilization, i.e., to fill unused gaps in memory (think Tetris), vLLM requires complete control over the decision of which requests to process at any given time. To provide this flexibility, we had to reevaluate how TorchServe handles user requests. Instead of the previous synchronous processing mode, we introduced an asynchronous mode (see diagram below) where incoming requests are directly forwarded to the backend, making them available for vLLM. The backend feeds the vllm.AsyncEngine, which can now select from all available requests. If streaming mode is enabled and the first token of a request is available, the backend will send out the result immediately and continue sending tokens until the final token is generated.

flow diagram

Our implementation of the VLLMHandler enables users to quickly deploy any model compatible with vLLM using a configuration file, while still offering the same level of flexibility and customizability through a custom handler. Users are free to add e.g. custom preprocessing or post-processing steps by inheriting from VLLMHandler and overriding the respective class methods.
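As an illustrative sketch of such a customization, one could prepend retrieved context to each prompt by overriding the preprocessing step; the import path and method names below are assumptions following TorchServe's usual handler conventions and may differ from the actual VLLMHandler:

# custom_vllm_handler.py -- hypothetical sketch, not the shipped handler
from ts.torch_handler.vllm_handler import VLLMHandler  # assumed import path

class RAGVLLMHandler(VLLMHandler):
    def preprocess(self, requests):
        # e.g. prepend retrieved context to each prompt before handing off to vLLM
        for req in requests:
            data = req.get("body") or req.get("data")
            if isinstance(data, dict) and "prompt" in data:
                data["prompt"] = "Context: ...\n\n" + data["prompt"]
        return super().preprocess(requests)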

We also support single-node, multi-GPU distributed inference, where we configure vLLM to use tensor parallel sharding of the model to either increase capacity for smaller models or enable larger models that do not fit on a single GPU, such as the 70B Llama variants. Previously, TorchServe only supported distributed inference using torchrun, where multiple backend worker processes were spun up to shard the model. vLLM manages the creation of these processes internally, so we introduced the new “custom” parallelType to TorchServe which launches a single backend worker process and provides the list of assigned GPUs. The backend process can then launch its own subprocesses if necessary.

To facilitate integration of TorchServe + vLLM into docker-based deployments, we provide a separate Dockerfile based on TorchServe’s GPU docker image, with vLLM added as a dependency. We chose to keep the two separate to avoid increasing the docker image size for non-LLM deployments.

Next, we will demonstrate the steps required to deploy a Llama 3.1 70B model using TorchServe + vLLM on a machine with four GPUs.

Step-by-Step Guide

For this step-by-step guide, we assume the installation of TorchServe has finished successfully. Currently, vLLM is not a hard dependency for TorchServe, so let's install the package using pip:

$ pip install -U vllm==0.6.1.post2

In the following steps, we will (optionally) download the model weights, explain the configuration, create a model archive, deploy and test it:

1. (Optional) Download Model Weights

This step is optional, as vLLM can also handle downloading the weights when the model server is started. However, pre-downloading the model weights and sharing the cached files between TorchServe instances can be beneficial in terms of storage usage and startup time of the model worker. If you choose to download the weights, use the huggingface-cli and execute:

# make sure you have logged into huggingface with huggingface-cli login before
# and have your access request for the Llama 3.1 model weights approved

huggingface-cli download meta-llama/Meta-Llama-3.1-70B-Instruct --exclude original/*

This will download the files under $HF_HOME, and you can alter the variable if you want to place the files elsewhere. Please ensure that you update the variable wherever you run TorchServe and make sure it has access to that folder.

2. Configure the Model

Next, we create a YAML configuration file that contains all the necessary parameters for our model deployment. The first part of the config file specifies how the frontend should launch the backend worker, which will ultimately run the model in a handler. The second part includes parameters for the backend handler, such as the model to load, followed by various parameters for vLLM itself. For more information on possible configurations for the vLLM engine, please refer to this link.

echo '
# TorchServe frontend parameters
minWorkers: 1
maxWorkers: 1            # Set the number of workers to create a single model instance
startupTimeout: 1200     # (in seconds) Give the worker time to load the model weights
deviceType: "gpu"
asyncCommunication: true # This ensures we can communicate asynchronously with the worker
parallelType: "custom"   # This lets TS create a single backend process and assign it 4 GPUs
parallelLevel: 4

# Handler parameters
handler:
    # model_path can be a model identifier for Hugging Face hub or a local path
    model_path: "meta-llama/Meta-Llama-3.1-70B-Instruct"
    vllm_engine_config:  # vLLM configuration which gets fed into AsyncVLLMEngine
        max_num_seqs: 16
        max_model_len: 512
        tensor_parallel_size: 4
        served_model_name:
            - "meta-llama/Meta-Llama-3.1-70B-Instruct"
            - "llama3"
'> model_config.yaml

3. Create the Model Folder

After creating the model configuration file (model_config.yaml), we will now create a model archive that includes the configuration and additional metadata, such as versioning information. Since the model weights are large, we will not include them inside the archive. Instead, the handler will access the weights by following the model_path specified in the model configuration. Note that in this example, we have chosen to use the “no-archive” format, which creates a model folder containing all necessary files. This allows us to easily modify the config files for experimentation without any friction. Later, we can also select the mar or tgz format to create a more easily transportable artifact.

mkdir model_store
torch-model-archiver --model-name vllm --version 1.0 --handler vllm_handler --config-file model_config.yaml --archive-format no-archive --export-path model_store/

4. Deploy the Model

The next step is to start a TorchServe instance and load the model. Please note that we have disabled token authentication for local testing purposes. It is highly recommended to implement some form of authentication when publicly deploying any model.

To start the TorchServe instance and load the model, run the following command:

torchserve --start --ncs  --model-store model_store --models vllm --disable-token-auth

You can monitor the progress of the model loading through the log statements. Once the model has finished loading, you can proceed to test the deployment.

5. Test the Deployment

The vLLM integration uses an OpenAI API-compatible format, so we can either use a specialized tool for this purpose or curl. The JSON data we are using here includes the model identifier as well as the prompt text. Other options and their default values can be found in the vLLMEngine docs.

echo '{
  "model": "llama3",
  "prompt": "A robot may not injure a human being",
  "stream": 0
}' | curl --header "Content-Type: application/json"   --request POST --data-binary @-   http://localhost:8080/predictions/vllm/1.0/v1/completions

The output of the request looks like this:

{
  "id": "cmpl-cd29f1d8aa0b48aebcbff4b559a0c783",
  "object": "text_completion",
  "created": 1727211972,
  "model": "meta-llama/Meta-Llama-3.1-70B-Instruct",
  "choices": [
    {
      "index": 0,
      "text": " or, through inaction, allow a human being to come to harm.nA",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null,
      "prompt_logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 26,
    "completion_tokens": 16
  }
}

When streaming is false, TorchServe collects the full answer and sends it in one go after the last token has been created. If we flip the stream parameter, we receive piecewise data containing a single token in each message.
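
For reference, the same endpoint can also be queried from Python. The following is a minimal sketch using the requests library, assuming the TorchServe instance from the deployment step is still running on localhost:8080; it enables streaming and prints each chunk as it arrives.

# Minimal sketch: stream a completion from the TorchServe/vLLM endpoint started above.
import requests

payload = {
    "model": "llama3",
    "prompt": "A robot may not injure a human being",
    "stream": True,  # request piecewise responses, one token per message
}

with requests.post(
    "http://localhost:8080/predictions/vllm/1.0/v1/completions",
    json=payload,
    stream=True,
) as response:
    for line in response.iter_lines():
        if line:
            print(line.decode("utf-8"))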

Conclusion

In this blog post, we explored the new, native integration of the vLLM inference engine into TorchServe. We demonstrated how to locally deploy a Llama 3.1 70B model using the ts.llm_launcher script and how to create a model archive for deployment on any TorchServe instance. Additionally, we discussed how to build and run the solution in a Docker container for deployment on Kubernetes or EKS. In future work, we plan to enable multi-node inference with vLLM and TorchServe, as well as offer a pre-built Docker image to simplify the deployment process.

We would like to express our gratitude to Mark Saroufim and the vLLM team for their invaluable support in the lead-up to this blog post.

Read More

Triton Kernel Compilation Stages

Triton Kernel Compilation Stages

The Triton open-source programming language and compiler offers a high-level, Python-based approach to creating efficient GPU code. In this blog, we highlight the underlying details of how a Triton program is compiled and the intermediate representations it passes through. For an introduction to Triton, we refer readers to this blog.

Triton Language and Compilation

The Triton programming language supports different types of modern GPUs and follows a blocked programming approach. As an example, we will follow the Triton vector add tutorial with minor modifications. The vector addition kernel and helper function are defined as:

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr,  # *Pointer* to first input vector.
               y_ptr,  # *Pointer* to second input vector.
               output_ptr,  # *Pointer* to output vector.
               n_elements, 
               BLOCK_SIZE: tl.constexpr, 
               ):
  
    pid = tl.program_id(axis=0) 
    block_start = pid * BLOCK_SIZE
    offsets = block_start + tl.arange(0, BLOCK_SIZE)
 
    mask = offsets < n_elements

    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    output = x + y
    tl.store(output_ptr + offsets, output, mask=mask)
 
def add(x: torch.Tensor, y: torch.Tensor):
    output = torch.empty_like(x)
    assert x.is_cuda and y.is_cuda and output.is_cuda
    n_elements = output.numel()

    grid = lambda meta: (triton.cdiv(n_elements, meta['BLOCK_SIZE']), )
    triton_kernel=add_kernel[grid](x, y, output, n_elements, BLOCK_SIZE=1024)
    torch.cuda.synchronize()

    # Save compilation stages - some of the stages identified here are specific to NVIDIA devices:
    with open('triton_IR.txt', 'w') as f:
        print(triton_kernel.asm['ttir'], file=f)
    with open('triton_TTGIR.txt', 'w') as f:
        print(triton_kernel.asm['ttgir'], file=f)
    with open('triton_LLVMIR.txt', 'w') as f:
        print(triton_kernel.asm['llir'], file=f)
    with open('triton_PTX.ptx', 'w') as f:
        print(triton_kernel.asm['ptx'], file=f)
    with open('triton_cubin.txt', 'w') as f:
        print(triton_kernel.asm['cubin'], file=f)

    return output

torch.manual_seed(0)
size = 98432
x = torch.rand(size, device='cuda')
y = torch.rand(size, device='cuda')
output_torch = x + y
output_triton = add(x, y)
print(output_torch)
print(output_triton)
print(f'The maximum difference between torch and triton is '
      f'{torch.max(torch.abs(output_torch - output_triton))}')    

The Triton vector add kernel includes the @triton.jit decorator. The Triton compiler will compile functions marked by @triton.jit, which lowers the function through multiple compilation stages. The helper function add allocates the output tensor, computes the appropriate GPU grid size, and additionally saves the intermediate compilation stages.

Focusing on the compilation process, the Triton kernel is lowered to device specific assembly through a series of stages outlined in the following figure.

compilation process

The kernel is compiled by first walking the abstract syntax tree (AST) of the decorated Python function to create the Triton Intermediate Representation (Triton-IR). The Triton-IR is an unoptimized, machine-independent intermediate representation. It introduces tile-level programming requirements and is based on the open-source LLVM compiler project. Next, the Triton compiler optimizes and converts the Triton-IR into the Triton-GPU IR (Triton-TTGIR) stage and then into LLVM-IR. Both the Triton-IR and Triton-GPUIR representations are written as MLIR dialects, where MLIR is a subproject of LLVM that aims to improve compilation for heterogeneous hardware.

For the Triton vector add tutorial kernel, the example Triton IR snippet is:

module {
  tt.func public @add_kernel(%arg0: !tt.ptr<f32> {tt.divisibility = 16 : i32} loc("/u/saraks/triton_blog/01-vector-add.py":28:0), %arg1: !tt.ptr<f32> {tt.divisibility = 16 : i32} loc("/u/saraks/triton_blog/01-vector-add.py":28:0), %arg2: !tt.ptr<f32> {tt.divisibility = 16 : i32} loc("/u/saraks/triton_blog/01-vector-add.py":28:0), %arg3: i32 {tt.divisibility = 16 : i32} loc("/u/saraks/triton_blog/01-vector-add.py":28:0)) attributes {noinline = false} {
    %c1024_i32 = arith.constant 1024 : i32 loc(#loc1)
    %0 = tt.get_program_id x : i32 loc(#loc2)
    %1 = arith.muli %0, %c1024_i32 : i32 loc(#loc3)
    %2 = tt.make_range {end = 1024 : i32, start = 0 : i32} : tensor<1024xi32> loc(#loc4)
    %3 = tt.splat %1 : i32 -> tensor<1024xi32> loc(#loc5)
    %4 = arith.addi %3, %2 : tensor<1024xi32> loc(#loc5)
    %5 = tt.splat %arg3 : i32 -> tensor<1024xi32> loc(#loc6)
    %6 = arith.cmpi slt, %4, %5 : tensor<1024xi32> loc(#loc6)
    %7 = tt.splat %arg0 : !tt.ptr<f32> -> tensor<1024x!tt.ptr<f32>> loc(#loc7)
    %8 = tt.addptr %7, %4 : tensor<1024x!tt.ptr<f32>>, tensor<1024xi32> loc(#loc7)
    %9 = tt.load %8, %6 : tensor<1024x!tt.ptr<f32>> loc(#loc8)
    %10 = tt.splat %arg1 : !tt.ptr<f32> -> tensor<1024x!tt.ptr<f32>> loc(#loc9)
    %11 = tt.addptr %10, %4 : tensor<1024x!tt.ptr<f32>>, tensor<1024xi32> loc(#loc9)
    %12 = tt.load %11, %6 : tensor<1024x!tt.ptr<f32>> loc(#loc10)
    %13 = arith.addf %9, %12 : tensor<1024xf32> loc(#loc11)
    %14 = tt.splat %arg2 : !tt.ptr<f32> -> tensor<1024x!tt.ptr<f32>> loc(#loc12)
    %15 = tt.addptr %14, %4 : tensor<1024x!tt.ptr<f32>>, tensor<1024xi32> loc(#loc12)
    tt.store %15, %13, %6 : tensor<1024x!tt.ptr<f32>> loc(#loc13)
    tt.return loc(#loc14)
  } loc(#loc)
} loc(#loc)

Notice that the main functions in the Triton kernel are now represented as:

Triton kernel                                          Triton IR
x = tl.load(x_ptr + offsets, mask=mask)                %9 = tt.load %8, %6 : tensor<1024x!tt.ptr<f32>> loc(#loc8)
y = tl.load(y_ptr + offsets, mask=mask)                %12 = tt.load %11, %6 : tensor<1024x!tt.ptr<f32>> loc(#loc10)
output = x + y                                         %13 = arith.addf %9, %12 : tensor<1024xf32> loc(#loc11)
tl.store(output_ptr + offsets, output, mask=mask)      tt.store %15, %13, %6 : tensor<1024x!tt.ptr<f32>> loc(#loc13)

At the Triton IR stage, the %arg0: !tt.ptr<f32> and the following tensor references show that the intermediate representation is already specialized by the data type.

We ran this example on a Tesla V100-SXM2-32GB GPU with CUDA Version 12.2, Python version 3.11.9, and PyTorch 2.4.1 with the default version of Triton that is installed with PyTorch. On this device, the simple vector addition has the following Triton GPU IR snippet with lines omitted for clarity:

#blocked = #triton_gpu.blocked<{sizePerThread = [4], threadsPerWarp = [32], warpsPerCTA = [4], order = [0]}>
module attributes {"triton_gpu.num-ctas" = 1 : i32, "triton_gpu.num-warps" = 4 : i32, triton_gpu.target = "cuda:70", "triton_gpu.threads-per-warp" = 32 : i32} {
  tt.func public @add_kernel(%arg0: !tt.ptr<f32> {tt.divisibility = 16 : i32}
    ⋮
    %9 = tt.load %8, %6 : tensor<1024x!tt.ptr<f32>, #blocked> loc(#loc8)
    ⋮
    %12 = tt.load %11, %6 : tensor<1024x!tt.ptr<f32>, #blocked> loc(#loc10)
    %13 = arith.addf %9, %12 : tensor<1024xf32, #blocked> loc(#loc11)
    ⋮
    tt.store %15, %13, %6 : tensor<1024x!tt.ptr<f32>, #blocked> loc(#loc13)
    ⋮
  } loc(#loc)
} loc(#loc)

At this stage, some of the hardware-specific information is included. For example, the compute capability is included along with details on how the tensors are distributed to cores and warps (or, for AMD GPUs, wavefronts). In this example, the tensors are represented with a #blocked layout. In this encoding, each warp owns a contiguous portion of the tensor. Other currently possible layouts include slice (restructures and distributes a tensor along a dimension), dot_op (an optimized layout for block matrix products), shared (indicates GPU shared memory), nvidia_mma (produced by NVIDIA Tensor Cores), amd_mfma (produced by AMD MFMA matrix cores), and amd_wmma (produced by AMD WMMA matrix cores). As announced at the recent Triton conference, this layout representation will transition to a new linear layout to unify layouts within and across backends. The stage from Triton-GPUIR to LLVM-IR converts the Triton-GPUIR to LLVM's representation. At this time, Triton has third-party backend support for NVIDIA and AMD devices, but other device support is under active development by the open-source community.
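
To make the #blocked encoding above more concrete, the following is a small, illustrative Python sketch (a simplified model, not Triton's actual implementation) that maps each element of the 1024-element tensor to the warp and lane that own it under sizePerThread = [4], threadsPerWarp = [32], warpsPerCTA = [4]:

# Illustrative model of the #blocked layout from the Triton-GPU IR above.
# Each thread owns 4 contiguous elements, a warp covers 32 * 4 = 128 elements,
# and the 4 warps cover 512 elements before the pattern repeats to reach 1024.
SIZE_PER_THREAD = 4
THREADS_PER_WARP = 32
WARPS_PER_CTA = 4
ELEMS_PER_PASS = SIZE_PER_THREAD * THREADS_PER_WARP * WARPS_PER_CTA  # 512

def owner(element_index: int):
    """Return (warp, lane, slot) that owns a given element under this layout."""
    idx = element_index % ELEMS_PER_PASS           # layout repeats every 512 elements
    thread = idx // SIZE_PER_THREAD                # flat thread id within the CTA
    warp, lane = divmod(thread, THREADS_PER_WARP)
    slot = idx % SIZE_PER_THREAD                   # which of the thread's 4 values
    return warp, lane, slot

# Example: element 130 belongs to warp 1, lane 0, slot 2 in this model.
print(owner(130))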

A small subset of the LLVM-IR vector add arguments is shown below for illustration:

  %19 = extractvalue { i32, i32, i32, i32 } %18, 0, !dbg !16
  %39 = extractvalue { i32, i32, i32, i32 } %38, 0, !dbg !18
  %23 = bitcast i32 %19 to float, !dbg !16
  %43 = bitcast i32 %39 to float, !dbg !18
  %56 = fadd float %23, %43, !dbg !19

After some pointer arithmetic and an inline assembly call to retrieve the data from global memory, the vector elements are extracted and cast to the correct type. Finally they are added together and later written to global memory through an inline assembly expression.

The final stages of the Triton compilation process lower the LLVM-IR to a device-specific binary. For the example vector add on an NVIDIA GPU, the next intermediate is PTX (Parallel Thread Execution). The low-level PTX syntax specifies execution at the thread level of NVIDIA devices, starting from the CUDA 1.0 release. For an in-depth guide on PTX, see NVIDIA's documentation. In the vector add example, the kernel parameters are passed from the host to the kernel, addresses are assigned, and mov instructions facilitate thread-level data access, ultimately representing the element addition with add.f32 calls such as the example below:

	add.f32    %f17, %f1, %f9    // add type float32, output register, input register for x, input register for y

The Triton compiler orchestrates the final stage with different hardware backends managing how the assembly code is compiled into binary. The Triton kernel is now ready for use.

Summary

Triton provides a high-level abstraction to program and compile kernels for different types of hardware. In this post, we highlight the different stages of the Triton code representations and Triton compiler. For details on including custom Triton kernels or accelerating different workloads with Triton kernels, check out the PyTorch Triton tutorial, the blog posts on Triton GPTQ kernels, Llama3 FP8 Inference with Triton, and CUDA-Free Inference for LLMs, or the PyTorch 2.2 Section on Triton code generation.

Read More

Unleashing the Power of AI on Mobile: LLM Inference for Llama 3.2 Quantized Models with ExecuTorch and KleidiAI

Unleashing the Power of AI on Mobile: LLM Inference for Llama 3.2 Quantized Models with ExecuTorch and KleidiAI

Introduction

At the recent PyTorch Conference, Arm highlighted the widespread impact of its technology, spanning from cloud to edge, emphasizing its commitment to delivering its advanced AI computing capabilities seamlessly to millions of developers worldwide.

key stats

During the presentation, it was emphasized that Arm bears the immense responsibility of equipping 20+ million developers and billions of users with advanced AI computing features without friction. Achieving this requires crucial software collaborations across a vast ecosystem of software and hardware partners.

Just a few months ago, Arm launched Arm Kleidi, developer enablement technologies and resources to drive technical collaboration and innovation across the ML stack. This includes the KleidiAI software library providing optimized software routines, which when integrated into key frameworks such as XNNPACK enable automatic AI acceleration for developers on Arm Cortex-A CPUs.

Today, we’re excited to announce a new milestone for the AI open-source community that brings Arm even closer to realizing this vision: the integration of KleidiAI into ExecuTorch via XNNPACK, boosting AI workload performance on Arm mobile CPUs!

Thanks to the collaborative efforts of the engineering teams at Arm and Meta, AI developers can now deploy quantized Llama models which run up to 20% faster on Arm Cortex-A v9 CPUs with the i8mm ISA extension.

And there’s more exciting news – the ExecuTorch team has officially launched the Beta release!

This marks an important milestone in our partnership. In this blog, we are eager to share more details about ExecuTorch capabilities, the new Meta Llama 3.2 models, the integer 4-bit per-block quantization scheme, and the impressive performance recorded on certain Arm CPUs. Notably, we have achieved speeds of over 350 tokens per second in the prefill stage with the quantized Llama 3.2 1B model on a Samsung S24+ device, as shown in the following screenshots.

mobile app screenshots

Now, let’s dive into the key components that enabled the demo creation presented in the preceding images. First up: new Llama 3.2 models!

Meta Llama 3.2

Meta recently announced the first lightweight quantized Llama models, which are designed to run on popular mobile devices. Meta used two techniques for quantizing Llama 3.2 1B and 3B models: Quantization-Aware Training (QAT) with LoRA adaptors (QLoRA), and SpinQuant, a state-of-the-art post-training quantization method. The quantized models were evaluated using PyTorch’s ExecuTorch framework as the inference engine, with the Arm CPU as a backend.

These instruction-tuned models retain the quality and safety of the original 1B and 3B models while achieving a 2-4x speedup and reducing model size by 56% on average and memory footprint by 41% on average compared to the original BF16 format.

In this blog post, we will demonstrate the performance improvements we observed in our experiments.

ExecuTorch

ExecuTorch is a PyTorch-native framework specifically designed for deploying AI models on-device, enhancing privacy and reducing latency. It supports the deployment of cutting-edge open-source AI models, including the Llama family of models and vision and speech models like Segment Anything and Seamless.

This unlocks new possibilities for edge devices such as mobile phones, smart glasses, VR headsets, and smart home cameras. Traditionally, deploying PyTorch-trained AI models to resource-limited edge devices has been challenging and time-consuming, often requiring conversion to other formats which could lead to errors and suboptimal performance. The varied toolchains across the hardware and edge ecosystem have also degraded the developer experience, making a universal solution impractical.

ExecuTorch addresses these issues by providing composable components, including a core runtime, an operator library, and a delegation interface, that allow for portability as well as extensibility. Models can be exported using torch.export(), producing a graph that is natively compatible with the ExecuTorch runtime, capable of running on most edge devices with CPUs, and extendable to specialized hardware like GPUs and NPUs for enhanced performance.

Working with Arm, ExecuTorch now leverages the optimized low-bit matrix multiplication kernels from the Arm KleidiAI library to improve on-device Large Language Model (LLM) inference performance via XNNPACK. We also thank the XNNPACK team at Google for supporting this effort.

In this post, we will focus on this integration, which is available in ExecuTorch.

Evolving the architecture for AI workloads

At Arm, we have been deeply committed to investing in open-source projects and advancing new technologies in our processors since the early days of the deep learning wave, focusing on making AI workloads high-performing and more power-efficient.

For instance, Arm introduced the SDOT instruction, starting with the Armv8.2-A architecture, to accelerate dot product arithmetic between 8-bit integer vectors. This feature, now widely available in mobile devices, significantly speeds up the computation of quantized 8-bit models. After the SDOT instruction, Arm introduced the BF16 data type and the MMLA instruction to further enhance the floating-point and integer matrix multiplication performance on CPUs and, most recently, announced the Scalable Matrix Extension (SME), marking a significant leap forward in machine learning capabilities.

The following image shows a few examples of Arm CPU’s continuous innovations in the AI space over the last decade:

line chart

Given the widespread use of Arm CPUs, AI frameworks need to take full advantage of these technologies in key operators to maximize performance. Recognizing this, we saw the need for an open-source library to share these optimized software routines. However, we were mindful of the challenges in integrating a new library into AI frameworks, such as concerns about library size, dependencies, and documentation and the need to avoid adding extra burdens for developers. So, we took extra steps to gather feedback from our partners and ensure a smooth integration process that does not require additional dependencies for AI developers. This effort led to KleidiAI, an open-source library that provides optimized performance-critical routines for artificial intelligence (AI) workloads tailored for Arm CPUs. You can learn more about KleidiAI here.

Working with the ExecuTorch team at Meta, Arm provided the software optimizations for their novel 4-bit with per-block quantization schema, which is used to accelerate the matrix multiplication kernel in the Transformer layer’s torch.nn.linear operator for Llama 3.2 quantized models. This flexible 4-bit quantization schema from ExecuTorch strikes a balance between model accuracy and low-bit matrix multiplication performance targeting on-device LLMs.

The integer 4-bit with per-block quantization

In KleidiAI, we introduced micro-kernels optimized for this new 4-bit integer quantization scheme (matmul_clamp_f32_qai8dxp_qsi4c32p).

As shown in the following image, this 4-bit quantization uses a per-block strategy for weight (RHS matrix) quantization and an 8-bit per-row quantization for activations (LHS matrix):

arch diagram

As you can see in the preceding image, each output feature map (OFM) in the weight matrix is divided into equally sized blocks (group size), with each block having a scale factor stored in BF16 format. BF16 is advantageous because it maintains the dynamic range of 32-bit floating-point (FP32) format with half the bit size, and it’s easy to convert to and from FP32 using a simple shift operation. This makes BF16 ideal for saving model space, preserving accuracy, and ensuring backward compatibility with devices that lack BF16 hardware acceleration. You can learn more about the BF16 format in this Arm Community blog post.

For completeness, this 4-bit quantization scheme and our implementation in KleidiAI let users configure the group size for the linear weights (RHS), allowing them to trade off between model size, model accuracy, and model performance when the model is quantized by the user.
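
To illustrate the idea (this is a simplified sketch, not the exact KleidiAI/ExecuTorch implementation, which additionally packs the 4-bit values), the snippet below symmetrically quantizes one weight row to 4-bit integers with a configurable group size, keeping one BF16 scale per block:

# Simplified sketch of int4 per-block (group-wise) weight quantization.
import torch

def quantize_int4_per_block(weights: torch.Tensor, group_size: int = 32):
    """Symmetric 4-bit quantization of a 1-D weight row, one scale per block."""
    blocks = weights.reshape(-1, group_size)                       # split the row into blocks
    scales = blocks.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / 7.0  # int4 range [-8, 7]
    q = torch.clamp(torch.round(blocks / scales), -8, 7).to(torch.int8)
    return q, scales.to(torch.bfloat16)                            # per-block scales in BF16

def dequantize(q: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    return (q.float() * scales.float()).reshape(-1)

row = torch.randn(128)
q, scales = quantize_int4_per_block(row, group_size=32)
print((row - dequantize(q, scales)).abs().max())  # quantization error for this row

A larger group size reduces the number of stored scales (smaller model) at the cost of coarser quantization, which is exactly the size/accuracy/performance trade-off described above.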

At this point, we are ready to unveil the incredible performance recorded on Arm CPUs with ExecuTorch when running Llama 3.2 1B and Llama 3.2 3B. Let’s first go over metrics we will use to evaluate the performance of LLM inference.

Metrics for LLM Inference

Typically, performance metrics used to evaluate LLM performance during inference include:

  • Time To First Token (TTFT): This measures the time it takes to produce the first output token after a prompt is provided by the user. This latency or response time is important for a good user experience, especially on a phone. TTFT is also a function of the length of the prompt or prompt tokens. To make this metric independent of the prompt length, we use Prefill tokens/second as a proxy here. The relationship between these is inverse: lower TTFT corresponds to higher Prefill tokens/second.
  • Decode Performance: This is the average number of output tokens generated per second, thus reported in Tokens/Second. It is independent of the total number of tokens generated. For on-device inference, it is important to keep this higher than a user’s average reading speed.
  • Peak Runtime Memory: This metric reflects the amount of RAM, typically reported in mebibytes (MiB), needed to run the model with the expected performance measured using the metrics above. Given the limited amount of RAM available on Android and iOS devices, this is one of the key metrics for on-device LLM deployment. It dictates the type of models that can be deployed on a device.

Results

The quantized Llama 3.2 1B models, both SpinQuant and QLoRA, are designed to run efficiently on a wide range of phones with limited RAM. In this section, we demonstrate that the quantized Llama 3.2 1B models can achieve over 350 tokens per second in the prefill phase and over 40 tokens per second in the decode stage. This level of performance is sufficient to enable on-device text summarization with a reasonable user experience using only Arm CPUs. To put this into perspective, on average, 50 unread messages contain about 600 tokens. With this performance, the response time (the time it takes for the first generated word to appear on the screen) is approximately two seconds.
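
The response-time estimate above follows directly from the prefill rate; a quick back-of-the-envelope check using the figures quoted in this section looks like this:

# Back-of-the-envelope check of the response time quoted above.
prefill_tokens_per_s = 350   # prefill rate for the quantized Llama 3.2 1B model on Arm CPUs
prompt_tokens = 600          # roughly the size of 50 unread messages

time_to_first_token = prompt_tokens / prefill_tokens_per_s
print(f"Approximate time to first generated token: {time_to_first_token:.1f} s")  # ~1.7 s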

We present measurements from a Samsung S24+ running vanilla Android. We used Llama 3.2 1B parameter models for these experiments. Although we only demonstrate using 1B models, similar performance gains can be expected for the 3B parameter models. The experiment setup involves a single warmup run, a sequence length of 128, a prompt length of 64, use of 6 of the 8 available CPUs, and measuring results over adb.

Using the ExecuTorch main branch from GitHub, we first generated the ExecuTorch PTE binary files for each model using the published checkpoints. Then, using the same repository, we generated the ExecuTorch runtime binary for Armv8. In the rest of the section, we will compare the performance of different quantized 1B models against the BF16 model using the binary built with KleidiAI. We will also compare the performance gains for quantized models between the binary with KleidiAI and the one without KleidiAI to distill the impact from KleidiAI.

Quantized Model Performance

Llama 3.2 quantized models, both SpinQuant and QLoRA, perform significantly better on prompt prefill and text generation (decode) compared to the baseline BF16 model. We observed a >2x improvement in decode and a >5x improvement in prefill performance.

Furthermore, the quantized model size (PTE file size in bytes) is less than half that of the BF16 model: 1.1 GiB vs. 2.3 GiB. Although int4 is a quarter of the size of BF16, some layers in the model are quantized with int8, making the PTE file size ratio larger. We observed a runtime peak memory footprint reduction of almost 40%, from 3.1 GiB for the BF16 model to 1.9 GiB for the SpinQuant model, measured as Resident Set Size (RSS) for a maximum sequence length of 2048.

With all-around improvements, the new quantized Llama 3.2 models are ideal for on-device deployment targeting Arm CPUs. For more information on accuracy, check out the Meta Llama 3.2 blog.

bar graph

KleidiAI Impact

ExecuTorch relies on the Arm KleidiAI library to provide low-bit performant matrix multiplication kernels for the latest Arm CPUs with advanced Armv8/9 ISA features. These kernels are utilized for on-device quantized Llama 3.2 model inference in ExecuTorch. As depicted in the graph below, ExecuTorch achieves an average of >20% better prefill performance on S24+ with KleidiAI compared to non-KleidiAI kernels, while maintaining the same accuracy. This performance advantage is not limited to specific models or devices, and is expected to benefit all ExecuTorch models using low-bit quantized matrix multiplication on Arm CPUs.

To assess the impact of Kleidi, we generated two ExecuTorch runtime binaries targeting Arm Cortex-A CPUs and compared their performance.

  1. The first binary was built with the Arm KleidiAI library through the XNNPACK library.
  2. The second binary was built without the Arm KleidiAI repository, using the native kernels from the XNNPACK library.

bar chart

Try it yourself!

Ready to experience the performance improvements firsthand? Here's how you can try out ExecuTorch with the optimizations provided by KleidiAI in your own projects: follow the learning path from Arm to start developing your own application using LLMs with ExecuTorch and KleidiAI.

We look forward to hearing your feedback!

Read More

Getting started with PyTorch, ExecuTorch, and Ethos-U85 in three easy steps

Getting started with PyTorch, ExecuTorch, and Ethos-U85 in three easy steps

ExecuTorch support for Ethos-U85

In the rapidly evolving landscape of machine learning, PyTorch has emerged as a leading framework for model development, given its flexibility and comprehensive ecosystem. Arm has worked with Meta to introduce support for Arm platforms in ExecuTorch, which further simplifies this process, making it seamless to deploy PyTorch models on edge devices.

The Arm Ethos-U85 NPU is the highest-performing Ethos NPU, addressing the growing demand for running advanced AI inference workloads at the edge, including transformer-based networks like LLMs. Arm offers reference designs around the Ethos-U, including the Corstone-320 IoT reference design platform, to accelerate and simplify the chip development cycle. The reference design platform includes, among many items, a Fixed Virtual Platform (FVP) that simulates an entire system, enabling cutting-edge embedded software development and neural network deployment for the Ethos-U85.

Today, Arm is extending its support for developers building IoT edge applications by supporting the ExecuTorch beta on Ethos-U85. Leveraging ExecuTorch, developers can now efficiently bring their natively developed PyTorch models to intelligent and responsive IoT solutions built on Arm.

With this package now available, thousands of developers looking to create edge AI applications can start their model and application development months before the platforms arrive on the market.

Getting started with ExecuTorch on Ethos-U85

A full development environment has been provided in the public ExecuTorch GitHub repository. This provides an integrated and tested development flow with all necessary components.

The three simple steps are:

  1. Set up ExecuTorch
  2. Set up the Arm Build environment
  3. Compile and Run models on the arm_executor_runner

You can then build on this flow for compiling and running models, to capture runtime behavior from the Ethos-U85 driver, such as cycle count information.

To make the process easier for end users, we have also added scripts to the ExecuTorch repository:

  1. setup.sh: downloads the necessary software.
  2. run.sh: compiles and runs the model on the Corstone-320 FVP.

To build other models, you can use the ahead-of-time compiler script aot_arm_compiler.py, which takes a PyTorch program (nn.Module) to an ExecuTorch program (.pte flatbuffer file). To write custom applications that use ExecuTorch, you can follow the application flow in the example executor_runner application.

We support approximately 40 core ATen operators and already support end-to-end deployment of models such as MobileNetV2. Ongoing efforts to support further operators will enable more PyTorch models every week.

As more functionality is added, it will be demonstrated through the tutorial materials for Ethos-U on pytorch.org.

How this deployment flow works in more detail

Leveraging the extensibility of ExecuTorch and the expressiveness of Arm’s Tensor Operator Set Architecture (TOSA), we have enabled Ethos-U support in ExecuTorch. The Ethos-U compiler, Vela, has been enhanced with a TOSA front-end, making it possible to compile models for all products in the Ethos-U family. Combining these components into a cohesive workflow involves the following steps.

  1. Converting a PyTorch model into a deployable ExecuTorch program (AOT flow)
  2. Compiling the ExecuTorch program into an executable that can be deployed on Corstone-320 (runtime flow)

The ExecuTorch Ahead of time (AOT) flow

The process begins by converting a PyTorch model into a quantized TOSA representation using the PyTorch dynamo export flow. This allows us to generate an Ethos-U set of machine instructions, known as a command stream, utilizing the Vela compiler TOSA frontend. The command stream is bundled into an ExecuTorch program, represented by a flatbuffer file (.pte). This file contains everything the ExecuTorch runtime needs to perform inference using Ethos-U hardware.
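
As a rough illustration of the start of this flow, the sketch below exports a PyTorch module and serializes it to a .pte file using the generic ExecuTorch export APIs. The Ethos-U-specific steps (quantization, TOSA lowering, and Vela delegation) are omitted here and are handled by the aot_arm_compiler.py script mentioned earlier; exact API details may vary between ExecuTorch releases.

# Rough sketch: exporting a PyTorch module to an ExecuTorch program (.pte).
# The Ethos-U/Vela delegation steps are intentionally left out of this sketch.
import torch
from executorch.exir import to_edge

class TinyModel(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

example_inputs = (torch.randn(1, 8),)
exported = torch.export.export(TinyModel(), example_inputs)  # PyTorch dynamo export
executorch_program = to_edge(exported).to_executorch()       # lower to an ExecuTorch program

with open("tiny_model.pte", "wb") as f:
    f.write(executorch_program.buffer)                       # flatbuffer consumed by the runtime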

flow diagram

The ExecuTorch Runtime flow

The ExecuTorch runtime, written in C/C++, is designed to support multiple backends. We have extended it to include support for the Ethos-U device driver. Following this flow will produce a self-contained compiled executable. Deploying the executable on the Corstone-320 FVP is straightforward and requires only the appropriate flags when calling the FVP.

flow diagram

Ethos-U85 and Corstone-320

The Ethos-U family of NPUs offers high performance and energy-efficient solutions for edge AI. The Ethos-U55 (also supported by ExecuTorch) is widely deployed in many Cortex-M heterogeneous systems, while the Ethos-U65 extends the applicability of the Ethos-U family to Cortex-A-based systems and increases the performance.

Ethos-U85 further extends the Ethos-U product line, supporting current and future workloads on the edge using transformer-based networks. Ethos-U85 delivers a 4x performance uplift and 20% higher energy efficiency compared to its predecessor, with up to 85% utilization on popular networks. Notable features of Ethos-U85 include:

  • Configurations from 128 to 2048 MACs/cycle, delivering up to 4 TOP/s at 1 GHz
  • Compatibility with Cortex-A and Cortex-M based systems
  • Native support for major neural networks through support for TOSA
  • Full hardware acceleration of all major neural networks
  • For a full list of features, see the Ethos-U85 Technical Overview

A typical compute subsystem design with Ethos-U85

A typical compute subsystem design with Ethos-U85

What’s next

We are adding new operator support every week, extending ExecuTorch core ATen operator coverage, and enabling a wider range of models to run on Ethos-U. Our ongoing efforts focus on improving performance to ensure models run as optimally as possible on Ethos-U.

The ExecuTorch delegate framework supports fallback to running operators not supported by Ethos-U on the CPU using reference kernel implementations. We will work towards optimal performance on Cortex-M CPUs using CMSIS-NN, providing the best possible support for fallback operators and ensuring optimal performance for devices without Ethos-U capability.

The package above, together with the Corstone-320 FVP, is another step toward simplifying application development, so please go ahead, check out the code and build process, and send us feedback. Meanwhile, we will be busy making weekly releases to enable more features and models and to extract the maximum performance out of the hardware.

Read More

Intel GPU Support Now Available in PyTorch 2.5

Intel GPU Support Now Available in PyTorch 2.5

Support for Intel GPUs is now available in PyTorch® 2.5, providing improved functionality and performance for Intel GPUs, including Intel® Arc™ discrete graphics, Intel® Core™ Ultra processors with built-in Intel® Arc™ graphics, and Intel® Data Center GPU Max Series. This integration brings Intel GPUs and the SYCL* software stack into the official PyTorch stack, ensuring a consistent user experience and enabling more extensive AI application scenarios, particularly in the AI PC domain.

Developers and customers building for and using Intel GPUs will have a better user experience by directly obtaining continuous software support from native PyTorch, unified software distribution, and consistent product release time.

Furthermore, Intel GPU support provides more choices to users. Now PyTorch provides a consistent GPU programming paradigm on both front ends and back ends. Developers can now run and deploy workloads on Intel GPUs with minimal coding efforts.

Overview of Intel GPU support

Intel GPU support in PyTorch provides eager mode and graph mode support in the PyTorch built-in front end. Eager mode now has an implementation of commonly used Aten operators with the SYCL programming language. Graph mode (torch.compile) now has an enabled Intel GPU back end to implement the optimization for Intel GPUs and to integrate Triton. 

Essential components of Intel GPU support were added to PyTorch, including the runtime, Aten operators, oneDNN, TorchInductor, Triton, and Intel GPU toolchain integration. Meanwhile, quantization and distributed support are being actively developed in preparation for the PyTorch 2.6 release.

Features

In addition to providing key features for Intel® Client GPUs and Intel® Data Center GPU Max Series for inference and training, PyTorch keeps the same user experience as on other hardware that PyTorch supports. If you migrate code from CUDA*, you can run the existing application code on an Intel GPU with minimal code changes: simply change the device name from cuda to xpu. For example:

# CUDA Code
tensor = torch.tensor([1.0, 2.0]).to("cuda")

# Code for Intel GPU
tensor = torch.tensor([1.0, 2.0]).to("xpu")
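
In practice, code can also be written device-agnostically. The sketch below (assuming PyTorch 2.5 with the torch.xpu module available) picks an Intel GPU when present and falls back to CUDA or CPU otherwise.

# Minimal sketch: select an Intel GPU ("xpu") when available, otherwise fall back.
import torch

if torch.xpu.is_available():
    device = "xpu"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

tensor = torch.tensor([1.0, 2.0]).to(device)
print(tensor.device)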

PyTorch 2.5 features with an Intel GPU include: 

  • Inference and training workflows.
  • Enhanced torch.compile and eager mode functionalities (more ops), together with performance improvements, and full runs of the three Dynamo benchmark suites (Hugging Face*, TIMM*, and TorchBench*) for eager and compile modes.
  • Data types such as FP32, BF16, FP16, and automatic mixed precision (AMP).
  • Runs on Intel® Client GPUs and Intel® Data Center GPU Max Series.
  • Supports Linux (Ubuntu, SUSE Linux and Red Hat Linux) and Windows 10/11.

Get Started

Get a tour of the environment setup, pip wheel installation, and examples for Intel® Client GPUs and Intel® Data Center GPU Max Series in the Getting Started Guide. Support for Intel GPUs can be tried out through the PyTorch pip wheels available as nightly and preview binary releases.

  • Try Intel® Client GPUs through Intel® Arc™ Graphics family (Codename DG2), Intel® Core™ Ultra processor family with Intel® Graphics (Codename Meteor Lake), and Intel® Core™ Ultra mobile processor family with Intel® Graphics (Codename Lunar Lake).

  • Try Intel Data Center GPU Max Series through Intel® Tiber™ AI Cloud.

    1. To learn how to create a free Standard account, see Get Started.

Performance

The performance of Intel GPUs on PyTorch has been continuously optimized to achieve solid results on the three Dynamo benchmark suites (Hugging Face, TIMM, and TorchBench) for eager and compile modes.

The latest performance data, measured on top of the PyTorch Dynamo benchmarking suite using a single Intel® Data Center GPU Max Series 1100 card, showcases the significant FP16/BF16 speedup ratio over FP32 in eager mode in Figure 2, and the torch.compile mode speedup ratio over eager mode in Figure 3. Both inference and training reached similar significant improvements.

Figure 2: FP16/BF16 Performance Gains Over FP32 Eager

Figure 2: FP16/BF16 Performance Gains Over FP32 Eager

Figure 3: Torch.compile Performance Gains Over Eager Mode

Figure 3: Torch.compile Performance Gains Over Eager Mode

Summary

Intel GPU support in PyTorch 2.5 brings Intel® Client GPUs (Intel® Core™ Ultra processors with built-in Intel® Arc™ graphics and Intel® Arc™ Graphics for dGPU parts) and Intel® Data Center GPU Max Series into the PyTorch ecosystem for AI workload acceleration. In particular, client GPUs have been added to the supported-GPU list for AI PC use scenarios on Windows and Linux.

We warmly welcome the community to evaluate and provide feedback on these enhancements to Intel GPU support in PyTorch.

Resources

Acknowledgments

We want to thank the PyTorch open source community for their technical discussions and insights: Andrey Talman, Alban Desmaison, Nikita Shulga, Eli Uriegas, Jason Ansel, and Bin Bao.

We also thank collaborators from PyTorch for their professional support and guidance.

Performance Configuration

The configurations in the table are collected with svr-info. Test by Intel on September 12, 2024.

Table 1

Component Details
Name Intel® Max Series GPU 1100 in Intel® Tiber™ Developer Cloud
Time Thu Sep 12 08:21:27 UTC 2024
System Supermicro SYS-521GE-TNRT
Baseboard Supermicro X13DEG-OA
Chassis Supermicro Other
CPU Model Intel(R) Xeon(R) Platinum 8468V
Microarchitecture SPR_XCC
Sockets 2
Cores per Socket 48
Hyperthreading Enabled
CPUs 192
Intel Turbo Boost Enabled
Base Frequency 2.4GHz
All-core Maximum Frequency 2.4GHz
Maximum Frequency 2.9GHz
NUMA Nodes 2
Prefetchers L2 HW: Enabled, L2 Adj.: Enabled, DCU HW: Enabled, DCU IP: Enabled, AMP: Disabled, Homeless: Disabled, LLC: Disabled
PPINs 5e3f862ef7ba9d50, 6c85812edfcc84b1
Accelerators DLB 2, DSA 2, IAA 2, QAT (on CPU) 2, QAT (on chipset) 0
Installed Memory 1024GB (16x64GB DDR5 4800 MT/s [4800 MT/s])
Hugepagesize 2048 kB
Transparent Huge Pages madvise
Automatic NUMA Balancing Enabled
NIC 2 x Ethernet Controller X710 for 10GBASE-T, 4 x MT2892 Family [ConnectX-6 Dx]
Disk 1 x 894.3G Micron_7450_MTFDKBG960TFR
BIOS 1.4a
Microcode 0x2b0004b1
OS Ubuntu 22.04.2 LTS
Kernel 5.15.0-73-generic
TDP 330W
Power & Perf Policy Normal (6)
Frequency Governor performance
Frequency Driver acpi-cpufreq
Max C-State 9

Table 2

Component Details
Single Card Intel® Max Series GPU 1100 series on 4th Gen Intel® Xeon® processors of Intel Tiber Developer Cloud
Workload & version Timm ac34701, TorchBench 03cde49, Torchvision d23a6e1, Torchaudio b3f6f51, Transformers 243e186
Software Stack intel-for-pytorch-gpu-dev 0.5.3, intel-pti-dev 0.9.0, Intel xpu backend for Triton cc981fe
Framework Pytorch 4a3dabd67f8ce63f2fc45f278421cca3cc532cfe
GPU driver agama-ci-devel-803.61
GFX FW Version PVC2_1.23374

Notices & Disclaimers

Performance varies by use, configuration and other factors. Learn more on the Performance Index site. Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates.  See backup for configuration details.  No product or component can be absolutely secure. Your costs and results may vary. Intel technologies may require enabled hardware, software or service activation.

Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

AI disclaimer:
AI features may require software purchase, subscription or enablement by a software or platform provider, or may have specific configuration or compatibility requirements. Details at  www.intel.com/AIPC. Results may vary.

Read More