Abstracts: April 16, 2024

Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.

In this episode, Senior Research Software Engineer Tusher Chakraborty joins host Gretchen Huizinga to discuss “Spectrumize: Spectrum-efficient Satellite Networks for the Internet of Things,” which was accepted at the 2024 USENIX Symposium on Networked Systems Design and Implementation (NSDI). In the paper, Chakraborty and his coauthors share their efforts to address the challenges of delivering reliable and affordable IoT connectivity via satellite-based networks. They propose a method for leveraging the motion of small satellites to facilitate efficient communication between a large IoT-satellite constellation and devices on Earth within a limited spectrum.

Transcript

[MUSIC]

GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.

[MUSIC FADES]

I’m talking today to Tusher Chakraborty, a senior research software engineer at Microsoft Research. Tusher is coauthor of a paper called “Spectrumize: Spectrum-efficient Satellite Networks for the Internet of Things.” Tusher, thanks for joining us on Abstracts!


TUSHER CHAKRABORTY: Hi. Thank you for having me here, Gretchen, today. Thank you.

HUIZINGA: So because this show is all about abstracts, in just a few sentences, tell us about the problem your paper addresses and why we should care about it.

CHAKRABORTY: Yeah, so think of, I’m a farmer living in a remote area and bought a sensor to monitor the soil quality of my farm. The big headache for me would be how to connect the sensor so that I can get access to the sensor data from anywhere. We all know that connectivity is a major bottleneck in remote areas. Now, what if, as a farmer, I could just click the power button of the sensor, and it gets connected from anywhere in the world. It’s pretty amazing, right? And that’s what our research is all about. Get your sensor devices connected from anywhere in the world with just the click of power button. We call it one-click connectivity. Now, you might be wondering, what’s the secret sauce? It’s not magic; it’s direct-to-satellite connectivity. So these sensors directly get connected to the satellites overhead from anywhere on Earth. The satellites, which are orbiting around the earth, collect the data from the sensing devices and forward to the ground stations in some other convenient parts of the world where these ground stations are connected to the internet.

HUIZINGA: So, Tusher, tell us what’s been tried before to address these issues and how your approach contributes to the literature and moves the science forward.

CHAKRABORTY: So satellite connectivity is nothing new and has been there for long. However, what sets us apart is our focus on democratizing space connectivity, making it affordable for everyone on the planet. So we are talking about the satellites that are at least 10 to 20 times cheaper and smaller than state-of-the-art satellites. So naturally, this ambitious vision comes with its own set of challenges. So when you try to make something cheaper and smaller, you’ll face lots of challenges that all these big satellites are not facing. So if I just go a bit technical, think of the antenna. So these big satellite antennas, they can actually focus on particular part of the world. So this is something called beamforming. On the other hand, when we try to make the satellites cheaper and smaller, we can’t have that luxury. We can’t have beamforming capability. So what happens, they have omnidirectional antenna. So it seems like … you can’t focus on a particular part of the earth rather than you create a huge footprint on all over the earth. So this is one of the challenges that you don’t face in the state-of-the-art satellites. And we try to solve these challenges because we want to make connectivity affordable with cheaper and smaller satellites.

HUIZINGA: Right. So as you’re describing this, it sounds like this is a universal problem, and people have obviously tried to make things smaller and more affordable in the past. How is yours different? What methodology did you use to resolve the problems, and how did you conduct the research?

CHAKRABORTY: OK, I’m thrilled that you asked this one because the research methodology was the most exciting part for me here. As a part of this research, we launched a satellite in a joint effort with a satellite company. Like, this is very awesome! So it was a hands-on experience with a real-deal satellite system. It was not simulation-based system. The main goal here was to learn the challenge from a real-world experience and come up with innovative solutions; at the same time, evaluate the solutions in real world. So it was all about learning by doing, and let me tell you, it was quite the ride! [LAUGHTER] We didn’t do anything new when we launched the satellites. We just tried to see how industry today does this. We want to learn from them, hey, what’s the industry practice? We launched a satellite. And then we faced a lot of problems that today’s industry is facing. And from there, we learned, hey, like, you know, this problem is industry facing; let’s go after this, and let’s solve this. And then we tried to come up with the solutions based on those problems. And this was our approach. We didn’t want to assume something beforehand. We want to learn from how industry is going today and help them. Like, hey, these are the problems you are facing, and we are here to help you out.

HUIZINGA: All right, so assuming you learned something and wanted to pass it along, what were your major findings?

CHAKRABORTY: OK, that’s a very good question. So I was talking about the challenges towards this democratization earlier, right? So one of the most pressing challenges: shortage of spectrum. So let me try to explain this from the high level. So we need hundreds of these satellites, hundreds of these small satellites, to provide 24-7 connectivity for millions of devices around the earth. Now, I was talking, the footprint of a satellite on Earth can easily cover a massive area, somewhat similar to the size of California. So now with this large footprint, a satellite can talk with thousands of devices on Earth. You can just imagine, right? And at the same time, a device on Earth can talk with multiple satellites because we are talking about hundreds of these satellites. Now, things get tricky here. [LAUGHTER] We need to make sure that when a device and a satellite are talking, another nearby device or a satellite doesn’t interfere. Otherwise, there will be chaos—no one hearing others properly. So when we were talking about this device and satellite chat, right, so what is that all about? This, all about in terms of communication, is packet exchange. So the device sends some packet to the satellite; satellite sends some packet to the device—it’s all about packet exchange. Now, you can think of, if multiple of these devices are talking with a satellite or multiple satellites are talking with a device, there will be a collision in this packet exchange if you try to send the packets at the same time. And if you do that, then your packet will be collided, and you won’t be able to get any packet on the receiver end. So what we do, we try to send this packet on different frequencies. It’s like a different sound or different tone so that they don’t collide with each other. And, like, now, I said that you need different frequencies, but frequency is naturally limited. And the choice of frequency is even limited. This is very expensive. 
But if you have limited frequency and you want to resolve this collision, then you have a problem here. How do you do that? So we solve this problem by smartly looking at an artifact of these satellites. So these satellites are moving really fast around the earth. So when they are moving very fast around the earth, they create a unique signature on the frequency that they are using to talk with the devices on Earth. And we use this unique signature, and in physics, this unique signature is known as Doppler signature. And now you don’t need a separate frequency to sound them different, to have packets on different frequencies. You just need to recognize that unique signature to distinguish between satellites and distinguish between their communications and packets. So in that sense, there won’t be any packet collision. And this is all about our findings. So with this, now multiple devices and satellites can talk with each other at the same time without interference but using the same frequency.

HUIZINGA: It sounds, like, very similar to a big room filled with a lot of people. Each person has their own voice, but in the mix, you, kind of, lose track of who’s talking and then you want to, kind of, tune in to that specific voice and say, that’s the one I’m listening to.

CHAKRABORTY: Yeah, I think you picked up the correct metaphor here! This is the scenario you can try to explain here. So, yeah, like what we are essentially doing, like, if you just, in a room full of people and they are trying to talk with each other, and then if they’re using the same tone, no one will be distinguished one person from another.

HUIZINGA: Right …

CHAKRABORTY: Everyone will sound same and that will be colliding. So you need to make sure that, how you can differentiate the tones …

HUIZINGA: Yeah …

CHAKRABORTY: … and the satellites differentiate their tones due to their fast movement. And we use our methodology to recognize that tone, which satellite is sending that tone.

HUIZINGA: So you sent up the experimental satellite to figure out what’s happening. Have you since tested it to see if it works?

CHAKRABORTY: Yeah, yeah, so we have tried it out, because this is a software solution, to be honest.

HUIZINGA: Ah.

CHAKRABORTY: As I was talking about, there is no hardware modification required at this point. So what we did, we just implemented this software in the ground stations, and then we tried to recognize which satellite is creating which sort of signature. That’s it!

HUIZINGA: Well, it seems like this research would have some solid real-world impact. So who would you say it helps most and how?

CHAKRABORTY: OK, that’s a very good one. So the majority of the earth still doesn’t have affordable connectivity. The lack of connectivity throws a big challenge to critical industries such as agriculture—the example that I gave—energy, and supply chain, so hindering their ability to thrive and innovate. So our vision is clear: to bring 24-7 connectivity for devices anywhere on Earth with just a click of power button. Moreover, affordability at the heart of our mission, ensuring that this connectivity is accessible to all. So in core, our efforts are geared towards empowering individuals and industries to unlock their full potential in an increasingly connected world.

HUIZINGA: If there was one thing you want our listeners to take away from this research, what would it be?

CHAKRABORTY: OK, if there is one thing I want you to take away from our work, it’s this: connectivity shouldn’t be a luxury; it’s a necessity. Whether you are a farmer in a remote village or a business owner in a city, access to reliable, affordable connectivity can transform your life and empower your endeavors. So our mission is to bring 24-7 connectivity to every corner of the globe with just a click of a button.

HUIZINGA: I like also how you say every corner of the globe, and I’m picturing a square! [LAUGHTER] OK, last question. Tusher, what’s next for research on satellite networks and Internet of Things? What big unanswered questions or unsolved problems remain in the field, and what are you planning to do about it?

CHAKRABORTY: Uh … where do I even begin? [LAUGHTER] Like, there are countless unanswered questions and unsolved problems in this field. But let me highlight one that we talked here: limited spectrum. So as our space network expands, so does our need for spectrum. But what’s the tricky part here? Just throw more and more spectrum. The problem is the chunk of spectrum that’s perfect for satellite communication is often already in use by the terrestrial networks. Now, a hard research question would be how we can make sure that the terrestrial and the satellite networks coexist in the same spectrum without interfering [with] each other. It’s a tough nut to crack, but it’s a challenge we are excited to tackle head-on as we continue to push the boundaries of research in this exciting field.

[MUSIC]

HUIZINGA: Tusher Chakraborty, thanks for joining us today, and to our listeners, thanks for tuning in. If you want to read this paper, you can find a link at aka.ms/abstracts. You can also read it on the Networked Systems Design and Implementation, or NSDI, website, and you can hear more about it at the NSDI conference this week. See you next time on Abstracts!

[MUSIC FADES]
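
The Doppler signature described in the conversation can be made concrete with a little arithmetic. The sketch below is illustrative only and is not the method from the paper: it assumes a straight-line flyby, the non-relativistic approximation (shift = carrier × radial speed / c), and made-up values for the carrier frequency, orbital speed, and geometry. It shows that two satellites transmitting on the same nominal frequency reach a ground receiver with different frequency offsets because their radial velocities differ.

```python
import math

SPEED_OF_LIGHT_MPS = 299_792_458.0


def doppler_shift_hz(carrier_hz, speed_mps, along_track_m, closest_range_m):
    """Approximate Doppler shift at a stationary receiver.

    The transmitter moves in a straight line at speed_mps. along_track_m is the
    signed distance it still has to travel to reach its closest point to the
    receiver (positive = approaching, negative = receding), and closest_range_m
    is the distance at that closest point (must be > 0). Uses the
    non-relativistic approximation shift = carrier * radial_speed / c.
    """
    slant_range_m = math.hypot(along_track_m, closest_range_m)
    radial_speed_mps = speed_mps * along_track_m / slant_range_m
    return carrier_hz * radial_speed_mps / SPEED_OF_LIGHT_MPS


# Assumed, illustrative numbers (not taken from the paper).
CARRIER_HZ = 400e6
ORBITAL_SPEED_MPS = 7_500.0
CLOSEST_RANGE_M = 500e3

# Two satellites heard at the same instant on the same nominal frequency:
# one still approaching the device, one already moving away.
shift_approaching = doppler_shift_hz(CARRIER_HZ, ORBITAL_SPEED_MPS, 400e3, CLOSEST_RANGE_M)
shift_receding = doppler_shift_hz(CARRIER_HZ, ORBITAL_SPEED_MPS, -250e3, CLOSEST_RANGE_M)

print(f"approaching satellite: {shift_approaching:+.0f} Hz")
print(f"receding satellite:    {shift_receding:+.0f} Hz")
```

Under these assumed values the two offsets have opposite signs and differ by several kilohertz, which is the kind of difference a receiver could in principle use to tell the transmissions apart.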

torchtune: Easily fine-tune LLMs using PyTorch

We’re pleased to announce the alpha release of torchtune, a PyTorch-native library for easily fine-tuning large language models.

Staying true to PyTorch’s design principles, torchtune provides composable and modular building blocks along with easy-to-extend training recipes to fine-tune popular LLMs on a variety of consumer-grade and professional GPUs.

torchtune supports the full fine-tuning workflow from start to finish, including:

  • Downloading and preparing datasets and model checkpoints.
  • Customizing the training with composable building blocks that support different model architectures, parameter-efficient fine-tuning (PEFT) techniques, and more.
  • Logging progress and metrics to gain insight into the training process.
  • Quantizing the model post-tuning.
  • Evaluating the fine-tuned model on popular benchmarks.
  • Running local inference for testing fine-tuned models.
  • Maintaining checkpoint compatibility with popular production inference systems.

To get started, jump right into the code or walk through our many tutorials!

Why torchtune?

Over the past year, there has been an explosion of interest in open LLMs. Fine-tuning these state-of-the-art models has emerged as a critical technique for adapting them to specific use cases. This adaptation can require extensive customization, from dataset and model selection all the way through to quantization, evaluation and inference. Moreover, the size of these models poses a significant challenge when trying to fine-tune them on consumer-level GPUs with limited memory.

Existing solutions make it hard to add these customizations or optimizations by hiding the necessary pieces behind layers of abstractions. It’s unclear how different components interact with each other and which of these need to be updated to add new functionality. torchtune empowers developers to adapt LLMs to their specific needs and constraints with full control and visibility.

torchtune’s Design

torchtune was built with the following principles in mind:

  • Easy extensibility – New techniques emerge all the time and everyone’s fine-tuning use case is different. torchtune’s recipes are designed around easily composable components and hackable training loops, with minimal abstraction getting in the way of fine-tuning your fine-tuning. Each recipe is self-contained (no trainers or frameworks) and is designed to be easy to read, at fewer than 600 lines of code!
  • Democratize fine-tuning – Users, regardless of their level of expertise, should be able to use torchtune. Clone and modify configs, or get your hands dirty with some code! You also don’t need beefy data center GPUs. Our memory-efficient recipes have been tested on machines with a single 24GB gaming GPU.
  • Interoperability with the OSS LLM ecosystem – The open source LLM ecosystem is absolutely thriving, and torchtune takes advantage of this to provide interoperability with a wide range of offerings. This flexibility puts you firmly in control of how you train and use your fine-tuned models.
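
To see why memory-efficient recipes matter on a single 24GB card, here is a rough back-of-the-envelope estimate. It is a sketch under stated assumptions, not torchtune code: the model size (7B parameters), the adapter size (20M trainable parameters), the precisions, and the Adam-style optimizer state of 8 bytes per trainable parameter are all assumed for illustration, and activation memory is ignored entirely.

```python
GIB = 2**30


def training_state_gib(frozen_params, trainable_params,
                       frozen_bytes=2, weight_bytes=4,
                       grad_bytes=4, optimizer_bytes=8):
    """Rough memory for weights, gradients and optimizer state, in GiB.

    Frozen parameters only store weights (2 bytes each assumes bf16).
    Trainable parameters store fp32 weights, fp32 gradients and optimizer
    state (8 bytes per parameter assumes two fp32 moments, as in Adam-style
    optimizers). Activations are ignored.
    """
    frozen = frozen_params * frozen_bytes
    trainable = trainable_params * (weight_bytes + grad_bytes + optimizer_bytes)
    return (frozen + trainable) / GIB


# Assumed sizes for illustration: a 7B-parameter model and a 20M-parameter adapter.
full_finetune = training_state_gib(frozen_params=0, trainable_params=7_000_000_000)
adapter_finetune = training_state_gib(frozen_params=7_000_000_000,
                                      trainable_params=20_000_000)

print(f"full fine-tuning, fp32 state: {full_finetune:.1f} GiB")
print(f"frozen bf16 base + adapter:   {adapter_finetune:.1f} GiB")
```

Under these assumptions, full fine-tuning needs roughly 100 GiB before any activations are counted, while a frozen base with a small trainable adapter stays well under 24 GiB. Real numbers depend on the recipe, precision, and batch size.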

Over the next year, open LLMs will become even more powerful, with support for more languages (multilingual), more modalities (multimodal) and more tasks. As the complexity of these models increases, we need to pay the same attention to “how” we design our libraries as we do to the features provided or performance of a training run. Flexibility will be key to ensuring the community can maintain the current pace of innovation, and many libraries/tools will need to play well with each other to power the full spectrum of use cases. torchtune is built from the ground up with this future in mind.

In the true PyTorch spirit, torchtune makes it easy to get started by providing integrations with some of the most popular tools for working with LLMs.

  • Hugging Face Hub – Hugging Face provides an expansive repository of open source models and datasets for fine-tuning. torchtune seamlessly integrates through the tune download CLI command so you can get started right away with fine-tuning your first model.
  • PyTorch FSDP – Scale your training using PyTorch FSDP. It is very common for people to invest in machines with multiple consumer-level cards like the 3090/4090 by NVIDIA. torchtune allows you to take advantage of these setups by providing distributed recipes powered by FSDP.
  • Weights & Biases – torchtune uses the Weights & Biases AI platform to log metrics and model checkpoints during training. Track your configs, metrics and models from your fine-tuning runs all in one place!
  • EleutherAI’s LM Evaluation Harness – Evaluating fine-tuned models is critical to understanding whether fine-tuning is giving you the results you need. torchtune includes a simple evaluation recipe powered by EleutherAI’s LM Evaluation Harness to provide easy access to a comprehensive suite of standard LLM benchmarks. Given the importance of evaluation, we will be working with EleutherAI very closely in the next few months to build an even deeper and more “native” integration.
  • ExecuTorch – Models fine-tuned with torchtune can be easily exported to ExecuTorch, enabling efficient inference to be run on a wide variety of mobile and edge devices.
  • torchao – Easily and efficiently quantize your fine-tuned models into 4-bit or 8-bit using a simple post-training recipe powered by the quantization APIs from torchao.
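
As a rough illustration of what quantizing weights to 8-bit or 4-bit means, the sketch below applies simple symmetric round-to-nearest quantization with a single shared scale. This is a generic textbook scheme written in plain Python, not torchao's API or the algorithm its recipes use; the function names and the example weights are made up for illustration.

```python
def quantize_symmetric(values, bits=8):
    """Map floats to signed integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers and scale."""
    return [q * scale for q in quantized]


def max_error(values, bits):
    """Largest absolute round-trip error for the given bit width."""
    quantized, scale = quantize_symmetric(values, bits)
    restored = dequantize(quantized, scale)
    return max(abs(v - r) for v, r in zip(values, restored))


weights = [0.42, -1.30, 0.07, 0.88, -0.55, 0.0]  # made-up example weights
for bits in (8, 4):
    print(f"{bits}-bit max round-trip error: {max_error(weights, bits):.4f}")
```

For these example values the 4-bit round-trip error is larger than the 8-bit one, which is the usual trade-off: fewer bits mean a smaller model but a coarser approximation of each weight.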

What’s Next?

This is just the beginning and we’re really excited to put this alpha version in front of a vibrant and energetic community. In the coming weeks, we’ll continue to augment the library with more models, features and fine-tuning techniques. We’d love to hear any feedback, comments or feature requests in the form of GitHub issues on our repository, or on our Discord channel. As always, we’d love any contributions from this awesome community. Happy Tuning!

Hindsight PRIORs for Reward Learning from Human Preferences

Preference-based Reinforcement Learning (PbRL) has shown great promise in learning from binary human preference feedback on an agent’s trajectory behaviors, where one of the major goals is to reduce the amount of human feedback queried. While the binary labels are a direct comment on the goodness of a trajectory behavior, there is still a need to resolve credit assignment, especially with limited feedback. We propose PRIor On Rewards (PRIOR), which learns a forward dynamics world model to approximate a priori selective attention over states, which serves as a means to perform credit…

Apple Machine Learning Research

AI Is Tech’s ‘Greatest Contribution to Social Elevation,’ NVIDIA CEO Tells Oregon State Students

AI promises to bring the full benefits of the digital revolution to billions across the globe, NVIDIA CEO Jensen Huang said Friday during a conversation with Oregon State University President Jayathi Murthy.

“I believe that artificial intelligence is the technology industry’s single greatest contribution to social elevation, to lift all of the people that have historically been left behind,” Huang told more than 2,000 faculty, students and staff gathered for his conversation with Murthy.

The talk was the highlight of a forum marking the groundbreaking for a new research building that will be named for Huang and his wife, Lori, both Oregon State alumni.

The facility positions Oregon State as a leader not just in the semiconductor industry but also at the intersection of high performance computing and a growing number of fields.

Friday’s event at Oregon State University followed the groundbreaking for the Jen-Hsun Huang and Lori Mills Huang Collaborative Innovation Complex. Image courtesy of Oregon State University.

Those innovations have world-changing implications.

Huang said those who know a programming language such as C++ typically have greater opportunities.

“Because programming is so hard, the number of people who have benefitted from this, putting it to use for their economic prosperity, has been limited,” Huang said.

AI unlocks that and more.

“So you essentially have a collaborator with you at all times, essentially have a tutor at all times, and so I think the ability for AI to elevate all of the people left behind is quite extraordinary,” he added.

Huang’s appearance in Corvallis, Oregon, capped off a week of announcements underscoring NVIDIA’s commitment to preparing the future workforce with advanced AI, data science and high performance computing training.

On Tuesday, NVIDIA announced that it would participate in a $110 million partnership between Japan and the United States, which would include funding for university research.

On Wednesday, Georgia Tech announced a new NVIDIA-powered supercomputer that will help prepare undergraduate students to solve complex challenges with AI and HPC.

And later this month, NVIDIA founder Chris Malachowsky will be inducted into the Hall of Fame for the University of Florida’s Department of Electrical & Computer Engineering, following the November inauguration of the university’s $150 million Malachowsky Hall for Data Science & Information Technology.

Educating Future Leaders for ‘New Industrial Revolution’

NVIDIA has been investing in universities for decades, providing computing resources, advanced training curricula, donations and other support.

These contributions enable students and professors to access the high performance computing necessary for groundbreaking results at a key moment in the history of the industry.

“We’re at the beginning of a new industrial revolution, and the reason why I say that is because an industrial revolution produces something new that was impossible to produce in the past,” Huang said.

“And in this new world, you can apply electricity, and what’s going to come out of it is a whole bunch of floating-point numbers. We call them tokens, and those tokens are essentially artificial intelligence,” Huang said.

“And so this industrial revolution is going to be manufacturing intelligence at a very large scale,” Huang said.

OSU Breaks Ground on $213 Million Research Complex

Friday’s event in Oregon highlighted the Huangs’ commitment to education and reflected the couple’s deep personal ties to Oregon State, where the two met.

The conversation with Murthy followed the groundbreaking for the Jen-Hsun Huang and Lori Mills Huang Collaborative Innovation Complex, which took place Friday morning on the Corvallis campus.

When it opens in 2026, the 150,000-square-foot, $213 million complex — supported by a $50 million gift from the Huangs — will increase Oregon State’s support for the semiconductor and technology industry in Oregon and beyond.

Harnessing one of the nation’s most powerful NVIDIA supercomputers, the complex will bring together faculty and students to solve critical challenges facing the world in areas such as climate science, clean energy and water resources.

Huang sees the center — and AI — as helping put the benefits of computing at the service of people doing work across a broad range of disciplines.

Oregon State is one of the world’s premier schools in forestry, Huang said, adding that “let’s just face it, it’s very unlikely that somebody who was in forestry, it’s not impossible, but C++ is probably not your thing.”

Thanks to ChatGPT, you can “now use a computer to apply it to your field of science and apply this computing technology to revolutionize your work.”

That makes learning how to think — and how to collaborate — more important than ever, Huang said. It’s “no different than if I gave you a partner to collaborate with you to solve problems,” Huang said.

“You still need to know how to collaborate, how to prompt, how to frame a problem, how to refine the solution, how to iterate on it and how to change your mind.”

NVIDIA Joins $110 Million Partnership to Help Universities Teach AI Skills

The groundbreaking at Oregon State is just one of several announcements highlighting NVIDIA’s commitment to advancing the global technology industry.

Last week, the Biden Administration announced a new $110 million AI partnership between Japan and the United States, including an initiative to fund research through collaboration between the University of Washington and the University of Tsukuba.

As part of this, NVIDIA is committing $25 million to a collaboration with Amazon to bring the latest technologies to the University of Washington, in Seattle, and the University of Tsukuba, northeast of Tokyo.

Georgia Tech Unveils New AI Makerspace in Collaboration With NVIDIA

And on Wednesday, Georgia Tech’s College of Engineering established an AI supercomputing hub dedicated to teaching students.

The AI Makerspace was launched in collaboration with NVIDIA. College leaders call it a “digital sandbox” for students to understand and use AI. Initially focusing on undergraduates, the AI Makerspace aims to democratize access to computing resources typically reserved for researchers or technology companies.

Students will access the cluster online as part of their coursework. The Makerspace will also better position students after graduation as they work with AI professionals and help shape future applications.

‘Beginning of a New World’

To be sure, AI has limits, Huang explained. “It’s no different than when you work with teammates or lab partners; you’re guiding each other along because you know each other’s weaknesses and strengths,” he said.

However, Huang said now is a fantastic time to get an education and prepare for a career.

“This is the beginning of a new world and this is the best of times to go to school — the whole world is changing, right? New technology and new capabilities, new instruments and new ways to learn,” Huang said.

Images courtesy of Oregon State University.

Vanishing Gradients in Reinforcement Finetuning of Language Models

Pretrained language models are commonly adapted to comply with human intent and downstream tasks via finetuning. The finetuning process involves supervised finetuning (SFT), using labeled samples, and/or reinforcement-learning-based finetuning (RFT) via policy gradient methods, using a (possibly learned) reward function. This work highlights an overlooked optimization hurdle in RFT: we prove that the expected gradient for an input sample (i.e., a prompt) vanishes if its reward standard deviation under the model is low, regardless of whether the reward mean is near-optimal or not. We then…

Apple Machine Learning Research
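
The vanishing-gradient claim in this abstract can be illustrated with a toy example. The sketch below is not the paper's setup or proof: it assumes a softmax policy over just four candidate outputs with made-up rewards, for which the gradient of the expected reward with respect to logit i is p_i × (r_i − mean reward). Because each p_i is at most 1, the norm of that gradient cannot exceed the reward standard deviation under the policy, so a low standard deviation forces a small gradient even when the mean reward is poor.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_reward(logits, rewards):
    """Expected reward under the softmax policy."""
    return sum(p * r for p, r in zip(softmax(logits), rewards))


def expected_reward_gradient(logits, rewards):
    """Gradient of E[reward] with respect to the logits: p_i * (r_i - mean)."""
    probs = softmax(logits)
    mu = mean_reward(logits, rewards)
    return [p * (r - mu) for p, r in zip(probs, rewards)]


def reward_std(logits, rewards):
    """Standard deviation of the reward under the softmax policy."""
    probs = softmax(logits)
    mu = mean_reward(logits, rewards)
    return math.sqrt(sum(p * (r - mu) ** 2 for p, r in zip(probs, rewards)))


def norm(vector):
    return math.sqrt(sum(x * x for x in vector))


rewards = [0.1, 1.0, 1.0, 1.0]      # made-up rewards; the best possible is 1.0
peaked = [6.0, 0.0, 0.0, 0.0]       # policy concentrated on the poor output
uniform = [0.0, 0.0, 0.0, 0.0]      # policy spread evenly over all outputs

for name, logits in (("peaked", peaked), ("uniform", uniform)):
    grad = expected_reward_gradient(logits, rewards)
    print(f"{name}: mean={mean_reward(logits, rewards):.3f} "
          f"std={reward_std(logits, rewards):.3f} grad_norm={norm(grad):.4f}")
```

In this toy case the peaked policy has a mean reward far from the optimum, yet its reward standard deviation and gradient norm are both small, while the uniform policy has a much larger gradient.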

Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals

Inspired by the advancements in foundation models for language-vision modeling, we explore the utilization of transformers and large-scale pretraining on biosignals. In this study, our aim is to design a general-purpose architecture for biosignals that can be easily trained on multiple modalities and can be adapted to new modalities or tasks with ease.
The proposed model is designed with three key features: (i) A frequency-aware architecture that can efficiently identify local and global information from biosignals by leveraging global filters in the frequency space. (ii) A channel-independent…

Apple Machine Learning Research

Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization

Existing vision-language models exhibit strong generalization on a variety of visual domains and tasks. However, such models mainly perform zero-shot recognition in a closed-set manner, and thus struggle to handle open-domain visual concepts by design. There are recent finetuning methods, such as prompt learning, that not only study the discrimination between in-distribution (ID) and out-of-distribution (OOD) samples, but also show some improvements in both ID and OOD accuracies. In this paper, we first demonstrate that vision-language models, after long enough finetuning but without proper…

Apple Machine Learning Research

Hierarchical and Dynamic Prompt Compression for Efficient Zero-shot API Usage

Long prompts present a significant challenge for practical LLM-based systems that need to operate with low latency and limited resources. We investigate prompt compression for zero-shot dialogue systems that learn to use unseen APIs directly in-context from their documentation, which may take up hundreds of prompt tokens per API. We start from a recently introduced approach (Mu et al., 2023) that learns to compress the prompt into a few “gist token” activations during finetuning. However, this simple idea is ineffective in compressing API documentation, resulting in low accuracy compared to…

Apple Machine Learning Research