New NVIDIA RTX A400 and A1000 GPUs Enhance AI-Powered Design and Productivity Workflows

AI integration across design and productivity applications is becoming the new standard, fueling demand for advanced computing performance. This means professionals and creatives will need to tap into increased compute power, regardless of the scale, complexity or scope of their projects.

To meet this growing need, NVIDIA is expanding its RTX professional graphics offerings with two new NVIDIA Ampere architecture-based GPUs for desktops: the NVIDIA RTX A400 and NVIDIA RTX A1000.

They expand access to AI and ray-tracing technology, equipping professionals with the tools they need to transform their daily workflows.

A New Era of Creativity, Performance and Efficiency

The RTX A400 GPU introduces accelerated ray tracing and AI to the RTX 400 series GPUs. With 24 Tensor Cores for AI processing, it surpasses traditional CPU-based solutions, enabling professionals to run cutting-edge AI applications, such as intelligent chatbots and copilots, directly on their desktops.

The GPU delivers real-time ray tracing so creators can build vivid, physically accurate 3D renders that push the boundaries of creativity and realism.

The A400 also includes four display outputs, a first for its series. This makes it ideal for high-density display environments, which are critical for industries like financial services, command and control, retail, and transportation.

The NVIDIA RTX A1000 GPU brings Tensor Cores and RT Cores to the RTX 1000 series GPUs for the first time, unlocking accelerated AI and ray-tracing performance for creatives and professionals.

With 72 Tensor Cores, the A1000 offers a tremendous upgrade over the previous generation, delivering over 3x faster generative AI processing for tools like Stable Diffusion. In addition, its 18 RT Cores speed graphics and rendering tasks by up to 3x, accelerating professional workflows such as 2D and 3D computer-aided design (CAD), product and architectural design, and 4K video editing.

The A1000 also excels in video processing, handling up to 38% more encode streams and offering 2x faster decode performance over the previous generation.

With a sleek, single-slot design and a power draw of just 50W, the A400 and A1000 GPUs bring impressive features to compact, energy-efficient workstations.

Expanding the Reach of RTX

These new GPUs empower users with cutting-edge AI, graphics and compute capabilities to boost productivity and unlock creative possibilities. Advanced workflows involving ray-traced renders and AI are now within reach, allowing professionals to push the boundaries of their work and achieve stunning levels of realism.

Industrial planners can use these new powerful and energy-efficient computing solutions for edge deployments. Creators can boost editing and rendering speeds to produce richer visual content. Architects and engineers can seamlessly transition ideas from 3D CAD concepts into tangible designs. Teams working in smart spaces can use the GPUs for real-time data processing, AI-enhanced security and digital signage management in space-constrained settings. And healthcare professionals can achieve quicker, more precise medical imaging analyses.

Financial professionals have always used expansive, high-resolution visual workspaces for more effective trading, analysis and data management. With the RTX A400 GPU supporting up to four 4K displays natively, financial services users can now achieve a high display density with fewer GPUs, streamlining their setups and reducing costs.

Next-Generation Features and Accelerated Performance 

The NVIDIA RTX A400 and A1000 GPUs are equipped with features designed to supercharge everyday workflows, including:

  • Second-generation RT Cores: Deliver real-time ray tracing and photorealistic, physically based rendering and visualization for all professional workflows, including architectural drafting, 3D design and content creation, where accurate lighting and shadow simulations can greatly enhance the quality of work.
  • Third-generation Tensor Cores: Accelerate AI-augmented tools and applications such as generative AI, render denoising and deep learning super sampling to improve image generation speed and quality.
  • Ampere architecture-based CUDA cores: Up to 2x the single-precision floating point throughput of the previous generation for significant speedups in graphics and compute workloads.
  • 4GB or 8GB of GPU memory: 4GB of GPU memory with the A400 GPU and 8GB with the A1000 GPU accommodate a range of professional needs, from basic graphic design and photo editing to more demanding 3D modeling with textures or high-resolution editing and data analyses. The GPUs also feature increased memory bandwidth over the previous generation for quicker data processing and smoother handling of larger datasets and scenes.
  • Encode and decode engines: With seventh-generation encode (NVENC) and fifth-generation decode (NVDEC) engines, the GPUs offer efficient video processing to support high-resolution video editing, streaming and playback with ultra-low latency. Inclusion of AV1 decode enables higher efficiency and smoother playback of more video formats.

Availability 

The NVIDIA RTX A1000 GPU is now available through global distribution partners such as PNY and Ryoyo Electric. The RTX A400 GPU is expected to be available from channel partners starting in May, with anticipated availability from manufacturers in the summer.

Read More

A secure approach to generative AI with AWS

Generative artificial intelligence (AI) is transforming the customer experience in industries across the globe. Customers are building generative AI applications using large language models (LLMs) and other foundation models (FMs), which enhance customer experiences, transform operations, improve employee productivity, and create new revenue channels.

FMs and the applications built around them represent extremely valuable investments for our customers. They’re often used with highly sensitive business data, like personal data, compliance data, operational data, and financial information, to optimize the model’s output. The biggest concern we hear from customers as they explore the advantages of generative AI is how to protect their highly sensitive data and investments. Because their data and model weights are incredibly valuable, customers require them to stay protected, secure, and private, whether from their own administrators’ accounts, their customers, vulnerabilities in software running in their own environments, or even their cloud service provider.

At AWS, our top priority is safeguarding the security and confidentiality of our customers’ workloads. We think about security across the three layers of our generative AI stack:

  • Bottom layer – Provides the tools for building and training LLMs and other FMs
  • Middle layer – Provides access to all the models along with tools you need to build and scale generative AI applications
  • Top layer – Includes applications that use LLMs and other FMs to make work stress-free by writing and debugging code, generating content, deriving insights, and taking action

Each layer is important to making generative AI pervasive and transformative.

With the AWS Nitro System, we delivered a first-of-its-kind innovation on behalf of our customers. The Nitro System is an unparalleled computing backbone for AWS, with security and performance at its core. Its specialized hardware and associated firmware are designed to enforce restrictions so that nobody, including anyone in AWS, can access your workloads or data running on your Amazon Elastic Compute Cloud (Amazon EC2) instances. Customers have benefited from this confidentiality and isolation from AWS operators on all Nitro-based EC2 instances since 2017.

By design, there is no mechanism for any Amazon employee to access a Nitro EC2 instance that customers use to run their workloads, or to access data that customers send to a machine learning (ML) accelerator or GPU. This protection applies to all Nitro-based instances, including instances with ML accelerators like AWS Inferentia and AWS Trainium, and instances with GPUs like P4, P5, G5, and G6.

The Nitro System enables Elastic Fabric Adapter (EFA), which uses the AWS-built AWS Scalable Reliable Datagram (SRD) communication protocol for cloud-scale elastic and large-scale distributed training, enabling the only always-encrypted Remote Direct Memory Access (RDMA) capable network. All communication through EFA is encrypted with VPC encryption without incurring any performance penalty.

The design of the Nitro System has been validated by the NCC Group, an independent cybersecurity firm. AWS delivers a high level of protection for customer workloads, and we believe this is the level of security and confidentiality that customers should expect from their cloud provider. This level of protection is so critical that we’ve added it in our AWS Service Terms to provide an additional assurance to all of our customers.

Innovating secure generative AI workloads using AWS industry-leading security capabilities

From day one, AWS AI infrastructure and services have had built-in security and privacy features to give you control over your data. As customers move quickly to implement generative AI in their organizations, you need to know that your data is being handled securely across the AI lifecycle, including data preparation, training, and inferencing. The security of model weights—the parameters that a model learns during training that are critical for its ability to make predictions—is paramount to protecting your data and maintaining model integrity.

This is why it is critical for AWS to continue to innovate on behalf of our customers to raise the bar on security across each layer of the generative AI stack. To do this, we believe that you must have security and confidentiality built in across each layer of the generative AI stack. You need to be able to secure the infrastructure to train LLMs and other FMs, build securely with tools to run LLMs and other FMs, and run applications that use FMs with built-in security and privacy that you can trust.

At AWS, securing AI infrastructure refers to zero access to sensitive AI data, such as AI model weights and data processed with those models, by any unauthorized person, whether at the infrastructure operator or at the customer. It comprises three key principles:

  1. Complete isolation of the AI data from the infrastructure operator – The infrastructure operator must have no ability to access customer content and AI data, such as AI model weights and data processed with models.
  2. Ability for customers to isolate AI data from themselves – The infrastructure must provide a mechanism to allow model weights and data to be loaded into hardware, while remaining isolated and inaccessible from customers’ own users and software.
  3. Protected infrastructure communications – The communication between devices in the ML accelerator infrastructure must be protected. All externally accessible links between the devices must be encrypted.

The Nitro System fulfills the first principle of Secure AI Infrastructure by isolating your AI data from AWS operators. The second principle calls for removing your own users’ and software’s administrative access to your AI data. AWS not only offers a way to achieve that but has also made it straightforward and practical by investing in an integrated solution between AWS Nitro Enclaves and AWS Key Management Service (AWS KMS). With Nitro Enclaves and AWS KMS, you can encrypt your sensitive AI data using keys that you own and control, store that data in a location of your choice, and securely transfer the encrypted data to an isolated compute environment for inferencing. Throughout this entire process, the sensitive AI data is encrypted and isolated from your own users and software on your EC2 instance, and AWS operators cannot access this data. Use cases that have benefited from this flow include running LLM inferencing in an enclave. Until today, Nitro Enclaves have operated only on the CPU, limiting the potential for larger generative AI models and more complex processing.
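
To make the first half of that flow concrete, here is a minimal sketch of envelope-encrypting model weights under a customer-managed KMS key, using boto3 and the cryptography package. The key ARN and file names are hypothetical placeholders, and the attestation-gated decryption that would happen inside a Nitro Enclave is only described in the comments, not implemented.

```python
# Minimal sketch: envelope-encrypt model weights under a customer-owned KMS key
# before shipping them to an isolated compute environment. The key ARN and file
# paths are hypothetical placeholders, not values from this post.
import os

import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/example-key-id"  # placeholder
kms = boto3.client("kms")

# 1. Ask KMS for a fresh data key, returned both in plaintext and encrypted
#    under the customer-managed key.
resp = kms.generate_data_key(KeyId=KEY_ARN, KeySpec="AES_256")
plaintext_key, encrypted_key = resp["Plaintext"], resp["CiphertextBlob"]

# 2. Encrypt the weights locally with the data key (AES-256-GCM).
with open("model_weights.bin", "rb") as f:
    weights = f.read()
nonce = os.urandom(12)
ciphertext = AESGCM(plaintext_key).encrypt(nonce, weights, None)

# 3. Keep only the ciphertext and the KMS-encrypted data key. To use the
#    weights, an enclave presents its signed attestation document to KMS;
#    KMS releases the data key only to that attested environment, so neither
#    the parent instance's users nor AWS operators ever see the plaintext.
del plaintext_key
with open("model_weights.enc", "wb") as f:
    f.write(nonce + ciphertext)
with open("data_key.enc", "wb") as f:
    f.write(encrypted_key)
```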

We announced our plans to extend this Nitro end-to-end encrypted flow to include first-class integration with ML accelerators and GPUs, fulfilling the third principle. You will be able to decrypt and load sensitive AI data into an ML accelerator for processing while providing isolation from your own operators and verified authenticity of the application used for processing the AI data. Through the Nitro System, you can cryptographically validate your applications to AWS KMS and decrypt data only when the necessary checks pass. This enhancement allows AWS to offer end-to-end encryption for your data as it flows through generative AI workloads.
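
One way such checks can be expressed is a KMS key policy that allows decryption only when the request carries an enclave attestation document with an expected measurement. The statement below is a hypothetical sketch: the account ID, role, and measurement value are placeholders, and it uses the kms:RecipientAttestation:ImageSha384 condition key that AWS documents for Nitro Enclaves; the accelerator-integrated flow described above may work differently.

```python
# Hypothetical KMS key policy statement: permit kms:Decrypt only for requests
# that include a Nitro Enclave attestation document whose enclave image
# measurement matches the expected value. All identifiers are placeholders.
import json

policy_statement = {
    "Sid": "AllowDecryptOnlyFromAttestedEnclave",
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::111122223333:role/enclave-parent-role"},
    "Action": "kms:Decrypt",
    "Resource": "*",
    "Condition": {
        "StringEqualsIgnoreCase": {
            # SHA-384 measurement of the enclave image (placeholder value).
            "kms:RecipientAttestation:ImageSha384": "<expected-enclave-measurement>"
        }
    },
}
print(json.dumps(policy_statement, indent=2))
```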

We plan to offer this end-to-end encrypted flow in the upcoming AWS-designed Trainium2 as well as GPU instances based on NVIDIA’s upcoming Blackwell architecture, which both offer secure communications between devices, the third principle of Secure AI Infrastructure. AWS and NVIDIA are collaborating closely to bring a joint solution to market, including NVIDIA’s new Blackwell GPU platform, which couples NVIDIA’s GB200 NVL72 solution with the Nitro System and EFA technologies to provide an industry-leading solution for securely building and deploying next-generation generative AI applications.

Advancing the future of generative AI security

Today, tens of thousands of customers are using AWS to experiment and move transformative generative AI applications into production. Generative AI workloads contain highly valuable and sensitive data that needs protection from your own operators as well as from the cloud service provider. Customers using AWS Nitro-based EC2 instances have received this level of protection and isolation from AWS operators since 2017, when we launched our innovative Nitro System.

At AWS, we’re continuing that innovation as we invest in building performant and accessible capabilities to make it practical for our customers to secure their generative AI workloads across the three layers of the generative AI stack, so that you can focus on what you do best: building and extending the uses of generative AI to more areas. Learn more here.


About the authors

Anthony Liguori is an AWS VP and Distinguished Engineer for EC2.

Colm MacCárthaigh is an AWS VP and Distinguished Engineer for EC2.

Read More

To Cut a Long Story Short: Video Editors Benefit From DaVinci Resolve’s New AI Features Powered by RTX

Editor’s note: This post is part of our In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

Video editors have more to look forward to than just April showers.

Blackmagic Design’s DaVinci Resolve released version 19, adding the IntelliTrack AI point tracker and UltraNR AI-powered noise reduction to further streamline video editing workflows.

The NAB 2024 trade show is bringing together thousands of content professionals from all corners of the broadcast, media and entertainment industries, with video editors and livestreamers seeking ways to improve their creative workflows with NVIDIA RTX technology.

The recently launched design app SketchUp 2024 introduced a new graphics engine that uses DirectX 12, which renders scenes 2.5x faster than the previous engine.

April also brings the latest NVIDIA Studio Driver, available for download today, which optimizes the newest creative app updates.

And this week’s featured In the NVIDIA Studio artist Rakesh Kumar created his captivating 3D scene The Rooted Vault using RTX acceleration.

Video Editor’s DaVinci Code

DaVinci Resolve is a powerful video editing package with color correction, visual effects, motion graphics and audio post-production all in one software tool. Its elegant, modern interface is easy to learn for new users, while offering powerful capabilities for professionals.

Two new AI features make video editing even more efficient: the IntelliTrack AI point tracker for object tracking, stabilization and audio panning, and UltraNR, which uses AI for spatial noise reduction — doing so 3x faster on the GeForce RTX 4090 vs. the Mac M2 Ultra.

All DaVinci Resolve AI effects are accelerated on RTX GPUs by NVIDIA TensorRT, boosting AI performance by up to 2x. The update also includes acceleration for Beauty, Edge Detect and Watercolor effects, doubling performance on NVIDIA GPUs.

For more information, check out the DaVinci Resolve website.

SketchUp Steps Up

SketchUp 2024 is a professional-grade 3D design software toolkit for designing buildings and landscapes, commonly used by designers and architects.

The new app, already receiving positive reviews, introduced a robust graphics engine that uses DirectX 12, increasing frames per second (FPS) by 2.5x over the previous engine. Navigating and orbiting complex models feels considerably lighter and faster, with quicker, more predictable performance.

In testing, the scene below runs at 4.5x higher FPS on the NVIDIA GeForce RTX 4090 than on the Mac M2 Ultra and other competitors.

2.5x faster FPS with the GeForce RTX 4090 GPU. Image courtesy of Trimble SketchUp.

SketchUp 2024 also unlocks import and export functionality for OpenUSD files to efficiently manage the interoperability of complex 3D scenes and animations across numerous 3D apps.

Get the full release details.

Art Rooted in Nature

Rakesh Kumar’s passion for 3D modeling and animation stemmed from his love for gaming and storytelling.

“My goal is to inspire audiences and take them to new realms by showcasing the power of immersive storytelling, captivating visuals and the idea of creating worlds and characters that evoke emotions,” said Kumar.

His scene The Rooted Vault aims to convey the beauty of the natural world, transporting viewers to a serene setting filled with the soothing melodies of nature.

Kumar began by gathering reference material.

There’s reference sheets … and then there’s reference sheets.

He then used Autodesk Maya to block out the basic structure and piece together the house as a series of modules. GPU-accelerated viewport graphics ensured fast, interactive 3D modeling and animations.

Next, Kumar used ZBrush to sculpt high-resolution details into the modular assets.

Fine details applied in ZBrush.

“I chose an NVIDIA RTX GPU-powered system for real-time ray tracing to achieve lifelike visuals, reliable performance for smoother workflows, faster render times and industry-standard software compatibility.” — Rakesh Kumar

He used the ZBrush decimation tool alongside Unreal Engine’s Nanite workflow to efficiently create most of the modular building props.

Traditional poly-modeling workflows for the walls enabled vertex blending shaders for seamless texture transitions.

Textures were created with Adobe Substance 3D Painter, where RTX-accelerated light and ambient occlusion baking optimized assets in mere seconds.

Kumar moved the project to Unreal Engine 5, where near-final finishing touches such as lighting, shadows and visual effects were applied.

Textures applied in Adobe Substance 3D Painter.

GPU acceleration played a crucial role in real-time rendering, allowing him to instantly see and adjust the scene.

Adobe Premiere Pro has a vast selection of GPU-accelerated features.

Kumar then moved to Blackmagic Design’s DaVinci Resolve to color grade the scene for the desired mood and aesthetic, before he began final editing in Premiere Pro, adding transitions and audio.

“While the initial concept required significant revisions, the final result demonstrates the iterative nature of artistic creation — all inspired by my mentors, friends and family, who were always there to support me,” Kumar said.

3D artist Rakesh Kumar.

Check out Kumar’s latest work on Instagram.

Follow NVIDIA Studio on Instagram, X and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter. 

Read More

Abstracts: April 16, 2024

Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.

In this episode, Senior Research Software Engineer Tusher Chakraborty joins host Gretchen Huizinga to discuss “Spectrumize: Spectrum-efficient Satellite Networks for the Internet of Things,” which was accepted at the 2024 USENIX Symposium on Networked Systems Design and Implementation (NSDI). In the paper, Chakraborty and his coauthors share their efforts to address the challenges of delivering reliable and affordable IoT connectivity via satellite-based networks. They propose a method for leveraging the motion of small satellites to facilitate efficient communication between a large IoT-satellite constellation and devices on Earth within a limited spectrum.

Transcript

[MUSIC]

GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.

[MUSIC FADES]

I’m talking today to Tusher Chakraborty, a senior research software engineer at Microsoft Research. Tusher is coauthor of a paper called “Spectrumize: Spectrum-efficient Satellite Networks for the Internet of Things.” Tusher, thanks for joining us on Abstracts!


TUSHER CHAKRABORTY: Hi. Thank you for having me here, Gretchen, today. Thank you.

HUIZINGA: So because this show is all about abstracts, in just a few sentences, tell us about the problem your paper addresses and why we should care about it.

CHAKRABORTY: Yeah, so think of, I’m a farmer living in a remote area and bought a sensor to monitor the soil quality of my farm. The big headache for me would be how to connect the sensor so that I can get access to the sensor data from anywhere. We all know that connectivity is a major bottleneck in remote areas. Now, what if, as a farmer, I could just click the power button of the sensor, and it gets connected from anywhere in the world. It’s pretty amazing, right? And that’s what our research is all about. Get your sensor devices connected from anywhere in the world with just the click of power button. We call it one-click connectivity. Now, you might be wondering, what’s the secret sauce? It’s not magic; it’s direct-to-satellite connectivity. So these sensors directly get connected to the satellites overhead from anywhere on Earth. The satellites, which are orbiting around the earth, collect the data from the sensing devices and forward to the ground stations in some other convenient parts of the world where these ground stations are connected to the internet.

HUIZINGA: So, Tusher, tell us what’s been tried before to address these issues and how your approach contributes to the literature and moves the science forward.

CHAKRABORTY: So satellite connectivity is nothing new and has been there for long. However, what sets us apart is our focus on democratizing space connectivity, making it affordable for everyone on the planet. So we are talking about the satellites that are at least 10 to 20 times cheaper and smaller than state-of-the-art satellites. So naturally, this ambitious vision comes with its own set of challenges. So when you try to make something cheaper and smaller, you’ll face lots of challenges that all these big satellites are not facing. So if I just go a bit technical, think of the antenna. So these big satellite antennas, they can actually focus on a particular part of the world. So this is something called beamforming. On the other hand, when we try to make the satellites cheaper and smaller, we can’t have that luxury. We can’t have beamforming capability. So what happens, they have omnidirectional antenna. So it seems like … you can’t focus on a particular part of the earth; instead, you create a huge footprint all over the earth. So this is one of the challenges that you don’t face in the state-of-the-art satellites. And we try to solve these challenges because we want to make connectivity affordable with cheaper and smaller satellites.

HUIZINGA: Right. So as you’re describing this, it sounds like this is a universal problem, and people have obviously tried to make things smaller and more affordable in the past. How is yours different? What methodology did you use to resolve the problems, and how did you conduct the research?

CHAKRABORTY: OK, I’m thrilled that you asked this one because the research methodology was the most exciting part for me here. As a part of this research, we launched a satellite in a joint effort with a satellite company. Like, this is very awesome! So it was a hands-on experience with a real-deal satellite system. It was not simulation-based system. The main goal here was to learn the challenge from a real-world experience and come up with innovative solutions; at the same time, evaluate the solutions in real world. So it was all about learning by doing, and let me tell you, it was quite the ride! [LAUGHTER] We didn’t do anything new when we launched the satellites. We just tried to see how industry today does this. We want to learn from them, hey, what’s the industry practice? We launched a satellite. And then we faced a lot of problems that today’s industry is facing. And from there, we learned, hey, like, you know, this problem is industry facing; let’s go after this, and let’s solve this. And then we tried to come up with the solutions based on those problems. And this was our approach. We didn’t want to assume something beforehand. We want to learn from how industry is going today and help them. Like, hey, these are the problems you are facing, and we are here to help you out.

HUIZINGA: All right, so assuming you learned something and wanted to pass it along, what were your major findings?

CHAKRABORTY: OK, that’s a very good question. So I was talking about the challenges towards this democratization earlier, right? So one of the most pressing challenges: shortage of spectrum. So let me try to explain this from the high level. So we need hundreds of these satellites, hundreds of these small satellites, to provide 24-7 connectivity for millions of devices around the earth. Now, I was talking, the footprint of a satellite on Earth can easily cover a massive area, somewhat similar to the size of California. So now with this large footprint, a satellite can talk with thousands of devices on Earth. You can just imagine, right? And at the same time, a device on Earth can talk with multiple satellites because we are talking about hundreds of these satellites. Now, things get tricky here. [LAUGHTER] We need to make sure that when a device and a satellite are talking, another nearby device or a satellite doesn’t interfere. Otherwise, there will be chaos—no one hearing others properly. So when we were talking about this device and satellite chat, right, so what is that all about? This, all about in terms of communication, is packet exchange. So the device sends some packet to the satellite; satellite sends some packet to the device—it’s all about packet exchange. Now, you can think of, if multiple of these devices are talking with a satellite or multiple satellites are talking with a device, there will be a collision in this packet exchange if you try to send the packets at the same time. And if you do that, then your packet will be collided, and you won’t be able to get any packet on the receiver end. So what we do, we try to send this packet on different frequencies. It’s like a different sound or different tone so that they don’t collide with each other. And, like, now, I said that you need different frequencies, but frequency is naturally limited. And the choice of frequency is even limited. This is very expensive. But if you have limited frequency and you want to resolve this collision, then you have a problem here. How do you do that? So we solve this problem by smartly looking at an artifact of these satellites. So these satellites are moving really fast around the earth. So when they are moving very fast around the earth, they create a unique signature on the frequency that they are using to talk with the devices on Earth. And we use this unique signature, and in physics, this unique signature is known as Doppler signature. And now you don’t need a separate frequency to sound them different, to have packets on different frequencies. You just need to recognize that unique signature to distinguish between satellites and distinguish between their communications and packets. So in that sense, there won’t be any packet collision. And this is all about our findings. So with this, now multiple devices and satellites can talk with each other at the same time without interference but using the same frequency.
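
Editor’s note: as a rough illustration of the Doppler-matching idea described here (not the paper’s implementation), the sketch below attributes a received packet to the satellite whose predicted Doppler signature best matches the observed carrier-frequency offsets. All values are synthetic placeholders.

```python
# Toy sketch: identify the transmitting satellite by matching observed
# carrier-frequency offsets against each candidate's predicted Doppler curve.
# Range-rate profiles and noise levels below are made up for illustration.
import numpy as np

C = 3.0e8    # speed of light, m/s
F_C = 915e6  # carrier frequency, Hz (an ISM-band value chosen for the example)

def doppler_shift(range_rate_mps):
    # Narrowband Doppler: f_obs - f_c ~= -f_c * range_rate / c.
    # A satellite closing on the receiver (negative range rate) shifts upward.
    return -F_C * np.asarray(range_rate_mps) / C

# Predicted range-rate profiles (m/s) over the packet window for three
# candidate satellites, e.g., computed from their orbital elements.
candidates = {
    "sat-A": np.linspace(-7000.0, -6800.0, 5),
    "sat-B": np.linspace(-1000.0, 1000.0, 5),
    "sat-C": np.linspace(6500.0, 6900.0, 5),
}

# Synthetic "measured" offsets: sat-A's signature plus receiver noise.
rng = np.random.default_rng(0)
observed = doppler_shift(np.linspace(-6990.0, -6810.0, 5)) + rng.normal(0.0, 50.0, 5)

# Attribute the packet to the satellite whose signature fits best.
best = min(candidates,
           key=lambda s: float(np.sum((doppler_shift(candidates[s]) - observed) ** 2)))
print("packet attributed to", best)  # expected: sat-A
```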

HUIZINGA: It sounds, like, very similar to a big room filled with a lot of people. Each person has their own voice, but in the mix, you, kind of, lose track of who’s talking and then you want to, kind of, tune in to that specific voice and say, that’s the one I’m listening to.

CHAKRABORTY: Yeah, I think you picked up the correct metaphor here! This is the scenario you can try to explain here. So, yeah, like what we are essentially doing, like, if you just, in a room full of people and they are trying to talk with each other, and then if they’re using the same tone, no one will be able to distinguish one person from another.

HUIZINGA: Right …

CHAKRABORTY: Everyone will sound same and that will be colliding. So you need to make sure that, how you can differentiate the tones …

HUIZINGA: Yeah …

CHAKRABORTY: … and the satellites differentiate their tones due to their fast movement. And we use our methodology to recognize that tone, which satellite is sending that tone.

HUIZINGA: So you sent up the experimental satellite to figure out what’s happening. Have you since tested it to see if it works?

CHAKRABORTY: Yeah, yeah, so we have tried it out, because this is a software solution, to be honest.

HUIZINGA: Ah.

CHAKRABORTY: As I was talking about, there is no hardware modification required at this point. So what we did, we just implemented this software in the ground stations, and then we tried to recognize which satellite is creating which sort of signature. That’s it!

HUIZINGA: Well, it seems like this research would have some solid real-world impact. So who would you say it helps most and how?

CHAKRABORTY: OK, that’s a very good one. So the majority of the earth still doesn’t have affordable connectivity. The lack of connectivity throws a big challenge to critical industries such as agriculture—the example that I gave—energy, and supply chain, hindering their ability to thrive and innovate. So our vision is clear: to bring 24-7 connectivity for devices anywhere on Earth with just a click of the power button. Moreover, affordability is at the heart of our mission, ensuring that this connectivity is accessible to all. So at the core, our efforts are geared towards empowering individuals and industries to unlock their full potential in an increasingly connected world.

HUIZINGA: If there was one thing you want our listeners to take away from this research, what would it be?

CHAKRABORTY: OK, if there is one thing I want you to take away from our work, it’s this: connectivity shouldn’t be a luxury; it’s a necessity. Whether you are a farmer in a remote village or a business owner in a city, access to reliable, affordable connectivity can transform your life and empower your endeavors. So our mission is to bring 24-7 connectivity to every corner of the globe with just a click of a button.

HUIZINGA: I like also how you say every corner of the globe, and I’m picturing a square! [LAUGHTER] OK, last question. Tusher, what’s next for research on satellite networks and Internet of Things? What big unanswered questions or unsolved problems remain in the field, and what are you planning to do about it?

CHAKRABORTY: Uh … where do I even begin? [LAUGHTER] Like, there are countless unanswered questions and unsolved problems in this field. But let me highlight one that we talked here: limited spectrum. So as our space network expands, so does our need for spectrum. But what’s the tricky part here? Just throw more and more spectrum. The problem is the chunk of spectrum that’s perfect for satellite communication is often already in use by the terrestrial networks. Now, a hard research question would be how we can make sure that the terrestrial and the satellite networks coexist in the same spectrum without interfering [with] each other. It’s a tough nut to crack, but it’s a challenge we are excited to tackle head-on as we continue to push the boundaries of research in this exciting field.

[MUSIC]

HUIZINGA: Tusher Chakraborty, thanks for joining us today, and to our listeners, thanks for tuning in. If you want to read this paper, you can find a link at aka.ms/abstracts. You can also read it on the Networked Systems Design and Implementation, or NSDI, website, and you can hear more about it at the NSDI conference this week. See you next time on Abstracts!

[MUSIC FADES]

The post Abstracts: April 16, 2024 appeared first on Microsoft Research.

Read More

torchtune: Easily fine-tune LLMs using PyTorch

We’re pleased to announce the alpha release of torchtune, a PyTorch-native library for easily fine-tuning large language models.

Staying true to PyTorch’s design principles, torchtune provides composable and modular building blocks along with easy-to-extend training recipes to fine-tune popular LLMs on a variety of consumer-grade and professional GPUs.

torchtune supports the full fine-tuning workflow from start to finish, including:

  • Downloading and preparing datasets and model checkpoints.
  • Customizing the training with composable building blocks that support different model architectures, parameter-efficient fine-tuning (PEFT) techniques, and more.
  • Logging progress and metrics to gain insight into the training process.
  • Quantizing the model post-tuning.
  • Evaluating the fine-tuned model on popular benchmarks.
  • Running local inference for testing fine-tuned models.
  • Checkpoint compatibility with popular production inference systems.

To get started, jump right into the code or walk through our many tutorials!
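
For a sense of the workflow, here is a minimal sketch of the download-then-fine-tune loop, driven from Python via torchtune’s tune CLI for illustration. The model name, config name, and flags follow patterns from the alpha documentation; treat them as assumptions to verify against the repository.

```python
# Illustrative sketch of torchtune's fine-tuning flow, wrapping the documented
# `tune` CLI in subprocess calls. Model, config, and flag names are assumptions
# based on the alpha docs; check the torchtune repo for the current versions.
import subprocess

# 1. Download model weights from the Hugging Face Hub (gated models need a token).
subprocess.run(
    ["tune", "download", "meta-llama/Llama-2-7b-hf",
     "--output-dir", "/tmp/llama2", "--hf-token", "<HF_TOKEN>"],
    check=True,
)

# 2. Fine-tune with a memory-efficient LoRA recipe on a single consumer GPU.
subprocess.run(
    ["tune", "run", "lora_finetune_single_device",
     "--config", "llama2/7B_lora_single_device"],
    check=True,
)
```

Each recipe’s config is a plain YAML file, so customizing a run can be as simple as copying a config, editing it, and passing it to tune run.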

Why torchtune?

Over the past year there has been an explosion of interest in open LLMs. Fine-tuning these state-of-the-art models has emerged as a critical technique for adapting them to specific use cases. This adaptation can require extensive customization, from dataset and model selection all the way through to quantization, evaluation and inference. Moreover, the size of these models poses a significant challenge when trying to fine-tune them on consumer-level GPUs with limited memory.

Existing solutions make it hard to add these customizations or optimizations by hiding the necessary pieces behind layers of abstractions. It’s unclear how different components interact with each other and which of these need to be updated to add new functionality. torchtune empowers developers to adapt LLMs to their specific needs and constraints with full control and visibility.

torchtune’s Design

torchtune was built with the following principles in mind:

  • Easy extensibility – New techniques emerge all the time and everyone’s fine-tuning use case is different. torchtune’s recipes are designed around easily composable components and hackable training loops, with minimal abstraction getting in the way of your fine-tuning. Each recipe is self-contained (no trainers or frameworks) and is designed to be easy to read, at less than 600 lines of code!
  • Democratize fine-tuning – Users, regardless of their level of expertise, should be able to use torchtune. Clone and modify configs, or get your hands dirty with some code! You also don’t need beefy data center GPUs. Our memory efficient recipes have been tested on machines with a single 24GB gaming GPU.
  • Interoperability with the OSS LLM ecosystem – The open source LLM ecosystem is absolutely thriving, and torchtune takes advantage of this to provide interoperability with a wide range of offerings. This flexibility puts you firmly in control of how you train and use your fine-tuned models.

Over the next year, open LLMs will become even more powerful, with support for more languages (multilingual), more modalities (multimodal) and more tasks. As the complexity of these models increases, we need to pay the same attention to “how” we design our libraries as we do to the features provided or performance of a training run. Flexibility will be key to ensuring the community can maintain the current pace of innovation, and many libraries/tools will need to play well with each other to power the full spectrum of use cases. torchtune is built from the ground up with this future in mind.

In the true PyTorch spirit, torchtune makes it easy to get started by providing integrations with some of the most popular tools for working with LLMs.

  • Hugging Face Hub – Hugging Face provides an expansive repository of open source models and datasets for fine-tuning. torchtune seamlessly integrates through the tune download CLI command so you can get started right away with fine-tuning your first model.
  • PyTorch FSDP – Scale your training using PyTorch FSDP. It is very common for people to invest in machines with multiple consumer-level cards like NVIDIA’s 3090/4090. torchtune allows you to take advantage of these setups by providing distributed recipes powered by FSDP; see the sketch after this list.
  • Weights & Biases – torchtune uses the Weights & Biases AI platform to log metrics and model checkpoints during training. Track your configs, metrics and models from your fine-tuning runs all in one place!
  • EleutherAI’s LM Evaluation Harness – Evaluating fine-tuned models is critical to understanding whether fine-tuning is giving you the results you need. torchtune includes a simple evaluation recipe powered by EleutherAI’s LM Evaluation Harness to provide easy access to a comprehensive suite of standard LLM benchmarks. Given the importance of evaluation, we will be working with EleutherAI very closely in the next few months to build an even deeper and more “native” integration.
  • ExecuTorch – Models fine-tuned with torchtune can be easily exported to ExecuTorch, enabling efficient inference to be run on a wide variety of mobile and edge devices.
  • torchao – Easily and efficiently quantize your fine-tuned models into 4-bit or 8-bit using a simple post-training recipe powered by the quantization APIs from torchao.
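
As a sketch of the FSDP path called out above, a multi-GPU recipe can be launched through the same CLI. The recipe and config names here are again assumptions based on the alpha docs; tune ls lists what your install actually provides.

```python
# Sketch: launch torchtune's FSDP-powered distributed recipe across two local
# GPUs (e.g., a pair of consumer cards). Names are assumptions; verify locally.
import subprocess

subprocess.run(
    ["tune", "run", "--nproc_per_node", "2",
     "full_finetune_distributed", "--config", "llama2/7B_full"],
    check=True,
)
```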

What’s Next?

This is just the beginning and we’re really excited to put this alpha version in front of a vibrant and energetic community. In the coming weeks, we’ll continue to augment the library with more models, features and fine-tuning techniques. We’d love to hear any feedback, comments or feature requests in the form of GitHub issues on our repository, or on our Discord channel. As always, we’d love any contributions from this awesome community. Happy Tuning!

Read More

Hindsight PRIORs for Reward Learning from Human Preferences

Preference-based Reinforcement Learning (PbRL) has shown great promise in learning from human preference binary feedback on an agent’s trajectory behaviors, where one of the major goals is to reduce the amount of queried human feedback. While the binary labels are a direct comment on the goodness of a trajectory behavior, there is still a need for resolving credit assignment, especially with limited feedback. We propose our work, PRIor On Rewards (PRIOR), which learns a forward dynamics world model to approximate a priori selective attention over states, which serves as a means to perform credit…

Apple Machine Learning Research

AI Is Tech’s ‘Greatest Contribution to Social Elevation,’ NVIDIA CEO Tells Oregon State Students

AI promises to bring the full benefits of the digital revolution to billions across the globe, NVIDIA CEO Jensen Huang said Friday during a conversation with Oregon State University President Jayathi Murthy.

“I believe that artificial intelligence is the technology industry’s single greatest contribution to social elevation, to lift all of the people that have historically been left behind,” Huang told more than 2,000 faculty, students and staff gathered for his conversation with Murthy.

The talk was the highlight of a forum marking the groundbreaking for a new research building that will be named for Huang and his wife, Lori, both Oregon State alumni.

The facility positions Oregon State as a leader not just in the semiconductor industry but also at the intersection of high performance computing and a growing number of fields.

Friday’s event at Oregon State University followed the groundbreaking for the Jen-Hsun Huang and Lori Mills Huang Collaborative Innovation Complex. Image courtesy of Oregon State University.

Those innovations have world-changing implications.

Huang said those who know a programming language such as C++ typically have greater opportunities.

“Because programming is so hard, the number of people who have benefitted from this, putting it to use for their economic prosperity, has been limited,” Huang said.

AI unlocks that and more.

“So you essentially have a collaborator with you at all times, essentially have a tutor at all times, and so I think the ability for AI to elevate all of the people left behind is quite extraordinary,” he added.

Huang’s appearance in Corvallis, Oregon, capped off a week of announcements underscoring NVIDIA’s commitment to preparing the future workforce with advanced AI, data science and high performance computing training.

On Tuesday, NVIDIA announced that it would participate in a $110 million partnership between Japan and the United States, which would include funding for university research.

On Wednesday, Georgia Tech announced a new NVIDIA-powered supercomputer that will help prepare undergraduate students to solve complex challenges with AI and HPC.

And later this month, NVIDIA founder Chris Malachowsky will be inducted into the Hall of Fame for the University of Florida’s Department of Electrical & Computer Engineering, following the November inauguration of the university’s $150 million Malachowsky Hall for Data Science & Information Technology.

Educating Future Leaders for ‘New Industrial Revolution’

NVIDIA has been investing in universities for decades, providing computing resources, advanced training curricula, donations and other support.

These contributions enable students and professors to access the high performance computing necessary for groundbreaking results at a key moment in the history of the industry.

“We’re at the beginning of a new industrial revolution, and the reason why I say that is because an industrial revolution produces something new that was impossible to produce in the past,” Huang said.

“And in this new world, you can apply electricity, and what’s going to come out of it is a whole bunch of floating-point numbers. We call them tokens, and those tokens are essentially artificial intelligence,” Huang said.

“And so this industrial revolution is going to be manufacturing intelligence at a very large scale,” Huang said.

OSU Breaks Ground on $213 Million Research Complex

Friday’s event in Oregon highlighted the Huangs’ commitment to education and reflected the couple’s deep personal ties to Oregon State, where the two met.

The conversation with Murthy followed the groundbreaking for the Jen-Hsun Huang and Lori Mills Huang Collaborative Innovation Complex, which took place Friday morning on the Corvallis campus.

When it opens in 2026, the 150,000-square-foot, $213 million complex — supported by a $50 million gift from the Huangs — will increase Oregon State’s support for the semiconductor and technology industry in Oregon and beyond.

Harnessing one of the nation’s most powerful NVIDIA supercomputers, the complex will bring together faculty and students to solve critical challenges facing the world in areas such as climate science, clean energy and water resources.

Huang sees the center — and AI — as helping put the benefits of computing at the service of people doing work across a broad range of disciplines.

Oregon State is one of the world’s premier schools in forestry, Huang said, adding: “Let’s just face it, it’s very unlikely that somebody who was in forestry … it’s not impossible, but C++ is probably not your thing.”

Thanks to ChatGPT, you can “now use a computer to apply it to your field of science and apply this computing technology to revolutionize your work.”

That makes learning how to think — and how to collaborate — more important than ever, Huang said. It’s “no different than if I gave you a partner to collaborate with you to solve problems,” Huang said.

“You still need to know how to collaborate, how to prompt, how to frame a problem, how to refine the solution, how to iterate on it and how to change your mind.”

NVIDIA Joins $110 Million Partnership to Help Universities Teach AI Skills

The groundbreaking at Oregon State is just one of several announcements highlighting NVIDIA’s commitment to advancing the global technology industry.

Last week, the Biden Administration announced a new $110 million AI partnership between Japan and the United States, including an initiative to fund research through collaboration between the University of Washington and the University of Tsukuba.

As part of this, NVIDIA is committing $25 million to a collaboration with Amazon to bring the latest technologies to the University of Washington, in Seattle, and the University of Tsukuba, northeast of Tokyo.

Georgia Tech Unveils New AI Makerspace in Collaboration With NVIDIA

And on Wednesday, Georgia Tech’s College of Engineering established an AI supercomputing hub dedicated to teaching students.

The AI Makerspace was launched in collaboration with NVIDIA. College leaders call it a “digital sandbox” for students to understand and use AI. Initially focusing on undergraduates, the AI Makerspace aims to democratize access to computing resources typically reserved for researchers or technology companies.

Students will access the cluster online as part of their coursework. The Makerspace will also better position students after graduation as they work with AI professionals and help shape future applications.

‘Beginning of a New World’

To be sure, AI has limits, Huang explained. “It’s no different than when you work with teammates or lab partners; you’re guiding each other along because you know each other’s weaknesses and strengths,” he said.

However, Huang said now is a fantastic time to get an education and prepare for a career.

“This is the beginning of a new world and this is the best of times to go to school — the whole world is changing, right? New technology and new capabilities, new instruments and new ways to learn,” Huang said.

Images courtesy of Oregon State University.

Read More

Vanishing Gradients in Reinforcement Finetuning of Language Models

Pretrained language models are commonly adapted to comply with human intent and downstream tasks via finetuning. The finetuning process involves supervised finetuning (SFT), using labeled samples, and/or reinforcement learning-based finetuning (RFT) via policy gradient methods, using a (possibly learned) reward function. This work highlights an overlooked optimization hurdle in RFT: we prove that the expected gradient for an input sample (i.e., prompt) vanishes if its reward standard deviation under the model is low, regardless of whether the reward mean is near-optimal or not. We then…

Apple Machine Learning Research

Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals

Inspired by the advancements in foundation models for language-vision modeling, we explore the utilization of transformers and large-scale pretraining on biosignals. In this study, our aim is to design a general-purpose architecture for biosignals that can be easily trained on multiple modalities and can be adapted to new modalities or tasks with ease.
The proposed model is designed with three key features: (i) A frequency-aware architecture that can efficiently identify local and global information from biosignals by leveraging global filters in the frequency space. (ii) A channel-independent…

Apple Machine Learning Research