Thinking beyond audio: Augmenting headphones for everyday digital interactions

This research was accepted at ACM Designing Interactive Systems (DIS) 2023, a conference dedicated to advancing the field of user-centered system design, where it received a Best Paper Award.

Headphones are traditionally used to provide and manage audio experiences through physical controls and a range of sensors. Yet these controls and sensors have remained confined to audio input and output functionality, such as adjusting the volume or muting the microphone. Imagine if headphones could transcend their role as mere audio devices.

Because headphones rank among the most popular wearables on the market, we have an exciting opportunity to expand their capabilities by integrating existing sensors with supplementary ones, enabling a wide variety of experiences that go beyond traditional audio control. In our paper, “Beyond Audio: Towards a Design Space of Headphones as a Site for Interaction and Sensing,” we share a vision that explores this potential.

By using sensors such as microphones, proximity sensors, motion sensors, inertial measurement units (IMUs), and LiDARs, headphone designers can explore new avenues of input and interaction. Because headphones are worn on the head, they can support a wide range of applications, such as tracking head movements, body posture, and hand gestures. Furthermore, as wearable devices, headphones have the potential to provide wearers with context-rich information and enable more intuitive and immersive interactions with their devices and environment beyond traditional button-based controls.

Potential scenarios for sensor-enhanced headphones 

To explore this concept further, we propose augmenting headphones with additional sensors and input widgets. These include: 

  • IMUs to sense head orientation
  • Swappable sets of input controls  
  • A range-sensing LiDAR that enables the sensing of hand gestures

By incorporating these capabilities, we envision a wide range of applications where headphone input acts as a bridge between the wearer and their environment, enabling more efficient and context-aware interactions across multiple devices and tasks. For example, headphones could assist people with applications like video games or help manage interruptions during a video call.

Let’s explore some scenarios to illustrate the potential of our headphone design concept. Consider a person engaged in a video call with teammates when they are suddenly interrupted by a colleague who approaches in person. In this situation, our headphones would be equipped to detect contextual cues, such as when the wearer rotates their head away from a video call, signaling a shift in attention. In response, the headphones could automatically blur the video feed and mute the microphone to protect the wearer’s privacy, as shown in Figure 1. This feature could also communicate to other participants that the wearer is temporarily engaged in another conversation or activity. When the wearer returns their attention to the call, the system removes the blur and reactivates the microphone.

Figure 1. These videos illustrate a context-aware privacy control system implemented during a video conference. In this scenario, the wearer temporarily disengages from the video conference to engage in an in-person conversation. After a predefined period, the system detects the wearer’s continued attention directed away from any known device, taking into account the environment context. As a result, privacy measures are triggered, including video blurring, microphone muting, and notifying other participants on the call. Once the wearer re-engages with the screen, their video and microphone settings return to normal, ensuring a seamless experience.
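To make this scenario concrete, the sketch below shows one way the attention-detection loop could be wired up. It is purely illustrative: the imu and call objects, and methods such as read_yaw_degrees and blur_video, are hypothetical stand-ins rather than a real headphone or conferencing API, and the threshold and grace period are assumed values.

import time

ATTENTION_THRESHOLD_DEG = 30.0  # head-yaw offset treated as "looking away" (assumed)
GRACE_PERIOD_S = 2.0            # predefined period before privacy measures trigger (assumed)

def monitor_attention(imu, call):
    away_since = None
    while call.is_active():
        yaw_offset = abs(imu.read_yaw_degrees() - call.screen_yaw_degrees())
        if yaw_offset > ATTENTION_THRESHOLD_DEG:
            away_since = away_since or time.monotonic()
            if time.monotonic() - away_since > GRACE_PERIOD_S:
                call.blur_video()         # protect the wearer's privacy
                call.mute_microphone()
        else:
            away_since = None             # attention returned to the screen
            call.unblur_video()
            call.unmute_microphone()
        time.sleep(0.1)                   # poll the IMU at roughly 10 Hz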

In another privacy-focused scenario, imagine a person simultaneously conversing with multiple teammates in separate video call channels. Our headphone design allows the wearer to control to whom their speech is directed by simply looking at their intended audience, as shown in Figure 2. This directed speech interaction can extend beyond video calls and be applied to other contexts, such as sending targeted voice commands to teammates in a multiplayer video game.

Figure 2. Headphones track the wearer’s head pose, seamlessly facilitating the distribution of video and/or audio across multiple private chats. They effectively communicate the wearer’s availability to other participants, whether in a video conferencing scenario (left) or a gaming scenario (right).

In our paper, we also demonstrate how socially recognizable gestures can introduce new forms of audio-visual control instead of relying solely on on-screen controls. For example, wearers could interact with media through gestural actions, such as cupping their ear towards the audio source to increase the volume while simultaneously reducing ambient noise, as shown in Figure 3. These gestures, ingrained in social and cultural contexts, can serve as both control mechanisms and nonverbal communication signals.

Figure 3. Top: Raising the earcup, a commonly used gesture to address in-person interruptions, mutes both the sound and the microphone to ensure privacy. Bottom: Cupping the earcup, a gesture indicating difficulty hearing, increases the system volume.

Additionally, we can estimate the wearer’s head gaze through the use of an IMU. When combined with the physical location of computing devices in the wearer’s vicinity, it opens up possibilities for seamless interactions across multiple devices. For instance, during a video call, the wearer can share the screen of the device they are actively focusing on. In this scenario, the wearer shifts their attention from an external monitor to a tablet device. Even though this tablet is not directly connected to the main laptop, our system smoothly transitions the screen sharing for the wearer’s audience in the video call, as shown in Figure 4.

Figure 4. A wearer delivers a presentation using a video conferencing tool. As the wearer looks at different devices, the streamed video dynamically updates to display the relevant source to participants.
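As an illustration of how head gaze might drive this kind of cross-device hand-off, the sketch below picks the registered device whose known bearing best matches the wearer's head yaw; a screen-sharing loop could then switch the streamed source whenever the focused device changes. The device registry, bearings, and tolerance are illustrative assumptions, not part of the system described in the paper.

def focused_device(head_yaw_deg, devices, tolerance_deg=15.0):
    """Return the device whose bearing is closest to the head yaw, or None."""
    best, best_delta = None, tolerance_deg
    for device in devices:
        # Signed shortest angular difference, wrapped into [-180, 180)
        delta = abs((head_yaw_deg - device["bearing_deg"] + 180) % 360 - 180)
        if delta < best_delta:
            best, best_delta = device, delta
    return best

devices = [{"name": "monitor", "bearing_deg": -40.0},
           {"name": "laptop", "bearing_deg": 0.0},
           {"name": "tablet", "bearing_deg": 35.0}]
print(focused_device(-38.0, devices))  # {'name': 'monitor', 'bearing_deg': -40.0}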

Finally, in our paper we also show the use of embodied interactions, where the wearer’s body movements serve to animate a digital representation of themselves, such as an avatar in a video call, as shown in Figure 5. This feature can also be implemented as a gameplay mechanism. Take a racing game, for instance, where the wearer’s body movements could control the vehicle’s steering, shown on the left in Figure 6. Extending this capability, these movements could enable a wearer to peek around obstacles in any first-person game, enhancing immersion and the gameplay experience, shown on the right in Figure 6.

Figure 5. Left: Headphones use an IMU to monitor and capture natural body movements, which are then translated into corresponding avatar movements. Right: Touch controls integrated into headphones enable wearers to evoke a range of emotions on the avatar, enhancing the user experience.
Figure 6. Leaning while wearing the headphones (with an integrated IMU) has a direct impact on gameplay. On the left, it results in swerving the car to the side, while on the right, it enables the player to duck behind a wall.

Design space for headphone interactions 

We define a design space for interactive headphones through an exploration of two distinct concepts, which we discuss in depth in our paper.

First, we look at the type of input gesture for the interaction, which we further classify into three categories. The gestural input from the wearer might fall under one or more of these categories, which we outline in more detail below and illustrate in Figure 7.

  • Touch-based gestures that involve tangible inputs on the headphones, such as buttons or knobs, requiring physical contact by the wearer
  • Mid-air gestures, which the wearer makes with their hands in close proximity to the headphones, detected through LiDAR technology
  • Head orientation, indicating the direction of the wearer’s attention
Figure 7. Sensor-enhanced headphones can use touch-based gestures (left), head orientation (middle), or mid-air gestures (right) as types of input.

The second way that we define the design space is through the context within which the wearer executes the action. Here, design considerations for sensor-enhanced headphones go beyond user intentionality and observed motion. Context-awareness enables these headphones to understand the wearer’s activities, the applications they are engaged with, and the devices in their vicinity, as illustrated in Figure 8. This understanding enables the headphones to provide personalized experiences and seamlessly integrate with the wearer’s environment. The four categories that define this context-awareness are the following:

  • Context-free actions, which produce similar results regardless of the active application, the wearer’s activity, or the social or physical environment.  
  • Context that is defined by the application with which the wearer is interacting. For example, are they listening to music, on a video call, or watching a movie?  
  • Context that is defined by the wearer’s body. For example, is the wearer’s gesture close to a body part that has an associated meaning? Eyes might relate to visual functions, ears to audio input, and the mouth to audio output. 
  • Context that is defined by the wearer’s environment. For example, are there other devices or people around the wearer with whom they might want to interact?
Figure 8. The system uses diverse contextual information to enable personalized responses to user input.

Looking ahead: Expanding the possibilities of HCI with everyday wearables  

Sensor-enhanced headphones offer a promising avenue for designers to create immersive and context-aware user experiences. By incorporating sensors, these headphones can capture subtle user behaviors, facilitating seamless interactions and enhancing the wearer’s overall experience.  

From safeguarding privacy to providing intuitive control mechanisms, the potential applications for sensor-enhanced headphones are vast and exciting. This exploration with headphones scratches the surface of what context-aware wearable technology can empower its wearers to achieve. Consider the multitude of wearables we use every day that could benefit from similar sensing and interaction capabilities. For example, imagine a watch that can track your hand movements and detect gestures. By enabling communication between sensor-enhanced wearables, we can establish a cohesive ecosystem for human-computer interaction that spans applications, devices, and social contexts.

Score! Team NVIDIA Takes Trophy in Recommendation Systems

A crack NVIDIA team of five machine learning experts spread across four continents won all three tasks in a hotly contested, prestigious competition to build state-of-the-art recommendation systems.

The results reflect the group’s savvy in applying the NVIDIA AI platform to real-world challenges for these engines of the digital economy. Recommenders serve up trillions of search results, ads, products, music and news stories to billions of people daily.

More than 450 teams of data scientists competed in the Amazon KDD Cup ‘23. The three-month challenge had its share of twists and turns and a nail-biter of a finish.

Shifting Into High Gear

In the first 10 weeks of the competition, the team had a comfortable lead. But in the final phase, organizers switched to new test datasets and other teams surged ahead.

The NVIDIANs shifted into high gear, working nights and weekends to catch up. They left a trail of round-the-clock Slack messages from team members living in cities from Berlin to Tokyo.

“We were working nonstop, it was pretty exciting,” said Chris Deotte, a team member in San Diego.

A Product by Any Other Name

The last of the three tasks was the hardest.

Participants had to predict which products users would buy based on data from their browsing sessions. But the training data didn’t include brand names of many possible choices.

“I knew from the beginning, this would be a very, very difficult test,” said Gilberto “Giba” Titericz.

KGMON to the Rescue

Based in Curitiba, Brazil, Titericz was one of four team members ranked as grandmasters in Kaggle competitions, the online Olympics of data science. They’re part of a team of machine learning ninjas who’ve won dozens of competitions. NVIDIA founder and CEO Jensen Huang calls them KGMON (Kaggle Grandmasters of NVIDIA), a playful takeoff on Pokémon.

In dozens of experiments, Titericz used large language models (LLMs) to build generative AIs to predict product names, but none worked.

In a creative flash, the team discovered a work-around. Predictions using their new hybrid ranking/classifier model were spot on.

Down to the Wire

In the last hours of the competition, the team raced to package all their models together for a few final submissions. They’d been running overnight experiments across as many as 40 computers.

Kazuki Onodera, a KGMON in Tokyo, was feeling jittery. “I really didn’t know if our actual scores would match what we were estimating,” he said.

The four KGMON (clockwise from upper left) Onodera, Titericz, Deotte and Puget.

Deotte, also a KGMON, remembered it as “something like 100 different models all working together to produce a single output … we submitted it to the leaderboard, and POW!”

The team inched ahead of its closest rival in the AI equivalent of a photo finish.

The Power of Transfer Learning

In another task, the team had to take lessons learned from large datasets in English, German and Japanese and apply them to meager datasets a tenth the size in French, Italian and Spanish. It’s the kind of real-world challenge many companies face as they expand their digital presence around the globe.

Jean-Francois Puget, a three-time Kaggle grandmaster based outside Paris, knew an effective approach to transfer learning. He used a pretrained multilingual model to encode product names, then fine-tuned the encodings.

“Using transfer learning improved the leaderboard scores enormously,” he said.
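Here is a minimal sketch of that kind of transfer-learning approach, using the open-source sentence-transformers library. The model name, the toy data, and the downstream classifier are illustrative assumptions, not the team's actual competition pipeline.

from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# Pretrained multilingual encoder: representations learned largely from
# high-resource languages transfer to the smaller French/Italian/Spanish data
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

titles = ["cafetière italienne 6 tasses", "tazza da caffè in ceramica"]
labels = [1, 0]  # e.g., purchased vs. not purchased in a session

embeddings = encoder.encode(titles)                 # encode product names
clf = LogisticRegression().fit(embeddings, labels)  # lightweight task head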

Blending Savvy and Smart Software

The KGMON efforts show the field known as recsys is sometimes more art than science, a practice that combines intuition and iteration.

It’s expertise that’s encoded into software products like NVIDIA Merlin, a framework to help users quickly build their own recommendation systems.

The Merlin framework provides an end-to-end solution for building recommendation systems.

Benedikt Schifferer, a Berlin-based teammate who helps design Merlin, used the software to train transformer models that crushed the competition’s classic recsys task.

“Merlin provides great results right out of the box, and the flexible design lets me customize models for the specific challenge,” he said.

Riding the RAPIDS

Like his teammates, he also used RAPIDS, a set of open-source libraries for accelerating data science on GPUs.

For example, Deotte accessed code called Dask XGBoost from NGC, NVIDIA’s hub for accelerated software. The code helped spread a large, complex task across eight GPUs and their memory.

For his part, Titericz used a RAPIDS library called cuML to search through millions of product comparisons in seconds.
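For a sense of what that looks like in code, here is a minimal sketch of GPU-accelerated nearest-neighbor search with RAPIDS cuML; the embedding matrix is random placeholder data standing in for real product representations.

import numpy as np
from cuml.neighbors import NearestNeighbors

# One million 64-dimensional product embeddings (placeholder data)
product_embeddings = np.random.rand(1_000_000, 64).astype(np.float32)

nn = NearestNeighbors(n_neighbors=10)
nn.fit(product_embeddings)            # index is built on the GPU

# Find the ten most similar products for a batch of query items
distances, indices = nn.kneighbors(product_embeddings[:1000])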

The team focused on session-based recommenders that don’t require data from multiple user visits. It’s a best practice these days when many users want to protect their privacy.

MosaicML Helps AI Users Boost Accuracy, Cut Costs and Save Time

Startup MosaicML is on a mission to help the AI community improve prediction accuracy, decrease costs and save time by providing tools for easy training and deployment of large AI models.

In this episode of NVIDIA’s AI Podcast, host Noah Kravitz speaks with MosaicML CEO and co-founder Naveen Rao about how the company aims to democratize access to large language models.

MosaicML, a member of NVIDIA’s Inception program, has identified two key barriers to widespread adoption: the difficulty of coordinating a large number of GPUs to train a model and the costs associated with this process.

MosaicML was in the news earlier this month when Databricks announced an agreement to acquire MosaicML for $1.3 billion.

Making model training accessible is key for many companies that need to control model behavior, respect data privacy, and iterate fast to develop new products based on AI.

You Might Also Like

Jules Anh Tuan Nguyen Explains How AI Lets Amputee Control Prosthetic Hand, Video Games

A postdoctoral researcher at the University of Minnesota discusses his efforts to allow amputees to control their prosthetic limb — right down to the finger motions — with their minds.

Overjet’s Wardah Inam on Bringing AI to Dentistry

Overjet, a member of NVIDIA Inception, is moving fast to bring AI to dentists’ offices. Dr. Wardah Inam, CEO of the company, discusses using AI to improve patient care.

Immunai CTO and Co-Founder Luis Voloch on Using Deep Learning to Develop New Drugs

Luis Voloch, co-founder and chief technology officer of Immunai, talks about tackling the challenges of the immune system with a machine learning and data science mindset.

Subscribe to the AI Podcast

The AI Podcast is now available through Amazon Music. You can also get the AI Podcast through iTunes, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better. Have a few minutes to spare? Fill out this listener survey.

An open-source gymnasium for machine learning assisted computer architecture design

Computer architecture research has a long history of developing simulators and tools to evaluate and shape the design of computer systems. For example, the SimpleScalar simulator was introduced in the late 1990s and allowed researchers to explore various microarchitectural ideas. Computer architecture simulators and tools, such as gem5, DRAMSys, and many more, have played a significant role in advancing computer architecture research. Since then, these shared resources and infrastructure have benefited industry and academia and have enabled researchers to systematically build on each other’s work, leading to significant advances in the field.

Nonetheless, computer architecture research is evolving, with industry and academia turning towards machine learning (ML) optimization to meet stringent domain-specific requirements, such as ML for computer architecture, ML for TinyML acceleration, DNN accelerator datapath optimization, memory controllers, power consumption, security, and privacy. Although prior work has demonstrated the benefits of ML in design optimization, the lack of strong, reproducible baselines hinders fair and objective comparison across different methods and poses several challenges to their deployment. To ensure steady progress, it is imperative to understand and tackle these challenges collectively.

To alleviate these challenges, in “ArchGym: An Open-Source Gymnasium for Machine Learning Assisted Architecture Design”, accepted at ISCA 2023, we introduced ArchGym, which includes a variety of computer architecture simulators and ML algorithms. Our results with ArchGym indicate that, given a sufficiently large number of samples, any of a diverse collection of ML algorithms can find the optimal set of architecture design parameters for each target problem; no one solution is necessarily better than another. These results further indicate that selecting the optimal hyperparameters for a given ML algorithm is essential for finding the optimal architecture design, but choosing them is non-trivial. We release the code and dataset across multiple computer architecture simulations and ML algorithms.

Challenges in ML-assisted architecture research

ML-assisted architecture research poses several challenges, including:

  1. For a specific ML-assisted computer architecture problem (e.g., finding an optimal solution for a DRAM controller), there is no systematic way to identify optimal ML algorithms or hyperparameters (e.g., learning rate, warm-up steps, etc.). There is a wide range of ML and heuristic methods, from random walk to reinforcement learning (RL), that can be employed for design space exploration (DSE). While these methods have shown noticeable performance improvement over their choice of baselines, it is not evident whether the improvements are due to the choice of optimization algorithms or the hyperparameters.

    Thus, to ensure reproducibility and facilitate widespread adoption of ML-aided architecture DSE, it is necessary to outline a systematic benchmarking methodology.

  2. While computer architecture simulators have been the backbone of architectural innovations, there is an emerging need to address the trade-offs between accuracy, speed, and cost in architecture exploration. The accuracy and speed of performance estimation varies widely from one simulator to another, depending on the underlying modeling details (e.g., cycle-accurate vs. ML-based proxy models). While analytical or ML-based proxy models are nimble by virtue of discarding low-level details, they generally suffer from high prediction error. Also, due to commercial licensing, there can be strict limits on the number of runs collected from a simulator. Overall, these constraints exhibit distinct performance vs. sample efficiency trade-offs, affecting the choice of optimization algorithm for architecture exploration.

    It is challenging to delineate how to systematically compare the effectiveness of various ML algorithms under these constraints.

  3. Finally, the landscape of ML algorithms is rapidly evolving and some ML algorithms need data to be useful. Additionally, rendering the outcome of DSE into meaningful artifacts such as datasets is critical for drawing insights about the design space.

    In this rapidly evolving ecosystem, it is important to understand how to amortize the overhead of search algorithms for architecture exploration. It is neither apparent nor systematically studied how to leverage exploration data while remaining agnostic to the underlying search algorithm.

ArchGym design

ArchGym addresses these challenges by providing a unified framework for evaluating different ML-based search algorithms fairly. It comprises two main components: 1) the ArchGym environment and 2) the ArchGym agent. The environment is an encapsulation of the architecture cost model — which includes latency, throughput, area, energy, etc., to determine the computational cost of running the workload, given a set of architectural parameters — paired with the target workload(s). The ArchGym agent is an encapsulation of the ML algorithm used for the search and consists of hyperparameters and a guiding policy. The hyperparameters are intrinsic to the algorithm for which the model is to be optimized and can significantly influence performance. The policy, on the other hand, determines how the agent selects a parameter iteratively to optimize the target objective.

Notably, ArchGym also includes a standardized interface that connects these two components, while also saving the exploration data as the ArchGym Dataset. At its core, the interface entails three main signals: hardware state, hardware parameters, and metrics. These signals are the bare minimum to establish a meaningful communication channel between the environment and the agent. Using these signals, the agent observes the state of the hardware and suggests a set of hardware parameters to iteratively optimize a (user-defined) reward. The reward is a function of hardware performance metrics, such as performance, energy consumption, etc. 

ArchGym comprises two main components: the ArchGym environment and the ArchGym agent. The ArchGym environment encapsulates the cost model and the agent is an abstraction of a policy and hyperparameters. With a standardized interface that connects these two components, ArchGym provides a unified framework for evaluating different ML-based search algorithms fairly while also saving the exploration data as the ArchGym Dataset.
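The toy loop below mimics that environment-agent interface in plain Python. It is schematic rather than ArchGym's actual code: the cost model, the random-walk policy, and the reward function are all stand-ins.

import random

class ToyCostModelEnv:
    """Stand-in cost model: latency improves with cache size, at an area cost."""
    def __init__(self):
        self.state = {"cache_kb": 32}

    def step(self, params):
        self.state = dict(params)
        metrics = {"latency": 100.0 / params["cache_kb"], "area": params["cache_kb"]}
        reward = -metrics["latency"] - 0.01 * metrics["area"]  # user-defined reward
        return self.state, reward, metrics

class RandomWalkAgent:
    """Minimal agent whose 'policy' perturbs the current parameters."""
    def suggest(self, state):
        return {"cache_kb": max(1, state["cache_kb"] + random.choice([-8, 8]))}

    def observe(self, params, reward):
        pass  # a learning agent (e.g., RL or BO) would update itself here

env, agent, dataset = ToyCostModelEnv(), RandomWalkAgent(), []
state = env.state
for _ in range(50):
    params = agent.suggest(state)              # agent proposes hardware parameters
    state, reward, metrics = env.step(params)  # environment returns state and metrics
    agent.observe(params, reward)
    dataset.append((params, metrics))          # exploration log, like the ArchGym Dataset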

ML algorithms could be equally favorable to meet user-defined target specifications

Using ArchGym, we empirically demonstrate that across different optimization objectives and DSE problems, at least one set of hyperparameters exists that results in the same hardware performance as other ML algorithms. A poorly selected hyperparameter (e.g., one chosen at random) for an ML algorithm or its baseline can lead to the misleading conclusion that a particular family of ML algorithms is better than another. We show that with sufficient hyperparameter tuning, different search algorithms, even random walk (RW), are able to identify the best possible normalized reward. Note, however, that finding the right set of hyperparameters may require an exhaustive search or even luck to make an algorithm competitive.

With a sufficient number of samples, there exists at least one set of hyperparameters that results in the same performance across a range of search algorithms. Here the dashed line represents the maximum normalized reward. Cloud-1, cloud-2, stream, and random indicate four different memory traces for DRAMSys (DRAM subsystem design space exploration framework).

Dataset construction and high-fidelity proxy model training

Creating a unified interface using ArchGym also enables the creation of datasets that can be used to design better data-driven ML-based proxy architecture cost models, improving the speed of architecture simulation. To evaluate the benefits of datasets in building an ML model to approximate architecture cost, we leverage ArchGym’s ability to log the data from each run from DRAMSys to create four dataset variants, each with a different number of data points. For each variant, we create two categories: (a) Diverse Dataset (DD), which represents the data collected from different agents (ACO, GA, RW, and BO), and (b) ACO only, which shows the data collected exclusively from the ACO agent, both of which are released along with ArchGym. We train a proxy model on each dataset using random forest regression with the objective of predicting the latency of designs for a DRAM simulator. Our results show that:

  1. As we increase the dataset size, the average normalized root mean squared error (RMSE) slightly decreases.
  2. However, as we introduce diversity in the dataset (e.g., collecting data from different agents), we observe 9× to 42× lower RMSE across different dataset sizes.

Diverse dataset collection across different agents using ArchGym interface.
The impact of a diverse dataset and dataset size on the normalized RMSE.
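As a rough illustration of this proxy-model setup, the sketch below fits a random forest regressor on logged (parameters, latency) pairs and reports RMSE. The data is synthetic, standing in for the logs collected from DRAMSys.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((5000, 10))                                 # logged design parameters
y = X @ rng.random(10) + 0.1 * rng.standard_normal(5000)   # stand-in latency labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
proxy = RandomForestRegressor(n_estimators=100).fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, proxy.predict(X_te), squared=False)
print(f"proxy-model RMSE: {rmse:.4f}")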

The need for a community-driven ecosystem for ML-assisted architecture research

While ArchGym is an initial effort towards creating an open-source ecosystem that (1) connects a broad range of search algorithms to computer architecture simulators in a unified and easy-to-extend manner, (2) facilitates research in ML-assisted computer architecture, and (3) forms the scaffold to develop reproducible baselines, there are many open challenges that need community-wide support. Below we outline some of the open challenges in ML-assisted architecture design. Addressing these challenges requires a well-coordinated effort and a community-driven ecosystem.

Key challenges in ML-assisted architecture design.

We call this ecosystem Architecture 2.0. We outline the key challenges and a vision for building an inclusive ecosystem of interdisciplinary researchers to tackle the long-standing open problems in applying ML for computer architecture research. If you are interested in helping shape this ecosystem, please fill out the interest survey.

Conclusion

ArchGym is an open-source gymnasium for ML-assisted architecture DSE that provides a standardized interface, which can be readily extended to suit different use cases. Additionally, ArchGym enables fair and reproducible comparison between different ML algorithms and helps to establish stronger baselines for computer architecture research problems.

We invite the computer architecture community as well as the ML community to actively participate in the development of ArchGym. We believe that the creation of a gymnasium-type environment for computer architecture research would be a significant step forward in the field and provide a platform for researchers to use ML to accelerate research and lead to new and innovative designs.

Acknowledgements

This blog post is based on joint work with several co-authors at Google and Harvard University. We would like to acknowledge and highlight Srivatsan Krishnan (Harvard) who contributed several ideas to this project in collaboration with Shvetank Prakash (Harvard), Jason Jabbour (Harvard), Ikechukwu Uchendu (Harvard), Susobhan Ghosh (Harvard), Behzad Boroujerdian (Harvard), Daniel Richins (Harvard), Devashree Tripathy (Harvard), and Thierry Thambe (Harvard). In addition, we would also like to thank James Laudon, Douglas Eck, Cliff Young, and Aleksandra Faust for their support, feedback, and motivation for this work. We would also like to thank John Guilyard for the animated figure used in this post. Amir Yazdanbakhsh is now a Research Scientist at Google DeepMind and Vijay Janapa Reddi is an Associate Professor at Harvard.

Access private repos using the @remote decorator for Amazon SageMaker training workloads

As more and more customers are looking to put machine learning (ML) workloads in production, there is a large push in organizations to shorten the development lifecycle of ML code. Many organizations prefer writing their ML code in a production-ready style in the form of Python methods and classes as opposed to an exploratory style (writing code without using methods or classes) because this helps them ship production-ready code faster.

With Amazon SageMaker, you can use the @remote decorator to run a SageMaker training job simply by annotating your Python code with an @remote decorator. The SageMaker Python SDK will automatically translate your existing workspace environment and any associated data processing code and datasets into a SageMaker training job that runs on the SageMaker training platform.
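As a minimal illustration (the instance type here is just an example), annotating a plain Python function with @remote is enough to run it remotely:

from sagemaker.remote_function import remote

@remote(instance_type="ml.m5.xlarge")
def multiply(x, y):
    # Executes on a SageMaker-managed instance; the result is returned locally
    return x * y

print(multiply(3, 4))  # launches a SageMaker training job and prints 12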

Running a Python function locally often requires several dependencies, which may not come with the local Python runtime environment. You can install them via package and dependency management tools like pip or conda.

However, organizations operating in regulated industries like banking, insurance, and healthcare operate in environments that have strict data privacy and networking controls in place. These controls often mandate having no internet access available to any of their environments. The reason for such restrictions is to maintain full control over egress and ingress traffic, reducing the chances of unscrupulous actors sending or receiving unverified information through the network. Such network isolation is often also mandated by audit and industry compliance rules. When it comes to ML, this restricts data scientists from downloading any package from public repositories like PyPI, Anaconda, or Conda-Forge.

To provide data scientists access to the tools of their choice while also respecting the restrictions of the environment, organizations often set up their own private package repository hosted in their own environment. You can set up private package repositories on AWS in multiple ways.

In this post, we focus on one of them: using AWS CodeArtifact.

Solution overview

The following architecture diagram shows the solution architecture.


The high-level steps to implement the solution are as follows:

  • Set up a virtual private cloud (VPC) with no internet access using an AWS CloudFormation template.
  • Use a second CloudFormation template to set up CodeArtifact as a private PyPI repository and provide connectivity to the VPC, and set up an Amazon SageMaker Studio environment to use the private PyPI repository.
  • Train a classification model based on the MNIST dataset using an @remote decorator from the open-source SageMaker Python SDK. All the dependencies will be downloaded from the private PyPI repository.

Note that using SageMaker Studio in this post is optional. You can choose to work in any integrated development environment (IDE) of your choice. You just need to set up your AWS Command Line Interface (AWS CLI) credentials correctly. For more information, refer to Configure the AWS CLI.

Prerequisites

You need an AWS account with an AWS Identity and Access Management (IAM) role with permissions to manage resources created as part of the solution. For details, refer to Creating an AWS account.

Set up a VPC with no internet connection

Create a new CloudFormation stack using the vpc.yaml template. This template creates the following resources:

  • A VPC with two private subnets across two Availability Zones with no internet connectivity
  • A Gateway VPC endpoint for accessing Amazon S3
  • Interface VPC endpoints for SageMaker, CodeArtifact, and a few other services to allow the resources in the VPC to connect to AWS services via AWS PrivateLink

Provide a stack name, such as No-Internet, and complete the stack creation process.

Wait for the stack creation process to complete.

Set up a private repository and SageMaker Studio using the VPC

The next step is to deploy another CloudFormation stack using the sagemaker_studio_codeartifact.yaml template. This template sets up CodeArtifact as a private PyPI repository connected to the VPC, along with a SageMaker Studio environment configured to use that repository.

Provide a stack name and keep the default values or adjust the parameters for the CodeArtifact domain name, private repository name, user profile name for SageMaker Studio, and name for the upstream public PyPI repository. You also need to provide the VPC stack name created in the previous step.

When the stack creation is complete, the SageMaker domain should be visible on the SageMaker console.

To verify there is no internet connection available in SageMaker Studio, launch SageMaker Studio. Choose File, New, and Terminal to launch a terminal and try to curl any internet resource. It should fail to connect, as shown in the following screenshot.

Train an image classifier using an @remote decorator with the private PyPI repository

In this section, we use the @remote decorator to run a PyTorch training job that produces an MNIST image classification model. To achieve this, we set up a configuration file, develop the training script, and run the training code.

Set up a configuration file

We set up a config.yaml file and provide the configurations needed to do the following:

  • Run a SageMaker training job in the no-internet VPC created earlier
  • Download the required packages by connecting to the private PyPI repository created earlier

The file looks like the following code:

SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        Dependencies: '../config/requirements.txt'
        InstanceType: 'ml.m5.xlarge'
        PreExecutionCommands:
            - 'aws codeartifact login --tool pip --domain <domain-name> --domain-owner <AWS account number> --repository <private repository name> --endpoint-url <VPC-endpoint-url-prefixed with https://>'
        RoleArn: '<execution role ARN for running training job>'
        S3RootUri: '<s3 bucket to store the job output>'
        VpcConfig:
            SecurityGroupIds: 
            - '<security group id used by SageMaker Studio>'
            Subnets: 
            - '<VPC subnet id 1>'
            - '<VPC subnet id 2>'

The Dependencies field contains the path to requirements.txt, which contains all the dependencies needed. Note that all the dependencies will be downloaded from the private repository. The requirements.txt file contains the following code:

torch
torchvision
sagemaker>=2.156.0,<3

The PreExecutionCommands section contains the command to connect to the private PyPI repository. To get the CodeArtifact VPC endpoint URL, use the following code:

import boto3

# Create a session (for the current Region) and an EC2 client
boto3_session = boto3.session.Session()
ec2 = boto3_session.client("ec2")

response = ec2.describe_vpc_endpoints(
    Filters=[
        {
            'Name': 'service-name',
            'Values': [
                f'com.amazonaws.{boto3_session.region_name}.codeartifact.api'
            ]
        },
    ]
)

code_artifact_api_vpc_endpoint = response['VpcEndpoints'][0]['DnsEntries'][0]['DnsName']

endpoint_url = f'https://{code_artifact_api_vpc_endpoint}'
endpoint_url

Generally, we get two VPC endpoints for CodeArtifact, and we can use any of them in the connection commands. For more details, refer to Use CodeArtifact from a VPC.

Additionally, configurations like execution role, output location, and VPC configurations are provided in the config file. These configurations are needed to run the SageMaker training job. To know more about all the configurations supported, refer to Configuration file.

It’s not mandatory to use the config.yaml file in order to work with the @remote decorator. This is just a cleaner way to supply all configurations to the @remote decorator. All the configs could also be supplied directly in the decorator arguments, but that reduces readability and maintainability of changes in the long run. Also, the config file can be created by an admin and shared with all the users in an environment.

Develop the training script

Next, we prepare the training code in simple Python files. We have divided the code into three files:

  • load_data.py – Contains the code to download the MNIST dataset
  • model.py – Contains the code for the neural network architecture for the model
  • train.py – Contains the code for training the model by using load_data.py and model.py

In train.py, we need to decorate the main training function as follows:

@remote(include_local_workdir=True)
def perform_train(train_data,
                  test_data,
                  *,
                  batch_size: int = 64,
                  test_batch_size: int = 1000,
                  epochs: int = 3,
                  lr: float = 1.0,
                  gamma: float = 0.7,
                  no_cuda: bool = True,
                  no_mps: bool = True,
                  dry_run: bool = False,
                  seed: int = 1,
                  log_interval: int = 10,
                  ):
    # PyTorch-native training code ...

Now we’re ready to run the training code.

Run the training code with an @remote decorator

We can run the code from a terminal or from any executable prompt. In this post, we use a SageMaker Studio notebook cell to demonstrate this:

!python ./train.py

Running the preceding command triggers the training job. In the logs, we can see that it’s downloading the packages from the private PyPI repository.

This concludes the implementation of an @remote decorator working with a private repository in an environment with no internet access.

Clean up

To clean up the resources, follow the instructions in CLEANUP.md.

Conclusion

In this post, we learned how to effectively use the @remote decorator’s capabilities while working in restrictive environments without internet access. We also learned how to integrate a private CodeArtifact repository with the help of the SageMaker configuration file support. This solution makes iterative development much simpler and faster. Another added advantage is that you can continue to write the training code in a more natural, object-oriented way and still use SageMaker capabilities to run training jobs on a remote cluster with minimal changes in your code. All the code shown as part of this post is available in the GitHub repository.

As a next step, we encourage you to check out the @remote decorator functionality and Python SDK API and use it in your choice of environment and IDE. Additional examples are available in the amazon-sagemaker-examples repository to get you started quickly. You can also check out the post Run your local machine learning code as Amazon SageMaker Training jobs with minimal code changes for more details.


About the author

Vikesh Pandey is a Machine Learning Specialist Solutions Architect at AWS, helping customers from financial industries design and build solutions on generative AI and ML. Outside of work, Vikesh enjoys trying out different cuisines and playing outdoor sports.

Microsoft at ICALP 2023: Deploying cloud capacity robustly against power failures

This research paper was accepted by the 50th EATCS International Colloquium on Automata, Languages and Programming (ICALP 2023), which is dedicated to advancing the field of theoretical computer science.

In the rapidly evolving landscape of cloud computing, escalating demand for cloud resources is placing immense pressure on cloud providers, driving them to consistently invest in new hardware to accommodate datacenters’ expanding capacity needs. Consequently, the ability to power all this hardware has emerged as a key bottleneck, as devices that power datacenters have limited capacity and necessitate efficient utilization. Efficiency is crucial, not only to lower operational costs and subsequently prices to consumers, but also to support the sustainable use of resources, ensure their long-term availability, and preserve the environment for future generations.

At the same time, it is of the utmost importance to ensure power availability for servers, particularly in the event of a power device failure. Modern datacenter architectures have adopted a strategy to mitigate this risk by avoiding the reliance on a single power source for each server. Instead, each server is powered by two devices. Under normal operations, a server draws half its required power from each device. In the event of a failover, the remaining device steps in to support the server’s full power demand, potentially operating at an increased capacity during the failover period.

Figure 1: This diagram depicts the power allocation of three servers (1, 2, and 3) to the power devices (a, b, and c) that serve them, during both normal operations and in a failover scenario. The height of each power device represents its capacity, and the height of each server within those power devices shows its power consumption. During failover, the servers have fewer devices from which to draw power, resulting in increased energy consumption from the resources available.

Challenges of optimizing power allocation

In our paper, “Online Demand Scheduling with Failovers,” which we’re presenting at the 50th EATCS International Colloquium on Automata, Languages and Programming (ICALP 2023), we explore a simplified model that emphasizes power management to determine how to optimally place servers within a datacenter. Our model contains multiple power devices and new servers (demands), where each power device has a normal capacity of 1 and a larger failover capacity, B. Demands arrive over time, each with a power requirement, and we must irrevocably assign each demand to a pair of power devices with sufficient capacity. The goal is to maximize the total power of the assigned demands until we are forced to reject a demand due to lack of power. This helps us maximize the usage of the available power devices.

One crucial aspect of this model is the presence of uncertainty. We assign capacity for each demand without knowing future needs. This uncertainty adds complexity, as selection of device pairs for each demand must be carefully executed to avoid allocations that could hinder the placement of future demands. Figure 2 provides an illustration.

Figure 2: This example shows four power devices a, b, c, and d, with failover capacity B=1, and six demands that arrive sequentially, with a per-device power requirement of ¼ each. Suppose four demands have arrived so far. The example on the left represents a bad assignment that cannot accept the remaining two demands. This is because if we placed an additional demand, say, on device a, its failover capacity would be exceeded if b failed. On the other hand, a good assignment, depicted on the right, allows for the placement of all six demands.

The example in Figure 2 suggests that we should spread the demands across device pairs. Otherwise, pairs with large loads could have a big impact on the remaining devices should one device fail. On the other hand, there is a danger in spreading out the demands too much and not leaving enough devices free, as shown in Figure 3.

Figure 3: This example involves four power devices, labeled a, b, c, and d, each with failover capacity B=1. The scenario also includes seven demands that arrive sequentially, with the first six requiring little power per device (epsilon = 0.01 units of power, for example), and the last requiring 0.5 units of power. In (a), when the small demands are distributed across pairs, the final demand cannot be accommodated because the failover capacity is exceeded. However, by grouping the small demands on a single pair of power devices, as shown in (b), all demands can be successfully accommodated.

In analyzing the examples in Figures 2 and 3, it becomes clear that striking the right balance is crucial. This entails effectively distributing demands across pairs to minimize any consequences resulting from failovers while also ensuring the availability of an adequate number of devices to meet future demand. Attaining the optimal allocation avoids prematurely ending up with an unassignable demand.
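To make these feasibility conditions concrete, the sketch below checks an assignment against the model as we read it from the figures: each demand normally draws its per-device power from both of its devices (normal capacity 1), and when one device fails, the surviving partner absorbs the failed device's share of every shared demand (failover capacity B).

def feasible(assignment, B):
    """assignment: list of (per_device_power, device_a, device_b) triples."""
    devices = {d for _, a, b in assignment for d in (a, b)}
    normal = {d: sum(r for r, a, b in assignment if d in (a, b)) for d in devices}
    if any(load > 1.0 for load in normal.values()):
        return False  # normal capacity of 1 exceeded
    for failed in devices:  # check every single-device failure scenario
        for dev in devices - {failed}:
            # dev also absorbs the failed device's share of each shared demand
            extra = sum(r for r, a, b in assignment if {a, b} == {dev, failed})
            if normal[dev] + extra > B:
                return False  # failover capacity B exceeded
    return True

# Figure 2 with B = 1: packing four quarter-demands onto two pairs blocks a fifth
bad = [(0.25, "a", "b")] * 2 + [(0.25, "c", "d")] * 2
print(feasible(bad, B=1))                        # True: feasible so far
print(feasible(bad + [(0.25, "a", "c")], B=1))   # False: a overloads if b fails

# The good assignment spreads six demands over all six distinct device pairs
good = [(0.25, a, b) for a, b in ("ab", "cd", "ac", "bd", "ad", "bc")]
print(feasible(good, B=1))                       # True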

To tackle this challenge, we developed algorithms that are guaranteed to efficiently utilize the available power resources. In fact, our allocations are provably close to optimal, even without upfront knowledge of the demands. Our algorithms essentially conquer the inherent uncertainty of the problem.

Optimizing for the worst-case scenario

Our first algorithm takes a conservative approach, protecting against the worst-case scenario. It guarantees that, regardless of the sequence of demand requests, it will utilize at least half of the power when compared with an optimal solution that has prior knowledge of all demands. As we show in the paper, this result represents the optimal outcome achievable in the most challenging instances.

To effectively strike a balance between distributing demands across devices and ensuring sufficient device availability, the goal of this algorithm is to group demands based on their power requirements. Each group of demands is then allocated separately to an appropriately sized collection of devices. The algorithm aims to consolidate similar demands in controlled regions, enabling efficient utilization of failover resources. Notably, we assign at most one demand per pair of devices in each collection, except in the case of minor demands, which are handled separately.

Optimizing for real-world demand

Because the previous algorithm prioritizes robustness against the worst-case demand sequences, it may unintentionally result in underutilizing the available power by a factor of ½, which is unavoidable in that setting. However, these worst-case scenarios are uncommon in typical datacenter operations. Accordingly, we shifted our focus to a more realistic model where demands arise from an unknown probability distribution. We designed our second algorithm in this stochastic arrival model, demonstrating that as the number of demands and power devices increases, its assignment progressively converges to the optimal omniscient solution, ensuring that no power is wasted.

To achieve this, the algorithm learns from historical data, enabling informed assignment decisions based on past demands. By creating “allocation templates” derived from previous demands, we learn how to allocate future demands. To implement this concept and prove its guarantee, we have developed new tools in probability and optimization that may be valuable in addressing similar problems in the future.

Sierra Division Studios Presents Three Epic Projects Built With NVIDIA Omniverse

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

Jacob Norris is a 3D artist and the president, co-founder and creative director of Sierra Division Studios — an outsource studio specializing in digital 3D content creation. The studio was founded with a single goal in mind: to make groundbreaking artwork at the highest level.

His team is entirely remote — giving employees added flexibility to work from anywhere in the world while increasing the pool of prospective artists who have a vast array of experiences and skill sets that the studio can draw from.

Norris envisions a future where incredible 3D content can be made regardless of location, time or even language, he said. It’s a future in which NVIDIA Omniverse, a platform for connecting and building custom 3D tools and metaverse applications, will play a critical role.

Omniverse is also a powerful tool for making SimReady assets — 3D objects with accurate physical properties. Combined with synthetic data, these assets can help solve real-world problems in simulation, including for AI-powered 3D artists. Learn more about AI and access creative resources to level up your passion projects on the NVIDIA Studio creative side hustle page.

Plus, check out the new community challenge, #StartToFinish. Use the hashtag to submit a screenshot of a favorite project featuring both its beginning and ending stages for a chance to be showcased on the @NVIDIAStudio and @NVIDIAOmniverse social channels.

Tapping Omniverse for Omnipresent Work 

“Omniverse is an incredibly powerful tool for our team in the collaboration process,” said Norris. He noted that the Universal Scene Description format, aka OpenUSD, is key to achieving efficient content creation.

“We used OpenUSD to build a massive library of all the assets from our team,” Norris said. “We accomplished this by adding every mesh and element of a single model into a large, easily viewable overview scene for kitbashing, which is the process of combining elements from several assets into an entirely new model.”

The byproduct of kitbashing.

“Since everything is shared in OpenUSD, our asset library is easily accessible and reduces the time needed to access materials and make edits on the fly,” Norris added. “This helps spur inspirational and imaginational forces.”
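As a rough illustration of how such a shared library can be stitched together, the sketch below uses the OpenUSD Python API (pxr) to reference a few assets into a single overview stage for kitbashing; the asset paths and prim names are hypothetical.

```python
from pxr import Usd, UsdGeom

# Reference every library asset into one overview stage, laid out in a row
# for side-by-side viewing (asset paths are hypothetical).
asset_paths = ["assets/crate.usd", "assets/valve.usd", "assets/pipe_trim.usd"]

stage = Usd.Stage.CreateNew("overview.usda")
for i, path in enumerate(asset_paths):
    prim = stage.DefinePrim(f"/Library/Asset_{i}", "Xform")
    prim.GetReferences().AddReference(path)  # a reference, not a copy: source edits flow through
    UsdGeom.XformCommonAPI(prim).SetTranslate((i * 100.0, 0.0, 0.0))
stage.GetRootLayer().Save()
```

Because each asset is referenced rather than duplicated, an edit to a source file shows up everywhere that asset is used.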

During the review phase, the team can compare photorealistic models with incredible visual fidelity side by side in a shared space, ensuring the models are “created to the highest set of standards,” said Norris.

The Last Oil Rig on Earth 

Sierra Division’s The Oil Rig video is set on Earth’s last operational fossil fuel rig, which is visited by a playful drone named Quark. The piece’s storytelling takes the audience through an impeccably detailed environment.

Real or rendered?

A scene as complex as this one required blockouts in Unreal Engine. The team snapped models together from a set of greybox modular pieces, ensuring the environment components were easy to work with. Once satisfied with the environment concept and layout, the team added further detail to the models.

Building blocks in Unreal Engine.

Norris’ Lenovo ThinkPad P73 NVIDIA Studio laptop with NVIDIA RTX A5000 graphics powered NVIDIA DLSS technology, which increases viewport interactivity by using AI to upscale frames rendered at lower resolution while retaining high-fidelity detail.

Sierra Division then created tiling textures, trim sheets and materials to apply to near-finalized models. The studio used Adobe Substance 3D Painter to design custom textures with edge wear and grunge, taking advantage of RTX-accelerated light and ambient occlusion for baking and optimizing assets in seconds.

Oil rig ocean materials refined in Adobe Substance 3D Painter and Designer.

Next, lighting scenarios were tested in the Omniverse USD Composer app with the Unreal Engine Connector, which, thanks to OpenUSD, eliminates the need to upload, download and convert file formats.

Stunning detail.

“With OpenUSD, it’s very easy to open the same file you’re viewing in the engine and quickly make edits without having to re-import,” said Norris.
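In OpenUSD terms, such an edit is just a change to the shared file; here is a minimal sketch, assuming the hypothetical overview stage from the earlier example.

```python
from pxr import Usd, UsdGeom

# Open the same USD file the engine is viewing and tweak it in place,
# e.g., hiding one asset during a lighting review (paths are hypothetical).
stage = Usd.Stage.Open("overview.usda")
prim = stage.GetPrimAtPath("/Library/Asset_0")
UsdGeom.Imageable(prim).MakeInvisible()
stage.GetRootLayer().Save()
```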

Team-wide review of renders made easier with Omniverse USD Composer.

Sierra Division analyzed daytime, nighttime, rainy and cloudy scenarios to see how the scene resonated emotionally, helping to decide the mood of the story they wanted to tell with their in-progress assets. They settled on a cloudy environment with well-placed lights to evoke feelings of mystery and intrigue.

Don’t underestimate the value of emotion. ‘Oil Rig’ re-light artwork by Ted Mebratu.

“From story-building to asset and scene creation to final renders with RTX, AI and GPU-accelerated features helped us every step of the way.” — Jacob Norris

From here, the team added cameras to the scene to determine compositions for final renders.

“If we were to try to compose the entire environment without cameras or direction, it would take much longer, and we wouldn’t have perfectly laid-out camera shots nor specifically lit renders,” said Norris. “It’s just much easier and more fun to do it this way and to pick camera shots earlier on.”

Final renders were exported lightning fast into Adobe Photoshop with Norris’ RTX A5000 GPU. Over 30 GPU-accelerated features gave Norris plenty of options to play with colors and contrast, and make final image adjustments smoothly and quickly.

Pick camera angles before final composition.

The Oil Rig modular set is available for purchase on Epic Games Unreal Marketplace. Sierra Division donates a portion of every sale to Ocean Conservancy — a nonprofit working to reduce trash, create sustainable fisheries and preserve wildlife.

The Explorer’s Room

For another video, called “The Explorer’s Room,” Sierra Division collaborated with 3D artist Mostafa Sohbi. An environment originally created by Sohbi was a great starting point to expand on the idea of an “explorer” that collects artifacts, gets into precarious situations and uses tools to help him out.

Norris and Sierra Division’s creative workflow for this piece closely mirrored the team’s work on The Oil Rig.

The Explorer’s Room.

“We all know Nathan Drake, Lara Croft, Indiana Jones and other adventurous characters,” said Norris. “They were big inspirations for us to create a living environment that tells our version of the story, while also allowing others to take the same assets and work on their own versions of the story, adding or changing elements in it.”

Don’t lose your way.

Norris stressed the importance of GPU technology in his creative workflow. “Having the fastest-performing GPU allowed us to focus more on the creative process and telling our story, instead of trying to work around slow technology or accommodating for poor performance with lower-quality artwork,” said Norris.

It’s the little details that make this render extraordinary.

“We simply made what we thought was awesome, looked awesome and felt great to share with others,” Norris said. “So it was a no-brainer for us to use NVIDIA RTX GPUs.”

Traveling soon?

Money Heist 

Norris said much of the content Sierra Division creates offers the opportunity for others to use the studio’s assets to tell their own stories. “We don’t always want to impose our own ideas on a scene or an environment, but we do want to show what is possible,” he added.

Don’t get any ideas.

Sierra Division created the Heist Essentials and Tools Collection set of props to share with game developers, content creators and virtual production teams.

Photorealistic detail.

“It’s always a thrill to recreate props inspired by movies like Ocean’s Eleven and Mission Impossible, and create assets someone might use during these types of missions and sequences,” said Norris.

How much money is that?

Try to spot all of the hidden treasures.

Jacob Norris, president, co-founder and creative director of Sierra Division Studios.

Check out Sierra Division on the studio’s website, ArtStation, Twitter and Instagram.

Follow NVIDIA Studio on Instagram, Twitter and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter. 

Get started with NVIDIA Omniverse by downloading the standard license free, or learn how Omniverse Enterprise can connect your team. Developers can get started with Omniverse resources. Stay up to date on the platform by subscribing to the newsletter, and follow NVIDIA Omniverse on Instagram, Medium and Twitter. For more, join the Omniverse community and check out the Omniverse forums, Discord server, Twitch and YouTube channels.

Exploring institutions for global AI governance

New white paper investigates models and functions of international institutions that could help manage opportunities and mitigate risks of advanced AI.

Growing awareness of the global impact of advanced artificial intelligence (AI) has inspired public discussions about the need for international governance structures to help manage opportunities and mitigate the risks involved. Many discussions have drawn on analogies with the ICAO (International Civil Aviation Organization) in civil aviation, CERN (European Organization for Nuclear Research) in particle physics, the IAEA (International Atomic Energy Agency) in nuclear technology, and intergovernmental and multi-stakeholder organisations in many other domains. And yet, while analogies can be a useful start, the technologies emerging from AI will be unlike aviation, particle physics, or nuclear technology.

To succeed with AI governance, we need to better understand:

  • what specific benefits and risks we need to manage internationally
  • what governance functions those benefits and risks require
  • what organisations can best provide those functions