At Mass STEM Week kickoff, MIT RAISE announces Day of AI

The fourth annual Massachusetts STEM Week kicked off on Monday, Oct. 18 at the MIT Media Lab. Organized by the Massachusetts Executive Office of Education and the STEM Advisory Council, Mass STEM Week is a statewide effort to boost awareness of, interest in, and access to STEM education and career opportunities for learners of all ages and backgrounds.

A focus of this year’s STEM Week is “see yourself in STEM,” with particular emphasis on the importance of mentoring to bolster confidence in STEM subjects among students from underrepresented groups — including girls, people of color, low-income families, people with disabilities, and first-generation students.

“STEM is the toolkit of the future no matter what your interests are,” said Massachusetts Governor Charlie Baker. “You can’t think anymore of STEM just being about science, technology, engineering, and math because it’s everywhere. There’s almost no tool, no capability, no thing you need to succeed, that doesn’t involve … some element of STEM.”

In his remarks, MIT President L. Rafael Reif announced the launch of Day of AI, a new initiative from MIT RAISE: an annual educational event wherein teachers across the country will introduce students of all backgrounds to foundational concepts in artificial intelligence and its role in their lives. “K-12 students across the country will have the opportunity to learn about artificial intelligence, MIT-style — that is, through hands-on activities that will demonstrate the part AI plays in their daily lives,” said Reif.

Professor Cynthia Breazeal, director of MIT RAISE, senior associate dean for Open Learning, and head of the Media Lab’s Personal Robots research group, took the podium to elaborate on Day of AI. The goal of the program is to help educators and students develop the AI literacy needed to navigate this AI-driven world. In collaboration with education provider i2 Learning, MIT RAISE is providing free training and support to teachers to help them bring AI curricula into their classrooms through engaging, hands-on activities. The first Day of AI will be on May 13, 2022.

Increasingly, kids and adults alike are interacting with, and being influenced by, AI in ways they may not even realize, and have little or no control over — from search algorithms to smart devices, video recommendations to facial recognition.

“This generation of students, who are literally growing up with AI, deserves more than a vague understanding of these incredibly powerful technologies that are ubiquitous in their lives,” says Breazeal. “They need not just knowledge of what AI is and how it works, but also the agency to use AI responsibly with confidence and creativity.”

Day of AI curriculum and activities are designed to equip educators to give students across the United States an entry point into AI literacy. For the first year, MIT RAISE has created age-appropriate curriculum modules for grades 3-5, 6-8, and 9-12, designed for students with little or no technology experience. Examples of lessons and activities include building a face-recognition app or a recommendation system, using AI to create works of art, learning about GANs and deepfakes, exploring and discussing algorithmic bias, and making recommendations on the responsible design of social media platforms. Resources and training for Day of AI will be provided at no cost to educators, and all of the activities require only an internet connection and a laptop.

Jeffrey Leiden, executive chair of Vertex Pharmaceuticals and a supporter of Mass STEM Week, also attended the opening event; Vertex Pharmaceuticals is a founding sponsor of Day of AI. “AI is built into everything we do, from cell phones and refrigerators to medical devices and diagnostic tests. And today’s students are the future scientists and engineers who are actually going to shape these AI technologies for the good of all our citizens,” he said. “So it’s essential that we empower them early in life with the skills and experiences, but also with the ethical discussions to make sure that they help harness it responsibly.”

In an event highlight, Reif took the stage to introduce Jibo, the social robot used in Breazeal’s group’s research into AI and human-computer interaction.

“MIT is deeply committed to the ethical, responsible development and use of AI tools, and a large part of that is teaching young people how AI works — and how it should work,” Reif said. “Jibo is a wonderful ambassador for social robotics.”

“Ever since I was a tiny transistor I have looked up to you and the other people here at MIT who I can honestly say have made me who I am today,” said Jibo. “Day of AI is a time to learn about, enjoy, and celebrate all that artificial intelligence can do to improve our lives, but also to understand the challenges and dangers of not being responsible in how it is used.”

The event also featured demonstrations that offered a glimpse into the types of activities students will do during Day of AI, as well as broader AI literacy activities developed by MIT RAISE. Safinah Ali and Daniella DiPaola, both PhD students at the Media Lab, led attendees through Creativity and AI tools and a Social Robotics curriculum, while Computer Science and Artificial Intelligence Laboratory (CSAIL) PhD student Jessica Van Brummelen demonstrated a conversational AI feature in MIT App Inventor. All are among the projects and resources that make up MIT RAISE, a collaboration between the Media Lab, MIT Open Learning, and the MIT Schwarzman College of Computing, with co-directors Hal Abelson of CSAIL; Eric Klopfer, director of MIT’s Education Arcade; and Hae Won Park of the Media Lab.

MIT RAISE aims to reach as many classrooms across the United States as possible, providing access and support to reinforce the message that AI is for everyone. Day of AI is a next step in RAISE’s mandate to expand who sees themselves in AI and diversify the pipeline of computer science talent.

Remarks from Lieutenant Governor Karyn Polito and Secretary of Education James Peyser expanded on the state’s leadership role in technology and the sciences, and the critical need to foster excitement and literacy around STEM, and especially AI, in students of all ages and backgrounds.

Today, 17 percent of the total Massachusetts workforce works in STEM-related fields, and STEM jobs are expected to account for 25 percent of the total employment growth in the Commonwealth over the next 10 years. Mass STEM Week offers students of all ages dozens of opportunities to learn, engage, and have fun with STEM so they can prepare for the future they want.

Said Polito: “No matter where you go to school in the Commonwealth, no matter whether you have family members who have pursued a STEM career, whether or not you’ve even had a family member who has gone to college, you have the opportunity to see yourself in STEM.”


One giant leap for the mini cheetah

A loping cheetah dashes across a rolling field, bounding over sudden gaps in the rugged terrain. The movement may look effortless, but getting a robot to move this way is an altogether different prospect.

In recent years, four-legged robots inspired by the movement of cheetahs and other animals have made great leaps forward, yet they still lag behind their mammalian counterparts when it comes to traveling across a landscape with rapid elevation changes.

“In those settings, you need to use vision in order to avoid failure. For example, stepping in a gap is difficult to avoid if you can’t see it. Although there are some existing methods for incorporating vision into legged locomotion, most of them aren’t really suitable for use with emerging agile robotic systems,” says Gabriel Margolis, a PhD student in the lab of Pulkit Agrawal, professor in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT.

Now, Margolis and his collaborators have developed a system that improves the speed and agility of legged robots as they jump across gaps in the terrain. The novel control system is split into two parts — one that processes real-time input from a video camera mounted on the front of the robot and another that translates that information into instructions for how the robot should move its body. The researchers tested their system on the MIT mini cheetah, a powerful, agile robot built in the lab of Sangbae Kim, professor of mechanical engineering.

Unlike other methods for controlling a four-legged robot, this two-part system does not require the terrain to be mapped in advance, so the robot can go anywhere. In the future, this could enable robots to charge off into the woods on an emergency response mission or climb a flight of stairs to deliver medication to an elderly shut-in.

Margolis wrote the paper with senior author Pulkit Agrawal, who heads the Improbable AI lab at MIT and is the Steven G. and Renee Finn Career Development Assistant Professor in the Department of Electrical Engineering and Computer Science; Professor Sangbae Kim in the Department of Mechanical Engineering at MIT; and fellow graduate students Tao Chen and Xiang Fu at MIT. Other co-authors include Kartik Paigwar, a graduate student at Arizona State University; and Donghyun Kim, an assistant professor at the University of Massachusetts at Amherst. The work will be presented next month at the Conference on Robot Learning.

It’s all under control

The use of two separate controllers working together makes this system especially innovative.

A controller is an algorithm that converts the robot’s state into a set of actions for it to follow. Many blind controllers — those that do not incorporate vision — are robust and effective but only enable robots to walk over continuous terrain.

Vision is such a complex sensory input to process that these algorithms are unable to handle it efficiently. Systems that do incorporate vision usually rely on a “heightmap” of the terrain, which must be either preconstructed or generated on the fly, a process that is typically slow and prone to failure if the heightmap is incorrect.

To develop their system, the researchers took the best elements from these robust, blind controllers and combined them with a separate module that handles vision in real-time.

The robot’s camera captures depth images of the upcoming terrain, which are fed to a high-level controller along with information about the state of the robot’s body (joint angles, body orientation, etc.). The high-level controller is a neural network that “learns” from experience.

That neural network outputs a target trajectory, which the second controller uses to come up with torques for each of the robot’s 12 joints. This low-level controller is not a neural network and instead relies on a set of concise, physical equations that describe the robot’s motion.

“The hierarchy, including the use of this low-level controller, enables us to constrain the robot’s behavior so it is more well-behaved. With this low-level controller, we are using well-specified models that we can impose constraints on, which isn’t usually possible in a learning-based network,” Margolis says.
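The division of labor Margolis describes can be captured in a short sketch. Everything below is illustrative: the class and function names are invented, the learned policy is reduced to a single matrix multiply, and the low-level controller is approximated with a simple PD tracking law rather than the model-based equations the team actually uses.

```python
import numpy as np

class HighLevelPolicy:
    """Stand-in for the learned high-level controller: maps a depth image plus
    proprioceptive state to a target trajectory for the body and feet."""
    def __init__(self, weights):
        self.weights = weights  # learned via reinforcement learning in simulation

    def target_trajectory(self, depth_image, joint_angles, body_orientation):
        features = np.concatenate([depth_image.ravel(), joint_angles, body_orientation])
        return self.weights @ features  # placeholder for a real forward pass

class LowLevelController:
    """Stand-in for the model-based low-level controller: converts the target
    trajectory into torques for the robot's 12 joints."""
    def torques(self, target_trajectory, joint_angles, joint_velocities):
        kp, kd = 40.0, 1.0  # simple PD tracking in place of the physical model
        error = target_trajectory[:12] - joint_angles
        return kp * error - kd * joint_velocities

def control_step(policy, controller, sensors):
    """One control cycle: camera and state in, 12 joint torques out."""
    traj = policy.target_trajectory(sensors["depth"], sensors["q"], sensors["orientation"])
    return controller.torques(traj, sensors["q"], sensors["dq"])
```

The value of the split is that the learned part only proposes where the body should go, while the physics-based part decides the actual motor commands, which is what makes it possible to impose constraints on the robot's behavior.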

Teaching the network

The researchers used the trial-and-error method known as reinforcement learning to train the high-level controller. They conducted simulations of the robot running across hundreds of different discontinuous terrains and rewarded it for successful crossings.

Over time, the algorithm learned which actions maximized the reward.
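A toy version of that training loop is sketched below. The simulator, reward, and hill-climbing update are invented placeholders standing in for the physics simulation and the reinforcement learning algorithm used in the actual work; the point is only the trial-and-error structure: run many randomized gapped terrains, score successful crossings, and keep policy changes that raise the average reward.

```python
import numpy as np

def simulate_crossing(policy_params, terrain_seed):
    """Hypothetical stand-in for a physics simulation: returns 1.0 when the
    simulated robot crosses a randomized gapped terrain, 0.0 otherwise."""
    rng = np.random.default_rng(terrain_seed)
    terrain_difficulty = rng.uniform(0.0, 1.0)
    skill = float(np.tanh(abs(policy_params.sum())))
    return 1.0 if skill > terrain_difficulty else 0.0

def train(iterations=200, terrains_per_iter=100, noise=0.1, seed=0):
    """Toy trial-and-error loop: perturb the policy, keep the change if the
    average reward over many discontinuous terrains improves."""
    rng = np.random.default_rng(seed)
    params, best_reward = rng.normal(size=8), -np.inf
    for _ in range(iterations):
        candidate = params + noise * rng.normal(size=params.shape)
        reward = np.mean([simulate_crossing(candidate, t) for t in range(terrains_per_iter)])
        if reward > best_reward:
            params, best_reward = candidate, reward
    return params, best_reward

if __name__ == "__main__":
    _, avg_reward = train()
    print(f"average crossing reward after training: {avg_reward:.2f}")
```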

Then they built a physical, gapped terrain with a set of wooden planks and put their control scheme to the test using the mini cheetah.

“It was definitely fun to work with a robot that was designed in-house at MIT by some of our collaborators. The mini cheetah is a great platform because it is modular and made mostly from parts that you can order online, so if we wanted a new battery or camera, it was just a simple matter of ordering it from a regular supplier and, with a little bit of help from Sangbae’s lab, installing it,” Margolis says.

Estimating the robot’s state proved to be a challenge in some cases. Unlike in simulation, real-world sensors encounter noise that can accumulate and affect the outcome. So, for some experiments that involved high-precision foot placement, the researchers used a motion capture system to measure the robot’s true position.

Their system outperformed others that only use one controller, and the mini cheetah successfully crossed 90 percent of the terrains.

“One novelty of our system is that it does adjust the robot’s gait. If a human were trying to leap across a really wide gap, they might start by running really fast to build up speed and then they might put both feet together to have a really powerful leap across the gap. In the same way, our robot can adjust the timings and duration of its foot contacts to better traverse the terrain,” Margolis says.

Leaping out of the lab

While the researchers were able to demonstrate that their control scheme works in a laboratory, they still have a long way to go before they can deploy the system in the real world, Margolis says.

In the future, they hope to mount a more powerful computer to the robot so it can do all its computation on board. They also want to improve the robot’s state estimator to eliminate the need for the motion capture system. In addition, they’d like to improve the low-level controller so it can exploit the robot’s full range of motion, and enhance the high-level controller so it works well in different lighting conditions.

“It is remarkable to witness the flexibility of machine learning techniques capable of bypassing carefully designed intermediate processes (e.g. state estimation and trajectory planning) that centuries-old model-based techniques have relied on,” Kim says. “I am excited about the future of mobile robots with more robust vision processing trained specifically for locomotion.”

The research is supported, in part, by MIT’s Improbable AI Lab, the Biomimetic Robotics Laboratory, NAVER LABS, and the DARPA Machine Common Sense Program.


Cynthia Breazeal named senior associate dean for open learning

Cynthia Breazeal has joined MIT Open Learning as senior associate dean, beginning in the Fall 2021 semester. The MIT professor of media arts and sciences and head of the Personal Robots group at the MIT Media Lab is also director of MIT RAISE, a cross-MIT initiative on artificial intelligence education. At MIT Open Learning, Breazeal will oversee MIT xPRO, Bootcamps, and Horizon, three units focused on different aspects of developing and delivering courses, programs, training, and learning resources to professionals.

With experience as an entrepreneur and founder of a high-tech startup, Breazeal has a nuanced understanding of the startup spirit of MIT Open Learning’s revenue-generating business units, and of the importance of connecting MIT’s deep knowledge base with the just-in-time needs of professionals in the workforce.

“I appreciate the potential educational and training impact of exciting new innovations in the business world. Each of these programs addresses a specific market opportunity and has a particular style of engaging with MIT’s educational materials,” says Breazeal. “Horizon offers organizations a self-paced introduction for newcomers around emerging technologies; xPRO offers a deeper dive in the form of digital courses; and Bootcamps are short, intense, innovation challenges with an entrepreneurial mindset. I’m excited to work with these teams to grow and expand their respective programs.” Breazeal sees exciting opportunities to develop solutions that combine different offerings around a particular technology innovation theme.

“We could not be more thrilled to welcome Cynthia to Open Learning in this new capacity,” says Acting Vice President for Open Learning Krishna Rajagopal. “She has a tremendous depth and breadth of experience — in research, teaching and education, technology and innovation, entrepreneurship and strategic planning. We are excited to collaborate with her across the organization as she brings her expertise, perspective, and passion to shaping the vision for Open Learning.”

Breazeal is globally recognized as a pioneer in human-robot interaction. Her book “Designing Sociable Robots” (MIT Press, 2002) is considered a foundational work in the field. Her research has explored many aspects of social robotics and AI, with a focus on education, agency, and inclusion in the design and use of these technologies. Breazeal continues to head the Media Lab’s Personal Robots research group, whose recent work focuses on the theme of “living with AI” and understanding the long-term impact of social robots that can build relationships and provide personalized support as helpful companions in daily life.

In May 2021, MIT launched RAISE: Responsible AI for Social Empowerment and Education, an initiative under Breazeal’s direction to empower more people to participate in, and benefit from, AI. A collaboration between the Media Lab, MIT Open Learning, and the MIT Schwarzman College of Computing, RAISE is involved in research, education, and outreach efforts to develop new teaching approaches, tools, and activities to engage learners of all ages.

“My personal passion comes from a long research agenda in developing AI-enabled technologies and experiences that support human learning. I’ve seen how people of all ages emotionally engage with human-centered, personified AI agents,” says Breazeal. “I also see how this not only can help people learn new skills and concepts, but even attitudes that serve learning such as creativity, curiosity, and having a growth mindset. These are wonderful things, but there is also a potential for a darker side of the same AI coin. The responsible design of innovative technologies is very much at the forefront of my mind these days, and how we at MIT can be a positive force for increasing equity, access, and opportunity through innovations in digital learning, education, and training.”

In addition to directing RAISE, Breazeal is also looking forward to being more involved with MIT Open Learning’s strategic initiatives, such as the pK-12 Action Group and MIT ReACT.

“I’m a true believer in Open Learning’s mission to transform teaching and learning at MIT and around the globe through the innovative use of digital technologies. In my own work, I’m excited about the possibility of the role of AI and learning science to transform how people of all ages learn with technology in increasingly engaging, creative, and effective ways. I’m excited to play a role in helping to realize Open Learning’s mission in collaboration with the brilliant, committed people at MIT who have so much to offer the world.”


Q&A with Dimitrios Skarlatos, visiting scientist at Facebook and assistant professor at Carnegie Mellon University

In this monthly interview series, we turn the spotlight on members of the academic community and the important research they do — as thought partners, collaborators, and independent contributors.

For October, we nominated Dimitrios Skarlatos, an assistant professor in the computer science department at Carnegie Mellon University (CMU). Before starting his professorship at CMU, Skarlatos decided to spend some time at Facebook as a visiting scientist. He has received several awards for his research, including a Facebook faculty award and the 2021 ACM SIGARCH & IEEE CS TCCA Outstanding Dissertation award for “contributions to redesigning the abstractions and interfaces that connect hardware and operating systems.”

In this Q&A, Skarlatos shares his experience as a visiting scientist at Facebook, his motivation for taking a “prebattical,” the research projects he’s worked on, advice for academics considering a similar path, and more.

Q: Tell us about your academic experience so far. What are your primary research interests?

Dimitrios Skarlatos: I’m an assistant professor at the computer science department at Carnegie Mellon University. My research bridges computer architecture and operating systems focusing on performance, security, and scalability. My current work follows two central themes: (a) uncovering security vulnerabilities and building defenses at the boundary between hardware and OS, and (b) redesigning abstractions and interfaces between the two layers to improve performance and scalability.

Q: What inspired you to spend some time at Facebook before starting your professorship?

DS: The motivation for me was to learn more about large-scale, real-world data center deployments and systems. I believe that this learning experience is invaluable and will help me grow as a researcher. Facebook maintains a flat organization structure that facilitates cross-organizational interdisciplinary research at an unprecedented level. Even during my initial engagement with Facebook, I had the chance to talk with vastly different teams and learn about their research challenges rooted in running massive production systems for almost any technical area that I was interested in, including data centers, kernels, global load balancing, machine learning hardware, serverless computing, and so on. I found this to be unique to Facebook and an incredible opportunity for visiting scientists like me. Access to the best and brightest minds coupled with the ability to solve real-world problems at Facebook facilitates cutting-edge research seamlessly, which I’m particularly excited about.

Q: What research projects did you work on during your time at Facebook? What team(s) did you collaborate with?

DS: My time at Facebook started with working with the RAS team and Andy Newel. RAS is Facebook’s region-wide resource allocator that performs continuous optimizations. RAS introduces a novel capacity abstraction called reservations. Based on this abstraction, it takes a two-level approach to scale resource allocation to all data centers in a region, where a mixed-integer-programming solver continuously optimizes server-to-reservation assignments off the critical path, and a traditional container allocator does real-time placement of containers on servers in a reservation. RAS provides guaranteed capacity while taking into account random and correlated failures, data center maintenance, heterogeneous hardware, and compound workload constraints. You can read more about RAS in our SOSP 2021 paper, “RAS: continuously optimized region-wide datacenter resource allocation.”
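As a rough illustration of that two-level split (and nothing more), the sketch below separates a slow, offline server-to-reservation assignment from a fast, per-container placement step. The function names, data shapes, and the round-robin and first-fit rules are invented; the production system relies on a mixed-integer-programming solver rather than anything this simple.

```python
from collections import defaultdict

def assign_servers_to_reservations(servers, reservations):
    """Offline, off-critical-path step. RAS uses a mixed-integer-programming
    solver here; a round-robin assignment stands in for it in this sketch."""
    assignment = defaultdict(list)
    for i, server in enumerate(servers):
        assignment[reservations[i % len(reservations)]].append(server)
    return assignment

def place_container(container_cpu, reservation_servers, used_cpu):
    """Real-time step: a traditional container allocator picks a server within
    the reservation that still has capacity (first fit here)."""
    for server, capacity in reservation_servers:
        if used_cpu[server] + container_cpu <= capacity:
            used_cpu[server] += container_cpu
            return server
    return None  # no capacity left in this reservation

# Example: two reservations sharing four servers of 32 CPUs each.
servers = [("s1", 32), ("s2", 32), ("s3", 32), ("s4", 32)]
assignment = assign_servers_to_reservations(servers, ["res-a", "res-b"])
used = defaultdict(int)
print(place_container(8, assignment["res-a"], used))  # -> "s1"
```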

Beyond RAS, I further worked on lightweight virtualization solutions based on containers with multiple teams across the kernel and hardware, microservice, and serverless teams at Facebook. It was a very rewarding experience collaborating with dozens of people on a diversity of projects, and a unique learning opportunity.

Q: What’s it like doing research at Facebook?

DS: Great research is always motivated by asking the right questions. What are the problems the next-generation software and hardware systems should solve? How should we maximize our research impact? Facebook is an ideal place to answer these questions and gain valuable experience. There is real opportunity to do awesome work and write strong technical papers for top-tier conferences. This becomes apparent by looking at the strong research publications from multiple teams and projects across Facebook.

Of course, that doesn’t mean there weren’t challenges. One challenge I had to overcome was to stay a little outside of my research comfort zone in order to grow. I believe that the best approach to overcome these challenges is to connect with people with expertise that spans multiple areas and collaborate on solving problems across the stack.

Facebook fosters collaboration and direct communication across different teams and projects. During my visit, I had the chance to connect with people across many teams and organizations from distributed systems, operating systems, hardware, machine learning, security, and others. It’s very exciting to talk with people working on vastly different projects and products. Such conversations could lead to long-term collaborations or further help connect the dots and form an understanding of the bigger picture.

Q: What advice would you give to university researchers looking to become visiting scientists at Facebook?

DS: I believe that the best approach for becoming a visiting scientist is to network: Attend conferences and reach out and connect with people who have visited or worked at Facebook in an area that’s of interest to you. That’s the best way to get a first impression of the experience and maybe get things started. My experience started with knowing Tianyin Xu, who also spent a year at Facebook as a visiting scientist.

As a visiting scientist, one of the most valuable pieces of advice from my Facebook colleagues Kaushik Veeraraghavan and CQ Tang was to not feel constrained by research areas. My main research is on abstractions and interfaces that connect hardware with operating systems. At Facebook, I had the chance to connect with layers even higher up the stack and work on bridging hardware with distributed systems and build abstractions for hardware and container management at a regional scale.

Also, just go for it. Facebook provides an environment to pursue your research interests with leading experts and to do impactful research. It is a unique learning experience. After joining Facebook, it became clear that the opportunities are endless, so my primary advice to visiting scientists is to overcommit — but controllably. Solving real-world problems at Facebook’s scale takes a lot of time and effort, and by taking on multiple projects, I had more opportunities to work on and explore multiple directions with several teams. That helped me to see the bigger picture and connect all the research pieces together.

Q: What do you think will be the next big challenges in systems research that industry and academia could tackle together?

DS: I believe that computing systems are undergoing a radical shift, propelled by stern security requirements and an unprecedented growth in data and users. This change has proved to be abstraction breaking. Current hardware and operating system abstractions were built at a time when we had minimal security threats, scarce compute and memory resources, and limited numbers of users. These assumptions are not representative of today’s computing landscape. In this new era of computing, it is urgent that we rethink the synergy between the OS and hardware layers from scratch. Collaboration between industry and academia is going to be critical in building tomorrow’s systems that power the world’s data centers.




Announcing the Winners of the TensorFlow Lite for Microcontrollers Challenge!

Posted by Pete Warden for the TensorFlow team


In May 2021, we published the TensorFlow Microcontroller Challenge, inviting developers to push the boundaries of TensorFlow Lite for Microcontrollers. Our sincere thanks go out to all those who participated in our competition and contributed to its success! Submissions came from 20 countries across 6 continents.

We’re excited to announce the five winning entries.

  • Mapping Dance by Eduardo Padrón: Take control of lighting and video projections with your dance moves.
  • Move! by Yongjae Kim, Jonghyun Baek, Eunji Lee, Yeonhee Kim, and Jueun Choi: Stay active, using movement to control a variety of games.
  • Snoring Guardian by Naveen Kumar: A snore-no-more device embedded in your pillow.
  • Squat Counter by Manas Pange: Focus on your form, while this tracker counts your squats.
  • Voice Turn by Alvaro Gonzalez-Vila: A safer way for cyclists to signal using their voice.

These projects push boundaries, spark joy, and show off the helpfulness of TensorFlow Lite for Microcontrollers. The teams who created them will each receive a prize and meet with the TensorFlow team.

To view the winning entries, check out the TensorFlow Lite for Microcontrollers collection on Experiments with Google.

Artificial networks learn to smell like the brain

Using machine learning, a computer model can teach itself to smell in just a few minutes. When it does, researchers have found, it builds a neural network that closely mimics the olfactory circuits that animal brains use to process odors.

Animals from fruit flies to humans all use essentially the same strategy to process olfactory information in the brain. But neuroscientists who trained an artificial neural network to take on a simple odor classification task were surprised to see it replicate biology’s strategy so faithfully.

“The algorithm we use has no resemblance to the actual process of evolution,” says Guangyu Robert Yang, an associate investigator at MIT’s McGovern Institute for Brain Research, who led the work as a postdoc at Columbia University. The similarities between the artificial and biological systems suggest that the brain’s olfactory network is optimally suited to its task.

Yang and his collaborators, who reported their findings Oct. 6 in the journal Neuron, say their artificial network will help researchers learn more about the brain’s olfactory circuits. The work also helps demonstrate artificial neural networks’ relevance to neuroscience. “By showing that we can match the architecture [of the biological system] very precisely, I think that gives more confidence that these neural networks can continue to be useful tools for modeling the brain,” says Yang, who is also an assistant professor in MIT’s departments of Brain and Cognitive Sciences and Electrical Engineering and Computer Science and a member of the Center for Brains, Minds and Machines.

Mapping natural olfactory circuits

For fruit flies, the organism in which the brain’s olfactory circuitry has been best mapped, smell begins in the antennae. Sensory neurons there, each equipped with odor receptors specialized to detect specific scents, transform the binding of odor molecules into electrical activity. When an odor is detected, these neurons, which make up the first layer of the olfactory network, signal to the second layer: a set of neurons that reside in a part of the brain called the antennal lobe. In the antennal lobe, sensory neurons that share the same receptor converge onto the same second-layer neuron. “They’re very choosy,” Yang says. “They don’t receive any input from neurons expressing other receptors.” Because it has fewer neurons than the first layer, this part of the network is considered a compression layer. These second-layer neurons, in turn, signal to a larger set of neurons in the third layer. Puzzlingly, those connections appear to be random.

For Yang, a computational neuroscientist, and Columbia University graduate student Peter Yiliu Wang, this knowledge of the fly’s olfactory system represented a unique opportunity. Few parts of the brain have been mapped as comprehensively, and that has made it difficult to evaluate how well certain computational models represent the true architecture of neural circuits, they say.

Building an artificial smell network

Neural networks, in which artificial neurons rewire themselves to perform specific tasks, are computational tools inspired by the brain. They can be trained to pick out patterns within complex datasets, making them valuable for speech and image recognition and other forms of artificial intelligence. There are hints that the neural networks that do this best replicate the activity of the nervous system. But, says Wang, who is now a postdoc at Stanford University, differently structured networks could generate similar results, and neuroscientists still need to know whether artificial neural networks reflect the actual structure of biological circuits. With comprehensive anatomical data about fruit fly olfactory circuits, he says, “We’re able to ask this question: Can artificial neural networks truly be used to study the brain?”

Collaborating closely with Columbia neuroscientists Richard Axel and Larry Abbott, Yang and Wang constructed a network of artificial neurons comprising an input layer, a compression layer, and an expansion layer — just like the fruit fly olfactory system. They gave it the same number of neurons as the fruit fly system, but no inherent structure: connections between neurons would be rewired as the model learned to classify odors.
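The overall shape of such a network is easy to write down. The sketch below is a generic three-layer model, not the authors' implementation: the layer sizes are rough approximations of the fly's anatomy (on the order of 50 receptor types, a similarly sized compression layer, and an expansion layer of a few thousand neurons), and the odor data and labels are random placeholders rather than the odor-classification task actually used.

```python
import torch
from torch import nn

# Approximate, fly-inspired layer sizes; not the paper's exact configuration.
N_RECEPTORS, N_COMPRESSION, N_EXPANSION, N_ODOR_CLASSES = 50, 50, 2500, 100

model = nn.Sequential(
    nn.Linear(N_RECEPTORS, N_COMPRESSION),   # input -> compression layer
    nn.ReLU(),
    nn.Linear(N_COMPRESSION, N_EXPANSION),   # compression -> expansion layer
    nn.ReLU(),
    nn.Linear(N_EXPANSION, N_ODOR_CLASSES),  # readout needed for the classification task
)

# Unstructured training on a made-up odor-classification task; the connectivity
# is free to rewire itself, which is where fly-like structure could emerge.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
odors = torch.rand(256, N_RECEPTORS)               # fake receptor activations
labels = torch.randint(0, N_ODOR_CLASSES, (256,))  # fake odor categories
for _ in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(odors), labels)
    loss.backward()
    optimizer.step()
```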

The scientists asked the network to assign data representing different odors to categories, and to correctly categorize not just single odors, but also mixtures of odors. This is something that the brain’s olfactory system is uniquely good at, Yang says. If you combine the scents of two different apples, he explains, the brain still smells apple. In contrast, if two photographs of cats are blended pixel by pixel, the brain no longer sees a cat. This ability is just one feature of the brain’s odor-processing circuits, but captures the essence of the system, Yang says.

It took the artificial network only minutes to organize itself. The structure that emerged was stunningly similar to that found in the fruit fly brain. Each neuron in the compression layer received inputs from a particular type of input neuron and connected, seemingly randomly, to multiple neurons in the expansion layer. What’s more, each neuron in the expansion layer received connections, on average, from six compression-layer neurons — exactly as occurs in the fruit fly brain.

“It could have been one, it could have been 50. It could have been anywhere in between,” Yang says. “Biology finds six, and our network finds about six as well.” Evolution found this organization through random mutation and natural selection; the artificial network found it through standard machine learning algorithms.
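One way to put a number like "about six" on a trained weight matrix is to count, for each expansion-layer neuron, how many incoming weights are large enough to matter. The thresholding rule and the sparse random matrix below are illustrative choices, not the analysis used in the paper.

```python
import torch

def mean_indegree(weights, threshold_frac=0.1):
    """For each expansion-layer neuron (row), count the compression-layer
    neurons (columns) whose weight magnitude exceeds a fraction of that row's
    largest weight, then average across neurons."""
    w = weights.abs()
    row_max = w.max(dim=1, keepdim=True).values
    return (w > threshold_frac * row_max).sum(dim=1).float().mean().item()

# Sparse random stand-in for a trained compression->expansion weight matrix,
# with roughly six nonzero inputs per expansion neuron.
torch.manual_seed(0)
weights = torch.randn(2500, 50) * (torch.rand(2500, 50) < 6 / 50)
print(f"mean effective in-degree: {mean_indegree(weights):.1f}")
```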

The surprising convergence provides strong support that the brain circuits that interpret olfactory information are optimally organized for their task, he says. Now, researchers can use the model to further explore that structure, exploring how the network evolves under different conditions and manipulating the circuitry in ways that cannot be done experimentally.


How Underspecification Presents Challenges for Machine Learning

Posted by Alex D’Amour and Katherine Heller, Research Scientists, Google Research

Machine learning (ML) models are being used more widely today than ever before and are becoming increasingly impactful. However, they often exhibit unexpected behavior when they are used in real-world domains. For example, computer vision models can exhibit surprising sensitivity to irrelevant features, while natural language processing models can depend unpredictably on demographic correlations not directly indicated by the text. Some reasons for these failures are well-known: for example, training ML models on poorly curated data, or training models to solve prediction problems that are structurally mismatched with the application domain. Yet, even when these known problems are handled, model behavior can still be inconsistent in deployment, varying even between training runs.

In “Underspecification Presents Challenges for Credibility in Modern Machine Learning”, to be published in the Journal of Machine Learning Research, we show that a key failure mode especially prevalent in modern ML systems is underspecification. The idea behind underspecification is that while ML models are validated on held-out data, this validation is often insufficient to guarantee that the models will have well-defined behavior when they are used in a new setting. We show that underspecification appears in a wide variety of practical ML systems and suggest some strategies for mitigation.

Underspecification
ML systems have been successful largely because they incorporate validation of the model on held-out data to ensure high performance. However, for a fixed dataset and model architecture, there are often many distinct ways that a trained model can achieve high validation performance. But under standard practice, models that encode distinct solutions are often treated as equivalent because their held-out predictive performance is approximately equivalent.

Importantly, the distinctions between these models do become clear when they are measured on criteria beyond standard predictive performance, such as fairness or robustness to irrelevant input perturbations. For example, among models that perform equally well on standard validations, some may exhibit greater performance disparities between social groups than others, or rely more heavily on irrelevant information. These differences, in turn, can translate to real differences in behavior when the model is used in real-world scenarios.

Underspecification refers to this gap between the requirements that practitioners often have in mind when they build an ML model, and the requirements that are actually enforced by the ML pipeline (i.e., the design and implementation of a model). An important consequence of underspecification is that even if the pipeline could in principle return a model that meets all of these requirements, there is no guarantee that in practice the model will satisfy any requirement beyond accurate prediction on held-out data. In fact, the model that is returned may have properties that instead depend on arbitrary or opaque choices made in the implementation of the ML pipeline, such as those arising from random initialization seeds, data ordering, hardware, etc. Thus, ML pipelines that do not include explicit defects may still return models that behave unexpectedly in real-world settings.

Identifying Underspecification in Real Applications
In this work, we investigated concrete implications of underspecification in the kinds of ML models that are used in real-world applications. Our empirical strategy was to construct sets of models using nearly identical ML pipelines, to which we only applied small changes that had no practical effect on standard validation performance. Here, we focused on the random seed used to initialize training and determine data ordering. If important properties of the model can be substantially influenced by these changes, it indicates that the pipeline does not fully specify this real-world behavior. In every domain where we conducted this experiment, we found that these small changes induced substantial variation on axes that matter in real-world use.
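The essence of that strategy fits in a few lines. The toy task, model, and stress-test shift below are invented for illustration, and this tiny example is not guaranteed to reproduce the paper's findings; it only shows the protocol of varying the seed, checking that held-out accuracy stays roughly constant, and then comparing the same models under a distribution shift.

```python
import torch
from torch import nn

def make_data(n=2000, seed=0):
    g = torch.Generator().manual_seed(seed)
    x = torch.randn(n, 20, generator=g)
    y = (x[:, :2].sum(dim=1) > 0).long()  # label depends on only two features
    return x, y

def train_model(seed, x, y, epochs=200):
    torch.manual_seed(seed)  # the only thing varied between pipeline runs
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(epochs):
        opt.zero_grad()
        nn.functional.cross_entropy(model(x), y).backward()
        opt.step()
    return model

def accuracy(model, x, y):
    return (model(x).argmax(dim=1) == y).float().mean().item()

x_train, y_train = make_data(seed=0)
x_test, y_test = make_data(seed=1)
x_shifted = x_test + 0.8 * torch.randn_like(x_test)  # toy stand-in for a stress test

for seed in range(5):
    m = train_model(seed, x_train, y_train)
    print(f"seed {seed}: held-out acc {accuracy(m, x_test, y_test):.3f}, "
          f"shifted acc {accuracy(m, x_shifted, y_test):.3f}")
```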

Underspecification in Computer Vision
As an example, consider underspecification and its relationship to robustness in computer vision. A central challenge in computer vision is that deep models often suffer from brittleness under distribution shifts that humans do not find challenging. For instance, image classification models that perform well on the ImageNet benchmark are known to perform poorly on benchmarks like ImageNet-C, which apply common image corruptions, such as pixelization or motion blur, to the standard ImageNet test set.

In our experiment, we showed that model sensitivity to these corruptions is underspecified by standard pipelines. Following the strategy discussed above, we generated fifty ResNet-50 image classification models using the same pipeline and the same data. The only difference between these models was the random seed used in training. When evaluated on the standard ImageNet validation set, these models achieved practically equivalent performance. However, when the models were evaluated on different test sets in the ImageNet-C benchmark (i.e., on corrupted data), performance on some tests varied by orders of magnitude more than on standard validations. This pattern persisted for larger-scale models that were pre-trained on much larger datasets (e.g., a BiT-L model pre-trained on the 300 million image JFT-300M dataset). For these models, varying the random seed at the fine-tuning stage of training produced a similar pattern of variations.

Left: Parallel axis plots showing the variation in accuracy between identical, randomly initialized ResNet-50 models on strongly corrupted ImageNet-C data. Lines represent the performance of each model in the ensemble on classification tasks using uncorrupted test data, as well as corrupted data (pixelation, contrast, motion blur, and brightness). Given values are the deviation in accuracy from the ensemble mean, scaled by the standard deviation of accuracies on the “clean” ImageNet test set. The solid black line highlights the performance of an arbitrarily selected model to show how performance on one test may not be a good indication of performance on others. Right: Example images from the standard ImageNet test set, with corrupted versions from the ImageNet-C benchmark.

We also showed that underspecification can have practical implications in special-purpose computer vision models built for medical imaging, where deep learning models have shown great promise. We considered two research pipelines intended as precursors for medical applications: one ophthalmology pipeline for building models that detect diabetic retinopathy and referable diabetic macular edema from retinal fundus images, and one dermatology pipeline for building models to recognize common dermatological conditions from photographs of skin. In our experiments, we considered pipelines that were validated only on randomly held-out data.

We then stress-tested models produced by these pipelines on practically important dimensions. For the ophthalmology pipeline, we tested how models trained with different random seeds performed when applied to images taken from a new camera type not encountered during training. For the dermatology pipeline, the stress test was similar, but for patients with different estimated skin types (i.e., non-dermatologist evaluation of tone and response to sunlight). In both cases, we found that standard validations were not enough to fully specify the trained model’s performance on these axes. In the ophthalmology application, the random seed used in training induced wider variability in performance on a new camera type than would have been expected from standard validations, and in the dermatology application, the random seed induced similar variation in performance in skin-type subgroups, even though the overall performance of the models was stable across seeds.

These results reiterate that standard hold-out testing alone is not sufficient to ensure acceptable model behavior in medical applications, underscoring the need for expanded testing protocols for ML systems intended for application in the medical domain. In the medical literature, such validations are termed “external validation” and have historically been part of reporting guidelines such as STARD and TRIPOD. These are being emphasized in updates such as STARD-AI and TRIPOD-AI. Finally, as part of regulated medical device development processes (see, e.g., US and EU regulations), there are other forms of safety and performance related considerations, such as mandatory compliance to standards for risk management, human factors engineering, clinical validations and accredited body reviews, that aim to ensure acceptable medical application performance.

Relative variability of medical imaging models on stress tests, using the same conventions as the figure above. Top left: Variation in AUC between diabetic retinopathy classification models trained using different random seeds when evaluated on images from different camera types. In this experiment, camera type 5 was not encountered during training. Bottom left: Variation in accuracy between skin condition classification models trained using different random seeds when evaluated on different estimated skin types (approximated by dermatologist-trained laypersons from retrospective photographs and potentially subject to labeling errors). Right: example images from the original test set (left) and the stress test set (right).

Underspecification in Other Applications

The cases discussed above are a small subset of the models we probed for underspecification. Other cases we examined include:

  • Natural Language Processing: We showed that on a variety of NLP tasks, underspecification affected how models derived from BERT processed sentences. For example, depending on the random seed, a pipeline could produce a model that depends more or less on correlations involving gender (e.g., between gender and occupation) when making predictions.
  • Acute Kidney Injury (AKI) prediction: We showed that underspecification affects reliance on operational versus physiological signals in AKI prediction models based on electronic health records.
  • Polygenic Risk Scores (PRS): We showed that underspecification influences the ability of PRS models, which predict clinical outcomes based on patient genomic data, to generalize across different patient populations.

In each case, we showed that these important properties are left ill-defined by standard training pipelines, making them sensitive to seemingly innocuous choices.

Conclusion
Addressing underspecification is a challenging problem. It requires full specification and testing of requirements for a model beyond standard predictive performance. Doing this well needs full engagement with the context in which the model will be used, an understanding of how the training data were collected, and often, incorporation of domain expertise when the available data fall short. These aspects of ML system design are often underemphasized in ML research today. A key goal of this work is to show how underinvestment in this area can manifest concretely, and to encourage the development of processes for fuller specification and testing of ML pipelines.

Some important first steps in this area are to specify stress testing protocols for any applied ML pipeline that is meant to see real-world use. Once these criteria are codified in measurable metrics, a number of different algorithmic strategies may be useful for improving them, including data augmentation, pretraining, and incorporation of causal structure. It should be noted, however, that ideal stress testing and improvement processes will usually require iteration: both the requirements for ML systems, and the world in which they are used, are constantly changing.

Acknowledgements
We would like to thank all of our co-authors, Dr. Nenad Tomasev (DeepMind), Prof. Finale Doshi-Velez (Harvard SEAS), UK Biobank, and our partners, EyePACS, Aravind Eye Hospital and Sankara Nethralaya.


Registration now open for the 2021 Testing and Verification Symposium

Our annual Testing and Verification (TAV) Symposium brings together academia and industry in an open environment to exchange ideas and showcase the top experts from testing and verification scientific research and practice.

This year, the fifth annual TAV Symposium will be held virtually from Wednesday, December 1, through Thursday, December 2, 2021. The event is open to all testing and verification practitioners and researchers and is free to attend. The symposium’s agenda will include 10 talks that will offer opportunities for Q&A via the event platform.

Those interested in attending may submit their registration request on the event’s registration page.


“My team, Sapienz, is collaborating with previous speakers on the ideas discussed at last year’s TAV symposium,” says Facebook Software Engineer Yue Jia. “The symposium is a great venue to share the challenges we are facing and to stimulate the technology transfer of research results to software testing and verification practices.”

Jia continues: “Beyond the great presentations, I enjoyed the discussions among the various testing and verification communities from last year’s TAV Symposium. The symposium offers an open platform to bridge research and practice in software testing, verification, and validation.”

“The TAV Symposium is a unique venue in the conference landscape where testing and verification, industry and academia, and theory and practice all meet as one community realizing their shared goal: raising everyone’s confidence in software,” says Software Engineer Jules Villard. “I cannot wait to meet this year’s TAV community and discover new collaboration opportunities.”

Below is the list of confirmed speakers, which can also be found on the registration page. Leading up to the event, as additional speakers are confirmed, they will be added to the registration site.

Confirmed speakers

  • Viktor Malík (Brno University of Technology)
  • Ke Mao (WhatsApp)
  • Azalea Raad (Imperial College London)
  • Tao Xie (Peking University)
  • Chunyang Chen (Monash University)
  • Isabel Min Li (Imperial College London)
  • Kinga Bojarczuk (Facebook)
  • Sébastien Bardin (Commissariat à l’Énergie Atomique)
  • Raphaël Monat (Sorbonne Université)
  • Noam Zilberstein (Cornell University)

For more information about speakers, including full bios and topics, visit the registration page.

