Robots dress humans without the full picture

Robots are already adept at certain things, such as lifting objects that are too heavy or cumbersome for people to manage. Another application they’re well suited for is the precision assembly of items like watches that have large numbers of tiny parts — some so small they can barely be seen with the naked eye.

“Much harder are tasks that require situational awareness, involving almost instantaneous adaptations to changing circumstances in the environment,” explains Theodoros Stouraitis, a visiting scientist in the Interactive Robotics Group at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL).

“Things become even more complicated when a robot has to interact with a human and work together to safely and successfully complete a task,” adds Shen Li, a PhD candidate in the MIT Department of Aeronautics and Astronautics.

Li and Stouraitis — along with Michael Gienger of the Honda Research Institute Europe, Professor Sethu Vijayakumar of the University of Edinburgh, and Professor Julie A. Shah of MIT, who directs the Interactive Robotics Group — have selected a problem that offers, quite literally, an armful of challenges: designing a robot that can help people get dressed. Last year, Li, Shah, and two other MIT researchers completed a project involving robot-assisted dressing without sleeves. In a new work, described in a paper that appears in an April 2022 issue of IEEE Robotics and Automation Letters, Li, Stouraitis, Gienger, Vijayakumar, and Shah explain the headway they’ve made on a more demanding problem — robot-assisted dressing with sleeved clothes.

The big difference in the latter case is due to “visual occlusion,” Li says. “The robot cannot see the human arm during the entire dressing process.” In particular, it cannot always see the elbow or determine its precise position or bearing. That, in turn, affects the amount of force the robot has to apply to pull the article of clothing — such as a long-sleeve shirt — from the hand to the shoulder.

To deal with the issue of obstructed vision, the team has developed a “state estimation algorithm” that allows them to make reasonably precise educated guesses as to where, at any given moment, the elbow is and how the arm is inclined — whether it is extended straight out or bent at the elbow, pointing upwards, downwards, or sideways — even when it’s completely obscured by clothing. At each instant, the algorithm takes the robot’s measurement of the force applied to the cloth as input and then estimates the elbow’s position — not exactly, but placing it within a box or volume that encompasses all possible positions.

That knowledge, in turn, tells the robot how to move, Stouraitis says. “If the arm is straight, then the robot will follow a straight line; if the arm is bent, the robot will have to curve around the elbow.” Getting a reliable picture is important, he adds. “If the elbow estimation is wrong, the robot could decide on a motion that would create an excessive, and unsafe, force.” 

The algorithm includes a dynamic model that predicts how the arm will move in the future, and each prediction is corrected by a measurement of the force that’s being exerted on the cloth at a particular time. While other researchers have made state estimation predictions of this sort, what distinguishes this new work is that the MIT investigators and their partners can set a clear upper limit on the uncertainty and guarantee that the elbow will be somewhere within a prescribed box.   
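In code, such a bounded predict-and-correct loop might look something like the sketch below. It is purely illustrative — the arm model, the force-to-position mapping, and the numbers are placeholders, not the method from the paper — but it shows how a box of possible elbow positions can be grown by a motion model and then shrunk by each force measurement.

```python
import numpy as np

# Hypothetical sketch of a set-membership (box-bounded) state estimator,
# loosely following the predict/correct loop described above.
# None of the models below come from the paper; they are placeholders.

def predict(box_lo, box_hi, velocity_bound, dt):
    """Propagate the elbow box with a simple bounded-velocity arm model."""
    return box_lo - velocity_bound * dt, box_hi + velocity_bound * dt

def correct(box_lo, box_hi, force_reading, force_to_position):
    """Shrink the box to positions consistent with the measured cloth force."""
    meas_lo, meas_hi = force_to_position(force_reading)   # interval implied by force
    return np.maximum(box_lo, meas_lo), np.minimum(box_hi, meas_hi)

def toy_force_to_position(f):
    # Toy measurement model: a higher pulling force implies the elbow lies
    # farther along the sleeve axis (x), within +/- 5 cm.
    center = np.array([0.1 * f, 0.3, 1.0])
    return center - 0.05, center + 0.05

box_lo = np.array([0.0, 0.0, 0.8])     # initial bounds on elbow position (m)
box_hi = np.array([0.6, 0.6, 1.2])

for force in [2.0, 2.4, 3.1]:          # simulated force readings (N)
    box_lo, box_hi = predict(box_lo, box_hi, velocity_bound=0.2, dt=0.1)
    box_lo, box_hi = correct(box_lo, box_hi, force, toy_force_to_position)
    print("elbow somewhere in", np.round(box_lo, 2), "to", np.round(box_hi, 2))
```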

The model for predicting arm movements and elbow position and the model for measuring the force applied by the robot both incorporate machine learning techniques. The data used to train the machine learning systems were obtained from people wearing “Xsens” suits with built-in sensors that accurately track and record body movements. After the robot was trained, it was able to infer the elbow pose when putting a jacket on a human subject, a man who moved his arm in various ways during the procedure — sometimes in response to the robot’s tugging on the jacket and sometimes engaging in random motions of his own accord.

This work was strictly focused on estimation — determining the location of the elbow and the arm pose as accurately as possible — but Shah’s team has already moved on to the next phase: developing a robot that can continually adjust its movements in response to shifts in the arm and elbow orientation. 

In the future, they plan to address the issue of “personalization” — developing a robot that can account for the idiosyncratic ways in which different people move. In a similar vein, they envision robots versatile enough to work with a diverse range of cloth materials, each of which may respond somewhat differently to pulling.

Although the researchers in this group are definitely interested in robot-assisted dressing, they recognize the technology’s potential for far broader utility. “We didn’t specialize this algorithm in any way to make it work only for robot dressing,” Li notes. “Our algorithm solves the general state estimation problem and could therefore lend itself to many possible applications. The key to it all is having the ability to guess, or anticipate, the unobservable state.” Such an algorithm could, for instance, guide a robot to recognize the intentions of its human partner as it works collaboratively to move blocks around in an orderly manner or set a dinner table. 

Here’s a conceivable scenario for the not-too-distant future: A robot could set the table for dinner and maybe even clear up the blocks your child left on the dining room floor, stacking them neatly in the corner of the room. It could then help you get your dinner jacket on to make yourself more presentable before the meal. It might even carry the platters to the table and serve appropriate portions to the diners. One thing the robot would not do would be to eat up all the food before you and others make it to the table.  Fortunately, that’s one “app” — as in application rather than appetite — that is not on the drawing board.

This research was supported by the U.S. Office of Naval Research, the Alan Turing Institute, and the Honda Research Institute Europe.

School of Engineering welcomes Thomas Tull as visiting innovation scholar

Thomas Tull, leading visionary entrepreneur and investor, has been appointed a School of Engineering visiting innovation scholar, effective April 1.

Throughout his career, Tull has leveraged the power of technology, artificial intelligence, and data science to disrupt and revolutionize disparate industries. Today, as the founder, chair, and CEO of Tulco LLC, a privately held holding company, he looks to partner with companies employing cutting-edge ideas in industries that are established but often underfunded and under-innovated. Under Tull’s leadership Tulco has deployed proprietary technology, including new methods in data creation and deep learning, to help companies bring their ideas to fruition and facilitate industry-leading change.

Tull’s hands-on approach involves not only data science and analytical tools, but also close partnership with business leaders. Alongside Tull’s success in infusing transformational technology into business practices has come a focus on its societal impact and human interface.

As part of his role in the School of Engineering, Tull will focus on how cutting-edge programs centered around AI, quantum computing, and semiconductors might be leveraged for the greater good, while likewise helping to advance the role of humanities in developing emerging technologies and leaders. Tull will also engage with students, faculty, and staff through a variety of activities including seminars and speaking engagements, and will serve as a strategic advisor to the dean on various initiatives.

“Thomas is an incredible advocate and ambassador for innovation and technology,” says Anantha Chandrakasan, dean of the MIT School of Engineering and Vannevar Bush Professor for Electrical Engineering and Computer Science. “His commitment to these areas and impact on so many industries have been impressive, and we’re thrilled that he will join us to foster innovation across the school.”

Prior to starting Tulco, Tull was the founder and CEO of the film company Legendary Entertainment, which he started in 2004, producing a number of blockbuster films including “The Dark Knight” trilogy, “300,” “The Hangover” franchise, and many others. At Legendary, Tull deployed sophisticated AI, machine learning, and data analytics to increase the commercial success of its films, forever changing how movies are marketed.

“Technological advancement is essential to our future and MIT is one of the leaders committed to exploring new frontiers and the latest technologies to enable the next generation to continue to create cutting-edge innovation,” says Tull. “I have always greatly admired MIT’s and the School of Engineering’s work on this front and it is an honor to be invited to contribute to this amazing institution. I look forward to working with the school over the next year.”

Tull is also an active supporter of philanthropic causes that support education, medical and scientific research, and conservation through the Tull Family Foundation. He is a member of the MIT School of Engineering Dean’s Advisory Council, and a trustee of Carnegie Mellon University, Yellowstone Forever, the National Baseball Hall of Fame and Museum, and the Smithsonian Institution. Tull is also part of the ownership group of the Pittsburgh Steelers and owns a farm in Pittsburgh where he has implemented the use of robotics, drones, analytics, and other advanced technologies to boost yields of high-quality natural foods.

Tull received his undergraduate degree from Hamilton College and resides in Pittsburgh, Pennsylvania.

Dan Huttenlocher ponders our human future in an age of artificial intelligence

What does it mean to be human in an age where artificial intelligence agents make decisions that shape human actions? That’s a deep question with no easy answers, and it’s been on the mind of Dan Huttenlocher SM ’84, PhD ’88, dean of the MIT Schwarzman College of Computing, for the past few years.

“Advances in AI are going to happen, but the destination that we get to with those advances is up to us, and it is far from certain,” says Huttenlocher, who is also the Henry Ellis Warren Professor in the Department of Electrical Engineering and Computer Science.

Along with former Google CEO Eric Schmidt and elder statesman Henry Kissinger, Huttenlocher recently explored some of the quandaries posed by the rise of AI in the book “The Age of AI: And Our Human Future.” Speaking for himself and his co-authors, he says, “Our belief is that, to get there, we need much more informed dialogue and much more multilateral dialogue. Our hope is that the book will get people interested in doing that from a broad range of places.”

Now, with nearly two and a half years as the college dean, Huttenlocher doesn’t just talk the talk when it comes to interdisciplinarity. He is leading the college as it incorporates computer science into all fields of study at MIT while teaching students to use formidable tools like artificial intelligence ethically and responsibly.

That mission is being accomplished, in part, through two campus-wide initiatives that Huttenlocher is especially excited about: the Common Ground for Computing Education and the Social and Ethical Responsibilities of Computing (SERC). The Common Ground supports the development of cross-disciplinary courses that integrate computing into other fields of study, while the SERC initiative provides tools that help researchers, educators, and students conceptualize issues about the impacts of computing early in the research process. SERC is also complemented by numerous research and scholarly activities, such as AI for Health Care Equity and the Research Initiative for Combatting Systemic Racism.

“When I was a grad student, you worked on computer vision assuming that it was going to be a research problem for the rest of your lifetime,” he says. “Now, research problems have practical applications almost overnight in computing-related disciplines. The social impacts and ethical implications around computing are things that need to be considered from the very beginning, not after the fact.”

Budding interest in a nascent field

A deep thinker from an early age, Huttenlocher began pondering questions at the intersection of human intelligence and computing when he was a teenager.

With a mind for math, the Chicago native learned how to program before he entered high school, which was a rare thing in the 1970s. His parents, both academics who studied aspects of the human mind, influenced the path he would follow. His father was a neurologist at the University of Chicago Medical School who studied brain development, while his mother was a professor of cognitive psychology at the same institution.

Huttenlocher pursued a joint major in computer science and cognitive psychology as an undergraduate at the University of Michigan in an effort to bring those two disciplines together. When it came time to apply to graduate school, he found the perfect fit for his dual interests in the nascent field of AI, and enrolled at MIT.

While earning his master’s degree and PhD (in 1984 and 1988, respectively), he researched speech recognition, object recognition, and computer vision. He became fascinated by how machines can directly perceive the world around them. Huttenlocher was also drawn in by the entrepreneurial activity that was then ramping up around Cambridge. He spent summers interning at Silicon Valley startups and small tech companies in the Boston area, which piqued his interest in industry.

“I grew up in an academic household and had a healthy skepticism of following in my parents’ footsteps. So when I graduated, I wasn’t quite sure if I wanted an academic path or not. And to be honest, I’ve been a little bit ambivalent about it ever since. For better or worse, I’ve often ended up doing both at the same time,” he says.

Big problems, direct impact

Huttenlocher joined the computer science faculty at Cornell University in 1988 and also took a position at the Xerox Palo Alto Research Center (PARC), where he had interned as a graduate student. He taught computer science courses and worked on academic research projects when Cornell was in session, and spent his summers at Xerox PARC, as well as one day a week consulting remotely. (Long before Zoom, remote connectivity was “still pretty sketchy” in those days, he says.)

“I’ve long wanted to pair the deeper, bigger problems that we tend to try to make progress on in academia with a more direct and immediate impact on people, so spending time at Xerox PARC and at Cornell was a good way to do that,” he says.

Early in his research career, Huttenlocher took a more algorithmic approach to solving computer vision problems, rather than taking the generic optimization approaches that were more common at the time. Some of the techniques he and his collaborators developed, such as using a graph-based representation of an image, are still being used more than 20 years later.

Later, he and his colleagues conducted some of the first studies on how communities come together on social networks. In those pre-Facebook days, they studied LiveJournal, a social networking site that was popular in the early 2000s. Their work revealed that a person’s tendency to join an online community is not only influenced by the number of friends they have in that community, but also by how those friends are connected to one another.

In addition to research, Huttenlocher was passionate about bridging gaps between disciplines. He was named dean of the interdisciplinary Faculty of Computing and Information Science at Cornell in 2009. Three years later, he took his bridge-building skills to New York City when he became the founding dean of Cornell Tech, a new graduate school being established on Roosevelt Island.

That role was a tremendous challenge but also an extraordinary opportunity to create a campus that combined academia in computing-related disciplines with the growing tech community in New York, he says.

In a way, the role prepared him well to be founding dean of the MIT Schwarzman College of Computing, whose launch represented the most significant structural change to the Institute since the early 1950s.

“I think this place is very special. MIT has its own culture. It is a distinctive place in the positive sense of the word ‘distinctive.’ People are insanely curious here and very collaborative when it comes to solving problems. Just the opportunity to help build something new at MIT, something that will be important for the Institute but also for the country and the world, is amazing,” he says.

Making connections

While Huttenlocher was overseeing the creation of Cornell Tech, he was also forging connections around New York City. Before the Roosevelt Island campus was built, the school rented space in Google’s Eighth Avenue building, which is how he met then-Google CEO Eric Schmidt. The two enjoyed talking about (and sometimes arguing about) the promises and perils of artificial intelligence. At the same time, Schmidt was discussing AI with Henry Kissinger, whom he had befriended at a conference. By happenstance, the three got together and started talking about AI, which led to an article in The Atlantic and, eventually, the book.

“What we realized when we started talking about these questions is that the broader historical and philosophical context for an AI age is not something that has been looked at very much. When people are looking at social and ethical issues around computing, it is usually focused on the current problem, which is vital, but we think this broader framing is also important,” he says.

And when it comes to questions about AI, Huttenlocher feels a sense of urgency.

Advancements are happening so rapidly that there is immense pressure for educational institutions to keep up. Academic courses need to have computing woven through them as part of their intellectual fabric, especially as AI continues to become a larger part of everyday life, he says. This underscores the important work the college is doing, and the challenge it faces moving forward.

For Huttenlocher, who has found himself working in the center of a veritable Venn diagram of disciplines since his days as an undergraduate, it is a challenge he has fully embraced.

“It should not just be computer scientists or engineers looking at these problems. But it should not just be social scientists or humanists looking at them either,” he says. “We really need to bring different groups together.”

Generating new molecules with graph grammar

Chemical engineers and materials scientists are constantly looking for the next revolutionary material, chemical, and drug. The rise of machine-learning approaches is expediting the discovery process, which could otherwise take years. “Ideally, the goal is to train a machine-learning model on a few existing chemical samples and then allow it to produce as many manufacturable molecules of the same class as possible, with predictable physical properties,” says Wojciech Matusik, professor of electrical engineering and computer science at MIT. “If you have all these components, you can build new molecules with optimal properties, and you also know how to synthesize them. That’s the overall vision that people in that space want to achieve.”

However, current techniques, mainly deep learning, require extensive datasets for training models, and many class-specific chemical datasets contain only a handful of example compounds, limiting the models’ ability to generalize and generate physical molecules that could be created in the real world.

Now, a new paper from researchers at MIT and IBM tackles this problem using a generative graph model to build new synthesizable molecules within the same chemical class as their training data. To do this, they treat the formation of atoms and chemical bonds as a graph and develop a graph grammar — a linguistic analogy to the systems and structures that govern word order — that contains a sequence of rules for building molecules, such as monomers and polymers. Using the grammar and production rules that were inferred from the training set, the model can not only reverse engineer its examples, but can create new compounds in a systematic and data-efficient way. “We basically built a language for creating molecules,” says Matusik. “This grammar essentially is the generative model.”

Matusik’s co-authors include MIT graduate students Minghao Guo, who is the lead author, and Beichen Li, as well as Veronika Thost, Payel Das, and Jie Chen, research staff members with IBM Research. Matusik, Thost, and Chen are affiliated with the MIT-IBM Watson AI Lab. Their method, which they’ve called data-efficient graph grammar (DEG), will be presented at the International Conference on Learning Representations.

“We want to use this grammar representation for monomer and polymer generation, because this grammar is explainable and expressive,” says Guo. “With only a small number of production rules, we can generate many kinds of structures.”

A molecular structure can be thought of as a symbolic representation in a graph — a string of atoms (nodes) joined together by chemical bonds (edges). In this method, the researchers allow the model to take the chemical structure and collapse a substructure of the molecule down to one node; this may be two atoms connected by a bond, a short sequence of bonded atoms, or a ring of atoms. This is done repeatedly, creating the production rules as it goes, until a single node remains. The rules and grammar then could be applied in the reverse order to recreate the training set from scratch or combined in different combinations to produce new molecules of the same chemical class.
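The collapse-and-record idea can be illustrated with a toy sketch. The graph, the merge step, and the bookkeeping below are hypothetical stand-ins (real DEG operates on molecular graphs with chemistry-aware substructures), but they show how repeatedly merging bonded atoms yields production rules that can later be replayed, in reverse order or in new combinations, to build structures.

```python
# Hypothetical toy illustration of the collapse-and-record idea behind a graph
# grammar: repeatedly merge a bonded pair of atoms into one node, remember the
# merge as a production rule, then replay the rules to rebuild or recombine
# structures. This sketch only shows the bookkeeping, not DEG itself.

graph = {"C1": {"C2"}, "C2": {"C1", "O1"}, "O1": {"C2"}}   # a tiny molecule graph
rules = []                                                  # recorded production rules

def collapse(g, a, b):
    """Merge nodes a and b into one node and record the rule."""
    merged = f"({a}+{b})"
    neighbors = (g.pop(a) | g.pop(b)) - {a, b}
    for n in neighbors:
        g[n] = (g[n] - {a, b}) | {merged}
    g[merged] = neighbors
    rules.append((merged, a, b))
    return merged

collapse(graph, "C1", "C2")        # the C1-C2 bond becomes one node
collapse(graph, "(C1+C2)", "O1")   # then absorb O1; a single node remains

# Replaying the rules last-to-first expands the single node back into the
# original molecule; applying them in other orders or to other seeds is how
# new, same-class structures could be generated.
for merged, a, b in reversed(rules):
    print(f"expand {merged} -> {a} bonded to {b}")
```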

“Existing graph generation methods would produce one node or one edge sequentially at a time, but we are looking at higher-level structures and, specifically, exploiting chemistry knowledge, so that we don’t treat the individual atoms and bonds as the unit. This simplifies the generation process and also makes it more data-efficient to learn,” says Chen.

Further, the researchers optimized the technique so that the bottom-up grammar was relatively simple and straightforward, such that it fabricated molecules that could be made.

“If we switch the order of applying these production rules, we would get another molecule; what’s more, we can enumerate all the possibilities and generate tons of them,” says Chen. “Some of these molecules are valid and some of them not, so the learning of the grammar itself is actually to figure out a minimal collection of production rules, such that the percentage of molecules that can actually be synthesized is maximized.” While the researchers concentrated on three training sets of less than 33 samples each — acrylates, chain extenders, and isocyanates — they note that the process could be applied to any chemical class.

To see how their method performed, the researchers tested DEG against other state-of-the-art models and techniques, looking at percentages of chemically valid and unique molecules, diversity of those created, success rate of retrosynthesis, and percentage of molecules belonging to the training data’s monomer class.

“We clearly show that, for the synthesizability and membership, our algorithm outperforms all the existing methods by a very large margin, while it’s comparable for some other widely-used metrics,” says Guo. Further, “what is amazing about our algorithm is that we only need about 0.15 percent of the original dataset to achieve very similar results compared to state-of-the-art approaches that train on tens of thousands of samples. Our algorithm can specifically handle the problem of data sparsity.”

In the immediate future, the team plans to address scaling up this grammar learning process to be able to generate large graphs, as well as produce and identify chemicals with desired properties.

Down the road, the researchers see many applications for the DEG method, as it’s adaptable beyond generating new chemical structures, the team points out. A graph is a very flexible representation, and many entities can be symbolized in this form — robots, vehicles, buildings, and electronic circuits, for example. “Essentially, our goal is to build up our grammar, so that our graphic representation can be widely used across many different domains,” says Guo. “DEG can automate the design of novel entities and structures,” adds Chen.

This research was supported, in part, by the MIT-IBM Watson AI Lab and Evonik.

Featured video: L. Rafael Reif on the power of education

MIT President L. Rafael Reif recently joined Raúl Rodríguez, associate vice president of internationalization at Tecnológico de Monterrey, for a wide-ranging fireside chat about the power of education and its impact in addressing global issues, even more so in a post-pandemic world.

“When I was younger, my parents used to always tell me and my brothers that we had to have an education because your education is the only thing you can bring with you, if you have to leave in a hurry,” recalled Reif, who was visiting with students and researchers on the Monterrey Tec campus at the invitation of José Antonio Fernández, chair of the board at the Tec and a member of the MIT Corporation.

Reif recounted his own experiences both academic and personal and shared his hope for a better future, emphasizing the role students will play in shaping it. 

“Many think that the purpose of university is to educate and advance knowledge — education and research — and that it should stop there… but students want to do something good. They want to make an impact and help,” said Reif. “So, I think that the purpose of university is not only to educate and advance knowledge, but to help students use that knowledge to solve problems — problems facing their cities, their states, their country, their world.”

Conecta, a news site of Monterrey Tec, has additional coverage and photos from the MIT president’s visit. 

Video by: Monterrey Tec | 52 min, 46 sec

Solving the challenges of robotic pizza-making

Imagine a pizza maker working with a ball of dough. She might use a spatula to lift the dough onto a cutting board then use a rolling pin to flatten it into a circle. Easy, right? Not if this pizza maker is a robot.

For a robot, working with a deformable object like dough is tricky because the shape of dough can change in many ways, which are difficult to represent with an equation. Plus, creating a new shape out of that dough requires multiple steps and the use of different tools. It is especially difficult for a robot to learn a manipulation task with a long sequence of steps — where there are many possible choices — since learning often occurs through trial and error.

Researchers at MIT, Carnegie Mellon University, and the University of California at San Diego have come up with a better way. They created a framework for a robotic manipulation system that uses a two-stage learning process, which could enable a robot to perform complex dough-manipulation tasks over a long timeframe. A “teacher” algorithm solves each step the robot must take to complete the task. Then, it trains a “student” machine-learning model that learns abstract ideas about when and how to execute each skill it needs during the task, like using a rolling pin. With this knowledge, the system reasons about how to execute the skills to complete the entire task.

The researchers show that this method, which they call DiffSkill, can perform complex manipulation tasks in simulations, like cutting and spreading dough, or gathering pieces of dough from around a cutting board, while outperforming other machine-learning methods.

Beyond pizza-making, this method could be applied in other settings where a robot needs to manipulate deformable objects, such as a caregiving robot that feeds, bathes, or dresses someone elderly or with motor impairments.

“This method is closer to how we as humans plan our actions. When a human does a long-horizon task, we are not writing down all the details. We have a higher-level planner that roughly tells us what the stages are and some of the intermediate goals we need to achieve along the way, and then we execute them,” says Yunzhu Li, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL), and author of a paper presenting DiffSkill.

Li’s co-authors include lead author Xingyu Lin, a graduate student at Carnegie Mellon University (CMU); Zhiao Huang, a graduate student at the University of California at San Diego; Joshua B. Tenenbaum, the Paul E. Newton Career Development Professor of Cognitive Science and Computation in the Department of Brain and Cognitive Sciences at MIT and a member of CSAIL; David Held, an assistant professor at CMU; and senior author Chuang Gan, a research scientist at the MIT-IBM Watson AI Lab. The research will be presented at the International Conference on Learning Representations.

Student and teacher

 The “teacher” in the DiffSkill framework is a trajectory optimization algorithm that can solve short-horizon tasks, where an object’s initial state and target location are close together. The trajectory optimizer works in a simulator that models the physics of the real world (known as a differentiable physics simulator, which puts the “Diff” in “DiffSkill”). The “teacher” algorithm uses the information in the simulator to learn how the dough must move at each stage, one at a time, and then outputs those trajectories.

Then the “student” neural network learns to imitate the actions of the teacher. As inputs, it uses two camera images, one showing the dough in its current state and another showing the dough at the end of the task. The neural network generates a high-level plan to determine how to link different skills to reach the goal. It then generates specific, short-horizon trajectories for each skill and sends commands directly to the tools.
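A rough, hypothetical sketch of that high-level reasoning step is shown below: encode the current and goal images, then search over short sequences of skills, using a learned model to predict where each skill would leave the dough. The encoder, the skill list, and the outcome model here are placeholders for illustration, not the authors’ implementation.

```python
# Hedged sketch (not the authors' code) of chaining learned skills: score
# candidate skill sequences by how close their predicted end state is to the
# goal. encode_image, SKILL_EFFECT, and the features are all placeholders.

from itertools import product

SKILLS = ["spatula_lift", "rolling_pin_flatten", "gripper_gather"]
SKILL_EFFECT = {"spatula_lift": 3, "rolling_pin_flatten": 5, "gripper_gather": 2}

def encode_image(img):
    # Placeholder for a learned image encoder returning a feature value.
    return img

def predict_outcome(state_feat, skill):
    # Placeholder for a learned model predicting the state after a skill.
    return state_feat + SKILL_EFFECT[skill]

def plan(current_img, goal_img, horizon=2):
    """Pick the skill sequence whose predicted end state is closest to the goal."""
    start, goal = encode_image(current_img), encode_image(goal_img)
    best_seq, best_cost = None, float("inf")
    for seq in product(SKILLS, repeat=horizon):
        feat = start
        for skill in seq:
            feat = predict_outcome(feat, skill)
        cost = abs(feat - goal)            # distance to the goal features
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq

print(plan(current_img=0, goal_img=9))
```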

The researchers used this technique to experiment with three different simulated dough-manipulation tasks. In one task, the robot uses a spatula to lift dough onto a cutting board then uses a rolling pin to flatten it. In another, the robot uses a gripper to gather dough from all over the counter, places it on a spatula, and transfers it to a cutting board. In the third task, the robot cuts a pile of dough in half using a knife and then uses a gripper to transport each piece to different locations.

A cut above the rest

DiffSkill was able to outperform popular techniques that rely on reinforcement learning, where a robot learns a task through trial and error. In fact, DiffSkill was the only method that was able to successfully complete all three dough manipulation tasks. Interestingly, the researchers found that the “student” neural network was even able to outperform the “teacher” algorithm, Lin says.

“Our framework provides a novel way for robots to acquire new skills. These skills can then be chained to solve more complex tasks which are beyond the capability of previous robot systems,” says Lin.

Because their method focuses on controlling the tools (spatula, knife, rolling pin, etc.), it could be applied to different robots, but only if they use the specific tools the researchers defined. In the future, they plan to integrate the shape of a tool into the reasoning of the “student” network so it could be applied to other equipment.

The researchers intend to improve the performance of DiffSkill by using 3D data as inputs, instead of images that can be difficult to transfer from simulation to the real world. They also want to make the neural network planning process more efficient and collect more diverse training data to enhance DiffSkill’s ability to generalize to new situations. In the long run, they hope to apply DiffSkill to more diverse tasks, including cloth manipulation.

This work is supported, in part, by the National Science Foundation, LG Electronics, the MIT-IBM Watson AI Lab, the Office of Naval Research, and the Defense Advanced Research Projects Agency Machine Common Sense program.

Q&A: Alberto Rodriguez on teaching a robot to find your keys

Growing up in Spain’s Catalonia region, Alberto Rodriguez loved taking things apart and putting them back together. But it wasn’t until he joined a robotics lab his last year in college that he realized robotics, and not mathematics or physics, would be his life’s calling. “I fell in love with the idea that you could build something and then tell it what to do,” he says. “That was my first intense exposure to the magic combo of building and coding, and I was hooked.”

After graduating from university in Barcelona, Rodriguez looked for a path to study in the United States. Through his undergraduate advisor, he met Matt Mason, a professor at Carnegie Mellon University’s Robotics Institute, who invited Rodriguez to join his lab for his PhD. “I began to engage with research, and I experienced working with a great mentor,” he says, “someone that was not there to tell me what to do, but rather to let me try, fail, and guide me through the process of trying again.”

Rodriguez arrived at MIT as a postdoc in 2013, where he continued to try, fail, and try again. In January, Rodriguez was promoted to associate professor with tenure in MIT’s Department of Mechanical Engineering.

Through the Science Hub, he’s currently working on a pair of projects with Amazon that explore the use of touch and inertial dynamics to teach robots to rapidly sort through clutter to find a specific object. In collaboration with MIT’s Phillip Isola and Russ Tedrake, one project is focused on training a robot to pick up, move, and place objects of a variety of shapes and sizes without damaging them.

In a recent interview, Rodriguez discussed the nuts and bolts of tactile robotics and where he sees the field heading.

Q: Your PhD thesis, Shape for Contact, led to the work you do now in tactile robotics. What was it about?

A: During my PhD, I got interested in the principles that guide the design of a robot’s fingers. Fingers are essential to how we (and robots) manipulate objects and interact with the environment. Robotics research has focused on the control and on the morphology of robotic fingers and hands, with less emphasis on their connection. In my thesis, I focused on techniques for designing the shape and motion of rigid fingers for specific tasks, like grasping objects or picking them up from a table. It got me interested in the connection between shape and motion, and in the importance of friction and contact mechanics.

Q: At MIT, you joined the Amazon Robotics Challenge. What did you learn?

A: After starting my research group at MIT, the MCube Lab, we joined the Amazon Robotics Challenge. The goal was to advance autonomous systems for perceiving and manipulating objects buried in clutter. It presented a unique opportunity to deal with the practical issues of building a robotic system to do something as simple and natural as extending your arm to pick a book from a box. The lessons and challenges from that experience inspired a lot of the research we now do in my lab, including tactile and vision-based manipulation.

Q: What’s the biggest technical challenge facing roboticists right now?

A: If I have to pick one, I think it’s the ability to integrate tactile sensors and to use tactile feedback to skillfully manipulate objects. Our brains unconsciously resolve all kinds of uncertainties that arise in mundane tasks, for example, fetching a key from your pocket, moving it into a stable grasp, and inserting it in a lock to open the door. Tactile feedback allows us to resolve those uncertainties. It’s a natural way to deal with never-seen-before objects, materials, and poses. Mimicking this flexibility in robots is key.

Q: What’s the biggest ethical challenge?

A: I think we should redouble our efforts to understand the effects of robotic automation on the future of work, and find ways to ensure that the benefits of this next wave of robotic automation are distributed more evenly than in the past.

Q: What should every aspiring roboticist know?

A: There are no silver bullets in robotics. Robotics benefits from advancements in many fields: actuation, control, planning, computer vision, and machine learning, to name a few. We’re currently fixated on solving robotics by pairing the right dataset with the right learning algorithm. Years ago, we were looking for the solution to robotics in computer vision, and before that, in planning algorithms.

These deep dives are key for the field to make progress, but they can also blind the individual roboticist. Ultimately, robotics is a systems discipline. No unilateral effort will get us to the capable and adaptable robots we want. Getting there is closer to the challenge of sending a human to the moon than achieving superhuman performance at the game of chess.

Q: How big of a role should industry play in robotics?

A: I’ve always found inspiration from working close to industry. Industry has a clear perspective of what problems need to be solved. Plus, robotics is a systems discipline, and so observations and discoveries need to consider the entire system. That requires a high level of commitment, resources, and engineering know-how, which is increasingly difficult for academic labs alone to provide. As robotics evolves, academia-industry engagements will be especially critical.

New program bolsters innovation in next-generation artificial intelligence hardware

The MIT AI Hardware Program is a new academia-industry collaboration aimed at defining and developing translational technologies in hardware and software for the AI and quantum age. A collaboration between the MIT School of Engineering and the MIT Schwarzman College of Computing, involving the Microsystems Technology Laboratories and programs and units in the college, the cross-disciplinary effort aims to innovate technologies that will deliver more energy-efficient systems for cloud and edge computing.

“A sharp focus on AI hardware manufacturing, research, and design is critical to meet the demands of the world’s evolving devices, architectures, and systems,” says Anantha Chandrakasan, dean of the MIT School of Engineering and Vannevar Bush Professor of Electrical Engineering and Computer Science. “Knowledge-sharing between industry and academia is imperative to the future of high-performance computing.”

Based on use-inspired research involving materials, devices, circuits, algorithms, and software, the MIT AI Hardware Program convenes researchers from MIT and industry to facilitate the transition of fundamental knowledge to real-world technological solutions. The program spans materials and devices, as well as architecture and algorithms enabling energy-efficient and sustainable high-performance computing.

“As AI systems become more sophisticated, new solutions are sorely needed to enable more advanced applications and deliver greater performance,” says Daniel Huttenlocher, dean of the MIT Schwarzman College of Computing and Henry Ellis Warren Professor of Electrical Engineering and Computer Science. “Our aim is to devise real-world technological solutions and lead the development of technologies for AI in hardware and software.”

The inaugural members of the program are companies from a wide range of industries including chip-making, semiconductor manufacturing equipment, AI and computing services, and information systems R&D organizations. The companies represent a diverse ecosystem, both nationally and internationally, and will work with MIT faculty and students to help shape a vibrant future for our planet through cutting-edge AI hardware research.

The five inaugural members of the MIT AI Hardware Program are:  

  • Amazon, a global technology company whose hardware inventions include the Kindle, Amazon Echo, Fire TV, and Astro;
     
  • Analog Devices, a global leader in the design and manufacturing of analog, mixed signal, and DSP integrated circuits;
     
  • ASML, an innovation leader in the semiconductor industry, providing chipmakers with hardware, software, and services to mass produce patterns on silicon through lithography;
     
  • NTT Research, a subsidiary of NTT that conducts fundamental research to upgrade reality in game-changing ways that improve lives and brighten our global future; and
     
  • TSMC, the world’s leading dedicated semiconductor foundry.

The MIT AI Hardware Program will create a roadmap of transformative AI hardware technologies. Leveraging MIT.nano, the most advanced university nanofabrication facility anywhere, the program will foster a unique environment for AI hardware research.  

“We are all in awe at the seemingly superhuman capabilities of today’s AI systems. But this comes at a rapidly increasing and unsustainable energy cost,” says Jesús del Alamo, the Donner Professor in MIT’s Department of Electrical Engineering and Computer Science. “Continued progress in AI will require new and vastly more energy-efficient systems. This, in turn, will demand innovations across the entire abstraction stack, from materials and devices to systems and software. The program is in a unique position to contribute to this quest.”

The program will prioritize the following topics:

  • analog neural networks;
  • new roadmap CMOS designs;
  • heterogeneous integration for AI systems;
  • monolithic-3D AI systems;
  • analog nonvolatile memory devices;
  • software-hardware co-design;
  • intelligence at the edge;
  • intelligent sensors;
  • energy-efficient AI;
  • intelligent internet of things (IIoT);
  • neuromorphic computing;
  • AI edge security;
  • quantum AI;
  • wireless technologies;
  • hybrid-cloud computing; and
  • high-performance computation.

“We live in an era where paradigm-shifting discoveries in hardware, systems communications, and computing have become mandatory to find sustainable solutions — solutions that we are proud to give to the world and generations to come,” says Aude Oliva, senior research scientist in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and director of strategic industry engagement in the MIT Schwarzman College of Computing.

The new program is co-led by Jesús del Alamo and Aude Oliva, and Anantha Chandrakasan serves as chair.

Security tool guarantees privacy in surveillance footage

Surveillance cameras have an identity problem, fueled by an inherent tension between utility and privacy. As these powerful little devices have cropped up seemingly everywhere, the use of machine learning tools has automated video content analysis at a massive scale — but with increasing mass surveillance, there are currently no legally enforceable rules to limit privacy invasions.

Security cameras can do a lot — they’ve become smarter and supremely more competent than their ghosts of grainy pictures past, the ofttimes “hero tool” in crime media. (“See that little blurry blue blob in the right hand corner of that densely populated corner — we got him!”) Now, video surveillance can help health officials measure the fraction of people wearing masks, enable transportation departments to monitor the density and flow of vehicles, bikes, and pedestrians, and provide businesses with a better understanding of shopping behaviors. But why has privacy remained a weak afterthought? 

The status quo is to retrofit video with blurred faces or black boxes. Not only does this prevent analysts from asking some genuine queries (e.g., Are people wearing masks?), it also doesn’t always work; the system may miss some faces and leave them unblurred for the world to see. Dissatisfied with this status quo, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), in collaboration with other institutions, came up with a system to better guarantee privacy in video footage from surveillance cameras. Called “Privid,” the system lets analysts submit video data queries, and adds a little bit of noise (extra data) to the end result to ensure that an individual can’t be identified. The system builds on a formal definition of privacy — “differential privacy” — which allows access to aggregate statistics about private data without revealing personally identifiable information.

Typically, analysts would just have access to the entire video to do whatever they want with it, but Privid makes sure the video isn’t a free buffet. Honest analysts can get access to the information they need, but that access is restrictive enough that malicious analysts can’t do too much with it. To enable this, rather than running the code over the entire video in one shot, Privid breaks the video into small pieces and runs processing code over each chunk. Instead of getting results back from each piece, the segments are aggregated, and that additional noise is added. (There’s also information on the error bound you’re going to get on your result — maybe a 2 percent error margin, given the extra noisy data added). 

For example, the code might output the number of people observed in each video chunk, and the aggregation might be the “sum,” to count the total number of people wearing face coverings, or the “average” to estimate the density of crowds. 
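The chunk-aggregate-noise recipe can be sketched in a few lines. The example below is illustrative rather than Privid’s actual code: per-chunk people counts stand in for running a detector, the parameter names are made up, and the noise is drawn from a Laplace distribution scaled by how many chunks a single person could plausibly appear in.

```python
import numpy as np

# Hedged sketch of the chunk-aggregate-noise idea (not Privid's actual code).
# An analyst-supplied function runs on each video chunk; only the noised
# aggregate is released. Parameter names below are illustrative.

rng = np.random.default_rng(0)

def private_query(video_chunks, per_chunk_fn, epsilon, max_chunks_per_person):
    """Sum per-chunk outputs and add Laplace noise scaled to how many chunks
    a single person could influence (their bounded duration on camera)."""
    total = sum(per_chunk_fn(chunk) for chunk in video_chunks)
    sensitivity = max_chunks_per_person          # one person's maximum contribution
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return total + noise

# Toy example: precomputed per-chunk people counts stand in for a detector.
chunks = [12, 9, 15, 11, 13, 10]                 # people counted per chunk
count_people = lambda c: c                       # placeholder analyst code

print(private_query(chunks, count_people, epsilon=1.0, max_chunks_per_person=3))
```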

Privid allows analysts to use their own deep neural networks that are commonplace for video analytics today. This gives analysts the flexibility to ask questions that the designers of Privid did not anticipate. Across a variety of videos and queries, Privid was accurate within 79 to 99 percent of a non-private system.

“We’re at a stage right now where cameras are practically ubiquitous. If there’s a camera on every street corner, every place you go, and if someone could actually process all of those videos in aggregate, you can imagine that entity building a very precise timeline of when and where a person has gone,” says MIT CSAIL PhD student Frank Cangialosi, the lead author on a paper about Privid. “People are already worried about location privacy with GPS — video data in aggregate could capture not only your location history, but also moods, behaviors, and more at each location.”

Privid introduces a new notion of “duration-based privacy,” which decouples the definition of privacy from its enforcement — with obfuscation, if your privacy goal is to protect all people, the enforcement mechanism needs to do some work to find the people to protect, which it may or may not do perfectly. With this mechanism, you don’t need to fully specify everything, and you’re not hiding more information than you need to. 

Let’s say we have a video overlooking a street. Two analysts, Alice and Bob, both claim they want to count the number of people that pass by each hour, so they submit a video processing module and ask for a sum aggregation.

The first analyst is the city planning department, which hopes to use this information to understand footfall patterns and plan sidewalks for the city. Their model counts people and outputs this count for each video chunk.

The other analyst is malicious. They hope to identify every time “Charlie” passes by the camera. Their model only looks for Charlie’s face, and outputs a large number if Charlie is present (i.e., the “signal” they’re trying to extract), or zero otherwise. Their hope is that the sum will be non-zero if Charlie was present. 

From Privid’s perspective, these two queries look identical. It’s hard to reliably determine what their models might be doing internally, or what the analyst hopes to use the data for. This is where the noise comes in. Privid executes both of the queries, and adds the same amount of noise for each. In the first case, because Alice was counting all people, this noise will only have a small impact on the result, but likely won’t impact the usefulness. 

In the second case, since Bob was looking for a specific signal (Charlie was only visible for a few chunks), the noise is enough to prevent them from knowing if Charlie was there or not. If they see a non-zero result, it might be because Charlie was actually there, or because the model outputs “zero,” but the noise made it non-zero. Privid didn’t need to know anything about when or where Charlie appeared, the system just needed to know a rough upper bound on how long Charlie might appear for, which is easier to specify than figuring out the exact locations, which prior methods rely on. 

The challenge is determining how much noise to add — Privid wants to add just enough to hide everyone, but not so much that it would be useless for analysts. Adding noise to the data and insisting on queries over time windows means that your result isn’t going to be as accurate as it could be, but the results are still useful while providing better privacy. 

Cangialosi wrote the paper with Princeton PhD student Neil Agarwal, MIT CSAIL PhD student Venkat Arun, University of Chicago assistant professor Junchen Jiang, Rutgers University assistant professor and former MIT CSAIL postdoc Srinivas Narayana, Rutgers University associate professor Anand Sarwate, and Princeton University assistant professor Ravi Netravali SM ’15, PhD ’18. Cangialosi will present the paper at the USENIX Symposium on Networked Systems Design and Implementation in April in Renton, Washington.

This work was partially supported by a Sloan Research Fellowship and National Science Foundation grants.

3 Questions: How the MIT mini cheetah learns to run

It’s been roughly 23 years since one of the first robotic animals trotted onto the scene, defying classical notions of our cuddly four-legged friends. Since then, a barrage of walking, dancing, and door-opening machines has commanded attention, a sleek mixture of batteries, sensors, metal, and motors. Missing from the list of cardio activities was one both loved and loathed by humans (depending on whom you ask), and one that proved slightly trickier for the bots: learning to run.

Researchers from MIT’s Improbable AI Lab, part of the Computer Science and Artificial Intelligence Laboratory (CSAIL) and directed by MIT Assistant Professor Pulkit Agrawal, as well as the Institute of AI and Fundamental Interactions (IAIFI) have been working on fast-paced strides for a robotic mini cheetah — and their model-free reinforcement learning system broke the record for the fastest run recorded. Here, MIT PhD student Gabriel Margolis and IAIFI postdoc Ge Yang discuss just how fast the cheetah can run. 

Q: We’ve seen videos of robots running before. Why is running harder than walking?  

A: Achieving fast running requires pushing the hardware to its limits, for example by operating near the maximum torque output of motors. In such conditions, the robot dynamics are hard to analytically model. The robot needs to respond quickly to changes in the environment, such as the moment it encounters ice while running on grass. If the robot is walking, it is moving slowly and the presence of snow is not typically an issue. Imagine if you were walking slowly, but carefully: you can traverse almost any terrain. Today’s robots face an analogous problem. The problem is that moving on all terrains as if you were walking on ice is very inefficient, but is common among today’s robots. Humans run fast on grass and slow down on ice — we adapt. Giving robots a similar capability to adapt requires quick identification of terrain changes and quickly adapting to prevent the robot from falling over. In summary, because it’s impractical to build analytical (human-designed) models of all possible terrains in advance, and the robot’s dynamics become more complex at high-velocities, high-speed running is more challenging than walking.

Q: Previous agile running controllers for the MIT Cheetah 3 and mini cheetah, as well as for Boston Dynamics’ robots, are “analytically designed,” relying on human engineers to analyze the physics of locomotion, formulate efficient abstractions, and implement a specialized hierarchy of controllers to make the robot balance and run. You use a “learn-by-experience model” for running instead of programming it. Why? 

A: Programming how a robot should act in every possible situation is simply very hard. The process is tedious, because if a robot were to fail on a particular terrain, a human engineer would need to identify the cause of failure and manually adapt the robot controller, and this process can require substantial human time. Learning by trial and error removes the need for a human to specify precisely how the robot should behave in every situation. This would work if: (1) the robot can experience an extremely wide range of terrains; and (2) the robot can automatically improve its behavior with experience. 

Thanks to modern simulation tools, our robot can accumulate 100 days’ worth of experience on diverse terrains in just three hours of actual time. We developed an approach by which the robot’s behavior improves from simulated experience, and our approach critically also enables successful deployment of those learned behaviors in the real world. The intuition behind why the robot’s running skills work well in the real world is: Of all the environments it sees in this simulator, some will teach the robot skills that are useful in the real world. When operating in the real world, our controller identifies and executes the relevant skills in real-time.  
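That train-in-simulation recipe can be caricatured with a tiny random-search loop: sample many randomized terrains, score a candidate policy on them, and keep changes that improve average performance. Everything below — the single “gait” parameter, the reward, the terrain model — is a placeholder for illustration, not the lab’s actual system.

```python
import random

# Toy illustration of learning by trial and error across randomized simulated
# terrains (domain randomization plus random-search policy improvement).
# The reward and the single "gait" parameter are placeholders.

def reward(gait, friction):
    # Placeholder: the policy does best when its gait matches the terrain.
    return -abs(gait - friction)

def average_reward(gait, n_terrains=64):
    # Evaluate the policy over many randomly sampled terrains, a cheap
    # stand-in for hours of simulated experience.
    frictions = [random.uniform(0.1, 1.0) for _ in range(n_terrains)]
    return sum(reward(gait, f) for f in frictions) / n_terrains

best_gait = 0.9
best_score = average_reward(best_gait)
for episode in range(500):
    candidate = best_gait + random.gauss(0.0, 0.05)   # trial-and-error perturbation
    score = average_reward(candidate)
    if score > best_score:                            # keep what improves performance
        best_gait, best_score = candidate, score

print("gait parameter after simulated training:", round(best_gait, 2))
```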

Q: Can this approach be scaled beyond the mini cheetah? What excites you about its future applications?  

A: At the heart of artificial intelligence research is the trade-off between what the human needs to build in (nature) and what the machine can learn on its own (nurture). The traditional paradigm in robotics is that humans tell the robot both what task to do and how to do it. The problem is that such a framework is not scalable, because it would take immense human engineering effort to manually program a robot with the skills to operate in many diverse environments. A more practical way to build a robot with many diverse skills is to tell the robot what to do and let it figure out the how. Our system is an example of this. In our lab, we’ve begun to apply this paradigm to other robotic systems, including hands that can pick up and manipulate many different objects.

This work is supported by the DARPA Machine Common Sense Program, Naver Labs, MIT Biomimetic Robotics Lab, and the NSF AI Institute of AI and Fundamental Interactions. The research was conducted at the Improbable AI Lab.
