Gestures, Speech and Vision—Towards a Multi-Modal Augmented Reality Human-Robot Interface

From The Theme
HUMAN MACHINE INTERACTION AND SENSING

WHAT IF?
What if humans could communicate with robots through visual and speech interfaces that would produce more natural interactions?

WHAT WE SET OUT TO DO
We set out to develop tools that would improve human-robot communication, collaboration and teamwork. Specifically, we aimed to augment the STanford AI Robot (STAIR) with vision-based human-robot interaction capabilities. This new interface would integrate visual data to enable STAIR to interact more effectively with humans.

WHAT WE FOUND
Our research explored methods for improving robotic visual perception and visual attention systems. We developed a learning algorithm that enables a robot to differentiate between a variety of novel objects – items seen for the first time through robotic computer vision. We also explored methods for real-time object recognition and tracking in digital video.

LEARN MORE
STAIR (Stanford Artificial Intelligence Robot)

Stephen Gould, Joakim Arfvidsson, Adrian Kaehler, Benjamin Sapp, Marius Messner, Gary Bradski, Paul Baumstarck, Sukwon Chung and Andrew Y. Ng. “Peripheral-Foveal Vision for Real-time Object Recognition and Tracking in Video”. Proceedings of the Twentieth International Joint Conference on Artificial Intelligence (IJCAI-07), 2007.

Ashutosh Saxena, Justin Driemeyer, Justin Kearns and Andrew Y. Ng. “Robotic Grasping of Novel Objects”. Neural Information Processing Systems (NIPS 19), 2006.

PEOPLE BEHIND THE PROJECT
At the time of this project, Andrew Ng was Assistant Professor of Computer Science and Electrical Engineering (by courtesy) at Stanford University, where he led the STAIR project and directed the Autonomous Helicopter Lab. In 2011 he led the development of Stanford University’s main MOOC (Massive Open Online Course) platform and taught an online Machine Learning class that was offered to over 100,000 students, leading to the founding of Coursera. Currently, Andrew Ng is VP & Chief Scientist of Baidu; Co-Chairman and Co-Founder of Coursera; and an Adjunct Professor at Stanford University.

Daniel Chavez-Clemente is the Senior Mechanical Design Engineer at MDA US Systems LLC, where he designs mechanisms and robots for aerospace applications. At the time of the project, he was a PhD Candidate in the Department of Aeronautics and Astronautics at Stanford University. He received his PhD in 2010.

Anya Petrovskaya is the founder and Chief Science Officer at Eonite Perception, Inc., a computer vision company that develops positional tracking software for VR and AR. At the time of the project, she was a PhD Candidate in Computer Science at Stanford University, receiving her PhD in 2011. Dr. Petrovskaya’s areas of expertise are Computer Vision, Artificial Intelligence, and Robotics.