MIT Artificial Intelligence Laboratory’s Bryan Adams, Cynthia Breazeal,
Rodney A. Brooks, and Brian Scassellati
Research Report

HUMANOID ROBOTICS

HUMANOID ROBOTS CAN BE USED FOR MUCH MORE THAN THEIR TRADITIONAL ROLES: THEY CAN ALSO BE USED TO EXPLORE THEORIES OF HUMAN INTELLIGENCE. THE AUTHORS DISCUSS THEIR PROJECT AIMED AT DEVELOPING ROBOTS THAT CAN BEHAVE LIKE AND INTERACT WITH HUMANS.
The term "robot" was first used by Karel Capek in his 1920 play R.U.R.: Rossum's Universal Robots, deriving it from the Czech robota (forced labor). Today's robots are confined to tasks that are too dangerous or too tedious for humans, such as exploring other planets and inspecting nuclear power plants. In general, robots are still a long way from having the intelligence and flexibility of their fictional counterparts.
Work on developing robots that are a step closer to the androids of science fiction is under way in humanoid robotics labs around the world. Creating a human-like robot is a daunting engineering challenge, requiring expertise in real-time control, computer architecture, and mechanical, electrical, and software engineering. We began a project in 1993 to build a humanoid robot for artificial intelligence research.
In addition to the relevant engineering, computer-architecture, and real-time-control problems, and to studying theories of human intelligence, we have had to address difficulties specific to integrated systems. What kinds of sensors should we use, and how should the robot interpret the data they provide? How can the robot act purposefully toward a task while remaining responsive to its surroundings? How can the system learn new tasks and adjust to changing circumstances? Every humanoid robotics lab must address many of the same issues in motor control, vision, and machine learning.
The guiding principles of our approach
What truly distinguishes one research group from another are fundamental differences in research priorities and underlying assumptions. Three basic principles guide our research at the MIT AI Lab.
We design our humanoid robots to act autonomously and safely, without human control or supervision, in natural work environments and to interact with people. They are not designed as solutions for specific robotic needs (unlike, say, welding robots on assembly lines). Our goal is to build robots that perform robustly across a wide range of real-world environments.
In order for anyone to interact with social robots without specific training or guidance, they must be able to recognize and comprehend everyday low-level human cues like head nods and eye contact. Additionally, they must be able to use such norms in an engaging interaction. The necessity of these capabilities has an impact on the physical embodiment and control-system architecture of the robot.
Autonomous machines around people
Unlike industrial robots, which operate in fixed environments on a limited range of stimuli, our robots must work flexibly under a variety of environmental conditions and for a variety of tasks. Because we need the system to operate autonomously, we must tackle research issues such as behavior selection and attention. Such autonomy often entails a trade-off between performance on particular tasks and generality in responding to a wider variety of stimuli. However, autonomous systems offer a robustness and flexibility that task-specific systems can never match.
If our robots are to function autonomously alongside human coworkers in busy, cluttered, and noisy workplaces, we must build systems that can handle the complexity of natural environments. Although these settings are not nearly as hazardous as those faced by planetary explorers, they are still far from ideal for a robot. In addition to being safe for human interaction, our robots must sense and respond to social cues and be able to learn from human demonstration.
Our robots are built according to these research principles. For instance, Cog (see Figure 1) began as a 14-DOF upper torso with a single arm and a rudimentary vision system. On this initial version, we implemented multimodal behavior systems, such as reaching for a visual target. Cog now features a seven-DOF head, two six-DOF arms, three torso joints, and significantly more sophisticated sensory systems.
Each eye has one camera with a narrow field of view for high-resolution vision and one with a wide field of view for peripheral vision, giving the robot a binocular, variable-resolution view of its surroundings. An inertial system lets the robot coordinate motor responses more reliably. At each arm joint, potentiometers measure position and strain gauges measure output torque. Two microphones provide auditory input, and limit switches, pressure sensors, and heat sensors provide other proprioceptive information.
Figure 1. We deliberately created our upper-torso development platform, Cog, with 22 degrees of freedom to precisely mimic human movement.
The robot embodies our principle of safe interaction on two levels. First, we connected the arm motors to the joints in series with torsional springs. The springs' compliance protects the gearbox and filters out high-frequency collision forces, and it gives people interacting with the arms a tangible sense of safety. Second, applying a virtual spring law within a low-gain force-control loop makes each joint behave as if it were driven by a low-frequency spring system (soft springs and large masses).
With such control, the arms can move naturally from one posture to the next at a relatively slow command rate, deflecting away from obstacles rather than pushing blindly through them, which enables safe and natural interaction. (See the article by Cynthia Breazeal and her colleagues, "Social Constraints on Animate Vision," in this issue for a discussion of Kismet, another robot designed for human interaction.)
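The following is a minimal sketch, in our own notation rather than the robot's code, of the kind of virtual-spring joint behavior described above. It assumes a series elastic joint whose transmitted torque can be read from the deflection of the physical torsional spring; all names and gains are hypothetical.

```python
# Hypothetical constants; a real joint would use measured and tuned values.
SPRING_K = 1.0      # stiffness of the physical series spring (Nm/rad)
VIRTUAL_K = 2.0     # stiffness of the simulated "soft" spring (Nm/rad)
VIRTUAL_B = 0.5     # damping of the simulated spring (Nm*s/rad)
FORCE_GAIN = 0.1    # low-gain force-control loop

def measured_torque(motor_angle, joint_angle):
    """Torque transmitted to the link, read from the series-spring deflection."""
    return SPRING_K * (motor_angle - joint_angle)

def joint_command(joint_angle, joint_velocity, equilibrium, motor_angle):
    """One control step: servo the measured torque toward a virtual spring law."""
    desired = VIRTUAL_K * (equilibrium - joint_angle) - VIRTUAL_B * joint_velocity
    torque_error = desired - measured_torque(motor_angle, joint_angle)
    return FORCE_GAIN * torque_error   # incremental motor command

# Example: a joint displaced from its equilibrium point is gently pulled back.
print(joint_command(joint_angle=0.2, joint_velocity=0.0, equilibrium=0.0, motor_angle=0.25))
```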
Social interaction with people
We are particularly interested in social interaction because our robots must function in a human environment. Our robots can interact with humans in a natural way by having social abilities, and they can also learn more sophisticated behavior by being socially adept. Humans act as both educators who help mold the robot’s behavior and examples for it to follow.
Our current research focuses on four aspects of social interaction: emotional modeling to regulate social dynamics, shared attention to determine saliency, vocal prosody for feedback, and learning through imitation.
Emotional modeling to regulate social dynamics. To be socially intelligent, a robot needs an emotional model that it can use to understand and influence the dynamics of its social environment.
To learn from such a model, the robot needs two skills. The first is the ability to gather social input: to recognize the relevant cues people give about their emotional state, which help the robot understand the dynamics of a given interaction. The second is the ability to express its own emotional state in ways that can modify the dynamics of the interaction. For example, if the robot is watching an instructor demonstrate a task but the instructor is moving too quickly for the robot to follow, the robot may display confusion. The instructor naturally perceives this display as a cue to slow down. In this way, the robot can influence the rate and quality of the instruction.
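As a purely illustrative sketch of this feedback loop (our own simplification, not the robots' implementation), the fragment below raises a "confusion" level whenever a demonstration arrives faster than an assumed processing rate and maps that level to a coarse expression the instructor can read; all names, rates, and thresholds are invented.

```python
class ConfusionMonitor:
    def __init__(self, processing_rate=1.0, decay=0.9):
        self.processing_rate = processing_rate  # demo steps the robot can absorb per tick (assumed)
        self.decay = decay                      # how quickly confusion fades
        self.confusion = 0.0

    def update(self, demo_steps_this_tick):
        # Confusion grows with the backlog of unprocessed demonstration steps.
        backlog = max(0.0, demo_steps_this_tick - self.processing_rate)
        self.confusion = min(1.0, self.decay * self.confusion + 0.3 * backlog)
        return self.confusion

    def expression(self):
        # Map internal state to a coarse display the instructor can read.
        return "confused" if self.confusion > 0.6 else "attentive"

monitor = ConfusionMonitor()
for steps in [1, 1, 3, 3, 1]:          # hypothetical demonstration pacing
    level = monitor.update(steps)
    print(steps, round(level, 2), monitor.expression())
```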
Our current architecture includes a motivation model that covers these kinds of exchange. We are developing a general control architecture for two of our humanoid robots; under each major system, we list the components that we have already implemented or are currently developing. Many skills also lie at the intersections of these modules, including the acquisition of visuomotor skills and the control of attention preferences according to motivation. Machine-learning techniques, a crucial component of each of these systems, are not listed individually here.
Shared attention to determine saliency. Another crucial prerequisite for a robot to participate in social settings is understanding the basics of shared attention as conveyed by gaze direction, pointing, and other gestures. One challenge in enabling a machine to learn from an instructor is ensuring that the machine and the instructor are attending to the same object, so that the learner knows where new information should be applied. In other words, the learner must know which elements of the scene are relevant to the current lesson. Human students pick out the relevant objects from a variety of social cues given by the teacher, including linguistic determiners (such as "this" or "that"), gestural cues (such as pointing or eye direction), and postural cues (such as proximity). We are implementing systems that can identify these shared-attention cues and respond appropriately to them in a given social context.
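To make the idea concrete, here is a purely hypothetical sketch of one shared-attention primitive: choosing the object an instructor most likely refers to, given an estimated gaze or pointing direction. The geometry, object names, and angular threshold are all our own illustrative assumptions.

```python
import math

def referent(origin, direction, objects, max_angle_deg=20.0):
    """Return the object best aligned with the gaze/pointing ray, or None."""
    norm = math.hypot(*direction)
    best, best_angle = None, max_angle_deg
    for name, pos in objects.items():
        vx, vy = pos[0] - origin[0], pos[1] - origin[1]
        cos_a = (vx * direction[0] + vy * direction[1]) / (math.hypot(vx, vy) * norm)
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
        if angle < best_angle:
            best, best_angle = name, angle
    return best

# Hypothetical 2-D scene: the instructor points roughly toward the jar.
objects = {"jar": (2.0, 0.3), "block": (1.0, 1.5)}
print(referent(origin=(0.0, 0.0), direction=(1.0, 0.1), objects=objects))  # -> "jar"
```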
Vocal prosody for feedback. Participating in vocal exchanges is crucial for many social interactions. Other robotic auditory systems have concentrated on recognizing a constrained set of hardwired commands; our work has instead concentrated on a basic understanding of vocal patterns. We are implementing an auditory system that lets our robots recognize vocal affirmations, disapprovals, and attentional bids, so that the robot receives natural social feedback on which actions it has and has not performed successfully.
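As a hedged illustration of the kind of prosodic classification involved (not the actual auditory system), the sketch below maps simple statistics of a pitch contour onto the three feedback classes mentioned above. It assumes a front end that already produces a fundamental-frequency (F0) contour in Hz for each utterance; the features and thresholds are invented.

```python
from statistics import mean

def classify_prosody(f0_contour_hz):
    """Crude affect guess from pitch level, range, and slope (illustrative only)."""
    voiced = [f for f in f0_contour_hz if f > 0]
    if not voiced:
        return "unknown"
    level = mean(voiced)
    span = max(voiced) - min(voiced)
    slope = voiced[-1] - voiced[0]

    if span > 120 and slope > 0:
        return "attentional bid"   # wide, rising contour
    if level > 220 and span > 80:
        return "affirmation"       # high, exaggerated contour
    if level < 180 and span < 60:
        return "disapproval"       # low, flat, clipped contour
    return "neutral"

print(classify_prosody([180, 220, 260, 310]))   # rising sweep -> "attentional bid"
```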
Learning through imitation. Imitation is another natural way for a robot to acquire new skills and goals. Consider this example: the robot watches a person open a glass jar. The person approaches the robot and places the jar on a nearby table. He rubs his hands together and then starts to remove the jar's lid. With the lid in one hand and the glass jar in the other, he begins to unscrew the lid by rotating it counterclockwise. While opening the jar, he pauses to wipe his forehead and glances at the robot to see what it is doing. He then resumes opening the jar. The robot then attempts to imitate the action.
Although traditional machine learning addresses some of the problems this scenario raises, building a system that can learn from this kind of interaction requires a focus on additional research questions. Which parts of the demonstration are important to imitate (such as turning the lid counterclockwise), and which should be ignored (such as wiping one's brow)? How does the robot evaluate its own performance once it has attempted the action? How can the robot generalize what it has learned from this experience to a similar situation? These questions call for an understanding of both the physical and the social environment.
Developing and evaluating theories of human intelligence
In addition to using biological models as inspiration for our mechanical designs and software architectures, we strive to use our implementations of those models to test and evaluate the original hypotheses. Just as computer simulations of neural networks have been used to study and refine models from neuroscience, humanoid robots can be used to investigate and validate models from cognitive science and behavioral science. Our research has drawn on the four biological models discussed below.
Our implementation of visually guided reaching produced some intriguing results. From a computer-science standpoint, the two-step training procedure was computationally simpler: rather than attempting to map the two dimensions of the visual-stimulus location directly onto the nine DOF needed to orient to and reach for an object, training centered on mastering two simpler mappings that could be chained together to produce the desired behavior. Additionally, Cog learned the second mapping, between eye position and postural primitives, on its own; the mapping between stimulus location and eye position provided a reliable error signal that made this possible (see Figure 3). From a biological perspective, the implementation also revealed a limitation of the postural-primitive hypothesis: although the model explains how to interpolate between postures within the original workspace, it offers no mechanism for extrapolating to postures outside that workspace.
Figure 3. Reaching for a visual goal. After the robot orients to a stimulus, a ballistic mapping computes the arm commands needed to reach for it. The robot watches the movement of its own arm; the same mapping used for orienting then produces an error signal that can be used to train the ballistic map.
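The chained-mapping idea can be illustrated with a toy example. The sketch below is our own simplification, not Cog's code: a fixed "plant" matrix stands in for the unknown arm, a hypothetical linear saccade map stands in for the previously learned orienting behavior, and the ballistic map is trained online from the visually observed miss distance. The fixed approximate inverse (here simply the transpose of the toy plant) that converts the visual error into an arm-command correction is another simplifying assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "arm": maps a 3-element arm command to a hand position
# expressed in the same 2-D gaze coordinates as the target.
PLANT = np.array([[0.8, 0.1, 0.0],
                  [0.0, 0.2, 0.9]])

def hand_position(arm_command):
    return PLANT @ arm_command

# Stand-in for the previously learned saccade map: visual offset -> error signal.
def saccade_error(visual_offset):
    return 0.5 * np.asarray(visual_offset)

W = np.zeros((3, 2))      # ballistic map: eye posture (2-D) -> arm command (3-D)
lr = 0.1

for _ in range(5000):
    target = rng.uniform(-1, 1, size=2)     # eye posture after fixating a target
    arm_cmd = W @ target                    # ballistic reach toward it
    miss = hand_position(arm_cmd) - target  # robot watches where its hand lands
    err = saccade_error(miss)               # reuse the orienting machinery as the error
    # Feedback-error-style update; PLANT.T is a crude stand-in for an unknown inverse.
    W -= lr * np.outer(PLANT.T @ err, target)

# After training, reaching toward a fixated point lands close to it.
print(np.round(hand_position(W @ np.array([0.3, -0.4])), 2))
```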
Rhythmic movements. Kiyotoshi Matsuoka presents a model of the spinal-cord neurons that generate rhythmic motion. We have implemented this model to produce repetitive arm motions, such as turning a crank. As Figure 4 shows, each arm joint is driven by two simulated neurons connected so that they inhibit each other. Using proprioceptive feedback from the joint, the oscillators continuously adjust the equilibrium point of the joint's virtual spring. The overall arm motion emerges from the interaction between the oscillator dynamics at each joint and the physical dynamics of the arm.
Figure 4. Neural oscillators. Each joint is driven by two mutually inhibiting neurons; open white circles indicate excitatory connections, and black circles indicate inhibitory connections. The individual neuron outputs combine to form the final output.
Besides validating Matsuoka's model on several real-world tasks, this implementation offered clear engineering advantages. First, the oscillators require neither a kinematic model of the arm nor a dynamic model of the system; no prior knowledge of the arm or its surroundings was necessary. Second, the oscillators could be applied to a variety of tasks, including turning a crank, playing with a Slinky, sawing a block of wood, and swinging a pendulum, without any changes to the control system. Third, the system tolerated perturbations well: we could stop and restart it with a very brief transient (often less than one cycle), and when we attached heavy loads to the arm, the system quickly adapted to the change. Finally, the oscillators can take input from other modalities; in one example, auditory input enabled the robot to drum in time with a human drummer.
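The following sketch implements a Matsuoka-style oscillator for a single joint from the standard published equations, not from the robot's code; the parameter values, the form of the proprioceptive feedback term, and the simple Euler integration are our own illustrative choices.

```python
def simulate_matsuoka(steps=4000, dt=0.005,
                      tau_r=0.1, tau_a=0.2, beta=2.5, w_mutual=2.5, tonic=1.0,
                      feedback_gain=0.0, feedback=lambda t: 0.0):
    """Two mutually inhibiting neurons with self-adaptation; returns the output signal."""
    x = [0.1, 0.0]   # membrane states of the two neurons (asymmetric start breaks symmetry)
    v = [0.0, 0.0]   # adaptation (self-inhibition) states
    out = []
    for i in range(steps):
        t = i * dt
        y = [max(0.0, x[0]), max(0.0, x[1])]      # firing rates
        g = feedback_gain * feedback(t)           # proprioceptive input (e.g., joint angle)
        h = [max(0.0, g), max(0.0, -g)]           # split feedback between the two neurons
        dx0 = (-x[0] - beta * v[0] - w_mutual * y[1] - h[0] + tonic) / tau_r
        dx1 = (-x[1] - beta * v[1] - w_mutual * y[0] - h[1] + tonic) / tau_r
        dv0 = (y[0] - v[0]) / tau_a
        dv1 = (y[1] - v[1]) / tau_a
        x[0] += dt * dx0; x[1] += dt * dx1
        v[0] += dt * dv0; v[1] += dt * dv1
        # One common output convention: the difference of the two firing rates,
        # used here to shift the equilibrium point of the joint's virtual spring.
        out.append(y[0] - y[1])
    return out

signal = simulate_matsuoka()
print(min(signal[2000:]), max(signal[2000:]))   # oscillates around zero after start-up
```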
Visual search and attention. We have implemented Jeremy Wolfe's model of human visual search and attention, combining low-level feature detectors for visual motion, innate perceptual classifiers (such as face detectors), color saliency, and depth segmentation with a motivational and behavioral model (see Figure 5). This attention system lets the robot focus its computational resources and exploratory behavior on objects in the environment that are intrinsically or contextually salient.
Figure 5. Overview of the attention system. Several visual feature detectors (color, motion, and face detectors) are combined with a habituation function to produce an attention activation map. The attention process influences the weighting of the feature maps as well as eye control and the robot's internal motivational and behavioral state. The images were captured during a behavioral trial.
With this system, we demonstrated preferential looking based both on top-down task constraints and on opportunistic use of low-level features. For instance, when the robot is seeking social contact, the motivation system increases the weight of the face-detector feature map, producing a preference for looking at faces. A sufficiently interesting nonface object would nevertheless still attract the robot's attention through its low-level features. The model also incorporates saliency cues based on its current focus of attention. Additionally, we devised a straightforward way to add habituation effects to Wolfe's model: by treating a time-decayed Gaussian field as an additional low-level feature, the robot habituates to stimuli that remain the focus of attention.
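A minimal sketch of the weighted feature-map combination with a habituation field is given below, under our own assumptions (tiny maps, made-up weights, and a Gaussian habituation bump); in the actual system the weights would be set by the motivation and behavior systems.

```python
import numpy as np

H, W_ = 8, 8
rng = np.random.default_rng(1)

feature_maps = {
    "color":  rng.random((H, W_)),
    "motion": rng.random((H, W_)),
    "face":   rng.random((H, W_)),
}
weights = {"color": 1.0, "motion": 1.0, "face": 1.0}
habituation = np.zeros((H, W_))      # grows under the current focus, decays elsewhere

def attend(decay=0.9, habit_strength=0.5, habit_sigma=1.0):
    """Pick the attention focus from the weighted feature maps minus habituation."""
    global habituation
    activation = sum(weights[k] * m for k, m in feature_maps.items()) - habituation
    y, x = np.unravel_index(np.argmax(activation), activation.shape)
    # Deposit a Gaussian "already looked here" field at the focus and decay the old
    # field, so a persistent stimulus gradually loses out to novel ones.
    yy, xx = np.mgrid[0:H, 0:W_]
    bump = habit_strength * np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * habit_sigma ** 2))
    habituation = decay * habituation + bump
    return int(y), int(x)

weights["face"] = 3.0                # e.g., the motivation system is seeking social contact
for _ in range(5):
    print(attend())                  # the focus eventually shifts as habituation builds
```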
Theory of mind and shared attention. Recognizing that other people have beliefs, desires, and perceptions that differ from one's own is an important developmental milestone for children. This developmental progression includes the capacity to recognize what another person can see, to understand that another person can hold a false belief, and to understand that another person may prefer different games than the child does. This developmental step may also be linked to the capacity for self-awareness, the ability to describe perceptual experiences verbally, and the capacities for imaginative and creative play.
We are implementing a model of social-skill development that accounts both for typical development and for the developmental disorders associated with autism. We have systems in place that can find faces and eyes in unconstrained visual environments, and we are developing systems to recognize eye contact. Although this work is still in its early stages, we believe that implementing a developmental model on a robot will allow precise, controlled manipulations of the model while maintaining the same testing environment and methods used on human subjects.
Researchers can systematically vary internal model parameters as they evaluate the effects of different environmental conditions on each step of development. Because the robot brings the model into the same environment as a human subject, researchers can use similar evaluation criteria (whether subjective measurements from observers or quantitative measurements such as reaction time or accuracy). Also, researchers can subject a robot to testing that is potentially hazardous, costly, or unethical to conduct on humans.
Although science fiction is usually credited with drawing its inspiration from scientific research, in AI and robotics fiction may have taken the lead. Over the past ten years, however, many research groups, conferences, and special issues have turned their attention to humanoid robots. While it may be difficult to surpass the imagination of science-fiction authors, our work does suggest one possible future: one in which people find it ordinary and natural for robots to engage with humans in human-like ways. By building these systems, we will also continue to learn about the nature of our own intelligence.
Acknowledgments
This work was supported by ONR and DARPA under MURI N00014-95-1-0600 and by DARPA under
contract DABT 63-99-1-0012.
References
1. R.A. Brooks et al., "Alternative Essences of Intelligence," Proc. 15th Nat'l Conf. Artificial Intelligence (AAAI 98) and 10th Conf. Innovative Applications of Artificial Intelligence (IAAI 98), AAAI Press, Menlo Park, Calif., 1998, pp. 961-968.
2. R.A. Brooks et al., "The Cog Project: Building a Humanoid Robot," Computation for Metaphors, Analogy and Agents, C. Nehaniv, ed., Springer Lecture Notes in Artificial Intelligence, Vol. 1562, Springer-Verlag, Berlin, 1998.
3. G.A. Pratt and M.M. Williamson, "Series Elastic Actuators," Proc. IEEE/RSJ Int'l Conf. Intelligent Robots and Systems (IROS 95), Vol. 1, IEEE Computer Soc. Press, Los Alamitos, Calif., 1995, pp. 399-406.
4. C. Breazeal and B. Scassellati, "Challenges in Building Robots That Imitate People," to be published in Imitation in Animals and Artifacts, K. Dautenhahn and C. Nehaniv, eds., MIT Press, Cambridge, Mass., 2000.
5. A. Diamond, "Developmental Time Course in Human Infants and Infant Monkeys, and the Neural Bases of Inhibitory Control in Reaching," The Development and Neural Bases of Higher Cognitive Functions, New York Academy of Sciences, New York, 1990, pp. 637-676.
6. M.J. Marjanovic, B. Scassellati, and M.M. Williamson, "Self-Taught Visually Guided Pointing for a Humanoid Robot," From Animals to Animats 4: Proc. Fourth Int'l Conf. Simulation of Adaptive Behavior (SAB 96), MIT Press, Cambridge, Mass., 1996, pp. 35-44.
7. S.F. Giszter, F.A. Mussa-Ivaldi, and E. Bizzi, "Convergent Force Fields Organized in the Frog's Spinal Cord," J. Neuroscience, Vol. 13, No. 2, Feb. 1993, pp. 467-491.
8. K. Matsuoka, "Sustained Oscillations Generated by Mutually Inhibiting Neurons with Adaptation," Biological Cybernetics, Vol. 52, 1985, pp. 367-376.
9. M. Williamson, Robot Arm Control Exploiting Natural Dynamics, doctoral thesis, Massachusetts Institute of Technology, Dept. of Electrical Eng. and Computer Science, Cambridge, Mass., 1999.
10. J. Wolfe, "Guided Search 2.0: A Revised Model of Visual Search," Psychonomic Bull. and Rev., Vol. 1, No. 2, June 1994, pp. 202-238.
11. C. Breazeal and B. Scassellati, "A Context-Dependent Attention System for a Social Robot," Proc. 16th Int'l Joint Conf. Artificial Intelligence (IJCAI 99), Morgan Kaufmann, San Francisco, 1999, pp. 1146-1153.

Bryan Adams is completing his ME with Rodney Brooks's Humanoid Robotics group and is interested in theories of intelligent control for humanoid arms. He received his BS in electrical engineering and computer science from MIT. Contact him at the MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge, MA 02139; bpadams@ai.mit.edu; www.ai.mit.edu/people/bpadams.

Cynthia Breazeal received her ScD from the MIT Artificial Intelligence Laboratory. Her interests focus on humanlike robots that can interact in natural, social ways with humans. She received her BS in electrical and computer engineering from the University of California, Santa Barbara, and her MS in electrical engineering and computer science from MIT. Contact her at the MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge, MA 02139; cynthia@ai.mit.edu; www.ai.mit.edu/people/cynthia/cynthia.html.

Rodney A. Brooks is the Director of the MIT Artificial Intelligence Laboratory and the Fujitsu Professor of Computer Science and Engineering. His research interests include robotics, computer vision, and architectures for intelligence. He received his PhD in computer science from Stanford. He is an IEEE member and a fellow of both the American Association for Artificial Intelligence and the American Association for the Advancement of Science. Contact him at the MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge, MA 02139; brooks@ai.mit.edu; www.ai.mit.edu/people/brooks.

Brian Scassellati is completing his PhD with Rodney Brooks at the MIT Artificial Intelligence Laboratory. His work is strongly grounded in theories of how the human mind develops, and he is interested in robotics as a tool for evaluating models from the biological sciences. He received a BS in computer science, a BS in brain and cognitive science, and an ME in electrical engineering and computer science from MIT. Contact him at the MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge, MA 02139; scaz@ai.mit.edu; www.ai.mit.edu/people/scaz.