Proceedings Article | 11 May 2009
KEYWORDS: Visualization, Sensors, Head, Cameras, Eye, Image segmentation, Image visualization, Robotics, Environmental sensing, Systems modeling
In many real-world situations and applications that involve humans or machines (e.g., situation awareness, scene understanding, driver distraction, workload reduction, assembly, robotics), multiple sensory modalities (e.g., vision, audition, touch) are used. The incoming sensory information can overwhelm the processing capabilities of both humans
and machines. An approach for estimating what is most important in our sensory environment (bottom-up or goal-driven)
and using that as a basis for workload reduction or taking an action could be of great benefit in applications
involving humans, machines, or human-machine interactions. In this paper, we describe a novel approach for determining high saliency stimuli in multi-modal sensory environments (e.g., vision, sound, touch). In such environments, a high saliency stimulus could be a visual object, a sound source, or a touch event. Such high saliency stimuli are important and should be attended to from a perception, cognition, and/or action perspective. The system accomplishes this by the fusion
of saliency maps from multiple sensory modalities (e.g., visual and auditory) into a single, fused multimodal saliency
map that is represented in a common, higher-level coordinate system. This paper describes the computational model and
method for generating a multi-modal, or fused, saliency map. The fused saliency map can be used to determine primary and secondary foci of attention as well as for active control of hardware/devices. Such a computational model of a fused saliency map would be immensely useful for machine-based or robot-based applications in a multi-sensory environment. We describe the approach and system, and present preliminary results on a real robotic platform.
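As an illustration of the kind of fusion described above, the sketch below combines a visual and an auditory saliency map, assumed to already be resampled onto a shared coordinate grid, into a single fused map and then extracts primary and secondary foci of attention. The grid size, the modality weights, and the helper names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def fuse_saliency_maps(maps, weights=None):
    """Fuse per-modality saliency maps (already projected onto a common
    coordinate grid) into a single multi-modal saliency map.

    `maps` is a dict {modality_name: 2-D array}; weights are illustrative.
    """
    if weights is None:
        weights = {name: 1.0 / len(maps) for name in maps}
    fused = np.zeros_like(next(iter(maps.values())), dtype=float)
    for name, m in maps.items():
        m = m.astype(float)
        rng = m.max() - m.min()
        # Per-modality normalization so no single modality dominates by scale.
        norm = (m - m.min()) / rng if rng > 0 else np.zeros_like(m)
        fused += weights[name] * norm
    return fused

def foci_of_attention(fused, n_foci=2, suppress_radius=5):
    """Pick the top-n saliency peaks with simple local suppression,
    yielding primary and secondary foci of attention."""
    s = fused.copy()
    foci = []
    for _ in range(n_foci):
        y, x = np.unravel_index(np.argmax(s), s.shape)
        foci.append((y, x))
        # Suppress the neighborhood of the chosen focus before the next pick.
        y0, y1 = max(0, y - suppress_radius), y + suppress_radius + 1
        x0, x1 = max(0, x - suppress_radius), x + suppress_radius + 1
        s[y0:y1, x0:x1] = -np.inf
    return foci

if __name__ == "__main__":
    # Toy example: visual and auditory saliency maps on a shared 64x64 grid.
    rng = np.random.default_rng(0)
    visual = rng.random((64, 64))
    auditory = rng.random((64, 64))
    fused = fuse_saliency_maps({"visual": visual, "auditory": auditory},
                               weights={"visual": 0.6, "auditory": 0.4})
    primary, secondary = foci_of_attention(fused)
    print("primary focus:", primary, "secondary focus:", secondary)
```

In the paper's setting, the common coordinate system would be a higher-level (e.g., head- or ego-centered) frame, and the selected foci could drive active control of hardware such as a camera or microphone; those details are beyond this sketch.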