Computer Engineering | Robotics


An attractive approach to improving tracking performance for visual surveillance is to use information from multiple visual sensory cues such as position, color, and shape. Previous work on fusion for tracking has tended to focus on numerically combining the scores assigned by each cue. We argue that in video scenes with many targets in crowded situations, the splitting and merging of regions associated with targets, and the subsequent dramatic changes in cue values and reliabilities, render this form of fusion less effective. In this paper we present experimental results showing that using cue rank information in fusion produces significantly better tracking results in crowded scenes. We also present a formalization of this fusion problem as a step toward understanding why this effect occurs and how to build a tracking system that exploits it.
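The distinction the abstract draws can be illustrated with a minimal sketch (not the paper's implementation; all names and scores here are hypothetical): score-level fusion sums each cue's raw scores per candidate region, so one cue whose score scale spikes after a region merge can dominate the decision, whereas Borda-style rank fusion uses only each cue's ordering of the candidates and is immune to any single cue's scale.

```python
# Hypothetical sketch of score-level vs. rank-level cue fusion for
# picking the best candidate region for a tracked target.
# The cue names, scores, and functions are illustrative assumptions,
# not the paper's actual system.

def score_fusion(cue_scores):
    """Sum raw cue scores per candidate; higher is better."""
    n = len(cue_scores[0])
    return [sum(cue[i] for cue in cue_scores) for i in range(n)]

def rank_fusion(cue_scores):
    """Borda-style fusion: replace each cue's scores with ranks
    (0 = worst), then sum ranks per candidate. The result depends
    only on each cue's ordering, not its score scale."""
    n = len(cue_scores[0])
    totals = [0] * n
    for cue in cue_scores:
        order = sorted(range(n), key=lambda i: cue[i])  # ascending by score
        for rank, i in enumerate(order):
            totals[i] += rank
    return totals

# Three cues (e.g. position, color, shape) scoring three candidate
# regions. The shape cue's scores have blown up (say, after a region
# merge), swamping a plain score sum but leaving its ranks intact.
cues = [
    [0.9, 0.4, 0.2],      # position cue (well behaved)
    [0.8, 0.5, 0.1],      # color cue (well behaved)
    [10.0, 500.0, 990.0]  # shape cue, unreliable after a merge
]
best_by_score = max(range(3), key=lambda i: score_fusion(cues)[i])
best_by_rank = max(range(3), key=lambda i: rank_fusion(cues)[i])
# Score fusion follows the spiked shape cue; rank fusion sides with
# the two reliable cues, which both prefer candidate 0.
```

This is only a toy contrast under the stated assumptions; the paper's point is the empirical one, that rank information degrades more gracefully than raw scores when regions split and merge in crowded scenes.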

IEEE Int. Conf. on Advanced Video and Signal-Based Surveillance (AVSS 2005), July 2005, Como, Italy.

This research was conducted at the Fordham University Robotics and Computer Vision Lab.
