Motion Capture from Multiple Views without Markers or Model

In recent years, human motion estimation from video has gained popularity. Applications include surveillance, human-machine interfaces, and the gaming and movie industries. Common approaches use either visual markers on clothing or a 3-D model of the human body. In this work we would like to investigate the use of visual-hull techniques in combination with tracking algorithms to capture 3-D motion without markers or a model. Visual-hull techniques (such as shape-from-silhouettes) can recover a volume-based representation of objects or humans from multiple calibrated cameras that are synchronized in time. This requires segmenting each view into foreground (silhouettes) and background; in practice this is solved by chroma-keying against a blue or green background. The output of the visual-hull technique is a set of so-called voxels (volume elements) for each time frame; a minimal carving sketch is given below.

In this work we would like to investigate how these voxels move over time in 3-D. For that, a match must be found between corresponding voxels in subsequent frames. This match can possibly be found by combining 3-D features (the relative position of a voxel within the set, texture for boundary voxels) and 2-D features (optical flow, difference images, appearance). As this matching can be ambiguous, additional temporal filtering using (Kalman) tracking is proposed to disambiguate matches and find a consistent 3-D optical flow; see the tracking sketch further below.
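
To make the shape-from-silhouettes step concrete, the following is a minimal voxel-carving sketch in C++: a voxel on a regular grid is kept only if its centre projects into the foreground silhouette of every calibrated view. The Camera struct, the row-major 3x4 projection matrices, and the grid parameters are illustrative assumptions for this sketch, not part of any existing implementation.

// A minimal voxel-carving sketch (shape-from-silhouettes). Assumptions,
// not taken from an existing implementation: each calibrated camera is
// given as a row-major 3x4 projection matrix plus a binary silhouette.
#include <array>
#include <vector>

struct Camera {
    std::array<double, 12> P;              // row-major 3x4 projection matrix
    std::vector<unsigned char> silhouette; // 1 = foreground, 0 = background
    int width = 0, height = 0;
};

// Project a 3-D point into a view; returns false if the point lies behind
// the camera.
static bool project(const Camera& cam, double x, double y, double z,
                    int& u, int& v) {
    const std::array<double, 12>& P = cam.P;
    double w = P[8] * x + P[9] * y + P[10] * z + P[11];
    if (w <= 0.0) return false;
    u = static_cast<int>((P[0] * x + P[1] * y + P[2] * z + P[3]) / w);
    v = static_cast<int>((P[4] * x + P[5] * y + P[6] * z + P[7]) / w);
    return true;
}

// Carve a cubic grid of n*n*n voxels with edge length 'size' centred on the
// origin: a voxel survives only if its centre projects into the foreground
// silhouette of every view.
std::vector<std::array<double, 3>> carve(const std::vector<Camera>& cams,
                                         int n, double size) {
    std::vector<std::array<double, 3>> voxels;
    const double step = size / n, origin = -size / 2.0;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k) {
                const double x = origin + (i + 0.5) * step;
                const double y = origin + (j + 0.5) * step;
                const double z = origin + (k + 0.5) * step;
                bool inside = true;
                for (const Camera& cam : cams) {
                    int u, v;
                    if (!project(cam, x, y, z, u, v) ||
                        u < 0 || u >= cam.width ||
                        v < 0 || v >= cam.height ||
                        cam.silhouette[v * cam.width + u] == 0) {
                        inside = false;
                        break;
                    }
                }
                if (inside) voxels.push_back({x, y, z});
            }
    return voxels;
}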

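For the temporal-filtering step, the sketch below tracks a single voxel trajectory with a constant-velocity Kalman filter, under the simplifying assumption that the three axes are independent (one 1-D position/velocity filter per axis); the noise parameters are illustrative. The predicted position can be used to disambiguate candidate matches in the next frame by preferring the candidate closest to the prediction.

// A minimal constant-velocity Kalman tracker for one voxel trajectory,
// assuming independent x, y, z axes. Noise values are illustrative.
#include <array>

struct Kalman1D {
    double x = 0.0, v = 0.0;                // state: position and velocity
    double p00 = 1.0, p01 = 0.0, p11 = 1.0; // symmetric 2x2 state covariance
    double q = 1e-3;                        // simplified diagonal process noise
    double r = 1e-2;                        // measurement-noise variance

    // Predict one time step dt ahead with a constant-velocity motion model.
    void predict(double dt) {
        x += v * dt;
        p00 += dt * (2.0 * p01 + dt * p11) + q; // uses old p01, p11
        p01 += dt * p11;
        p11 += q;
    }

    // Fuse a measured position z (e.g. the matched voxel position).
    void update(double z) {
        const double s = p00 + r;                // innovation variance
        const double k0 = p00 / s, k1 = p01 / s; // Kalman gain
        const double innovation = z - x;
        x += k0 * innovation;
        v += k1 * innovation;
        p11 -= k1 * p01;                         // uses old p01
        p01 *= 1.0 - k0;
        p00 *= 1.0 - k0;
    }
};

struct VoxelTracker {
    std::array<Kalman1D, 3> axis; // x, y, z tracked independently

    // Predicted 3-D position; candidate matches in the next frame can be
    // ranked by their distance to this prediction to resolve ambiguities.
    std::array<double, 3> predict(double dt) {
        std::array<double, 3> p;
        for (int a = 0; a < 3; ++a) { axis[a].predict(dt); p[a] = axis[a].x; }
        return p;
    }

    void update(const std::array<double, 3>& measured) {
        for (int a = 0; a < 3; ++a) axis[a].update(measured[a]);
    }
};

Treating the axes independently keeps each filter a 2x2 problem; a full 6-D state with cross-covariances is the general alternative and may be preferable if the measurement noise is correlated across axes.
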
Several datasets with multiple, calibrated views of human action are available, as well as C++ implementations of shape-from-silhouettes and marching cubes.

Keywords:
Computer Vision, Computer Graphics, Looking at People
Study:
Artificial Intelligence
Contact:
John Schavemaker 
Status:
Open
Location:
TNO / Universiteit van Amsterdam