Improving real-time human pose estimation from multi-view video
Capturing human motion is a key problem in computer vision because of the wide range of applications that can benefit from the acquired data. Motion capture is used to identify people by their gait, to interact with computers through gestures, to improve the performance of athletes, to diagnose orthopaedic patients, and to create virtual characters with more natural-looking motion in movies and games. These are only a few of the possible applications of human motion capture. In some of these application areas, it is important that data acquisition is not constrained by the markers or wearable sensors traditionally used in commercial motion capture systems. Furthermore, certain applications, such as perceptive user interfaces and gait recognition, require low latency and real-time performance. Human pose estimation is defined as the process of estimating the configuration of the underlying skeletal structure of the human body. In this dissertation, several algorithms that together form a real-time pose estimation pipeline are proposed. The input to the pipeline is a set of images captured with a calibrated multi-camera system, and the output is the 3D positions of 25 joints in the global coordinate frame. The steps of the pipeline are: a) subtract the background from the images to create silhouettes; b) reconstruct the volume occupied by the performer from the silhouette images; c) extract skeleton curves from the volume; d) identify extremities and segment the skeleton curves into body parts; and e) fit a model to the labelled data. The pipeline can initialise automatically and can recover from estimation errors.
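To make steps a) and b) more concrete, here is a minimal sketch, not the dissertation's implementation. It assumes one static background image per camera, a fixed intensity threshold, and a known 3x4 projection matrix per camera from calibration. The names `silhouette`, `project` and `carve`, as well as the simple thresholding, are hypothetical illustrations. A voxel centre is kept only if it projects into the foreground of every silhouette, which is the basic shape-from-silhouette (visual hull) idea.

```python
from typing import List, Optional, Sequence, Tuple

Image = List[List[float]]             # grey-level image, indexed [row][col]
Mask = List[List[bool]]               # silhouette, True = foreground
Matrix34 = Sequence[Sequence[float]]  # 3x4 projection matrix (assumed from calibration)
Point3 = Tuple[float, float, float]


def silhouette(frame: Image, background: Image, threshold: float) -> Mask:
    """Step a) sketch (hypothetical): a pixel is foreground if it differs
    from the static background image by more than a fixed threshold."""
    return [
        [abs(f - b) > threshold for f, b in zip(frow, brow)]
        for frow, brow in zip(frame, background)
    ]


def project(P: Matrix34, X: Point3) -> Optional[Tuple[float, float]]:
    """Project a 3D point with a 3x4 matrix; None if it lies behind the camera."""
    x, y, z = X
    u, v, w = (r[0] * x + r[1] * y + r[2] * z + r[3] for r in P)
    if w <= 0:
        return None
    return u / w, v / w


def carve(voxels: Sequence[Point3], cameras: Sequence[Matrix34],
          masks: Sequence[Mask]) -> List[Point3]:
    """Step b) sketch: keep a voxel centre only if it projects inside the
    foreground of every camera's silhouette."""
    occupied = []
    for X in voxels:
        keep = True
        for P, mask in zip(cameras, masks):
            uv = project(P, X)
            if uv is None:
                keep = False
                break
            # Nearest pixel: u is the column, v is the row.
            col, row = int(round(uv[0])), int(round(uv[1]))
            inside = 0 <= row < len(mask) and 0 <= col < len(mask[0])
            if not (inside and mask[row][col]):
                keep = False
                break
        if keep:
            occupied.append(X)
    return occupied
```

Testing every voxel against every camera is independent per voxel. This independence is what makes the reconstruction well suited to a parallel (for example, GPU) implementation when real-time performance is required.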
The research presented in this dissertation makes four main contributions: a) a toolset for evaluating shape-from-silhouette-based pose estimation algorithms using synthetic motion sequences generated from real motion capture data; b) a fully parallel thinning algorithm, implemented on the Graphics Processing Unit (GPU), that can skeletonise voxel volumes in real time; c) a real-time pose estimation algorithm that builds a tree structure, segmented into body parts, from skeleton data; and d) a constraint algorithm that can fit an articulated model to a labelled tree structure in real time.
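Contribution c) builds a tree structure from skeleton data. The following is a minimal sketch of one way such a tree, with candidate extremities, could be derived. It assumes the thinned skeleton is given as a set of integer voxel coordinates and that a root voxel inside the torso is already known. The breadth-first search over 26-connected neighbours and the function name `build_tree` are illustrative assumptions, not the dissertation's algorithm. The sketch does not cover the GPU thinning itself, the segmentation into labelled body parts, or the model fitting.

```python
from collections import deque
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

Voxel = Tuple[int, int, int]

# The 26 neighbour offsets of a voxel in a 3x3x3 cube, excluding the centre.
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def build_tree(skeleton: Set[Voxel],
               root: Voxel) -> Tuple[Dict[Voxel, Optional[Voxel]], List[Voxel]]:
    """Breadth-first spanning tree over 26-connected skeleton voxels.

    Returns a parent map (the root maps to None) and the leaf voxels,
    which serve as candidate extremities such as the head, hands and feet.
    Any small cycles in the skeleton are broken because each voxel is
    visited only once.
    """
    parent: Dict[Voxel, Optional[Voxel]] = {root: None}
    n_children: Dict[Voxel, int] = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for dx, dy, dz in OFFSETS:
            n = (v[0] + dx, v[1] + dy, v[2] + dz)
            if n in skeleton and n not in parent:
                parent[n] = v
                n_children[n] = 0
                n_children[v] += 1
                queue.append(n)
    leaves = [v for v, count in n_children.items() if count == 0]
    return parent, leaves
```

For example, a plus-shaped skeleton whose four arms radiate from a centre root yields the four arm tips as leaves. Following the parent map from each leaf back to the root gives the curve that a later step would label as a limb.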