Multiview 3D human pose estimation using improved least-squares and LSTM networks

Published in Neurocomputing

In this paper, we present a deep-learning-based method to estimate 3D human pose when multiple 2D views are available. Our system is a cascade of specialized modules. First, 2D poses are obtained with a deep neural network that detects skeleton keypoints in each available view. Then, the 3D coordinates of each keypoint are reconstructed with our proposed least-squares optimization method, which analyzes the quality of the 2D detections to decide whether to accept or reject them. Once the 3D pose has been obtained for each time step, full-body pose estimation is performed with a long short-term memory (LSTM) neural network, which exploits the process history to refine the final pose estimate. We provide evidence of the suitability of our contributions in an extensive experimental study. Finally, we show experimentally that our method obtains competitive results compared with recent representative works in the literature.
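To illustrate the reconstruction step, the sketch below triangulates one keypoint from multiple calibrated views by linear least squares (the classical DLT formulation), rejecting views whose detection confidence falls below a threshold, in the spirit of the quality analysis described above. This is not the paper's implementation: the function name, the confidence threshold `min_conf`, and the simple per-view rejection rule are illustrative assumptions.

```python
import numpy as np

def triangulate_point(proj_mats, points_2d, confidences, min_conf=0.5):
    """Triangulate one 3D keypoint from multiple views via linear least squares.

    proj_mats   : list of 3x4 camera projection matrices
    points_2d   : list of (u, v) pixel detections, one per view
    confidences : per-view detection confidence scores in [0, 1]
    min_conf    : illustrative rejection threshold (an assumption, not
                  the paper's value)
    """
    rows = []
    for P, (u, v), c in zip(proj_mats, points_2d, confidences):
        if c < min_conf:  # reject low-quality 2D detections
            continue
        # Each accepted view gives two linear constraints on the
        # homogeneous 3D point X:
        #   u * (P[2] @ X) - P[0] @ X = 0
        #   v * (P[2] @ X) - P[1] @ X = 0
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    if len(rows) < 4:  # need at least two accepted views
        return None
    A = np.stack(rows)
    # Least-squares solution: right singular vector with the
    # smallest singular value
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize
```

With two accepted views the system has four equations in four homogeneous unknowns, and additional views simply add rows, so the SVD solution naturally fuses all cameras that pass the quality check.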