Point clouds have been shown to be an efficient and precise representation for modeling 3D environments. Based on point clouds, numerous studies have been actively conducted and have demonstrated great performance in various practical tasks, such as object detection and tracking, localization, segmentation, and classification. Among these tasks, this thesis concentrates on point cloud-based 3D pose estimation and scene flow estimation, since they provide the most fundamental understanding of the human pose and the surrounding dynamic environment. Thus, solutions to these tasks can be deployed in a wide range of practical applications, such as human-computer interaction, augmented/virtual reality, and autonomous driving. Although the selected topics have already been extensively explored, current approaches still suffer from low model efficiency, which limits their application to real-time, resource-limited devices. To address this drawback, this thesis first proposes two efficient single-shot deep neural architectures, HandFoldingNet and Bi-PointFlowNet, which achieve state-of-the-art performance in terms of both accuracy and efficiency. Furthermore, to improve flexibility and accuracy for better deployment on devices with different computational resources, this thesis proposes two recurrent architectures, HandR2N2 and the Multi-Scale Bidirectional Recurrent Network (MSBRN), which dynamically iterate over a smaller set of parameters for significantly improved performance. Finally, this thesis demonstrates an application of the scene flow estimation framework that recognizes 3D scene semantics and predicts dense motion in future scenes.