Imitation of Demonstrations Using Bayesian Filtering With Nonparametric Data-Driven Models

Author and Article Information
Nurali Virani

Department of Mechanical and
Nuclear Engineering,
The Pennsylvania State University,
University Park, PA 16802
e-mail: nurali.virani88@gmail.com

Devesh K. Jha

Department of Mechanical and
Nuclear Engineering,
The Pennsylvania State University,
University Park, PA 16802
e-mail: devesh.dkj@gmail.com

Zhenyuan Yuan

Department of Electrical Engineering,
The Pennsylvania State University,
University Park, PA 16802
e-mail: zqy5086@psu.edu

Ishana Shekhawat

Department of Mechanical and
Nuclear Engineering,
The Pennsylvania State University,
University Park, PA 16802
e-mail: ibs5048@psu.edu

Asok Ray

Fellow ASME
Department of Mechanical and
Nuclear Engineering;
Department of Electrical Engineering,
The Pennsylvania State University,
University Park, PA 16802
e-mail: axr2@psu.edu

1Present address: GE Global Research, Niskayuna, NY 12309.

2Present address: Mitsubishi Electric Research Laboratories, Cambridge, MA 02139.

Contributed by the Dynamic Systems Division of ASME for publication in the JOURNAL OF DYNAMIC SYSTEMS, MEASUREMENT, AND CONTROL. Manuscript received February 15, 2017; final manuscript received June 22, 2017; published online November 8, 2017. Assoc. Editor: Prashant Mehta.

J. Dyn. Sys., Meas., Control 140(3), 030906 (Nov 08, 2017) (9 pages). Paper No: DS-17-1094; doi: 10.1115/1.4037782

This paper addresses the problems of learning dynamic models of hybrid systems from demonstrations and of imitating those demonstrations using Bayesian filtering. A linear programming-based approach is used to develop a nonparametric, kernel-based conditional density estimation technique that infers accurate and concise dynamic models of system evolution from data. The training data for these models were acquired from demonstrations by teleoperation. The trained data-driven models for mode-dependent state evolution and state-dependent mode evolution are then used online to imitate the demonstrated tasks via particle filtering. Results of simulation and experimental validation with a hexapod robot are reported to establish the generalization of the proposed learning and control algorithms.
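The pipeline summarized in the abstract can be sketched with a toy one-dimensional example: a nonparametric kernel conditional model of the next state is learned from demonstration pairs, and particles are then propagated through it, bootstrap-filter style, to imitate the demonstrated behavior. This sketch is illustrative only; the Gaussian-kernel sampler below stands in for the paper's LP-based sparse density estimate, and the discrete mode variable is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def learn_kernel_model(X, Y, bandwidth=0.3):
    """Nonparametric conditional model p(y | x) built as a kernel mixture
    over demonstration pairs (x_t, x_{t+1}). A simplified stand-in for the
    paper's LP-sparsified kernel density estimate."""
    def sample(x):
        # Weight each training input by its kernel similarity to the query x,
        # pick one pair, and jitter its recorded next state by the bandwidth.
        w = np.exp(-0.5 * ((X - x) / bandwidth) ** 2)
        w /= w.sum()
        i = rng.choice(len(X), p=w)
        return Y[i] + bandwidth * rng.standard_normal()
    return sample

# Toy demonstration data: the demonstrated behavior drifts the state
# toward +1 (one "mode" of a hybrid system).
X = np.linspace(-1.0, 1.0, 50)          # states x_t seen in the demo
Y = X + 0.1 * (1.0 - X)                 # corresponding next states x_{t+1}
step = learn_kernel_model(X, Y)

# Bootstrap-style propagation: push a particle set through the learned
# transition model to reproduce (imitate) the demonstrated drift.
particles = rng.uniform(-1.0, 0.0, size=100)
for _ in range(20):
    particles = np.array([step(x) for x in particles])
```

After 20 propagation steps the particle cloud concentrates near the demonstrated attractor at +1, which is the sense in which filtering through the learned model imitates the demonstration; in the full method, an observation-update step would additionally reweight particles and infer the hidden mode.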

Copyright © 2018 by ASME

Fig. 1: Schematic for proposed model learning and imitation
Fig. 2: Probability distribution of the next step
Fig. 3: Probability distribution of the next mode (top: mt = 1, bottom: mt = 2)
Fig. 4: Probabilities of the next mode as given by the particles and the actual mode given as input during demonstration
Fig. 5: Hexapod robot experiment setup
Fig. 6: Visualization of state-to-mode mapping for KNN classifier
Fig. 7: Visualization of state-to-mode mapping for GMM classifier
Fig. 8: Tracking and prediction by particle filters
