Research Papers

A Real-Time Computational Learning Model for Sequential Decision-Making Problems Under Uncertainty

Author and Article Information
Andreas A. Malikopoulos

Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI 48109; amalik@umich.edu

Panos Y. Papalambros

Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI 48109; pyp@umich.edu

Dennis N. Assanis

Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI 48109; assanis@umich.edu

J. Dyn. Sys., Meas., Control 131(4), 041010 (May 20, 2009) (8 pages); doi:10.1115/1.3117200. History: Received March 18, 2008; Revised February 04, 2009; Published May 20, 2009.

Modeling dynamic systems subject to stochastic disturbances in order to derive a control policy is a ubiquitous task in engineering. In some instances, however, obtaining a model of a system may be impractical or impossible. Alternative approaches have been developed using a simulation-based stochastic framework, in which the system interacts with its environment in real time and acquires information that can be processed to produce an optimal control policy. In this context, the problem of developing a policy for controlling the system’s behavior is formulated as a sequential decision-making problem under uncertainty. This paper considers the problem of deriving, in real time, a control policy for a dynamic system with unknown dynamics, formulated as such a sequential decision-making problem. The evolution of the system is modeled as a controlled Markov chain. A new state-space representation model and a learning mechanism are proposed that can be used to improve system performance over time. The major difference between existing methods and the proposed learning model is that the latter uses an evaluation function that considers the expected cost achievable by state transitions forward in time. The model allows decision making based on gradually enhanced knowledge of the system’s response as it transitions from one state to another, in conjunction with the actions taken at each state. The proposed model is demonstrated on the single cart-pole balancing problem and on a vehicle cruise-control problem.
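
To make the mechanism described in the abstract concrete, the following Python sketch shows one plausible reading of such a simulation-based learner: the controlled Markov chain’s transition probabilities and one-step costs are estimated from real-time interaction, and each state is scored by an evaluation function that looks one transition forward at the expected cost achievable from successor states. This is a minimal illustration under stated assumptions, not the paper’s POD implementation; all names (PODLearner, observe, act) and the discount factor are hypothetical.

    import numpy as np

    class PODLearner:
        """Minimal sketch of simulation-based learning on a controlled
        Markov chain (an illustrative assumption, not the paper's POD code)."""

        def __init__(self, n_states, n_actions, discount=0.95):
            self.discount = discount
            # Smoothed transition counts N(s, a, s') built from observed transitions.
            self.counts = np.ones((n_states, n_actions, n_states))
            # Running mean of the one-step cost c(s, a) and visit counts.
            self.cost = np.zeros((n_states, n_actions))
            self.visits = np.zeros((n_states, n_actions))
            # Evaluation of each state: expected cost achievable going forward.
            self.value = np.zeros(n_states)

        def observe(self, s, a, cost, s_next):
            # Update the estimated model after one real-time transition.
            self.counts[s, a, s_next] += 1.0
            self.visits[s, a] += 1.0
            self.cost[s, a] += (cost - self.cost[s, a]) / self.visits[s, a]
            # One-step lookahead over the estimated transition probabilities.
            p = self.counts[s] / self.counts[s].sum(axis=1, keepdims=True)
            self.value[s] = float((self.cost[s] + self.discount * p @ self.value).min())

        def act(self, s):
            # Choose the action minimizing expected one-step cost plus the
            # forward-looking evaluation of the successor states.
            p = self.counts[s] / self.counts[s].sum(axis=1, keepdims=True)
            return int(np.argmin(self.cost[s] + self.discount * p @ self.value))

In a cart-pole setting, for instance, observe would be called once per control step with a quantized state index and a cost signal (say, 1 on failure and 0 otherwise), with act supplying the next force command; the quantization and cost choices here are assumptions for illustration.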

Copyright © 2009 by American Society of Mechanical Engineers

Figures

Figure 1: Construction of the POD domain
Figure 2: Partition of POD through the PRNs
Figure 3: The inverted pendulum
Figure 4: Free body diagram of the system
Figure 5: Simulation of the system after learning the balance control policy with POD for different initial conditions
Figure 6: Simulation of the system after learning the balance control policy with POD for different initial conditions (zoom in)
Figure 7: Number of failures until POD derives the balance control policy
Figure 8: Vehicle speed and accelerator pedal rate for different road grades by self-learning cruise control with POD
Figure 9: Engine speed and transmission gear selection for different road grades by self-learning cruise control with POD
Figure 10: Vehicle speed and accelerator pedal rate for a road grade increase from 0 deg to 10 deg
Figure 11: Engine speed and transmission gear selection for a road grade increase from 0 deg to 10 deg
