Research Papers

Adiabatic Markov Decision Process: Convergence of Value Iteration Algorithm

Thai Duong

School of Electrical Engineering and
Computer Science,
Oregon State University,
Corvallis, OR 97331
e-mail: duong@eecs.oregonstate.edu

Duong Nguyen-Huu

School of Electrical Engineering and
Computer Science,
Oregon State University,
Corvallis, OR 97331
e-mail: nguyendu@eecs.oregonstate.edu

Thinh Nguyen

School of Electrical Engineering and
Computer Science,
Oregon State University,
Corvallis, OR 97331
e-mail: thinhq@eecs.oregonstate.edu

Contributed by the Dynamic Systems Division of ASME for publication in the JOURNAL OF DYNAMIC SYSTEMS, MEASUREMENT, AND CONTROL. Manuscript received November 7, 2014; final manuscript received February 22, 2016; published online April 6, 2016. Assoc. Editor: Srinivasa M. Salapaka.

J. Dyn. Sys., Meas., Control 138(6), 061009 (Apr 06, 2016) (12 pages) Paper No: DS-14-1460; doi: 10.1115/1.4032875 History: Received November 07, 2014; Revised February 22, 2016

The Markov decision process (MDP) is a well-known framework for devising optimal decision-making strategies under uncertainty. Typically, the decision maker assumes a stationary environment characterized by a time-invariant transition probability matrix. However, in many real-world scenarios this assumption is not justified, so the optimal strategy might not provide the expected performance. In this paper, we study the performance of the classic value iteration algorithm for solving an MDP problem under nonstationary environments. Specifically, the nonstationary environment is modeled as a sequence of time-variant transition probability matrices governed by an adiabatic evolution inspired by quantum mechanics. We characterize the performance of the value iteration algorithm subject to the rate of change of the underlying environment. The performance is measured in terms of the convergence rate to the optimal average reward. We show two examples of queuing systems that make use of our analysis framework.
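
As a rough illustration of the setting described in the abstract, the following minimal sketch runs one value iteration backup per time step while the transition probabilities drift slowly. It is not the authors' algorithm or analysis: it assumes a discounted criterion for simplicity (the paper measures convergence to the optimal average reward), it assumes the drift is a linear interpolation between two transition models, and every name and number in it (`bellman_backup`, `interpolate`, `P_START`, `P_END`, `REWARDS`) is hypothetical.

```python
def bellman_backup(values, transitions, rewards, discount):
    """One value-iteration sweep over all states.

    transitions[a][s][t] is P(t | s, a); rewards[s][a] is the immediate reward.
    """
    n_states = len(values)
    n_actions = len(transitions)
    return [
        max(
            rewards[s][a]
            + discount * sum(transitions[a][s][t] * values[t] for t in range(n_states))
            for a in range(n_actions)
        )
        for s in range(n_states)
    ]


def interpolate(p_start, p_end, weight):
    """Convex combination of two transition models; rows stay stochastic."""
    return [
        [[(1 - weight) * x + weight * y for x, y in zip(row0, row1)]
         for row0, row1 in zip(mat0, mat1)]
        for mat0, mat1 in zip(p_start, p_end)
    ]


def stationary_value_iteration(transitions, rewards, discount, n_steps):
    """Classic value iteration on a fixed transition model."""
    values = [0.0] * len(rewards)
    for _ in range(n_steps):
        values = bellman_backup(values, transitions, rewards, discount)
    return values


def adiabatic_value_iteration(p_start, p_end, rewards, discount, n_steps):
    """One backup per time step while the model drifts from p_start to p_end.

    Assumption: the drift is linear, so a larger n_steps means a slower change.
    """
    values = [0.0] * len(rewards)
    for k in range(n_steps):
        current = interpolate(p_start, p_end, (k + 1) / n_steps)
        values = bellman_backup(values, current, rewards, discount)
    return values


# Hypothetical two-state, two-action example (all numbers are made up).
P_START = [
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.5, 0.5], [0.6, 0.4]],  # action 1
]
P_END = [
    [[0.6, 0.4], [0.3, 0.7]],  # action 0
    [[0.2, 0.8], [0.9, 0.1]],  # action 1
]
REWARDS = [[1.0, 0.0], [0.5, 1.0]]  # rewards[s][a]

if __name__ == "__main__":
    reference = stationary_value_iteration(P_END, REWARDS, 0.5, 200)
    for n_steps in (10, 100, 1000):
        drifting = adiabatic_value_iteration(P_START, P_END, REWARDS, 0.5, n_steps)
        gap = max(abs(a - b) for a, b in zip(drifting, reference))
        print(n_steps, gap)
```

The printed gap is the distance between the values computed under the drifting model and the values for the final model. Under these assumptions, a slower drift lets the iterates track the moving optimum more closely, which is the kind of dependence on the rate of change that the abstract refers to.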

Copyright © 2016 by ASME

Figures

Fig. 1 The classic value iteration

Fig. 2 The value iteration in an adiabatic setting

Fig. 4 An example of estimated λ̂_i and its bounds for actual λ = 40

Fig. 5 The Φ(·) function for simulation scenario 1

Fig. 6 The actual distance and its upper bound for λ̂_i=(1+ai)λ from Theorem 2 (simulation scenario 1)

Fig. 7 The actual distance and its upper bound for λ̂_i=(1+ai)λ from Theorem 3 (simulation scenario 1)

Fig. 8 The actual distance and its upper bound for λ̂_i=(1−bi)λ from Theorem 2 (simulation scenario 1)

Fig. 9 The actual distance and its upper bound for λ̂_i=(1−bi)λ from Theorem 3 (simulation scenario 1)

Fig. 10 The actual distance and its upper bound from Theorem 2 (simulation scenario 2)

Fig. 11 The actual distance and its upper bound from Theorem 3 (simulation scenario 2)
