Bertsekas, D. P., and Tsitsiklis, J. N., 1996, "Neuro-Dynamic Programming" (Optimization and Neural Computation Series Vol. 3), 1st ed., Athena Scientific, Nashua, NH.
Sutton, R. S., and Barto, A. G., 1998, "Reinforcement Learning: An Introduction (Adaptive Computation and Machine Learning)", MIT, Cambridge, MA.
Borkar, V. S., 2000, “A Learning Algorithm for Discrete-Time Stochastic Control,” Probability in the Engineering and Informational Sciences, 14, pp. 243–258.
Kaelbling, L. P., Littman, M. L., and Moore, A. W., 1996, “Reinforcement Learning: A Survey,” J. Artif. Intell. Res., 4, pp. 237–285.
Mandl, P., 1974, “Estimation and Control in Markov Chains,” Adv. Appl. Probab., 6, pp. 40–60.
Borkar, V., and Varaiya, P., 1979, “Adaptive Control of Markov Chains. I. Finite Parameter Set,” IEEE Trans. Autom. Control, AC-24, pp. 953–957.
Borkar, V., and Varaiya, P., 1982, “Identification and Adaptive Control of Markov Chains,” SIAM J. Control Optim., 20, pp. 470–489.
Kumar, P. R., 1982, “Adaptive Control With a Compact Parameter Set,” SIAM J. Control Optim., 20, pp. 9–13.
Doshi, B., and Shreve, S. E., 1980, “Strong Consistency of a Modified Maximum Likelihood Estimator for Controlled Markov Chains,” J. Appl. Probab., 17, pp. 726–734.
Kumar, P. R., and Becker, A., 1982, “A New Family of Optimal Adaptive Controllers for Markov Chains,” IEEE Trans. Autom. Control, AC-27, pp. 137–146.
Kumar, P. R., and Lin, W., 1982, “Optimal Adaptive Controllers for Unknown Markov Chains,” IEEE Trans. Autom. Control, AC-27, pp. 765–774.
Sato, M., Abe, K., and Takeda, H., 1982, “Learning Control of Finite Markov Chains With Unknown Transition Probabilities,” IEEE Trans. Autom. Control, AC-27, pp. 502–505.
Sato, M., Abe, K., and Takeda, H., 1985, “An Asymptotically Optimal Learning Controller for Finite Markov Chains With Unknown Transition Probabilities,” IEEE Trans. Autom. Control, AC-30, pp. 1147–1149.
Sato, M., Abe, K., and Takeda, H., 1988, “Learning Control of Finite Markov Chains With an Explicit Trade-Off Between Estimation and Control,” IEEE Trans. Syst. Man Cybern., 18, pp. 677–684.
Kumar, P. R., 1985, “A Survey of Some Results in Stochastic Adaptive Control,” SIAM J. Control Optim., 23, pp. 329–380.
Varaiya, P., 1982, “Adaptive Control of Markov Chains: A Survey,” Proceedings of the IFAC Symposium, New Delhi, India, pp. 89–93.
Agrawal, R., and Teneketzis, D., 1989, “Certainty Equivalence Control With Forcing: Revisited,” Proceedings of the IEEE Conference on Decision and Control Including the Symposium on Adaptive Processes, Tampa, FL, p. 2107.
Malikopoulos, A. A., 2008, “Real-Time, Self-Learning Identification and Stochastic Optimal Control of Advanced Powertrain Systems,” Ph.D. thesis, Department of Mechanical Engineering, University of Michigan, Ann Arbor, MI.
Malikopoulos, A. A., Papalambros, P. Y., and Assanis, D. N., 2007, “A State-Space Representation Model and Learning Algorithm for Real-Time Decision-Making Under Uncertainty,” Proceedings of the 2007 ASME International Mechanical Engineering Congress and Exposition, Seattle, WA, Nov. 11–15.
Malikopoulos, A. A., Papalambros, P. Y., and Assanis, D. N., 2007, “A Learning Algorithm for Optimal Internal Combustion Engine Calibration in Real Time,” Proceedings of the ASME 2007 International Design Engineering Technical Conferences Computers and Information in Engineering Conference, Las Vegas, NV, Sept. 4–7.
Malikopoulos, A. A., Assanis, D. N., and Papalambros, P. Y., 2007, “Real-Time, Self-Learning Optimization of Diesel Engine Calibration,” Proceedings of the 2007 Fall Technical Conference of the ASME Internal Combustion Engine Division, Charleston, SC, Oct. 14–17.
Malikopoulos, A. A., Assanis, D. N., and Papalambros, P. Y., 2008, “Optimal Engine Calibration for Individual Driving Styles,” Proceedings of the SAE 2008 World Congress and Exhibition, Detroit, MI, Apr. 14–17, SAE Paper No. 2008-01-1367.
Kumar, P. R., and Varaiya, P., 1986, "Stochastic Systems", Prentice-Hall, Englewood Cliffs, NJ.
Bertsekas, D. P., 2001, "Dynamic Programming and Optimal Control (Volumes 1 and 2)" (Optimization and Neural Computation Series), 1st ed., Athena Scientific, Nashua, NH.
Kemeny, J. G., and Snell, J. L., 1983, "Finite Markov Chains", 1st ed., Springer, New York.
Krishnan, V., 2006, "Probability and Random Processes", 1st ed., Wiley, New York.
Gubner, J. A., 2006, "Probability and Random Processes for Electrical and Computer Engineers", 1st ed., Cambridge University Press, Cambridge.
Grimmett, G. R., and Stirzaker, D. R., 2001, "Probability and Random Processes", 3rd ed., Oxford University Press, New York.
Gosavi, A., 2003, "Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning", 1st ed., Springer, New York.