Papers in Reverse Chronological Order



Refereed Conference and Journal Papers

  1. Pairwise Weights for Temporal Credit Assignment
    by Zeyu Zheng, Risto Vuorio, Richard Lewis, and Satinder Singh.
    In 36th AAAI Conference on Artificial Intelligence, 2022
    arXiv version.

  2. Reward is Enough
    by David Silver, Satinder Singh, Doina Precup, and Richard Sutton.
    In Artificial Intelligence, vol 299, 2021
    pdf.

  3. On the Expressivity of Markov Reward
    by David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, and Satinder Singh.
    In Neural Information Processing Systems (NeurIPS), 2021
    Outstanding Paper Award
    pdf.

  4. Proper Value Equivalence
    by Christopher Grimm, Andre Barreto, Gregory Farquhar, David Silver, and Satinder Singh.
    In Neural Information Processing Systems (NeurIPS), 2021
    arXiv version.

  5. Discovery of Options via Meta-Learned Subgoals
    by Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado van Hasselt, David Silver, and Satinder Singh.
    In Neural Information Processing Systems (NeurIPS), 2021
    arXiv version.

  6. Learning State Representations from Random Deep Action-Conditional Predictions
    by Zeyu Zheng, Vivek Veeriah, Risto Vuorio, Richard Lewis, and Satinder Singh.
    In Neural Information Processing Systems (NeurIPS), 2021
    arXiv version.

  7. Reward is Enough for Convex MDPs
    by Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, and Satinder Singh.
    In Neural Information Processing Systems (NeurIPS), 2021
    arXiv version.

  8. Reinforcement Learning of Implicit and Explicit Control Flow Instructions
    by Ethan Brooks, Janarthanan Rajendran, Richard Lewis, and Satinder Singh.
    In International Conference on Machine Learning (ICML), 2021
    arXiv version.

  9. Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in a First-Person Simulated 3D Environment
    by Wilka Carvalho, Anthony Liang, Kimin Lee, Honglak Lee, Richard Lewis, and Satinder Singh.
    In International Joint Conference on Artificial Intelligence (IJCAI), 2021
    arXiv version.

  10. Rational use of episodic and working memory: A normative account of prospective memory
    by Ida Momennejad, Jarrod Lewis-Peacock, Kenneth A. Norman, Jonathan D. Cohen, Satinder Singh, and Richard L. Lewis.
    In Neuropsychologia, vol 158, 2021
    pdf.

  11. Efficient Querying for Cooperative Probabilistic Commitments
    by Qi Zhang, Edmund Durfee, and Satinder Singh.
    In 35th AAAI Conference on Artificial Intelligence (AAAI), 2021
    arXiv version.

  12. The Value Equivalence Principle for Model-Based Reinforcement Learning
    by Christopher Grimm, Andre Barreto, Satinder Singh, and David Silver.
    In Thirty Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020
    arXiv version.

  13. Discovering Reinforcement Learning Algorithms
    by Junhyuk Oh, Matteo Hessel, Wojciech Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, and David Silver.
    In Thirty Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020
    arXiv version.

  14. Meta-Gradient Reinforcement Learning with an Objective Discovered Online
    by Zhongwen Xu, Hado van Hasselt, Matteo Hessel, Junhyuk Oh, Satinder Singh, and David Silver.
    In Thirty Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020
    arXiv version.

  15. Learning to Play No-Press Diplomacy with Best Response Policy Iteration
    by Thomas Anthony, Tom Eccles, Andrea Tacchetti, Janos Kramar, Ian Gemp, Thomas Hudson, Nicolas Porcel, Marc Lanctot, Julien Perolat, Richard Everett, Roman Werpachowski, Satinder Singh, Thore Graepel, and Yoram Bachrach.
    In Thirty Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020
    arXiv version.

  16. A Self-Tuning Actor-Critic Algorithm
    by Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, and Satinder Singh.
    In Thirty Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020
    arXiv version.

  17. On Efficiency in Hierarchical Reinforcement Learning
    by Zheng Wen, Doina Precup, Morteza Ibrahimi, Andre Barreto, Benjamin Van Roy, and Satinder Singh.
    In Thirty Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020
    pdf.

  18. What can Learned Intrinsic Rewards Capture?
    by Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado van Hasselt, David Silver, and Satinder Singh.
    In International Conference on Machine Learning (ICML), 2020.
    arXiv version.

  19. Sample Complexity of Reinforcement Learning Using Linearly Combined Model Ensembles
    by Aditya Modi, Nan Jiang, Ambuj Tewari, and Satinder Singh.
    In International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
    arXiv version.

  20. How Should An Agent Practice?
    by Janarthanan Rajendran, Richard Lewis, Vivek Veeriah, Honglak Lee, and Satinder Singh.
    In Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020.
    pdf.

  21. Modeling Probabilistic Commitments for Maintenance is Inherently Harder than for Achievement
    by Qi Zhang, Edmund Durfee, and Satinder Singh.
    In Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020.
    pdf.

  22. Querying to Find a Safe Policy under Uncertain Safety Constraints in Markov Decision Processes
    by Shun Zhang, Edmund Durfee, and Satinder Singh.
    In Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020.
    pdf.

  23. Discovery of Useful Questions as Auxiliary Tasks
    by Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Richard Lewis, Janarthanan Rajendran, Junhyuk Oh, Hado van Hasselt, David Silver, and Satinder Singh.
    In Neural Information Processing Systems (NeurIPS), 2019.
    arXiv version.

  24. Behavior Suite for Reinforcement Learning
    by Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, and Hado van Hasselt.
    In Neural Information Processing Systems (NeurIPS), 2019.
    arXiv version.

  25. Hindsight Credit Assignment
    by Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mohammad Gheshlaghi Azar, Bilal Piot, Nicolas Heess, Hado van Hasselt, Gregory Wayne, Satinder Singh, Doina Precup, and Remi Munos.
    In Neural Information Processing Systems (NeurIPS), 2019.
    pdf.

  26. No Press Diplomacy: Modeling Multi-Agent Gameplay
    by Philip Paquette, Yuchen Lu, Steven Bocco, Max O. Smith, Satya Ortiz-Gagne, Jonathan K. Kummerfeld, Satinder Singh, Joelle Pineau, and Aaron Courville.
    In Neural Information Processing Systems (NeurIPS), 2019.
    arXiv version.

  27. Disentangled Cumulants Help Successor Representations Transfer to New Tasks
    by Christopher Grimm, Irina Higgins, Andre Barreto, Denis Teplyashin, Markus Wulfmeier, Tim Hertweck, Raia Hadsell, and Satinder Singh.
    arXiv version.

  28. Deep Reinforcement Learning for Dynamic Multi-Driver Dispatching and Repositioning Problem
    by John Holler, Risto Vuorio, Tiancheng Jin, Satinder Singh, Zhiwei Qin, Jieping Ye, Xiaocheng Tang, Yan Jiao, and Chenxi Wang.
    In International Conference on Data Mining (ICDM-Short Paper), 2019.
    pdf.

  29. Learning Independently-Obtainable Reward Functions
    by Christopher Grimm and Satinder Singh.
    arXiv version.

  30. Many-Goals Reinforcement Learning
    by Vivek Veeriah, Junhyuk Oh, and Satinder Singh.
    arXiv version.

  31. Learning to Communicate and Solve Visual Blocks-World Tasks
    by Qi Zhang, Richard Lewis, Satinder Singh, and Edmund Durfee.
    In Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), 2019.
    pdf.

  32. On Learning Intrinsic Rewards for Policy Gradient Methods
    by Zeyu Zheng, Junhyuk Oh, and Satinder Singh.
    In Neural Information Processing Systems (NIPS), 2018.
    arXiv version.

  33. Generative Adversarial Self-Imitation Learning
    by Yijie Guo, Junhyuk Oh, Satinder Singh, and Honglak Lee.
    In Neural Information Processing Systems (NeurIPS), 2018.
    arXiv version.

  34. Completing State Representations Using Spectral Learning
    by Nan Jiang, Alex Kulesza, and Satinder Singh.
    In Neural Information Processing Systems (NIPS), 2018.
    pdf.

  35. Learning End-to-End Goal-Oriented Dialog with Multiple Answers
    by Janarthanan Rajendran, Jatin Ganhotra, Satinder Singh, and Lazaros Polymenakos.
    In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018.
    pdf.

  36. Self-Imitation Learning
    by Junhyuk Oh, Yijie Guo, Satinder Singh, and Honglak Lee.
    In International Conference on Machine Learning (ICML), 2018.
    arXiv version.

  37. Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes
    by Shun Zhang, Edmund Durfee, and Satinder Singh.
    In International Joint Conference on Artificial Intelligence (IJCAI), 2018.
    pdf.

  38. Markov Decision Processes with Continuous Side Information
    by Aditya Modi, Nan Jiang, Satinder Singh, and Ambuj Tewari.
    In International Conference on Algorithmic Learning Theory (ALT), 2018.
    conf pdf, arXiv link.

  39. The Advantage of Doubling: A Deep Reinforcement Learning Approach to Studying the Double Team in the NBA
    by Jiaxuan Wang, Ian Fox, Jonathan Skaza, Nick Linck, Satinder Singh, and Jenna Wiens.
    In Sloan Sports Analytics Conference, 2018.
    arXiv link.

  40. Value Prediction Networks
    by Junhyuk Oh, Satinder Singh, and Honglak Lee.
    In Neural Information Processing Systems (NIPS), 2017.
    arXiv link.

  41. Repeated Inverse Reinforcement Learning
    by Kareem Amin, Nan Jiang, and Satinder Singh.
    In Neural Information Processing Systems (NIPS), 2017.
    arXiv link.

  42. A Big Step for AI
    by Satinder Singh.
    In Nature: News & Views, 2017.
    pdf.

  43. Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning
    by Junhyuk Oh, Satinder Singh, Honglak Lee, and Pushmeet Kohli.
    In International Conference on Machine Learning (ICML), 2017.
    pdf.

  44. A Stackelberg Game Model for Botnet Data Exfiltration
    by Thang Nguyen, Michael Wellman, and Satinder Singh.
    In Proceedings of the 8th Conference on Decision and Game Theory for Security (GameSec), 2017.
    pdf.

  45. Learning to Query, Reason, and Answer Questions on Ambiguous Texts.
    by Xiaoxiao Guo, Tim Klinger, Clemens Rosenbaum, Joseph Bigus, Murray Campbell, Ban Kawas, Kartik Talamadupula, Gerald Tesauro, and Satinder Singh.
    In 5th International Conference on Learning Representations (ICLR), 2017.
    pdf.

  46. Approximately-Optimal Queries for Planning in Reward-Uncertain Markov Decision Processes.
    by Shun Zhang, Edmund Durfee, and Satinder Singh.
    In 27th International Conference on Automated Planning and Scheduling (ICAPS), 2017.
    pdf.

  47. Minimizing Maximum Regret in Commitment Constrained Sequential Decision Making.
    by Qi Zhang, Satinder Singh, and Edmund Durfee.
    In 27th International Conference on Automated Planning and Scheduling (ICAPS), 2017.
    pdf.

  48. Predicting Counselor Behaviors in Motivational Interviewing Encounters.
    by Veronica Perez-Rosas, Rada Mihalcea, Kenneth Resnicow, Satinder Singh, Lawrence An, Kathy J. Goggin, and Delwyn Catley.
    In Proceedings of the European Chapter of the Association for Computational Linguistics (EACL), 2017.
    pdf.

  49. Control of Memory, Active Perception, and Action in Minecraft.
    by Junhyuk Oh, Valliappa Chockalingam, Satinder Singh, and Honglak Lee.
    In 33rd International Conference on Machine Learning (ICML), 2016.
    pdf.

  50. Gradient Methods for Stackelberg Security Games.
    by Kareem Amin, Satinder Singh, and Michael Wellman.
    In Conference on Uncertainty in Artificial Intelligence (UAI), 2016.
    pdf.

  51. Deep Learning for Reward Design to Improve Monte Carlo Tree Search in ATARI Games.
    by Xiaoxiao Guo, Satinder Singh, Richard Lewis, and Honglak Lee.
    In 25th International Joint Conference on Artificial Intelligence (IJCAI), 2016.
    pdf.

  52. Commitment Semantics for Sequential Decision Making Under Reward Uncertainty.
    by Qi Zhang, Edmund Durfee, Satinder Singh, Anna Chen, and Stefan Witwicki.
    In 25th International Joint Conference on Artificial Intelligence (IJCAI), 2016.
    pdf.

  53. On Structural Properties of MDPs that Bound Loss Due to Shallow Planning.
    by Nan Jiang, Satinder Singh and Ambuj Tewari.
    In 25th International Joint Conference on Artificial Intelligence (IJCAI), 2016.
    pdf.

  54. On the Trustworthy Fulfillment of Commitments.
    by Edmund Durfee and Satinder Singh.
    In Proceedings of the 18th International Workshop on Trust in Agent Societies (TRUST), 2016.
    pdf.

  55. Building a Motivational Interviewing Dataset.
    by Veronica Perez-Rosas, Rada Mihalcea, Kenneth Resnicow, Lawrence An, and Satinder Singh.
    In Proceedings of the NAACL 2016 Workshop on Clinical Psychology, 2016.
    pdf.

  56. Patient-Centered Pain Care Using Artificial Intelligence and Mobile Health Tools: Protocol for a Randomized Study Funded by the US Department of Veterans Affairs Health Services Research and Development Program.
    by Piette JD, Krein SL, Striplin D, Marinec N, Kerns RD, Farris KB, Singh S, An L, and Heapy AA.
    In JMIR Research Protocols; 5(2) 2016.
    pdf.

  57. Improving Predictive State Representations via Gradient Descent.
    by Nan Jiang, Alex Kulesza, and Satinder Singh.
    In Thirtieth AAAI Conference on Artificial Intelligence (AAAI), 2016.
    pdf.

  58. Confirming the theoretical structure of expert-developed text messages to improve adherence to anti-hypertensive medications.
    by Karen Farris, Teresa Salgado, Peter Batra, John Piette, Satinder Singh, Ahmed Guhad, Sean Newman, Vincent Marshall, and Larry An.
    In Research in Social and Administrative Pharmacy, 2015.
    pdf.

  59. Action-Conditional Video Prediction Using Deep Networks in ATARI Games.
    by Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard Lewis, and Satinder Singh.
    In Neural Information Processing Systems, 2015.
    online videos
    arxiv pdf, NIPS pdf, NIPS Appendix pdf.

  60. Multi-Task Seizure Detection: Addressing Inter-Patient and Intra-Patient Variations in Seizure Morphologies.
    by Alex Van Esbroeck, Landon Smith, Zeeshan Syed, Satinder Singh, and Zahi Karam.
    In Machine Learning, 2015.
    pdf.

  61. Abstraction Selection in Model-Based Reinforcement Learning.
    by Nan Jiang, Alex Kulesza, and Satinder Singh.
    In 32nd International Conference on Machine Learning (ICML), 2015.
    pdf.

  62. The Dependence of Effective Planning Horizon on Model Accuracy.
    by Nan Jiang, Alex Kulesza, Satinder Singh, and Richard Lewis.
    In International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), 2015.
    Best Paper Award
    pdf.

  63. Low-Rank Spectral Learning with Weighted Loss Functions.
    by Alex Kulesza, Nan Jiang, and Satinder Singh.
    In Eighteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2015.
    pdf.

  64. Spectral Learning of Predictive State Representations with Insufficient Statistics.
    by Alex Kulesza, Nan Jiang, and Satinder Singh.
    In Twenty-Ninth AAAI Conference, 2015.
    pdf.

  65. Optimal Rewards for Cooperative Agents.
    by Bingyao Liu, Satinder Singh, Richard Lewis, and Shiyin Qin.
    In IEEE Transactions on Autonomous Mental Development, Vol 6, Issue 4, 2014.
    pdf.

  66. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning.
    by Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard Lewis, and Xiaoshi Wang.
    In Neural Information Processing Systems (NIPS), 2014.
    pdf.

  67. Computationally Rational Saccadic Control: An Explanation of Spillover Effects Based on Sampling from Noisy Perception and Memory.
    by Michael Shvartsman, Richard L Lewis, and Satinder Singh.
    In Cognitive Modeling and Computational Linguistics (CMCL), 2014.
    pdf.

  68. The Potential Impact of Intelligent Systems for Mobile Health Self-Management Support: Monte-Carlo Simulations of Text Message Support for Medication Adherence.
    by John Piette, Karen Farris, Sean Newman, Larry An, Jeremy Sussman, and Satinder Singh.
    In Annals of Behavioral Medicine, 2014.
    pdf.

  69. Low-Rank Spectral Learning.
    by Alex Kulesza, Nadakuditi Raj Rao, and Satinder Singh.
    In Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2014.
    pdf.

  70. Evaluating Trauma Patients: Addressing Missing Covariates with Joint Optimization.
    by Alex Van Esbroeck, Satinder Singh, Ilan Rubinfeld, and Zeeshan Syed.
    In 28th AAAI Conference on Artificial Intelligence (AAAI-14), 2014.
    pdf.

  71. Predicting Postoperative Atrial Fibrillation from Independent ECG Components.
    by Chih-Chun Chia, James Blum, Zahi Karam, Satinder Singh, and Zeeshan Syed.
    In 28th AAAI Conference on Artificial Intelligence (AAAI-14), 2014.
    pdf.

  72. Ecologically Valid Long-Term Mood Monitoring of Individuals with Bipolar Disorder Using Speech.
    by Zahi Karam, Emily Mower Provost, Satinder Singh, Jennifer Montgomery, Christopher Archer, Gloria Harrington, and Melvin McInnis.
    In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2014.
    pdf.

  73. Characterizing EVOI-Sufficient k-Response-Query Sets in Decision Problems.
    by Robert Cohn, Satinder Singh, and Edmund Durfee.
    In Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2014.
    pdf.

  74. Improving UCT Planning via Approximate Homomorphisms.
    by Nan Jiang, Satinder Singh, and Richard Lewis.
    In 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2014.
    pdf.

  75. Utility Maximization and Bounds on Human Information Processing.
    by Andrew Howes, Richard L Lewis, and Satinder Singh.
    In Topics in Cognitive Science, Volume 6, Issue 2, pages 198-203, 2014.
    pdf.

  76. Computing Solutions in infinite-horizon discounted adversarial patrolling games.
    by Yevgeniy Vorobeychik, Bo An, Milind Tambe, and Satinder Singh.
    In 24th International Conference on Automated Planning and Scheduling (ICAPS), 2014.
    pdf.

  77. Computational Rationality: Linking Mechanism and Behavior Through Utility Maximization.
    by Richard L Lewis, Andrew Howes, and Satinder Singh.
    In Topics in Cognitive Science, Volume 6, Issue 2, pages 279-311, 2014.
    pdf.

  78. Reward Mapping for Transfer in Long-Lived Agents.
    by Xiaoxiao Guo, Satinder Singh, and Richard L Lewis.
    In Advances in Neural Information Processing Systems (NIPS), 26, 2013.
    pdf.

  79. The adaptive nature of eye-movements in linguistic tasks: How payoff and architecture shape speed-accuracy tradeoffs.
    by Richard L Lewis, Michael Shvartsman, and Satinder Singh.
    In Topics in Cognitive Science, Vol. 5, Issue 3, pages 581-610, 2013.
    pdf.

  80. Linking Context to Evaluation in the Design of Safety Critical Interfaces.
    by Michael Feary, Dorrit Billman, Xiuli Chen, Andrew Howes, Richard Lewis, Lance Sherry, and Satinder Singh.
    In Proceedings of Human-Computer Interaction International, 2013.
    pdf.

  81. An Exploration of Low-Rank Spectral Learning.
    by Alex Kulesza, Nadakuditi Raj Rao, and Satinder Singh.
    In ICML Workshop on Spectral Learning, 2013.
    pdf.

  82. Maximizing the Value of Mobile Health Monitoring by Avoiding Redundant Patient Records: Prediction of Depression-Related Symptoms and Adherence Problems in Automated Health Assessment Services.
    by John Piette, Jeremy Sussman, Paul Pfeiffer, Maria Silveira, Satinder Singh, and Mariel Lavieri.
    In Journal of Medical Internet Research, Vol 15, No. 7, 2013.

  83. Testing the Structure of SMS Messages for use in an Artificial Intelligence (AI)-driven SMS Antihypertensive Adherence Support Tool.
    by Karen Farris, Sean Newman, Satinder Singh, Larry An, and John Piette.
    Research Abstract in Wireless Health, 2013.
    pdf.

  84. Optimal Rewards in Multiagent Teams
    by Bingyao Liu, Satinder Singh, Richard L. Lewis, and Shiyin Qin.
    In International Conference on Development and Learning-EpiRob, 2012.
    pdf.

  85. Lossy Stochastic Game Abstraction with Bounds
    by Tuomas Sandholm and Satinder Singh.
    In Proceedings of the 13th ACM Conference on Electronic Commerce (EC), 2012.
    pdf.
    A previous version appears in Fifth International Workshop on Optimization in Multi-Agent Systems (OPTMAS), 2012.

  86. Learning and Predicting Dynamic Networked Behavior with Graphical Multiagent Models
    by Quang Duong, Michael P. Wellman, Satinder Singh, and Michael Kearns.
    In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2012.
    pdf.

  87. Strong Mitigation: Nesting Search for Good Policies within Search for Good Reward
    by Jeshua Bratman, Satinder Singh, Richard Lewis, and Jonathan Sorg.
    In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2012.
    pdf.

  88. Security Games with Limited Surveillance
    by Bo An, David Kempe, Christopher Kiekintveld, Eric Shieh, Satinder Singh, Milind Tambe, and Yevgeniy Vorobeychik.
    In Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (AAAI), 2012.
    pdf.

  89. Computing Stackelberg Equilibria in Discounted Stochastic Games
    by Yevgeniy Vorobeychik and Satinder Singh.
    In Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (AAAI), 2012.
    pdf.
    (This is a corrected version of the paper that appeared in the conference proceedings.
    Major thanks to Vincent Conitzer for finding a counterexample to the main theorem in the now-corrected submitted version.)

  90. Planning Delayed-Response Queries and Transient Policies under Reward Uncertainty
    by Rob Cohn, Edmund Durfee and Satinder Singh.
    In Proceedings of the Seventh Annual Workshop on Multiagent Sequential Decision-Making Under Uncertainty (MSDM), held in conjunction with AAMAS, 2012.
    pdf.

  91. Planning and Evaluating Multiagent Influences Under Reward Uncertainty (Extended Abstract)
    by Stefan Witwicki, Inn-Tung Chen, Edmund Durfee and Satinder Singh.
    In 11th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2012.
    pdf.

  92. Learning to Make Predictions in Partially Observable Environments without a Generative Model
    by Erik Talvitie and Satinder Singh.
    In Journal of Artificial Intelligence Research, vol 42, pages 353-392, 2011.
    pdf.

  93. Optimal Rewards versus Leaf-Evaluation Heuristics in Planning Agents
    by Jonathan Sorg, Satinder Singh, and Richard Lewis.
    In Proceedings of the Twenty-Fifth Conference on Artificial Intelligence (AAAI), 2011.
    pdf.

  94. Comparing Action-Query Strategies in Semi-Autonomous Agents
    by Robert Cohn, Edmund Durfee, and Satinder Singh.
    In Proceedings of the Twenty-Fifth Conference on Artificial Intelligence (AAAI), 2011.
    pdf.
    An extended abstract also appears in the Proceedings of the 10th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), 2011.

  95. Learning and Predicting Dynamic Behavior with Graphical Multiagent Models
    by Quang Duong, Michael P. Wellman, Satinder Singh, and Michael Kearns.
    In 5th International Workshop on Social Networks Mining and Analysis at KDD (SNACKDD-11), 2011.
    pdf.
    An extended abstract also appears in the Proceedings of the 2nd Workshop on Information in Networks (WIN-10), 2010.

  96. Modeling Information Diffusion in Networks with Unobserved Links
    by Quang Duong, Michael P. Wellman, and Satinder Singh.
    In 3rd IEEE Conference on Social Computing (SocialCom-11), 2011.
    pdf.
    An earlier version also appears in the 5th International Workshop on Social Networks Mining and Analysis at KDD (SNACKDD-11), 2011.

  97. Dynamic Incentive Mechanisms
    by David C. Parkes, Ruggiero Cavallo, Florin Constantin and Satinder Singh.
    In AI Magazine, Vol. 31, No. 4, pages 79-94, 2010.
    pdf.

  98. Reward Design via Online Gradient Ascent
    by Jonathan Sorg, Satinder Singh, and Richard Lewis.
    In Neural Information Processing Systems (NIPS), 2010.
    pdf.

  99. Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective
    by Satinder Singh, Richard Lewis, Andrew Barto, and Jonathan Sorg.
    In IEEE Transactions on Autonomous Mental Development, Vol 2, No 2, 2010.
    pdf

  100. Modeling Multiple-mode Systems with Predictive State Representations
    by Britton Wolfe, Michael James and Satinder Singh.
    In Proceedings of the 13th International IEEE Conference on Intelligent Transportation Systems, 2010.
    pdf

  101. Variance-Based Rewards for Approximate Bayesian Reinforcement Learning
    by Jonathan Sorg, Satinder Singh, and Richard Lewis.
    In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), 2010.
    pdf

  102. Internal Rewards Mitigate Agent Boundedness
    by Jonathan Sorg, Satinder Singh, and Richard Lewis.
    In Proceedings of the 27th International Conference on Machine Learning (ICML), 2010.
    pdf

  103. A New Approach to Exploring Language Emergence as Boundedly Optimal Control in the Face of Environmental and Cognitive Constraints
    by Jeshua Bratman, Michael Shvartsman, Richard Lewis, and Satinder Singh.
    In Proceedings of the 10th International Conference on Cognitive Modeling (ICCM), 2010.
    (Honorable mention for Allen Newell Best Student Paper Award at ICCM)
    pdf

  104. Selecting Operator Queries Using Expected Myopic Gain
    by Robert Cohn, Michael Maxim, Edmund Durfee, and Satinder Singh.
    In Proceedings of the International Conference on Intelligent Agent Technology (IAT), 2010.
    pdf

  105. History-Dependent Graphical Multiagent Models
    by Quang Duong, Michael Wellman, Satinder Singh, and Yevgeniy Vorobeychik.
    In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2010.
    pdf

  106. Linear Options
    by Jonathan Sorg and Satinder Singh.
    In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2010.
    (Finalist for Pragnesh Jay Modi Best Student Paper Award)
    pdf

  107. Transfer via Soft Homomorphisms
    by Jonathan Sorg and Satinder Singh.
    In Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2009.
    pdf

  108. SarsaLandmark: an Algorithm for Learning in POMDPs with Landmarks
    by Michael R. James and Satinder Singh.
    In Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2009.
    pdf

  109. Learning Graphical Game Models
    by Quang Duong, Yevgeniy Vorobeychik, Satinder Singh and Michael Wellman.
    In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), 2009.
    pdf

  110. Where Do Rewards Come From?
    by Satinder Singh, Richard L. Lewis and Andrew G. Barto.
    In Proceedings of the Annual Conference of the Cognitive Science Society (CogSci), 2009.
    pdf

  111. Maintaining Predictions Over Time Without a Model
    by Erik Talvitie and Satinder Singh.
    In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), 2009.
    pdf

  112. Simple Local Models for Complex Dynamical Systems
    by Erik Talvitie and Satinder Singh.
    In Proceedings of the 22nd Annual Conference on Neural Information Processing Systems (NIPS), 2008.
    pdf

  113. Efficiently Learning Linear-Linear Exponential Family Predictive Representations of State
    by David Wingate and Satinder Singh.
    In Proceedings of the 25th International Conference on Machine Learning (ICML), pages 1176-1183, 2008.
    pdf

  114. Building Incomplete but Accurate Models
    by Erik Talvitie, Britton Wolfe and Satinder Singh.
    In Proceedings of the Tenth International Symposium on Artificial Intelligence and Mathematics (ISAIM), 2008.
    pdf

  115. Predictive Linear-Gaussian Models of Stochastic Dynamical Systems with Vector-Valued Actions and Observations
    by Matthew Rudary and Satinder Singh.
    In Proceedings of the Tenth International Symposium on Artificial Intelligence and Mathematics (ISAIM), 2008.
    pdf

  116. Knowledge Combination in Graphical Multiagent Models
    by Quang Duong, Michael Wellman and Satinder Singh.
    In Proceedings of the 24th Annual Conference on Uncertainty in Artificial Intelligence (UAI), 2008.
    pdf

  117. Approximate Predictive State Representations
    by Britton Wolfe, Michael R. James and Satinder Singh.
    In Proceedings of the 2008 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2008.
    (Finalist for Pragnesh Jay Modi Best Student Paper Award)
    pdf

  118. Learning Payoff Functions in Infinite Games
    by Yevgeniy Vorobeychik, Michael Wellman and Satinder Singh.
    Machine Learning Journal 67:145-168, 2007.
    pdf

  119. Constraint Satisfaction Algorithms for Graphical Games
    by Vishal Soni, Satinder Singh and Michael Wellman.
    In Proceedings of the 2007 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2007.
    pdf

  120. On Discovery and Learning of Models with Predictive Representations of State for Agents with Continuous Actions and Observations
    by David Wingate and Satinder Singh.
    In Proceedings of the 2007 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2007.
    pdf

  121. Relational Knowledge with Predictive State Representations
    by David Wingate, Vishal Soni, Britton Wolfe and Satinder Singh.
    In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), pages 2035-2040, 2007.
    pdf

  122. An Experts Algorithm for Transfer Learning
    by Erik Talvitie and Satinder Singh.
    In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), pages 1065-1070, 2007.
    pdf

  123. Abstraction in Predictive State Representations
    by Vishal Soni and Satinder Singh.
    In Proceedings of the 22nd National Conference on Artificial Intelligence (AAAI), 2007.
    pdf

  124. Exponential Family Predictive Representations of State
    by David Wingate and Satinder Singh.
    In Proceedings of the Advances in Neural Information Processing Systems, 20 (NIPS), pages 1617-1624, 2007.
    pdf

  125. Cobot in LambdaMOO: An Adaptive Social Statistics Agent
    by Charles Isbell, Michael Kearns, Satinder Singh, Christian Shelton, Peter Stone and Dave Kormann.
    In Journal of Autonomous Agents and Multi-Agent Systems, 13(3), pages 327-354, 2006.
    pdf

  126. Mixtures of Predictive Linear Gaussian Models for Nonlinear Stochastic Dynamical Systems
    by David Wingate and Satinder Singh.
    In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI), 2006.
    pdf

  127. Using Homomorphisms to Transfer Options Across Reinforcement Learning Domains
    by Vishal Soni and Satinder Singh.
    In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI), 2006.
    pdf

  128. Kernel Predictive Linear-Gaussian Models for Nonlinear Stochastic Dynamical Systems
    by David Wingate and Satinder Singh.
    In Proceedings of the 23rd International Conference on Machine Learning (ICML), pages 1017-1024, 2006.
    pdf

  129. Predictive linear-Gaussian models of controlled stochastic dynamical systems
    by Matthew Rudary and Satinder Singh.
    In Proceedings of the 23rd International Conference on Machine Learning (ICML), pages 777-784, 2006.
    pdf

  130. Predictive State Representations with Options
    by Britton Wolfe and Satinder Singh.
    In Proceedings of the 23rd International Conference on Machine Learning (ICML), pages 1025-1032, 2006.
    pdf

  131. Empirical Game-Theoretic Analysis of Chaturanga
    by Christopher Kiekintveld, Michael Wellman and Satinder Singh.
    In Proceedings of AAMAS-06 Workshop on Game-Theoretic and Decision-Theoretic Agents, 2006.
    pdf

  132. Optimal Coordinated Planning Amongst Self-Interested Agents with Private State
    by Ruggiero Cavallo, David C. Parkes and Satinder Singh.
    In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI), 2006.
    pdf

  133. Optimal Coordination of Loosely-Coupled Self-Interested Robots
    by Ruggiero Cavallo, David C. Parkes, and Satinder Singh.
    In Workshop on Auction Mechanisms for Robot Coordination at AAAI'06, 2006.
    pdf

  134. Reinforcement Learning of Hierarchical Skills on the Sony Aibo Robot
    by Vishal Soni and Satinder Singh.
    In Proceedings of the 5th International Conference on Development and Learning (ICDL), 2006.
    pdf

  135. Off-policy Learning with Options and Recognizers
    by Doina Precup, Richard Sutton, Cosmin Paduraru, Anna Koop and Satinder Singh.
    In Proceedings of Advances in Neural Information Processing Systems 18 (NIPS), pages 1097-1104, 2006.
    pdf

  136. Intrinsically Motivated Reinforcement Learning
    by Satinder Singh, Andrew G. Barto and Nuttapong Chentanez.
    In Proceedings of Advances in Neural Information Processing Systems 17 (NIPS), pages 1281-1288, 2005.
    pdf

  137. Approximately Efficient Online Mechanism Design
    by David Parkes, Satinder Singh and Dimah Yanovsky.
    In Proceedings of Advances in Neural Information Processing Systems 17 (NIPS), pages 1049-1056, 2005.
    pdf

  138. Predictive linear-Gaussian models of stochastic dynamical systems
    by Matthew Rudary, Satinder Singh and David Wingate.
    In Proceedings of the Uncertainty in Artificial Intelligence (UAI), pages 501-508, 2005.
    pdf

  139. Combining Memory and Landmarks with Predictive State Representations
    by Michael R. James, Britton Wolfe and Satinder Singh.
    In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), 2005.
    pdf

  140. Learning Payoff Functions in Infinite Games
    by Yevgeniy Vorobeychik, Michael Wellman and Satinder Singh.
    In Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), 2005.
    pdf
    (An expanded version was later published in the Machine Learning Journal; pdf)

  141. Planning in Models that Combine Memory with Predictive Representations of State
    by Michael R. James and Satinder Singh.
    In Proceedings of the 20th National Conference on Artificial Intelligence (AAAI), pages 987-992, 2005.
    pdf

  142. Learning Predictive State Representations in Dynamical Systems Without Reset
    by Britton Wolfe, Michael R. James and Satinder Singh.
    In Proceedings of the 22nd International Conference on Machine Learning (ICML), 2005.
    pdf

  143. Intrinsically Motivated Learning of Hierarchical Collections of Skills
    by Andrew G. Barto, Satinder Singh, and Nuttapong Chentanez.
    In Proceedings of International Conference on Developmental Learning (ICDL), 2004.
    pdf

  144. Predictive State Representations: A New Theory for Modeling Dynamical Systems
    by Satinder Singh, Michael R. James and Matthew R. Rudary.
    In Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (UAI), pages 512-519, 2004.
    pdf

  145. Learning and Discovery of Predictive State Representations in Dynamical Systems with Reset
    by Michael James and Satinder Singh.
    In Proceedings of the Twenty-First International Conference on Machine Learning (ICML), pages 417-424, 2004.
    pdf

  146. Adaptive Cognitive Orthotics: Combining Reinforcement Learning and Constraint-Based Temporal Reasoning
    by Matthew Rudary, Satinder Singh and Martha Pollack.
    In Proceedings of the Twenty-First International Conference on Machine Learning (ICML), pages 719-726, 2004.
    pdf

  147. Planning with Predictive State Representations
    by Michael R. James, Satinder Singh and Michael Littman.
    In Proceedings of the International Conference on Machine Learning and Applications (ICMLA), pages 304-311, 2004.
    pdf

  148. Computing Approximate Bayes Nash Equilibria in Tree-Games of Incomplete Information
    by Satinder Singh, Vishal Soni and Michael Wellman.
    In Proceedings of the Fifth ACM Conference on Electronic Commerce (EC), pages 81-90, 2004.
    pdf

  149. Distributed Feedback Control for Decision Making on Supply Chains
    by Christopher Kiekintveld, Michael P. Wellman, Satinder Singh, Joshua Estelle, Yevgeniy Vorobeychik, Vishal Soni and Matthew Rudary.
    In Proceedings of the 14th International Conference on Automated Planning and Scheduling (ICAPS), pages 384-392, 2004.
    pdf.

  150. Strategic Interactions in the TAC 2003 Supply Chain Tournament
    by Joshua Estelle, Yevgeniy Vorobeychik, Michael P. Wellman, Satinder Singh, Christopher Kiekintveld and Vishal Soni.
    In Proceedings of the Fourth International Conference on Computer & Games, 2004.
    pdf

  151. A Nonlinear Predictive State Representation
    by Matthew Rudary and Satinder Singh.
    In Advances in Neural Information Processing Systems 16 (NIPS), pages 855-862, 2004.
    pdf

  152. Strategic Procurement in TAC/SCM: An Empirical Game-Theoretic Analysis
    by Joshua Estelle, Yevgeniy Vorobeychik, Michael P. Wellman, Satinder Singh, Christopher Kiekintveld, and Vishal Soni.
    In Workshop on Trading Agent Design and Analysis (TADA), 2004.
    pdf

  153. An MDP-Based Approach to Online Mechanism Design
    by David Parkes and Satinder Singh.
    In Advances in Neural Information Processing Systems 16 (NIPS), pages 791-798, 2004.
    pdf

  154. Learning Predictive State Representations
    by Satinder Singh, Michael Littman, Nicholas Jong, David Pardoe and Peter Stone.
    In Proceedings of the Twentieth International Conference on Machine Learning (ICML), pages 712-719, 2003.
    pdf

  155. Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System
    by Satinder Singh, Diane Litman, Michael Kearns and Marilyn Walker.
    In Journal of Artificial Intelligence Research (JAIR), Volume 16, pages 105-133, 2002.
    pdf

  156. CobotDS: A Spoken Dialogue System for Chat
    by Michael Kearns, Charles Isbell, Satinder Singh, Diane Litman, and J. Howe.
    In Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI), pages 425-430, 2002.
    pdf

  157. Near-Optimal Reinforcement Learning in Polynomial Time
    by Michael Kearns and Satinder Singh.
    In Machine Learning journal, Volume 49, Issue 2, pages 209-232, 2002.
    (A shorter version appears in ICML 1998.)
    pdf

  158. Predictive Representations of State
    by Michael Littman, Richard Sutton and Satinder Singh.
    In Advances in Neural Information Processing Systems 14 (NIPS), pages 1555-1561, 2002.
    pdf

  159. ATTac-2000: An Adaptive Autonomous Bidding Agent
    by Peter Stone, Michael Littman, Satinder Singh and Michael Kearns.
    In Journal of Artificial Intelligence Research (JAIR), Vol 15, pages 189-206, 2001.
    pdf.
    (A shorter version also appears in AAAI'01 as listed below)

  160. Graphical Models for Game Theory
    by Michael Kearns, Michael Littman and Satinder Singh.
    In Proceedings of the Seventeenth Annual Conference on Uncertainty in Artificial Intelligence (UAI), pages 253-260, 2001.
    pdf

  161. An Efficient Exact Algorithm for Singly Connected Graphical Games
    by Michael Littman, Michael Kearns and Satinder Singh.
    In Advances in Neural Information Processing Systems 14 (NIPS), pages 817-823, 2002.
    pdf

  162. FAucs: An FCC Spectrum Auction Simulator for Autonomous Bidding Agents
    by Janos Csirik, Michael Littman, Satinder Singh and Peter Stone.
    In Electronic Commerce: Proceedings of the Second International Workshop, 2001.
    pdf

  163. ATTac-2000: An Adaptive Autonomous Bidding Agent
    by Peter Stone, Michael Littman, Satinder Singh and Michael Kearns.
    In Proceedings of the Fifth International Conference on Autonomous Agents (AGENTS), pages 238-245, 2001.
    pdf

  164. Cobot: A Social Reinforcement Learning Agent
    by Charles Isbell, Christian Shelton, Michael Kearns, Satinder Singh and Peter Stone.
    In Advances in Neural Information Processing Systems 14 (NIPS), pages 1393-1400, 2002.
    pdf

  165. A Social Reinforcement Learning Agent
    by Charles Isbell, Christian Shelton, Michael Kearns, Satinder Singh and Peter Stone.
    In Proceedings of the Fifth International Conference on Autonomous Agents (AGENTS), pages 377-384, 2001.
    Winner of Best Paper Award.
    pdf

  166. Empirical Evaluation of a Reinforcement Learning Spoken Dialogue System
    by Satinder Singh, Michael Kearns, Diane Litman, and Marilyn Walker.
    In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI), pages 645-651, 2000.
    pdf

  167. Cobot in LambdaMOO: A Social Statistics Agent
    by Charles Isbell, Michael Kearns, Dave Kormann, Satinder Singh and Peter Stone.
    In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI), pages 36-41, 2000.
    pdf

  168. Automatic Optimization of Dialogue Management
    by Diane Litman, Michael Kearns, Satinder Singh and Marilyn Walker.
    In Proceedings of the 18th International Conference on Computational Linguistics (COLING), pages 502-508, 2000.
    pdf

  169. A Boosting Approach to Topic Spotting on Subdialogues
    by Kary Myers, Michael Kearns, Satinder Singh and Marilyn Walker.
    In Proceedings of the Seventeenth International Conference on Machine Learning (ICML), pages 655-662, 2000.
    pdf

  170. Eligibility Traces for Off-Policy Policy Evaluation
    by Doina Precup, Richard Sutton, and Satinder Singh.
    In Proceedings of the Seventeenth International Conference on Machine Learning (ICML), pages 759-766, 2000.
    pdf

  171. Nash Convergence of Gradient Dynamics in General-Sum Games
    by Satinder Singh, Michael Kearns and Yishay Mansour.
    In Proceedings of the Sixteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI), pages 541-548, 2000.
    pdf

  172. Fast Planning in Stochastic Games
    by Michael Kearns, Yishay Mansour, and Satinder Singh.
    In Proceedings of the Sixteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI), pages 309-316, 2000.
    pdf

  173. "Bias-Variance" Error Bounds for Temporal Difference Updates
    by Michael Kearns and Satinder Singh.
    In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory (COLT), pages 142-147, 2000.
    pdf

  174. Reinforcement Learning for Spoken Dialogue Systems
    by Satinder Singh, Michael Kearns, Diane Litman and Marilyn Walker.
    In Advances in Neural Information Processing Systems 12 (NIPS), 2000.
    pdf

  175. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms
    by Satinder Singh, Tommi Jaakkola, Michael Littman, and Csaba Szepesvari.
    In Machine Learning Journal, vol 38(3), pages 287-308, 2000.
    pdf

  176. Policy Gradient Methods for Reinforcement Learning with Function Approximation
    by Richard Sutton, Dave McAllester, Satinder Singh and Yishay Mansour.
    In Advances in Neural Information Processing Systems 12 (NIPS), 2000.
    pdf

  177. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
    by Richard Sutton, Doina Precup and Satinder Singh.
    In Artificial Intelligence Journal, Volume 112, pages 181-211, 1999.
    pdf

  178. Approximate Planning for Factored POMDPs using Belief State Simplification
    by Dave McAllester and Satinder Singh.
    In Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI), pages 409-416, 1999.
    pdf

  179. On the Complexity of Policy Iteration
    by Yishay Mansour and Satinder Singh.
    In Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI), pages 401-408, 1999.
    pdf

  180. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms
    by Michael Kearns and Satinder Singh.
    In Advances in Neural Information Processing Systems 11 (NIPS), pages 996-1002, 1999.
    pdf

  181. Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes
    by John K. Williams and Satinder Singh.
    In Advances in Neural Information Processing Systems 11 (NIPS), pages 1073-1079, 1999.
    pdf

  182. Optimizing admission control while ensuring quality of service in multimedia networks via reinforcement learning
    by Timothy Brown, Hong Tong, and Satinder Singh.
    In Advances in Neural Information Processing Systems 11 (NIPS), pages 982-988, 1999.
    pdf

  183. Improved switching among temporally abstract actions
    by Richard Sutton, Satinder Singh, Doina Precup and Balaraman Ravindran.
    In Advances in Neural Information Processing Systems 11 (NIPS), pages 1066-1072, 1999.
    pdf

  184. Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes
    by John Loch and Satinder Singh.
    In Proceedings of the Fifteenth International Conference on Machine Learning (ICML), pages 323-331, 1998.
    pdf

  185. Near-Optimal Reinforcement Learning in Polynomial Time
    by Michael Kearns and Satinder Singh.
    In Proceedings of the Fifteenth International Conference on Machine Learning (ICML), pages 260-268, 1998.
    pdf

  186. Intra-Option Learning about Temporally Abstract Actions
    by Richard Sutton, Doina Precup and Satinder Singh.
    In Proceedings of the Fifteenth International Conference on Machine Learning (ICML), pages 556-564, 1998.
    pdf

  187. Theoretical Results on Reinforcement Learning with Temporally Abstract Behaviors
    by Doina Precup, Richard Sutton, and Satinder Singh.
    In Proceedings of the 10th European Conference on Machine Learning (ECML), pages 382-393, 1998.
    pdf

  188. Hierarchical Optimal Control of MDPs
    by Amy McGovern, Doina Precup, Balaraman Ravindran, Satinder Singh and Richard Sutton.
    In Proceedings of the Tenth Yale Workshop on Adaptive and Learning Systems, 1998.
    pdf

  189. How to Dynamically Merge Markov Decision Processes
    by Satinder Singh and David Cohn.
    In Advances in Neural Information Processing Systems 10 (NIPS), pages 1057-1063, 1998.
    pdf

  190. Analytical Mean Squared Error Curves for Temporal Difference Learning
    by Satinder Singh and Peter Dayan.
    In Machine Learning Journal, Volume 32, Issue 1, pages 5-40, 1998.
    pdf.
    A shorter version appears in the NIPS 9 Proceedings

  191. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems
    by Satinder Singh and Dimitri Bertsekas.
    In Advances in Neural Information Processing Systems 9 (NIPS), pages 974-980, 1997.
    pdf

  192. Planning with Closed-Loop Macro Actions
    by Doina Precup, Richard Sutton and Satinder Singh.
    In Proceedings of AAAI Fall Symposium on Model-directed Autonomous Systems, 1997.
    pdf

  193. Predicting Lifetimes in Dynamically Allocated Memory
    by David Cohn and Satinder Singh.
    In Advances in Neural Information Processing Systems 9 (NIPS), pages 939-945, 1997.
    pdf

  194. Analytical Mean Squared Error Curves for Temporal Difference Learning
    by Satinder Singh and Peter Dayan.
    In Advances in Neural Information Processing Systems 9 (NIPS), pages 1054-1060, 1997.
    pdf

  195. Reinforcement Learning with Replacing Eligibility Traces
    by Satinder Singh and Richard Sutton.
    In Machine Learning journal, Volume 22, Issue 1, pages 123-158, 1996.
    pdf abstract

  196. Learning Curve Bounds for Markov Decision Processes with Undiscounted Rewards
    by Lawrence Saul and Satinder Singh.
    In Proceedings of 9th Annual Conference on Computational Learning Theory (COLT), pages 147-156, 1996.
    pdf

  197. Long Term Potentiation, Navigation and Dynamic Programming
    by Peter Dayan and Satinder Singh.
    In Proceedings of Computation and Neural Systems Meeting (CNS), 1996.
    pdf

  198. Improving Policies Without Measuring Merits
    by Peter Dayan and Satinder Singh.
    In Advances in Neural Information Processing Systems 8 (NIPS), pages 1059-1065, 1996.
    pdf

  199. Markov Decision Processes in Large State Spaces
    by Lawrence Saul and Satinder Singh.
    In Proceedings of 8th Annual Workshop on Computational Learning Theory (COLT), pages 281-288, 1995.
    pdf

  200. Learning to Act using Real-Time Dynamic Programming
    by Andrew Barto, Steve Bradtke and Satinder Singh.
    In Artificial Intelligence, Volume 72, pages 81-138, 1995.
    pdf

  201. On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
    by Tommi Jaakkola, Michael Jordan and Satinder Singh.
    In Neural Computation, Volume 6, Number 6, pages 1185-1201, 1994.
    pdf

  202. Reinforcement Learning With Soft State Aggregation
    by Satinder Singh, Tommi Jaakkola and Michael Jordan.
    In Advances in Neural Information Processing Systems 7 (NIPS), pages 361-368, 1995.
    pdf

  203. Stochastic Convergence of Iterative DP Algorithms
    by Tommi Jaakkola, Michael Jordan and Satinder Singh.
    In Advances in Neural Information Processing Systems 6 (NIPS), pages 703-710, 1994.
    pdf

  204. Reinforcement Learning Algorithm for Partially Observable Markov Problems
    by Tommi Jaakkola, Satinder Singh and Michael Jordan.
    In Advances in Neural Information Processing Systems 7 (NIPS), pages 345-352, 1995.
    pdf

  205. Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes
    by Satinder Singh.
    In Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI), pages 700-705, 1994.
    pdf

  206. Learning Without State-Estimation in Partially Observable Markovian Decision Processes
    by Satinder Singh, Tommi Jaakkola and Michael Jordan.
    In Machine Learning: Proceedings of the Eleventh International Conference (ICML), pages 284-292, 1994.
    pdf

  207. On Step-Size and Bias in Temporal-Difference Learning
    by Richard Sutton and Satinder Singh.
    In Proceedings of Eighth Yale Workshop on Adaptive and Learning Systems, 1994.
    pdf abstract

  208. Robust Reinforcement Learning in Motion Planning
    by Satinder Singh, Andrew Barto, Roderic Grupen, and Christopher Connolly.
    In Advances in Neural Information Processing Systems 6 (NIPS), pages 655-662, 1994.
    pdf

  209. An Upper Bound on the Loss from Approximate Optimal-Value Functions
    by Satinder Singh and Richard Yee.
    In Machine Learning, Volume 16, Issue 3, pages 227-233, 1994.
    pdf

  210. Distributed Representation of Limb Motor Programs in Arrays of Adjustable Pattern Generators
    by Neil Berthier, Satinder Singh, Andrew Barto, and Jim Houk.
    In Journal of Cognitive Neuroscience, vol 5:1, pages 56-78, 1993.
    pdf

  211. Reinforcement Learning with a Hierarchy of Abstract Models
    by Satinder Singh.
    In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI), pages 202-207, 1992.
    pdf

  212. A Cortico-Cerebellar model that learns to generate distributed motor commands to control a kinetic arm
    by Satinder Singh, Neil Berthier, Andrew Barto, and Jim Houk.
    In Advances in Neural Information Processing Systems 4 (NIPS), pages 611-618, 1992.
    pdf

  213. Scaling Reinforcement Learning Algorithms by Learning Variable Temporal Resolution Models
    by Satinder Singh.
    In Proceedings of the Ninth Machine Learning Conference, pages 406-415, 1992.
    pdf

  214. Transfer of Learning by Composing Solutions of Elemental Sequential Tasks
    by Satinder Singh.
    In Machine Learning Journal, Volume 8, Issue 3, pages 323-339, 1992.
    pdf

  215. The Efficient Learning of Multiple Task Sequences
    by Satinder Singh.
    In Advances in Neural Information Processing Systems 4 (NIPS), pages 251-258, 1992.
    pdf

  216. Transfer of Learning Across Compositions of Sequential Tasks
    by Satinder Singh.
    In Machine Learning: Proceedings of the Eighth International Workshop, pages 348-352, 1991.
    pdf

  217. Reinforcement Learning and Dynamic Programming
    by Andrew Barto and Satinder Singh.
    In Proceedings of Sixth Yale Workshop on Adaptive and Learning Systems, 1990.

Magazine Articles, Book Chapters and Others

  • Value-Driven Procurement in the TAC Supply Chain Game
    by Christopher Kiekintveld, Michael P. Wellman, Satinder Singh, and Vishal Soni.
    In SIGecom Exchanges, Volume 4.3, pages 9-19, 2004.
    pdf

  • Reinforcement Learning for 3 vs. 2 Keepaway
    by Peter Stone, Richard Sutton and Satinder Singh.
    In RoboCup-2000: Robot Soccer World Cup IV, P. Stone, T. Balch, and G. Kraetzschmar, Eds., Springer Verlag.
    pdf.
    An earlier version appeared in the Proceedings of the RoboCup-2000 Workshop, Melbourne, Australia

  • Soft Dynamic Programming Algorithms: Convergence Proofs
    by Satinder Singh.
    In Proceedings of Workshop on Computational Learning and Natural Learning (CLNL), Provincetown, Massachusetts, 1993.
    pdf

  • On the Computational Economics of Reinforcement Learning
    by Andrew Barto and Satinder Singh.
    In Proceedings of Connectionist Summer School, 1990.
    pdf

  • An Adaptive Sensorimotor Network Inspired by the Physiology of the Cerebellum
    by Jim Houk, Satinder Singh, Charles Fisher, and Andrew Barto.
    Appears as a chapter in W. T. Miller, R. S. Sutton, and P. J. Werbos, editors, Neural Networks for Control, pages 301-348, 1989.

    My one paper in a non-technical journal!

  • How to Make Software Agents Do the Right Thing: An Introduction to Reinforcement Learning
    by Satinder Singh, Peter Norvig and David Cohn.
    In Dr. Dobb's Journal, March issue, 1997.

    pdf
    [html version]

    An Almost Tutorial on RL (extracted from my Thesis)

  • An (Almost) Tutorial on Reinforcement Learning
    gzipped postscript. Extracted from my 1993 thesis.

    Going Nowhere Papers

  • Asynchronous Modified Policy Iteration with Single-sided Updates
    by Satinder Singh and Vijay Gullapalli. Working Paper, 1993.
    pdf