Pairwise Weights for Temporal Credit Assignment by Zeyu Zheng, Risto Vuorio, Richard Lewis, and Satinder Singh. In 36th AAAI Conference on Artificial Intelligence, 2022 arXiv version.
Reward is Enough by David Silver, Satinder Singh, Doina Precup, and Richard Sutton. In Artificial Intelligence, vol 299, 2021 pdf.
On the Expressivity of Markov Reward by David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, and Satinder Singh. In Neural Information Processing Systems (NeurIPS), 2021 Outstanding Paper Award pdf.
Proper Value Equivalence by Christopher Grimm, Andre Barreto, Gregory Farquhar, David Silver, and Satinder Singh. In Neural Information Processing Systems (NeurIPS), 2021 arXiv version.
Discovery of Options via Meta-Learned Subgoals by Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado van Hasselt, David Silver, and Satinder Singh. In Neural Information Processing Systems (NeurIPS), 2021 arXiv version.
Learning State Representations from Random Deep Action-Conditional Predictions by Zeyu Zheng, Vivek Veeriah, Risto Vuorio, Richard Lewis, and Satinder Singh. In Neural Information Processing Systems (NeurIPS), 2021 arXiv version.
Reward is Enough for Convex MDPs by Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, and Satinder Singh. In Neural Information Processing Systems (NeurIPS), 2021 arXiv version.
Reinforcement Learning of Implicit and Explicit Control Flow Instructions by Ethan Brooks, Janarthanan Rajendran, Richard Lewis, and Satinder Singh. In International Conference on Machine Learning (ICML), 2021 arXiv version.
Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in a First-Person Simulated 3D Environment by Wilka Carvalho, Anthony Liang, Kimin Lee, Honglak Lee, Richard Lewis, and Satinder Singh. In International Joint Conference on Artificial Intelligence (IJCAI), 2021 arXiv version.
Rational use of episodic and working memory: A normative account of prospective memory by Ida Momennejad, Jarrod Lewis-Peacock, Kenneth A. Norman, Jonathan D. Cohen, Satinder Singh, and Richard L. Lewis. In Neuropsychologia, vol 158, 2021 pdf.
Efficient Querying for Cooperative Probabilistic Commitments by Qi Zhang, Edmund Durfee, and Satinder Singh. In 35th AAAI Conference on Artificial Intelligence (AAAI), 2021 arXiv version.
The Value Equivalence Principle for Model-Based Reinforcement Learning by Christopher Grimm, Andre Barreto, Satinder Singh, and David Silver. In Thirty Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020 arXiv version.
Discovering Reinforcement Learning Algorithms by Junhyuk Oh, Matteo Hessel, Wojciech Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, and David Silver. In Thirty Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020 arXiv version.
Meta-Gradient Reinforcement Learning with an Objective Discovered Online by Zhongwen Xu, Hado van Hasselt, Matteo Hessel, Junhyuk Oh, Satinder Singh, and David Silver. In Thirty Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020 arXiv version.
Learning to Play No-Press Diplomacy with Best Response Policy Iteration by Thomas Anthony, Tom Eccles, Andrea Tacchetti, Janos Kramar, Ian Gemp, Thomas Hudson, Nicolas Porcel, Marc Lanctot, Julien Perolat, Richard Everett, Roman Werpachowski, Satinder Singh, Thore Graepel, and Yoram Bachrach. In Thirty Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020 arXiv version.
A Self-Tuning Actor-Critic Algorithm by Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, and Satinder Singh. In Thirty Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020 arXiv version.
On Efficiency in Hierarchical Reinforcement Learning by Zheng Wen, Doina Precup, Morteza Ibrahimi, Andre Barreto, Benjamin Van Roy, and Satinder Singh. In Thirty Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020 pdf.
What can Learned Intrinsic Rewards Capture? by Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado van Hasselt, David Silver, and Satinder Singh. In International Conference on Machine Learning (ICML), 2020. arXiv version.
Sample Complexity of Reinforcement Learning Using Linearly Combined Model Ensembles by Aditya Modi, Nan Jiang, Ambuj Tewari, and Satinder Singh. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2020 arXiv version.
How Should An Agent Practice? by Janarthanan Rajendran, Richard Lewis, Vivek Veeriah, Honglak Lee, and Satinder Singh. In Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020. pdf.
Modeling Probabilistic Commitments for Maintenance is Inherently Harder than for Achievement by Qi Zhang, Edmund Durfee, and Satinder Singh. In Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020. pdf.
Querying to Find a Safe Policy under Uncertain Safety Constraints in Markov Decision Processes by Shun Zhang, Edmund Durfee, and Satinder Singh. In Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020. pdf.
Discovery of Useful Questions as Auxiliary Tasks by Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Richard Lewis, Janarthanan Rajendran, Junhyuk Oh, Hado van Hasselt, David Silver, and Satinder Singh. In Neural Information Processing Systems (NeurIPS), 2019. arXiv version.
Behavior Suite for Reinforcement Learning by Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, and Hado van Hasselt. In Neural Information Processing Systems (NeurIPS), 2019. arXiv version.
Hindsight Credit Assignment by Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mohammad Gheshlaghi Azar, Bilal Piot, Nicolas Heess, Hado van Hasselt, Gregory Wayne, Satinder Singh, Doina Precup, and Remi Munos. In Neural Information Processing Systems (NeurIPS), 2019. pdf.
No Press Diplomacy: Modeling Multi-Agent Gameplay by Philip Paquette, Yuchen Lu, Steven Bocco, Max O. Smith, Satya Ortiz-Gagne, Jonathan K. Kummerfeld, Satinder Singh, Joelle Pineau, and Aaron Courville. In Neural Information Processing Systems (NeurIPS), 2019. arXiv version.
Disentangled Cumulants Help Successor Representations Transfer to New Tasks by Christopher Grimm, Irina Higgins, Andre Barreto, Denis Teplyashin, Markus Wulfmeier, Tim Hertweck, Raia Hadsell, and Satinder Singh. arXiv version.
Deep Reinforcement Learning for Dynamic Multi-Driver Dispatching and Repositioning Problem by John Holler, Risto Vuorio, Tiancheng Jin, Satinder Singh, Zhiwei Qin, Jieping Ye, Xiaocheng Tang, Yan Jiao, and Chenxi Wang. In International Conference on Data Mining (ICDM-Short Paper), 2019. pdf.
Learning Independently-Obtainable Reward Functions by Christopher Grimm and Satinder Singh. arXiv version.
Many-Goals Reinforcement Learning by Vivek Veeriah, Junhyuk Oh, and Satinder Singh. arXiv version.
Learning to Communicate and Solve Visual Blocks-World Tasks by Qi Zhang, Richard Lewis, Satinder Singh, and Edmund Durfee. In Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), 2019. pdf.
On Learning Intrinsic Rewards for Policy Gradient Methods by Zeyu Zheng, Junhyuk Oh, and Satinder Singh. In Neural Information Processing Systems (NeurIPS), 2018. arXiv version.
Generative Adversarial Self-Imitation Learning by Yijie Guo, Junhyuk Oh, Satinder Singh, and Honglak Lee. In Neural Information Processing Systems (NeurIPS), 2018. arXiv version.
Completing State Representations Using Spectral Learning by Nan Jiang, Alex Kulesza, and Satinder Singh. In Neural Information Processing Systems (NeurIPS), 2018. pdf.
Learning End-to-End Goal-Oriented Dialog with Multiple Answers by Janarthanan Rajendran, Jatin Ganhotra, Satinder Singh, and Lazaros Polymenakos. In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018. pdf.
Self-Imitation Learning by Junhyuk Oh, Yijie Guo, Satinder Singh, and Honglak Lee. In International Conference on Machine Learning (ICML), 2018. arXiv version.
Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes by Shun Zhang, Edmund Durfee, and Satinder Singh. In International Joint Conference on Artificial Intelligence (IJCAI), 2018. pdf.
Markov Decision Processes with Continuous Side Information by Aditya Modi, Nan Jiang, Satinder Singh, and Ambuj Tewari. In International Conference on Algorithmic Learning Theory (ALT), 2018. conf pdf, arXiv link.
The Advantage of Doubling: A Deep Reinforcement Learning Approach to Studying the Double Team in the NBA by Jiaxuan Wang, Ian Fox, Jonathan Skaza, Nick Linck, Satinder Singh, and Jenna Wiens. In Sloan Sports Analytics Conference, 2018. arXiv link.