Satinder Singh's Recent Papers

2018+ Papers

Pairwise Weights for Temporal Credit Assignment
by Zeyu Zheng, Risto Vuorio, Richard Lewis, and Satinder Singh.
In 36th AAAI Conference on Artificial Intelligence, 2022
arXiv version.

Reward is Enough
by David Silver, Satinder Singh, Doina Precup, and Richard Sutton.
In Artificial Intelligence, vol 299, 2021
pdf.

On the Expressivity of Markov Reward
by David Abel, Will Dabney, Anna Harutyunyan, Mark K. Ho, Michael L. Littman, Doina Precup, and Satinder Singh.
In Neural Information Processing Systems (NeurIPS), 2021
Outstanding Paper Award
pdf.

Proper Value Equivalence
by Christopher Grimm, Andre Barreto, Gregory Farquhar, David Silver, and Satinder Singh.
In Neural Information Processing Systems (NeurIPS), 2021
arXiv version.

Discovery of Options via Meta-Learned Subgoals
by Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado van Hasselt, David Silver, and Satinder Singh.
In Neural Information Processing Systems (NeurIPS), 2021
arXiv version.

Learning State Representations from Random Deep Action-Conditional Predictions
by Zeyu Zheng, Vivek Veeriah, Risto Vuorio, Richard Lewis, and Satinder Singh.
In Neural Information Processing Systems (NeurIPS), 2021
arXiv version.

Reward is Enough for Convex MDPs
by Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, and Satinder Singh.
In Neural Information Processing Systems (NeurIPS), 2021
arXiv version.

Reinforcement Learning of Implicit and Explicit Control Flow Instructions
by Ethan Brooks, Janarthanan Rajendran, Richard Lewis, and Satinder Singh.
In International Conference on Machine Learning (ICML), 2021
arXiv version.

Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in a First-Person Simulated 3D Environment
by Wilka Carvalho, Anthony Liang, Kimin Lee, Honglak Lee, Richard Lewis, and Satinder Singh.
In International Joint Conference on Artificial Intelligence (IJCAI), 2021
arXiv version.

Rational use of episodic and working memory: A normative account of prospective memory
by Ida Momennejad, Jarrod Lewis-Peacock, Kenneth A. Norman, Jonathan D. Cohen, Satinder Singh, and Richard L. Lewis.
In Neuropsychologia, vol 158, 2021
pdf.

Efficient Querying for Cooperative Probabilistic Commitments
by Qi Zhang, Edmund Durfee, and Satinder Singh.
In 35th AAAI Conference on Artificial Intelligence (AAAI), 2021
arXiv version.

The Value Equivalence Principle for Model-Based Reinforcement Learning
by Christopher Grimm, Andre Barreto, Satinder Singh, and David Silver.
In Thirty Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020
arXiv version.

Discovering Reinforcement Learning Algorithms
by Junhyuk Oh, Matteo Hessel, Wojciech Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, and David Silver.
In Thirty Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020
arXiv version.

Meta-Gradient Reinforcement Learning with an Objective Discovered Online
by Zhongwen Xu, Hado van Hasselt, Matteo Hessel, Junhyuk Oh, Satinder Singh, and David Silver.
In Thirty Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020
arXiv version.

Learning to Play No-Press Diplomacy with Best Response Policy Iteration
by Thomas Anthony, Tom Eccles, Andrea Tacchetti, Janos Kramar, Ian Gemp, Thomas Hudson, Nicolas Porcel, Marc Lanctot, Julien Perolat, Richard Everett, Roman Werpachowski, Satinder Singh, Thore Graepel, and Yoram Bachrach.
In Thirty Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020
arXiv version.

A Self-Tuning Actor-Critic Algorithm
by Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, and Satinder Singh.
In Thirty Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020
arXiv version.

On Efficiency in Hierarchical Reinforcement Learning
by Zheng Wen, Doina Precup, Morteza Ibrahimi, Andre Barreto, Benjamin Van Roy, and Satinder Singh.
In Thirty Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020
pdf.

What can Learned Intrinsic Rewards Capture?
by Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado van Hasselt, David Silver, and Satinder Singh.
In International Conference on Machine Learning (ICML), 2020.
arXiv version.

Sample Complexity of Reinforcement Learning Using Linearly Combined Model Ensembles
by Aditya Modi, Nan Jiang, Ambuj Tewari, and Satinder Singh.
In International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
arXiv version.

How Should An Agent Practice?
by Janarthanan Rajendran, Richard Lewis, Vivek Veeriah, Honglak Lee, and Satinder Singh.
In Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020.
pdf.

Modeling Probabilistic Commitments for Maintenance is Inherently Harder than for Achievement
by Qi Zhang, Edmund Durfee, and Satinder Singh.
In Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020.
pdf.

Querying to Find a Safe Policy under Uncertain Safety Constraints in Markov Decision Processes
by Shun Zhang, Edmund Durfee, and Satinder Singh.
In Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020.
pdf.

Discovery of Useful Questions as Auxiliary Tasks
by Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Richard Lewis, Janarthanan Rajendran, Junhyuk Oh, Hado van Hasselt, David Silver, and Satinder Singh.
In Neural Information Processing Systems (NeurIPS), 2019.
arXiv version.

Behavior Suite for Reinforcement Learning
by Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, and Hado van Hasselt.
In Neural Information Processing Systems (NeurIPS), 2019.
arXiv version.

Hindsight Credit Assignment
by Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mohammad Gheshlaghi Azar, Bilal Piot, Nicolas Heess, Hado van Hasselt, Gregory Wayne, Satinder Singh, Doina Precup, and Remi Munos.
In Neural Information Processing Systems (NeurIPS), 2019.
pdf.

No Press Diplomacy: Modeling Multi-Agent Gameplay
by Philip Paquette, Yuchen Lu, Steven Bocco, Max O. Smith, Satya Ortiz-Gagne, Jonathan K. Kummerfeld, Satinder Singh, Joelle Pineau, and Aaron Courville.
In Neural Information Processing Systems (NeurIPS), 2019.
arXiv version.

Disentangled Cumulants Help Successor Representations Transfer to New Tasks
by Christopher Grimm, Irina Higgins, Andre Barreto, Denis Teplyashin, Markus Wulfmeier, Tim Hertweck, Raia Hadsell, and Satinder Singh.
arXiv version.

Deep Reinforcement Learning for Dynamic Multi-Driver Dispatching and Repositioning Problem
by John Holler, Risto Vuorio, Tiancheng Jin, Satinder Singh, Zhiwei Qin, Jieping Ye, Xiaocheng Tang, Yan Jiao, and Chenxi Wang.
In International Conference on Data Mining (ICDM-Short Paper), 2019.
pdf.

Learning Independently-Obtainable Reward Functions
by Christopher Grimm and Satinder Singh.
arXiv version.

Many-Goals Reinforcement Learning
by Vivek Veeriah, Junhyuk Oh, and Satinder Singh.
arXiv version.

Learning to Communicate and Solve Visual Blocks-World Tasks
by Qi Zhang, Richard Lewis, Satinder Singh, and Edmund Durfee.
In Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), 2019.
pdf.

On Learning Intrinsic Rewards for Policy Gradient Methods
by Zeyu Zheng, Junhyuk Oh, and Satinder Singh.
In Neural Information Processing Systems (NeurIPS), 2018.
arXiv version.

Generative Adversarial Self-Imitation Learning
by Yijie Guo, Junhyuk Oh, Satinder Singh, and Honglak Lee.
In Neural Information Processing Systems (NeurIPS), 2018.
arXiv version.

Completing State Representations Using Spectral Learning
by Nan Jiang, Alex Kulesza, and Satinder Singh.
In Neural Information Processing Systems (NeurIPS), 2018.
pdf.

Learning End-to-End Goal-Oriented Dialog with Multiple Answers
by Janarthanan Rajendran, Jatin Ganhotra, Satinder Singh, and Lazaros Polymenakos.
In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018.
pdf.

Self-Imitation Learning
by Junhyuk Oh, Yijie Guo, Satinder Singh, and Honglak Lee.
In International Conference on Machine Learning (ICML), 2018.
arXiv version.

Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes
by Shun Zhang, Edmund Durfee, and Satinder Singh.
In International Joint Conference on Artificial Intelligence (IJCAI), 2018.
pdf.

Markov Decision Processes with Continuous Side Information
by Aditya Modi, Nan Jiang, Satinder Singh, and Ambuj Tewari.
In International Conference on Algorithmic Learning Theory (ALT), 2018.
conf pdf, arXiv link.

The Advantage of Doubling: A Deep Reinforcement Learning Approach to Studying the Double Team in the NBA
by Jiaxuan Wang, Ian Fox, Jonathan Skaza, Nick Linck, Satinder Singh, and Jenna Wiens.
In Sloan Sports Analytics Conference, 2018.
arXiv link.

All My Papers in Reverse Chronological Order