Intersection of Deep Learning and Reinforcement Learning

  • Discovering Reinforcement Learning Algorithms
    by Junhyuk Oh, Matteo Hessel, Wojciech Czarnecki, Zhongwen Xu, Hado van Hasselt, Satinder Singh, and David Silver.
    In Thirty-Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020.
    arXiv version.

  • Meta-Gradient Reinforcement Learning with an Objective Discovered Online
    by Zhongwen Xu, Hado van Hasselt, Matteo Hessel, Junhyuk Oh, Satinder Singh, and David Silver.
    In Thirty-Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020.
    arXiv version.

  • A Self-Tuning Actor-Critic Algorithm
    by Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, and Satinder Singh.
    In Thirty-Fourth Conference on Neural Information Processing Systems (NeurIPS), 2020.
    arXiv version.

  • What can Learned Intrinsic Rewards Capture?
    by Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado van Hasselt, David Silver, and Satinder Singh.
    In International Conference on Machine Learning (ICML), 2020.
    arXiv version.

  • How Should An Agent Practice?
    by Janarthanan Rajendran, Richard Lewis, Vivek Veeriah, Honglak Lee, and Satinder Singh.
    In Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020.
    pdf.

  • Discovery of Useful Questions as Auxiliary Tasks
    by Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Richard Lewis, Janarthanan Rajendran, Junhyuk Oh, Hado van Hasselt, David Silver, and Satinder Singh.
    In Neural Information Processing Systems (NeurIPS), 2019.
    arXiv version.

  • Deep Reinforcement Learning for Dynamic Multi-Driver Dispatching and Repositioning Problem
    by John Holler, Risto Vuorio, Tiancheng Jin, Satinder Singh, Zhiwei Qin, Jieping Ye, Xiaocheng Tang, Yan Jiao, and Chenxi Wang.
    In International Conference on Data Mining (ICDM-Short Paper), 2019.
    pdf.

  • Learning Independently-Obtainable Reward Functions
    by Christopher Grimm and Satinder Singh.
    arXiv version.

  • Many-Goals Reinforcement Learning
    by Vivek Veeriah, Junhyuk Oh, and Satinder Singh.
    arXiv version.

  • Learning to Communicate and Solve Visual Blocks-World Tasks
    by Qi Zhang, Richard Lewis, Satinder Singh, and Edmund Durfee.
    In Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), 2019.
    pdf.

  • On Learning Intrinsic Rewards for Policy Gradient Methods
    by Zeyu Zheng, Junhyuk Oh, and Satinder Singh.
    In Neural Information Processing Systems (NIPS), 2018.
    arXiv version.

  • Self-Imitation Learning
    by Junhyuk Oh, Yijie Guo, Satinder Singh, and Honglak Lee.
    In International Conference on Machine Learning (ICML), 2018.
    arXiv version.

  • Learning End-to-End Goal-Oriented Dialog with Multiple Answers
    by Janarthanan Rajendran, Jatin Ganhotra, Satinder Singh, and Lazaros Polymenakos.
    In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018.
    pdf.

  • Value Prediction Networks
    by Junhyuk Oh, Satinder Singh, and Honglak Lee.
    In Neural Information Processing Systems (NIPS), 2017.
    arXiv.

  • Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning
    by Junhyuk Oh, Satinder Singh, Honglak Lee, and Pushmeet Kohli.
    In International Conference on Machine Learning (ICML), 2017.
    pdf.

  • Learning to Query, Reason, and Answer Questions on Ambiguous Texts.
    by Xiaoxiao Guo, Tim Klinger, Clemens Rosenbaum, Joseph Bigus, Murray Campbell, Ban Kawas, Kartik Talamadupula, Gerald Tesauro, and Satinder Singh.
    In 5th International Conference on Learning Representations (ICLR), 2017.
    pdf.

  • Control of Memory, Active Perception, and Action in Minecraft.
    by Junhyuk Oh, Valliappa Chockalingam, Satinder Singh, and Honglak Lee.
    In 33rd International Conference on Machine Learning (ICML), 2016.
    pdf.

  • Deep Learning for Reward Design to Improve Monte Carlo Tree Search in ATARI Games.
    by Xiaoxiao Guo, Satinder Singh, Richard Lewis, and Honglak Lee.
    In 25th International Joint Conference on Artificial Intelligence (IJCAI), 2016.
    pdf.

  • Action-Conditional Video Prediction Using Deep Networks in ATARI Games.
    by Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard Lewis, and Satinder Singh.
    In Neural Information Processing Systems (NIPS), 2015.
    online videos.
    arXiv pdf, NIPS pdf, NIPS Appendix pdf.

  • Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning.
    by Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard Lewis, and Xiaoshi Wang.
    In Neural Information Processing Systems (NIPS), 2014.
    pdf.