Conversational Agents

Intelligent agents that converse with users in real-world spaces can provide continuous, seamless task support beyond screen-based interactions. This makes these assistants easier to interact with and more readily available, and reduces the cognitive overhead required to get support. This research direction explores situated interaction and conversation with intelligent assistants, intelligent sensing technology, and augmented collaboration between people.

  • How do we hold a consistent conversation with many people?
  • How can crowd-powered dialog systems best be used to train AI dialog systems?
  • What incentives and voting mechanisms can be used to aggregate answers quickly and reliably?
  • How can crowds collaboratively find answers quickly and reliably?


Chorus [4,1] is a crowd-powered personal assistant capable of holding a dialog with end users. Behind the scenes, multiple human contributors bring their individual knowledge and expertise to bear, while collectively maintaining a consistent, on-topic conversation. Voting techniques help integrate answers and select the best responses from the crowd. Incentive mechanisms reward contributions based on their utility to the conversation.
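
As a rough illustration of how voting and incentives could fit together, the sketch below accepts a proposed response once enough active workers have voted for it, then credits the proposer and the voters. This is a minimal sketch and not the Chorus implementation: the names `Proposal`, `select_responses`, and `award_points`, the 40% agreement threshold, and the point values are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Proposal:
    """A candidate response proposed by one crowd worker (hypothetical type)."""
    worker: str
    text: str
    voters: set = field(default_factory=set)  # workers who voted for this response


def select_responses(proposals, active_workers, threshold=0.4):
    """Return proposals whose vote count reaches an assumed share of active workers."""
    needed = threshold * active_workers
    return [p for p in proposals if len(p.voters) >= needed]


def award_points(accepted, propose_reward=3, vote_reward=1):
    """Credit the proposer and each voter of every accepted response (assumed values)."""
    points = defaultdict(int)
    for proposal in accepted:
        points[proposal.worker] += propose_reward
        for voter in proposal.voters:
            points[voter] += vote_reward
    return dict(points)
```

With five active workers and the assumed 40% threshold, a response needs two votes before it would be forwarded to the end user. Rewarding only accepted responses and the votes behind them is one simple way to tie payment to usefulness in the conversation.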

Ongoing work seeks to integrate Chorus' crowd-powered approach with automated dialog systems. As a first step, Guardian [2] uses the crowd to train dialog trees that connect natural language requests to API function calls.


Chorus:View ("View") [3] aims to use conversational interaction to improve how visual questions are answered for blind and visually impaired users. View extends Chorus by including a live video stream to provide context for the questions and clarifications being asked. Using this approach, answers that took 10-15 minutes to find with prior approaches take 1-2 minutes on average.


RegionSpeak [5] aims to capture the benefits of per-turn interactions that are richer than simple text responses, without requiring the continuous interaction that Chorus entails. Multiple responses to a visual question, as well as spatial information about how answers relate to the image, are provided to the end user. As a result, answers can be collected quickly, while requiring fewer interactions.
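
One way to picture answers that carry spatial information is as labeled rectangles over the image, which can then be looked up by position. The sketch below is only an assumed representation, not RegionSpeak's actual data model: the `Region` type, the pixel-coordinate convention, and the `labels_at` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A crowd-provided label attached to a rectangle of the image (hypothetical type)."""
    label: str
    x: float       # left edge, in image pixels (assumed convention)
    y: float       # top edge, in image pixels
    width: float
    height: float

    def contains(self, px, py):
        """True if the point (px, py) lies inside or on the edge of this rectangle."""
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


def labels_at(regions, px, py):
    """Return labels of all regions containing the point, smallest region first."""
    hits = [r for r in regions if r.contains(px, py)]
    hits.sort(key=lambda r: r.width * r.height)
    return [r.label for r in hits]
```

For example, given a large "shelf" region that encloses a small "cereal box" region, `labels_at` for a point inside the box returns the box label before the shelf label, so the most specific description comes first.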


[1]   T. Huang, W.S. Lasecki, A. Azaria, J.P. Bigham. "Is there anything else I can help you with?": Challenges in Deploying an On-Demand Crowd-Powered Conversational Agent. In Proceedings of the AAAI Conference on Human Computation (HCOMP 2016). Austin, TX.
[2]   T. Huang, W.S. Lasecki, J.P. Bigham. Guardian: A Crowd-Powered Spoken Dialogue System for Web APIs. In Proceedings of the AAAI Conference on Human Computation (HCOMP 2015). San Diego, CA.

[3]   W.S. Lasecki, P. Thiha, Y. Zhong, E. Brady, J.P. Bigham. Answering Visual Questions with Conversational Crowd Assistants. In Proceedings of the International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2013). Seattle, WA. Article #18.

[4]   W.S. Lasecki, R. Wesley, J. Nichols, A. Kulkarni, J.F. Allen, J.P. Bigham. Chorus: A Crowd-Powered Conversational Assistant. In Proceedings of the ACM Symposium on User Interface Software and Technology (UIST 2013). St Andrews, UK. p151-162.

[5]  Y. Zhong, W.S. Lasecki, E. Brady, J.P. Bigham. RegionSpeak: Quick Comprehensive Spatial Descriptions of Complex Images for Blind Users. In Proceedings of the International ACM Conference on Human Factors in Computing Systems (CHI 2015). Seoul, Korea. p2353-2362.