Personal Access Technology


Figure: The Scribe system architecture. Multiple workers each type part of what they hear; their input is passed to a server that merges it into a final caption stream, which is displayed to the end user.

Achieving a universally accessible world is very difficult because people's abilities vary so greatly. Technology offers a means to bridge many of these accessibility gaps, either completely automatically or through human-powered access technologies. This research direction explores how to design and create technologies that allow people with disabilities to more easily access the world around them, and deploys these systems with real end users.


Broad Challenges:
  • How can we use interactive intelligent systems to improve access for people with disabilities?
  • How can crowds of peers be leveraged to help provide access accommodations?
  • How can collective responses yield more usable interfaces for access technology?


Captioning

Producing real-time captions for deaf and hard of hearing (DHH) people is difficult because of the speed and complexity of natural speech, and the variation between speakers and settings. As a result, automatic speech recognition (ASR) cannot achieve sufficiently accurate results in many real-world settings. People, on the other hand, understand speech naturally, but cannot type fast enough to keep up with speaking rates that can reach hundreds of words per minute without years of training as an expert stenographer. Scribe [4,5] coordinates multiple non-expert contributors who each type a subset of what they hear, allowing them to keep up with live speech. By intelligently recombining and filtering this input [2], Scribe provides DHH end users with accurate captions within 5 seconds of the original speech, even in never-before-seen settings.
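To make the merging step concrete, below is a minimal Python sketch of combining partial captions by timestamp and worker agreement. This is a simplification for illustration only, not the actual Scribe algorithm (which uses more sophisticated alignment and quality control [2,5]); the time window and vote threshold here are illustrative assumptions.

    # Simplified illustration (not the actual Scribe algorithm):
    # bucket typed words by arrival time, keep words that multiple
    # workers agree on, and order the survivors by timestamp.
    from collections import defaultdict

    def merge_captions(worker_streams, window=1.0, min_votes=2):
        # (window index, word) -> timestamps from the workers who typed it
        votes = defaultdict(list)
        for stream in worker_streams:
            for t, word in stream:
                votes[(int(t / window), word.lower())].append(t)
        # Keep words enough workers agreed on; order by mean timestamp.
        kept = [(sum(ts) / len(ts), w)
                for (_, w), ts in votes.items() if len(ts) >= min_votes]
        return " ".join(w for _, w in sorted(kept))

    streams = [
        [(0.2, "the"), (0.8, "quick"), (1.4, "fox")],    # worker 1
        [(0.3, "the"), (1.5, "fox"), (2.1, "jumps")],    # worker 2
        [(0.9, "quick"), (1.6, "fox"), (2.2, "jumps")],  # worker 3
    ]
    print(merge_captions(streams))  # -> "the quick fox jumps"

No single worker typed the full sentence, but agreement across their partial streams recovers it.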

This work has also explored how caption presentation affects real end users [1], how caption display can be improved [3], and how the quality of ASR output affects the latency of human transcription [9]. By combining people's abilities and taking into account users on both sides of a captioning accommodation, better access to spoken content can be provided.


Visual Question Answering

Chorus:View ("View") [7] uses conversational interaction to improve how visual questions are answered for blind and visually impaired users. View extends Chorus with a live video stream that provides context for the questions and clarifications being asked. Using this approach, answers that took 10-15 minutes to find with prior approaches take 1-2 minutes on average.
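A rough sketch of this interaction model follows (with a stubbed-in crowd backend; the function names are hypothetical, not View's actual API). The key design point is that follow-up questions reuse one continuous session with shared video context, rather than each question starting a fresh request:

    from itertools import cycle

    def ask_crowd(question, frame):
        # Stand-in for the real crowd backend: in View, human workers
        # watch the live video stream alongside the conversation.
        canned = {"What is this?": "It looks like a can of soup.",
                  "What flavor is it?": "The label says tomato."}
        return canned.get(question, "Could you hold the camera steady?")

    def view_session(questions, frames):
        # One continuous session: each follow-up keeps the shared
        # video/chat context instead of starting a new request.
        for question, frame in zip(questions, cycle(frames)):
            print("User: ", question)
            print("Crowd:", ask_crowd(question, frame))

    view_session(["What is this?", "What flavor is it?"],
                 ["frame0.jpg", "frame1.jpg"])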

RegionSpeak [8] aims to capture the benefits of richer per-turn interactions than simple text responses, without requiring the continuous engagement that Chorus entails. Multiple responses to a visual question, along with spatial information about how the answers relate to the image, are provided to the end user. As a result, answers can be collected quickly while requiring fewer interactions.
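One way to picture the kind of spatially grounded answer this produces (a hypothetical representation for illustration, not RegionSpeak's actual data model): each worker labels a region of the photo, and the combined result can be read back in spatial order.

    from dataclasses import dataclass

    @dataclass
    class RegionAnswer:
        x: int            # bounding-box origin, in image pixels
        y: int
        w: int            # bounding-box width and height
        h: int
        description: str  # worker-supplied label for this region

    def spoken_summary(answers):
        # Read regions top-to-bottom, left-to-right so the summary
        # follows the spatial layout of the image.
        ordered = sorted(answers, key=lambda a: (a.y, a.x))
        return "; ".join(f"{a.description} (at {a.x}, {a.y})" for a in ordered)

    photo = [RegionAnswer(200, 30, 90, 60, "a coffee mug"),
             RegionAnswer(40, 10, 120, 80, "a microwave with its door open")]
    print(spoken_summary(photo))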


Activity Recognition for Cognitive Assistants

Individuals with cognitive impairments, such as Alzheimer's disease, often require live-in caretakers, who are expensive and not available around the clock. Automated prompting systems could provide a cheaper, more available source of reminders and care, but cannot reliably recognize people's activities and actions. Legion:AR [6] uses crowds of family members, volunteers, and other non-experts to provide real-time streams of activity recognition labels, faster and more accurately than any individual could alone. Automated recognition systems can also be integrated into this process, making it faster and cheaper when they correctly recognize an event, while crowd oversight prevents their mistakes from adding error.
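A minimal sketch of that fusion idea follows (a simplification, not the actual Legion:AR algorithm): the automatic label is used only when the crowd confirms it, so recognizer mistakes never reach the output.

    from collections import Counter

    def fuse_label(auto_label, crowd_labels, quorum=0.5):
        # Trust the automatic recognizer only when it matches the
        # crowd's majority label; otherwise crowd consensus wins.
        top, votes = Counter(crowd_labels).most_common(1)[0]
        if auto_label == top:
            return auto_label  # recognizer agreed: fast, cheap path
        if votes / len(crowd_labels) >= quorum:
            return top         # crowd overrides an incorrect guess
        return None            # no consensus yet; wait for more labels

    print(fuse_label("cooking", ["cooking", "cooking", "cleaning"]))   # cooking
    print(fuse_label("sleeping", ["cooking", "cooking", "cleaning"]))  # cooking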



Publications

[1]  R. Kushalnagar, W.S. Lasecki, J.P. Bigham. Captions Versus Transcripts for Online Video Content. In Proceedings of the International Cross-Disciplinary Conference on Web Accessibility (W4A 2013). p71-78.

[2]  W.S. Lasecki, J.P. Bigham. Online Quality Control for Real-time Crowd Captioning. In Proceedings of the International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2012). Boulder, CO. p143-150.

[3]  W.S. Lasecki, R. Kushalnagar, J.P. Bigham. Helping Students Keep Up with Real-Time Captions by Pausing and Highlighting. In Proceedings of the International Cross-Disciplinary Conference on Web Accessibility (W4A 2014). Seoul, Korea. Best Technical Paper.

[4]  W.S. Lasecki, C.D. Miller, J.P. Bigham. Warping Time for More Effective Real-Time Crowdsourcing. In Proceedings of the International ACM Conference on Human Factors in Computing Systems (CHI 2013). Paris, France. p2033-2036. Best Paper Honorable Mention.

[5]  W.S. Lasecki, C.D. Miller, A. Sadilek, A. Abumoussa, D. Borrello, R. Kushalnagar, J.P. Bigham. Real-time Captioning by Groups of Non-Experts. In Proceedings of the ACM Symposium on User Interface Software and Technology (UIST 2012). Boston, MA. p23-34. Best Paper Award Nominee.

[6]  W.S. Lasecki, Y. Song, H. Kautz, J.P. Bigham. Real-Time Crowd Labeling for Deployable Activity Recognition. In Proceedings of the International ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW 2013). San Antonio, TX. p1203-1212.

[7]  W.S. Lasecki, P. Thiha, Y. Zhong, E. Brady, J.P. Bigham. Answering Visual Questions with Conversational Crowd Assistants. In Proceedings of the International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS 2013). Seattle, WA. Article #18.

[8]  Y. Zhong, W.S. Lasecki, E. Brady, J.P. Bigham. RegionSpeak: Quick Comprehensive Spatial Descriptions of Complex Images for Blind Users. In Proceedings of the International ACM Conference on Human Factors in Computing Systems (CHI 2015). Seoul, Korea. p2353-2362.

[9]  Y. Gaur, W.S. Lasecki, F. Metze, J.P. Bigham. The Effects of Automatic Speech Recognition Quality on Human Transcription Latency. In Proceedings of the International Cross-Disciplinary Conference on Web Accessibility (W4A 2016). Montreal, Canada. Best Paper.