Visual Understanding and Robotics

Figure: Example traces from the robot navigation trials using Legion. Legion has four conditions (solo, vote, active, and leader), which are distinguished by how the input gathered from the crowd is processed.

Understanding visual scenes makes it possible to understand user behavior, recognize events and activities, and deploy robots in novel domains. However, this task is exceedingly difficult to fully automate in general domains, even though people handle it naturally. This research area aims to combine human and machine intelligence to generate real-time annotations and training data for visual content in open domains.

  • How can we analyze human behaviors in large data sets?
  • How can we answer complex queries about environments that traditional sensors cannot? For example: "Does the table need to be cleaned?"
  • How can we reassemble and classify crowds' natural language descriptions of scenes?
  • How can we control robots in ways that adapt to diverse, novel environments?
  • How can we automate systems and improve performance on future tasks by learning from crowds?

Smart Sensing

Glance [2] is a system that allows researchers to rapidly query, sample, and analyze large video datasets for behavioral events that are hard to detect automatically. Glance's rapid responses to natural language queries and feedback regarding question ambiguity and anomalies in data allow users to have a conversation-like interaction with their data.

Zensors [1] is a new sensing approach that fuses real-time human intelligence from online crowd workers with automatic approaches to provide robust, adaptive, and readily deployable intelligent sensors. With Zensors, users can go from questions about unexpected situations or complicated environments to live sensor feeds in less than 60 seconds.

Activity Recognition and Understanding

Legion:AR [4] provides robust, deployable activity recognition by supplementing existing recognition systems with on-demand, real-time activity identification using input from the crowd. Legion:AR uses activity labels collected from crowd workers to train an automatic activity recognition system online, so that it can recognize future occurrences automatically.
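As a rough illustration of the label-collection step, the sketch below merges several workers' labels for one video segment into a single training label by majority vote. This is a simplified stand-in and not the aggregation method used by Legion:AR; the function name `aggregate_labels`, the `min_agreement` threshold, and the sample segments are all hypothetical.

```python
from collections import Counter


def aggregate_labels(worker_labels, min_agreement=0.5):
    """Return the majority label, or None when agreement is too low.

    Hypothetical helper: a label is accepted only if strictly more than
    `min_agreement` of the workers proposed it.
    """
    if not worker_labels:
        return None
    label, votes = Counter(worker_labels).most_common(1)[0]
    if votes / len(worker_labels) > min_agreement:
        return label
    return None


# Hypothetical crowd input: one list of worker labels per video segment.
segments = {
    "segment_1": ["making coffee", "making coffee", "washing dishes"],
    "segment_2": ["reading", "cooking", "cleaning"],
}

# Segments with an accepted label could be passed to an automatic
# recognizer as training examples; the rest could be sent back to the crowd.
training_labels = {
    name: aggregate_labels(labels) for name, labels in segments.items()
}
```

Rejecting low-agreement segments, rather than guessing, keeps noisy labels out of the training data at the cost of asking the crowd again.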

We have also explored how structural information can be elicited from crowd workers and reassembled by the system [5]. The result is a dependency graph that can be used to train systems many times faster than labels alone.
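To make the idea of a dependency graph concrete, the sketch below assembles one from crowd-reported "A must happen before B" pairs and derives an action order consistent with it. It is an illustrative assumption, not the procedure from [5]: the function `build_dependency_graph`, the `min_reports` filter, and the tea-making reports are hypothetical. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
from collections import Counter
from graphlib import TopologicalSorter


def build_dependency_graph(reports, min_reports=2):
    """Map each action to the set of actions that must precede it.

    `reports` is a list of (before, after) pairs from crowd workers.
    A dependency is kept only if at least `min_reports` workers gave it
    (a hypothetical noise filter).
    """
    graph = {}
    for (before, after), count in Counter(reports).items():
        graph.setdefault(before, set())
        graph.setdefault(after, set())
        if count >= min_reports:
            graph[after].add(before)
    return graph


# Hypothetical worker reports for a tea-making activity.
reports = [
    ("boil water", "pour water"),
    ("boil water", "pour water"),
    ("add tea bag", "pour water"),
    ("add tea bag", "pour water"),
    ("pour water", "boil water"),  # reported once, so filtered out
]

graph = build_dependency_graph(reports)
# static_order() yields each action after all of its prerequisites and
# raises graphlib.CycleError if the kept dependencies contradict each other.
order = list(TopologicalSorter(graph).static_order())
```

A graph like this encodes which orderings are valid without a separate labeled example for each one, which is one intuition for why structure can be more informative than labels alone.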

Robot Navigation and Control

Controlling a robot is a difficult task that requires understanding visual scenes, problems and their context, how environments can be manipulated, and more (e.g., natural language). Even relatively simple tasks can become complex: for example, simple navigation tasks can require substantial customization for the robot and the environment. However, these are usually tasks that people understand naturally and perform on a daily basis.

In prior work, we developed Legion [3], a system that allowed crowds of online contributors to collectively control existing graphical user interfaces (GUIs), including robot-control interfaces, in real time. Using Legion, a robot could accept natural language commands and potentially complete navigation tasks more quickly than any individual person could. This project seeks to combine human and machine intelligence to create robot-control systems that can complete complex tasks better than either AI or human controllers could alone.
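The sketch below shows one way that simultaneous input from several contributors could be merged into a single command stream. It loosely interprets the "vote" and "leader" condition names from the figure caption and should not be read as Legion's actual mediation strategies; the functions `vote` and `run_leader`, the agreement counter, and the sample commands are hypothetical.

```python
from collections import Counter, defaultdict


def vote(window):
    """Return the most common command in one time window.

    `window` maps worker id -> command. Ties are broken alphabetically
    so the result is deterministic.
    """
    if not window:
        return None
    counts = Counter(window.values())
    top = max(counts.values())
    return sorted(cmd for cmd, n in counts.items() if n == top)[0]


def run_leader(windows):
    """Forward, per window, the command of the current leader.

    The leader is the present worker who has agreed with the majority
    most often in earlier windows (an assumed election rule).
    """
    agreement = defaultdict(int)
    forwarded = []
    for window in windows:
        if not window:
            forwarded.append(None)
            continue
        # Sorting first makes ties fall to the lowest worker id.
        leader = max(sorted(window), key=lambda worker: agreement[worker])
        forwarded.append(window[leader])
        majority = vote(window)
        for worker, command in window.items():
            if command == majority:
                agreement[worker] += 1
    return forwarded


# Hypothetical input: three time windows with three workers each.
windows = [
    {"a": "forward", "b": "forward", "c": "left"},
    {"a": "left", "b": "forward", "c": "left"},
    {"a": "right", "b": "forward", "c": "forward"},
]

voted = [vote(window) for window in windows]
led = run_leader(windows)
```

Voting smooths out individual mistakes in every window, while a leader-style scheme keeps one person's plan coherent over time; the two can disagree, as they do in the last window of this example.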


[1]   G. Laput, W.S. Lasecki, J. Wiese, R. Xiao, J.P. Bigham, C. Harrison. Zensors: Adaptive, Rapidly Deployable, Human-Intelligent Sensor Feeds. In Proceedings of the International ACM Conference on Human Factors in Computing Systems (CHI 2015). Seoul, Korea. p1935-1944.

[2]   W.S. Lasecki, M. Gordon, D. Koutra, M.F. Jung, S.P. Dow, J.P. Bigham. Glance: Rapidly Coding Behavioral Video with the Crowd. In Proceedings of the ACM Symposium on User Interface Software and Technology (UIST 2014). Honolulu, HI. p551-562.

[3]   W.S. Lasecki, K.I. Murray, S. White, R.C. Miller, J.P. Bigham. Real-time Crowd Control of Existing Interfaces. In Proceedings of the ACM Symposium on User Interface Software and Technology (UIST 2011). Santa Barbara, CA. p23-32.

[4]   W.S. Lasecki, Y. Song, H. Kautz, J.P. Bigham. Real-Time Crowd Labeling for Deployable Activity Recognition. In Proceedings of the International ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW 2013). San Antonio, TX. p1203-1212.

[5]   W.S. Lasecki, L. Weingard, G. Ferguson, J.P. Bigham. Finding Dependencies Between Actions Using the Crowd. In Proceedings of the International ACM Conference on Human Factors in Computing Systems (CHI 2014). Toronto, Canada.