Learning to Act from Demonstration

PI: Min Sun (National Tsing Hua University)

Status Quo:

1. Risk assessment from visual observations has not been widely explored. Existing methods assess risk caused either by the environment or by priors on social activities, whereas we focus on assessing risk explicitly triggered by the observed actions of an agent and its interactions with the environment.

2. Risk assessment is related to predicting the possibility of catastrophic events occurring in the future. Existing methods focus on activity categories and do not study risk assessment of objects and regions in the video.

3. Anticipation has been applied in tasks other than event anticipation. For example, human activity anticipation can improve human-robot collaboration.

4. Parameter prediction in deep networks is a relatively new idea. We introduce a novel dynamic parameter predictor layer that estimates spatial riskiness conditioned on the agent's behavior.
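
To make the idea of parameter prediction concrete, the following is a minimal sketch and not the project's actual layer. It assumes the agent is summarized by a feature vector, that a linear map (the hypothetical `weights` and `bias`) turns this vector into the weights of a 1x1 filter, and that the filter is applied at every location of a scene feature map to give a per-location risk score. All names, shapes, and the choice of a linear predictor are illustrative assumptions.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_filter(agent_feat, weights, bias):
    """Predict 1x1 filter weights from an agent feature vector.

    weights has shape (C, D) and bias has shape (C,); both are
    hypothetical learned parameters of the predictor.
    """
    return [
        sum(w * a for w, a in zip(row, agent_feat)) + b
        for row, b in zip(weights, bias)
    ]


def risk_map(scene_feat, agent_feat, weights, bias):
    """Apply the agent-conditioned filter at every spatial location.

    scene_feat is an H x W grid of C-dimensional feature vectors.
    Returns an H x W grid of risk scores in (0, 1).
    """
    filt = predict_filter(agent_feat, weights, bias)
    return [
        [sigmoid(sum(f * c for f, c in zip(filt, cell))) for cell in row]
        for row in scene_feat
    ]


# Toy example: a 2 x 2 scene with 2 feature channels (values are made up).
scene = [
    [[2.0, -2.0], [0.0, 0.0]],
    [[-2.0, 2.0], [1.0, 1.0]],
]
weights = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]

map_a = risk_map(scene, [1.0, 0.0], weights, bias)  # agent behavior A
map_b = risk_map(scene, [0.0, 1.0], weights, bias)  # agent behavior B
```

In this toy setting the same scene yields different risk maps for the two agent vectors, which is the point of predicting the filter from the agent rather than learning one fixed filter.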

Key New Insights:

1. Agent-centric: We define our problem as risk assessment with respect to the agent appearing at each time step.

2. Risk assessment spans two domains: (1) the temporal domain (accident anticipation) and (2) the spatial domain (risky region localization).

3. Accident anticipation: to predict an accident before it occurs.

4. Risky region localization: to spatially localize the regions in the scene that might be involved in a future accident.

5. The Epic Fail (EF) dataset consists of 3000 viral videos capturing various accidents: (1) We ask annotators to annotate the region causing the failure event. (2) The agents and the risky regions are annotated with 2D bounding boxes. (3) We manually identify the time when the accident occurs in a subset of raw videos and sample short videos of 3-4 seconds from the subset.
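
As a rough illustration of how the annotations described above could support accident anticipation, here is a small sketch. It is not the dataset's actual file format or the project's evaluation code: the record layout, the field names, the `(x_min, y_min, x_max, y_max)` box convention, and the thresholded "lead time" measure are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class FrameAnnotation:
    agent_box: Box   # 2D bounding box of the agent
    risky_box: Box   # 2D bounding box of the risky region


@dataclass
class ClipAnnotation:
    fps: float                # frames per second of the clip (assumed field)
    accident_time_s: float    # manually identified accident time, in seconds
    frames: List[FrameAnnotation] = field(default_factory=list)


def anticipation_lead_time(
    scores: List[float], clip: ClipAnnotation, threshold: float = 0.5
) -> Optional[float]:
    """Seconds between the first pre-accident frame whose risk score
    reaches the threshold and the annotated accident time.

    Returns None if no frame before the accident reaches the threshold.
    """
    for index, score in enumerate(scores):
        t = index / clip.fps
        if t >= clip.accident_time_s:
            break
        if score >= threshold:
            return clip.accident_time_s - t
    return None


# Toy clip: 3 seconds at 10 fps, accident at the 3-second mark.
clip = ClipAnnotation(
    fps=10.0,
    accident_time_s=3.0,
    frames=[FrameAnnotation((10, 20, 60, 120), (80, 90, 140, 130))],
)
scores = [0.1] * 20 + [0.7] * 10   # made-up per-frame risk scores
lead = anticipation_lead_time(scores, clip)
missed = anticipation_lead_time([0.1] * 30, clip)
```

Here the score first reaches the threshold at frame 20 (2.0 s), so the accident is anticipated 1.0 s ahead; a model whose scores never reach the threshold before the accident gets `None`.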

For more information, visit our project website.

(Updated July 2017)