- Novel techniques to classify snippets of video according to the activities they entail (a minimal classification sketch follows this list).
- Novel techniques to quickly localize “activity proposals”, i.e., temporal segments in video where the probability of finding interesting activities is high (see the proposal-generation sketch after this list).
- Combining knowledge of objects and scenes when classifying an activity, since an activity is a spatiotemporal phenomenon in which humans interact with objects in a particular place (a score-fusion sketch follows this list).
- A crowd-sourcing framework (e.g., using Amazon Mechanical Turk) to cheaply extend the annotations of ActivityNet to object and place classes, as well as free-form text descriptions. These annotations will enrich the dataset, forge links with other large-scale datasets, and enable new functionality (e.g., a textual description of a video that enables text queries). A sketch of posting such annotation tasks follows this list.
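The first item admits a very compact illustration. Below is a minimal sketch of snippet classification over precomputed per-frame features; the 512-dimensional features, the 200-class label space, and mean pooling over time are illustrative assumptions, not the proposed model.

```python
# Minimal snippet classifier over precomputed per-frame features.
# Hypothetical sizes: 512-d features, 200 activity classes.
import torch
import torch.nn as nn

class SnippetClassifier(nn.Module):
    """Score activity classes for a stack of per-frame features."""

    def __init__(self, feat_dim: int = 512, num_classes: int = 200):
        super().__init__()
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, num_frames, feat_dim).
        # Mean-pool over time, then score each activity class.
        return self.head(frame_feats.mean(dim=1))

model = SnippetClassifier()
logits = model(torch.randn(4, 16, 512))  # 4 snippets, 16 frames each
probs = logits.softmax(dim=-1)           # per-snippet class probabilities
```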
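For the second item, one common baseline is to slide multi-scale windows over per-frame "actionness" scores and suppress overlapping windows. The window scales, stride, and scoring rule below are assumptions for illustration, not the proposed localization method.

```python
# Multi-scale sliding windows scored by mean per-frame "actionness",
# followed by greedy temporal non-maximum suppression.
import numpy as np

def generate_proposals(actionness, scales=(16, 32, 64), stride_frac=0.5):
    """Return (start, end, score) windows sorted by descending score."""
    n = len(actionness)
    candidates = []
    for w in scales:
        stride = max(1, int(w * stride_frac))
        for start in range(0, max(1, n - w + 1), stride):
            end = min(start + w, n)
            candidates.append((start, end, float(actionness[start:end].mean())))
    return sorted(candidates, key=lambda p: p[2], reverse=True)

def temporal_nms(proposals, iou_thresh=0.5):
    """Greedily keep high-scoring windows whose temporal IoU with
    every already-kept window is at most iou_thresh."""
    kept = []
    for s, e, score in proposals:
        if all((lambda i: i / ((e - s) + (ke - ks) - i))(
                   max(0, min(e, ke) - max(s, ks))) <= iou_thresh
               for ks, ke, _ in kept):
            kept.append((s, e, score))
    return kept

actionness = np.random.rand(300)  # 300 frames of (placeholder) scores
proposals = temporal_nms(generate_proposals(actionness))[:10]
```

In practice the actionness scores would come from a classifier such as the one sketched above, applied densely over the video.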
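The third item can be realized in many ways; one simple sketch is weighted late fusion, mapping object and scene probabilities into the activity label space through co-occurrence matrices and blending them with the direct activity scores. All sizes, weights, and matrices here are random placeholders, not the proposal's actual fusion scheme.

```python
# Weighted late fusion: map object and scene probabilities into the
# activity label space through co-occurrence matrices, then blend.
import numpy as np

def fuse_scores(act_probs, obj_probs, scn_probs,
                obj_to_act, scn_to_act, weights=(0.6, 0.2, 0.2)):
    """Blend direct activity scores with object/scene evidence."""
    w_a, w_o, w_s = weights
    fused = (w_a * act_probs
             + w_o * obj_probs @ obj_to_act   # object -> activity prior
             + w_s * scn_probs @ scn_to_act)  # scene  -> activity prior
    return fused / fused.sum()                # renormalize to a distribution

rng = np.random.default_rng(0)
num_act, num_obj, num_scn = 200, 100, 50      # illustrative label-space sizes
obj_to_act = rng.random((num_obj, num_act))   # placeholder co-occurrences
scn_to_act = rng.random((num_scn, num_act))
act = rng.dirichlet(np.ones(num_act))         # activity-classifier output
obj = rng.dirichlet(np.ones(num_obj))         # object-detector output
scn = rng.dirichlet(np.ones(num_scn))         # scene-classifier output
fused = fuse_scores(act, obj, scn, obj_to_act, scn_to_act)
```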
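Finally, for the crowd-sourcing item, the sketch below posts a single annotation task to Amazon Mechanical Turk via boto3. The task URL, reward, redundancy, and wording are placeholders, and it targets the requester sandbox so nothing is paid; this is one plausible realization, not the proposed framework.

```python
# Hedged sketch: post one video-annotation HIT to the MTurk sandbox.
import boto3

QUESTION_XML = """<ExternalQuestion
  xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/annotate?video_id={video_id}</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>"""

mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    # Sandbox endpoint; remove this line to post real, paid HITs.
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

def post_annotation_hit(video_id: str) -> str:
    """Create one HIT asking workers to label objects and the place,
    and write a free-form description; returns the HIT id."""
    response = mturk.create_hit(
        Title="Label objects, places, and describe a short video",
        Description="Watch a short clip, list the visible objects and "
                    "the place, and write a one-sentence description.",
        Keywords="video, annotation, labeling",
        Reward="0.05",                    # USD per assignment (placeholder)
        MaxAssignments=3,                 # redundancy for worker agreement
        LifetimeInSeconds=7 * 24 * 3600,  # HIT visible for one week
        AssignmentDurationInSeconds=600,  # 10 minutes per worker
        Question=QUESTION_XML.format(video_id=video_id),
    )
    return response["HIT"]["HITId"]
```

Collecting several assignments per video (MaxAssignments above) allows noisy worker labels to be aggregated, e.g., by majority vote, before being merged into the dataset.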