Human-Object Interaction (HOI) recognition in videos aims to classify the interaction states of humans and objects within each video segment of an activity. Each video segment is an atomic activity, and these atomic activities compose a high-level activity according to a certain temporal relationship. This temporal relationship implies a local constraint between a video segment and the segment that follows it. Existing HOI recognition models do not consider the output for the previous segment when predicting the interaction state of the current segment, so it is difficult for them to learn accurate relationships between segments. We therefore propose a method that uses explicit knowledge to guide the networks to capture strict relations between segments. First, the transition relationships of interaction states between segments in the dataset are summarized and filtered as prior knowledge. We then use graphs to express the extracted knowledge. Finally, we inject the prior knowledge into the transition matrices of conditional random fields to model this local constraint between interaction states. Under both micro and macro evaluation criteria, the proposed knowledge-guided method achieves better results than the state of the art on the CAD-120 dataset.
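The idea of injecting transition knowledge into a conditional random field can be illustrated with a minimal sketch. It is not the implementation from the paper: the function names, the count threshold `min_count`, the fixed `penalty` value, and the plain Viterbi decoder are all assumptions made for illustration. The sketch counts state transitions between consecutive segments in labeled sequences, keeps those seen often enough as an adjacency matrix (a graph over interaction states), and turns that graph into transition scores that strongly discourage unseen transitions during decoding.

```python
import numpy as np

def allowed_transitions(sequences, num_states, min_count=1):
    """Summarize and filter state transitions seen in labeled sequences.

    Returns a boolean adjacency matrix: mask[i, j] is True when the
    transition i -> j occurs at least `min_count` times (hypothetical filter).
    """
    counts = np.zeros((num_states, num_states), dtype=int)
    for seq in sequences:
        for prev, nxt in zip(seq[:-1], seq[1:]):
            counts[prev, nxt] += 1
    return counts >= min_count

def masked_transition_scores(mask, penalty=-1e4):
    """Turn the knowledge graph into CRF transition scores.

    Allowed transitions score 0; all others receive a large negative
    penalty (an assumed way of injecting the prior, chosen for simplicity).
    """
    return np.where(mask, 0.0, penalty)

def viterbi(emissions, transitions):
    """Decode the best state path for a linear-chain CRF.

    emissions: (T, S) per-segment scores; transitions: (S, S) scores
    where transitions[i, j] is the score of moving from state i to j.
    """
    num_steps, _ = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros(emissions.shape, dtype=int)
    for t in range(1, num_steps):
        cand = score[:, None] + transitions  # cand[i, j]: previous i, next j
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emissions[t]
    path = [int(score.argmax())]
    for t in range(num_steps - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]
```

With three hypothetical states and training sequences `[[0, 1, 2], [0, 1, 1, 2]]`, the transition 0 -> 2 is never observed. Emissions that would favor the path 0, 2, 2 on their own are then decoded as 0, 1, 2 once the masked transition scores are applied, which is the kind of local constraint between consecutive segments that the abstract describes.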