Existing appearance-based gaze estimation methods mostly regress gaze direction from eye images alone, neglecting facial information and head pose, which can be highly informative. In this paper, we propose a robust appearance-based gaze estimation method that regresses gaze direction jointly from the human face and eyes. The face and eye regions are located based on detected landmark points, and the representations of the two modalities are modeled with convolutional neural networks (CNNs), which are then combined for gaze estimation by a fusion network. Furthermore, considering the varying impact of different facial regions on human gaze, spatial weights over the facial area are learned automatically with an attention mechanism and applied to refine the facial representation. Experimental results on the EYEDIAP benchmark dataset validate the benefits of fusing multiple modalities for gaze estimation, and the proposed method outperforms previous state-of-the-art methods.
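To make the described architecture concrete, the following is a minimal sketch of a two-stream face-and-eye network with learned spatial attention over the face feature map. All layer configurations, input resolutions, and the fusion head are illustrative assumptions, not the authors' exact design.

```python
# Hypothetical sketch of the two-stream gaze network described in the
# abstract. Layer sizes, crop resolutions, and the fusion head are
# assumptions for illustration, not the paper's configuration.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """3x3 conv -> ReLU -> 2x2 max-pool."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


class SpatialWeights(nn.Module):
    """Learns a per-location weight map over the face feature map,
    standing in for the attention mechanism the abstract mentions."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),  # one weight in [0, 1] per spatial location
        )

    def forward(self, feat):
        return feat * self.attn(feat)  # broadcast weights over channels


class FaceEyeGazeNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Face stream; assumes 3x96x96 face crops from detected landmarks.
        self.face_cnn = nn.Sequential(
            conv_block(3, 32), conv_block(32, 64), conv_block(64, 128),
        )
        self.face_attn = SpatialWeights(128)
        # Eye stream; assumes 3x36x60 eye crops.
        self.eye_cnn = nn.Sequential(
            conv_block(3, 32), conv_block(32, 64), conv_block(64, 128),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Fusion head regressing gaze direction as (yaw, pitch) angles.
        self.head = nn.Sequential(
            nn.Linear(256, 128), nn.ReLU(inplace=True), nn.Linear(128, 2),
        )

    def forward(self, face, eye):
        f = self.pool(self.face_attn(self.face_cnn(face))).flatten(1)
        e = self.pool(self.eye_cnn(eye)).flatten(1)
        return self.head(torch.cat([f, e], dim=1))


# Example forward pass with dummy batches of face and eye crops.
model = FaceEyeGazeNet()
gaze = model(torch.randn(4, 3, 96, 96), torch.randn(4, 3, 36, 60))
print(gaze.shape)  # torch.Size([4, 2])
```

The key design choice reflected here is that attention is applied only to the face stream, so regions such as the eyes and nose bridge can be weighted more heavily before the two modality representations are concatenated and passed to the fusion head.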