Despite strong evidence of the clinical and economic benefits of minimally invasive surgery (MIS) for many common surgical procedures, MIS is grossly underutilized in many US hospitals, potentially due to its steep learning curve. Intraoperative videos, captured by a camera inserted into the body during MIS procedures, are emerging as an invaluable resource for MIS education, skill assessment, and quality assurance. However, these videos often run for several hours, and there is a pressing need for automated tools that help surgeons quickly find key semantic segments of interest within MIS videos. In this paper, we present a novel integrated approach for content-based retrieval of video segments that are semantically similar to a query video within a large collection of MIS videos. We use state-of-the-art deep 3D convolutional neural network (CNN) models, pre-trained on large public video classification datasets, to extract spatiotemporal features from MIS video segments, and we employ an iterative query refinement (IQR) strategy wherein a support vector machine (SVM) classifier, trained online from the user's relevance feedback, iteratively refines the search results. We show that our method outperforms the state of the art on the SurgicalActions160 dataset, which contains 160 video clips of typical surgical actions in gynecologic MIS procedures.
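The retrieval pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each video segment has already been reduced to a fixed-length feature vector by a pre-trained 3D CNN, ranks the database by cosine similarity to the query, and then reranks using a linear SVM fitted online to segments the user marks relevant or irrelevant. All function and variable names are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def initial_ranking(query_feat, db_feats):
    """Rank database segments by cosine similarity to the query feature.

    query_feat: (d,) spatiotemporal feature of the query clip.
    db_feats:   (n, d) features of the n database segments.
    Returns segment indices, most similar first.
    """
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = db @ q
    return np.argsort(-sims)

def refine_ranking(db_feats, relevant_idx, irrelevant_idx):
    """One round of iterative query refinement (IQR).

    Trains a linear SVM on the user's relevance feedback and reranks
    the whole database by its decision value.
    """
    X = np.vstack([db_feats[relevant_idx], db_feats[irrelevant_idx]])
    y = np.concatenate([np.ones(len(relevant_idx)),
                        -np.ones(len(irrelevant_idx))])
    clf = SVC(kernel="linear").fit(X, y)
    scores = clf.decision_function(db_feats)  # signed margin per segment
    return np.argsort(-scores)

# Toy example: two feature clusters standing in for two surgical actions.
rng = np.random.default_rng(0)
cluster_a = rng.normal([1.0, 0.0], 0.05, size=(5, 2))   # segments 0-4
cluster_b = rng.normal([0.0, 1.0], 0.05, size=(5, 2))   # segments 5-9
db_feats = np.vstack([cluster_a, cluster_b])
query = np.array([1.0, 0.0])

first_pass = initial_ranking(query, db_feats)
# User marks two hits and two misses from the first page of results.
refined = refine_ranking(db_feats, relevant_idx=[0, 1], irrelevant_idx=[5, 6])
```

In repeated IQR rounds, the user would label a few more results from each refined list and `refine_ranking` would be retrained on the growing feedback set.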