Bruno Silva, Sandro Queirós, Marcos Fernández-Rodríguez, Bruno Oliveira, Helena Torres, Pedro Morais, Lukas Buschle, Jorge Correia-Pinto, Estevão Lima, João Vilaça
Inspired by the “What Matters in Unsupervised Optical Flow” study, this work evaluates the performance of the ARFlow architecture for unsupervised optical flow when tracking keypoints in laparoscopic videos. This assessment could provide insight into the applicability of ARFlow and similar architectures to this application, as well as their strengths and limitations. To do so, we use the SurgT challenge’s dataset and metrics to evaluate the tracker’s accuracy and robustness and their relationship with distinct network components. Our results corroborate some of the findings reported by Jonschkowski et al. However, certain components behave differently, possibly indicating underlying issues intrinsic to the application that impact overall performance and may have to be addressed in soft-tissue trackers. These results point to potential bottlenecks and areas that future work may target.
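As a rough illustration of how a dense optical flow field can drive keypoint tracking, the sketch below moves a keypoint by the flow vector sampled at its location. It is a minimal sketch under stated assumptions, not the tracker evaluated in this work: the function names, the `(H, W, 2)` flow layout, and the bilinear sampling are all hypothetical choices, and the flow itself is assumed to come from some estimator such as ARFlow.

```python
import numpy as np

def sample_flow_bilinear(flow, x, y):
    """Bilinearly sample a dense flow field at a sub-pixel location.

    Assumed layout: flow has shape (H, W, 2), with flow[..., 0] = dx
    and flow[..., 1] = dy, both in pixels.
    """
    h, w = flow.shape[:2]
    # Clamp the query point to the image so indexing stays valid.
    x = float(np.clip(x, 0, w - 1))
    y = float(np.clip(y, 0, h - 1))
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * flow[y0, x0] + ax * flow[y0, x1]
    bottom = (1 - ax) * flow[y1, x0] + ax * flow[y1, x1]
    return (1 - ay) * top + ay * bottom

def propagate_keypoint(flow, keypoint):
    """Move one (x, y) keypoint from frame t to frame t+1 using the flow."""
    x, y = keypoint
    dx, dy = sample_flow_bilinear(flow, x, y)
    return (x + dx, y + dy)

def track_keypoint(flows, keypoint):
    """Chain frame-to-frame flows; returns the keypoint position per frame."""
    trajectory = [keypoint]
    for flow in flows:
        trajectory.append(propagate_keypoint(flow, trajectory[-1]))
    return trajectory
```

Chaining frame-to-frame estimates this way accumulates error over time, which is one reason a benchmark would measure robustness in addition to accuracy.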
Marcos Fernández-Rodríguez, Bruno Silva, Sandro Queirós, Helena Torres, Bruno Oliveira, Pedro Morais, Lukas Buschle, Jorge Correia-Pinto, Estevão Lima, João Vilaça
Surgical instrument segmentation in laparoscopy is essential for computer-assisted surgical systems. Despite the progress of deep learning in recent years, the dynamic setting of laparoscopic surgery still presents challenges for precise segmentation. The nnU-Net framework has excelled in semantic segmentation by analyzing single frames, without temporal information. The framework’s ease of use, including its automatic configuration and low expertise requirements, has made it a popular base framework for comparisons. Optical flow (OF) is a tool commonly used in video tasks to estimate motion and represent it in a single frame that carries temporal information. This work seeks to employ OF maps as an additional input to the nnU-Net architecture to improve its performance in the surgical instrument segmentation task, taking advantage of the fact that instruments are the main moving objects in the surgical field. With this new input, the temporal component is added indirectly, without modifying the architecture. Using the CholecSeg8k dataset, three different representations of movement were estimated and used as new inputs, and the resulting models were compared with a baseline model. Results showed that using OF maps improves the detection of classes with high movement, even when these are scarce in the dataset. To further improve performance, future work may focus on implementing other OF-preserving augmentations.
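The idea of supplying OF maps as an additional network input can be sketched as simple channel stacking. The abstract does not say which three representations of movement were used, so the three modes below (`"uv"`, `"magnitude"`, `"polar"`) are illustrative assumptions, as are the function names and the array layouts; the flow field is assumed to be precomputed by some OF estimator.

```python
import numpy as np

def flow_to_channels(flow, mode="uv"):
    """Turn a dense flow field into extra input channels.

    Assumed layout: flow has shape (H, W, 2) holding (dx, dy) per pixel.
    The three modes are hypothetical examples of motion representations.
    """
    dx, dy = flow[..., 0], flow[..., 1]
    if mode == "uv":          # raw horizontal and vertical displacement
        return np.stack([dx, dy], axis=0)
    if mode == "magnitude":   # motion strength only, one channel
        return np.hypot(dx, dy)[None]
    if mode == "polar":       # magnitude plus direction in radians
        return np.stack([np.hypot(dx, dy), np.arctan2(dy, dx)], axis=0)
    raise ValueError(f"unknown mode: {mode}")

def add_flow_input(frame_chw, flow, mode="uv"):
    """Concatenate motion channels to a (C, H, W) frame along the channel axis."""
    motion = flow_to_channels(flow, mode).astype(np.float32)
    return np.concatenate([frame_chw.astype(np.float32), motion], axis=0)
```

Because only the number of input channels changes, the segmentation network itself can stay as it is, which matches the stated goal of adding temporal information without modifying the architecture.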