Simultaneous localization and mapping (SLAM) is a problem in robotics that aims to model the environment and estimate the pose of a device within it at the same time. The resulting solutions are a core technology for emerging applications such as self-driving cars, automated guided vehicles (AGV), and domestic robots. Inevitably, the performance of SLAM algorithms relies heavily on input signals from optical equipment such as cameras, laser rangefinders, and LIDAR. Loop closure, the function that detects previously visited locations in order to correct accumulated errors, is a crucial element of a SLAM system. Conventionally, geometric features are used to interpret scenes for similarity estimation. In environments containing nearly identical scenes, feature-based approaches remain ineffective. Semantic objects and multi-frame comparison can therefore be integrated into the process to provide a new level of environmental information. In this article, we first provide an overview of the SLAM system. We then propose a semantic object-assisted approach and a temporal and spatial sequence comparison approach to improve the similarity measurement in the SLAM process. By integrating recognized objects such as landmarks and signs, we can better distinguish similar scenes and significantly improve building-scale indoor mapping results. The performance of systems adopting various optical technologies is also compared in this work.
In this paper, we present our new results in news video story
segmentation and classification in the context of the TRECVID 2003
video retrieval benchmarking event. We applied and extended the
Maximum Entropy statistical model to effectively fuse diverse
features from multiple levels and modalities, including visual,
audio, and text. We have included various features such as motion,
face, music/speech types, prosody, and high-level text
segmentation information. The statistical fusion model is used to
automatically discover relevant features contributing to the
detection of story boundaries. One novel aspect of our method is
the use of a feature wrapper to address different types of
features -- asynchronous, discrete, continuous and delta ones. We
also developed several novel features related to prosody. Using
the large news video set from the TRECVID 2003 benchmark, we
demonstrate satisfactory performance (F1 measures up to 0.76 in
ABC news and 0.73 in CNN news), show how these multi-level
multi-modal features construct the probabilistic framework, and
more importantly observe an interesting opportunity for further
improvement.
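
For readers less familiar with the F1 measure quoted above, the following is a minimal sketch, assuming the standard definition of F1 as the harmonic mean of precision and recall. The function names and the boundary counts are illustrative assumptions; they are not taken from the TRECVID 2003 evaluation.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def boundary_f1(detected: int, correct: int, reference: int) -> float:
    """F1 for story-boundary detection from raw counts (hypothetical helper).

    detected:  number of boundaries the system reported
    correct:   number of reported boundaries that match a reference boundary
    reference: number of boundaries in the ground truth
    """
    precision = correct / detected if detected else 0.0
    recall = correct / reference if reference else 0.0
    return f1_score(precision, recall)


# Illustrative counts only: 8 detected, 6 of them correct, 12 in the reference.
example_f1 = boundary_f1(detected=8, correct=6, reference=12)
```

With these made-up counts, precision is 0.75 and recall is 0.5, so a system can report mostly correct boundaries and still be pulled down by the ones it misses.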