A human-centric navigation system has been developed with a focus on supporting blind users of prosthetic vision devices by providing these users the ability to navigate their environment independently. The system maps the environment and localizes the user while incorporating context-enhanced information about the scene generated by AI-based methods. A deep learning semantic segmentation engine is utilized to process information from RGB and incorporates depth imaging sensors to produce semantic mappings of the scene. The heightened level of environmental interpretability provided by semantic mapping enables high-level human-computer interactions with the user, such as queries for guidance to specific objects or features within the environment. Unlike traditional sensor-based mapping frameworks that represent the environment as simple occupied / unoccupied space, our semantic mapping approach interprets the identity of occupied space as specific types of objects and their regional association to region types (e.g., static, movable, dynamic). The semantic segmentation also enables contextually-aware scene processing, which our framework leverages for robust ground estimation and tracking with fused depth data to distinguish above-ground obstacles. To help address the highly limited vision performance of current prosthetic vision technology, the processed depth information is used to generate augmented vision feedback for the prosthetic vision user by filtering out ground and background scene elements and highlighting near-field obstacles to aid in visual identification and avoidance of obstacles while navigating. Supplemental user feedback is provided via a directional haptic headband and voice-based notifications paired with spatial sound for path following along autonomously computed trajectories towards desired destinations. An optimized architecture enables real-time performance on a wearable embedded processing platform, which provides high-fidelity update rates for time-critical tasks such as localization and user feedback while decoupling tasks with heavy computational loads. Substantial speed-up is thereby achieved compared to the conventional baseline implementation.
|