Camera-based road scene analysis is an important task in building driver assistance systems and autonomous vehicles. A crucial component of road scene analysis is the detection, tracking, and recognition of text objects. In this paper, we consider the recognition of road scene text objects in sequences of video frames and propose an approach that accumulates per-frame recognition results and uses a dynamic stopping decision. Experimental evaluation on the open dataset RoadText-1K showed that the proposed approach achieves a lower mean recognition error for the same mean number of processed frames and significantly reduces the number of text objects that have to be recognized in each frame, thus relieving the load on the computational unit.