Continuously acquiring up-to-date information about the shape of the object allows for more efficient and robust classification, as well as more accurate estimation of the target state. However, previous methods have often overlooked this problem and use only the target information from the first frame during tracking. In this paper, we propose three specific and practical guidelines for updating the target state, which give a clear direction for developing an anchor-free generic object tracker without requiring any prior knowledge. Following these guidelines, we develop our Dynamic Update Template (DUT) tracker, which includes a template, a dynamic template, and a search branch. The tracker ensures unambiguous classification scores, provides estimation quality scores, and multiplies the two to obtain the pcore, which serves as the basis for updating the dynamic template. Thorough analyses and ablation studies validate the efficacy of the proposed guidelines. Our DUT tracker achieves better performance on challenging benchmarks (LaSOT) without excessive modifications. On the large-scale TrackingNet dataset, DUT attains an AUC score of 82.1 while running at more than 90 FPS, above the threshold for real-time performance.
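The score-based update rule described above can be illustrated with a minimal sketch. It assumes both scores lie in [0, 1] and that the dynamic template is replaced when the product exceeds a fixed threshold; the threshold rule, its default value, and the names `Candidate`, `combined_score`, and `maybe_update_template` are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    cls_score: float      # classification confidence, assumed to be in [0, 1]
    quality_score: float  # estimation-quality score, assumed to be in [0, 1]
    patch: object         # image crop that would become the new dynamic template


def combined_score(cls_score: float, quality_score: float) -> float:
    """Multiply the classification and quality scores into one value."""
    return cls_score * quality_score


def maybe_update_template(current_template, candidate, threshold=0.5):
    """Return the candidate patch if its combined score exceeds the threshold.

    The threshold rule and the default of 0.5 are assumptions made for this
    sketch; the abstract only states that the product guides the update.
    """
    score = combined_score(candidate.cls_score, candidate.quality_score)
    if score > threshold:
        return candidate.patch
    return current_template
```

Multiplying the two scores means a candidate must be both confidently classified and well localized before it can replace the dynamic template; a high value in only one of the two is not enough.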
In recent years, methods based on deep convolutional neural networks (CNNs) have gradually become the focus of research in hyperspectral image (HSI) classification. Hyperspectral data contains both spatial and spectral information. While CNN-based methods are effective at extracting local spatial features, they are less suited to handling spectral features and global information. Therefore, this paper proposes a multi-attention network that fuses local and key channel information for HSI classification. First, principal component analysis (PCA) is used to pre-process the HSI data. Second, a feature information fusion module based on the SE module and 2D convolution is constructed to fuse local spatial information and enhanced feature channel information. Third, a global covariance pooling function accelerates the convergence of the network. Finally, the fused features are sent to the Vision Transformer (ViT) module for position encoding to capture global sequential information and improve the classification results. Experiments on three public datasets demonstrate that the proposed network provides competitive results compared with other state-of-the-art HSI networks.
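As a rough illustration of the third step, covariance pooling summarizes a feature map by the covariance between its channels rather than by a per-channel average. The sketch below is plain Python under simplifying assumptions: the feature map is flattened to a list of spatial positions, the population covariance (division by N) is used, and any matrix normalization the paper may apply afterwards is omitted. The function name `covariance_pooling` is hypothetical.

```python
def covariance_pooling(features):
    """Compute the channel covariance matrix of a flattened feature map.

    features: list of N spatial positions, each a list of C channel values.
    Returns a C x C matrix (list of lists) of population covariances.
    """
    n = len(features)
    c = len(features[0])
    # Per-channel mean over all spatial positions.
    means = [sum(row[j] for row in features) / n for j in range(c)]
    # Subtract the channel mean from every position.
    centered = [[row[j] - means[j] for j in range(c)] for row in features]
    # Entry (i, j) is the average product of centered channels i and j.
    return [
        [sum(r[i] * r[j] for r in centered) / n for j in range(c)]
        for i in range(c)
    ]


if __name__ == "__main__":
    # Two spatial positions, two channels.
    print(covariance_pooling([[1.0, 2.0], [3.0, 6.0]]))
```

Unlike global average pooling, the result keeps second-order statistics, that is, how channels vary together, at the cost of an output whose size grows quadratically with the number of channels.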