Transformers are becoming the state of the art in multiple computer vision (CV) and natural language processing (NLP) tasks. For hyperspectral target detection, a Transformer architecture named SpectralFormer has been developed and has demonstrated improved performance over the previously state-of-the-art convolutional neural network (CNN) architecture on widely studied classification tasks. The SpectralFormer was adapted from a CV architecture, the Vision Transformer (ViT). Concurrently, still in CV, a hierarchical, multi-scale version of the ViT, named the Shifted Windows (Swin) Transformer, is gaining momentum and is already the state of the art on multiple tasks. In this paper, we adapt the Swin Transformer for hyperspectral classification and rare sub-pixel target detection. We apply this new architecture both to commonly studied public classification benchmarks such as the Pavia University and Pavia Centre datasets, and to a new, large-scale airborne sub-pixel target detection dataset that we developed. This new dataset comprises over 100 M pixels collected over three days at three locations thousands of kilometers apart and in different climates. The proposed model reaches competitive performance on all of these tasks while reducing memory usage by 63%.