Transformers are becoming the state of the art in multiple computer vision (CV) and natural language processing (NLP) tasks. For hyperspectral target detection, a Transformer architecture named SpectralFormer [2] has been developed and has demonstrated improved performance over the previous state-of-the-art convolutional neural network (CNN) architecture on widely studied classification tasks. SpectralFormer was adapted from a CV architecture, the Vision Transformer (ViT). Concurrently, still in CV, a hierarchical, multi-scale version of the ViT, the Shifted windows (Swin) Transformer, is gaining momentum and is already the state of the art on multiple tasks. In this paper, we adapt the Swin Transformer for hyperspectral classification and rare sub-pixel target detection. We apply this new architecture to commonly studied public classification benchmarks, such as the Pavia University and Pavia Centre datasets, and to a new, large-scale airborne sub-pixel target detection dataset that we developed. This new dataset comprises over 100 M pixels acquired over three collection days at three locations thousands of kilometers apart and in different climates. The proposed model reaches competitive performance on all of these tasks while reducing memory usage by 63 %.