Joint merging and pruning: adaptive selection of better token compression strategy

Wei Peng; Liancheng Zeng; Lizhuo Zhang; Yue Shen

doi:10.1117/1.JEI.33.4.043045

16 August 2024

Journal of Electronic Imaging, Vol. 33, Issue 4, 043045 (August 2024). https://doi.org/10.1117/1.JEI.33.4.043045

Abstract

Vision transformers (ViTs) are widely used and have driven significant advances across a variety of computer vision tasks. However, because the interaction between tokens in self-attention scales quadratically with the number of tokens, ViT models are computationally inefficient, which greatly limits their application in real-world scenarios. In recent years, researchers have observed that not all tokens contribute equally to the model's final prediction, which has motivated token compression methods; these fall mainly into token pruning and token merging. We argue, however, that neither pruning alone, which discards non-critical tokens, nor merging alone, which combines similar tokens, is an optimal strategy for token compression. To address this, this work proposes a token compression framework, joint merging and pruning (JMP), which adaptively selects the better token compression strategy based on the similarity between critical and non-critical tokens in each sample. JMP effectively reduces computational complexity while maintaining model performance and introduces no additional trainable parameters, achieving a good balance between efficiency and performance. Taking DeiT-S as an example, JMP reduces floating-point operations by 35% and increases throughput by more than 45%, while decreasing accuracy on ImageNet by only 0.2%.
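
The per-sample choice between pruning and merging described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract states only that the strategy is selected from the similarity between critical and non-critical tokens; the importance scores, the criterion (mean of each non-critical token's best cosine similarity), the cutoff `sim_threshold`, the direction of the rule (similar tokens are merged, dissimilar ones pruned), and the averaging merge are all assumptions made here for illustration.

```python
import numpy as np


def compress_tokens(tokens, scores, keep, sim_threshold=0.5):
    """Illustrative sketch only, not the JMP algorithm from the paper.

    tokens: (N, D) float array of token embeddings for one sample.
    scores: (N,) importance scores (hypothetical, e.g. class-token attention).
    keep: number of critical tokens to retain.
    sim_threshold: hypothetical cutoff on the mean best cosine similarity.
    Returns the compressed (keep, D) tokens and the strategy that was used.
    """
    order = np.argsort(-scores)                  # most important first
    critical = tokens[order[:keep]]
    rest = tokens[order[keep:]]                  # non-critical tokens
    if rest.shape[0] == 0:
        return critical, "none"

    def unit(x):
        # Row-wise L2 normalisation with a small epsilon for stability.
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)

    # Cosine similarity of every non-critical token to every critical token.
    sim = unit(rest) @ unit(critical).T          # shape (N - keep, keep)
    best = sim.argmax(axis=1)                    # nearest critical token
    best_sim = sim.max(axis=1)

    if best_sim.mean() < sim_threshold:
        # Assumed rule: dissimilar leftovers carry little shared content,
        # so they are simply dropped (pruning).
        return critical, "prune"

    # Assumed rule: similar leftovers are averaged into their nearest
    # critical token (merging). np.add.at accumulates repeated indices.
    sums = critical.astype(float)
    counts = np.ones(keep)
    np.add.at(sums, best, rest)
    np.add.at(counts, best, 1.0)
    return sums / counts[:, None], "merge"
```

Either branch returns `keep` tokens, so the downstream sequence length, and with it the quadratic attention cost, is the same regardless of which strategy is chosen for a given sample.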

Citation

Wei Peng, Liancheng Zeng, Lizhuo Zhang, and Yue Shen "Joint merging and pruning: adaptive selection of better token compression strategy," Journal of Electronic Imaging 33(4), 043045 (16 August 2024). https://doi.org/10.1117/1.JEI.33.4.043045

Received: 23 April 2024; Accepted: 26 July 2024; Published: 16 August 2024

JOURNAL ARTICLE
13 PAGES

KEYWORDS

Education and training

Performance modeling

Transformers

Matrices

Visual process modeling

Data modeling

Statistical modeling
