1 May 2022 Referential genome sequence compression with low memory consumption
Zhiwen Lu, Jianhua Chen, Rongshu Wang
Proceedings Volume 12171, Thirteenth International Conference on Signal Processing Systems (ICSPS 2021); 121711A (2022) https://doi.org/10.1117/12.2631583
Event: Thirteenth International Conference on Signal Processing Systems (ICSPS 2021), 2021, Shanghai, China
Abstract
With the rapid development of genome sequencing technology, a large amount of genome data has been generated, which in turn creates a storage problem for this massive data. Therefore, the compression of genome data has become a research hotspot. We propose a new genome data compression algorithm called LCMRGC (low memory consumption referential genome compressor) for FASTA format sequences. The algorithm uses a suffix array to support the search for matching strings and uses binary search to accelerate exact matching, so as to obtain a better compression ratio. Experimental results on standard genome data show that the proposed algorithm significantly reduces the memory required to run the program while remaining competitive in compression ratio and compression time.
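To make the matching idea in the abstract concrete, the following is a minimal Python sketch of referential matching with a suffix array and binary search. It is not the authors' LCMRGC implementation: the function names (`build_suffix_array`, `longest_match`, `encode`, `decode`), the `min_len` threshold, and the token format are all hypothetical, the suffix array is built naively, and none of the paper's memory-saving techniques are modeled.

```python
def build_suffix_array(ref):
    # Naive construction by sorting suffixes; suitable for toy inputs only.
    return sorted(range(len(ref)), key=lambda i: ref[i:])


def common_prefix_len(ref, i, target, j):
    # Length of the common prefix of ref[i:] and target[j:].
    n = 0
    while i + n < len(ref) and j + n < len(target) and ref[i + n] == target[j + n]:
        n += 1
    return n


def longest_match(ref, sa, target, j):
    # Binary search for where target[j:] would be inserted among the sorted suffixes.
    query = target[j:]
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if ref[sa[mid]:] < query:
            lo = mid + 1
        else:
            hi = mid
    # The longest common prefix is attained by a neighbour of the insertion point.
    best = (0, 0)
    for k in (lo - 1, lo):
        if 0 <= k < len(sa):
            n = common_prefix_len(ref, sa[k], target, j)
            if n > best[1]:
                best = (sa[k], n)
    return best  # (position in ref, match length)


def encode(ref, target, min_len=4):
    # Greedy encoding: (position, length) for matches, single characters otherwise.
    # min_len is an assumed threshold, not a value taken from the paper.
    sa = build_suffix_array(ref)
    tokens, j = [], 0
    while j < len(target):
        pos, n = longest_match(ref, sa, target, j)
        if n >= min_len:
            tokens.append((pos, n))
            j += n
        else:
            tokens.append(target[j])
            j += 1
    return tokens


def decode(ref, tokens):
    # Rebuild the target from the reference and the token list.
    parts = []
    for t in tokens:
        parts.append(ref[t[0]:t[0] + t[1]] if isinstance(t, tuple) else t)
    return "".join(parts)
```

A real compressor would additionally entropy-code the token stream and use a linear-time, memory-aware suffix array construction; the sketch only shows how a sorted suffix index turns each match search into a logarithmic number of comparisons.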
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Zhiwen Lu, Jianhua Chen, and Rongshu Wang "Referential genome sequence compression with low memory consumption", Proc. SPIE 12171, Thirteenth International Conference on Signal Processing Systems (ICSPS 2021), 121711A (1 May 2022); https://doi.org/10.1117/12.2631583
KEYWORDS
Detection and tracking algorithms
Data compression
Binary data
Data storage
Data conversion
Information science
Lutetium