Lossless compression for quality scores of genomic data based on high-throughput sequencing
11 October 2023
Xingxi He
Proceedings Volume 12800, Sixth International Conference on Computer Information Science and Application Technology (CISAT 2023); 128000G (2023) https://doi.org/10.1117/12.3004030
Event: 6th International Conference on Computer Information Science and Application Technology (CISAT 2023), 2023, Hangzhou, China
Abstract
FASTQ is a common genome file format produced by sequencing. Each record consists mainly of an identifier, a base sequence and a quality score sequence. A quality score indicates the estimated error rate of the base it corresponds to. Although base compression has advanced considerably in recent years, compressing quality scores remains challenging. This paper improves ACO, a compression algorithm for quality scores, in three respects. First, taking into account both the sequencing process and how FASTQ files are generated, the quality scores are traversed in a serpentine order rather than the traditional raster-scanning order. Second, base changes are introduced as context information. Third, the weighted average of each quality score sequence is used as an index of the sequence's statistical characteristics, and sequences are clustered on this index to improve coding performance.
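To make the three ideas in the abstract concrete, the sketch below shows one possible way to combine them when building coding contexts. It is not the paper's or ACO's implementation. Every specific choice here is a hypothetical assumption made only for illustration: the Phred+33 offset, the linear position weights used in the weighted average, the four equal-width clusters with a maximum score of 41, and the composition of the context tuple. The sketch only produces (context, symbol) pairs. An actual compressor would feed these pairs to an adaptive arithmetic coder.

```python
from typing import Iterator, List, Sequence, Tuple


def serpentine_order(qualities: Sequence[str]) -> List[Tuple[int, int]]:
    """Return (read_index, position) pairs in serpentine (boustrophedon) order.

    Even-numbered reads are scanned left to right and odd-numbered reads right
    to left. Raster order instead restarts every read at position 0.
    """
    order: List[Tuple[int, int]] = []
    for r, q in enumerate(qualities):
        cols = range(len(q)) if r % 2 == 0 else range(len(q) - 1, -1, -1)
        order.extend((r, c) for c in cols)
    return order


def weighted_average(q: str, phred_offset: int = 33) -> float:
    """Weighted mean Phred score of one read.

    Hypothetical weights: position i (0-based) gets weight i + 1.
    """
    scores = [ord(ch) - phred_offset for ch in q]
    if not scores:
        return 0.0
    weights = range(1, len(scores) + 1)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def cluster_id(avg: float, num_clusters: int = 4, max_q: float = 41.0) -> int:
    """Assign a weighted average to one of num_clusters equal-width bins
    (a hypothetical stand-in for the paper's clustering step)."""
    idx = int(avg / max_q * num_clusters)
    return min(max(idx, 0), num_clusters - 1)


def contexts(
    bases: Sequence[str], qualities: Sequence[str], phred_offset: int = 33
) -> Iterator[Tuple[Tuple[int, bool, int], int]]:
    """Yield ((prev_quality, base_changed, cluster), quality) in serpentine order.

    prev_quality: the previously visited score in scan order (0 at the start).
    base_changed: True if the base at this position differs from the base
                  immediately to its left in the same read.
    cluster:      cluster index of the read's weighted average score.
    """
    clusters = [cluster_id(weighted_average(q, phred_offset)) for q in qualities]
    prev_q = 0
    for r, c in serpentine_order(qualities):
        symbol = ord(qualities[r][c]) - phred_offset
        base_changed = c > 0 and bases[r][c] != bases[r][c - 1]
        yield (prev_q, base_changed, clusters[r]), symbol
        prev_q = symbol
```

For example, `list(contexts(["AC", "GG"], ["II", "#I"]))` visits read 0 left to right and read 1 right to left. Because the scan turns around at the end of each read, the first score of the second read is conditioned on the last score of the first read rather than on the start of the previous row. This adjacency is the motivation the abstract gives for preferring serpentine over raster order.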
© (2023) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Xingxi He "Lossless compression for quality scores of genomic data based on high-throughput sequencing", Proc. SPIE 12800, Sixth International Conference on Computer Information Science and Application Technology (CISAT 2023), 128000G (11 October 2023); https://doi.org/10.1117/12.3004030
Keywords: Genomics; Data storage; Modeling; Data analysis; Signal processing; Statistical analysis
