Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/107184
DC Field | Value | Language
dc.contributor | Department of Electrical and Electronic Engineering | -
dc.creator | Cheng, KO | -
dc.creator | Law, NF | -
dc.creator | Siu, WC | -
dc.date.accessioned | 2024-06-13T01:04:26Z | -
dc.date.available | 2024-06-13T01:04:26Z | -
dc.identifier.issn | 1545-5963 | -
dc.identifier.uri | http://hdl.handle.net/10397/107184 | -
dc.language.iso | en | en_US
dc.publisher | Institute of Electrical and Electronics Engineers | en_US
dc.rights | © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | en_US
dc.rights | The following publication K. -O. Cheng, N. -F. Law and W. -C. Siu, "Clustering-Based Compression for Population DNA Sequences," in IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 16, no. 1, pp. 208-221, Jan.-Feb. 2019 is available at https://doi.org/10.1109/TCBB.2017.2762302. | en_US
dc.subject | Biology and genetics | en_US
dc.subject | Clustering | en_US
dc.subject | Compression technologies | en_US
dc.subject | Data compaction and compression | en_US
dc.title | Clustering-based compression for population DNA sequences | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.spage | 208 | -
dc.identifier.epage | 221 | -
dc.identifier.volume | 16 | -
dc.identifier.issue | 1 | -
dc.identifier.doi | 10.1109/TCBB.2017.2762302 | -
dcterms.abstract | Due to the advancement of DNA sequencing techniques, the number of sequenced individual genomes has experienced an exponential growth. Thus, effective compression of this kind of sequences is highly desired. In this work, we present a novel compression algorithm called Reference-based Compression algorithm using the concept of Clustering (RCC). The rationale behind RCC is based on the observation about the existence of substructures within the population sequences. To utilize these substructures, k-means clustering is employed to partition sequences into clusters for better compression. A reference sequence is then constructed for each cluster so that sequences in that cluster can be compressed by referring to this reference sequence. The reference sequence of each cluster is also compressed with reference to a sequence which is derived from all the reference sequences. Experiments show that RCC can further reduce the compressed size by up to 91.0 percent when compared with state-of-the-art compression approaches. There is a compromise between compressed size and processing time. The current implementation in Matlab has time complexity in a factor of thousands higher than the existing algorithms implemented in C/C++. Further investigation is required to improve processing time in future. | -
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | IEEE/ACM transactions on computational biology and bioinformatics, Jan.-Feb. 2019, v. 16, no. 1, p. 208-221 | -
dcterms.isPartOf | IEEE/ACM transactions on computational biology and bioinformatics | -
dcterms.issued | 2019-01 | -
dc.identifier.scopus | 2-s2.0-85054078610 | -
dc.identifier.pmid | 29028207 | -
dc.identifier.eissn | 1557-9964 | -
dc.description.validate | 202403 bckw | -
dc.description.oa | Accepted Manuscript | en_US
dc.identifier.FolderNumber | EIE-0424 | en_US
dc.description.fundingSource | Others | en_US
dc.description.fundingText | Hong Kong SAR Government; Centre for Signal Processing, The Hong Kong Polytechnic University | en_US
dc.description.pubStatus | Published | en_US
dc.identifier.OPUS | 19838749 | en_US
dc.description.oaCategory | Green (AAM) | en_US
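
The abstract describes a three-level scheme: cluster the sequences, build one reference per cluster, and compress the cluster references against a sequence derived from all of them. The sketch below illustrates that idea only and is not the authors' Matlab implementation. It assumes equal-length, pre-aligned sequences, replaces k-means with a k-modes-style loop (Hamming distance, majority-vote consensus), and stores plain substitution lists with no entropy coding. All function names are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def consensus(seqs):
    """Majority symbol at every position (assumed way to build a reference)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def diff(seq, ref):
    """Substitutions (position, symbol) that turn ref into seq."""
    return [(i, s) for i, (s, r) in enumerate(zip(seq, ref)) if s != r]


def patch(ref, edits):
    """Apply a substitution list to ref, inverting diff()."""
    out = list(ref)
    for i, s in edits:
        out[i] = s
    return "".join(out)


def cluster(seqs, k, max_iter=20):
    """k-modes-style clustering: a stand-in for the k-means step in the abstract."""
    centers = list(dict.fromkeys(seqs))[:k]  # first k distinct sequences
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: hamming(s, centers[c]))
            for s in seqs
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [s for s, l in zip(seqs, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = consensus(members)
    return labels, centers


def compress(seqs, k):
    """Encode sequences against cluster references, and those against a global one."""
    labels, refs = cluster(seqs, k)
    global_ref = consensus(refs)
    return {
        "global_ref": global_ref,
        "cluster_refs": [diff(r, global_ref) for r in refs],
        "sequences": [(l, diff(s, refs[l])) for s, l in zip(seqs, labels)],
    }


def decompress(packed):
    """Rebuild cluster references first, then every sequence."""
    refs = [patch(packed["global_ref"], e) for e in packed["cluster_refs"]]
    return [patch(refs[l], e) for l, e in packed["sequences"]]


# Toy population with two visible subgroups (made-up data for illustration).
population = ["ACGTACGT", "TTGTACCC", "ACGTACGA",
              "TTGTACCA", "ACGTACGT", "TTGTACCC"]
packed = compress(population, k=2)
assert decompress(packed) == population  # the encoding is lossless
```

Sequences that sit close to their cluster reference are stored as short (or empty) substitution lists, which is the effect the abstract attributes to exploiting population substructure.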
Appears in Collections: Journal/Magazine Article
Files in This Item:
File | Description | Size | Format
Cheng_Clustering-Based_Compression_Population.pdf | Pre-Published version | 14.95 MB | Adobe PDF
Open Access Information
Status open access
File Version Final Accepted Manuscript

Page views: 1 (as of Jun 30, 2024)
Downloads: 2 (as of Jun 30, 2024)
Scopus citations: 13 (as of Jun 27, 2024)
Web of Science citations: 8 (as of Jun 27, 2024)
