Cross chromosomal similarity for DNA sequence compression

Wu, CPP; Law, NF; Siu, WC

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/25884

DC Field	Value	Language
dc.contributor	Department of Electronic and Information Engineering	-
dc.creator	Wu, CPP	-
dc.creator	Law, NF	-
dc.creator	Siu, WC	-
dc.date.accessioned	2015-08-28T04:30:33Z	-
dc.date.available	2015-08-28T04:30:33Z	-
dc.identifier.issn	0973-2063	en_US
dc.identifier.uri	http://hdl.handle.net/10397/25884	-
dc.language.iso	en	en_US
dc.publisher	Biomedical Informatics Publishing Group	en_US
dc.rights	© 2008 Biomedical Informatics Publishing Group	en_US
dc.rights	This is an open-access article, which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original author and source are credited.	en_US
dc.rights	The following publication Wu, C. P. P., Law, N. F., & Siu, W. C. (2008). Cross chromosomal similarity for DNA sequence compression. Bioinformation, 2(9), 412-416 is available at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2533061/	en_US
dc.subject	DNA	en_US
dc.subject	Sequence	en_US
dc.subject	Chromosome	en_US
dc.subject	Prediction	en_US
dc.subject	S. cerevisiae	en_US
dc.title	Cross chromosomal similarity for DNA sequence compression	en_US
dc.type	Journal/Magazine Article	en_US
dc.identifier.spage	412	en_US
dc.identifier.epage	416	en_US
dc.identifier.volume	2	en_US
dc.identifier.issue	9	en_US
dcterms.abstract	Current DNA compression algorithms work by finding similar repeated regions within a DNA sequence and then encoding these regions together to achieve compression. Our study of chromosome sequence similarity reveals that similar repeated regions within one chromosome make up only about 4.5% of the total sequence length, so the compression gain is often low. It is well known that similarities exist among different regions of chromosome sequences, which implies that similar repeated sequences can be found among these regions. Here, we study cross-chromosomal similarity for DNA sequence compression. The length and location of similar repeated regions among the sixteen chromosomes of S. cerevisiae are studied. It is found that, on average, similar subsequences found between two chromosome sequences account for about 10% of the sequence, of which 8% comes from cross-chromosomal prediction and 2% from self-chromosomal prediction. When all 16 chromosomes studied are considered together, the percentage of similar subsequences is about 18%, of which only 1.2% comes from self-chromosomal prediction and the rest from cross-chromosomal prediction. This suggests the importance of cross-chromosomal similarities, in addition to self-chromosomal similarities, in DNA sequence compression. Using both self-chromosomal and cross-chromosomal predictions, an additional 23% of storage space could be saved on average when compressing the 16 chromosomes of S. cerevisiae.	-
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	Bioinformation, 2008, v. 2, no. 9, p. 412-416	-
dcterms.isPartOf	Bioinformation	-
dcterms.issued	2008	-
dc.identifier.rosgroupid	r40621	-
dc.description.ros	2008-2009 > Academic research: refereed > Publication in refereed journal	en_US
dc.description.oa	Version of Record	en_US
dc.identifier.FolderNumber	OA_IR/PIRA	en_US
dc.description.pubStatus	Published	en_US
dc.description.oaCategory	VoR allowed	en_US
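The abstract above splits similar subsequences into those predicted from the same chromosome (self-chromosomal) and those predicted from other chromosomes (cross-chromosomal). The sketch below illustrates that split under simplifying assumptions; it is not the authors' method. It uses exact k-mer matching only, a hypothetical default minimum match length `k = 12`, and a hypothetical helper `repeat_coverage`. A base counts as self-predicted if it lies in a k-mer that already occurred earlier in the target sequence, and as cross-predicted if it lies in a k-mer present in another sequence but is not already self-predicted.

```python
from typing import List, Set, Tuple


def kmer_set(seq: str, k: int) -> Set[str]:
    """All length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def repeat_coverage(target: str, others: List[str], k: int = 12) -> Tuple[float, float]:
    """Hypothetical helper: (self %, cross %) of target bases covered by exact k-mer repeats.

    Assumption: exact matches only; self-chromosomal coverage takes priority,
    so the two percentages never count the same base twice.
    """
    n = len(target)
    if n < k:
        return 0.0, 0.0

    # k-mers available for cross-chromosomal prediction (all other sequences).
    cross_kmers: Set[str] = set()
    for seq in others:
        cross_kmers |= kmer_set(seq, k)

    self_cov = [False] * n
    cross_cov = [False] * n
    seen: Set[str] = set()  # k-mers starting at earlier positions of target

    for i in range(n - k + 1):
        kmer = target[i:i + k]
        if kmer in seen:  # an earlier copy exists in the same sequence
            for j in range(i, i + k):
                self_cov[j] = True
        if kmer in cross_kmers:  # a copy exists in another sequence
            for j in range(i, i + k):
                cross_cov[j] = True
        seen.add(kmer)  # add only after checking, so matches are strictly earlier

    self_bases = sum(self_cov)
    cross_only = sum(c and not s for s, c in zip(self_cov, cross_cov))
    return 100.0 * self_bases / n, 100.0 * cross_only / n


if __name__ == "__main__":
    # Toy example: the second "ACGTAC" repeats the first, covering 6 of 10 bases.
    print(repeat_coverage("ACGTACGTAC", [], k=4))  # (60.0, 0.0)
```

For example, `repeat_coverage("TTTTGGGG", ["AAGGGGAA"], k=4)` returns `(0.0, 50.0)`, because only the `GGGG` window is found, and only in the other sequence. A hash set of exact k-mers keeps the sketch short. Practical DNA compressors typically also search for reverse-complement and approximate (mismatch-tolerant) repeats, which this sketch ignores, so its percentages are not comparable with the figures reported in the abstract.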
Appears in Collections:	Journal/Magazine Article

Files in This Item:

File	Description	Size	Format
Wu_Cross_Chromosomal_Similarity.pdf		130.24 kB	Adobe PDF

Open Access Information

Status	open access
File Version	Version of Record

Page views	236 (last week: 2; as of Feb 9, 2026)
Downloads	89 (as of Feb 9, 2026)
