TY - GEN
T1 - Word-based block-sorting text compression
AU - Isal, R. Yugo Kartono
AU - Moffat, A.
N1 - Publisher Copyright: © 2001 IEEE.
PY - 2001
Y1 - 2001
N2 - Block sorting is an innovative compression mechanism introduced by M. Burrows and D.J. Wheeler (1994). It involves three steps: permuting the input one block at a time through the use of the Burrows-Wheeler transform (BWT); applying a move-to-front (MTF) transform to each of the permuted blocks; and then entropy coding the output with a Huffman or arithmetic coder. Until now, block-sorting implementations have assumed that the input message is a sequence of characters. In this paper, we extend the block-sorting mechanism to word-based models. We also consider other transformations as an alternative to MTF, and are able to show improved compression results compared to MTF. For large text files, the combination of word-based modelling, BWT and MTF-like transformations allows excellent compression effectiveness to be attained within reasonable resource costs.
AB - Block sorting is an innovative compression mechanism introduced by M. Burrows and D.J. Wheeler (1994). It involves three steps: permuting the input one block at a time through the use of the Burrows-Wheeler transform (BWT); applying a move-to-front (MTF) transform to each of the permuted blocks; and then entropy coding the output with a Huffman or arithmetic coder. Until now, block-sorting implementations have assumed that the input message is a sequence of characters. In this paper, we extend the block-sorting mechanism to word-based models. We also consider other transformations as an alternative to MTF, and are able to show improved compression results compared to MTF. For large text files, the combination of word-based modelling, BWT and MTF-like transformations allows excellent compression effectiveness to be attained within reasonable resource costs.
UR - http://www.scopus.com/inward/record.url?scp=34547622455&partnerID=8YFLogxK
U2 - 10.1109/ACSC.2001.906628
DO - 10.1109/ACSC.2001.906628
M3 - Conference contribution
AN - SCOPUS:34547622455
T3 - Proceedings - 24th Australasian Computer Science Conference, ACSC 2001
SP - 92
EP - 99
BT - Proceedings - 24th Australasian Computer Science Conference, ACSC 2001
A2 - Oudshoorn, Michael
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 24th Australasian Computer Science Conference, ACSC 2001
Y2 - 29 January 2001 through 2 February 2001
ER -