THE education site for computer science and ICT

2. Basic principle

Compression depends on patterns being present in the information. This is true of text, image, video or music data.

A pattern means that some of the data is repeated: the same content appears in more than one place in the file.

Consider a text file that just contains the same statement over and over again.

[Figure: text file example, showing the sentence "All work and no play makes Jack a dull boy" repeated ten times]

The original data file has 100 words in it, ignoring the spaces. But the information (as opposed to the data) is really just the first sentence, containing the ten words "All work and no play makes Jack a dull boy". This sentence is repeated 10 times.

A text compression algorithm would spot this pattern and create a file that effectively contains only the first 10 words plus some instructions about repeating them.

Therefore 100 words become just 10, along with a few additional details. This compresses the file by a factor of almost ten.
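The idea can be sketched in a few lines of Python. This is only an illustrative toy, not how a real text compressor works: the function names `compress_repeats` and `decompress_repeats` are made up for this example, and the sketch assumes the text is nothing more than one group of words repeated, separated by single spaces.

```python
SENTENCE = "All work and no play makes Jack a dull boy"


def compress_repeats(text):
    """Find the shortest group of words that, repeated, rebuilds the text.

    Returns (unit, count). Hypothetical helper for illustration only.
    """
    words = text.split()
    total = len(words)
    for size in range(1, total + 1):
        # A candidate unit must divide the text into equal parts.
        if total % size == 0 and words[:size] * (total // size) == words:
            return " ".join(words[:size]), total // size
    return "", 0  # empty input: nothing to repeat


def decompress_repeats(unit, count):
    """Rebuild the text by repeating the unit, joined by single spaces."""
    return " ".join([unit] * count)


original = " ".join([SENTENCE] * 10)      # the 100-word file
unit, count = compress_repeats(original)  # the 10 words plus a repeat count
restored = decompress_repeats(unit, count)

print(len(original.split()), "words become", len(unit.split()), "words, repeated", count, "times")
print("restored correctly:", restored == original)
```

The repeat count plays the role of the "few additional details": it has to be stored alongside the ten words, which is why the saving is slightly less than a full factor of ten.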

The efficiency of compression can be expressed with a simple formula:

Compression ratio = uncompressed data size / compressed data size
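As a rough check of the formula, the repeated sentence can be run through a real compressor. The sketch below takes the ratio to be uncompressed size divided by compressed size, which matches the "factor of almost ten" in the example. Python's standard `zlib` module is used purely as a stand-in, since the text does not name a particular algorithm, and `compression_ratio` is a helper name invented for this example.

```python
import zlib


def compression_ratio(uncompressed_size, compressed_size):
    """Ratio of original size to compressed size (bigger means better compression)."""
    if compressed_size <= 0:
        raise ValueError("compressed size must be positive")
    return uncompressed_size / compressed_size


# Word-count view of the example: 100 words reduced to 10.
print(compression_ratio(100, 10))

# Byte-count view using a real compressor on the same repeated text.
text = " ".join(["All work and no play makes Jack a dull boy"] * 10)
raw = text.encode("utf-8")
packed = zlib.compress(raw)
ratio = compression_ratio(len(raw), len(packed))

print(len(raw), "bytes ->", len(packed), "bytes, ratio", round(ratio, 2))
```

The exact byte counts depend on the compressor and its settings, so the measured ratio will not be exactly ten, but it should be well above one for text this repetitive.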

            

Compression is also used by video and music streaming services to reduce the bit rate (the number of bits transmitted per second). The compression formula then becomes:

Compression ratio = uncompressed data rate / compressed data rate
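A worked example for the data-rate version, again taking the ratio as the uncompressed rate divided by the compressed rate. The uncompressed figure is standard CD-quality audio (44,100 samples per second, 16 bits per sample, 2 channels). The 128 kbit/s stream rate is a hypothetical value chosen for illustration, not a figure from any particular service, and the function names are invented for this sketch.

```python
def audio_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bits per second needed for uncompressed audio."""
    return sample_rate_hz * bits_per_sample * channels


def rate_compression_ratio(uncompressed_rate, compressed_rate):
    """Ratio of the uncompressed data rate to the compressed data rate."""
    if compressed_rate <= 0:
        raise ValueError("compressed rate must be positive")
    return uncompressed_rate / compressed_rate


cd_rate = audio_bit_rate(44_100, 16, 2)  # 1,411,200 bits per second
stream_rate = 128_000                    # hypothetical streaming bit rate

ratio = rate_compression_ratio(cd_rate, stream_rate)
print(cd_rate, "bit/s ->", stream_rate, "bit/s, ratio", round(ratio, 3))
```

With these numbers the stream needs roughly one eleventh of the bandwidth of the uncompressed audio.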