3. Run Length Encoding format

The previous page described the principle of RLE. However in reality, it is not practical to insert a '4' into the data because how is an application meant to spot it from all the other bytes?

The solution, is to format the data to use two bytes for each "run" (section of data). The first byte is the run length and the second byte is the data itself.

Look at the following 25 character string:

                              PPPPPPPPPPPPPPPPPPPPPPPPP

To encode this with RLE, you set the run length as the first value, and the second value is the data itself. So it would now become:

25P

No data has been lost, but the information is now only 2 bytes long rather than 25.

Now let's look at a more mixed data set:

                              PPPPRRPPPPPPPPPXXXXXRPX

The rule is: the first byte is the run length and the second is the data itself. So the RLE encoding for the above data set would be:

                               4P 2R 9P 5X 1R 1P 1X

This is 14 bytes long compared to 23 bytes.

Run length encoding uses two bytes for every run - even runs that are only one unit long. So it's possible that, if there were not enough repeating patterns, RLE would result in a bigger file since it adds an extra byte to every data item. This is why it only makes sense to use RLE if there is a pattern inside the data that allows compression.

Challenge see if you can find out one extra fact on this topic that we haven't already told you

Click on this link: What is run length encoding