
1. Floating point limits

Real numbers (those with a fractional part) are represented as binary floating point numbers, which are covered in their own section.

Computers store floating point numbers in memory using a fixed number of bits for each number. For example, a single precision floating point number always takes 32 bits (4 bytes) of memory, whatever the word size of the computer.

This means that there is a limit to the range and precision of the numbers that can be represented in a given amount of memory.
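As a small illustration, here is a sketch in Python. It assumes the standard `struct` module's `"f"` and `"d"` format codes, which use the IEEE 754 single and double precision formats on all common platforms. The storage size of each format can be checked directly, along with how many distinct bit patterns 32 bits allow (the variable names are just for this example):

```python
import struct

# struct format codes: "f" = single precision, "d" = double precision.
# The "<" prefix asks for standard sizes with no platform-specific padding.
single_bytes = struct.calcsize("<f")
double_bytes = struct.calcsize("<d")

print("single precision:", single_bytes, "bytes =", single_bytes * 8, "bits")
print("double precision:", double_bytes, "bytes =", double_bytes * 8, "bits")

# A fixed number of bits gives a fixed number of distinct bit patterns,
# so only finitely many different values can ever be stored.
print("bit patterns in 32 bits:", 2 ** (single_bytes * 8))
```

Since 32 bits give only 4,294,967,296 patterns, no 32 bit format can represent every real number. It has to trade off range against precision.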

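Floating point was developed because it can cover a very wide range of values while keeping reasonable precision. It does this by allowing the binary point to 'float': instead of sitting at a fixed position, as it does in a fixed point scheme, its position is recorded in a separate exponent.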
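To see the point 'floating', the Python sketch below splits a number into the three IEEE 754 single precision fields: 1 sign bit, 8 exponent bits (stored with a bias of 127) and 23 fraction bits. `float32_fields` is a hypothetical helper written for this example, not a library function. The numbers 6.5 (binary 1.101 × 2²) and 0.40625 (binary 1.101 × 2⁻²) end up with identical fraction bits. Only the exponent changes.

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Split x (rounded to single precision) into its IEEE 754 bit fields."""
    # Pack as a big-endian 32-bit float, then reread the same 4 bytes
    # as an unsigned 32-bit integer so the individual bits can be masked.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31               # 1 bit: 0 = positive, 1 = negative
    exponent = (bits >> 23) & 0xFF  # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF      # 23 bits after the implied leading 1
    return sign, exponent, fraction

for value in (6.5, 0.40625, -6.5):
    s, e, f = float32_fields(value)
    # Subtract the bias to show the real power of two.
    print(f"{value:>8}: sign={s} exponent={e - 127:+d} fraction={f:023b}")
```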

A 32 bit representation is called 'single precision'. Using the computer industry standard format for floating point (IEEE 754), the largest magnitude it can represent is about

$$±3.4 \times 10^{38}$$

with a precision of about 7 decimal digits.
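Both figures can be checked with a short Python sketch, again assuming `struct`'s `"f"` code is IEEE 754 single precision. `bits_to_float32` and `to_float32` are hypothetical helpers for this example. The first builds the largest finite single precision value from its bit pattern (exponent field 254, all fraction bits set). The second rounds a value to single precision so that the lost digits become visible.

```python
import struct

def bits_to_float32(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# Largest finite single: sign 0, exponent field 254, all 23 fraction bits set.
largest = bits_to_float32(0x7F7FFFFF)
print(largest)                             # 3.4028234663852886e+38

# Only about 7 significant decimal digits survive rounding to single precision.
print(f"{to_float32(0.1):.17g}")           # 0.10000000149011612
print(f"{to_float32(123456789.0):.17g}")   # 123456792
```

In the last line, 123456789 needs 9 significant digits. Near that size, single precision values are 8 apart, so the nearest one that can be stored is 123456792.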

On the other hand, the smallest non-zero magnitude it can represent (using special 'subnormal' numbers) is

$$±1.4 \times 10^{-45}$$
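A similar Python sketch, under the same `struct` assumption and with the same hypothetical helpers, shows where this value comes from. It is the bit pattern with only the lowest fraction bit set, which equals 2⁻¹⁴⁹. Anything much smaller 'underflows' to zero.

```python
import struct

def bits_to_float32(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# Smallest positive single: exponent field 0 (subnormal), only the lowest bit set.
smallest = bits_to_float32(0x00000001)
print(smallest)                  # 1.401298464324817e-45
print(smallest == 2.0 ** -149)   # True

# A quarter of that is closer to 0 than to 'smallest', so it rounds to zero.
print(to_float32(smallest / 4))  # 0.0
```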

This is fine for most applications. But scientific or engineering software, such as CAD or a simulation, may need more precision, and that means more space to store each number. 'Double precision' uses 64 bits (8 bytes) to store a single floating point number. This gives about 15 to 16 significant decimal digits and a range of roughly ±1.8 × 10³⁰⁸. The cost is twice the memory per number and, on some hardware, slower arithmetic.
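Python's own `float` type is IEEE 754 double precision on virtually every platform, so a sketch can compare the two formats directly. This assumes Python 3.9 or later for `math.ulp`. `to_float32` is a hypothetical helper that rounds a value to single precision using `struct`.

```python
import math
import struct
import sys

def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

print(sys.float_info.max)  # 1.7976931348623157e+308 (largest double)
print(math.ulp(0.0))       # 5e-324 (smallest positive double)
print(sys.float_info.dig)  # 15 (decimal digits a double always preserves)

third = 1 / 3
print(to_float32(third))   # 0.3333333432674408 (single: 7 correct digits)
print(third)               # 0.3333333333333333 (double: 16 correct digits)
```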

Challenge: see if you can find out one extra fact on this topic that we haven't already told you.

A good place to start: Floating point IEEE 754