There is a disturbing number of people willing to compare floating point numbers directly for equality. For most intents and purposes this is harmless, but when the numbers stand for physical quantities it can be a problem. What many people ignore is that real numbers cannot, in general, be represented exactly in a computer. Irrational numbers have infinitely many digits after the decimal point, and so do many ordinary rational numbers, such as 0.1 when written in binary. Whatever we do, we have to truncate somewhere to fit a number into the finite memory of our computers.
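To make the truncation concrete, here is a minimal Python sketch (Python is my choice for illustration, nothing in the post depends on it). It compares two values that are equal on paper and then prints the exact value that ends up stored for the literal 0.1.

```python
from decimal import Decimal

# Both sides are "0.3" on paper, but each operand is rounded to the
# nearest representable binary fraction before the addition happens.
a = 0.1 + 0.2
b = 0.3

print(a == b)        # False

# Decimal(float) converts the stored binary value exactly, so this shows
# what the computer really keeps for 0.1 (close to, but not exactly, 0.1).
print(Decimal(0.1))
```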
Another, more often acknowledged, problem is the dynamic range of physical quantities. While the masses of cars, people and other everyday objects sit in the comfortable zone of kilograms up to tonnes, the masses of stars are way beyond these magnitudes, and to make matters worse the masses of particles are at the other far end of the scale. Obviously some sort of normalization should be employed to bring all that dynamic range into manageable chunks. Naturally this is done by simply scaling the quantities and is not much of a problem. Still, it is necessary to be able to represent both large and small real numbers, and this is where floating point numbers come into play. The structure of a floating point number is pretty simple. A fixed point number between zero and one is multiplied by an exponential factor (for computers, conveniently, a power of 2):
$$x = m \cdot 2^{p}$$
where m is the mantissa and p is the power (the exponent). It is pretty obvious that truncating the fixed point number (the mantissa) and limiting the size of the exponent impose some constraints on the representation of the numbers:
- The dynamic range of the numbers is finite, because the exponent is stored in a fixed number of bits and so cannot be an arbitrary integer.
- The precision is finite, because one can't store a fixed point number with infinitely many digits after the decimal point.
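Both constraints can be observed directly. The sketch below assumes Python's `float`, which on common platforms is an IEEE 754 double. The helper name `decompose` is mine. It wraps `math.frexp`, which happens to use the same convention as above, with a mantissa whose magnitude is between 0.5 and 1.

```python
import math
import sys

def decompose(x):
    """Return (m, p) such that x == m * 2**p, with 0.5 <= |m| < 1 for nonzero x."""
    return math.frexp(x)

m, p = decompose(6.0)
print(m, p)  # 0.75 3, because 6.0 == 0.75 * 2**3

# Finite precision: sys.float_info.epsilon is the gap between 1.0 and the
# next representable float. A change much smaller than that gap is lost.
print(1.0 + sys.float_info.epsilon / 4 == 1.0)  # True

# Finite dynamic range: going past the largest finite float overflows
# to infinity instead of producing a bigger number.
print(math.isinf(sys.float_info.max * 2))  # True
```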
All that said, I will go back to the original problem: how do we check whether two floating point numbers are equal?
It should be obvious by now that the imprecise representation rules out naïvely comparing them with the equality operator. While I am not a proponent of the epsilon-delta formalism in mathematics, here it is quite appropriate. The idea is to set an epsilon representing the maximum allowed relative difference between the two numbers and use it as a threshold.
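One possible way to write this down is sketched below. The function name `nearly_equal`, the default `rel_eps=1e-9`, and the choice to scale the threshold by the larger of the two magnitudes are all assumptions of mine. The right epsilon depends on how much error the preceding computation can accumulate.

```python
import math

def nearly_equal(a, b, rel_eps=1e-9):
    """True if a and b differ by at most rel_eps relative to the larger magnitude."""
    if a == b:
        # Exact matches, including 0.0 == 0.0 and equal infinities.
        return True
    if math.isinf(a) or math.isinf(b):
        # An infinity is never "close" to anything it is not equal to.
        return False
    # Scale the threshold by the larger magnitude so the test is relative.
    # If either value is NaN, this comparison is False.
    return abs(a - b) <= rel_eps * max(abs(a), abs(b))

print(nearly_equal(0.1 + 0.2, 0.3))                # True
print(nearly_equal(1.0e30, 1.0e30 * (1 + 1e-12)))  # True
print(nearly_equal(1.0, 1.001))                    # False
print(nearly_equal(1.0e-20, 0.0))                  # False
```

The last line shows the weak spot of a purely relative test: nothing except zero itself is relatively close to zero, so quantities expected to be near zero need an additional absolute threshold. Python's standard library offers `math.isclose(a, b, rel_tol=..., abs_tol=...)`, which combines both kinds of tolerance.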