Non-Properties of floating point numbers
[I’m moving some of my more interesting old blogs from the web archive to someplace they can actually be found. Keep in mind these were written a long time ago.]
I recently received a customer question that boiled down to the oft-encountered issue of inexact binary floating point representation. They were rather troubled that basic identities that hold for ordinary numbers did not apply to floating point arithmetic.
Learning that floating point is kind of messed up is kind of like finding out there is no Santa Claus. Sorry if I just spoiled that for you :)
It made me think about some of my undergraduate math, and so I decided to illustrate just how messed up floating point numbers are.
What are some of the properties of floating point math compared to normal numbers? Do they form a “Group”?
Closure: For all a, b in G, the result of the operation, a + b, is also in G.
Well, no, we lose already because floating point numbers can overflow.
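Here is a minimal sketch of the overflow case in Python (the language is an assumption for illustration; the original does not name one). It assumes G means the finite floats, so landing on infinity counts as leaving the set.

```python
import math
import sys

# Largest finite double on this platform (assumes IEEE 754 doubles).
big = sys.float_info.max

# Adding two finite floats can produce something that is not finite.
total = big + big

print(math.isfinite(big))    # True
print(math.isinf(total))     # True
```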
Oops. OK, but never mind that, it’s mostly closed, right? I mean, overflow hardly ever happens. Let’s trudge on bravely.
Associativity: For all a, b and c in G, (a + b) + c = a + (b + c).
That looks harmless enough, right? I can just reorder the parens. What could possibly go wrong?
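Quite a bit, as it turns out. A small Python sketch (values chosen for illustration, not taken from the original question) shows the two groupings disagreeing, once dramatically and once only in the last digit.

```python
# A huge value, its negation, and something small by comparison.
a, b, c = 1e100, -1e100, 1.0

left = (a + b) + c    # a and b cancel first, then 1.0 is added
right = a + (b + c)   # 1.0 is lost when added to -1e100, then the rest cancels

print(left, right)    # 1.0 0.0
print(left == right)  # False

# The same effect on an everyday scale: the sums differ in the last digit.
small_left = (0.1 + 0.2) + 0.3
small_right = 0.1 + (0.2 + 0.3)
print(small_left == small_right)  # False
```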
Identity: There exists an element e in G, such that for every element a in G, the equation e + a = a + e = a holds. Such an element is unique, and thus one speaks of the identity element.
OK this one is really easy right?
All I need is a unique identity element, that’s the “zero” and I can add zero and get the same thing back.
Great! But wait…
Add 1e-100 to an ordinary-sized float such as 1.0 and you also get the same thing back. And 1e-100 isn’t another name for zero…
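A short Python sketch of that surprise, using 1.0 as an arbitrary example value (an assumption, not something from the original question):

```python
a = 1.0
tiny = 1e-100

print(tiny == 0.0)          # False: tiny is a perfectly good nonzero float
print(a + tiny == a)        # True: yet adding it leaves a unchanged
print(tiny + a == a)        # True: same from the other side

# tiny only acts like zero next to much larger numbers.
print(tiny + tiny == tiny)  # False
```

Strictly speaking the axiom asks for one e that works for every a, and the last line shows 1e-100 fails that. The trouble is that for any particular a there are plenty of nonzero values that behave exactly like zero.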
Inverse: For each a in G, there exists an element b in G such that a + b = b + a = 0.
OK, on this one I think we’re good. If I start with a number x that I can represent in floating point, then there is exactly one number I can add to x that will give me zero, and that is -x.
Now this may seem a little like I’m cheating, because 1.1111111111111111 - 1.111111111111111111111111111111 also comes out as zero, which seems to indicate that there is another number that I could subtract to get zero.
But 1.111111111111111111111111111111 isn’t a valid floating point number on my system, so it’s not fair to try to use it. It’s automatically converted to the best approximation, which is 1.1111111111111111.
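This can be checked with a small Python sketch. It assumes IEEE 754 doubles and Python 3.9 or later for `math.nextafter`; the exact digits a system prints for the stored value may differ, so the sketch compares values rather than printed text.

```python
import math

x = 1.1111111111111111
long_literal = 1.111111111111111111111111111111

# Both literals are rounded to the same stored double.
print(x == long_literal)   # True
print(x - long_literal)    # 0.0

# The honest inverse works as expected.
print(x + (-x))            # 0.0

# The very next representable float above x is not an inverse.
neighbor = math.nextafter(x, math.inf)
print(x + (-neighbor) == 0.0)  # False
```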
Now, anyone want to tell me why addition of floating point numbers is necessarily commutative?
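While you think about it, here is a quick spot check in Python. It is only a sanity test on random finite values (the ranges and seed are arbitrary assumptions), not a proof, and it deliberately stays away from NaN, which never compares equal to anything.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Pairs with very different magnitudes, so rounding is actually exercised.
samples = [(rng.uniform(-1e6, 1e6), rng.uniform(-1e-6, 1e-6))
           for _ in range(10_000)]

all_commute = all(a + b == b + a for a, b in samples)
print(all_commute)  # True
```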