> For double/bigint joins that leads to observable differences between joins and plain comparisons, which is very bad.
This was one of the bigger hidden performance issues when I was working on Hive: the default coercion goes to Double, which has a bad hash code implementation [1] and causes join keys to cluster and chain, so every miss on the hashtable had to probe that many slots away from the original index.
The hashCode itself was smeared so that values within machine epsilon of each other would hash to the same bucket and .equals could do its join, but all of this really messed up the folks who needed 22-digit numeric keys (eventually the Decimal implementation handled it by adding a big fixed integer).
Double join keys were one of the red flags in a SQL query; mostly, if you see one, someone messed something up.
[1] - https://issues.apache.org/jira/browse/HADOOP-12217
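To make the double/bigint problem concrete, here is a minimal Python sketch. The first half shows that coercing 64-bit integer keys to double merges distinct keys above 2^53, while an exact mixed comparison still tells them apart. The second half is purely hypothetical: `low32_hash` is an invented weak hash (not taken from Hive or Hadoop) that keeps only the low 32 bits of the IEEE 754 bit pattern, to show how a poor hash on doubles can send many integral keys to one bucket.

```python
import struct


def join_key_as_double(key: int) -> float:
    """Coerce an integer join key to double, as a double/bigint join might."""
    return float(key)


def low32_hash(x: float) -> int:
    """Hypothetical weak hash: keep only the low 32 bits of the bit pattern."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return bits & 0xFFFFFFFF


a = 2**53        # exactly representable as a double
b = 2**53 + 1    # a distinct integer that is not representable as a double

# After coercion both keys become the same double, so a join would match them.
keys_collide_after_coercion = join_key_as_double(a) == join_key_as_double(b)

# Python compares int and float exactly, so the plain comparison disagrees.
exact_mixed_comparison = (b == float(a))

# Small integral doubles have all-zero low mantissa bits, so this hash
# maps every one of them to the same value.
small_key_hashes = {low32_hash(float(n)) for n in range(1, 1000)}

print(keys_collide_after_coercion, exact_mixed_comparison, small_key_hashes)
```

The disagreement between the first two results is the kind of join-versus-comparison difference the quoted line complains about.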
One simple solution would be to convert both operands to 80/128 bit float, which should avoid any precision loss, and compare those?
or you could learn about how to do comparisons with floating point numbers
like multiplying them by the precision that you'd like to compare and comparing them as integers? /s
That won't work; as integers, 100.02 and 99.997 are unequal, but 1.0002 and 0.99997 are equal at 0.01 precision. (And indeed also equal at 0.001 precision!) You'd need to round.
I had the impression that the usual way to compare floats is to define a precision and check for -p < (a - b) < p. In this case 0.99997 - 1.0002 = -0.00023, which correctly tells us that the two numbers are equal at 0.001 precision and unequal at 0.0001.
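A quick Python sketch of the three approaches discussed here, using the numbers from the comments above. The function names and the `scale` parameter are made up for illustration.

```python
def equal_within(a: float, b: float, p: float) -> bool:
    """Tolerance check: -p < (a - b) < p."""
    return -p < (a - b) < p


def equal_as_truncated_ints(a: float, b: float, scale: int) -> bool:
    """Multiply by the scale and compare after truncating to integers."""
    return int(a * scale) == int(b * scale)


def equal_as_rounded_ints(a: float, b: float, scale: int) -> bool:
    """Multiply by the scale and compare after rounding to integers."""
    return round(a * scale) == round(b * scale)


a, b = 0.99997, 1.0002

print(equal_within(a, b, 0.001))            # difference is about -0.00023
print(equal_within(a, b, 0.0001))
print(equal_as_truncated_ints(a, b, 100))   # compares 99 with 100
print(equal_as_rounded_ints(a, b, 100))     # compares 100 with 100
```

Rounding fixes this particular pair, but it still puts hard boundaries between buckets, so two values closer together than the precision can land on different integers. The tolerance check does not have that problem. Python's standard library also offers `math.isclose` for relative and absolute tolerances.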
Both ints and floats represent real, rational values, but none of their operations matches the math. Associative? No. Commutative? No. Partially Ordered? No. Weakly Ordered? No. Symmetric? No. Reflexive? No. Antisymmetric? No. Nothing.
The only reasonable way to compare rationals is the decimal expansion of the string.
It’s not straightforward to compare numerical ordering using the decimal expansion.
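A small Python sketch of why: comparing decimal strings character by character does not give numeric order, and two different strings can denote the same value. For contrast it uses `fractions.Fraction` from the standard library as one way to compare rationals exactly. That choice is my assumption, not something proposed in the thread.

```python
from fractions import Fraction


def less_than_as_strings(a: str, b: str) -> bool:
    """Naive approach: lexicographic comparison of the decimal text."""
    return a < b


def less_than_exact(a: str, b: str) -> bool:
    """Parse both strings as exact rationals and compare the values."""
    return Fraction(a) < Fraction(b)


# "10" sorts before "9" as text, although 10 > 9 numerically.
print(less_than_as_strings("10", "9"), less_than_exact("10", "9"))

# Different expansions, same value.
print("0.50" == "0.5", Fraction("0.50") == Fraction("0.5"))

# Fraction also accepts "numerator/denominator" strings.
print(less_than_exact("1/346", "1/345"))
```

A working string-based comparison would have to normalize signs, leading and trailing zeros, and the position of the decimal point first, and a value such as 1/3 has no finite decimal expansion at all.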
What exactly do you say is not commutative? This Wikipedia article claims that at least floating-point addition and multiplication are both commutative:
https://en.wikipedia.org/wiki/Floating-point_arithmetic#Accu...
It is commutative for finite values, but because IEEE did some dumb things, commutativity isn't specified for NaN values (and on several architectures it doesn't hold).
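For anyone who wants to poke at this, here is a rough Python check on ordinary doubles. It samples finite values to test commutativity, shows one well-known case where addition is not associative, and shows that NaN is not equal to itself. It does not inspect NaN payload bits, which is where the behavior described above would differ between architectures. The sample size and value range are arbitrary choices.

```python
import random

random.seed(0)  # fixed seed so the sample is reproducible
samples = [
    (random.uniform(-1e6, 1e6), random.uniform(-1e6, 1e6))
    for _ in range(10_000)
]

# Commutativity holds for every finite pair in the sample.
addition_commutes = all(a + b == b + a for a, b in samples)
multiplication_commutes = all(a * b == b * a for a, b in samples)

# Associativity fails: the two groupings round differently.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
addition_associates_here = left == right

# Reflexivity of == fails for NaN.
nan = float("nan")
nan_equals_itself = nan == nan

print(addition_commutes, multiplication_commutes)
print(left, right, addition_associates_here)
print(nan_equals_itself)
```

A passing sample is not a proof, of course, but it matches what the Wikipedia article linked above says about finite values.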
> The only reasonable way to compare rationals is the decimal expansion of the string.
Careful, someone is liable to throw this in an LLM prompt and get back code expanding the ASCII characters for string values like "1/346".