There is a relevant Wikipedia page about minifloats [0]
> The smallest possible float size that follows all IEEE principles, including normalized numbers, subnormal numbers, signed zero, signed infinity, and multiple NaN values, is a 4-bit float with 1-bit sign, 2-bit exponent, and 1-bit mantissa.
Even the latest CPUs have a 2:1 fp64:fp32 performance ratio - plus the effects of 2x the data size in cache and bandwidth use mean you can often get greater than a 2x difference.
If you're in a numeric heavy use case that's a massive difference. It's not some outdated "Ancient Lore" that causes languages that care about performance to default to fp32 :P
> Even the latest CPUs have a 2:1 fp64:fp32 performance ratio
Not completely - for basic operations (and ignoring byte size for things like cache hit ratios and memory bandwidth) if you look at (say Agner Fog's optimisation PDFs of instruction latency) the basic SSE/AVX latency for basic add/sub/mult/div (yes, even divides these days), the latency between float and double is almost always the same on the most recent AMD/Intel CPUs (and normally execution ports can do both now).
Where it differs is gather/scatter and some shuffle instructions (larger size to work on), and maths routines like transcendentals - sqrt(), sin(), etc, where the backing algorithms (whether on the processor in some cases or in libm or equivalent) obviously have to do more work (often more iterations of refinement) to calculate the value to greater precision for f64.
> In ancient times, floating point numbers were stored in 32 bits.
I thought in ancient times, floating point numbers used to be 80 bit. They lived in a funky mini stack on the coprocessor (x87). Then one day, somebody came along and standardized those 32 and 64 bit floats we still have today.
80 bits is just in the processor. Thats why you might a little bit different result, depending how you calculated first and maybe stored something in the RAM
I was going to reply that just because intel did something funny doesn't mean that it was the beginning of the story. but it turns out that the release of the 8087 predates the ratification of IEEE floats by 2 years. in addition, the primary numeric designer for the 8087 was apparently Kahan, which means that they were both part of the same design process. of course there were other formats predating both of these
The floating point "standard" was basically codifying multiple different vendor implementations of the same idea. Hence the mess that floating point is not consistent across implementations.
FP4 1:2:0:1 (other examples: binary32 1:8:0:23, 8087 ep 1:15:1:63)
S:E:l:M
S = sign bit present (or magnitude-only absolute value)
E = exponent bits (typically biased by 2^(E-1) - 1)
l = explicit leading integer present (almost always 0 because the leading digit is always 1 for normals, 0 for denormals, and not very useful for special values)
M = mantissa (fraction) bits
The limitations of FP4 are that it lacks infinities, [sq]NaNs, and denormals that make it very limited to special purposes only. There's no denying that it might be extremely efficient for very particular problems.
If a more even distribution were needed, a simpler fixed point format like 1:2:1 (sign:integer:fraction bits) is possible.
There is a relevant Wikipedia page about minifloats [0]
> The smallest possible float size that follows all IEEE principles, including normalized numbers, subnormal numbers, signed zero, signed infinity, and multiple NaN values, is a 4-bit float with 1-bit sign, 2-bit exponent, and 1-bit mantissa.
[0] https://en.wikipedia.org/wiki/Minifloat
> Programmers were grateful for the move from 32-bit floats to 64-bit floats. It doesn’t hurt to have more precision
Someome didn't try it on GPU...
Even the latest CPUs have a 2:1 fp64:fp32 performance ratio - plus the effects of 2x the data size in cache and bandwidth use mean you can often get greater than a 2x difference.
If you're in a numeric heavy use case that's a massive difference. It's not some outdated "Ancient Lore" that causes languages that care about performance to default to fp32 :P
> Even the latest CPUs have a 2:1 fp64:fp32 performance ratio
Not completely - for basic operations (and ignoring byte size for things like cache hit ratios and memory bandwidth) if you look at (say Agner Fog's optimisation PDFs of instruction latency) the basic SSE/AVX latency for basic add/sub/mult/div (yes, even divides these days), the latency between float and double is almost always the same on the most recent AMD/Intel CPUs (and normally execution ports can do both now).
Where it differs is gather/scatter and some shuffle instructions (larger size to work on), and maths routines like transcendentals - sqrt(), sin(), etc, where the backing algorithms (whether on the processor in some cases or in libm or equivalent) obviously have to do more work (often more iterations of refinement) to calculate the value to greater precision for f64.
> languages that care about performance to default to fp32
What do you mean by this? In C 1.0 is a double.
> In ancient times, floating point numbers were stored in 32 bits.
I thought in ancient times, floating point numbers used to be 80 bit. They lived in a funky mini stack on the coprocessor (x87). Then one day, somebody came along and standardized those 32 and 64 bit floats we still have today.
80 bits is just in the processor. Thats why you might a little bit different result, depending how you calculated first and maybe stored something in the RAM
I was going to reply that just because intel did something funny doesn't mean that it was the beginning of the story. but it turns out that the release of the 8087 predates the ratification of IEEE floats by 2 years. in addition, the primary numeric designer for the 8087 was apparently Kahan, which means that they were both part of the same design process. of course there were other formats predating both of these
The floating point "standard" was basically codifying multiple different vendor implementations of the same idea. Hence the mess that floating point is not consistent across implementations.
FP4 1:2:0:1 (other examples: binary32 1:8:0:23, 8087 ep 1:15:1:63)
S:E:l:M
S = sign bit present (or magnitude-only absolute value)
E = exponent bits (typically biased by 2^(E-1) - 1)
l = explicit leading integer present (almost always 0 because the leading digit is always 1 for normals, 0 for denormals, and not very useful for special values)
M = mantissa (fraction) bits
The limitations of FP4 are that it lacks infinities, [sq]NaNs, and denormals that make it very limited to special purposes only. There's no denying that it might be extremely efficient for very particular problems.
If a more even distribution were needed, a simpler fixed point format like 1:2:1 (sign:integer:fraction bits) is possible.