Yuhang He's Blog

Some birds are not meant to be caged, their feathers are just too bright.

Fast compare the absolute value of floating-point values

Computer systems are notoriously inefficient at processing floating-point numbers, but excel at bitwise operations. A natural way to accelerate an algorithm that frequently performs floating-point arithmetic is therefore to replace that arithmetic with bitwise operations. Not every floating-point operation can be replaced this way, but here is one that can: comparing the absolute values of two floating-point numbers. Before jumping into the trick, let us briefly review how a floating-point value is stored in memory.

Floating-point Storage in Computer Memory

Of course, a floating-point value is stored as a sequence of 0/1 bits in a contiguous block of memory. Let's use a 32-bit floating-point value as the example here. An intuitive anatomy of a floating-point value would split it into three parts: a plus-minus sign, an integer part and a decimal part. Computer systems do something similar, but not exactly the same: they store a floating-point value in binary scientific notation. For example, $1101.1101$ is represented as $1.1011101 \times 2^3$. Mathematically, any floating-point value can be expressed as:

$$V = (-1)^s \times M \times 2^E$$

where $s \in \{0,1\}$ is the plus-minus sign bit, $M$ is the mantissa and $E$ is the exponent. For example, in $1.1011101 \times 2^3$, $s=0$, $M=1.1011101$ and $E=3$. The contiguous block of memory stores the sign bit $s$, the exponent $E$ and the mantissa $M$ sequentially, as shown in the following figure,

[Figure: layout of a floating-point value in memory — sign bit, exponent, mantissa]

Note that the exponent $E$ can be either positive or negative. To accommodate both signs, the stored exponent is deliberately biased by adding 127 (for 32-bit floats). Since the integer part of $M$ is always 1, it is not stored at all, which saves one bit of memory; the implicit leading 1 is simply added back when the value is decoded. By the way, the fractional part of a decimal value is converted to binary by repeatedly multiplying it by 2 and keeping the integer digit, until no fractional part is left or the maximum storage length is reached. For example, $0.14$ is encoded as: $0.14 \times 2 = 0.28$, keep $0$; $0.28 \times 2 = 0.56$, keep $0$; $0.56 \times 2 = 1.12$, keep $1$. Thus the binary code begins with $001\cdots$.
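To make the layout concrete, here is a minimal C++ sketch (my own illustration, not part of the original post) that reinterprets a 32-bit float's raw bits and extracts the sign, biased exponent and mantissa fields:

#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
	float f = 13.8125f;                       // 1101.1101 in binary = 1.1011101 x 2^3
	uint32_t bits;
	std::memcpy(&bits, &f, sizeof(bits));     // reinterpret the float's raw bits

	uint32_t sign     = bits >> 31;           // 1 sign bit
	uint32_t exponent = (bits >> 23) & 0xFF;  // 8 exponent bits, biased by 127
	uint32_t mantissa = bits & 0x7FFFFF;      // 23 mantissa bits (implicit leading 1 not stored)

	printf("sign = %u, exponent = %u (E = %d), mantissa = 0x%06X\n",
	       (unsigned)sign, (unsigned)exponent, (int)exponent - 127, (unsigned)mantissa);
	return 0;
}

Running it on $13.8125$ prints a stored exponent of 130, i.e. $E = 3$ after removing the bias of 127.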

Compare Two Floating-point Values via Bitwise Operation

With the floating-point storage mechanism introduced above, we can easily see that, ignoring the sign bit, a larger exponent $E$ (or a larger mantissa $M$ when the exponents are equal) always means a larger absolute value of the floating-point number. So we can treat the contiguous memory block as if it stored an integer, and directly compare the pseudo-integers it represents! This is the core idea of this blog. The implementation works as follows:

if( ((int&)(f1) & 0x7FFFFFFF) < ((int&)(f2) & 0x7FFFFFFF)){	// clear the sign bit, compare the bit patterns as integers
	fprintf( stderr, "this is bitwise operation comparison!\n");
}

while the conventional C++ function is:

#include<cmath>
if( fabsf(f1) < fabsf(f2) ){
	fprintf( stderr, "this is conventional comparison!\n");
}

Note that (int&) here means reinterpreting the floating-point value's memory block as an integer. 0x7FFFFFFF is a 32-bit hexadecimal constant whose bits are all 1 except the first (sign) bit, which is 0. By applying the bitwise & operation, we obtain the pseudo-integer with the plus-minus sign bit cleared, i.e. the bit pattern of the absolute value.
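One caveat: the (int&) cast type-puns through a reference, which formally violates C++ strict-aliasing rules even though it usually works in practice. A safer sketch of the same idea (my addition, with hypothetical variables f1 and f2) uses memcpy, which compilers typically optimize down to the same plain integer compare:

#include <cstdint>
#include <cstdio>
#include <cstring>

// Return the float's bit pattern with the sign bit cleared.
static inline uint32_t abs_bits(float f) {
	uint32_t u;
	std::memcpy(&u, &f, sizeof(u));   // well-defined way to reinterpret the bits
	return u & 0x7FFFFFFF;            // clear the sign bit
}

int main() {
	float f1 = -1.5f, f2 = 2.0f;
	if (abs_bits(f1) < abs_bits(f2)) {
		fprintf(stderr, "this is bitwise operation comparison!\n");
	}
	return 0;
}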

Experiment Comparison

I timed the bitwise operation against the conventional fabsf() on an Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, varying the number of loop iterations. The results are shown in the following table:

Iterations   CPU cycles (bitwise)   CPU cycles (fabsf())
1*10^9       3.71 * 10^6            4.61 * 10^6
1*10^8       0.39 * 10^6            0.47 * 10^6
1*10^7       5000                   5000
1*10^6       0                      10000
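For reference, a rough sketch of the kind of timing loop behind these numbers (my own reconstruction; the exact harness, input data and timer used in the original experiment are assumptions) might look like:

#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <chrono>

int main() {
	const long iterations = 10000000;          // e.g. 1*10^7 repetitions
	float f1 = -1.234f, f2 = 0.567f;
	volatile long hits = 0;                    // volatile so the loops are not optimized away

	auto t0 = std::chrono::steady_clock::now();
	for (long i = 0; i < iterations; ++i) {    // bitwise comparison
		uint32_t a, b;
		std::memcpy(&a, &f1, sizeof(a));
		std::memcpy(&b, &f2, sizeof(b));
		if ((a & 0x7FFFFFFF) < (b & 0x7FFFFFFF)) hits = hits + 1;
	}
	auto t1 = std::chrono::steady_clock::now();

	for (long i = 0; i < iterations; ++i) {    // conventional comparison
		if (fabsf(f1) < fabsf(f2)) hits = hits + 1;
	}
	auto t2 = std::chrono::steady_clock::now();

	printf("bitwise: %lld us, fabsf: %lld us\n",
	       (long long)std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count(),
	       (long long)std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count());
	return 0;
}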