--== FLOAT 2 INT ==-- by: Alex Chalfin (aka Phred) achalfin@one.net Since the introduction of the Intel Pentium chip, many programmers have switched from fixed point mathematics system to floating point. This is due to the superior floating point unit on the Pentium chip. However, this has left a small debate within the programming world. What is the best way to perform a floating point to integer conversion? I will present 4 methods for float to int conversion in this document. It is up to you to decide which is best for you. Method 1: --------- Typecasting. This method is a high level language method for converting a float to an integer. Here is a small piece of code demonstrating it: MyInt = (int)MyFloat; Advantages: - Completely portable and standard. - Works with float and double without modification. - Performs correct rounding. Disadvantages: - Heavily compiler dependant. - Tends to be slow (i.e. Watcom's slow typecast). Method 2: --------- Explicit FPU instruction to convert to an integer. On the x86 platform, the instructions take the following form: fist dword ptr [eax] ; store integer -or- fistp dword ptr [eax] ; store integer and pop Using this form on x86 platforms generally avoids the overhead associated with the compiler type casting. When compared to the typecasting under the Watcom 10.6 compiler, the cycle count dropped from 40 to 6. Advantages: - Good performance - Works with float and double without modification. Disadvantages: - x86 CPU dependant - requires assembler (not really a disadvantage) - requires 6 cycles (6 cycles for 1 instruction is quite a bit) - Ignores rounding state of the FPU Method 3: --------- Magic number/fadd trick. This method uses a trick in the IEEE double format to perform the typecasting without actual conversion. int FLT2INT {0,0x43380000}; int FLT2FXD24_8 {0,0x42B80000}; int FLT2FXD16_16 {0,0x42380000}; int FLT2FXD8_24 {0,0x41B80000}; int TEMP {0,0}; fadd qword ptr [FLT2INT]; fstp qword ptr [TEMP]; Mov eax,[TEMP+4]; Advantages: - Good performance Disadvantages: - Dependant on "double" data type, doesn't work on "float". - Extra constants (has to be stored as a double or two ints). - Ignores rounding Method 4: --------- Integer pipeline conversion. This method takes the IEEE float format and uses it completely to convert to an integer. FltInt = *(int *)&MyFloat; mantissa = (FltInt & 0x07fffff) | 0x800000; exponent = 150 - ((FltInt >> 23) & 0xff); if (exponent < 0) MyInt = (mantissa << -exponent); else MyInt = (mantissa >> exponent); if (FltInt & 0x80000000) MyInt = -MyInt; Advantages: - Good performance - Pure integer pipeline based (good for pairing with FPU) Disadvantages: - Separate routines necessary for floats and doubles. - Costly jump to handle negatives (can hurt on PPro machines) - Ignores rounding Stuff ----- The main purpose of this document is to introduce the fourth method of float to int conversion. I had never seen anything like it and I thought it was pretty cool. Here is how it works in a little bit more detail: IEEE 32-bit floating point number: 31 30 23 0 ________________________________ |s| exp | mantissa | -------------------------------- What this diagram shows is the 23-bit mantissa, the 8-bit exponent, and the 1-bit sign. The first stage of the conversion is to extract the mantissa. This is done with simple bit masking. mantissa = (FltInt & 0x07fffff); With IEEE numbers, the most significant bit is always assumed to be set. This is why the mantissa bits are all zeros for numbers which are powers of two (like 16, 256, etc.). This bit needs to be inserted into the mantissa. mantissa |= 0x800000; Now the exponent needs to be extracted. This is a simple shift and mask operation. Since IEEE numbers also have a bias attached with each of their exponents, it has to be removed before the exponent is useful. For 32-bit floats the exponent bias is 0x7f (127). exponent = ((FltInt >> 23) & 0xff) - 0x7f; The final step in the conversion is to move the mantissa bits into the proper location. This is done by taking the exponent and subtracting it from the number of bits in the mantissa (23), then shifting the mantissa the correct amount. If your number is greater than (1 << 23) (or less than -(1 << 23)) the shift number will be negative. In this case you must shift the exponent left instead of right: if ((23 - exponent) < 0) MyInt = (mantissa << -(23 - exponent)); else MyInt = (mantissa >> (23 - exponent)); One last thing remains. The sign of the number. For this all you have to do is test the sign bit from the original number (if s = 1, the number is negative). if (FltInt & 0x80000000) MyInt = -MyInt; That's it! you now have a nice integer number. The same method can also be used for doubles. You just have to change a few things around to extract the mantissa and exponents from the original number. A note about Methods 2-4 ------------------------ To use these methods, you need to store the float value, then retrieve it again. On the Intel architecture, a write/read combination is approx. 40 cycles (plus the cycles for that routine). So unless you're doing your stuff in pure ASM, you can't delay the read-back of the stored value. Greets to Midnight (Paul Nettle) for providing the fadd magic numbers and the tid bit about the memory read/write latency.