Floating Point Math Hazards

What if you want to know the number of digits needed to represent an integer variable in base 10? You may want to know before printing if it will fit within a printed field. Furthermore, you’d like it to be done pretty efficiently.

Why not simply convert to a string and check the length? First off, that’s really slow. And well, what if you’re the one writing the string conversion code. Counting digits fast is a must for high performance serialization and other high-throughput applications like HFT.

In any case, on to the simple, obvious and wrong solution, why it’s wrong, and how to do it right.

Mathematically the number of base-10 digits is:

 digits = floor(log10(n))	 + 1

Looks pretty efficient (one significant function call.) And, it’s super easy to read and understand. I had this little function sitting around doing no harm for years. It mostly received small numbers and performed perfectly fine. At one time I’d had it using log10l() which hides it’s problems (more on that shortly.) It hid behind a expectedDigits(int64_t value) function name and I never gave the implementation another thought until the strange bug appeared…

Using a mmath formula feels cleanand elegant. however, it won’t always give the correct result in practice, but usually does, so it’s hard to discover in testing if you’re not careful. Also, a little more thought might lead us to question how fast the standard library log functions actually are. More on that later.

Using the log10() method comes with two distinct correctness problems:

The log10() function in the C standard library returns a 64-bit float (“double”) and does calculations using (typically) 64 bit floats, yet it accepts 64 bit integers which can represent larger integers than ‘double’ 64-bit floats can perfectly represent (53 bits.) Therefore if you hand log10() a very large number it converts it internally to a double and will be working with a slightly imprecise version of the original input before it even begins calculation.
In addition, due to the log calculation method used,(maybe a Taylor series?) the log functions can produce results that are imprecise even when the inputs are not actually too large to store in the floating point types. Related, a floating point number type can’t exactly represent any number so any function returning them should be expected to only give an approximation of some formulas.

Loss of precision on input

Within and beyond 16-digit size inputs, log10()’s result will necessarily be slightly off simply from loss of precision in the input – it’s converted to a double in the function. At this scale the gap between representable integers is greater than 1.0 – you don’t have enough bits in the 53 bits to perfectly match every 64 bit integer after all.

Since internally, numbers are stored as binary (powers of two) the loss of precision won’t fall on the powers of ten boundaries. The specific upper limit is 9,007,199,254,740,991. Any integer larger than this (which is in the middle of the 16-digit range) will be rounded to the nearest even number before calculations begin.

This limitation is at least understandable and predictable once you think about it.

Crossing number boundaries due to approximation in calculation

Worse yet, even as you approach 16 digits but haven’t exceeded the 53 bit limit, the log10() result may get imprecise enough to cross the integer boundary from below, to above the true value. As a person who knows math, you’d expect a fifteen digit number to always have a log base 10 of less than 15, but this is not always the case withthe the log10() function.

Let’s look at a specific example: 999999999999999 (10^15 - 1.) Plugging that into floor( log10(num)) +1, you get 16, not 15. There’s example code at the end of this post demonstrating this.

10^15 -1 is only a fifteen digit number, perfectly represented by less than 53 bits. So what’s the problem? A 64-bit float doesn’t have infinite resolution, while mathematically, the exact value of log10(10^15−1) is an infinitely long number starting with 14.999999999999999565… The problem lies in how floating-point architecture represents this result and the rule for representing approximated values.

Mathematically, we can show that the result in our example is actually closer to the number 15.0 than the approximation 64 bits can represent under 15.0. The IEEE standard says that calculations should round to the nearest representable number. The gap between each representable number is known as the ULP (Unit in Last Place.) At this scale the gap is too small to represent the tiny difference between the true result and the next 15.0 – it’s actually larger than the difference so 15.0 is more correct. It’s a better general purpose result for log10() to return than any 14.0 + any fractional number storable by a 64-bit IEEE 754 standard double float.

In math, abs( C’s log10(10^15-1) - ideal log10(10^15-1)) > abs(15.0 - ideal log10(10^15-1)).

To be clear, the floating point value returned by log10() result for large inputs remains still quite close to correct and very useful – but may be higher than the true value rather than lower (but still that higher value will be closer than the unrounded lower alternative.)

All this mostly matters only if you’re trying to get an answer that approaches a natural number, and the function over-shoots the integer boundary, breaking the floor(...) + 1 formula or some similar formula. Knowing when that’s the case makes getting the digit count tricky using the log10() approach, rendering the whole thing useless as an elegant approach.

Solutions

So log10() is not reliable, and it may not be all that fast anyhow, it just looks elegant.

Iterative division by 10 is the simplest portable alternative to find the number of base 10 digits. You have to account for the base case n of 0 yielding one digit, avoiding a 0/10 operation!

#include <cstdint>

int count_digits_iterative(uint64_t n) {
    int count = 1;
    
    while (n >= 10) {
        n /= 10;
        count++;
    }
    
    return count;
}

For a correct, much faster approach than a simple loop: Counting the number of digits in an integer quickly. It uses lookup tables.

Note 1: At least on Intel x86 platforms you can use ‘long double’ precision floats (80 bit) to avoid the miscalculation on large 15-digit numbers, but even there you can hit the same problem. If you use log10l(). with an input 18 digits long (‘999999999999999999’,) you get an answer of 19 unfortunately. An 80-bit float (which is what log10l() probably will use) has a 64-bit mantissa and 18 digits gets close to filling that. The actual problem in this case is still from a tiny rounding error.

Note 2: An additional difficulty is that while the log10l() function will be available in the standard library, its precision will vary depending on the platform. On ARM or Windows Visual C++, for example, long double is aliased to double (64-bit floats.) So for portability across platforms and reliability on large numbers, don’t count on log10l() functions from the standard library for extra precision.

Note 3: If performance isn’t a concern you can simply convert a number to a string and check its length assuming you didn’t request any special formatting like leading 0’s. It will be ten to fifty times slower than the simple division method.

Note 4: The log() functions in the C standard library aren’t all that fast. Modern implementations don’t use Taylor series – those can run for hundres of iterations. Their approach takes a constant time, but a big one – 40 to 100 CPU cycles. Modern compilers will convert the iterative division to bitshifting operations and so it will take two to three cycles per iteration. On the largest numbers it’s performance is similar to using log10() but at leastit’s actually correct. The approach in the Lemire post linked above can be between four to ten times faster than the iterative division algorithm.

This test code demonstrates the imprecise nature of log10().

#include <math.h>
#include <iostream>

using namespace std;

int64_t digits(uint64_t input) {
        return uint64_t(floor(log10(input))) + 1;
}

int64_t digits_long(uint64_t input) {
        return uint64_t(floor(log10l(input))) + 1;
}

int main() {
        uint64_t input_uint64 = 25;

        cout << "digits(25): " << digits(input_uint64) << endl;
        cout << "digits(999999999): " << digits(999999999) << endl;
        cout << "digits(99999999999999) should be 14, got " << digits(99999999999999) << endl;
        cout << "digits(999999999999999): should be 15, got " << digits(999999999999999) << endl;
        cout << "digits_long(999999999999999999): " << "should be 18, got " << digits_long(999999999999999999) << endl;

        return 0;
}

and the output is:

digits(25): 2
digits(999999999): 9
digits(99999999999999) should be 14, got 14
digits(999999999999999): should be 15, got 16
digits_long(999999999999999999): should be 18, got 19