Wednesday, February 2, 2011

Working Around the Java Double.parseDouble Bug

You may have seen recently that Java suffers from a floating-point parsing bug similar to the one that just bit PHP users. The basic gist of it is that for one particular 64-bit floating-point boundary value, the Java call Double.parseDouble("2.2250738585072012e-308") will get stuck in an infinite loop. Read the link above to understand what's happening.
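If you want to see it for yourself on an unpatched JVM, a minimal reproduction looks like this. Fair warning: the call really does never return.

    public class ParseDoubleHang {
        public static void main(String[] args) {
            // On an unpatched JVM this call never returns; the parser's
            // decimal-to-binary correction loop oscillates forever for this
            // value, which sits right at the normal/subnormal boundary.
            System.out.println(Double.parseDouble("2.2250738585072012e-308"));
        }
    }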

Naturally, this affects all JVM languages too, since we all use Double.parseDouble for something or other. In fact, it affects almost all the JVM language parsers and compilers (including javac itself), since they need to turn strings into doubles.

Being the upright citizens we are on the JRuby team, we figured we'd try to beat Oracle to the punch and patch around the bug, at least for Ruby-land conversions of String to Float.

I started by looking for calls to Double.parseDouble in JRuby. It turned out there were only two: one for the lexer, and one used by String#to_f, BigDecimal#new, and so on. That was a relief; I expected to find dozens of calls.

It also turned out that in all cases the Ruby float-literal oddities had already been parsed out: underscores, 'd' or 'D' as the exponent marker, ill-formatted exponents being treated as zero, and so on.
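Just to illustrate what that cleanup covers (the actual JRuby code is not shown here; this is only a sketch of the idea, and the method name is made up):

    // Illustrative only: roughly the kind of cleanup the call sites have
    // already done before the string reaches the final parse.
    private static String cleanRubyFloatLiteral(String s) {
        String cleaned = s.replace("_", "");   // "1_000.5" -> "1000.5"
        cleaned = cleaned.replace('d', 'e');   // "1.5d3"   -> "1.5e3"
        cleaned = cleaned.replace('D', 'e');   // "1.5D3"   -> "1.5e3"
        // (Ill-formatted exponents are caught separately and treated as zero.)
        return cleaned;
    }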

My first attempt was to simply normalize the cleaned-up string and pass it to new java.math.BigDecimal(), converting that result back to a primitive double. Unfortunately, BigDecimal's constructor *also* passes through the offending Double.parseDouble code, and we're back where we started.
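That first attempt amounted to a one-liner, something like the following (reconstructed, assuming normalString holds the cleaned-up text):

    // First attempt (doesn't help): this still funnels the troublesome string
    // through the same buggy parsing code inside the JDK, so it hangs too.
    double result = new java.math.BigDecimal(normalString).doubleValue();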

Ultimately, I ended up with the following code. I make no claims that it's efficient, but it appears to pass all the Float tests and specs for JRuby, and it does not DoS itself the way the buggy code in Double.parseDouble does:

    public static double parseDouble(String value) {
        // Normalize the (already cleaned-up) string, then split it into a
        // mantissa and a decimal exponent so the whole thing never goes
        // through Double.parseDouble in one piece.
        String normalString = normalizeDoubleString(value);
        int offset = normalString.indexOf('E');
        BigDecimal base;
        int exponent;
        if (offset == -1) {
            // No exponent: the normalized string is the whole mantissa.
            base = new BigDecimal(normalString);
            exponent = 0;
        } else {
            // Parse mantissa and exponent separately, skipping a leading '+'.
            base = new BigDecimal(normalString.substring(0, offset));
            exponent = Integer.parseInt(normalString.charAt(offset + 1) == '+' ?
                normalString.substring(offset + 2) :
                normalString.substring(offset + 1));
        }
        // Recombine in BigDecimal, then hand back a primitive double.
        return base.scaleByPowerOfTen(exponent).doubleValue();
    }


I didn't say it was particularly clever or efficient...but there you have it. A few notes:
  • Do I really need UNLIMITED precision here? I almost used it, just to make sure no peculiarities creep in while passing through BigDecimal on the way to a double, but would any such peculiarities really fall outside 128-bit precision?
  • It might have been more efficient to normalize the decimal point and exponent and then see whether the result matched the magic value (see the sketch after this list). But of course this magic value was not known until recently, so why risk there being another one?
  • Using BigDecimal is also lazy. I am lazy.
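To make that second bullet concrete, a targeted check might look roughly like the sketch below. This is illustrative only: it reuses the normalizeDoubleString helper from above, assumes the normalized form uses an uppercase 'E', guards only the one publicly known bad input, and would miss any other string that tickles the same bug (for example, the same value written with extra trailing zeros).

    // Illustrative sketch of the "check for the magic value" approach.
    private static final String BAD_INPUT = "2.2250738585072012E-308";

    public static double guardedParseDouble(String value) {
        String normalized = normalizeDoubleString(value); // same helper as above
        if (BAD_INPUT.equals(normalized)) {
            // The correctly rounded result for the bad input is the smallest
            // normal double, 2.2250738585072014E-308.
            return Double.MIN_NORMAL;
        }
        return Double.parseDouble(normalized);
    }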
I welcome improvements. Everyone will probably need to start using code like this, since there will be a lot of unpatched JVMs out there for a long time.

I'm happy to say JRuby will be the first JVM language to route around the Double.parseDouble bug :)

Update: The JRuby commit with this logic is 4c71296, and the JRuby bug is at http://jira.codehaus.org/browse/JRUBY-5441.

Update: A commenter on Hacker News pointed out that BigDecimal.doubleValue actually just converts to a string and calls Double.parseDouble. So the mechanism above only dodged the bug in an earlier version, where I was losing some precision by calling Math.pow(10, exponent) rather than scaleByPowerOfTen. The version above unfortunately does not work, so it's back to the drawing board. C'est la vie!
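For the record, that earlier variant, the one that did dodge the hang at the cost of some precision, was roughly of the following shape. This is reconstructed from the description above rather than taken from the actual commit:

    // Roughly the earlier, precision-losing shape: only the mantissa ever
    // reaches doubleValue() (and therefore Double.parseDouble), and for the
    // bad input that mantissa is an innocuous ~2.225, far from the buggy
    // range. Math.pow introduces rounding error that scaleByPowerOfTen avoids.
    return base.doubleValue() * Math.pow(10, exponent);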