Wednesday, February 2, 2011

Working Around the Java Double.parseDouble Bug

You may have seen recently that Java suffers from a similar floating-point parsing bug to the one that recently affected PHP users. The basic gist of it is that for this special 64-bit floating point value, the Java call Double.parseDouble("2.2250738585072012e-308") will get stuck in an infinite loop. Read the link above to understand what's happening.

Naturally, this affects all JVM languages too, since we all use Double.parseDouble for something or another. In fact, it affects almost all the JVM language parsers and compilers (including javac itself), since they need to turn strings into doubles.

Being the upright citizens we are on the JRuby team, we figured we'd try to beat Oracle to the punch and patch around the bug, at least for Ruby-land conversions of String to Float.

I started by looking for calls to Double.parseDouble in JRuby. It turned out there were only two: one for the lexer, and one used by String#to_f, BigDecimal#new, and so on. That was a relief; I expected to find dozens of calls.

It also turned out all cases had already parsed out Ruby float literal oddities, like underscores, using 'd' or 'D' for the exponentiation marker, allowing ill-formatted exponents to be treated as zero, and so on.

My first attempt was to simply normalize the cleaned-up string and pass it to new java.math.BigDecimal(), converting that result back to a primitive double. Unfortunately, BigDecimal's constructor *also* passes through the offending Double.parseDouble code, and we're back where we started.

Ultimately, I ended up with the following code. I make no claims this is efficient, but it appears to pass all the Float tests and specs for JRuby and does not DOS like the bad code in Double.parseDouble:

    public static double parseDouble(String value) {
        String normalString = normalizeDoubleString(value);
        int offset = normalString.indexOf('E');
        BigDecimal base;
        int exponent;
        if (offset == -1) {
            base = new BigDecimal(value);
            exponent = 0;
        } else {
            base = new BigDecimal(normalString.substring(0, offset));
            exponent = Integer.parseInt(normalString.charAt(offset + 1) == '+' ?
                normalString.substring(offset + 2) :
                normalString.substring(offset + 1));
        }
        return base.scaleByPowerOfTen(exponent).doubleValue();
    }


I didn't say it was particularly clever or efficient...but there you have it. A few notes:
  • Do I really need UNLIMITED precision here? I almost used it to ensure there's no peculiarities passing through BigDecimal on the way to double, but are any such peculiarities outside 128-bit precision?
  • It might have been more efficient to normalize the decimal position and exponent and then see if it matched the magic value. But of course this magic value was not known until recently, so why risk there being another one?
  • Using BigDecimal is also lazy. I am lazy.
I welcome improvements. Everyone will probably need to start using code like this, since there will be a lot of unpatched JVMs out there for a long time.

I'm happy to say JRuby will be the first JVM language to route around the Double.parseDouble bug :)

Update: The JRuby commit with this logic is 4c71296, and the JRuby bug is at http://jira.codehaus.org/browse/JRUBY-5441.

Update: A commented on Hacker News pointed out that BigDecimal.doubleValue actually just converts to a string and calls Double.parseDouble. So unfortunately, the mechanism above only worked in an earlier version where I was losing some precision by calling Math.pow(10, exponent) rather than scaleByPowerOfTen. The version above unfortunately does not work, so it's back to the drawing board. C'est la vie!

13 comments:

  1. Sorry to rain, but any solution is beating Sun/Oracle by nearly 10 years.

    http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4421494

    ReplyDelete
  2. Pardon the question, but where goes normalizeDoubleString() come from?

    ReplyDelete
  3. Nevermind, I see that it's defined in the checkin

    ReplyDelete
  4. Apache Harmony doesn't exhibit this bug. parseDouble() is implemented in it partially as a native method, but should be doable to convert it to plain java...

    ReplyDelete
  5. @MLeoDaalder: the problem is that if Oracle just fix it trivially (no more loop), they would reduce the precision for other Double parsing results. SO that would not be a good solution. And using BigDecimal is good as a workaround of course, but it would not be a real fix in the JVM (much much slower, and could have side effects).

    About Harmony, I just looked at the native code used for this parsing, and it unfortunately depends on a lot of other native functions, so I don't think it would be easy or even desirable to integrate it, because it would break the architecture of the current JVM code...

    And Harmony may apparently loop endlessly too, in specific cases: "There is a possibility that the function will end up in an endless loop if the given approximating floating-point number (a very small floating-point whose value is very close to zero) straddles between two approximating integer values".

    ReplyDelete
  6. "BigDecimal.doubleValue actually just converts to a string and calls Double.parseDouble"

    Hmm, jruby has a fix nearby: http://jira.codehaus.org/browse/JRUBY-2195, for BigInteger. Maybe that could be extended? (btw, the faster version in http://jira.codehaus.org/browse/JRUBY-3331 was never applied)

    ReplyDelete
  7. interesting that the quoted bug page is now ending up at a page that says "down for maintenance" or simply times out. Thanks Oracle - good to see Java is in the hands of a competent, open and transparent organisation. Not.

    Our main concern where I work was the trivial way that this bug could be triggered over the web - locale parsing of the request is a common operation in web apps and frameworks like Spring. We've decided to block the string using mod_security on the Apache servers that front all of our public Java app servers. We have rules in there now that scan the request URI, all HTTP headers, request params and payloads (POST/PUT's) for a regex that describes the problematic series of numbers.

    ReplyDelete
  8. pldms: Yes, I think we're going to need our own logic, or perhaps we can port the logic from C Ruby?

    BTW, thanks for pointing out JRUBY-3331's faster patch was never applied.

    ReplyDelete
  9. Catch up guys, the official Oracle fix is already here: http://www.oracle.com/technetwork/java/javase/fpupdater-tool-readme-305936.html

    ReplyDelete
  10. If you still need a workaround for older, unpatched versions of java, this might be useful:
    http://extremestjython.blogspot.com/2011/02/no-more-java-double-trouble.html

    ReplyDelete
  11. Have you found a solution to this? The BigDecimal approach is likely to be very slow because of all the object creation.

    Have you thought about a crude approach? Make an array of all 606 possible exponents as doubles (IE 10**x where -303 <= x <= +303. Then parse the mantissa into a long and get the double of that and multiply by the double exponent you pre-worked out (in static{}).

    It will be slower than the built in one by quite a bit - but should work.

    I second step would be to inline the long parse code and get rid of the long object, though escape analysis should be able to get rid of it in Java 7.

    Best wishes - AJ

    ReplyDelete
  12. OK - down side of posting in a hurry. Long.parseLong is static. So 99% sure it will accessed super fast by invokestatic from the generate code. I might have a quick go at this approach and see how it holds up against the built in Double.parseDouble.

    ReplyDelete