Monday, March 7, 2011

Differing java.util.regex.Matcher Unmatched Group Results on Android

Android is really an amazing little platform, but occasionally you will run into API differences. Some of these are actual bugs (like a number of reflection and enum issues in early releases), and others are just weakly-specified APIs.

Today, I worked on JRUBY-5541: Problem with java_import on Android (Ruboto)

The issue boiled down to how we turn Java's camelCased method names into Ruby's snake_cased form. We were using the following code:


    private static final Pattern CAMEL_CASE_SPLITTER = Pattern.compile("(([a-z0-9])([A-Z])|([A-Za-z0-9])([A-Z][a-z]))");
    public static String getRubyCasedName(String javaCasedName) {
        Matcher m = CAMEL_CASE_SPLITTER.matcher(javaCasedName);
        return m.replaceAll("$2$4_$3$5").toLowerCase();
    }


The logic here is to attempt two matches ORed together: names of the form getName in the first alternative, and names of the form getURLHandler in the second. Given the resulting match, we "cleverly" did a replaceAll for both alternatives at the same time, combining what would be "$2_$3" for the first alternative and "$4_$5" for the second. Only one alternative can match at a time, so two of the four $ references always refer to groups that did not participate in the match.
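To make the group numbering concrete, here is a small sketch that lists groups 2 through 5 for each match. The class name CamelCaseGroups and the describe helper are made up for illustration and are not part of JRuby; the pattern is the one shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CamelCaseGroups {
    // Same pattern as in the post.
    static final Pattern CAMEL_CASE_SPLITTER =
        Pattern.compile("(([a-z0-9])([A-Z])|([A-Za-z0-9])([A-Z][a-z]))");

    // Hypothetical helper: for every match, records groups 2..5 separated by commas.
    // Matcher.group returns null for a group that did not participate in the match,
    // and string concatenation renders that null as the text "null".
    static List<String> describe(String name) {
        List<String> out = new ArrayList<String>();
        Matcher m = CAMEL_CASE_SPLITTER.matcher(name);
        while (m.find()) {
            out.add(m.group(2) + "," + m.group(3) + "," + m.group(4) + "," + m.group(5));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe("getName"));
        System.out.println(describe("getURLHandler"));
    }
}
```

For "getName" there is a single match through the first alternative, so groups 4 and 5 are null. For "getURLHandler" the first match ("tU") uses groups 2 and 3, and the second ("LHa") uses groups 4 and 5.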


This works fine on HotSpot/OpenJDK and any JVM that uses its class libraries. But Android uses Harmony's class libraries, and behaves differently. On OpenJDK, unmatched groups expand to the empty string "" in the replacement, properly turning "getName" into "get_name" and "getURLHandler" into "get_url_handler". On Android, however, unmatched groups expand to null for the $ variables in replaceAll, causing "getName" to become "getnull_nnullame" and "getURLHandler" to become something awful like "getnull_unullrlnull_hnullandler". Subsequent logic in JRuby that tried to turn methods of the form "get_name" into "name" attributes then failed to execute, causing the issue in the bug report.


The fix is a bit cumbersome, but not too difficult to understand: manually walk the matches and appendReplacement using only the groups that matched:


    public static String getRubyCasedName(String javaCasedName) {
        Matcher m = CAMEL_CASE_SPLITTER.matcher(javaCasedName);
        // We do this replace loop manually because Android's Matcher produces null for unmatched $ groups.
        // See JRUBY-5541
        if (m.find()) {
            StringBuffer buffer = new StringBuffer();
            m.reset();
            while (m.find()) {
                if (m.group(2) != null) {
                    // first part matched
                    m.appendReplacement(buffer, "$2_$3");
                } else {
                    // second part matched
                    m.appendReplacement(buffer, "$4_$5");
                }
            }
            m.appendTail(buffer);
            return buffer.toString().toLowerCase();
        } else {
            return javaCasedName;
        }
    }
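Another way to sidestep the problem, shown here only as a sketch and not as what JRuby does, is to avoid $ references entirely: build the replacement text from Matcher.group, mapping null to "", and pass it through Matcher.quoteReplacement. The class name NullSafeCamelCase and the orEmpty helper are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NullSafeCamelCase {
    // Same pattern as in the post.
    static final Pattern CAMEL_CASE_SPLITTER =
        Pattern.compile("(([a-z0-9])([A-Z])|([A-Za-z0-9])([A-Z][a-z]))");

    // Hypothetical helper: treat a group that did not participate as empty text.
    static String orEmpty(String s) {
        return s == null ? "" : s;
    }

    static String getRubyCasedName(String javaCasedName) {
        Matcher m = CAMEL_CASE_SPLITTER.matcher(javaCasedName);
        StringBuffer buffer = new StringBuffer();
        while (m.find()) {
            // Equivalent of "$2$4_$3$5", but with unmatched groups forced to "".
            String replacement = orEmpty(m.group(2)) + orEmpty(m.group(4))
                + "_" + orEmpty(m.group(3)) + orEmpty(m.group(5));
            // quoteReplacement keeps any '$' or '\' in the text from being interpreted.
            m.appendReplacement(buffer, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(buffer);
        return buffer.toString().toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(getRubyCasedName("getName"));
        System.out.println(getRubyCasedName("getURLHandler"));
    }
}
```

This relies only on the documented null return of Matcher.group, so it should not depend on how a given class library expands $ references. Note one difference from the fix above: a name with no match at all is still lowercased here, as it was in the original replaceAll version.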

I'm not sure whether Android (Harmony) or OpenJDK is "right" in this case, since the API for Matcher.group does say it will return null for unmatched groups, but nowhere is it specified if $ variables in replace calls should do the same.
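The two behaviors can be told apart with a tiny probe. This is a sketch; the class and method names are made up for illustration. On OpenJDK I would expect the probe to return "[]", while the Android behavior described in this post would give "[null]".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnmatchedGroupProbe {
    static final Pattern EITHER = Pattern.compile("(a)|(b)");

    // Group 2 never participates when the input is "a", so "$2" in the
    // replacement shows how this class library expands an unmatched group.
    static String probe() {
        return EITHER.matcher("a").replaceAll("[$2]");
    }

    // The part that IS specified: Matcher.group returns null for such a group.
    static boolean groupIsNull() {
        Matcher m = EITHER.matcher("a");
        return m.find() && m.group(2) == null;
    }

    public static void main(String[] args) {
        System.out.println("group(2) is null: " + groupIsNull());
        System.out.println("replaceAll result: " + probe());
    }
}
```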

2 comments:

  1. Might be specified in the TCK; oh wait, that's right...

  2. It's impossible to have a "complete and unambiguous" spec everywhere, and that's true for any specification. I guess that in this case OpenJDK always wins, because it is the reference platform.

    BTW, I'm working on a standardization committee, and I know how it feels when you are sure that something is really clear enough, but then somebody comes and asks something like "Am I doing it right?", and then you look at the spec and realize it is written nowhere.
