Headius: 2011

Friday, October 14, 2011

Why Clojure Doesn't Need Invokedynamic (Unless You Want It to be More Awesome)

This was originally posted as a comment on @fogus's blog post "Why Clojure doesn’t need invokedynamic, but it might be nice". I figured it's worth a top-level post here.

Ok, there's some good points here and a few misguided/misinformed positions. I'll try to cover everything.

First, I need to point out a key detail of invokedynamic that may have escaped notice: any case where you must bounce through a generic piece of code to do dispatch -- regardless of how fast that bounce may be -- prevents a whole slew of optimizations from happening. This might affect Java dispatch, if there's any argument-twiddling logic shared between call sites. It would definitely affect multimethods, which are using a hand-implemented PIC. Any case where there's intervening code between the call site and the target would benefit from invokedynamic, since invokedynamic could be used to plumb that logic and let it inline straight through. This is, indeed, the primary benefit of using invokedynamic: arbitrarily complex dispatch logic folds away allowing the dispatch to optimize as if it were direct.

Your point about inference in Java dispatch is a fair one...if Clojure is able to infer all cases, then there's no need to use invokedynamic at all. But unless Clojure is able to infer all cases, then you've got this little performance time bomb just waiting to happen. Tweak some code path and obscure the inference, and kablam, you're back on a slow reflective impl. Invokedynamic would provide a measure of consistency; the only unforeseen perf impact would be when the dispatch turns out to *actually* be polymorphic, in which case even a direct call wouldn't do much better.

For multimethods, the benefit should be clear: the MM selection logic would be mostly implemented using method handles and "leaf" logic, allowing hotspot to inline it everywhere it is used. That means for small-morphic MM call sites, all targets could potentially inline too. That's impossible without invokedynamic unless you generate every MM path immediately around the eventual call.

Now, on to defs and Var lookup. Depending on the cost of Var lookup, using a SwitchPoint-based invalidation plus invokedynamic could be a big win. In Java 7u2, SwitchPoint-based invalidation is essentially free until invalidated, and as you point out that's a rare case. There would essentially be *no* cost in indirecting through a var until that var changes...and then it would settle back into no cost until it changes again. Frequently-changing vars could gracefully degrade to a PIC.

It's also dangerous to understate the impact code size has on JVM optimization. The usual recommendation on the JVM is to move code into many small methods, possibly using call-through logic as in multimethods to reuse the same logic in many places. As I've mentioned, that defeats many optimizations, so the next approach is often to hand-inline logic everywhere it's used, to let the JVM have a more optimizable view of the system. But now we're stepping on our own feet...by adding more bytecode, we're almost certainly impacting the JVM's optimization and inlining budgets.

OpenJDK (and probably the other VMs too) has various limits on how far it will go to optimize code. A large number of these limits are based on the bytecoded size of the target methods. Methods that get too big won't inline, and sometimes won't compile. Methods that inline a lot of code might not get inlined into other methods. Methods that inline one path and eat up too much budget might push out more important calls later on. The only way around this is to reduce bytecode size, which is where invokedynamic comes in.

As of OpenJDK 7u2, MethodHandle logic is not included when calculating inlining budgets. In other words, if you push all the Java dispatch logic or multimethod dispatch logic or var lookup into mostly MethodHandles, you're getting that logic *for free*. That has had a tremendous impact on JRuby performance; I had previous versions of our compiler that did indeed infer static target methods from the interpreter, but they were often *slower* than call site caching solely because the code was considerably larger. With invokedynamic, a call is a call is a call, and the intervening plumbing is not counted against you.

Now, what about negative impacts to Clojure itself...

#0 is a red herring. JRuby supports Java 5, 6, and 7 with only a few hundred lines of changes in the compiler. Basically, the compiler has abstract interfaces for doing things like constant lookup, literal loading, and dispatch that we simply reimplement to use invokedynamic (extending the old non-indy logic for non-indified paths). In order to compile our uses of invokedynamic, we use Rémi Forax's JSR-292 backport, which includes a "mock" jar with all the invokedynamic APIs stubbed out. In our release, we just leave that library out, reflectively load the invokedynamic-based compiler impls, and we're off to the races.

#1 would be fair if the Oracle Java 7u2 early-access drops did not already include the optimizations that gave JRuby those awesome numbers. The biggest of those optimizations was making SwitchPoint free, but also important are the inlining discounting and MutableCallSite improvements. The perf you see for JRuby there can apply to any indirected behavior in Clojure, with the same perf benefits as of 7u2.

For #2, to address the apparent vagueness in my blog post...the big perf gain was largely from using SwitchPoint to invalidate constants rather than pinging a global serial number. Again, indirection folds away if you can shove it into MethodHandles. And it's pretty easy to do it.

#3 is just plain FUD. Oracle has committed to making invokedynamic work well for Java too. The current thinking is that "lambda", the support for closures in Java 7, will use invokedynamic under the covers to implement "function-like" constructs. Oracle has also committed to Nashorn, a fully invokedynamic-based JavaScript implementation, which has many of the same challenges as languages like Ruby or Python. I talked with Adam Messinger at Oracle, who explained to me that Oracle chose JavaScript in part because it's so far away from Java...as I put it (and he agreed) it's going to "keep Oracle honest" about optimizing for non-Java languages. Invokedynamic is driving the future of the JVM, and Oracle knows it all too well.

As for #4...well, all good things take a little effort :) I think the effort required is far lower than you suspect, though.

Wednesday, August 10, 2011

Invokedynamic in JRuby: Constant Lookup

This is the first of a set (not a series...there's no particular order) of articles I'll write on how JRuby is using invokedynamic. Hopefully they will show Rubyists how drastically invokedynamic is going to improve JRuby, and show other JVM language folks how to use invokedynamic effectively.

Hello friends!

I figured it's about time for me to start writing a bit on how JRuby is actually using invokedynamic.

As of today, JRuby utilizes invokedynamic far more than any other mainstream JVM language. We have worked very closely with the JSR leads and the OpenJDK developers to make sure invokedynamic runs well. And we have been advocating invokedynamic as a game-changer for the JVM and for JVM languages.

Let's explore one area where JRuby is using invokedynamic: Ruby's "constant" lookup.

Non-constant "Constants"

A constant in Ruby is defined on a class or module, and is subject to Ruby's typical namespacing logic. Constants start with a capital letter.

I often put "constants" in parentheses because constant values can be reassigned. This will usually produce a warning...but not an error. This means we can't simply look up constant values once and never look them up again (without special tricks I'll get into later).

Constant lookup is a also bit more complicated than method lookup. When retrieving a constant, Ruby first scans lexically-enclosing scopes' classes and modules for the constant. If the constant can't be found, the next search walks the current class's inheritance hierarchy. If we still can't find the constant, const_missing is called on the current class.

In order to make constant lookup fast, we want to do some sort of caching. In classic JRuby, Ruby 1.9 (YARV), Rubinius, and probably most other modern Ruby implementations, this is done with a global serial number. Whenever a constant is updated or a module is included (changing the inheritance hierarchy) all cached constants everywhere are forced to lookup again.

I have played with mechanisms for reducing the global impact of constant invalidation, but because constants can be looked up lexically it's simply too complicated to localize (since we need invalidate classes down-hierarchy from the change and we also need to update all lexical scopes that might see the change).

Constant Invalidation in JRuby 1.6

The logic in JRuby 1.6 goes something like this:

If cache is empty or invalid, retrieve the constant value in the usual way (lexical, hierarchical search). Store the value with the current global constant serial number.
On subsequent lookups, check cache for validity against the global constant serial number. If we have a value cached and the cache is still valid, return it.
If any constant in the system is updated, or if a module is included into an existing class hierarchy, flip the serial number and force future constant lookups to re-cache.

This turns out to work fairly well. The same mechanism in Ruby 1.9 produced drastically faster constant lookups, and JRuby's performance is even better than 1.9.

But there's a problem here. Because there's this constant pinging of the global constant serial number, every constant access can potentially produce a new value. So we're paying the cost to check that serial number as well as interfering with optimizations that want to see constant values actually be constant.

Can we do better?

Quick Invokedynamic Primer

The main atom of invokedynamic is the MethodHandle. Method handles are essentially function pointers, which can point at Java methods or fields, constructors, constant values, or other method handles. Invokedynamic also provides the MethodHandles utility class, which lets us juggle method handles in various ways:

adapting method signatures by casting, adding, moving, or dropping arguments
combining three handles ("test", "target", and "fallback") to form new a "guard with test" if-statement-like handle
wrap handles with exception handling or argument/return pre/post-processing

You can think of method handles and the chains of adapter handles that stitch them together as a special sort of functional language the JVM knows how to optimize. Given a chain of handles, you should usually get a piece of code that optimizes as well as (or better, in some cases) writing the same logic by hand in Java.

The invokedynamic bytecode simply provides a place to plug a method handle chain into code. When the JVM encounters an invokedynamic bytecode, it calls a "bootstrap method" associated with that bytecode for further instructions.

The bootstrap method returns a CallSite object, provided in java.lang.invoke. There are constant call sites for constant values, mutable call sites for when the target handle chain may have to change, and volatile call sites for when those changes must immediately be reflected across threads.

Once a CallSite has been installed for a given invokedynamic, subsequent hits skip the bootstrapping process, and we're off to the races.

SwitchPoint

I mentioned that the MethodHandles class provides a "guardWithTest" method for combining a test, a target (the "then" branch), and a fallback (the "else" branch). SwitchPoint, also in java.lang.invoke, acts like an on/off guardWithTest that once turned off can never be turned on again. You provide a target and fallback, and until the "switch" is thrown the target will be invoked. After the switch is thrown the fallback will be called.

What's the difference between this and a guardWithTest where the test just pings some global value? The difference is that SwitchPoint doesn't need to check anything.

Optimization and Deoptimization in the JVM

When the JVM decides to optimize a piece of code, it does so in an optimistic way. In very broad terms, this means it assumes its information up to this point is perfect: no new methods or classes will be introduced, profiling information is accurate, etc. Based on this "perfect" view of the world, it aggressively optimizes code.

Of course, the world isn't perfect. The JVM has to give up profiling and monitoring at some point, so it always has an imperfect view of the system. In order to avoid its aggressive optimizations triggering a fatal error later on, JVMs like OpenJDK (Hotspot) do something called deoptimization.

Deoptimization is the process by which running, optimized code can adapt on-the-fly to a changing system. In OpenJDK, there's several ways this is accomplished:

Branches out of compiled code back into the interpreter, when compiled code is determined to be invalid.
Guards around inlined virtual method accesses, to ensure we're still calling against the same class.
On-stack replacement, for fixing up a running method already on the native call stack
...

Because of this ability to deoptimize, it's possible to support zero-cost guards at the JVM level. Returning to SwitchPoint, we can see how this new form of "guardWithTest" can be basically free: we're explicitly telling the JVM this switch is a rare occurrence it can optimize aggressively.

SwitchPoint for Constant Lookup

JRuby on invokedynamic uses SwitchPoint for constant lookup, as you'd expect. Instead of actively pinging that global constant serial number, we instead use a global SwitchPoint object to guard all cached constant accesses. When it comes time to invalidate the system's constants, we just flip the SwitchPoint off and create a new one. All SwitchPoint-guarded constant accesses in the system must then recache and use the new SwitchPoint.

In a well-behaved system, we should reach a steady state where no new constants are being defined and no new modules are being introduced. Because we're using SwitchPoint, the stable state means all constant accesses are treated as truly constant by the JVM, allowing optimizations that were impossible before. And of course this also means that we've achieved constant lookup performance very near a theoretical maximum.

Numbers

First, a caveat: SwitchPoint is implemented in a fairly naïve way in the released OpenJDK 7, using a volatile field as the switch value. As a result, SwitchPoint guardWithTest is very slow currently, and JRuby's SwitchPoint-based constant logic must be enabled. I show numbers below based on leading-edge Hotspot compiler patches that will go into the first update release (numbers provided by one of the Hotspot devs, Christian Thalinger...thanks Christian!)

The benchmark we're running is a modified version of bench_const_lookup in JRuby's benchmark suite. The modification here runs more iterations (10M instead of 1M) with more constant lookups (50 instead of 10) to get a better idea of optimized performance.

Here's JRuby running our constant-lookup benchmark without SwitchPoint-based constants on Java 7:

As I said before, this is pretty good. JRuby's existing constant lookup performance is roughly 2x faster than Ruby 1.9.2.

Next, we'll try JRuby with SwitchPoint constants on Java 7 (released version, so we expect this to be slow):

The perf hit of purely volatile SwitchPoint is apparent.

And finally, JRuby with SwitchPoint constants on a dev build of Hotspot, which uses deoptimization rather than a volatile field:

This is basically the performance of the 10M iteration loop alone. In fact, if you look at the resulting optimized assembly, the constant accesses have been eliminated entirely since they're optimistically inlined and never used. Of course this would normally not happen in real code, but it shows how much better the JVM can optimized Ruby's behavior using invokedynamic.

Tuesday, August 2, 2011

JRuby and Java 7: What to Expect

Java 7 has landed, with a modest set of new features and a few major improvements as well. What can you expect from JRuby running on Java 7?

What's In Java 7

The biggest changes in Java 7 are not related to the Java language at all. Sure, there's the "project coin" enhancements to the Java language, which add some exception-handling shortcuts, new literals for numbers, arrays, hashes, the oft-requested "strings in switch" support, and a few other things. But they're modest incremental changes; the real revolution is at the JVM and JDK level.

Invokedynamic

The most important change in Java 7 is the incorporation of a new bytecode -- invokedynamic -- and an API for building chains of "method handles" to back that bytecode up.

You can look at invokedynamic as a way for JVM users to communicate directly with the optimizing backend of the JVM. Method handles act as both function pointers and as function combinators, allowing a built-in way to construct a call protocol flow from a caller to a callee. You can move arguments around, insert new arguments, process existing arguments and return values, catch exceptions, and perform fast guarded branches between two (or more) paths. The invokedynamic bytecode itself provides a bytecode-level hook to which you attach your method handle chain, with the assumption that the JVM can optimize that chain directly into the invokedynamic caller.

The tl;dr is that invokedynamic makes it possible for the JVM to see through complicated method call logic, such as that found in dynamic languages, and optimize that logic like it would for regular "static" calls.

JRuby's master branch already takes heavy advantage of invokedynamic, by routing most Ruby calls through invokedynamic operations. For simple paths and those that have been optimized by the Hotspot guys (Hotspot is the VM at the core of OpenJDK), invokedynamic often provides performance improvements of 150-200%, with work ongoing to make it even faster. Other paths may not be as well-optimized by the "dot zero" version of OpenJDK 7, so there's opportunity to improve them.

Because JRuby is already well along the road to utilizing invokedynamic, you can try it out today.

Build your own JRuby from master or grab a snapshot from our CI server.
Grab a build of OpenJDK 7 from Oracle (or a build of OpenJDK 7 for OS X).
Point JAVA_HOME at the new JDK and try out JRuby!

We're looking for small benchmarks that show the performance of invokedynamic (good or bad), so please contact me, the JRuby team, or the JRuby users mailing list with your reports from the field. Also, feel free to open performance bugs on the JRuby bug tracker if invokedynamic performs worse than non-invokedynamic. Pass -Xcompile.invokedynamic=false to JRuby to revert to the old non-invokedynamic logic.

NIO.2

NIO is Java's "New IO" APIs, a set of wrappers around low-level file-descriptor logic and memory buffers. NIO has been around since Java 1.4, but the recent update -- dubbed NIO.2 -- brings a sorely-needed update to the functionality provided:

Filesystem operations (like symlinks, permissions, etc) are now almost all available through NIO.2's filesystem APIs. This also includes standard, cross-platform support for filesystem events, such as watching a directory for changes (using efficient OS-level operations, rather than polling).
File and directory walking now comes with considerably less overhead and more options for filtering directory lists before handing filenames off to user code. There's also support for opening a directory directly and walking its contents as you would a file.
Most IO channel types now have asynchronous versions. Asynchronous in this case means "punt my IO operation to a built-in thread pool", with subsequent code checking on the status of those operations and getting results from a "future" handle.

For JRuby, the new IO APIs will mean we can support more filesystem operations across platforms without resorting to native code. It will also provide JRuby users a means of handling filesystem events and asynchronous IO operations without using a platform-specific library. We have not yet started adding NIO.2 support to JRuby's core classes, but that will come soon.

General Improvements

There's lots of smaller, less flashy changes in OpenJDK that also appear to help JRuby.

Even without invokedynamic, the latest OpenJDK 7 builds usually perform better than OpenJDK 6. Some benchmarks have proven to be as much as 2x faster, just by upgrading the JVM! General perf improvements will be more modest, but in almost every case we've tested OpenJDK 7 definitely performs better.

The release of OpenJDK 7 also brings improvements to the "tiered" compilation mode. Tiered compilation aims to merge the benefits of the "client" mode (fast startup) with those of the "server" mode (maximum peak performance). You can turn on tiered compilation using -XX:+TieredCompilation (in JAVA_OPTS or at the "java" command line, or prefixed with -J when passed to JRuby). We're looking for user reports about how well "tiered" mode works, too.

This general improvement means that even JRuby 1.6.x users can take advantage of OpenJDK 7 today, with the promise of even bigger improvements in JRuby 1.7 (our target release for pervasive invokedynamic support).

Consistency

As with previous Java releases, a great deal of care has been taken to ensure existing applications work properly. That applies as well to Java 7. We have been testing against Java 7 for over a year, on and off, and recently started running tests "green" with even heavy invokedynamic use.

We have made no major Java 7-specific fixes in JRuby...it should generally "just work".

Let Us Know!

As always, we really want to hear from you bleeding-edge users that are playing around with JRuby on Java 7. Please don't be shy...let us know how it works for you!

Update: The Hotspot guys have been helping me find invokedynamic bottlenecks in a few JRuby microbenchmarks, and discovered that a flaw in invokedynamic was causing too much code to inline, forcing out more important optimizations. The details belong in another post, but they offered me a long Hotspot flag to accomplish basically what their fix does: -XX:CompileCommand=dontinline,org.jruby.runtime.invokedynamic.InvokeDynamicSupport::invocationFallback ... With this flag, performance on e.g. "tak" easily beats stock JRuby (see the third benchmark run here: https://gist.github.com/1121880).

I would recommend trying this flag if you are finding invokedynamic slowdowns in JRuby.

Thursday, July 21, 2011

Next July, Last Friday, This Tuesday

So after months of not blogging anything technical, I'm going to blog something non-technical. Hopefully tech posts will pick up soon once my new baby boy Elliott is a bit older and less needy :)

When Is "This Friday"?

The most confusing time-oriented statements (for me) are when people use "this", "next", and "last" to describe a specific day or month. Some people consider "this" to always be the day/month coming soon", and others have different meanings. This short post will describe what I mean, in a way that hopefully convinces you to do the same.

If we look at what "this", "next", and "last" are modifying, a simple pattern emerges. For days of the week, they're indicating what week contains the given day. "This week" means the week we're in right now, "next week" means the week that follows this one, and "last week" means the week that preceded this one. Taking that to days of the week, then, "this Friday" should always mean "Friday of this week". Similarly, when those modifiers are applied to a month, they usually mean what year contains the given month. "This August" would mean the August of this year, and so on.

There's no perfect way to interpret these modifiers, and even my system has some mildly confusing points.

Let's say today is Thursday. The following day is "this Friday", as you'd expect. A week from tomorrow would be "next Friday", the Friday of next week...not tomorrow, even though that's the Friday that comes "next" in time. Perhaps a bit more confusing is using "this" to describe days in the past; "this Wednesday" would mean the day before today, since that's the Wednesday of this week. A proper sentence would be "this Wednesday I went to the store." Note the past-tense there.

Sunday and Saturday are peculiarities too, and in almost any system they are the source of the most confusion. By my system, "this Sunday" would almost always mean a day in the past, since that's the Sunday of this week (and it would be weird to say "this Sunday" on Sunday). Similarly, "next Saturday" will almost always mean two Saturdays from now.

Confusion about days in the past or in the future can be avoided with additional modifiers "coming" and "past". "This past Saturday" always means the Saturday nearest in the past, and "this coming Sunday" always means the next Sunday in the future. My system is not ambiguous, but adding these additional modifiers can help smooth over places where it might confuse folks unfamiliar with it.

One alternative would be to always have "this" mean the day/month next in time, "next" to always mean the one after that, and "last" to be the one nearest in the past. But that ends up ambiguous, since if tomorrow is Friday it's unclear if "next Friday" is tomorrow or the Friday of next week.

So, what do you think? Does this system make sense? Is there a better way to disambiguate these modifiers?

Monday, March 7, 2011

Differing java.util.regex.Matcher Unmatched Group Results on Android

Android is really an amazing little platform, but occasionally you will run into API differences. Some of these are actual bugs (like a number of reflection and enum issues in early releases), and others are just weakly-specified APIs.

Today, I worked on JRUBY-5541: Problem with java_import on Android (Ruboto)

The issue boiled down to how we turn Java's camelCased method names into Ruby's snake_cased form. We were using the following code:

   private static final Pattern CAMEL_CASE_SPLITTER = Pattern.compile("(([a-z0-9])([A-Z])|([A-Za-z0-9])([A-Z][a-z]))");
   public static String getRubyCasedName(String javaCasedName) {
   Matcher m = CAMEL_CASE_SPLITTER.matcher(javaCasedName);
   return m.replaceAll("$2$4_$3$5").toLowerCase();
   }

The logic here is to basically attempt two matches ORed together: methods of the form getName in the first half, and methods of the form getURLHandler in the second half. Given the resulting match, we "cleverly" did a replaceAll for both matches at the same time, combining what would be "$2_$3" for the first half and "$4_$5" in the second half.

This works fine against Hotspot/OpenJDK and any JVMs that use its class libraries. But Android uses Harmony's class libraries, and behaves differently. On OpenJDK, unmatched groups returned an empty string "" for the unmatched groups, properly turning "getName" and into "get_name" and "getURLHandler" into get_url_handler". On Android, however, the unmatched groups return null for the $ variables in replaceAll, causing "getName" to become "getnull_nnullame" and "getURLHandler" into something awful like "getnull_unullrlnull_hnullandler". Subsequent logic in JRuby that tried to turn methods of the form "get_name" into "name" attributes then failed to execute, causing the issue in the bug report.

The fix is a bit cumbersome, but not too difficult to understand: manually walk the matches and appendReplacement using only the groups that matched:

   public static String getRubyCasedName(String javaCasedName) {
   Matcher m = CAMEL_CASE_SPLITTER.matcher(javaCasedName);
   // We do this replace loop manually because Android's Matcher produces null for unmatched $ groups.
   // See JRUBY-5541
   if (m.find()) {
   StringBuffer buffer = new StringBuffer();
   m.reset();
   while (m.find()) {
   if (m.group(2) != null) {
   // first part matched
   m.appendReplacement(buffer, "$2_$3");
   } else {
   // second part matched {
   m.appendReplacement(buffer, "$4_$5");
   }
   }
   m.appendTail(buffer);
   return buffer.toString().toLowerCase();
   } else {
   return javaCasedName;
   }
   }

I'm not sure whether Android (Harmony) or OpenJDK is "right" in this case, since the API for Matcher.group does say it will return null for unmatched groups, but nowhere is it specified if $ variables in replace calls should do the same.

Wednesday, February 2, 2011

Working Around the Java Double.parseDouble Bug

You may have seen recently that Java suffers from a similar floating-point parsing bug to the one that recently affected PHP users. The basic gist of it is that for this special 64-bit floating point value, the Java call Double.parseDouble("2.2250738585072012e-308") will get stuck in an infinite loop. Read the link above to understand what's happening.

Naturally, this affects all JVM languages too, since we all use Double.parseDouble for something or another. In fact, it affects almost all the JVM language parsers and compilers (including javac itself), since they need to turn strings into doubles.

Being the upright citizens we are on the JRuby team, we figured we'd try to beat Oracle to the punch and patch around the bug, at least for Ruby-land conversions of String to Float.

I started by looking for calls to Double.parseDouble in JRuby. It turned out there were only two: one for the lexer, and one used by String#to_f, BigDecimal#new, and so on. That was a relief; I expected to find dozens of calls.

It also turned out all cases had already parsed out Ruby float literal oddities, like underscores, using 'd' or 'D' for the exponentiation marker, allowing ill-formatted exponents to be treated as zero, and so on.

My first attempt was to simply normalize the cleaned-up string and pass it to new java.math.BigDecimal(), converting that result back to a primitive double. Unfortunately, BigDecimal's constructor *also* passes through the offending Double.parseDouble code, and we're back where we started.

Ultimately, I ended up with the following code. I make no claims this is efficient, but it appears to pass all the Float tests and specs for JRuby and does not DOS like the bad code in Double.parseDouble:

   public static double parseDouble(String value) {
   String normalString = normalizeDoubleString(value);
   int offset = normalString.indexOf('E');
   BigDecimal base;
   int exponent;
   if (offset == -1) {
   base = new BigDecimal(value);
   exponent = 0;
   } else {
   base = new BigDecimal(normalString.substring(0, offset));
   exponent = Integer.parseInt(normalString.charAt(offset + 1) == '+' ?
   normalString.substring(offset + 2) :
   normalString.substring(offset + 1));
   }
   return base.scaleByPowerOfTen(exponent).doubleValue();
   }

I didn't say it was particularly clever or efficient...but there you have it. A few notes:

Do I really need UNLIMITED precision here? I almost used it to ensure there's no peculiarities passing through BigDecimal on the way to double, but are any such peculiarities outside 128-bit precision?
It might have been more efficient to normalize the decimal position and exponent and then see if it matched the magic value. But of course this magic value was not known until recently, so why risk there being another one?
Using BigDecimal is also lazy. I am lazy.

I welcome improvements. Everyone will probably need to start using code like this, since there will be a lot of unpatched JVMs out there for a long time.

I'm happy to say JRuby will be the first JVM language to route around the Double.parseDouble bug :)

Update: The JRuby commit with this logic is 4c71296, and the JRuby bug is at http://jira.codehaus.org/browse/JRUBY-5441.

Update: A commented on Hacker News pointed out that BigDecimal.doubleValue actually just converts to a string and calls Double.parseDouble. So unfortunately, the mechanism above only worked in an earlier version where I was losing some precision by calling Math.pow(10, exponent) rather than scaleByPowerOfTen. The version above unfortunately does not work, so it's back to the drawing board. C'est la vie!

Wednesday, January 19, 2011

JRuby on Rails on Amazon Elastic Beanstalk

Amazon this week announced Elastic Beanstalk, a managed Apache Tomcat service for AWS. Naturally, I had to try JRuby on it.

First, the bad:

AWSEB is really slow to deploy stuff. Several times it got "stuck" and I waited for more than 30 minutes for it to recover. It did not appear to be an app issue, since the app came up just fine.
The default instance size is t1.micro. I was able to get a Rails app to boot there, but it's a very underpowered size.
It appears to start up JVMs with 256MB of memory max and 64MB of permgen. For a larger app, or one with many Rails instances, that might not be enough. For a "threadsafe" Rails app, though, it's plenty.
The default EC2 load balancer for the new Beanstalk instance is set to ping the "/" URL. If you don't rig up a / route in your Rails app (like I forgot to do) the app will come up for a few minutes and immediately get taken out.

And the good news: it works great once you get past the hassles! Here's the process that worked for my app (assuming app is already build and ready for deploy).

Preparing the app:

Ensure jruby-openssl is in Gemfile. Rails seems to want it in production mode.
Edit config/environments/production.rb to enable threadsafe mode.
`warble`

Preparing Elastic Beanstalk:

Create a new instance, specifying the .war file Warbler created above as the app to deploy
There is no step two

Once the instance has been prepared, you may want to resize it to something larger than t1.micro if it's meant to be a real app...but it should boot ok.

Have fun!

Wednesday, January 5, 2011

Representing Non-Unicode Text on the JVM

JRuby is an implementation of Ruby, and in order to achieve the high level of compatibility we boast we've had to put in some extra work. Probably the biggest area is in management of String data.

Strings in Ruby 1.8

Ruby 1.8 did not differentiate between strings of bytes or strings of characters. A String was just an array of bytes, and the representation of those bytes was dependent on how you used them. You could do regular expression matches against non-text binary sequences (used by some to parse binary formats like PNG). You could treat them as UTF-8 text by setting global encoding variables to assume UTF-8 in literal strings. And you could use the same methods for dealing with strings of bytes you would with strings of characters...split, gsub, and so on. The lack of a character abstraction was painful, but the ability to do character-string operations against byte-strings was frequently useful.

In order to support all this, we were forced in JRuby to also represent strings as byte[]. This was not an easy decision. Java's strings are all UTF-16 internally. By moving to byte[]-based strings, we lost many benefits of being on the JVM, like built-in regular expression support, seamless passing of strings to Java methods, and easy interaction with any Java libraries that accept, return, or manipulate Java strings. We eventually had to implement our own regexp engines (or the byte[]-to-char[]-to-byte[] overhead would kill us) and calls from Ruby to Java still pay a cost to pass Strings.

But we got a lot out of it too. We would not have been able to represent binary content easily with a char[]-based string, since it would either get garbled (when Java tried to decode it) or we'd have to force the data into only the lower byte of each char, doubling the size of all strings in memory. We have some of the fastest String IO capabilities of any JVM language, since we never have to decode text. And most importantly, we've achieved an incredibly high level of compatibility with C Ruby that would have been difficult or impossible forcing String data into char[].

There's also another major benefit: we can support Ruby 1.9's "multilingualization".

Strings in Ruby 1.9

Ruby 1.9 still represents all string data as byte[] internally, but it adds an additional field to all strings: Encoding (there's also "code range", but it's merely an internal optimization).

The biggest problem with encoded text in MRI was that you never knew what a string's encoding was supposed to be; the String object itself only aggregated byte[], and if you ever ended up with mixed encodings in a given app, you'd be in trouble. Rails even introduced its own "Chars" type specifically to deal with the lack of encoding support.

In Ruby 1.9, however, Strings know their own encoding. Strings can be forced to a specific encoding or transcoded to another. IO streams are aware of (and configurable for) external and internal encodings, and there's also default external and internal encodings. And you can still deal with raw binary data in the same structure and with the same String-manipulating features. For a full discussion of encoding support in Ruby 1.9, see Yehuda Katz's excellent post on Ruby 1.9 Encodings: A Primer.

As part of JRuby 1.6, we've been working on getting much closer to 100% compatible with Ruby 1.9. Of course this has meant working on encoding support. Luckily, we had a hacking machine some years ago in Marcin Mielzynski, who implemented not only our encoding-agnostic regexp engine (a port of Oniguruma from C), but also our byte[]-based String logic and almost all of our Encoding support. The remaining work has trickled in over the subsequent years, leading up the the last few months of heavy activity on 1.9 support.

How It Works

You might find it interesting to know how all this works, since JRuby is a JVM-based language where Strings are supposed to be UTF-16 always.

First off, String, implemented by the RubyString class. RubyString aggregates an Encoding and an array of bytes, using a structure we call ByteList. ByteList represents an arbitrary array of bytes, a view into them, and an encoding. All operations against a String operate within RubyString's code against ByteLists.

IO streams, implemented by RubyIO (and subclasses) and ChannelStream/ChannelDescriptor, accept and return ByteList instances. ByteList is the text/binary currency of JRuby...our java.lang.String.

Regexp is implemented in RubyRegexp using Joni, our Java port of the Oniguruma regular expression library. Oniguruma accepts byte arrays and uses encoding-specific information at match time to know what constitutes a character in that byte array. It is the only regular expression engine on the JVM capable of dealing with encodings other than UTF-16.

The JRuby parser also ties into encoding, using it on a per-file basis to know how to encode each literal string it encounters. Literal strings in the AST are held in StrNode, which aggregates a ByteList and constructs new String instances from it.

The compiler is an interesting case. Ideally we would like all literal strings to still go into the class file's constant pool, so that they can be loaded quickly and live as part of the class metadata. In order to do this, the byte[]-based string content is forced into a char[], which is forced into a java.lang.String that goes in the constant pool. At load time, we unpack the bytes and return them to a ByteList that knows their actual encoding. Dare I claim that JRuby is the first JVM language to allow representing literal strings in encodings other than UTF-16?

What It Means For You

At the end of the day, how are you affected by all this? How is your life improved?

If you are a Ruby user, you can count on JRuby having solid support for Ruby 1.9's "M17N" strings. That may not be complete for JRuby 1.6, but we intend to take it as far as possible. JRuby 1.6 *will* have the lion's share of M17N in-place and working.

If you are a JVM user...JRuby represents the *only* way you can deal with arbitrarily-encoded text without converting it to UTF-16 Unicode. At a minimum, this means JRuby has the potential to deal with raw wire data much more efficiently than libraries that have to up-convert to UTF-16 and downconvert back to UTF-8. It may also mean encodings without complete representation in Unicode (like Japanese "emoji" characters) can *only* be losslessly processed using JRuby, since forcing them into UTF-16 would either destroy them or mangle characters. And of course no other JVM language provides JRuby's capabilities for using String-like operations against arbitrary binary data. That's gotta be worth something!

I want to also take this opportunity to again thank Marcin for his work on JRuby in the past; Tom Enebo for his suffering through encoding-related parser work the past few weeks; Yukihiro "Matz" Matsumoto for adding encoding support to Ruby; and all JRuby committers and contributors who have helped us sort out M17N for JRuby 1.6.

Tuesday, January 4, 2011

Flat and Graph Profiles for JRuby 1.6

Sometimes it's the little things that make all the difference in the world.

For a long time, we've extolled the virtues of the amazing JVM tool ecosystem. There's literally dozens of profiling, debugging, and monitoring tools, making JRuby perhaps the best Ruby tool ecosystem you can get. But there's a surprising lack of tools for command-line use, and that's an area many Rubyists take for granted.

To help improve the situation, we recently got the ruby-debug maintainers to ship our JRuby version, so JRuby has easy-to-use command-line Ruby debugging support. You can simply "gem install ruby-debug" now, so we'll stop shipping it in JRuby 1.6.

We also shipped a basic "flat" instrumented profiler for JRuby 1.5.6. It's almost shocking how few command-line profiling tools there are available for JVM users; most require you to boot up a GUI and click a bunch of buttons to get any information at all. Even when there are tools for profiling, they're often bad at reporting results for non-Java languages like JRuby. So we decided to whip out a --profile flag that gives you a basic, flat, instrumented profile of your code.

To continue this trend, we enlisted the help of Dan Lucraft of the RedCar project to expand our profiler to include "graph" profiling results. Dan previously implemented JRuby support for the "ruby-prof" project, a native extension to C Ruby, in the form of "jruby-prof" (which you can install and use today on any recent JRuby release). He was a natural to work on the built-in profiling support.

For the uninitiated, "flat" profiles just show how much time each method body takes, possibly with downstream aggregate times and total aggregate times. This is what you usually get from built-in command-line profilers like the "hprof" profiler that ships with Hotspot/OpenJDK. Here's a "flat" profile for a simple piece of code.

~/projects/jruby ➔ jruby --profile.flat -e "def foo; 100000.times { (2 ** 200).to_s }; end; foo"
Total time: 0.99

    total        self    children       calls  method
----------------------------------------------------------------
     0.99        0.00        0.99           1  Object#foo
     0.99        0.08        0.90           1  Fixnum#times
     0.70        0.70        0.00      100000  Bignum#to_s
     0.21        0.21        0.00      100000  Fixnum#**
     0.00        0.00        0.00         145  Class#inherited
     0.00        0.00        0.00           1  Module#method_added

A "graph" profile shows the top N call stacks from your application's run, breaking them down by how much time is spent in each method. It gives you a more complete picture of where time is being spent while running your application. Here's a "graph" profile (abbreviated) for the same code.

~/projects/jruby ➔ jruby --profile.graph -e "def foo; 100000.times { (2 ** 200).to_s }; end; foo"
%total   %self    total        self    children                 calls  name
---------------------------------------------------------------------------------------------------------
100%     0%        1.00        0.00        1.00                     0  (top)
                   1.00        0.00        1.00                   1/1  Object#foo
                   0.00        0.00        0.00               145/145  Class#inherited
                   0.00        0.00        0.00                   1/1  Module#method_added
---------------------------------------------------------------------------------------------------------
                   1.00        0.00        1.00                   1/1  (top)
 99%     0%        1.00        0.00        1.00                     1  Object#foo
                   1.00        0.09        0.91                   1/1  Fixnum#times
---------------------------------------------------------------------------------------------------------
                   1.00        0.09        0.91                   1/1  Object#foo
 99%     8%        1.00        0.09        0.91                     1  Fixnum#times
                   0.70        0.70        0.00         100000/100000  Bignum#to_s
                   0.21        0.21        0.00         100000/100000  Fixnum#**
---------------------------------------------------------------------------------------------------------
                   0.70        0.70        0.00         100000/100000  Fixnum#times
 69%    69%        0.70        0.70        0.00                100000  Bignum#to_s
---------------------------------------------------------------------------------------------------------
                   0.21        0.21        0.00         100000/100000  Fixnum#times
 21%    21%        0.21        0.21        0.00                100000  Fixnum#**
---------------------------------------------------------------------------------------------------------
                   0.00        0.00        0.00               145/145  (top)
  0%     0%        0.00        0.00        0.00                   145  Class#inherited
---------------------------------------------------------------------------------------------------------
                   0.00        0.00        0.00                   1/1  (top)
  0%     0%        0.00        0.00        0.00                     1  Module#method_added

As you can see, you get a much better picture of why certain methods are taking up time and what component calls are contributing most to that time.

We haven't settled on the final command-line flags, but look for the new graph profiling (and the cleaned-up flat profile) to ship with JRuby 1.6 (real soon now!)