Saturday, January 5, 2013

Constant and Global Optimization in JRuby 1.7.1 and 1.7.2

With every JRuby release, there's always at least a handful of optimizations. They range from tiny improvements in the compiler to perf-aware rewrites of core class methods, but they're almost always driven by real-world cases.

In JRuby 1.7.1 and 1.7.2, I made several improvements to the performance of Ruby constants and global variables that might be of some interest to you, dear reader.

Constants

In Ruby, a constant is a lexically and hierarchically accessed variable that starts with a capital letter. Class and module names like Object, Kernel, and String are all constants defined under the Object class. When I say constants are both lexically and hierarchically accessed, what I mean is that at access time we first search outward through lexically-enclosing scopes, and failing that we search through the class hierarchy of the innermost scope. For example:
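The listing that originally appeared here is not preserved in this text. What follows is a minimal reconstruction based on the description after it, not the original code; the constant values and the `lookups` helper are assumptions added so the three accesses can be run and inspected.

```ruby
module Foo
  IN_FOO = :in_foo

  class A
    IN_A = :in_a
  end

  module Bar
    IN_BAR = :in_bar
  end

  class B < A
    # Hypothetical helper: performs the three accesses and records each result.
    def self.lookups
      results = []
      results << IN_FOO   # found lexically: Foo encloses the body of B
      results << IN_A     # found hierarchically: A is an ancestor of B
      begin
        results << IN_BAR # neither lexically visible nor in B's ancestors
      rescue NameError => e
        results << e.class
      end
      results
    end
  end
end

p Foo::B.lookups # => [:in_foo, :in_a, NameError]
```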


Here, the first two constant accesses inside class B are successful; the first (IN_FOO) is located lexically in Foo, because it encloses the body of class B. The second (IN_A) is located hierarchically by searching B's ancestors. The third access fails, because the IN_BAR constant is only available within the Bar module's scope, so B can't see it.

Constants also...aren't. It is possible to redefine a constant, or define new constants deeper in a lexical or hierarchical structure that mask earlier ones. However, in most code (i.e. "good" code) constants eventually stabilize. This makes it possible to perform a variety of optimizations against them, even though they're not necessarily static.
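Here is a small sketch of masking, using a hypothetical Service module and TIMEOUT constant (neither comes from JRuby). The same access site returns a different value once a constant is defined in a closer scope, which is why any cached lookup needs some way to be invalidated.

```ruby
TIMEOUT = 10

module Service
  def self.timeout
    TIMEOUT # resolves to the top-level TIMEOUT for now
  end
end

before = Service.timeout

# Later, a constant with the same name appears in a closer lexical scope.
module Service
  TIMEOUT = 30
end

after = Service.timeout

p [before, after] # => [10, 30]
```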

Constants are used heavily throughout Ruby, both for constant values like Float::MAX and for classes like Array or Hash. It is therefore especially important that they be as fast as possible.

Global Variables

Globals in Ruby are about like you'd expect...name/value pairs in a global namespace. They start with a $ character. Several global variables are "special" and exist in a more localized scope, like $~ (last regular expression match in this call frame), $! (last exception raised in this thread), and so on. Use of these "local globals" mostly just amounts to special variable names that are always available; they're not really true global variables.
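To illustrate the frame-local behavior of $~, here is a short sketch; the method names are made up for this example. A match performed in one method is not visible through $~ in its caller.

```ruby
def inner_match
  "jruby" =~ /ruby/
  $~[0] # the match made in this frame
end

def outer
  matched = inner_match
  # No match has been performed in this frame, so $~ is still nil here.
  [matched, $~]
end

p outer # => ["ruby", nil]
```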

Everyone knows global variables should be discouraged, but that's largely referring to global variable use in normal program flow. Using global state across your application – potentially across threads – is a pretty nasty thing to do to yourself and your coworkers. But there are some valid uses of globals, like for logging state and levels, debugging flags, and truly global constructs like standard IO.
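The listing that originally appeared here is not preserved in this text. The sketch below is a reconstruction from the description that follows; only $DEBUG, $stderr, and MyApp#log are named there, so the method body and the message are assumptions.

```ruby
class MyApp
  # Writes the message to standard error only when the $DEBUG global is set.
  def log(message)
    $stderr.puts(message) if $DEBUG
  end
end

MyApp.new.log("starting up") # prints nothing unless $DEBUG is true
```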


Here, we're using the global $DEBUG to specify whether logging should occur in MyApp#log. Those log messages are written to the stderr stream accessed via $stderr. Note also that $DEBUG can be set to true by passing -d at the JRuby command line.

Optimizing Constant Access (pre-1.7.1)

I've posted in the past about how JRuby optimizes constant access, so I'll just quickly review that here.

At a given access point, constant values are looked up from the current lexical scope and cached. Because constants can be modified, or new constants can be introduced that mask earlier ones, the JRuby runtime (org.jruby.Ruby) holds a global constant invalidator checked on each access to ensure the previous value is still valid.
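As a rough conceptual model of that cache-plus-invalidator scheme, here is a sketch in plain Ruby. It is not JRuby's implementation (which lives in Java inside the runtime); the class names, the token-swapping approach, and the Settings module are all assumptions made for illustration.

```ruby
# Hypothetical stand-in for the runtime-wide invalidator: it hands out a token
# that is replaced whenever any constant is defined or redefined.
class GlobalConstantInvalidator
  attr_reader :token

  def initialize
    @token = Object.new
  end

  def invalidate!
    @token = Object.new
  end
end

# Hypothetical stand-in for a single constant access point in compiled code.
class ConstantAccessSite
  attr_reader :full_lookups

  def initialize(scope, name, invalidator)
    @scope = scope
    @name = name
    @invalidator = invalidator
    @cached_token = nil
    @cached_value = nil
    @full_lookups = 0
  end

  def value
    current = @invalidator.token
    # Identity check on every access; a full lookup happens only after invalidation.
    unless current.equal?(@cached_token)
      @cached_value = @scope.const_get(@name)
      @cached_token = current
      @full_lookups += 1
    end
    @cached_value
  end
end

module Settings
  RETRIES = 3
end

invalidator = GlobalConstantInvalidator.new
site = ConstantAccessSite.new(Settings, :RETRIES, invalidator)
site.value                              # full lookup
site.value                              # served from the cache
Settings.send(:remove_const, :RETRIES)  # simulate redefining the constant
Settings.const_set(:RETRIES, 5)
invalidator.invalidate!                 # a runtime would do this on any constant change
site.value                              # full lookup again, now sees 5
```

Note that in this model a change to any constant invalidates every access site, which matches the "global" nature of the invalidator described above.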

On non-invokedynamic JVMs, verifying the cache involves an object identity comparison every time, which means a non-final value must be accessed via a couple levels of indirection. This adds a certain amount of overhead to constant access, and also makes it impossible for the JVM to fold multiple constant accesses away, or make static decisions based on a constant's value.

On an invokedynamic JVM, the cache verification is in the form of a SwitchPoint. SwitchPoint is a type of on/off guard used at invokedynamic call sites to represent a hard failure. Because it can only be switched off, the JVM is able to optimize the SwitchPoint logic down to what's called a "safe point", a very inexpensive ping back into the VM. As a result, constant accesses under invokedynamic can be folded away, and repeat access or unused accesses are not made at all.

However, there's a problem. In JRuby 1.7.0 and earlier, the only way we could access the current lexical scope (in a StaticScope object) was via the current call frame's DynamicScope, a heap-based object created on each activation of a given body of code. In order to reduce the performance hit to methods containing constants, we introduced a one-time DynamicScope called the "dummy scope", attached to the lexical scope and only created once. This avoided the huge hit of constructing a DynamicScope for every call, but caused constant-containing methods to be considerably slower than those without constants.

Lifting Lexical Scope Into Code

In JRuby 1.7.1, I decided to finally bite the bullet and make the lexical scope available to all method bodies, without requiring a DynamicScope intermediate. This was a nontrivial piece of work that took several days to get right, so although most of the work occurred before JRuby 1.7.0 was released, we opted to let it bake a bit before release.

The changes made it possible for all class, module, method, and block bodies to access their lexical scope essentially for free. It also helped us finally deliver on the promise of truly free constant access when running under invokedynamic.

So, does it work?
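The benchmark listing that originally appeared here is not preserved in this text. The sketch below is a reconstruction from the description that follows: only the names foo, bar, and the variable 'a' come from the post, while the ONE constant, the timing helper, and the overall structure are assumptions.

```ruby
ONE = 1 # hypothetical constant; the original name is not known

def foo
  1   # returns a literal
end

def bar
  ONE # returns a constant
end

# Control: decrement by a literal.
def control_loop(iterations)
  a = iterations
  while a > 0
    a -= 1
  end
  a
end

# Method call returning a literal, once unused and once as the decrement.
def method_loop(iterations)
  a = iterations
  while a > 0
    foo
    a -= foo
  end
  a
end

# Method call returning a constant, once unused and once as the decrement.
def constant_loop(iterations)
  a = iterations
  while a > 0
    bar
    a -= bar
  end
  a
end

# Times one block using the monotonic clock and prints the elapsed seconds.
def time_it(label, iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield iterations
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  puts format("%-10s %.3fs", label, elapsed)
  elapsed
end

def run_benchmark(iterations)
  [
    time_it("control", iterations) { |n| control_loop(n) },
    time_it("method", iterations) { |n| method_loop(n) },
    time_it("constant", iterations) { |n| constant_loop(n) }
  ]
end
```

Calling something like run_benchmark(100_000_000) several times in one process gives a JIT time to warm up; the iteration count is an arbitrary choice, not the one used for the numbers in this post.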


Assuming constant access is free, the three loops here should perform identically. The non-expression calls to foo and bar should disappear, since they both return a constant value that's never used. The calls for decrementing the 'a' variable should produce a constant value '1' and perform the same as the literal decrement in the control loop.

Here's Ruby (MRI) 2.0.0 performance on this benchmark.


The method call itself adds a significant amount of overhead here, and the constant access adds another 50% of that overhead. Ruby 2.0.0 has done a lot of work on performance, but the cost of invoking Ruby methods and accessing constants remains high, and constant accesses do not fold away as you would like.

Here's JRuby 1.7.2 performance on the same benchmark.


We obviously run all cases significantly faster than Ruby 2.0.0, but the important detail is that the method call adds only about 11% overhead to the control case, and constant access adds almost nothing.

For comparison, here's JRuby 1.7.0, which did not have free access to lexical scopes.


So by avoiding the intermediate DynamicScope, methods containing constant accesses are somewhere around 7x faster than before. Not bad.

Optimizing Global Variables

Because global variables have a much simpler structure than constants, they're pretty easy to optimize. I had not done so up to JRuby 1.7.1 mostly because I didn't see a compelling use case and didn't want to encourage their use. However, after Tony Arcieri pointed out that invokedynamic-optimized global variables could be used to add logging and profiling to an application with zero impact when disabled, I was convinced. Let's look at the example from above again.


In this example, we would ideally like there to be no overhead at all when $DEBUG is untrue, so we're free to add optional logging throughout the application with no penalty. In order to support this, two improvements were needed.

First, I modified our invokedynamic logic to cache global variables using a per-variable SwitchPoint. This makes access to mostly-static global variables as free as constant access, with the same performance improvements.

Second, I added some smarts into the compiler for conditional forms like "if $DEBUG" that would avoid re-checking the $DEBUG value at all if it were false the first time (and start checking it again if it were modified).

It's worth noting I also made this second optimization for constants; code like "if DEBUG_ENABLED" will also have the same performance characteristics.

Let's see how it performs.
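The benchmark listing that originally appeared here is not preserved in this text. As a hedged reconstruction from the description that follows, the three forms might look like this; the method names, the DEBUG_ENABLED value, the message, and the timing helper are assumptions.

```ruby
DEBUG_ENABLED = false # hypothetical constant flag, untrue like $DEBUG by default

# The bare method: no condition at all.
def bare_method
end

# Guarded by a global that is normally false.
def global_guarded
  $stderr.puts("debugging") if $DEBUG
end

# Guarded by a constant that is false.
def constant_guarded
  $stderr.puts("debugging") if DEBUG_ENABLED
end

# Calls the block the given number of times and returns the elapsed seconds.
def time_calls(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  i = 0
  while i < iterations
    yield
    i += 1
  end
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end
```

A run could then compare time_calls(n) { bare_method }, time_calls(n) { global_guarded }, and time_calls(n) { constant_guarded } for some large n, repeated a few times to allow for warm-up.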


In this case, we should again expect that all three forms have identical performance. Both the constant and the global resolve to an untrue value, so they should ideally not introduce any overhead compared to the bare method.

Here's Ruby (MRI) 2.0.0:


Both the global and the constant add overhead here in the neighborhood of 25% over an empty method. This means you can't freely add globally-conditional logic to your application without accepting a performance hit.

JRuby 1.7.2:


Again we see JRuby + invokedynamic optimizing method calls considerably better than MRI, but additionally we see that the untrue global conditions add no overhead compared to the empty method. You can freely use globals as conditions for logging, profiling, and other code you'd like to have disabled most of the time.

And finally, JRuby 1.7.1, which optimized constants, did not optimize globals, and did not have specialized conditional logic for either:

Where Do We Go From Here?

Hopefully I've helped show that we're really just seeing the tip of the iceberg as far as optimizing JRuby using invokedynamic. More than anything we want you to report real-world use cases that could benefit from additional optimization, so we can target our work effectively. And as always, please try out your apps on JRuby, enable JRuby testing in Travis CI, and let us know what we can do to make your JRuby experience better!