Headius: ruby

Showing posts with label ruby. Show all posts

Saturday, January 5, 2013

Constant and Global Optimization in JRuby 1.7.1 and 1.7.2

With every JRuby release, there's always at least a handful of optimizations. They range from tiny improvements in the compiler to perf-aware rewrites of core class methods, but they're almost always driven by real-world cases.

In JRuby 1.7.1 and 1.7.2, I made several improvements to the performance of Ruby constants and global variables that might be of some interest to you, dear reader.

Constants

In Ruby, a constant is a lexically and hierarchically accessed variable that starts with a capital letter. Class and module names like Object, Kernel, String, are all constants defined under the Object class. When I say constants are both lexical and hierarchically accessed, what I mean is that at access time we first search outward through lexically-enclosing scopes, and failing that we search through the class hierarchy of the innermost scope. For example:

Here, the first two constant accesses inside class B are successful; the first (IN_FOO) is located lexically in Foo, because it encloses the body of class B. The second (IN_A) is located hierarchically by searching B's ancestors. The third access fails, because the IN_BAR constant is only available within the Bar module's scope, so B can't see it.

Constants also...aren't. It is possible to redefine a constant, or define new constants deeper in a lexical or hierarchical strcture that mask earlier ones. However in most code (i.e. "good" code) constants eventually stabilize. This makes it possible to perform a variety of optimizations against them, even though they're not necessarily static.

Constants are used heavily throughout Ruby, both for constant values like Float::MAX and for classes like Array or Hash. It is therefore especially important that they be as fast as possible.

Global Variables

Globals in Ruby are about like you'd expect...name/value pairs in a global namespace. They start with $ character. Several global variables are "special" and exist in a more localized source, like $~ (last regular expression match in this call frame), $! (last exception raised in this thread), and so on. Use of these "local globals" mostly just amounts to special variable names that are always available; they're not really true global variables.

Everyone knows global variables should be discouraged, but that's largely referring to global variable use in normal program flow. Using global state across your application – potentially across threads – is a pretty nasty thing to do to yourself and your coworkers. But there are some valid uses of globals, like for logging state and levels, debugging flags, and truly global constructs like standard IO.

Here, we're using the global $DEBUG to specify whether logging should occur in MyApp#log. Those log messages are written to the stderr stream accessed via $stderr. Note also that $DEBUG can be set to true by passing -d at the JRuby command line.

Optimizing Constant Access (pre-1.7.1)

I've posted in the past about how JRuby optimizes constant access, so I'll just quickly review that here.

At a given access point, constant values are looked up from the current lexical scope and cached. Because constants can be modified, or new constants can be introduce that mask earlier ones, the JRuby runtime (org.jruby.Ruby) holds a global constant invalidator checked on each access to ensure the previous value is still valid.

On non-invokedynamic JVMs, verifying the cache involves an object identity comparison every time, which means a non-final value must be accessed via a couple levels of indirection. This adds a certain amount of overhead to constant access, and also makes it impossible for the JVM to fold multiple constant accesses away, or make static decisions based on a constant's value.

On an invokedynamic JVM, the cache verification is in the form of a SwitchPoint. SwitchPoint is a type of on/off guard used at invokedynamic call sites to represent a hard failure. Because it can only be switched off, the JVM is able to optimize the SwitchPoint logic down to what's called a "safe point", a very inexpensive ping back into the VM. As a result, constant accesses under invokedynamic can be folded away, and repeat access or unused accesses are not made at all.

However, there's a problem. In JRuby 1.7.0 and earlier, the only way we could access the current lexical scope (in a StaticScope object) was via the current call frame's DynamicScope, a heap-based object created on each activation of a given body of code. In order to reduce the performance hit to methods containing constants, we introduced a one-time DynamicScope called the "dummy scope", attached to the lexical scope and only created once. This avoided the huge hit of constructing a DynamicScope for every call, but caused constant-containing methods to be considerably slower than those without constants.

Lifting Lexical Scope Into Code

In JRuby 1.7.1, I decided to finally bite the bullet and make the lexical scope available to all method bodies, without requiring a DynamicScope intermediate. This was a nontrivial piece of work that took several days to get right, so although most of the work occurred before JRuby 1.7.0 was released, we opted to let it bake a bit before release.

The changes made it possible for all class, module, method, and block bodies to access their lexical scope essentially for free. It also helped us finally deliver on the promise of truly free constant access when running under invokedynamic.

So, does it work?

Assuming constant access is free, the three loops here should perform identically. The non-expression calls to foo and bar should disappear, since they both return a constant value that's never used. The calls for decrementing the 'a' variable should produce a constant value '1' and perform the same as the literal decrement in the control loop.

Here's Ruby (MRI) 2.0.0 performance on this benchmark.

The method call itself adds a significant amount of overhead here, and the constant access adds another 50% of that overhead. Ruby 2.0.0 has done a lot of work on performance, but the cost of invoking Ruby methods and accessing constants remains high, and constant accesses do not fold away as you would like.

Here's JRuby 1.7.2 performance on the same benchmark.

We obviously run all cases significantly faster than Ruby 2.0.0, but the important detail is that the method call adds only about 11% overhead to the control case, and constant access adds almost nothing.

For comparison, here's JRuby 1.7.0, which did not have free access to lexical scopes.

So by avoiding the intermediate DynamicScope, methods containing constant accesses are somewhere around 7x faster than before. Not bad.

Optimizing Global Variables

Because global variables have a much simpler structure than constants, they're pretty easy to optimize. I had not done so up to JRuby 1.7.1 mostly because I didn't see a compelling use case and didn't want to encourage their use. However, after Tony Arcieri pointed out that invokedynamic-optimized global variables could be used to add logging and profiling to an application with zero impact when disabled, I was convinced. Let's look at the example from above again.

In this example, we would ideally like there to be no overhead at all when $DEBUG is untrue, so we're free to add optional logging throughout the application with no penalty. In order to support this, two improvements were needed.

First, I modified our invokedynamic logic to cache global variables using a per-variable SwitchPoint. This makes access to mostly-static global variables as free as constant access, with the same performance improvements.

Second, I added some smarts into the compiler for conditional forms like "if $DEBUG" that would avoid re-checking the $DEBUG value at all if it were false the first time (and start checking it again if it were modified).

It's worth noting I also made this second optimization for constants; code like "if DEBUG_ENABLED" will also have the same performance characteristics.

Let's see how it performs.

In this case, we should again expect that all three forms have identical performance. Both the constant and the global resolve to an untrue value, so they should ideally not introduce any overhead compared to the bare method.

Here's Ruby (MRI) 2.0.0:

Both the global and the constant add overhead here in the neighborhood of 25% over an empty method. This means you can't freely add globally-conditional logic to your application without accepting a performance hit.

JRuby 1.7.2:

Again we see JRuby + invokedynamic optimizing method calls considerably better than MRI, but additionally we see that the untrue global conditions add no overhead compared to the empty method. You can freely use globals as conditions for logging, profiling, and other code you'd like to have disabled most of the time.

And finally, JRuby 1.7.1, which optimized constants, did not optimize globals, and did not have specialized conditional logic for either:

Where Do We Go From Here?

Hopefully I've helped show that we're really just seeing the tip of the iceberg as far as optimizing JRuby using invokedynamic. More than anything we want you to report real-world use cases that could benefit from additional optimization, so we can target our work effectively. And as always, please try out your apps on JRuby, enable JRuby testing in Travis CI, and let us know what we can do to make your JRuby experience better!

Sunday, April 27, 2008

Promise and Peril for Alternative Ruby Impls

My how things have changed in a couple short years.

Two years ago, in 2006, there were essentially two viable Ruby implementations: Matz's Ruby 1.8.x codebase, and JRuby. At the time, JRuby was just barely starting to run Rails. I consider that a sort of "singularity" in the lifetime of an implementation, the inflection point at which it becomes more than a toy. (OT: these days, I consider the ability to run Rails faster than Matz's Ruby a better inflection point, but we've had the Rails thing going for two years). Cardinal (Ruby on Parrot) was mostly dead, or at least on its way to being dead. YARV, eventually to become the Ruby 1.9 VM, was perhaps only half completed and was not yet officially marked to be the next Ruby. Rubinius had not really been started, or at least had not officially been named and could not be considered anywhere near viable. IronRuby was still Wilco Bauer's IronRuby, a doomed codebase and project name eventually to be adopted by Microsoft's later Ruby implementation effort. So there was very little competition, and most people still considered JRuby to be a big joke. Ha ha.

Fast forward to Spring 2008. Ruby 1.8.x has mostly been put in maintenance mode, but remains by far the most widely-deployed Ruby implementation, despite its relatively poor performance. Largely, this is because only Ruby 1.8.x is 100% compatible with Ruby 1.8.x, and because it's already packaged and shipped on a number of OSes, including Rubyist favorite OS X. But the rest of the field has become a lot more muddled. There are now around six implementations that have past the Rails inflection point or will soon, a handful of others likely to fade into obscurity (after contributing their own genetics to the Ruby ecosystem, surely), a few mysterious up-and-comers...and Cardinal is still dead.

Let's review the promise, peril, and status of all the implementations. Note, this is largely a mix of facts and my opinions. Corrections for the facts are welcome. Corrections for the opinions...well...let's take it offline.

Ruby 1.8

Matz's venerable code base (usually called MRI for "Matz's Ruby Interpreter", MatzRuby or Ruby 1.8 in this article) still has a death-grip on the Ruby world. For 99% of Ruby developers, MatzRuby is still the king of the hill. This is in the face of poor relative performance (slower than JRuby and 1.9 for sure, and slower than most of the others for many cases), poor memory management (conservative, non-compacting GC), and many upstart implementations which solve these problems. Why is this?

It's not hard to answer. It's compatibility and status quo. If all my apps run fine on MatzRuby, and MatzRuby is already installed, and I'm satisfied with the performance characteristics of MatzRuby...why would I run anything else? Because MatzRuby is largely not *that bad* for most uses, especially as relates to simple system scripting work, it's unlikely to go away any time soon. And since all but one of the alternative impls is targeting Ruby 1.8 features and compatibility, the still-in-development Ruby 1.9 is not gaining much traction yet (which is probably a good thing).

You should all know the specs and status for MatzRuby. Current official release is 1.8.6 patchlevel 114. There's a 1.8.7 preview 2 out now that backports a whole bunch of features from 1.9 and breaks some compatibility. The jury's still out on whether it will go final as-is. MatzRuby is a simple AST-walking interpreter, with a conservative GC, minimal/cumbersome Unicode support, and a large library of third-party native extensions. MatzRuby is still the gold standard for what Ruby 1.8 "is".

The peril for Ruby 1.8 right now involves keeping that 99% of Ruby users interested in Ruby while 1.9 bakes without breaking compatibility. Ruby 1.8.7 pre1 introduced some Ruby 1.9 features but also broke a bunch of stuff (most notably Rails), and pre2 doesn't pass the specs 1.8.6 does. We had a design meeting last week where it was decided that we folks working on the Ruby specs need to help the Ruby core team get involved, so they can start running the full suite as part of their development process. That's going to happen soon, but until it does 1.8 releases can break basically anything from day today and nobody will know about it.

Beyond compatibility and keeping the masses happy, Ruby 1.8 could use a little performance, scaling, and memory lovin' too. Unfortunately almost all that effort is going toward Ruby 1.9 now, leaving the vast majority of Ruby users stuck on one of the slowest implementations. That's good for us alternative implementers, since it means we're gaining users every day; but it's not good for the MatzRuby lineage because they're losing mindshare. It's hard to deny that the future of Ruby lies with the "excellent" implementations, if not with the "best" ones, and the definition of "excellent" is moving forward every day. Ruby 1.8 is not.

Ruby 1.9

Ruby 1.9 is the merging of the Ruby 1.8 class library and memory model with a large number of new features and a bytecode-based execution engine. It represents the work of Koichi Sasada, who first announced his YARV ("Yet Another Ruby VM") project at RubyConf 2004. YARV took a bit longer than he and many others expected to be completed, but as a result of his tireless efforts it is now the official Ruby 1.9 VM.

Ruby 1.9 introduces many new features, like a character-aware (Unicode and any other encoding) String implementation, Enumerator/Enumerable enhancements, and numerous refinements and additions to the rest of the class library I won't attempt to list here. Ruby 1.9 is defining, in essence, what the rest of the implementations will soon have to implement. For all the debates, most of the additions in Ruby 1.9 have been well-received by the community, though the exposure level is still extremely low.

Ruby 1.9 has, I believe, reached the Rails singularity. With some work over the past few months, Rails has moved closer and closer to running on Ruby 1.9. Last I heard, there was only one bug that needed patching in the 1.9 codebase for Rails trunk to run unmodified. Expect to see an announcement about Ruby 1.9 and Rails at RailsConf next month.

The interesting thing about Ruby 1.9 is that it will mark only the second non-MatzRuby implementation to run Rails, JRuby being the first. This is due in large part to the massive effort required to implement a bytecode VM for Ruby (1.9 was only just released this past December), and to the fact that Ruby 1.9 is still very much a moving target. APIs are being added and refined, optimizations are being tossed about at the VM level, and memory and GC improvements are being considered. So while it's unlikely that anyone will be moving Rails apps to Ruby 1.9 in the near future, Ruby 1.9 is certainly viable...and represents the most-likely future evolution of the Ruby language itself.

What worries me about Ruby 1.9 is that its performance doesn't seem "better enough" to change the future of Ruby. While Koichi's own benchmarks show it's much faster than Ruby 1.8, on general application benchmarks it's usually less than a 50% improvement. Many times, JRuby is able to exceed Ruby 1.9 performance, even without similar optimizations and feature removals. It is certainly a "better" performance story than Ruby 1.8, but is it enough?

Ruby 1.9 also took the first steps toward concurrency by making threads native, but it encumbered them with a giant lock a la Python. That means you still can't get concurrent execution of Ruby code on Ruby 1.9, something JRuby's been able to do all along, and you can't scale Ruby 1.9 any better on wide systems than you could with Ruby 1.8. There's plans to solve this by adding fine-grained locks to most internal data structures, but that's a hard problem to solve on par with JRuby 2.0 challenges I'll talk about in a minute. And without something like the JVM to really optimize that locking, performance will take a hit.

It's also unclear if people really *want* all of what Ruby 1.9 has to offer. Sure, people love the idea of a real encoding-aware String, but response to the rest of what Ruby 1.9 offers--including performance--has been a collective "meh." People are not flocking to Ruby 1.9 in droves, and many are contemplating whether their future Ruby work will simply be a lateral move to one of the 1.8-compatible implementations. And on the JRuby project, we've received almost no requests to implement Ruby 1.9 features, so we've only added a few tiny ones. Whither Ruby 2.0?

JRuby

Ahh, JRuby. How you have changed my life.

JRuby is a Java-based implementation of Ruby, or if you prefer not to speak the word "Java", it's Ruby for the JVM. JRuby was started in 2002 by Jan Arne Petersen, and though it had a couple good years of activity it never really got to a compatible-enough level to run real Ruby applications. Jan Arne moved on at some point and efforts were largely picked up by Thomas Enebo, current co-lead of JRuby. He was especially active when JRuby was being updated from Ruby 1.6 compatibility to Ruby 1.8.4 compatibility, a task which today is largely complete. I joined the project in fall 2004 after attending RubyConf 2004, and at the time I did not know Ruby. Now I know Ruby in a deeper way than I ever really wanted to...but that's a discussion for another day. I wasn't a hugely active contributor until late 2005, when I started working on a new interpreter and refactoring JRuby internals. I presented JRuby for the first time at RubyConf 2005, and then in early 2006 milestones started dropping like flies: IRB ran, then RubyGems, then Rake...and then Rails. We were still dog slow at the time, but we were viable.

JRuby reached the Rails singularity in time for JavaOne 2006, an event that led Sun to hire Tom and me in fall 2006. We demonstrated on stage JRuby running a simple Rails application. It was cobbled together, running under either Tomcat or simply WEBrick at the time, but we had proved it was possible to have an alternative Ruby implementation compatible enough to run Rails. JRuby was no longer a joke.

Over the past two years (man, has it really been two years?) we've essentially rewritten almost all of JRuby a piece at a time. We've been through three interpreters, one prototype compiler and one complete compiler, multiple Regexp engines, and at least two implementations of the key core classes. I wrote the compiler for JRuby during summer 2007, completing it around RailsConf EU 2007. My first compiler! We now run faster than Ruby 1.8 in both interpreted and compiled modes, with interpreted being perhaps 15-20% faster and compiled being at least a few times faster, generally on par with Ruby 1.9. Of course since we're based on the JVM, we share its object model, garbage collector, binary representation. So JRuby is certainly a "mini-VM" but we leave the nasty bits to JVM implementers to handle. Pragmatism, friends, pragmatism.

Perhaps the most notable result of JRuby's existence is that there are now so many Ruby implementations. If we had not shown the promise, many of the others might not have risked the peril. Oh, and we've ended up with a cracker-jack implementation of Ruby on the JVM too...I suppose that's worth a little something.

Perils...always perils. JRuby has managed to surmount most of the perils that await other implementations. And being on the other side of the chasm, I can tell you now it doesn't get easier.

Compatibility is *hard*. I'm not talking a little hard, I'm talking monumentally hard. Ruby is a very flexible, complicated language to implement, and it ships with a number of very flexible, complicated core class implementations. Very little exists in the way of specifications and test kits, so what we've done with JRuby we've done by stitching together every suite we could find. And after all this time, we still have known bugs and weekly reports of minor incompatibilities. I don't think an alternative implementation can ever truly become "compatible" as much as "more compatible". We're certainly the most compatible alternative impl, and even now we've got our hands full fixing bugs. Then there's Ruby 1.9 support, coming up probably in JRuby 1.2ish. Another adventure.

Performance is also hard, but maybe not *hard* hard. JRuby is lucky to run on one of the fastest VMs in existence. The JVM, in its many incarnations, has been so refined and the JVM implementation arena so competitive that we get a lot of performance for free. But by "for free" I mean Java performance. Java's far easier to write and maintain than C, and on the JVM we know we won't pay a performance penalty for not writing C code. But making Ruby fast on the JVM is where it gets tricky. JVMs are optimized for Java and Java-like languages. JRuby has to include all sorts of tricks and subsystems to make the runtime and compiled Ruby code "feel" a bit more like Java to the JVM. This has involved, in many cases, implementing our own "mini-VM" on top of the JVM, with mixed-mode execution (interpreted, then JITed to JVM bytecode), call site caches (to speed method lookup), and code alterations that sometimes improve performance at the cost of LOC and readability. The challenge for us going forward is to continue improving performance without making JRuby a jumbled mess. John Rose's work on the Da Vinci Machine and dynamic invocation for JDK 7 will help us rip a lot of code out, but only a small subset of JRuby users will see the benefits in the near term. So expect to see us spend a lot more time on performance for Java 5/6 compatible JVMs.

The final big peril for us relates to JRuby 2.0's Java integration support. JRuby currently has a split object model, where Ruby types are all "IRubyObject" implementations and the runtime only understands how to deal with "IRubyObject. This means that in order for us to call methods on non-Ruby Java types, we must wrap them with an IRubyObject wrapper. This is partially to attach our meta-object protocol to those objects, but mostly because every bloody method in the system accepts only IRubyObject as a parameter or return type. In order for us to achieve the "last mile" of Java integration, we need to make the entire system accept "Object" and act appropriately; we've been calling this approach "lightweights", since it would enable using normal Java objects for several core classes like Fixnum and Float. Support for "Object" lightweights would then feed into JRuby 2.0's reworked Java integration layer, eliminating most of the overhead (and code) associated with calling Java methods today. It would also fit better into the invokedynamic work. It's a big job I wouldn't expect to be complete until later this fall, and we need to mercilessly write specs and tests for the current behavior to avoid regressing features we support today. But we're going to do it; we've already started.

Rubinius

Evan Phoenix's Rubinius project is an effort to implement Ruby using as much Ruby code as possible. It is not, as professed, "Ruby in Ruby" anymore. Rubinius started out as a 100% Ruby implementation of Ruby that bootstrapped and ran on top of MatzRuby. Over time, though the "Ruby in Ruby" moniker has stuck, Rubinius has become more or less half C and half Ruby. It boasts a stackless bytecode-based VM (compare with Ruby 1.9, which does use the C stack), a "better" generational, compacting garbage collector, and a good bit more Ruby code in the core libraries, making several of the core methods easier to understand, maintain, and implement in the first place.

The promise of Rubinius is pretty large. If it can be made compatible, and made to run fast, it might represent a better Ruby VM than YARV. Because a fair portion of Rubinius is actually implemented in Ruby, being able to run Ruby code fast would mean all code runs faster. And the improved GC would solve some of the scaling issues Ruby 1.8 and Ruby 1.9 will face.

Rubinius also brings some other innovations. The one most likely to see general visibility is Rubinius's Multiple-VM API. JRuby has supported MVM from the beginning, since a JRuby runtime is "just another Java object". But Evan has built simple MVM support in Rubinius and put a pretty nice API on it. That API is the one we're currently looking at improving and making standard for user-land MVM in JRuby and Ruby 1.9. Rubinius has also shown that taking a somewhat more Smalltalk-like approach to Ruby implementation is feasible.

But here be dragons.

In the 1.5 years since Rubinius was officially named and born into the Ruby world, it has not yet met any of these promises. It is not generally faster than Ruby 1.8, though it performs pretty well on some low-level microbenchmarks. It is not implemented in Ruby: the current VM is written in C and the codebase hosts as much C code as it does Ruby code. Evan's work on a C++ rewrite of the VM will make Rubinius the first C++-based Ruby implementation. It has not reached the Rails singularity yet, though they may achieve it for RailsConf (probably in the same cobbled-together state JRuby did at JavaOne 2006...or maybe a bit better). And the second Rails inflection point--running Rails faster than Ruby 1.8--is still far away.

Compatibility is not going to be a problem for Rubinius. They've worked very hard from the beginning to match Ruby behavior, even launching a Ruby specification suite project to officially test that behavior using Ruby 1.8 as the standard. I have no doubt Rubinius will be able to run Rails and most other Ruby apps people throw at it. And despite Evan's frequent cowboy attitude to language compatibility (such as his early refusal to implement left-to-right evaluation ordering, a fatal decision that led to the current VM rework), compatibility is likely to be a simple matter of time and effort, driven by the spec suite and by actual applications, as people start running real code on Rubinius.

Performance is going to be a much harder problem for Rubinius. In order for Rubinius to perform well, method invocation must be extremely fast. Not just faster than Ruby 1.8 or Ruby 1.9, but perhaps an order of magnitude faster than the fastest Ruby implementations. The simple reason for this is that with so much of the core classes implemented in Ruby, Rubinius is doing many times more dynamic invocations than any other implementation. If a given String method represents one or two dynamic calls in JRuby or Ruby 1.8, it may represent twenty in Rubinius...and sometimes more. All that dispatch has a severe cost, and on most benchmarks involving heavily Ruby-based classes Rubinius has absolutely dismal performance--even with call-site optimizations that finally pushed JRuby's performance to Ruby 1.9 levels. A few benchmarks I've run from JRuby's suite must be ratcheted down a couple orders of magnitude to even complete.

And the Rubinius team knows this. Over the past few months, more and more core methods have been reimplemented in C as "primitives", sometimes because they have to be to interact with C-level memory and VM constructs, but frequently for performance reasons. So the "Ruby in Ruby" implementation has evolved away from that ideal rather than towards it, and performance is still not acceptable for most applications. In theory, none of this should be insurmountable. Smalltalk VMs run significantly faster than most Ruby implementations and still implement all or most of the core in Smalltalk. Even the JVM, largely associated with the statically-typed Java language, is essentially an optimized dynamic language VM, and the majority of Java's core is implemented in Java...often behind interfaces and abstractions that require a good dynamic runtime. But these projects have hundreds of man-years behind them, where Rubinius has only a handful of full-time and part-time enthusiastic Rubyists, most with no experience in implementing high-performance language runtimes. And Evan is still primarily responsible for everything at the VM level.

Of course, it would be folly to suggest that the Rubinius team should focus on performance before compatibility. The "Ruby in Ruby" meme needs to die (seriously!), but other than that Rubinius is an extremely promising implementation of Ruby. Its performance is terrible for most apps, but not all that much worse than JRuby's performance was when we reached the Rails singularity ourselves. And its design is going to be easier to evolve than comparable C implementations, assuming that people other than Evan learn to really understand the VM core. I believe the promise of Rubinius is certainly great enough to continue the project, even if the perils are going to present some truly epic challenges for Evan and company to overcome.

Update: The Rubinius team has had a few things to say about this as well.

Evan Phoenix argues that the 50/50 split between C and Ruby really translates into a lot more logic in Ruby, because of course we all know about Ruby's legendary terseness and density. And he's got a good point; even at 50/50 Rubinius is easily "mostly" Ruby. But nobody would claim that the JVM is Java in Java, even though the ratio of C/C++ code to Java code is vastly in Java's favor. Rubinius has a C core...Rubinius's VM is written in C. It might be 100% "Ruby in Ruby" some day, but for now it's not.

Brian Ford posted further about the "Ruby in Ruby" meme, arguing rightly that "Ruby in Ruby" is an ideal we should all be striving for, and that ideal should never die. I agree wholeheartedly on that point. The meme I referred to is the growing idea that Rubinius is automatically going to be a better implementation than all others simply because it's written in Ruby. So far that hasn't been the case. And that meme has also been used as a club, claiming other implementations (even Matz's own implementation) are not for Ruby programmers as much as Rubinius is.

Rubinius is, and always has been, a great project and a great idea. I talk with Evan and Brian and all the others on a daily basis, I contribute specs whenever I find gaps or fix bugs in JRuby, and I secretly harbor a desire to implement a JRuby/JVM backend for the Rubinius kernel. I'm sure we'll see great things from Rubinius in the future.

IronRuby

I've had a love/hate relationship with Microsoft over the years. Recently, it's been more "I love to hate them", but there are some shining stars over there. And IronRuby is certainly one of them.

Microsoft's IronRuby project (or perhaps "Microsoft IronRuby") is the current most-viable .NET-based Ruby implementation. It is led by John Lam, formerly of RubyCLR fame, and a small team of folks in Microsoft's languages group. IronRuby really has its roots in the Ruby.NET project from Queensland University of Technology, and like JRuby the real seed for both projects was the implementation of a Ruby 1.8-compatible parser. IronRuby is based on Microsoft's Dynamic Language Runtime, a collection of libraries to make dynamic languages easier to implement and to work around the performance constraints of CLR's strong preference for static-typed languages. In recent months it's probably safe to say that IronRuby has been driving DLR work, since by my estimation it represents the most difficult-to-implement dynamic language Microsoft is currently working on, and IronPython is mostly done, other than ongoing performance work.

I call IronRuby a shining star not because of the implementation, which is fairly mundane, or because of the DLR, which is perhaps clever in places but certainly "just plain necessary" for CLR dynlang performance in others. IronRuby is a shining star because it's the first Microsoft project under the Microsoft Permissive License, a "truly OSS" license approved by OSI. It represents the first project at Microsoft that I've thought gives the company any real hope for the future, because John Lam and company could truly show the advantage of more openness and closer community cooperation. So the OSS thing is certainly part of the promise of IronRuby, but it's not really a Ruby thing.

The main promise of IronRuby is a compatible, performant implementation of Ruby that runs on the CLR (and by extension, runs well on Windows). IronRuby currently mostly uses normal CLR types for all the core classes, building Ruby's String on CLR's StringBuilder, Ruby's Array on CLR's ArrayList, and so on. They've also made Ruby objects and CLR objects largely indistinguishable from one another as far as call dispatch goes, where in JRuby all non-Ruby Java objects entering the system must be wrapped with a Ruby-aware dispatcher (to be remedied in JRuby 2.0ish, as mentioned above). Of course IronRuby boasts advantages similar to JRuby, since it can leverage the CLR garbage collector, memory model, performance. And of course, having Microsoft backing your project should count for something.

IronRuby could also provide a Rails deployment option that Windows folks will actually want to use. Windows support in Ruby has always lagged behind UNIX support, partially because Windows "sucks" for various definitions of "sucks", and partially because most Rubyists don't use Windows. The ones that do use Windows have often felt abandoned, leading to projects like Daniel Berger's Ruby fork Sapphire, which counts among its features "Better support for MS Windows". IronRuby on .NET on Windows would in theory integrate very well with other Windows/.NET properties like IIS, ADO (or whatever it's called now), and Microsoft's new MVC layer. So for Windows users, IronRuby ought to be a big win, and they're understandably excited about it. Set up a tweetscan for "ironruby" and you'll see what I mean...there are nearly as many anticipatory tweets about IronRuby as there are practical tweets about JRuby.

But there's some peril here too. IronRuby is largely still being developed in a vacuum. Perhaps in order to have secrets to announce at "the next big conference" or perhaps because Microsoft's own policies require it, IronRuby's development process proceeds largely from all-internal commits, all-internal discussions, and all-internal emails that periodically result in a blob of code tossed over the fence to external contributors. The OSS story has improved, since those of us on the outside can actually get access to the code, but the necessary two-way street still isn't there. That's going to slow progress, and eventually could make it impossible for IronRuby to keep up as resources are moved to other projects at Microsoft. JRuby has managed to sustain for as long as it has with only two fulltime developers entirely because of our community and openness, and indeed JRuby would never have been possible without a fully OSS process.

IronRuby is also going to have trouble running Rails in its current form. Rails 2.x is still hindered by its inability to process concurrent requests in parallel on a single process. Because of various thread-unsafeties in the Rails libraries, concurrent requests must be shunted off to separate processes, or in the case of JRuby to separate JRuby instances in the same JVM process. There is work underway to improve this, some of it through GSOC, but it's still going to be a while before you can run your entire app on a single process with any of the C implementations. Even then, if you want to run many Rails apps, you'll still need multiple process with Ruby 1.8. And this is where IronRuby is going to get burned.

As I understand it, currently there is no way to provide multiple isolated execution environments in IronRuby as you can in JRuby or Rubinius. The reasons for this are beyond me, but many language implementations on top of a general-purpose VM avoid the complexity of "multiple runtimes" to better integrate with the rest of the system. JRuby's MVM support, for example, makes serializing Ruby objects as though they were normal Java objects nearly impossible, because upon deserialization there's no way to know which JRuby instance to attach the object to. In IronRuby's case, any inability to run multiple environments in the same process will mean IronRuby users must launch multiple processes to run Rails, just like the Ruby 1.8 and 1.9 users do. And that would be, in my opinion, an unacceptable state of affairs.

I also believe that the IronRuby team does not yet understand the scope of what's necessary to run Rails. John Lam has been quoted at several events saying they hope to run a "hello world" Rails app at RailsConf, but IronRuby can't run IRB, RubyGems, or Rake yet. John has also been tweeting periodic updates on IronRuby's spec-passing rate, even though Rubinius passes most of those specs and still can't run Rails (and JRuby passes more than Rubinius along with another 48000 assertions in our own test suite, only some of which have equivalents in the specs). As far as time spent on implementation, IronRuby really only has about a year of progress in, since Silverlight integration, demos, and presentations often pull them away for weeks at a time.

There's also a final peril the IronRuby will have to deal with: Microsoft would never back an OSS web framework like Rails in preference to its own. John Lam has repeatedly said that IronRuby will run Rails, and I believe him. But that goal is almost certainly not a Microsoft priority, since they have their own proprietary technologies to push. If John's able to do it, it will have to come from his small team and community contributors, leading back to the OSS peril above.

Be that as it may, an implementation of Ruby for the CLR is certainly going to happen, and I believe it's necessary for the Ruby ecosystem to survive for there to be a CLR Ruby. IronRuby is going to be that project, and it's already driving competition in the Ruby implementation world. John Lam and I talk at conferences, exchange tweets and emails, and I've been building and running IronRuby occasionally to check on their progress. I also know the IronRuby team realizes they could be more open and probably wants to be more open. And perhaps if they read this article they'll start to realize there's a lot more work involved in reaching the Rails singularity than running specs and having a working String implementation. There's pain involved, and they've not yet begun to feel it.

MacRuby

Ruby 1.9 fixes some of MatzRuby's issues, but not all of them. Though it brings a much-faster bytecode VM, improved method-dispatch cost, and a number of other execution-related performance tweaks, it does not solve problems with Ruby's memory model and garbage collector. And partially for this reason Apple's Laurent Sansonetti has been working on MacRuby, a forked rework of Ruby 1.9 targeting the Objective C runtime.

Laurent is famous for his past work on RubyCocoa, bindings to allow Ruby to deliver beautiful, top-notch UIs on OS X. I don't know how long he's been working on MacRuby, since some of its life was spent in secret, but it's been open-source for a few months now. Currently Laurent has been working on converting the core classes from C implementations over to using ObjC equivalents. Part of the goal of MacRuby is to provide a Ruby that can interact directly with ObjC: Ruby objects are ObjC objects and vice versa; Ruby can call methods on ObjC objects and vice-versa. So in this sense, MacRuby is perhaps more similar to JRuby or IronRuby than to Rubinius or the MatzRuby lineage. It is an implementation of Ruby for a general-purpose runtime.

MacRuby promises one thing for certain: tight integration with ObjC and by extension with much of OS X. Because ObjC is the language of choice for development on OS X, MacRuby users will certainly have the cleanest, tightest integration with OS libraries and primitives, far better than any other implementation can provide.

There's also a chance that ObjC's core classes (which essentially are repurposed as MacRuby's core classes) will have better peformance characteristics than the hand-written C impls in MatzRuby. Because MacRuby is based on Ruby 1.9, it shares Ruby 1.9's bytecode-based execution engine. This means that most performance gains will come from better core class implementations. Already Laurent has been tweeting some impressive microbenchmark numbers showing e.g. Array performance can be substantially better than Ruby 1.9. And there are performance gains to be had calling ObjC code, of course, since dispatch is now essentially ObjC dispatch directly rather than passing through Ruby's own dispatch logic.

MacRuby may also eventually represent a better way to run Rails on OS X. Because of YARV, it should have good performance. Because of ObjC, it should have a good memory model. And because it's "MacRuby" it should fit well into the rest of the system, likely leading to a simpler, better-integrated deployment story for Rails.

The biggest peril for MacRuby is pretty obvious: Why Ruby 1.9? Ruby 1.9 is still under active development, and APIs are being added and tweaked as we speak. Laurent's fork is going to get further and further away, even if he's able to keep portions up-to-date; that's going to make compatibility a serious challenge. Choosing Ruby 1.9 certainly makes sense from a performance perspective, but many Ruby apps out there don't run on it, and most developers aren't targeting it because it's still a moving target. Even on JRuby, where we've got prototype implementations of both YARV's and Rubinius's bytecode engines, we've opted not to hit 1.9 features hard yet. Ruby 1.9 is in progress, and that will mean a lot more effort required to keep MacRuby up to date and to make it an attractive option.

There's also a chance that Ruby implemented on top of Objective C isn't going to perform that much better, on the whole, than Ruby 1.9's all-C approach. While some of Laurent's benchmarks have been impressively faster than Ruby 1.9, many others have been equally slower. And all benchmarks I've run, which are a little less "micro", have been slower on MacRuby than either Ruby 1.9 or MatzRuby. That's not very promising.

Regardless of the peril, MacRuby seems like a great idea. Objective C is a solid runtime, and the fact that it's so heavily used on OS X means MacRuby will have an excellent integration story. MacRuby may not add much value in the runtime/performance/execution department, but will potentially teach us all lessons about integrating with a general-purpose runtime. And of course MacRuby may eventually be shipped with OS X, making Ruby a first-class language for writing Mac apps. That alone is probably worth it.

Fading Implementations

It's worth spending a few words on the "no longer viable" implementations here as well.

XRuby, product of Xue Yong Zhi and a few others, was the first Ruby implementation to have a full JVM bytecode compiler. Performance early on looked very good, but the lack of a compatible set of core classes and resource limitations caused its development to lag terribly. The most recent release was 0.3.3 on March 24, and since then there have been only two commits. Xue has admitted he has no time to work on the project, and without a heavy, long-term time investment from someone XRuby is likely to fade away.

Ruby.NET is a similar story. Started off a research grant from Microsoft at Queensland University of Technology, Ruby.NET is the product of John Gough and Wayne Kelly. The project was a proof-of-concept for Ruby on CLR, to show it could be done and work through some of the early challenges in getting there. It was officially made into an open source project last year, and for a while there were many interested contributors. But although the Ruby.NET parser lives on in IronRuby, the project has largely ground to a halt since Wayne officially threw his support behind Microsoft's project. There have been two commits in the past month, and the last two really active mailing list threads were titled "The future of Ruby.NET" and "Has IronRuby killed off Ruby.NET".

And Cardinal is still pining for the fjords.

New Contenders

I don't know much about these projects, but I'm sure they'll all bring their own flavor to the Ruby soup.

HotRuby is an implementation of Ruby in JavaScript. It implements Ruby 1.9's bytecode engine and all core classes using JavaScript types. Performance looks good on their site, but they can't run anything yet and haven't even begun to discuss compatibility. It may show promise in time, but it's an interesting toy for now.

MagLev from GemStone appears to be something Ruby-related. Not much (anything?) has been publicly said about it. It's mysterious. It has a nice viral front page. Rails may be involved.

IronMonkey is an effort to port IronPython and IronRuby to the Tamarin VM Adobe donated to the Mozilla project. It's being led by Seo Sanghyoen, though from what I hear he hasn't had a lot of time to work on it. Python and Ruby in the browser, cross-platform, without vendor-lock in. Could be interesting.

Final Thoughts

Have you started working on your Ruby implementation yet? All the cool kids are doing it. It's remarkable how many implementations of Ruby are in the works right now. It remains to be seen whether the ecosystem can support such diversity in the long term, but at the very least they're introducing splendid variation. And there's a lot more to do with Ruby in terms of performance, scaling, and "getting things done". Ruby's future is looking bright, in no small part due to the many implementations. How's your favorite language looking?

Update: Vladimir Sizikov has a nice short article on the value of the RubySpecs project, and since it's so important for the future of these alternative implementations I thought it deserved a mention. He also includes links to his RubySpecs quickstart guide and the current RubySpecs overview page. If you haven't contributed to the specs yet, you should feel guilty.

Thursday, July 12, 2007

To Keyword Or Not To Keyword

One of the most attractive aspects of Ruby is the fact that it has relatively few sacred keywords. In most cases, things you'd expect to be keywords are actually methods, and you can wrap or hook their behavior and create amazing potential.

One perfect example of this is require. Because require is just a method, you can define your own version that wraps its behavior. This is exactly how RubyGems does its magic...rather than immediately calling the default require, it can modify load paths based on your installed gems, allowing for a dynamically-expanding load path and the pluggability we've all come to know and love.

But all such keyword-like methods are not so well behaved. Many methods make runtime changes that are otherwise impossible to do from normal Ruby code. Most of these are on Kernel. I propose that several of these methods should actually be keywords.

Update: Evan Phoenix of Rubinius (EngineYard), Wayne Kelly of Ruby.NET (Queensland University), and John Lam of IronRuby (Microsoft) have voiced their agreement on this interesting ruby-core mailing list thread. Have you shared your thoughts?

Justifying Keywords

There's a number of (in my opinion, very strong) justifications for this:

Many Kernel methods manipulate runtime state in ways no other methods can. For example: local_variables requires access to the caller's variable scope; block_given? requires access to the block/iter stacks (in MRI code); eval requires access to just about everything having to do with a call; and there are others, see below.
Because many of these methods manipulate normally-inaccessible runtime state, it is not possible to implement them in Ruby code. Therefore, even if someone wanted to override them (the primary reason for them to be methods) they could not duplicate their behavior in the overridden version. Overriding only destroys their utility.
These methods are exactly the ones that complicate optimizing Ruby in all implementations, including Ruby 1.9, Rubinius, JRuby, Ruby.NET, and others. They confound a compiler's efforts to optimize calls by always leaving open questions about the behavior of a method. Will it need access to a heap-allocated scope? Will it save off a binding or the current call frame? No way to know for sure, since they're methods.

In short, there appears to be no good reason to keep them as methods, and many reasons to make them keywords. What follows is a short list of such methods and why they ought to be keywords:

*eval - requires implicit access to the caller's binding
block_given?/iterator? - requires access to block/iter information
local_variables - requires access to caller's scope
public/private/protected - requires access to current frame's visibility

There may be others, but these are definitely the biggest offenders. The three points above were used to compile this list, but my criteria for a keyword could be the following more straightforward points. A feature should be implemented (or converted to) a keyword if it fits either of the following criteria:

It manipulates runtime state in ways impossible from user-created code
It can't be implemented in user-created code, and therefore could not reasonably be overridden or hooked to provide additional behavior

As an alternative, if modifications could be made to ensure these methods were not overridable, Ruby implementations could safely treat them as keywords; searching for calls to "eval" in a given context would be guaranteed to mean an eval would take place in that context.

What do we gain from doing all this?

I can at least give a JRuby perspective. I expect others can give their perspectives.

In JRuby, we could greatly optimize method invocations if, for example, we knew we could just use Java's local variables (on Java's stack) rather than always heap-allocating a scoping structure. We could also avoid allocating a frame or binding when they are not needed, just allowing Java's call frame to be "enough" for us. We can already detect if there are closures in a given context, which helps us learn that a heap-allocated scope will be necessary, but we can't safely detect eval, block_given?, etc. As a result of these methods-that-would-be-keywords, we're forced to set up and tear down every method in the most expensive manner.

Other implementations&emdash;including Ruby 1.9/2.0 and Rubinius&emdash;would probably be able to make similar optimizations if we could calculate ahead of time whether these keyword operations would occur.

For what it's worth, I may go ahead and implement JRuby's compiler to treat these methods as keywords, only falling back on the "method" behavior when we detect in the rest of the system that the keyword has been overridden. But that situation is far from ideal...we'd like to see all implementations adopt this behavior and so benefit equally.

As an example, here's an early demonstration of the performance change in our old friend fib() when we can know ahead of time if any of these keywords are called (fib calls none of them). This example shows the performance today and the performance when we can safely just use Java local variables and scoping constructs. We could additionally omit heap-allocated frames for each call, giving a further boost.

I've included Ruby 1.8.6 to provide a reference value.

Current JRuby:

~ $ jruby -J-server bench_fib_recursive.rb
1.323000   0.000000   1.323000 (  1.323000)
1.118000   0.000000   1.118000 (  1.119000)
1.055000   0.000000   1.055000 (  1.056000)
1.054000   0.000000   1.054000 (  1.054000)
1.055000   0.000000   1.055000 (  1.054000)
1.055000   0.000000   1.055000 (  1.055000)
1.055000   0.000000   1.055000 (  1.055000)
1.049000   0.000000   1.049000 (  1.049000)

~ $ jruby -J-server bench_method_dispatch_only.rb
Test interpreted: 100k loops calling self's foo 100 times
3.901000   0.000000   3.901000 (  3.901000)
4.468000   0.000000   4.468000 (  4.468000)
2.446000   0.000000   2.446000 (  2.446000)
2.400000   0.000000   2.400000 (  2.400000)
2.423000   0.000000   2.423000 (  2.423000)
2.397000   0.000000   2.397000 (  2.397000)
2.399000   0.000000   2.399000 (  2.399000)
2.401000   0.000000   2.401000 (  2.401000)
2.427000   0.000000   2.427000 (  2.428000)
2.403000   0.000000   2.403000 (  2.403000)

Using Java's local variables instead of a heap-allocated scope:

~ $ jruby -J-server bench_fib_recursive.rb
2.360000   0.000000   2.360000 (  2.360000)
0.818000   0.000000   0.818000 (  0.818000)
0.775000   0.000000   0.775000 (  0.775000)
0.773000   0.000000   0.773000 (  0.773000)
0.799000   0.000000   0.799000 (  0.799000)
0.771000   0.000000   0.771000 (  0.771000)
0.776000   0.000000   0.776000 (  0.776000)
0.770000   0.000000   0.770000 (  0.769000)

~ $ jruby -J-server bench_method_dispatch_only.rb
Test interpreted: 100k loops calling self's foo 100 times
3.100000   0.000000   3.100000 (  3.100000)
3.487000   0.000000   3.487000 (  3.487000)
1.705000   0.000000   1.705000 (  1.706000)
1.684000   0.000000   1.684000 (  1.684000)
1.678000   0.000000   1.678000 (  1.678000)
1.683000   0.000000   1.683000 (  1.683000)
1.679000   0.000000   1.679000 (  1.679000)
1.679000   0.000000   1.679000 (  1.679000)
1.681000   0.000000   1.681000 (  1.681000)
1.679000   0.000000   1.679000 (  1.679000)

Ruby 1.8.6:

~ $ ruby bench_fib_recursive.rb     
1.760000   0.010000   1.770000 (  1.775304)
1.750000   0.000000   1.750000 (  1.770101)
1.760000   0.010000   1.770000 (  1.768833)
1.750000   0.010000   1.760000 (  1.782908)
1.750000   0.010000   1.760000 (  1.774193)
1.750000   0.000000   1.750000 (  1.766951)
1.750000   0.010000   1.760000 (  1.777814)
1.750000   0.010000   1.760000 (  1.782449)

~ $ ruby bench_method_dispatch_only.rb
Test interpreted: 100k loops calling self's foo 100 times
2.240000   0.000000   2.240000 (  2.268611)
2.160000   0.010000   2.170000 (  2.187729)
2.280000   0.010000   2.290000 (  2.292342)
2.210000   0.010000   2.220000 (  2.250331)
2.190000   0.010000   2.200000 (  2.210965)
2.230000   0.000000   2.230000 (  2.260737)
2.240000   0.010000   2.250000 (  2.256210)
2.150000   0.010000   2.160000 (  2.173298)
2.250000   0.010000   2.260000 (  2.271438)
2.160000   0.000000   2.160000 (  2.183670)

What do you think? Is it worth it?

Saturday, March 10, 2007

Ruby on Grails? Why the hell not?

I've spent some time this weekend looking over the Grails codebase. After a truly gigantic checkout (Grails bundles production JARs for all the libraries it integrates) and after swimming through a substantial, single-project sea of source files, I think I'm starting to follow it (though a tour by someone familiar with Grails internals would be nice). But there's something else I've learned in the process...

There's really no reason this couldn't use Ruby just as easily as Groovy.

Let's look at some really dumb numbers. Grails, as it turns out, is actually over 2/3 Java code, unlike Rails which is 100% Ruby. Grails contains around 436 Java source files for about 54kloc of code and 273 Groovy source files for about 23kloc of code. So Java outnumbers Groovy by a 2:1 ratio. And yes, Java's probably more verbose than Groovy in most cases. But my point stands, and I'll back it up with more...

Then there's the libraries. The Grails binary release weighs in at a whopping 26MB compressed and 49MB uncompressed (*wow*). But 31MB of that is third-party libraries completely independent of Groovy, things like Hibernate, XFire, Quartz, Apache Commons, and so on. Almost all of which (or perhaps all of which) are pure Java code. So the ratio is even more heavily toward Java.

These numbers and my exploration of the code base lead me to a surprising conclusion: Grails is not a "Groovy on Rails" by any means. It's a large number of popular, well-established Java libraries stitched together mostly by Java code with a thin layer at the front (20kloc is not much) of Groovy code for end-users to see and work with. And I guarantee you it's not the Groovy part that's most interesting here. Hell, you could write that part in Java if you followed enough "convention over configuration", and you certainly could write it in Ruby.

I decided to do this exploration after realizing there's nothing about Ruby or JRuby that couldn't be used to wire a bunch of Java libraries together--indeed, that's one of the biggest selling points of JRuby, the fact that you can write plain old Ruby code but call Java libraries almost seamlessly (with day-by-day improving integration points and performance numbers). So why not take Grails and do a port to Ruby? Groovy code isn't far off from Ruby in many ways, and everything you can do in Groovy you can do in Ruby (plus more), so the port should be pretty straightforward, right?

Then I find out that Grails is mostly Java code. And instead of being annoyed or repulsed by this revelation, I was intrigued. Why not make Grails work with Ruby as well? And really, why not make it work with any scripting language? Why limit that nicely-integrated back end by forcing everyone to use Groovy, regardless of preference? Have we learned nothing from a decade of "100% Pure Java"?

So I propose the following.

I believe it would be possible to completely isolate Grails' tastier bits from Groovy. I don't claim it would be easy, but Grails is fairly well interfacified, injectificated, and abstractilicious, so making that dividing line clear and bold is certainly possible (and heck, that's the point of dependency injection and component frameworks anyway, isn't it?). Really, I could start implementing a few of those interfaces right now with Ruby code, but that's not the right approach.
Grails' value is not in Groovy, but in the clean integration of established libraries behind the "Convention Over Configuration" mantra. There's nothing specifically Groovy (or Ruby) about that religion, and there's nothing Groovy does in Grails that another JVM language couldn't do just as easily. This is *truth*. And Grails would benefit tremendously by expanding to other languages. Being tied to Groovy alone will hold it back (just as a framework tied to any one language will be held back, Ruby included).
I also believe that if the Grails project doesn't make an official effort in this direction, others are likely to take up that mantle themselves. Though the debates will forever rage about which JVM dynlang should carry us into the future, the same facts that have guided programming for half a century still apply today: No one language will be enough to carry us forward; no languages will survive indefinitely; and the language you'll use in ten years you've probably not even heard of yet. Did anyone expect Java to be the most widely-used, widely-deployed language ten years ago? Well today, it is. It won't always be.
There's also a very interesting angle here for language enthusiasts. Nobody doubts that the Java platform is officially becoming multi-lingual, possibly a complete reversal from years past. But there are going to be growing pains here, and we're already feeling them on some projects at Sun. We have this wonderful platform with all these wonderful languages...and they all do things differently. We have a first stab at official "Scripting" support in JSR-223, but it only defines the integration point from Java to the language in question...only partially defining the integration points for those languages calling back to Java (a tricky problem) and defining *nothing* related to making those languages interoperate with each other. Again, Java becomes the integration point, and the bottleneck (as in "how do you pass a dynamic-typed object from one language to another by sending it through a statically typed language?"). The very exercise of making Grails (or any other framework) work equally well across multiple user-facing language interfaces will require better interoperability and more clearly-defined integration points for dynamic and static languages on the JVM.
The Phobos guys at Sun are already tackling many of these same problems while expanding their JavaScript-mostly Rails-inspired web framework to other JSR-223 languages (notably, Ruby). So anyone taking on the Grails effort would certainly not be alone, and there are some damn smart engineers at Sun interested in tackling these very same problems.

And I guess in the end it's a question of how visionary we all can be. Can the Grails team envision a future for Grails that doesn't require the use of Groovy? Are you (and am I) as a non-Groovy developer willing to help them get there? And in the end, isn't this what thought leaders on a multi-lingual platform are supposed to be doing?

--

So, enough of the inflammatory bull-baiting. I dare you to tell me why I'm wrong here. Even better, I dare you to look into it yourself. 75kloc of code, only a third of it Groovy. Established libraries you're sure to have used at some point. Clean integration, following the principals inspired and proven by Rails.

But fronted with your language of choice.

Make it so.

Tuesday, March 6, 2007

NetBeans 6 Ruby Support Even Better Than Expected

See, here's the deal. I'm not a very advanced IDE user. I used Eclipse for years and JDE before that, and I wouldn't ever have considered myself a "pro" with either. I learned what I needed to get the job done and not a whole lot else. Beyond that I never knew some of the coolest features, and was never so good at convincing people to make a switch. This actually has helped my migration to NetBeans, since it basically does all the things I was used to in Eclipse. But it has been problematic when I try to convey how cool the new NetBeans Ruby support really is.

It's super cool. It does things I've not seen any other IDE or editor do.

However, it's best not to take my advice, and Roumen Strobl has recorded two seriously excellent demonstrations of Rails and Ruby support in NetBeans. They make the sale way better than I could (though I now have many good ideas for how to sell it better in the future).

If you have any interest in Ruby editors or IDEs, you should take the time and watch these, especially the Ruby support.

Two Demos: JRuby on Rails and Advanced Ruby Editing in NetBeans!

Thursday, January 18, 2007

JRuby Compiler: In Trunk and Ready to Play

Times they are a-changing.

I posted previously on JRuby's compiler work. There have been various iterations of the compiler, many purely prototype and never intended to be completed, and a few genuine attempts at evolving toward full Ruby support. However I believe in the recent weeks I've settled on a design that will carry us to the JRuby compiler endgame.

For the past year, we've emphasized correctness over performance nine times out of ten. When we did focus on performance, it was solely on improving JRuby's interpreter speed, in an attempt to match Ruby's performance in this area and because we knew that JRuby could never entirely escape interpretation. Ruby's just too dynamic for that. So while compatibility with Ruby 1.8.x continued to improve by leaps and bounds, our performance was rather poor in comparison.

This past fall, things started to change. Compatibility reached a point where we could finally be confident about our set of regression tests and our understanding of "how Ruby works" across all its weirdest features. As we understood better the design of the C implementation and the quirky intricacies of the language, we started to see a path to enlightenment. We started to realize how we could support Ruby as it exists today while simultaneously evolving JRuby into a more efficient and cleaner design. And so the performance numbers started to change.

From 0.9.0 to 0.9.1, we had a clean doubling of performance across the board. Our favorite benchmark--RDoc generation--was easily twice as fast, and other simpler benchmarks like fib had similar improvements. 0.9.2 was more of a rushed release for JavaPolis, but we had a good 1/4 to 1/3 speedup even then, since the ongoing refactoring removed another large chunk of overhead from JRuby's core runtime.

From 0.9.2 to current trunk, however, has been a different matter entirely.

The first major change is that we've started to seriously alter the way JRuby does dynamic method dispatching. I did some research, read a few papers, and mocked up and benchmarked a few options. What we've settled on for the moment is a combination of STI for the core classes (STI provides a large table mapping methods and classes to actual code) and various forms of inline caching for non-core classes (basically, for pure Ruby classes; though this is yet to be implemented in trunk). STI provides an extremely fast path for dispatch on those hardest-hit methods, since it reduces calling most core methods to two array indexes and a switch, a vast improvement over the hash lookup and multiple layers of abstraction and framing we had before.

We are continuing to expand our use of STI as it is applicable, and I will soon start exploring options for interpreted-mode inline caching (polymorphic, likely, though I need to run a few trials to get numbers balanced right). So fast dynamic dispatching is well on its way, and will improve performance across the board.

Then there's the compiler work. You have no idea how much it's irritated me to hear people talk about JRuby the past year and say "yeah, but it doesn't compile to Java bytecode." This obviously amounts to pure FUD, but beyond that it totally ignores the complexity of the problem: not a single person on this earth has managed to compile Ruby to a general-purpose VM yet. So complaining about our missing compiler is a bit like complaining that we haven't moved mountains. Honestly people, what do you expect?

Of course, there's the flip side of this statement: compiling Ruby is a hard problem, and I like hard problems. For me it's doubly hard, since I've never written a compiler before. But hell, before JRuby I'd never even worked on an interpreter or language implementation before, and that seems to have gone alright. So there it is...Mount Ruby, waiting to be climbed. And climb it I must!

The current compiler design lives in two halves: the AST-walking half; and the code-generation half. I chose to split these two because it make several things easier. For starters, it allows me to abstract all the bytecode generation logic behind a simple interface, an interface that presents coarse-grained operations like invokeDynamic() and retrieveLocalVariable(). The ultimate implementation of those operations can then be modified at will. It also allows us to evolve the AST independently of the compiler backend, even to the point of swapping in a completely different parser and in-memory code representation (like YARV bytecodes) without harming the evolving code generator backend. So this split helps future-proof the compiler work.

The current design also has another advantage: not all of Ruby has to compile for it to be useful. Currently, as the AST walker encounters nodes, if it finds a node it can't deal with it simply raises an exception. Compilation terminates, and the compiler's client can deal with the result as it will. This leads to a really powerful feature of this design: we can install the compiler now as a JIT and as it evolves more and more code will automatically get optimized. So once we're confident that a given node type is 100% compiling correctly, that node will now be eligible for JIT compilation. As an example, here's the output from a gem installation with the current compiler enabled as a JIT (with my logging in place, naturally):

compiled: TarHeader.empty?
compiled: Entry.initialize
compiled: Entry.full_name
compiled: Entry.bytes_read
compiled: Entry.close
compiled: Entry.invalidate
Successfully installed rake, version 0.7.1
Installing ri documentation for rake-0.7.1...
compiled: LeveledNotifier.notify?
compiled: LeveledNotifier.<=>
compiled: RubyLex.getc
compiled: null.debug?
compiled: BufferedReader.ungetc
compiled: Token.set_text
compiled: RubyLex.line_no
compiled: RubyLex.char_no
compiled: BufferedReader.column
compiled: RubyToken.set_token_position
compiled: Token.initialize
compiled: RubyLex.get_read
compiled: RubyLex.getc_of_rests
compiled: BufferedReader.getc_already_read
compiled: BufferedReader.peek
compiled: RubyParser.peek_tk
compiled: TokenStream.add_token
compiled: TokenStream.pop_token
compiled: CodeObject.initialize
compiled: RubyParser.remove_token_listener
compiled: Context.ongoing_visibility=
compiled: PreProcess.initialize
compiled: AttrSpan.[]
compiled: null.wrap
compiled: JavaProxy.to_java_object
compiled: Lines.next
compiled: Line.isBlank?
compiled: Fragment.add_text
compiled: Fragment.initialize
compiled: ToFlow.convert_string
compiled: LineCollection.add
compiled: Entry_.path
compiled: Entry_.directory?
compiled: Entry_.dereference?
compiled: AttrSpan.initialize
compiled: Entry_.prefix
compiled: Entry_.rel
compiled: Entry_.remove
compiled: Lines.rewind
compiled: AnyMethod.<=>
compiled: Description.serialize
compiled: AttributeManager.change_attribute
compiled: AttributeManager.attribute
compiled: ToFlow.annotate
compiled: NamedThing.initialize
compiled: ClassModule.full_name
compiled: Lines.initialize
compiled: Lines.empty?
compiled: LineCollection.normalize
compiled: ToFlow.end_accepting
compiled: Verbatim.add_text
compiled: FalseClass.to_s
compiled: TopLevel.full_name
compiled: Attr.<=>
Installing RDoc documentation for rake-0.7.1...
compiled: Context.add_attribute
compiled: Context.add_require
compiled: Context.add_class
compiled: AbstructNotifier.notify?
compiled: Context.add_module
compiled: LineReader.read
compiled: null.instance
compiled: HtmlMethod.path
compiled: HtmlMethod.aref
compiled: ContextUser.initialize
compiled: HtmlClass.name
compiled: TokenStream.token_stream
compiled: LineReader.initialize
compiled: TemplatePage.write_html_on
compiled: Context.push
compiled: Context.pop
compiled: HtmlMethod.name
compiled: Context.find_local_symbol
compiled: SimpleMarkup.add_special
compiled: TopLevel.find_module_named
compiled: Context.find_enclosing_module_named
compiled: HtmlMethod.<=>
compiled: ToHtml.annotate
compiled: HtmlMethod.visibility
compiled: HtmlMethod.section
compiled: HtmlMethod.document_self
compiled: LineReader.dup
compiled: Lines.unget
compiled: ToHtml.accept_paragraph
compiled: ContextUser.document_self
compiled: ToHtml.accept_heading
compiled: Heading.head_level
compiled: ToHtml.accept_list_start
compiled: ToHtml.accept_list_end
compiled: ToHtml.accept_verbatim
compiled: SimpleMarkup.initialize
compiled: AttributeManager.initialize
compiled: ToHtml.initialize
compiled: ToHtml.end_accepting
compiled: HtmlMethod.singleton
compiled: Context.modules
compiled: Context.classes
compiled: ContextUser.build_include_list
compiled: HtmlMethod.description
compiled: HtmlMethod.parent_name
compiled: HtmlMethod.aliases
compiled: HtmlClass.parent_name
compiled: ContextUser.as_href
compiled: ContextUser.url
compiled: ContextUser.aref_to
compiled: HtmlFile.<=>
compiled: HtmlClass.<=>

You can see from the output that not only are RubyGems methods getting compiled, but so are stdlib methods and our own Java integration methods. And this is with the current compiler, which doesn't support compiling class defs, blocks, case statements, ... Hopefully you get the picture; this bit-by-bit implementation of the compiler allows us to slowly grow our ability to optimize Ruby into Java bytecodes.

So then, how well does it perform? It performs just dandy, when we're able to compile. Witness the following results for a simple recursive fib algorithm running under Ruby 1.8.5 and JRuby trunk with the JIT enabled.

$ ruby test/bench/bench_fib_recursive.rb
12.760000   1.400000  14.160000 ( 14.718925)
12.660000   1.490000  14.150000 ( 14.648681)
$ JAVA_OPTS=-Djruby.jit.enabled=true jruby test/bench/bench_fib_recursive.rb
compiled: Object.fib_ruby
8.780000   0.000000   8.780000 (  8.780000)
7.761000   0.000000   7.761000 (  7.761000)

Yes, that's nearly double the performance of the C implementation of Ruby. And this is absolutely real.

Now JITing is great, and it's obviously carried Java a long ways. The HotSpot JIT is an unbelievable piece of work, and any app that runs a long time is guaranteed to perform better and better as deeper optimizations start to take hold. But We're talking about Ruby here, which starts up at C-program speeds, and runs as fast as it does immediately. So then JRuby needs a way to compete for immediate execution performance, and the most straightforward way to do that is with an ahead-of-time compiler. That compiler is now also available in JRuby trunk.

The name of the command is "jrubyc", and it does just what you'd expect, it outputs a Java class file for your Ruby code. However the mapping from Ruby code to a class file is not as straightforward as you'd expect: a Ruby script may contain many classes or no classes at all, and those classes may be opened and re-opened by the same script or other scripts at runtime. So there's no way to map directly from a Ruby class to a Java class given the strict limitations of Java's class model. But there is a much smaller unit of code that does not change over time, aside from being mercilessly juggled around: methods.

Ruby, in the end, is a creative and sometimes complicated jumble of method "objects", floating from class to class, from module to module, from namespace to namespace. Methods can be renamed, redefined, added and removed, but never can they be directly modified. And so here is where we have our immutable item to compile.

JRuby's compiler takes a given Ruby script and generates the following Java methods out of it: One Java method for the top-level, straight-through execution of the script, including class bodies and "def"s and the like (called "__file__" in the eventual Java class...thanks Ola for the idea), and a Java method for every Ruby method body and closure contained therein, named in such a way as to avoid conflicts. So for the following piece of code:

require 'foo'

def bar
baz { puts "hello" }
end

def baz
yield
end

There would be four Java methods generated: one for the toplevel execution of the script, two for the bar and baz methods, and one for the closure contained within bar. The resulting class file would store these as static methods, so they are accessible from any class or object as necessary, and the toplevel run-through would bind the two Ruby methods to their appropriate names in Ruby-space.

Quite simple, really!

So then an example of the precious, precious JRuby compiler:

$ cat fib_recursive.rb
def fib_ruby(n)
if n < 2
n
else
fib_ruby(n - 2) + fib_ruby(n - 1)
end
end

puts fib_ruby(34)
$ jrubyc fib_recursive.rb
$ ls fib_recursive.*
fib_recursive.class  fib_recursive.rb
$ time java -cp lib/jruby.jar:lib/asm-2.2.2.jar:. fib_recursive
5702887

real    0m8.126s
user    0m7.632s
sys     0m0.208s
$ time ruby fib_recursive.rb
5702887

real    0m14.649s
user    0m12.945s
sys     0m1.480s

Again, about twice as fast as Ruby 1.8.5 for this particular benchmark.

Now I don't want you going off and saying JRuby has a perfect compiler that will double the performance of your Rails apps. That's not true yet. The current compiler covers only about 30% of the possible code constructs in Ruby, and the remaining 60% (Update: 70%...that's what I get for late-night blogging) contains some of the biggest challenges like closures and class definitions. It's sure to be buggy right now, and the JIT isn't even enabled by default, plus it has my nasty logging message burned into it, to discourage any production use.

But it is very real. JRuby has a partial but growing compiler for Ruby to Java bytecode now.

And oh my, look at the time. Tonight I have to finish my visa application for a trip to India, nail down schedules and descriptions for several upcoming talks, and prepare some slides and notes for presentations in the coming weeks. You will see more about the Java compilation and our developing YARV/Ruby 2.0 bytecode support over the next couple months...and you can expect JavaOne to be an interesting time for Ruby on the JVM this year ;)

Friday, January 5, 2007

Ruby Breaks TIOBE Top Ten; Declared Language of the Year

The headline says it all, really!

The TIOBE Programming Community Index measures language popularity based on "the world-wide availability of skilled engineers, courses and third party vendors" using the major search engines. It's not not a terribly scientific way to measure popularity, but I'm not sure anyone has a better index.

Ruby has been moving up every month during 2006 and for the first time has broken the top ten in January 2007. TIOBE also declared it the "Programming Language of 2006", which comes as no surprise to us Rubyists who love the language so much.

Congratulations, Ruby!

Thursday, January 4, 2007

New JRuby Compiler: Progress Updates

I've been cranking away on the new compiler. I'm a bit tired and planning to get some sleep, but I've gotten the following working:

all three kinds of calls
local variables
string, fixnum, array literals
'def' for simple methods and arg lists
closures

Now that last item comes with a big caveat: I have no way to pass closures I create. The code is compiling, basically as a Closure class that you initialize with local variables and invoke. But since block management in JRuby is still heavily dependent on ThreadContext nonsense, there's no easy way to pass it to a given method. So the next step to getting closures to work in the compiler is to start passing them on the call path as a parameter, much like we do for ThreadContext.

I've managed to keep the compiler fairly well isolated between node walking and bytecode generation, though the bytecode generator impl I have currently is getting a little large and cumbersome. It's commented to death, but it's pushing 900 LOC. It needs some heavy refactoring. However, it's behind a fairly straightforward interface, so the node-walking code doesn't ever see the ugliness. I believe it will be much easier to maintain, and it's certainly easier to follow.

In general, things are moving along well. I'm skipping edge cases for some nodes at the moment to get bulk code compiling. There's a potential that as this fills out more and handles compiling more code, it could start to be wired in as a JIT. Since it can fail gracefully if it can't compile an AST, we'd just drop back to interpreted mode in those cases.

So that's it.

...

Ok, ok, here's performance numbers. Twist my arm why don't you.

(best times only)

The new method dispatch benchmark tests 100M calls to a simple no-arg method that returns 'self', in this case Fixnum#to_i. The first part of the test is a control run that just does 100M local variable lookups.

method dispatch, control (var access only):
 interpreted, client VM: 1.433
 interpreted, server VM: 1.429
 ruby 1.8.5: 0.552
 compiled, client VM: 0.093
 compiled, server VM: 0.056

Much better. The compiler handles local var lookups using an array, rather than going through ThreadContext to get a DynamicScope object. Much faster, and HotSpot hits it pretty hard. At worst it takes about 0.223s, so it's faster than Ruby even before HotSpot gets ahold of it. The second part of the test just adds in the method calls.

method dispatch, test (with method calls):
 interpreted, client VM: 5.109
 interpreted, server VM: 3.876
 ruby 1.8.5: 1.294
 compiled, client VM: 3.167
 compiled, server VM: 1.932

Better than interpreted, but slow method lookup and dispatch is still getting in the way. Once we find a single fast way to dynamic dispatch I think this number will improve a lot.

So then, on to the good old fib tests.

recursive fib:
 interpreted, client VM: 6.902
 interpreted, server VM: 5.426
 ruby 1.8.5: 1.696
 compiled, client VM: 3.721
 compiled, server VM: 2.463

Looking a lot better, and showing more improvement over interpreted than the previous version of the compiler. It's not as fast as Ruby, but with the client VM it's under 2x and with the server VM it's in the 1.5x range. Our heavyweight Fixnum and method dispatch issues are to blame for the remaining performance trouble.

iterative fib:
 interpreted, client VM: 17.865
 interpreted, server VM: 13.284
 ruby 1.8.5: 17.317
 compiled, client VM: 17.549
 compiled, server VM: 12.215

Finally the compiler shows some improvement over the interpreted version for this benchmark! Of course this one's been faster than Ruby in server mode for quite a while, and it's more a test of Java's BigInteger support than anything else, but it's a fun one to try.

All the benchmarks are available in test/bench/compiler, and you can just run them directly. If you like, you can open them up and see how to use the compiler yourself; it's pretty easy. I will be continuing to work on this after I get some sleep, but any feedback is welcome.

Wednesday, January 3, 2007

InvokeDynamic: Actually Useful?

Over time I've become less convinced that hotswappable classes would be an absolute requirement for the proposed invokedynamic bytecode to be useful, and more convinced that there's a number of ways a dynamic language like Ruby or Groovy could utilize the new bytecode. This post gives a little background on invokedynamic and attempts to summarize a few ideas off the top of my head.

Many folks, myself included, have long held that the proposed invokedynamic bytecode would only be useful if coupled with hotswappable classes. Hotswapping is the mechanism by which we could alter class structure after definition and have existing instances of the class pick up those changes. It's true this would be required if we were to compile Ruby all the way to bytecode; since Ruby classes are always open, we need the ability to add and remove methods without destroying already-created instances. The argument goes that if invokedynamic requires a dynamically-invoked method to exist on a target receiver's type, then we would only ever be able to invokedynamic against compiled Ruby code if we could continue to alter those types when classes get re-opened.

I do believe that hotswapping would be useful, but it's fraught with many really difficult problems. To begin with, there's Java's security model, whereby a class that's been loaded into the system *can not* be modified in most typical security contexts. The JVM does have the ability to replace existing method definitions at runtime, but that's generally reserved for debugging purposes, and it doesn't allow adding or removing methods. It also does not currently have the ability to wholesale remove and replace a class that has live instances, and it's an open research question to even consider the ramifications of allowing such a thing.

So what are the alternatives? Gilad Bracha proposed having the ability to attach methods dynamically to a given static class at runtime. This would perhaps be similar to the CLR's "dynamic methods". This idea perhaps has more merit...one issue not addressed by hotswappable classes is that even once we compile Ruby to bytecode, it's still dynamic and duck-typed. Would all methods accept Object and return Object? Is that useful? By specifically stating that some methods are dynamic and mutable (in the case of a Ruby class, likely all methods we've compiled), you effectively create the equivalent of hotswapping without breaking existing static types and their security semantics.

But this is all research that could and perhaps should occur outside invokedynamic, and it all may or may not be related. So then, can invokedynamic be useful with these class-structure questions unanswered? What does invokedynamic mean?

To me, invokedynamic means the ability to invoke a method without statically binding to a specific type, and perhaps additionally without specifying static types for the parameter list. For those that don't know, when generating method-call bytecodes for the JVM, you must always provide two things in addition to the method name: the class within which the method you're invoking lives and the precise parameter list of the method you want to call. And there's not much wiggle room there; if you're off on the target type or if the receiver you're calling against has not yet been cast to (or been determined to match) that type, kaboom. If your parameter list doesn't match one on the target type, kaboom. If your parameters haven't been confirmed as being compatible with that signature, kaboom. Perhaps you can see, then, why writing a compiler for the JVM is such a complicated affair.

So there's potential for invokedynamic to make even static compilation easier. Without the need to specify all those types, we can defer that compile-time magic to the VM, if we so choose. We don't have to dig around for the exact signature we want or the exact target type. Given a receiver object, a method name, and a bundle of parameter objects, invokedynamic should "do the right thing."

Now we start to see where this could be useful. Any dynamic language on the JVM is going to be most interesting in the context of the platform's available libraries. Ruby is great on its own, and there's certainly an entire (potentially large) market segment that's interested in JRuby purely as an alternative Ruby runtime. But the larger market, and the more intriguing application of JRuby, is as a language to tie the thousands of available Java libraries together. And that requires calling Java code from Ruby and Ruby code from Java with as little complexity and overhead as possible.

Enter invokedynamic.

Now I've only recently started to see how invokedynamic could really be useful even without dynamic methods or hotswappable classes, so this list is bound to grow. I'd love to have all three features, of course, but here's a few areas that invokedynamic alone would be useful:

Our native implementations of Ruby methods can't really be tied to a specific concrete class, since we have to be able to rewire them at runtime if they're redefined. If invokedynamic came along with a mechanism for doing a Java-based "method_missing", whereby we could intercept dynamic calls to a given object and dispatch in our own way, we could make use of the bytecode without having hot-swappable classes.
It would also aid compilation and code generation. In my work on the prototype compiler, one of the biggest stumbling blocks is making sure I'm binding method calls to the appropriate target type. I must make sure the receiver of a method has been casted to the type I intend to bind to or Java complains about it. If there were a way to just say invokedynamic, omitting the target type, it would make compilation far simpler; and I don't believe HotSpot would have to do any additional work to make it fast, since it already has optimizations under the covers that are fairly type-agnostic.
To a lesser extent, invokedynamic could push the smarts of determining appropriate method signatures onto the VM. I would supply a series of parameters and a method name, and tell the VM to invokedynamic. The VM, in turn, would look at the params and name and select an appropriate method from the receiving object. This is in essence all that's needed for real duck typing to work.

This last item calls out a perhaps surprising area that invokedynamic would be very useful: invoking Java code from a dynamic language.

When calling Java code from Ruby, for example, all we really have to work with are two details: a method name and potentially an arity. We can do some inference based on the actual types of parameters, but there's a lot of magic and a number of heuristics involved. If there were a JVM-native mechanism for calling arbitrary methods on a given object, without having to statically bind to those methods, it would eliminate much of our Java integration layer.

All told, I think invokedynamic would definitely be much more than a PR stunt, as some have claimed. It would eliminate one of the most difficult barriers to generating JVM bytecodes by allowing arbitrary method calls that aren't necessarily bound to specific types. I for one would vote yes, and I plan to throw my weight behind making invokedynamic do everything I need it to do...with or without hotswapping.

Wednesday, December 27, 2006

Holiday Fun: Interpretation, Loading, Dispatching Work

This week (what's left of it) I'm spending on performance again. It's officially a holiday for Sun, so I'm not technically on the clock. I could even sleep the entire week and pick things back up on Tuesday.

Yeah, right.

One of the oddities of working on JRuby while working at Sun stems from the fact that I was working on JRuby for fun for the past two years. Now that this is the full-time job, it's even harder to pull myself away from it. I thought perhaps making JRuby my job might take away some of the attraction. In reality, it's just made a good thing better, since I can spend long hours on the hard problems I would never have tackled before.

So instead of stopping work completely, I'm shifting gears a bit. Instead of the heavy Rails focus we've had over the past month, I'm hitting other "fun" stuff instead. Compilation, interpretation performance, and so on. We know there's still lots of fruit to be plucked from the performance tree, and of course any additional work/research on compilation will help that eventual goal.

Compiler Refactoring Underway

As far as compilation goes, I've started committing a refactored compiler; it separates AST-node-walking from code generation, so the backend could be swapped out with a YARV generator or future compiler revisions. I think that's needed to allow the compiler and the AST to evolve independently, since I believe there will be many possible compile targets and potentially many AST changes in the future. Nothing too fancy there, but it's a bit more readable so hopefully others can also contribute.

Interpreter Enhancements and Fixes

On the interpretation front, there are a few items I've been working on.

1. Speeding up method invocation and block management (committed)

Ruby's AST represents method calls that take a block by putting the block first in the AST. You encounter an "iter node" that points at the "call" with which it is associated. So the typical way you evaluate those nodes is to create the block object and then visit the call. Unfortunately, the call itself may lead to other calls with their own blocks, for evaluating the arguments or receiver.

The end result of this node ordering is that every time a method is called, the evaluation of its args and receiver has to juggle the current "block available" status around, so the block doesn't get consumed before it's needed. Because the block gets created early, we have to push it and the "block available" status onto a stack. This is the primary reason the block logic in the interpreter and ThreadContext is so complicated.

An example would help illustrate what's happening. For the following code:

foo("hello") { 1 }

The parser generates the following hierarchy of AST nodes:

IterNode[]        <= this is the block
 NewlineNode[]
  FixnumNode[] 1  <= this is the Fixnum 1 inside the block
 FCallNode[] foo  <= this is the call to foo
  ArrayNode[]: {StrNode[]}
   StrNode[]"hello"

The node associated with the block is encountered first, so we construct the block then. We move on to the "foo" call, and the block is consumed. No problem, right? However, here's a more complicated example that illustrates the trouble with this AST ordering:

foo { 1 }.bar { 2 }.baz(hello { 3 }) { 4 }

Here things are a bit more interesting. The parser produces the following output:

IterNode[]            <= the "4" block
 NewlineNode[]
  FixnumNode[] 4
 CallNode[] baz
  IterNode[]          <= the "2" block
   NewlineNode[]
    FixnumNode[] 2
  CallNode[] bar
   IterNode[]         <= the "1" block
    NewlineNode[]
     FixnumNode[] 1
   FCallNode[] foo
 ArrayNode[]: {IterNode[]}
  IterNode[]          <= the "3" block
   NewlineNode[]
    FixnumNode[] 3
  FCallNode[] hello

So what actually happens here? First the "4" block is encountered, instantiated, and pushed onto the block stack. Then we proceed to the "baz" call. Unfortunately, the "baz" call has both a receiver and arguments, so we have to hide the "4" block to prevent it being consumed. We proceed to evaluate the receiver for "baz", encountering the "2" block. The "2" block is instantiated and pushed down, and we move on to the "bar" call. The "bar" call has another receiver; we evaluate that, encountering the "1" block and the "foo" call it's associated with. The "foo" call consumes the "1" block and returns a receiver for "bar". The "bar" call consumes the "2" block and returns a receiver for "baz". Now the "baz" call has to evaluate its arguments, so the "3" block is created and consumed by the "hello" call. Finally, with a receiver and args, we can call "baz" and consume the "4" block.

Confused? Me too. Why would you order it this way? Perhaps it's to ease parsing, or perhaps there's some reason I don't know. However, I believe the following ordering is much simpler (and I know it makes interpretation easier):

(extraneous nodes omitted)

CallNode[] baz
 CallNode[] bar       <= the receiver for "baz", a call to "bar"
  FCallNode[] foo     <= the receiver for "bar", a call to "foo"
   IterNode[]         <= the "1" block associated with "foo"
  IterNode[]          <= the "2" block associated with "bar"
 ArrayNode[]          <= args to "baz"
  FCallNode[] hello   <= the call to "hello"
   IterNode[]         <= ...and its "3" block
 IterNode[]           <= finally the "4" block

The advantages here should be obvious. We encounter the blocks in order, so there's no stack juggling involved. Because there's no stack juggling, we don't have to "hide" blocks as we evaluate receivers and arguments. Finally, because we know we'll only encounter blocks for methods that require them, there's no additional overhead for methods that don't need blocks. It's a good change, and I would love to understand why Ruby uses the more complicated AST structure instead of this.

I have already committed a change to reorder the way these AST nodes are handled. The AST itself is unchanged, but the visit to a given block (IterNode) just sets that node into the associated call and proceeds. The calls themselves are now responsible for creating an associated block (if necessary)...*after* receiver and args have been dealt with. This means two things: calls that don't accept blocks don't pay any block-manipulation penalty; and calls with peripheral blocks (for finding args or receivers) don't pay any block-manipulation penalty either. Only the calls that need blocks have to deal with them.

Eventually this will become an AST change, but this short-term fix resolves 90% of the interpreter goofiness right now. This change will also eventually mean the iter stack (and potentially the block stack) disappear too. Huzzah!

2. LoadService fixes, enhancements, and optimizations (committed)

LoadService cleanup and improvements are well under way. LoadService is responsible for "load" and "require" calls and does all the searching for files and management of loaded extensions and libraries. Unfortunately the existing heuristic was both broken and terribly inefficient.

Ruby's 'load' behavior is easy enough...just look for the exact file and execute it in the current runtime. Ruby's 'require' however has a bit more magic to it.

If you specify a full filename to 'require' it will use that filename to load either a source file or an extension, depending on whether you specify ".rb" or ".[so|o|dll|etc..]". If you do not specify an extension, it will search for .rb, .so, etc in turn until it finds something, If it finds nothing, that's a load error. Here's the "ri" doc for MRI's "require":

--------------------------------------------------------- Kernel#require
  require(string)    => true or false
------------------------------------------------------------------------
  Ruby tries to load the library named _string_, returning +true+ if
  successful. If the filename does not resolve to an absolute path,
  it will be searched for in the directories listed in +$:+. If the
  file has the extension ``.rb'', it is loaded as a source file; if
  the extension is ``.so'', ``.o'', or ``.dll'', or whatever the
  default shared library extension is on the current platform, Ruby
  loads the shared library as a Ruby extension. Otherwise, Ruby tries
  adding ``.rb'', ``.so'', and so on to the name. The name of the
  loaded feature is added to the array in +$"+. A feature will not be
  loaded if it's name already appears in +$"+. However, the file name
  is not converted to an absolute path, so that ``+require
  'a';require './a'+'' will load +a.rb+ twice.

There's also the issue of what paths to search. Ruby searches the current directory first, followed by custom load paths, site_ruby dirs, and ruby/1.8 dirs. This allows a number of mechanisms for overriding more general locations with more specific ones for particular uses.

JRuby adds a new wrinkle here: classloader resources. JRuby supports JARing up source files and loading them through Java's classloader mechanisms. This is how the JRuby applet and the "complete" JRuby JAR work: they simply include all Ruby source into the archive. So then for us the classloader/classpath represents an additional path to be searched.

The primary problem with JRuby's load heuristic is that it ended up searching the classloader far too frequently; and in many cases it searched it multiple times for files that could not exist, such as for complete absolute paths (starting with '/', which *never* works for classloader resources). A second problem with the heuristic is that it would try all locations for all extensions, so for example it would search for xxx.rb everywhere possible, then xxx.rb.ast.ser (our serialized AST format) everywhere possible, then xxx.so (used internally for extensions...though that may change), and so on. The result of this is that any extensions or serialized scripts went through the full monty of searches before being found on a second or third pass.

These two issues were compounded when running JRuby with a very large classpath. Because classloader resources can be expensive to search, our loading became linearly slower in relation to the number of JARs (or perhaps the size of jars) included into the JVM. More JARs, slower classloader resource searching, slower startup.

The fix for these issues is twofold: search a given load location for all filename extensions before moving on, and only search the classloader as a last resort for filenames likely to be found there. It's fairly simple to explain, but the LoadService code had been endlessly hacked and rejiggered over the years. The cleanup was 90% of the battle; the fixes were considerably easier.

A final issue with LoadService, which is now fixed, was that it allowed require to include files with no extensions at all. This is not correct Ruby behavior, and so now those files can't be loaded. This actually caused a bug a long time ago where the extension-free "rake" startup script was being loaded before the "rake.rb" library file it tried to locate. The result was an eventual stack overflow as the file tried to continually load itself. Goofy behavior we should never see again.

3. Speeding up dynamic dispatch

I'm also doing some experimental work to speed up dynamic dispatch. Currently, methods are located by name, looking up a callable object out of a big hash on a per-class basis. This works reasonably well, and there are caches to speed the process, but with all the recent performance work this search has started to become the new bottleneck.

The eventual callable objects looked up have another flaw: they make Hotspot optimization more difficult. Because they're behind an ICallable interface, because they have multiple levels of logic as well as pre/post-call setup and teardown, and because there are many different implementations of ICallable, they end up slowing execution down significantly.

Hotspot is really an amazing piece of work. For the vast majority of Java code, it's able to unroll loops, inline invocations, dynamically optimize conditionals and switches, and generally improve the speed of code by a drastic amount. When running in "server" mode, where Hotspot lets code run longer in interpreted mode before optimizing it, even greater improvements can be seen. For example, A fib benchmark--recursive and iterative--under Java 6 client and server VMs:

(best times shown)

client recursive:
8.551000

client iterative:
17.995000

server recursive:
5.008000

server iterative:
13.191000

MRI recursive:
1.670000

MRI iterative:
16.964403

These numbers aren't bad, really. We even beat MRI for this trivial benchmark when we start hitting Bignums heavily (Java's BigInteger implementation is quite a bit faster than Ruby's). However, we can certainly do better.

There are two experimental changes I'be been working on. The first eliminates the pre/post method setup for core libraries when that setup is not necessary. I call it "fast invocation", and it's applicable to a large majority of core class methods. For example, with just fast invocation, the recursive fib numbers above drop to the sub-5s range. This change is perfectly safe, since it's just eliminating interpreter overhead that would otherwise be wasted cycles. You can expect to see it included in JRuby soon.

The second change is the holy grail of dynamic invocation: eliminate, to the greatest extent possible, the overhead of looking up and dispatching to a given method. In short, make it as close to a simple static dispatch as possible. This is where the real speed gains in JRuby will start to show up.

I have some experimental code right now, focused on the fib benchmark, that is both safe and drastically improves performance. It's sub-par code at the moment, but it does produce results like this:

recursive before:
5.008000

recursive after:
3.864000

Now of course, this is still interpreted. The same change when applied to my experimental Ruby compiler produces a much more drastic effect:

compiled recursive after:
1.550100

Now we start to see the value of eliminating dynamic-dispatch overhead. This is actually *faster* than Ruby's recursive fib, a feat that hasn't been accomplished by JRuby at any time in the past.

The trick to this is fairly simple. For common core methods which are known to be simple Java code, such as for Fixnum's +, -, and < implementations, I provide integer IDs. Within the Fixnum implementation there's a new implementation of "callMethod", our dynamic-dispatcher, which switches on these IDs. For methods it knows, such as the aforementioned +, -, and <, it dispatches directly to op_plus, op_minus, or op_lt, the Java implementations. This skips the lookup phase, the ICallable implementation, the ThreadContext manipulation, and the pre/post-method setup code completely. It's also perfectly safe, again, because all those pieces only waste cycles for simple methods like this.

Now one problem with a simple approach like this is that if you redefine Fixnum#+, that change won't be picked up. The simple Fixnum callMethod won't ever try a method search for methods it knows can be fast-dispatched. I resolved this in my experimental code by adding a "clean" flag to the Fixnum class. If any any point after its initial definition the Fixnum class becomes "dirty", e.g. if you add or redefine a method, the old, slow dispatch will come back into play. My simple version is too coarse-grained, killing fast dispatching for all methods if any of them are changed, but the principal is sound. I'm going to be exploring this the rest of the week, trying to find a more complete solution...but some good things are around the corner.

--

All told, it's been a productive few days since JavaPolis. I'm hoping to write up a JavaPolis recap soon, but I'm keen to use this holiday time to get some cool stuff done. You'll hear more after the first of the year...hopefully with committed code and additional benchmarks against JRuby trunk :)