Showing posts with label jruby. Show all posts
Showing posts with label jruby. Show all posts

Saturday, January 5, 2013

Constant and Global Optimization in JRuby 1.7.1 and 1.7.2

With every JRuby release, there's always at least a handful of optimizations. They range from tiny improvements in the compiler to perf-aware rewrites of core class methods, but they're almost always driven by real-world cases.

In JRuby 1.7.1 and 1.7.2, I made several improvements to the performance of Ruby constants and global variables that might be of some interest to you, dear reader.

Constants

In Ruby, a constant is a lexically and hierarchically accessed variable that starts with a capital letter. Class and module names like Object, Kernel, String, are all constants defined under the Object class. When I say constants are both lexical and hierarchically accessed, what I mean is that at access time we first search outward through lexically-enclosing scopes, and failing that we search through the class hierarchy of the innermost scope. For example:


Here, the first two constant accesses inside class B are successful; the first (IN_FOO) is located lexically in Foo, because it encloses the body of class B. The second (IN_A) is located hierarchically by searching B's ancestors. The third access fails, because the IN_BAR constant is only available within the Bar module's scope, so B can't see it.

Constants also...aren't. It is possible to redefine a constant, or define new constants deeper in a lexical or hierarchical strcture that mask earlier ones. However in most code (i.e. "good" code) constants eventually stabilize. This makes it possible to perform a variety of optimizations against them, even though they're not necessarily static.

Constants are used heavily throughout Ruby, both for constant values like Float::MAX and for classes like Array or Hash. It is therefore especially important that they be as fast as possible.

Global Variables

Globals in Ruby are about like you'd expect...name/value pairs in a global namespace. They start with  $ character. Several global variables are "special" and exist in a more localized source, like $~ (last regular expression match in this call frame), $! (last exception raised in this thread), and so on. Use of these "local globals" mostly just amounts to special variable names that are always available; they're not really true global variables.

Everyone knows global variables should be discouraged, but that's largely referring to global variable use in normal program flow. Using global state across your application – potentially across threads – is a pretty nasty thing to do to yourself and your coworkers. But there are some valid uses of globals, like for logging state and levels, debugging flags, and truly global constructs like standard IO.


Here, we're using the global $DEBUG to specify whether logging should occur in MyApp#log. Those log messages are written to the stderr stream accessed via $stderr. Note also that $DEBUG can be set to true by passing -d at the JRuby command line.

Optimizing Constant Access (pre-1.7.1)

I've posted in the past about how JRuby optimizes constant access, so I'll just quickly review that here.

At a given access point, constant values are looked up from the current lexical scope and cached. Because constants can be modified, or new constants can be introduce that mask earlier ones, the JRuby runtime (org.jruby.Ruby) holds a global constant invalidator checked on each access to ensure the previous value is still valid.

On non-invokedynamic JVMs, verifying the cache involves an object identity comparison every time, which means a non-final value must be accessed via a couple levels of indirection. This adds a certain amount of overhead to constant access, and also makes it impossible for the JVM to fold multiple constant accesses away, or make static decisions based on a constant's value.

On an invokedynamic JVM, the cache verification is in the form of a SwitchPoint. SwitchPoint is a type of on/off guard used at invokedynamic call sites to represent a hard failure. Because it can only be switched off, the JVM is able to optimize the SwitchPoint logic down to what's called a "safe point", a very inexpensive ping back into the VM. As a result, constant accesses under invokedynamic can be folded away, and repeat access or unused accesses are not made at all.

However, there's a problem. In JRuby 1.7.0 and earlier, the only way we could access the current lexical scope (in a StaticScope object) was via the current call frame's DynamicScope, a heap-based object created on each activation of a given body of code. In order to reduce the performance hit to methods containing constants, we introduced a one-time DynamicScope called the "dummy scope", attached to the lexical scope and only created once. This avoided the huge hit of constructing a DynamicScope for every call, but caused constant-containing methods to be considerably slower than those without constants.

Lifting Lexical Scope Into Code

In JRuby 1.7.1, I decided to finally bite the bullet and make the lexical scope available to all method bodies, without requiring a DynamicScope intermediate. This was a nontrivial piece of work that took several days to get right, so although most of the work occurred before JRuby 1.7.0 was released, we opted to let it bake a bit before release.

The changes made it possible for all class, module, method, and block bodies to access their lexical scope essentially for free. It also helped us finally deliver on the promise of truly free constant access when running under invokedynamic.

So, does it work?


Assuming constant access is free, the three loops here should perform identically. The non-expression calls to foo and bar should disappear, since they both return a constant value that's never used. The calls for decrementing the 'a' variable should produce a constant value '1' and perform the same as the literal decrement in the control loop.

Here's Ruby (MRI) 2.0.0 performance on this benchmark.


The method call itself adds a significant amount of overhead here, and the constant access adds another 50% of that overhead. Ruby 2.0.0 has done a lot of work on performance, but the cost of invoking Ruby methods and accessing constants remains high, and constant accesses do not fold away as you would like.

Here's JRuby 1.7.2 performance on the same benchmark.


We obviously run all cases significantly faster than Ruby 2.0.0, but the important detail is that the method call adds only about 11% overhead to the control case, and constant access adds almost nothing.

For comparison, here's JRuby 1.7.0, which did not have free access to lexical scopes.


So by avoiding the intermediate DynamicScope, methods containing constant accesses are somewhere around 7x faster than before. Not bad.

Optimizing Global Variables

Because global variables have a much simpler structure than constants, they're pretty easy to optimize. I had not done so up to JRuby 1.7.1 mostly because I didn't see a compelling use case and didn't want to encourage their use. However, after Tony Arcieri pointed out that invokedynamic-optimized global variables could be used to add logging and profiling to an application with zero impact when disabled, I was convinced. Let's look at the example from above again.


In this example, we would ideally like there to be no overhead at all when $DEBUG is untrue, so we're free to add optional logging throughout the application with no penalty. In order to support this, two improvements were needed.

First, I modified our invokedynamic logic to cache global variables using a per-variable SwitchPoint. This makes access to mostly-static global variables as free as constant access, with the same performance improvements.

Second, I added some smarts into the compiler for conditional forms like "if $DEBUG" that would avoid re-checking the $DEBUG value at all if it were false the first time (and start checking it again if it were modified).

It's worth noting I also made this second optimization for constants; code like "if DEBUG_ENABLED" will also have the same performance characteristics.

Let's see how it performs.


In this case, we should again expect that all three forms have identical performance. Both the constant and the global resolve to an untrue value, so they should ideally not introduce any overhead compared to the bare method.

Here's Ruby (MRI) 2.0.0:


Both the global and the constant add overhead here in the neighborhood of 25% over an empty method. This means you can't freely add globally-conditional logic to your application without accepting a performance hit.

JRuby 1.7.2:


Again we see JRuby + invokedynamic optimizing method calls considerably better than MRI, but additionally we see that the untrue global conditions add no overhead compared to the empty method. You can freely use globals as conditions for logging, profiling, and other code you'd like to have disabled most of the time.

And finally, JRuby 1.7.1, which optimized constants, did not optimize globals, and did not have specialized conditional logic for either:

Where Do We Go From Here?

Hopefully I've helped show that we're really just seeing the tip of the iceberg as far as optimizing JRuby using invokedynamic. More than anything we want you to report real-world use cases that could benefit from additional optimization, so we can target our work effectively. And as always, please try out your apps on JRuby, enable JRuby testing in Travis CI, and let us know what we can do to make your JRuby experience better!

Sunday, April 27, 2008

Promise and Peril for Alternative Ruby Impls

My how things have changed in a couple short years.

Two years ago, in 2006, there were essentially two viable Ruby implementations: Matz's Ruby 1.8.x codebase, and JRuby. At the time, JRuby was just barely starting to run Rails. I consider that a sort of "singularity" in the lifetime of an implementation, the inflection point at which it becomes more than a toy. (OT: these days, I consider the ability to run Rails faster than Matz's Ruby a better inflection point, but we've had the Rails thing going for two years). Cardinal (Ruby on Parrot) was mostly dead, or at least on its way to being dead. YARV, eventually to become the Ruby 1.9 VM, was perhaps only half completed and was not yet officially marked to be the next Ruby. Rubinius had not really been started, or at least had not officially been named and could not be considered anywhere near viable. IronRuby was still Wilco Bauer's IronRuby, a doomed codebase and project name eventually to be adopted by Microsoft's later Ruby implementation effort. So there was very little competition, and most people still considered JRuby to be a big joke. Ha ha.

Fast forward to Spring 2008. Ruby 1.8.x has mostly been put in maintenance mode, but remains by far the most widely-deployed Ruby implementation, despite its relatively poor performance. Largely, this is because only Ruby 1.8.x is 100% compatible with Ruby 1.8.x, and because it's already packaged and shipped on a number of OSes, including Rubyist favorite OS X. But the rest of the field has become a lot more muddled. There are now around six implementations that have past the Rails inflection point or will soon, a handful of others likely to fade into obscurity (after contributing their own genetics to the Ruby ecosystem, surely), a few mysterious up-and-comers...and Cardinal is still dead.

Let's review the promise, peril, and status of all the implementations. Note, this is largely a mix of facts and my opinions. Corrections for the facts are welcome. Corrections for the opinions...well...let's take it offline.

Ruby 1.8

Matz's venerable code base (usually called MRI for "Matz's Ruby Interpreter", MatzRuby or Ruby 1.8 in this article) still has a death-grip on the Ruby world. For 99% of Ruby developers, MatzRuby is still the king of the hill. This is in the face of poor relative performance (slower than JRuby and 1.9 for sure, and slower than most of the others for many cases), poor memory management (conservative, non-compacting GC), and many upstart implementations which solve these problems. Why is this?

It's not hard to answer. It's compatibility and status quo. If all my apps run fine on MatzRuby, and MatzRuby is already installed, and I'm satisfied with the performance characteristics of MatzRuby...why would I run anything else? Because MatzRuby is largely not *that bad* for most uses, especially as relates to simple system scripting work, it's unlikely to go away any time soon. And since all but one of the alternative impls is targeting Ruby 1.8 features and compatibility, the still-in-development Ruby 1.9 is not gaining much traction yet (which is probably a good thing).

You should all know the specs and status for MatzRuby. Current official release is 1.8.6 patchlevel 114. There's a 1.8.7 preview 2 out now that backports a whole bunch of features from 1.9 and breaks some compatibility. The jury's still out on whether it will go final as-is. MatzRuby is a simple AST-walking interpreter, with a conservative GC, minimal/cumbersome Unicode support, and a large library of third-party native extensions. MatzRuby is still the gold standard for what Ruby 1.8 "is".

The peril for Ruby 1.8 right now involves keeping that 99% of Ruby users interested in Ruby while 1.9 bakes without breaking compatibility. Ruby 1.8.7 pre1 introduced some Ruby 1.9 features but also broke a bunch of stuff (most notably Rails), and pre2 doesn't pass the specs 1.8.6 does. We had a design meeting last week where it was decided that we folks working on the Ruby specs need to help the Ruby core team get involved, so they can start running the full suite as part of their development process. That's going to happen soon, but until it does 1.8 releases can break basically anything from day today and nobody will know about it.

Beyond compatibility and keeping the masses happy, Ruby 1.8 could use a little performance, scaling, and memory lovin' too. Unfortunately almost all that effort is going toward Ruby 1.9 now, leaving the vast majority of Ruby users stuck on one of the slowest implementations. That's good for us alternative implementers, since it means we're gaining users every day; but it's not good for the MatzRuby lineage because they're losing mindshare. It's hard to deny that the future of Ruby lies with the "excellent" implementations, if not with the "best" ones, and the definition of "excellent" is moving forward every day. Ruby 1.8 is not.

Ruby 1.9

Ruby 1.9 is the merging of the Ruby 1.8 class library and memory model with a large number of new features and a bytecode-based execution engine. It represents the work of Koichi Sasada, who first announced his YARV ("Yet Another Ruby VM") project at RubyConf 2004. YARV took a bit longer than he and many others expected to be completed, but as a result of his tireless efforts it is now the official Ruby 1.9 VM.

Ruby 1.9 introduces many new features, like a character-aware (Unicode and any other encoding) String implementation, Enumerator/Enumerable enhancements, and numerous refinements and additions to the rest of the class library I won't attempt to list here. Ruby 1.9 is defining, in essence, what the rest of the implementations will soon have to implement. For all the debates, most of the additions in Ruby 1.9 have been well-received by the community, though the exposure level is still extremely low.

Ruby 1.9 has, I believe, reached the Rails singularity. With some work over the past few months, Rails has moved closer and closer to running on Ruby 1.9. Last I heard, there was only one bug that needed patching in the 1.9 codebase for Rails trunk to run unmodified. Expect to see an announcement about Ruby 1.9 and Rails at RailsConf next month.

The interesting thing about Ruby 1.9 is that it will mark only the second non-MatzRuby implementation to run Rails, JRuby being the first. This is due in large part to the massive effort required to implement a bytecode VM for Ruby (1.9 was only just released this past December), and to the fact that Ruby 1.9 is still very much a moving target. APIs are being added and refined, optimizations are being tossed about at the VM level, and memory and GC improvements are being considered. So while it's unlikely that anyone will be moving Rails apps to Ruby 1.9 in the near future, Ruby 1.9 is certainly viable...and represents the most-likely future evolution of the Ruby language itself.

What worries me about Ruby 1.9 is that its performance doesn't seem "better enough" to change the future of Ruby. While Koichi's own benchmarks show it's much faster than Ruby 1.8, on general application benchmarks it's usually less than a 50% improvement. Many times, JRuby is able to exceed Ruby 1.9 performance, even without similar optimizations and feature removals. It is certainly a "better" performance story than Ruby 1.8, but is it enough?

Ruby 1.9 also took the first steps toward concurrency by making threads native, but it encumbered them with a giant lock a la Python. That means you still can't get concurrent execution of Ruby code on Ruby 1.9, something JRuby's been able to do all along, and you can't scale Ruby 1.9 any better on wide systems than you could with Ruby 1.8. There's plans to solve this by adding fine-grained locks to most internal data structures, but that's a hard problem to solve on par with JRuby 2.0 challenges I'll talk about in a minute. And without something like the JVM to really optimize that locking, performance will take a hit.

It's also unclear if people really *want* all of what Ruby 1.9 has to offer. Sure, people love the idea of a real encoding-aware String, but response to the rest of what Ruby 1.9 offers--including performance--has been a collective "meh." People are not flocking to Ruby 1.9 in droves, and many are contemplating whether their future Ruby work will simply be a lateral move to one of the 1.8-compatible implementations. And on the JRuby project, we've received almost no requests to implement Ruby 1.9 features, so we've only added a few tiny ones. Whither Ruby 2.0?

JRuby

Ahh, JRuby. How you have changed my life.

JRuby is a Java-based implementation of Ruby, or if you prefer not to speak the word "Java", it's Ruby for the JVM. JRuby was started in 2002 by Jan Arne Petersen, and though it had a couple good years of activity it never really got to a compatible-enough level to run real Ruby applications. Jan Arne moved on at some point and efforts were largely picked up by Thomas Enebo, current co-lead of JRuby. He was especially active when JRuby was being updated from Ruby 1.6 compatibility to Ruby 1.8.4 compatibility, a task which today is largely complete. I joined the project in fall 2004 after attending RubyConf 2004, and at the time I did not know Ruby. Now I know Ruby in a deeper way than I ever really wanted to...but that's a discussion for another day. I wasn't a hugely active contributor until late 2005, when I started working on a new interpreter and refactoring JRuby internals. I presented JRuby for the first time at RubyConf 2005, and then in early 2006 milestones started dropping like flies: IRB ran, then RubyGems, then Rake...and then Rails. We were still dog slow at the time, but we were viable.

JRuby reached the Rails singularity in time for JavaOne 2006, an event that led Sun to hire Tom and me in fall 2006. We demonstrated on stage JRuby running a simple Rails application. It was cobbled together, running under either Tomcat or simply WEBrick at the time, but we had proved it was possible to have an alternative Ruby implementation compatible enough to run Rails. JRuby was no longer a joke.

Over the past two years (man, has it really been two years?) we've essentially rewritten almost all of JRuby a piece at a time. We've been through three interpreters, one prototype compiler and one complete compiler, multiple Regexp engines, and at least two implementations of the key core classes. I wrote the compiler for JRuby during summer 2007, completing it around RailsConf EU 2007. My first compiler! We now run faster than Ruby 1.8 in both interpreted and compiled modes, with interpreted being perhaps 15-20% faster and compiled being at least a few times faster, generally on par with Ruby 1.9. Of course since we're based on the JVM, we share its object model, garbage collector, binary representation. So JRuby is certainly a "mini-VM" but we leave the nasty bits to JVM implementers to handle. Pragmatism, friends, pragmatism.

Perhaps the most notable result of JRuby's existence is that there are now so many Ruby implementations. If we had not shown the promise, many of the others might not have risked the peril. Oh, and we've ended up with a cracker-jack implementation of Ruby on the JVM too...I suppose that's worth a little something.

Perils...always perils. JRuby has managed to surmount most of the perils that await other implementations. And being on the other side of the chasm, I can tell you now it doesn't get easier.

Compatibility is *hard*. I'm not talking a little hard, I'm talking monumentally hard. Ruby is a very flexible, complicated language to implement, and it ships with a number of very flexible, complicated core class implementations. Very little exists in the way of specifications and test kits, so what we've done with JRuby we've done by stitching together every suite we could find. And after all this time, we still have known bugs and weekly reports of minor incompatibilities. I don't think an alternative implementation can ever truly become "compatible" as much as "more compatible". We're certainly the most compatible alternative impl, and even now we've got our hands full fixing bugs. Then there's Ruby 1.9 support, coming up probably in JRuby 1.2ish. Another adventure.

Performance is also hard, but maybe not *hard* hard. JRuby is lucky to run on one of the fastest VMs in existence. The JVM, in its many incarnations, has been so refined and the JVM implementation arena so competitive that we get a lot of performance for free. But by "for free" I mean Java performance. Java's far easier to write and maintain than C, and on the JVM we know we won't pay a performance penalty for not writing C code. But making Ruby fast on the JVM is where it gets tricky. JVMs are optimized for Java and Java-like languages. JRuby has to include all sorts of tricks and subsystems to make the runtime and compiled Ruby code "feel" a bit more like Java to the JVM. This has involved, in many cases, implementing our own "mini-VM" on top of the JVM, with mixed-mode execution (interpreted, then JITed to JVM bytecode), call site caches (to speed method lookup), and code alterations that sometimes improve performance at the cost of LOC and readability. The challenge for us going forward is to continue improving performance without making JRuby a jumbled mess. John Rose's work on the Da Vinci Machine and dynamic invocation for JDK 7 will help us rip a lot of code out, but only a small subset of JRuby users will see the benefits in the near term. So expect to see us spend a lot more time on performance for Java 5/6 compatible JVMs.

The final big peril for us relates to JRuby 2.0's Java integration support. JRuby currently has a split object model, where Ruby types are all "IRubyObject" implementations and the runtime only understands how to deal with "IRubyObject. This means that in order for us to call methods on non-Ruby Java types, we must wrap them with an IRubyObject wrapper. This is partially to attach our meta-object protocol to those objects, but mostly because every bloody method in the system accepts only IRubyObject as a parameter or return type. In order for us to achieve the "last mile" of Java integration, we need to make the entire system accept "Object" and act appropriately; we've been calling this approach "lightweights", since it would enable using normal Java objects for several core classes like Fixnum and Float. Support for "Object" lightweights would then feed into JRuby 2.0's reworked Java integration layer, eliminating most of the overhead (and code) associated with calling Java methods today. It would also fit better into the invokedynamic work. It's a big job I wouldn't expect to be complete until later this fall, and we need to mercilessly write specs and tests for the current behavior to avoid regressing features we support today. But we're going to do it; we've already started.

Rubinius

Evan Phoenix's Rubinius project is an effort to implement Ruby using as much Ruby code as possible. It is not, as professed, "Ruby in Ruby" anymore. Rubinius started out as a 100% Ruby implementation of Ruby that bootstrapped and ran on top of MatzRuby. Over time, though the "Ruby in Ruby" moniker has stuck, Rubinius has become more or less half C and half Ruby. It boasts a stackless bytecode-based VM (compare with Ruby 1.9, which does use the C stack), a "better" generational, compacting garbage collector, and a good bit more Ruby code in the core libraries, making several of the core methods easier to understand, maintain, and implement in the first place.

The promise of Rubinius is pretty large. If it can be made compatible, and made to run fast, it might represent a better Ruby VM than YARV. Because a fair portion of Rubinius is actually implemented in Ruby, being able to run Ruby code fast would mean all code runs faster. And the improved GC would solve some of the scaling issues Ruby 1.8 and Ruby 1.9 will face.

Rubinius also brings some other innovations. The one most likely to see general visibility is Rubinius's Multiple-VM API. JRuby has supported MVM from the beginning, since a JRuby runtime is "just another Java object". But Evan has built simple MVM support in Rubinius and put a pretty nice API on it. That API is the one we're currently looking at improving and making standard for user-land MVM in JRuby and Ruby 1.9. Rubinius has also shown that taking a somewhat more Smalltalk-like approach to Ruby implementation is feasible.

But here be dragons.

In the 1.5 years since Rubinius was officially named and born into the Ruby world, it has not yet met any of these promises. It is not generally faster than Ruby 1.8, though it performs pretty well on some low-level microbenchmarks. It is not implemented in Ruby: the current VM is written in C and the codebase hosts as much C code as it does Ruby code. Evan's work on a C++ rewrite of the VM will make Rubinius the first C++-based Ruby implementation. It has not reached the Rails singularity yet, though they may achieve it for RailsConf (probably in the same cobbled-together state JRuby did at JavaOne 2006...or maybe a bit better). And the second Rails inflection point--running Rails faster than Ruby 1.8--is still far away.

Compatibility is not going to be a problem for Rubinius. They've worked very hard from the beginning to match Ruby behavior, even launching a Ruby specification suite project to officially test that behavior using Ruby 1.8 as the standard. I have no doubt Rubinius will be able to run Rails and most other Ruby apps people throw at it. And despite Evan's frequent cowboy attitude to language compatibility (such as his early refusal to implement left-to-right evaluation ordering, a fatal decision that led to the current VM rework), compatibility is likely to be a simple matter of time and effort, driven by the spec suite and by actual applications, as people start running real code on Rubinius.

Performance is going to be a much harder problem for Rubinius. In order for Rubinius to perform well, method invocation must be extremely fast. Not just faster than Ruby 1.8 or Ruby 1.9, but perhaps an order of magnitude faster than the fastest Ruby implementations. The simple reason for this is that with so much of the core classes implemented in Ruby, Rubinius is doing many times more dynamic invocations than any other implementation. If a given String method represents one or two dynamic calls in JRuby or Ruby 1.8, it may represent twenty in Rubinius...and sometimes more. All that dispatch has a severe cost, and on most benchmarks involving heavily Ruby-based classes Rubinius has absolutely dismal performance--even with call-site optimizations that finally pushed JRuby's performance to Ruby 1.9 levels. A few benchmarks I've run from JRuby's suite must be ratcheted down a couple orders of magnitude to even complete.

And the Rubinius team knows this. Over the past few months, more and more core methods have been reimplemented in C as "primitives", sometimes because they have to be to interact with C-level memory and VM constructs, but frequently for performance reasons. So the "Ruby in Ruby" implementation has evolved away from that ideal rather than towards it, and performance is still not acceptable for most applications. In theory, none of this should be insurmountable. Smalltalk VMs run significantly faster than most Ruby implementations and still implement all or most of the core in Smalltalk. Even the JVM, largely associated with the statically-typed Java language, is essentially an optimized dynamic language VM, and the majority of Java's core is implemented in Java...often behind interfaces and abstractions that require a good dynamic runtime. But these projects have hundreds of man-years behind them, where Rubinius has only a handful of full-time and part-time enthusiastic Rubyists, most with no experience in implementing high-performance language runtimes. And Evan is still primarily responsible for everything at the VM level.

Of course, it would be folly to suggest that the Rubinius team should focus on performance before compatibility. The "Ruby in Ruby" meme needs to die (seriously!), but other than that Rubinius is an extremely promising implementation of Ruby. Its performance is terrible for most apps, but not all that much worse than JRuby's performance was when we reached the Rails singularity ourselves. And its design is going to be easier to evolve than comparable C implementations, assuming that people other than Evan learn to really understand the VM core. I believe the promise of Rubinius is certainly great enough to continue the project, even if the perils are going to present some truly epic challenges for Evan and company to overcome.

Update: The Rubinius team has had a few things to say about this as well.

Evan Phoenix argues that the 50/50 split between C and Ruby really translates into a lot more logic in Ruby, because of course we all know about Ruby's legendary terseness and density. And he's got a good point; even at 50/50 Rubinius is easily "mostly" Ruby. But nobody would claim that the JVM is Java in Java, even though the ratio of C/C++ code to Java code is vastly in Java's favor. Rubinius has a C core...Rubinius's VM is written in C. It might be 100% "Ruby in Ruby" some day, but for now it's not.

Brian Ford posted further about the "Ruby in Ruby" meme, arguing rightly that "Ruby in Ruby" is an ideal we should all be striving for, and that ideal should never die. I agree wholeheartedly on that point. The meme I referred to is the growing idea that Rubinius is automatically going to be a better implementation than all others simply because it's written in Ruby. So far that hasn't been the case. And that meme has also been used as a club, claiming other implementations (even Matz's own implementation) are not for Ruby programmers as much as Rubinius is.

Rubinius is, and always has been, a great project and a great idea. I talk with Evan and Brian and all the others on a daily basis, I contribute specs whenever I find gaps or fix bugs in JRuby, and I secretly harbor a desire to implement a JRuby/JVM backend for the Rubinius kernel. I'm sure we'll see great things from Rubinius in the future.

IronRuby

I've had a love/hate relationship with Microsoft over the years. Recently, it's been more "I love to hate them", but there are some shining stars over there. And IronRuby is certainly one of them.

Microsoft's IronRuby project (or perhaps "Microsoft IronRuby") is the current most-viable .NET-based Ruby implementation. It is led by John Lam, formerly of RubyCLR fame, and a small team of folks in Microsoft's languages group. IronRuby really has its roots in the Ruby.NET project from Queensland University of Technology, and like JRuby the real seed for both projects was the implementation of a Ruby 1.8-compatible parser. IronRuby is based on Microsoft's Dynamic Language Runtime, a collection of libraries to make dynamic languages easier to implement and to work around the performance constraints of CLR's strong preference for static-typed languages. In recent months it's probably safe to say that IronRuby has been driving DLR work, since by my estimation it represents the most difficult-to-implement dynamic language Microsoft is currently working on, and IronPython is mostly done, other than ongoing performance work.

I call IronRuby a shining star not because of the implementation, which is fairly mundane, or because of the DLR, which is perhaps clever in places but certainly "just plain necessary" for CLR dynlang performance in others. IronRuby is a shining star because it's the first Microsoft project under the Microsoft Permissive License, a "truly OSS" license approved by OSI. It represents the first project at Microsoft that I've thought gives the company any real hope for the future, because John Lam and company could truly show the advantage of more openness and closer community cooperation. So the OSS thing is certainly part of the promise of IronRuby, but it's not really a Ruby thing.

The main promise of IronRuby is a compatible, performant implementation of Ruby that runs on the CLR (and by extension, runs well on Windows). IronRuby currently mostly uses normal CLR types for all the core classes, building Ruby's String on CLR's StringBuilder, Ruby's Array on CLR's ArrayList, and so on. They've also made Ruby objects and CLR objects largely indistinguishable from one another as far as call dispatch goes, where in JRuby all non-Ruby Java objects entering the system must be wrapped with a Ruby-aware dispatcher (to be remedied in JRuby 2.0ish, as mentioned above). Of course IronRuby boasts advantages similar to JRuby, since it can leverage the CLR garbage collector, memory model, performance. And of course, having Microsoft backing your project should count for something.

IronRuby could also provide a Rails deployment option that Windows folks will actually want to use. Windows support in Ruby has always lagged behind UNIX support, partially because Windows "sucks" for various definitions of "sucks", and partially because most Rubyists don't use Windows. The ones that do use Windows have often felt abandoned, leading to projects like Daniel Berger's Ruby fork Sapphire, which counts among its features "Better support for MS Windows". IronRuby on .NET on Windows would in theory integrate very well with other Windows/.NET properties like IIS, ADO (or whatever it's called now), and Microsoft's new MVC layer. So for Windows users, IronRuby ought to be a big win, and they're understandably excited about it. Set up a tweetscan for "ironruby" and you'll see what I mean...there are nearly as many anticipatory tweets about IronRuby as there are practical tweets about JRuby.

But there's some peril here too. IronRuby is largely still being developed in a vacuum. Perhaps in order to have secrets to announce at "the next big conference" or perhaps because Microsoft's own policies require it, IronRuby's development process proceeds largely from all-internal commits, all-internal discussions, and all-internal emails that periodically result in a blob of code tossed over the fence to external contributors. The OSS story has improved, since those of us on the outside can actually get access to the code, but the necessary two-way street still isn't there. That's going to slow progress, and eventually could make it impossible for IronRuby to keep up as resources are moved to other projects at Microsoft. JRuby has managed to sustain for as long as it has with only two fulltime developers entirely because of our community and openness, and indeed JRuby would never have been possible without a fully OSS process.

IronRuby is also going to have trouble running Rails in its current form. Rails 2.x is still hindered by its inability to process concurrent requests in parallel on a single process. Because of various thread-unsafeties in the Rails libraries, concurrent requests must be shunted off to separate processes, or in the case of JRuby to separate JRuby instances in the same JVM process. There is work underway to improve this, some of it through GSOC, but it's still going to be a while before you can run your entire app on a single process with any of the C implementations. Even then, if you want to run many Rails apps, you'll still need multiple process with Ruby 1.8. And this is where IronRuby is going to get burned.

As I understand it, currently there is no way to provide multiple isolated execution environments in IronRuby as you can in JRuby or Rubinius. The reasons for this are beyond me, but many language implementations on top of a general-purpose VM avoid the complexity of "multiple runtimes" to better integrate with the rest of the system. JRuby's MVM support, for example, makes serializing Ruby objects as though they were normal Java objects nearly impossible, because upon deserialization there's no way to know which JRuby instance to attach the object to. In IronRuby's case, any inability to run multiple environments in the same process will mean IronRuby users must launch multiple processes to run Rails, just like the Ruby 1.8 and 1.9 users do. And that would be, in my opinion, an unacceptable state of affairs.

I also believe that the IronRuby team does not yet understand the scope of what's necessary to run Rails. John Lam has been quoted at several events saying they hope to run a "hello world" Rails app at RailsConf, but IronRuby can't run IRB, RubyGems, or Rake yet. John has also been tweeting periodic updates on IronRuby's spec-passing rate, even though Rubinius passes most of those specs and still can't run Rails (and JRuby passes more than Rubinius along with another 48000 assertions in our own test suite, only some of which have equivalents in the specs). As far as time spent on implementation, IronRuby really only has about a year of progress in, since Silverlight integration, demos, and presentations often pull them away for weeks at a time.

There's also a final peril the IronRuby will have to deal with: Microsoft would never back an OSS web framework like Rails in preference to its own. John Lam has repeatedly said that IronRuby will run Rails, and I believe him. But that goal is almost certainly not a Microsoft priority, since they have their own proprietary technologies to push. If John's able to do it, it will have to come from his small team and community contributors, leading back to the OSS peril above.

Be that as it may, an implementation of Ruby for the CLR is certainly going to happen, and I believe it's necessary for the Ruby ecosystem to survive for there to be a CLR Ruby. IronRuby is going to be that project, and it's already driving competition in the Ruby implementation world. John Lam and I talk at conferences, exchange tweets and emails, and I've been building and running IronRuby occasionally to check on their progress. I also know the IronRuby team realizes they could be more open and probably wants to be more open. And perhaps if they read this article they'll start to realize there's a lot more work involved in reaching the Rails singularity than running specs and having a working String implementation. There's pain involved, and they've not yet begun to feel it.

MacRuby

Ruby 1.9 fixes some of MatzRuby's issues, but not all of them. Though it brings a much-faster bytecode VM, improved method-dispatch cost, and a number of other execution-related performance tweaks, it does not solve problems with Ruby's memory model and garbage collector. And partially for this reason Apple's Laurent Sansonetti has been working on MacRuby, a forked rework of Ruby 1.9 targeting the Objective C runtime.

Laurent is famous for his past work on RubyCocoa, bindings to allow Ruby to deliver beautiful, top-notch UIs on OS X. I don't know how long he's been working on MacRuby, since some of its life was spent in secret, but it's been open-source for a few months now. Currently Laurent has been working on converting the core classes from C implementations over to using ObjC equivalents. Part of the goal of MacRuby is to provide a Ruby that can interact directly with ObjC: Ruby objects are ObjC objects and vice versa; Ruby can call methods on ObjC objects and vice-versa. So in this sense, MacRuby is perhaps more similar to JRuby or IronRuby than to Rubinius or the MatzRuby lineage. It is an implementation of Ruby for a general-purpose runtime.

MacRuby promises one thing for certain: tight integration with ObjC and by extension with much of OS X. Because ObjC is the language of choice for development on OS X, MacRuby users will certainly have the cleanest, tightest integration with OS libraries and primitives, far better than any other implementation can provide.

There's also a chance that ObjC's core classes (which essentially are repurposed as MacRuby's core classes) will have better peformance characteristics than the hand-written C impls in MatzRuby. Because MacRuby is based on Ruby 1.9, it shares Ruby 1.9's bytecode-based execution engine. This means that most performance gains will come from better core class implementations. Already Laurent has been tweeting some impressive microbenchmark numbers showing e.g. Array performance can be substantially better than Ruby 1.9. And there are performance gains to be had calling ObjC code, of course, since dispatch is now essentially ObjC dispatch directly rather than passing through Ruby's own dispatch logic.

MacRuby may also eventually represent a better way to run Rails on OS X. Because of YARV, it should have good performance. Because of ObjC, it should have a good memory model. And because it's "MacRuby" it should fit well into the rest of the system, likely leading to a simpler, better-integrated deployment story for Rails.

The biggest peril for MacRuby is pretty obvious: Why Ruby 1.9? Ruby 1.9 is still under active development, and APIs are being added and tweaked as we speak. Laurent's fork is going to get further and further away, even if he's able to keep portions up-to-date; that's going to make compatibility a serious challenge. Choosing Ruby 1.9 certainly makes sense from a performance perspective, but many Ruby apps out there don't run on it, and most developers aren't targeting it because it's still a moving target. Even on JRuby, where we've got prototype implementations of both YARV's and Rubinius's bytecode engines, we've opted not to hit 1.9 features hard yet. Ruby 1.9 is in progress, and that will mean a lot more effort required to keep MacRuby up to date and to make it an attractive option.

There's also a chance that Ruby implemented on top of Objective C isn't going to perform that much better, on the whole, than Ruby 1.9's all-C approach. While some of Laurent's benchmarks have been impressively faster than Ruby 1.9, many others have been equally slower. And all benchmarks I've run, which are a little less "micro", have been slower on MacRuby than either Ruby 1.9 or MatzRuby. That's not very promising.

Regardless of the peril, MacRuby seems like a great idea. Objective C is a solid runtime, and the fact that it's so heavily used on OS X means MacRuby will have an excellent integration story. MacRuby may not add much value in the runtime/performance/execution department, but will potentially teach us all lessons about integrating with a general-purpose runtime. And of course MacRuby may eventually be shipped with OS X, making Ruby a first-class language for writing Mac apps. That alone is probably worth it.

Fading Implementations

It's worth spending a few words on the "no longer viable" implementations here as well.

XRuby, product of Xue Yong Zhi and a few others, was the first Ruby implementation to have a full JVM bytecode compiler. Performance early on looked very good, but the lack of a compatible set of core classes and resource limitations caused its development to lag terribly. The most recent release was 0.3.3 on March 24, and since then there have been only two commits. Xue has admitted he has no time to work on the project, and without a heavy, long-term time investment from someone XRuby is likely to fade away.

Ruby.NET is a similar story. Started off a research grant from Microsoft at Queensland University of Technology, Ruby.NET is the product of John Gough and Wayne Kelly. The project was a proof-of-concept for Ruby on CLR, to show it could be done and work through some of the early challenges in getting there. It was officially made into an open source project last year, and for a while there were many interested contributors. But although the Ruby.NET parser lives on in IronRuby, the project has largely ground to a halt since Wayne officially threw his support behind Microsoft's project. There have been two commits in the past month, and the last two really active mailing list threads were titled "The future of Ruby.NET" and "Has IronRuby killed off Ruby.NET".

And Cardinal is still pining for the fjords.

New Contenders

I don't know much about these projects, but I'm sure they'll all bring their own flavor to the Ruby soup.

HotRuby is an implementation of Ruby in JavaScript. It implements Ruby 1.9's bytecode engine and all core classes using JavaScript types. Performance looks good on their site, but they can't run anything yet and haven't even begun to discuss compatibility. It may show promise in time, but it's an interesting toy for now.

MagLev from GemStone appears to be something Ruby-related. Not much (anything?) has been publicly said about it. It's mysterious. It has a nice viral front page. Rails may be involved.

IronMonkey is an effort to port IronPython and IronRuby to the Tamarin VM Adobe donated to the Mozilla project. It's being led by Seo Sanghyoen, though from what I hear he hasn't had a lot of time to work on it. Python and Ruby in the browser, cross-platform, without vendor-lock in. Could be interesting.

Final Thoughts

Have you started working on your Ruby implementation yet? All the cool kids are doing it. It's remarkable how many implementations of Ruby are in the works right now. It remains to be seen whether the ecosystem can support such diversity in the long term, but at the very least they're introducing splendid variation. And there's a lot more to do with Ruby in terms of performance, scaling, and "getting things done". Ruby's future is looking bright, in no small part due to the many implementations. How's your favorite language looking?

Update: Vladimir Sizikov has a nice short article on the value of the RubySpecs project, and since it's so important for the future of these alternative implementations I thought it deserved a mention. He also includes links to his RubySpecs quickstart guide and the current RubySpecs overview page. If you haven't contributed to the specs yet, you should feel guilty.

Tuesday, February 26, 2008

JRuby in Google Summer of Code 2008

Greetings!

Google's Summer of Code for 2008 is starting up again, and we're looking for folks to submit proposals. The JRuby Community or Sun or me or someone will sign up as a mentoring organization, so start thinking about or discussing possible proposals. Here's a few ideas to get you started (and hopefully, there are many other ideas out there not on this list):
  • Writing a whole suite of specs for current Java integration behavior...and then expanding that suite to include behavior we want to add. This work would go hand-in-hand with a rework of Java Integration we're likely to start soon.
  • Collect and help round out all the profiling/debugging/etc tools for JRuby and get them to final releasable states. There's several projects in the works, but most of them are stalled and folks need better debugging. This project could also include simply working with existing IDEs (NetBeans, etc) to figure out how to get them to debug compiled Ruby code correctly (currently they won't step into .rb files).
  • Continue work on an interface-compatible RMagick port. There's already RMagickJR which has a lot of work into it, but nobody's had time to continue it. A working RMagick gem would ease migration for lots of folks using Ruby.
  • Putting together a definitive set of fine and coarse-grained benchmarks for Rails. JRuby on most benchmarks has been faster than Ruby 1.8.6...and yet higher-performance Rails has been elusive. We need better benchmarks and better visibility into core Rails. Bonus work: help nail down what's slower about JRuby.
  • Survey all existing JRuby extensions and put together an official public API based on core JRuby methods they're using. This would help us reduce the hassle of migrating extensions across JRuby versions.
Please, anyone else who has ideas, feel free to post them here or on the mailing list for discussion. And if you have a proposal, go ahead and mail the JRuby dev list directly.

Update: Martin Probst added this idea in the comments:
Another idea might be to "fix" RDoc, whatever that means. That's not really JRuby centric, but still a very worthwhile task, I think.

A documentation system that allows easy doc writing (Wiki-alike) and provides a better view on the actual functionality you can find in a certain instance would be really helpful. Plus a decent search feature.
And that makes me think of another idea:
  • Add RDoc comments to all the JRuby versions of Ruby core methods, and get RI/RDocs generating as part of JRuby distribution builds. Then we could ship RI and have it work correctly. Ola Bini has already started some of the work, creating an RDoc annotation we can add to our Ruby-bound methods.
Update 2: sgwong in the comments suggested implementing the win32ole library for JRuby. This would also be an excellent contribution, since there's already libraries like Jacob to take some of the pain out of it, and it would be great to have it working on JRuby. And again, this makes me think of an few additional options:
  • Implement a full, compatible set of Apple Cocoa bindings for JRuby. You could use JNA, and I believe there's already Cocoa bindings for Java you could reuse as well, but I'm not familiar with them.
  • Complete implementation of the DL library (Ruby's stdlib for programmatic loading and calling native dynamic libraries) and/or Rubinius's FFI (same thing, with a somewhat tidier interface). Here too there's lots of help: I've already partially implemented DL using JNA, and it wouldn't be hard to finish it and/or implement Rubinius's FFI. And implementing Rubinius's FFI would have the added benefit of allowing JRuby to share some of Rubinius's library wrappers.

Thursday, July 12, 2007

To Keyword Or Not To Keyword

One of the most attractive aspects of Ruby is the fact that it has relatively few sacred keywords. In most cases, things you'd expect to be keywords are actually methods, and you can wrap or hook their behavior and create amazing potential.

One perfect example of this is require. Because require is just a method, you can define your own version that wraps its behavior. This is exactly how RubyGems does its magic...rather than immediately calling the default require, it can modify load paths based on your installed gems, allowing for a dynamically-expanding load path and the pluggability we've all come to know and love.

But all such keyword-like methods are not so well behaved. Many methods make runtime changes that are otherwise impossible to do from normal Ruby code. Most of these are on Kernel. I propose that several of these methods should actually be keywords.

Update: Evan Phoenix of Rubinius (EngineYard), Wayne Kelly of Ruby.NET (Queensland University), and John Lam of IronRuby (Microsoft) have voiced their agreement on this interesting ruby-core mailing list thread. Have you shared your thoughts?

Justifying Keywords

There's a number of (in my opinion, very strong) justifications for this:
  1. Many Kernel methods manipulate runtime state in ways no other methods can. For example: local_variables requires access to the caller's variable scope; block_given? requires access to the block/iter stacks (in MRI code); eval requires access to just about everything having to do with a call; and there are others, see below.
  2. Because many of these methods manipulate normally-inaccessible runtime state, it is not possible to implement them in Ruby code. Therefore, even if someone wanted to override them (the primary reason for them to be methods) they could not duplicate their behavior in the overridden version. Overriding only destroys their utility.
  3. These methods are exactly the ones that complicate optimizing Ruby in all implementations, including Ruby 1.9, Rubinius, JRuby, Ruby.NET, and others. They confound a compiler's efforts to optimize calls by always leaving open questions about the behavior of a method. Will it need access to a heap-allocated scope? Will it save off a binding or the current call frame? No way to know for sure, since they're methods.
In short, there appears to be no good reason to keep them as methods, and many reasons to make them keywords. What follows is a short list of such methods and why they ought to be keywords:
  • *eval - requires implicit access to the caller's binding
  • block_given?/iterator? - requires access to block/iter information
  • local_variables - requires access to caller's scope
  • public/private/protected - requires access to current frame's visibility
There may be others, but these are definitely the biggest offenders. The three points above were used to compile this list, but my criteria for a keyword could be the following more straightforward points. A feature should be implemented (or converted to) a keyword if it fits either of the following criteria:
  • It manipulates runtime state in ways impossible from user-created code
  • It can't be implemented in user-created code, and therefore could not reasonably be overridden or hooked to provide additional behavior
As an alternative, if modifications could be made to ensure these methods were not overridable, Ruby implementations could safely treat them as keywords; searching for calls to "eval" in a given context would be guaranteed to mean an eval would take place in that context.

What do we gain from doing all this?

I can at least give a JRuby perspective. I expect others can give their perspectives.

In JRuby, we could greatly optimize method invocations if, for example, we knew we could just use Java's local variables (on Java's stack) rather than always heap-allocating a scoping structure. We could also avoid allocating a frame or binding when they are not needed, just allowing Java's call frame to be "enough" for us. We can already detect if there are closures in a given context, which helps us learn that a heap-allocated scope will be necessary, but we can't safely detect eval, block_given?, etc. As a result of these methods-that-would-be-keywords, we're forced to set up and tear down every method in the most expensive manner.

Other implementations&emdash;including Ruby 1.9/2.0 and Rubinius&emdash;would probably be able to make similar optimizations if we could calculate ahead of time whether these keyword operations would occur.

For what it's worth, I may go ahead and implement JRuby's compiler to treat these methods as keywords, only falling back on the "method" behavior when we detect in the rest of the system that the keyword has been overridden. But that situation is far from ideal...we'd like to see all implementations adopt this behavior and so benefit equally.

As an example, here's an early demonstration of the performance change in our old friend fib() when we can know ahead of time if any of these keywords are called (fib calls none of them). This example shows the performance today and the performance when we can safely just use Java local variables and scoping constructs. We could additionally omit heap-allocated frames for each call, giving a further boost.

I've included Ruby 1.8.6 to provide a reference value.


Current JRuby:
~ $ jruby -J-server bench_fib_recursive.rb
1.323000 0.000000 1.323000 ( 1.323000)
1.118000 0.000000 1.118000 ( 1.119000)
1.055000 0.000000 1.055000 ( 1.056000)
1.054000 0.000000 1.054000 ( 1.054000)
1.055000 0.000000 1.055000 ( 1.054000)
1.055000 0.000000 1.055000 ( 1.055000)
1.055000 0.000000 1.055000 ( 1.055000)
1.049000 0.000000 1.049000 ( 1.049000)

~ $ jruby -J-server bench_method_dispatch_only.rb
Test interpreted: 100k loops calling self's foo 100 times
3.901000 0.000000 3.901000 ( 3.901000)
4.468000 0.000000 4.468000 ( 4.468000)
2.446000 0.000000 2.446000 ( 2.446000)
2.400000 0.000000 2.400000 ( 2.400000)
2.423000 0.000000 2.423000 ( 2.423000)
2.397000 0.000000 2.397000 ( 2.397000)
2.399000 0.000000 2.399000 ( 2.399000)
2.401000 0.000000 2.401000 ( 2.401000)
2.427000 0.000000 2.427000 ( 2.428000)
2.403000 0.000000 2.403000 ( 2.403000)
Using Java's local variables instead of a heap-allocated scope:
~ $ jruby -J-server bench_fib_recursive.rb
2.360000 0.000000 2.360000 ( 2.360000)
0.818000 0.000000 0.818000 ( 0.818000)
0.775000 0.000000 0.775000 ( 0.775000)
0.773000 0.000000 0.773000 ( 0.773000)
0.799000 0.000000 0.799000 ( 0.799000)
0.771000 0.000000 0.771000 ( 0.771000)
0.776000 0.000000 0.776000 ( 0.776000)
0.770000 0.000000 0.770000 ( 0.769000)

~ $ jruby -J-server bench_method_dispatch_only.rb
Test interpreted: 100k loops calling self's foo 100 times
3.100000 0.000000 3.100000 ( 3.100000)
3.487000 0.000000 3.487000 ( 3.487000)
1.705000 0.000000 1.705000 ( 1.706000)
1.684000 0.000000 1.684000 ( 1.684000)
1.678000 0.000000 1.678000 ( 1.678000)
1.683000 0.000000 1.683000 ( 1.683000)
1.679000 0.000000 1.679000 ( 1.679000)
1.679000 0.000000 1.679000 ( 1.679000)
1.681000 0.000000 1.681000 ( 1.681000)
1.679000 0.000000 1.679000 ( 1.679000)
Ruby 1.8.6:
~ $ ruby bench_fib_recursive.rb     
1.760000 0.010000 1.770000 ( 1.775304)
1.750000 0.000000 1.750000 ( 1.770101)
1.760000 0.010000 1.770000 ( 1.768833)
1.750000 0.010000 1.760000 ( 1.782908)
1.750000 0.010000 1.760000 ( 1.774193)
1.750000 0.000000 1.750000 ( 1.766951)
1.750000 0.010000 1.760000 ( 1.777814)
1.750000 0.010000 1.760000 ( 1.782449)

~ $ ruby bench_method_dispatch_only.rb
Test interpreted: 100k loops calling self's foo 100 times
2.240000 0.000000 2.240000 ( 2.268611)
2.160000 0.010000 2.170000 ( 2.187729)
2.280000 0.010000 2.290000 ( 2.292342)
2.210000 0.010000 2.220000 ( 2.250331)
2.190000 0.010000 2.200000 ( 2.210965)
2.230000 0.000000 2.230000 ( 2.260737)
2.240000 0.010000 2.250000 ( 2.256210)
2.150000 0.010000 2.160000 ( 2.173298)
2.250000 0.010000 2.260000 ( 2.271438)
2.160000 0.000000 2.160000 ( 2.183670)

What do you think? Is it worth it?

Tuesday, May 22, 2007

The Final Bugs

We're within weeks of a final JRuby 1.0. We've pared down the bugs that we think we can or must fix for 1.0 and this email is a summary of the ones we need help with. So whatever time you can spare, please have a look and help resolve these.

In order of decreasing priority in JIRA:

JRUBY-820: Net::HTTP.get behaves differently form MRI, failing to get UTF8 properly
-and-
JRUBY-828: UTF-8 regular expressions aren't working

I believe these are both largely the same problem, and it's probably the most visible remaining bug we need to fix. Regular expressions in JRuby currently do not work well with unicode strings, both as incoming match strings and as the regular expression strings themselves. REXML uses /u regular expressions, which is why I believe these issues are closely related. Wes Nakamura has commented that he's looking into 828, but more eyes will help ensure this is fixed.
I believe these are true blocking bugs for 1.0, and I'm not comfortable doing a release if they are not resolved.

JRUBY-971: Make it clear which Java method maps to each equality method

I think we have subtle equality bugs remaining largely because we don't use a consistent naming convention for the Java implementations of Ruby methods like ==, ===, eql? and so on. We should make a quick sweep through the system and make sure all core JRuby code is using the same Java method names for all of these, and binding them in the same way. Note: This is actually the cause of some set_trace_func bugs, since the default === impl should actually invoke ==, causing trace events for both.

JRUBY-969: Add rake and rspec to standard JRuby distribution for 1.0

We have decided to ship both Rake and RSpec with JRuby 1.0, since they are both well-accepted and widely used (moreso for Rake, but RSpec has gained a lot of acceptance the past year). But the mechanism of including them is unclear. I do not want to commit installed gems to SVN; I would prefer to just install them as needed for testing and when building the JRuby distribution. Thoughts?

JRUBY-672: java.lang.Class representation of Ruby class not retrievable

This bug is basically just looking for a way to get at the actual java.lang.Class representing a Ruby-based extension of a Java class. There may have been an API added with Bill's work, or this could be simple to add.

JRUBY-914: JRuby's BigDecimal needs to round

Stu submitted a patch for this, but it unfortunately depends on behavior only present in Java 5 and higher. Since most of us are pretty unfamiliar with BigDecimal, we don't know of a good workaround to support it correctly in Java 1.4.

JRUBY-822: jruby fails to report process exit status correctly

I think Nick may have a grip on this one, but if anyone can offer suggestions as to why the exit codes are apparently shifted in this way, please let us know.

JRUBY-966: Various issues with LocalJumpError not being created early enough
-and-
JRUBY-767: next in an eval should produce a local jump error

These are likely going to be up to Tom and I since they involve pretty deep runtime/interpreter work. But we're also looking for any remaining known issues with LocalJumpError/JumpException that are still out there. I believe we've fixed all such externally-reported issues at the moment.

JRUBY-888: Make regular distro/build create the all-in-one jar by default

We waffle back and forth on this. What's the answer? An all-in-one jar by default, or on request?

JRUBY-873: ant test thread tests sometimes run forever

I managed to get a trace on this one. It appears that some of the tests cause a deadlock between stopping a thread and other operations. But I don't believe anyone's seen it in normal execution yet (or at least nobody's reported it). I'm probably the only one familiar enough with the threading subsystem to fix it, but I'd like to know if anyone's seen other issues.

JRUBY-98: allow "/" as absolute path in Windows
-and-
JRUBY-36: Dir['...'] incompatibilities between Ruby and accross platforms
-and-
JRUBY-61: IO CRLF compatibility with cruby on Windows
-and-
JRUBY-644: Windows line delimiter (\r\n) cause position errors

These blasted Windows pathing and newline bugs just seem to hang on forever. Nobody wants to fix them. There must be someone out there that can help resolve these, yes?

JRUBY-884: Create and consolidate extension and non-standard options

There are a number of configuration options for JRuby. Some have their own flags, like -O to disable ObjectSpace and -C to force compilation of a target script. But many others are only configurable via Java system properties. What we really could use from the community is some idea which settings are worth adding special flags for. We can handle the rest. My personal opinion is that the new -J flag that allows passing flags to the JVM is actually enough.

By my estimation, the rest of the bugs are either almost done or are trivial enough they could be punted to a post 1.0 release. But if we can get these issues knocked down soon (starting with those top couple), 1.0 is going to be an extremely solid release.

Saturday, March 10, 2007

Ruby on Grails? Why the hell not?

I've spent some time this weekend looking over the Grails codebase. After a truly gigantic checkout (Grails bundles production JARs for all the libraries it integrates) and after swimming through a substantial, single-project sea of source files, I think I'm starting to follow it (though a tour by someone familiar with Grails internals would be nice). But there's something else I've learned in the process...

There's really no reason this couldn't use Ruby just as easily as Groovy.

Let's look at some really dumb numbers. Grails, as it turns out, is actually over 2/3 Java code, unlike Rails which is 100% Ruby. Grails contains around 436 Java source files for about 54kloc of code and 273 Groovy source files for about 23kloc of code. So Java outnumbers Groovy by a 2:1 ratio. And yes, Java's probably more verbose than Groovy in most cases. But my point stands, and I'll back it up with more...

Then there's the libraries. The Grails binary release weighs in at a whopping 26MB compressed and 49MB uncompressed (*wow*). But 31MB of that is third-party libraries completely independent of Groovy, things like Hibernate, XFire, Quartz, Apache Commons, and so on. Almost all of which (or perhaps all of which) are pure Java code. So the ratio is even more heavily toward Java.

These numbers and my exploration of the code base lead me to a surprising conclusion: Grails is not a "Groovy on Rails" by any means. It's a large number of popular, well-established Java libraries stitched together mostly by Java code with a thin layer at the front (20kloc is not much) of Groovy code for end-users to see and work with. And I guarantee you it's not the Groovy part that's most interesting here. Hell, you could write that part in Java if you followed enough "convention over configuration", and you certainly could write it in Ruby.

I decided to do this exploration after realizing there's nothing about Ruby or JRuby that couldn't be used to wire a bunch of Java libraries together--indeed, that's one of the biggest selling points of JRuby, the fact that you can write plain old Ruby code but call Java libraries almost seamlessly (with day-by-day improving integration points and performance numbers). So why not take Grails and do a port to Ruby? Groovy code isn't far off from Ruby in many ways, and everything you can do in Groovy you can do in Ruby (plus more), so the port should be pretty straightforward, right?

Then I find out that Grails is mostly Java code. And instead of being annoyed or repulsed by this revelation, I was intrigued. Why not make Grails work with Ruby as well? And really, why not make it work with any scripting language? Why limit that nicely-integrated back end by forcing everyone to use Groovy, regardless of preference? Have we learned nothing from a decade of "100% Pure Java"?

So I propose the following.
  • I believe it would be possible to completely isolate Grails' tastier bits from Groovy. I don't claim it would be easy, but Grails is fairly well interfacified, injectificated, and abstractilicious, so making that dividing line clear and bold is certainly possible (and heck, that's the point of dependency injection and component frameworks anyway, isn't it?). Really, I could start implementing a few of those interfaces right now with Ruby code, but that's not the right approach.
  • Grails' value is not in Groovy, but in the clean integration of established libraries behind the "Convention Over Configuration" mantra. There's nothing specifically Groovy (or Ruby) about that religion, and there's nothing Groovy does in Grails that another JVM language couldn't do just as easily. This is *truth*. And Grails would benefit tremendously by expanding to other languages. Being tied to Groovy alone will hold it back (just as a framework tied to any one language will be held back, Ruby included).
  • I also believe that if the Grails project doesn't make an official effort in this direction, others are likely to take up that mantle themselves. Though the debates will forever rage about which JVM dynlang should carry us into the future, the same facts that have guided programming for half a century still apply today: No one language will be enough to carry us forward; no languages will survive indefinitely; and the language you'll use in ten years you've probably not even heard of yet. Did anyone expect Java to be the most widely-used, widely-deployed language ten years ago? Well today, it is. It won't always be.
  • There's also a very interesting angle here for language enthusiasts. Nobody doubts that the Java platform is officially becoming multi-lingual, possibly a complete reversal from years past. But there are going to be growing pains here, and we're already feeling them on some projects at Sun. We have this wonderful platform with all these wonderful languages...and they all do things differently. We have a first stab at official "Scripting" support in JSR-223, but it only defines the integration point from Java to the language in question...only partially defining the integration points for those languages calling back to Java (a tricky problem) and defining *nothing* related to making those languages interoperate with each other. Again, Java becomes the integration point, and the bottleneck (as in "how do you pass a dynamic-typed object from one language to another by sending it through a statically typed language?"). The very exercise of making Grails (or any other framework) work equally well across multiple user-facing language interfaces will require better interoperability and more clearly-defined integration points for dynamic and static languages on the JVM.
  • The Phobos guys at Sun are already tackling many of these same problems while expanding their JavaScript-mostly Rails-inspired web framework to other JSR-223 languages (notably, Ruby). So anyone taking on the Grails effort would certainly not be alone, and there are some damn smart engineers at Sun interested in tackling these very same problems.
And I guess in the end it's a question of how visionary we all can be. Can the Grails team envision a future for Grails that doesn't require the use of Groovy? Are you (and am I) as a non-Groovy developer willing to help them get there? And in the end, isn't this what thought leaders on a multi-lingual platform are supposed to be doing?

--

So, enough of the inflammatory bull-baiting. I dare you to tell me why I'm wrong here. Even better, I dare you to look into it yourself. 75kloc of code, only a third of it Groovy. Established libraries you're sure to have used at some point. Clean integration, following the principals inspired and proven by Rails.

But fronted with your language of choice.

Make it so.

Tuesday, March 6, 2007

NetBeans 6 Ruby Support Even Better Than Expected

See, here's the deal. I'm not a very advanced IDE user. I used Eclipse for years and JDE before that, and I wouldn't ever have considered myself a "pro" with either. I learned what I needed to get the job done and not a whole lot else. Beyond that I never knew some of the coolest features, and was never so good at convincing people to make a switch. This actually has helped my migration to NetBeans, since it basically does all the things I was used to in Eclipse. But it has been problematic when I try to convey how cool the new NetBeans Ruby support really is.

It's super cool. It does things I've not seen any other IDE or editor do.

However, it's best not to take my advice, and Roumen Strobl has recorded two seriously excellent demonstrations of Rails and Ruby support in NetBeans. They make the sale way better than I could (though I now have many good ideas for how to sell it better in the future).

If you have any interest in Ruby editors or IDEs, you should take the time and watch these, especially the Ruby support.

Two Demos: JRuby on Rails and Advanced Ruby Editing in NetBeans!

Behind The Scenes: JRuby 0.9.8 Released!

Things have been moving quickly in JRuby-land, and we've just kicked out a major release in JRuby 0.9.8. Among the big features:
  • Ruby classes can extend concrete/abstract Java classes and override methods
  • New Java primitive array syntax
  • Reimplementation of String, Numeric classes, and Array to be more correct and performant
  • Significant bottlenecks have been identified. In some cases IO is 6.5x faster than previous releases. Java included classes are significantly faster than in the past.
  • 220 Jira issues resolved since last release
Now these are certainly delicious. Performance improvements are always great, and we're been improving Java integration little by little. But there's another bullet I omitted that is worth a little more attention:
  • Ruby on Rails support
Now you may feel free to read that as you wish. I read it as "we expect pure-Ruby Rails apps to work." By "pure Ruby", I mean no unsupported C-language extensions required. By "work" I mean "work."

It was our original plan over two months ago that by Februrary's end we'd be passing enough of Rails' own test cases to call it "supported". And in a relative rarity for an open source project, we hit that milestone. It was actually early last week we started saying we officially support Rails, which at the time meant better than 95% of Rails test cases passing. By the time we cleaned up the remaining release items, we had passed 98%. And we're still climbing.

So what does this mean for a practical Rails user? Well, it means that the dream of deploying Rails apps on any Java-based server in any Java-based organization is another big step closer to coming true. I've actually been demoing Rails apps running in a WAR file on GlassFish for all my recent talks, and it's pretty solid. I don't imagine it's perfect, of course, but it's looking really very good. It will be "ready" by the time we hit 1.0.

A few folks have been wondering about the jump from JRuby 0.9.2 to JRuby 0.9.8. The simple answer is this: we're two releases away from 1.0. We wanted to make it obvious how much work has gone into 0.9.8 (as many SVN revisions in this release as the last three combined) and show our commitment to getting a 1.0 release out very soon. The 0.9.8 release will be followed by 0.9.9, which will be followed by...well, you get the picture. We're in the home stretch now.

The other core team members have reported on the release, so I won't cover all that. Here's a list of links:

Tom Enebo
Ola Bini
Nick Sieger

In related news, there's also a new release of ActiveRecord-JDBC, the gem-installable module enabling JDBC databases to be used by ActiveRecord. This is also very welcome, since there have been a number of bug fixes in trunk that weren't easily installable the past couple months. Now you should be able to run both Rails 1.2.2 and 1.1.6 well with JRuby, just using released code. It helps complete the puzzle.

So what's next for JRuby? Well, with the 0.9.8 release we've reached a really big milestone as far as Ruby compatibility, so we're likely to turn our attentions toward areas not directly related to Ruby 1.8. Specifically:
  • Work will continue on the bytecode compiler, and I'm hoping to have the JIT permanently enabled within the next week
  • Java integration will get a heavier focus, to start bringing us more in line with languages like Groovy for ease-of-integration. In general things are looking pretty solid right now, but we know there's a lot of work to do here. I'd like for us to come as close as possible to Groovy's level of tight integration as possible.
  • We will also start working on native Unicode support, by exposing a Chars class like the one provided in Rails' MultiByte library. This will allow us to maintain compatibility with Ruby and provide solid Unicode support without introducing an incompatible set of features.
  • Both Tom and Ola have expressed an interest in continuing work on our Ruby 2.0 bytecode engine. Ola would like to continue improving and expanding the bytecode support, while Tom would like to play with a second parser that goes straight to bytecode. Both of those will help feed into a potentially faster interpreted mode in the future, giving us many of the perf gains shown in Ruby 1.9 benchmarks.
For me personally, performance is going to be the big item. The recent Ruby Shootout showed that we have some work to do to bring out performance up to MRI standards, but it didn't tell the whole story. The truth of the matter is more like this:
  • JRuby interpreted mode is roughly 3-4x as slow as Ruby 1.8
  • JRuby compiled mode is roughly 1.5-2x faster than Ruby 1.8
  • KRI (Ruby 1.9, "Koichi's Ruby Implementation") is many times faster than MRI for the posted benchmarks
  • KRI is 2x or less faster than MRI for the general case
And this is without us doing any optimization of the compiler or adding in any special-case optimizations like KRI provides for fixnums. We're still looking great for future performance, and I want to start tackling this area heavily over the next couple months.

As always, we welcome contributors, bug reports, patches, and user anecdotes for JRuby. We love to hear about bugs, we want to know your opinion, and we respect and honor our community whenever we get the chance. Remember, JRuby is your project, and can only be successful with your help.

Visit www.jruby.org to learn more about the project, download the release, and get involved!

On to the next release!

Wednesday, February 21, 2007

Tech Days Talk a Success

I just completed my talk at Tech Days Hyderabad, and I think it was resounding success. If I heard right, the total head count for my "JRuby Essentials" talk was over 1200 people, by far the largest crowd I've presented for. There was a hoard of questioners immediately after the talk, who followed me into the hall to keep asking. I gave out my entire in-pocket batch of business cards, and even received a round of applause after the JRuby on Rails demo. All this even after my talk was delayed and cut to 40 minutes because the earlier talks ran too long. It was a sprint, but I think it went very well.

I will be posting the slides, but they're basically the same content many of you have probably seen the past couple months. For those who haven't had the pleasure, we'll start posting slides more regularly, and I'll try to blog a few example walkthroughs for those of you who want to duplicate them for your own talks. Feel free to steal it all and spread the word.

Sunday, February 18, 2007

The JRuby WORLD TOUR 2007

Yes, you read that right! I'm announcing my intent to circumnavigate the globe in only NINE DAYS spreading JRuby and joy at every stop. The Earth shall be wrapped in JRuby goodness!

To celebrate this monumental occasion, I'm inviting all developers with an interest in Ruby or Java to join the JRuby mailing lists and volunteer your services. Eternal gratitude and temporary fame could be yours, for the small price of bug reports, bug fixes, or high-performance rewrites of core JRuby libraries (it's so easy!). Operators are standing by to receive your emails and direct you to the promised land!

Now, on to the tour!

First Stop: Amsterdam, Netherlands

The first leg of my journey takes me from my home in Minneapolis, Minnesota to beautiful Amsterdam. There I will enjoy an delicious airport breakfast followed by a five-hour tour of the terminal. I will be available for autograph signing from 7:00 to 11:30 by appointment only. The first annual JRuby New-Age Concert of Magic will follow from 11:30 to 11:35. Tickets are first-come, first-served; cash only, please! I will depart from Amsterdam at 11:55 via a chartered jet I'm generously sharing with three-hundred other passengers and a small KLM Dutch Airways flight crew, en route to exotic Hyderabad, India!

Second Stop: Hyderabad, India

The next part of the JRuby "World Domination Tour 2007" takes me to Sun's Tech Days event in Hyderabad. In addition to my JRuby session, Tech Days will host such scintillating topics as "iPod Giveaway" and "Java Jacket Giveaway". But on Wednesday the 21st at 3:35PM you too can learn why JRuby is the programming alchemist's "Developer's Stone", transmuting static lead into dynamic gold. The topics covered will be exactly like those in recent JRuby talks, except cooler, faster, and after 26 hours of non-stop travel and five hours of sleep. Prepare for a punchy, laugh-riotous affair! Will I be able to maintain 90WPM during my interactive demonstrations? Will I remember that "alias" takes two arguments *without* a comma and with the original method first? Join me for what's sure to be 50 minutes you'll remember the rest of your life!

Third Stop: Bangalore, India

From Hyderabad, I take a short one-hour flight to Bangalore, home to Sun Microsystems India and the third leg of the JRuby Globe-trotting Festival of Light! On Monday, February 23rd, I will present JRuby to my tropical counterparts on the opposite side of the earth, featuring the exact same topics from Tech Days...in RANDOM ORDER. You never know what I'm going to do next.

I actually have all that Sunday free and Tuesday up until about 8:00 free to do some exploring. Event suggestions are welcome, and I challenge any of you to find food too spicy for me to eat! (impossible, I say!)

Fourth Stop: Bangkok, Thailand

The next step takes me from Bangalore to beautiful Thailand, jewel of Southeast Asia and home to one of my favorite cuisines. I will be available from 4:00AM to 5:30AM for a special, once-of-a-lifetime event I'm calling "JRuby Dawn at Bangkok Airport". And the great question on everyone's lips will be foremost on my mind:

Will the airport's Thai restaurants open before I board my next flight at 6:00? Stay Tuned!

Fifth Stop: Tokyo, Japan

Continuing the Asian leg of the tour, I'll spend 80 fun-filled minutes exploring the international terminal in Narita, only an hour's train ride from the Emperor's Palace! Naturally I'll be available for handshakes and baby-kissing, and hopefully the always-humorous photograph of "buying beer from a vending machine". This trip will serve as a preview for the main Tokyo event: Ruby Kaigi 2007 in June, where I'll finally present JRuby to the Land of the Rising Sun. Come 3:10PM it's time for "so long Japan", but I'll be back soon!

Sixth Stop: Minneapolis, Minnesota

And just 9 days after I departed, I'll be home in Minneapolis again, ready for my next thrilling adventure: The Greater Wisconsin Software Symposium (a No Fluff Just Stuff event) in Milwaukee from March 2-4. Join me for my two fabulous sessions "Bringing Ruby and Rails to the JVM" and "Become Super-Powerful with JRuby", putting the greatest dynamic language ever in the palm of your JVM.

Thursday, January 18, 2007

JRuby Compiler: In Trunk and Ready to Play

Times they are a-changing.

I posted previously on JRuby's compiler work. There have been various iterations of the compiler, many purely prototype and never intended to be completed, and a few genuine attempts at evolving toward full Ruby support. However I believe in the recent weeks I've settled on a design that will carry us to the JRuby compiler endgame.

For the past year, we've emphasized correctness over performance nine times out of ten. When we did focus on performance, it was solely on improving JRuby's interpreter speed, in an attempt to match Ruby's performance in this area and because we knew that JRuby could never entirely escape interpretation. Ruby's just too dynamic for that. So while compatibility with Ruby 1.8.x continued to improve by leaps and bounds, our performance was rather poor in comparison.

This past fall, things started to change. Compatibility reached a point where we could finally be confident about our set of regression tests and our understanding of "how Ruby works" across all its weirdest features. As we understood better the design of the C implementation and the quirky intricacies of the language, we started to see a path to enlightenment. We started to realize how we could support Ruby as it exists today while simultaneously evolving JRuby into a more efficient and cleaner design. And so the performance numbers started to change.

From 0.9.0 to 0.9.1, we had a clean doubling of performance across the board. Our favorite benchmark--RDoc generation--was easily twice as fast, and other simpler benchmarks like fib had similar improvements. 0.9.2 was more of a rushed release for JavaPolis, but we had a good 1/4 to 1/3 speedup even then, since the ongoing refactoring removed another large chunk of overhead from JRuby's core runtime.

From 0.9.2 to current trunk, however, has been a different matter entirely.

The first major change is that we've started to seriously alter the way JRuby does dynamic method dispatching. I did some research, read a few papers, and mocked up and benchmarked a few options. What we've settled on for the moment is a combination of STI for the core classes (STI provides a large table mapping methods and classes to actual code) and various forms of inline caching for non-core classes (basically, for pure Ruby classes; though this is yet to be implemented in trunk). STI provides an extremely fast path for dispatch on those hardest-hit methods, since it reduces calling most core methods to two array indexes and a switch, a vast improvement over the hash lookup and multiple layers of abstraction and framing we had before.

We are continuing to expand our use of STI as it is applicable, and I will soon start exploring options for interpreted-mode inline caching (polymorphic, likely, though I need to run a few trials to get numbers balanced right). So fast dynamic dispatching is well on its way, and will improve performance across the board.

Then there's the compiler work. You have no idea how much it's irritated me to hear people talk about JRuby the past year and say "yeah, but it doesn't compile to Java bytecode." This obviously amounts to pure FUD, but beyond that it totally ignores the complexity of the problem: not a single person on this earth has managed to compile Ruby to a general-purpose VM yet. So complaining about our missing compiler is a bit like complaining that we haven't moved mountains. Honestly people, what do you expect?

Of course, there's the flip side of this statement: compiling Ruby is a hard problem, and I like hard problems. For me it's doubly hard, since I've never written a compiler before. But hell, before JRuby I'd never even worked on an interpreter or language implementation before, and that seems to have gone alright. So there it is...Mount Ruby, waiting to be climbed. And climb it I must!

The current compiler design lives in two halves: the AST-walking half; and the code-generation half. I chose to split these two because it make several things easier. For starters, it allows me to abstract all the bytecode generation logic behind a simple interface, an interface that presents coarse-grained operations like invokeDynamic() and retrieveLocalVariable(). The ultimate implementation of those operations can then be modified at will. It also allows us to evolve the AST independently of the compiler backend, even to the point of swapping in a completely different parser and in-memory code representation (like YARV bytecodes) without harming the evolving code generator backend. So this split helps future-proof the compiler work.

The current design also has another advantage: not all of Ruby has to compile for it to be useful. Currently, as the AST walker encounters nodes, if it finds a node it can't deal with it simply raises an exception. Compilation terminates, and the compiler's client can deal with the result as it will. This leads to a really powerful feature of this design: we can install the compiler now as a JIT and as it evolves more and more code will automatically get optimized. So once we're confident that a given node type is 100% compiling correctly, that node will now be eligible for JIT compilation. As an example, here's the output from a gem installation with the current compiler enabled as a JIT (with my logging in place, naturally):
compiled: TarHeader.empty?
compiled: Entry.initialize
compiled: Entry.full_name
compiled: Entry.bytes_read
compiled: Entry.close
compiled: Entry.invalidate
Successfully installed rake, version 0.7.1
Installing ri documentation for rake-0.7.1...
compiled: LeveledNotifier.notify?
compiled: LeveledNotifier.<=>
compiled: RubyLex.getc
compiled: null.debug?
compiled: BufferedReader.ungetc
compiled: Token.set_text
compiled: RubyLex.line_no
compiled: RubyLex.char_no
compiled: BufferedReader.column
compiled: RubyToken.set_token_position
compiled: Token.initialize
compiled: RubyLex.get_read
compiled: RubyLex.getc_of_rests
compiled: BufferedReader.getc_already_read
compiled: BufferedReader.peek
compiled: RubyParser.peek_tk
compiled: TokenStream.add_token
compiled: TokenStream.pop_token
compiled: CodeObject.initialize
compiled: RubyParser.remove_token_listener
compiled: Context.ongoing_visibility=
compiled: PreProcess.initialize
compiled: AttrSpan.[]
compiled: null.wrap
compiled: JavaProxy.to_java_object
compiled: Lines.next
compiled: Line.isBlank?
compiled: Fragment.add_text
compiled: Fragment.initialize
compiled: ToFlow.convert_string
compiled: LineCollection.add
compiled: Entry_.path
compiled: Entry_.directory?
compiled: Entry_.dereference?
compiled: AttrSpan.initialize
compiled: Entry_.prefix
compiled: Entry_.rel
compiled: Entry_.remove
compiled: Lines.rewind
compiled: AnyMethod.<=>
compiled: Description.serialize
compiled: AttributeManager.change_attribute
compiled: AttributeManager.attribute
compiled: ToFlow.annotate
compiled: NamedThing.initialize
compiled: ClassModule.full_name
compiled: Lines.initialize
compiled: Lines.empty?
compiled: LineCollection.normalize
compiled: ToFlow.end_accepting
compiled: Verbatim.add_text
compiled: FalseClass.to_s
compiled: TopLevel.full_name
compiled: Attr.<=>
Installing RDoc documentation for rake-0.7.1...
compiled: Context.add_attribute
compiled: Context.add_require
compiled: Context.add_class
compiled: AbstructNotifier.notify?
compiled: Context.add_module
compiled: LineReader.read
compiled: null.instance
compiled: HtmlMethod.path
compiled: HtmlMethod.aref
compiled: ContextUser.initialize
compiled: HtmlClass.name
compiled: TokenStream.token_stream
compiled: LineReader.initialize
compiled: TemplatePage.write_html_on
compiled: Context.push
compiled: Context.pop
compiled: HtmlMethod.name
compiled: Context.find_local_symbol
compiled: SimpleMarkup.add_special
compiled: TopLevel.find_module_named
compiled: Context.find_enclosing_module_named
compiled: HtmlMethod.<=>
compiled: ToHtml.annotate
compiled: HtmlMethod.visibility
compiled: HtmlMethod.section
compiled: HtmlMethod.document_self
compiled: LineReader.dup
compiled: Lines.unget
compiled: ToHtml.accept_paragraph
compiled: ContextUser.document_self
compiled: ToHtml.accept_heading
compiled: Heading.head_level
compiled: ToHtml.accept_list_start
compiled: ToHtml.accept_list_end
compiled: ToHtml.accept_verbatim
compiled: SimpleMarkup.initialize
compiled: AttributeManager.initialize
compiled: ToHtml.initialize
compiled: ToHtml.end_accepting
compiled: HtmlMethod.singleton
compiled: Context.modules
compiled: Context.classes
compiled: ContextUser.build_include_list
compiled: HtmlMethod.description
compiled: HtmlMethod.parent_name
compiled: HtmlMethod.aliases
compiled: HtmlClass.parent_name
compiled: ContextUser.as_href
compiled: ContextUser.url
compiled: ContextUser.aref_to
compiled: HtmlFile.<=>
compiled: HtmlClass.<=>
You can see from the output that not only are RubyGems methods getting compiled, but so are stdlib methods and our own Java integration methods. And this is with the current compiler, which doesn't support compiling class defs, blocks, case statements, ... Hopefully you get the picture; this bit-by-bit implementation of the compiler allows us to slowly grow our ability to optimize Ruby into Java bytecodes.

So then, how well does it perform? It performs just dandy, when we're able to compile. Witness the following results for a simple recursive fib algorithm running under Ruby 1.8.5 and JRuby trunk with the JIT enabled.

$ ruby test/bench/bench_fib_recursive.rb
12.760000 1.400000 14.160000 ( 14.718925)
12.660000 1.490000 14.150000 ( 14.648681)
$ JAVA_OPTS=-Djruby.jit.enabled=true jruby test/bench/bench_fib_recursive.rb
compiled: Object.fib_ruby
8.780000 0.000000 8.780000 ( 8.780000)
7.761000 0.000000 7.761000 ( 7.761000)
Yes, that's nearly double the performance of the C implementation of Ruby. And this is absolutely real.

Now JITing is great, and it's obviously carried Java a long ways. The HotSpot JIT is an unbelievable piece of work, and any app that runs a long time is guaranteed to perform better and better as deeper optimizations start to take hold. But We're talking about Ruby here, which starts up at C-program speeds, and runs as fast as it does immediately. So then JRuby needs a way to compete for immediate execution performance, and the most straightforward way to do that is with an ahead-of-time compiler. That compiler is now also available in JRuby trunk.

The name of the command is "jrubyc", and it does just what you'd expect, it outputs a Java class file for your Ruby code. However the mapping from Ruby code to a class file is not as straightforward as you'd expect: a Ruby script may contain many classes or no classes at all, and those classes may be opened and re-opened by the same script or other scripts at runtime. So there's no way to map directly from a Ruby class to a Java class given the strict limitations of Java's class model. But there is a much smaller unit of code that does not change over time, aside from being mercilessly juggled around: methods.

Ruby, in the end, is a creative and sometimes complicated jumble of method "objects", floating from class to class, from module to module, from namespace to namespace. Methods can be renamed, redefined, added and removed, but never can they be directly modified. And so here is where we have our immutable item to compile.

JRuby's compiler takes a given Ruby script and generates the following Java methods out of it: One Java method for the top-level, straight-through execution of the script, including class bodies and "def"s and the like (called "__file__" in the eventual Java class...thanks Ola for the idea), and a Java method for every Ruby method body and closure contained therein, named in such a way as to avoid conflicts. So for the following piece of code:
require 'foo'

def bar
baz { puts "hello" }
end

def baz
yield
end
There would be four Java methods generated: one for the toplevel execution of the script, two for the bar and baz methods, and one for the closure contained within bar. The resulting class file would store these as static methods, so they are accessible from any class or object as necessary, and the toplevel run-through would bind the two Ruby methods to their appropriate names in Ruby-space.

Quite simple, really!

So then an example of the precious, precious JRuby compiler:
$ cat fib_recursive.rb
def fib_ruby(n)
if n < 2
n
else
fib_ruby(n - 2) + fib_ruby(n - 1)
end
end

puts fib_ruby(34)
$ jrubyc fib_recursive.rb
$ ls fib_recursive.*
fib_recursive.class fib_recursive.rb
$ time java -cp lib/jruby.jar:lib/asm-2.2.2.jar:. fib_recursive
5702887

real 0m8.126s
user 0m7.632s
sys 0m0.208s
$ time ruby fib_recursive.rb
5702887

real 0m14.649s
user 0m12.945s
sys 0m1.480s
Again, about twice as fast as Ruby 1.8.5 for this particular benchmark.

Now I don't want you going off and saying JRuby has a perfect compiler that will double the performance of your Rails apps. That's not true yet. The current compiler covers only about 30% of the possible code constructs in Ruby, and the remaining 60% (Update: 70%...that's what I get for late-night blogging) contains some of the biggest challenges like closures and class definitions. It's sure to be buggy right now, and the JIT isn't even enabled by default, plus it has my nasty logging message burned into it, to discourage any production use.

But it is very real. JRuby has a partial but growing compiler for Ruby to Java bytecode now.

And oh my, look at the time. Tonight I have to finish my visa application for a trip to India, nail down schedules and descriptions for several upcoming talks, and prepare some slides and notes for presentations in the coming weeks. You will see more about the Java compilation and our developing YARV/Ruby 2.0 bytecode support over the next couple months...and you can expect JavaOne to be an interesting time for Ruby on the JVM this year ;)