Wednesday, April 29, 2009

Stand and Be Counted

In the spirit of Nick Sieger's short statement on the recent uproar over Matt Aimonetti's "pr0n star" talk, I'm posting my one and only blog response to the whole thing.

Unlike Nick, I too often have used this blog as a soapbox. And too often I've ground my personal axe against projects that may or may not have deserved it. I'm human, I'm passionate and proud of my work, and I'm defensive of what we've accomplished, so I don't think this is surprising. I also see the same passion and pride in the Ruby community at large, and it's why I'm much more interested in attending Ruby conferences than Java conferences, where many attendees just seem to be going through the motions. And I know I've crossed a line at times, making or taking things too personally, and hopefully I've apologized or corrected myself whenever that's happened. If not, mea culpa.

But there's a disturbing trend in the Ruby community I haven't had to deal with since high school: in preference to open inclusion, more and more Rubyists seem to choose exclusivity.

This recent firestorm has continued in large part, I believe, because of the poor initial response by folks involved. Rather than recognize that there are people with different views, taking offense at different ideas and images, some decided to say "fuck you, this is who I am" and further alienate those people. I certainly expect that we, as passionate individuals, will commit occasional faux pas, especially when trying to be funny or provocative, and especially when coming from different backgrounds that may be more or less accepting of certain behaviors. But to claim no responsibility for an obvious mistake, indeed to claim it's somehow the fault of the offended, or of American sensibility, or of political correctness...well, that's just sophomoric.

I think to some extent we can understand (but not excuse) such behavior by realizing that the Ruby (or perhaps the Rails) community is largely a very *young* community. That's a large part of why this community is so passionate, why they're so committed to their ideals, why they're so opinionated, why they're so much more fun to hang out with than many 30-year programmers from other communities who've had the life sucked out of them. It's also a reason so many in the Ruby (or perhaps the Rails) community seem to act like they're in high school, forming cliques, sitting at their own tables, snubbing the new kids or the weird kids or anyone they perceive as "trying to be cool."

Have you been invited to any exclusive Ruby communities? I've been invited to a couple, and without exception I've found the idea offensive every time. In some cities, there are now multiple tiers of Ruby groups: one for the proles, where anyone is welcome and everyone is either new to Ruby, a little weird, or both; and then perhaps one or two levels of more "exclusive" groups, usually more "advanced" and sometimes invite-only, but generally exclusionary in some way.

There's also a technical "coolness" exclusivity many projects have had to cope with. Folks working on JRuby and IronRuby, for example, have had to deal with perceptions that they're either less "cool" because of their platform of choice or at least somehow less "Ruby" because they're not following the same golden path everyone else follows. Or perhaps their employers are out to take over Ruby, or they're going to infect Ruby with a bunch more "new" people who don't "get it". All the while the folks that use and work on these projects are working just as hard as anyone else to bring Ruby to the world, staying true to what makes Ruby special, and largely going against the grain in their original communities as well. Being snubbed, mocked, or attacked is often their reward.

You start to see a pattern here, yes?

So let's spell it out. I like the Ruby community because it's filled with people who love playing with new technology, without biases and prejudices getting in the way. My closest friends in the community are people like me, who find it repugnant that being opinionated has been too often equated with being rude and boorish, exclusionary and sophomoric, or simply mean. We are all here because of our love of technology, all here because we didn't feel like we fit in other places that weren't so passionate about beautiful code and fresh ideas. We are all here because we don't care if you're male or female, religious or irreligious, young or old, experienced or inexperienced, beautiful or plain, conservative or liberal, tall or short, fat or thin, foreign or domestic, gay or straight, black or white, or any grey areas in-between. We are all here because we love that more and more people like us join the community every day...the same people some of us immediately judge and box into their own subcool subgroups.

I don't want to join your damn clique. I don't think it's ok to set people aside or treat them like dirt because they don't believe what you believe or because they have their own way of thinking and acting or because they're not as worldly and mature and oh-so-smug as you are. I don't believe in "rock stars" and I don't believe that dubious title gives anyone the right to be an asshole to others or to have free rein to act any way they choose. I don't care what kind of car you drive, what house you live in, or what clothes you wear...and I sure as hell don't care how many people follow you on Twitter.

What I do care about is whether you're interested in sitting down and hacking out some code, looking at new projects with an open mind, helping someone new (maybe me) improve their skills, being part of something larger than yourself. If you promise not to treat me like a weirdo or a rock star, I promise to talk openly about your ideas, to show you the heart and soul of my code, and to freely share my thoughts...no matter who you are. I hope you'll attend my presentations and/or try out my projects, and in exchange I'll try to do the same for you. I hope you'll walk up to me at conferences and tell me about whatever "crazy" or "stupid" idea you have, and I guarantee I'll listen, since it's probably not as crazy or stupid as you think. And I expect you to do the same for everyone else in the community and not treat me or anyone else any differently.

Now, let's move forward and get back to hacking and having fun!

Saturday, April 25, 2009

Setting up Typo on JRuby

I figured I'd give Typo a try on JRuby today. It has been working for quite a while, but with the GlassFish gem improving so much I thought it would be good to write up an updated walkthrough. It's pretty simple.

BTW, is Typo still the preeminent blog package for Rails? I certainly don't want to be out of fashion.

Prerequisites:
  1. MySQL already set up and working, with TCP sockets enabled (or I guess you can use SQLite too)
  2. Java (sudo apt-get install sun-java6-jdk or whatever's appropriate for your platform)
  3. JRuby (download, unpack, put bin/ in PATH)
  4. Appropriate gems installed (rails, activerecord-jdbcmysql-adapter, glassfish or mongrel)
The process:
  1. Download the Typo zip rather than the gem. The gem unfortunately tries to install native extensions like sqlite3 and mysql (I sure wish they wouldn't do that!)
  2. Unpack the Typo zip wherever you want your blog site to live and cd into that directory
  3. Edit config/database.yml.example to your liking, replacing "mysql" with "jdbcmysql" and save it as config/database.yml
  4. Create the database:
    jruby -S rake db:create RAILS_ENV=production
  5. Migrate the database:
    jruby -S rake db:migrate RAILS_ENV=production
  6. Run the server:
    glassfish -p <port> -e production [and whatever other options you want]
    or
    jruby script/server -p <port> -e production
  7. Set up Apache to point at your new Typo instance (optional)
That's all there is to it! You'll want to be the first one to hit your new blog, so you can set up your admin account and lock down the server.
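Step 3 is easy to script. The following is a minimal sketch. It assumes that the example file names its adapter on a plain "adapter: mysql" line for each environment. The file paths come from the steps above, and the script itself is hypothetical, not something Typo ships. It sticks to Ruby 1.8-compatible calls so it can run under JRuby itself. It also refuses to overwrite an existing database.yml.

```ruby
# Hypothetical helper for step 3: copy config/database.yml.example to
# config/database.yml, switching "adapter: mysql" to "adapter: jdbcmysql".
# Assumes the example file uses plain "adapter: mysql" lines.
EXAMPLE_PATH = "config/database.yml.example"
TARGET_PATH  = "config/database.yml"

# Rewrite only adapter lines; "mysql2", "jdbcmysql" and other keys are untouched.
def jdbcify(yaml_text)
  yaml_text.gsub(/^([ \t]*adapter:[ \t]*)mysql\b/) do
    "#{Regexp.last_match(1)}jdbcmysql"
  end
end

# Only touch files when run as a script from the Typo directory.
if __FILE__ == $PROGRAM_NAME && File.exist?(EXAMPLE_PATH)
  if File.exist?(TARGET_PATH)
    warn "#{TARGET_PATH} already exists; not overwriting it"
  else
    File.open(TARGET_PATH, "w") { |f| f.write(jdbcify(File.read(EXAMPLE_PATH))) }
    puts "Wrote #{TARGET_PATH}; check usernames and passwords before migrating"
  end
end
```

Run it with jruby from the unpacked Typo directory, then continue with step 4. You still need to edit credentials by hand.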

Perhaps it's time I finally moved my blog off Blogger and on to a JRuby-hosted server, eh?

Suggestions, improvements to this process? Add to comments and I'll update the post.

Apache + JRuby + Rails + GlassFish = Easy Deployment!

It occurred to me today that a lot of people probably want a JRuby deployment option that works with a front-end web server. I present for you the trivial steps required to host a JRuby server behind Apache.

Update: It's worth mentioning that this works fine with JRuby + Mongrel too, though Mongrel won't multithread unless you apply this patch and uncomment the config.threadsafe! line in Rails' production.rb. By default, the GlassFish gem will automatically multithread using several JRuby instances in the same server process, or using a single JRuby instance if config.threadsafe! is uncommented.

Prerequisites:
  1. Apache with mod_proxy_http enabled (sudo a2enmod proxy_http on Ubuntu)
  2. Java (sudo apt-get install sun-java6-jdk or the openjdk flavors if you like)
  3. JRuby (download, unpack, put bin/ in PATH)
  4. gems appropriate to run your app with JRuby (e.g. rails, activerecord-jdbcsqlite3-adapter, etc.)
  5. production DB all set to go
Ok. I'm no Apache expert, so I'm sure there's some tweaking necessary for this. Please add your tweaks and suggestions in comments. But basically, all you need to do is run your app using the GlassFish gem and set up Apache to proxy to it. It's that simple.
  1. Install the glassfish gem
    gem install glassfish
  2. From your application directory, run glassfish with these options:
    glassfish -p <port> -e production -c <context> -d
  3. Add ProxyPass and ProxyPassReverse lines to Apache (wherever is appropriate on your system) for the GlassFish server instance. For example, if <port> is 9000 and <context> is foo:
    ProxyPass /foo http://localhost:9000/foo
    ProxyPassReverse /foo http://localhost:9000/foo
  4. Reload the Apache config in whatever way is appropriate for your system
You'll now be able to access your app via http://servername/foo, and requests will all proxy to the GlassFish server instance.

This doesn't do anything to manage the server instance, but since GlassFish can start up as a daemon now, it should be easy to wire into whatever you normally use for that.

A few caveats:
  • I had some trouble getting mod_proxy to allow requests to proxy through. There's probably a right way that I'm not using, so I won't say what I did. If you know the ideal mod_proxy config for this sort of thing, post it in comments.
  • Although the GlassFish gem is really nice already, we're still working minor kinks out of it and it may still have minor bugs. If you run into something, let us know and we'll get it fixed (and of course, you can use this mod_proxy setup with Mongrel too, if you like).
  • I'd love to get some help putting together something that manages servers similar to Phusion Passenger, because that's really the only missing piece here.
I'll update this post as suggestions come in. Enjoy!

Update: Dberg suggests the following improvement:
The one thing you also want to remember to do is exclude the static assets that normally get served out of Rails. To do this, simply add some ProxyPass exclusion lines like:

ProxyPass /images !
ProxyPass /javascripts !
ProxyPass /stylesheets !

Then make sure DocumentRoot is set to the right place for these files and you will get a slight performance boost!
Thanks for that!
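If you front several apps this way, the proxy lines are mechanical enough to generate. The following is a small sketch that uses a hypothetical helper. proxy_pass_lines is not part of the glassfish gem or Apache, and it uses keyword arguments, so it needs a current Ruby.

It puts Dberg's exclusions first because Apache checks ProxyPass rules in configuration order, and the first match wins. For an app mounted under a context like /foo, the exclusions become /foo/images and so on. Serving those paths from disk then needs an Alias to the app's public/ directory, not just DocumentRoot.

```ruby
# Hypothetical generator for the mod_proxy lines described in this post.
# Exclusions ("!") come first: Apache evaluates ProxyPass rules in order.
def proxy_pass_lines(port:, context: "", host: "localhost", static_dirs: [])
  path   = context.empty? ? "/" : "/#{context}"   # "/" or "/foo"
  target = "http://#{host}:#{port}#{path}"
  base   = path.chomp("/")                          # "" for root, "/foo" otherwise

  lines = static_dirs.map { |dir| "ProxyPass #{base}/#{dir} !" }
  lines << "ProxyPass #{path} #{target}"
  lines << "ProxyPassReverse #{path} #{target}"
  lines
end

# The example from step 3 (port 9000, context foo):
puts proxy_pass_lines(port: 9000, context: "foo")
```

Paste the output into your Apache config where the step 3 lines would go.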

Wednesday, April 22, 2009

The Future: Part One

There's been a lot of supposition about the future lately, and I've certainly read and succumbed to the temptation of the prognosticator. I'm not going to comment on any of the players or put forward my own suppositions about what might happen. What I will do here is talk about what *should* happen.

It's apparent that the Java platform is at a crossroads. One path leads to irrelevance. That could come through general apathy as important technologies get sidelined, or through active emigration driven by bureaucratic processes and waterfall platform evolution. The other path leads to a bright, open future, where polyglots romp and play with fresh new languages and developers have freedom to use whatever tools they feel are necessary for a given job. Given the large investment many of us have in this platform, we need to start talking now about which direction we want to go.

I've been doing a lot of thinking about the future of the Java platform and my future developing for it, and I've come up with a few things I believe must happen soon to ensure the future of the JVM and the applications and languages built for it.

Languages

As you might expect from me, the first area involves supporting many languages on the JVM. Over the past three years we've seen a spectacular transformation take place. The "Java" VM has become a truly multi-language platform. People are putting applications written in all sorts of languages into production, often writing absolutely no Java to do so. And this is how things must be...this is how the platform is going to survive.

As I see it there are currently three primary players in the language domain: JRuby, Scala, and Groovy (in no particular order). I include these three due to the relative completeness of their implementations, vague popularity metrics, and availability of production use cases. These three languages, one static and two dynamic, bear further exploration.

JRuby represents the promise of off-platform languages being brought to the JVM. Rather than creating a language specifically tailored to fit well into the JVM's type system and limitations, we have managed to take a drastically different language and implement enough of it to be the only "alternative" implementation realistically considered for production applications. And in doing so we've had to stretch the platform. We've created native binding libraries, wired in POSIX functions the JDK doesn't provide, implemented our own core types like Strings, Arrays, and regular expressions, and done all this while managing to deliver the best performing compatible Ruby implementation available. And where the other two primary language contenders largely pull developers from other areas of the Java platform, JRuby actually brings in many developers that might not otherwise ever use the JVM. JRuby stretches and grows the platform.

Groovy is the second dynamic language under consideration. Groovy represents taking the best features of a number of dynamic languages and wiring them into a Java-like syntax that's easy for existing Java developers to learn. Groovy provides a solid dynamic language foundation without breaking Java type-system requirements like ahead-of-time compiled classes and static method signatures, allowing it to enlist directly in newer APIs that depend on those requirements. And while many developers come to Groovy from other areas of the Java platform, they might also have completely left the platform if not for Groovy. Groovy provides for Java developers a "dynamic layer" that doesn't require them to learn a new platform and a new set of libraries. And so Groovy's strength is in maintaining the platform and stemming the bleeding of developers to off-platform dynamic languages.

Scala, it must be stated, is the current heir apparent to the Java throne. No other language on the JVM seems as capable of being a "replacement for Java" as Scala, and the momentum behind Scala is now unquestionable. While Scala is not a dynamic language, it has many of the characteristics of popular dynamic languages, through its rich and flexible type system, its sparse and clean syntax, and its marriage of functional and object paradigms. The supposed weakness of Scala as being "too complicated" or "too rich" can be countered by developing coding standards, creating more robust editors and tools, and by better educating polyglots on how best to use Scala. Scala represents the rebirth of static-typed languages on the JVM, and like JRuby it has also begun to stretch the capabilities of the platform in ways Java never could.

Among the secondary players I include language implementations like Jython, Clojure, and Rhino. While still in early days of adoption (or in Jython and Rhino's cases, early days of re-awakening), they represent similar aspects to the primary three languages. For purposes of discussion, we'll leave it at that for now.

In order for the platform to embrace these and many future languages, several things need to happen:
  • Ongoing work on these languages must be funded in such a way as to avoid product lock-in. Funding them by tying them to specific technologies will only fracture development communities, and likely damage any existing open-source contributor base.
  • There must be paid support offerings for these languages outside of specific products. For people interested in running JRuby on Rails, Grails, or Liftweb, there must be support channels they can follow, regardless of product bundling attempts. In the JRuby world, for example, we receive many requests for paid support, either in the form of hired guns to optimize an application or through targeting resources to critical bugs. And so far, we have been unable to offer such support.
  • We must continue the work started on the OpenJDK and MLVM projects. OpenJDK has enabled such projects as Shark (Hotspot on LLVM) and the MLVM (JVM extensions for non-Java language features), and we must see these efforts through to completion.
  • Finally, we need a "languages czar" who can coordinate the technical aspects of these various projects and help direct resources where needed. This would largely be a community-cultivating role, a sort of "open-source manager" who ensures that development efforts cooperate and aren't needlessly duplicated.
I believe it's absolutely vital that these tasks be met, or we risk the future of the platform entirely. And losing must not be an option, lest we fall back into proprietary alternatives controlled by a single entity.

More to come.

Monday, April 13, 2009

JRuby Moves to Git

We have successfully migrated JRuby development to Git!

The main repository is on the JRuby kenai.com project, but most folks will just want to clone the official mirror on Github. The mirror lags by no more than five minutes, and we'll eliminate that lag soon.

Kenai: git://kenai.com/jruby~main
Github: git://github.com/jruby/jruby.git

The Github repository is also attached to the official Github "jruby" user.

The CI server has already been updated to point at the new repository on kenai, and nightly builds have been running for several days. If you just want to grab a current snapshot, you can download one from GitHub or visit the nightly build page:

http://jruby.headius.com:8080/hudson/job/jruby-dist/

The old SVN repository on Codehaus is now defunct for JRuby development and will not be updated any more. We will likely remove the JRuby branches and trunk soon and replace them with a README pointing to the new repository locations.

Why Git?

We've known for a long time that we wanted to move to a distributed SCM, and had narrowed it down to Mercurial and Git.

For a long time Mercurial was the front-runner, partly because we were more familiar with it and partly because kenai.com, the site where we're moving JRuby's project hosting (and a JRuby on Rails-based site, btw), only supported Subversion and Mercurial.

But a few things changed our minds over the past couple months:
  • Kenai added git support
  • We realized that we'd get more Rubyists contributing if we had a mirror on Github
  • We became more comfortable with Git
Ultimately, the move to Git mostly came down to politics: Rubyists like Git better, and we're a Ruby-related project. Had we been Jython, we'd probably have chosen Mercurial.

Enjoy!

Thursday, April 2, 2009

How JRuby Makes Ruby Fast

At least once a year there's a maelstrom of posts about a new Ruby implementation with stellar numbers. These numbers are usually based on very early experimental code, and they are rarely accompanied by information on compatibility. And of course we love to see crazy performance numbers, so many of us eat this stuff up.

Posting numbers too early is a real disservice to any project, since they almost certainly don't represent the eventual real-world performance people will see. It encourages folks to look to the future, but it also marginalizes implementations that already provide both compatibility and performance, and ignores how much work it has taken to get there. Given how much we like to see numbers, and how thirsty the Ruby community is for "a fastest Ruby", I don't know whether this will ever change.

I thought perhaps a discussion about the process of optimizing JRuby might help folks understand what's involved in building a fast, compatible Ruby implementation, so that these periodic shootouts don't get blown out of proportion. Ruby can be fast, certainly even faster than JRuby is today. But getting there while maintaining compatibility is very difficult.

Performance Optimization, JRuby-style

The truth is it's actually very easy to make small snippets of Ruby code run really fast, especially if you optimize for the benchmark. But is it useful to do so? And can we extrapolate eventual production performance from these early numbers?

We begin our exploration by running JRuby in interpreted mode, which is the slowest way you can run JRuby. We'll be using the "tak" benchmark, since it's simple and easy to demonstrate relative performance at each optimization level.
# Takeuchi function performance, tak(24, 16, 8)
def tak x, y, z
  if y >= x
    return z
  else
    return tak( tak(x-1, y, z),
                tak(y-1, z, x),
                tak(z-1, x, y))
  end
end

require "benchmark"

N = (ARGV.shift || 1).to_i

Benchmark.bm do |make|
  N.times do
    make.report do
      i = 0
      while i<10
        tak(24, 16, 8)
        i+=1
      end
    end
  end
end

And here's our first set of results. I have provided Ruby 1.8.6 and Ruby 1.9.1 numbers for comparison.
Ruby 1.8.6p114:
➔ ruby bench/bench_tak.rb 5
      user     system      total        real
 17.150000   0.120000  17.270000 ( 17.585128)
 17.170000   0.140000  17.310000 ( 17.946869)
 17.180000   0.160000  17.340000 ( 18.234570)
 17.180000   0.150000  17.330000 ( 17.779536)
 18.790000   0.190000  18.980000 ( 19.560232)

Ruby 1.9.1p0:
➔ ruby191 bench/bench_tak.rb 5
      user     system      total        real
  3.570000   0.030000   3.600000 (  3.614855)
  3.570000   0.030000   3.600000 (  3.615341)
  3.560000   0.020000   3.580000 (  3.608843)
  3.570000   0.020000   3.590000 (  3.591833)
  3.570000   0.020000   3.590000 (  3.640205)

JRuby 1.3.0-dev, interpreted, client VM:
➔ jruby -X-C bench/bench_tak.rb 5
      user     system      total        real
 24.981000   0.000000  24.981000 ( 24.903000)
 24.632000   0.000000  24.632000 ( 24.633000)
 25.459000   0.000000  25.459000 ( 25.459000)
 29.122000   0.000000  29.122000 ( 29.122000)
 29.935000   0.000000  29.935000 ( 29.935000)

Ruby 1.9 posts some nice numbers here, and JRuby shows how slow it can be when doing no optimizations at all. The first change we'll look at, and one we recommend to any user seeking the best possible performance out of JRuby, is to use the JVM's "server" mode, which optimizes considerably better.
JRuby 1.3.0-dev, interpreted, server VM:
➔ jruby --server -X-C bench/bench_tak.rb 5
      user     system      total        real
  8.262000   0.000000   8.262000 (  8.192000)
  7.789000   0.000000   7.789000 (  7.789000)
  8.012000   0.000000   8.012000 (  8.012000)
  7.998000   0.000000   7.998000 (  7.998000)
  8.000000   0.000000   8.000000 (  8.000000)

The "server" VM differs from the default "client" VM in that it will optimistically inline code across calls and optimize the resulting code as a single unit. This obviously allows it to eliminate costly x86 CALL operations, but even more than that it allows optimizing algorithms which span multiple calls. By default, OpenJDK will attempt to inline up to 9 levels of calls, so long as they're monomorphic (only one valid target), not too big, and no early assumptions are changed by later code (like if a monomorphic call goes polymorphic later on). In this case, where we're not yet compiling Ruby code to JVM bytecode, this inlining is mostly helping JRuby's interpreter, core classes, and method-call logic. But already we're 3x faster than interpreted JRuby on the client VM.

The next optimization will be to turn on the compiler. I've modified JRuby for the next couple runs to *only* compile and not do any additional optimizations. We'll discuss those optimizations as I add them back.
JRuby 1.3.0-dev, compiled (unoptimized), server VM:
➔ jruby --server -J-Djruby.astInspector.enabled=false bench/bench_tak.rb 5
      user     system      total        real
  5.436000   0.000000   5.436000 (  5.376000)
  3.655000   0.000000   3.655000 (  3.655000)
  3.662000   0.000000   3.662000 (  3.662000)
  3.683000   0.000000   3.683000 (  3.683000)
  3.668000   0.000000   3.668000 (  3.668000)

By compiling, without doing any additional optimizations, we're able to improve performance 2x again. Because we're now JITing Ruby code to JVM bytecode, and the JVM eventually JITs JVM bytecode to native code, our Ruby code actually starts to benefit from the JVM's built-in optimizations. We're making better use of the system CPU and not making nearly as many calls as we would from the interpreter (since the interpreter is basically a long chain of calls for each low-level Ruby operation).
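To make the interpreter-versus-compiler difference concrete, here is a toy model in plain Ruby. None of these classes exist in JRuby.

The interpreter walks tak's syntax tree on every call, and each node's evaluate calls its children's evaluate. That is the "long chain of calls" described above. The "compiler" walks the tree once and turns it into nested lambdas. This is closure compilation, a stand-in for JRuby's real bytecode generation. Later calls no longer re-dispatch on node objects, though the lambdas still call each other, so this shows the structure rather than the JVM-level speedup.

```ruby
# Toy AST interpreter vs. closure "compiler" for tak. Illustration only;
# these node classes are hypothetical and unrelated to JRuby's AST.
class Lit
  def initialize(value); @value = value; end
  def evaluate(_env, _fns); @value; end
  def compile(_fns); value = @value; lambda { |_env| value }; end
end

class Var # argument slot: 0 = x, 1 = y, 2 = z
  def initialize(index); @index = index; end
  def evaluate(env, _fns); env[@index]; end
  def compile(_fns); index = @index; lambda { |env| env[index] }; end
end

class Sub
  def initialize(left, right); @left, @right = left, right; end
  def evaluate(env, fns); @left.evaluate(env, fns) - @right.evaluate(env, fns); end
  def compile(fns)
    l, r = @left.compile(fns), @right.compile(fns)
    lambda { |env| l.call(env) - r.call(env) }
  end
end

class GreaterEq
  def initialize(left, right); @left, @right = left, right; end
  def evaluate(env, fns); @left.evaluate(env, fns) >= @right.evaluate(env, fns); end
  def compile(fns)
    l, r = @left.compile(fns), @right.compile(fns)
    lambda { |env| l.call(env) >= r.call(env) }
  end
end

class If
  def initialize(cond, then_node, else_node)
    @cond, @then_node, @else_node = cond, then_node, else_node
  end
  def evaluate(env, fns)
    @cond.evaluate(env, fns) ? @then_node.evaluate(env, fns) : @else_node.evaluate(env, fns)
  end
  def compile(fns)
    c, t, e = @cond.compile(fns), @then_node.compile(fns), @else_node.compile(fns)
    lambda { |env| c.call(env) ? t.call(env) : e.call(env) }
  end
end

class Call
  def initialize(name, args); @name, @args = name, args; end
  def evaluate(env, fns)
    values = @args.map { |arg| arg.evaluate(env, fns) }
    fns[@name].evaluate(values, fns) # interpret the callee's body tree again
  end
  def compile(fns)
    name, args = @name, @args.map { |arg| arg.compile(fns) }
    # Look the callee up at call time so recursive calls resolve.
    lambda { |env| fns[name].call(args.map { |arg| arg.call(env) }) }
  end
end

X, Y, Z, ONE = Var.new(0), Var.new(1), Var.new(2), Lit.new(1)

# if y >= x then z else tak(tak(x-1, y, z), tak(y-1, z, x), tak(z-1, x, y))
TAK_BODY = If.new(
  GreaterEq.new(Y, X),
  Z,
  Call.new(:tak, [Call.new(:tak, [Sub.new(X, ONE), Y, Z]),
                  Call.new(:tak, [Sub.new(Y, ONE), Z, X]),
                  Call.new(:tak, [Sub.new(Z, ONE), X, Y])]))

AST_FNS = { :tak => TAK_BODY }
COMPILED_FNS = {}
COMPILED_FNS[:tak] = TAK_BODY.compile(COMPILED_FNS) # "compile" once, up front

def interpreted_tak(x, y, z)
  TAK_BODY.evaluate([x, y, z], AST_FNS)
end

def compiled_tak(x, y, z)
  COMPILED_FNS[:tak].call([x, y, z])
end
```

You can wrap both methods in Benchmark.bm, as in the tak script above, to compare them on your machine. Results will vary, and a lambda tree is still far from real JVM bytecode.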

Next, we'll turn on the simplest and oldest JRuby compiler optimization, "heap scope elimination".
JRuby 1.3.0-dev, compiled (heap scope optz), server VM:
➔ jruby --server bench/bench_tak.rb 5
      user     system      total        real
  4.014000   0.000000   4.014000 (  3.942000)
  2.776000   0.000000   2.776000 (  2.776000)
  2.760000   0.000000   2.760000 (  2.760000)
  2.769000   0.000000   2.769000 (  2.769000)
  2.768000   0.000000   2.768000 (  2.769000)

The "heap scope elimination" optimization eliminates the use of an in-memory store for local variables. Instead, when there's no need for local variables to be accessible outside the context of a given method, they are compiled as Java local variables. This allows the JVM to put them into CPU registers, making them considerably faster than reading or writing them from/to main memory (via a cache, but still slower than registers). This also makes JRuby ease up on the JVM's memory heap, since it no longer has to allocate memory for those scopes on every single call. This now puts us comfortably faster than Ruby 1.9, and it represents the set of optimizations you see in JRuby 1.2.0.
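Here is a toy model of what "heap scope elimination" removes. HeapScope is a hypothetical stand-in, not JRuby's scope class. Plain Ruby can't put anything in registers, so this models only the per-call allocation and indirection.

In the first method, every local lives in an object allocated on each call. The second is what becomes possible once the compiler knows nothing outside the method needs the locals (for example, no closure captures them and no eval or binding can see them).

```ruby
# Toy model of a heap-allocated local variable scope (illustration only).
class HeapScope
  def initialize(size); @slots = Array.new(size); end
  def get(index); @slots[index]; end
  def set(index, value); @slots[index] = value; end
end

# Every local read/write goes through a scope allocated on every call.
def tak_with_heap_scope(x, y, z)
  scope = HeapScope.new(3)
  scope.set(0, x); scope.set(1, y); scope.set(2, z)
  return scope.get(2) if scope.get(1) >= scope.get(0)
  tak_with_heap_scope(
    tak_with_heap_scope(scope.get(0) - 1, scope.get(1), scope.get(2)),
    tak_with_heap_scope(scope.get(1) - 1, scope.get(2), scope.get(0)),
    tak_with_heap_scope(scope.get(2) - 1, scope.get(0), scope.get(1)))
end

# After the optimization: plain locals, which on the JVM become Java locals
# the JIT is free to keep in CPU registers.
def tak_with_locals(x, y, z)
  return z if y >= x
  tak_with_locals(tak_with_locals(x - 1, y, z),
                  tak_with_locals(y - 1, z, x),
                  tak_with_locals(z - 1, x, y))
end
```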

Is this the best we can do? No, we can certainly do more, and some such experimental optimizations are actually already underway. Let's continue our exploration by turning on another optimization similar to the previous one: "backtrace-only frames".
JRuby 1.3.0-dev, compiled (heap scope + backtrace frame optz), server VM:
➔ jruby --server -J-Djruby.compile.frameless=true bench/bench_tak.rb 5
      user     system      total        real
  3.609000   0.000000   3.609000 (  3.526000)
  2.600000   0.000000   2.600000 (  2.600000)
  2.602000   0.000000   2.602000 (  2.602000)
  2.598000   0.000000   2.598000 (  2.598000)
  2.602000   0.000000   2.602000 (  2.602000)

Every Ruby call needs to store information above and beyond local variables. There's the current "self", the current method visibility (used for defining new methods), which class is currently the "current" one, backref and lastline values ($~ and $_), backtrace information (caller's file and line), and some other miscellany for handling long jumps (like return or break in a block). In most cases, this information is not used, and so storing it and pushing/popping it for every call wastes precious time. In fact, other than backtrace information (which needs to be present to provide Ruby-like backtrace output), we can turn most of the frame data off. This is where we start to break Ruby a bit, though there are ways around it. But you can see we get another small boost.
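Here is a rough model of the per-call frame work described above. The struct and method names are hypothetical, not JRuby's internal classes.

A "full" frame carries self, visibility, the current class, and the file and line. A "backtrace-only" frame keeps just the file and line. Either way, every call pays for a push and a pop, which is the cost the next step removes entirely.

```ruby
# Toy model of JRuby-style call frames (illustration only; hypothetical names).
FullFrame      = Struct.new(:self_obj, :visibility, :current_class, :file, :line)
BacktraceFrame = Struct.new(:file, :line)

# One frame stack per thread.
def frame_stack
  Thread.current[:toy_frame_stack] ||= []
end

# Push a frame around a call and always pop it, even if the body raises.
def with_frame(frame)
  frame_stack.push(frame)
  yield
ensure
  frame_stack.pop
end

# A Ruby-only backtrace built from the frame stack, innermost call first.
def toy_backtrace
  frame_stack.reverse.map { |f| "#{f.file}:#{f.line}" }
end

# tak paying for a full frame on every call.
def tak_full_frames(x, y, z)
  with_frame(FullFrame.new(self, :public, Object, __FILE__, __LINE__)) do
    if y >= x
      z
    else
      tak_full_frames(tak_full_frames(x - 1, y, z),
                      tak_full_frames(y - 1, z, x),
                      tak_full_frames(z - 1, x, y))
    end
  end
end

# tak keeping only what a Ruby-style backtrace needs.
def tak_backtrace_frames(x, y, z)
  with_frame(BacktraceFrame.new(__FILE__, __LINE__)) do
    if y >= x
      z
    else
      tak_backtrace_frames(tak_backtrace_frames(x - 1, y, z),
                           tak_backtrace_frames(y - 1, z, x),
                           tak_backtrace_frames(z - 1, x, y))
    end
  end
end
```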

What if we eliminate frames entirely and just use the JVM's built-in backtrace logic? It turns out that having any pushing/popping of frames, even with only backtrace data, still costs us quite a bit of performance. So let's try "heap frame elimination":
JRuby 1.3.0-dev, compiled (heap scope + heap frame optz), server VM:
➔ jruby --server -J-Djruby.compile.frameless=true bench/bench_tak.rb 5
      user     system      total        real
  2.955000   0.000000   2.955000 (  2.890000)
  1.904000   0.000000   1.904000 (  1.904000)
  1.843000   0.000000   1.843000 (  1.843000)
  1.823000   0.000000   1.823000 (  1.823000)
  1.813000   0.000000   1.813000 (  1.813000)

By eliminating frames entirely, we're a good 33% faster than the fastest "fully framed" run you'd get with stock JRuby 1.2.0. You'll notice the command line here is the same; that's because we're venturing into more and more experimental code, and in this case I've actually forced "frameless" to be "no heap frame" instead of "backtrace-only heap frame". And what do we lose with this change? We no longer would be able to produce a backtrace containing only Ruby calls, so you'd see some JRuby internals in the trace, similar to how Rubinius shows Rubinius internals. But we're getting respectably fast now.

Next up we'll turn on some optimizations for math operators.
JRuby 1.3.0-dev, compiled (heap scope, heap frame, fastops optz), server VM:
➔ jruby --server -J-Djruby.compile.frameless=true -J-Djruby.compile.fastops=true bench/bench_tak.rb 5
      user     system      total        real
  2.291000   0.000000   2.291000 (  2.225000)
  1.335000   0.000000   1.335000 (  1.335000)
  1.337000   0.000000   1.337000 (  1.337000)
  1.344000   0.000000   1.344000 (  1.344000)
  1.346000   0.000000   1.346000 (  1.346000)

Most of the time, when calling + or - on an object, we do the full Ruby dynamic dispatch cycle. Dispatch involves retrieving the target object's metaclass, querying for a method (like "+" or "-"), and invoking that method with the appropriate arguments. This works fine for getting us respectable performance, but we want to take things even further. So JRuby has experimental "fast math" operations to turn most Fixnum math operators into static calls rather than dynamic ones, allowing most math operations to inline directly into the caller. And what do we lose? This version of "fast ops" makes it impossible to override Fixnum#+ and friends, since whenever we call + on a Fixnum it's going straight to the code. But it gets us another nearly 30% improvement.
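The trade-off is easier to see in a toy model. Ruby's own + is itself dynamically dispatched, so this sketch models dispatch with an explicit method table rather than trying to time it. ToyClass and the helper methods are hypothetical, not JRuby internals.

Dynamic dispatch looks the method up on every call, so a redefinition is seen immediately. The "fast op" path skips the lookup for two Integers, so the redefinition is silently ignored.

```ruby
# Toy model of dynamic dispatch vs. JRuby-style "fast ops" (illustration only).
class ToyClass
  def initialize
    @method_table = {}
  end

  def define(name, &body)
    @method_table[name] = body
  end

  def lookup(name)
    @method_table.fetch(name) { raise NoMethodError, "undefined method #{name}" }
  end
end

TOY_FIXNUM = ToyClass.new
TOY_FIXNUM.define(:+) { |recv, arg| recv + arg }
TOY_FIXNUM.define(:-) { |recv, arg| recv - arg }

# The toy only models Integer receivers.
def toy_class_of(obj)
  raise TypeError, "toy only models Integer" unless obj.is_a?(Integer)
  TOY_FIXNUM
end

# Full dynamic dispatch: find the receiver's class, look the method up,
# invoke it. A redefinition takes effect on the very next call.
def dynamic_send(recv, name, arg)
  toy_class_of(recv).lookup(name).call(recv, arg)
end

# "Fast op": for two Integers, skip the lookup and do the math directly.
# Cheaper (and inlinable on the JVM), but ignores any redefinition of :+.
def fast_plus(recv, arg)
  if recv.is_a?(Integer) && arg.is_a?(Integer)
    recv + arg
  else
    dynamic_send(recv, :+, arg)
  end
end
```

If you redefine the toy + (for example, TOY_FIXNUM.define(:+) { |a, b| a - b }), dynamic_send picks up the change, but fast_plus keeps adding. That is the compatibility cost of fast ops, in miniature.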

Up to now we've still also been updating a lot of per-thread information. For every line, we're tweaking a per-thread field to say what line number we're on. We're also pinging a set of per-thread fields to handle the unsafe "kill" and "raise" operations on each thread...basically we're checking to see if another thread has asked the current one to die or raise an exception. Let's turn all that off:

JRuby 1.3.0-dev, compiled (heap scope, heap frame, fastops, threadless, positionless optz), server VM:
➔ jruby --server -J-Djruby.compile.frameless=true -J-Djruby.compile.fastops=true -J-Djruby.compile.positionless=true -J-Djruby.compile.threadless=true bench/bench_tak.rb 5
      user     system      total        real
  2.256000   0.000000   2.256000 (  2.186000)
  1.304000   0.000000   1.304000 (  1.304000)
  1.310000   0.000000   1.310000 (  1.310000)
  1.307000   0.000000   1.307000 (  1.307000)
  1.301000   0.000000   1.301000 (  1.301000)

We get a small but measurable performance boost from this change as well.
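The post doesn't show how JRuby implements this bookkeeping. The following is a hypothetical model of the idea: on every "line", compiled code stores the current position and polls for kill/raise requests from other threads. "Positionless" drops the store, and "threadless" drops the poll. Only the raise case is modeled, and all names are invented for illustration.

```ruby
require "thread"

# Toy per-thread context (hypothetical; not JRuby's ThreadContext).
class ToyThreadContext
  attr_accessor :line

  def initialize
    @line = 0
    @pending = Queue.new # thread-safe, so any thread may enqueue requests
  end

  # Another thread asks this one to raise, roughly what Thread#raise needs.
  def request_raise(exception)
    @pending << exception
  end

  # A safe point: the running thread checks for pending requests.
  def poll
    raise @pending.pop unless @pending.empty?
  end
end

def count_with_bookkeeping(ctx, limit)
  i = 0
  while i < limit
    ctx.line = __LINE__ # what "positionless" compiles away
    ctx.poll            # what "threadless" compiles away
    i += 1
  end
  i
end

# Both removed: less work per iteration, but line info goes stale and a
# request_raise from another thread is never noticed inside this loop.
def count_without_bookkeeping(limit)
  i = 0
  i += 1 while i < limit
  i
end
```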

The experimental optimizations up to this point (other than threadless) comprise the set of options for JRuby's --fast option, shipped in 1.2.0. The --fast option additionally tries to statically inspect code to determine whether these optimizations are safe. For example, if you're running with --fast but still access backrefs, we're going to create a frame for you anyway.
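JRuby's inspector works on its own AST, and the post doesn't list everything it checks. As a very rough, hypothetical approximation, this sketch uses the Ripper lexer from the standard library of current Ruby. It flags source that reads $~ or $_, uses backrefs like $1 or $&, or uses =~, which sets $~ in the caller's frame. It is not exhaustive: String#sub and friends also set $~, and this check would miss them.

```ruby
require "ripper"

# Hypothetical, rough check: does this source touch frame-local state?
FRAME_LOCAL_GLOBALS = %w[$~ $_].freeze

def needs_frame?(source)
  Ripper.lex(source).any? do |_pos, type, token|
    type == :on_backref ||                                         # $&, $1, ...
      (type == :on_gvar && FRAME_LOCAL_GLOBALS.include?(token)) || # $~, $_
      (type == :on_op && token == "=~")                            # sets $~
  end
end
```

A method that returns true here would keep its frame under this scheme; anything else could be compiled frameless.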

We're not done yet. I mentioned earlier the JVM gets some of its best optimizations from its ability to profile and inline code at runtime. Unfortunately in current JRuby, there's no way to inline dynamic calls. There's too much plumbing involved. The upcoming "invokedynamic" work in Java 7 will give us an easier path forward, making dynamic calls as natural to the JVM as static calls, but of course we want to support Java 5 and Java 6 for a long time. So naturally, I have been maintaining an experimental patch that eliminates most of that plumbing and makes dynamic calls inline on Java 5 and Java 6.

JRuby 1.3.0-dev, compiled ("--fast", dyncall optz), server VM:
➔ jruby --server --fast bench/bench_tak.rb 5
      user     system      total        real
  2.206000   0.000000   2.206000 (  2.066000)
  1.259000   0.000000   1.259000 (  1.259000)
  1.258000   0.000000   1.258000 (  1.258000)
  1.269000   0.000000   1.269000 (  1.269000)
  1.270000   0.000000   1.270000 (  1.270000)

We improve again by a small amount, always edging the performance bar higher and higher. In this case, we don't lose compatibility; we lose stability. The inlining modification breaks method_missing and friends, since I have not yet modified the call pipeline to support both inlining and method_missing. And there's still a lot of extra overhead here that can be eliminated. But in general we're still mostly Ruby, and even with this change you can run a lot of code.
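
Why would inlining break method_missing? Ruby falls back to method_missing only when a lookup fails, so any call path that goes straight to a resolved method body has to reproduce that fallback itself. The sketch below contrasts normal dispatch with a hypothetical table_only_call that, like an inlining path without the fallback wired in, only consults the method table.

```ruby
# Ruby falls back to method_missing when a method lookup fails.
class Ghost
  def method_missing(name, *args)
    if name.to_s.start_with?("say_")
      "ghost says #{name.to_s.delete_prefix('say_')}"
    else
      super
    end
  end
end

# Normal dispatch: the undefined method lands in method_missing.
normal = Ghost.new.say_boo # "ghost says boo"

# Table-only dispatch (hypothetical): look the method up and call it
# directly, with no method_missing fallback.
def table_only_call(recv, name)
  recv.class.instance_method(name).bind(recv).call
rescue NameError
  :no_method_found
end

broken = table_only_call(Ghost.new, :say_boo) # :no_method_found
```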

This represents the current state of JRuby. I've taken you all the way from slow, compatible execution, through fast, compatible execution, and all the way to faster, less-compatible execution. There's certainly a lot more we can do, and we're not yet as fast as some of the incomplete experimental Ruby VMs. But we run Ruby applications, and that's no small feat. We will continue making measured steps, always ensuring compatibility first so each release of JRuby is more stable and more complete than the last. If we don't immediately leap to the top of the performance heap, there are always good reasons for it.

Performance Optimization, Duby-style

As a final illustration, I want to show the tak performance for a language that looks like Ruby, and tastes like Ruby, but boasts substantially better performance: Duby.

def tak(x => :fixnum, y => :fixnum, z => :fixnum)
  unless y < x
    z
  else
    tak( tak(x-1, y, z),
         tak(y-1, z, x),
         tak(z-1, x, y))
  end
end

puts "Running tak(24,16,8) 1000 times"

i = 0
while i<1000
  tak(24, 16, 8)
  i+=1
end

This is the Takeuchi function written in Duby. It looks basically like Ruby, except for the :fixnum type hints in the signature. Here's a timing of the above script (which calls tak(24,16,8) as before, but 1000 times instead of 5), running on the server JVM:

➔ time jruby -J-server bin/duby examples/tak.duby
Running tak(24,16,8) 1000 times

real 0m13.657s
user 0m14.529s
sys 0m0.450s

So what you're seeing here is that Duby can run "tak(24,16,8)", the same function we tested in JRuby above, in an average of about 0.014 seconds (and that total includes JVM startup)--nearly two orders of magnitude faster than the fastest JRuby optimizations above and at least an order of magnitude faster than the fastest incomplete, experimental implementations of Ruby. What does this mean? Absolutely nothing, because Duby is not Ruby. But it shows how fast a Ruby-like language can get, and it shows there's a lot of runway left for JRuby to optimize.
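
The back-of-the-envelope arithmetic behind that comparison, using only the numbers shown above: 13.657 seconds of wall-clock time for 1000 calls, against the best steady-state --fast dyncall iteration of 1.258 seconds per call. Because the Duby total includes JVM startup, the per-call figure is an upper bound, and the speedup a lower bound.

```ruby
# Numbers taken from the timings above.
duby_total     = 13.657 # seconds for 1000 calls (wall clock, includes JVM startup)
duby_per_call  = duby_total / 1000
jruby_per_call = 1.258  # best steady-state "--fast", dyncall iteration

speedup = jruby_per_call / duby_per_call
orders  = Math.log10(speedup)

puts format("Duby per call: %.4f s", duby_per_call) # Duby per call: 0.0137 s
puts format("speedup: %.0fx (%.2f orders of magnitude)", speedup, orders)
# speedup: 92x (1.96 orders of magnitude)
```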

Be a (Supportive) Critic!

So the next time someone posts an article with crazy-awesome performance numbers for a Ruby implementation, by all means applaud the developers and encourage their efforts, since they certainly deserve credit for finding new ways to optimize Ruby. But then ask yourself and the article's author how much of Ruby the implementation actually supports, because it makes a big difference.

Update, April 4: Several people told me I didn't go quite far enough in showing how much performance you can gain by breaking Ruby. And after enough cajoling, I was convinced to post one last modification: recursion optimization.

JRuby 1.3.0-dev, compiled ("--fast", dyncall optz, recursion optz), server VM:
➔ jruby --server --fast bench/bench_tak.rb 5
      user     system      total        real
  0.524000   0.000000   0.524000 (  0.524000)
  0.338000   0.000000   0.338000 (  0.338000)
  0.325000   0.000000   0.325000 (  0.325000)
  0.299000   0.000000   0.299000 (  0.299000)
  0.310000   0.000000   0.310000 (  0.310000)

Whoa! What the heck is going on here? In this case, JRuby's compiler has been hacked to turn recursive "functional calls", i.e. calls to an implicit "self" receiver, into direct calls. The logic behind this is that if you're calling the current method from the current method, you're always going to dispatch back to the same piece of code...so why do all the dynamic call gymnastics? This fits the last piece into the JVM inlining-optimization puzzle, allowing mostly-recursive benchmarks like Takeuchi to inline more of those recursive calls. What do we lose? Well, I'm not sure yet. I haven't done enough testing of this optimization to know whether it breaks Ruby in some subtle way. It may work for 90% of cases, but fail for an undetectable 10%. Or it may be something we can determine statically, or something for which we can add an inexpensive guard. Until I know, it won't go into a release of JRuby, at least not as a default optimization. But it's out there, and I believe we'll find a way.

It is also, incidentally, only a few times slower than a pure Java version of the same benchmark, provided Java is using all boxed numerics too.
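
One case a guard for the recursion optimization would have to handle is a subclass overriding the recursive method. In Ruby, the implicit-self call inside the superclass body dispatches on the receiver, so it reaches the override. The sketch below simulates a "direct" recursive call with a fixed UnboundMethod; steps_direct and STEPS_DIRECT are hypothetical stand-ins for the compiler hack, not how JRuby implements it. The two results diverge as soon as an override is involved.

```ruby
class Countdown
  # Ordinary Ruby: the recursive call dispatches dynamically on self.
  def steps(n)
    n.zero? ? [] : [n] + steps(n - 1)
  end

  # Simulated direct recursion: always re-enter this exact method body,
  # regardless of the receiver's actual class.
  def steps_direct(n)
    n.zero? ? [] : [n] + STEPS_DIRECT.bind(self).call(n - 1)
  end
  STEPS_DIRECT = instance_method(:steps_direct)
end

class ShortCountdown < Countdown
  # Overrides that stop the countdown early.
  def steps(n)
    n <= 2 ? [:stop] : super
  end

  def steps_direct(n)
    n <= 2 ? [:stop] : super
  end
end

dynamic = ShortCountdown.new.steps(4)        # [4, 3, :stop]: override honored
direct  = ShortCountdown.new.steps_direct(4) # [4, 3, 2, 1]: override skipped
```

For a plain Countdown both versions agree, which is why a benchmark like tak sees no difference.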