Headius: April 2008

Tuesday, April 29, 2008

Apple Chose...Poorly

So after a long wait, Apple finally released Java 6 for OS X. It's for Leopard only. And Leopard is apparently going to be the only Java-supporting OS without a 32-bit Java 6.

I can accept that it only runs in Leopard. They're moving forward, and with Java shipped as part of the OS it's a lot more hassle to backport it to Tiger. Plus, I've got Leopard so I'd be fine.

But the 32-bit thing really burns me.

It's not like there's only 64-bit Javas out there and Apple would have to do all the heavy lifting to support them on 32-bit machines. The vast majority of installed Java distributions are the 32-bit versions, and Sun ships 32 and 64-bit JDKs for both Linux and Windows. Hell, Landon Fuller even got the FreeBSD patchset JDK 6 to successfully build on Mac. It's missing the "last mile" of OS X integration like a Cocoa UI (X11 only right now) and sound support, but hell, it's there and it runs. So it's not even the JVM bits standing in their way.

Could it possibly be the OS integration? I don't buy that. Unless there's some serious problem with their libraries on 32 versus 64-bit systems, it oughta be a recompile. Even if it's a little more work than that, there's a lot of 32-bit Intel Macs out there.

Of course the fanboys are just going to tell me "welcome to the club." Yes, I know Apple regularly holds back features to encourage people to upgrade OS or hardware. And this is probably one of those cases, since there certainly doesn't seem to be a good technical reason for it. But seriously...this one just seems dumb, since they could have put the same bits in Landon's port and essentially had it working.

Maybe I'm too naive. Is this just standard operating procedure at Apple? Anything we can do to convince them?

The Rubyists are Wrong

There's something that's been bugging me for a long time that I need to get off my chest. Some of you may hate me for it, but perhaps there are others out there with the same complaint, silently in agony, wishing for death to take the pain away. It's time to set the record straight, and prove once and for all that the Rubyists are wrong.

Rubies are almost NEVER cut like this:

The cut shown here is what's called a "brilliant" cut (though it's not faceted enough...artistic license), or more specifically a "round brilliant". Brilliant is typically a diamond cut, and something like 75% of the world's diamonds are cut this way. The shape and angles of the facets are all mathematically designed to refract as much light as possible out the top of the diamond, resulting in the "brilliant" sparkling you see. This is possible because the most popular diamonds are CLEAR. Get it? It's clear so light passes through it. It's not RED. So it's cut in a way that takes advantage of it being CLEAR.

Rubies, on the other hand, are generally cut into ovals or "cushions" but also into some other cuts like "emeralds" (the most common cut for emeralds, in case it wasn't obvious), rectangles, or hearts (ugh). If they show an asterism (a "star" of four or six points due to the ruby's crystal structure) they're usually cut into cabochons, which are shaped but not faceted. Rubies are NOT typically cut into "brilliant" shapes.

See that one in the JRuby logo? That was a public-domain SVG graphic of a diamond that Tom Enebo colored red. A DIAMOND. Oh yeah, and we're wrong too. BTW, here's the Wikipedia article on ruby (the gemstone). Search for "brilliant". Yeah, I didn't think so.

Even the Ruby Association (of whose board Matz himself is chairman) has made the mistake of choosing this brilliant-shaped logo.

We can't really tell how the ruby in the RubyForge logo is cut, since it seems to just be a red hexagon. But I bet it's a hexagonal BRILLIANT cut again.

I don't even know what that is. It's like a brilliant cut with a dome on top of it. Maybe it was designed that way so you could fit "Ruby Inside" inside it. But it's definitely not something you're going to see in a jewelry store.

And there are lots more examples. Check your favorite Ruby project. If they have a ruby in the logo, it's probably a "brilliant" cut. And I think that proves once and for all that we Rubyists aren't as brilliant as we think.

Update: It turns out the Pythonistas are wrong too. Is there no sanity left in the world?

Sunday, April 27, 2008

Promise and Peril for Alternative Ruby Impls

My how things have changed in a couple short years.

Two years ago, in 2006, there were essentially two viable Ruby implementations: Matz's Ruby 1.8.x codebase, and JRuby. At the time, JRuby was just barely starting to run Rails. I consider that a sort of "singularity" in the lifetime of an implementation, the inflection point at which it becomes more than a toy. (OT: these days, I consider the ability to run Rails faster than Matz's Ruby a better inflection point, but we've had the Rails thing going for two years). Cardinal (Ruby on Parrot) was mostly dead, or at least on its way to being dead. YARV, eventually to become the Ruby 1.9 VM, was perhaps only half completed and was not yet officially marked to be the next Ruby. Rubinius had not really been started, or at least had not officially been named and could not be considered anywhere near viable. IronRuby was still Wilco Bauer's IronRuby, a doomed codebase and project name eventually to be adopted by Microsoft's later Ruby implementation effort. So there was very little competition, and most people still considered JRuby to be a big joke. Ha ha.

Fast forward to Spring 2008. Ruby 1.8.x has mostly been put in maintenance mode, but remains by far the most widely-deployed Ruby implementation, despite its relatively poor performance. Largely, this is because only Ruby 1.8.x is 100% compatible with Ruby 1.8.x, and because it's already packaged and shipped on a number of OSes, including Rubyist favorite OS X. But the rest of the field has become a lot more muddled. There are now around six implementations that have past the Rails inflection point or will soon, a handful of others likely to fade into obscurity (after contributing their own genetics to the Ruby ecosystem, surely), a few mysterious up-and-comers...and Cardinal is still dead.

Let's review the promise, peril, and status of all the implementations. Note, this is largely a mix of facts and my opinions. Corrections for the facts are welcome. Corrections for the opinions...well...let's take it offline.

Ruby 1.8

Matz's venerable code base (usually called MRI for "Matz's Ruby Interpreter", MatzRuby or Ruby 1.8 in this article) still has a death-grip on the Ruby world. For 99% of Ruby developers, MatzRuby is still the king of the hill. This is in the face of poor relative performance (slower than JRuby and 1.9 for sure, and slower than most of the others for many cases), poor memory management (conservative, non-compacting GC), and many upstart implementations which solve these problems. Why is this?

It's not hard to answer. It's compatibility and status quo. If all my apps run fine on MatzRuby, and MatzRuby is already installed, and I'm satisfied with the performance characteristics of MatzRuby...why would I run anything else? Because MatzRuby is largely not *that bad* for most uses, especially as relates to simple system scripting work, it's unlikely to go away any time soon. And since all but one of the alternative impls is targeting Ruby 1.8 features and compatibility, the still-in-development Ruby 1.9 is not gaining much traction yet (which is probably a good thing).

You should all know the specs and status for MatzRuby. Current official release is 1.8.6 patchlevel 114. There's a 1.8.7 preview 2 out now that backports a whole bunch of features from 1.9 and breaks some compatibility. The jury's still out on whether it will go final as-is. MatzRuby is a simple AST-walking interpreter, with a conservative GC, minimal/cumbersome Unicode support, and a large library of third-party native extensions. MatzRuby is still the gold standard for what Ruby 1.8 "is".

The peril for Ruby 1.8 right now involves keeping that 99% of Ruby users interested in Ruby while 1.9 bakes without breaking compatibility. Ruby 1.8.7 pre1 introduced some Ruby 1.9 features but also broke a bunch of stuff (most notably Rails), and pre2 doesn't pass the specs 1.8.6 does. We had a design meeting last week where it was decided that we folks working on the Ruby specs need to help the Ruby core team get involved, so they can start running the full suite as part of their development process. That's going to happen soon, but until it does 1.8 releases can break basically anything from day today and nobody will know about it.

Beyond compatibility and keeping the masses happy, Ruby 1.8 could use a little performance, scaling, and memory lovin' too. Unfortunately almost all that effort is going toward Ruby 1.9 now, leaving the vast majority of Ruby users stuck on one of the slowest implementations. That's good for us alternative implementers, since it means we're gaining users every day; but it's not good for the MatzRuby lineage because they're losing mindshare. It's hard to deny that the future of Ruby lies with the "excellent" implementations, if not with the "best" ones, and the definition of "excellent" is moving forward every day. Ruby 1.8 is not.

Ruby 1.9

Ruby 1.9 is the merging of the Ruby 1.8 class library and memory model with a large number of new features and a bytecode-based execution engine. It represents the work of Koichi Sasada, who first announced his YARV ("Yet Another Ruby VM") project at RubyConf 2004. YARV took a bit longer than he and many others expected to be completed, but as a result of his tireless efforts it is now the official Ruby 1.9 VM.

Ruby 1.9 introduces many new features, like a character-aware (Unicode and any other encoding) String implementation, Enumerator/Enumerable enhancements, and numerous refinements and additions to the rest of the class library I won't attempt to list here. Ruby 1.9 is defining, in essence, what the rest of the implementations will soon have to implement. For all the debates, most of the additions in Ruby 1.9 have been well-received by the community, though the exposure level is still extremely low.

Ruby 1.9 has, I believe, reached the Rails singularity. With some work over the past few months, Rails has moved closer and closer to running on Ruby 1.9. Last I heard, there was only one bug that needed patching in the 1.9 codebase for Rails trunk to run unmodified. Expect to see an announcement about Ruby 1.9 and Rails at RailsConf next month.

The interesting thing about Ruby 1.9 is that it will mark only the second non-MatzRuby implementation to run Rails, JRuby being the first. This is due in large part to the massive effort required to implement a bytecode VM for Ruby (1.9 was only just released this past December), and to the fact that Ruby 1.9 is still very much a moving target. APIs are being added and refined, optimizations are being tossed about at the VM level, and memory and GC improvements are being considered. So while it's unlikely that anyone will be moving Rails apps to Ruby 1.9 in the near future, Ruby 1.9 is certainly viable...and represents the most-likely future evolution of the Ruby language itself.

What worries me about Ruby 1.9 is that its performance doesn't seem "better enough" to change the future of Ruby. While Koichi's own benchmarks show it's much faster than Ruby 1.8, on general application benchmarks it's usually less than a 50% improvement. Many times, JRuby is able to exceed Ruby 1.9 performance, even without similar optimizations and feature removals. It is certainly a "better" performance story than Ruby 1.8, but is it enough?

Ruby 1.9 also took the first steps toward concurrency by making threads native, but it encumbered them with a giant lock a la Python. That means you still can't get concurrent execution of Ruby code on Ruby 1.9, something JRuby's been able to do all along, and you can't scale Ruby 1.9 any better on wide systems than you could with Ruby 1.8. There's plans to solve this by adding fine-grained locks to most internal data structures, but that's a hard problem to solve on par with JRuby 2.0 challenges I'll talk about in a minute. And without something like the JVM to really optimize that locking, performance will take a hit.

It's also unclear if people really *want* all of what Ruby 1.9 has to offer. Sure, people love the idea of a real encoding-aware String, but response to the rest of what Ruby 1.9 offers--including performance--has been a collective "meh." People are not flocking to Ruby 1.9 in droves, and many are contemplating whether their future Ruby work will simply be a lateral move to one of the 1.8-compatible implementations. And on the JRuby project, we've received almost no requests to implement Ruby 1.9 features, so we've only added a few tiny ones. Whither Ruby 2.0?

JRuby

Ahh, JRuby. How you have changed my life.

JRuby is a Java-based implementation of Ruby, or if you prefer not to speak the word "Java", it's Ruby for the JVM. JRuby was started in 2002 by Jan Arne Petersen, and though it had a couple good years of activity it never really got to a compatible-enough level to run real Ruby applications. Jan Arne moved on at some point and efforts were largely picked up by Thomas Enebo, current co-lead of JRuby. He was especially active when JRuby was being updated from Ruby 1.6 compatibility to Ruby 1.8.4 compatibility, a task which today is largely complete. I joined the project in fall 2004 after attending RubyConf 2004, and at the time I did not know Ruby. Now I know Ruby in a deeper way than I ever really wanted to...but that's a discussion for another day. I wasn't a hugely active contributor until late 2005, when I started working on a new interpreter and refactoring JRuby internals. I presented JRuby for the first time at RubyConf 2005, and then in early 2006 milestones started dropping like flies: IRB ran, then RubyGems, then Rake...and then Rails. We were still dog slow at the time, but we were viable.

JRuby reached the Rails singularity in time for JavaOne 2006, an event that led Sun to hire Tom and me in fall 2006. We demonstrated on stage JRuby running a simple Rails application. It was cobbled together, running under either Tomcat or simply WEBrick at the time, but we had proved it was possible to have an alternative Ruby implementation compatible enough to run Rails. JRuby was no longer a joke.

Over the past two years (man, has it really been two years?) we've essentially rewritten almost all of JRuby a piece at a time. We've been through three interpreters, one prototype compiler and one complete compiler, multiple Regexp engines, and at least two implementations of the key core classes. I wrote the compiler for JRuby during summer 2007, completing it around RailsConf EU 2007. My first compiler! We now run faster than Ruby 1.8 in both interpreted and compiled modes, with interpreted being perhaps 15-20% faster and compiled being at least a few times faster, generally on par with Ruby 1.9. Of course since we're based on the JVM, we share its object model, garbage collector, binary representation. So JRuby is certainly a "mini-VM" but we leave the nasty bits to JVM implementers to handle. Pragmatism, friends, pragmatism.

Perhaps the most notable result of JRuby's existence is that there are now so many Ruby implementations. If we had not shown the promise, many of the others might not have risked the peril. Oh, and we've ended up with a cracker-jack implementation of Ruby on the JVM too...I suppose that's worth a little something.

Perils...always perils. JRuby has managed to surmount most of the perils that await other implementations. And being on the other side of the chasm, I can tell you now it doesn't get easier.

Compatibility is *hard*. I'm not talking a little hard, I'm talking monumentally hard. Ruby is a very flexible, complicated language to implement, and it ships with a number of very flexible, complicated core class implementations. Very little exists in the way of specifications and test kits, so what we've done with JRuby we've done by stitching together every suite we could find. And after all this time, we still have known bugs and weekly reports of minor incompatibilities. I don't think an alternative implementation can ever truly become "compatible" as much as "more compatible". We're certainly the most compatible alternative impl, and even now we've got our hands full fixing bugs. Then there's Ruby 1.9 support, coming up probably in JRuby 1.2ish. Another adventure.

Performance is also hard, but maybe not *hard* hard. JRuby is lucky to run on one of the fastest VMs in existence. The JVM, in its many incarnations, has been so refined and the JVM implementation arena so competitive that we get a lot of performance for free. But by "for free" I mean Java performance. Java's far easier to write and maintain than C, and on the JVM we know we won't pay a performance penalty for not writing C code. But making Ruby fast on the JVM is where it gets tricky. JVMs are optimized for Java and Java-like languages. JRuby has to include all sorts of tricks and subsystems to make the runtime and compiled Ruby code "feel" a bit more like Java to the JVM. This has involved, in many cases, implementing our own "mini-VM" on top of the JVM, with mixed-mode execution (interpreted, then JITed to JVM bytecode), call site caches (to speed method lookup), and code alterations that sometimes improve performance at the cost of LOC and readability. The challenge for us going forward is to continue improving performance without making JRuby a jumbled mess. John Rose's work on the Da Vinci Machine and dynamic invocation for JDK 7 will help us rip a lot of code out, but only a small subset of JRuby users will see the benefits in the near term. So expect to see us spend a lot more time on performance for Java 5/6 compatible JVMs.

The final big peril for us relates to JRuby 2.0's Java integration support. JRuby currently has a split object model, where Ruby types are all "IRubyObject" implementations and the runtime only understands how to deal with "IRubyObject. This means that in order for us to call methods on non-Ruby Java types, we must wrap them with an IRubyObject wrapper. This is partially to attach our meta-object protocol to those objects, but mostly because every bloody method in the system accepts only IRubyObject as a parameter or return type. In order for us to achieve the "last mile" of Java integration, we need to make the entire system accept "Object" and act appropriately; we've been calling this approach "lightweights", since it would enable using normal Java objects for several core classes like Fixnum and Float. Support for "Object" lightweights would then feed into JRuby 2.0's reworked Java integration layer, eliminating most of the overhead (and code) associated with calling Java methods today. It would also fit better into the invokedynamic work. It's a big job I wouldn't expect to be complete until later this fall, and we need to mercilessly write specs and tests for the current behavior to avoid regressing features we support today. But we're going to do it; we've already started.

Rubinius

Evan Phoenix's Rubinius project is an effort to implement Ruby using as much Ruby code as possible. It is not, as professed, "Ruby in Ruby" anymore. Rubinius started out as a 100% Ruby implementation of Ruby that bootstrapped and ran on top of MatzRuby. Over time, though the "Ruby in Ruby" moniker has stuck, Rubinius has become more or less half C and half Ruby. It boasts a stackless bytecode-based VM (compare with Ruby 1.9, which does use the C stack), a "better" generational, compacting garbage collector, and a good bit more Ruby code in the core libraries, making several of the core methods easier to understand, maintain, and implement in the first place.

The promise of Rubinius is pretty large. If it can be made compatible, and made to run fast, it might represent a better Ruby VM than YARV. Because a fair portion of Rubinius is actually implemented in Ruby, being able to run Ruby code fast would mean all code runs faster. And the improved GC would solve some of the scaling issues Ruby 1.8 and Ruby 1.9 will face.

Rubinius also brings some other innovations. The one most likely to see general visibility is Rubinius's Multiple-VM API. JRuby has supported MVM from the beginning, since a JRuby runtime is "just another Java object". But Evan has built simple MVM support in Rubinius and put a pretty nice API on it. That API is the one we're currently looking at improving and making standard for user-land MVM in JRuby and Ruby 1.9. Rubinius has also shown that taking a somewhat more Smalltalk-like approach to Ruby implementation is feasible.

But here be dragons.

In the 1.5 years since Rubinius was officially named and born into the Ruby world, it has not yet met any of these promises. It is not generally faster than Ruby 1.8, though it performs pretty well on some low-level microbenchmarks. It is not implemented in Ruby: the current VM is written in C and the codebase hosts as much C code as it does Ruby code. Evan's work on a C++ rewrite of the VM will make Rubinius the first C++-based Ruby implementation. It has not reached the Rails singularity yet, though they may achieve it for RailsConf (probably in the same cobbled-together state JRuby did at JavaOne 2006...or maybe a bit better). And the second Rails inflection point--running Rails faster than Ruby 1.8--is still far away.

Compatibility is not going to be a problem for Rubinius. They've worked very hard from the beginning to match Ruby behavior, even launching a Ruby specification suite project to officially test that behavior using Ruby 1.8 as the standard. I have no doubt Rubinius will be able to run Rails and most other Ruby apps people throw at it. And despite Evan's frequent cowboy attitude to language compatibility (such as his early refusal to implement left-to-right evaluation ordering, a fatal decision that led to the current VM rework), compatibility is likely to be a simple matter of time and effort, driven by the spec suite and by actual applications, as people start running real code on Rubinius.

Performance is going to be a much harder problem for Rubinius. In order for Rubinius to perform well, method invocation must be extremely fast. Not just faster than Ruby 1.8 or Ruby 1.9, but perhaps an order of magnitude faster than the fastest Ruby implementations. The simple reason for this is that with so much of the core classes implemented in Ruby, Rubinius is doing many times more dynamic invocations than any other implementation. If a given String method represents one or two dynamic calls in JRuby or Ruby 1.8, it may represent twenty in Rubinius...and sometimes more. All that dispatch has a severe cost, and on most benchmarks involving heavily Ruby-based classes Rubinius has absolutely dismal performance--even with call-site optimizations that finally pushed JRuby's performance to Ruby 1.9 levels. A few benchmarks I've run from JRuby's suite must be ratcheted down a couple orders of magnitude to even complete.

And the Rubinius team knows this. Over the past few months, more and more core methods have been reimplemented in C as "primitives", sometimes because they have to be to interact with C-level memory and VM constructs, but frequently for performance reasons. So the "Ruby in Ruby" implementation has evolved away from that ideal rather than towards it, and performance is still not acceptable for most applications. In theory, none of this should be insurmountable. Smalltalk VMs run significantly faster than most Ruby implementations and still implement all or most of the core in Smalltalk. Even the JVM, largely associated with the statically-typed Java language, is essentially an optimized dynamic language VM, and the majority of Java's core is implemented in Java...often behind interfaces and abstractions that require a good dynamic runtime. But these projects have hundreds of man-years behind them, where Rubinius has only a handful of full-time and part-time enthusiastic Rubyists, most with no experience in implementing high-performance language runtimes. And Evan is still primarily responsible for everything at the VM level.

Of course, it would be folly to suggest that the Rubinius team should focus on performance before compatibility. The "Ruby in Ruby" meme needs to die (seriously!), but other than that Rubinius is an extremely promising implementation of Ruby. Its performance is terrible for most apps, but not all that much worse than JRuby's performance was when we reached the Rails singularity ourselves. And its design is going to be easier to evolve than comparable C implementations, assuming that people other than Evan learn to really understand the VM core. I believe the promise of Rubinius is certainly great enough to continue the project, even if the perils are going to present some truly epic challenges for Evan and company to overcome.

Update: The Rubinius team has had a few things to say about this as well.

Evan Phoenix argues that the 50/50 split between C and Ruby really translates into a lot more logic in Ruby, because of course we all know about Ruby's legendary terseness and density. And he's got a good point; even at 50/50 Rubinius is easily "mostly" Ruby. But nobody would claim that the JVM is Java in Java, even though the ratio of C/C++ code to Java code is vastly in Java's favor. Rubinius has a C core...Rubinius's VM is written in C. It might be 100% "Ruby in Ruby" some day, but for now it's not.

Brian Ford posted further about the "Ruby in Ruby" meme, arguing rightly that "Ruby in Ruby" is an ideal we should all be striving for, and that ideal should never die. I agree wholeheartedly on that point. The meme I referred to is the growing idea that Rubinius is automatically going to be a better implementation than all others simply because it's written in Ruby. So far that hasn't been the case. And that meme has also been used as a club, claiming other implementations (even Matz's own implementation) are not for Ruby programmers as much as Rubinius is.

Rubinius is, and always has been, a great project and a great idea. I talk with Evan and Brian and all the others on a daily basis, I contribute specs whenever I find gaps or fix bugs in JRuby, and I secretly harbor a desire to implement a JRuby/JVM backend for the Rubinius kernel. I'm sure we'll see great things from Rubinius in the future.

IronRuby

I've had a love/hate relationship with Microsoft over the years. Recently, it's been more "I love to hate them", but there are some shining stars over there. And IronRuby is certainly one of them.

Microsoft's IronRuby project (or perhaps "Microsoft IronRuby") is the current most-viable .NET-based Ruby implementation. It is led by John Lam, formerly of RubyCLR fame, and a small team of folks in Microsoft's languages group. IronRuby really has its roots in the Ruby.NET project from Queensland University of Technology, and like JRuby the real seed for both projects was the implementation of a Ruby 1.8-compatible parser. IronRuby is based on Microsoft's Dynamic Language Runtime, a collection of libraries to make dynamic languages easier to implement and to work around the performance constraints of CLR's strong preference for static-typed languages. In recent months it's probably safe to say that IronRuby has been driving DLR work, since by my estimation it represents the most difficult-to-implement dynamic language Microsoft is currently working on, and IronPython is mostly done, other than ongoing performance work.

I call IronRuby a shining star not because of the implementation, which is fairly mundane, or because of the DLR, which is perhaps clever in places but certainly "just plain necessary" for CLR dynlang performance in others. IronRuby is a shining star because it's the first Microsoft project under the Microsoft Permissive License, a "truly OSS" license approved by OSI. It represents the first project at Microsoft that I've thought gives the company any real hope for the future, because John Lam and company could truly show the advantage of more openness and closer community cooperation. So the OSS thing is certainly part of the promise of IronRuby, but it's not really a Ruby thing.

The main promise of IronRuby is a compatible, performant implementation of Ruby that runs on the CLR (and by extension, runs well on Windows). IronRuby currently mostly uses normal CLR types for all the core classes, building Ruby's String on CLR's StringBuilder, Ruby's Array on CLR's ArrayList, and so on. They've also made Ruby objects and CLR objects largely indistinguishable from one another as far as call dispatch goes, where in JRuby all non-Ruby Java objects entering the system must be wrapped with a Ruby-aware dispatcher (to be remedied in JRuby 2.0ish, as mentioned above). Of course IronRuby boasts advantages similar to JRuby, since it can leverage the CLR garbage collector, memory model, performance. And of course, having Microsoft backing your project should count for something.

IronRuby could also provide a Rails deployment option that Windows folks will actually want to use. Windows support in Ruby has always lagged behind UNIX support, partially because Windows "sucks" for various definitions of "sucks", and partially because most Rubyists don't use Windows. The ones that do use Windows have often felt abandoned, leading to projects like Daniel Berger's Ruby fork Sapphire, which counts among its features "Better support for MS Windows". IronRuby on .NET on Windows would in theory integrate very well with other Windows/.NET properties like IIS, ADO (or whatever it's called now), and Microsoft's new MVC layer. So for Windows users, IronRuby ought to be a big win, and they're understandably excited about it. Set up a tweetscan for "ironruby" and you'll see what I mean...there are nearly as many anticipatory tweets about IronRuby as there are practical tweets about JRuby.

But there's some peril here too. IronRuby is largely still being developed in a vacuum. Perhaps in order to have secrets to announce at "the next big conference" or perhaps because Microsoft's own policies require it, IronRuby's development process proceeds largely from all-internal commits, all-internal discussions, and all-internal emails that periodically result in a blob of code tossed over the fence to external contributors. The OSS story has improved, since those of us on the outside can actually get access to the code, but the necessary two-way street still isn't there. That's going to slow progress, and eventually could make it impossible for IronRuby to keep up as resources are moved to other projects at Microsoft. JRuby has managed to sustain for as long as it has with only two fulltime developers entirely because of our community and openness, and indeed JRuby would never have been possible without a fully OSS process.

IronRuby is also going to have trouble running Rails in its current form. Rails 2.x is still hindered by its inability to process concurrent requests in parallel on a single process. Because of various thread-unsafeties in the Rails libraries, concurrent requests must be shunted off to separate processes, or in the case of JRuby to separate JRuby instances in the same JVM process. There is work underway to improve this, some of it through GSOC, but it's still going to be a while before you can run your entire app on a single process with any of the C implementations. Even then, if you want to run many Rails apps, you'll still need multiple process with Ruby 1.8. And this is where IronRuby is going to get burned.

As I understand it, currently there is no way to provide multiple isolated execution environments in IronRuby as you can in JRuby or Rubinius. The reasons for this are beyond me, but many language implementations on top of a general-purpose VM avoid the complexity of "multiple runtimes" to better integrate with the rest of the system. JRuby's MVM support, for example, makes serializing Ruby objects as though they were normal Java objects nearly impossible, because upon deserialization there's no way to know which JRuby instance to attach the object to. In IronRuby's case, any inability to run multiple environments in the same process will mean IronRuby users must launch multiple processes to run Rails, just like the Ruby 1.8 and 1.9 users do. And that would be, in my opinion, an unacceptable state of affairs.

I also believe that the IronRuby team does not yet understand the scope of what's necessary to run Rails. John Lam has been quoted at several events saying they hope to run a "hello world" Rails app at RailsConf, but IronRuby can't run IRB, RubyGems, or Rake yet. John has also been tweeting periodic updates on IronRuby's spec-passing rate, even though Rubinius passes most of those specs and still can't run Rails (and JRuby passes more than Rubinius along with another 48000 assertions in our own test suite, only some of which have equivalents in the specs). As far as time spent on implementation, IronRuby really only has about a year of progress in, since Silverlight integration, demos, and presentations often pull them away for weeks at a time.

There's also a final peril the IronRuby will have to deal with: Microsoft would never back an OSS web framework like Rails in preference to its own. John Lam has repeatedly said that IronRuby will run Rails, and I believe him. But that goal is almost certainly not a Microsoft priority, since they have their own proprietary technologies to push. If John's able to do it, it will have to come from his small team and community contributors, leading back to the OSS peril above.

Be that as it may, an implementation of Ruby for the CLR is certainly going to happen, and I believe it's necessary for the Ruby ecosystem to survive for there to be a CLR Ruby. IronRuby is going to be that project, and it's already driving competition in the Ruby implementation world. John Lam and I talk at conferences, exchange tweets and emails, and I've been building and running IronRuby occasionally to check on their progress. I also know the IronRuby team realizes they could be more open and probably wants to be more open. And perhaps if they read this article they'll start to realize there's a lot more work involved in reaching the Rails singularity than running specs and having a working String implementation. There's pain involved, and they've not yet begun to feel it.

MacRuby

Ruby 1.9 fixes some of MatzRuby's issues, but not all of them. Though it brings a much-faster bytecode VM, improved method-dispatch cost, and a number of other execution-related performance tweaks, it does not solve problems with Ruby's memory model and garbage collector. And partially for this reason Apple's Laurent Sansonetti has been working on MacRuby, a forked rework of Ruby 1.9 targeting the Objective C runtime.

Laurent is famous for his past work on RubyCocoa, bindings to allow Ruby to deliver beautiful, top-notch UIs on OS X. I don't know how long he's been working on MacRuby, since some of its life was spent in secret, but it's been open-source for a few months now. Currently Laurent has been working on converting the core classes from C implementations over to using ObjC equivalents. Part of the goal of MacRuby is to provide a Ruby that can interact directly with ObjC: Ruby objects are ObjC objects and vice versa; Ruby can call methods on ObjC objects and vice-versa. So in this sense, MacRuby is perhaps more similar to JRuby or IronRuby than to Rubinius or the MatzRuby lineage. It is an implementation of Ruby for a general-purpose runtime.

MacRuby promises one thing for certain: tight integration with ObjC and by extension with much of OS X. Because ObjC is the language of choice for development on OS X, MacRuby users will certainly have the cleanest, tightest integration with OS libraries and primitives, far better than any other implementation can provide.

There's also a chance that ObjC's core classes (which essentially are repurposed as MacRuby's core classes) will have better peformance characteristics than the hand-written C impls in MatzRuby. Because MacRuby is based on Ruby 1.9, it shares Ruby 1.9's bytecode-based execution engine. This means that most performance gains will come from better core class implementations. Already Laurent has been tweeting some impressive microbenchmark numbers showing e.g. Array performance can be substantially better than Ruby 1.9. And there are performance gains to be had calling ObjC code, of course, since dispatch is now essentially ObjC dispatch directly rather than passing through Ruby's own dispatch logic.

MacRuby may also eventually represent a better way to run Rails on OS X. Because of YARV, it should have good performance. Because of ObjC, it should have a good memory model. And because it's "MacRuby" it should fit well into the rest of the system, likely leading to a simpler, better-integrated deployment story for Rails.

The biggest peril for MacRuby is pretty obvious: Why Ruby 1.9? Ruby 1.9 is still under active development, and APIs are being added and tweaked as we speak. Laurent's fork is going to get further and further away, even if he's able to keep portions up-to-date; that's going to make compatibility a serious challenge. Choosing Ruby 1.9 certainly makes sense from a performance perspective, but many Ruby apps out there don't run on it, and most developers aren't targeting it because it's still a moving target. Even on JRuby, where we've got prototype implementations of both YARV's and Rubinius's bytecode engines, we've opted not to hit 1.9 features hard yet. Ruby 1.9 is in progress, and that will mean a lot more effort required to keep MacRuby up to date and to make it an attractive option.

There's also a chance that Ruby implemented on top of Objective C isn't going to perform that much better, on the whole, than Ruby 1.9's all-C approach. While some of Laurent's benchmarks have been impressively faster than Ruby 1.9, many others have been equally slower. And all benchmarks I've run, which are a little less "micro", have been slower on MacRuby than either Ruby 1.9 or MatzRuby. That's not very promising.

Regardless of the peril, MacRuby seems like a great idea. Objective C is a solid runtime, and the fact that it's so heavily used on OS X means MacRuby will have an excellent integration story. MacRuby may not add much value in the runtime/performance/execution department, but will potentially teach us all lessons about integrating with a general-purpose runtime. And of course MacRuby may eventually be shipped with OS X, making Ruby a first-class language for writing Mac apps. That alone is probably worth it.

Fading Implementations

It's worth spending a few words on the "no longer viable" implementations here as well.

XRuby, product of Xue Yong Zhi and a few others, was the first Ruby implementation to have a full JVM bytecode compiler. Performance early on looked very good, but the lack of a compatible set of core classes and resource limitations caused its development to lag terribly. The most recent release was 0.3.3 on March 24, and since then there have been only two commits. Xue has admitted he has no time to work on the project, and without a heavy, long-term time investment from someone XRuby is likely to fade away.

Ruby.NET is a similar story. Started off a research grant from Microsoft at Queensland University of Technology, Ruby.NET is the product of John Gough and Wayne Kelly. The project was a proof-of-concept for Ruby on CLR, to show it could be done and work through some of the early challenges in getting there. It was officially made into an open source project last year, and for a while there were many interested contributors. But although the Ruby.NET parser lives on in IronRuby, the project has largely ground to a halt since Wayne officially threw his support behind Microsoft's project. There have been two commits in the past month, and the last two really active mailing list threads were titled "The future of Ruby.NET" and "Has IronRuby killed off Ruby.NET".

And Cardinal is still pining for the fjords.

New Contenders

I don't know much about these projects, but I'm sure they'll all bring their own flavor to the Ruby soup.

HotRuby is an implementation of Ruby in JavaScript. It implements Ruby 1.9's bytecode engine and all core classes using JavaScript types. Performance looks good on their site, but they can't run anything yet and haven't even begun to discuss compatibility. It may show promise in time, but it's an interesting toy for now.

MagLev from GemStone appears to be something Ruby-related. Not much (anything?) has been publicly said about it. It's mysterious. It has a nice viral front page. Rails may be involved.

IronMonkey is an effort to port IronPython and IronRuby to the Tamarin VM Adobe donated to the Mozilla project. It's being led by Seo Sanghyoen, though from what I hear he hasn't had a lot of time to work on it. Python and Ruby in the browser, cross-platform, without vendor-lock in. Could be interesting.

Final Thoughts

Have you started working on your Ruby implementation yet? All the cool kids are doing it. It's remarkable how many implementations of Ruby are in the works right now. It remains to be seen whether the ecosystem can support such diversity in the long term, but at the very least they're introducing splendid variation. And there's a lot more to do with Ruby in terms of performance, scaling, and "getting things done". Ruby's future is looking bright, in no small part due to the many implementations. How's your favorite language looking?

Update: Vladimir Sizikov has a nice short article on the value of the RubySpecs project, and since it's so important for the future of these alternative implementations I thought it deserved a mention. He also includes links to his RubySpecs quickstart guide and the current RubySpecs overview page. If you haven't contributed to the specs yet, you should feel guilty.

Friday, April 25, 2008

JRuby 1.1.1 in RedHat Fedora

Conrad Meyer of the Fedora project yesterday tossed me this link to a build log from a Fedora build machine for...you guessed it, JRuby. Conrad has been chasing down JRuby bugs, hanging out on IRC and working JRuby through the approval process to be packaged and available in Fedora, and after several months of effort it has happened. JRuby 1.1.1 is in Fedora!

Now that might be enough to make my day, but perhaps even more important is the list of dependency projects that also got shuttled through the system in order for JRuby to build and run.

jna and jna-posix - The Java Native Access library we used to call C library functions like symlink and chmod. Lots of folks are using JRuby, but the POSIX subsystem we've built upon it is now seeing use in Jython and has been split off as a jruby-contrib project. And as of today, jna and jna-posix is in Fedora too.

jline - A readline library for Java applications. JLine was actually already in, but it was a pretty old version. Over the past two years, JRuby's been not only the primary consumer of JLine (that I know of) but we've pushed a bunch of patches back to the project. So the latest JLine with all our patches is in Fedora too.

bytelist - The smallest package in the lot, this comprises exactly *one* class--ByteList--a mutable byte[] wrapper with StringBuffer-like methods. It's the backbone of our Ruby String implementation, and was spun off as a separate project for others to use. And now ByteList is in Fedora too.

jvyamlb - The byte-based version of Ola Bini's JvYAML YAML parser project. This actually represents the third YAML project from Ola, with the first two being the RbYAML library now in use in Rubinius and the char-based JvYAML project that we used to use with char-based String implementation. Now JvYAMLb is in Fedora too.

joda-time - Anyone who's had to do anything nontrivial with Java's many date and time classes will know Joda Time. Arguably the best Java solution to a very tricky problem, we moved to Joda Time a while ago to be able to better match C time API behaviors. Joda Time is in Fedora too.

joni - Saving the best for last. JOni is Marcin Mielzynski's amazing Oniguruma port. Oniguruma is a bytecode-based regular expression engine with lots of additional features like pluggable character encoding support and named groups. It's been the "last mile" for JRuby's String/Regexp support, and solved one of our last big performance bottlenecks: passing our byte-based strings through char-based regex engines. JOni is an incredible piece of work, and now it's in Fedora too.

And the whole ball of wax is building on IcedTea, RedHat's combination of GNU ClassPath and the amazing OpenJDK. We're very happy to know JRuby will be one-command installable for all Fedora users now. And we're looking forward to other Linux distributions making the latest JRuby releases available too :)

Thursday, April 24, 2008

JRuby at JavaOne "Script Bowl" Session

At JavaOne this year, there will be a session entitled "The Script Bowl", in which JRuby, Groovy, and Scala will face off on a series of challenges. I suggested that the list of challenges include a range of things each of our language communities could try to implement, and this is the list they came up with. I'm looking for help on these three, so I can show off JRuby's potential and give whoever implements the best example a nice plug and a t-shirt.

So let me repeat that as a call to action: Help implement these and the ones chosen for the session will earn a t-shirt and a plug on stage by me. Fair enough?

Here's the list:

#1 - Client Application

Write a simple, read-only Twitter client as a desktop app.
The Twitter API is documented at http://groups.google.com/group/twitter-development-talk/web/api-documentation.
User provides twitter email address and password , then browses friends and their statuses.
User can also filter content based on plain text or a regexp. (Our users are geeks.)
Your choice whether to use the JSON or XML output from Twitter.

#2 - Web Application

As a database, we'll use the "world" sample database from MySQL (http://dev.mysql.com/doc/world-setup/en/world-setup.html).
The Web Application lets a user browse countries, sorting them using different criteria (e.g. population or GNP), and select cities and display them on a map using one of the many map widgets around, e.g. Google (http://code.google.com/apis/maps/).
For geocoding you could use e.g. http://worldkit.org/geocoder/rest/.

#3 - Free round

Up to you. Show us something your language can do better than any other! But again, be concise.

So obviously #1 would be a GUI thing, probably using one of the frameworks available for JRuby. And #2 is going to be a Rails app. At the moment, I think any of Jeremy Ashkenas's ruby-processing demos would be an easy add for the script-bowl, but if anyone wants to try to top them go for it. I'd especially like to see something JavaFX-like, with nice vector-drawn graphics and maybe a physics model. Make it pretty. Sound is good :)

Feel free to email me directly, but the JRuby user mailing list would probably be a better place to do it so we can all see it and discuss the entries that represent the best of JRuby.

Ruby Implementers Design Meeting #1

On Monday evening, the various Ruby implementers (sans John Lam since he had a schedule conflict) got together on FreeNode IRC in #ruby-core to discuss various design and future-related agenda items.

Here's the agenda and IRC log, and here's a rundown of what was discussed.

Class and Module Constant Scoping

Evan Phoenix had the first item, a question about the behavior of constant scoping in classes versus modules. For example, the following snipit:

class A; end
class B; end
A::B # will find B above, but print a warning

module C; end
module D; end
C::D # will complain it can't find C::D

The reason for this is that the :: operator only searches the class hierarchy for constants. So in A::B, only A and its superclasses get searched. Since this includes Object, and B is defined at the top level, A::B finds B. But since this is probably not the behavior the user wanted, given that they were explicitly scoping the lookup to B, a warning is printed. By the same logic, C::D does not find the top-level D, because modules do not have superclasses.

Evan made the point that if the "class fallthrough" behavior is warned against, perhaps it is intended to be removed or deprecated. Though there was some discussion about the issue, it was mostly tabled and we decided it should be discussed at length on the ruby-core mailing list if there's anything more to discuss. I suspect it won't come up again.

Multiple VM Support Discussion

As you may know, Sun Microsystems and the Univesity of Tokyo are collaborating on a Multiple-VM specification and implementation for Ruby. The primary folks involved are me, Tom Enebo, and Koichi Sasada, creator of the Ruby 1.9 VM (YARV). So this agenda item was mine.

Largely, the MVM situation looks like this:

JRuby already has fairly complete and production-ready MVM support, since each JRuby runtime is "just an object" and you can have as many active as you like. But we lack a formal API and have no explicit APIs for cross-VM communication and control. The JRuby MVM support is how we're able to run multiple Rails instances in a single JVM, something no other implementation has achieved.
Rubinius has a simple implementation of MVM, with a standard API and support for cross-VM communication and control. But largely, nobody's using it, and like the rest of Rubinius it's not "production ready".
Ruby 1.9 (YARV) has neither support for MVM nor any API prepared to host it. The bulk of the work then would ultimately fall on Koichi and the ruby-core folks to both implement MVM and implement the APIs we decide upon to support it.

So the discussion largely focused on APIs and what we want them to do. In general, most folks agreed that sub-VMs should be largely independent from one another, but have a simple communication mechanism. While pipes were suggested, most agreed they were too low-level and would lead to each program implementing their own "wire protocol" unnecessarily. The model Rubinius takes is to allow passing messages of only integral types: String, Symbol, Numeric, Array, and Hash. I strongly prefer this model over the low-level pipe version and over anything more complex, since more complicated protocols can easily be built upon it. Rubinius also does this as fully asynchronous message passing, though it's unclear whether that would be a requirement of the MVM API or not. We also mostly agreed that if "forking" a VM in-process were ever to be supported, it would come after the first round of work on MVM. As far as I'm concerned, fork needs to die a flaming death RSN.

We closed out this item by getting the MVM list and wiki information out to people and encouraging more discussion. Hopefully this one will pick up a bit; I'd really love to put a standard API face on JRuby's excellent MVM support.

RubySpec and ruby-core

The final two agenda items we covered mostly focused on the RubySpec specs and how we can get the ruby-core folks to use them, contribute to them, and otherwise bless them as some sort of "official" specification effort. Brian Ford started out with an overview of the spec status, how they're run, where they're being used. Currently JRuby, IronRuby, and Rubinius all use the specs, though only JRuby and Rubinius developers are actively contributing back to it. The ruby-core team expressed some concerns. Will this mean replacing Ruby's existing test/ dir? How easy is it to run the specs? Do they require features we have changed/are changing in 1.9/2.0? Can we host them in the Ruby repository?

In general the discussion we very productive. One by one we tried to address their concerns, and I think we made a great case for them to start using and contributing to the specs. They were also interested in getting a list of spec failures we'd seen against Ruby 1.8.7pre2 (Vladimir handled my action item for that...thanks Vladimir!), and I suggested that Brian write up a simple "how to" to send to ruby-core, so they'd all understand the process and how it's supposed to work. I think we're getting very close to having ruby-core use and bless the specs as being a required part of Ruby development. And that's just fine with me.

Next Meeting

We scheduled the next meeting for the same time (9PM CDT, 11:00JST) on Apr 30/May 1. Here's the current agenda for the next ruby design meeting

Thursday, April 17, 2008

Converting Groovy to Ruby

In this post, Glen Stampoultzis posts a very interesting and clever sequence of steps by which he converts a piece of Java code into idiomatic Groovy code. I thought it was a nice article, so I'll do a very short spin on it: taking Glen's final Groovy code and converting it into idiomatic Ruby.

(I'll probably get my pants flamed off by you-know-who, but hey, it's a slow Thursday afternoon.)

Glen's code, an algorithm for finding all n-length subsequences in a given array, ends up here:

def subn(n, list) {
    if (n == 0) return [[]];
    if (list.isEmpty()) return [];

    def remainder = list.subList(1, list.size());
    return subn(n-1, remainder).collect() { [list.get(0)] + it } + subn(n, remainder);
}

(I had to make some edits because his code didn't appear to format right; I don't claim this code is 100% correct in this state.) Admittedly it's a very nice reduction from the original Java code. As Glen correctly surmises, it's largely due to those super-nice closures we get from Groovy. My first step is to make a few minor changes to make it run correctly in Ruby:

def subn(n, list)
    return [[]] if (n == 0);
    return [] if (list.empty?());

    remainder = list.slice(1, list.size());
    return subn(n-1, remainder).collect() {|it| [list.at(0)] + it } + subn(n, remainder);
end

The most notable changes here are flopping the statement-modifying conditionals on the first two returns and adding an "it" parameter to the collect block. Oh, there's also the "empty?" method. But largely it looks like pretty much the same code. The next obvious step is to remove things that aren't needed in Ruby.

def subn(n, list)
    return [[]] if n == 0
    return [] if list.empty?

    remainder = list.slice(1, list.size)
    return subn(n-1, remainder).collect {|it| [list.at(0)] + it } + subn(n, remainder)
end

Mostly just removing some parens that are unnecessary (and distracting, for me) as well as line-terminating semicolons. Note that Groovy also can omit line-terminating semicolons. I think it's a matter of taste.

def subn(n, list)
    return [[]] if n == 0
    return [] if list.empty?

    remainder = list[1..-1]
    subn(n-1, remainder).collect {|it| [list[0]] + it } + subn(n, remainder)
end

Now we've eliminated the last "return" statement and made the list accesses use the array-referencing "[]" method...in the first case with a more idiomatic range of values rather than two parameters. Again, a matter of taste I suppose; two arguments works just fine. So that brings us mostly to the end of converting from the original Groovy code into Ruby. The new version is as readable as the original, but shorter by several characters. And of course this is my biased opinion, but it reads nicer too. I'm sure that will bring about all sorts of flaming in itself.

At any rate, my point in this is certainly not to start a flame war about which version is better. My point is that the Groovy and Ruby versions are still largely the same. If you can learn to produce the Groovy end result, you can just as easily learn to produce the Ruby end result. So if you've stuck with Java because you're worried you can't learn Ruby, or you don't think Ruby is enough like Java...don't be scared! A whole Rubylicious world awaits you :)

(Oh, and for you Rubyists out there...feel free to post your own improvements in the comments. I didn't want to change the original flow of the code, but I know it can be golfed down a lot more.)

Saturday, April 5, 2008

EU Tour Quick Hits

I'm very tired. I've been in the EU since last week Wednesday, first at Euruko in Prague, then getting slides ready for Scotland on Rails, and finally preparing a release announcement with Tom today. It's been a very long 11 days. So here's some quick hits for you.

EuRuKo FTW - It seems like EuRuKo was the conference to be at. We had three choices over the past two weeks: EuRuKo, last weekend in Prague; RubyFools A and B, in Copenhagen and Denmark earlier this week; and Scotland on Rails in Edinburgh yesterday and today. We made earlier commitments to EuRuKo and Scotland and opted not to kill ourselves by making extra trips to Copenhagen and Oslo. But from the sounds of it, we did pretty well. EuRuKo had better than 300 people, and it was a great conference...felt very much like my first RubyConf in 2004. Scotland had a hundred-something people and I heard from a few other folks RubyFools was about the same. And although my opinion of Prague was only slightly improved this trip, EuRuKo turned out to be an excellent conference.
Beer is dangerously cheap in Prague - I'm not sure why I didn't learn my lesson the first time.
Scotland is beautiful - Maybe I like rainy, green places, but I thought Edinburgh was a beautiful city and Scotland was a beautiful country. I wish I'd had time to go hiking on the crags or take a day trip to Loch Ness. I'll definitely be back.
Rails developers like more than Rails - We went out on a limb with our 90-minute Scotland on Rails talk and also presented NetBeans Ruby/Rails support, performance/threading demos, Ruby-Processing graphics eye-candy, the classic IRB/Swing demonstration, and a healthy series of slides debunking most of the great myths about Java. And the attendees loved it. Sure, we had lots of people really excited about the GlassFish Gem and WAR-file deployment, but we had at least as many excited about using Swing in Ruby, distributing all-in-one application bundles with JRuby, compiling code for fun and profit (and a perhaps disturbing number of people wanting compilation solely for obfuscation purposes), and general NetBeans Ruby/Rails features. It was a far more receptive audience than I'd expected, and the talk felt just great.
I need to come back to the UK soon - I like it here, and there's tons of people interested in JRuby. Gotta find a good excuse to come back soon.

And of course, we got JRuby 1.1 out today (download), after a good 9-10 months since JRuby 1.0. Happy times! Now, back to work...JRuby 1.1.1 soon, additional maintenance releases as needed, and super-happy-ultra JRuby 1.2/2.0 work coming up.

I sleep now.

Thursday, April 3, 2008

Shared Data Considered Harmful

One of the big reasons people have been interested in JRuby lately is because it is currently the only Ruby implementation that can both host multiple instances in a single process and run threads *actually* in parallel. Ruby 1.8 is entirely green-threaded. Rubinius has support for multiple parallel in-process VMs, but each one is still green-threaded. Ruby 1.9 uses native threads, but they're almost always blocking on a giant lock. IronRuby uses native CLR threads, but can't host multiple instances in a single process. JRuby FTW.

Of course true parallel execution is not without its pitfalls, and we stumbled across one this week. No, it wasn't a lock contention issue or data corruption or anything you're thinking of. It was smaller, sneakier, and much lower-level.

Have a look at this code:

public class Trouble {
    public static int i = 1;
    public static int firedCount = 0;
    
    public static int totalSize = 1000000000;
    public static int maxLoops = 5;
    
    public static void main(String[] args) {
        final int threadCount = Integer.parseInt(args[0]);
        Thread threads[] = new Thread[threadCount];
        
        try {
            for (int loops = maxLoops; loops > 0; loops--) {
                firedCount = 0;
                long start = System.currentTimeMillis();

                for (int j = 0; j < threadCount; j++) threads[j] = new Thread() {
                    public void run() {
                        int total = totalSize / threadCount;
                        for (int k = 0; k < total; k++) {
                            if (((i += 1) & 0xFF) == 0) doNothing();
                        }
                    }
                };

                for (int j = 0; j < threadCount; j++) {
                    threads[j].start();
                }

                for (int j = 0; j < threadCount; j++) {
                    threads[j].join();
                }
                
                System.out.println("time: " + (System.currentTimeMillis() - start));
                System.out.println("fired: " + firedCount);
            }
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(1);
        }
    }
    
    public static void doNothing() {
        firedCount++;
    }
}

It looks pretty straightforward, right? Given a number on the command line, it fires up that many threads, divides the overall loop count by the number of threads, and starts them all running. Each thread increments the same static field 'i', and if ever i % 256 == 0, they call doNothing which increments a hit counter. Simple!

This is obviously a pretty dumb microbenchmark, but it mirrors a similar piece of code in JRuby. Because Ruby 1.8 supports various "unsafe" threading operations, we must also support them. But since we can't kill a target thread or raise an exception arbitrarily in a target thread, we must periodically checkpoint. Every thread checks whether one of those events has fired at various points during execution. The full list is describe on the RubySpec wiki page on Ruby threading, but the bit we're interested in is the check that happens every 256 method calls. In JRuby, this was implemented with a static counter, and every time a call was made the counter was incremented and checked.

So because JRuby has the ability to *actually* run in parallel, a group at Sun has been working on some microbenchmarks to test our threaded scalability. The benchmark is fairly simple: given a array of strings, reverse each string and replace it in the array. When running with n threads, divide the array into n segments to parallelize the work. We'd naturally expect that performance would improve as threads were added up to the number of cores in the system. But we discovered a problem we never expected to see: threaded scaling completely tanked.

A Surprising Effect

Here's the result of running the Java benchmark on Java 5 with one and then two threads:

~/NetBeansProjects/jruby ➔ java -server Trouble 1
time: 3924
fired: 3906250
time: 3945
fired: 3906250
time: 1841
fired: 3906250
time: 1882
fired: 3906250
time: 1896
fired: 3906250
~/NetBeansProjects/jruby ➔ java -server Trouble 2
time: 3243
fired: 4090645
time: 3245
fired: 4100505
time: 1173
fired: 3906049
time: 1233
fired: 3906188
time: 1173
fired: 3906134

This is on my Core Duo MacBook Pro. With one thread, overall CPU usage is at around 60-65%. Spinning up the second thread consumes the remaining 35-40%, and so we see an improvement in the overall execution time. Yay parallel execution.

But now have a look at the Java 6 numbers on the same machine:

~/NetBeansProjects/jruby ➔ java -server Trouble 1
time: 1772
fired: 3906250
time: 1973
fired: 3906250
time: 2748
fired: 3906250
time: 2114
fired: 3906250
time: 2294
fired: 3906250
~/NetBeansProjects/jruby ➔ java -server Trouble 2
time: 3402
fired: 3848648
time: 3805
fired: 3885471
time: 4145
fired: 3866850
time: 4140
fired: 3839130
time: 3658
fired: 3880202

Woah, what the hell happened here? Not only is overall performance worse, the two-thread run is significantly slower than the single thread. And we saw the exact same effect in JRuby: two threads running a given benchmark performed markedly worse than a single thread...exactly the opposite effect we'd like to see.

Caching 101

Source: CPU Cache article on Wikipedia

Accessing main memory during execution is slow...so slow that all modern processors have multiple levels of cache in which to cache memory locations that are used repeatedly. There's cache on the processor itself, cache in fast memory chips located near the processor, and so on. The cache, in whatever form, is a smaller, faster memory store. Within the cache, data is organized into cache lines, segments of data that map to specific locations in main memory. So if your processor uses cache lines of 16 bytes, the cache will hold some number of 16-byte entries for the most recently or most frequently-used data from memory.

Writes to memory also benefit from caching. Whether a cache is write-through or not, most writes do not immediately go to main memory for the same performance reasons: main memory is slow. So if you have a single processor accessing a small range of memory, rapidly reading and writing data, it may operate almost entirely from the cache. To help this process along, most compilers will try to organize memory locations likely to be accessed temporally near each other such that they'll be loaded into nearby memory locations; the more you can jam into each cache line, the more likely you'll make good use of available cache memory.

Of course at some point, the cache becomes stale, and writes must flushed and reads must be refreshed. This can happen (for example) when there's a cache miss, and the program accesses a memory location not already cached. In this case, one of the other cache entries must be evicted to make room for the new one. It will also happen naturally as the program progresses or as new programs and threads are scheduled to execute...eventually the cache must be flushed simply to ensure that main memory has been updated and all lines of execution see the change. But there's another point at which the cache or (more likely) an individual cache line might be flushed

Cache Line Ping-Pong

In a multiprocessor/multicore machine, you will usually have two or more threads actually running in parallel. Depending on the architecture, each core usually has its own cache, allowing the two threads to run independently without influencing each others' work. Of course things started to get interesting when you have multiple threads accessing the *same* memory locations, as we have in this case.

This brings us to the reason for Java 6's poor performance on the benchmark above. John Rose helped me out by disassembling the resulting native code for this benchmark on Java 5 and Java 6. It turned out that while Java 5 located the two static fields in different quadwords (16 bytes), Java 6 located both fields in the same quadword. This optimization allows better cache utilization...since the two fields are immediately next to each other in memory, they're almost certain to share a cache line. But it also has a different effect.

Cache line ping-pong happens when two threads executing against different caches need to read and write the same memory locations. Because the two threads need to see each others' changes, the cache line ends up bouncing back and forth between the two caches, generally causing a flush or refresh from main memory in the process.

Main memory is slow, remember? Because of the cache line ping-pong, adding threads for this benchmark actually caused things to slow down. Every new thread increased the amount of back-and forth between caches and main memory up to the number of cores...at which point, performance remained about constant for additional threads (because the amount of ping-pong was no worse).

It's interesting to note that single-thread performance was also considerably worse on Java 6. I don't yet have a good explanation for this.

The Fix

For the Java benchmark above, given what we now know about cache lines and such, a simple fix that Vladimir Sizikov found was to move one of the two statics to a different class. Java 6 only aligned within a single quadword the two fields when on the same class; moving one elsewhere presumably results in their living on different cache lines and the amount of ping-ponging is reduced.

In JRuby, where we had only a single counter but a lot of other stuff happening around it, the answer was much clearer: we split the single global counter into several per-thread counters and the effect was entirely erased. So the string-reversal benchmark started to scale nearly-linearly with additional threads, while Ruby 1.8, Ruby 1.9, and Rubinius showed a slight performance degradation as threads were added. JRuby FTW.

The Moral

I suppose there's a few things we can learn from this:

Don't use statics - Gilad Bracha probably said it best in his article "Cutting out Static":
Static variables are bad for for concurrency. Of course, any shared state is bad for concurrency, but static state is one more subtle time bomb that can catch you by surprise.
Touché, Gilad.
Don't share state unless you absolutely have to - This is perhaps a lesson we all believe we know, but until you're bitten by something sneaky like this you may not realize the what bad habits you have.
One man's optimization is another man's deoptimization - Java 6 is significantly faster than Java 5, presumably because of optimizations like this. But every optimization has a tradeoff, and this one caught me by surprise.