Thursday, December 23, 2010

Improved JRuby Startup by Deferring Gem Plugins

Another present for you JRubyists out there!

JRuby has had notoriously bad startup times. Not as bad as, say, IronRuby (sorry guys!), but definitely a big fat hit every time you need to run some Ruby code from the command line. Some of this overhead was related to JRuby, and we've steadily worked to improve that over the years. Some of it is due to the JVM, most commonly due to running on the "server" Hotspot VM or another JVM that does not have an interpreter (both of which start up considerably slower than Hotspot/OpenJDK's "client" mode). I've blogged tips and tricks for JRuby startup before, and these mostly apply to vanilla JRuby startup performance.

However, a large part of the overhead was not specifically due to JRuby or the JVM, but to RubyGems. RubyGems in version 1.3 added support for "plugins", whereby gems could include a specially-named file to extend the functionality of RubyGems itself. Most of these plugins added command-line tools like "gem push" for pushing a new gem to gemcutter.org (now built-in for pushing to rubygems.org). Unfortunately, the feature was originally added by having RubyGems do a full scan of all installed gems on every startup. If you only had a few gems, this was a minor problem. If you had more than a few, it became a big fat O(N) problem, where each of those N could be arbitrarily complex in themselves.

The good news is that it looks like my proposed change – making plugin scanning happen *only* when using the "gem" command – appears likely to be approved for RubyGems 1.4, due out reasonably soon.

Here's the patch and the impact to RubyGems startup times are below. The first two times are without the patch, with the first time against a "cold" filesystem. The final time is with the patch in place. In all cases, it's against my local JRuby working copy, which has around 500 gems installed.

~/projects/jruby ➔ jruby -e "t = Time.now; require 'rubygems'; puts Time.now - t"
17.09

~/projects/jruby ➔ jruby -e "t = Time.now; require 'rubygems'; puts Time.now - t"
6.959

~/projects/jruby ➔ git stash pop
# On branch master
# Changed but not updated:
# (use "git add <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# modified: lib/ruby/site_ruby/1.8/rubygems.rb
# modified: lib/ruby/site_ruby/1.8/rubygems/gem_runner.rb
...

~/projects/jruby ➔ jruby -e "t = Time.now; require 'rubygems'; puts Time.now - t"
0.481


It's truly a shocking difference, and it's easy to see why JRuby (plus RubyGems) has had such a bad startup-time reputation.

I've already made this change locally to JRuby's copy of RubyGems, which should help any users working against JRuby master. The change will almost certainly ship in JRuby 1.6, with RCs showing up in the next couple weeks. So with this change and my JRuby startup tips, we're on the road to a much more pleasant JRuby experience.

Happy Hacking!

JRuby Finally Installs ruby-debug Gem

This should be a great Christmas present for many of you.

After over three years, the "ruby-debug" gem finally installs properly on JRuby.

~/projects/jruby ➔ gem install ruby-debug
Successfully installed ruby-debug-base-0.10.4-java
Successfully installed ruby-debug-0.10.4
2 gems installed


Back in 2007, folks working on NetBeans, Eclipse, and IntelliJ support for Ruby came together to build a new version of the ruby-debug backend that would work on JRuby. They shared the effort, we JRuby guys added features they needed to do a clean port of the ruby-debug C code, and the ultimate result was a ruby-debug-base gem that isolated the platform-specific bits.

For whatever reason, the JRuby version of ruby-debug-base never got pushed as a real, live "-java" gem. This meant that you had to download ruby-debug-base-VERSION-java.gem yourself to get ruby-debug to install. To make this process easier, we even shipped ruby-debug and ruby-debug-base preinstalled in JRuby 1.5.

Unfortunately, this was only a partial answer. Many libraries and applications want to install all their dependencies clean. If one of those dependencies was ruby-debug, it would fail to install. Rails even includes special JRuby-specific lines in its default Bundler Gemfile to exclude ruby-debug when bundling on JRuby.

All that nonsense ends today. Rocky Bernstein, one of the maintainers of the ruby-debug gem, agreed to push our ruby-debug-base to the canonical rubygems.org repository. As a result, ruby-debug now installs properly on JRuby. It only took three years to get that gem pushed (by nobody's fault...I think everyone expected everyone else to follow through on it).

Merry Christmas, Happy Chanukah, Joyous Kwanza, and enjoy your holiday season!

Quick Thoughts on Oracle/Apache and the Java TCK

This was going to be a reply to a friend on Twitter, when I realized it would be several tweets and I might as well put them in one place.

Grant Michaels (@grantmichaels) sent me this link to the October JCP (Java Community Process) EC (Expert Committee) meeting notes. The Apache/TCK (Technology Compatibility Kit) issue was discussed at length.

For those of you in the dark, Apache recently resigned from the JCP because of the ongoing dispute over their "Harmony" OSS (Open-Source Software) Java implementation's inability to get an unencumbered license to the Java TCK. Passing the TCK is a requirement for an implementation to officially be accepted as "Java".

I had heard about this problem from a distance, but only recently started to understand its complexity. The TCK includes FOU (Field of Use) clauses preventing TCK-tested implementations other than OpenJDK from being released as open-source. Only implementations "largely" based on OpenJDK (Open Java Development Kit, Sun's GPLed "Hotspot" VM and class libraries) are allowed to get around this requirement. Apache's Harmony, being entirely independent and Apache-licensed, does not qualify.

If that were all there is to it, Apache would not have any real grounds for complaining. But the JSPA (Java Specification Participation Agreement, PDF) requires only unencumbered specifications, reference implementations, and test kits be submitted to the JCP. This sets the stage for an ugly licensing battle that has stymied Java's progress for over three years.

I'm not going to do a tl;dr post on this like I did with Oracle/Google. Instead, just a few quick thoughts as I read through the notes.

Overview

The first thing you will notice is that this isn't a simple cut-and-dried issue. The meeting notes express Oracle's position as outwardly fearful of Harmony leading to many downstream forks, with no recourse for asserting they fulfill the requirements of the Java specification. There seems to be an implied stab at Android here, which uses Harmony's class libraries atop the Dalvik VM to implement a substantial portion (but not all) of what looks like Java SE 5. Oracle states this decision is final; Apache will not be granted an unencumbered TCK license. Oracle is not known for changing their minds.

Several EC members, including Doug Lea and Josh Bloch, point out that it's fairly clear the encumbered TCK violates the JSPA's openness clauses. Oracle refuses to comment on this "legal" matter. Doug suggests that EC members might be able to vote to move Java forward with a clearer conscience if the JSPA were amended to make the encumbered TCK "legal".

Another point brought up by several members is the frustration that they have to deal with licensing at all. They recall a golden age of the JCP where it actually voted on technical matters rather than arguing over licensing.

IBM declares they are unhappy with this decision, but even more unhappy that the Java platform has stagnated because of it for so long. IBM would eventually go on to vote "yes" to the disputed Java 7 JSR, even in the presence of the apparent JSPA violation.

Apache representative and longtime Harmony advocate Geir Magnusson also weighed in. He argued that the health of the platform would only be bolstered by allowing for many independent open-source implementations, and damaged by disallowing them. When asking for a clarification of why OpenJDK gets a free pass, Adam Messinger (Oracle) stated that he didn't want to answer a legal question, but that OpenJDK's GPL (Gnu Public License) requires reciprocity from downstream forks, reducing the damage and confusion they might cause if released publicly without full spec compliance (I'm paraphrasing based on the notes here).

Toward the end of the licensing discussion, Adam again called for all memory organizations to participate in OpenJDK. It's fairly clear from these notes and from previous announcements and discussions that Oracle intends for OpenJDK to be the "one true OSS Java", and for all comers to contribute to it. They even managed to get Apple and IBM, longtime GPL foes, to join the family. Apache doesn't do GPL, and so Apache will not contribute to or base projects off OpenJDK.

No Right Answer

Now we enter entirely into the realm of my own opinions.

At a glance, I hate Oracle for disallowing free use of the TCK. Nobody should be disallowed from implementing their own OSS Java. Harmony is a promising project that would have a promising future if the cloud of licensing, patent protection, and "compliance" could be lifted. It seems that's nearly impossible now.

I also hate to see a good project like Harmony "die" or be stuck in legal limbo. I'm sure dozens of developers have poured their hearts into Harmony, and they deserve to see it thrive.

On the other hand, I applaud Oracle for so vigorously promoting OpenJDK. OpenJDK is certainly a more mature JVM and class library than Harmony, having been Sun's official Java implementation for years and years. Bringing IBM and Apple into OpenJDK will help ensure it moves forward on all platforms of note. If you look only at OpenJDK, OSS Java is stronger now than it ever has been.

Oracle may have a point with the forking concerns. Because Apache's license is so open that anyone can create and release binary-only forks, it would in theory be possible to speckle the Java landscape with Harmony derivatives that are incompatible in subtle ways. Possible, but perhaps unlikely.

Ultimately, I feel sad about the direction Oracle has taken, but I'm bolstered by hopes that OpenJDK is going to really thrive and grow, and as a result "Java" will continue to see widespread use and adoption across a variety of platforms.

Other Ways Out

Now I go into pure speculation-mode.

A key problem with the TCK restriction is the fact that the Java specification provides patent grants only to compliant implementations. In other words, if you don't want an Oracle/Google-esque lawsuit once you have billions in the bank, you need to be compliant with the one true TCK. In order to be compliant with the TCK, you can't release your implementation as open source. The patent grant is obviously intended to ensure that third-party implementations are "really" Java.

There's a bit of a chicken-and-egg thing here. What if Harmony could pass the TCK? In theory, they may be at that point right now. Does the ability to pass the TCK but the inability to run it without tainting mean they're Java or not? If an unrelated third-party forked Harmony into "Barmony", acquired a TCK license, and proved that it passed...would that mean Harmony could be considered compliant without ever having run the TCK?

What if, as in Android, Harmony simply moved forward without claiming they were compliant? Oracle could eventually club them to death with patent bats. Perhaps such a legal battle would force the legal remifications of the JSPA violation to be addressed in court? Could Oracle survive a legal test of their violation in attempting to back up a patent suit?

What It Means For You

The ultimate question is how this affects Java developers today and going forward.

By most estimates, 99% of the worlds Java developers run on one of the standard Java implementations from Oracle, IBM, and other licensees. A massive number of them run atop Hotspot, either in an old closed-source Java 5/6 form or in an OpenJDK 6/7 form. "Nobody" runs Harmony, and so few if any day-to-day Java developers will be affected. That doesn't excuse the situation, but it does soften the actual damage caused.

Android is another peculiar case. It would be difficult for it to be tested compliant, even if there weren't FOU restrictions on doing so. It uses Harmony libraries but Dalvik VM. It is also a massive force in mobile development now, and killing it would likely put the final nail in mobile Java's coffin. Oracle has to know this. Oracle also has to know that killing Android would hand the mobile keys over to Apple and Microsoft forever. Could Android switch to using OpenJDK-based class libraries? Would that qualify it as being "largely" based on OpenJDK (noting that the class libraries are the vast majority of the code in OpenJDK)? Oracle/Google is likely to be stuck in court for a long time, while Android continues to expand into televisions and tablets along with telephones.

How about projects that build atop Java, like JRuby? Perhaps even higher a percentage of JRuby users are already running atop OpenJDK or an OpenJDK derivative like IcedTea or SoyLatte. Oracle pushing OpenJDK will only benefit those users. Ruboto (JRuby on Android) will follow whatever path Android itself ends up following, and nobody can see that future yet...but it seems unlikely Ruboto will ever die since it's unlikely Android will ever die.

In closing...I encourage everyone to read the EC notes and gather as much information as they can before claiming this is now the final "death" of Java. I also encourage everyone to contribute thoughts, clarifications, and speculation in the comments here.

Have a happy holiday!