Compiler Status
I've been working feverishly for the past several weeks to get the rest of the compiler complete. Currently, it's able to handle the majority of Ruby syntax. Here's a list of the remaining language features that do not compile:
- "rescue" blocks; exception handling in Ruby is rather complicated, and there's some particularly odd uses of rescue that will be a bit tricky to support with normal Java exception-handling.
- "class var declaration" is not yet supported. This is when you declare a class variable (@@foo) from within the body of a class or module. This primarily affects compiling class bodies, so although it prevents AOT compilation of some scripts, it doesn't usually affect individual methods.
- "opt n" execution. This is specifying "-n" to the Ruby runtime, and it loops the provided script as though it were surrounded by "while gets(); ... end". It's useful for line-by-line processing of stdin.
- "post execution" blocks. Post exe blocks are when you specify an END { ... } block somewhere in your script. These blocks are saved up and executed at the end of the script execution, regardless of where they appear in the script. They're a bit like Kernel#at_exit blocks.
- "retry". Tell me friends, do you know what "retry" actually does? Retry is used within a block/closure, and it causes the method containing the closure to be re-called anew. And as an interesting quirk, the original arguments to the method are re-evaluated, so if you call foo(bar()) and a retry is triggered within foo(), bar() will get invoked again for the retried call to foo(). Weird, eh? Update: I didn't explain this well. Here's another attempt: if you have the following code:
def foo(x = bar()); 1.times {retry}; end
If you call foo with no arguments, allowing the default-argument logic to fire, retry will cause that logic to fire again and again. It's essentially re-entering the method anew with the original arguments, but causing *argument processing* to be revisited. I'm not sure why you'd want this behavior, since it could frequently cause default arguments to re-call methods that might only be valid the first time.
- Some non-local flow control is not yet complete. Non-local flow control happens any time you return, break, or next from within a block (when not immediately inside a normal loop construct); there's a small sketch after this list. Much of non-local flow control is working, but I need to flush out any remaining cases that aren't running correctly.
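To make the "-n" behavior concrete, here's a rough sketch (the log-scanning one-liner and the file name are just invented for illustration). Running:
jruby -n -e 'puts $_ if $_ =~ /ERROR/' app.log
behaves as though the one-liner had been written like this:
while gets()
  puts $_ if $_ =~ /ERROR/
end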
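And a quick illustration of END blocks, using a trivial made-up script:
END { puts "I run last" }
puts "I run first"
This prints "I run first" followed by "I run last", no matter where in the file the END block appears.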
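Here's the sort of non-local flow control that last item is talking about, as a small sketch (first_even is just an illustrative name):
def first_even(numbers)
  numbers.each do |n|
    # this return exits first_even itself, not just the block passed to each
    return n if n % 2 == 0
  end
  nil
end
The return has to unwind out of the each call and leave the enclosing method, and it's that unwinding the compiler has to get exactly right.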
To see why rescue blocks are tricky to compile, consider this code:
a = [1, 2, (begin; raise; rescue; 3; end)]
When this code is compiled, it turns into a local variable assignment. The value assigned is a literal array construction with three elements: a Fixnum 1, a Fixnum 2, and a rescued block of code. The typical way to construct the array is to follow these steps:
- Construct an array of the appropriate size
- Dup the array reference
- Push a constant integer zero
- Push Fixnum 1
- Insert Fixnum 1 at index zero in the array. This consumes the dup'ed array reference, the index, and the Fixnum 1.
- Dup the array reference again
- Push a constant integer one
- Push Fixnum 2
- Insert Fixnum 2
- Dup the array reference again
- Push a constant integer two
- Now it gets complicated; we must recurse in the compiler to handle the rescue block
- The rescue block is compiled and a "raise" is triggered in the code
- The exception raised is handled, resulting in the whole rescue leaving a Fixnum 3 on the stack
- Insert the Fixnum 3
- Construct a RubyArray object from the object array reference still remaining on the stack
The complication is that when the JVM transfers control to an exception handler, it clears the operand stack, wiping out the partially-constructed array and elements we so carefully pushed. We will likely have to solve this in one of two ways:
- We could save off the stack when entering code that might trigger exception handling
- We could put the exception-handling logic in a separate method and invoke it in-place, thereby protecting our executing stack from being cleared.
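As a rough Ruby-level analogue of the second option (just a sketch; the __rescued_element_0 method name is hypothetical, not something the compiler actually emits), hoisting the rescued expression into its own method means the surrounding array construction has no half-built state on the stack when the exception fires:
def __rescued_element_0
  begin
    raise
  rescue
    3      # the rescued expression's value
  end
end
a = [1, 2, __rescued_element_0]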
A Nice Performance Milestone
And on the topic of performance, the recent compiler work has allowed us to reach a new milestone: we now exceed Ruby 1.8.6's performance on M. Edward (Ed) Borasky's MatrixBenchmark.
Some months back, after the Mountain West RubyConf in Salt Lake City, Ed posted an interesting blog entry where he professed a lot of confidence in JRuby's future. We emailed a bit offline, and he pointed me to this matrix benchmark he'd been using to measure the relative performance of Ruby 1.8.6 and Ruby 1.9 (YARV). I told him I'd give it a try.
Originally, we were perhaps 50% to 100% slower than Ruby 1.8.6. This was back when hardly anything was compiling, and there had been few serious efforts to optimize the JRuby runtime. Performance slowly crept up as time went on. But as recently as a week ago, JRuby performance was still roughly 20-25% slower than 1.8.6.
So last week, I dug into it a bit more. I turned on JRuby's JIT logging (-J-Djruby.jit.logging=true) and verbose logging (-J-Djruby.jit.logging.verbose=true) to log compiling and non-compiling methods, respectively. As it turned out, the "inverse_from" method in matrix.rb was not yet compiling...and it was where the bulk of MatrixBenchmark's work was happening.
The final sticking point in the compiler for this method was "operator element assignment" syntax, or basically anything that looks like a[0] += 5. It's a little involved to compile; you have to retrieve the element, calculate the right-hand value, call the operator method, and reassign the result, all in one operation. For the ||= or &&= versions, you have to perform a boolean check against the element to see whether you should proceed to the assignment. A good bit of compiler code, but it had to be done.
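For reference, here's roughly what those forms mean in plain Ruby (a sketch of the semantics, not of the bytecode the compiler emits):
a = [1, 2, 3]
a[0] += 5     # roughly: a[0] = a[0] + 5 -- fetch the element, call +, then call []=
a[1] ||= 10   # roughly: a[1] || (a[1] = 10) -- only calls []= when the element is nil or false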
So then, with "OpElementAsgn" compiling, it was time to re-run the numbers. And finally, finally, we were comfortably exceeding Ruby 1.8.6 performance:
Ruby 1.8.6 (times are user, system, total, and real seconds):
Hilbert matrix of dimension 128 times its inverse = identity? true
586.110000 5.710000 591.820000 (781.251569)
JRuby trunk, Java 6 server, ObjectSpace disabled:
Hilbert matrix of dimension 128 times its inverse = identity? true
372.950000 0.000000 372.950000 (372.950000)
Or should I say vastly exceeding? By my calculation this is an easy 2x performance increase, and perhaps a 70% improvement just by getting this one extra method to compile.
On Beyond Zebra
I believe we're pretty well on-target to have the compiler completed by RubyConf in November. I'm about to embark on a refactoring adventure to prepare for the stack-juggling I'll have to do to support rescue blocks. That will mean minimal progress on adding to the compiler until the end of the month, but ideally the refactoring will make it easy to get rescue compilation complete. The others are just a matter of spending some time.
Once the JRuby compiler is complete, we will start testing in earnest against a fully pre-compiled Ruby stdlib. Along with that, we'll wire in support for pre-compiling RubyGems as they install and pre-compiling Ruby scripts as they are executed and loaded. Much of this already works in prototype form, but it's waiting on the completion of the compiler before going into general use.
I also have plans for a "static" compiler for JRuby that would enable compiling Ruby classes into normal, instantiable, callable, static Java classes. This would bring us on par with other compiled languages on the JVM, and allow you to directly instantiate and invoke JRuby/Ruby objects from within your Java code.
Beyond all this work, Tom and I have been discussing a whole raft of performance improvements we could make to the underlying JRuby runtime. There's a lot more performance to be had, and it's just around the corner.
Exciting times, friends. Exciting times.
Nice :)
Having a static compiler is a great feature!
Great!
Charles, is there anything that somebody like me (with Java and Ruby experience, but without much knowledge of JRuby internals) could help out with? I have some spare time and would like to spend it better than just watching TV. :)
Any pointers or links would be appreciated!
I'd focus on Rails performance. Probably the most important benchmark of all. It would be great if JRuby 1.1 came close to MRI performance in this regard.
Something I noticed when looking at Rails performance with a profiler:
RubyModule.includeModule() -> getRuntime().getCacheMap().clear() consumed about 10% of the complete request (including Jetty/Goldspike).
This was a very simple page, though, and the profiling was nothing scientific, just a quick shot (the stack traces are impressive, by the way).