Thursday, July 12, 2007

To Keyword Or Not To Keyword

One of the most attractive aspects of Ruby is the fact that it has relatively few sacred keywords. In most cases, things you'd expect to be keywords are actually methods, and you can wrap or hook their behavior and create amazing potential.

One perfect example of this is require. Because require is just a method, you can define your own version that wraps its behavior. This is exactly how RubyGems does its magic...rather than immediately calling the default require, it can modify load paths based on your installed gems, allowing for a dynamically-expanding load path and the pluggability we've all come to know and love.

But all such keyword-like methods are not so well behaved. Many methods make runtime changes that are otherwise impossible to do from normal Ruby code. Most of these are on Kernel. I propose that several of these methods should actually be keywords.

Update: Evan Phoenix of Rubinius (EngineYard), Wayne Kelly of Ruby.NET (Queensland University), and John Lam of IronRuby (Microsoft) have voiced their agreement on this interesting ruby-core mailing list thread. Have you shared your thoughts?

Justifying Keywords

There's a number of (in my opinion, very strong) justifications for this:
  1. Many Kernel methods manipulate runtime state in ways no other methods can. For example: local_variables requires access to the caller's variable scope; block_given? requires access to the block/iter stacks (in MRI code); eval requires access to just about everything having to do with a call; and there are others, see below.
  2. Because many of these methods manipulate normally-inaccessible runtime state, it is not possible to implement them in Ruby code. Therefore, even if someone wanted to override them (the primary reason for them to be methods) they could not duplicate their behavior in the overridden version. Overriding only destroys their utility.
  3. These methods are exactly the ones that complicate optimizing Ruby in all implementations, including Ruby 1.9, Rubinius, JRuby, Ruby.NET, and others. They confound a compiler's efforts to optimize calls by always leaving open questions about the behavior of a method. Will it need access to a heap-allocated scope? Will it save off a binding or the current call frame? No way to know for sure, since they're methods.
In short, there appears to be no good reason to keep them as methods, and many reasons to make them keywords. What follows is a short list of such methods and why they ought to be keywords:
  • *eval - requires implicit access to the caller's binding
  • block_given?/iterator? - requires access to block/iter information
  • local_variables - requires access to caller's scope
  • public/private/protected - requires access to current frame's visibility
There may be others, but these are definitely the biggest offenders. The three points above were used to compile this list, but my criteria for a keyword could be the following more straightforward points. A feature should be implemented (or converted to) a keyword if it fits either of the following criteria:
  • It manipulates runtime state in ways impossible from user-created code
  • It can't be implemented in user-created code, and therefore could not reasonably be overridden or hooked to provide additional behavior
As an alternative, if modifications could be made to ensure these methods were not overridable, Ruby implementations could safely treat them as keywords; searching for calls to "eval" in a given context would be guaranteed to mean an eval would take place in that context.

What do we gain from doing all this?

I can at least give a JRuby perspective. I expect others can give their perspectives.

In JRuby, we could greatly optimize method invocations if, for example, we knew we could just use Java's local variables (on Java's stack) rather than always heap-allocating a scoping structure. We could also avoid allocating a frame or binding when they are not needed, just allowing Java's call frame to be "enough" for us. We can already detect if there are closures in a given context, which helps us learn that a heap-allocated scope will be necessary, but we can't safely detect eval, block_given?, etc. As a result of these methods-that-would-be-keywords, we're forced to set up and tear down every method in the most expensive manner.

Other implementations&emdash;including Ruby 1.9/2.0 and Rubinius&emdash;would probably be able to make similar optimizations if we could calculate ahead of time whether these keyword operations would occur.

For what it's worth, I may go ahead and implement JRuby's compiler to treat these methods as keywords, only falling back on the "method" behavior when we detect in the rest of the system that the keyword has been overridden. But that situation is far from ideal...we'd like to see all implementations adopt this behavior and so benefit equally.

As an example, here's an early demonstration of the performance change in our old friend fib() when we can know ahead of time if any of these keywords are called (fib calls none of them). This example shows the performance today and the performance when we can safely just use Java local variables and scoping constructs. We could additionally omit heap-allocated frames for each call, giving a further boost.

I've included Ruby 1.8.6 to provide a reference value.


Current JRuby:
~ $ jruby -J-server bench_fib_recursive.rb
1.323000 0.000000 1.323000 ( 1.323000)
1.118000 0.000000 1.118000 ( 1.119000)
1.055000 0.000000 1.055000 ( 1.056000)
1.054000 0.000000 1.054000 ( 1.054000)
1.055000 0.000000 1.055000 ( 1.054000)
1.055000 0.000000 1.055000 ( 1.055000)
1.055000 0.000000 1.055000 ( 1.055000)
1.049000 0.000000 1.049000 ( 1.049000)

~ $ jruby -J-server bench_method_dispatch_only.rb
Test interpreted: 100k loops calling self's foo 100 times
3.901000 0.000000 3.901000 ( 3.901000)
4.468000 0.000000 4.468000 ( 4.468000)
2.446000 0.000000 2.446000 ( 2.446000)
2.400000 0.000000 2.400000 ( 2.400000)
2.423000 0.000000 2.423000 ( 2.423000)
2.397000 0.000000 2.397000 ( 2.397000)
2.399000 0.000000 2.399000 ( 2.399000)
2.401000 0.000000 2.401000 ( 2.401000)
2.427000 0.000000 2.427000 ( 2.428000)
2.403000 0.000000 2.403000 ( 2.403000)
Using Java's local variables instead of a heap-allocated scope:
~ $ jruby -J-server bench_fib_recursive.rb
2.360000 0.000000 2.360000 ( 2.360000)
0.818000 0.000000 0.818000 ( 0.818000)
0.775000 0.000000 0.775000 ( 0.775000)
0.773000 0.000000 0.773000 ( 0.773000)
0.799000 0.000000 0.799000 ( 0.799000)
0.771000 0.000000 0.771000 ( 0.771000)
0.776000 0.000000 0.776000 ( 0.776000)
0.770000 0.000000 0.770000 ( 0.769000)

~ $ jruby -J-server bench_method_dispatch_only.rb
Test interpreted: 100k loops calling self's foo 100 times
3.100000 0.000000 3.100000 ( 3.100000)
3.487000 0.000000 3.487000 ( 3.487000)
1.705000 0.000000 1.705000 ( 1.706000)
1.684000 0.000000 1.684000 ( 1.684000)
1.678000 0.000000 1.678000 ( 1.678000)
1.683000 0.000000 1.683000 ( 1.683000)
1.679000 0.000000 1.679000 ( 1.679000)
1.679000 0.000000 1.679000 ( 1.679000)
1.681000 0.000000 1.681000 ( 1.681000)
1.679000 0.000000 1.679000 ( 1.679000)
Ruby 1.8.6:
~ $ ruby bench_fib_recursive.rb     
1.760000 0.010000 1.770000 ( 1.775304)
1.750000 0.000000 1.750000 ( 1.770101)
1.760000 0.010000 1.770000 ( 1.768833)
1.750000 0.010000 1.760000 ( 1.782908)
1.750000 0.010000 1.760000 ( 1.774193)
1.750000 0.000000 1.750000 ( 1.766951)
1.750000 0.010000 1.760000 ( 1.777814)
1.750000 0.010000 1.760000 ( 1.782449)

~ $ ruby bench_method_dispatch_only.rb
Test interpreted: 100k loops calling self's foo 100 times
2.240000 0.000000 2.240000 ( 2.268611)
2.160000 0.010000 2.170000 ( 2.187729)
2.280000 0.010000 2.290000 ( 2.292342)
2.210000 0.010000 2.220000 ( 2.250331)
2.190000 0.010000 2.200000 ( 2.210965)
2.230000 0.000000 2.230000 ( 2.260737)
2.240000 0.010000 2.250000 ( 2.256210)
2.150000 0.010000 2.160000 ( 2.173298)
2.250000 0.010000 2.260000 ( 2.271438)
2.160000 0.000000 2.160000 ( 2.183670)

What do you think? Is it worth it?

9 comments:

  1. Personally, I'd like to see Ruby get the ability to implement most of those constructs in Ruby, but I doubt that's going to happen at any time in 1.8.

    Can you do the optimization 'speculatively' and watch for any attempt to redefine any of the methods that would invalidate the optimization?

    ReplyDelete
  2. Speculative optimization is what I'm planning to do now, assuming they're keyword-like and only falling back on the old behavior (probably globally) if I can detect that's not the case. the problem is whether I can detect it or not.

    Another option is to lazily initialize the runtime constructs in question only at the point I know they're needed, but that's extremely complicated to implement in practice.

    It's worth mentioning that Rubinius, while far from being a complete Ruby implementation, has already opted to treat these operations as keywords. Evan's on board with the idea too.

    ReplyDelete
  3. As they are methods you can still alias them/override them and then recall them as necessary, so all is not lost. Making them keywords removes these options.

    ReplyDelete
  4. I'm also not sure whether changing the language to make optimization easier is necessarily the best way to go.

    It sort of isn't necessarily important that these methods can't be implemented in-language (it may be nice if they could be) -- as Dr Nic pointed out, you can augment their behaviour and call the old versions to get access to their implementations.

    I guess if there's a way to detect if they're overridden (as you're doing), then that's the best way to do it.

    ReplyDelete
  5. Nic and Simon: I think you are mistaken. Although you can alias these methods, you can't ever wrap or hook their behavior. An example with eval:

    alias :my_eval :eval

    def eval(string)
    my_eval(string)
    end

    x = 1
    eval("puts x") #=> error

    The actual eval call must be made within the scope where you expect it to run, so there's no way to wrap it. The same goes for the others; binding would get the wrong binding, public/private/protected would set visibility in the wrong scope, and so on. That means these methods are basically impossible to wrap or hook, which is the primary argument for keeping them methods.

    Bottom line is that these aren't methods...they're keyword operations that do things no method can do. They should be keywords.

    ReplyDelete
  6. fHmm... I just realised that I'm mistaken about the way local_bindings behaves when you eval it with a binding. I just wasn't thinking carefully enough about what goes on.

    ReplyDelete
  7. Could it be an alternative to let the actual keyword-handing behaviour be specified as a command-line switch to the interpreter? (ie "optimized" or "true ruby")

    ReplyDelete
  8. these offenders may be able to do things that no method can do, but ruby programmers can play around with them in the same way they do with other methods. they can use Object#send, Method#call, etc. because they expect them to be methods, not keywords, and they are least likely to be surprised when they use them in that way. i don't think the speed boost is worth the split in the consistency of the programmers' expectations.

    ReplyDelete
  9. El Raichu: Except that users can't use send to invoke a number of other features like "alias". In that case, Ruby has both a keyword and a method with expected behaviors. There's already a split in the consistency of behaviors here.

    ReplyDelete