Thursday, September 11, 2008

A First Taste of InvokeDynamic


Greetings, readers!

Over the past couple weeks I've had a few departures from typical JRuby development. I consider it a working vacation. I'm hoping to report on all of it soon, but for now we'll focus on one of the most exciting items: JSR-292, otherwise known as "InvokeDynamic".

I've reported on invokedynamic previously (InvokeDynamic: Actually Useful?), and of course the technical bits of John Rose's blog should be required reading for anyone interested in this stuff. What I'm going to try to do today is give you an inside picture of the pieces of InvokeDynamic and how they fit together. It will be technical, but everyone should be able to follow it. Ready?

The Problem

Any description of a solution must first describe the problem.

As you probably know, Java is a statically-typed language. That means the types of all variables, method arguments, method return values, and so on must be known before runtime. In Java's case, this also means all variable types must be declared explicitly, everywhere. A variable cannot be untyped, and a method cannot accept untyped parameters nor return an untyped value. Types are pervasive.

The problem, put simply, is this: Because Java is the primary language on the JVM, almost all language implementations on the JVM are written in Java. When implementing a statically-typed language, especially one with structure and rules similar to Java, this is not much of a problem. But when implementing a dynamic language that stubbornly refuses to yield type information until runtime, all this static-typing is a real pain in the neck. Of course this is pretty much the same situation when implementing a dynamic language on top of C or C++ or C#, since they're all generally statically-typed languages too. Or is it? An example is in order.

    public class Hello {
        public static void main(String[] args) {
            java.util.List list = new java.util.ArrayList();
            for (int i = 0; i < 5; i++) {
                String newString = args[0] + i;
                list.add(newString);
            }
            System.out.println(list);
        }
    }

Here we see a short, reasonably simple snippet of Java code. An ArrayList is constructed, populated with five strings based on the incoming first command-line argument and a numeric iteration count, and then displayed as a string on the console. The type declarations (String[], java.util.List, int, String) represent a lot of the visual noise, the "ceremony" that dynamic language fans decry. From a usability perspective, they're both a positive and negative influence; they noise up the code and require more typing, but they also make it trivial to determine the type of a variable (in most cases) or build tools that safely restructure your code (so-called "refactoring"). From a technical perspective, they give the "javac" compiler all the information it needs to produce very clean, optimized bytecode, and they give the JVM itself type information it uses to execute and optimize that bytecode at runtime. Ahh, but what about the bytecode?

If we peel the Java layer away, the situation changes a bit. At the JVM bytecode level, types are still visible, but they're not nearly as prevalent. Here's the same code in bytecode; keep an eye on where type names still appear:

    public static void main(java.lang.String[]);
      Code:
         0: new #2; //class java/util/ArrayList
         3: dup
         4: invokespecial #3; //Method java/util/ArrayList."<init>":()V
         7: astore_1
         8: iconst_0
         9: istore_2
        10: iload_2
        11: iconst_5
        12: if_icmpge 50
        15: new #4; //class java/lang/StringBuilder
        18: dup
        19: invokespecial #5; //Method java/lang/StringBuilder."<init>":()V
        22: aload_0
        23: iconst_0
        24: aaload
        25: invokevirtual #6; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
        28: iload_2
        29: invokevirtual #7; //Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
        32: invokevirtual #8; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
        35: astore_3
        36: aload_1
        37: aload_3
        38: invokeinterface #9, 2; //InterfaceMethod java/util/List.add:(Ljava/lang/Object;)Z
        43: pop
        44: iinc 2, 1
        47: goto 10
        50: getstatic #10; //Field java/lang/System.out:Ljava/io/PrintStream;
        53: aload_1
        54: invokevirtual #11; //Method java/io/PrintStream.println:(Ljava/lang/Object;)V
        57: return

Since not everyone reads JVM bytecode like their native language, a description of these operations is in order.

The JVM provides what's called an "operand stack" for bytecode it executes. The stack is analogous to registers in a "real" CPU, acting as temporary storage for values against which operations (like math, method calls, and so on) are to be performed. So most JVM bytecode spends its time either manipulating that stack by pushing, popping, duping, and swapping values, or executing operations that produce or consume values. It's a pretty simple mechanism. So then, with a general understanding of the operand stack, let's look at the bytecode itself:
  • The "load" and "store" instructions are all local variable accesses. "load" retrieves a local variable and pushes it on the stack. "store" pops a value off the stack and stores it in a local variable. The prefix indicates whether the value is an object or "reference" type (denoted by "a") or one of the primitive types (denoted by "i" for integer, "f" for float, and so on). The standard load and store operations take an argument (embedded along with the operation into the bytecode) to indicate which indexed local variable to work with, but there are specialized bytecodes (denoted by a suffixed underscore and digit) for a "compressed" representation of heavily-used low-index variables.

  • The "invoke" bytecodes are what you might expect: method invocations. Method invocations consume zero or more arguments from the stack and in some cases a receiver object as well. "virtual" refers to a normal call to a non-interface method on an object receiver. "interface" refers to an interface invocation on an object receiver. "static" refers to a static invocation, or one that does not require an object to call against. The "strange quark" of the bunch is "invokespecial", which is used for calling constructors and superclass implementations of methods. You'll notice a couple invokespecials above paired with "new" operations; "new" instantiates the object and "invokespecial" initializes it.

  • The "const" instructions are what you might guess: they push a constant on the stack. Again, the prefix and suffix denote type and "compressed" opcodes for specific values, respectively.

  • "aaload" and all "*aload" operations are retrievals out of an array. As with local variables, the first letter indicates the type of the array. Here, the "aaload" is our retrieval of args[0].

  • "iinc" is an integer increment operation. The arguments are the index of the local variable and how much to increment it by (usually 1).

  • "if_icmpge" performs a conditional jump after testing whether the second-topmost int on the stack (indicated by the "i" in "icmpge") is greater than or equal to the topmost int on the stack (the >= relationship represented by the "ge" in "icmpge"). This is our "for" loop test i < 5 reversed to act as a loop exit condition rather than a loop continue condition. The looping itself is provided by the "goto" operation further down (yes, the JVM has goto...it's just Java that doesn't have goto).

  • Finally, we see the "return" instruction, which represents the void return from main. If it were a return of a specific value or object type, it would be prefixed with the appropriate type character (as in "ireturn" or "areturn").
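
If the stack business still feels abstract, a toy interpreter may help. This is only a sketch: the opcode names mirror the real instructions, but the int-array encoding, the ToyStackMachine class, and the "count the loop body" stand-in are all made up for illustration, and it handles ints only. The program it runs has the same shape as the loop in the listing above, with i in local slot 2.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of an operand stack plus local variable slots, ints only.
// Opcode names mirror real JVM instructions; the encoding is invented.
class ToyStackMachine {
    static final int ICONST = 0, ISTORE = 1, ILOAD = 2, IINC = 3,
                     IF_ICMPGE = 4, GOTO = 5, RETURN = 6;

    // for (i = 0; i < 5; i++) { body }  with i in slot 2.
    // Slot 0 counts loop-body executions in place of the string building.
    static final int[][] LOOP = {
        {ICONST, 0},      // 0: push 0
        {ISTORE, 2},      // 1: pop into slot 2 (i = 0)
        {ILOAD, 2},       // 2: push i
        {ICONST, 5},      // 3: push 5
        {IF_ICMPGE, 8},   // 4: pop both; if i >= 5, jump to 8
        {IINC, 0, 1},     // 5: loop body stand-in
        {IINC, 2, 1},     // 6: i++
        {GOTO, 2},        // 7: back to the loop test
        {RETURN},         // 8: done
    };

    // Runs the program and returns the local variable slots at exit.
    static int[] run(int[][] program, int localCount) {
        Deque<Integer> stack = new ArrayDeque<>();
        int[] locals = new int[localCount];
        int pc = 0;
        while (true) {
            int[] insn = program[pc++];
            switch (insn[0]) {
                case ICONST: stack.push(insn[1]); break;
                case ISTORE: locals[insn[1]] = stack.pop(); break;
                case ILOAD:  stack.push(locals[insn[1]]); break;
                case IINC:   locals[insn[1]] += insn[2]; break;
                case IF_ICMPGE: {
                    int top = stack.pop();      // topmost int
                    int second = stack.pop();   // second-topmost int
                    if (second >= top) pc = insn[1];
                    break;
                }
                case GOTO:   pc = insn[1]; break;
                case RETURN: return locals;
                default: throw new IllegalStateException("unknown opcode " + insn[0]);
            }
        }
    }

    public static void main(String[] args) {
        int[] locals = run(LOOP, 3);
        System.out.println("body ran " + locals[0] + " times, i = " + locals[2]);
    }
}
```

Notice that nothing in the interpreter loop declares a variable type; the "i" in each opcode name is the only type information in sight.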

Now the astute reader may already have noticed that other than being specified as reference or primitive types, the opcodes themselves have no type information. Even beyond that, there are no actual variable declarations at the bytecode level whatsoever. The only types we see come in the form of opcode prefixes (as in aload, iinc, etc) and the method signatures against which we execute invoke* operations. The stack itself is also untyped; we push a reference type (aload) one minute and push a primitive type (iload) the next (though values on the stack do not "lose" their types). And when I tell you that the type signatures shown above for each method invocation or object construction are simply strings stuffed into the class's pool of constants...well...now you may start to realize that Java's sometimes touted, oft-maligned static-typing...is just a façade.
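
To see just how string-like those signatures are, here's a small sketch that produces them. It leans on java.lang.invoke.MethodType, which comes from the API as it later shipped in Java 7 rather than the prototype discussed in this post; the DescriptorSketch class and its helper are my own illustration.

```java
import java.lang.invoke.MethodType;

class DescriptorSketch {
    // Builds the descriptor string the class file stores for a signature.
    static String descriptor(Class<?> returnType, Class<?>... paramTypes) {
        return MethodType.methodType(returnType, paramTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // List.add(Object) returning boolean, as in the invokeinterface above
        System.out.println(descriptor(boolean.class, Object.class));
        // StringBuilder.append(int) returning StringBuilder
        System.out.println(descriptor(StringBuilder.class, int.class));
    }
}
```

The two lines printed are (Ljava/lang/Object;)Z and (I)Ljava/lang/StringBuilder;, the same strings that appear in the comments of the bytecode listing.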

The Greatest Trick

Let's dispense with the formality once and for all. The biggest lie that's been spread about the JVM (ok, maybe the biggest after "it's slow") is that it's never going to be a good host for dynamic languages. "But look at Java," people cry, "it's so staticky and rigid; it's far too difficult to implement a dynamic language on top of that!" And in a very naive way, they're partially correct. Writing a language implementation in Java and following Java's rules can certainly make life difficult for a dynamic language implementer. We end up stripping types (making everything Object, since we don't know types until runtime), boxing types (stuffing primitives in carrier objects, to simplify passing them through our Object-only code), and boxing array arguments (since many dynamic languages also have flexible "arities" or numbers of arguments, and others allow optional, "rest", and other special argument types). With each sacrifice we make, we lose many of the benefits static typing provides us, not to mention confounding the JVM's efforts to optimize.

But it's not nearly as bad as it seems. Because much of the rigid, static nature of Java is in the language itself (and not the JVM) we can in many cases ignore the rules. We don't have to declare local variable types. We can juggle items on the stack at will. We can cheat in clever ways, allowing much of normal code execution to proceed with very little type information. In many cases we can get that code to run nearly as well as statically-typed code of twice the size, because the JVM is so dynamic already at its core. JVM bytecode is our assembly, and it's a powerful tool in the right hands.

Unfortunately, on current JVMs, there's one place we absolutely, positively must follow the rules: method invocation.

Know Thyself

Question: In the bytecode above, all invocations came with a formal "signature" representing the type to call against and the types of the method's arguments and return value. If we do not know those types until runtime, and they may be variant even then...how do we support invocation in a dynamic language?

Answer: Very carefully.

Because we are bound to following Java's method invocation rules, the once sunny and clear forecast turns rather cloudy. Every invocation has to be called against a known type. Its arguments must be known types. Its return value must be a known type. Making matters worse, we can't even provide signatures with similar types; the signatures must exactly match the method we intend to invoke. So we understand limitation #1: invocations are statically typed.

There's another way this affects dynamic languages, especially those that may not present normal Java types or that run in an interpreted mode for some part of execution: Invocations must be against real methods on real types. There's simply no way to tell the JVM that instead of calling method W on type X with param Y and return value Z, I want you to enter this interpreter loop; don't mind the types, we'll figure it out for you. Oh no, you have to be part of the Java club and present a normal Java type to get invocation privileges. That's limitation #2: invocations must be against Java methods on Java types.

Adding insult to injury, JVMs even run verification against the bytecode you feed them to make sure you're following the rules. One little mistake and zooop...off to the exception farm you go. It's downright unfair.

The traditional way to get around all this rigidity (a technique used heavily even by normal Java libraries, since everyone wants to bend the rules sometimes) is to abstract out the act of "invoking" itself, usually by creating "Method" objects that do the call for you. And oddly enough, the reflection capabilities of the JVM come into heavy play here. "Method" happens to be one of the types in the java.lang.reflect package, and it even has an "invoke" method on it. Even better, "invoke" returns Object, and accepts as parameters an Object receiver and an array of Object arguments. Can it truly be this easy? Well, yes and no.
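
For anyone who hasn't used it, here's roughly what that looks like. This is a minimal sketch, not JRuby code: the ReflectiveCallSketch class and its "send" helper are hypothetical, and the helper only handles public methods on public classes.

```java
import java.lang.reflect.Method;

class ReflectiveCallSketch {
    // Looks up a public method by name and parameter types on the receiver's
    // runtime class, then calls it through java.lang.reflect.Method.
    static Object send(Object receiver, String name, Class<?>[] paramTypes, Object... args) {
        try {
            Method method = receiver.getClass().getMethod(name, paramTypes);
            return method.invoke(receiver, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not invoke " + name, e);
        }
    }

    public static void main(String[] args) {
        Object result = send("hello", "concat", new Class<?>[] { String.class }, " world");
        System.out.println(result); // prints "hello world"
    }
}
```

Note that the lookup starts from receiver.getClass(): there has to be a real Java type to ask.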

Using reflection to invoke methods works great...except for a few problems. Method objects must be retrieved from a specific type, and can't be created in a general way. You can't ask the JVM to give you a Method that just represents a signature, or even a name and a signature; it must be retrieved from a specific type available at runtime. Oh, but that's at runtime, right? We're ok, because we do actually have types at runtime, right? Well, yes and no.

First off, you're ignoring the second inconvenience above. Language implementations like JRuby or Rhino, which have interpreters, often simply don't *have* normal Java types they can present for reflection. And if you don't have normal types, you don't have normal methods either; JRuby, for example, has a method object type that represents a parsed bit of Ruby code and logic for interpreting it.

Second, reflected invocation is a lot slower than direct invocation. Over the years, the JVM has gotten really good at making reflected invocation fast. Modern JVMs actually generate a bunch of code behind the scenes to avoid much of the overhead old JVMs dealt with. But the simple truth is that reflected access through any number of layers will always be slower than a direct call, partially because the completely generified "invoke" method must check and re-check receiver type, argument types, visibility, and other details, but also because arguments must all be objects (so primitives get object-boxed) and must be provided as an array to cover all possible arities (so arguments get array-boxed).
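
The boxing is easy to see for yourself. A small sketch (the ReflectionBoxingSketch class is just for illustration) that calls Math.max(int, int) reflectively:

```java
import java.lang.reflect.Method;

class ReflectionBoxingSketch {
    // Calls Math.max(int, int) reflectively and returns whatever invoke() returns.
    static Object reflectiveMax(int a, int b) {
        try {
            Method max = Math.class.getMethod("max", int.class, int.class);
            // Both ints are boxed to Integer and packed into an Object[]
            // before the call; the int result is boxed on the way back out.
            Object[] boxedArgs = { a, b };
            return max.invoke(null, boxedArgs);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object result = reflectiveMax(3, 7);
        System.out.println(result + " is a " + result.getClass().getName());
        // prints "7 is a java.lang.Integer"
    }
}
```

A direct call to Math.max would touch no objects at all; the reflected one allocates an array and up to three Integers.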

The performance difference may not matter for a library doing a few reflected calls, especially if those calls are mostly to dynamically set up a static structure in memory against which it can make normal calls. But in a dynamic language, where every call must use these mechanisms, it's a severe performance hit.

Build a Better Mousetrap?

As a result of reflection's poor (relative) performance, language implementers have been forced to come up with new tricks. In JRuby's case, this means we generate our own little invoker classes at build time, one per core class method. So instead of calling through our DynamicMethod to a java.lang.reflect.Method object, boxing argument lists and performing type checks along the way, we're able to create a fast, specialized bit of bytecode that does the trick for us.

    public org.jruby.runtime.builtin.IRubyObject call(org.jruby.runtime.ThreadContext, org.jruby.runtime.builtin.IRubyObject,
            org.jruby.RubyModule, java.lang.String, org.jruby.runtime.builtin.IRubyObject);
      Code:
         0: aload_2
         1: checkcast #13; //class org/jruby/RubyString
         4: aload_1
         5: aload 5
         7: invokevirtual #17; //Method org/jruby/RubyString.split:(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/RubyArray;
        10: areturn

Here's an example of a generated invoker for RubyString.split, the implementation of String#split, taking one argument. We pass into the "call" method a ThreadContext (runtime information for JRuby), an IRubyObject receiver (the String itself), a RubyModule target Ruby type (to track the hierarchy during super calls), a String method name (to allow aliased methods to present an accurate backtrace), and the argument. Out of it we get an IRubyObject return value. And the bytecode is pretty straightforward; we prepare our arguments and the receiver and we make the call directly. What would normally be perhaps a dozen layers of reflected logic has been reduced to 11 bytes of bytecode, plus the size of the class/method metadata like type signatures, method names, and so on.

But there's still a problem here. Take a look at this other invoker for RubyString.slice_bang, the implementation of String#slice!:

    public org.jruby.runtime.builtin.IRubyObject call(org.jruby.runtime.ThreadContext, org.jruby.runtime.builtin.IRubyObject,
            org.jruby.RubyModule, java.lang.String, org.jruby.runtime.builtin.IRubyObject);
      Code:
         0: aload_2
         1: checkcast #13; //class org/jruby/RubyString
         4: aload_1
         5: aload 5
         7: invokevirtual #17; //Method org/jruby/RubyString.slice_bang:(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/runtime/builtin/IRubyObject;
        10: areturn

Oddly familiar, isn't it? What we have here is called "wastefulness". In order to provide optimal invocation performance for all core methods, we must generate hundreds of these tiny methods into tiny classes with everything neatly tied up in a bow so the JVM will pretty please perform that invocation for us as quickly as possible. And the largest side effect of all this is that we generate the same bytecode, over and over again, with only the tiniest of changes. In fact, this case only changes one thing: the string name of the method we eventually call on RubyString. There are dozens of these cases in JRuby's core classes, and if we attempted to extend this mechanism to all Java types we encountered (we don't, for memory-saving purposes), there would be hundreds of cases of nearly-complete duplication.
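
The same duplication is easier to spot in Java source. The sketch below is a simplified stand-in, not JRuby's actual invoker code: the Invoker interface and both classes are hypothetical, java.lang.String plays the part of RubyString, and the ThreadContext, RubyModule, and name parameters are left out.

```java
class InvokerSketch {
    // Hypothetical, stripped-down invoker shape: receiver plus one argument.
    interface Invoker {
        Object call(Object self, Object arg);
    }

    static final class StartsWithInvoker implements Invoker {
        public Object call(Object self, Object arg) {
            // cast the receiver, pass the argument along, call directly
            return ((String) self).startsWith((String) arg);
        }
    }

    static final class EndsWithInvoker implements Invoker {
        public Object call(Object self, Object arg) {
            // identical in every respect except the target method name
            return ((String) self).endsWith((String) arg);
        }
    }

    public static void main(String[] args) {
        Invoker starts = new StartsWithInvoker();
        Invoker ends = new EndsWithInvoker();
        System.out.println(starts.call("foobar", "foo")); // true
        System.out.println(ends.call("foobar", "foo"));   // false
    }
}
```

Two classes, two sets of class metadata, and one word of actual difference between them.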

I smell an opportunity. Our first step is to trim all that fat.

Hitting the Wall

Let me tell you a little story.

Little Billy developer wanted to freely generate bytecode. He'd come to recognize the power of code generation, and knew his language implementation was dynamic enough that compiling once would not be optimal. He also knew his language needed to do dynamic invocation on top of a statically-typed language, and needed lots of little invokers.

So one day, Billy's happily playing in the sandbox, building invokers and making "vroom, vroom" sounds, when along comes mean old Polly Permgen.

"Get out of my sandbox, Billy," cried Polly, "you're taking up too much space, and this is *my* heap!"

"Oh, but Polly," said Billy, rising to his feet. "I'm having ever so much fun, and there's lots of room to play on that heap over there. It's oh so large, and there's plenty of open space," he desperately replied.

"But I told you...this is MY heap. I don't want to play over there, because I like playing *right here*." She threw her exceptions at Billy, smashing his invokers to dust. Satisfied by the look of horror on Billy's face, she plopped down right where he had been sitting, and smiled terribly up at him.

Dejected, Billy sulked away and became a Lisp programmer, living forever in a land where data is code and code is data and everyone eats butterscotches and rides unicorns. He was never seen nor heard from again.


This story will be very familiar to anyone who's tried to push the limits of code generation on the JVM. The JVM keeps in memory a large, pre-allocated chunk of reserved space called the "heap". The heap is maintained as a contiguous area of space to allow the JVM's garbage collector to move objects around at will. All objects allocated by the system come out of this heap, which is usually split up into "generations". The "young" generation sees the most activity. Objects that are created and immediately dereferenced (like, abandoned?) never make it out of this generation. Objects that persist longer stick around longer. Some objects live forever and get to the oldest generations, but most objects die an early death. And when they die, their bodies become the grass, and the antelope eat the grass. It's a beautiful circle of life. But why are there no butterscotches and unicorns?

The dirty secret of several JVM implementations, Hotspot included, is that there's a separate heap (or a separate generation of the heap) used for special types of data like class definitions, class metadata, and sometimes bytecode or JITted native code. And it couldn't have a scarier name: The Permanent Generation. Except in rare cases, objects loaded into the PermGen are never garbage collected (because they're supposed to be permanent, get it?) and if not used very, very carefully, it will fill up, resulting in the dreaded "java.lang.OutOfMemoryError: PermGen space" that ultimately caused little Billy to go live in the clouds and have tea parties with beautiful mermaids.

So it is with great reluctance that we are forced to abandon the idea of generating a lot of fat, wasteful, but speedy invokers. And it's with even greater reluctance we must abandon the idea of recompiling, since we can barely afford to generate all that code once. If only there were a way to share all that code and decrease the amount of PermGen we consume, or at least make it possible for generated code to be easily garbage collected. Hmmm.

AnonymousClassLoader

Now it starts to get cool.

Enter java.dyn.AnonymousClassLoader. AnonymousClassLoader is the first artifact introduced by the InvokeDynamic work, and it's designed to solve two problems:
  1. Generating many classes with similar bytecode and only minor changes is very inefficient, wasting a lot of precious memory.

  2. Generated bytecode must be contained in a class, which must be contained in a ClassLoader, which keeps a hard reference to the class; as a result, to make even one byte of bytecode garbage-collectable, it must be wrapped in its own class and its own classloader.

It solves these problems in a number of ways.

First, classes loaded by AnonymousClassLoader are not given full-fledged symbolic names in the global symbol tables; they're given rough numeric identifiers. They are effectively anonymized, allowing much more freedom to generate them at will, since naming conflicts essentially do not happen.

Second, the classes are loaded without a parent ClassLoader, so there's no overprotective mother keeping them on a short leash. When the last normal references to the class disappear, it's eligible for garbage collection like any other object.

Third, it provides a mechanism whereby an existing class can be loaded and slightly modified, producing a new class with those modifications but sharing the rest of its structure and data. Specifically, AnonymousClassLoader provides a way to alter the class's constant pool, changing method names, type signatures, and constant values.

    public static class Invoker implements InvokerIfc {
        public Object doit(Integer b) {
            return fake(new Something()).target(b);
        }
    }

    public static Class rewrite(Class old) throws IOException, InvalidConstantPoolFormatException {
        HashMap constPatchMap = new HashMap();
        constPatchMap.put("fake", "real");

        ConstantPoolPatch patch = new ConstantPoolPatch(Invoker.class);
        patch.putPatches(constPatchMap, null, null, true);

        return new AnonymousClassLoader(Invoker.class).loadClass(patch);
    }

Here's a very simple example of passing an existing class (Invoker) through AnonymousClassLoader, translating the method name "fake" in the constant pool into the name "real". The resulting class has exactly the same bytecode for its "doit" method and the same metadata for its fields and methods, but instead of calling the "fake" method it will call the "real" method. If we needed to adjust the method signature as well, it's just another entry in the constPatchMap.

So if we put these three items together with our two invokers above, we see first that generating those invokers ends up being a much simpler affair. Where before we had to be very cautious about how many invokers we created, and take care to stuff them into their own classloaders (in case they need to be garbage-collected later), now we can load them freely, and we will see neither symbolic collisions nor PermGen leaks. And where before we ended up generating mostly the same code for dozens of different classes, now we can simply create that code once (perhaps as normal Java code) and use that as a template for future classes, sharing the bulk of the class data in the process. Plus we're still getting the fastest invocation money can buy, because we don't have to use reflection.

Who could ask for more?

Parametric Explosion

I could. There's still a problem with our invokers: we have to create the templates.

Let's consider only Object-typed signatures for a moment. Even if we accept that everything's going to be an Object, we still want to avoid stuffing arguments into an Object[] every time we want to make a call. It's wasteful, because of all those transient Object[] we create and collect, and it's slow, because we need to populate those arrays and read from them on the other side. So you end up hand-generating many different methods to support signatures that don't box arguments into Object[]. For example, the many call signatures on JRuby's DynamicMethod type, which is the supertype of all Ruby method objects in a JRuby runtime:

    public abstract IRubyObject call(ThreadContext context, IRubyObject self, RubyModule clazz,
            String name, IRubyObject[] args, Block block);
    public IRubyObject call(ThreadContext context, IRubyObject self, RubyModule clazz,
            String name, IRubyObject[] args);
    public IRubyObject call(ThreadContext context, IRubyObject self, RubyModule klazz,
            String name, IRubyObject arg);
    public IRubyObject call(ThreadContext context, IRubyObject self, RubyModule klazz,
            String name, IRubyObject arg1, IRubyObject arg2);
    public IRubyObject call(ThreadContext context, IRubyObject self, RubyModule klazz,
            String name, IRubyObject arg1, IRubyObject arg2, IRubyObject arg3);
    public IRubyObject call(ThreadContext context, IRubyObject self, RubyModule klazz,
            String name);
    public IRubyObject call(ThreadContext context, IRubyObject self, RubyModule klazz,
            String name, Block block);
    public IRubyObject call(ThreadContext context, IRubyObject self, RubyModule klazz,
            String name, IRubyObject arg, Block block);
    public IRubyObject call(ThreadContext context, IRubyObject self, RubyModule klazz,
            String name, IRubyObject arg1, IRubyObject arg2, Block block);
    public IRubyObject call(ThreadContext context, IRubyObject self, RubyModule klazz,
            String name, IRubyObject arg1, IRubyObject arg2, IRubyObject arg3, Block block);

What was that I said about wasteful?
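
Boiled down, the pattern behind all those overloads looks something like the sketch below. It's a hypothetical miniature (AritySketch, DynMethod, and ConcatMethod are invented names, and plain Object replaces the JRuby types): one boxed entry point that handles any arity, plus arity-specific overloads that a method can override to skip the array entirely.

```java
class AritySketch {
    abstract static class DynMethod {
        // General path: arguments arrive boxed in an array.
        abstract Object call(Object self, Object[] args);

        // Arity-specific paths fall back to boxing by default; subclasses
        // override whichever ones they can serve directly.
        Object call(Object self) {
            return call(self, new Object[0]);
        }
        Object call(Object self, Object arg0) {
            return call(self, new Object[] { arg0 });
        }
        Object call(Object self, Object arg0, Object arg1) {
            return call(self, new Object[] { arg0, arg1 });
        }
    }

    // A one-argument method: the fast path never allocates an Object[].
    static final class ConcatMethod extends DynMethod {
        @Override
        Object call(Object self, Object[] args) {
            if (args.length != 1) {
                throw new IllegalArgumentException(
                        "wrong number of arguments (" + args.length + " for 1)");
            }
            return call(self, args[0]);
        }

        @Override
        Object call(Object self, Object arg0) {
            return ((String) self).concat((String) arg0);
        }
    }

    public static void main(String[] args) {
        DynMethod concat = new ConcatMethod();
        System.out.println(concat.call("foo", "bar"));                  // fast path
        System.out.println(concat.call("foo", new Object[] { "bar" })); // boxed path
    }
}
```

Every extra arity, and every variant with or without a block, is another overload someone has to write by hand.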

And this doesn't even consider the fact that ideally we want to move toward calling methods with *specific types* since any good JVM dynlang will eventually have to call a normal Java method with a non-Object-based signature. Oh, we could certainly generate new versions of "call" into their own little interfaces at runtime, but we'd have to load them, manage them, make sure they can GC, make sure they don't collide with each other, and so on. We end up back where we started, because AnonymousClassLoader is only part of the solution. What we really need is a way to ask the JVM for a lightweight, non-reflected, statically-typed "handle" to a method that's primitive enough for the JVM to treat it like a function pointer.

Hey! Let's call it a MethodHandle! Brilliant!

Method Handles

MethodHandle is the next major piece of infrastructure added for InvokeDynamic. Instead of having to pass around java.lang.reflect.Method objects, which are slower to invoke and carry all that metadata and reflection bulk with them, we can now instead deal directly with MethodHandle, a very primitive reference type representing a specific method on a specific type with specific parameters.

But wait, didn't you say specifics get in the way?

Specifics can get in the way if we're concerned only about invoking dumb dynamic-typed methods that could accept any number of types, as is the case in dynamic languages. Being forced to specify a specific type means that specific type becomes Object, and so all paths must lead to the same generic code. And truly, if MethodHandle was no more than a "detachable method" it wouldn't be particularly useful. But in order to support the more complex call protocols dynamic languages introduce, with their implicit type conversions, dynamic lookup schemes, and "no such method" hooks, MethodHandles are also composable.

Say we have a target method on the Happy type that takes a single String argument.

    public class Happy {
        public void happyTime(String arg) {}
    }

We can capture a method handle for this class in one of two ways. We can either "unreflect" a java.lang.reflect.Method object, or we can ask the MethodHandles factory to produce one for us:

    MethodHandle happyTimeHandle = MethodHandles.findVirtual(Happy.class, "happyTime", void.class, String.class);

Our new happyTimeHandle is a direct reference to the "happyTime" method. It's statically typed, with a type signature of "(Happy, String)void" (meaning it accepts a Happy argument and a String argument and returns void, since we must include the receiver type). And the code looks very similar to retrieving a java.lang.reflect.Method instance. So if all we're concerned about is calling happyTime on a Happy instance with a String argument, this is basically all there is to it. But that's rarely enough for us dynamic types. No, we need all our "magic" too.
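
A note for readers trying this at home: the factory call above is spelled the way the prototype spells it. The sketch below does the same lookup against java.lang.invoke, the API as it later shipped in Java 7, where the lookup goes through a MethodHandles.Lookup object and the signature is wrapped in a MethodType. The HandleLookupSketch wrapper, the lastArg field, and the "call" helper are my additions so the effect of the call is observable.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class HandleLookupSketch {
    // Same shape as Happy above; the field exists only to record the call.
    public static class Happy {
        public String lastArg;
        public void happyTime(String arg) { lastArg = arg; }
    }

    // Looks up a direct handle to Happy.happyTime(String).
    static MethodHandle happyTimeHandle() {
        try {
            return MethodHandles.lookup().findVirtual(
                    Happy.class, "happyTime",
                    MethodType.methodType(void.class, String.class));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // MethodHandle.invoke is declared to throw Throwable, so wrap it once.
    static void call(MethodHandle handle, Happy receiver, String arg) {
        try {
            handle.invoke(receiver, arg);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        MethodHandle handle = happyTimeHandle();
        System.out.println(handle.type()); // (Happy,String)void
        Happy happy = new Happy();
        call(handle, happy, "party");
        System.out.println(happy.lastArg); // party
    }
}
```

The printed type shows the receiver sitting in the first parameter position, exactly as described above.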

Luckily, MethodHandles also provides a way to adapt and compose handles. Perhaps the simplest adaptation is currying.

Currying a method (and really when we talk about methods here we're talking about functions with a leading receiver argument) means to grab that method reference, stuff a couple values into its argument list, and produce a new method reference that uses those values plus future values you provide at call time to make the target call. In this case, we'll insert a Happy instance we want this handle to always invoke against.

    MethodHandle curriedHandle = MethodHandles.insertArgument(happyTimeHandle, new Happy());

The resulting curried handle has a signature of only "(String)void", since we've curried or bound the handle to a specific instance of Happy.
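
Here is the same currying step as a runnable sketch. Again it assumes java.lang.invoke as later shipped in Java 7, where the method is MethodHandles.insertArguments and takes the position to bind; CurrySketch, the lastArg field, and the "call" helper are illustrative additions.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class CurrySketch {
    // Same shape as Happy above; the field exists only to record the call.
    public static class Happy {
        public String lastArg;
        public void happyTime(String arg) { lastArg = arg; }
    }

    // Returns a (String)void handle that always calls happyTime on receiver.
    static MethodHandle curried(Happy receiver) {
        try {
            MethodHandle happyTime = MethodHandles.lookup().findVirtual(
                    Happy.class, "happyTime",
                    MethodType.methodType(void.class, String.class));
            // Bind position 0, the leading receiver argument, to a fixed value.
            return MethodHandles.insertArguments(happyTime, 0, receiver);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // MethodHandle.invoke is declared to throw Throwable, so wrap it once.
    static void call(MethodHandle handle, String arg) {
        try {
            handle.invoke(arg);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        Happy happy = new Happy();
        MethodHandle handle = curried(happy);
        System.out.println(handle.type()); // (String)void
        call(handle, "party");
        System.out.println(happy.lastArg); // party
    }
}
```

The caller of the curried handle never sees a Happy at all; it just passes a String.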

There are also more complicated adaptations. We may need to have what John Rose calls a "flyby" adapter that examines and possibly coerces arguments in the arg list. So we grab a handle to the method representing that logic, attach it to our MethodHandle as a flyby argument adapter, and the resulting handle will perform that adaptation as calls pass through it. We may want to "splat" or "spread" arguments, accepting a variable argument count and automatically stuffing it into an array. MethodHandles.spreadArguments can return a handle that does what we're looking for. Perhaps we need pre and post-call logic, like artificial frame or variable scope allocation. We just represent the logic as simple functions, produce handles for each, and assemble a new MethodHandle that brackets the call. Bit by bit, piece by piece, the complex vagaries of our call protocols can be decomposed into functions, referenced by method handles, and composed into fast, efficient, direct calls. Are we having fun yet?
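
Two of those adaptations, sketched against java.lang.invoke as it later shipped in Java 7 (the prototype's names differ). MethodHandles.filterArguments plays the role of the argument-coercing adapter, and MethodHandle.asCollector gathers loose arguments into an array. The target methods shout, coerce, and count are hypothetical stand-ins for a language's own logic.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class AdapterSketch {
    // Hypothetical target that expects an already-coerced String.
    static String shout(String s) { return s.toUpperCase() + "!"; }

    // Hypothetical coercion, standing in for an implicit type conversion.
    static String coerce(Object o) { return String.valueOf(o); }

    // Hypothetical target that takes all of its arguments as one array.
    static int count(Object[] args) { return args.length; }

    static MethodHandle find(String name, Class<?> rtype, Class<?>... ptypes) {
        try {
            return MethodHandles.lookup().findStatic(
                    AdapterSketch.class, name, MethodType.methodType(rtype, ptypes));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // (Object)String: coerce runs on argument 0 before shout sees it.
    static MethodHandle coercingShout() {
        return MethodHandles.filterArguments(
                find("shout", String.class, String.class), 0,
                find("coerce", String.class, Object.class));
    }

    // (Object,Object,Object)int: three loose arguments become one Object[].
    static MethodHandle collectingCount() {
        return find("count", int.class, Object[].class)
                .asCollector(Object[].class, 3);
    }

    static Object call(MethodHandle handle, Object... args) {
        try {
            return handle.invokeWithArguments(args);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(coercingShout(), 42));             // 42!
        System.out.println(call(collectingCount(), "a", "b", "c")); // 3
    }
}
```

Each adapter returns an ordinary handle, so they stack: the output of one composition can be fed straight into the next.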

We haven't even gotten to the coolest part.

Brief History

JSR-292 started out life as a proposal for a new bytecode, "invokedynamic", to accompany the four other "invoke" bytecodes by allowing for dynamic invocation. When it was announced, the early concept provided only for invocation without a static-typed signature. It still required a call to eventually reach a real method on a real type, and it did not provide (or did not specify) a way to alter the JVM's normal logic for looking up what method it should actually invoke. For languages like JRuby and Groovy, which store method tables in their own structures, this meant the original concept was essentially useless: most dynamic languages have "open" types whose methods can be added, removed, and redefined later, so it was impossible to ever present a normal type invokedynamic could call.

It also included nothing to solve the larger problems of implementing a dynamic language on the JVM, problems like the restrictive, over-pedantic rules for loading new bytecode and the limitations and poor performance of reflected methods. It was, in essence, dead in the water. That was mid-2006.

Fast-forward to September of that year. Sun Microsystems, after years of promoting Java as the "one true language" on the JVM, has decided to hire on two open-source developers to work on the JRuby project, a JVM implementation of Ruby, a fairly complex dynamically-typed language. The pair had managed to run the most complicated application framework the Ruby world had to offer, and for the first time in a long time it started to look like directly supporting non-Java languages on the JVM might be a good idea.

Around this time or shortly after, John Rose became the new JSR-292 lead. John was a member of the Hotspot VM team, and among his many accomplishments he listed a fast Scheme VM and a bytecode-based regular expression engine. But perhaps most importantly, John knew Hotspot intimately, knew that its core was simply *made* for dynamic languages, and had a pretty good idea how to expose that core. So it began.

InvokeDynamic

The culmination of InvokeDynamic is, of course, the ability to make a dynamic call that the JVM not only recognizes, but also optimizes in the same way it optimizes plain old static-typed calls. AnonymousClassLoading provides a piece of that puzzle, making it easy to generate lightweight bits of code suitable for use as adapters and method handles. MethodHandle provides another piece of that puzzle, serving as a direct method reference, allowing fast invocation, argument list manipulation, and functional composability. The last piece of the puzzle, and probably the coolest one of all, is the bootstrapper. Now it's time to blow your mind.

There are two sides to an invocation. There's the call, presumably a chunk of bytecode doing an "invoke" operation, and there's the target, the actual method it invokes. Under normal circumstances, targets fall into three categories: static methods, virtual methods, and interface methods. Because two of these types--static and virtual--are explicitly bound to a specific method, they can be verified when the method's bytecode is loaded. If the type or method does not exist, the bytecode is considered invalid and an error is thrown. However, the third type of target, an interface method, may have any number of targets at runtime, potentially targets that have not even been loaded into the system yet. So the JVM gives invokeinterface operations much more flexibility. Flexibility we can exploit.

Many of the JVM's optimizations come from it treating what looks like normal code as "special". Hotspot, for example, has a large list of "intrinsic" methods (like System.arraycopy or Object.getClass), methods that it always tries to inline directly into the caller, to ensure they have the maximum possible performance and locality. It turns out that adding bytecodes to the JVM isn't really even necessary, if you have the freedom to define special new behaviors based solely on the methods, types, or operations in play. And apparently, the Hotspot team has that freedom.

Because of the low probability of a new bytecode being approved, and because it really wasn't necessary, John introduced a "special" new interface type called java.dyn.Dynamic. Dynamic does not include any methods, nor is it intended as a marker interface. You can implement it if you like, but its real purpose comes when paired with the invokeinterface bytecode. For you see, under InvokeDynamic, an invokeinterface against Dynamic is not really an interface invocation at all.
public class SimpleExample {
    public Object doDynamicCall(Object arg) {
        return arg.myDynamicMethod();
    }
}
Here's a simple example of code that won't compile. Because the incoming argument's type is Object, we can only call methods that exist on Object. "myDynamicMethod" is not one of them. The hypothetical bytecode for that call, if it did compile, would look roughly like this:
public java.lang.Object doDynamicCall(java.lang.Object);
  Code:
   0: aload_1
   1: invokevirtual #3; //Method java/lang/Object.myDynamicMethod:()Ljava/lang/Object;
   4: areturn

In its current state, this bytecode would not even load, because the verifier would see there's no myDynamicMethod on Object and kick it out. But we want to make a dynamic call, right? So let's transform that virtual invocation into a dynamic one:
public java.lang.Object doDynamicCall(java.lang.Object);
  Code:
   0: aload_1
   1: invokeinterface #3, 1; //InterfaceMethod java/dyn/Dynamic.myDynamicMethod:()Ljava/lang/Object;
   6: areturn
Hooray! We've set up a dynamic call! Wasn't that easy?

We've made it an interface invocation, so the JVM won't kick it out and it loads happily. And we've provided our "special" marker, the java.dyn.Dynamic interface, so the JVM knows not to do a normal interface invocation. That wraps up the call side...myDynamicMethod is now recognized as an "invokedynamic". But what about the target? How do we route this call to the right place?

Now we finally get to the bootstrap process. In order to make dynamic languages truly first-class citizens on the JVM, they need to be able to actively participate in method dispatch decisions. If method lookup and dispatch is forever only in the hands of the JVM, it's a much more complicated process to do fast dynamic calls. Believe me, I've tried. So John came up with the idea of a "bootstrap" method.

The bootstrap method is simply a piece of code that the JVM can call when it encounters a dynamic invocation. The bootstrap receives all information about the call directly from the JVM itself, makes a decision about where that call needs to go, and provides that information to the JVM. As long as that decision remains valid, meaning future calls are against the same type and method tables don't change, no further calls to the bootstrap are needed. The JVM proceeds to link and optimize the dynamic call as if it were a normal static-typed invocation. Here's what this looks like in practice:
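public class DynamicInvokerThingy {
    public static Object bootstrap(CallSite site, Object... args) {
        // Look up the static method every call through this site should reach
        MethodHandle target = MethodHandles.findStatic(
                MyDynamicTarget.class,
                "myDynamicMethod",
                MethodType.make(Object.class, site.type().parameterArray()));
        // Link the call site so future calls bypass the bootstrap
        site.setTarget(target);

        // Perform the current call ourselves to produce its return value
        return MyDynamicTarget.myDynamicMethod(args[0]);
    }
}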
This is a simple bootstrap method for the "myDynamicMethod" call above. When "myDynamicMethod" is invoked, the JVM "upcalls" into this bootstrap method. It provides the original argument list (with the receiver first, since invokeinterface always takes a receiver), and a CallSite. CallSite is a representation of the "site" in the original code where the dynamic invocation came from, and it has a type just like a method handle. In this case, the CallSite.type() is "(Object)Object" since we always pass along the receiver (the one Object argument) and the method returns an Object.

In this case, we're just going to bind any dynamic call coming into this bootstrap to the same method, which might look like this:
public class MyDynamicTarget {
    public static Object myDynamicMethod(Object receiver) { ... }
}
Notice that now we actually have a formal argument for the receiver; because we have bound an instance invocation (invokeinterface) to a static method (invokestatic) the receiver becomes the first argument to the call. Back in bootstrap, we retrieve a handle to this method and set it into the CallSite. At this point the CallSite has everything it needs for the JVM to link future calls straight through. As a final step, we perform the invocation ourselves to provide a return value for the current call. And the bootstrap method will never be called for this particular call site again...because the JVM links it straight through.
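
For readers who want something they can actually run: the java.dyn classes shown here are a draft, and the API that eventually shipped with Java 7 (java.lang.invoke) is shaped differently. The sketch below approximates the same flow under that assumption. A MutableCallSite starts out pointing at a bootstrap-like fallback, which links the site to its real target on the first call and then performs that call itself. BootstrapSketch, bootstrapCalls, makeSite, and demo are names invented for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical example class built on java.lang.invoke, not the java.dyn draft.
class BootstrapSketch {
    static int bootstrapCalls = 0;

    // The method every call through this site ends up bound to.
    static Object myDynamicMethod(Object receiver) {
        return "called on " + receiver;
    }

    // Plays the role of the post's bootstrap: runs on the first call only,
    // links the site to its target, then performs the current call itself.
    static Object bootstrap(MutableCallSite site, Object[] args) throws Throwable {
        bootstrapCalls++;
        MethodHandle target = MethodHandles.lookup().findStatic(
                BootstrapSketch.class, "myDynamicMethod", site.type());
        site.setTarget(target);
        return target.invokeWithArguments(args);
    }

    // Creates a call site of type (Object)Object that initially targets the bootstrap.
    static MutableCallSite makeSite() throws ReflectiveOperationException {
        MethodType type = MethodType.methodType(Object.class, Object.class);
        MutableCallSite site = new MutableCallSite(type);
        MethodHandle fallback = MethodHandles.lookup().findStatic(
                BootstrapSketch.class, "bootstrap",
                MethodType.methodType(Object.class, MutableCallSite.class, Object[].class));
        // Bind the site as the first argument, then collect the call's
        // arguments into the Object[] the bootstrap expects.
        site.setTarget(fallback.bindTo(site)
                .asCollector(Object[].class, type.parameterCount()));
        return site;
    }

    // Calls through the site twice and reports how often the bootstrap ran.
    static String demo() {
        try {
            bootstrapCalls = 0;
            MethodHandle invoker = makeSite().dynamicInvoker();
            Object first = invoker.invoke((Object) "a");  // goes through bootstrap
            Object second = invoker.invoke((Object) "b"); // goes straight to the target
            return first + ", " + second + ", bootstraps=" + bootstrapCalls;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints called on a, called on b, bootstraps=1
    }
}
```

The second call never touches the bootstrap, which is the whole point: once the site is linked, the call goes directly to the target handle.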

As I alluded to earlier, we can also invalidate a CallSite by clearing its target. Clearing the target tells the JVM the originally linked method is no longer the right one, please bootstrap again. We're basically a direct participant in the JVM's method selection and linking process. So cool.
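
Here's a small sketch of that invalidation idea, again written against the java.lang.invoke API that later shipped rather than the java.dyn draft. That API has no literal "clear" operation as far as I know, so this sketch assumes the equivalent move is pointing the site back at its slow-path relinking handle. InvalidationSketch, currentImpl, relink, invalidate, and redefine are hypothetical names, and the single currentImpl field stands in for a language's real method table.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical example class; a one-slot "method table" keeps it short.
class InvalidationSketch {
    static final MethodType SITE_TYPE = MethodType.methodType(Object.class, Object.class);
    static final MutableCallSite SITE = new MutableCallSite(SITE_TYPE);
    static String currentImpl = "oldImpl"; // stand-in for a mutable method table
    static int relinks = 0;

    static Object oldImpl(Object receiver) { return "old:" + receiver; }
    static Object newImpl(Object receiver) { return "new:" + receiver; }

    // Slow path: look up whatever the method table says right now, link to it,
    // and perform the current call.
    static Object relink(Object receiver) throws Throwable {
        relinks++;
        MethodHandle target = MethodHandles.lookup()
                .findStatic(InvalidationSketch.class, currentImpl, SITE_TYPE);
        SITE.setTarget(target);
        return target.invoke(receiver);
    }

    // "Clearing" the target: point the site back at the slow path.
    static void invalidate() throws ReflectiveOperationException {
        SITE.setTarget(MethodHandles.lookup()
                .findStatic(InvalidationSketch.class, "relink", SITE_TYPE));
    }

    // Redefining the method changes the table and invalidates the site.
    static void redefine(String impl) throws ReflectiveOperationException {
        currentImpl = impl;
        invalidate();
    }

    static String demo() {
        try {
            relinks = 0;
            currentImpl = "oldImpl";
            invalidate();
            MethodHandle invoker = SITE.dynamicInvoker();
            Object a = invoker.invoke((Object) "x"); // relinks to oldImpl
            Object b = invoker.invoke((Object) "y"); // direct call to oldImpl
            redefine("newImpl");
            Object c = invoker.invoke((Object) "z"); // relinks to newImpl
            return a + "," + b + "," + c + ",relinks=" + relinks;
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints old:x,old:y,new:z,relinks=2
    }
}
```

Three calls, two relinks: the slow path runs only on the first call and on the first call after the method is redefined.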

Oh, there's one more bit of magic I should show you: how to get from point A to point B, i.e. how to tell the JVM which bootstrap method to use. Remember our SimpleExample class above? The one we coaxed into doing dynamic invocation? Here's how we point SimpleExample's dynamic calls at our bootstrap method...we just add this code to SimpleExample itself:
    static {
        Linkage.registerBootstrapMethod(
                SimpleExample.class,
                MethodHandles.findStatic(DynamicInvokerThingy.class, "bootstrap", Linkage.BOOTSTRAP_METHOD_TYPE));
    }
Linkage is another class from InvokeDynamic, responsible primarily for wiring up dynamic-invoker classes to their bootstrap logic. Here we're registering a bootstrap method for SimpleExample by creating a handle to DynamicInvokerThingy.bootstrap. Linkage has a convenient BOOTSTRAP_METHOD_TYPE constant we can use for the type. And that's basically it. What could be easier?

Status

InvokeDynamic is a work in progress. It first successfully performed a dynamic invocation on August 26, 2008 - International InvokeDynamic Day. John had given me wind of the "imminent" event, so I had already started to look at wiring it into JRuby. Ultimately, it was into the first week of September before I got all the bits together and working, but after a day or two of back-and-forth emails, a bug report (I found a bug! I'm helping!), and a little JRuby refactoring, I managed to successfully wire InvokeDynamic directly into JRuby's dispatch process! Such excitement! The code is already in JRuby's trunk, and will ship with JRuby 1.1.5 (though it obviously will be disabled on JVMs without InvokeDynamic).

Now before you go off and get all excited, you should know that I wired it up in probably the most primitive way possible. A lot of the method-adapting logic isn't fully implemented yet, and what is there isn't wired into Hotspot's JIT, so it's still early days. But I'm absolutely giddy when I think about the possibilities of MethodHandles alone, much less the entire InvokeDynamic package all together. It gives me shivers just thinking about it.

(Before you think I'm some kind of crackpot, imagine how much work it's taken to get JRuby running as well as it is today and how much work each tiny incremental improvement requires. The idea that the next round of *major* improvements will be a simple matter of functionally decomposing JRuby's core--something we've wanted to do all along--is pure butterscotches and unicorns.)

And there's also the sobering fact that at best this would be a Java 7 feature; there's no possibility of backporting it other than as an emulation layer. So production users looking for InvokeDynamic-enabled JRuby are going to have to be ambitious or at least wait for Java 7...and that's assuming we're able to get the JSR approved and included (though I'm going to do whatever I can to make that happen).

But at the end of the day, make no mistake: The JVM is going to be the best VM for building dynamic languages, because it already is a dynamic language VM. And InvokeDynamic, by promoting dynamic languages to first-class JVM citizens, will prove it.

More Information

If you'd like to read more about InvokeDynamic, here are a few resources:

The JSR-292 JCP page has a link to the draft document about InvokeDynamic. It's starting to get a little aged now but the general concepts are all there. A good read.

The JRuby SVN repository already contains the early InvokeDynamic work I've done. Look for the classes InvokeDynamicInvocationCompiler and InvokeDynamicSupport, both referenced from StandardASMCompiler. And feel free to email or stop into #jruby on FreeNode IRC if you have questions.

The Multi-Language VM page can get you started with John Rose's InvokeDynamic patches, along with some other oddities like JVM continuations and something called "quid". And you'll need a good walkthrough on building OpenJDK, so try Volker Simonis's OpenJDK instructions for now. Unfortunately the MLVM bits only work on Linux and Solaris builds of OpenJDK at the moment; that will change in the future.

Update: I can't believe I forgot to do my final plug for the JVM Languages Summit, which is coming up at the end of this month. I believe there's still a few seats open. If you're in the SF bay area or feel like taking a trip, the slate of talks is going to be awesome. You will hear John Rose talk about InvokeDynamic, me talk about JRuby past and future, and lots, lots more. There's even a couple Microsofties coming down and a Parrot presentation. Great fun!

And as always, feel free to contact me, comment on this blog, or look me up on IM or IRC. I'm keen to see InvokeDynamic put through its paces all the way through its specification and development process, and I could use some help.

Thank you for your time!

23 comments:

  1. Charles- The link to John Rose's blog post is incorrect.

  2. That was a brilliant post.

    You'd write a great 'Implementing a dynamic language on the JVM' book!

  3. Great post! I am curious about the speed boosts. Does this mean that JRuby execution will approach compiled Java speeds?

  4. I think you meant constPatchMap instead of utf8Map.

  5. Awesome! Thank you for your hard work Charlie!

  6. This is the last hurdle for JRuby! Can't wait to hear some memory and performance differences

  7. Great post! Just so you know: the wider code snippets look great from Safari, but get truncated in Firefox (at least in FF3 on OS X)

  8. "From a usability perspective, they're both a positive and negative influence; they noise up the code and require more typing, but ..."

    I think the real problem with being forced to add these type declarations is that, apart from this, one's code would often work with many different types. Type declarations aren't just "more typing". They're forcing you to take an algorithm and say "this implementation works for 'int' only".

    What would you say if your professor wrote an algorithm for quicksort, then said "this works only for 32-bit integers!", and then went over to the other side of the blackboard, wrote the exact same algorithm, and declared "this works only for strings!"? That's what mandatory typing feels like to me.

    If I had a dollar for every time I saw the exact same method for int, float, String, ...

  9. Frank: Ahh, I was worried about that; it looks great full screen on my 1440 LCD, but I should probably crop a couple examples a bit smaller.

  10. anonymous: Yes, I agree that's one of the primary reasons dynamic language fans have such disdain for statically typed languages. Even templated languages like C++ and C#, which try to solve the problem with parametric polymorphism, still run into issues: if you're not careful, you end up with an explosion of slightly different types. These days I'm leaning toward an "ideal" language that not only supports limited type inference, but which has the ability to drop into "dynamic mode" for code that warrants it. Sort of a polyglot approach compressed into a single language.

  11. Charles,

    How does AnonymousClassLoader work with permissions? i.e. if there is security manager in place, does the new classloader introduce something that is subject to permissions? I assume yes, but how - is it the same set of permissions as the CodeSource that created it, or some other design ?

    - Paul

  12. Thanks. That was the best (only?) description of invokedynamic that I have ever read. The possibilities are truly huge, and not only for dynamic languages. This could allow for dynamically adding or removing almost any kind of behavior while the program is running. It could be a way to implement AOP natively in the VM.

    Replies
    1. Good call! Yeah, that would be interesting, native AOP as a side-effect of dynamic code manipulation.

  13. That's really an excellent description of dynamic invocation on the JVM. Before that I understood that it should simplify (and speed up) dynamic language implementation on the JVM, but for the first time I think I actually understand how it is working under the hood!

    I agree that there is really a bright future for dynamic languages on the JVM, and for JRuby as well ;-)

  14. A great post! Thank you for a vivid description of invokedynamic.

    But, as Alex Tkachman of the Groovy team noted earlier, using invokedynamic would effectively mean either losing backward compatibility or having to maintain two code bases: one for older JVMs without invokedynamic, using all the dynamic tricks; another one for a new JVM with invokedynamic, probably faster and easier to implement. How will JRuby address this issue?

  15. Hey Charles, what do you think the potential speed increases are by using InvokeDynamic and its family?

    Could you put your neck on the line and give us a percent speed increase?

    :-)

    Keep up the good work Charles

    Chris

  16. Chris Richards: I really couldn't say, but I'll toss out a number like 2x improvement in method call performance. But keep in mind that currently in JRuby method call performance is pretty fast, so it's not likely to be the key bottleneck. Real-world improvement then from invokedynamic may or may not be immediately noticeable.

    However, one key problem with current JRuby is that all calls pass through a single piece of code. That single piece of code ends up looking like a very polymorphic (megamorphic) method call, and so dynamic calls can't inline like normal static-typed calls usually can. invokedynamic will open up the ability to inline even dynamic calls without a lot of fuss, which could easily give hotspot entirely new optimization opportunities like eliminating the cost of Fixnums or call framing. So I'd say invokedynamic will allow us to simplify our codebase at the very least, and could potentially bring Ruby closer to Java-level performance long term.

  17. Greeting Charles.

    I'm trying to figure out how invokedynamic would handle, say, differing arguments but same "source code location".

    For example

    def foo(a:dynamic, b:dynamic)
    a.foo(b)
    end

    How would it accommodate b being of various types? It's a String once, now it's an Integer. (think of b being an instance of "Object" above I guess).

    It would seem from "CallSite is a representation of the "site" in the original code where the dynamic invocation came from, and it has a type just like a method handle. In this case, the CallSite.type() is "(Object)Object" "

    That perhaps CallSite is linked to the types of the parameters? I couldn't tell from

    public class DynamicInvokerThingy {
    public static Object bootstrap(CallSite site, Object... args) {
    MethodHandle target = MethodHandles.findStatic(
    MyDynamicTarget.class,
    "myDynamicMethod",
    MethodType.make(Object.class, site.type().parameterArray()));
    site.setTarget(target);

    return MyDynamicTarget.myDynamicMethod(args[0]);
    }
    }


    how it was using the args to lookup the "exact perfect match" method (similar to what jruby does when it calls through to a java method, it searches for the best match). Is this information contained at the callsite instance, above, and that's how it does the lookup?

    I assume callsite also includes the return value one could do the equivalent of

    def go(b:Object, d:Object)
    String x = b.invokeDynamic(:method_name, d)
    end

    as well as letting the types of b,d vary, with the JVM choosing the right method based on the various type and the return type works, and this is optimized?

    I guess I'm confused as to how this works compared to ruby's method dispatch.

    Also does this allow for, say

    def go c
    a = get_user_input
    Object b = c.invokedynamic(a) # changing method name
    end

    I guess overall my question is "does invokedynamic accommodate Object typed args in some magic way, or does it need to know each arg's exact class at compile time (or runtime), and does it accommodate more than one class callee [different a's at a.invokedynamic(:method_name)] at each callsite, and what can one expect as its return value?"

    Sorry for all the random questions I guess I just don't get it yet, and you said to post comments for questions :P
    Thanks!

    -roger-

  18. Hello Charles, Please update the information about the InvokeDynamic for JAVA 8, because, i think for JAVA 8 there are major changes in InvokeDynamic. So please update the InvokeDynamic Article.
