Sunday, July 9, 2006

Is Reflection Really as Fast as Direct Invocation?

This was originally posted to the jruby-devel mailing list, but I am desperate to be proven wrong here. We use reflection extensively to bind Ruby methods to Java impls in JRuby, and the rumors of how fast reflection is have always bothered me. What is the truth? Certainly there are optimizations that make reflection very fast, but as fast as INVOKEINTERFACE and friends? Show me the numbers! Prove me wrong!!

--

It has long been assumed that reflection is fast, and that much is true. The JVM has done some amazing things to make reflected calls really f'n fast these days, and for most scenarios they're as fast as you'd ever want them to be. I certainly don't know the details, but the rumors are that there's code generation going on, reflection calls are actually doing direct calls, the devil and souls are involved, and so on. Many stories, but not a lot of concrete evidence.

A while back, I started playing around with a "direct invocation method" in JRuby. Basically, it's an interface that provides an "invoke" method. The idea is that for every Ruby method we implement in Java code, we would create an implementation of this interface; then when the time comes to invoke those methods, we do an INVOKEINTERFACE bytecode rather than a call through reflection code.
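
Here is a minimal sketch of the shape of that approach; DirectInvocationMethod, StringHashMethod, and lookupMethod are illustrative placeholders, not the actual JRuby types:

// A hypothetical "direct invocation method": one small class per Ruby method
// implemented in Java, invoked through a plain interface call.
interface DirectInvocationMethod {
    Object invoke(Object receiver, Object[] args);
}

// Illustrative invoker for a single method (here, something hash-like).
class StringHashMethod implements DirectInvocationMethod {
    public Object invoke(Object receiver, Object[] args) {
        return Integer.valueOf(receiver.hashCode());
    }
}

// At call time the runtime would do something like:
//     DirectInvocationMethod m = lookupMethod(name); // lookupMethod is hypothetical
//     m.invoke(receiver, args);                      // an INVOKEINTERFACE, no reflection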

The down side is that this would create a class for every Ruby method, which amounts to probably several hundred classes. That's certainly not ideal, but perhaps manageable considering you'd have JRuby loaded once in a whole JVM for all uses of it. It could also be mitigated by only doing this for heavily-hit methods. Still, requiring lots of punky little classes is a big deal. [OT: Oh what I would give for delegates right about now...]

The up side, or so I hoped, would be that a straight INVOKEINTERFACE would be faster than a reflected call, regardless of any optimization going on, and we wouldn't have to do any whacked-out code generation.

Initial results seemed to agree with the upside, but in the long term nothing seemed to speed up all that much. There are actually a number of these "direct invocation methods" still in the codebase, specifically for a few heavily-hit String methods like hash, [], and so on.

So I figured I'd resolve this question once and for all in my mind. Is a reflected call as fast as this "direct invocation"?

A test case is attached. I ran the loops for ten million invocations...then ran them again, timed, so that hotspot could do its thing. The results are below for both purely interpreted and hotspotted runs (times are in ms).

Hotspotted:
first time reflected: 293
second time reflected: 211
total invocations: 20000000
first time direct: 16
second time direct: 8
total invocations: 20000000

Interpreted:
first time reflected: 9247
second time reflected: 9237
total invocations: 20000000
first time direct: 899
second time direct: 893
total invocations: 20000000

I would really love for someone to prove me wrong, but according to this simple benchmark, direct invocation is faster--way, way faster--in all cases. It's obviously way faster when we're purely interpreting or before hotspot kicks in, but it's even faster after hotspot. I made both invocations increment a static variable, which I'm hoping prevented hotspot from optimizing code into oblivion. However, even if hotspot IS optimizing something away, it's apparent that it does a better job on direct invocations. I know hotspot does some inlining of code when it's appropriate to do so...perhaps reflected code is impossible to inline?

Anyone care to comment? I wouldn't mind speeding up Java-native method invocations by a factor of ten, even if it did mean a bunch of extra classes. We could even selectively "directify" methods, like do everything in Kernel and Object and specific methods elsewhere.
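
As a rough sketch of that selective approach (MethodTable and directify are hypothetical names, reusing the illustrative DirectInvocationMethod interface sketched earlier; this is not existing JRuby code):

import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

class MethodTable {
    // Heavily-hit methods get a hand-written invoker; everything else stays reflective.
    private final Map<String, DirectInvocationMethod> direct = new HashMap<String, DirectInvocationMethod>();
    private final Map<String, Method> reflected = new HashMap<String, Method>();

    void directify(String name, DirectInvocationMethod invoker) {
        direct.put(name, invoker);
    }

    void reflect(String name, Method method) {
        reflected.put(name, method);
    }

    Object call(String name, Object receiver, Object[] args) throws Exception {
        DirectInvocationMethod m = direct.get(name);
        if (m != null) {
            return m.invoke(receiver, args);               // plain INVOKEINTERFACE
        }
        return reflected.get(name).invoke(receiver, args); // reflective fallback
    }
}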

--

The test case was attached to my email...I include the test case contents here for your consumption.

// Excerpted from a JUnit TestCase subclass; requires java.lang.reflect.Method.

private static interface DirectCall {
    public void call();
}

public static class DirectCallImpl implements DirectCall {
    public static int callCount = 0;
    public void call() { callCount += 1; }
}

// Held as the interface type, so dci.call() below compiles to INVOKEINTERFACE.
public static DirectCall dci = new DirectCallImpl();

// Target for the reflected calls; the counters keep hotspot from discarding the loops.
public static int callCount = 0;
public static void call() { callCount += 1; }

public void testReflected() {
    try {
        Method callMethod = getClass().getMethod("call", new Class[0]);

        long time = System.currentTimeMillis();
        for (int i = 0; i < 10000000; i++) {
            callMethod.invoke(null, null); // null args: no per-call Object[] allocation
        }
        System.out.println("first time reflected: " + (System.currentTimeMillis() - time));

        time = System.currentTimeMillis();
        for (int i = 0; i < 10000000; i++) {
            callMethod.invoke(null, null);
        }
        System.out.println("second time reflected: " + (System.currentTimeMillis() - time));
        System.out.println("total invocations: " + callCount);
    } catch (Exception e) {
        e.printStackTrace();
        assertTrue(false);
    }
}

public void testDirect() {
    long time = System.currentTimeMillis();
    for (int i = 0; i < 10000000; i++) {
        dci.call();
    }
    System.out.println("first time direct: " + (System.currentTimeMillis() - time));

    time = System.currentTimeMillis();
    for (int i = 0; i < 10000000; i++) {
        dci.call();
    }
    System.out.println("second time direct: " + (System.currentTimeMillis() - time));
    System.out.println("total invocations: " + DirectCallImpl.callCount);
}


Update: A commenter noticed that the original code was allocating a new Object[0] for every call to the reflected method; that was a rather dumb mistake on my part. The commenter also noted that I was doing a direct call to the impl rather than a call to the interface, which was also true. I updated the above code and re-ran the numbers, and reflection does much better as a result...but still not as fast as the direct call:

Hotspotted:

first time reflected: 146
second time reflected: 109
total invocations: 20000000
first time direct: 15
second time direct: 8
total invocations: 20000000

Interpreted:

first time reflected: 6560
second time reflected: 6565
total invocations: 20000000
first time direct: 912
second time direct: 920
total invocations: 20000000
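
For reference, the two problems the commenter pointed out would have looked roughly like this; the lines are reconstructed from the description above, and the listing earlier already shows the corrected form:

// Reconstructed (buggy) lines, per the commenter's report:
public static DirectCallImpl dci = new DirectCallImpl(); // concrete type, so dci.call() is an INVOKEVIRTUAL
callMethod.invoke(null, new Object[0]);                  // allocates a fresh empty args array on every call

// Corrected lines, as shown in the listing above:
public static DirectCall dci = new DirectCallImpl();     // interface type, so dci.call() is an INVOKEINTERFACE
callMethod.invoke(null, null);                           // no per-call allocation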

11 comments:

  1. Yes, I have also heard that in the way reflection is implemented in the Sun JVM environment, internal classes are generated on the fly for reflection. However, this does NOT mean that reflective calls are as fast as normal calls. Far from it.

    I can't find the reference right now, but unfortunately I have also heard of potential problems with garbage collection if you do a lot of new reflective calls to new classes... As far as I recall, the problem is that the generated reflective classes can't be garbage collected even if they are not used anymore. It has been a while since I read this. Don't know if this problem has been solved in recent JVMs.

  2. Juha: My goodness, I can't believe I didn't see that...of course the allocation of the empty Object array would slow things down. I will post updated numbers based on a more correct version.

    I am thinking more and more that inlining has a lot to do with this; I think also that optimizing one layer of indirection (interface call) is always going to be faster than optimizing multiple layers of indirection (3-4 levels of reflection classes), regardless of the magic performed.

    Direct invocation should be an option for us, but inlining may not play as much of a role since we'll keep all such "invokers" in a hash. That said, I think there's still a good performance gain to be had.

  3. If you use the Server JVM (-server flag), reflection does better relative to the direct call.

  4. Chris: Running with -server didn't change the results much for me. Here are the results with -server, looping 100 million times:

    first time reflected: 1115
    second time reflected: 1036
    total invocations: 200000000
    first time direct: 25
    second time direct: 17
    total invocations: 200000000

  5. (BTW, this is 1.5.0_06, if it matters)

  6. Sorry for the confusion. Safe to say reflection will never be faster than direct :-)

  7. Maybe you can use on-the-fly class generation (CGLib, etc.) to generate all these Java-to-Ruby classes? Of course, with some kind of cache to avoid the penalty of generation ...

  8. FWIW, CGLIB has some classes for improving reflection performance (you swap out your use of Class and Method for FastClass and FastMethod). However, we have considered removing them in future versions because reflection is much better nowadays.

    The downside to dynamic generation (and to a lesser degree pre-generation) is slower startup while generating/verifying the classes, plus the risk of running out of PermGen memory. At the very least you should try to combine multiple method stubs into a single class, instead of one class per method.

    I'm happy to help with any of this, just let me know.

  9. Raphael: Yes, we could do some code generation, and yes we could target our "direct invocation" to heavily-hit core methods. Both options are on the table...we just need to decide on the best way to go and do some performance testing.

    And JRuby does run great under Mustang; on my primitive benchmarks it's something like 20-30% faster than 1.5. Really amazing work from the Sun JVM team.

  10. You might also want to talk with the pnuts folks. I hear they do some code generation for dodging reflection, too, and they supposedly get good performance out of it. Maybe they can provide some advice from their experience.

  11. This is very old, but you could call callMethod.setAccessible(true); to cut roughly half of the time taken by reflection (see the sketch after the comments).

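
For completeness, the tweak suggested in that last comment is a one-line addition right after the Method lookup in testReflected(); setAccessible(true) tells reflection to skip the language access check on each invoke (a sketch, not part of the original benchmark):

Method callMethod = getClass().getMethod("call", new Class[0]);
callMethod.setAccessible(true); // skip per-call access checks in invoke()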