Saturday, May 11, 2013

On Languages, VMs, Optimization, and the Way of the World

I shouldn't be up this late, but I've been doing lots of thinking and exploring tonight.

In studying various VMs over the past few years, I've come up with a list of do's and don't that make things optimize right. These apply to languages, the structures that back them, and the VMs that optimize those languages, and from what I've seen there's a lot of immutable truths here given current optimization technology.

Let's dive in.

#1: Types don't have to be static

JVM and other dynamic-optimizing runtimes have proven this out. At runtime, it's possible to gather the same information static types would provide you at compile time, leading to optimizations at least as good as fully statically-typed, statically-optimized code. In some cases, it may be possible to do a better job, since runtime profiling is based on real execution, real branch percentages, real behavior, rather than a guess at what a program might do. You could probably make the claim that static optimization is a halting problem, and dynamic optimization eventually can beat it by definition since it can optimize what the program is actually doing.

However, this requires one key thing to really work well.

#2: Types need to be predictable

In order for runtime optimization to happen, objects need to have predictable types and those types need to have a predictable structure. This isn't to say that types must be statically declared...they just need to look the same on repeat visits. If objects can change type (smalltalk's become, perl's and C's weak typing) you're forced to include more guards against those changes, or you're forced to invalidate more code whenever something changes (or in the case of C, you just completely shit the bed when things aren't as expected). If change is possible and exposed at a language level, there may be nothing you can do to cope with all those different type shapes, and optimization can only go so far.

This applies both to the shape of a type's method table (methods remaining consistent once encountered) and the shape of the type's instances (predictable object layout). Many dynamically-typed languages impose dynamic type shape and object shape on VMs that run them, preventing those VMs from making useful predictions about how to optimize code. Optimistic predictions (generating synthetic types for known type shapes or preemptively allocating objects based on previously-seen shapes) still have to include fallback logic to maintain the mutable behavior, should it ever be needed. Again, optimization potential is limited, because the shape of the world can change on a whim and the VM has to be vigilent

The alternative summation of #1 and #2 is that types don't have to be statically declared, but they need to be statically defined. Most popular dynamic languages do neither, but all they really need to do is the latter.

#3: You can't cheat the CPU

Regardless of how clever you'd like to be in your code or language or VM or JIT, the limiting factor is how modern CPUs actually run your code. There's a long list of expectations you must meet to squeeze every last drop of speed out of a system, and diverging from those guidelines will always impose a penalty. This is the end...the bottom turtle...the unifying theory. It is, at the end of the day, the CPU you must appease to get the best performance. All other considerations fall out of that, and anywhere performance does not live up to expectations you are guaranteed to discover that someone tried to cheat the CPU.

Traditionally, static typing was the best way to guarantee we produced good CPU instructions. It gave us a clear picture of the world we could ponder and meditate over, eventually boiling out the secrets of the universe and producing the fastest possible code. But that always assumed a narrow vision of a world with unlimited resources. It assumed we could make all the right decisions for a program ahead of time and that no limitations outside our target instruction set would ever affect us. In the real world, however, CPUs have limited cache sizes, multiple threads, bottlenecked memory pipelines, and basic physics to contend with (you can only push so many electrons through a given piece of matter without blowing it up). Language and VM authors ignore the expectations of their target systems only at great peril.

Let's look at a few languages and where they fit.

Language Scorecard

Java is statically typed and types are of a fixed shape. This is the ideal situation mostly because of the type structure being predictable. Once encountered, a rose is just a rose. Given appropriate dynamic optimizations, there's no reason Java code can't compete with or surpass statically-typed and statically-compiled C/++, and in theory there's nothing preventing Java code from becoming optimal CPU instructions.

Dart is dynamically typed (or at least, types are optional and the VM doesn't care about them), but types are of a fixed shape. If programmers can tolerate fixed-shape types, Dart provides a very nice dynamic language that still can achieve the same optimizations as statically-typed Java or statically-compiled C/++.

Groovy is dynamically typed with some inference and optimization if you specify static types, but most (all?) types defined in Groovy are not guaranteed to be a fixed shape. As a result, even when specifying static types, guards must be inserted to check that those types' shapes have not changed. Groovy does, however, guarantee object shape is consistent over time, which avoids overhead from being able to reshape objects at runtime.

Ruby and JavaScript are dynamically typed and types and objects can change shape at runtime. This is a confluence of all the hardest-to-optimize language characteristics. In both cases, the best we can do is to attempt to predict common type and object shapes and insert guards for when we're wrong, but it's not possible to achieve the performance of a system with fully-predictable type and object shapes. Prove me wrong.

Now of course when I say it's not possible, I mean it's not possible for the general case. Specific cases of a known closed-world application can indeed be optimized as though the types and objects involved had static shapes. I do something along these lines in my RubyFlux compiler, which statically analyzes incoming Ruby code and assumes the methods it sees defined and the fields it sees accessed will be the only methods and fields it ever needs to worry about. But that requires omitting features that can mutate type and object structure, or else you have to have a way to know which types and objects those features will affect. Sufficiently smart compiler indeed.

Python has similar structural complexities to Ruby and adds in the additional complexity of an introspectable call stack. Under those circumstances, even on-stack execution state is not safe; a VM can't even make guarantees about the values it has in hand or the shape of a given call's activation. PyPy does an admirable job of attacking this problem by rewriting currently-running code and lifting on-stack state to the heap when it is accessed, but this approach prevents dropping unused local state (since you can't predict who might want to see it) and also fails to work under parallel execution (since you can't rewrite code another thread might be executing). Again, the dynamicity of a "cool" feature brings with it intrinsic penalties that are reducible but not removable.

Get to the Damn Point, Already

So what am I trying to say in all this? I started the evening by exploring a benchmark post comparing Dart's VM with JVM on the same benchmark. The numbers were not actually very exciting...with a line-by-line port from Dart to Java, Java came out slightly behind Dart. With a few modifications to the Java code, Java pulled slightly ahead. With additional modifications to the Dart code, it might leapfrog Java again. But this isn't interesting because Dart and Java can both rely on type and object shapes remaining consistent, and as a result the optimizations they perform can basically accomplish the same thing. Where it matters, they're similar enough that VMs don't care about the differences.

Where does this put languages I love, like Ruby? It's probably fair to concede that Ruby can't ever achieve the raw, straight-line performance of type-static (not statically-typed) languages like Dart or Java, regardless of the VM technologies involved. We'll be able to get close; JRuby can, with the help of invokedynamic, make method calls *nearly* as fast as Java calls, and by generating type shapes we can make object state *nearly* as predictable as Java types, but we can't go all the way. Regardless of how great the underlying VM is, if you can't hold to its immutable truths, you're walking against the wind. Ruby on Dart would probably not be any faster than Ruby on JVM, because you'd still have to implement mutable types and growable objects in pretty much the same way. Ruby on PyPy might be able to go farther, since the VM is designed for mutable types and growable objects, but you might have to sacrifice parallelism or accept that straight-line object-manipulating performance won't go all the way to a Java or Dart. Conversely, languages that make those type-static guarantees might be able to beat dynamic languages when running on dynamic language VMs (e.g. dart2js) for exactly the same reasons that they excel on their own VMs: they provide a more consistent view of the world, and offer no surprises to the VM that would hinder optimization. You trade dynamicity at the language level for predictability at the VM level.

The Actual Lesson

I guess the bottom line for me is realizing that there's always going to be a conflict between what programmers want out of programming languages and what's actually possible to give them. There's no magical fairy world where every language can be as fast as every other language, because there's no way to predict how every program is going to execute (or in truth, how a given program is going to execute given a general strategy). And that's ok; most of these languages can still get very close to each other in performance, and over time the dynamic type/object-shaped languages may offer ways to ratchet down some of that dynamism...or they might not care and just accept what limitations result. The important thing is for language users to recognize that nothing is free, and to understand the implications of language features and design decisions they make in their own programs.