Tuesday, May 9, 2006

A DSL for Bytecode Generation

Having discovered the power and magic of bytecode generation, it occurred to me that none of the existing libraries have the subtle elegance that most code generation tasks really deserve. I believe there's a simple reason for this: they're written in Java. Some of them really try to make things easier, and perhaps come close to succeeding, but they're all still cumbersome, clunky, and very, very verbose.

I have been writing JRuby's compiler in pure ruby by calling out to a Java-based bytecode generation library. My initial attempts were fairly straightforward calls: push this, call that, pop the other. Very linear, very boring, very verbose, and not a great deal simpler than the equivalent Java code. It seemed such a shame to waste an expressive language like Ruby on such a menial task, I've decided to build a domain-specific language for Java bytecode generation.

A short sample of what works today (basically the operations I needed for my test compiler):

class_bytes = ClassBuilder.def_class :public, "FooClass" do |c|
c.def_constructor :public do |con|
con.call_super
con.return_void
end

c.def_method :public, :string, "myMethod", [:void], [:exception] do |m|
m.call_this GenUtils.array_cls(:string), "getStringArr"
m.call_this :string, "getMessage"
m.return_top :ref
end

c.def_method :private, :string, "getMessage", [:void] do |m|
m.construct_obj :stringbuffer, [:string] do |p|
p.constant "Now I will say: "
end

m.call_method :stringbuffer, "append", :string, :stringbuffer do |p|
p.constant "Hello CodeGen!"
end
m.call_method :string, "toString", :void, :stringbuffer
m.return_top :ref
end

c.def_method :public, GenUtils.array_cls(:string), "getStringArr" do |m|
m.construct_array :string, 5 do |p,i|
p.constant "string \##{i}"
end

m.array_set 2, :string do |p|
p.constant "replacement at index 2"
end

m.return_top :ref
end
end


This approach has a number of advantages over others:

  • The structure of the generator is very similar to that of the generated code
  • Method parameters and array initializers (or the code to make them available) are logically associated with the eventual call or array they'll apply to
  • The builders maintain some internal state, and will be able to count stack depth, validate typing, automatically attempt casts, and automatically return the correct types
  • It's far easier to read
  • It's far more fun
Some notes on the code above:
  • The param-building blocks (with |p| params) are in all cases optional. If omitted, method calls will assume you have prepared all params, array creation will create an empty array, and array sets will assume the value is already present on the stack.
  • The core bytecode operations (dup, etc) are still present and callable on the MethodBuilder m. This allows you to fall back to linear-style when necessary.
  • Various classes (perhaps eventually all classes) from java.lang are aliased as symbols like :string and :object. At the ClassBuilder level, it is also possible to "import" classes, as in c.import "javax.swing.JFrame", :jframe and use the aliased symbol throughout this generation (much like import in a .java file)
  • I'm looking for a better way to handle arrays. GenUtils is only used internally except for array types, and I'd like to hide it completely.

I'm tossing this working snippit out to the world for comments and critique...and perhaps as a little teaser of things to come. I'm planning to add a few more operations and port the early v1 compiler over to this soon...then both will develop together. I see this DSL/library as having huge potential for other projects that want a simple, elegant way to do bytecode generation.

Thoughts? Comments?

1 comment:

  1. Very tidy -- a great demonstration of what one can do with a dynamic language!

    ReplyDelete