Tuesday, November 20, 2007

Bytecode Tools in Ruby: A Low-level DSL

I've been toying with the idea of rewriting the JRuby compiler in Ruby, or at least writing the plumbing that would let someone else do something similar. Migrating the JRuby compiler may or may not be worth it: the existing Java compiler is basically done and working well, and a conversion would surely introduce bugs here and there. But giving it a try would certainly be a show of faith.

As part of this effort, I've built up some basic utility code and a simple JVM bytecode builder that could act as the lowest level of such a compiler. I'm looking for input on the syntax at this point, while I take a break from it to explore improvements to JRuby's Java integration that I think should be done before 1.1.

So here's the builder in action, as used in a test case:
require 'test/unit'
require 'compiler/builder'
require 'compiler/signature'

class TestBuilder < Test::Unit::TestCase
  import java.lang.String
  import java.util.ArrayList
  import java.lang.Void
  import java.lang.Object
  import java.lang.Boolean

  include Compiler::Signature

  def test_class_builder
    cb = Compiler::ClassBuilder.build("MyClass", "MyClass.java") do
      field :list, ArrayList

      constructor(String, ArrayList) do
        aload 0
        invokespecial Object, "<init>", Void::TYPE
        aload 0
        aload 1
        aload 2
        invokevirtual this, :bar, [ArrayList, String, ArrayList]
        aload 0
        swap
        putfield this, :list, ArrayList
        returnvoid
      end

      static_method(:foo, this, String) do
        new this
        dup
        aload 0
        new ArrayList
        dup
        invokespecial ArrayList, "<init>", Void::TYPE
        invokespecial this, "<init>", [Void::TYPE, String, ArrayList]
        areturn
      end

      method(:bar, ArrayList, String, ArrayList) do
        aload 1
        invokevirtual(String, :toLowerCase, String)
        aload 2
        swap
        invokevirtual(ArrayList, :add, [Boolean::TYPE, Object])
        aload 2
        areturn
      end

      method(:getList, ArrayList) do
        aload 0
        getfield this, :list, ArrayList
        areturn
      end

      static_method(:main, Void::TYPE, String[]) do
        aload 0
        ldc_int 0
        aaload
        invokestatic this, :foo, [this, String]
        invokevirtual this, :getList, ArrayList
        aprintln
        returnvoid
      end
    end

    cb.write("MyClass.class")
  end
end

For those of you who don't speak bytecode, here's roughly the Java code that this would produce:
import java.util.ArrayList;

public class MyClass {
    public ArrayList list;

    public MyClass(String a, ArrayList b) {
        list = bar(a, b);
    }

    public static MyClass foo(String a) {
        return new MyClass(a, new ArrayList());
    }

    public ArrayList bar(String a, ArrayList b) {
        b.add(a.toLowerCase());
        return b;
    }

    public ArrayList getList() {
        return list;
    }

    public static void main(String[] args) {
        System.out.println(foo(args[0]).getList());
    }
}
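
To see how the constructor's bytecode lines up with `list = bar(a, b);`, it can help to trace the operand stack by hand. Below is a small plain-Ruby sketch of that, not part of the compiler library: `StackTracer` and its `[opcode, *operands]` instruction format are hypothetical, it ignores types entirely, and it models only the few opcodes the constructor uses. It replays the constructor body with symbolic values:

```ruby
# Hypothetical operand-stack tracer: a teaching aid, not part of the
# compiler library. Values are symbols so we can see where they end up.
class StackTracer
  class Underflow < StandardError; end

  attr_reader :stack, :effects

  def initialize(locals)
    @locals  = locals # local variable slots, e.g. [:this, :a, :b]
    @stack   = []
    @effects = []     # recorded side effects such as field stores
  end

  # Run a list of [opcode, *operands] instructions.
  def run(instructions)
    instructions.each { |op, *operands| send(op, *operands) }
    self
  end

  private

  # Pop n values, returned in push order (deepest first).
  def pop_n(n)
    if @stack.size < n
      raise Underflow, "need #{n} values, stack has #{@stack.size}"
    end
    @stack.pop(n)
  end

  def aload(index)
    @stack.push(@locals.fetch(index))
  end

  def swap
    below, top = pop_n(2)
    @stack.push(top, below)
  end

  # Constructor call: pops the receiver plus argc arguments, pushes nothing.
  def invokespecial(name, argc)
    pop_n(argc + 1)
  end

  # Instance call: pops the receiver plus argc arguments and, for a
  # non-void method, pushes a symbol describing the result.
  def invokevirtual(name, argc, returns_value)
    receiver, *args = pop_n(argc + 1)
    @stack.push(:"#{receiver}.#{name}(#{args.join(', ')})") if returns_value
  end

  # putfield expects the object reference below the value.
  def putfield(name)
    obj, value = pop_n(2)
    @effects << "#{obj}.#{name} = #{value}"
  end

  def returnvoid; end
end

# The constructor body from the test above, minus the type operands.
ctor = [
  [:aload, 0], [:invokespecial, :"<init>", 0],
  [:aload, 0], [:aload, 1], [:aload, 2],
  [:invokevirtual, :bar, 2, true],
  [:aload, 0], [:swap], [:putfield, :list],
  [:returnvoid]
]

tracer = StackTracer.new([:this, :a, :b]).run(ctor)
puts tracer.effects.inspect # → ["this.list = this.bar(a, b)"]
```

Dropping the `swap` makes the recorded store come out as `this.bar(a, b).list = this`, the kind of ordering mistake the JVM verifier rejects; dropping one of the `aload`s instead raises `Underflow`.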

The general idea is that fairly clean-looking Ruby code can be used to generate real Java classes, providing a readable base for code generation tools like compilers.
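
One common way to get this block-based style in Ruby is to evaluate each block with `instance_eval` against a builder object whose methods record instructions. A minimal sketch follows; `TinyMethodBuilder`, its opcode lists, and the tuple format are hypothetical and only mirror the shape of the DSL above, not the actual `compiler/builder` implementation. It records instructions and emits nothing:

```ruby
# Hypothetical sketch of a block-based bytecode DSL. It only records
# instructions; it does not write a class file.
class TinyMethodBuilder
  # Opcodes that take no operands.
  SIMPLE_OPS = %w[dup swap aaload areturn returnvoid].freeze
  # Opcodes that take operands (local index, type, member, ...).
  OPERAND_OPS = %w[aload new getfield putfield
                   invokevirtual invokespecial invokestatic].freeze

  attr_reader :name, :instructions

  def self.build(name, &block)
    builder = new(name)
    builder.instance_eval(&block) # inside the block, self is the builder
    builder
  end

  def initialize(name)
    @name = name
    @instructions = []
  end

  # One recording method per opcode.
  SIMPLE_OPS.each do |op|
    define_method(op) { @instructions << [op.to_sym] }
  end

  OPERAND_OPS.each do |op|
    define_method(op) { |*operands| @instructions << [op.to_sym, *operands] }
  end
end

m = TinyMethodBuilder.build(:getList) do
  aload 0
  getfield "MyClass", :list, "java/util/ArrayList"
  areturn
end
p m.instructions
# → [[:aload, 0], [:getfield, "MyClass", :list, "java/util/ArrayList"], [:areturn]]
```

Because `instance_eval` changes `self`, opcode names such as `dup` become methods on the builder and shadow the inherited `dup` there, while local variables from the enclosing method stay visible inside the block. That is the trade-off of this style: very terse blocks, at the cost of some care with name collisions.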

There are a couple of things to notice here:
  • Everything is public. I have not wired in visibility and other modifiers mainly because it starts to look cluttered no matter how I try. Suggestions are welcome.
  • The bytecode, while clean-looking, is pretty raw. This interface also doesn't save you from yourself; if you don't order your bytecodes correctly, you'll end up with an unverifiable class file.
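  • It's not apparent just from looking at the code which of the specified types are return types and which are argument types. The convention is that the first type is the return type and the rest are argument types (constructor takes only argument types), but something more explicit could be useful here.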
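
To make the return-type-first convention concrete, here is a sketch that turns such a type list into a JVM method descriptor, the string form the class file actually stores (for example `invokevirtual this, :bar, [ArrayList, String, ArrayList]` refers to a method with descriptor `(Ljava/lang/String;Ljava/util/ArrayList;)Ljava/util/ArrayList;`). The helpers `type_descriptor` and `method_descriptor` are hypothetical, not the `compiler/signature` API, and take class-name strings instead of Java classes so they run in plain Ruby:

```ruby
# Hypothetical helpers: build JVM descriptors from a return-type-first
# list of class names, the convention used by the DSL calls above.
PRIMITIVES = {
  "void" => "V", "boolean" => "Z", "byte" => "B", "char" => "C",
  "short" => "S", "int" => "I", "long" => "J", "float" => "F", "double" => "D"
}.freeze

# "java.lang.String"   -> "Ljava/lang/String;"
# "int"                -> "I"
# "java.lang.String[]" -> "[Ljava/lang/String;"
def type_descriptor(name)
  if name.end_with?("[]")
    "[" + type_descriptor(name[0...-2])
  elsif PRIMITIVES.key?(name)
    PRIMITIVES[name]
  else
    "L#{name.tr('.', '/')};"
  end
end

# First argument is the return type; the rest are argument types.
def method_descriptor(return_type, *arg_types)
  args = arg_types.map { |t| type_descriptor(t) }.join
  "(#{args})#{type_descriptor(return_type)}"
end

# bar: returns ArrayList, takes (String, ArrayList)
puts method_descriptor("java.util.ArrayList", "java.lang.String", "java.util.ArrayList")
# → (Ljava/lang/String;Ljava/util/ArrayList;)Ljava/util/ArrayList;

# main: returns void, takes String[]
puts method_descriptor("void", "java.lang.String[]")
# → ([Ljava/lang/String;)V
```

Passing the return type through a separately named parameter (a hash option, say) would make call sites self-describing, at the cost of more verbose calls.
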
I'd like to continue this work. The above code, run against JRuby trunk and the lib/ruby/site_ruby/1.8/compiler library I'm working on, will produce a working MyClass class file:
~/NetBeansProjects/jruby $ jruby test/compiler/test_builder.rb
Loaded suite test/compiler/test_builder
Started
.
Finished in 0.096 seconds.

1 tests, 0 assertions, 0 failures, 0 errors
~/NetBeansProjects/jruby $ java -cp . MyClass foo
[foo]

So it's actually emitting the appropriate bytecode for this class.

Comments? Thoughts for improvement?

3 comments:

  1. I like it; it certainly opens up the scope to people interested in having a look.

    As you say, the utility code capabilities are also very useful.

  2. From a user's perspective, I'd rather see 1.1 out first and performing well than a migration of the compiler to Ruby. Just my two cents.

  3. I found that using strings for method call targets worked really well as opposed to passing multiple arguments. It makes the DSL more readable and closer to the output of ILDASM on the CLR.
