Headius: March 2008

Sunday, March 30, 2008

Duby's C Backend POC

Late last week I wired up a quick proof-of-concept C back-end for Duby, to show that it is possible. Duby's design represents all types as symbols, which allows the type representation to vary across platforms. So in fib(), the fixnum type is represented as ":fixnum" throughout, and it can be mapped to whatever the backend decides fixnums should be. This means that, in general, code you write in Duby will type infer (for some definition of infer) and produce an AST suitable for compilation to any backend or type system. And with the appropriate plugins, like the "math" typer from my previous Duby post, you can do cool things like represent integer math as either primitive math operations or method calls against object types. But the source code remains the same.

I also wanted to blunt some criticism I received during the first Duby prototype. It represented types as symbols, but symbols nearly identical to their eventual Java types. In order for Duby to be generally useful (in, for example, Rubinius) platform-agnostic symbolic typing is a must.

So here's the code for my little C backend in its current state. It's pretty crude at the moment, but you can get the general idea. There's obviously work to do: it doesn't correctly insert "return" keywords, doesn't insert end-of-line semicolons everywhere appropriate (and often inserts them in totally inappropriate place), and in general doesn't produce entirely valid C code, but those are mostly minor details; the important takeaway here is that the typing and general Duby AST representation can easily be compiled this way.

Ok then! Given a simple script:

def fib(n)
  {n => :fixnum}
  if n < 2
    n
  else
    fib(n - 1) + fib(n - 2)
  end
end

def go
  fib(35)
end

We can run the C "compiler" thus:

jruby -rduby/plugin/math lib/ruby/site_ruby/1.8/duby/c_compiler.rb fib.rb

And we get a fib.rb.c file as a result:

int fib(int n) {;
if (n < 2) {n} else {fib(n - 1) + fib(n - 2)};
}

;
int go() {fib(35)}

;

As I said, it's not the prettiest thing in the world, but I think it demonstrates that Duby should easily be able to target pretty much any backend platform you like. Fun stuff!

The Power of Java's NIO

Akira Tanaka just did a lightning talk on IO.copy_stream, a new method added to Ruby 1.9 last night that does a fast stream transfer all in C code, avoiding the intermediate strings. It's a good idea, one I hope takes hold for other IO operations that could use some native lovin.

It's so good, in fact, I took five minutes out of my day to produce a crude implementation of it in JRuby (and note, I have not looked at Akira's implementation, so I'm sure it handles cases my version does not):

@JRubyMethod(name = "copy_stream", meta = true, compat = RUBY1_9)
public static IRubyObject copy_stream(
        IRubyObject recv, IRubyObject stream1, IRubyObject stream2)
        throws IOException {
    RubyIO io1 = (RubyIO)stream1;
    RubyIO io2 = (RubyIO)stream2;

    ChannelDescriptor d1 = io1.openFile.getMainStream().getDescriptor();
    if (!d1.isSeekable()) {
        throw recv.getRuntime().newTypeError("only supports file-to-file copy");
    }
    ChannelDescriptor d2 = io2.openFile.getMainStream().getDescriptor();
    if (!d2.isSeekable()) {
        throw recv.getRuntime().newTypeError("only supports file-to-file copy");
    }

    FileChannel f1 = (FileChannel)d1.getChannel();
    FileChannel f2 = (FileChannel)d2.getChannel();

    long size = f1.size();

    f1.transferTo(f2.position(), size, f2);

    return recv.getRuntime().newFixnum(size);
}

It's primitive, but it does the trick. NIO streams unfortunately only provide this nice "transferTo" method for FileChannel, though there's other ways you can do fast copies from stream to stream using buffers. But this was a nice simple way to wire it up for now. So, let's see numbers.

require 'benchmark'

Benchmark.bmbm do |bm|
  bm.report("Control") do
    10000.times do
      File.open(__FILE__) do |file|
        File.open(__FILE__ + ".tmp", 'w') do |file2|
        end
      end
    end
  end
  bm.report("10k file copies") do
    10000.times do
      File.open(__FILE__) do |file|
        File.open(__FILE__ + ".tmp", 'w') do |file2|
          IO.copy_stream(file, file2)
        end
      end
    end
  end
end

First JRuby numbers (excluding rehearsal):

                      user     system      total        real
Control           2.191000   0.000000   2.191000 (  2.191000)
10k file copies   5.190000   0.000000   5.190000 (  5.190000)

Not bad for 10k copies of a file...it works out to about 3s above and beyond the cost of running the blocks and opening/closing the files. How about Ruby 1.9?

                      user     system      total        real
Control           0.360000   0.550000   0.910000 (  0.903211)
10k file copies   0.850000   2.450000   3.300000 (  5.363433)

Ruby 1.9's numbers come out around 4.4s above and beyond the control logic. Obviously we need to look at improving JRuby's numbers for the blocks and file opening, but it's good to know that Java's IO actually *can* be nice and fast when you need it to be.

This feature will only be available as part of JRuby's 1.9 support, which is still in development.

Friday, March 21, 2008

More Fun With Duby

It's been...oh my, almost two weeks since I broke the news about Duby. Since then I attended PyCon and we managed to get JRuby 1.1 RC3 out the door, which is looking like it will become JRuby 1.1 final. But I've still been working on Duby in my spare time. It's kinda nice to have a different "spare time" project than JRuby for a while.

After my previous post, I continued to carry the compiler forward. I managed to get it compiling the following syntactic structures:

until and while loops
static method definitions and calls (using "def self.foo" syntax)
array types (which combined with static calls allowed creating a "main" method that worked!)
instance variables (using @foo to mean the "foo" field on a Java type)
type resolution from any Java type
imports (with "import foo.bar.Baz" syntax)
packages (using "module Foo" syntax, though that will probably change)
type inference across methods on the same type, including smarts for instance vs static

All very exciting stuff, and starting to get very close to a language that would be usable for simple algorithms. Except there was just one problem: every new feature I added made the codebase more difficult to understand.

Reloaded

I had just started to explore the next steps when I realized the codebase was becoming too hard to follow. After chasing around a few inference bugs I decided it was time to take a step back. I also hoped all along to eliminate the hard requirement to declare a return type, since that at least should be apparent from inspecting the method body...but it wouldn't be possible with the original design.

The initial Duby compiler was entirely made up of decorations on the existing Ruby AST. A CallNode, for example, was decorated with methods called "compile", "declared_type", "signature" and so on that were called in various sequences to first infer types and then produce bytecode. There were a few problems with this approach that became apparent as work continued:

The hypothetical Duby AST structure did not map 1:1 to the Ruby AST structure; it was richer, with type information, method signatures, and overloaded/hooked syntax. To support this in the existing AST, many layers of complexity had to be added to the compilation process: "is this call being used as a call or as a type declaration?"
The decorator methods were inextricably entwined with Java/JVM-specific details, be they type names, bytecodes, or just obviously Java-centric syntax. This not only made it more difficult to evolve the language, but made it impossible for the language to live anywhere but on JVM and be compiled by anything but JRuby.
My grand plans for the language were quickly exceeding what would be possible by simply decorating the AST.

The third bullet may spark some interest, so I'll explain very briefly (since it's still just forming in my head). As I thought more about how to map Ruby to the JVM, I realized that very few algorithms, very little syntax couldn't be made to map to *something* fast, static-typed, and conceptually the same. Mix-ins, for example, could easily be implemented like C# 3.0's extension methods. Blocks could be implemented behind the scenes as anonymous types of particular signatures. Even operator overloading could easily be mapped to appropriate methods on Numeric types, Collection types, and many others. The key to all this was having type inferencing and compiler layers that were not just flexible, but infinitely pluggable.

The Dream of Duby

The dream, as I see it, would be to use a Ruby-like syntax to wire up any arbitrary types using what looks generally like idiomatic Ruby code. For example, our friend fib(). The same fib() method I showed before, executing against primitive integer values, could execute against boxed Integer objects or BigInteger objects just by specifying the right plugin. So if within the context of a file, I declare that "fixnum" types are to be represented as BigInteger objects, and math operations against them are to call appropriate methods on BigInteger, so be it...that's what comes out the other side.

The term for this is, I believe, "specialization". In the case above, it's explicit specialization on a case-by-case basis. For many cases, this is all you need...you know the types you're dealing with ahead of time and can comfortably declare them or specify type mappings during compilation. But the more interesting side of this comes from general-purpose specialization in the form of parametric polymorphism...or in terms you might be more familiar with: generics.

I am certainly stepping beyond my current education and understanding here, but the pieces are starting to fit together for me. I and others like me have long been envious of languages like Haskell which infer types so incredibly well. And living between the Ruby and Java worlds, I've often felt like there had to be some middle ground that would satisfy both ends of the spectrum: a static, verified type system flexible enough to consume and specialize against any incoming types just by loading a plugin or recompiling, combined with the cleanest, most expressive (while still being one of the most reable) dynamic language syntaxes around. And so that's where I'd like Duby to fit.

The New Version

So where does it stand now?

Things have been moving fast. With JRuby 1.1 RC3 out of the way, I've taken some time to go back and rework the Duby pipeline.

Now, the only decoration on the Ruby AST is in the form of transformation logic to produce a richer Duby syntax tree. Literals are the same. The three call types have been unified into two: Call and FunctionalCall (against "self"). The various types of code bodies have been unified into a single Body construct. Method definition is represented through MethodDefinition and StaticMethodDefinition, both of which aggregate a "signature" element to represent declared or inferred signature information. The several looping constructs (excluding for loops, which are block-based iterations) have been unified into Loop. And so on. Not all the syntax supported by the Duby prototype has been represented in transformation, but it's not far off. And I'm taking a much more measured approach now.

The new AST structure affords a much more realistic typing and compilation pipeline. Each of the Duby AST nodes defines a single method "infer" which, when passed a Typer, will walk its children and infer as many types as it is able. Each child walks its own children, and so on, unifying and generalizing types as it goes (though the unification and generalization is stubbed out now to only allow exact matches...this will eventually be the responsibility of the back-end to handle). Simple code that calls fully-resolved target methods and has no unknown branches may completely resolve during this first pass.

In more complicated cases, each node that can't make an accurate determination about inferred type registers itself with the Typer in its "deferred" list. Once the initial inference cycle has run, all nodes in the AST will have either successfully inferred their result type or registered with the deferred list. At this point, you can either continue to pass ASTs through the Typer, or you can begin the resolution phase.

To start the resolution phase, the "resolve" method on the Typer is called, which attempts to iteratively resolve all deferred nodes in the AST. This resolution process in theory will loop until either the deferred list has been drained of all nodes (which will presumably then be 100% resolved), or until two successive resolution cycles fail to alter the list (perhaps because more information is needed or there are circular references for which the user must add hints). In general, this means that the deepest unresolved child nodes will fall out first. For example, if you have a method "foo" that calls a method "bar" before "bar" has been defined in the source.

def foo
  bar
end
def bar
  1
end

During the first inference cycle, bar will completely resolve but foo will be deferred. During the resolution phase, foo will now see that bar has been defined and resolved, and will itself resolve. Both return :fixnum (though the decision of what type ":fixnum" resolves to will be left to the compiler backend for a given system).

Back to fib()

Here's our friend fib(), which serves as a nice simple example:

def fib(n)
  {n => :fixnum}

  if n < 2
    n
  else
    fib(n - 1) + fib(n - 2)
  end
end

The fib() method is actually fairly interesting here because it recurses. If this were a simple recursion, it would be impossible to determine what actual type fib returns without an explicit declaration, since no other information is available, and this would produce an error of the following variety:

~/NetBeansProjects/jruby ➔ cat recurse.rb
def recurse
  recurse
end
~/NetBeansProjects/jruby ➔ jruby lib/ruby/site_ruby/1.8/compiler/duby/typer2.rb recurse.rb 
Could not infer typing for the following nodes:
  FunctionalCall(recurse) (child of MethodDefinition(recurse))
  MethodDefinition(recurse) (child of Script)
  Script (child of )

AST:
Script
 MethodDefinition(recurse)
  {:return=>Type(notype)}
  Arguments
  FunctionalCall(recurse)

Here we see a simple "recurse" method that just calls itself, and as you'd expect type inference fails. Because the return value of "recurse" depends on knowing the return value of "recurse", resolution fails.

However in the case of fib(), we don't have a simple recursion, we have a conditional recursion. The default behavior for the Duby "Simple" typer is to assume that if one branch of an If can successfully resolve, that's the type to use (temporarily) as the value of the If (while still marking the if as unresolved, and unifying the two bodies later). And since the root of the fib() method only contains an If, it can successfully resolve. Let's try it:

~/NetBeansProjects/jruby ➔ jruby lib/ruby/site_ruby/1.8/compiler/duby/typer2.rb fib.rb
Could not infer typing for the following nodes:
  Call(<) (child of Condition)
  Condition (child of If)
  Call(-) (child of FunctionalCall(fib))
  FunctionalCall(fib) (child of Call(+))
  Call(-) (child of FunctionalCall(fib))
  FunctionalCall(fib) (child of Call(+))
  Call(+) (child of If)
...

Ouch, what happened here? Actually it's pretty easy to understand...the calls to "<", "-", and "+" were unknown to the Typer, and so they could not be resolved. As a result, the If Condition could not resolve, nor could the body of the Else statement. This is not necessarily a fatal state, merely an incomplete one. The "resolve" method on Typer can be called with "true" to force an error to be raised, or with no arguments to just "do the best it can do". In this case, using the simple call to the typer, it raises and displays the error, but there's no reason that more information couldn't be added to the system to allow a subsequent resolution to proceed.

Pluggable Inference

This is where having a pluggable engine starts to come in handy. Though the mechanism is currently pretty crude, there's already a basic ability to specify a plugin for method resolution. In order to enlist in method resolution, a plugin must define a "method_type" method that accepts a parent Typer, a target type, a method name, and a list of parameter types. If at any time method resolution fails in the Simple Typer, the plugin Typers will be called in turn. So in this case, I created a simple "Math" typer that's aware of a few simple operations with LHS and RHS of :fixnum. Let's try it:

~/NetBeansProjects/jruby ➔ jruby -rcompiler/duby/plugin/math lib/ruby/site_ruby/1.8/compiler/duby/typer2.rb fib.rb

AST:
Script
 MethodDefinition(fib)
  {:return=>Type(fixnum), :n=>Type(fixnum)}
  Arguments
   RequiredArgument(n)
  Body
   Noop
   If
    Condition
     Call(<)
      Local(name = n, scope = MethodDefinition(fib))
      Fixnum(2)
    Local(name = n, scope = MethodDefinition(fib))
    Call(+)
     FunctionalCall(fib)
      Call(-)
       Local(name = n, scope = MethodDefinition(fib))
       Fixnum(1)
     FunctionalCall(fib)
      Call(-)
       Local(name = n, scope = MethodDefinition(fib))
       Fixnum(2)

Notice now that the return type of fib() has been correctly inferred to be :fixnum. Huzzah! We are successful!

Debugging

Along with work to make the system more pluggable and the code easier to follow, I've also been trying to provide useful debugging output. Man, I think making debugging output useful and readable is harder than writing a type inference engine in the first place. I must have spent a good hour just tweaking output so it didn't look totally heinous. And it's still not great, but it's definitely usable. Here, for your edification, is the debugging output from type inference on fib.rb:

* [Simple] Learned local type under MethodDefinition(fib) : n = Type(fixnum)
* [Simple] Retrieved local type in MethodDefinition(fib) : n = Type(fixnum)
* [AST] [Fixnum] resolved!
* [Simple] Method type for "<" Type(fixnum) on Type(fixnum) not found.
* [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
* [Math] Method type for "<" Type(fixnum) on Type(fixnum) = Type(boolean)
* [AST] [Call] resolved!
* [AST] [Condition] resolved!
* [Simple] Retrieved local type in MethodDefinition(fib) : n = Type(fixnum)
* [Simple] Retrieved local type in MethodDefinition(fib) : n = Type(fixnum)
* [AST] [Fixnum] resolved!
* [Simple] Method type for "-" Type(fixnum) on Type(fixnum) not found.
* [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
* [Math] Method type for "-" Type(fixnum) on Type(fixnum) = Type(fixnum)
* [AST] [Call] resolved!
* [Simple] Method type for "fib" Type(fixnum) on Type(script) not found.
* [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
* [Math] Method type for "fib" Type(fixnum) on Type(script) not found
* [Simple] Deferring inference for FunctionalCall(fib)
* [Simple] Retrieved local type in MethodDefinition(fib) : n = Type(fixnum)
* [AST] [Fixnum] resolved!
* [Simple] Method type for "-" Type(fixnum) on Type(fixnum) not found.
* [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
* [Math] Method type for "-" Type(fixnum) on Type(fixnum) = Type(fixnum)
* [AST] [Call] resolved!
* [Simple] Method type for "fib" Type(fixnum) on Type(script) not found.
* [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
* [Math] Method type for "fib" Type(fixnum) on Type(script) not found
* [Simple] Deferring inference for FunctionalCall(fib)
* [Simple] Method type for "+"  on  not found.
* [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
* [Math] Method type for "+"  on  not found
* [Simple] Deferring inference for Call(+)
* [Simple] Deferring inference for If
* [Simple] Learned method fib (Type(fixnum)) on Type(script) = Type(fixnum)
* [Simple] Method type for "fib" Type(fixnum) on Type(script) = Type(fixnum)
* [AST] [FunctionalCall] resolved!
* [Simple] [Cycle 0]: Inferred type for FunctionalCall(fib): Type(fixnum)
* [Simple] Method type for "fib" Type(fixnum) on Type(script) = Type(fixnum)
* [AST] [FunctionalCall] resolved!
* [Simple] [Cycle 0]: Inferred type for FunctionalCall(fib): Type(fixnum)
* [Simple] Method type for "+" Type(fixnum) on Type(fixnum) not found.
* [Simple] Invoking plugin: #<Compiler::Duby::Typer::MathTyper:0xbc8159>
* [Math] Method type for "+" Type(fixnum) on Type(fixnum) = Type(fixnum)
* [AST] [Call] resolved!
* [Simple] [Cycle 0]: Inferred type for Call(+): Type(fixnum)
* [AST] [If] resolved!
* [Simple] [Cycle 0]: Inferred type for If: Type(fixnum)
* [Simple] Inference cycle 0 resolved all types, exiting

What's Next

I should clarify a few things before getting back to work:

This codebase is mostly separate from, but heavily advised by the original Duby prototype. I learned a lot from that code, and it's still more functional from a compilation standpoint, but it's not really something I can evolve. The new codebase will probably be at the same level of functionality with a week or so.

Largely the more measured pace of this new codebase is because of two key goals.

I'd like to move the system ever toward full, general type inference when possible. That includes inferring method parameters as well as return types. That also means there will have to be a much more comprehensive type-inference engine than the current "simple" engine, but nothing about the current system will break forward compatibility with such work.

I'd also like to see the entire type inferencing pipeline and ideally most of the compilation pipeline entirely platform-independent. There has been a surprising amount of interest in Duby from folks desiring to target LLVM, C, x86 ASM, Parrot, and others. Sadly, without a realistic codebase to work from--one which isolates typing logic from the underlying platform--none of that work has moved forward (though it's almost frightening how quickly people pounced on the idea). So the new system is almost entirely independent of Java, the JVM, and JRuby. Ideally the only pieces you'd need to reimplement (or plugin) for a given platform would be the parser (producing a Duby AST through whatever mechanism you choose) and the type mapper/compiler for a given system. The parser would probably be the more difficult, but since the language syntax is "basically Ruby" and designed to support full AOT compilation you can use any Ruby parser to produce a Ruby AST and then transform it in the same way I transform JRuby's AST. The compiler will mostly be a matter of mapping Duby syntactic structures and inferred generic types to code sequences and native types on your platform of choice.

Over the weekend I'll probably try to absorb the available literature on type inference, to learn what I'm doing wrong and what I could be doing better...but I think my "common sense" approach seems to be working out pretty well so far. We shall see. Suggestions for papers, ideally papers designed for mortals, are welcome.

So there you have it...the Duby update several of you have been asking for. Satisfied for now? :)

(BTW, the code is all available in the JRuby repository, under lib/ruby/site_ruby/1.8/compiler/duby. The new stuff is largely under the ast/ subdir and in the "transform.rb" and "typer2.rb" files. The tests of interest are in test/compiler/duby/test_ast.rb and test_typer.rb)

Monday, March 17, 2008

Another GSoC Idea

We were just discussing GSoC a bit, and another idea occurred to me:

Survey existing language implementations and how they're solving similar problems like POSIX, code generation/compilation, parsing, and so on, and work with project leads to pull out common solutions into reusable APIs.

This would be a huge help to all such language projects, since we all really want to work together but we're generally swamped with fixes and whatnot for our own projects. It would even be possible to narrow the focus a bit, such as to take on POSIX and use Tom Enebo's current library in JRuby as a base to start building out complete JVM POSIX support for all languages. If you're interested in helping JVM languages succeed, this would be a great way to help them all at once.

Monday, March 10, 2008

Duby: A Type-Inferred Ruby-Like JVM Language

It's been one of those nights, my friends! An outstanding night!

All day Sunday I had the stupids. It was probably related to a few beers on Saturday night, or the couple glasses of red wine before that. Whatever it was, I didn't trust myself to work on any JRuby bugs for fear of introducing problems if my brain clicked off mid-stream. So I started playing around with my old bytecode generating DSL from a few months back. Then things got interesting.

We've long wanted to have a "Ruby-like" language, probably a subset of Ruby syntax, that we could compile to solid, fast, idiomatic JVM bytecode. Not a compiler for Ruby, with all the bells and whistles that make Ruby both difficult to support and inefficient to use for implementing itself. A real subset language that produces clean, tight JVM bytecode on par with what you'd get from compiled Java. But better, because it still looks and feels mostly like Ruby.

So I wrote one! And I used my bytecode library too!

Let's say we want to implement our good friend fib.

class Foo
  def fib(n)
    if (n < 2)
      n
    else
      fib(n - 2) + fib(n - 1)
    end
  end
end

This is normal ruby code. Given a Fixnum input, it calculates the appropriate fibbonaci number and returns it. It's slow in Ruby for a few reasons:

In JRuby, it uses a boxed integer value. Matz's Ruby and Rubinius use tagged integers to improve performance, and we rely on the JVM to optimize as much as it can (which turns out to be a *lot*). But it's still way slower than using primitives directly.
The comparison operations, integer math operations, and fib operations are all dynamic invocations. So there's at least a bit of method lookup cost, and then a bunch of abstraction cost. You can reduce it, but you can't eliminate it.
There are many Ruby features that influence performance negatively even when you're not using them. It's very difficult, for example, to optimally store local variables when the local scope can be captured at any time. So either you rely on tricks, or you store local variables on the heap and deal with them being slow.

When working with a statically-typed language, you can eliminate all of this. In Java, for example, you have both object types and primitive types; primitive operations are extremely fast and eventually JIT down to machine-code equivalents; and the feature set is suitably narrow to allow current JVMs to do amazing levels of optimization.

But of course Java has its problems too. For one, it does very little guessing or "inferring" of types for you, which means you generally have to declare them all over the place. On local variables, on parameters, on return types. C# 3.0 aims to correct this by adding type inference all over, but then there's still other syntactic hassle using any C-like language: curly braces, semicolons, and other gratuitous syntax that make up a lot of visual noise.

Wouldn't be nice if we could take the code above, add some type inference logic, and turn it into a fast, static-typed language?

class Foo
  def fib(n)
    {n => java.lang.Integer::TYPE, :return => java.lang.Integer::TYPE}
    if (n < 2)
      n
    else
      fib(n - 2) + fib(n - 1)
    end
  end
end

And there it is! This is the same code as before, but now it's been decorated with a little type declaration block (in the form of a Ruby hash/map) immediately preceding the body of the method. The type decl describes that the 'n' argument is to be mapped to a primitive int, and the method itself will return a primitive int (and yes, I know those could be inferred too...it shall be soon). The rest of the method just works like you'd expect, except that it's all primitive operations, chosen based on the inferred types. For the bold, here's the javap disassembly output from the compiler:

Compiled from "superfib.rb"
public class Foo extends java.lang.Object{
public int fib(int);
  Code:
   0: iload_1
   1: ldc #10; //int 2
   3: if_icmpge 10
   6: iload_1
   7: goto 27
   10: aload_0
   11: iload_1
   12: ldc #10; //int 2
   14: isub
   15: invokevirtual #12; //Method fib:(I)I
   18: aload_0
   19: iload_1
   20: ldc #13; //int 1
   22: isub
   23: invokevirtual #12; //Method fib:(I)I
   26: iadd
   27: ireturn

public Foo();
  Code:
   0: aload_0
   1: invokespecial #15; //Method java/lang/Object."<init>":()V
   4: return

}

A few items to point out:

A default constructor is generated, as you'd expect in Java. This will be expected to also recognize "def initialize" constructors. I haven't decided if I'll allow overloading or not.
Notice the type signature for fib and all the type signatures for calls it makes are correctly inferred to the correct types.
Notice all the comparison and arithmetic operations are compiled to the correct bytecodes (iadd, isub, if_icmpge and so on).

And the performance is what you'd hope for:

$ jruby -rjava -e "t = Time.now; 5.times {Java::Foo.new.fib(35)}; p Time.now - t"
0.681
$ jruby -rnormalfib -e "t = Time.now; 5.times {Foo.new.fib(35)}; p Time.now - t"
27.851

Here's another example, with some string operations thrown in:

class Foo
 def bar
   {:return => java.lang.String}
 
   'here'
 end
end

class Foo
 # reopening classes works in the same file only (for now)
 def baz(a)
   {a => java.lang.String}
 
   b = "foo"
   a = a + bar + b
   puts a
 end
end

It works, of course:

$ jruby -rjava -e "Java::Foo.new.baz('Type inference is fun')"
Type inference is funherefoo

And once again, the disassembled output:

Compiled from "stringthing.rb"
public class Foo extends java.lang.Object{
public java.lang.String bar();
 Code:
  0: ldc #13; //String here
  2: areturn

public void baz(java.lang.String);
 Code:
  0: ldc #15; //String foo
  2: astore_2
  3: aload_1
  4: checkcast #17; //class java/lang/String
  7: aload_0
  8: invokevirtual #19; //Method bar:()Ljava/lang/String;
  11: invokevirtual #23; //Method java/lang/String.concat:(Ljava/lang/String;)Ljava/lang/String;
  14: checkcast #17; //class java/lang/String
  17: aload_2
  18: invokevirtual #23; //Method java/lang/String.concat:(Ljava/lang/String;)Ljava/lang/String;
  21: astore_1
  22: getstatic #29; //Field java/lang/System.out:Ljava/io/PrintStream;
  25: aload_1
  26: invokevirtual #34; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
  29: return

public Foo();
 Code:
  0: aload_0
  1: invokespecial #36; //Method java/lang/Object."<init>":()V
  4: return

}

Notice here that the + operation was detected as acting on two strings, so it was compiled to call String#concat rather than try to do a numeric operation. These sorts of mappings are simple to add, and since there's type information everywhere it's also easy to come up with cool ways to map Ruby syntax to Java types.

My working name for this is going to be Duby, pronounced "Doobie", after Duke plus Ruby. Duby! It has a nice ring to it. It may be subject to change, but that's what we'll go with for the moment.

Currently, Duby supports only the features you see here. It's a very limited subset of Ruby right now, and the subset doesn't support all Java primitive types, for example, so there's a lot of blanks to be filled. It also doesn't support static (class) methods, doesn't wire up "initialize" methods, doesn't support packages (for namespacing) or imports (to shrink type names) or superclasses or interfaces or enums or generics or what have you. But it's already functional for simple code, as you see here, so I think it's a great start for 10 hours of work. The rest will come, as needed and as time is available.

What are my plans for it? Well many of you may know Rubinius, an effort to reimplement Ruby in Ruby, modeled after the design of many Smalltalk VMs. Well, in order to make JRuby more approachable to Ruby developers, Duby seems like the best way to go. They'll be able to write mostly idiomatic Ruby code and know it will both perform at Java speeds and provide compile-type checking that all is wired up correctly. So we can avoid the overhead of "turtles all the way down" by just teaching one turtle how to speak JVM bytecode and building on that.

I also hope this will lead to lots of new ideas about how to implement languages for the JVM. It's certainly attractive to language users to be able to contribute to language implementations using the language in question, and if we can come up with compilers and sub-languages like Duby it could make the JVM more approachable to a wide range of developers.

Oh, and for those of you curious: the Duby compiler is written mostly in Ruby too.

Saturday, March 8, 2008

RubyInline for JRuby? Easy!

With JRuby 1.1 approaching, performance looking great, and general Ruby compatibility better than its ever been, we've started to branch out into more libraries and applications. So a couple days ago, I thought I'd make good on a promise to myself and have a look at getting RubyInline working on JRuby.

RubyInline is a library by Ryan "zenspider" Davis which allows you to embed snippits of C code into your Ruby scripts. RubyInline does a minimal parse of that source, and based on the function signature you provide it wires it to the containing class as a Ruby method entry point and performs appropriate entry and exit type conversion to whatever C types you happen to use. It's particularly useful if you have a small algorithm you need to run fast, and you'd like to run it in a somewhat faster language by "throwing work over" to it.

Here's the example of RubyInline from Ryan's page:

class MyTest

  def factorial(n)
    f = 1
    n.downto(2) { |x| f *= x }
    f
  end

  inline do |builder|
    builder.c "
    long factorial_c(int max) {
      int i=max, result=1;
      while (i >= 2) { result *= i--; }
      return result;
    }"
  end
end

The interesting bit is the builder.c call, one of several functions on the C builder. Others allow you to add arbitrary preamble code, imports, and "bare" methods (with no type conversion) among other things. And as you'd expect, performance is greatly improved by writing some algorithms in C instead of Ruby.

Naturally, we want to have the same thing work in Java, and ideally use the same API and the same RubyInline plumbing for the rest of it. So then, I present to you java_inline, a RubyInline builder for JRuby!

This represents about four hours of work, and although it doesn't yet have a complete complement of tests and could use some "robustification", it's already working in about 100 lines of code. It's made far easier in JRuby than in C Ruby because we already have a full-features Java integration layer to handle the argument mapping.

Here's a sample java_inline script to show what it looks like, similar to the fastmath.rb sample provided with RubyInline.

So how does it work? Well the Ruby side of things is largely the same as RubyInline's C builder...parse signature, compose the code together and compile it, and bind the method. It wires directly into the RubyInline pipeline, so all you need to do is install RubyInline, require the java_inline.rb script and you're all set. On the Java side, it's using the Java Compiler API, provided in Java 6 implementations. (OT: This has to be the worst-designed API I've ever seen. Go see for yourself. It's cruel and unusual. I won't dwell on it.) So yes, this will only work on Java 6. Deal with it...or submit a patch to get it working on Java 5 as well :)

It's not released in any official form yet, but I'll probably try to wire up a gem or something. I have to make sure I'm dotting my eyes and crossing my tees when I release stuff, even if it's only 100loc. But the repository is obviously public, so play with it, submit patches and improvements, and let me know if you'd like to use it or help work on it more. I'm also interest in suggestions for other libraries you'd like to see special JRuby support for, so pass that along too.

Thursday, March 6, 2008

PyCon, Euruko, and Scotland on Rails

Upcoming event round-up!

Next weekend, Tom Enebo, Nick Sieger and I will be at PyCon, checking out all the pythony goodness and hooking up with the Jython guys for some hacking.
Tom and I will travel to Prague for Euruko 2008, the European RubyConf. We'll present JRuby, unsurprisingly, and hopefully meet up with more European Rubyists we haven't talked to before.
After Euruko, Tom and I will be in Edinburgh for Scotland on Rails. We're supposed to do a 90-minute talk about JRuby on Rails. We'll try to make it entertaining and informative.

Monday, March 3, 2008

Welcome Pythonistas to Sun!

Today we can finally announce another exciting language-related event: we've hired Frank Wierzbicki of the Jython project and Ted Leung of OSAF (and a bunch of other stuff) to help with the Python story at Sun. Hooray!

Frank Wierzbicki has been tirelessly plodding away on Jython for years, since the early days. Even when Jython was considered by many to be "dead", Frank kept the project moving forward as best he could. Now he's been a major part of Jython's revitalization...a new release last year and rapid work to get compatibility up to current-Python levels prove that.

Ted Leung has long been involved with the Open Source Applications Foundation and the Apache Software Foundation, and was one of the original developers of the Xerces Java XML parser. I don't know Ted personally, but I'm excited to be working with him, especially in light of some of this previous roles.

So what does this mean for Sun? Well, it means we're serious about improving support for Python on Sun platforms. Jython is a big part of the story, since they have many challenges similar to JRuby, but a bunch of new ones as well. So we'll be looking to share key libraries and subsystems as much as possible, and we'll start looking at Jython as another driver for future JVM and platform improvement. On the non-Java side of the world, it means we'll ramp up support for Python itself. Sun has already settled on Mercurial for source control for all our OSS projects, and the package management system being worked up for Indiana is written in Python as well. But there's more we need to do.

Please help me in congratulating Frank and Ted on their new roles at Sun. It's going to be another interesting year for polyglots :)