Headius: What Would I (Will I?) Change About Ruby

Tuesday, April 24, 2007

The latest Ruby Blogging Contest hits close to home: What changes would make Ruby into a better language without making it into something that isn't Ruby?

As you might guess, I've got some pretty strong thoughts here. I'm not a heavy Rails user, and I'm not as heavy a Ruby user as I'd like to be. But implementing a Ruby interpreter and now compiler has taught me a few things about what's right and what's wrong with Ruby. I'm not going to complain about performance, whine that the C code is too hard to follow, or even attack C-based extensions. Those may be important issues, but they're all fixable in the long term without breaking anything that works today (or by providing reasonable substitutes). I'm also not going to go into language design ideas...I have mine, you have yours, Matz has his. But my money's on Matz to do the "right thing" with regards to actual language design.

What I'm talking about are a few really important changes to the Ruby runtime, libraries, and ecosystem. Take these as my educated opinions...and don't think too hard about whether I'll be working to change these things in JRuby and in the wider Ruby world.

1. Threading

This area, more than any other, probably means the most visible changes to Ruby. Ruby currently is green-threaded, as most of you know. JRuby implements native threads mainly because Java uses native threads...we just piggyback off the excellent work of the JVM engineers. And the developing Ruby 1.9, the future successor to the current version 1.8 C implementation, provides something in the middle: native threads with a giant lock, so threads won't run concurrently.

So in general, Ruby is trending toward support for native threads. But there's a problem...some of Ruby's current APIs are impossible to do safely with native threads (and in general, impossible to really do safely with green threads...Ruby just does them anyway). Threading needs to be improved, with support for concurrent execution and removal of operations that prevent that.

Specifically, the following operations and features are inherently unsafe, and are not supported by any mature threaded system:

Thread#kill: Killing one thread from another may leave its locks and resources in an unpredictable state. JRuby currently implements this by setting a kill flag on the target thread and waiting for it to die--basically asking the thread to "please die yourself"--but it's not deterministic and the thread could fail to die.
Thread#raise: Forcing another thread to raise an exception can have the same effect as kill, since the thread may not expect to handle the given exception and may not be able to release locks or tidy up resources. JRuby handles this similarly to kill, by setting a field to contain the exception a target thread should "please raise", but again it's not deterministic and there's no way to guarantee the target thread will raise.
Thread#critical=: There is no way to deterministically force true concurrent threads to stop and wait for the current thread, not to mention the horrendous race conditions that can result when locks are involved. As a result of the many critical problems with critical=, it is already slated to be removed in Ruby 1.9/2.0.
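A small sketch makes the Thread#kill hazard concrete. The `ledger` hash and the two-step transfer are made up for illustration; the only point is that a kill landing between two related updates leaves shared state half-modified, and nothing in the killed thread gets a chance to repair it.

```ruby
# Hypothetical shared state: the two balances should always sum to 100.
ledger = { from: 100, to: 0 }
debited = Queue.new # lets the worker announce that step 1 is done

worker = Thread.new do
  ledger[:from] -= 50 # step 1: debit
  debited << :done    # we are now between the two steps
  sleep               # stand-in for slow work before the credit
  ledger[:to] += 50   # step 2: credit (never reached in this run)
end

debited.pop # wait until the worker is mid-transfer
worker.kill # kill it from another thread
worker.join

total = ledger[:from] + ledger[:to]
puts "total after kill: #{total}" # the invariant (100) no longer holds
```

An `ensure` block in the worker could roll the debit back here, but that only helps if every piece of code that might be killed was written with being killed in mind.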

In order for Ruby to survive in a parallel-processing era, unsafe threading operations need to go, and any libraries or apps that depend on them need to find new ways to solve these problems. Sorry folks, these aren't my rules. I understand why people like these features...I like them too. But you can't have your concurrency and eat it too.
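One of the "new ways" is plain cooperative shutdown: the application-level version of asking a thread to "please die yourself". This is a minimal sketch, not JRuby's internal mechanism; `PoliteWorker` and its methods are hypothetical names. The worker only checks the flag between units of work, so it never stops halfway through one.

```ruby
# Hypothetical worker that stops only at a point it chooses.
class PoliteWorker
  attr_reader :completed

  def initialize
    @stop_requested = false
    @lock = Mutex.new
    @completed = 0
    @thread = Thread.new { run }
  end

  # Ask the worker to finish; returns once it has exited on its own.
  def stop
    @lock.synchronize { @stop_requested = true }
    @thread.join
  end

  def alive?
    @thread.alive?
  end

  private

  def stop_requested?
    @lock.synchronize { @stop_requested }
  end

  def run
    until stop_requested?
      # One complete unit of work; the flag is only checked between units.
      @completed += 1
      sleep 0.01
    end
  end
end

worker = PoliteWorker.new
sleep 0.05
worker.stop
puts "units completed before stopping: #{worker.completed}"
```

The trade-off is the one described above: the request is not instantaneous, and a worker that never reaches its check never stops.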

2. ObjectSpace

ObjectSpace is Ruby's view into the garbage-collected heap. You can use it to iterate over all objects of a particular type, attach finalizers to any object, look an object up by its object ID, and so on. In Ruby, it's a pretty low-cost heap-walker, able to dig up objects matching particular criteria for you on a whim. It sounds like it might be pretty useful, but it's used by very few libraries...and most of those uses can be implemented in other (potentially more efficient) ways.
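As a rough illustration of both halves of that claim, here is a heap walk next to one of the "other ways". The `Shape` hierarchy and the `subclasses_seen` accessor are invented for this sketch; `ObjectSpace.each_object` and the `inherited` hook are standard Ruby.

```ruby
# Hypothetical class hierarchy used only for this example.
class Shape
  # Alternative to heap walking: record subclasses as they are defined.
  @subclasses_seen = []

  class << self
    attr_reader :subclasses_seen

    # Ruby calls this hook whenever a class inherits from Shape
    # (or from one of Shape's descendants, which inherit the hook).
    def inherited(subclass)
      super
      Shape.subclasses_seen << subclass
    end
  end
end

class Circle < Shape; end
class Square < Shape; end

# Heap walk: visit every Class object and keep those descending from Shape.
via_heap = ObjectSpace.each_object(Class).select { |klass| klass < Shape }

# Hook-based bookkeeping: no heap walk needed.
via_hook = Shape.subclasses_seen

puts via_heap.sort_by(&:name).inspect
puts via_hook.sort_by(&:name).inspect
```

The hook version costs one array append per class definition and needs no cooperation from the garbage collector, which is the kind of substitution that makes ObjectSpace less essential than it first appears.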

JRuby implements ObjectSpace by keeping a separate linked list in memory of weak references to created objects. This means that for every ObjectSpace-aware object that's created, a weakref is added to this list. When the object is collected, the weakref removes itself from the list. Walking all objects of a particular type just involves walking that list. Reconstituting an object ID into the object it references is supported by a separate weak list (again, more memory overhead).
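The bookkeeping described above can be approximated in plain Ruby with the standard `weakref` library. This is only a sketch of the idea, not JRuby's actual code: `TrackedObjects` is a hypothetical name, it uses an array rather than a linked list, and dead references are pruned lazily during a walk instead of removing themselves.

```ruby
require "weakref"

# Hypothetical registry that mimics the weak-reference list described above.
class TrackedObjects
  def initialize
    @refs = [] # weak references: these do not keep the objects alive
  end

  # Called for every object that should be visible to the registry.
  def register(object)
    @refs << WeakRef.new(object)
    object
  end

  # Yield each still-live object of the given class, pruning dead references.
  # Returns the number of objects yielded.
  def each_object(klass)
    live = []
    @refs.select! do |ref|
      begin
        object = ref.__getobj__ # raises WeakRef::RefError once collected
        live << object if object.is_a?(klass)
        true
      rescue WeakRef::RefError
        false
      end
    end
    live.each { |object| yield object }
    live.size
  end
end

registry = TrackedObjects.new
strings = %w[a b].map { |s| registry.register(String.new(s)) }
registry.register(Object.new) # no strong reference kept; may be collected

count = registry.each_object(String) { |s| puts s }
puts "live strings: #{count}"
```

Even in this toy form the cost is visible: one extra allocation per tracked object, plus a full scan of the list for every query.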

There are no plans currently for ObjectSpace to be removed from Ruby in a future version. But there's a problem...in addition to being pure overhead in JRuby (which you can turn off completely by using the -O flag), ObjectSpace limits evolving development of the Ruby garbage collector, breaks heap and memory transparency, and poses yet more problems for threading.

There are many issues here. First off, the JRuby thing. By having to add ObjectSpace governors for all objects in the system, JRuby pays a very large penalty. We're forced to do this because the JVM (and most other advanced garbage-collecting VMs) does not allow you to traverse in-memory objects nor retrieve the object that is associated with a given ID. In general this is because the JVM does all sorts of wonderful and magical things with objects and memory behind the scenes, and the ability to ask for all objects of a given type or pull an object based on some ID number at any time cripples many of these tricks.

The threading issues are perhaps more important. Imagine if you will a true concurrent VM, with many threads creating objects, maybe one or more threads collecting garbage, and synchronizing all this to guarantee the integrity and efficiency of the heap and garbage collector. There is absolutely no room in this scenario for those multiple threads to request lists of specifically-typed objects at any time, nor to provide an ID and expect its object to be presented to you. These features break encapsulation across threads, they violate security restrictions from thread to thread, and they require whole new levels of locking to ensure that while reading from the heap no other thread produces new objects and no garbage collection occurs. As a result, ObjectSpace harms Ruby by limiting the flexibility of its garbage collecting and threading subsystems, and should be eliminated.

3. $SAFE and tainting

Safe levels are a fairly simple, straightforward way to set a "security level" that governs what operations are possible for a given thread. By setting the safe level to various values, you can limit modification of Object, prevent IO, disallow creation of new methods or classes, and so on. Added to this is the ability to "taint" or "untaint" objects. Tainted objects are considered "unsafe", and so certain security levels will cause errors to be thrown when those objects are passed to safe-only operations.
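The shape of the idea can be modeled in a few lines of ordinary Ruby. This is a toy, not the interpreter's built-in $SAFE and taint machinery: `Tainted`, `TaintError`, and `safe_only_operation` are hypothetical names, and the safe level is passed in explicitly rather than read from a per-thread setting.

```ruby
# Toy model of taint checking; all names here are made up for illustration.
class TaintError < StandardError; end

# Wraps a value that came from an untrusted source.
class Tainted
  attr_reader :value

  def initialize(value)
    @value = value
  end
end

# A "safe-only" operation: refuses tainted input when the level is 1 or higher.
def safe_only_operation(input, safe_level:)
  if input.is_a?(Tainted) && safe_level >= 1
    raise TaintError, "tainted input rejected at safe level #{safe_level}"
  end
  raw = input.is_a?(Tainted) ? input.value : input
  "processed #{raw}"
end

user_input = Tainted.new("text from a web form")

puts safe_only_operation("trusted literal", safe_level: 1)
puts safe_only_operation(user_input, safe_level: 0)
begin
  safe_only_operation(user_input, safe_level: 1)
rescue TaintError => e
  puts "refused: #{e.message}"
end
```

Note that the check has to sit inside every sensitive operation, which hints at why doing this for real across a whole runtime is expensive.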

JRuby has safe level and tainting checks in place, but it's almost assured they're not working correctly. We have never tested them, largely because practically no tests (or perhaps literally no tests) use safe levels or tainting, and we've had *exactly one* bug report relating to safe levels, just a couple weeks ago. And to further kill the possibility of JRuby ever supporting safe levels and tainting correctly, my work tonight to fix some safe level issues revealed that doing so would add a tremendous amount of overhead to almost all critical operations like method creation, module/class mutation, and worst of all, object creation.

At this point, safe levels will probably remain in their current half-implemented state for 1.0, but I think it's almost decided for us that safe levels and tainting will simply not be supported in JRuby. In their place, we'll do two things (which I'd recommend the C implementation consider as well):

Recommend that people who really want "safe" environments use an approach like whytheluckystiff's Sandbox, which takes a more JVM-like approach to safety: it runs code in a true sandboxed sub-runtime with only "safe" operations even defined. In other words, not only is it disallowed to load in files or hit the network, it's physically *impossible* to do so. What makes this even better is that Sandbox is already supported in JRuby (gem "javasand") and JRuby out of the box allows a fine granularity of operations to be disabled in new runtimes.
Implement safe levels like Java handles security restrictions, which we get to leverage since they're already being checked and enforced at the JVM level. We will not be able to map everything...for obvious reasons, checking tainted strings all the time or limiting class and method creation are unlikely to ever happen, but we can limit those operations that the JVM allows us to limit, like loading remote code, opening sockets, accessing local files, and so on. So it's highly likely JRuby's implementation of safe levels will map to clearly-defined sets of Java security restrictions in the near future.

4. Direction

Ruby is a very free-form community. Matz is the most benevolent dictator I've had the pleasure to work with, and most of the community are true free-thinking artists. It's like the hippie commune of the language world. Peace out, man.

But there's a problem here. Ruby needs guidance beyond VM and language design or the loose meanderings of its more vocal community members. It boils down to a few simple points:

Ruby needs a spec. Anyone who believes this isn't true isn't paying attention. Now I'm not talking about a gold-standard legal document signed in blood by Matz and the chief stakeholders of the Ruby community. An officially sponsored, widely supported, and massively publicized community spec would work fine--and probably fit the community and the language better. But something needs to be done quickly, since Ruby's "bus number" is dangerously low. A spec is not something to be feared...it's a guarantee that Ruby will live on into the future, that alternative implementations (like JRuby) can't intentionally introduce nasty incompatibilities (or at least, that they'd be easy to discover and easy to document), and perhaps most importantly...that the full glory and beauty of Ruby is published forever for all to see and explore, rather than dangerously trapped in very few minds.
Ruby needs a non-profit governing body. I'm not necessarily talking about a council of elders here, I'm just talking about some legal entity to which OSS copyrights can be assigned, donations can be made, and from which projects and initiatives can be funded. Maybe this would be RubyCentral, maybe this would be some other (new) organization...I don't know that. But it would be a great help to the community and Ruby's future if there were some official organization that could act as caretaker for Ruby's future. I'm all set to sign over any JRuby copyrights I have to such an organization, to protect the future of Ruby on the JVM just like the future of the C implementation. How about you?
Ruby needs you. Granted, this isn't really a change as such. You probably wouldn't be reading this if Ruby didn't already have you. But the Ruby community is at a big point in its lifetime...at risk of losing its identity, being eclipsed by newer projects, or even slipping deep, deep into the trough of disillusionment. What will prevent that happening is the community showing its strong ties, coming together to support official organizations and official documents, and above all, continuing to pour all our hearts into creating newer and better applications and libraries in Ruby, pushing the boundaries of what people think is possible.

12 comments:

Dekaritae, April 25, 2007 at 2:17 AM
Haii. I'm just buzzing around, tracking down the various people who were involved in the LiteStep community, to see where they're at now. If you're interested in catching up, lots of old-timers still hang out on #FPN, irc.freenode.net.
UnintentionalObjectRetention, April 25, 2007 at 5:13 AM
"And the developing Ruby 1.9, the future successor to the current version 1.8 C implementation, provides something in the middle: native threads with a giant lock, so threads won't run concurrently."

Wow. I did not know that. That is hilariously useless.
R.J., April 25, 2007 at 6:06 AM
Is there anything that can be done with the growing corpus of tests Ruby/JRuby are collecting to help define an initial "spec"? At least that would help people fumble into the intended design.

More importantly, I suspect the intent of the underlying design decisions in (J)Ruby is largely undocumented right now. That does scare me. It's not that you or Matz are hoarding secrets, of course. It's a lot of work to break out in verbiage what is important. But it's critical for long-term viability of the language.
Andrew Law, April 25, 2007 at 9:57 AM
Hi Charles,

This isn't the ideal way to ask a jruby question but I'm having no joy posting to users@jruby.codehaus.org (I get the mailer daemon barfing even though I am a subscriber and I've had no response since I've forwarded it on to user-owner). Is there another way I can ask what is definitely a newbie question and probably a simple mistake on my part?

Looking forward to hearing from you. Keep up the great work too!

Regs, Andrew
sanxiyn, April 25, 2007 at 4:33 PM
PHP has some kind of a quasi-official governance group (The PHP Group), but it's not as official as Apache or anything. On the other hand, governing bodies sound like committees, and I'm not sure you want to go there either. Benevolent dictators, when they work, work great.

Python's benevolent dictator is Guido van Rossum, but Python does have a non-profit governing body called the Python Software Foundation, which holds all copyrights relating to the CPython implementation, funds Python conferences, and gives grants. They are not mutually exclusive.
robert thau, April 26, 2007 at 9:16 AM
Ummmm... as I read the docs on ObjectSpace.define_finalizer, Ruby finalizers already are per-object; it's just that the method that attaches them to an object happens to be part of the ObjectSpace module.
Charles Oliver Nutter, April 26, 2007 at 10:34 AM
robert thau: Yes, they are already per-object, but you don't attach them to the object directly (which would be something like defining a finalize method on the type, for example), you tell the GC (via ObjectSpace): "Hey, run this code when this object gets collected". It's a subtle difference, but it's an important one.
Charles Oliver Nutter, April 26, 2007 at 7:06 PM
Bruce Rennie: You can certainly make that claim, but making it too often about too many features eventually means your system spends more of its time supporting features you *might* use *someday* than actually getting work done. Sure, machines are 500 to 2000 times faster...and we're doing at least that many times as much with them. Moore's law has ended, and we're not able to do as much with the same individual cores as we'd like. That means scaling horizontally. That means concurrency. And concurrency and live heap inspection do not mix.

There's probably nothing I can say to convince you that these features are not worth the impact they have on performance and evolution of languages and systems. But if you want these features to survive, you need to do something to help make sure they're implemented "effectively". Try it yourself, see how easy it is to make many threads all create garbage, a few more clean it up, and allow heap inspection across the whole lot without crippling the system. Maybe I'm totally off base. Maybe I'm not.

Yes, live heap inspection is useful. So would be full runtime profiling, tracking of all object creations and collections, logging every packet sent and the time it took, and tracing all user operations. But we don't do all those things all the time because we actually want our software to accomplish something. Feel free to run on systems that leave those sorts of features on at runtime if they perform well enough for you. Feel free to run your systems in debug mode all the time, just in case you might want to query that runtime information on a whim. Me, I'd rather my machine's cycles are spent getting work done, and I'm willing to trade a little convenience to do it.
Justin W Smith, April 27, 2007 at 7:41 AM
Great review of what's "wrong" with Ruby.
I truly love Ruby, but there are many aspects of the language which need to mature.

When I first learned Ruby, I quickly realized how limited its implementation of Threads is. For small scripts and other "one-off" work that we all do, threading doesn't matter, but Ruby's usage has quickly expanded beyond this. It's becoming a standard language for implementing large enterprise-level systems.

It is critical that the issues that you mention here are corrected in a thoughtful manner.
Tom Palmer, May 1, 2007 at 12:40 PM
For ObjectSpace, I've wondered how much of the same feature set is available in Java 6 with its cool heap information. Any way to tap into this (even with JNI?) to provide cheaper ObjectSpace on Java 6 plus? Even if just meant for debugging purposes?

For thread killing, I know of less safe ways of killing threads, such as power outages. To the extent I understand it, I don't see the huge trouble with kill/raise. (The whole "critical" thing, however, seems scary.)
Austin, May 8, 2007 at 9:28 AM
I haven't used ObjectSpace much and maybe I am mistaken, but I really like it because it provides some of the same power as Smalltalk's "browse instances of", etc.

Sure this stuff could be duplicated in an IDE or another library, but it would be a shame to move away from a rich introspective dynamic language, just for the sake of removing overhead and performance gains. Ruby doesn't have a spec, but ObjectSpace belongs in it.

I think the more you change Ruby, the more "outside" the Ruby community you will be.

Thanks for all the excellent work you are doing with jruby.
Bob Aman, June 6, 2007 at 11:19 PM
ObjectSpace.each_object(Module) { |mod| ... }

I find myself using the above to locate all subclasses of a particular class on a fairly regular basis. I would love to have a more efficient alternative, but I'm not willing to rely on mechanisms that force the programmer to write extra code to register a reference to the subclass with the parent class. That will inevitably lead to bugs when the programmer forgets to add the registration code. I can't even begin to say how much I want a Module#descendants method built into Ruby.