Monday, November 5, 2007

Ruby Community Seeks Autotranslator

As many of you know, Ruby was created in Japan by Yukihiro Matsumoto, and most of the core development team is still Japanese to this day. This has posed a serious problem for the Ruby community, since the language barrier between the Japanese core team and community and the English-speaking community is extremely high. Only a few members of the core team can speak English comfortably, so discussions about the future of Ruby, bug fixes, and new features happen almost entirely on the Japanese ruby-dev mailing list. That leaves those of us English speakers on the ruby-core mailing list out in the cold.

We need a two-way autotranslator.

Yes, we all know that automated translation technology is not perfect, and that for East Asian languages it's often barely readable. But even having partial, confusing translations of the Japanese emails would be better than having nothing at all, since we'd at least know which topics are being discussed. And English-to-Japanese translators do a bit better than the reverse direction, so core team members interested in ruby-core emails would get the same benefit.

I imagine this is also part of the reason Rails has not taken off as quickly in Japan as it has in the English-speaking world: the Rails core team is peopled primarily by English speakers, and the main Rails lists are all in English. Presumably, an autotranslating gateway would be useful for many such communities.

But here's the problem: I know of no such service.

There are multiple translation services, for free and for pay, that can handle Japanese to some level. Google Translate and Babelfish are the two I use regularly. But these only support translating a block of text or a URL entered into a web form. There also does not appear to be a Google API for Translate, so screen-scraping would be the only option at present.

The odd thing about this is that autotranslators are good enough now that there could easily be a generic translation service for dozens of languages. Enter in source and target languages, source and target mailing lists, and it would busily chew through mail. For closely-related European languages, autotranslators do an extremely good job. And just last night I translated a Chinese blog post using Google Translate that ended up reading as almost perfect English. The time is ripe for such a service, and making it freely available could knock down some huge barriers between international communities.
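As a rough sketch of the core step of such a gateway, assuming some translation backend can be wrapped in a plain function: take one message from the source list, translate its subject and body, and produce a copy addressed to the target list. Everything named here is hypothetical, including `translate_message`, the `Translator` signature, the `X-Original-Message-ID` header, and `placeholder_translate`, which only tags the text and translates nothing.

```python
from email.message import EmailMessage
from typing import Callable

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
# No particular translation service or API is assumed here.
Translator = Callable[[str, str, str], str]


def translate_message(msg: EmailMessage, translate: Translator,
                      source_lang: str, target_lang: str,
                      target_list: str) -> EmailMessage:
    """Build a translated copy of msg, addressed to target_list."""
    out = EmailMessage()
    if msg["From"]:
        out["From"] = msg["From"]
    out["To"] = target_list
    out["Subject"] = translate(msg["Subject"] or "", source_lang, target_lang)

    # Keep a pointer back to the original post (header name is an assumption).
    if msg["Message-ID"]:
        out["X-Original-Message-ID"] = msg["Message-ID"]

    # Only the plain-text part is translated in this sketch.
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part is not None else ""
    out.set_content(translate(body, source_lang, target_lang))
    return out


def placeholder_translate(text: str, source_lang: str, target_lang: str) -> str:
    # Stand-in for a real backend: it only tags the text, it does not translate.
    return f"[{source_lang}->{target_lang}] {text}"
```

A real gateway would still need to fetch mail from the source list, deliver the result, and avoid re-translating its own output, none of which is shown here.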

So, who's going to set it up first and grab the brass ring (or is there a service I've overlooked)?


  1. I'd like to put something together. I'll see what I can do.

    Jason Toy

  2. InterTran (hundreds of languages) might be interesting, but the public service is almost always too busy to give results. Translating is computationally expensive. In the long run it needs to sustain the hardware and software costs of such a service.

  3. What you really need is some guy who is literate in both languages. Throw him into a cage, feed mailing-list dump and pizza in at one end, receive translation at the other. I suppose the only sticky point is finding enough pizza to power our gerbil-cage translation system...

  4. I used to be *reasonably* conversational in Japanese, but a decade of atrophy has mostly left me with "nihongo ga sukoshi wakarimasu".

    Perhaps we need a Ruby-focussed language learning community. Like livemocha but for geeks.

  5. How come members of the Japanese core team must use English? How come almost all members of the English-speaking community don't study Japanese, despite the fact that they can study 'speaking Ruby'?

  6. @anon 11/10 - perhaps the eagerness for the entire community to focus on the English language is that the Ruby syntax itself is English (alphanumeric characters, left-to-right across the page) rather than Japanese. This syntax encouraged English speakers to use Ruby. That's my guess.

  7. Thanks, Dr. Nic, that was a very polite answer. I couldn't do that. You know I'm really peace and all... but the cultural ignorance of some Japanese folks just puzzles me.

    Hello?! We're talking about software and computer science here. Almost every notable academic and practical achievement in this area has been published in English. This is clearly the language that connects minds across borders. Just a fact. Not my fault. And I'm not a native speaker either.

    I mean, you guys really have your ways with your culture. The reason we haven't got proper Unicode support in Ruby today is because the standard didn't fully acknowledge the subtleties of the language.

    And did you come up with something better? I mean other than an encoding scheme which represents absolutely nothing else but Japanese?

    Nobody's telling anyone to learn English but it would sure help you get a broader perspective and get to learn more about the parts of the world that are not Japan.

    I mean, I really love you all and I have the deepest respect for your culture. But you definitely need to be less stiff - and relax.

    P.S. Sorry for ranting so cowardly and anonymously, but hey, it's the internet! You may even insult me in return and I wouldn't care...

    And Mats still rules, no matter what!

  8. I should add that the fact that Chinese to English works OK means nothing. Round-tripping English to Japanese to English on any of the mentioned services illustrates how bad things can get. East Asian languages are similar as far as the lexical stuff goes, but Chinese just happens to have similar word order and sentence structure to English, whereas Japanese is inflected and highly context sensitive.

  9. The best bet is still Google because of their new algorithms for translation using machine learning. They do like Ruby, so if you ask them, they would probably be willing to help.


  10. Jason: This is a great effort! What are you using to translate? I especially like that you provide a way for other human translators to come in and provide a more accurate translation. I'll blog this if you don't mind, so people know it's out there.

    A couple missing features come to mind:

    - subscription feeds
    - a mailing list for each translated output would be even better