Unfortunately, the list of Hadoop patches required for making Dumbo work properly just grew a bit, since I traced a strange encoding bug down to an issue in Streaming’s typed bytes code. Hence, you might want to apply the MAPREDUCE-764 patch to your Hadoop build if you use Dumbo, even though the bug only causes problems in very specific cases and usually isn’t hard to work around. Hopefully this patch will make it into Hadoop 0.21.

This isn’t all bad news, however. The encoding bug was initially reported on the dumbo-user mailing list, which apparently already has 12 subscribers and is starting to attract fairly regular traffic. To be honest, I haven’t promoted this mailing list much so far and never really expected people to start using it, but obviously I was wrong. Everyone who reads this blog should consider subscribing; I’m sure you won’t regret it!

3 Responses to MAPREDUCE-764

  1. If it’s not too much trouble, it would be really helpful if you could supply a patched Hadoop tarball along with each Dumbo release (at least until all the required JIRA patches are committed to Hadoop).

    Right now it’s quite confusing to figure out which version of Dumbo requires applying which patches to which version of Hadoop. Just trying out a simple Dumbo tutorial shouldn’t require this much effort.

    • Klaas says:

      The Cloudera distribution should be close to what you want, Harish. It includes all required patches (except for the MAPREDUCE-764 patch, of course, but maybe they’ll include that one too once it gets committed). They don’t provide a packaged Dumbo yet, but installing Dumbo itself isn’t that hard. And I guess the Cloudera guys might consider packaging Dumbo too if enough people ask for it…

      Providing a patched Hadoop tarball is still a good idea, though. It would be awesome if someone stepped up to maintain a Dumbo-targeted branch of Hadoop (an easy way to do that might be to fork the Yahoo! Hadoop distribution on GitHub), but personally I’d like to avoid the additional work of maintaining such a branch right now. Maybe you could send an email to the mailing list to find out if someone else is up for it.

  2. […] distribution yet, so you’ll still have to apply this patch yourself if you want to avoid strange encoding problems in certain corner cases. This patch has now been reviewed and accepted for Hadoop 0.21 for quite a […]
