HADOOP-5528 got committed yesterday. From Hadoop 0.21 onwards, join keys will work “out of the box”, without requiring any patching. The patch evolved somewhat before it got committed, though, so the final version no longer works with Dumbo 0.20.3. Therefore, I released Dumbo 0.21.4 this morning, whose list of changes includes a fix for the incompatibility with the final HADOOP-5528 patch.
So far, my luck with getting Hadoop patches reviewed and committed has varied quite a bit. From my limited personal experience, it seems to be more difficult to get a committer to look at a bugfix or an important enhancement than at a new feature, even though such contributions can arguably be considered more important than new features. It is of course possible that these particular issues just happened to get overlooked, or that there’s a procedure for attracting the committers’ attention that I’m not aware of. Nevertheless, I’m still under the impression that Hadoop’s patch handling is currently not as smooth and efficient as it could be. The fact that, as of this writing, no fewer than 47 issues are in the “Patch available” state seems to confirm this impression.