Integration with Java code

Although Python has many advantages, you might still want to write some of your mappers or reducers in Java once in a while. Flexibility and speed are probably the most likely potential reasons. Thanks to a recent enhancement, this is now easily achievable. Here’s a version of that uses the example mapper and reducer from the feathers project (and thus requires -libjar feathers.jar):

import dumbo"",

You can basically mix up Python with Java in any way you like. There’s only one minor restriction: You cannot use a Python combiner when you specify a Java mapper. Things should still work in this case though, it’ll just be slow since the combiner won’t actually run. In theory, this limitation could be avoided by relying on HADOOP-4842, but personally I don’t think it’s worth the trouble.

The source code for and fm.last.feathers.reduce.Sum is just as straightforward as the code for the OutputFormat classes discussed in my previous post. All you have to keep in mind is that only the mapper input keys and values can be arbitrary writables. Every other key or value has to be a TypedBytesWritable. Writing a custom Java partitioner for Dumbo programs is equally easy by the way. The fm.last.feather.partition.Prefix class is a simple example. It can be used by specifying -partitioner fm.last.feather.partition.Prefix.

As you probably expected already, none of this will work for local runs on UNIX, but you can still test things locally fairly easily by running on Hadoop in standalone mode.

One Response to Integration with Java code

  1. program code says:

    program code…

    […]Integration with Java code « Dumbotics[…]…

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: