Dumbo IP count in C

It doesn’t fit very well with Dumbo’s goal of making MapReduce programs as easy as possible to write, but Dumbo mappers and reducers can also be written in C instead of Python. I just put an example on GitHub to illustrate this. Writing a mapper or reducer in C is nowhere near as convenient as using Python, but it is not that hard either, since you get to use the nifty Python C API. In some specific cases the speed gains might be worth the extra effort. Moreover, setuptools nicely takes care of all the building and compiling, and you can limit the C code to the computationally expensive parts and still use Python for the rest.
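For comparison, here is roughly what an IP count job looks like in plain Dumbo/Python. The mapper and reducer are the parts a C version would reimplement with the Python C API.

This is a sketch, not the GitHub example. It makes two assumptions:

- Each input line is an access-log line whose first whitespace-separated field is the client IP.
- `run_locally` is a hypothetical helper that mimics Hadoop's map, sort and reduce steps, so the logic can be tried without a cluster.

```python
from itertools import groupby
from operator import itemgetter


def mapper(key, value):
    # key: position of the line in the input (a byte offset under Hadoop).
    # Assumption: the client IP is the first whitespace-separated field.
    fields = value.split()
    if fields:
        yield fields[0], 1


def reducer(key, values):
    # Sum the partial counts for one IP address.
    yield key, sum(values)


def run_locally(lines):
    """Hypothetical local stand-in for Hadoop: map, sort by key, reduce."""
    mapped = [kv for i, line in enumerate(lines) for kv in mapper(i, line)]
    mapped.sort(key=itemgetter(0))
    results = {}
    for ip, group in groupby(mapped, key=itemgetter(0)):
        for k, total in reducer(ip, (count for _, count in group)):
            results[k] = total
    return results


# With Dumbo installed, a job script would typically end with:
#     if __name__ == "__main__":
#         import dumbo
#         dumbo.run(mapper, reducer, combiner=reducer)
```

For example, `run_locally(["10.0.0.1 - - GET /", "10.0.0.2 - - GET /", "10.0.0.1 - - GET /a"])` gives `{"10.0.0.1": 2, "10.0.0.2": 1}`. Since the reducer only sums, it can double as the combiner.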


5 Responses to Dumbo IP count in C

  1. Elias says:

    Would having C versions of dumbo.sumsreducer etc. make sense?

    • Klaas says:

      That might make sense, yeah, but in my opinion the C versions should only be optional alternatives to the default implementations in Python. One way of doing this is to put the C versions in a separate module, check whether this module is available when the Dumbo module gets imported, and replace the Python versions with the C versions if it is.

  2. Elias says:

    Oh, and btw, how does the C version compare to the Java version and to the Python/Dumbo version?

    • Klaas says:

      It depends… 🙂 In general, I expect Java to still be notably faster than Dumbo/C, but when your mappers and/or reducers do a lot of work, Dumbo/C can be substantially faster than Dumbo/Python.

  3. L-Lysine says:

    I am very thankful for this topic because it really gives useful information.
