Dumbo over HBase

This should be old news for dumbo-user subscribers, but Tim has, once again, put his Java coding skills to good use. This time around he created nifty input and output formats for consuming and/or producing HBase tables from Dumbo programs. Here’s a silly but illustrative example:

from dumbo import opt, run

@opt("inputformat", "fm.last.hbase.mapred.TypedBytesTableInputFormat")
@opt("hadoopconf", "hbase.mapred.tablecolumns=testfamily:testqualifier")
def mapper(key, columns):
    for family, column in columns.iteritems():
        for qualifier, value in column.iteritems():
            yield key, (family, qualifier, value)

@opt("outputformat", "fm.last.hbase.mapred.TypedBytesTableOutputFormat")
@opt("hadoopconf", "hbase.mapred.outputtable=output_table")
def reducer(key, values):
    columns = {}
    for family, qualifier, value in values:
        column = columns.setdefault(family, {})  # store the per-family dict so it isn't lost
        column[qualifier] = value
    yield key, columns

if __name__ == "__main__":
    run(mapper, reducer)

Have a look at the readme for more information.
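The reducer simply regroups the (family, qualifier, value) triples back into the nested column dict that the mapper flattened. As a sanity check, that round trip can be exercised without Hadoop or HBase at all; here's a minimal, self-contained sketch (the row key and cell values below are made up, and `dict.setdefault` is used so the per-family dict is actually stored):

```python
# Standalone simulation of the mapper/reducer round trip above,
# run on an in-memory stand-in for an HBase row.

def mapper(key, columns):
    # Flatten the nested {family: {qualifier: value}} dict into triples.
    for family, column in columns.items():
        for qualifier, value in column.items():
            yield key, (family, qualifier, value)

def reducer(key, values):
    # Regroup the triples back into the nested column dict.
    columns = {}
    for family, qualifier, value in values:
        column = columns.setdefault(family, {})
        column[qualifier] = value
    yield key, columns

row = {"testfamily": {"testqualifier": "testvalue"}}
pairs = list(mapper("row1", row))
key, rebuilt = next(reducer("row1", (v for _, v in pairs)))
print(rebuilt)  # the reducer reconstructs the original column dict
```

In a real job, of course, Dumbo and the HBase input/output formats handle the plumbing; this only illustrates the data shape each function sees.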

8 Responses to Dumbo over HBase

  1. Nathan says:

    Is this still supported? I compiled lasthbase.jar and put all my HBase jars including lasthbase in the hadoop classpath. I installed Cloudera’s CDH2 this morning and I tried running a dumbo job and got this error. The CDH2 version is: Hadoop 0.20.1+169.113

    I have seen you say the CDH2 should support the patches needed, but I still get this error. :\

    2010-11-07 18:18:37,976 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201011071752_0004_m_000000_3: java.lang.NullPointerException
    at org.apache.hadoop.io.BytesWritable.&lt;init&gt;(BytesWritable.java:54)
    at org.apache.hadoop.typedbytes.TypedBytesWritable.&lt;init&gt;(TypedBytesWritable.java:41)
    at org.apache.hadoop.streaming.io.TypedBytesOutputReader.getLastOutput(TypedBytesOutputReader.java:73)
    at org.apache.hadoop.streaming.PipeMapRed.getContext(PipeMapRed.java:612)
    at org.apache.hadoop.streaming.PipeMapRed.logFailure(PipeMapRed.java:639)
    at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:123)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    • Nathan says:

      My python version is 2.6.5

    • Nathan says:

      Also, here are the two commands I have tried running. Both result in the same error I posted above:

      dumbo test-in.py -hadoop /usr/lib/hadoop -libjar lasthbase.jar -inputformat org.apache.hadoop.hbase.mapred.TableInputFormat -D hbase.mapred.tablecolumns="f:cnt" -input /hbase/webpage -output /outputDir

      dumbo test-in.py -hadoop /usr/lib/hadoop -libjar lasthbase.jar -inputformat fm.last.hbase.mapred.TableInputFormat -D hbase.mapred.tablecolumns="f:cnt" -input /hbase/webpage -output /outputDir

  2. Asael Moshe says:

    I tried using it for a case in which the value in HBase was a 64-bit counter.
    For cases where the value of the counter was between 128 and 256 (for example: "\x00\x00\x00\x00\x00\x00\x00\xA8"), I didn't get the 8 bytes I expected, but rather 10 bytes.
    Does anyone know how to get the original bytes in my python mapper?
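    One guess, not a confirmed diagnosis: if the 8 raw bytes were UTF-8-decoded into text somewhere along the pipeline, every byte ≥ 0x80 would turn into two bytes on re-encoding, which would explain the inflation. Under that assumption, reversing the UTF-8 step via latin-1 recovers the original bytes, which `struct` can then unpack; here's a sketch with made-up data (the exact byte count depends on how many high bytes the counter contains, so it won't necessarily match the 10 bytes reported above):

    ```python
    import struct

    # Hypothetical example: the original HBase cell held the 8-byte
    # big-endian counter b"\x00...\xa8" (= 168), but it was UTF-8-decoded
    # somewhere, inflating every byte >= 0x80 into two bytes.
    raw = b"\x00\x00\x00\x00\x00\x00\x00\xa8"
    mangled = raw.decode("latin-1").encode("utf-8")  # simulate the inflation
    print(len(mangled))  # 9 bytes here, not 8

    # Recover the original bytes by reversing the UTF-8 step...
    recovered = mangled.decode("utf-8").encode("latin-1")

    # ...then unpack the 64-bit big-endian counter.
    (count,) = struct.unpack(">q", recovered)
    print(count)  # 168
    ```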

  3. free0wolf says:

    Can we access Cassandra instead of HBase?

  4. I want to thank you for this fantastic read!! I absolutely enjoyed
    every little bit of it. I’ve got you book-marked to look at new things
    you post…
