I added the @opt decorator to Dumbo yesterday, and decided it was useful enough to justify a new minor release. Using this decorator, mappers and reducers can specify (additional) options they rely on. For instance, you can now write
from dumbo import opt, run, sumreducer
@opt("addpath", "yes")
def mapper(key, value):
    for word in value.split():
        yield (word, key[0]), 1
if __name__ == "__main__":
    run(mapper, sumreducer, combiner=sumreducer)
instead of
from dumbo import main, sumreducer
def mapper(key, value):
    for word in value.split():
        yield (word, key[0]), 1
def runner(job):
    job.additer(mapper, sumreducer, combiner=sumreducer)
def starter(prog):
    prog.addopt("addpath", "yes")
if __name__ == "__main__":
    main(runner, starter)
to count words on a per-file basis (recall that the -addpath yes option makes sure the file path is prepended to the key passed to the mapper). The former version is not only shorter but also clearer, since the option specification sits closer to the code that relies on it.
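To see what the mapper emits once the path is part of the key, here is a small simulation in plain Python that runs without Dumbo or Hadoop. The records are made up, and the shape of the key (a tuple with the path first and the original key second) is an assumption based on the description above; the summing loop only stands in for what sumreducer would do.

```python
from collections import defaultdict

def mapper(key, value):
    # key[0] is assumed to be the file path added by -addpath yes
    for word in value.split():
        yield (word, key[0]), 1

# Made-up input records shaped like ((path, original_key), line)
records = [
    (("a.txt", 0), "foo bar foo"),
    (("b.txt", 0), "foo baz"),
]

# Sum the emitted counts per (word, path) pair, as a stand-in for sumreducer
counts = defaultdict(int)
for key, value in records:
    for out_key, n in mapper(key, value):
        counts[out_key] += n

for (word, path), n in sorted(counts.items()):
    print(word, path, n)
```

Because the path is part of the output key, "foo" is counted separately for a.txt and b.txt.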
Under the hood, the @opt decorator appends the given option to an opts list attribute. For mapper and reducer classes, you can just set this attribute directly:
class Mapper:
    opts = [("addpath", "yes")]
    def __call__(self, key, value):
        for word in value.split():
            yield (word, key[0]), 1
if __name__ == "__main__":
    from dumbo import run, sumreducer
    run(Mapper, sumreducer, combiner=sumreducer)
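For the curious, the mechanism can be sketched in a few lines of plain Python. This is not Dumbo's actual source, just a hypothetical stand-in named opt that does what is described above: it appends the (key, value) pair to an opts list on the decorated function, creating the list if it doesn't exist yet.

```python
def opt(key, value):
    """Hypothetical stand-in for dumbo's @opt, for illustration only."""
    def decorator(func):
        # Create the opts list on first use, then append the new option
        if not hasattr(func, "opts"):
            func.opts = []
        func.opts.append((key, value))
        return func  # the function itself is returned unchanged
    return decorator

@opt("addpath", "yes")
def mapper(key, value):
    for word in value.split():
        yield (word, key[0]), 1

print(mapper.opts)  # [('addpath', 'yes')]
```

Since the decorator only attaches an attribute and returns the same function, the mapper behaves exactly as before; whatever runs the job can then read the opts attribute to find out which options are needed.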
Furthermore, it’s now also possible to pass an options list to run() or additer() via the opts argument:
def mapper(key, value):
    for word in value.split():
        yield (word, key[0]), 1
if __name__ == "__main__":
    from dumbo import run, sumreducer
    opts = [("addpath", "yes")]
    run(mapper, sumreducer, combiner=sumreducer, opts=opts)
which could be handy when you want to use join keys for only one iteration, for example.