December 23, 2009
At the November 18 Bay Area Hadoop User Group (HUG) meeting, Paul Tarjan apparently presented an approach for reading Hadoop records in Python. In summary, his approach seems to work as follows:
hadoop_record Python module
Although it’s a nice and very systematic solution, I couldn’t resist blogging about an alternative that already exists for this problem:
typedbytes Python module
Not only would this have saved Paul a lot of work, it probably also would’ve been more efficient, especially when using ctypedbytes, the speedy variant of the typedbytes module.
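For readers who haven't seen typed bytes before, here is a minimal sketch of the kind of encoding involved, assuming the wire format described in Hadoop's typedbytes package: a one-byte type code followed by a big-endian payload. This is not the typedbytes module's actual code. The function names `write_typed` and `read_all` are made up for illustration, and only 32-bit integers and strings are handled.

```python
import io
import struct

# Type codes from the Hadoop typed bytes format (only a subset is handled here).
TYPE_INT = 3     # 32-bit signed integer, big-endian
TYPE_STRING = 7  # 32-bit length followed by UTF-8 bytes


def write_typed(out, value):
    """Write one int or str to a binary stream (hypothetical helper)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; this sketch does not handle it.
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        # Raises struct.error if the value does not fit in 32 bits.
        out.write(struct.pack(">Bi", TYPE_INT, value))
    elif isinstance(value, str):
        data = value.encode("utf-8")
        out.write(struct.pack(">Bi", TYPE_STRING, len(data)))
        out.write(data)
    else:
        raise TypeError("unsupported type: %r" % type(value))


def read_all(inp):
    """Yield decoded values until the stream is exhausted (hypothetical helper)."""
    while True:
        code = inp.read(1)
        if not code:
            return
        if code[0] == TYPE_INT:
            (value,) = struct.unpack(">i", inp.read(4))
        elif code[0] == TYPE_STRING:
            (length,) = struct.unpack(">i", inp.read(4))
            value = inp.read(length).decode("utf-8")
        else:
            raise ValueError("type code %d not handled in this sketch" % code[0])
        yield value


# Round-trip a word-count style key/value pair through an in-memory buffer.
buf = io.BytesIO()
for item in ("word", 3):
    write_typed(buf, item)
encoded = buf.getvalue()
buf.seek(0)
decoded = list(read_all(buf))
```

Because every value carries its own type code and length, a reader never has to parse text or guess at delimiters, which is a large part of why a binary format like this can be cheap to decode.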