Reading Hadoop records in Python

At the 11/18 Bay Area HUG, Paul Tarjan apparently presented an approach for reading Hadoop records in Python. In summary, his approach seems to work as follows:

Hadoop records
     → CsvRecordInput
         → hadoop_record Python module

Although it’s a nice and very systematic solution, I couldn’t resist blogging about an already existing alternative solution for this problem:

Hadoop records
     → TypedBytesRecordInput
         → typedbytes Python module

Not only would this have saved Paul a lot of work, it probably also would’ve been more efficient, especially when using ctypedbytes, the speedy variant of the typedbytes module.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: