HUGUK #4

In response to Johan‘s desperate request I’ve decided to organize a 4th HUGUK meetup. More info will follow on the official HUGUK blog soon, but since it’s going to be fairly short notice I thought it made sense to already share some details now:

The two main talks will be:

“Introduction to Sqoop” by Aaron Kimball

– Synopsis –

This talk introduces Sqoop, the open source SQL-to-Hadoop tool. Sqoop helps users perform efficient imports of data from RDBMS sources to Hadoop’s distributed file system, where it can be processed in concert with other data sources. Sqoop also allows users to export Hadoop-generated results back to an RDBMS for use with other data pipelines.

After this session, users will understand how databases and Hadoop fit together, and how to use Sqoop to move data between these systems. The talk will provide suggestions for best practices when integrating Sqoop and Hadoop in your data processing pipelines. We’ll also cover some deeper technical details of Sqoop’s architecture, and take a look at some upcoming aspects of Sqoop’s development roadmap.

– Bio –

Aaron Kimball has been working with Hadoop since early 2007. Aaron has worked with the NSF and several other universities nationally and internationally to advance education in the field of large-scale data-intensive computing. He helped create and deliver academic course materials first used at the University of Washington (and later adopted by many other academic institutions) as well as Hadoop training materials used by several industry partners. Aaron has also worked as an independent consultant focusing on Hadoop and Amazon EC2-based systems. At Cloudera, he continues to actively develop Hadoop and related tools, as well as focus on training and user education. Aaron holds a B.S. in Computer Science from Cornell University, and an M.S. in Computer Science and Engineering from the University of Washington.

“Hive at Last.fm” by Tim Sell

– Synopsis –

This talk is about using Hive in practice. We will go through some of the specific use cases for which Hive is currently being used at Last.fm, highlighting its strengths and weaknesses along the way.

– Bio –

Tim Sell is a Data Engineer at Last.fm who works with Hive and Hadoop on a daily basis.

As usual we’ll try to provide some free beer at the end and anyone is welcome to give a short lightning talk after the main presentations.

About these ads

3 Responses to HUGUK #4

  1. Abhinay Mehta says:

    Do we have to sign up for this or just turn up?

    • Klaas says:

      You’ll probably have to sign up. The post on the huguk.org blog will explain how to register exactly, so stay tuned for any news there.

  2. Stress balls says:

    Many thanks with the well-thought posting. I’m truly at function correct now! So I ought to go off with out examining all I’d like. But, I set your weblog on my RSS feed to ensure I can study far more….

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: