In response to Johan‘s desperate request I’ve decided to organize a 4th HUGUK meetup. More info will follow on the official HUGUK blog soon, but since it’s going to be fairly short notice I thought it made sense to already share some details now:
- Date: Thursday 3rd of June
- Time: 18.30
- Place: new Skills Matter building
The two main talks will be:
“Introduction to Sqoop” by Aaron Kimball
– Synopsis –
This talk introduces Sqoop, the open source SQL-to-Hadoop tool. Sqoop helps users perform efficient imports of data from RDBMS sources to Hadoop’s distributed file system, where it can be processed in concert with other data sources. Sqoop also allows users to export Hadoop-generated results back to an RDBMS for use with other data pipelines.
After this session, users will understand how databases and Hadoop fit together, and how to use Sqoop to move data between these systems. The talk will provide suggestions for best practices when integrating Sqoop and Hadoop in your data processing pipelines. We’ll also cover some deeper technical details of Sqoop’s architecture, and take a look at some upcoming aspects of Sqoop’s development roadmap.
– Bio –
Aaron Kimball has been working with Hadoop since early 2007. Aaron has worked with the NSF and several other universities nationally and internationally to advance education in the field of large-scale data-intensive computing. He helped create and deliver academic course materials first used at the University of Washington (and later adopted by many other academic institutions) as well as Hadoop training materials used by several industry partners. Aaron has also worked as an independent consultant focusing on Hadoop and Amazon EC2-based systems. At Cloudera, he continues to actively develop Hadoop and related tools, as well as focus on training and user education. Aaron holds a B.S. in Computer Science from Cornell University, and an M.S. in Computer Science and Engineering from the University of Washington.
“Hive at Last.fm” by Tim Sell
– Synopsis –
This talk is about using Hive in practice. We will go through some of the specific use cases for which Hive is currently being used at Last.fm, highlighting its strengths and weaknesses along the way.
– Bio –
Tim Sell is a Data Engineer at Last.fm who works with Hive and Hadoop on a daily basis.
As usual we’ll try to provide some free beer at the end and anyone is welcome to give a short lightning talk after the main presentations.