Processing files from S3 with Cascading

Posted on August 10th, 2013

Cascading is a Hadoop ecosystem framework that provides a higher-level abstraction over MapReduce. I recently worked on a Cascading prototype that would read log files from an Amazon Web Services (AWS) S3 bucket, apply a minor transform, land the output in HDFS, and then move the log files to another S3 bucket configured for archiving.

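
The post doesn't say how the archiving move was implemented. S3 has no native move or rename operation, so moving an object between buckets generally means copying it and then deleting the original. The Java sketch below models only that step. It runs against a hypothetical in-memory store: `InMemoryObjectStore` and `Archiver` are illustrative names, not the AWS SDK or Cascading APIs. With the AWS SDK for Java (v1), the corresponding calls would be `copyObject` and `deleteObject` on an `AmazonS3` client.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for an object store such as S3 (NOT the AWS SDK).
// Objects are addressed by bucket name and key; bodies are plain strings.
class InMemoryObjectStore {
    private final Map<String, Map<String, String>> buckets = new HashMap<>();

    void put(String bucket, String key, String body) {
        buckets.computeIfAbsent(bucket, b -> new HashMap<>()).put(key, body);
    }

    String get(String bucket, String key) {
        Map<String, String> objects = buckets.get(bucket);
        return objects == null ? null : objects.get(key);
    }

    // Returns a snapshot of the keys, so callers may delete while iterating.
    List<String> list(String bucket) {
        List<String> keys = new ArrayList<>();
        Map<String, String> objects = buckets.get(bucket);
        if (objects != null) {
            keys.addAll(objects.keySet());
        }
        return keys;
    }

    void copy(String srcBucket, String key, String dstBucket) {
        String body = get(srcBucket, key);
        if (body == null) {
            throw new IllegalStateException("missing object " + srcBucket + "/" + key);
        }
        put(dstBucket, key, body);
    }

    void delete(String bucket, String key) {
        Map<String, String> objects = buckets.get(bucket);
        if (objects != null) {
            objects.remove(key);
        }
    }
}

class Archiver {
    // Moves every object from srcBucket to archiveBucket under the same key.
    // There is no single "move" call: each object is copied first and deleted
    // only after the copy succeeded, so a failed copy leaves the source in place.
    static int archiveAll(InMemoryObjectStore store, String srcBucket, String archiveBucket) {
        int moved = 0;
        for (String key : store.list(srcBucket)) {
            store.copy(srcBucket, key, archiveBucket);
            store.delete(srcBucket, key);
            moved++;
        }
        return moved;
    }
}
```

For example, after putting a few log objects into a bucket named `logs`, calling `Archiver.archiveAll(store, "logs", "logs-archive")` leaves `logs` empty and places each object under the same key in `logs-archive`. Both bucket names are made up. Deleting only after a successful copy means a failure part-way through can leave a duplicate, but never a lost file.
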
Quick-n-dirty git getting started guide

Posted on September 7th, 2011

As a git neophyte, I approve of this post:

UPDATE: I also found this helpful site:

Tip for optimizing MySQL data types

Posted on June 28th, 2011

This is a tip that I’ve kept forgetting to write down, so here it is:

During a system’s life cycle, requirements change and components are refactored. That includes the database, particularly as data grows. Decisions and assumptions made at the beginning of a system’s life cycle may or may not hold up over years of operation, so it’s good practice to continually analyze how well the initial design is working.

“select count” with Generics, Spring and Hibernate

Posted on April 18th, 2011

In a recent project, we introduced Generics and Spring into an application. While developing the Generic DAO implementation, I was looking for a way to get a record count from the database. That’s a simple enough task with HQL or Spring’s JdbcTemplate.queryForInt(String sql); however, the Generics made it a little tricky. The solution I came up with uses HibernateCallback. GenericDAOHibernate holds a reference to the persistent type as a member, along with the SessionFactory and an instance of Spring’s HibernateTemplate.
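
Here is a minimal, self-contained Java sketch of the callback pattern under stated assumptions. It does not use Spring or Hibernate. `InMemorySession`, `SessionCallback` and `SimpleTemplate` are hypothetical stand-ins that only mimic the shape of a Hibernate `Session`, Spring’s `HibernateCallback` and `HibernateTemplate.execute(...)`. The sketch shows why the persistent type is kept as a member: `T` is erased at runtime, so the DAO stores a `Class<T>`. The anonymous callback then uses that class to run the count inside whatever session the template supplies.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Hibernate Session (NOT the Hibernate API):
// it stores entities in memory and counts them by their runtime class.
class InMemorySession {
    private final Map<Class<?>, Long> rows = new HashMap<>();

    void save(Object entity) {
        rows.merge(entity.getClass(), 1L, Long::sum);
    }

    long count(Class<?> type) {
        return rows.getOrDefault(type, 0L);
    }
}

// Stand-in modeled on the shape of Spring's HibernateCallback:
// a unit of work that receives the session from the template.
interface SessionCallback<R> {
    R doInSession(InMemorySession session);
}

// Stand-in modeled on HibernateTemplate.execute(callback). A real template
// would also open/close the session and translate exceptions.
class SimpleTemplate {
    private final InMemorySession session;

    SimpleTemplate(InMemorySession session) {
        this.session = session;
    }

    <R> R execute(SessionCallback<R> callback) {
        return callback.doInSession(session);
    }
}

// Generic DAO: T is erased at runtime, so the persistent type is kept
// explicitly as a Class<T> member and used inside the callback.
class GenericDao<T> {
    private final Class<T> persistentClass;
    private final SimpleTemplate template;

    GenericDao(Class<T> persistentClass, SimpleTemplate template) {
        this.persistentClass = persistentClass;
        this.template = template;
    }

    long count() {
        // With real Hibernate, this callback body would instead run HQL such as
        // "select count(*) from " + persistentClass.getName()
        return template.execute(new SessionCallback<Long>() {
            @Override
            public Long doInSession(InMemorySession session) {
                return session.count(persistentClass);
            }
        });
    }
}
```

For example, after saving two `String` objects and one `Integer`, `new GenericDao<>(String.class, template).count()` returns 2 and the `Integer` DAO returns 1. With the real libraries, the template also takes care of opening the session and translating Hibernate exceptions into Spring’s `DataAccessException` hierarchy. The DAO only has to supply the query.
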

