Software Development

Processing files from S3 with Cascading

Posted in Big Data, Computing, Data Management, Software Development on August 10th, 2013 by Jeff

Cascading is a Hadoop ecosystem framework that provides a higher-level abstraction over MapReduce. I recently worked on a Cascading prototype that would read log files from an Amazon Web Services S3 bucket, apply a minor transform, land the output in HDFS, and then move the files to another S3 bucket configured for archiving.
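The excerpt describes a four-step flow: read the logs, transform them, land the output, then archive the inputs. As a rough sketch of that ordering only, the Java below uses local directories in place of the S3 source bucket, HDFS and the archive bucket, and does not use Cascading at all. The directory roles, the `*.log` filter and the trim-and-drop-blank-lines transform are hypothetical. What it illustrates is that each input file is moved to the archive only after its output has been written.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Local-filesystem stand-in for the S3 -> transform -> HDFS -> archive flow.
// NOT Cascading: plain directories play the source bucket, HDFS and archive bucket.
class LogPipelineSketch {

    // Hypothetical "minor transform": trim each line and drop blank lines.
    static List<String> transform(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }

    // Processes every *.log file in sourceDir and returns how many were handled.
    static int run(Path sourceDir, Path outputDir, Path archiveDir) throws IOException {
        Files.createDirectories(outputDir);
        Files.createDirectories(archiveDir);

        // Collect inputs first so the directory is not modified while being listed.
        List<Path> inputs = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(sourceDir, "*.log")) {
            for (Path p : stream) {
                inputs.add(p);
            }
        }

        for (Path input : inputs) {
            List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);
            // 1. Land the transformed output first...
            Files.write(outputDir.resolve(input.getFileName()), transform(lines), StandardCharsets.UTF_8);
            // 2. ...and only then move the source file to the archive location.
            Files.move(input, archiveDir.resolve(input.getFileName()), StandardCopyOption.REPLACE_EXISTING);
        }
        return inputs.size();
    }
}
```

Against real S3 and HDFS, the reading and writing would go through Cascading taps, and the archive move would likely use the Hadoop FileSystem API rather than `java.nio`; that part is not sketched here.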

Quick-n-dirty git getting started guide

Posted in Software Development on September 7th, 2011 by Jeff

As a git neophyte, I approve of this post:

UPDATE: I also found this helpful site:

Tip for optimizing MySQL data types

Posted in MySQL, Software Development on June 28th, 2011 by Jeff

This is a tip that I’ve kept forgetting to write down, so here it is:

During a system’s life cycle, requirements change and components are refactored. Databases are no exception, particularly as data grows. Decisions and assumptions made at the beginning of a system’s life cycle may or may not hold up over years of operation, so it’s good practice to continually analyze how well the initial design is working.

“select count” with Generics, Spring and Hibernate

Posted in Software Development on April 18th, 2011 by Jeff

In a recent project, we introduced Generics and Spring into an application. While developing the generic DAO implementation, I was trying to find a way to get a record count from the database. It’s a simple enough task with HQL or Spring’s JdbcTemplate.queryForInt(String sql); however, the Generics made it a little tricky. The solution I came up with was to use HibernateCallback. A reference to the persistent type is a member of GenericDAOHibernate, as are the SessionFactory and an instance of Spring’s HibernateTemplate.

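The core idea in the excerpt can be sketched with the standard library alone: because a type parameter is erased at runtime (`T.class` does not compile), the DAO keeps an explicit `Class<T>` reference, and a template hands a session to a caller-supplied callback that does the counting. `FakeSession`, `FakeTemplate` and `GenericDao` below are hypothetical stand-ins, not Hibernate's `Session` or Spring's `HibernateTemplate`/`HibernateCallback`; in a real DAO the callback would run a count query, such as the HQL returned by `countHql()`, against a Hibernate session.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Standard-library sketch of a generic DAO count. All types here are hypothetical
// stand-ins, not the Spring or Hibernate APIs.
class GenericDaoSketch {

    // Stand-in for a Hibernate Session: an in-memory bag of persisted entities.
    static class FakeSession {
        private final List<Object> rows = new ArrayList<>();

        void save(Object entity) {
            rows.add(entity);
        }

        long countOf(Class<?> type) {
            return rows.stream().filter(type::isInstance).count();
        }
    }

    // Stand-in for a callback template: it owns the session and passes it to the
    // caller-supplied callback, the same inversion HibernateTemplate.execute uses.
    static class FakeTemplate {
        private final FakeSession session;

        FakeTemplate(FakeSession session) {
            this.session = session;
        }

        <R> R execute(Function<FakeSession, R> callback) {
            return callback.apply(session);
        }
    }

    static class GenericDao<T> {
        // T is erased at runtime, so the DAO keeps the persistent type explicitly.
        private final Class<T> persistentClass;
        private final FakeTemplate template;

        GenericDao(Class<T> persistentClass, FakeTemplate template) {
            this.persistentClass = persistentClass;
            this.template = template;
        }

        // The HQL a real callback might run; shown for illustration only.
        String countHql() {
            return "select count(*) from " + persistentClass.getName();
        }

        // Counts entities of type T by running a callback through the template.
        long count() {
            return template.execute(session -> session.countOf(persistentClass));
        }
    }
}
```

Passing the `Class<T>` in through the constructor is the simplest way to make the type available to the callback; it is one option among several.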