Data Management

Processing files from S3 with Cascading

Posted in Big Data, Computing, Data Management, Software Development on August 10th, 2013 by Jeff

Cascading is a framework in the Hadoop ecosystem that provides a higher-level abstraction over MapReduce. I recently worked on a Cascading prototype that would read log files from an Amazon Web Services (AWS) S3 bucket, apply a minor transform, land the output in HDFS, and then move the log files to another S3 bucket configured for archiving.

Tip for optimizing MySQL data types

Posted in MySQL, Software Development on June 28th, 2011 by Jeff

This is a tip I keep forgetting to write down, so here it is:

During a system’s life cycle, requirements change and components are refactored. Databases are no exception, particularly as data grows. Decisions and assumptions made at the beginning of a system’s life cycle may or may not hold up over years of operation, so it’s good practice to keep analyzing how well the initial design is working.

MySQL udf_median on Windows 7 64-bit

Posted in Information Technology, MySQL on May 21st, 2011 by Jeff

In a minor but ongoing saga of supporting the venerable MySQL UDF udf_median, I can now add a HOWTO for building it on Windows 7 x64 with Microsoft Visual C++ 2010 Express.

MySQL udf_median on Windows

Posted in MySQL on April 19th, 2011 by Jeff

A few years ago I needed to get a MySQL UDF (user-defined function) working on my Windows workstation for a project. A couple of other folks helped me set up my environment and compile a .dll. I was recently contacted about my project files and realized I had left an orphaned link out on the Internet, so I thought I’d better fix that.