Big Data

Processing files from S3 with Cascading

Posted in Big Data, Computing, Data Management, Software Development on August 10th, 2013 by Jeff

Cascading is a Hadoop ecosystem framework that provides a higher-level abstraction over MapReduce. I recently worked on a Cascading prototype that would read log files from an Amazon Web Services S3 bucket, apply a minor transform, land the output in HDFS, and then move the files to another S3 bucket configured for archiving.