Thursday, February 25, 2010

Hadoop: Effectiveness of a Combiner Class

The difference in execution speed for a MapReduce job with and without a combiner class can be huge. Our security log analytics job, run without a combiner class, had still not completed after 1.5 days. With a combiner class added, the same job finished in 15-20 minutes! The reasons for this improvement are straightforward:
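  • Map output <K, V> pairs are combined on the map side while they are still in memory, so less data crosses the network to the reducers, which cuts both traffic and latency.
  • Reducers receive fewer records, so they do far less disk I/O while merging and sorting their input.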
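
To make the effect concrete, here is a minimal sketch in plain Java (no Hadoop dependency) that simulates two map tasks counting events per source IP, with and without a local combine step. The log line format, the class name CombinerSketch, and its map/combine/reduce helpers are assumptions made up for illustration; they are not the original job's code or the Hadoop API.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CombinerSketch {

    // Map phase stand-in: emit one <sourceIp, 1> pair per log line.
    // Assumes (hypothetically) that the IP is the first space-separated field.
    public static List<Map.Entry<String, Integer>> map(List<String> logLines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : logLines) {
            String ip = line.split(" ")[0];
            out.add(new AbstractMap.SimpleEntry<>(ip, 1));
        }
        return out;
    }

    // Reduce phase stand-in: sum all values that share a key.
    public static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return totals;
    }

    // Combiner stand-in: the same summing logic, applied to ONE map task's
    // output before it is shuffled, yielding at most one pair per key.
    public static List<Map.Entry<String, Integer>> combine(
            List<Map.Entry<String, Integer>> mapOutput) {
        return new ArrayList<>(reduce(mapOutput).entrySet());
    }

    public static void main(String[] args) {
        // Two simulated map tasks, each with its own input split.
        List<String> split1 = Arrays.asList(
                "10.0.0.1 LOGIN_FAIL", "10.0.0.1 LOGIN_FAIL",
                "10.0.0.2 LOGIN_OK", "10.0.0.1 LOGIN_FAIL");
        List<String> split2 = Arrays.asList(
                "10.0.0.2 LOGIN_FAIL", "10.0.0.2 LOGIN_FAIL");

        // Without a combiner: every map output pair is shuffled.
        List<Map.Entry<String, Integer>> shuffledRaw = new ArrayList<>();
        shuffledRaw.addAll(map(split1));
        shuffledRaw.addAll(map(split2));

        // With a combiner: each task's output is collapsed locally first.
        List<Map.Entry<String, Integer>> shuffledCombined = new ArrayList<>();
        shuffledCombined.addAll(combine(map(split1)));
        shuffledCombined.addAll(combine(map(split2)));

        System.out.println("records shuffled without combiner: " + shuffledRaw.size());
        System.out.println("records shuffled with combiner:    " + shuffledCombined.size());
        System.out.println("same reduce result: "
                + reduce(shuffledRaw).equals(reduce(shuffledCombined)));
    }
}
```

Reusing the summing logic as the combiner works here only because addition is commutative and associative. Hadoop may run a combiner zero, one, or several times per map task, so the reduce result must not depend on whether it ran.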