The partition phase runs between the Map and Reduce phases. By default Hadoop uses HashPartitioner for this step, so writing a custom partitioner is optional. We can create a custom partitioner by extending the abstract org.apache.hadoop.mapreduce.Partitioner class in Hadoop. In this class, we have to implement the getPartition(KEY key, VALUE value, int numPartitions) method.
This method takes three inputs: the key, the value, and numPartitions, which is the same as the number of reducers in our job. It returns the partition number, from 0 to numPartitions - 1, to which this key-value record is assigned. Each partition has a corresponding reducer, which then handles summarizing the data in that partition.
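The getPartition contract described above can be illustrated with a minimal sketch. To keep it runnable without Hadoop on the classpath, it declares a small stand-in Partitioner class with the same method signature, and uses String and Integer where a real job would typically use Writable types such as Text and IntWritable. The routing rule (keys starting with a digit go to partition 0, all others are spread by hash) is a hypothetical example, not something taken from Hadoop itself.

```java
// Stand-in for org.apache.hadoop.mapreduce.Partitioner (an assumption made so the
// sketch compiles on its own); in a real job, extend the Hadoop class instead.
abstract class Partitioner<KEY, VALUE> {
    public abstract int getPartition(KEY key, VALUE value, int numPartitions);
}

// Hypothetical rule: keys that start with a digit go to partition 0,
// every other key is spread over the remaining partitions by hash.
class CustomPartitioner extends Partitioner<String, Integer> {
    @Override
    public int getPartition(String key, Integer value, int numPartitions) {
        // With a single reducer there is only one possible partition.
        if (numPartitions <= 1) {
            return 0;
        }
        if (!key.isEmpty() && Character.isDigit(key.charAt(0))) {
            return 0;
        }
        // Mask the sign bit so a negative hashCode cannot produce a negative index.
        int hash = key.hashCode() & Integer.MAX_VALUE;
        return 1 + hash % (numPartitions - 1);
    }
}

class PartitionerDemo {
    public static void main(String[] args) {
        CustomPartitioner partitioner = new CustomPartitioner();
        int reducers = 4; // plays the role of numPartitions
        for (String key : new String[] {"42abc", "apple", "banana"}) {
            int partition = partitioner.getPartition(key, 1, reducers);
            System.out.println(key + " -> partition " + partition);
        }
    }
}
```

The returned value must always fall between 0 and numPartitions - 1, and the same key must always map to the same partition, otherwise records for one key would be split across reducers.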
Once the custom Partitioner class is ready, we have to set it on the Hadoop job. We can use the following method to set it:
job.setPartitionerClass(CustomPartitioner.class);