How does inter-cluster data copying work in Hadoop?

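In Hadoop, a utility called DistCp (Distributed Copy) performs large inter-cluster and intra-cluster copies of data. It is built on MapReduce: it expands the input paths into a list of files and directories and divides that list among Map tasks, each of which copies its share of the files from the source to the target.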
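As a rough illustration of how such a copy might be launched, here is a minimal Python sketch that assembles a `hadoop distcp` command line. The NameNode addresses and paths in the usage example are hypothetical placeholders, and the helper names are my own. The `-update` flag (copy only files that are missing or differ at the target) and the `-m` flag (upper limit on simultaneous Map tasks) are standard DistCp options.

```python
import subprocess


def build_distcp_command(source, target, max_maps=None, update=False):
    """Return the argument list for a `hadoop distcp` invocation.

    source and target are full URIs, e.g. hdfs://<namenode>:<port>/<path>.
    """
    command = ["hadoop", "distcp"]
    if update:
        # Copy only files that are missing or differ at the target.
        command.append("-update")
    if max_maps is not None:
        # Upper limit on the number of simultaneous Map tasks.
        command += ["-m", str(max_maps)]
    command += [source, target]
    return command


def run_distcp(command):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical cluster addresses, for illustration only.
    cmd = build_distcp_command(
        "hdfs://nn1:8020/data/logs",
        "hdfs://nn2:8020/data/logs",
        max_maps=20,
        update=True,
    )
    print(" ".join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.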

After every DistCp copy, it is recommended to cross-check the source and the target, for example by comparing file counts, sizes, and checksums, to confirm that the copy is complete and that no data was corrupted.
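One possible way to run such a cross-check is sketched below in Python. It assumes you have captured the text output of `hadoop fs -ls -R` for the source and target directories, and that each file line has the usual eight columns (permissions, replication, owner, group, size, date, time, path). The function names and the sample paths are hypothetical. Matching sizes only show that the copy is complete; comparing checksums (for instance with `hadoop fs -checksum`) would be a stronger test against corruption.

```python
def parse_ls_output(ls_text, root):
    """Map each file's path relative to root to its size in bytes.

    root must match the path prefix exactly as it is printed in the listing.
    """
    sizes = {}
    prefix = root.rstrip("/") + "/"
    for line in ls_text.splitlines():
        fields = line.split(None, 7)
        # Skip headers such as "Found 3 items" and directory entries.
        if len(fields) < 8 or fields[0].startswith("d"):
            continue
        path = fields[7]
        if path.startswith(prefix):
            sizes[path[len(prefix):]] = int(fields[4])
    return sizes


def compare_listings(source, target):
    """Return (missing, mismatched) relative paths, each sorted.

    missing:    files present in source but absent from target
    mismatched: files present in both whose sizes differ
    """
    missing = sorted(p for p in source if p not in target)
    mismatched = sorted(
        p for p in source if p in target and source[p] != target[p]
    )
    return missing, mismatched


if __name__ == "__main__":
    # Hypothetical listings, for illustration only.
    src = parse_ls_output(
        "-rw-r--r--   3 hdfs supergroup 1024 2024-01-01 12:00 /data/a.log\n"
        "-rw-r--r--   3 hdfs supergroup 2048 2024-01-01 12:00 /data/b.log\n",
        "/data",
    )
    dst = parse_ls_output(
        "-rw-r--r--   3 hdfs supergroup 1024 2024-01-02 09:00 /backup/a.log\n",
        "/backup",
    )
    print(compare_listings(src, dst))
```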
