What is the replication factor in HDFS, and how can we set it?

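The replication factor in HDFS is the number of copies of a file that the file system maintains. More precisely, HDFS stores a file as blocks and keeps that many replicas of each block. A Hadoop application can specify the number of replicas of a file it wants HDFS to maintain.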

This information is stored in the NameNode.

We can set the replication factor in the following ways:

  • We can use the Hadoop fs shell to specify the replication factor for a single file. The command is as follows:

$ hadoop fs -setrep -w 5 /file_name

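The above command sets the replication factor of the file file_name to 5. The w flag makes the command wait until the replication completes, which can take a long time.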

  • We can also use the Hadoop fs shell to specify the replication factor of all the files in a directory:

$ hadoop fs -setrep -w 2 /dir_name

The above command sets the replication factor of all the files under the directory dir_name, including files in its subdirectories, to 2.
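
The two commands above can be wrapped in a small shell helper. The sketch below is only an illustration: the function name set_replication and the DRY_RUN variable are hypothetical and not part of Hadoop. It rejects an empty, non-numeric, or zero factor, and by default it only prints the command it would run, so it can be tried on a machine without a cluster.

```shell
#!/bin/sh
# Hypothetical wrapper around `hadoop fs -setrep`; not part of Hadoop itself.
# Usage: set_replication <factor> <hdfs_path>
set_replication() {
    factor=$1
    path=$2
    # Reject an empty value, anything containing a non-digit, and a plain 0.
    case $factor in
        ''|*[!0-9]*|0)
            echo "invalid replication factor: $factor" >&2
            return 1
            ;;
    esac
    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Dry run (the default in this sketch): print the command only.
        echo "hadoop fs -setrep -w $factor $path"
    else
        # Real run: -w waits until the target replication is reached.
        hadoop fs -setrep -w "$factor" "$path"
    fi
}

set_replication 5 /file_name   # a single file
set_replication 2 /dir_name    # every file under a directory
```

Setting DRY_RUN=0 runs the real commands against the cluster. Afterwards, hadoop fs -stat %r /file_name should print the replication factor recorded for the file.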

