What is the difference between NameNode, Backup Node and Checkpoint NameNode in HDFS?

The differences between the NameNode, the Backup Node and the Checkpoint Node are as follows:

  • NameNode: The NameNode is the heart of HDFS. It manages the file system metadata: it holds the directory tree of all the files in the HDFS file system on a Hadoop cluster, but it does not store the data of those files. The NameNode uses two files for the namespace:
      • fsimage file: This file holds the latest checkpoint of the namespace.
      • edits file: This is a log of the changes made to the namespace since that checkpoint.
  • Checkpoint Node: The Checkpoint Node keeps the latest checkpoint in a directory that has the same structure as the NameNode's directory. It creates checkpoints of the namespace at regular intervals by downloading the fsimage and edits files from the NameNode and merging them locally. The new image is then uploaded back to the active NameNode.
  • Backup Node: The Backup Node provides the same checkpointing functionality as the Checkpoint Node, but it also maintains an up-to-date in-memory copy of the file system namespace that is kept in sync with the active NameNode.
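
To make the three roles concrete, here is a minimal toy model in Python. It is only a sketch and does not use any Hadoop API: the names ToyNameNode, ToyBackupNode, replay and checkpoint are hypothetical, the namespace is reduced to a set of paths, and the only operations assumed are "create" and "delete". It illustrates how fsimage plus edits yields the current namespace, how a checkpoint merges the two, and how a backup stays in sync by applying each edit as it arrives.

```python
from dataclasses import dataclass, field


def replay(image, edits):
    """Return a new image with the edits applied in order."""
    merged = set(image)
    for op, path in edits:
        if op == "create":
            merged.add(path)
        elif op == "delete":
            merged.discard(path)
    return merged


@dataclass
class ToyNameNode:
    """Hypothetical stand-in for a NameNode: a checkpoint image plus an edit log."""
    fsimage: set = field(default_factory=set)   # paths as of the last checkpoint
    edits: list = field(default_factory=list)   # (op, path) changes since then

    def apply(self, op, path):
        # Every namespace change is appended to the edit log.
        self.edits.append((op, path))

    def namespace(self):
        # Current namespace = last checkpoint with the logged edits replayed.
        return replay(self.fsimage, self.edits)


def checkpoint(namenode):
    """A Checkpoint Node in miniature: fetch both files, merge, send the image back."""
    image, edits = set(namenode.fsimage), list(namenode.edits)  # "download"
    new_image = replay(image, edits)                            # merge locally
    namenode.fsimage = new_image                                # "upload" new image
    namenode.edits = namenode.edits[len(edits):]                # drop merged edits
    return new_image


class ToyBackupNode:
    """Hypothetical backup: keeps an in-memory namespace by applying each edit."""

    def __init__(self, image):
        self.namespace = set(image)

    def receive(self, op, path):
        self.namespace = replay(self.namespace, [(op, path)])


nn = ToyNameNode()
backup = ToyBackupNode(nn.fsimage)
for op, path in [("create", "/a"), ("create", "/b"), ("delete", "/a")]:
    nn.apply(op, path)
    backup.receive(op, path)   # the backup sees each edit as it happens

before = nn.namespace()        # namespace computed from fsimage + edits
checkpoint(nn)                 # afterwards fsimage is current and edits is empty
print(sorted(nn.fsimage), nn.edits, sorted(backup.namespace))
```

In this sketch the namespace is the same before and after the checkpoint; only where it is recorded changes. The backup never needs to download anything, because its in-memory copy already matches the active namespace.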
