Blog Archives

How will you replace HDFS data volume before shutting down a DataNode?

In HDFS, the DataNode supports hot-swappable drives. With a swappable drive we can add or replace HDFS data volumes while the DataNode is still running. The procedure for replacing a hot-swappable drive is as follows: first we format and …
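The procedure centers on updating the DataNode's data-directory list and asking it to reconfigure itself at runtime. A minimal sketch, assuming the new disk is mounted at the hypothetical path /data/disk3:

```xml
<!-- hdfs-site.xml: add the newly mounted disk to the data-volume list -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/disk1,/data/disk2,/data/disk3</value>
</property>
```

After editing the file, `hdfs dfsadmin -reconfig datanode HOST:PORT start` tells the running DataNode to pick up the new volume list without a restart.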

Posted in Hadoop, Hadoop Interview Questions

What are the important configuration files in Hadoop?

There are two important kinds of configuration files in a Hadoop cluster. Default configuration: the core-default.xml, hdfs-default.xml and mapred-default.xml files specify the default configuration for a Hadoop cluster; these are read-only files. Custom configuration: we have site-specific custom …
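The site-specific files override the read-only defaults. For example, a core-site.xml entry overrides the matching property in core-default.xml (the hostname below is illustrative):

```xml
<!-- core-site.xml: site-specific override of a core-default.xml property -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
```

Hadoop reads the *-default.xml files first, then applies the *-site.xml files on top, so any property repeated in a site file wins.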

Posted in Hadoop, Hadoop Interview Questions

How will you monitor memory used in a Hadoop cluster?

In Hadoop, the TaskTracker is the daemon that uses the most memory while performing a task. We can configure the TaskTracker to monitor the memory usage of the tasks it creates. It can monitor memory usage to find badly behaving tasks, …
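The per-task memory limit is itself set through configuration. A hedged sketch (this is the MRv1-era property name; the value is illustrative):

```xml
<!-- mapred-site.xml: cap the heap of each task JVM the TaskTracker spawns -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
```

A task that exceeds its limit can then be flagged or killed by the framework.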

Posted in Hadoop, Hadoop Interview Questions

Why do we need Serialization in Hadoop map reduce methods?

In Hadoop, there are multiple DataNodes that hold data. During the processing of map and reduce methods, data may be transferred from one node to another. Hadoop uses serialization to convert the data from an object structure to a binary format.
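The idea can be sketched with plain java.io streams, which use the same DataOutput/DataInput style as Hadoop's Writable interface. The class and method names below are illustrative, not Hadoop APIs:

```java
import java.io.*;

// Sketch: turning an object's fields into a compact binary stream and back,
// the way a Hadoop Writable's write()/readFields() pair does.
public class WritableSketch {
    // Analogous to Writable.write(DataOutput out)
    public static byte[] toBytes(int id, String word) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(id);      // fixed 4-byte big-endian int
        out.writeUTF(word);    // 2-byte length prefix + UTF-8 bytes
        out.flush();
        return buf.toByteArray();
    }

    // Analogous to Writable.readFields(DataInput in)
    public static String fromBytes(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int id = in.readInt();
        String word = in.readUTF();
        return id + ":" + word;
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = toBytes(7, "hadoop");   // 12 bytes on the wire
        System.out.println(fromBytes(wire));  // prints 7:hadoop
    }
}
```

Because the binary form is compact and self-describing to the reader, it is cheap to ship between nodes during the shuffle.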

Posted in Hadoop, Hadoop Interview Questions

What is the use of Distributed Cache in Hadoop?

Hadoop provides a utility called Distributed Cache to improve the performance of jobs by caching the files used by applications. An application can specify which file it wants to cache via the JobConf configuration. The Hadoop framework copies these files to …

Posted in Hadoop, Hadoop Interview Questions

How will you synchronize the changes made to a file in Distributed Cache in Hadoop?

It is a trick question. In the Distributed Cache, it is not allowed to make any changes to a file. It is a mechanism for caching read-only data across multiple nodes. Therefore, it is not possible to update a cached file …

Posted in Hadoop, Hadoop Interview Questions

What is Safemode in HDFS?

Safemode is the read-only mode of the NameNode in a cluster. During startup, the NameNode is in Safemode. It does not allow writes to the file system while in Safemode. At this time, it collects data and statistics from all …
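Whether the NameNode leaves Safemode automatically is governed by a block-replication threshold. A sketch of the relevant hdfs-site.xml property (the value shown is the usual default):

```xml
<!-- hdfs-site.xml: fraction of blocks that must meet minimal replication
     before the NameNode exits Safemode on its own -->
<property>
  <name>dfs.namenode.safemode.threshold-pct</name>
  <value>0.999f</value>
</property>
```

An operator can also control it manually with `hdfs dfsadmin -safemode enter|leave|get`.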

Posted in Hadoop, Hadoop Interview Questions

How will you create a custom Partitioner in a Hadoop job?

The partition phase runs between the map and reduce phases. It is an optional phase. We can create a custom partitioner by extending the org.apache.hadoop.mapreduce.Partitioner class in Hadoop. In this class, we have to override the getPartition(KEY key, VALUE value, int numPartitions) method.
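The logic a getPartition() override typically implements can be sketched in plain Java. This mirrors the behaviour of Hadoop's default HashPartitioner; it is an illustration, not a class from the Hadoop API:

```java
// Sketch of the contract of Partitioner#getPartition: map a key to a
// reducer index in the range [0, numPartitions).
public class PartitionSketch {
    public static int getPartition(String key, int numPartitions) {
        // Mask the sign bit so a negative hashCode() never yields a negative index
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always maps to the same reducer
        System.out.println(getPartition("apache", 4));
        System.out.println(getPartition("hadoop", 4));
    }
}
```

A custom partitioner replaces the hash with domain logic, for example routing all keys for one customer to the same reducer, while keeping the same [0, numPartitions) contract.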

Posted in Hadoop, Hadoop Interview Questions

What are the differences between RDBMS and HBase data model?

The main differences between the RDBMS and HBase data models are as follows. Schema: in an RDBMS there is a schema of tables, columns, etc.; in HBase there is no fixed schema. Normalization: the RDBMS data model is normalized as per the rules of …

Posted in Hadoop, Hadoop Interview Questions

What is a Checkpoint node in HDFS?

A Checkpoint node in HDFS periodically fetches the fsimage and edits files from the NameNode and merges them. The merged result is called a Checkpoint. Once a Checkpoint is created, the Checkpoint node uploads it to the NameNode. The Secondary NameNode also takes Checkpoints in a similar …
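How often a checkpoint is taken is configurable. A sketch of the two hdfs-site.xml properties that control it (the values shown are the common defaults):

```xml
<!-- hdfs-site.xml: checkpoint every hour, or sooner if 1M transactions
     have accumulated in the edits log since the last checkpoint -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>
</property>
```

Whichever trigger fires first starts the next fsimage/edits merge.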

Posted in Hadoop, Hadoop Interview Questions