Blog Archives

How will you replace an HDFS data volume before shutting down a DataNode?

In HDFS, the DataNode supports hot-swappable drives. With a hot-swappable drive we can add or replace HDFS data volumes while the DataNode is still running. The procedure for replacing a hot-swappable drive is as follows: First, we format and …

Posted in Hadoop, Hadoop Interview Questions

What are the important configuration files in Hadoop?

There are two important sets of configuration files in a Hadoop cluster. Default configuration: the core-default.xml, hdfs-default.xml and mapred-default.xml files, which specify the default configuration for the Hadoop cluster; these are read-only files. Custom configuration: We have site-specific custom …
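A site-specific override follows this shape, a minimal sketch of a core-site.xml entry (the hostname and port are placeholders; fs.defaultFS is the Hadoop 2.x property name, older releases call it fs.default.name):

```xml
<!-- core-site.xml: a site-specific value overriding the core-default.xml default -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
</configuration>
```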

Posted in Hadoop, Hadoop Interview Questions

How will you monitor memory used in a Hadoop cluster?

In Hadoop, the TaskTracker is the daemon that uses the most memory while performing tasks. We can configure the TaskTracker to monitor the memory usage of the tasks it creates, so that it can identify badly behaving tasks, …
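Per-job memory limits are set in configuration; a hedged sketch of a mapred-site.xml fragment (the property names below are from classic Hadoop 1.x memory monitoring, and the 1024 MB values are illustrative assumptions, not recommendations):

```xml
<!-- mapred-site.xml: per-job memory limits that the TaskTracker
     uses to monitor and kill tasks exceeding their allocation -->
<configuration>
  <property>
    <name>mapred.job.map.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>mapred.job.reduce.memory.mb</name>
    <value>1024</value>
  </property>
</configuration>
```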

Posted in Hadoop, Hadoop Interview Questions

Why do we need serialization in Hadoop MapReduce methods?

In Hadoop, there are multiple DataNodes that hold data. During the processing of map and reduce methods, data may be transferred from one node to another. Hadoop uses serialization to convert the data from its object structure into a binary format.
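Hadoop expresses this through the Writable contract (write to a binary stream, readFields back from it). A minimal standalone sketch of that pattern, using a hypothetical IntPair type and plain java.io streams so it runs without the Hadoop jars (the real interface is org.apache.hadoop.io.Writable):

```java
import java.io.*;

// Standalone sketch of Hadoop's Writable serialization pattern.
// IntPair is a hypothetical example type; in a real job it would
// implement org.apache.hadoop.io.Writable.
public class IntPair {
    private int first;
    private int second;

    public IntPair() {}  // Writables require a no-arg constructor for deserialization
    public IntPair(int first, int second) { this.first = first; this.second = second; }

    // Serialize fields to a binary stream (mirrors Writable.write)
    public void write(DataOutput out) throws IOException {
        out.writeInt(first);
        out.writeInt(second);
    }

    // Rebuild fields from a binary stream (mirrors Writable.readFields)
    public void readFields(DataInput in) throws IOException {
        first = in.readInt();
        second = in.readInt();
    }

    public int getFirst()  { return first; }
    public int getSecond() { return second; }

    public static void main(String[] args) throws IOException {
        // Round trip: object -> bytes (the form shipped between nodes) -> object
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        new IntPair(7, 42).write(new DataOutputStream(bytes));

        IntPair copy = new IntPair();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(copy.getFirst() + "," + copy.getSecond()); // prints 7,42
    }
}
```

The binary form is compact and portable across nodes, which is why Hadoop uses it instead of Java's built-in object serialization.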

Posted in Hadoop, Hadoop Interview Questions

What is the use of Distributed Cache in Hadoop?

Hadoop provides a utility called Distributed Cache to improve job performance by caching the files used by applications. An application can specify which files it wants to cache through the JobConf configuration. The Hadoop framework copies these files to …

Posted in Hadoop, Hadoop Interview Questions

How will you synchronize the changes made to a file in Distributed Cache in Hadoop?

This is a trick question. Distributed Cache does not allow any changes to a cached file: it is a mechanism for caching read-only data across multiple nodes. Therefore, it is not possible to update a cached file …

Posted in Hadoop, Hadoop Interview Questions

What are the important points a NameNode considers before selecting the DataNode for placing a data block?

Some of the important points the NameNode considers while selecting a DataNode are as follows: The NameNode tries to keep at least one replica of a block on the same node that is writing the block. It tries to spread the different …

Posted in Hadoop, Hadoop Interview Questions

What is Safemode in HDFS?

Safemode is the read-only mode of the NameNode in a cluster. During startup, the NameNode is in Safemode and does not allow any writes to the file system. At this time, it collects data and statistics from all …

Posted in Hadoop, Hadoop Interview Questions

How will you create a custom Partitioner in a Hadoop job?

The partition phase runs between the map and reduce phases, and it is optional. We can create a custom partitioner by extending the org.apache.hadoop.mapreduce.Partitioner class in Hadoop and overriding its getPartition(KEY key, VALUE value, int numPartitions) method.
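The logic inside getPartition can be sketched without the Hadoop dependency. A minimal standalone version of the masked-hash scheme used by Hadoop's default HashPartitioner (in a real job this body would sit in a Partitioner subclass registered via job.setPartitionerClass; here it is a plain static method for illustration):

```java
// Standalone sketch of the logic behind Hadoop's default HashPartitioner.
// In a real job this would override getPartition in a subclass of
// org.apache.hadoop.mapreduce.Partitioner<KEY, VALUE>.
public class PartitionSketch {

    // Mask off the sign bit so the hash is non-negative, then take the
    // modulo to map the key onto one of numPartitions reducers.
    public static int getPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always maps to the same partition, which is what
        // guarantees all values for a key reach the same reducer.
        System.out.println(getPartition("user-42", 4));
        System.out.println(getPartition("user-42", 4)); // same key -> same partition
    }
}
```

A custom partitioner typically replaces hashCode with a domain-specific rule, for example partitioning by only part of a composite key.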

Posted in Hadoop, Hadoop Interview Questions

What are the differences between RDBMS and HBase data model?

The main differences between the RDBMS and HBase data models are as follows: Schema: In an RDBMS there is a schema of tables, columns etc.; in HBase there is no fixed schema. Normalization: The RDBMS data model is normalized as per the rules of …

Posted in Hadoop, Hadoop Interview Questions