How will you replace an HDFS data volume without shutting down a DataNode?

In HDFS, a DataNode supports hot-swappable drives. With a hot-swappable drive, we can add or replace HDFS data volumes while

What are the important configuration files in Hadoop?

There are two important configuration files in a Hadoop cluster:

All the jobs in Hadoop and the HDFS implementation use

How will you monitor memory used in a Hadoop cluster?

In Hadoop, the TaskTracker is the component that uses the most memory to perform a task. We can configure the TaskTracker to

Why do we need serialization in Hadoop MapReduce methods?

In Hadoop, there are multiple DataNodes that hold data. During the processing of map and reduce methods, data may
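As a rough illustration of what serialization involves, the sketch below turns a record into bytes and back using only the Java standard library. The class `WordCountRecord` and its fields are hypothetical. The `write`/`readFields` pair mirrors the shape of Hadoop's `Writable` interface, but the code does not depend on Hadoop and is not meant as the framework's own implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record type; not part of Hadoop.
class WordCountRecord {
    String word = "";
    int count;

    WordCountRecord() {}

    WordCountRecord(String word, int count) {
        this.word = word;
        this.count = count;
    }

    // Write the fields in a fixed order (same shape as Writable.write).
    void write(DataOutput out) throws IOException {
        out.writeUTF(word);
        out.writeInt(count);
    }

    // Read the fields back in the same order (same shape as Writable.readFields).
    void readFields(DataInput in) throws IOException {
        word = in.readUTF();
        count = in.readInt();
    }

    // Serialize a record to a byte array, as would be needed before sending it over the network.
    static byte[] toBytes(WordCountRecord record) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            record.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuild a record from bytes produced by toBytes.
    static WordCountRecord fromBytes(byte[] bytes) {
        WordCountRecord record = new WordCountRecord();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            record.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return record;
    }
}
```

In an actual job, a custom key or value type would implement `org.apache.hadoop.io.Writable` (or `WritableComparable` for keys) rather than a standalone class like this one.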

What is the use of Distributed Cache in Hadoop?

Hadoop provides a utility called Distributed Cache to improve the performance of jobs by caching the files used by applications.

How will you synchronize the changes made to a file in Distributed Cache in Hadoop?

It is a trick question. In Distributed Cache, it is not allowed to make any changes to a file. This

What are the important points a NameNode considers before selecting the DataNode for placing a data block?

Some of the important points for selecting a DataNode by NameNode are as follows:

NameNode also tries to spread

How will you create a custom Partitioner in a Hadoop job?

The partition phase runs between the Map and Reduce phases. It is an optional phase. We can create a custom partitioner by
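To make the idea of partitioning concrete, here is a minimal plain-Java sketch of the decision a partitioner makes: given a key and the number of reduce tasks, return a partition index. The class `PartitionSketch` and the "keys starting with a digit" rule are hypothetical and chosen only for illustration. The hash formula matches the one used by Hadoop's default `HashPartitioner`.

```java
// Plain-Java sketch; does not depend on Hadoop.
class PartitionSketch {

    // Default-style rule: mask off the sign bit so the result is never negative,
    // then take the remainder over the number of partitions.
    static int hashPartition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Hypothetical custom rule: keys that start with a digit always go to
    // partition 0; all other keys are hashed over the remaining partitions.
    static int customPartition(String key, int numPartitions) {
        if (numPartitions == 1) {
            return 0; // only one reducer, nothing to choose
        }
        if (!key.isEmpty() && Character.isDigit(key.charAt(0))) {
            return 0;
        }
        return 1 + (key.hashCode() & Integer.MAX_VALUE) % (numPartitions - 1);
    }
}
```

In a real job, the same logic would typically live in a class that extends `org.apache.hadoop.mapreduce.Partitioner<K, V>` and overrides `getPartition(key, value, numPartitions)`, registered on the job with `job.setPartitionerClass(...)`.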