Why do we need serialization in Hadoop MapReduce?

In Hadoop, there are multiple DataNodes that hold data. During the processing of map and reduce phases, data may have to travel across the network between nodes and be spilled to disk. Java objects cannot be transmitted or stored directly, so Hadoop serializes keys and values into compact byte streams (via the Writable interface), which keeps network transfer and disk I/O fast.
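Hadoop's Writable interface defines a `write(DataOutput)` / `readFields(DataInput)` pair for exactly this purpose. The following is a minimal, dependency-free sketch of that pattern using only `java.io`; the class and field names are illustrative, not from any Hadoop source:

```java
import java.io.*;

// Sketch of Hadoop-style serialization with plain java.io: the class follows
// the Writable pattern (write/readFields plus a no-arg constructor) without
// depending on Hadoop itself.
public class TemperatureRecord {
    private String station;
    private int temperature;

    public TemperatureRecord() {}  // no-arg constructor, as Writable requires
    public TemperatureRecord(String s, int t) { station = s; temperature = t; }

    public void write(DataOutput out) throws IOException {
        out.writeUTF(station);      // serialize fields to a compact byte stream
        out.writeInt(temperature);
    }

    public void readFields(DataInput in) throws IOException {
        station = in.readUTF();     // deserialize in the same field order
        temperature = in.readInt();
    }

    public static void main(String[] args) throws IOException {
        TemperatureRecord original = new TemperatureRecord("ST-01", 23);

        // Serialize: what Hadoop does before sending a record over the wire.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));

        // Deserialize on the "other side" and confirm the round trip.
        TemperatureRecord copy = new TemperatureRecord();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));

        System.out.println(copy.station + " " + copy.temperature); // ST-01 23
    }
}
```

In a real job the same two methods would be invoked by the framework whenever the record crosses a process or node boundary.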

What is the use of Distributed Cache in Hadoop?

Hadoop provides a utility called Distributed Cache to improve the performance of jobs by caching read-only files (lookup data, jars, archives) that applications need. The framework copies each cached file to every worker node before tasks start, so tasks read it from local disk instead of fetching it repeatedly over the network.
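With the newer MapReduce API this is a two-line affair; the fragment below is a sketch only (it needs `hadoop-client` on the classpath, and the path `/apps/shared/lookup.txt` is a placeholder):

```java
// Driver side: register a read-only file with the Distributed Cache.
job.addCacheFile(new URI("/apps/shared/lookup.txt"));

// Task side (e.g. in Mapper.setup): the framework has already copied the
// file to this node's local disk; getCacheFiles() lists the cached URIs.
URI[] cached = context.getCacheFiles();
```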

How will you synchronize the changes made to a file in Distributed Cache in Hadoop?

It is a trick question. In Distributed Cache, it is not allowed to make any changes to a file. This immutability is by design: every node holds its own local copy, and there is no mechanism to propagate changes, so the cache can guarantee that all tasks see identical data. If the content must change, a new version of the file should be distributed as a fresh cache file.

How will you replace HDFS data volume before shutting down a DataNode?

In HDFS, DataNode supports hot swappable drives. With a swappable drive we can add or replace HDFS data volumes while the DataNode is still running. The procedure is: format and mount the new disk, add its directory to dfs.datanode.data.dir in the DataNode's configuration, apply the change with the dfsadmin reconfiguration command, and then unmount the old volume once its data has been moved off.
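The swap is driven by the dfs.datanode.data.dir property in hdfs-site.xml; the directory names below are placeholders:

```xml
<!-- hdfs-site.xml on the DataNode: add the new volume's directory
     (here /data/disk2, a placeholder) alongside the existing one. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/disk1/dfs,/data/disk2/dfs</value>
</property>
```

After editing, `hdfs dfsadmin -reconfig datanode <host>:<port> start` makes the DataNode pick up the new volume list without a restart.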

What are the important configuration files in Hadoop?

There are two important kinds of configuration files in a Hadoop cluster:

Default configuration: core-default.xml, hdfs-default.xml and mapred-default.xml, which are read-only and ship inside the Hadoop jars.

Site-specific configuration: core-site.xml, hdfs-site.xml and mapred-site.xml, in which administrators override the defaults for their cluster.

All the jobs in Hadoop and the HDFS implementation use these configuration files.
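The site files override individual default values; for example, a minimal core-site.xml typically pins the default filesystem (the hostname below is a placeholder):

```xml
<!-- core-site.xml: site-specific override. The default value lives in
     the read-only core-default.xml shipped with Hadoop. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode-host:8020</value>
</property>
```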

How will you monitor memory used in a Hadoop cluster?

In Hadoop, TaskTracker is the daemon that uses the most memory, since it spawns the child JVMs that actually run map and reduce tasks. We can configure the TaskTracker to monitor the memory used by its tasks and kill any task that exceeds its allowed memory. Cluster-wide memory usage can additionally be watched through the web UIs or external monitoring tools such as Ganglia or Nagios.
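The simplest cap is on the heap of each task's child JVM, set in mapred-site.xml (512 MB below is an example value, not a recommendation):

```xml
<!-- mapred-site.xml: limit the heap of each child JVM the TaskTracker
     spawns, so a runaway task cannot exhaust the node's memory. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
```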

How will you create a custom Partitioner in a Hadoop job?

The partition phase runs between the Map and Reduce phases and is optional (the default is HashPartitioner). We can create a custom partitioner by extending the org.apache.hadoop.mapreduce.Partitioner class, overriding its getPartition(key, value, numPartitions) method, and registering the class on the job with job.setPartitionerClass().
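The core of any partitioner is the getPartition method. Below is a dependency-free sketch of that contract (CountryPartitioner and the "US" routing rule are invented for illustration); in a real job this logic would sit in a Partitioner subclass:

```java
// Dependency-free sketch of the Partitioner contract. In a real job this
// logic would live in a subclass of org.apache.hadoop.mapreduce.Partitioner
// and be registered via job.setPartitionerClass(...).
public class CountryPartitioner {
    // Same shape as Partitioner.getPartition(key, value, numPartitions):
    // return a bucket index in [0, numPartitions).
    public int getPartition(String key, String value, int numPartitions) {
        if (key.startsWith("US")) {
            return 0;  // dedicate partition 0 to one hot key range
        }
        // Default HashPartitioner behaviour for everything else:
        // mask the sign bit, then take the remainder over the rest.
        return 1 + (key.hashCode() & Integer.MAX_VALUE) % (numPartitions - 1);
    }

    public static void main(String[] args) {
        CountryPartitioner p = new CountryPartitioner();
        System.out.println(p.getPartition("US-east", "v", 4));  // 0
        int other = p.getPartition("DE-berlin", "v", 4);
        System.out.println(other >= 1 && other <= 3);           // true
    }
}
```

All records sharing a partition index end up at the same reducer, which is why a skewed getPartition directly translates into reducer skew.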

What are the differences between RDBMS and HBase data model?

The main differences between RDBMS and HBase data model are as follows:

Schema: In RDBMS, there is a fixed schema of tables and typed columns that must be defined before loading data. In HBase, only column families are declared up front; columns can be added to any row at any time.

Normalization: RDBMS data is typically normalized across tables; HBase stores denormalized, sparse data, and empty cells cost nothing.

Scaling: An RDBMS scales mostly vertically on bigger hardware; HBase scales horizontally by adding region servers.

Transactions: An RDBMS offers ACID transactions spanning multiple tables; HBase guarantees atomicity only at the level of a single row.

What is a Checkpoint node in HDFS?

A Checkpoint node in HDFS periodically fetches fsimage and edits from NameNode, and merges them. This merge result is called a checkpoint. The Checkpoint node then uploads the new fsimage back to the NameNode, which keeps the edit log from growing without bound and speeds up NameNode restarts.
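Checkpoint frequency is configurable in hdfs-site.xml; the values below are the usual defaults (once an hour, or after a million uncheckpointed transactions, whichever comes first):

```xml
<!-- hdfs-site.xml: trigger a checkpoint every hour, or sooner if a
     million edit-log transactions have accumulated. -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>
</property>
```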