Blog Archives

What is STREAMTABLE in Hive?

In Hive, we can optimize a join query by using the STREAMTABLE hint. We specify it in a SELECT query that contains a JOIN. During the map/reduce stage of the JOIN, the table named in the hint is streamed while the other tables are buffered. E.g. SELECT /*+ STREAMTABLE(table1)
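
As a minimal sketch of a complete query with this hint, assume two hypothetical tables, a large `orders` table and a smaller `customers` table. The table and column names are illustrative only and are not taken from the original post.

```sql
-- Hypothetical tables: orders (large) and customers (small).
-- The hint asks Hive to stream orders (alias o) through the join
-- instead of buffering it; customers is buffered.
SELECT /*+ STREAMTABLE(o) */ o.order_id, c.name
FROM orders o
JOIN customers c ON (o.customer_id = c.customer_id);
```

Without the hint, Hive streams the rightmost table of the join, so the hint is mostly useful when the largest table is not the last one listed.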

Posted in Hive, Hive Interview Questions

What is the use of TOUCH in ALTER statement?

In Hive, the TOUCH clause in an ALTER TABLE statement is used to read the metadata and write it back. This operation modifies the last accessed time of a partition in Hive. With the TOUCH statement we can also execute the POST and
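
A short sketch of the syntax, assuming a hypothetical table `page_views` partitioned by a string column `dt`:

```sql
-- Touch a single partition of a hypothetical partitioned table.
ALTER TABLE page_views TOUCH PARTITION (dt = '2024-01-01');

-- The PARTITION clause is optional; without it the table itself is touched.
ALTER TABLE page_views TOUCH;
```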

Posted in Hive, Hive Interview Questions

How does OVERWRITE clause work in CREATE TABLE statement in Hive?

We use the OVERWRITE clause to delete the existing data in a Hive table and write new data in its place. Note that OVERWRITE is not part of the CREATE TABLE syntax itself; it appears in the INSERT OVERWRITE and LOAD DATA ... OVERWRITE INTO TABLE statements that populate a table.
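
The following sketch shows OVERWRITE in the two statements where Hive accepts it, INSERT and LOAD DATA. The tables `sales` and `sales_summary` and the HDFS path are hypothetical.

```sql
-- Replace the contents of sales_summary with the result of a query.
INSERT OVERWRITE TABLE sales_summary
SELECT region, SUM(amount) AS total_amount
FROM sales
GROUP BY region;

-- Replace the contents of sales with files from a (hypothetical) HDFS path.
LOAD DATA INPATH '/tmp/sales_staging' OVERWRITE INTO TABLE sales;
```

Using INTO instead of OVERWRITE in either statement appends to the existing data rather than replacing it.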

Posted in Hive, Hive Interview Questions

What are the options to connect an application to a Hive server?

We can use the following options to connect an application to a Hive server:
JDBC Driver: We can use the JDBC driver with embedded as well as remote access to connect to HiveServer. This is for Java-based connectivity.
Python Client: For Python

Posted in Hive, Hive Interview Questions

How do TRIM and RPAD functions work in Hive?

TRIM and RPAD are functions for processing the String data type in Hive. With the TRIM function we can delete the spaces before and after a String. It is very useful for formatting user input in which the user may have entered extra
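
A few illustrative calls, using literal strings so that no table is needed (this assumes a Hive version that allows SELECT without a FROM clause):

```sql
-- TRIM removes leading and trailing spaces.
SELECT trim('   hive   ');     -- 'hive'

-- RPAD pads on the right with the given pad string up to the target length.
SELECT rpad('hive', 7, '*');   -- 'hive***'

-- If the string is longer than the target length, RPAD shortens it.
SELECT rpad('hive', 2, '*');   -- 'hi'
```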

Posted in Hive, Hive Interview Questions

How will you recursively access sub-directories in Hive?

We can use the following commands in Hive to recursively access sub-directories:
hive> SET mapred.input.dir.recursive=true;
hive> SET hive.mapred.supports.subdirectories=true;
Once the above options are set to true, Hive will recursively access sub-directories of a directory in MapReduce.

Posted in Hive, Hive Interview Questions

What is the optimization that can be done in SELECT * query in Hive?

We can convert some SELECT queries in Hive into a single FETCH task. With this optimization, the latency of the SELECT query decreases. To use it, we have to set the value of the hive.fetch.task.conversion parameter. The permissible values are: 0:
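
As a sketch of how the parameter is set for a session: Hive's configuration documentation lists the values none, minimal and more for this property. The table name `employees` is hypothetical.

```sql
-- Allow simple SELECT, filter and LIMIT queries to run as a fetch task.
SET hive.fetch.task.conversion=more;

-- A query of this shape can then be answered without launching a MapReduce job.
SELECT * FROM employees LIMIT 10;
```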

Posted in Hive, Hive Interview Questions

What is the use of ORC format tables in Hive?

We use the Optimized Row Columnar (ORC) file format to store data efficiently in Hive. It improves performance when reading, writing and processing data. The ORC format overcomes limitations of other Hive file formats.
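
A minimal sketch of creating an ORC-backed table. The table name and columns are hypothetical, and the compression property is optional.

```sql
-- Store a hypothetical events table in ORC format with ZLIB compression.
CREATE TABLE events_orc (
  event_id   BIGINT,
  event_type STRING,
  event_time TIMESTAMP
)
STORED AS ORC
TBLPROPERTIES ("orc.compress" = "ZLIB");
```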

Posted in Hive, Hive Interview Questions

What are the main use cases for using Hive?

Hive is mainly used for data warehouse applications. Hive uses Hadoop and MapReduce, which puts some restrictions on the use cases for Hive. Some of the main use cases for Hive are:
Analysis of static Big data
Applications in which less responsive

Posted in Hive, Hive Interview Questions

What is a Managed table in Hive?

Managed tables are tables whose files, metadata, statistics, etc. are managed by internal Hive processes. Hive creates managed tables by default. When we drop a managed table or partition, then all the metadata and data associated with
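
The lifecycle of a managed table can be sketched as follows, using a hypothetical table `managed_logs`:

```sql
-- No EXTERNAL keyword, so Hive creates a managed table.
CREATE TABLE managed_logs (id INT, message STRING);

-- The "Table Type" field in the output shows MANAGED_TABLE.
DESCRIBE FORMATTED managed_logs;

-- Dropping a managed table removes its metadata and its data files.
DROP TABLE managed_logs;
```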

Posted in Hive, Hive Interview Questions