What optimization can be applied to a SELECT * query in Hive?

Hive can convert certain simple SELECT queries into a single FETCH task, which reads the data directly instead of launching a MapReduce job. This optimization reduces the latency of such queries.

To use this, set the hive.fetch.task.conversion parameter. The permissible values are:

  • none: FETCH conversion is disabled.
  • minimal: Minimal mode. SELECT *, FILTER on partition columns (WHERE and HAVING clauses), and LIMIT only.
  • more: More mode. SELECT, FILTER, and LIMIT only (including TABLESAMPLE and virtual columns).
    “more” can also take any kind of expression in the SELECT clause, including UDFs.
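
As a rough illustration of the modes listed above, the following sketch sets the parameter per session and runs queries that each mode would be expected to convert. The table sales and its columns (id, name, amount, and the partition column dt) are hypothetical and only serve the example.

```sql
-- Assumption: `sales` is a hypothetical table partitioned by a hypothetical column `dt`.

-- minimal: only SELECT *, filters on partition columns, and LIMIT qualify.
SET hive.fetch.task.conversion=minimal;
SELECT * FROM sales WHERE dt = '2024-01-01' LIMIT 10;

-- more: column projections, filters on ordinary columns, and UDF
-- expressions in the SELECT clause can also qualify.
SET hive.fetch.task.conversion=more;
SELECT id, upper(name) FROM sales WHERE amount > 100 LIMIT 10;

-- Inspect the plan to see whether the query was converted.
EXPLAIN SELECT id, upper(name) FROM sales WHERE amount > 100 LIMIT 10;
```

If the conversion applies, the EXPLAIN output should contain only a Fetch Operator stage and no MapReduce stage.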