We use the Optimized Row Columnar (ORC) file format to store data efficiently in Hive. It improves performance when reading, writing, and processing data.
ORC was designed to overcome the limitations of other Hive file formats. Some of its advantages are:
<li>Each task produces a single output file, which reduces the load on the NameNode.</li>
<li>It supports datetime and decimal types as well as complex types such as struct, list, map, and union.</li>
<li>It stores light-weight indexes within the file.</li>
<li>The memory used for reading and writing data can be bounded.</li>
<li>It stores metadata using Protocol Buffers, which allows fields to be added and removed.</li>
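To make the light-weight index point more concrete, here is a minimal sketch of the general idea: keep min/max statistics for each block of rows so that a reader can skip blocks that cannot contain the value it is looking for. This is a toy model in plain Python, not ORC's actual reader or on-disk layout. The names `RowGroup`, `build_row_groups`, and `scan_equal` are hypothetical, and the group size of 10 is chosen only for illustration (real ORC files keep statistics at file, stripe, and row-group level over much larger blocks).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroup:
    """A block of rows plus the min/max summary kept for one column (toy model)."""
    values: List[int]
    min_value: int
    max_value: int


def build_row_groups(values: List[int], group_size: int) -> List[RowGroup]:
    """Split values into fixed-size groups and record min/max for each group."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_equal(groups: List[RowGroup], target: int) -> Tuple[List[int], int]:
    """Return the matching values and the number of groups actually read."""
    matches: List[int] = []
    groups_read = 0
    for group in groups:
        # Skip the group when its statistics prove the target cannot be in it.
        if target < group.min_value or target > group.max_value:
            continue
        groups_read += 1
        matches.extend(v for v in group.values if v == target)
    return matches, groups_read


# Hypothetical data: 100 sorted values split into 10 groups of 10 rows each.
groups = build_row_groups(list(range(100)), group_size=10)
matches, groups_read = scan_equal(groups, 42)
print(matches, groups_read)
```

In this example only the group covering 40 to 49 is read; the other nine are skipped using the statistics alone. The benefit depends on how the data is laid out: if the values were shuffled, most groups would have overlapping ranges and few could be skipped.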