Column-oriented data storage format
Apache ORC (Optimized Row Columnar) is a free and open-source column-oriented data storage format.[3] It is similar to other columnar-storage file formats available in the Hadoop ecosystem, such as RCFile and Parquet. It is used by most data processing frameworks in that ecosystem, including Apache Spark, Apache Hive, Apache Flink, and Apache Hadoop.
In February 2013, the Optimized Row Columnar (ORC) file format was announced by Hortonworks in collaboration with Facebook.[1]
A month later, the Apache Parquet format was announced, developed by Cloudera and Twitter.[4]
The Apache ORC format is supported by Amazon's AWS Glue.[5]
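To make the term "column-oriented" concrete, the following minimal Python sketch contrasts a row-oriented list of records with a column-oriented layout of the same data. It is a toy illustration only and does not reproduce the ORC file format or any ORC library API. The sample records and the helper names `to_columns` and `read_column` are hypothetical.

```python
# Toy illustration of row-oriented vs. column-oriented layout.
# This is NOT the ORC file format; all names and data are hypothetical.

# Row-oriented: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "alice", "score": 9.5},
    {"id": 2, "name": "bob", "score": 7.0},
    {"id": 3, "name": "carol", "score": 8.25},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict mapping column name to values."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def read_column(columns, name):
    """Return a single column without touching the other columns."""
    return columns[name]


# Column-oriented: all values of one field are stored contiguously.
columns = to_columns(rows)

# An aggregate over one field only needs to read that field's values.
scores = read_column(columns, "score")
average = sum(scores) / len(scores)
```

In this layout, a query that aggregates a single field reads only that field's values, which is the general motivation for columnar formats in analytical workloads.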
History
| Version | Original release date | Latest version | Release date | Status |
|---------|-----------------------|----------------|--------------|--------|
| 1.0 | 2016-01-25 | 1.0.0 | 2016-01-25 | Old version, no longer maintained |
| 1.1 | 2016-06-10 | 1.1.2 | 2016-07-08 | Old version, no longer maintained |
| 1.2 | 2016-08-25 | 1.2.3 | 2016-12-12 | Old version, no longer maintained |
| 1.3 | 2017-01-23 | 1.3.4 | 2017-10-16 | Old version, no longer maintained |
| 1.4 | 2017-05-08 | 1.4.5 | 2019-12-09 | Old version, no longer maintained |
| 1.5 | 2018-05-14 | 1.5.13 | 2021-09-15 | Old version, no longer maintained |
| 1.6 | 2019-09-03 | 1.6.14 | 2022-04-14 | Old version, yet still maintained |
| 1.7 | 2021-09-15 | 1.7.8 | 2023-01-21 | Old version, yet still maintained |
| 1.8 | 2022-09-03 | 1.8.2 | 2023-01-13 | Current stable version |
References
1. Alan Gates (February 20, 2013). "The Stinger Initiative: Making Apache Hive 100 Times Faster". Hortonworks blog. Archived from the original on March 28, 2013.
2. "Apache ORC - Releases". Retrieved 21 August 2024.
3. Yin Huai, Siyuan Ma, Rubao Lee, Owen O'Malley, and Xiaodong Zhang (2013). "Understanding Insights into the Basic Structure and Essential Issues of Table Placement Methods in Clusters". VLDB '39. pp. 1750–1761. CiteSeerX 10.1.1.406.4342. doi:10.14778/2556549.2556559.
4. Justin Kestelyn (March 13, 2013). "Introducing Parquet: Efficient Columnar Storage for Apache Hadoop". Cloudera blog. Archived from the original on September 19, 2016. Retrieved May 4, 2017.
5. "Using the ORC format in AWS Glue". docs.aws.amazon.com. Retrieved August 21, 2024.