Hudi
Hudi is an ingestion tool and data file layout that adds fast ingestion support to the Hadoop platform. Data ingested via Hudi can be queried by Hive, Spark, and Presto. The name Hudi stands for “Hadoop Upserts Deletes and Incrementals”.
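As a rough illustration of ingestion, the sketch below shows how a Spark DataFrame might be written to a Hudi table through the Spark DataSource. The option keys are Hudi write configs. The table name `trips`, the column names, and the helper `upsert_to_hudi` are hypothetical and chosen only for this example. It assumes a Spark session that already has the Hudi Spark bundle on its classpath.

```python
# Hypothetical table and column names; only the option keys come from Hudi.
HUDI_UPSERT_OPTIONS = {
    "hoodie.table.name": "trips",
    # Column that uniquely identifies a record.
    "hoodie.datasource.write.recordkey.field": "trip_id",
    # Column used to pick the latest version when keys collide.
    "hoodie.datasource.write.precombine.field": "updated_at",
    # Column used to partition the table on storage.
    "hoodie.datasource.write.partitionpath.field": "city",
    # Update records whose key already exists, insert the rest.
    "hoodie.datasource.write.operation": "upsert",
}


def upsert_to_hudi(df, base_path, options=None):
    """Write a Spark DataFrame to a Hudi table at base_path.

    `df` is expected to be a pyspark.sql.DataFrame; nothing is imported
    here so the helper can be defined without Spark installed.
    """
    opts = dict(HUDI_UPSERT_OPTIONS if options is None else options)
    (
        df.write.format("hudi")
        .options(**opts)
        .mode("append")  # append mode keeps existing table data
        .save(base_path)
    )
```

A call such as `upsert_to_hudi(df, "/tmp/hudi/trips")` would then write to the table at that (hypothetical) path, which engines with Hudi support can query.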
Field | Value |
---|---|
Website | |
Repository | |
Byline | Hudi is a rich platform to build streaming data lakes with incremental data pipelines on a self-managing database layer, while being optimized for lake engines and regular batch processing. |
License | Apache 2.0 |
Project age | 5 years 3 months |
Backers | Apache (Governed by), Uber (Creator) |
Latest News (2022-05-02) | Release 0.11.0. Release highlights: multi-modal index, data skipping with metadata table, async indexer, Spark DataSource improvements, … |
Size score (1 to 10, higher is better) | 6.25 |
Trend score (1 to 10, higher is better) | 9.25 |
Education Resources
No recent documentation available for project.
Git Commit Statistics
Statistics computed using Git data through March 31, 2022.
Statistic | Lifetime | Last 12 Months |
---|---|---|
Commits | 24,333 | 20,258 |
Lines committed | 6,748,138 | 5,101,764 |
Unique committers | 308 | 155 |
Core committers | 26 | 22 |

Similar Projects
Project | Size Score | Trend Score | Byline |
---|---|---|---|
Apache Beam | 9.0 | 7.5 | Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends, including Apache Flink, Apache Spark, Google Cloud Dataflow and Hazelcast Jet. |
Apache Flink | 9.25 | 7.25 | Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities. |
Apache NiFi | 8.25 | 6.5 | Apache NiFi supports highly configurable directed graphs of data routing, transformation, and system mediation logic. |
Storm | 6.75 | 3.25 | Storm is a distributed realtime computation system. |