Hudi

Hudi is an ingestion tool and data file organization format that adds fast ingestion support to the Hadoop platform. Data ingested via Hudi can be queried by Hive, Spark, and Presto. The name Hudi stands for “Hadoop Upserts Deletes and Incrementals”.

Logo

../_images/apache_hudi-small.png

Website

https://hudi.apache.org/

Repository

https://github.com/apache/hudi

Byline

Hudi is a rich platform to build streaming data lakes with incremental data pipelines on a self-managing database layer, while being optimized for lake engines and regular batch processing.

License

Apache 2.0

Project age

5 years 3 months

Backers

Apache (Governed by), Uber (Creator)

Latest News (2022-05-02)

Release 0.11.0. Release highlights: multi-modal index, data skipping with metadata table, async indexer, Spark DataSource improvements, …

Size score (1 to 10, higher is better)

6.25

Trend score (1 to 10, higher is better)

9.25

Education Resources

No recent documentation available for project.

Git Commit Statistics

Statistics computed using Git data through March 31, 2022.

Statistic            Lifetime     Last 12 Months
Commits              24,333       20,258
Lines committed      6,748,138    5,101,764
Unique committers    308          155
Core committers      26           22

../_images/apache_hudi-monthly-commits.png

Similar Projects

Project  Size Score  Trend Score  Byline
Beam     9.0         7.5          Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends, including Apache Flink, Apache Spark, Google Cloud Dataflow and Hazelcast Jet.
Flink    9.25        7.25         Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities.
NiFi     8.25        6.5          Apache NiFi supports highly configurable directed graphs of data routing, transformation, and system mediation logic.
Storm    6.75        3.25         Storm is a distributed realtime computation system.