Hudi

Hudi is an ingestion tool and data file layout that adds fast ingestion support to the Hadoop platform. Data ingested via Hudi can be queried by Hive, Spark, and Presto. The name Hudi stands for “Hadoop Upserts Deletes and Incrementals”.
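As a rough illustration of what ingestion via Hudi can look like from Spark, here is a minimal sketch in Python. It assumes a PySpark session with the Hudi Spark bundle on the classpath, which this page does not describe. The table name `trips`, the column names `trip_id`, `updated_at`, and `city`, and both helper functions are hypothetical. The option keys are standard Hudi Spark datasource write settings.

```python
def hudi_upsert_options(table_name, record_key, precombine_field, partition_field):
    """Build Hudi Spark datasource options for an upsert write."""
    return {
        "hoodie.table.name": table_name,
        # Column that uniquely identifies a record within the table.
        "hoodie.datasource.write.recordkey.field": record_key,
        # Column used to pick the latest version when keys collide.
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Column used to lay out data into partition directories.
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # "upsert" updates existing keys and inserts new ones.
        "hoodie.datasource.write.operation": "upsert",
    }


def upsert_to_hudi(df, base_path, options):
    """Write a Spark DataFrame to the Hudi table at base_path (assumes Hudi is installed)."""
    df.write.format("hudi").options(**options).mode("append").save(base_path)


# Hypothetical table and column names, for illustration only.
options = hudi_upsert_options("trips", "trip_id", "updated_at", "city")
```

With a real DataFrame `df`, calling `upsert_to_hudi(df, base_path, options)` would perform the write, and `spark.read.format("hudi").load(base_path)` would read the table back in Spark. Hive and Presto would query the same table through their own connectors.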

Logo

../_images/apache_hudi-small.png

Website

https://hudi.apache.org/

Repository

https://github.com/apache/hudi

Byline

Hudi is a rich platform to build streaming data lakes with incremental data pipelines on a self-managing database layer, while being optimized for lake engines and regular batch processing.

License

Apache 2.0

Project age

5 years 11 months

Backers

Apache (Governed by), Uber (Creator)

Latest News (2022-08-17)

0.12.0: Many changes in Hudi 0.12, including a Presto connector and support for archiving beyond savepoints. See the Release Highlights for …

Size score (1 to 10, higher is better)

6.75

Trend score (1 to 10, higher is better)

8.0

Education Resources

No recent documentation available for project.

Git Commit Statistics

Statistics computed using Git data through November 30, 2022.

Statistic            Lifetime      Last 12 Months
Commits              39,710        19,964
Lines committed      10,583,779    4,691,434
Unique committers    415           195
Core committers      27            21

../_images/apache_hudi-monthly-commits.png

Similar Projects

Project    Size Score    Trend Score    Byline
Beam       9.0           8.25           Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends, including Apache Flink, Apache Spark, Google Cloud Dataflow and Hazelcast Jet.
Flink      9.25          7.0            Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities.
NiFi       8.5           5.0            Apache NiFi supports highly configurable directed graphs of data routing, transformation, and system mediation logic.
Storm      6.5           3.0            Storm is a distributed realtime computation system.