Hudi

Hudi is an ingestion tool and data file organization scheme that adds fast ingestion support to the Hadoop platform. Data ingested via Hudi can be queried by Hive, Spark, and Presto. The name Hudi stands for “Hadoop Upserts Deletes and Incrementals”.

Website	https://hudi.apache.org/
Repository	https://github.com/apache/hudi
Byline	Hudi is a rich platform to build streaming data lakes with incremental data pipelines on a self-managing database layer, while being optimized for lake engines and regular batch processing.
License	Apache 2.0
Project age	5 years 11 months
Backers	Apache (Governed by), Uber (Creator)
Latest News (2022-08-17)	0.12.0: Many changes in Hudi 0.12, including a Presto connector and support for archive beyond savepoints. See the Release Highlights for …
Size score (1 to 10, higher is better)	6.75
Trend score (1 to 10, higher is better)	8.0

Education Resources

No recent documentation available for project.

Git Commit Statistics

Statistics computed using Git data through November 30, 2022.

Statistic	Lifetime	Last 12 Months
Commits	39,710	19,964
Lines committed	10,583,779	4,691,434
Unique committers	415	195
Core committers	27	21

[Figure: Apache Hudi monthly commits]

Similar Projects

Project	Size Score	Trend Score	Byline
Beam	9.0	8.25	Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends, including Apache Flink, Apache Spark, Google Cloud Dataflow and Hazelcast Jet.
Flink	9.25	7.0	Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities.
NiFi	8.5	5.0	Apache NiFi supports highly configurable directed graphs of data routing, transformation, and system mediation logic.
Storm	6.5	3.0	Storm is a distributed realtime computation system.