Ingestion

Description

Data ingestion frameworks preprocess incoming data for insertion into a data warehouse or data lake. These operate on data in batches, in “micro” batches, or on streams of records/events. There is some overlap between this ccategory and Workflow Management. Workflow Management projects tend to focus more on the control flow of a pipeline rathe than on the actual data manipulation.

Projects

5

Lines Committed vs. Age Chart (click to view)

Lines Committed vs. Age Chart (click to view)

Projects

Project

Size Score

Trend Score

Byline

Beam

9.0

8.25

Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends, including Apache Flink, Apache Spark, Google Cloud Dataflow and Hazelcast Jet.

Flink

9.25

7.0

Apache Flink is an open source stream processing framework with powerful stream- and batch-processing capabilities.

Hudi

6.75

8.0

Hudi is a rich platform to build streaming data lakes with incremental data pipelines on a self-managing database layer, while being optimized for lake engines and regular batch processing.

NiFi

8.5

5.0

Apache NiFi supports highly configurable directed graphs of data routing, transformation, and system mediation logic.

Storm

6.5

3.0

Storm is a distributed realtime computation system.