Big Data

Description

Tools for transforming and analyzing the largest data sets.

Projects

23

Size vs. Trend Chart

Sub-categories

Category

Description

Projects

Data Cleansing

Tools to normalize, reformat, and address consistency issues in data before performing analytics.

3

Distributed Data Processing

Tools for scaling data transformations and analyses across multiple servers.

10

Ingestion

Data ingestion frameworks preprocess incoming data for insertion into a data warehouse or data lake. These operate on data in batches, in “micro” batches, or on streams of records/events. There is some overlap between this category and Workflow Management. Workflow Management projects tend to focus more on the control flow of a pipeline rather than on the actual data manipulation.

5

Messaging

Point-to-point messaging, publish/subscribe messaging, and message queues.

1

Workflow Management

Workflow Management engines coordinate the steps in a data pipeline for automated execution. The projects in this category focus on the control flow of tasks. There is some overlap between this category and Ingestion. Ingestion projects tend to focus more on the data flow and the manipulation of individual records.

4