Spark

Spark is one of the most widely used big data processing engines and is generally regarded as the successor to Hadoop MapReduce. It is used mostly for ETL (extract, transform, load), but also includes facilities for streaming aggregations, graph computations, and machine learning.

Logo

../_images/apache_spark-small.png

Website

http://spark.apache.org

Repository

https://github.com/apache/spark

Byline

A unified analytics engine for large-scale data processing.

License

Apache 2.0

Project age

11 years 2 months

Backers

Apache (Governed by), Databricks (Commercial Product By)

Latest News (2021-03-02)

Spark 3.1.1 is released. Apache Spark 3.1.1 is the second release of the 3.x line. This release adds Python type annotations and Python …

Size score (1 to 10, higher is better)

9.5

Trend score (1 to 10, higher is better)

6.75

Education Resources

URL                                     Resource Type    Description
https://spark.apache.org/docs/3.1.1/    Documentation    Official project documentation.

Git Commit Statistics

Statistics computed using Git data through May 31, 2021.

Statistic            Lifetime      Last 12 Months
Commits              59,321        32,011
Lines committed      34,149,872    10,141,766
Unique committers    2,443         346
Core committers      12            20

../_images/apache_spark-monthly-commits.png
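
This page does not say how the commit statistics were derived. As a rough illustration only, the sketch below counts commits and unique committers from `git log` output for a lifetime window and a trailing 12-month window. The function names (`parse_log`, `summarize`, `read_git_log`) are hypothetical, and so are the choices to identify a committer by lowercased author email and to treat "last 12 months" as the 365 days before the cutoff. "Lines committed" and "Core committers" are left out because their definitions are not given here.

```python
import subprocess
from datetime import datetime, timedelta, timezone


def parse_log(text):
    """Parse lines of '<author email>|<ISO 8601 date>' into (email, datetime) pairs."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last '|' so an unusual email cannot break the date field.
        email, _, stamp = line.rpartition("|")
        # Some Git versions print UTC as 'Z', which older Pythons do not parse.
        if stamp.endswith("Z"):
            stamp = stamp[:-1] + "+00:00"
        records.append((email.lower(), datetime.fromisoformat(stamp)))
    return records


def count(records):
    """Count commits and distinct author emails in a list of records."""
    return {
        "commits": len(records),
        "unique_committers": len({email for email, _ in records}),
    }


def summarize(records, cutoff):
    """Summarize commits up to `cutoff`, and within the 365 days before it."""
    window_start = cutoff - timedelta(days=365)  # assumed meaning of "last 12 months"
    lifetime = [r for r in records if r[1] <= cutoff]
    recent = [r for r in lifetime if r[1] > window_start]
    return {"lifetime": count(lifetime), "last_12_months": count(recent)}


def read_git_log(repo_path):
    """Read author email and strict ISO 8601 author date for every commit."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=format:%ae|%aI"],
        capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)
```

Calling `summarize(read_git_log(path), cutoff)` on a local clone of the repository, with a cutoff of May 31, 2021, would produce counts in the same shape as the first and third rows of the table. The numbers would not necessarily match, since the counting rules used for this page are unknown.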

Similar Projects

Project    Size Score    Trend Score    Byline
Dask       6.0           6.25           Parallel computing with task scheduling.
HPCC       5.5           7.0            HPCC Systems (High Performance Computing Cluster) is an open source, massive parallel-processing computing platform for big data processing and analytics.
Hadoop     8.5           7.5            The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
Mars       6.75          6.25           Mars is a tensor-based unified framework for large-scale data computation which scales Numpy, Pandas and Scikit-learn.
Ray        8.0           9.0            An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.