Dask

Dask is a Python library for parallel and distributed computing that integrates with NumPy, pandas, and scikit-learn. It is lighter-weight than Hadoop-ecosystem tools such as Spark.
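To make the task-scheduling idea concrete, here is a minimal sketch in plain Python. It does not use Dask itself: `run_graph` is a hypothetical helper written only for this illustration, and its wave-by-wave strategy is far simpler than Dask's real schedulers. The graph follows the dictionary convention described in Dask's graph documentation, where each key maps either to a literal value or to a tuple whose first element is a callable and whose remaining elements are its arguments. The sketch assumes that any string argument matching a graph key refers to that task's result.

```python
from concurrent.futures import ThreadPoolExecutor
from operator import add, mul


def run_graph(graph, target, max_workers=4):
    """Evaluate `target` in a dict-based task graph (toy scheduler, not Dask's)."""
    results = {}
    pending = dict(graph)

    def is_task(value):
        # A task is a tuple whose first element is callable: (func, arg1, arg2, ...).
        return isinstance(value, tuple) and len(value) > 0 and callable(value[0])

    def is_key(arg):
        # Assumption of this sketch: string arguments naming a graph key are references.
        return isinstance(arg, str) and arg in graph

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while target not in results:
            # Literals are ready at once; tasks are ready when all their inputs exist.
            ready = [
                key for key, value in pending.items()
                if not is_task(value)
                or all(arg in results for arg in value[1:] if is_key(arg))
            ]
            if not ready:
                raise ValueError("graph has a cycle or does not define the target")

            futures = {}
            for key in ready:
                value = pending.pop(key)
                if is_task(value):
                    func, *args = value
                    resolved = [results[a] if is_key(a) else a for a in args]
                    # Tasks in the same wave do not depend on each other,
                    # so they can run concurrently.
                    futures[key] = pool.submit(func, *resolved)
                else:
                    results[key] = value
            for key, future in futures.items():
                results[key] = future.result()

    return results[target]


graph = {
    "x": 1,
    "y": 2,
    "a": (add, "x", "y"),      # 1 + 2
    "b": (mul, "y", 10),       # 2 * 10
    "total": (add, "a", "b"),  # 3 + 20
}

print(run_graph(graph, "total"))  # 23
```

Here `a` and `b` are independent, so the sketch submits them to the thread pool in the same wave, and `total` runs once both have finished. With Dask installed, a dictionary in this style can typically be evaluated with `dask.threaded.get(graph, "total")`, though most users build graphs indirectly through Dask's array, dataframe, or delayed interfaces.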

| Field | Value |
| --- | --- |
| Logo | ../_images/dask_dask-small.png |
| Website | https://dask.org |
| Repository | https://github.com/dask/dask |
| Byline | Parallel computing with task scheduling. |
| License | BSD 3-clause |
| Project age | 6 years 5 months |
| Backers | Anaconda Inc. (Commercial support), Chan Zuckerberg Initiative (Grant), NumFocus (Grant), Quansight (Commercial support) |
| Size score (1 to 10, higher is better) | 6.0 |
| Trend score (1 to 10, higher is better) | 6.25 |

Education Resources

| URL | Resource Type | Description |
| --- | --- | --- |
| https://docs.dask.org/en/latest/ | Documentation | Official project documentation. |

Git Commit Statistics

Statistics computed using Git data through May 31, 2021.

| Statistic | Lifetime | Last 12 Months |
| --- | --- | --- |
| Commits | 29,563 | 10,551 |
| Lines committed | 3,233,979 | 918,305 |
| Unique committers | 478 | 131 |
| Core committers | 14 | 19 |

Figure: monthly commits (../_images/dask_dask-monthly-commits.png)

Similar Projects

| Project | Size Score | Trend Score | Byline |
| --- | --- | --- | --- |
| Analytics Zoo | 5.0 | 8.25 | Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray |
| HPCC | 5.5 | 7.0 | HPCC Systems (High Performance Computing Cluster) is an open source, massive parallel-processing computing platform for big data processing and analytics. |
| Hadoop | 8.5 | 7.5 | The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. |
| Mars | 6.75 | 6.25 | Mars is a tensor-based unified framework for large-scale data computation which scales Numpy, Pandas and Scikit-learn. |
| Modin | 4.5 | 6.25 | Speed up your Pandas workflows by changing a single line of code |