Data Workspaces

Data Workspaces is an open source framework for maintaining the state of a data science project, including data sets, intermediate data, results, and code. It supports reproducibility through snapshotting and lineage models, and collaboration through a push/pull model inspired by source control systems such as Git.


Website: https://dataworkspaces.ai

Repository: https://github.com/data-workspaces/data-workspaces-core

Byline: Easy management of source data, intermediate data, and results for data science projects.

License: Apache 2.0

Project age: 4 years 3 months

Backers: Benedat LLC (creator and maintainer), Max Planck Institute for Software Systems (creator and maintainer)

Latest news (2022-03-14): Release 1.6.0. We are happy to announce release 1.6.0. The primary changes are: added support for Python 3.10 and dropped Python 3.6; …

Size score (1 to 10, higher is better): 2.25

Trend score (1 to 10, higher is better): 3.5

Education Resources

https://data-workspaces-core.readthedocs.io/en/latest/ (Documentation): Official project documentation.

https://www.dataworkspaces.ai/quick-start/ (Documentation): A quick-start guide to help users kick-start their projects.

https://youtu.be/VjU5gGSvGsY (Video): Demo video, part 1.

https://youtu.be/TIPEH6jlqtA (Video): Demo video, part 2.

Git Commit Statistics

Statistics computed using Git data through November 30, 2022.

Commits: 2,566 lifetime; 5 in the last 12 months

Lines committed: 388,981 lifetime; 100 in the last 12 months

Unique committers: 8 lifetime; 2 in the last 12 months

Core committers: 2 lifetime; 0 in the last 12 months

Figure: Monthly commits

Similar Projects

Flambe (size score 1.5, trend score 2.25): Flambé is a machine learning experimentation framework built to accelerate the entire research life cycle. Flambé’s main objective is to provide a unified interface for prototyping models, running experiments containing complex pipelines, monitoring those experiments in real-time, reporting results, and deploying a final model for inference.

MLflow (size score 9.25, trend score 8.5): An open source platform for the machine learning lifecycle.

PyCaret (size score 8.5, trend score 8.0): An open-source, low-code machine learning library in Python.

Rubicon-ML (size score 2.5, trend score 7.25): rubicon-ml is a data science tool that captures and stores model training and execution information, like parameters and outcomes, in a repeatable and searchable way.