Get started

Get up and running with PipelineDP by learning how to set up your environment and start running examples locally.

PipelineDP overview

PipelineDP is an open-source Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark and Apache Beam.

To make differential privacy accessible to non-experts, PipelineDP:

- Provides a convenient API familiar to Spark or Beam developers.
- Encapsulates the complexities of differential privacy, such as:
  - protecting outliers and rare categories,
  - generating safe noise,
  - privacy budget accounting.
- Supports standard computations: count, sum, and average (with more planned).
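
To give a feel for what the framework takes care of, here is a minimal standard-library sketch of two of the steps listed above: bounding each user's contributions and adding Laplace noise to per-partition counts. It is not PipelineDP code and does not use its API. The function names (`laplace_noise`, `bound_contributions`, `dp_count_users`) and the sample rows are hypothetical. It also assumes the partition keys are public, so it skips the private selection of rare categories, and it uses ordinary floating-point sampling rather than the hardened noise generation a production library needs.

```python
import random
from collections import defaultdict


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def bound_contributions(rows, max_partitions, rng):
    """Keep at most `max_partitions` distinct partitions for each user."""
    partitions_by_user = defaultdict(set)
    for user_id, partition in rows:
        partitions_by_user[user_id].add(partition)

    bounded = []
    for user_id, partitions in partitions_by_user.items():
        kept = sorted(partitions)
        if len(kept) > max_partitions:
            # Drop the excess partitions uniformly at random.
            kept = rng.sample(kept, max_partitions)
        bounded.extend((user_id, partition) for partition in kept)
    return bounded


def dp_count_users(rows, epsilon, max_partitions, rng):
    """Return a noisy count of distinct users for each partition."""
    counts = defaultdict(int)
    for _, partition in bound_contributions(rows, max_partitions, rng):
        counts[partition] += 1

    # After bounding, one user changes at most `max_partitions` counts by 1
    # each, so the L1 sensitivity of the whole result is `max_partitions`.
    scale = max_partitions / epsilon
    return {p: c + laplace_noise(scale, rng) for p, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical (user_id, day) visit records, for illustration only.
    visits = [("u1", "Mon"), ("u1", "Tue"), ("u1", "Wed"),
              ("u2", "Mon"), ("u3", "Tue")]
    noisy = dp_count_users(visits, epsilon=1.0, max_partitions=2,
                           rng=random.Random())
    for day, value in sorted(noisy.items()):
        print(day, round(value, 2))
```

Even this toy version has to get the sensitivity and the noise scale right by hand. PipelineDP is meant to hide those details, along with partition selection and budget accounting, behind its API.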

Note that this project is still experimental and subject to change. We don’t currently recommend using it in production systems, as it has not yet been thoroughly tested. You can learn more in the Roadmap section.

Setting up your environment

Here’s how you set up PipelineDP on your computer:

# Check that your Python version is 3.7 or greater
$ python --version

# Create and activate a Python virtual environment
$ python -m venv demo-pipelinedp
$ source demo-pipelinedp/bin/activate

# Install PipelineDP
$ pip install pipeline-dp
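
To confirm the setup from inside Python, a small check like the following may help. It is a sketch, not part of PipelineDP: the helper names are hypothetical, and it assumes the installed package is importable under the module name `pipeline_dp`.

```python
import importlib.util
import sys

# Minimum interpreter version stated in the setup steps above.
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def package_is_installed(module_name="pipeline_dp"):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Python version OK:", python_is_supported())
    print("pipeline_dp found:", package_is_installed())
```

Run it with the virtual environment activated. If the second line reports `False`, the package was probably installed into a different interpreter.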

Trying it out

Quick tour (5 min, no setup needed)

A simple example that shows how to calculate restaurant visits with differential privacy.

View as Jupyter Notebook | Run in Google Colab

Advanced tour (1 hour, no setup needed)

A deeper walk-through: learn the key concepts of differential privacy and the PipelineDP API.

View as Jupyter Notebook | Run in Google Colab

Run an example locally (15 min, requires setting up a Python environment)

If you’d like to run an example on your computer instead of in a Jupyter notebook, go through the “Setting up your environment” section above and then run:

# 1. Follow the “Setting up your environment” section above to install PipelineDP

# 2. Download and execute example code from git
$ git clone https://github.com/OpenMined/PipelineDP.git
$ cd PipelineDP/examples/restaurant_visits/
$ pip install pandas absl-py
$ python run_without_frameworks.py --output_file=output.txt

# 3. Check the results 
$ cat output.txt

# 4. Open run_without_frameworks.py and experiment with its parameters and metrics