Get started
Get up and running with PipelineDP by learning how to set up your environment and start running examples locally.
PipelineDP overview
PipelineDP is an open-source Python framework for applying differentially private aggregations to large datasets using batch processing systems such as Apache Spark, Apache Beam, and more.
To make differential privacy accessible to non-experts, PipelineDP:
- Provides a convenient API familiar to Spark or Beam developers.
- Encapsulates the complexities of differential privacy, such as:
  - protecting outliers and rare categories,
  - generating safe noise,
  - privacy budget accounting.
- Supports standard computations: count, sum, and average (and soon more).
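To make the ideas in the list above more concrete, here is a minimal pure-Python sketch of contribution bounding, Laplace noise, and an even budget split for a per-partition count and sum. It is not PipelineDP's implementation or API: every function and parameter name below (`dp_count_and_sum`, `max_partitions`, `max_per_partition`, and so on) is hypothetical, the floating-point noise sampler is for illustration only and is not the hardened sampling a real library needs, and the handling of rare categories (partition selection) is left out entirely.

```python
import random
from collections import defaultdict


def laplace_noise(scale, rng):
    # The difference of two independent Exp(1) draws is Laplace(0, 1).
    # Illustration only: naive floating-point sampling is not "safe noise".
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def split_budget(total_epsilon, num_metrics):
    # Naive budget accounting: share epsilon evenly between the metrics.
    return [total_epsilon / num_metrics] * num_metrics


def dp_count_and_sum(rows, total_epsilon, max_partitions, max_per_partition,
                     min_value, max_value, rng):
    """rows: iterable of (user_id, partition, value) tuples.

    Returns {partition: (noisy_count, noisy_sum)}.
    """
    # Group values by user and partition, clamping each value (outliers).
    per_user = defaultdict(lambda: defaultdict(list))
    for user, partition, value in rows:
        clamped = min(max(value, min_value), max_value)
        per_user[user][partition].append(clamped)

    counts, sums = defaultdict(int), defaultdict(float)
    for partitions in per_user.values():
        # Bound how many partitions one user can influence.
        kept = rng.sample(sorted(partitions),
                          min(max_partitions, len(partitions)))
        for partition in kept:
            # Bound how many values one user adds to a partition
            # (this sketch simply keeps the first ones).
            values = partitions[partition][:max_per_partition]
            counts[partition] += len(values)
            sums[partition] += sum(values)

    eps_count, eps_sum = split_budget(total_epsilon, 2)
    # L1 sensitivity: the most one user can change each metric in total.
    count_sensitivity = max_partitions * max_per_partition
    sum_sensitivity = count_sensitivity * max(abs(min_value), abs(max_value))
    return {
        p: (counts[p] + laplace_noise(count_sensitivity / eps_count, rng),
            sums[p] + laplace_noise(sum_sensitivity / eps_sum, rng))
        for p in counts
    }


# Hypothetical data: (user_id, day, amount spent).
rows = [("u1", "Mon", 10.0), ("u1", "Mon", 500.0), ("u2", "Mon", 20.0),
        ("u2", "Tue", 15.0)]
print(dp_count_and_sum(rows, total_epsilon=1.0, max_partitions=2,
                       max_per_partition=2, min_value=0.0, max_value=50.0,
                       rng=random.Random(0)))
```

The sketch only adds noise to partitions that already appear in the data, which by itself can reveal that a rare category exists. Closing that gap is the job of the rare-category protection mentioned in the list.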
Note that this project is still experimental and is subject to change. At the moment we don’t recommend using it in production systems, as it has not been thoroughly tested yet. You can learn more in the Roadmap section.
Setting up your environment
Here’s how you set up PipelineDP on your computer:
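As a quick sanity check of a local setup, the following sketch reports whether the Python version is recent enough and whether the library can be imported. It rests on assumptions this page does not state: that the package is installed from PyPI as `pipeline-dp`, that it is imported as `pipeline_dp`, and that Python 3.7 is the minimum version. The helper `check_environment` is hypothetical and not part of PipelineDP.

```python
import importlib.util
import sys


def check_environment(module_name="pipeline_dp", pip_name="pipeline-dp",
                      min_python=(3, 7)):
    """Return a list of problems found; an empty list means ready to go.

    The default names and the minimum version are assumptions, not
    values taken from this page.
    """
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or newer is required")
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        problems.append(
            f"module {module_name!r} not found; try: pip install {pip_name}")
    return problems


for problem in check_environment():
    print(problem)
```

If the script prints nothing, both checks passed.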
Trying it out
Quick tour (5 min, no setup needed)
A simple example that shows how to calculate restaurant visits with differential privacy.
View as Jupyter Notebook | Run in Google Colab
Advanced tour (1 hour, no setup needed)
A deeper walk-through: learn the key concepts of differential privacy and the PipelineDP API.
View as Jupyter Notebook | Run in Google Colab
Run an example locally (15 min, requires setting up a Python environment)
If you’d like to run an example on your computer instead of in a Jupyter notebook, please go through the “Setting up your environment” section above and run: