Note that this project is still experimental and is subject to change. At the moment we don’t recommend its usage in production systems as it’s not thoroughly tested yet. You can learn more in the Roadmap section.
The key definitions used in the documentation, code and examples are explained on this page.
PipelineDP design enables execution on several data processing frameworks (including local execution), and is extensible to other frameworks. This is possible by implementing all DP logic in a framework-agnostic way, cleanly separated from how frameworks perform data processing.
Here’s how this works in detail:
DPEngine is the heart of PipelineDP. It’s a component that encapsulates all
differential privacy logic in a framework-agnostic way. Developers can use
DPEngine directly, however often they’ll prefer access via a set of
developer-facing APIs that resemble regular (non-private) APIs of the popular
PipelineBackend is an abstraction for low-level data processing operations
(map, join, combine, filter, etc.). The abstraction comes with several
framework-specific implementations that enable execution of PipelineDP on
Apache Beam, Apache Spark or even locally.
For more information about DP computation please check DP computations in pipelines reference doc.
A few high-level things we’re planning for future:
- Adding more aggregation types (variance, percentiles, vector summation).
- Better integration with the Python ecosystem (e.g. dataframes).
- Using advanced composition methods (such as privacy loss distribution).
- Rigorous statistical testing of differential privacy properties.