Releases: NGO-Algorithm-Audit/python-synthpop

Release v0.1.2

@jfparie jfparie released this 25 May 10:38
  • Fix a bug in _decode_categorical (data_processor.py): the Gaussian Copula (GC) synthesizer can simulate out-of-bounds values for a categorical variable, so these values are now capped at the variable's maximum number of classes.
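
The capping described in this fix can be illustrated with a minimal sketch. It is not the library's actual code: the function name decode_categorical, its arguments, and the rounding step are assumptions made for illustration, and clamping negative codes to the first class is an extra safeguard that the release note does not mention.

```python
def decode_categorical(sampled_codes, categories):
    """Map sampled numeric codes back to labels (hypothetical helper).

    Codes outside the valid index range are capped so that a sampler
    producing out-of-bounds values cannot trigger an index error.
    """
    max_index = len(categories) - 1
    decoded = []
    for code in sampled_codes:
        index = int(round(code))               # nearest integer class index
        index = min(max(index, 0), max_index)  # cap to [0, number of classes - 1]
        decoded.append(categories[index])
    return decoded


labels = ["low", "medium", "high"]
# 3.7 rounds to 4, which is out of bounds and is capped to the last class.
print(decode_categorical([0.2, 1.4, 3.7, -0.6], labels))
# → ['low', 'medium', 'high', 'low']
```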

Release v0.1.1

@jfparie jfparie released this 14 Mar 15:17
d54cab6

python-synthpop v0.1.1 release summary

  • fix bug in efficacy_metrics.py

v0.1 python-synthpop

@jfparie jfparie released this 10 Mar 18:28
b09d3fe

python-synthpop v0.1 release summary 🚀

We are excited to announce the release of python-synthpop v0.1, an open-source Python library for synthetic data generation (SDG). This release introduces implementations of Classification and Regression Trees (CART) and Gaussian Copula (GC) synthesizers, which users can apply to generate high-quality, privacy-preserving synthetic datasets.

Key Features in This Release:

  1. Missing data handling:

    • Users can decide whether missing data should be removed or imputed;
    • Users are guided on identifying the type of missing data in their dataset (e.g., missing at random or not at random) and advised on whether to handle it through removal or imputation based on best practices;
    • This ensures smooth handling of datasets with missing values.
  2. Preprocessing utilities:

    • Robust preprocessing functions for data normalization, transformation, and feature engineering to streamline the preparation of data before synthesis.
  3. Synthetic data generation methods:

    • CART-based synthesis: Create synthetic datasets that retain complex relationships in your data using decision trees;
    • Gaussian Copula synthesis: Leverage the power of copulas to capture and reproduce intricate dependencies between variables.
  4. Postprocessing functions:

    • Seamlessly map synthetic data back to its original structure and domain.
  5. Evaluation metrics:

    • Built-in tools to evaluate the quality of synthetic datasets:
      • Distributional similarity metrics.
      • Utility measures for downstream tasks (e.g., classification, regression).
      • Privacy-preserving metrics to assess disclosure risks.
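
To make the Gaussian Copula idea from the feature list more concrete, here is a minimal two-column sketch using only the Python standard library. It is not python-synthpop's implementation or API: all function names are hypothetical, the marginals are modelled with simple empirical quantiles, and the toy data is purely illustrative.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def normal_scores(values):
    """Rank-transform a column to standard normal scores."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def correlation(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def empirical_quantile(sorted_values, u):
    """Map a uniform value u in [0, 1] to an observed value."""
    index = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]


def sample_gaussian_copula(x, y, n_samples, seed=0):
    """Draw synthetic (x, y) pairs that mimic the rank dependence of the input."""
    rng = random.Random(seed)
    # Dependence is estimated on the normal scores, not on the raw values.
    rho = correlation(normal_scores(x), normal_scores(y))
    rho = max(-1.0, min(1.0, rho))  # guard against floating-point overshoot
    sorted_x, sorted_y = sorted(x), sorted(y)
    synthetic = []
    for _ in range(n_samples):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        # Correlated normals -> uniforms -> values from the original marginals.
        u1, u2 = STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)
        synthetic.append((empirical_quantile(sorted_x, u1),
                          empirical_quantile(sorted_y, u2)))
    return synthetic


# Illustrative toy data: two positively related numeric columns.
ages = [23, 31, 38, 45, 52, 60, 27, 35, 41, 58]
incomes = [21, 30, 36, 48, 50, 63, 25, 33, 44, 57]
for pair in sample_gaussian_copula(ages, incomes, n_samples=5):
    print(pair)
```

Because the sketch samples only values already observed in each column, it keeps every synthetic value inside the original domain. A full synthesizer would typically also handle more than two columns, categorical variables, and missing values.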

Live demo in web app

A live demo of python-synthpop can be found in this local-first web app. In this architectural setup, data is processed entirely on your device and is not uploaded to any third party, such as a cloud provider. This computing approach is called local-first and allows organisations to use tools securely on their own devices. Instructions on how to host the tool locally, including the source code, can be found here.

Documentation and Support

v0.0.9

@devhelpr devhelpr released this 04 Mar 08:32
3c17446
Update pyproject.toml

v0.0.8

@devhelpr devhelpr released this 04 Mar 08:22
137f156
Update pyproject.toml

v0.0.7

@devhelpr devhelpr released this 04 Mar 08:19
1752e6f
Update pyproject.toml

v0.0.6

@devhelpr devhelpr released this 04 Mar 08:00
1698203
Update setup.py

v0.0.5

@devhelpr devhelpr released this 04 Mar 07:55
171656c
Update publish.yml

v0.0.4

@devhelpr devhelpr released this 04 Mar 07:49
ba75d25
Update pyproject.toml

v0.0.3

@devhelpr devhelpr released this 03 Mar 20:56
fc937ca
Update publish.yml