https://github.com/GreenmaskIO/greenmask
GreenmaskIO / greenmask: PostgreSQL database anonymization and synthetic data generation tool (greenmask.io). License: Apache-2.0.

Greenmask

Dump anonymization and synthetic data generation tool

Greenmask is a powerful open-source utility designed for logical database backup dumping, anonymization, synthetic data generation, and restoration. It includes ported PostgreSQL libraries, which makes it reliable. It is stateless and does not require any changes to your database schema. It is designed to be highly customizable, fast, reliable, and backward-compatible with existing PostgreSQL utilities.
Getting started

Greenmask has a Playground: a sandbox environment in Docker with sample databases included, so you can try Greenmask without any additional setup.

1. Clone the greenmask repository and navigate to its directory:

   git clone git@github.com:GreenmaskIO/greenmask.git && cd greenmask

2. Once you have cloned the repository, start the environment with Docker Compose:

   docker-compose run greenmask

Features

* Deterministic transformers: a deterministic approach to data transformation based on hash functions. The same input data always produces the same output data. Almost every transformer supports either a random or a hash engine, making it suitable for any use case.
* Dynamic parameters: almost every transformer supports dynamic parameters, which let you parametrize the transformer from a table column value. This helps resolve functional dependencies between columns and satisfy constraints.
* Transformation validation and easy maintenance: during configuration, Greenmask provides validation warnings, data transformation diff, and schema diff features, allowing you to monitor and maintain transformations effectively throughout the software lifecycle. Schema diff helps avoid data leakage when the schema changes.
* Partitioned tables transformation inheritance: define transformation configurations once and apply them to all partitions of a partitioned table (using the apply_for_inherited parameter), simplifying the anonymization process.
* Stateless: Greenmask operates as a logical dump and does not impact your existing database schema.
* Cross-platform: can be easily built and executed on any platform, thanks to its Go-based architecture, which eliminates platform dependencies.
* Database type safe: ensures data integrity by validating data and using the database driver for encoding and decoding operations. This approach preserves data formats.
* Backward compatible: fully supports the same features and protocols as the vanilla PostgreSQL utilities. Dumps created by Greenmask can be restored with the pg_restore utility.
* Extensible: users can implement domain-based transformations in any programming language or use predefined templates.
* Integrable: integrates into your CI/CD system for automated database anonymization and restoration.
* Parallel execution: parallel dumping and restoration significantly reduce the time required to deliver results.
* Variety of storages: offers local and remote storage options, including directories and S3-like storage.
* Pgzip support for faster compression: setting --pgzip speeds up the dump and restoration processes through parallel compression.

Use Cases

Greenmask is ideal for various scenarios, including:

* Backup and Restoration. Use Greenmask for daily routines involving logical backup dumping and restoration. It handles tasks such as restoring a table after truncation. Its functionality closely mirrors that of pg_dump and pg_restore, making it a straightforward replacement.
* Anonymization, Transformation, and Data Masking. Use Greenmask for anonymizing, transforming, and masking backups, especially when setting up a staging environment or for analytical purposes. It simplifies deploying a pre-production environment with consistently anonymized data, shortening time-to-market in the development lifecycle.
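The Deterministic transformers feature relies on hash functions so that equal inputs always map to equal outputs. The Go sketch below illustrates the idea under stated assumptions: it uses a salted SHA-256 digest to pick a value from a small hypothetical name list. The function deterministicName, the firstNames list, and the salt are illustrative only and do not reflect Greenmask's actual hash engine, configuration, or API.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// Hypothetical replacement dictionary, for illustration only.
var firstNames = []string{"Alice", "Bob", "Carol", "Dave", "Erin"}

// deterministicName (hypothetical) maps value to an entry of firstNames using
// a salted SHA-256 digest, so the same (salt, value) pair always yields the
// same replacement name.
func deterministicName(salt, value string) string {
	// A zero byte separates salt and value so ("ab", "c") and ("a", "bc") differ.
	sum := sha256.Sum256([]byte(salt + "\x00" + value))
	// Interpret the first 8 digest bytes as an unsigned integer and reduce it
	// to a valid index into the dictionary.
	idx := binary.BigEndian.Uint64(sum[:8]) % uint64(len(firstNames))
	return firstNames[idx]
}

func main() {
	a := deterministicName("my-salt", "john.doe@example.com")
	b := deterministicName("my-salt", "john.doe@example.com")
	fmt.Println(a == b) // prints "true": same input, same output
}
```

Changing the salt changes the whole mapping while keeping it deterministic. A random engine, by contrast, would draw from a pseudo-random generator and could produce a different value on every run.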
General Information

The most appropriate approach for logical backup dumping and restoration is to leverage the core PostgreSQL utilities, pg_dump and pg_restore. Greenmask is designed to align with these native utilities to ensure compatibility. Greenmask handles data dumping itself and delegates schema dumping and restoration to pg_dump and pg_restore, keeping it integrated with PostgreSQL's standard tools.

Backup and Process

Greenmask uses the directory format of pg_dump and pg_restore. This format is well suited to parallel execution and partial restoration, and it includes clear metadata files that help determine the backup and restoration steps. Greenmask is optimized to work with remote storage systems and anonymization procedures.

Storage Options

* s3: supports any S3-like storage system, including AWS S3, making it adaptable to various cloud-based storage solutions.
* directory: the standard choice, an ordinary filesystem directory for local storage.

Data Anonymization and Validation

Greenmask works with COPY lines, collects schema metadata using the Golang driver, and uses this driver for encoding and decoding. The validate command lets you assess the impact of your transformations on both the schema (validation warnings) and the data (transformation results and diffs), ensuring the desired outcome of the anonymization process.

Customization

If your table schema relies on functional dependencies between columns, you can address this with dynamic parameters. For example, dynamic parameters resolve the created_at and updated_at case, where updated_at must be greater than or equal to created_at.
If you need to implement custom logic imperatively, use the TemplateRecord or Template transformers. Greenmask also provides a framework for creating reusable custom transformers. These transformers can be integrated without recompiling Greenmask, thanks to PIPE (stdin/stdout) interaction. Furthermore, Greenmask's architecture is designed to be highly extensible, making it possible to introduce other interaction protocols, such as HTTP or sockets, for anonymization procedures.

PostgreSQL Version Compatibility

Greenmask is compatible with PostgreSQL 11 and higher.

Links

* Documentation
* Email: support@greenmask.io
* Twitter
* Telegram
* Discord
* DockerHub

References

* The Demo database, provided by PostgresPro, is used for integration testing.
* The adventureworks database, created by morenoh149/postgresDBSamples, is used in the Docker Compose playground.

Latest release: v0.2.0 (Oct 9, 2024).