The Assumption That Kills Flexibility
Most PII tools assume anonymization is a one-way operation. Once data is redacted, it is gone forever.
This assumption works for some use cases. But it fails catastrophically for:
- Legal discovery - Courts may order original documents
- Clinical trials - Adverse event reporting requires patient identification
- Audit requirements - Regulators need to verify what was protected
- Research - Linking anonymized records back to sources validates findings
What Competitors Offer
Microsoft Presidio
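Presidio's built-in anonymization operators include Replace, Redact, Hash, and Mask, none of which are reversible. It also ships an Encrypt operator (AES with a caller-supplied key) and a matching Decrypt deanonymizer, but nothing beyond that.
To achieve production-grade reversibility with Presidio, you must still generate and manage keys in an external key store, wire the decryption step into your own workflow, and maintain a separate audit system.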
Private AI
Private AI offers de-identification for AI workflows but focuses on irreversible anonymization for privacy preservation. Reversibility is not a core feature.
Protecto
Protecto points out the limits of deterministic masking: if an AI modifies the masked text even slightly, the mapping breaks. Token-based approaches also depend on external mapping tables that can be lost or corrupted.
Why Reversibility Matters
GDPR Compliance
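GDPR explicitly recognizes pseudonymization (Article 4(5)) as a valid data protection measure, and pseudonymized data is reversible by definition: it can be re-attributed to a person using additional information that is kept separately.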
HIPAA Safe Harbor
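Alongside the Safe Harbor method, HIPAA's de-identification standard (45 CFR 164.514(c)) lets a covered entity assign a re-identification code to de-identified records, provided the code is not derived from the individual's information and the re-identification mechanism is not disclosed. Clinical trials routinely require this capability.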
Legal Discovery
Courts may order production of original documents. Irreversible anonymization makes discovery obligations impossible to fulfill.
Audit Compliance
Regulators may ask what PII was in a document. With reversible encryption, you can decrypt the protected values to demonstrate what was there, re-encrypt them, and document the whole process.
Our Approach: AES-256-GCM Encryption
cloak.business offers five anonymization methods. Only Encrypt is reversible.
| Method | Reversible? | Use Case |
|---|---|---|
| Replace | No | Substitute with fake data |
| Redact | No | Remove entirely |
| Mask | No | Partial obscuring |
| Hash | No | One-way transformation |
| Encrypt | Yes | Reversible AES-256-GCM encryption |
Technical Specifications
- Algorithm: AES-256-GCM
- Key derivation: Argon2id
- Nonce: Random 12-byte per encryption
- Authentication: GCM tag integrity check
- Key storage: Client-side only (zero-knowledge)
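The list above can be sketched in a few lines of Python. This is a minimal illustration, not the cloak.business implementation: it assumes the third-party `cryptography` package, the function names `encrypt_value` and `decrypt_value` are hypothetical, and the 32-byte key is taken as an input (deriving it from a passphrase with Argon2id is out of scope here).

```python
import os

# Third-party dependency (assumption): pip install cryptography
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12  # random 12-byte nonce, as in the specification list


def encrypt_value(key: bytes, plaintext: str) -> bytes:
    """Encrypt one PII value; returns nonce || ciphertext || 16-byte GCM tag."""
    if len(key) != 32:
        raise ValueError("AES-256 requires a 32-byte key")
    nonce = os.urandom(NONCE_LEN)  # fresh nonce for every encryption
    sealed = AESGCM(key).encrypt(nonce, plaintext.encode("utf-8"), None)
    return nonce + sealed


def decrypt_value(key: bytes, blob: bytes) -> str:
    """Reverse encrypt_value; raises InvalidTag if the blob was altered."""
    nonce, sealed = blob[:NONCE_LEN], blob[NONCE_LEN:]
    return AESGCM(key).decrypt(nonce, sealed, None).decode("utf-8")
```

Because the nonce is random, encrypting the same value twice yields different ciphertexts, and the GCM tag causes decryption to fail outright if any byte of the blob has been changed.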
The Self-Contained Advantage
Simple tokenization requires an external token-to-value mapping. Securing that mapping becomes critical, because it can be lost or corrupted, and the tokens themselves carry no embedded audit trail.
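To make the fragility concrete, here is a small sketch of mapping-based tokenization. The `<PERSON_n>` placeholder format and the function names are hypothetical, chosen only for illustration.

```python
import re


def tokenize(text: str, names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each name with a placeholder; return the text and the external mapping."""
    mapping: dict[str, str] = {}
    for i, name in enumerate(names, start=1):
        token = f"<PERSON_{i}>"
        mapping[token] = name
        text = text.replace(name, token)
    return text, mapping


def detokenize(text: str, mapping: dict[str, str]) -> str:
    """Restore only the placeholders that still match a mapping entry exactly."""
    return re.sub(
        r"<PERSON_\d+>",
        lambda m: mapping.get(m.group(0), m.group(0)),
        text,
    )
```

If a model rewrites `<PERSON_1>` as `Person 1`, or if the mapping dictionary is lost, `detokenize` has nothing to work with: the original value exists only in the external table.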
Our approach: the encrypted value is embedded in the token itself. No external mapping is required, so each token is self-contained and auditable. If the AI returns modified text, the encrypted tokens survive as long as the token strings themselves come back intact.
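A self-contained token can be sketched as follows. The `[ENC:...]` wrapper, the base64url encoding, and the names `make_token` and `restore` are assumptions made for this example and do not describe the actual cloak.business token format; the sketch also assumes the third-party `cryptography` package.

```python
import base64
import os
import re

# Third-party dependency (assumption): pip install cryptography
from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

TOKEN_RE = re.compile(r"\[ENC:([A-Za-z0-9_-]+)\]")  # hypothetical token format


def make_token(key: bytes, value: str) -> str:
    """Embed nonce || ciphertext || tag directly in the token text."""
    nonce = os.urandom(12)
    sealed = AESGCM(key).encrypt(nonce, value.encode("utf-8"), None)
    payload = base64.urlsafe_b64encode(nonce + sealed).rstrip(b"=").decode("ascii")
    return f"[ENC:{payload}]"


def restore(key: bytes, text: str) -> str:
    """Decrypt every intact token found in text; leave damaged ones untouched."""

    def _decrypt(match: re.Match) -> str:
        payload = match.group(1)
        padded = payload + "=" * (-len(payload) % 4)  # re-add stripped padding
        try:
            blob = base64.urlsafe_b64decode(padded)
            plain = AESGCM(key).decrypt(blob[:12], blob[12:], None)
            return plain.decode("utf-8")
        except (InvalidTag, ValueError):
            return match.group(0)  # wrong key or altered token: keep as-is

    return TOKEN_RE.sub(_decrypt, text)
```

Since everything needed for decryption except the key travels inside the token, the surrounding sentence can be reordered or rewritten and `restore` still recovers the value. A token whose own characters were altered fails the GCM tag check and is left in place rather than decrypted to garbage.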
Key Takeaways
- Irreversible anonymization blocks legitimate use cases - Legal, audit, research all need reversibility
- GDPR and HIPAA explicitly permit pseudonymization - Regulators expect this capability
- Token-based approaches are fragile - External mappings can break or be lost
- Self-contained encryption is robust - No external dependencies
- Reversible encryption is a differentiator - Most tools simply do not offer it