Regex-First: Why It Matters
Our Approach: Regex + NLP
- 317 regex recognizers: 100% reproducible for structured data
- NLP for names & locations with confidence scores
- Fully auditable: every detection is traceable to a pattern or model
- Transparent: you always know what matched and why
- Fast, predictable performance
- 48 languages across 3 NLP engines
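To make the traceability point concrete, here is a minimal sketch of a regex recognizer that records which pattern produced each match. The two patterns, the `Detection` class, and the `detect` function are simplified assumptions for illustration, not the product's actual recognizers.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    entity_type: str
    text: str
    start: int
    end: int
    source: str  # which pattern produced this match (the audit trail)


# Hypothetical, deliberately simplified patterns (illustration only).
RECOGNIZERS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IPV4_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def detect(text):
    """Run every pattern over the text and return matches in document order."""
    found = []
    for name, pattern in RECOGNIZERS.items():
        for m in pattern.finditer(text):
            found.append(Detection(name, m.group(), m.start(), m.end(), f"regex:{name}"))
    return sorted(found, key=lambda d: d.start)


sample = "Contact jane@example.com from 10.0.0.1"
first_run = detect(sample)
second_run = detect(sample)
```

Because the patterns are fixed, running `detect` twice on the same text yields identical results, and each `Detection.source` names the pattern responsible.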
AI-Only Approaches
- All detections are probabilistic
- Can't explain why something was flagged
- Requires large training datasets
- Difficult to audit for compliance
- Higher compute costs (GPUs are often needed)
- Model drift degrades accuracy over time
The 10-Step Process
From input to output, here's exactly what happens to your document
Input Text
Submit your document via web interface, API, or Office Add-in
Language Detection
System identifies the document language for optimal processing
Tokenization
Text is broken into tokens for pattern matching
Pattern Matching
317 regex recognizers and NLP models scan for 320+ entity types across 70+ countries
Context Analysis
Surrounding text, such as nearby keywords, is used to improve detection accuracy
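One common way to use context is to raise a match's score when related keywords appear just before it. The sketch below assumes that approach; the pattern, the keyword list, the window size, and the score values are all illustrative assumptions rather than the product's real settings.

```python
import re

# Hypothetical pattern and context words (illustration only).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CONTEXT_WORDS = {"ssn", "social", "security"}


def score_with_context(text, base=0.5, boost=0.35, window=40):
    """Score each match, adding a boost when a context word precedes it."""
    results = []
    for m in SSN_PATTERN.finditer(text):
        # Look only at the characters immediately before the match.
        before = text[max(0, m.start() - window):m.start()].lower()
        words = set(re.findall(r"[a-z]+", before))
        score = base + boost if words & CONTEXT_WORDS else base
        results.append((m.group(), round(score, 2)))
    return results
```

With these assumed values, the same digits score higher after "SSN:" than after "Order ref", which is the kind of disambiguation context analysis is meant to provide.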
Confidence Scoring
Each detection receives a confidence score (0.0–1.0) enabling human-in-the-loop review decisions
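As a rough illustration of how scores can support review, the sketch below orders detections so the least certain come first and flags those under a threshold. The `review_queue` function, the dictionary fields, and the 0.6 threshold are assumptions made for this example.

```python
def review_queue(detections, threshold=0.6):
    """Return detections sorted by ascending score, flagging uncertain ones."""
    for d in detections:
        if not 0.0 <= d["score"] <= 1.0:
            raise ValueError(f"score out of range: {d['score']}")
    ordered = sorted(detections, key=lambda d: d["score"])
    # Nothing is dropped: every detection stays in the queue for the reviewer.
    return [dict(d, needs_attention=d["score"] < threshold) for d in ordered]


detections = [
    {"text": "4111 1111 1111 1111", "type": "CREDIT_CARD", "score": 1.0},
    {"text": "Jane Doe", "type": "PERSON", "score": 0.55},
]
queue = review_queue(detections)
```

Here the name, with its lower score, is listed first and flagged, while the card number remains in the queue without a flag.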
Entity Classification
Detected items are categorized by type
Human-in-the-Loop Review
Review all detections, override false positives, and approve before anonymization
Apply Anonymization
Choose your method: Replace, Redact, Hash, Encrypt, Asymmetric Encrypt, Mask, or Keep
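To show how the methods differ in effect, here is a minimal sketch of five of them applied to a single value. The exact output formats (placeholder style, SHA-256 for hashing, keeping the last four characters when masking) are assumptions; the two encryption methods are omitted because they require key management.

```python
import hashlib


def anonymize(value, entity_type, method):
    """Apply one anonymization method to a detected value (illustrative only)."""
    if method == "replace":
        return f"<{entity_type}>"  # assumed placeholder format
    if method == "redact":
        return ""  # remove the value entirely
    if method == "hash":
        # One-way digest; identical inputs map to identical outputs.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    if method == "mask":
        # Keep the last four characters only when the value is longer than four.
        visible = 4 if len(value) > 4 else 0
        hidden = len(value) - visible
        return "*" * hidden + value[hidden:]
    if method == "keep":
        return value
    raise ValueError(f"unknown method: {method}")
```

For example, `anonymize("4111111111111111", "CREDIT_CARD", "mask")` yields twelve asterisks followed by `1111`, while `"replace"` yields `<CREDIT_CARD>`.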
Output Document
Download your anonymized document
MCP Server: Privacy-First AI Integration
How your data flows through the MCP Server to keep AI tools safe
The MCP Server acts as a privacy shield: it intercepts requests from AI tools, anonymizes PII, passes only the anonymized data to the AI, and optionally restores the original values in the response.
AI Tool Request
Your AI tool (e.g., Cursor or Claude) sends a request containing PII
MCP Server Intercepts
Server analyzes and detects all PII entities
Anonymization
PII is replaced with tokens or redacted
AI Processing
AI receives and processes only anonymized data
Response Return
AI response comes back through MCP Server
De-tokenization
Optional: original values are restored for the user
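The flow above can be sketched as a tokenize and de-tokenize round trip. This example handles only email addresses with a simplified pattern, and the token format and function names are assumptions; it is not the MCP Server's implementation.

```python
import re

# Simplified email pattern (illustration only).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def tokenize(text):
    """Replace each distinct email with a stable token; return text and mapping."""
    mapping = {}  # token -> original value, kept on the server side only
    seen = {}     # original value -> token, so repeats reuse the same token

    def swap(match):
        value = match.group()
        if value not in seen:
            token = f"[EMAIL_{len(seen) + 1}]"
            seen[value] = token
            mapping[token] = value
        return seen[value]

    return EMAIL.sub(swap, text), mapping


def detokenize(text, mapping):
    """Restore original values in text that contains tokens."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


request = "Email bob@example.com and bob@example.com or amy@example.org"
safe_request, mapping = tokenize(request)
ai_response = "Reply sent to [EMAIL_2]"  # stand-in for what the AI returns
restored = detokenize(ai_response, mapping)
```

Only `safe_request` would leave the server in this sketch; the mapping stays local and is applied to the response if restoration is enabled.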
Frequently Asked Questions
Does cloak.business use AI for detection?
No. Detection uses deterministic regex patterns and NLP models (spaCy, Stanza). Results are 100% reproducible: the same input always produces the same output, unlike probabilistic AI approaches.
Why regex patterns instead of AI?
Regex patterns are auditable and reproducible, which makes compliance easier to demonstrate. You can inspect exactly what each pattern matches. AI-based detection can be non-deterministic: results may vary between runs, making compliance documentation difficult.
How accurate is the detection?
With 317 custom pattern recognizers, including checksum and format validation (Luhn, IBAN, SSN), cloak.business achieves significantly higher accuracy than generic NER models, especially for structured identifiers like credit cards, tax IDs, and national ID numbers.
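Checksum validation is what lets a pattern reject digit strings that merely look like identifiers. The sketch below implements the standard Luhn check for card numbers and the mod-97 check for IBANs; the function names are chosen for this example, and country-specific IBAN length rules are not enforced.

```python
def luhn_valid(number):
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_valid(iban):
    """IBAN mod-97 check: move the first four characters to the end,
    map letters to numbers (A=10 ... Z=35), and test remainder == 1."""
    s = iban.replace(" ", "").upper()
    if not (15 <= len(s) <= 34) or not s.isascii() or not s.isalnum():
        return False
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(c, 36)) for c in rearranged)
    return int(numeric) % 97 == 1
```

A string such as `4111 1111 1111 1112` matches a 16-digit card pattern but fails `luhn_valid`, so it can be discarded instead of being reported as a false positive.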
Which languages are supported?
48 languages are supported with dedicated NLP models for named entity recognition. Pattern-based detection (regex) works across all languages since it matches character patterns regardless of language.
Can I add custom entity patterns?
Yes. The API supports custom recognizer definitions so you can add patterns for proprietary identifiers, internal reference numbers, or domain-specific data formats.
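As a rough idea of what a custom recognizer involves, the sketch below defines a pattern for a made-up employee ID format and compiles it into a matcher. The field names, the `EMP-` format, and the `compile_recognizer` function are hypothetical; they do not reflect the actual API schema, which is not shown here.

```python
import re

# Hypothetical definition for an internal identifier such as "EMP-004217".
custom_recognizer = {
    "entity_type": "EMPLOYEE_ID",
    "pattern": r"\bEMP-\d{6}\b",
    "score": 0.9,
}


def compile_recognizer(definition):
    """Validate a definition and return a function that finds matches in text."""
    missing = {"entity_type", "pattern", "score"} - definition.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not 0.0 <= definition["score"] <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    regex = re.compile(definition["pattern"])  # raises re.error if invalid

    def recognize(text):
        return [
            {
                "entity_type": definition["entity_type"],
                "text": m.group(),
                "start": m.start(),
                "end": m.end(),
                "score": definition["score"],
            }
            for m in regex.finditer(text)
        ]

    return recognize


recognize = compile_recognizer(custom_recognizer)
hits = recognize("Badge EMP-004217 issued")
```

Validating the definition up front means a malformed pattern fails at registration time instead of silently matching nothing later.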