Platform Features
Last Updated: 2026-02-12
PII Detection
cloak.business combines deterministic pattern matching with machine-learning NLP to detect personally identifiable information across text, images, and structured data.
Pattern-Based Detection
- 317 custom regex recognizers covering structured data: national IDs, passports, tax numbers, credit cards, IBANs, phone numbers, emails, IP addresses, medical IDs, license plates, and more
- 390+ entity types across 75+ countries
- Backend-enforced request limits protect against resource exhaustion (max 250 entity filters, 50 ad-hoc recognizers, 200 total regex patterns per request)
- Deterministic: the same input always produces the same result
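The request limits listed above can be checked on the client before a request is sent. The following is a minimal sketch: the three limit values come from the list above, but the `AnalyzeRequest` and `AdHocRecognizer` shapes and the `limit_violations` helper are hypothetical names used for illustration, not part of a documented cloak.business API. It also assumes the 200-pattern limit counts the patterns supplied by ad-hoc recognizers.

```python
from dataclasses import dataclass, field

# Limits quoted in the documentation above.
MAX_ENTITY_FILTERS = 250
MAX_ADHOC_RECOGNIZERS = 50
MAX_REGEX_PATTERNS = 200


@dataclass
class AdHocRecognizer:
    """Hypothetical shape of a user-supplied recognizer."""
    entity_type: str
    patterns: list[str]


@dataclass
class AnalyzeRequest:
    """Hypothetical shape of an analysis request."""
    text: str
    entity_filters: list[str] = field(default_factory=list)
    ad_hoc_recognizers: list[AdHocRecognizer] = field(default_factory=list)


def limit_violations(req: AnalyzeRequest) -> list[str]:
    """Return a human-readable message for every limit the request exceeds."""
    problems = []
    if len(req.entity_filters) > MAX_ENTITY_FILTERS:
        problems.append(f"too many entity filters: {len(req.entity_filters)}")
    if len(req.ad_hoc_recognizers) > MAX_ADHOC_RECOGNIZERS:
        problems.append(f"too many ad-hoc recognizers: {len(req.ad_hoc_recognizers)}")
    # Assumption: the pattern limit is the sum over all ad-hoc recognizers.
    total_patterns = sum(len(r.patterns) for r in req.ad_hoc_recognizers)
    if total_patterns > MAX_REGEX_PATTERNS:
        problems.append(f"too many regex patterns: {total_patterns}")
    return problems
```

An empty list means the request is within all three limits. The backend remains the authority, so a check like this only avoids sending a request that would be rejected anyway.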
NLP-Based Detection
- Catches unstructured PII that has no fixed format — names, locations, organizations
- Three NLP engines: spaCy (25 languages), Stanza NER (7 languages), XLM-RoBERTa (16 languages)
- All models run on cloak.business's own servers in Germany — no data sent to third parties
Confidence Scoring
Every detected entity receives a confidence score between 0 and 1. Scores are determined by:
- Pattern strength — How specific and unambiguous the regex pattern is
- Context word analysis — Surrounding words that reinforce or weaken a match (e.g., "passport number:" before a pattern increases confidence)
- NLP model confidence — The probability assigned by the NLP engine
Users can set a minimum confidence threshold to control sensitivity.
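To make the interaction between pattern strength, context words, and the minimum threshold concrete, here is a small sketch. The passport-style regex, the base score of 0.4, the context boost of 0.35, and the 20-character context window are all illustrative assumptions, not the values cloak.business uses internally.

```python
import re
from dataclasses import dataclass


@dataclass
class Match:
    entity_type: str
    text: str
    start: int
    end: int
    score: float


# Hypothetical recognizer settings, chosen only for illustration.
PASSPORT_PATTERN = re.compile(r"\b[A-Z][0-9]{8}\b")
BASE_SCORE = 0.4        # a loose pattern starts with a low score
CONTEXT_WORDS = ("passport", "travel document")
CONTEXT_BOOST = 0.35    # added when a context word precedes the match
CONTEXT_WINDOW = 20     # characters inspected before the match


def find_passports(text: str) -> list[Match]:
    """Score each regex hit, boosting it when a context word appears nearby."""
    matches = []
    for m in PASSPORT_PATTERN.finditer(text):
        window = text[max(0, m.start() - CONTEXT_WINDOW):m.start()].lower()
        score = BASE_SCORE
        if any(word in window for word in CONTEXT_WORDS):
            score = min(1.0, score + CONTEXT_BOOST)
        matches.append(Match("PASSPORT", m.group(), m.start(), m.end(), score))
    return matches


def apply_threshold(matches: list[Match], min_confidence: float) -> list[Match]:
    """Keep only matches at or above the user-selected minimum confidence."""
    return [m for m in matches if m.score >= min_confidence]
```

For the input `"Passport number: C12345678. Order ref X98765432."`, both codes match the pattern, but only the first is preceded by a context word. With a threshold of 0.5, only the first one is kept.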
Anonymization Methods
Once PII is detected, five anonymization methods are available:
| Method | Description | Example |
|---|---|---|
| Replace | Substitutes the entity with a type label | Jane Doe → <PERSON> |
| Redact | Removes the entity entirely | Call 555-0123 → Call |
| Hash (SHA-256) | Produces a one-way cryptographic hash | Jane Doe → 8f14e45f... |
| Encrypt (AES-256-GCM) | Reversible encryption — can be decrypted with the session key | Jane Doe → eyJhbGci... |
| Mask | Partially obscures the value | DE89 3704 0044 0532 0130 00 → **** **** **** **** **30 00 |
- Replace is ideal for readability — documents remain human-readable.
- Hash is useful for consistent pseudonymization — the same value always maps to the same hash.
- Encrypt supports reversible workflows — authorized users can restore the original text.
- Mask preserves partial information for verification purposes.
- Redact removes PII completely with no trace.
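The methods can be approximated in a few lines of standard-library Python. This is a sketch of the general techniques, not the platform's implementation: the function names and the `visible=4` default are assumptions. Encrypt is omitted because AES-256-GCM is not available in Python's standard library and the session-key handling is not described here.

```python
import hashlib


def replace(text: str, start: int, end: int, entity_type: str) -> str:
    """Substitute the span [start, end) with a type label such as <PERSON>."""
    return text[:start] + f"<{entity_type}>" + text[end:]


def redact(text: str, start: int, end: int) -> str:
    """Remove the span [start, end) entirely."""
    return text[:start] + text[end:]


def hash_value(value: str) -> str:
    """One-way SHA-256 digest; the same input always yields the same digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Hide all but the last `visible` letters/digits, keeping separators."""
    total = sum(ch.isalnum() for ch in value)
    to_hide = max(0, total - visible)
    out = []
    for ch in value:
        if ch.isalnum() and to_hide > 0:
            out.append(mask_char)
            to_hide -= 1
        else:
            out.append(ch)
    return "".join(out)
```

With these definitions, `mask("DE89 3704 0044 0532 0130 00")` returns `**** **** **** **** **30 00`, matching the table row. A production hash-based pseudonymization would normally also consider salting or keyed hashing, which this sketch leaves out.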
Multi-Language Support
cloak.business supports 48 user interface languages and detects PII across multiple language families.
NLP Language Coverage
| Engine | Languages | Examples |
|---|---|---|
| spaCy | 25 | English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Japanese, Chinese, Korean, and more |
| Stanza NER | 7 | Arabic, Farsi, Hebrew, Hindi, Turkish, Ukrainian, Vietnamese |
| XLM-RoBERTa | 16 | Cross-lingual transformer covering European, Asian, and Middle Eastern languages |
Additional Capabilities
- Right-to-left (RTL) support — Arabic, Hebrew, Farsi, and Urdu are fully supported in the UI and detection pipeline
- Country-specific patterns — Recognizers are tailored to each country's ID formats, phone patterns, and naming conventions
- OCR in 37 languages — Image text extraction supports 37 Tesseract OCR language packs
Image Anonymization
cloak.business can detect and redact PII directly from images — scanned documents, screenshots, photos of ID cards, forms, and more.
- OCR-powered text extraction using 37 Tesseract language packs
- Same 317 pattern recognizers applied to extracted text
- Bounding box redaction — detected PII is covered with colored rectangles on the image
- EXIF orientation correction — phone photos are automatically rotated before processing
- Adjacent box merging — multi-word entities (like full names) are merged into a single redaction box
- Supports PNG, JPEG, BMP, and TIFF formats
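Adjacent box merging can be illustrated with a simple geometric rule: two word boxes are merged when they overlap vertically (same text line) and the horizontal gap between them is small. The `Box` type, the `merge_adjacent` function, and the 15-pixel `max_gap` are hypothetical. The sketch assumes its input contains only boxes of words already flagged as PII.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: int
    top: int
    right: int
    bottom: int


def _same_line(a: Box, b: Box) -> bool:
    """True when the two boxes overlap vertically."""
    return a.top < b.bottom and b.top < a.bottom


def _gap(a: Box, b: Box) -> int:
    """Horizontal distance between the boxes (negative if they overlap)."""
    return max(a.left, b.left) - min(a.right, b.right)


def _union(a: Box, b: Box) -> Box:
    """Smallest box that covers both inputs."""
    return Box(min(a.left, b.left), min(a.top, b.top),
               max(a.right, b.right), max(a.bottom, b.bottom))


def merge_adjacent(boxes: list[Box], max_gap: int = 15) -> list[Box]:
    """Merge boxes on the same line whose horizontal gap is at most max_gap."""
    merged: list[Box] = []
    for box in sorted(boxes, key=lambda b: b.left):
        for i, other in enumerate(merged):
            if _same_line(box, other) and _gap(box, other) <= max_gap:
                merged[i] = _union(box, other)
                break
        else:
            merged.append(box)
    return merged
```

Processing boxes left to right lets a multi-word entity grow one word at a time, so "Jane" and "Doe" end up under one rectangle while a distant box on the same line stays separate.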
See Image Anonymization for full details.
Country Presets
To simplify configuration, cloak.business provides 131+ presets that pre-select the relevant entity types for a given country, region, or industry.
- Country presets — Select "Germany" and get all German ID, tax, financial, and phone patterns activated automatically
- Regional presets — "European Union", "Asia-Pacific", "Americas" cover multiple countries at once
- Industry presets — Healthcare, Finance, Technology, Legal, and more — each tailored to the entity types most relevant in that sector
Presets can be combined and customized. Start with a preset, then add or remove individual entity types as needed.
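Conceptually, combining and customizing presets is set arithmetic over entity types. The sketch below assumes that model. The preset contents and entity-type names are invented placeholders, not the actual definitions shipped with cloak.business.

```python
# Hypothetical preset contents, for illustration only.
PRESETS = {
    "Germany": {"DE_TAX_ID", "DE_ID_CARD", "IBAN", "PHONE_NUMBER"},
    "Healthcare": {"PERSON", "MEDICAL_ID", "DATE_OF_BIRTH"},
}


def build_entity_selection(preset_names, add=(), remove=()):
    """Union the chosen presets, then apply individual additions and removals."""
    selected = set()
    for name in preset_names:
        selected |= PRESETS[name]
    selected |= set(add)
    selected -= set(remove)
    return selected
```

For example, `build_entity_selection(["Germany", "Healthcare"], add={"EMAIL"}, remove={"PHONE_NUMBER"})` starts from both presets, adds email detection, and drops phone numbers.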
Batch Processing
Process multiple documents in a single operation:
- Upload several text files or paste multiple text blocks
- All documents are processed with the same configuration (entity types, anonymization method, confidence threshold)
- Results are returned individually for each document
- Useful for bulk document sanitization workflows
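The batch model described above, one shared configuration and one result per document, can be sketched as follows. `BatchConfig`, `process_batch`, and `demo_anonymize` are hypothetical names. The demo anonymizer is a stand-in that only replaces email addresses, so the example can run without the real service.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class BatchConfig:
    """One configuration shared by every document in the batch."""
    entity_types: tuple
    method: str
    min_confidence: float


def process_batch(
    documents: Dict[str, str],
    config: BatchConfig,
    anonymize: Callable[[str, BatchConfig], str],
) -> Dict[str, str]:
    """Apply the same configuration to each document; return one result per document."""
    return {name: anonymize(text, config) for name, text in documents.items()}


# Stand-in anonymizer so the sketch is runnable without any external service.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def demo_anonymize(text: str, config: BatchConfig) -> str:
    if "EMAIL" in config.entity_types and config.method == "replace":
        return EMAIL.sub("<EMAIL>", text)
    return text
```

Keeping the configuration immutable (`frozen=True`) makes it harder for one document's processing to change the settings used for the next.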
Token-Based Pricing
cloak.business uses a simple token system:
- 1 token ≈ 1 character analyzed
- Text analysis: tokens consumed based on text length
- Image analysis: fixed token cost per image
- Free tier: 200 tokens per billing cycle — no credit card required
- Pro and Business plans offer higher token allocations with additional features
See Pricing & Plans for full details.
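A rough token estimate follows directly from the rules above. In this sketch, the 200-token free allocation comes from the list, while `IMAGE_TOKEN_COST` is a made-up placeholder because the per-image cost is not stated here. Since 1 token only approximately equals 1 character, treat the result as an estimate rather than a billing figure.

```python
FREE_TIER_TOKENS = 200   # free allocation per billing cycle, as listed above
IMAGE_TOKEN_COST = 100   # hypothetical placeholder; the real per-image cost is not stated


def estimate_tokens(texts, image_count=0, image_cost=IMAGE_TOKEN_COST):
    """Approximate usage: about one token per character, plus a fixed cost per image."""
    return sum(len(t) for t in texts) + image_count * image_cost


def fits_free_tier(texts, image_count=0, image_cost=IMAGE_TOKEN_COST):
    """True when the estimated usage stays within the free allocation."""
    return estimate_tokens(texts, image_count, image_cost) <= FREE_TIER_TOKENS
```

Under these assumptions, a 150-character note fits the free tier on its own, while two such notes do not.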
NLP Model Privacy
A critical differentiator: all NLP models run on cloak.business's own servers in a German data center. This means:
- spaCy, Stanza, and XLM-RoBERTa models are hosted and operated by cloak.business
- No text is sent to Meta (XLM-RoBERTa's creator), Stanford (Stanza's creator), or any other third party
- Your data never leaves the EU
- Models are not fine-tuned on user data
cloak.business is not a wrapper around a third-party API. The models run on infrastructure that cloak.business owns and controls.