Platform Features

Last Updated: 2026-02-12

PII Detection

cloak.business combines deterministic pattern matching with machine-learning NLP to detect personally identifiable information across text, images, and structured data.

Pattern-Based Detection

317 custom regex recognizers covering structured data: national IDs, passports, tax numbers, credit cards, IBANs, phone numbers, emails, IP addresses, medical IDs, license plates, and more
317 entity types across 70+ countries
Backend-enforced request limits protect against resource exhaustion (max 250 entity filters, 50 ad-hoc recognizers, 200 total regex patterns per request)
Deterministic: the same input always produces the same result
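A minimal sketch of how deterministic regex recognizers and the request limits above might fit together. The two patterns, the `detect` and `check_limits` functions, and the constant names are illustrative assumptions, not the platform's actual recognizers or API. How the 200-pattern limit is counted is also an assumption (here: patterns supplied with the request).

```python
import re
from dataclasses import dataclass

# Limits quoted in the list above; the constant names are illustrative.
MAX_ENTITY_FILTERS = 250
MAX_AD_HOC_RECOGNIZERS = 50
MAX_TOTAL_PATTERNS = 200

# Two deliberately simple stand-in patterns, not the platform's real recognizers.
BUILT_IN = {
    "EMAIL_ADDRESS": [r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"],
    "IP_ADDRESS": [r"\b(?:\d{1,3}\.){3}\d{1,3}\b"],
}


@dataclass(frozen=True)
class Match:
    entity_type: str
    start: int
    end: int
    text: str


def check_limits(entity_filters, ad_hoc):
    """Reject oversized requests before any regex runs."""
    if len(entity_filters) > MAX_ENTITY_FILTERS:
        raise ValueError("too many entity filters")
    if len(ad_hoc) > MAX_AD_HOC_RECOGNIZERS:
        raise ValueError("too many ad-hoc recognizers")
    # Assumption: the pattern limit counts patterns supplied with the request.
    if sum(len(patterns) for patterns in ad_hoc.values()) > MAX_TOTAL_PATTERNS:
        raise ValueError("too many regex patterns")


def detect(text, entity_filters=None, ad_hoc=None):
    """Return all matches, sorted so identical input gives identical output."""
    ad_hoc = ad_hoc or {}
    recognizers = {**BUILT_IN, **ad_hoc}
    if entity_filters is None:
        entity_filters = list(recognizers)
    check_limits(entity_filters, ad_hoc)
    matches = []
    for entity_type in entity_filters:
        for pattern in recognizers.get(entity_type, []):
            for m in re.finditer(pattern, text):
                matches.append(Match(entity_type, m.start(), m.end(), m.group()))
    return sorted(matches, key=lambda m: (m.start, m.end, m.entity_type))
```

Calling `detect("Mail jane@example.com from 192.168.0.1")` twice returns the same two matches in the same order, which is the property the "Deterministic" bullet describes.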

NLP-Based Detection

Catches unstructured PII that has no fixed format — names, locations, organizations
Three NLP engines: spaCy (25 languages), Stanza NER (7 languages), XLM-RoBERTa (16 languages)
All models run on cloak.business's own servers in Germany — no data sent to third parties

Confidence Scoring

Every detected entity receives a confidence score between 0 and 1. Scores are determined by:

Pattern strength — How specific and unambiguous the regex pattern is
Context word analysis — Surrounding words that reinforce or weaken a match (e.g., "passport number:" before a pattern increases confidence)
NLP model confidence — The probability assigned by the NLP engine

Users can set a minimum confidence threshold to control sensitivity.
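The interaction between a base score, context words, and the minimum threshold can be sketched as follows. This is an illustration under stated assumptions: the boost value, the 40-character context window, the `CONTEXT_WORDS` table, and the function names are all hypothetical, and the real scoring logic (including context that weakens a match) is not described in this document.

```python
from dataclasses import dataclass

# Hypothetical tuning values; the real weights are not documented here.
CONTEXT_WORDS = {"PASSPORT": ("passport",)}
CONTEXT_BOOST = 0.35
CONTEXT_WINDOW = 40  # characters inspected before the match


@dataclass(frozen=True)
class Candidate:
    entity_type: str
    start: int
    end: int
    base_score: float  # pattern strength or NLP probability, in [0, 1]


def score(candidate, text):
    """Raise the base score when a context word precedes the match."""
    window_start = max(0, candidate.start - CONTEXT_WINDOW)
    window = text[window_start:candidate.start].lower()
    words = CONTEXT_WORDS.get(candidate.entity_type, ())
    boost = CONTEXT_BOOST if any(word in window for word in words) else 0.0
    # Clamp so the result stays in the documented 0..1 range.
    return min(1.0, max(0.0, candidate.base_score + boost))


def keep_confident(candidates, text, min_confidence):
    """Drop every candidate whose final score is below the threshold."""
    kept = []
    for candidate in candidates:
        final = score(candidate, text)
        if final >= min_confidence:
            kept.append((candidate, final))
    return kept
```

With a threshold of 0.5, a weak pattern match scored 0.4 is kept when it follows "passport number:" and dropped when the same string appears without that context.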

Anonymization Methods

Once PII is detected, five anonymization methods are available:

Method	Description	Example
Replace	Substitutes the entity with a type label	`Jane Doe` → `<PERSON>`
Redact	Removes the entity entirely	`Call 555-0123` → `Call`
Hash (SHA-256)	Produces a one-way cryptographic hash	`Jane Doe` → `8f14e45f...`
Encrypt (AES-256-GCM)	Reversible encryption — can be decrypted with the session key	`Jane Doe` → `eyJhbGci...`
Mask	Partially obscures the value	`DE89 3704 0044 0532 0130 00` → `** 30 00`

Replace is ideal for readability — documents remain human-readable.
Hash is useful for consistent pseudonymization — the same value always maps to the same hash.
Encrypt supports reversible workflows — authorized users can restore the original text.
Mask preserves partial information for verification purposes.
Redact removes PII completely with no trace.
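Four of the five methods can be sketched with the Python standard library alone. The function names, the span format, and the mask width are assumptions for illustration, not the platform's API. Encrypt is left out because AES-256-GCM needs a third-party cryptography library, and the exact output formats shown in the table are not reproduced here.

```python
import hashlib


def mask(value, visible=4, mask_char="*"):
    """Hide everything except the last `visible` characters."""
    hidden = max(0, len(value) - max(0, visible))
    return mask_char * hidden + value[hidden:]


# Each operator takes the detected value and its entity type.
OPERATORS = {
    "replace": lambda value, entity_type: f"<{entity_type}>",
    "redact": lambda value, entity_type: "",
    "hash": lambda value, entity_type: hashlib.sha256(
        value.encode("utf-8")
    ).hexdigest(),
    "mask": lambda value, entity_type: mask(value),
}


def anonymize(text, spans, method):
    """Apply one method to non-overlapping (start, end, entity_type) spans."""
    operator = OPERATORS[method]
    # Work right to left so earlier offsets stay valid after each substitution.
    for start, end, entity_type in sorted(spans, reverse=True):
        text = text[:start] + operator(text[start:end], entity_type) + text[end:]
    return text
```

Because SHA-256 is deterministic, hashing the same value twice yields the same digest, which is what makes the hash method usable for consistent pseudonymization. Note that an unsalted hash of a low-entropy value (such as a short phone number) can be guessed by brute force, so this sketch should not be read as a statement about how the platform hardens its hashes.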

Multi-Language Support

cloak.business supports 48 user interface languages and detects PII across multiple language families.

NLP Language Coverage

Engine	Languages	Examples
spaCy	25	English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Japanese, Chinese, Korean, and more
Stanza NER	7	Arabic, Farsi, Hebrew, Hindi, Turkish, Ukrainian, Vietnamese
XLM-RoBERTa	16	Cross-lingual transformer covering European, Asian, and Middle Eastern languages

Additional Capabilities

Right-to-left (RTL) support — Arabic, Hebrew, Farsi, and Urdu are fully supported in the UI and detection pipeline
Country-specific patterns — Recognizers are tailored to each country's ID formats, phone patterns, and naming conventions
OCR in 37 languages — Image text extraction supports 37 Tesseract OCR language packs

Image Anonymization

cloak.business can detect and redact PII directly from images — scanned documents, screenshots, photos of ID cards, forms, and more.

OCR-powered text extraction using 37 Tesseract language packs
Same 317 pattern recognizers applied to extracted text
Bounding box redaction — detected PII is covered with colored rectangles on the image
EXIF orientation correction — phone photos are automatically rotated before processing
Adjacent box merging — multi-word entities (like full names) are merged into a single redaction box
Supports PNG, JPEG, BMP, and TIFF formats

See Image Anonymization for full details.
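Adjacent box merging can be illustrated with a small geometric sketch. It assumes OCR word boxes in pixel coordinates, a hypothetical `max_gap` of 15 pixels, and a "same line" test based on vertical overlap. The platform's actual merging rules are not documented here, and the input is assumed to contain only boxes that belong to detected PII.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: int
    top: int
    right: int
    bottom: int


def same_line(a, b):
    """Treat boxes as one text line if they overlap vertically by at least
    half the height of the shorter box."""
    overlap = min(a.bottom, b.bottom) - max(a.top, b.top)
    return overlap >= 0.5 * min(a.bottom - a.top, b.bottom - b.top)


def merge_adjacent(boxes, max_gap=15):
    """Merge boxes on the same line whose horizontal gap is at most max_gap."""
    merged = []
    for box in sorted(boxes, key=lambda b: b.left):
        for i, prev in enumerate(merged):
            if same_line(prev, box) and box.left - prev.right <= max_gap:
                # Grow the earlier box to cover both words.
                merged[i] = Box(
                    min(prev.left, box.left),
                    min(prev.top, box.top),
                    max(prev.right, box.right),
                    max(prev.bottom, box.bottom),
                )
                break
        else:
            merged.append(box)
    return merged
```

For example, the word boxes for "Jane" and "Doe" on one line collapse into a single rectangle, while a box on a different line stays separate.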

Country Presets

To simplify configuration, cloak.business provides 85+ presets that pre-select the relevant entity types for a given country, region, or industry.

Country presets — Select "Germany" and get all German ID, tax, financial, and phone patterns activated automatically
Regional presets — "European Union", "Asia-Pacific", "Americas" cover multiple countries at once
Industry presets — Healthcare, Finance, Technology, Legal, and more — each tailored to the entity types most relevant in that sector

Presets can be combined and customized. Start with a preset, then add or remove individual entity types as needed.
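Combining and customizing presets amounts to set arithmetic over entity types. The sketch below assumes a hypothetical `PRESETS` table with made-up entity type names; the real preset contents are not listed in this document.

```python
# Hypothetical preset contents, for illustration only.
PRESETS = {
    "Germany": {"DE_TAX_ID", "DE_ID_CARD", "IBAN", "PHONE_NUMBER"},
    "Healthcare": {"MEDICAL_ID", "PERSON", "PHONE_NUMBER"},
}


def resolve_entities(preset_names, add=(), remove=()):
    """Union the chosen presets, then apply individual additions and removals."""
    selected = set()
    for name in preset_names:
        selected |= PRESETS[name]
    return (selected | set(add)) - set(remove)
```

Calling `resolve_entities(["Germany", "Healthcare"], add=["EMAIL_ADDRESS"], remove=["PHONE_NUMBER"])` starts from both presets, adds one entity type, and drops another. Removals are applied last in this sketch, so an entity type named in both `add` and `remove` ends up excluded.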

Batch Processing

Process multiple documents in a single operation:

Upload several text files or paste multiple text blocks
All documents are processed with the same configuration (entity types, anonymization method, confidence threshold)
Results are returned individually for each document
Useful for bulk document sanitization workflows
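The batch behaviour described above (one shared configuration, one result per document) might look like this in outline. `BatchConfig`, `process_batch`, and the stand-in anonymizer are hypothetical names used only for illustration; the stand-in handles just e-mail replacement and ignores the confidence threshold.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchConfig:
    entity_types: tuple
    method: str
    min_confidence: float


def anonymize_stub(text, config):
    """Stand-in for the real pipeline: replaces e-mail addresses only."""
    if "EMAIL_ADDRESS" not in config.entity_types:
        return text
    return re.sub(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", "<EMAIL_ADDRESS>", text)


def process_batch(documents, config, anonymize=anonymize_stub):
    """Apply the same config to every document and keep results separate."""
    return {name: anonymize(text, config) for name, text in documents.items()}
```

Keying the results by document name keeps each output traceable to its input, which matters when a batch mixes documents that contain PII with ones that do not.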

Token-Based Pricing

cloak.business uses a simple token system:

1 token ≈ 1 character analyzed
Text analysis: token consumption scales with the length of the text analyzed
Image analysis: fixed token cost per image
Free tier: 200 tokens per billing cycle — no credit card required
Pro and Business plans offer higher token allocations with additional features

See Pricing & Plans for full details.
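A rough token estimate follows directly from the rules above. In this sketch the per-image cost is a placeholder parameter, because the actual fixed cost is not stated in this document, and `estimate_tokens` is a hypothetical helper rather than part of any official client.

```python
FREE_TIER_TOKENS = 200  # per billing cycle, as stated above


def estimate_tokens(texts, image_count=0, tokens_per_image=50):
    """Approximate usage: about 1 token per character plus a flat image cost.

    tokens_per_image is a placeholder; check the pricing page for the real value.
    """
    text_tokens = sum(len(text) for text in texts)
    return text_tokens + image_count * tokens_per_image


def fits_free_tier(texts, image_count=0, tokens_per_image=50):
    """True if the estimated usage stays within the free allocation."""
    return estimate_tokens(texts, image_count, tokens_per_image) <= FREE_TIER_TOKENS
```

Two text blocks of 120 and 60 characters come to roughly 180 tokens, which fits the free tier; adding one image at the placeholder cost of 50 would exceed it.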

NLP Model Privacy

A critical differentiator: all NLP models run on cloak.business's own servers in a German data center. This means:

spaCy, Stanza, and XLM-RoBERTa models are hosted and operated by cloak.business
No text is sent to Meta (XLM-RoBERTa's creator), Stanford (Stanza's creator), or any other third party
Your data never leaves the EU
Models are not fine-tuned on user data

This is not a wrapper around a third-party API. The models run on infrastructure that cloak.business owns and controls.