Platform Features

Last Updated: 2026-02-12


PII Detection

cloak.business combines deterministic pattern matching with machine-learning NLP to detect personally identifiable information across text, images, and structured data.

Pattern-Based Detection

  • 317 custom regex recognizers covering structured data: national IDs, passports, tax numbers, credit cards, IBANs, phone numbers, emails, IP addresses, medical IDs, license plates, and more
  • 390+ entity types across 75+ countries
  • Backend-enforced request limits protect against resource exhaustion (max 250 entity filters, 50 ad-hoc recognizers, 200 total regex patterns per request)
  • Deterministic: the same input always produces the same result
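As a rough illustration of deterministic pattern matching, here is a minimal sketch with two simplified recognizers. The names `RECOGNIZERS` and `detect_patterns`, the entity labels, and both regexes are assumptions made for this example. They are far simpler than production recognizers and are not the cloak.business API.

```python
import re

# Hypothetical, simplified recognizers. Production patterns are stricter and
# typically add validation such as the IBAN mod-97 checksum.
RECOGNIZERS = {
    "EMAIL_ADDRESS": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IBAN_CODE": re.compile(
        r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"
    ),
}


def detect_patterns(text):
    """Return (entity_type, start, end, matched_text) tuples in text order."""
    hits = []
    for entity_type, pattern in RECOGNIZERS.items():
        for match in pattern.finditer(text):
            hits.append((entity_type, match.start(), match.end(), match.group()))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


sample = "Contact jane@example.com, IBAN DE89 3704 0044 0532 0130 00."
results = detect_patterns(sample)
for entity_type, start, end, value in results:
    print(entity_type, start, end, value)
```

Because the function depends only on its input and a fixed set of patterns, calling it twice on the same text returns identical results.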

NLP-Based Detection

  • Catches unstructured PII that has no fixed format — names, locations, organizations
  • Three NLP engines: spaCy (25 languages), Stanza NER (7 languages), XLM-RoBERTa (16 languages)
  • All models run on cloak.business's own servers in Germany; no data is sent to third parties

Confidence Scoring

Every detected entity receives a confidence score between 0 and 1. Scores are determined by:

  • Pattern strength — How specific and unambiguous the regex pattern is
  • Context word analysis — Surrounding words that reinforce or weaken a match (e.g., "passport number:" before a pattern increases confidence)
  • NLP model confidence — The probability assigned by the NLP engine

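Users can set a minimum confidence threshold to control sensitivity: a higher threshold returns fewer, higher-confidence detections, while a lower one catches more candidates at the cost of more false positives.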
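One way these factors could combine is sketched below. The additive boost, the 20-character context window, the context word list, and the names `Detection`, `score`, and `filter_by_threshold` are all assumptions for illustration. The actual scoring formula is not documented here.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    entity_type: str
    start: int
    end: int
    base_score: float  # pattern strength or NLP model probability


# Hypothetical context words and boost size, chosen only for illustration.
CONTEXT_WORDS = {"PASSPORT": ("passport",)}
CONTEXT_BOOST = 0.35
WINDOW = 20  # characters inspected before the match


def score(text, det):
    """Raise the base score when a context word precedes the match; cap at 1.0."""
    before = text[max(0, det.start - WINDOW):det.start].lower()
    words = CONTEXT_WORDS.get(det.entity_type, ())
    boost = CONTEXT_BOOST if any(word in before for word in words) else 0.0
    return min(1.0, det.base_score + boost)


def filter_by_threshold(text, detections, threshold):
    """Keep only detections whose final score meets the minimum threshold."""
    return [d for d in detections if score(text, d) >= threshold]


text = "Passport number: X1234567. Order ref X7654321."
first = text.index("X1234567")
second = text.index("X7654321")
detections = [
    Detection("PASSPORT", first, first + 8, 0.4),
    Detection("PASSPORT", second, second + 8, 0.4),
]
kept = filter_by_threshold(text, detections, threshold=0.6)
```

With a threshold of 0.6, only the match preceded by "Passport number:" survives. The second match has the same shape but no supporting context, so it stays at its base score.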


Anonymization Methods

Once PII is detected, five anonymization methods are available:

| Method | Description | Example |
|---|---|---|
| Replace | Substitutes the entity with a type label | Jane Doe → <PERSON> |
| Redact | Removes the entity entirely | Call 555-0123 → Call |
| Hash (SHA-256) | Produces a one-way cryptographic hash | Jane Doe → 8f14e45f... |
| Encrypt (AES-256-GCM) | Reversible encryption that can be decrypted with the session key | Jane Doe → eyJhbGci... |
| Mask | Partially obscures the value | DE89 3704 0044 0532 0130 00 → **** **** **** **** **30 00 |

  • Replace is ideal for readability — documents remain human-readable.
  • Hash is useful for consistent pseudonymization — the same value always maps to the same hash.
  • Encrypt supports reversible workflows — authorized users can restore the original text.
  • Mask preserves partial information for verification purposes.
  • Redact removes PII completely with no trace.
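The following sketch shows how four of these methods could be applied to detected spans using only the Python standard library. The `OPERATORS` table, the `anonymize` and `mask` helpers, and the rule of keeping the last four characters visible are assumptions for illustration. Encrypt is left out because AES-256-GCM requires a third-party cryptography library. Spans are assumed not to overlap.

```python
import hashlib


def mask(value, visible=4, mask_char="*"):
    """Hide all but the last `visible` letters/digits; keep spaces and punctuation."""
    total = sum(ch.isalnum() for ch in value)
    out, seen = [], 0
    for ch in value:
        if ch.isalnum():
            seen += 1
            out.append(ch if seen > total - visible else mask_char)
        else:
            out.append(ch)
    return "".join(out)


# Each operator takes (value, entity_type) and returns the replacement string.
OPERATORS = {
    "replace": lambda value, entity_type: f"<{entity_type}>",
    "redact": lambda value, entity_type: "",
    "hash": lambda value, entity_type: hashlib.sha256(value.encode("utf-8")).hexdigest(),
    "mask": lambda value, entity_type: mask(value),
}


def anonymize(text, spans, method):
    """Apply `method` to every (start, end, entity_type) span.

    Spans are processed right to left so earlier offsets stay valid.
    """
    operator = OPERATORS[method]
    for start, end, entity_type in sorted(spans, reverse=True):
        text = text[:start] + operator(text[start:end], entity_type) + text[end:]
    return text


text = "Call Jane Doe"
spans = [(5, 13, "PERSON")]
print(anonymize(text, spans, "replace"))    # Call <PERSON>
print(mask("DE89 3704 0044 0532 0130 00"))  # **** **** **** **** **30 00
```

Hashing the same value always yields the same digest, which is what makes it usable for consistent pseudonymization across documents.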

Multi-Language Support

cloak.business supports 48 user interface languages and detects PII across multiple language families.

NLP Language Coverage

| Engine | Languages | Examples |
|---|---|---|
| spaCy | 25 | English, German, French, Spanish, Italian, Portuguese, Dutch, Polish, Japanese, Chinese, Korean, and more |
| Stanza NER | 7 | Arabic, Farsi, Hebrew, Hindi, Turkish, Ukrainian, Vietnamese |
| XLM-RoBERTa | 16 | Cross-lingual transformer covering European, Asian, and Middle Eastern languages |

Additional Capabilities

  • Right-to-left (RTL) support — Arabic, Hebrew, Farsi, and Urdu are fully supported in the UI and detection pipeline
  • Country-specific patterns — Recognizers are tailored to each country's ID formats, phone patterns, and naming conventions
  • OCR in 37 languages — Image text extraction supports 37 Tesseract OCR language packs

Image Anonymization

cloak.business can detect and redact PII directly from images — scanned documents, screenshots, photos of ID cards, forms, and more.

  • OCR-powered text extraction using 37 Tesseract language packs
  • Same 317 pattern recognizers applied to extracted text
  • Bounding box redaction — detected PII is covered with colored rectangles on the image
  • EXIF orientation correction — phone photos are automatically rotated before processing
  • Adjacent box merging — multi-word entities (like full names) are merged into a single redaction box
  • Supports PNG, JPEG, BMP, and TIFF formats
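Adjacent box merging can be sketched as follows, assuming the input is the list of OCR word boxes already flagged as PII. The `Box` type, the half-height overlap rule for "same line", and the 15-pixel gap limit are illustrative assumptions, not the documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: int
    top: int
    right: int
    bottom: int


def union(a, b):
    """Smallest box covering both a and b."""
    return Box(min(a.left, b.left), min(a.top, b.top),
               max(a.right, b.right), max(a.bottom, b.bottom))


def should_merge(a, b, max_gap):
    """True when boxes share a text line and sit within max_gap pixels horizontally."""
    overlap = min(a.bottom, b.bottom) - max(a.top, b.top)
    shorter = min(a.bottom - a.top, b.bottom - b.top)
    gap = max(a.left, b.left) - min(a.right, b.right)  # negative if they overlap
    return overlap >= 0.5 * shorter and gap <= max_gap


def merge_adjacent(boxes, max_gap=15):
    """Repeatedly merge any qualifying pair until no more merges are possible."""
    merged = list(boxes)
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if should_merge(merged[i], merged[j], max_gap):
                    merged[i] = union(merged[i], merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return sorted(merged, key=lambda b: (b.top, b.left))


word_boxes = [Box(10, 10, 50, 30), Box(56, 11, 90, 31), Box(10, 60, 80, 80)]
redaction_boxes = merge_adjacent(word_boxes)
```

The first two boxes (for example "Jane" and "Doe" on one line) collapse into a single rectangle, while the box on the lower line stays separate. The repeated pairwise scan is quadratic per pass, which is acceptable for the handful of boxes on a typical page.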

See Image Anonymization for full details.


Country Presets

To simplify configuration, cloak.business provides 131+ presets that pre-select the relevant entity types for a given country, region, or industry.

  • Country presets — Select "Germany" and get all German ID, tax, financial, and phone patterns activated automatically
  • Regional presets — "European Union", "Asia-Pacific", "Americas" cover multiple countries at once
  • Industry presets — Healthcare, Finance, Technology, Legal, and more — each tailored to the entity types most relevant in that sector

Presets can be combined and customized. Start with a preset, then add or remove individual entity types as needed.
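Conceptually, combining and customizing presets is set arithmetic over entity types. The sketch below assumes hypothetical preset names and contents; the real presets and entity labels may differ.

```python
# Hypothetical preset contents, for illustration only.
PRESETS = {
    "germany": {"DE_TAX_ID", "DE_ID_CARD", "IBAN_CODE", "PHONE_NUMBER"},
    "healthcare": {"MEDICAL_ID", "PERSON", "DATE_OF_BIRTH"},
}


def build_entity_selection(preset_names, add=(), remove=()):
    """Union the chosen presets, then apply individual additions and removals."""
    selected = set()
    for name in preset_names:
        selected |= PRESETS[name]
    return (selected | set(add)) - set(remove)


selection = build_entity_selection(
    ["germany", "healthcare"],
    add=["EMAIL_ADDRESS"],
    remove=["PHONE_NUMBER"],
)
print(sorted(selection))
```

Removals are applied last, so an entity type listed in `remove` is excluded even if a preset or the `add` list includes it.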


Batch Processing

Process multiple documents in a single operation:

  • Upload several text files or paste multiple text blocks
  • All documents are processed with the same configuration (entity types, anonymization method, confidence threshold)
  • Results are returned individually for each document
  • Useful for bulk document sanitization workflows
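The batch model described above, one shared configuration and one result per document, might look like the sketch below. `Config`, `anonymize_one`, and `process_batch` are hypothetical names, and `anonymize_one` is a stand-in that only replaces email addresses; it ignores the threshold and does not represent the real pipeline.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    entity_types: tuple
    method: str
    threshold: float


EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def anonymize_one(text, config):
    """Stand-in pipeline: handles only EMAIL_ADDRESS with the 'replace' method."""
    if "EMAIL_ADDRESS" in config.entity_types and config.method == "replace":
        return EMAIL.sub("<EMAIL_ADDRESS>", text)
    return text


def process_batch(documents, config):
    """Apply the same config to every document and return one result per document."""
    return {name: anonymize_one(text, config) for name, text in documents.items()}


config = Config(entity_types=("EMAIL_ADDRESS",), method="replace", threshold=0.5)
documents = {"a.txt": "Mail bob@example.org today", "b.txt": "No PII here"}
batch_results = process_batch(documents, config)
```

Keying the results by document name keeps each output traceable to its input, which matters when a batch mixes documents with and without PII.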

Token-Based Pricing

cloak.business uses a simple token system:

  • 1 token ≈ 1 character analyzed
  • Text analysis: tokens consumed based on text length
  • Image analysis: fixed token cost per image
  • Free tier: 200 tokens per billing cycle — no credit card required
  • Pro and Business plans offer higher token allocations with additional features
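A quick way to estimate usage before submitting text is to count characters. The helper names below are hypothetical, and `tokens_per_image` is a placeholder parameter because the fixed per-image cost is not stated here. Since one token only approximately equals one character, treat the result as an estimate.

```python
FREE_TIER_TOKENS = 200  # per billing cycle, as stated above


def estimate_tokens(texts, image_count=0, tokens_per_image=0):
    """Approximate usage: one token per character plus a flat cost per image.

    tokens_per_image is a placeholder; supply the real value from the pricing page.
    """
    return sum(len(text) for text in texts) + image_count * tokens_per_image


def fits_free_tier(texts, used_so_far=0):
    """True if the estimated text usage still fits the free allocation."""
    return used_so_far + estimate_tokens(texts) <= FREE_TIER_TOKENS


drafts = ["a" * 120, "b" * 50]
print(estimate_tokens(drafts))                 # 170
print(fits_free_tier(drafts))                  # True
print(fits_free_tier(drafts, used_so_far=40))  # False
```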

See Pricing & Plans for full details.


NLP Model Privacy

A critical differentiator: all NLP models run on cloak.business's own servers in a German data center. This means:

  • spaCy, Stanza, and XLM-RoBERTa models are hosted and operated by cloak.business
  • No text is sent to Meta (XLM-RoBERTa's creator), Stanford (Stanza's creator), or any other third party
  • Your data never leaves the EU
  • Models are not fine-tuned on user data

This is not a wrapper around a third-party API. The models run on infrastructure we own and control.