How It Works — Regex-First PII Detection

Regex-first PII detection: 317 deterministic pattern recognizers for structured data (IDs, tax numbers, credit cards), plus spaCy, Stanza, and XLM-RoBERTa NLP for names and locations across 48 languages.

Regex-First: Why It Matters

Our Approach: Regex + NLP

  • 317 regex recognizers: 100% reproducible for structured data
  • NLP for names & locations with confidence scores
  • Fully auditable — every detection traceable to a pattern or model
  • Transparent: you always know what matched and why
  • Fast, predictable performance
  • 48 languages across 3 NLP engines

AI-Only Approaches

  • All detections are probabilistic
  • Can't explain why something was flagged
  • Requires large training datasets
  • Difficult to audit for compliance
  • Higher compute costs (GPU needed)
  • Accuracy can degrade over time as real-world data drifts away from the training data

The 10-Step Process

From input to output, here's exactly what happens to your document

1

Input Text

Submit your document via web interface, API, or Office Add-in

2

Language Detection

System identifies the document language for optimal processing

3

Tokenization

Text is broken into tokens for pattern matching

4

Pattern Matching

317 regex recognizers and NLP models scan for 320+ entity types across 70+ countries
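To make the pattern-matching step concrete, here is a minimal sketch of how a regex recognizer can record which pattern produced each match. The recognizer names, entity labels, and the two simplified patterns are illustrative assumptions, not the product's actual recognizers.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    entity_type: str  # label such as "EMAIL_ADDRESS" (assumed naming)
    start: int        # character offset where the match begins
    end: int          # character offset just past the match
    text: str         # the matched text
    recognizer: str   # name of the pattern that fired, kept for auditing


# Hypothetical, deliberately simplified patterns for illustration only.
RECOGNIZERS = {
    "email_basic": ("EMAIL_ADDRESS", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    "us_ssn_dashed": ("US_SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
}


def scan(text: str) -> list[Detection]:
    """Run every recognizer over the text and return matches in document order."""
    found = []
    for name, (entity_type, pattern) in RECOGNIZERS.items():
        for m in pattern.finditer(text):
            found.append(Detection(entity_type, m.start(), m.end(), m.group(), name))
    return sorted(found, key=lambda d: d.start)


if __name__ == "__main__":
    for d in scan("Mail jane.doe@example.com, SSN 123-45-6789."):
        print(d.recognizer, d.entity_type, d.text)
```

Because each detection carries the name of the pattern that fired, the same input yields the same list on every run, and each entry can be traced back to one rule.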

5

Context Analysis

Surrounding text improves detection accuracy
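One common way to use surrounding text is to raise a match's score when a related keyword appears nearby. The sketch below assumes a simplified 16-digit card pattern, an invented keyword list, and arbitrary score values; the real context logic may work differently.

```python
import re

# Simplified 16-digit card pattern (assumption; real recognizers are stricter).
CARD_PATTERN = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")
# Hypothetical context keywords for this entity type.
CONTEXT_WORDS = ("card", "visa", "mastercard", "credit")


def score_with_context(text, base=0.5, boost=0.35, window=30):
    """Return (match, score) pairs; the score rises if a keyword is within `window` chars."""
    results = []
    lowered = text.lower()
    for m in CARD_PATTERN.finditer(text):
        before = lowered[max(0, m.start() - window):m.start()]
        after = lowered[m.end():m.end() + window]
        has_context = any(w in before or w in after for w in CONTEXT_WORDS)
        score = min(1.0, base + boost) if has_context else base
        results.append((m.group(), round(score, 2)))
    return results


if __name__ == "__main__":
    print(score_with_context("Credit card: 4111 1111 1111 1111"))
    print(score_with_context("Order ref 4111 1111 1111 1111 shipped"))
```

The same digits score higher next to "Credit card:" than next to "Order ref", which is the kind of signal a reviewer can later inspect.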

6

Confidence Scoring

Each detection receives a confidence score (0.0–1.0) enabling human-in-the-loop review decisions
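As an illustration of how a score can drive review, the sketch below splits detections into confident ones and ones flagged for a closer look. The 0.85 threshold and the dictionary shape are assumptions chosen for the example; nothing is discarded, so every detection remains available to the reviewer.

```python
def triage(detections, review_below=0.85):
    """Split detections into (confident, flagged) using an assumed threshold."""
    confident, flagged = [], []
    for d in detections:
        if d["score"] >= review_below:
            confident.append(d)
        else:
            flagged.append(d)
    # Show the least certain detections first so reviewers see them early.
    flagged.sort(key=lambda d: d["score"])
    return confident, flagged


if __name__ == "__main__":
    sample = [
        {"text": "4111 1111 1111 1111", "type": "CREDIT_CARD", "score": 1.0},
        {"text": "Jordan", "type": "PERSON", "score": 0.62},
        {"text": "Paris", "type": "LOCATION", "score": 0.41},
    ]
    confident, flagged = triage(sample)
    print([d["text"] for d in confident])
    print([d["text"] for d in flagged])
```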

7

Entity Classification

Detected items are categorized by type

8

Human-in-the-Loop Review

Review all detections, override false positives, and approve before anonymization

9

Apply Anonymization

Choose your method: Replace, Redact, Hash, Encrypt, Asymmetric Encrypt, Mask, or Keep
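The non-cryptographic methods can be sketched in a few lines. The function names, the `<TYPE>` placeholder format, and the choice of SHA-256 are assumptions for illustration; Encrypt and Asymmetric Encrypt are left out because they depend on key management and a cryptography library.

```python
import hashlib


def replace(value: str, entity_type: str) -> str:
    """Replace: swap the value for a placeholder naming its type (assumed format)."""
    return f"<{entity_type}>"


def redact(value: str) -> str:
    """Redact: remove the value entirely."""
    return ""


def mask(value: str, visible: int = 4, char: str = "*") -> str:
    """Mask: hide all but the last `visible` characters."""
    tail = value[-visible:] if visible > 0 else ""
    return char * (len(value) - len(tail)) + tail


def hash_value(value: str, salt: str = "") -> str:
    """Hash: one-way SHA-256 digest; equal inputs with equal salt give equal output."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def keep(value: str) -> str:
    """Keep: leave the value unchanged."""
    return value


if __name__ == "__main__":
    card = "4111111111111111"
    print(replace(card, "CREDIT_CARD"))
    print(mask(card))
    print(hash_value(card, salt="demo-salt"))
```

Hashing keeps equal values linkable across a document without revealing them, while masking preserves a recognizable suffix; which trade-off is right depends on how the output will be used.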

10

Output Document

Download your anonymized document

MCP Server: Privacy-First AI Integration

How your data flows through the MCP Server to keep AI tools safe

The MCP Server acts as a privacy shield, intercepting requests from AI tools, anonymizing PII, processing safe data through AI, and optionally restoring original values.

AI Tool Request

Your AI tool (Cursor, Claude) sends a request containing PII

MCP Server Intercepts

The server analyzes the request and detects PII entities

Anonymization

PII is replaced with tokens or redacted

AI Processing

AI receives and processes only anonymized data

Response Return

AI response comes back through MCP Server

De-tokenization

Optional: Original values restored for user
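The anonymize-then-restore round trip can be sketched as a reversible token mapping. This is a simplified illustration that handles only email addresses; the token format and function names are assumptions, not the MCP Server's actual interface.

```python
import re

# Simplified email pattern used only for this illustration.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def anonymize(text):
    """Replace each email with a token; return the safe text and the token mapping."""
    mapping = {}   # token -> original value
    reverse = {}   # original value -> token, so repeats reuse the same token

    def _substitute(match):
        value = match.group()
        if value not in reverse:
            token = f"<EMAIL_{len(mapping) + 1}>"
            mapping[token] = value
            reverse[value] = token
        return reverse[value]

    return EMAIL.sub(_substitute, text), mapping


def deanonymize(text, mapping):
    """Restore original values in a response that still contains the tokens."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


if __name__ == "__main__":
    prompt = "Write to bob@example.com and copy amy@example.org."
    safe, mapping = anonymize(prompt)
    print(safe)                          # this is what the AI tool would see
    print(deanonymize(safe, mapping))    # tokens swapped back for the user
```

The mapping stays on the server side of the flow, so the AI only ever sees tokens, and restoration is a plain lookup.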

Frequently Asked Questions

Does cloak.business use AI for detection?

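Not for structured data. Structured identifiers are detected with deterministic regex patterns, so the same input always produces the same matches. Names and locations are detected with NLP models (spaCy, Stanza, XLM-RoBERTa), which are statistical and attach a confidence score to each detection that you can review before anonymization.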

Why regex patterns instead of AI?

Regex patterns are auditable and reproducible, which simplifies compliance documentation: you can inspect exactly what each pattern matches. Detection that relies only on AI models is probabilistic, and results can change between model versions, which makes individual detections harder to explain and document.

How accurate is the detection?

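The 317 custom pattern recognizers validate matches beyond the pattern itself: Luhn checksums for card numbers, mod-97 checks for IBANs, and structural rules for identifiers such as US SSNs, which carry no check digit. This validation rejects digit strings that merely look like identifiers, so cloak.business achieves higher accuracy than generic NER models on structured identifiers like credit cards, tax IDs, and national ID numbers.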
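For readers unfamiliar with checksum validation, here is a minimal sketch of the two standard algorithms named above: the Luhn check used for card numbers and the mod-97 check used for IBANs. The function names are illustrative, and the IBAN check does not verify country-specific lengths.

```python
def luhn_valid(number: str) -> bool:
    """Luhn check: double every second digit from the right and sum the results."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:      # every second digit, counting from the right
            d *= 2
            if d > 9:       # same as summing the two digits of the product
                d -= 9
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """IBAN check: move the first four characters to the end, map letters to numbers, test mod 97."""
    s = iban.replace(" ", "").upper()
    if not (15 <= len(s) <= 34) or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]
    # int(c, 36) maps 0-9 to themselves and A-Z to 10-35, as the IBAN standard requires.
    numeric = "".join(str(int(c, 36)) for c in rearranged)
    return int(numeric) % 97 == 1


if __name__ == "__main__":
    print(luhn_valid("4111 1111 1111 1111"))          # well-known test card number
    print(iban_valid("GB82 WEST 1234 5698 7654 32"))  # commonly cited example IBAN
```

A string of sixteen random digits passes a plain digit pattern but fails the Luhn check about nine times out of ten, which is why checksum validation cuts false positives on structured identifiers.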

Which languages are supported?

48 languages are supported with dedicated NLP models for named entity recognition. Pattern-based detection (regex) works across all languages since it matches character patterns regardless of language.

Can I add custom entity patterns?

Yes. The API supports custom recognizer definitions so you can add patterns for proprietary identifiers, internal reference numbers, or domain-specific data formats.

See It in Action

Try our PII detection and anonymization free with 200 tokens per cycle.