How Detection Works
Regex Pattern Matching (Structured PII)
317 custom PatternRecognizers use regex patterns to detect structured data such as national IDs, tax numbers, passport numbers, and driver's license numbers. Each pattern uses boundary assertions so that it does not match inside a longer token, for example an identifier in source code or a field in other structured data.
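As a rough illustration of what a boundary assertion does, the sketch below uses Python's standard `re` module rather than the product's own recognizers. The US SSN-style pattern and the `find_ssn` helper are assumptions made up for this example. They are not taken from the 317 shipped patterns.

```python
import re

# Hypothetical SSN-style pattern, for illustration only.
# (?<![\w-]) and (?![\w-]) are boundary assertions: the match must not be
# directly preceded or followed by a word character or a hyphen.
SSN_PATTERN = re.compile(r"(?<![\w-])\d{3}-\d{2}-\d{4}(?![\w-])")


def find_ssn(text: str) -> list[str]:
    """Return every substring that matches the bounded pattern."""
    return SSN_PATTERN.findall(text)


# A standalone number is matched.
print(find_ssn("SSN: 123-45-6789."))
# The same digits embedded in a longer identifier are not.
print(find_ssn("order_id=9123-45-67890"))
print(find_ssn("build-123-45-6789-rc1"))
```

Without the two assertions, the pattern would also fire on the order ID and the build tag, which is the kind of false match the boundaries are meant to prevent.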
NLP Named Entity Recognition (Names & Locations)
spaCy (25 languages), Stanza NER (7 languages), and XLM-RoBERTa transformers (16 languages) detect unstructured PII such as person names, locations, and organizations that regex alone cannot capture. All models run on our own servers in Germany. No data is ever sent to Meta, Google, Stanford, or any other third party.
Confidence Scoring
Each detection includes a confidence score between 0 and 1. Highly specific formats (e.g., German IBAN DE89 3704 0044 0532 0130 00) score 0.85 or higher, while generic digit patterns score 0.3 to 0.5 and rely on context words for confirmation.
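One way to picture this is a base score attached to each pattern. The sketch below is a minimal illustration under stated assumptions: the entity names, the two regexes, the exact score values, and the `detect` helper are all hypothetical, chosen only to fall inside the ranges described above.

```python
import re
from dataclasses import dataclass


@dataclass
class Detection:
    entity: str
    text: str
    score: float


# (entity name, pattern, base score). All three values are assumptions.
PATTERNS = [
    # Highly specific: "DE", 2 check digits, 18 digits in optional 4-digit groups.
    ("IBAN_DE", re.compile(r"\bDE\d{2}(?: ?\d{4}){4} ?\d{2}\b"), 0.85),
    # Generic: any standalone 9-digit run, so it starts with a low score.
    ("GENERIC_ID", re.compile(r"\b\d{9}\b"), 0.4),
]


def detect(text: str) -> list[Detection]:
    """Run every pattern and tag each match with that pattern's base score."""
    results = []
    for entity, pattern, base_score in PATTERNS:
        for match in pattern.finditer(text):
            results.append(Detection(entity, match.group(), base_score))
    return results


for d in detect("IBAN DE89 3704 0044 0532 0130 00, reference 123456789"):
    print(d.entity, d.score, d.text)
```

The IBAN match starts high because few strings have that shape by accident. The 9-digit match stays low until something else, such as a nearby context word, confirms it.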
Context Word Analysis
Each recognizer has context words in the relevant language (e.g., 'Personalausweis' for German IDs, 'kitambulisho' for Kenyan IDs). When context words appear near a match, the confidence score is boosted.
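The boost can be sketched as a lookup in a window of text around the match. Everything specific here is an assumption for illustration: the context word list, the 40-character window, the 0.25 boost, and the `boosted_score` helper are hypothetical and do not describe the product's actual settings.

```python
import re

# Hypothetical settings, chosen only for this example.
CONTEXT_WORDS = {"personalausweis", "ausweisnummer"}
WINDOW = 40    # characters inspected on each side of the match
BOOST = 0.25   # amount added when a context word is found


def boosted_score(text: str, start: int, end: int, base_score: float) -> float:
    """Raise base_score if a context word appears near text[start:end]."""
    window = text[max(0, start - WINDOW):end + WINDOW].lower()
    if any(word in window for word in CONTEXT_WORDS):
        return min(1.0, base_score + BOOST)  # never exceed 1.0
    return base_score


# Generic 9-character ID pattern, also an assumption.
ID_PATTERN = re.compile(r"\b[A-Z0-9]{9}\b")

for sample in ("Personalausweis: L01X00T47", "Build hash L01X00T47 deployed"):
    m = ID_PATTERN.search(sample)
    print(sample, "->", boosted_score(sample, m.start(), m.end(), 0.5))
```

The same nine characters score higher in the first sample only because 'Personalausweis' appears within the window.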
Supported Entity Types
Comprehensive coverage of personal information types across categories
Personal Identifiers
- Person Names
- Email Addresses
- Phone Numbers
- Date of Birth
- Age
- Gender
- Nationality
Financial Information
- Credit Card Numbers
- IBAN
- BIC/SWIFT
- Bank Account Numbers
- Tax IDs
- VAT Numbers
Government IDs
- Social Security Numbers (SSN)
- National ID Numbers
- Passport Numbers
- Driver's License Numbers
- Health Insurance IDs
Location Data
- Street Addresses
- Cities
- ZIP/Postal Codes
- Countries
- GPS Coordinates
Digital Identifiers
- IP Addresses (v4/v6)
- MAC Addresses
- URLs
- Domain Names
- User IDs
Organization Data
- Company Names
- Organization IDs
- Registration Numbers
- Department Names
Temporal Data
- Dates
- Times
- Date Ranges
- Timestamps
International Formats
- German ID (Personalausweis)
- UK National Insurance
- Spanish DNI/NIE
- Italian Codice Fiscale
- And 70+ more country-specific formats
Custom Entity Support
Need to detect custom patterns? Create your own entity types with regex patterns or use our AI-assisted pattern generator.
Manual Pattern Creation
Define regex patterns for proprietary identifiers like internal employee IDs, project codes, or custom reference numbers.
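A custom entity of this kind boils down to a name, a regex, and a score. The sketch below shows the idea in plain Python. The `CustomEntity` class, the `EMP-` followed by six digits format, the 0.8 score, and the `find_entities` helper are invented for illustration and are not the product's configuration format or API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomEntity:
    name: str
    pattern: re.Pattern
    score: float


# Hypothetical internal employee ID: "EMP-" followed by exactly six digits.
EMPLOYEE_ID = CustomEntity(
    name="EMPLOYEE_ID",
    pattern=re.compile(r"\bEMP-\d{6}\b"),
    score=0.8,
)


def find_entities(text: str, entity: CustomEntity) -> list[tuple[str, str, float]]:
    """Return (entity name, matched text, score) for each match in text."""
    return [(entity.name, m.group(), entity.score)
            for m in entity.pattern.finditer(text)]


print(find_entities("Ticket opened by EMP-004217 yesterday", EMPLOYEE_ID))
# The word boundaries keep the pattern from firing inside "TEMP-0042171".
print(find_entities("Sensor TEMP-0042171 offline", EMPLOYEE_ID))
```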
AI Pattern Generator
Describe what you want to detect in plain language, and our AI generates optimized regex patterns for you.