Why Regex-First?
For regulatory compliance, you need results you can explain and reproduce. Our regex-first approach keeps structured data detection fully deterministic, while NLP handles names and locations with transparent confidence scores.
Detailed Comparison
| | Regex-First (Us) | AI/ML-Based |
|---|---|---|
| Reproducibility | Structured data: 100% identical. Names: confidence-scored | Results can vary between runs |
| Auditability | Every detection traceable to pattern or NLP model | Black box: decisions are hard to explain |
| Training Data | Regex: none. NLP: pre-trained models included | Requires custom training datasets |
| Model Drift | Regex: none. NLP: versioned, stable models | Degrades unpredictably over time |
| Performance | Fast, CPU only | Variable, GPU-dependent |
| Compute Cost | Low (CPU only) | High (GPU often needed) |
| Regulatory Compliance | Straightforward: patterns and confidence scores are auditable | Difficult to prove to regulators |
How Pattern Matching Works
Each entity type has carefully crafted regex patterns that match specific formats.
Email Addresses
Matches standard email format: local-part@domain.tld
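As a rough illustration of the local-part@domain.tld idea, here is a minimal Python sketch. The pattern and the `find_emails` helper are assumptions made for this example, not the product's actual pattern, and the regex is deliberately simpler than full RFC 5322 address syntax.

```python
import re

# Illustrative pattern (an assumption, not the shipped one):
# local-part, "@", domain labels, and a TLD of at least two letters.
EMAIL_PATTERN = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def find_emails(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, matched_text) for each email-like substring."""
    return [(m.start(), m.end(), m.group()) for m in EMAIL_PATTERN.finditer(text)]


sample = "Contact john.smith@company.com for details."
print(find_emails(sample))  # [(8, 30, 'john.smith@company.com')]
```

Because the pattern is fixed, the same input always yields the same spans, which is the reproducibility property described earlier.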
Credit Card Numbers
Matches Visa, Mastercard, Amex, and other card formats, then confirms each candidate with Luhn checksum validation.
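The two-step idea (match a candidate, then check it with Luhn) can be sketched in Python as follows. The candidate pattern and the function names are hypothetical and chosen for illustration; a production pattern would likely also check issuer prefixes and lengths per card brand.

```python
import re

# Assumed candidate pattern: 13 to 19 digits, optionally separated
# by single spaces or hyphens. Not the product's actual pattern.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return normalized digit strings that match and pass Luhn."""
    hits = []
    for m in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


# 4111 1111 1111 1111 is a widely used Visa test number.
print(find_card_numbers("Card 4111 1111 1111 1111 on file"))  # ['4111111111111111']
print(find_card_numbers("Order 4111 1111 1111 1112"))          # []
```

The checksum step filters out digit runs that merely look like card numbers, such as order or tracking IDs.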
German IBAN
Matches German IBAN format with optional spaces
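A German IBAN is `DE`, two check digits, and 18 further digits, commonly written in groups of four. The Python sketch below shows one way to match that format with optional spaces. The pattern and helper names are assumptions for illustration. The mod-97 checksum helper is an optional extra that is not described above, included only to show how a format match could be tightened.

```python
import re

# Assumed pattern: "DE", 2 check digits, then 18 digits,
# with an optional single space between groups of four.
IBAN_DE = re.compile(r"\bDE\d{2}(?: ?\d{4}){4} ?\d{2}\b")


def find_german_ibans(text: str) -> list[str]:
    """Return matched IBANs with spaces removed."""
    return [m.group().replace(" ", "") for m in IBAN_DE.finditer(text)]


def iban_checksum_ok(iban: str) -> bool:
    """Optional ISO 13616 mod-97 check on a space-free IBAN."""
    rearranged = iban[4:] + iban[:4]
    # Letters map to numbers: A=10 ... Z=35; digits stay as they are.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


# DE89 3704 0044 0532 0130 00 is the commonly published example IBAN.
found = find_german_ibans("IBAN: DE89 3704 0044 0532 0130 00")
print(found)                        # ['DE89370400440532013000']
print(iban_checksum_ok(found[0]))   # True
```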
Built for Compliance
When auditors ask "why was this detected?" you need a clear answer. Regex detections trace to a specific pattern. NLP detections include model name and confidence score.
- GDPR Article 25: Privacy by design with explainable processing
- ISO 27001: Documented, repeatable processes
- Audit Trail: Every detection can be traced to a specific pattern or NLP model
Example Audit Response
Q: Why was "john.smith@company.com" flagged?
A: Matched email pattern at position 45-68 with confidence 0.95. Pattern: standard email format validation.
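An answer like this can be generated mechanically if each detection is stored with its pattern identifier and character span. The following Python sketch shows one possible shape for such a record. The `Detection` class, the `email_v1` identifier, and the `explain` helper are hypothetical names invented for this example, and the sample text and positions are illustrative.

```python
import re
from dataclasses import dataclass

# Illustrative email pattern; an assumption, not the shipped one.
EMAIL_PATTERN = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


@dataclass(frozen=True)
class Detection:
    entity_type: str  # e.g. "EMAIL"
    source: str       # pattern id (or model name for NLP detections)
    start: int        # start offset, inclusive
    end: int          # end offset, exclusive
    text: str         # the matched substring


def detect_emails(text: str) -> list[Detection]:
    """Record every email match together with the pattern that produced it."""
    return [
        Detection("EMAIL", "email_v1", m.start(), m.end(), m.group())
        for m in EMAIL_PATTERN.finditer(text)
    ]


def explain(d: Detection) -> str:
    """Render a human-readable audit answer from a stored detection."""
    return (
        f'"{d.text}" flagged as {d.entity_type}: matched pattern '
        f"{d.source} at characters {d.start}-{d.end} (end exclusive)."
    )


for det in detect_emails("Contact john.smith@company.com today"):
    print(explain(det))
```

Storing the pattern identifier alongside the span means the same answer can be reproduced later, provided the pattern set is versioned.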