48 Languages Supported
Full PII detection and anonymization across the entire platform
spaCy NLP - Runs Locally (25 languages)
Stanza NER - Runs Locally (7 languages)
XLM-RoBERTa Transformer - Runs Locally (16 languages)
RTL Support
Powered by Advanced NLP
Three NLP engines working together for maximum language coverage
- Lazy-loaded models (max 5 cached) for memory efficiency
- Automatic language detection
- Mixed-language document processing
- Language-specific entity patterns
Country-Specific Formats
We detect PII in formats specific to each country and region.
European Formats
- German: Personalausweis, Steuer-ID, Reisepass
- French: NIR, Carte Nationale, Permis
- Italian: Codice Fiscale, Carta d'Identità
- Spanish: DNI, NIE, NIF
- Dutch: BSN, Rijbewijs
- Polish: PESEL, NIP, REGON
Asia-Pacific Formats
- Japan: My Number, Passport
- India: Aadhaar, PAN, GSTIN, Vehicle Registration
- Thailand: National ID, Tax ID, Passport
- Indonesia: NIK, NPWP, Passport
- Vietnam: CCCD, Tax Code, Passport
- Malaysia: MyKad, Tax ID, Passport
Americas, UK, Australia & Africa
- US: SSN, Driver's License, Passport
- UK: National Insurance, NHS Number
- Canada: SIN, Driver's License
- Australia: TFN, Medicare, ABN
- Kenya: National ID, KRA PIN, Passport
- South Africa: ID Number, Tax Number, Passport
Frequently Asked Questions
Which 48 languages does cloak.business support?
cloak.business supports Afrikaans, Arabic, Armenian, Basque, Bengali, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Macedonian, Malay, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Thai, Turkish, Ukrainian, Urdu, and Vietnamese — with full RTL support for Arabic, Hebrew, Persian, and Urdu.
Does PII detection work the same in all languages?
Not entirely. Detection combines two approaches: regex-based pattern matching for structured data (IDs, phone numbers, tax numbers) and NLP models for unstructured entities (names, locations). Pattern-based detection covers all 48 languages, while NLP-based detection depends on the trained model available for each language.
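As a rough picture of how the two approaches combine, the sketch below runs a regex pass and an optional NLP pass over the same text and merges the findings. Everything here is an assumption made for illustration: the `Finding` type, the two patterns, and `toy_ner` (a stand-in for a real NER model) are hypothetical and far simpler than production recognizers.

```python
import re
from typing import Callable, List, NamedTuple, Optional, Tuple


class Finding(NamedTuple):
    start: int
    end: int
    entity_type: str
    source: str  # "pattern" or "nlp"


# Illustrative patterns only; real recognizers would be stricter.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "US_SSN": re.compile(r"\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b"),
}

NerFn = Callable[[str], List[Tuple[int, int, str]]]


def detect(text: str, ner: Optional[NerFn] = None) -> List[Finding]:
    """Run the pattern pass, then the NLP pass if a model is available."""
    findings = [
        Finding(m.start(), m.end(), name, "pattern")
        for name, rx in PATTERNS.items()
        for m in rx.finditer(text)
    ]
    if ner is not None:
        findings += [Finding(s, e, label, "nlp") for s, e, label in ner(text)]
    return sorted(findings)  # ordered by position in the text


def toy_ner(text: str) -> List[Tuple[int, int, str]]:
    """Stand-in for a trained NER model: 'recognizes' one hard-coded name."""
    i = text.find("Ana")
    return [(i, i + 3, "PERSON")] if i >= 0 else []


sample = "Contact Ana at ana@example.com, SSN 123-45-6789."
with_nlp = detect(sample, ner=toy_ner)
pattern_only = detect(sample)  # what a language without a trained model would get
```

With no NER function supplied, only the structured identifiers are found, which mirrors the pattern-only coverage described above.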
How are country-specific ID formats handled?
cloak.business includes 317 pattern recognizers covering 70+ countries. Each recognizer validates the specific format, checksum, and structure of national IDs, tax numbers, health identifiers, and financial data for that country.
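To make "format, checksum, and structure" concrete, here is a minimal sketch of two publicly documented checksum rules: the Dutch BSN "11-test" and the Polish PESEL check digit. The function names are hypothetical and this is not the cloak.business recognizer code; a production validator would typically add further checks, such as rejecting placeholder values.

```python
import re


def is_valid_bsn(value: str) -> bool:
    """Dutch BSN: 9 digits whose weighted sum is divisible by 11."""
    if not re.fullmatch(r"[0-9]{9}", value):
        return False
    weights = (9, 8, 7, 6, 5, 4, 3, 2, -1)
    total = sum(w * int(d) for w, d in zip(weights, value))
    return total % 11 == 0


def is_valid_pesel(value: str) -> bool:
    """Polish PESEL: 11 digits, the last one a weighted mod-10 check digit."""
    if not re.fullmatch(r"[0-9]{11}", value):
        return False
    weights = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)
    # zip stops after the first 10 digits, leaving the check digit out of the sum.
    total = sum(w * int(d) for w, d in zip(weights, value))
    return (10 - total % 10) % 10 == int(value[10])
```

Checksum validation of this kind lets a recognizer reject a nine- or eleven-digit number that merely looks like an ID, which reduces false positives compared with matching on digit count alone.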
Can I detect PII in multiple languages within the same document?
Yes. cloak.business can process multilingual documents and detect PII across different languages in a single request. The system automatically identifies which language patterns to apply.
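One simple way to picture mixed-language handling is to split a document into runs by writing system before choosing which patterns to apply. The sketch below does only that, using Unicode character names from the Python standard library. It is a coarse illustration, not the method cloak.business uses: script is not the same as language (Latin script alone covers dozens of languages), and the helper names are hypothetical.

```python
import unicodedata
from typing import List, Tuple

# Persian and Urdu are written in Arabic script, so two labels cover all four RTL languages.
RTL_SCRIPTS = {"ARABIC", "HEBREW"}


def script_of(ch: str) -> str:
    """Return a coarse script label taken from the character's Unicode name."""
    if not ch.isalpha():
        return "COMMON"  # digits, spaces, punctuation
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0] if name else "UNKNOWN"


def split_by_script(text: str) -> List[Tuple[str, str]]:
    """Split text into (script, segment) runs; neutral characters join the current run."""
    runs: List[List[str]] = []  # each item is [script, segment]
    for ch in text:
        script = script_of(ch)
        if runs and (script == "COMMON" or script == runs[-1][0]):
            runs[-1][1] += ch
        elif runs and runs[-1][0] == "COMMON":
            runs[-1][0] = script  # leading neutral characters adopt the first real script
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [(script, segment) for script, segment in runs]


mixed = "Name: Anna Schmidt, \u0627\u0644\u0627\u0633\u0645: \u0623\u062d\u0645\u062f"
segments = split_by_script(mixed)
directions = ["rtl" if script in RTL_SCRIPTS else "ltr" for script, _ in segments]
```

Each run could then be passed to a language identifier and to the recognizers for the candidate languages of that script.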
How do I add support for a new language or entity type?
You can create custom entity recognizers using regex patterns or deny lists. This allows you to add domain-specific identifiers or extend coverage to additional formats not yet included in the built-in recognizer library.
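The two recognizer styles mentioned above can be sketched in a few lines. This is a generic illustration, not the cloak.business SDK: `RegexRecognizer`, `DenyListRecognizer`, `Match`, and the example entity names are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Match:
    entity_type: str
    start: int
    end: int
    text: str


class RegexRecognizer:
    """Report every match of a single regex pattern as one entity type."""

    def __init__(self, entity_type: str, pattern: str) -> None:
        self.entity_type = entity_type
        self._rx = re.compile(pattern)

    def find(self, text: str) -> List[Match]:
        return [
            Match(self.entity_type, m.start(), m.end(), m.group())
            for m in self._rx.finditer(text)
        ]


class DenyListRecognizer(RegexRecognizer):
    """Match any term from a fixed list, case-insensitively, on word boundaries."""

    def __init__(self, entity_type: str, terms: List[str]) -> None:
        if not terms:
            raise ValueError("deny list must contain at least one term")
        # Longest terms first so a longer term wins over one of its prefixes.
        ordered = sorted(terms, key=len, reverse=True)
        alternatives = "|".join(re.escape(t) for t in ordered)
        super().__init__(entity_type, rf"(?i)\b(?:{alternatives})\b")


# Hypothetical domain-specific entities.
employee_id = RegexRecognizer("EMPLOYEE_ID", r"\bEMP-[0-9]{6}\b")
codenames = DenyListRecognizer("PROJECT_CODENAME", ["Project Falcon", "Bluebird"])

memo = "EMP-004217 joined project falcon last week."
matches = employee_id.find(memo) + codenames.find(memo)
```

A regex suits identifiers with a predictable shape, while a deny list suits a closed set of known terms such as internal project names.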
Explore Related Features
Multi-language detection works seamlessly with all cloak.business products.
Chrome Extension
Anonymize AI prompts in ChatGPT, Claude, Gemini, and 3 more AI platforms — in any of 48 supported languages.
PII Anonymization API
REST API with JavaScript and Python SDKs. Full multi-language support built in.
Reversible Encryption
Encrypt PII with AES-256-GCM and restore original data anytime with your key.
Is This Right for You?
Best For
- Global enterprises with multilingual document workflows that require consistent GDPR and privacy compliance
- Translation and localization agencies that process PII-containing content in multiple languages
- Government agencies and NGOs processing citizen data across EU, APAC, and LATAM jurisdictions
- Legal discovery and compliance teams working across jurisdictions covered by the 48 supported languages
Not Ideal For
- English-only workflows: the standard plan is sufficient and avoids the overhead of language detection
- Languages outside the supported 48: check the entity catalog for specific language and entity coverage
- Real-time workloads that require sub-10 ms latency: language detection adds processing overhead compared with English-only processing