Multi-Language Support — 48 Languages

Detect and anonymize PII in 48 languages with native pattern support. Full right-to-left (RTL) support for Arabic, Hebrew, Persian, and Urdu.

48 Languages Supported

Full PII detection and anonymization across the entire platform

spaCy NLP - Runs Locally (25 languages)

English, German, Spanish, French, Italian, Portuguese, Dutch, Polish, Russian, Japanese, Chinese, Korean, Romanian, Greek, Croatian, Slovenian, Macedonian, Swedish, Danish, Norwegian, Finnish, Ukrainian, Lithuanian, Catalan, Turkish

Stanza NER - Runs Locally (7 languages)

Bulgarian, Hungarian, Hebrew (RTL), Vietnamese, Afrikaans, Armenian, Basque

XLM-RoBERTa Transformer - Runs Locally (16 languages)

Arabic (RTL), Hindi, Czech, Slovak, Indonesian, Thai, Persian (RTL), Serbian, Latvian, Estonian, Malay, Bengali, Urdu (RTL), Swahili, Tagalog, Icelandic
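The engine lists above amount to a lookup from language to NLP engine. As a rough illustration only, the mapping could be sketched like this in Python. The ISO 639-1 codes, the engine labels, and the `engine_for` helper are assumptions made for this example and are not part of the cloak.business API.

```python
# Engine assignments copied from the language lists above.
# The language codes and all names here are illustrative assumptions.
ENGINE_LANGUAGES = {
    "spacy": [
        "en", "de", "es", "fr", "it", "pt", "nl", "pl", "ru", "ja",
        "zh", "ko", "ro", "el", "hr", "sl", "mk", "sv", "da", "no",
        "fi", "uk", "lt", "ca", "tr",
    ],
    "stanza": ["bg", "hu", "he", "vi", "af", "hy", "eu"],
    "xlm-roberta": [
        "ar", "hi", "cs", "sk", "id", "th", "fa", "sr",
        "lv", "et", "ms", "bn", "ur", "sw", "tl", "is",
    ],
}

# Languages the page marks as right-to-left.
RTL_LANGUAGES = {"ar", "he", "fa", "ur"}

# Invert the table so each language code points at its engine.
LANGUAGE_TO_ENGINE = {
    lang: engine
    for engine, langs in ENGINE_LANGUAGES.items()
    for lang in langs
}


def engine_for(lang: str) -> str:
    """Return the engine label for a language code, or raise ValueError."""
    try:
        return LANGUAGE_TO_ENGINE[lang.lower()]
    except KeyError:
        raise ValueError(f"unsupported language code: {lang!r}") from None


if __name__ == "__main__":
    print(len(LANGUAGE_TO_ENGINE))   # 48
    print(engine_for("he"))          # stanza
    print("he" in RTL_LANGUAGES)     # True
```

Keeping the assignment in one table makes the 25 + 7 + 16 split easy to audit against the published lists.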

RTL Support

Arabic, Hebrew, Persian, Urdu

Powered by Advanced NLP

Three NLP engines working together for maximum language coverage

  • Lazy-loaded models (max 5 cached) for memory efficiency
  • Automatic language detection
  • Mixed-language document processing
  • Language-specific entity patterns
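To make the "lazy-loaded models (max 5 cached)" point concrete, here is a minimal Python sketch of one way such a cache could behave. The eviction policy is not stated on this page, so least-recently-used eviction is an assumption, and `ModelCache` and its `loader` callback are hypothetical names rather than the product's implementation.

```python
from collections import OrderedDict
from typing import Any, Callable, List


class ModelCache:
    """Loads a model on first use and keeps at most `max_models` in memory.

    Assumption: the least recently used model is evicted first.
    """

    def __init__(self, loader: Callable[[str], Any], max_models: int = 5):
        self._loader = loader          # hypothetical function: language -> model
        self._max_models = max_models
        self._models: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, lang: str) -> Any:
        if lang in self._models:
            # Cache hit: mark this model as most recently used.
            self._models.move_to_end(lang)
            return self._models[lang]
        # Cache miss: load lazily, then evict the oldest entry if over limit.
        model = self._loader(lang)
        self._models[lang] = model
        if len(self._models) > self._max_models:
            self._models.popitem(last=False)
        return model

    def cached_languages(self) -> List[str]:
        """Languages currently cached, least recently used first."""
        return list(self._models)


if __name__ == "__main__":
    cache = ModelCache(loader=lambda lang: f"model-{lang}")
    for lang in ["en", "de", "fr", "es", "it"]:
        cache.get(lang)
    cache.get("en")   # refreshes "en"
    cache.get("pl")   # sixth language, so "de" is evicted
    print(cache.cached_languages())  # ['fr', 'es', 'it', 'en', 'pl']
```

With a bound like this, memory use stays roughly constant no matter how many of the 48 languages a workload touches, at the cost of a reload when an evicted language is requested again.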

Country-Specific Formats

We detect PII in formats specific to each country and region.

European Formats

  • German: Personalausweis, Steuer-ID, Reisepass
  • French: NIR, Carte Nationale, Permis
  • Italian: Codice Fiscale, Carta d'Identità
  • Spanish: DNI, NIE, NIF
  • Dutch: BSN, Rijbewijs
  • Polish: PESEL, NIP, REGON

Asia-Pacific Formats

  • Japan: My Number, Passport
  • India: Aadhaar, PAN, GSTIN, Vehicle Registration
  • Thailand: National ID, Tax ID, Passport
  • Indonesia: NIK, NPWP, Passport
  • Vietnam: CCCD, Tax Code, Passport
  • Malaysia: MyKad, Tax ID, Passport

Americas, UK, Australia & Africa

  • US: SSN, Driver's License, Passport
  • UK: National Insurance, NHS Number
  • Canada: SIN, Driver's License
  • Australia: TFN, Medicare, ABN
  • Kenya: National ID, KRA PIN, Passport
  • South Africa: ID Number, Tax Number, Passport

Frequently Asked Questions

Which 48 languages does cloak.business support?

cloak.business supports Afrikaans, Arabic, Armenian, Basque, Bengali, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Macedonian, Malay, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Thai, Turkish, Ukrainian, Urdu, and Vietnamese. Arabic, Hebrew, Persian, and Urdu have full RTL support.

Does PII detection work the same in all languages?

Detection uses two approaches: regex-based pattern matching for structured data (IDs, phone numbers, tax numbers) and NLP models for unstructured entities (names, locations). Pattern-based detection covers all 48 languages. NLP-based detection is available in languages with trained models.
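The two approaches can be pictured as two passes whose findings are merged. The Python sketch below is an illustration under stated assumptions: the `Finding` type, the `detect` function, and the two deliberately simple patterns are hypothetical, and the NLP pass is represented by a pluggable callable rather than a real spaCy, Stanza, or XLM-RoBERTa model.

```python
import re
from typing import Callable, List, NamedTuple, Optional


class Finding(NamedTuple):
    entity_type: str
    start: int
    end: int
    text: str


# Simplified example patterns for structured data (assumptions, not the
# product's recognizers).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+\d{1,3}(?:[ -]?\d{2,4}){2,4}"),
}

# An NLP pass is any callable that maps text to findings (names, locations).
NerFunction = Callable[[str], List[Finding]]


def detect(text: str, ner: Optional[NerFunction] = None) -> List[Finding]:
    """Run the pattern pass, then the optional NLP pass, and merge by offset."""
    findings = [
        Finding(entity_type, m.start(), m.end(), m.group())
        for entity_type, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    if ner is not None:
        findings.extend(ner(text))
    return sorted(findings, key=lambda f: f.start)


if __name__ == "__main__":
    sample = "Contact Anna at anna@example.com or +49 30 1234567."

    def fake_ner(text: str) -> List[Finding]:
        # Stand-in for a trained model: tags one hard-coded name.
        start = text.index("Anna")
        return [Finding("PERSON", start, start + 4, "Anna")]

    for finding in detect(sample, ner=fake_ner):
        print(finding)
```

When `ner` is omitted, only the pattern pass runs, which mirrors the case of a language that has pattern coverage but no trained model.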

How are country-specific ID formats handled?

cloak.business includes 317 pattern recognizers covering 70+ countries. Each recognizer validates the specific format, checksum, and structure of national IDs, tax numbers, health identifiers, and financial data for that country.
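As an example of what validating format and checksum can mean in practice, the sketch below checks the Polish PESEL number using its public checksum rule (weights 1, 3, 7, 9 repeated over the first ten digits). It is an independent illustration, not one of the built-in recognizers, and the function names are assumptions. It does not verify that the encoded birth date is plausible, which a fuller structural check would also do.

```python
import re
from typing import List

# Eleven ASCII digits that are not part of a longer digit run.
PESEL_PATTERN = re.compile(r"(?<![0-9])[0-9]{11}(?![0-9])")
PESEL_WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)


def is_valid_pesel(candidate: str) -> bool:
    """Check the format (11 digits) and the checksum digit of a PESEL."""
    if not re.fullmatch(r"[0-9]{11}", candidate):
        return False
    digits = [int(ch) for ch in candidate]
    # zip stops after the ten weights, so the check digit is excluded.
    total = sum(w * d for w, d in zip(PESEL_WEIGHTS, digits))
    expected_check_digit = (10 - total % 10) % 10
    return digits[10] == expected_check_digit


def find_pesel(text: str) -> List[str]:
    """Return 11-digit candidates in `text` that pass the checksum."""
    return [
        m.group() for m in PESEL_PATTERN.finditer(text)
        if is_valid_pesel(m.group())
    ]


if __name__ == "__main__":
    print(is_valid_pesel("44051401359"))  # True
    print(is_valid_pesel("44051401358"))  # False
    print(find_pesel("PESEL: 44051401359, order 12345678901"))
    # ['44051401359']
```

The checksum step is what separates a real identifier from any other 11-digit number, such as the order number in the example, and so reduces false positives compared with a bare regex.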

Can I detect PII in multiple languages within the same document?

Yes. cloak.business can process multilingual documents and detect PII across different languages in a single request. The system automatically identifies which language patterns to apply.
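How the system identifies languages internally is not described here. As a simplified illustration of the first step of mixed-language handling, the Python sketch below splits text into runs by Unicode script using only the standard library. This is a heuristic assumption, not the product's method: script alone cannot tell Arabic from Persian or Urdu, or German from French, so a real pipeline would still need a language identification step per run.

```python
import unicodedata
from typing import List, Optional, Tuple

# Script prefixes as they appear in Unicode character names.
SCRIPTS = ("LATIN", "CYRILLIC", "ARABIC", "HEBREW", "GREEK")


def script_of(ch: str) -> Optional[str]:
    """Return a coarse script label for a letter, or None for neutral chars."""
    if not ch.isalpha():
        return None  # digits, spaces, and punctuation belong to no script
    name = unicodedata.name(ch, "")
    for script in SCRIPTS:
        if name.startswith(script):
            return script
    return "OTHER"


def split_by_script(text: str) -> List[Tuple[Optional[str], str]]:
    """Split text into (script, run) pairs; neutral chars join the current run."""
    runs: List[list] = []
    for ch in text:
        script = script_of(ch)
        if runs and (script is None or script == runs[-1][0]):
            runs[-1][1] += ch
        elif runs and runs[-1][0] is None:
            # A leading neutral run adopts the script of the first letter.
            runs[-1][0] = script
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [(script, run) for script, run in runs]


if __name__ == "__main__":
    for script, run in split_by_script("Name: Anna, \u0418\u0432\u0430\u043d"):
        print(script, repr(run))
```

Each run could then be handed to the patterns and model for its language, with offsets shifted back into the original document so that all findings come out of a single request.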

How do I add support for a new language or entity type?

You can create custom entity recognizers using regex patterns or deny lists. This allows you to add domain-specific identifiers or extend coverage to additional formats not yet included in the built-in recognizer library.
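The idea of a custom recognizer built from regex patterns or a deny list can be sketched in a few lines of Python. The `CustomRecognizer` class, its fields, and the example entity type are hypothetical names invented for this illustration; the actual configuration format used by cloak.business may differ.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CustomRecognizer:
    """Hypothetical recognizer: regex patterns plus an optional deny list."""

    entity_type: str
    patterns: List[str] = field(default_factory=list)   # regex sources
    deny_list: List[str] = field(default_factory=list)  # literal terms

    def find(self, text: str) -> List[Tuple[str, int, int, str]]:
        regexes = [re.compile(p) for p in self.patterns]
        if self.deny_list:
            # Escape literal terms; try longer terms first so that they
            # win over shorter terms they contain.
            terms = sorted(self.deny_list, key=len, reverse=True)
            alternatives = "|".join(re.escape(term) for term in terms)
            regexes.append(
                re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
            )
        # A set removes duplicate spans found by more than one regex.
        spans = {
            (m.start(), m.end())
            for regex in regexes
            for m in regex.finditer(text)
        }
        return [
            (self.entity_type, start, end, text[start:end])
            for start, end in sorted(spans)
        ]


if __name__ == "__main__":
    recognizer = CustomRecognizer(
        entity_type="INTERNAL_ID",
        patterns=[r"\bEMP-\d{6}\b"],      # made-up employee ID format
        deny_list=["Project Falcon"],     # made-up confidential codename
    )
    print(recognizer.find("EMP-004217 joined Project Falcon."))
    # [('INTERNAL_ID', 0, 10, 'EMP-004217'), ('INTERNAL_ID', 18, 32, 'Project Falcon')]
```

Regex patterns suit identifiers with a predictable shape, while a deny list suits a fixed set of known terms such as codenames or client names.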

Is This Right for You?

Best For

  • Global enterprises with multilingual document workflows requiring consistent GDPR and privacy compliance
  • Translation and localization agencies that process PII-containing content in multiple languages
  • Government agencies and NGOs processing citizen data across EU, APAC, and LATAM jurisdictions
  • Legal discovery and compliance teams working across jurisdictions covered by the 48 supported languages

Not Ideal For

  • English-only workflows: the standard plan is sufficient and avoids the overhead of language detection
  • Languages outside the supported 48: check the entity catalog for specific language and entity coverage
  • Real-time workloads that need sub-10 ms latency: language detection adds processing overhead compared with English-only processing

Anonymize in Any Language

Start with 200 free tokens. Works with all 48 languages.