PII

Schema-driven detector documentation.

PII · active · P0 · 10 params · 16 examples
Detector Metadata
Capability catalog entry from all_detectors.json.

Categories

PRIVACY, COMPLIANCE

Supported Asset Types

TXT, TABLE, URL

Recommended Model

presidio-analyzer

Notes

Identifies personal data (e.g., names, emails, IDs) that must be protected for privacy and compliance.

Parameters
Configuration parameters for the PII detector. Shared from `PIIDetectorConfig`.
| Parameter | Type | Required | Description | Default | Constraints |
| --- | --- | --- | --- | --- | --- |
| `enabled_patterns` | array \| null | No | Presidio entity types to detect. When null, all supported entities are enabled. Use `PIIEnabledPattern` values (e.g. `EMAIL_ADDRESS`, `US_SSN`, `CREDIT_CARD`). | `null` | |
| `language` | string | No | BCP-47 language code for NER models (e.g. `en`, `de`, `es`). | `en` | |
| `spacy_model` | string \| null | No | spaCy model to load (e.g. `en_core_web_sm`, `en_core_web_lg`). Defaults to `en_core_web_sm` when null. | `null` | |
| `spacy_model_url` | string \| null | No | Wheel URL for the spaCy model. When set and the model is not installed, the CLI installs it at runtime. | `null` | |
| `custom_recognizers` | array \| null | No | Ad-hoc recognizers added to the Presidio registry at runtime. Each entry defines a regex-pattern or deny-list recognizer for a custom entity type. | `null` | |
| `max_length` | integer \| null | No | Override spaCy's `nlp.max_length` (default 1,000,000 chars). Set higher than your longest expected input to avoid the E088 error. Prefer `chunk_size` for very large texts. | `null` | |
| `chunk_size` | integer \| null | No | Split text into chunks of this many characters before analysis. Findings from all chunks are merged with corrected offsets. When null, the full text is passed as-is (subject to `max_length`). | `null` | |
| `chunk_overlap` | integer \| null | No | Character overlap between consecutive chunks. Helps detect entities that span a chunk boundary. | `0` | |
| `confidence_threshold` | number | No | Minimum Presidio confidence score to report a finding (0-1). | `0.7` | min 0, max 1 |
| `max_findings` | integer \| null | No | Maximum number of findings to return per asset. | `null` | |
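
The interplay of `chunk_size`, `chunk_overlap`, `confidence_threshold`, and `max_findings` may be easier to follow with a small sketch. The Python below is an illustration only, not the detector's actual implementation: `Finding`, `iter_chunks`, `analyze_chunked`, and `toy_analyze` are hypothetical names, and a regex stands in for the real Presidio analyzer. It assumes that findings are shifted by each chunk's start offset and that exact duplicates from overlapping regions are dropped; how the real detector deduplicates is not stated in this page.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; offsets are character positions."""
    entity_type: str
    start: int
    end: int
    score: float


def iter_chunks(text: str, chunk_size: Optional[int],
                chunk_overlap: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (offset, chunk) pairs; the whole text when chunk_size is None."""
    if chunk_size is None:
        yield 0, text
        return
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # each chunk starts this far after the last
    for offset in range(0, len(text), step):
        yield offset, text[offset:offset + chunk_size]
        if offset + chunk_size >= len(text):
            break  # this chunk already reached the end of the text


def analyze_chunked(text: str,
                    analyze: Callable[[str], List[Finding]],
                    chunk_size: Optional[int] = None,
                    chunk_overlap: int = 0,
                    confidence_threshold: float = 0.7,
                    max_findings: Optional[int] = None) -> List[Finding]:
    """Run `analyze` per chunk and merge findings with corrected offsets."""
    merged = set()  # a set drops exact duplicates found in overlapping regions
    for offset, chunk in iter_chunks(text, chunk_size, chunk_overlap):
        for f in analyze(chunk):
            if f.score >= confidence_threshold:
                # Shift chunk-local offsets back into full-text coordinates.
                merged.add(Finding(f.entity_type, f.start + offset,
                                   f.end + offset, f.score))
    ordered = sorted(merged, key=lambda f: (f.start, f.end, f.entity_type))
    return ordered if max_findings is None else ordered[:max_findings]


# Stand-in analyzer (assumption): a regex instead of a Presidio call.
PHONE = re.compile(r"\b\d{3}-\d{4}\b")


def toy_analyze(chunk: str) -> List[Finding]:
    return [Finding("PHONE", m.start(), m.end(), 0.9)
            for m in PHONE.finditer(chunk)]


text = "call 555-0100 now"
no_overlap = analyze_chunked(text, toy_analyze, chunk_size=12)
with_overlap = analyze_chunked(text, toy_analyze, chunk_size=12, chunk_overlap=8)
```

In this toy input the number straddles the boundary at character 12, so `no_overlap` is empty, while `with_overlap` contains a single finding at offsets 5 to 13 of the full text. An overlap at least as long as the longest entity you expect is a reasonable starting point under these assumptions.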