PII
Schema-driven detector documentation.
PII: active, P0, 10 params, 16 examples
Detector Metadata
Capability catalog entry from all_detectors.json.
Categories
PRIVACY, COMPLIANCE
Supported Asset Types
TXT, TABLE, URL
Recommended Model
presidio-analyzer
Notes
Identifies personal data (e.g., names, emails, IDs) that must be protected for privacy and compliance.
Parameters
Configuration parameters for the PII detector. Shared from `PIIDetectorConfig`.
| Parameter | Type | Required | Description | Default | Constraints |
|---|---|---|---|---|---|
| enabled_patterns | array \| null | No | Presidio entity types to detect. When null, all supported entities are enabled. Use `PIIEnabledPattern` values (e.g. `EMAIL_ADDRESS`, `US_SSN`, `CREDIT_CARD`). | null | — |
| language | string | No | BCP-47 language code for NER models (e.g. `en`, `de`, `es`). | en | — |
| spacy_model | string \| null | No | spaCy model to load (e.g. `en_core_web_sm`, `en_core_web_lg`). Defaults to `en_core_web_sm` when null. | null | — |
| spacy_model_url | string \| null | No | Wheel URL for the spaCy model. When set and the model is not installed, the CLI installs it at runtime. | null | — |
| custom_recognizers | array \| null | No | Ad-hoc recognizers added to the Presidio registry at runtime. Each entry defines a regex-pattern or deny-list recognizer for a custom entity type. | null | — |
| max_length | integer \| null | No | Overrides spaCy's `nlp.max_length` (default 1,000,000 characters). Set it higher than your longest expected input to avoid the E088 error. Prefer `chunk_size` for very large texts. | null | — |
| chunk_size | integer \| null | No | Splits text into chunks of this many characters before analysis. Findings from all chunks are merged with corrected offsets. When null, the full text is passed as-is (subject to `max_length`). | null | — |
| chunk_overlap | integer \| null | No | Character overlap between consecutive chunks. Helps detect entities that span a chunk boundary. | 0 | — |
| confidence_threshold | number | No | Minimum Presidio confidence score required to report a finding (0 to 1). | 0.7 | min 0, max 1 |
| max_findings | integer \| null | No | Maximum number of findings to return per asset. | null | — |
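
The interaction between `chunk_size`, `chunk_overlap`, `confidence_threshold`, and `max_findings` is easier to see in code. The following is a minimal sketch of one way those four parameters could work together. It is not the detector's actual implementation. `Finding`, `iter_chunks`, `analyze_chunked`, and `toy_email_analyzer` are hypothetical names introduced for illustration, and a simple regex stands in for Presidio so the example runs without any third-party packages.

```python
import re
from typing import Callable, Iterator, List, NamedTuple, Optional, Tuple


class Finding(NamedTuple):
    """Hypothetical finding record: entity type, character span, score."""
    entity_type: str
    start: int
    end: int
    score: float


def iter_chunks(text: str, chunk_size: int, chunk_overlap: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (offset, chunk) pairs; consecutive chunks share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    offset = 0
    while True:
        yield offset, text[offset:offset + chunk_size]
        if offset + chunk_size >= len(text):
            break
        offset += step


def analyze_chunked(
    text: str,
    analyze: Callable[[str], List[Finding]],
    chunk_size: Optional[int] = None,
    chunk_overlap: int = 0,
    confidence_threshold: float = 0.7,
    max_findings: Optional[int] = None,
) -> List[Finding]:
    """Run `analyze` per chunk, shift spans to full-text offsets, then filter and cap."""
    chunks = [(0, text)] if chunk_size is None else iter_chunks(text, chunk_size, chunk_overlap)
    merged = set()  # a set removes exact duplicates found in overlapping chunks
    for offset, chunk in chunks:
        for f in analyze(chunk):
            if f.score >= confidence_threshold:
                merged.add(f._replace(start=f.start + offset, end=f.end + offset))
    # Sort by start, longest span first, and drop a finding nested inside
    # the previously kept finding of the same entity type.
    kept: List[Finding] = []
    for f in sorted(merged, key=lambda f: (f.start, -f.end)):
        if kept and kept[-1].entity_type == f.entity_type and f.end <= kept[-1].end:
            continue
        kept.append(f)
    return kept if max_findings is None else kept[:max_findings]


# Assumption: a toy regex recognizer standing in for the real Presidio analyzer.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def toy_email_analyzer(chunk: str) -> List[Finding]:
    return [Finding("EMAIL_ADDRESS", m.start(), m.end(), 0.9) for m in EMAIL.finditer(chunk)]


text = "x" * 20 + " alice@example.com " + "y" * 20

# Without overlap, the 30-character boundary cuts the address in two and it is missed.
print(analyze_chunked(text, toy_email_analyzer, chunk_size=30))  # []

# With a 20-character overlap, a later chunk contains the whole address.
print(analyze_chunked(text, toy_email_analyzer, chunk_size=30, chunk_overlap=20))
# [Finding(entity_type='EMAIL_ADDRESS', start=21, end=38, score=0.9)]
```

In this sketch the overlap only helps when it is at least as long as the entity that straddles the boundary, so a reasonable starting point is an overlap somewhat larger than the longest entity you expect to detect.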