Tags: pii, redaction, presidio, privacy, gdpr, hipaa, python, nlp

Redact PII from Text with Microsoft Presidio

Use Microsoft Presidio to detect and redact personally identifiable information (PII) in text. This Python library helps you build privacy-preserving applications that support compliance with regulations such as GDPR and HIPAA by replacing sensitive data with placeholders, masks, or hashes.

beginner · 15 min · 4 steps
The play
  1. Install Presidio Libraries
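    First, install the two core Microsoft Presidio packages with pip: `pip install presidio-analyzer presidio-anonymizer`. `presidio-analyzer` detects PII, and `presidio-anonymizer` redacts or replaces it. The analyzer's default NLP engine also needs a spaCy English model, which you can download with `python -m spacy download en_core_web_lg`.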
  2. Detect PII Entities
    Use the `AnalyzerEngine` to find PII in a string. It loads a set of default recognizers for common entities such as names, phone numbers, and credit card numbers. The `analyze` method returns a list of results, each with an entity type, start and end character offsets in the text, and a confidence score.
  3. Anonymize Detected PII
    Use the `AnonymizerEngine` to redact the entities found by the analyzer. Pass the analyzer results to the anonymizer's `anonymize` method. By default, it replaces each entity with its type in angle brackets (e.g., 'John Doe' becomes '<PERSON>').
  4. Customize Redaction Rules
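    You can customize how specific entities are redacted by passing a dictionary that maps entity types to `OperatorConfig` objects through the `operators` argument of `anonymize`. For example, you can 'replace' a person's name with a fixed placeholder, 'mask' part of an email address, and 'hash' a phone number, which gives you per-entity control over the output.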
Starter code
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig

# 1. Define the text with PII
text_with_pii = "A message from Jane Doe to customer support. My phone number is 212-555-1234 and my email is jane.d@email.com."

# 2. Initialize the AnalyzerEngine
analyzer = AnalyzerEngine()

# 3. Call analyzer to find PII entities
analyzer_results = analyzer.analyze(text=text_with_pii, language='en')

# 4. Initialize the AnonymizerEngine
anonymizer = AnonymizerEngine()

# 5. Define custom redaction rules (operators)
# - Replace PERSON with a static value
# - Mask the first 15 characters of EMAIL_ADDRESS with asterisks
# - Hash PHONE_NUMBER using SHA256
custom_operators = {
    "PERSON": OperatorConfig("replace", {"new_value": "[REDACTED_CUSTOMER]"}),
    "EMAIL_ADDRESS": OperatorConfig("mask", {"type": "mask", "masking_char": "*", "chars_to_mask": 15, "from_end": False}),
    "PHONE_NUMBER": OperatorConfig("hash", {"hash_type": "sha256"})
}

# 6. Anonymize the text using the custom rules
anonymized_output = anonymizer.anonymize(
    text=text_with_pii,
    analyzer_results=analyzer_results,
    operators=custom_operators
)

# --- Print results ---
print(f"Original Text:\n{text_with_pii}\n")
print(f"PII Entities Found:\n{analyzer_results}\n")
print(f"Anonymized Text:\n{anonymized_output.text}")
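The raw `analyzer_results` list printed by the starter code can be hard to scan. The helper below is a small sketch that pairs each detection with the text it covers. `describe_entities` and the `Span` stand-in are hypothetical names introduced here for illustration. The function only assumes that each result exposes `entity_type`, `start`, `end`, and `score`, as Presidio's analyzer results do.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in with the same fields as a Presidio analyzer result."""
    entity_type: str
    start: int
    end: int
    score: float


def describe_entities(text, results):
    """Return one readable line per detection, ordered by position in the text."""
    lines = []
    for r in sorted(results, key=lambda r: r.start):
        lines.append(f"{r.entity_type:<15} {text[r.start:r.end]!r} (score={r.score:.2f})")
    return lines


# Assumed sample data; the offsets were counted by hand for this sentence.
sample = "Call Jane Doe at 212-555-1234."
spans = [Span("PHONE_NUMBER", 17, 29, 0.75), Span("PERSON", 5, 13, 0.85)]
for line in describe_entities(sample, spans):
    print(line)
```

With the starter code, the same helper can be called as `describe_entities(text_with_pii, analyzer_results)`.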