Learning the Signature of Memorization in Autoregressive Language Models

This paper introduces a transferable learned attack that detects a distinct signature of memorization in fine-tuned LLMs. The attack identifies training data leakage more robustly than heuristic methods, which has direct consequences for AI privacy and security practices.

intermediate | 30 min | 5 steps

The play

Understand the Memorization Signature
Fine-tuned autoregressive language models can inadvertently embed detectable signatures of memorization. This leaves them vulnerable to membership inference attacks, which try to determine whether a given example was part of the training data.
Assess Fine-tuning Practices
Review your current privacy-preserving fine-tuning techniques and workflows to identify potential vulnerabilities against this new, more robust attack vector.
Strengthen Data Handling
Implement or enhance robust data anonymization and synthetic data generation strategies before any fine-tuning process to minimize the risk of data leakage.
Integrate Leakage Metrics
Adopt evaluation metrics that detect and quantify data leakage and memorization in your fine-tuned models, for example membership inference AUC or the true-positive rate at a fixed low false-positive rate.
Prioritize in Sensitive Domains
When designing and deploying LLMs, especially in sensitive domains, explicitly consider this new attack vector to proactively mitigate privacy risks and ensure compliance.
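
To make the "Integrate Leakage Metrics" step concrete, here is a minimal sketch of how leakage could be quantified once you have a membership score per example. It is an assumption-laden illustration, not the paper's learned attack: the score used here (mean token log-probability) is a simple heuristic baseline, and the function names `mean_logprob_score`, `auc`, and `tpr_at_fpr` are hypothetical. The token log-probabilities are assumed to come from your own fine-tuned model.

```python
from typing import Sequence


def mean_logprob_score(token_logprobs: Sequence[float]) -> float:
    """Heuristic membership score: average token log-probability.

    Higher values mean the model finds the text more 'familiar'.
    Assumption: this is a baseline stand-in, not the learned attack.
    """
    return sum(token_logprobs) / len(token_logprobs)


def auc(member_scores: Sequence[float], nonmember_scores: Sequence[float]) -> float:
    """Probability that a random member outscores a random non-member.

    Ties count as half a win. 0.5 means no detectable leakage signal.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def tpr_at_fpr(
    member_scores: Sequence[float],
    nonmember_scores: Sequence[float],
    max_fpr: float = 0.01,
) -> float:
    """Best true-positive rate whose false-positive rate stays <= max_fpr.

    An example is flagged as a member when its score is >= the threshold.
    """
    best = 0.0
    for threshold in sorted(set(member_scores) | set(nonmember_scores)):
        fpr = sum(s >= threshold for s in nonmember_scores) / len(nonmember_scores)
        if fpr <= max_fpr:
            tpr = sum(s >= threshold for s in member_scores) / len(member_scores)
            best = max(best, tpr)
    return best


# Toy scores for examples known to be in / out of the fine-tuning set.
members = [-0.2, -0.5, -0.9]
nonmembers = [-1.0, -1.5, -0.6]
print(f"AUC: {auc(members, nonmembers):.3f}")
print(f"TPR at 0% FPR: {tpr_at_fpr(members, nonmembers, max_fpr=0.0):.3f}")
```

An AUC well above 0.5, or a non-trivial true-positive rate at a very low false-positive rate, suggests the model separates training examples from held-out ones and is worth investigating. A stronger attack would likely report higher numbers than this baseline, so treat these values as a lower bound on leakage.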

Starter code

import re

def anonymize_text(text: str) -> str:
    """Replaces common PII patterns with placeholders before fine-tuning."""
    text = re.sub(r'\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b', '[PHONE_NUMBER]', text) # Phone numbers
    text = re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', '[EMAIL_ADDRESS]', text) # Emails
    text = re.sub(r'\b(SSN|NIN|ID)\s*:\s*\d{3}-\d{2}-\d{4}\b', '[SOCIAL_SECURITY_NUMBER]', text) # SSN-like patterns
    return text

# Example usage for a sensitive data point in your fine-tuning dataset
sensitive_data_point = "Contact Jane Doe at jane.doe@example.com or call 555-123-4567 for support."
anonymized_data_point = anonymize_text(sensitive_data_point)

print(f"Original: {sensitive_data_point}")
print(f"Anonymized: {anonymized_data_point}")
# Use anonymized_data_point in your fine-tuning dataset
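
Regex-based anonymization can miss cases, so it may help to audit the dataset after anonymizing it. The sketch below counts residual PII-like matches across records. The pattern table `AUDIT_PATTERNS` and the function `residual_pii_counts` are hypothetical names introduced for illustration, and the patterns are assumptions that cover only a few US-style formats.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical audit patterns (assumption: US-style phone and SSN formats only).
AUDIT_PATTERNS = {
    "phone": re.compile(r'\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b'),
    "email": re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'),
    "ssn": re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),
}


def residual_pii_counts(records: Iterable[str]) -> Counter:
    """Counts PII-like matches that remain in supposedly anonymized records."""
    counts: Counter = Counter()
    for record in records:
        for name, pattern in AUDIT_PATTERNS.items():
            counts[name] += len(pattern.findall(record))
    return counts


# Example: the second record slipped through anonymization.
records = [
    "Contact [EMAIL_ADDRESS] or call [PHONE_NUMBER] for support.",
    "Reach bob@example.org, SSN 123-45-6789",
]
counts = residual_pii_counts(records)
print(dict(counts))
```

Any non-zero count is a signal to extend the anonymization rules before fine-tuning. A zero count only means these particular patterns found nothing, not that the data is free of PII.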

Source

Paper: arxiv.org