Paper·arxiv.org
llm · security · research · fine-tuning · evaluation · machine-learning

Learning the Signature of Memorization in Autoregressive Language Models

The paper presents a transferable learned attack that detects a distinct 'signature of memorization' in fine-tuned LLMs. It identifies training data leakage more robustly than heuristic methods, which makes it relevant to anyone responsible for the privacy and security of fine-tuned models.

intermediate · 30 min · 5 steps
The play
  1. Understand the Memorization Signature
    Fine-tuned autoregressive language models can inadvertently carry a detectable 'signature of memorization', which makes them vulnerable to membership inference attacks: attacks that decide whether a given example was part of the training data.
  2. Assess Fine-tuning Practices
    Review your current privacy-preserving fine-tuning techniques and workflows to identify potential vulnerabilities against this new, more robust attack vector.
  3. Strengthen Data Handling
    Anonymize sensitive records, or replace them with synthetic data, before any fine-tuning run to reduce the risk of data leakage.
  4. Integrate Leakage Metrics
    Adopt evaluation metrics that detect and quantify data leakage and memorization in your fine-tuned models, for example membership-inference AUC or the true-positive rate at a low false-positive rate.
  5. Prioritize in Sensitive Domains
    When designing and deploying LLMs, especially in sensitive domains, explicitly consider this new attack vector to proactively mitigate privacy risks and ensure compliance.
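A minimal sketch of step 4, assuming you already have per-example losses (for instance mean token negative log-likelihood) for a set of known training examples and a held-out set. It scores the classic loss-threshold membership inference baseline, not the learned attack from the paper. The function names `membership_auc` and `tpr_at_fpr` are hypothetical and chosen for illustration.

```python
from typing import Sequence


def membership_auc(member_losses: Sequence[float],
                   nonmember_losses: Sequence[float]) -> float:
    """AUC of the rule 'lower loss means more likely a training member'.

    0.5 means members and non-members are indistinguishable by loss;
    values near 1.0 indicate strong leakage.
    """
    if not member_losses or not nonmember_losses:
        raise ValueError("both loss lists must be non-empty")
    wins = 0.0
    # Count member/non-member pairs where the member has the lower loss;
    # ties count as half a win.
    for m in member_losses:
        for n in nonmember_losses:
            if m < n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_losses) * len(nonmember_losses))


def tpr_at_fpr(member_losses: Sequence[float],
               nonmember_losses: Sequence[float],
               max_fpr: float = 0.01) -> float:
    """Best true-positive rate of a loss threshold whose FPR is <= max_fpr."""
    if not member_losses or not nonmember_losses:
        raise ValueError("both loss lists must be non-empty")
    best = 0.0
    # Try every observed loss value as the threshold: predict 'member'
    # when loss <= threshold.
    for t in sorted(set(member_losses) | set(nonmember_losses)):
        fpr = sum(n <= t for n in nonmember_losses) / len(nonmember_losses)
        if fpr <= max_fpr:
            tpr = sum(m <= t for m in member_losses) / len(member_losses)
            best = max(best, tpr)
    return best


# Illustrative numbers only, not measurements from any model.
members = [0.1, 0.2, 0.3, 0.9]
nonmembers = [0.5, 0.8, 1.0, 1.2]
print(membership_auc(members, nonmembers))
print(tpr_at_fpr(members, nonmembers, max_fpr=0.0))
```

A low score on this baseline does not rule out leakage; a stronger attack, such as the learned one the paper describes, may still separate members from non-members.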
Starter code
import re

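def anonymize_text(text: str) -> str:
    """Replace common PII patterns with placeholders before fine-tuning.

    Regex-only redaction: personal names and free-form identifiers are not covered.
    """
    # US-style phone numbers such as 555-123-4567 or 555.123.4567
    text = re.sub(r'\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b', '[PHONE_NUMBER]', text)
    # Email addresses with a top-level domain of two or more letters
    text = re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', '[EMAIL_ADDRESS]', text)
    # Labeled SSN-like IDs, e.g. "SSN: 123-45-6789" (the label is replaced too)
    text = re.sub(r'\b(?:SSN|NIN|ID)\s*:\s*\d{3}-\d{2}-\d{4}\b', '[SOCIAL_SECURITY_NUMBER]', text)
    return text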

# Example usage for a sensitive data point in your fine-tuning dataset
sensitive_data_point = "Contact Jane Doe at jane.doe@example.com or call 555-123-4567 for support."
anonymized_data_point = anonymize_text(sensitive_data_point)

print(f"Original: {sensitive_data_point}")
print(f"Anonymized: {anonymized_data_point}")
# Use anonymized_data_point in your fine-tuning dataset
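Pattern-based redaction like the starter code leaves some PII untouched (the name "Jane Doe" in the example above, for one), so it helps to audit the output before fine-tuning. The sketch below assumes the three placeholder names used in the starter code; the helper `audit_anonymized` and the `RESIDUAL` heuristic are hypothetical and deliberately crude.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Placeholders assumed to match those emitted by the starter code.
PLACEHOLDER = re.compile(r"\[(PHONE_NUMBER|EMAIL_ADDRESS|SOCIAL_SECURITY_NUMBER)\]")
# Heuristic for leftovers: an email-like token or a run of 4+ digits.
RESIDUAL = re.compile(r"\S+@\S+|\d{4,}")


def audit_anonymized(records: Iterable[str]) -> Tuple[Counter, List[int]]:
    """Count placeholders and flag records that still look like they hold PII."""
    counts: Counter = Counter()
    flagged: List[int] = []
    for i, record in enumerate(records):
        # findall returns the captured placeholder name for each match.
        counts.update(PLACEHOLDER.findall(record))
        if RESIDUAL.search(record):
            flagged.append(i)
    return counts, flagged


records = [
    "Contact Jane Doe at [EMAIL_ADDRESS] or call [PHONE_NUMBER] for support.",
    "Reach me at bob@example.org",
    "No PII here.",
]
counts, flagged = audit_anonymized(records)
print(dict(counts))
print(flagged)
```

Flagged records are candidates for manual review or a stronger redaction pass. An empty list only means the heuristic found nothing, not that the data is clean.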
Source
Learning the Signature of Memorization in Autoregressive Language Models — Action Pack