Paper·arxiv.org
llm · security · research · fine-tuning · evaluation · machine-learning
Learning the Signature of Memorization in Autoregressive Language Models
The paper introduces a 'transferable learned attack' that detects a distinct 'signature of memorization' in fine-tuned LLMs. The attack enables robust identification of training data leakage and goes beyond heuristic methods, which makes it relevant to AI privacy and security practice.
intermediate · 30 min · 5 steps
The play
- Understand the Memorization Signature: Fine-tuned autoregressive language models can inadvertently embed detectable 'signatures of memorization', which makes them vulnerable to sophisticated membership inference attacks.
- Assess Fine-tuning Practices: Review your current privacy-preserving fine-tuning techniques and workflows to identify weaknesses against this new, more robust attack vector.
- Strengthen Data Handling: Apply or improve data anonymization and synthetic data generation before any fine-tuning run to reduce the risk of data leakage.
- Integrate Leakage Metrics: Adopt evaluation metrics designed to detect and quantify data leakage and memorization in your fine-tuned models.
- Prioritize in Sensitive Domains: When designing and deploying LLMs in sensitive domains, account for this attack vector explicitly so that privacy risks are mitigated early and compliance requirements are met.
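One simple way to put a number on the "Integrate Leakage Metrics" step is a loss-threshold membership inference baseline. The sketch below is not the learned attack from the paper; it is a common heuristic baseline, shown only as an illustration. It assumes you have already computed per-example losses for training examples and for held-out examples from the same distribution. The function name `membership_auc` and the sample loss values are hypothetical.

```python
from typing import Sequence


def membership_auc(member_losses: Sequence[float],
                   nonmember_losses: Sequence[float]) -> float:
    """AUC of a loss-threshold membership test.

    Returns the probability that a randomly chosen training example has a
    lower loss than a randomly chosen held-out example (ties count as half).
    """
    if not member_losses or not nonmember_losses:
        raise ValueError("both loss lists must be non-empty")
    wins = 0.0
    for m in member_losses:
        for n in nonmember_losses:
            if m < n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_losses) * len(nonmember_losses))


# Hypothetical per-example losses; in practice, compute these with your model.
train_losses = [0.4, 0.9, 0.3, 1.1]
heldout_losses = [1.2, 0.8, 1.5, 1.0]
auc = membership_auc(train_losses, heldout_losses)
print(f"Loss-threshold membership AUC: {auc:.2f}")
```

An AUC near 0.5 means the losses do not separate members from non-members under this baseline, while values closer to 1.0 indicate that the model treats training examples measurably differently. A low score here does not rule out leakage under a stronger attack.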
Starter code
import re

def anonymize_text(text: str) -> str:
    """Replaces common PII patterns with placeholders before fine-tuning."""
    # Phone numbers such as 555-123-4567
    text = re.sub(r'\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b', '[PHONE_NUMBER]', text)
    # Email addresses (the TLD class is [A-Za-z]; a '|' inside [] would match a literal pipe)
    text = re.sub(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', '[EMAIL_ADDRESS]', text)
    # Labeled SSN-like identifiers such as "SSN: 123-45-6789"
    text = re.sub(r'\b(SSN|NIN|ID)\s*:\s*\d{3}-\d{2}-\d{4}\b', '[SOCIAL_SECURITY_NUMBER]', text)
    return text

# Example usage for a sensitive data point in your fine-tuning dataset
sensitive_data_point = "Contact Jane Doe at jane.doe@example.com or call 555-123-4567 for support."
anonymized_data_point = anonymize_text(sensitive_data_point)
print(f"Original: {sensitive_data_point}")
print(f"Anonymized: {anonymized_data_point}")
# Use anonymized_data_point in your fine-tuning dataset
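
As a follow-up to the starter code, it can help to audit the anonymized dataset for identifiers that slipped through. The sketch below is a minimal illustration, assuming the same two regex patterns for phone numbers and emails. The names `PII_PATTERNS` and `audit_records` and the sample records are hypothetical, and real PII takes many more forms than these patterns cover.

```python
import re
from typing import Dict, List

# Illustrative patterns only (assumption: same shapes as the starter code).
PII_PATTERNS = {
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
}


def audit_records(records: List[str]) -> Dict[int, List[str]]:
    """Map record index to the names of PII patterns still present in it."""
    findings: Dict[int, List[str]] = {}
    for i, text in enumerate(records):
        hits = [name for name, pattern in PII_PATTERNS.items()
                if pattern.search(text)]
        if hits:
            findings[i] = hits
    return findings


# Hypothetical records: the first was anonymized, the second was missed.
records = [
    "Contact Jane Doe at [EMAIL_ADDRESS] or call [PHONE_NUMBER] for support.",
    "Escalations go to ops@example.com.",
]
print(audit_records(records))
```

Note that pattern-based checks like this one do not catch personal names (for example, "Jane Doe" in the first record), so a clean audit is a necessary check rather than proof that no sensitive data remains.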