Article
llm, distillation, fine-tuning, machine-learning, nlp, length-inflation
Demystifying OPD: Length Inflation and Stabilization Strategies for Large Language Models
Mitigate 'length inflation' in On-policy Distillation (OPD) for LLMs, where student models generate excessively long sequences, causing truncated data and training instability. This pack offers actionable strategies to monitor and control sequence lengths for robust student model training.
intermediate · 1 hour · 5 steps
The play
- Monitor Student Rollout Lengths: Regularly track the average and maximum sequence lengths generated by your student model during OPD training. Early detection of increasing lengths is crucial for intervention.
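- Apply Length Penalties During Generation: Integrate a length penalty into the student model's text generation process to discourage excessively long outputs. In Hugging Face Transformers this is the `length_penalty` generation parameter, which only takes effect with beam-based generation (`num_beams` > 1). Values below 0.0 favor shorter sequences, while values above 0.0 favor longer ones.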
- Configure `max_new_tokens` and Sampling: Explicitly set `max_new_tokens` to a sensible maximum for student-generated rollouts. Experiment with `do_sample=True` along with `top_k` or `top_p` to control generation diversity and length.
- Optimize Context Window Management: Ensure your training pipeline efficiently manages sequences that approach or exceed the model's maximum context window. Prioritize preventing excessive lengths over merely truncating them post-generation, as truncation loses valuable data.
- Align Teacher and Student Output Lengths: If possible, analyze the teacher model's typical output lengths for the task. Consider fine-tuning the teacher or adjusting its sampling parameters to produce more concise outputs that better guide the student towards desired length characteristics.
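The monitoring step in the play can be sketched in plain Python. This is a minimal illustration, not part of any OPD library: the names `RolloutLengthStats`, `rollout_length_stats`, and `is_inflating` are hypothetical, and the `window` and `ratio` thresholds are placeholder values you would tune for your own run. It assumes each rollout is a list of token ids containing only the newly generated tokens, with the prompt already removed.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RolloutLengthStats:
    mean_len: float
    max_len: int
    truncated_frac: float  # share of rollouts that used the full token budget


def rollout_length_stats(rollouts, max_new_tokens):
    """Summarize generated-token counts for one batch of student rollouts.

    A rollout whose length reaches `max_new_tokens` is counted as truncated.
    This is an approximation: a sequence could also end naturally at exactly
    that length.
    """
    if not rollouts:
        raise ValueError("rollouts must contain at least one sequence")
    lengths = [len(r) for r in rollouts]
    truncated = sum(1 for n in lengths if n >= max_new_tokens)
    return RolloutLengthStats(
        mean_len=mean(lengths),
        max_len=max(lengths),
        truncated_frac=truncated / len(lengths),
    )


def is_inflating(mean_history, window=3, ratio=1.5):
    """Flag inflation when the average of the last `window` mean lengths
    exceeds `ratio` times the average of the first `window` mean lengths.
    Both thresholds are illustrative assumptions."""
    if len(mean_history) < 2 * window:
        return False  # not enough history to compare
    baseline = mean(mean_history[:window])
    recent = mean(mean_history[-window:])
    return recent > ratio * baseline
```

Calling `rollout_length_stats` once per training step and appending `mean_len` to a history list gives `is_inflating` what it needs. A rising `truncated_frac` is a second signal worth logging, since it shows how much rollout data is being cut off at the token cap.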
Starter code
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # Replace with your student model
model = AutoModelForCausalLM.from_pretrained("gpt2")  # Replace with your student model

input_text = "The quick brown fox jumps over the lazy dog. This story continues with "
inputs = tokenizer(input_text, return_tensors="pt")  # input_ids plus attention_mask

# Generate with beam search, a length penalty, and max_new_tokens to control output length
output = model.generate(
    **inputs,
    max_new_tokens=50,  # Hard cap on the number of newly generated tokens
    num_beams=4,  # length_penalty only takes effect with beam-based generation
    length_penalty=-1.0,  # Values < 0.0 favor shorter sequences; values > 0.0 favor longer ones
    no_repeat_ngram_size=2,  # Block any repeated 2-gram
    early_stopping=True,  # Stop once num_beams finished candidates exist
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
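
The starter code hard-codes `max_new_tokens`. For the context-window step in the play, one option is to derive that value from the prompt length so a rollout can never overflow the model's window. The helper below is a hypothetical sketch: `new_token_budget` and its `cap` argument are illustrative names, not a Transformers API, and the context window is passed in explicitly (1024 positions for GPT-2).

```python
def new_token_budget(prompt_len, context_window, cap):
    """Return the largest safe value for max_new_tokens.

    prompt_len: number of tokens in the prompt
    context_window: model's maximum total sequence length
    cap: task-level ceiling on rollout length (an assumption you choose)
    """
    if prompt_len >= context_window:
        raise ValueError("prompt already fills the context window")
    # Never exceed the task cap, and never exceed the remaining window.
    return min(cap, context_window - prompt_len)


# A short prompt is limited by the task cap.
short_budget = new_token_budget(prompt_len=16, context_window=1024, cap=50)
# A long prompt is limited by the space left in the window.
long_budget = new_token_budget(prompt_len=1000, context_window=1024, cap=50)
print(short_budget, long_budget)
```

The returned value can be passed as `max_new_tokens` to `model.generate`. Prompts that raise `ValueError` need to be shortened or skipped before generation rather than truncated afterwards.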