Demystifying OPD: Length Inflation and Stabilization Strategies for Large Language Models

Mitigate 'length inflation' in On-policy Distillation (OPD) for LLMs, where student models generate excessively long sequences, causing truncated data and training instability. This pack offers actionable strategies to monitor and control sequence lengths for robust student model training.

intermediate | 1 hour | 5 steps

The play

Monitor Student Rollout Lengths
Track the mean and maximum length of student rollouts at every training step, along with the share of rollouts that hit the generation cap. Catching a rising trend early leaves room to intervene before truncation destabilizes training.
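The monitoring step can be sketched in plain Python. This is a minimal illustration rather than part of any OPD library: the class name `RolloutLengthMonitor`, the `window` size, and the `growth_threshold` of 1.5 are all assumptions chosen for the example, and rollout lengths are assumed to be counts of newly generated tokens.

```python
from collections import deque
from statistics import mean


class RolloutLengthMonitor:
    """Hypothetical helper: tracks rollout lengths and flags sustained growth."""

    def __init__(self, max_new_tokens, window=5, growth_threshold=1.5):
        self.max_new_tokens = max_new_tokens
        self.growth_threshold = growth_threshold
        self.baseline = None                # mean length at the first logged step
        self.recent = deque(maxlen=window)  # mean lengths of the last `window` steps

    def log_step(self, lengths):
        """`lengths` holds the generated-token count of each rollout in one batch."""
        step_mean = mean(lengths)
        # A rollout that used the whole budget was most likely cut off.
        truncated = sum(1 for n in lengths if n >= self.max_new_tokens)
        if self.baseline is None:
            self.baseline = step_mean
        self.recent.append(step_mean)
        return {
            "mean_len": step_mean,
            "max_len": max(lengths),
            "truncation_rate": truncated / len(lengths),
            # Flag when the rolling mean exceeds the baseline by the threshold factor.
            "inflating": mean(self.recent) > self.growth_threshold * self.baseline,
        }


monitor = RolloutLengthMonitor(max_new_tokens=100, window=2)
for batch in ([40, 60], [100, 100], [100, 90]):  # made-up lengths for illustration
    print(monitor.log_step(batch))
```

Comparing a rolling mean against the first-step baseline is one simple choice; a fixed absolute limit or a comparison against the previous checkpoint would work as well.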
Apply Length Penalties During Generation
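Integrate a length penalty into the student model's text generation process to discourage excessively long outputs. In Hugging Face Transformers this is the `length_penalty` generation parameter, which only takes effect with beam search (`num_beams > 1`). Because beam scores are log-likelihoods (negative numbers), values below 0 favor shorter sequences and values above 0 favor longer ones.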
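The direction of the penalty is easy to get backwards, so a small worked example may help. The sketch below mirrors the scoring rule documented for beam search in Hugging Face Transformers, where the summed log-probability of a finished beam is divided by its length raised to `length_penalty`. The function name `beam_score` and the two beams are made up for illustration, and the exact way length is counted can differ between library versions.

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    """Length-normalized beam score: summed log-probability / length ** penalty."""
    return sum_logprobs / (length ** length_penalty)


# Two hypothetical finished beams: (summed log-probability, number of tokens)
short_beam = (-6.0, 10)
long_beam = (-9.0, 30)

for penalty in (1.5, 1.0, 0.0, -1.0):
    s = beam_score(*short_beam, penalty)
    l = beam_score(*long_beam, penalty)
    winner = "short" if s > l else "long"
    print(f"length_penalty={penalty:+.1f}: short={s:.3f} long={l:.3f} -> {winner}")
```

With these numbers the longer beam scores higher for positive penalties and the shorter beam scores higher at 0 and below, which is why a negative value is the setting that discourages long outputs.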
Configure `max_new_tokens` and Sampling
Explicitly set `max_new_tokens` to a sensible maximum for student-generated rollouts. If you sample with `do_sample=True`, tune `top_k` or `top_p` to control generation diversity; these settings affect length only indirectly, so `max_new_tokens` remains the hard limit.
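One way to judge whether a given `max_new_tokens` is sensible is to check how many rollouts reach it without ever emitting an end-of-sequence token. The following is a rough sketch under stated assumptions: each rollout is a list of newly generated token IDs with padding already removed, and the helper names `is_truncated` and `truncation_rate` are hypothetical. The ID 50256 is GPT-2's end-of-text token.

```python
def is_truncated(new_tokens, eos_token_id, max_new_tokens):
    """True if the rollout used the whole budget and never produced EOS."""
    return len(new_tokens) >= max_new_tokens and eos_token_id not in new_tokens


def truncation_rate(rollouts, eos_token_id, max_new_tokens):
    """Fraction of rollouts that were cut off by the cap."""
    if not rollouts:
        return 0.0
    cut = sum(is_truncated(r, eos_token_id, max_new_tokens) for r in rollouts)
    return cut / len(rollouts)


EOS = 50256  # GPT-2 end-of-text token ID
rollouts = [
    [11, 12, 13, EOS],  # finished exactly at the cap
    [11, 12, 13, 14],   # hit the cap without EOS
    [21, EOS],          # finished early
]
print(truncation_rate(rollouts, eos_token_id=EOS, max_new_tokens=4))
```

A rate that climbs over training suggests the student is drifting toward longer outputs; raising the cap alone would hide that signal rather than address it.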
Optimize Context Window Management
Ensure your training pipeline efficiently manages sequences that approach or exceed the model's maximum context window. Prioritize preventing excessive lengths over merely truncating them post-generation, as truncation loses valuable data.
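Preventing overflow before generation can be as simple as shrinking the token budget to whatever the context window still allows. The helper below is a minimal sketch; `new_token_budget` and its parameter names are illustrative, and lengths are assumed to be measured in tokens. The 1024-token window in the usage lines is GPT-2's context size.

```python
def new_token_budget(prompt_len: int, context_window: int, desired_cap: int) -> int:
    """Largest max_new_tokens that keeps prompt + completion inside the window."""
    remaining = context_window - prompt_len
    # Never return a negative budget; 0 means the prompt already fills the window.
    return max(0, min(desired_cap, remaining))


CONTEXT_WINDOW = 1024  # GPT-2 context size in tokens
for prompt_len in (100, 1000, 1030):
    budget = new_token_budget(prompt_len, CONTEXT_WINDOW, desired_cap=50)
    print(f"prompt_len={prompt_len}: max_new_tokens={budget}")
```

A budget of 0 is a cue to shorten or skip the prompt rather than to generate and truncate afterwards.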
Align Teacher and Student Output Lengths
If possible, analyze the teacher model's typical output lengths for the task. Consider fine-tuning the teacher or adjusting its sampling parameters to produce more concise outputs that better guide the student towards desired length characteristics.
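A quick comparison of length distributions shows whether the student is already running longer than the teacher. This sketch assumes you have token counts for teacher and student outputs on the same prompts; the function name `length_report` and the `tolerance` of 1.2 are arbitrary choices for illustration, and the sample numbers are made up.

```python
from statistics import median


def length_report(student_lengths, teacher_lengths, tolerance=1.2):
    """Compare median output lengths and flag a student that runs long."""
    student_med = median(student_lengths)
    teacher_med = median(teacher_lengths)
    ratio = student_med / teacher_med
    return {
        "student_median": student_med,
        "teacher_median": teacher_med,
        "ratio": ratio,
        # True when the student median exceeds the teacher median by the tolerance.
        "student_runs_long": ratio > tolerance,
    }


report = length_report(
    student_lengths=[120, 150, 200],  # made-up token counts
    teacher_lengths=[80, 100, 110],
)
print(report)
```

Medians are used here because a few runaway rollouts can dominate a mean; comparing upper percentiles as well would show whether the tail is growing.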

Starter code

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2") # Replace with your student model
model = AutoModelForCausalLM.from_pretrained("gpt2") # Replace with your student model

input_text = "The quick brown fox jumps over the lazy dog. This story continues with "
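# Calling the tokenizer returns both input_ids and attention_mask
inputs = tokenizer(input_text, return_tensors="pt")

# Generate with a hard token cap plus a beam-search length penalty
output = model.generate(
    **inputs,
    max_new_tokens=50,       # Hard cap on the number of newly generated tokens
    num_beams=4,             # length_penalty only takes effect with beam search
    length_penalty=-1.0,     # Values < 0 favor shorter sequences; values > 0 favor longer ones
    no_repeat_ngram_size=2,  # Prevent repeated bigrams
    early_stopping=True,     # Stop once num_beams finished candidates exist
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token; reuse EOS
)

# Report how many new tokens were generated, then the decoded text
prompt_len = inputs["input_ids"].shape[1]
print("new tokens:", output.shape[1] - prompt_len)
print(tokenizer.decode(output[0], skip_special_tokens=True))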