Paper · arxiv.org
llm · research · ai-agents · evaluation · fine-tuning
Learning to Reason with Insight for Informal Theorem Proving
Implement an iterative feedback loop in which a 'critic' LLM refines a 'generator' LLM's reasoning. The critic's targeted, actionable feedback improves the generator's ability to solve complex tasks such as theorem proving or code generation.
intermediate · 1 day · 5 steps
The play
- Define Complex Reasoning Task and Metrics: Identify the specific complex reasoning task (e.g., informal theorem proving, code generation, creative writing) and establish objective quality metrics for evaluating outputs. These metrics define what 'insightful' means for your application.
- Configure the Generator LLM: Select and initialize an LLM (e.g., GPT-4, Llama 3) that will generate the initial solutions, proofs, or creative content for the defined task.
- Configure the Critic LLM: Select and initialize a separate LLM. This 'critic' evaluates the Generator LLM's outputs, identifies shortcomings, and provides specific, actionable feedback for improvement. Its prompt should emphasize critical analysis and constructive suggestions.
- Implement the Iterative Feedback Loop: Design and implement a system where: 1) the Generator LLM produces an output; 2) the Critic LLM evaluates this output and provides feedback; 3) the feedback is passed back to the Generator LLM (e.g., as part of its next prompt, or as fine-tuning data) to guide an improved output in the next iteration.
- Refine Generator with Critic's Feedback: Continuously refine the Generator LLM. Either incorporate the critic's feedback into the generator's prompt for subsequent attempts, or use the critic's evaluations and feedback as training data for fine-tuning the generator model itself.
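The loop described in the steps above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `refine`, `toy_generate`, `toy_critique`, the `max_rounds` cap, and the `ACCEPT` stop token are all hypothetical names and conventions introduced here. In practice the two callables would wrap calls to whichever generator and critic models you chose.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: in practice each callable would wrap an LLM API call.
Generator = Callable[[str, List[str]], str]  # (problem, feedback_history) -> solution
Critic = Callable[[str, str], str]           # (problem, solution) -> feedback


def refine(problem: str, generate: Generator, critique: Critic,
           max_rounds: int = 3, accept_token: str = "ACCEPT"
           ) -> Tuple[str, List[Dict[str, object]]]:
    """Run generate -> critique -> regenerate until the critic accepts
    or the round budget is exhausted."""
    feedback_history: List[str] = []
    trace: List[Dict[str, object]] = []  # per-round records, reusable as fine-tuning data
    solution = ""
    for round_idx in range(max_rounds):
        solution = generate(problem, feedback_history)
        feedback = critique(problem, solution)
        trace.append({"round": round_idx, "solution": solution, "feedback": feedback})
        if feedback.strip() == accept_token:  # assumed convention for "no issues left"
            break
        feedback_history.append(feedback)
    return solution, trace


# Toy stubs so the sketch runs without any model access.
def toy_generate(problem: str, feedback_history: List[str]) -> str:
    return f"draft v{len(feedback_history) + 1}"


def toy_critique(problem: str, solution: str) -> str:
    if solution.endswith("v3"):
        return "ACCEPT"
    return f"- {solution} has a gap; justify each step"


final, trace = refine("toy problem", toy_generate, toy_critique, max_rounds=5)
print(final, len(trace))
```

The `trace` list covers both refinement options from the last step: its feedback entries can be fed into the next prompt, and the full records can be saved as candidate fine-tuning examples. A round cap is one simple way to keep the loop from running indefinitely when the critic never accepts.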
Starter code
{
"generator_prompt_template": "Given the problem: {problem_statement}, and previous feedback: {feedback_history}, generate a detailed solution or proof. Focus on addressing the feedback to improve quality.",
"critic_prompt_template": "Review the following solution for the problem: {problem_statement}. Solution: {generated_solution}. Identify specific flaws, logical gaps, or areas for improvement. Provide constructive feedback in a bulleted list, focusing on how the solution can be made more insightful and correct.",
"initial_problem": "Prove that for any prime number p > 3, p^2 - 1 is divisible by 24."
}
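One possible way to use the starter configuration is to fill the templates with Python's built-in `str.format`. The sketch below copies the three values from the configuration above. The helper names (`render_generator_prompt`, `render_critic_prompt`, `is_prime`) and the "none yet" placeholder for an empty feedback history are assumptions made for illustration. The final check only confirms the sample claim numerically for small primes; it is a sanity test of the problem statement, not a proof.

```python
from typing import Dict, List

# Values copied from the starter configuration above.
config: Dict[str, str] = {
    "generator_prompt_template": (
        "Given the problem: {problem_statement}, and previous feedback: "
        "{feedback_history}, generate a detailed solution or proof. "
        "Focus on addressing the feedback to improve quality."
    ),
    "critic_prompt_template": (
        "Review the following solution for the problem: {problem_statement}. "
        "Solution: {generated_solution}. Identify specific flaws, logical gaps, "
        "or areas for improvement. Provide constructive feedback in a bulleted "
        "list, focusing on how the solution can be made more insightful and correct."
    ),
    "initial_problem": (
        "Prove that for any prime number p > 3, p^2 - 1 is divisible by 24."
    ),
}


def render_generator_prompt(cfg: Dict[str, str], feedback_history: List[str]) -> str:
    # "none yet" is an assumed placeholder for the first round.
    feedback = "\n".join(feedback_history) if feedback_history else "none yet"
    return cfg["generator_prompt_template"].format(
        problem_statement=cfg["initial_problem"], feedback_history=feedback)


def render_critic_prompt(cfg: Dict[str, str], solution: str) -> str:
    return cfg["critic_prompt_template"].format(
        problem_statement=cfg["initial_problem"], generated_solution=solution)


def is_prime(n: int) -> bool:
    """Trial division; adequate for the small range checked here."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


gen_prompt = render_generator_prompt(config, [])
critic_prompt = render_critic_prompt(config, "Factor p^2 - 1 as (p - 1)(p + 1).")

# Numeric sanity check of the sample claim for primes 3 < p < 1000.
claim_holds = all((p * p - 1) % 24 == 0 for p in range(5, 1000) if is_prime(p))
print(gen_prompt)
print(claim_holds)
```

Because the problem statement and solution are passed as format arguments rather than spliced into the template text, braces inside a generated solution are not reinterpreted as placeholders.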