Learning to Reason with Insight for Informal Theorem Proving

Implement an iterative feedback loop in which a 'critic' LLM reviews and helps refine a 'generator' LLM's reasoning. The critic's targeted, actionable feedback improves the generator's ability to solve complex tasks such as theorem proving or code generation.

Intermediate · 1 day · 5 steps

The play

Define Complex Reasoning Task and Metrics
Identify the specific reasoning task (e.g., informal theorem proving, code generation, creative writing) and establish objective quality metrics for evaluating outputs. These metrics define what 'insightful' means for your application.
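One way to make the metrics concrete is a weighted rubric that turns per-criterion ratings into a single score. The sketch below is only an illustration: the criterion names (`logical_validity`, `completeness`, `clarity`) and their weights are assumptions chosen for informal proofs, not values taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric item; the names and weights used below are hypothetical."""
    name: str
    weight: float


# Assumed rubric for informal proofs; replace with criteria for your own task.
RUBRIC = [
    Criterion("logical_validity", 0.5),
    Criterion("completeness", 0.3),
    Criterion("clarity", 0.2),
]


def weighted_score(ratings: dict[str, float], rubric: list[Criterion] = RUBRIC) -> float:
    """Combine per-criterion ratings in [0, 1] into one score in [0, 1]."""
    for c in rubric:
        if not 0.0 <= ratings[c.name] <= 1.0:
            raise ValueError(f"rating for {c.name} must be in [0, 1]")
    total_weight = sum(c.weight for c in rubric)
    # Normalize by the total weight so the weights need not sum to exactly 1.
    return sum(c.weight * ratings[c.name] for c in rubric) / total_weight
```

The ratings themselves can come from human graders or from the critic; keeping the aggregation in plain code makes scores comparable across iterations.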
Configure the Generator LLM
Select and initialize an LLM (e.g., GPT-4, Llama 3) to generate initial solutions, proofs, or creative content for the defined task.
Configure the Critic LLM
Select and initialize a separate LLM to act as the critic. The critic evaluates the Generator LLM's outputs, identifies shortcomings, and provides specific, actionable feedback for improvement. Its prompt should emphasize critical analysis and constructive suggestions.
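If the critic is asked to answer in a bulleted list, its reply can be split into individual feedback items before being passed on. The helper below is a minimal sketch; the function name `parse_feedback` and the set of bullet styles it accepts are assumptions, and real critic output may need more tolerant parsing.

```python
import re

# Matches lines starting with "-", "*", "•", or a number such as "1." or "1)".
_BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+(.*\S)\s*$")


def parse_feedback(critic_reply: str) -> list[str]:
    """Return the bullet items found in a critic reply, in their original order."""
    items = []
    for line in critic_reply.splitlines():
        match = _BULLET.match(line)
        if match:
            # Keep only the text after the bullet marker.
            items.append(match.group(1))
    return items
```

Lines that are not bullets, such as an introductory sentence, are ignored, so each item can be tracked or addressed separately by the generator.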
Implement the Iterative Feedback Loop
Design and implement a system in which:
1) The Generator LLM produces an output.
2) The Critic LLM evaluates this output and provides feedback.
3) The feedback is returned to the Generator LLM (e.g., as part of its next prompt, or as fine-tuning data) to guide an improved output in the next iteration.
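The generate, critique, regenerate cycle described in this step can be sketched as a short loop. This is an illustration under stated assumptions: each model is wrapped as a plain function from prompt text to reply text, and the stopping rule (the critic replying with the marker `NO ISSUES`, or a fixed `max_rounds`) is a hypothetical convention, not something the source specifies.

```python
from typing import Callable

# Assumption: each model is exposed as a function from prompt text to reply text.
LLM = Callable[[str], str]


def refine(
    problem: str,
    generator: LLM,
    critic: LLM,
    max_rounds: int = 3,
    accept_marker: str = "NO ISSUES",  # hypothetical acceptance convention
) -> tuple[str, list[str]]:
    """Run generate -> critique -> regenerate until the critic accepts or rounds run out."""
    feedback_history: list[str] = []
    solution = ""
    for _ in range(max_rounds):
        # 1) The generator sees the problem plus all feedback gathered so far.
        gen_prompt = (
            f"Problem: {problem}\n"
            f"Previous feedback: {' | '.join(feedback_history) or 'none'}\n"
            "Write a detailed solution or proof that addresses the feedback."
        )
        solution = generator(gen_prompt)
        # 2) The critic reviews the latest attempt.
        critic_prompt = (
            f"Problem: {problem}\nSolution: {solution}\n"
            f"List specific flaws, or reply {accept_marker}."
        )
        feedback = critic(critic_prompt)
        if accept_marker in feedback:
            break
        # 3) Otherwise the feedback is carried into the next round.
        feedback_history.append(feedback)
    return solution, feedback_history
```

Passing the models in as functions keeps the loop independent of any particular provider and lets it be exercised with deterministic stubs before real models are connected.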
Refine Generator with Critic's Feedback
Refine the Generator LLM over successive iterations. Either incorporate the critic's feedback into the generator's prompt for subsequent attempts, or use the critic's evaluations and feedback as training data for fine-tuning the generator model itself.
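For the fine-tuning route, the critic's output first has to be stored in a reusable form. The sketch below records each draft, its feedback, and the revised attempt as JSON Lines. The record schema (`problem`, `draft`, `feedback`, `revision`) is an assumed layout for illustration; the format a given fine-tuning toolchain expects may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RevisionExample:
    """One training record; the field names are an assumed schema, not a standard."""
    problem: str
    draft: str
    feedback: str
    revision: str


def to_jsonl(examples: list[RevisionExample]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(asdict(e), ensure_ascii=False) for e in examples)
```

Each line can later be mapped to whatever prompt and completion fields the chosen training setup requires.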

Starter code

{
  "generator_prompt_template": "Given the problem: {problem_statement}, and previous feedback: {feedback_history}, generate a detailed solution or proof. Focus on addressing the feedback to improve quality.",
  "critic_prompt_template": "Review the following solution for the problem: {problem_statement}. Solution: {generated_solution}. Identify specific flaws, logical gaps, or areas for improvement. Provide constructive feedback in a bulleted list, focusing on how the solution can be made more insightful and correct.",
  "initial_problem": "Prove that for any prime number p > 3, p^2 - 1 is divisible by 24."
}
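The templates in the starter JSON use `{name}` placeholders, which match Python's `str.format` syntax. The sketch below shows one way to fill them while failing early when a field is missing. The helper names `placeholders` and `render` are hypothetical, and the inline config is a shortened stand-in with the same shape as the starter JSON, not a replacement for it.

```python
import json
from string import Formatter


def placeholders(template: str) -> set[str]:
    """Return the {field} names used in a str.format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render(template: str, **fields: str) -> str:
    """Fill a template, raising KeyError if any placeholder has no value."""
    missing = placeholders(template) - set(fields)
    if missing:
        raise KeyError(f"missing template fields: {sorted(missing)}")
    return template.format(**fields)


# Shortened config in the same shape as the starter JSON (illustrative only).
config = json.loads("""
{
  "critic_prompt_template": "Review the solution for: {problem_statement}. Solution: {generated_solution}.",
  "initial_problem": "Prove that for any prime number p > 3, p^2 - 1 is divisible by 24."
}
""")

critic_prompt = render(
    config["critic_prompt_template"],
    problem_statement=config["initial_problem"],
    generated_solution="(draft proof text)",
)
```

Only the template is interpreted by `format`, so braces that appear inside a generated solution or in the feedback text are inserted verbatim.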

Source

Paper: arxiv.org