Paper·arxiv.org
llm, rag, evaluation, research, context-engineering
Case-Grounded Evidence Verification: A Framework for Constructing Evidence-Sensitive Supervision
Apply the Case-Grounded Evidence Verification framework so that AI models genuinely depend on the evidence they are given instead of attaching it superficially. The goal is better factual accuracy and trustworthiness: the model's decisions should be directly contingent on verifiable evidence.
intermediate · 1 hour · 5 steps
The play
- Define Grounding Criteria: Establish explicit rules for what counts as 'grounded' evidence. Specify, for your domain, when evidence directly supports a claim, when it contradicts the claim, and when it is irrelevant to it.
- Construct Verification Datasets: Create or augment datasets with claims, corresponding evidence snippets, and clear labels indicating the evidence-claim relationship (e.g., 'supports', 'contradicts', 'insufficient'). Annotate specific spans of evidence that justify the label.
- Implement Evidence-Sensitive Training: Integrate training methodologies that penalize models for making claims not directly supported by the provided evidence. This could involve specific loss functions, negative sampling, or multi-task learning objectives focused on evidence verification.
- Evaluate Evidence Reliance: Develop and apply evaluation metrics that quantify the model's dependence on evidence. This often involves perturbing or removing key evidence phrases and measuring how consistently the model's prediction or justification changes.
- Iterate and Refine: Continuously refine your grounding criteria, annotation guidelines, training objectives, and evaluation methods based on model performance, identified grounding failures, and human feedback.
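The dataset and evaluation steps above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's metric: the `Example` record, the `flip_rate` function, and the word-overlap `toy_predict` verifier are all hypothetical names introduced here, and `toy_predict` stands in for whatever model you actually evaluate. The sketch removes the annotated key span from each evidence snippet and reports how often the prediction changes.

```python
from dataclasses import dataclass
from typing import Callable, List

LABELS = ("supports", "contradicts", "insufficient")


@dataclass
class Example:
    claim: str
    evidence: str
    label: str     # one of LABELS
    key_span: str  # annotated span of the evidence that justifies the label


def ablate(example: Example) -> str:
    """Return the evidence with the annotated key span removed."""
    return example.evidence.replace(example.key_span, "").strip()


def flip_rate(predict: Callable[[str, str], str], examples: List[Example]) -> float:
    """Fraction of examples whose prediction changes once the key span is removed."""
    if not examples:
        return 0.0
    flips = 0
    for ex in examples:
        original = predict(ex.claim, ex.evidence)
        perturbed = predict(ex.claim, ablate(ex))
        flips += original != perturbed
    return flips / len(examples)


def toy_predict(claim: str, evidence: str) -> str:
    """Placeholder verifier (assumption): 'supports' only if every claim word occurs in the evidence."""
    claim_words = set(claim.lower().rstrip(".").split())
    evidence_words = set(evidence.lower().replace(".", " ").replace(",", " ").split())
    return "supports" if claim_words <= evidence_words else "insufficient"


examples = [
    Example(
        claim="Paris is the capital of France.",
        evidence="Paris is the capital and most populous city of France.",
        label="supports",
        key_span="Paris is the capital",
    ),
]

print(flip_rate(toy_predict, examples))
```

A model that ignores its evidence scores near zero on this measure, since removing the key span leaves its predictions unchanged. A high flip rate is necessary but not sufficient for grounding, so it is best read alongside label accuracy.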
Starter code
{
"claim": "The capital of France is Berlin.",
"evidence": "Paris is the capital and most populous city of France.",
"question": "Does the evidence provided directly support the claim? Explain your reasoning, citing specific phrases from the evidence if applicable. If not, state why."
}
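One way to use the starter record is to render it into a plain-text verification prompt. The sketch below assumes a simple "Claim / Evidence / question" layout; the `build_prompt` helper and that layout are illustrative choices, not something the source prescribes.

```python
import json

# The starter record, reproduced as a JSON string.
RECORD = """
{
  "claim": "The capital of France is Berlin.",
  "evidence": "Paris is the capital and most populous city of France.",
  "question": "Does the evidence provided directly support the claim? Explain your reasoning, citing specific phrases from the evidence if applicable. If not, state why."
}
"""


def build_prompt(record: dict) -> str:
    """Render a claim/evidence/question record as a plain-text prompt (hypothetical layout)."""
    return (
        f"Claim: {record['claim']}\n"
        f"Evidence: {record['evidence']}\n\n"
        f"{record['question']}"
    )


record = json.loads(RECORD)
prompt = build_prompt(record)
print(prompt)
```

The claim in this record conflicts with its evidence, so a grounded model should answer that the evidence does not support it and cite the phrase "Paris is the capital".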