HumanEval+

Evaluate your code generation models using HumanEval+, an extended version of OpenAI's HumanEval benchmark. This Action Pack guides you through setting up the benchmark, generating code solutions, and running the enhanced evaluation with additional test cases.

Intermediate · 30 min · 5 steps

The play

Clone the HumanEval+ Repository
Obtain the HumanEval+ benchmark by cloning its official GitHub repository to your local machine.
Set Up Your Environment
Navigate into the cloned directory and install the required Python dependencies to prepare your evaluation environment.
Generate Code Completions
Integrate your code generation LLM to produce solutions for the problems defined in HumanEval+. Save these completions in the expected format (e.g., JSONL) for evaluation.
Run the Evaluation Script
Execute the HumanEval+ evaluation script against your generated code completions. This script will run the original and extended test cases.
Analyze Evaluation Results
Review the output from the evaluation script, focusing on pass@k metrics and detailed results for both original and additional test cases to understand your model's performance.
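The pass@k metric mentioned in the last step can be checked by hand with the standard unbiased estimator, 1 - C(n - c, k) / C(n, k), where n is the number of samples generated for a problem and c is the number that passed. The sketch below is illustrative only: the function names `pass_at_k` and `mean_pass_at_k` are hypothetical and not part of the HumanEval+ tooling, and the evaluation script may report these numbers in its own way.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed all tests, k: budget.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every draw of k contains a passing one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per problem."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    if not scores:
        raise ValueError("results must not be empty")
    return sum(scores) / len(scores)
```

Computing the estimate twice, once with pass counts from the original tests and once with counts from the extended tests, gives a direct view of how many solutions only appeared correct under the weaker test suite.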

Starter code

git clone https://github.com/HumanEvalPlus/HumanEvalPlus.git
cd HumanEvalPlus
pip install -e .
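
For the "Generate Code Completions" step, a minimal sketch of writing completions to a JSONL file might look like the following. Everything here is an assumption for illustration: the record fields `task_id` and `completion`, the helper names, the toy problem, and the `stub_generate` placeholder that stands in for a real model call. Check the repository's documentation for the exact schema the evaluation script expects.

```python
import json
from typing import Callable, Dict, Iterator


def completion_lines(
    problems: Dict[str, str], generate: Callable[[str], str]
) -> Iterator[str]:
    """Yield one JSON-encoded record per problem (field names are assumed)."""
    for task_id, prompt in problems.items():
        record = {"task_id": task_id, "completion": generate(prompt)}
        # json.dumps escapes newlines, so each record stays on a single line.
        yield json.dumps(record)


def write_jsonl(
    path: str, problems: Dict[str, str], generate: Callable[[str], str]
) -> int:
    """Write all records to `path` in JSONL form and return the record count."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for line in completion_lines(problems, generate):
            handle.write(line + "\n")
            count += 1
    return count


def stub_generate(prompt: str) -> str:
    """Hypothetical placeholder; replace with a call to your own model."""
    return "    pass\n"


# Toy input for illustration only; real prompts come from the benchmark.
toy_problems = {"Example/0": "def add(a, b):\n"}
```

Calling `write_jsonl("samples.jsonl", toy_problems, stub_generate)` would produce a one-line file; the file name is likewise only an example.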

Source

Repo: github.com