Article · lmarena.ai
Tags: benchmark, evaluation, LLM, ai-models, chatbot-arena
LMSYS Chatbot Arena
Quickly evaluate large language models (LLMs) head-to-head using the LMSYS Chatbot Arena. This crowdsourced platform shows you answers from two anonymous models to the same prompt and asks you to vote for the better one. The pooled human votes are used to rank the models on a public leaderboard.
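The blind, pairwise protocol described above can be sketched as a tiny Python model of a single battle. This is an illustration under stated assumptions, not the Arena's implementation: the model names, the vote labels, and the uniform random pairing are all hypothetical. The property it shows is that identities stay hidden until a vote has been recorded, and each battle accepts exactly one vote.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical model names; the real Arena's roster and sampling policy differ.
MODEL_POOL = ["model-alpha", "model-beta", "model-gamma", "model-delta"]

# Hypothetical vote labels: A wins, B wins, tie, or both responses judged bad.
VALID_VOTES = {"a", "b", "tie", "both_bad"}


@dataclass
class Battle:
    model_a: str                # hidden from the voter until a vote is cast
    model_b: str                # hidden from the voter until a vote is cast
    vote: Optional[str] = None  # set exactly once by cast_vote


def start_battle(rng: random.Random) -> Battle:
    """Pair two distinct models drawn uniformly at random (an assumption)."""
    model_a, model_b = rng.sample(MODEL_POOL, 2)
    return Battle(model_a, model_b)


def cast_vote(battle: Battle, vote: str) -> Tuple[str, str]:
    """Record the voter's choice, then reveal which models were A and B."""
    if vote not in VALID_VOTES:
        raise ValueError(f"unknown vote: {vote!r}")
    if battle.vote is not None:
        raise RuntimeError("this battle already has a vote")
    battle.vote = vote
    return battle.model_a, battle.model_b


rng = random.Random(42)
battle = start_battle(rng)
# While comparing answers, the voter only sees the labels "Model A" and "Model B".
revealed_a, revealed_b = cast_vote(battle, "a")
print(f"Model A was {revealed_a}, Model B was {revealed_b}")
```

Keeping the reveal inside `cast_vote` means there is no code path that exposes the names before a vote exists, which mirrors the anonymity the Arena relies on to reduce brand bias.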
Beginner · 5 min · 5 steps
The play
- Access the Arena: Navigate to the LMSYS Chatbot Arena website (now hosted at lmarena.ai; originally chat.lmsys.org) to begin your LLM evaluation.
- Start a New Battle: Open a new battle to begin a fresh comparison between two randomly selected, anonymous LLMs.
- Interact and Evaluate: Enter a single prompt; the Arena sends the same query to both models and shows their responses side by side. Compare the responses for quality, coherence, and helpfulness.
- Submit Your Vote: Select the model whose response you judged better, or declare a tie or mark both responses as bad. You can also provide optional written feedback.
- Reveal and Learn: After you vote, the Arena reveals which models you were comparing. Check the leaderboard and statistics to see how the models rank across all crowdsourced votes.
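The leaderboard in the last step turns many pairwise votes into a single ranking. LMSYS has described using Elo-style ratings for this, and later a Bradley-Terry fit. The sketch below is a plain online Elo update in Python, not LMSYS's production code: the K-factor, the 400-point scale, the 1000 starting rating, counting "both bad" as a tie, and the model names are all assumptions for illustration.

```python
from collections import defaultdict


def compute_elo(battles, k=4.0, scale=400.0, base=10.0, init=1000.0):
    """Online Elo over (model_a, model_b, winner) records, processed in order.

    winner is "a", "b", "tie", or "both_bad" (hypothetical labels); a
    "both_bad" vote is scored as a tie here, which is an assumption.
    """
    ratings = defaultdict(lambda: init)
    for model_a, model_b, winner in battles:
        ra, rb = ratings[model_a], ratings[model_b]
        # Expected score of A under the logistic Elo model; B's is the complement.
        ea = 1.0 / (1.0 + base ** ((rb - ra) / scale))
        eb = 1.0 - ea
        if winner == "a":
            sa = 1.0
        elif winner == "b":
            sa = 0.0
        elif winner in ("tie", "both_bad"):
            sa = 0.5
        else:
            raise ValueError(f"unknown winner: {winner!r}")
        # Move each rating toward its actual result; the updates are zero-sum.
        ratings[model_a] = ra + k * (sa - ea)
        ratings[model_b] = rb + k * ((1.0 - sa) - eb)
    return dict(ratings)


# Hypothetical vote log: (model shown as A, model shown as B, winner).
votes = [
    ("model-alpha", "model-beta", "a"),
    ("model-beta", "model-gamma", "tie"),
    ("model-alpha", "model-gamma", "a"),
]
ratings = compute_elo(votes)
for name, score in sorted(ratings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.1f}")
```

Because online Elo depends on the order in which votes arrive, replaying the same votes in a different order can shift the ratings slightly. That order dependence is one reason LMSYS has described moving to a Bradley-Terry fit, so treat this as an illustration of the idea rather than a way to reproduce the published numbers.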
Starter resource
chat.lmsys.org (source)