Article·lmarena.ai
benchmark · evaluation · LLM · ai-models · chatbot-arena

LMSYS Chatbot Arena

Quickly evaluate large language models (LLMs) head-to-head using the LMSYS Chatbot Arena. This crowdsourced platform lets you send the same prompt to two anonymous models and vote for the better response, so that human preference votes, rather than automated metrics, drive the benchmark.

beginner · 5 min · 5 steps
The play
  1. Access the Arena
    Navigate to the LMSYS Chatbot Arena website to begin your LLM evaluation.
  2. Start a New Battle
    Click 'New Battle' to initiate a fresh comparison between two randomly selected, anonymous LLMs.
  3. Interact and Evaluate
    Prompt both models with the same query. Carefully compare their responses for quality, coherence, helpfulness, and overall performance.
  4. Submit Your Vote
    Select the model you believe performed better (or choose 'Tie'/'Neither'). You can also provide optional written feedback.
  5. Reveal and Learn
    After submitting your vote, the names of the models will be revealed. Review the arena's leaderboard and statistics to see how models rank.
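The leaderboard mentioned in step 5 is built from pairwise votes like the one you cast. As a rough illustration of how such votes could be turned into a ranking, here is a minimal Elo-style sketch. The model names, the vote format, the starting rating, and the step size `K` are all assumptions made for this example; it is not the Arena's actual ranking pipeline.

```python
from collections import defaultdict

K = 32          # assumed update step size (illustrative)
BASE = 1000.0   # assumed starting rating for every model (illustrative)


def expected_score(r_a, r_b):
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def rate(votes):
    """Apply Elo updates in order.

    Each vote is a hypothetical (model_a, model_b, winner) tuple,
    where winner is "a", "b", or "tie".
    """
    ratings = defaultdict(lambda: BASE)
    for model_a, model_b, winner in votes:
        r_a, r_b = ratings[model_a], ratings[model_b]
        e_a = expected_score(r_a, r_b)
        s_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]
        # Whatever A gains, B loses, so the total rating is conserved.
        ratings[model_a] = r_a + K * (s_a - e_a)
        ratings[model_b] = r_b - K * (s_a - e_a)
    return dict(ratings)


# Hypothetical votes, not real Arena data.
votes = [
    ("model-x", "model-y", "a"),
    ("model-y", "model-z", "tie"),
    ("model-x", "model-z", "a"),
]
ratings = rate(votes)
for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Sequential Elo updates depend on the order in which votes arrive, so a production leaderboard may prefer a model fitted over all votes at once. The sketch only shows the basic idea: a win against a higher-rated opponent moves a rating more than a win against a lower-rated one.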
Starter resource
chat.lmsys.org
Source
LMSYS Chatbot Arena — Action Pack