LLM Comparison Dashboard

Select two or more models to compare their responses.

Evaluation Metrics

Keyword: Measures how many of the expected keywords appear in the response
Semantic: Measures overlap in meaning with the expected concepts
Clarity: Rewards readable, well-structured responses
Conciseness: Rewards answers that are neither too short nor too long
Specificity: Penalizes vague language
Final: Weighted score combining all of the metrics above
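
The Final score can be read as a weighted average of the five metrics above. The sketch below is only an illustration under stated assumptions: each metric is taken to be a number in the range 0 to 1, the weights default to equal values, and the names METRICS and final_score are hypothetical rather than taken from the dashboard itself.

```python
from typing import Dict, Optional

# Metric names mirror the list above; the lowercase keys are an assumption.
METRICS = ("keyword", "semantic", "clarity", "conciseness", "specificity")


def final_score(
    scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Combine per-metric scores into one weighted average.

    Assumes every score lies in [0, 1]. Equal weights are used when none
    are given; the dashboard's actual weights are not stated in the text.
    """
    if weights is None:
        weights = {name: 1.0 for name in METRICS}

    total_weight = sum(weights[name] for name in METRICS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    # Normalizing by the total weight keeps the result in [0, 1].
    weighted_sum = sum(weights[name] * scores[name] for name in METRICS)
    return weighted_sum / total_weight
```

To compare two models, call final_score once per model with that model's metric scores and rank the results; passing a custom weights dictionary shifts the balance toward whichever metrics matter most for the task.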