LLM Comparison Dashboard
Select two or more models to compare their responses.
Evaluation Metrics
- Keyword: Measures how many of the expected keywords appear in the response
- Semantic: Measures how closely the response's meaning overlaps with the expected concepts
- Clarity: Rewards readable, well-structured responses
- Conciseness: Rewards answers that are neither too short nor too long
- Specificity: Penalizes vague language
- Final: Weighted score that combines all of the metrics above
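
The metric list above does not say how scores are computed or weighted, so the following is only a minimal sketch of how a keyword score and a weighted final score might be put together. The function names, the equal weights, and the assumption that every metric is scaled to the range 0 to 1 are all hypothetical and not taken from the dashboard itself.

```python
from typing import Dict, Iterable

# Hypothetical equal weights; the dashboard's actual weights are not stated.
DEFAULT_WEIGHTS: Dict[str, float] = {
    "keyword": 0.2,
    "semantic": 0.2,
    "clarity": 0.2,
    "conciseness": 0.2,
    "specificity": 0.2,
}


def keyword_score(response: str, expected: Iterable[str]) -> float:
    """Fraction of expected terms found in the response (case-insensitive substring match)."""
    terms = [term.lower() for term in expected]
    if not terms:
        return 0.0
    text = response.lower()
    hits = sum(1 for term in terms if term in text)
    return hits / len(terms)


def final_score(
    scores: Dict[str, float],
    weights: Dict[str, float] = DEFAULT_WEIGHTS,
) -> float:
    """Weighted average of per-metric scores, each assumed to lie in [0, 1].

    Metrics missing from `scores` count as 0.0.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(weights[name] * scores.get(name, 0.0) for name in weights)
    return weighted_sum / total_weight
```

Dividing by the total weight keeps the final score in the same 0 to 1 range as the inputs even if the weights are later changed and no longer sum to one.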