Open LLM Leaderboard | LLMWay

Open LLM Leaderboard

aims to track, rank, and evaluate LLMs and chatbots as they are released.

Relevant Sites

SuperLim

a Swedish language understanding benchmark that evaluates natural language processing (NLP) models on various tasks such as argumentation analysis, semantic similarity, and textual entailment.

CompMix

a benchmark evaluating QA methods that operate over a mixture of heterogeneous input sources (KB, text, tables, infoboxes).

MMedBench

a benchmark that evaluates large language models' ability to answer medical questions across multiple languages.
