LLM Leaderboard

Although the leaderboard is geared towards 1B-parameter LLMs, we show the current performance of all models on the G1Bbon benchmark. As we collect more data, we will update the leaderboard with the latest scores for the 1B models only.

G1Bbon Benchmark (2 Quadrants)

| Rank | Model | Aggregated Mean Score |
|------|-------|-----------------------|
| 1 | gpt4o | 0.70 - 0.72 |
| 2 | Centaur_8B | 0.63 - 0.65 |
| 3 | gpt4o_mini | 0.58 - 0.60 |
| 4 | Qwen_7B_Instruct | 0.50 - 0.52 |
| 5 | Qwen_3B_Instruct | 0.45 - 0.47 |
| 6 | Qwen_3B | 0.42 - 0.44 |
| 7 | Qwen_7B | 0.40 - 0.42 |
| 8 | Deepseek_R1_7B_Qwen | 0.38 - 0.40 |
| 9 | Qwen_1B | 0.35 - 0.37 |
| 10 | Qwen_1B | 0.32 - 0.34 |
| 11 | Qwen_1B_Instruct | 0.30 - 0.32 |
| 12 | Deepseek_R1_8B_Llama | 0.16 - 0.18 |
| 13 | Deepseek_R1_1B_Qwen | 0.14 - 0.16 |

G1Bbon Benchmark (4 Quadrants)

| Rank | Model | Aggregated Mean Score |
|------|-------|-----------------------|
| 1 | Qwen_7B_Instruct | 0.34 - 0.36 |
| 2 | Centaur_8B | 0.32 - 0.34 |
| 3 | gpt4o_mini | 0.29 - 0.31 |
| 4 | gpt4o | 0.25 - 0.27 |
| 5 | Qwen_3B_Instruct | 0.22 - 0.24 |
| 6 | Qwen_3B | 0.20 - 0.22 |
| 7 | Qwen_1B | 0.18 - 0.20 |
| 8 | Qwen_7B | 0.16 - 0.18 |
| 9 | Deepseek_R1_7B_Qwen | 0.14 - 0.16 |
| 10 | Deepseek_R1_1B_Qwen | 0.10 - 0.12 |
| 11 | Qwen_1B_Instruct | 0.08 - 0.10 |
| 12 | Deepseek_R1_8B_Llama | 0.04 - 0.06 |