LLM Leaderboard
Although the leaderboard is geared towards 1B LLMs, we currently show the performance of all evaluated models on the G1Bbon benchmark. As we collect more data, we will update the leaderboard to report the latest scores for the 1B models only.
G1Bbon Benchmark (2 Quadrants)
Rank | Model | Aggregated Mean Score |
---|---|---|
1 | gpt4o | 0.70 - 0.72 |
2 | Centaur_8B | 0.63 - 0.65 |
3 | gpt4o_mini | 0.58 - 0.60 |
4 | Qwen_7B_Instruct | 0.50 - 0.52 |
5 | Qwen_3B_Instruct | 0.45 - 0.47 |
6 | Qwen_3B | 0.42 - 0.44 |
7 | Qwen_7B | 0.40 - 0.42 |
8 | Deepseek_R1_7B_Qwen | 0.38 - 0.40 |
9 | Qwen_1B | 0.35 - 0.37 |
10 | Owen_1B | 0.32 - 0.34 |
11 | Qwen_1B_Instruct | 0.30 - 0.32 |
12 | Deepseek_R1_8B_Llama | 0.16 - 0.18 |
13 | Deepseek_R1_1B_Qwen | 0.14 - 0.16 |
G1Bbon Benchmark (4 Quadrants)
Rank | Model | Aggregated Mean Score |
---|---|---|
1 | Qwen_7B_Instruct | 0.34 - 0.36 |
2 | Centaur_8B | 0.32 - 0.34 |
3 | gpt4o_mini | 0.29 - 0.31 |
4 | gpt4o | 0.25 - 0.27 |
5 | Qwen_3B_Instruct | 0.22 - 0.24 |
6 | Qwen_3B | 0.20 - 0.22 |
7 | Qwen_1B | 0.18 - 0.20 |
8 | Qwen_7B | 0.16 - 0.18 |
9 | Deepseek_R1_7B_Qwen | 0.14 - 0.16 |
10 | Deepseek_R1_1B_Qwen | 0.10 - 0.12 |
11 | Qwen_1B_Instruct | 0.08 - 0.10 |
12 | Deepseek_R1_8B_Llama | 0.04 - 0.06 |
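For readers who want to reproduce a score of this shape, here is a minimal sketch of how an "Aggregated Mean Score" over quadrants could be computed. The quadrant names, the per-quadrant score lists, and the equal-weight averaging are all assumptions for illustration, not the benchmark's documented methodology.

```python
# Hypothetical sketch: aggregate per-quadrant scores into one mean score.
# Quadrant names and the equal-weight averaging are assumptions,
# not the G1Bbon benchmark's documented procedure.
from statistics import mean

def aggregated_mean_score(quadrant_scores: dict[str, list[float]]) -> float:
    """Average each quadrant's scores, then average across quadrants."""
    per_quadrant_means = [mean(scores) for scores in quadrant_scores.values()]
    return mean(per_quadrant_means)

# Example: two quadrants, each with a few trial scores.
scores = {
    "quadrant_1": [0.70, 0.72],
    "quadrant_2": [0.68, 0.74],
}
print(round(aggregated_mean_score(scores), 2))  # 0.71
```

Under this assumption, the 4-quadrant scores being lower than the 2-quadrant scores would simply reflect the mean being taken over more (and evidently harder) quadrants.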