Welcome to G1Bbon

The G1Bbon benchmark is based on the Temporal Reasoning Task, which is designed to test temporal integration and decision-making in both human players and language models. The ultimate goal is to drive the development of the strongest probabilistic-reasoning LLM with at most 1.5 billion parameters, setting a new standard for temporal reasoning and pattern detection.

What is the Temporal Reasoning Task?

The Temporal Reasoning Task is a cognitive experiment that tests how well participants can detect patterns that emerge over time and integrate information across multiple rounds to make accurate decisions. It challenges both human players and language models to identify statistical biases in the color distributions at different spatial locations.

Ready to begin?

Try the task yourself and see how well you can detect temporal patterns.