Unsolved Questions (UQ) Project

UQ is an effort to create a new evaluation paradigm for LLMs. Instead of crafting exam-style benchmark questions where we know the answers, we test LLMs on organic, unsolved problems via automated LLM-based validation & community verification. Our current paper is at https://arxiv.org/abs/2508.17580.

We are looking for contributors for the following tasks:

Contribution	Points awarded	Description
Verify model answers in your expert domain	1 for every accepted verification	Provide an original, human-written verification of a model's answer. Post the verification on our website and contact a project lead. Many questions/answers are too challenging for our team to verify. We need your expertise!
Write reviews for proposed questions	1 for every question	Review questions that pass LLM-based filtering to decide whether to include them in the new version of UQ-Dataset.
Help with development and maintenance	1-10	A great way to contribute to UQ is to help with engineering tasks, especially for UQ-Platform. Contact a project lead for detailed information.

Tracking contributions

We log all points in a shared tracker so your work is visible.
Project leads may also award ad-hoc points to recognize unlisted but useful tasks.

Authorship and credit

Final authorship and author order will be set by the project leads, considering both each contributor's point total and meaningful qualitative impact.
Contributors with fewer than 10 points may be acknowledged.

Human verification policy

Write verifications in your own words to protect integrity and showcase your expertise. No AI-generated reasoning.
You may use tools for lookup or formatting, but disclose any AI assistance with grammar, typos, etc.
Include enough detail for someone else to double-check your work, with references as needed.

New question review policy

Follow the template and write the review yourself. No AI-generated reasoning.
You may use tools for lookup or formatting; if AI helps with grammar/typos, disclose such use.

We're excited to collaborate!

Ready to contribute? Get in touch with our project leads to get started.

UQ: Assessing Language Models on Unsolved Questions