Turning "what is worth studying" into a learnable preference-modeling problem: learn scientific judgment from community feedback, then use that judgment to improve scientific ideation.
Top scientists do more than read papers, write code, and run experiments. They judge which problems are worth pursuing and which directions may have lasting impact. The paper "AI Can Learn Scientific Taste" defines scientific taste as the ability to judge and propose high-impact research ideas, then formulates it as a preference modeling and alignment problem.
Reinforcement Learning from Community Feedback (RLCF) learns preferences from large-scale scientific community signals instead of costly expert labels.
Given the titles and abstracts of two papers, Scientific Judge reasons about which work is likely to have higher impact.
Given a paper, Scientific Thinker proposes follow-up research ideas and is aligned using the Judge as a reward model.
The Judge generalizes to future years, unseen fields, and peer-review preferences; the Thinker transfers to later research topics.
The paper treats scientific taste not as an individual's subjective whim, but as a collective judgment formed through long-term community interaction. When a work is repeatedly reused, extended, and cited, that pattern reflects a durable community preference about research value. Citations, reuse, and review preferences can therefore provide learning signals for scientific judgment.
This perspective addresses a missing component in AI scientist systems. Current models can retrieve literature, write code, and execute experiments, but without a sense of which direction is worth pursuing, they struggle to close the expert research loop.
RLCF starts from a practical observation: scientific ideation has no single ground-truth answer, so reinforcement learning with verifiable rewards (RLVR) is difficult to apply directly. Reinforcement learning from human feedback (RLHF) is also limited, because expert annotation is expensive and cannot easily capture long-term, community-scale preference. The team instead constructs preference pairs from community feedback such as citations, pairing papers that match in field and publication time.
The workflow has three steps. First, collect papers and community feedback. Second, train Scientific Judge to decide which of two papers has higher impact. Third, use the Judge as a generative reward model to train Scientific Thinker to propose stronger follow-up ideas.
SciJudgeBench contains 700K arXiv paper pairs. Each pair is tightly matched by field and publication year while showing a significant citation difference. This design discourages shortcuts based only on year, domain, or topic popularity, pushing the model to learn deeper signals from titles and abstracts.
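The matching idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `Paper` record, the `build_preference_pairs` helper, and the `min_ratio` and `min_gap` thresholds are all hypothetical, since the summary does not state how a "significant citation difference" is defined.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Paper:
    paper_id: str
    field: str
    year: int
    citations: int


def build_preference_pairs(papers, min_ratio=2.0, min_gap=10):
    """Pair papers from the same (field, year) bucket whose citations differ clearly.

    min_ratio and min_gap are illustrative thresholds, not values from the paper.
    Returns a list of (preferred_id, other_id) tuples.
    """
    # Group papers so that every pair shares field and publication year.
    buckets = defaultdict(list)
    for p in papers:
        buckets[(p.field, p.year)].append(p)

    pairs = []
    for bucket in buckets.values():
        for a, b in combinations(bucket, 2):
            hi, lo = (a, b) if a.citations >= b.citations else (b, a)
            gap = hi.citations - lo.citations
            # max(..., 1) keeps the ratio test meaningful for uncited papers.
            if gap >= min_gap and hi.citations >= min_ratio * max(lo.citations, 1):
                pairs.append((hi.paper_id, lo.paper_id))
    return pairs


papers = [
    Paper("a", "cs.LG", 2022, 120),
    Paper("b", "cs.LG", 2022, 8),
    Paper("c", "cs.LG", 2022, 110),
    Paper("d", "hep-th", 2022, 300),
    Paper("e", "cs.LG", 2023, 5),
]
pairs = build_preference_pairs(papers)
print(pairs)
```

In this toy data, papers "a" and "c" are not paired with each other because their citation counts are too close, and "d" and "e" have no same-field, same-year partner. Restricting pairs to one bucket is what removes year and field as shortcuts.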
Experiments show that Scientific Judge improves with both data scale and model scale. A Judge trained from Qwen3-30B outperforms frontier closed models such as GPT-5.2 and Gemini 3 Pro, while also generalizing to 2025 papers, unseen disciplines, and ICLR review-score preferences.
If Judge is the research critic, Thinker is the research idea generator. Given a paper, it proposes a potentially high-impact follow-up idea. During training, the Judge compares generated ideas pairwise within a group; the within-group win rate becomes the reward, a process the paper calls comparison-based GRPO.
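The within-group win-rate reward can be sketched as follows. This is a simplified illustration under stated assumptions: `judge` is a hypothetical callable standing in for Scientific Judge, every pair is compared exactly once, and the mean/standard-deviation normalization in `group_advantages` follows the usual GRPO recipe rather than anything this summary confirms about the paper's implementation.

```python
from itertools import combinations
from statistics import mean, pstdev


def win_rate_rewards(ideas, judge):
    """Reward each idea with the fraction of pairwise comparisons it wins in its group.

    judge(a, b) is a hypothetical callable that returns True when idea `a`
    is preferred over idea `b`.
    """
    n = len(ideas)
    if n < 2:
        raise ValueError("need at least two ideas to compare")
    wins = [0] * n
    for i, j in combinations(range(n), 2):
        if judge(ideas[i], ideas[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    # Each idea takes part in n - 1 comparisons.
    return [w / (n - 1) for w in wins]


def group_advantages(rewards, eps=1e-8):
    """Center and scale rewards within the group (assumed GRPO-style normalization)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def toy_judge(a, b):
    # Stand-in for the learned Judge: prefers the longer string.
    return len(a) > len(b)


ideas = ["aa", "a", "aaaa", "aaa"]
rewards = win_rate_rewards(ideas, toy_judge)
advantages = group_advantages(rewards)
print(rewards)
```

Because the reward is relative to the other samples in the group, it needs no absolute quality scale: the Judge only has to say which of two ideas is better.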
Trained on only about 4K examples built from high-citation papers from January to July 2025, Scientific Thinker reaches a 70-80% win rate over its base model and generalizes to later research topics. Better judgment can therefore guide stronger scientific creativity.
This work decomposes "scientific taste" into two trainable abilities: judgment and ideation. Judge identifies which research is more likely to matter; Thinker proposes directions with higher potential. Together, they form a loop from evaluation to generation.
Citations are not the whole of scientific value, and the model does not yet possess full human scientific judgment. But RLCF provides a scalable starting point: learn value signals from long-term community feedback, then use those signals to generate better research ideas.
Paper: https://arxiv.org/abs/2603.14473
Project: https://tongjingqi.github.io/AI-Can-Learn-Scientific-Taste
GitHub: https://github.com/tongjingqi/AI-Can-Learn-Scientific-Taste
Data and models: https://huggingface.co/collections/OpenMOSS-Team/ai-can-learn-scientific-taste