Hello, I am Zhuohao Yu (于 倬浩), currently a second-year Master’s student at Peking University, advised by Prof. Wei Ye and Prof. Shikun Zhang. My research interest lies in natural language processing.
Beyond academia, I have a strong background in competitive programming, where I won a Gold medal at the ACM-ICPC regionals and a Silver medal at the National Olympiad in Informatics (NOI).
My research focuses on advancing interpretable, reliable, self-evolving LLMs through autonomous evaluation, alignment, and reasoning, with a commitment to creating trustworthy, practical, open-source AI systems.
This work follows a progressive pipeline: developing robust evaluation methodologies, then leveraging those insights for self-improvement, all built on efficient and trustworthy infrastructure.
Autonomous evaluation of language models. As LLMs surpass human expertise in specialized domains, evaluation becomes profoundly challenging. When models possess knowledge beyond human validators, who becomes the arbiter of truth? How can evaluation frameworks evolve alongside increasingly capable models? What distinguishes a model that truly understands from one that merely memorizes?
Related Works: KIEval (ACL 2024), PandaLM (ICLR 2024), FreeEval (EMNLP 2024).
Self-improving and reasoning LLMs. The next frontier lies in creating models that can leverage autonomous rewards to enhance their own capabilities, both during training and inference. How can we convert assessment into actionable learning that preserves model integrity? What role does reasoning play in enabling models to refine their own capabilities? What constraints prevent models from optimizing for superficial metrics rather than meaningful capabilities?
Related Works: ORPS (Preprint), Supervised Knowledge… (ICLR 2024).
Trustworthy open-source AI systems. Realizing AI’s potential demands infrastructure that is powerful, accessible, and responsible. How can we democratize access while establishing safeguards against misuse? What architectures allow transparency without sacrificing efficiency? How might attribution mechanisms maintain accountability within collaborative ecosystems?
Related Works: SAEMark (Preprint), FreeEval (EMNLP 2024), PaperLens (Try it).
PaperLens: an LLM-powered platform to navigate, discover, and analyze research papers. Search for ideas instead of terms.