Haofei Yu

Language Technologies Institute, School of Computer Science, Carnegie Mellon University
5000 Forbes Ave, Pittsburgh, PA 15213
Taken at Göreme, Cappadocia, Turkey


I am currently a second-year student at Carnegie Mellon University, pursuing a Master of Science in Intelligent Information Systems in the Language Technologies Institute of the School of Computer Science.

Before coming to CMU, I laid my academic foundations at the Chu Kochen Honors College of Zhejiang University, where I obtained my Bachelor's degree in Computer Science and Technology (with honors). During my years there, I had the privilege of delving into Natural Language Processing (NLP) research at Westlake University and at Tencent AI Lab's NLP Center.

At CMU, my research has shifted from traditional natural language processing toward more interesting and grounded problems. Instead of working on traditional NLP tasks, I have studied how to build language agents for better social interaction in negotiation and collaboration scenarios. I have also worked on models that fuse multimodal information and on models that generate executable actions, such as web navigation and code generation.

Moreover, during my summer 2023 internship with Apple's Siri Information and Intelligence team, I focused on long-form web QA to effectively resolve real Siri user queries in an industry setting.

Research Interest

My research interest lies in applying AI to grounded tasks. My primary goal is to develop a multimodal agent tailored for real-world applications, blending the practical with the magical in technology. To achieve this goal, I have identified three main challenges:

Challenge 1: Multimodal Fusion
How to allow the agent to fuse heterogeneous data and understand multimodal concepts such as sarcasm.

Challenge 2: Multimodal Interaction
How to enable the agent to interactively provide feedback to the real world, including language and executable actions.

Challenge 3: Memory Management
How to equip the agent with a memory system that adapts to new information while retaining past knowledge.

The following figure gives an overview of the three challenges in the observation-feedback loop of a multimodal agent, along with a snapshot of my current research progress on each of them, including ongoing work.

Research Line


Nov 5, 2023

🎉 Multimodal Mixture of Experts was accepted by the NeurIPS 2023 UniReps Workshop!

Oct 7, 2023

🎉 TRAMS was accepted by EMNLP 2023 Findings, and Counting the Bugs in ChatGPT’s Wugs was accepted by the EMNLP 2023 Main Conference!

May 2, 2023

🎉 Uni-Encoder and RFiD were accepted by ACL 2023 Findings!

Aug 27, 2022

😄 Enrolled in the MIIS program and began my first semester at CMU.

Jun 30, 2022

💪 Graduated from Zhejiang University and received a B.Eng. in Computer Science and Technology (with Honors).