Hi! I am a third-year PhD student in the Natural Language Processing group at the University of Hong Kong (HKUNLP). I am fortunate to be advised by Dr. Tao Yu (core), Dr. Lingpeng Kong, and Prof. Ben Kao. My primary interests are Data Science and Natural Language Processing. Before that, I studied Computer Science at the Chinese University of Hong Kong, graduating in 2022.
Publications
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
Hongjin Su*, Howard Yen*, Mengzhou Xia*, Weijia Shi, Niklas Muennighoff, Han-yu Wang, Haisu Liu, Quan Shi, Zachary S. Siegel, Michael Tang, Ruoxi Sun, Jinsung Yoon, Sercan O. Arik, Danqi Chen, Tao Yu
Preprint
[paper]
[code]
[data]
[website]
ARKS: Active Retrieval in Knowledge Soup for Code Generation
Hongjin Su, Shuyang Jiang, Yuhang Lai, Haoyuan Wu, Boao Shi, Che Liu, Qian Liu, Tao Yu
EMNLP 2024
[paper]
[code]
[data]
[website]
Generative Representational Instruction Tuning
Niklas Muennighoff, Hongjin Su, Liang Wang, Nan Yang, Furu Wei, Tao Yu, Amanpreet Singh, Douwe Kiela
ICLR AGI Workshop 2024 (Oral, Best Paper Award)
[paper]
[code]
[models]
[blog]
Lemur: Harmonizing Natural Language and Code for Language Agents
Yiheng Xu*, Hongjin Su*, Chen Xing*, Boyu Mi, Qian Liu, Weijia Shi, Binyuan Hui, Fan Zhou, Yitao Liu, Tianbao Xie, Zhoujun Cheng, Siheng Zhao, Lingpeng Kong, Bailin Wang, Caiming Xiong, Tao Yu
ICLR 2024 Spotlight (Top 5%)
[paper]
[code]
[model]
[blog]
OpenAgents: An Open Platform for Language Agents in the Wild
Tianbao Xie*, Fan Zhou*, Zhoujun Cheng*, Peng Shi*, Luoxuan Weng*, Yitao Liu*, Toh Jing Hua, Junning Zhao, Qian Liu, Che Liu, Leo Z. Liu, Yiheng Xu, Hongjin Su, Dongchan Shin, Caiming Xiong, Tao Yu
COLM 2024
[paper]
[code]
[blog]
One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Hongjin Su*, Weijia Shi*, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen-tau Yih, Noah A. Smith, Luke Zettlemoyer, Tao Yu
ACL 2023
[paper]
[model (3.4M downloads)]
[website]
[code (used by over 3.8k repos)]
Selective Annotation Makes Language Models Better Few-Shot Learners
Hongjin Su, Jungo Kasai, Chen Henry Wu, Weijia Shi, Tianlu Wang, Jiayi Xin, Rui Zhang, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu
ICLR 2023
[paper]
[code]
Taming Pre-trained Language Models with N-gram Representations for Low-Resource Domain Adaptation
Shizhe Diao, Ruijia Xu, Hongjin Su, Yilei Jiang, Yan Song, Tong Zhang
ACL 2021
[paper]
[code]