Hi! I am a third-year PhD student in the Natural Language Processing group at the University of Hong Kong (HKUNLP). I am fortunate to be advised by Dr. Tao Yu (core), Dr. Lingpeng Kong and Prof. Ben Kao. My primary interests are Data Science and Natural Language Processing. Previously, I studied Computer Science at the Chinese University of Hong Kong, graduating in 2022.

Publications


BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
Hongjin Su*, Howard Yen*, Mengzhou Xia*, Weijia Shi, Niklas Muennighoff, Han-yu Wang, Haisu Liu, Quan Shi, Zachary S. Siegel, Michael Tang, Ruoxi Sun, Jinsung Yoon, Sercan O. Arik, Danqi Chen, Tao Yu
Preprint
[paper] [code] [data] [website]

ARKS: Active Retrieval in Knowledge Soup for Code Generation
Hongjin Su, Shuyang Jiang, Yuhang Lai, Haoyuan Wu, Boao Shi, Che Liu, Qian Liu, Tao Yu
EMNLP 2024
[paper] [code] [data] [website]

Generative Representational Instruction Tuning
Niklas Muennighoff, Hongjin Su, Liang Wang, Nan Yang, Furu Wei, Tao Yu, Amanpreet Singh, Douwe Kiela
ICLR AGI Workshop 2024 (Oral, Best Paper Award)
[paper] [code] [models] [blog]

Lemur: Harmonizing Natural Language and Code for Language Agents
Yiheng Xu*, Hongjin Su*, Chen Xing*, Boyu Mi, Qian Liu, Weijia Shi, Binyuan Hui, Fan Zhou, Yitao Liu, Tianbao Xie, Zhoujun Cheng, Siheng Zhao, Lingpeng Kong, Bailin Wang, Caiming Xiong, Tao Yu
ICLR 2024 Spotlight (Top 5%)
[paper] [code] [model] [blog]

OpenAgents: An Open Platform for Language Agents in the Wild
Tianbao Xie*, Fan Zhou*, Zhoujun Cheng*, Peng Shi*, Luoxuan Weng*, Yitao Liu*, Toh Jing Hua, Junning Zhao, Qian Liu, Che Liu, Leo Z. Liu, Yiheng Xu, Hongjin Su, Dongchan Shin, Caiming Xiong, Tao Yu
COLM 2024
[paper] [code] [blog]

One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Hongjin Su*, Weijia Shi*, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen-tau Yih, Noah A. Smith, Luke Zettlemoyer, Tao Yu
ACL 2023
[paper] [model (3.4M downloads)] [website] [code (used by over 3.8k repos)]

Selective Annotation Makes Language Models Better Few-Shot Learners
Hongjin Su, Jungo Kasai, Chen Henry Wu, Weijia Shi, Tianlu Wang, Jiayi Xin, Rui Zhang, Mari Ostendorf, Luke Zettlemoyer, Noah A. Smith, Tao Yu
ICLR 2023
[paper] [code]

Taming Pre-trained Language Models with N-gram Representations for Low-Resource Domain Adaptation
Shizhe Diao, Ruijia Xu, Hongjin Su, Yilei Jiang, Yan Song, Tong Zhang
ACL 2021
[paper] [code]