I’m a CS Ph.D. student in the Large Language Models (LLMs) Lab at Virginia Tech, advised by Tu Vu. My research focuses on collaborative and communal machine learning, aiming to efficiently and effectively develop well-rounded LLMs. To blend a diverse set of skills, I explore methods such as parameter-efficient fine-tuning, modular composition, and model merging.
Previously, I received a Master’s degree from the Department of Language Science and Technology (LST) at Saarland University, where I worked with Dietrich Klakow and Vera Demberg. Before that, I contributed to the development of NLP systems for historical archives with Richard Tsai and Yuan-ju Liu at Academia Sinica. I obtained my Bachelor’s in History and was selected as a Google CSRMP Fellow in 2023.
Research
My research broadly revolves around efficient NLP and democratization. Problems that frame my research goals include:
Improving the reuse of fine-tuning artifacts (Intermediate-task Transfer Learning).
Efficiently and effectively transferring advanced skills (e.g., multilingualism, instruction-following, long-context handling, reasoning) for robust generalization (CaT, In-Context Prompt Editing).
Developing methods to identify quality data mixtures for pretraining and post-training (Modeling Orthographic Variation, Target-aware Language Modeling, Sample Size Determination).
Advancing our understanding of LLM behavior to develop actionable principles (e.g., the mechanisms of knowledge transfer during the fine-tuning phase).
Email: pinjie(at)vt.edu
I am actively seeking an internship for 2025. I invite you to review my CV for further details.
🔥 News
- 2024.10: One paper was accepted to the EMNLP 2024 Industry Track.
- 2024.09: One paper, Target-Aware Language Modeling via Granular Data Sampling, was accepted to EMNLP 2024.
- 2024.08: I started my Ph.D. at Virginia Tech.
- 2024.07: One paper, Exploring the Effectiveness and Consistency of Task Selection in Intermediate-Task Transfer Learning: A Systematic Study, was accepted to the ACL 2024 Student Research Workshop!
- 2024.03: I will spend time in Taipei in March and April. Feel free to reach out if you’re in the area and would like to meet up!
- 2024.02: I successfully presented my Master's thesis, Exploring Task Selection for Intermediate-Task Transfer Learning.
- 2024.02: One paper, Modeling Orthographic Variation Improves NLP Performance for Nigerian Pidgin, was accepted to LREC-COLING 2024.
- 2024.01: One paper, Projecting Annotations for Discourse Relations: Connective Identification for Low Resource Languages, was accepted to the Workshop on Computational Approaches to Discourse at EACL 2023.
📝 Selected Publications
Please see Google Scholar for an up-to-date publication list.
* indicates equal contributions
Target-Aware Language Modeling via Granular Data Sampling
Ernie Chang, Pin-Jie Lin, Yang Li, Changsheng Zhao, Daeil Kim, Rastislav Rabtin, Zechun Liu, Yangyang Shi, Vikas Chandra
EMNLP 2024
[Paper]
Using ~1% of RefinedWeb data, models match full pretraining performance
Exploring the Effectiveness and Consistency of Task Selection in Intermediate-Task Transfer Learning: A Systematic Study
Pin-Jie Lin, Miaoran Zhang, Marius Mosbach, Dietrich Klakow
Student Research Workshop at ACL 2024
[Paper] [Code]
Modeling Orthographic Variation Improves NLP Performance for Nigerian Pidgin
Pin-Jie Lin, Merel Scholman, Muhammed Saeed, Vera Demberg
LREC-COLING 2024
[Paper]
Our synthetic data are generated from a phonological-theoretic, parameter-free framework
Ernie Chang*, Pin-Jie Lin*, Yang Li, Sidd Srinivasan, Gael Le Lan, David Kant, Yangyang Shi, Forrest Iandola, Vikas Chandra
ICASSP 2024
[Paper]
Featured as a HuggingFace Daily Paper and among twelve picks by Jordi Pons
Pin-Jie Lin*, Muhammed Saeed*, Ernie Chang*, Merel Scholman
Interspeech 2023
[Paper]
We address one of the most underrepresented low-resource languages in the world. Our benchmark is publicly available.
Ernie Chang*, Muhammad Hassan Rashid*, Pin-Jie Lin*, Changsheng Zhao, Vera Demberg, Yangyang Shi, Vikas Chandra
ACL 2023 Findings
[Paper] [Code]
Our approach forecasts model performance with 0.9% error, using only 10% of the data
Dongqi Pu*, Xudong Hong*, Pin-Jie Lin*, Ernie Chang, Vera Demberg
COLING 2022
[Paper]
The top-performing movie script summarizer