I’m a CS Ph.D. student in the Large Language Models (LLMs) Lab at Virginia Tech, advised by Tu Vu. My research focuses on collaborative and communal machine learning, aiming to develop well-rounded LLMs efficiently and effectively. I explore methods for blending diverse skills, including parameter-efficient fine-tuning, modular composition, and model merging.

Previously, I received a Master’s degree from the Department of Language Science and Technology (LST) at Saarland University, where I worked with Dietrich Klakow and Vera Demberg. Before that, I contributed to the development of NLP systems for historical archives with Richard Tsai and Yuan-ju Liu at Academia Sinica. I obtained my Bachelor’s degree in History. In 2023, I was selected as a Google CSRMP Fellow.

Research

My research broadly revolves around efficient NLP and desensitization.

Email: pinjie(at)vt.edu

I am actively seeking an internship for 2025. Please see my CV for further details.

🔥 News

  • 2024.10: One paper was accepted to the EMNLP 2024 Industry Track.
  • 2024.09: Our paper Target-Aware Language Modeling via Granular Data Sampling was accepted to EMNLP 2024.
  • 2024.08: I started my Ph.D. at Virginia Tech.
  • 2024.07: Our paper Exploring the Effectiveness and Consistency of Task Selection in Intermediate-Task Transfer Learning: A Systematic Study was accepted to the ACL 2024 Student Research Workshop!
  • 2024.03: I will spend time in Taipei in March and April. Feel free to reach out if you’re in the area and would like to meet up!
  • 2024.02: I successfully presented my Master’s thesis, Exploring Task Selection for Intermediate-Task Transfer Learning.
  • 2024.02: Our paper Modeling Orthographic Variation Improves NLP Performance for Nigerian Pidgin was accepted at LREC-COLING 2024.
  • 2024.01: Our paper Projecting Annotations for Discourse Relations: Connective Identification for Low Resource Languages was accepted to the Workshop on Computational Approaches to Discourse at EACL 2023.


📝 Selected Publications

Please see Google Scholar for an up-to-date publication list.

* indicates equal contribution

Target-Aware Language Modeling via Granular Data Sampling
Ernie Chang, Pin-Jie Lin, Yang Li, Changsheng Zhao, Daeil Kim, Rastislav Rabatin, Zechun Liu, Yangyang Shi, Vikas Chandra
EMNLP 2024
[Paper]
Using ~1% of RefinedWeb data, models match full pretraining performance.


Exploring the Effectiveness and Consistency of Task Selection in Intermediate-Task Transfer Learning
Pin-Jie Lin, Miaoran Zhang, Marius Mosbach, Dietrich Klakow
Student Research Workshop at ACL 2024
[Paper] [Code]


Modeling Orthographic Variation Improves NLP Performance for Nigerian Pidgin
Pin-Jie Lin, Merel Scholman, Muhammed Saeed, Vera Demberg
LREC-COLING 2024
[Paper]
Our synthetic data are generated by a parameter-free framework grounded in phonological theory.


In-Context Prompt Editing For Conditional Audio Generation
Ernie Chang*, Pin-Jie Lin*, Yang Li, Sidd Srinivasan, Gael Le Lan, David Kant, Yangyang Shi, Forrest Iandola, Vikas Chandra
ICASSP 2024
[Paper]
Featured as a HuggingFace Daily Paper and among twelve picks by Jordi Pons.


Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin
Pin-Jie Lin*, Muhammed Saeed*, Ernie Chang*, Merel Scholman
Interspeech 2023
[Paper]
We address one of the most underrepresented low-resource languages in the world. Our benchmark is publicly available.


Revisiting Sample Size Determination in Natural Language Understanding
Ernie Chang*, Muhammad Hassan Rashid*, Pin-Jie Lin*, Changsheng Zhao, Vera Demberg, Yangyang Shi, Vikas Chandra
ACL 2023 Findings
[Paper] [Code]
Our approach forecasts model performance with 0.9% error, using only 10% of the data.


Two-Stage Movie Script Summarization: An Efficient Method For Low-Resource Long Document Summarization
Dongqi Pu*, Xudong Hong*, Pin-Jie Lin*, Ernie Chang, Vera Demberg
COLING 2022
[Paper]
The top-performing movie script summarizer.