Improving automatic reviewer assignment for the NBDT journal using large language models

Tags:

working group

tools

Status:

Completed

Contributor/Mentors

Contributor: Subhankar Panda

Mentors: Titipat Achakulvisut, Daniele Marinazzo, Björn Brembs

About

Matching papers to reviewers by topic is a crucial task for the Neurons, Behavior, Data Analysis, and Theory (NBDT) journal. The current automatic reviewer assignment tool embeds papers and reviewers with SciBERT, scores each paper-reviewer pair by cosine similarity, and solves the assignment with linear programming. However, these SciBERT embeddings may not accurately capture the semantic meaning of the text. This project addresses that limitation by fine-tuning SciBERT on a relevant corpus and selecting appropriate optimization objectives. Like BERT, SciBERT learns from both the left and the right context of each word, but its vocabulary is built from scientific text and is therefore better suited to scientific writing than BERT's. The project involves creating and pre-processing a training dataset, generating the appropriate features, fine-tuning SciBERT, generating embeddings from the fine-tuned model, choosing optimization objectives suited to the resulting dataset (contrastive learning, learning to rank diversely, and LambdaRank), and evaluating the model's performance. The expected outcome is an improved tool that matches papers to reviewers more accurately for the NBDT journal and that may also prove useful in other domains.
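The scoring-and-assignment step described above can be sketched as follows. This is a minimal, hypothetical illustration, not the NBDT tool's code: it assumes papers and reviewers are already represented as embedding vectors (tiny made-up 3-d vectors stand in for SciBERT embeddings), and it assumes two constraints that the page does not spell out: each paper receives exactly `per_paper` reviewers, and each reviewer handles at most `max_load` papers. The helper names `cosine` and `assign` are illustrative. To stay dependency-free, the sketch maximizes total cosine affinity by exhaustive search instead of calling an LP solver. On toy sizes this finds the same optimum, but it scales exponentially.

```python
import itertools
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign(paper_vecs, reviewer_vecs, per_paper, max_load):
    """Hypothetical exact assignment by exhaustive search (toy sizes only).

    Maximizes the sum of cosine(paper_i, reviewer_j) over chosen pairs, subject to
    each paper getting exactly `per_paper` distinct reviewers and each reviewer
    getting at most `max_load` papers (assumed constraints). Returns
    (best_choice, best_total), where best_choice[i] is a tuple of reviewer indices
    for paper i, or (None, -inf) if no feasible assignment exists.
    """
    scores = [[cosine(p, r) for r in reviewer_vecs] for p in paper_vecs]
    options = list(itertools.combinations(range(len(reviewer_vecs)), per_paper))
    best, best_total = None, -math.inf
    for choice in itertools.product(options, repeat=len(paper_vecs)):
        # Count how many papers each reviewer would receive under this choice.
        load = [0] * len(reviewer_vecs)
        for group in choice:
            for j in group:
                load[j] += 1
        if max(load) > max_load:
            continue  # violates the reviewer-load constraint
        total = sum(scores[i][j] for i, group in enumerate(choice) for j in group)
        if total > best_total:
            best, best_total = choice, total
    return best, best_total


if __name__ == "__main__":
    # Made-up embeddings: each of the first three reviewers is closest to one paper.
    papers = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    reviewers = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0], [0.5, 0.5, 0.5]]
    best, total = assign(papers, reviewers, per_paper=1, max_load=1)
    print(best)  # → ((0,), (1,), (2,))
```

For a bipartite assignment with only these per-paper and per-reviewer constraints, the constraint matrix is totally unimodular. The LP relaxation therefore already has an integral optimum, which is why plain linear programming suffices for this kind of formulation. A real tool would hand the same objective and constraints to an LP solver instead of enumerating choices.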

Completed Deliverables

2023

Enhanced SciBERT’s ability to capture the semantic meaning of text by fine-tuning it on a relevant data corpus and providing alternative, appropriate optimization approaches (e.g., contrastive learning, learning to rank diversely, LambdaRank).
Improved the SciBERT-based automatic reviewer assignment tool.
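Contrastive learning, one of the objectives listed above, can be sketched as follows. The page does not say which contrastive loss was used. As an assumption, this sketch shows an InfoNCE-style loss for a single example. A paper embedding (the anchor) is pulled towards the embedding of a well-matched reviewer (the positive) and pushed away from other reviewers (the negatives). The function name `info_nce_loss`, the temperature of 0.05, and the toy 2-d vectors are all hypothetical stand-ins for fine-tuned SciBERT embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def info_nce_loss(anchor, positive, negatives, temperature=0.05):
    """Hypothetical InfoNCE-style contrastive loss for one anchor.

    Equals -log(exp(s+/T) / sum over candidates of exp(s/T)), where s is cosine
    similarity and the candidates are the positive plus all negatives. The loss
    is >= 0 and shrinks as the anchor becomes more similar to its positive than
    to the negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted, for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    paper = [1.0, 0.0]  # made-up paper embedding
    matched = info_nce_loss(paper, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
    mismatched = info_nce_loss(paper, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
    print(matched < mismatched)  # → True
```

During fine-tuning, a loss of this shape would be averaged over a batch and minimized with respect to the encoder's weights. The temperature controls how sharply the loss penalizes negatives that are nearly as similar as the positive.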