
Master's Thesis Ishwor Subedi

Last modified May 9

Candidate Profile Evaluation: A RAG Approach with Synthetic Data Generation for Tech Jobs

 

Introduction & Motivation

Applicant Tracking Systems (ATS) have long been integral to candidate profile evaluation. These systems help companies by parsing and analyzing resume content, often using machine learning techniques, to provide insights into candidates' profiles. A significant challenge, however, is the lack of sufficient resumes to train an ATS or to test it after deployment. Moreover, existing systems often struggle to give a comprehensive overview of a resume and to compare it effectively against the requirements in a job description.

To address these issues, we propose using Large Language Models (LLMs) to generate synthetic resumes for technical roles (e.g., software developers, data scientists, machine learning engineers) that closely resemble real-world resumes. To assess the quality of this synthetic data, we propose evaluation methods that compare the generated resumes with real resumes from the field. Additionally, we propose a Retrieval-Augmented Generation (RAG) system to improve the comparison of resumes against specific job descriptions, offering deeper insight into how well candidates' qualifications align with job requirements.

 

Research Questions

  • R1: How can we generate synthetic resume data that matches the real-world distribution of resumes while overcoming privacy barriers?
  • R2: How do we evaluate the quality of synthetic resume data?
  • R3: Can a RAG-based approach to candidate selection outperform a Named Entity Recognition baseline?
  • R4: Which open-source or proprietary models perform well on candidate summarization and matching?

References:

[1] Krohn, A. (2023). Evaluating Text Summarization Models on Resumes: Investigating the Quality of Generated Resume Summaries and their Suitability as Resume Introductions (Dissertation, KTH Royal Institute of Technology). Retrieved from https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-332407

[2] Mercan, Ö. B., Cavsak, S. N., Deliahmetoglu, A., & Tanberk, S. (2023). Abstractive Text Summarization for Resumes With Cutting Edge NLP Transformers and LSTM. arXiv [cs.CL]. Retrieved from http://arxiv.org/abs/2306.13315

 
