
Guided Research Mohab Ghanem

Last modified Apr 12, 2021

Evaluating Text Similarity Techniques for Matching Personal Employee Objectives

 

Finding similarities between different pieces of text is a long-standing problem in machine learning. The problem breaks down into several subproblems: finding a good representation for each word, aggregating word representations into a sentence representation, and finally defining a notion of similarity between these sentence representations. Each subproblem can be solved in several ways, and which combination of individual solutions works best depends heavily on the problem at hand. The concrete problem in this guided research is finding employees with similar personal objectives in a database of employee objectives provided by the HR department of MERCK.
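As a minimal sketch of how the three subproblems fit together, the example below uses a tiny, hypothetical table of word vectors (`WORD_VECTORS`, standing in for a trained model such as word2vec), averages the word vectors of a sentence into one sentence vector, and compares two sentences with cosine similarity. Mean pooling and cosine similarity are only one possible choice for subproblems 2 and 3; the function names are illustrative.

```python
import math

# Hypothetical 3-dimensional word vectors standing in for a trained model.
WORD_VECTORS = {
    "improve": [0.9, 0.1, 0.0],
    "increase": [0.8, 0.2, 0.1],
    "sales": [0.1, 0.9, 0.2],
    "revenue": [0.2, 0.8, 0.3],
    "learn": [0.0, 0.1, 0.9],
    "python": [0.1, 0.0, 0.8],
}


def embed_words(text):
    """Subproblem 1: map each known word to its vector (unknown words are skipped)."""
    return [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]


def mean_pool(vectors):
    """Subproblem 2: aggregate word vectors into one sentence vector by averaging."""
    if not vectors:
        raise ValueError("no known words in sentence")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Subproblem 3: cosine similarity between two sentence vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def similarity(text_a, text_b):
    """Chain the three subproblems: words -> sentence vectors -> similarity score."""
    return cosine(mean_pool(embed_words(text_a)), mean_pool(embed_words(text_b)))


# The paraphrase pair should score higher than the unrelated pair.
print(round(similarity("improve sales", "increase revenue"), 3))
print(round(similarity("improve sales", "learn python"), 3))
```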

 

The primary approach is to develop an independent module for each subproblem (representing words, representing sentences, and defining similarity) and to make the outputs of these modules interoperable, so that different combinations of sub-solutions can be tried. A diagram of the preliminary approach is shown in Figure [1]. The system will be evaluated manually by the HR department of MERCK; for this purpose, we will develop a user interface on top of these modules.
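One way to make the module outputs interoperable, sketched here under the assumption that all modules exchange plain lists of floats, is to keep a small registry of interchangeable implementations per subproblem and enumerate every combination. The registries (`AGGREGATORS`, `SIMILARITIES`), the function `score_all_combinations`, and the toy vectors are hypothetical placeholders, not the project's actual modules.

```python
import itertools
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    # Average each dimension across all word vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def max_pool(vectors: List[Vector]) -> Vector:
    # Keep the largest value of each dimension across all word vectors.
    return [max(col) for col in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def neg_euclidean(a: Vector, b: Vector) -> float:
    # Negated distance so that, like cosine, a larger value means more similar.
    return -math.dist(a, b)


# Hypothetical registries: every entry shares the same input/output types,
# so any aggregator can be paired with any similarity measure.
AGGREGATORS: Dict[str, Callable[[List[Vector]], Vector]] = {"mean": mean_pool, "max": max_pool}
SIMILARITIES: Dict[str, Callable[[Vector, Vector], float]] = {
    "cosine": cosine,
    "neg_euclidean": neg_euclidean,
}


def score_all_combinations(words_a: List[Vector], words_b: List[Vector]) -> Dict[Tuple[str, str], float]:
    """Score one sentence pair (given as word vectors) with every aggregator/similarity pair."""
    scores = {}
    for (agg_name, aggregate), (sim_name, similarity) in itertools.product(
        AGGREGATORS.items(), SIMILARITIES.items()
    ):
        scores[(agg_name, sim_name)] = similarity(aggregate(words_a), aggregate(words_b))
    return scores


# Toy word vectors for two short objectives (hypothetical values).
objective_a = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.2]]
objective_b = [[0.8, 0.2, 0.1], [0.2, 0.8, 0.3]]

for combo, score in score_all_combinations(objective_a, objective_b).items():
    print(combo, round(score, 3))
```

Adding a new word representation or similarity measure then only requires registering one more function with the shared signature; the combination loop stays unchanged.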



Figure [1] shows a sample diagram of the application flow.



Throughout this work, we will compare the performance of different word representation algorithms, such as word2vec and BERT. We will also compare different ways of aggregating these and other word representations into sentence representations, as well as models that produce sentence representations directly, such as the Universal Sentence Encoder. Finally, we will use the feedback from the human testers to further fine-tune our models for the task at hand.
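One possible way to compare representations against human feedback, offered here as an assumption rather than the project's chosen metric, is the Spearman rank correlation between the testers' ratings of objective pairs and the similarity scores each method assigns to the same pairs. The ratings, scores, and the names `method_a` and `method_b` below are made-up placeholder inputs, not results.

```python
import math
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder inputs, not real results: human ratings (1-5) for four objective
# pairs, and the similarity each hypothetical method assigned to the same pairs.
human_ratings = [5, 1, 4, 2]
method_scores: Dict[str, List[float]] = {
    "method_a": [0.91, 0.20, 0.75, 0.40],
    "method_b": [0.60, 0.55, 0.80, 0.30],
}

for name, scores in method_scores.items():
    print(name, round(spearman(human_ratings, scores), 3))
```

A rank correlation is used because only the ordering of candidates matters when suggesting similar employees; the sketch does not guard against constant inputs, for which the correlation is undefined.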




A preliminary schedule is shown in Figure [2] and explained in the points below:

  1. We start by reviewing the literature related to this topic, and identifying potentially helpful resources and expected problems.

  2. Following the literature review, we implement the first module, the embeddings generator.

  3. Once the embeddings generator is ready, we start implementing the similarity calculator module.

  4. Toward the end of the previous step, we develop a proof of concept (POC) with a simple user interface so that the human testers can give us early feedback.

  5. Finally, we integrate the feedback from the human testers and finalize the user interface.
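The similarity calculator from steps 2 and 3 could, for instance, rank all other employees by the cosine similarity between their objective vectors and the query employee's vector. The sketch below assumes the embeddings generator has already produced one vector per employee; the employee IDs, the `most_similar` function, and the choice of cosine similarity are illustrative assumptions, not the project's actual design.

```python
import heapq
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Treat an all-zero vector as dissimilar to everything.
    return dot / norms if norms else 0.0


def most_similar(query_id: str, objective_vectors: Dict[str, Vector], k: int = 3) -> List[Tuple[str, float]]:
    """Return up to k (employee_id, score) pairs, most similar first, excluding the query employee."""
    query = objective_vectors[query_id]
    candidates = (
        (employee_id, cosine(query, vector))
        for employee_id, vector in objective_vectors.items()
        if employee_id != query_id
    )
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])


# Hypothetical employee IDs and precomputed objective vectors.
objective_vectors = {
    "emp_001": [1.0, 0.0, 0.0],
    "emp_002": [0.9, 0.1, 0.0],
    "emp_003": [0.5, 0.5, 0.0],
    "emp_004": [0.0, 0.0, 1.0],
}

print(most_similar("emp_001", objective_vectors, k=2))
```

`heapq.nlargest` avoids sorting the full candidate list, which matters little for a handful of employees but keeps the top-k lookup cheap as the objective database grows.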

 



In conclusion, the final output of this guided research is a system, to be used by the HR department of MERCK, that helps them find employees with similar objectives using a blend of text similarity techniques.

 
