Guided Research Peter Weinberger

Last modified Nov 9, 2021

Problem Statement

Articles about new research in the engineering domain are published in many different languages. To find fields related to specific keywords, pretrained text embedding models are currently used. However, each of these models is trained on a single language, so if a research article is not available in the language a model was trained on, the user has to switch to another model trained on the right language. To remove the need to search across several engineering domain-specific embedding models, this research evaluates different approaches to either train a bilingual text embedding model or translate between the text embeddings of multiple models trained on engineering articles in different languages.

Research Design

Before we can train different models capable of covering both languages, we must ensure that evaluating the various models is feasible. Evaluating word embeddings is hard and sometimes even impossible. According to [1], we can apply extrinsic and intrinsic evaluation techniques to word embeddings. Extrinsic evaluators use word embeddings as input features for a downstream task and measure changes in task-specific performance metrics. Intrinsic evaluation, on the other hand, tests the quality of a representation independently of a specific natural language processing task.

Extrinsic evaluation has several pitfalls and is a controversial topic in the literature [2]. Considering both that this work has no real downstream task and that one million mostly unlabeled engineering-specific articles in German or English can be exploited in this research, the evaluation will be primarily intrinsic.

Of course, intrinsic evaluation of word embeddings has downsides, too [3], but given our goal it fits our use case best. There are numerous intrinsic evaluation techniques in the literature; this work will focus on Comparative Intrinsic Evaluation and Coherence as the main evaluation methods. For both methods, a human domain expert is at hand who accompanies the evaluation with their knowledge.

Comparative Intrinsic Evaluation, as described in [2], takes representatively selected query words into account. The k most similar words to each query word are computed with a similarity metric, e.g., cosine similarity. People are then asked to select the word they feel is most similar to the initial query word. From their choices we can compute a score and determine which of the trained models best represents similarity relationships.
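The retrieval step of this method can be sketched as follows. The vocabulary and vector values below are purely illustrative placeholders; a real model would supply the embedding matrix:

```python
import numpy as np

# Toy embedding matrix: one row per vocabulary word (values are illustrative).
vocab = ["gear", "shaft", "bearing", "torque", "cheese"]
emb = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.7, 0.3, 0.0],
    [0.6, 0.4, 0.2],
    [0.0, 0.1, 0.9],
])

def k_most_similar(query, k=3):
    """Return the k vocabulary words most cosine-similar to `query`."""
    q = emb[vocab.index(query)]
    # Cosine similarity between the query vector and every row of the matrix.
    sims = emb @ q / (np.linalg.norm(emb, axis=1) * np.linalg.norm(q))
    order = np.argsort(-sims)
    # Skip the query word itself, which is always its own nearest neighbor.
    return [vocab[i] for i in order if vocab[i] != query][:k]
```

The lists returned for each query word are what the human raters would compare across the candidate models.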

Coherence is another, even more intuitive evaluation technique. Query words are sampled from the vocabulary, their k-nearest neighbors are determined, and afterwards an “intruder” word that has nothing to do with the other words is sneaked into the list of k-nearest neighbors. A human evaluator is in charge of finding this “intruder” word, and a score can be computed to compare different word embedding models.
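A minimal sketch of the task construction and scoring, with hypothetical helper names (the neighbor lists would come from the retrieval step above, and the picks from the human evaluator):

```python
import random

def make_intruder_task(neighbors, intruder, seed=0):
    """Mix one intruder word into a k-nearest-neighbor list, in shuffled order."""
    items = list(neighbors) + [intruder]
    random.Random(seed).shuffle(items)
    return items

def coherence_score(picks, intruders):
    """Fraction of tasks in which the evaluator's pick was the true intruder."""
    hits = sum(pick == true for pick, true in zip(picks, intruders))
    return hits / len(intruders)
```

A higher score means the neighborhoods are coherent enough that the intruder stands out, which is the property being compared across models.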

To complete the comparison of the computed word embedding models, this work also aims to show how well the most promising model finds word analogies. With this method we can test how well the model has mapped general relationships between words into the generated vector space, in addition to the two other intrinsic evaluators, which verify the models' proficiency in representing similarity relationships.
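The standard analogy test uses the vector-offset method (3CosAdd): “a is to b as c is to ?” is answered by the word closest to b − a + c. The tiny hand-crafted vectors below are illustrative only:

```python
import numpy as np

# Hand-crafted 2-d toy vectors: axis 0 ≈ gender, axis 1 ≈ royalty (illustrative only).
emb = {
    "king":  np.array([ 1.0, 1.0]),
    "queen": np.array([-1.0, 1.0]),
    "man":   np.array([ 1.0, 0.0]),
    "woman": np.array([-1.0, 0.0]),
    "gear":  np.array([ 0.0, -1.0]),
}

def analogy(emb, a, b, c):
    """Solve 'a is to b as c is to ?' with the vector-offset (3CosAdd) method."""
    target = emb[b] - emb[a] + emb[c]
    best, best_sim = None, -np.inf
    for word, vec in emb.items():
        if word in (a, b, c):
            continue  # the query words themselves are excluded, as is standard
        sim = float(vec @ target) / (np.linalg.norm(vec) * np.linalg.norm(target) + 1e-12)
        if sim > best_sim:
            best, best_sim = word, sim
    return best
```

On real engineering embeddings, the analogies would of course be drawn from the domain rather than the classic king/queen example.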

Research Questions

The two specific research questions, which this work will address, are:

  1. How can engineering domain-specific word embeddings of different languages be merged to enable suitable semantic word comparisons between those languages?
  2. How can this approach be adapted for semantic comparisons between engineering domain-specific articles of different languages?

Merging domain-specific word embeddings of different languages, and thereby supplying a proper representation of the engineering domain, is the main goal of this research. We ensure that we choose the best model by evaluating the candidates with the two intrinsic evaluators mentioned above. Additionally, it is important to explore how our ideas and methods for creating engineering-specific word embeddings of different languages can be adapted to document embeddings. For engineering experts, it may be of great value to search for semantically similar articles based on a query article when performing research in their field.
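One simple baseline for the step from word to document embeddings (an assumption for illustration, not necessarily the method this research will adopt) is to average the word vectors of an article and rank articles by cosine similarity of these averages:

```python
import numpy as np

# Illustrative word vectors; a real model would supply these.
word_vecs = {
    "gear": np.array([0.9, 0.1]), "torque": np.array([0.8, 0.3]),
    "cheese": np.array([0.1, 0.9]), "milk": np.array([0.2, 0.8]),
}

def doc_embedding(tokens):
    """Average the vectors of all in-vocabulary tokens (a common baseline)."""
    vecs = [word_vecs[t] for t in tokens if t in word_vecs]
    return np.mean(vecs, axis=0)

def doc_similarity(tokens_a, tokens_b):
    """Cosine similarity between two averaged document embeddings."""
    a, b = doc_embedding(tokens_a), doc_embedding(tokens_b)
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
```

If the merged word embeddings align both languages in one space, the same averaging applies unchanged to a German query article and English candidate articles.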

Bibliography

[1] A. Gladkova and A. Drozd. “Intrinsic Evaluations of Word Embeddings: What Can We Do Better?” In: Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP. Berlin, Germany: Association for Computational Linguistics, Aug. 2016, pp. 36–42. doi: 10.18653/v1/W16-2507.

[2] T. Schnabel, I. Labutov, D. Mimno, and T. Joachims. “Evaluation methods for unsupervised word embeddings.” In: Proceedings of the 2015 conference on empirical methods in natural language processing. 2015, pp. 298–307.

[3] B. Wang, A. Wang, F. Chen, Y. Wang, and C.-C. J. Kuo. “Evaluating word embedding models: methods and experimental results.” In: APSIPA Transactions on Signal and Information Processing 8 (2019). issn: 2048-7703. doi: 10.1017/atsip.2019.12.
