On average, innovative companies are 20% more successful, and therefore more competitive, than their competitors. For innovative companies in the mechanical and plant engineering industry, it is fundamental to know new and relevant technologies so that they can be put to use in production and in products. It is therefore an essential task of engineers to identify new technologies that solve problems in their own company and to evaluate their possible applications. A detailed search for technologies at trade fairs, on the Internet, or in technical journals can be very time-consuming. The reason is an oversupply of information (Big Data) that is mostly unstructured, scattered across a multitude of sources, and therefore difficult to find. As a result, companies spend considerable money and time at various points in the organization to obtain a structured overview and to compile the information relevant to decision-making.
The goal of the project is to create a new process for innovation in industrial companies using a Software-as-a-Service platform based on AI algorithms. The platform enables companies to identify the game-changing technologies that fit their own needs faster, at lower cost, and more efficiently, while building up internal technology know-how at the same time. This gives small and medium-sized companies without their own research or innovation department, in particular, an opportunity to keep up and compete with large corporations. In this way, the project specifically strengthens small and medium-sized enterprises, the backbone of industry, and helps German companies to further expand their top position as world market leaders.
The central innovation of the project is research into a cognitive system in the form of Natural Language Processing (NLP) algorithms linked to an information model consisting of a knowledge graph. This system enables engineers to find technological solutions for domain-specific problems with just a single mouse click. The novelty consists of two core components:
Individual articles from a dataset collected by the project partner were to be classified according to certain predefined topics. Since the dataset did not provide labels, and manually labeling more than 600,000 articles would be too time-consuming, established supervised classification algorithms could not be applied. Therefore, a novel algorithm was developed that exploits the semantic meaning of the defined topics to assign appropriate labels to the articles. Among other things, it uses approaches from established topic modeling algorithms and information from domain-specific word and document embeddings. According to the initial assessment of the domain experts, the novel algorithm is capable of assigning the correct topic labels to a significant proportion of the articles in the dataset with a high degree of accuracy, provided that the hyperparameters are adjusted accordingly.
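The basic idea of assigning labels without labeled training data can be illustrated with a minimal sketch. It is not the algorithm developed in the project: simple word-count vectors stand in for the domain-specific word and document embeddings, and the topic names, keywords, function names, and the threshold value are all hypothetical. Each topic is described only by a few keywords, and an article receives the label of the most similar topic if the similarity exceeds a threshold.

```python
import math
import re
from collections import Counter

# Hypothetical topics, each described only by keywords (no labeled articles).
TOPIC_KEYWORDS = {
    "additive_manufacturing": ["printing", "additive", "layer", "powder"],
    "robotics": ["robot", "gripper", "arm", "automation"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_label(text, topic_keywords, threshold=0.1):
    """Return (label, score); label is None if no topic is similar enough.

    The threshold is a hyperparameter (assumed value) that trades
    coverage against accuracy.
    """
    doc_vec = Counter(tokenize(text))
    scores = {
        topic: cosine(doc_vec, Counter(keywords))
        for topic, keywords in topic_keywords.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None, scores[best]
    return best, scores[best]


label, score = assign_label("The new robot arm uses a vacuum gripper", TOPIC_KEYWORDS)
print(label, round(score, 2))
```

Raising the threshold leaves more articles unlabeled but makes the assigned labels more reliable, which is one way in which hyperparameter choices can affect the proportion of correctly labeled articles.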
In our evaluation of models for recognizing company and technology names in articles, our trained Transformer model delivered the best results. The recognition of company names in particular works very well: it is often possible to reliably identify the company behind a technology from the article title alone. The recognition of technologies, on the other hand, is more challenging and remains difficult even with a current Transformer model.
To find solutions to specific problems in the engineering domain, we extracted the properties of each technology described in engineering texts. These properties can then be linked to each other in a knowledge graph. Users who search for a solution to a specific problem can define in advance the properties that the solution must possess. Based on this, the system can suggest technologies with the requested or similar properties as a solution. For the extraction of technology properties, a rule-based approach was developed that searches the text for predefined units as a starting point and then uses dependency parsing to extract the technologies together with their units. Our evaluation showed that this approach yields very promising results for further research in this area.
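The unit-based starting point and the property search can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's implementation: a regular expression replaces the dependency parsing step, a plain dictionary stands in for the knowledge graph, and the unit list, function names, and example technologies are hypothetical.

```python
import re

# Hypothetical list of predefined units used as extraction anchors.
UNITS = ["mm", "kg", "kW", "bar", "°C"]

# Longer units first so that a short unit cannot shadow a longer one.
_unit_pattern = "|".join(
    re.escape(u) for u in sorted(UNITS, key=len, reverse=True)
)

# A number, optional whitespace, then a known unit that is not
# followed by another letter (so "6 barrels" is not read as "6 bar").
QUANTITY = re.compile(rf"(\d+(?:\.\d+)?)\s*({_unit_pattern})(?![A-Za-z])")


def extract_properties(text):
    """Return all (value, unit) pairs found in the text."""
    return [(float(value), unit) for value, unit in QUANTITY.findall(text)]


def find_technologies(graph, unit, min_value):
    """Return names of technologies with a property of at least min_value
    in the given unit. `graph` maps a technology name to its list of
    (value, unit) pairs; it is a stand-in for a real knowledge graph.
    """
    return sorted(
        name
        for name, properties in graph.items()
        if any(u == unit and v >= min_value for v, u in properties)
    )


# Hypothetical technology descriptions.
graph = {
    "Gripper A": extract_properties("Gripper A lifts 12.5 kg at 6 bar."),
    "Gripper B": extract_properties("Gripper B lifts 3 kg."),
}
print(find_technologies(graph, "kg", 10))
```

A regular expression only finds the value next to a unit. Linking that value to the technology it describes within a longer sentence is the part for which the dependency parsing step mentioned above is needed.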
Bavarian Ministry of Economic Affairs, Regional Development and Energy
Schopf, Tim; Weinberger, Peter; Kinkeldei, Thomas; Matthes, Florian
Towards Bilingual Word Embedding Models for Engineering, MSIE 2022, 4th International Conference on Management Science and Industrial Engineering, Chiang Mai, Thailand, 2022
Schopf, Tim; Braun, Daniel; Matthes, Florian
Lbl2Vec: An Embedding-Based Approach for Unsupervised Document Retrieval on Predefined Topics, In Proceedings of the 17th International Conference on Web Information Systems and Technologies, Portugal, 2021
Braun, Daniel; Klymenko, Oleksandra; Schopf, Tim; Akan, Kaan; Matthes, Florian
The Language of Engineering: Training a Domain-Specific Word Embedding Model for Engineering, MSIE 2021: 3rd International Conference on Management Science and Industrial Engineering, Osaka, Japan, 2021