Classifying text documents by car component requires labeling each document individually to create a vocabulary-specific training dataset for further classification. This labeling must be done manually and is therefore extremely time-consuming and inflexible. Moreover, the approach cannot adapt to changing requirements such as label granularity, because the labels remain static and cannot be quickly customized.
To classify customer complaints in the automotive industry, I propose a combined approach that uses vector representations of predefined context rules to map each customer problem to a predefined class of vehicle components.
The approach is implemented as a pipeline with several steps: data preprocessing, keyword extraction, topic modeling, labeling, context rule creation, context window extraction, vectorization of context windows, and assignment of initially unclassified documents to predefined classes. Various vectorization techniques are tested, ranging from traditional methods such as tf-idf to state-of-the-art models such as BERT.
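The last three pipeline steps (context window extraction, vectorization, and class assignment) can be illustrated with a minimal sketch. It is not the implementation from the thesis: the rule keywords in `CONTEXT_RULES`, the window size, and all function names are hypothetical, and a plain tf-idf weighting with cosine similarity stands in for whichever vectorization technique is actually chosen.

```python
import math
import re
from collections import Counter

# Hypothetical context rules: each component class is described by a few keywords.
CONTEXT_RULES = {
    "brakes": ["brake", "pedal", "squeal"],
    "engine": ["engine", "stall", "misfire"],
    "electrical": ["battery", "light", "fuse"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def context_window(tokens, keywords, size=2):
    """Collect the tokens within `size` positions of every keyword hit."""
    window = []
    for i, token in enumerate(tokens):
        if token in keywords:
            window.extend(tokens[max(0, i - size): i + size + 1])
    return window


def tfidf_vectors(docs):
    """Turn token lists into sparse tf-idf vectors (dict: term -> weight)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(complaint, rules=CONTEXT_RULES):
    """Assign a complaint to the most similar rule class, or None if no rule matches."""
    all_keywords = {kw for kws in rules.values() for kw in kws}
    window = context_window(tokenize(complaint), all_keywords)
    labels = list(rules)
    vectors = tfidf_vectors([rules[label] for label in labels] + [window])
    query = vectors[-1]
    scores = {label: cosine(vectors[i], query) for i, label in enumerate(labels)}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(classify("The brake pedal feels soft and the brakes squeal when stopping"))
print(classify("The radio antenna is loose"))
```

Because the classes are defined by editable rules rather than by labeled training documents, changing the granularity in this sketch only requires editing `CONTEXT_RULES`. Swapping tf-idf for a contextual embedding model such as BERT would replace `tfidf_vectors` while leaving the similarity-based assignment unchanged.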
Name | Type | Size | Last Modification | Last Editor |
---|---|---|---|---|
20220718 Kreinhaus Master Thesis Kick-off Presentation.pdf | PDF | 894 KB | 15.02.2023 | |
221212 Andrei Kreinhaus Master Thesis.pdf | PDF | 3.34 MB | 15.02.2023 | |
221212 Kreinhaus Master Thesis Final Presentation.pptx | PPTX | 2.45 MB | 15.02.2023 | |