Large pretrained language models have made substantial progress in encoding the syntactic and semantic information of a text sequence. They thereby enable the construction of powerful machine learning models with little additional training data for downstream tasks ranging from question answering to text classification. The key to their pervasive success is the Transformer architecture they are based on. The Transformer and its self-attention mechanism allow the tokens of an input sequence to be processed in parallel, and this parallelism enables the degree of pretraining necessary to achieve state-of-the-art results on downstream tasks. However, the Transformer's memory and compute requirements grow quadratically with the length of the input sequence, which renders processing long sequences prohibitively expensive. The goal of this project is to examine a selection of the models created to overcome these limitations and to evaluate different aspects of their performance on downstream machine learning tasks. The BigPatent corpus, a collection of U.S. patent documents, is used to set up a benchmark task in which a variety of model configurations are tested on classifying patents according to their subject matter.
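To make the quadratic cost concrete, the following minimal NumPy sketch (illustrative, not taken from the thesis) implements single-head scaled dot-product attention. The n × n score matrix it materializes for a sequence of n tokens is what drives the quadratic growth in memory and compute.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product attention over a sequence of n token
    embeddings (the rows of X). The (n, n) score matrix is the source of the
    Transformer's quadratic memory and compute cost."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # shape (n, n): quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V

# Doubling the sequence length quadruples the number of attention scores:
rng = np.random.default_rng(0)
d = 64
for n in (512, 1024):
    X = rng.standard_normal((n, d))
    W = [rng.standard_normal((d, d)) for _ in range(3)]
    _ = self_attention(X, *W)
    print(f"{n} tokens -> {n * n} attention scores")
```

Efficient-attention models avoid materializing this full matrix, for example by restricting each token to a sparse or local set of attention targets, which is what makes long inputs such as patent documents tractable.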
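As an illustration of what such a benchmark setup might look like, here is a hedged sketch of patent classification with one representative long-input model, Longformer. The model choice, the label count of 9 (the nine CPC sections A–H and Y under which BigPatent organizes its documents), and the example input are illustrative assumptions, not details confirmed by the abstract.

```python
# Hypothetical sketch: classifying patents into the nine BigPatent CPC
# sections with a long-input Transformer. The model choice is an
# illustrative assumption, not the thesis's confirmed configuration.
import torch
from transformers import LongformerForSequenceClassification, LongformerTokenizerFast

MODEL = "allenai/longformer-base-4096"  # one efficient-attention model family
tokenizer = LongformerTokenizerFast.from_pretrained(MODEL)
model = LongformerForSequenceClassification.from_pretrained(
    MODEL, num_labels=9  # CPC sections A-H plus Y, as used by BigPatent
)
model.eval()

def predict_sections(texts):
    """Return the predicted CPC-section index for each patent text."""
    batch = tokenizer(
        texts, padding=True, truncation=True, max_length=4096, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**batch).logits  # shape (batch, 9)
    return logits.argmax(dim=-1).tolist()

print(predict_sections(["A method and apparatus for encoding video data ..."]))
```

Varying the model family, attention pattern, and maximum input length in a loop over such configurations is one natural way to realize the comparison of model configurations the benchmark describes.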