Bachelor's Thesis Matteo Merz

Last modified Aug 21

Location-aware visual document segmentation is the segmentation of PDF documents into text or image segments based on visual cues such as whitespace, font style or text alignment, where each segment is also given a bounding box that records its location within the document. Oncology guidelines pose several specific challenges for segmentation based on visual cues. These include multi-column layouts, figures and tables that span multiple pages, formatting errors, and complex hierarchical structures in which the interpretation of a section depends on context from higher levels.
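To make the notion of a location-aware segment concrete, the following Python sketch shows one possible data model. It is a hypothetical illustration, not the representation used by any of the methods evaluated in this thesis: the class names `Region` and `Segment`, the `(x0, y0, x1, y1)` box convention and the `parent` link for hierarchical context are all assumptions. A segment may carry several regions, so a figure or table that spans multiple pages stays a single segment, and the parent chain recovers the higher-level context a section depends on.

```python
from dataclasses import dataclass, field
from typing import List, Literal, Optional, Tuple

# Hypothetical box convention: (x0, y0, x1, y1) in page coordinates.
BBox = Tuple[float, float, float, float]


@dataclass
class Region:
    page: int   # 0-based page index
    bbox: BBox  # location of this part of the segment on that page


@dataclass
class Segment:
    kind: Literal["text", "image"]
    text: str
    # More than one region when the segment spans several pages.
    regions: List[Region] = field(default_factory=list)
    # Enclosing section (hypothetical link used to carry hierarchical context).
    parent: Optional["Segment"] = None

    def pages(self) -> List[int]:
        """Sorted, de-duplicated list of pages this segment occupies."""
        return sorted({r.page for r in self.regions})

    def context_path(self) -> List[str]:
        """Texts of all enclosing sections, outermost first."""
        path = []
        node = self.parent
        while node is not None:
            path.append(node.text)
            node = node.parent
        return list(reversed(path))
```

For example, a table that starts at the bottom of page 5 and continues on page 6 would be one `Segment` with two `Region` entries, and `context_path()` would return the titles of the sections above it.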

This thesis will investigate the effectiveness of different location-aware visual document segmentation methods on oncology guideline documents. The first part of the thesis is the creation of a benchmark for the visual segmentation of oncology guidelines. To this end, a dataset of curated, manually annotated oncology guideline documents will first be assembled. Second, metrics will be defined that measure both the accuracy of the bounding boxes produced by the segmentation methods and the preservation of context (e.g. bounding box IoU, question-answering capability). Using this benchmark and other established benchmarks (e.g. DocBank, OmniDocBench), a set of segmentation methods (e.g. VISA, Llama-Index, NeuSym-Rag) will be evaluated. Based on the benchmark results, ways of improving both the segmentation methods (e.g. parameter tuning) and the data (e.g. preprocessing steps) will be explored. The final goal of the thesis is to deliver an annotated benchmark for the visual segmentation of oncology guidelines and to integrate into the Aidvice project a segmentation method that returns highly accurate bounding boxes and correctly incorporates the required context.
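The bounding box IoU metric mentioned above can be computed as in the following Python sketch. It rests on stated assumptions: boxes are axis-aligned `(x0, y0, x1, y1)` tuples on the same page, and aggregating box-level IoU into a page-level F1 score via greedy one-to-one matching at an IoU threshold of 0.5 is one common convention, not a metric fixed by this thesis. The function names `iou` and `match_f1` are hypothetical.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1) with x0 < x1, y0 < y1


def iou(a: BBox, b: BBox) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_f1(pred: List[BBox], gold: List[BBox], threshold: float = 0.5) -> float:
    """F1 score on one page: predicted and gold boxes are matched greedily
    (highest IoU first, each box used at most once); a matched pair counts
    as a true positive if its IoU is at least `threshold`."""
    pairs = sorted(
        ((iou(p, g), i, j) for i, p in enumerate(pred) for j, g in enumerate(gold)),
        reverse=True,
    )
    used_pred, used_gold, tp = set(), set(), 0
    for score, i, j in pairs:
        if score < threshold:
            break  # all remaining pairs have lower IoU
        if i in used_pred or j in used_gold:
            continue
        used_pred.add(i)
        used_gold.add(j)
        tp += 1
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Greedy matching is simple but not always optimal; an optimal one-to-one assignment could instead be computed with `scipy.optimize.linear_sum_assignment` on a cost matrix of negated IoU values.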
