Master's Thesis Matthias Holdorf

Last modified Dec 6, 2016

Computer Support for the Analysis and Improvement of the Readability of IT-related Texts

Context: A major task in information technology (IT) is communication. Difficult-to-read text hinders the communication between stakeholders and can have expensive consequences.

Objectives: We aim to design a tool that decreases the amount of time and resources needed to improve the readability of an IT-related text.

Method: We transfer the concept of bug patterns from static code analysis to the readability of text, in the form of readability anomalies. A readability anomaly is an indicator of a difficult-to-read text passage that may negatively affect communication. To identify the business needs of a software company with 100 employees, we conducted qualitative interviews and a quantitative survey. We also reviewed existing approaches and methodologies from the knowledge base. We then designed and implemented a readability checker based on the elicited requirements.
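The analogy to bug patterns can be illustrated with a minimal sketch of a single rule-based check. This is not the thesis implementation: the rule name `long-sentence`, the threshold `MAX_WORDS`, the `Finding` type, and the naive sentence splitter are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    rule: str       # hypothetical rule identifier
    sentence: str   # the passage flagged as an anomaly


# Hypothetical threshold; the thesis abstract does not state one.
MAX_WORDS = 25


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_long_sentences(text, max_words=MAX_WORDS):
    # Flag every sentence whose word count exceeds the threshold,
    # much like a static analyzer flags code matching a bug pattern.
    findings = []
    for sentence in split_sentences(text):
        if len(sentence.split()) > max_words:
            findings.append(Finding("long-sentence", sentence))
    return findings
```

As with bug patterns, a finding from such a rule is only an indicator: a flagged sentence may still be perfectly readable, which is why the precision and relevance of findings have to be evaluated.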

Results: The interviews confirmed the assumption of previous work: difficult-to-read text hinders communication. The anomaly detection yielded an average precision of 69% with high variation. We investigated the relevance of the true-positive findings in a controlled experiment. Our participants considered 64% of the findings relevant and would incorporate 59% immediately. Moreover, they were not aware of 48% of the findings. While applying the tool, the practitioners incorporated 49% of all findings. An analysis with our readability checker takes an average of 40 seconds per 10,000 words.
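For readers unfamiliar with the metric, precision is the share of reported findings that are true positives. The sketch below assumes that the average is taken over per-rule precision values, which the abstract does not confirm; the function names are illustrative and no counts from the thesis are used.

```python
from statistics import mean


def precision(true_positives, false_positives):
    # Precision = TP / (TP + FP): the fraction of findings that are correct.
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no findings to evaluate")
    return true_positives / total


def average_precision(counts_per_rule):
    # counts_per_rule maps a rule name to (true_positives, false_positives).
    # Assumption: the average is an unweighted mean over rules.
    return mean(precision(tp, fp) for tp, fp in counts_per_rule.values())
```

Under this reading, high variation means that some rules are far more reliable than others, so the average alone says little about any individual rule.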

Conclusion: Our readability analysis tool (RAT) can uncover many practically relevant anomalies. Although some readability anomalies need to be adjusted or supported by richer linguistic features, the checker provides an effective means of improving the readability of IT-related texts. Based on our application in a practical environment, we identified the following requirements and prospects for future work: improving the precision and relevance of anomalies, domain-specific anomalies, configurability of the anomaly detection, paraphrasing of detected anomalies, analysis performance, integration into a company's workflow, support for various file formats, and the extent of integration into text processing programs.



