Automatic text simplification makes text more accessible and easier to interpret for the general public, and it has applications in almost every domain. It is especially important in the legal domain, where government regulations are often written in language that is hard to understand, and simpler language has been shown to make justice more accessible [1]. Text simplification is a low-resource task: very few aligned datasets are available for training machine learning models. The existing datasets were either collected manually [2] or built by applying text alignment techniques to Wikipedia articles [3]. Several providers, such as Capito [4], offer paid services that convert legal text into a simplified form, but there are hardly any publicly available AI-assisted services that do this in an automated or semi-automated manner. Although corpora of aligned complex and simplified legal texts do not exist, many legal articles available online are written in simplified language and could be leveraged to build an aligned corpus using text alignment techniques [5][6][7].
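To make the idea of monolingual sentence alignment more concrete, the following is a minimal sketch of one possible baseline: each simplified sentence is paired with the complex sentence that has the highest TF-IDF cosine similarity. This is only an illustration under stated assumptions and not the method of the thesis or of any cited work. The function names (`tokenize`, `tfidf_vectors`, `cosine`, `align`), the threshold of 0.3, and the example sentences are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_vectors(sentences):
    """Build one sparse TF-IDF vector (token -> weight dict) per sentence."""
    token_lists = [tokenize(s) for s in sentences]
    # Document frequency: in how many sentences does each token occur?
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    n = len(sentences)
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        # Smoothed IDF, so tokens that occur everywhere keep a small weight.
        vectors.append({
            tok: count * (math.log((1 + n) / (1 + doc_freq[tok])) + 1.0)
            for tok, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(tok, 0.0) for tok, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def align(complex_sentences, simple_sentences, threshold=0.3):
    """Pair each simple sentence with its most similar complex sentence.

    Returns (complex_index, simple_index, score) tuples for all pairs whose
    score reaches the threshold. The threshold is an assumed value that
    would have to be tuned on real data.
    """
    vectors = tfidf_vectors(complex_sentences + simple_sentences)
    complex_vecs = vectors[:len(complex_sentences)]
    simple_vecs = vectors[len(complex_sentences):]
    pairs = []
    if not complex_vecs:
        return pairs
    for j, simple_vec in enumerate(simple_vecs):
        scores = [cosine(complex_vec, simple_vec) for complex_vec in complex_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            pairs.append((best, j, scores[best]))
    return pairs


if __name__ == "__main__":
    # Invented example sentences, for illustration only.
    complex_sentences = [
        "The lessee shall remit payment of the rent to the lessor no later "
        "than the first day of each calendar month.",
        "The lessor may terminate this agreement upon thirty days written notice.",
    ]
    simple_sentences = [
        "The landlord can end the agreement with thirty days written notice.",
        "You must pay the rent by the first day of each month.",
    ]
    for c, s, score in align(complex_sentences, simple_sentences):
        print(f"{score:.2f}  {complex_sentences[c]}  ->  {simple_sentences[s]}")
```

A purely lexical score like this misses paraphrases that share few words (for example "lessor" and "landlord"), which is one reason the alignment literature also draws on word similarity and contextual evidence.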
The aim of the thesis is to create an aligned dataset for the simplification of legal texts. The thesis would involve the following tasks.
References:
[1] Rubab, Iram & Mamona, Yasmin & Khan, & Asgher, Tahira. (2020). Transformation of Legal Texts into Simplified Accounts to Make the Justice Accessible.
[2] Xu, Wei & Callison-Burch, Chris & Napoles, Courtney. (2015). Problems in Current Text Simplification Research: New Data Can Help. Transactions of the Association for Computational Linguistics. 3. 283-297. 10.1162/tacl_a_00139.
[3] Coster, William & Kauchak, David. (2011). Simple English Wikipedia: A New Text Simplification Task. 665-669.
[4] “Barrierefreie Information: Leicht Verständliche Sprache.” Capito, 4 Feb. 2021, www.capito.eu
[5] Shieber, Stuart & Nelken, Rani. (2006). Towards robust context-sensitive sentence alignment for monolingual corpora.
[6] Sultan, A.M. & Bethard, Steven & Sumner, Tamara. (2014). Back to Basics for Monolingual Alignment: Exploiting Word Similarity and Contextual Evidence. Transactions of the Association for Computational Linguistics. 2. 219-230. 10.1162/tacl_a_00178.
[7] Huang, Yonghui & Li, Yunhui & Luan, Yi. (2018). Monolingual sentence matching for text simplification.
Name | Type | Size | Last Modification | Last Editor |
---|---|---|---|---|
210517 Muralidharan Master Thesis Kickoff.pptx | | 1.14 MB | 17.05.2021 | |
210927 Muralidharan Master Thesis final presentation.pdf | | 1.30 MB | 27.09.2021 | |
211015 Muralidharan Master Thesis.pdf | | 2.73 MB | 19.10.2021 | |
Master Thesis kickoff presentation.pdf | | 874 KB | 17.05.2021 | |