
Multi-Task Deep Learning for Legal Domain

Last modified by Anum Afzal Dec 21, 2021

Project Summary

The revival of deep learning has yielded astonishing results in recent years on tasks ranging from computer vision and machine translation to speech recognition. This advancement is favored by the increasing availability of datasets and computational resources. The legal domain, on the other hand, with its serious demand for natural language processing applications, cannot benefit in equal measure, since appropriately preprocessed legal datasets are highly limited or barely exist at all. Instead of resorting to datasets from other domains, we propose multi-task deep learning to exploit task-independent commonalities and overcome the dataset shortage in the legal domain.

As part of this work, we have created six different legal corpora for translation, text summarization and document classification. Five of the six corpora are derived from the DCEP, Europarl and JRC-Acquis corpora provided by the European Union, which we processed for immediate use with neural-network-based models. The sixth corpus is a collection of 42k documents containing court decisions of the seven federal courts of Germany, scraped from their official website.

Based on these newly created corpora, various multi-task combinations within a task family (e.g. only translation tasks) and across task families (e.g. translation, summarization & classification) were trained on the state-of-the-art multi-task deep learning model, the MultiModel. In addition, we compared the single- and multi-task performance of the MultiModel under two different sets of hyperparameters to the state-of-the-art translation model, the Transformer. The MultiModel trained on joint tasks is on an equal footing with the Transformer. Through experiments in which a jointly trained MultiModel outperforms both a single-task trained MultiModel and the Transformer, we show that multi-task deep learning is advisable in situations where training data is sparse. Surprisingly, a combination across task families surpasses several combinations within task families. Finally, we trained a combination that beats the JRC EuroVoc Indexer JEX on the German multi-label classification task by nearly 14 points on the F1 metric.
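For reference, the F1 score used in this comparison can be illustrated with a small, self-contained computation. The sketch below is ours (not the evaluation code of JEX or the MultiModel) and uses made-up label sets; it shows how a micro-averaged F1 over multi-label predictions, such as EuroVoc descriptors, is obtained:

    # Micro-averaged F1 over multi-label predictions (illustrative data).
    def micro_f1(gold, pred):
        tp = sum(len(g & p) for g, p in zip(gold, pred))  # true positives
        fp = sum(len(p - g) for g, p in zip(gold, pred))  # false positives
        fn = sum(len(g - p) for g, p in zip(gold, pred))  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    gold = [{"law", "finance"}, {"environment"}]  # reference label sets
    pred = [{"law"}, {"environment", "health"}]   # predicted label sets
    print(micro_f1(gold, pred))                   # 0.666...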

Motivation:

Machine learning has moved into the focus of researchers and practitioners in recent years. Tasks that seem easy for humans but could not be solved effectively by computers are now tackled and solved with machine learning tools. One subfield has distinguished itself as particularly promising, yielding astonishing results for challenges across different areas, including computer vision, natural language processing, robotics, medical applications and data mining. This is deep learning, a machine learning discipline that draws its capabilities from artificial neural networks. These networks learn from experience and represent their learned knowledge in the interaction of interconnected components. Stacking these components creates a deep architecture, which gives this exceptionally innovative field its name. Of course, artificial neural networks did not appear out of the blue. In fact, the first artificial neural network was developed as early as 1962, and training such networks was effectively solved by the backpropagation algorithm introduced in 1986. More complex network structures, including convolutional neural networks (CNNs) that perform intuitively human tasks such as object recognition, were developed afterwards. Big breakthroughs, however, were still some time in coming. Computing power to speed up the training process rose over the following decades and has recently crossed a threshold that finally enables the full potential of deep learning. Developments in faster and denser graphics processing units (GPUs) contributed in particular; manufacturers have even specialized in producing hardware with dedicated interfaces for deep learning workloads. Deep learning thus ushers in a new era for solving ostensibly impossible problems.

However, whereas deep learning models beat record after record and win numerous contests in pattern recognition and machine learning, their adoption across industries is far from pervasive. The current challenge is to apply these models, establish new processes and support humans in their work. Hence, domains come into focus that are built on precisely the tasks in which deep learning shines and that can benefit greatly from its potential. This certainly includes law and the legal domain. A large proportion of legal professionals are confronted with natural language processing tasks every day. Ever since laws have been drafted and politics conducted, all associated acts have had to be documented precisely. This has led to an exceptionally large and steadily growing text base. Instead of handling the paperwork manually, deep learning can assist with, or completely carry out, its processing. Work in this direction has already been conducted, e.g. translating legal documents or even classifying verdicts of the French Supreme Court. Nonetheless, many possibilities remain unexploited. Our motivation lies in exploring these and advancing the application of deep learning in the legal domain.

Problem Statement:

Two factors play an important role when applying deep learning in the legal domain. 

  1. Computational Power and Tools: The parameter count for sophisticated neural networks is in the hundreds of millions; depending on the model used, it can be even higher. The training process involves repeatedly updating these parameters through the backpropagation algorithm (a toy version of this update loop is sketched after this list). Executing the training efficiently requires potent hardware, as well as interfaces and tools to implement models rapidly.
  2. Large Datasets: The provision of a large dataset is key to good performance of a deep learning model. A rule of thumb is that a supervised deep learning algorithm needs at least 10 million examples to match or exceed human performance. Each sample needs to be labeled appropriately for the task at hand. Therefore, large annotated datasets are indispensable.
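To make point 1 concrete, the following toy loop shows what "repeatedly updating parameters through backpropagation" means in practice. It is a minimal sketch using PyTorch with made-up data, not code from this project:

    # Fit 3 parameters w so that w . x approximates a target value.
    import torch

    w = torch.randn(3, requires_grad=True)   # the "model" parameters
    x = torch.tensor([1.0, 2.0, 3.0])        # a single input example
    target = torch.tensor(10.0)              # its label

    for step in range(50):
        loss = (w @ x - target) ** 2         # forward pass: squared error
        loss.backward()                      # backpropagation fills w.grad
        with torch.no_grad():
            w -= 0.01 * w.grad               # gradient-descent update
            w.grad.zero_()                   # reset gradients for next step

Scaled up to hundreds of millions of parameters, exactly this loop is what makes potent (GPU) hardware necessary.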

The introduction of parallel training methods and the use of GPU hardware have accelerated training considerably in recent years, so adequate computational power is no longer an obstacle for machine learning. The gradual decline of Moore's Law has not inhibited this growth: parallel computing gained significance, which aligns well with concurrent training. In conclusion, computational power is not a problem for deep learning in the legal domain; indeed, this factor is independent of the domain.

The real difficulty lies in acquiring annotated datasets. A labeled dataset is essential for training a model on a specific task. The creation of annotated datasets is thriving in support of general tasks such as object recognition, machine translation or speech recognition. However, these datasets include samples from across domains and often do not suffice for acceptable accuracy in specialized domains such as the legal domain. Measured against the huge amounts of text available in the legal domain, only a small proportion is publicly available and appropriately labeled. This leads to the following problem:

  • Annotated legal datasets are highly limited or barely exist at all.

Many legal tasks, including named entity recognition, named entity disambiguation, question answering, text summarization, document classification, part-of-speech tagging, semantic analysis and taxonomy generation are in desperate need of preprocessed datasets. The only exception to this dataset shortage is the legal translation task, which is supplied with multiple huge datasets.

We try to counteract the data shortage by applying multi-task deep learning in the legal domain. Multi-task deep learning refers to training multiple tasks on one model in order to mitigate data scarcity and establish transfer learning; a minimal sketch of such a setup follows the list below. Hence, the overall goals of this work are:

  • Exploit commonalities and overcome task-specific dataset shortage in the legal domain

  • Establish transfer learning for better results in legal text tasks

  • Support generic and task-independent deep learning architectures
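To illustrate what training multiple tasks on one model looks like, here is a minimal sketch in PyTorch: one shared encoder whose parameters receive gradients from every task, plus one small output head per task, with batches drawn from the tasks in alternation. All names, sizes and data are placeholders; this shows the general multi-task pattern, not the actual MultiModel implementation:

    import random
    import torch
    import torch.nn as nn

    VOCAB, DIM, SEQ = 1000, 64, 16

    # Shared body: its parameters are updated by gradients from all tasks.
    shared_encoder = nn.Sequential(nn.Embedding(VOCAB, DIM), nn.Flatten(1))

    # One lightweight head per task (output sizes are placeholders).
    heads = {
        "translation":    nn.Linear(DIM * SEQ, VOCAB),  # token logits
        "summarization":  nn.Linear(DIM * SEQ, VOCAB),
        "classification": nn.Linear(DIM * SEQ, 20),     # e.g. EuroVoc labels
    }

    params = list(shared_encoder.parameters())
    for head in heads.values():
        params += list(head.parameters())
    optimizer = torch.optim.Adam(params, lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()

    def random_batch(task):
        """Stand-in for a real data loader: (token ids, target index)."""
        x = torch.randint(0, VOCAB, (8, SEQ))
        y = torch.randint(0, heads[task].out_features, (8,))
        return x, y

    for step in range(100):
        task = random.choice(list(heads))        # alternate between tasks
        x, y = random_batch(task)
        logits = heads[task](shared_encoder(x))  # shared body, task head
        loss = loss_fn(logits, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Because the encoder is shared, a task with little data (e.g. legal summarization) profits from representations learned on data-rich tasks such as translation, which is exactly the transfer effect the goals above aim for.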


Research Questions:

We are going to answer the following research questions regarding the application of multi-task deep learning in the legal domain.

  1. Can multi-task deep learning be beneficial for tasks in the legal domain?

  2. How does training simultaneously on multiple tasks of the legal domain compare to training on each task separately?

  3. How far is multi-task deep learning from state-of-the-art solutions in the legal domain?

  4. What needs to be considered for choosing suitable hyperparameters for multi-task deep learning in the legal domain?