Issues Labeled as Architectural Design Decisions (ILADD)

Last modified by Florian Matthes Feb 16, 2019

The need to support a software architect's day-to-day activities with efficient tools has been highlighted in both research and industry. However, managing an enterprise's Architectural Knowledge (AK), which comprises Architectural Design Decisions, to support scenarios such as decision making during large industrial projects is still a challenge [Bh16]. Evident reasons for this challenge are the lack of sufficient motivation and the extensive manual effort and cost involved in documenting and managing AK. Hence, approaches that automatically or semi-automatically extract AK from project-related artifacts (cf. architecture reengineering, architecture recovery, architecture archaeology) are becoming popular.

Even when AK is not explicitly documented, it is implicitly captured in various systems, including project management, issue management, and version management systems. In our work, we focus on extracting past design decisions from issue management systems. Analyzing these decisions will help us address specific architectural concerns in similar project contexts. To this end, we envision applying machine-learning (ML) based approaches to automatically extract and classify design decisions from issues captured in issue management systems.
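To make the extract-then-classify idea concrete, here is a minimal sketch, assuming a two-stage setup: a first classifier decides whether an issue reflects a design decision, and a second one assigns a decision category. The classifier is a small multinomial Naive Bayes written from scratch for illustration only; it is not necessarily the model used in this work. The issue summaries in `TOY_ISSUES` and all function and class names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter(labels)      # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen during training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for t in tokens:
                score += math.log((counts[t] + 1) / denom)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical issue summaries, invented for illustration only.
# A category of None means "not a design decision".
TOY_ISSUES = [
    ("Upgrade logging library to the latest version", "Structural"),
    ("Move utility classes into a separate module", "Structural"),
    ("Add configuration option to control retry behavior", "Behavioral"),
    ("Deprecate the legacy scheduling method", "Behavioral"),
    ("Remove unused compression library", "Ban"),
    ("Delete deprecated methods from the client", "Ban"),
    ("Fix typo in the documentation", None),
    ("Unit test fails intermittently on the build server", None),
]

# Stage 1: design decision vs. not a design decision (all issues).
detector = NaiveBayes().fit(
    [text for text, _ in TOY_ISSUES],
    ["decision" if cat else "no decision" for _, cat in TOY_ISSUES],
)

# Stage 2: decision category (design decisions only).
categorizer = NaiveBayes().fit(
    [text for text, cat in TOY_ISSUES if cat],
    [cat for _, cat in TOY_ISSUES if cat],
)


def classify(text):
    """Return a decision category, or None if no design decision is detected."""
    if detector.predict(text) != "decision":
        return None
    return categorizer.predict(text)
```

With eight toy examples the model only memorizes its training data; the point is the shape of the pipeline, which would be trained on a labeled set of real issues instead.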

As a first step, we have extracted more than 1500 issues from two popular open-source projects (Apache Spark and Apache Hadoop Common) into SocioCortex. Furthermore, we have manually labeled these issues as either reflecting a design decision (784 issues) or not a design decision (790 issues). Since these decisions can be classified into different decision categories such as existence, property, and executive decisions (cf. ontology of architectural design decisions by Kruchten), we have also manually labeled the identified decisions into three specific categories, namely Structural (227 issues), Behavioral (388 issues), and Non-existence or Ban decisions (164 issues) for further analysis. The manual labeling of design decisions is based on the rules listed below. We plan to use this annotated dataset for training our ML models for automatic extraction and classification.
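As a rough orientation for anyone training on these labels, the following sketch computes the class shares and the accuracy of a trivial majority-class baseline from the counts reported above. The function names are hypothetical. The three category counts as given sum to 779 rather than 784; the sketch uses the figures exactly as reported.

```python
# Label counts as reported in the text above.
binary_counts = {"design decision": 784, "not a design decision": 790}
category_counts = {"Structural": 227, "Behavioral": 388, "Ban": 164}


def shares(counts):
    """Return each label's fraction of the total number of issues."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def majority_baseline(counts):
    """Accuracy obtained by always predicting the most frequent label."""
    return max(counts.values()) / sum(counts.values())


if __name__ == "__main__":
    for name, counts in [("binary", binary_counts), ("category", category_counts)]:
        print(name, "total:", sum(counts.values()))
        for label, share in shares(counts).items():
            print(f"  {label}: {share:.1%}")
        print(f"  majority baseline: {majority_baseline(counts):.1%}")
```

The binary task is almost perfectly balanced, so plain accuracy is a reasonable first metric there; the category task is more skewed toward Behavioral decisions, where per-class metrics may be more informative.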

Structural decision:

+ Adding or updating plugins, libraries, or third-party systems
+ Adding or updating classes, modules, or files (a class, in this context, refers to a Java class)
+ Changing access specifier of a class
+ Merging or splitting classes or modules
+ Moving parts of the code or entire files from one location to another (code refactoring to address maintainability issues)
+ Updating names of classes, methods, or modules

Behavioral decision:
+ Adding or updating functionality (methods/functions) and process flows
+ Providing configuration options for managing the behavior of the system
+ Adding or updating application programming interfaces (APIs)
+ Adding or updating dependencies between methods
+ Deprecating or disabling specific functionality
+ Changing the access specifiers of methods

Ban decision:
+ Removing existing plugins, libraries, or third-party systems
+ Discarding classes, modules, code snippets, or files
+ Deleting methods, APIs, process flows, or dependencies between methods
+ Removing deprecated methods

Design decision:
+ An issue that belongs to any one of the above categories

Not a design decision:
+ An issue that does not belong to any of the above categories
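The relationship between the labels above can be sketched as a small data model, assuming each issue carries at most one decision category: an issue counts as a design decision exactly when a category has been assigned. The class names, field names, and the example issue are hypothetical and do not reflect the actual schema of the dataset.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DecisionCategory(Enum):
    """The three decision categories used for manual labeling."""
    STRUCTURAL = "Structural"
    BEHAVIORAL = "Behavioral"
    BAN = "Ban"


@dataclass(frozen=True)
class LabeledIssue:
    """One manually labeled issue (field names are assumptions)."""
    key: str                               # hypothetical issue identifier
    summary: str                           # issue title or short description
    category: Optional[DecisionCategory]   # None if no category applies

    @property
    def is_design_decision(self) -> bool:
        # Design decision: belongs to any one of the categories.
        # Not a design decision: belongs to none of them.
        return self.category is not None


# Usage with invented examples.
removed_lib = LabeledIssue("EXAMPLE-1", "Remove unused library", DecisionCategory.BAN)
typo_fix = LabeledIssue("EXAMPLE-2", "Fix typo in README", None)
```

Deriving the binary label from the category, rather than storing both, keeps the two labels from contradicting each other.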

 

We believe such a manually labeled dataset will benefit the software architecture research community in general, allowing it to further enrich, share, and reuse this knowledge and to apply (semi-)automated approaches to use cases including software architecture recommendations and decision support. Such a dataset will provide common ground and help us benchmark the algorithms used for architectural decision support.

The rationale for selecting the two open-source projects, as well as the data curation and labeling processes, is elaborated in the paper titled "Automatic Extraction of Design Decisions from Issue Management Systems: A Machine Learning Based Approach".

 

The dataset with manually labeled information is now publicly available [link]. This dataset can also be accessed over the REST APIs for building custom applications. Please feel free to contact us if you have issues accessing this dataset.

Contributions and suggestions to improve the dataset are most welcome!


External links: [researchgate]