Enterprises have entered a new era characterized by the exploration of Big Data to uncover previously unreachable knowledge. Big Data refers to massive data sets in the range of exabytes: highly variable, complex, and growing data sets from multiple data sources that are difficult to store, analyze, and visualize for further processing. With the rapid development of networking, data storage, and data collection capacity, Big Data is expanding swiftly in various domains, including manufacturing, healthcare, and retail, driven by the increasing volume and detail of information captured by companies, the rise of social media, and the Internet of Things. Analyzing massive amounts of data reveals hidden patterns and enables companies to gain richer and deeper insights into invaluable information. The results of Big Data analytics underpin new waves of productivity growth and innovation and become a key basis for competition.
In response, a vast number of Big Data technology vendors have emerged, providing technologies that support companies in harnessing the intended value of Big Data. Within this large vendor landscape, companies seek end-to-end solutions that can store, analyze, and visualize mass data quickly and reliably. The open-source Elasticsearch, Logstash, and Kibana (ELK) stack developed by Elastic is a promising set of search-based data discovery tools that can be used for near real-time analytics and fast full-text searches. It helps to glean actionable insights from almost any type of structured and unstructured data from almost any data source, and thereby addresses the daunting task of harnessing the intended value of Big Data. Since selecting a Big Data technology is non-trivial, the applicability of the ELK stack needs to be evaluated for different Big Data use cases.
The goal of this master’s thesis is to assess the applicability of the ELK stack for various Big Data use cases. To that end, this work compares the ELK stack with related search-based data discovery tools and identifies its key features and capabilities by conducting a structured literature review and a descriptive study. In four experiments, the ELK stack is implemented and its performance is assessed using various performance benchmarks. Based on the results of the experiments, distinct characteristics of the ELK stack are derived. These characteristics indicate strengths and weaknesses of the ELK stack and provide guidance for better decision making in Big Data technology adoption.