Motivation
Large corporations often struggle to maintain an enterprise-wide overview of the data they hold and the data their teams need.
The main challenges are complex processes, which are intensified by the exponential growth of data. As a result, data is often split up and locked in separate silos maintained by different departments.
Leveraging these data silos for analytical and data science tasks is a hard problem.
Studies show that data scientists spend up to 80% of their time on discovering, accessing, and transforming data instead of analyzing it.
Current solutions are heavily engineering-based: they do not support ad-hoc querying and do not scale beyond a certain number of data sources and models.
Approach
We are currently researching new systems to enable next-generation data management using the following approaches:
Linked data management using graph databases and ontologies that enable:
Query federation using industry standards (such as Apache Drill and GraphQL) that enable:
[Ho19a] Holl, P., & Gossling, K. Midas - An interactive data catalog for data science teams. KDD '19 Project Showcase.