Towards Semantically Enriched Data Spaces

Last modified Jan 27, 2022

Motivation

Large corporations often struggle to maintain an enterprise-wide overview of available data and data demands.
The main challenges are complex processes, which are intensified by the exponential growth of data. As a result, data is often split up and locked in separate data silos maintained by different departments.

Leveraging these data silos for analytical and data science tasks is a hard problem.
Studies show that data scientists spend up to 80% of their time on discovering, accessing, and transforming data instead of analyzing it.
Current, heavily engineering-based solutions do not support ad-hoc query access and do not scale beyond a certain number of data sources and models.

Approach

We are currently researching new systems to enable next-generation data management using the following approaches:

Linked data management using graph databases and ontologies that enable:

Enrichment of decentralized data with meta, semantic, and provenance information
Efficient incremental management of this information as annotated labeled graphs
Adoption of emerging metadata standards and ontologies
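
To make the idea of an annotated labeled graph more concrete, the following is a minimal in-memory sketch in Python. It is not the project's implementation: the class `MetadataGraph`, its methods, and the example identifiers and labels (`crm.customers`, `hasColumn`, `represents`) are hypothetical names chosen for illustration, and a real system would presumably sit on top of a graph database rather than plain dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A labeled node, e.g. a dataset, a column, or an ontology concept."""
    node_id: str
    label: str
    annotations: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Edge:
    """A labeled, directed edge between two node ids."""
    source: str
    label: str
    target: str


class MetadataGraph:
    """Hypothetical annotated labeled graph for metadata about data sources."""

    def __init__(self):
        self.nodes = {}     # node_id -> Node
        self.edges = set()  # set of Edge, so duplicate links collapse

    def upsert_node(self, node_id, label, **annotations):
        """Create the node if it is missing, then merge in new annotations.

        Repeated calls add information incrementally instead of rebuilding
        the node. The label of an existing node is left unchanged.
        """
        node = self.nodes.get(node_id)
        if node is None:
            node = Node(node_id, label)
            self.nodes[node_id] = node
        node.annotations.update(annotations)
        return node

    def link(self, source, label, target):
        """Add a labeled edge between two existing nodes."""
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before linking")
        self.edges.add(Edge(source, label, target))

    def neighbors(self, node_id, label):
        """Return the sorted ids reachable from node_id over edges with this label."""
        return sorted(
            e.target for e in self.edges
            if e.source == node_id and e.label == label
        )


# Usage: describe a dataset, then enrich it later without touching the data itself.
graph = MetadataGraph()
graph.upsert_node("crm.customers", "Dataset", owner="sales")
graph.upsert_node("crm.customers.email", "Column", dtype="string")
graph.upsert_node("schema:email", "Concept")
graph.link("crm.customers", "hasColumn", "crm.customers.email")
graph.link("crm.customers.email", "represents", "schema:email")
graph.upsert_node("crm.customers", "Dataset", provenance="nightly CRM export")
```

The data itself stays in its source system. Only the descriptions (ownership, semantics, provenance) live in the graph, and they can be extended step by step as more becomes known about a source.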

Query federation using established industry tools (such as Apache Drill and GraphQL) that enable:

Virtual data sets
Avoidance of data duplication
Decoupling of source and target schema
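
The three properties above can be illustrated with a small, self-contained Python sketch. It does not use Apache Drill or GraphQL and does not reflect their APIs. The class `VirtualDataset`, its `bind` and `rows` methods, and the sample sources and field names are assumptions made purely for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Row = Dict[str, object]
Source = Callable[[], Iterable[Row]]  # yields rows from a source system on demand


class VirtualDataset:
    """Hypothetical virtual data set: a target schema mapped onto several sources.

    Rows are read from the sources at query time, so nothing is copied or
    materialized ahead of time.
    """

    def __init__(self) -> None:
        self._bindings: List[Tuple[Source, Dict[str, str]]] = []

    def bind(self, source: Source, mapping: Dict[str, str]) -> None:
        """Attach a source; mapping is target field name -> source field name."""
        self._bindings.append((source, mapping))

    def rows(self, where: Optional[Callable[[Row], bool]] = None) -> Iterator[Row]:
        """Lazily yield rows in the target schema, optionally filtered."""
        for source, mapping in self._bindings:
            for raw in source():
                # Rename source fields to the target schema on the fly.
                row = {target: raw.get(src) for target, src in mapping.items()}
                if where is None or where(row):
                    yield row


# Two sources with different schemas (sample data, made up for this sketch).
crm_rows = [{"cust_mail": "ada@example.org", "cust_name": "Ada"}]
shop_rows = [{"email": "bob@example.org", "full_name": "Bob"}]

customers = VirtualDataset()
customers.bind(lambda: crm_rows, {"email": "cust_mail", "name": "cust_name"})
customers.bind(lambda: shop_rows, {"email": "email", "name": "full_name"})

result = list(customers.rows(where=lambda r: r["name"] == "Ada"))
```

Consumers only see the target fields `email` and `name`. A source can rename its columns without affecting them, as long as its mapping is updated, and new rows in a source become visible on the next query because no copy is kept.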

Publications

[Ho19a]

Holl, P., & Gossling, K. Midas-An interactive data catalog for data science teams. KDD '19 Project Showcase.