Towards Semantically Enriched Data Spaces

Last modified Jan 27, 2022

Motivation

Large corporations usually struggle to maintain an enterprise-wide overview of the data they hold and the data their teams need.
The main obstacles are complex processes, made worse by the exponential growth of data. As a result, data is often fragmented and locked in separate silos maintained by individual departments.

Using these data silos for analytical and data science tasks is hard.
Studies show that data scientists spend up to 80% of their time discovering, accessing, and transforming data rather than analyzing it.
Current solutions rely heavily on custom engineering. They do not support ad-hoc querying and do not scale beyond a certain number of data sources and models.

 

Approach

We are currently researching new systems to enable next-generation data management using the following approaches:

Linked data management using graph databases and ontologies that enable:

  • enrichment of decentralized data with metadata, semantic, and provenance information
  • efficient incremental management of this information as annotated labeled graphs
  • adoption of emerging metadata standards and ontologies
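
To make the idea of annotated labeled graphs more concrete, the following is a minimal sketch in plain Python, not the project's implementation. It assumes metadata is stored as (subject, predicate, object) edges, each carrying a dictionary of annotations. The class `MetadataGraph`, the dataset names, the predicates (`ownedBy`, `hasConcept`, `derivedFrom`), and the annotation keys are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class MetadataGraph:
    """Labeled graph whose edges carry free-form annotations (hypothetical sketch)."""

    triples: Dict[Edge, Dict[str, object]] = field(default_factory=dict)

    def add(self, subject: str, predicate: str, obj: str, **annotations: object) -> None:
        # Incremental update: create the edge if it is new,
        # otherwise merge the new annotations into the existing ones.
        self.triples.setdefault((subject, predicate, obj), {}).update(annotations)

    def objects(self, subject: str, predicate: str) -> List[str]:
        # All objects reachable from `subject` over edges labeled `predicate`.
        return sorted(o for (s, p, o) in self.triples if s == subject and p == predicate)


graph = MetadataGraph()
# Meta information: which department maintains the data set.
graph.add("sales_db.orders", "ownedBy", "sales_department", source="manual")
# Semantic information: link to a concept from an ontology (assumed identifier).
graph.add("sales_db.orders", "hasConcept", "schema:Order", source="annotator")
# Provenance information: where a derived data set came from.
graph.add("orders_clean.parquet", "derivedFrom", "sales_db.orders", source="etl_job")
# A later update only adds an annotation to the existing edge; nothing is duplicated.
graph.add("sales_db.orders", "ownedBy", "sales_department", verified=True)

owners = graph.objects("sales_db.orders", "ownedBy")
```

In this sketch the fourth call does not add a new edge. It only merges the `verified` annotation into the existing `ownedBy` edge, which is one simple way to read "incremental management". A graph database would take the place of the in-memory dictionary in a real system.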

Query federation using widely adopted industry tools (such as Apache Drill and GraphQL) that enable:

  • virtual data sets
  • avoidance of data duplication
  • decoupling of source and target schema
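
The three properties above can be illustrated with a small sketch in plain Python. It does not use Apache Drill or GraphQL and is not the project's implementation. It assumes each source exposes a function that returns rows as dictionaries and that a per-source mapping translates source columns into a shared target schema. The class `VirtualDataset`, the two source functions, and all column names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


class VirtualDataset:
    """A data set defined only by its sources and schema mappings (hypothetical sketch)."""

    def __init__(self) -> None:
        # Each entry pairs a row-producing function with a
        # mapping from target column name to source column name.
        self._sources: List[Tuple[Callable[[], Iterable[Row]], Dict[str, str]]] = []

    def register(self, fetch: Callable[[], Iterable[Row]], mapping: Dict[str, str]) -> None:
        self._sources.append((fetch, mapping))

    def query(self, predicate: Callable[[Row], bool] = lambda row: True) -> Iterator[Row]:
        # Rows are read from the sources at query time and never stored here,
        # so the virtual data set holds no copy of the data.
        for fetch, mapping in self._sources:
            for source_row in fetch():
                row = {target: source_row[source] for target, source in mapping.items()}
                if predicate(row):
                    yield row


# Two stand-in sources with different schemas (made-up data).
def crm_rows() -> List[Row]:
    return [{"cust_id": 1, "full_name": "Ada"}, {"cust_id": 2, "full_name": "Grace"}]


def shop_rows() -> List[Row]:
    return [{"id": 7, "name": "Linus"}]


customers = VirtualDataset()
customers.register(crm_rows, {"customer_id": "cust_id", "name": "full_name"})
customers.register(shop_rows, {"customer_id": "id", "name": "name"})

# The consumer only sees the target schema (customer_id, name).
result = list(customers.query(lambda row: row["customer_id"] > 1))
```

Because consumers query only the target columns, a source can rename its own columns without affecting them, as long as its mapping is updated. A federation engine would additionally push filters down to the sources, which this sketch leaves out.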

Publications

[Ho19a] Holl, P., & Gossling, K. Midas: An interactive data catalog for data science teams. KDD '19 Project Showcase.