Ha11b - Making Data Analysis Expertise Broadly Accessible through Workflows

Last modified Apr 4, 2012

Abstract

The demand for advanced skills in data analysis spans many areas of science, computing, and business analytics. This paper presents a framework that lets non-expert users reuse expert-created workflows that represent complex data mining processes for text analytics. The framework includes workflows for document classification, document clustering, and topic detection, all assembled from components available in well-known text analytics software libraries. The workflows capture expert-level knowledge of how these individual components need to be combined with data preparation and feature selection steps to make the underlying statistical learning algorithms most effective. The framework allows researchers to experiment with different combinations of data analysis processes, represented as workflows of computations that they can easily reconfigure and that seamlessly harness the power of large-scale distributed execution resources. We report on our experiences to date in having users with limited data analytics knowledge and only basic programming skills apply workflows to their data.

Files and Subpages

Name Type Size Last Modification Last Editor
Bu11c.pdf 206 KB 16.05.2012