The typical approach to data integration is to start by defining a common mediated schema, and then to map the data sources being integrated to this schema. In Internet-scale data integration tasks, where there may be hundreds or thousands of data sources providing data of relevance to a particular domain, a better approach is to allow the user to discover the mediated schema and the set of sources to use through an iterative exploration of the space of possible schemas and sources. In this paper, we present μBE, a data integration tool that helps in this iterative exploratory process by automatically choosing the data sources to include in a data integration system and defining a mediated schema on these sources. The data integration system desired by the user may depend on several subjective and objective criteria. The user guides μBE towards this system over a series of iterations: in each iteration, μBE solves a constrained non-linear optimization problem, and the user modifies the parameters and constraints of the next problem based on the solution found in the current one. Our formulation of the optimization problem is designed to make it easy for the user to provide such feedback. A simple, intuitive user interface helps the user in this process. We experimentally demonstrate that μBE is efficient and finds high-quality data integration solutions.