Schema Matching Prediction

by Prof. Avigdor Gal, Technion - Israel Institute of Technology

April 16th, 2014, at 11:00, in Garda

In this invited talk, Prof. Avigdor Gal will present his work on

Schema Matching Prediction

Abstract

Web-scale data integration involves fully automated efforts that lack knowledge of the exact match between data descriptions. In this talk we introduce schema matching prediction, an assessment mechanism that supports schema matchers in the absence of an exact match. Given pairwise attribute similarity measures, a predictor predicts the success of a matcher in identifying correct correspondences. We present a comprehensive framework in which predictors can be defined, designed, and evaluated. We formally define schema matching evaluation and schema matching prediction using similarity spaces and discuss four desirable properties of predictors: correlation, robustness, tunability, and generalization. We present a method for constructing predictors that supports generalization, and we introduce prediction models as a means of tuning prediction toward various quality measures. We define the empirical properties of correlation and robustness and provide concrete measures for their evaluation. We illustrate the usefulness of schema matching prediction through three use cases. First, we propose a method for ranking the relevance of deep Web sources with respect to given user needs. Second, we show how predictors can assist in the design of schema matching systems. Finally, we show how prediction can support dynamic weight setting of matchers in an ensemble, improving upon current state-of-the-art weight-setting methods. An extensive empirical evaluation shows the usefulness of predictors in these use cases and demonstrates the value of prediction models in improving schema matching performance.
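To make the idea of a predictor and its correlation property a little more concrete, here is a minimal Python sketch under stated assumptions. It represents a matching task as a similarity matrix (rows: attributes of one schema, columns: attributes of the other) and scores it with a simple hypothetical predictor, the mean of each row's maximum similarity. This predictor is chosen purely for illustration and is not necessarily one of the predictors discussed in the talk. Correlation is then measured as the Pearson correlation between predicted scores and a matcher's actual quality over several tasks. The function names `max_row_predictor` and `pearson` and all numbers are hypothetical.

```python
from math import sqrt
from typing import Sequence

# A similarity matrix: sim[i][j] is the similarity between attribute i of
# one schema and attribute j of the other (assumed to lie in [0, 1]).
Matrix = Sequence[Sequence[float]]


def max_row_predictor(sim: Matrix) -> float:
    """Hypothetical predictor: mean of each attribute's best similarity.

    Assumed intuition (illustrative only): a matcher is more likely to
    succeed when every attribute has a strong candidate correspondence.
    Empty rows are skipped; an empty matrix yields 0.0.
    """
    best = [max(row) for row in sim if row]
    return sum(best) / len(best) if best else 0.0


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between predicted scores and actual quality."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy, made-up matching tasks and matcher quality values (e.g., precision),
    # used only to show how the pieces fit together.
    tasks = [
        [[0.9, 0.1], [0.2, 0.8]],
        [[0.5, 0.4], [0.45, 0.5]],
        [[0.3, 0.2], [0.25, 0.3]],
    ]
    quality = [0.95, 0.6, 0.4]
    predictions = [max_row_predictor(t) for t in tasks]
    print(f"correlation: {pearson(predictions, quality):.3f}")
```

A high positive correlation would indicate that the predictor ranks tasks in roughly the same order as the matcher's true quality, which is the kind of signal the abstract describes using for source ranking and ensemble weighting.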

Speaker:

Associate Professor Avigdor Gal of the Faculty of Industrial Engineering & Management at the Technion is a Technion graduate and an expert on information systems.

His research focuses on effective methods of integrating data from multiple and diverse sources, which affect the way businesses and consumers seek information over the Internet. His current work zeroes in on schema matching: the task of providing communication between databases and connecting such communication to real-world concepts. Another line of research involves the identification of complex events such as flu epidemics, biological attacks, and breaches in computer security, and its application to disaster and crisis management. He has applied his research in European and American projects in government, eHealth, and the integration of business documents.

Born in Tel Aviv-Jaffa, Prof. Gal received his bachelor’s degree in Computer Science in 1990 and his doctorate in information systems engineering in 1995, both from the Technion. He has published more than 100 papers in leading professional journals (e.g., Journal of the ACM (JACM), ACM Transactions on Database Systems (TODS), IEEE Transactions on Knowledge and Data Engineering (TKDE), ACM Transactions on Internet Technology (TOIT), and the VLDB Journal), conferences (ICDE, BPM, DEBS, ER, CoopIS), and books (Schema Matching and Mapping). He authored the book Uncertain Schema Matching (2011), serves in various editorial capacities for periodicals including the Journal on Data Semantics (JoDS) and the Encyclopedia of Database Systems and Computing, and has helped organize professional workshops and conferences nearly every year since 1998. His honors include the IBM Faculty Award each year from 2002 to 2004, several Technion awards for teaching, the 2011-2013 Technion-Microsoft Electronic Commerce Research Award, and the 2012 Yanai Award for Excellence in Academic Education, among others.