Entity matching is a fundamental task in every major data integration, querying, and data cleaning effort. It aims to identify whether two pieces of information refer to the same real-world object, or the degree to which an entity stored in the data satisfies the conditions specified in a query. Entity matching has received considerable attention during the last decade, especially from the Semantic Web community, where entities are the fundamental data representation structure. Despite the many interesting results the research community has presented, there is still no widely accepted benchmark for evaluating and comparing these approaches.

The EMBench system, short for Entity Matching Benchmark, generates data for evaluating entity matching techniques. Based on an extensive analysis of existing matching techniques, we have created a collection of matching scenarios that are considered important and that matching techniques should support. The EMBench components focus on generating test data that captures the identified matching scenarios; they include components for creating a data repository, generating an entity collection and applying entity modifiers, and evaluating the performance of matching techniques. In contrast to existing efforts, the system offers rich specification capabilities that allow not only the generation of test data but also control over the degree to which a test situation occurs.
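
The pipeline described above (repository, entity collection, modifiers, evaluation) can be illustrated with a minimal Python sketch. This is not EMBench code or its API: the repository contents, the function names, the `abbreviate_name` modifier, and the `rate` parameter are all hypothetical, chosen only to show how the degree to which a test situation occurs might be made adjustable.

```python
import random

# Hypothetical value repository; the real system builds its own repository.
REPOSITORY = {
    "name": ["Alice Smith", "Bob Jones", "Carla Gomez", "Deepak Rao"],
    "city": ["Trento", "Hannover", "Athens", "Toronto"],
}

def generate_entities(n, rng):
    """Create n clean entities by sampling attribute values from the repository."""
    return [
        {"id": i, **{attr: rng.choice(values) for attr, values in REPOSITORY.items()}}
        for i in range(n)
    ]

def abbreviate_name(entity):
    """Example modifier (an assumption): shorten the first name to its initial."""
    first, last = entity["name"].split(" ", 1)
    return {**entity, "name": f"{first[0]}. {last}"}

def apply_modifier(entities, modifier, rate, rng):
    """Return one copy per entity plus ground truth (copy id -> source id).

    `rate` is the probability that a copy is altered by the modifier.
    """
    copies, truth = [], {}
    for e in entities:
        copy = modifier(e) if rng.random() < rate else dict(e)
        copy["id"] = e["id"] + len(entities)  # give the copy a fresh id
        copies.append(copy)
        truth[copy["id"]] = e["id"]
    return copies, truth

def exact_match_recall(entities, copies, truth):
    """Fraction of true pairs found by a naive matcher that requires identical values."""
    by_id = {e["id"]: e for e in entities}
    hits = sum(
        1 for c in copies
        if all(by_id[truth[c["id"]]][attr] == c[attr] for attr in REPOSITORY)
    )
    return hits / len(copies)

rng = random.Random(0)
entities = generate_entities(100, rng)
for rate in (0.0, 0.5, 1.0):
    copies, truth = apply_modifier(entities, abbreviate_name, rate, rng)
    print(rate, exact_match_recall(entities, copies, truth))
```

Raising `rate` makes the modified scenario occur more often, so the naive exact matcher finds fewer of the true pairs. A technique that tolerates abbreviations would be expected to degrade less under the same setting.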


  • On Generating Benchmark Data for Entity Matching. Ekaterini Ioannou, Nataliya Rassadko, Yannis Velegrakis. Journal of Data Semantics 2(1): 37-56, 2013.

Last modified: July 2014,   Page maintained by: Ekaterini Ioannou, Yannis Velegrakis