Entity matching is a fundamental task in every major data integration, querying, and data cleaning effort. It aims to identify whether two pieces of information refer to the same real-world object, or the degree to which an entity stored in the data satisfies the conditions specified in a query. Entity matching has received considerable attention during the last decade, especially from the Semantic Web community, where entities are the fundamental data representation structure. Despite the many interesting results that the research community has presented, there is still no widely accepted benchmark for evaluating and comparing these approaches.

The EMBench++ system, short for Entity Matching Benchmark, generates data for evaluating entity matching techniques. Based on an extensive analysis of existing matching techniques, we have created a collection of matching scenarios that are considered important and should be supported by the matching techniques. EMBench components focus on generating test data that captures the identified matching scenarios; they include components for creating a data repository, generating an entity collection, and applying entity modifiers. The system uses a flexible and generic model in which an entity is a set of characteristics that can also refer to other entities. In addition, it builds on resolution scenarios that capture basic real-world situations (e.g., syntactic variations and structural differences) as well as more advanced situations (i.e., evolving information).
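
To make the entity model and the idea of entity modifiers more concrete, the following is a minimal Python sketch. It is not the EMBench++ implementation or its API: the names `Entity`, `Ref`, `abbreviate_first_token`, and `split_attribute`, as well as the sample data, are hypothetical and chosen only for illustration. The sketch assumes an entity is a list of name/value characteristics whose values are either literals or references to other entities, and it shows one syntactic variation and one structural difference applied to the same entity.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Ref:
    """Reference to another entity by its identifier (hypothetical helper)."""
    target_id: str


# A characteristic value is either a literal string or a reference.
Value = Union[str, Ref]


@dataclass
class Entity:
    """An entity as a set of characteristics (name/value pairs)."""
    entity_id: str
    attributes: List[Tuple[str, Value]]


def abbreviate_first_token(entity: Entity, attr_name: str) -> Entity:
    """Syntactic variation: reduce the first word of a value to its initial."""
    modified = []
    for name, value in entity.attributes:
        if name == attr_name and isinstance(value, str) and " " in value:
            first, rest = value.split(" ", 1)
            value = f"{first[0]}. {rest}"
        modified.append((name, value))
    return Entity(entity.entity_id + "_syn", modified)


def split_attribute(entity: Entity, attr_name: str,
                    new_names: Tuple[str, str]) -> Entity:
    """Structural difference: split one characteristic into two."""
    modified = []
    for name, value in entity.attributes:
        if name == attr_name and isinstance(value, str) and " " in value:
            first, rest = value.split(" ", 1)
            modified.append((new_names[0], first))
            modified.append((new_names[1], rest))
        else:
            modified.append((name, value))
    return Entity(entity.entity_id + "_str", modified)


# Illustrative data only: an author entity that refers to a paper entity.
paper = Entity("e1", [("title", "A Study of Matching")])
author = Entity("e2", [("name", "Jane Doe"), ("wrote", Ref("e1"))])

syntactic_variant = abbreviate_first_token(author, "name")
structural_variant = split_attribute(author, "name", ("first_name", "last_name"))

# The generator knows which variants came from which entity, so the
# expected matches (the ground truth) can be recorded alongside the data.
ground_truth = [
    (author.entity_id, syntactic_variant.entity_id),
    (author.entity_id, structural_variant.entity_id),
]
```

Under these assumptions, a matching technique would be evaluated by checking whether it pairs `e2` with both variants, since each modifier changes how the entity is described without changing which real-world object it denotes.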




 
Last modified: May 2018,   Page maintained by: Ekaterini Ioannou, Yannis Velegrakis