Efficient Query Processing in Probabilistic-Temporal Databases

by Prof. Martin Theobald, University of Antwerp

June 9th, 2014, at 10:00, in Levico

In this invited talk, Prof. Martin Theobald will present his work on:

Efficient Query Processing in Probabilistic-Temporal Databases

Abstract

Recent advances in information extraction have paved the way for the automatic construction and growth of large semantic knowledge bases from Web sources. Knowledge bases such as DBpedia or YAGO today contain hundreds of millions of facts about real-world entities and their relationships with one another, captured in the popular Resource Description Framework (RDF) format. However, the very nature of the underlying extraction techniques means that the resulting RDF knowledge bases may contain a significant amount of incorrect, incomplete, or even inconsistent factual knowledge, which makes efficient and reliable query answering over this kind of uncertain RDF data a challenge. Our query engine, called URDF, answers queries over uncertain RDF knowledge bases via a combination of Datalog-style deduction rules, consistency constraints, and probabilistic inference; it will be the main subject of this talk. Specifically, by casting the above scenario into a probabilistic database setting, we develop a new top-k algorithm for query answering which, for the first time in the context of probabilistic databases, allows us to fully integrate data and confidence computations over this kind of probabilistic input data. Extensions of our framework include the automatic learning of these deduction rules from RDF data sources, as well as temporal deduction rules and consistency constraints over time-annotated, probabilistic facts.

Speaker:

Martin Theobald is an Associate Professor of Databases and Information Retrieval at the University of Antwerp.

Before joining the ADReM research group in Antwerp in 2012, he spent four years as a Senior Researcher at the Max Planck Institute for Informatics in Saarbrücken. Between 2006 and 2008, Martin was a postdoc at the Stanford InfoLab, where he worked on the Trio probabilistic database system. Martin obtained a doctoral degree in Computer Science from Saarland University in 2006. For his dissertation, titled “Efficient Top-k Query Processing for Text, Semistructured, and Structured Data”, Martin received several awards, including an “Honorable Mention” for the ACM SIGMOD Jim Gray Dissertation Award. Martin is currently an Area Editor for Elsevier's Information Systems, and he has served on the program committees of, and as a reviewer for, numerous international journals, conferences, and workshops, including TODS, TKDE, VLDB-J, PVLDB, SIGMOD, SIGIR, ICDE, WSDM, and WWW.