Notice
Resolving Entities in the Web of Data
- document 1 document 2 document 3
- niveau 1 niveau 2 niveau 3
Descriptif
Over the past decade, numerous knowledge bases (KBs) have been built to power a new generation of Web applications that provide entity-centric search and recommendation services. These KBs offer comprehensive, machine-readable descriptions of a large variety of real-world entities (e.g., persons, places, products, events) published on the Web as Linked Data (LD). Even when derived from the same data source (e.g., a Wikipedia entry), KBs such as DBpedia, YAGO2, or Freebase may provide multiple, non-identical descriptions for the same real-world entities. This is due to the different information extraction tools and curation policies employed by KBs, resulting to complementary and sometimes conflicting entity descriptions. Entity resolution (ER) aims to identify different descriptions that refer to the same real-world entity, and emerges as a central data-processing task for an entity-centric organization of Web data. ER is needed to enrich interlinking of data elements describing entities, even by third-parties, so that the Web of data can be accessed by machines as a global data space using standard languages, such as SPARQL. ER can also facilitate an automated KB construction by integrating entity descriptions from legacy
KBs with Web content published as HTML documents.
ER has attracted significant attention from many researchers in information systems, database and machine-learning communities. The objective of this lecture is to present the new ER challenges stemming from the Web openness in describing, by an unbounded number of KBs, a multitude of entity types across domains, as well as the high heterogeneity (semantic and structural) of descriptions, even for the same types of entities. The scale, diversity and graph structuring of entity descriptions published according to the LD paradigm challenge the core ER tasks, namely, (i) how descriptions can be effectively compared for similarity and (ii) how resolution algorithms can efficiently filter the candidate pairs of descriptions that need to be compared.
In a multi-type and large-scale entity resolution, we need to examine whether two entity descriptions are somehow (or near) similar without resorting to domain- specific similarity functions and/or mapping rules. Furthermore, the resolution of some entity descriptions might influence the resolution of other neighbourhood descriptions. This setting clearly goes beyond deduplication (or record linkage) of collections of descriptions usually referring to a single entity type that slightly differ only in their attribute values. It essentially requires leveraging similarity of descriptions both on their content and structure. It also forces us to revisit traditional ER workfows consisting of separate indexing (for pruning the number of candidate pairs) and matching (for resolving entity descriptions) phases.
In this talk we intend to provide a starting point for researchers, students and developers who are interested in a global view of the ER problem in the Web of data.
Dans la même collection
-
Explorations Mathématiques de l'activité du cerveau
TouboulJonathanExplorations Mathématiques de l'activité du cerveau Le siècle dernier a été une période fascinante durant laquelle les recherches expérimentales ont fait des avancées majeures sur la
-
Logic-based static analysis for the verification of programs with dynamically allocated data struct…
DrăgoiCezaraSoftware development has reached a complexity level that cannot be handled without the aid of computer assisted methods. It is therefore of the highest importance to have rigorous methods and
-
Wireless In the Woods: Monitoring the Snow Melt Process in the Sierra Nevada
WatteyneThomasHistorically, the study of mountain hydrology and the water cycle has been largely observational, with meteorological forcing and hydrological variables extrapolated from a few infrequent manual
-
Phénomènes Aléatoires dans les Réseaux
RobertPhilippeLes phénomènes aléatoires sont une composante-clé des réseaux de communication, ils interviennent, de façon majeure, dans le trafic que les réseaux traitent, ainsi que dans certains algorithmes
-
Modèles mémoire pour les multiprocesseurs à mémoire partagée
MarangetLucLa plupart des systèmes qui s'apparentent à des ordinateurs un tant soit peu sophistiqués comprennent plusieurs unités de calcul qui communiquent par l'intermédiaire d'une mémoire partagée.
-
Gestion de données personnelles respectueuse de la vie privée
AnciauxNicolasEn très peu de temps, nous sommes entrés dans une ère de génération massive des données personnelles créées par les individus, leurs équipements digitaux ou mises à disposition par certaines
-
Génération de maillages pour la simulation numérique
LoseilleAdrienUne branche importante du calcul scientifique consiste à simuler sur ordinateurs des phénomènes physiques complexes. Son intérêt consiste à mieux appréhender des problèmes fondamentaux : solution
-
Réseau optiques, algorithmes et probabilités
RobertsJ. B.L'objectif des recherches de l'équipe RAP est de modéliser le comportement de réseaux de divers types, soumis à une demande de nature aléatoire, afin la prédire leurs performances. Le partage des
-
Réduction de modèles de voies de signalisation intracellulaire
FeretJérômeLes voies de signalisation intracellulaire sont des cascades d'interaction entre protéines, qui permettent à la cellule de recevoir des signaux, de les propager jusqu'à son noyau, puis de les