A study of the discovery and redundancy of link keys between two RDF datasets based on partition pattern structures

A link key between two RDF datasets D1 and D2 is a set of pairs of properties that allows pairs of individuals x1 and x2 to be identified through an identity link such as x1 owl:sameAs x2. In this paper, relying on and extending previous work, we introduce an original formalization of link key discovery based on the framework of Partition Pattern Structures (pps). Our objective is to study and evaluate the redundancy of link keys based on the fact that owl:sameAs is an equivalence relation. In the pps concept lattice, every concept has an extent representing a link key candidate and an intent representing a partition of instances into sets of equivalent instances. Experiments show three main results. First, redundancy of link keys is not very significant in real-world datasets. Second, the link key discovery approach based on pps nevertheless returns fewer link key candidates, all non-redundant, than a standard approach. Third, the pps-based approach is efficient and returns link keys of high quality.
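
To make the notion of redundancy concrete, the following Python sketch builds the links generated by a link key candidate, closes them under the equivalence properties of owl:sameAs, and compares the resulting partitions. It is an illustration under stated assumptions, not the formalization developed in the paper: we assume a simplified reading in which two individuals are linked when they share at least one value for every property pair of the candidate, and we treat two candidates as redundant when they induce the same partition. The toy datasets, the property names, and the helpers `links` and `partition` are hypothetical.

```python
from itertools import product

def links(key, d1, d2):
    """Pairs (x1, x2) that share at least one value for every (p, q) in key.

    Assumption: this 'shared value' condition is a simplified reading of a
    link key, chosen for illustration only.
    """
    return {
        (x1, x2)
        for x1, x2 in product(d1, d2)
        if all(d1[x1].get(p, set()) & d2[x2].get(q, set()) for p, q in key)
    }

def partition(link_set):
    """Equivalence classes of the linked individuals (union-find closure)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in link_set:
        parent[find(a)] = find(b)

    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    return {frozenset(c) for c in classes.values()}

# Hypothetical toy datasets: individual -> property -> set of values.
d1 = {
    "a1": {"name": {"x"}, "born": {"1", "2"}},
    "a2": {"name": {"x", "y"}, "born": {"2"}},
}
d2 = {
    "b1": {"label": {"x"}, "year": {"1"}},
    "b2": {"label": {"y"}, "year": {"2"}},
}

k1 = {("name", "label")}  # first link key candidate
k2 = {("born", "year")}   # second link key candidate

l1, l2 = links(k1, d1, d2), links(k2, d1, d2)
print(l1 == l2)                        # the raw link sets differ
print(partition(l1) == partition(l2))  # the induced partitions coincide
```

In this toy case the two candidates generate different sets of links, yet once symmetry and transitivity are taken into account both place a1, a2, b1 and b2 in a single class. Under the assumptions above, they would therefore count as redundant with respect to each other.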

