A link key between two RDF datasets D1 and D2 is a set of pairs of properties that makes it possible to identify pairs of individuals, say x1 in D1 and x2 in D2, each of which can be materialized as an x1 owl:sameAs x2 identity link. There exist several ways to mine such link keys, but none takes into account the fact that owl:sameAs is an equivalence relation, which leads to the discovery of non-redundant link keys. Accordingly, in this paper, we present link key discovery based on Pattern Structures (PS). PS output a pattern concept lattice where every concept has an extent representing a set of pairs of individuals and an intent representing the related link key candidate. Then, we discuss the equivalence relation induced by a link key and we introduce the notion of non-redundant link key candidate.
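To make the notion of a link key candidate concrete, here is a minimal Python sketch. It assumes a simplified reading in which a candidate links x1 and x2 when, for every property pair (p, q) in the candidate, the values of p for x1 and of q for x2 share at least one element. The datasets, the property names and the function `links_for_candidate` are hypothetical illustrations, not the paper's implementation.

```python
from itertools import product


def links_for_candidate(d1, d2, candidate):
    """Return the pairs (x1, x2) sharing at least one value on every property pair."""
    links = set()
    for x1, x2 in product(d1, d2):
        # Assumption: an empty set of values never matches anything.
        if all(d1[x1].get(p, set()) & d2[x2].get(q, set()) for p, q in candidate):
            links.add((x1, x2))
    return links


# Hypothetical toy datasets: individual -> property -> set of values.
D1 = {
    "a1": {"name": {"Ada"}, "born": {"1815"}},
    "a2": {"name": {"Alan"}, "born": {"1912"}},
}
D2 = {
    "b1": {"label": {"Ada"}, "year": {"1815"}},
    "b2": {"label": {"Alan"}, "year": {"1954"}},
}

# A candidate is a set of (property in D1, property in D2) pairs.
candidate = {("name", "label"), ("born", "year")}
same_as = links_for_candidate(D1, D2, candidate)
print(sorted(same_as))
```

Each returned pair would correspond to one x1 owl:sameAs x2 statement. In this toy data, a candidate with fewer property pairs generates more links.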
A Hybrid Approach to Identifying the Most Predictive and Discriminant Features in Supervised Classification Problems
In this paper, we are interested in the predictive and discriminant nature of features in supervised classification problems. We discuss the notions of prediction and discrimination and propose a hybrid approach combining supervised classifiers, model explanation, multicriteria decision making and pattern mining for identifying the most predictive and discriminant features in a dataset. The explanation of models learned by supervised classifiers produces rankings of features according to various performance measures. Based on that, multicriteria decision making and pattern mining methods are used to, respectively, select the most important features and interpret their role in terms of prediction and discrimination. Finally, we present and discuss two experiments on public datasets illustrating the potential of the approach.
Sandwich: An Algorithm for Discovering Relevant Link Keys in an LKPS Concept Lattice
The discovery of link keys between two RDF datasets allows the identification of individuals which share common key characteristics. In fact, link keys correspond to closed sets of a specific Galois connection and can be discovered thanks to an FCA-based algorithm. In this paper, given a pattern concept lattice where each concept intent is a link key candidate, we aim at identifying the most relevant candidates w.r.t. suitable quality measures. To achieve this task, we introduce the "Sandwich" algorithm, which is based on a combination of two dual strategies, bottom-up and top-down, for traversing the pattern concept lattice. The output of the Sandwich algorithm is a poset of the most relevant link key candidates. We provide details about the quality measures applicable to the selection of link keys and about the Sandwich algorithm, as well as a discussion of the benefits of our approach.
Steps Towards Causal Formal Concept Analysis
Efficiently discovering causal relations from data and representing them in a way that facilitates their use is an important problem in science that has received much attention. In this paper, we propose an adaptation of the Formal Concept Analysis formalism to the problem of discovering and representing causal relations. We show that Formal Concept Analysis structures and algorithms are well-suited to this problem.
A Novel Framework for Unification of Association Rule Mining, Online Analytical Processing and Statistical Reasoning
Statistical reasoning was one of the earliest methods to draw insights from data. However, over the last three decades, association rule mining and online analytical processing have gained massive ground in practice and theory. Logically, both association rule mining and online analytical processing have some common objectives, but they have been introduced with their own set of mathematical formalizations and have developed their specific terminologies. Therefore, it is difficult to reuse results from one domain in another. Furthermore, it is not easy to unlock the potential of statistical results in their application scenarios. The target of this paper is to bridge the artificial gaps between association rule mining, online analytical processing and statistical reasoning. We first provide an elaboration of the semantic correspondences between their foundations, i.e., itemset apparatus, relational algebra and probability theory. Subsequently, we propose a novel framework for the unification of association rule mining, online analytical processing and statistical reasoning. Additionally, an instance of the proposed framework is developed by implementing a sample decision support tool. The tool is compared with a state-of-the-art decision support tool and evaluated by a series of experiments using two real data sets and one synthetic data set. The results of the tool validate the framework for the unified usage of association rule mining, online analytical processing, and statistical reasoning. The tool clarifies the extent to which the operations of association rule mining and online analytical processing can complement each other in understanding data, data visualization and decision making.
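As a small illustration of the kind of correspondence the abstract mentions, the Python sketch below computes one quantity in two ways: as the confidence of an association rule and as an OLAP-style grouped average, both of which equal a conditional relative frequency. The table, the column names and the helper functions are hypothetical and are not taken from the paper's framework or tool.

```python
from fractions import Fraction

# Hypothetical toy table: each row is a transaction with Boolean columns.
rows = [
    {"bread": 1, "butter": 1},
    {"bread": 1, "butter": 0},
    {"bread": 1, "butter": 1},
    {"bread": 0, "butter": 1},
]


def support(itemset):
    """Relative frequency of rows containing every item (itemset view)."""
    hits = sum(all(r[i] for i in itemset) for r in rows)
    return Fraction(hits, len(rows))


def confidence(lhs, rhs):
    """Confidence of lhs -> rhs: support(lhs | rhs) / support(lhs)."""
    return support(lhs | rhs) / support(lhs)


def grouped_mean(group_col, group_val, measure_col):
    """OLAP-style aggregate: average of measure_col where group_col == group_val."""
    selected = [r[measure_col] for r in rows if r[group_col] == group_val]
    return Fraction(sum(selected), len(selected))


conf = confidence({"bread"}, {"butter"})
avg = grouped_mean("bread", 1, "butter")
print(conf, avg)  # both are the relative frequency of butter among bread rows
```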
Some Notes on Polyadic Concept Analysis
Despite the popularity of Formal Concept Analysis (FCA) as a mathematical framework for data analysis, some of its extensions are still considered arcane. Polyadic Concept Analysis (PCA) is one of the most promising yet understudied of these extensions. This formalism offers many interesting open questions but is hindered in its dissemination by complex notations and a lack of agreed-upon basic definitions. In this paper, we discuss in a mostly informal way the fundamental differences between FCA and PCA in the relation between contexts, conceptual structures, and rules. We identify open questions, present partial results on the maximal size of concept n-lattices and suggest new research directions.
Explaining Multicriteria Decision Making with Formal Concept Analysis
Multicriteria decision making aims at helping a decision maker choose the best solutions among alternatives compared against multiple conflicting criteria. The reasons why an alternative is considered among the best are not always clearly explained. In this paper, we propose an approach that uses formal concept analysis and background knowledge on the criteria to explain the presence of alternatives on the Pareto front of a multicriteria decision problem.
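For readers unfamiliar with the Pareto front, a minimal Python sketch follows. It assumes that every criterion is to be maximised and uses hypothetical alternatives and scores. It only computes the front and does not reproduce the explanation method proposed in the paper.

```python
def dominates(a, b):
    """True if a is at least as good as b on every criterion and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(alternatives):
    """Names of the alternatives that no other alternative dominates."""
    return {
        name
        for name, score in alternatives.items()
        if not any(dominates(other, score) for other in alternatives.values())
    }


# Hypothetical alternatives scored on two criteria (higher is better).
alternatives = {"A": (3, 1), "B": (2, 2), "C": (1, 3), "D": (1, 1)}
front = pareto_front(alternatives)
print(sorted(front))
```

Here D is dominated by B, while A, B and C are mutually incomparable and therefore all remain on the front.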
Condensed Representations of Association Rules in n-ary Relations
Association rule mining is a problem that has given rise to a rich literature, especially in classic binary bidimensional data. In particular, the representation of the set of rules without loss of information is well understood. This is not the case in multidimensional binary data. In this paper, we show that the knowledge of the closed $n$-sets of a multidimensional Boolean tensor is enough to allow for the derivation of the confidence of every multidimensional association rule. This generalises well-known results in the bidimensional case. We also provide experimental comparisons between the numbers of closed $n$-sets and frequent associations.
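The well-known bidimensional result that this abstract generalises can be sketched in Python as follows: the support of any itemset equals the largest support among the closed itemsets that contain it, so the closed itemsets and their supports suffice to derive every confidence. The transactions and function names are hypothetical, the enumeration is brute force, and the sketch does not cover the multidimensional case treated in the paper.

```python
from fractions import Fraction
from itertools import combinations


def closed_itemsets(transactions):
    """Map each closed itemset with non-empty extent to its support count."""
    items = sorted(set().union(*transactions))
    closed = {}
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            rows = [t for t in transactions if set(combo) <= t]
            if rows:
                # The closure is the intersection of all rows containing the itemset.
                closed[frozenset(set.intersection(*rows))] = len(rows)
    return closed


def support_from_closed(itemset, closed):
    """Support of any itemset: largest support among its closed supersets (0 if none)."""
    return max((s for c, s in closed.items() if itemset <= c), default=0)


def confidence_from_closed(lhs, rhs, closed):
    """Confidence of lhs -> rhs, using only the closed itemsets (lhs must occur)."""
    return Fraction(support_from_closed(lhs | rhs, closed),
                    support_from_closed(lhs, closed))


# Hypothetical transactions.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
closed = closed_itemsets(transactions)
print(len(closed))
print(confidence_from_closed({"a"}, {"b"}, closed))
```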
Reduction and Introducers in d-contexts
Concept lattices are well-known conceptual structures that organise interesting patterns — the concepts — extracted from data. In some applications, the size of the lattice can be a problem, as it is often too large to be efficiently computed and too complex to be browsed. In others, redundant information produces noise that makes understanding the data difficult. In classical FCA, those two problems can be attenuated by, respectively, computing a substructure of the lattice — such as the AOC-poset — and reducing the context. These solutions have not been studied in $d$-dimensional contexts for $d > 3$. In this paper, we generalise the notions of AOC-poset and reduction to $d$-lattices, the structures that are obtained from multidimensional data in the same way that concept lattices are obtained from binary relations.
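In the classical binary case, the AOC-poset keeps only the concepts that introduce an object or an attribute. The Python sketch below illustrates this on a hypothetical context. The helper names are assumptions, and the generalisation to $d$-dimensional contexts described in the paper is not attempted here.

```python
# Hypothetical formal context: object -> set of attributes.
context = {
    "g1": {"m1", "m2"},
    "g2": {"m2", "m3"},
    "g3": {"m2"},
}
ATTRIBUTES = set().union(*context.values())


def intent(objects):
    """Attributes shared by all the given objects."""
    rows = [context[g] for g in objects]
    return frozenset(set.intersection(*rows)) if rows else frozenset(ATTRIBUTES)


def extent(attributes):
    """Objects having all the given attributes."""
    return frozenset(g for g, row in context.items() if attributes <= row)


def object_concept(g):
    """Concept introducing object g: (g'', g')."""
    b = intent({g})
    return (extent(b), b)


def attribute_concept(m):
    """Concept introducing attribute m: (m', m'')."""
    a = extent({m})
    return (a, intent(a))


aoc = {object_concept(g) for g in context} | {attribute_concept(m) for m in ATTRIBUTES}
bottom = (extent(ATTRIBUTES), intent(extent(ATTRIBUTES)))
print(len(aoc), bottom in aoc)
```

In this toy context the full lattice has four concepts, while the AOC-poset has three: the bottom concept introduces neither an object nor an attribute and is left out.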
On-Demand Relational Concept Analysis
Formal Concept Analysis (FCA) and its associated conceptual structures are used to support exploratory search through conceptual navigation. Relational Concept Analysis (RCA) is an extension of Formal Concept Analysis to process relational datasets. RCA and its multiple interconnected structures are good candidates to support exploratory search in relational datasets, as they enable navigation within a structure as well as between the connected structures. However, building the entire structures is not an efficient way to explore a small localised area of the dataset, for instance to retrieve the closest alternatives to a given query. In these cases, generating only a concept and its neighbour concepts at each navigation step appears to be a less costly alternative. In this paper, we propose an algorithm to compute a concept and its neighbourhood in connected concept lattices. The concepts are generated directly from the relational context family, and possess both formal and relational attributes. The algorithm takes into account two RCA scaling operators and is implemented in the RCAExplore tool.
Keywords: Relational Concept Analysis, Formal Concept Analysis, Exploratory Search, On-demand Generation, Local Generation
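The idea of generating a concept and its neighbours without building the whole lattice can be sketched in Python for a single formal context. This is a simplified illustration under stated assumptions: it ignores relational attributes and scaling operators, uses a hypothetical context and helper names, and is not the algorithm implemented in RCAExplore. It relies on the fact that the upper neighbours of a concept with extent A are the minimal concepts among those generated by A plus one extra object.

```python
# Hypothetical formal context: object -> set of attributes.
context = {
    "g1": {"a", "b"},
    "g2": {"b", "c"},
    "g3": {"b"},
    "g4": {"c"},
}
ATTRIBUTES = set().union(*context.values())


def intent(objects):
    """Attributes shared by all the given objects."""
    rows = [context[g] for g in objects]
    return frozenset(set.intersection(*rows)) if rows else frozenset(ATTRIBUTES)


def extent(attributes):
    """Objects having all the given attributes."""
    return frozenset(g for g, row in context.items() if attributes <= row)


def concept_from_objects(objects):
    """Smallest concept whose extent contains the given objects."""
    b = intent(objects)
    return (extent(b), b)


def upper_neighbours(concept):
    """Concepts directly above the given one, computed locally."""
    a, _ = concept
    candidates = {concept_from_objects(a | {g}) for g in context if g not in a}
    # Keep only candidates whose extent is minimal with respect to inclusion.
    return {c for c in candidates if not any(o[0] < c[0] for o in candidates)}


start = concept_from_objects({"g2"})
for ext, itt in upper_neighbours(start):
    print(sorted(ext), sorted(itt))
```

Each navigation step then only requires recomputing the neighbours of the concept currently being visited.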