Statistical reasoning was one of the earliest methods to draw insights from data. However, over the last three decades, association rule mining and online analytical processing have gained considerable ground in both practice and theory. Association rule mining and online analytical processing share some objectives, but they were introduced with their own mathematical formalizations and have developed their own terminologies. It is therefore difficult to reuse results from one domain in another, and it is not easy to unlock the potential of statistical results in their application scenarios. The aim of this paper is to bridge the artificial gaps between association rule mining, online analytical processing and statistical reasoning. We first elaborate the semantic correspondences between their foundations, i.e., itemset apparatus, relational algebra and probability theory. Subsequently, we propose a novel framework for the unification of association rule mining, online analytical processing and statistical reasoning. Additionally, an instance of the proposed framework is developed by implementing a sample decision support tool. The tool is compared with a state-of-the-art decision support tool and evaluated by a series of experiments using two real data sets and one synthetic data set. The results validate the framework for the unified usage of association rule mining, online analytical processing, and statistical reasoning. The tool clarifies to what extent the operations of association rule mining and online analytical processing can complement each other in understanding data, data visualization and decision making.
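One well-known correspondence of this kind can be illustrated with a minimal sketch: the confidence of an association rule equals a conditional probability when every transaction is equally likely. The toy transactions and the function names below are assumptions made for illustration; they are not taken from the paper's framework or tool.

```python
from fractions import Fraction

# Hypothetical toy data: each row is the set of items in one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support_count(itemset, rows):
    """Number of rows that contain every item of `itemset`."""
    return sum(1 for row in rows if itemset <= row)


def confidence(antecedent, consequent, rows):
    """Itemset view: supp(antecedent and consequent) / supp(antecedent)."""
    both = support_count(antecedent | consequent, rows)
    return Fraction(both, support_count(antecedent, rows))


def conditional_probability(antecedent, consequent, rows):
    """Probabilistic view: P(consequent | antecedent), rows equally likely."""
    p_antecedent = Fraction(support_count(antecedent, rows), len(rows))
    p_both = Fraction(support_count(antecedent | consequent, rows), len(rows))
    return p_both / p_antecedent


print(confidence({"bread"}, {"milk"}, transactions))
print(conditional_probability({"bread"}, {"milk"}, transactions))
```

Both functions return the same fraction because the row count cancels out. The same two counts could also be obtained as grouped aggregates in a relational query.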
Despite the popularity of Formal Concept Analysis (FCA) as a mathematical framework for data analysis, some of its extensions are still considered arcane. Polyadic Concept Analysis (PCA) is one of the most promising yet understudied of these extensions. This formalism offers many interesting open questions but is hindered in its dissemination by complex notations and a lack of agreed-upon basic definitions. In this paper, we discuss in a mostly informal way the fundamental differences between FCA and PCA in the relation between contexts, conceptual structures, and rules. We identify open questions, present partial results on the maximal size of concept n-lattices and suggest new research directions.
Multicriteria decision making aims at helping a decision maker choose the best solutions among alternatives compared against multiple conflicting criteria. The reasons why an alternative is considered among the best are not always clearly explained. In this paper, we propose an approach that uses formal concept analysis and background knowledge on the criteria to explain the presence of alternatives on the Pareto front of a multicriteria decision problem.
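For readers unfamiliar with the Pareto front, the following minimal sketch computes it for a handful of alternatives. It assumes that every criterion is to be maximised, and the alternatives and scores are hypothetical. It shows only the dominance test, not the explanation approach based on formal concept analysis that the paper proposes.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` everywhere and strictly better once."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(alternatives):
    """Names of the alternatives that no other alternative dominates."""
    return {
        name
        for name, scores in alternatives.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in alternatives.items()
            if other != name
        )
    }


# Hypothetical alternatives scored on two criteria (higher is better).
alternatives = {
    "A": (3, 1),
    "B": (2, 2),
    "C": (1, 3),
    "D": (1, 1),
    "E": (2, 1),
}

print(sorted(pareto_front(alternatives)))
```

Here D is dominated by B and E is dominated by A, so only A, B and C remain on the front.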
Association rule mining has given rise to a rich literature, especially for classic binary bidimensional data. In particular, the representation of the set of rules without loss of information is well understood. This is not the case for multidimensional binary data. In this paper, we show that the knowledge of the closed $n$-sets of a multidimensional Boolean tensor is enough to allow for the derivation of the confidence of every multidimensional association rule. This generalises well-known results in the bidimensional case. We also provide experimental comparisons between the numbers of closed $n$-sets and frequent associations.
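The well-known bidimensional result can be sketched as follows: the support of any itemset equals the support of its closure, so a table of closed sets with their supports suffices to recover every rule confidence. The toy rows and all function names are assumptions made for illustration. The sketch covers only the classical two-dimensional case, not the multidimensional construction of the paper.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical binary dataset: each row is the set of items it contains.
rows = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]
items = frozenset().union(*rows)


def support(itemset):
    """Number of rows containing `itemset` (direct count, used as reference)."""
    return sum(1 for row in rows if itemset <= row)


def closure(itemset):
    """Intersection of all rows containing `itemset` (all items if none does)."""
    covering = [row for row in rows if itemset <= row]
    return frozenset.intersection(*covering) if covering else items


# Brute-force table of closed sets and their supports (fine for tiny examples).
closed = {}
for size in range(len(items) + 1):
    for combo in combinations(sorted(items), size):
        closed_set = closure(frozenset(combo))
        closed[closed_set] = support(closed_set)


def support_from_closed(itemset):
    """Recover a support from the table: best support among closed supersets."""
    return max(s for closed_set, s in closed.items() if itemset <= closed_set)


def confidence_from_closed(antecedent, consequent):
    """Confidence of antecedent -> consequent using the closed-set table only.

    Assumes the antecedent occurs in at least one row.
    """
    return Fraction(
        support_from_closed(antecedent | consequent),
        support_from_closed(antecedent),
    )


print(confidence_from_closed(frozenset("a"), frozenset("b")))
```

The lookup works because the closure of an itemset is a closed superset with the same support, and no closed superset can have a larger one.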
Concept lattices are well-known conceptual structures that organise interesting patterns — the concepts — extracted from data. In some applications, the size of the lattice can be a problem, as it is often too large to be efficiently computed and too complex to be browsed. In others, redundant information produces noise that makes understanding the data difficult. In classical FCA, those two problems can be attenuated by, respectively, computing a substructure of the lattice — such as the AOC-poset — and reducing the context. These solutions have not been studied in $d$-dimensional contexts for $d > 3$. In this paper, we generalise the notions of AOC-poset and reduction to $d$-lattices, the structures that are obtained from multidimensional data in the same way that concept lattices are obtained from binary relations.
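For the classical two-dimensional case, the AOC-poset can be sketched as the set of object concepts and attribute concepts of a formal context. The context below and the function names are hypothetical, chosen so that the AOC-poset is strictly smaller than the full concept lattice. The sketch does not cover the generalisation to $d$-lattices introduced in the paper.

```python
from itertools import combinations

# Hypothetical formal context: object -> set of attributes it has.
context = {
    "o1": {"a", "b"},
    "o2": {"b", "c"},
    "o3": {"a", "c"},
}
attributes = set().union(*context.values())


def intent(objects):
    """Attributes shared by all `objects` (all attributes if `objects` is empty)."""
    if not objects:
        return frozenset(attributes)
    return frozenset(set.intersection(*(context[o] for o in objects)))


def extent(attrs):
    """Objects that have every attribute in `attrs`."""
    return frozenset(o for o, owned in context.items() if set(attrs) <= owned)


def object_concept(obj):
    """Smallest concept whose extent contains `obj`."""
    shared = intent({obj})
    return (extent(shared), shared)


def attribute_concept(attr):
    """Largest concept whose intent contains `attr`."""
    owners = extent({attr})
    return (owners, intent(owners))


def aoc_elements():
    """Elements of the AOC-poset: all object concepts and attribute concepts."""
    introduced_by_objects = {object_concept(o) for o in context}
    introduced_by_attributes = {attribute_concept(a) for a in attributes}
    return introduced_by_objects | introduced_by_attributes


def all_concepts():
    """Every formal concept, by brute force over attribute subsets."""
    concepts = set()
    for size in range(len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            owners = extent(combo)
            concepts.add((owners, intent(owners)))
    return concepts


print(len(aoc_elements()), len(all_concepts()))
```

In this context the top and bottom concepts introduce no object or attribute, so they belong to the lattice but not to the AOC-poset.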
Formal Concept Analysis (FCA) and its associated conceptual structures are used to support exploratory search through conceptual navigation. Relational Concept Analysis (RCA) is an extension of Formal Concept Analysis to process relational datasets. RCA and its multiple interconnected structures are good candidates to support exploratory search in relational datasets, as they enable navigation within a structure as well as between the connected structures. However, building the entire structures is not an efficient way to explore a small localised area of the dataset or to retrieve the closest alternatives to a given query. In these cases, generating only a concept and its neighbour concepts at each navigation step appears to be a less costly alternative. In this paper, we propose an algorithm to compute a concept and its neighbourhood in connected concept lattices. The concepts are generated directly from the relational context family and possess both formal and relational attributes. The algorithm takes into account two RCA scaling operators and is implemented in the RCAExplore tool.
Keywords: Relational Concept Analysis, Formal Concept Analysis, Exploratory Search, On-demand Generation, Local Generation
When analyzing microbial communities, an active computational challenge concerns the categorization of 16S rRNA gene sequences into operational taxonomic units (OTUs). Established clustering tools use a one-pass algorithm to handle the high number of gene sequences and produce OTUs in reasonable time. However, all current tools are based on a crisp clustering approach, where a gene sequence is assigned to exactly one cluster. The weak quality of the output compared with more complex clustering algorithms forces the user to postprocess the obtained OTUs. Providing a membership degree when assigning a gene sequence to an OTU will help the user during the postprocessing task. Moreover, this membership degree can be used to automatically evaluate the quality of the obtained OTUs. The goal of this study is therefore to propose a new clustering approach that takes uncertainty into account when producing OTUs and improves both the quality and the presentation of the OTU results.
Keywords: algorithm, clustering, sequences.
We study the maximum number of closed sets in a 3-dimensional dataset of size n x n x n.
We show that it is between 3.36^n and 3.38^n.
In this paper, we study the maximum number of closed sets in a data cube of size n x n x n.
We show that it lies between 3.36^n and 3.38^n.
Association rule mining has given rise to a rich literature, especially for classic binary bidimensional data. In particular, the relation between closed sets and association rules is well understood. This is not the case for multidimensional data. In this paper, we show that the knowledge of the closed n-sets of a multidimensional Boolean tensor is enough to allow for the derivation of the confidence of every multidimensional association rule.