Category Archives: Articles

Condensed Representation of Multidimensional Association Rules (Représentation condensée de règles d’association multidimensionnelles)

Association rule mining is a problem that has given rise to a rich literature, especially for classic binary bidimensional data. In particular, the relation between closed sets and association rules is well understood. This is not the case for multidimensional data. In this paper, we show that knowing the closed n-sets of a multidimensional boolean tensor is enough to derive the confidence of every multidimensional association rule.
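
For intuition, the well-understood bidimensional case can be sketched as follows. This is a minimal illustration, not the paper's n-dimensional construction: the toy transactions and all function names are assumptions made for the example. It relies on the standard fact that the support of an itemset equals the support of its closure, so the confidence of any rule can be read off the closed itemsets alone.

```python
# Toy transactions (hypothetical data, not taken from the paper).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]


def closed_itemsets(transactions):
    """Map every closed itemset (an intersection of transactions) to its support."""
    closed = set()
    for t in map(frozenset, transactions):
        # Adding t and its intersections with the known closed sets keeps
        # the family closed under intersection.
        closed |= {t} | {t & c for c in closed}
    return {c: sum(1 for t in transactions if c <= t) for c in closed}


def support_from_closed(itemset, closed):
    """Support of an itemset = support of its closure, i.e. the largest
    support among its closed supersets (0 if it has none)."""
    return max((s for c, s in closed.items() if itemset <= c), default=0)


def confidence(premise, conclusion, closed):
    """Confidence of premise -> conclusion, using only the closed itemsets."""
    premise = frozenset(premise)
    body = support_from_closed(premise, closed)
    both = support_from_closed(premise | frozenset(conclusion), closed)
    return both / body if body else None


CLOSED = closed_itemsets(TRANSACTIONS)
# 3 of the 4 transactions containing "a" also contain "b".
print(confidence({"a"}, {"b"}, CLOSED))
```

Once `CLOSED` is built, the raw transactions are no longer consulted, which is the sense in which closed sets form a condensed representation.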

A De Novo Robust Clustering Approach for Amplicon-Based Sequence Data

When analyzing microbial communities, an active computational challenge is the categorization of 16S rRNA gene sequences into operational taxonomic units (OTUs). Established clustering tools use a one-pass algorithm to handle large numbers of gene sequences and produce OTUs in reasonable time. However, all current tools are based on a crisp clustering approach, in which each gene sequence is assigned to exactly one cluster. The weak quality of the output, compared to more complex clustering algorithms, forces the user to post-process the obtained OTUs. Providing a membership degree when assigning a gene sequence to an OTU would help the user during this post-processing task. Moreover, this membership degree can be used to automatically evaluate the quality of the obtained OTUs. The goal of this work is therefore to propose a new clustering approach that takes uncertainty into account when producing OTUs and improves both the quality and the presentation of the OTU results.

A Depth-first Search Algorithm for Computing Pseudo-closed Sets

The question of the lower bounds for the delay in the computation of the Duquenne-Guigues implication basis in non-lectic orders is still open. As a step towards an answer, we propose an algorithm that can enumerate pseudo-closed sets in orders that do not necessarily extend the inclusion order, using depth-first searches in a sequence of closure systems. Empirical comparisons with NextClosure on the runtime and the number of closed sets computed are provided.

Keywords: Implication, Pseudo-closed set
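
To make the notion of pseudo-closed set concrete, here is a brute-force sketch that applies the recursive definition directly: a set P is pseudo-closed if it is not closed and contains the closure of every pseudo-closed proper subset. This is neither the depth-first algorithm of the paper nor NextClosure; it enumerates subsets by increasing size (an order that does extend inclusion) and is exponential. The toy context and function names are assumptions for illustration.

```python
from itertools import combinations

# Toy context (hypothetical): each object is its set of attributes.
ATTRIBUTES = ("a", "b", "c")
OBJECTS = [{"a"}, {"a", "b"}, {"c"}]


def closure(attrs, objects, attributes):
    """Attributes shared by all objects that have every attribute in attrs."""
    rows = [frozenset(o) for o in objects if attrs <= o]
    if not rows:
        # No object matches: the closure is the whole attribute set.
        return frozenset(attributes)
    return frozenset.intersection(*rows)


def pseudo_closed_sets(objects, attributes):
    """Enumerate pseudo-closed sets by increasing size, straight from the definition."""
    pseudo = []
    for size in range(len(attributes) + 1):
        for combo in combinations(attributes, size):
            p = frozenset(combo)
            if closure(p, objects, attributes) == p:
                continue  # closed sets are never pseudo-closed
            # Every pseudo-closed proper subset was found at a smaller size.
            if all(closure(q, objects, attributes) <= p for q in pseudo if q < p):
                pseudo.append(p)
    return pseudo


for p in pseudo_closed_sets(OBJECTS, ATTRIBUTES):
    print(sorted(p), "->", sorted(closure(p, OBJECTS, ATTRIBUTES)))
```

Each printed line is one implication of the Duquenne-Guigues basis of the toy context, with a pseudo-closed premise and its closure as conclusion.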

Introducer Concepts in n-Dimensional Contexts

Concept lattices are well-known conceptual structures that organise interesting patterns—the concepts—extracted from data. In some applications, such as software engineering or data mining, the size of the lattice can be a problem, as it is often too large to be efficiently computed, and too complex to be browsed. For this reason, the Galois Sub-Hierarchy, a restriction of the concept lattice to introducer concepts, has been introduced as a smaller alternative. In this paper, we generalise the Galois Sub-Hierarchy to n-lattices, conceptual structures obtained from multidimensional data in the same way that concept lattices are obtained from binary relations.

Keywords: Formal Concept Analysis, Polyadic Concept Analysis, Introducer Concept, AOC-poset, Galois Sub-Hierarchy.
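
As a point of reference, the introducer concepts of an ordinary binary context (the 2-dimensional case that the paper generalises) can be sketched as below. The context and function names are hypothetical; the sketch only collects the object concepts and attribute concepts, without computing the order between them.

```python
# Toy binary context (hypothetical): object -> set of attributes.
CONTEXT = {"o1": {"a"}, "o2": {"a", "b"}, "o3": {"c"}}


def extent(attrs, context):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, row in context.items() if attrs <= row)


def intent(objs, context):
    """Attributes shared by every object in objs."""
    attributes = set().union(*context.values())
    return frozenset(a for a in attributes if all(a in context[o] for o in objs))


def introducer_concepts(context):
    """Concepts that introduce at least one object or one attribute."""
    concepts = set()
    for o in context:
        # Object concept: the smallest concept whose extent contains o.
        i = intent({o}, context)
        concepts.add((extent(i, context), i))
    for a in set().union(*context.values()):
        # Attribute concept: the largest concept whose intent contains a.
        e = extent({a}, context)
        concepts.add((e, intent(e, context)))
    return concepts


for ext, itt in sorted(introducer_concepts(CONTEXT), key=lambda c: sorted(c[1])):
    print(sorted(ext), sorted(itt))
```

On this toy context the full concept lattice has five concepts, while only three of them are introducers: the top and bottom concepts introduce nothing and are left out.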

Average Size of Implicational Bases

Implicational bases are objects of interest in formal concept analysis and its applications. Unfortunately, even the smallest base, the Duquenne-Guigues base, has an exponential size in the worst case. In this paper, we use results on the average number of minimal transversals in random hypergraphs to show that the base of proper premises is, on average, of quasi-polynomial size.

Keywords: Formal Concept Analysis, Implication Base, Average Case Analysis.

k-Partite Graphs as Contexts

In formal concept analysis, 2-dimensional formal contexts are bipartite graphs. In this work, we generalise the notions of context and concept to graphs that are not bipartite. We then study the complexity of the enumeration and identify the structure of the set of such concepts.
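
The bipartite starting point can be illustrated with a short sketch in which a formal context is given as the edge set of a bipartite graph and each concept is a maximal biclique. The edge list and function names are assumptions for illustration, and the enumeration is a brute force over attribute subsets; it does not cover the non-bipartite generalisation studied in the paper.

```python
from itertools import combinations

# Hypothetical bipartite graph: edges between objects and attributes.
EDGES = {("o1", "a"), ("o2", "a"), ("o2", "b"), ("o3", "c")}


def concepts(edges):
    """All concepts (maximal bicliques) of the context encoded by edges."""
    # Vertices are read off the edge set, so isolated vertices are ignored.
    objects = {o for o, _ in edges}
    attributes = {a for _, a in edges}
    found = set()
    for size in range(len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            # Objects adjacent to every chosen attribute...
            ext = frozenset(o for o in objects
                            if all((o, a) in edges for a in combo))
            # ...and all attributes adjacent to every one of those objects.
            itt = frozenset(a for a in attributes
                            if all((o, a) in edges for o in ext))
            found.add((ext, itt))
    return found


for ext, itt in sorted(concepts(EDGES), key=lambda c: (len(c[1]), sorted(c[1]))):
    print(sorted(ext), sorted(itt))
```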

On-demand Relational Concept Analysis

Formal Concept Analysis and its associated conceptual structures have been used to support exploratory search through conceptual navigation. Relational Concept Analysis (RCA) is an extension of Formal Concept Analysis to process relational datasets. RCA and its multiple interconnected structures are good candidates to support exploratory search in relational datasets, as they enable navigation within a structure as well as between the connected structures. However, building the entire structures is not an efficient way to explore a small localised area of the dataset, for instance to retrieve the closest alternatives to a given query. In these cases, generating only a concept and its neighbour concepts at each navigation step is a less costly alternative. In this paper, we propose an algorithm to compute a concept and its neighbourhood in extended concept lattices. The concepts are generated directly from the relational context family, and possess both formal and relational attributes. The algorithm takes into account two RCA scaling operators. We illustrate it on an example.

Keywords: Relational Concept Analysis, Formal Concept Analysis, On-demand Generation
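
The idea of generating only a concept and its neighbours can be sketched for a plain formal context as follows. This is a simplified illustration with hypothetical names and data: it handles formal attributes only and ignores the relational attributes and scaling operators that the paper's algorithm takes into account. The lower neighbours of a concept are obtained by adding one attribute at a time to its intent, closing the result, and keeping the inclusion-minimal intents.

```python
# Toy context (hypothetical): object -> set of attributes.
CONTEXT = {"o1": {"a"}, "o2": {"a", "b"}, "o3": {"c"}}


def closure(attrs, context):
    """Intent of the concept generated by attrs."""
    rows = [frozenset(r) for r in context.values() if attrs <= r]
    if not rows:
        # No object matches: the closure is the whole attribute set.
        return frozenset().union(*context.values())
    return frozenset.intersection(*rows)


def lower_neighbours(intent_set, context):
    """Intents of the concepts immediately below the concept with this intent."""
    attributes = frozenset().union(*context.values())
    candidates = {closure(intent_set | {m}, context)
                  for m in attributes - intent_set}
    # Keep only the inclusion-minimal candidate intents.
    return {c for c in candidates if not any(d < c for d in candidates)}


# One navigation step: start from the concept of a query, list its neighbours.
query = closure(frozenset(), context=CONTEXT)
for neighbour in sorted(lower_neighbours(query, CONTEXT), key=sorted):
    print(sorted(neighbour))
```

Each step costs one closure per missing attribute, so a navigation session only pays for the concepts it actually visits.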

Invalidating a Conjecture on the Average Number of Closed Sets in a Random Database

In this paper, we invalidate a conjecture from [3], which stated that, given a random database with n columns and m lines, the average number of closed sets of size x with support y is maximized when x = log n and y = log m. We prove a refinement of this conjecture and obtain that x = log m − log log log m + O(1) and y = log n − log log log n + O(1). From there, we obtain an estimate of the average number of closed sets in a random database.

Keywords: Average analysis, closed sets, data mining.
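
The quantity under study can be explored numerically with the sketch below. It assumes a specific random model that the abstract does not state: every cell of the m-by-n database is 1 independently with probability p. Under that assumption, a given set of x columns and y lines forms a closed set with exactly that support when all x·y cells are 1, no other line is all ones on the chosen columns, and no other column is all ones on the chosen lines. The function names are hypothetical.

```python
from math import comb


def expected_closed_sets(n, m, x, y, p=0.5):
    """Expected number of closed sets of size x with support y in a random
    database with n columns and m lines (ASSUMED i.i.d. Bernoulli(p) cells)."""
    return (comb(n, x) * comb(m, y)
            * p ** (x * y)                # the chosen x-by-y block is all ones
            * (1 - p ** x) ** (m - y)     # no other line covers the x columns
            * (1 - p ** y) ** (n - x))    # no other column covers the y lines


def expected_total(n, m, p=0.5):
    """Expected total number of closed sets, summed over all sizes and supports."""
    return sum(expected_closed_sets(n, m, x, y, p)
               for x in range(n + 1) for y in range(m + 1))


def most_frequent_shape(n, m, p=0.5):
    """Pair (x, y) that maximizes the expected count, found by exhaustive search."""
    shapes = [(x, y) for x in range(n + 1) for y in range(m + 1)]
    return max(shapes, key=lambda xy: expected_closed_sets(n, m, xy[0], xy[1], p))


print(expected_total(20, 50))
print(most_frequent_shape(20, 50))
```

For small n and m, `expected_total` can be checked against an exhaustive enumeration of all databases; for larger values, `most_frequent_shape` gives an empirical view of where the maximum sits under the assumed model.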