Archives de catégorie : Articles

Condensed Representations of Association Rules in n-ary Relations

Association rules mining is a problem that has given rise to a rich literature, especially in classic binary bidimensional data. In particular, the representation of the set of rules without loss of information is well understood. This is not the case in multidimensional binary data. In this paper, we show that the knowledge of the closed $n$-sets of a multidimensional Boolean tensor is enough to allow for the derivation of the confidence of every multidimensional association rule. This generalises well-known results in the bidimensional case. We also provide experimental comparisons between the numbers of closed $n$-sets and frequent associations.

Reduction and Introducers in d-contexts

Concept lattices are well-known conceptual structures that organise interesting patterns — the concepts — extracted from data. In some applications, the size of the lattice can be a problem, as it is often too large to be efficiently computed and too complex to be browsed. In others, redundant information produces noise that makes understanding the data difficult. In classical FCA, those two problems can be attenuated by, respectively, computing a substructure of the lattice — such as the AOC-poset — and reducing the context. These solutions have not been studied in $d$-dimensional contexts for $d > 3$. In this paper, we generalise the notions of AOC-poset and reduction to $d$-lattices, the structures that are obtained from multidimensional data in the same way that concept lattices are obtained from binary relations.

On-Demand Relational Concept Analysis

Formal Concept Analysis (FCA) and its associated conceptual structures is used to support exploratory search through conceptual navigation. Relational Concept Analysis (RCA) is an extension of Formal Concept Analysis to process relational datasets. RCA and its multiple interconnected structures represent good candidates to support exploratory search in relational datasets, as they are enabling navigation within a structure as well as between the connected structures. However, building the entire structures does not present an efficient solution to explore a small localised area of the dataset, to retrieve the closest alternatives to a given query. In these cases, generating only a concept and its neighbour concepts at each navigation step appears as a less costly alternative. In this paper, we propose an algorithm to compute a concept, and its neighbourhood in connected concept lattices. The concepts are generated directly from the relational context family, and possess both formal and relational attributes. The algorithm takes into account two RCA scaling operators and it is implemented in the RCAExplore tool.

Keywords: Relational Concept Analysis, Formal Concept Analysis, Exploratory Search, On-demand Generation, Local Generation

A De Novo Robust Clustering Approach for Amplicon-Based Sequence Data

When analyzing microbial communities, an active and computational challenge concerns the
categorization of 16S rRNA gene sequences into operational taxonomic units (OTUs).
Established clustering tools use a one pass algorithm to tackle high number of gene se-
quences and produce OTUs in reasonable time. However, all of the current tools are based
on a crisp clustering approach, where a gene sequence is assigned to one cluster. The weak
quality of the output compared with more complex clustering algorithms forces the user to
postprocess the obtained OTUs. Providing a membership degree when assigning a gene
sequence to an OTU will help the user during the postprocessing task. Moreover it is
possible to use this membership degree to automatically evaluate the quality of the obtained
OTUs. So the goal of this study is to propose a new clustering approach that takes into
account uncertainty when producing OTUs, and improves both the quality and the pre-
sentation of the OTU results.

Keywords: algorithm, clustering, sequences.

Du nombre maximum d’ensembles fermés en 3 dimensions

We study the maximum number of closed sets in a 3-dimensional dataset of size n x n x n.
We show that it is between 3.36^n and 3.38^n.

Du nombre maximum d’ensembles fermés en 3 dimensions

Dans ce papier, nous étudions le nombre maximum d’ensembles fermés dans un cube de données de taille n x n x n.
Nous montrons qu’il se situe entre 3.36^n et 3.38^n.

Représentation condensée de règles d’association multidimensionnelles

Association rules mining is a problem that gave rise to a rich literature, especially in classic binary bidimensional data. In particular, the relation between closed sets and association rules is well understood. This is not the case in multidimensional data. In this paper, we show that the knowledge of the closed n-sets of a multidimensional boolean tensor is enough to allow for the derivation of the confidence of every multidimensional association rule.

Représentation condensée de règles d’association multidimensionnelles

La fouille de règles d’association est un problème qui a donné lieu à une littérature foisonnante, notamment dans les données binaires bidimensionnelles classiques. En particulier, la relation entre les ensembles fermés et les règles d’association est bien connue. Tel n’est pas le cas dans les données multidimensionnelles. Dans ce papier, nous montrons que la connaissance des n-ensembles fermés d’un tenseur booléen multidimensionnel est suffisante pour inférer la confiance de toutes les règles d’association multidimensionnelles.

Bounding the Number of Minimal Transversals in Tripartite 3-Uniform Hypergraphs

We bound the number of minimal hypergraph transversals that arise in tri-partite 3-uniform hypergraphs, a class commonly found in applications dealing with data. Let H be such a hypergraph on a set of vertices V. We give a lower bound of 1.4977 |V | and an upper bound of 1.5012 |V | .

A De Novo Robust Clustering Approach for Amplicon-Based Sequence Data

When analyzing microbial communities, an active and computational challenge concerns the categorization of 16S rRNA gene sequences into operational taxonomic units (OTUs). Established clustering tools use a one pass algorithm in order to tackle high numbers of gene sequences and produce OTUs in reasonable time. However, all of the current tools are based on a crisp clustering approach, where a gene sequence is assigned to one cluster. The weak quality of the output compared to more complex clustering algorithms, forces the user to post-process the obtained OTUs. Providing a membership degree when assigning a gene sequence to an OTU, will help the user during the post-processing task. Moreover it is possible to use this membership degree to automatically evaluate the quality of the obtained OTUs. So the goal of this work is to propose a new clustering approach that takes into account uncertainty when producing OTUs, and improves both the quality and the presentation of the OTUs results.