When analyzing microbial communities, an active and computational challenge concerns the categorization of 16S rRNA gene sequences into operational taxonomic units (OTUs). Established clustering tools use a one pass algorithm in order to tackle high numbers of gene sequences and produce OTUs in reasonable time. However, all of the current tools are based on a crisp clustering approach, where a gene sequence is assigned to one cluster. The weak quality of the output compared to more complex clustering algorithms, forces the user to post-process the obtained OTUs. Providing a membership degree when assigning a gene sequence to an OTU, will help the user during the post-processing task. Moreover it is possible to use this membership degree to automatically evaluate the quality of the obtained OTUs. So the goal of this work is to propose a new clustering approach that takes into account uncertainty when producing OTUs, and improves both the quality and the presentation of the OTUs results.

# Archives de catégorie : Articles

# A Depth-first Search Algorithm for Computing Pseudo-closed Sets

The question of the lower bounds for the delay in the computation of the Duquenne-Guigues implication basis in non-lectic orders is still open. As a step towards an answer, we propose an algorithm that can enumerate pseudo-closed sets in orders that do not necessarily extend the inclusion order using depthfirst searches in a sequence of closure systems. Empirical comparisons with NextClosure on the runtime and number of closed sets computed are provided.

Keywords: Implication, Pseudo-closed set

# Introducer Concepts in n-Dimensional Contexts

Concept lattices are well-known conceptual structures that organise interesting patterns—the concepts—extracted from data. In some applications, such as software engineering or data mining, the size of the lattice can be a problem, as it is often too large to be efficiently computed, and too complex to be browsed. For this reason, the Galois Sub-Hierarchy, a restriction of the concept lattice to introducer concepts, has been introduced as a smaller alternative. In this paper, we generalise the Galois Sub-Hierarchy to n-lattices, conceptual structures obtained from multidimensional data in the same way that concept lattices are obtained from binary relations.

Keywords: Formal Concept Analysis, Polyadic Concept Analysis, Introducer Concept, AOC-poset, Galois Sub-Hierarchy.

# Average Size of Implicational Bases

Implicational bases are objects of interest in formal concept analysis and its applications. Unfortunately, even the smallest base, the Duquenne-Guigues base, has an exponential size in the worst case. In this paper, we use results on the average number of minimal transversals in random hypergraphs to show that the base of proper premises is, on average, of quasi-polynomial size.

Keywords: Formal Concept Analysis, Implication Base, Average Case Analysis.

# k-Partite Graphs as Contexts

In formal concept analysis, 2-dimensional formal contexts are bipartite graphs. In this work, we generalise the notions of context and concept to graphs that are not bipartite. We then study the complexity of the enumeration and identify the structure of the set of such concepts.

# On-demand Relational Concept Analysis

Formal Concept Analysis and its associated conceptual structures have been used to support exploratory search through conceptual navigation. Relational Concept Analysis (RCA) is an extension of Formal Concept Analysis to process relational datasets. RCA and its multiple interconnected structures represent good candidates to support exploratory search in relational datasets, as they are enabling navigation within a structure as well as between the connected structures. However, building the entire structures does not present an efficient solution to explore a small localised area of the dataset, for instance to retrieve the closest alternatives to a given query. In these cases, generating only a concept and its neighbour concepts at each navigation step appears as a less costly alternative. In this paper, we propose an algorithm to compute a concept and its neighbourhood in extended concept lattices. The concepts are generated directly from the relational context family, and possess both formal and relational attributes. The algorithm takes into account two RCA scaling operators. We illustrate it on an example.

Keywords: Relational Concept Analysis, Formal Concept Analysis, Ondemand Generation

# Invalidating a Conjecture on the Average Number of Closed Sets in a Random Database

In this paper, we invalidate a conjecture from [3] which stated that given a random database with n columns and m lines, the average number of closed sets of size x with a support y is maximized when x = log n and y = log m. We prove a refinement of this conjecture and obtain that x = log m − log log log m + O(1) and y = log n − log log log n + O(1). From there we obtain a estimation of the average number of closed sets in a random database.

Keywords: Average analysis, closed sets, data mining.

# On Implication Bases in n-Lattices

Implication bases in n-lattices are not formally defined. We clarify the different types of implications we need to reconstruct a concept n-lattice and show they can be derived from the same set of implications. We use this to identify a particular type of implication base in n-contexts. Finally, we provide an algorithm for computing implicational closures with n-dimensional bases.

# On-Demand Generation of AOC-Posets: Reducing the Complexity of Conceptual Navigation

Exploratory search allows to progressively discover a dataspace by browsing through a structured collection of documents. Concept lattices are graph structures which support exploratory search by conceptual navigation, i.e., navigating from concept to concept by selecting and deselecting descriptors. These methods are known to be limited by the size of concept lattices which can be too large to be efficiently computed or too complex to be browsed intelligibly. In this paper, we address the problem of providing techniques that reduce the complexity of FCA-based exploratory search. We show the suitability of AOC-posets, a condensed alternative structure to achieve conceptual navigation. Also, we outline algorithms to enable an on-demand generation of AOC-posets. The necessity to devise more flexible methods to perform product selection in software product line engineering is what motivates our work.

# An Incremental Algorithm for Computing n-Dimensional Concepts

The problem of computing maximal rectangles (concepts) from a binary relation (context) has received much attention. The same cannot be said of the general, n-dimensional case. In this work, we propose the first outputpolynomial incremental algorithm for computing the n-concepts of an n-context.