Formal concept analysis revolves around the notion of formal concept. Many tools have been developed to compute these concepts, exploiting the fact that extents and intents are closed under the closure operators induced by the context. However, concepts can also be computed by dualization over the complement of the context's incidence relation, a fact that is rarely discussed in the community.
In this paper, we compare the runtimes of tools developed in the FCA community with those of dualization-based tools on both real and artificial datasets. The experiments show that dualization-based tools are competitive and more resistant to increases in context density.
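As a minimal illustration of the closure-based view these tools exploit, the following Python sketch (over an invented toy context) enumerates the formal concepts by closing attribute subsets under the two derivation operators; the dualization route over the complement of the incidence relation is not shown, and this is not one of the benchmarked tools.

```python
# Minimal illustration of the closure-based view of formal concepts.
# The context, objects, and attributes below are toy assumptions,
# not taken from the benchmarked datasets.
from itertools import combinations

objects = {"g1", "g2", "g3"}
attributes = {"a", "b", "c"}
incidence = {("g1", "a"), ("g1", "b"), ("g2", "b"), ("g2", "c"), ("g3", "c")}

def intent(objs):
    """Attributes shared by all objects in objs (the ' operator)."""
    return {m for m in attributes if all((g, m) in incidence for g in objs)}

def extent(attrs):
    """Objects having all attributes in attrs (the ' operator)."""
    return {g for g in objects if all((g, m) in incidence for m in attrs)}

# A set of attributes B is an intent iff it is closed: B = intent(extent(B)).
concepts = set()
for r in range(len(attributes) + 1):
    for attrs in combinations(sorted(attributes), r):
        B = intent(extent(set(attrs)))  # closure of the candidate
        concepts.add((frozenset(extent(B)), frozenset(B)))

for ext, inte in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[1]))):
    print(set(ext) or "{}", set(inte) or "{}")
```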
A Proposal for Building a Compact and Tunable Representation of a Concept Lattice Based on Clustering
A concept lattice provides a model of a dataset that an analyst can navigate and explore interactively, except when the concept lattice is too large. This problem can be overcome by building a representation of the whole concept lattice that keeps a reasonable size and can be interpreted by the analyst. Relying on previous work on link key discovery, we revisit in this paper an approach based on Formal Concept Analysis (FCA) and Agglomerative Hierarchical Clustering (AHC) applied to a set of concepts to build a representative set of clusters. Accordingly, we propose an AHC algorithm that (a) efficiently computes this representative set and (b) respects the ordinal structure of the original concept lattice. A set of experiments performed on real datasets shows the effectiveness of our approach.
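For illustration, the sketch below runs a generic single-linkage AHC over concept intents with the Jaccard distance; it is not the paper's ordinal-structure-preserving algorithm, and the concepts are invented.

```python
# Illustrative single-linkage AHC over formal-concept intents, using the
# Jaccard distance. This is a generic AHC sketch, not the paper's
# ordinal-structure-preserving algorithm; the concepts are toy assumptions.
def jaccard(a, b):
    """Jaccard distance between two intents (attribute sets)."""
    return 1 - len(a & b) / len(a | b) if a | b else 0.0

# Each concept is represented here by its intent only.
intents = [frozenset("ab"), frozenset("abc"), frozenset("c"), frozenset("cd")]
clusters = [{i} for i in intents]

while len(clusters) > 2:  # stop at a target number of clusters
    # Find the closest pair of clusters under single linkage.
    i, j = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda p: min(jaccard(x, y)
                          for x in clusters[p[0]] for y in clusters[p[1]]),
    )
    clusters[i] |= clusters[j]
    del clusters[j]

print(clusters)
```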
Variability-Driven User-Story Generation using LLM and Triadic Concept Analysis
A widely used Agile practice for requirements is to produce a set of user stories (also called an "agile product backlog"), which is roughly a list of pairs (role, feature), where the role handles the feature for a certain purpose. In the context of Software Product Lines, the requirements for a family of similar systems are thus a family of user-story sets, one per system, leading to a 3-dimensional dataset composed of sets of triples (system, role, feature). In this paper, we combine Triadic Concept Analysis (TCA) and Large Language Model (LLM) prompting to suggest the user-story set required to develop a new system, relying on the variability logic of an existing system family. This process consists in 1) computing the 3-dimensional variability expressed as a set of TCA implications, 2) providing the designer with intelligible design options, 3) capturing the designer's selection of options, 4) proposing a first user-story set corresponding to this selection, 5) consolidating its validity according to the implications identified in step 1, completing it if necessary (a closure step sketched below), and 6) leveraging LLM commonsense knowledge to obtain a more comprehensive user-story set for the website. This process is evaluated on a dataset comprising the user-story sets of 67 similar-purpose websites.
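Step 5 amounts to closing the selected user-story set under the implications. The sketch below illustrates this closure on invented implications and stories; in the paper, the implications themselves come from TCA.

```python
# Sketch of step 5: closing a user-story set under a set of implications
# (premise -> conclusion over (role, feature) pairs). The implications and
# stories below are invented for illustration; the paper derives them via TCA.
def close_under(stories, implications):
    """Add the conclusions of every implication whose premise holds."""
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            if premise <= stories and not conclusion <= stories:
                stories |= conclusion
                changed = True
    return stories

implications = [
    (frozenset({("customer", "browse catalog")}),
     frozenset({("customer", "search products")})),
    (frozenset({("customer", "checkout")}),
     frozenset({("customer", "manage cart"), ("admin", "process orders")})),
]
selection = {("customer", "browse catalog"), ("customer", "checkout")}
print(close_under(set(selection), implications))
```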
Discovering a Representative Set of Link Keys in RDF Datasets
A link key is based on a set of property pairs and can be used to identify pairs of individuals representing the same real-world entity in two different RDF datasets. Various algorithms aim at discovering link keys, but they usually output a large number of candidates, making link key selection and validation a challenging task. In this paper, we propose an approach that combines Formal Concept Analysis (FCA), for discovering link key candidates and building a link key lattice, with hierarchical clustering over a given set of candidates, for building a representative set of link keys.
Such a link key set should minimize the number of candidates to be validated while preserving a maximal number of links between individuals. The paper also reports a series of experiments performed on different RDF datasets, showing the effectiveness of the approach and the ability of hierarchical clustering to return a concise and meaningful set of candidates while preserving the ordinal structure of the link key lattice.
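As a toy illustration of what a single link key candidate does, the sketch below applies a candidate set of property pairs to two invented datasets and returns the linked individual pairs; the discovery and clustering steps themselves are not reproduced.

```python
# Toy illustration of applying one link key candidate: a set of property
# pairs (p1 from dataset 1, p2 from dataset 2). Two individuals are linked
# when their values coincide on every pair. Data and properties are invented.
d1 = {"x1": {"name": "Ada", "born": "1815"}, "x2": {"name": "Alan", "born": "1912"}}
d2 = {"y1": {"label": "Ada", "year": "1815"}, "y2": {"label": "Kurt", "year": "1906"}}

candidate = {("name", "label"), ("born", "year")}

links = {
    (x, y)
    for x, px in d1.items()
    for y, py in d2.items()
    if all(px.get(p1) == py.get(p2) and px.get(p1) is not None
           for p1, p2 in candidate)
}
print(links)  # {('x1', 'y1')}
```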
Feature Independence from the Point of View of Formal Concept Analysis
Measuring the dependence of two features or variables in a dataset is a problem that finds applications in most sciences. It is generally based either on the probability-theoretic definition of independence or on evaluating how much one feature is a particular function of the other. In this paper, we introduce a definition of independence in formal concept analysis, a lattice-theoretic framework, and we investigate whether it can be leveraged to measure the independence of numerical features. We exploit the connections between binary relations and the algebraic and logical structures at the heart of formal concept analysis to propose three measures, and we evaluate their potential using synthetic feature selection problems.
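The three measures themselves are not spelled out in this abstract, so the sketch below only conveys the flavor with an invented combinatorial proxy: after nominally scaling two numerical features, it measures how much of the cross table of value pairs is actually realized, full coverage being what one would expect of unconstrained, independent features.

```python
# A heavily hedged illustration: nominally scale two numerical features and
# measure how much of the cross table of value pairs is realized. This
# combinatorial proxy is NOT one of the paper's three measures, which are
# not reproduced here; it only conveys the lattice/relational flavor.
def scale(values, n_bins=3):
    """Nominal scaling: map each value to a bin index."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def cross_coverage(xs, ys):
    """Fraction of possible (bin_x, bin_y) pairs observed in the data."""
    bx, by = scale(xs), scale(ys)
    observed = set(zip(bx, by))
    return len(observed) / (len(set(bx)) * len(set(by)))

xs = [0.1, 0.5, 0.9, 0.2, 0.7, 0.4]
ys = [1.0, 2.0, 3.0, 3.0, 1.0, 2.0]   # unrelated to xs
print(cross_coverage(xs, ys))  # ~0.56: many value pairs are realized
print(cross_coverage(xs, xs))  # ~0.33: a functional tie restricts the pairs
```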
FCAvizIR: Exploring Relational Data Set's Implications using Metrics and Topics
Implication is a core notion of Formal Concept Analysis and its extensions. It provides information about the regularities present in the data. When one considers a relational data set of realistic size, implications are numerous and their formulation, which combines primitive and relational attributes computed within the Relational Concept Analysis framework, is complex. For an expert wishing to answer a question based on such a corpus of implications, a smart exploration strategy is crucial. In this paper, we propose a visual approach, implemented in a web platform named FCAvizIR, for leveraging such a corpus. Composed of three interactive and coordinated views and a toolbox, FCAvizIR has been designed to explore corpora of implication rules following Shneiderman's famous mantra "overview first, zoom and filter, then details on demand". It enables metrics-based filtering, e.g. fixing minimum and maximum support values, and the multiple selection of relations and attributes in the premise and in the conclusion, to identify the corresponding subset of implications, presented as a list and as Euler diagrams. An example of exploration is presented using an excerpt of Knomana to analyze plant-based extracts for controlling pests.
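As a hint of what the metrics filtering looks like computationally, the sketch below keeps the implications of an invented corpus whose support falls in a range and whose premise or conclusion mentions selected attributes; FCAvizIR's actual data model and attribute names are not reproduced.

```python
# Sketch of the support-range and attribute-selection filtering described
# above. The implication corpus and attribute names are invented.
implications = [
    {"premise": {"usedFor:pest"}, "conclusion": {"partUsed:leaf"}, "support": 42},
    {"premise": {"family:Meliaceae"}, "conclusion": {"usedFor:pest"}, "support": 7},
]

def filter_implications(corpus, min_sup, max_sup, selected):
    """Keep implications in the support range that mention a selected attribute."""
    return [
        imp for imp in corpus
        if min_sup <= imp["support"] <= max_sup
        and (imp["premise"] | imp["conclusion"]) & selected
    ]

print(filter_implications(implications, 10, 100, {"usedFor:pest"}))
```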
Exploring the 3-Dimensional Variability of Websites' User-Stories using Triadic Concept Analysis
Configurable software systems and families of similar software systems are increasingly being considered by industry to provide software tailored to each customer's needs. Their development requires managing software variability, i.e. commonalities, differences and constraints. A primary step is properly analyzing the variability of the software, which can be done at various levels, from specification to deployment. In this paper, we focus on the software variability expressed through user stories, viz. short formatted sentences indicating which user role can perform which action at the specification level. At this level, variability is usually analyzed in a two-dimensional view, i.e. software described by features, with roles considered separately. The novelty of this work is to model the three dimensions of the variability (i.e. software, roles, features) and to explore them using Triadic Concept Analysis (TCA), an extension of Formal Concept Analysis. The variability exploration is based on the extraction of 3-dimensional implication rules. The adopted methodology is applied to a case study made of 65 commercial websites in four domains, i.e. manga, martial arts sports equipment, board games including trading cards, and video games. This work highlights the diversity of information provided by such a methodology for drawing directions for the development of a new product or for building software variability models.
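The sketch below shows, on invented triples, the 3-dimensional data model and its standard flattening into a dyadic context systems × (role, feature), a common starting point for dyadic analyses of a triadic context; the paper's 3-dimensional implication extraction itself is not reproduced.

```python
# Toy triadic context as a set of (system, role, feature) triples, flattened
# into the standard dyadic projection systems x (role, feature). The triples
# are invented; the paper's case study uses 65 commercial websites.
triples = {
    ("siteA", "visitor", "browse catalog"),
    ("siteA", "admin", "manage stock"),
    ("siteB", "visitor", "browse catalog"),
    ("siteB", "visitor", "post review"),
}

# Flattening: each (role, feature) pair becomes one dyadic attribute.
flattened = {}
for system, role, feature in triples:
    flattened.setdefault(system, set()).add((role, feature))

for system, attrs in sorted(flattened.items()):
    print(system, sorted(attrs))
```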
A Collaborative Machine Learning System for the Practice of Debate
The AREN-DIA project (ARgumentation Et Numérique – Didactique & Intelligence Artificielle) aims to raise awareness of the practice of debate in education and within civil society. The project takes concrete form through the creation and field-testing of a debate platform. The platform makes it possible to hold structured debates starting from a text, renewing the traditional approach to argumentative exchanges. It integrates a collaborative Natural Language Processing technology to increase the efficiency of the debate process. Our article focuses on the challenges of the project's AI axis, namely: how can we design a reinforcement mechanism that encourages users to take part in improving the AI system producing a structured representation of the statements made in a debate? To this end, an automatic procedure complements the debate by suggesting key terms that summarize the statements made. This indexing is the starting point for the machine's analysis and support of the debate. It is subject to interaction with the users, who are invited to validate, invalidate, or complete these key terms. To resolve semantic ambiguity, we rely on a term-enrichment step that prepares the terms for the knowledge-extraction operation based on formal concept analysis (FCA). This knowledge, in the form of implications, is used to update the relations in the underlying knowledge base.
Distances Between Formal Concept Analysis Structures
In this paper, we study the notion of distance between the most important structures of formal concept analysis: formal contexts, concept lattices, and implication bases. We first define three families of Minkowski-like distances between these three structures. We then present experiments showing that the correlations between these measures are low and depend on the distance between the formal contexts.
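As one plausible instance of such a family (an assumption, since the abstract does not give the exact definition), the sketch below computes an L_p distance between the incidence matrices of two contexts sharing the same objects and attributes.

```python
# One plausible instance of a Minkowski-like distance between two formal
# contexts over the same objects and attributes: the L_p norm of the
# difference of their incidence matrices. This is an assumption about the
# paper's family of distances, not its exact definition.
def minkowski_context_distance(objs, attrs, I1, I2, p=2):
    total = sum(
        abs(((g, m) in I1) - ((g, m) in I2)) ** p
        for g in objs for m in attrs
    )
    return total ** (1 / p)

objs, attrs = {"g1", "g2"}, {"a", "b"}
I1 = {("g1", "a"), ("g2", "b")}
I2 = {("g1", "a"), ("g1", "b")}
print(minkowski_context_distance(objs, attrs, I1, I2))  # sqrt(2)
```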
Knowledge Extraction Based on Formal Concept Analysis for Assisting Online Debates
We present an automated debate-support process aiming to extract associations between terms from the key-term lists of the arguments, lists co-constructed by the users and our indexing system. The indexing encourages users to complete or correct the list of key terms, acting as an incentive for the elaboration of more structured points of view. The algorithm is based on formal concept analysis and relies on the JeuxDeMots (JDM) knowledge base. The procedure involves several modules leading to a knowledge-extraction step producing implications intended to be integrated into JDM. This cooperative approach allows the knowledge base to grow as debates are analyzed, improving the key terms suggested by the platform.
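As an illustration of the extraction step, the sketch below treats arguments indexed by key terms as a formal context and lists the single-premise term implications that hold in it; the terms are invented, and the JeuxDeMots enrichment step is not reproduced.

```python
# Illustration of the extraction step: arguments indexed by key terms form a
# formal context, and we list the single-premise term implications that hold
# ("every argument tagged t1 is also tagged t2"). The terms are invented;
# the actual pipeline enriches terms via JeuxDeMots before this step.
arguments = {
    "arg1": {"nuclear", "energy", "risk"},
    "arg2": {"nuclear", "energy"},
    "arg3": {"solar", "energy"},
}
terms = set().union(*arguments.values())

implications = [
    (t1, t2)
    for t1 in terms for t2 in terms if t1 != t2
    if all(t2 in ts for ts in arguments.values() if t1 in ts)
]
print(implications)  # e.g. ('nuclear', 'energy'), ('solar', 'energy'), ...
```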
