Author archives: Alexandre Bazin

Discovering a Representative Set of Link Keys in RDF Datasets

A link key is based on a set of property pairs and can be used to identify pairs of individuals representing the same real-world entity in two different RDF datasets. Various algorithms aim at discovering link keys, but they usually output a large number of candidates, which makes link key selection and validation a challenging task. In this paper, we propose an approach that combines Formal Concept Analysis (FCA), for discovering link key candidates and building a link key lattice, with hierarchical clustering over a given set of candidates, for building a representative set of link keys.
Such a link key set should minimize the number of candidates to be validated while preserving as many links between individuals as possible. The paper also reports a series of experiments performed over different RDF datasets, which show the effectiveness of the approach and the ability of hierarchical clustering to return a concise and meaningful set of candidates while preserving the ordinal structure of the link key lattice.
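To make the FCA step concrete, here is a minimal sketch of how link key candidates can be read off a formal context whose objects are pairs of individuals $(x_1, x_2)$ and whose attributes are property pairs $(p, q)$. It assumes a simplified condition (a pair has $(p, q)$ when $x_1$ and $x_2$ share at least one value for $p$ and $q$) and toy datasets `D1`, `D2`; the names and data are hypothetical, and the hierarchical clustering step that selects a representative subset is not shown.

```python
from itertools import product

# Two toy RDF datasets (hypothetical data): individual -> {property: set of values}
D1 = {
    "a1": {"name": {"Alice"}, "born": {"1990"}},
    "a2": {"name": {"Bob"}, "born": {"1985"}},
}
D2 = {
    "b1": {"label": {"Alice"}, "birth": {"1990"}},
    "b2": {"label": {"Bob"}, "birth": {"1990"}},
}


def shared_property_pairs(x1, x2):
    """Row of the formal context: property pairs (p, q) on which x1 and x2
    share at least one value (a simplified 'in'-style condition)."""
    return frozenset(
        (p, q)
        for p, vals1 in D1[x1].items()
        for q, vals2 in D2[x2].items()
        if vals1 & vals2
    )


def link_key_candidates():
    """Non-empty intents with non-empty extent. These are exactly the
    intersections of non-empty sets of object rows, so we close the set of
    rows under pairwise intersection until a fixpoint is reached."""
    intents = {shared_property_pairs(x1, x2) for x1, x2 in product(D1, D2)}
    intents.discard(frozenset())
    changed = True
    while changed:
        changed = False
        for a, b in product(list(intents), repeat=2):
            c = a & b
            if c and c not in intents:
                intents.add(c)
                changed = True
    return intents


for candidate in sorted(link_key_candidates(), key=sorted):
    print(sorted(candidate))
```

On this toy data the candidate made of `("born", "birth")` alone would link `a1` to both `b1` and `b2`, which illustrates why candidates need to be selected and validated.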

Feature Independence from the Point of View of Formal Concept Analysis

Measuring the dependence between two features (variables) in a dataset is a problem with applications in most sciences. It is generally based either on the probability-theoretic definition of independence or on evaluating how far one feature is a particular function of the other. In this paper, we introduce a definition of independence in formal concept analysis, a lattice-theoretic framework, and we investigate whether it can be leveraged to measure the independence of numerical features. We exploit the connections between binary relations and the algebraic and logical structures at the heart of formal concept analysis to propose three measures, and we evaluate their potential on synthetic feature selection problems.

FCAvizIR: Exploring Relational Data Set's Implications using Metrics and Topics

Implication is a core notion of Formal Concept Analysis and its extensions. It provides information about the regularities present in the data. When one considers a real-size relational data set, implications are numerous, and their formulation, which combines primitive and relational attributes computed using the Relational Concept Analysis framework, is complex. For an expert wishing to answer a question based on such a corpus of implications, having a smart exploration strategy is crucial. In this paper, we propose a visual approach, implemented in a web platform named FCAvizIR, for leveraging such a corpus. Composed of three interactive and coordinated views and a toolbox, FCAvizIR has been designed to explore corpora of implication rules following Shneiderman's famous mantra "overview first, zoom and filter, then details on demand". It enables filtering on metrics, e.g. fixing a minimum and a maximum support value, and the multiple selection of relations and attributes in the premise and in the conclusion, in order to identify the corresponding subset of implications, presented as a list and as Euler diagrams. An example of exploration is presented using an excerpt of Knomana to analyze plant-based extracts for controlling pests.
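The "zoom and filter" step described above can be pictured as a simple predicate over a list of rules. The sketch below is not FCAvizIR's code: the `Implication` type, the `filter_implications` function and the attribute strings (including the way relational attributes are encoded) are hypothetical, chosen only to show support-range filtering combined with attribute selection in the premise and the conclusion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Implication:
    """An implication premise -> conclusion with its support (number of objects)."""
    premise: frozenset
    conclusion: frozenset
    support: int


def filter_implications(rules, min_support=0, max_support=None,
                        in_premise=frozenset(), in_conclusion=frozenset()):
    """Keep the rules whose support lies in [min_support, max_support] and whose
    premise and conclusion contain all the selected attributes."""
    kept = []
    for rule in rules:
        if rule.support < min_support:
            continue
        if max_support is not None and rule.support > max_support:
            continue
        if not (in_premise <= rule.premise and in_conclusion <= rule.conclusion):
            continue
        kept.append(rule)
    return kept


# Hypothetical rules; relational attributes are encoded here as plain strings
rules = [
    Implication(frozenset({"part:leaf"}), frozenset({"exists:treats(Aphids)"}), 12),
    Implication(frozenset({"part:seed"}), frozenset({"form:oil"}), 5),
    Implication(frozenset({"part:leaf", "solvent:water"}),
                frozenset({"exists:treats(Aphids)"}), 3),
]

selected = filter_implications(rules, min_support=4,
                               in_conclusion=frozenset({"exists:treats(Aphids)"}))
for rule in selected:
    print(sorted(rule.premise), "->", sorted(rule.conclusion), rule.support)
```

Requiring the selected attributes to be subsets of the premise or conclusion mirrors a "must contain" selection; an "any of" selection would use a non-empty intersection instead.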

Exploring the 3-Dimensional Variability of Websites' User-Stories using Triadic Concept Analysis

Configurable software systems and families of similar software systems are increasingly being considered by industry to provide software tailored to each customer's needs. Their development requires managing software variability, i.e. commonalities, differences and constraints. A primary step is properly analyzing the variability of software, which can be done at various levels, from specification to deployment. In this paper, we focus on the software variability expressed through user-stories, viz. short formatted sentences indicating which user role can perform which action at the specification level. At this level, variability is usually analyzed in a two-dimensional view, i.e. software described by features, with roles considered separately. The novelty of this work is to model the three dimensions of the variability (i.e. software, roles, features) and to explore them using Triadic Concept Analysis (TCA), an extension of Formal Concept Analysis. The variability exploration is based on the extraction of 3-dimensional implication rules. The adopted methodology is applied to a case study of 65 commercial websites in four domains: manga, martial arts sports equipment, board games (including trading cards), and video games. This work highlights the diversity of information that such a methodology provides for drawing directions for the development of a new product or for building software variability models.
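A triadic context can be stored as a set of (software, feature, role) triples. The sketch below checks one known kind of triadic rule, a conditional attribute implication in the sense of Ganter and Obiedkov: premise features imply conclusion features under a set of roles if, for every role in that set, each website offering all premise features to that role also offers all conclusion features to it. The websites, features and roles are hypothetical, and this is not necessarily the exact rule type used in the study.

```python
# Triadic context (hypothetical data): (website, feature, role) triples
I = {
    ("shopA", "browse_catalog", "visitor"),
    ("shopA", "browse_catalog", "customer"),
    ("shopA", "add_to_cart", "customer"),
    ("shopB", "browse_catalog", "visitor"),
    ("shopB", "browse_catalog", "customer"),
    ("shopB", "add_to_cart", "customer"),
    ("shopC", "browse_catalog", "visitor"),
}
websites = {g for g, _, _ in I}


def holds(premise, conclusion, roles):
    """Conditional attribute implication premise -> conclusion under `roles`:
    for every role c, each website offering all premise features to c
    also offers all conclusion features to c."""
    for c in roles:
        for g in websites:
            has_premise = all((g, m, c) in I for m in premise)
            has_conclusion = all((g, m, c) in I for m in conclusion)
            if has_premise and not has_conclusion:
                return False
    return True


print(holds({"add_to_cart"}, {"browse_catalog"}, {"customer"}))
print(holds({"browse_catalog"}, {"add_to_cart"}, {"visitor"}))
```

Keeping the role dimension explicit is what distinguishes this from a dyadic analysis: the same pair of features can be linked for customers but not for visitors.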

A Collaborative Machine Learning System for Debate Practice

The AREN-DIA project (ARgumentation Et Numérique – Didactique & Intelligence Artificielle) aims to raise awareness of debate practice in education and in civil society. The project takes shape through the creation and testing of a debate platform. This platform makes it possible to hold structured debates based on a text, renewing the traditional approach to argumentative exchanges. It integrates a collaborative Natural Language Processing technology to make the debate process more effective. Our article focuses on the issues raised by the AI strand of the project, namely: how can one design a reinforcement mechanism that encourages users to take part in improving the AI system that produces a structured representation of what is said in a debate? To this end, an automatic procedure complements the debate by suggesting key terms that summarize the statements made. This indexing is the starting point for the machine's analysis and support of the debate. It is subject to interaction with users, who are invited to validate, invalidate, or complete these key terms. To resolve semantic ambiguity, we use a term-enrichment step that prepares the terms for the knowledge extraction operation based on formal concept analysis (FCA). This knowledge, in the form of implications, is used to update the relations in the knowledge base being exploited.

Distances Between Formal Concept Analysis Structures

In this paper, we study the notion of distance between the most important structures of formal concept analysis: formal contexts, concept lattices, and implication bases. We first define three families of Minkowski-like distances between these three structures. We then present experiments showing that the correlations of these measures are low and depend on the distance between formal contexts.
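As an illustration of what a Minkowski-like distance between formal contexts can look like, the sketch below takes two contexts over the same objects, computes for each object the size of the symmetric difference of its attribute sets, and returns the $p$-norm of that vector ($p \ge 1$). This particular definition and the function name `context_distance` are assumptions made for the example; the paper's own three families of distances may be defined differently.

```python
def context_distance(I1, I2, objects, p=2.0):
    """Illustrative Minkowski-like distance between two formal contexts
    I1, I2 (object -> set of attributes) sharing the same objects: the
    p-norm of the vector of per-object symmetric-difference sizes."""
    diffs = [len(I1.get(g, set()) ^ I2.get(g, set())) for g in objects]
    if p == float("inf"):
        return float(max(diffs, default=0))
    return sum(d ** p for d in diffs) ** (1.0 / p)


# Hypothetical contexts that differ by one attribute on each object
K1 = {"g1": {"a", "b"}, "g2": {"b"}}
K2 = {"g1": {"a"}, "g2": {"b", "c"}}
objects = ["g1", "g2"]
for p in (1.0, 2.0, float("inf")):
    print(p, context_distance(K1, K2, objects, p))
```

Varying $p$ changes how much a few heavily modified objects weigh against many slightly modified ones, which is the usual motivation for a Minkowski family rather than a single distance.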

Knowledge Extraction Based on Formal Concept Analysis for Assisting Online Debates

We present an automated debate-support process that aims to extract associations between terms from the key-term lists of arguments, lists that are co-constructed by the users and our indexing system. The indexing encourages users to complete or correct the list of key terms, acting as an incentive to develop more structured points of view. The algorithm is based on formal concept analysis and relies on the JeuxDeMots (JDM) knowledge base. The procedure involves several modules leading to a knowledge extraction step whose output, in the form of implications, is intended to be integrated into JDM. This cooperative approach lets the knowledge base grow as debates are analyzed, improving the key terms suggested by the platform.
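In FCA terms, the arguments are the objects and their key terms are the attributes, and an implication $A \rightarrow B$ holds exactly when $B \subseteq A''$, i.e. when every argument whose key-term list contains $A$ also contains $B$. The sketch below shows this test on hypothetical arguments; the term enrichment step and the link to JeuxDeMots are not modeled.

```python
# Hypothetical arguments, each with its co-constructed list of key terms
arguments = {
    "arg1": {"nuclear", "energy", "waste"},
    "arg2": {"nuclear", "energy"},
    "arg3": {"solar", "energy"},
    "arg4": {"nuclear", "energy", "risk"},
}


def extent(terms):
    """Arguments whose key-term list contains all the given terms (A')."""
    return {a for a, ts in arguments.items() if terms <= ts}


def closure(terms):
    """Terms shared by every argument that contains `terms` (A'')."""
    ext = extent(terms)
    if not ext:
        # Empty extent: the closure is the whole set of terms
        return set().union(*arguments.values())
    return set.intersection(*(arguments[a] for a in ext))


def implication_holds(premise, conclusion):
    """premise -> conclusion holds iff conclusion is included in premise''."""
    return conclusion <= closure(premise)


# Single-term premises with a non-trivial conclusion, and their support
for t in sorted(set().union(*arguments.values())):
    extra = closure({t}) - {t}
    if extra:
        print(f"{{{t}}} -> {sorted(extra)} (support {len(extent({t}))})")
```

Support here is simply the number of arguments containing the premise, which gives a rough indication of how much evidence backs a rule before it is proposed as a relation for the knowledge base.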

Discovery of link keys in resource description framework datasets based on pattern structures

In this paper, we present a detailed and complete study of data interlinking and of the discovery of identity links between two RDF (Resource Description Framework) datasets over the web of data. Data interlinking is the task of discovering identity links between individuals across datasets. Link keys are constructions based on pairs of properties and classes that can be considered as rules for inferring identity links between subjects in two RDF datasets. Here we investigate how well FCA (Formal Concept Analysis) and its extensions are suited to supporting the discovery of link keys. Indeed, plain FCA makes it possible to discover the so-called link key candidates, while a specific pattern structure makes it possible to associate a pair of classes with every candidate. Different link key candidates can generate sets of identity links between individuals that can be considered equal when they are regarded as partitions of the identity relation, which involves a kind of redundancy. In this paper, such redundancy is studied in depth thanks to partition pattern structures. In particular, experiments show that the redundancy of link key candidates, while not significant when based on identity of partitions, appears to be much more significant when based on similarity.
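Because owl:sameAs is an equivalence relation, the links generated by a candidate can be replaced by the partition they induce (their reflexive, symmetric and transitive closure). The sketch below computes that partition with a union-find structure and shows two hypothetical candidates whose link sets differ but whose partitions coincide, which corresponds to redundancy based on identity of partitions; similarity-based redundancy is not shown.

```python
def induced_partition(links, individuals):
    """Partition of the individuals induced by the equivalence closure of a set
    of identity links (owl:sameAs is reflexive, symmetric and transitive)."""
    parent = {x: x for x in individuals}

    def find(x):
        # Follow parent pointers to the block representative (with path halving)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x, y in links:
        parent[find(x)] = find(y)  # merge the two blocks

    blocks = {}
    for x in individuals:
        blocks.setdefault(find(x), set()).add(x)
    return frozenset(frozenset(block) for block in blocks.values())


# Hypothetical individuals of D1 (a*) and D2 (b*), and links from three candidates
individuals = {"a1", "a2", "b1", "b2", "b3"}
links_k1 = {("a1", "b1"), ("a1", "b2"), ("a2", "b1")}
links_k2 = {("a1", "b1"), ("a2", "b2"), ("a2", "b1")}
links_k3 = {("a1", "b1")}

p1, p2, p3 = (induced_partition(l, individuals) for l in (links_k1, links_k2, links_k3))
print(p1 == p2)  # different link sets, same partition: redundant candidates
print(p1 == p3)
```

Comparing partitions rather than raw link sets is what makes `links_k1` and `links_k2` redundant here, even though neither link set contains the other.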

Polyadic Relational Concept Analysis

Formal concept analysis is a mathematical framework based on lattice theory that aims at representing the information contained in binary object-attribute datasets (called formal contexts) in the form of a lattice of so-called formal concepts. Since its introduction, it has been extended to more complex types of data. In this paper, we are interested in two of those extensions, relational concept analysis and polyadic concept analysis, which allow processing relational data and $n$-ary relations, respectively. We present a framework for polyadic relational concept analysis that extends relational concept analysis to relational datasets made of $n$-ary relations. We present its basic properties and show that it is a valid extension of relational concept analysis.

A study of the discovery and redundancy of link keys between two RDF datasets based on partition pattern structures

A link key between two RDF datasets $D_1$ and $D_2$ is a set of pairs of properties that allows identifying pairs of individuals $x_1$ and $x_2$ through an identity link such as $x_1$ owl:sameAs $x_2$. In this paper, relying on and extending previous work, we introduce an original formalization of link key discovery based on the framework of Partition Pattern Structures (pps). Our objective is to study and evaluate the redundancy of link keys, based on the fact that owl:sameAs is an equivalence relation. In the pps concept lattice, every concept has an extent representing a link key candidate and an intent representing a partition of instances into sets of equivalent instances. Experiments show three main results. First, redundancy of link keys is not very significant in real-world datasets. Second, the link key discovery approach based on pps nevertheless returns a reduced number of non-redundant link key candidates compared to a standard approach. Third, the pps-based approach is efficient and returns link keys of high quality.