Research

My research focuses on statistics related to graph-structured data.
During my thesis, I developed and studied tools for multi-scale graph comparisons based on heat diffusion and topological data analysis. These tools come with statistical guarantees that ensure the asymptotic validity of two-sample tests. Implementing these methods allowed me to apply them to practical problems, particularly in machine learning and neural network classifiers.
Currently, I am exploring various aspects of graph data compression. This includes work on graph inference through a compressive learning method and, more recently, investigating methods to accelerate community detection algorithms using coarsening techniques.

Articles Talks Material Others

Articles

	A multilevel approach to accelerate the training of Transformers, 2025. With G. Lauga, M. Chaumette, E. Desainte-Maréville and A. Lebeurrier [HAL] In this article, we investigate the potential of multilevel approaches to accelerate the training of transformer architectures. Using an ordinary differential equation (ODE) interpretation of these architectures, we propose an appropriate way of varying the discretization of these ODE Transformers in order to accelerate the training. We validate our approach experimentally by a comparison with the standard training procedure.
	A note on the relations between mixture models, maximum-likelihood and entropic optimal transport, 2025. With T. Vayer [arxiv] This note aims to demonstrate that performing maximum-likelihood estimation for a mixture model is equivalent to minimizing over the parameters an optimal transport problem with entropic regularization. The objective is pedagogical: we seek to present this already known result in a concise and hopefully simple manner. We give an illustration with Gaussian mixture models by showing that the standard EM algorithm is a specific block-coordinate descent on an optimal transport loss.
	PASCO (PArallel Structured COarsening): an overlay to speed up graph clustering algorithms, 2024. With R. Vaudaine, T. Vayer, P. Borgnat, R. Gribonval, P. Gonçalves, M. Karsai [arxiv] Clustering the nodes of a graph is a cornerstone of graph analysis and has been extensively studied. However, some popular methods are not suitable for very large graphs: e.g., spectral clustering requires the computation of the spectral decomposition of the Laplacian matrix, which is not applicable for large graphs with a large number of communities. This work introduces PASCO, an overlay that accelerates clustering algorithms. Our method consists of three steps: 1- We compute several independent small graphs representing the input graph by applying an efficient and structure-preserving coarsening algorithm. 2- A clustering algorithm is run in parallel onto each small graph and provides several partitions of the initial graph. 3- These partitions are aligned and combined with an optimal transport method to output the final partition. The PASCO framework is based on two key contributions: a novel global algorithm structure designed to enable parallelization and a fast, empirically validated graph coarsening algorithm that preserves structural properties. We demonstrate the strong performance of PASCO in terms of computational efficiency, structural preservation, and output partition quality, evaluated on both synthetic and real-world graph datasets.
	Compressive Recovery of Sparse Precision Matrices, 2023. With T. Vayer, R. Gribonval, P. Gonçalves [arxiv] We consider the problem of learning a graph modeling the statistical relations of the $d$ variables of a dataset with $n$ samples $\mathbf{X} \in \mathbb{R}^{n \times d}$. Standard approaches amount to searching for a precision matrix $\boldsymbol\Theta$ representative of a Gaussian graphical model that adequately explains the data. However, most maximum likelihood-based estimators usually require storing the $d^{2}$ values of the empirical covariance matrix, which can become prohibitive in a high-dimensional setting. In this work, we adopt a ‘‘compressive'' viewpoint and aim to estimate a sparse $\boldsymbol\Theta$ from a sketch of the data, i.e., a low-dimensional vector of size $m \ll d^{2}$ carefully designed from $\mathbf{X}$ using nonlinear random features. Under certain assumptions on the spectrum of $\boldsymbol\Theta$ (or its condition number), we show that it is possible to estimate it from a sketch of size $m=\Omega\left((d+2k)\log(d)\right)$ where $k$ is the maximal number of edges of the underlying graph. These information-theoretic guarantees are inspired by compressed sensing theory and involve restricted isometry properties and instance optimal decoders. We investigate the possibility of achieving practical recovery with an iterative algorithm based on the graphical lasso, viewed as a specific denoiser. We compare our approach and graphical lasso on synthetic datasets, demonstrating its favorable performance even when the dataset is compressed.
	Eve, Adam and the Preferential Attachment Tree, 2023. With A. Contat, N. Curien, P. Lacroix and V. Rivoirard Published in Probability Theory and Related Fields. [journal] [arxiv] We consider the problem of finding the initial vertex (Adam) in a Barabasi--Albert tree process $ (\mathcal{T}(n) : n \geq 1) $ at large times. More precisely, given $ \varepsilon>0 $, one wants to output a subset $\mathcal{P}_{ \varepsilon}(n) $ of vertices of $\mathcal{T}(n)$ so that the initial vertex belongs to $\mathcal{P}_ \varepsilon(n)$ with probability at least $1- \varepsilon$ when $n$ is large. It has been shown by Bubeck, Devroye & Lugosi (2017), refined later by Banerjee & Huang (2023), that one needs to output at least $\varepsilon^{-1 + o(1)}$ and at most $\varepsilon^{-2 + o(1)}$ vertices to succeed. We prove that the exponent in the lower bound is sharp and the key idea is that Adam is either a ``large degree" vertex or is a neighbor of a ``large degree" vertex (Eve).
	Heat diffusion distance processes: a statistically founded method to analyze graph data sets, 2023. Published in J Appl. and Comput. Topology (SI : Data Science on Graphs). [journal] [arxiv] We propose two multiscale comparisons of graphs using heat diffusion, allowing to compare graphs without node correspondence or even with different sizes. These multiscale comparisons lead to the definition of Lipschitz-continuous empirical processes indexed by a real parameter. The statistical properties of empirical means of such processes are studied in the general case. Under mild assumptions, we prove a functional Central Limit Theorem, as well as a Gaussian approximation with a rate depending only on the sample size. Once applied to our processes, these results allow to analyze data sets of pairs of graphs. We design consistent confidence bands around empirical means and consistent two-sample tests, using bootstrap methods. Their performances are evaluated by simulations on synthetic data sets.

Talks

April, 2025 - Paris - Séminaire Parisien de Statistiques.
Compressive Recovery of Sparse Precision Matrices.

We consider the problem of learning a graph modeling the statistical relations of the $d$ variables of a dataset with $n$ samples. Standard approaches amount to searching for a precision matrix representative of a Gaussian graphical model that adequately explains the data. However, most maximum likelihood-based estimators usually require storing the $d^2$ values of the empirical covariance matrix, which can become prohibitive in a high-dimensional setting.
In this talk, we adopt a “compressive” viewpoint and aim to estimate a sparse precision matrix from a sketch of the data, i.e., a low-dimensional vector of size $m\ll d^2$ carefully designed from the data using nonlinear random features (e.g., rank-one projections). Under certain spectral assumptions, we show that it is possible to recover the precision matrix from a sketch of size $m=\Omega\left((d+2k)\log(d)\right)$ where $k$ is the maximal number of edges of the underlying graph. These information-theoretic guarantees are inspired by the compressed sensing theory. We investigate the possibility of achieving practical recovery with an iterative algorithm based on the graphical lasso, viewed as a specific denoiser. We compare our approach and the graphical lasso on synthetic datasets, demonstrating its favorable performance even when the dataset is compressed.
Joint work with : Titouan Vayer, Rémi Gribonval and Paulo Gonçalves.

April, 2025 - Nantes - Applied mathematics seminar, LMJL.
Compressive Recovery of Sparse Precision Matrices.

We consider the problem of learning a graph modeling the statistical relations of the $d$ variables of a dataset with $n$ samples. Standard approaches amount to searching for a precision matrix representative of a Gaussian graphical model that adequately explains the data. However, most maximum likelihood-based estimators usually require storing the $d^2$ values of the empirical covariance matrix, which can become prohibitive in a high-dimensional setting.
In this talk, we adopt a “compressive” viewpoint and aim to estimate a sparse precision matrix from a sketch of the data, i.e., a low-dimensional vector of size $m\ll d^2$ carefully designed from the data using nonlinear random features (e.g., rank-one projections). Under certain spectral assumptions, we show that it is possible to recover the precision matrix from a sketch of size $m=\Omega\left((d+2k)\log(d)\right)$ where $k$ is the maximal number of edges of the underlying graph. These information-theoretic guarantees are inspired by the compressed sensing theory. We investigate the possibility of achieving practical recovery with an iterative algorithm based on the graphical lasso, viewed as a specific denoiser. We compare our approach and the graphical lasso on synthetic datasets, demonstrating its favorable performance even when the dataset is compressed.
Joint work with : Titouan Vayer, Rémi Gribonval and Paulo Gonçalves.

March, 2025 - Rennes - Statistics seminar, IRMAR.
Compressive Recovery of Sparse Precision Matrices.

We consider the problem of learning a graph modeling the statistical relations of the $d$ variables of a dataset with $n$ samples. Standard approaches amount to searching for a precision matrix representative of a Gaussian graphical model that adequately explains the data. However, most maximum likelihood-based estimators usually require storing the $d^2$ values of the empirical covariance matrix, which can become prohibitive in a high-dimensional setting.
In this talk, we adopt a “compressive” viewpoint and aim to estimate a sparse precision matrix from a sketch of the data, i.e., a low-dimensional vector of size $m\ll d^2$ carefully designed from the data using nonlinear random features (e.g., rank-one projections). Under certain spectral assumptions, we show that it is possible to recover the precision matrix from a sketch of size $m=\Omega\left((d+2k)\log(d)\right)$ where $k$ is the maximal number of edges of the underlying graph. These information-theoretic guarantees are inspired by the compressed sensing theory. We investigate the possibility of achieving practical recovery with an iterative algorithm based on the graphical lasso, viewed as a specific denoiser. We compare our approach and the graphical lasso on synthetic datasets, demonstrating its favorable performance even when the dataset is compressed.
Joint work with : Titouan Vayer, Rémi Gribonval and Paulo Gonçalves.

February, 2025 - Marne la Vallée - Proba-Stats seminar of the LAMA team.
Compressive Recovery of Sparse Precision Matrices.

We consider the problem of learning a graph modeling the statistical relations of the $d$ variables of a dataset with $n$ samples. Standard approaches amount to searching for a precision matrix representative of a Gaussian graphical model that adequately explains the data. However, most maximum likelihood-based estimators usually require storing the $d^2$ values of the empirical covariance matrix, which can become prohibitive in a high-dimensional setting.
In this talk, we adopt a “compressive” viewpoint and aim to estimate a sparse precision matrix from a sketch of the data, i.e., a low-dimensional vector of size $m\ll d^2$ carefully designed from the data using nonlinear random features (e.g., rank-one projections). Under certain spectral assumptions, we show that it is possible to recover the precision matrix from a sketch of size $m=\Omega\left((d+2k)\log(d)\right)$ where $k$ is the maximal number of edges of the underlying graph. These information-theoretic guarantees are inspired by the compressed sensing theory. We investigate the possibility of achieving practical recovery with an iterative algorithm based on the graphical lasso, viewed as a specific denoiser. We compare our approach and the graphical lasso on synthetic datasets, demonstrating its favorable performance even when the dataset is compressed.
Joint work with : Titouan Vayer, Rémi Gribonval and Paulo Gonçalves.

December, 2024 - Rennes - COMPACT seminar.
Compressive Recovery of Sparse Precision Matrices.

We consider the problem of learning a graph modeling the statistical relations of the $d$ variables of a dataset with $n$ samples. Standard approaches amount to searching for a precision matrix representative of a Gaussian graphical model that adequately explains the data. However, most maximum likelihood-based estimators usually require storing the $d^2$ values of the empirical covariance matrix, which can become prohibitive in a high-dimensional setting.
In this talk, we adopt a “compressive” viewpoint and aim to estimate a sparse precision matrix from a sketch of the data, i.e., a low-dimensional vector of size $m\ll d^2$ carefully designed from the data using nonlinear random features (e.g., rank-one projections). Under certain spectral assumptions, we show that it is possible to recover the precision matrix from a sketch of size $m=\Omega\left((d+2k)\log(d)\right)$ where $k$ is the maximal number of edges of the underlying graph. These information-theoretic guarantees are inspired by the compressed sensing theory. We investigate the possibility of achieving practical recovery with an iterative algorithm based on the graphical lasso, viewed as a specific denoiser. We compare our approach and the graphical lasso on synthetic datasets, demonstrating its favorable performance even when the dataset is compressed.
Joint work with : Titouan Vayer, Rémi Gribonval and Paulo Gonçalves.

November, 2024 - Rennes - Malt seminar.
Statistical comparison of graph-structured data and its application to distribution shift detection.
When working with weighted graphs, one can interpret weights as the thermal conductivity of edges. Given initial conditions, one can use the way heat diffuses to compare graphs. Choosing relevant and informative diffusion times is often essential and challenging. To circumvent this issue, we choose to take into account the whole diffusion process. We define real-valued processes indexed by [0, T] for some T > 0, representing the comparisons for all diffusion times. Using tools from topological data analysis, we are able to compare graphs with unknown node correspondence or even graphs with different sizes. In this talk, I will introduce these processes and present their statistical properties. From these results, we will see how we can construct consistent two-sample tests. Then, I will present some applications for the detection of distribution shifts in the context of neural networks by using activation graphs.

September, 2024 - Nantes - SIMS seminar.
Compressive Recovery of Sparse Precision Matrices.

We consider the problem of learning a graph modeling the statistical relations of the $d$ variables of a dataset with $n$ samples. Standard approaches amount to searching for a precision matrix representative of a Gaussian graphical model that adequately explains the data. However, most maximum likelihood-based estimators usually require storing the $d^2$ values of the empirical covariance matrix, which can become prohibitive in a high-dimensional setting.
In this talk, we adopt a “compressive” viewpoint and aim to estimate a sparse precision matrix from a sketch of the data, i.e., a low-dimensional vector of size $m\ll d^2$ carefully designed from the data using nonlinear random features (e.g., rank-one projections). Under certain spectral assumptions, we show that it is possible to recover the precision matrix from a sketch of size $m=\Omega\left((d+2k)\log(d)\right)$ where $k$ is the maximal number of edges of the underlying graph. These information-theoretic guarantees are inspired by the compressed sensing theory. We investigate the possibility of achieving practical recovery with an iterative algorithm based on the graphical lasso, viewed as a specific denoiser. We compare our approach and the graphical lasso on synthetic datasets, demonstrating its favorable performance even when the dataset is compressed.
Joint work with : Titouan Vayer, Rémi Gribonval and Paulo Gonçalves.

June, 2024 - Copenhagen - Popnets Workshop.
Statistical comparison of graph-structured data and its application to distribution shift detection.
When working with weighted graphs, one can interpret weights as the thermal conductivity of edges. Given initial conditions, one can use the way heat diffuses to compare graphs. Choosing relevant and informative diffusion times is often essential and challenging. To circumvent this issue, we choose to take into account the whole diffusion process. We define real-valued processes indexed by [0, T] for some T > 0, representing the comparisons for all diffusion times. Using tools from topological data analysis, we are able to compare graphs with unknown node correspondence or even graphs with different sizes. In this talk, I will introduce these processes and present their statistical properties. From these results, we will see how we can construct consistent two-sample tests. Then, I will present some applications for the detection of distribution shifts in the context of neural networks by using activation graphs.

May, 2024 - Bordeaux - Journées de Statistiques.
Compressive Recovery of Sparse Precision Matrices. - [slides]

We consider the problem of learning a graph modeling the statistical relations of the $d$ variables of a dataset with $n$ samples. Standard approaches amount to searching for a precision matrix representative of a Gaussian graphical model that adequately explains the data. However, most maximum likelihood-based estimators usually require storing the $d^2$ values of the empirical covariance matrix, which can become prohibitive in a high-dimensional setting.
In this talk, we adopt a “compressive” viewpoint and aim to estimate a sparse precision matrix from a sketch of the data, i.e., a low-dimensional vector of size $m\ll d^2$ carefully designed from the data using nonlinear random features (e.g., rank-one projections). Under certain spectral assumptions, we show that it is possible to recover the precision matrix from a sketch of size $m=\Omega\left((d+2k)\log(d)\right)$ where $k$ is the maximal number of edges of the underlying graph. These information-theoretic guarantees are inspired by the compressed sensing theory. We investigate the possibility of achieving practical recovery with an iterative algorithm based on the graphical lasso, viewed as a specific denoiser. We compare our approach and the graphical lasso on synthetic datasets, demonstrating its favorable performance even when the dataset is compressed.
Joint work with : Titouan Vayer, Rémi Gribonval and Paulo Gonçalves.

November, 2023 - ENS de Lyon - RT MIA Workshop: Dimension Reduction for Learning and Visualization.
Compressive Recovery of Sparse Precision Matrices.

We consider the problem of learning a graph modeling the statistical relations of the $d$ variables of a dataset with $n$ samples. Standard approaches amount to searching for a precision matrix representative of a Gaussian graphical model that adequately explains the data. However, most maximum likelihood-based estimators usually require storing the $d^2$ values of the empirical covariance matrix, which can become prohibitive in a high-dimensional setting.
In this talk, we adopt a “compressive” viewpoint and aim to estimate a sparse precision matrix from a sketch of the data, i.e., a low-dimensional vector of size $m\ll d^2$ carefully designed from the data using nonlinear random features (e.g., rank-one projections). Under certain spectral assumptions, we show that it is possible to recover the precision matrix from a sketch of size $m=\Omega\left((d+2k)\log(d)\right)$ where $k$ is the maximal number of edges of the underlying graph. These information-theoretic guarantees are inspired by the compressed sensing theory. We investigate the possibility of achieving practical recovery with an iterative algorithm based on the graphical lasso, viewed as a specific denoiser. We compare our approach and the graphical lasso on synthetic datasets, demonstrating its favorable performance even when the dataset is compressed.
Joint work with : Titouan Vayer, Rémi Gribonval and Paulo Gonçalves.

May, 2023 - LJK Grenoble - Seminar of the DATA department.
Statistical comparison of graph-structured data and its application to distribution shift detection.

When working with weighted graphs, one can interpret weights as the thermal conductivity of edges. Given initial conditions, one can use the way heat diffuses to compare graphs. Choosing relevant and informative diffusion times is often essential and challenging. To circumvent this issue, we choose to take into account the whole diffusion process. We define real-valued processes indexed by [0, T] for some T > 0, representing the comparisons for all diffusion times. Using tools from topological data analysis, we are able to compare graphs with unknown node correspondence or even graphs with different sizes.
In this talk, I will introduce these processes and present their statistical properties. From these results, we will see how we can construct consistent two-sample tests. Then I will present some applications to the detection of distribution shifts in the context of neural networks by using activation graphs.

January, 2023 - Campus Agro Paris-Saclay - Meeting of the EcoNet project.
Statistical comparison of graph structured data.

Not provided.

January, 2023 - CIRM - Workshop on Random Geometry.
Finding Adam in the nearest-neighbor tree.

In this talk, I present the problem of finding the root in the online nearest-neighbor tree (NNT) model. Consider a probability distribution on some metric space. Here we will restrict ourselves to the uniform distribution on the unit circle with the Euclidean distance. The random tree is constructed recursively by linking each new node to its nearest neighbor.
Assume that we only observe the structure of such a large tree. That is, we only have access to the connectivity and neither the vertex positions in the metric space nor the vertex labels are available. Can we find a set $S$ of vertices with a reasonable size (i.e., independent of the tree size) that contain the root with high probability. I will explain how we can obtain such set by using a notion of centrality, and present the similarities and differences with the uniform random recursive tree (uRRT) model.

November, 2022 - Orsay - Working group of the Probability-Statistics team.
Testing between the stochastic block model and the Erdos-Renyi model. [article].
With Leonardo Martins-Bianco and Zacharie Naulet.

In a first part, we talk about the detection of the SBM vs ER model, in the sparse case, in the regime where detection is possible. We show that by counting cycles we can distinguish between the two models. Then, we talk about finding the communities in an SBM graph. We present the spectral clustering algorithm and show that it is efficient in the dense case.

June, 2022 - Lyon - Journées de Statistiques de la SFdS.
Analyse statistique de graphes, via des processus de diffusion de la chaleur.

Ce travail porte sur la comparaison de données de graphes, potentiellement pondérés et de tailles différentes. Lorsqu’on travaille avec des graphes pondérés, on peut interpréter les poids des arêtes comme des conductivités thermiques. Dès lors, on peut comparer les graphes en comparant leur répartition de chaleur après un temps de diffusion t. Ce paramètre d’échelle t doit être minutieusement choisi pour s’assurer des comparaisons pertinentes. A l’opposé de précédents travaux considérant un temps de diffusion fixé arbitrairement ou choisi à partir des données, on propose de prendre en compte tout le processus de diffusion. Pour cela, on définit des processus à valeurs réelles indexés par tous les temps de diffusion dans (0,T), en concaténant les comparaisons faites aux différentes échelles. Dans cet exposé, nous commencerons par présenter ces processus de comparaison de graphes et leurs propriétés statistiques. Puis, nous montrerons comment en dériver des tests à deux échantillons consistants. Nous présenterons quelques applications sur des jeux de données synthétiques et réels. On s’intéressera notamment aux données provenant de graphes d’activations de réseaux de neurones.

June, 2022 - Lyon - Machine Learning and Signal Processing Seminar.
Heat diffusion distance processes for graphs and their application to distribution shift detection.

When working with weighted graphs, one can interpret weights as the thermal conductivity of edges. Given initial conditions, one can use the way heat diffuses to compare graphs. Choosing relevant and informative diffusion times is often essential and challenging. To circumvent this issue, we choose to take into account the whole diffusion process. We define real-valued processes indexed by [0, T] for some T > 0, representing the comparisons for all diffusion times. Using tools from topological data analysis, we are able to compare graphs with unknown node correspondence or even graphs with different sizes.
In this talk, I will introduce these processes and present their statistical properties. From these results, we will see how we can construct consistent two-sample tests. Then I will present some applications to the detection of distribution shifts in the context of neural networks by using activation graphs.

May, 2022 - Orsay - Working group of the Probability-Statistics team.
Presentation of Density estimation from unweighted k-nearest neighbor graphs: a roadmap [article].
With Alice Contat and Nicolas Curien.

Consider a density f on R^d, and some i.i.d. sample of size n following this density. Construct the oriented k-nearest neighbor graph. Can we, only from the graph structure (i.e. forgetting the embedding in R^d), recover the density. We study the 1D case in details. In particular, we explained why the order of magnitude of k needs to be greater than n^(2/3). We proposed some potential improvements of the proposed method for d=1.

May, 2022 - Porquerolles - Datashape seminar.
Detecting distribution shifts using activation graphs from neural networks.

The abstract is not available yet.

Mar, 2022 - Orsay - Working group of the Probability-Statistics team.
Presentation of Identifying the deviator [arxiv].

Alice and Bob are playing a game. They construct a binary sequence by alternatively anouncing 0 or 1. Imagine that a rule says that they should announce their digit by following a Bernoulli distribution of parameter 1/2, independently of the past of the sequence. If they both follow the rule, the sequence will verify several properties. For example, A : the proportion of 1s in the sequence tends to 1/2 ; B : the associated random walk (where we make a +1 step when a player announced 1 and a -1 step when a player announced 0) will cross 0 infinitely many times. Assuming that A or B is not observed, and that only one of the players is cheating, I present the article's results that show how we can identify the cheater almost surely, given the whole binary sequence. The article actually tackles a more general problem, but I only focus on these two examples.

Dec, 2021 - Besançon - Forum des Jeunes Mathématicien.ne.s.
Statistical analysis of graph structured data, via heat diffusion processes. - [slides]

When working with weighted graphs, one can interpret weights as the thermal conductivity of edges. This means that heat diffuses faster along edges with higher weights. Given initial conditions, one can use the way heat diffuses to compare graphs. But choosing a relevant and informative diffusion time is often essential and challenging. To circumvent this issue, we choose to take into account the whole diffusion process. For that, we define real-valued processes indexed by all the diffusion times in [0, T] for some T > 0, namely the Heat Kernel Distance (HKD) process and the Heat Persistence Distance (HPD) process. Borrowing tools from TDA, the HPD process is able to compare graphs without known node correspondence or even graphs with different sizes. In this talk, I will introduce these processes and present their statistical properties. Namely, we proved under mild assumptions that they verify a functional central limit theorem and admit a gaussian approximation. Moreover, I will present potential applications of these processes (the construction of confidence bands and two-sample tests).

Oct, 2021 - île d'Oléron - Colloque Jeunes Probabilistes et Statisticiens.
Statistical analysis of graph structured data, via heat diffusion processes. - [slides]

When working with weighted graphs, one can interpret weights as the thermal conductivity of edges. This means that heat diffuses faster along edges with higher weights. Given initial conditions, one can use the way heat diffuses to compare graphs. But choosing a relevant and informative diffusion time is often essential and challenging. To circumvent this issue, we choose to take into account the whole diffusion process. For that, we define real-valued processes indexed by all the diffusion times in [0, T] for some T > 0, namely the Heat Kernel Distance (HKD) process and the Heat Persistence Distance (HPD) process. Borrowing tools from TDA, the HPD process is able to compare graphs without known node correspondence or even graphs with different sizes. In this talk, I will introduce these processes and present their statistical properties. Namely, we proved under mild assumptions that they verify a functional central limit theorem and admit a gaussian approximation. Moreover, I will present potential applications of these processes (the construction of confidence bands and two-sample tests).

Oct, 2021 - Orsay - Working group of the Probability-Statistics team.
Presentation of Finding Adam in random growing trees [article].

The article deals with the estimation of the root when observing the structure of a large tree following random recursive tree models : either the uniform attachment model or the preferential attachment model. I only presented the results for the uniform attachment model. The weaker result shows that the root is recovered with probability greater than 1-ε, when the set of candidates for the root is of polynomial size in 1/ε, where the candidates are chosen as the ones whose largest subtree is the smallest. The strongest result shows that set of candidates can be chosen with subpolynomial size, by looking at a more refined statistic. I presented both results and the proof of the weaker one.

Oct, 2021 - U. Paris-Saclay - Datashape seminar
Heat diffusion distance processes: a statistically founded method to analyze graph data sets. - [slides]

When working with weighted graphs, one can interpret weights as the thermal conductivity of edges. This means that heat diffuses faster along edges with higher weights. Given initial conditions, one can use the way heat diffuses to compare graphs. But choosing a relevant and informative diffusion time is often essential and challenging. To circumvent this issue, we choose to take into account the whole diffusion process. For that, we define real-valued processes indexed by all the diffusion times in [0, T] for some T > 0, namely the Heat Kernel Distance (HKD) process and the Heat Persistence Distance (HPD) process. Borrowing tools from TDA, the HPD process is able to compare graphs without known node correspondence or even graphs with different sizes. In this talk, I will introduce these processes and present their statistical properties. Namely, we proved under mild assumptions that they verify a functional central limit theorem and admit a gaussian approximation. Moreover, I will present potential applications of these processes (the construction of confidence bands and two-sample tests, the study of neural networks through their activation graphs).

Material

My Ph.D. thesis is entitled Contributions to statistical analysis of graph-structured data.
You can find the manuscript and the slides of the defense here : [manuscript] [slides].

Others

Editorial activities

Reviewing activity for the Bernoulli Journal.
May 2023 - Review of a communication proposal for the 2023 edition of the GRETSI conference,
Special session: Graph Learning and Learning with Graphs.

Awards

Winner of the challenge : Mathématiques et Entreprises 2021, with Olympio Hacquart and Vadim Lebovici.
Subject proposed by french company Eurecam : Reconstruction of trajectories from real life 3D detections of people. We choose to use optimal transport with boundary to reconstruct trajectories frame by frame. We proposed an ad hoc minimization problem. Pre-processing and post-processing steps were implemented to obtain better reconstructions. Our work can be found here.