Dissertation: TDA
parent
bf232372fd
commit
48a6b0e75f
3 changed files with 175 additions and 6 deletions
Binary file not shown.
@ -2,6 +2,7 @@

\input{preamble}

\usepackage[firstpage]{draftwatermark}


\begin{document}
@ -78,18 +79,176 @@ Thank you!
\chapter{Topological Data Analysis and Persistent Homology}%
\label{cha:tda-ph}

\section{Homology}%
\label{sec:homology}

Our goal is to understand the topological structure of a metric
space. For this, we can use \emph{homology}, which associates to a
metric space $X$ and a dimension $i$ a vector space $H_i(X)$. The
dimension of $H_i(X)$ gives the number of $i$-dimensional features
of $X$: the dimension of $H_0(X)$ is the number of path-connected
components of $X$, the dimension of $H_1(X)$ is the number of holes
in $X$, and the dimension of $H_2(X)$ is the number of voids.

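For illustration, assuming homology is computed with coefficients in
a field, the following standard values can be checked against this
interpretation.

\begin{expl}
  For the circle $S^1$, $\dim H_0(S^1) = 1$ (one connected
  component), $\dim H_1(S^1) = 1$ (one hole), and $\dim H_2(S^1) = 0$
  (no void). For the sphere $S^2$, the dimensions are $1$, $0$,
  and $1$ respectively: a single component, no hole, and one enclosed
  void. For the torus, they are $1$, $2$, and $1$: one component, two
  independent loops, and one void.
\end{expl}
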
Crucially, these vector spaces are robust to continuous deformation of
the underlying metric space (they are \emph{homotopy
invariant}). However, computing the homology of an arbitrary metric
space can be extremely difficult. It is therefore necessary to
approximate the space by a structure that is both combinatorial and
topological in nature.

\section{Simplicial Complexes}%
\label{sec:simplicial-complexes}

In order to understand the topological structure of a metric space, we
need a way to decompose it into smaller pieces which, when assembled,
preserve the overall organisation of the space. For this, we use a
structure called a \emph{simplicial complex}, which is a kind of
higher-dimensional generalization of graphs.

The building blocks of this representation will be \emph{simplices},
which are convex hulls of finite sets of affinely independent
points. Examples of simplices include single points, segments,
triangles, and tetrahedra (in dimensions 0, 1, 2, and 3
respectively).

\begin{defn}[Simplex]
  The \emph{$k$-dimensional simplex} $\sigma = [x_0,\ldots,x_k]$ is
  the convex hull of the set $\{x_0,\ldots,x_k\} \subset \mathbb{R}^d$,
  where $x_0,\ldots,x_k$ are affinely independent. The points
  $x_0,\ldots,x_k$ are called the \emph{vertices} of $\sigma$, and the
  simplices defined by the non-empty subsets of $\{x_0,\ldots,x_k\}$
  are called the \emph{faces} of $\sigma$.
\end{defn}

We then need a way to combine these basic building blocks meaningfully
so that the resulting object can adequately reflect the topological
structure of the metric space.

\begin{defn}[Simplicial complex]
  A \emph{simplicial complex} is a collection $K$ of simplices such
  that:
  \begin{itemize}
  \item any face of a simplex of $K$ is a simplex of $K$,
  \item the intersection of two simplices of $K$ is either the empty
    set or a common face of both.
  \end{itemize}
\end{defn}

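As a small illustration of this definition (the vertex names $a$,
$b$, $c$ are chosen here for the example only):

\begin{expl}
  Let $a$, $b$, $c$ be three affinely independent points of
  $\mathbb{R}^2$. The collection
  \[ K = \{[a], [b], [c], [a,b], [b,c], [a,c]\} \]
  is a simplicial complex: the faces of each edge are its two
  endpoints, which belong to $K$, and two distinct edges intersect in
  exactly one common vertex. It is the boundary of the triangle
  $[a,b,c]$, a combinatorial model of a circle. By contrast,
  $\{[a], [b], [a,b], [a,b,c]\}$ is not a simplicial complex, since
  the faces $[c]$, $[b,c]$ and $[a,c]$ of $[a,b,c]$ are missing.
\end{expl}
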
%% TODO figure with examples of simplicial complexes

Using these definitions, we can define homology on simplicial
complexes. %% TODO add reference for more details/do it myself?

\section{Filtrations}%
\label{sec:filtrations}

If we consider that a simplicial complex is a kind of
``discretization'' of a metric space, we realise that there must be an
issue of \emph{scale}. For our analysis to be invariant under small
perturbations in the data, we need a way to find the scale
parameter that captures the relevant topological structure, neither
taking into account small perturbations nor ignoring
important smaller features.

%% TODO rewrite using the Cech filtration as an example?

The ideal solution to this problem is to consider all scales at
once: this is the objective of \emph{filtered simplicial complexes}.

\begin{defn}[Filtration]
  A \emph{filtered simplicial complex}, or simply a \emph{filtration},
  $K$ is a sequence ${(K_i)}_{i\in I}$ of simplicial complexes such
  that:
  \begin{itemize}
  \item for any $i, j \in I$, if $i < j$ then $K_i \subseteq K_j$,
  \item $\bigcup_{i\in I} K_i = K$.
  \end{itemize}
\end{defn}

\section{Persistent Homology}%
\label{sec:persistent-homology}

We can now compute the homology for each step in a filtration. This
leads to the notion of \emph{persistent homology}, which gives us all
the information necessary to establish the topological structure of
the metric space at multiple scales.

\begin{defn}[Persistent homology]
  The \emph{$p$-th persistent homology} of a filtered simplicial
  complex $K = {(K_i)}_{i\in I}$ is the pair
  $(\{H_p(K_i)\}_{i\in I}, \{f_{i,j}\}_{i,j\in I, i\leq j})$, where
  for all $i\leq j$, $f_{i,j} : H_p(K_i) \to H_p(K_j)$ is induced
  by the inclusion map $K_i \hookrightarrow K_j$.
\end{defn}

The maps $f_{i,j}$ allow us to link generators in each successive
homology space in the filtration. Since each generator corresponds to a
topological feature (connected component, hole, void, etc., depending
on the dimension $p$), we can determine whether it survives in the
next step of the filtration. We can thus determine when each feature is
born and when it dies (if it dies at all), and record its lifetime
as a half-open interval. This representation depends
on the choice of basis for each homology space
$H_p(K_i)$. However, by the Fundamental Theorem of Persistent
Homology, we can choose basis vectors in each homology space such that
the collection of half-open intervals is well-defined and unique. This
construction is called a \emph{barcode}.
%% TODO references for the Fundamental Theorem

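To make the notions of birth and death concrete, here is a small
worked example; the filtration and the vertex names $a$, $b$, $c$ are
chosen for illustration only, and homology is assumed to be taken with
coefficients in a field.

\begin{expl}
  Consider the filtration $K_0 \subseteq K_1 \subseteq K_2$ with
  \begin{align*}
    K_0 &= \{[a], [b], [c]\}, \\
    K_1 &= K_0 \cup \{[a,b], [b,c], [a,c]\}, \\
    K_2 &= K_1 \cup \{[a,b,c]\}.
  \end{align*}
  In $K_0$ there are three connected components, so
  $\dim H_0(K_0) = 3$. In $K_1$ the three edges merge them into a
  single component and create one hole, so $\dim H_0(K_1) = 1$ and
  $\dim H_1(K_1) = 1$. In $K_2$ the triangle fills the hole, so
  $\dim H_1(K_2) = 0$. The barcode in dimension 0 therefore consists
  of the intervals $[0,\infty)$, $[0,1)$ and $[0,1)$, and the barcode
  in dimension 1 of the single interval $[1,2)$.
\end{expl}
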
\section{Topological summaries: barcodes and persistence diagrams}%
\label{sec:topol-summ}

In order to interpret the results of the persistent homology
computation, we need to compare the output for a particular data set
to a suitable null model. For this, we need a similarity
measure between barcodes and a way to evaluate the statistical
significance of the results.

One possible approach is to define a space into which we can
map barcodes and study their geometric
properties. \emph{Persistence diagrams} form such a
space: each interval $[b,d)$ of a barcode is represented by the point
$(b,d)$ of the plane.

\begin{defn}[Persistence diagrams]
  A \emph{persistence diagram} is the union of a finite multiset of
  points in $\bar{\mathbb{R}}^2$ with the diagonal
  $\Delta = \{(x,x) \;|\; x\in\mathbb{R}\}$, where every point of
  $\Delta$ has infinite multiplicity.
\end{defn}

The diagonal $\Delta$ is added to facilitate comparisons between
diagrams, as points near the diagonal correspond to short-lived
topological features, which are thus likely to be caused by small
perturbations in the data.

We can now define several distances on the space of persistence
diagrams.

\begin{defn}[Wasserstein distance]
  The \emph{$p$-th Wasserstein distance} between two diagrams $X$ and
  $Y$ is
  \[ W_p[d](X, Y) = \inf_{\phi:X\to Y} {\left[\sum_{x\in X} {d\left(x, \phi(x)\right)}^p\right]}^{1/p} \]
  for $p\in [1,\infty)$, and
  \[ W_\infty[d](X, Y) = \inf_{\phi:X\to Y} \sup_{x\in X} d\left(x,
      \phi(x)\right) \]
  for $p = \infty$, where $d$ is a distance on
  $\mathbb{R}^2$ and $\phi$ ranges over all bijections from $X$ to
  $Y$.
\end{defn}

\begin{defn}[Bottleneck distance]
  The \emph{bottleneck distance} is defined as the infinite
  Wasserstein distance with $d$ the distance induced by the uniform
  norm: $d_B = W_\infty[L_\infty]$.
\end{defn}

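As a quick sanity check of this definition, consider the following
example; the two diagrams are made up for illustration.

\begin{expl}
  Let $X$ consist of the single off-diagonal point $(0,4)$ and $Y$ of
  the single off-diagonal point $(0,5)$, each together with the
  diagonal $\Delta$. Matching $(0,4)$ with $(0,5)$, and every
  diagonal point with itself, gives a cost of
  $\max(|0-0|, |4-5|) = 1$. The alternative is to match both points
  with the diagonal: the closest diagonal points in the uniform norm
  are $(2,2)$ and $(2.5,2.5)$, at distances $2$ and $2.5$
  respectively, for a cost of $2.5$. Hence $d_B(X,Y) = 1$.
\end{expl}
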
Since the bottleneck distance is by far the most commonly used, we
will focus on it in the following. It is symmetric, non-negative, and
satisfies the triangle inequality. However, it is not a true distance,
as it is fairly straightforward to come up with two distinct diagrams
at bottleneck distance zero, even on multisets not touching the
diagonal $\Delta$.

\section{Stability}%
\label{sec:stability}

@ -97,10 +256,6 @@ Thank you!
\chapter{Temporal Networks}%
\label{cha:temporal-networks}

\backmatter%

\nocite{*}

@ -16,6 +16,20 @@
\usepackage{lettrine}

\usepackage{amssymb, amsmath}
\usepackage{amsthm}

\theoremstyle{plain}
\newtheorem{thm}{Theorem}[chapter]
\newtheorem{lem}[thm]{Lemma}
\newtheorem{cor}[thm]{Corollary}
\newtheorem{prop}[thm]{Proposition}
\theoremstyle{definition}
\newtheorem{defn}{Definition}[chapter]
\newtheorem{expl}{Example}[chapter]
\theoremstyle{remark}
\newtheorem*{rem}{Remark}
\newtheorem*{note}{Note}
\newtheorem*{notation}{Notation}

\usepackage{pdfpages}