Dissertation: TDA

2018-07-26 11:08:18 +01:00 · 2018-07-26 11:08:18 +01:00 · 48a6b0e75f
commit 48a6b0e75f
parent bf232372fd
3 changed files with 175 additions and 6 deletions
--- a/dissertation/dissertation.pdf
+++ b/dissertation/dissertation.pdf
--- a/dissertation/dissertation.tex
+++ b/dissertation/dissertation.tex
@ -2,6 +2,7 @@
 \input{preamble}
 \usepackage[firstpage]{draftwatermark}
 \begin{document}
@ -78,18 +79,176 @@ Thank you!
 \chapter{Topological Data Analysis and Persistent Homology}%
 \label{cha:tda-ph}
 \section{Homology}%
 \label{sec:homology}
 Our goal is to understand the topological structure of a metric
 space. For this, we can use \emph{homology}, which associates to a
 metric space $X$ and a dimension $i$ a vector space $H_i(X)$. The
 dimension of $H_i(X)$ counts the $i$-dimensional features of $X$: the
 dimension of $H_0(X)$ is the number of path-connected components of
 $X$, the dimension of $H_1(X)$ is the number of holes in $X$, and the
 dimension of $H_2(X)$ is the number of voids.
 Crucially, these vector spaces are robust to continuous deformation of
 the underlying metric space (they are \emph{homotopy
  invariant}). However, computing the homology of an arbitrary metric
 space can be extremely difficult. It is therefore necessary to
 approximate the space by a structure that is both combinatorial and
 topological in nature.
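 As a quick sanity check on this interpretation, the following example
 lists the dimensions of the homology spaces of three standard spaces.
 It assumes coefficients in a field, and the spaces are chosen here
 purely for illustration.
 \begin{expl}
  Writing $\beta_i = \dim H_i(X)$:
  \begin{itemize}
  \item for the circle $S^1$, $\beta_0 = 1$ and $\beta_1 = 1$ (one
    component, one hole);
  \item for the sphere $S^2$, $\beta_0 = 1$, $\beta_1 = 0$, and
    $\beta_2 = 1$ (one component, no hole, one void);
  \item for the torus, $\beta_0 = 1$, $\beta_1 = 2$, and
    $\beta_2 = 1$ (one component, two independent holes, one void).
  \end{itemize}
 \end{expl}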
 \section{Simplicial Complexes}%
 \label{sec:simplicial-complexes}
-\section{Homology}%
-\label{sec:homology}
+In order to understand the topological structure of a metric space, we
+need a way to decompose it into smaller pieces which, when assembled,
 preserve the overall organisation of the space. For this, we use a
 structure called a \emph{simplicial complex}, which is a kind of
 higher-dimensional generalization of graphs.
 The building blocks of this representation will be \emph{simplices},
 which are convex hulls of finite sets of affinely independent
 points. Examples of simplices include single points, segments,
 triangles, and tetrahedra (in dimensions 0, 1, 2, and 3
 respectively).
 \begin{defn}[Simplex]
  The \emph{$k$-dimensional simplex} $\sigma = [x_0,\ldots,x_k]$ is
  the convex hull of the set $\{x_0,\ldots,x_k\} \subset \mathbb{R}^d$,
  where $x_0,\ldots,x_k$ are affinely independent. The points
  $x_0,\ldots,x_k$ are called the \emph{vertices} of $\sigma$, and the
  simplices defined by the non-empty subsets of $\{x_0,\ldots,x_k\}$
  are called the \emph{faces} of $\sigma$.
 \end{defn}
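 To make the definition concrete, here is a small worked example. The
 vertex names $a$, $b$, $c$ are illustrative only.
 \begin{expl}
  Let $a, b, c \in \mathbb{R}^2$ be three points that are not
  collinear, so that they are affinely independent. The $2$-simplex
  $[a,b,c]$ is the filled triangle with corners $a$, $b$, and $c$.
  Under the definition above, which counts a simplex as a face of
  itself, its faces are the vertices $[a]$, $[b]$, $[c]$, the edges
  $[a,b]$, $[a,c]$, $[b,c]$, and $[a,b,c]$: one face for each
  non-empty subset of $\{a,b,c\}$. More generally, a $k$-simplex has
  $2^{k+1}-1$ faces.
 \end{expl}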
 We then need a way to combine these basic building blocks meaningfully
 so that the resulting object can adequately reflect the topological
 structure of the metric space.
 \begin{defn}[Simplicial complex]
  A \emph{simplicial complex} is a collection $K$ of simplices such
  that:
  \begin{itemize}
  \item any face of a simplex of $K$ is a simplex of $K$,
  \item the intersection of two simplices of $K$ is either the empty
    set or a common face of both.
  \end{itemize}
 \end{defn}
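 The two conditions can be checked by hand on small collections. The
 example below is a sketch with illustrative points, giving one
 collection that satisfies the definition and two that do not.
 \begin{expl}
  Let $a, b, c$ be three non-collinear points. The collection
  \[ K = \{[a], [b], [c], [a,b], [b,c], [a,c]\} \]
  (the boundary of a triangle) is a simplicial complex: the faces of
  every edge are its two endpoints, which belong to $K$, and two
  distinct edges meet in exactly one shared vertex.

  Removing $[c]$ from $K$ violates the first condition, since $[c]$
  is a face of $[b,c]$. The second condition fails for the two
  segments $[(0,0),(1,1)]$ and $[(0,1),(1,0)]$ taken together with
  their four endpoints: they intersect in the point
  $(\frac{1}{2},\frac{1}{2})$, which is a face of neither segment.
 \end{expl}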
 %% TODO figure with examples of simplicial complexes
 Using these definitions, we can define homology on simplicial
 complexes. %% TODO add reference for more details/do it myself?
 \section{Filtrations}%
 \label{sec:filtrations}
 If we consider a simplicial complex as a kind of ``discretization''
 of a metric space, an issue of \emph{scale} arises. For our analysis
 to be invariant under small perturbations in the data, we would need
 to find a scale parameter that captures the relevant topological
 structure, neither reacting to small perturbations nor ignoring
 important small-scale features.
 %% TODO rewrite using the Cech filtration as an example?
 The ideal solution to this problem is to consider all scales at
 once: this is the objective of \emph{filtered simplicial complexes}.
 \begin{defn}[Filtration]
  A \emph{filtered simplicial complex}, or simply a \emph{filtration},
  $K$ is a sequence ${(K_i)}_{i\in I}$ of simplicial complexes such
  that:
  \begin{itemize}
  \item for any $i, j \in I$, if $i < j$ then $K_i \subseteq K_j$,
  \item $\bigcup_{i\in I} K_i = K$.
  \end{itemize}
 \end{defn}
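 A minimal example, using two illustrative points $a$ and $b$ and a
 three-element index set, may help to see what the definition allows
 and what it rules out.
 \begin{expl}
  Let $a \neq b$ be two points and $I = \{0, 1, 2\}$. Setting
  \[ K_0 = \{[a]\}, \quad K_1 = \{[a], [b]\}, \quad
     K_2 = \{[a], [b], [a,b]\} \]
  gives a filtration of $K = K_2$: each $K_i$ is a simplicial complex,
  $K_0 \subseteq K_1 \subseteq K_2$, and $K_0 \cup K_1 \cup K_2 = K_2$.
  By contrast, adding the edge before its second endpoint, with
  $K_1' = \{[a], [a,b]\}$, does not give a filtration: $K_1'$ is not
  a simplicial complex, because the face $[b]$ of $[a,b]$ is missing.
 \end{expl}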
 \section{Persistent Homology}%
 \label{sec:persistent-homology}
 We can now compute the homology for each step in a filtration. This
 leads to the notion of \emph{persistent homology}, which gives us all
 the information necessary to establish the topological structure of
 the metric space at multiple scales.
 \begin{defn}[Persistent homology]
  The \emph{$p$-th persistent homology} of a filtered simplicial
  complex $K = {(K_i)}_{i\in I}$ is the pair
  $(\{H_p(K_i)\}_{i\in I}, \{f_{i,j}\}_{i,j\in I, i\leq j})$, where
  for all $i\leq j$, $f_{i,j} : H_p(K_i) \to H_p(K_j)$ is the linear
  map induced by the inclusion $K_i \hookrightarrow K_j$.
 \end{defn}
 The maps $f_{i,j}$ allow us to link generators in each successive
 homology space in the filtration. Since each generator corresponds to
 a topological feature (connected component, hole, void, etc.,
 depending on the dimension $p$), we can determine whether it survives
 in the next step of the filtration. We can thus determine when each
 feature is born and when it dies (if it dies at all), and record its
 lifetime as a half-open interval from its birth to its death. This
 representation depends on the choice of basis for each homology space
 $H_p(K_i)$. However, by the Fundamental Theorem of Persistent
 Homology, we can choose basis vectors in each homology space such that
 the collection of half-open intervals is well-defined and unique. This
 collection is called a \emph{barcode}.
 %% TODO references for the Fundamental Theorem
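 The birth and death of features can be traced by hand on a small
 filtration. The example below is a sketch under the assumption of
 field coefficients; the points $a$, $b$, $c$ and the index set
 $\{0,1,2,3\}$ are illustrative.
 \begin{expl}
  Let $a, b, c$ be three non-collinear points and consider the
  filtration
  \begin{align*}
   K_0 &= \{[a], [b], [c]\}, &
   K_1 &= K_0 \cup \{[a,b], [b,c]\}, \\
   K_2 &= K_1 \cup \{[a,c]\}, &
   K_3 &= K_2 \cup \{[a,b,c]\}.
  \end{align*}
  In dimension $0$, three connected components are born at step $0$.
  The two edges added at step $1$ merge them into a single component,
  so two of them die at step $1$ and one survives forever. In
  dimension $1$, the edge $[a,c]$ closes a loop at step $2$, and this
  loop is filled in by the triangle at step $3$. The barcodes are
  therefore
  \[ \{[0,\infty),\ [0,1),\ [0,1)\} \text{ in dimension } 0,
     \qquad \{[2,3)\} \text{ in dimension } 1. \]
 \end{expl}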
 \section{Topological summaries: barcodes and persistence diagrams}%
 \label{sec:topol-summ}
 In order to interpret the results of the persistent homology
 computation, we need to compare the output for a particular data set
 to a suitable null model. For this, we need a similarity measure
 between barcodes and a way to evaluate the statistical significance
 of the results.
 One possible approach is to define a space into which we can map
 barcodes and study their geometric properties. \emph{Persistence
  diagrams} are an example of such a space.
 \begin{defn}[Persistence diagrams]
  A \emph{persistence diagram} is the union of a finite multiset of
  points in $\bar{\mathbb{R}}^2$ with the diagonal
  $\Delta = \{(x,x) \;|\; x\in\mathbb{R}\}$, where every point of
  $\Delta$ has infinite multiplicity.
 \end{defn}
 The diagonal $\Delta$ is added to facilitate comparisons between
 diagrams, as points near the diagonal correspond to short-lived
 topological features, which are likely to be caused by small
 perturbations in the data.
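 The correspondence between barcodes and diagrams can be illustrated
 on a small case. The barcode below is made up for this purpose, and
 sending an interval to the pair of its endpoints is the usual
 convention, assumed here.
 \begin{expl}
  The barcode $\{[0,\infty),\ [0,1),\ [0,1),\ [2,3)\}$ is represented
  by the multiset of points
  \[ \{(0,+\infty),\ (0,1),\ (0,1),\ (2,3)\}, \]
  together with the diagonal $\Delta$. The point $(0,1)$ has
  multiplicity $2$, which is why a multiset is needed, and the point
  $(0,+\infty)$ explains the use of the extended plane
  $\bar{\mathbb{R}}^2$. The point $(2,3)$ records a feature that is
  born at $2$ and dies at $3$; the closer a point lies to $\Delta$,
  the shorter the lifetime of the corresponding feature.
 \end{expl}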
 We can now define several distances on the space of persistence
 diagrams.
 \begin{defn}[Wasserstein distance]
  The \emph{$p$-th Wasserstein distance} between two diagrams $X$ and
  $Y$ is
  \[ W_p[d](X, Y) = \inf_{\phi:X\to Y} {\left[\sum_{x\in X} {d\left(x, \phi(x)\right)}^p\right]}^{1/p} \]
  for $p\in [1,\infty)$, and
  \[ W_\infty[d](X, Y) = \inf_{\phi:X\to Y} \sup_{x\in X} d\left(x,
      \phi(x)\right) \] for $p = \infty$, where $d$ is a distance on
  $\mathbb{R}^2$ and $\phi$ ranges over all bijections from $X$ to
  $Y$.
 \end{defn}
 \begin{defn}[Bottleneck distance]
  The \emph{bottleneck distance} is defined as the infinite
  Wasserstein distance with $d$ the uniform norm:
  $d_B = W_\infty[L_\infty]$.
 \end{defn}
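 The following small computation illustrates both definitions. The two
 diagrams are made up for illustration, and $d$ is taken to be the
 distance induced by the uniform norm.
 \begin{expl}
  Let $X$ have the single off-diagonal point $(0,4)$ and let $Y$ have
  the two off-diagonal points $(0,5)$ and $(2,3)$. For the uniform
  norm, a point $(u,v)$ lies at distance $(v-u)/2$ from the diagonal,
  attained at $(\frac{u+v}{2}, \frac{u+v}{2})$. Diagonal points can be
  matched to each other at zero cost, so only three kinds of
  bijections matter:
  \begin{itemize}
  \item $(0,4) \leftrightarrow (0,5)$ and $(2,3) \leftrightarrow
    \Delta$, with costs $1$ and $\frac{1}{2}$;
  \item $(0,4) \leftrightarrow (2,3)$ and $(0,5) \leftrightarrow
    \Delta$, with costs $2$ and $\frac{5}{2}$;
  \item all three points matched to $\Delta$, with costs $2$,
    $\frac{5}{2}$, and $\frac{1}{2}$.
  \end{itemize}
  The largest cost is smallest for the first matching, so
  $d_B(X, Y) = 1$. The sum of the costs is also smallest for the
  first matching, so $W_1[L_\infty](X, Y) = \frac{3}{2}$.
 \end{expl}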
 Since the bottleneck distance is by far the most commonly used, we
 will focus on it in the following. It is symmetric, non-negative, and
 satisfies the triangle inequality. However, it is not a true distance
 in general: if diagrams are allowed to contain infinitely many
 off-diagonal points, one can construct two distinct diagrams at
 bottleneck distance zero, even on multisets not touching the diagonal
 $\Delta$.
 \section{Stability}%
 \label{sec:stability}
@ -97,10 +256,6 @@ Thank you!
 \chapter{Temporal Networks}%
 \label{cha:temporal-networks}
 \backmatter%
 \nocite{*}
--- a/dissertation/preamble.tex
+++ b/dissertation/preamble.tex
@ -16,6 +16,20 @@
 \usepackage{lettrine}
 \usepackage{amssymb, amsmath}
 \usepackage{amsthm}
 \theoremstyle{plain}
 \newtheorem{thm}{Theorem}[chapter]
 \newtheorem{lem}[thm]{Lemma}
 \newtheorem{cor}[thm]{Corollary}
 \newtheorem{prop}[thm]{Proposition}
 \theoremstyle{definition}
 \newtheorem{defn}{Definition}[chapter]
 \newtheorem{expl}{Example}[chapter]
 \theoremstyle{remark}
 \newtheorem*{rem}{Remark}
 \newtheorem*{note}{Note}
 \newtheorem*{notation}{Notation}
 \usepackage{pdfpages}