\documentclass[a4paper,11pt,openany,extrafontsizes]{memoir}
\input{preamble}
\begin{document}
\pagestyle{plain}
\tightlists%
\begin{titlingpage}
\begin{center}
\vspace{1cm}
\textsf{\Huge{University of Oxford}}\\
\vspace{1cm}
\includegraphics{branding/beltcrest.png}\\
\vspace{2cm}
\Huge{\thetitle}\\
\vspace{2cm}
\large{by\\[14pt]\theauthor\\[8pt]St Catherine's College}
\vfill
%% Inkscape L-system
%% [C]++[C]++[C]++[C]++[C]
%% B=DA++EA----CA[-DA----BA]++;C=+DA--EA[---BA--CA]+;D=-BA++CA[+++DA++EA]-;E=--DA++++BA[+EA++++CA]--CA;A=
% \begin{tikzpicture}
% \pgfdeclarelindenmayersystem{Penrose}{
% \symbol{M}{\pgflsystemdrawforward}
% \symbol{N}{\pgflsystemdrawforward}
% \symbol{O}{\pgflsystemdrawforward}
% \symbol{P}{\pgflsystemdrawforward}
% \symbol{A}{\pgflsystemdrawforward}
% \symbol{+}{\pgflsystemturnright}
% \symbol{-}{\pgflsystemturnleft}
% \rule{M->OA++PA----NA[-OA----MA]++}
% \rule{N->+OA--PA[---MA--NA]+}
% \rule{O->-MA++NA[+++OA++PA]-}
% \rule{P->--OA++++MA[+PA++++NA]--NA}
% \rule{A->}
% }
% \draw[lindenmayer system={Penrose, axiom=[N]++[N]++[N]++[N]++[N],
% order=2, angle=36, step=4pt}]
% lindenmayer system;
% \end{tikzpicture}
% \vspace{2.2cm}
\vfill
\large{A dissertation submitted in partial fulfilment of the requirements for the degree of\\
Master of Science in Statistical Science}\\
\vspace{.5cm}
\large{\emph{Department of Statistics,\\ 24--29 St Giles, Oxford, OX1 3LB}}\\
\vspace{1cm} \large{\thedate}
\end{center}
\end{titlingpage}
%\chapterstyle{hangnum}
%\chapterstyle{ell}
%\chapterstyle{southall}
\chapterstyle{wilsondob}
\frontmatter
\cleardoublepage%
\vspace*{3cm}
\begin{abstract}
Temporal networks are a mathematical model to represent interactions
evolving over time. As such, they have a multitude of applications,
from biology to physics to social networks. The study of dynamics on
networks is an emerging field, with many challenges in modelling and
data analysis.
An important issue is to uncover meaningful temporal structure in a
network. We focus on the problem of periodicity detection in
temporal networks, by partitioning the time range of the network and
clustering the resulting subnetworks.
For this, we leverage methods from the field of topological data
analysis and persistent homology. These methods have begun to be
employed with static graphs in order to provide a summary of
topological features, but applications to temporal networks have
never been studied in detail.
We cluster temporal networks by computing the evolution of
topological features over time. Applying persistent homology to
temporal networks and comparing various approaches have never been
done before, and we examine their performance side by side with a
simple clustering algorithm. Using a generative model, we show that
persistent homology is able to detect periodicity in the topological
structure of a network.
We define two types of topological features, with and without
aggregating the temporal networks, and multiple ways of embedding
them in a feature space suitable for machine-learning
applications. In particular, we examine the theoretical guarantees
and empirical performance of kernels defined on topological
features.
Topological insights prove to be useful in statistical learning
applications. Combined with the recent advances in network science,
they lead to a deeper understanding of the structure of temporal
networks.
\end{abstract}
\vspace*{\fill}
\cleardoublepage%
\chapter{Acknowledgements}%
\label{cha:acknowledgements}
I would like to thank my supervisors, Dr Heather Harrington, Dr Renaud
Lambiotte, and Dr Mason Porter, from the Mathematical Institute, for
their continuous support and guidance from the very beginning. They
have allowed me to pursue my interests in networks and topological
data analysis while providing me with resources, ideas, and
motivation. They remained available to answer my questions and listen
to my ideas, and provided invaluable feedback at every stage of the
project.
I would also like to thank Dr Steve Oudot from École polytechnique,
who was the first to introduce me to the field of topological data
analysis, which led me to the original idea for the project. He was
also very helpful during the project, giving me advice and updates on
recent advances in persistent homology.
I also want to acknowledge the students and staff of the Department of
Statistics and St Catherine's College, who always provided a
stimulating work environment, along with excellent discussions.
Finally, my thanks go to my family and friends for their interest in
my project, because trying to explain it to people not acquainted with
the topic was, and remains, the best way to clarify my ideas and
organise my thoughts.
\cleardoublepage%
\tableofcontents
\clearpage
\listoffigures
\begingroup
\let\clearpage\relax
\listofalgorithms%
\addcontentsline{toc}{chapter}{List of Algorithms}
\endgroup
\clearpage
\mainmatter%
\chapter{Introduction}%
\label{cha:introduction}
\section{Temporal networks analysis}%
\label{sec:temp-netw-analys}
Networks are one of the most important mathematical concepts developed
in the last few centuries. They allow the representation of
interconnected data and complex systems. As such, the concept has been
applied to a wide variety of problems, in biology and neuroscience,
physics, computer networks, and social science. In this context,
network science has emerged as a discipline of its own, combining
ideas and challenges from multiple fields of
study~\cite{newman_networks:_2010}.
\captionsetup[figure]{labelformat=empty}
\begin{wrapfigure}[15]{R}[.3cm]{.3\linewidth}
\centering
\vspace{-15pt}
\SetCoordinates[yAngle=45]
\begin{tikzpicture}[multilayer=3d]
\SetLayerDistance{-2}
\Plane[x=.5,y=.5,width=3,height=3,style={thin,dashed},layer=1]
\Plane[x=.5,y=.5,width=3,height=3,style={thin,dashed},layer=2]
\Plane[x=.5,y=.5,width=3,height=3,style={thin,dashed},layer=3]
\begin{Layer}[layer=1]
\node at (.5,.5)[below right] {Time $t_1$};
\end{Layer}
\begin{Layer}[layer=2]
\node at (.5,.5)[below right] {Time $t_2$};
\end{Layer}
\begin{Layer}[layer=3]
\node at (.5,.5)[below right] {Time $t_3$};
\end{Layer}
\Vertex[x=1.408,y=1.635,size=0.3,color=blue,opacity=0.7,layer=1]{0_1}
\Vertex[x=1.089,y=2.415,size=0.3,color=blue,opacity=0.7,layer=1]{1_1}
\Vertex[x=2.021,y=1.976,size=0.3,color=blue,opacity=0.7,layer=1]{2_1}
\Vertex[x=1.984,y=3.000,size=0.3,color=blue,opacity=0.7,layer=1]{3_1}
\Vertex[x=2.576,y=2.398,size=0.3,color=blue,opacity=0.7,layer=1]{4_1}
\Vertex[x=2.911,y=1.593,size=0.3,color=blue,opacity=0.7,layer=1]{5_1}
\Vertex[x=1.994,y=1.000,size=0.3,color=blue,opacity=0.7,layer=1]{6_1}
\Edge[](1_1)(3_1)
\Edge[](3_1)(4_1)
\Edge[](3_1)(4_1)
\Edge[](0_1)(5_1)
\Edge[](0_1)(5_1)
\Edge[](2_1)(5_1)
\Edge[](4_1)(5_1)
\Edge[](0_1)(6_1)
\Edge[](1_1)(6_1)
\Edge[](5_1)(6_1)
\Vertex[x=1.408,y=1.635,size=0.3,color=blue,opacity=0.7,layer=2]{0_2}
\Vertex[x=1.089,y=2.415,size=0.3,color=blue,opacity=0.7,layer=2]{1_2}
\Vertex[x=2.021,y=1.976,size=0.3,color=blue,opacity=0.7,layer=2]{2_2}
\Vertex[x=1.984,y=3.000,size=0.3,color=blue,opacity=0.7,layer=2]{3_2}
\Vertex[x=2.576,y=2.398,size=0.3,color=blue,opacity=0.7,layer=2]{4_2}
\Vertex[x=2.911,y=1.593,size=0.3,color=blue,opacity=0.7,layer=2]{5_2}
\Vertex[x=1.994,y=1.000,size=0.3,color=blue,opacity=0.7,layer=2]{6_2}
\Edge[](1_2)(2_2)
\Edge[](0_2)(3_2)
\Edge[](2_2)(4_2)
\Edge[](2_2)(6_2)
\Vertex[x=1.408,y=1.635,size=0.3,color=blue,opacity=0.7,layer=3]{0_3}
\Vertex[x=1.089,y=2.415,size=0.3,color=blue,opacity=0.7,layer=3]{1_3}
\Vertex[x=2.021,y=1.976,size=0.3,color=blue,opacity=0.7,layer=3]{2_3}
\Vertex[x=1.984,y=3.000,size=0.3,color=blue,opacity=0.7,layer=3]{3_3}
\Vertex[x=2.576,y=2.398,size=0.3,color=blue,opacity=0.7,layer=3]{4_3}
\Vertex[x=2.911,y=1.593,size=0.3,color=blue,opacity=0.7,layer=3]{5_3}
\Vertex[x=1.994,y=1.000,size=0.3,color=blue,opacity=0.7,layer=3]{6_3}
\Edge[](0_3)(2_3)
\Edge[](0_3)(2_3)
\Edge[](1_3)(4_3)
\Edge[](2_3)(5_3)
\Edge[](4_3)(5_3)
\Edge[](0_3)(6_3)
\Edge[](2_3)(6_3)
\Edge[](4_3)(6_3)
\end{tikzpicture}
\caption[Multilayer network.]{}%
\label{fig:multilayer}
\end{wrapfigure}
\captionsetup[figure]{labelformat=default}
An emerging trend in network science is the study of dynamics on
networks~\cite{holme_temporal_2012, holme_modern_2015,
porter_dynamical_2014}. Real-world systems, such as brains or social
groups, tend to evolve over time, and these changing networks have
given birth to the new field of network dynamics, where edges can
reconfigure over time. Mathematical modelling of temporal connectivity
patterns remain a difficult
problem~\cite{bassett_network_2017}. Recent advances in applied
mathematics have led to may concurrent representations, multilayer
networks~\cite{kivela_multilayer_2014} being one of the most
important.
Temporal networks bring new challenges in size, shape, and complexity
of data analysis, but also new opportunities through the development of
empirical methods and theoretical advances. One of these advances
is the development of generative models that can be used to infer the
dynamic mechanisms taking place in real-world
systems~\cite{bazzi_generative_2016, gauvin_randomized_2018,
petri_simplicial_2018}.
Moreover, network theory naturally exposes many links with
topology. The purpose of networks lies in the representation of
\emph{structure}, while topology is the study of spaces and
\emph{connectedness}. As topological methods gain traction in data
science and statistical learning, they are also applied to more
complex data representations, including
networks~\cite{horak_persistent_2009, petri_topological_2013,
stolz_persistent_2017}. Topological features naturally complement
more traditional network statistics by focusing on mesoscale
structure.
\section{Related work}%
\label{sec:related-work}
Topological data analysis (TDA) is a recent
field~\cite{carlsson_topology_2009}. It was originally focused on
point cloud data, with only a recent shift towards network
data~\cite{horak_persistent_2009}. Various methods have been
developed, the main one being the weight-rank clique filtration
(WRCF)~\cite{petri_topological_2013}. Other examples of applications of
TDA to networks using WRCF can be found in~\cite{otter_roadmap_2017}.
There have also been attempts to map the nodes of a network to points
in a metric space. For instance, the shortest-path distance between
nodes can be used to compute pairwise distances in the network. Note
that for this method to work properly, the network has to be
connected. Many methods can be used to build a simplicial complex from
a directed or undirected network~\cite{jonsson_simplicial_2008,
horak_persistent_2009}.
The main starting point for this project was the introduction of TDA
for the study of temporal networks
in~\cite{price-wright_topological_2015}. In that work,
topological methods were introduced to classify temporal networks
randomly generated by different models. The objective of that study
was to uncover the temporal structure of a network in order to inform
its partitioning into ``snapshots''. Different methods to partition a
network were compared for the first time, and topological features
appeared to be relevant for distinguishing various temporal
distributions.
Finally, there has been an increasing interest in using the
topological structure of a dataset as an additional source of
information for a statistical learning model. This has led to the
development of topological descriptors suitable for use in various
learning contexts. Previous work on vectorizations and kernels on
topological features will be useful in the analysis of the structure
of temporal networks.
\section{Contributions}%
\label{sec:contributions}
The main contributions of this work are the following:
\begin{itemize}
\item We make an attempt at partitioning temporal networks and
  clustering the resulting subnetworks, with an immediate application
  to detecting periodicity. Sliding windows and persistent homology have been used
in the context of periodicity detection
before~\cite{perea_sw1pers:_2015, perea_sliding_2017}, but never in
the context of temporal networks.
\item In general, topological methods have never been thoroughly
studied on temporal network data. The work
in~\cite{price-wright_topological_2015} is the first to introduce
the topic, but computation was limited due to the lack of available
libraries. Here, we introduce recent (from the last 2--3 years)
state-of-the-art topological methods and adapt them to temporal
networks.
\item Various methods to use topological features in a statistical
  learning context are presented, along with their trade-offs. The mathematical
background and practical considerations are leveraged to compare
them in the context of machine learning.
\item Finally, different topological approaches are compared. There
are different ways to build a simplicial filtration on a network,
and different ways of measuring distances between the outputs of
persistent homology in the context of machine learning. These
different methods are compared here with the objective of
periodicity detection in temporal networks.
\end{itemize}
\chapter{Graphs and Temporal Networks}%
\label{cha:temporal-networks}
\section{Definition and basic properties}%
\label{sec:defin-basic-prop}
In this section, we introduce the notion of temporal networks (or
temporal graphs). This is a complex notion, with many competing
definitions and interpretations.
After clarifying the notations, we restate the standard definition of
a non-temporal graph.
\begin{notation}
\begin{itemize}
\item $\mathbb{N}$ is the set of non-negative natural numbers
$0,1,2,\ldots$
\item $\mathbb{N}^*$ is the set of positive integers $1,2,\ldots$
\item $\mathbb{R}$ is the set of real numbers.
$\mathbb{R}_+ = \{x\in\mathbb{R} \;|\; x\geq 0\}$, and
$\mathbb{R}_+^* = \{x\in\mathbb{R} \;|\; x>0\}$.
\end{itemize}
\end{notation}
\begin{defn}[Graph]
A \emph{graph} is a pair $G = (V, E)$, where $V$ is a set of
\emph{nodes} (or \emph{vertices}), and $E \subseteq V\times V$ is a
set of \emph{edges}. A \emph{weighted graph} is defined by
$G = (V, E, w)$, where $w : E\mapsto \mathbb{R}_+^*$ is called the
\emph{weight function}.
\end{defn}
We also define some basic concepts that we will need later to build
simplicial complexes on graphs.
\begin{defn}[Clique]
A \emph{clique} is a set of nodes where each pair is adjacent. That
is, a clique $C$ of a graph $G = (V,E)$ is a subset of $V$ such that
for all $i,j\in C, i \neq j \implies (i,j)\in E$. A clique is said
to be \emph{maximal} if no node can be added to it such that the
  resulting set of nodes is still a clique.
\end{defn}
Temporal networks can be defined in the more general framework of
\emph{multilayer networks}~\cite{kivela_multilayer_2014}. However,
this definition is much too general for our simple applications, and
we restrict ourselves to edge-centric time-varying
graphs~\cite{casteigts_time-varying_2012}. In this model, the set of
nodes is fixed, but edges can appear or disappear at different times.
In this study, we restrict ourselves to discrete time stamps. Each
interaction is taken to be instantaneous.
\begin{defn}[Temporal network]\label{defn:temp-net}
A \emph{temporal network} is a tuple
$G = (V, E, \mathcal{T}, \rho)$, where:
\begin{itemize}
\item $V$ is a set of nodes,
\item $E\subseteq V\times V$ is a set of edges,
\item $\mathbb{T}$ is the \emph{temporal domain} (often taken as
$\mathbb{N}$ or any other countable set), and
$\mathcal{T}\subseteq\mathbb{T}$ is the \emph{lifetime} of the
network,
\item $\rho: E\times\mathcal{T}\mapsto\{0,1\}$ is the \emph{presence
function}, which determines whether an edge is present in the
network at each time stamp.
\end{itemize}
The \emph{available times} of an edge are the set
$\mathcal{I}(e) = \{t\in\mathcal{T}: \rho(e,t)=1\}$.
\end{defn}
Temporal networks can also have weighted edges. In this case, it is
possible to have constant weights (edges can only appear or disappear
over time, and always have the same weight), or time-varying
weights. In the latter case, we can set the codomain of the presence
function to be $\mathbb{R}_+$ instead of $\{0,1\}$, where by
convention a 0 weight corresponds to an absent edge.
\begin{defn}[Additive and dismantling temporal
networks]\label{defn:additive}
A temporal network is said to be \emph{additive} if for all $e\in E$
and $t\in\mathcal{T}$, if $\rho(e,t)=1$, then for all
$t'>t, \rho(e, t') = 1$. An additive network can only gain edges
over time.
A temporal network is said to be \emph{dismantling} if for all
$e\in E$ and $t\in\mathcal{T}$, if $\rho(e,t)=0$, then for all
$t'>t, \rho(e, t') = 0$. A dismantling network can only lose edges
over time.
\end{defn}
\section{Network statistics}%
\label{sec:network-statistics}
To analyse networks, one uses \emph{network statistics}:
low-dimensional summaries of important properties of a graph. Some of
them focus on local features, while others concentrate on global
aspects. Note that the following definitions only apply to static
graphs, and cannot be used directly on temporal networks.
These definitions are taken from the reference work by
Newman~\cite{newman_networks:_2010}.
The first network statistics try to determine which vertices are the
most \emph{central}, i.e.\ the most ``important'' in the network.
\begin{defn}[Local clustering coefficient]
The \emph{local clustering coefficient} of a vertex $v$ is defined as
\[ C(v) = \frac{\sum_{u,w\in \mathcal{V}}
    a_{u,v}a_{w,v}a_{u,w}}{\sum_{u,w\in \mathcal{V}, u\neq w}
    a_{u,v}a_{w,v}}, \]
  where $a$ denotes the adjacency matrix of the graph, i.e.\
  $a_{u,v} = 1$ if $(u,v)\in E$ and $a_{u,v} = 0$ otherwise.
The \emph{average clustering coefficient} is
\[ \overline{C} = \frac{1}{\left|\mathcal{V}\right|}
\sum_{v\in\mathcal{V}} C(v). \]
\end{defn}
\begin{defn}[Global clustering coefficient]
The \emph{global clustering coefficient} or \emph{transitivity} is
\[ C = \frac{3\times\text{number of triangles}}{\text{number of
connected triples}}. \]
\end{defn}
Another interesting summary is the average shortest path between
vertices.
\begin{defn}[Path]
A \emph{path} between two vertices $v_0$ and $v_n$ is a sequence of
vertices $(v_0, v_1, \ldots, v_n)$ such that every consecutive pair
of vertices $(v_i, v_{i+1})$ is connected by an edge.
The \emph{length} of a path is the number of edges traversed along
the path. The distance $l(u,v)$ between vertices $u$ and $v$ is
defined as the length of the shortest path between $u$ and $v$.
\end{defn}
\begin{defn}[Average shortest path length]
The \emph{average shortest path length} is defined as
\[ l =
\frac{1}{\left|\mathcal{V}\right|(\left|\mathcal{V}\right|-1)}
\sum_{u,v \in\mathcal{V}, u\neq v} l(u,v). \]
\end{defn}
Many other centrality measures exist, the most well-known being the
eigenvector centrality, Katz centrality, and PageRank. See chapter~7
of~\cite{newman_networks:_2010} for more details.
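
For concreteness, all of the statistics above can be computed with
standard graph libraries. A minimal sketch, assuming the
\texttt{networkx} Python package and an arbitrary example graph
(Zachary's karate club network, which ships with the library):
\begin{verbatim}
import networkx as nx

G = nx.karate_club_graph()                     # example graph (connected)

local_c = nx.clustering(G)                     # C(v) for every vertex v
avg_c = nx.average_clustering(G)               # average clustering coefficient
transitivity = nx.transitivity(G)              # global clustering coefficient
avg_path = nx.average_shortest_path_length(G)  # requires a connected graph

print(f"average clustering:    {avg_c:.3f}")
print(f"transitivity:          {transitivity:.3f}")
print(f"average shortest path: {avg_path:.3f}")
\end{verbatim}
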
% \section{Examples of applications}%
% \label{sec:exampl-appl}
% \section{Network partitioning}%
% \label{sec:network-partitioning}
% Temporal networks are a very active research subject, leading to
% multiple interesting problems. The additional time dimension adds a
% significant layer of complexity that cannot be adequately treated by
% the common methods on static graphs.
% Moreover, data collection can lead to large amount of noise in
% datasets. Combined with large dataset sized due to the huge number of
% data points for each node in the network, temporal graphs cannot be
% studied effectively in their raw form. Recent advances have been made
% to fit network models to rich but noisy
% data~\cite{newman_network_2018}, generally using some variation on the
% expectation-maximization (EM) algorithm.
% One solution that has been proposed to study such temporal data has
% been to \emph{partition} the time scale of the network into a sequence
% of smaller, static graphs, representing all the interactions during a
% short interval of time. The approach consists in subdividing the
% lifetime of the network in \emph{sliding windows} of a given length.
% We can then ``flatten'' the temporal network on each time interval,
% keeping all the edges that appear at least once (or adding their
% weights in the case of weighted networks).
% This partitioning is sensitive to two parameters: the length of each
% time interval, and their overlap. Of those, the former is the most
% important: it will define the \emph{resolution} of the study. If it is
% too small, too much noise will be taken into account; if it is too
% large, we will lose important information. There is a need to find a
% compromise, which will depend on the application and on the task
% performed on the network. In the case of a classification task to
% determine periodicity, it will be useful to adapt the resolution to
% the expected period: if we expect week-long periodicity, a resolution
% of one day seems reasonable.
% Once the network is partitioned, we can apply any statistical learning
% task on the sequence of static graphs. In this study, we will focus on
% classification of time steps. This can be used to detect periodicity,
% outliers, or even maximise temporal communities.
\chapter{Topological Data Analysis and Persistent Homology}%
\label{cha:tda-ph}
\section{Basic constructions}%
\label{sec:basic-constructions}
\subsection{Homology}%
\label{sec:homology}
Our goal is to understand the topological structure of a metric
space. For this, we can use \emph{homology}, which consists of
associating a vector space $H_i(X)$ to a metric space $X$ and a
dimension $i$. The dimension of $H_i(X)$ gives us the number of
$i$-dimensional components in $X$: the dimension of $H_0(X)$ is the
number of path-connected components in $X$, the dimension of $H_1(X)$
is the number of holes in $X$, and the dimension of $H_2(X)$ is the
number of voids.
Crucially, these vector spaces are robust to continuous deformation of
the underlying metric space (they are \emph{homotopy
invariant}). However, computing the homology of an arbitrary metric
space can be extremely difficult. It is therefore necessary to
approximate the space by a structure that is both combinatorial and
topological in nature.
\subsection{Simplicial complexes}%
\label{sec:simplicial-complexes}
To understand the topological structure of a metric space, we need a
way to decompose it into smaller pieces that, when assembled, preserve
the overall organisation of the space. For this, we use a structure
called a \emph{simplicial complex}, which is a kind of
higher-dimensional generalization of a graph.
The building block of this representation is the \emph{simplex},
which is the convex hull of an arbitrary set of points. Examples of
simplices include single points, segments, triangles, and tetrahedra
(in dimensions 0, 1, 2, and 3 respectively).
\begin{defn}[Simplex]
A \emph{$k$-dimensional simplex} $\sigma = [x_0,\ldots,x_k]$ is the
convex hull of the set $\{x_0,\ldots,x_k\} \subset \mathbb{R}^d$, where
$x_0,\ldots,x_k$ are affinely independent. $x_0,\ldots,x_k$ are
called the \emph{vertices} of $\sigma$, and the simplices defined by
the subsets of $\{x_0,\ldots,x_k\}$ are called the \emph{faces} of
$\sigma$.
\end{defn}
\begin{figure}[ht]
\centering
\begin{subfigure}[b]{.3\linewidth}
\centering
\begin{tikzpicture}
\tikzstyle{point}=[circle,thick,draw=black,fill=blue!30,%
inner sep=0pt,minimum size=15pt]
\node (a)[point] at (0,0) {a};
\end{tikzpicture}
\caption{Single vertex}
\end{subfigure}%
%
\begin{subfigure}[b]{.3\linewidth}
\centering
\begin{tikzpicture}
\tikzstyle{point}=[circle,thick,draw=black,fill=blue!30,%
inner sep=0pt,minimum size=15pt]
\node (a)[point] at (0,0) {a};
\node (b)[point] at (1.4,2) {b};
\begin{scope}[on background layer]
\draw[fill=blue!15] (a.center) -- (b.center) -- cycle;
\end{scope}
\end{tikzpicture}
\caption{Segment}
\end{subfigure}%
%
\begin{subfigure}[b]{.3\linewidth}
\centering
\begin{tikzpicture}
\tikzstyle{point}=[circle,thick,draw=black,fill=blue!30,%
inner sep=0pt,minimum size=15pt]
\node (a)[point] at (0,0) {a};
\node (b)[point] at (1.4,2) {b};
\node (c)[point] at (2.8,0) {c};
\begin{scope}[on background layer]
\draw[fill=blue!15] (a.center) -- (b.center) -- (c.center) -- cycle;
\end{scope}
\end{tikzpicture}
\caption{Triangle}
\end{subfigure}%
%
\caption{Examples of simplices.}%
\label{fig:simplex}
\end{figure}
We then need a way to meaningfully combine these basic building blocks
so that the resulting object can adequately reflect the topological
structure of the metric space.
\begin{defn}[Simplicial complex]
A \emph{simplicial complex} is a collection $K$ of simplices such
that:
\begin{itemize}
\item any face of a simplex of $K$ is a simplex of $K$,
  \item the intersection of two simplices of $K$ is either empty or a
    common face of both.
\end{itemize}
\end{defn}
\begin{figure}[ht]
\centering
\begin{tikzpicture}
\tikzstyle{point}=[circle,thick,draw=black,fill=blue!30,%
inner sep=0pt,minimum size=10pt]
\node (a)[point] {};
\node (b)[point,above right=1.4cm and 1cm of a] {};
\node (c)[point,right=2cm of a] {};
\node (d)[point,above right=.4cm and 2cm of b] {};
\node (e)[point,above right=.4cm and 2cm of c] {};
\node (f)[point,below right=.7cm and 1.3cm of c] {};
\node (g)[point,right=2cm of d] {};
\node (h)[point,below right=.4cm and 1.5cm of e] {};
\begin{scope}[on background layer]
\draw[fill=blue!15] (a.center) -- (b.center) -- (c.center) -- cycle;
\draw (b) -- (d) -- (g);
\draw (c.center) -- (e.center) -- (f.center) -- cycle;
\draw (d) -- (e) -- (h);
\end{scope}
\node (1)[point,right=2cm of g] {};
\node (2)[point,above right=.5cm and 1cm of 1] {};
\node (3)[point,below right=.5cm and 1cm of 2] {};
\node (4)[point,below left=1cm and .3cm of 3] {};
\node (5)[point,below right=1cm and .3cm of 1] {};
\node (6)[point,below left=1cm and .1cm of 5] {};
\node (7)[point,below right=1cm and .1cm of 4] {};
\node (8)[point,below right=.7cm and .7cm of 6] {};
\begin{scope}[on background layer]
\draw[fill=green!15] (1.center) -- (2.center) -- (3.center) -- (4.center) -- (5.center) -- cycle;
\draw (1) -- (4) -- (2) -- (5) -- (3) -- (1);
\draw[fill=blue!15] (6.center) -- (7.center) -- (8.center) -- cycle;
\draw (5) -- (6) -- (4) -- (7);
\end{scope}
\end{tikzpicture}
\caption[Example of a simplicial complex.]{Example of a simplicial
complex that has two connected components, two 2-simplices, and
    one 4-simplex.}%
\label{fig:simplical-complex}
\end{figure}
The notion of simplicial complex is closely related to that of a
hypergraph. One important distinction lies in the fact that a subset
of a hyperedge is not necessarily a hyperedge itself.
\subsection{Simplicial homology}%
\label{sec:simplicial-homology}
Using these definitions, we can define homology on simplicial
complexes~\cite{edelsbrunner_computational_2010,
chazal_introduction_2017}. In this section, we restrict to homology
with coefficients in $\mathbb{Z}_2$, the field with two elements.
\begin{defn}[$p$-chains]
  Let $K$ be a finite simplicial complex, and $p$ a non-negative
  integer. The space of \emph{$p$-chains} $C_p(K)$ of $K$ is the set
  of formal sums of $p$-simplices of $K$. More precisely, it is the
  $\mathbb{Z}_2$-vector space spanned by the $p$-simplices of $K$.
\end{defn}
Since the coefficients of $C_p(K)$ are in $\mathbb{Z}_2$, a $p$-chain
is simply a finite collection of $p$-simplices. The sum of two
$p$-chains is the symmetric difference of the two chains, i.e.\ the
collection of $p$-simplices that belong to either, but not both, of
the chains.
\begin{defn}[Boundary]
The \emph{boundary} of a $p$-simplex $\sigma$ is the $(p-1)$-chain
\[ \partial_p(\sigma) := \sum_{\tau\in K_{p-1},\; \tau\subset\sigma}
\tau, \] where $K_{p-1}$ is the set of $(p-1)$-simplices of $K$.
\end{defn}
As the $p$-simplices form a basis of $C_p(K)$, $\partial_p$ can be
extended into a linear map from $C_p(K)$ to $C_{p-1}(K)$, called the
\emph{boundary operator}. The elements of the kernel
$\mathrm{Ker}(\partial_p)$ are called the \emph{$p$-cycles} of
$K$. The image $\mathrm{Im}(\partial_p)$ is the space of
\emph{$p$-boundaries} of $K$.
\begin{lem}\label{lem:boundary}
The image of $\partial_{p+1}$ is a subset of the kernel of
$\partial_p$.
\end{lem}
\begin{proof}
The boundary of a boundary is always empty. To see this, consider
the boundary of a $(p+1)$-simplex $\sigma$. The boundary of $\sigma$
consists of all $p$-faces of $\sigma$. The boundary of this boundary
will contain each $(p-1)$-face of $\sigma$ twice, and since $1+1=0$
in $\mathbb{Z}_2$, we have that
\[ \partial_{p} \circ \partial_{p+1} \equiv 0. \]
This implies directly that
$\mathrm{Im}(\partial_{p+1}) \subset \mathrm{Ker}(\partial_p)$.
\end{proof}
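
As a concrete illustration, consider the $2$-simplex
$\sigma = [a, b, c]$ (the filled triangle of \autoref{fig:simplex}).
Its boundary is the $1$-chain
\[ \partial_2(\sigma) = [b,c] + [a,c] + [a,b], \]
and applying $\partial_1$ to this chain gives
\[ \partial_1\big([b,c] + [a,c] + [a,b]\big)
  = (b + c) + (a + c) + (a + b) = 2a + 2b + 2c = 0, \]
since every vertex appears exactly twice and $1 + 1 = 0$ in
$\mathbb{Z}_2$.
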
\begin{defn}[Homology]
For any $p\in\mathbb{N}$, the \emph{$p$-th (simplicial) homology
group} of a simplicial complex $K$ is the quotient vector space
\[ H_p(K) := \mathrm{Ker}(\partial_{p}) /
\mathrm{Im}(\partial_{p+1}). \]
The dimension $\beta_p(K)$ of $H_p(K)$ is called the \emph{$p$-th
Betti number} of $K$.
\end{defn}
Let us close this overview of simplicial homology with a look at induced
maps. Let $K$ and $L$ be two simplicial complexes and $f: K \mapsto L$
a simplicial map between them. Since $f$ maps each simplex of
$K$ linearly to a simplex of $L$, we can extend it to map a chain of $K$ to a
chain of $L$ of the same dimension. If $c = \sum a_i \sigma_i$ is a
$p$-chain in $K$, we can define $f_\#(c) = \sum a_i \tau_i$, where
$\tau_i = f(\sigma_i)$ if it has dimension $p$ and $\tau_i = 0$ if
$f(\sigma_i)$ has dimension less than $p$.
\begin{figure}[!ht]
\centering
\begin{tikzpicture}
\node (l dots left) at (0,0) {\ldots};
\node (cpl) at (3,0) {$C_p(L)$};
\node (cp1l) at (6,0) {$C_{p-1}(L)$};
\node (l dots right) at (9,0) {\ldots};
\node (k dots left) at (0,2) {\ldots};
\node (cpk) at (3,2) {$C_p(K)$};
\node (cp1k) at (6,2) {$C_{p-1}(K)$};
\node (k dots right) at (9,2) {\ldots};
\draw[->] (l dots left) -- (cpl) node [above,midway] {$\partial_{p+1}$};
\draw[->] (cpl) -- (cp1l) node [above,midway] {$\partial_{p}$};
\draw[->] (cp1l) -- (l dots right) node [above,midway] {$\partial_{p-1}$};
\draw[->] (k dots left) -- (cpk) node [above,midway] {$\partial_{p+1}$};
\draw[->] (cpk) -- (cp1k) node [above,midway] {$\partial_{p}$};
\draw[->] (cp1k) -- (k dots right) node [above,midway] {$\partial_{p-1}$};
\draw[->] (cpk) -- (cpl) node [right,midway] {$f_\#^{p}$};
\draw[->] (cp1k) -- (cp1l) node [right,midway] {$f_\#^{p-1}$};
\end{tikzpicture}
\caption{Induced maps and boundary operators.}%
\label{fig:induced-map}
\end{figure}
\begin{prop}
$f_\#$ commutes with the boundary operator:
\[ f_\# \circ \partial_K = \partial_L \circ f_\#. \]
\end{prop}
\begin{proof}
Consider $f_\#^p : C_p(K) \mapsto C_p(L)$, and let
$c = \sum a_i \sigma_i \in C_p(K)$. If $f(\sigma_i)$ has dimension
$p$, then all $(p-1)$-faces of $\sigma_i$ map to the corresponding
$(p-1)$-faces of $\tau_i$, and the proposition follows. On the other
hand, if $f(\sigma_i)$ has dimension less than $p$, then the
$(p-1)$-faces of $\sigma_i$ map to simplices of dimension less
  than $p-1$, with the possible exception of exactly two $(p-1)$-faces
whose images coincide and cancel each other. So we have that
$\partial_L(f_\#(\sigma_i)) = f_\#(\partial_K(\sigma_i)) =
0$. (See~\cite{edelsbrunner_computational_2010} for details.)
\end{proof}
\begin{cor}
$f_\#$ maps cycles to cycles, and boundaries to boundaries.
\end{cor}
Therefore, $f_\#$ defines a map over quotients, called the
\emph{induced map on homology} \[ f_*^p : H_p(K) \mapsto H_p(L). \]
\begin{prop}\label{prop:functor}
$f \mapsto f_*^p$ is a functor:
\begin{itemize}
\item if $f = \mathrm{id}_X$, then $f_*^p = \mathrm{id}_{H_p(X)}$,
\item if
$X \overset{f}{\longrightarrow} Y \overset{g}{\longrightarrow} Z$,
then ${(g \circ f)}_* = g_* \circ f_*$.
\end{itemize}
\end{prop}
%% homotopy equivalence?
The derivation of simplicial homology in this section used the field
$\mathbb{Z}_2$. It is however possible to define homology over any
field. The definition of the boundary operator needs to be adapted to
ensure that \autoref{lem:boundary} remains true, even when
$1 \neq -1$. In this dissertation, we consider only persistent
homology on $\mathbb{Z}_2$, as it is the field used in our
implementation. It is important to note however that changing the
field of the vector spaces can affect the homology and therefore the
topological features detected~\cite{zomorodian_computing_2005}.
\subsection{Filtrations}%
\label{sec:filtrations}
If we consider that a simplicial complex is a kind of
``discretization'' of a subset of a metric space, we realise that
there must be an issue of \emph{scale}. For our analysis to be
invariant under small perturbations in the data, we need a way to find
an appropriate scale parameter, one that captures the relevant topological
structure without either being dominated by small perturbations or
missing important smaller features.
To illustrate this, let us take the example of the Čech complex, one
of the most important tools to build a simplicial complex from a
metric space.
\begin{defn}[Nerve]
Let $\mathcal{S} = {(S_i)}_{i\in I}$ be a non-empty collection of
sets. The \emph{nerve} of $\mathcal{S}$ is the simplicial complex
whose vertices are the elements of $I$ and where
$(i_0, \ldots, i_k)$ is a $k$-simplex if, and only if,
$S_{i_0} \cap \cdots \cap S_{i_k} \neq \emptyset$.
\end{defn}
\begin{defn}[Čech complex]
Let $X$ be a point cloud in an arbitrary metric space, and
$\varepsilon > 0$. The \emph{Čech complex}
$\check{C}_\varepsilon(X)$ is the nerve of the set of
$\varepsilon$-balls centred on the points in $X$.
\end{defn}
% By the Nerve theorem~\cite{edelsbrunner_computational_2010}, we know
% that for any point cloud $X$ and any $\varepsilon > 0$,
% $\check{C}_\varepsilon(X)$ and $X$ have the same homology.
An example construction of a Čech complex is represented
in~\autoref{fig:cech-complex}. The simplicial complex depends on the
value of $\varepsilon$. To adequately represent the topological
structure of the underlying point cloud, it is necessary to consider
all possible values of $\varepsilon$ in order to capture all the
topological features.
\begin{figure}[ht]
\centering
\begin{tikzpicture}[scale=1.1,every node/.style={transform shape}]
\tikzstyle{point}=[circle,thick,draw=black,fill=blue!30,%
inner sep=0pt,minimum size=5pt]
\node (a)[point] {};
\node (b)[point,right=1.5cm of a] {};
\node (c)[point,above right=1cm and .7cm of a] {};
\node (d)[point,above=1.5cm of c] {};
\node (e)[point,above right=.8cm and 1.55cm of c] {};
\node (f)[point,above right=.5cm and 2.5cm of b] {};
\def\circlea{(a) circle (1cm)}
\def\circleb{(b) circle (1cm)}
\def\circlec{(c) circle (1cm)}
\def\circled{(d) circle (1cm)}
\def\circlee{(e) circle (1cm)}
\def\circlef{(f) circle (1cm)}
\draw[color=gray,dashed,thick] \circlea;
\draw[color=gray,dashed,thick] \circleb;
\draw[color=gray,dashed,thick] \circlec;
\draw[color=gray,dashed,thick] \circled;
\draw[color=gray,dashed,thick] \circlee;
\draw[color=gray,dashed,thick] \circlef;
\begin{scope}
\clip\circlea;
\fill[pattern=north east lines,pattern color=blue!40]\circleb;
\fill[pattern=north east lines,pattern color=blue!40]\circlec;
\end{scope}
\begin{scope}
\clip\circlec;
\fill[pattern=north east lines,pattern color=blue!40]\circleb;
\fill[pattern=north east lines,pattern color=blue!40]\circled;
\fill[pattern=north east lines,pattern color=blue!40]\circlee;
\end{scope}
\begin{scope}
\clip\circled;
\fill[pattern=north east lines,pattern color=blue!40]\circlee;
\end{scope}
\draw[<->] (b) -- +(1cm,0cm) node[left=.45cm,anchor=south] {$\varepsilon$};
\node (a1)[point,right=7cm of a] {};
\node (b1)[point,right=1.5cm of a1] {};
\node (c1)[point,above right=1cm and .7cm of a1] {};
\node (d1)[point,above=1.5cm of c1] {};
\node (e1)[point,above right=.8cm and 1.55cm of c1] {};
\node (f1)[point,above right=.5cm and 2.5cm of b1] {};
\begin{scope}[on background layer]
\draw[thick,fill=blue!15] (a1.center) -- (b1.center) -- (c1.center) -- cycle;
\draw[thick] (c1.center) -- (d1.center) -- (e1.center) -- cycle;
\end{scope}
\end{tikzpicture}
\caption[Example of a point cloud and the corresponding Čech
complex.]{Example of a point cloud (left), and the corresponding
Čech complex at level $\varepsilon$ (right). Dashed circles
represent the $\varepsilon$-balls used to construct the simplicial
complex.}%
\label{fig:cech-complex}
\end{figure}
This is the objective of \emph{filtered simplicial complexes}.
\begin{defn}[Filtration]\label{defn:filt}
A \emph{filtered simplicial complex}, or simply a \emph{filtration},
$K$ is a sequence ${(K_i)}_{i\in I}$ of simplicial complexes such
that:
\begin{itemize}
\item for any $i, j \in I$, if $i < j$ then $K_i \subseteq K_j$,
\item $\bigcup_{i\in I} K_i = K$.
\end{itemize}
\end{defn}
To continue the example of Čech filtrations, one can build a sequence
of simplicial complexes for each value of $\varepsilon > 0$. Due to
their construction, Čech complexes on a point cloud $X$ respect the
essential inclusion property:
\[ \forall \varepsilon, \varepsilon' > 0,\quad \varepsilon < \varepsilon'
\implies \check{C}_\varepsilon(X) \subseteq
\check{C}_{\varepsilon'}(X). \]
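
In practice, the closely related Vietoris--Rips filtration, which only
requires pairwise distances and is cheaper to compute than the Čech
filtration, is often used instead. A minimal sketch, assuming the
Dionysus~2 library (see \autoref{sec:algor-impl}) and an arbitrary
random point cloud:
\begin{verbatim}
import numpy as np
import dionysus as d

points = np.random.random((100, 2))   # arbitrary point cloud in the plane

# Vietoris-Rips filtration with simplices up to dimension 2 and
# scale parameter (edge length) at most 0.5.
filtration = d.fill_rips(points, 2, 0.5)

# Each simplex carries the scale at which it enters the filtration.
print(len(filtration), "simplices")
\end{verbatim}
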
\section{Persistent Homology}%
\label{sec:persistent-homology}
We can now compute the homology for each step in a filtration. This
leads to the notion of \emph{persistent
homology}~\cite{carlsson_topology_2009, zomorodian_computing_2005},
which gives all the information necessary to establish the topological
structure of a metric space at multiple scales.
\begin{defn}[Persistent homology]
The \emph{$p$-th persistent homology} of a simplicial complex
$K = {(K_i)}_{i\in I}$ is the pair
$({\{H_p(K_i)\}}_{i\in I}, {\{f_{i,j}\}}_{i,j\in I, i\leq j})$, where
for all $i\leq j$, $f_{i,j} : H_p(K_i) \mapsto H_p(K_j)$ is induced
by the inclusion map $K_i \mapsto K_j$.
\end{defn}
By functoriality (\autoref{prop:functor}),
$f_{k,j} \circ f_{i,k} = f_{i,j}$. Therefore, the functions $f_{i,j}$
allow us to link generators in each successive homology space in a
filtration.
Because each generator corresponds to a topological feature (connected
component, hole, void, and so on, depending on the dimension $p$), we
can determine whether it survives in the next step of the
filtration. We say that $x \in H_p(K_i)$ is \emph{born} in $H_p(K_i)$
if it is not in the image of $f_{i-1,i}$. We say that $x$ \emph{dies}
in $H_p(K_j)$ if $j > i$ is the smallest index such that
$f_{i,j}(x) = 0$. The half-open interval $[i, j)$ represents the
lifetime of $x$. If $f_{i,j}(x) \neq 0$ for all $j > i$, we say that
$x$ lives forever and its lifetime is the interval $[i, \infty)$.
The (birth time, death time) pairs depend on the
choice of basis for each homology space $H_p(K_i)$. However, by the
Fundamental Theorem of Persistent
Homology~\cite{zomorodian_computing_2005}, we can choose basis vectors
in each homology space such that the collection of half-open intervals
is well-defined and unique. This construction is called a
\emph{barcode}~\cite{carlsson_topology_2009}.
\section{Topological summaries: barcodes and persistence diagrams}%
\label{sec:topol-summ}
Although it contains relevant topological information, the persistent
homology defined in the previous section cannot be used directly in
statistical methods. \emph{Topological summaries} are a compact
representation of persistent homology as elements of a metric
space. This is particularly useful in the context of statistical
analysis, e.g.\ when one needs to compare the output of a given
dataset to a null model.
One possible approach is to define a space in which we can project
barcodes and study their geometric properties. One such space is the
space of \emph{persistence
diagrams}~\cite{edelsbrunner_computational_2010}.
\begin{defn}[Multiset]
A \emph{multiset} $M$ is a pair $(A, m)$, where $A$ is the
\emph{underlying set} of $M$, formed by its distinct elements, and
$m : A\mapsto\mathbb{N}^*$ is the \emph{multiplicity function}
giving the number of occurrences of each element of $A$ in $M$.
\end{defn}
\begin{defn}[Persistence diagrams]
A \emph{persistence diagram} is the union of a finite multiset of
points in $\overline{\mathbb{R}}^2$ with the diagonal
$\Delta = \{(x,x) \;|\; x\in\mathbb{R}^2\}$, where every point of
$\Delta$ has infinite multiplicity.
\end{defn}
One adds the diagonal $\Delta$ for technical reasons. It is convenient
to compare persistence diagrams by using bijections between them,
which requires that all diagrams have the same cardinality; including
the diagonal with infinite multiplicity guarantees this.
In some cases, the diagonal in the persistence diagrams can also
facilitate comparisons between diagrams, as points near the diagonal
correspond to short-lived topological features, so they are likely to
be caused by small perturbations in the data.
One can build a persistence diagram from a barcode by taking the union
of the multiset of (birth, death) pairs with the diagonal
$\Delta$. \autoref{fig:ph-pipeline} summarises the entire pipeline.
\begin{figure}[ht]
\centering
\begin{tikzpicture}
\tikzstyle{pipelinestep}=[rectangle,thick,draw=black,inner sep=5pt,minimum size=15pt]
\node (data)[pipelinestep] {Data};
\node (filt)[pipelinestep,right=1cm of data] {Filtered complex};
%% \node (barcode)[pipelinestep,right=1cm of filt] {Barcodes};
\node (dgm)[pipelinestep,right=1cm of filt] {Persistence diagram};
\node (interp)[pipelinestep,right=1cm of dgm] {Interpretation};
\draw[->] (data.east) -- (filt.west);
%% \draw[->] (filt.east) -- (barcode.west);
\draw[->] (filt.east) -- (dgm.west);
\draw[->] (dgm.east) -- (interp.west);
\end{tikzpicture}
\caption{Persistent homology pipeline.}%
\label{fig:ph-pipeline}
\end{figure}
One can define an operator $\dgm$ as the first two steps in the
pipeline. It constructs a persistence diagram from a subset of a
metric space, via persistent homology on a filtered complex.
We can now define several distances on the space of persistence
diagrams.
\begin{defn}[Wasserstein distance]\label{defn:wasserstein-dist}
The \emph{$p$-th Wasserstein distance} between two diagrams $X$ and
$Y$ is
\[ W_p[d](X, Y) = \inf_{\phi:X\mapsto Y} {\left[\sum_{x\in X} {d\left(x, \phi(x)\right)}^p\right]}^{1/p} \]
for $p\in [1,\infty)$, and:
\[ W_\infty[d](X, Y) = \inf_{\phi:X\mapsto Y} \sup_{x\in X} d\left(x,
\phi(x)\right) \] for $p = \infty$, where $d$ is a distance on
$\mathbb{R}^2$ and $\phi$ ranges over all bijections from $X$ to
$Y$.
\end{defn}
\begin{defn}[Bottleneck distance]\label{defn:bottleneck}
The \emph{bottleneck distance} is defined as the infinite
Wasserstein distance where $d$ is the uniform norm:
$d_B = W_\infty[L_\infty]$.
\end{defn}
The bottleneck distance is symmetric, non-negative, and satisfies the
triangle inequality. However, it is not a true distance, as one can
come up with two distinct diagrams with bottleneck distance 0, even
on multisets that do not touch the diagonal $\Delta$.
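
As a small worked example, consider two diagrams whose only
off-diagonal points are $x = (2,6)$ in $X$ and $y = (3,5)$ in $Y$.
Matching $x$ with $y$ (and the diagonals with each other) costs
$\max(|2-3|, |6-5|) = 1$ in the uniform norm, whereas matching each
point with its nearest diagonal point costs
$\max\big(\tfrac{6-2}{2}, \tfrac{5-3}{2}\big) = 2$. The infimum is
attained by the first matching, so $d_B(X, Y) = 1$.
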
\begin{figure}[ht]
\centering
\begin{tikzpicture}[dot/.style={draw,circle,inner sep=0pt,minimum size=4pt}]
\draw[thick,->] (-0.3,0) -- (6,0) node [right] {birth};
\draw[thick,->] (0,-0.3) -- (0,6) node [above right] {death};
\draw[very thick,dashed] (0,0) -- (5.5,5.5);
\node (b1) [dot,fill=blue!70] at (1,2) {};
\node (r1) [dot,fill=red!70] at (1,2.5) {};
\draw (b1) -- (r1);
\node (b2) [dot,fill=blue!70] at (1.5,4.5) {};
\node (r2) [dot,fill=red!70] at (2,5) {};
\draw (b2) -- (r2);
\node (b3) [dot,fill=blue!70] at (3,5.2) {};
\node (r3) [dot,fill=red!70] at (2.5,5.5) {};
\draw (b3) -- (r3);
\node (b4) [dot,fill=blue!70] at (2,3) {};
\draw (b4) -- (2.5,2.5);
\node (b5) [dot,fill=blue!70] at (4.7,5.3) {};
\draw (b5) -- (5,5);
\node (b6) [dot,fill=blue!70] at (0.8,1.2) {};
\draw (b6) -- (1,1);
\node (r4) [dot,fill=red!70] at (2.8,3.2) {};
\draw (r4) -- (3,3);
\node (r5) [dot,fill=red!70] at (3.4,4.2) {};
\draw (r5) -- (3.8,3.8);
\end{tikzpicture}
\caption{Bottleneck distance between two diagrams.}%
\label{fig:bottleneck}
\end{figure}
\section{Stability}%
\label{sec:stability}
One of the most important aspects of topological data analysis is that
it is \emph{stable} with respect to small perturbations in the
data. More precisely, the second step of the pipeline
in~\autoref{fig:ph-pipeline} is Lipschitz with respect to a suitable
metric on filtered complexes and the bottleneck distance on
persistence
diagrams~\cite{cohen-steiner_stability_2007,chazal_persistence_2014}. First,
we define a distance between subsets of a metric
space~\cite{oudot_persistence_2015}.
\begin{defn}[Hausdorff distance]
Let $X$ and $Y$ be subsets of a metric space $(E, d)$. The
\emph{Hausdorff distance} is defined by
\[ d_H(X,Y) = \max \left[ \sup_{x\in X} \inf_{y\in Y} d(x,y),
\sup_{y\in Y} \inf_{x\in X} d(x,y) \right]. \]
\end{defn}
We can now give an appropriate stability
property~\cite{cohen-steiner_stability_2007,chazal_persistence_2014}.
\begin{prop}
Let $X$ and $Y$ be subsets in a metric space. We have
\[ d_B(\dgm(X),\dgm(Y)) \leq d_H(X,Y). \]
\end{prop}
\section{Algorithms and implementations}%
\label{sec:algor-impl}
Many algorithms have been developed to compute persistent
homology. The first one developed, and by far the most commonly used
is the so-called standard algorithm, introduced for the field
$\mathbb{Z}_2$ in~\cite{edelsbrunner_topological_2000}, and for
general fields in~\cite{zomorodian_computing_2005}. This algorithm
operates on the sequentially on the column of a boundary matrix. Its
complexity is therefore cubic in the number of simplices in the worst
case. It has been proven that this bound is
hard~\cite{morozov_persistence_2005}.
Many algorithms have since been developed to deliver heuristic
speed-ups in the case of sparse matrices. There are both sequential
algorithms, such as the dual algorithm~\cite{de_silva_persistent_2011,
de_silva_dualities_2011}, and algorithms that introduce parallelism
in the computation, such as the distributed
algorithm~\cite{mcgeoch_distributed_2014}.
Many publicly available implementations of these algorithms have
appeared in the last few years. For a complete review and
benchmarks of these implementations,
see~\cite{otter_roadmap_2017}. Here, we focus on implementations that
provide a Python interface and implement common data structures, such
as filtrations and persistence diagrams. State-of-the-art libraries
include Ripser~\cite{bauer_ripser:_2018},
DIPHA~\cite{reininghaus_dipha_2018}, GUDHI~\cite{maria_gudhi_2014},
and Dionysus~\cite{morozov_dionysus:_2018}. GUDHI and Dionysus are
under active development, with new versions released recently,
exposing a complete Python API and implementing various algorithms,
including multifield persistence and cohomology.
In this project, Dionysus~2 has been selected for its ease of use,
good documentation, and good performance~\cite{otter_roadmap_2017}. We
only use persistent homology over the field
$\mathbb{Z}_2$. Dionysus is also one of the few libraries to implement
zigzag persistence (\autoref{sec:zigzag-persistence}).
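
To illustrate the full pipeline of \autoref{fig:ph-pipeline}, the
following sketch computes persistence diagrams for two arbitrary
random point clouds and compares them with the bottleneck distance. It
assumes the Dionysus~2 Python API as documented at the time of
writing; the point clouds and parameter values are illustrative.
\begin{verbatim}
import numpy as np
import dionysus as d

def diagrams(points, max_dim=2, max_radius=0.6):
    """Vietoris-Rips filtration -> persistent homology -> diagrams."""
    filtration = d.fill_rips(points, max_dim, max_radius)
    persistence = d.homology_persistence(filtration)  # Z_2 coefficients
    return d.init_diagrams(persistence, filtration)

cloud1 = np.random.random((80, 2))
cloud2 = np.random.random((80, 2))

dgms1 = diagrams(cloud1)
dgms2 = diagrams(cloud2)

# Compare the dimension-1 diagrams (holes) of the two clouds.
print("bottleneck distance:", d.bottleneck_distance(dgms1[1], dgms2[1]))
\end{verbatim}
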
% \section{Discussion}%
% \label{sec:discussion}
% information thrown away in filtrations and in PH
\chapter{Topological Data Analysis on Networks}%
\label{cha:topol-data-analys}
\section{Persistent homology for networks}%
\label{sec:pers-homol-netw}
We now consider the problem of applying persistent homology to network
data. An undirected network is already a simplicial complex of
dimension 1. However, this is not sufficient to capture enough
topological information; we need to introduce higher-dimensional
simplices. If the network is connected, one method is to project the
nodes of a network onto a metric space~\cite{otter_roadmap_2017},
thereby transforming the network data into point-cloud data. For
this, we need to compute the distance between each pair of nodes in
the network (e.g.\ with the shortest-path distance).
Various methods to project nodes onto a metric space (called
\emph{graph embeddings}) are
available~\cite{fouss_algorithms_2016}. These mappings try to preserve
the structure of the network as much as possible, e.g.\ by ensuring
that neighbours in the network are neighbours in the metric space
(according to the distance in that space), and vice-versa. A few
methods worth mentioning in this area are \emph{spectral methods},
which define the mapping according to the eigenvectors of a matrix
constructed from the graph. These methods have the advantage of
minimizing a well-defined criterion whose optimum can often be
computed exactly~\cite{fouss_algorithms_2016}. They include kernel principal
components analysis, multidimensional scaling, Markov diffusion maps
and Laplacian eigenmaps. Other methods are \emph{latent space methods},
which produce an embedding using a physical analogy, such as spring
networks and attractive forces. These methods are often used for graph
drawing (i.e.\ embedding in 2 or 3-dimensional spaces), but can only
be approximated for large networks~\cite{fouss_algorithms_2016}.
Using these graph embeddings, one can get a point cloud in a Euclidean
space, and build a simplicial complex using one of the various methods
developed for point clouds. One such example is the Čech complex
(\autoref{sec:filtrations}).
Another common method, for weighted networks, is called the
\emph{weight rank-clique filtration}
(WRCF)~\cite{petri_topological_2013}, which filters a network based
on weights. The procedure works as follows:
\begin{enumerate}
\item Consider the set of all nodes, without any edge, to be
filtration step~0.
\item Rank all edge weights in decreasing order $\{w_1,\ldots,w_n\}$.
\item At filtration step $t$, keep only the edges whose weights are
larger than or equal to $w_t$, thereby creating an unweighted graph.
\item Define the maximal cliques of the resulting graph to be
simplices.
\end{enumerate}
At each step of the filtration, we construct a simplicial complex
based on cliques; this is called a \emph{clique
complex}~\cite{zomorodian_tidy_2010}. The result of the algorithm is
itself a filtered simplicial complex (\autoref{defn:filt}), because a
subset of a clique is necessarily a clique itself, and the same is
true for the intersection of two cliques.
This leads to one of the possibilities for applying persistent
homology to temporal networks. One can apply WRCF on a network,
obtaining a filtered complex, to which we can then apply persistent
homology.
This method can quickly become very computationally expensive, as
finding all maximal cliques (e.g.\ using the Bron--Kerbosch algorithm)
is a complicated problem, with an optimal computational complexity of
$\mathcal{O}\big(3^{n/3}\big)$~\cite{tomita_worst-case_2006}. In
practice, one often restricts the search to cliques of dimension less
than or equal to a certain bound $d_M$. With this restriction, the new
simplicial complex is homologically equivalent to the original: they
have the same homology groups up to $H_{d_M-1}$.
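
The WRCF procedure translates directly into code. A minimal sketch,
assuming \texttt{networkx} for maximal-clique enumeration (via
Bron--Kerbosch), Dionysus~2 for the filtration data structure, integer
node labels, and a graph with a \texttt{weight} attribute on every
edge; the function name \texttt{wrcf\_filtration} and the dimension
bound \texttt{max\_dim} are illustrative:
\begin{verbatim}
import itertools
import networkx as nx
import dionysus as d

def wrcf_filtration(G, max_dim=3):
    """Weight rank clique filtration of a weighted graph (sketch)."""
    # Rank the distinct edge weights in decreasing order (steps 1, 2, ...).
    weights = sorted({w for _, _, w in G.edges(data="weight")}, reverse=True)
    step = {w: t for t, w in enumerate(weights, start=1)}

    # Filtration step 0: all nodes, without any edge.
    entry = {(v,): 0 for v in G.nodes()}

    # A clique enters the filtration at the step at which its
    # weakest (lowest-weight) edge appears.
    for clique in nx.find_cliques(G):            # maximal cliques
        for k in range(2, min(len(clique), max_dim + 1) + 1):
            for face in itertools.combinations(sorted(clique), k):
                t = max(step[G[u][v]["weight"]]
                        for u, v in itertools.combinations(face, 2))
                entry[face] = min(t, entry.get(face, t))

    filtration = d.Filtration()
    for face, t in entry.items():
        filtration.append(d.Simplex(list(face), t))
    filtration.sort()                            # order by filtration step
    return filtration
\end{verbatim}
Assigning each clique the rank of its weakest edge is equivalent to
the step-by-step construction described above, since a clique is
present at step $t$ exactly when all of its edges have weight at
least $w_t$.
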
\section{Zigzag persistence}%
\label{sec:zigzag-persistence}
The persistent homology methods exposed in the previous sections
operate on filtrations which are nested sequences of simplicial
complexes:
\[ \cdots \longrightarrow K_{i-1} \longrightarrow K_i \longrightarrow
K_{i+1} \longrightarrow \cdots, \] where each $\longrightarrow$
represents an inclusion map.
As we have seen in the previous section, filtrations can be built on
networks. Computing persistent homology therefore relies on
aggregating temporal networks, and then building a sequence of nested
simplicial complexes orthogonal to the time dimension.
Another approach would be to use the existing temporal sequence in the
network to build the filtration. The issue in this case is that the
sequence is no longer nested, as edges can be added or removed at each
time step (except for additive or dismantling temporal networks,
see~\autoref{defn:additive}). The development of \emph{zigzag
persistence}~\cite{carlsson_zigzag_2008, carlsson_zigzag_2009}
solves this issue by introducing a novel way to compute persistent
homology on sequences of complexes that are no longer nested:
\[ \cdots \longleftrightarrow K_{i-1} \longleftrightarrow K_i
\longleftrightarrow K_{i+1} \longleftrightarrow \cdots, \] where
each $\longleftrightarrow$ represents an inclusion map oriented
forwards or backwards.
To build this sequence from a temporal network, one can build a clique
complex at each time step. Edge additions and deletions will translate
to simplex additions and deletions in the sequence of simplicial
complexes. More details of this implementation are provided
in~\autoref{sec:zigzag-persistence-1}.
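
As a minimal illustration of this interface, assuming the Dionysus~2
zigzag API (the tiny complex and its entry and exit times are
arbitrary), the following sketch computes zigzag persistence for a
sequence in which an edge appears and later disappears.
\begin{verbatim}
import dionysus as d

# All simplices that ever appear, listed once.
f = d.Filtration([[0], [1], [0, 1]])

# For each simplex, the alternating times at which it is added and
# removed: both vertices are present from time 0 onwards, while the
# edge (0, 1) exists only on the interval [1, 2).
times = [[0.0], [0.0], [1.0, 2.0]]

zz, dgms, cells = d.zigzag_homology_persistence(f, times)

for dim, dgm in enumerate(dgms):
    for point in dgm:
        print(dim, point)   # (birth, death) pairs in each dimension
\end{verbatim}
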
Note that zigzag persistence is related to the more general concept of
\emph{multi-parameter persistence}~\cite{carlsson_theory_2009,
dey_computing_2014}, where simplicial complexes can be filtered with
more than one parameter. It is an active area of research, especially
as the fundamental theorem of persistent homology is no longer valid
with more than one parameter, and there are significant challenges in
the visualization of ``barcodes'' for 2-parameter persistent
homology~\cite{otter_roadmap_2017}.
The complexity of the zigzag algorithm is cubic in the maximum number
of simplices in the complex~\cite{carlsson_zigzag_2009}, which is
equivalent to the worst-case complexity of the standard algorithm for
persistent homology (\autoref{sec:algor-impl}). In practice however,
zigzag computations tend to be much slower than their standard
counterparts. Computing zigzag persistence on a temporal network is
more costly than computing persistent homology on the weight rank
clique filtration of the aggregated graph.
The library Dionysus~\cite{morozov_dionysus:_2018} is the only one to
implement zigzag persistence at the time of this writing. As
implementing the zigzag algorithm is not
straightforward~\cite{carlsson_zigzag_2009, maria_computing_2016},
Dionysus was the most logical option for the topological study of
temporal networks.
\chapter{Persistent Homology for Machine-Learning Applications}%
\label{cha:pers-homol-mach}
The output of persistent homology is not directly usable by most
statistical methods. For example, barcodes and persistence diagrams,
which are multisets of points in $\overline{\mathbb{R}}^2$, are not
elements of a metric space in which one can perform statistical
computations.
The distances between persistence diagrams defined
in~\autoref{sec:topol-summ} allow one to compare different
outputs. From a statistical perspective, it is possible to use a
generative model of simplicial complexes and to use a distance between
persistence diagrams to measure the similarity of our observations
with this null model~\cite{adler_persistent_2010}. This would
effectively define a metric space of persistence diagrams. It is even
possible to define some statistical summaries (means, medians,
confidence intervals) on these
spaces~\cite{turner_frechet_2014,munch_probabilistic_2015}.
The issue with this approach is that metric spaces do not offer enough
algebraic structure to be amenable to most machine-learning
techniques. Many of these methods, such as principal-components
analysis (PCA) and support vector machines (SVMs), require a Hilbert
structure on the feature space~\cite{carriere_sliced_2017,
chazal_persistence_2014}. Equipped with this structure, one can then
define common operations such as addition, average or scalar product
on features, which then facilitate their use in machine learning. One
of the most recent development in the study of topological summaries
has been to find mappings between the space of persistence diagrams
and Banach spaces\cite{adams_persistence_2017,
bubenik_statistical_2015, kwitt_statistical_2015,
kusano_kernel_2017}. (The definitions of common topological
structures can be found in \autoref{cha:topology}.)
\section{Vectorization methods}%
\label{sec:vect-meth}
The first possibility is to build an explicit feature map. Each
persistence diagram is projected into a vector of $\mathbb{R}^n$, on
which one can then build a suitable Hilbert structure.
The main examples in this category are persistence
landscapes~\cite{bubenik_statistical_2015} and persistence
images~\cite{adams_persistence_2017}.
\subsection{Persistence landscapes}
Persistence landscapes~\cite{bubenik_statistical_2015} give a way to
project barcodes to a space where it is possible to add them
meaningfully. It is then possible to define means of persistence
diagrams, as well as other summary statistics.
The function mapping a persistence diagram to a persistence landscape
is \emph{injective}, but no explicit inverse exists to go back from a
persistence landscape to the corresponding persistence
diagram. Moreover, a mean of persistence landscapes does not
necessarily have a corresponding persistence diagram.
\begin{defn}[Persistence landscape]
The persistence landscape of a diagram $D = {\{(b_i,d_i)\}}_{i=1}^n$
is the set of functions $\lambda_k: \mathbb{R} \mapsto \mathbb{R}$,
for $k\in\mathbb{N}$, such that
\[ \lambda_k(x) = k\text{-th largest value of } {\{f_{(b_i,
d_i)}(x)\}}_{i=1}^n, \] (and $\lambda_k(x) = 0$ if the $k$-th
largest value does not exist), where $f_{(b,d)}$ is a piecewise-linear
function defined by:
  \[ f_{(b,d)}(x) =
\begin{cases}
0,& \text{if }x \notin (b,d),\\
x-b,& \text{if }x\in (b,\frac{b+d}{2}),\\
-x+d,& \text{if }x\in (\frac{b+d}{2},d)\,.
\end{cases}
\]
\end{defn}
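To make the construction concrete, here is a minimal Python sketch
(Python being the language of the rest of the pipeline) that evaluates
the $k$-th landscape function on a one-dimensional grid. The
representation of a diagram as a NumPy array of (birth, death) pairs is
an assumption of the sketch, not part of the definition.
\begin{verbatim}
import numpy as np

def landscape(diagram, k, grid):
    """k-th persistence landscape function of a diagram (an (n, 2)
    array of (birth, death) pairs), evaluated on a 1-D grid."""
    b = diagram[:, 0][:, None]
    d = diagram[:, 1][:, None]
    # Tent functions f_{(b, d)} evaluated at every grid point.
    tents = np.maximum(np.minimum(grid[None, :] - b,
                                  d - grid[None, :]), 0)
    if k > tents.shape[0]:
        # The k-th largest value does not exist: lambda_k is zero.
        return np.zeros_like(grid, dtype=float)
    # k-th largest value at each grid point.
    return np.sort(tents, axis=0)[::-1][k - 1]
\end{verbatim}
Sampling the first few landscape functions on a common grid yields a
finite-dimensional vector, on which means and other summary statistics
can be computed directly.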
Moreover, one can show that persistence landscapes are stable with
respect to the $L^p$ distance, and that the Wasserstein and bottleneck
distances are bounded by the $L^p$
distance~\cite{bubenik_statistical_2015}. We can thus view the
landscapes as elements of a Banach space in which we can perform the
statistical computations.
\subsection{Persistence images}
Persistence images~\cite{adams_persistence_2017} consist of a
convolution of the persistence diagram with a probability
distribution, followed by a discretization of the resulting
distribution in order to obtain a finite-dimensional vector. Most of
the following section is derived from the original
paper~\cite{adams_persistence_2017}.
\begin{defn}[Persistence surface]
Let $B$ be a persistence diagram, and
$T : \mathbb{R}^2\mapsto\mathbb{R}^2$ the linear transformation
$T(x,y) = (x, y-x)$. Let $\phi_u:\mathbb{R}^2\mapsto\mathbb{R}$ be a
differentiable probability density function with mean
$u\in\mathbb{R}^2$, and $f$ a non-negative weighting function which
is zero along the horizontal axis, continuous, and piecewise
differentiable.
The \emph{persistence surface} associated to $B$ is the function
$\rho_B:\mathbb{R}^2\mapsto\mathbb{R}$ such that
\[ \rho_B(z) = \sum_{u\in T(B)} f(u) \phi_u(z). \]
\end{defn}
Then, one needs to reduce the persistence surface to a
finite-dimensional vector by discretizing a subdomain of $\rho_B$ and
integrating it over each region.
\begin{defn}[Persistence image]
Let $\rho_B$ be the persistence surface of a persistence diagram
$B$. We fix a grid on the plane with $n$ cells (called
\emph{pixels}). The \emph{persistence image} of $B$ is the
collection of pixels, where for each cell $p$,
\[ {I(\rho_B)}_p = \iint_p \rho_B \diff y \diff x. \]
\end{defn}
There are three parameters:
\begin{itemize}
\item the resolution of the grid overlaid on the persistence surface,
\item the probability distribution, which is often taken as a Gaussian
distribution centred on each point (one still needs to choose an
appropriate variance),
\item the weight function, which must be zero on the horizontal axis
(which corresponds to the diagonal $\Delta$ before transformation by
the function $T$), continuous, and piecewise differentiable in order
for the stability results to hold. Generally, weighting functions
are taken non-decreasing in $y$ in order to weight points of higher
persistence more heavily.
\end{itemize}
All of these choices are non-canonical, but the classification
accuracy on most tasks seems to be robust to the choice of resolution
and variance of the Gaussian
distribution~\cite{zeppelzauer_topological_2016}.
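As an illustration, the following minimal Python sketch computes a
persistence image by evaluating the persistence surface at the centre
of each pixel (thereby approximating the per-pixel integral), with a
Gaussian $\phi_u$ and a weight linear in the persistence coordinate;
the grid extent and all parameter values are arbitrary choices for the
example, not prescribed by the method.
\begin{verbatim}
import numpy as np

def persistence_image(diagram, resolution=20, sigma=0.1,
                      extent=(0.0, 1.0, 0.0, 1.0)):
    """Sketch of a persistence image: transform the diagram to
    (birth, persistence) coordinates, weight each point linearly by
    its persistence, and evaluate the resulting sum of Gaussians at
    the centre of each pixel."""
    # T(x, y) = (x, y - x): birth / persistence coordinates.
    pts = np.column_stack([diagram[:, 0],
                           diagram[:, 1] - diagram[:, 0]])
    weights = pts[:, 1]  # linear weight, zero on the horizontal axis
    x0, x1, y0, y1 = extent
    dx, dy = (x1 - x0) / resolution, (y1 - y0) / resolution
    xs = x0 + dx * (np.arange(resolution) + 0.5)  # pixel centres
    ys = y0 + dy * (np.arange(resolution) + 0.5)
    img = np.zeros((resolution, resolution))
    for (x, y), w in zip(pts, weights):
        gx = np.exp(-((xs - x) ** 2) / (2 * sigma ** 2))
        gy = np.exp(-((ys - y) ** 2) / (2 * sigma ** 2))
        # Approximate the integral over each pixel by the value at
        # its centre times the pixel area.
        img += w * np.outer(gy, gx) / (2 * np.pi * sigma ** 2) * dx * dy
    return img
\end{verbatim}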
It is also important to note that points with infinite persistence are
ignored by the weighting function $f$. Persistence images are
therefore not suitable in applications where these features can be
important to consider.
Persistence images are stable with respect to the 1-Wasserstein
distance between persistence diagrams (and with respect to the $L^1$,
$L^2$, and $L^\infty$ distances between
images)~\cite{adams_persistence_2017}.
In practice, persistence images are interesting because they project
persistence diagrams into a Euclidean space. Compared to persistence
landscapes, one can apply a broader range of machine-learning
techniques. It has also been observed that persistence images
outperform persistence landscapes in many classification tasks, with
comparable computational efficiency~\cite{adams_persistence_2017}.
% \subsection{Tropical and arctic semirings}
% \cite{kalisnik_tropical_2018}
\section{Kernel-based methods}%
\label{sec:kernel-based-methods}
The other possibility is to define feature maps \emph{implicitly} by
building kernels on persistence diagrams. Such a kernel allows one to
use a wide range of kernel-based machine-learning methods.
Let us recall the general framework of kernel
methods~\cite{muandet_kernel_2017, sejdinovic_advanced_2018}.
\begin{defn}[kernel]
  A function $k:X\times X \mapsto \mathbb{R}$ on a non-empty set $X$
  is a \emph{kernel} if there exists a Hilbert space $\mathcal{H}$ and
a map $\phi:X\mapsto\mathcal{H}$ such that
\[ \forall x, y \in X,\; k(x,y) = {\langle \phi(x), \phi(y)
\rangle}_{\mathcal{H}}. \]
The Hilbert space $\mathcal{H}$ is called the \emph{feature space}
and the function $\phi$ is called the \emph{feature map}.
\end{defn}
As inner products are positive definite, so are kernels, since they
are inner products of feature maps.
\begin{defn}[Reproducing kernel]
Let $\mathcal{H}$ be a Hilbert space of functions from a non-empty
  set $X$ to $\mathbb{R}$. A function $k:X\times X\mapsto\mathbb{R}$ is called
a \emph{reproducing kernel} of $\mathcal{H}$ if it satisfies:
\begin{itemize}
\item $\forall x\in X,\; k(\cdot,x)\in\mathcal{H}$,
\item
$\forall x\in X, \forall f\in\mathcal{H},\; {\langle f,
k(\cdot,x)\rangle}_{\mathcal{H}} = f(x)$.
\end{itemize}
\end{defn}
Note that every reproducing kernel is a kernel, with feature space
$\mathcal{H}$ and feature map $\phi:x \mapsto k(\cdot,x)$. In this
case, $\phi$ is called the \emph{canonical feature map}: the features
are not given explicitly as vectors of $\mathbb{R}^n$, but as functions on
$X$.
If $\mathcal{H}$ has a reproducing kernel, it is called a
\emph{reproducing kernel Hilbert space} (RKHS). The important result
here is the \emph{Moore-Aronszajn
theorem}~\cite{berlinet_reproducing_2011}: for every positive
definite function $k$, there exists a unique RKHS with kernel $k$.
We can now build a feature space with a Hilbert structure without
defining explicitly the feature map. Defining a kernel, i.e.\ any
symmetric positive definite function, on persistence diagrams is enough to
guarantee the existence of a unique RKHS with the adequate structure
to perform machine-learning tasks.
The following sections will define some relevant kernels.
\subsection{Sliced Wasserstein kernel}%
\label{sec:swk}
The sliced Wasserstein kernel is a new kernel on persistence diagrams
introduced by Carrière et al.\ in~\cite{carriere_sliced_2017}. The
general idea is to slice the plane with lines through the origin, to
project the points of the persistence diagrams onto these lines, and
to compute the distance between the diagrams as a distance between
measures on the real line. These distances are then integrated over
all possible lines through the origin.
The formal definition (taken from~\cite{carriere_sliced_2017}) relies
on the \emph{1-Wasserstein distance} between measures on~$\mathbb{R}$.
\begin{defn}[1-Wasserstein distance]
  Let $\mu$ and $\nu$ be two non-negative measures on $\mathbb{R}$ such
that $\mu(\mathbb{R}) = \nu(\mathbb{R})$. The 1-Wasserstein distance
between $\mu$ and $\nu$ is
  \[ \mathcal{W}(\mu, \nu) = \sup_{f} \int_{\mathbb{R}} f(x) \left[
      \mu(\diff x) - \nu(\diff x) \right], \]
  where the supremum is taken over all 1-Lipschitz functions $f$.
\end{defn}
One can now define formally the sliced Wasserstein kernel.
\begin{defn}[Sliced Wasserstein kernel]
  Let $\mathbb{S}_1$ be the unit circle in
  $\mathbb{R}^2$. Given $\theta\in\mathbb{S}_1$, let $L(\theta)$ be the
  line $\{\lambda\theta : \lambda\in\mathbb{R}\}$, and $\pi_\theta$
  the orthogonal projection onto $L(\theta)$. Let $\pi_\Delta$ be the
  orthogonal projection onto the diagonal $\Delta = \{(x,x) : x\in\mathbb{R}\}$.
Let $D_1$ and $D_2$ be two persistence diagrams, and let
\[\mu_1^\theta = \sum_{p\in D_1} \delta_{\pi_\theta(p)} \qquad\text{and}\qquad
\mu_{1\Delta}^\theta = \sum_{p\in D_1}
\delta_{\pi_\theta\circ\pi_\Delta(p)},\] and similarly for
$\mu_2^\theta$ and $\mu_{2\Delta}^\theta$.
The sliced Wasserstein distance is defined as
  \[ SW(D_1, D_2) = \frac{1}{2\pi} \int_{\mathbb{S}_1}
\mathcal{W}(\mu_1^\theta + \mu_{2\Delta}^\theta,\; \mu_2^\theta +
\mu_{1\Delta}^\theta) \diff\theta. \]
\end{defn}
One can show that $SW$ is negative
definite~\cite{carriere_sliced_2017}. The function $k_{SW}$ defined as
\[ k_{SW}(D_1, D_2) = \exp\left(-\frac{SW(D_1,D_2)}{2\sigma^2}\right) \]
is therefore a valid kernel, called the \emph{sliced Wasserstein
kernel}.
\paragraph{Stability}
It can be shown that the sliced Wasserstein distance is
\emph{equivalent} to the 1-Wasserstein distance between persistence
diagrams (\autoref{defn:wasserstein-dist}). (For a definition of
metric equivalence, see~\autoref{cha:topology}.)
\paragraph{Approximate computation}
In practice, $k_{SW}$ can be approximated by sampling $M$ directions
between $-\pi/2$ and $\pi/2$. For each direction $\theta_i$ and for
each persistence diagram $D$, one computes the scalar products between
the points of the diagram and $\theta_i$, and sorts them into a vector
$V_{\theta_i}(D)$. The $L_1$-distance between the vectors
corresponding to each diagram is then averaged over the sampled
directions:
\[ SW_M(D_1, D_2) = \frac{1}{M} \sum_{i=1}^M {\lVert V_{\theta_i}(D_1)
- V_{\theta_i}(D_2) \rVert}_1. \]
The complete approximate computation is detailed
in~\autoref{algo:swk}. It has a complexity of
$\mathcal{O}(MN\log(N))$, where $N$ is an upper bound on the
cardinality of the persistence diagrams.
\begin{algorithm}[ht]
\caption{Approximate computation of the sliced Wasserstein kernel.}\label{algo:swk}
\DontPrintSemicolon%
  \KwIn{$D_1 = \{p_1^1,\ldots,p_{N_1}^1\}, D_2 = \{p_1^2,\ldots,p_{N_2}^2\}, M$}
\KwOut{$SW$}
Add $\pi_\Delta(D_1)$ to $D_2$ and vice-versa\;
$SW \leftarrow 0$\;
$\theta \leftarrow -\pi/2$\;
$s \leftarrow \pi/M$\;
\For{$i \leftarrow 1$\KwTo$M$}{
Store the products $\langle p_k^1, \theta \rangle$ in an array $V_1$\;
Store the products $\langle p_k^2, \theta \rangle$ in an array $V_2$\;
Sort $V_1$ and $V_2$ in ascending order\;
$SW \leftarrow SW + s {\lVert V_1 - V_2 \rVert}_1$\;
$\theta \leftarrow \theta + s$\;
}
  $SW \leftarrow SW/\pi$\;
\end{algorithm}
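The following minimal Python sketch mirrors~\autoref{algo:swk}. It
assumes that persistence diagrams are stored as NumPy arrays of
(birth, death) pairs; the number of directions $M$ and the bandwidth
$\sigma$ are free parameters.
\begin{verbatim}
import numpy as np

def sliced_wasserstein(D1, D2, M=10):
    """Approximate sliced Wasserstein distance between two persistence
    diagrams, given as (n, 2) NumPy arrays of (birth, death) pairs."""
    def diag_proj(D):
        m = (D[:, 0] + D[:, 1]) / 2
        return np.column_stack([m, m])
    # Augment each diagram with the diagonal projection of the other,
    # so that both have the same number of points.
    A = np.vstack([D1, diag_proj(D2)])
    B = np.vstack([D2, diag_proj(D1)])
    sw = 0.0
    step = np.pi / M
    for i in range(M):
        theta = -np.pi / 2 + i * step
        direction = np.array([np.cos(theta), np.sin(theta)])
        v1 = np.sort(A @ direction)  # sorted projections onto L(theta)
        v2 = np.sort(B @ direction)
        sw += step * np.sum(np.abs(v1 - v2))
    return sw / np.pi

def sw_kernel(D1, D2, sigma=1.0, M=10):
    """Sliced Wasserstein kernel k_SW = exp(-SW / (2 sigma^2))."""
    return np.exp(-sliced_wasserstein(D1, D2, M) / (2 * sigma ** 2))
\end{verbatim}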
\subsection{Persistence scale-space kernel}%
\label{sec:pers-scale-space}
The persistence scale-space kernel
(PSSK)~\cite{reininghaus_stable_2015,kwitt_statistical_2015} is
another kernel on persistence diagrams. The following overview is
summarised from~\cite{reininghaus_stable_2015}. The general idea is to
represent a diagram $D$ as a sum of Dirac deltas centred on each point
of $D$. This representation is a natural projection onto the space of
functionals, which has a Hilbert structure.
However, this representation does not take into account the distance
of the points of $D$ to the diagonal. This is important since points
close to the diagonal represent short-lived features, and are
therefore more likely to be noise. To take this into account, the sum
of Dirac deltas is taken as the initial condition of a heat diffusion
on the half-plane above the diagonal, with a null boundary condition
on the diagonal itself.
This leads to the definition of the embedding as the solution of a
partial differential equation, which admits an explicit solution in the
form of a positive definite kernel between persistence diagrams. This
kernel also depends on a scale parameter, which allows one to control the
robustness of the embedding to noise.
This kernel also comes with stability guarantees, as it is
Lipschitz-continuous with respect to the 1-Wasserstein distance. It is
also fast, as the distance between two diagrams $D_1$ and $D_2$ can be
computed in $\mathcal{O}(\lvert D_1 \rvert \lvert D_2 \rvert)$, where
$\lvert D \rvert$ is the number of points in the diagram, or
approximated in $\mathcal{O}(\lvert D_1 \rvert + \lvert D_2 \rvert)$
with bounded error. In practice, empirical tests show that the
persistence scale-space kernel significantly outperforms the
persistence landscapes in shape classification
tasks~\cite{reininghaus_stable_2015}.
\subsection{Persistence weighted Gaussian kernel}%
\label{sec:pers-weight-gauss}
The persistence weighted Gaussian kernel
(PWGK)~\cite{kusano_kernel_2017} is actually a family of kernels on
persistence diagrams. Given a diagram $D$, one can define a measure
$\mu_D^w := \sum_{x\in D} w(x) \delta_x$, where $\delta_x$ is the
Dirac delta centred on $x$. The weight function $w$ can be chosen to
give more weight to points farther from the diagonal. One example is
$w(x) := \arctan\left(C {(\mathrm{death}(x) -
\mathrm{birth}(x))}^p\right)$, with $C>0$ and $p\in\mathbb{N}^*$.
Then, given a kernel $k$ and the corresponding RKHS $\mathcal{H}_k$,
\[ \mu_D^w \mapsto \sum_{x\in D} w(x) k(\cdot, x) \] is an embedding
of $\mu_D^w$ in $\mathcal{H}_k$. The persistence weighted Gaussian
kernel is obtained by choosing $k$ as the Gaussian kernel
$k_G(x,y) := \exp\left(-\frac{{\lVert x-y \rVert}^2}{2\sigma^2}
\right)$.
The PWGK is stable with respect to the bottleneck
distance~\cite{kusano_kernel_2017}, and allows for efficient
computation. If the persistence diagrams contain at most $n$ points,
computation of the kernel involves $\mathcal{O}(n^2)$ evaluations of
the kernel $k$. Similarly to the PSSK, an approximation is possible in
$\mathcal{O}(n)$.
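As an illustration, the inner product between the embeddings described
above can be evaluated directly with $\mathcal{O}(n^2)$ kernel
evaluations; the following Python sketch does so with the arctan
weight, again assuming diagrams given as NumPy arrays of (birth,
death) pairs.
\begin{verbatim}
import numpy as np

def pwgk_inner_product(D1, D2, sigma=1.0, C=1.0, p=1):
    """Direct O(n^2) evaluation of the inner product between the
    weighted kernel embeddings of two persistence diagrams,
    sum_{x in D1} sum_{y in D2} w(x) w(y) k_G(x, y)."""
    w = lambda D: np.arctan(C * (D[:, 1] - D[:, 0]) ** p)
    # Pairwise squared distances between points of D1 and D2.
    sq_dists = np.sum((D1[:, None, :] - D2[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2 * sigma ** 2))  # Gaussian kernel matrix
    return w(D1) @ K @ w(D2)
\end{verbatim}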
Experimental results on shape classification with SVMs show a
significant improvement in accuracy over the PSSK, persistence images,
and persistence landscapes~\cite{kusano_kernel_2017}.
\section{Comparison}%
\label{sec:comparison}
Every vectorization presented in the previous sections is injective and
stable with respect to some distance on the space of persistence
diagrams. None of them, however, is surjective, and no explicit
inverse exists.
Only one of these methods preserves the metric on the space of
persistence diagrams: the sliced Wasserstein kernel, due to its
equivalence to the 1-Wasserstein distance, as mentioned
in~\autoref{sec:swk}. As such, it is considered the state of the art
in kernel embeddings of persistence diagrams.
There are two broad classes of applications that require different
kinds of vectorization methods. On the one hand, if one needs to go
back from the feature space to the diagram space, the best bet is an
embedding that preserves distances, such as the sliced Wasserstein
kernel, or has strong stability guarantees, such as the persistence
weighted Gaussian kernel. These embeddings are best for distance-based
methods, such as multidimensional scaling or nearest neighbours
algorithms.
On the other hand, getting insights from individual points of a
diagram, in order to recover information about individual topological
features (such as cycles, holes, or voids), is a much harder, less
well-studied problem. For instance, to recover the topological
features of the mean of persistence diagrams, one would need to fit
one of the vectorization methods on the mean. For this, persistence
landscapes or images seem better suited.
This project focuses on clustering of networks. As such, conservation
of the metric and stability is extremely important. Due to the
theoretical guarantees, we will focus on the sliced Wasserstein
kernel, which is also significantly easier to implement in its
approximate version than the PSSK (which uses random Fourier
features~\cite{reininghaus_stable_2015}) and the PWGK (which uses the
fast Gauss transform~\cite{kusano_kernel_2017}).
\chapter{Temporal partitioning of networks}%
\label{cha:temp-part-netw}
\section{Problem statement}%
\label{sec:problem-description}
\subsection{Data}%
\label{sec:data}
Temporal networks represent an active and recent area of research. The
additional dimension adds complexity to the study of graphs. As such,
many methods that work well with graphs fail in the context of
temporal networks.
Temporal networks are much more difficult to visualize, which makes it
harder to uncover patterns
directly~\cite{holme_temporal_2012}. Moreover, there are many issues
in data collection. Complex interaction networks where each edge can
be either present or absent at each time step grow exponentially in
size with the number of nodes and the total data collection
time~\cite{holme_temporal_2012}. Empirical temporal networks also tend
to exhibit oversampling and noise, due to the nature of the
measurements. For instance, proximity networks can record an
interaction between two individuals if they walk close to each other
without actually interacting. New advances try to take into account
these limitations of data collection~\cite{sulo_meaningful_2010,
newman_network_2018}.
In this study, we will consider temporal networks with \emph{contact}
interactions. In this context, interactions between nodes are supposed
to have a duration of~0, and \emph{oversampling} is used to represent
a long interaction. For instance, in a network sampled every
5~seconds, an interaction lasting for 30~seconds will be recorded in 6
consecutive time steps.
\subsection{Sliding windows}%
\label{sec:sliding-windows}
One possible approach to the study of temporal networks is a partitioning
of the time scale using \emph{sliding windows}.
\begin{defn}[Temporal partitioning]\label{defn:partitioning}
  Let $G = (V, E, \mathcal{T}, \rho)$ be a temporal network, and let
$C = (c_1,\ldots,c_n)$ be a cover of $\mathcal{T}$ by non-empty
intervals of $\mathbb{N}$.
Then the sequence of temporal networks $(G_1,\ldots,G_n)$, where
$G_i = (V, E, c_i, \rho_i)$ and
$\rho_i(e, t) = \rho(e, t)\mathbb{1}_{t\in c_i}$, is a
\emph{temporal partitioning} of $G$.
The partitioning is \emph{uniform} if all intervals of $C$ have the
same length. This length is called the \emph{temporal resolution} of
the partitioning.
\end{defn}
In this project, we will only consider uniform partitionings of a
finite temporal domain in which the intersections of consecutive
intervals all have the same length. This common length is called the
\emph{overlap}.
\begin{figure}[!ht]
\centering
\begin{tikzpicture}
\def\t{2}
\def\s{.5}
\foreach \x in {0,...,4}
\draw[thick,|-|] ({2*\x*(\t - \s)},0) -- ({2*\x*(\t - \s) + \t},0);
\foreach \x in {0,...,3}
\draw[thick,|-|] ({2*\x*(\t - \s) + \t - \s},.4) -- ({2*\x*(\t - \s) + 2*\t - \s},.4);
\draw[<->] (0,-.2) -- (\t,-.2) node [below,midway] {$\Delta t$};
\draw[<->] ({\t - \s}, .6) -- (\t, .6) node [above,midway] {$s$};
% \draw[thick,->] (0,-1) -- ({8*(\t - \s) + \t},-1) node [above] {$t$};
\end{tikzpicture}
\caption{Uniform temporal partitioning with resolution $\Delta t$ and overlap $s$.}%
\label{fig:partitioning}
\end{figure}
The choice of temporal resolution and overlap has a significant
effect on the results of the analysis~\cite{ribeiro_quantifying_2013,
krings_effects_2012, sulo_meaningful_2010}. Different tasks may
require specific parameters. A large resolution can overlook a
significant pattern, while a small overlap may cut through significant
features, dividing them between two consecutive intervals.
\subsection{Classification}%
\label{sec:classification}
After partitioning the temporal network, it is possible to run any
kind of classification task on the resulting sequence of subnetworks.
If labels are available on each subnetwork, it is possible to run some
supervised learning tasks. In the more common case of unsupervised
learning, there are many possibilities, including the clustering of
all subnetworks, or the detection of change points, where the
structure of the network changes
fundamentally~\cite{peel_detecting_2014}.
In this dissertation, we focus on unsupervised clustering of the
subnetworks in order to detect periodicity in the original temporal
network. Most machine-learning algorithms cannot take temporal
networks directly as inputs. It is thus necessary to \emph{vectorize}
these networks, i.e.\ to project them onto a metric space with a
structure suitable to the algorithm used. For instance, one could use
traditional statistical summaries of networks
(\autoref{sec:network-statistics}), or the topological methods and
their vectorizations discussed in the previous chapters.
The choice of vectorization depends on the choice of the clustering
algorithm itself. Some machine-learning techniques, such as support
vector machines, require a Hilbert structure on the input
space~\cite{carriere_sliced_2017, hastie_elements_2009}, while some,
like $k$-nearest neighbours or agglomerative clustering, only require
a metric space~\cite{hastie_elements_2009}. The feature space will
therefore restrict the set of clustering algorithms available.
\subsection{Applications}%
\label{sec:applications}
The persistent homology pipeline can be used to determine different
properties of temporal networks. This study focuses on determining the
\emph{periodicity} of a temporal network. By clustering the
subnetworks obtained by partitioning the temporal domain into sliding
windows, it is possible to determine if a temporal network is periodic
in its topological structure, and if so, to estimate its
period.
\section{The analysis pipeline}%
\label{sec:analysis-pipeline}
\subsection{General overview}%
\label{sec:general-overview}
\begin{figure}[p]
\caption[Overview of the analysis pipeline.]{Overview of the
analysis pipeline. New approaches introduced in this study are
highlighted in \emph{italics}.}%
\label{fig:pipeline}
\centering
%\footnotesize
\begin{tikzpicture}[block_left/.style={rectangle,draw=black,thick,fill=white,text width=4cm,text centered,inner sep=6pt},
block_right/.style={rectangle,draw=black,thick,fill=white,text width=8cm,text ragged,inner sep=6pt},
line/.style={draw,thick,-latex',shorten >=0pt},
dashed_line/.style={draw,thick,dashed,-latex',shorten >=0pt}]
\matrix [column sep=2cm,row sep=1cm] {
\node {\Large\textbf{General approach}};
& \node {\Large\textbf{Specific pipeline}}; \\
\node (dataset)[block_left] {Dataset};
& \node (dataset_r)[block_right] {
Data sources
\begin{itemize}
\item Generative model (\ref{sec:gener-model-peri})
\begin{itemize}
\item Erdős-Rényi, Watts-Strogatz models
\item periodic distribution of interactions
\end{itemize}
\item Social networks data (\ref{sec:datasets})
\end{itemize}
}; \\
\node (representation)[block_left] {Data representation};
& \node (representation_r)[block_right]{
Temporal networks
\begin{itemize}
\item Definition (\ref{sec:defin-basic-prop})
\item Representation (\ref{sec:data-representation})
\end{itemize}
}; \\
\node (processing)[block_left] {Data processing};
& \node (processing_r)[block_right] {
Temporal partitioning \emph{(standard)} (\ref{sec:sliding-windows-1})\\
\emph{Novelty: Clustering of time windows}
}; \\
\node (analysis)[block_left] {Data analysis};
& \node (analysis_r)[block_right] {
Topological tools (\ref{sec:topological-analysis})\\
\emph{Novelty: application of TDA to temporal networks}\\
\emph{Novelty: comparison between WRCF and zigzag persistence for networks}
}; \\
\node (interpretation)[block_left] {Interpretation};
& \node (interpretation_r)[block_right] {
Clustering (\ref{sec:clustering})
\begin{itemize}
\item Distance matrix
\item[] \emph{Novelty: comparison between bottleneck distance and SW kernel}
\item Hierarchical clustering
\end{itemize}
}; \\
};
\begin{scope}[every path/.style=line]
\path (dataset) -- (representation);
\path (representation) -- (processing);
\path (processing) -- (analysis);
\path (analysis) -- (interpretation);
\end{scope}
\begin{scope}[every path/.style=dashed_line]
\path (dataset) -- (dataset_r);
\path (representation) -- (representation_r);
\path (processing) -- (processing_r);
\path (analysis) -- (analysis_r);
\path (interpretation) -- (interpretation_r);
\end{scope}
\end{tikzpicture}
\end{figure}
The analysis pipeline consists of several steps:
\begin{enumerate}
\item Load the data: temporal networks are often distributed as
\emph{interaction lists}. In these files, each line consists of two
nodes and a timestamp, and thus represents one contact
interaction. One can reconstruct the temporal network by extracting
all timestamps of a given edge and adding them as an edge
property. It is then easy to extract a subnetwork within a specific
time interval.
\item Interaction networks are sometimes directed. In these cases, it
is necessary to transform the network into an undirected one, as
  most methods (particularly topological methods, such as WRCF and
zigzag persistence) only work on undirected networks.
\item Using the methods discussed in~\autoref{sec:sliding-windows-1},
the temporal domain is segmented into sliding windows and a list of
subnetworks can be generated.
\item Features are extracted from each subnetwork. These features can
be constructed from different kinds of persistent homology on
networks, as discussed in~\autoref{sec:topological-analysis}.
\item Depending on the methods used, the feature space is equipped
with a metric that can make it a Hilbert space or a simple metric
space. In any case, a distance matrix representing pairwise
distances between each subnetwork is computed.
\item Hierarchical clustering is applied to the distance matrix.
\end{enumerate}
The whole analysis pipeline is summarised in~\autoref{fig:pipeline}.
\subsection{Data representation}%
\label{sec:data-representation}
The data is represented in the algorithms as multigraphs. Each edge is
associated with a timestamp (an integer). Two nodes can be linked by
multiple edges, each one of them representing a time at which the edge
is present.
This representation allows for easy filtering, as one can extract a
temporal network in a given time interval by keeping only the edges
whose timestamp is included in the interval. One can also build the
underlying aggregated graph by ``collapsing'' multiple edges into a
single one.
It is important to note that the nodes of the network are completely
static and always present. This follows the temporal networks model
adopted in~\autoref{defn:temp-net}.
\subsection{Sliding windows}%
\label{sec:sliding-windows-1}
As mentioned in~\autoref{sec:sliding-windows}, we consider temporal
networks whose temporal domain is a finite interval of $\mathbb{N}$.
For a temporal resolution $\Delta t$ and an overlap $s$, we compute
the temporal partitioning as follows.
\begin{enumerate}
\item Compute the length of the temporal domain $\mathcal{T}$.
\item Segment it into $N$ sliding windows of length $\Delta t$ with an
overlap $s$.
\item Each subnetwork in the sequence contains only the interactions
appearing during the corresponding sliding window.
\end{enumerate}
\begin{algorithm}[ht]
\caption{Temporal partitioning of network with sliding
windows.}\label{algo:partitioning}
\DontPrintSemicolon%
\SetKwData{Res}{res}
\SetKwData{Times}{times}
\SetKwData{WinLength}{window\_length}
\SetKwData{Windows}{windows}
\KwIn{Graph $G$, resolution \Res}
\KwOut{List of subnetworks \Windows}
\Times$\leftarrow$ list of timestamps in $G$\;
$\WinLength\leftarrow\Res\times (\max(\Times) - \min(\Times))$\;
\For{$i \leftarrow 0$ \KwTo$1/\Res - 1$}{
\Windows[$i$] $\leftarrow$ subnetwork of $G$ containing all nodes, and
edges whose timestamp is in
$\left[ \min(\Times) + \WinLength\times i, \min(\Times) +
\WinLength\times(i+1) \right]$\;
}
\end{algorithm}
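For illustration, a minimal Python version of this partitioning,
extended to handle an overlap $s$, could look as follows; it assumes
the network is given as a list of (source, target, timestamp)
interactions, whereas the actual implementation works on igraph
objects.
\begin{verbatim}
import numpy as np

def sliding_windows(interactions, delta_t, overlap=0):
    """Partition a list of (source, target, timestamp) interactions
    into uniform windows of length delta_t with the given overlap."""
    assert 0 <= overlap < delta_t
    times = np.array([t for _, _, t in interactions])
    t_min, t_max = times.min(), times.max()
    step = delta_t - overlap
    windows, start = [], t_min
    while start <= t_max:
        end = start + delta_t
        # Keep only the interactions falling inside the current window.
        windows.append([(u, v, t) for (u, v, t) in interactions
                        if start <= t < end])
        start += step
    return windows
\end{verbatim}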
\subsection{Topological analysis}%
\label{sec:topological-analysis}
The major novelty in this analysis is to introduce topological
features in temporal network analysis. The idea is that the techniques
introduced in~\autoref{cha:tda-ph} will reveal additional structure in
networks that is not captured by traditional methods, and is relevant
for detecting periodicity or other important properties of temporal
networks.
Here, two different approaches are presented and compared. One is
focusing on the topology of the aggregated graph, using weight-rank
clique filtration, while the other leverages the temporal dimension by
using the more recent advances in generalized persistence,
specifically zigzag persistence.
\subsubsection{Aggregated graph persistent homology}%
\label{sec:aggr-graph-pers}
The first possibility to introduce topological features into the
feature map is to use weight-rank clique filtration
(\autoref{sec:pers-homol-netw}) on the aggregated static graphs.
For this, we associate to each edge in the network a weight
corresponding to the number of time steps in which it is present. For
an edge $e$ and a time interval $c_i$ (keeping the notations
of~\autoref{defn:partitioning}), the weight associated to $e$ is
\[ w(e) = \sum_{t\in c_i} \rho(e,t). \]
The resulting graph is called the \emph{aggregated graph} of the
temporal network on the time interval $c_i$. This graph being
weighted, it is possible to compute persistence diagrams using
weight-rank clique filtration (the algorithm is exposed
in~\autoref{sec:pers-homol-netw}).
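As a minimal Python sketch, assuming again that a subnetwork is given
as a list of (source, target, timestamp) interactions, the aggregation
amounts to counting the distinct time steps at which each undirected
edge is present:
\begin{verbatim}
from collections import Counter

def aggregate(interactions):
    """Weighted aggregated graph of a temporal (sub)network: the weight
    of an edge is the number of time steps at which it is present,
    i.e. w(e) = sum_t rho(e, t)."""
    # Deduplicate repeated records of the same contact at the same time
    # and treat edges as undirected.
    present = {(frozenset((u, v)), t) for u, v, t in interactions}
    return dict(Counter(edge for edge, _t in present))
\end{verbatim}
The resulting edge weights are then used to build the weight rank
clique filtration.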
% \subsubsection{Traditional network statistics}%
% \label{sec:trad-netw-stat}
% Network statistics (\autoref{sec:network-statistics}) can be used as
% feature maps. First, temporal networks are transformed into static
% graphs by keeping all edges that appear at least once in the network.
% One can then combine different centrality measures, average degree,
% average shortest path length, and others statistical summaries into a
% vector which is used as a feature vector. Equipped with the Euclidean
% distance, this feature space form a metric space suitable for
% machine-learning tasks.
\subsubsection{Zigzag persistence}%
\label{sec:zigzag-persistence-1}
The main drawback of WRCF persistence is the loss of any kind of
temporal information in the network. Three nodes can be detected as
being very close to one another even though their contacts might have
been in separate time steps. We can avoid aggregating the temporal
networks by using \emph{generalised persistence}, specifically zigzag
persistence as exposed in~\autoref{sec:zigzag-persistence}.
In practice, zigzag persistence is more computationally expensive than
WRCF persistence~\cite{carlsson_zigzag_2009}, and leads to a lower
number of topological features in every dimension. Aggregating
networks tends to artificially create many cliques that do not
appear in the original temporal network.
To compute zigzag persistence, the algorithm needs the \emph{maximal
simplicial complex}, i.e.\ the union of all simplicial complexes in
  the sequence. In the case of temporal networks, this is the clique
complex of the aggregated graph. Zigzag persistence can then be
computed from the list of times at which each simplex enters or leaves
the complex. The following procedure determines these
times:
\begin{enumerate}
\item Determine the maximal simplicial complex by computing the
cliques in the aggregated graph.
\item For each time step $t$:
\begin{itemize}
\item Keep only the edges present at this time step (i.e.\ the edges
$e$ such that $\rho(e, t) = 1$).
\item Compute all the cliques in this network.
\end{itemize}
\item For each clique in the maximal simplicial complex, determine
where it is present and where it is absent in the sequence of lists
of cliques.
\item Finally, determine the transition times in the presence arrays.
\end{enumerate}
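A minimal Python sketch of this procedure follows. To keep the clique
enumeration self-contained it is restricted to simplices of dimension
at most two, whereas the full pipeline enumerates all cliques; a
subnetwork is assumed to be given as one edge set per time step.
\begin{verbatim}
def simplices_up_to_triangles(edges):
    """Vertices, edges and triangles (cliques of size <= 3) of an
    undirected edge set, each represented as a frozenset of nodes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    simplices = {frozenset([v]) for v in adj}
    simplices |= {frozenset((u, v)) for u, v in edges}
    for u, v in edges:
        for w in adj[u] & adj[v]:          # common neighbours form triangles
            simplices.add(frozenset((u, v, w)))
    return simplices

def transition_times(snapshots):
    """For each simplex of the maximal complex (union of the per-step
    complexes), the sorted list of time steps at which it appears or
    disappears. `snapshots` is a list of edge sets, one per time step."""
    per_step = [simplices_up_to_triangles(e) for e in snapshots]
    maximal = set().union(*per_step)
    times = {}
    for s in maximal:
        present = [s in step for step in per_step]
        trans = [0] if present[0] else []
        trans += [t for t in range(1, len(present))
                  if present[t] != present[t - 1]]
        times[s] = trans
    return times
\end{verbatim}
The resulting dictionary of transition times can then be converted into
the list of entry and exit times required by the zigzag persistence
implementation.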
This computation can be quite costly for large networks, even before
starting the main zigzag algorithm. The clique decision problem is indeed
NP-complete~\cite{karp_reducibility_2010}. This is thus by far
the most computationally expensive step of the analysis pipeline, and
is also more expensive than WRCF persistence.
\subsection{Clustering}%
\label{sec:clustering}
\subsubsection{Distance matrix}%
\label{sec:distance-matrix}
In order to cluster the subnetworks obtained by temporal partitioning
of the original network, one needs to introduce a notion of distance
between the topological features. Since the output of the previous
step in the analysis pipeline takes the form of persistence diagrams,
two options are possible: a standard measure of distance between
diagrams (see~\autoref{sec:topol-summ}), or one of the vectorization
or kernel-based methods exposed in~\autoref{cha:pers-homol-mach}.
One of the main contributions of this study is to compare the
performance of the bottleneck distance (\autoref{defn:bottleneck}) and
of the sliced Wasserstein kernel (\autoref{sec:swk}) in the context of
network clustering.
A distance matrix is obtained by computing pairwise distances between
each pair of subnetworks obtained during the temporal partitioning
step. One important remark is that the distances considered are
distances between individual persistence diagrams. However, persistent
homology returns a \emph{sequence} of such persistence diagrams for each
subnetwork, each diagram in the sequence corresponding to topological
features of a specific dimension. For the purposes of clustering,
0-dimensional features are not extremely interesting since they
correspond to connected components, and 2 or 3-dimensional diagrams
are often nearly empty except for very large subnetworks. It is
therefore appropriate to restrict our analysis to 1-dimensional
diagrams, which represent a good compromise. This is consistent with
what has been done for point cloud data
classification~\cite{carriere_sliced_2017}.
The bottleneck distance gives the space of persistence diagrams a
metric-space structure. Meanwhile, the sliced Wasserstein kernel embeds
this space into a Hilbert space (its RKHS), which can be required
for several machine-learning techniques, such as support-vector
machines (SVMs) or principal components analysis (PCA).
For the implementation, we use the approximate computation of the
sliced Wasserstein kernel (\autoref{algo:swk}) sampled along 10
directions, which is actually faster in practice than the computation
of the bottleneck distance. For the computation of the bottleneck
distance, the diagram points that go to infinity are
excluded. According to the definition of the bottleneck distance, if
two diagrams do not have the same number of infinite points, the
distance is automatically infinite, which does not work well in
clustering algorithms. Moreover, this does not interfere with the
comparison between the bottleneck distance and the sliced Wasserstein
kernel, since infinite points are ignored by the kernel anyway.
\subsubsection{Hierarchical clustering}%
\label{sec:hier-clust}
To simplify the interpretation of the analysis and the comparison
between the different approaches, the clustering algorithm used is
\emph{hierarchical clustering}~\cite{hastie_elements_2009}.
The main advantage is that it does not require knowing in advance the
number of clusters that one is looking for. The only input is the
dissimilarity matrix, obtained from a single metric. It is necessary
here to use an algorithm that does not require the observations
themselves, as in this case they take the form of a persistence
diagram instead of a numeric vector. Moreover, kernel-based methods
are not applicable to the bottleneck distance since it does not confer
a Hilbert structure to the space of persistence diagrams. By contrast,
hierarchical clustering only requires a valid measure of distance.
The hierarchical representation (or \emph{dendrogram}) is also
especially useful in the context of periodicity detection, since
periodicity can appear at various levels of the hierarchy.
Hierarchical clustering is performed in a bottom-up way, also called
\emph{agglomerative clustering}. Starting from the distance matrix,
with each observation in its own cluster, the algorithm merges rows
and columns at each step of the clustering, updating the distances
between the new clusters. To do so, it needs a notion of distance
between clusters. Several linkage criteria are possible for two
clusters $A$ and $B$ using a metric $d$:
\begin{itemize}
\item Complete linkage: the distance between $A$ and $B$ is the
maximum distance between their elements
\[ \max\left\{ d(x,y) : x\in A, y\in B \right\} \]
\item Single linkage: using the minimum distance
\[ \min\left\{ d(x,y) : x\in A, y\in B \right\} \]
\item Average linkage: using the mean distance between the elements
\[ \frac{1}{|A| |B|} \sum_{x\in A} \sum_{y\in B} d(x,y). \]
\end{itemize}
The implementation used is taken from the library
Scikit-Learn~\cite{pedregosa_scikit-learn:_2011}.
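Concretely, the clustering step can be sketched as follows: a kernel
Gram matrix is first converted into a distance matrix, which is then
passed to Scikit-Learn's agglomerative clustering with a precomputed
affinity. The number of clusters and the linkage are parameters of the
analysis.
\begin{verbatim}
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def kernel_to_distances(K):
    """Distance matrix induced by a kernel Gram matrix, using
    d(i, j)^2 = K[i, i] + K[j, j] - 2 K[i, j]."""
    diag = np.diag(K)
    sq = np.maximum(diag[:, None] + diag[None, :] - 2 * K, 0.0)
    return np.sqrt(sq)

def cluster_subnetworks(dist, n_clusters=10, linkage="average"):
    """Agglomerative clustering of the subnetworks from their pairwise
    distance matrix (bottleneck distances, or distances derived from
    the sliced Wasserstein Gram matrix)."""
    model = AgglomerativeClustering(n_clusters=n_clusters,
                                    affinity="precomputed",
                                    linkage=linkage)
    return model.fit_predict(dist)
\end{verbatim}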
\chapter{Results and Discussion}%
\label{cha:results-discussion}
\section{Data}%
\label{sec:data-1}
\subsection{Generative model for periodic temporal networks}%
\label{sec:gener-model-peri}
In order to detect periodicity, one can generate a random temporal
network with a periodic structure.
We first build a random Erdős-Rényi graph. Starting from this base
graph, we generate a temporal stream for each edge independently. This
generative model is inspired by previous work on periodic temporal
networks~\cite{price-wright_topological_2015}.
For each edge, we generate a sequence of times in a predefined time
range $T$. For this, we choose uniformly at random a number of
interactions $n$ in $[0, T/2]$. We then generate at random a sequence
of $n$ times in $[0, T]$ from a density
\[ f(t) = \sin(f t) + 1, \] where $f$ is the frequency. The times are
then sorted.
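A minimal Python sketch of this sampling step, using rejection sampling
against the constant bound~2 on the (unnormalised) density, could read
as follows; it corresponds to the function
\texttt{random\_edge\_presences} used in~\autoref{algo:generative-model}
below.
\begin{verbatim}
import numpy as np

def random_edge_presences(T, freq, rng=np.random):
    """Sample presence times for one edge: the number of interactions
    is uniform on [0, T/2], and each time is drawn on [0, T] from a
    density proportional to sin(freq * t) + 1, by rejection sampling
    against the constant bound 2. Times could be rounded to integers
    to match the discrete-time representation."""
    n = rng.randint(0, T // 2 + 1)
    times = []
    while len(times) < n:
        t = rng.uniform(0, T)
        if rng.uniform(0, 2) < np.sin(freq * t) + 1:
            times.append(t)
    return np.sort(np.array(times))
\end{verbatim}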
\begin{figure}[ht]
\centering
\begin{subfigure}[b]{0.2\linewidth}
\begin{tikzpicture}
\clip (0,0) rectangle (4.0,4.0);
\Vertex[x=1.408,y=1.635,size=0.3,color=blue,opacity=0.7]{0}
\Vertex[x=1.089,y=2.415,size=0.3,color=blue,opacity=0.7]{1}
\Vertex[x=2.021,y=1.976,size=0.3,color=blue,opacity=0.7]{2}
\Vertex[x=1.984,y=3.000,size=0.3,color=blue,opacity=0.7]{3}
\Vertex[x=2.576,y=2.398,size=0.3,color=blue,opacity=0.7]{4}
\Vertex[x=2.911,y=1.593,size=0.3,color=blue,opacity=0.7]{5}
\Vertex[x=1.994,y=1.000,size=0.3,color=blue,opacity=0.7]{6}
\Edge[](1)(3)
\Edge[](3)(4)
\Edge[](3)(4)
\Edge[](0)(5)
\Edge[](0)(5)
\Edge[](2)(5)
\Edge[](4)(5)
\Edge[](0)(6)
\Edge[](1)(6)
\Edge[](5)(6)
\end{tikzpicture}
\end{subfigure}
\begin{subfigure}[b]{0.2\linewidth}
\begin{tikzpicture}
\clip (0,0) rectangle (4.0,4.0);
\Vertex[x=1.408,y=1.635,size=0.3,color=blue,opacity=0.7]{0}
\Vertex[x=1.089,y=2.415,size=0.3,color=blue,opacity=0.7]{1}
\Vertex[x=2.021,y=1.976,size=0.3,color=blue,opacity=0.7]{2}
\Vertex[x=1.984,y=3.000,size=0.3,color=blue,opacity=0.7]{3}
\Vertex[x=2.576,y=2.398,size=0.3,color=blue,opacity=0.7]{4}
\Vertex[x=2.911,y=1.593,size=0.3,color=blue,opacity=0.7]{5}
\Vertex[x=1.994,y=1.000,size=0.3,color=blue,opacity=0.7]{6}
\Edge[](1)(2)
\Edge[](0)(3)
\Edge[](2)(4)
\Edge[](2)(6)
\end{tikzpicture}
\end{subfigure}
\begin{subfigure}[b]{0.2\linewidth}
\begin{tikzpicture}
\clip (0,0) rectangle (4.0,4.0);
\Vertex[x=1.408,y=1.635,size=0.3,color=blue,opacity=0.7]{0}
\Vertex[x=1.089,y=2.415,size=0.3,color=blue,opacity=0.7]{1}
\Vertex[x=2.021,y=1.976,size=0.3,color=blue,opacity=0.7]{2}
\Vertex[x=1.984,y=3.000,size=0.3,color=blue,opacity=0.7]{3}
\Vertex[x=2.576,y=2.398,size=0.3,color=blue,opacity=0.7]{4}
\Vertex[x=2.911,y=1.593,size=0.3,color=blue,opacity=0.7]{5}
\Vertex[x=1.994,y=1.000,size=0.3,color=blue,opacity=0.7]{6}
\Edge[](0)(2)
\Edge[](0)(2)
\Edge[](1)(4)
\Edge[](2)(5)
\Edge[](4)(5)
\Edge[](0)(6)
\Edge[](2)(6)
\Edge[](4)(6)
\end{tikzpicture}
\end{subfigure}
\begin{subfigure}[b]{0.2\linewidth}
\begin{tikzpicture}
\clip (0,0) rectangle (4.0,4.0);
\Vertex[x=1.408,y=1.635,size=0.3,color=blue,opacity=0.7]{0}
\Vertex[x=1.089,y=2.415,size=0.3,color=blue,opacity=0.7]{1}
\Vertex[x=2.021,y=1.976,size=0.3,color=blue,opacity=0.7]{2}
\Vertex[x=1.984,y=3.000,size=0.3,color=blue,opacity=0.7]{3}
\Vertex[x=2.576,y=2.398,size=0.3,color=blue,opacity=0.7]{4}
\Vertex[x=2.911,y=1.593,size=0.3,color=blue,opacity=0.7]{5}
\Vertex[x=1.994,y=1.000,size=0.3,color=blue,opacity=0.7]{6}
\Edge[](0)(1)
\Edge[](1)(2)
\Edge[](2)(3)
\Edge[](2)(3)
\Edge[](0)(4)
\Edge[](0)(4)
\Edge[](1)(4)
\Edge[](2)(4)
\Edge[](3)(5)
\Edge[](4)(6)
\Edge[](5)(6)
\end{tikzpicture}
\end{subfigure}
% \begin{subfigure}[b]{0.2\linewidth}
% \begin{tikzpicture}
% \clip (0,0) rectangle (4.0,4.0);
% \Vertex[x=1.408,y=1.635,size=0.3,color=blue,opacity=0.7]{0}
% \Vertex[x=1.089,y=2.415,size=0.3,color=blue,opacity=0.7]{1}
% \Vertex[x=2.021,y=1.976,size=0.3,color=blue,opacity=0.7]{2}
% \Vertex[x=1.984,y=3.000,size=0.3,color=blue,opacity=0.7]{3}
% \Vertex[x=2.576,y=2.398,size=0.3,color=blue,opacity=0.7]{4}
% \Vertex[x=2.911,y=1.593,size=0.3,color=blue,opacity=0.7]{5}
% \Vertex[x=1.994,y=1.000,size=0.3,color=blue,opacity=0.7]{6}
% \Edge[](0)(1)
% \Edge[](0)(3)
% \Edge[](1)(3)
% \Edge[](3)(5)
% \Edge[](1)(6)
% \end{tikzpicture}
% \end{subfigure}
\caption{Example of a random temporal network generated
by~\autoref{algo:generative-model}.}%
\label{fig:random-example}
\end{figure}
\begin{algorithm}[ht]
\caption{Random temporal network
generation.}\label{algo:generative-model}
\DontPrintSemicolon%
\SetKwData{Basegraph}{basegraph}
\SetKwData{TimeRange}{time\_range}
\SetKwData{Nodes}{nodes}
\SetKwData{EdgeProb}{edge\_prob}
\SetKwData{Frequency}{frequency}
\SetKwData{Network}{network}
\SetKwData{Times}{times}
\KwIn{\Nodes, \EdgeProb, \TimeRange, \Frequency}
\KwOut{\Network}
$\Basegraph\leftarrow \mathrm{ErdősRényi}(\Nodes, \EdgeProb)$\;
$\Network\leftarrow$ network with no edges and the vertices of \Basegraph\;
\For{$e\in \Basegraph.\mathrm{edges}$}{
$\Times\leftarrow \mathrm{random\_edge\_presences}(\TimeRange, \Frequency)$\;
\For{$t\in\Times$}{Add $(e.\mathrm{source}, e.\mathrm{target}, t)$ to \Network}
}
\end{algorithm}
The complete method to generate a random network is summarised
in~\autoref{algo:generative-model}. The function
\texttt{random\_edge\_presences} returns a sequence of periodic
times. An example of a small random network can be found
in~\autoref{fig:random-example}.
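For completeness, here is a minimal Python sketch of the whole
generative procedure, reusing the \texttt{random\_edge\_presences}
sketch above and drawing the Erdős-Rényi base graph directly; it
returns the network as a list of (source, target, time) triples.
\begin{verbatim}
import numpy as np
from itertools import combinations

def random_temporal_network(n_nodes, edge_prob, T, freq,
                            rng=np.random):
    """Sketch of the generative model: draw an Erdos-Renyi base graph,
    then periodic presence times for each of its edges."""
    interactions = []
    for u, v in combinations(range(n_nodes), 2):
        if rng.uniform() < edge_prob:   # edge kept in the base graph
            for t in random_edge_presences(T, freq, rng):
                interactions.append((u, v, t))
    return interactions
\end{verbatim}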
\subsection{Datasets}%
\label{sec:datasets}
The SocioPatterns dataset~\cite{isella_whats_2011} has been collected
during the \textsc{infectious} exhibition at the Science Gallery in
Dublin, Ireland from April 17th to July 17th, 2009. During this event,
a radio-frequency identification (RFID) device was embedded into each
visitor's badge (as part of an interactive exhibit). RFID devices
exchange radio packets when they are at close range of each other
(between 1~m and 1.5~m), in a peer-to-peer fashion. The data
collection process is described in detail
in~\cite{cattuto_dynamics_2010}.
The devices are configured so that face-to-face interactions between
two individuals are accurately recorded with a probability of 99\% over
a period of 20~s, which is an appropriate time scale to record social
interactions. False positives are also extremely rare as RFID devices
have a very limited range and multiple radio packet exchanges are
required to record an interaction.
The event in Dublin recorded more than 230,000 contact interactions
between more than 14,000 visitors. The data is made available both in
the form of daily-aggregated static
networks~\cite{noauthor_infectious_2011} and as a list of contact
interactions (each element being a timestamp and two nodes
IDs)~\cite{noauthor_infectious_2011-1}.
\begin{figure}[ht]
\centering
\includegraphics[width=0.9\textwidth]{fig/sociopatterns.jpg}
\caption[Aggregated networks for two different days of the
SocioPatterns dataset.]{Aggregated networks for two different days
of the SocioPatterns dataset. Nodes are colored from red to purple
according to their arrival
time. (Source:~\cite{isella_whats_2011}.)}%
\label{fig:sp_plot}
\end{figure}
The interaction times of the SocioPatterns dataset show that there
are limited interactions between visitors entering the exhibition more
than one hour apart (see~\autoref{fig:sp_plot}). A consequence of this
is that the paths realizing the diameter of the daily aggregated
graphs connect visitors entering the venue at successive times, as can
be seen in the figure.
Another interesting property of these interactions is their
duration. Most interactions last less than one minute, as can be
expected in the context of visitors in a museum exhibition. The
distribution of the interaction durations shows broad tails, decaying
slightly faster than a power law~\cite{isella_whats_2011}.
The temporal network has also been used in a variety of contexts from
percolation analysis and dynamical spreading to community
detection~\cite{isella_whats_2011}. These studies have confirmed that
topological criteria efficiently detect the edges acting as bridges
between communities~\cite{girvan_community_2002, holme_attack_2002}.
Many empirical temporal networks exhibit periodic
patterns~\cite{holme_modern_2015}. Many papers have explored
traditional network statistics and methods to uncover cyclic behaviour
in various datasets, mainly telecommunication
networks~\cite{jo_circadian_2012, aledavood_daily_2015,
holme_network_2003, aledavood_digital_2015}.
Visualizations show significant variations in the patterns of the
daily aggregated graphs between weekdays and weekends (see the
SocioPatterns poster in appendix). This project will attempt to apply
the topological methods on an empirical dataset to try to detect
periodicity.
\section{Computational environment}%
\label{sec:comp-envir}
The analysis pipeline described in~\autoref{sec:analysis-pipeline} is
entirely implemented in Python. For these tests, we use Python~3.5,
with Numpy~1.15.1. The library
Dionysus~2.0.7~\cite{morozov_dionysus:_2018} is used for persistent
homology, zigzag persistence, and bottleneck distance. Networks are
handled by igraph~0.7.1, and machine-learning algorithms are provided
by Scikit-Learn~0.19.2~\cite{pedregosa_scikit-learn:_2011}.
The program runs on a shared-memory system with 32 cores at 3.6~GHz,
756~GB of RAM, and 1.6~TB of storage. It runs Ubuntu Linux
16.04.5. Dionysus was compiled from the latest development version
using GCC~5.4.0 with the \texttt{-O3} optimization level.
\section{Results}%
\label{sec:results}
\subsection{Generative model}%
\label{sec:generative-model}
For this study, random networks have been generated with the following
parameters (keeping the notations
from~\autoref{sec:gener-model-peri}):
\begin{itemize}
\item the base graph $G$ is an Erdős-Rényi graph with 40 nodes and an
edge probability of 90\%,
\item the total time range $T$ for the sequence of times is 200,
\item the frequency $f$ is 15/200.
\end{itemize}
\autoref{fig:density} shows the density and sample times for a single
edge. A series of presence times like this is generated for each edge
in the base graph.
\begin{figure}[ht]
\centering
\includegraphics[width=.85\linewidth]{fig/density.pdf}
\caption[Example of periodic density for edge times
generation.]{Example of periodic density for edge times generation
(blue), with random edge times (red), and the sliding windows
(grey).}%
\label{fig:density}
\end{figure}
The generated temporal network is then subdivided into 20
subnetworks. The sliding windows are also represented
in~\autoref{fig:density}.
From these subnetworks, persistence is computed, in the form of
persistence diagrams. An example can be found
in~\autoref{fig:diagram}.
\begin{figure}[ht]
\centering
\includegraphics[width=.5\linewidth]{fig/diagram.pdf}
\caption{Example persistence diagram.}%
\label{fig:diagram}
\end{figure}
\autoref{fig:gen} represents the output of hierarchical clustering for
a random network, with zigzag and WRCF persistence, and the sliced
Wasserstein kernel and the bottleneck distance. This clustering is
representative of what is obtained by applying the same pipelines to
many temporal networks generated by the random model
of~\autoref{sec:gener-model-peri}.
\begin{figure}[ht]
\centering
\begin{subfigure}[b]{0.46\linewidth}
\centering
\includegraphics[width=\linewidth]{fig/gen_zz_k.pdf}
\caption{Zigzag persistence, sliced Wasserstein kernel}%
\label{fig:gen_zz_k}
\end{subfigure}\qquad
\begin{subfigure}[b]{0.46\linewidth}
\centering
\includegraphics[width=\linewidth]{fig/gen_wrcf_k.pdf}
\caption{WRCF, sliced Wasserstein kernel}%
\label{fig:gen_wrcf_k}
\end{subfigure}
\begin{subfigure}[b]{0.46\linewidth}
\centering
\includegraphics[width=\linewidth]{fig/gen_zz_b.pdf}
\caption{Zigzag persistence, bottleneck distance}%
\label{fig:gen_zz_b}
\end{subfigure}\qquad
\begin{subfigure}[b]{0.46\linewidth}
\centering
\includegraphics[width=\linewidth]{fig/gen_wrcf_b.pdf}
\caption{WRCF, bottleneck distance}%
\label{fig:gen_wrcf_b}
\end{subfigure}
\caption{Hierarchical clustering with 10 clusters of a random
temporal network.}%
\label{fig:gen}
\end{figure}
As we can see in the figure, the hierarchical clustering algorithm is
able to determine the periodicity of the temporal network when using
the sliced Wasserstein kernel. However, with the simple bottleneck
distance, the periodicity is not correctly detected. The periodicity
detection can be confirmed by moving further up in the dendrogram of
the clustering algorithm. With only 2 or 3 clusters, the low and high
sections of the density (\autoref{fig:density}) are still accurately
classified with the sliced Wasserstein kernel, while the subnetworks
are distributed randomly among the clusters with the bottleneck
distance.
Somewhat less clear is the comparison between zigzag persistence and
WRCF persistence. When generating many samples from the random
temporal network model, WRCF and sliced Wasserstein kernel clustering
is noisier and less consistent in its periodicity detection than its
zigzag persistence counterpart. This indicates that the aggregation of
the temporal subnetworks leads to the creation of artificial
topological features that introduce noise into the
dataset. (See~\autoref{sec:zigzag-persistence-1} for details on why
aggregation introduces artificial simplices in the temporal network.)
\subsection{SocioPatterns dataset}%
\label{sec:soci-datas}
\begin{figure}[ht]
\centering
\begin{subfigure}[b]{0.8\linewidth}
\centering
\includegraphics[width=.85\linewidth]{fig/sp_zz_k.pdf}
\caption{Zigzag persistence, sliced Wasserstein kernel}%
\label{fig:sp_zz_k}
\end{subfigure}
\begin{subfigure}[b]{0.8\linewidth}
\centering
\includegraphics[width=.85\linewidth]{fig/sp_wrcf_k.pdf}
\caption{WRCF, sliced Wasserstein kernel}%
\label{fig:sp_wrcf_k}
\end{subfigure}
\caption{Hierarchical clustering with 10 clusters of the SocioPatterns
dataset.}%
\label{fig:sp}
\end{figure}
In the study of the SocioPatterns dataset, we expect to uncover a
periodicity on a scale of a day. Therefore, we would like to partition
the time range of the dataset into windows approximately the length of
a day. However, this leads to very sparse subnetworks, which do not
exhibit enough topological features for a complete study. Since the
main periodicity in the dataset is expected to be a weekday/weekend
succession, we choose a resolution of two days.
The previous section has demonstrated that the sliced Wasserstein
kernel is the most suitable for uncovering periodicity in a temporal
network. The results, with zigzag persistence and WRCF, of the
hierarchical clustering algorithm are shown in~\autoref{fig:sp}.
\begin{figure}[ht]
\centering
\begin{subfigure}[b]{0.45\linewidth}
\includegraphics[width=\linewidth]{fig/gen_zz_gram1.pdf}
\caption{Generative model}%
\label{fig:gram_gen}
\end{subfigure}\qquad
\begin{subfigure}[b]{0.45\linewidth}
\includegraphics[width=\linewidth]{fig/sp_zz_gram1.pdf}
\caption{SocioPatterns dataset}%
\label{fig:gram_sp}
\end{subfigure}
\caption{Gram matrices of the sliced Wasserstein kernel with zigzag
persistence.}%
\label{fig:gram}
\end{figure}
However, the subnetworks do not cluster periodically in either
case. This is confirmed by visualizing the Gram matrix of the sliced
Wasserstein kernel (\autoref{fig:gram}). The Gram matrix obtained with
the generative model exhibits a cyclical pattern, while the one from
the SocioPatterns dataset does not show enough distinction between the
subnetworks.
It is unclear whether this is due to the analysis pipeline, or to the
temporal network itself not exhibiting enough periodicity on a
topological level. To confirm this, one would need to compare our
analysis with one using traditional network statistics, such as the
ones in~\cite{jo_circadian_2012, aledavood_daily_2015,
holme_network_2003, aledavood_digital_2015}. Other empirical
networks, such as telecommunication networks, may also exhibit more
obvious cyclical patterns, where topological features might be useful.
\chapter{Conclusions}%
\label{cha:conclusions}
\section{Topological data analysis of temporal networks}%
\label{sec:topol-data-analys}
Periodicity detection on our generative model has proven
successful. More importantly, topological features and persistent
homology seem to play an important part in the classification
task. The general idea of partitioning the time range of a temporal
network into sliding windows, and running an unsupervised clustering
algorithm, works in the context of periodicity detection.
More generally, we have introduced persistent homology and topological
data analysis methods for the study of temporal networks. Building on
previous work clustering different temporal network generative models
with persistent homology~\cite{price-wright_topological_2015}, we have
expanded both the methods used and the applications, solving the
real-world problem of periodicity detection. All in all, it is clear
that persistent homology is a promising new direction for the study of
temporal networks.
Topological data analysis is a recent field, with new methods and
approaches being constantly developed and improved. In this project,
we have compared different approaches. In the context of periodicity
detection, zigzag persistence is a small improvement over the
topological analysis of aggregated graphs using weight rank clique
filtration. If this result was confirmed by other studies, it would be
an interesting development, as it would imply that the temporal aspect
is essential and cannot be discarded easily when studying temporal
networks.
One of the most active research areas of topological data analysis has
been its applications in machine learning. Considerable efforts have
been deployed in the development of various vectorization techniques
to embed topological information into a feature space suitable for
statistical learning algorithms. In this project, a few of these
methods have been compared for their theoretical guarantees, and their
practical applications in periodicity detection. From a mathematical
point of view, kernels seem the most promising approach, by offering
strong stability properties and algebraic structures on the feature
space. This development leads to a broader class of applications in
machine learning where topological analysis can be useful. These
theoretical advances have translated into much better results for
periodicity detection. The simple bottleneck distance in the space of
diagrams (with the structure of a metric space) was not able to determine
any kind of periodicity in the random networks, whereas the sliced
Wasserstein kernel (embedding persistence diagrams in its RKHS, with
a metric equivalent to the distance in the space of persistence
diagrams) picked up the period accurately. This confirms previous work
on shape classification, where kernels on persistence diagrams
significantly outperformed other feature
embeddings~\cite{kusano_kernel_2017, reininghaus_stable_2015,
carriere_sliced_2017}.
Finally, we have tried to apply the same analysis to detect
periodicity on real-world data, the SocioPatterns dataset. Our model
was not able to detect a periodicity with a change of patterns between
the weekdays and the weekends. It is unclear whether this is due to
limits in some part of our analysis pipeline, or to the
periodicity in the network being non-topological in nature. A future
study might focus on combining topological features with traditional
network statistics.
\section{Future work}%
\label{sec:future-work}
Further study of topological features of temporal networks is
needed. We could imagine other applications than periodicity
detection, such as community
detection~\cite{girvan_community_2002}. Many standard methods are
difficult to adapt to temporal network models, and computational
topology could bring an additional perspective in these tasks, by
complementing traditional network statistics.
In the specific context of periodicity detection, this analysis can be
expanded by varying the parameters such as the resolution and the
overlap. It could be especially useful for inferring the period in a
temporal network.
One should also explore the other vectorization methods in the context
of periodicity detection. It would be interesting to know how
persistence images, or the other kernels, perform in this task. Last
but not least, it is essential to compare the performance of the
topological features with more traditional network statistics. It
would also be interesting to combine both aspects and use both sets of
features in machine-learning tasks.
Finally, temporal networks seem to be the ideal context to apply
multidimensional persistence. For instance, the weight rank clique
filtration adds a ``weight'' dimension to the existing time
dimension. In theory, it would be possible to use this by constructing
a 2-parameter filtration on the network, and computing persistence on
it.
\appendix
\chapter{Topology}%
\label{cha:topology}
In the following chapter, we recall a few essential definitions in
topology. This is in large part taken from~\cite{golse_mat321_2015}.
In this chapter, all vector spaces will be over a field $\mathbb{K}$,
which is either the field of real numbers $\mathbb{R}$ or the field of
complex numbers $\mathbb{C}$.
\section{Metric spaces}%
\label{sec:metric-spaces}
\begin{defn}[Distance, metric space]
  A map $d : X \times X \to \mathbb{R}^+$ is a \emph{distance}
  over $X$ if
\begin{enumerate}[(i)]
\item $\forall x,y\in X,\; d(x,y) = 0 \Leftrightarrow x=y$ (separation),
\item $\forall x,y\in X,\; d(x,y) = d(y,x)$ (symmetry),
\item $\forall x,y,z\in X,\; d(x,y) + d(y,z) \geq d(x,z)$ (triangle inequality).
\end{enumerate}
In this case, $(X, d)$ is called a \emph{metric space}.
\end{defn}
If $(X, d)$ is a metric space and $Y$ is a subset of $X$, then $Y$,
equipped with the restriction $d_Y$ of $d$ to $Y \times Y$, is itself
a metric space. $d_Y$ is called the \emph{induced metric} on $Y$.
%% example?
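
For example, the set $\mathbb{R}^n$ equipped with the Euclidean
distance
\[ d(x,y) := \sqrt{\sum_{i=1}^n {(x_i - y_i)}^2} \] is a metric
space. Another simple example is the \emph{discrete metric} on an
arbitrary set $X$, defined by $d(x,y) := 1$ if $x \neq y$ and
$d(x,x) := 0$.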
If $(X,d)$ is a metric space, then for all $x\in X$ and $r>0$, the set
\[ B(x,r) := \{y\in X : d(x,y) < r\} \] is called the \emph{open ball}
centered at $x$ and of radius $r$. The \emph{closed ball} centered at
$x$ and of radius $r$ is defined by
\[ B_c(x,r) := \{y\in X : d(x,y) \leq r\}. \]
%% example/figure?
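
For instance, in $\mathbb{R}$ with the usual distance
$d(x,y) = \lvert x-y \rvert$, the open ball $B(x,r)$ is the open
interval $(x-r, x+r)$, while the closed ball $B_c(x,r)$ is the closed
interval $[x-r, x+r]$.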
An important class of metric spaces is the one where the set $X$ is
itself a normed vector space.
\begin{defn}[Norm]
  Let $V$ be a vector space over $\mathbb{K}$. A map
  $N: V\to\mathbb{R}^+$ is a \emph{norm} over $V$ if
\begin{enumerate}[(i)]
\item $\forall x\in V,\; N(x) = 0 \Leftrightarrow x=0$,
\item
$\forall x\in V, \forall \lambda\in\mathbb{K},\; N(\lambda x) =
\lvert\lambda\rvert N(x)$,
  \item $\forall x,y\in V,\; N(x) + N(y) \geq N(x+y)$.
\end{enumerate}
\end{defn}
Let $(V,N)$ be a normed vector space. For every subset $X$ of $V$, one
can define $d(x,y) := N(x-y)$ for all $x,y\in X$. Using the properties
of the norm $N$, one can check easily that $d$ is a distance, and
therefore $(X,d)$ is a metric space.
%% examples?
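
For example, on $\mathbb{R}^n$, the maps
\[ \lVert x \rVert_1 := \sum_{i=1}^n \lvert x_i \rvert, \qquad \lVert
x \rVert_2 := \sqrt{\sum_{i=1}^n x_i^2}, \qquad \lVert x
\rVert_\infty := \max_{1\leq i\leq n} \lvert x_i \rvert \] are all
norms, and each of them induces a distance on $\mathbb{R}^n$ as
above.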
Many different norms can be defined on a vector space, which raises
the need to compare them.
\begin{defn}[Norm equivalence]
  Let $V$ be a vector space. Two norms $N_1$ and $N_2$ on $V$ are said
  to be \emph{equivalent} if there exist two positive constants $C_1$
  and $C_2$ such that
\[ \forall x\in V,\quad N_1(x) \leq C_1 N_2(x) \quad\text{and}\quad N_2(x)
\leq C_2 N_1(x). \]
\end{defn}
Geometrically speaking, two norms are equivalent if the unit ball for
the norm $N_1$ contains a ball of positive radius centred at $0$ for
the norm $N_2$, and vice-versa.
%% all norms are equivalent in a vector space of finite dimension?
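
In a vector space of finite dimension, all norms are equivalent. For
instance, on $\mathbb{R}^n$ one has
\[ \forall x\in\mathbb{R}^n,\quad \lVert x \rVert_\infty \leq \lVert
x \rVert_1 \leq n \lVert x \rVert_\infty, \] so the norms
$\lVert\cdot\rVert_1$ and $\lVert\cdot\rVert_\infty$ defined above
are equivalent. This is no longer true in infinite dimension.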
\section{Completeness}%
\label{sec:completeness}
\begin{defn}[Convergence]
A sequence ${(x_n)}_{n\in\mathbb{N}}$ of elements of a metric space
  $(X,d)$ \emph{converges} to a limit $x\in X$ if
\[ \lim_{n\rightarrow\infty} d(x_n,x) = 0. \]
\end{defn}
\begin{defn}[Cauchy sequence]
A sequence ${(x_n)}_{n\in\mathbb{N}}$ of elements of a metric space
$(X,d)$ is a \emph{Cauchy sequence} if
\[ \forall\varepsilon>0, \exists n_0\in\mathbb{N},\; \text{such
that:}\; \forall n,m\geq n_0,\; d(x_n, x_m) < \varepsilon. \]
\end{defn}
Note that every convergent sequence is a Cauchy sequence, but the
converse is not true in general.
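
For instance, in $\mathbb{Q}$ with the usual distance, the sequence
of decimal approximations of $\sqrt{2}$, i.e.\ $1, 1.4, 1.41, 1.414,
\ldots$, is a Cauchy sequence, but it does not converge in
$\mathbb{Q}$ since its limit in $\mathbb{R}$ is irrational.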
\begin{defn}[Completeness]
A metric space $(X,d)$ is \emph{complete} if, and only if, every
Cauchy sequence converges to an element of $X$.
\end{defn}
%% examples?
%% properties? would need open/close
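
For example, $\mathbb{R}^n$ (with the distance induced by any of the
norms above) is complete, whereas $\mathbb{Q}$ is not, as shown by the
previous sequence of approximations of $\sqrt{2}$. Similarly, the open
interval $(0,1)$ with the usual distance is not complete: the sequence
${(1/n)}_{n\geq 2}$ is Cauchy but its limit $0$ does not belong to
$(0,1)$.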
\begin{defn}[Banach space]
A \emph{Banach space} is a complete normed vector space.
\end{defn}
\section{Hilbert spaces}%
\label{sec:hilbert-spaces}
In this section, vector spaces are defined over~$\mathbb{C}$. The
theory extends easily to vector spaces over~$\mathbb{R}$.
A map $L$ between two $\mathbb{C}$-vector spaces $V$ and $W$
is said to be \emph{anti-linear} if
\[ \forall \lambda,\mu\in\mathbb{C}, \forall x,y\in V,\; L(\lambda x +
  \mu y) = \bar{\lambda} L(x) + \bar{\mu} L(y). \]
\begin{defn}[Hermitian product]
  A map $\langle\cdot,\cdot\rangle : V \times V \to \mathbb{C}$ is
\begin{enumerate}[(i)]
\item a \emph{sesquilinear form} if $x\mapsto\langle x,y \rangle$ is
linear and $y\mapsto\langle x,y \rangle$ is anti-linear,
\item a \emph{Hermitian form} if it is sesquilinear and
$\langle x,y \rangle = \overline{\langle y,x \rangle}$ for all
$x,y\in V$,
  \item a \emph{Hermitian product} if it is a positive-definite
    Hermitian form, i.e.\ if $\langle x,x \rangle > 0$ for all $x \neq 0$.
\end{enumerate}
\end{defn}
\begin{rem}
In the case of vector spaces over $\mathbb{R}$, sesquilinear forms
are simply bilinear, Hermitian forms are symmetric bilinear, and
Hermitian products are inner products.
\end{rem}
\begin{prop}[Cauchy-Schwarz inequality]
Let $\langle\cdot,\cdot\rangle$ be a Hermitian product over
$V$. Then, for all $x,y\in V$,
\[ \lvert\langle x,y \rangle\rvert \leq \sqrt{\langle x,x \rangle}
\sqrt{\langle y,y \rangle}, \] where the two sides are equal if
and only if $x$ and $y$ are linearly dependent.
\end{prop}
\begin{proof}
Suppose that $x \neq 0$ and $y \neq 0$ (otherwise the proposition is
obvious). For all $t > 0$, we compute
\[ \langle x - ty, x - ty \rangle = \langle x,x \rangle - 2t
\mathrm{Re}\langle x,y \rangle + t^2 \langle y,y \rangle \geq
0. \]
Thus, for all $t > 0$,
\[ 2 \mathrm{Re} \langle x,y \rangle \leq \frac{1}{t}\langle x,x
\rangle + t \langle y,y \rangle. \]
We minimize the right-hand side by choosing
\[ t = \sqrt{\frac{\langle x,x \rangle}{\langle y,y \rangle}}, \]
thus
\[ \mathrm{Re} \langle x,y \rangle \leq \sqrt{\langle x,x \rangle}
\sqrt{\langle y,y \rangle}. \]
The inequality follows by replacing $x$ by $e^{i\theta}x$ and using the fact that
\[ \forall z\in\mathbb{C},\; \lvert z \rvert =
\sup_{\theta\in\mathbb{R}} \mathrm{Re}\left(e^{i\theta} z
\right). \]
  The equality case follows by noting that equality forces
  $\langle x - ty, x - ty \rangle = 0$, so that $x$ and $y$ are
  linearly dependent.
\end{proof}
If $\langle\cdot,\cdot\rangle$ is a Hermitian product over $V$, it can
easily be verified that the map
\[ \lVert\cdot\rVert : x \mapsto \sqrt{\langle x,x \rangle} \] is a
norm over $V$. The triangle inequality follows from the Cauchy-Schwarz
inequality.
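Indeed, for all $x,y\in V$,
\[ {\lVert x+y \rVert}^2 = {\lVert x \rVert}^2 + 2\,\mathrm{Re}\langle
x,y \rangle + {\lVert y \rVert}^2 \leq {\lVert x \rVert}^2 + 2 \lVert
x \rVert \lVert y \rVert + {\lVert y \rVert}^2 = {\left( \lVert x
\rVert + \lVert y \rVert \right)}^2. \]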
\begin{defn}[Pre-Hilbert space]
A \emph{pre-Hilbert space} is a vector space $V$ with a Hermitian
product $\langle\cdot,\cdot\rangle$ and the associated norm
$\lVert\cdot\rVert$. It is a metric space for the distance
$d(x,y) := \lVert x-y \rVert$.
\end{defn}
\begin{defn}[Hilbert space]
A pre-Hilbert space $H$ is a \emph{Hilbert space} if
$(H, \lVert\cdot\rVert)$ is a Banach space.
\end{defn}
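
For example, $\mathbb{C}^n$ with the standard Hermitian product
$\langle x,y \rangle := \sum_{i=1}^n x_i \bar{y}_i$ is a Hilbert
space, as is the space $\ell^2(\mathbb{N})$ of square-summable complex
sequences with $\langle x,y \rangle := \sum_{n\in\mathbb{N}} x_n
\bar{y}_n$. Reproducing kernel Hilbert spaces, such as the one
associated with the sliced Wasserstein kernel, are further examples.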
\addtocounter{chapter}{1}
\addcontentsline{toc}{chapter}{\protect\numberline{\thechapter} Infectious SocioPatterns poster}
\includepdf{fig/infectious_poster.pdf}
% \includepdf{fig/infectious_poster_highres.pdf}\label{cha:infectious_poster}
\chapter{Code}%
\label{cha:code}
\section{\texttt{zigzag.py}}%
\label{sec:zigzagpy}
\inputminted{python}{zigzag.py}
\section{\texttt{wrcf.py}}%
\label{sec:wrcfpy}
\inputminted{python}{wrcf.py}
\section{\texttt{sliced\_wasserstein.py}}%
\label{sec:sliced_wassersteinpy}
\inputminted{python}{sliced_wasserstein.py}
\section{\texttt{generative.py}}%
\label{sec:generativepy}
\inputminted{python}{generative.py}
\section{\texttt{sociopatterns.py}}%
\label{sec:sociopatternspy}
\inputminted{python}{sociopatterns.py}
\section{\texttt{clustering.py}}%
\label{sec:clusteringpy}
\inputminted{python}{clustering.py}
\backmatter%
% \nocite{*}
\printbibliography%
\end{document}
%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: