\documentclass[a4paper,11pt,openany,extrafontsizes]{memoir}

\input{preamble}

\begin{document}

\pagestyle{plain}
\tightlists%

\begin{titlingpage}
\begin{center}
\vspace{1cm}
\textsf{\Huge{University of Oxford}}\\
\vspace{1cm}
\includegraphics{branding/beltcrest.png}\\
\vspace{2cm}
\Huge{\thetitle}\\
\vspace{2cm}
\large{by\\[14pt]\theauthor\\[8pt]St Catherine's College}
\vfill
%% Inkscape L-system
%% [C]++[C]++[C]++[C]++[C]
%% B=DA++EA----CA[-DA----BA]++;C=+DA--EA[---BA--CA]+;D=-BA++CA[+++DA++EA]-;E=--DA++++BA[+EA++++CA]--CA;A=
% \begin{tikzpicture}
% \pgfdeclarelindenmayersystem{Penrose}{
% \symbol{M}{\pgflsystemdrawforward}
% \symbol{N}{\pgflsystemdrawforward}
% \symbol{O}{\pgflsystemdrawforward}
% \symbol{P}{\pgflsystemdrawforward}
% \symbol{A}{\pgflsystemdrawforward}
% \symbol{+}{\pgflsystemturnright}
% \symbol{-}{\pgflsystemturnleft}
% \rule{M->OA++PA----NA[-OA----MA]++}
% \rule{N->+OA--PA[---MA--NA]+}
% \rule{O->-MA++NA[+++OA++PA]-}
% \rule{P->--OA++++MA[+PA++++NA]--NA}
% \rule{A->}
% }
% \draw[lindenmayer system={Penrose, axiom=[N]++[N]++[N]++[N]++[N],
% order=2, angle=36, step=4pt}]
% lindenmayer system;
% \end{tikzpicture}
% \vspace{2.2cm}
\vfill
\large{A dissertation submitted in partial fulfilment of the requirements for the degree of\\
Master of Science in Statistical Science}\\
\vspace{.5cm}
\large{\emph{Department of Statistics,\\ 24--29 St Giles, Oxford, OX1 3LB}}\\
\vspace{1cm} \large{\thedate}
\end{center}
\end{titlingpage}

%\chapterstyle{hangnum}
%\chapterstyle{ell}
%\chapterstyle{southall}
\chapterstyle{wilsondob}

\frontmatter

\cleardoublepage%

\vspace*{3cm}
\begin{abstract}
Temporal networks are a mathematical model for representing
interactions that evolve over time. As such, they have a multitude of
applications, from biology to physics to social networks. The study of
dynamics on networks is an emerging field, with many challenges in
modelling and data analysis.

An important issue is to uncover meaningful temporal structure in a
network. We focus on the problem of periodicity detection in
temporal networks, by partitioning the time range of the network and
clustering the resulting subnetworks.

For this, we leverage methods from topological data analysis and
persistent homology. These methods have begun to be employed with
static graphs in order to provide a summary of topological features,
but their application to temporal networks has never been studied in
detail.

We cluster temporal networks by computing the evolution of
topological features over time. Applying persistent homology to
temporal networks and comparing various approaches has never been
done before, and we examine their performance side by side with a
simple clustering algorithm. Using a generative model, we show that
persistent homology is able to detect periodicity in the topological
structure of a network.

We define two types of topological features, with and without
aggregating the temporal networks, and multiple ways of embedding
them in a feature space suitable for machine-learning
applications. In particular, we examine the theoretical guarantees
and empirical performance of kernels defined on topological
features.

Topological insights prove to be useful in statistical learning
applications. Combined with recent advances in network science,
they lead to a deeper understanding of the structure of temporal
networks.
\end{abstract}
\vspace*{\fill}

\cleardoublepage%

\chapter{Acknowledgements}%
\label{cha:acknowledgements}

I would like to thank my supervisors, Dr Heather Harrington, Dr Renaud
Lambiotte, and Dr Mason Porter, from the Mathematical Institute, for
their continuous support and guidance from the very beginning. They
have allowed me to pursue my interests in networks and topological
data analysis while providing me with resources, ideas, and
motivation. They remained available to answer my questions and listen
to my ideas, and provided invaluable feedback at every stage of the
project.

I would also like to thank Dr Steve Oudot from École polytechnique,
who first introduced me to the field of topological data
analysis, which led me to the original idea for this project. He was
also very helpful during the project, giving me advice and updates on
recent advances in persistent homology.

I also want to acknowledge the students and staff of the Department of
Statistics and St Catherine's College, who always provided a
stimulating work environment, along with excellent discussions.

Finally, my thanks go to my family and friends for their interest in
my project, because trying to explain it to people not acquainted with
the topic was, and remains, the best way to clarify my ideas and
organise my thoughts.

\cleardoublepage%

\tableofcontents

\clearpage

\listoffigures

\begingroup
\let\clearpage\relax
\listofalgorithms%
\addcontentsline{toc}{chapter}{List of Algorithms}
\endgroup

\clearpage

\mainmatter%

\chapter{Introduction}%
\label{cha:introduction}

\section{Temporal network analysis}%
\label{sec:temp-netw-analys}

Networks are one of the most important mathematical concepts developed
in the last few centuries. They allow the representation of
interconnected data and complex systems. As such, the concept has been
applied to a wide variety of problems, in biology and neuroscience,
physics, computer networks, and social science. In this context,
network science has emerged as a discipline of its own, combining
ideas and challenges from multiple fields of
study~\cite{newman_networks:_2010}.

\captionsetup[figure]{labelformat=empty}

\begin{wrapfigure}[15]{R}[.3cm]{.3\linewidth}
\centering
\vspace{-15pt}
\SetCoordinates[yAngle=45]
\begin{tikzpicture}[multilayer=3d]
\SetLayerDistance{-2}
\Plane[x=.5,y=.5,width=3,height=3,style={thin,dashed},layer=1]
\Plane[x=.5,y=.5,width=3,height=3,style={thin,dashed},layer=2]
\Plane[x=.5,y=.5,width=3,height=3,style={thin,dashed},layer=3]
\begin{Layer}[layer=1]
\node at (.5,.5)[below right] {Time $t_1$};
\end{Layer}
\begin{Layer}[layer=2]
\node at (.5,.5)[below right] {Time $t_2$};
\end{Layer}
\begin{Layer}[layer=3]
\node at (.5,.5)[below right] {Time $t_3$};
\end{Layer}

\Vertex[x=1.408,y=1.635,size=0.3,color=blue,opacity=0.7,layer=1]{0_1}
\Vertex[x=1.089,y=2.415,size=0.3,color=blue,opacity=0.7,layer=1]{1_1}
\Vertex[x=2.021,y=1.976,size=0.3,color=blue,opacity=0.7,layer=1]{2_1}
\Vertex[x=1.984,y=3.000,size=0.3,color=blue,opacity=0.7,layer=1]{3_1}
\Vertex[x=2.576,y=2.398,size=0.3,color=blue,opacity=0.7,layer=1]{4_1}
\Vertex[x=2.911,y=1.593,size=0.3,color=blue,opacity=0.7,layer=1]{5_1}
\Vertex[x=1.994,y=1.000,size=0.3,color=blue,opacity=0.7,layer=1]{6_1}
\Edge[](1_1)(3_1)
\Edge[](3_1)(4_1)
\Edge[](0_1)(5_1)
\Edge[](2_1)(5_1)
\Edge[](4_1)(5_1)
\Edge[](0_1)(6_1)
\Edge[](1_1)(6_1)
\Edge[](5_1)(6_1)

\Vertex[x=1.408,y=1.635,size=0.3,color=blue,opacity=0.7,layer=2]{0_2}
\Vertex[x=1.089,y=2.415,size=0.3,color=blue,opacity=0.7,layer=2]{1_2}
\Vertex[x=2.021,y=1.976,size=0.3,color=blue,opacity=0.7,layer=2]{2_2}
\Vertex[x=1.984,y=3.000,size=0.3,color=blue,opacity=0.7,layer=2]{3_2}
\Vertex[x=2.576,y=2.398,size=0.3,color=blue,opacity=0.7,layer=2]{4_2}
\Vertex[x=2.911,y=1.593,size=0.3,color=blue,opacity=0.7,layer=2]{5_2}
\Vertex[x=1.994,y=1.000,size=0.3,color=blue,opacity=0.7,layer=2]{6_2}
\Edge[](1_2)(2_2)
\Edge[](0_2)(3_2)
\Edge[](2_2)(4_2)
\Edge[](2_2)(6_2)

\Vertex[x=1.408,y=1.635,size=0.3,color=blue,opacity=0.7,layer=3]{0_3}
\Vertex[x=1.089,y=2.415,size=0.3,color=blue,opacity=0.7,layer=3]{1_3}
\Vertex[x=2.021,y=1.976,size=0.3,color=blue,opacity=0.7,layer=3]{2_3}
\Vertex[x=1.984,y=3.000,size=0.3,color=blue,opacity=0.7,layer=3]{3_3}
\Vertex[x=2.576,y=2.398,size=0.3,color=blue,opacity=0.7,layer=3]{4_3}
\Vertex[x=2.911,y=1.593,size=0.3,color=blue,opacity=0.7,layer=3]{5_3}
\Vertex[x=1.994,y=1.000,size=0.3,color=blue,opacity=0.7,layer=3]{6_3}
\Edge[](0_3)(2_3)
\Edge[](1_3)(4_3)
\Edge[](2_3)(5_3)
\Edge[](4_3)(5_3)
\Edge[](0_3)(6_3)
\Edge[](2_3)(6_3)
\Edge[](4_3)(6_3)
\end{tikzpicture}
\caption[Multilayer network.]{}%
\label{fig:multilayer}
\end{wrapfigure}

\captionsetup[figure]{labelformat=default}

An emerging trend in network science is the study of dynamics on
networks~\cite{holme_temporal_2012, holme_modern_2015,
porter_dynamical_2014}. Real-world systems, such as brains or social
groups, tend to evolve over time, and these changing networks have
given birth to the new field of network dynamics, where edges can
reconfigure over time. Mathematical modelling of temporal connectivity
patterns remains a difficult
problem~\cite{bassett_network_2017}. Recent advances in applied
mathematics have led to many concurrent representations, multilayer
networks~\cite{kivela_multilayer_2014} being one of the most
important.

Temporal networks bring new challenges in the size, shape, and complexity
of data analysis, but also new opportunities, with the development of
new empirical methods and theoretical advances. One of these advances
is the development of generative models that can be used to infer the
dynamic mechanisms taking place in real-world
systems~\cite{bazzi_generative_2016, gauvin_randomized_2018,
petri_simplicial_2018}.

Moreover, network theory naturally exposes many links with
topology. The purpose of networks lies in the representation of
\emph{structure}, while topology is the study of spaces and
\emph{connectedness}. As topological methods gain traction in data
science and statistical learning, they are also applied to more
complex data representations, including
networks~\cite{horak_persistent_2009, petri_topological_2013,
stolz_persistent_2017}. Topological features naturally complement
more traditional network statistics by focusing on mesoscale
structure.

\section{Related work}%
\label{sec:related-work}

Topological data analysis (TDA) is a recent
field~\cite{carlsson_topology_2009}. It was originally focused on
point cloud data, with only a recent shift towards network
data~\cite{horak_persistent_2009}. Various methods have been
developed, the main one being the weight rank clique filtration
(WRCF)~\cite{petri_topological_2013}. Other examples of applications of
TDA to networks using WRCF can be found in~\cite{otter_roadmap_2017}.

There have also been attempts to map the nodes of a network to points
in a metric space. For instance, the shortest-path distance between
nodes can be used to compute pairwise distances in the network. Note
that for this method to work properly, the network has to be
connected. Many methods can be used to build a simplicial complex from
a directed or undirected network~\cite{jonsson_simplicial_2008,
horak_persistent_2009}.

The main starting point for this project was the introduction of TDA
for the study of temporal networks
in~\cite{price-wright_topological_2015}. In that dissertation,
topological methods are introduced to classify temporal networks
randomly generated by different models. The objective of that study
was to uncover the temporal structure of a network in order to inform
its partitioning into ``snapshots''. Different methods to partition a
network were compared for the first time, and topological features
appeared to be relevant for distinguishing various temporal
distributions.

Finally, there has been increasing interest in using the
topological structure of a dataset as an additional source of
information for a statistical learning model. This has led to the
development of topological descriptors suitable for use in various
learning contexts. Previous work on vectorizations of, and kernels on,
topological features will be useful in the analysis of the structure
of temporal networks.

\section{Contributions}%
\label{sec:contributions}

The main contributions of this work are fourfold:
\begin{itemize}
\item We make an attempt at partitioning temporal networks and
clustering the resulting subnetworks, with an immediate application to
detecting periodicity. Sliding windows and persistent homology have
been used in the context of periodicity detection
before~\cite{perea_sw1pers:_2015, perea_sliding_2017}, but never in
the context of temporal networks.
\item In general, topological methods have never been thoroughly
studied on temporal network data. The work
in~\cite{price-wright_topological_2015} is the first to introduce
the topic, but computation was limited due to the lack of available
libraries. Here, we introduce recent (from the last 2--3 years)
state-of-the-art topological methods and adapt them to temporal
networks.
\item Various methods to use topological features in a statistical
learning context, and their trade-offs, are presented. The mathematical
background and practical considerations are leveraged to compare
them in the context of machine learning.
\item Finally, different topological approaches are compared. There
are different ways to build a simplicial filtration on a network,
and different manners of measuring distances between the outputs of
persistent homology in the context of machine learning. These
different methods are compared here with the objective of
periodicity detection in temporal networks.
\end{itemize}

\chapter{Graphs and Temporal Networks}%
\label{cha:temporal-networks}

\section{Definition and basic properties}%
\label{sec:defin-basic-prop}

In this section, we introduce the notion of temporal networks (or
temporal graphs). This is a complex notion, with many competing
definitions and interpretations.

After clarifying notation, we restate the standard definition of
a non-temporal graph.

\begin{notation}
\begin{itemize}
\item $\mathbb{N}$ is the set of non-negative natural numbers
$0,1,2,\ldots$
\item $\mathbb{N}^*$ is the set of positive integers $1,2,\ldots$
\item $\mathbb{R}$ is the set of real numbers.
$\mathbb{R}_+ = \{x\in\mathbb{R} \;|\; x\geq 0\}$, and
$\mathbb{R}_+^* = \{x\in\mathbb{R} \;|\; x>0\}$.
\end{itemize}
\end{notation}

\begin{defn}[Graph]
A \emph{graph} is a couple $G = (V, E)$, where $V$ is a set of
\emph{nodes} (or \emph{vertices}), and $E \subseteq V\times V$ is a
set of \emph{edges}. A \emph{weighted graph} is defined by
$G = (V, E, w)$, where $w : E\to \mathbb{R}_+^*$ is called the
\emph{weight function}.
\end{defn}

We also define some basic concepts that we will need later to build
simplicial complexes on graphs.

\begin{defn}[Clique]
A \emph{clique} is a set of nodes where each pair is adjacent. That
is, a clique $C$ of a graph $G = (V,E)$ is a subset of $V$ such that
for all $i,j\in C$, $i \neq j \implies (i,j)\in E$. A clique is said
to be \emph{maximal} if it cannot be augmented by any node such
that the resulting set of nodes is itself a clique.
\end{defn}

Temporal networks can be defined in the more general framework of
\emph{multilayer networks}~\cite{kivela_multilayer_2014}. However,
this definition is much too general for our simple applications, and
we restrict ourselves to edge-centric time-varying
graphs~\cite{casteigts_time-varying_2012}. In this model, the set of
nodes is fixed, but edges can appear or disappear at different times.

In this study, we restrict ourselves to discrete time stamps. Each
interaction is taken to be instantaneous.

\begin{defn}[Temporal network]\label{defn:temp-net}
A \emph{temporal network} is a tuple
$G = (V, E, \mathcal{T}, \rho)$, where:
\begin{itemize}
\item $V$ is a set of nodes,
\item $E\subseteq V\times V$ is a set of edges,
\item $\mathbb{T}$ is the \emph{temporal domain} (often taken as
$\mathbb{N}$ or any other countable set), and
$\mathcal{T}\subseteq\mathbb{T}$ is the \emph{lifetime} of the
network,
\item $\rho: E\times\mathcal{T}\to\{0,1\}$ is the \emph{presence
function}, which determines whether an edge is present in the
network at each time stamp.
\end{itemize}
The \emph{available times} of an edge are the set
$\mathcal{I}(e) = \{t\in\mathcal{T}: \rho(e,t)=1\}$.
\end{defn}

Temporal networks can also have weighted edges. In this case, it is
possible to have constant weights (edges can only appear or disappear
over time, and always have the same weight) or time-varying
weights. In the latter case, we can set the codomain of the presence
function to be $\mathbb{R}_+$ instead of $\{0,1\}$, where by
convention a 0 weight corresponds to an absent edge.
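
To make \autoref{defn:temp-net} concrete, the following Python sketch
(illustrative only, and not the data structure used later in this
work) stores, for each edge, its set of available times, from which
the presence function $\rho$ and static snapshots can be derived.

\begin{verbatim}
from collections import defaultdict

class TemporalNetwork:
    """Hypothetical minimal representation of a temporal network."""
    def __init__(self):
        # Map each (undirected) edge to its set of available times.
        self.times = defaultdict(set)

    def add_contact(self, u, v, t):
        # Store endpoints in a canonical order.
        self.times[(min(u, v), max(u, v))].add(t)

    def rho(self, edge, t):
        # Presence function: 1 if the edge is present at time t, else 0.
        u, v = edge
        return int(t in self.times[(min(u, v), max(u, v))])

    def snapshot(self, t):
        # Static graph at time t, as a set of edges.
        return {e for e, ts in self.times.items() if t in ts}

tn = TemporalNetwork()
tn.add_contact(0, 1, t=3)
tn.add_contact(1, 2, t=3)
print(tn.rho((0, 1), 3), tn.snapshot(3))  # 1 {(0, 1), (1, 2)}
\end{verbatim}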

\begin{defn}[Additive and dismantling temporal
networks]\label{defn:additive}
A temporal network is said to be \emph{additive} if for all $e\in E$
and $t\in\mathcal{T}$, if $\rho(e,t)=1$, then for all
$t'>t, \rho(e, t') = 1$. An additive network can only gain edges
over time.

A temporal network is said to be \emph{dismantling} if for all
$e\in E$ and $t\in\mathcal{T}$, if $\rho(e,t)=0$, then for all
$t'>t, \rho(e, t') = 0$. A dismantling network can only lose edges
over time.
\end{defn}

\section{Network statistics}%
\label{sec:network-statistics}

To analyse networks, we use network statistics. These are
low-dimensional summaries of important properties of a graph. Some of
them focus on local features, while others concentrate on global
aspects. Note that the following only applies to static graphs, and
cannot be used directly on temporal networks.

These definitions are taken from the reference work by
Newman~\cite{newman_networks:_2010}.

The first network statistics try to determine which vertices are the
most \emph{central}, i.e.\ the most ``important'' in the network.

\begin{defn}[Local clustering coefficient]
The \emph{local clustering coefficient} of a vertex $v$ is defined as
\[ C(v) = \frac{\sum_{u,w\in \mathcal{V}}
a_{u,v}a_{w,v}a_{u,w}}{\sum_{u,w\in \mathcal{V}, u\neq w}
a_{u,v}a_{w,v}}, \]
where $a_{u,v}$ denotes the entry of the adjacency matrix of the
graph, i.e.\ $a_{u,v} = 1$ if $(u,v)\in E$ and $0$ otherwise.

The \emph{average clustering coefficient} is
\[ \overline{C} = \frac{1}{\left|\mathcal{V}\right|}
\sum_{v\in\mathcal{V}} C(v). \]
\end{defn}

\begin{defn}[Global clustering coefficient]
The \emph{global clustering coefficient} or \emph{transitivity} is
\[ C = \frac{3\times\text{number of triangles}}{\text{number of
connected triples}}. \]
\end{defn}

Another interesting summary is the average shortest path length between
vertices.

\begin{defn}[Path]
A \emph{path} between two vertices $v_0$ and $v_n$ is a sequence of
vertices $(v_0, v_1, \ldots, v_n)$ such that every consecutive pair
of vertices $(v_i, v_{i+1})$ is connected by an edge.

The \emph{length} of a path is the number of edges traversed along
the path. The distance $l(u,v)$ between vertices $u$ and $v$ is
defined as the length of the shortest path between $u$ and $v$.
\end{defn}

\begin{defn}[Average shortest path length]
The \emph{average shortest path length} is defined as
\[ l =
\frac{1}{\left|\mathcal{V}\right|(\left|\mathcal{V}\right|-1)}
\sum_{u,v \in\mathcal{V}, u\neq v} l(u,v). \]
\end{defn}

Many other centrality measures exist, the most well-known being
eigenvector centrality, Katz centrality, and PageRank. See chapter~7
of~\cite{newman_networks:_2010} for more details.
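
As an illustration, all of the statistics defined above are available
in standard libraries; the following short sketch assumes the
\texttt{networkx} Python library.

\begin{verbatim}
import networkx as nx

G = nx.karate_club_graph()  # a classic example network

print(nx.clustering(G, 0))                 # local clustering coefficient C(v)
print(nx.average_clustering(G))            # average clustering coefficient
print(nx.transitivity(G))                  # global clustering coefficient
print(nx.average_shortest_path_length(G))  # average shortest path length
print(nx.pagerank(G)[0])                   # PageRank centrality of node 0
\end{verbatim}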

% \section{Examples of applications}%
% \label{sec:exampl-appl}

% \section{Network partitioning}%
% \label{sec:network-partitioning}

% Temporal networks are a very active research subject, leading to
% multiple interesting problems. The additional time dimension adds a
% significant layer of complexity that cannot be adequately treated by
% the common methods on static graphs.

% Moreover, data collection can lead to large amount of noise in
% datasets. Combined with large dataset sized due to the huge number of
% data points for each node in the network, temporal graphs cannot be
% studied effectively in their raw form. Recent advances have been made
% to fit network models to rich but noisy
% data~\cite{newman_network_2018}, generally using some variation on the
% expectation-maximization (EM) algorithm.

% One solution that has been proposed to study such temporal data has
% been to \emph{partition} the time scale of the network into a sequence
% of smaller, static graphs, representing all the interactions during a
% short interval of time. The approach consists in subdividing the
% lifetime of the network in \emph{sliding windows} of a given length.
% We can then ``flatten'' the temporal network on each time interval,
% keeping all the edges that appear at least once (or adding their
% weights in the case of weighted networks).

% This partitioning is sensitive to two parameters: the length of each
% time interval, and their overlap. Of those, the former is the most
% important: it will define the \emph{resolution} of the study. If it is
% too small, too much noise will be taken into account; if it is too
% large, we will lose important information. There is a need to find a
% compromise, which will depend on the application and on the task
% performed on the network. In the case of a classification task to
% determine periodicity, it will be useful to adapt the resolution to
% the expected period: if we expect week-long periodicity, a resolution
% of one day seems reasonable.

% Once the network is partitioned, we can apply any statistical learning
% task on the sequence of static graphs. In this study, we will focus on
% classification of time steps. This can be used to detect periodicity,
% outliers, or even maximise temporal communities.

\chapter{Topological Data Analysis and Persistent Homology}%
\label{cha:tda-ph}

\section{Basic constructions}%
\label{sec:basic-constructions}

\subsection{Homology}%
\label{sec:homology}

Our goal is to understand the topological structure of a metric
space. For this, we can use \emph{homology}, which consists of
associating a vector space $H_i(X)$ with a metric space $X$ and a
dimension $i$. The dimension of $H_i(X)$ gives us the number of
$i$-dimensional components in $X$: the dimension of $H_0(X)$ is the
number of path-connected components in $X$, the dimension of $H_1(X)$
is the number of holes in $X$, and the dimension of $H_2(X)$ is the
number of voids.

Crucially, these vector spaces are robust to continuous deformation of
the underlying metric space (they are \emph{homotopy
invariant}). However, computing the homology of an arbitrary metric
space can be extremely difficult. It is necessary to approximate the
space by a structure that is both combinatorial and topological in
nature.

\subsection{Simplicial complexes}%
\label{sec:simplicial-complexes}

To understand the topological structure of a metric space, we need a
way to decompose it into smaller pieces that, when assembled, conserve
the overall organisation of the space. For this, we use a structure
called a \emph{simplicial complex}, which is a kind of
higher-dimensional generalization of a graph.

The building block of this representation is the \emph{simplex},
which is the convex hull of an arbitrary set of points. Examples of
simplices include single points, segments, triangles, and tetrahedra
(in dimensions 0, 1, 2, and 3 respectively).

\begin{defn}[Simplex]
A \emph{$k$-dimensional simplex} $\sigma = [x_0,\ldots,x_k]$ is the
convex hull of the set $\{x_0,\ldots,x_k\} \subset \mathbb{R}^d$, where
$x_0,\ldots,x_k$ are affinely independent. $x_0,\ldots,x_k$ are
called the \emph{vertices} of $\sigma$, and the simplices defined by
the subsets of $\{x_0,\ldots,x_k\}$ are called the \emph{faces} of
$\sigma$.
\end{defn}

\begin{figure}[ht]
\centering
\begin{subfigure}[b]{.3\linewidth}
\centering
\begin{tikzpicture}
\tikzstyle{point}=[circle,thick,draw=black,fill=blue!30,%
inner sep=0pt,minimum size=15pt]
\node (a)[point] at (0,0) {a};
\end{tikzpicture}
\caption{Single vertex}
\end{subfigure}%
%
\begin{subfigure}[b]{.3\linewidth}
\centering
\begin{tikzpicture}
\tikzstyle{point}=[circle,thick,draw=black,fill=blue!30,%
inner sep=0pt,minimum size=15pt]
\node (a)[point] at (0,0) {a};
\node (b)[point] at (1.4,2) {b};

\begin{scope}[on background layer]
\draw[fill=blue!15] (a.center) -- (b.center) -- cycle;
\end{scope}
\end{tikzpicture}
\caption{Segment}
\end{subfigure}%
%
\begin{subfigure}[b]{.3\linewidth}
\centering
\begin{tikzpicture}
\tikzstyle{point}=[circle,thick,draw=black,fill=blue!30,%
inner sep=0pt,minimum size=15pt]
\node (a)[point] at (0,0) {a};
\node (b)[point] at (1.4,2) {b};
\node (c)[point] at (2.8,0) {c};

\begin{scope}[on background layer]
\draw[fill=blue!15] (a.center) -- (b.center) -- (c.center) -- cycle;
\end{scope}
\end{tikzpicture}
\caption{Triangle}
\end{subfigure}%
%
\caption{Examples of simplices.}%
\label{fig:simplex}
\end{figure}

We then need a way to meaningfully combine these basic building blocks
so that the resulting object can adequately reflect the topological
structure of the metric space.

\begin{defn}[Simplicial complex]
A \emph{simplicial complex} is a collection $K$ of simplices such
that:
\begin{itemize}
\item any face of a simplex of $K$ is a simplex of $K$,
\item the intersection of two simplices of $K$ is either the empty
set or a common face of both.
\end{itemize}
\end{defn}

\begin{figure}[ht]
\centering
\begin{tikzpicture}
\tikzstyle{point}=[circle,thick,draw=black,fill=blue!30,%
inner sep=0pt,minimum size=10pt]
\node (a)[point] {};
\node (b)[point,above right=1.4cm and 1cm of a] {};
\node (c)[point,right=2cm of a] {};
\node (d)[point,above right=.4cm and 2cm of b] {};
\node (e)[point,above right=.4cm and 2cm of c] {};
\node (f)[point,below right=.7cm and 1.3cm of c] {};
\node (g)[point,right=2cm of d] {};
\node (h)[point,below right=.4cm and 1.5cm of e] {};

\begin{scope}[on background layer]
\draw[fill=blue!15] (a.center) -- (b.center) -- (c.center) -- cycle;
\draw (b) -- (d) -- (g);
\draw (c.center) -- (e.center) -- (f.center) -- cycle;
\draw (d) -- (e) -- (h);
\end{scope}

\node (1)[point,right=2cm of g] {};
\node (2)[point,above right=.5cm and 1cm of 1] {};
\node (3)[point,below right=.5cm and 1cm of 2] {};
\node (4)[point,below left=1cm and .3cm of 3] {};
\node (5)[point,below right=1cm and .3cm of 1] {};
\node (6)[point,below left=1cm and .1cm of 5] {};
\node (7)[point,below right=1cm and .1cm of 4] {};
\node (8)[point,below right=.7cm and .7cm of 6] {};

\begin{scope}[on background layer]
\draw[fill=green!15] (1.center) -- (2.center) -- (3.center) -- (4.center) -- (5.center) -- cycle;
\draw (1) -- (4) -- (2) -- (5) -- (3) -- (1);
\draw[fill=blue!15] (6.center) -- (7.center) -- (8.center) -- cycle;
\draw (5) -- (6) -- (4) -- (7);
\end{scope}
\end{tikzpicture}
\caption[Example of a simplicial complex.]{Example of a simplicial
complex with two connected components. The filled triangles are
2-simplices, and the filled pentagon with all its diagonals depicts
a 4-simplex on five vertices.}%
\label{fig:simplical-complex}
\end{figure}

The notion of simplicial complex is closely related to that of a
hypergraph. One important distinction lies in the fact that a subset
of a hyperedge is not necessarily a hyperedge itself, whereas every
face of a simplex must itself belong to the complex.
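
This closure property is built into standard data structures for
simplicial complexes. As a short illustration, assuming the GUDHI
Python library, inserting a triangle automatically inserts all of its
faces:

\begin{verbatim}
import gudhi

st = gudhi.SimplexTree()
st.insert([0, 1, 2])  # insert the 2-simplex [0, 1, 2]

# The complex now also contains all edges and vertices of the triangle.
for simplex, _ in st.get_simplices():
    print(simplex)
\end{verbatim}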

\subsection{Simplicial homology}%
\label{sec:simplicial-homology}

Using these definitions, we can define homology on simplicial
complexes~\cite{edelsbrunner_computational_2010,
chazal_introduction_2017}. In this section, we restrict ourselves to
homology with coefficients in $\mathbb{Z}_2$, the field with two
elements.

\begin{defn}[$p$-chains]
Let $K$ be a finite simplicial complex, and $p$ a non-negative
integer. The space of \emph{$p$-chains} $C_p(K)$ of $K$ is the set
of formal sums of $p$-simplices of $K$. More precisely, it is the
$\mathbb{Z}_2$-vector space spanned by the $p$-simplices of $K$.
\end{defn}

Since the coefficients of $C_p(K)$ are in $\mathbb{Z}_2$, a $p$-chain
is simply a finite collection of $p$-simplices. The sum of two
$p$-chains is the symmetric difference of the two chains, i.e.\ the
collection of $p$-simplices that belong to either, but not both, of
the chains.

\begin{defn}[Boundary]
The \emph{boundary} of a $p$-simplex $\sigma$ is the $(p-1)$-chain
\[ \partial_p(\sigma) := \sum_{\tau\in K_{p-1},\; \tau\subset\sigma}
\tau, \] where $K_{p-1}$ is the set of $(p-1)$-simplices of $K$.
\end{defn}

As the $p$-simplices form a basis of $C_p(K)$, $\partial_p$ can be
extended into a linear map from $C_p(K)$ to $C_{p-1}(K)$, called the
\emph{boundary operator}. The elements of the kernel
$\mathrm{Ker}(\partial_p)$ are called the \emph{$p$-cycles} of
$K$. The image $\mathrm{Im}(\partial_p)$ is the space of
\emph{$p$-boundaries} of $K$.

\begin{lem}\label{lem:boundary}
The image of $\partial_{p+1}$ is a subset of the kernel of
$\partial_p$.
\end{lem}

\begin{proof}
The boundary of a boundary is always zero. To see this, consider
the boundary of a $(p+1)$-simplex $\sigma$. The boundary of $\sigma$
consists of all $p$-faces of $\sigma$. The boundary of this boundary
will contain each $(p-1)$-face of $\sigma$ twice, and since $1+1=0$
in $\mathbb{Z}_2$, we have that
\[ \partial_{p} \circ \partial_{p+1} \equiv 0. \]

This implies directly that
$\mathrm{Im}(\partial_{p+1}) \subset \mathrm{Ker}(\partial_p)$.
\end{proof}

\begin{defn}[Homology]
For any $p\in\mathbb{N}$, the \emph{$p$-th (simplicial) homology
group} of a simplicial complex $K$ is the quotient vector space
\[ H_p(K) := \mathrm{Ker}(\partial_{p}) /
\mathrm{Im}(\partial_{p+1}). \]

The dimension $\beta_p(K)$ of $H_p(K)$ is called the \emph{$p$-th
Betti number} of $K$.
\end{defn}
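
Since all the spaces involved are $\mathbb{Z}_2$-vector spaces, Betti
numbers reduce to ranks of boundary matrices:
$\beta_p = \dim\mathrm{Ker}(\partial_p) -
\mathrm{rank}(\partial_{p+1})$. The following Python sketch
(illustrative, not optimised) computes $\beta_0$ and $\beta_1$ of the
hollow triangle by Gaussian elimination over $\mathbb{Z}_2$.

\begin{verbatim}
import numpy as np

def rank_gf2(M):
    """Rank of a binary matrix over Z/2Z, by Gaussian elimination."""
    M = M.copy() % 2
    rank = 0
    for col in range(M.shape[1]):
        pivots = np.nonzero(M[rank:, col])[0]  # find a pivot row
        if len(pivots) == 0:
            continue
        pivot = rank + pivots[0]
        M[[rank, pivot]] = M[[pivot, rank]]    # swap it into place
        for r in range(M.shape[0]):            # eliminate other rows mod 2
            if r != rank and M[r, col]:
                M[r] = (M[r] + M[rank]) % 2
        rank += 1
        if rank == M.shape[0]:
            break
    return rank

# Boundary matrix d1 of the hollow triangle: rows are vertices
# v0, v1, v2; columns are edges [v0,v1], [v0,v2], [v1,v2].
d1 = np.array([[1, 1, 0],
               [1, 0, 1],
               [0, 1, 1]])
n0, n1 = 3, 3        # number of 0- and 1-simplices
rank_d1 = rank_gf2(d1)
rank_d2 = 0          # no 2-simplices, so d2 is the zero map
beta0 = n0 - rank_d1              # dim Ker(d0) - rank(d1), with d0 = 0
beta1 = (n1 - rank_d1) - rank_d2  # dim Ker(d1) - rank(d2)
print(beta0, beta1)  # 1 1: one connected component, one hole
\end{verbatim}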

Let us close this overview of simplicial homology with a look at induced
maps. Let $K$ and $L$ be two simplicial complexes and $f: K \to L$
a simplicial map between them. Since $f$ maps each simplex of
$K$ linearly to a simplex of $L$, we can extend it to map a chain of $K$ to a
chain of $L$ of the same dimension. If $c = \sum a_i \sigma_i$ is a
$p$-chain in $K$, we can define $f_\#(c) = \sum a_i \tau_i$, where
$\tau_i = f(\sigma_i)$ if $f(\sigma_i)$ has dimension $p$ and $\tau_i = 0$ if
$f(\sigma_i)$ has dimension less than $p$.

\begin{figure}[!ht]
\centering
\begin{tikzpicture}
\node (l dots left) at (0,0) {\ldots};
\node (cpl) at (3,0) {$C_p(L)$};
\node (cp1l) at (6,0) {$C_{p-1}(L)$};
\node (l dots right) at (9,0) {\ldots};
\node (k dots left) at (0,2) {\ldots};
\node (cpk) at (3,2) {$C_p(K)$};
\node (cp1k) at (6,2) {$C_{p-1}(K)$};
\node (k dots right) at (9,2) {\ldots};
\draw[->] (l dots left) -- (cpl) node [above,midway] {$\partial_{p+1}$};
\draw[->] (cpl) -- (cp1l) node [above,midway] {$\partial_{p}$};
\draw[->] (cp1l) -- (l dots right) node [above,midway] {$\partial_{p-1}$};
\draw[->] (k dots left) -- (cpk) node [above,midway] {$\partial_{p+1}$};
\draw[->] (cpk) -- (cp1k) node [above,midway] {$\partial_{p}$};
\draw[->] (cp1k) -- (k dots right) node [above,midway] {$\partial_{p-1}$};
\draw[->] (cpk) -- (cpl) node [right,midway] {$f_\#^{p}$};
\draw[->] (cp1k) -- (cp1l) node [right,midway] {$f_\#^{p-1}$};
\end{tikzpicture}
\caption{Induced maps and boundary operators.}%
\label{fig:induced-map}
\end{figure}

\begin{prop}
$f_\#$ commutes with the boundary operator:
\[ f_\# \circ \partial_K = \partial_L \circ f_\#. \]
\end{prop}

\begin{proof}
Consider $f_\#^p : C_p(K) \to C_p(L)$, and let
$c = \sum a_i \sigma_i \in C_p(K)$. If $f(\sigma_i)$ has dimension
$p$, then all $(p-1)$-faces of $\sigma_i$ map to the corresponding
$(p-1)$-faces of $\tau_i$, and the proposition follows. On the other
hand, if $f(\sigma_i)$ has dimension less than $p$, then the
$(p-1)$-faces of $\sigma_i$ map to simplices of dimension less
than $p-1$, with the possible exception of exactly two $(p-1)$-faces
whose images coincide and cancel each other. So we have that
$\partial_L(f_\#(\sigma_i)) = f_\#(\partial_K(\sigma_i)) =
0$. (See~\cite{edelsbrunner_computational_2010} for details.)
\end{proof}

\begin{cor}
$f_\#$ maps cycles to cycles, and boundaries to boundaries.
\end{cor}

Therefore, $f_\#$ defines a map over quotients, called the
\emph{induced map on homology} \[ f_*^p : H_p(K) \to H_p(L). \]

\begin{prop}\label{prop:functor}
$f \mapsto f_*^p$ is a functor:
\begin{itemize}
\item if $f = \mathrm{id}_X$, then $f_*^p = \mathrm{id}_{H_p(X)}$,
\item if
$X \overset{f}{\longrightarrow} Y \overset{g}{\longrightarrow} Z$,
then ${(g \circ f)}_* = g_* \circ f_*$.
\end{itemize}
\end{prop}

%% homotopy equivalence?

The derivation of simplicial homology in this section used the field
$\mathbb{Z}_2$. It is however possible to define homology over any
field. The definition of the boundary operator needs to be adapted to
ensure that \autoref{lem:boundary} remains true, even when
$1 \neq -1$. In this dissertation, we consider only persistent
homology over $\mathbb{Z}_2$, as it is the field used in our
implementation. It is important to note however that changing the
field of the vector spaces can affect the homology and therefore the
topological features detected~\cite{zomorodian_computing_2005}.

\subsection{Filtrations}%
\label{sec:filtrations}

If we consider that a simplicial complex is a kind of
``discretization'' of a subset of a metric space, we realise that
there must be an issue of \emph{scale}. For our analysis to be
invariant under small perturbations in the data, we need a way to find
the optimal scale parameter to capture the adequate topological
structure, without taking small perturbations into account, and without
ignoring important smaller features.

To illustrate this, let us take the example of the Čech complex, one
of the most important tools to build a simplicial complex from a
metric space.

\begin{defn}[Nerve]
Let $\mathcal{S} = {(S_i)}_{i\in I}$ be a non-empty collection of
sets. The \emph{nerve} of $\mathcal{S}$ is the simplicial complex
whose vertices are the elements of $I$ and where
$(i_0, \ldots, i_k)$ is a $k$-simplex if, and only if,
$S_{i_0} \cap \cdots \cap S_{i_k} \neq \emptyset$.
\end{defn}

\begin{defn}[Čech complex]
Let $X$ be a point cloud in an arbitrary metric space, and
$\varepsilon > 0$. The \emph{Čech complex}
$\check{C}_\varepsilon(X)$ is the nerve of the set of
$\varepsilon$-balls centred on the points in $X$.
\end{defn}

% By the Nerve theorem~\cite{edelsbrunner_computational_2010}, we know
% that for any point cloud $X$ and any $\varepsilon > 0$,
% $\check{C}_\varepsilon(X)$ and $X$ have the same homology.

An example construction of a Čech complex is represented
in~\autoref{fig:cech-complex}. The simplicial complex depends on the
value of $\varepsilon$. To adequately represent the topological
structure of the underlying point cloud, it is necessary to consider
all possible values of $\varepsilon$ in order to capture all the
topological features.

\begin{figure}[ht]
\centering
\begin{tikzpicture}[scale=1.1,every node/.style={transform shape}]
\tikzstyle{point}=[circle,thick,draw=black,fill=blue!30,%
inner sep=0pt,minimum size=5pt]
\node (a)[point] {};
\node (b)[point,right=1.5cm of a] {};
\node (c)[point,above right=1cm and .7cm of a] {};
\node (d)[point,above=1.5cm of c] {};
\node (e)[point,above right=.8cm and 1.55cm of c] {};
\node (f)[point,above right=.5cm and 2.5cm of b] {};

\def\circlea{(a) circle (1cm)}
\def\circleb{(b) circle (1cm)}
\def\circlec{(c) circle (1cm)}
\def\circled{(d) circle (1cm)}
\def\circlee{(e) circle (1cm)}
\def\circlef{(f) circle (1cm)}

\draw[color=gray,dashed,thick] \circlea;
\draw[color=gray,dashed,thick] \circleb;
\draw[color=gray,dashed,thick] \circlec;
\draw[color=gray,dashed,thick] \circled;
\draw[color=gray,dashed,thick] \circlee;
\draw[color=gray,dashed,thick] \circlef;

\begin{scope}
\clip\circlea;
\fill[pattern=north east lines,pattern color=blue!40]\circleb;
\fill[pattern=north east lines,pattern color=blue!40]\circlec;
\end{scope}
\begin{scope}
\clip\circlec;
\fill[pattern=north east lines,pattern color=blue!40]\circleb;
\fill[pattern=north east lines,pattern color=blue!40]\circled;
\fill[pattern=north east lines,pattern color=blue!40]\circlee;
\end{scope}
\begin{scope}
\clip\circled;
\fill[pattern=north east lines,pattern color=blue!40]\circlee;
\end{scope}

\draw[<->] (b) -- +(1cm,0cm) node[left=.45cm,anchor=south] {$\varepsilon$};

\node (a1)[point,right=7cm of a] {};
\node (b1)[point,right=1.5cm of a1] {};
\node (c1)[point,above right=1cm and .7cm of a1] {};
\node (d1)[point,above=1.5cm of c1] {};
\node (e1)[point,above right=.8cm and 1.55cm of c1] {};
\node (f1)[point,above right=.5cm and 2.5cm of b1] {};
\begin{scope}[on background layer]
\draw[thick,fill=blue!15] (a1.center) -- (b1.center) -- (c1.center) -- cycle;
\draw[thick] (c1.center) -- (d1.center) -- (e1.center) -- cycle;
\end{scope}
\end{tikzpicture}
\caption[Example of a point cloud and the corresponding Čech
complex.]{Example of a point cloud (left), and the corresponding
Čech complex at level $\varepsilon$ (right). Dashed circles
represent the $\varepsilon$-balls used to construct the simplicial
complex.}%
\label{fig:cech-complex}
\end{figure}

This is the objective of \emph{filtered simplicial complexes}.

\begin{defn}[Filtration]\label{defn:filt}
A \emph{filtered simplicial complex}, or simply a \emph{filtration},
$K$ is a sequence ${(K_i)}_{i\in I}$ of simplicial complexes such
that:
\begin{itemize}
\item for any $i, j \in I$, if $i < j$ then $K_i \subseteq K_j$,
\item $\bigcup_{i\in I} K_i = K$.
\end{itemize}
\end{defn}

To continue the example of Čech filtrations, one can build a sequence
of simplicial complexes indexed by the values of $\varepsilon > 0$. By
construction, Čech complexes on a point cloud $X$ respect the
essential inclusion property:
\[ \forall \varepsilon, \varepsilon' > 0,\quad \varepsilon < \varepsilon'
\implies \check{C}_\varepsilon(X) \subseteq
\check{C}_{\varepsilon'}(X). \]
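
In practice, Čech complexes are expensive to compute, and libraries
typically provide the closely related Vietoris--Rips filtration
instead. As a short sketch, assuming the GUDHI Python library:

\begin{verbatim}
import gudhi

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
rips = gudhi.RipsComplex(points=points, max_edge_length=2.0)
st = rips.create_simplex_tree(max_dimension=2)

# Each simplex enters the filtration at the scale at which it appears.
for simplex, eps in st.get_filtration():
    print(simplex, eps)
\end{verbatim}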

\section{Persistent Homology}%
\label{sec:persistent-homology}

We can now compute the homology at each step of a filtration. This
leads to the notion of \emph{persistent
homology}~\cite{carlsson_topology_2009, zomorodian_computing_2005},
which gives all the information necessary to establish the topological
structure of a metric space at multiple scales.

\begin{defn}[Persistent homology]
The \emph{$p$-th persistent homology} of a filtered simplicial complex
$K = {(K_i)}_{i\in I}$ is the pair
$({\{H_p(K_i)\}}_{i\in I}, {\{f_{i,j}\}}_{i,j\in I, i\leq j})$, where
for all $i\leq j$, $f_{i,j} : H_p(K_i) \to H_p(K_j)$ is induced
by the inclusion map $K_i \hookrightarrow K_j$.
\end{defn}

By functoriality (\autoref{prop:functor}),
$f_{k,j} \circ f_{i,k} = f_{i,j}$. Therefore, the functions $f_{i,j}$
allow us to link generators in each successive homology space in a
filtration.

Because each generator corresponds to a topological feature (connected
component, hole, void, and so on, depending on the dimension $p$), we
can determine whether it survives in the next step of the
filtration. We say that $x \in H_p(K_i)$ is \emph{born} in $H_p(K_i)$
if it is not in the image of $f_{i-1,i}$. We say that $x$ \emph{dies}
in $H_p(K_j)$ if $j > i$ is the smallest index such that
$f_{i,j}(x) = 0$. The half-open interval $[i, j)$ represents the
lifetime of $x$. If $f_{i,j}(x) \neq 0$ for all $j > i$, we say that
$x$ lives forever and its lifetime is the interval $[i, \infty)$.

The couples (birth time, death time) depend on the
choice of basis for each homology space $H_p(K_i)$. However, by the
Fundamental Theorem of Persistent
Homology~\cite{zomorodian_computing_2005}, we can choose basis vectors
in each homology space such that the collection of half-open intervals
is well-defined and unique. This construction is called a
\emph{barcode}~\cite{carlsson_topology_2009}.
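
As an illustration of the pipeline so far, the following sketch
(assuming the Dionysus~2 Python library, discussed in
\autoref{sec:algor-impl}) computes the barcode of a Rips filtration
built on a random point cloud.

\begin{verbatim}
import numpy as np
import dionysus as d

points = np.random.random((50, 2))
f = d.fill_rips(points, 2, 1.0)  # 2-skeleton, radius parameter 1.0
m = d.homology_persistence(f)    # reduce the boundary matrix
dgms = d.init_diagrams(m, f)     # one diagram per homology dimension

for i, dgm in enumerate(dgms):
    for pt in dgm:
        print(i, pt.birth, pt.death)
\end{verbatim}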

\section{Topological summaries: barcodes and persistence diagrams}%
\label{sec:topol-summ}

Although it contains relevant topological information, the persistent
homology defined in the previous section cannot be used directly in
statistical methods. \emph{Topological summaries} are a compact
representation of persistent homology as elements of a metric
space. This is particularly useful in the context of statistical
analysis, e.g.\ when one needs to compare the output of a given
dataset to a null model.

One possible approach is to define a space in which we can project
barcodes and study their geometric properties. One such space is the
space of \emph{persistence
diagrams}~\cite{edelsbrunner_computational_2010}.

\begin{defn}[Multiset]
A \emph{multiset} $M$ is a couple $(A, m)$, where $A$ is the
\emph{underlying set} of $M$, formed by its distinct elements, and
$m : A\to\mathbb{N}^*$ is the \emph{multiplicity function}
giving the number of occurrences of each element of $A$ in $M$.
\end{defn}

\begin{defn}[Persistence diagrams]
A \emph{persistence diagram} is the union of a finite multiset of
points in $\overline{\mathbb{R}}^2$ with the diagonal
$\Delta = \{(x,x) \;|\; x\in\mathbb{R}\}$, where every point of
$\Delta$ has infinite multiplicity.
\end{defn}

One adds the diagonal $\Delta$ for technical reasons: it is convenient
to compare persistence diagrams by using bijections between them, and
for such bijections to exist, persistence diagrams must have the same
cardinality.

The diagonal can also facilitate comparisons between diagrams, as
points near the diagonal correspond to short-lived topological
features, which are likely to be caused by small perturbations in the
data.

One can build a persistence diagram from a barcode by taking the union
of the multiset of (birth, death) couples with the diagonal
$\Delta$. \autoref{fig:ph-pipeline} summarises the entire pipeline.

\begin{figure}[ht]
\centering
\begin{tikzpicture}
\tikzstyle{pipelinestep}=[rectangle,thick,draw=black,inner sep=5pt,minimum size=15pt]
\node (data)[pipelinestep] {Data};
\node (filt)[pipelinestep,right=1cm of data] {Filtered complex};
%% \node (barcode)[pipelinestep,right=1cm of filt] {Barcodes};
\node (dgm)[pipelinestep,right=1cm of filt] {Persistence diagram};
\node (interp)[pipelinestep,right=1cm of dgm] {Interpretation};

\draw[->] (data.east) -- (filt.west);
%% \draw[->] (filt.east) -- (barcode.west);
\draw[->] (filt.east) -- (dgm.west);
\draw[->] (dgm.east) -- (interp.west);
\end{tikzpicture}

\caption{Persistent homology pipeline.}%
\label{fig:ph-pipeline}
\end{figure}

One can define an operator $\dgm$ as the composition of the first two
steps of the pipeline. It constructs a persistence diagram from a
subset of a metric space, via persistent homology on a filtered
complex.

We can now define several distances on the space of persistence
diagrams.

\begin{defn}[Wasserstein distance]\label{defn:wasserstein-dist}
The \emph{$p$-th Wasserstein distance} between two diagrams $X$ and
$Y$ is
\[ W_p[d](X, Y) = \inf_{\phi:X\to Y} \left[\sum_{x\in X} {d\left(x, \phi(x)\right)}^p\right]^{1/p} \]
for $p\in [1,\infty)$, and:
\[ W_\infty[d](X, Y) = \inf_{\phi:X\to Y} \sup_{x\in X} d\left(x,
\phi(x)\right) \] for $p = \infty$, where $d$ is a distance on
$\mathbb{R}^2$ and $\phi$ ranges over all bijections from $X$ to
$Y$.
\end{defn}

\begin{defn}[Bottleneck distance]\label{defn:bottleneck}
The \emph{bottleneck distance} is defined as the $\infty$-Wasserstein
distance where $d$ is the uniform norm:
$d_B = W_\infty[L_\infty]$.
\end{defn}

The bottleneck distance is symmetric, non-negative, and satisfies the
triangle inequality. However, it is not a true distance, as one can
come up with two distinct diagrams with bottleneck distance 0, even
for multisets that do not touch the diagonal $\Delta$.
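
As an illustration, assuming the Dionysus~2 Python library, the
bottleneck distance between the degree-1 diagrams of a point cloud and
a small perturbation of it can be computed as follows; by the
stability results of the next section, the result should be small.

\begin{verbatim}
import numpy as np
import dionysus as d

def diagram(points, dim=1, radius=1.0):
    # Degree-`dim` persistence diagram of a Rips filtration.
    f = d.fill_rips(points, dim + 1, radius)
    m = d.homology_persistence(f)
    return d.init_diagrams(m, f)[dim]

x = np.random.random((50, 2))
y = x + np.random.normal(scale=0.01, size=x.shape)  # small perturbation

print(d.bottleneck_distance(diagram(x), diagram(y)))
\end{verbatim}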

\begin{figure}[ht]
\centering
\begin{tikzpicture}[dot/.style={draw,circle,inner sep=0pt,minimum size=4pt}]
\draw[thick,->] (-0.3,0) -- (6,0) node [right] {birth};
\draw[thick,->] (0,-0.3) -- (0,6) node [above right] {death};
\draw[very thick,dashed] (0,0) -- (5.5,5.5);
\node (b1) [dot,fill=blue!70] at (1,2) {};
\node (r1) [dot,fill=red!70] at (1,2.5) {};
\draw (b1) -- (r1);
\node (b2) [dot,fill=blue!70] at (1.5,4.5) {};
\node (r2) [dot,fill=red!70] at (2,5) {};
\draw (b2) -- (r2);
\node (b3) [dot,fill=blue!70] at (3,5.2) {};
\node (r3) [dot,fill=red!70] at (2.5,5.5) {};
\draw (b3) -- (r3);
\node (b4) [dot,fill=blue!70] at (2,3) {};
\draw (b4) -- (2.5,2.5);
\node (b5) [dot,fill=blue!70] at (4.7,5.3) {};
\draw (b5) -- (5,5);
\node (b6) [dot,fill=blue!70] at (0.8,1.2) {};
\draw (b6) -- (1,1);
\node (r4) [dot,fill=red!70] at (2.8,3.2) {};
\draw (r4) -- (3,3);
\node (r5) [dot,fill=red!70] at (3.4,4.2) {};
\draw (r5) -- (3.8,3.8);
\end{tikzpicture}
\caption{Bottleneck distance between two diagrams.}%
\label{fig:bottleneck}
\end{figure}

\section{Stability}%
\label{sec:stability}

One of the most important aspects of topological data analysis is that
it is \emph{stable} with respect to small perturbations in the
data. More precisely, the second step of the pipeline
in~\autoref{fig:ph-pipeline} is Lipschitz with respect to a suitable
metric on filtered complexes and the bottleneck distance on
persistence
diagrams~\cite{cohen-steiner_stability_2007,chazal_persistence_2014}. First,
we define a distance between subsets of a metric
space~\cite{oudot_persistence_2015}.

\begin{defn}[Hausdorff distance]
Let $X$ and $Y$ be subsets of a metric space $(E, d)$. The
\emph{Hausdorff distance} is defined by
\[ d_H(X,Y) = \max \left[ \sup_{x\in X} \inf_{y\in Y} d(x,y),
\sup_{y\in Y} \inf_{x\in X} d(x,y) \right]. \]
\end{defn}

We can now state an appropriate stability
property~\cite{cohen-steiner_stability_2007,chazal_persistence_2014}.

\begin{prop}
Let $X$ and $Y$ be subsets of a metric space. We have
\[ d_B(\dgm(X),\dgm(Y)) \leq d_H(X,Y). \]
\end{prop}

\section{Algorithms and implementations}%
\label{sec:algor-impl}

Many algorithms have been developed to compute persistent
homology. The first one developed, and by far the most commonly used,
is the so-called standard algorithm, introduced for the field
$\mathbb{Z}_2$ in~\cite{edelsbrunner_topological_2000}, and for
general fields in~\cite{zomorodian_computing_2005}. This algorithm
operates sequentially on the columns of a boundary matrix. Its
complexity is therefore cubic in the number of simplices in the worst
case. It has been proven that this bound is
tight~\cite{morozov_persistence_2005}.

Many algorithms have since been developed to deliver heuristic
speed-ups in the case of sparse matrices. There are both sequential
algorithms, such as the dual algorithm~\cite{de_silva_persistent_2011,
de_silva_dualities_2011}, and algorithms that introduce parallelism
into the computation, such as the distributed
algorithm~\cite{mcgeoch_distributed_2014}.

These algorithms have been made available in many public
implementations in the last few years. For a complete review and
benchmarks of these implementations,
see~\cite{otter_roadmap_2017}. Here, we focus on implementations that
provide a Python interface and implement common data structures, such
as filtrations and persistence diagrams. State-of-the-art libraries
include Ripser~\cite{bauer_ripser:_2018},
DIPHA~\cite{reininghaus_dipha_2018}, GUDHI~\cite{maria_gudhi_2014},
and Dionysus~\cite{morozov_dionysus:_2018}. GUDHI and Dionysus are
under active development, with new versions released recently,
exposing a complete Python API and implementing various algorithms,
including multifield persistence and cohomology.

In this project, Dionysus~2 has been selected for its ease of use,
good documentation, and good performance~\cite{otter_roadmap_2017}. We
only use persistent homology over the field
$\mathbb{Z}_2$. Dionysus is also one of the few libraries to implement
zigzag persistence (\autoref{sec:zigzag-persistence}).

% \section{Discussion}%
% \label{sec:discussion}

% information thrown away in filtrations and in PH

\chapter{Topological Data Analysis on Networks}%
\label{cha:topol-data-analys}

\section{Persistent homology for networks}%
\label{sec:pers-homol-netw}

We now consider the problem of applying persistent homology to network
data. An undirected network is already a simplicial complex of
dimension 1. However, this is not sufficient to capture enough
topological information; we need to introduce higher-dimensional
simplices. If the network is connected, one method is to project the
nodes of a network onto a metric space~\cite{otter_roadmap_2017},
thereby transforming the network data into point-cloud data. For
this, we need to compute the distance between each pair of nodes in
the network (e.g.\ with the shortest-path distance).

Various methods to project nodes onto a metric space (called
\emph{graph embeddings}) are
available~\cite{fouss_algorithms_2016}. These mappings try to preserve
the structure of the network as much as possible, e.g.\ by ensuring
that neighbours in the network are neighbours in the metric space
(according to the distance in that space), and vice versa. A few
methods worth mentioning in this area are \emph{spectral methods},
which define the mapping according to the eigenvectors of a matrix
constructed from the graph. These methods have the advantage of
minimizing a well-defined criterion, which often admits an exact
solution, and can often be computed
exactly~\cite{fouss_algorithms_2016}. They include kernel principal
components analysis, multidimensional scaling, Markov diffusion maps,
and Laplacian eigenmaps. Other methods are \emph{latent space methods},
which produce an embedding using a physical analogy, such as spring
networks and attractive forces. These methods are often used for graph
drawing (i.e.\ embedding in 2- or 3-dimensional spaces), but can only
be approximated for large networks~\cite{fouss_algorithms_2016}.

Using these graph embeddings, one can obtain a point cloud in a Euclidean
space, and build a simplicial complex using one of the various methods
developed for point clouds. One such example is the Čech complex
(\autoref{sec:filtrations}).

Another common method, for weighted networks, is called the
\emph{weight rank clique filtration}
(WRCF)~\cite{petri_topological_2013}, which filters a network based
on weights. The procedure works as follows (a Python sketch is given
after the list):
\begin{enumerate}
\item Consider the set of all nodes, without any edge, to be
filtration step~0.
\item Rank all edge weights in decreasing order $\{w_1,\ldots,w_n\}$.
\item At filtration step $t$, keep only the edges whose weights are
larger than or equal to $w_t$, thereby creating an unweighted graph.
\item Define the maximal cliques of the resulting graph to be
simplices.
\end{enumerate}
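
The following sketch (assuming the \texttt{networkx} and GUDHI Python
libraries; illustrative, not optimised) implements this procedure,
with the dimension bound $d_M$ discussed at the end of this section.

\begin{verbatim}
from itertools import combinations
import networkx as nx
import gudhi

def wrcf(G, max_dim=3):
    """Weight rank clique filtration of a weighted networkx graph G."""
    st = gudhi.SimplexTree()
    for v in G.nodes():
        st.insert([v], filtration=0)  # step 0: all nodes, no edges
    # Distinct edge weights, ranked in decreasing order.
    weights = sorted({w for _, _, w in G.edges(data="weight")}, reverse=True)
    for t, w in enumerate(weights, start=1):
        # Unweighted graph with all edges of weight >= w_t.
        H = nx.Graph((u, v) for u, v, d in G.edges(data="weight") if d >= w)
        for clique in nx.find_cliques(H):  # maximal cliques
            # Restrict to cliques of at most max_dim + 1 vertices.
            k = min(len(clique), max_dim + 1)
            for simplex in combinations(sorted(clique), k):
                st.insert(list(simplex), filtration=t)
    return st
\end{verbatim}

Persistent homology of the resulting filtered complex can then be
computed with, e.g., \texttt{st.persistence()}.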

At each step of the filtration, we construct a simplicial complex
based on cliques; this is called a \emph{clique
complex}~\cite{zomorodian_tidy_2010}. The result of the algorithm is
itself a filtered simplicial complex (\autoref{defn:filt}), because a
subset of a clique is necessarily a clique itself, and the same is
true for the intersection of two cliques.

This leads to one of the possibilities for applying persistent
homology to temporal networks: one can apply WRCF to a network,
obtaining a filtered complex, to which we can then apply persistent
homology.
|
||
|
||
This method can quickly become very computationally expensive, as
|
||
finding all maximal cliques (e.g.\ using the Bron--Kerbosch algorithm)
|
||
is a complicated problem, with an optimal computational complexity of
|
||
$\mathcal{O}\big(3^{n/3}\big)$~\cite{tomita_worst-case_2006}. In
|
||
practice, one often restrict the search to cliques of dimension less
|
||
than or equal to a certain bound $d_M$. With this restriction, the new
|
||
simplicial complex is homologically equivalent to the original: they
|
||
have the same homology groups up to $H_{d_M-1}$.
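
To make this concrete, here is a minimal sketch of WRCF in Python,
using igraph for clique enumeration and
Dionysus~\cite{morozov_dionysus:_2018} to store the filtration; it
assumes edge weights are stored in the edge attribute
\texttt{weight}, and restricts cliques to $d_M + 1$ nodes as
discussed above:

\begin{verbatim}
import dionysus as d

def wrcf(g, max_dim=3):
    """Weight rank-clique filtration of a weighted igraph graph g,
    keeping cliques of at most max_dim + 1 nodes."""
    birth = {(v,): 0 for v in range(g.vcount())}  # nodes at step 0
    weights = sorted(set(g.es["weight"]), reverse=True)
    for step, w in enumerate(weights, start=1):
        sub = g.subgraph_edges(g.es.select(weight_ge=w),
                               delete_vertices=False)
        for clique in sub.cliques(min=2, max=max_dim + 1):
            # record the first filtration step of each clique
            birth.setdefault(tuple(sorted(clique)), step)
    f = d.Filtration()
    for simplex, step in birth.items():
        f.append(d.Simplex(list(simplex), step))
    f.sort()
    return f
\end{verbatim}

Persistence can then be computed with
\texttt{d.homology\_persistence(f)} and read off with
\texttt{d.init\_diagrams}.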

\section{Zigzag persistence}%
\label{sec:zigzag-persistence}

The persistent homology methods exposed in the previous sections
operate on filtrations, which are nested sequences of simplicial
complexes:
\[ \cdots \longrightarrow K_{i-1} \longrightarrow K_i \longrightarrow
  K_{i+1} \longrightarrow \cdots, \] where each $\longrightarrow$
represents an inclusion map.

As we have seen in the previous section, filtrations can be built on
networks. Computing persistent homology therefore relies on
aggregating temporal networks, and then building a sequence of nested
simplicial complexes orthogonal to the time dimension.

Another approach would be to use the existing temporal sequence in the
network to build the filtration. The issue in this case is that the
sequence is no longer nested, as edges can be added or removed at each
time step (except for additive or dismantling temporal networks,
see~\autoref{defn:additive}). The development of \emph{zigzag
  persistence}~\cite{carlsson_zigzag_2008, carlsson_zigzag_2009}
solves this issue by introducing a novel way to compute persistent
homology on sequences of complexes that are no longer nested:
\[ \cdots \longleftrightarrow K_{i-1} \longleftrightarrow K_i
  \longleftrightarrow K_{i+1} \longleftrightarrow \cdots, \] where
each $\longleftrightarrow$ represents an inclusion map oriented
forwards or backwards.

To build this sequence from a temporal network, one can build a clique
complex at each time step. Edge additions and deletions will translate
to simplex additions and deletions in the sequence of simplicial
complexes. More details of this implementation are provided
in~\autoref{sec:zigzag-persistence-1}.

Note that zigzag persistence is related to the more general concept of
\emph{multi-parameter persistence}~\cite{carlsson_theory_2009,
  dey_computing_2014}, where simplicial complexes can be filtered with
more than one parameter. It is an active area of research, especially
as the fundamental theorem of persistent homology is no longer valid
with more than one parameter, and there are significant challenges in
the visualization of ``barcodes'' for 2-parameter persistent
homology~\cite{otter_roadmap_2017}.

The complexity of the zigzag algorithm is cubic in the maximum number
of simplices in the complex~\cite{carlsson_zigzag_2009}, which is
equivalent to the worst-case complexity of the standard algorithm for
persistent homology (\autoref{sec:algor-impl}). In practice however,
zigzag computations tend to take much longer than their standard
counterparts. Computing zigzag persistence on a temporal network is
more costly than computing persistent homology on the weight
rank-clique filtration of the aggregated graph.

The library Dionysus~\cite{morozov_dionysus:_2018} is the only one to
implement zigzag persistence at the time of this writing. As the
implementation of the zigzag algorithm is not
straightforward~\cite{carlsson_zigzag_2009, maria_computing_2016},
Dionysus was the most logical option for the topological study of
temporal networks.

\chapter{Persistent Homology for Machine-Learning Applications}%
\label{cha:pers-homol-mach}

The output of persistent homology is not directly usable by most
statistical methods. For example, barcodes and persistence diagrams,
which are multisets of points in $\overline{\mathbb{R}}^2$, are not
elements of a metric space in which one can perform statistical
computations.

The distances between persistence diagrams defined
in~\autoref{sec:topol-summ} allow one to compare different
outputs. From a statistical perspective, it is possible to use a
generative model of simplicial complexes and to use a distance between
persistence diagrams to measure the similarity of our observations
with this null model~\cite{adler_persistent_2010}. This would
effectively define a metric space of persistence diagrams. It is even
possible to define some statistical summaries (means, medians,
confidence intervals) on these
spaces~\cite{turner_frechet_2014,munch_probabilistic_2015}.

The issue with this approach is that metric spaces do not offer enough
algebraic structure to be amenable to most machine-learning
techniques. Many of these methods, such as principal-components
analysis (PCA) and support vector machines (SVMs), require a Hilbert
structure on the feature space~\cite{carriere_sliced_2017,
  chazal_persistence_2014}. Equipped with this structure, one can then
define common operations such as addition, average, or scalar product
on features, which then facilitate their use in machine learning. One
of the most recent developments in the study of topological summaries
has been to find mappings between the space of persistence diagrams
and Banach spaces~\cite{adams_persistence_2017,
  bubenik_statistical_2015, kwitt_statistical_2015,
  kusano_kernel_2017}. (The definitions of common topological
structures can be found in \autoref{cha:topology}.)

\section{Vectorization methods}%
\label{sec:vect-meth}

The first possibility is to build an explicit feature map. Each
persistence diagram is projected into a vector of $\mathbb{R}^n$, on
which one can then build a suitable Hilbert structure.

The main examples in this category are persistence
landscapes~\parencite{bubenik_statistical_2015} and persistence
images~\cite{adams_persistence_2017}.

\subsection{Persistence landscapes}

Persistence landscapes~\cite{bubenik_statistical_2015} give a way to
project barcodes to a space where it is possible to add them
meaningfully. It is then possible to define means of persistence
diagrams, as well as other summary statistics.

The function mapping a persistence diagram to a persistence landscape
is \emph{injective}, but no explicit inverse exists to go back from a
persistence landscape to the corresponding persistence
diagram. Moreover, a mean of persistence landscapes does not
necessarily have a corresponding persistence diagram.

\begin{defn}[Persistence landscape]
  The persistence landscape of a diagram $D = {\{(b_i,d_i)\}}_{i=1}^n$
  is the set of functions $\lambda_k: \mathbb{R} \mapsto \mathbb{R}$,
  for $k\in\mathbb{N}$, such that
  \[ \lambda_k(x) = k\text{-th largest value of } {\{f_{(b_i,
        d_i)}(x)\}}_{i=1}^n, \] (and $\lambda_k(x) = 0$ if the $k$-th
  largest value does not exist), where $f_{(b,d)}$ is a piecewise-linear
  function defined by:
  \[ f_{(b,d)}(x) =
    \begin{cases}
      0,& \text{if }x \notin (b,d),\\
      x-b,& \text{if }x\in (b,\frac{b+d}{2}),\\
      -x+d,& \text{if }x\in (\frac{b+d}{2},d)\,.
    \end{cases}
  \]
\end{defn}

Moreover, one can show that persistence landscapes are stable with
respect to the $L^p$ distance, and that the Wasserstein and bottleneck
distances are bounded by the $L^p$
distance~\cite{bubenik_statistical_2015}. We can thus view the
landscapes as elements of a Banach space in which we can perform the
statistical computations.
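
Since the definition is directly computable on a grid, a minimal
NumPy sketch (the function name and discretization choices are ours)
can serve as an illustration; it evaluates the first landscape
functions of a diagram given as an array of finite $(b, d)$ pairs:

\begin{verbatim}
import numpy as np

def landscapes(diagram, k_max, grid):
    """Evaluate lambda_1, ..., lambda_k_max on the given grid."""
    tents = np.zeros((len(diagram), len(grid)))
    for i, (b, d) in enumerate(diagram):
        # piecewise-linear "tent" function f_{(b,d)}
        tents[i] = np.minimum(grid - b, d - grid).clip(min=0)
    tents = -np.sort(-tents, axis=0)  # k-th largest value per point
    result = np.zeros((k_max, len(grid)))
    k = min(k_max, len(diagram))
    result[:k] = tents[:k]            # lambda_k = 0 beyond n points
    return result
\end{verbatim}

Averaging the outputs of this function over several diagrams gives
the mean landscape mentioned above.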

\subsection{Persistence images}

Persistence images~\cite{adams_persistence_2017} consist of a
convolution of the persistence diagram with a probability
distribution, followed by a discretization of the resulting
distribution in order to obtain a finite-dimensional vector. Most of
the following section is derived from the original
paper~\cite{adams_persistence_2017}.

\begin{defn}[Persistence surface]
  Let $B$ be a persistence diagram, and
  $T : \mathbb{R}^2\mapsto\mathbb{R}^2$ the linear transformation
  $T(x,y) = (x, y-x)$. Let $\phi_u:\mathbb{R}^2\mapsto\mathbb{R}$ be a
  differentiable probability density function with mean
  $u\in\mathbb{R}^2$, and $f$ a non-negative weighting function which
  is zero along the horizontal axis, continuous, and piecewise
  differentiable.

  The \emph{persistence surface} associated to $B$ is the function
  $\rho_B:\mathbb{R}^2\mapsto\mathbb{R}$ such that
  \[ \rho_B(z) = \sum_{u\in T(B)} f(u) \phi_u(z). \]
\end{defn}

Then, one needs to reduce the persistence surface to a
finite-dimensional vector by discretizing a subdomain of $\rho_B$ and
integrating it over each region.

\begin{defn}[Persistence image]
  Let $\rho_B$ be the persistence surface of a persistence diagram
  $B$. We fix a grid on the plane with $n$ cells (called
  \emph{pixels}). The \emph{persistence image} of $B$ is the
  collection of pixels, where for each cell $p$,
  \[ {I(\rho_B)}_p = \iint_p \rho_B \diff y \diff x. \]
\end{defn}

There are three parameters:
\begin{itemize}
\item the resolution of the grid overlaid on the persistence surface,
\item the probability distribution, which is often taken as a Gaussian
  distribution centred on each point (one still needs to choose an
  appropriate variance),
\item the weight function, which must be zero on the horizontal axis
  (which corresponds to the diagonal $\Delta$ before transformation by
  the function $T$), continuous, and piecewise differentiable in order
  for the stability results to hold. Generally, weighting functions
  are taken non-decreasing in $y$ in order to weight points of higher
  persistence more heavily.
\end{itemize}

All of these choices are non-canonical, but the classification
accuracy on most tasks seems to be robust to the choice of resolution
and variance of the Gaussian
distribution~\cite{zeppelzauer_topological_2016}.
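
A minimal sketch of this construction, with a Gaussian density and a
linear weight in $y$, could look as follows (as an approximation, it
evaluates the surface at pixel centres instead of integrating over
each pixel; all parameter choices are ours):

\begin{verbatim}
import numpy as np

def persistence_image(diagram, res=20, sigma=0.1,
                      extent=(0.0, 1.0, 0.0, 1.0)):
    """Discretized persistence surface of an array of finite
    (birth, death) pairs, with Gaussian kernels."""
    xmin, xmax, ymin, ymax = extent
    X, Y = np.meshgrid(np.linspace(xmin, xmax, res),
                       np.linspace(ymin, ymax, res))
    img = np.zeros((res, res))
    for b, d in diagram:
        x, y = b, d - b               # T(x, y) = (x, y - x)
        w = y / (ymax - ymin)         # linear weight, zero on the axis
        img += w * np.exp(-((X - x)**2 + (Y - y)**2)
                          / (2 * sigma**2))
    return img / (2 * np.pi * sigma**2)  # normalize the Gaussian
\end{verbatim}

The image can then be flattened into a vector of $\mathbb{R}^n$ and
used as an input to any standard learning algorithm.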

It is also important to note that points with infinite persistence are
ignored by the weighting function $f$. Persistence images are
therefore not suitable in applications where these features can be
important to consider.

Persistence images are stable with respect to the 1-Wasserstein
distance between persistence diagrams (and with respect to the $L^1$,
$L^2$, and $L^\infty$ distances between
images)~\cite{adams_persistence_2017}.

In practice, persistence images are interesting because they project
persistence diagrams into a Euclidean space. Compared to persistence
landscapes, one can apply a broader range of machine-learning
techniques. It has also been observed that persistence images
outperform persistence landscapes in many classification tasks, with a
comparable computational efficiency~\cite{adams_persistence_2017}.

% \subsection{Tropical and arctic semirings}

% \cite{kalisnik_tropical_2018}

\section{Kernel-based methods}%
\label{sec:kernel-based-methods}

The other possibility is to define feature maps \emph{implicitly} by
building kernels on persistence diagrams. Such a kernel allows one to
use a wide range of kernel-based machine-learning methods.

Let us recall the general framework of kernel
methods~\cite{muandet_kernel_2017, sejdinovic_advanced_2018}.

\begin{defn}[kernel]
  A function $k:X\times X \mapsto \mathbb{R}_+$ on a non-empty set $X$
  is a \emph{kernel} if there exist a Hilbert space $\mathcal{H}$ and
  a map $\phi:X\mapsto\mathcal{H}$ such that
  \[ \forall x, y \in X,\; k(x,y) = {\langle \phi(x), \phi(y)
      \rangle}_{\mathcal{H}}. \]

  The Hilbert space $\mathcal{H}$ is called the \emph{feature space}
  and the function $\phi$ is called the \emph{feature map}.
\end{defn}

As inner products are positive definite, so are kernels, since they
are inner products on feature maps.

\begin{defn}[Reproducing kernel]
  Let $\mathcal{H}$ be a Hilbert space of functions from a non-empty
  set $X$ to $\mathbb{R}$. A function
  $k:X\times X\mapsto\mathbb{R}$ is called
  a \emph{reproducing kernel} of $\mathcal{H}$ if it satisfies:
  \begin{itemize}
  \item $\forall x\in X,\; k(\cdot,x)\in\mathcal{H}$,
  \item
    $\forall x\in X, \forall f\in\mathcal{H},\; {\langle f,
      k(\cdot,x)\rangle}_{\mathcal{H}} = f(x)$.
  \end{itemize}
\end{defn}

Note that every reproducing kernel is a kernel, with feature space
$\mathcal{H}$ and feature map $\phi:x \mapsto k(\cdot,x)$. In this
case, $\phi$ is called the \emph{canonical feature map}: the features
are not given explicitly as vectors of $\mathbb{R}^n$, but as
functions on $X$.

If $\mathcal{H}$ has a reproducing kernel, it is called a
\emph{reproducing kernel Hilbert space} (RKHS). The important result
here is the \emph{Moore--Aronszajn
  theorem}~\cite{berlinet_reproducing_2011}: for every positive
definite function $k$, there exists a unique RKHS with kernel $k$.

We can now build a feature space with a Hilbert structure without
defining explicitly the feature map. Defining a kernel, i.e.\ any
positive definite function, on persistence diagrams is enough to
guarantee the existence of a unique RKHS with the adequate structure
to perform machine-learning tasks.

The following sections will define some relevant kernels.

\subsection{Sliced Wasserstein kernel}%
\label{sec:swk}

The sliced Wasserstein kernel is a new kernel on persistence diagrams
introduced by Carrière et al.\ in~\cite{carriere_sliced_2017}. The
general idea is to intersect the plane with lines going through the
origin, to project the points of the persistence diagrams onto
these lines, and to compute the distance between the diagrams as the
distance between measures on the real line. These distances are then
integrated over all the possible lines passing through the origin.

The formal definition (taken from~\cite{carriere_sliced_2017}) relies
on the \emph{1-Wasserstein distance} between measures on~$\mathbb{R}$.

\begin{defn}[1-Wasserstein distance]
  Let $\mu$ and $\nu$ be two non-negative measures on $\mathbb{R}$ such
  that $\mu(\mathbb{R}) = \nu(\mathbb{R})$. The 1-Wasserstein distance
  between $\mu$ and $\nu$ is
  \[ \mathcal{W}(\mu, \nu) = \sup_{f} \int_{\mathbb{R}} f(x) \left[
      \mu(\diff x) - \nu(\diff x) \right], \]
  where the supremum is taken over 1-Lipschitz functions $f$.
\end{defn}

One can now define formally the sliced Wasserstein kernel.

\begin{defn}[Sliced Wasserstein kernel]
  Let $\mathbb{S}_1$ be the unit sphere for the $L_2$ distance in
  $\mathbb{R}^2$. Given $\theta\in\mathbb{S}_1$, let $L(\theta)$ be the
  line $\{\lambda\theta : \lambda\in\mathbb{R}\}$, and $\pi_\theta$
  the orthogonal projection onto $L(\theta)$. Let $\pi_\Delta$ be the
  orthogonal projection onto the diagonal.

  Let $D_1$ and $D_2$ be two persistence diagrams, and let
  \[\mu_1^\theta = \sum_{p\in D_1} \delta_{\pi_\theta(p)} \qquad\text{and}\qquad
    \mu_{1\Delta}^\theta = \sum_{p\in D_1}
    \delta_{\pi_\theta\circ\pi_\Delta(p)},\] and similarly for
  $\mu_2^\theta$ and $\mu_{2\Delta}^\theta$.

  The sliced Wasserstein distance is defined as
  \[ SW(D_1, D_2) = \frac{1}{2\pi} \int_{\mathbb{S}_1}
    \mathcal{W}(\mu_1^\theta + \mu_{2\Delta}^\theta,\; \mu_2^\theta +
    \mu_{1\Delta}^\theta) \diff\theta. \]
\end{defn}

One can show that $SW$ is negative
definite~\cite{carriere_sliced_2017}. The function $k_{SW}$ defined as
\[ k_{SW}(D_1, D_2) = \exp\left(-\frac{SW(D_1,D_2)}{2\sigma^2}\right) \]
is therefore a valid kernel, called the \emph{sliced Wasserstein
  kernel}.

\paragraph{Stability}

It can be shown that the sliced Wasserstein distance is
\emph{equivalent} to the 1-Wasserstein distance between persistence
diagrams (\autoref{defn:wasserstein-dist}). (For a definition of
metric equivalence, see~\autoref{cha:topology}.)

\paragraph{Approximate computation}

In practice, $k_{SW}$ can be approximated by sampling $M$ directions
between $-\pi/2$ and $\pi/2$. For each direction $\theta_i$ and for
each persistence diagram $D$, one computes the scalar products between
the points of the diagram and $\theta_i$, and sorts them into a vector
$V_{\theta_i}(D)$. The $L_1$-distance between the vectors
corresponding to each diagram is then averaged over the sampled
directions:
\[ SW_M(D_1, D_2) = \frac{1}{M} \sum_{i=1}^M {\lVert V_{\theta_i}(D_1)
    - V_{\theta_i}(D_2) \rVert}_1. \]

The complete approximate computation is detailed
in~\autoref{algo:swk}. It has a complexity of
$\mathcal{O}(MN\log(N))$, where $N$ is an upper bound on the
cardinality of the persistence diagrams.

\begin{algorithm}[ht]
  \caption{Approximate computation of the sliced Wasserstein kernel.}\label{algo:swk}
  \DontPrintSemicolon%
  \KwIn{$D_1 = \{p_1^1,\ldots,p_{N_1}^1\}, D_2 = \{p_1^2,\ldots,p_{N_2}^2\}, M$}
  \KwOut{$SW$}
  Add $\pi_\Delta(D_1)$ to $D_2$ and vice-versa\;
  $SW \leftarrow 0$\;
  $\theta \leftarrow -\pi/2$\;
  $s \leftarrow \pi/M$\;
  \For{$i \leftarrow 1$\KwTo$M$}{
    Store the products $\langle p_k^1, \theta \rangle$ in an array $V_1$\;
    Store the products $\langle p_k^2, \theta \rangle$ in an array $V_2$\;
    Sort $V_1$ and $V_2$ in ascending order\;
    $SW \leftarrow SW + s {\lVert V_1 - V_2 \rVert}_1$\;
    $\theta \leftarrow \theta + s$\;
  }
  $SW \leftarrow SW/\pi$\;
\end{algorithm}
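
A direct NumPy transcription of~\autoref{algo:swk}, wrapped in the
Gaussian defining $k_{SW}$ (the function name and array conventions
are ours; diagrams are arrays of finite $(b, d)$ points), could look
as follows:

\begin{verbatim}
import numpy as np

def sw_kernel(D1, D2, M=10, sigma=1.0):
    """Approximate sliced Wasserstein kernel between two diagrams."""
    def proj_diag(D):
        m = (D[:, 0] + D[:, 1]) / 2
        return np.column_stack([m, m])
    A = np.vstack([D1, proj_diag(D2)])  # augment each diagram with the
    B = np.vstack([D2, proj_diag(D1)])  # diagonal points of the other
    sw, s = 0.0, np.pi / M
    for theta in -np.pi / 2 + s * np.arange(M):
        direction = np.array([np.cos(theta), np.sin(theta)])
        v1 = np.sort(A @ direction)     # sorted scalar products
        v2 = np.sort(B @ direction)
        sw += s * np.abs(v1 - v2).sum()
    sw /= np.pi
    return np.exp(-sw / (2 * sigma**2))
\end{verbatim}

The implementation discussed in~\autoref{sec:distance-matrix} uses
this approximation with $M = 10$ directions.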

\subsection{Persistence scale-space kernel}%
\label{sec:pers-scale-space}

The persistence scale-space kernel
(PSSK)~\cite{reininghaus_stable_2015,kwitt_statistical_2015} is
another kernel on persistence diagrams. The following overview is
summarised from~\cite{reininghaus_stable_2015}. The general idea is to
represent a diagram $D$ as a sum of Dirac deltas centred on each point
of $D$. This representation is a natural projection onto the space of
functionals, which has a Hilbert structure.

However, this representation does not take into account the distance
of the points of $D$ to the diagonal. This is important since points
close to the diagonal represent short-lived features, and are
therefore more likely to be noise. To take this into account, the sum
of Dirac deltas is taken as the initial condition of a heat diffusion
on the half-plane above the diagonal, with a null boundary condition
on the diagonal itself.

This leads to the definition of the embedding as the solution of a
partial differential equation, which admits an explicit solution in the
form of a positive definite kernel between persistence diagrams. This
kernel also depends on a scale parameter, which allows one to control
the robustness of the embedding to noise.

This kernel also comes with stability guarantees, as it is
Lipschitz-continuous with respect to the 1-Wasserstein distance. It is
also fast, as the distance between two diagrams $D_1$ and $D_2$ can be
computed in $\mathcal{O}(\lvert D_1 \rvert \lvert D_2 \rvert)$, where
$\lvert D \rvert$ is the number of points in the diagram, or
approximated in $\mathcal{O}(\lvert D_1 \rvert + \lvert D_2 \rvert)$
with bounded error. In practice, empirical tests show that the
persistence scale-space kernel significantly outperforms the
persistence landscapes in shape classification
tasks~\cite{reininghaus_stable_2015}.

\subsection{Persistence weighted Gaussian kernel}%
\label{sec:pers-weight-gauss}

The persistence weighted Gaussian kernel
(PWGK)~\cite{kusano_kernel_2017} is actually a family of kernels on
persistence diagrams. Given a diagram $D$, one can define a measure
$\mu_D^w := \sum_{x\in D} w(x) \delta_x$, where $\delta_x$ is the
Dirac delta centred on $x$. The weight function $w$ can be chosen to
give more weight to points farther from the diagonal. One example is
$w(x) := \arctan\left(C {(\mathrm{death}(x) -
    \mathrm{birth}(x))}^p\right)$, with $C>0$ and $p\in\mathbb{N}^*$.

Then, given a kernel $k$ and the corresponding RKHS $\mathcal{H}_k$,
\[ \mu_D^w \mapsto \sum_{x\in D} w(x) k(\cdot, x) \] is an embedding
of $\mu_D^w$ in $\mathcal{H}_k$. The persistence weighted Gaussian
kernel is obtained by choosing $k$ as the Gaussian kernel
$k_G(x,y) := \exp\left(-\frac{{\lVert x-y \rVert}^2}{2\sigma^2}
\right)$.
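
With this embedding, the inner product between two embedded diagrams
has the closed form
$\sum_{x\in D_1}\sum_{y\in D_2} w(x) w(y) k_G(x,y)$, which only
involves the $\mathcal{O}(n^2)$ kernel evaluations mentioned below. A
minimal NumPy sketch of this inner product (the function name and
parameter defaults are ours) is:

\begin{verbatim}
import numpy as np

def pwgk_inner(D1, D2, sigma=1.0, C=1.0, p=1):
    """Inner product in the RKHS of the Gaussian kernel between
    the weighted measures of two diagrams (arrays of finite
    (birth, death) pairs)."""
    w1 = np.arctan(C * (D1[:, 1] - D1[:, 0])**p)  # arctan weights
    w2 = np.arctan(C * (D2[:, 1] - D2[:, 0])**p)
    sq = ((D1[:, None, :] - D2[None, :, :])**2).sum(-1)
    return w1 @ np.exp(-sq / (2 * sigma**2)) @ w2
\end{verbatim}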

The PWGK is stable with respect to the bottleneck
distance~\cite{kusano_kernel_2017}, and allows for efficient
computation. If the persistence diagrams contain at most $n$ points,
computation of the kernel involves $\mathcal{O}(n^2)$ evaluations of
the kernel $k$. Similarly to the PSSK, an approximation is possible in
$\mathcal{O}(n)$.

Experimental results on shape classification with SVMs show a
significant improvement in accuracy over the PSSK, persistence images,
and persistence landscapes~\cite{kusano_kernel_2017}.

\section{Comparison}%
\label{sec:comparison}

Every vectorization method presented in the previous sections is
injective and stable with respect to some distance on the space of
persistence diagrams. None of them, however, is surjective, and no
explicit inverse exists.

Only one of these methods preserves the metric on the space of
persistence diagrams: the sliced Wasserstein kernel, due to its
equivalence to the 1-Wasserstein distance, as mentioned
in~\autoref{sec:swk}. As such, it is considered the state of the art
in kernel embeddings of persistence diagrams.

There are two broad classes of applications that require different
kinds of vectorization methods. On the one hand, if one needs to go
back from the feature space to the diagram space, the best bet is an
embedding that preserves distances, such as the sliced Wasserstein
kernel, or one with strong stability guarantees, such as the
persistence weighted Gaussian kernel. These embeddings are best for
distance-based methods, such as multidimensional scaling or nearest
neighbours algorithms.

On the other hand, getting insights from individual points of a
diagram, in order to recover information about individual topological
features (such as cycles, holes, or voids), is a much harder, less
well-studied problem. For instance, to recover the topological
features of the mean of persistence diagrams, one would need to fit
one of the vectorization methods on the mean. For this, persistence
landscapes or images seem better suited.

This project focuses on clustering of networks. As such, conservation
of the metric and stability are extremely important. Due to its
theoretical guarantees, we will focus on the sliced Wasserstein
kernel, which is also significantly easier to implement in its
approximate version than the PSSK (which uses random Fourier
features~\cite{reininghaus_stable_2015}) and the PWGK (which uses the
fast Gauss transform~\cite{kusano_kernel_2017}).

\chapter{Temporal partitioning of networks}%
\label{cha:temp-part-netw}

\section{Problem statement}%
\label{sec:problem-description}

\subsection{Data}%
\label{sec:data}

Temporal networks represent an active and recent area of research. The
additional temporal dimension adds complexity to the study of
graphs. As such, many methods that work well with graphs fail in the
context of temporal networks.

Temporal networks are much more difficult to visualize, which makes it
harder to uncover patterns
directly~\cite{holme_temporal_2012}. Moreover, there are many issues
in data collection. Complex interaction networks where each edge can
be either present or absent at each time step grow exponentially in
size with the number of nodes and the total data collection
time~\cite{holme_temporal_2012}. Empirical temporal networks also tend
to exhibit oversampling and noise, due to the nature of the
measurements. For instance, proximity networks can record an
interaction between two individuals if they walk close to each other
without actually interacting. New advances try to take into account
these limitations of data collection~\cite{sulo_meaningful_2010,
  newman_network_2018}.

In this study, we will consider temporal networks with \emph{contact}
interactions. In this context, interactions between nodes are assumed
to have a duration of~0, and \emph{oversampling} is used to represent
a long interaction. For instance, in a network sampled every
5~seconds, an interaction lasting for 30~seconds will be recorded in 6
consecutive time steps.

\subsection{Sliding windows}%
\label{sec:sliding-windows}

One possible solution to the study of temporal networks is a
partitioning of the time scale using \emph{sliding windows}.

\begin{defn}[Temporal partitioning]\label{defn:partitioning}
  Let $G = (V, E, \mathcal{T}, \rho)$ be a temporal network, and let
  $C = (c_1,\ldots,c_n)$ be a cover of $\mathcal{T}$ by non-empty
  intervals of $\mathbb{N}$.

  Then the sequence of temporal networks $(G_1,\ldots,G_n)$, where
  $G_i = (V, E, c_i, \rho_i)$ and
  $\rho_i(e, t) = \rho(e, t)\mathbb{1}_{t\in c_i}$, is a
  \emph{temporal partitioning} of $G$.

  The partitioning is \emph{uniform} if all intervals of $C$ have the
  same length. This length is called the \emph{temporal resolution} of
  the partitioning.
\end{defn}

In this project, we will only consider uniform partitionings of a
finite temporal domain in which the intersections of consecutive
intervals all have the same length. This intersection length is
called the \emph{overlap}.

\begin{figure}[!ht]
  \centering
  \begin{tikzpicture}
    \def\t{2}
    \def\s{.5}
    \foreach \x in {0,...,4}
    \draw[thick,|-|] ({2*\x*(\t - \s)},0) -- ({2*\x*(\t - \s) + \t},0);
    \foreach \x in {0,...,3}
    \draw[thick,|-|] ({2*\x*(\t - \s) + \t - \s},.4) -- ({2*\x*(\t - \s) + 2*\t - \s},.4);
    \draw[<->] (0,-.2) -- (\t,-.2) node [below,midway] {$\Delta t$};
    \draw[<->] ({\t - \s}, .6) -- (\t, .6) node [above,midway] {$s$};
    % \draw[thick,->] (0,-1) -- ({8*(\t - \s) + \t},-1) node [above] {$t$};
  \end{tikzpicture}
  \caption{Uniform temporal partitioning with resolution $\Delta t$ and overlap $s$.}%
  \label{fig:partitioning}
\end{figure}

The choices of temporal resolution and overlap have a significant
effect on the results of the analysis~\cite{ribeiro_quantifying_2013,
  krings_effects_2012, sulo_meaningful_2010}. Different tasks may
require specific parameters. A large resolution can overlook a
significant pattern, while a small overlap may cut through significant
features, dividing them between two consecutive intervals.

\subsection{Classification}%
\label{sec:classification}

After partitioning the temporal network, it is possible to run any
kind of classification task on the resulting sequence of subnetworks.

If labels are available on each subnetwork, it is possible to run some
supervised learning tasks. In the more common case of unsupervised
learning, there are many possibilities, including the clustering of
all subnetworks, or the detection of change points, where the
structure of the network changes
fundamentally~\cite{peel_detecting_2014}.

In this dissertation, we focus on unsupervised clustering of the
subnetworks in order to detect periodicity in the original temporal
network. Most machine-learning algorithms cannot take temporal
networks directly as inputs. It is thus necessary to \emph{vectorize}
these networks, i.e.\ to project them onto a metric space with a
structure suitable to the algorithm used. For instance, one could use
traditional statistical summaries of networks
(\autoref{sec:network-statistics}), or the topological methods and
their vectorizations discussed in the previous chapters.

The choice of vectorization depends on the choice of the clustering
algorithm itself. Some machine-learning techniques, such as support
vector machines, require a Hilbert structure on the input
space~\cite{carriere_sliced_2017, hastie_elements_2009}, while others,
like $k$-nearest neighbours or agglomerative clustering, only require
a metric space~\cite{hastie_elements_2009}. The feature space will
therefore restrict the set of clustering algorithms available.

\subsection{Applications}%
\label{sec:applications}

The persistent homology pipeline can be used to determine different
properties of temporal networks. This study focuses on determining the
\emph{periodicity} of a temporal network. By clustering the
subnetworks obtained by partitioning the temporal domain into sliding
windows, it is possible to determine if a temporal network is periodic
in its topological structure, and if so, to estimate its
period.

\section{The analysis pipeline}%
\label{sec:analysis-pipeline}

\subsection{General overview}%
\label{sec:general-overview}

\begin{figure}[p]
  \caption[Overview of the analysis pipeline.]{Overview of the
    analysis pipeline. New approaches introduced in this study are
    highlighted in \emph{italics}.}%
  \label{fig:pipeline}
  \centering
  %\footnotesize
  \begin{tikzpicture}[block_left/.style={rectangle,draw=black,thick,fill=white,text width=4cm,text centered,inner sep=6pt},
    block_right/.style={rectangle,draw=black,thick,fill=white,text width=8cm,text ragged,inner sep=6pt},
    line/.style={draw,thick,-latex',shorten >=0pt},
    dashed_line/.style={draw,thick,dashed,-latex',shorten >=0pt}]
    \matrix [column sep=2cm,row sep=1cm] {
      \node {\Large\textbf{General approach}};
      & \node {\Large\textbf{Specific pipeline}}; \\
      \node (dataset)[block_left] {Dataset};
      & \node (dataset_r)[block_right] {
        Data sources
        \begin{itemize}
        \item Generative model (\ref{sec:gener-model-peri})
          \begin{itemize}
          \item Erdős-Rényi, Watts-Strogatz models
          \item periodic distribution of interactions
          \end{itemize}
        \item Social networks data (\ref{sec:datasets})
        \end{itemize}
      }; \\
      \node (representation)[block_left] {Data representation};
      & \node (representation_r)[block_right]{
        Temporal networks
        \begin{itemize}
        \item Definition (\ref{sec:defin-basic-prop})
        \item Representation (\ref{sec:data-representation})
        \end{itemize}
      }; \\
      \node (processing)[block_left] {Data processing};
      & \node (processing_r)[block_right] {
        Temporal partitioning \emph{(standard)} (\ref{sec:sliding-windows-1})\\
        \emph{Novelty: Clustering of time windows}
      }; \\
      \node (analysis)[block_left] {Data analysis};
      & \node (analysis_r)[block_right] {
        Topological tools (\ref{sec:topological-analysis})\\
        \emph{Novelty: application of TDA to temporal networks}\\
        \emph{Novelty: comparison between WRCF and zigzag persistence for networks}
      }; \\
      \node (interpretation)[block_left] {Interpretation};
      & \node (interpretation_r)[block_right] {
        Clustering (\ref{sec:clustering})
        \begin{itemize}
        \item Distance matrix
        \item[] \emph{Novelty: comparison between bottleneck distance and SW kernel}
        \item Hierarchical clustering
        \end{itemize}
      }; \\
    };

    \begin{scope}[every path/.style=line]
      \path (dataset) -- (representation);
      \path (representation) -- (processing);
      \path (processing) -- (analysis);
      \path (analysis) -- (interpretation);
    \end{scope}
    \begin{scope}[every path/.style=dashed_line]
      \path (dataset) -- (dataset_r);
      \path (representation) -- (representation_r);
      \path (processing) -- (processing_r);
      \path (analysis) -- (analysis_r);
      \path (interpretation) -- (interpretation_r);
    \end{scope}
  \end{tikzpicture}
\end{figure}

The analysis pipeline consists of several steps:
\begin{enumerate}
\item Load the data: temporal networks are often distributed as
  \emph{interaction lists}. In these files, each line consists of two
  nodes and a timestamp, and thus represents one contact
  interaction. One can reconstruct the temporal network by extracting
  all timestamps of a given edge and adding them as an edge
  property. It is then easy to extract a subnetwork within a specific
  time interval.
\item Interaction networks are sometimes directed. In these cases, it
  is necessary to transform the network into an undirected one, as
  most methods (particularly topological methods, such as WRCF and
  zigzag persistence) only work on undirected networks.
\item Using the methods discussed in~\autoref{sec:sliding-windows-1},
  the temporal domain is segmented into sliding windows and a list of
  subnetworks can be generated.
\item Features are extracted from each subnetwork. These features can
  be constructed from different kinds of persistent homology on
  networks, as discussed in~\autoref{sec:topological-analysis}.
\item Depending on the methods used, the feature space is equipped
  with a metric that can make it a Hilbert space or a simple metric
  space. In any case, a distance matrix representing pairwise
  distances between each subnetwork is computed.
\item Hierarchical clustering is applied to the distance matrix.
\end{enumerate}

The whole analysis pipeline is summarised in~\autoref{fig:pipeline}.

\subsection{Data representation}%
\label{sec:data-representation}

The data is represented in the algorithms as multigraphs. Each edge is
associated with a timestamp (an integer). Two nodes can be linked by
multiple edges, each one of them representing a time at which the edge
is present.

This representation allows for easy filtering, as one can extract a
temporal network in a given time interval by keeping only the edges
whose timestamp is included in the interval. One can also build the
underlying aggregated graph by ``collapsing'' multiple edges into a
single one.

It is important to note that the nodes of the network are completely
static and always present. This follows the temporal networks model
adopted in~\autoref{defn:temp-net}.

\subsection{Sliding windows}%
\label{sec:sliding-windows-1}

As mentioned in~\autoref{sec:sliding-windows}, we consider temporal
networks whose temporal domain is a finite interval of $\mathbb{N}$.

For a temporal resolution $\Delta t$ and an overlap $s$, we compute
the temporal partitioning as follows.
\begin{enumerate}
\item Compute the length of the temporal domain $\mathcal{T}$.
\item Segment it into $N$ sliding windows of length $\Delta t$ with an
  overlap $s$.
\item Each subnetwork in the sequence contains only the interactions
  appearing during the corresponding sliding window.
\end{enumerate}

\begin{algorithm}[ht]
  \caption{Temporal partitioning of network with sliding
    windows.}\label{algo:partitioning}
  \DontPrintSemicolon%
  \SetKwData{Res}{res}
  \SetKwData{Times}{times}
  \SetKwData{WinLength}{window\_length}
  \SetKwData{Windows}{windows}
  \KwIn{Graph $G$, resolution \Res}
  \KwOut{List of subnetworks \Windows}
  \Times$\leftarrow$ list of timestamps in $G$\;
  $\WinLength\leftarrow\Res\times (\max(\Times) - \min(\Times))$\;
  \For{$i \leftarrow 0$ \KwTo$1/\Res - 1$}{
    \Windows[$i$] $\leftarrow$ subnetwork of $G$ containing all nodes, and
    edges whose timestamp is in
    $\left[ \min(\Times) + \WinLength\times i, \min(\Times) +
      \WinLength\times(i+1) \right]$\;
  }
\end{algorithm}
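
\autoref{algo:partitioning} covers the non-overlapping case; a
minimal Python sketch with an explicit resolution $\Delta t$ and
overlap $s$, for multigraphs stored in igraph with a \texttt{time}
edge attribute (the function name is ours), could be:

\begin{verbatim}
def sliding_windows(g, delta_t, s):
    """Split a temporal igraph multigraph into overlapping windows
    of length delta_t; nodes are kept in every subnetwork."""
    times = g.es["time"]
    start, end = min(times), max(times)
    windows = []
    while start < end:
        es = g.es.select(time_ge=start, time_lt=start + delta_t)
        windows.append(g.subgraph_edges(es, delete_vertices=False))
        start += delta_t - s           # distance between window starts
    return windows
\end{verbatim}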

\subsection{Topological analysis}%
\label{sec:topological-analysis}

The major novelty in this analysis is to introduce topological
features in temporal network analysis. The idea is that the techniques
introduced in~\autoref{cha:tda-ph} will reveal additional structure in
networks that is not captured by traditional methods, and is relevant
for detecting periodicity or other important properties of temporal
networks.

Here, two different approaches are presented and compared. One focuses
on the topology of the aggregated graph, using the weight rank-clique
filtration, while the other leverages the temporal dimension by
using the more recent advances in generalized persistence,
specifically zigzag persistence.

\subsubsection{Aggregated-graph persistent homology}%
\label{sec:aggr-graph-pers}

The first possibility to introduce topological features into the
feature map is to use the weight rank-clique filtration
(\autoref{sec:pers-homol-netw}) on the aggregated static graphs.

For this, we associate with each edge in the network a weight
corresponding to the number of time steps in which it is present. For
an edge $e$ and a time interval $c_i$ (keeping the notations
of~\autoref{defn:partitioning}), the weight associated with $e$ is
\[ w(e) = \sum_{t\in c_i} \rho(e,t). \]

The resulting graph is called the \emph{aggregated graph} of the
temporal network on the time interval $c_i$. This graph being
weighted, it is possible to compute persistence diagrams using the
weight rank-clique filtration (the algorithm is exposed
in~\autoref{sec:pers-homol-netw}).
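
With the multigraph representation
of~\autoref{sec:data-representation}, this aggregation amounts to
counting parallel edges. A minimal igraph sketch (the function name
is ours) could be:

\begin{verbatim}
def aggregate(g):
    """Collapse a temporal igraph multigraph into a weighted static
    graph: the weight of an edge is the number of time steps in
    which it is present in the window."""
    h = g.copy()
    h.es["weight"] = 1
    h.simplify(multiple=True, loops=False,
               combine_edges=dict(weight="sum"))
    return h
\end{verbatim}

The output can be passed directly to the WRCF construction
of~\autoref{sec:pers-homol-netw}.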

% \subsubsection{Traditional network statistics}%
% \label{sec:trad-netw-stat}

% Network statistics (\autoref{sec:network-statistics}) can be used as
% feature maps. First, temporal networks are transformed into static
% graphs by keeping all edges that appear at least once in the network.

% One can then combine different centrality measures, average degree,
% average shortest path length, and others statistical summaries into a
% vector which is used as a feature vector. Equipped with the Euclidean
% distance, this feature space form a metric space suitable for
% machine-learning tasks.

\subsubsection{Zigzag persistence}%
\label{sec:zigzag-persistence-1}

The main drawback of WRCF persistence is the loss of any kind of
temporal information in the network. Three nodes can be detected as
being very close to one another even though their contacts might have
been in separate time steps. We can avoid aggregating the temporal
networks by using \emph{generalised persistence}, specifically zigzag
persistence as exposed in~\autoref{sec:zigzag-persistence}.

In practice, zigzag persistence is more computationally expensive than
WRCF persistence~\cite{carlsson_zigzag_2009}, and leads to a lower
number of topological features in every dimension. Aggregating
networks tends to artificially create many cliques that do not
appear in the original temporal network.

To compute zigzag persistence, the algorithm needs the \emph{maximal
  simplicial complex}, i.e.\ the union of all simplicial complexes in
the sequence. In the case of temporal networks, this is the set of
maximal cliques in the aggregated graph. Zigzag persistence can then
be computed from the list of times when each simplex enters or leaves
the maximal filtration. The following algorithm determines these
times (a sketch of the corresponding computation follows the list):
\begin{enumerate}
\item Determine the maximal simplicial complex by computing the
  cliques in the aggregated graph.
\item For each time step $t$:
  \begin{itemize}
  \item Keep only the edges present at this time step (i.e.\ the edges
    $e$ such that $\rho(e, t) = 1$).
  \item Compute all the cliques in this network.
  \end{itemize}
\item For each clique in the maximal simplicial complex, determine
  where it is present and where it is absent in the sequence of lists
  of cliques.
\item Finally, determine the transition times in the presence arrays.
\end{enumerate}
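
A minimal sketch of steps 3--4 and of the call into
Dionysus~\cite{morozov_dionysus:_2018} follows; the names
\texttt{cliques} (the cliques of the aggregated graph) and
\texttt{presence} (boolean arrays computed as in steps 1--2) are
assumptions of this sketch:

\begin{verbatim}
import dionysus as d

# cliques:  list of cliques of the aggregated graph
# presence: presence[i][t] is True iff cliques[i] is a clique of
#           the network restricted to time step t
f = d.Filtration()
for c in cliques:
    f.append(d.Simplex(list(c)))
times = []
for row in presence:
    # transition times: the simplex alternately enters and leaves
    ts = [float(t) for t in range(1, len(row))
          if row[t] != row[t - 1]]
    if row[0]:
        ts = [0.0] + ts
    times.append(ts)
zz, dgms, cells = d.zigzag_homology_persistence(f, times)
\end{verbatim}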

This computation can be quite costly for large networks, even before
starting the main zigzag algorithm. The clique decision problem is
indeed NP-complete~\cite{karp_reducibility_2010}. This is thus by far
the most computationally expensive step of the analysis pipeline, and
is also more expensive than WRCF persistence.

\subsection{Clustering}%
\label{sec:clustering}

\subsubsection{Distance matrix}%
\label{sec:distance-matrix}

In order to cluster the subnetworks obtained by temporal partitioning
of the original network, one needs to introduce a notion of distance
between the topological features. Since the output of the previous
step in the analysis pipeline takes the form of persistence diagrams,
two options are possible: a standard measure of distance between
diagrams (see~\autoref{sec:topol-summ}), or one of the vectorization
or kernel-based methods exposed in~\autoref{cha:pers-homol-mach}.

One of the main contributions of this study is to compare the
performance of the bottleneck distance (\autoref{defn:bottleneck}) and
of the sliced Wasserstein kernel (\autoref{sec:swk}) in the context of
network clustering.

A distance matrix is obtained by computing pairwise distances between
each pair of subnetworks obtained during the temporal partitioning
step. One important remark is that the distances considered are
distances between persistence diagrams. However, persistent homology
returns a \emph{sequence} of such persistence diagrams for each
subnetwork, each diagram in the sequence corresponding to topological
features of a specific dimension. For the purposes of clustering,
0-dimensional features are not extremely interesting since they
correspond to connected components, and 2- or 3-dimensional diagrams
are often nearly empty except for very large subnetworks. It is
therefore appropriate to restrict our analysis to 1-dimensional
diagrams, which represent a good compromise. This is consistent with
what has been done for point cloud data
classification~\cite{carriere_sliced_2017}.

The bottleneck distance gives the space of persistence diagrams a
metric-space structure. Meanwhile, the sliced Wasserstein kernel gives
the space the structure of a Hilbert space, which can be required
for several machine-learning techniques, such as support-vector
machines (SVMs) or principal components analysis (PCA).

For the implementation, we use the approximate computation of the
sliced Wasserstein kernel (\autoref{algo:swk}) sampled along 10
directions, which is actually faster in practice than the computation
of the bottleneck distance. For the computation of the bottleneck
distance, the diagram points that go to infinity are
excluded. According to the definition of the bottleneck distance, if
two diagrams do not have the same number of infinite points, the
distance is automatically infinite, which does not work well in
clustering algorithms. Moreover, this does not interfere with the
comparison between the bottleneck distance and the sliced Wasserstein
kernel, since infinite points are ignored by the kernel anyway.
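
For the bottleneck case, a minimal sketch of the distance-matrix
computation with Dionysus (assuming \texttt{diagrams} is the list of
1-dimensional diagrams, already stripped of their infinite points)
could be:

\begin{verbatim}
import numpy as np
import dionysus as d

def distance_matrix(diagrams):
    """Pairwise bottleneck distances between persistence diagrams."""
    n = len(diagrams)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = d.bottleneck_distance(diagrams[i],
                                               diagrams[j])
            dist[j, i] = dist[i, j]    # the matrix is symmetric
    return dist
\end{verbatim}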

\subsubsection{Hierarchical clustering}%
\label{sec:hier-clust}

To simplify the interpretation of the analysis and the comparison
between the different approaches, the clustering algorithm used is
\emph{hierarchical clustering}~\cite{hastie_elements_2009}.

The main advantage is that it does not require knowing in advance the
number of clusters that one is looking for. The only input is the
dissimilarity matrix, obtained from a single metric. It is necessary
here to use an algorithm that does not require the observations
themselves, as in this case they take the form of a persistence
diagram instead of a numeric vector. Moreover, kernel-based methods
are not applicable to the bottleneck distance since it does not confer
a Hilbert structure to the space of persistence diagrams. By contrast,
hierarchical clustering only requires a valid measure of distance.

The hierarchical representation (or \emph{dendrogram}) is also
especially useful in the context of periodicity detection, since
periodicity can appear at various levels of the hierarchy.

Hierarchical clustering is performed in a bottom-up way, also called
\emph{agglomerative clustering}. Starting from the distance matrix,
with each observation in its own cluster, the algorithm merges rows
and columns at each step of the clustering, updating the distances
between the new clusters. To do that, it needs to compute the distance
between two clusters. Several approaches are possible to compute the
distance between two clusters $A$ and $B$ using a metric $d$:
\begin{itemize}
\item Complete linkage: the distance between $A$ and $B$ is the
  maximum distance between their elements,
  \[ \max\left\{ d(x,y) : x\in A, y\in B \right\}; \]
\item Single linkage: using the minimum distance,
  \[ \min\left\{ d(x,y) : x\in A, y\in B \right\}; \]
\item Average linkage: using the mean distance between the elements,
  \[ \frac{1}{|A| |B|} \sum_{x\in A} \sum_{y\in B} d(x,y). \]
\end{itemize}

The implementation used is taken from the library
Scikit-Learn~\cite{pedregosa_scikit-learn:_2011}.
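
A minimal sketch of this step follows; the conversion from a kernel
matrix to distances is our choice, using the usual RKHS identity
${\lVert \phi(x)-\phi(y) \rVert}^2 = k(x,x) + k(y,y) - 2k(x,y)$:

\begin{verbatim}
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster(dist, n_clusters=10):
    """Agglomerative clustering from a precomputed distance matrix."""
    model = AgglomerativeClustering(n_clusters=n_clusters,
                                    affinity="precomputed",
                                    linkage="average")
    return model.fit_predict(dist)

def kernel_to_distance(K):
    """Distance matrix induced by a kernel (Gram) matrix K."""
    diag = K.diagonal()
    return np.sqrt(np.maximum(diag[:, None] + diag[None, :]
                              - 2 * K, 0))
\end{verbatim}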

\chapter{Results and Discussion}%
\label{cha:results-discussion}

\section{Data}%
\label{sec:data-1}

\subsection{Generative model for periodic temporal networks}%
\label{sec:gener-model-peri}

In order to detect periodicity, one can generate a random temporal
network with a periodic structure.

We first build a random Erdős-Rényi graph. Starting from this base
graph, we generate a temporal stream for each edge independently. This
generative model is inspired by previous work on periodic temporal
networks~\cite{price-wright_topological_2015}.

For each edge, we generate a sequence of times in a predefined time
range $T$. For this, we choose uniformly at random a number of
interactions $n$ in $[0, T/2]$. We then generate at random a sequence
of $n$ times in $[0, T]$ from a density proportional to
\[ p(t) = \sin(f t) + 1, \] where $f$ is the frequency. The times are
then sorted.
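
A minimal sketch of this sampling step, using rejection sampling
against the bound $p(t) \leq 2$ (the function name matches
\texttt{random\_edge\_presences} used
in~\autoref{algo:generative-model}; the exact sampling scheme is an
implementation choice):

\begin{verbatim}
import numpy as np

def random_edge_presences(T, f):
    """Sample a sorted sequence of interaction times on [0, T]
    from the density proportional to sin(f*t) + 1."""
    n = np.random.randint(0, T // 2)   # number of interactions
    times = []
    while len(times) < n:
        t = np.random.uniform(0, T)
        if np.random.uniform(0, 2) < np.sin(f * t) + 1:
            times.append(t)            # accept with prob. p(t)/2
    return np.sort(times)
\end{verbatim}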

\begin{figure}[ht]
  \centering
  \begin{subfigure}[b]{0.2\linewidth}
    \begin{tikzpicture}
      \clip (0,0) rectangle (4.0,4.0);
      \Vertex[x=1.408,y=1.635,size=0.3,color=blue,opacity=0.7]{0}
      \Vertex[x=1.089,y=2.415,size=0.3,color=blue,opacity=0.7]{1}
      \Vertex[x=2.021,y=1.976,size=0.3,color=blue,opacity=0.7]{2}
      \Vertex[x=1.984,y=3.000,size=0.3,color=blue,opacity=0.7]{3}
      \Vertex[x=2.576,y=2.398,size=0.3,color=blue,opacity=0.7]{4}
      \Vertex[x=2.911,y=1.593,size=0.3,color=blue,opacity=0.7]{5}
      \Vertex[x=1.994,y=1.000,size=0.3,color=blue,opacity=0.7]{6}
      \Edge[](1)(3)
      \Edge[](3)(4)
      \Edge[](3)(4)
      \Edge[](0)(5)
      \Edge[](0)(5)
      \Edge[](2)(5)
      \Edge[](4)(5)
      \Edge[](0)(6)
      \Edge[](1)(6)
      \Edge[](5)(6)
    \end{tikzpicture}
  \end{subfigure}
  \begin{subfigure}[b]{0.2\linewidth}
    \begin{tikzpicture}
      \clip (0,0) rectangle (4.0,4.0);
      \Vertex[x=1.408,y=1.635,size=0.3,color=blue,opacity=0.7]{0}
      \Vertex[x=1.089,y=2.415,size=0.3,color=blue,opacity=0.7]{1}
      \Vertex[x=2.021,y=1.976,size=0.3,color=blue,opacity=0.7]{2}
      \Vertex[x=1.984,y=3.000,size=0.3,color=blue,opacity=0.7]{3}
      \Vertex[x=2.576,y=2.398,size=0.3,color=blue,opacity=0.7]{4}
      \Vertex[x=2.911,y=1.593,size=0.3,color=blue,opacity=0.7]{5}
      \Vertex[x=1.994,y=1.000,size=0.3,color=blue,opacity=0.7]{6}
      \Edge[](1)(2)
      \Edge[](0)(3)
      \Edge[](2)(4)
      \Edge[](2)(6)
    \end{tikzpicture}
  \end{subfigure}
  \begin{subfigure}[b]{0.2\linewidth}
    \begin{tikzpicture}
      \clip (0,0) rectangle (4.0,4.0);
      \Vertex[x=1.408,y=1.635,size=0.3,color=blue,opacity=0.7]{0}
      \Vertex[x=1.089,y=2.415,size=0.3,color=blue,opacity=0.7]{1}
      \Vertex[x=2.021,y=1.976,size=0.3,color=blue,opacity=0.7]{2}
      \Vertex[x=1.984,y=3.000,size=0.3,color=blue,opacity=0.7]{3}
      \Vertex[x=2.576,y=2.398,size=0.3,color=blue,opacity=0.7]{4}
      \Vertex[x=2.911,y=1.593,size=0.3,color=blue,opacity=0.7]{5}
      \Vertex[x=1.994,y=1.000,size=0.3,color=blue,opacity=0.7]{6}
      \Edge[](0)(2)
      \Edge[](0)(2)
      \Edge[](1)(4)
      \Edge[](2)(5)
      \Edge[](4)(5)
      \Edge[](0)(6)
      \Edge[](2)(6)
      \Edge[](4)(6)
    \end{tikzpicture}
  \end{subfigure}
  \begin{subfigure}[b]{0.2\linewidth}
    \begin{tikzpicture}
      \clip (0,0) rectangle (4.0,4.0);
      \Vertex[x=1.408,y=1.635,size=0.3,color=blue,opacity=0.7]{0}
      \Vertex[x=1.089,y=2.415,size=0.3,color=blue,opacity=0.7]{1}
      \Vertex[x=2.021,y=1.976,size=0.3,color=blue,opacity=0.7]{2}
      \Vertex[x=1.984,y=3.000,size=0.3,color=blue,opacity=0.7]{3}
      \Vertex[x=2.576,y=2.398,size=0.3,color=blue,opacity=0.7]{4}
      \Vertex[x=2.911,y=1.593,size=0.3,color=blue,opacity=0.7]{5}
      \Vertex[x=1.994,y=1.000,size=0.3,color=blue,opacity=0.7]{6}
      \Edge[](0)(1)
      \Edge[](1)(2)
      \Edge[](2)(3)
      \Edge[](2)(3)
      \Edge[](0)(4)
      \Edge[](0)(4)
      \Edge[](1)(4)
      \Edge[](2)(4)
      \Edge[](3)(5)
      \Edge[](4)(6)
      \Edge[](5)(6)
    \end{tikzpicture}
  \end{subfigure}
  % \begin{subfigure}[b]{0.2\linewidth}
  %   \begin{tikzpicture}
  %     \clip (0,0) rectangle (4.0,4.0);
  %     \Vertex[x=1.408,y=1.635,size=0.3,color=blue,opacity=0.7]{0}
  %     \Vertex[x=1.089,y=2.415,size=0.3,color=blue,opacity=0.7]{1}
  %     \Vertex[x=2.021,y=1.976,size=0.3,color=blue,opacity=0.7]{2}
  %     \Vertex[x=1.984,y=3.000,size=0.3,color=blue,opacity=0.7]{3}
  %     \Vertex[x=2.576,y=2.398,size=0.3,color=blue,opacity=0.7]{4}
  %     \Vertex[x=2.911,y=1.593,size=0.3,color=blue,opacity=0.7]{5}
  %     \Vertex[x=1.994,y=1.000,size=0.3,color=blue,opacity=0.7]{6}
  %     \Edge[](0)(1)
  %     \Edge[](0)(3)
  %     \Edge[](1)(3)
  %     \Edge[](3)(5)
  %     \Edge[](1)(6)
  %   \end{tikzpicture}
  % \end{subfigure}
  \caption{Example of a random temporal network generated
    by~\autoref{algo:generative-model}.}%
  \label{fig:random-example}
\end{figure}

\begin{algorithm}[ht]
  \caption{Random temporal network
    generation.}\label{algo:generative-model}
  \DontPrintSemicolon%
  \SetKwData{Basegraph}{basegraph}
  \SetKwData{TimeRange}{time\_range}
  \SetKwData{Nodes}{nodes}
  \SetKwData{EdgeProb}{edge\_prob}
  \SetKwData{Frequency}{frequency}
  \SetKwData{Network}{network}
  \SetKwData{Times}{times}
  \KwIn{\Nodes, \EdgeProb, \TimeRange, \Frequency}
  \KwOut{\Network}
  $\Basegraph\leftarrow \mathrm{ErdősRényi}(\Nodes, \EdgeProb)$\;
  $\Network\leftarrow$ network with no edges and the vertices of \Basegraph\;
  \For{$e\in \Basegraph.\mathrm{edges}$}{
    $\Times\leftarrow \mathrm{random\_edge\_presences}(\TimeRange, \Frequency)$\;
    \For{$t\in\Times$}{Add $(e.\mathrm{source}, e.\mathrm{target}, t)$ to \Network}
  }
\end{algorithm}

The complete method to generate a random network is summarised
in~\autoref{algo:generative-model}. The function
\texttt{random\_edge\_presences} returns a sequence of periodic
times. An example of a small random network can be found
in~\autoref{fig:random-example}.

\subsection{Datasets}%
\label{sec:datasets}

The SocioPatterns dataset~\cite{isella_whats_2011} was collected
during the \textsc{infectious} exhibition at the Science Gallery in
Dublin, Ireland, from April 17th to July 17th, 2009. During this event,
a radio-frequency identification (RFID) device was embedded into each
visitor's badge (as part of an interactive exhibit). RFID devices
exchange radio packets when they are at close range from each other
(between 1~m and 1.5~m), in a peer-to-peer fashion. The data
collection process is described in detail
in~\cite{cattuto_dynamics_2010}.

The devices are configured so that face-to-face interactions between
two individuals are accurately recorded with a probability of 99\% over
a period of 20~s, which is an appropriate time scale to record social
interactions. False positives are also extremely rare, as RFID devices
have a very limited range and multiple radio packet exchanges are
required to record an interaction.

The event in Dublin recorded more than 230,000 contact interactions
between more than 14,000 visitors. The data is made available both in
the form of daily-aggregated static
networks~\cite{noauthor_infectious_2011} and as a list of contact
interactions (each element being a timestamp and two node
IDs)~\cite{noauthor_infectious_2011-1}.

\begin{figure}[ht]
  \centering
  \includegraphics[width=0.9\textwidth]{fig/sociopatterns.jpg}
  \caption[Aggregated networks for two different days of the
  SocioPatterns dataset.]{Aggregated networks for two different days
    of the SocioPatterns dataset. Nodes are colored from red to purple
    according to their arrival
    time. (Source:~\cite{isella_whats_2011}.)}%
  \label{fig:sp_plot}
\end{figure}

The interaction times of the SocioPatterns dataset show that there
are limited interactions between visitors entering the exhibition more
than one hour apart (see~\autoref{fig:sp_plot}). A consequence of this
is that the diameter path of the daily aggregated graphs connects
visitors entering the venue at successive times, as can be seen in the
figure.

Another interesting property of these interactions is their
duration. Most interactions last less than one minute, as can be
expected in the context of visitors in a museum exhibition. The
distribution of the interaction durations shows broad tails, decaying
slightly faster than a power law~\cite{isella_whats_2011}.

This temporal network has also been used in a variety of contexts, from
percolation analysis and dynamical spreading to community
detection~\cite{isella_whats_2011}. These studies have confirmed that
topological criteria efficiently detect the edges acting as bridges
between communities~\cite{girvan_community_2002, holme_attack_2002}.

Many empirical temporal networks exhibit periodic
patterns~\cite{holme_modern_2015}. Many papers have explored
traditional network statistics and methods to uncover cyclic behaviour
in various datasets, mainly telecommunication
networks~\cite{jo_circadian_2012, aledavood_daily_2015,
  holme_network_2003, aledavood_digital_2015}.

Visualizations show significant variations in the patterns of the
daily aggregated graphs between weekdays and weekends (see the
SocioPatterns poster in the appendix). This project will attempt to
apply the topological methods to an empirical dataset in order to
detect periodicity.
|
||
|
||
\section{Computational environment}%
|
||
\label{sec:comp-envir}
|
||
|
||
The analysis pipeline described in~\autoref{sec:analysis-pipeline} is
entirely implemented in Python. For these tests, we use Python~3.5
with Numpy~1.15.1. The library
Dionysus~2.0.7~\cite{morozov_dionysus:_2018} is used for persistent
homology, zigzag persistence, and the bottleneck distance. Networks
are handled by igraph~0.7.1, and machine-learning algorithms are
provided by Scikit-Learn~0.19.2~\cite{pedregosa_scikit-learn:_2011}.

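For illustration, the following minimal sketch shows the core
Dionysus calls on which the pipeline relies, applied to a small
hand-built filtration (the simplices and times here are arbitrary;
the actual code is listed in~\autoref{cha:code}).

\begin{minted}{python}
import dionysus as d

# A filled triangle: vertices enter first, then edges, then the face.
simplices = [([0], 1), ([1], 1), ([2], 2),
             ([0, 1], 2), ([1, 2], 3), ([0, 2], 3), ([0, 1, 2], 4)]
f = d.Filtration()
for vertices, time in simplices:
    f.append(d.Simplex(vertices, time))
f.sort()

# Persistent homology, then one persistence diagram per dimension.
m = d.homology_persistence(f)
dgms = d.init_diagrams(m, f)

# Bottleneck distance between two diagrams (here, a diagram and itself).
print(d.bottleneck_distance(dgms[0], dgms[0]))
\end{minted}
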
The program runs on a shared-memory system with 32 cores at 3.6~GHz,
756~GB of RAM, and 1.6~TB of storage, running Ubuntu
Linux~16.04.5. Dionysus was compiled from the latest development
version using GCC~5.4.0 with the optimization level \texttt{-O3}.

\section{Results}%
\label{sec:results}

\subsection{Generative model}%
\label{sec:generative-model}

For this study, random networks have been generated with the
following parameters, keeping the notation
from~\autoref{sec:gener-model-peri} (a sketch of the sampling
procedure follows the list):
\begin{itemize}
\item the base graph $G$ is an Erdős-Rényi graph with 40 nodes and an
  edge probability of 90\%,
\item the total time range $T$ for the sequence of times is 200,
\item the frequency $f$ is 15/200.
\end{itemize}

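The exact density is the one defined
in~\autoref{sec:gener-model-peri}; for illustration, the sketch below
assumes a density proportional to $1 + \cos(2\pi ft)$ and samples the
presence times of each edge by rejection sampling.

\begin{minted}{python}
import numpy as np
import igraph as ig

n_nodes, p_edge = 40, 0.9   # base Erdos-Renyi graph
T, f = 200, 15 / 200        # total time range and frequency

def edge_times(n_samples=50):
    """Sample presence times for one edge by rejection sampling from a
    periodic density (assumed here proportional to 1 + cos(2*pi*f*t))."""
    times = []
    while len(times) < n_samples:
        t = np.random.uniform(0, T)
        if np.random.uniform(0, 2) < 1 + np.cos(2 * np.pi * f * t):
            times.append(t)
    return sorted(times)

base = ig.Graph.Erdos_Renyi(n=n_nodes, p=p_edge)
# The temporal network is a sorted list of (timestamp, source, target).
events = sorted((t, e.source, e.target)
                for e in base.es for t in edge_times())
\end{minted}
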
\autoref{fig:density} shows the density and sample times for a single
edge. A series of presence times like this is generated for each edge
in the base graph.

\begin{figure}[ht]
  \centering
  \includegraphics[width=.85\linewidth]{fig/density.pdf}
  \caption[Example of periodic density for edge times
  generation.]{Example of periodic density for edge times generation
    (blue), with random edge times (red), and the sliding windows
    (grey).}%
  \label{fig:density}
\end{figure}

The generated temporal network is then subdivided into 20
subnetworks. The sliding windows are also represented
in~\autoref{fig:density}.

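The subdivision itself is straightforward; the sketch below cuts the
\texttt{events} list built in the previous sketch into 20 contiguous
windows (overlapping windows would be handled analogously, by letting
consecutive windows share a fraction of their width).

\begin{minted}{python}
T = 200            # total time range, as above
n_windows = 20
width = T / n_windows

# One subnetwork per window: the events whose timestamps fall inside it.
subnetworks = [[(t, u, v) for (t, u, v) in events
                if i * width <= t < (i + 1) * width]
               for i in range(n_windows)]
\end{minted}
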
From these subnetworks, persistence is computed in the form of
persistence diagrams. An example can be found
in~\autoref{fig:diagram}.

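With Dionysus, zigzag persistence takes the list of simplices
together with, for each simplex, the alternating times at which it
enters and leaves the filtration. A minimal sketch (with arbitrary
simplices and times) follows.

\begin{minted}{python}
import dionysus as d

# Each simplex appears once; times[i] lists the alternating entry and
# exit times of simplex i in the zigzag filtration.
f = d.Filtration([[0], [1], [2], [0, 1], [1, 2], [0, 2]])
times = [[0, 6], [0, 6], [1, 5], [2, 4], [2, 5], [3, 5]]

zz, dgms, cells = d.zigzag_homology_persistence(f, times)
for dim, dgm in enumerate(dgms):
    for p in dgm:
        print(dim, p.birth, p.death)
\end{minted}
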
\begin{figure}[ht]
  \centering
  \includegraphics[width=.5\linewidth]{fig/diagram.pdf}
  \caption{Example persistence diagram.}%
  \label{fig:diagram}
\end{figure}

\autoref{fig:gen} shows the output of hierarchical clustering for a
random network, combining zigzag or WRCF persistence with either the
sliced Wasserstein kernel or the bottleneck distance. This clustering
is representative of what is obtained by applying the same pipelines
to many temporal networks generated by the random model
of~\autoref{sec:gener-model-peri}.

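Concretely, the clustering step feeds the kernel to Scikit-Learn's
hierarchical clustering as a precomputed distance matrix, using the
distance induced by the kernel in its RKHS. A sketch is given below;
\texttt{gram} is a placeholder for the Gram matrix of the sliced
Wasserstein kernel between the persistence diagrams of the
subnetworks (computed by the code in~\autoref{cha:code}).

\begin{minted}{python}
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def kernel_to_distance(gram):
    """Pairwise distances in the RKHS: d(i,j)^2 = k_ii + k_jj - 2 k_ij."""
    diag = np.diag(gram)
    sq = np.maximum(diag[:, None] + diag[None, :] - 2 * gram, 0)
    return np.sqrt(sq)

gram = np.eye(20)  # placeholder: replace with the real Gram matrix
model = AgglomerativeClustering(n_clusters=10, affinity="precomputed",
                                linkage="average")
labels = model.fit_predict(kernel_to_distance(gram))
\end{minted}
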
\begin{figure}[ht]
  \centering
  \begin{subfigure}[b]{0.46\linewidth}
    \centering
    \includegraphics[width=\linewidth]{fig/gen_zz_k.pdf}
    \caption{Zigzag persistence, sliced Wasserstein kernel}%
    \label{fig:gen_zz_k}
  \end{subfigure}\qquad
  \begin{subfigure}[b]{0.46\linewidth}
    \centering
    \includegraphics[width=\linewidth]{fig/gen_wrcf_k.pdf}
    \caption{WRCF, sliced Wasserstein kernel}%
    \label{fig:gen_wrcf_k}
  \end{subfigure}

  \begin{subfigure}[b]{0.46\linewidth}
    \centering
    \includegraphics[width=\linewidth]{fig/gen_zz_b.pdf}
    \caption{Zigzag persistence, bottleneck distance}%
    \label{fig:gen_zz_b}
  \end{subfigure}\qquad
  \begin{subfigure}[b]{0.46\linewidth}
    \centering
    \includegraphics[width=\linewidth]{fig/gen_wrcf_b.pdf}
    \caption{WRCF, bottleneck distance}%
    \label{fig:gen_wrcf_b}
  \end{subfigure}
  \caption{Hierarchical clustering with 10 clusters of a random
    temporal network.}%
  \label{fig:gen}
\end{figure}

As can be seen in the figure, the hierarchical clustering algorithm
is able to determine the periodicity of the temporal network when
using the sliced Wasserstein kernel, whereas with the simple
bottleneck distance the periodicity is not correctly detected. This
can be confirmed by moving further up in the dendrogram of the
clustering algorithm: with only 2 or 3 clusters, the low and high
sections of the density (\autoref{fig:density}) are still accurately
classified with the sliced Wasserstein kernel, while the bottleneck
distance distributes the subnetworks randomly among the clusters.

Somewhat less clear-cut is the comparison between zigzag persistence
and WRCF persistence. When generating many samples from the random
temporal network model, clustering based on WRCF and the sliced
Wasserstein kernel is noisier and less consistent in its periodicity
detection than its zigzag persistence counterpart. This indicates
that aggregating the temporal subnetworks leads to the creation of
artificial topological features that introduce noise into the
dataset. (See~\autoref{sec:zigzag-persistence-1} for details on why
aggregation introduces artificial simplices in the temporal network.)

\subsection{SocioPatterns dataset}%
\label{sec:soci-datas}

\begin{figure}[ht]
  \centering
  \begin{subfigure}[b]{0.8\linewidth}
    \centering
    \includegraphics[width=.85\linewidth]{fig/sp_zz_k.pdf}
    \caption{Zigzag persistence, sliced Wasserstein kernel}%
    \label{fig:sp_zz_k}
  \end{subfigure}
  \begin{subfigure}[b]{0.8\linewidth}
    \centering
    \includegraphics[width=.85\linewidth]{fig/sp_wrcf_k.pdf}
    \caption{WRCF, sliced Wasserstein kernel}%
    \label{fig:sp_wrcf_k}
  \end{subfigure}
  \caption{Hierarchical clustering with 10 clusters of the SocioPatterns
    dataset.}%
  \label{fig:sp}
\end{figure}

In the study of the SocioPatterns dataset, we expect to uncover a
periodicity on the scale of a day. We would therefore like to
partition the time range of the dataset into windows approximately
one day long. However, this leads to very sparse subnetworks, which
do not exhibit enough topological features for a complete study.
Since the main periodicity in the dataset is expected to be a
weekday/weekend alternation, we choose a resolution of two days.

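As an illustration, the sketch below loads the list of contact
interactions and cuts it into two-day windows. The file name is
hypothetical, and each line is assumed to contain a timestamp in
seconds followed by two node IDs, matching the published format of
the contact list.

\begin{minted}{python}
import numpy as np

data = np.loadtxt("listcontacts.txt", dtype=int)  # hypothetical file name
t0 = data[:, 0].min()
window = 2 * 24 * 3600  # resolution of two days, in seconds

# One subnetwork per two-day window of the dataset.
n_windows = int((data[:, 0].max() - t0) // window) + 1
subnetworks = [data[(data[:, 0] - t0) // window == i]
               for i in range(n_windows)]
\end{minted}
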
The previous section demonstrated that the sliced Wasserstein kernel
is the most suitable for uncovering periodicity in a temporal
network. The results of the hierarchical clustering algorithm, with
both zigzag persistence and WRCF, are shown in~\autoref{fig:sp}.

\begin{figure}[ht]
  \centering
  \begin{subfigure}[b]{0.45\linewidth}
    \includegraphics[width=\linewidth]{fig/gen_zz_gram1.pdf}
    \caption{Generative model}%
    \label{fig:gram_gen}
  \end{subfigure}\qquad
  \begin{subfigure}[b]{0.45\linewidth}
    \includegraphics[width=\linewidth]{fig/sp_zz_gram1.pdf}
    \caption{SocioPatterns dataset}%
    \label{fig:gram_sp}
  \end{subfigure}
  \caption{Gram matrices of the sliced Wasserstein kernel with zigzag
    persistence.}%
  \label{fig:gram}
\end{figure}

However, the subnetworks do not cluster periodically in either
case. This is confirmed by visualizing the Gram matrix of the sliced
Wasserstein kernel (\autoref{fig:gram}): the Gram matrix obtained
with the generative model exhibits a cyclical pattern, while the one
from the SocioPatterns dataset does not show enough distinction
between the subnetworks.

It is unclear whether this is due to the analysis pipeline, or to the
temporal network itself not exhibiting enough periodicity on a
topological level. To settle this question, one would need to compare
our analysis with one based on traditional network statistics, such
as those in~\cite{jo_circadian_2012, aledavood_daily_2015,
  holme_network_2003, aledavood_digital_2015}. Other empirical
networks, such as telecommunication networks, may also exhibit more
obvious cyclical patterns, where topological features might be
useful.

\chapter{Conclusions}%
\label{cha:conclusions}

\section{Topological data analysis of temporal networks}%
\label{sec:topol-data-analys}

Periodicity detection on our generative model has proven
successful. More importantly, topological features and persistent
homology seem to play an important part in the classification
task. The general idea of partitioning the time range of a temporal
network into sliding windows and running an unsupervised clustering
algorithm works in the context of periodicity detection.

More generally, we have introduced persistent homology and
topological data analysis methods for the study of temporal
networks. Building on previous work clustering different temporal
network generative models with persistent
homology~\cite{price-wright_topological_2015}, we have expanded both
the methods used and the applications, addressing the real-world
problem of periodicity detection. All in all, persistent homology is
a promising new direction for the study of temporal networks.

Topological data analysis is a recent field, with new methods and
approaches constantly being developed and improved. In this project,
we have compared several of these approaches. In the context of
periodicity detection, zigzag persistence is a small improvement over
the topological analysis of aggregated graphs using the weight rank
clique filtration. If this result were confirmed by other studies, it
would be an interesting development, as it would imply that the
temporal aspect is essential and cannot easily be discarded when
studying temporal networks.

One of the most active research areas of topological data analysis
has been its applications in machine learning. Considerable efforts
have been devoted to the development of various vectorization
techniques to embed topological information into a feature space
suitable for statistical learning algorithms. In this project, a few
of these methods have been compared for their theoretical guarantees
and their practical performance in periodicity detection. From a
mathematical point of view, kernels seem the most promising approach,
offering strong stability properties and algebraic structure on the
feature space. This opens up a broader class of applications in
machine learning where topological analysis can be useful. These
theoretical advances have translated into much better results for
periodicity detection: the simple bottleneck distance in the space of
diagrams (with its structure of metric space) was not able to detect
any kind of periodicity in the random networks, whereas the sliced
Wasserstein kernel (embedding persistence diagrams in its RKHS, with
a metric equivalent to the distance in the space of persistence
diagrams) picked up the period accurately. This confirms previous
work on shape classification, where kernels on persistence diagrams
significantly outperformed other feature
embeddings~\cite{kusano_kernel_2017, reininghaus_stable_2015,
  carriere_sliced_2017}.

Finally, we have tried to apply the same analysis to detect
periodicity in real-world data, the SocioPatterns dataset. Our model
was not able to detect a periodicity corresponding to the change of
patterns between weekdays and weekends. It is unclear whether this is
due to limits in some part of our analysis pipeline, or to the
periodicity in the network being non-topological in nature. A future
study might focus on combining topological features with traditional
network statistics.

\section{Future work}%
\label{sec:future-work}

Further study of topological features of temporal networks is
needed. One can imagine applications other than periodicity
detection, such as community
detection~\cite{girvan_community_2002}. Many standard methods are
difficult to adapt to temporal network models, and computational
topology could bring an additional perspective to these tasks by
complementing traditional network statistics.

In the specific context of periodicity detection, this analysis can
be expanded by varying parameters such as the resolution and the
overlap. This could be especially useful for inferring the period of
a temporal network.

One should also explore the other vectorization methods in the
context of periodicity detection. It would be interesting to know how
persistence images, or the other kernels, perform in this task. Last
but not least, it is essential to compare the performance of the
topological features with more traditional network statistics. It
would also be interesting to combine both aspects and use both sets
of features in machine-learning tasks.

Finally, temporal networks seem to be the ideal setting in which to
apply multidimensional persistence. For instance, the weight rank
clique filtration adds a ``weight'' dimension to the existing time
dimension. In theory, it would be possible to exploit this by
constructing a 2-parameter filtration on the network and computing
persistence on it.

\appendix

\chapter{Topology}%
\label{cha:topology}

In this chapter, we recall a few essential definitions from
topology. It is in large part taken from~\cite{golse_mat321_2015}.

Throughout this chapter, all vector spaces are over a field
$\mathbb{K}$, which is either the field of real numbers $\mathbb{R}$
or the field of complex numbers $\mathbb{C}$.

\section{Metric spaces}%
\label{sec:metric-spaces}

\begin{defn}[Distance, metric space]
  An application $d : X \times X \mapsto \mathbb{R}^+$ is a \emph{distance}
  over $X$ if
  \begin{enumerate}[(i)]
  \item $\forall x,y\in X,\; d(x,y) = 0 \Leftrightarrow x=y$ (separation),
  \item $\forall x,y\in X,\; d(x,y) = d(y,x)$ (symmetry),
  \item $\forall x,y,z\in X,\; d(x,y) + d(y,z) \geq d(x,z)$ (triangle inequality).
  \end{enumerate}

  In this case, $(X, d)$ is called a \emph{metric space}.
\end{defn}

If $Y$ is a subset of a metric space $(X, d)$, then the restriction
$d_Y$ of $d$ to $Y \times Y$ immediately makes $(Y, d_Y)$ a metric
space itself; $d_Y$ is called the \emph{induced metric} on $Y$.

%% example?

If $(X,d)$ is a metric space, then for all $x\in X$ and $r>0$, the set
\[ B(x,r) := \{y\in X : d(x,y) < r\} \] is called the \emph{open ball}
centered at $x$ and of radius $r$. The \emph{closed ball} centered at
$x$ and of radius $r$ is defined by
\[ B_c(x,r) := \{y\in X : d(x,y) \leq r\}. \]

%% example/figure?

An important class of metric spaces arises when the set $X$ is
itself a normed vector space.

\begin{defn}[Norm]
  Let $V$ be a vector space over $\mathbb{K}$. An application
  $N: V\mapsto\mathbb{R}^+$ is a \emph{norm} over $V$ if
  \begin{enumerate}[(i)]
  \item $\forall x\in V,\; N(x) = 0 \Leftrightarrow x=0$,
  \item
    $\forall x\in V, \forall \lambda\in\mathbb{K},\; N(\lambda x) =
    \lvert\lambda\rvert N(x)$,
  \item $\forall x,y\in V,\; N(x) + N(y) \geq N(x+y)$.
  \end{enumerate}
\end{defn}

Let $(V,N)$ be a normed vector space. For every subset $X$ of $V$, one
can define $d(x,y) := N(x-y)$ for all $x,y\in X$. Using the properties
of the norm $N$, one can check easily that $d$ is a distance, and
therefore $(X,d)$ is a metric space.

%% examples?

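For example, on $\mathbb{K}^n$, the applications
\[ N_1(x) := \sum_{i=1}^n \lvert x_i \rvert, \qquad
  N_2(x) := {\left( \sum_{i=1}^n \lvert x_i \rvert^2 \right)}^{1/2},
  \qquad N_\infty(x) := \max_{1\leq i\leq n} \lvert x_i \rvert \]
are all norms, and each of them induces a distance on $\mathbb{K}^n$
as above.
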
There are many possible norms on a vector space, which brings the
need to compare them.

\begin{defn}[Norm equivalence]
  Let $V$ be a vector space. Two norms $N_1$ and $N_2$ on $V$ are said
  to be \emph{equivalent} if there are two constants $C_1$ and $C_2$
  such that
  \[ \forall x\in V,\quad N_1(x) \leq C_1 N_2(x) \quad\text{and}\quad N_2(x)
    \leq C_2 N_1(x). \]
\end{defn}

Geometrically speaking, two norms are equivalent if the unit ball for
the norm $N_1$ contains a non-empty ball centred at 0 for the norm
$N_2$, and vice-versa.

%% all norms are equivalent in a vector space of finite dimension?

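In a vector space of finite dimension, all norms are equivalent; in
particular, the three norms on $\mathbb{K}^n$ defined above induce
the same notion of convergence. This is no longer true in infinite
dimension.
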
\section{Completeness}%
\label{sec:completeness}

\begin{defn}[Convergence]
  A sequence ${(x_n)}_{n\in\mathbb{N}}$ of elements of a metric space
  $(X,d)$ \emph{converges} to a limit $x$ if
  \[ \lim_{n\rightarrow\infty} d(x_n,x) = 0. \]
\end{defn}

\begin{defn}[Cauchy sequence]
  A sequence ${(x_n)}_{n\in\mathbb{N}}$ of elements of a metric space
  $(X,d)$ is a \emph{Cauchy sequence} if
  \[ \forall\varepsilon>0, \exists n_0\in\mathbb{N},\; \text{such
      that:}\; \forall n,m\geq n_0,\; d(x_n, x_m) < \varepsilon. \]
\end{defn}

Note that every convergent sequence is a Cauchy sequence, but the
converse is not true in general.
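For instance, in $X = (0,1]$ with the distance
$d(x,y) = \lvert x-y \rvert$, the sequence $x_n = 1/n$ is a Cauchy
sequence, but it has no limit in $X$.
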
\begin{defn}[Completeness]
  A metric space $(X,d)$ is \emph{complete} if, and only if, every
  Cauchy sequence converges to an element of $X$.
\end{defn}

%% examples?

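For example, $\mathbb{R}$ and $\mathbb{C}$, with the distance induced
by the absolute value, are complete, whereas $\mathbb{Q}$ is not: a
sequence of rationals converging to $\sqrt{2}$ in $\mathbb{R}$ is
Cauchy, but it has no limit in $\mathbb{Q}$.
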
%% properties? would need open/close

\begin{defn}[Banach space]
  A \emph{Banach space} is a complete normed vector space.
\end{defn}

\section{Hilbert spaces}%
\label{sec:hilbert-spaces}

In this section, vector spaces are defined over~$\mathbb{C}$. The
theory extends easily to vector spaces over~$\mathbb{R}$.

An application $L$ between two $\mathbb{C}$-vector spaces $V$ and $W$
is said to be \emph{anti-linear} if
\[ \forall \lambda,\mu\in\mathbb{C}, \forall x,y\in V,\; L(\lambda x +
  \mu y) = \bar{\lambda} L(x) + \bar{\mu} L(y). \]

\begin{defn}[Hermitian product]
  An application $\langle\cdot,\cdot\rangle : V \times V \mapsto \mathbb{C}$ is
  \begin{enumerate}[(i)]
  \item a \emph{sesquilinear form} if $x\mapsto\langle x,y \rangle$ is
    linear and $y\mapsto\langle x,y \rangle$ is anti-linear,
  \item a \emph{Hermitian form} if it is sesquilinear and
    $\langle x,y \rangle = \overline{\langle y,x \rangle}$ for all
    $x,y\in V$,
  \item a \emph{Hermitian product} if it is a positive definite
    Hermitian form, i.e.\ if $\langle x,x \rangle > 0$ for all
    $x \neq 0$.
  \end{enumerate}
\end{defn}

\begin{rem}
  In the case of vector spaces over $\mathbb{R}$, sesquilinear forms
  are simply bilinear, Hermitian forms are symmetric bilinear, and
  Hermitian products are inner products.
\end{rem}

\begin{prop}[Cauchy-Schwarz inequality]
  Let $\langle\cdot,\cdot\rangle$ be a Hermitian product over
  $V$. Then, for all $x,y\in V$,
  \[ \lvert\langle x,y \rangle\rvert \leq \sqrt{\langle x,x \rangle}
    \sqrt{\langle y,y \rangle}, \] where the two sides are equal if
  and only if $x$ and $y$ are linearly dependent.
\end{prop}

\begin{proof}
  Suppose that $x \neq 0$ and $y \neq 0$ (otherwise the proposition is
  obvious). For all $t > 0$, we compute
  \[ \langle x - ty, x - ty \rangle = \langle x,x \rangle - 2t
    \mathrm{Re}\langle x,y \rangle + t^2 \langle y,y \rangle \geq
    0. \]

  Thus, for all $t > 0$,
  \[ 2 \mathrm{Re} \langle x,y \rangle \leq \frac{1}{t}\langle x,x
    \rangle + t \langle y,y \rangle. \]

  We minimize the right-hand side by choosing
  \[ t = \sqrt{\frac{\langle x,x \rangle}{\langle y,y \rangle}}, \]
  thus
  \[ \mathrm{Re} \langle x,y \rangle \leq \sqrt{\langle x,x \rangle}
    \sqrt{\langle y,y \rangle}. \]

  The inequality follows by replacing $x$ by $e^{i\theta}x$ and using the fact that
  \[ \forall z\in\mathbb{C},\; \lvert z \rvert =
    \sup_{\theta\in\mathbb{R}} \mathrm{Re}\left(e^{i\theta} z
    \right). \]

  The equality case follows from setting
  $\langle x - ty, x - ty \rangle = 0$.
\end{proof}

If $\langle\cdot,\cdot\rangle$ is a Hermitian product over $V$, it can
be verified easily that the application
\[ \lVert\cdot\rVert : x \mapsto \sqrt{\langle x,x \rangle} \] is a
norm over $V$. The triangle inequality follows from the Cauchy-Schwarz
inequality.
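Indeed, for all $x,y\in V$,
\[ {\lVert x+y \rVert}^2 = {\lVert x \rVert}^2 + 2\,\mathrm{Re}\langle
  x,y \rangle + {\lVert y \rVert}^2 \leq {\lVert x \rVert}^2 + 2
  \lVert x \rVert \lVert y \rVert + {\lVert y \rVert}^2 = {(\lVert x
    \rVert + \lVert y \rVert)}^2. \]
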
\begin{defn}[Pre-Hilbert space]
  A \emph{pre-Hilbert space} is a vector space $V$ with a Hermitian
  product $\langle\cdot,\cdot\rangle$ and the associated norm
  $\lVert\cdot\rVert$. It is a metric space for the distance
  $d(x,y) := \lVert x-y \rVert$.
\end{defn}

\begin{defn}[Hilbert space]
  A pre-Hilbert space $H$ is a \emph{Hilbert space} if
  $(H, \lVert\cdot\rVert)$ is a Banach space.
\end{defn}

\addtocounter{chapter}{1}
\addcontentsline{toc}{chapter}{\protect\numberline{\thechapter} Infectious SocioPatterns poster}
\includepdf{fig/infectious_poster.pdf}
% \includepdf{fig/infectious_poster_highres.pdf}\label{cha:infectious_poster}

\chapter{Code}%
\label{cha:code}

\section{\texttt{zigzag.py}}%
\label{sec:zigzagpy}

\inputminted{python}{zigzag.py}

\section{\texttt{wrcf.py}}%
\label{sec:wrcfpy}

\inputminted{python}{wrcf.py}

\section{\texttt{sliced\_wasserstein.py}}%
\label{sec:sliced_wassersteinpy}

\inputminted{python}{sliced_wasserstein.py}

\section{\texttt{generative.py}}%
\label{sec:generativepy}

\inputminted{python}{generative.py}

\section{\texttt{sociopatterns.py}}%
\label{sec:sociopatternspy}

\inputminted{python}{sociopatterns.py}

\section{\texttt{clustering.py}}%
\label{sec:clusteringpy}

\inputminted{python}{clustering.py}

\backmatter%

% \nocite{*}
\printbibliography%

\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% End: