diff --git a/_site/atom.xml b/_site/atom.xml
index bee6725..efc4977 100644
--- a/_site/atom.xml
+++ b/_site/atom.xml
@@ -637,7 +637,7 @@
x_n)\), and \(y = (y_1, y_2, \ldots, y_m)\), along with probability distributions \(p \in \Delta^n\), \(q \in \Delta^m\) over \(x\) and \(y\) (\(\Delta^n\) is the probability simplex of dimension \(n\), i.e. the set of nonnegative vectors of size \(n\) summing to 1). We can then define the Wasserstein distance as \[
W_1(p, q) = \min_{P \in \mathbb{R}_+^{n\times m}} \sum_{i,j} C_{i,j} P_{i,j}
\]\[
-\text{\small subject to } \sum_j P_{i,j} = p_i \text{ \small and } \sum_i P_{i,j} = q_j,
+\text{subject to } \sum_j P_{i,j} = p_i \text{ and } \sum_i P_{i,j} = q_j,
\] where \(C_{i,j} = d(x_i, y_j)\) is the cost computed from the original distance between points, and \(P_{i,j}\) represents the amount we move from pile \(i\) to pile \(j\).
Now, how can this be applied to a natural language setting? Once we have word embeddings, we can consider that the vocabulary forms a metric space (we can compute a distance between two word embeddings, for instance the Euclidean or the cosine distance). The key is to define documents as distributions over words.
Given a vocabulary \(V \subset \mathbb{R}^n\) and a corpus \(D = (d^1, d^2, \ldots, d^{\lvert D \rvert})\), we represent a document as \(d^i \in \Delta^{l_i}\) where \(l_i\) is the number of unique words in \(d^i\), and \(d^i_j\) is the proportion of word \(v_j\) in the document \(d^i\). The word mover’s distance (WMD) is then defined simply as \[ \operatorname{WMD}(d^1, d^2) = W_1(d^1, d^2). \]
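To make the linear program concrete, here is a minimal sketch in Python using SciPy's `linprog`; the function name `wasserstein_1` and the Euclidean ground cost are illustrative assumptions, not code from the post:

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist

def wasserstein_1(x, y, p, q):
    """W1(p, q) for support points x (n x d), y (m x d) and marginals p (n,), q (m,)."""
    n, m = len(p), len(q)
    C = cdist(x, y)  # C[i, j] = d(x_i, y_j), Euclidean by default
    # Flatten the plan P row-major and encode the two marginal constraints.
    row_sums = np.kron(np.eye(n), np.ones(m))  # sum_j P[i, j] = p_i
    col_sums = np.kron(np.ones(n), np.eye(m))  # sum_i P[i, j] = q_j
    res = linprog(C.ravel(),
                  A_eq=np.vstack([row_sums, col_sums]),
                  b_eq=np.concatenate([p, q]))  # P >= 0 is the default bound
    return res.fun
```

The two `kron` blocks build the constraints \(\sum_j P_{i,j} = p_i\) and \(\sum_i P_{i,j} = q_j\) over the flattened plan; one constraint in the second set is redundant since both marginals sum to 1, which the solver tolerates.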
diff --git a/_site/contact.html b/_site/contact.html
index 0a552a2..3078e9d 100644
--- a/_site/contact.html
+++ b/_site/contact.html
@@ -15,19 +15,9 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+
+
diff --git a/_site/css/default.css b/_site/css/default.css
index 0420568..9ee0a84 100644
--- a/_site/css/default.css
+++ b/_site/css/default.css
@@ -1 +1 @@
-body{background:#fafafa;counter-reset:theorem;counter-reset:definition;counter-reset:sidenote-counter}header{margin:1em 0 2em 0}header nav{width:87.5%}header nav a{display:inline-block;font-size:1.4rem;font-weight:bold;text-decoration:none!important;margin:0 0.5rem;padding:0 0}header nav a:first-child{margin-left:0}header nav a:last-child{margin-right:0}footer{margin-top:3rem;padding:.5rem 0;border-top:0.1rem solid #000;color:#555}article .header{font-style:italic;color:#555}blockquote{display:block;border-left:2px solid #808080;padding-left:1rem;margin-top:1rem;margin-bottom:1rem;color:#333;background:#eeeeee}.definition,.proposition,.theorem{display:block;border-left:2px solid #808080;padding-left:1rem;margin-top:1rem;margin-bottom:1rem}.proposition,.theorem{counter-increment:theorem;font-style:italic}.definition p:first-child,.proposition p:first-child,.theorem p:first-child,.proof p:first-child{display:inline}.definition p,.proposition p,.theorem p,.proof p{margin-top:0.4rem;margin-bottom:0.4rem;padding-left:1rem}.theorem:before{content:"Theorem " counter(theorem) ".";font-weight:bold;font-style:normal}.proposition:before{content:"Proposition " counter(theorem) ".";font-weight:bold;font-style:normal}.definition:before{counter-increment:definition;content:"Definition " counter(definition) ".";font-weight:bold;font-style:normal}.proof:before{content:"Proof.";font-style:italic}.proof:after{content:"\220E";float:right}
\ No newline at end of file
+body{background:#fafafa}header{margin:1em 0 2em 0}header nav{width:87.5%}header nav a{display:inline-block;font-size:1.4rem;font-weight:bold;text-decoration:none!important;margin:0 0.5rem;padding:0 0}header nav a:first-child{margin-left:0}header nav a:last-child{margin-right:0}footer{margin-top:3rem;padding:.5rem 0;border-top:0.1rem solid #000;color:#555}article .header{font-style:italic;color:#555}blockquote{display:block;border-left:2px solid #808080;padding-left:1rem;margin-top:1rem;margin-bottom:1rem;color:#333;background:#eeeeee}.definition,.proposition,.theorem{display:block;border-left:2px solid #808080;padding-left:1rem;margin-top:1rem;margin-bottom:1rem}.proposition,.theorem{font-style:italic}.definition p:first-child,.proposition p:first-child,.theorem p:first-child,.proof p:first-child{display:inline}.definition p,.proposition p,.theorem p,.proof p{margin-top:0.4rem;margin-bottom:0.4rem;padding-left:1rem}.theorem:before{content:"Theorem.";font-weight:bold;font-style:normal}.proposition:before{content:"Proposition.";font-weight:bold;font-style:normal}.definition:before{content:"Definition.";font-weight:bold;font-style:normal}.proof:before{content:"Proof.";font-style:italic}.proof:after{content:"\220E";float:right}
\ No newline at end of file
diff --git a/_site/cv.html b/_site/cv.html
index 225857c..0b5b985 100644
--- a/_site/cv.html
+++ b/_site/cv.html
@@ -15,19 +15,9 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+
+
diff --git a/_site/index.html b/_site/index.html
index 8e585fd..a539d5a 100644
--- a/_site/index.html
+++ b/_site/index.html
@@ -15,19 +15,9 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+
+
diff --git a/_site/posts/dyalog-apl-competition-2020-phase-1.html b/_site/posts/dyalog-apl-competition-2020-phase-1.html
index 2d93343..2785377 100644
--- a/_site/posts/dyalog-apl-competition-2020-phase-1.html
+++ b/_site/posts/dyalog-apl-competition-2020-phase-1.html
@@ -15,19 +15,9 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+
+
diff --git a/_site/posts/dyalog-apl-competition-2020-phase-2.html b/_site/posts/dyalog-apl-competition-2020-phase-2.html
index 2f5bfc7..3d88f83 100644
--- a/_site/posts/dyalog-apl-competition-2020-phase-2.html
+++ b/_site/posts/dyalog-apl-competition-2020-phase-2.html
@@ -15,19 +15,9 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+
+
diff --git a/_site/posts/ginibre-ensemble.html b/_site/posts/ginibre-ensemble.html
index ba9870c..4199cdf 100644
--- a/_site/posts/ginibre-ensemble.html
+++ b/_site/posts/ginibre-ensemble.html
@@ -15,19 +15,9 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+
+
diff --git a/_site/posts/hierarchical-optimal-transport-for-document-classification.html b/_site/posts/hierarchical-optimal-transport-for-document-classification.html
index 66af7ff..c8901f8 100644
--- a/_site/posts/hierarchical-optimal-transport-for-document-classification.html
+++ b/_site/posts/hierarchical-optimal-transport-for-document-classification.html
@@ -15,19 +15,9 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+
+
@@ -81,7 +71,7 @@
x_n)\), and \(y = (y_1, y_2, \ldots, y_m)\), along with probability distributions \(p \in \Delta^n\), \(q \in \Delta^m\) over \(x\) and \(y\) (\(\Delta^n\) is the probability simplex of dimension \(n\), i.e. the set of nonnegative vectors of size \(n\) summing to 1). We can then define the Wasserstein distance as \[
W_1(p, q) = \min_{P \in \mathbb{R}_+^{n\times m}} \sum_{i,j} C_{i,j} P_{i,j}
\]\[
-\text{\small subject to } \sum_j P_{i,j} = p_i \text{ \small and } \sum_i P_{i,j} = q_j,
+\text{subject to } \sum_j P_{i,j} = p_i \text{ and } \sum_i P_{i,j} = q_j,
\] where \(C_{i,j} = d(x_i, y_j)\) is the cost computed from the original distance between points, and \(P_{i,j}\) represents the amount we move from pile \(i\) to pile \(j\).
Now, how can this be applied to a natural language setting? Once we have word embeddings, we can consider that the vocabulary forms a metric space (we can compute a distance between two word embeddings, for instance the Euclidean or the cosine distance). The key is to define documents as distributions over words.
Given a vocabulary \(V \subset \mathbb{R}^n\) and a corpus \(D = (d^1, d^2, \ldots, d^{\lvert D \rvert})\), we represent a document as \(d^i \in \Delta^{l_i}\) where \(l_i\) is the number of unique words in \(d^i\), and \(d^i_j\) is the proportion of word \(v_j\) in the document \(d^i\). The word mover’s distance (WMD) is then defined simply as \[ \operatorname{WMD}(d^1, d^2) = W_1(d^1, d^2). \]
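The WMD itself then amounts to running an exact OT solver on the embedding cost matrix. A minimal sketch, assuming the POT library (`ot.emd2`) and hypothetical inputs `emb1`/`emb2` (embeddings of each document's unique words) and `d1`/`d2` (the matching distributions):

```python
import ot  # POT: Python Optimal Transport, pip install pot
from scipy.spatial.distance import cdist

def wmd(emb1, d1, emb2, d2):
    """WMD(d1, d2): emb1/emb2 hold each document's unique-word embeddings,
    d1/d2 the matching nbow distributions (nonnegative, summing to 1)."""
    C = cdist(emb1, emb2)      # ground distances between unique words
    return ot.emd2(d1, d2, C)  # exact solver for the same linear program
```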
diff --git a/_site/skills.html b/_site/skills.html
index 45d5d32..9bdce8a 100644
--- a/_site/skills.html
+++ b/_site/skills.html
@@ -15,19 +15,9 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+
+
diff --git a/site.hs b/site.hs
index 71b8d27..50734d8 100644
--- a/site.hs
+++ b/site.hs
@@ -164,7 +164,7 @@ myReadPandocBiblio ropt csl biblio item = do
return $ fmap (const pandoc') item
--- Pandoc compiler with KaTeX and bibliography support --------------------
+-- Pandoc compiler with maths, TOC, sidenotes, and bibliography support --------------------
customPandocCompiler :: Bool -> Compiler (Item String)
customPandocCompiler withTOC =
let customExtensions = extensionsFromList [Ext_latex_macros]
diff --git a/templates/archive.html b/templates/archive.html
index b43eeb2..1c254db 100644
--- a/templates/archive.html
+++ b/templates/archive.html
@@ -1,2 +1,2 @@
-Here you can find all my previous posts:
+