diff --git a/_site/atom.xml b/_site/atom.xml
index aa6b018..4f833c0 100644
--- a/_site/atom.xml
+++ b/_site/atom.xml
@@ -66,7 +66,7 @@ W_1(p, q) = \min_{P \in \mathbb{R}_+^{n\times m}} \sum_{i,j} C_{i,j} P_{i,j}

The first one can be precomputed once and reused for all subsequent distances, so its cost does not grow with the number of documents we have to process. The second one only operates on \(\lvert T \rvert\) topics instead of the full vocabulary: the resulting optimisation problem is much smaller! This is great for performance, as it now becomes feasible to compute all pairwise distances in a large set of documents.
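
To make this concrete, here is a minimal sketch of that second stage, under a few assumptions: the topic-to-topic distances are already precomputed in a matrix I call topic_cost, each document is given as a vector of topic weights, and the small transport problem is solved with the POT library rather than Gurobi (which the authors use). This is an illustration, not the paper's implementation.

#+begin_src python
# Sketch only, not the authors' code. Assumptions: `topic_cost` is the
# precomputed |T| x |T| matrix of word-level Wasserstein distances between
# topics, and `d1`, `d2` are the two documents' topic-weight vectors
# (non-negative, summing to one).
import numpy as np
import ot  # POT: Python Optimal Transport


def hott_distance(d1, d2, topic_cost):
    """W_1 between two documents seen as distributions over topics."""
    d1 = np.asarray(d1, dtype=np.float64)
    d2 = np.asarray(d2, dtype=np.float64)
    # Keep only topics with non-zero mass so the problem stays tiny.
    i, j = d1 > 0, d2 > 0
    return ot.emd2(d1[i] / d1[i].sum(),
                   d2[j] / d2[j].sum(),
                   topic_cost[np.ix_(i, j)])
#+end_src

Each call then solves a linear programme over a transport plan with at most \(\lvert T \rvert^2\) entries, independently of the vocabulary size, which is exactly why all-pairs distances become affordable.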

Another interesting insight is that topics are represented as collections of words (we can keep the top 20 words as a visual representation), and documents as collections of topics with weights. Both of these representations are highly interpretable for a human being who wants to understand what’s going on. I think this is one of the strongest aspects of these approaches: both the various representations and the algorithms are fully interpretable. Compared to a deep learning approach, we can make sense of every intermediate step, from the representation of topics to the weights produced by the optimisation algorithm that computes the higher-level distances.

-[figure hott_fig1.png] Representation of two documents in topic space, along with how the distance was computed between them. Everything is interpretable: from the documents as collections of topics, to the matchings between topics determining the overall distance between the books (Yurochkin et al. 2019).
+[figure hott_fig1.jpg] Representation of two documents in topic space, along with how the distance was computed between them. Everything is interpretable: from the documents as collections of topics, to the matchings between topics determining the overall distance between the books (Yurochkin et al. 2019).

Experiments

The paper is thorough on the experimental side, providing a full evaluation of the method on one particular application: document clustering. They use Latent Dirichlet Allocation to compute topics, GloVe for pretrained word embeddings (Pennington, Socher, and Manning 2014), and Gurobi to solve the optimisation problems. Their code is available on GitHub.
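
For illustration, here is a rough, hypothetical version of that pipeline: gensim’s LdaModel stands in for their LDA step, the pretrained vectors come from gensim-data ("glove-wiki-gigaword-50" is just a convenient small choice), and POT replaces Gurobi for the word-level transport problems that yield the topic-to-topic cost matrix used in the sketch above. None of the names below come from their code.

#+begin_src python
# Hypothetical pipeline sketch: fit topics with gensim's LDA, embed their top
# words with GloVe, and precompute the topic-to-topic W_1 cost matrix with POT.
import numpy as np
import ot
import gensim.downloader as api
from gensim.corpora import Dictionary
from gensim.models import LdaModel


def topic_cost_matrix(lda, glove, topn=20):
    """W_1 distances between topics, each truncated to its `topn` top words."""
    topics = []
    for t in range(lda.num_topics):
        # Assumes every topic keeps at least one top word with a GloVe vector.
        pairs = [(w, p) for w, p in lda.show_topic(t, topn=topn) if w in glove]
        vecs = np.array([glove[w] for w, _ in pairs])
        mass = np.array([p for _, p in pairs])
        topics.append((vecs, mass / mass.sum()))
    T = lda.num_topics
    cost = np.zeros((T, T))
    for s in range(T):
        for t in range(s + 1, T):
            M = ot.dist(topics[s][0], topics[t][0], metric="euclidean")
            cost[s, t] = cost[t, s] = ot.emd2(topics[s][1], topics[t][1], M)
    return cost


# `texts` is a placeholder corpus of tokenised documents.
texts = [["transport", "distance", "topic", "document"],
         ["word", "embedding", "vector", "document"]]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]
lda = LdaModel(corpus, id2word=dictionary, num_topics=2, passes=10)
glove = api.load("glove-wiki-gigaword-50")  # downloads pretrained vectors
topic_cost = topic_cost_matrix(lda, glove)
#+end_src

The per-document topic weights needed for the document-level distance can then be read off with lda.get_document_topics(bow, minimum_probability=0).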

diff --git a/_site/images/hott_fig1.jpg b/_site/images/hott_fig1.jpg
new file mode 100644
index 0000000..ff2c438
Binary files /dev/null and b/_site/images/hott_fig1.jpg differ
diff --git a/_site/posts/hierarchical-optimal-transport-for-document-classification.html b/_site/posts/hierarchical-optimal-transport-for-document-classification.html
index 0575ce3..dc1efe3 100644
--- a/_site/posts/hierarchical-optimal-transport-for-document-classification.html
+++ b/_site/posts/hierarchical-optimal-transport-for-document-classification.html
@@ -85,7 +85,7 @@ W_1(p, q) = \min_{P \in \mathbb{R}_+^{n\times m}} \sum_{i,j} C_{i,j} P_{i,j}

The first one can be precomputed once and reused for all subsequent distances, so its cost does not grow with the number of documents we have to process. The second one only operates on \(\lvert T \rvert\) topics instead of the full vocabulary: the resulting optimisation problem is much smaller! This is great for performance, as it now becomes feasible to compute all pairwise distances in a large set of documents.

Another interesting insight is that topics are represented as collections of words (we can keep the top 20 words as a visual representation), and documents as collections of topics with weights. Both of these representations are highly interpretable for a human being who wants to understand what’s going on. I think this is one of the strongest aspects of these approaches: both the various representations and the algorithms are fully interpretable. Compared to a deep learning approach, we can make sense of every intermediate step, from the representation of topics to the weights produced by the optimisation algorithm that computes the higher-level distances.

-[figure hott_fig1.png] Representation of two documents in topic space, along with how the distance was computed between them. Everything is interpretable: from the documents as collections of topics, to the matchings between topics determining the overall distance between the books (Yurochkin et al. 2019).
+[figure hott_fig1.jpg] Representation of two documents in topic space, along with how the distance was computed between them. Everything is interpretable: from the documents as collections of topics, to the matchings between topics determining the overall distance between the books (Yurochkin et al. 2019).

Experiments

The paper is thorough on the experimental side, providing a full evaluation of the method on one particular application: document clustering. They use Latent Dirichlet Allocation to compute topics, GloVe for pretrained word embeddings (Pennington, Socher, and Manning 2014), and Gurobi to solve the optimisation problems. Their code is available on GitHub.

diff --git a/_site/rss.xml b/_site/rss.xml
index 86c3021..37669a1 100644
--- a/_site/rss.xml
+++ b/_site/rss.xml
@@ -62,7 +62,7 @@ W_1(p, q) = \min_{P \in \mathbb{R}_+^{n\times m}} \sum_{i,j} C_{i,j} P_{i,j}

The first one can be precomputed once and reused for all subsequent distances, so its cost does not grow with the number of documents we have to process. The second one only operates on \(\lvert T \rvert\) topics instead of the full vocabulary: the resulting optimisation problem is much smaller! This is great for performance, as it now becomes feasible to compute all pairwise distances in a large set of documents.

Another interesting insight is that topics are represented as collections of words (we can keep the top 20 words as a visual representation), and documents as collections of topics with weights. Both of these representations are highly interpretable for a human being who wants to understand what’s going on. I think this is one of the strongest aspects of these approaches: both the various representations and the algorithms are fully interpretable. Compared to a deep learning approach, we can make sense of every intermediate step, from the representation of topics to the weights produced by the optimisation algorithm that computes the higher-level distances.

-[figure hott_fig1.png] Representation of two documents in topic space, along with how the distance was computed between them. Everything is interpretable: from the documents as collections of topics, to the matchings between topics determining the overall distance between the books (Yurochkin et al. 2019).
+[figure hott_fig1.jpg] Representation of two documents in topic space, along with how the distance was computed between them. Everything is interpretable: from the documents as collections of topics, to the matchings between topics determining the overall distance between the books (Yurochkin et al. 2019).

Experiments

The paper is thorough on the experimental side, providing a full evaluation of the method on one particular application: document clustering. They use Latent Dirichlet Allocation to compute topics, GloVe for pretrained word embeddings (Pennington, Socher, and Manning 2014), and Gurobi to solve the optimisation problems. Their code is available on GitHub.

diff --git a/images/hott_fig1.jpg b/images/hott_fig1.jpg
new file mode 100644
index 0000000..ff2c438
Binary files /dev/null and b/images/hott_fig1.jpg differ
diff --git a/images/hott_fig1.png b/images/hott_fig1.png
deleted file mode 100644
index 3b7799f..0000000
Binary files a/images/hott_fig1.png and /dev/null differ
diff --git a/posts/hierarchical-optimal-transport-for-document-classification.org b/posts/hierarchical-optimal-transport-for-document-classification.org
index 59c4c37..5fc4e55 100644
--- a/posts/hierarchical-optimal-transport-for-document-classification.org
+++ b/posts/hierarchical-optimal-transport-for-document-classification.org
@@ -155,7 +155,7 @@ step, from the representations of topics to the weights in the
optimisation algorithm to compute higher-level distances.
#+caption: Representation of two documents in topic space, along with how the distance was computed between them. Everything is interpretable: from the documents as collections of topics, to the matchings between topics determining the overall distance between the books citep:yurochkin2019_hierar_optim_trans_docum_repres.
-[[file:/images/hott_fig1.png]]
+[[file:/images/hott_fig1.jpg]]
* Experiments