Add experiments and conclusion
#+caption: Representation of two documents in topic space, along with how the distance was computed between them. Everything is interpretable: from the documents as collections of topics, to the matchings between topics determining the overall distance between the books citep:yurochkin2019_hierar_optim_trans_docum_repres.
[[file:/images/hott_fig1.png]]
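
To make the figure concrete, here is a minimal sketch of that two-level
computation using the [[https://pythonot.github.io/][POT]] library (my own sketch, not the authors'
implementation; the normalised topic distributions, topic proportions and
word embeddings are assumed to be given):

#+begin_src python
# Minimal sketch: topic-to-topic costs are word-level optimal transport
# distances, and the document distance is optimal transport over topic
# proportions, with those costs as the ground metric.
import numpy as np
import ot  # POT: Python Optimal Transport


def topic_cost_matrix(topics, word_embeddings):
    """OT distance between every pair of topics.

    topics: (n_topics, vocab_size) rows summing to 1 (word distributions);
    word_embeddings: (vocab_size, dim) pretrained word vectors.
    The paper truncates each topic to its top words to keep this tractable.
    """
    ground_cost = ot.dist(word_embeddings, word_embeddings, metric="euclidean")
    n_topics = len(topics)
    costs = np.zeros((n_topics, n_topics))
    for i in range(n_topics):
        for j in range(i + 1, n_topics):
            costs[i, j] = costs[j, i] = ot.emd2(topics[i], topics[j], ground_cost)
    return costs


def document_distance(doc_a, doc_b, topic_costs):
    """doc_a, doc_b: topic-proportion vectors summing to 1."""
    return ot.emd2(doc_a, doc_b, topic_costs)
#+end_src
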
* Experiments

The paper is quite thorough on the experimental side, providing a full
evaluation of the method on one particular application: document
classification. They use [[https://scikit-learn.org/stable/modules/decomposition.html#latentdirichletallocation][Latent Dirichlet Allocation]] to compute topics,
GloVe for pretrained word embeddings
citep:moschitti2014_proceed_confer_empir_method_natur, and [[https://www.gurobi.com/][Gurobi]] to
solve the optimisation problems. Their code is available [[https://github.com/IBM/HOTT][on GitHub]].

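As a rough illustration of that pipeline, here is how the topic-modelling
step might look with scikit-learn's LDA (a sketch only: a placeholder corpus,
an arbitrary number of topics, and no GloVe loading or Gurobi solver):

#+begin_src python
# Sketch of the topic-modelling step with scikit-learn's LDA implementation.
# The corpus and the number of topics are placeholders, not the paper's setup.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["optimal transport between documents", "topic models for text"]  # placeholder

vectoriser = CountVectorizer(stop_words="english")
counts = vectoriser.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=5, random_state=0)
doc_topics = lda.fit_transform(counts)  # documents as distributions over topics
topics = lda.components_ / lda.components_.sum(axis=1, keepdims=True)  # topics as word distributions
#+end_src
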
If you want the details, I encourage you to read the full paper: they
tested the methods on a wide variety of datasets, from very short
documents (like tweets) to long documents with a large vocabulary
(books). With a simple $k$-NN classification, they establish that HOTT
performs best on average, especially on large vocabularies (books, the
"gutenberg" dataset). It is also much cheaper computationally than
alternative methods that regularise the optimal transport problem
directly on words. The hierarchical nature of the approach thus brings
considerable gains in performance, along with improvements in
interpretability.

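For reference, a $k$-NN evaluation on top of precomputed document
distances takes only a few lines with scikit-learn's ~metric="precomputed"~
option (variable names here are hypothetical, not taken from their code):

#+begin_src python
# Sketch of the k-NN evaluation: the classifier only sees pairwise document
# distances, so any document distance (HOTT, WMD, ...) can be plugged in.
from sklearn.neighbors import KNeighborsClassifier


def knn_accuracy(dist_train, y_train, dist_test, y_test, k=5):
    """dist_train: (n_train, n_train) distances between training documents.
    dist_test: (n_test, n_train) distances from test to training documents."""
    knn = KNeighborsClassifier(n_neighbors=k, metric="precomputed")
    knn.fit(dist_train, y_train)
    return knn.score(dist_test, y_test)
#+end_src
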
What's really interesting in the paper is the sensitivity analysis:
they ran experiments with different word embedding methods (word2vec,
citep:mikolovDistributedRepresentationsWords2013) and with different
parameters for the topic modelling (topic truncation, number of topics,
etc.). All of these reveal that changes in hyperparameters do not
significantly impact the performance of HOTT. This is extremely
important in a field like NLP, where small variations in approach often
lead to drastically different results.

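The knobs they vary amount to a small grid; the values below are purely
illustrative, not the exact settings from the paper:

#+begin_src python
# Hypothetical hyperparameter grid for the sensitivity analysis; the paper's
# finding is that HOTT's accuracy barely moves across this kind of grid.
sensitivity_grid = {
    "word_embeddings": ["glove", "word2vec"],
    "n_topics": [10, 20, 50, 100],
    "topic_truncation": [10, 20, 50],  # number of top words kept per topic
}
#+end_src
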
* Conclusion

All in all, this paper presents a very interesting approach for
computing distances between natural-language documents. It is no secret
that I like methods with a strong theoretical background (in this case
optimisation and optimal transport), which guarantee stability and
benefit from decades of research in a well-established domain.

Most importantly, this paper opens the door to future work on document
representation with /interpretability/ in mind. Interpretability is
often added as an afterthought in academic research, but it is one of
the most important topics in industry, as a system must be understood
by end users, who are often not trained in ML, before it can be
deployed. The notions of topics and of distances as weights can easily
be understood by anyone without a significant background in ML or
maths.

Finally, I feel like they did not stop at a simple theoretical
argument, but carefully validated the method on real-world datasets,
measuring its sensitivity to all the arbitrary choices they had to
make. Again, from an industry perspective, this makes it possible to
adopt the new approach quickly, without extensive additional testing,
and to be reasonably confident that it won't break unexpectedly.

* References