Mariangela Sciandra, Alessandro Albano (2022). A two-stage LDA algorithm for ranking induced topic readability. In JADT 2022 proceedings book.
A two-stage LDA algorithm for ranking induced topic readability
Mariangela Sciandra; Alessandro Albano
2022-07-01
Abstract
Probabilistic topic models, such as latent Dirichlet allocation (LDA), are standard text analysis algorithms that provide a predictive, latent topic representation of a corpus. However, because training is unsupervised, it is difficult to verify the assumption that the latent space these models discover is meaningful and useful. This paper introduces a two-stage LDA algorithm that estimates latent topics in text documents and uses readability scores to link the identified topics to a linguistically motivated latent structure. We define a new interpretative tool, induced topic readability, which makes it easy to rank topics from the one with the most complex linguistic structure to the one with the lowest semantic content. The usefulness of the method is shown with an application to real data, using articles from the New York Times.
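As a rough illustration of how per-document readability scores might be linked to topics, the sketch below ranks topics by a weighted average of document readability, using document-topic proportions as weights. This is an assumption made for illustration only, not the procedure from the paper: the abstract does not say how induced topic readability is computed. The names `rank_topics_by_readability`, `doc_topic` and `readability` are hypothetical, and the toy numbers are invented.

```python
from typing import List, Tuple


def rank_topics_by_readability(
    doc_topic: List[List[float]],
    readability: List[float],
) -> List[Tuple[int, float]]:
    """Rank topics by the topic-weighted mean of document readability scores.

    doc_topic[d][k] is the (hypothetical) proportion of topic k in document d,
    for example as estimated by an LDA fit; readability[d] is any per-document
    readability score. Both inputs are assumptions for this sketch.
    """
    if len(doc_topic) != len(readability):
        raise ValueError("doc_topic and readability must have the same length")
    n_topics = len(doc_topic[0])
    scores = []
    for k in range(n_topics):
        weight_sum = sum(row[k] for row in doc_topic)
        weighted = sum(row[k] * r for row, r in zip(doc_topic, readability))
        # Guard against a topic that no document loads on.
        mean = weighted / weight_sum if weight_sum > 0 else float("nan")
        scores.append((k, mean))
    # Lowest score first; whether a low score means "hard" or "easy"
    # depends on the readability index that was used.
    return sorted(scores, key=lambda item: item[1])


# Toy inputs, invented purely for illustration.
toy_doc_topic = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
]
toy_readability = [30.0, 70.0, 50.0]

ranking = rank_topics_by_readability(toy_doc_topic, toy_readability)
```

Weighting by topic proportion is only one plausible choice; assigning each document to its dominant topic and averaging within topics would be another.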