Mariangela Sciandra, Alessandro Albano, Irene Carola Spera (2020). Supervised vs Unsupervised Latent Dirichlet Allocation: topic detection in lyrics. In Book of short papers - SIS 2020.

Supervised vs Unsupervised Latent Dirichlet Allocation: topic detection in lyrics

Mariangela Sciandra; Alessandro Albano; Irene Carola Spera
2020-01-01

Abstract

Topic modeling is a type of statistical modeling for discovering the abstract "topics" that occur in a collection of documents. Latent Dirichlet Allocation (LDA) is an example of a topic model and is used to assign the text of a document to particular topics. It builds a fixed number of topics from the words in each document, with the per-document topic mixtures modeled according to a Dirichlet distribution. In this work we apply LDA to a set of songs by four famous Italian songwriters and split them into topics, studying the use of themes in lyrics through statistical analysis. The aim of the work is to highlight the main limitations of standard unsupervised LDA and to propose a supervised extension based on Correspondence Analysis (CA) association theory.
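To make the unsupervised LDA step described in the abstract more concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is not the authors' implementation: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy "lyrics" are all assumptions made up for illustration, and the supervised CA-based extension is not covered here.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenised documents with collapsed Gibbs sampling.

    alpha and beta are symmetric Dirichlet hyperparameters (assumed values).
    Returns (theta, phi): per-document topic mixtures and per-topic word
    distributions, both smoothed by the priors.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    # Count tables: document-topic, topic-word, and tokens per topic.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_words for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the current token from all counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    phi = [
        {vocab[v]: (row[v] + beta) / (topic_total[t] + n_words * beta)
         for v in range(n_words)}
        for t, row in enumerate(topic_word)
    ]
    return theta, phi


# Hypothetical toy corpus, not data from the paper.
toy_lyrics = [
    "love heart night love kiss".split(),
    "heart love kiss night heart".split(),
    "war soldier street city war".split(),
    "city street war soldier city".split(),
]
theta, phi = lda_gibbs(toy_lyrics, n_topics=2)
```

Each row of `theta` is a document's estimated mixture over the fixed number of topics, and each entry of `phi` maps words to their estimated probability within a topic. The number of topics must be chosen in advance, which is one of the practical constraints of the standard unsupervised model.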
Sector SECS-S/01 - Statistics
ISBN: 9788891910776
Use this identifier to cite or link to this document: https://hdl.handle.net/10447/434533