Giovanni, P., Giorgio, V. (2015). TSVD as a Statistical Estimator in the Latent Semantic Analysis Paradigm. IEEE Transactions on Emerging Topics in Computing, 3(2), 185-192 [10.1109/TETC.2014.2385594].

TSVD as a Statistical Estimator in the Latent Semantic Analysis Paradigm

Giorgio Vassallo
2015-01-01

Abstract

This paper presents a new point of view that gives a statistical interpretation of the traditional latent semantic analysis (LSA) paradigm based on the truncated singular value decomposition (TSVD) technique. We show how the TSVD can be interpreted as a statistical estimator derived from the LSA co-occurrence relationship matrix by mapping probability distributions onto Riemannian manifolds. In addition, the quality of the estimator model can be expressed through a figure of merit arising from the Solomonoff approach. This figure of merit takes into account both the adherence to the sample data and the simplicity of the model. In our model, the simplicity parameter of the proposed figure of merit depends on the number of singular values retained after truncation, while the TSVD estimator guarantees the minimal Hellinger distance between the sample probability distribution and the inferred probabilistic model.
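The link between the TSVD and the Hellinger distance can be illustrated numerically. The sketch below is not taken from the paper: it assumes the co-occurrence counts are normalized to a probability distribution, that the TSVD is applied to the element-wise square root of that distribution, and that the truncated matrix is rescaled to unit Frobenius norm before squaring. The function names `tsvd_estimate` and `hellinger` and the toy count matrix are hypothetical.

```python
import numpy as np

def tsvd_estimate(counts, k):
    """Illustrative rank-k estimate of a co-occurrence distribution.

    Assumption (not confirmed by the abstract): the TSVD is applied to the
    element-wise square root of the normalized count matrix.
    """
    p = counts / counts.sum()                # sample probability distribution
    root = np.sqrt(p)                        # unit Frobenius norm by construction
    u, s, vt = np.linalg.svd(root, full_matrices=False)
    root_k = (u[:, :k] * s[:k]) @ vt[:k]     # keep the k largest singular values
    root_k /= np.linalg.norm(root_k)         # rescale to unit Frobenius norm
    q = root_k ** 2                          # squared entries sum to 1
    return p, q

def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal shape."""
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)

if __name__ == "__main__":
    # Hypothetical 3x3 term-by-document count matrix.
    counts = np.array([[4.0, 1.0, 0.0],
                       [1.0, 3.0, 1.0],
                       [0.0, 1.0, 5.0]])
    for k in (1, 2, 3):
        p, q = tsvd_estimate(counts, k)
        print(k, hellinger(p, q))
```

Because the Hellinger distance is a Euclidean distance between square-root distributions, the best rank-k approximation of the square-root matrix in the Frobenius norm (Eckart-Young theorem) is the natural candidate for a minimum-distance estimator. The number of retained singular values `k` plays the role of the simplicity parameter mentioned in the abstract.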
Sector ING-INF/05 - Information Processing Systems
Files in this item:
06995958.pdf (open access; type: publisher's version; format: Adobe PDF; size: 2.4 MB)

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/264961