Institutional research archive of the Università degli Studi di Palermo

Castiglione, G., Mantaci, S., Restivo, A. (2020). Some Investigations on Similarity Measures Based on Absent Words. FUNDAMENTA INFORMATICAE, 171(1-4), 97-112 [10.3233/FI-2020-1874].

Some Investigations on Similarity Measures Based on Absent Words

Castiglione G.; Mantaci S.; Restivo A.

2020-01-01

Abstract

In this paper we investigate similarity measures based on minimal absent words, introduced by Chairungsee and Crochemore in [1]. These measures use a length-weighted index on a sample set corresponding to the symmetric difference M(x)ΔM(y) of the sets M(x) and M(y) of minimal absent words of two sequences x and y, respectively. We first propose a variant of this measure that uses as sample set a proper subset (x, y) of M(x)ΔM(y), which appears to be more appropriate for distinguishing x and y. From an algebraic point of view, we prove that (x, y) is the base of the ideal generated by M(x)ΔM(y). We then remark that such measures can recognize whether the sequences x and y share a common structure, but cannot detect a difference in the number of occurrences of such a structure in the two sequences. To take this multiplicity into account, we introduce the notion of multifactor and define a new measure that uses both absent words and multifactors. Surprisingly, we prove that this similarity measure coincides with a distance on sequences introduced by Ehrenfeucht and Haussler in [2] in the context of block-moves strategies. Our result thus creates a non-trivial bridge between similarity measures based on absent words and those based on the block-moves approach.
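To make the first two measures concrete, here is a minimal Python sketch. It is not the authors' implementation and makes several assumptions: minimal absent words are computed naively from the full factor set (quadratic in |x|), over a shared alphabet that defaults to the letters of x and y; the length-weighted index of [1] is assumed to be the sum of 1/|w|² over the sample set; and the base of the ideal generated by a set X is computed as the words of X that contain no other word of X as a factor. All function names (`factors`, `minimal_absent_words`, `ideal_base`, `length_weighted`, `sample_sets`) are hypothetical.

```python
from __future__ import annotations


def factors(x: str) -> set[str]:
    """All factors (substrings) of x, including the empty word."""
    return {x[i:j] for i in range(len(x) + 1) for j in range(i, len(x) + 1)}


def minimal_absent_words(x: str, alphabet: str) -> set[str]:
    """Naive M(x): words that do not occur in x but whose proper factors all do.

    Because the factor set of x is closed under taking factors, it suffices
    to check that the longest proper prefix and suffix of w occur in x.
    """
    f = factors(x)
    maws = set()
    for v in f:  # v is a factor of x, so the prefix w[:-1] = v occurs in x
        for c in alphabet:
            w = v + c
            if w not in f and w[1:] in f:
                maws.add(w)
    return maws


def ideal_base(words: set[str]) -> set[str]:
    """Minimal generators of the ideal A*XA*: words of X with no other word of X as a factor."""
    return {w for w in words if not any(u != w and u in w for u in words)}


def length_weighted(words: set[str]) -> float:
    """ASSUMED length-weighted index: sum of 1/|w|^2 over the sample set."""
    return sum(1.0 / len(w) ** 2 for w in words)


def sample_sets(x: str, y: str, alphabet: str | None = None) -> tuple[set[str], set[str]]:
    """Return M(x) Δ M(y) and its ideal base over a shared alphabet."""
    if alphabet is None:
        alphabet = "".join(sorted(set(x) | set(y)))
    delta = minimal_absent_words(x, alphabet) ^ minimal_absent_words(y, alphabet)
    return delta, ideal_base(delta)


if __name__ == "__main__":
    delta, base = sample_sets("aab", "abb")
    print(sorted(delta))                     # ['aa', 'aaa', 'bb', 'bbb']
    print(sorted(base))                      # ['aa', 'bb']
    print(round(length_weighted(delta), 4))  # 0.7222
    print(round(length_weighted(base), 4))   # 0.5
```

In this toy pair, M(aab) = {ba, bb, aaa} and M(abb) = {aa, ba, bbb}. Since aa and bb are factors of aaa and bbb, the longer words drop out of the base, and the index computed on the smaller sample set falls from about 0.7222 to 0.5.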

Date: 2020
Scientific-disciplinary sector: INFO-01/A - Computer Science (Informatica)
Journal title: FUNDAMENTA INFORMATICAE
DOI: https://dx.doi.org/10.3233/FI-2020-1874
Publisher URL (open access where possible): https://journals.sagepub.com/doi/abs/10.3233/FI-2020-1874
Appears in categories: 1.01 Journal article

Files in this item:

File	Description	Type	Size	Format	Access
Distances-2.pdf	Article	Post-print	215.2 kB	Adobe PDF	Open access
2620105.pdf	-	Publisher's version	5.08 MB	Adobe PDF	Repository managers only

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/429666