Detecting and measuring repetitiveness of strings is a problem that has been extensively studied in data compression and text indexing. When the data are structured in a non-linear way, as in two-dimensional strings, inherent redundancy offers a rich source for compression, yet systematic studies on repetitiveness measures are still lacking. In this paper, we extend to two dimensions the measures δ and γ, defined in terms of the submatrices of the input, as well as the measures g, g_rl, and b, which are based on copy-paste mechanisms. We study their properties and mutual relationships, and we show that the two classes of measures become incomparable when two-dimensional inputs are considered. We also compare our measures with the 2D Block Tree data structure [Brisaboa et al., Computer J., 2024], and provide some insights for the design of effective 2D compressors.

Carfagna, L., Manzini, G., Romana, G., Sciortino, M., Urbina, C. (2024). Generalization of Repetitiveness Measures for Two-Dimensional Strings. In Z. Zsuzsanna Lipták, E. Moura, K. Figueroa, R. Baeza-Yates (a cura di), String Processing and Information Retrieval 31st International Symposium, SPIRE 2024, Puerto Vallarta, Mexico, September 23–25, 2024, Proceedings (pp. 57-72) [10.1007/978-3-031-72200-4_5].

Generalization of Repetitiveness Measures for Two-Dimensional Strings

Manzini, Giovanni
Co-primo
;
Romana, Giuseppe
Co-primo
;
Sciortino, Marinella
Co-primo
;
Urbina, Cristian
Co-primo
2024-01-01

Abstract

Detecting and measuring repetitiveness of strings is a problem that has been extensively studied in data compression and text indexing. When the data are structured in a non-linear way, as in two-dimensional strings, inherent redundancy offers a rich source for compression, yet systematic studies on repetitiveness measures are still lacking. In this paper, we extend to two dimensions the measures δ and γ, defined in terms of the submatrices of the input, as well as the measures g, g_rl, and b, which are based on copy-paste mechanisms. We study their properties and mutual relationships, and we show that the two classes of measures become incomparable when two-dimensional inputs are considered. We also compare our measures with the 2D Block Tree data structure [Brisaboa et al., Computer J., 2024], and provide some insights for the design of effective 2D compressors.
2024
Settore INFO-01/A - Informatica
9783031721991
9783031722004
Carfagna, L., Manzini, G., Romana, G., Sciortino, M., Urbina, C. (2024). Generalization of Repetitiveness Measures for Two-Dimensional Strings. In Z. Zsuzsanna Lipták, E. Moura, K. Figueroa, R. Baeza-Yates (a cura di), String Processing and Information Retrieval 31st International Symposium, SPIRE 2024, Puerto Vallarta, Mexico, September 23–25, 2024, Proceedings (pp. 57-72) [10.1007/978-3-031-72200-4_5].
File in questo prodotto:
File Dimensione Formato  
GeneralizationRepetitivenessMeasuresTwoDimensionalStrings.pdf

Solo gestori archvio

Tipologia: Versione Editoriale
Dimensione 306.24 kB
Formato Adobe PDF
306.24 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/665260
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
social impact