Detecting and measuring repetitiveness of strings is a problem that has been extensively studied in data compression and text indexing. When the data are structured in a non-linear way, as in two-dimensional strings, inherent redundancy offers a rich source for compression, yet systematic studies on repetitiveness measures are still lacking. In this paper, we extend to two dimensions the measures δ and γ, defined in terms of the submatrices of the input, as well as the measures g, g_rl, and b, which are based on copy-paste mechanisms. We study their properties and mutual relationships, and we show that the two classes of measures become incomparable when two-dimensional inputs are considered. We also compare our measures with the 2D Block Tree data structure [Brisaboa et al., Computer J., 2024], and provide some insights for the design of effective 2D compressors.
Carfagna, L., Manzini, G., Romana, G., Sciortino, M., Urbina, C. (2024). Generalization of Repetitiveness Measures for Two-Dimensional Strings. In Z. Zsuzsanna Lipták, E. Moura, K. Figueroa, R. Baeza-Yates (a cura di), String Processing and Information Retrieval 31st International Symposium, SPIRE 2024, Puerto Vallarta, Mexico, September 23–25, 2024, Proceedings (pp. 57-72) [10.1007/978-3-031-72200-4_5].
Generalization of Repetitiveness Measures for Two-Dimensional Strings
Manzini, GiovanniCo-primo
;Romana, Giuseppe
Co-primo
;Sciortino, MarinellaCo-primo
;Urbina, CristianCo-primo
2024-01-01
Abstract
Detecting and measuring repetitiveness of strings is a problem that has been extensively studied in data compression and text indexing. When the data are structured in a non-linear way, as in two-dimensional strings, inherent redundancy offers a rich source for compression, yet systematic studies on repetitiveness measures are still lacking. In this paper, we extend to two dimensions the measures δ and γ, defined in terms of the submatrices of the input, as well as the measures g, g_rl, and b, which are based on copy-paste mechanisms. We study their properties and mutual relationships, and we show that the two classes of measures become incomparable when two-dimensional inputs are considered. We also compare our measures with the 2D Block Tree data structure [Brisaboa et al., Computer J., 2024], and provide some insights for the design of effective 2D compressors.File | Dimensione | Formato | |
---|---|---|---|
GeneralizationRepetitivenessMeasuresTwoDimensionalStrings.pdf
Solo gestori archvio
Tipologia:
Versione Editoriale
Dimensione
306.24 kB
Formato
Adobe PDF
|
306.24 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.