A binary word is Sturmian if the occurrences of each letter are balanced, in the sense that in any two factors of the same length, the difference between the number of occurrences of the same letter is at most 1. In digital geometry, Sturmian words correspond to discrete approximations of straight line segments in the Euclidean plane. The Dorst–Smeulders coding, introduced in 1984, is a 4-tuple of integers that uniquely represents a Sturmian word w, enabling its reconstruction using |w| modular operations, making it highly efficient in practice. In this paper, we present a linear-time algorithm that, given a binary input word w, computes the Dorst–Smeulders coding of its longest Sturmian prefix. This forms the basis for computing the Dorst–Smeulders coding of an arbitrary binary word w, which is a minimal decomposition (in terms of the number of factors) of w into Sturmian words, each represented by its Dorst–Smeulders coding. This coding could be leveraged in compression schemes where the input is transformed into a binary word composed of long Sturmian segments. Although the algorithm is conceptually simple and can be implemented in just a few lines of code, it is grounded in a deep analysis of the structural properties of Sturmian words.
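The balance condition and the prefix-based decomposition described above can be illustrated with a naive sketch. This is not the linear-time algorithm of the paper and it does not compute the Dorst–Smeulders 4-tuple; it only checks the definition by brute force. The function names (`is_balanced`, `longest_balanced_prefix`, `greedy_decomposition`) and the use of strings over the letters `0` and `1` are assumptions made for illustration.

```python
def is_balanced(w: str) -> bool:
    """Brute-force check of the balance condition on a word over {'0', '1'}.

    It is enough to count the letter '1': two factors of equal length whose
    counts of '1' differ by at most 1 also differ by at most 1 in '0'.
    """
    n = len(w)
    # ones[i] = number of '1' letters in the prefix w[:i]
    ones = [0]
    for ch in w:
        ones.append(ones[-1] + (ch == "1"))
    for length in range(1, n + 1):
        # Count of '1' in every factor of the current length
        counts = [ones[i + length] - ones[i] for i in range(n - length + 1)]
        if max(counts) - min(counts) > 1:
            return False
    return True


def longest_balanced_prefix(w: str) -> int:
    """Length of the longest balanced (finite Sturmian) prefix of w."""
    k = 0
    # Prefixes of a balanced word are balanced, so we can stop at the first failure
    while k < len(w) and is_balanced(w[: k + 1]):
        k += 1
    return k


def greedy_decomposition(w: str) -> list[str]:
    """Split w by repeatedly cutting off the longest balanced prefix."""
    factors = []
    i = 0
    while i < len(w):
        k = longest_balanced_prefix(w[i:])  # always >= 1 for a non-empty word
        factors.append(w[i : i + k])
        i += k
    return factors


if __name__ == "__main__":
    print(is_balanced("0100101"))
    print(greedy_decomposition("0011"))
```

Because every factor of a balanced word is itself balanced, cutting off the longest balanced prefix at each step yields a decomposition with the minimum number of factors. The sketch runs in polynomial rather than linear time, so it is suitable only for checking small examples.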

De Luca, A., Fici, G. (2025). Dorst–Smeulders Coding for Arbitrary Binary Words. In G. Badkobeh, J. Radoszewski, N. Tonellotto, R. Baeza-Yates (Eds.), String Processing and Information Retrieval: 32nd International Symposium, SPIRE 2025, London, UK, September 8–11, 2025, Proceedings (pp. 45-53). Springer Science and Business Media Deutschland GmbH [10.1007/978-3-032-05228-5_5].

Dorst–Smeulders Coding for Arbitrary Binary Words

Fici G.
2025-01-01

2025
Sector INFO-01/A - Computer Science
978-3-032-05227-8
978-3-032-05228-5
Files in this record:
Dorst–Smeulders Coding for Arbitrary Binary Words.pdf
Access: archive managers only
Type: publisher's version
Size: 335.39 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/693487