Cosenza, B., Pomaré, M., Concas, A., Cravotto, G., Cosenza, A., Peroni, C.V., et al. (2026). Data-Driven Prediction of Limnospira platensis (Spirulina) Biomass from Experimental Time-Series Data. BIOMASS, 6(3) [10.3390/biomass6030041].
Data-Driven Prediction of Limnospira platensis (Spirulina) Biomass from Experimental Time-Series Data
Cosenza, Bartolomeo (first author; Methodology); Cravotto, Giancarlo (Conceptualization); Cosenza, Alida (Validation)
2026-05-31
Abstract
Accurate short-term forecasting of Limnospira platensis biomass is essential for optimizing experimental scheduling and cultivation strategies, yet small datasets and strong temporal autocorrelation make reliable modeling difficult. In this study, we developed a leakage-safe, data-driven framework for direct multi-step forecasting of biomass concentration, based on experimental time-series data from nine independent cultivation trials conducted under heterogeneous nutritional and environmental conditions. Gradient Boosting consistently outperformed a persistence baseline across all forecasting horizons (R² ≈ 0.915 at h = 1, 0.935 at h = 2, 0.814 at h = 3) under Leave-One-Experiment-Out cross-validation, in which each model is evaluated on an experiment withheld entirely from training, so the scores reflect generalization to unseen experiments. Residual analysis and prediction intervals confirmed robust uncertainty quantification and revealed condition-dependent variability in predictive performance. Overall, the results show that rigorously validated machine learning models can reliably forecast biomass trajectories beyond naïve baselines, even with limited and heterogeneous datasets. This approach provides a scalable and reproducible methodological framework for predictive modeling in algal biotechnology; however, because the training data were collected at flask scale, direct transfer to larger photobioreactor or outdoor systems should be considered a future validation step rather than an immediate deployment outcome.

| File | Size | Format |
|---|---|---|
| biomass-06-00041-v2.pdf (open access; type: publisher's version) | 1.6 MB | Adobe PDF |
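The evaluation scheme summarized in the abstract (direct multi-step targets, a persistence baseline, and Leave-One-Experiment-Out splits) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the two-lag feature window, the `fit` hook, and the toy biomass values are all assumptions made for illustration, and the Gradient Boosting model itself is left out. The point is only to show how (features, target) pairs can be built inside each experiment so that no pair crosses an experiment boundary, and how each experiment is held out in turn.

```python
from statistics import mean

# Hypothetical toy data: one biomass series per experiment (made-up values).
EXPERIMENTS = {
    "exp1": [0.10, 0.18, 0.31, 0.52, 0.80, 1.10],
    "exp2": [0.12, 0.20, 0.33, 0.50, 0.71, 0.95],
    "exp3": [0.08, 0.15, 0.27, 0.46, 0.74, 1.05],
}


def make_direct_pairs(series, horizon, n_lags=2):
    """Build (lag features, target) pairs from ONE experiment.

    The target is the value `horizon` steps after the last lag, so each
    horizon gets its own dataset (direct, not recursive, forecasting).
    """
    pairs = []
    for t in range(n_lags - 1, len(series) - horizon):
        features = series[t - n_lags + 1 : t + 1]
        pairs.append((features, series[t + horizon]))
    return pairs


def leave_one_experiment_out(experiments):
    """Yield (held_out_id, train_ids), holding out each experiment once."""
    for held_out in experiments:
        yield held_out, [e for e in experiments if e != held_out]


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def evaluate_loeo(experiments, horizon, fit=None, n_lags=2):
    """Pooled out-of-experiment R^2 for one horizon.

    `fit` is a hypothetical hook: it receives the training pairs and
    returns a function mapping features to a prediction. When it is
    omitted, the persistence baseline (last observed value) is used.
    """
    y_true, y_pred = [], []
    for held_out, train_ids in leave_one_experiment_out(experiments):
        # Training pairs come only from the other experiments.
        train_pairs = [
            pair
            for exp_id in train_ids
            for pair in make_direct_pairs(experiments[exp_id], horizon, n_lags)
        ]
        predict = fit(train_pairs) if fit else (lambda features: features[-1])
        for features, target in make_direct_pairs(
            experiments[held_out], horizon, n_lags
        ):
            y_true.append(target)
            y_pred.append(predict(features))
    return r2_score(y_true, y_pred)


if __name__ == "__main__":
    for h in (1, 2, 3):
        print(f"h = {h}: persistence R^2 = {evaluate_loeo(EXPERIMENTS, h):.3f}")
```

A learned regressor could be plugged in through `fit`, provided any scaling or feature selection is also fitted inside that call on the training pairs only; fitting such steps on the full dataset would reintroduce leakage.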
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


