Archivio istituzionale della ricerca dell'Università degli Studi di Palermo

We address the video prediction task by putting forth a novel model that combines (i) a novel hierarchical residual learning vector quantized variational autoencoder (HR-VQVAE), and (ii) a novel autoregressive spatiotemporal predictive model (AST-PM). We refer to this approach as a sequential hierarchical residual learning vector quantized variational autoencoder (S-HR-VQVAE). By leveraging the intrinsic capabilities of HR-VQVAE at modeling still images with a parsimonious representation, combined with the AST-PM's ability to handle spatiotemporal information, S-HR-VQVAE can better deal with major challenges in video prediction. These include learning spatiotemporal information, handling high dimensional data, combating blurry prediction, and implicit modeling of physical characteristics. Extensive experimental results on four challenging tasks, namely KTH Human Action, TrafficBJ, Human3.6M, and Kitti, demonstrate that our model compares favorably against state-of-the-art video prediction techniques both in quantitative and qualitative evaluations despite a much smaller model size. Finally, we boost S-HR-VQVAE by proposing a novel training method to jointly estimate the HR-VQVAE and AST-PM parameters.

Adiban, M., Stefanov, K., Siniscalchi, S.M., Salvi, G. (2025). S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction. IEEE TRANSACTIONS ON MULTIMEDIA, 1-13 [10.1109/tmm.2025.3535370].

S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction

Stefanov, Kalin^Secondo;Siniscalchi, Sabato Marco^{Co-ultimo

Supervision};Salvi, Giampiero^Co-ultimo

2025-01-27

Abstract

We address the video prediction task by putting forth a novel model that combines (i) a novel hierarchical residual learning vector quantized variational autoencoder (HR-VQVAE), and (ii) a novel autoregressive spatiotemporal predictive model (AST-PM). We refer to this approach as a sequential hierarchical residual learning vector quantized variational autoencoder (S-HR-VQVAE). By leveraging the intrinsic capabilities of HR-VQVAE at modeling still images with a parsimonious representation, combined with the AST-PM's ability to handle spatiotemporal information, S-HR-VQVAE can better deal with major challenges in video prediction. These include learning spatiotemporal information, handling high dimensional data, combating blurry prediction, and implicit modeling of physical characteristics. Extensive experimental results on four challenging tasks, namely KTH Human Action, TrafficBJ, Human3.6M, and Kitti, demonstrate that our model compares favorably against state-of-the-art video prediction techniques both in quantitative and qualitative evaluations despite a much smaller model size. Finally, we boost S-HR-VQVAE by proposing a novel training method to jointly estimate the HR-VQVAE and AST-PM parameters.

Scheda breve

Scheda completa

Scheda completa (DC)

	Data
	
				27-gen-2025
			
	Settore scientifico disciplinare del contributo
	
				Settore IINF-05/A - Sistemi di elaborazione delle informazioni
			
	Titolo del periodico 
DATO PREVISTO SU LOGINMIUR
	
				IEEE TRANSACTIONS ON MULTIMEDIA
			
	DOI del contributo 
DATO PREVISTO SU LOGINMIUR
	
				https://dx.doi.org/10.1109/tmm.2025.3535370
			
	Citazione
	
				Adiban, M., Stefanov, K., Siniscalchi, S.M., Salvi, G. (2025). S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction. IEEE TRANSACTIONS ON MULTIMEDIA, 1-13 [10.1109/tmm.2025.3535370].
			
	Appare nelle tipologie:
	
				1.01 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
S-HR-VQVAE_Sequential_Hierarchical_Residual_Learning_Vector_Quantized_Variational_Autoencoder_for_Video_Prediction.pdf accesso aperto Tipologia: Pre-print Dimensione 8 MB Formato Adobe PDF Visualizza/Apri	8 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/672963

Citazioni

ND

3

ND

social impact