Properly training LSTMs requires a long time and an extensive amount of data. To improve the training of these models, this paper proposes a novel residual and recurrent neural network, Resnet-LSTM (RLSTM), for spatio-temporal pedestrian action recognition from image sequences. The model includes a novel layer, called MapGrad, whose goal is to improve the stationarity of the feature map sequences processed by the ConvLSTM. The paper demonstrates the effectiveness of the proposed model and the MapGrad layer in the spatio-temporal classification of pedestrian actions through an ablation study and a comparison with state-of-the-art methods. Overall, RLSTM achieves an accuracy of 88% and an average precision of 94% on the JAAD dataset, a widely used benchmark in the field. Finally, the paper empirically analyzes the effect of increasing the input sequence length on standing action recognition, showing that the proposed method yields a recall of 93%.

Gazzeh, S., Lo Presti, L., Douik, A., La Cascia, M. (2023). RLSTM: A Novel Residual and Recurrent Network for Pedestrian Action Classification. In Computer Analysis of Images and Patterns, Proceedings, Part II, CAIP 2023 (pp. 55-64) [10.1007/978-3-031-44240-7_6].

RLSTM: A Novel Residual and Recurrent Network for Pedestrian Action Classification

Gazzeh, Soulayma; Lo Presti, Liliana; La Cascia, Marco
2023-09-01

Sep-2023
Sector ING-INF/05 - Information Processing Systems
978-3-031-44239-1
978-3-031-44240-7
Files in this item:
File: CAIP_2023.pdf
Access: archive managers only
Description: Article
Type: Publisher's version
Size: 1.02 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/610419