Institutional Research Archive of the Università degli Studi di Palermo

Neural Multimodal Belief Tracker with Adaptive Attention for Dialogue Systems

Zhang Z.; Bivona E.; Yan H.; Yan M.; Qi J.

2019-01-01

Abstract

Multimodal dialogue systems are attracting increasing attention as a more natural and informative means of human-computer interaction. As one of their core components, the belief tracker estimates the user's goal at each step of the dialogue and provides a direct way to validate the system's ability to understand dialogue. However, existing studies on belief trackers are largely limited to the textual modality and cannot be easily extended to capture the rich semantics in multimodal systems, such as those with product images. For example, in the fashion domain, the visual appearance of clothes plays a crucial role in understanding the user's intention. In this case, existing belief trackers may fail to generate accurate belief states for a multimodal dialogue system.

In this paper, we present the first neural multimodal belief tracker (NMBT) to demonstrate how multimodal evidence can facilitate semantic understanding and dialogue state tracking. Given the multimodal inputs, the model applies a textual encoder to represent textual utterances and gives special consideration to the semantics revealed in the visual modality. It learns concept-level fashion semantics by delving deep into image sub-regions and integrating concept probabilities via multiple instance learning. Then, in each turn, an adaptive attention mechanism learns to automatically emphasize different evidence sources from both the visual and textual modalities for more accurate dialogue state prediction. We perform an extensive evaluation on a multi-turn task-oriented dialogue dataset in the fashion domain, and the results show that our method achieves superior performance compared with a wide range of baselines.
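As a rough illustration of the two mechanisms the abstract names, the sketch below combines region-level concept probabilities with a noisy-OR rule, one common multiple instance learning aggregator, and then weights a textual and a visual evidence vector with softmax attention. The abstract does not give either formula, so the noisy-OR choice, the function names (`noisy_or`, `adaptive_attention`), the fixed relevance scores, and all numbers are assumptions made for illustration, not the paper's implementation.

```python
import math

def noisy_or(region_probs):
    """Image-level probability that a concept is present, given the
    per-region probabilities (noisy-OR, an assumed MIL aggregator)."""
    prob_absent_everywhere = 1.0
    for p in region_probs:
        prob_absent_everywhere *= (1.0 - p)
    return 1.0 - prob_absent_everywhere

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_attention(evidence, scores):
    """Fuse evidence vectors with softmax weights derived from scores.
    In a trained model the scores would be learned per turn; here they
    are passed in by hand (hypothetical values)."""
    weights = softmax(scores)
    dim = len(evidence[0])
    fused = [sum(w * vec[i] for w, vec in zip(weights, evidence))
             for i in range(dim)]
    return weights, fused

# Hypothetical probabilities that three image sub-regions show one concept.
region_probs = [0.1, 0.7, 0.2]
concept_prob = noisy_or(region_probs)

# Hypothetical two-dimensional evidence vectors for one dialogue turn.
text_vec = [1.0, 0.0]
visual_vec = [0.0, concept_prob]

# A higher score for the visual source shifts the weight toward it.
weights, fused = adaptive_attention([text_vec, visual_vec], scores=[0.0, 1.0])
print(concept_prob, weights, fused)
```

A higher score for one source pulls the fused vector toward that source, which is the behaviour the abstract describes as emphasizing different evidence sources in each turn.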

Date: 2019
ISBN of the monograph: 9781450366748
DOI of the contribution: https://dx.doi.org/10.1145/3308558.3313598
Citation: Zhang, Z. (2019). Neural Multimodal Belief Tracker with Adaptive Attention for Dialogue Systems. In G. Schiuma, P. Demartini, Yan MR. (Eds.), Proceedings of knowledge ecosystems and growth (pp. 2401-2412). 1515 BROADWAY, NEW YORK, NY 10036-9998 USA : ASSOC COMPUTING MACHINERY [10.1145/3308558.3313598].
Appears in types: 2.07 Conference proceedings contribution published in a volume

Files in this item:

File	Access	Type	Size	Format
IFKAD2019_PROCEEDINGS_eBOOK_Zhang et al.pdf	Archive managers only	Post-print	1.02 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/386245
