Institutional Research Archive of the University of Palermo (Università degli Studi di Palermo)

Transformer-based models such as BERT achieve strong accuracy in predicting vulnerability severity, but their black-box nature raises concerns about alignment with expert reasoning. Accuracy alone may therefore give a misleading view of model reliability. This paper introduces a post-hoc auditing framework that evaluates trust by measuring the semantic alignment between tokens identified via Integrated Gradients and the official CVSS definitions. The framework computes weighted similarities, applies adaptive thresholding, and integrates a dispersion penalty to derive a quantitative trust score, offering interpretable feedback for human review. Experiments on the National Vulnerability Database (NVD) and a Reduced Annotated Dataset (RAD) with BERT-based classifiers across eight Common Vulnerability Scoring System (CVSS) base metrics show that models with similar accuracy can differ in trust scores by more than 35%, revealing critical gaps in reliability. These findings highlight the need to complement accuracy with trust evaluation for interpretable and dependable automation in software vulnerability assessment.
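The abstract names the ingredients of the trust score (attribution-weighted similarity, adaptive thresholding, a dispersion penalty) but gives no formulas. The Python sketch below is only one hypothetical way these pieces could fit together; it is not the authors' formulation. It assumes the Integrated Gradients attributions and the embeddings are already computed, and every name and formula in it (`trust_score`, `lam`, the mean-similarity threshold, the entropy-based penalty, the toy vectors) is an assumption made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def trust_score(attributions, token_vecs, definition_vec, lam=0.5):
    """Hypothetical trust score in [0, 1]; NOT the paper's formula.

    attributions   -- per-token attribution scores (assumed to come from
                      Integrated Gradients, computed elsewhere)
    token_vecs     -- per-token embedding vectors (assumed given)
    definition_vec -- embedding of the CVSS metric definition (assumed given)
    lam            -- assumed penalty strength, expected in [0, 1]
    """
    total = sum(abs(a) for a in attributions)
    if total == 0:
        return 0.0
    # Normalise attribution magnitudes so they act as weights summing to 1.
    weights = [abs(a) / total for a in attributions]
    # Similarity of each token to the definition, clipped to [0, 1].
    sims = [max(0.0, cosine(v, definition_vec)) for v in token_vecs]
    # Assumed "adaptive" threshold: the mean similarity of this input.
    tau = sum(sims) / len(sims)
    # Weighted similarity, counting only tokens at or above the threshold.
    aligned = sum(w * s for w, s in zip(weights, sims) if s >= tau)
    # Assumed dispersion measure: normalised entropy of the weights
    # (0 when one token carries all attribution, 1 when spread evenly).
    n = len(weights)
    entropy = -sum(w * math.log(w) for w in weights if w > 0)
    dispersion = entropy / math.log(n) if n > 1 else 0.0
    return aligned * (1.0 - lam * dispersion)


# Toy data: the first token matches the definition, the second is unrelated.
definition = [1.0, 0.0]
tokens = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
on_relevant = trust_score([0.9, 0.05, 0.05], tokens, definition)
on_irrelevant = trust_score([0.05, 0.9, 0.05], tokens, definition)
print(on_relevant > on_irrelevant)
```

In this toy setup, attribution concentrated on the token that matches the definition scores higher than the same amount of attribution concentrated on an unrelated token. That mirrors the abstract's point that two models with similar accuracy can differ in how well their explanations align with the CVSS definitions.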

Mirtaheri, S.L., Majd, A., Shahbazian, R., Pugliese, A. (2026). Automated Trust-Aware Software Vulnerability Scoring via Explainable Feature Alignment. In ICAAI 2025 - 2025 9th International Conference on Advances in Artificial Intelligence (pp. 87-91). Association for Computing Machinery, Inc [10.1145/3787279.3787294].

Automated Trust-Aware Software Vulnerability Scoring via Explainable Feature Alignment

Mirtaheri S. L.; Majd A.; Shahbazian R.; Pugliese A.

2026-01-01


Date: 2026
ISBN of the monograph: 979-8-4007-2104-5
DOI of the contribution: https://dx.doi.org/10.1145/3787279.3787294
Publisher URL (open access where possible): https://dl.acm.org/doi/10.1145/3787279.3787294
Appears in categories: 2.07 Contribution in conference proceedings published in a volume

Files in this item:

File	Size	Format
3787279.3787294.pdf (open access; type: publisher's version)	859.72 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/707569
