Transformer-based models such as BERT achieve strong accuracy in predicting vulnerability severity, but their black-box nature raises concerns about alignment with expert reasoning. Accuracy alone may therefore give a misleading view of model reliability. This paper introduces a post-hoc auditing framework that evaluates trust by measuring the semantic alignment between tokens identified via Integrated Gradients and the official Common Vulnerability Scoring System (CVSS) definitions. The framework computes weighted similarities, applies adaptive thresholding, and integrates a dispersion penalty to derive a quantitative trust score, offering interpretable feedback for human review. Experiments on the National Vulnerability Database (NVD) and a Reduced Annotated Dataset (RAD) with BERT-based classifiers across eight CVSS base metrics show that models with similar accuracy can differ in trust scores by more than 35%, revealing critical gaps in reliability. These findings highlight the need to complement accuracy with trust evaluation for interpretable and dependable automation in software vulnerability assessment.
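
The abstract names three ingredients of the trust score (weighted similarities, adaptive thresholding, and a dispersion penalty) without giving their formulas. The following is a minimal Python sketch of how such a score could be assembled, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function name `trust_score`, the parameters `k` and `penalty_weight`, the threshold form (mean plus `k` standard deviations of the similarities), and the use of normalized entropy as the dispersion measure.

```python
import math


def trust_score(attributions, similarities, k=0.5, penalty_weight=0.5):
    """Hypothetical trust score in [0, 1]; NOT the paper's exact formula.

    attributions: per-token Integrated Gradients scores (any sign).
    similarities: per-token semantic similarity to a CVSS definition, in [0, 1].
    k, penalty_weight: illustrative tuning parameters (assumptions).
    """
    if len(attributions) != len(similarities) or not attributions:
        raise ValueError("inputs must be non-empty and of equal length")

    # Normalize absolute attributions into weights that sum to 1.
    magnitudes = [abs(a) for a in attributions]
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    weights = [m / total for m in magnitudes]

    # Adaptive threshold (assumed form): mean + k * std of the similarities.
    n = len(similarities)
    mean = sum(similarities) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in similarities) / n)
    threshold = mean + k * std

    # Weighted similarity: attribution mass on tokens at or above the
    # threshold, scaled by each token's similarity to the definition.
    alignment = sum(
        w * s for w, s in zip(weights, similarities) if s >= threshold
    )

    # Dispersion penalty (assumed form): normalized entropy of the weights.
    # It is 0 when one token holds all attribution, 1 when spread uniformly.
    if n > 1:
        entropy = -sum(w * math.log(w) for w in weights if w > 0)
        dispersion = entropy / math.log(n)
    else:
        dispersion = 0.0

    return alignment * (1.0 - penalty_weight * dispersion)


# Same similarities, different attribution patterns.
focused = trust_score([0.9, 0.05, 0.05], [0.9, 0.1, 0.1])
scattered = trust_score([0.34, 0.33, 0.33], [0.9, 0.1, 0.1])
```

Under these assumptions, `focused` scores higher than `scattered`: both inputs contain the same definition-aligned token, but the second spreads attribution almost evenly across tokens that do not match the CVSS definition, so it loses on both alignment and dispersion.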

Mirtaheri, S.L., Majd, A., Shahbazian, R., Pugliese, A. (2026). Automated Trust-Aware Software Vulnerability Scoring via Explainable Feature Alignment. In ICAAI 2025 - 2025 9th International Conference on Advances in Artificial Intelligence (pp. 87-91). Association for Computing Machinery, Inc [10.1145/3787279.3787294].

Automated Trust-Aware Software Vulnerability Scoring via Explainable Feature Alignment

Shahbazian R.;
2026-01-01

2026
979-8-4007-2104-5
Files in this item:
3787279.3787294.pdf (open access; type: publisher's version; size: 859.72 kB; format: Adobe PDF)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/707569