Mirtaheri, S.L., Majd, A., Shahbazian, R., Pugliese, A. (2026). Automated Trust-Aware Software Vulnerability Scoring via Explainable Feature Alignment. In ICAAI 2025 - 2025 9th International Conference on Advances in Artificial Intelligence (pp. 87-91). Association for Computing Machinery, Inc. doi:10.1145/3787279.3787294.
Automated Trust-Aware Software Vulnerability Scoring via Explainable Feature Alignment
Shahbazian R.
2026-01-01
Abstract
Transformer-based models such as BERT achieve strong accuracy in predicting vulnerability severity, but their black-box nature raises concerns about whether their predictions align with expert reasoning. Accuracy alone may therefore give a misleading picture of model reliability. This paper introduces a post-hoc auditing framework that evaluates trust by measuring the semantic alignment between the tokens that Integrated Gradients identifies as influential and the official Common Vulnerability Scoring System (CVSS) metric definitions. The framework computes weighted similarities, applies adaptive thresholding, and integrates a dispersion penalty to derive a quantitative trust score, providing interpretable feedback for human review. Experiments on the National Vulnerability Database (NVD) and a Reduced Annotated Dataset (RAD) with BERT-based classifiers across eight CVSS base metrics show that models with similar accuracy can differ in trust score by more than 35%, revealing critical gaps in reliability. These findings highlight the need to complement accuracy with trust evaluation to achieve interpretable and dependable automation of software vulnerability assessment.
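The abstract names the ingredients of the trust score (attribution-weighted similarity to CVSS definitions, adaptive thresholding, a dispersion penalty) but not their exact formulas. The following is a minimal, hypothetical sketch of how such pieces could be combined for one prediction and one CVSS metric, not the paper's method. The threshold rule (mean plus `k` times the standard deviation of attribution magnitudes), cosine similarity, the penalty (population standard deviation of the selected tokens' similarities), and the names `trust_score`, `k`, and `lam` are all assumptions. Token attributions and embeddings are taken as given, for example from an Integrated Gradients implementation such as Captum's `LayerIntegratedGradients` and from any sentence or token encoder.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def trust_score(
    attributions: Sequence[float],
    token_vecs: Sequence[Sequence[float]],
    definition_vec: Sequence[float],
    k: float = 0.5,
    lam: float = 0.5,
) -> float:
    """Hypothetical trust score (assumed formulation, for illustration only).

    attributions   -- non-empty per-token attribution scores (e.g. Integrated Gradients)
    token_vecs     -- one embedding per token, same order as attributions
    definition_vec -- embedding of the CVSS metric definition text
    k              -- assumed adaptive-threshold factor: tau = mean + k * std
    lam            -- assumed weight of the dispersion penalty
    """
    mags = [abs(a) for a in attributions]
    # Adaptive threshold computed from this input's own attribution distribution.
    tau = mean(mags) + k * pstdev(mags)
    selected = [i for i, m in enumerate(mags) if m >= tau and m > 0]
    if not selected:
        return 0.0
    sims = [cosine(token_vecs[i], definition_vec) for i in selected]
    # Attribution-weighted mean similarity of the salient tokens to the definition.
    total = sum(mags[i] for i in selected)
    weighted = sum(mags[i] * s for i, s in zip(selected, sims)) / total
    # Dispersion penalty: salient tokens that disagree with each other lower trust.
    penalty = pstdev(sims)
    return max(0.0, weighted - lam * penalty)


if __name__ == "__main__":
    vecs = [[1, 0], [0, 1], [1, 0]]  # toy 2-D token embeddings
    definition = [1, 0]              # toy embedding of a CVSS definition
    # Salient tokens align with the definition.
    print(trust_score([0.9, 0.05, 0.8], vecs, definition))  # 1.0
    # The only salient token is orthogonal to the definition.
    print(trust_score([0.05, 0.9, 0.1], vecs, definition))  # 0.0
    # Two equally salient tokens that disagree: 0.5 - 0.5 * 0.5.
    print(trust_score([1.0, 1.0], [[1, 0], [0, 1]], definition))  # 0.25
```

Deriving the threshold from each input's own attribution distribution, rather than using a fixed cutoff, is one plausible reading of "adaptive thresholding"; the penalty term lets two models with equal average alignment receive different scores when one spreads its attribution over semantically inconsistent tokens.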
| File | Access | Type | Size | Format |
|---|---|---|---|---|
| 3787279.3787294.pdf | Open access | Publisher's version | 859.72 kB | Adobe PDF |
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.


