Quick patch processing requires automated, accurate Common Vulnerability Scoring System (CVSS) labeling. Large Language Models (LLMs) have shown impressive capabilities in understanding and generating human language, but their performance varies with factors such as training data and architecture, and they may generate biased or irrelevant responses and mis-rank critical flaws. This paper presents CIVS, a collective-intelligence framework that fuses GPT-4 with a fine-tuned GPT-3.5-Turbo via weighted aggregation and ensemble learning. CIVS can match or surpass the accuracy and cost-efficiency of a single large, expensive model while reducing risk and improving reliability. Evaluated on recent records from the National Vulnerability Database (NVD), CIVS reduces mean-squared error by 10% and improves macro-F1 to 0.76 compared with the strongest individual model. It remains robust even when challenged with GPT-generated “what-if” variations of vulnerability descriptions. Because it reuses existing models without adding any new trainable parameters, the framework remains cost-efficient while still generalizing to previously unseen vulnerabilities.
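To make the idea of weighted aggregation concrete, here is a minimal sketch of how two models' CVSS base-score estimates could be fused into one score and then mapped to a severity label. It is an illustration only, not the CIVS implementation: the function names, the example weights, and the example scores are all hypothetical, and the abstract does not state how the paper derives its weights. The severity bands follow the qualitative rating scale of the CVSS v3.x specification.

```python
from typing import Sequence


def weighted_score(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Fuse per-model CVSS base scores with a weighted average.

    Hypothetical helper: the weights are assumed inputs, not values from the paper.
    The result is clamped to the CVSS range [0, 10] and rounded to one decimal.
    """
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and of equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = sum(s * w for s, w in zip(scores, weights)) / total
    return round(min(10.0, max(0.0, fused)), 1)


def severity(score: float) -> str:
    """Map a base score to the CVSS v3.x qualitative severity rating."""
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def mean_squared_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean-squared error between predicted and reference base scores."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    # Made-up estimates from two models for one vulnerability description.
    model_scores = [9.8, 7.5]
    model_weights = [0.6, 0.4]  # assumed weights, for illustration only
    fused = weighted_score(model_scores, model_weights)
    print(fused, severity(fused))
```

A weighted average is the simplest form of aggregation; any learned ensemble layer on top of it would replace the fixed weights assumed here.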
Mirtaheri, S.L., Shahbazian, R., Pascucci, V., Movahedkor, N., Pugliese, A. (2025). CIVS: A Collective-Intelligence Ensemble for Automated Software Vulnerability Scoring. IEEE ACCESS, 13 [10.1109/ACCESS.2025.3622663].
CIVS: A Collective-Intelligence Ensemble for Automated Software Vulnerability Scoring
Shahbazian R.;
2025-10-17
| File | Size | Format | |
|---|---|---|---|
| CIVS_A_Collective-Intelligence_Ensemble_for_Automated_Software_Vulnerability_Scoring.pdf (open access; type: publisher's version) | 2.11 MB | Adobe PDF | View/Open |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


