Institutional Research Archive of the Università degli Studi di Palermo

Federated Learning for Robust Malware Detection under Noisy Labels and Malicious Sybils

Andrea Augello; Alessandra De Paola; Giuseppe Lo Re

2026-01-01

Abstract

Malware poses a significant threat to computing systems, and federated learning (FL) has emerged as a promising privacy-preserving approach to collaboratively training detection models through crowdsourcing. However, FL relies on local client labels that can be noisy or corrupted, and some clients may behave maliciously to poison the global model, severely degrading performance. While techniques exist for either noisy-label-robust FL or defenses against malicious clients, few approaches address both challenges jointly, even though they are likely to co-occur in realistic cybersecurity use cases. This work presents a federated learning system for robust malware detection in this challenging setting. The proposed pipeline leverages properties of the client update gradients to determine a core set of reliable participants. A cluster-based, distance-aware scoring mechanism isolates low-quality and malicious contributors. Subsequently, the identified clients undergo label refinement through semi-supervised, per-client noisy-label detection and correction that uses the global model as a reference. Experiments on a publicly available Android malware dataset show that the system maintains high detection accuracy even with high fractions of noisy and adversarial clients, with up to 80% of the training data corrupted, outperforming representative baselines.
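The abstract does not give implementation details, so the following is only an illustrative sketch of the two ideas it names: distance-aware screening of client updates and label correction against a global model. It is not the authors' method. In particular, it swaps the cluster-based scoring for a simpler distance to the coordinate-wise median update, and all function names, the `factor` cutoff, and the `threshold` confidence are hypothetical choices made for this example.

```python
import math
from statistics import median


def l2_distance(u, v):
    """Euclidean distance between two flattened update vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def score_clients(updates):
    """Distance of each client update to the coordinate-wise median update.

    Assumption: a larger distance suggests a lower-quality or malicious client.
    This replaces the cluster-based scoring mentioned in the abstract.
    """
    dims = len(updates[0])
    center = [median(u[d] for u in updates) for d in range(dims)]
    return [l2_distance(u, center) for u in updates]


def select_core(updates, factor=3.0):
    """Indices of clients kept as the 'core set' (hypothetical cutoff rule)."""
    dists = score_clients(updates)
    cutoff = factor * median(dists)
    return [i for i, d in enumerate(dists) if d <= cutoff]


def refine_labels(labels, global_probs, threshold=0.9):
    """Replace a binary local label when the global model confidently disagrees.

    global_probs[i] is the global model's probability that sample i is malware.
    """
    refined = []
    for y, p in zip(labels, global_probs):
        pred = 1 if p >= 0.5 else 0
        confidence = max(p, 1.0 - p)
        refined.append(pred if (pred != y and confidence >= threshold) else y)
    return refined


# Toy data: three similar updates and one outlier.
updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [-5.0, -5.0]]
core = select_core(updates)

# Toy labels for one screened-out client, with global model probabilities.
fixed = refine_labels([1, 0, 1, 0], [0.95, 0.97, 0.2, 0.4])
```

A median-based center tolerates only a minority of outliers, so this sketch would not by itself withstand a Sybil majority; it is meant to show the shape of the screening and refinement steps, not their robustness.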

Date: 2026
Scientific disciplinary sector of the contribution: Sector IINF-05/A - Information Processing Systems
Publisher URL (open access where possible): https://ceur-ws.org/Vol-4198/paper5.pdf
Citation: Augello, A., De Paola, A., Lo Re, G. (2026). Federated Learning for Robust Malware Detection under Noisy Labels and Malicious Sybils. In P.S. Davide Maiorca (Ed.), Proceedings of the Joint National Conference on Cybersecurity (ITASEC & SERICS 2026). CEUR WS.
Appears in types: 2.07 Conference proceedings contribution published in a volume

Files in this item:

File	Size	Format
2026-itasec.pdf (open access; description: This is an open access article under the terms of the Creative Commons Attribution License; type: publisher's version)	1.69 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/704608
