Malware poses a significant threat to computing systems, and federated learning (FL) has emerged as a promising privacy-preserving approach to collaboratively train detection models through crowdsourcing. However, FL relies on local client labels that can be noisy or corrupted, and some clients may behave maliciously to poison the global model, leading to severely degraded performance. While techniques for either noisy-label-robust FL or defenses against malicious clients exist, few approaches jointly address both challenges, which are likely to co-occur in realistic cybersecurity use cases. This work presents a federated learning system for robust malware detection in this challenging setting. The proposed pipeline leverages properties of the client update gradients to determine a core set of reliable participants. A cluster-based, distance-aware scoring mechanism isolates low-quality and malicious contributors. Subsequently, the identified clients undergo label refinement through a semi-supervised per-client noisy-label detection and correction using the global model as a reference. Experiments on a publicly available Android malware dataset show the system maintains high detection accuracy even with high fractions of noisy and adversarial clients, with up to 80% of the training data corrupted, outperforming representative baselines.
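The two stages named in the abstract (distance-aware scoring of client updates, then label refinement against the global model) can be illustrated with a minimal sketch. This is not the authors' method: the abstract gives no formulas, so every name and threshold here (`score_clients`, `select_core`, `refine_labels`, `threshold`, `confidence`) is hypothetical. The sketch also swaps in a coordinate-wise median as a robust centre in place of clustering, and a simple confident-disagreement rule in place of the semi-supervised procedure.

```python
import numpy as np


def score_clients(updates: np.ndarray) -> np.ndarray:
    """Score clients by how close their flattened update is to a robust centre.

    updates has shape (n_clients, n_params). Scores lie in (0, 1]; higher
    means more typical. ASSUMPTION: the coordinate-wise median stands in
    for the cluster centre the abstract alludes to.
    """
    centre = np.median(updates, axis=0)
    dists = np.linalg.norm(updates - centre, axis=1)
    # Normalise by the median distance so scores are scale-free.
    scale = np.median(dists) + 1e-12
    return 1.0 / (1.0 + dists / scale)


def select_core(updates: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Return indices of clients whose score reaches the threshold.

    ASSUMPTION: threshold=0.2 keeps clients within roughly 4x the
    median distance from the centre; the value is illustrative only.
    """
    return np.flatnonzero(score_clients(updates) >= threshold)


def refine_labels(global_probs: np.ndarray, labels: np.ndarray,
                  confidence: float = 0.9):
    """Relabel samples where the global model confidently disagrees.

    global_probs has shape (n_samples, n_classes). Returns the refined
    labels and a boolean mask of the samples that were changed.
    ASSUMPTION: a fixed confidence cut-off, chosen for illustration.
    """
    predicted = global_probs.argmax(axis=1)
    confident = global_probs.max(axis=1) >= confidence
    changed = confident & (predicted != labels)
    refined = labels.copy()
    refined[changed] = predicted[changed]
    return refined, changed
```

Under these assumptions, a server would call `select_core` on the stacked client updates each round, and each remaining client would call `refine_labels` on its own data using the global model's predicted probabilities, so raw samples never leave the device.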
Augello, A., De Paola, A., Lo Re, G. (2026). Federated Learning for Robust Malware Detection under Noisy Labels and Malicious Sybils. In P.S. Davide Maiorca (Ed.), Proceedings of the Joint National Conference on Cybersecurity (ITASEC & SERICS 2026). CEUR WS.
Federated Learning for Robust Malware Detection under Noisy Labels and Malicious Sybils
Andrea Augello; Alessandra De Paola; Giuseppe Lo Re
2026-01-01
| File | Description | Type | Access | Size | Format |
|---|---|---|---|---|---|
| 2026-itasec.pdf | This is an open access article under the terms of the Creative Commons Attribution License | Publisher's version | Open access | 1.69 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


