EMBER-Based Static Malware Detection: A Critical Review of Accuracy, Explainability, and Temporal Robustness Trade-offs

Ahmed M. Redha Abdulsattar; Riyadh Rahef Nuiaa Alogaili; Ahmed Raad Al-Sudani; Selvakumar Manickam

doi:10.59746/asgg6t42

Authors

Ahmed M. Redha Abdulsattar, Software Department, College of Computer Science and Information Technology, Wasit University, Wasit, Al-Kut, 52001, Iraq
Riyadh Rahef Nuiaa Alogaili, Cybersecurity Department, College of Computer Science and Information Technology, Wasit University, Wasit, Al-Kut, 52001, Iraq
Ahmed Raad Al-Sudani, Software Department, College of Computer Science and Information Technology, Wasit University, Wasit, Al-Kut, 52001, Iraq
Selvakumar Manickam, Cybersecurity Research Centre, Universiti Sains Malaysia, 11800 USM, Penang, Malaysia

DOI:

https://doi.org/10.59746/asgg6t42

Keywords:

static malware detection, EMBER dataset, ensemble learning, explainable AI, Windows malware

Abstract

Static analysis-based malware detection for Portable Executable (PE) files has advanced considerably since the release of the EMBER dataset in 2018. Yet evaluation methodology and model explainability still face critical challenges that limit real-world deployment. This literature review examines five thematic clusters: Traditional Machine Learning, Deep Learning, Ensemble and Hybrid Architectures, Explainable AI (XAI), and Zero-day Detection with Concept Drift. It analyzes 27 primary studies and 15 supporting references, for a total of 42 studies published between 2018 and early 2026. The review shows that gradient-boosted decision trees consistently provide a strong baseline, while ensemble and hybrid architectures achieve the highest accuracy overall, at the cost of reduced explainability and increased computational overhead. Deep Learning methods narrow the performance gap but raise transparency and resource concerns, and the emerging Large Language Model (LLM)-based approaches remain premature and unverified. Across the five clusters, six intersecting gaps are identified, the most notable being the near-universal dependence on random rather than temporal train/test splits. Other gaps include insufficient reporting of false positive rates at operational thresholds and the consistent separation between explainability and detection performance. Critically, no reviewed study successfully integrates ensemble-level accuracy, embedded explainability, and temporally oriented evaluation within a single framework; this review highlights that gap as the most important research priority in the field. The gaps identified can be addressed through the seven future research directions presented later in this review, the most critical of which is building such a unified framework.
