SECURING ENTERPRISE AI SYSTEMS AGAINST DATA POISONING: ANALYTICAL FRAMEWORK, TOOL-BASED DETECTION, AND CONTROLLED EXPERIMENTATION
Abstract
Enterprise adoption of artificial intelligence continues to accelerate; however, the emergence of new adversarial threats, in particular data poisoning attacks, poses a significant security risk to enterprises. In a data poisoning attack, a malicious actor manipulates an AI model's training data to inject malicious behavior into the model, creating backdoors that compromise model integrity and erode trust in the business. In this paper, we develop a comprehensive analytical framework, leveraging influence functions, to quantify the impact of poisoned data on a model's predictive capability. We also conduct a systematic literature review of detection methodologies and analyze the defense strategies currently used in enterprise practice. For experimental evidence, we use a controlled experimental design built on publicly available open-source toolkits to demonstrate the usefulness of the framework, evaluating poisoning attack vectors against a variety of standard datasets and measuring the effectiveness of defense mechanisms with metrics including accuracy degradation, attack success rate, and detection accuracy. Our research shows that a small fraction of poisoned samples (1% to 5%) is sufficient to significantly degrade a model's predictive reliability, while layered defense strategies reduce the risk of successful attacks. This research identifies deficiencies in current enterprise AI governance frameworks and provides actionable methods and procedures for enterprises building AI security programs.
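To make the experimental setup described above concrete, the sketch below illustrates, under our own assumptions rather than the paper's released code, the kind of measurement the abstract refers to: a simple label-flipping poisoning attack applied at 1% to 5% poison rates, with accuracy degradation reported against a clean baseline. The dataset, model, and poison_labels helper are hypothetical choices for illustration only; the paper's influence-function analysis (in the standard Koh and Liang formulation, which scores a training point z against a test point z_test as -grad_theta L(z_test, theta)^T H_theta^{-1} grad_theta L(z, theta)) is not reproduced here.

    # Illustrative sketch only (not the authors' experimental code): simulate a
    # label-flipping poisoning attack at several poison rates and report the
    # resulting accuracy degradation relative to a clean baseline.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0
    )

    def poison_labels(y, rate, n_classes=10, seed=0):
        # Flip a fraction `rate` of the training labels to a random incorrect class.
        rng = np.random.default_rng(seed)
        y_poisoned = y.copy()
        idx = rng.choice(len(y), size=int(rate * len(y)), replace=False)
        offsets = rng.integers(1, n_classes, size=len(idx))
        y_poisoned[idx] = (y_poisoned[idx] + offsets) % n_classes
        return y_poisoned

    clean_model = LogisticRegression(max_iter=2000).fit(X_train, y_train)
    clean_acc = accuracy_score(y_test, clean_model.predict(X_test))

    for rate in (0.01, 0.03, 0.05):
        y_p = poison_labels(y_train, rate)
        model = LogisticRegression(max_iter=2000).fit(X_train, y_p)
        acc = accuracy_score(y_test, model.predict(X_test))
        print(f"poison rate {rate:.0%}: accuracy {acc:.3f} "
              f"(degradation {clean_acc - acc:.3f})")

In the paper's terminology, the printed degradation corresponds to the accuracy-degradation metric; attack success rate and detection accuracy would be computed analogously for targeted backdoor triggers and for the chosen detection mechanism, respectively.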