Autonomous Security Testing for AI Systems: Evaluating AI Red-Teaming Agents for Continuous Adversarial Assessment and Model Resilience
DOI:
https://doi.org/10.63002/asrp.401.1330Keywords:
AI red-teaming agents, Autonomous security testing, Adversarial robustness in AI modelsAbstract
Artificial intelligence systems are increasingly integrated into high-stakes applications, yet their growing complexity introduces unique security challenges. Traditional human-led red-teaming approaches struggle to provide comprehensive, continuous, and reproducible evaluation, leaving AI models vulnerable to adversarial exploitation, including prompt injection, data leakage, hallucination manipulation, and emergent behaviors. This paper investigates the use of autonomous AI red-teaming agents as a scalable and adaptive solution for continuous adversarial assessment and model resilience. It proposes a framework for designing AI-driven agents capable of generating novel attack scenarios, adapting strategies based on observed model responses, and benchmarking effectiveness across defined vulnerability classes. The study further explores operational integration into AI development and deployment pipelines, hybrid testing models that combine human expertise with autonomous evaluation, and governance mechanisms to ensure safe, ethical, and compliant testing. By comparing autonomous agents to human-led teams, the paper demonstrates enhanced coverage, efficiency, and reproducibility while addressing regulatory and assurance requirements. The findings underscore the potential of autonomous red-teaming to transform AI security from episodic assessment to persistent, proactive resilience, providing organizations with robust safeguards against evolving threats.
Downloads
Published
Issue
Section
License
Copyright (c) 2026 Ashok Kumar Kanagala

This work is licensed under a Creative Commons Attribution 4.0 International License.