Endor Labs Study Highlights Persistent Security Risks in AI Code Generation
Endor Labs has announced the launch of its agentic code security benchmark, a new framework designed to evaluate how securely AI coding agents generate software in real-world environments. Alongside the benchmark, the company launched the Agent Security League, a public leaderboard that tracks the performance of top AI coding systems on both general accuracy and security resilience.

Measuring the Gap Between Functionality and Security in AI-Generated Code
The benchmark is based on the SusVibes model, originally developed through peer-reviewed research at Carnegie Mellon University. It evaluates AI coding agents on 200 real development tasks drawn from 108 open-source projects, with the resulting code reviewed against 77 vulnerability categories from the Common Weakness Enumeration. Endor Labs has extended this academic foundation with additional layers of testing, including a new agent-specific harness, an improved assessment model, and safeguards against manipulation of benchmark conditions.
Varun Badhwar, CEO of Endor Labs, said, “AI coding agents are transforming how software is built, but the industry must confront a critical issue. Code that works is not necessarily code that is secure. By making these evaluations transparent, we aim to drive accountability and help organizations better understand the risks associated with AI-generated software.”
According to Precedence Research, the Model Evaluation and Benchmarking Tools Market was valued at USD 1.15 billion in 2025 and is projected to grow from USD 1.42 billion in 2026 to approximately USD 9.57 billion by 2035, expanding at a CAGR of 23.60% over that period. The growth is attributed to the rapid expansion of enterprise AI deployments that require performance monitoring and standardized benchmarking across complex model ecosystems.
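As a rough sanity check, the reported growth rate can be reproduced from the figures above with the standard compound annual growth rate formula; the snippet below is a minimal illustrative calculation using only the numbers quoted in the forecast, not data from Endor Labs or Precedence Research beyond what appears here.

```python
# Illustrative check of the CAGR implied by the quoted market figures.
# Formula: CAGR = (end_value / start_value) ** (1 / years) - 1

start_value = 1.42   # USD billion, 2026 forecast
end_value = 9.57     # USD billion, 2035 forecast
years = 2035 - 2026  # 9-year horizon

cagr = (end_value / start_value) ** (1 / years) - 1
print(f"Implied CAGR: {cagr:.2%}")  # prints roughly 23.6%, consistent with the reported 23.60%
```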
Industry Impact and Future Outlook
The introduction of the Agent Security League highlights a broader shift toward transparency and accountability in AI-driven development tools. As organizations increasingly rely on AI to accelerate coding, the need for standardized evaluation frameworks is becoming more pressing.
Secure software development practices have drawn particular attention from institutions such as the National Institute of Standards and Technology, with an emphasis on automation and deeper integration into engineering processes. By carrying academic research forward into a practical, industry-facing framework, Endor Labs aims to contribute to a more transparent and accountable AI ecosystem.