Research

Research and findings.

Citable findings, benchmarks, and original analysis from the Deva Security Team. Each entry includes a headline number, a summary, the methodology, the source, and a ready-to-paste citation block.

AI Code Security

82.8% of functionally correct AI-generated code contains exploitable vulnerabilities.

82.8% of working AI-generated code is exploitable

Summary

The SusVibes Benchmark tested 200 coding tasks spanning 77 CWE types against major frontier models (Claude 4 Sonnet, GPT-4, Copilot). 61% of the LLM solutions were functionally correct. Of those functionally correct solutions, only 17.2% were also secure under adversarial test cases, which works out to roughly 10.5% of all solutions. The remaining 82.8% passed functional tests while containing exploitable vulnerabilities, including injection, authentication bypass, IDOR, and exposed network services.

Methodology

Each task pairs a benign functional specification with adversarial security test cases. A solution counts as functionally correct only if it passes the functional tests; it counts as secure only if it also passes the security tests. The 82.8% figure is the share of functionally correct solutions that fail at least one security test.
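As a quick check on how the headline figures relate, the short calculation below derives both the 82.8% share and the overall secure rate from the two percentages reported in the summary. It uses only those reported numbers; the variable names are illustrative.

```python
# Reported SusVibes figures, taken from the summary.
functionally_correct = 0.61   # share of all solutions that pass the functional tests
secure_given_correct = 0.172  # share of correct solutions that also pass the security tests

# Share of functionally correct solutions that fail at least one security test.
insecure_given_correct = 1 - secure_given_correct

# Share of *all* solutions that are both functionally correct and secure.
secure_overall = functionally_correct * secure_given_correct

print(f"insecure among correct: {insecure_given_correct:.1%}")  # 82.8%
print(f"secure overall:         {secure_overall:.1%}")          # 10.5%
```

In other words, about one solution in ten was both working and secure.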

Source

SusVibes Benchmark · CMU, Columbia, Johns Hopkins (December 2025)

Suggested citation

SusVibes Benchmark via DevSecCode, "AI-generated code security failure rate: 82.8% of functionally correct LLM code is exploitable" (devseccode.com/research, 2026).

Real-World Disaster

The OpenClaw incident exposed 30,000+ systems through security holes in AI-generated code.

30,000+ systems exposed via AI-generated code

Summary

OpenClaw was a viral AI assistant project that reached 100,000 GitHub stars in two months. Subsequent security analysis found that AI-generated portions of the codebase contained catastrophic vulnerabilities that traditional scanners missed: a single-character password ('a') was accepted as valid authentication; services bound to 0.0.0.0:18789 exposed administrative interfaces to the public internet; the assistant returned API keys on request; and an allowInsecureAuth: true flag bypassed all authentication checks. By the time the issues were disclosed, external scanners had identified 30,000+ deployed instances.

Methodology

The number of affected systems was estimated from Shodan and Censys scans for the distinctive OpenClaw service banner on port 18789 during the January 2026 disclosure window. The vulnerability classes involved (CWE-521 weak password requirements, CWE-668 exposure of a resource to the wrong sphere, CWE-266 incorrect privilege assignment, CWE-200 exposure of sensitive information, CWE-78 OS command injection) are all deterministically detectable at the code level.
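To make "detectable at the code level" concrete, here is a minimal sketch of a configuration audit for three of the patterns above. It assumes the service config is a plain dict with hypothetical keys `host`, `port`, and `password` (only `allowInsecureAuth` and port 18789 come from the incident description), and the 12-character password minimum is an assumed policy, not something taken from the Bitsight report.

```python
from typing import Any

MIN_PASSWORD_LENGTH = 12  # assumed policy threshold, not from the incident report


def audit_config(config: dict[str, Any]) -> list[str]:
    """Return CWE-tagged findings for a hypothetical service config dict."""
    findings = []

    # CWE-668: listening on every interface exposes the service beyond localhost.
    if config.get("host") in ("0.0.0.0", "::"):
        findings.append(f"CWE-668: service bound to {config['host']}:{config.get('port')}")

    # CWE-521: trivially short passwords such as 'a' should never be accepted.
    password = config.get("password", "")
    if len(password) < MIN_PASSWORD_LENGTH:
        findings.append(f"CWE-521: password shorter than {MIN_PASSWORD_LENGTH} characters")

    # CWE-266 (as mapped on this page): an explicit opt-out of authentication checks.
    if config.get("allowInsecureAuth") is True:
        findings.append("CWE-266: allowInsecureAuth is true, authentication checks bypassed")

    return findings


# A config reproducing the reported OpenClaw defaults (key names are hypothetical).
exposed = {"host": "0.0.0.0", "port": 18789, "password": "a", "allowInsecureAuth": True}
for finding in audit_config(exposed):
    print(finding)
```

A real scanner would locate these values in source code and config files rather than in a pre-parsed dict, but the checks themselves are this deterministic.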

Source

Bitsight Security Research (January 2026)

Suggested citation

Bitsight Security Research via DevSecCode, "OpenClaw incident: 30,000+ systems exposed by AI-generated security holes" (devseccode.com/research, 2026).

Deva Coder Benchmark

Deva Coder v8 achieves 87.5% accuracy on SecurityEval CWE detection.

87.5% SecurityEval CWE detection accuracy

Summary

Deva Coder v8, the local security-focused coding model in the Deva model family, was benchmarked against the SecurityEval CWE detection suite. It achieved 87.5% accuracy on identifying and remediating CWE-categorized vulnerabilities, a 99.7% syntax pass rate on MBPP, 93.3% tool-use compliance, and a 100% fix-generation rate on the vulnerabilities it identified. The model runs locally on Apple Silicon or H200-class GPUs with no cloud calls.

Methodology

SecurityEval is an open benchmark covering ~75 CWE patterns across Python, JavaScript, TypeScript, Go, Java, and Ruby. Each task supplies vulnerable source code; the model must identify the CWE and produce a remediation. Accuracy is the percentage of tasks in which the model both identifies the correct CWE and produces a remediation that passes the security test suite. First-token latency was ~1.3 s on the H200; when run locally on Apple Silicon, the model generates code without any outbound network calls.
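The accuracy definition is strict: a task only counts if both conditions hold. A minimal sketch of that scoring rule follows, using a hypothetical `TaskResult` record and made-up sample results (not data from the v8 run).

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    expected_cwe: str       # ground-truth CWE for the task, e.g. "CWE-89"
    predicted_cwe: str      # CWE the model named
    fix_passes_tests: bool  # did the model's remediation pass the security tests?


def accuracy(results: list[TaskResult]) -> float:
    """Fraction of tasks where the CWE is correct AND the fix passes the security tests."""
    if not results:
        return 0.0
    hits = sum(
        1 for r in results
        if r.predicted_cwe == r.expected_cwe and r.fix_passes_tests
    )
    return hits / len(results)


results = [
    TaskResult("CWE-89", "CWE-89", True),   # correct CWE, working fix: counts
    TaskResult("CWE-78", "CWE-78", False),  # correct CWE, fix fails tests: does not count
    TaskResult("CWE-79", "CWE-89", True),   # wrong CWE: does not count
    TaskResult("CWE-22", "CWE-22", True),   # counts
]
print(f"{accuracy(results):.1%}")  # 50.0%
```

Scoring identification and remediation jointly means a model cannot earn credit for naming the right CWE while shipping a fix that still fails the security tests.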

Source

Deva Coder v8 benchmark · H200 GPU run, April 2026

Suggested citation

DevSecCode, "Deva Coder v8 SecurityEval results: 87.5% CWE detection accuracy" (devseccode.com/research, 2026).

Scanner Coverage

970+ CWE rules across 84 categories with AST + taint tracking.

970+ CWE rules built in

Summary

The Deva security scanner ships with 970+ CWE detection rules organized across 84 CWE categories. The rule pack includes 163 taint-mode rules, which track data flow from input sources (HTTP parameters, message bodies, file reads) to dangerous sinks (database queries, shell commands, DOM insertion), and 178 search-mode rules, which pattern-match insecure configurations and API usage. Rules are authored in YAML and are compatible with the Semgrep rule format.
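To illustrate what a taint-mode rule does, here is a toy intraprocedural taint tracker over Python source built on the standard-library `ast` module. It is not the Deva engine or Semgrep: the source and sink names are hypothetical examples, and it ignores control flow, sanitizers, cross-function flow, and never un-taints a variable.

```python
import ast
from typing import Optional

# Hypothetical source and sink lists, for illustration only.
SOURCES = {"request.args.get", "request.form.get", "input"}
SINKS = {"cursor.execute", "os.system", "subprocess.call"}


def dotted_name(node: ast.AST) -> Optional[str]:
    """Turn a Name/Attribute chain such as request.args.get into 'request.args.get'."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def is_tainted(expr: ast.AST, tainted: set) -> bool:
    """An expression is tainted if it calls a source or reads a tainted variable."""
    for sub in ast.walk(expr):
        if isinstance(sub, ast.Call) and dotted_name(sub.func) in SOURCES:
            return True
        if isinstance(sub, ast.Name) and sub.id in tainted:
            return True
    return False


class TaintTracker(ast.NodeVisitor):
    """Visits statements in source order, propagating taint through assignments."""

    def __init__(self) -> None:
        self.tainted: set = set()
        self.findings: list = []

    def visit_Assign(self, node: ast.Assign) -> None:
        self.generic_visit(node)  # inspect any calls on the right-hand side first
        if is_tainted(node.value, self.tainted):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    self.tainted.add(target.id)

    def visit_Call(self, node: ast.Call) -> None:
        # Only the first positional argument (the query or command string) is a sink,
        # so bound parameters in a parameterized query are not flagged.
        name = dotted_name(node.func)
        if name in SINKS and node.args and is_tainted(node.args[0], self.tainted):
            self.findings.append((node.lineno, name))
        self.generic_visit(node)


def find_taint_flows(source: str) -> list:
    tracker = TaintTracker()
    tracker.visit(ast.parse(source))
    return tracker.findings


VULNERABLE = '''
def handler(request, cursor):
    user_id = request.args.get("id")
    query = "SELECT * FROM users WHERE id = " + user_id
    cursor.execute(query)
'''

PARAMETERIZED = '''
def handler(request, cursor):
    user_id = request.args.get("id")
    cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
'''

print(find_taint_flows(VULNERABLE))     # [(5, 'cursor.execute')]
print(find_taint_flows(PARAMETERIZED))  # []
```

The difference from a search-mode rule is that the finding depends on where the value came from, not just on the shape of the `cursor.execute` call.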

Methodology

The rule pack is maintained against MITRE's CWE Top 25 and the OWASP Top 10 (2021 and 2025 draft). Each rule carries a CWE identifier, a severity rating, its language coverage, and its compliance framework mapping; rules ship in YAML and are loaded at scan time. The 84-category figure is the number of distinct CWEs covered by at least one rule.
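Because several rules can target the same CWE, the category count is always smaller than the rule count. A minimal sketch of that relationship, assuming a hypothetical in-memory `Rule` record with the four metadata fields listed above; the rule IDs and framework label are made up, and the real catalog is YAML rather than Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str            # hypothetical rule identifier
    cwe: str                # e.g. "CWE-89"
    severity: str           # e.g. "high"
    languages: tuple = ()   # language coverage
    frameworks: tuple = ()  # compliance frameworks this rule maps to


def category_count(rules) -> int:
    """Number of distinct CWEs covered by at least one rule."""
    return len({rule.cwe for rule in rules})


rules = [
    Rule("sql-injection-fstring", "CWE-89", "high", ("python",), ("NIST SP 800-53 Rev 5",)),
    Rule("sql-injection-concat", "CWE-89", "high", ("java",)),
    Rule("os-command-shell-true", "CWE-78", "critical", ("python",)),
]
print(len(rules), "rules across", category_count(rules), "CWE categories")  # 3 rules across 2 CWE categories
```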

Source

Deva Scanner Engine rule catalog · current as of 2026-05

Suggested citation

DevSecCode, "Deva security scanner rule catalog: 970+ rules across 84 CWE categories" (devseccode.com/research, 2026).

Supply Chain Surface

27,000+ CVE advisories cross-referenced against a metadata catalog of 2,800+ packages.

27K+ CVEs in the supply-chain catalog

Summary

Deva's SCA layer maintains a catalog of 27,000+ CVE advisories synced from the National Vulnerability Database (NVD), the GitHub Advisory Database (GHSA), and the Open Source Vulnerabilities database (OSV). The catalog is enriched with metadata for 2,800+ packages across npm, PyPI, RubyGems, Maven Central, Go modules, and crates.io. In air-gapped deployments, SCA runs locally against a periodically refreshed snapshot of the catalog and contacts no external services.
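The core offline step is matching installed dependency versions against affected ranges in the local snapshot. Below is a minimal sketch under stated assumptions: the snapshot format is a simplified, OSV-inspired `introduced`/`fixed` pair per advisory, the package `example-lib` and advisory `EXAMPLE-0001` are hypothetical, and version parsing is naive dotted integers (real ecosystems need their own ordering rules, such as PEP 440 or semver).

```python
import json

# Stand-in for a locally stored catalog snapshot. In an air-gapped deployment this
# would be read from disk (e.g. with json.load on an open file), never fetched.
SNAPSHOT_JSON = """
[
  {"id": "EXAMPLE-0001", "purl": "pkg:npm/example-lib",
   "introduced": "1.0.0", "fixed": "1.5.0"}
]
"""


def parse_version(v: str) -> tuple:
    """Naive dotted-integer parsing; not valid for pre-releases or ecosystem quirks."""
    return tuple(int(part) for part in v.split("."))


def match(dependencies: dict, advisories: list) -> list:
    """Return (advisory id, purl, version) for each dependency inside an affected range."""
    hits = []
    for adv in advisories:
        version = dependencies.get(adv["purl"])
        if version is None:
            continue  # this package is not in the project's dependency list
        v = parse_version(version)
        # Affected if introduced <= version < fixed.
        if parse_version(adv["introduced"]) <= v < parse_version(adv["fixed"]):
            hits.append((adv["id"], adv["purl"], version))
    return hits


advisories = json.loads(SNAPSHOT_JSON)
deps = {"pkg:npm/example-lib": "1.4.2", "pkg:pypi/other-lib": "3.0.0"}
print(match(deps, advisories))  # [('EXAMPLE-0001', 'pkg:npm/example-lib', '1.4.2')]
```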

Methodology

CVE advisories are de-duplicated across NVD, GHSA, and OSV using purl (package URL) identifiers. Package metadata (download counts, last-published date, maintainer counts, repository linkage) is sourced from the native registry APIs. The 27,000+ figure is the number of distinct advisories, after de-duplication, that affect at least one package in the catalog as of May 2026.
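The same vulnerability usually appears once per feed, so counting raw records would inflate the total. A minimal sketch of the de-duplication and counting step, assuming each record has already been normalized to purls (NVD natively identifies products with CPE names, not purls) and that a CVE alias, when present, is the canonical ID. All advisory IDs and package names are made up.

```python
def canonical_id(record: dict) -> str:
    """Prefer a CVE ID among the record's own ID and its aliases."""
    for candidate in [record["id"], *record.get("aliases", [])]:
        if candidate.startswith("CVE-"):
            return candidate
    return record["id"]


def package_key(purl: str) -> str:
    """Naively drop '@version' (and anything after it) so versioned and bare purls match."""
    return purl.split("@", 1)[0]


def distinct_advisories(records: list, catalog: set) -> set:
    """Canonical advisory IDs that affect at least one catalog package."""
    ids = set()
    for record in records:
        if any(package_key(p) in catalog for p in record["purls"]):
            ids.add(canonical_id(record))
    return ids


catalog = {"pkg:npm/example-lib", "pkg:pypi/example-py"}
records = [
    # One made-up vulnerability as reported by three feeds.
    {"source": "NVD", "id": "CVE-2099-0001", "purls": ["pkg:npm/example-lib"]},
    {"source": "GHSA", "id": "GHSA-aaaa-bbbb-cccc", "aliases": ["CVE-2099-0001"],
     "purls": ["pkg:npm/example-lib@1.4.2"]},
    {"source": "OSV", "id": "GHSA-aaaa-bbbb-cccc", "aliases": ["CVE-2099-0001"],
     "purls": ["pkg:npm/example-lib"]},
    # An advisory with no CVE alias, and one for a package outside the catalog.
    {"source": "OSV", "id": "PYSEC-2099-1", "purls": ["pkg:pypi/example-py@2.0.0"]},
    {"source": "NVD", "id": "CVE-2099-0002", "purls": ["pkg:npm/not-in-catalog"]},
]
print(sorted(distinct_advisories(records, catalog)))  # ['CVE-2099-0001', 'PYSEC-2099-1']
```

Five raw records collapse to two counted advisories: three feeds agree on one vulnerability, and one record falls outside the catalog.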

Source

Deva supply-chain catalog · synced from NVD, GHSA, OSV

Suggested citation

DevSecCode, "Deva supply-chain catalog: 27,000+ CVEs across 2,800+ packages" (devseccode.com/research, 2026).

Compliance Coverage

17 compliance frameworks mapped at the code level, with 6 export formats.

17 compliance frameworks mapped

Summary

Deva's compliance engine maps every CWE finding to the relevant controls of 17 compliance frameworks, including HIPAA, PCI-DSS v4.0, SOC 2 Type II, CMMC 2.0 (Levels 1 through 3), NIST SP 800-53 Rev 5, NIST CSF 2.0, FedRAMP (Low, Moderate, High), GDPR, SOX ITGC, OWASP Top 10 (2021 and 2025 draft), CIS Controls v8, ISO/IEC 27001, NIST SP 800-171 Rev 2, and FISMA. Findings export in six formats: SARIF, OSCAL, JUnit XML, CSV, JSON, and an agent-json format consumable by downstream AI agents.
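SARIF 2.1.0 lets every result carry an arbitrary property bag, which is one natural place for compliance metadata. The sketch below builds a minimal SARIF log from a finding; it assumes a hypothetical finding dict and a hypothetical `compliance` key inside `properties`, and it does not reproduce Deva's actual export schema. The SI-10 control (Information Input Validation) is used only as an illustrative mapping.

```python
import json


def to_sarif(findings: list, tool_name: str = "example-scanner") -> dict:
    """Build a minimal SARIF 2.1.0 log; compliance data goes in each result's property bag."""
    rule_ids = sorted({item["cwe"] for item in findings})
    return {
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": tool_name, "rules": [{"id": r} for r in rule_ids]}},
            "results": [
                {
                    "ruleId": item["cwe"],
                    "level": "error",
                    "message": {"text": item["message"]},
                    "locations": [{"physicalLocation": {
                        "artifactLocation": {"uri": item["file"]},
                        "region": {"startLine": item["line"]},
                    }}],
                    # Hypothetical key: SARIF permits arbitrary data in "properties".
                    "properties": {"compliance": item["controls"]},
                }
                for item in findings
            ],
        }],
    }


findings = [{
    "cwe": "CWE-89",
    "message": "User input reaches a SQL query string",
    "file": "app/handlers.py",
    "line": 42,
    "controls": {"NIST SP 800-53 Rev 5": ["SI-10"]},
}]
log = to_sarif(findings)
print(json.dumps(log, indent=2))
```

Keeping the mapping inside the standard property bag means SARIF consumers that ignore it still parse the log normally.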

Methodology

Each compliance framework's controls are mapped to specific CWE rules via a many-to-many mapping table maintained by the Deva Security Team. The mapping is bidirectional: a finding shows which controls it violates, and a framework view shows which controls have passing, failing, or attestation-required status. SARIF and OSCAL exports include the compliance metadata so downstream tools (audit evidence platforms, SIEMs, GRC systems) can consume the data directly.
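The bidirectional view falls out of storing the mapping as rows and indexing them both ways. A minimal sketch, assuming a hypothetical three-row mapping table and treating any control with no mapped CWE as attestation-required; the control IDs are real NIST SP 800-53 Rev 5 identifiers (SI-10 Information Input Validation, IA-5 Authenticator Management, AT-2 Literacy Training and Awareness), but the mapping itself is illustrative, not Deva's.

```python
from collections import defaultdict

# Hypothetical mapping rows: (CWE, framework, control). Real tables are far larger.
MAPPING = [
    ("CWE-89", "NIST SP 800-53 Rev 5", "SI-10"),   # SQL injection -> input validation
    ("CWE-78", "NIST SP 800-53 Rev 5", "SI-10"),   # OS command injection -> input validation
    ("CWE-521", "NIST SP 800-53 Rev 5", "IA-5"),   # weak passwords -> authenticator management
]
# Controls in scope for the framework view; AT-2 (training) has no code-level rule.
FRAMEWORK_CONTROLS = {"NIST SP 800-53 Rev 5": ["SI-10", "IA-5", "AT-2"]}

# Index the many-to-many table in both directions.
by_cwe = defaultdict(set)      # CWE -> {(framework, control)} it can violate
by_control = defaultdict(set)  # (framework, control) -> {CWE} that can fail it
for cwe, framework, control in MAPPING:
    by_cwe[cwe].add((framework, control))
    by_control[(framework, control)].add(cwe)


def control_status(framework: str, finding_cwes: set) -> dict:
    """Framework view: mark each control failing, passing, or attestation-required."""
    status = {}
    for control in FRAMEWORK_CONTROLS[framework]:
        mapped = by_control.get((framework, control), set())
        if not mapped:
            status[control] = "attestation-required"  # no code-level rule can verify it
        elif mapped & finding_cwes:
            status[control] = "failing"
        else:
            status[control] = "passing"
    return status


print(sorted(by_cwe["CWE-89"]))                            # finding view
print(control_status("NIST SP 800-53 Rev 5", {"CWE-89"}))  # framework view
```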

Source

Deva compliance engine · 17 frameworks shipping as of 2026-05

Suggested citation

DevSecCode, "Deva compliance engine: 17 frameworks mapped at code level" (devseccode.com/research, 2026).

Citation policy: Findings on this page are intended for use as references in academic, industry, and journalistic work. Each item lists its source and a suggested citation string. If a finding cites an external source (SusVibes Benchmark, Bitsight Security Research), also follow that source's own citation policy. Each finding has a direct anchor link (for example, /research#susvibes-ai-code-insecure).