AI Developer Digest

Sat, May 23, 2026

9 signals that cleared the gate · 20 scanned · 18 min read

The Signal — start here

A genuinely light 24-hour window following last week's Google I/O wave. Two threads dominate: Anthropic published its first Project Glasswing quantitative results on May 22 — 10,000+ vulnerabilities found across critical software using Claude Mythos Preview in one month, with access expanded to 90+ organizations and partners now explicitly permitted to publicly disclose findings; and llama.cpp continued its multi-backend hardware sprint (b9286–b9297), with NVFP4 MTP scale tensor support landing for Qwen3.5 (May 23, b9297), ZenDNN Q8_0 support for AMD CPU inference (b9286), and SYCL MoE prefill throughput improvements (b9291). No new model releases or API changes from any major lab in this window. The most urgent item in the entire digest remains the Gemini Interactions API default switch firing in 3 days (May 26).

Must-reads today

Glasswing Initial Update — Anthropic's Mythos Preview is finding vulns at scale (10k+ high/critical, 10× faster than human testers per Cloudflare); partners can now publicly disclose findings; 40+ new orgs just gained access — OSS maintainers should monitor their security disclosure inboxes

⚠️ Gemini Interactions API: 3 DAYS — outputs → steps default switch fires May 26; code not migrated will silently parse wrong response structures

Breaking Changes

No breaking changes this period.

⚠️ URGENT — 3 DAYS: Gemini Interactions API outputs → steps default switch fires May 26, 2026. Legacy schema removed June 8. See Worth Watching section and May 17–22 digests for full migration steps.

Model Releases

Nothing new in this scan window. Last major releases: Gemini 3.5 Flash and Cohere Command A+ (May 19–20, covered in May 21 digest); Claude Opus 4.7 (April 16); GPT-5.5 (April 23).

API & SDK Changes

Nothing new in this scan window. Last Anthropic release notes entry: May 19, 2026 (MCP tunnels research preview, Managed Agents self-hosted sandboxes). Last Google AI changelog entry: May 19, 2026 (gemini-3.5-flash GA). No OpenAI API changes visible in May 22–23 window.

Research

Nothing cleared the quality bar this period. arXiv cs.CL and cs.AI direct listing pages for May 23 returned 403 errors. Search queries surfaced papers from February–May 2026 (ISO-Bench 2602.19594, Mem0 May 20) but none from recognized labs with associated code repos specifically published May 22–23 within the scan window. HuggingFace Papers Daily returned 403 at fetch time.

Tooling

Notable

llama.cpp b9286–b9297 — Multi-Backend Sprint: NVFP4 MTP Lands, SYCL MoE Throughput Improved, ZenDNN Q8_0 Added

What changed

Nine builds (b9286–b9297) extending the multi-backend hardware sprint begun May 17: NVFP4 MTP scale tensor support with Qwen3.5-specific tensor linking and pointer alignment (b9297, May 23); SYCL MoE prefill throughput improvement via contiguous mapping and counting sort (b9291); OpenCL Adreno MoE kernel generalization across all M-series mobile Snapdragon GPUs (b9294); ZenDNN Q8_0 quantization support for AMD CPU inference (b9286); Vulkan SPIRV-Headers Windows find_package fix (b9295); SYCL Level Zero centralized GPU detection (b9290); perplexity integer overflow fix (b9292). All on top of b9272–b9285 covered in the May 22 digest.
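To make the "contiguous mapping and counting sort" idea concrete, here is a minimal Python sketch of grouping tokens by their routed expert so that each expert's tokens occupy one contiguous slice. It illustrates the general technique only; it is not the llama.cpp SYCL kernel, and the function name and inputs are hypothetical.

```python
from typing import List, Tuple


def group_tokens_by_expert(expert_ids: List[int], num_experts: int) -> Tuple[List[int], List[int]]:
    """Counting sort of token indices by expert id (illustrative only).

    Returns (order, offsets). Tokens routed to expert e are
    order[offsets[e]:offsets[e + 1]], i.e. one contiguous slice per expert.
    """
    # Pass 1: count how many tokens are routed to each expert.
    counts = [0] * num_experts
    for e in expert_ids:
        counts[e] += 1

    # Exclusive prefix sum: start offset of each expert's slice.
    offsets = [0] * (num_experts + 1)
    for e in range(num_experts):
        offsets[e + 1] = offsets[e] + counts[e]

    # Pass 2: scatter token indices into their expert's slice (stable order).
    cursor = offsets[:-1].copy()
    order = [0] * len(expert_ids)
    for token, e in enumerate(expert_ids):
        order[cursor[e]] = token
        cursor[e] += 1
    return order, offsets


# Five tokens routed to three experts (hypothetical routing).
order, offsets = group_tokens_by_expert([2, 0, 2, 1, 0], num_experts=3)
print(order, offsets)
```

Because the sort is two linear passes with no comparisons, it suits the case where the number of experts is small relative to the number of tokens in a prefill batch.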

TL;DR

NVFP4 quantization now correctly handles MTP (Multi-Token Prediction) scale tensors for Qwen3.5, resolving a functional incorrectness risk documented in Discussion #22042 and improving perplexity from ~11.65 to ~11.60; SYCL gets faster MoE prefill via counting sort; AMD CPUs get Q8_0 support via ZenDNN; Snapdragon GPUs get generalized Adreno MoE kernels.

Developer signal

For teams running NVFP4 Qwen3.5 models with MTP speculative decoding in llama.cpp: b9297 is the build to target. Prior builds had a functional incorrectness risk where NVFP4 scale tensor separation was unclear (Discussion #22042), which manifested as 0% draft acceptance in MTP speculative decoding. The b9297 fix correctly links MTP heads to NVFP4 scale tensors; the perplexity improvement to ~11.60 is modest but confirms the fix is real and measurable.

For SYCL users (Intel Arc, Intel Data Center GPU Max) running MoE models: b9291's counting sort + contiguous mapping approach reduces prefill latency for batch workloads. Update and benchmark your prefill throughput.

For AMD CPU inference with ZenDNN backend: b9286 adds Q8_0 quantization, aligning ZenDNN quantization support with the main CUDA/Metal backends.

For Snapdragon/Adreno GPU inference: b9294 generalizes the Adreno MoE kernel across all M-series mobile SoCs rather than a single hardware variant. Pull b9294+ to pick this up automatically.

Caution on pinning strategy: The sprint cadence (9 builds in ~24 hours on top of ~30 in the prior week) continues to create versioning pressure for Docker image pipelines pinned to specific builds. If you pin builds, consider a weekly pin policy or switching to latest during this sprint phase, then re-pinning to a stable build once the sprint settles.
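If you do pin builds, a small check like the following can flag when a pinned tag predates a change you depend on. This is a sketch: the build numbers come from this digest, while the feature labels and function names are hypothetical and not part of llama.cpp.

```python
import re
from typing import Iterable, List

# Minimum build per change, as listed in this digest. Keys are hypothetical labels.
MIN_BUILD = {
    "nvfp4_mtp_qwen35": 9297,
    "opencl_adreno_moe": 9294,
    "sycl_moe_prefill": 9291,
    "zendnn_q8_0": 9286,
}


def parse_build(tag: str) -> int:
    """Turn a release tag such as 'b9297' into the integer 9297."""
    match = re.fullmatch(r"b(\d+)", tag.strip())
    if match is None:
        raise ValueError(f"unrecognized build tag: {tag!r}")
    return int(match.group(1))


def missing_features(pinned_tag: str, needed: Iterable[str]) -> List[str]:
    """Return the needed features that the pinned build is too old to include."""
    build = parse_build(pinned_tag)
    return sorted(name for name in needed if build < MIN_BUILD[name])


# A pipeline pinned to b9290 that needs ZenDNN Q8_0 and the NVFP4 MTP fix.
print(missing_features("b9290", ["zendnn_q8_0", "nvfp4_mtp_qwen35"]))
```

Running a check of this kind in CI before building an image is one way to notice a stale pin without switching to latest.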

Affects you if: You run NVFP4 Qwen3.5 with MTP speculative decoding (b9297 fixes a blocking correctness bug); you use SYCL/Intel GPU for MoE model inference; you run AMD CPU inference via ZenDNN; you deploy llama.cpp on Snapdragon/Adreno mobile GPUs.

Effort: Quick (pull latest build; validate perplexity and draft acceptance rate if using NVFP4+MTP; no config changes required).

ggml-org/llama.cpp (GitHub) | Dates: May 22–23, 2026 | Link: https://github.com/ggml-org/llama.cpp/releases/tag/b9297

Benchmarks & Leaderboards

No new leaderboard entries confirmed in the May 22–23 scan window. LMArena returned 403 at direct fetch; the SWE-bench Verified direct page was unavailable. Context from prior scans and this window's search results follows. SWE-bench Verified leaderboard as of early May 2026: Claude Mythos Preview at 93.9% (top), GPT-5.5 at 88.7% (OpenAI-reported, April 23), Claude Opus 4.7 (Adaptive) at 87.6%. SWE-bench Pro: Claude Opus 4.7 at 64.3% (#1), GPT-5.5 at 58.6% (#2), Gemini 3.5 Flash at 55.1% (covered in May 21–22 digests). LMArena: gemini-3.5-flash added to text and code leaderboards May 19; stable Elo not yet confirmed in this scan window. No new entries or movement to report.

Trends & Emerging Tech

NVFP4 Quantization Becoming First-Class in llama.cpp

What's happening

The b9297 NVFP4 MTP scale tensor fix marks the resolution of the primary blocker for NVFP4 deployment in production llama.cpp setups. NVFP4 development began in llama.cpp in late March–April 2026. The outstanding correctness issue (unclear separation of concerns around scale tensor attachment) caused 0% draft acceptance when paired with MTP speculative decoding. With b9297, Qwen3.5 NVFP4 + MTP is functionally correct and measurably better than the unresolved version. NVFP4 targets NVIDIA Blackwell-generation hardware (for example B100; H100 and H200 are Hopper-generation parts) and is intended to offer 4-bit density with higher fidelity than GGUF Q4_K_M on supported hardware.

Why watch this

If the NVFP4 + MTP fix generalizes to Llama-family models beyond Qwen3.5 (the current test case), NVFP4 could become the standard local inference quantization for teams with NVIDIA Blackwell-class hardware. The key open data point is a head-to-head NVFP4 vs. MXFP4 vs. Q4_K_M quality benchmark across a broader model set; that comparison hasn't been published yet. Watch for that benchmark in the next 1–2 weeks; it will determine whether NVFP4 is a meaningful improvement or a marginal one relative to existing quantization options.

ggml-org/llama.cpp GitHub (Discussions #22042, #20711) | Date: May 23, 2026 | Link: https://github.com/ggml-org/llama.cpp/discussions/22042

Technical Discussions

Medium

Project Glasswing: Initial Update — 10,000+ Vulnerabilities Found, 90+ Organizations Now Active, Partners Cleared to Publicly Disclose Findings

What changed

One month after Project Glasswing launched (April 7, 2026 invitation-only), Anthropic published the first quantitative results and significantly expanded the initiative: 40+ additional organizations gained Mythos Preview access (total ~90+ organizations); partners are now explicitly permitted to publicly disclose Mythos-generated findings to security teams, industry organizations, regulators, government agencies, OSS maintainers, media, and the public (subject to responsible disclosure standards — previously partners operated under confidentiality); Anthropic committed $100M in Mythos Preview usage credits and $4M in direct donations to open-source security organizations.

TL;DR

Claude Mythos Preview has found 10,000+ high/critical-severity vulnerabilities across ~50 partners' critical software in one month at a false positive rate better than human testers (Cloudflare's assessment); 6,202 high/critical vulnerabilities identified in 1,000+ open-source projects; partner access expanding from ~50 to ~90+ organizations.

Developer signal

Three concrete developer signals depending on your context:

(1) OSS maintainers: This is the most immediate action item. Mythos has identified ~6,200 high/critical vulnerabilities in 1,000+ open-source projects, and as of May 22, Glasswing partners are explicitly permitted to disclose these findings publicly through normal security channels. You may begin receiving vulnerability reports attributed to Glasswing/Mythos Preview scanning from partner security organizations, so check your project's security disclosure inbox and SECURITY.md contact. Standard 90-day responsible disclosure timelines apply per partner disclosure agreements, so if you haven't received anything yet, reports may be in the pipeline.

(2) Enterprise developers using Claude Security (Claude Enterprise): The Cloudflare data is the first published third-party calibration of Mythos Preview's false positive rate in a production security context. Cloudflare reports 2,000 bugs found at an FP rate "better than human testers"; this is an external operator claim, not Anthropic self-reporting. The 90.6% true positive rate across 1,752 independently reviewed findings also supports this signal. If you're evaluating Claude Security for your organization, use these numbers as calibration baselines for comparing against your current toolchain's FP rates.

(3) Security tooling builders: The Glasswing false positive rate data (Cloudflare: better than human testers; Anthropic independent review: 90.6% TP rate) is a published benchmark for what AI-assisted vulnerability scanning at scale can achieve. Compare against your tool's current TP/FP metrics before positioning against Mythos-tier approaches.
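As a worked example of using the published figures as a calibration baseline, the sketch below converts the 90.6% rate over 1,752 reviewed findings into implied counts and compares it with a tool's own triage results. It assumes the digest's "true positive rate" means the share of reviewed findings confirmed as valid (precision); the in-house numbers are made up for illustration.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of reported findings that were confirmed as real issues."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no findings to evaluate")
    return true_positives / total


# Figures from the digest: 1,752 findings reviewed, 90.6% confirmed.
REVIEWED = 1752
REPORTED_RATE = 0.906
implied_tp = round(REVIEWED * REPORTED_RATE)  # implied confirmed findings
implied_fp = REVIEWED - implied_tp            # implied false positives

# Hypothetical triage results for an in-house scanner (not real data).
own_tp, own_fp = 420, 180

print(f"baseline: {implied_tp} confirmed, {implied_fp} rejected")
print(f"own tool precision: {precision(own_tp, own_fp):.3f} vs baseline {REPORTED_RATE:.3f}")
```

Precision alone says nothing about missed vulnerabilities, so a fair comparison would also need a recall estimate on a shared target set.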

Affects you if: You maintain open-source software (your project may have Mythos-generated reports incoming via partners); you are a Claude Enterprise customer using Claude Security; you build or evaluate AI-assisted vulnerability scanning tools.

Effort: Moderate (Claude Security via Claude Enterprise; Glasswing partner program requires application to Anthropic; for OSS maintainers, no action required beyond monitoring your disclosure inbox and keeping SECURITY.md current).

Anthropic Research | Date: May 22, 2026 | Link: https://www.anthropic.com/research/glasswing-initial-update

Quick Hits

llama.cpp b9297 (May 23) — NVFP4 MTP scale tensor fix: adds Qwen3.5 MTP tensor linking and pointer alignment; resolves functional incorrectness risk (Discussion #22042); perplexity improves from ~11.65 to ~11.60. [https://github.com/ggml-org/llama.cpp/releases/tag/b9297]
llama.cpp b9291 (May 22) — SYCL MoE prefill throughput: contiguous mapping + counting sort for Intel GPU; relevant for batch prefill on Intel Arc and Intel Data Center GPU Max. [https://github.com/ggml-org/llama.cpp/releases/tag/b9291]
llama.cpp b9294 (May 22) — OpenCL Adreno MoE kernel generalized across all M-series Snapdragon SoCs; previously variant-specific, now portable. [https://github.com/ggml-org/llama.cpp/releases/tag/b9294]
llama.cpp b9286 (May 22) — ZenDNN backend adds Q8_0 quantization support; AMD CPU inference via ZenDNN now supports Q8_0, as the CUDA/Metal backends already do. [https://github.com/ggml-org/llama.cpp/releases/tag/b9286]
llama.cpp b9295 (May 22) — Vulkan: fixes find_package(SPIRV-Headers) on Windows; affects cross-compilation setups for Vulkan-targeted Windows deployments. [https://github.com/ggml-org/llama.cpp/releases/tag/b9295]
llama.cpp b9292 (May 22) — Perplexity tool: fixes integer overflow in token count accumulation; affects long-document perplexity measurements. [https://github.com/ggml-org/llama.cpp/releases/tag/b9292]

Worth Watching (Announced, Not Yet Shipped)

⚠️⚠️ Gemini Interactions API `outputs` → `steps` — Default Switch May 26 (3 DAYS)

(Carried from May 17–22 digests — CRITICAL: deadline is 3 days away)

Default schema switch fires May 26; legacy schema permanently removed June 8. Python SDK ≥2.0.0 and JS SDK ≥2.0.0 auto-opt into new schema, but response-parsing code reading response.outputs must be updated to iterate response.steps filtered by step.type. Multi-turn history management must also be updated. Apps not migrated will silently parse incorrect response structures from May 26. See May 17 digest for full migration steps.
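The parsing change described above can be sketched as follows. The `Step` and `Response` classes are hypothetical stand-ins for SDK objects, and the type names `"text"` and `"thought"` are placeholders; only `response.steps` and `step.type` come from this digest, so check the migration guide for the real field and type names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """Hypothetical stand-in for one entry in response.steps."""
    type: str
    text: str = ""


@dataclass
class Response:
    """Hypothetical stand-in for an SDK response object."""
    steps: List[Step] = field(default_factory=list)


def text_from_steps(response, wanted_type: str = "text") -> str:
    """Concatenate text from steps of one type ("text" is a placeholder name)."""
    steps = getattr(response, "steps", None)
    if steps is None:
        # Fail loudly instead of silently parsing the wrong structure.
        raise ValueError("response has no 'steps'; is this the legacy schema?")
    return "".join(step.text for step in steps if step.type == wanted_type)


demo = Response(steps=[Step("thought", "plan"), Step("text", "Hello "), Step("text", "world")])
print(text_from_steps(demo))
```

The explicit check for a missing `steps` attribute is a design choice: it turns the silent mis-parse the digest warns about into an immediate error during testing.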

Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/interactions-breaking-changes-may-2026

⚠️ Claude Sonnet 4 + Opus 4 — Retirement June 15 (23 days) — NEWLY HIGHLIGHTED

(Announced April 14, 2026 — now surfaced with 23 days remaining)

claude-sonnet-4-20250514 and claude-opus-4-20250514 will return errors starting June 15, 2026. There is no automatic failover: calls to these IDs simply fail. Migration: Sonnet 4 → claude-sonnet-4-6-20260217; Opus 4 → claude-opus-4-7-20260416. Note: Opus 4.7 has breaking changes versus Opus 4.6; see the migration guide at /docs/en/about-claude/models/migration-guide#migrating-to-claude-opus-4-7 before upgrading. Sonnet 4.6 includes the 1M token context window (GA) and improved agentic search. If you are still using the claude-sonnet-4-20250514 or claude-opus-4-20250514 model IDs anywhere in your stack, migrate now.
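A quick way to find remaining references is to scan configuration or source text for the retiring IDs. The sketch below uses the ID mapping from this digest; the helper names are hypothetical, and given the Opus 4.7 breaking changes a find-then-review workflow is likely safer than a blind replace.

```python
from typing import List

# Retiring model ID -> replacement, as listed in this digest.
REPLACEMENTS = {
    "claude-sonnet-4-20250514": "claude-sonnet-4-6-20260217",
    "claude-opus-4-20250514": "claude-opus-4-7-20260416",
}


def find_retired_ids(text: str) -> List[str]:
    """Return the retiring model IDs that appear in the given text."""
    return sorted(old for old in REPLACEMENTS if old in text)


def migrate_ids(text: str) -> str:
    """Swap retiring IDs for their replacements (review Opus changes first)."""
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    return text


config = 'model = "claude-opus-4-20250514"  # summarizer'
print(find_retired_ids(config))
print(migrate_ids(config))
```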

Anthropic | Link: https://platform.claude.com/docs/en/about-claude/model-deprecations

⚠️ Gemini 2.0 Flash + 2.0 Flash Lite — Shutdown June 1 (9 days)

(Carried from May 21–22 digests)

gemini-2.0-flash and gemini-2.0-flash-lite return errors on June 1, 2026. Migration: gemini-2.5-flash ($0.30 input / $2.50 output per MTok) or gemini-2.5-flash-lite ($0.10 input / $0.40 output per MTok, identical pricing to 2.0 Flash).
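To compare the two migration targets on cost, a rough estimate can be computed from the prices above. This sketch assumes the paired figures are input and output prices per million tokens; the monthly token volumes are hypothetical.

```python
# (input, output) price in USD per million tokens, from this digest.
PRICES = {
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-2.5-flash-lite": (0.10, 0.40),
}


def estimated_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for the given token volumes on one model."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical monthly workload: 500M input tokens, 50M output tokens.
for name in PRICES:
    print(name, round(estimated_cost(name, 500_000_000, 50_000_000), 2))
```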

Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/deprecations

Gemini API Unrestricted Key Deadline — June 19

(Carried from May 21–22 digests)

All unrestricted Gemini API keys blocked June 19. Restrict via AI Studio → API Keys → "Restrict to Gemini API" (one-click action).

Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/api-key

Ollama v0.30.0 — Still Pre-Release (rc23 as of May 13)

(Carried from May 15 digest)

v0.30.0 restructures Ollama to use llama.cpp directly as backend, with MLX for Apple Silicon inference. No stable GA date announced.

Ollama (GitHub) | Link: https://github.com/ollama/ollama/releases

Gemini 3.5 Pro — Expected ~June 2026

(Carried from May 21–22 digests)

Confirmed in internal testing at Gemini 3.5 Flash launch (May 19). No model ID, pricing, or benchmarks disclosed.

Google (Google I/O 2026) | Link: https://blog.google/innovation-and-ai/technology/ai/google-io-2026-all-our-announcements/

Filtered from 30+ primary sources against a published quality rubric. No press releases, no fluff — only what changes what you build.
