AI Developer Digest
1 item passed quality gate | 22 scanned | 21 excluded | Sources checked: 28
Scan window: May 24 (after the prior scan cutoff, ~20:00 UTC) through May 25, 2026.
The May 24 digest covered: llama.cpp b9298–b9305 (cmake -fPIC build fix, prior maintenance builds); Gemini Interactions API outputs → steps default switch (May 26, tomorrow); GitHub Copilot metered billing (June 1); Gemini 2.0 Flash/Lite shutdown (June 1).
This Week's Signal
The May 25 window is the fourth consecutive near-zero day following the Google I/O 2026 release wave. The single new item passing the quality gate is LiteLLM v1.87.0-rc.1, a release candidate that adds Microsoft Purview DLP guardrails, Granian ASGI server support, weighted-routing failover, and OTel GenAI semantic conventions to the AI gateway, an early sign of enterprise compliance features entering open-source AI proxy infrastructure. Google also shut down gemini-3.1-flash-lite-preview today; the GA replacement (gemini-3.1-flash-lite) has been available since May 7. The most time-critical item for developers remains the Gemini Interactions API schema change, which takes effect tomorrow (May 26): code reading response.outputs will silently parse the wrong response structure, with no exception raised.
Must-reads this digest:
- ⚠️⚠️⚠️ Gemini Interactions API: FIRES TOMORROW (May 26). The outputs → steps default switch goes live in less than 24 hours; unprepared code silently parses incorrect response structures with no error raised. If you haven't migrated, stop and do it now.
- LiteLLM v1.87.0-rc.1: the Microsoft Purview DLP guardrail integration is the first enterprise-grade compliance feature to land in an open-source AI gateway, and it signals a new surface area in the AI tooling stack.
[BREAKING] Breaking Changes
No breaking changes shipped in this window.
⚠️⚠️⚠️ CRITICAL — TOMORROW: Gemini Interactions API outputs → steps default switch fires May 26, 2026 — less than 24 hours from now. Code not migrated will silently parse wrong response structures from May 26 with no exception raised — this is a silent data corruption failure, not a crash. Legacy schema permanently removed June 8. See Worth Watching section for migration steps.
Model Releases
Nothing new in this scan window. Last major releases: Gemini 3.5 Flash and Cohere Command A+ (May 19–20, covered in May 21 digest); Claude Opus 4.7 (April 16); GPT-5.5 (April 23).
API & SDK Changes
Nothing new in this scan window. Anthropic Platform most recent entry: May 19, 2026 (MCP tunnels research preview, self-hosted sandboxes, agent MCP config update, large output spill). OpenAI Platform most recent entries: May 12, 2026 (DALL-E 2/3 deprecation, Realtime API Beta removal; GPT-Realtime-2 + Realtime Translate + Realtime Whisper launched May 7, covered in prior digests). No new Google AI changelog entries confirmed for May 24–25. Anthropic SDK Python most recent: v0.104.1, May 22.
Research
Nothing cleared the quality bar this period. arXiv cs.CL and cs.AI listing pages returned 403 at fetch time. HuggingFace Papers Daily also returned 403. Two search-surfaced papers on agent skills (SkillOpt, "From Raw Experience to Skill Consumption") did not confirm recognized-lab authorship or associated code repositories in search snippets — excluded below quality gate.
Tooling
[NOTABLE] LiteLLM v1.87.0-rc.1 — Enterprise Compliance and Infrastructure Overhaul
Source: LiteLLM (GitHub) | Date: May 24, 2026 | Link: https://github.com/BerriAI/litellm/releases/tag/v1.87.0-rc.1
What changed: Added Microsoft Purview DLP guardrail integration, Granian ASGI web server support, a weighted-routing failover mechanism, OTel GenAI semantic conventions, and Anthropic streaming performance improvements; split the gateway, UI backend, and UI into separately deployable components.
TL;DR: LiteLLM's first enterprise compliance guardrail lands in v1.87.0-rc.1. Microsoft Purview DLP runs as an inspection layer in the proxy and blocks sensitive data in AI requests before they leave your perimeter.
Developer signal: If you're operating LiteLLM as a production AI gateway with data sensitivity requirements, the Purview DLP integration lets you enforce data-loss-prevention policies on AI traffic without modifying application code: configure once at the gateway, inspect all upstream requests. The Granian ASGI backend is an opt-in alternative to the default server and is reported to improve throughput stability under concurrent load; benchmark against your traffic profile before switching production. OTel GenAI semantic conventions support standardizes how LiteLLM emits observability data so that it follows the emerging OpenTelemetry GenAI spec. Span attributes for model, provider, token counts, and latency will then align with other OTel-instrumented services, which is useful if you're building unified dashboards. The weighted-routing failover mechanism lets you assign relative weights to model endpoints so that failover distributes probabilistically rather than sequentially. This is a release candidate, so validate in staging before promoting.
Affects you if: You are running LiteLLM as a production AI gateway with compliance requirements, handling sensitive data through AI APIs, want Anthropic streaming latency improvements, or are building OpenTelemetry-based observability for AI workloads.
Adoption effort: Moderate (RC, so test before production; Granian server requires explicit opt-in; DLP integration requires Purview configuration)
Primary source: https://github.com/BerriAI/litellm/releases/tag/v1.87.0-rc.1
Quality gate score: 9 (official release +3, concrete code changes +2, GitHub primary source +2, within window +1, technical audience +1)
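To make "failover distributes probabilistically rather than sequentially" concrete, here is a minimal sketch of the general technique in plain Python. It is not LiteLLM's implementation or configuration format; the endpoint names, the `send` callback, and both function names are hypothetical and exist only for illustration.

```python
import random


def pick_endpoint(weights, healthy, rng=random):
    """Pick one healthy endpoint with probability proportional to its weight."""
    candidates = [name for name in weights if name in healthy]
    return rng.choices(candidates, weights=[weights[n] for n in candidates], k=1)[0]


def call_with_failover(weights, send, rng=random):
    """Try endpoints in weighted-random order until one call succeeds.

    `weights` maps endpoint name -> positive relative weight (hypothetical names).
    `send` is a caller-supplied function that raises on failure.
    """
    healthy = {name for name, weight in weights.items() if weight > 0}
    failed = []
    while healthy:
        name = pick_endpoint(weights, healthy, rng)
        try:
            return name, send(name)
        except Exception:
            # Drop the failed endpoint and redraw among the remaining ones.
            failed.append(name)
            healthy.discard(name)
    raise RuntimeError(f"all endpoints failed: {sorted(failed)}")


if __name__ == "__main__":
    demo_weights = {"primary": 8, "secondary": 2}

    def flaky_send(name):
        if name == "primary":
            raise ConnectionError("primary is down")
        return f"ok from {name}"

    print(call_with_failover(demo_weights, flaky_send))
```

With sequential failover, every failed request would retry the same next endpoint; with the weighted draw, retries spread across the remaining endpoints in proportion to their weights.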
Benchmarks & Leaderboards
No confirmed new model entries or leaderboard movements in the May 24–25 scan window. LMArena direct fetch returned 403. From search-sourced tracker data (not directly verified from lmarena.ai): Claude Opus 4.6, Claude Opus 4.6 Thinking, and Gemini 3.1 Pro Preview remain in a statistical tie for the top three on the text leaderboard (Elo ~1501–1504, overlapping confidence intervals); Claude Opus 4.7 leads the Code arena at Elo 1569, ahead of GPT-5.2-codex, which had held #1 on Code since January 2026.
Trends & Emerging Tech
Enterprise Compliance Is Arriving in Open-Source AI Gateways
Source: LiteLLM GitHub | Date: May 24, 2026 | Link: https://github.com/BerriAI/litellm/releases/tag/v1.87.0-rc.1
What's happening: LiteLLM v1.87.0-rc.1 ships the first Microsoft Purview DLP integration in an open-source AI proxy, alongside OTel GenAI semantic conventions support. Both are enterprise-readiness signals (compliance inspection at the gateway layer and standardized observability telemetry) that previously existed only in closed enterprise AI platforms (Azure AI Studio, AWS Bedrock). The move to a componentized architecture (separate gateway, UI backend, UI) further mirrors enterprise service decomposition patterns.
Why watch this: If this lands in a stable release and competing gateways (other open-source proxies or cloud-native alternatives) follow suit, enterprise teams may stop building custom compliance middleware on top of AI SDKs and instead drop in a gateway as the compliance layer. The DLP pattern in particular (intercept outbound AI requests, inspect for sensitive data, block or redact before forwarding) is a compliance requirement in regulated industries such as financial services, healthcare, and government. The first open-source implementation of this pattern is a forcing function for enterprise AI adoption conversations. Watch for the stable release of v1.87.0 and whether other AI gateways (OpenRouter, or teams porting custom LangChain proxy patterns) follow.
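The intercept, inspect, block-or-redact pattern described above can be sketched in a few lines of plain Python. This is an illustration of the pattern only, not Microsoft Purview or the LiteLLM guardrail API; the two regex patterns, the `BLOCKING` policy, and the `Verdict` type are hypothetical, and real DLP policies are far richer than regex matching.

```python
import re
from dataclasses import dataclass, field

# Hypothetical detectors for illustration only.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}
# Hypothetical policy: these finding types block the request outright.
BLOCKING = {"us_ssn"}


@dataclass
class Verdict:
    action: str                      # "allow", "redact", or "block"
    text: str                        # text to forward upstream ("" when blocked)
    findings: list = field(default_factory=list)


def inspect_prompt(text):
    """Inspect an outbound prompt before it is forwarded to a model provider."""
    findings = [name for name, pattern in PATTERNS.items() if pattern.search(text)]
    if any(name in BLOCKING for name in findings):
        return Verdict("block", "", findings)
    if findings:
        redacted = text
        for name in findings:
            redacted = PATTERNS[name].sub(f"[{name.upper()} REDACTED]", redacted)
        return Verdict("redact", redacted, findings)
    return Verdict("allow", text, findings)


if __name__ == "__main__":
    print(inspect_prompt("Summarize the thread from bob@example.com"))
    print(inspect_prompt("Customer SSN is 123-45-6789"))
```

Running the check at the gateway rather than in each application is what lets a policy change apply to all upstream traffic at once.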
Technical Discussions
Nothing cleared the quality bar this period.
Quick Hits
- llama.cpp b9313 (May 25) — ggml: Parallelize quant LUT init using OpenMP; relocates OpenMP detection from ggml-cpu to ggml-base; updates ggml-config.cmake.in dependencies. Reduces initialization overhead for IQ2XS and IQ3XS quantization lookup tables on multi-core systems. [https://github.com/ggml-org/llama.cpp/releases/tag/b9313]
- llama.cpp b9311 (May 25) — vendor: update cpp-httplib to 0.45.1; dependency maintenance, no feature changes. [https://github.com/ggml-org/llama.cpp/releases/tag/b9311]
- llama.cpp b9315 (May 25) — llama: documentation update clarifying that only one on-device state can be saved per sequence; no code changes. [https://github.com/ggml-org/llama.cpp/releases/tag/b9315]
- gemini-3.1-flash-lite-preview shut down today (May 25): preview model retired per the deprecation schedule. Migration: use gemini-3.1-flash-lite (GA since May 7; $0.25/MTok input, $1.00/MTok output). If you are still calling the preview model ID, your calls will fail from today. [https://ai.google.dev/gemini-api/docs/deprecations]
Worth Watching (Announced, Not Yet Shipped)
⚠️⚠️⚠️ Gemini Interactions API outputs → steps — Default Switch TOMORROW (May 26)
(Carried from May 17–24 digests — CRITICAL: less than 24 hours remaining)
Source: Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/interactions-breaking-changes-may-2026
Default schema switch fires May 26; legacy schema permanently removed June 8. Python SDK ≥2.0.0 and JS SDK ≥2.0.0 auto-opt into the new schema, but any response-parsing code reading response.outputs must be updated to iterate response.steps filtered by step.type. Multi-turn history management must also be updated. Apps not migrated will silently parse incorrect response structures from May 26 — no exception is raised, just wrong data. See the May 17 digest for full migration steps. Act now.
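Because the failure mode is silent, one defensive option is to make response parsing fail loudly when the expected shape is missing. The sketch below is not the Gemini SDK's API. Only the names `response.outputs`, `response.steps`, and `step.type` come from the notice summarized above; the step type value `"model_output"`, the `text` attribute, and the `SimpleNamespace` stand-in objects are hypothetical placeholders to be replaced with the values in Google's migration guide.

```python
from types import SimpleNamespace


def extract_step_texts(response, wanted_type="model_output"):
    """Collect text from steps of one type, raising instead of misparsing.

    `wanted_type` and the `text` attribute are hypothetical placeholders.
    """
    steps = getattr(response, "steps", None)
    if steps is None:
        # Refuse to fall back to the legacy field: that is the silent-failure path.
        raise ValueError("response has no 'steps'; legacy 'outputs' parsing is not supported")
    texts = [
        getattr(step, "text", "")
        for step in steps
        if getattr(step, "type", None) == wanted_type
    ]
    if not texts:
        raise ValueError(f"no steps of type {wanted_type!r} found")
    return texts


if __name__ == "__main__":
    # Stand-in response object; a real SDK response will differ.
    fake_response = SimpleNamespace(
        steps=[
            SimpleNamespace(type="tool_call", text=""),
            SimpleNamespace(type="model_output", text="Hello"),
        ]
    )
    print(extract_step_texts(fake_response))
```

The design choice here is to convert a wrong-data failure into an exception, so that an unmigrated code path shows up in error monitoring rather than in downstream data.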
⚠️⚠️ GitHub Copilot — Metered Billing Transition June 1 (7 days)
(Announced April 28, 2026)
Source: GitHub Blog | Link: https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/
All GitHub Copilot plans switch from request-based to token-based AI Credit billing on June 1. Code completions remain free. Agent-heavy workflows (multi-file edits, long-horizon reasoning with o3/GPT-5+) now have explicit per-token costs. Audit projected usage in the GitHub preview bill experience before June 1.
⚠️⚠️ Gemini 2.0 Flash + 2.0 Flash Lite — Shutdown June 1 (7 days)
(Carried from May 21–24 digests)
Source: Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/deprecations
gemini-2.0-flash and gemini-2.0-flash-lite return errors starting June 1, 2026. Migration: gemini-2.5-flash ($0.30 input / $2.50 output per MTok) or gemini-2.5-flash-lite ($0.10 input / $0.40 output per MTok).
⚠️ Claude Sonnet 4 + Opus 4 — Retirement June 15 (21 days)
(Carried from May 22–24 digests)
Source: Anthropic | Link: https://platform.claude.com/docs/en/about-claude/model-deprecations
claude-sonnet-4-20250514 and claude-opus-4-20250514 return errors on June 15, 2026. No automatic failover. Migration: Sonnet 4 → claude-sonnet-4-6-20260217; Opus 4 → claude-opus-4-7-20260416. Note: Opus 4.7 has breaking changes vs. Opus 4.6 — read the migration guide before upgrading.
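With several model retirements landing close together, a small audit script can flag retiring model IDs in a codebase's configuration. The sketch below uses only the model IDs and shutdown dates listed in this digest. Two things are assumptions made for illustration: the function name `audit_models`, and the one-to-one replacement mapping for the Gemini 2.0 models (the digest offers either 2.5 model as a migration target).

```python
from datetime import date

# model ID -> (shutdown date, suggested replacement), as listed in this digest.
RETIREMENTS = {
    "gemini-3.1-flash-lite-preview": (date(2026, 5, 25), "gemini-3.1-flash-lite"),
    "gemini-2.0-flash": (date(2026, 6, 1), "gemini-2.5-flash"),
    "gemini-2.0-flash-lite": (date(2026, 6, 1), "gemini-2.5-flash-lite"),
    "claude-sonnet-4-20250514": (date(2026, 6, 15), "claude-sonnet-4-6-20260217"),
    "claude-opus-4-20250514": (date(2026, 6, 15), "claude-opus-4-7-20260416"),
}


def audit_models(model_ids, today):
    """Return (model_id, replacement, days_left) rows, most urgent first.

    days_left is zero or negative once the shutdown date has arrived.
    """
    rows = []
    for model_id in model_ids:
        if model_id in RETIREMENTS:
            shutdown, replacement = RETIREMENTS[model_id]
            rows.append((model_id, replacement, (shutdown - today).days))
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    in_use = ["gemini-2.0-flash", "claude-opus-4-20250514", "gemini-3.1-flash-lite"]
    for model_id, replacement, days_left in audit_models(in_use, date(2026, 5, 25)):
        print(f"{model_id}: {days_left} days left, migrate to {replacement}")
```

A check like this could run in CI against whatever list of model IDs the project keeps in its configuration.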
Gemini API Unrestricted Key Deadline — June 19 (25 days)
(Carried from May 21–24 digests)
Source: Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/api-key
All unrestricted Gemini API keys are blocked on June 19. Restrict via AI Studio → API Keys → "Restrict to Gemini API" (one-click action).
Ollama v0.30.0 — Still Pre-Release (rc23 as of May 22)
(Carried from May 15 digest)
Source: Ollama (GitHub) | Link: https://github.com/ollama/ollama/releases
v0.30.0 restructures Ollama to use llama.cpp directly as its backend, with MLX for Apple Silicon inference. No stable GA date announced.
Gemini 3.5 Pro — Expected ~June 2026
(Carried from May 21–24 digests)
Source: Google (Google I/O 2026) | Link: https://blog.google/innovation-and-ai/technology/ai/google-io-2026-all-our-announcements/
Confirmed in internal testing at Gemini 3.5 Flash launch (May 19). No model ID, pricing, or benchmarks disclosed.
<details> <summary>🔭 Horizon — Open Questions, Emerging Patterns & Grounded Speculation</summary>
This section operates under different rules than the digest above. Evidence-grounded speculation is allowed. Pure prediction is not. Every claim here must cite a source from this digest or a real paper/benchmark. Label each entry by type so the reader knows what kind of thinking they're engaging with.
[PATTERN] Enterprise compliance is entering the open-source AI gateway layer, and it will force a market consolidation question
LiteLLM v1.87.0-rc.1 ships Microsoft Purview DLP as a gateway-layer guardrail, the first compliance inspection feature of this kind in an open-source AI proxy. It arrives together with the componentized architecture split (gateway, UI backend, UI as separate services) and OTel GenAI semantic conventions support, both patterns that signal a shift from "developer utility" to "enterprise infrastructure." The parallel is instructive: when Nginx added rate limiting and auth modules, the market question shifted from "which reverse proxy" to "which enterprise proxy platform." LiteLLM is now at that inflection. The open question is whether the compliance features will remain open-source (attracting enterprise adoption as a foundation) or become a paid enterprise edition (the Kong/Nginx Enterprise pattern). Watch for whether a stable v1.87.x ships with Purview DLP in the community edition or behind a gate.
Grounded in: LiteLLM v1.87.0-rc.1 Purview DLP + OTel + componentized architecture (this digest Tooling section)
[IF THIS CONTINUES] The post-Google-I/O release lull is now in its 5th day, and the deprecation deadlines are accelerating
May 21–25 produced zero main-section items from major AI labs across five consecutive digest windows. The paradox: this is simultaneously the highest-urgency 5-day stretch in the last 30 days of digests, because five hard deadlines cluster between May 26 and June 19 (Gemini Interactions API, Gemini 2.0 Flash/Lite shutdown, GitHub Copilot billing, Claude Sonnet 4/Opus 4, Gemini API key restriction). If the current pattern holds (labs concentrate releases around conference dates such as I/O and DevDay, then go quiet for 1–2 weeks), the near-term signal is that the next wave of major releases from Google will cluster around early June (around Gemini 3.5 Pro), and from OpenAI likely around late June. The maintenance and deprecation work that actually requires developer attention (the four June deadlines) competes for time precisely in the post-conference period, when engineering attention drifts back to product work.
Grounded in: Zero main-section items in May 21–25 digests; five deadlines between May 26 and June 19 (this digest Worth Watching section)
[BUILDER'S ANGLE] OTel GenAI semantic conventions in gateways unlock a new class of AI observability tooling
LiteLLM v1.87.0-rc.1 adopts the OpenTelemetry GenAI semantic conventions, which means span attributes for model name, provider, token counts (prompt/completion), finish reason, and request latency will follow a standardized schema across all models proxied through LiteLLM. This is the same standardization moment that happened when HTTP spans got standardized in the OTel HTTP semantic conventions, which unlocked a generation of APM tools (Datadog, Honeycomb, Grafana) that could ingest any service's traces without custom parsers. A developer building AI observability today has two options: (1) wait for their existing APM vendor to add native GenAI support, or (2) deploy LiteLLM as a proxy and immediately get standardized spans that flow into any existing OTel collector. Once the release is stable, option 2 becomes viable for production use without custom instrumentation code. The unlocked builder opportunity: generic AI latency dashboards, per-model cost attribution, and cross-provider token budget tracking, all from a single OTel pipeline.
Grounded in: LiteLLM v1.87.0-rc.1 OTel GenAI semantic conventions support (this digest Tooling section)
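As a rough illustration of per-model cost attribution from standardized spans, the sketch below aggregates cost from span attribute dictionaries. The attribute keys follow the OpenTelemetry GenAI semantic conventions as published at the time of writing (`gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`); whether LiteLLM emits exactly these keys is an assumption to verify against its release notes. Prices are the per-MTok figures quoted elsewhere in this digest, and the sample spans are made up.

```python
from collections import defaultdict

# USD per million tokens (input, output), as quoted in this digest.
PRICES = {
    "gemini-3.1-flash-lite": (0.25, 1.00),
    "gemini-2.5-flash": (0.30, 2.50),
    "gemini-2.5-flash-lite": (0.10, 0.40),
}


def cost_by_model(spans):
    """Sum estimated USD cost per model from span attribute dicts.

    Spans for models without a known price are skipped rather than guessed.
    """
    totals = defaultdict(float)
    for attrs in spans:
        model = attrs.get("gen_ai.request.model")
        if model not in PRICES:
            continue
        input_price, output_price = PRICES[model]
        input_tokens = attrs.get("gen_ai.usage.input_tokens", 0)
        output_tokens = attrs.get("gen_ai.usage.output_tokens", 0)
        totals[model] += (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample spans for illustration.
    sample_spans = [
        {"gen_ai.request.model": "gemini-2.5-flash",
         "gen_ai.usage.input_tokens": 1_000_000,
         "gen_ai.usage.output_tokens": 200_000},
        {"gen_ai.request.model": "gemini-2.5-flash-lite",
         "gen_ai.usage.input_tokens": 500_000,
         "gen_ai.usage.output_tokens": 500_000},
    ]
    for model, usd in cost_by_model(sample_spans).items():
        print(f"{model}: ${usd:.2f}")
```

Because the function reads only attribute names, the same aggregation would work on spans from any emitter that follows the conventions, which is the point of standardizing them.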
[OPEN QUESTION] What is the right preview model lifecycle for foundation model APIs, and is Google's 2-to-3-week window sustainable?
gemini-3.1-flash-lite-preview was deprecated May 11 and shut down today (May 25): 14 days after the deprecation notice and 18 days after the GA replacement (gemini-3.1-flash-lite) launched on May 7. It lands alongside two other Google deadlines tracked in this digest: the Gemini 2.0 Flash/Lite retirement (June 1) and the API key restriction deadline (June 19). For reference, gemini-2.5-flash and gemini-2.5-pro have their own June 2026 deprecation deadlines for non-stable model IDs. The open question: if preview models now run roughly 2-to-3-week lifecycles from deprecation announcement to shutdown, how does a developer working on a personal project, a hackathon entry, or a research prototype absorb this cadence? Enterprise teams have processes for dependency management; individual developers and researchers do not. The forum thread title "It's May 2026 and the ecosystem is breaking" (discuss.ai.google.dev, observed in search results but not fetched) suggests this is a live community pain point. Worth watching whether Google adjusts preview windows in response to developer feedback or maintains the current fast-cycle approach.
Grounded in: gemini-3.1-flash-lite-preview shutdown today (this digest Quick Hits); gemini-2.0-flash shutdown June 1 (this digest Worth Watching); gemini-3.1-flash-lite GA May 7, 2026 (confirmed via search)
Excluded: 21 items below quality gate threshold or outside scan window. Near-misses: LiteLLM v1.85.1 + v1.84.1 (May 21 — outside window by 3+ days, minor patches); Anthropic SDK Python v0.104.1 (May 22 — outside window; carries encrypted_content through beta compaction accumulator); Claude Code v2.1.150 (May 23 — outside window by ~1 day; adds JSON session listing, richer plugin discovery, improved hooks and status UX); Unsloth 2026.5.4 (May 22 — outside window); transformers v5.9.0 (May 20 — outside window; added Cohere Command A+ MoE, Granite Speech Plus, Granite Vision 4.1); Ollama v0.23.4 (May 14 — outside window); LangChain LangGraph DeltaChannel + per-node timeouts (May 12 — outside window); vLLM v0.21.1rc0 (May 15 — outside window, still RC); GPT-Realtime-2 + Realtime Translate + Realtime Whisper (May 7 — well outside window; covered in prior digests); NVIDIA ML blog posts (May 7–11 — outside window; MLPerf inference records, Fleet Intelligence); AWS ML Blog (May 5–12 — outside window); Groq Blog (no posts since February 2026); Together AI (MLSys 2026 content from May 18 — outside window); arXiv cs.CL/cs.AI (403 errors on listing pages at fetch time; SkillOpt paper surfaced via search — no recognized-lab authorship confirmed, no associated code repo found, quality gate score below threshold); HuggingFace Papers Daily (403 error at fetch time); Simon Willison most recent post May 21 (outside window); LMArena direct fetch (403 — standings sourced from third-party trackers, no confirmed new movements); gemini-3.1-flash-lite-preview shutdown (event/deadline, not new release — included as Quick Hit but does not qualify for main section).