AI Developer Digest

Mon, May 25, 2026

9 signals that cleared the gate14 min read

The Signal — start here

The May 25 window is the fourth consecutive near-zero day following the Google I/O 2026 release wave. The single new item passing the quality gate is LiteLLM v1.87.0-rc.1, a release candidate adding Microsoft Purview DLP guardrails, Granian ASGI server support, weighted-routing failover, and OTel GenAI semantic conventions to the AI gateway — the first signs of enterprise compliance features entering open-source AI proxy infrastructure. Google also shut down gemini-3.1-flash-lite-preview today; the GA replacement (gemini-3.1-flash-lite) has been available since May 7. The most time-critical item for developers remains the Gemini Interactions API schema change, which fires tomorrow (May 26) — code reading response.outputs will silently parse wrong response structures with no exception raised.

Must-reads today

⚠️⚠️⚠️ Gemini Interactions API: FIRES TOMORROW (May 26) — outputs → steps default switch is live in less than 24 hours; unprepared code silently parses incorrect response structures with zero error raised — if you haven't migrated, stop and do it now

LiteLLM v1.87.0-rc.1 — Microsoft Purview DLP guardrail integration is the first enterprise-grade compliance feature to land in an open-source AI gateway; signals a new surface area in the AI tooling stack

Breaking Changes

No breaking changes shipped in this window.

⚠️⚠️⚠️ CRITICAL — TOMORROW: Gemini Interactions API outputs → steps default switch fires May 26, 2026 — less than 24 hours from now. Code not migrated will silently parse wrong response structures from May 26 with no exception raised — this is a silent data corruption failure, not a crash. Legacy schema permanently removed June 8. See Worth Watching section for migration steps.

Model Releases

Nothing new in this scan window. Last major releases: Gemini 3.5 Flash and Cohere Command A+ (May 19–20, covered in May 21 digest); Claude Opus 4.7 (April 16); GPT-5.5 (April 23).

API & SDK Changes

Nothing new in this scan window. Anthropic Platform most recent entry: May 19, 2026 (MCP tunnels research preview, self-hosted sandboxes, agent MCP config update, large output spill). OpenAI Platform most recent entries: May 12, 2026 (DALL-E 2/3 deprecation, Realtime API Beta removal; GPT-Realtime-2 + Realtime Translate + Realtime Whisper launched May 7, covered in prior digests). No new Google AI changelog entries confirmed for May 24–25. Anthropic SDK Python most recent: v0.104.1, May 22.

Research

Nothing cleared the quality bar this period. arXiv cs.CL and cs.AI listing pages returned 403 at fetch time. HuggingFace Papers Daily also returned 403. Two search-surfaced papers on agent skills (SkillOpt, "From Raw Experience to Skill Consumption") did not confirm recognized-lab authorship or associated code repositories in search snippets — excluded below quality gate.

Tooling

Notable

LiteLLM v1.87.0-rc.1 — Enterprise Compliance and Infrastructure Overhaul

What changed

Added Microsoft Purview DLP guardrail integration, Granian ASGI web server support, weighted-routing failover mechanism, OTel GenAI semantic conventions, and Anthropic streaming performance improvements; split gateway, UI backend, and UI into separate deployable components.

TL;DR

LiteLLM's first enterprise compliance guardrail lands in v1.87.0-rc.1 — Microsoft Purview DLP blocks sensitive data in AI requests before they leave your perimeter, running as an inspection layer in the proxy.

Developer signal

If you're operating LiteLLM as a production AI gateway with data sensitivity requirements, the Purview DLP integration lets you enforce data-loss-prevention policies on AI traffic without modifying application code — configure once at the gateway, inspect all upstream requests. The Granian ASGI backend replaces the default server and is reported to improve throughput stability under concurrent load; benchmark against your traffic profile before switching production. The OTel GenAI semantic conventions support standardizes how LiteLLM emits observability data to follow the emerging OpenTelemetry GenAI spec, which means span attributes for model, provider, token counts, and latency will align with other OTel-instrumented services — useful if you're building unified dashboards. The weighted-routing failover mechanism lets you assign relative weights to model endpoints so failover distributes probabilistically rather than sequentially. This is a release candidate — validate in staging before promoting.

Affects you ifYou are running LiteLLM as a production AI gateway with compliance requirements, handling sensitive data through AI APIs, want Anthropic streaming latency improvements, or are building OpenTelemetry-based observability for AI workloads.EffortModerate (RC — test before production; Granian server requires explicit opt-in; DLP integration requires Purview configuration)

LiteLLM (GitHub) | Date: May 24, 2026 | Link: https://github.com/BerriAI/litellm/releases/tag/v1.87.0-rc.1https://github.com/BerriAI/litellm/releases/tag/v1.87.0-rc.1

Benchmarks & Leaderboards

No confirmed new entries in the May 24–25 scan window. LMArena direct fetch returned 403. From search-sourced tracker data (not directly verified from lmarena.ai): Claude Opus 4.6 and Claude Opus 4.6 Thinking and Gemini 3.1 Pro Preview remain in a statistical tie at top-3 on the text leaderboard (Elo ~1501–1504, overlapping confidence intervals); Claude Opus 4.7 leads the Code arena at Elo 1569, ahead of GPT-5.2-codex which has held #1 on Code since January 2026. No confirmed new model entries or leaderboard movements in this scan window.

Trends & Emerging Tech

Enterprise Compliance Is Arriving in Open-Source AI Gateways

What's happening

LiteLLM v1.87.0-rc.1 ships the first Microsoft Purview DLP integration in an open-source AI proxy, alongside OTel GenAI semantic conventions support. Both are enterprise-readiness signals — compliance inspection at the gateway layer and standardized observability telemetry — that previously existed only in closed enterprise AI platforms (Azure AI Studio, AWS Bedrock). The addition of a componentized architecture (separate gateway, UI backend, UI) further mirrors enterprise service decomposition patterns.

Why watch this

If this lands in a stable release and competitors (LiteLLM proxies, BerriAI, or cloud-native alternatives) follow suit, enterprise teams may stop building custom compliance middleware on top of AI SDKs and instead drop in a gateway-as-compliance-layer. The DLP pattern in particular — intercept outbound AI requests, inspect for sensitive data, block or redact before forwarding — is a compliance requirement in regulated industries (financial services, healthcare, government). The first open-source implementation of this pattern is a forcing function for enterprise AI adoption conversations. Watch for the stable release of v1.87.0 and whether major AI gateway projects (OpenRouter, port of custom LangChain proxy patterns) follow.

LiteLLM GitHub | Date: May 24, 2026 | Link: https://github.com/BerriAI/litellm/releases/tag/v1.87.0-rc.1

Technical Discussions

Nothing cleared the quality bar this period.

Quick Hits

llama.cpp b9313 (May 25) — ggml: Parallelize quant LUT init using OpenMP; relocates OpenMP detection from ggml-cpu to ggml-base; updates ggml-config.cmake.in dependencies. Reduces initialization overhead for IQ2XS and IQ3XS quantization lookup tables on multi-core systems. [https://github.com/ggml-org/llama.cpp/releases/tag/b9313]
llama.cpp b9311 (May 25) — vendor: update cpp-httplib to 0.45.1; dependency maintenance, no feature changes. [https://github.com/ggml-org/llama.cpp/releases/tag/b9311]
llama.cpp b9315 (May 25) — llama: documentation update clarifying that only one on-device state can be saved per sequence; no code changes. [https://github.com/ggml-org/llama.cpp/releases/tag/b9315]
gemini-3.1-flash-lite-preview — shutdown today (May 25) — Preview model retired per deprecation schedule. Migration: use gemini-3.1-flash-lite (GA since May 7, $0.25/MTok input, $1.00/MTok output). If you are still calling the preview model ID your calls will fail from today. [https://ai.google.dev/gemini-api/docs/deprecations]

Worth Watching (Announced, Not Yet Shipped)

⚠️⚠️⚠️ Gemini Interactions API `outputs` → `steps` — Default Switch TOMORROW (May 26)

(Carried from May 17–24 digests — CRITICAL: less than 24 hours remaining)

Default schema switch fires May 26; legacy schema permanently removed June 8. Python SDK ≥2.0.0 and JS SDK ≥2.0.0 auto-opt into the new schema, but any response-parsing code reading response.outputs must be updated to iterate response.steps filtered by step.type. Multi-turn history management must also be updated. Apps not migrated will silently parse incorrect response structures from May 26 — no exception is raised, just wrong data. See the May 17 digest for full migration steps. Act now.

Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/interactions-breaking-changes-may-2026

⚠️⚠️ GitHub Copilot — Metered Billing Transition June 1 (7 days)

(Announced April 28, 2026)

All GitHub Copilot plans switch from request-based to token-based AI Credit billing on June 1. Code completions remain free. Agent-heavy workflows (multi-file edits, long-horizon reasoning with o3/GPT-5+) now have explicit per-token costs. Audit projected usage in the GitHub preview bill experience before June 1.

GitHub Blog | Link: https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/

⚠️⚠️ Gemini 2.0 Flash + 2.0 Flash Lite — Shutdown June 1 (7 days)

(Carried from May 21–24 digests)

gemini-2.0-flash and gemini-2.0-flash-lite return errors on June 1, 2026. Migration: gemini-2.5-flash ($0.30/$2.50/MTok) or gemini-2.5-flash-lite ($0.10/$0.40/MTok).

Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/deprecations

⚠️ Claude Sonnet 4 + Opus 4 — Retirement June 15 (21 days)

(Carried from May 22–24 digests)

claude-sonnet-4-20250514 and claude-opus-4-20250514 return errors on June 15, 2026. No automatic failover. Migration: Sonnet 4 → claude-sonnet-4-6-20260217; Opus 4 → claude-opus-4-7-20260416. Note: Opus 4.7 has breaking changes vs. Opus 4.6 — read the migration guide before upgrading.

Anthropic | Link: https://platform.claude.com/docs/en/about-claude/model-deprecations

Gemini API Unrestricted Key Deadline — June 19 (25 days)

(Carried from May 21–24 digests)

All unrestricted Gemini API keys blocked June 19. Restrict via AI Studio → API Keys → "Restrict to Gemini API" (one-click action).

Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/api-key

Ollama v0.30.0 — Still Pre-Release (rc23 as of May 22)

(Carried from May 15 digest)

v0.30.0 restructures Ollama to use llama.cpp directly as backend, with MLX for Apple Silicon inference. No stable GA date announced.

Ollama (GitHub) | Link: https://github.com/ollama/ollama/releases

Gemini 3.5 Pro — Expected ~June 2026

(Carried from May 21–24 digests)

Confirmed in internal testing at Gemini 3.5 Flash launch (May 19). No model ID, pricing, or benchmarks disclosed.

Google (Google I/O 2026) | Link: https://blog.google/innovation-and-ai/technology/ai/google-io-2026-all-our-announcements/

Filtered from 30+ primary sources against a published quality rubric. No press releases, no fluff — only what changes what you build.

Breaking Changes

Model Releases

API & SDK Changes

Research

Tooling

LiteLLM v1.87.0-rc.1 — Enterprise Compliance and Infrastructure Overhaul

Benchmarks & Leaderboards

Trends & Emerging Tech

Enterprise Compliance Is Arriving in Open-Source AI Gateways

Technical Discussions

Quick Hits

Worth Watching (Announced, Not Yet Shipped)

⚠️⚠️⚠️ Gemini Interactions API `outputs` → `steps` — Default Switch **TOMORROW (May 26)**

⚠️⚠️ GitHub Copilot — Metered Billing Transition **June 1 (7 days)**

⚠️⚠️ Gemini 2.0 Flash + 2.0 Flash Lite — Shutdown **June 1 (7 days)**

⚠️ Claude Sonnet 4 + Opus 4 — Retirement **June 15 (21 days)**

Gemini API Unrestricted Key Deadline — June 19 (25 days)

Ollama v0.30.0 — Still Pre-Release (rc23 as of May 22)

Gemini 3.5 Pro — Expected ~June 2026

⚠️⚠️⚠️ Gemini Interactions API `outputs` → `steps` — Default Switch TOMORROW (May 26)

⚠️⚠️ GitHub Copilot — Metered Billing Transition June 1 (7 days)

⚠️⚠️ Gemini 2.0 Flash + 2.0 Flash Lite — Shutdown June 1 (7 days)

⚠️ Claude Sonnet 4 + Opus 4 — Retirement June 15 (21 days)