AI Developer Digest

Sun, Jun 7, 2026

16 signals that cleared the gate · 17 min read

The Signal — start here

A genuinely light 24-hour window, dominated by one urgent action item: the Gemini Interactions API opt-out header (Api-Revision: 2026-05-07) stops being accepted at the start of June 8, so developers who haven't migrated from response.outputs to response.steps have hours, not days. The only new technical development worth noting is llama.cpp b9549, which lands Gemma4 Multi-Token Prediction in the official upstream repo; community testing shows a ~40% throughput gain on a MacBook Pro M5 Max with no quality loss. The rest of the period is bug fixes and minor SDK patches across the usual repos.

Must-reads today

Gemini Interactions API — LAST DAY before June 8 removal — if your code reads response.outputs or sends Api-Revision: 2026-05-07, it breaks tomorrow; migrate to response.steps now

llama.cpp b9549: Gemma4 MTP — 40% throughput gain (97 → 138 tokens/s on M5 Max) for local Gemma4 inference, no quality tradeoff

Breaking Changes

●Breaking

Gemini Interactions API — Legacy Schema Removal June 8, Less Than 24 Hours

What changed

The Api-Revision: 2026-05-07 opt-out header that preserved the legacy response.outputs schema stops being accepted on June 8. No grandfathering, no extended opt-out. After today, all Interactions API traffic uses only the new response.steps schema — unconditionally.

TL;DR

The Gemini Interactions API legacy response.outputs schema is removed June 8 (fewer than 24 hours); any code sending Api-Revision: 2026-05-07 or reading response.outputs stops working at the start of June 8.

Developer signal

You have today. Check three things right now:
(1) Search your codebase for response.outputs, Api-Revision: 2026-05-07, and response_mime_type; each of these stops working on June 8. Replace response.outputs with response.steps, an array that provides a structured timeline of each interaction turn with polymorphic entry types.
(2) If you use the Python SDK (google-generativeai ≥2.0.0) or JavaScript SDK (@google/generative-ai ≥2.0.0), the SDK already uses the new schema; you only need to update how your application code reads the response structure.
(3) response_mime_type is gone; use the new polymorphic response_format field instead.
Note: any Gemini features shipped after May 7, including new Gemini 3.5 Flash capabilities, are only available in the new schema, so staying on the opt-out was already costing you new capabilities. The full migration guide, with before/after code examples, is at the primary source link.
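A minimal sketch of the codebase search from the first check, assuming your code uses the literal spellings named above. The regex patterns, the file-extension list, and the `scan`/`report` helper names are illustrative choices, not part of any Google tooling. A response object bound to a different variable name (for example `resp.outputs`) will not be caught, so treat a clean run as a starting point rather than proof.

```python
# Hypothetical helper: flags legacy Gemini Interactions API usage before the June 8 removal.
import re
from pathlib import Path
from typing import Iterator, Tuple

# Literal spellings named in the migration notice. The case-insensitive
# response_mime_type pattern also catches camelCase responseMimeType
# (an assumption about JS code, not something the notice states).
PATTERNS = {
    "response.outputs": re.compile(r"\bresponse\.outputs\b"),
    "Api-Revision: 2026-05-07": re.compile(r"Api-Revision\W{0,6}2026-05-07", re.IGNORECASE),
    "response_mime_type": re.compile(r"\bresponse_?mime_?type\b", re.IGNORECASE),
}
# Illustrative defaults; adjust for your repo.
EXTENSIONS = {".py", ".js", ".mjs", ".ts", ".tsx", ".jsx", ".go", ".java", ".json", ".yaml", ".yml", ".sh"}
SKIP_DIRS = {".git", "node_modules", ".venv", "venv", "__pycache__"}


def scan(root: Path) -> Iterator[Tuple[Path, int, str, str]]:
    """Yield (path, line_number, pattern_label, line_text) for every match under root."""
    for path in sorted(root.rglob("*")):
        if any(part in SKIP_DIRS for part in path.relative_to(root).parts):
            continue
        if not path.is_file() or path.suffix not in EXTENSIONS:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the whole scan
        for lineno, line in enumerate(text.splitlines(), start=1):
            for label, rx in PATTERNS.items():
                if rx.search(line):
                    yield path, lineno, label, line.strip()


def report(root: Path) -> int:
    """Print matches in path:line form and return the count (non-zero means work to do)."""
    hits = list(scan(root))
    for path, lineno, label, line in hits:
        print(f"{path}:{lineno}: [{label}] {line}")
    return len(hits)
```

In CI, something like `raise SystemExit(1 if report(Path(".")) else 0)` fails the build while any legacy reference remains; exclude the scanner's own file, since its pattern table contains the very strings it searches for.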

Affects you if: You call the Gemini Interactions API directly or via SDK and your code accesses response.outputs, sends the Api-Revision: 2026-05-07 header, or references response_mime_type
Effort: Moderate (response-reading code must be updated; with a current SDK version the transport layer is already migrated, so only application code needs changing)

Google AI for Developers | Date: Removal scheduled June 8, 2026 | Link: https://ai.google.dev/gemini-api/docs/interactions-breaking-changes-may-2026

Model Releases

No new model releases in this 24h period.

API & SDK Changes

No new API or SDK changes requiring full entries in this 24h period beyond the Gemini breaking change above. See Quick Hits for the anthropic-sdk-python v0.107.1 Foundry bug fix.

Research

Nothing cleared the quality bar in this 24h period. No new arXiv papers from recognized labs with measurable benchmark numbers and associated code found in the June 6–7 window.

Tooling

Notable

llama.cpp b9549: Gemma4 Multi-Token Prediction Lands in Official Upstream

What changed

PR #23398 merges Gemma4 MTP (Multi-Token Prediction) support into the official llama.cpp repo. Previously, llama.cpp users could only get Gemma4 MTP from the ik_llama.cpp performance fork (PR #1744, merged May 10, 2026, with a verified 2.6–2.98x lossless speedup); b9549 brings it to the main project that most downstream integrations build on.

TL;DR

llama.cpp b9549 adds official Gemma4 Multi-Token Prediction support, enabling community-verified ~40% throughput gains (97 → 138 tokens/s on MacBook Pro M5 Max) with no measurable quality degradation.

Developer signal

If you're running Gemma4 locally via llama.cpp, update to b9549 and enable MTP to get the throughput improvement without changing your model. To enable it, add --draft-model <gemma4-mtp-draft-gguf> to your llama-cli or llama-server invocation, and confirm the exact flag with --help on your build (llama.cpp's long-standing speculative-decoding option is spelled --model-draft / -md). The draft GGUF is generated from the Gemma4AssistantForCausalLM model class via the standard convert_hf_to_gguf.py conversion path. The 40% figure comes from community benchmarks on Apple Silicon (M5 Max). CUDA users may see comparable or better gains, since MTP overhead is typically lower on GPU, but no CUDA numbers are cited, so benchmark your own setup before counting on it. Before this build, using Gemma4 MTP required building from the ik_llama.cpp fork or maintaining a custom build; b9549 removes that barrier and makes MTP a standard option in any llama.cpp distribution. Treat the ik_llama.cpp fork's reported 2.6–2.98x (at larger batch sizes) as the upper bound and 40% as the conservative single-stream figure. Check the PR for updated configuration examples.
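A back-of-the-envelope sketch of why reported gains range from ~40% to nearly 3x. The first function checks the headline number (138 / 97 tokens/s is about 1.42x). The second uses the standard speculative-decoding estimate (Leviathan et al., 2023): with k drafted tokens and an assumed independent per-token acceptance rate alpha, each main-model verification pass yields (1 - alpha^(k+1)) / (1 - alpha) tokens on average. That is a ceiling, not a prediction: the cost of running the draft head pulls real speedups below it, and Gemma4's actual acceptance rates are not given in the source.

```python
def measured_speedup(baseline_tps: float, mtp_tps: float) -> float:
    """Ratio of MTP throughput to baseline throughput (1.0 means no gain)."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return mtp_tps / baseline_tps


def expected_tokens_per_pass(alpha: float, k: int) -> float:
    """Expected tokens emitted per main-model forward pass.

    alpha: assumed per-token probability that the main model accepts a drafted
           token, treated as independent across positions (a simplifying assumption).
    k:     number of tokens drafted per step.
    Verification stops at the first rejected draft token, and every pass also emits
    one token from the main model itself (the correction or a bonus token), giving
    the geometric series 1 + alpha + ... + alpha^k.
    """
    if not 0.0 <= alpha <= 1.0 or k < 0:
        raise ValueError("need 0 <= alpha <= 1 and k >= 0")
    if alpha == 1.0:
        return float(k + 1)  # every draft accepted, plus one bonus token
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


print(f"M5 Max headline: {measured_speedup(97, 138):.2f}x")
for alpha in (0.5, 0.7, 0.9):
    print(f"alpha={alpha}, k=3: {expected_tokens_per_pass(alpha, 3):.2f} tokens/pass")
```

At alpha = 0.9 with three drafted tokens the ceiling is about 3.4 tokens per pass, which illustrates how results in the 2.6–2.98x range are possible at high acceptance rates, while lower acceptance or a more expensive draft lands nearer 1.4x. The alpha values here are illustrative, not measurements.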

Affects you if: You run Gemma4 models locally using llama.cpp and care about inference throughput on consumer or prosumer hardware (Apple Silicon, CUDA, or CPU)
Effort: Quick (update to b9549, generate or download a Gemma4 MTP draft GGUF, add one CLI flag; no architecture or code changes required)

ggml-org/llama.cpp | Date: June 7, 2026, 13:38 UTC | Link: https://github.com/ggml-org/llama.cpp/releases/tag/b9549 | PR: https://github.com/ggml-org/llama.cpp/pull/23398

Benchmarks & Leaderboards

No new leaderboard movements for June 6–7. The LMArena text leaderboard top cluster and SWE-bench Verified rankings are unchanged from the prior digest. Last significant leaderboard entry was June 5 (mistral-medium-3.5 added to Code Arena WebDev leaderboard).

Trends & Emerging Tech

Multi-Token Prediction Crossing the Mainstream Threshold in Local Inference

What's happening

llama.cpp b9549 is the latest in a run of open inference runtimes adding MTP support through Q2 2026, following ik_llama.cpp (May 10) and a Transformers-native Gemma4 MTP implementation by the Hugging Face team (April 2026), with experimental DeepSeek V4 MTP on vLLM's roadmap. The common pattern is a 1.5x–3x throughput gain without quality loss: a small draft head trained alongside the main model predicts 2–4 tokens per forward pass, and the main model checks those guesses in a single verification pass, so the final output is governed by the main model rather than the draft. The ik_llama.cpp fork reported a verified 2.6–2.98x lossless speedup for Gemma4 before the upstream merge; b9549 makes this a standard option in mainstream llama.cpp distributions.
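A toy sketch of the draft-and-verify loop behind the "without quality loss" claim, written as greedy speculative decoding with plain Python functions standing in for models. The `toy_target` and `toy_draft` functions and their arithmetic are made up for illustration; real MTP heads are learned, and real runtimes score all drafted positions in one batched forward pass, which this sketch only counts rather than performs. The property to notice: every emitted token is the main model's own greedy choice, so the output is identical to running the main model alone, and only the number of main-model passes changes.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one main-model pass per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_greedy(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 3) -> Tuple[List[int], int]:
    """Draft k tokens, keep the prefix the main model agrees with.

    Returns (tokens, passes), where passes counts main-model verification steps.
    """
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft proposes k tokens autoregressively.
        drafted: List[int] = []
        for _ in range(k):
            drafted.append(draft(out + drafted))
        # 2. One (simulated) main-model pass checks every drafted position.
        passes += 1
        accepted: List[int] = []
        for tok in drafted:
            choice = target(out + accepted)
            accepted.append(choice)  # always keep the main model's own token
            if choice != tok:
                break  # first disagreement: discard the rest of the draft
        else:
            # All k drafts accepted: the same pass also yields one bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[: len(prompt) + max_new], passes


# Made-up toy models: the draft is wrong whenever the last token is a multiple of 5.
def toy_target(prefix: List[int]) -> int:
    return (prefix[-1] * 3 + 1) % 17


def toy_draft(prefix: List[int]) -> int:
    guess = toy_target(prefix)
    return (guess + 1) % 17 if prefix[-1] % 5 == 0 else guess


baseline = greedy(toy_target, [1], 20)
fast, passes = speculative_greedy(toy_target, toy_draft, [1], 20)
print(fast == baseline, passes)
```

With these toy functions it prints `True 6`: the same 20 tokens as the baseline in 6 main-model passes instead of 20. Sampling-based decoding needs a rejection-sampling acceptance rule instead of the equality check used here.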

Why watch this

MTP is following the same adoption arc as speculative decoding: experimental fork proves it works → upstream integration → standard config option developers set once and forget. If that trajectory continues, the practical impact for builders is twofold. First, throughput benchmark comparisons for local inference need a new disclosure: tokens/second without stating whether MTP was active and what draft model was used is no longer a comparable number. Second, watch whether model card releases start shipping companion draft-model GGUF files as a standard artifact alongside base-model quantizations — that would signal the ecosystem treating MTP as a first-class configuration rather than a power-user optimization.
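One way to make that disclosure habitual is to refuse to record a tokens/second number without its MTP settings. A minimal sketch using a hypothetical `ThroughputResult` record (not from any benchmarking tool); the draft-model filename is a placeholder, and 97 and 138 tokens/s are the M5 Max community figures quoted earlier in this digest.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThroughputResult:
    """A tokens/s measurement that carries its MTP configuration with it."""
    model: str
    hardware: str
    tokens_per_second: float
    mtp_enabled: bool
    draft_model: Optional[str] = None  # required when mtp_enabled is True

    def __post_init__(self) -> None:
        if self.mtp_enabled and not self.draft_model:
            raise ValueError("an MTP result must name the draft model it used")
        if not self.mtp_enabled and self.draft_model:
            raise ValueError("draft_model given but mtp_enabled is False")

    def config_key(self) -> tuple:
        """Everything that must match before two throughput numbers are comparable."""
        return (self.model, self.hardware, self.mtp_enabled, self.draft_model)

    def label(self) -> str:
        mtp = f"MTP on, draft={self.draft_model}" if self.mtp_enabled else "MTP off"
        return f"{self.model} on {self.hardware}: {self.tokens_per_second:.0f} tok/s ({mtp})"


def comparable(a: ThroughputResult, b: ThroughputResult) -> bool:
    return a.config_key() == b.config_key()


base = ThroughputResult("Gemma4", "M5 Max", 97.0, mtp_enabled=False)
mtp = ThroughputResult("Gemma4", "M5 Max", 138.0, mtp_enabled=True,
                       draft_model="gemma4-mtp-draft.gguf")  # placeholder filename
print(base.label())
print(mtp.label())
print("comparable:", comparable(base, mtp))
```

Comparing the two rows then becomes an explicit MTP-versus-baseline decision rather than an accident of two numbers sitting side by side.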

ggml-org/llama.cpp | Date: June 7, 2026 | Link: https://github.com/ggml-org/llama.cpp/releases/tag/b9549

Technical Discussions

Nothing cleared the quality bar this period. No HN threads with score >200 and concrete technical depth found for June 6–7. No qualifying posts from Nathan Lambert (last: June 1), Eugene Yan, or Sebastian Raschka in the scan window.

Quick Hits

Claude Code v2.1.168 (June 6, 23:41) — bug fixes and reliability improvements; update to @latest if on v2.1.167. [https://github.com/anthropics/claude-code/releases/tag/v2.1.168]
anthropic-sdk-python v0.107.1 (June 7, 17:18) — fixes missing x-api-key header for API-key authentication on Claude Platform on AWS (Foundry); if you use Foundry with API-key auth and are seeing auth failures, this is the fix. [https://github.com/anthropics/anthropic-sdk-python/releases/tag/v0.107.1]
llama.cpp b9544 (June 6, 21:20) — fixes LFM2/LFM2.5 reasoning round-trip and memory leak in chat template handling. [https://github.com/ggml-org/llama.cpp/releases/tag/b9544]
llama.cpp b9547–b9551 (June 7) — b9547 skips redundant mmproj download when user supplies their own; b9548 fixes vocab compatibility check in speculative decoding; b9550/b9551 fix KV-cache cell sharing overflow and copy overhead. [https://github.com/ggml-org/llama.cpp/releases]

Worth Watching (Announced, Not Yet Shipped)

⚠️⚠️⚠️ Gemini API Legacy Schema (Interactions) — Hard Removal TOMORROW June 8 — ACT TODAY

(Elevated to Breaking Changes above; same item. See that section for the full migration checklist.)

Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/interactions-breaking-changes-may-2026

⚠️⚠️ Windows Local AI Runtime — KB5039239 June 9 (2 days)

(Countdown updated)

Windows Update KB5039239 delivers the expanded on-device AI stack (Aion 1.0 runtime, CPU/GPU/NPU support) on June 9. Required for production use of Aion 1.0 Instruct and Aion 1.0 Plan on end-user devices. Aion 1.0 open weights land on Hugging Face in July.

Windows Developer Blog | Link: https://blogs.windows.com/windowsdeveloper/2026/06/02/build-2026-furthering-windows-as-the-trusted-platform-for-development/

⚠️⚠️⚠️ Claude Sonnet 4 + Opus 4 — Retirement June 15 (8 days)

(Countdown updated)

claude-sonnet-4-20250514 and claude-opus-4-20250514 return errors June 15. Migrate to claude-sonnet-4-6-20260217 and claude-opus-4-8 respectively. Review the Opus 4.8 migration guide before upgrading — adaptive thinking replaces budget_tokens; setting temperature, top_p, or top_k to non-default values returns a 400 error.
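A minimal sketch of a request-rewriting shim for this migration, assuming your requests are plain dicts shaped like Messages API parameters. The model mapping is copied from this digest (including the Opus 4.1 retirement listed further down) and should be checked against Anthropic's deprecation page; `migrate_request` is a hypothetical helper. It removes temperature, top_p, and top_k for Opus 4.8 rather than guessing which values count as default, and it only flags `thinking.budget_tokens` instead of rewriting it, because the adaptive-thinking request shape isn't described here.

```python
import copy
from typing import Any, Dict, List, Tuple

# Deprecated ID -> replacement, as listed in this digest. Verify before relying on it.
REPLACEMENTS: Dict[str, str] = {
    "claude-sonnet-4-20250514": "claude-sonnet-4-6-20260217",
    "claude-opus-4-20250514": "claude-opus-4-8",
    "claude-opus-4-1-20250805": "claude-opus-4-8",
}
# Per the digest, non-default values for these return a 400 on Opus 4.8.
OPUS_48_REJECTED = ("temperature", "top_p", "top_k")


def migrate_request(params: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (new_params, notes) without mutating the caller's dict."""
    new = copy.deepcopy(params)
    notes: List[str] = []
    old_model = new.get("model")
    if old_model in REPLACEMENTS:
        new["model"] = REPLACEMENTS[old_model]
        notes.append(f"model: {old_model} -> {new['model']}")
    if new.get("model") == "claude-opus-4-8":
        for name in OPUS_48_REJECTED:
            if name in new:
                del new[name]
                notes.append(f"removed {name} (rejected on Opus 4.8 unless default)")
        thinking = new.get("thinking")
        if isinstance(thinking, dict) and "budget_tokens" in thinking:
            # Adaptive thinking replaces budget_tokens; the new shape is not described
            # in this digest, so flag it for a manual change instead of guessing.
            notes.append("thinking.budget_tokens present: migrate to adaptive thinking by hand")
    return new, notes
```

Sonnet requests keep their sampling parameters, since the digest only states the 400 behavior for Opus 4.8.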

Anthropic | Link: https://platform.claude.com/docs/en/about-claude/model-deprecations

⚠️⚠️ Gemini CLI Hard Stop — June 18 (11 days)

(Countdown updated)

Gemini CLI and the Gemini Code Assist IDE extensions stop serving requests for Google AI Pro, Ultra, and free personal users on June 18. The replacement is Antigravity CLI (agy). Audit CLI scripts and CI pipeline steps now; Antigravity CLI does not have 1:1 feature parity.

Google Developers Blog | Link: https://developers.googleblog.com/an-important-update-transitioning-gemini-cli-to-antigravity-cli/

⚠️⚠️ Gemini API Unrestricted Key Deadline — June 19 (12 days)

(Countdown updated)

All unrestricted Gemini API keys blocked June 19. Restrict via AI Studio → API Keys → "Restrict to Gemini API." Takes 2 minutes; no code changes required.

Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/api-key

⚠️ Gemini Image Models Shutdown — June 25 (18 days)

(Countdown updated)

gemini-3.1-flash-image-preview and gemini-3-pro-image-preview shutting down June 25, 2026. Migrate to stable image model equivalents before the shutdown date.

Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/deprecations

⚠️ GPT-4.5 Retirement from ChatGPT — June 27 (20 days)

(Countdown updated)

GPT-4.5 is being retired from the ChatGPT product surface on June 27; retirement of the API model is unconfirmed. Audit gpt-4.5 model identifiers in code.

OpenAI Platform Changelog | Link: https://platform.openai.com/docs/changelog

⚠️ Claude Opus 4.1 Retirement — August 5 (59 days)

(Countdown updated)

claude-opus-4-1-20250805 retires August 5. Migrate to claude-opus-4-8. Expect significant migration effort: Opus 4.1 predates 4.7, so see the June 6, 2026 digest for the full migration checklist, including breaking changes around adaptive thinking, sampling parameters, and tokenizer differences.

Anthropic | Link: https://platform.claude.com/docs/en/about-claude/model-deprecations

⚠️ OpenAI Reusable Prompts (`v1/prompts`) Shutdown — November 30 (176 days)

Deprecated June 3, shutdown November 30, 2026. Move prompt content to application code.

OpenAI | Link: https://developers.openai.com/api/docs/deprecations

⚠️ OpenAI Evals Platform Shutdown — November 30 (176 days)

Read-only October 31, shutdown November 30, 2026. Export eval configs before October 31.

OpenAI | Link: https://developers.openai.com/api/docs/deprecations

⚠️ OpenAI Agent Builder Shutdown — November 30 (176 days)

Shutdown November 30, 2026. Migrate to Agents SDK (openai.agents) or ChatGPT Workspace Agents.

OpenAI | Link: https://developers.openai.com/api/docs/deprecations

Claude Mythos — Public Release "Once Stronger Safeguards Ready"

(Carried — status unchanged)

No timeline given. Currently: no public API, no claude.ai access at any tier. Leads SWE-bench Verified at 93.9% (internal benchmark as of June 2, 2026).

Anthropic | Link: https://www.anthropic.com/news/expanding-project-glasswing

Gemini 3.5 Pro — Expected July 2026

(Carried — no official date)

Sundar Pichai stated "give us until next month" at Google I/O 2026 (May 19). No official announcement, pricing, model ID, or benchmark numbers.

Filtered from 30+ primary sources against a published quality rubric. No press releases, no fluff — only what changes what you build.
