📡

AI Developer Digest

Mon, Jun 8, 2026

3 items passed quality gate | ~40 scanned | ~37 excluded | Sources checked: 25 | Scan window: June 7–8, 2026 (24h)

Prior digest covered: Gemini Interactions API legacy schema removal (June 8); llama.cpp b9544 and b9547–b9551 (including b9549 Gemma4 MTP); Claude Code v2.1.168; anthropic-sdk-python v0.107.1.


This Week's Signal

WWDC 2026 dominated today, with Apple's most developer-relevant AI platform move since the Foundation Models launch at WWDC25. iOS 27 introduces an Extensions framework (App Intents-based) that lets Claude, ChatGPT, and Gemini plug into Siri, Writing Tools, and Image Playground, in effect a system-level API for choosing an AI provider. The developer beta is available now for registered Apple Developer Program members. Siri itself runs on a custom 1.2T-parameter Google Gemini model, but the Extensions API is separate from that: it is a platform distribution channel for any AI provider that implements the required App Intent schemas. Outside WWDC, llama.cpp had an active patch day: video input support landed in the official repo (FFmpeg-based, model-agnostic), and RDNA3.5 (RX 9070 series) GPU support came to the HIP backend.

Must-reads this digest:

  • Apple iOS 27 + Siri Extensions API — Extensions framework lets any AI app become a Siri/Writing Tools provider via App Intents; developer beta released today; Core AI replaces Core ML for on-device inference. Act now to stay ahead of September GA.
  • llama.cpp b9562: Video Input — Official video input support for multimodal inference in llama.cpp; requires FFmpeg; any mtmd-capable model (Qwen3-VL, Gemma-4, etc.) now accepts video via --video flag or base64 server API.

[BREAKING] Breaking Changes

No breaking changes this period.

(The Gemini Interactions API legacy schema removal — the major breaking change from yesterday — took effect at the start of June 8 UTC. If you haven't migrated from response.outputs to response.steps, act immediately. See the June 7, 2026 digest for the full migration checklist.)


Model Releases

No new model releases in this 24h period.


API & SDK Changes

No new API or SDK changes this period. See Tooling for the llama.cpp b9556 and b9562 releases.


Research

Nothing cleared the quality bar this period. arXiv direct fetch returned 403 for June 8 submission listings; no papers from recognized labs with measurable benchmark numbers and associated code were confirmed within the June 7–8 window.


Tooling

[HIGH] Apple iOS 27 Developer Beta: Siri Extensions API, Core AI, and Foundation Models Multimodal Support

Source: Apple Developer | Date: June 8, 2026 | Link: https://developer.apple.com/ios/

What changed: iOS 27 introduces the Extensions framework for Apple Intelligence, an App Intents-based API that lets any AI app (Claude, ChatGPT, Gemini, or any compliant provider) become a system-level AI integration point within Siri, Writing Tools, and Image Playground. Foundation Models now supports "any Language Model protocol-conforming provider," not just Apple's on-device model. Core AI replaces Core ML as the on-device inference framework. Developer Beta 1 was seeded the same afternoon as the WWDC keynote and is available today.

TL;DR: iOS 27 (developer beta live) opens Apple's AI surface to third-party providers via an App Intents Extensions API; Foundation Models gains multi-provider protocol support; Core AI supersedes Core ML; Siri itself runs on a custom Google Gemini 1.2T-parameter model under a $1B/year licensing deal.

Developer signal: Three actions, depending on your situation:

(1) If you build iOS/macOS AI apps: Enroll in the Apple Developer Program and download iOS 27 Developer Beta today from developer.apple.com. The new IntelligenceExtension App Intent protocol (name based on current developer.apple.com documentation; check session videos for finalized API names) allows your app to register as an AI provider in Settings → Apple Intelligence & Siri. Users can then invoke your model through Siri, Writing Tools, and Image Playground without leaving Apple's native UX. This is platform-level distribution with zero API cost for the user interaction layer.

(2) If you use Foundation Models for on-device inference: The framework now accepts "any Language Model protocol-conforming provider," which means you can plug remote APIs (including Anthropic's) or custom on-device weights into the Foundation Models pipeline alongside Apple's built-in 3B model, which runs at ~30 t/s on iPhone 15 Pro. Review the new provider conformance requirements in the WWDC 2026 Foundation Models session.

(3) If you use Core ML: Begin auditing your Core ML usage now. Core AI replaces Core ML in iOS 27, and migration will be required before the Fall GA release (September 2026, alongside iPhone 18).

Full iOS 27 GA ships in Fall 2026; developer betas run from now until the public beta in July. The full Siri 2.0 experience (Gemini-powered) is independent of the Extensions API: whether or not Siri uses Gemini does not affect whether your app can be an Extension provider.

Affects you if: You are building iOS, iPadOS, or macOS apps that integrate AI models; you use Foundation Models or Core ML for on-device inference; you are an AI provider or developer looking for platform-native distribution on Apple hardware; you have CI/CD pipelines targeting the latest iOS SDK.

Adoption effort: Moderate (Extensions require implementing the new App Intent protocol schema; Core AI migration from Core ML required for on-device inference workloads before Fall 2026 GA)

Primary source: https://developer.apple.com/documentation/FoundationModels

Quality gate score: 8 (Apple Developer Platform, official primary source +3; specific API names and developer beta release with SDK documentation +2; primary developer.apple.com source links confirmed +2; within 24h scan window +1)
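For action (3), a first pass at the Core ML audit can be as simple as listing which source files reference the framework. The sketch below is a minimal example, not an official migration tool: the function name `find_coreml_usage` and the choice of patterns are assumptions made for illustration, and the `MLModel` match is a loose heuristic that may produce false positives.

```python
import re
from pathlib import Path

# Heuristic patterns suggesting a source file depends on Core ML.
# "import CoreML" (Swift) and "#import <CoreML/...>" (Objective-C) are the
# usual ways to pull in the framework; the MLModel match is a loose hint.
COREML_PATTERNS = [
    re.compile(r"^\s*@?import\s+CoreML\b"),
    re.compile(r"#import\s+<CoreML/"),
    re.compile(r"\bMLModel\b"),
]
SOURCE_SUFFIXES = {".swift", ".m", ".mm", ".h"}


def find_coreml_usage(root):
    """Return {relative_path: [line numbers]} for files that reference Core ML."""
    root = Path(root)
    hits = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        lines = [
            number
            for number, line in enumerate(text.splitlines(), start=1)
            if any(pattern.search(line) for pattern in COREML_PATTERNS)
        ]
        if lines:
            hits[str(path.relative_to(root))] = lines
    return hits
```

Running `find_coreml_usage("path/to/your/app")` gives a per-file list of line numbers to review once Apple's Core AI migration documentation is available.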


[NOTABLE] llama.cpp b9562: Video Input Support Lands in Official Repo

Source: ggml-org/llama.cpp | Date: June 8, 2026, 16:41 UTC | Link: https://github.com/ggml-org/llama.cpp/releases/tag/b9562

What changed: PR #24269 adds mtmd_helper_video to the mtmd (multimodal) module, enabling video file input for any multimodal model running in llama.cpp. Previously, llama.cpp's mtmd system processed images only; the official repo had no video path at all.

TL;DR: llama.cpp b9562 adds native video input support (base64-encoded on the server, file paths on the CLI via the --video flag) using FFmpeg for frame extraction, tested with Qwen3-VL-2B and Gemma-4-E4B.

Developer signal: Update to b9562 and confirm FFmpeg is installed on your inference host. It is required: the implementation invokes FFmpeg as an external subprocess for frame decoding rather than bundling codecs. On the server (llama-server), send video as base64-encoded data in the /chat/completions request using the new MTMD_VIDEO content type. On the CLI (mtmd-cli), use the new --video flag with a file path. The implementation is model-agnostic: any model that accepts image frames through the mtmd system will accept video frames through this path, and no model-specific configuration beyond updating to b9562 is required.

Important limitations to plan around:

(1) Audio extraction is not implemented; video is treated as a sequence of image frames only.

(2) Full video frames are loaded into memory before processing, so budget accordingly for long or high-resolution video.

(3) For fine-grained control over how frames are merged, see the 3D convolution merge mechanism in the companion PR (referenced in #24269).

If you're building video-capable local inference pipelines, this removes the previous requirement to pre-extract frames using ffmpeg/OpenCV outside llama.cpp and feed them as images.

Affects you if: You run multimodal models locally via llama.cpp and want to process video inputs directly without a separate frame extraction pipeline.

Adoption effort: Quick (update to b9562, ensure FFmpeg is on PATH, update API calls to use base64 video input; no architecture changes required)

Primary source: https://github.com/ggml-org/llama.cpp/pull/24269

Quality gate score: 8 (official GitHub primary source +3, concrete technical implementation details and tested model list +2, primary PR link +2, within scan window +1)
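Limitation (2) is easy to underestimate, so a rough sizing pass before sending video is worthwhile. The sketch below checks that FFmpeg is on PATH, estimates memory for decoded frames, and base64-encodes raw bytes for the server path. It rests on assumptions not stated in the release notes: frames are treated as uncompressed 8-bit RGB, and `sampled_fps` is a hypothetical sampling rate (the digest does not describe how llama.cpp chooses frames). All helper names are illustrative.

```python
import base64
import shutil


def ffmpeg_available():
    """True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


def estimate_frame_memory_bytes(width, height, duration_s, sampled_fps, channels=3):
    """Rough size of all sampled frames held in memory at once.

    Assumes uncompressed 8-bit pixels (one byte per channel); the real
    in-memory layout inside llama.cpp may differ.
    """
    frame_count = int(duration_s * sampled_fps)
    return frame_count * width * height * channels


def encode_video_base64(data):
    """Base64-encode raw video bytes as an ASCII string for a JSON request body."""
    return base64.b64encode(data).decode("ascii")


if __name__ == "__main__":
    print("ffmpeg on PATH:", ffmpeg_available())
    size = estimate_frame_memory_bytes(1920, 1080, duration_s=60, sampled_fps=1)
    print(f"1080p, 60 s at 1 frame/s: {size / 1e6:.0f} MB of raw frames")
```

Under these assumptions, one minute of 1080p video sampled at one frame per second already needs roughly 373 MB for raw frames, which is why long or high-resolution clips deserve a budget check first.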


[NOTABLE] llama.cpp b9556: AMD RDNA3.5 GPU Support Added to HIP Backend

Source: ggml-org/llama.cpp | Date: June 8, 2026, 13:48 UTC | Link: https://github.com/ggml-org/llama.cpp/releases/tag/b9556

What changed: PR #24129 adds gfx1152 and gfx1153 to the RDNA3.5 device list in the HIP (ROCm) backend. These GPU IDs correspond to AMD's Radeon RX 9070 and RX 9070 XT, which were previously missing from the supported device list and fell back to CPU inference.

TL;DR: llama.cpp b9556 adds HIP backend support for AMD Radeon RX 9070 / 9070 XT (gfx1152 and gfx1153, RDNA3.5 architecture), enabling ROCm-accelerated GPU inference on these cards for the first time.

Developer signal: If you are running llama.cpp on an AMD Radeon RX 9070 or RX 9070 XT, update to b9556 and rebuild with ROCm/HIP, targeting your GPU arch (-DAMDGPU_TARGETS=gfx1152 or gfx1153). Before this build, these specific RDNA3.5 GPU IDs were absent from the HIP device table, causing silent CPU fallback. No changes to your inference configuration, model files, or quantization setup are required beyond rebuilding. If you're on another RDNA3.5 variant not in the list, open an issue on the llama.cpp repo; this PR shows the pattern for adding new gfx targets.

Affects you if: You run llama.cpp on AMD Radeon RX 9070 or RX 9070 XT (gfx1152/gfx1153) with ROCm/HIP.

Adoption effort: Quick (update to b9556 and rebuild with HIP support for your GPU arch; no inference code changes required)

Primary source: https://github.com/ggml-org/llama.cpp/pull/24129

Quality gate score: 7 (official GitHub primary source +3, specific GPU IDs and technical path +2, within scan window +1, hardware-specific detail +1)
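To check whether a host is affected, you can look for the two gfx targets named above in the output of a ROCm diagnostic tool such as `rocminfo`. The sketch below only parses text you pass in; it assumes the tool prints architecture names in the usual `gfxNNNN` form, and the helper names are hypothetical.

```python
import re

# gfx targets that this digest reports as added to the HIP device list in b9556.
NEWLY_SUPPORTED = {"gfx1152", "gfx1153"}


def gfx_targets(tool_output):
    """Extract distinct gfx architecture names (e.g. 'gfx1152') from text."""
    return sorted(set(re.findall(r"\bgfx[0-9a-f]{3,5}\b", tool_output)))


def newly_supported_targets(tool_output):
    """Return the detected targets that b9556 is reported to add."""
    return [target for target in gfx_targets(tool_output) if target in NEWLY_SUPPORTED]


if __name__ == "__main__":
    sample = "  Name:                    gfx1152\n  Name: amdgcn-amd-amdhsa--gfx1152\n"
    for target in newly_supported_targets(sample):
        # The flag name is the one quoted in the item above.
        print(f"rebuild with -DAMDGPU_TARGETS={target}")
```

If the list comes back empty on a machine you expected to match, compare the raw tool output by hand before assuming the GPU is unsupported.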


Benchmarks & Leaderboards

No new leaderboard movements for June 7–8. SWE-bench Verified standings are unchanged (Claude Mythos Preview leading at 93.9%, Opus 4.8 at 88.6%). LMArena text and coding leaderboards show no new model entries in the scan window.


Trends & Emerging Tech

llama.cpp Video Input Completes the Multimodal Stack for Local Inference

Source: ggml-org/llama.cpp | Date: June 8, 2026 | Link: https://github.com/ggml-org/llama.cpp/releases/tag/b9562

What's happening: With b9562, llama.cpp's mtmd system now covers text, image, and video frame inputs via a unified local inference path. Audio (both standalone via whisper.cpp and embedded within video files) is the remaining gap. The video addition follows the same pattern as the image multimodal work: FFmpeg-based frame extraction was prototyped and discussed in issues and community forks before landing in the official repo. The implementation is intentionally model-agnostic: any multimodal model that accepted images now accepts video without additional configuration.

Why watch this: When audio-from-video lands (explicitly noted as a future iteration in PR #24269), the local inference stack will handle all major modalities (text, image, video, and audio) from a single llama.cpp server endpoint with no cloud dependency. That transition point matters for builders: recording-heavy workloads (call transcription, video analysis, document processing from screen recordings) that currently require cloud multimodal APIs become viable candidates for on-device or self-hosted pipelines. Watch for PRs on the llama.cpp repo that add audio extraction to the mtmd video path, and watch whether model cards for multimodal models start including explicit video benchmark numbers alongside image benchmarks as video support becomes standard.


Technical Discussions

Nothing cleared the quality bar this period. No qualifying HN threads (score >200 with concrete technical depth) found for June 7–8. No qualifying posts from Nathan Lambert (last: June 1), Eugene Yan, or Sebastian Raschka in the scan window. Simon Willison's blog returned 403 on direct fetch; no qualifying posts confirmed via search.


Quick Hits


Worth Watching (Announced, Not Yet Shipped)

⚠️⚠️⚠️ Windows Local AI Runtime — KB5039239 TOMORROW June 9

(Countdown updated) Source: Windows Developer Blog | Link: https://blogs.windows.com/windowsdeveloper/2026/06/02/build-2026-furthering-windows-as-the-trusted-platform-for-development/ Windows Update KB5039239 delivers the expanded on-device AI stack (Aion 1.0 runtime, CPU/GPU/NPU support) tomorrow, June 9. Required for production use of Aion 1.0 Instruct and Aion 1.0 Plan on end-user devices. Aion 1.0 open weights land on Hugging Face in July.

⚠️⚠️⚠️ Claude Sonnet 4 + Opus 4 — Retirement June 15 (7 days)

(Countdown updated) Source: Anthropic | Link: https://platform.claude.com/docs/en/about-claude/model-deprecations claude-sonnet-4-20250514 and claude-opus-4-20250514 return errors June 15. Migrate to claude-sonnet-4-6-20260217 and claude-opus-4-8 respectively. Review the Opus 4.8 migration guide before upgrading — adaptive thinking replaces budget_tokens; setting temperature, top_p, or top_k to non-default values returns a 400 error.
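One way to avoid the 400 error described above is to strip sampling parameters from request settings before pointing them at Opus 4.8. The sketch below works on a plain dictionary and makes no API calls. The function name is hypothetical, and it only flags `budget_tokens` rather than rewriting it, because the replacement shape for adaptive thinking is not given in this digest; consult the migration guide for that.

```python
# Sampling parameters that this digest says trigger a 400 error on Opus 4.8
# when set to non-default values.
SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def prepare_opus_4_8_params(params):
    """Return (cleaned_params, notes) without mutating the input dict.

    Sampling keys are removed outright so the API defaults apply, which is
    more conservative than checking whether each value equals its default.
    """
    cleaned = dict(params)
    notes = []
    for key in SAMPLING_KEYS:
        if key in cleaned:
            cleaned.pop(key)
            notes.append(f"removed {key}")
    thinking = cleaned.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        # Not rewritten here: the adaptive thinking shape is an open detail.
        notes.append("thinking.budget_tokens present; review the migration guide")
    return cleaned, notes
```

Logging the returned notes during a staged rollout makes it clear which call sites relied on custom sampling settings.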

⚠️⚠️ Gemini CLI Hard Stop — June 18 (10 days)

(Countdown updated) Source: Google Developers Blog | Link: https://developers.googleblog.com/an-important-update-transitioning-gemini-cli-to-antigravity-cli/ gemini CLI and Gemini Code Assist IDE extensions stop serving requests on June 18. Replacement is Antigravity CLI (agy). Audit CLI scripts and CI pipeline steps now — Antigravity CLI does not have 1:1 feature parity.

⚠️⚠️ Gemini API Unrestricted Key Deadline — June 19 (11 days)

(Countdown updated) Source: Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/api-key All unrestricted Gemini API keys blocked June 19. Restrict via AI Studio → API Keys → "Restrict to Gemini API." Takes 2 minutes; no code changes required.

⚠️ Gemini Image Models Shutdown — June 25 (17 days)

(Countdown updated) Source: Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/deprecations gemini-3.1-flash-image-preview and gemini-3-pro-image-preview shutting down June 25, 2026. Migrate to stable image model equivalents.

⚠️ GPT-4.5 Retirement from ChatGPT — June 27 (19 days)

(Countdown updated) Source: OpenAI Platform Changelog | Link: https://platform.openai.com/docs/changelog GPT-4.5 being retired from the ChatGPT product surface on June 27. Direct API route retirement unconfirmed. Audit gpt-4.5 model identifiers in code.

⚠️⚠️ Claude Opus 4.1 Retirement — August 5 (58 days)

(Countdown updated) Source: Anthropic | Link: https://platform.claude.com/docs/en/about-claude/model-deprecations claude-opus-4-1-20250805 retires August 5. Migrate to claude-opus-4-8. Significant migration effort if coming from a pre-4.7 model — see June 6, 2026 digest for the full migration checklist including breaking changes around adaptive thinking, sampling parameters, and tokenizer differences.
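The model retirements listed in this section lend themselves to a simple identifier audit. The sketch below scans text for the retiring identifiers and reports the replacement where this digest names one. The mapping covers only identifiers quoted in this section, `None` marks cases where no specific replacement is named, and the plain substring match (for example, `gpt-4.5` also matching `gpt-4.5-preview`) is a deliberate simplification.

```python
# Retiring identifier -> replacement named in this digest (None if unnamed).
RETIRING_MODELS = {
    "claude-sonnet-4-20250514": "claude-sonnet-4-6-20260217",
    "claude-opus-4-20250514": "claude-opus-4-8",
    "claude-opus-4-1-20250805": "claude-opus-4-8",
    "gemini-3.1-flash-image-preview": None,
    "gemini-3-pro-image-preview": None,
    "gpt-4.5": None,
}


def find_retiring_models(text):
    """Return a list of (line_number, model_id, replacement) found in text."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for model, replacement in RETIRING_MODELS.items():
            if model in line:
                findings.append((number, model, replacement))
    return findings


if __name__ == "__main__":
    config = 'MODEL = "claude-opus-4-20250514"\nFALLBACK = "gpt-4.5-preview"\n'
    for number, model, replacement in find_retiring_models(config):
        target = replacement or "no replacement named in the digest"
        print(f"line {number}: {model} -> {target}")
```

Feeding each config or source file through `find_retiring_models` before the earliest deadline gives a checklist of call sites to update.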

⚠️ OpenAI Reusable Prompts (v1/prompts) Shutdown — November 30 (175 days)

Source: OpenAI | Link: https://developers.openai.com/api/docs/deprecations Deprecated June 3, shutdown November 30, 2026. Move prompt content to application code.

⚠️ OpenAI Evals Platform Shutdown — November 30 (175 days)

Source: OpenAI | Link: https://developers.openai.com/api/docs/deprecations Read-only October 31, shutdown November 30, 2026. Export eval configs before October 31.

⚠️ OpenAI Agent Builder Shutdown — November 30 (175 days)

Source: OpenAI | Link: https://developers.openai.com/api/docs/deprecations Shutdown November 30, 2026. Migrate to Agents SDK (openai.agents) or ChatGPT Workspace Agents.
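The day counts in the headings above are plain date differences from the digest date, and recomputing them avoids stale countdowns when an entry is carried forward. A minimal sketch, using only dates stated in this section (the labels are shortened for illustration):

```python
from datetime import date

DIGEST_DATE = date(2026, 6, 8)

# Deadlines as stated in the entries above.
DEADLINES = {
    "Windows KB5039239": date(2026, 6, 9),
    "Claude Sonnet 4 / Opus 4 retirement": date(2026, 6, 15),
    "Gemini CLI hard stop": date(2026, 6, 18),
    "Gemini API unrestricted keys blocked": date(2026, 6, 19),
    "Gemini image preview models shutdown": date(2026, 6, 25),
    "GPT-4.5 retired from ChatGPT": date(2026, 6, 27),
    "Claude Opus 4.1 retirement": date(2026, 8, 5),
    "OpenAI v1/prompts, Evals, Agent Builder shutdown": date(2026, 11, 30),
}


def days_until(deadline, today=DIGEST_DATE):
    """Whole days from `today` to `deadline` (negative once it has passed)."""
    return (deadline - today).days


def countdown(today=DIGEST_DATE):
    """Return (days_remaining, label) pairs, soonest first."""
    return sorted((days_until(d, today), name) for name, d in DEADLINES.items())


if __name__ == "__main__":
    for days, name in countdown():
        print(f"{days:>4} days  {name}")
```

Passing a different `today` value regenerates the list for a later digest without editing each heading by hand.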

Apple iOS 27 / macOS Golden Gate / Core AI GA — Fall 2026 (September, ~3 months)

NEW — Added June 8, 2026 Source: Apple Developer / WWDC 2026 | Link: https://developer.apple.com/ios/ iOS 27, iPadOS 27, and macOS Golden Gate ship with iPhone 18 in September 2026. Includes: Siri Extensions API (App Intents-based, third-party AI providers), Core AI (replaces Core ML), expanded Foundation Models multi-provider support. Developer Beta 1 available today. Public beta expected mid-July. Start auditing Core ML usage and planning Extensions integration now — 3 months to GA.

Claude Mythos — Public Release "Once Stronger Safeguards Ready"

(Carried — status unchanged) Source: Anthropic | Link: https://www.anthropic.com/news/expanding-project-glasswing No timeline given. Currently: no public API, no claude.ai access at any tier. Leads SWE-bench Verified at 93.9% (internal benchmark as of June 2, 2026).

Gemini 3.5 Pro — Expected July 2026

(Carried — no official date) Sundar Pichai stated "give us until next month" at Google I/O 2026 (May 19). No official announcement, pricing, model ID, or benchmark numbers.


<details> <summary>🔭 Horizon — Open Questions, Emerging Patterns & Grounded Speculation</summary>

This section operates under different rules than the digest above. Evidence-grounded speculation is allowed. Pure prediction is not. Every claim here must cite a source from this digest or a real paper/benchmark. Label each entry by type so the reader knows what kind of thinking they're engaging with.

[PATTERN] OS platforms are each picking a default AI model while opening extension/marketplace channels for competitors

iOS 27 ships with Gemini as the Siri default (custom 1.2T-parameter model, $1B/year licensing deal) while simultaneously opening an App Intents Extensions API for Claude, ChatGPT, and other providers to integrate at the system level. Windows AI Runtime (KB5039239, arriving tomorrow, June 9) makes Aion 1.0 the on-device default, with open weights coming in July. Google replaced its own CLI tooling (Gemini CLI) with Antigravity CLI while maintaining an open Gemini API for third parties. The pattern across all three platforms is identical: pick a default (for revenue, integration quality, or both), then open a marketplace to avoid antitrust pressure and attract the developer ecosystem. The risk for non-default AI providers mirrors the App Store dynamic from 2010: marketplace placement requires active opt-in from users, while the default gets surface area at every entry point.

Grounded in: Apple iOS 27 WWDC announcement (this digest, Tooling); Windows KB5039239 (Worth Watching, this digest and prior digests); Gemini CLI → Antigravity migration (June 7 digest)

[TENSION] Apple's "any Language Model protocol-conforming provider" in Foundation Models vs. Gemini exclusivity for the default Siri experience

The Foundation Models framework documentation (developer.apple.com) confirms support for "any Language Model protocol-conforming provider," meaning third-party models including Anthropic's can be wired into the on-device Foundation Models pipeline for app-level inference. But the default Siri experience, the one that runs when a user talks to Siri, runs on a custom 1.2T-parameter Google Gemini model under a multi-year exclusivity deal. These two things coexist without contradiction: Foundation Models for apps is open, Siri's core model is closed. The Extensions API is a third layer: it allows other providers to handle user-initiated queries routed through Siri's UI, but the routing choice is always the user's, not the developer's. The unresolved question is what happens if Gemini underperforms a competitor Extension in user feedback: does Apple's licensing deal protect Gemini's default position even if users consistently route to Claude or ChatGPT Extensions? Watch whether iOS 27.x updates touch the default AI provider setting or make the Extensions choice more or less prominent in onboarding.

Grounded in: Apple iOS 27 WWDC announcement (this digest, Tooling); Foundation Models documentation at developer.apple.com (this digest)

[IF THIS CONTINUES] At the current pace of local multimodal capability expansion in llama.cpp, on-device video analysis becomes viable for production use within 2-3 builds

As of b9562, llama.cpp handles text, image, and video frames from a single server endpoint. The remaining gaps are: (1) audio extraction from video (explicitly deferred in PR #24269 to a future PR), (2) memory optimization for long videos (full frame loading is the current approach), and (3) stable benchmarks for video-specific tasks. If the audio PR lands within the next few weeks, consistent with the pace of other mtmd additions, and frame-selective sampling is added for long videos, a self-hosted llama.cpp endpoint will cover all four modalities (text, image, video, audio) that currently require cloud multimodal APIs. The implication for cost: local text generation with multimodal models runs at roughly 30 t/s on an M-series MacBook; even conservatively assuming 3x lower throughput (about 10 t/s) for video-heavy inference, the cost per processed video-minute on owned hardware becomes negligible compared with per-token cloud API pricing for video. Watch for PRs labeled mtmd: audio or mtmd: video optimizations in the ggml-org/llama.cpp repository.

Grounded in: llama.cpp b9562 video input (this digest, Tooling); llama.cpp b9549 Gemma4 MTP 40% throughput gain (June 7 digest, for context on local inference performance trajectory)

[OPEN QUESTION] Will Apple's Language Model protocol-conforming provider interface become a public API, or remain internal?

The Foundation Models framework says it supports "any Language Model protocol-conforming provider," but developer.apple.com documentation does not yet confirm whether this protocol is public-facing (importable by third-party apps to route calls to non-Apple models) or internal-facing (used by Apple's infrastructure to manage the Gemini/on-device model split). The distinction matters enormously: if public, iOS app developers can implement the protocol to use Anthropic, OpenAI, or their own fine-tuned models inside Foundation Models' inference pipeline, inheriting the framework's Swift concurrency integration, structured output support, and Private Cloud Compute privacy guarantees. If internal-only, the "any provider" language is effectively a future-proofing note for Apple's own roadmap, not a third-party developer surface. Watch the WWDC 2026 Foundation Models session video (expected on developer.apple.com within 24–48 hours) for confirmation of whether the protocol has a public API declaration.

Grounded in: Apple iOS 27 Foundation Models framework documentation (this digest, Tooling); Foundation Models framework original launch at WWDC 2025 (background context)

[BUILDER'S ANGLE] The iOS 27 Extensions API creates a new distribution channel for AI applications that bypasses App Store discovery entirely

If the Extensions API works as announced (users configure their preferred AI provider in Settings, and Siri routes queries to it system-wide), then an AI app that becomes a registered Extension gets surface area in every Apple Intelligence context on the user's device: keyboard suggestions, Writing Tools, Spotlight, Siri voice queries, Image Playground. This is qualitatively different from competing for search ranking in the App Store. It's closer to how browser extensions work on desktop: once installed and configured, they intercept the platform's standard UX flows rather than waiting for direct user invocation. For developers building AI apps on iOS, the strategic question is not just "does my app have good App Store ratings" but "can my app implement the Extensions protocol well enough to be the user's configured default." The winner of that choice gets passive, ambient usage without requiring explicit app opens, a fundamentally different monetization and engagement model than the current "open the app to use the AI" pattern.

Grounded in: Apple iOS 27 Extensions API announcement (this digest, Tooling); Foundation Models "any Language Model protocol-conforming provider" documentation (this digest)

</details>

Excluded: ~37 items below quality gate threshold, outside scan window, or duplicate coverage. Near-misses: OpenAI prompt_cache_retention default change to 24h (date unconfirmed within June 7–8 window — could not fetch changelog directly; flagged for re-check if confirmed); vLLM (GitHub releases page returning 2025 data for the third consecutive scan — persistent display issue, no confirmed June 2026 release verified); arXiv June 8 submissions (direct fetch 403, no confirmed papers from recognized labs with code and benchmark numbers via search for the June 7–8 window); ToolChoiceConfusion paper arXiv 2606.06284 (submitted June 4 — outside window); TokenMizer paper arXiv 2606.06337 (date unconfirmed within window); Nathan Lambert "My bets on open models, mid-2026" (publication date unconfirmed for June 7–8 window, Substack 403 on direct fetch); Simon Willison (blog 403 on direct fetch, no qualifying posts confirmed via search for June 7–8); LMArena text leaderboard (no new model entries June 7–8); SWE-bench Verified (no new entries in window); llama.cpp b9559 (CLI spinner fix — UI-only, not developer-relevant); llama.cpp b9561 (ggml sync — infra update only); Unsloth (repo updated June 8, last release v2026.5.x — no new June release confirmed); HuggingFace Transformers v5.8.0 (May 5 — outside window); Mistral (last release March 2026 — outside window); Meta AI (nothing in window); Groq (no technical release in window); Together AI / Fireworks AI / Modal (nothing in window); AWS ML Blog / Azure AI (no developer-relevant technical items in window); NVIDIA developer blog (no June 7–8 items); xAI (403 on direct fetch, no confirmed June 7–8 technical release via search); Ollama (no specific June 7–8 release confirmed); Cohere (nothing in window); HuggingFace Papers Daily (403 on direct fetch, no June 7–8 papers confirmed via search); Apple WWDC marketing/non-technical coverage (excluded per sources file — general audience framing).
