AI Developer Digest

Mon, Jun 8, 2026

17 signals that cleared the gate | 21 min read

The Signal — start here

WWDC 2026 dominated today. Apple made its most developer-relevant AI platform move since the Foundation Models launch at WWDC25: iOS 27 introduces an Extensions framework (App Intents-based) that lets Claude, ChatGPT, and Gemini plug into Siri, Writing Tools, and Image Playground, in effect a system-level API for choosing an AI provider. The developer beta is available now to registered Apple Developer Program members. Siri itself runs on a custom 1.2T-parameter Google Gemini model, but the Extensions API is separate from that: it is a platform distribution channel for any AI provider that implements the required App Intent schemas. Outside WWDC, llama.cpp had an active patch day: video input support landed in the official repo (FFmpeg-based, model-agnostic), and RDNA3.5 (RX 9070 series) GPU support came to the HIP backend.

Must-reads today

Apple iOS 27 + Siri Extensions API — Extensions framework lets any AI app become a Siri/Writing Tools provider via App Intents; developer beta released today; Core AI replaces Core ML for on-device inference. Act now to stay ahead of September GA.

llama.cpp b9562: Video Input — Official video input support for multimodal inference in llama.cpp; requires FFmpeg; any mtmd-capable model (Qwen3-VL, Gemma-4, etc.) now accepts video via --video flag or base64 server API.

Breaking Changes

No new breaking changes this period.

(The Gemini Interactions API legacy schema removal — the major breaking change from yesterday — took effect at the start of June 8 UTC. If you haven't migrated from response.outputs to response.steps, act immediately. See the June 7, 2026 digest for the full migration checklist.)

Model Releases

No new model releases in this 24h period.

API & SDK Changes

No new API or SDK changes this period. See Quick Hits for llama.cpp patch-level fixes.

Research

Nothing cleared the quality bar this period. arXiv direct fetch returned 403 for June 8 submission listings; no papers from recognized labs with measurable benchmark numbers and associated code were confirmed within the June 7–8 window.

Tooling

High

Apple iOS 27 Developer Beta: Siri Extensions API, Core AI, and Foundation Models Multi-Provider Support

What changed

iOS 27 introduces the Extensions framework for Apple Intelligence — an App Intents-based API that lets any AI app (Claude, ChatGPT, Gemini, or any compliant provider) become a system-level AI integration point within Siri, Writing Tools, and Image Playground. Foundation Models now supports "any Language Model protocol-conforming provider," not just Apple's on-device model. Core AI replaces Core ML as the on-device inference framework. Developer Beta 1 was seeded the same afternoon as the WWDC keynote — it is available today.

TL;DR

iOS 27 (developer beta live) opens Apple's AI surface to third-party providers via an App Intents Extensions API; Foundation Models gains multi-provider protocol support; Core AI supersedes Core ML; Siri itself runs on a custom Google Gemini 1.2T-parameter model under a $1B/year licensing deal.

Developer signal

Three actions, depending on your situation:

(1) If you build iOS/macOS AI apps: Enroll in the Apple Developer Program and download iOS 27 Developer Beta today from developer.apple.com. The new IntelligenceExtension App Intent protocol (name based on current developer.apple.com documentation; check session videos for finalized API names) allows your app to register as an AI provider in Settings → Apple Intelligence & Siri. Users can then invoke your model through Siri, Writing Tools, and Image Playground without leaving Apple's native UX. This is platform-level distribution with zero API cost for the user interaction layer.

(2) If you use Foundation Models for on-device inference: The framework now accepts "any Language Model protocol-conforming provider", which means you can plug remote APIs (including Anthropic's) or custom on-device weights into the Foundation Models pipeline alongside Apple's built-in 3B model, which runs at ~30 t/s on iPhone 15 Pro. Review the new provider conformance requirements in the WWDC 2026 Foundation Models session.

(3) If you use Core ML: Begin auditing your Core ML usage now. Core AI replaces Core ML in iOS 27, and migration will be required before the Fall GA release (September 2026, alongside iPhone 18).

Full iOS 27 GA ships in Fall 2026; developer betas run from now until the public beta in July. The full Siri 2.0 experience (Gemini-powered) is independent of the Extensions API: whether Siri uses Gemini does not affect whether your app can be an Extension provider.
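The Core ML audit suggested in action (3) can start with a simple inventory of where a project touches Core ML today. The sketch below is one possible first pass, not an Apple-provided tool: the function name `audit_core_ml`, the list of source suffixes, and the import markers are illustrative assumptions, and a text match will miss usage hidden behind wrappers or third-party packages.

```python
from pathlib import Path

# Assumption: these suffixes and markers are a reasonable first-pass signal
# of Core ML usage; extend them to match your own project layout.
SOURCE_SUFFIXES = {".swift", ".m", ".mm", ".h"}
MODEL_SUFFIXES = {".mlmodel", ".mlpackage", ".mlmodelc"}
CODE_MARKERS = ("import CoreML", "@import CoreML", "#import <CoreML/")


def audit_core_ml(root):
    """Return (source files that import Core ML, bundled model artifacts).

    Paths are returned relative to `root`, in sorted order.
    """
    root = Path(root)
    sources, models = [], []
    for path in sorted(root.rglob("*")):
        suffix = path.suffix.lower()
        if suffix in MODEL_SUFFIXES:
            # .mlpackage and .mlmodelc are directories, so no is_file() check here.
            models.append(path.relative_to(root).as_posix())
        elif path.is_file() and suffix in SOURCE_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if any(marker in text for marker in CODE_MARKERS):
                sources.append(path.relative_to(root).as_posix())
    return sources, models
```

Calling `audit_core_ml(".")` from a repository root gives two lists to size the migration: code that imports the framework, and model files that may need converting or re-exporting.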

Affects you if: You are building iOS, iPadOS, or macOS apps that integrate AI models; you use Foundation Models or Core ML for on-device inference; you are an AI provider or developer looking for platform-native distribution on Apple hardware; you have CI/CD pipelines targeting the latest iOS SDK

Effort: Moderate (Extensions require implementing the new App Intent protocol schema; Core AI migration from Core ML required for on-device inference workloads before Fall 2026 GA)

Apple Developer | Date: June 8, 2026 | Links: https://developer.apple.com/ios/ and https://developer.apple.com/documentation/FoundationModels

Notable

llama.cpp b9562: Video Input Support Lands in Official Repo

What changed

PR #24269 adds mtmd_helper_video to the mtmd (multimodal) module, enabling video file input for any multimodal model running in llama.cpp. Previously, llama.cpp's mtmd system processed images only — there was no video path at all in the official repo.

TL;DR

llama.cpp b9562 adds native video input support (base64-encoded on the server, file paths on CLI via --video flag) using FFmpeg for frame extraction, tested with Qwen3-VL-2B and Gemma-4-E4B.

Developer signal

Update to b9562 and confirm FFmpeg is installed on your inference host. It is required: the implementation invokes FFmpeg as an external subprocess for frame decoding rather than bundling codecs. On the server (llama-server), send video as base64-encoded data in the /chat/completions request using the new MTMD_VIDEO content type. On the CLI (mtmd-cli), use the new --video flag with a file path. The implementation is model-agnostic: any model that accepts image frames through the mtmd system will accept video frames through this path, and no model-specific configuration beyond updating to b9562 is required.

Important limitations to plan around:

(1) Audio extraction is not implemented; video is treated as a sequence of image frames only.

(2) Full video frames are loaded into memory before processing, so budget accordingly for long or high-resolution video.

(3) For fine-grained control over how frames are merged, see the 3D convolution merge mechanism in the companion PR (referenced in #24269).

If you're building video-capable local inference pipelines, this removes the previous requirement to pre-extract frames using ffmpeg/OpenCV outside llama.cpp and feed them as images.
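Two of the points above lend themselves to a quick pre-flight check: whether FFmpeg is on PATH, and roughly how much memory the decoded frames could occupy. The sketch below is a back-of-the-envelope estimate, not a description of how llama.cpp allocates memory. The function names are hypothetical, and both the sampling rate (`sampled_fps`) and the 3 bytes per pixel (8-bit RGB) are assumptions, since the digest does not say how frames are sampled or stored.

```python
import shutil


def ffmpeg_available(binary="ffmpeg"):
    """Return True if an executable with this name is discoverable on PATH."""
    return shutil.which(binary) is not None


def estimate_frame_memory_bytes(width, height, duration_s, sampled_fps,
                                bytes_per_pixel=3):
    """Rough upper bound for holding every sampled frame in memory at once.

    Assumes uncompressed frames at `bytes_per_pixel` (3 = 8-bit RGB) and a
    fixed sampling rate; both are illustrative assumptions.
    """
    frame_count = int(duration_s * sampled_fps)
    return frame_count * width * height * bytes_per_pixel


if __name__ == "__main__":
    if not ffmpeg_available():
        print("FFmpeg not found on PATH; video input will not work.")
    budget = estimate_frame_memory_bytes(1920, 1080, duration_s=60, sampled_fps=2)
    print(f"Estimated frame memory: {budget / 2**20:.0f} MiB")
```

Under these assumptions, a 60-second 1920×1080 clip sampled at 2 frames per second comes to 746,496,000 bytes, or about 712 MiB, before any model memory is counted.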

Affects you if: You run multimodal models locally via llama.cpp and want to process video inputs directly without a separate frame extraction pipeline

Effort: Quick (update to b9562, ensure FFmpeg is on PATH, update API calls to use base64 video input; no architecture changes required)

ggml-org/llama.cpp | Date: June 8, 2026, 16:41 UTC | Links: https://github.com/ggml-org/llama.cpp/releases/tag/b9562 and https://github.com/ggml-org/llama.cpp/pull/24269

Notable

llama.cpp b9556: AMD RDNA3.5 GPU Support Added to HIP Backend

What changed

PR #24129 adds gfx1152 and gfx1153 to the RDNA3.5 device list in the HIP (ROCm) backend. These GPU IDs correspond to AMD's Radeon RX 9070 and RX 9070 XT, which were previously missing from the supported device list and fell back to CPU inference.

TL;DR

llama.cpp b9556 adds HIP backend support for AMD Radeon RX 9070 / 9070 XT (gfx1152 and gfx1153 — RDNA3.5 architecture), enabling ROCm-accelerated GPU inference on these cards for the first time.

Developer signal

If you are running llama.cpp on an AMD Radeon RX 9070 or RX 9070 XT, update to b9556 and rebuild with ROCm/HIP, targeting your GPU arch (-DAMDGPU_TARGETS=gfx1152 or gfx1153). Before this build, these specific RDNA3.5 GPU IDs were absent from the HIP device table, causing silent CPU fallback. No changes to your inference configuration, model files, or quantization setup are required beyond rebuilding. If you're on another RDNA3.5 variant not in the list, open an issue on the llama.cpp repo — this PR shows the pattern for adding new gfx targets.
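Because the failure mode described here is a silent CPU fallback, it helps to confirm which gfx target the machine actually reports before rebuilding. The sketch below parses text in the style printed by ROCm's `rocminfo` tool and assembles a CMake configure command. Treat it as an illustration under stated assumptions: the helper names are hypothetical, the use of `rocminfo` output is my suggestion rather than something the digest prescribes, and `-DGGML_HIP=ON` is the option name I recall from llama.cpp's build documentation, so verify it against the docs for your checkout. `-DAMDGPU_TARGETS` is the flag named above.

```python
import re

# From the digest's summary of PR #24129.
NEWLY_SUPPORTED = {"gfx1152", "gfx1153"}


def detect_gfx_targets(rocminfo_output):
    """Extract the unique gfx architecture names from rocminfo-style text."""
    return sorted(set(re.findall(r"\bgfx[0-9a-f]{3,5}\b", rocminfo_output)))


def hip_configure_command(gfx_target, build_dir="build"):
    """Build the argv list for a HIP-enabled CMake configure step.

    Assumption: GGML_HIP is the correct option name for your llama.cpp version.
    """
    return [
        "cmake", "-S", ".", "-B", build_dir,
        "-DGGML_HIP=ON",
        f"-DAMDGPU_TARGETS={gfx_target}",
        "-DCMAKE_BUILD_TYPE=Release",
    ]


if __name__ == "__main__":
    # Hypothetical excerpt; on a real host, capture the output of `rocminfo`.
    sample = "  Name:                    gfx1152\n  Name: amdgcn-amd-amdhsa--gfx1152\n"
    for target in detect_gfx_targets(sample):
        if target in NEWLY_SUPPORTED:
            print(" ".join(hip_configure_command(target)))
```

Returning an argv list rather than a shell string keeps the command safe to pass to `subprocess.run` without quoting concerns.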

Affects you if: You run llama.cpp on AMD Radeon RX 9070 or RX 9070 XT (gfx1152/gfx1153) with ROCm/HIP

Effort: Quick (update to b9556 and rebuild with HIP support for your GPU arch; no inference code changes required)

ggml-org/llama.cpp | Date: June 8, 2026, 13:48 UTC | Links: https://github.com/ggml-org/llama.cpp/releases/tag/b9556 and https://github.com/ggml-org/llama.cpp/pull/24129

Benchmarks & Leaderboards

No new leaderboard movements for June 7–8. SWE-bench Verified standings are unchanged (Claude Mythos Preview leading at 93.9%, Opus 4.8 at 88.6%). LMArena text and coding leaderboards show no new model entries in the scan window.

Trends & Emerging Tech

llama.cpp Video Input Completes the Multimodal Stack for Local Inference

What's happening

With b9562, llama.cpp's mtmd system now covers text, image, and video frame inputs via a unified local inference path. Audio (both standalone via whisper.cpp and embedded within video files) is the remaining gap. The video addition follows the same pattern as the image multimodal work: FFmpeg-based frame extraction was prototyped and discussed in issues and community forks before landing in the official repo. The implementation is intentionally model-agnostic — any multimodal model that accepted images now accepts video without additional configuration.

Why watch this

When audio-from-video lands (explicitly noted as a future iteration in PR #24269), the local inference stack will handle all major modalities — text, image, video, and audio — from a single llama.cpp server endpoint with no cloud dependency. That transition point matters for builders: recording-heavy workloads (call transcription, video analysis, document processing from screen recordings) that currently require cloud multimodal APIs become viable candidates for on-device or self-hosted pipelines. Watch for PRs on the llama.cpp repo that add audio extraction to the mtmd video path, and watch whether model cards for multimodal models start including explicit video benchmark numbers alongside image benchmarks as video support becomes standard.

ggml-org/llama.cpp | Date: June 8, 2026 | Link: https://github.com/ggml-org/llama.cpp/releases/tag/b9562

Technical Discussions

Nothing cleared the quality bar this period. No qualifying HN threads (score >200 with concrete technical depth) found for June 7–8. No qualifying posts from Nathan Lambert (last: June 1), Eugene Yan, or Sebastian Raschka in the scan window. Simon Willison's blog returned 403 on direct fetch; no qualifying posts confirmed via search.

Quick Hits

llama.cpp b9557 (June 8, 14:17 UTC) — CUDA: reset context after reading memory size; fixes buffer counting crash on affected CUDA setups. [https://github.com/ggml-org/llama.cpp/releases/tag/b9557]
llama.cpp b9558 (June 8, 14:44 UTC) — Vulkan: use cm2 decode_vector for mul_mat_id B matrix loads; performance improvement for matrix multiply in Vulkan inference. [https://github.com/ggml-org/llama.cpp/releases/tag/b9558]
llama.cpp b9564–b9565 (June 8) — WebGPU: 2D workgroups for scale/binary/unary ops (perf) + buffer overlap fix for concat operator (correctness). [https://github.com/ggml-org/llama.cpp/releases]
llama.cpp b9566 (June 8, 18:07 UTC) — Bug fix: SWA-only draft heads (StepFun MTP models) crash on load fixed; kq_mask buffer is now guarded independently per attention path. [https://github.com/ggml-org/llama.cpp/releases/tag/b9566]
llama.cpp b9567 (June 8, 19:02 UTC) — Server: skip parsing when flushing HTTP headers; fixes server response handling stability. [https://github.com/ggml-org/llama.cpp/releases/tag/b9567]

Worth Watching (Announced, Not Yet Shipped)

⚠️⚠️⚠️ Windows Local AI Runtime — KB5039239 TOMORROW June 9

(Countdown updated)

Windows Update KB5039239 delivers the expanded on-device AI stack (Aion 1.0 runtime, CPU/GPU/NPU support) tomorrow, June 9. Required for production use of Aion 1.0 Instruct and Aion 1.0 Plan on end-user devices. Aion 1.0 open weights land on Hugging Face in July.

Windows Developer Blog | Link: https://blogs.windows.com/windowsdeveloper/2026/06/02/build-2026-furthering-windows-as-the-trusted-platform-for-development/

⚠️⚠️⚠️ Claude Sonnet 4 + Opus 4 — Retirement June 15 (7 days)

(Countdown updated)

claude-sonnet-4-20250514 and claude-opus-4-20250514 will return errors starting June 15. Migrate to claude-sonnet-4-6-20260217 and claude-opus-4-8, respectively. Review the Opus 4.8 migration guide before upgrading: adaptive thinking replaces budget_tokens, and setting temperature, top_p, or top_k to non-default values returns a 400 error.
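The migration above amounts to a model ID swap plus two parameter checks, which can be sketched as a small transformation over request parameters. This is an illustrative helper, not part of Anthropic's SDK: `migrate_request` is a hypothetical name, the mapping comes from the IDs listed above, and the helper only flags `budget_tokens` for manual review rather than guessing at the adaptive-thinking configuration, which the migration guide should define.

```python
# Replacement IDs as listed in this digest entry.
MODEL_REPLACEMENTS = {
    "claude-sonnet-4-20250514": "claude-sonnet-4-6-20260217",
    "claude-opus-4-20250514": "claude-opus-4-8",
}
SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def migrate_request(params):
    """Return (migrated_params, notes) without mutating the input dict."""
    migrated = dict(params)
    notes = []

    old_model = migrated.get("model")
    if old_model in MODEL_REPLACEMENTS:
        migrated["model"] = MODEL_REPLACEMENTS[old_model]
        notes.append(f"model: {old_model} -> {migrated['model']}")

    if migrated.get("model") == "claude-opus-4-8":
        # Non-default sampling values are reported to return a 400, so drop
        # them and fall back to the API defaults.
        for name in SAMPLING_PARAMS:
            if name in migrated:
                del migrated[name]
                notes.append(f"removed {name}")
        thinking = migrated.get("thinking")
        if isinstance(thinking, dict) and "budget_tokens" in thinking:
            notes.append("thinking.budget_tokens needs manual review")

    return migrated, notes
```

Running every request template through a function like this in a test gives a list of notes to act on before the June 15 cutoff, without changing production behavior until each change has been reviewed.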

Anthropic | Link: https://platform.claude.com/docs/en/about-claude/model-deprecations

⚠️⚠️ Gemini CLI Hard Stop — June 18 (10 days)

(Countdown updated)

gemini CLI and Gemini Code Assist IDE extensions stop serving requests on June 18. Replacement is Antigravity CLI (agy). Audit CLI scripts and CI pipeline steps now — Antigravity CLI does not have 1:1 feature parity.

Google Developers Blog | Link: https://developers.googleblog.com/an-important-update-transitioning-gemini-cli-to-antigravity-cli/

⚠️⚠️ Gemini API Unrestricted Key Deadline — June 19 (11 days)

(Countdown updated)

All unrestricted Gemini API keys blocked June 19. Restrict via AI Studio → API Keys → "Restrict to Gemini API." Takes 2 minutes; no code changes required.

Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/api-key

⚠️ Gemini Image Models Shutdown — June 25 (17 days)

(Countdown updated)

gemini-3.1-flash-image-preview and gemini-3-pro-image-preview shutting down June 25, 2026. Migrate to stable image model equivalents.

Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/deprecations

⚠️ GPT-4.5 Retirement from ChatGPT — June 27 (19 days)

(Countdown updated)

GPT-4.5 is being retired from the ChatGPT product surface on June 27. Retirement of the direct API route is unconfirmed. Audit gpt-4.5 model identifiers in code.

OpenAI Platform Changelog | Link: https://platform.openai.com/docs/changelog

⚠️⚠️ Claude Opus 4.1 Retirement — August 5 (58 days)

(Countdown updated)

claude-opus-4-1-20250805 retires August 5. Migrate to claude-opus-4-8. Significant migration effort if coming from a pre-4.7 model — see June 6, 2026 digest for the full migration checklist including breaking changes around adaptive thinking, sampling parameters, and tokenizer differences.

Anthropic | Link: https://platform.claude.com/docs/en/about-claude/model-deprecations

⚠️ OpenAI Reusable Prompts (`v1/prompts`) Shutdown — November 30 (175 days)

Deprecated June 3, shutdown November 30, 2026. Move prompt content to application code.

OpenAI | Link: https://developers.openai.com/api/docs/deprecations

⚠️ OpenAI Evals Platform Shutdown — November 30 (175 days)

Read-only October 31, shutdown November 30, 2026. Export eval configs before October 31.

OpenAI | Link: https://developers.openai.com/api/docs/deprecations

⚠️ OpenAI Agent Builder Shutdown — November 30 (175 days)

Shutdown November 30, 2026. Migrate to Agents SDK (openai.agents) or ChatGPT Workspace Agents.

OpenAI | Link: https://developers.openai.com/api/docs/deprecations

Apple iOS 27 / macOS Golden Gate / Core AI GA — Fall 2026 (September, ~3 months)

NEW — Added June 8, 2026

iOS 27, iPadOS 27, and macOS Golden Gate ship with iPhone 18 in September 2026. Includes: Siri Extensions API (App Intents-based, third-party AI providers), Core AI (replaces Core ML), expanded Foundation Models multi-provider support. Developer Beta 1 available today. Public beta expected mid-July. Start auditing Core ML usage and planning Extensions integration now — 3 months to GA.

Apple Developer / WWDC 2026 | Link: https://developer.apple.com/ios/

Claude Mythos — Public Release "Once Stronger Safeguards Ready"

(Carried — status unchanged)

No timeline given. Currently: no public API, no claude.ai access at any tier. Leads SWE-bench Verified at 93.9% (internal benchmark as of June 2, 2026).

Anthropic | Link: https://www.anthropic.com/news/expanding-project-glasswing

Gemini 3.5 Pro — Expected July 2026

(Carried — no official date)

Sundar Pichai stated "give us until next month" at Google I/O 2026 (May 19). No official announcement, pricing, model ID, or benchmark numbers.
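The model retirement deadlines in this section can be checked against a codebase with a short scan. The sketch below searches text for the retiring identifiers named above and reports the days remaining, counted from this digest's date. The function name and return shape are illustrative assumptions; note that gpt-4.5 is included even though retirement of its API route is unconfirmed, and that a plain substring match also catches longer IDs that contain it.

```python
from datetime import date

DIGEST_DATE = date(2026, 6, 8)

# Identifiers and dates as listed in this section.
RETIRING = {
    "claude-sonnet-4-20250514": date(2026, 6, 15),
    "claude-opus-4-20250514": date(2026, 6, 15),
    "gemini-3.1-flash-image-preview": date(2026, 6, 25),
    "gemini-3-pro-image-preview": date(2026, 6, 25),
    "gpt-4.5": date(2026, 6, 27),  # ChatGPT surface only; API route unconfirmed
    "claude-opus-4-1-20250805": date(2026, 8, 5),
}


def find_retiring_models(text, today=DIGEST_DATE):
    """Return (line_number, model_id, days_left) tuples, most urgent first."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for model_id, deadline in RETIRING.items():
            if model_id in line:
                findings.append((line_no, model_id, (deadline - today).days))
    return sorted(findings, key=lambda f: (f[2], f[0]))


if __name__ == "__main__":
    sample = 'MODEL = "claude-opus-4-1-20250805"\nFALLBACK = "claude-sonnet-4-20250514"\n'
    for line_no, model_id, days_left in find_retiring_models(sample):
        print(f"line {line_no}: {model_id} retires in {days_left} days")
```

Passing `today=date.today()` instead of the default keeps the countdown current when the scan runs in CI.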

Filtered from 30+ primary sources against a published quality rubric. No press releases, no fluff — only what changes what you build.
