
AI Developer Digest

Fri, Jun 12, 2026

6 items passed quality gate | ~35 scanned | ~29 excluded | Sources checked: 30 | Scan window: June 11–12, 2026 (24h)
Prior digest (June 11) covered: LMArena Fable 5 entry (Text/Code/Document/Vision/Agent Arena, June 10), LiteLLM v1.87.2 stable (Fable 5 backport to 1.87.x branch), and llama.cpp b9591–b9601.


This Week's Signal

The biggest developer-facing release of this period is EAGLE3 speculative decoding landing in llama.cpp b9606 on June 12. It is the project's first EAGLE3 implementation and brings 2.14–3.28× throughput on LLaMA 3.x and 1.62–2.17× on Qwen3 with no quality change. That's the most significant local inference speed jump since prefix caching improvements shipped earlier this year. On the same day, Hugging Face Transformers v5.12.0 added three new model families (MiniMax-M3-VL, PP-OCRv6, Parakeet-RNNT), and Unsloth v0.1.463-beta brought Gemma 4 MTP and GGUF tensor parallelism (+30% throughput). The community story of the period is Simon Willison's post on Fable 5's "relentlessly proactive" agentic behavior, which drew 668 HN points and 544 comments. It is a concrete cost data point, backed by many similar anecdotes in the thread, that unconstrained frontier agents can burn money fast on trivial tasks and that production cost discipline is now a first-class engineering concern.

Must-reads this digest:

  • llama.cpp b9606: EAGLE3 speculative decoding. 2.14–3.28× local inference speedup on LLaMA 3.x and 1.62–2.17× on Qwen3 (Gemma4 is also supported). Enabled with one new CLI flag plus a matching draft model. Most impactful inference tooling release this week.
  • Simon Willison on Fable 5 proactivity: $12.11 burned debugging a 2-line CSS fix; 668 HN points. A concrete cost-management data point that developers shipping agentic products should read before the same pattern shows up on their bill.

[BREAKING] Breaking Changes

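No provider API breaking changes this period. Note that Transformers v5.12.0 (see Tooling) now requires trust_remote_code=True for local custom generation code.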


Model Releases

No new model releases from labs in the June 11–12 window. (Grok V9-Medium and Gemini 3.5 Pro remain pending — see Worth Watching.)


API & SDK Changes

No API or SDK breaking changes or notable feature additions in the June 11–12 window. (Most recent Anthropic platform release note: June 10; most recent anthropic-sdk-python: v0.109.1, June 9.)


Research

Nothing cleared the quality bar this period. The arXiv cs.AI and cs.CL listing pages returned 403 on direct fetch, as did Hugging Face Papers Daily. The most relevant paper found via search was SWE-InfraBench (arXiv 2606.05249, evaluating LLMs on AWS CDK infrastructure code), but it was submitted June 3, outside the scan window; see the near-misses under Excluded.


Tooling

[HIGH] llama.cpp b9606 — EAGLE3 Speculative Decoding

Source: ggml-org/llama.cpp GitHub | Date: June 12, 2026 (08:45 UTC) | Link: https://github.com/ggml-org/llama.cpp/releases/tag/b9606
What changed: EAGLE3 speculative decoding support added (PR #18039). Previously, llama.cpp supported EAGLE (v1) and MEDUSA speculative decoding but not EAGLE3. This PR adds the full EAGLE3 encoder-decoder architecture: layer-level feature extraction from the target model, encoder-compressed feature fusion, a single-layer draft decoder, and draft-to-target vocabulary mapping via a d2t tensor. Works with both llama-cli and llama-server.
TL;DR: EAGLE3 speculative decoding in llama.cpp b9606 delivers 2.14–3.28× throughput on LLaMA 3.x (BF16/Q4_K_M) and 1.62–2.17× on Qwen3 with no quality change, enabled with one new CLI flag plus a matching draft model.
Developer signal: Add --spec-type draft-eagle3 -md <eagle3_draft.gguf> to your llama-server or llama-cli command. You need an EAGLE3 draft model that matches your target GGUF. NVIDIA provides Llama-3.3-70B-Instruct-Eagle3 on Hugging Face (nvidia/Llama-3.3-70B-Instruct-Eagle3); third-party EAGLE3 checkpoints exist for the Qwen3 series. Concrete benchmarks from the PR: LLaMA3.1-8B BF16 reaches 3.28× at an 80.6% acceptance rate; Q4_K_M reaches 2.26× at 92.5% acceptance. LLaMA3.3-70B Q4_K_M: 2.14–2.41×. MoE models show diminished returns (0.8–1.4×) because verification is costly with sparse expert activation, so skip EAGLE3 for MoE if latency is critical. This is an NVIDIA + GGML collaboration; expect more draft models to become available over the coming weeks as the ecosystem catches up. Draft models must match your target model family and quant format, and not all Qwen3 EAGLE3 checkpoints come from first-party sources, so quality varies.
Affects you if: You run llama.cpp for local inference on LLaMA 3.x, Qwen3, or Gemma4 models; you are latency-sensitive on CPU or consumer GPU setups; you are building on-device or edge inference pipelines
Adoption effort: Moderate (source/download the matching EAGLE3 draft model GGUF and add CLI flags; no code changes if you call llama-server via its API)
Primary source: https://github.com/ggml-org/llama.cpp/pull/18039
Quality gate score: 9 (official GitHub repo +3, concrete benchmark numbers with methodology +2, GitHub PR primary source +2, within scan window +1, technical audience +1)
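To see why acceptance rate drives the speedup, and why a costly verification pass can push it below 1×, here is a back-of-the-envelope model based on the standard speculative decoding analysis (Leviathan et al., 2023), not the PR's own benchmark methodology. It assumes each of gamma drafted tokens is accepted independently with probability alpha, a draft step costs c target decode steps, and a verification pass costs verify_cost decode steps. The verify_cost knob for MoE targets is a hypothetical simplification, and the PR's "acceptance rate" may not be defined the same way as alpha here.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens produced per target verification pass.

    Counts accepted draft tokens plus the one token the target always adds.
    Assumes each of the gamma drafted tokens is accepted independently with
    probability alpha, a chain-drafting simplification (EAGLE3 drafts a tree).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return float(gamma + 1)  # limit of the formula below as alpha -> 1
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, gamma: int, c: float, verify_cost: float = 1.0) -> float:
    """Estimated wall-clock speedup over plain autoregressive decoding.

    c: cost of one draft step, relative to one target decode step.
    verify_cost: cost of one verification pass, relative to one target decode
        step. 1.0 is the idealized dense case; larger values are a hypothetical
        stand-in for MoE targets, where verifying several tokens touches more experts.
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * c + verify_cost)


if __name__ == "__main__":
    # Illustrative inputs only, not the PR's measured parameters.
    print(round(estimated_speedup(alpha=0.8, gamma=4, c=0.05), 2))
    print(round(estimated_speedup(alpha=0.8, gamma=4, c=0.05, verify_cost=4.0), 2))
```

With these illustrative inputs the model gives about 2.8× for the dense case and about 0.8× once verification costs four decode steps. That matches the direction, not necessarily the mechanism, of the MoE results above.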


[NOTABLE] Unsloth v0.1.463-beta — Gemma 4 MTP + GGUF Tensor Parallelism

Source: unslothai/unsloth GitHub | Date: June 12, 2026 (13:57 UTC) | Link: https://github.com/unslothai/unsloth/releases/tag/v0.1.463-beta
What changed: Added DiffusionGemma training/inference support, Gemma 4 MTP (Multi-Token Prediction) for "2× faster" generation in Studio, and tensor parallelism for GGUFs across multiple GPUs (+30% throughput). Tool calling: 50–90% fewer "tool call nudging issues" (prompt-engineering workarounds to steer models toward the correct tool call format), with no reported accuracy loss. Audio input expanded to wav/mp3/m4a/flac/webm for Gemma 4 chat. A Hub browser was added for Hugging Face model/dataset discovery with local asset detection.
TL;DR: Unsloth v0.1.463-beta adds Gemma 4 MTP for a ~2× generation speedup and GGUF tensor parallelism for +30% throughput, alongside DiffusionGemma training and a large reduction in tool call nudging.
Developer signal: If you run Gemma 4 locally via Unsloth Studio, enable MTP to get the ~2× speedup; it is opt-in, so switch it on explicitly in Studio or via the CLI. For multi-GPU GGUF setups, tensor parallelism is now supported and delivers +30% throughput with no model quality change; configure it via the Studio UI or CLI. The tool call improvement (50–90% fewer nudging issues) is relevant for fine-tuning pipelines that produce tool-use datasets: if your training runs were failing on tool formatting, rerun them. Note: the MTP speedup applies to generation; fine-tuning throughput is not affected. This is a beta release; production users should verify it on their specific model and hardware before replacing the stable version.
Affects you if: You use Unsloth Studio for local training or inference; you are fine-tuning Gemma 4 or DiffusionGemma models; you run GGUF models across multiple GPUs
Adoption effort: Quick (upgrade the Unsloth package; enable the MTP toggle in Studio or pass the CLI flag)
Primary source: https://github.com/unslothai/unsloth/releases/tag/v0.1.463-beta
Quality gate score: 7 (official GitHub repo +3, concrete numbers (+30%, 2×, 50–90%) +2, within scan window +1, technical audience +1)
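If you are regenerating tool-use training data after this release, a format check before each training run catches malformed calls early. A minimal sketch, assuming (hypothetically) that each sample stores its tool call as a JSON string with a "name" field and an object-valued "arguments" field; adapt the schema to whatever your chat template actually emits. This is not Unsloth's own validation logic.

```python
import json


def tool_call_errors(raw: str, allowed_tools: set[str]) -> list[str]:
    """Return the format problems in one serialized tool call (empty list if valid).

    Hypothetical schema: {"name": <tool name string>, "arguments": <JSON object>}.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(call, dict):
        return ["tool call is not a JSON object"]
    errors = []
    name = call.get("name")
    if not isinstance(name, str):
        errors.append("missing or non-string 'name'")
    elif name not in allowed_tools:
        errors.append(f"unknown tool {name!r}")
    if not isinstance(call.get("arguments"), dict):
        errors.append("'arguments' must be a JSON object")
    return errors


def validate_dataset(samples: list[str], allowed_tools: set[str]) -> dict[int, list[str]]:
    """Map sample index -> problems, for every sample that fails validation."""
    report = {}
    for i, raw in enumerate(samples):
        problems = tool_call_errors(raw, allowed_tools)
        if problems:
            report[i] = problems
    return report


if __name__ == "__main__":
    samples = [
        '{"name": "get_weather", "arguments": {"city": "Oslo"}}',  # valid
        '{"name": "get_weather", "arguments": "Oslo"}',            # arguments not an object
        'get_weather(Oslo)',                                       # not JSON at all
    ]
    for index, problems in validate_dataset(samples, {"get_weather"}).items():
        print(index, problems)
```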


[NOTABLE] HuggingFace Transformers v5.12.0 — MiniMax-M3-VL, PP-OCRv6, Parakeet-RNNT

Source: huggingface/transformers GitHub | Date: June 12, 2026 (14:39 UTC) | Link: https://github.com/huggingface/transformers/releases/tag/v5.12.0
What changed: Three new model architectures added: (1) MiniMax-M3-VL, a vision-language model combining a CLIP-style vision tower (with 3D rotary position embeddings) and the MiniMax-M3 text backbone, featuring a mixed dense/sparse MoE decoder and Conv3d patch embedding; (2) PP-OCRv6, an OCR system in three tiers (medium/small/tiny) using MetaFormer-style blocks with structural reparameterization; (3) Parakeet-RNNT, a speech recognition model pairing a Fast Conformer encoder with an RNN-T decoder and LSTM prediction network. Also: a security change requiring trust_remote_code=True for local custom generation code, and a fix for stop string matching on byte-fragment tokens.
TL;DR: Transformers v5.12.0 adds three new model families (MiniMax-M3-VL vision-language, PP-OCRv6 OCR, Parakeet-RNNT speech) and tightens the trust_remote_code requirement for local custom models.
Developer signal: Upgrade (pip install transformers==5.12.0) to access MiniMax-M3-VL, PP-OCRv6, and Parakeet-RNNT via the standard pipeline interface. If you load local models with custom generation code, you must now pass trust_remote_code=True explicitly; omitting it on affected local checkpoints can break generation silently. The byte-fragment stop string fix may change where generation stops in edge cases (stop strings that span byte-fragment tokens), so regression-test outputs if you use custom stop strings. MiniMax-M3-VL is the most developer-relevant addition for multimodal work: it brings a strong VL backbone not previously available in the transformers ecosystem.
Affects you if: You use transformers pipelines for vision-language, OCR, or speech recognition; you load local model checkpoints with custom generation code (trust_remote_code change); you use custom stop strings in generation
Adoption effort: Quick (version bump, plus trust_remote_code=True where needed); Moderate if existing pipelines rely on stop string behavior
Primary source: https://github.com/huggingface/transformers/releases/tag/v5.12.0
Quality gate score: 7 (official GitHub repo +3, concrete technical details on new architectures +2, within scan window +1, technical audience +1)
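Because the byte-fragment fix changes stop string matching, one low-effort regression check is to diff generations saved under the previous version against the same prompts run under v5.12.0, ideally with greedy decoding (do_sample=False) so differences reflect the library rather than sampling. A minimal, library-agnostic sketch; it assumes outputs are saved as {prompt_id: text} dicts and that the decoded text still ends with the stop string that halted generation (if your pipeline strips it, compare text lengths instead).

```python
from typing import Optional


def terminating_stop(text: str, stop_strings: list[str]) -> Optional[str]:
    """Return the stop string the output ends with (longest match wins), or None."""
    matches = [s for s in stop_strings if text.endswith(s)]
    return max(matches, key=len) if matches else None


def stop_regressions(before: dict[str, str], after: dict[str, str],
                     stop_strings: list[str]) -> dict[str, tuple]:
    """Compare generations saved under two library versions, keyed by prompt id.

    Returns {prompt_id: (old_stop, new_stop)} for every prompt whose output text
    changed, so you can see whether termination moved. Prompt ids present in only
    one snapshot are skipped.
    """
    diffs = {}
    for pid in sorted(pid for pid in before if pid in after):
        old, new = before[pid], after[pid]
        if old != new:
            diffs[pid] = (terminating_stop(old, stop_strings),
                          terminating_stop(new, stop_strings))
    return diffs
```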


Benchmarks & Leaderboards

No new leaderboard movements in the June 11–12 window. Fable 5 LMArena entry (June 10, all five categories) was covered in the June 11 digest. SWE-bench Verified and SWE-bench Pro: no new independent submissions confirmed in window.


Trends & Emerging Tech

Frontier Agents Burn Money Fast: Fable 5 Surfaces the Production Cost Discipline Problem

Source: Simon Willison (simonwillison.net) | Date: June 11, 2026 | Link: https://simonwillison.net/2026/Jun/11/fable-is-relentlessly-proactive/
What's happening: Simon Willison documented a Fable 5 session that cost $12.11 in API tokens while debugging what turned out to be a 2-line CSS fix. His characterization, "relentlessly proactive," describes a model that tries every available technique without pausing to reassess the cost/effort ratio. The HN thread (668 points, 544 comments as of June 12) filled with similar accounts and debate about session cost controls. This is not a hallucination or capability failure; Fable 5 successfully debugged the issue. The problem is that "successfully debugging something by any means necessary" is expensive when the means include spawning multiple approaches, reading dependencies, and opening browser sessions.
Why watch this: This pattern is likely to generalize across frontier models as agents become more capable at self-directed problem solving. The immediate practical question is how to bound agent behavior by cost rather than by turn count. Anthropic's task_budget API parameter (GA May 28 with Opus 4.8) provides one lever; session-level token budgets in LiteLLM and other gateways are another. Developers shipping agentic products should audit what happens when the model receives an open-ended debugging or "fix this" instruction with no explicit budget: the failure mode is not a wrong answer but an expensive correct one. Expect tooling to emerge around cost-aware agent termination criteria in the coming months; the Willison post and HN thread are likely to accelerate this.
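Bounding an agent by cost rather than by turn count can start as a small client-side guard that converts each call's reported token usage into dollars and stops the loop once a spend ceiling is crossed. A minimal sketch: the per-million-token prices are placeholder assumptions, not Fable 5's actual pricing, and CostGuard is a hypothetical helper, not part of any SDK or of LiteLLM.

```python
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when cumulative session spend crosses the ceiling."""


@dataclass
class CostGuard:
    max_usd: float
    # Placeholder prices in USD per million tokens (assumptions, not real pricing).
    input_usd_per_mtok: float = 15.0
    output_usd_per_mtok: float = 75.0
    spent_usd: float = field(default=0.0, init=False)

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one call's cost and return the running total; raise once over budget."""
        cost = (input_tokens * self.input_usd_per_mtok
                + output_tokens * self.output_usd_per_mtok) / 1_000_000
        self.spent_usd += cost
        if self.spent_usd > self.max_usd:
            raise BudgetExceeded(f"spent ${self.spent_usd:.2f} of ${self.max_usd:.2f} budget")
        return self.spent_usd


if __name__ == "__main__":
    guard = CostGuard(max_usd=2.00)
    try:
        for turn in range(1, 51):  # the turn cap is only a backstop here
            # Stand-in for the token counts your API response reports for this turn.
            input_tokens, output_tokens = 40_000, 2_000
            guard.record(input_tokens, output_tokens)
    except BudgetExceeded as err:
        print(f"stopping at turn {turn}: {err}")
```

Because the guard checks spend after each call, a session can overshoot the ceiling by at most one call's cost; checking before the call would require estimating that call's tokens in advance. With these numbers the guard stops the loop on the third turn, long before the 50-turn cap would.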


Technical Discussions

[HIGH] Simon Willison: "Claude Fable is relentlessly proactive" (HN #48498573)

Source: Simon Willison (simonwillison.net) | Date: June 11, 2026 | Link: https://simonwillison.net/2026/Jun/11/fable-is-relentlessly-proactive/
What changed: One of the first concrete, data-backed post-launch cost analyses of Fable 5 in a real agentic workflow (not a benchmark). Willison gave Fable 5 a screenshot of a CSS scrollbar bug and told it to look at dependencies. The model went deep into debugging: reading dependencies, opening browser sessions, trying multiple approaches. Cost: $12.11. Actual fix: 2 lines. Multiple practitioners in the HN thread reported similar burn rates on comparable tasks, which suggests a pattern rather than an outlier.
TL;DR: Simon Willison's Fable 5 test burned $12.11 solving a 2-line CSS fix, a concrete data point on the cost profile of proactive frontier agents in unconstrained sessions.
Developer signal: Two actions:
(1) If you call Claude Fable 5 via the API for open-ended agentic tasks, set either a task_budget (available via Managed Agents) or a hard token ceiling at your gateway/proxy. The model will attempt every available approach before giving up, which is powerful but expensive when the task is simple.
(2) Audit existing Claude Code or agent workflows for open-ended instructions. Prompts like "fix this bug" or "debug this issue" without explicit scope constraints trigger maximally exploratory behavior. The right fix is prompt-side scoping ("check only X, Y, Z"), not model-side: Fable's proactivity is a feature for complex tasks, and the cost concern applies specifically to tasks that don't warrant it.
Willison's word of caution: "If you don't keep a close eye on it, Fable will quite happily burn $12 in tokens inventing new ways to debug your CSS."
Affects you if: You are building agentic products on Fable 5; you let users submit open-ended debugging or coding tasks via the API; you have existing Claude Code sessions without explicit task budget constraints
Adoption effort: Moderate (requires prompt revisions and/or gateway-level token ceilings; no API breaking change)
Primary source: https://simonwillison.net/2026/Jun/11/fable-is-relentlessly-proactive/ | HN thread: https://news.ycombinator.com/item?id=48498573
Quality gate score: 6 (Tier 2 source explicitly listed in sources file +2, concrete cost data ($12.11, 2-line fix) +2, HN score 668 above the 200 threshold +1, within scan window +1)
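Action (2) can be partly automated. A minimal heuristic sketch that flags prompt templates containing an open-ended instruction (patterns drawn from the examples above) unless they also contain an explicit scope marker. Both pattern lists are hypothetical starting points to tune for your own prompts, and a regex pass will miss paraphrases.

```python
import re

# Open-ended instruction patterns (from the examples above; extend for your prompts).
OPEN_ENDED = [r"\bfix this\b", r"\bfix the bug\b", r"\bdebug this\b"]
# Hypothetical markers suggesting the prompt already bounds the search.
SCOPE_MARKERS = [r"\bcheck only\b", r"\bonly (look at|modify|edit|change)\b",
                 r"\bstop (after|when)\b", r"\bat most\b"]


def unscoped_prompts(prompts: dict[str, str]) -> list[str]:
    """Return names of prompts with an open-ended instruction and no scope marker."""
    flagged = []
    for name, text in prompts.items():
        lowered = text.lower()
        open_ended = any(re.search(p, lowered) for p in OPEN_ENDED)
        scoped = any(re.search(p, lowered) for p in SCOPE_MARKERS)
        if open_ended and not scoped:
            flagged.append(name)
    return sorted(flagged)


if __name__ == "__main__":
    templates = {
        "css_helper": "Fix this bug in the checkout page.",
        "scoped_helper": "Debug this issue. Check only styles.css and stop after two attempts.",
        "summarizer": "Summarize the changelog.",
    }
    print(unscoped_prompts(templates))  # flags only "css_helper"
```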


Quick Hits


Worth Watching (Announced, Not Yet Shipped)

⚠️⚠️⚠️⚠️ Claude Sonnet 4 + Opus 4 — Retirement JUNE 15 (3 DAYS)

Source: Anthropic | Link: https://platform.claude.com/docs/en/about-claude/model-deprecations
claude-sonnet-4-20250514 and claude-opus-4-20250514 return errors starting June 15. If you haven't migrated, act now: move to claude-sonnet-4-6-20260217 and claude-opus-4-8 respectively. Review the Opus 4.8 migration guide: adaptive thinking replaces budget_tokens, and temperature, top_p, or top_k at non-default values return a 400 error.
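To find code that still references the retiring IDs (including the August 5 Opus 4.1 retirement listed below), a simple repository scan is enough for a first pass. A minimal sketch; the ID-to-replacement map is copied from this digest's Anthropic entries, the file-suffix list is an assumption, and a plain substring match will also hit comments and docs, which is usually what you want in an audit.

```python
from pathlib import Path

# Retiring model ID -> (retirement date, suggested replacement), per this digest.
RETIRING = {
    "claude-sonnet-4-20250514": ("2026-06-15", "claude-sonnet-4-6-20260217"),
    "claude-opus-4-20250514": ("2026-06-15", "claude-opus-4-8"),
    "claude-opus-4-1-20250805": ("2026-08-05", "claude-opus-4-8"),
}

# File types to scan (an assumption; extend for your stack).
SUFFIXES = (".py", ".ts", ".js", ".json", ".yaml", ".yml", ".toml")


def scan_text(text: str):
    """Yield (line_no, model_id, retirement_date, replacement) for each retiring ID found."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for model_id, (date, replacement) in RETIRING.items():
            if model_id in line:
                yield line_no, model_id, date, replacement


def scan_tree(root: str):
    """Yield (path, line_no, model_id, retirement_date, replacement) for files under root."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SUFFIXES:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for hit in scan_text(text):
            yield (str(path), *hit)


if __name__ == "__main__":
    for path, line_no, model_id, date, replacement in scan_tree("."):
        print(f"{path}:{line_no}: {model_id} retires {date}; migrate to {replacement}")
```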

⚠️⚠️⚠️ Gemini CLI Hard Stop — June 18 (6 days)

(Countdown updated) Source: Google Developers Blog | Link: https://developers.googleblog.com/an-important-update-transitioning-gemini-cli-to-antigravity-cli/
The gemini CLI and Gemini Code Assist IDE extensions stop serving requests on June 18. The replacement is Antigravity CLI (agy). Audit CLI scripts and CI pipeline steps, since Antigravity CLI does not have 1:1 feature parity with the prior tooling.

⚠️⚠️⚠️ Gemini API Unrestricted Key Deadline — June 19 (7 days)

(Countdown updated) Source: Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/api-key
All unrestricted Gemini API keys will be blocked on June 19. Restrict yours via AI Studio → API Keys → "Restrict to Gemini API." It takes ~2 minutes; no code changes required.

⚠️⚠️ Gemini Image Models Shutdown — June 25 (13 days)

(Countdown updated) Source: Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/deprecations
gemini-3.1-flash-image-preview and gemini-3-pro-image-preview shut down June 25. Migrate to the stable image model equivalents.

⚠️⚠️ GPT-4.5 Retirement from ChatGPT — June 27 (15 days)

(Countdown updated) Source: OpenAI Platform Changelog | Link: https://platform.openai.com/docs/changelog
GPT-4.5 is being retired from the ChatGPT product surface on June 27. Retirement of the direct API route is unconfirmed. Audit gpt-4.5 model identifiers in code.

⚠️⚠️ Grok V9-Medium — Still Pending (est. any day)

(Status updated: not yet launched as of June 12) Source: xAI / Elon Musk announcement, May 25, 2026 | Link: https://x.ai/news
Training completed in late May; SFT and RL are underway. The mid-June public release is still pending as of June 12. Announced specs: 1.5 trillion parameters, trained on Cursor data, coding-focused. No API pricing, model ID, or benchmark numbers confirmed.

⚠️⚠️ Gemini 3.5 Pro — Still Pending, June 2026 (any day)

(Status unchanged — still limited Vertex preview as of June 12) Source: Google I/O 2026 / Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/models
Still in limited Vertex enterprise preview. Sundar Pichai's "give us until next month" (said May 19) has not yet materialized. Expected: 2M token context, Deep Think reasoning mode. Watch ai.google.dev for the official launch.

⚠️ Aion 1.0 Open Weights — July 2026 (~3 weeks)

(Carried — status unchanged) Source: Windows Developer Blog | Link: https://blogs.windows.com/windowsdeveloper/2026/06/02/build-2026-furthering-windows-as-the-trusted-platform-for-development/
Aion 1.0 Instruct open weights land on Hugging Face in July 2026. No specific date confirmed yet.

⚠️⚠️ Claude Opus 4.1 Retirement — August 5 (54 days)

(Countdown updated) Source: Anthropic | Link: https://platform.claude.com/docs/en/about-claude/model-deprecations
claude-opus-4-1-20250805 retires August 5. Migrate to claude-opus-4-8. See the June 6, 2026 digest for the full migration checklist, including breaking changes around adaptive thinking, sampling parameters, and tokenizer differences.

⚠️ OpenAI Reusable Prompts (v1/prompts) Shutdown — November 30 (171 days)

Source: OpenAI | Link: https://developers.openai.com/api/docs/deprecations
Deprecated June 3, shutdown November 30, 2026. Move prompt content to application code.
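For the v1/prompts migration, one simple pattern is an in-repo, versioned prompt registry, so prompt changes go through code review and version control. A minimal sketch using the standard library's string.Template; the registry layout, prompt name, and variables are hypothetical and not an OpenAI API.

```python
from string import Template

# Hypothetical in-repo registry: (prompt name, version) -> template.
PROMPTS = {
    ("support_triage", 2): Template(
        "You are a support triage assistant for $product.\n"
        "Classify the ticket below as one of: $labels.\n\n"
        "Ticket:\n$ticket"
    ),
}


def render_prompt(name: str, version: int, **variables: str) -> str:
    """Render a stored prompt.

    Raises KeyError if the (name, version) pair is unknown or a placeholder
    has no value, so missing variables fail loudly instead of reaching the model.
    """
    return PROMPTS[(name, version)].substitute(**variables)


if __name__ == "__main__":
    print(render_prompt("support_triage", 2, product="Acme",
                        labels="billing, bug, other", ticket="I was charged twice."))
```

Pinning the version in each call site means a prompt edit ships as a new registry entry rather than silently changing behavior everywhere at once.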

⚠️ OpenAI Evals Platform Shutdown — November 30 (171 days)

Source: OpenAI | Link: https://developers.openai.com/api/docs/deprecations
Read-only October 31, shutdown November 30, 2026. Export eval configs before October 31.

⚠️ OpenAI Agent Builder Shutdown — November 30 (171 days)

Source: OpenAI | Link: https://developers.openai.com/api/docs/deprecations
Shutdown November 30, 2026. Migrate to the Agents SDK (openai.agents) or ChatGPT Workspace Agents.

Apple iOS 27 / macOS Golden Gate / Core AI GA — Fall 2026 (September, ~3 months)

(Carried — status unchanged) Source: Apple Developer / WWDC 2026 | Link: https://developer.apple.com/ios/
iOS 27, iPadOS 27, and macOS Golden Gate ship with iPhone 18 in September 2026. Includes: Siri Extensions API, Core AI (replaces Core ML), Foundation Models multi-provider support. Developer Beta 1 available now. Public beta expected mid-July.

Claude Mythos 5 General Availability — No Timeline

(Carried — status unchanged) Source: Anthropic | Link: https://www.anthropic.com/news/expanding-project-glasswing
Currently available only to vetted Project Glasswing participants. Not available on the public API.


<details> <summary>🔭 Horizon — Open Questions, Emerging Patterns & Grounded Speculation</summary>

This section operates under different rules than the digest above. Evidence-grounded speculation is allowed. Pure prediction is not. Every claim here must cite a source from this digest or a real paper/benchmark. Label each entry by type so the reader knows what kind of thinking they're engaging with.

[PATTERN] Speculative decoding is becoming the dominant local inference speed lever, and the draft-model quality gap is closing
Three distinct speculative decoding advances have shipped in llama.cpp in the last 60 days: EAGLE (v1, earlier), MEDUSA (ongoing), and now EAGLE3 (b9606, June 12). The EAGLE3 PR shows 2.14–3.28× throughput on LLaMA 3.x and 1.62–2.17× on Qwen3, with acceptance rates of 80.6–92.5% on LLaMA3.1-8B. The underlying driver is that draft models (the small, fast models used to generate candidate tokens) are improving faster than the verification overhead grows. For local inference builders, this means the throughput gap between a cloud API call and a local llama.cpp call is narrowing faster than hardware improvements alone would predict. The practical implication: local inference is approaching cost-competitive territory for latency-sensitive use cases at 7B–70B scale, if you can tolerate the draft model dependency.
Grounded in: llama.cpp b9606 EAGLE3 benchmarks (this digest, Tooling); PR #18039 NVIDIA+GGML collaboration attribution


[OPEN QUESTION] What is the right cost-aware termination criterion for production agentic workflows at frontier model pricing?
Willison's $12.11 session on a 2-line CSS fix (documented June 11) is evidence that token count and turn count are insufficient constraints on frontier agent cost: Fable 5 can burn $12 in a small number of exploration turns if the search space is wide. Anthropic's task_budget parameter (Managed Agents, GA May 28) sets a budget in tokens per session; LiteLLM and similar gateways offer spend-based cutoffs. But neither answers the harder question: how does an agent know when the cost of an additional attempt exceeds the expected value of improvement? This is a genuine open problem. Current practice is human-set hard ceilings; research on utility-aware agent termination is sparse. The question will become more acute as Grok V9-Medium (1.5T params) and Gemini 3.5 Pro (2M context, Deep Think) land; both are expected to be more capable and are likely to amplify this pattern.
Grounded in: Simon Willison "Claude Fable is relentlessly proactive" (this digest, Technical Discussions); Anthropic task_budget launch May 28, 2026 platform release notes
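One way to make the open question concrete is an expected-value stopping rule: continue only while the probability-weighted value of another attempt exceeds its expected cost, with a hard ceiling as a backstop. A toy sketch; the success probability, task value, and per-attempt cost are inputs you would have to estimate, and estimating them well is exactly the unsolved part, so this illustrates the framing rather than a known solution.

```python
def should_continue(p_success_next: float, task_value_usd: float,
                    expected_cost_next_usd: float, spent_usd: float,
                    hard_cap_usd: float) -> bool:
    """Expected-value stopping rule with a hard spend ceiling as a backstop.

    Continue only if the next attempt fits under the cap and its
    probability-weighted value exceeds its expected cost. Money already
    spent matters only through the cap (sunk cost is otherwise ignored).
    """
    if spent_usd + expected_cost_next_usd > hard_cap_usd:
        return False
    return p_success_next * task_value_usd > expected_cost_next_usd


if __name__ == "__main__":
    # Hypothetical numbers: a small CSS fix worth $5 of developer time, $1.50 per attempt.
    for p in (0.6, 0.2):
        print(p, should_continue(p, task_value_usd=5.0, expected_cost_next_usd=1.5,
                                 spent_usd=3.0, hard_cap_usd=10.0))
```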


[IF THIS CONTINUES] EAGLE3 on LLaMA3.3-70B at 2.4× + Gemma4 MTP via Unsloth at 2× = local 70B inference is approaching 100B-class API throughput rates
As of June 12: llama.cpp EAGLE3 delivers 2.14–2.41× on LLaMA3.3-70B Q4_K_M (b9606); Unsloth v0.1.463-beta delivers ~2× on Gemma4 via MTP. These gains could compound if combined (EAGLE3 for draft-model acceleration, MTP for the target model's own multi-token output), although both are forms of speculative drafting checked by the same target verification pass, so a fully multiplicative gain should not be assumed. On a single H100, an unaccelerated LLaMA3.3-70B Q4_K_M processes roughly 30–40 tokens/second; a 2.3× EAGLE3 speedup brings that to ~70–90 tokens/second. GPT-5.4 via the OpenAI API delivers ~120–180 tokens/second depending on load. The gap is narrowing. At this trajectory, a two-GPU local rig with tensor parallelism (now in Unsloth v0.1.463-beta) plus EAGLE3 could match API throughput rates within 6–12 months for 70B-class tasks, making the sovereignty and cost arguments for local inference significantly stronger.
Grounded in: llama.cpp b9606 EAGLE3 benchmarks 2.14–2.41× (this digest, Tooling); Unsloth v0.1.463-beta +30% tensor parallelism (this digest, Tooling); Unsloth 2× MTP (this digest)
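The throughput arithmetic in this entry is easy to re-run with your own measurements. A small sketch; the baseline and API ranges are this entry's estimates rather than benchmarks, and multiplying a range by a single speedup factor ignores how acceptance rates vary across prompts.

```python
def scaled_range(low: float, high: float, factor: float) -> tuple[float, float]:
    """Scale a tokens/second range by a speedup factor."""
    return low * factor, high * factor


def fraction_of_api(local: tuple[float, float], api: tuple[float, float]) -> tuple[float, float]:
    """Worst-case and best-case local throughput as a fraction of API throughput."""
    return local[0] / api[1], local[1] / api[0]


if __name__ == "__main__":
    local = scaled_range(30, 40, 2.3)  # unaccelerated 70B Q4_K_M estimate x assumed EAGLE3 factor
    api = (120, 180)                   # this entry's API throughput estimate, tokens/second
    lo, hi = fraction_of_api(local, api)
    print(f"local: {local[0]:.0f}-{local[1]:.0f} tok/s, {lo:.0%}-{hi:.0%} of API")
```

With a 2.3× factor the sketch gives about 69–92 tokens/second locally, or roughly 38–77% of the estimated API range.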


[BUILDER'S ANGLE] EAGLE3 + Gemma4 MTP creates the first viable "local Fable-class agentic loop" for security-conscious builders
Fable 5's proactive behavior (documented June 11 by Simon Willison) is powerful but comes with cloud data retention requirements (Anthropic mandates 30-day retention for Fable 5 API calls, per the June 9 platform release notes). For regulated industries or privacy-critical workloads, this puts Fable 5 out of reach. June 12's combination of EAGLE3 in llama.cpp (2× on Gemma4) and Unsloth MTP (2× on Gemma4) means a local Gemma4 instance can now run roughly twice as fast as before, approaching prior-generation frontier API throughput, without data leaving the premises. The draft models for Gemma4 EAGLE3 are available from RedHatAI per the PR notes. This is not Fable 5-level capability, but it is a viable alternative architecture for healthcare agentic pipelines, financial automation, government deployments, and multi-tenant SaaS where prompt data isolation is contractually required.
Grounded in: llama.cpp b9606 EAGLE3 support for Gemma4 (this digest, Tooling); Claude Fable 5 30-day data retention requirement (June 9 platform release notes, June 11 digest); Unsloth v0.1.463-beta Gemma4 MTP (this digest)


[TENSION] "More capable = more expensive per task" is emerging as the dominant production constraint on frontier model adoption
June 12 holds an interesting tension: EAGLE3 in llama.cpp makes local inference dramatically faster and cheaper (1.6–3.3× on dense models), while Simon Willison's data suggests frontier cloud models are getting more expensive per task as they become more capable. Fable 5 spent $12 on a task that a less proactive model would likely have finished far more cheaply, not because Fable 5 failed, but because it succeeded more thoroughly. This is a new cost structure: the billable unit is no longer "tokens per answer" but "tokens per completed autonomous exploration." For builders choosing between local and cloud inference, the relevant question is shifting from "quality" to "task scope control." Cloud frontier models win on ceiling quality but require hard budget constraints; local models win on predictable per-task cost but require accepting a capability ceiling.
Grounded in: Simon Willison cost data $12.11 (this digest, Technical Discussions); llama.cpp EAGLE3 speedup (this digest, Tooling); Claude Fable 5 data retention and pricing (June 9 platform release notes)

</details>

Excluded: ~29 items below the quality gate threshold, outside the scan window, or already covered in prior digests.
Near-misses:
  • Anthropic Platform release notes: most recent entry June 10 (already covered in the June 11 digest)
  • anthropic-sdk-python: most recent release v0.109.1 on June 9 (adds a frontier_llm refusal category; outside the window)
  • Grok V9-Medium: not yet launched as of June 12 (moved to Worth Watching)
  • Gemini 3.5 Pro: not yet GA as of June 12 (carried in Worth Watching)
  • Grok Imagine video expansion (June 11): not confirmed in official xAI release notes; no primary source
  • GPT-5.6: rumors only, no official announcement
  • SWE-InfraBench, arXiv 2606.05249: submitted June 3, outside the window
  • LMArena Fable 5 Elo settling: no new leaderboard movement in window beyond what the June 11 digest covered
  • Gemini API changelog: most recent confirmed entry May 28; no June 11–12 additions found
  • Ollama: latest is v0.30.7 from June 7; no June 11–12 release
  • vLLM: latest is v0.22.1 from June 4; no June 11–12 release
  • langchain, openai-python, autogen, crewai, smolagents: no qualifying June 11–12 releases confirmed
  • Groq, Together AI, Fireworks AI, AWS Bedrock, Azure AI, NVIDIA Developer Blog: nothing confirmed in the June 11–12 window
  • HF Papers Daily: 403 on direct fetch; no qualifying papers confirmed via search in window
  • arXiv cs.AI/cs.CL: 403 on direct listing fetch
  • Simon Willison datasette 1.0a33 post (June 11): tooling release, not AI-relevant
  • Anthropic Public Record survey release (June 12): public opinion research, not developer-technical
