AI Developer Digest
9 items passed quality gate | ~32 candidates evaluated | ~23 excluded | Sources checked: 26 | Scan window: June 1, 2026 (24h)
Prior digest covered: llama.cpp b9430–b9442; Claude Code v2.1.159 (internal infra only); LMArena state (Grok 4.1 Thinking 1483 Elo, Opus 4.8 88.6% SWE-bench Verified); Gemini 2.0 Flash + Flash-Lite shutdown (live); Claude Managed Agents on AWS webhooks/multiagent/self-hosted sandboxes (May 29).
This Week's Signal
NVIDIA's Computex 2026 keynote on June 1 was the day's dominant story: Jensen Huang announced Nemotron 3 Ultra (550B-param MoE, 55B active at inference, 48.0 Intelligence Index, 300+ tokens/sec, weights arriving June 4) and Cosmos 3 (first open Physical AI omnimodel, accepts and generates text + images + video + audio + robot actions in a single architecture, available now). For general LLM developers, Nemotron 3 Ultra is the most deployable US open-weights model at this intelligence tier — though Kimi K2.6 (54.0) leads globally among open models, and Opus 4.8 (61.0) leads overall. For robotics and physical-AI developers, Cosmos 3 removes the three-model stack (world model + VLM + policy network) and replaces it with one open, commercially licensed model. Outside Computex, llama.cpp shipped 10+ incremental builds on June 1, including EXAONE 4.5 model support and a spec-decode behavior change that may affect deployments relying on automatic
draft-simple activation.
Must-reads this digest:
- Nemotron 3 Ultra — US open-weights frontier hits 48.0 Intelligence Index at 300+ tokens/sec; weights on HuggingFace/NIM June 4; benchmark context: trails Kimi K2.6 (54.0) and Opus 4.8 (61.0)
- Cosmos 3 — first open Physical AI omnimodel (32B/8B) with robot action output, available now on GitHub with commercial license; skip unless you're in robotics/AV, but if you are, this is the story of the week
- Gemini CLI June 18 stop — newly surfaced, not in prior digests: if you have any scripted gemini CLI usage, you have 17 days to test migration to Antigravity CLI (agy)
[BREAKING] Breaking Changes
No breaking changes this period.
Model Releases
[HIGH] NVIDIA Nemotron 3 Ultra — US Open-Weights Frontier Reaches 48.0 Intelligence Index
Source: NVIDIA Newsroom (Computex 2026 keynote) | Date: June 1, 2026 | Link: https://nvidianews.nvidia.com/news/nvidia-expands-open-model-families-to-power-the-next-wave-of-agentic-physical-and-healthcare-ai
What changed: A 550B-total / 55B-active MoE model replaces Nemotron 3 Super as NVIDIA's flagship open model. The Intelligence Index rises from 36 (Super) to 48.0 (Ultra) while output speed increases to 300+ tokens/second; no prior Nemotron model combined this quality tier with this throughput on a single architecture.
TL;DR: 550B total / 55B active open-weights MoE, 48.0 Artificial Analysis Intelligence Index, 300+ tokens/sec output, 30% lower inference cost vs. comparable alternatives; weights arriving June 4 on HuggingFace, ModelScope, OpenRouter, and NVIDIA NIM.
Developer signal: Weights land June 4 and are not yet downloadable as of this digest. Deployment paths: HuggingFace / ModelScope for self-hosted, OpenRouter for managed API, NVIDIA NIM microservice (build.nvidia.com) for drop-in vLLM-style serving on NVIDIA hardware. Critical benchmark context: at 48.0 Intelligence Index, Nemotron 3 Ultra is the strongest US open-weight model, but Kimi K2.6 from Moonshot AI scores 54.0 (12.5% above Ultra) and Opus 4.8 scores 61.0 (27% above Ultra). If you have no US-hosting or export-control requirements, Kimi K2.6 is the stronger open-weights choice on intelligence alone. Nemotron 3 Ultra wins on throughput (300+ tokens/sec vs. slower alternatives at the same quality tier) and on deployment integration with NVIDIA infrastructure. For gpt-oss-120b (33 index) or Gemma 4 31B (39 index) users: Ultra is a substantial direct upgrade. For Opus 4.7/4.8 users evaluating open-weights cost reduction: benchmark the quality delta on your actual task before committing. Opus 4.8 scores 27% higher than Ultra on the index (equivalently, Ultra sits about 21% below Opus 4.8), which is too large a gap to assume away.
Affects you if: You are serving open-weight models on NVIDIA GPU infrastructure at scale; you use gpt-oss-120b or Gemma 4 in production and are evaluating upgrades; you have US-hosting or export-control constraints that rule out non-US model weights
Adoption effort: Quick (drop-in via NVIDIA NIM or HuggingFace once weights are available June 4; eval first recommended)
Primary source: https://nvidianews.nvidia.com/news/nvidia-expands-open-model-families-to-power-the-next-wave-of-agentic-physical-and-healthcare-ai | Artificial Analysis detail: https://artificialanalysis.ai/articles/nvidia-nemotron-3-ultra-launch-announced
Quality gate score: 7 (official NVIDIA source +3, concrete benchmark numbers and pricing comparison +2, HuggingFace/NIM release path confirmed +1, within scan window +1)
[HIGH] NVIDIA Cosmos 3 — First Open Physical AI Omnimodel with Native Robot Action Output
Source: NVIDIA Newsroom (Computex 2026 keynote) | Date: June 1, 2026 | Link: https://nvidianews.nvidia.com/news/nvidia-launches-cosmos-3-the-open-frontier-foundation-model-for-physical-ai
What changed: Previous Cosmos series generated video/images of simulated worlds but required a separate policy network to produce robot actions. Cosmos 3 adds native action output: a single forward pass now accepts text, images, video, and ambient audio, and can output any of those modalities plus concrete robot action data (joint angles, gripper positions, trajectory waypoints).
TL;DR: Open Physical AI omnimodel in 32B (Super) and 8B (Nano) variants, #1 across 7+ robotics benchmarks, trained on 20 trillion multimodal tokens including 500k robotics trajectories and 100TB vehicle sensor data; weights and training code available now with commercial license.
Developer signal: Weights and training scripts are live at https://github.com/nvidia/cosmos and Hugging Face. Start with Cosmos 3 Nano (8B) for local evaluation on a single high-end workstation GPU; use Super (32B) for production simulation pipelines. The commercial license removes the research-only restriction that blocked Cosmos 1/2 from production deployments. Training data (500k robot trajectories, 455k protein structures, 100TB vehicle sensor data) is also open and can serve as fine-tuning seed data for domain-specific physical AI. A third Edge variant for real-time on-device inference was announced but not yet released. For developers not building robotic control, simulation, or autonomous vehicle pipelines, this is a Watch item only: the model's differentiated capability is action generation, not text reasoning.
Affects you if: You are building robot control policies, physics simulators, autonomous vehicle data pipelines, or any system requiring multimodal input → robot action output
Adoption effort: Moderate (new model architecture with structured action output format; requires understanding joint/trajectory representation schema before integrating into existing robot control stacks)
Primary source: https://github.com/nvidia/cosmos | https://nvidianews.nvidia.com/news/nvidia-launches-cosmos-3-the-open-frontier-foundation-model-for-physical-ai
Quality gate score: 7 (official NVIDIA source +3, #1 across 7+ robotics benchmarks +2, open weights + training code on GitHub with dataset +1, within scan window +1)
API & SDK Changes
Nothing new within the 24h scan window. Most recent Anthropic Platform release note: May 29 (Claude Managed Agents on AWS — covered in prior digest). Claude Code CHANGELOG shows no new version released on June 1. anthropic-sdk-python last update: v0.105.2 (May 29 — outside window).
Research
arXiv cs.CL/cs.AI/cs.LG listing pages returned 403 errors at fetch time. HuggingFace Papers Daily returned 403. No papers surfaced via search meeting all quality gate criteria (recognized-lab authorship + associated code + concrete benchmarks + within 24h window) for this period. MiniCPM-V 4.5 (8B multimodal, May 2026) is noted as near-miss — outside scan window.
Tooling
Nothing at full-entry level within the 24h scan window. Five [NOTABLE] llama.cpp builds from June 1 appear in Quick Hits below.
Benchmarks & Leaderboards
[NOTABLE] Nemotron 3 Ultra Places at 48.0 on Artificial Analysis Intelligence Index — Open-Weights Map Updated
Source: Artificial Analysis | Date: June 1, 2026 | Link: https://artificialanalysis.ai/articles/nvidia-nemotron-3-ultra-launch-announced
What changed: Nemotron 3 Ultra (48.0) enters the Intelligence Index as the highest-scoring US open-weights model, above gpt-oss-120b (33) and Gemma 4 31B (39); the open-weights global frontier is now: Kimi K2.6 54.0 (China) > Nemotron 3 Ultra 48.0 (US) > Gemma 4 31B 39.0; closed-model leader remains Opus 4.8 at 61.0
TL;DR: US open-weights quality frontier now at 48.0 (Nemotron 3 Ultra); about 11% below global open leader Kimi K2.6 (54.0); about 21% below Opus 4.8 (61.0)
Developer signal: The index positions matter for model selection decisions under constraints. US-hosted open-weights: Nemotron 3 Ultra is now the benchmark ceiling. Global open-weights: Kimi K2.6 scores 54.0 with no export-control complications for most developers outside defense/government contexts. For teams evaluating "when can we move from closed to open model?": at 48.0 vs. 61.0, you are accepting an index score roughly 21% lower than Opus 4.8 (put the other way, Opus 4.8 scores about 27% higher than Ultra). Benchmark this against your task-specific evals, not just the aggregate index. The Intelligence Index is an aggregate across reasoning, coding, math, and instruction following, so it will not predict quality gaps on narrow domain tasks.
Affects you if: You make model selection decisions under cost or sovereignty constraints; you are benchmarking open vs. closed model quality tradeoffs
Adoption effort: Quick (evaluation only; model weights arrive June 4)
Primary source: https://artificialanalysis.ai/articles/nvidia-nemotron-3-ultra-launch-announced
Quality gate score: 5 (benchmark source with concrete numbers +2, linked to primary source confirming figures +2, within scan window +1)
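The size of each gap depends on which score is used as the baseline, which is easy to get wrong when quoting a single percentage. A minimal sketch of the arithmetic, using only the index scores quoted in this entry (the function and variable names are illustrative, not from any source):

```python
# Index scores as quoted in this digest.
SCORES = {"Nemotron 3 Ultra": 48.0, "Kimi K2.6": 54.0, "Opus 4.8": 61.0}


def pct_above(higher: float, lower: float) -> float:
    """How much `higher` exceeds `lower`, as a percentage of `lower`."""
    return (higher - lower) / lower * 100


def pct_below(lower: float, higher: float) -> float:
    """How far `lower` falls short of `higher`, as a percentage of `higher`."""
    return (higher - lower) / higher * 100


def gap_report(base: str = "Nemotron 3 Ultra") -> dict:
    """Both views of the gap between `base` and every other model."""
    base_score = SCORES[base]
    report = {}
    for name, score in SCORES.items():
        if name == base:
            continue
        report[name] = {
            "other_above_base_pct": round(pct_above(score, base_score), 1),
            "base_below_other_pct": round(pct_below(base_score, score), 1),
        }
    return report


if __name__ == "__main__":
    for name, gaps in gap_report().items():
        print(name, gaps)
```

The two views differ by several points at these scores, so it helps to state the baseline whenever a gap is quoted.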
Trends & Emerging Tech
NVIDIA's Open-Model Strategy Clarifies: Hardware Company Becomes Foundational Model Distributor
Source: NVIDIA Newsroom (Computex 2026) | Date: June 1, 2026 | Link: https://nvidianews.nvidia.com/news/nvidia-expands-open-model-families-to-power-the-next-wave-of-agentic-physical-and-healthcare-ai
What's happening: In a single Computex keynote, NVIDIA released Nemotron 3 Ultra (language/reasoning, open weights, commercial) and Cosmos 3 (physical AI, open weights, commercial), alongside open training datasets (2.5 trillion pre-training tokens for Nano, 500k robotics trajectories, 100TB vehicle sensor data). This is the third consecutive Computex where NVIDIA used open model releases as a keynote centerpiece, now covering both language intelligence and physical AI simultaneously.
Why watch this: The pattern is now clear: NVIDIA's open model releases are GPU demand drivers, not standalone AI products. Open + commercial license means any developer who downloads Nemotron 3 Ultra or Cosmos 3 and runs it needs NVIDIA GPUs, so NVIDIA wins whether or not it monetizes the model directly. For developers, this translates to: expect NVIDIA NIM microservice integration to be the path of least resistance for these models, with NIM abstracting away manual vLLM/TensorRT-LLM setup on NVIDIA hardware. The Cosmos 3 action-generation capability is genuinely novel; if physical AI adoption follows a 2–3 year lag behind language AI (analogous to 2021 → 2023 LLM diffusion into production), 2027–2028 is when Cosmos-style models show up in non-robotics enterprise contexts (smart infrastructure, AV data pipelines, simulation-as-a-service).
Technical Discussions
Nothing cleared the quality bar this period. LMArena state as of June 1: Grok 4.1 Thinking leads at 1483 Elo (established November 2025, no new model additions confirmed in the 24h window). SWE-bench standings unchanged from prior digest: Claude Mythos Preview 93.9%, GPT-5.5 88.7%, Opus 4.8 88.6%.
Quick Hits
- llama.cpp b9453 (June 1, 14:56 UTC) — EXAONE 4.5 model support: adds EXAONE 4.5 (LG AI Research) model implementation with vision marker support and GQA for multimodal projection; EXAONE 4.5 multimodal can now be loaded and run locally in llama.cpp. [https://github.com/ggml-org/llama.cpp/releases/tag/b9453]
- llama.cpp b9452 (June 1, 14:20 UTC) — Vulkan block-load for Q3_K/Q6_K: accelerates quantized matrix operations on Intel Battlemage (BMG) GPUs and NVIDIA via the Vulkan path; performance improvement for Q3_K/Q6_K (roughly 3-bit and 6-bit K-quant) inference on non-CUDA-primary hardware. [https://github.com/ggml-org/llama.cpp/releases/tag/b9452]
- llama.cpp b9464 (June 1, 19:57 UTC) — Spec decode behavior change: draft-simple auto-enable feature removed; n_outputs_max fixed. If your llama.cpp deployment relied on automatic speculative decoding activation, you must now enable it explicitly. This is a silent behavior change for any deployment where draft-simple was active without explicit configuration. [https://github.com/ggml-org/llama.cpp/releases/tag/b9464]
- llama.cpp b9459 (June 1, 18:58 UTC) — Metal GLU kernel templating: Apple Silicon GLU operations now use templated f16/f32 kernels with native type load/store; incremental inference quality/performance improvement for M-series Mac users running quantized models via Metal. [https://github.com/ggml-org/llama.cpp/releases/tag/b9459]
- llama.cpp b9460 (June 1, 19:23 UTC) — VRAM savings: limits maximum outputs reserved per llama_context sequence; reduces peak VRAM allocation on multi-sequence workloads without changing inference behavior — relevant if you're running at VRAM limits on consumer GPUs. [https://github.com/ggml-org/llama.cpp/releases/tag/b9460]
Worth Watching (Announced, Not Yet Shipped)
⚠️⚠️⚠️ Gemini CLI Hard Stop — June 18 (17 days) — First time in this digest
Source: Google Developers Blog (announced May 20, 2026) | Link: https://developers.googleblog.com/an-important-update-transitioning-gemini-cli-to-antigravity-cli/
On June 18, 2026, gemini CLI and Gemini Code Assist IDE extensions stop serving requests for Google AI Pro, Ultra, and free personal users (Enterprise/Standard Gemini Code Assist licenses are unaffected through a later date). The replacement is Antigravity CLI (agy), built in Go, available today. Antigravity CLI retains Agent Skills, Hooks, Subagents, and Extensions (renamed to Antigravity Plugins) but does not have 1:1 feature parity out of the gate. Action this week: Install agy, audit your gemini CLI scripts and CI pipeline steps, and test each against Antigravity. Any script using gemini CLI commands that do not map cleanly to agy equivalents needs a workaround before June 18.
⚠️⚠️⚠️ Gemini API Legacy Schema (Interactions) — Hard Removal June 8 (7 days)
(Carried from May 26 digest — 7 days remain, urgent)
Source: Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/interactions-breaking-changes-may-2026
The Api-Revision: 2026-05-07 opt-out header stops working June 8. Applications still using response.outputs structure must migrate to response.steps. Action today: search your codebase for response.outputs and Api-Revision: 2026-05-07. 7 days is not much runway.
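The codebase search can be scripted. The following is a minimal sketch, assuming the legacy usage appears literally as `response.outputs`, `Api-Revision`, or `2026-05-07` in source or config files; the function name, the marker list, and the file suffixes are illustrative choices, not anything prescribed by Google.

```python
from pathlib import Path

# Strings taken from the migration notice above. The header name and its
# value are matched separately because code often stores them as a dict
# key and value rather than as one "Name: value" string.
LEGACY_MARKERS = ("response.outputs", "Api-Revision", "2026-05-07")

# Hypothetical default set of file types to scan; adjust for your repo.
DEFAULT_SUFFIXES = (".py", ".js", ".ts", ".go", ".java", ".sh", ".yaml", ".yml", ".json")


def find_legacy_usage(root, suffixes=DEFAULT_SUFFIXES):
    """Return (path, line_number, marker) for every legacy marker found under root."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        for lineno, line in enumerate(text.splitlines(), start=1):
            for marker in LEGACY_MARKERS:
                if marker in line:
                    hits.append((str(path), lineno, marker))
    return hits


if __name__ == "__main__":
    for file_path, lineno, marker in find_legacy_usage("."):
        print(f"{file_path}:{lineno}: {marker}")
```

A plain substring scan misses cases where the response object is bound to a different variable name (for example `resp.outputs`), so treat an empty result as a starting point rather than proof of a clean migration.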
⚠️⚠️⚠️ Claude Sonnet 4 + Opus 4 — Retirement June 15 (14 days)
(Carried from prior digests)
Source: Anthropic | Link: https://platform.claude.com/docs/en/about-claude/model-deprecations
claude-sonnet-4-20250514 and claude-opus-4-20250514 return errors June 15. Migration: Sonnet 4 → claude-sonnet-4-6-20260217; Opus 4 → claude-opus-4-8. Read the migration guide before upgrading to Opus 4.8 — adaptive thinking replaces explicit budget_tokens; setting temperature, top_p, or top_k to non-default values returns a 400 error.
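A pre-flight check over request parameters can catch these issues before the cutoff. This is a minimal sketch that relies only on the model IDs and constraints stated above; the function name is hypothetical, the request is assumed to be a plain dict of keyword arguments, and the `thinking` / `budget_tokens` shape is an assumption about how your code passes an explicit thinking budget. Because the digest does not list the default sampling values, the sketch flags any explicitly set sampling parameter for review.

```python
# Old -> new model IDs, as listed in this digest entry.
MODEL_MIGRATIONS = {
    "claude-sonnet-4-20250514": "claude-sonnet-4-6-20260217",
    "claude-opus-4-20250514": "claude-opus-4-8",
}

# Parameters this digest says trigger a 400 on Opus 4.8 when non-default.
SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def migrate_request(params: dict) -> tuple[dict, list[str]]:
    """Return a copy of params with the model ID swapped, plus review warnings.

    Nothing other than the model ID is changed; the warnings list what a
    human should look at before sending the migrated request.
    """
    migrated = dict(params)
    warnings = []

    old_model = migrated.get("model")
    if old_model in MODEL_MIGRATIONS:
        migrated["model"] = MODEL_MIGRATIONS[old_model]
        warnings.append(f"model {old_model} replaced with {migrated['model']}")

    if migrated.get("model") == "claude-opus-4-8":
        for name in SAMPLING_PARAMS:
            if name in migrated:
                warnings.append(f"{name} is set explicitly; non-default values are rejected")
        thinking = migrated.get("thinking")
        if isinstance(thinking, dict) and "budget_tokens" in thinking:
            warnings.append("explicit budget_tokens found; adaptive thinking replaces it")

    return migrated, warnings


if __name__ == "__main__":
    request = {"model": "claude-opus-4-20250514", "temperature": 0.2, "max_tokens": 1024}
    new_request, notes = migrate_request(request)
    print(new_request["model"])
    for note in notes:
        print("-", note)
```

The sketch deliberately warns instead of stripping parameters, since dropping `temperature` silently could change behavior in ways the migration guide should inform.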
⚠️⚠️ Gemini API Unrestricted Key Deadline — June 19 (18 days)
(Carried from prior digests)
Source: Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/api-key
All unrestricted Gemini API keys blocked June 19. Restrict via AI Studio → API Keys → "Restrict to Gemini API." Takes 2 minutes; no code changes required.
⚠️ GPT-4.5 Retirement from ChatGPT — June 27 (26 days) — Newly surfaced
Source: OpenAI Platform Changelog | Link: https://platform.openai.com/docs/changelog
GPT-4.5 will be retired from ChatGPT on June 27, 2026. Note: this is specifically the ChatGPT product surface; direct API route retirement is unconfirmed from a primary source as of this digest. If you have any gpt-4.5 model identifiers in code, audit whether they target the API or ChatGPT-based integrations. OpenAI o3 retirement from ChatGPT is also set for August 26, 2026.
NVIDIA Cosmos 3 Edge Variant — Announced, Not Released
Source: NVIDIA Newsroom / Computex 2026 | Link: https://nvidianews.nvidia.com/news/nvidia-launches-cosmos-3-the-open-frontier-foundation-model-for-physical-ai
Edge variant of Cosmos 3 announced for real-time inference on embedded/edge hardware in physical AI deployments. No model size, benchmark numbers, or release date given. Watch https://github.com/nvidia/cosmos for release.
NVIDIA Nemotron 3 Ultra — Weights Available June 4 (3 days)
Source: NVIDIA Newsroom / Computex 2026 | Link: https://nvidianews.nvidia.com/news/nvidia-expands-open-model-families-to-power-the-next-wave-of-agentic-physical-and-healthcare-ai
Weights, NIM microservice, and OpenRouter listing expected June 4. Confirmed platforms: Hugging Face, ModelScope, OpenRouter, build.nvidia.com (NVIDIA NIM).
Gemini 3.5 Pro — In Internal Use, Expected July 2026
Source: Multiple Google I/O briefing reports | Link: https://ofox.ai/blog/gemini-3-5-pro-release-date-expected-specs-2026/
Gemini 3.5 Pro is reportedly in internal use at Google and slated for public release in July 2026. No official announcement yet; only Gemini 3.5 Flash (released May 19, GA) is publicly available. When released, expect significant leaderboard updates. No pricing, model ID, or benchmark numbers disclosed.
Claude Mythos — Public Release Expected "In Coming Weeks"
(Carried from prior digests)
Source: Anthropic | Link: https://anthropic.com/glasswing
Claude Mythos Preview leads SWE-bench Verified at 93.9% (5.3pp above Opus 4.8). Broad API access pending Anthropic cybersecurity safeguards completion. No GA date, pricing, or model ID disclosed.
Ollama v0.30.0 — Still Pre-Release (rc31 as of May 29)
(Carried from May 15 digest)
Source: Ollama (GitHub) | Link: https://github.com/ollama/ollama/releases
v0.30.0 restructures Ollama to use llama.cpp directly as backend, with MLX for Apple Silicon. No stable GA date announced.
<details> <summary>🔭 Horizon — Open Questions, Emerging Patterns & Grounded Speculation</summary>
This section operates under different rules than the digest above. Evidence-grounded speculation is allowed. Pure prediction is not. Every claim here must cite a source from this digest or a real paper/benchmark. Label each entry by type so the reader knows what kind of thinking they're engaging with.
[TENSION] "America's most intelligent open model" is technically correct and strategically incomplete
NVIDIA frames Nemotron 3 Ultra as the "smartest US open-weights model" with 48.0 Intelligence Index. Both claims check out. What the framing omits: Kimi K2.6 (Moonshot AI, China) scores 54.0 on the same index, 12.5% higher than Ultra, and has been available for weeks. The gap is real: 48.0 vs. 54.0 is not noise on a well-calibrated index. For the majority of developers globally who have no sovereign or export-control restrictions on model weights, the benchmark-optimal open-weights choice right now is Kimi K2.6, not Nemotron 3 Ultra. The case for Ultra is speed (300+ tokens/sec), NIM deployment simplicity, and NVIDIA infrastructure integration. A useful heuristic: if your workload is throughput-constrained (chat, summarization, document Q&A at scale), Ultra's speed advantage may outweigh the quality delta. If your workload is quality-constrained (deep reasoning, complex coding, multi-step agentic tasks), the gap is likely to show in your task-specific evals.
Grounded in: Nemotron 3 Ultra Intelligence Index 48.0 (this digest, Model Releases); Kimi K2.6 Intelligence Index 54.0 and Opus 4.8 61.0 (Artificial Analysis, confirmed via search snippets this digest)
[BUILDER'S ANGLE] Cosmos 3 makes "simulate then train" tractable for small robotics teams without three-model infrastructure
Prior to Cosmos 3, building a data-efficient robot control pipeline required maintaining three separate trained models: a world model for simulation (generating what the world looks like after an action), a vision-language model for scene understanding, and a policy network producing concrete motor actions. Each required its own GPU budget, fine-tuning pipeline, and inference infrastructure. Cosmos 3 collapses these into one open-weights model (8B Nano, commercial license, weights live now). The specific enablement for small teams: Cosmos 3 Nano runs inference on a single high-end workstation GPU, making "input visual scene → generate action trajectory → evaluate outcome" loops feasible without cloud-scale compute. The 500,000 open robotics trajectories in the training dataset also give teams a fine-tuning seed without expensive proprietary data collection. The unresolved question: how much task-specific fine-tuning is needed before the action output quality is production-usable? The benchmark result ("#1 across 7+ robotics benchmarks") does not specify fine-tuning cost.
Grounded in: Cosmos 3 (this digest, Model Releases) — 8B Nano variant, 500k robotics trajectories, commercial license, GitHub weights available now
[IF THIS CONTINUES] Three developer CLI platforms deprecating or restructuring in 30 days signals a new maintenance category
In 30 days: Gemini CLI deprecated June 18 (→ Antigravity CLI, no 1:1 parity); Claude Code Workflows entering research preview (May 28, diverging from single-session model toward persistent multi-agent orchestration); Anthropic's ant CLI launching for the Anthropic API (April 8). Each lab is shipping its own agent-native CLI paradigm. If this pace continues, organizations that scripted lab CLIs into their CI/CD pipelines face a new recurring maintenance category: the "CLI migration sprint", separate from API migrations and requiring workflow-level testing, not just model ID substitution. The Gemini CLI case is the clearest example: Antigravity CLI does not have 1:1 feature parity, so any automated pipeline using gemini commands needs workflow-level validation, not a find-and-replace. The practical recommendation: if you have more than a handful of CLI-dependent scripts, maintain a test harness for them that runs against the new CLI before the hard cutoff date.
Grounded in: Gemini CLI deprecation June 18, Antigravity CLI available today (this digest, Worth Watching); Claude Code Workflows research preview May 28 (Anthropic Platform release notes, prior digest); Anthropic ant CLI April 8 (Anthropic Platform release notes)
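A test harness of the kind recommended above can be small. The sketch below is one possible shape, not a prescribed tool: `CliCase`, `run_case`, and `run_suite` are hypothetical names, and the example case uses the Python interpreter as a stand-in command because the digest does not document any specific gemini or agy invocations. Replace the argv lists with the exact commands your scripts run.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CliCase:
    """One scripted CLI invocation and what a passing run looks like."""
    name: str
    argv: list                  # full command line, e.g. the new CLI plus its arguments
    expect_exit: int = 0
    expect_in_stdout: str = ""  # substring that must appear in stdout


def run_case(case: CliCase, timeout: float = 60.0):
    """Run one case and return (passed, message)."""
    try:
        proc = subprocess.run(case.argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Covers a missing binary, permission errors, and hangs.
        return False, f"{case.name}: could not run ({exc})"
    if proc.returncode != case.expect_exit:
        return False, f"{case.name}: exit {proc.returncode}, expected {case.expect_exit}"
    if case.expect_in_stdout not in proc.stdout:
        return False, f"{case.name}: stdout missing {case.expect_in_stdout!r}"
    return True, f"{case.name}: ok"


def run_suite(cases):
    """Run every case; return (all_passed, list_of_messages)."""
    results = [run_case(case) for case in cases]
    return all(ok for ok, _ in results), [msg for _, msg in results]


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; swap in real CLI calls.
    suite = [
        CliCase("smoke", [sys.executable, "-c", "print('hello')"], expect_in_stdout="hello"),
    ]
    passed, messages = run_suite(suite)
    print("\n".join(messages))
    sys.exit(0 if passed else 1)
```

Running the same suite once against the old CLI and once against its replacement turns parity gaps into named failing cases instead of surprises in CI after the cutoff.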
[OPEN QUESTION] What regression forced the removal of draft-simple auto-enable in llama.cpp b9464 — and how many deployments were unknowingly running speculative decoding?
llama.cpp b9464 (June 1) removes the draft-simple speculative decoding auto-enable feature. The commit description confirms removal and an n_outputs_max fix but does not name the triggering regression. Speculative decoding is designed to preserve the main model's output (exactly under greedy decoding, in distribution under sampling), because the main model verifies every drafted token; a draft that diverges should only lower the acceptance rate and throughput. Any change in output would therefore point to a bug in the drafting or verification path, not to speculative decoding as such. If auto-enable was silently activating, deployments may have been running speculative decoding without knowing it. The silent nature is the issue: a deployment that never explicitly configured draft-simple would not have known it was running, and therefore would not have tested its outputs or latency against a non-speculative baseline. If you have any llama.cpp deployment that hasn't been rebuilt against b9464 yet, run your output regression suite and a throughput check before and after updating; the delta (if any) will tell you whether auto-enable was active and affecting your production behavior.
Grounded in: llama.cpp b9464 spec decode behavior change (this digest, Quick Hits)
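Why a non-speculative baseline is the right comparison can be shown with a toy model. The sketch below is not llama.cpp's implementation and says nothing about how draft-simple works internally; the "models" are small deterministic functions invented for illustration. It demonstrates the general property of greedy speculative decoding: because the main model checks every drafted token, the final output matches plain greedy decoding whether the draft is good or bad, and only the acceptance rate changes.

```python
def target_next(ctx):
    """Toy deterministic 'main model': next token from the last two tokens."""
    return (ctx[-1] * 3 + ctx[-2] + 1) % 11


def good_draft(ctx):
    """Toy draft that agrees with the target except at every 4th position."""
    tok = target_next(ctx)
    return tok if len(ctx) % 4 else (tok + 1) % 11


def bad_draft(ctx):
    """Toy draft that always guesses the same token."""
    return 0


def greedy_decode(next_token, prompt, n_new):
    """Baseline: plain greedy decoding with a single model."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(next_token(out))
    return out


def speculative_decode(target, draft, prompt, n_new, k=4):
    """Greedy speculative decoding; returns (tokens, acceptance_rate)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    accepted = checked = 0
    while len(out) < goal:
        # 1. The draft proposes k tokens autoregressively.
        ctx = list(out)
        proposal = []
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target verifies each proposed token in order. Real systems
        #    batch this into one forward pass; it is sequential here for clarity.
        for tok in proposal:
            if len(out) >= goal:
                break
            expected = target(out)
            checked += 1
            if tok == expected:
                out.append(tok)
                accepted += 1
            else:
                out.append(expected)  # first mismatch: keep the target's token
                break
        else:
            # Every drafted token was accepted: the target adds one more token.
            if len(out) < goal:
                out.append(target(out))
    return out, accepted / max(checked, 1)


if __name__ == "__main__":
    prompt = [1, 2]
    baseline = greedy_decode(target_next, prompt, 20)
    for name, draft in (("good", good_draft), ("bad", bad_draft)):
        tokens, rate = speculative_decode(target_next, draft, prompt, 20)
        print(name, "matches baseline:", tokens == baseline, "acceptance:", round(rate, 2))
```

In this toy setup a poor draft costs speed, not correctness, which is why a difference in output between a speculative and a non-speculative run is a useful bug signal.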
Excluded: ~23 items below quality gate threshold or outside scan window. Near-misses:
- Gemini 3.5 Flash (May 19, Google I/O — outside 24h window; 76.2% Terminal-Bench 2.1, 83.6% MCP Atlas, $1.50/$9 per 1M, 1M context, 4x faster than frontier; presumably covered in May 19–25 digest window)
- MiniCPM-V 4.5 (8B multimodal, May 2026 — outside window, 3D-Resampler, hybrid RL)
- OpenAI GPT Realtime 2 / Translate / Whisper (May 7 — outside window, GPT-5-class reasoning in realtime voice API)
- Meta SAM Audio (December 16, 2025 — outside window, audio separation foundation model, github.com/facebookresearch/sam-audio)
- Grok 4.1 Thinking LMArena position 1483 (November 2025 — outside window, no new movement in 24h)
- arXiv cs.CL/cs.AI/cs.LG (403 errors — no papers evaluated)
- HuggingFace Papers Daily (403 error — no papers evaluated)
- Mistral AI Now Summit June 2026 (403 on direct fetch — 10MW Les Ulis Q3 inference facility announced; insufficient technical developer signal without primary source read)
- Claude Code v2.1.159 (May 31 — internal infrastructure, no user-facing changes)
- anthropic-sdk-python v0.105.2 (May 29 — outside window, routine patch)
- Ollama v0.30.0-rc31 (May 29 — pre-release)
- Ollama v0.24.0 (May 14 — outside window, Codex App support + reworked MLX sampler)
- vLLM v0.22.0 (May 29 — covered in prior digest; v0.22.0 confirmed as most recent release, no v0.23.0 in window)
- HuggingFace hub v1.0 (October 2025 — outside window, httpx migration, hf CLI redesign, hf_xet file transfers)
- Transformers v5.9.0 (current as of June 2026 but release date outside window)
- Intel OpenVINO 2026.1 (April 2026 — outside window, llama.cpp backend for Intel CPU/GPU/NPU)
- NVIDIA CES 2026 open models (January 2026 — outside window)
- GPT-5.3-Codex (February 5, 2026 — outside window)
- GPT-4.5 retirement — ChatGPT-only confirmation, API scope unconfirmed (added to Worth Watching with caveat)
- Gemini models Firebase AI Logic shutdown June 1 (pre-announced Gemini 2.0 Flash/Flash-Lite for Firebase — already carried in prior digest)
- llama.cpp b9444 (May 31 — If-None-Match weak ETag handling, cosmetic server fix)
- llama.cpp b9451 (June 1 — Vulkan unused function removal, internal cleanup only)
- llama.cpp b9455 (June 1 — quantized KV cache TP partial-view fix, infrastructure)
- llama.cpp b9457/b9458 (June 1 — Vulkan mutex/lock contention improvements, internal concurrency only)