AI Developer Digest

Thu, Jun 4, 2026

19 signals that cleared the gate | 22 min read

The Signal — start here

NVIDIA ships Nemotron 3 Ultra 550B — the first US open-weights model to score at frontier-class intelligence on an independent index (48 on Artificial Analysis Intelligence Index, behind Kimi K2.6 at 54 but ahead of every other US open model). The architecture is unconventional: a hybrid Mamba-2 / Transformer / MoE that achieves 420 tokens/sec with 1M token context and NVFP4 quantization for commercial deployment. The secondary story is OpenAI consolidating its developer surface: Reusable Prompts, Evals Platform, and Agent Builder all deprecated simultaneously with a November 30 shutdown — the clearest signal yet that OpenAI's developer thesis has consolidated around the Agents SDK and Responses API. For tooling: Claude Code v2.1.162 fixes a silent data-loss bug in MCP servers with paginated tool lists, and llama.cpp's WebGPU backend gets a FlashAttention refactor standardizing quantization across tile paths.

Must-reads today

NVIDIA Nemotron 3 Ultra 550B weights live — first US open-weights frontier-class model; 420 tok/s, 1M context, NVFP4, free commercial use; available now on HuggingFace, NIM, and OpenRouter

OpenAI triple deprecation (Reusable Prompts + Evals + Agent Builder, all Nov 30) — if you use any of these products, audit and start migration planning now

Claude Code v2.1.162 MCP fix — paginated tool lists were silently truncated to the first page, dropping tools; update immediately if you use MCP servers with more than one page of tools

Breaking Changes

Breaking

OpenAI Deprecates Reusable Prompts, Evals Platform, and Agent Builder — All Shutdown November 30, 2026

What changed

Three OpenAI developer products were simultaneously deprecated on June 3: (1) Reusable Prompt objects (v1/prompts) — announced deprecated, shutdown November 30, 2026; (2) Evals Platform — read-only October 31, shutdown November 30, 2026; (3) Agent Builder — shutdown November 30, 2026; ChatKit remains available

TL;DR

OpenAI deprecated its managed Prompt API, hosted Evals product, and GUI Agent Builder tool on the same day (June 3), all with a hard shutdown on November 30, 2026. Developers have just under six months to migrate code that calls v1/prompts, move eval workflows to external tooling, or rebuild agent configurations in the Agents SDK.

Developer signal

Audit your codebase for three distinct migration needs. (1) Reusable Prompts: search for v1/prompts, client.prompts, or managed prompt object references; move prompt content directly into application code before November 30. Migration guide: https://developers.openai.com/api/docs/guides/prompting/migrate-from-prompt-object. (2) Evals Platform: if you use OpenAI's hosted evaluation UI or the Evals API, export your existing eval configs and results before October 31 (when it goes read-only); OpenAI's suggested migration path is Promptfoo, an open-source eval framework. (3) Agent Builder: if you use OpenAI's GUI-based agent builder, migrate to the Agents SDK for code-based agent construction (the openai-agents package) or to ChatGPT Workspace Agents. Start now: just under six months is long enough to feel comfortable but short enough to surprise teams that don't actively maintain their eval or prompt infrastructure, and the Evals export window closes a month earlier. The simultaneous triple deprecation signals that OpenAI is consolidating developer tooling around the Responses API and Agents SDK as the canonical surfaces.
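The Reusable Prompts part of the audit can be roughed out as a small scanner. This is a minimal sketch, not an official migration tool: the two search patterns come from the text above, while the function names and the list of file suffixes are assumptions chosen for illustration.

```python
import re
from pathlib import Path

# Search terms named in the audit advice above; extend for your own codebase.
PATTERNS = {
    "reusable_prompts": re.compile(r"v1/prompts|client\.prompts"),
}

def scan_text(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, label, stripped_line) for each matching line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label, line.strip()))
    return hits

def scan_tree(root: str, suffixes=(".py", ".ts", ".js", ".json", ".yaml", ".yml")):
    """Walk a source tree and yield (path, line_number, label, line)."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            try:
                text = path.read_text(encoding="utf-8", errors="ignore")
            except OSError:
                continue  # unreadable file: skip rather than abort the audit
            for lineno, label, line in scan_text(text):
                yield str(path), lineno, label, line
```

Calling `scan_tree(".")` from a repository root and printing each tuple gives a first-pass list of call sites. It will not find prompt IDs that are stored in environment variables or dashboards, so treat an empty result as a starting point rather than a clean bill of health.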

Affects you if: you call v1/prompts in your OpenAI API integration; you use OpenAI's hosted Evals platform for model evaluation; you use OpenAI Agent Builder to configure agents

Effort: Moderate (Reusable Prompts: move logic to code; Evals: tool migration; Agent Builder: SDK rebuild; each is a distinct migration task)

OpenAI Platform Changelog | Date: June 3, 2026 | Link: https://platform.openai.com/docs/changelog | https://developers.openai.com/api/docs/deprecations

Model Releases

High

NVIDIA Nemotron 3 Ultra 550B: First US Open-Weights Model at Frontier-Class Intelligence Index

What changed

NVIDIA released open weights for Nemotron 3 Ultra — 550B total parameters, 55B active (MoE), hybrid Mamba-2/Transformer architecture — the first US open-weights model to reach an Artificial Analysis Intelligence Index score of 48, ahead of every other US open model (Nemotron 3 Super at 36, Gemma 4 31B at 39) and behind the current Chinese open-weights frontier (Kimi K2.6 at 54); weights available today on HuggingFace, OpenRouter, ModelScope, and NVIDIA NIM

TL;DR

Nemotron 3 Ultra scores 48 on the Artificial Analysis Intelligence Index, achieves 420 tokens/sec throughput (fastest US open model in its class), supports a 1M token context window, ships with NVFP4 and BF16 formats, includes training recipes, and is licensed for commercial use under the NVIDIA Open Model License — weights live now at nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16 and nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 on Hugging Face.

Developer signal

If you have been waiting for an open-weights model competitive with frontier closed models for agentic tasks, Nemotron 3 Ultra is the current best US option. The NVFP4 format delivers up to 5x throughput vs. BF16 on compatible NVIDIA hardware (RTX 40-series and above, H100, B100); use NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 for production inference on NVIDIA GPUs. The Mamba-2 layers in the hybrid architecture scale linearly with sequence length rather than quadratically like self-attention, so the 1M context window is more practical than with a pure Transformer at the same parameter count. For NVIDIA NIM: nim/nvidia/nemotron-3-ultra-550b-a55b is the endpoint, and NIM handles quantization selection automatically. For OpenRouter: nvidia/nemotron-3-ultra-550b-a55b. The LatentMoE routing and multi-token prediction support make this model particularly well-suited for multi-turn agent tasks where total token count is high. Training recipes and a substantial portion of training data are included in the release, which is useful for labs that want to fine-tune or study the training methodology. Caveat: benchmark claims (AIME 2025, TerminalBench, SWE-Bench Verified "leading accuracy") are NVIDIA-reported and independent third-party verification is pending; the Artificial Analysis Intelligence Index score of 48 is the most independently grounded number currently available.
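To size hardware for the two published formats, a weights-only estimate is a useful first check. The sketch below is back-of-the-envelope arithmetic under stated assumptions: the 550B total comes from the release, and NVFP4 is modeled as 4-bit values plus one 8-bit scale per 16-value block (4.5 bits effective). KV cache, activations, and runtime overhead are not included.

```python
TOTAL_PARAMS = 550e9  # total parameters, from the release notes above

# Effective bits per weight. The NVFP4 figure (4-bit values plus an 8-bit
# scale shared by each 16-value block) is an assumption for this estimate.
BITS_PER_WEIGHT = {"BF16": 16.0, "NVFP4": 4.0 + 8.0 / 16.0}

def weight_gb(params: float, fmt: str) -> float:
    """Weights-only memory footprint in decimal gigabytes."""
    return params * BITS_PER_WEIGHT[fmt] / 8 / 1e9

for fmt in BITS_PER_WEIGHT:
    print(f"{fmt}: about {weight_gb(TOTAL_PARAMS, fmt):,.0f} GB of weights")
```

Under these assumptions BF16 comes to roughly 1.1 TB and NVFP4 to roughly 0.3 TB. Note that the estimate uses the total parameter count, not the 55B active count: in a typical serving setup every expert stays resident in memory even though only a subset runs per token.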

Affects you if: you are building agentic pipelines and want open-weights inference without cloud API costs; you are running long-context workloads (>200K tokens) where open-weights quality was previously insufficient; you have NVIDIA GPU infrastructure and need a commercial-use model at frontier-class capability

Effort: Moderate (download weights via HuggingFace or use NIM endpoint; NVFP4 requires NVIDIA GPU with NVFP4 support; although only 55B parameters are active per token, all 550B must be loaded, so BF16 requires a multi-GPU setup)

NVIDIA Technical Blog | Date: June 4, 2026 | Link: https://developer.nvidia.com/blog/nvidia-nemotron-3-ultra-powers-faster-more-efficient-reasoning-for-long-running-agents/ | https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16

API & SDK Changes

Notable

OpenAI Container Sessions Billed Per-Minute Starting June 2

What changed

Container sessions previously billed at the full 20-minute session rate regardless of actual duration; from June 2, eligible container sessions are billed per minute with a 5-minute minimum, using the same underlying per-minute rate

TL;DR

OpenAI container session billing changed from a flat 20-minute rate to per-minute (5-minute minimum) — short container sessions now cost proportionally less; no code changes required, billing changes automatically.

Developer signal

No code changes required. If your container sessions run shorter than 20 minutes, your container costs decrease: you now pay for the minutes used, with a 5-minute floor, instead of the full 20-minute rate. Sessions under 5 minutes are still billed for 5 minutes. Update your cost projections: the per-minute rate is the same, but the billing floor dropped from 20 minutes to 5 minutes. Workloads that spin up containers for short tasks (sub-5-minute tool calls, quick eval runs) see the largest reduction, 75%, because they are billed for 5 minutes instead of 20; workloads that average 6–15 minute sessions save between 70% and 25%.
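The effect on a cost projection is simple arithmetic. The sketch below assumes partial minutes round up to the next whole minute (the changelog summary above does not say) and only covers sessions of 20 minutes or less, since billing for longer sessions is not described here.

```python
import math

OLD_FLAT_MINUTES = 20  # previous billing: full 20-minute session rate
NEW_MIN_MINUTES = 5    # new billing: per minute, with a 5-minute minimum

def billed_minutes_new(duration_min: float) -> int:
    """Minutes billed under the new scheme (assumes round-up to whole minutes)."""
    return max(NEW_MIN_MINUTES, math.ceil(duration_min))

def savings_fraction(duration_min: float) -> float:
    """Fractional cost reduction vs. the old flat 20-minute charge.

    Only meaningful for sessions of 20 minutes or less.
    """
    return 1 - billed_minutes_new(duration_min) / OLD_FLAT_MINUTES

for d in (2, 5, 6, 10, 15, 20):
    print(f"{d:>2} min session: billed {billed_minutes_new(d):>2} min, "
          f"saves {savings_fraction(d):.0%}")
```

Because the per-minute rate is unchanged, the savings fraction is independent of the actual price, so no rate needs to be plugged in to see the relative change.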

Affects you if: you use OpenAI container sessions for code execution, sandboxed tool calls, or evaluation workloads

Effort: Quick (no code changes; update cost projections)

OpenAI Platform Changelog | Date: June 2, 2026 | Link: https://platform.openai.com/docs/changelog

Notable

Claude Code v2.1.162: MCP Paginated Tools Fix, Bedrock/Vertex Picker Regression, Login Fix

What changed

v2.1.162 ships four fixes: (1) MCP servers with paginated tools/list responses now return all pages — previously only the first page was returned, silently dropping any tools beyond the first page; (2) Bedrock and Vertex users can now select "Opus (1M context)" from the /model picker (regression introduced in v2.1.129); (3) remote-session login no longer fails with "Can't access this organization" for users with forceLoginMethod and forceLoginOrgUUID configured; (4) file descriptor exhaustion fixed when running a build inside a skill directory (non-.md files no longer trigger skill reloads)

TL;DR

Claude Code v2.1.162 fixes a silent tool-dropping bug in MCP servers with many tools (paginated lists were truncated to page 1), restores the Opus 1M context model picker for Bedrock/Vertex users, and fixes remote-session org login for enterprise configurations.

Developer signal

The MCP paginated tools fix is the most impactful item: if you use Claude Code with an MCP server that exposes a large number of tools (enough to require pagination in the tools/list response), all tools beyond the first page were silently unavailable. Claude Code would not surface an error — it simply wouldn't know those tools existed. Update to v2.1.162 immediately and re-test any workflows that use MCP servers with large tool sets. To verify: run claude mcp list after updating and check that your tool count matches what the MCP server actually exposes. For Bedrock/Vertex teams: the Opus 1M context picker was broken since v2.1.129 — this restores it. For enterprise teams using forceLoginMethod + forceLoginOrgUUID in remote sessions: the org authentication failure is now fixed. Update via npm i -g @anthropic-ai/claude-code@latest.
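For anyone writing their own MCP client, the failure mode described above is easy to reproduce and to avoid. The sketch below is illustrative and is not Claude Code's implementation: `fetch_page` is a hypothetical callable standing in for one `tools/list` request, and the response shape (`tools` plus an optional `nextCursor`) follows the cursor-based pagination in the MCP specification.

```python
from typing import Callable, Optional

def list_all_tools(fetch_page: Callable[[Optional[str]], dict]) -> list[dict]:
    """Collect tools from every page of a paginated tools/list response.

    fetch_page(cursor) is a hypothetical callable that sends one tools/list
    request and returns the decoded result, e.g.
    {"tools": [...], "nextCursor": "opaque-token"}.
    """
    tools: list[dict] = []
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        tools.extend(page.get("tools", []))
        cursor = page.get("nextCursor")
        if not cursor:  # a missing or empty cursor marks the last page
            return tools

# Stand-in server with two pages, for illustration only.
_PAGES = {
    None: {"tools": [{"name": "read_file"}, {"name": "write_file"}],
           "nextCursor": "page-2"},
    "page-2": {"tools": [{"name": "search"}]},
}

def fake_fetch(cursor: Optional[str]) -> dict:
    return _PAGES[cursor]

first_page_only = fake_fetch(None)["tools"]  # what a non-paginating client sees
all_tools = list_all_tools(fake_fetch)
print(len(first_page_only), "tools on page 1;", len(all_tools), "tools in total")
```

A client that stops after the first response never raises an error. It just ends up with a shorter tool list, which is why this class of bug goes unnoticed until a tool is missing.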

Affects you if: you use Claude Code with MCP servers that have many tools (large tool catalogs exposed via MCP); you use Claude Code on Bedrock or Vertex and need Opus 1M context; you configure remote sessions with forced login methods

Effort: Quick (update Claude Code; re-test MCP tool availability)

Anthropic / Claude Code | Date: June 3, 2026 | Link: https://github.com/anthropics/claude-code/releases

Research

No papers cleared the quality gate this period. arXiv searches for June 3–4 returned no qualifying submissions meeting the criteria of recognized-lab authorship + associated code + concrete benchmark numbers. Hugging Face Papers Daily returned no qualifying June 4 papers. The Anthropic MITRE ATT&CK analysis is covered under Trends below because it is primarily a security research report rather than an ML research paper with benchmarks and code.

Tooling

Notable

llama.cpp b9499–b9501: WebGPU FlashAttention Refactor, Metal Heartbeat Optimization

What changed

Three builds on June 4: b9499 refactors FlashAttention in the WebGPU backend and standardizes quantization support across tile paths and mul_mat operations; b9500 reduces the Metal GPU heartbeat interval from 500ms to 5ms; b9501 (minor — details not in published notes at time of scan)

TL;DR

llama.cpp's WebGPU backend gets a FlashAttention refactor that unifies quantization handling across paths (reducing code divergence and potential correctness bugs), and the Metal backend's GPU polling interval drops from 500ms to 5ms — improving responsiveness for macOS/iOS inference workloads.

Developer signal

For WebGPU inference (browser-based llama.cpp or WebGPU compute in edge deployments): b9499's FlashAttention refactor standardizes how quantized tile paths handle attention computation — if you've seen subtle output differences between quantized and non-quantized WebGPU runs, rebuild from b9499+. No API changes; the fix is internal to the kernel. For macOS/iOS Metal inference: the 500ms → 5ms heartbeat reduction (b9500) means GPU polling is now near-real-time; this reduces the idle stall visible as input latency on interactive inference. Practical impact: noticeable for streaming token output in Metal-backed applications; likely imperceptible for batch inference. Rebuild from b9501 or latest to pick up all three builds.

Affects you if: you run llama.cpp with WebGPU backend (browser or edge inference); you run llama.cpp with Metal backend on macOS or iOS and care about interactive token streaming latency

Effort: Quick (rebuild from b9501 or latest tag; no API changes)

llama.cpp GitHub | Date: June 4, 2026 | Link: https://github.com/ggml-org/llama.cpp/releases

Benchmarks & Leaderboards

No new leaderboard entries or SOTA movements confirmed for June 4, 2026 independent of this digest's Model Releases. The Nemotron 3 Ultra Artificial Analysis Intelligence Index score (48) is the only new benchmark entry today — covered under Model Releases above. LMArena frontier band (1,450–1,561 Elo) unchanged. No new SWE-bench entries. Kimi K2.6 (54 on Artificial Analysis Intelligence Index) remains the top open-weights model globally; Nemotron 3 Ultra (48) is now the top US open-weights model, displacing Nemotron 3 Super (36).

Trends & Emerging Tech

Anthropic's First Year-Over-Year Data on AI-Enabled Attacks: Post-Compromise Use Surges

What's happening

Anthropic analyzed 832 accounts banned for malicious cyber activity between March 2025 and March 2026 and mapped attack patterns to MITRE ATT&CK. Key numbers: 67.3% of cases involved AI for malware writing or attack preparation; AI-assisted phishing fell 8.6% (initial access); AI-assisted post-compromise activity (account discovery) rose 8.9% — attackers are deploying AI deeper in the kill chain, not just at the entry point. The proportion of actors rated medium risk or higher rose from roughly one-third to well over half in twelve months. Anthropic is in discussions with MITRE about new ATT&CK categories for autonomous AI-orchestrated attack chains — behaviors like "make real-time decisions about what to do next and execute without human intervention" don't map to existing framework entries.

Why watch this

For developers building systems that call LLMs with user-supplied input: the shift of AI use toward post-compromise phases (discovery, lateral movement) means the most dangerous threat vector is not a user tricking your app into phishing — it's a compromised environment using AI to enumerate what it can reach. The practical implication is defense-in-depth at the AI layer: restricting what tools an agent can call, enforcing egress allowlists, and monitoring tool_use patterns for anomalous discovery requests. Anthropic's data is the first year-over-year primary-source dataset on this; independent replication at other labs is needed before drawing hard conclusions, but the directional shift toward post-compromise AI use is consistent with what security practitioners have been reporting anecdotally. New MITRE ATT&CK categories for AI orchestration, if adopted, would eventually affect how detection rules are written for AI-integrated systems.
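The three controls named above (tool restriction, egress allowlists, monitoring for discovery-style requests) can be prototyped as a gate that every agent tool call passes through before execution. This is a minimal sketch under stated assumptions: the tool names, the host, and the keyword hints are all hypothetical placeholders, and real detection would need far richer signals than substring matching.

```python
from urllib.parse import urlparse

ALLOWED_TOOLS = {"search_docs", "read_ticket"}   # hypothetical tool names
ALLOWED_HOSTS = {"api.internal.example.com"}     # hypothetical egress allowlist
DISCOVERY_HINTS = ("enumerate", "list_users", "whoami")  # illustrative only

def check_tool_call(name: str, arguments: dict) -> tuple[bool, str]:
    """Decide whether an agent tool call may run; returns (allowed, reason)."""
    if name not in ALLOWED_TOOLS:
        return False, f"tool {name!r} is not on the allowlist"
    for value in arguments.values():
        # Any URL-valued argument must point at an allowlisted host.
        if isinstance(value, str) and value.startswith(("http://", "https://")):
            host = urlparse(value).hostname
            if host not in ALLOWED_HOSTS:
                return False, f"egress to {host!r} is not allowed"
    return True, "ok"

def looks_like_discovery(arguments: dict) -> bool:
    """Flag arguments that resemble account or environment enumeration."""
    text = " ".join(str(v).lower() for v in arguments.values())
    return any(hint in text for hint in DISCOVERY_HINTS)
```

Keeping the decision in one function means the same check can run in the agent loop and in offline log review. Flagged calls from `looks_like_discovery` are better routed to alerting than blocked outright, since legitimate admin tasks can trip simple keyword rules.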

Anthropic | Date: June 4, 2026 | Link: https://www.anthropic.com/news/AI-enabled-cyber-threats-mitre-attack

Technical Discussions

Nothing cleared the quality bar this period. No Hacker News threads with score >200 and concrete technical depth found for June 3–4, 2026. Simon Willison's most recent post was June 2, 2026 (Microsoft MAI models — covered in prior digest).

Quick Hits

OpenAI sora-2 slug update — sora-2 now points to sora-2-2025-12-08; previous snapshot sora-2-2025-10-06 still accessible by pinned slug if needed. [https://platform.openai.com/docs/changelog]
OpenAI gpt-4o-mini-tts and gpt-4o-mini-transcribe slug updates — both now point to 2025-12-15 snapshots; prior 2025-03-20 snapshots still accessible by pinned slug. [https://platform.openai.com/docs/changelog]
NVIDIA Nemotron 3 Ultra BF16 and NVFP4 variants both available — nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16 (full precision, large VRAM footprint) and nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 (5x throughput on compatible NVIDIA GPUs). [https://huggingface.co/nvidia]

Worth Watching (Announced, Not Yet Shipped)

⚠️⚠️⚠️ Gemini API Legacy Schema (Interactions) — Hard Removal June 8 (4 days) — MOST URGENT

(Countdown updated — 4 days remaining)

The Api-Revision: 2026-05-07 opt-out header stops working June 8. Applications using response.outputs structure must migrate to response.steps. Action today: search your codebase for response.outputs and Api-Revision: 2026-05-07. 4 days is the entire remaining window.

Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/interactions-breaking-changes-may-2026

⚠️⚠️ Windows Local AI Runtime — KB5039239 June 9 (5 days)

(Countdown updated)

Windows Update KB5039239 delivers the expanded on-device AI stack (Aion 1.0 runtime, CPU/GPU/NPU support) on June 9. Required for production use of Aion 1.0 Instruct and Aion 1.0 Plan on end-user devices. Aion 1.0 open weights land on Hugging Face in July.

Windows Developer Blog | Link: https://blogs.windows.com/windowsdeveloper/2026/06/02/build-2026-furthering-windows-as-the-trusted-platform-for-development/

⚠️⚠️⚠️ Claude Sonnet 4 + Opus 4 — Retirement June 15 (11 days)

(Countdown updated)

claude-sonnet-4-20250514 and claude-opus-4-20250514 return errors June 15. Migrate to claude-sonnet-4-6-20260217 and claude-opus-4-8 respectively. Review the Opus 4.8 migration guide before upgrading — adaptive thinking replaces budget_tokens; setting temperature, top_p, or top_k to non-default values returns a 400 error.
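One way to stage this migration is to normalize request parameters in a single place before they reach the API client. The sketch below only manipulates a plain dictionary and makes no SDK calls; the model IDs and the sampling-parameter restriction are taken from the notice above, while the function name and the choice to drop overrides silently are assumptions.

```python
# Retired model IDs and their replacements, per the notice above.
REPLACEMENTS = {
    "claude-sonnet-4-20250514": "claude-sonnet-4-6-20260217",
    "claude-opus-4-20250514": "claude-opus-4-8",
}
# Per the notice, non-default values for these return a 400 on Opus 4.8.
SAMPLING_KEYS = ("temperature", "top_p", "top_k")

def migrate_request(params: dict) -> dict:
    """Return a copy of request params with a retired model ID swapped.

    When the target is Opus 4.8, sampling overrides are dropped so that
    the API defaults apply. Thinking configuration is left untouched.
    """
    out = dict(params)
    model = out.get("model")
    if model in REPLACEMENTS:
        out["model"] = REPLACEMENTS[model]
    if out.get("model") == "claude-opus-4-8":
        for key in SAMPLING_KEYS:
            out.pop(key, None)
    return out
```

The helper deliberately does not rewrite `budget_tokens` settings, because the shape of the adaptive thinking configuration is not described here; follow the Opus 4.8 migration guide for that part.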

Anthropic | Link: https://platform.claude.com/docs/en/about-claude/model-deprecations

⚠️⚠️⚠️ Gemini CLI Hard Stop — June 18 (14 days)

(Countdown updated)

gemini CLI and Gemini Code Assist IDE extensions stop serving requests for Google AI Pro, Ultra, and free personal users on June 18. Replacement is Antigravity CLI (agy). Audit CLI scripts and CI pipeline steps now — Antigravity CLI does not have 1:1 feature parity.

Google Developers Blog | Link: https://developers.googleblog.com/an-important-update-transitioning-gemini-cli-to-antigravity-cli/

⚠️⚠️ Gemini API Unrestricted Key Deadline — June 19 (15 days)

(Countdown updated)

All unrestricted Gemini API keys blocked June 19. Restrict via AI Studio → API Keys → "Restrict to Gemini API." Takes 2 minutes; no code changes required.

Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/api-key

⚠️ Gemini Image Models Shutdown — June 25 (21 days)

(Countdown updated)

gemini-3.1-flash-image-preview and gemini-3-pro-image-preview shutting down June 25, 2026. Migrate to stable image model equivalents before the shutdown date.

Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/deprecations

⚠️ GPT-4.5 Retirement from ChatGPT — June 27 (23 days)

(Countdown updated)

GPT-4.5 being retired from the ChatGPT product surface on June 27; direct API route retirement unconfirmed. Audit gpt-4.5 model identifiers in code.

OpenAI Platform Changelog | Link: https://platform.openai.com/docs/changelog

⚠️ OpenAI Reusable Prompts (`v1/prompts`) Shutdown — November 30 (179 days)

(New — from today's Breaking Changes)

Deprecated June 3, shutdown November 30, 2026. Move prompt content to application code. Migration guide: https://developers.openai.com/api/docs/guides/prompting/migrate-from-prompt-object

OpenAI | Link: https://developers.openai.com/api/docs/deprecations

⚠️ OpenAI Evals Platform Shutdown — November 30 (179 days)

(New — from today's Breaking Changes)

Read-only October 31, shutdown November 30, 2026. Export eval configs before October 31; migrate to Promptfoo or equivalent.

OpenAI | Link: https://developers.openai.com/api/docs/deprecations

⚠️ OpenAI Agent Builder Shutdown — November 30 (179 days)

(New — from today's Breaking Changes)

Shutdown November 30, 2026. Migrate to the Agents SDK (the openai-agents package) or ChatGPT Workspace Agents.

OpenAI | Link: https://developers.openai.com/api/docs/deprecations

Meta Muse Spark API — Broader Release Expected "Later June 2026" (No Confirmed Date)

(New — announced, repeatedly delayed)

Meta's Muse Spark AI API has been delayed multiple times. As of June 4, a spokesperson confirmed it is in testing with select early partners and targeting broader release later in June 2026. No API documentation or pricing published. No confirmed date.

Meta (third-party reporting) | Link: https://yournews.com/2026/06/04/7030067/meta-delays-launch-of-muse-spark-ai-api-despite-earlier/

Claude Mythos — Public Release "Once Stronger Safeguards Ready"

(Carried — status unchanged)

No timeline given. Currently: no public API, no claude.ai access at any tier.

Anthropic | Link: https://www.anthropic.com/news/expanding-project-glasswing

Gemini 3.5 Pro — Expected July 2026

(Carried — no official date)

Sundar Pichai stated "give us until next month" at Google I/O 2026 (May 19). No official announcement, pricing, model ID, or benchmark numbers.

Filtered from 30+ primary sources against a published quality rubric. No press releases, no fluff — only what changes what you build.
