AI Developer Digest

Wed, Jun 3, 2026

18 signals that cleared the gate25 min read

The Signal — start here

Microsoft Build 2026 (June 2–3) is the dominant story: Microsoft shipped its first in-house reasoning model (MAI-Thinking-1, 35B active MoE), a compact coding model rolling out to GitHub Copilot (MAI-Code-1-Flash), and on-device SLMs for Windows (Aion 1.0 — Instruct runs on CPU with no NPU/GPU required, Plan is a 14B in-box agentic model). All three are built without OpenAI components. The strategic signal: Microsoft is no longer purely an OpenAI reseller for AI-native developers in its own ecosystem. The secondary story is infrastructure: the Windows Local AI Runtime update (KB5039239, June 9) brings Aion 1.0 to nearly any Windows PC, and NVIDIA's RTX Spark platform sets the hardware table for personal AI agents. For non-Microsoft developers, today is quieter — Anthropic shipped a useful billing fix (no charge for clean refusals) and Claude Code v2.1.161 closed out several persistent bugs.

Must-reads today

Microsoft MAI-Thinking-1 — first in-house Microsoft reasoning model not based on OpenAI data; private preview via Azure Foundry + free via GitHub Models; 97% AIME 2025, human-preference parity with Sonnet 4.6 in blind evals

Microsoft Aion 1.0 — on-device SLMs shipping into Windows itself; Instruct runs on pure CPU (any Windows PC), Plan is a 14B in-box agentic model; Windows Local AI Runtime KB5039239 ships June 9

Anthropic: no billing for clean refusals — Claude API no longer charges when a request returns stop_reason: "refusal" with zero generated output; no code changes required, retroactive cost reduction for content-flagged workloads

Breaking Changes

No breaking changes this period.

Model Releases

High

Microsoft MAI-Thinking-1: First In-House Reasoning Model, Trained Without OpenAI Data

What changed

Microsoft released its first large-scale in-house reasoning model — not a distillation or derivative of OpenAI's models — built from scratch on commercially licensed data using a sparse Mixture-of-Experts architecture; this is the first Microsoft-origin model available through Azure AI Foundry

TL;DR

MAI-Thinking-1 is a 35B-active-parameter sparse MoE reasoning model with a 256K context window, scoring 97.0% on AIME 2025 and 94.5% on AIME 2026, matching Claude Opus 4.6 on SWE-Bench Pro, and reaching human-preference parity with Claude Sonnet 4.6 in blind evals (Surge independent ratings); available in private preview via Azure AI Foundry and free via GitHub Models for prototyping.

Developer signal

If you currently use Azure AI Foundry and want a reasoning model that's fully within Microsoft's commercial data chain (no OpenAI IP exposure), MAI-Thinking-1 is the access path. To request private preview: visit the Azure AI Foundry model catalog and apply for access. For prototyping and evaluation with no Azure subscription required: GitHub Models provides free-tier access. The model is Chat Completions API-compatible, supports function calling, and carries a reported ~40% cost discount vs. equivalent o3 usage on Azure — though per-token rate cards are not yet published. Caveats: benchmark claims are self-reported (Microsoft published a preprint; independent replication has not yet occurred). Treat AIME and SWE-Bench numbers as directionally useful, not independently verified. If your workload is math-heavy multi-step reasoning or long-horizon coding tasks, this is worth a side-by-side evaluation. temperature, top_p, and top_k behavior at non-default values is undocumented for this model — test before assuming they work as expected.

Affects you ifYou are building reasoning-heavy workloads on Azure and want a Microsoft-native model alternative to o3/o4; you have procurement or data-licensing constraints that preclude OpenAI-origin modelsEffortModerate (request private preview access; test Chat Completions API shape; verify function calling behavior in your use case)

Microsoft AI | Date: June 2, 2026 | Link: https://microsoft.ai/news/introducing-mai-thinking-1/https://microsoft.ai/news/introducing-mai-thinking-1/

Medium

Microsoft MAI-Code-1-Flash: 5B Coding Model Shipping into GitHub Copilot

What changed

Microsoft launched its first coding-specific model, MAI-Code-1-Flash — a 5B-parameter model trained for code generation and modification; it is rolling out directly to GitHub Copilot individual plans in VS Code as both a model picker option and under the default "auto" picker

TL;DR

MAI-Code-1-Flash (5B parameters) outperforms Claude Haiku 4.5 by 16 percentage points on SWE-Bench Pro while using 60% fewer tokens on complex coding tasks; it is rolling out to GitHub Copilot individual users in VS Code immediately, with enterprise plans following.

Developer signal

If you use GitHub Copilot in VS Code, you will see MAI-Code-1-Flash appear in the model picker as an option. It is also under the default "auto" picker, meaning Copilot may route eligible requests to it without manual selection. For teams evaluating whether to use Copilot vs. direct Claude/GPT API calls for code generation: the 5B size means it will be faster and cheaper-per-token than larger models; if the benchmark numbers hold (self-reported — independent verification pending), it is a meaningful option for first-pass code generation, test scaffolding, and inline completions. For API access beyond GitHub Copilot, the model is available via Azure AI Foundry in the model catalog. No migration steps required for existing Copilot users — it is additive to the model picker.

Affects you ifYou use GitHub Copilot in VS Code and care about which model is routing your requests; you are evaluating coding models via Azure AI FoundryEffortQuick (no action required for Copilot users; model picker exposure is automatic; explicit model selection optional)

Microsoft AI | Date: June 2, 2026 | Link: https://microsoft.ai/news/introducingmai-code-1-flash/https://microsoft.ai/news/introducingmai-code-1-flash/

Medium

Microsoft Aion 1.0: On-Device SLMs Now Ship In-Box with Windows

What changed

Microsoft announced Aion 1.0, a new family of on-device SLMs shipping as part of Windows itself — Aion 1.0 Instruct (CPU-only, any Windows hardware, no NPU/GPU required, available via Windows Copilot Runtime API now) and Aion 1.0 Plan (14B parameter reasoning + tool-calling model, in-box with Windows, 32K context); Windows Local AI Runtime update KB5039239 ships June 9 and enables the expanded on-device AI stack

TL;DR

Aion 1.0 Instruct runs on pure CPU on any Windows PC and is accessible today via the Windows Copilot Runtime API through Edge Insider; Aion 1.0 Plan (14B, 32K context) ships in-box as part of Windows for fully local agentic workflows; open weights land on Hugging Face in July; KB5039239 delivers the runtime on June 9.

Developer signal

For Windows-targeted applications: Aion 1.0 Instruct is the first Windows inbox SLM accessible without NPU or dedicated GPU — access it via window.ai or the Windows Copilot Runtime API in Edge Insider today. This dramatically expands the addressable device surface for on-device inference (no more Copilot+ PC requirement for NPU-based models). Aion 1.0 Plan (14B, 32K context) enables fully local agentic pipelines: tool calling, file management, and sub-agent orchestration without any cloud API call. It ships in-box, meaning no manual model download for end users. For production use: watch for KB5039239 on June 9, which is the Windows Update delivery vehicle for the expanded on-device AI stack. Open weights scheduled for Hugging Face in July will let you run the model outside of Windows in custom stacks. For Edge-embedded applications, a parallel announcement at Build extends on-device AI APIs in Microsoft Edge across CPU and GPU (not just NPU), available in Edge Insider now.

Affects you ifYou are building Windows applications that want on-device inference without cloud API latency or cost; you target consumer Windows hardware where NPU availability is not guaranteedEffortModerate (access via Windows Copilot Runtime API through Edge Insider; production use requires waiting for KB5039239 on June 9; open-weights access requires waiting for July Hugging Face release)

Microsoft / Windows Developer Blog | Date: June 2, 2026 | Link: https://blogs.windows.com/windowsdeveloper/2026/06/02/build-2026-furthering-windows-as-the-trusted-platform-for-development/https://blogs.windows.com/windowsdeveloper/2026/06/02/build-2026-furthering-windows-as-the-trusted-platform-for-development/

API & SDK Changes

Notable

Anthropic: Refusal Responses With Zero Output Are No Longer Billed

What changed

Previously, Claude API requests that returned stop_reason: "refusal" without generating any output were still billed; as of June 2, such requests cost nothing — zero input and output tokens charged

TL;DR

If a Claude API request returns stop_reason: "refusal" with no generated tokens, you are no longer charged; applies immediately to all plans with no code changes required.

Developer signal

No action required — this is a retroactive billing improvement. If you run high-volume workloads where some fraction of requests hit content filters (e.g., user-submitted content pipelines, document analysis, or content moderation use cases), review your cost profile — the savings will depend on your refusal rate. To detect and handle these responses in code, see the Streaming refusals documentation — stop_reason: "refusal" with an empty content array is the indicator. The stop_details field (introduced May 28 with Opus 4.8) returns a category field (cyber, bio, or null) and a human-readable explanation for routing different refusal classes downstream.

Affects you ifYou submit user-generated or potentially policy-violating content to the Claude API and occasionally hit clean refusals (zero-output responses); any high-volume content pipeline where refusal rate is non-zeroEffortQuick (no code changes required; update cost estimates in your billing projections)

Anthropic Platform Release Notes | Date: June 2, 2026 | Link: https://platform.claude.com/docs/en/release-notes/overviewhttps://platform.claude.com/docs/en/release-notes/overview

Notable

Claude Code v2.1.161: Parallel Tool Call Resilience, OTEL Metric Labels, Linux Clipboard Fix

What changed

v2.1.161 ships after v2.1.160 (covered in prior digest); key new items: failed Bash commands in a parallel tool batch no longer cancel other calls in the same batch; OTEL_RESOURCE_ATTRIBUTES values now applied as labels on metric datapoints; fullscreen mode now uses wl-copy/xclip/xsel on Linux for clipboard; agent view shows done/total count; /mcp collapses unused claude.ai connectors

TL;DR

v2.1.161 makes parallel Bash tool calls resilient to individual failures (each returns independently), adds custom dimension slicing to OTEL metrics via OTEL_RESOURCE_ATTRIBUTES, and fixes Linux clipboard behavior in fullscreen mode.

Developer signal

For teams using Claude Code in parallel tool-call workflows: previously a failed Bash command could cancel the entire batch; now each tool returns independently. This matters for agent pipelines that run multiple shell commands in parallel — a timeout or error in one no longer kills the rest. For teams with observability/OTEL instrumentation: set OTEL_RESOURCE_ATTRIBUTES=team=backend,repo=api-service (or similar) in your environment; those key-value pairs now appear as labels on all Claude Code metric datapoints, enabling per-team or per-repo cost and usage slicing in your metrics backend. For Linux users in fullscreen mode: clipboard now uses wl-copy (Wayland), xclip, or xsel with automatic fallback, and writes to both clipboard and PRIMARY selection for middle-click paste. The claude agents row now shows done/total task count upfront, useful for monitoring long-running multi-agent sessions. Also included: fix for claude mcp commands that previously expanded ${VAR} secrets in logs — credentials are now redacted.

Affects you ifYou run Claude Code with parallel Bash tool calls in agent pipelines; you have OTEL observability on Claude Code usage; you use Claude Code in fullscreen mode on Linux; you use claude agents for multi-agent session monitoringEffortQuick (update Claude Code to v2.1.161; set OTEL_RESOURCE_ATTRIBUTES env var if you want metric labeling; no other changes required)

Claude Code Docs / Anthropic | Date: June 2, 2026 | Link: https://code.claude.com/docs/en/changeloghttps://github.com/anthropics/claude-code/blob/main/CHANGELOG.md

Research

No papers cleared the quality gate this period. arXiv cs.AI/cs.CL/cs.LG searches returned 403 errors or no papers matching the minimum criteria (recognized-lab authorship + associated code + concrete benchmark numbers) within the June 2–3 window. Hugging Face Papers Daily returned no qualifying June 3 papers.

Tooling

Notable

llama.cpp June 3 Builds (b9485–b9495): Gemma 4 Unified Vision, Qwen3 SSM Support, KV Cache Reservation

What changed

11 builds shipped June 3 — enabling non-causal vision for Gemma 4 unified models (b9494), adding qwen3 SSM architecture test support and new LLM_KV_ATTENTION_RECURRENT_LAYERS config key (b9488), reserving GPU memory for quantized KV cache at initialization (b9489), fixing unnecessary mmproj downloads when --no-mmproj is set (b9485), updating BoringSSL to 0.20260526.0 (b9487), and fixing a PDL race condition in kernel headers (b9491)

TL;DR

11 llama.cpp builds on June 3 add Gemma 4 unified non-causal vision support, qwen3 SSM architecture handling (with new LLM_KV_ATTENTION_RECURRENT_LAYERS key), GPU KV cache memory pre-reservation at init, and fix unnecessary multimodal projection downloads when --no-mmproj is specified.

Developer signal

If you run Gemma 4 unified multimodal: build from b9494 or later for correct non-causal vision behavior — earlier builds lack this path. If you are evaluating qwen3 SSM variants: b9488 adds architecture test support; the new LLM_KV_ATTENTION_RECURRENT_LAYERS GGUF key is required for qwen3 SSM models — check your GGUF file metadata before running. The KV cache GPU memory pre-reservation (b9489) reduces dynamic allocation overhead for quantized KV cache setups; if you see GPU OOM at startup after this change, you may need to reduce --kv-cache-quant size or context length. For anyone running --no-mmproj: the flag now correctly suppresses multimodal projection downloads at model pull time (b9485) — prior builds would download projection files even when explicitly told not to, wasting bandwidth and storage. BoringSSL update (b9487) is a library hygiene update with no user-facing behavior changes.

Affects you ifYou run Gemma 4 unified or qwen3 SSM models in llama.cpp; you use quantized KV cache with GPU inference; you use --no-mmproj to suppress projection downloadsEffortQuick (rebuild from b9495 or latest; check GGUF metadata for LLM_KV_ATTENTION_RECURRENT_LAYERS if running qwen3 SSM)

llama.cpp GitHub | Date: June 3, 2026 | Link: https://github.com/ggml-org/llama.cpp/releaseshttps://github.com/ggml-org/llama.cpp/releases

Notable

Ollama v0.30.2: Poolside/Laguna Architecture, Radeon 8060S iGPU, Token Counting Fix

What changed

Patch release following v0.30.0 stable — adds poolside/Laguna architecture support, adds Radeon 8060S iGPU detection, fixes token counting to include cached prompt tokens from llama-server, adds llama-server load stall detection, improves SSE ping comment handling, and isolates Codex launch configuration

TL;DR

Ollama v0.30.2 adds support for poolside's Laguna model architecture (enabling ollama run on Laguna-family models), adds Radeon 8060S iGPU support, and fixes a token counting bug where cached prompt tokens from llama-server were excluded from usage counts.

Developer signal

If you track token usage via the Ollama API for billing or budgeting: upgrade to v0.30.2 — cached prompt tokens from llama-server were excluded from reported usage counts in v0.30.0; the fix means reported counts will be higher after upgrade (reflecting actual usage, not undercounting). If you run an Ollama server behind a proxy: the SSE ping comment handling improvement reduces false-positive connection drops. For Radeon 8060S iGPU users: this is the first Ollama release with explicit detection for this chip; upgrade to enable GPU acceleration. The poolside/Laguna architecture addition (14 PRs, 4 contributors) enables ollama run hf.co/<user>/laguna-* once Laguna-family GGUF models are available on Hugging Face — no additional configuration required after upgrading.

Affects you ifYou track token usage via Ollama API (token counts change after upgrade); you run Ollama on a system with a Radeon 8060S iGPU; you deploy Ollama server behind a proxy with SSE keep-alive requirementsEffortQuick (standard version update; note token count reporting change and update any billing projections accordingly)

Ollama GitHub | Date: June 3, 2026 | Link: https://github.com/ollama/ollama/releases/tag/v0.30.2https://github.com/ollama/ollama/releases/tag/v0.30.2

Benchmarks & Leaderboards

No new leaderboard entries or SOTA movements confirmed for June 3, 2026. LMArena frontier range sits between 1,450–1,561 Elo; GPT-5, Claude Opus 4.6, Gemini 3.1 Pro, Grok 4, and DeepSeek V3.2 occupy the frontier band. No new models entered the top tier June 3. SWE-bench standings unchanged from prior digest. The Microsoft self-reported benchmarks for MAI-Thinking-1 (97% AIME 2025, SWE-Bench Pro parity with Opus 4.6) are covered under Model Releases above — they are self-reported and await third-party replication.

Trends & Emerging Tech

Microsoft Builds Its Own AI Stack — and Deploys It Through GitHub and Windows

What's happening

In a single Build keynote, Microsoft announced MAI-Thinking-1 (reasoning), MAI-Code-1-Flash (coding), Aion 1.0 Instruct (on-device, CPU), and Aion 1.0 Plan (on-device, agentic) — all built in-house, without OpenAI components. The distribution channel is Microsoft's existing surface area: GitHub Copilot (MAI-Code-1-Flash rolling out now), GitHub Models (MAI-Thinking-1 free tier), and Windows itself (Aion 1.0 ships in-box). NVIDIA is providing the hardware layer: RTX Spark and Microsoft MXC containers for personal AI agent isolation on Windows.

Why watch this

The practical implication for developers is that Microsoft's AI ecosystem is bifurcating: OpenAI-powered (Azure OpenAI Service, ChatGPT Enterprise) for teams that want OpenAI's model lineage, and MAI-powered (Azure Foundry MAI models, GitHub Copilot) for teams that want Microsoft's data-licensing chain or want to reduce OpenAI dependency. If you build on GitHub Copilot or target Windows end users, you will be running on Microsoft's model stack within months whether you opt in or not — the "auto" picker in Copilot already routes to MAI-Code-1-Flash. The open question is whether Microsoft's model quality ramp will be fast enough to justify the switch for API-level workloads, or whether the first-mover advantage stays with OpenAI/Anthropic for reasoning-heavy tasks.

Microsoft AI / Windows Developer Blog | Date: June 2, 2026 | Link: https://microsoft.ai/news/microsoft-build-2026-mai-keynote-transcript/

Technical Discussions

Nothing cleared the quality bar this period. No Hacker News threads with score >200 and concrete technical depth found for June 3, 2026.

Quick Hits

Anthropic Claude Partner Network — Services Track + Partner Hub launched — Three-tier partner program (Select/Preferred/Global Premier) with a new Partner Hub portal; partners can connect the Hub to Claude via a new MCP connector to query their partnership status. [https://www.anthropic.com/news/services-track-partner-hub]
NVIDIA RTX Spark superchip announced — New Windows PC superchip for personal AI agents, with RTX Spark laptops and desktops arriving from ASUS, Dell, HP, Lenovo, Surface, and MSI this fall; pairs with Microsoft MXC containers for sandboxed agent execution. [https://nvidianews.nvidia.com/news/nvidia-microsoft-windows-pcs-agents-rtx-spark]

Worth Watching (Announced, Not Yet Shipped)

⚠️⚠️⚠️ Gemini API Legacy Schema (Interactions) — Hard Removal June 8 (5 days) — URGENT

(Carried — now 5 days, most urgent item)

The Api-Revision: 2026-05-07 opt-out header stops working June 8. Applications using response.outputs structure must migrate to response.steps. Action today: search your codebase for response.outputs and Api-Revision: 2026-05-07. 5 days is the entire remaining window.

Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/interactions-breaking-changes-may-2026

⚠️⚠️⚠️ Claude Sonnet 4 + Opus 4 — Retirement June 15 (12 days)

(Carried)

claude-sonnet-4-20250514 and claude-opus-4-20250514 return errors June 15. Migrate to claude-sonnet-4-6-20260217 and claude-opus-4-8 respectively. Review the migration guide before upgrading to Opus 4.8 — adaptive thinking replaces budget_tokens; setting temperature, top_p, or top_k to non-default values returns a 400 error.

Anthropic | Link: https://platform.claude.com/docs/en/about-claude/model-deprecations

⚠️⚠️⚠️ Gemini CLI Hard Stop — June 18 (15 days)

(Carried)

gemini CLI and Gemini Code Assist IDE extensions stop serving requests for Google AI Pro, Ultra, and free personal users on June 18. Replacement is Antigravity CLI (agy). Audit CLI scripts and CI pipeline steps now — Antigravity CLI does not have 1:1 feature parity.

Google Developers Blog | Link: https://developers.googleblog.com/an-important-update-transitioning-gemini-cli-to-antigravity-cli/

⚠️⚠️ Gemini API Unrestricted Key Deadline — June 19 (16 days)

(Carried)

All unrestricted Gemini API keys blocked June 19. Restrict via AI Studio → API Keys → "Restrict to Gemini API." Takes 2 minutes; no code changes required.

Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/api-key

⚠️ Gemini Image Models Shutdown — June 25 (22 days)

(Carried)

gemini-3.1-flash-image-preview and gemini-3-pro-image-preview shutting down June 25, 2026. Migrate to stable image model equivalents before the shutdown date.

Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/deprecations

⚠️ GPT-4.5 Retirement from ChatGPT — June 27 (24 days)

(Carried)

GPT-4.5 being retired from the ChatGPT product surface on June 27; direct API route retirement unconfirmed from primary source. Audit gpt-4.5 model identifiers in code to determine if they target API or ChatGPT-based integrations.

OpenAI Platform Changelog | Link: https://platform.openai.com/docs/changelog

⚠️⚠️⚠️ NVIDIA Nemotron 3 Ultra Weights — June 4 (TOMORROW)

(Urgency upgrade — weights drop tomorrow)

550B total / 55B active open-weights MoE. 48.0 Artificial Analysis Intelligence Index (ahead of Gemma 4 31B at 39, Nemotron 3 Super at 36). Weights go live tomorrow on Hugging Face, ModelScope, OpenRouter, and NVIDIA NIM. Prepare eval pipelines now.

NVIDIA Newsroom | Link: https://nvidianews.nvidia.com/news/nvidia-expands-open-model-families-to-power-the-next-wave-of-agentic-physical-and-healthcare-ai

⚠️ Windows Local AI Runtime — KB5039239 June 9 (6 days)

Windows Update KB5039239 delivers the expanded on-device AI stack (Aion 1.0 runtime, CPU/GPU/NPU support) on June 9. Required for production use of Aion 1.0 Instruct and Aion 1.0 Plan on end-user devices. Aion 1.0 open weights land on Hugging Face in July.

Windows Developer Blog | Link: https://blogs.windows.com/windowsdeveloper/2026/06/02/build-2026-furthering-windows-as-the-trusted-platform-for-development/

Claude Mythos — Public Release "Once Stronger Safeguards Ready"

(Carried — status unchanged)

No timeline given; speculative forecast puts earliest general access at September 2026. Currently: no public API, no claude.ai access at any tier.

Anthropic | Link: https://www.anthropic.com/news/expanding-project-glasswing

Gemini 3.5 Pro — Expected July 2026

(Carried — third-party reporting only, no official announcement)

Sundar Pichai stated "give us until next month" at Google I/O 2026 (May 19). No official announcement, pricing, model ID, or benchmark numbers. Watch for July announcement.

Filtered from 30+ primary sources against a published quality rubric. No press releases, no fluff — only what changes what you build.

Breaking Changes

Model Releases

Microsoft MAI-Thinking-1: First In-House Reasoning Model, Trained Without OpenAI Data

Microsoft MAI-Code-1-Flash: 5B Coding Model Shipping into GitHub Copilot

Microsoft Aion 1.0: On-Device SLMs Now Ship In-Box with Windows

API & SDK Changes

Anthropic: Refusal Responses With Zero Output Are No Longer Billed

Claude Code v2.1.161: Parallel Tool Call Resilience, OTEL Metric Labels, Linux Clipboard Fix

Research

Tooling

llama.cpp June 3 Builds (b9485–b9495): Gemma 4 Unified Vision, Qwen3 SSM Support, KV Cache Reservation

Ollama v0.30.2: Poolside/Laguna Architecture, Radeon 8060S iGPU, Token Counting Fix

Benchmarks & Leaderboards

Trends & Emerging Tech

Microsoft Builds Its Own AI Stack — and Deploys It Through GitHub and Windows

Technical Discussions

Quick Hits

Worth Watching (Announced, Not Yet Shipped)

⚠️⚠️⚠️ Gemini API Legacy Schema (Interactions) — Hard Removal **June 8 (5 days)** — URGENT

⚠️⚠️⚠️ Claude Sonnet 4 + Opus 4 — Retirement **June 15 (12 days)**

⚠️⚠️⚠️ Gemini CLI Hard Stop — **June 18 (15 days)**

⚠️⚠️ Gemini API Unrestricted Key Deadline — **June 19 (16 days)**

⚠️ Gemini Image Models Shutdown — **June 25 (22 days)**

⚠️ GPT-4.5 Retirement from ChatGPT — **June 27 (24 days)**

⚠️⚠️⚠️ NVIDIA Nemotron 3 Ultra Weights — **June 4 (TOMORROW)**

⚠️ Windows Local AI Runtime — **KB5039239 June 9 (6 days)**

Claude Mythos — Public Release "Once Stronger Safeguards Ready"

Gemini 3.5 Pro — Expected July 2026

⚠️⚠️⚠️ Gemini API Legacy Schema (Interactions) — Hard Removal June 8 (5 days) — URGENT

⚠️⚠️⚠️ Claude Sonnet 4 + Opus 4 — Retirement June 15 (12 days)

⚠️⚠️⚠️ Gemini CLI Hard Stop — June 18 (15 days)

⚠️⚠️ Gemini API Unrestricted Key Deadline — June 19 (16 days)

⚠️ Gemini Image Models Shutdown — June 25 (22 days)

⚠️ GPT-4.5 Retirement from ChatGPT — June 27 (24 days)

⚠️⚠️⚠️ NVIDIA Nemotron 3 Ultra Weights — June 4 (TOMORROW)

⚠️ Windows Local AI Runtime — KB5039239 June 9 (6 days)