AI Developer Digest
4 items passed quality gate | 26 candidates evaluated | 22 excluded | Sources checked: 26 Scan window: May 29 (post-prior-scan) – May 30, 2026. Prior digest covered: Opus 4.8 fast mode pricing ($10/$50/MTok); Claude Platform on AWS Managed Agents (webhooks, multiagent, self-hosted sandboxes); Claude Code v2.1.154 + v2.1.156 (Opus 4.8 integration, thinking-block bug fix); llama.cpp b9411 (DeepSeek V3.2 local inference).
This Week's Signal
Two major tooling releases define today's digest. vLLM v0.22.0 is the headline: 459 commits, and EAGLE 3.1 speculative decoding ships in it delivering 2.03× throughput at concurrency 1 — making long-context speculative decoding reliably fast for the first time. Claude Code v2.1.157 overhauled the plugin system: skills in
.claude/skillsnow auto-load without marketplace registration, removing the main friction point for teams wanting custom Claude Code extensions. Both items reward a few hours of hands-on testing. The most urgent action today is neither: GitHub Copilot metered billing and Gemini 2.0 Flash shutdown both activate tomorrow (June 1). Check your Copilot usage preview and migration status today.
Must-reads this digest:
- vLLM v0.22.0 with EAGLE 3.1 — 2.03× per-user throughput, 28.9% e2e latency gain via FP8; if you run vLLM inference, this is the most impactful single upgrade in months
- Claude Code v2.1.157 — local plugins without marketplace,
claude plugin init, 20+ bug fixes; the skills system is now practical for teams without waiting for marketplace approval - GitHub Copilot billing changes TOMORROW (June 1) — agentic sessions can exhaust a Pro plan's monthly credits in a single session; check your usage preview today
[BREAKING] Breaking Changes
No breaking changes to APIs or SDKs this period. However, two deadline-triggered events activate tomorrow (June 1) that are effectively breaking for affected workflows — see Worth Watching section for migration steps.
Tooling
[HIGH] Claude Code v2.1.157 — Local Plugin Auto-Load, claude plugin init, Agent Settings, and 20+ Bug Fixes
Source: Anthropic (code.claude.com) | Date: May 29, 2026 | Link: https://code.claude.com/docs/en/changelog
What changed: Plugins in .claude/skills directories now auto-load without marketplace registration. Previously, skills were installed through the Claude marketplace or required manual configuration; now any plugin placed in the .claude/skills directory is automatically discovered and loaded at session start. Added claude plugin init <name> scaffold command. The agent field in settings.json is now honored for dispatched sessions (override with --agent <name>). EnterWorktree can switch between Claude-managed worktrees mid-session. tool_decision telemetry events now include tool_parameters when OTEL_LOG_TOOL_DETAILS=1.
TL;DR: Claude Code v2.1.157 removes marketplace dependency for custom plugins — any .claude/skills directory is auto-loaded, enabling teams to ship and iterate on local skills without waiting for marketplace approval.
Developer signal: If you're maintaining custom Claude Code skills or evaluating the skills system for your team, this is the release to test against. The auto-load change means you can: (1) claude plugin init my-skill to scaffold a new skill in .claude/skills/my-skill/, (2) restart Claude Code — the skill loads automatically with no registration step. The agent field in settings.json is now meaningful for agentic workflows: set "agent": "my-agent-profile" in your project's settings.json to make dispatched sessions always use that agent configuration, with --agent <name> available as a per-invocation override. Separately, EnterWorktree mid-session switching means you no longer need to restart Claude Code when moving between Claude-managed worktrees in the same project. The 20+ bug fixes in this release include: WSL image paste (alt+v), Windows 11 screenshot paste, Windows Explorer drag-and-drop, right-click paste duplication in VS Code/Cursor/Windsurf, background session orphaned worktrees, sandbox network permission prompts in auto mode, literal markdown markers appearing in fullscreen mode, and terminal freezing after managed-settings security dialogs. Update: npm update -g @anthropic-ai/claude-code.
Affects you if: You are building or using custom Claude Code plugins/skills; you are running Claude Code in agentic workflows with dispatched sessions; you are using Claude Code on WSL or Windows with integrated terminal issues.
Adoption effort: Quick (update Claude Code; place plugins in .claude/skills/ — no registration needed)
Primary source: https://code.claude.com/docs/en/changelog
Quality gate score: 9 (official Anthropic source +3, concrete feature list with specific commands/APIs +2, primary source link +2, within window +1, technical audience +1)
[MEDIUM] Claude Code v2.1.158 — Auto Mode Available on Bedrock, Vertex, and Foundry for Opus 4.7/4.8
Source: Anthropic (code.claude.com) | Date: May 30, 2026 | Link: https://code.claude.com/docs/en/changelog
What changed: Auto mode — which dynamically selects between fast completion and deep reasoning based on task complexity — is now available for Opus 4.7 and Opus 4.8 on Amazon Bedrock, Google Vertex AI, and Microsoft Foundry deployments. Previously, Auto mode was only available through the first-party Claude API and Claude Code Max plan. Opt in via CLAUDE_CODE_ENABLE_AUTO_MODE=1.
TL;DR: Claude Code v2.1.158 extends Auto mode (adaptive fast/deep switching) to Bedrock, Vertex, and Foundry for Opus 4.7 and Opus 4.8 — enterprise deployments on cloud platforms now get the same adaptive reasoning depth as first-party API users.
Developer signal: If you're running Claude Code through Bedrock, Vertex, or Foundry and have been using Opus 4.7 or 4.8 with static effort settings, Auto mode lets the model self-select reasoning depth per turn rather than requiring you to choose between fast and careful modes up front. Enable with CLAUDE_CODE_ENABLE_AUTO_MODE=1 in your environment. Note that Auto mode on cloud platforms still doesn't include fast mode (the 2.5× speed acceleration at 2× price) — fast mode for Opus 4.8 remains Claude API-only per the May 28 release notes. If you're on the first-party Claude API and Claude Code Max plan, Auto mode was already available; this release adds parity for enterprise cloud-platform deployments. Useful for agentic coding loops where simple navigation turns don't need Opus-level reasoning but complex refactoring turns do.
Affects you if: You deploy Claude Code via Amazon Bedrock, Google Vertex AI, or Microsoft Foundry and want adaptive reasoning depth without manually switching effort levels per task.
Adoption effort: Quick (set CLAUDE_CODE_ENABLE_AUTO_MODE=1 — no code changes, no config migration)
Primary source: https://code.claude.com/docs/en/changelog
Quality gate score: 9 (official Anthropic source +3, concrete env var and model names +2, primary source link +2, within window today +1, technical audience +1)
[HIGH] vLLM v0.22.0 — EAGLE 3.1 (2.03× Throughput), 28.9% FP8 Latency Improvement, DeepSeek V4 Hardening, Rust Frontend Preview
Source: vllm-project/vllm (GitHub) | Date: May 29, 2026 | Link: https://github.com/vllm-project/vllm/releases/tag/v0.22.0
What changed: vLLM v0.22.0 ships EAGLE 3.1 speculative decoding (previously in preview, now integrated as config-driven extension with full backward compatibility for EAGLE 3 checkpoints), batch-invariant Cutlass FP8 inference (28.9% e2e latency reduction), CutlassFP8 padding preprocessing (+13.5% TTFT), padded NVFP4 quantization (+2.4–5.7% e2e), Model Runner V2 advancement (Qwen3-dense-by-default oracle, sleep-mode weight reload, shared KV-cache layers), DeepSeek V4 with NVFP4 fused MoE and full CUDA graph, and an experimental Rust frontend. New thinking_token_budget API parameter and API-key authorization for /v2 endpoints are also included. Breaking: removed old get_tokenizer and resolve_hf_chat_template import locations; removed deprecated MLA prefill arguments; environment variables for backend selection now replaced by --moe-backend / --linear-backend flags.
TL;DR: vLLM v0.22.0 delivers 2.03× per-user output throughput via EAGLE 3.1 speculative decoding at C=1 (1.66× at C=16), 28.9% end-to-end latency improvement via batch-invariant FP8, and DeepSeek V4 with NVFP4 fused MoE in 459 commits from 230 contributors.
Developer signal: This is the most impactful vLLM upgrade in several months. Three distinct things to act on: (1) EAGLE 3.1 — if you're running Kimi K2.x, Qwen3, or other models with available EAGLE draft checkpoints, upgrading to v0.22.0 and enabling EAGLE 3.1 via the config extension delivers 2.03× per-user throughput at concurrency 1 (1.71× at C=4, 1.66× at C=16) on SPEED-Bench. The key technical fix vs. EAGLE 3 is FC normalization after each target hidden state plus post-norm hidden state feeding — this eliminates attention drift that was causing acceptance length degradation in long-context workloads. EAGLE 3 checkpoints remain fully compatible. (2) FP8 improvements — the batch-invariant Cutlass FP8 path yields 28.9% e2e latency reduction and the CutlassFP8 padding preprocessing delivers +13.5% TTFT improvement; no config changes needed if you're already using FP8 quantization — the improvements apply automatically. (3) Breaking changes: if your codebase imports get_tokenizer or resolve_hf_chat_template from old vLLM locations, you will get import errors on upgrade; check your import paths before deploying. MLA prefill arguments deprecated in v0.21.x are now removed — use --moe-backend and --linear-backend flags instead of the old environment variable equivalents. CUDA 12.9 wheels now use PyTorch manylinux_2_28 base — verify your base image is compatible. FlashInfer bumped to v0.6.11.post2 and nvidia-cutlass-dsl to 4.5.2.
Affects you if: You are running vLLM for inference serving (FP8 improvements apply broadly); you are using speculative decoding with EAGLE models (EAGLE 3.1 upgrade available); you are importing from vllm.utils.tokenizer or vllm.chat_template old paths (breaking change on upgrade); you are using DeepSeek V4 via vLLM (NVFP4 fused MoE now supported).
Adoption effort: Moderate (upgrade package; verify import paths; check CUDA wheel compatibility; configure EAGLE 3.1 via config extension if using speculative decoding)
Primary source: https://github.com/vllm-project/vllm/releases/tag/v0.22.0 | EAGLE 3.1 blog: https://vllm.ai/blog/2026-05-26-eagle-3-1
Quality gate score: 9 (official GitHub release +3, concrete benchmark numbers 2.03×/28.9%/13.5% +2, primary source link +2, within window +1, technical audience +1)
API & SDK Changes
Nothing new this period. The Anthropic Platform release notes show no entries dated May 30, 2026 (most recent entry: May 29 — AWS Managed Agents, covered in prior digest).
Research
Nothing cleared the quality bar this period. arXiv cs.AI and cs.CL listing pages returned 403 at fetch time. HuggingFace Papers Daily returned 403. No papers surfaced via search meeting the bar of: recognized lab authorship + associated code repo + benchmark numbers + within the 24h window simultaneously.
Benchmarks & Leaderboards
Nothing new in the 24-hour scan window. SWE-bench Verified leaderboard stands at: Claude Mythos Preview 93.9%, Claude Opus 4.8 88.6%, GPT-5.5 88.7% — all confirmed from prior scan window. No new model additions to LMArena text/code leaderboards confirmed within window (most recent confirmed additions: mai-image-2.5-preview May 26, qwen3.7-max May 25).
Trends & Emerging Tech
Claude Code's Plugin Ecosystem Is Maturing — Local-First Development Now Practical
Source: Anthropic (code.claude.com) | Date: May 29–30, 2026 | Link: https://code.claude.com/docs/en/changelog
What's happening: Claude Code v2.1.157's auto-load of .claude/skills plugins, combined with the May 28 launch of Claude Code Workflows (research preview), represents a pattern shift in how teams extend Claude Code: the locus of customization is moving from marketplace-registered extensions to local project-level skills checked into the repo. The claude plugin init scaffolding command, autocomplete for /plugin, the agent field in settings.json, and the "Workflow keyword trigger" config toggle all point toward a more programmable Claude Code that teams configure once per project and commit rather than configure per-user per-machine.
Why watch this: Teams that previously avoided Claude Code customization because of marketplace friction or per-seat setup overhead should re-evaluate. The pattern emerging is: skills in .claude/skills/ define what Claude can do in this repo, settings.json defines which agent profile runs by default, and Workflows define repeatable multi-step patterns. This is converging toward something closer to a repo-local "Claude configuration" that travels with the codebase. The practical experiment to run this week: scaffold a skill for your most common Claude Code interaction pattern (e.g., "run tests and summarize failures") and validate that it auto-loads for every team member without individual setup.
Technical Discussions
[MEDIUM] GitHub Copilot Metered Billing Generates 900 Downvotes, 400 Comments on the Day Before Activation
Source: GitHub Community Discussion | Date: May 29–30, 2026 | Link: https://github.com/orgs/community/discussions/192948 What changed: GitHub Copilot billing switches from "premium requests" to "AI Credits" on June 1. Code completions remain free; all other interactions consume AI Credits at token-based rates. The community thread — 400+ comments and 900 downvotes as of today — is substantive: developers are posting documented estimates of per-session credit consumption for agentic workflows. TL;DR: GitHub Copilot Pro users get 1,000 AI Credits/month ($10/month); agentic Copilot sessions have been documented consuming 30–40 credits per session, meaning Pro-tier users can exhaust their monthly allotment in a single heavy agentic session starting tomorrow. Developer signal: The specific numbers that matter for planning: Copilot Pro (1,000 credits/month at $10), Pro+ (3,900 credits/month at $39), Business (1,900 credits/user/month at $19/user), Enterprise (3,900 credits/user/month at $39/user). Code completions and Next Edit Suggestions do not consume credits. Agentic sessions (multi-step planning, research, execution) do consume credits at token-based rates using the listed API rates per model. Developers reporting $30–40/session credit consumption are typically running Copilot agent workflows against large codebases with long context. If your team uses Copilot primarily for completions and occasional one-shot chat, you are likely within credit limits. If you run regular agentic refactoring or long multi-turn sessions, check the GitHub billing preview today (GitHub → Settings → Billing & plans → Copilot usage preview) to see projected usage. A second related change: Copilot code review begins consuming GitHub Actions minutes on June 1 — check your Actions minutes balance if you have Copilot code review enabled on PRs. Affects you if: You use GitHub Copilot for agentic coding workflows with multi-step sessions; you have Copilot code review enabled on pull requests; you are budgeting Copilot costs for your team. Adoption effort: Quick (check GitHub billing preview today; no code changes required — the billing model changes on GitHub's end on June 1) Primary source: https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/ | https://github.blog/changelog/2026-04-27-github-copilot-code-review-will-start-consuming-github-actions-minutes-on-june-1-2026/ Quality gate score: 7 (concrete credit amounts and documented session consumption estimates +2, primary source link to official GitHub blog +2, within window today +1, technical audience +1, concrete data from community thread +1)
Quick Hits
- llama.cpp b9434 (May 30, 14:25 UTC) — TP granularity fix for Qwen 3.5/3.6 models on 3-GPU tensor parallel setups; resolves a bug where afmoe TP (Mixture of Experts tensor parallelism) was incorrectly computing granularity for these architectures. [https://github.com/ggml-org/llama.cpp/releases/tag/b9434]
Worth Watching (Announced, Not Yet Shipped)
⚠️⚠️⚠️ GitHub Copilot — Metered Billing LIVE TOMORROW (June 1)
(Carried from May 21–29 digests — now in final hours) Source: GitHub Blog | Link: https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/ All GitHub Copilot plans switch to AI Credit token-based billing on June 1. Action today: Check your usage preview at GitHub → Settings → Billing & plans → Copilot usage preview. Agentic sessions can exhaust a Pro plan (1,000 credits/$10/month) in a single session. See Technical Discussions above for specifics.
⚠️⚠️⚠️ Gemini 2.0 Flash + 2.0 Flash Lite — Shutdown LIVE TOMORROW (June 1)
(Carried from May 21–29 digests — now in final hours)
Source: Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/deprecations
gemini-2.0-flash and gemini-2.0-flash-lite return errors starting June 1. Migration: For cost-first pipelines → gemini-2.5-flash-lite ($0.10/$0.40/MTok, same price as 2.0 Flash, 8× output token limit). For quality-first → gemini-2.5-flash ($0.30/$2.50/MTok — 3× input and 6.25× output cost increase vs. 2.0 Flash). Search your codebase for gemini-2.0-flash string today.
⚠️⚠️ Gemini API Legacy Schema (Interactions) — Hard Removal June 8 (9 days)
(Carried from May 26 digest)
Source: Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/interactions-breaking-changes-may-2026
The Api-Revision: 2026-05-07 opt-out header stops working June 8. Applications still using response.outputs structure must migrate to response.steps.
⚠️⚠️ Claude Sonnet 4 + Opus 4 — Retirement June 15 (16 days)
(Carried from May 22–29 digests)
Source: Anthropic | Link: https://platform.claude.com/docs/en/about-claude/model-deprecations
claude-sonnet-4-20250514 and claude-opus-4-20250514 return errors June 15. Migration: Sonnet 4 → claude-sonnet-4-6-20260217; Opus 4 → claude-opus-4-8.
⚠️ Gemini API Unrestricted Key Deadline — June 19 (20 days)
(Carried from May 21–29 digests) Source: Google AI for Developers | Link: https://ai.google.dev/gemini-api/docs/api-key All unrestricted Gemini API keys blocked June 19. Restrict via AI Studio → API Keys → "Restrict to Gemini API."
⚠️ Claude Mythos — Public Release Expected "In Coming Weeks"
(Preview announced April 7, 2026; first confirmed public benchmarks May 28) Source: Anthropic | Link: https://anthropic.com/glasswing Claude Mythos Preview leads SWE-bench Verified at 93.9% (5.3pp above Opus 4.8). Broad API access delayed while Anthropic finalizes cybersecurity safeguards. No model ID, pricing, or exact GA date disclosed.
Ollama v0.30.0 — Still Pre-Release (rc23 as of May 22)
(Carried from May 15 digest) Source: Ollama (GitHub) | Link: https://github.com/ollama/ollama/releases v0.30.0 restructures Ollama to use llama.cpp directly as backend, with MLX for Apple Silicon. No stable GA date announced.
<details> <summary>🔭 Horizon — Open Questions, Emerging Patterns & Grounded Speculation</summary>
This section operates under different rules than the digest above. Evidence-grounded speculation is allowed. Pure prediction is not. Every claim here must cite a source from this digest or a real paper/benchmark. Label each entry by type so the reader knows what kind of thinking they're engaging with.
[PATTERN] Local-first tooling customization is becoming the dominant pattern across Claude Code and inference runtimes simultaneously
This digest's two headline items both move customization closer to the developer and away from centralized registries. Claude Code v2.1.157 drops the marketplace requirement for skills — .claude/skills auto-loads. vLLM v0.22.0 ships EAGLE 3.1 as a "config-driven extension" with full backward compatibility for EAGLE 3 checkpoints. Both represent the same architectural preference: extend-at-config, not extend-at-registration. The practical implication is that team-level infrastructure customization (Claude Code skills, inference draft models) can now be version-controlled, repo-local, and deployable without registry friction. Teams that treat their AI tooling as code (skills in .claude/skills/, EAGLE draft model paths in vLLM config, settings.json agent fields committed to the repo) gain iteration speed that teams depending on marketplace-mediated extensions will not.
Grounded in: Claude Code v2.1.157 auto-load of .claude/skills (this digest); vLLM v0.22.0 EAGLE 3.1 "config-driven extension with full backward compatibility" (this digest)
[OPEN QUESTION] At 2.03× per-user throughput, when does EAGLE 3.1 speculative decoding break even against the cost of maintaining draft models? EAGLE 3.1 delivers 2.03× throughput at C=1 on Kimi K2.6 NVFP4 on SPEED-Bench. The cost is: (1) a draft model that must be kept in sync with each target model you serve, (2) VRAM for the draft model (~7B parameters for most EAGLE draft models), (3) engineering overhead to train or source compatible EAGLE checkpoints when the target model updates. At 2.03× throughput, you effectively halve your inference infrastructure cost at low concurrency — the draft model's VRAM overhead is likely worth it once your target model is at Kimi-K2.6 or Qwen3 scale. The open question is how quickly EAGLE draft models become available after a new target model releases, and whether community-sourced EAGLE checkpoints (from HuggingFace or the EAGLE team's repo) have the same acceptance quality as first-party trained ones. The EAGLE 3 checkpoint compatibility guarantee in v0.22.0 helps — you can upgrade inference without retraining — but target model changes still require new draft model checkpoints. Grounded in: EAGLE 3.1 2.03× throughput at C=1 on Kimi K2.6 NVFP4 (this digest, vLLM v0.22.0); EAGLE 3.1 "config-driven extension with full backward compatibility for EAGLE 3 checkpoints" (this digest)
[TENSION] GitHub Copilot's billing switch arrives the same week that Claude Code's plugin system becomes significantly easier to use — this is competitive pressure landing at an inflection point
GitHub Copilot's move to metered billing for agentic workflows (Pro: 1,000 credits/month, ~$30–40/session for heavy usage per community reports) coincides with Claude Code v2.1.157 removing the primary friction point for teams evaluating it: marketplace registration for custom skills. A Claude Code Max plan user running local skills in .claude/skills/ with Auto mode on Bedrock now has a materially better-configured local experience than they did a week ago. This doesn't mean Copilot is losing — it has deep IDE integrations and existing enterprise contracts — but the timing is notable: developers who are repricing their Copilot usage and looking for alternatives will find Claude Code's skills system significantly more approachable than it was 48 hours ago. Worth watching whether the GitHub Copilot community backlash (900 downvotes) translates into any measurable shift in Claude Code adoption metrics over the next 2–4 weeks.
Grounded in: GitHub Copilot metered billing June 1, documented 1,000 credits/month for Pro, $30–40/session community estimates (this digest Technical Discussions); Claude Code v2.1.157 plugin auto-load without marketplace (this digest)
[IF THIS CONTINUES] vLLM's bi-weekly major release cadence with 200–750 contributors per release is compressing the open-to-enterprise inference quality gap faster than anyone anticipated vLLM v0.20.0 had 752 commits from 320 contributors; v0.22.0 has 459 commits from 230 contributors (63 new). That's approximately one major release every two weeks, each compressing the gap between open-source inference quality and commercial hosted API latency. EAGLE 3.1 at 2.03× throughput is a direct challenge to the throughput advantages of hosted inference APIs — and it's available free, self-hosted, to anyone running the right hardware. If this pace holds through 2026 (and there's no sign it won't — the Q2 2026 vLLM roadmap is public at github.com/vllm-project/vllm/issues/39749), teams evaluating whether to maintain self-hosted inference vs. pay hosted-API rates will find the cost crossover point moving steadily toward "self-host" as performance parity improves. Grounded in: vLLM v0.22.0 459 commits from 230 contributors (63 new), 2.03× EAGLE 3.1 throughput, 28.9% FP8 latency improvement (this digest); vLLM Q2 2026 roadmap (github.com/vllm-project/vllm/issues/39749)
[BUILDER'S ANGLE] EAGLE 3.1 + vLLM v0.22.0 thinking_token_budget together unlock cost-sensitive reasoning inference
vLLM v0.22.0 adds thinking_token_budget as an API parameter — letting you cap the tokens a reasoning model spends thinking before generating a response. Combined with EAGLE 3.1's 2.03× throughput improvement, this creates a new operating point: reasoning at scale with constrained cost. Previously, running a reasoning model (Kimi K2.6, Qwen3-thinking variants) with speculative decoding and budget-capped thinking was not cleanly supported in a single vLLM request. Now it is. The practical pattern: thinking_token_budget: 2000 on tasks where you want some reasoning but not full extended thinking, plus EAGLE 3.1 draft model for the output generation phase. This is particularly relevant for agentic pipelines that need smarter than base-model reasoning on intermediate steps but can't afford unconstrained extended thinking token costs per step.
Grounded in: vLLM v0.22.0 thinking_token_budget API parameter (this digest); EAGLE 3.1 2.03× throughput on Kimi K2.6 NVFP4 (this digest)
Excluded: 22 items below quality gate threshold or outside scan window.
Near-misses: Anthropic $47B run-rate revenue / $65B Series H (Simon Willison May 29 — business/financial news, no developer-technical signal); Transformers v5.6.0 (April 22, 2025 — well outside window); Bedrock AgentCore BYOF/S3/EFS, Memory metadata, GovCloud, Performance Optimization (date not confirmed within 24h window — "May 2026" per AWS but no day-level date accessible); OpenAI return_token_budget for Responses API web search (date not confirmed within window — platform.openai.com/docs/changelog returned 403); GPT-5.4 (March 5, 2026 — outside window); GPT-5.5 (April 23, 2026 — outside window); LiteLLM v1.88.0-dev.1 (dev pre-release, not stable — last stable v1.87.2 was earlier); Ollama v0.30.0 (still rc23 — no stable GA in window); EAGLE 3.1 blog post (May 26 — 3 days outside 24h window; included as mechanism reference for vLLM v0.22.0 entry which is within window); HuggingFace Papers Daily (403 error — no papers evaluated); arXiv cs.AI and cs.CL (403 error — no papers evaluated); LMArena new entries (none confirmed within May 29–30 window; most recent: mai-image-2.5-preview May 26); SWE-bench new entries (none confirmed in window); Mistral AI Now Summit industrial physics AI (not developer-API-relevant — physics/robotics domain); Together AI/Groq/Fireworks (no confirmed new items within window); Bedrock Spring AI SDK GA (date not confirmed in window); OpenAI Realtime 2 / Sora updates / GPT-5.4 (all confirmed from earlier months, outside 24h window); Anthropic acquires Vercept / $50B infrastructure investment (business news, no API-level changes in window); Gemini 3.5 Flash (Google I/O May 19 — 11 days outside window, previously covered); llama.cpp b9412–b9433 (checked release notes — routine ggml sync, test additions, and minor fixes with no user-facing inference changes).